Simplify data management with new APIs in Amazon Personalize

Amazon Personalize now makes it easier to manage your growing item and user catalogs with new APIs to incrementally add items and users in your datasets to create personalized recommendations. With the new putItems and putUsers APIs, you can simplify the process of managing your datasets. You no longer need to upload an entire dataset containing historical records and new records just to include new records in your recommendations. Providing new records to Amazon Personalize when they become available reduces your latency for incorporating new information, ensuring your recommendations remain relevant to your users and item catalog.

Based on over 20 years of personalization experience at Amazon.com, Amazon Personalize enables you to improve customer engagement by powering personalized product and content recommendations and targeted marketing promotions. Amazon Personalize uses machine learning (ML) to create higher-quality recommendations for your websites and applications. You can get started without any prior ML experience and use simple APIs to easily build sophisticated personalization capabilities in just a few clicks. Amazon Personalize processes and examines your data, identifies what is meaningful, and trains and optimizes a personalization model that is customized for your data. All your data is encrypted to be private and secure, and is only used to create recommendations for your users.

This post walks you through the process of incrementally modifying your items and users datasets in Amazon Personalize.

Adding new items and users to your datasets

For this use case, we create a dataset group with an interaction dataset, an item dataset (item metadata) and a user dataset using the Amazon Personalize CLI. For instructions on creating a dataset group, see Getting Started (CLI).

  1. Create an Interactions dataset using the following schema and import data using the interactions-100k.csv data file:
{
	"type": "record",
	"name": "Interactions",
	"namespace": "com.amazonaws.personalize.schema",
	"fields": [
		{
			"name": "USER_ID",
			"type": "string"
		},
		{
			"name": "ITEM_ID",
			"type": "string"
		},
		{
			"name": "EVENT_TYPE",
			"type": [
				"string"
			]
		},
		{
			"name": "EVENT_VALUE",
			"type": [
				"null",
				"float"
			]
		},
		{
			"name": "TIMESTAMP",
			"type": "long"
		}
	]
}
  1. Create an Items dataset using the following schema and import data using the csv data file:
{
	"type": "record",
	"name": "Items",
	"namespace": "com.amazonaws.personalize.schema",
	"fields": [
		{
			"name": "ITEM_ID",
			"type": "string"
		},
		{
			"name": "GENRE",
			"type”: "null”
			 "categorical": true
		}
	],
	"version": "1.0"
}
  1. Create a Users dataset using the following schema and import data using the csv data file:
{
	"type": "record",
	"name": "Users",
	"namespace": "com.amazonaws.personalize.schema",
	"fields": [
		{
			"name": "USER_ID",
			"type": "string"
		},
		{
			"name": "AGE",
			"type": "int"
		},
		{
			"name": "GENDER",
			"type": "string"
		}
	],
	"version": "1.0"
}

Now that you have created your datasets, you can add data to them in two different ways:

  • Using bulk import for item and user datasets from Amazon Simple Storage Service (Amazon S3). (for more information, see Preparing and Importing Data)
  • Using the new putUsers and putItems You can incrementally add up to 10 records per call to the user dataset using the putUsers API and the items dataset using putItems API.

For the putUsers call, the Users dataset required schema field (USER_ID) is mapped to the camel case userId. For the putItems call, the Items dataset required schema field (ITEM_ID) is mapped to the camel case itemId.

The following code adds two new users to the Users dataset via the putUsers API:

personalize_events.put_users(
datasetArn="arn:aws:personalize:region:acctID:dataset/crud-test/USERS",                          
    	users=[
 {
                 'userId' :"489",
                 'properties': "{"AGE":"29", "GENDER":F}"
             },
             {
                 'userId' : "650",
                 'properties':"{"AGE":"65", "GENDER"":F}"
             }]
)

The following code adds a new item to the Items dataset via the putItems API:

personalize_events.put_items(
datasetArn="arn:aws:personalize:region:acctID:dataset/crud-test/ITEMS",
items=[
{
            'itemId' :"432",
             'properties': "{"GENRE":"Action"}"
         }]
)

An HTTP/1.1 200 response is returned for successful record creation. In cases where your new item or user doesn’t match your dataset’s defined schema, you receive an InvalidInputException detailing the total number of records in your request that don’t match the schema.

For new records created (incrementally or via bulk upload) with the same userId or itemId as a record that already exists in the Users or Items dataset, the most recently created record (ingested by Amazon Personalize) is used in new solutions or solution versions.

Additionally, records added using putUsers or putItems are persisted until your dataset is deleted, so be sure to delete your dataset in the dataset group before importing a refreshed dataset. Amazon Personalize doesn’t replace your catalog or user data management systems.

Incorporating the newly added users and items in recommendations and filters

Now that you’ve added new items and new users to your datasets, incorporating this information into your Amazon Personalize solutions makes sure that recommendations remain timely and relevant for your users. When not using the aws-user-personalization recipe, solution re-training is needed to include these new items in your personalized recommendations.

If you have exploration enabled in an Amazon Personalize recipe, your new items are included in recommendations as soon as your next campaign update is complete. New events generated by your users’ interactions with these items are incorporated when your train a new solution or solution version in this dataset group.

Any filters you created in the dataset group are updated with your new item and user data within 15 minutes from the last dataset import job completion or the last incremental record. This update allows your campaigns to use your most recent data when filtering recommendations for your users.

Summary

Amazon Personalize allows you to easily manage your growing item and user catalogs so your personalized product and content recommendations keep pace with your business and your customers. For more information about optimizing your user experience with Amazon Personalize, see What Is Amazon Personalize?


About the Authors

Matt Chwastek is a Senior Product Manager for Amazon Personalize. He focuses on delivering products that make it easier to build and use machine learning solutions. In his spare time, he enjoys reading and photography.

 

 

 

 

Gaurav Singh Chauhan is a Software Engineer for Amazon Personalize and works on architecting software systems and big data pipelines that serve customers at scale. Gaurav has a B.Tech in Computer Science from IIT Bombay, India. Outside of work, he likes all things outdoors and is an avid runner. In his spare time, he likes reading about and exploring new technologies. He tweets on startups, technology, and India at @bazingaurav.

 

 

View Original Source (aws.amazon.com) Here.

Leave a Reply

Your email address will not be published. Required fields are marked *

Shared by: AWS Machine Learning

Tags: