Custom deep reinforcement learning and multi-track training for AWS DeepRacer with Amazon SageMaker RL Notebook

AWS DeepRacer, launched at re:Invent 2018, helps developers get hands on with reinforcement learning (RL). Since then, thousands of people have developed and raced their models at 21 AWS DeepRacer League events at AWS Summits across the world, and virtually via the AWS DeepRacer console. Beyond the summits there have been several events at AWS Lofts, developer meetups, partner sessions, and corporate events.

The enthusiasm among developers to learn and experiment in AWS DeepRacer is exceptionally high. Many want to explore further and have greater ability to modify the neural network architecture, modify the training presets, or train on multiple tracks in parallel.

AWS DeepRacer makes use of several other AWS services: Amazon SageMaker, AWS RoboMaker, Amazon Kinesis Video Streams, Amazon CloudWatch, and Amazon S3. To give you more fine-grained control on each of these components to extend the simulation environment and modeling environment, this post includes a notebook environment that helps provision and manage these environments so you can modify any aspect of the AWS DeepRacer experience. For more information, see the GitHub repo for this post.

This post explores how to set up an environment, dives into the main components of the AWS DeepRacer code base, and walks you through modifying your neural network and training presets, customizing your action space, and training on multiple tracks in parallel. By the end, you should understand how to modify the AWS DeepRacer model training using Amazon SageMaker.

By utilizing the tools behind the AWS DeepRacer console, developers can customize and modify every aspect of their AWS DeepRacer training and models, allowing them to download models to race in person and participate in the AIDO 3 challenge at NeurIPS.

Setting up your AWS DeepRacer notebook environment

To get started, log in to the AWS Management Console and complete the following steps:

From the console, under SageMaker, choose Notebook instances.
Choose Create notebook instance.
Give your notebook a name. For example, DeepracerNotebook.

Because AWS RoboMaker and Amazon SageMaker do the heavy lifting in training, the notebook itself does not need much horsepower.

Leave the instance type as the default ml.t2.medium.
Choose Additional configuration.
For Volume size, set it to at least 25 GB.

This size gives enough room to rebuild the training environment and the simulation application.

Choose Create a new role.
Choose Any S3 bucket.
Choose Create role.

If this is not your first time using Amazon SageMaker Notebooks, select a valid role from the drop-down list.

Leave all other settings as the default.
Choose Create notebook instance.

Here is a screencast showing you how to set up the notebook environment.

It takes a few minutes for the Notebook instance to start. When it’s ready, choose Open Jupyter.

Loading your notebook

To load the AWS DeepRacer sample notebook, complete the following steps:

Choose SageMaker Examples.
Choose Reinforcement Learning.
Next to deepracer_rl.ipynb, choose Use.
Choose Create copy.

This process copies the AWS DeepRacer notebook stack to your notebook instance (found under the Files tab under a rl_deepracer_robomaker_coach_gazebo_YYYY-MM-DD directory), and opens the main notebook file in a new tab.

Here is a screencast of this process:

The AWS DeepRacer notebook environment

You can modify the following files to customize the AWS DeepRacer training and evaluations in any way desired:

src/training_worker.py – This file handles either loading a pre-trained model or creating a new neural network (using a presets file), setting up the data store, and starting up a Redis server for the communication between Amazon SageMaker and AWS RoboMaker.
src/markov/rollout_worker.py – This file runs on the Amazon SageMaker training instance, and downloads the model checkpoints from S3 (initially created by the training_worker.py, and updated by previous runs of rollout_worker.py) and runs the training loops.
src/markov/evaluation_worker.py – This file is used during evaluation to evaluate the model. It downloads the model from S3 and runs the evaluation loops.
src/markov/sagemaker_graph_manager.py – This file runs on the Amazon SageMaker training instance, and instantiates the RL class, including handling the hyperparameters passed in, and sets up the input filters, such as converting the camera input to grayscale.
src/markov/environments/deepracer_racetrack_env.py – This file is loaded twice—both on the Amazon SageMaker training instance, and the AWS RoboMaker instance. It uses the environmental variable NODE_TYPE to determine which environment is running. The AWS RoboMaker instance runs the Robotics Operating System (ROS) code. This file does most of the work of interacting with the AWS RoboMaker environment, such as resetting the car when it goes off the track, collecting the reward function parameters, executing the reward function, and logging to CloudWatch.

You can also add files to the following directories for further customization:

src/markov/rewards – This directory stores sample reward functions. These are copied to S3 and passed on to Amazon SageMaker in the notebook. The notebook copies the selected one to S3, where the deepracer_racetrack_env.py fetches and runs it.
src/markov/actions – This directory contains a series of JSON files that define the action taken for each of the nodes in the last row of the neural network. The one selected (or any new ones created) should match the number of output nodes in your neural network. The notebook copies the selected one to S3, where the rollout_worker.py script fetches it.
src/markov/presets – This directory contains files in which one can modify the RL algorithm and modify other parameters such as the size and shape of the neural network. The notebook copies the selected one to S3, where the rollout_worker.py script fetches it.
Dockerfile – This contains directions for building the container that is deployed to the Amazon SageMaker training instance. The container is built on a standard Ubuntu base, and the src/markov directory is copied into the container. It also has a series of packages installed that AWS DeepRacer uses.

Customizing neural network architectures for RL

You may be interested in how to customize the neural network architecture to do things such as add an entry, change the algorithm, or change the size and shape of the network.

As of this writing, AWS DeepRacer uses the open source package Intel RL Coach to run state-of-the-art RL algorithms. In Intel RL Coach, you can edit the RL algorithm hyperparameters, including but not limited to training batch size, exploration method, and neural network architecture by creating a new presets file.

For examples from the GitHub repo, see defaults.py and preset_attention_layer.py. Specific to your notebook setup, when you make changes to the preset file, you also need to modify sagemaker_graph_manager.py to reflect any appropriate changes to the hyperparameters or algorithm settings to match the new preset file.

Once you have the new file located in the presets/ directory, modify the notebook file to use the new presets file by editing the “Copy custom files to S3 bucket so that Amazon SageMaker and AWS RoboMaker can pick it up” section. See the following code:

s3_location = "s3://%s/%s" % (s3_bucket, s3_prefix)
print(s3_location)

# Clean up the previously uploaded files
!aws s3 rm --recursive {s3_location}

# Make any changes to the environment and preset files below and upload these files
!aws s3 cp src/markov/environments/deepracer_racetrack_env.py {s3_location}/environments/deepracer_racetrack_env.py

!aws s3 cp src/markov/rewards/default.py {s3_location}/rewards/reward_function.py

!aws s3 cp src/markov/actions/model_metadata_10_state.json {s3_location}/model_metadata.json

#!aws s3 cp src/markov/presets/default.py {s3_location}/presets/preset.py
!aws s3 cp src/markov/presets/preset_attention_layer.py {s3_location}/presets/preset.py

The modified last line copies the preset_attention_layer.py instead of the default.py to the S3 bucket. Amazon SageMaker and AWS RoboMaker copy the changed files from the S3 bucket during the initialization period before starting to train.

Customizing the action space and noise injection

The action space defines the output layer of the neural network and how the car acts upon choosing the corresponding output node. The output of the neural network is an array of size equal to the number of actions. The array contains the probabilities taking a particular action. This post uses the index of the output node with the highest probability.

You can obtain the action, speed, and steering angle corresponding to the index of the maximum probability output node via a mapping written in standard JSON. The AWS RoboMaker simulation application uses the JSON file to determine the speed and steering angle during training as well as evaluation phases. The following code example defines five nodes with the same speed, varying only by the steering angle:

{
    "action_space": [
        {
            "steering_angle": -30,
            "speed": 0.8,
            "index": 0
        },
        {
            "steering_angle": -15,
            "speed": 0.8,
            "index": 1
        },
        {
            "steering_angle": 0,
            "speed": 0.8,
            "index": 2
        },
        {
            "steering_angle": 15,
            "speed": 0.8,
            "index": 3
        },
        {
            "steering_angle": 30,
            "speed": 0.8,
            "index": 4
        }
    ]
}

The units for steering angle and speed are degrees and meters per second, respectively. Deepracer_env.py loads the JSON file to execute a given action for a specified output node. This file is also bundled with the exported model for loading on the physical car for the same reason, that is, to map the neural network output nodes to the corresponding steering angle and speed from the simulation to the real world.

The more permutations you have in your action space, the more nodes there are in the output layer of the neural network. More nodes mean bigger matrices for mathematical operations during training; therefore, training takes longer.

The following Python code helps generate custom action spaces:

#!/usr/bin/env python

import json

min_speed = 4
max_speed = 8
speed_resolution = 2

min_steering_angle = -30
max_steering_angle = 30
steering_angle_resolution = 15

output = {"action_space":[]}
index = 0
speed = min_speed
while speed <= max_speed:
    steering_angle = min_steering_angle
    while steering_angle <= max_steering_angle:
        output["action_space"].append( {"index":index,
                                         "steering_angle":steering_angle,
                                         "speed":speed}
                                     )
        steering_angle += steering_angle_resolution
        index += 1
    speed += speed_resolution

print json.dumps(output,indent=4)

Improving your simulation-to-real world transfer

Robotics research has shown that introducing entropy and noise into the simulation helps the model identify more appropriate features and react more appropriately to real-world conditions, leading to better a simulation-to-real world transfer. Keep this in mind while developing new algorithms and networks.

For example, AWS DeepRacer already includes some random noise for the steering angle and speed to account for the changes in the friction and deviations in the mechanical components during manufacturing. You can see this in the following code in src/markov/environments/deepracer_racetrack_env.py:

   def step(self, action):
        self.steering_angle = float(self.json_actions[action]['steering_angle']) * math.pi / 180.0
        self.speed = float(self.json_actions[action]['speed']) + 
    
        ## NOISE ##    
        # Add random NOISE in to both the steering angle and speed
        self.steering_angle += 0.01 * np.random.normal(0, 1.0, 1)
        self.speed += 0.1 * np.random.normal(0, 1.0, 1)

In addition to steering and speed noise, you may want to account for variations in lighting, track material, track conditions, and battery charge levels. You can modify these in the environment code or the AWS RoboMaker world configuration files.

Multi-track training in parallel

You can train your models faster by training on multiple simulation environments with a single training job. For example, one simulation environment may use a road with concrete material, while the other uses carpet. As the parallel AWS RoboMaker environments generate batches, the training instance uses the information from all the simulations to train the model. This strategy helps make sure that the model can identify features of the road instead of some aspect of a single map, or operate under various textures or lighting conditions.

AWS RoboMaker uses Gazebo, an open source 3D robotics simulator. World files define Gazebo environments and use model definitions and collada files to build an environment. The standard AWS DeepRacer simulation application includes several word files: reinvent_base, reinvent_carpet, reinvent_concrete, reinvent_wood, AWS_track, Bowtie_track, Oval_track, and Straight_track. New tracks are released regularly as part of the virtual league; you can identify them by the WORLD_NAME environmental variable on the AWS RoboMaker simulation job.

To run parallel simulation applications with varying world configurations, modify the “Launch the Simulation job on AWS RoboMaker” section of the notebook. See the following code:

import datetime #need microsecond precision to avoid collisions 

envriron_vars = {
    "KINESIS_VIDEO_STREAM_NAME": "SilverstoneStream",
    "SAGEMAKER_SHARED_S3_BUCKET": s3_bucket,
    "SAGEMAKER_SHARED_S3_PREFIX": s3_prefix,
    "TRAINING_JOB_ARN": job_name,
    "APP_REGION": aws_region,
    "METRIC_NAME": "TrainingRewardScore",
    "METRIC_NAMESPACE": "AWSDeepRacer",
    "REWARD_FILE_S3_KEY": "%s/rewards/reward_function.py" % s3_prefix,
    "MODEL_METADATA_FILE_S3_KEY": "%s/model_metadata.json" % s3_prefix,
    "METRICS_S3_BUCKET": s3_bucket,
    "METRICS_S3_OBJECT_KEY": s3_bucket + "/training_metrics.json",
    "TARGET_REWARD_SCORE": "None",
    "NUMBER_OF_EPISODES": "0",
    "ROBOMAKER_SIMULATION_JOB_ACCOUNT_ID": account_id
}

vpcConfig = {"subnets": deepracer_subnets,
             "securityGroups": deepracer_security_groups,
             "assignPublicIp": True}

worldsToRun = ["reinvent_base","reinvent_carpet","reinvent_concrete","reinvent_wood"]

responses = []
for world_name in worldsToRun:
    envriron_vars["WORLD_NAME"]=world_name
    simulation_application = {"application":simulation_app_arn,
                              "launchConfig": {"packageName": "deepracer_simulation_environment",
                                               "launchFile": "distributed_training.launch",
                                               "environmentVariables": envriron_vars}
                              }
    client_request_token = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S-%f") 
    response =  robomaker.create_simulation_job(iamRole=sagemaker_role,
                                            clientRequestToken=client_request_token,
                                            maxJobDurationInSeconds=job_duration_in_seconds,
                                            failureBehavior="Continue",
                                            simulationApplications=[simulation_application],
                                            vpcConfig=vpcConfig
                                            )
    responses.append(response)

print("Created the following jobs:")
job_arns = [response["arn"] for response in responses]
for response in responses:
    print("Job ARN", response["arn"])

The modified list loops over the new worldsToRun list, and the definition of the simulation_application dictionary is inside the loop (because the envriron_vars dictionary needs to update with a new WORLD_NAME each time). Additionally, the modified clientRequestToken uses microseconds with the datetime module because the old method may have resulted in an error if two jobs were submitted within the same second.

Custom evaluation

The standard AWS DeepRacer console evaluation runs three episodes. If a car goes off the track, that episode is over, and the percentage completed and time thus far is recorded. The number of episodes can be passed in, as the sample notebook demonstrates with the NUMBER_OF_TRIALS assignment in the envriron_vars dictionary. However, you can modify this behavior in the evaluation_worker.py file. To get as many runs in as possible in four minutes, change the following code (lines 37–39):

    while curr_num_trials < number_of_trials:
        graph_manager.evaluate(EnvironmentSteps(1))
        curr_num_trials += 1

The following is the updated code:

    import time
    starttime = time.time()
    while time.time()-starttime < 240:  #240 seconds = 4 minutes
        graph_manager.evaluate(EnvironmentSteps(1))
        curr_num_trials += 1

This lets the car run for four minutes, as per the AWS Summit Physical track rules.

To take this further and simulate the AWS Summit physical race reset rules, wherein a car can be moved back onto the track up to three times before the episode ends, modify the infer_reward_state() function in deepracer_racetrack_env.py. See the following code (lines 396 and 397):

             done = True
             reward = CRASHED

The following is the updated code:

            reward = CRASHED
            try:
              self.resets +=1
            except:
              self.resets = 1 #likely this is the first reset and the variable hadn't been defined before
            if self.resets > 3:
              done = True
            else:
              done = False
              #Now reset everything back onto the track
              self.steering_angle = 0
              self.speed = 0
              self.action_taken = 0
              self.send_action(0, 0)
              for joint in EFFORT_JOINTS:
                  self.clear_forces_client(joint)
              current_ndist -= model_point.distance(self.prev_point)/2  #Try to get close to where the car went off
              prev_index, next_index = self.find_prev_next_waypoints(current_ndist)
              self.reset_car_client(current_ndist, next_index)
              #Clear the image queue so that we don't train on an old state from before moving the car back to the track
              _ = self.image_queue.get(block=True, timeout=None)
              self.set_next_state()

Conclusion

AWS DeepRacer is a fun way to get started with reinforcement learning. To build your autonomous model, all you need is to write a proper reward function in Python. For developers that want to dive deep into the code and environment to extend AWS DeepRacer, this post also provides a notebook environment to do so.

This post showed you how to get started with the notebook environment, customize the training algorithm, modify the action space, train on multiple tracks, and run custom evaluation methods. Please share what you come up with!

A subsequent post dives into modifying the AWS RoboMaker simulation application to train and evaluate on your custom tracks. The post gives tips and tricks on shaping the tracks, shares code for generating tracks, and discusses how to package them for AWS DeepRacer.

About the authors

Neal McFee is a Partner Solutions Architect with AWS. He is passionate about solutions that span Robotics, Computer Vision, and Autonomous systems. In his spare time, he flies drones and works with AWS customers to realize the potential of reinforcement learning via DeepRacer events.

Don Barber is a Senior Solutions Architect, with over 20 years of experience helping customers solve business problems with technology in regulated industries such as finance, pharma, and government. He has a Bachelors in Computer Science from Marietta College and a MBA from the University of Maryland. Outside of the office he spends time with his family and hobbies such as amateur radio and repairing electronics.

Sunil Mallya is a Senior Solutions Architect in the AWS Deep Learning team. He helps our customers build machine learning and deep learning solutions to advance their businesses. In his spare time, he enjoys cooking, sailing and building self driving RC autonomous cars.

Sahika Genc is a senior applied scientist at Amazon artificial intelligence (AI). Her research interests are in smart automation, robotics, predictive control and optimization, and reinforcement learning (RL), and she serves in the industrial committee for the International Federation of Automatic Control. She leads science teams in scalable autonomous driving and automation systems, including consumer products such as AWS DeepRacer and SageMaker RL. Previously, she was a senior research scientist in the Artificial Intelligence and Learning Laboratory at the General Electric (GE) Global Research Center