Resource Tracker Overview

Overview

Resource Tracker is a service that monitors the health of instances started by Deadline in your AWS account. It monitors a heartbeat reported by each render node running the Deadline Slave and terminates instances that fail Deadline health checks, helping you avoid extra costs. Resource Tracker also monitors the health of your Spot Fleet Requests: when more than 35% of the instances in a Spot Fleet Request fail their health checks, it cancels that fleet and prevents new fleet launches. Once you have solved the underlying issue, you can restore normal operation by following the instructions in the Notifications section below.

Examples of Failure

Here are a few example scenarios where the Resource Tracker will intervene.

Deadline Slave Can’t Communicate with the Deadline Repository

When the Deadline Slave cannot connect to the Deadline Repository, the Resource Tracker will terminate the instance. That can happen if:

  • The Remote Connection Server isn’t running.
  • The instance hasn’t mounted the Repository network drive correctly.
  • The instance loses its network connection.

The Deadline Repository Won’t Allow Incoming Connections

Situations can arise where the Deadline Repository won’t accept incoming connections. In that case, the instances can’t connect to the Repository and will be terminated. That can happen if:

  • The Repository loses its network or internet connection (e.g., during a power outage).
  • The Repository or Remote Connection Server machine shuts down.
  • AWS Portal Link is not running.

The Deadline Slave is Not Reporting its Health

When the Deadline Slave is not reporting its health information to the Resource Tracker, the instance will be terminated. That can happen if:

  • The instance doesn’t have the Deadline Slave installed.
  • The Deadline Slave crashes.
  • The Deadline Slave is older than version 10.0.27.
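The heartbeat check behind these scenarios can be pictured with a minimal sketch. This is not the Resource Tracker's actual implementation: the function name, the data layout, and especially the timeout value are assumptions made for illustration, since the documentation does not state how long a heartbeat may be missing before an instance is terminated.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical value: the documentation does not say how long the Resource
# Tracker waits before treating a missing heartbeat as a failure.
HEARTBEAT_TIMEOUT = timedelta(minutes=10)


def find_stale_instances(last_heartbeats, now, timeout=HEARTBEAT_TIMEOUT):
    """Return the sorted IDs of instances whose heartbeat is missing or too old.

    last_heartbeats maps an instance ID to the datetime of its most recent
    heartbeat, or to None if the instance has never reported.
    """
    stale = []
    for instance_id, last_seen in last_heartbeats.items():
        if last_seen is None or now - last_seen > timeout:
            stale.append(instance_id)
    return sorted(stale)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    heartbeats = {
        "i-healthy": now - timedelta(minutes=1),  # reported recently
        "i-crashed": now - timedelta(minutes=45),  # Slave stopped reporting
        "i-no-slave": None,                        # Slave never reported
    }
    print(find_stale_instances(heartbeats, now))
```

An instance with no Deadline Slave installed and an instance whose Slave crashed look the same from this point of view: in both cases the heartbeat simply stops arriving.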

Starting the Resource Tracker

A Resource Tracker service is automatically started the first time you create an AWS Resource with a compatible version of the AWS Portal or the Spot Event Plugin. If you’re working in multiple AWS regions, one Resource Tracker will be created for each region.

For AWS Portal, the Resource Tracker will be launched prior to you creating a new infrastructure. If you already have a compatible infrastructure, the Resource Tracker will be launched when you create a new Spot Fleet Request.

The Spot Event Plugin will launch the Resource Tracker on House Cleaning.

The only additional setup required to run the Resource Tracker is to update your IAM permissions. For AWS Portal, apply the full list of recommended permissions. For the Spot Event Plugin, update the Spot Credentials and the IAM Instance Profile; if you are operating in a private subnet with no internet gateway (IGW) or NAT gateway, VPC Endpoints are also required.

Compatibility

The Resource Tracker was released with version 10.0.27 of Deadline.

Warning

Be sure to upgrade ALL Deadline components to Deadline 10.0.27 or later to ensure that the Resource Tracker operates correctly.
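If you keep an inventory of the Deadline versions installed on each component, a small script can flag anything below the minimum. The sketch below assumes you already have the version strings at hand; the component names and the `outdated_components` helper are hypothetical and not part of Deadline.

```python
# First Deadline release that includes the Resource Tracker (from this page).
MINIMUM_VERSION = (10, 0, 27)


def parse_version(text):
    """Turn a dotted version string such as '10.0.27.3' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def outdated_components(component_versions, minimum=MINIMUM_VERSION):
    """Return the sorted names of components whose version is below the minimum.

    Only the leading fields of each version are compared, so a build number
    after the third field does not affect the result.
    """
    return sorted(
        name
        for name, version in component_versions.items()
        if parse_version(version)[:len(minimum)] < minimum
    )


if __name__ == "__main__":
    # Example inventory; the names and versions are made up for illustration.
    inventory = {
        "Repository": "10.0.27.2",
        "Remote Connection Server": "10.0.26.0",
        "Render node AMI": "10.0.25.1",
    }
    for name in outdated_components(inventory):
        print("Upgrade needed:", name)
```

Comparing integer tuples rather than raw strings avoids the common mistake of treating "10.0.9" as newer than "10.0.27".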

Why is upgrading important?

The Resource Tracker may not function as expected if some components are not upgraded. This can result in:

  • Normally functioning instances being terminated.
  • Instances that are not functioning correctly not being terminated.
  • Spot Fleet Requests and their instances not being monitored by the Resource Tracker.
  • Infrastructures failing to start.
  • Spot Event Plugin failing to start.

Upgrading AWS Portal

General instructions for upgrading Deadline components can be found here.

Repository

Start by upgrading the Deadline Repository. The Deadline Repository should be upgraded following the instructions found here.

Non-Workstations & Remote Connection Server

Next, upgrade the machines that are neither render nodes nor workstations. For AWS Portal, that means the Remote Connection Server and AWS Portal Link.

Instructions for the Remote Connection Server can be found here.

To upgrade AWS Portal Link, install the latest Deadline Client on the machine, followed by the latest AWS Portal Link installer.

IAM Policy

The IAM Policy we recommend using for AWS Portal has been updated. Updating your current policy is required for the new features to work. Follow the instructions in Creating an IAM Policy. The new recommended Policy can be found here.

Infrastructures

Simply stop any running infrastructures and start new ones. See Creating a Deadline AWS Portal Infrastructure for more details.

Custom AMI Render Nodes

If you’re using a custom AMI for your render nodes, you’ll need to upgrade the version of the Deadline Slave. You can follow the instructions for customizing an AMI here. NOTE: In this case, your base AMI will be the AMI that needs upgrading. Install the new Deadline Client on the machine and image the machine. Use the new image from now on.

Upgrading the Spot Event Plugin

Before upgrading anything, disable the Spot Event Plugin if it is currently running, stop Pulse, and cancel any running Spot Fleet Requests that the Spot Event Plugin has made.

General instructions for upgrading Deadline components can be found here.

Repository

Start by upgrading the Deadline Repository. The Deadline Repository should be upgraded following the instructions found here.

Non-Workstations

Next, upgrade the machines that are neither render nodes nor workstations. For the Spot Event Plugin, that means upgrading Pulse.

Instructions for Pulse can be found here.

IAM Policy

These are the new IAM Policies that are required for the Spot Event Plugin credentials and the Render Node IAM instance profile.

VPC Endpoints

If one or more of your EC2 instances are running inside a private subnet in your VPC, then you will need to provide a mechanism for those instances to be able to access the AWS web service endpoints used in the above IAM policies. A VPC endpoint enables you to privately connect your VPC to supported AWS services and VPC endpoint services powered by PrivateLink without requiring an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection.
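As an illustration, interface VPC endpoints can be created with boto3's `create_vpc_endpoint` call. The sketch below is one possible approach, not an official procedure: the helper names are hypothetical, the `com.amazonaws.<region>.<service>` naming pattern holds for most but not all AWS services, and the list of services is deliberately left to the caller because it must match the services referenced in your IAM policies.

```python
def endpoint_service_name(region, service):
    """Build the service name AWS expects when creating a VPC endpoint.

    Most AWS services follow the com.amazonaws.<region>.<service> pattern;
    check the service's documentation if an endpoint cannot be found.
    """
    return "com.amazonaws.{}.{}".format(region, service)


def create_interface_endpoints(vpc_id, subnet_ids, security_group_ids,
                               region, services):
    """Create one interface VPC endpoint per service and return their IDs.

    Requires boto3 and AWS credentials that are allowed to create endpoints.
    """
    import boto3  # imported here so the helper above works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    endpoint_ids = []
    for service in services:
        response = ec2.create_vpc_endpoint(
            VpcEndpointType="Interface",
            VpcId=vpc_id,
            ServiceName=endpoint_service_name(region, service),
            SubnetIds=subnet_ids,
            SecurityGroupIds=security_group_ids,
            PrivateDnsEnabled=True,
        )
        endpoint_ids.append(response["VpcEndpoint"]["VpcEndpointId"])
    return endpoint_ids
```

Each endpoint is billed separately, so create endpoints only for the services your policies actually use.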

Render Nodes

Update the Deadline Slave application on your custom AMI: start a new instance of that AMI, connect to it, and run the latest Deadline Client installer. Then create a new AMI from the instance.

Spot Fleet Configuration

Once you have created your new AMI, you’ll need to make a new Spot Fleet configuration. Do not replace the AMI IDs in your current configuration, as that will result in errors; create a new configuration instead. See How to make a Spot Fleet configuration. Finally, restart Pulse.

Notifications

The status of your instances is indicated by a status label in the Deadline Monitor.

Typically, a status of “Healthy” will be shown.

../_images/resource_tracker_healthy.png

If at least one, but fewer than 35%, of your instances are not functioning correctly, the status will change to “Warning”. If a Spot Fleet Request remains in the “Warning” state for 2 hours, it will be marked as “Error”.

../_images/resource_tracker_grace.png

If more than 35% of your instances are not functioning correctly, the status will change to “Error”. You can view which Spot Fleet Requests are in the “Error” state by double-clicking the label or by viewing the status label tooltip.

../_images/resource_tracker_unhealthy.png
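The three states described above can be summarized in a short sketch. It is an illustration of the documented thresholds, not the Resource Tracker's code: the function name is hypothetical, and two boundary cases are assumptions because the text does not cover them (exactly 35% unhealthy is treated as “Warning”, and exactly 2 hours in “Warning” is treated as “Error”).

```python
from datetime import timedelta

ERROR_THRESHOLD_PERCENT = 35        # from the text: more than 35% means "Error"
WARNING_GRACE = timedelta(hours=2)  # from the text: 2 hours in "Warning" means "Error"


def fleet_status(total, unhealthy, time_in_warning=timedelta(0)):
    """Classify a Spot Fleet Request as "Healthy", "Warning", or "Error".

    Integer arithmetic is used for the percentage check so that the 35%
    boundary is not affected by floating-point rounding.
    """
    if total <= 0 or unhealthy <= 0:
        return "Healthy"
    if unhealthy * 100 > total * ERROR_THRESHOLD_PERCENT:
        return "Error"
    # Assumption: the grace period boundary is inclusive.
    if time_in_warning >= WARNING_GRACE:
        return "Error"
    return "Warning"


if __name__ == "__main__":
    print(fleet_status(20, 0))                      # no unhealthy instances
    print(fleet_status(20, 2))                      # 10% unhealthy
    print(fleet_status(20, 2, timedelta(hours=3)))  # stuck in "Warning"
    print(fleet_status(20, 10))                     # 50% unhealthy
```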

You will be unable to create new Spot Fleet Requests or change the target capacity of existing requests when in the “Error” state. When something has gone wrong, we want to avoid starting more instances. This prevents you from being charged for instances that aren’t behaving as intended.

Once you’ve resolved the reason your fleets errored, you can manually reset the system. To do this, double-click the status label, enter your AWS Access Key and AWS Secret Key, and click OK.

../_images/resolve_unhealthy_state.png