This session explores methods for analyzing and optimizing compute performance on AWS Trainium. We begin with the hardware factors that influence performance and introduce theoretical models such as the roofline framework, then provide practical recommendations for maximizing compute efficiency. Key performance indicators are reviewed, including step time and model FLOPs utilization (MFU) for training, as well as time to first token (TTFT) and output tokens per second (OTPS) for inference. The session concludes with a demonstration of the Neuron Profiler, analyzing performance data from a Llama 3.2 8B distilled reasoning model.
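As a rough illustration of the metrics covered in the session, the sketch below shows how the roofline bound, MFU, and OTPS are typically computed. All hardware numbers in the example are made-up placeholders for illustration, not official Trainium specifications.

```python
# Illustrative sketch of the performance metrics discussed in the session.
# All concrete numbers below are assumed placeholder values, not Trainium specs.

def roofline_attainable_flops(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Roofline model: attainable FLOP/s is the minimum of peak compute
    and memory bandwidth (bytes/s) times arithmetic intensity (FLOPs/byte)."""
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)

def mfu(flops_per_step, step_time_s, peak_flops):
    """Model FLOPs Utilization: achieved FLOP/s as a fraction of peak."""
    return (flops_per_step / step_time_s) / peak_flops

def otps(output_tokens, generation_time_s):
    """Output tokens per second during inference decoding."""
    return output_tokens / generation_time_s

# Example with assumed figures:
peak = 100e12  # 100 TFLOP/s peak compute (placeholder)
bw = 1e12      # 1 TB/s memory bandwidth (placeholder)

# At 50 FLOPs/byte the workload is bandwidth-bound: min(100e12, 50e12).
print(roofline_attainable_flops(peak, bw, 50))

# A step doing 6e12 FLOPs in 120 ms achieves 5e13 FLOP/s, i.e. MFU = 0.5.
print(mfu(6e12, 0.12, peak))

# 256 output tokens generated in 4 s gives 64 tokens/s.
print(otps(256, 4.0))
```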
Subscribe to AWS: https://go.aws/subscribe
Sign up for AWS: https://go.aws/signup
AWS free tier: https://go.aws/free
Explore more: https://go.aws/more
Contact AWS: https://go.aws/contact
Next steps:
Explore AWS in Analyst Research: https://go.aws/reports
Discover, deploy, and manage software that runs on AWS: https://go.aws/marketplace
Join the AWS Partner Network: https://go.aws/partners
Learn more on how Amazon builds and operates software: https://go.aws/library
Do you have technical AWS questions?
Ask the community of experts on AWS re:Post: https://go.aws/3lPaoPb
Why AWS?
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—use AWS to be more agile, lower costs, and innovate faster.
#AWS #AI #GenerativeAI #AmazonWebServices #CloudComputing