Describe 3-5 accomplishments that best describe your performance in the last year.
Over the past year, I delivered three key accomplishments that demonstrate my commitment to customer obsession, ownership, and innovation while driving significant business impact.

Customer-Focused Innovation: As the sole developer for the AWS Site-to-Site VPN CloudWatch BGP Logging feature launched at re:Invent 2025, I participated in the complete SDLC from research to deployment. This feature directly addressed customer visibility needs, resulting in over 1,250 unique customer accounts enabling VPN logging across 1,750 connections worldwide. The business impact is substantial: up to $31.5K in monthly recurring revenue, with projections of up to $270K by EOY 2026, while eliminating ~4K VPN support cases annually. I demonstrated Think Big and Ownership by taking full technical responsibility while collaborating with principal engineers, AppSec, and stakeholders throughout the development process.

Operational Excellence: Recognizing that AWS received 5.9K annual VPN-related support cases, I took ownership of improving the Site-to-Site VPN CloudWatch IKE logging experience. Through cross-team collaboration, I delivered 17 comprehensive logging enhancements that eliminated approximately 900 annual support cases. Beta testers confirmed they can now "diagnose and fix VPN problems with ease," directly demonstrating Customer Obsession by enabling self-service capabilities for customers with limited IPSec expertise.

Leadership and Development: I organized and led a comprehensive week-long CSA upskilling workshop for newly onboarded Network Devices engineers, creating 4 hours of intensive VPN training content. This initiative demonstrated Develop Others and Ownership by proactively addressing knowledge gaps that impacted our team's service delivery effectiveness.

Describe how you demonstrated the Leadership Principles to deliver for customers in the last year.
Strengths

Customer Obsession: Addressing 5.9K annual VPN support cases, I delivered 17 IKE logging enhancements that eliminated ~900 support cases. Beta testers confirmed they can now "diagnose and fix VPN problems with ease," enabling self-service capabilities and reducing customer friction.

Ownership / Invent & Simplify: As sole developer for the VPN BGP Logging feature launched at re:Invent 2025, I managed the complete SDLC from research to deployment, including Threat Model creation, AppSec coordination, and stakeholder engagement. Result: 1,250+ customer accounts adopted the feature, generating $31.5K in monthly recurring revenue with projections of $270K by EOY 2026.

Growth Areas

Think Big: Moving forward, I aim to expand my strategic vision by identifying cross-service integration opportunities that can amplify customer value beyond individual feature development. My successful VPN feature launches demonstrate strong execution capabilities, and I'm excited to leverage this foundation to explore broader observability ecosystems that connect multiple AWS services for enhanced customer outcomes.

Bias for Action: I plan to enhance my delivery approach by implementing iterative development strategies that provide incremental customer value while gathering real-time feedback. Building on my comprehensive feature delivery success, I will focus on breaking complex initiatives into smaller, high-impact releases that enable faster customer benefit realization and continuous improvement cycles.

Strength Leadership Principles: Customer Obsession, Ownership, Invent and Simplify
Growth Leadership Principles: Think Big, Bias for Action
In the next year, what specific actions do you want to take to grow as an Amazonian and increase your customer impact?
Moving forward, I aim to expand my influence across additional AWS networking services, particularly Site-to-Site VPN, Transit Gateway, Direct Connect, and Core Compute/Storage services, while helping junior engineers through comprehensive internal guide documentation. I will continue driving customer-centric innovation and scaling my technical leadership impact by contributing to further case reduction through my role as Continuous Improvement SXO for the AWS Client VPN Service.
Austin Leath Final Promo Doc Revision

Scope of role

The L5 Cloud Support Engineer works independently, troubleshooting enterprise technical support cases of all severities. They help clarify the customer need, determine whether there is a problem, evaluate impacts or technical risks, and manage customer expectations for resolution appropriately. They know when to escalate critical and complex issues and propose workarounds during times of crisis to prevent interruptions to customers' productivity. TAMs, ProServe, and Solution Architects may reach out to them directly to assist on customer cases.

They are a Subject Matter Expert in one or more services (or hold similar accreditation) and handle escalations related to those services. They have a high-level understanding of architecture, operational parameters, and troubleshooting techniques. They can assess when the right action is to replicate workloads to best serve or guide a customer, and can advise customers on the risks and opportunities of various implementations so that the customer can make the right trade-offs. They help increase the usage of Amazon-published service support tools and add to the materials available to enable case deflection and help customers adopt best practices.

L5 Support Engineers identify repetitive or serious problems and communicate these issues to their team, Support Operations, and the service team. They may automate manual tasks or create tools that improve SE-E productivity. They continue to learn new and emerging technologies and help train other SEs through new-hire training, mentoring, service launch planning, and other knowledge-sharing events. They create instructive scenarios and internal documentation for the team. They may use the experiences of customers to drive new features and/or improvements for Amazon Services Support products (e.g., Infrastructure Event Management). They play a large role in hiring new talent and mentoring Support Engineers throughout the organization.

Promotion assessment

Austin joined AWS Network Devices in January 2023 as a Cloud Support Associate after taking part in the Cloud Support Associate Intern program in the summer of 2022. He ramped up on all Network Devices services and transitioned to CSE I in August 2023 after completing his A2P. In the last twelve months, Austin demonstrated exceptional performance, achieving 517 resolves against a goal of 403 while maintaining a CCR of 94.59% against a goal of 92% and a DRE of 98.34% against a goal of 95%. His technical excellence and customer obsession were recognized through multiple achievements, including being awarded the Wise Guru award three times in 2024 and achieving top-resolver status for severity 5 Direct Connect cases in Q1 2024. Austin obtained his VPN SME accreditation in February 2025 and achieved these milestones while contributing to Profile Roles, mentoring, hiring processes, VPN Support Operations, content development, and various internal sprint projects for the VPN Service. Austin has also passed the AWS Cloud Practitioner and Solutions Architect - Associate certifications.

Customer Impact

Customer: FDA CFSAN
In January 2024, Austin led an engagement with FDA and their partner CFSAN addressing critical MACsec connectivity issues affecting their Direct Connect infrastructure. When Layer 2 connections failed to establish despite functional Layer 1 connectivity, Austin faced initial resistance from the Service Team, who found no apparent issues. Demonstrating Disagree and Commit, he persisted with detailed evidence, including packet captures and logs, that proved the existence of a systemic problem affecting multiple customers globally.

Through troubleshooting and coordination between the Direct Connect service team and Cisco, Austin helped uncover a previously unidentified firmware bug in the VC-CAS devices that prevented Layer 2 MACsec sessions from establishing under specific scenarios. His thorough investigation revealed this wasn't an isolated incident but a global issue affecting Direct Connect Points of Presence worldwide, potentially impacting hundreds of customers using MACsec-enabled connections. Austin set clear expectations with both the customer and the Service Team, maintaining transparent communication about investigation progress while firmly advocating for deeper technical investigation. His persistence and detailed technical evidence led the Service Team to acknowledge the broader implications of the issue.

Working collaboratively with Solutions Architects, he coordinated between the AWS DX Service Team and Cisco to develop a comprehensive solution. This resulted in planning and executing an emergency maintenance window for critical infrastructure upgrades, which not only resolved FDA & CFSAN's immediate issues but led to a global rollout of fixes across all Direct Connect Points of Presence. The resolution restored FDA & CFSAN's operations from a critical red status to green, while also preventing potential service degradation for other customers worldwide using MACsec-enabled connections.

Customer: United States Department of Agriculture
Austin's ability to combine deep technical knowledge with effective stakeholder management allows him to deliver high-impact results and drive rapid resolution of intricate cases that his peers lack the technical depth to resolve. This is evidenced in a SEV1 case Austin led in November and December 2024 involving critical performance degradation on a Direct Connect connection between Lumen and AWS Transit Gateway. The customer was experiencing significant business impact due to limited bandwidth performance (15-20 Mbps) when pushing data from Azure Gov East to the AWS commercial us-east-1 region through their Direct Connect connection.

Austin's packet capture analysis revealed MTU mismatches between AWS Transit Gateway and Lumen's provider device. He confirmed this by systematically analyzing infrastructure metrics, including light levels, link utilization, and error counters, then examined AWS Transit Gateway flow logs and conducted network packet captures at both endpoints. While previous engineers had been unable to identify the root cause, Austin noticed subtle Maximum Transmission Unit (MTU) differences in the packet captures. This discovery led him to a crucial hypothesis: a Quality of Service (QoS) policy was active on the Lumen provider device. His precise investigation, which never relied on assumptions, identified the misconfigured QoS policy on Lumen's device as the root cause. What distinguished Austin's handling of this case was his ability to communicate his technical findings effectively across multiple stakeholders, including the Microsoft Azure ExpressRoute and Lumen Backbone engineering teams.
His identification of the QoS policy as the root cause accelerated the resolution process significantly, as Lumen was able to directly address the issue by removing the policy, immediately restoring normal performance levels. The impact of Austin's work was substantial: he helped restore critical infrastructure performance for a customer operating across multiple cloud providers. The impact realized by the customer was particularly significant, as 95% of their global workloads were affected by this performance issue. The customer's appreciation for Austin's efforts was evident in their feedback: "We really appreciate your support on this. We will be having our 2PM Central call to discuss these findings. You are more than welcome to attend, but at this time we feel that the root cause has been discovered and resolved. - and again we appreciate all of the support." This response not only highlights the customer's satisfaction but also underscores the comprehensive nature of Austin's solution, which resolved the issue without requiring further discussion or troubleshooting.

Technical Proficiency

Point 72 Asset Management
 Austin led the resolution of a critical Severity 5 case for Point 72 Asset Management, addressing complex network connectivity issues affecting their AWS MWAA environment. The customer experienced significant business impact during peak hours (11 AM to 5 PM EST), with 50% of MWAA jobs failing, causing daily financial and backup reports to fail, severely hampering day-to-day operations. Austin orchestrated a comprehensive technical analysis using traffic mirroring, MTR, Traceroute, and strategic packet captures. His investigation revealed routing anomalies between the customer's Stamford and Orangeburg sites, with both Orangeburg Direct Connect links operating at maximum capacity (9.8 Gbps each). Using NetVane and other internal tools, he identified that backup jobs during business hours were causing network saturation, while inefficient ECMP routing configurations were preventing proper load distribution between their Direct Connect locations. Notably, Austin set expectations with the customer's CIO, who initially rejected the possibility of a configuration issue on their third-party device. Through persistent, data-driven analysis of packet captures and routing tables, Austin demonstrated that the incorrect ECMP routing configuration was causing traffic distribution issues between their Stamford and Orangeburg sites, leading to the network saturation during peak hours. Austin developed and implemented a solution combining immediate relief by collaborating with the customer's storage team to limit backup server threads and optimize backup scheduling, while also providing technical guidance on AS-PATH prepending for optimized traffic routing between sites. He worked directly with the customer's engineering team, who had never encountered such an ambiguous peak-hour issue before, educating them on network monitoring and troubleshooting. 
The comprehensive solution reduced link utilization from 100% to 20-40% across both Orangeburg connections, restoring stability to their business-critical financial reporting systems and establishing long-term network reliability. His thorough knowledge transfer ensured the customer's team could prevent similar issues in the future.

Anthem, Inc
Austin resolved an E2M escalation with VP-level customer visibility for Anthem Inc, where their us-east-1 production environment experienced severe packet loss during peak business hours. In his investigation, Austin analyzed two Direct Connect connections showing 12 Gbps traffic peaks and identified Transit Gateway FREP packet failures using MTR, tcpdump packet captures, and traceroute testing across Availability Zones. His technical evidence, built on TCP retransmission patterns and MTR reports showing latency spikes, led him to initiate a targeted escalation with the TGW service team. Austin pinpointed the problem in the TGW architecture at the TOP level, identified that the service failed to scale, and directed the service team to investigate this component. This focus led to resolution when Austin instructed the TGW service team to scale up the customer's TGW to handle the elevated workload during business peak hours. He documented the analysis, correlating data from router statistics, light level measurements, and error counts, and conducted a knowledge transfer through documentation to ensure the customer's team could prevent similar issues. The reconfiguration of the Transit Gateway scaling engine based on traffic patterns restored business operations for this Enterprise Support customer and prevented future service disruptions. He ensured that the code changes made to the TGW scaling engine were integrated into the global deployment pipeline to protect customers in all regions from similar issues.

Team Impact

Mentoring:
Austin completed 37 hiring activities as a CSE-I (L4), conducting 12 phone screens, 7 loop interviews, and 9 reverse-shadowing sessions to coach new interviewers. He mentored 7 team members, including 4 CSAs and 3 new CSE-Is who are meeting performance expectations for their current roles. One notable achievement was helping a CSA convert to CSE-I in just 9 months. His mentorship work included reviewing 401 correspondences, guiding engineers through pspo. Austin was the top PSAP contributor within the team, resolving 13 PSAP tickets across VPN, TGW, CloudWAN, Network Manager, and Direct Connect. His expertise in Network Devices enabled him to provide cross-site mentorship to the PNW, IAD, MEX, and DFW regions, offering technical guidance on VPN, Direct Connect, TGW, Client VPN, Outposts, and CloudWAN cases.

Contributions to Training & Development:
After identifying a gap during his own onboarding, where VPN troubleshooting content was delivered primarily through KNETs, Austin took the initiative to develop a "VPN Troubleshooting Tips and Tricks" wiki that serves as a quick-reference guide for engineers supporting customer VPN issues in real time. The document, which has been visited by over 300 employees and has garnered over 1,000 total page views, streamlines VPN troubleshooting through five strategic sections: VPN Troubleshooting Methodology Tips, Dante VPN Troubleshooting Tool Usage, CloudWatch VPN Tunnel State Optimization, Public Health Dashboard Analysis, and Specialized Tools Integration. The guide particularly benefits low-tenure engineers by providing easily accessible, structured guidance during customer interactions. Austin maintains and regularly updates this living document to reflect service updates and emerging troubleshooting techniques, enabling faster issue resolution and an improved customer experience for AWS Site-to-Site VPN connectivity challenges.

Operational Excellence

Service Team Engagement
Austin worked closely with the VPN service team to tackle a significant challenge involving VPN tunnel troubleshooting and CloudWatch logging, a critical issue affecting the 40,600 customers annually who seek AWS Support assistance. To qualify to lead this initiative to enhance VPN logging capabilities, Austin successfully completed both a Python assessment and a proctored live coding test covering Java and Scala programming knowledge, administered by the service team. His implementation will empower customers with self-diagnostic tools, enabling them to independently identify and resolve VPN-related issues. This solution is expected to achieve […] in VPN support cases, […] S-team's Project X goal of achieving a […] target by 2025.

Dante-VPN Tool
Austin served as a critical beta tester for the Dante-VPN tool, where his thorough testing identified and resolved numerous bugs and system inefficiencies. His contributions were fundamental in shaping the tool's development, and it has since become the primary VPN tooling solution for AWS Support Engineering operations. The tool's significance is evident in its extensive adoption: it now supports 6,500 cases and is utilized by over 850 engineers across multiple AWS teams, including 596 Support Engineers, 197 Enterprise Support Engineers, 42 AWS Managed Cloud Engineers, and 22 AWS ADC Engineers. Austin continues to actively maintain and enhance the tool to ensure its optimal performance across these teams.

WX Telemetry Log Copier Greasemonkey Script
Austin identified that Site-to-Site VPN troubleshooting requires time-consuming manual collection of telemetry logs, which impacted case resolution and engineer handoffs. To address this, he developed the WX Telemetry Historical Logs Link Copier using Greasemonkey, implementing two primary functions, "Copy Data (CSE Essentials)" and "Copy All Data (VPN SO)", with automated extraction and formatting of Log Report information, VGW/CGW IP addresses, and clickable Log Report URLs. The tool reduced log collection time across the team's total case volume and eliminated the need to re-pull historical logs during engineer case handover. By enabling persistent access to telemetry data for Support Operations, this solution streamlined VPN troubleshooting processes, demonstrating proactive problem-solving that delivered measurable efficiency improvements to the team's daily operations.
# /etc/pve/lxc/102.conf additions for NVIDIA GPU passthrough into the container.
# Allow access to the NVIDIA character devices by major number
# (on this host: 195 = nvidia/nvidiactl, 507 = nvidia-uvm, 511 = nvidia-caps)
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 507:* rwm
lxc.cgroup2.devices.allow: c 511:* rwm
# Bind-mount the host device nodes into the container; "optional" lets the
# container start even when a node is absent
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-caps/nvidia-cap1 dev/nvidia-caps/nvidia-cap1 none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-caps/nvidia-cap2 dev/nvidia-caps/nvidia-cap2 none bind,optional,create=file



----

root@pm3:~# nano /etc/pve/lxc/102.conf
root@pm3:~# ls -l /dev/nvid*
crw-rw-rw- 1 root root 195,   0 Dec  8 12:43 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Dec  8 12:43 /dev/nvidiactl
crw-rw-rw- 1 root root 507,   0 Dec  8 12:43 /dev/nvidia-uvm
crw-rw-rw- 1 root root 507,   1 Dec  8 12:43 /dev/nvidia-uvm-tools

/dev/nvidia-caps:
total 0
cr-------- 1 root root 511, 1 Dec  8 12:43 nvidia-cap1
cr--r--r-- 1 root root 511, 2 Dec  8 12:43 nvidia-cap2
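
The `lxc.cgroup2.devices.allow` lines in the container config must match the major numbers shown in this `ls -l` output (195, 507, and 511 here), and those numbers can vary between hosts and driver versions. A minimal sketch, assuming GNU coreutils `stat` on the Proxmox host, that derives the allow lines instead of hard-coding them:

```shell
# Emit one lxc.cgroup2.devices.allow line per distinct NVIDIA device major.
# stat -c '%t' prints a character device's major number in hex; printf
# converts the 0x-prefixed value to decimal. sort -u collapses duplicates
# (e.g. nvidia0 and nvidiactl share major 195).
for dev in /dev/nvidia0 /dev/nvidiactl /dev/nvidia-modeset \
           /dev/nvidia-uvm /dev/nvidia-uvm-tools \
           /dev/nvidia-caps/nvidia-cap1 /dev/nvidia-caps/nvidia-cap2; do
    [ -e "$dev" ] || continue
    printf 'lxc.cgroup2.devices.allow: c %d:* rwm\n' "0x$(stat -c '%t' "$dev")"
done | sort -u
```

On the host above this would print the three allow lines for majors 195, 507, and 511, ready to append to the container config.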



# Inside the container: install NVIDIA's container toolkit from NVIDIA's
# apt repository (commands follow NVIDIA's published install steps)
apt install gpg curl
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
apt update
apt install nvidia-container-toolkit
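
The long `curl | sed | tee` one-liner above rewrites each `deb https://` entry in NVIDIA's repository list so apt trusts only the dearmored keyring it just installed. The transform can be seen in isolation on a sample line (hypothetical repo URL, not the live list file):

```shell
# Demonstrate the sed rewrite from the install one-liner on a sample
# sources.list line; single quotes keep $(ARCH) from being expanded.
echo 'deb https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /' |
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g'
```

After `apt update` and the toolkit install, running `nvidia-smi` inside the container should list the GPU, provided the container's user-space driver version matches the host's kernel module.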
