Austin Leath Final Promo Doc Revision
Scope of role
The L5 Cloud Support Engineer works independently, troubleshooting enterprise technical support cases of all severities. They help clarify the customer's need, determine whether there is a problem, evaluate impacts or technical risks, and manage customer expectations for resolution appropriately. They know when to escalate critical and complex issues, and they propose workarounds during times of crisis to prevent interruptions to customers' productivity. TAMs, ProServe, and Solutions Architects may reach out to them directly to assist on customer cases.
They are a Subject Matter Expert in one or more services (or have similar accreditation) and handle escalations related to those services. They have a high-level understanding of architecture, operational parameters, and troubleshooting techniques. They can assess when the right action is to replicate workloads to best serve or guide a customer, and can advise customers on the risks and opportunities of various implementations so that the customer can make the right trade-offs. They help increase the usage of Amazon-published service support tools and add to the materials available to enable case deflection and help customers adopt best practices.
L5 Support Engineers identify repetitive or serious problems and communicate these issues to their team, Support Operations, and the service team. They may automate manual tasks or create tools that improve SE-E productivity. They continue to learn new and emerging technologies and help train other SEs through new-hire training, mentoring, service launch planning, and other knowledge-sharing events. They create instructive scenarios and internal documentation for the team. They may use the experiences of customers to drive new features and/or improvements for Amazon Services Support products (e.g., Infrastructure Event Management). They play a large role in hiring new talent and mentoring Support Engineers throughout the organization.
Promotion assessment
Austin joined AWS Network Devices in January of 2023 as a Cloud Support Associate after taking part in the Cloud Support Associate Intern program in the summer of 2022. Austin ramped up on all Network Devices services and transitioned to CSE I in August of 2023 after completing his A2P. In the last twelve months, Austin demonstrated exceptional performance by achieving 517 resolves against a goal of 403, maintaining a positive CCR percentage of 94.59% against a goal of 92% and a DRE percentage of 98.34% against a goal of 95%. His technical excellence and customer obsession were recognized through multiple achievements, including being awarded the Wise Guru award three times in 2024 and achieving top-resolver status for severity 5 Direct Connect cases in Q1 2024. Austin obtained his VPN SME accreditation in February of 2025 and achieved these milestones while being involved in Profile Roles, mentoring, hiring processes, VPN Support Operations, content development, and various internal sprint projects for the VPN service. Austin has also earned the AWS Cloud Practitioner and Solutions Architect – Associate certifications.
Customer Impact
Customer: FDA CFSAN
In January 2024, Austin led an engagement with FDA and their partner CFSAN, addressing critical MACsec connectivity issues affecting their Direct Connect infrastructure. When Layer 2 connections failed to establish despite functional Layer 1 connectivity, Austin faced initial resistance from the Service Team, who found no apparent issues. Demonstrating Disagree and Commit, he persisted with detailed evidence, including packet captures and logs, that proved the existence of a systemic problem affecting multiple customers globally. Through troubleshooting and coordination between the Direct Connect service team and Cisco, Austin helped uncover a previously unidentified firmware bug in the VC-CAS devices that prevented Layer 2 MACsec sessions from establishing under specific scenarios. His thorough investigation revealed this wasn't an isolated incident but a global issue affecting Direct Connect Points of Presence worldwide, potentially impacting hundreds of customers using MACsec-enabled connections.
Austin set clear expectations with both the customer and the Service Team, maintaining transparent communication about investigation progress while firmly advocating for deeper technical investigation. His persistence and detailed technical evidence led the Service Team to acknowledge the broader implications of the issue. Working collaboratively with Solutions Architects, he coordinated between the AWS DX Service Team and Cisco to develop a comprehensive solution. This resulted in planning and executing an emergency maintenance window for critical infrastructure upgrades, which not only resolved FDA & CFSAN's immediate issues but led to a global rollout of fixes across all Direct Connect Points of Presence. The resolution restored FDA & CFSAN's operations from a critical red status to green, while also preventing potential service degradation for other customers worldwide using MACsec-enabled connections.
Customer: United States Department of Agriculture
Austin’s ability to combine deep technical knowledge with effective stakeholder management allows him to deliver high-impact results and drive rapid resolution of intricate cases that his peers do not have the technical depth to resolve. This is evidenced by a SEV1 case Austin led in November and December 2024 involving critical performance degradation on a Direct Connect connection between Lumen and AWS Transit Gateway. The customer was experiencing significant business impact due to limited bandwidth (15-20 Mbps) when pushing data from Azure Gov East to the AWS commercial us-east-1 region through their Direct Connect connection. While previous engineers had been unable to identify the root cause, Austin noticed subtle Maximum Transmission Unit (MTU) differences in network packet captures he conducted at both endpoints. He confirmed the finding by systematically analyzing infrastructure metrics, including light levels, link utilization, and error counters, and by examining AWS Transit Gateway flow logs. This led him to a crucial hypothesis: a Quality of Service (QoS) policy was active on the Lumen provider device, and that misconfigured policy was the root cause. Austin's approach pinpointed the issue without relying on assumptions, demonstrating effective troubleshooting skills, and this insight proved to be the key factor in finally resolving the issue. What distinguished Austin's handling of this case was his ability to communicate his technical findings effectively across multiple stakeholders, including the Microsoft Azure ExpressRoute and Lumen Backbone engineering teams.
His identification of the QoS policy as the root cause accelerated the resolution process significantly: Lumen was able to address the issue directly by removing the policy, immediately restoring normal performance levels. The impact of Austin's work was substantial. He helped restore critical infrastructure performance for a customer operating across multiple cloud providers, and the impact realized by the customer was particularly significant, as 95% of their global workloads were affected by this performance issue. The customer's appreciation for Austin's efforts was evident in their feedback: "We really appreciate your support on this. We will be having our 2PM Central call to discuss these findings. You are more than welcome to attend, but at this time we feel that the root cause has been discovered and resolved. - and again we appreciate all of the support." This response not only highlights the customer's satisfaction but also underscores the comprehensive nature of Austin's solution, which resolved the issue without requiring further discussion or troubleshooting.
Technical Proficiency
Point72 Asset Management
Austin led the resolution of a critical Severity 5 case for Point72 Asset Management, addressing complex network connectivity issues affecting their AWS MWAA environment. The customer experienced significant business impact during peak hours (11 AM to 5 PM EST), with 50% of MWAA jobs failing, causing daily financial and backup reports to fail and severely hampering day-to-day operations. Austin orchestrated a comprehensive technical analysis using traffic mirroring, MTR, traceroute, and strategic packet captures. His investigation revealed routing anomalies between the customer's Stamford and Orangeburg sites, with both Orangeburg Direct Connect links operating at maximum capacity (9.8 Gbps each). Using NetVane and other internal tools, he identified that backup jobs during business hours were causing network saturation, while inefficient ECMP routing configurations were preventing proper load distribution between their Direct Connect locations. Notably, Austin set expectations with the customer's CIO, who initially rejected the possibility of a configuration issue on their third-party device. Through persistent, data-driven analysis of packet captures and routing tables, Austin demonstrated that the incorrect ECMP routing configuration was causing traffic distribution issues between the Stamford and Orangeburg sites, leading to network saturation during peak hours. Austin developed and implemented a solution that delivered immediate relief by collaborating with the customer's storage team to limit backup server threads and optimize backup scheduling, while also providing technical guidance on AS-PATH prepending for optimized traffic routing between sites. He worked directly with the customer's engineering team, who had never encountered such an ambiguous peak-hour issue before, educating them on network monitoring and troubleshooting.
The comprehensive solution reduced link utilization from 100% to 20-40% across both Orangeburg connections, restoring stability to their business-critical financial reporting systems and establishing long-term network reliability. His thorough knowledge transfer ensured the customer's team could prevent similar issues in the future.
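The ECMP behavior at the heart of this case can be sketched briefly. Routers typically hash a flow's 5-tuple to pick one of several equal-cost links, so every packet of a given flow takes the same path; a small number of very heavy flows (such as backup jobs) can therefore saturate one link while the other sits underutilized. A minimal illustration, with Python standing in for router logic and all names hypothetical:

```python
import hashlib

def ecmp_pick_link(flow, links):
    """Pick a link for a flow by hashing its 5-tuple. Deterministic,
    like real ECMP: all packets of one flow take the same path."""
    key = "|".join(str(f) for f in flow).encode()
    digest = int(hashlib.sha256(key).hexdigest(), 16)
    return links[digest % len(links)]

# Two equal-cost links, loosely modeled on the case (names invented).
links = ["orangeburg-dx-1", "orangeburg-dx-2"]

# Many distinct small flows spread roughly evenly across both links...
flows = [("10.0.0." + str(i), "172.16.0.1", 40000 + i, 443, "tcp")
         for i in range(1000)]
counts = {link: 0 for link in links}
for f in flows:
    counts[ecmp_pick_link(f, links)] += 1

# ...but a single high-volume backup flow always hashes to one link,
# which is how a few large transfers can pin one Direct Connect link
# at 100% while the other stays idle.
```

This is also why per-flow techniques like AS-PATH prepending (steering which site's routes are preferred) can rebalance traffic when hashing alone cannot.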
Anthem, Inc.
Austin resolved an E2M escalation with VP-level customer visibility for Anthem, Inc., whose us-east-1 production environment experienced severe packet loss during peak business hours. Austin analyzed two Direct Connect connections showing 12 Gbps traffic peaks and identified Transit Gateway FREP packet failures using MTR, tcpdump packet captures, and traceroute testing across Availability Zones. His technical evidence, including TCP retransmission patterns and MTR reports showing latency spikes, led him to initiate a targeted escalation with the TGW service team. Austin pinpointed the problem in the TGW architecture at the TOP level, identified that the service was failing to scale, and directed the service team to investigate this component. This focus led to resolution when Austin instructed the TGW service team to scale up the customer's TGW to handle the elevated workload during peak business hours. He documented the analysis, correlating data from router statistics, light level measurements, and error counts, and conducted a knowledge transfer through documentation to ensure the customer's team could prevent similar issues. The reconfiguration of the Transit Gateway scaling engine based on traffic patterns restored business operations for this Enterprise Support customer and prevented future service disruptions. He ensured that the code changes made to the TGW scaling engine were integrated into the global deployment pipeline to protect customers in all regions from similar issues.
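The retransmission evidence described above can be quantified from a parsed capture. The following is only a sketch of the idea, with an invented field name rather than the actual analysis scripts used in the case:

```python
def retransmission_rate(packets):
    """Compute the TCP retransmission rate from a parsed capture.

    packets: list of dicts with a boolean 'retransmission' flag, as a
    capture analysis tool might export (field name is illustrative).
    """
    if not packets:
        return 0.0
    retrans = sum(1 for p in packets if p.get("retransmission"))
    return retrans / len(packets)

# A rate persistently elevated during peak hours, correlated with
# latency spikes in MTR reports, is the kind of evidence that supports
# a targeted capacity/scaling escalation, as in this case.
```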
Team Impact
Mentoring:
Austin completed 37 hiring activities as a CSE-I (L4), conducting 12 phone screens, 7 loop interviews, and 9 reverse-shadowing sessions to coach new interviewers. He mentored 7 team members, including 4 CSAs and 3 new CSE-Is, all of whom are meeting performance expectations for their current roles. One notable achievement was helping a CSA convert to CSE-I in just 9 months. His mentorship work included reviewing 401 correspondences and guiding engineers through PSPO. Austin was the team's top contributor to PSAP, resolving 13 PSAP tickets across VPN, TGW, CloudWAN, Network Manager, and Direct Connect. His expertise in Network Devices enabled him to provide cross-site mentorship to the PNW, IAD, MEX, and DFW regions, offering technical guidance on VPN, Direct Connect, TGW, Client VPN, Outposts, and CloudWAN cases.
Contributions to Training & Development:
After identifying a gap during his own onboarding, where VPN troubleshooting content was primarily delivered through KNETs, Austin took initiative by developing a "VPN Troubleshooting Tips and Tricks" wiki that serves as a quick reference guide for engineers supporting customer VPN issues in real time. The document, which has been visited by over 300 employees and has garnered over 1,000 total page views, streamlines VPN troubleshooting through five strategic sections: VPN Troubleshooting Methodology Tips, Dante VPN Troubleshooting Tool Usage, CloudWatch VPN Tunnel State Optimization, Public Health Dashboard Analysis, and Specialized Tools Integration. The guide particularly benefits low-tenure engineers by providing easily accessible, structured guidance during customer interactions. Austin maintains and regularly updates this living document to reflect service updates and emerging troubleshooting techniques, enabling faster issue resolution and an improved customer experience for AWS Site-to-Site VPN connectivity challenges.
Operational Excellence
Service Team Engagement
Austin worked closely with the VPN service team to tackle a significant challenge involving VPN tunnel troubleshooting and CloudWatch logging, a critical issue affecting the 40,600 customers annually who seek AWS Support assistance. To qualify to lead this important initiative to enhance VPN logging capabilities, Austin successfully completed both a Python assessment and a proctored live coding test covering Java and Scala, administered by the service team. His implementation will empower customers with self-diagnostic tools, enabling them to independently identify and resolve VPN-related issues. This solution is expected to advance the S-team's Project X goal of achieving its target for VPN support cases by 2025.
Dante-VPN Tool
Austin served as a critical beta tester for the Dante-VPN tool, where his thorough testing identified and resolved numerous bugs and system inefficiencies. His contributions were fundamental in shaping the tool's development, and it has since become the primary VPN tooling solution for AWS Support Engineering operations. The tool's significance is evident in its extensive adoption: it now supports 6,500 cases and is used by over 850 engineers across multiple AWS teams, including 596 Support Engineers, 197 Enterprise Support Engineers, 42 AWS Managed Cloud Engineers, and 22 AWS ADC Engineers. Austin continues to actively maintain and enhance the tool to ensure its optimal performance across these teams.
WX Telemetry Log Copier Greasemonkey Script
Austin identified that Site-to-Site VPN troubleshooting required time-consuming manual collection of telemetry logs, which impacted case resolution and engineer handoffs. To address this, he developed the WX Telemetry Historical Logs Link Copier using Greasemonkey, implementing two primary functions, "Copy Data (CSE Essentials)" and "Copy All Data (VPN SO)", with automated extraction and formatting of Log Report information, VGW/CGW IP addresses, and clickable Log Report URLs. The tool reduced log collection time across the team's total case volume and eliminated the need to re-pull historical logs during engineer case handovers. By enabling persistent access to telemetry data for Support Operations, this solution streamlined VPN troubleshooting processes, demonstrating proactive problem-solving that delivered measurable efficiency improvements to the team's daily operations.
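The copy-and-format step at the core of such a script can be approximated as follows. This is a Python sketch of the extraction logic only, with invented patterns and labels; the actual tool runs as a Greasemonkey userscript in the browser:

```python
import re

def extract_telemetry_summary(page_text):
    """Pull IPv4 addresses (e.g. VGW/CGW endpoints) and log-report URLs
    out of raw page text and format them for pasting into case
    correspondence. Labels and patterns are illustrative."""
    ips = re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", page_text)
    urls = re.findall(r"https?://\S+", page_text)
    lines = ["IPs found: " + ", ".join(ips)]
    lines += ["Log report: " + u for u in urls]
    return "\n".join(lines)
```

Copying a pre-formatted summary like this in one click, rather than hand-collecting each field, is what removes the repeated manual log pulls during case handovers.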