One of the world's top algorithmic trading firms, our client is looking for GPU Systems Engineers to help scale and evolve their exceptionally sophisticated HPC/AI research environment. Joining the Research and Development team, you will collaborate with experts responsible for the compute, storage, operating systems, and automation tools that enable trading and research to run 24/7 across the globe. They design, grow, and operate infrastructure at a large scale, including triple-digit petabyte-scale storage and massive CPU and GPU clusters in globally distributed data centers. As such, this is a high-impact role with broad scope, from HPC/AI cluster design and performance tuning, to troubleshooting and automation for thousands of nodes.
Responsibilities
- Design, build, and optimize large-scale distributed GPU compute clusters
- Identify and resolve performance bottlenecks in GPU workloads across compute, storage, and networking layers
- Collaborate with research and development teams to profile, benchmark, and fine-tune GPU-based workloads
- Automate system deployment, monitoring, and troubleshooting across thousands of nodes
- Collaborate with research and engineering teams to support evolving workloads
- Own critical infrastructure projects from concept through implementation and support
- Test and deploy new hardware and software, and partner with vendors to resolve complex issues
Qualifications
- 5+ years of experience in large-scale Linux systems engineering in HPC, AI or distributed infrastructure roles
- Extensive experience in Linux system installation, performance tuning, and troubleshooting
- Expertise in troubleshooting distributed GPU workloads
- Deep knowledge of GPU optimization and performance
- Proficiency in Python scripting and automation frameworks
- CUDA or C/C++ experience is a plus
- Experience with NVIDIA technologies beyond CUDA, such as NCCL, GPUDirect RDMA, and NVLink
- Familiarity with configuration management tools (e.g. Salt, Ansible, Puppet, Chef)
- Comfortable diagnosing complex system issues at the hardware, OS, and network levels
- Strong communication and organizational skills; able to collaborate across diverse technical teams
- Thrive in fast-paced environments and excited by high-impact work
Culture
This fund brings a scientific approach to trading financial products. They've built one of the world's most sophisticated computing environments for research and development, and their researchers are at the forefront of innovation in the world of algorithmic trading. Colleagues come from all sorts of backgrounds: mathematics, computer science, statistics, physics, and engineering. They are a community of self-starters motivated by the excitement of being at the cutting edge of automated trading, and their culture celebrates great ideas whether they come from veterans or new hires. Seem like something you might be interested in?
The goal is to find the best people and bring them together to do great work in a place where everyone is valued. They're proud of their diverse staff; with offices all over the globe, they benefit from varied and unique perspectives. This is an equal opportunity employer, so whoever you are, they'd love to get to know you.
Whilst we carefully review all applications to all jobs, the high volume we receive means it is not possible to respond to those who have not been successful.
Contact
If this sounds like you, or you'd like more information, please get in touch:
George Hutchinson-Binks
(+44)
in/george-hutchinson-binks-a62a69252