Senior HPC System Administrator
The University of Chicago is seeking a highly qualified Senior HPC System Administrator to join the systems and operations team that builds and manages RCC HPC systems and facility operations. The individual in this position will be involved in the procurement and management of HPC hardware and software. This is a hybrid position requiring 3 days onsite.
Responsibilities include:
- Installing, configuring, and maintaining large computer clusters/servers and software.
- Handling day-to-day operations of the systems, including systems administration, monitoring, and storage performance, up to and including network components.
- Configuring the scheduling and queuing system.
- Diagnosing and resolving system operational problems quickly and effectively.
- Using scripting/programming skills to enable system-level automation, problem detection, security maintenance, and patch management.
- Building and deploying open-source software and software from vendors/partners.
- Providing reliable and efficient backups/restores for all managed systems.
- Documenting system administration procedures for routine and complex tasks.
- Maintaining and monitoring the security of the HPC systems and servers.
- Planning and installing necessary patches and upgrades for servers and their associated storage, network, communications, and peripheral sub-systems.
- Tracking compliance and maintaining documentation for hardware, software, and service inventories for management reports.
- Performing other related work as needed.
Minimum qualifications include a college or university degree in a related field and 5-7 years of work experience in a related job discipline. Preferred qualifications include a master's degree in Computer Science or a closely related field, full-time Linux system administration experience in a large distributed computing environment, and experience with various technical skills and knowledge.