Staff AI Infrastructure Site Reliability Engineer

CNY 15,000 to 20,000 per month

Full-time
5 to 10 years of experience
Shenzhen
Job Responsibilities
- Architect and lead the development of scalable, secure AI infrastructure on cloud-native platforms to support autonomous driving technologies
- Collaborate closely with ML teams to facilitate seamless integration and optimal performance of AI algorithms
- Identify and address system bottlenecks and instabilities, applying innovative solutions to enhance system reliability and efficiency
- Foster technological advancements through research and implementation of state-of-the-art AI tools and methodologies
- Act as a key technical leader and mentor, promoting a culture of technical excellence and collaborative innovation within the AI infrastructure team
Job Requirements
Minimum Skill Requirements:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field
- 5+ years of experience designing, deploying, and managing GPU clusters for high-performance computing in AI applications, particularly within cloud environments
- Proficiency in cloud services (AWS, Azure, ALI Cloud) and in building containerized applications using Kubernetes and Docker
- Strong programming skills in Python and Golang, and experience with AI/ML frameworks (TensorFlow, PyTorch)

Preferred Skill Requirements:
- Expertise in designing and managing high-availability, high-throughput systems that support machine learning and deep learning workloads
- Demonstrable leadership skills with a track record of mentoring and leading technical teams
- In-depth understanding of data structures, algorithms, and software engineering principles relevant to AI and autonomous systems