Staff Engineer, Machine Learning Operations

NextGen
Full-time
Remote
India

Job Description:

The Staff Engineer, Machine Learning Operations will provide technical leadership for our AI platform and define architecture and standards for training, evaluation, and high-scale, low-latency inference of models and AI agents. This role is responsible for developing and implementing strategy for CI/CD, governance, and reliability across multiple AI models, partnering with security, compliance, and leadership to deliver resilient, cost-effective AI. Aside from these core responsibilities, Machine Learning Operations Engineers also share responsibilities with other engineering functions.

  • Establish the technical vision for end-to-end ML-AIOps (from data to model/agent to product integration).
  • Design and evolve multi-region, multi-tenant inference/training platforms with strong isolation.
  • Design and implement a CI/CD strategy for models, agents, and data pipelines (policy gates, canary/rollbacks, approvals).
  • Institutionalize model/agent monitoring (quality, safety, drift) and business KPIs; sponsor continuous evaluations.
  • Lead major reliability programs (capacity planning, disaster recovery, chaos testing, incident management).
  • Establish and implement governance methodologies for datasets, prompts, models, and agents (lineage, approvals, etc.).
  • Collaborate on security architecture with security teams (zero-trust, key management, vaults, secrets rotation, audit).
  • Evaluate and integrate platforms/vendors; influence build-vs-buy; manage technical debt and roadmap.
  • Mentor other engineers and help prioritize their work; build a culture of documentation, runbooks, and post-incident learning.
  • Perform other duties that support the overall objective of the position.

Education Required:

  • Bachelor’s degree in Computer Science, Information Technology, Electronics/Electrical Engineering, or a related field.
  • Or, any combination of education and experience which would provide the required qualifications for the position.

Experience Required:

  • 5-8 years of hands-on experience in MLOps, DevOps, or related roles operating an AI/ML platform at scale, with 10-12+ years of overall IT experience.
  • Experience with IaC using Terraform at organizational scale, and strong experience in Unix-based environments.
  • Expertise in containerization and orchestration (Docker/Kubernetes) and cloud, including networking, security, and autoscaling.
  • Strong AWS experience is expected.
  • Experience building CI/CD pipelines using tools like Bitbucket Pipelines, AWS CodePipeline, or similar.
  • Experience with mature observability stacks (e.g., Datadog/Dynatrace). Experience with LLM observability frameworks is a plus.
  • Deep experience with operationalizing ML/AI models. Experience with LLMs or AI agents is a plus.

Knowledge, Skills & Abilities:

  • Knowledge of: Familiarity with database technologies and data pipelines (data lakes, lakehouses, warehouses, NoSQL, ETL/ELT processes). Solid understanding of model monitoring, logging, and debugging tools. Strong command of platform SRE practices and cost governance. Familiarity with feature stores, lakehouse patterns, distributed computing systems (Spark), and model versioning systems (MLflow).
  • Skill in: Strong problem-solving skills and a detail-oriented mindset. Excellent communication skills.
  • Ability to: Collaborate effectively, maintain a clear view of complete systems, and understand and work on different components as required.

The company has reviewed this job description to ensure that essential functions and basic duties have been included. It is intended to provide guidelines for job expectations and the employee's ability to perform the position described. It is not intended to be construed as an exhaustive list of all functions, responsibilities, skills and abilities. Additional functions and requirements may be assigned by supervisors as deemed appropriate. This document does not represent a contract of employment, and the company reserves the right to change this job description and/or assign tasks for the employee to perform, as the company may deem appropriate.

NextGen Healthcare is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.