Essential Job Functions
Own end-to-end delivery of domain datasets (intake → requirements → design → build → test → deploy → monitor) with defined SLAs / SLOs.
Translate ambiguous asks into productized, reusable datasets with clear interfaces and lifecycle plans (versioning, deprecation).
Design robust models (staging / core / marts / semantic) and implement scalable ELT / CDC / streaming patterns.
Engineer idempotent, parameterized, and modular pipelines that support schema evolution and backfills / replays safely.
Build datasets and services that power LLMs: RAG pipelines (chunking / embedding / vector search), prompt / response and feedback logs, grounding / attribution metadata, redaction / PII scrubbing, and evaluation sets for hallucination, faithfulness, and relevance.
Author and maintain data contracts (grain, fields, quality thresholds, SLAs / SLIs) and uphold approved metric / semantic definitions.
Ensure ownership, stewardship, lineage, and sensitivity classifications are complete and current in the data catalog.
Implement comprehensive test suites (constraint, referential, reconciliation, anomaly) and certify datasets prior to release.
Drive defect prevention with root-cause analysis, preventive patterns, and quality gates in CI / CD.
Build actionable monitoring / alerting and runbooks; lead triage for domain incidents and execute post-incident reviews.
Improve MTTD / MTTR, data freshness, and success rates through error budgets, SLO dashboards, and automated remediations.
Apply privacy-by-design (least-privilege access, masking / row-level filters, secrets management, audit / retention).
Partner with Compliance / Security on access reviews, data handling, and de-identification / tokenization for secondary use.
Profile and optimize workload performance (partitioning / clustering, materialization choices, scheduling) to meet refresh windows.
Track and reduce cost per successful run; recommend workload isolation and capacity strategies.
Automate builds, tests, deployments, and environment provisioning; enforce branching and review standards.
Publish reusable templates / components that shorten time-to-first-pipeline and increase team throughput.
Partner with product / analytics leads to groom and prioritize work; define Definition of Ready / Done and acceptance criteria.
Sequence dependencies across teams and environments; communicate risks, trade-offs, and timelines transparently.
Maintain clear technical docs, runbooks, and ADRs; provide examples and usage guidance that enable self-service.
Contribute to internal learning and design clinics and elevate peer code quality via actionable reviews.
Facilitate demos and handoffs; ensure solutions meet clinical, operational, and financial needs with measurable outcomes.
Convert ad-hoc requests into governed, reusable assets with clear support and change processes.
Partner with BI analysts to refine requirements (grain, fields, filters, refresh cadence), agree on acceptance criteria, and execute a structured dataset handoff (schema, data dictionary, sample queries, SLA / ownership).
Identify and retire legacy patterns; propose platform and process enhancements backed by data.
Pilot emerging approaches (e.g., anomaly detection, semantic layers) and formalize those that prove value.
Advance LLM data capabilities: standardize embedding refresh strategies, vector index maintenance, prompt / version governance, red-teaming datasets, and offline / online LLM eval harnesses (e.g., safety, bias, robustness).
Minimum Qualifications
Education
Experience
Or an equivalent combination of education and experience relating to the above tasks, knowledge, skills, and abilities will be considered. Employees in positions that require a license or certification must be properly licensed / certified, and the licensure / certification must be in good standing.
Senior Data Engineer • BILLINGS, MT, United States