Carnegie Mellon’s Delphi Group Re-Architects Epidemic Forecasting Infrastructure to Enhance Readiness for the Next Public Health Crisis

After the COVID-era surge pushed their infrastructure to its limits, CMU’s Delphi Group re-architected its ML and data infrastructure with Astro—shifting from crisis-built systems to scalable, crisis-ready public health infrastructure.

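90%+ lower latency between raw data arrival and published indicators

10x faster backfill run times

3x faster pipeline runtime for model training and retraining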

The Customer

In epidemic forecasting, the stakes could hardly be higher, and every hour matters.

To ensure precious hours are not lost, Carnegie Mellon University’s Delphi Group builds advanced machine learning models, statistical methods, and real-time epidemic data systems. The group’s work sits at the intersection of research and operations—where rigorous modeling must meet real-world urgency.

Co-founded in 2012 by Machine Learning Professors Ryan Tibshirani and Roni Rosenfeld, who continue to co-lead the initiative, Delphi has played a leading role in shaping epidemic forecasting technology in the United States. The group has been recognized by the U.S. Centers for Disease Control and Prevention as a National Center for Epidemic Forecasting and is a perennial winner in CDC forecasting challenges.

“Our mission is to make epidemic forecasting as useful and actionable as weather forecasting, so that decision-makers can rely on it when it matters most.”
Roni Rosenfeld, Co-Founder and Co-Director, Delphi Group, Carnegie Mellon University

The Challenge

When COVID-19 struck, demand for Delphi’s data, tools, and forecasts surged overnight. A platform built for public health professionals became a public utility, as journalists, federal and state agencies, academic researchers, hospital systems, and the general public turned to Delphi’s indicators for clarity. The team grew from fewer than ten people to more than seventy in a few short months.

To meet urgent needs, Delphi assembled a custom Python orchestration layer, stitching together pipelines across APIs, email feeds, FTP servers, and scraped sources. It worked. It was also fragile. Every data source was bespoke. Cron jobs polled blindly. Backfills were heavy. As the number of records in the repository grew into the billions, latencies crept up. In Roni’s words, “Every pipeline, every data source, was different… bespoke. Lots of bugs, lots of inefficiencies.”

The system delivered—but it was built in crisis. Crisis architecture favors speed over durability.

By 2024, the emergency system had become the permanent one. Leadership recognized that the next public health crisis could not be met with improvised infrastructure. The foundation itself had to be rebuilt deliberately, and to scale.

The Solution

Delphi Group leadership initiated a multi-month evaluation of orchestration platforms. After stress-testing several options, they aligned on Apache Airflow as the architectural foundation and Astronomer’s Astro platform to operationalize it.

“We considered Airflow alongside several other platforms over a number of months. In the end, Airflow was the right direction for the scale and flexibility we needed, and working with Astronomer meant we didn’t have to take on the full operational burden ourselves.”
Adam Johns, Software Engineering Manager, Delphi Group, Carnegie Mellon University

This was not a tooling swap. With Astro and Astronomer’s Center of Excellence (CoE) support, the team executed a deliberate architectural reset.

Standardizing the Foundation

Bespoke, source-by-source pipelines gave way to a repeatable orchestration framework. Rather than reinventing logic for every dataset, the team introduced standardized DAG patterns, reusable transformation components, and consistent deployment practices built for longevity. “We needed to templatize our DAGs and stop reinventing the wheel for every data source,” says Adam Johns, Software Engineering Manager at the Delphi Group.
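
To make the idea of a templatized pipeline concrete, here is a minimal sketch in plain Python. It is not Delphi's code and does not use the Airflow API; the names `SourceConfig` and `build_pipeline`, the field names, and the two sample sources are all hypothetical. It only illustrates the pattern described above: per-source differences live in a small config, while the validation and tagging steps are written once and shared.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass(frozen=True)
class SourceConfig:
    """Everything that differs between data sources (hypothetical fields)."""
    name: str
    fetch: Callable[[], list[Row]]
    required_fields: tuple[str, ...]


def build_pipeline(config: SourceConfig) -> Callable[[], list[Row]]:
    """Return a fetch -> validate -> tag pipeline built from one shared template."""
    def run() -> list[Row]:
        rows = config.fetch()
        # Shared validation step: drop rows missing any required field.
        valid = [r for r in rows if all(f in r for f in config.required_fields)]
        # Shared tagging step: record which source each row came from.
        return [{**r, "source": config.name} for r in valid]
    return run


# Two hypothetical sources that reuse the same template.
hospital = SourceConfig(
    name="hospital_feed",
    fetch=lambda: [{"date": "2024-01-01", "count": 12}, {"date": "2024-01-02"}],
    required_fields=("date", "count"),
)
survey = SourceConfig(
    name="survey_feed",
    fetch=lambda: [{"date": "2024-01-01", "pct": 3.5}],
    required_fields=("date", "pct"),
)

pipelines = {cfg.name: build_pipeline(cfg) for cfg in (hospital, survey)}
hospital_rows = pipelines["hospital_feed"]()
```

Under this pattern, adding a source means writing one new config entry rather than a new pipeline. In an Airflow setting the same idea would typically be expressed as a function that generates one DAG per config.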

Modernizing Operations to Ensure Data Freshness

Event-driven ingestion using Airflow sensors replaced fixed-schedule polling, enabling workflows to react to data arrival instead of guessing at it. Testing, versioning, and code review became systematic. Backfills and reprocessing, once disruptive, became controlled, predictable operations. As Roni reports, “Timeliness is central to our mission. Moving from schedule-based polling to event-driven detection was critical to closing the gap between when data arrives and when insight is available.”
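
The effect on timeliness can be illustrated with a small calculation. The sketch below is an assumption-laden example, not a measurement from Delphi's systems: it compares a hypothetical hourly scheduled check with a hypothetical sensor that re-checks for new data every minute, and the arrival time is made up. The helper `detection_delay` is likewise hypothetical.

```python
import math


def detection_delay(arrival_min: float, check_interval_min: float) -> float:
    """Minutes between data arrival and the next check that notices it.

    Checks are assumed to run at 0, interval, 2 * interval, ... minutes.
    """
    next_check = math.ceil(arrival_min / check_interval_min) * check_interval_min
    return next_check - arrival_min


# Hypothetical arrival: data lands 61.5 minutes after the first check.
arrival = 61.5

# Fixed hourly schedule: the next check is at minute 120.
hourly_cron_delay = detection_delay(arrival, check_interval_min=60.0)

# Sensor re-checking every minute: the next check is at minute 62.
sensor_delay = detection_delay(arrival, check_interval_min=1.0)
```

With a fixed schedule, the worst-case wait approaches the full interval whenever data lands just after a check; a short re-check interval bounds that wait by the interval itself.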

Institutionalizing Engineering Discipline

Through Astronomer’s Center of Excellence (CoE), the team received architectural guidance, DAG reviews, pair programming, and targeted training from Astronomer’s Airflow experts. Together they established a durable engineering discipline across a group that balances research innovation with 24/7 production responsibility.

“The additional conversations and support we received through the Center of Excellence reinforced our decision to work with Astronomer.”
Peter Jhon, Executive Director, Delphi Group, Carnegie Mellon University

Versioned Data in Action: Evaluating Forecast Performance

When you produce a forecast, it matters which version of the training data you use. The bottom chart shows how the model appears to perform when trained on the data available today, but that is not the data that existed when the forecast was actually made. The top chart corrects for this by using the data as it existed at the time, giving a much more honest picture of model performance.

Source: Delphi Group, Carnegie Mellon University. Forecast performance of COVID-like illness surveillance models, 2024–2026.

Getting this right requires careful data versioning: knowing exactly what data existed at any given point in time. That’s where Astronomer comes in. As Adam explains, “Airflow and Astronomer are helping us do much better with accurately and swiftly managing versioned data.”
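
A minimal sketch of what "data as it existed at the time" means in practice is shown below. It assumes each published value carries both the date it describes and the date that version was issued; the `Report` type, the `as_of` helper, and the sample revision history are hypothetical illustrations, not Delphi's schema or data.

```python
from datetime import date
from typing import NamedTuple


class Report(NamedTuple):
    reference_date: date  # the day the value describes
    issue_date: date      # the day this version of the value was published
    value: float


def as_of(reports: list[Report], cutoff: date) -> dict[date, float]:
    """Latest value per reference date among versions issued on or before cutoff."""
    latest: dict[date, Report] = {}
    for r in reports:
        if r.issue_date > cutoff:
            continue  # this version had not been published at the cutoff
        seen = latest.get(r.reference_date)
        if seen is None or r.issue_date > seen.issue_date:
            latest[r.reference_date] = r
    return {d: r.value for d, r in latest.items()}


# Hypothetical history: the count for January 1 is revised upward twice.
history = [
    Report(date(2024, 1, 1), date(2024, 1, 2), 100.0),
    Report(date(2024, 1, 1), date(2024, 1, 5), 130.0),
    Report(date(2024, 1, 1), date(2024, 1, 20), 145.0),
]

at_forecast_time = as_of(history, date(2024, 1, 6))  # sees the 130.0 revision
today = as_of(history, date(2024, 2, 1))             # sees the 145.0 revision
```

A model evaluated against `today` would be credited with information that was not available on January 6, which is the distortion the two charts contrast.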

The Results

Adopting Astro improved performance, consistency, and readiness:

  • 3× faster pipeline runtime for model training and retraining, with more than 200 million rows processed per refresh cycle. Model training cycles that once required careful scheduling now run predictably, allowing the team to focus on epidemiological insight rather than infrastructure management.
  • 80% reduction in new pipeline deployment time, shifting new-source development from months to a repeatable process taking only a few weeks. The modernization also changed development velocity. New data sources can be integrated more quickly, and for the first time in years, the backlog is shrinking rather than expanding.
  • 90%+ reduction in latency between raw data arrival and published indicators. Replacing schedule-based polling with event-driven detection has substantially narrowed the gap between data arrival and published signals. In epidemic forecasting, that improvement strengthens decision-making at critical moments.
  • 10×+ faster backfill run times, with dramatically better automation. Now the team is able to rerun the full history of a pipeline when needed.

“With Astro in place, we reduced the time to less than ten minutes from raw data availability to published indicators, and built a system capable of sustaining that performance when the next public health emergency arrives.”
Roni Rosenfeld, Co-Founder and Co-Director, Delphi Group, Carnegie Mellon University
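
The results above mix two kinds of figures, "N times faster" and "percent reduction", which describe the same quantity in different units. The short sketch below shows the conversion; the function names are hypothetical, and the inputs are simply the headline figures quoted above.

```python
def speedup_to_reduction(speedup: float) -> float:
    """Fraction of time saved when a job runs `speedup` times faster."""
    return 1.0 - 1.0 / speedup


def reduction_to_speedup(reduction: float) -> float:
    """How many times faster a job is after cutting its time by `reduction`."""
    return 1.0 / (1.0 - reduction)


runtime_cut = speedup_to_reduction(3.0)      # 3x faster pipeline runtime
backfill_cut = speedup_to_reduction(10.0)    # 10x faster backfills
deploy_speedup = reduction_to_speedup(0.80)  # 80% less deployment time
```

Read this way, a 3x speedup is roughly a 67% cut in runtime, a 10x speedup is a 90% cut, and an 80% reduction in deployment time corresponds to about a 5x speedup.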

What’s Next

Delphi is preparing for two futures at once: steady-state expansion and emergency-scale response.

The team plans to at least double its data volume and number of sources while continuing to reduce latency and improve automation. At the same time, they are designing for unpredictable public health emergencies that spike traffic, accelerate model retraining, and increase demand for real-time insight by orders of magnitude.

With a standardized orchestration backbone in place, Delphi is positioned to respond proactively to emerging threats like bird flu, rather than rebuilding under crisis conditions.

The next public health emergency will not arrive on schedule. But when it does, Delphi’s infrastructure will be ready.

“With Astro as our data orchestration foundation, we’re prepared not just for steady-state operations, but for the kind of sudden demand spikes that accompany a public health emergency.”
Roni Rosenfeld, Co-Founder and Co-Director, Delphi Group, Carnegie Mellon University