Access the dataset here >>
Project website here >>

We aim to gather a machine-readable dataset of socioeconomic factors that may affect the spread and/or consequences of epidemiological outbreaks, particularly the novel coronavirus disease (COVID-19). This dataset is intended to serve the data science, machine learning, and epidemiological modeling communities.

To facilitate research in these areas, we have curated a machine-readable dataset that aggregates relevant data from around 10 governmental and academic sources at the county level.

In addition to county-level time-series data from the JHU CSSE COVID-19 Dashboard, our dataset contains more than 300 variables that summarize population estimates, demographics, ethnicity, housing, education, employment and income, climate, transit scores, and healthcare system-related metrics.
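As a rough illustration of how the per-county variables and the time-series data might be combined, the sketch below joins two small tables on a shared county identifier and computes a per-capita rate. Everything in it is an assumption made for illustration: the column names (`fips`, `population`, `median_income`, `date`, `confirmed`), the use of a FIPS-style code as the join key, and the values themselves are made-up placeholders, not the dataset's actual schema or contents.

```python
import csv
import io

# Hypothetical, made-up rows. Column names and values are placeholders
# for illustration only and are not taken from the dataset.
STATIC_CSV = """fips,population,median_income
00001,1000,50000
00002,2500,42000
"""

CASES_CSV = """fips,date,confirmed
00001,2020-04-01,3
00001,2020-04-02,5
00002,2020-04-01,10
"""


def load_static(text):
    """Index the per-county static variables by county identifier."""
    return {row["fips"]: row for row in csv.DictReader(io.StringIO(text))}


def cases_per_100k(static_text, cases_text):
    """Join each time-series row to its county and normalize by population."""
    static = load_static(static_text)
    rates = []
    for row in csv.DictReader(io.StringIO(cases_text)):
        county = static.get(row["fips"])
        if county is None:
            continue  # skip time-series rows without a matching county record
        rate = int(row["confirmed"]) * 100_000 / int(county["population"])
        rates.append((row["fips"], row["date"], rate))
    return rates


rates = cases_per_100k(STATIC_CSV, CASES_CSV)
for fips, date, rate in rates:
    print(fips, date, rate)
```

Keeping the identifier as a string preserves leading zeros, which matters for codes of this kind. With real files, the same join would typically be done with a data-frame library.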

We hope that this dataset proves to be a useful resource for the community, one that facilitates and encourages important research on

  • Epidemiological modeling and forecasting,
  • Understanding the pandemic’s influence on the economy and legislation, as well as on mental health and safety, and
  • Many other critical problems not discussed here.

If you find this dataset useful, would like to contribute, or have ideas about which other data sources we should consider next, please do not hesitate to contact us.


Mathias Unberath, Assistant Professor of Computer Science, Johns Hopkins University

Jie Ying Wu, Benjamin Killeen, Kinjal Shah, Anna Zapaishchykova, Philipp Nikutta, Aniruddha Tamhane, Shreya Chakraborty, Jinchi Wei, Tiger Gao, and Mareike Thies.