Data, Software, and Research Projects

U.S. Census Full Count

The Demography Department Computing Lab maintains a secure computing environment with access to digitized full-count U.S. Census microdata from 1790 to 1940, by agreement with the Minnesota Population Center (MPC). More information about applying for access to the data can be found on the Population Sciences website.


The CenSoc Project is a five-year project funded by the National Institute on Aging (grant R01AG058940), with Joshua R. Goldstein as Principal Investigator.

The CenSoc project (so named because it links 1940 Census data with Social Security Administration death records) is a new, large-scale, public microdata set for advancing the understanding of mortality disparities in the United States. The project uses record linkage techniques to match deaths at ages 65 and over, observed from 1975 to 2005, back to individual, family, and neighborhood characteristics in the census. Modern data-linkage techniques allow us to construct a data set of about 15 million deaths, more than 30 times the size of the largest existing sample surveys. The unprecedented scale and detail of CenSoc data allow researchers to make new discoveries in areas such as (a) mortality disparities by education, national origin, and race; (b) early-life conditions and later-life mortality; and (c) geographic variation and the neighborhood determinants of mortality. These topics are increasingly important for understanding the growing disparities in life expectancy in the United States.

CenSoc has three core purposes:

  1. Creation and Dissemination of Linked Mortality Data Sets, that is, public, individual-level data sets linking the 1940 U.S. Census with (a) the Social Security Death Master File and (b) the NARA Numident file.

  2. Development of New Mortality Rate Estimation Methods for Linked Data.

  3. Production of ‘High Resolution’ Studies of Mortality Disparities and Longevity Determinants. See the research page for accomplishments.

The publicly available CenSoc data files can be accessed through the CenSoc Project Dataverse.


Socsim (‘Social Simulator’) is an open-source stochastic microsimulation platform that takes demographic rates as input and produces synthetic populations with plausible kinship structures. The Socsim R package, rsocsim, is developed in collaboration with the Max Planck Institute for Demographic Research.

The preceding Socsim software files are archived through Harvard Dataverse.

The Human Mortality Database (HMD)

The Human Mortality Database (HMD) is the world's leading scientific data resource on mortality in developed countries. The HMD provides detailed, high-quality, harmonized mortality and population estimates to researchers, students, journalists, policy analysts, and others interested in human longevity. It contains original calculations of death rates and life tables for national populations (countries or areas), as well as the input data used in constructing those tables. The input data consist of death counts from vital statistics, plus census counts, birth counts, and population estimates from various sources.

The National Longitudinal Study of Adolescent to Adult Health (Add Health)

The National Longitudinal Study of Adolescent to Adult Health (Add Health) is a longitudinal study of a nationally representative sample of over 20,000 adolescents who were in grades 7-12 during the 1994-95 school year and who have been followed for five waves to date, most recently in 2016-18.