State-level needs for social distancing and contact tracing to contain COVID-19 in the United States | Nature Human Behaviour

Our overall approach is as follows: (1) develop a mathematical model (an SEIR-type compartmental model)18,19 that incorporates social-distancing data, case identification via testing, isolation of detected cases and contact tracing; (2) assess the model’s predictive performance by training (calibrating) it to reported cases and mortality data from 19 March to 30 April 2020 and validating its predictions against data from 1 May to 20 June 2020; and (3) use the model, trained on data to 22 July 2020, to predict future incidence and mortality. The final stage of our approach predicts future events under a set of scenarios that include increased case detection through expansion of testing rate, contact tracing and relaxation or increase of measures to promote social distancing. All model fitting is performed in a Bayesian framework to incorporate available prior information and address multivariate uncertainty in model parameters.

Model formulation

We modified the standard SEIR model to address testing and contact tracing, as well as asymptomatic individuals. A fraction fA of those exposed (E) to enter the asymptomatic A class (divided into AU for untested and AC for contact traced) instead of the infected I class, which in our model formulation also includes infectious presymptomatic individuals. With respect to testing, separate compartments were added for untested, ‘freely roaming’ infected individuals (IU), tested/isolated cases (IT) and fatalities (FT). Following recovery, untested infected individuals (IU) and all asymptomatic individuals move to the untested recovered compartment, IU, and tested infected individuals move to the tested recovered compartment, IT. In balancing considerations of model fidelity and parameter identifiability, we made the reasonably conservative assumptions that all tested cases are effectively isolated (through self-quarantine or hospitalization) and thus unavailable for transmission, and that all COVID-related deaths are identified/tested.

With respect to contact tracing, the additional compartment SC represents unexposed contacts who undergo a period of isolation during which they are not susceptible before returning to S, while EC, AC and IC represent contacts who were exposed. Again, the reasonably conservative assumption was made that all exposed contacts undergo testing, with an accelerated testing rate compared to the general population. We assume a closed population of constant size, N, for each state.

The ordinary differential equations governing our model are as follows:

$$\begin{array}{l}\frac{{\mathrm{d}S}}{{\mathrm{d}t}} = – S \times c \times \left[ {\beta + (1 – \beta ) \times f_{\mathrm{C}}} \right] \times (I_{\mathrm{U}} + A_{\mathrm{U}})/N + S_{\mathrm{C}} \times \gamma \\ \frac{{\mathrm{d}S_{\mathrm{C}}}}{{\mathrm{d}t}} = – S_{\mathrm{C}} \times \gamma + S \times c \times (1 – \beta ) \times f_{\mathrm{C}} \times (I_{\mathrm{U}} + A_{\mathrm{U}})/N\\ \frac{{\mathrm{d}E}}{{\mathrm{d}t}} = – E \times \kappa + S \times c \times \beta \times (1 – f_{\mathrm{C}}) \times (I_{\mathrm{U}} + A_{\mathrm{U}})/N\\ \frac{{\mathrm{d}E_{\mathrm{C}}}}{{\mathrm{d}t}} = – E_{\mathrm{C}} \times \kappa + S \times c \times \beta \times f_{\mathrm{C}} \times (I_{\mathrm{U}} + A_{\mathrm{U}})/N\\ \frac{{\mathrm{d}I_{\mathrm{U}}}}{{\mathrm{d}t}} = – I_{\mathrm{U}} \times (\lambda + \rho ) + E \times \kappa \times (1 – f_{\mathrm{A}})\\ \frac{{\mathrm{d}A_{\mathrm{U}}}}{{\mathrm{d}t}} = – A_{\mathrm{U}} \times \rho + E \times \kappa \times f_{\mathrm{A}}\\ \frac{{\mathrm{d}I_{\mathrm{C}}}}{{\mathrm{d}t}} = – I_{\mathrm{C}} \times (\lambda _{\mathrm{C}} + \rho _{\mathrm{C}}) + E_{\mathrm{C}} \times \kappa \times (1 – f_{\mathrm{A}})\\ \frac{{\mathrm{d}A_{\mathrm{C}}}}{{\mathrm{d}t}} = – A_{\mathrm{C}} \times \rho _{\mathrm{C}} + E_{\mathrm{C}} \times \kappa \times f_{\mathrm{A}}\\ \frac{{\mathrm{d}R_{\mathrm{U}}}}{{\mathrm{d}t}} = (I_{\mathrm{U}} + A_{\mathrm{U}} + A_{\mathrm{C}}) \times \rho + I_{\mathrm{C}} \times \rho _{\mathrm{C}}\\ \frac{{\mathrm{d}I_{\mathrm{T}}}}{{\mathrm{d}t}} = – I_{\mathrm{T}} \times (\rho + \delta ) + I_{\mathrm{U}} \times \lambda + I_{\mathrm{C}} \times \lambda _{\mathrm{C}}\\ \frac{{\mathrm{d}R_{\mathrm{T}}}}{{\mathrm{d}t}} = I_{\mathrm{T}} \times \rho \\ \frac{{\mathrm{d}F_{\mathrm{T}}}}{{\mathrm{d}t}} = I_{\mathrm{T}} \times \delta \end{array}$$

where c is the contact rate between individuals, β is the transmission probability per infected contact, fC is the fraction of contacts identified through contact tracing, 1/γ is the duration of self-isolation after contact tracing, 1/κ is the latent period, fA is the fraction of exposed who are asymptomatic, λ is the testing rate, δ is the fatality rate, ρ is the recovery rate and λC and ρC are the testing and recovery rates, respectively, of contact-traced individuals. The testing rates λ and λC, the fatality rate δ and the recovery rate of traced contacts ρC are each composites of several underlying parameters. The testing rate defined as

$$\lambda (t) = F_{{\mathrm{test}},0} \times \left[ {1 – \frac{1}{{1 + \mathrm{e}^{(t – T50_T)/\tau _T}}}} \right] \times {\mathrm{Sens}_{\rm{test}}} \times k_{{\mathrm{test}}},$$

where Ftest,0 is the current testing coverage (fraction of infected individuals tested), Senstest is the test sensitivity (true positive rate) and ktest is the rate of testing for those tested, with a typical time-to-test equal to 1/ktest. The time-dependence term models the ramping up of testing using a logistic function with a growth rate of 1/τT d−1, where T50T is the time where 50% of the current testing rate is achieved. Similarly, for testing of traced contacts, the same definition is used with the assumption that all identified contacts are tested, Ftest,0 = 1 and at a faster assumed testing rate, kC,test:

$$\lambda _{\mathrm{C}}(t) = \left[ {1 – \frac{1}{{1 + \mathrm{e}^{(t – T50_T)/\tau _T}}}} \right] \times {\mathrm{Sens}_{\rm{test}}} \times k_{{\mathrm{C,test}}},$$

Because all contacts are assumed to be tested, the rate ρC at which they enter the ‘recovered’ compartment, RU is simply the rate of false negative test results:

$$\rho _{\mathrm{C}}(t) = \left[ {1 – \frac{1}{{1 + \mathrm{e}^{(t – T50_T)/\tau _T}}}} \right] \times (1 – {\mathrm{Sens}_{\rm{test}}}) \times k_{{\mathrm{test}}}$$

The fatality rate is adjusted to maintain consistency with the assumption that all COVID-19 deaths are identified, assuming constant IFR. Specifically, we first calculated the fraction of infected that is tested and positive:

$$f_{{\mathrm{pos}}}(t) = f_{\mathrm{C}}\frac{{\lambda _{\mathrm{C}}(t)}}{{\lambda _{\mathrm{C}}(t) + \rho _{\mathrm{C}}(t)}} + (1 – f_{\mathrm{C}})\frac{{\lambda (t)}}{{\lambda (t) + \rho }}.$$

Then the case fatality rate CFR(t) = IFR/fpos(t). Because CFR = δ/(δ + ρ), this implies

$$\delta (t) = \rho \frac{{{\mathrm{CFR}}(t)}}{{1 – {\mathrm{CFR}}(t)}} = \rho \frac{{{\mathrm{IFR}}}}{{f_{{\mathrm{pos}}}(t) – {\mathrm{IFR}}}}.$$

The model is ‘seeded’ Ninitial cases on 29 February 2020. Because in the early stages of the outbreak there may be multiple ‘imported’ cases, we fit to data only from 19 March 2020 onwards, 1 week after the US travel ban was put in place31.

Our model is fit to daily case yc and death yd data (cumulative data are not used for fitting because of autocorrelation). To adequately fit the case and mortality data, we accounted for two lag times. First, a lag is assumed between leaving the IU compartment and public reporting of a positive test result, accounting for the time it takes to seek a test, obtain testing and have the result reported. No lag is assumed for tests from contact tracing. Second, a lag time is assumed between entering the fatally ill compartment FT and publicly reported deaths. Additionally, we use a negative binomial likelihood to account for the substantial day-to-day over-dispersion in reporting results. The corresponding equations are as follows:

$$\begin{array}{l}y_{{\mathrm{obs}},[c,d]}(t) \approx {\mathrm{NegBin}}[\alpha _{[c,d]},p_{[c,d]}(t)]\\ p_{[c,d]}(t) = \frac{{y_{{\mathrm{pred}},[c,d]}(t)}}{{\alpha _{[c,d]} + y_{{\mathrm{pred}},[c,d]}(t)}}\\ y_{{\mathrm{pred}},c}(t) = I_{\mathrm{U}}(t – \tau _{{\mathrm{case}}}) \times \lambda (t) + I_{\mathrm{C}}(t) \times \lambda _{\mathrm{C}}(t)\\ y_{{\mathrm{pred}},d}(t) = I_{\mathrm{T}}(t – \tau _{{\mathrm{death}}}) \times \delta (t)\end{array}$$

In this parameterization, because the dispersion parameter α → ∞, the likelihood becomes a Poisson distribution with expected value ypred,[c,d], whereas for small values of α there is substantial interindividual variability. Case and death data were sourced from The COVID Tracking Project32.

Finally, we derived the time-dependent reproduction number, R(t) and the effective reproduction number, Reff(t) of this model, given by

$$R(t) = c \times \beta \times (1 – f_{\mathrm{C}})\left( {\frac{{1 – f_{\mathrm{A}}}}{{\lambda + \rho }} + \frac{{f_{\mathrm{A}}}}{\rho }} \right)$$
$$R_{{\mathrm{eff}}}(t) = R(t) \times \frac{{{{S}}(t)}}{N}$$

Reff(t) is the average number of secondary infection cases generated by a single infectious individual during their infectious period in partially susceptible population at time t. It is equal to the product of the transmission risk per contact of an infectious individual with their untraced contacts, c × β × (1 − fC), times their average duration of infection, \(\left( {\frac{{1 – f_{\mathrm{A}}}}{{\lambda + \rho }} + \frac{{f_{\mathrm{A}}}}{\rho }} \right)\), and the portion of contacts that are susceptible, \(\frac{{{{S}}(t)}}{N}\). This accounts for the relative contribution of asymptomatic, \(c \times \beta \times \left( {1 – f_{\mathrm{C}}} \right)\left( {\frac{{f_{\mathrm{A}}}}{\rho }} \right) \times \frac{{{{S}}(t)}}{N}\) and symptomatic infection, \(c \times \beta \times (1 – f_{\mathrm{C}})\left( {\frac{{1 – f_{\mathrm{A}}}}{{\lambda + \rho }}} \right) \times \frac{{{{S}}(t)}}{N}\). Using posterior samples for all 50 states and the District of Columbia, we conducted an analysis of variance using a linear model to characterize the contributions to the combined interstate and intrastate variation in Reff. Specifically, we used a linear model for Reff with the model parameters R0, η, θmin, rmax, fC, fA, λ and ρ as predictors, and evaluated the percentage of variance in Reff contributed by each parameter.

Incorporating social distancing, enhanced hygiene practices and reopening

The impact of social distancing, hygiene practices and reopening was modelled through a time dependence in the contact rate, c and the transmission probability per infected contact, β:

$$\begin{array}{l}c(t) = c_0 \times \left[ {\theta (t) + (1 – \theta _{\mathrm{min}}) \times r(t)} \right]\\ \beta (t) = \beta _0 \times \theta (t)^\eta \end{array}$$

The θ(t) function parameterized social distancing during the progression to shelter-in-place, and is modelled as a Weibull function:

$$\theta (t) = \theta _{{\mathrm{min}}} + (1 – \theta _{{\mathrm{min}}}){\mathrm{e}}^{ – (t/\tau _\theta )^{n_\theta }},$$

which starts as unity and decreases to θmin, with τθ being the Weibull scale parameter and nθ the Weibull shape parameter (Fig. 1).

The r(t) function parameterized relative increase in contacts due to reopening after shelter-in-place, with r = 1 corresponding to a return to baseline c = c0.

$$\begin{array}{l}r(t) = r_{{\mathrm{max}}}\frac{{t – \tau _\theta – \tau _s}}{{\tau _r}}\left[ {u(t – t_r) – u(t – t_{r{\mathrm{max}}})} \right] + u(t – t_{r{\mathrm{max}}})\\ u(t) = {\mathrm{Heaviside}}(t) \approx 1 – \frac{1}{{1 + {\mathrm{e}}^{4t}}}\\ t_r = \tau _\theta + \tau _s\\ t_{r{\mathrm{max}}} = \tau _\theta + \tau _s + \tau _r\end{array}$$

The term r(t) is 0 before tr, linear between tr and trmax and constant at a value of rmax after that, and made continuous by approximating the Heaviside function by a logistic function. The reopening time is defined as τs days after τθ, and the maximum relative increase in contacts rmax happens τr days after that.

We selected the functional form above for c(t) because it was found to be able to represent a wide variety of social-distancing data, including mobile phone mobility data from Unacast33 and Google34 as well as restaurant booking data from OpenTable35. We used these different mobility sources to derive state-specific prior distributions because different social-distancing datasets had different values for θmin, τθ, nθ, τs, rmax and τr (Supplementary Fig. 1).

With respect to the reduction in transmission probability β, we assumed that during the shelter-in-place phase, hygiene-based mitigation paralleled this decline with an effectiveness power η, and that this mitigation continued through reopening.

Finally, we define an overall reopening parameter Δ that measures the rebound in disease transmission, c × β relative to its minimum, defined to be 0 during shelter-in-place (that is, R(t) is at a minimum) and 1 when all restrictions are removed (when R(t) = R0), which can be derived as:

$${\Delta}(t) = \frac{{c \times \beta /(c_0 \times \beta _0) – \theta _{{\mathrm{min}}}^{1 + \eta }}}{{1 – \theta _{{\mathrm{min}}}^{1 + \eta }}}.$$

Our model is illustrated in Fig. 1, with parameters and prior distributions listed in Table 1.

Scenario evaluation

We used the model to make several inferences about the current and future course of the pandemic in each state. First, we consider the effective reproduction number. Two time points of particular interest are the time of minimum Reff, reflecting the degree to which shelter-in-place and other interventions were effective in reducing transmission, and the final time of the simulation, 22 July 2020, reflecting the extent to which reopening has increased Reff. Additional parameters of interest are the current levels of reopening Δ(t), testing λ and contact tracing fC.

We then conducted scenario-based prospective predictions using our model’s parameters as estimated to 22 July 2020. We then asked the following questions:

Assuming current levels of reopening, what increases in general testing λ and/or contact tracing fC would be necessary to bring Reff < 1?

What level of reopening Δ can maintain Reff < 1 under four different scenarios: current values of testing and contact tracing, doubling testing, double tracing and doubling both testing and tracing?

What will be the rates of new cases and deaths under different scenarios? Specifically, we evaluate the impact of increases in testing and contact tracing under current levels of reopening, as well as increases or decreases of 25 or 50%.

For (1), we evaluated the posterior probability that Reff < 1 under scaling transformations λ → λ × μλ and fC → fC × μC with scaling factors μλ and μC:

$$R_{{\mathrm{eff}}}(t) = {{S}}(t) \times c \times \beta \times (1 – \mu _{\mathrm{C}} \times f_{\mathrm{C}})\left( {\frac{{1 – f_{\mathrm{A}}}}{{\mu _\lambda \cdot \lambda + \rho }} + \frac{{f_{\mathrm{A}}}}{\rho }} \right)$$

We additionally derived ‘critical’ values of μC and μλ where Reff(t) < 1 under the conditions of increased testing alone (μC = 1), increased contact tracing alone (μλ = 1) and equal increases in testing and tracing (μC = μλ). We also performed the same analysis under a full reopening scenario (that is, setting S(t) = 1, c = c0 and β = β0).

For (2), we rearranged the equation for Reff in terms of the reopening parameter Δ:

$$R_{{\mathrm{eff}}}(t) = {{S}}(t) \times c_0 \times \beta _0 \times (1 – \mu _{\mathrm{C}} \times f_{\mathrm{C}})\left( {\frac{{1 – f_{\mathrm{A}}}}{{\mu _\lambda \times \lambda + \rho }} + \frac{{f_{\mathrm{A}}}}{\rho }} \right)\left[ {{\Delta} \times (1 – \theta _{{\mathrm{min}}}^{1 + \eta }) + \theta _{{\mathrm{min}}}^{1 + \eta }} \right]$$

We then fixed the scaling factors at 1 or 2, and solved the above equation to determine the percentage of reopening (Δcrit) that can be achieved while keeping Reff < 1. Values of Δcrit ≥ Δ(t) indicate the additional degree of reopening possible while maintaining Reff < 1, while values of Δcrit < Δ(t) indicate that reduction of reopening is needed. To convert back to testing and contact-tracing rates, we multiplied the scaling factors μC and μλ by the original values of fC and λ, respectively.

Finally, for (3), we additionally evaluated changes in reopening Δ → Δ + ΔΔ for ΔΔ values of +25% (+50%) or −25% (−50%), for a total of 20 scenarios (four different levels of testing and tracing and five different levels of reopening). We then ran the SEIR model forward in time to 30 September 2020. For all three intervention parameters, μC, μλ and ΔΔ, we assumed a ramp-up period of 2 weeks from 1 to 14 August 2020.

To summarize the relative need for mitigation in each state, we categorized states based on which scenarios resulted in the IQR of R(t) < 1 on 15 August 2020. The categories were defined as follows:

Very Low: can reopen further by >25% while maintaining R(t) < 1

Low: can reopen further by <25% with up to 2× increase in testing while maintaining R(t) < 1

Moderate: requires 2× contact tracing or reversal of reopening by 25% to bring and maintain R(t) < 1

High: requires multiple interventions (2× testing, 2× contract tracing and reversal of reopening by 25%) to bring and maintain R(t) < 1

Very High: combining 2× testing, 2× contact tracing and reversal of reopening by 50% is needed to bring and maintain R(t) < 1

We use R(t) instead of Reff(t), to minimize the impact of heterogeneity and uncertainty in the value of S(t)/N on our results. Thus, requiring R(t) < 1 provides greater assurance of state-wide control of the epidemic.

Software and code

Posterior distributions were sampled with Markov chain Monte Carlo (MCMC) simulation performed using MCSim v.6.1.0 in Metropolis within Gibbs sampling36. For each US state, four chains of 200,000 iterations each were run, with the first 20% of runs discarded and 500 posterior samples saved for analysis. For each parameter, comparison of interchain and intrachain variability was assessed to determine convergence, with the potential scale reduction factor R ≤ 1.2 considered converged37. Additional analysis of model outputs was performed in RStudio v.1.2.1335 (ref. 38) with R v.3.6.1 (ref. 39).

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.