# State-level needs for social distancing and contact tracing to contain COVID-19 in the United States | Nature Human Behaviour

Our overall approach is as follows: (1) develop a mathematical model (an SEIR-type compartmental model)^{18,19} that incorporates social-distancing data, case identification via testing, isolation of detected cases and contact tracing; (2) assess the model’s predictive performance by training (calibrating) it to reported cases and mortality data from 19 March to 30 April 2020 and validating its predictions against data from 1 May to 20 June 2020; and (3) use the model, trained on data to 22 July 2020, to predict future incidence and mortality. The final stage of our approach predicts future events under a set of scenarios that include increased case detection through expansion of testing rate, contact tracing and relaxation or increase of measures to promote social distancing. All model fitting is performed in a Bayesian framework to incorporate available prior information and address multivariate uncertainty in model parameters.

### Model formulation

We modified the standard SEIR model to address testing and contact tracing, as well as asymptomatic individuals. A fraction *f*_{A} of those exposed (*E*) enter the asymptomatic *A* class (divided into *A*_{U} for untested and *A*_{C} for contact traced) instead of the infected *I* class, which in our model formulation also includes infectious presymptomatic individuals. With respect to testing, separate compartments were added for untested, ‘freely roaming’ infected individuals (*I*_{U}), tested/isolated cases (*I*_{T}) and fatalities (*F*_{T}). Following recovery, untested infected individuals (*I*_{U}) and all asymptomatic individuals move to the untested recovered compartment, *R*_{U}, and tested infected individuals move to the tested recovered compartment, *R*_{T}. In balancing considerations of model fidelity and parameter identifiability, we made the reasonably conservative assumptions that all tested cases are effectively isolated (through self-quarantine or hospitalization) and thus unavailable for transmission, and that all COVID-related deaths are identified/tested.

With respect to contact tracing, the additional compartment *S*_{C} represents unexposed contacts who undergo a period of isolation during which they are not susceptible before returning to *S*, while *E*_{C}, *A*_{C} and *I*_{C} represent contacts who were exposed. Again, the reasonably conservative assumption was made that all exposed contacts undergo testing, with an accelerated testing rate compared to the general population. We assume a closed population of constant size, *N*, for each state.
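As a rough illustration of the compartment structure described above, the following Python sketch integrates a reduced version of the model with a forward Euler scheme. It is a simplification under stated assumptions, not the system fitted in this study: the contact-tracing compartments (*S*_{C}, *E*_{C}, *A*_{C}, *I*_{C}) are omitted, all rates are held constant in time, and the parameter values and the names `Params`, `derivatives` and `simulate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Params:
    """Illustrative values only; none are estimates from the study."""
    c: float = 10.0       # contact rate (contacts per day)
    beta: float = 0.03    # transmission probability per infected contact
    kappa: float = 0.25   # 1 / latent period (per day)
    f_A: float = 0.4      # fraction of exposed who become asymptomatic
    lam: float = 0.05     # testing rate of untested infected (per day)
    rho: float = 0.1      # recovery rate (per day)
    delta: float = 0.005  # fatality rate of tested cases (per day)


def derivatives(state, p):
    """Right-hand side for (S, E, A_U, I_U, I_T, R_U, R_T, F_T)."""
    S, E, A_U, I_U, I_T, R_U, R_T, F_T = state
    N = sum(state)  # closed population of constant size
    # Only untested infectious individuals transmit; tested cases are isolated.
    infection = p.c * p.beta * (I_U + A_U) * S / N
    return [
        -infection,                                          # S
        infection - p.kappa * E,                             # E
        p.f_A * p.kappa * E - p.rho * A_U,                   # A_U
        (1 - p.f_A) * p.kappa * E - (p.lam + p.rho) * I_U,   # I_U
        p.lam * I_U - (p.rho + p.delta) * I_T,               # I_T
        p.rho * (A_U + I_U),                                 # R_U
        p.rho * I_T,                                         # R_T
        p.delta * I_T,                                       # F_T
    ]


def simulate(state, p, days, dt=0.1):
    """Forward Euler integration; adequate for a sketch, not for fitting."""
    for _ in range(round(days / dt)):
        d = derivatives(state, p)
        state = [x + dt * dx for x, dx in zip(state, d)]
    return state


start = [999_990.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 0.0]
end = simulate(start, Params(), days=60)
print(f"susceptible after 60 d: {end[0]:.0f}, cumulative deaths: {end[7]:.2f}")
```

Because every outflow from one compartment is an inflow to another, the total population is conserved, which provides a simple check on any implementation of a closed-population model.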

The ordinary differential equations governing our model are as follows:

where *c* is the contact rate between individuals, *β* is the transmission probability per infected contact, *f*_{C} is the fraction of contacts identified through contact tracing, 1/*γ* is the duration of self-isolation after contact tracing, 1/*κ* is the latent period, *f*_{A} is the fraction of exposed who are asymptomatic, *λ* is the testing rate, *δ* is the fatality rate, *ρ* is the recovery rate and *λ*_{C} and *ρ*_{C} are the testing and recovery rates, respectively, of contact-traced individuals. The testing rates *λ* and *λ*_{C}, the fatality rate *δ* and the recovery rate of traced contacts *ρ*_{C} are each composites of several underlying parameters. The testing rate is defined as

where *F*_{test,0} is the current testing coverage (fraction of infected individuals tested), Sens_{test} is the test sensitivity (true positive rate) and *k*_{test} is the rate of testing for those tested, with a typical time-to-test equal to 1/*k*_{test}. The time-dependence term models the ramping up of testing using a logistic function with a growth rate of 1/*τ*_{T} d^{−1}, where *T*50_{T} is the time at which 50% of the current testing rate is achieved. Similarly, for testing of traced contacts, the same definition is used, with the assumptions that all identified contacts are tested (*F*_{test,0} = 1) and that testing occurs at a faster rate, *k*_{C,test}:

Because all contacts are assumed to be tested, the rate *ρ*_{C} at which they enter the ‘recovered’ compartment, *R*_{U}, is simply the rate of false negative test results:

The fatality rate is adjusted to maintain consistency with the assumption that all COVID-19 deaths are identified, assuming constant IFR. Specifically, we first calculated the fraction of infected that is tested and positive:

Then the case fatality rate CFR(*t*) = IFR/*f*_{pos}(*t*). Because CFR = *δ*/(*δ* + *ρ*), this implies
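Solving CFR = *δ*/(*δ* + *ρ*) for *δ* gives *δ* = *ρ* × CFR/(1 − CFR). The short Python sketch below applies this rearrangement; the function name `fatality_rate` and the input values are hypothetical, and *f*_{pos} is taken as a given input rather than computed from the testing parameters.

```python
def fatality_rate(ifr, f_pos, rho):
    """Return delta such that delta / (delta + rho) equals CFR = IFR / f_pos."""
    cfr = ifr / f_pos
    if not 0.0 <= cfr < 1.0:
        raise ValueError("CFR must lie in [0, 1); check IFR and f_pos")
    return rho * cfr / (1.0 - cfr)


# Example: IFR of 1%, 20% of infections tested positive, recovery rate 0.1 per day.
delta = fatality_rate(ifr=0.01, f_pos=0.2, rho=0.1)
print(f"delta = {delta:.5f} per day")
```

When fewer infections are detected, *f*_{pos} falls, the CFR among tested cases rises and *δ* increases accordingly, which keeps total deaths consistent with a constant IFR.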

The model is ‘seeded’ with *N*_{initial} cases on 29 February 2020. Because in the early stages of the outbreak there may be multiple ‘imported’ cases, we fit to data only from 19 March 2020 onwards, 1 week after the US travel ban was put in place^{31}.

Our model is fit to daily case *y*_{c} and death *y*_{d} data (cumulative data are not used for fitting because of autocorrelation). To adequately fit the case and mortality data, we accounted for two lag times. First, a lag is assumed between leaving the *I*_{U} compartment and public reporting of a positive test result, accounting for the time it takes to seek a test, obtain testing and have the result reported. No lag is assumed for tests from contact tracing. Second, a lag time is assumed between entering the fatally ill compartment *F*_{T} and publicly reported deaths. Additionally, we use a negative binomial likelihood to account for the substantial day-to-day over-dispersion in reporting results. The corresponding equations are as follows:

In this parameterization, as the dispersion parameter *α* → ∞, the likelihood becomes a Poisson distribution with expected value *y*_{pred,[c,d]}, whereas for small values of *α* there is substantial interindividual variability. Case and death data were sourced from The COVID Tracking Project^{32}.
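The Poisson limit noted above can be checked numerically. The sketch below assumes the common mean and dispersion parameterization of the negative binomial, with mean *μ* and variance *μ* + *μ*²/*α*; this choice, and the function names `nb_logpmf` and `poisson_logpmf`, are assumptions made for illustration, and the reporting lags are not modelled.

```python
import math


def nb_logpmf(y, mu, alpha):
    """Log-probability of count y under a negative binomial with mean mu
    and dispersion alpha (variance mu + mu**2 / alpha)."""
    return (
        math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
        - alpha * math.log1p(mu / alpha)            # alpha * log(alpha / (alpha + mu))
        + y * (math.log(mu) - math.log(alpha + mu)) # y * log(mu / (alpha + mu))
    )


def poisson_logpmf(y, mu):
    """Log-probability of count y under a Poisson distribution with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


observed, predicted = 5, 10.0
for alpha in (1.0, 10.0, 1e6):
    print(alpha, nb_logpmf(observed, predicted, alpha))
print("Poisson", poisson_logpmf(observed, predicted))
```

For small *α* the distribution places much more mass on counts far from the prediction, which is what allows the likelihood to tolerate irregular day-to-day reporting.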

Finally, we derived the time-dependent reproduction number, *R*(*t*) and the effective reproduction number, *R*_{eff}(*t*) of this model, given by

*R*_{eff}(*t*) is the average number of secondary infection cases generated by a single infectious individual during their infectious period in a partially susceptible population at time *t*. It is equal to the product of three terms: the transmission risk per contact of an infectious individual with their untraced contacts, *c* × *β* × (1 − *f*_{C}); their average duration of infection, \(\left( \frac{1 - f_{\mathrm{A}}}{\lambda + \rho} + \frac{f_{\mathrm{A}}}{\rho} \right)\); and the proportion of contacts that are susceptible, \(\frac{S(t)}{N}\). This accounts for the relative contributions of asymptomatic infection, \(c \times \beta \times (1 - f_{\mathrm{C}}) \left( \frac{f_{\mathrm{A}}}{\rho} \right) \times \frac{S(t)}{N}\), and symptomatic infection, \(c \times \beta \times (1 - f_{\mathrm{C}}) \left( \frac{1 - f_{\mathrm{A}}}{\lambda + \rho} \right) \times \frac{S(t)}{N}\). Using posterior samples for all 50 states and the District of Columbia, we conducted an analysis of variance using a linear model to characterize the contributions to the combined interstate and intrastate variation in *R*_{eff}. Specifically, we used a linear model for *R*_{eff} with the model parameters *R*_{0}, *η*, *θ*_{min}, *r*_{max}, *f*_{C}, *f*_{A}, *λ* and *ρ* as predictors, and evaluated the percentage of variance in *R*_{eff} contributed by each parameter.
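The product described in this paragraph can be evaluated directly. The Python sketch below returns *R*_{eff} together with its asymptomatic and symptomatic contributions; the function name `effective_r` and all input values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def effective_r(c, beta, f_C, f_A, lam, rho, s_frac):
    """Return (R_eff, asymptomatic part, symptomatic part).

    s_frac is the susceptible fraction S(t)/N; setting it to 1 gives R(t).
    """
    transmission = c * beta * (1.0 - f_C)          # to untraced contacts, per day
    asymptomatic = transmission * (f_A / rho) * s_frac
    symptomatic = transmission * ((1.0 - f_A) / (lam + rho)) * s_frac
    return asymptomatic + symptomatic, asymptomatic, symptomatic


r_eff, r_asym, r_sym = effective_r(
    c=10.0, beta=0.05, f_C=0.2, f_A=0.4, lam=0.1, rho=0.1, s_frac=0.9
)
print(f"R_eff = {r_eff:.2f} (asymptomatic {r_asym:.2f}, symptomatic {r_sym:.2f})")
```

With these inputs the transmission term is 0.4 per day and the average duration of infection is 0.6/0.2 + 0.4/0.1 = 7 days, so *R*(*t*) = 2.8 and *R*_{eff} = 2.8 × 0.9 = 2.52. Testing shortens only the symptomatic term, which is why the asymptomatic fraction limits what increased testing alone can achieve.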

### Incorporating social distancing, enhanced hygiene practices and reopening

The impact of social distancing, hygiene practices and reopening was modelled through a time dependence in the contact rate, *c* and the transmission probability per infected contact, *β*:

The *θ*(*t*) function parameterized social distancing during the progression to shelter-in-place, and is modelled as a Weibull function:

which starts as unity and decreases to *θ*_{min}, with *τ*_{θ} being the Weibull scale parameter and *n*_{θ} the Weibull shape parameter (Fig. 1).

The *r*(*t*) function parameterized relative increase in contacts due to reopening after shelter-in-place, with *r* = 1 corresponding to a return to baseline *c* = *c*_{0}.

The term *r*(*t*) is 0 before *t*_{r}, linear between *t*_{r} and *t*_{rmax} and constant at a value of *r*_{max} after that, and made continuous by approximating the Heaviside function by a logistic function. The reopening time is defined as *τ*_{s} days after *τ*_{θ}, and the maximum relative increase in contacts *r*_{max} happens *τ*_{r} days after that.
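The two time-dependent terms described above can be sketched in Python as follows. The Weibull form used for *θ*(*t*) is an assumed expression chosen to match the stated behaviour (unity at *t* = 0, decaying to *θ*_{min}), and *r*(*t*) is written as a hard piecewise-linear ramp rather than the logistic-smoothed version; the function names and numerical values are hypothetical.

```python
import math


def theta(t, theta_min, tau_theta, n_theta):
    """Social-distancing factor: 1 at t = 0, decaying to theta_min (t >= 0).

    Assumed form: theta_min + (1 - theta_min) * exp(-(t / tau_theta) ** n_theta).
    """
    return theta_min + (1.0 - theta_min) * math.exp(-((t / tau_theta) ** n_theta))


def reopening(t, tau_theta, tau_s, tau_r, r_max):
    """Relative increase in contacts: 0 before t_r, linear up to t_rmax,
    constant at r_max afterwards (no logistic smoothing in this sketch)."""
    t_r = tau_theta + tau_s   # reopening starts tau_s days after tau_theta
    t_rmax = t_r + tau_r      # maximum is reached tau_r days after that
    if t <= t_r:
        return 0.0
    if t >= t_rmax:
        return r_max
    return r_max * (t - t_r) / tau_r


for day in (0, 20, 60, 75, 100):
    print(
        day,
        round(theta(day, theta_min=0.3, tau_theta=20, n_theta=3), 3),
        round(reopening(day, tau_theta=20, tau_s=40, tau_r=30, r_max=0.6), 3),
    )
```

The hard ramp has kinks at *t*_{r} and *t*_{rmax}; replacing the step changes with logistic approximations, as described in the text, removes them and keeps the right-hand side of the differential equations smooth.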

We selected the functional form above for *c*(*t*) because it was found to be able to represent a wide variety of social-distancing data, including mobile phone mobility data from Unacast^{33} and Google^{34} as well as restaurant booking data from OpenTable^{35}. We used these different mobility sources to derive state-specific prior distributions because different social-distancing datasets had different values for *θ*_{min}, *τ*_{θ}, *n*_{θ}, *τ*_{s}, *r*_{max} and *τ*_{r} (Supplementary Fig. 1).

With respect to the reduction in transmission probability *β*, we assumed that during the shelter-in-place phase, hygiene-based mitigation paralleled this decline with an effectiveness power *η*, and that this mitigation continued through reopening.

Finally, we define an overall reopening parameter *Δ* that measures the rebound in disease transmission, *c* × *β*, relative to its minimum. It is defined to be 0 during shelter-in-place (that is, when *R*(*t*) is at a minimum) and 1 when all restrictions are removed (when *R*(*t*) = *R*_{0}), and can be derived as:

Our model is illustrated in Fig. 1, with parameters and prior distributions listed in Table 1.

### Scenario evaluation

We used the model to make several inferences about the current and future course of the pandemic in each state. First, we consider the effective reproduction number. Two time points of particular interest are the time of minimum *R*_{eff}, reflecting the degree to which shelter-in-place and other interventions were effective in reducing transmission, and the final time of the simulation, 22 July 2020, reflecting the extent to which reopening has increased *R*_{eff}. Additional parameters of interest are the current levels of reopening *Δ*(*t*), testing *λ* and contact tracing *f*_{C}.

We then conducted scenario-based prospective predictions using our model’s parameters as estimated to 22 July 2020, asking the following questions:

(1) Assuming current levels of reopening, what increases in general testing *λ* and/or contact tracing *f*_{C} would be necessary to bring *R*_{eff} < 1?

(2) What level of reopening *Δ* can maintain *R*_{eff} < 1 under four different scenarios: current values of testing and contact tracing, doubling testing, doubling tracing and doubling both testing and tracing?

(3) What will be the rates of new cases and deaths under different scenarios? Specifically, we evaluate the impact of increases in testing and contact tracing under current levels of reopening, as well as under increases or decreases in reopening of 25% or 50%.

For (1), we evaluated the posterior probability that *R*_{eff} < 1 under scaling transformations *λ* → *λ* × *μ*_{λ} and *f*_{C} → *f*_{C} × *μ*_{C} with scaling factors *μ*_{λ} and *μ*_{C}:

We additionally derived ‘critical’ values of *μ*_{C} and *μ*_{λ} where *R*_{eff}(*t*) < 1 under the conditions of increased testing alone (*μ*_{C} = 1), increased contact tracing alone (*μ*_{λ} = 1) and equal increases in testing and tracing (*μ*_{C} = *μ*_{λ}). We also performed the same analysis under a full reopening scenario (that is, setting *Δ*(*t*) = 1, *c* = *c*_{0} and *β* = *β*_{0}).
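The calculation for (1) can be illustrated with a short Python sketch. It assumes each posterior sample is available as a dictionary of parameter values, caps the scaled tracing fraction at 1, and solves in closed form for the tracing-only critical factor at which *R*_{eff} = 1. The dictionary keys, function names and values are hypothetical, and the two samples stand in for a full posterior.

```python
def r_eff(sample, mu_lam=1.0, mu_c=1.0):
    """R_eff after scaling lambda by mu_lam and f_C by mu_c (f_C capped at 1)."""
    f_c = min(1.0, sample["f_C"] * mu_c)
    lam = sample["lam"] * mu_lam
    duration = (1.0 - sample["f_A"]) / (lam + sample["rho"]) + sample["f_A"] / sample["rho"]
    return sample["c"] * sample["beta"] * (1.0 - f_c) * duration * sample["s_frac"]


def prob_controlled(samples, mu_lam=1.0, mu_c=1.0):
    """Fraction of posterior samples with R_eff < 1 under the given scaling."""
    return sum(r_eff(s, mu_lam, mu_c) < 1.0 for s in samples) / len(samples)


def critical_mu_c(sample):
    """Tracing-only scaling (mu_lam = 1) at which R_eff equals 1.

    Requires f_C > 0. Returns 0.0 if R_eff < 1 even without any tracing.
    """
    untraced = r_eff({**sample, "f_C": 0.0})
    if untraced <= 1.0:
        return 0.0
    return (1.0 - 1.0 / untraced) / sample["f_C"]


base = {"c": 10.0, "beta": 0.05, "f_C": 0.2, "f_A": 0.4,
        "lam": 0.1, "rho": 0.1, "s_frac": 0.9}
samples = [base, {**base, "beta": 0.01}]

print(prob_controlled(samples))              # current testing and tracing
print(prob_controlled(samples, mu_c=4.0))    # fourfold contact tracing
print(critical_mu_c(base))
```

Because *R*_{eff} is linear in (1 − *f*_{C}), the tracing-only critical value has a closed form; the testing-only and equal-scaling cases involve *λ* in the denominator and are more conveniently found by root finding over *μ*_{λ}.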

For (2), we rearranged the equation for *R*_{eff} in terms of the reopening parameter *Δ*:

We then fixed the scaling factors at 1 or 2, and solved the above equation to determine the percentage of reopening (*Δ*_{crit}) that can be achieved while keeping *R*_{eff} < 1. Values of *Δ*_{crit} ≥ *Δ*(*t*) indicate the additional degree of reopening possible while maintaining *R*_{eff} < 1, while values of *Δ*_{crit} < *Δ*(*t*) indicate that reduction of reopening is needed. To convert back to testing and contact-tracing rates, we multiplied the scaling factors *μ*_{C} and *μ*_{λ} by the original values of *f*_{C} and *λ*, respectively.

Finally, for (3), we additionally evaluated changes in reopening *Δ* → *Δ* + *Δ*_{Δ} for *Δ*_{Δ} values of +25% (+50%) or −25% (−50%), for a total of 20 scenarios (four different levels of testing and tracing and five different levels of reopening). We then ran the SEIR model forward in time to 30 September 2020. For all three intervention parameters, *μ*_{C}, *μ*_{λ} and *Δ*_{Δ}, we assumed a ramp-up period of 2 weeks from 1 to 14 August 2020.
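The scenario grid for (3) can be enumerated as in the Python sketch below. The 2-week ramp-up is written as a linear interpolation, which is an assumption because the shape of the ramp is not specified here; the names `scenarios` and `ramp_fraction` are hypothetical.

```python
from itertools import product

# (mu_lambda, mu_C): current, 2x testing, 2x tracing, 2x both.
TESTING_TRACING = [(1, 1), (2, 1), (1, 2), (2, 2)]
# Additive change in the reopening parameter Delta.
REOPENING_SHIFTS = [-0.50, -0.25, 0.0, 0.25, 0.50]


def scenarios():
    """All combinations of testing/tracing level and reopening change."""
    return [
        {"mu_lam": mu_lam, "mu_c": mu_c, "delta_shift": shift}
        for (mu_lam, mu_c), shift in product(TESTING_TRACING, REOPENING_SHIFTS)
    ]


def ramp_fraction(day_of_august):
    """Fraction of the intervention in effect: 0 on 1 August, 1 from 14 August.

    Linear interpolation is assumed for illustration.
    """
    return min(1.0, max(0.0, (day_of_august - 1) / 13))


grid = scenarios()
print(len(grid))  # 4 testing/tracing levels x 5 reopening levels

# Effective testing multiplier on 7 August for a 2x testing scenario.
target = 2.0
print(1.0 + ramp_fraction(7) * (target - 1.0))
```

Each scenario would then be applied to every posterior sample before running the model forward, so that the predicted cases and deaths carry the parameter uncertainty with them.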

To summarize the relative need for mitigation in each state, we categorized states based on which scenarios resulted in the IQR of *R*(*t*) < 1 on 15 August 2020. The categories were defined as follows:

Very Low: can reopen further by >25% while maintaining *R*(*t*) < 1

Low: can reopen further by <25% with up to 2× increase in testing while maintaining *R*(*t*) < 1

Moderate: requires 2× contact tracing or reversal of reopening by 25% to bring and maintain *R*(*t*) < 1

High: requires multiple interventions (2× testing, 2× contact tracing and reversal of reopening by 25%) to bring and maintain *R*(*t*) < 1

Very High: combining 2× testing, 2× contact tracing and reversal of reopening by 50% is needed to bring and maintain *R*(*t*) < 1

We use *R*(*t*) instead of *R*_{eff}(*t*), to minimize the impact of heterogeneity and uncertainty in the value of *S*(*t*)/*N* on our results. Thus, requiring *R*(*t*) < 1 provides greater assurance of state-wide control of the epidemic.

### Software and code

Posterior distributions were sampled by Markov chain Monte Carlo (MCMC) simulation in MCSim v.6.1.0 using Metropolis-within-Gibbs sampling^{36}. For each US state, four chains of 200,000 iterations each were run, with the first 20% of iterations discarded and 500 posterior samples saved for analysis. For each parameter, interchain and intrachain variability were compared to assess convergence, with a potential scale reduction factor *R* ≤ 1.2 considered converged^{37}. Additional analysis of model outputs was performed in RStudio v.1.2.1335 (ref. ^{38}) with R v.3.6.1 (ref. ^{39}).
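The convergence criterion can be illustrated with the classic Gelman and Rubin estimator of the potential scale reduction factor, sketched below in Python for a single parameter. This is a generic textbook version, not necessarily the computation performed by MCSim, and the function name `potential_scale_reduction` and the toy chains are hypothetical.

```python
from statistics import mean, variance


def potential_scale_reduction(chains):
    """Gelman-Rubin factor for equal-length chains of one parameter."""
    n = len(chains[0])                                   # samples per chain
    chain_means = [mean(chain) for chain in chains]
    within = mean([variance(chain) for chain in chains]) # W: mean within-chain variance
    between = n * variance(chain_means)                  # B: between-chain variance
    pooled = (n - 1) / n * within + between / n          # pooled variance estimate
    return (pooled / within) ** 0.5


mixed = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
stuck = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]

for label, chains in (("mixed", mixed), ("stuck", stuck)):
    value = potential_scale_reduction(chains)
    print(label, round(value, 3), "converged" if value <= 1.2 else "not converged")
```

Chains that explore the same region give a factor close to (or slightly below) 1, whereas chains centred on different values inflate the between-chain term and push the factor well above the 1.2 threshold.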

### Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.