We describe our nonparametric approach for contact rate estimation in the context of the SIR model used to generate the results described in the paper. This is done without loss of generality. As also clear in the sequel, all the ideas and mathematical results here obtained then apply to any compartmental model, hence defining the entire nonparametric class illustrated in Fig. 1.

### Time-varying SIR

Consider, without loss of generality, a population normalized to 1. According to SIR models, the notation S(t) indicates the susceptible people (who can be infected), I(t) denotes the infected people (who have been infected and are able to spread the infection), while R(t) represents the removed people (who were infected but then either healed or died). Healed people acquire immunity, so that S(t) is a decreasing function. In addition, susceptible people can be infected through dynamics depending on the number of contacts between infected and susceptible ones, i.e.

\begin{aligned} {\dot{S}}(t)=-a(t)S(t)I(t), \ a(t)>0. \end{aligned}

Above, a(t) is the contact rate which is strongly related to the adopted restraints. The function I(t) can both increase, because of susceptible who become infected, and decrease, because of healing and/or death. The decreasing rate is proportional to I(t), i.e.

\begin{aligned} {\dot{I}}(t)=a(t)S(t)I(t)-bI(t), \ b>0. \end{aligned}

Finally, R(t) increases accordingly to the healing/death rate, i.e.

\begin{aligned} {\dot{R}}(t)=bI(t). \end{aligned}

In the above equations, b describes average time for healing/death. In absence of effective cures it can be assumed independent of time. We are only interested in positive solutions, i.e. $$S(t),I(t),R(t) \ge 0$$, which are necessarily bounded since

\begin{aligned} S(t)+I(t)+R(t)=1, \ \forall t \in {{\mathbb {R}}}. \end{aligned}
(4)

By defining $$q(t):=\frac{b(t)}{a(t)}>0$$, which corresponds to the inverse of the reproduction number $$\gamma (t)$$ at the beginning of the epidemic, it easily follows that

\begin{aligned} \begin{array}{lcl} \frac{d}{dt} \{ I(t)+S(t)-q(t)\ln [S(t)]\}&{}=&{}-{\dot{q}}(t)\ln [S(t)] \ \Rightarrow \\ I(t)+S(t)-q(t)\ln [S(t)]&{}=&{}1-\int _{-\infty }^t \ {\dot{q}}(\tau )\ln [S(\tau )] d\tau . \end{array} \end{aligned}

The integral on the lhs is evaluated for $$t \rightarrow -\infty$$, when no infected and, hence, no removed are present. So, one has $$S(-\infty )=1$$ and $$I(-\infty )=R(-\infty )=0$$. By defining

\begin{aligned} \delta (t):=-\int _{-\infty }^t \ {\dot{q}}(\tau ) \ \ln S(\tau ) d\tau \end{aligned}
(5)

I(t) and R(t) become the following functions of S(t):

\begin{aligned} R(t)= & {} g(S(t),\delta (t)):=-\delta (t)-q(t)\ln [S(t)], \nonumber \\ I(t)= & {} f(S(t),\delta (t)):=1+\delta (t)-S(t)+q(t) \ln [S(t)]. \end{aligned}
(6)

This permits us to rewrite the differential equations in terms of q(t) as

\begin{aligned} {\dot{I}}(t)=a(t)I(t)[S(t)-q(t)], \ {\dot{S}}(t)=-a(t)S(t)I(t). \end{aligned}
(7)

We have seen that the function a(t) has to describe how the level of people social interactions evolves in time, accounting also for lockdowns effects. Before the lockdown’s instant $$t^*$$, such a function is assumed constant. Under these stationary assumptions, i.e. in absence of lockdowns, the function q(t) is thus equal to the constant q and one has $$\delta (t)=0$$. This implies

\begin{aligned} I(t)= & {} f(S(t),0)=1-S(t)+q \ln [S(t)] \end{aligned}
(8)
\begin{aligned} R(t)= & {} g(S(t),0)=-q \ln [S(t)]. \end{aligned}
(9)

Now, let the instant $$t=0$$ be the beginning of our experiment, with $$S(0) \approx 1$$. The lockdown’s instant is instead denoted by $$t^*>0$$ (corresponding to March 9, 2020, in Italy). Using (9), the following approximated relationship is obtained

\begin{aligned} S(0) \approx 1+\frac{I(0)}{q-1}. \end{aligned}
(10)

### A parametric class of time-varying SIR

We start introducing a parametric class of time-varying SIR models instrumental for the building of the nonparametric approach. Our data model is

\begin{aligned} {\dot{S}}(t)= & {} -a(t)S(t)I(t) \\ {\dot{I}}(t)= & {} a(t)S(t)I(t)-bI(t) \\ {\dot{R}}(t)= & {} bI(t)\\ y(t)= & {} \frac{1}{H}I(t). \end{aligned}

Note that the measurable output y(t) is proportional to the number of infected people through the inverse of the unknown parameter H. The state dynamics then depend on the parameter b and the time-varying contact rate a(t).

Since the system is assumed to be stationary before the lockdown’s instant, initial conditions are

\begin{aligned} S(0)= & {} 1+\frac{Hy(0)}{q(0)-1}, \quad q(0)=\frac{b}{a(0)}\\ I(0)= & {} Hy(0)\\ R(0)= & {} 1-I(0)-S(0) \end{aligned}

where in the expression of S(0) we have assumed exact the approximation (10).

While a(t) is constant before the lockdown’s instant $$t^*$$, next we assume that it has a discontinuity in $$t^*$$. Then, during the restrictions, its value could still decrease e.g. due to people’s growing awareness of infection risk or because restrictions can be further strengthened after the first lockdown. One simple parametric time-course for a(t) is given by

\begin{aligned} \quad \qquad a(t)= \left\{ \begin{array}{cl} a_1 &{} \quad \text{ if } \ \ t< t^* \\ a_2 e^{-c(t-t^*)} &{} \quad \text{ if } \ \ t^* \le t \le t_{end} \end{array} \right. \end{aligned}
(11)

where $$t_{end}$$ denotes the end of the lockdown (May 18, 2020, in Italy). Note that a(t) would tend to zero if the lockdown would never end ($$t_{end}=+\infty$$).

The above model does not follow the paradigm depicted in Fig. 1 because the contact rate is not described through an infinite-dimensional model. It depends on parameters which are the components of the following finite-dimensional parameter vector

\begin{aligned} \theta =\left[ a_1 \ a_2 \ b \ c \ H\right] . \end{aligned}

For our future developments, we need to prove that $$\theta$$ is globally identifiable from data, i.e. it can be reconstructed under the ideal assumption of knowledge of the entire output trajectory y(t). Using differential algebra tools, e.g. see49, the system leads to the following characteristic set:

\begin{aligned} \frac{{\dot{a}}(t)}{H}{\dot{y}}(t)y(t) – \frac{a(t)}{H} \ddot{y}(t) y(t) +\frac{a(t)}{H} {\dot{y}}^2(t) + a^2(t){\dot{y}}(t)y^2(t) – {\dot{y}}(t)y(t) \frac{ba(t)}{H}. \end{aligned}

If $$t < t^*$$, one has $$a(t)=a_1$$ and $${\dot{a}}(t)=0$$ so that the coefficients of the characteristic set become

\begin{aligned} \frac{a_1}{H}, a_1^2, \frac{ba_1}{H}. \end{aligned}

Hence, since all the three parameters are known to be positive, the values of $$a_1,H,b$$ can be univocally determined. If $$t \ge t^*$$, one has $$a(t)=a_2e^{-c\tau },{\dot{a}}(t)=-ca_2e^{-c\tau }$$. If $$\tau _1=t_1-t^*$$ and $$\tau _2=t_2-t^*$$ are two distinct and known time-instants, the characteristic set permits e.g. to reconstruct

\begin{aligned} \frac{a_2e^{-c\tau _1}}{H},\frac{a_2e^{-c\tau _2}}{H} \end{aligned}

and this fact, combined with the knowledge of H, permits to achieve also $$a_2$$ and c.

Having shown that such parametric time-varying SIR is globally identifiable, we can use least squares to estimate $$\theta$$. Let $$y_{\theta }(t_i)$$ denote the output of the SIR model as a function of the unknown parameter vector. Then, two estimators are now considered. The first one uses a subclass of models defined by imposing $$c=0$$, making the contact rate description equal to (3). The resulting estimator is

\begin{aligned} {\hat{\theta }} = \arg \min _{\theta \ s.t. \ c=0} \sum _i (y(t_i) – y_{\theta }(t_i))^2 \end{aligned}

and defines exactly the parametric estimates reported with dotted lines in Figs. 6 and 7.

The other estimator exploits the entire class and is thus given by

\begin{aligned} {\hat{\theta }} = \arg \min _{\theta } \sum _i (y(t_i) – y_{\theta }(t_i))^2 \end{aligned}

Using intensive care data in Lombardy, the estimates of the components of $$\theta$$ turn out

\begin{aligned} {\hat{a}}_1=0.27, \ {\hat{a}}_2=0.19, \ \hat{b=0.076}, \ {\hat{c}}=0.011, \ {\hat{H}}=1467.8. \end{aligned}

while, assuming Gaussian noise, the maximum likelihood estimate of the noise variance is $${\hat{\sigma }}^2$$=3.5e-12. This corresponds to a standard deviation equal to 18.8 on the intensive care data not normalized w.r.t. whole population in Lombardy which were shown in the right panel of Fig. 2. Thus, the estimate of a(t) for $$t\ge t^*$$ is $$0.19e^{-0.011t}$$ and provides a first hint as how the contact rate decreased during the lockdown in Lombardy. But it is questionable if a mono-exponential is suited to describe a so complex phenomenon. For this reason, in the next section this simple model will be generalized through nonparametric arguments.

### Nonparametric model of the contact rate

We will assume that a(t) belongs to a special class of Hilbert spaces $${\mathcal {H}}$$ called Reproducing kernel Hilbert spaces (RKHSs)50,51. To introduce them, recall that, if $${\mathcal {X}}$$ denotes the function domain, $$K:{\mathcal {X}}\times {\mathcal {X}} \rightarrow {\mathbb {R}}$$ is called positive definite kernel if, for any finite natural number p, it holds that

\begin{aligned} \sum _{i=1}^{p}\sum _{j=1}^{p}c_ic_j K(x_i,x_j) \ge 0, \quad \forall (x_k,c_k) \in \left( {\mathcal {X}},{\mathbb {R}}\right) , \quad k=1,\ldots , p. \end{aligned}

One can then prove that any RKHS is in one-to-one correspondence with a positive definite kernel and inherits the properties of the kernel, e.g. continuous kernels induce spaces of continuous functions. For our developments, the following fact is also important. Given a kernel K, the kernel section $$K_x$$ centered at x is the function $${\mathcal {X}} \rightarrow {\mathbb {R}}$$ defined by

\begin{aligned} K_x(y) = K(x,y) \quad \forall y \in {\mathcal {X}}. \end{aligned}

Then, one has that any function in $${\mathcal {H}}$$ is a linear combination of a possibly infinite number of kernel sections52.

The question is now which RKHS can be conveniently introduced as hypothesis space for a(t). During a lockdown, this function is expected to have a smooth decay as time progresses. We can then consider the so called first-order stable spline kernel defined by

\begin{aligned} K(t,\tau ) =\lambda e^{-\alpha \max {(t,\tau )}}, \quad 0< \alpha <1, \ \lambda \ge 0 \end{aligned}
(12)

which was originally introduced in the literature to describe impulse responses of stable systems53. It depends on the positive scale factor $$\lambda$$ and the scalar $$\alpha$$ which regulates the decay rate of the functions contained in the associated RKHS. We will fix these two parameters by exploiting the estimates of the mono-exponential decay obtained in the previous section. In particular, we set $$\lambda ={\hat{a}}_2^2$$ and $$\alpha =2{\hat{c}}$$. Thinking also of the Bayesian interpretation of regularization, where the kernel is seen as a covariance54, this makes our space in some sense centred around exponentials of amplitude $${\hat{a}}_2$$ and decay rate $${\hat{c}}$$. This fully defines the kernel and, hence, the associated RKHS $${\mathcal {H}}$$. It can be proved that such stable spline space is infinite-dimensional and able to approximate any continuous map. Our nonparametric model for a(t) is then defined by

\begin{aligned} \quad \qquad a(t)= \left\{ \begin{array}{cl} a &{} \quad \text{ if } \ \ t< t^* \\ f(t-t^*) \ \text{ with } \ f \in {\mathcal {H}} &{} \quad \text{ if } \ \ t^* \le t \le t_{end} \end{array} \right. \end{aligned}
(13)

So, the overall model now follows the paradigm in Fig. 1 with $$\theta =[a \ b \ H]$$ and the a(t) defined by a and $$f \in {\mathcal {H}}$$.

Estimation of f and $$\theta$$ is however ill-posed. This problem is circumvented using regularization in $${\mathcal {H}}$$ with penalty term defined by the RKHS norm $$\Vert \cdot \Vert _{{\mathcal {H}}}$$. Specifically, letting $$y_{f,\theta }(t_i)$$ be the output of the SIR model as a function of f and $$\theta$$, our estimator is given by

\begin{aligned} ({\hat{f}},{\hat{\theta }}) = \arg \min _{f \in {\mathcal {H}},\theta \in {\mathbb {R}}^3} \sum _i \frac{(y(t_i) – y_{f,\theta }(t_i))^2}{{\hat{\sigma }}^2} + \Vert f \Vert _{{\mathcal {H}}}^2 \end{aligned}
(14)

where $${\hat{\sigma }}^2$$ is the maximum likelihood estimate of the noise variance already mentioned in the previous section. The objective in (14) includes two different components. The first one is a quadratic loss and penalizes values of $$\theta$$ and f associated to compartmental models unable to well describe the observational data. The second one is the regularizer, defined by the RKHS norm, which restores well-posedness. It excludes non plausible solutions for the contact rate a(t), e.g. defined by too irregular temporal profiles of f(t). The problem thus corresponds to a nonlinear version of a regularization network36,37. Its solution exists since, according to the results in55, optimization can be restricted to a compact set of the continuous functions equipped with the sup-norm where the map $$y_{f,\theta }$$ is continuous (see also Appendix of56) and the regularizer is lower semicontinuous.

However, differently from the classical machine learning problems where f is linearly related to data, the nonlinearities present in our compartmental model makes the solution (14) not available in closed-form. To compute it, the following strategy has been then adopted. By fixing an integer M, we define the following representation $$f(t)= \sum _i^{M} c_i K_{t_i}(t)$$ given by sum of kernel sections over a uniform grid of temporal instants $$t_i$$. Next, a sequence of solutions of the problem (14), with the objective restricted to these finite-dimensional subspaces of dimension M, is obtained for increasing values of M. This is done until reaching convergence (which is guaranteed still exploiting the results in55).

The stable spline kernel is useful to describe the contact rate during the lockdown, since it embeds information on smooth decay of the reproduction number. Since $$\gamma (t)$$ after the end of the lockdown is not expected to decay, and could also increase, to obtain the results depicted in Fig. 9 we have used the Laplacian kernel37

\begin{aligned} K(t,\tau ) =\lambda e^{-\frac{| t-\tau |}{\eta }}, \quad \lambda ,\eta > 0 \end{aligned}
(15)

which embeds information only on continuity of the time-course. System initial conditions at the end of the lockdown are set to the estimates obtained by the procedure reported above. Next, a parametric model with constant contact rate a(t) is fitted to data, obtaining also a recalibration of the noise variance, and the scale factor $$\lambda$$ is set to its squared value. The kernel width $$\eta$$ is then estimated through the concept of Bayesian evidence, exploiting the stochastic interpretation of (14), as also discussed in the next paragraph, and using the Laplace approximation to compute the model posterior probability57.

Finally, to complement the estimates with confidence intervals, a Bayesian framework has been adopted resorting to the stochastic interpretation of regularization and the duality between RKHSs and Gaussian processes54. The noise affecting the data is assumed to be Gaussian. The components of $$\theta$$, and also the noise variance, are seen as mutually independent random variables and are assigned poorly informative prior distributions, in practice including only nonnegativity information. The contact rate for $$t \ge t^*$$ is then seen as a Gaussian process defined by $$f(t)= \sum _i^{M} c_i K_{t_i}(t)$$. Here, the $$c_i$$ are the components of the zero-mean Gaussian vector c whose covariance matrix is the inverse of $${\bar{K}} \in {\mathbb {R}}^{M \times M}$$ with (ij) entry given by $$K(t_i,t_j)$$. In this way, the function f(t) sampled on the $$t_i$$ is indeed Gaussian with covariance matrix $${\bar{K}}$$. Markov chain Monte Carlo has been then used to reconstruct the posterior in sampled form38. In particular, a random walk Metropolis has been implemented. Covariances of the increments have been tuned through a pilot analysis to obtain an acceptance rate around $$30\%$$ and then 4 million iterations have been performed.