Study on the propagation model of COVID-19 variants in the United Kingdom

Sentences highlighted in red are not yet finished.

Estimating the parameters of a mixture of 3-parameter logistic models with the EM algorithm

Birnbaum¹ introduced the 3-parameter logistic model by adding parameters to the traditional logistic model. The 3-parameter logistic model is more flexible and easier to apply than the traditional logistic model. The model can be expressed as:

Its derivative is:

The 3-parameter mixture model is a linear combination of several 3-parameter logistic models whose weight coefficients sum to 1. This model is easy to fit and can be used to estimate . The mixture model is:

where .
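To make the curve, its derivative and the weighted combination concrete, here is a minimal sketch. It assumes the common growth-curve parameterization f(t) = K / (1 + exp(-r (t - t0))) with final size K, growth rate r and midpoint t0; the formulas in this study may use a different parameterization, and the names `logistic3`, `logistic3_rate` and `mixture_curve` are hypothetical.

```python
import math
from typing import Sequence, Tuple

# (K, r, t0) under the ASSUMED parameterization K / (1 + exp(-r (t - t0))).
Params = Tuple[float, float, float]

def logistic3(t: float, K: float, r: float, t0: float) -> float:
    """Assumed 3-parameter logistic curve K / (1 + exp(-r (t - t0)))."""
    return K / (1.0 + math.exp(-r * (t - t0)))

def logistic3_rate(t: float, K: float, r: float, t0: float) -> float:
    """Derivative K r e^{-r(t-t0)} / (1 + e^{-r(t-t0)})^2, rewritten as r f (1 - f / K)."""
    f = logistic3(t, K, r, t0)
    return r * f * (1.0 - f / K)

def mixture_curve(t: float, weights: Sequence[float], params: Sequence[Params]) -> float:
    """Linear combination of 3-parameter logistic curves whose weights sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * logistic3(t, *p) for w, p in zip(weights, params))

if __name__ == "__main__":
    K, r, t0 = 1000.0, 0.2, 30.0
    h = 1e-5
    for t in (10.0, 30.0, 50.0):
        # Compare the analytic derivative with a central finite difference.
        numeric = (logistic3(t + h, K, r, t0) - logistic3(t - h, K, r, t0)) / (2 * h)
        print(t, logistic3_rate(t, K, r, t0), numeric)
```

Under this parameterization the rate peaks at t = t0 with value r K / 4 (for K = 1000 and r = 0.2, that is 50 per unit of time).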

To estimate the parameters of the mixture model, we convert it into the following probability density function:

Let ; then the equation above can be rewritten as:

where  satisfies the restrictions , and , ,  are the parameters of the mixture model to be estimated.
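If the curve is parameterized as K / (1 + exp(-r (t - t0))) (an assumption; the formulas are not shown in the text), dividing its derivative by K gives the logistic density with location μ = t0 and scale s = 1/r. That density is nonnegative and integrates to 1, so a mixture with nonnegative weights summing to 1 is again a density. The sketch below uses the hypothetical names `logistic_pdf` and `mixture_pdf`.

```python
import math
from typing import Sequence

def logistic_pdf(x: float, mu: float, s: float) -> float:
    """Logistic density e^{-z} / (s (1 + e^{-z})^2) with z = (x - mu) / s."""
    # The density is symmetric in z, so using |z| keeps exp() from overflowing.
    e = math.exp(-abs((x - mu) / s))
    return e / (s * (1.0 + e) ** 2)

def mixture_pdf(x: float, alphas: Sequence[float], mus: Sequence[float],
                ss: Sequence[float]) -> float:
    """Weighted sum of logistic densities; alphas must be >= 0 and sum to 1."""
    if min(alphas) < 0 or abs(sum(alphas) - 1.0) > 1e-9:
        raise ValueError("mixture weights must be non-negative and sum to 1")
    return sum(a * logistic_pdf(x, m, s) for a, m, s in zip(alphas, mus, ss))

if __name__ == "__main__":
    alphas, mus, ss = [0.3, 0.7], [20.0, 60.0], [4.0, 6.0]
    h, n = 0.05, 10_000                       # trapezoid rule on [-200, 300]
    xs = [-200.0 + i * h for i in range(n + 1)]
    vals = [mixture_pdf(x, alphas, mus, ss) for x in xs]
    print(h * (sum(vals) - 0.5 * (vals[0] + vals[-1])))
```

The trapezoid-rule integral printed by the usage block should be very close to 1.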

Several methods can be used to estimate the parameters of the 3-parameter mixture model; maximum likelihood estimation (MLE), usually computed with the EM algorithm, is the most widely used. The EM (Expectation-Maximization) algorithm is a standard iterative method for mixture models: no iteration decreases the likelihood, and under mild regularity conditions the estimates converge to a stationary point (typically a local maximum) of the likelihood, so the result can depend on the initial values.

Dempster, Laird and Rubin introduced the EM algorithm in 1977. It is an iterative algorithm that alternates between two steps: the E-step computes the expectation of the complete-data log-likelihood given the observed data and the current parameter estimates (here, the posterior probability that each sample belongs to each component), and the M-step re-estimates the parameters by maximizing that expectation. The estimation of the parameters , ,  via the EM algorithm proceeds as follows:

Set the value of , in this study. Initialize the parameters , , with , , .
E-step: calculate the likelihood function

Let  and suppose the sample series contains  samples; the joint probability density function of  is:

Its logarithm is:
M-step: calculate the posterior probabilities and the updated , , 

For convenience, we first calculate , i.e. :

Because  satisfies the restriction , we use a Lagrange multiplier to solve for . The Lagrangian can be written as:

Let

Denote , where  is the Bayesian posterior probability; then

Clearly, . In addition, considering , we obtain

The solution is . Substituting it back into the original equation, we have

Next, calculate , that is:

Let

In a nutshell,
Repeat the E-step and M-step until the parameters converge to the required precision.
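The whole procedure can be sketched as follows, assuming each component is a logistic density with location μ_k and scale s_k. The E-step computes the log-likelihood and the posterior probabilities in log space (log-sum-exp) to avoid underflow, and the weights use the Lagrange-multiplier update α_k = (1/n) Σ_i γ_ik. The logistic density has no closed-form maximizer for μ_k and s_k, so this sketch does not reproduce the update formulas above; it substitutes a few gradient-ascent steps on the γ-weighted log-likelihood (a generalized EM variant). The function name `em_logistic_mixture` and the parameters `inner_steps`, `lr` and `tol` are hypothetical choices.

```python
import math
import random
from typing import List, Sequence

def _sigmoid(z: float) -> float:
    """Numerically stable 1 / (1 + e^{-z})."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def _logpdf(x: float, mu: float, s: float) -> float:
    """log of the logistic density; symmetric in z, so |z| avoids overflow."""
    z = abs((x - mu) / s)
    return -z - math.log(s) - 2.0 * math.log1p(math.exp(-z))

def em_logistic_mixture(xs: Sequence[float], alphas: Sequence[float],
                        mus: Sequence[float], ss: Sequence[float],
                        n_iter: int = 100, inner_steps: int = 10,
                        lr: float = 0.5, tol: float = 1e-8):
    """Generalized-EM sketch; returns (alphas, mus, ss, loglik of the last E-step)."""
    alphas, mus, ss = list(alphas), list(mus), list(ss)
    n, k = len(xs), len(alphas)
    prev = ll = -math.inf
    for _ in range(n_iter):
        # E-step: log-likelihood and posterior probabilities gamma[i][j].
        gamma: List[List[float]] = []
        ll = 0.0
        for x in xs:
            t = [math.log(alphas[j]) + _logpdf(x, mus[j], ss[j]) for j in range(k)]
            peak = max(t)
            log_norm = peak + math.log(sum(math.exp(v - peak) for v in t))
            ll += log_norm
            gamma.append([math.exp(v - log_norm) for v in t])
        if ll - prev < tol:  # stop once the log-likelihood stops improving
            break
        prev = ll
        # M-step, weights: Lagrange-multiplier solution alpha_j = (1/n) sum_i gamma_ij.
        nk = [sum(row[j] for row in gamma) for j in range(k)]
        alphas = [v / n for v in nk]
        # M-step, location and scale: gradient ascent on
        # sum_i gamma_ij * log f(x_i; mu_j, s_j), parameterized by (mu, log s).
        for j in range(k):
            mu, log_s = mus[j], math.log(ss[j])
            for _ in range(inner_steps):
                s = math.exp(log_s)
                g_mu = g_ls = 0.0
                for x, row in zip(xs, gamma):
                    z = (x - mu) / s
                    u = 2.0 * _sigmoid(z) - 1.0      # equals tanh(z / 2)
                    g_mu += row[j] * u               # s * d(log f)/d(mu)
                    g_ls += row[j] * (z * u - 1.0)   # d(log f)/d(log s)
                mu += lr * s * g_mu / nk[j]
                # Clip the log-scale step so a poor start cannot blow s up.
                log_s += max(-1.0, min(1.0, lr * g_ls / nk[j]))
            mus[j], ss[j] = mu, math.exp(log_s)
    return alphas, mus, ss, ll

if __name__ == "__main__":
    # Synthetic data only (not study data): inverse-CDF sampling of two logistics.
    rng = random.Random(0)
    def draw(mu: float, s: float, n: int) -> List[float]:
        us = [rng.uniform(1e-12, 1 - 1e-12) for _ in range(n)]
        return [mu + s * math.log(u / (1.0 - u)) for u in us]
    data = draw(20.0, 2.0, 200) + draw(60.0, 4.0, 200)
    print(em_logistic_mixture(data, [0.5, 0.5], [10.0, 70.0], [5.0, 5.0]))
```

Because the M-step only takes a few ascent steps, each iteration is meant to improve, rather than fully maximize, the expected log-likelihood; as with any EM variant, the result can depend on the initial values.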

k-means time series clustering

Characteristics of the data

Characteristics of the k-means algorithm

k-means is a simple algorithm that partitions a set of data into k clusters by assigning each observation to the cluster whose mean is nearest.
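A minimal sketch of the standard (Lloyd's) k-means iteration on 2-D points with Euclidean distance; the time-series variant used in this study replaces this distance with DTW. The function name `kmeans` is hypothetical, and the initial centers are passed in explicitly to keep runs deterministic.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def kmeans(points: Sequence[Point], centers: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's k-means with Euclidean distance; `centers` holds the k initial means."""
    means: List[Point] = [tuple(c) for c in centers]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster whose mean is nearest.
        new_labels = [min(range(len(means)), key=lambda j: math.dist(p, means[j]))
                      for p in points]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        # Update step: move each mean to the average of its members.
        for j in range(len(means)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mean
                means[j] = (sum(x for x, _ in members) / len(members),
                            sum(y for _, y in members) / len(members))
    return labels, means
```

The loop stops when the assignments no longer change, which is the same termination rule used by the clustering algorithm in this study.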

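A k-means clustering algorithm has been proposed for the trends of multivariate time series. The usual k-means method relies on a distance or dissimilarity measure between multivariate data points and the cluster centroids. Several similarity and dissimilarity measures are also available for multivariate time series; however, which measure is suitable depends on the nature of the series. Moreover, the centroid of a set of time series is not easy to determine. The k-medoids method can cluster time series with one of these dissimilarity measures without using centroids, but it is limited when no appropriate medoid exists. This paper defines the centroid as a common trend and introduces a dissimilarity measure for trends. Based on these centroids and this dissimilarity measure, a k-means-style algorithm for multivariate trends is proposed. The proposed method is applied to the COVID-19 time series of each prefecture.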

The algorithm is as follows:

Initialization: randomly select k regions from all samples as the initial centers of the k clusters. In this study, .
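Assignment step: assign each sample to the cluster whose center is nearest to it, as measured by the DTW distance, so that all samples are partitioned into k groups according to the Voronoi diagram² generated by the cluster centers: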

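where  denotes the -th cluster,  and  denote the centers of the -th and -th clusters, and  denotes the DTW distance between the time series vectors  and , defined as follows: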

where  denotes the Euclidean distance between  and , and  is a warping path that satisfies the following conditions:
- ,
- ,
- .
Update step: recalculate the centroid of the observations assigned to each cluster.
Repeat the assignment and update steps until convergence; the algorithm terminates when the assignments no longer change.
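The steps above can be sketched for equal-length univariate series as follows. Assumptions: DTW uses the absolute difference as the local cost (the Euclidean distance for scalars) with the usual boundary, monotonicity and unit-step conditions on the warping path; the update step takes the pointwise mean of the member series, since the text does not say how centroids are computed under DTW (DTW Barycenter Averaging is a common alternative); when `init` is omitted, the initial centers are drawn at random as in the initialization step. The names `dtw`, `dtw_kmeans`, `init` and `seed` are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence

def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Minimum summed |a_i - b_j| over warping paths from (1, 1) to (n, m)."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0  # boundary condition: the path starts at the first pair
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Monotone unit steps: come from (i-1, j), (i, j-1) or (i-1, j-1).
            D[i][j] = abs(a[i - 1] - b[j - 1]) + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]  # boundary condition: the path ends at the last pair

def dtw_kmeans(series: Sequence[Sequence[float]], k: int,
               init: Optional[Sequence[int]] = None, seed: int = 0,
               max_iter: int = 50):
    """k-means-style clustering with DTW assignment and pointwise-mean centroids."""
    idx = list(init) if init is not None else random.Random(seed).sample(range(len(series)), k)
    centers: List[List[float]] = [list(series[i]) for i in idx]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest center under DTW.
        new_labels = [min(range(k), key=lambda j: dtw(s, centers[j])) for s in series]
        if new_labels == labels:  # assignments no longer change
            break
        labels = new_labels
        # Update step: pointwise mean of the member series (equal lengths assumed).
        for j in range(k):
            members = [s for s, lab in zip(series, labels) if lab == j]
            if members:
                centers[j] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels, centers
```

The pointwise mean is only meaningful when the series share a common time axis and length; with DTW it ignores the alignment, which is why barycenter methods are often preferred.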

Maps

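All maps were drawn in QGIS 3.22 Białowieża using the transverse Mercator projection, with the coordinate reference system EPSG:27700 - OSGB 1936 / British National Grid³. The map data come from the UK Office for National Statistics⁴.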

References

1. Birnbaum, A. L. (1968). Some latent trait models and their use in inferring an examinee's ability. Statistical theories of mental test scores.
2. Voronoi, G. (1908). Nouvelles applications des paramètres continus à la théorie des formes quadratiques. Premier mémoire. Sur quelques propriétés des formes quadratiques positives parfaites. Journal für die reine und angewandte Mathematik (Crelles Journal), 1908(133), 97-102.
3. OSGB36 / British National Grid, United Kingdom Ordnance Survey. EPSG:27700.
4. Ordnance Survey. https://www.ordnancesurvey.co.uk/
