Scientists including Professor Ferguson from Imperial College London published articles on 17th January and 22nd January estimating the number of cases of a new coronavirus, 2019-nCoV, in Wuhan, China. How did they estimate the number based on cases detected overseas? Let us have a look at their calculation formula.
They used the same calculation formula in the two reports. Let us use the second report to illustrate how they arrive at the estimate.
The scientists estimated that “a total of 4,000 cases of 2019-nCoV in Wuhan City (uncertainty range: 1,000 – 9,700) had onset of symptoms by 18th January (the last reported onset date of any case).” “As of 4am 21st January (Beijing Time), 440 cases (including nine deaths) have been confirmed across 13 provinces in China, plus suspected cases in multiple other provinces. As of 9:00 GMT 22nd January, 7 confirmed cases in travellers from Wuhan with symptom onset on or before the 18th January were detected outside mainland China, in Thailand (3 cases), Japan (1 case), South Korea (1 case), Taiwan (1 case) and the United States (1 case).”
Their estimation is based on the following assumptions:
• “Wuhan International Airport has a catchment population of 19 million individuals.
• There is a mean 10-day delay between infection and detection, comprising a 5-6 day incubation period and a 4-5 day delay from symptom onset to detection/hospitalisation of a case (the cases detected in Thailand and Japan were hospitalised 3 and 7 days after onset, respectively).
• Total volume of international travel from Wuhan over the last two months has been about 3,300 passengers per day. This estimate is derived from the 3,418 foreign passengers per day in the top 20 country destinations based on 2018 IATA data, and uses 2016 IATA data held by Imperial College London to correct for the travel surge at Chinese New Year present in the latter data (which has not happened yet this year) and for travel to countries outside the top 20 destination list.”
Calculation Formula:
The total number of cases = number of cases detected overseas / probability any one case will be detected overseas (p)
where the probability any one case will be detected overseas (p) = daily probability of international travel x mean time to detection of a case.
This formula is incorrect, for two reasons. (a) If the mean time to detect a case goes up, the probability that any one case will be detected goes up according to this formula, and hence the estimated total number of cases goes down (because the probability is the denominator of the first division), whereas we would expect the opposite to be true. (b) If the probability of a patient being overseas were 100%, a mean time to detection of more than one day would, according to this formula, give a probability higher than 100%, which is clearly impossible.
The correct formula should take the difference between 100% and the probability that the case will not be detected overseas, which is (1-(1/t))^d, where t is the mean time to detection after the incubation period (assuming a very low probability of detection during the incubation period) and d is the expected number of days any particular patient has been overseas with their incubation period completed.
The expected number of days any particular patient has been overseas at all will be the daily probability of international travel multiplied by half the number of days since January 1st, i.e. probability of international travel x 9. This assumes that passenger flights were evenly distributed between January 1st and January 17th, that hardly any travellers returned during this time, and that the virus spread quickly enough for us to assume, for the purposes of this calculation, that everyone in Wuhan who was going to catch it did so by January 1st. The expected number of days they will have been overseas post-incubation, however, depends on the incubation period. If it is 5 days, and everyone who flew between January 1st and January 5th ended their incubation on the 5th, then that 5/18 of passengers will have had 13 days overseas post-incubation and the other 13/18 of the passengers will have had an average of 6.5 days, so the average overseas post-incubation days per passenger is 13*5/18 + 6.5*13/18 = 8.31.
So (p) = 1 – ((1 – (1/t)) ^ (Ptravel * 8.31)).
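The averaging step and the shape of this alternative formula can be sketched in Python. The function names are made up for illustration; the 18-day window, the 5-day incubation period and the even spread of flights are the assumptions stated above, not figures from the report.

```python
def mean_post_incubation_days(window_days=18, incubation_days=5):
    """Average days overseas after incubation, for flights spread evenly over the window."""
    days_left = window_days - incubation_days      # 13 days remain once incubation ends
    early_share = incubation_days / window_days    # passengers who flew during incubation
    late_share = days_left / window_days           # passengers who flew afterwards
    # Early flyers get all the remaining days; later flyers get half of them on average.
    return days_left * early_share + (days_left / 2) * late_share


def p_alternative(daily_travel_prob, t, post_incubation_days):
    """p = 1 - (1 - 1/t)^(Ptravel * d), following the formula above."""
    return 1 - (1 - 1 / t) ** (daily_travel_prob * post_incubation_days)


d = mean_post_incubation_days()
print(round(d, 2))  # 8.31

# Objection (b): with a travel probability of 100%, the simple product exceeds 1,
# while the alternative expression stays a valid probability.
print(1.0 * 10)                          # 10.0
print(p_alternative(1.0, 5, d) <= 1.0)   # True
```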
In the report's formula, the daily probability of international travel = daily outbound international travellers from Wuhan / catchment population of Wuhan International Airport
Finally, the mean time to detection can be approximated by:
incubation period + mean time from onset of symptoms to detection
Putting the numbers into their formula, we have
Total number of estimated cases = 7 detected overseas / ((3,301 passengers / 19,000,000 catchment population) x 10 days)
giving an estimated number of 4,029 (the difference from the report's figure of 4,000 is most probably due to rounding during the multiplication and division).
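The arithmetic of the report's formula can be checked with a few lines of Python. This is only a sketch: the function and parameter names are illustrative and not taken from the report, and the inputs are the figures quoted above.

```python
def report_estimate(cases_overseas, daily_travellers, catchment, days_to_detection):
    """Total cases = overseas cases / p, with p = daily travel probability x days."""
    daily_travel_prob = daily_travellers / catchment
    p_detected_overseas = daily_travel_prob * days_to_detection
    return cases_overseas / p_detected_overseas


estimate = report_estimate(
    cases_overseas=7,
    daily_travellers=3301,
    catchment=19_000_000,
    days_to_detection=10,
)
print(round(estimate))  # 4029
```

Because the estimate is proportional to the number of overseas cases, each additional detected case adds about 576 to the total under these inputs.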
Putting the numbers into the formula we derived, we have an estimated number of
7 cases / (1 - ((1 - (1 / (5 days post-incubation))) ^ ((3301.0/19000000) * 8.31))) = about 22,000.
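The same calculation can be written as a small Python sketch to check the arithmetic. The variable names are illustrative; t = 5 and d = 8.31 are the values assumed in the derivation above rather than figures from the report.

```python
daily_travel_prob = 3301 / 19_000_000   # daily outbound travellers / catchment population
t = 5                                   # assumed mean days to detection after incubation
d = 8.31                                # average post-incubation days overseas, from above

# Probability that any one case is detected overseas under the alternative formula.
p = 1 - (1 - 1 / t) ** (daily_travel_prob * d)
alternative_estimate = 7 / p

print(round(alternative_estimate))  # roughly 21,700, i.e. about 22,000 after rounding
```

The result is sensitive to t: a longer assumed detection time makes the base (1 - 1/t) closer to 1, which lowers p and raises the estimate.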
The report gives a 95% confidence interval for the number of 2019-nCoV cases in Wuhan, with a minimum of about 1,700 and a maximum of about 9,800. According to the report, confidence intervals “can be calculated from the observation that the number of cases detected overseas, X, is binomially distributed as Bin(p,N), where p = probability any one case will be detected overseas, and N is the total number of cases. N is therefore a negative binomially distributed function of X.” These results are maximum likelihood estimates obtained using this negative binomial likelihood function and the incorrect formula discussed above.
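To illustrate how an interval can follow from the binomial observation, here is a minimal Python sketch. The quoted passage does not spell out the interval construction, so the likelihood-ratio interval used below is an assumption, as are the variable names and the scan range; p is the report's probability computed from the figures quoted earlier.

```python
import math

X = 7                            # cases detected overseas
p = (3301 / 19_000_000) * 10     # report's probability of overseas detection


def log_likelihood(n):
    """Binomial log-likelihood of observing X detections among n total cases."""
    return (
        math.lgamma(n + 1) - math.lgamma(X + 1) - math.lgamma(n - X + 1)
        + X * math.log(p) + (n - X) * math.log(1 - p)
    )


# Scan candidate totals N; the upper limit of 30,000 is an arbitrary assumption.
candidates = range(X, 30_001)
scores = {n: log_likelihood(n) for n in candidates}
mle = max(scores, key=scores.get)

# Likelihood-ratio interval: keep every N within 1.92 log-likelihood units of the
# maximum (half of 3.84, the 95% chi-squared quantile with one degree of freedom).
threshold = scores[mle] - 1.92
inside = [n for n in candidates if scores[n] >= threshold]
lower, upper = inside[0], inside[-1]

print(mle, lower, upper)
```

On these inputs the sketch returns a maximum likelihood estimate of about 4,029, a lower bound near 1,700 and an upper bound near 7,800. Any difference from the range quoted from the report may come from the interval construction assumed here.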
Later on, we may want to recalculate the estimated number of new coronavirus cases with the above formula and compare it with the figures announced by the local government. Before doing that, we need to consider a couple of things. Are the overseas cases still exported only from Wuhan? If any other city in China is involved by that time, both the catchment population and the number of flights to consider will be affected. Moreover, by the time you do the calculation, have the local authorities started preventive measures that restrict local people from travelling? If so, this would certainly reduce the reliability of a result based on the number of cases detected overseas.
Ideally, the calculation formula should be applied 4-5 days (allowing for the 4-5 day delay from symptom onset to detection) before the local government started restricting local people from travelling overseas.
The report also mentions some factors which could affect the estimated number of cases. If you would like to know more, the two reports are available at the following archived link: https://web.archive.org/web/20200123095105/http://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/news--wuhan-coronavirus/