Why and when positivity is misleading
By Daniel Simons
Last Updated: 29 November 2020
The university has repeatedly touted the low “positivity ratio” in press releases, press conferences, mass emails, and editorials. They told community members to watch that number as an indicator of how the campus was doing, and they used it to claim that our rates were better than those at other schools. But, that statistic is meaningless when you’re testing everyone multiple times each week and it doesn’t permit direct comparison of infection rates between communities. Those are misuses—abuses, really—of the positivity statistic, and they are misleading. This page describes what the positivity ratio actually is, when it is (and isn’t) useful to report positivity, and when it is a meaningless or deceptive statistic.
What is the positivity ratio?
The positivity ratio is a statistic used to estimate rates of infection in a broader population when you can’t test everyone. If you have a population of 100,000 people, you can test 1,000 of them and use the positivity ratio for that thousand to estimate how many of the 100,000 are infected. For example, if 20 out of 1,000 tested positive (2%), then to the extent that your sample of 1,000 was typical of that broader community, you could estimate that had you tested everyone, about 2,000 would have tested positive. If you took a different sample of 1,000 on another day, you could get another estimate of that population infection rate as a way to see whether positivity has increased, decreased, or stayed the same. In both cases, you are attempting to estimate the percentage of people in the broader population who are infected. A stable positivity ratio given a comparable number of tests implies that the rate of infection is stable.
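The arithmetic in this example can be sketched in a few lines of Python. This is a minimal illustration that assumes the tested sample is representative of the population; the function name `estimate_infected` is hypothetical.

```python
def estimate_infected(positives: int, sample_size: int, population: int) -> tuple[float, float]:
    """Return (positivity ratio, estimated number infected in the population).

    Assumes the tested sample is representative of the whole population.
    """
    positivity = positives / sample_size  # share of the sample that tested positive
    return positivity, positivity * population  # scale that share up to everyone


positivity, estimate = estimate_infected(positives=20, sample_size=1_000, population=100_000)
print(f"positivity {positivity:.1%}, estimated infected {estimate:,.0f} of 100,000")
```

The scaling step is only as good as the representativeness assumption; if the 1,000 people tested differ systematically from the other 99,000, the estimate inherits that bias.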
Back in March, testing nationwide was focused primarily on symptomatic people, and the question was what percentage of them had Covid-19 as opposed to some other illness. In that context, the population of interest was people with symptoms, and the goal was to determine whether Covid-19 was the cause of those symptoms. For those symptomatic people, the ratio of positive tests to all tests—the “positivity ratio”—gave an estimate of the proportion of sick people who had Covid. As testing has become more widely available, the positivity ratio has been used to estimate the infection rate in entire communities, not just in those who are showing symptoms.
Why is positivity ratio meaningless and misleading when you’re testing everyone?
The key idea of a positivity ratio is that it provides an estimate of the infection rate in an entire population based on a sample of people from that population. Much like a political poll, it assumes that the sample of a relatively small number of people is “representative” or “typical” of the broader population so that whatever you learn from that sample will generalize to everyone. If you could poll everyone (i.e., hold an election), you would no longer need the polls because you would actually know the vote totals. Similarly, if you can actually test everyone, there’s no need to estimate the population infection rate from a sample because you have tested the whole population: You actually know the number you’re trying to estimate.
Put simply, you should just report the proportion of people who are infected rather than the proportion of tests that came back positive. Why is it misleading to report the positivity ratio when you’re testing everyone repeatedly? Let’s work through the logic with a concrete example.
This summer, the university tested all returning athletes repeatedly. They refused to release the testing results at the time, but they eventually provided a summary of those early summer test results on August 3. In their press release, they touted the low positivity ratio among the athletes of 1.9% as of July 30. You might think, based on that 1.9% positivity ratio, that only 1.9% of the returning athletes tested positive for Covid over the summer. But, in reality, 14% of the returning athletes tested positive!
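The 14% figure follows directly from the athlete counts the university eventually reported, which the next paragraph walks through. As a quick check, here is that arithmetic in Python; the counts come from the text, and the variable names are illustrative only.

```python
# Counts for returning athletes, as reported for the summer testing.
athletes = 164
positive_on_return = 12
positive_later = 11

remaining = athletes - positive_on_return             # athletes kept in repeated testing
total_positive = positive_on_return + positive_later  # athletes who ever tested positive
never_positive = athletes - total_positive            # athletes who never tested positive

print(f"positive on return: {positive_on_return / athletes:.1%}")  # 12 / 164
print(f"positive later:     {positive_later / remaining:.1%}")     # 11 / 152
print(f"ever positive:      {total_positive / athletes:.1%}")      # 23 / 164
print(f"never positive:     {never_positive} athletes")
```

The reported 1.9% positivity never appears in this calculation: it depends on how many times the uninfected athletes were retested, not on how many athletes were infected.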
To see how misleading that 1.9% positivity number was for athletes, let’s instead look at the actual numbers. Of the 164 returning athletes, 12 tested positive immediately when they returned (\({12 \over 164} = 7.3\%\)). Those testing positive were then isolated and no longer tested repeatedly. Of the remaining 152, another 11 tested positive within weeks (another \({11 \over 152} = 7.2\%\)) despite frequent testing, a harbinger of the spread of infection we saw on campus in the fall semester. So, for the athletes, a total of \({23 \over 164} = 14.0\%\) tested positive for Covid-19 this summer. The vast majority of tests over the summer were repeated negative tests of the same (uninfected) athletes. Consequently, the 1.9% positivity tells you nothing more than that the 141 athletes who did not test positive were each tested many, many times. It does not tell you anything about the actual infection rate, which was 14%! If you’re testing everyone repeatedly, the numbers to report are the total number of confirmed cases and the actual infection rate, not the positivity ratio.
Misleading positivity numbers on the campus dashboard
When you test everyone, as Illinois is doing, there is no need to estimate a rate of infection: you can just compute the actual rate. The purpose of a positivity ratio is to estimate the overall infection rate in a population from a sample of people in that population, but when you are testing everyone, it is an invalid statistic. It does not estimate the population infection rate. The 7-day positivity featured prominently on the dashboard and touted repeatedly to the media and public is even more meaningless and misleading. The numerator is the number of positive tests during that 7-day window. The denominator is the total number of tests. But, that total includes multiple tests of the same people each week. Just as the 1.9% positivity for the athletes this summer was misleading because the same athletes were tested repeatedly, the 7-day positivity is not interpretable because it combines multiple tests of the same people. It is not a valid estimate of the population infection rate. A more meaningful number to report would be the number of people newly infected during that week. If you want to report a proportion, a more meaningful one would be the proportion of the people tested that week who were positive (i.e., use people tested rather than total tests in the denominator). Another concrete example might clarify why positivity is an invalid estimate of infections when you are testing everyone. Suppose a community of 10,000 people is tested every day, anyone who tests positive is isolated and no longer tested, and 0.5% of each day’s tests come back positive:
Day | Daily Tests | Daily Positivity | Daily Cases | Cumulative Cases | Cumulative Tests |
---|---|---|---|---|---|
1 | 10000 | 0.005 | 50 | 50 | 10000 |
2 | 9950 | 0.005 | 50 | 100 | 19950 |
3 | 9900 | 0.005 | 50 | 150 | 29850 |
4 | 9850 | 0.005 | 49 | 199 | 39700 |
5 | 9801 | 0.005 | 49 | 248 | 49501 |
6 | 9752 | 0.005 | 49 | 297 | 59253 |
7 | 9703 | 0.005 | 49 | 346 | 68956 |
8 | 9654 | 0.005 | 48 | 394 | 78610 |
9 | 9606 | 0.005 | 48 | 442 | 88216 |
10 | 9558 | 0.005 | 48 | 490 | 97774 |
11 | 9510 | 0.005 | 48 | 538 | 107284 |
12 | 9462 | 0.005 | 47 | 585 | 116746 |
13 | 9415 | 0.005 | 47 | 632 | 126161 |
14 | 9368 | 0.005 | 47 | 679 | 135529 |
15 | 9321 | 0.005 | 47 | 726 | 144850 |
16 | 9274 | 0.005 | 46 | 772 | 154124 |
17 | 9228 | 0.005 | 46 | 818 | 163352 |
18 | 9182 | 0.005 | 46 | 864 | 172534 |
19 | 9136 | 0.005 | 46 | 910 | 181670 |
20 | 9090 | 0.005 | 45 | 955 | 190760 |
21 | 9045 | 0.005 | 45 | 1000 | 199805 |
22 | 9000 | 0.005 | 45 | 1045 | 208805 |
23 | 8955 | 0.005 | 45 | 1090 | 217760 |
24 | 8910 | 0.005 | 45 | 1135 | 226670 |
25 | 8865 | 0.005 | 44 | 1179 | 235535 |
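The table can be reproduced with a short simulation. This is a sketch under assumptions inferred from the table itself: 10,000 people, everyone not yet positive is tested every day, positives are isolated and never retested, and each day’s case count is 0.5% of that day’s tests, rounded half up to a whole number. The function `simulate_daily_testing` is hypothetical.

```python
def simulate_daily_testing(population: int = 10_000, days: int = 25,
                           positives_per_1000_tests: int = 5):
    """Rebuild the table under the assumptions stated above.

    Returns rows of (day, daily_tests, daily_cases, cumulative_cases, cumulative_tests).
    """
    still_negative = population  # people who have not yet tested positive
    cumulative_cases = cumulative_tests = 0
    rows = []
    for day in range(1, days + 1):
        tests = still_negative  # everyone not yet positive is tested today
        # 0.5% of today's tests, rounded half up using integer arithmetic
        cases = (tests * positives_per_1000_tests + 500) // 1000
        cumulative_cases += cases
        cumulative_tests += tests
        rows.append((day, tests, cases, cumulative_cases, cumulative_tests))
        still_negative -= cases  # positives are isolated and not retested
    return rows


rows = simulate_daily_testing()
_, _, _, cases_total, tests_total = rows[-1]
print(f"cumulative positivity: {cases_total / tests_total:.2%}")  # cases / tests
print(f"proportion infected:   {cases_total / 10_000:.2%}")       # cases / people
```

Both lines divide the same 1,179 cases; only the denominator changes. Dividing by tests gives roughly 0.5% no matter how long the simulation runs, while dividing by people shows the share of the community infected climbing day after day.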
In this example, the daily positivity never budges from 0.5%, and the cumulative positivity is also about 0.5% (1,179 positive tests out of 235,535 tests), yet after 25 days 1,179 of the 10,000 people tested on Day 1 (11.8%) have tested positive. Testing everyone repeatedly, with rapid results, is essential for containing spread on campus. But, it invalidates positivity as an estimate of the infection rate. When testing everyone, you can just compute the numbers that matter directly: The number of cases and the proportion of tested people who are infected. We need the numbers analogous to the 14% we could compute for the athletes. The university does not provide that information. It also does not provide the information necessary to compute it.
(Subtle statistical point: The campus tested about 38,000 people each week, with many people testing 2 or 3 times each week. On average, they conducted about 10,000 tests each day. Because the same people are retested every 3-4 days, the single-day positivity can’t be used to estimate the campus infection rate: the daily samples are not independent of each other.)
The university touts the low positivity ratio as an important index of how the campus is doing. Ironically, in a mass email sent right before the start of the semester (August 21), Chancellor Jones defines the positivity ratio incorrectly as “the percentage of people who test positive out of those who have been tested.” That number is interesting and important, but it is the proportion infected (\({\text{cases} \over \text{people}}\)), not the positivity ratio (\({\text{cases} \over \text{tests}}\)). Even more ironically, the proportion infected is what we should be paying attention to, but the campus never provided that number. They also never discussed the proportion infected in press releases, interviews, or emails, possibly because it would show that more than 8% of the entire tested population (about 49,000 people) and more than 14% of an estimated 25,000 undergraduates tested positive during the fall semester. The email also stated that the positivity ratio “tells us how widespread the infection is here in our community and whether our levels of testing are keeping up with levels of disease transmission.” It doesn’t. When you are testing everyone in the population of interest multiple times each week, neither the daily positivity ratio, nor the 7-day positivity ratio, nor the total positivity ratio provides an index of the infection rate. The actual infection rate does.
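To make the distinction between the two ratios concrete, here is a minimal sketch using a made-up week of test results. The log, the people in it, and the function names are all hypothetical: 20 people are tested, 18 of them test negative three times each, and 2 test positive on their first test and are then isolated.

```python
# Hypothetical week of testing: one (person_id, tested_positive) entry per test.
# 18 people test negative three times each; 2 people test positive on their
# first test and are then isolated, so they are not tested again.
week_log = [(f"person{i}", False) for i in range(18) for _ in range(3)]
week_log += [("person18", True), ("person19", True)]


def positivity_ratio(log):
    """cases / tests: positive tests divided by all tests."""
    return sum(positive for _, positive in log) / len(log)


def proportion_infected(log):
    """cases / people: people with any positive test divided by people tested."""
    people = {person for person, _ in log}
    infected = {person for person, positive in log if positive}
    return len(infected) / len(people)


print(f"positivity ratio:    {positivity_ratio(week_log):.1%}")     # 2 positives / 56 tests
print(f"proportion infected: {proportion_infected(week_log):.1%}")  # 2 infected / 20 people
```

In this toy week, the positivity ratio comes out near 3.6% while 10% of the people tested were infected. Retesting the uninfected people more often would push positivity lower still without changing who was infected.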
Positivity ratios are hard to interpret even when used properly
The CUPHD and IDPH have used positivity more appropriately. They estimate infection rates for Champaign County, Illinois Region 6, or the whole state of Illinois using (mostly) independent samples from those populations of interest. The people who get tested one day in Champaign County likely have little overlap with those tested on other days, and the groups of people tested from day to day within a region are likely fairly similar in their makeup. Consequently, changes in the positivity ratio can be meaningfully interpreted as changes in the infection rate in the population.
But, even then, positivity is not straightforward to interpret. Positivity varies not just with the proportion of the population that is infected, but also with the number of tests conducted and who has access to those tests (only sick people? anyone who wants one?). If two communities have different access to testing or different testing strategies, differences in positivity do not necessarily mean that they have differences in the prevalence of infection. A really high positivity rate likely means that a community is conducting too little testing and is focusing the tests they are conducting on people with symptoms. A really low positivity rate could mean that infection rates are low and/or that the community is conducting widespread testing. Positivity combines infection rates and testing prevalence in ways that make it hard to compare ratios directly whenever the amount of testing varies over time or differs across regions.
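A small expected-value model shows how two communities with the same prevalence can report very different positivity. Every number here is hypothetical: 100,000 people, 1,000 of them infected, 2,000 people with symptoms of whom 400 have Covid, and a testing strategy that tests symptomatic people first and spends any remaining tests on people drawn at random from everyone else. The function `expected_positivity` is illustrative only.

```python
def expected_positivity(tests: int,
                        population: int = 100_000,
                        infected: int = 1_000,
                        symptomatic: int = 2_000,
                        symptomatic_infected: int = 400) -> float:
    """Expected share of positive tests when symptomatic people are tested first.

    All default numbers are hypothetical. Tests beyond the symptomatic group go
    to people drawn at random from the non-symptomatic part of the population.
    """
    if not 0 < tests <= population:
        raise ValueError("tests must be between 1 and the population size")
    if tests <= symptomatic:
        # Only symptomatic people are tested.
        positives = tests * symptomatic_infected / symptomatic
    else:
        # All symptomatic people, plus a random draw from everyone else.
        others = population - symptomatic
        other_infected = infected - symptomatic_infected
        positives = symptomatic_infected + (tests - symptomatic) * other_infected / others
    return positives / tests


for tests in (1_000, 10_000, 100_000):
    print(f"{tests:>7,} tests -> positivity {expected_positivity(tests):.1%}")
```

With these made-up numbers, positivity falls from 20% to about 4.5% to 1% as testing expands, even though 1% of the population is infected throughout. That is the sense in which positivity mixes infection rates with how much testing is done and who gets tested.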
Within a community, if the number of tests is stable (or increasing) and the people being tested are relatively similar from day to day, an increase in the positivity ratio reflects an increase in the proportion of people infected with Covid. If positivity is increasing faster than the rate of increase in testing, that also reflects increasing infection levels. So, when Illinois Region 6 and Champaign County saw a big spike in positivity throughout October and November, those increases were meaningful because the increases in positivity were greater than the increases in testing. When positivity declines, that can be due to increased testing or to reduced infections.
Although changes in positivity within a community can be meaningful, the absolute levels of infection can be hard to interpret. It’s essential to think about how your tested sample of people is similar to and different from the broader population to which you want to generalize. For positivity to provide an unbiased estimate of the infection rate in the community, you would need to test people from the community at random to ensure that there was nothing systematically different about the people who were and were not tested. That almost never happens in practice, so it’s essential to think about whether the sample of people who were tested is representative of the broader community. If it isn’t, then positivity in that sample might not generalize to infection rates in the broader community.
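A seeded simulation illustrates why random selection matters. The setup is hypothetical: 100,000 people with 2% infected, compared under a random sample of 5,000 and under a self-selected sample in which infected people (for example, because they have symptoms) are four times as likely to get tested as uninfected people. Under those assumptions, the self-selected sample should contain roughly 400 infected people out of roughly 5,300 tested, a positivity near 7.5% even though the true infection rate is 2%. The function `compare_samples` and all of its parameters are illustrative.

```python
import random


def compare_samples(population: int = 100_000, prevalence: float = 0.02,
                    sample_size: int = 5_000, base_rate: float = 0.05,
                    infected_boost: int = 4, seed: int = 42) -> tuple[float, float]:
    """Return (positivity of a random sample, positivity of a self-selected sample).

    All parameters are hypothetical illustration values.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n_infected = round(population * prevalence)
    people = [True] * n_infected + [False] * (population - n_infected)

    # Random sample: every person is equally likely to be tested.
    random_sample = rng.sample(people, sample_size)

    # Self-selected sample: each person independently decides whether to get
    # tested, and infected people are `infected_boost` times as likely to do so.
    self_selected = [is_infected for is_infected in people
                     if rng.random() < base_rate * (infected_boost if is_infected else 1)]

    def positivity(sample: list[bool]) -> float:
        return sum(sample) / len(sample)

    return positivity(random_sample), positivity(self_selected)


random_est, biased_est = compare_samples()
print(f"true 2.0%, random sample {random_est:.1%}, self-selected {biased_est:.1%}")
```

The random sample lands close to the true rate; the self-selected one overstates it several times over, even though both are computed the same way.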