Multiplex Target Testing: Should We Be More Specific about Specificity?

One of the big changes in the last 10 years or so in infectious disease diagnostics is the increasing use of syndromic panels—multiplex tests with as many as a few dozen targets—used to identify the presence or absence of multiple organisms that may cause a “syndrome,” or general pattern of symptoms. Patients with upper respiratory tract infections, diarrheal illness or bacteremia, for example, may be infected with any one of a variety of different organisms that cause similar symptoms. Syndromic panels allow the clinical microbiology lab to simultaneously test for many organisms in a single specimen, generally with a short turnaround time.  The advantages and challenges that such panel tests (also called multiplexed tests) present were recently addressed in an excellent review article.
Another interesting aspect of panel tests is how they change the way we need to think about sensitivity and specificity. Most panel tests are nucleic acid amplification tests (NAATs, such as PCR), which typically offer very good sensitivity relative to other methods; not surprisingly, sensitivities for each target on NAAT panel tests are usually reported to be excellent. For the panel as a whole, the improved “sensitivity” over single-target tests is difficult to quantify, because physicians are unlikely to order every component of a panel if each had to be ordered separately. As a result, the panel sometimes provides a diagnosis via a target that was considered less likely or not considered at all. A panel test is kind of like a shotgun: you don’t have to be spot on to have a good chance of hitting your target, just in the neighborhood. Conversely, the panel approach has a negative impact on specificity, because the false positives from every additional target add up. Extending the shotgun analogy, you’re also more likely to hit things you didn’t want to hit (just ask Dick Cheney!).
Let’s consider some hypothetical test specificities for single target and multiplexed or panel tests. We are talking specifically about diagnostic specificity, rather than analytical specificity.
First consider a single target test, performed on 1,000 specimens, with a population target prevalence of 10%. Given those numbers, there would be 100 true-positives (patients who really are infected with the organism, or “target” in question), and 900 true-negatives (patients not infected with the organism). If the test specificity is 99%, then for the 1,000 specimens analyzed there would be 9 false-positive test results (for 1% of 900, the test result is positive although the patient is not infected with the organism). If the test specificity is instead 95%, then for the 1,000 specimens run there would be 45 false-positive test results.
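The single-target arithmetic can be sketched in a few lines of Python. This is only an illustration of the hypothetical numbers in the paragraph above; the function name `single_target_counts` is made up for this example and is not taken from any library or study.

```python
def single_target_counts(n_specimens, prevalence, specificity):
    """Return (true_positives, true_negatives, expected_false_positives)
    for a single-target test. Hypothetical helper for illustration only."""
    true_pos = n_specimens * prevalence          # patients who really are infected
    true_neg = n_specimens - true_pos            # patients who are not infected
    false_pos = true_neg * (1 - specificity)     # uninfected patients who test positive
    return true_pos, true_neg, false_pos


# The two scenarios from the text: 1,000 specimens, 10% prevalence.
for spec in (0.99, 0.95):
    tp, tn, fp = single_target_counts(1000, 0.10, spec)
    print(f"specificity {spec:.0%}: {tp:.0f} infected, "
          f"{tn:.0f} uninfected, ~{fp:.0f} false positives")
```

Note that the false positives come only from the 900 uninfected patients, which is why 99% specificity gives 9 rather than 10.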

Table 1. % of specimens with a false-positive result *based on parameters indicated in text, not universally applicable.
Source: Table courtesy M. Pettengill.

Now consider a 20-target panel test, performed on 1,000 specimens, in a population with a panel prevalence of 20% (20% total across all targets, so well below 20% for any one target; the true result is positive for exactly 1 target 20% of the time). Given those numbers, there would be 200 true-positives (let’s say 10 positive for each target, no overlap), and 800 true negatives on a per-run basis, but 19,800 true negatives on a per-target basis (20,000 individual target results minus the 200 true positives; often this type of number is used to calculate an “overall specificity”). If the test specificity is 99% for each target, then for the 1,000 specimens analyzed there would be ~10 false-positive test results (1% of 990) for each target, or a total of ~200 false-positives across the 20 targets. That is approximately 1 out of every 5 specimens! If the test specificity is instead 95% for each target, then for the 1,000 specimens run, there would be ~50 false-positive test results for each target, ~1,000 in total, and as you may have guessed, that presents the chaotic situation wherein there is on average ~1 false-positive result for every specimen analyzed! Yikes! Thus, an “overall specificity” can be misleading regarding the anticipated frequency of false-positive results both for patient testing and quality control. Table 1 shows examples of overall specificity, using the above 20% prevalence example and simplified situation with no overlap of positive results.
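The panel arithmetic can be sketched the same way. The function names here are hypothetical. The second function adds an assumption that is not in the scenario above, namely that false positives occur independently across targets. That is a simplification, but it lets us estimate how many specimens carry at least one false positive, rather than just the average count.

```python
def panel_false_positives(n_specimens, n_targets, positives_per_target, specificity):
    """Expected false positives for a panel with the same specificity on every
    target. Returns (per_target, total, average_per_specimen)."""
    negatives_per_target = n_specimens - positives_per_target   # e.g. 990
    fp_per_target = negatives_per_target * (1 - specificity)
    total_fp = fp_per_target * n_targets
    return fp_per_target, total_fp, total_fp / n_specimens


def prob_any_false_positive(n_targets, specificity):
    """Chance that a specimen negative for every target shows at least one
    false positive. ASSUMES errors are independent across targets."""
    return 1 - specificity ** n_targets


# The two scenarios from the text: 20 targets, 10 true positives per target.
for spec in (0.99, 0.95):
    per_target, total, per_specimen = panel_false_positives(1000, 20, 10, spec)
    any_fp = prob_any_false_positive(20, spec)
    print(f"{spec:.0%} per target: ~{per_target:.1f} FP per target, "
          f"~{total:.0f} FP total, {per_specimen:.2f} FP per specimen on average, "
          f"{any_fp:.0%} of all-negative specimens with at least one FP")
```

Under that independence assumption, roughly 18% of all-negative specimens would carry at least one false positive at 99% per-target specificity, and roughly 64% at 95%. These figures are lower than the average counts suggest because some false positives land on the same specimen.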
Now some real examples, and we will see how this can get kind of messy.
For a direct-from-blood test for candidemia that targets 5 common species of Candida, the abstract of the published results announced an “overall specificity” of 99.4% per assay (considering each target a separate assay, so 5 assays per specimen), while the text listed 98.1% per patient. Those numbers look OK on the surface, but in the details there were only 6 true positives (with standard blood culture as the comparator, for a low-prevalence target) and 29 false positives (out of ~1,500 patients). The authors note that there is evidence that blood cultures underdetect candidemia and that for at least 1 of the 29 “false positives” there was other evidence supportive of a systemic Candida infection. They also evaluated the assay performance with spiked blood cultures, and the data here look good, although it would have been nice if they had evaluated the same contrived specimens head-to-head with blood culture, to help assess the claim that some of the 29 “false positives” were likely due to superior sensitivity of their assay relative to blood culture. A specificity of 99.4% sounds pretty good in an abstract or a sales pitch, but nearly 5 times more false-positives than true-positives does not.
Here is another example of a study evaluating a panel test for bloodstream infections, with an “overall specificity” for identifications listed in the abstract as 98.9%. To an old single-target-test way of thinking that sounds quite good, but in the multiplex panel space it is a more complex evaluation. For 277 analyzable test runs (patient specimens) on this 16-target panel there were 45 false-positive identifications. That is approximately 1 out of every 6 runs with a false-positive result. This test uses positive blood culture specimens, so it is readily comparable to what grew in subculture, and thus there is not much room to try to explain away the false-positives. The assay performance was, however, significantly better for gram-negative bacteria than it was for gram-positive bacteria or yeast, and additionally was improved with further revisions for the final FDA cleared assay (in this case with a published “overall specificity” of 99.3%, and clarification that some false-positives can be resolved by not reporting identifications that are not compatible with the blood culture Gram stain). It is important to note that neither of the described blood molecular assays replaces subculture to confirm results, identify organisms that aren’t on the panels, or allow for further antibiotic susceptibility testing.
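To see how the same false positives look so different depending on the denominator, here is a rough sketch using only the counts quoted above for the 16-target panel (277 runs, 45 false-positive identifications). The function name is hypothetical. The per-result denominator here counts every target result, including true positives, because the true-positive counts are not given in this text; it therefore only approximates the study's published figure.

```python
def false_positive_summary(n_runs, n_targets, false_positives):
    """Return (false positives per run, false positives per individual
    target result). Hypothetical helper; the per-result denominator is
    approximated as n_runs * n_targets."""
    per_run = false_positives / n_runs
    per_target_result = false_positives / (n_runs * n_targets)
    return per_run, per_target_result


per_run, per_result = false_positive_summary(277, 16, 45)
print(f"false positives per run: {per_run:.3f} (about 1 in {1 / per_run:.0f})")
print(f"false positives per target result: {per_result:.4f} "
      f"(approximate per-target specificity {1 - per_result:.1%})")
```

The per-target view divides the same 45 false positives across roughly 16 times as many results, which is how a false positive on about 1 in 6 runs can coexist with an “overall specificity” near 99%.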
Figure 1. “Should I turn the dial?” (Of course it depends on the context…) Image courtesy M. Pettengill.

As mentioned in the opening paragraph, panel tests offer some clear advantages compared to single-target tests, and the referenced review highlights some of the added value and positive impacts on the timing of patient result reporting.  As we can see from the hypothetical and real-world examples, though, multiplex panel testing also presents some significant drawbacks and requires more careful consideration of what published numbers for test performance parameters like specificity really mean in practice. To answer the question in the title from my perspective: Yes, I think we should be more careful and more patient-centric in how we broadcast specificity claims from evaluations of multiplexed or syndromic panel tests. Specificity should be calculated on a per-patient-specimen basis, and if we want to calculate a per-target parameter, let’s give it a different name.