Thyroid nodules are a common clinical finding. Neck ultrasonography identifies at least one nodule in 20–76 % of adults, the majority of which are benign.1–3 Cytopathologic evaluation by ultrasound-guided fine needle aspiration (FNA) has emerged as the principal means to evaluate malignancy risk, and is recommended for most nodules greater than 1–1.5 cm.4–6 However, in 15–25 % of cases, FNA yields an ‘indeterminate’ result, which implies a 20–30 % risk for cancer.7–9 The Bethesda system for reporting thyroid cytology aids evaluation by stratifying indeterminate findings into groups of escalating malignancy risk: atypia of undetermined significance/follicular lesion of undetermined significance (AUS/FLUS) with 5–25 % cancer risk; follicular neoplasm (FN) with 15–30 % risk; and suspicious for malignancy (SUSP) with 60–75 % risk.10 Ultrasound findings may modify risk further. Despite gradation of cancer risk, none of the indeterminate classes imply sufficiently low risk for malignancy to allow monitoring in lieu of surgery. Therefore, diagnostic surgical resection has been generally recommended,4,11 although the majority of indeterminate nodules have proved benign. For these patients, surgery was unnecessary, yet carried inherent risk, morbidity, and cost.12,13 Improved preoperative diagnostic tools have sought to address this matter.
The diagnostic use of molecular markers has proved valuable toward modifying cancer risk assessment and addressing the shortfalls of visual microscopy. Initially, the discovery of several oncogenic gene mutations associated with well-differentiated thyroid cancer led to their use as diagnostic molecular markers. When applied to cytologically indeterminate nodules, the identification of mutations/translocations in BRAF, RAS, RET/PTC, and PAX8/PPARG increased the predictive value for malignancy.14 Initial retrospective, nonblinded analysis of this mutation panel suggested its utility as a ‘rule in’ test for thyroid cancer that could influence the decision to proceed with surgery or modify its extent.5,15,16 More recent prospective, blinded data confirm the utility of detecting BRAF mutations, but question the performance and predictive value of the entire panel, and most specifically the detection of RAS mutations.17 Further, confirmation of malignancy in indeterminate nodules already destined for surgery may not substantially change the clinical management of patients.
Since the majority of indeterminate nodules are ultimately benign, a different testing approach sought to identify molecular profiles associated with benignity. Such a test, if validated, effectively ‘rules out’ malignancy, and identifies patients in whom surgery can be avoided. This novel diagnostic paradigm was utilized in the design of the Afirma® gene expression classifier (GEC) (Veracyte, Inc.). The Afirma GEC employs measurement of 167 expressed messenger RNAs (mRNAs) from fresh aspiration tissue to classify indeterminate nodules as ‘benign’ or ‘suspicious.’18 Importantly, the microarray platform for this test demonstrated excellent reproducibility and stability in different temperatures and shipping environments.19 Here, we discuss the multicenter validation trial and subsequent published experiences with the Afirma GEC.
Validation and Clinical Use of the Afirma Gene Expression Classifier
In 2012, the New England Journal of Medicine (NEJM) published the results of a prospective, multicenter, blinded validation of the Afirma GEC in 3,789 patients enrolled from 49 academic and community sites.20 Of 265 indeterminate nodules greater than 1 cm for which surgery was performed and gold-standard histopathology available for blinded review, 85 (32 %) nodules were malignant and 180 benign. The Afirma GEC demonstrated a sensitivity of 92 % (95 % confidence interval [CI] 84–97 %) and specificity of 52 % (95 % CI 44–59 %). In a population harboring a malignancy prevalence of 32 %, the negative predictive value (NPV) and positive predictive value (PPV) were 93 % and 47 %, respectively. More specifically, when Bethesda categories were AUS/FLUS or FN, the NPV for the Afirma GEC was 95 % and 94 %, respectively—similar to that of a benign cytology diagnosis for which conservative management is often favored. The NPV was 85 % for SUSP nodules and, at present, the Afirma GEC is not recommended for routine use in this population.
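As a check on the figures above, the reported predictive values can be reproduced from the reported sensitivity, specificity, and prevalence using Bayes' theorem. The symbols Se, Sp, and p (sensitivity, specificity, and malignancy prevalence) are shorthand introduced here for illustration and do not appear in the original report.

```latex
\begin{align*}
\mathrm{NPV} &= \frac{Sp\,(1-p)}{Sp\,(1-p) + (1-Se)\,p}
  = \frac{0.52 \times 0.68}{0.52 \times 0.68 + 0.08 \times 0.32}
  = \frac{0.3536}{0.3792} \approx 0.93 \\[4pt]
\mathrm{PPV} &= \frac{Se\,p}{Se\,p + (1-Sp)\,(1-p)}
  = \frac{0.92 \times 0.32}{0.92 \times 0.32 + 0.48 \times 0.68}
  = \frac{0.2944}{0.6208} \approx 0.47
\end{align*}
```

Both values agree with the 93 % NPV and 47 % PPV reported for the 32 % prevalence cohort.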
In spite of the demonstrated accuracy of this testing approach to identify benign nodules preoperatively, it remained unclear how implementation of this knowledge could affect clinical practice and vary in a real-world environment.
To address this concern, a multicenter experience with the Afirma GEC was performed and published by Alexander and colleagues in the Journal of Clinical Endocrinology and Metabolism in 2014.21 From five academic centers, 346 consecutive samples were obtained from patients who underwent FNA biopsy for an ultrasound-confirmed thyroid nodule greater than 1 cm, determined to be of indeterminate cytology by local evaluation and subsequently sent for Afirma GEC at the discretion of the treating physician. After exclusion of seven samples, 339 samples were evaluated. These included 165 (49 %) cases cytologically classified as AUS/FLUS, 161 (47 %) classified as FN, and 13 (4 %) classified as SUSP.
Of nodules with AUS/FLUS cytology, the Afirma GEC result was ‘benign’ in 91/165 (55 %), ‘suspicious’ in 66/165 (40 %), and ‘nondiagnostic’ in 8/165 (5 %). Nodules categorized as FN were ‘benign,’ ‘suspicious,’ or ‘nondiagnostic’ in 79 (49 %), 73 (45 %), and 9 (6 %) of cases, respectively. These distributions paralleled those of the initial NEJM validation.
For patients with an indeterminate nodule whose Afirma GEC results were ‘suspicious,’ surgery was recommended in 141 of 148 (95 %) cases, and was completed in 121 (85 %). Surgical pathology confirmed malignancy in 53 of 121 (44 %) cases. By contrast, only four of 174 patients with a ‘benign’ Afirma GEC result were recommended for surgery, all of whom complied. Documented follow-up monitoring was available in 71 patients with ‘benign’ Afirma GEC results. Ultimately, surgery was performed in 11 of these 71 patients, with only one proving malignant (1.0 cm sonographic nodule, confirmed to be a 0.6 cm papillary carcinoma). This study showed a dramatic change in clinical management with Afirma GEC assessment. Using an ‘intention-to-treat’ model, Afirma GEC results modified the care recommendations of 171 of 339 (50 %) patients, and reduced surgery by 47 %.
Data now confirm that molecular testing has the ability to inform and influence the clinical management of cytologically indeterminate thyroid nodules. Results from two large multicenter studies using the Afirma GEC have demonstrated high NPV for thyroid malignancy and the ability to alter the surgical recommendations among affected patients. The National Comprehensive Cancer Network now proposes ultrasound monitoring rather than surgery ‘if molecular testing predicts a risk for malignancy comparable to the risk for malignancy seen with a benign FNA cytology (approximately 5 % or less).’22 Given the very low post-test malignancy risk demonstrated in the Afirma GEC validation trial, it appears reasonable to employ a strategy of watchful waiting in patients with an indeterminate nodule and Afirma GEC ‘benign’ result when no other mitigating clinical or radiographic factors prompt surgical intervention.
The impact of this change in management has been evaluated in two cost-effectiveness analyses. Cost-effectiveness was demonstrated with statistical modeling by Li et al.,23 whereas Duick et al.24 made cost calculations based on the retrospective finding that only 7.6 % of patients with an Afirma GEC ‘benign’ result underwent surgery. Estimated cost savings are between $1,453 and $2,600 per patient tested, and are attributed to a reduction in diagnostic surgery, surgical complications, and the need for life-long thyroid hormone replacement.
Central to the interpretation of Afirma GEC performance is the relationship between the prevalence of thyroid cancer in the population studied (i.e. pretest probability), and the NPV and PPV of a test. As the prevalence of malignancy increases, the NPV falls since there is an increasing probability that a negative test result represents a false negative. As the malignancy rate declines, NPV improves as the test result is increasingly likely to be a true negative. Differences in the underlying rate of malignant nodules within a population and the specific malignancy risk within nodules classified as indeterminate will influence both NPV and PPV. Prior studies have confirmed substantial inter- and intra-rater variability for indeterminate cytology, even among experts25–27 using the Bethesda system. Thus, the interpretation of an Afirma GEC result must be performed in the context of the thyroid cancer prevalence within a given practice setting.
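The dependence of predictive values on prevalence can be illustrated with a minimal sketch. It assumes, for illustration only, that the sensitivity (92 %) and specificity (52 %) reported in the validation trial stay fixed across practice settings. The function name `predictive_values` and the list of prevalences are hypothetical choices, not taken from any cited study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (NPV, PPV) for a test applied at a given pretest probability.

    Assumes sensitivity and specificity do not change with prevalence,
    which is a simplification.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    npv = true_neg / (true_neg + false_neg)
    ppv = true_pos / (true_pos + false_pos)
    return npv, ppv


# Sensitivity and specificity as reported in the validation trial;
# the prevalences below are illustrative values only.
SENSITIVITY = 0.92
SPECIFICITY = 0.52

for prevalence in (0.10, 0.20, 0.32, 0.50, 0.70):
    npv, ppv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: NPV {npv:.1%}, PPV {ppv:.1%}")
```

Under these assumptions, the NPV falls and the PPV rises steadily as the prevalence of malignancy increases, which is why the same ‘benign’ result carries a different post-test risk in different practice settings.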
Separate retrospective experiences using the Afirma GEC in the clinical setting have recently been published. While these are not independent validation trials and are thus susceptible to selection bias and variation in cytologic/histologic assessment, these reports nonetheless show consistent performance of the Afirma GEC in varied practice populations. In a study by Harrell et al.,28 of 58 cytologically indeterminate nodules, 20 were Afirma GEC ‘benign.’ Though two malignancies were ultimately detected in this group, both false negative cases were attributed to sampling error (and not assay failure). No malignancies were detected in the remaining nodules with ‘benign’ Afirma GEC results, suggesting a very high NPV consistent with the 95 % CIs of the initial validation study.29 A separate study by McIver et al.,30 evaluated 90 indeterminate nodules primarily composed of lesions categorized as follicular (or Hürthle cell) neoplasm (i.e. Bethesda class IV). The authors reported one malignancy among 16 nodules (6 %) ‘benign’ on Afirma GEC testing, again consistent with the initial validation findings. The PPV in this study was calculated to be 16 % and the authors question if their data suggest that the PPV of the Afirma GEC may be lower than initially reported. However, the rate of histologically proved malignancy in GEC ‘suspicious’ nodules was only 17 %, in comparison to 56 % in the original validation trial. This suggests that marked variation in the use and meaning of Bethesda categorizations at the time of initial cytologic review is the most likely explanation for such differences.
Other potential markers have been evaluated for use in indeterminate thyroid nodules, with some yielding encouraging preliminary results. Assessment of the cell surface glycoprotein galectin-3 by immunohistochemistry in two multicenter studies has shown the potential for clinical utility.31,32 In the most recent, Bartolazzi et al.32 evaluated 465 nodules with indeterminate cytology, finding a PPV and NPV for galectin-3 expression of 82 % and 91 %, respectively. Evaluation of a 4-microRNA profile differentially expressed in thyroid cancers compared with benign nodules showed a diagnostic accuracy as high as 90 % in the research setting,33 but demonstration of diagnostic accuracy and utility in large, prospective, validation studies has not yet been performed.34 The 17-mutation panel (provided by some academic institutions or commercially marketed as miRInform™ Thyroid by Asuragen, Inc.) has more recently demonstrated inferior performance compared with original reports, although next-generation sequencing for a larger set of mutations has been proposed as a next step.17,35 Several authors have reasoned that combining mutational testing with Afirma GEC would provide ideal NPV and PPV performance. However, Kloos et al.36 demonstrated that the additional determination of BRAF mutational status did not improve test sensitivity or specificity when added to the Afirma GEC. This was primarily because all BRAF mutation positive specimens were already classified by Afirma GEC as ‘suspicious’.
In a short amount of time, molecular testing of thyroid nodule aspirates has dramatically changed the management of indeterminate thyroid nodules. Though sonographic, cytologic, and clinical assessment must still be performed, their limitations and imprecisions have made molecular testing particularly attractive. Results thus far indicate that molecular testing with the Afirma GEC significantly affects clinical decision-making and limits unnecessary surgical intervention in many patients. The application of the Afirma GEC to more diverse practice settings, along with increased experience and longer patient follow-up, will continue to document its clinical impact. Given the unique strengths of different molecular testing methods, combination testing with different and multiple molecular markers warrants further study. Clinician judgment remains paramount, though molecular diagnostic tests have now become a key adjunct to the existing clinical, biochemical, radiologic, and cytologic assessments for most indeterminate thyroid nodules.