Palpation and physical examination remain an important part of thyroid evaluation. While nodule discovery has increased overall, the prevalence of palpable thyroid nodules has not changed significantly since the 1960s and remains around 3–7%.1–5 Currently, ultrasonography of the neck incidentally identifies thyroid nodules at a rate of 30–70%,6,7 and computerized tomography (CT) scans that include the neck but are performed for unrelated reasons identify thyroid nodules in 16–18% of patients.8–10 It has been estimated that in 2018, 54,000 new cases of thyroid malignancy will be diagnosed, with around 2,000 deaths related to thyroid cancer.11 This corresponds to around 3.1% of all new cancer cases and 0.3% of all cancer-related deaths for 2018, as recorded by the National Cancer Institute (NCI), a branch of the National Institutes of Health (NIH), at the time this manuscript was written.12
With increased detection of nodules and a lack of consistent assessment protocols, surgery has been a favored treatment modality for both malignant and benign nodules. While this removes the tumor burden, in many cases surgery can lead to surgical complications, life-long thyroxine therapy for the patient, and an increased overall cost burden, with minimal to no change in survival rates for small, localized, or benign lesions.11 Over the years, our improved understanding of thyroid nodules and the natural progression of thyroid cancer has guided a more standardized approach to evaluation and management. In this article, we review recommendations on how to evaluate and manage thyroid nodules, from the initial ultrasound, to biopsy, to molecular testing.
Evaluation of nodules via imaging
The value of ultrasound in evaluating a thyroid nodule has improved over time, not only in resolution but also in identifying specific features associated with a higher risk of malignancy. Unfortunately, inconsistent or incomplete reporting, and interobserver variability, may lead to inappropriate or overaggressive management. A recent retrospective analysis strongly suggested that the vast majority of current radiological reports provide insufficient information to allow the clinician to effectively risk stratify nodules.13 It has been a goal of various societies to develop a platform for uniform reporting. While each society differs in its reporting method, similarities are evident in the features used to determine risk of malignancy (e.g., size, shape, echogenicity, calcifications, and vascularity). Reports comparing three societies (American Association of Clinical Endocrinologists [AACE]/Associazione Medici Endocrinologi [AME], American Thyroid Association [ATA], American College of Radiology [ACR]) suggest a similar predictive accuracy in determining risk of malignancy.14–16 Thus, until evidence suggests a clear benefit from one reporting system over another, it is up to the center performing the evaluations to determine which reporting system is most suitable, and to report consistently for practitioner review.
Important ultrasonographic features identified by each society
The reflective comparison of a nodule to its surrounding normal thyroid tissue determines its echogenicity. For example, a hypoechogenic nodule (Figure 1) is darker than the surrounding normal thyroid tissue, while a hyperechogenic nodule is brighter than the surrounding thyroid tissue. A marked hypoechogenic nodule is even darker and compares the nodule echogenicity to surrounding infrahyoid or strap muscles rather than normal thyroid tissue. This feature is suggestive of increased risk of malignancy and is distinguished from an anechoic or cystic nodule that does not have any reflective solid tissue, and is a benign finding.
Calcifications are reported as microcalcification, coarse calcification, or rim calcification (Figure 1). Microcalcifications imply the presence of psammoma bodies, which are round and measure 10–100 microns, and are the most specific feature of thyroid malignancy, with a specificity of up to 95% and a positive predictive value ranging from 42–94%.17–19 Coarse calcifications, typically causing posterior acoustic shadowing, are a more benign feature, but may be associated with medullary thyroid carcinoma.20 Rim calcifications, also reported as peripheral calcifications, are bright echoes found on the surface of the thyroid nodule and may represent malignancy.21
Nodule contour defines its margins. An ill-defined nodule is one in which more than 50% of the margin cannot be clearly demarcated and should not be confused with irregular, lobulated, or jagged margins (Figure 1). A recent study of 1,851 nodules reported that irregular margins have a specificity for malignancy of around 83%.22 A 2014 meta-analysis noted that irregular margins have an odds ratio of 6.12 for malignancy.23 Sharp borders or well-demarcated margins may represent a more benign finding.24
Color Doppler evaluates vascular flow within a nodule and has been proposed as an important component in nodular evaluation. Vascular patterns should be reported as peripheral, intranodular, or avascular. While some studies suggest that vascularity has value, others refute this, suggesting it is a poor predictor of malignancy.25–27 Much of the debate stems from the observation that, while benign nodules show a predominantly peripheral flow pattern, up to 20% of malignant nodules also have a peripheral pattern.19 While the debate continues, reporting vascular flow remains an important component of thyroid ultrasound reporting.
Nodules are typically measured in three planes (anterior-posterior, transverse, and longitudinal). While identifying malignancy is important, the key goals are to improve survival and minimize tumor burden. Miyauchi and colleagues monitored >1,200 papillary thyroid carcinoma nodules measuring <1.0 cm that were not removed surgically.28,29 After 10 years of serial ultrasounds, known as “active surveillance,” 8% of the nodules had grown by ≥3 mm and 3.8% showed novel appearance of node metastasis, without any increased risk of death over the 10-year period. Another study suggests that increasing tumor size beyond 1.0 cm does not affect survival until a threshold of 2.5 cm.30 Risk of malignancy, based on size, increases as the nodule grows beyond 1.0 cm, with a threshold detected at 2.0 cm, beyond which cancer risk remains essentially unchanged (e.g., a 3.0 cm nodule has essentially the same risk of being malignant as a 4.0 cm nodule).31
American Association of Clinical Endocrinologists, American College of Endocrinology, and Associazione Medici Endocrinologi guidelines
In their most recent update in 2016, the AACE/ACE/AME expanded on their three-class system to better identify the risk of malignancy of thyroid nodules (Table 1).32 The risk categories are:
Class 1 (low-risk lesions): These nodules have a risk of malignancy of around 1% and generally do not require fine needle aspiration (FNA).33 These nodules, made up of pure cysts or predominantly cystic nodules (>50% fluid component), are not associated with suspicious ultrasound features. Spongiform nodules, composed of multiple microcystic spaces separated by thin echogenic septa, are also categorized in this group.34 FNA is reserved for nodules that are >2.0 cm and growing in size.
Class 2 (intermediate-risk lesion): Nodules in this category have a 5–15% risk of malignancy. These are slightly hypoechoic or isoechoic nodules with an ovoid (wider-than-tall) shape and smooth or ill-defined margins. These lesions may have intranodular vascularity, macro- or continuous-rim calcifications, and/or indeterminate hyperechoic foci. FNA is indicated for nodules that are >2.0 cm.
Class 3 (high-risk lesion): These have at least one of the following features: marked hypoechogenicity; spiculated or lobulated margins; microcalcifications; taller-than-wide shape; extrathyroidal growth; and/or pathological-appearing adenopathy. These nodules carry a 50–90% risk of malignancy depending on how many of these features are present. Nodules in this category should undergo FNA biopsy if >1.0 cm, while those between 5–10 mm may undergo active surveillance and monitoring.
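The size thresholds in the three classes above can be summarized as a small lookup. The following Python sketch is illustrative only: the function name `aace_fna_action` and its simplified inputs are assumptions made for this example, not part of the guideline, and it encodes only the thresholds quoted in this section.

```python
def aace_fna_action(risk_class: int, size_cm: float, growing: bool = False) -> str:
    """Suggest an action for an AACE/ACE/AME class (1-3) and nodule size in cm.

    Hypothetical helper; it reflects only the thresholds quoted in the text.
    """
    if risk_class == 1:
        # Low risk: FNA only when >2.0 cm and increasing in size.
        return "FNA" if (size_cm > 2.0 and growing) else "no FNA"
    if risk_class == 2:
        # Intermediate risk: FNA when >2.0 cm.
        return "FNA" if size_cm > 2.0 else "no FNA"
    if risk_class == 3:
        # High risk: FNA when >1.0 cm; 5-10 mm nodules may be observed.
        if size_cm > 1.0:
            return "FNA"
        if size_cm >= 0.5:
            return "active surveillance"
        return "not addressed in text"  # <5 mm is not covered above
    raise ValueError("risk_class must be 1, 2, or 3")


print(aace_fna_action(3, 0.8))
```

A sketch like this is no substitute for the full guideline table (Table 1), which also weighs clinical context and sonographic detail.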
American Thyroid Association guidelines
In 2015, the ATA developed a five-category system (benign, very low suspicion, low suspicion, intermediate suspicion, high suspicion) that uses sonographic features to stratify malignancy risk and help determine which nodules require further evaluation with FNA (Table 2).34
Benign: These are anechoic/cystic nodules without any solid components. They have a risk of malignancy of <1% and typically do not require further workup unless for cosmetic or functional reasons.
Very low suspicion: These nodules have a <3% risk of malignancy, and are solid, isoechoic or hyperechoic. They do not have any microcalcifications, irregular margins, or extension into the extrathyroidal space. They are oval (wider-than-tall). Consideration should be made for FNA when the lesion is ≥2.0 cm. Observation is also a reasonable option due to its low risk. Spongiform or partially cystic nodules are also in this category.
Low suspicion: Isoechoic or hyperechoic solid nodule with or without cystic properties with eccentric solid areas. No microcalcifications or extrathyroidal extension. Nodules may be oval (wider-than-tall). These lesions have a 5–10% risk of malignancy. FNA is recommended with lesions ≥1.5 cm.
Intermediate suspicion: Nodules are hypoechoic, solid, oval (wider-than-tall) and have smooth margins. No microcalcifications are noted. Extrathyroidal extension is not identified. These lesions have a 10–20% risk of malignancy and FNA is recommended when the nodule is ≥1.0 cm.
High suspicion: Predominantly solid, hypoechoic nodules containing one or more of the following features: irregular margins (not to be confused with ill-defined margins), microcalcifications, taller-than-wide shape, rim calcification with small extrusive soft tissue components. They may also have evidence of extrathyroidal extension. These lesions have a >70–90% risk of malignancy and FNA is recommended for nodules ≥1.0 cm.
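The ATA size cutoffs listed above lend themselves to a simple table lookup. The sketch below is a minimal illustration, assuming the sonographic pattern has already been assigned by the reader of the ultrasound; the dictionary and function names are hypothetical and the values are only those quoted in this section.

```python
from typing import Dict, Optional

# FNA size thresholds (cm) by ATA sonographic pattern, as quoted in the text.
# None means FNA is not routinely indicated for that pattern.
ATA_FNA_THRESHOLD_CM: Dict[str, Optional[float]] = {
    "benign": None,
    "very low suspicion": 2.0,   # observation is also a reasonable option
    "low suspicion": 1.5,
    "intermediate suspicion": 1.0,
    "high suspicion": 1.0,
}


def ata_fna_recommended(pattern: str, size_cm: float) -> bool:
    """Return True when the nodule meets the quoted size cutoff for its pattern."""
    threshold = ATA_FNA_THRESHOLD_CM[pattern.lower()]
    return threshold is not None and size_cm >= threshold


print(ata_fna_recommended("low suspicion", 1.6))
```

Keeping the cutoffs in one table makes it easy to see that the threshold falls as suspicion rises, which is the logic the classification is built on.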
The American College of Radiology Thyroid Imaging-Reporting and Data Systems
In 2012, the ACR developed a reporting system modeled after its widely accepted Breast Imaging-Reporting and Data System, known as BI-RADS.35 The most recent Thyroid Imaging-Reporting and Data Systems (TI-RADS) update in 2017 divides various ultrasound features into five categories and assigns points to each, the total of which determines risk of malignancy and is reported as TR1–5.36 The different categories are described in Table 3. Compared to standard thyroid ultrasound evaluation, the sensitivity, specificity, and accuracy of the TI-RADS system were found to be 87%, 44%, and 52%, respectively.37
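The three figures quoted above are linked by a simple identity: overall accuracy is the prevalence-weighted average of sensitivity and specificity. The Python sketch below back-calculates the malignancy prevalence that would make the three numbers mutually consistent. It assumes all three come from the same cohort, which the text does not state, and the function name is hypothetical.

```python
def implied_prevalence(sensitivity: float, specificity: float, accuracy: float) -> float:
    """Solve accuracy = sens * p + spec * (1 - p) for the prevalence p."""
    if sensitivity == specificity:
        raise ValueError("prevalence cannot be recovered when sensitivity equals specificity")
    return (accuracy - specificity) / (sensitivity - specificity)


# Values quoted in the text for TI-RADS: 87%, 44%, and 52%.
p = implied_prevalence(0.87, 0.44, 0.52)
print(f"implied malignancy prevalence: {p:.1%}")
```

Under that assumption the implied prevalence is a little under one in five, which illustrates why a test with high sensitivity but low specificity can still show modest overall accuracy when most nodules are benign.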
Comparison between the three societies’ reporting systems
The ultimate goal in the development of thyroid reporting systems is to provide the highest diagnostic accuracy in distinguishing malignant from benign thyroid nodules. A recently published cross-sectional study compared the ATA, AACE/ACE/AME, and ACR TI-RADS systems using an automated algorithm to classify each nodule into respective risk categories.14 In relation to diagnostic accuracy, no significant difference was seen between the TI-RADS and the AACE/ACE/AME systems (p=0.287), while the ATA system proved inferior (p=0.008 versus TI-RADS and p=0.036 versus AACE/ACE/AME). This was not seen in a smaller study of 195 thyroid nodules, which found ATA to have a similar accuracy to TI-RADS (60% for TI-RADS versus 68% for ATA).15 In terms of sensitivity and specificity, when nodules were reported in their highest risk categories, the AACE/ACE/AME system showed high sensitivity with low specificity, while ATA and TI-RADS systems showed high specificity with low sensitivity.14 A study evaluating 962 nodules retrospectively reported that specificity of TI-RADS and ATA systems may be influenced by nodular size.38 When comparing ATA to TI-RADS, ATA had a higher specificity (89.8% versus 80.6% respectively; p=0.003) in nodules >2.0 cm, while having similar specificity in smaller nodules.38 While debate exists as to which system is “better” than the other, it is important to note that thyroid ultrasonography is an evolving field and is far from perfect. For instance, there are occasions in which nodules are considered “unclassifiable” under these systems. Reports suggest that up to 5.0% of nodules in ATA, 3.0% in TI-RADS, and 2.6% in AACE/ACE/AME fall under this “unclassifiable” category,14,38 among which the malignancy rate reached 38.7% in the TI-RADS group and 28.6% in the ATA group.38 For this reason, further research is needed to improve reporting systems and minimize the chance of missing malignant nodules. Reporting centers should also identify and use the system best suited to the practice. This will help minimize possible reporting errors and give practitioners a more consistent report.
Ultrasound-guided fine needle aspiration
Regardless of the criteria used to determine the risk of malignancy, FNA is frequently required to determine cytologically whether a nodule is malignant. FNA using real-time ultrasound is preferred as it allows for a safe, accurate, and cost-effective method for cytologic evaluation.39,40 It also helps minimize complications, including trauma to nearby vital structures (e.g., carotid artery, trachea, jugular veins). Ultrasound-guided FNA has an accuracy of 80%.41
Bethesda System for Reporting Thyroid Cytopathology
In 2007, the Bethesda System for Reporting Thyroid Cytopathology (BSRTC)42 was introduced and gained considerable popularity for its ability to categorize the risk of malignancy based on cytological evaluation of FNA samples.43,44 This resulted in reduced rates of inappropriate reporting and, consequently, fewer surgeries.45 The most recent BSRTC publication, in 2017, included several improvements. One is the reclassification of noninvasive encapsulated follicular variant of papillary thyroid carcinoma (EFVPTC) to noninvasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP).42 Eliminating the term carcinoma reduces patient anxiety by implying a benign condition requiring more limited follow-up. Another important change in the 2017 BSRTC is the emphasis on the value of molecular testing as an adjunct to cytologic evaluation.
The BSRTC is divided into six tiers (Table 4): I, nondiagnostic; II, benign; III, atypia of undetermined significance (AUS) and follicular lesion of undetermined significance (FLUS); IV, follicular neoplasm (FN) and suspicious for FN (SFN); V, suspicious for malignancy; and VI, malignant.43 An important feature of this reporting system is the adequacy of the sample, defined as no fewer than six groups of well-preserved thyroid epithelial cells, each consisting of at least 10 cells. If a sample does not meet these criteria, it is labeled as Bethesda System (BS) I, inadequate or nondiagnostic. Inadequate samples should be correlated with risk stratification based on ultrasound. For example, AACE class 1, ATA benign, or ACR TR1 lesions that are biopsied do not require re-biopsy due to their near negligible risk of malignancy, but more suspicious-looking nodules may require repeat biopsy.
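The adequacy rule described above is a simple counting criterion and can be sketched in a few lines of Python. This is an illustration only: the input format (a list of cell counts, one per well-preserved epithelial cell group) and the function names are assumptions made for the example, not part of the Bethesda System.

```python
from typing import List

MIN_GROUPS = 6            # at least six groups of well-preserved epithelial cells
MIN_CELLS_PER_GROUP = 10  # each group with at least 10 cells


def is_adequate(cells_per_group: List[int]) -> bool:
    """Apply the adequacy criterion quoted in the text to one FNA sample."""
    qualifying = [n for n in cells_per_group if n >= MIN_CELLS_PER_GROUP]
    return len(qualifying) >= MIN_GROUPS


def adequacy_label(cells_per_group: List[int]) -> str:
    """Label a sample as adequate, or as BS I when the criterion is not met."""
    return "adequate" if is_adequate(cells_per_group) else "BS I (nondiagnostic)"


print(adequacy_label([12, 15, 10, 22, 11, 9, 14]))
```

Note that groups with fewer than 10 cells are ignored rather than pooled, which matches the per-group wording of the criterion.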
The different classifications in the Bethesda System for Reporting Thyroid Cytopathology
BS II (benign): Nodules with cytology reported as benign have a risk of malignancy of <3%, with a false-negative rate of 1–11%.46–48 False-negative risk tends to increase with nodular size, notably in nodules >4 cm, suggesting that a re-biopsy may be warranted on larger nodules.49–51 The decision on repeat biopsy depends on the correlation between ultrasound features and the biopsy report. If discordance between imaging and cytology is noted, repeat FNA is warranted.52,53
BS III (AUS/FLUS): These terms are synonyms and should not be used to denote two distinct interpretations. Their risk of malignancy depends on whether the reading pathologist considers NIFTP, the new classification in BS, in the reporting. When the NIFTP category is used, risk of malignancy in AUS/FLUS decreases to around 6–18%, compared to the previous 10–30%,42 thus reducing the implied risk of malignancy in the AUS/FLUS category.44,54,55 Inter-observer and inter-institutional variability in reporting EFVPTC may alter an organization’s reported risk of malignancy,56 with the biggest impact at centers with high-frequency reporting of EFVPTC.57 Management of AUS/FLUS may include re-biopsy, molecular testing,
BS IV (FN/SFN): These two terms are synonyms and are not used to denote distinct lesion types. Cibas and Ali report a modest reduction in risk of malignancy, from 25–40% to 10–40%, when NIFTP is reported as benign.41 Since this category maintains a higher risk of malignancy overall, though still <50%, management includes molecular testing or lobectomy to confirm malignancy prior to proceeding to total thyroidectomy. This limits post-surgical hypothyroidism in benign nodules.
BS V (suspicious for malignancy): Malignancy is suspected on cytology but not with high certainty.42,43 Pre-NIFTP, this category carried a 50–75% risk of malignancy, which has fallen to 45–60% with the implementation of NIFTP. Papillary thyroid carcinoma dominates this category. Since NIFTP has a more indolent nature, lobectomy is favored over near-total thyroidectomy, when appropriate.
BS VI (malignant): This category carries a high malignancy risk of 97–99%.43 Papillary thyroid carcinoma, the most common thyroid cancer, accounts for 70–80% of the BS VI category.58 Cytologically, papillary thyroid carcinoma is characterized by pale chromatin, linear chromatin ridges (grooves), intranuclear cytoplasmic inclusions, and nuclear crowding, often with overlapping nuclei. Medullary thyroid carcinoma, anaplastic carcinoma, lymphomas, poorly differentiated carcinoma, and metastatic cancers are cytologically distinguishable and are also categorized as “malignant”.59 Since histological features and cancer type impact treatment, prognosis, and recurrence, they should be reported by the cytopathologist when possible. Near-total thyroidectomy is indicated in this category.
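The risk-of-malignancy ranges quoted for each Bethesda category, with and without the NIFTP reclassification, can be collected into one lookup for side-by-side comparison. The sketch below is illustrative: the data structure and function name are assumptions, the values are only those quoted in this section, and BS I is omitted because no range is given for it here.

```python
from typing import Dict, Tuple

# Risk-of-malignancy ranges (%) as (low, high), taken from the text above.
# "niftp_benign": NIFTP counted as benign; "pre_niftp": earlier figures.
# Where the text gives a single range, it is used for both keys.
BETHESDA_ROM: Dict[str, Dict[str, Tuple[float, float]]] = {
    "II":  {"niftp_benign": (0, 3),   "pre_niftp": (0, 3)},    # quoted as "<3%"
    "III": {"niftp_benign": (6, 18),  "pre_niftp": (10, 30)},
    "IV":  {"niftp_benign": (10, 40), "pre_niftp": (25, 40)},
    "V":   {"niftp_benign": (45, 60), "pre_niftp": (50, 75)},
    "VI":  {"niftp_benign": (97, 99), "pre_niftp": (97, 99)},
}


def rom_range(category: str, niftp_benign: bool = True) -> Tuple[float, float]:
    """Return the quoted (low, high) risk of malignancy for a Bethesda category."""
    key = "niftp_benign" if niftp_benign else "pre_niftp"
    return BETHESDA_ROM[category.upper()][key]


for cat in BETHESDA_ROM:
    print(cat, rom_range(cat, niftp_benign=False), "->", rom_range(cat))
```

Printing both columns shows that the reclassification lowers the quoted ranges mainly in categories III through V.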
Molecular testing, previously difficult to obtain due to cost and limited availability, is now more accurate, reliable, available, and affordable, making it easier to obtain and interpret. Recommendations suggest the use of molecular testing in cytologically indeterminate (BS III/IV) and sonographically indeterminate nodules to better establish the risk of malignancy and determine whether surgery is indicated. Currently, there are two common molecular tests in clinical use in the United States: Afirma® Gene Expression Classifier (GEC; Veracyte, South San Francisco, California, USA), established in 2012,60 and ThyroSeq® V2 (CBLPath, New York, New York, USA), established in 2015.61 The 2015 ATA34 and 2016 AACE31 guidelines recommend consideration of molecular testing for indeterminate nodules (BSRTC III/IV) in establishing risk of malignancy and to determine the course of action (surgery versus observation).
The Afirma GEC is a 142-gene expression molecular assay that uses microarray to measure mRNA expression in order to classify a nodule as “benign” or “suspicious”. The test has a high reported sensitivity (92%) and negative predictive value (93%), with a low specificity (52%) and positive predictive value (47%),60 making this a “rule-out” test for malignancy. Since the reclassification of NIFTP, a decrease in positive predictive value from 42% to 24% is seen in the BS III group and from 23% to 13% in the BS IV group.62 This implies that the GEC is better able to detect carcinoma when EFVPTC is included, and less able to detect carcinoma under the NIFTP classification.63
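The drop in positive predictive value after the NIFTP reclassification follows from how predictive values depend on the prevalence of malignancy: reclassifying some lesions as benign lowers that prevalence. The Python sketch below applies Bayes' rule to the quoted sensitivity and specificity. The prevalence values are illustrative assumptions, not figures from the cited studies; a prevalence near 32% is used because it approximately reproduces the quoted predictive values.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test at a given prevalence, using Bayes' rule."""
    tp = sensitivity * prevalence              # true positives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)        # true negatives
    fn = (1 - sensitivity) * prevalence        # false negatives
    return tp / (tp + fp), tn / (tn + fn)


# Quoted GEC sensitivity and specificity; prevalences are assumed for illustration.
for prevalence in (0.32, 0.20, 0.10):
    ppv, npv = predictive_values(0.92, 0.52, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
```

As the assumed prevalence falls, the positive predictive value falls with it while the negative predictive value rises, which is consistent with the test's role as a "rule-out" assay.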
ThyroSeq V2, designed to identify malignant thyroid nodules using next generation sequencing, detects 14 thyroid cancer-related genetic mutations, including RAS and BRAF mutations, 42 types of gene fusions associated with thyroid cancer, and mRNA expression levels for 16 genes. ThyroSeq is reported to have a sensitivity of 90%, specificity of 93%, positive predictive value of 77–83%, and negative predictive value of 96–97%, with the ability to stratify risk based on the mutation detected.64,65 It is considered a test to “rule-in” malignancy.64 Since the newer classification of NIFTP, a recent study reports a lower positive predictive value with ThyroSeq, 42% and 33% when NIFTP is considered malignant or benign, respectively.66
BRAFV600E (BRAF) is an amino acid substitution at position 600 in BRAF, found in approximately 45–69% of all papillary thyroid carcinomas,67 with a 100% specificity for papillary thyroid carcinoma. However, a low overall sensitivity (40–60%) prevents BRAF from being a valuable screening test alone.68,69 In contrast, its presence in a cytologically malignant tumor may predict tumor aggressiveness,70,71 as it activates various molecular mechanisms that accelerate the tumor’s natural course.71 In cytologically indeterminate nodules (BS III/IV/V), detection of BRAF mutations can improve diagnostic accuracy and reduce unnecessary surgeries.72 In NIFTP nodules, BRAF is absent.73
The three isoforms of RAS (NRAS, HRAS, KRAS), along with PAX8/PPARG and RET/PTC rearrangements, are detected at a lower frequency than BRAF.74 Some evidence suggests that RAS, PAX8/PPARG, or RET/PTC rearrangement-positive nodules may be histologically benign but carry a high potential of becoming malignant,75 or are associated with distant metastasis.76
Currently, expert opinion recommends ultrasound follow-up of nodules 1–2 years after an initial cytologically benign FNA, due to possible initial false-negative aspiration results. Recent studies are beginning to challenge this monitoring interval and suggest a longer 2–4-year follow-up interval.77–80 A recent study of 2,000 cytologically benign nodules noted no long-term sequelae 4 years after initial benign cytology, even if the nodule turned out to have been a false-negative discovered 4 years after the initial biopsy.78
A common question is, “once a nodule has been re-evaluated, and confirmed to be benign, how long should it be monitored?” There is not yet a clear consensus on the answer. The ATA suggests follow-up with ultrasound every 3–5 years but without a specified endpoint.34 A recent study suggests that in proven biopsy-negative nodules, consideration should be given to stopping monitoring beyond 3 years, as these nodules exceedingly rarely become malignant.81 In the pediatric population, follow-up ultrasound every 1–2 years is currently appropriate until further research shows otherwise.82 For FNA-benign nodules we prefer repeat ultrasound in 1 year, and periodically thereafter.
Historically, thyroxine therapy was commonly used to shrink thyroid nodules, typically for cosmetic reasons. This practice increases the risk of adverse events (e.g., arrhythmia, loss of bone mineral density),83,84 along with long-term unnecessary pill use. Since many reports have shown minimal to no efficacy with this practice,85–87 the authors feel that the use of thyroxine to shrink nodules should be abandoned.
While the main focus of this article is the evaluation of thyroid nodules via ultrasound and cytology, we must not forget biochemical testing. Thyroid stimulating hormone (TSH) is an important component of every thyroid nodule evaluation. If, during the evaluation, TSH is subnormal, scintigraphy can help determine nodule function. A hyperfunctional or “hot” nodule is rarely malignant and biopsy is typically not warranted,34 thus eliminating the need for serial ultrasound or FNA. In the case of a multinodular goiter, scintigraphy can separate the “hot” from the “cold” (nonfunctional) nodules. This allows the practitioner to focus on the nodules that possess a higher risk of malignancy.
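The TSH-based triage described above can be written as a short decision function. This is a minimal sketch of the reasoning in this paragraph, not a clinical algorithm: the function name, argument names, and returned strings are assumptions made for the example.

```python
from typing import Optional


def next_step(tsh_subnormal: bool, hyperfunctional: Optional[bool] = None) -> str:
    """Suggest the next evaluation step from TSH and, if available, scintigraphy.

    hyperfunctional is None when scintigraphy has not yet been performed.
    """
    if not tsh_subnormal:
        # Normal or elevated TSH: proceed with sonographic risk stratification.
        return "ultrasound risk stratification"
    if hyperfunctional is None:
        # Subnormal TSH: scintigraphy determines nodule function first.
        return "scintigraphy"
    if hyperfunctional:
        # "Hot" nodules are rarely malignant.
        return "biopsy typically not warranted"
    # "Cold" nodules carry the higher risk and are the ones to focus on.
    return "ultrasound risk stratification"


print(next_step(tsh_subnormal=True))
```

In a multinodular goiter, the same function would be applied nodule by nodule once scintigraphy has separated hot from cold nodules.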
The detection of thyroid nodules has increased dramatically over time with the increased use of different imaging modalities. This has also led to a higher incidence/detection of thyroid malignancy. In a patient with normal or elevated TSH, ultrasound remains the method of choice to determine the initial risk of malignancy of a thyroid nodule. Oftentimes, poor or incomplete reporting does not give the practitioner sufficient information to determine whether biopsy is indicated, leading to overaggressive therapy. The ATA, AACE, and ACR have been standardizing their respective reporting systems to help alleviate this issue. While different from one another, their similar accuracy allows an organization to adopt whichever one best suits its needs.
In 2017, the Bethesda System changed the classification of EFVPTC to NIFTP. While the change is still in its infancy, its widespread use will limit unnecessary surgical procedures and minimize post-surgical hypothyroidism. Molecular testing, previously cost restrictive, has greatly improved the ability to “rule-out” (Afirma) or “rule-in” (ThyroSeq) thyroid malignancy in Bethesda III/IV nodules. Also, newer guidelines recommend the use of molecular markers in indeterminate nodules to help guide surgical recommendations.