The lymphatic system’s primary function is to drain excess fluid from the tissues and return it to the bloodstream; when lymphatic drainage is impaired, fluid accumulates, causing the characteristic swelling of lymphoedema, a chronic condition most commonly affecting the arms and legs (Grada and Phillips, 2017; Greene and Goss, 2018; Keast et al, 2019; Azhar et al, 2020). Lymphoedema may be a result of congenital abnormalities, trauma, or infection, but more commonly, it is a post-surgical side-effect, especially after treatments for certain types of cancers (Unno et al, 2010; Azhar et al, 2020).
Lymphoedema is estimated to affect 90 million–250 million people globally, although this number is likely an underestimate owing to variability in diagnostic criteria and missed clinical recognition (Rockson and Rivera, 2008; Greene, 2015; Keast et al, 2019; Torgbenu et al, 2020). Primary lymphoedema is rare, affecting approximately 1 in 100,000 individuals. Secondary lymphoedema is more common, affecting approximately 1 in 1,000 Americans (Rockson and Rivera, 2008; Greene, 2015; Keast et al, 2019; Torgbenu et al, 2020).
In fact, 99% of lymphoedema is secondary (or acquired) lymphoedema, which is associated with higher morbidity, likely due to impaired compensation and comorbid conditions. In low- and middle-income countries, parasitic filariasis infection is the most common cause of lymphoedema. Lymph node radiation secondary to oncological surgery is the most common cause in high-income countries (Douglass and Kelly-Hope, 2019).
Lymphoedema is a significant cause of medical comorbidity, including chronic pain, functional impairment, recurrent infections, psychological distress and poor self-perception of body image (Greene, 2015). Various clinical scoring systems have been developed to evaluate the severity and progression of lymphoedema (Greene and Goss, 2018). These scoring systems offer a structured approach to assessing the extent of swelling, skin changes, functional impairment, and other clinical manifestations of the disease. Scoring systems can guide treatment decisions, monitor therapeutic outcomes, and facilitate standardised communication among healthcare professionals (Dambha-Miller et al, 2020).
However, a notable challenge in lymphoedema assessment is the heterogeneity of these scoring systems. Different scales prioritise various parameters, such as limb volume, skin thickness or functional outcomes. This variety means that there is no universal gold standard for assessing lymphoedema. As a result, the choice of a scoring system often depends on the clinical setting, the objectives of the assessment, and the preference of the healthcare professional.
To our knowledge, there is currently no published study that compares the various lymphoedema scoring systems. As such, we set out to delineate the diversity in the available clinical scoring systems and highlight the opportunities and challenges of such heterogeneity. As a secondary objective, we sought to review areas where unity can be achieved to allow for more actionable assessments.
Methods
The methods of the study were based on the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews guidelines (Tricco et al, 2018). The inclusion and exclusion criteria are shown in Table 1.
A medical subject librarian was consulted in the development of our search strategy, and searches included combinations of the following index terms: lymphoedema, clinical scoring system, assessment tool, severity of illness index, and severity classification. Searches were conducted in Medline, Embase, CINAHL, and Google Scholar to obtain all relevant articles as of 1 June 2024, with no restrictions on publication dates.
Two independent reviewers (SS and KS) screened articles using Covidence, and discrepancies were resolved through a consensus discussion. Studies that met inclusion criteria were further assessed with a full-text screen. All articles that could not be screened for eligibility based on title and abstract were moved to the full-text screening stage. At the full-text screening stage, each excluded study was assigned a specific reason for exclusion. Reference lists of the included articles were reviewed for additional studies to screen. The same two reviewers (SS and KS) independently recorded a predefined set of parameters from each scoring system in a spreadsheet, with conflicts resolved through consensus discussion.
Results
The literature search and screening process is presented as a PRISMA flow diagram [Figure 1]. The combined database searches yielded 690 records. After removing duplicates, 586 records underwent title and abstract screening; of these, 86 articles were reviewed in the full-text screening stage, 55 of which were excluded.
In the 31 included studies, 33 clinical scoring systems were described. Six described only upper-extremity lymphoedema [Table 2], 10 described only lower-extremity [Table 3], two described both the upper and lower extremities [Table 4], six were non-specific or general scoring systems [Table 5], and nine were for head and neck [Table 6].
The most cited parameters in the scoring systems included limb volume (23 studies), skin changes (20 studies), functional impairment (19 studies), and pain (15 studies).
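The counts reported above can be cross-checked with simple arithmetic. The short Python sketch below is only a consistency check on the figures stated in this section. The variable names are illustrative, and taking the number of included studies as the denominator for the parameter shares is an assumption based on the counts being reported per study.

```python
# Consistency check on the counts reported in the Results (illustrative names).
records_identified = 690
records_screened = 586        # records remaining after duplicate removal
full_text_reviewed = 86
full_text_excluded = 55

duplicates_removed = records_identified - records_screened
excluded_at_title_abstract = records_screened - full_text_reviewed
studies_included = full_text_reviewed - full_text_excluded

# Scoring systems by anatomical region, as listed for Tables 2 to 6.
systems_by_region = {
    "upper extremity": 6,
    "lower extremity": 10,
    "upper and lower extremity": 2,
    "non-specific": 6,
    "head and neck": 9,
}
total_systems = sum(systems_by_region.values())

# Number of studies citing each parameter.
parameter_counts = {
    "limb volume": 23,
    "skin changes": 20,
    "functional impairment": 19,
    "pain": 15,
}
# Assumption: the denominator is the number of included studies.
parameter_share = {
    name: count / studies_included for name, count in parameter_counts.items()
}

if __name__ == "__main__":
    print(f"Included studies: {studies_included}")
    print(f"Scoring systems: {total_systems}")
    for name, share in parameter_share.items():
        print(f"{name}: {share:.0%} of included studies")
```

The derived totals agree with the 31 included studies and 33 scoring systems stated in the text.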
Historically, the gold standard for subjectively grading upper extremity lymphoedema (UEL) has been the International Society of Lymphology (ISL) grading system (Yamamoto et al, 2013; Wiser et al, 2020). This symptom-based scale has broad categories ranging from subclinical lymphoedema to lymphostatic elephantiasis (Wiser et al, 2020). It is a convenient way to stage patients on presentation, but it relies only on an overall gestalt of the patient’s presentation.
More objective measurements supplemented this system, including the difference in volume or limb circumference between the two limbs (Yamamoto et al, 2013; Kim et al, 2020). Although these measurements were convenient and more accurate than ISL staging, they have two limitations: bilateral lymphoedema is more challenging to assess, and results are difficult to compare across individuals with different heights and BMIs.
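To make the interlimb comparison concrete, the following minimal Python sketch expresses the difference as a percentage of the unaffected limb. The function name, the choice of the unaffected limb as the reference, and the example values are illustrative assumptions, not a method prescribed by the cited studies.

```python
def interlimb_difference_percent(affected: float, unaffected: float) -> float:
    """Relative excess of the affected limb over the unaffected limb, in percent.

    Works for volumes or circumferences, provided both use the same unit.
    """
    if affected <= 0 or unaffected <= 0:
        raise ValueError("measurements must be positive")
    return (affected - unaffected) / unaffected * 100.0


if __name__ == "__main__":
    # Hypothetical arm volumes in millilitres.
    print(interlimb_difference_percent(2750.0, 2500.0))
```

The sketch also illustrates the bilateral limitation noted above: when both limbs are swollen, the reference value is itself inflated, so the computed difference understates the severity.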
Several alternative scales for UEL that rely on quantitative measurements have been suggested. The most notable quantitative scale is the UEL index suggested by Yamamoto et al (2013) which takes the circumference of five locations along the upper extremity and corrects for the patient’s BMI. This scale has the notable benefit of being comparable across individuals despite differences in BMI. It may be a prudent scale to assess the efficacy of lymphoedema therapies in trials. It is also a technique that is easily accessible and adopted by providers. However, it is more time-consuming than the ISL grading or volume/circumference measurements, which limits its adoption in routine follow-up visits.
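As a minimal sketch of how such an index could be computed, the Python example below assumes the index is the sum of the squared circumferences at the five locations divided by BMI. The text above states only that five circumferences are corrected for BMI, so the exact formula and the measurement sites are assumptions that should be checked against Yamamoto et al (2013). The function name and example values are hypothetical.

```python
from typing import Sequence


def circumference_index(circumferences_cm: Sequence[float], bmi: float) -> float:
    """Sum of squared circumferences divided by BMI (assumed form of the index)."""
    if len(circumferences_cm) != 5:
        raise ValueError("expected circumferences at five locations")
    if bmi <= 0 or any(c <= 0 for c in circumferences_cm):
        raise ValueError("circumferences and BMI must be positive")
    return sum(c ** 2 for c in circumferences_cm) / bmi


if __name__ == "__main__":
    # Hypothetical measurements in centimetres and a hypothetical BMI.
    print(circumference_index([20.0, 25.0, 28.0, 26.0, 18.0], bmi=22.0))
```

Dividing by BMI is what allows values to be compared across patients of different body habitus, at the cost of taking five measurements per limb at each visit.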
Lymphoscintigraphy and indocyanine green (ICG) lymphography have also been suggested for surgical planning in lymphoedema (Yamamoto et al, 2011; Yoon et al, 2020). Both appear to provide a comparable ability to assess the functional characteristics of the lymphoedematous limb, but they are significantly more specialised, less accessible to general practitioners, and uncommon in primary or urgent care settings.
The lower extremity lymphoedema (LEL) index is analogous to the UEL index. Both were proposed by Yamamoto et al and, consequently, have very similar benefits and drawbacks. Specifically, both offer increased robustness and depth of information and are easily accessible. However, both are time-intensive to complete compared with ISL staging or limb circumference measurement.
Interestingly, a greater variety of scales was identified for LEL, such as calf oedema area/volume by MRI, qualitative features on ultrasound, lymphoscintigraphy/ICG backflow measurements, the LEL index based on limb circumference correcting for BMI, and the Latency-Edema-Compression (LEC) score based on clinical factors (Cheville et al, 2003; Yamamoto et al, 2011, 2013; Lu et al, 2014; Yamamoto, 2016; Wang et al, 2018; Bjork et al, 2020; Omura et al, 2022; Shinaoka et al, 2022).
It seems beneficial to have this robustness of data for assessing lymphoedema in clinical trials, as mentioned for the UEL index. Along with the proposed LEL index, there is an LEC score, which uses functional parameters such as latency period (time to develop lymphoedema), duration of oedema, period of compression therapy and number of cellulitis episodes per year (Yamamoto et al, 2013). The LEC score stratifies patients into more of a binary classification based on only clinical factors.
More specialised imaging methods have also been proposed for LEL. MRI results appear promising for measuring tissue areas/volumes in different ISL stages of lymphoedema, providing a quantitative supplement for categorising ISL stages (Lu et al, 2014). It remains an open question whether MRI has utility in further stratifying patient populations with lymphoedema beyond the four ISL stages and whether MRI measurement has any clinical utility or predictive power.
The ISL and GDB Stages based on ICG Lymphography addressed both UEL and LEL (Wang et al, 2018; Garza et al, 2019). These systems aim to provide a comprehensive understanding of the patient’s lymphoedema status. Although covering both extremities gives a more complete picture, it can make the scoring method more complex and less accurate, because a score that does not take the affected region into account is less specific.
The same can be said for non-specific scoring systems, six of which were identified. The Common Toxicity Criteria (CTC) lymphoedema criteria and ISL scales are the most commonly used in practice (Cheville et al, 2003). The CTC takes multiple factors into consideration, including patient-reported symptoms and clinical features such as dermal changes, regions where lymphoedema is present, inter-limb discrepancies, obscuration of the genitals, lymph-related fibrosis, and phlebolymphatic cording (Cheville et al, 2003). While comprehensive, this score is not easily accessible or understood by those without previous experience in the field.
Scoring systems such as the British Lymphology Society Staging System group those affected into four categories based on risk factors: regional involvement, presence of malignancy, and limb volume (Honnor, 2006).
Other specialised imaging modalities, such as elastography, have been described. However, they have not been validated as stand-alone scoring tools or integrated into any pre-existing lymphoedema scoring system. Bioimpedance spectroscopy has also been described as a rating tool but is binary, non-specific, and not commonly used in practice (Ridner et al, 2018).
Several head and neck lymphoedema (HNL) scoring systems have also been described. One notable system is the Head and Neck External Lymphoedema and Fibrosis Assessment Criteria, which categorises scores by clinical signs, subjective symptoms, and functional impairment (Deng et al, 2015). The Secondary Quadrant Upper Lymphoedema criteria, guided by ISL guidelines, include multiple objective measures: bioimpedance analysis, circumferential measurement, water displacement, perimetry and imaging (Levenhagen et al, 2017). The MD Anderson Cancer Center HNL rating scale simplifies categorisation into three levels based on visual assessments of lymphoedema and the presence or absence of pitting, similar to the Common Terminology Criteria for Adverse Events and Compression Class scores (Deng et al, 2011).
Other scoring systems, not yet validated, include the ALOHA scale, which uses two unique metrics, MoistureMeter D and neck tape measuring systems (Nixon et al, 2014; Purcell et al, 2016). Using endoscopy, the Modified Patterson scale looks specifically at laryngeal and pharyngeal oedema in head and neck cancer patients (Starmer et al, 2021). This more subjective assessment depends on the user’s comfort and skill level with endoscopy.
Discussion
The purpose of this study was to explore the range of clinical scoring systems for the evaluation of lymphoedema to identify areas where standardisation and unification could be achieved.
We identified 33 clinical scoring systems, targeting different regional areas affected by lymphoedema and focusing on varied parameters. While certain parameters, like limb volume, were widely recognised and incorporated, others, such as psychological distress and self-perception of body image, were integrated in only a subset of systems.
Further, classification systems fell into two predominant categories: binary scoring (present versus absent) and graded systems in which each grade is defined by specific clinical signs. Regarding usability and clinical applicability, scoring systems with fewer parameters were reported to be more user-friendly and time-efficient in busy clinical settings. However, they might compromise on the granularity and comprehensiveness of the assessment. Conversely, while offering a thorough assessment, more detailed systems might be too cumbersome for routine clinical evaluations.
The diversity of clinical presentations of lymphoedema is reflected in the heterogeneity of its scoring systems. As demonstrated in this study, a broad range of systems is currently in use, and the challenge of selecting the system(s) that best align with the objective and patient population falls on the clinician or researcher. While beneficial in capturing the nuanced presentations of lymphoedema, this diversity poses challenges for standardising assessments, comparing results across studies, and ensuring consistent patient care across different settings.
The heterogeneity in scoring systems, beyond reflecting the complexity of the disease, also underscores gaps in the collective understanding and approach to lymphoedema. While some systems are comprehensive in their assessment, capturing the condition’s physical and psychological facets, others focus narrowly on specific clinical signs or symptomatology. This variation might lead to disparities in diagnosis, treatment, and long-term patient care. For example, only 12 of the 33 scoring systems incorporated an assessment of the patient’s psychological well-being despite it being a significant comorbidity of lymphoedema.
This reveals a potential gap in the holistic assessment of patients with lymphoedema and highlights the need for a more comprehensive approach that considers both the physical and emotional ramifications of the condition.
Conclusion
Addressing the heterogeneity in lymphoedema scoring systems requires a two-pronged approach.
Firstly, there is a need for an evidence-based consensus among experts in the field. Collaborative efforts to synthesise the strengths of existing systems and address their gaps can pave the way for a more unified, comprehensive scoring method. This not only aids in standardising clinical assessments, but also ensures that research findings across different studies are comparable.
Secondly, the integration of patient feedback in refining these systems is crucial. Since lymphoedema impacts patients’ lives on multiple fronts, patients offer invaluable perspectives on what dimensions of the disease are most pertinent to their quality of life. By bridging clinical expertise with patient experiences, the field will be able to move towards a more standardised approach to lymphoedema assessment and care.