Ramalingam et al.

BMC Oral Health (2024) 24:349

RESEARCH Open Access

Light gradient boosting-based prediction of quality of life among oral cancer-treated patients
Karthikeyan Ramalingam1, Pradeep Kumar Yadalam2*, Pratibha Ramani1, Murugesan Krishna3, Salah Hafedh4*,
Almir Badnjević5, Gabriele Cervino6 and Giuseppe Minervini7,8

Background and introduction Oral and lip cancer ranks sixth in global mortality, with a reported rate of 10.2%. Because mouth opening and swallowing become difficult, most oral cancer patients present only at later stages, and they worry about surviving cancer and receiving therapy. Oral cancer severely affects quality of life (QOL), which is shaped by risk factors, disease site, and treatment. Using questionnaires completed by oral cancer patients, we applied light gradient boosting tree classifiers to predict quality of life.
Methods Records of 111 oral cancer patients were obtained from DIAS (Dental Information Archival Software). Findings were documented with the European Organisation for Research and Treatment of Cancer (EORTC) QLQ-C30 and QLQ-HN43 questionnaires. Patients of any gender or age were eligible to enroll. The Institutional Ethical Clearance Committee approved this work (IHEC/SDC/PhD/OPATH-1954/19/TH-001). After giving informed consent, patients completed the EORTC QLQ-C30 and QLQ-HN43 questionnaires, which were available in Tamil and English, and the resulting QOL ratings covered several domains. Patient demographics, case history, and therapy information were obtained from DIAS. Enrolled patients were followed for at least one year, after which the EORTC questionnaires were administered again and the scores recorded. This prospective analytical exploratory study at Saveetha Dental College, Chennai, India, examined QOL at diagnosis and at least 12 months after primary therapy in patients with histopathologically diagnosed oral malignancies. Quality of life was modeled through data preprocessing, feature selection, and model construction, and the accuracy of the light gradient boosting classifier was assessed with a confusion matrix.
Results Light gradient boosting predicted cancer patients’ quality of life with 96% accuracy and 0.20 log loss.
Conclusion Oral surgeons and oncologists can improve planning and therapy with this prediction model.
Keywords Quality of life, Oral cancer, Machine learning, EORTC, Clinical, A.I

Full list of author information is available at the end of the article


Background and introduction
In the current global context, the most sought-after treatment modality is one that stresses patient autonomy [1]. Even in the treatment of patients with oral cancer, there is a tendency toward individualized management that considers numerous outcomes beyond survival and response rate. Oral and lip cancers rank sixth globally in terms of mortality, with a reported rate of 10.2%. Because of difficulty in mouth opening and swallowing, most oral cancer patients present only when the condition has reached an advanced stage. They also experience anxiety about surviving a cancer diagnosis and receiving further treatment [2, 3]. The European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire Head and Neck Module (EORTC QLQ-HN43) is a revised and updated version of the Head and Neck Cancer Module (QLQ-HN35) [4, 5]. It is a supplementary module to the General Quality of Life Questionnaire Core 30 (QLQ-C30). The QLQ-HN43 incorporates twelve multi-item scales that assess pain in the mouth, swallowing, problems with teeth, dry mouth and sticky saliva, problems with senses, speech, body image, social eating, sexuality, problems with the shoulder, skin problems, and fear of progression. In addition, seven single items assess problems opening the mouth, coughing, social contact, swelling in the neck, weight loss, problems with wound healing, and neurological problems. Artificial intelligence (AI) [6–12], which imitates human cognitive processes, is a revolutionary technology that has captured the attention of scientists worldwide [13–29].
Machine learning (ML) algorithms have been developed that predict reduced health-related quality of life (HRQoL) with high accuracy in patients with benign or low-grade brain tumors, suggesting that they can predict symptoms and global HRQoL decline up to 60 months after surgery [30]. Another study examined whether ML algorithms could predict HRQoL improvements after sensorimotor rehabilitation for stroke. Five ML algorithms were used, of which random forest and k-nearest neighbors effectively predicted recovery; important predictors included age, gender, baseline HRQoL, wrist and hand muscle function, arm movement efficiency, and sensory function. Hecksher coined the term “quality of life” (QOL), which the U.S. National Library of Medicine accepted as a keyword in 1977. The World Health Organization defines “quality of life” as a person’s assessment of their place in life within the framework of their culture and in relation to their aspirations, norms, expectations, and worries [13, 31]. Previous studies comparing the management of older and younger patients with head and neck cancer found that older patients had more comorbidities and stage IV tumors. Treatment options were similar, although radiotherapy was more common in older patients.
The emphasis in health measurement has recently shifted away from conventional metrics such as mortality and morbidity. Comprehensive treatment planning must include indicators that show how disease and impairment affect daily activities and behavior, subjective health, and disability or functional status [3]. In oral healthcare, machine learning-based disease prediction using clinical data parameters appears promising. Advanced algorithms such as light gradient boosted trees analyze and interpret large amounts of patient data to improve forecasts, early detection, and personalized treatment strategies. These algorithms are fed clinical factors such as patient demographics, medical history, test results, and imaging findings, and they learn patterns and relationships in the data when a model is trained on a dataset with known outcomes. The Light Gradient Boosting Machine (LightGBM) [32] is a gradient boosting framework that improves training efficiency and prediction accuracy on large datasets. Random forest, another ensemble machine learning technique, aggregates many decision trees to improve forecast accuracy and handle complex datasets. Few studies have used advanced machine learning to predict quality of life from questionnaire data. Predicting quality of life helps identify patients at higher risk of negative impacts, enabling targeted interventions such as psychological support or symptom-management strategies for those needing additional assistance [33–36]. We aimed to predict quality of life with light gradient boosted tree classifiers, using questionnaires completed by oral cancer patients.

Methods
Thirty-five closed-ended questions regarding oral cancer before and after treatment were formulated. A validation committee of postgraduate students and faculty from the Department of Oral Pathology, Saveetha Dental College, Chennai, examined each question. The sample size was estimated statistically as 201, using a 95% confidence level, a 5% margin of error, and an expected failure rate of 20%. Two hundred one patients were approached for a questionnaire study conducted with the snowball sampling technique, and 111 of these 201 agreed to take part in the survey. A web-based data protocol was used to retrieve data and gather responses quickly and securely. The European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire Head and Neck (EORTC QLQ-H&N) [5, 37] is a patient-reported outcome (PRO) measure designed to assess the quality of life of patients with head and neck cancer. It is a comprehensive questionnaire that covers a wide range of domains, including physical, functional, emotional, and social well-being.
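Responses to EORTC instruments such as the QLQ-C30 and the head and neck modules are usually not analyzed as raw item answers. The EORTC scoring manual averages the items of each scale into a raw score and linearly transforms it to a 0 to 100 range, using separate formulas for functional scales, symptom scales, and global health status. The article does not state which scoring procedure it applied, so the function below is only a minimal sketch of that standard transformation. The function name, the `kind` labels, and the handling of missing items (the manual's "half rule") are assumptions of this illustration, not the authors' pipeline.

```python
from statistics import mean
from typing import Optional, Sequence


def eortc_scale_score(responses: Sequence[Optional[int]], item_range: int, kind: str) -> Optional[float]:
    """Convert one scale's item responses to a 0-100 EORTC-style score (illustrative sketch).

    responses  -- item answers, None for a missing item
    item_range -- max minus min response: 3 for 1-4 items, 6 for 1-7 items
    kind       -- "functional", "symptom" or "global" (hypothetical labels)
    """
    answered = [r for r in responses if r is not None]
    # Half rule: score the scale only if at least half of its items were answered,
    # which amounts to treating missing items as the mean of the answered ones.
    if not answered or 2 * len(answered) < len(responses):
        return None
    raw = mean(answered)               # raw score = mean of the answered items
    fraction = (raw - 1) / item_range  # position on the response scale, 0..1
    if kind == "functional":
        return (1 - fraction) * 100    # higher = better functioning
    if kind in ("symptom", "global"):
        return fraction * 100          # symptom: higher = worse; global: higher = better
    raise ValueError(f"unknown scale kind: {kind!r}")


print(eortc_scale_score([2, 3], 3, "functional"))        # 50.0
print(eortc_scale_score([4, 4], 6, "global"))            # 50.0
print(eortc_scale_score([None, None, 2], 3, "symptom"))  # None
```

The direction of each score matters when such values are used as model inputs or targets. Higher functional and global health scores mean better status, whereas higher symptom scores mean a greater symptom burden.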
The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of the Institute, Saveetha Dental College and Hospitals [Protocol number: IHEC/SDC/PhD/0 PATH-2212/22/001; Date: 23/08/2022]. All subjects gave written informed consent for inclusion before participating.
After informed consent was obtained, the EORTC QLQ-HN43 and QLQ-C30 questionnaires were administered to the patients in both Tamil and English. Various domains, such as pain, appearance, and oral function, were documented.
The EORTC QLQ-HN43 is a revised and updated version of the Head and Neck Cancer Module (QLQ-HN35). It is a supplementary module to the General Quality of Life Questionnaire Core 30 (QLQ-C30). The QLQ-HN43 incorporates twelve multi-item scales that assess pain in the mouth, swallowing, problems with teeth, dry mouth and sticky saliva, problems with senses, speech, body image, social eating, sexuality, problems with the shoulder, skin problems, and fear of progression. In addition, seven single items assess problems opening the mouth, coughing, social contact, swelling in the neck, weight loss, problems with wound healing, and neurological problems. The EORTC QLQ-C30 has 30 questions and the QLQ-HN43 has 43, so 73 questions must be answered in total. Other questionnaires, such as the Oral Health Impact Profile-14 (OHIP-14), are also available but not widely used. Hence, we studied QOL using the validated EORTC QLQ-HN43 and QLQ-C30 questionnaires.
The EORTC QLQ-H&N was developed by a multidisciplinary group of head and neck cancer experts, including patients, clinicians, and researchers. It has been extensively validated and is now widely used in clinical trials and research studies to assess the impact of head and neck cancer and its treatment on patients’ quality of life. The questionnaire consists of 35 items, each scored on a 4-point scale ranging from “not at all” to “very much.” The items are grouped into nine subscales:

•  Functional impairment.
•  Pain.
•  Emotional functioning.
•  Social eating.
•  Social contact.
•  Speech problems.
•  Swallowing problems.
•  Sensory problems.
•  Global health status/quality of life.

The EORTC QLQ-H&N is a valuable tool for assessing the quality of life of head and neck cancer patients. It can be used to track changes in quality of life over time, to identify areas of concern, and to evaluate the effectiveness of different treatment strategies.
We compared the overall health and QOL scores with the clinical and demographic details of our patients with oral cancer.
The SRB Committee of Saveetha Dental College, Chennai, India, provided ethical approval. The Declaration of Helsinki was followed in gathering the data and formulating recommendations. The questionnaire data were collected and preprocessed, and outliers were removed. A model was then chosen, trained, evaluated, tuned through hyperparameter adjustment, cross-validated, and made interpretable with the DataRobot tool using Light Gradient Boosted Trees. All other models showed lower accuracy.

Light gradient boosted trees
Light Gradient Boosted Trees (LightGBM) is a machine learning algorithm for large-scale datasets. It minimizes a loss function by gradient boosting, adding trees iteratively so that each new tree fits the gradients of the loss with respect to the current predictions. Its distinctive features, including a histogram-based approach for finding the best split points, a leaf-wise tree growth strategy, and a customized data storage layout, contribute to its speed and accuracy.
The architecture of LightGBM involves several key components:

1. Histogram-based Learning: LightGBM uses a histogram-based approach for tree construction, which reduces memory usage and speeds up training. Instead of using the exact feature values, it buckets them into discrete bins and builds histograms that represent the distribution of feature values.
2. Leaf-wise Tree Growth: Unlike traditional depth-wise tree growth, LightGBM grows trees leaf-wise, selecting at each step the leaf with the maximum loss reduction (delta loss). This approach tends to produce a more accurate model but may lead to overfitting, so limits such as maximum depth and regularization are applied to control it.
3. Gradient-based One-Side Sampling (GOSS): LightGBM can use GOSS for efficient gradient-based sampling during training. GOSS keeps the instances with large gradients, which contribute most to the error, and randomly samples from those with small gradients.
4. Exclusive Feature Bundling: LightGBM bundles mutually exclusive features, that is, sparse features that rarely take nonzero values at the same time, into single features. This reduces the number of features and improves efficiency, for example with one-hot-encoded categorical features.
5. Parallel and GPU Learning: LightGBM is designed for distributed computing and supports parallel and GPU learning, which makes it suitable for large datasets and accelerates training.
6. Regularization: LightGBM incorporates regularization techniques such as L1 and L2 penalties to prevent overfitting during training.

Results
The included patients were 56% male and 44% female, aged 30–70 years, with a mean age of 50 years. The sample included tobacco users (82%) and non-tobacco users (18%). Tobacco smoking was identified in 32% of patients, tobacco chewing in 31%, and both smoking and chewing in 3%.
The performance of LightGBM can be evaluated with metrics such as accuracy, recall, F1 score, and the area under the receiver operating characteristic curve (AUC-ROC). LightGBM is often faster and more memory-efficient than other boosting algorithms, such as XGBoost and AdaBoost. It can handle categorical features directly without one-hot encoding, which saves time and memory. Moreover, LightGBM provides options for controlling the trade-off between computation time and model accuracy: parameters such as the number of iterations, learning rate, number of leaves, and maximum depth can be adjusted to optimize performance for a specific use case.
Light gradient boosting predicted cancer patients’ quality of life with 96% accuracy and a log loss of 0.20.

AUC-ROC
The AUC-ROC is a metric used to evaluate binary classification models, indicating how well they distinguish positive from negative instances. It ranges from 0 to 1, with higher values indicating better performance. A high AUC-ROC indicates a good balance between sensitivity and specificity; the AUC equals the probability that a randomly chosen positive instance is ranked higher than a randomly chosen negative one.

Discussion
Artificial intelligence (AI) is a relatively new technology with great predictive potential. With so much digital data available, AI could enable meaningful decisions in choosing the best treatment for every patient. The EORTC global health status items are scored from one (very poor) to seven (excellent). We documented the overall quality of life score at diagnosis and one year after surgical intervention for 111 patients with oral cancer. The EORTC quality of life questionnaires (QLQ-C30 and QLQ-H&N35) were validated in India with 200 head and neck cancer patients, who completed the questionnaires at two treatment points, demonstrating their reliability and validity. Similarly, a previous study translated and validated the EORTC QLQ-H&N35 in Urdu [37], assessing its convergent and discriminant validity. The translations were comprehensible to all patients, with Cronbach’s alpha ranging from 0.75 to 0.98. The patient-reported content validity index scores were excellent, and weak bidirectional correlations were found with resilience, depression, and anxiety. Another study found that lower QoL scores at diagnosis and during the first year after diagnosis predict lower overall survival in patients with head and neck squamous cell carcinoma, independent of other factors [2, 3]. A further study examined the correlation between three commonly used instruments for assessing the quality of life of 33 head and neck cancer patients at Mato Grosso Cancer Hospital in Brazil and found a positive correlation [31, 38, 39]. None of these studies performed predictive analysis. We therefore applied a machine learning algorithm, LightGBM, which builds trees faster by using histograms and is therefore suited to large datasets. LightGBM performs well across tasks owing to leaf-wise tree growth and gradient-based one-side sampling.
There are two fundamental differences between random forests and gradient-boosted trees. First, gradient-boosted trees are trained sequentially, with each tree correcting the mistakes of the preceding ones, whereas the trees of a random forest are built independently; a forest can therefore be trained in parallel, but gradient-boosted trees cannot. Second, because random forest trees are independent, they can be combined in any order [40–42], whereas gradient-boosted trees have a predetermined order that cannot be altered.
The role of nutrition and old age in the quality of life of cancer patients is a critical aspect of cancer care. Nutrition is crucial for cancer treatment, influencing treatment outcomes, energy levels, and immune function. In older adults, age-related factors complicate the impact of cancer and its treatments on quality of life. Balancing nutritional interventions with age-related considerations is essential for maintaining and improving quality of life during cancer treatment.
Another study validated the Arabic version of the MD Anderson Dysphagia Inventory among 82 Saudi Arabian head and neck cancer patients [41, 43], demonstrating 100% feasibility, acceptable test-retest reliability, and concurrent validity with the EORTC Quality-of-Life Head and Neck Module [41, 44, 45]. In the present study, pre- and post-operative results showed good improvement in quality of life, and predictive modeling achieved an accuracy of 96% (Figs. 1, 2, 3 and 4, and Table 1). Another study compared QoL in patients with T1a glottic carcinoma treated with surgery or radiotherapy in U.K. specialist units [32, 40, 46–48]. Results showed similar overall QoL scores, with modest differences in certain
Fig. 1 Flow chart of the light gradient boosting process

Fig. 2 Light gradient boosting algorithm

Fig. 3 ROC curve of the predicted class

subscales that did not persist beyond four months [49], supporting the recommendation of transoral laser microsurgery. LightGBM may struggle with imbalanced datasets, in which one class has many more samples than the other, and it lacks interpretability. In questionnaire-based studies, oversampling the minority class or adjusting class weights may be needed to obtain more balanced predictions. Further, larger samples and better algorithms may help achieve accuracy sufficient for clinical applications.

Conclusion
In conclusion, the development and implementation of a prediction model based on quality of life in oral cancer patients can greatly enhance the planning and therapeutic processes for oral surgeons and oncologists. This model enables a more personalized approach to care, empowering patients and optimizing resource allocation to ensure the delivery of high-quality, patient-centered care.
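The Discussion notes that oversampling the minority class or altering class weights may be needed for imbalanced questionnaire data. Both remedies are sketched below in dependency-free Python. The "balanced" weight formula, n_samples / (n_classes × class_count), is the common heuristic used by scikit-learn's `class_weight="balanced"`, and LightGBM's scikit-learn interface accepts the same option (its documentation points to `is_unbalance` or `scale_pos_weight` for binary tasks). The function names here are hypothetical, and this is not the article's code.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def random_oversample(
    rows: Sequence[Row], labels: Sequence[Hashable], seed: int = 0
) -> Tuple[List[Row], List[Hashable]]:
    """Duplicate randomly chosen rows of each smaller class until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [row for row, lab in zip(rows, labels) if lab == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))  # sample with replacement
            out_labels.append(cls)
    return out_rows, out_labels


labels = ["n"] * 6 + ["y"] * 2
print(balanced_class_weights(labels))  # {'n': 0.6666666666666666, 'y': 2.0}
X_bal, y_bal = random_oversample(list(range(8)), labels, seed=1)
print(Counter(y_bal))                  # Counter({'n': 6, 'y': 6})
```

With balanced weights, each class contributes the same total weight (class_count × weight = n_samples / n_classes). Oversampling, if used, should be applied only to the training portion of each cross-validation split. Duplicating patients before splitting would let copies of the same record appear in both training and evaluation data and inflate accuracy.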
Fig. 4 Lift data of elastic net predictions
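The Results report accuracy and log loss, the AUC-ROC section defines that metric, and the confusion matrix below is given in percent. The dependency-free sketch that follows shows how each of these values is derived from binary labels (1 = positive class) and predicted probabilities. The 0.5 decision threshold, the clipping constant, and all function names are assumptions for illustration. This is not the evaluation code behind the reported figures.

```python
import math
from typing import Dict, List, Sequence


def predict_labels(y_prob: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn predicted probabilities of the positive class into 0/1 labels."""
    return [1 if p >= threshold else 0 for p in y_prob]


def accuracy(y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of patients whose thresholded prediction matches the true label."""
    preds = predict_labels(y_prob, threshold)
    return sum(1 for t, p in zip(y_true, preds) if t == p) / len(y_true)


def log_loss(y_true: Sequence[int], y_prob: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative instance")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_percent(y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Confusion-matrix cells as percentages of all patients."""
    counts = {"TN": 0, "FP": 0, "FN": 0, "TP": 0}
    for t, p in zip(y_true, predict_labels(y_prob, threshold)):
        outcome = "T" if t == p else "F"   # was the prediction correct?
        polarity = "P" if p == 1 else "N"  # which class was predicted?
        counts[outcome + polarity] += 1
    return {cell: 100.0 * n / len(y_true) for cell, n in counts.items()}


y_true = [0, 0, 0, 1, 1]
y_prob = [0.1, 0.2, 0.6, 0.7, 0.9]
print(accuracy(y_true, y_prob))           # 0.8
print(roc_auc(y_true, y_prob))            # 1.0
print(confusion_percent(y_true, y_prob))  # {'TN': 40.0, 'FP': 20.0, 'FN': 0.0, 'TP': 40.0}
print(log_loss(y_true, y_prob))
```

The AUC here compares every positive-negative pair, which is exact and fast enough for a cohort of about a hundred patients. Larger datasets usually call for a sort-based computation.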

Confusion matrix (percent)
                  Predicted n      Predicted y
Actual n          72.73% (TN)      0.00% (FP)
Actual y          0.00% (FN)       27.27% (TP)

Acknowledgements
Not applicable.

Author contributions
Conceptualization, KR and PKY; methodology, PR, MK and KR; software, PKY and KR; formal analysis, PKY and PR; investigation, KR and PR; data curation, PKY and MK; writing (original draft preparation), SH, AB, GM and GC; writing (review and editing), GC, MC and GM; supervision, GM; funding acquisition, SH; administration, SH. All authors have read and agreed to the published version of the manuscript.

Funding
This research received no external funding.

Data availability
The data will be available on reasonable request from the corresponding author.

Declarations

Ethics approval and consent to participate
The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of the Institute, Saveetha Dental College and Hospitals [Protocol number: IHEC/SDC/PhD/0 PATH-2212/22/001; Date: 23/08/2022]. All subjects gave written informed consent for inclusion before they participated in the study.

Competing interests
The authors declare no competing interests.

Author details
1 Department of Oral Pathology and Microbiology, Saveetha Dental College and Hospitals, Saveetha Institute of Medical and Technical Sciences (SIMATS), Saveetha University, Chennai, India
2 Department of Periodontics, Saveetha Dental College and Hospital, Saveetha Institute of Medical and Technical Science (SIMATS), Saveetha University, Chennai, India
3 Department of Oral and Maxillofacial Surgery, Saveetha Dental College and Hospitals, Saveetha Institute of Medical and Technical Sciences (SIMATS), Saveetha University, Chennai, India
4 Orthodontics Department, Faculty of Dentistry, Sana’a University, Sana’a, Yemen
5 Verlab Research Institute for Biomedical Engineering, Medical Devices, and Artificial Intelligence, Ferhadija 27, Sarajevo 71 000, Bosnia and Herzegovina
6 Dental Sciences and Morphofunctional Imaging, University of Messina - Policlinico “Gaetano Martino”, Via Consolare Valeria, Messina, ME 98100, Italy
7 Saveetha Dental College and Hospitals, Saveetha Institute of Medical and Technical Sciences (SIMATS), Saveetha University, Chennai, Tamil Nadu, India
8 Multidisciplinary Department of Medical-Surgical and Dental Specialties, University of Campania Luigi Vanvitelli, Naples, Italy

Received: 7 January 2024 / Accepted: 19 February 2024