Received 29 August 2022, accepted 11 September 2022, date of publication 22 September 2022, date of current version 7 October 2022.

Digital Object Identifier 10.1109/ACCESS.2022.3208587

The Role of Intelligent Technologies in Early Detection of Autism Spectrum Disorder (ASD): A Scoping Review
Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India
Corresponding author: Manu Kohli ([email protected])

ABSTRACT Background: A two-year delay is reported between the first developmental concern raised by
the parents and the diagnosis of ASD (Autism Spectrum Disorder), delaying the start of early intervention
programs most beneficial within the first three years. Aim: Evaluate the role of technology in ASD detection
by answering four research questions analyzing 1) evolution of technology, 2) use of various bio-behavioral
data sources, 3) demographic categories, databases, controls, comparators, and assessment instruments, and
4) data collection, processing, and outcomes of the technology-based methods in ASD detection. Methods:
The scoping review included behavioral-based ASD screening and diagnostic studies published between 1st
January 2011 and 31st December 2021 in PUBMED, SCOPUS, and IEEE Xplore databases for children under
six years. The studies were evaluated using the Critical Appraisal Skills Programme (CASP) and the PRISMA
scoping review checklist (PRISMA-ScR). Results: The shortlisted 35 studies were categorized into seven
bio-behavioral categories. The review highlighted the extensive use of machine learning (ML) and Deep
Learning (DL) to detect infants (as young as 9 to 12 months) at risk of ASD and other developmental
delays (ODD) using multimodal structured and unstructured data. However, the review reported various
internal and external validity threats. Conclusion: Technology can significantly improve the current ASD
detection process. The validation and adoption of technology can be fast-tracked by 1) designing robust
study protocols, 2) executing multi-cultural field trials, 3) standardizing datasets, data quality, and feature
engineering methods, and 4) recruiting a statistically significant number of participants from ASD, typically developing (TD)
and other developmental disorders (ODD) groups to ensure technological generalization, validation, and
adoption outside laboratory settings.

INDEX TERMS Autism, screening, diagnosis, technology, machine learning, mobile technology, artificial

The associate editor coordinating the review of this manuscript and approving it for publication was Kumaradevan Punithakumar.

I. INTRODUCTION
Autism Spectrum Disorder (ASD) affects roughly 1.3 million children annually, at a conservative diagnosis rate of one in a hundred [1], and has increased 700% since 1996 [2].
Generally, the best-recommended practices to detect ASD and other developmental disorders (ODD) are developmental monitoring, screening, and diagnosis [3]. The World Health Organization [4] recommends developmental monitoring in Low and Medium Income countries (LMICs) for the early identification of developmental challenges. Screening is a more formal, standardized method that includes routine pediatric evaluations [5], usually practiced in high-income countries (HIC). Practitioners such as doctors, nurses, and school teachers request families to respond to level 1 screeners [6] such as M-CHAT-R/F [7], evaluating children’s social communication, peer interaction, eye contact, motor skills, and fixated behaviors, if any. Further, High-Risk (HR) children,
plications, low birth weight, or admitted to newborn intensive

M. Kohli et al.: Role of Intelligent Technologies in Early Detection of ASD: A Scoping Review

care unit (NICU), are recommended to undergo additional developmental risk assessments [8].
An exhaustive developmental examination can confirm diagnosis and referral to intervention if the screening instrument indicates a developmental concern. Clinicians usually implement gold-standard tools such as Autism Diagnostic Observation Schedule (ADOS-2) [9] and Autism Diagnostic Interview-Revised (ADI-R) [10] to confirm ASD diagnosis.
Though early ASD indicators are evident at 12 months, and diagnosis is possible earlier than 18 months [11], most children are diagnosed between 48-60 months [12], [13], highlighting a delay of two years. Delayed diagnosis slows the initiation of early intervention services by 12-14 months [14]; such services can improve children’s IQ by 10-15 points if started under the age of three [15] due to the brain’s high neuroplasticity. Therefore, early ASD identification and intervention can ensure a better quality of life for ASD children.
There are various reasons for the delayed diagnosis or misdiagnosis of ASD among children. Firstly, children with ASD exhibit high variability in typical ASD features such as stereotypical interests, repetitive behaviors, and limited communication and social skills [1]. The high behavioral variance makes it challenging for the clinician to establish an early diagnosis for borderline and high-functioning ASD children, for example, with Asperger syndrome [16]. Moreover, with 80% of ASD cases diagnosed in males [1], women with ASD [17] are susceptible to diagnostic delays and misdiagnosis attributed to stereotypical gender biases [18]. Secondly, the symptomatic similarity of ASD with Attention Deficit Hyperactivity Disorder (ADHD) and speech delays [19] often leads to delayed diagnoses or misdiagnoses [20]. An accurate diagnosis is critical to identifying the child’s area of strength and developing a personalized need-based intervention plan [21]. Thirdly, the gold standard ASD diagnostic and screening tools such as ADOS [9], ADI-R [10], M-CHAT-R/F [7], and CARS2 [22] are designed for the western world. Therefore, these tests are sensitive to evaluation biases and subjective decision-making of clinicians from Low and Medium income countries (LMICs), resulting in incorrect results, primarily owing to lack of training and cultural disparities [23]. Fourthly, the availability of clinicians and infrastructure globally to assist ASD detection and management is limited [24], especially in LMICs, a challenge further constrained by the poor awareness of the disorder [25]. Also, families have limited access to clinicians and infrastructure and usually travel considerable distances or relocate to access services [26]. These limitations lead to lengthy wait times, delayed diagnosis, and stress for individuals and families [26], [27].
In addition, the current ASD detection process has limitations. The clinicians require significant training and time to implement diagnostic instruments [28]. A 93-point ADI-R questionnaire, for example, can take 2.5 hours to complete [29] across multiple visits. Further, interview responses are based on the caregiver’s subjective comprehension of assessment questions and their reliance on memory recall of the child’s developmental history, contributing to evaluation and assessment biases [30]. Moreover, developmental evaluations are seldom conducted in children’s natural contexts, such as in their homes. An encounter with a new clinician in a new environment with social performance pressure may trigger discomfort for the child, resulting in assessment and diagnostic biases.
Artificial Intelligence (A.I.) based innovations have fast-tracked ASD diagnostics [31], [32], increased clinician capacity, and improved access to early intervention programs [26]. The adoption of these technologies has surged during the COVID-19 pandemic [33]. These solutions have the following benefits over traditional face-to-face methods: 1) enhancing ASD management solution access to rural and underserved persons and families, 2) reducing doctors’ and patients’ expenditures (such as travel duration and cost), and 3) expanding providers’ coverage areas. The preliminary findings provide evidence of technological innovation’s feasibility and efficacy in improving current ASD detection and behavioral intervention methods, enhancing access, quality, and affordability. However, more in-depth analysis and information are needed to confirm the impact and outcomes of these innovations.
Scoping reviews are a descriptive method that aids in analyzing complicated or varied research projects by identifying the critical concepts, theories, and evidence sources to guide and evaluate the adoption of new methods into practice [34]. The results of scoping reviews can identify gaps in the existing literature and indicate areas with limited evidence to merit additional studies or a systematic review. We, therefore, performed a scoping review to evaluate the use of innovative technologies for ASD detection. We investigated a body of literature to examine the extent, nature, and scope of current research activities and answer the following four research questions based on the PICO framework [35], [36] aligned toward diagnostic innovations. In the framework definition, “P” signifies the population in focus, “I” the intervention or researched condition, “C” the comparators, and “O” the psychometric outcomes.
1) RQ1 How has the literature on technology-based ASD detection methods evolved?
2) RQ2 How do researchers use the various bio-behavioral markers to detect ASD?
3) RQ3 What demographic categories, databases, controls, comparators, and assessment instruments are a part of the technology-facilitated ASD detection process?
4) RQ4 How have researchers gathered and processed multimodal data? How do technological innovation’s results compare to conventional ASD detection methods?
The review is based on PRISMA scoping review guidelines and includes the following sections. Section II details eligibility criteria for study selection, keyword definition and justification, study search process, data extraction, and


analysis. The results are listed in section III, where we synthesized the review findings and answered four research questions. We present the results under seven multimodal data categories, technological subdivisions, analyzing data sources, data extraction, synthesis, and outcomes. Discussion section IV highlights internal and external validity threats, advantages, disadvantages, ethical, legal, and cultural constraints, high-level limitations, and mitigation measures and recommends future directions. Section V lists the study’s limitations and section VI lists future directions and additional focus areas for research. Finally, in section VII, we conclude our findings.

This section describes the study’s selection criteria, search strategy, justification, data extraction, and analysis. The review is conducted using the PRISMA Extension for Scoping Reviews (PRISMA-ScR) checklist [37]. The 22-point checklist is attached in the appendix section (See Appendix C).

A. ELIGIBILITY CRITERIA
The inclusion criteria for this study are as follows: (1) Studies that leveraged technology and included behavioral-based ASD screening or diagnostic methods; (2) included children under the age of six; (3) published between January 1, 2011, and December 31, 2021; (4) included quantitative ASD detection methods including cross-sectional experiments, longitudinal data analysis, and dataset investigations; and (5) were part of one of the three electronic databases: PUBMED, IEEE Xplore, and SCOPUS. The following are the search criteria justifications:
1) Most evidence-based ASD detection methods [38] and tools [9], [10], [22], [30] track social communication, eye contact, challenging behavior, and notable play-based landmarks to identify children with ASD. We, therefore, shortlisted studies that used these behavioral landmarks and excluded studies focusing on medicine, biology, genetics, EEG (electroencephalogram), MRI (Magnetic resonance imaging) usage, and non-technology-based ASD screening or diagnosis methods.
2) We excluded conference papers to ensure we included only high-quality peer-reviewed journal publications selected from PUBMED, IEEE Xplore, and SCOPUS.
3) We excluded literature reviews as we focussed on studies that conducted experiments, trials, datasets, or longitudinal multimodal data analysis.
4) Since 2011, the growth in mobile and edge-based A.I. innovations can be attributed to the emergence of low-cost, scalable cloud computing infrastructure and sensors [39], [40], [41], [42], [43]. Therefore, we selected studies published between January 1, 2011, and December 31, 2021, to evaluate the role of technology in ASD evaluation.
5) Given the importance and effectiveness of early ASD detection and intervention due to the brain’s strong neuroplasticity, the emphasis of the review was limited to studies that included children under the age of six.

B. SEARCH STRINGS
We searched the following search strings in the title, abstract, and keywords fields: (“Autism spectrum disorder” OR “ASD” OR “Autism” OR “AUTISTIC”) AND (“Detect*” OR “Predict*” OR “Diagnos*” OR “Screening” OR “Identif*” OR “Suspect” OR “Classif*” OR “Distinguish*” OR “Differentiate” OR “Risk”) AND (“Technology” OR “A.I.” OR “Artificial Intelligence” OR “Machine Learning” OR “Mobile”).
The search string justification is as follows:
1) The keywords “Autism spectrum disorder” OR “ASD” OR “Autism” OR “AUTISTIC” shortlisted studies focused on Autism Spectrum Disorder.
2) To ensure shortlisted studies focused on screening, diagnosis, detection, and identification of ASD, we included the following keywords: “Detect*” OR “Predict*” OR “Diagnos*” OR “Screening” OR “Identif*” OR “Suspect” OR “Risk”. The keywords “Classif*” OR “Distinguish*” OR “Differentiate” were included to shortlist studies that differentiate between ASD, TD (typical development), and ODD (other developmental disorder) groups.
3) The keywords “Technology” OR “A.I.” OR “Artificial Intelligence” OR “Machine Learning” OR “Mobile” helped shortlist studies with technology usage.
4) The search outcomes of the above three criteria were combined with an AND operator.

C. SEARCH RESULTS PROCESSING
Two authors, MK and AKK (Manu Kohli & Arpan Kumar Kar), completed each phase of the PRISMA scoping review depicted in Figure 1. Any contradictory results were resolved with the consultation and mediation of a third author, SS (Shuchi Sinha). The search results were downloaded, compiled, and imported into Zotero to check for duplicates and remove them. Zotero assists reference management by syncing citations with bibliographies, DOI (Digital object identifiers), and metadata. Each unique article’s title and abstract were screened for relevancy, followed by a full-text analysis per the inclusion-exclusion criteria listed in subsection II-A. Thirty-two studies were shortlisted after full-text analysis, and three additional publications [44], [45], [46] were uncovered by analyzing the shortlisted studies’ references, bringing the total shortlisted study count to 35.

D. DATA EXTRACTION AND ANALYSIS
The review included extracting the below-listed data from thirty-five shortlisted studies listed in two tables. Table 1 includes multimodal input data, feature reduction steps, environment setting, data processing algorithms, and


FIGURE 1. PRISMA flow diagram- Literature screening and shortlisting studies.
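
The Boolean search string reported in subsection II-B can be assembled programmatically. The following is a minimal Python sketch, assuming the three keyword groups listed in the review; the helper names `or_group` and `build_query` and the plain double-quote syntax are illustrative assumptions, and each database (PUBMED, SCOPUS, IEEE Xplore) may need its own field tags added around the result.

```python
# Keyword groups as listed in the review's search-string subsection.
CONDITION = ["Autism spectrum disorder", "ASD", "Autism", "AUTISTIC"]
TASK = ["Detect*", "Predict*", "Diagnos*", "Screening", "Identif*",
        "Suspect", "Classif*", "Distinguish*", "Differentiate", "Risk"]
TECHNOLOGY = ["Technology", "A.I.", "Artificial Intelligence",
              "Machine Learning", "Mobile"]


def or_group(terms):
    """Quote each term and join the terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*groups):
    """Join keyword groups with AND (hypothetical helper, not the authors' tool)."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(CONDITION, TASK, TECHNOLOGY)
print(query)
```

Keeping the groups as separate lists makes it easy to report, per group, why each keyword was included, which mirrors the justification list in the review.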

psychometric outcomes, i.e., sensitivity, specificity, and accuracy. Table 2 summarizes the enrolment counts, software or hardware devices used, assessment tools, assessment duration, limitations, and future direction of each study. In addition, a quality evaluation using the Critical Appraisal Skills Programme (CASP) was performed for each shortlisted study.
The technical terms used in the review are explained in Appendix B in Table 5.
1) Study objective, methods, and experiment locations
2) Participant’s group size and diagnosis status
3) Datasets used in the study
4) Bio-behavioral markers for data extraction
5) Assessment duration, tools, and methods
6) List of software, material, or devices used
7) Multimodal data collection steps
8) Data processing steps
9) Technology used in the study, and
10) Outcomes, limitations, and future direction

III. RESULTS
This section answers four research questions and presents quality assessment results.
Two authors (MK and AKK) undertook the quality evaluation of shortlisted studies using the Critical Appraisal Skills Programme (CASP) tool [80]. The studies were scored with three possible responses: a) criterion met, b) partially met, or c) not applicable, not met, or not mentioned, with scores of 2, 1, and 0, respectively. Table 3 shows implemented rating scales, referring to previous clinical studies [81], [82], to rank studies into high, medium, and moderate categories. The quality evaluation sheet for shortlisted studies is attached in Appendix C section.
The quality evaluation rated ten studies as moderate quality [46], [48], [49], [50], [52], [53], [56], [57], [63], [76], while twenty-five studies were classified as high quality [44], [45], [47], [51], [54], [55], [59], [60], [61], [62], [64], [65], [66], [67], [68], [69], [70], [71], [72], [74], [75], [77], [78], [83], [84]. In general, the quality analysis highlighted the following limitations for most studies: 1) small sample sizes, 2) unclear study questions, objectives, inclusion and exclusion criteria, 3) insufficient information on participant sampling and recruitment, and 4) imprecise data analysis and outcomes reporting.
In the following sections, four research questions are answered.

B. RQ1 HOW HAS THE LITERATURE ON TECHNOLOGY-BASED ASD DETECTION METHODS EVOLVED?
We respond to the research question by assessing the selected studies’ 1) temporal publishing, 2) co-authorships, and


TABLE 1. Data input, processing, and outcomes summary from (N=35) shortlisted studies.
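
The psychometric outcomes summarized in Table 1 (sensitivity, specificity, and accuracy) all derive from a two-by-two confusion matrix. A minimal Python sketch under that standard definition follows; the function name `psychometrics` and the counts are hypothetical and are not taken from any reviewed study.

```python
def psychometrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                  # share of true ASD cases flagged
    specificity = tn / (tn + fp)                  # share of non-ASD cases cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)    # share of all cases classified correctly
    return sensitivity, specificity, accuracy


# Hypothetical counts, for illustration only.
sens, spec, acc = psychometrics(tp=38, fp=21, tn=29, fn=12)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

Reporting all three values together matters because a screener can reach high sensitivity while its specificity stays low, a pattern the review notes for several classifiers.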

VOLUME 10, 2022 104891


3) keyword trends. In addition, we highlight prominent journals where shortlisted articles were published.

1) PUBLICATIONS TRENDS
The temporal publication patterns suggested that around 80% of the shortlisted studies were published between 2018 and 2021 (Figure 2). Even though the use of technology in ASD management and in general has shown growth since 2011 [85], [86], [87], the review highlights 2018 to 2021 as dominant years in the adoption of Machine Learning (ML) and Deep Learning (DL) technologies. This aberration can be attributed to the following inclusion criteria for shortlisting


TABLE 2. Shortlisted studies (N=35): participants, evaluation duration, hardware, software, limitations, and future directions summary.


TABLE 3. Rating scales and quality evaluation.
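
The CASP scoring rule described in the quality evaluation (2 for a criterion met, 1 for partially met, 0 otherwise) reduces to a simple sum per study. The sketch below is a minimal Python illustration under that rule; the response labels, the helper names `casp_total` and `casp_share`, and the example responses are assumptions, and the cut-offs that map a score to a quality category are those of Table 3 and are not reproduced here.

```python
# Score per response, as described in the review: 2, 1, and 0.
SCORES = {
    "met": 2,
    "partially met": 1,
    "not met": 0,
    "not applicable": 0,
    "not mentioned": 0,
}


def casp_total(responses):
    """Sum the scores of one study's checklist responses."""
    return sum(SCORES[response] for response in responses)


def casp_share(responses):
    """Total score as a fraction of the maximum achievable score."""
    return casp_total(responses) / (2 * len(responses))


# Hypothetical responses for one study across ten checklist criteria.
example = ["met"] * 7 + ["partially met"] * 2 + ["not mentioned"]
print(casp_total(example), casp_share(example))
```

Expressing the total as a share of the maximum keeps studies comparable if some checklist items are dropped, which is one reason a rating scale is usually defined on percentages rather than raw sums.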

FIGURE 3. Coauthorship pattern based on author’s country of origin and respective ties.

FIGURE 2. Yearly publication count per data category.

studies for the review: 1) selecting studies focussing on ASD detection rather than intervention, which has seen higher technological adoption, 2) including only behavioral-based detection methods and excluding EEG, MRI, and genetic methods that have incorporated technology since 2011, 3) selecting studies with participants younger than six years, and 4) skewed temporal adoption of technology in ASD detection.

The publication pattern shown in Figure 3 depicts the country as a node and its size as the publication frequency from the country’s authors. The country node’s edge strength indicates collaboration between co-authors from multiple countries. The significant country-level contributions are from the United States of America (USA), whose researchers co-authored with researchers from Austria, Bangladesh, China, Japan, and Israel. Authors from Iran, Canada, South Korea, the United Kingdom (UK), the Netherlands, Poland, Italy, the UAE, and India are the other countries with co-authorship collaborations. Authors from Brazil, France, Sweden, Spain, and Switzerland collaborated with other co-authors from the same country. The analysis highlights that most research initiatives and partnerships are from developed economies that have formed partnerships with selected developing economies.

3) KEYWORDS ASSOCIATION
Figure 4 depicts the most important and frequently used keywords in the shortlisted studies. The size of the keyword nodes represents the frequency of occurrences in the shortlisted studies, with the edge weights indicating their simultaneous occurrence in other studies.
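
The node sizes and edge weights behind such a keyword network can be derived with simple counting. The following minimal Python sketch assumes one keyword list per study; the three keyword lists are hypothetical, and the review does not state which tool produced its figures. The same counting applies to the country co-authorship network if each study is represented by its authors' countries.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists, one per study (illustration only).
studies = [
    ["autism spectrum disorder", "machine learning", "gaze"],
    ["autism spectrum disorder", "machine learning", "classifier"],
    ["autism spectrum disorder", "gaze", "joint attention"],
]

node_size = Counter()    # keyword -> number of studies that use it
edge_weight = Counter()  # (keyword_a, keyword_b) -> number of studies using both

for keywords in studies:
    unique = sorted(set(keywords))               # count each keyword once per study
    node_size.update(unique)
    edge_weight.update(combinations(unique, 2))  # every co-occurring pair, in sorted order

print(node_size.most_common(3))
print(edge_weight[("autism spectrum disorder", "machine learning")])
```

Sorting each study's keywords before pairing ensures that a pair is always stored under the same key regardless of the order in which the keywords were listed.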


Keywords with substantial edge weights and dense connections share a semantic relationship. The most frequently used keywords in research papers are “Autism spectrum disorder,” “child,” “age,” “clinician,” “joint attention,” “behavior,” “time,” “analysis,” “deep neural network,” “td child,” “machine learning,” “development,” “study,” “gaze,” “asd child,” “screening tool,” “feature,” “classifier,” “symptom,” “infants,” and “pattern.” These keywords suggest that most ASD solutions used ML and DL classification methods on multimodal eye-gaze, behavior, and joint-attention data.

4) JOURNAL PUBLICATIONS
The frequency distribution of the 35 reviewed articles was as follows: four in the Journal of Autism and Developmental Disorders, three in Scientific Reports, two in the Journal of Medical Internet Research, and the remaining publications were published in different journals. The breadth of studies published in various journals suggests the adaptability and validation of a wide range of technology-based ASD detection innovations, with multi-country authorships and multimodal data types.

C. RQ2 HOW DO RESEARCHERS USE THE VARIOUS BIO-BEHAVIORAL MARKERS TO DETECT ASD?
Each shortlisted study is assigned to one of the seven data categories shown in Figure 5, also referred to as bio behavior. Listed below are the study counts for each data category.
1) Stereotypical behavior (Nine Studies)
2) Eye gaze (Six Studies)
3) Facial expressions (Three Studies)
4) Postural analysis (Three Studies)
5) Motor control and movements (Four Studies)
6) Auditory data (Three Studies) and
7) Assessments and electronic health record data (Seven Studies).
We summarize shortlisted studies in seven data categories in the subsections below.

1) STEREOTYPED ASD BEHAVIORS
In this review section, nine studies [47], [48], [49], [50], [51], [52], [53], [54], [55] extracted and classified the ASD deterministic behaviors such as tantrums, self-stimulatory and injurious behavior, non-compliance, lining up objects, and poor communication, eye contact, or social skills from videos to perform ASD detection.
[47] developed ML models through a two-stage process: (1) feature selection and (2) ASD and TD classification. They trained ML models using historical ADI-R and ADOS-2 records, shortlisted 20 critical features using the DF (Decision Forest) algorithm, and incorporated them in the parental questionnaire (PQ) and annotation-based video-tagging module. In the second stage, researchers integrated responses of both modules applying L2-regularized logistic regression (LR) [88], whose psychometric outcomes outperformed those of M-CHAT, CBCL (Child Behavior Checklist), standalone questionnaire, and video modules. [48] enhanced their previous work by introducing a third clinician questionnaire module. The three-module screener implemented in 8-10 minutes outperformed earlier psychometric outcomes using the GBDT (Gradient Boosted Decision Tree) algorithm.
[49] collected one to five-minute home-based videos rated by non-experts, generating a feature set analyzed by eight ML classifiers previously trained on ADI-R and ADOS datasets. All classifiers had a sensitivity above 0.945, but only three had a specificity above 0.5. The LR5 (LR model with five shortlisted features) outperformed other ML models. [50] validated their previous work [49] on Bangladeshi children, including those with SLC (speech-language conditions). Non-expert US raters, after one-hour training, reviewed videos and responded to 31 multiple-choice questions, generating a feature set from the responses. The LR with Elastic Net penalty [89] and LR5 were the best performing ML models on the feature set, with sensitivity, specificity, AUC, and accuracy for ASD vs. TD of 0.76, 0.58, 0.76, and 0.70 and for ASD vs. ODD of 0.76, 0.77, 0.85, and 0.76, respectively.
[51] video-recorded mother and child social interactions of HR (High-Risk) toddlers aged 9-12 months in three social situations: 1) Face-to-Face (FF) mother-child interactions, 2) mother’s unresponsive Still-face (SF), followed by 3) usual mother and child interactions. The SVM classification model outperformed NB (Naive Bayes) and RF (Random forest) in ASD detection and classification.
Further, [52] developed a Video-referenced Infant Rating System for Autism (VIRSA). The system algorithm proposed a series of parent-infant interactive age-matched videos, with parents choosing the most appropriate ones matching their child, resulting in a score computation. At ages 6, 9, 12, and 18 months, children were clinically examined, diagnosed, and rated on the VIRSA. The statistical analysis of VIRSA scores predicted 100% ASD in children at 18 months and 78% at 36 months compared to diagnoses established using gold-standard tools. This study is a first step towards creating a novel video-based online rating system for detecting ASD in children with robust psychometric properties.
[53] developed a smartphone application, NODAsmartCapture, empowering parents to record home videos of their child’s behavior and label social dialogue, play, and problematic behavior in four social scenarios. Diagnosticians annotated the videos with built-in tags designed on DSM criteria such as “no eye contact” or “repetitive play,” matching ninety-one percent of their recommendations with the ground truth diagnosis recorded at study enrolment.
West syndrome (WS) disorder [90], diagnosed in 0.06% of infants and children, is characterized by epileptic spasms, often leading to mental impairment in children. [54] implemented ML to predict the onset of ASD/ID (Intellectual disability) in high-risk 9–12-month-old infants with WS. Researchers captured three video-recorded social engagement scenarios, and out of SVM, J48, and RF [91], the DS (Decision stump) [92] algorithm predicted WS vs. TD with 0.765 and WS+ vs. WS− with 0.812 accuracy using multimodal audio and video data.
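
A decision stump, the DS algorithm mentioned for [54], is a one-level decision tree that splits a single feature at one threshold. The following is a minimal Python sketch of that idea on one numeric feature; the function names, the exhaustive threshold search, and the toy data are illustrative assumptions and do not reproduce the pipeline or the features used in [54].

```python
def fit_stump(xs, ys):
    """Pick the threshold and polarity on one feature with the fewest training errors."""
    values = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct feature values.
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None  # (errors, threshold, polarity)
    for threshold in thresholds:
        for polarity in (1, -1):
            preds = [predict_stump(x, threshold, polarity) for x in xs]
            errors = sum(p != y for p, y in zip(preds, ys))
            if best is None or errors < best[0]:
                best = (errors, threshold, polarity)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1], best[2]


def predict_stump(x, threshold, polarity):
    """Predict class 1 on one side of the threshold; polarity selects which side."""
    return 1 if polarity * (x - threshold) > 0 else 0


# Hypothetical single feature with labels (1 = at risk), for illustration only.
xs = [0.1, 0.2, 0.35, 0.6, 0.7, 0.9]
ys = [0, 0, 0, 1, 1, 1]
threshold, polarity = fit_stump(xs, ys)
print(threshold, polarity, predict_stump(0.8, threshold, polarity))
```

Because a stump has only a threshold and a direction to learn, it is hard to overfit, which can be an advantage on the small samples that the quality evaluation flags for most studies.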


FIGURE 4. Shortlisted studies (N=35) keywords association.

FIGURE 5. Studies distribution - Seven multimodal data categories.

Using movie stimuli, [55] elicited and engaged the child’s attention and video-recorded the children’s behavioral and social reactions. They analyzed the scenes using computer vision to decipher children’s emotional and behavioral codings and head positions, and classified ASD children with 85-95% accuracy.

2) EYE-TRACKING
Eye-tracking is a non-invasive method for examining an individual’s attention and mental processing abilities, which serve as proxies for cognitive and neurological functioning.
In this review, six studies [56], [57], [58], [59], [60], [61] used eye-tracking and gaze analysis to measure fixation frequency, duration, and AOI (Area of Interest) responses from children’s gaze towards social and nonsocial stimuli in images and videos. The studies hypothesized that children with ASD prefer circumscribed interests (CIs) [93], preferring specific animated characters, toys, or activities. Researchers use the gaze preference of children on the content of images or AOI to make the ASD vs. TD classification.
In the experiment by [56], TD and ASD groups observed six scenic images with social (e.g., people) or without social cues (e.g., bowl). The researchers extended the experiment with twelve images, half with CI (e.g., a toy car) and another half without CI (e.g., a plant). T-tests on within-subject CI and non-CI eye-gaze data for the ASD and TD groups suggested poor social attention processing abilities for the ASD group.
The study [57] recruited children from ASD and TD groups who were similar in age and gender. For 10 seconds, participants observed a female speak the English alphabet, and their fixation data on various facial and body areas were collected. They applied DA (Discriminant analysis) to mouth and body AOI fixation data and classified ASD and TD children.
[58] studied six-month-old preterm children’s gaze and fixation on social figures, suggesting that children preferred looking at the eyes or lips of social figures over nonsocial images. However, at 18 months, each subject tested negative for ASD when evaluated on M-CHAT, and the study had no CG (control group); the results therefore provided weak evidence to detect and classify ASD and ODD.
[59] recorded participants’ eye movements while viewing eleven photographs and constructed a virtual network graph using temporal gaze patterns and fixation time on the seven AOI on the human face. Betweenness centrality at four face features (under the right eye, under the left eye, left eye, and mouth) was lower in ASD children than TD children by 27, 53, 42, and 61%, respectively, forming a basis of ASD detection.
[60] captured the gaze modulation of children with ASD and TD children using an eye tracker as they played a variant of the Go/No-Go game. AdaBoost’s meta-learning algorithm could distinguish ASD and non-ASD participants with an accuracy of 88.6% based on gaze patterns.
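
Betweenness centrality, the graph measure reported for the gaze network in [59], counts how often a node lies on shortest paths between other nodes. The sketch below is a minimal Python implementation of Brandes' algorithm for an undirected, unweighted graph; the AOI graph is hypothetical, and treating gaze transitions as unweighted, undirected edges is a simplifying assumption, since [59] also used temporal patterns and fixation time.

```python
from collections import deque


def betweenness(graph):
    """Unnormalised betweenness centrality (Brandes' algorithm) for an
    undirected, unweighted graph given as {node: set_of_neighbours}."""
    centrality = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths from source
        dist = {v: -1 for v in graph}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:              # first visit
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to the source.
        delta = {v: 0.0 for v in graph}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in centrality.items()}


# Hypothetical gaze-transition graph over face AOIs (illustration only).
aoi_graph = {
    "left eye": {"nose"},
    "right eye": {"nose"},
    "nose": {"left eye", "right eye", "mouth"},
    "mouth": {"nose"},
}
scores = betweenness(aoi_graph)
print(scores)
```

In this toy graph every gaze shift between two outer AOIs must pass through the nose, so the nose carries all of the betweenness; a lower value at an AOI means fewer gaze paths are routed through it.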


[61] evaluated whether an impaired response to joint attention (RJA) in infancy is a critical ASD marker. The infant’s eye gaze was recorded in a 10-minute session of several IJA (initiation of joint attention) tasks. Since newborns utilize their gaze for RJA and IJA, this method can be used to quantify children’s social cognition milestones at an early developmental age of 10–18 months.

3) FACIAL EXPRESSIONS
Children with ASD struggle to produce and perceive facial expressions that express a range of emotions and display affection [94], impacting their social functions. Deep learning (DL) models can identify facial expressions from images or videos in three steps: 1) preprocessing the images or videos, 2) extracting facial expression features, and 3) classifying the extracted features into various emotions.

In this review section, two studies [62], [63] used publicly available facial expression datasets to train the DL models, then extracted and analyzed facial expressions from the EG (experiment group) and CG (control group) to make an ASD diagnosis.

[62] trained a CVA model on the Binghamton University 3D Facial Expression database [95] to extract facial landmarks, which an SVM classified into positive, neutral, and other categories. They observed that children with ASD had more neutral expressions than children without ASD. The AUROC with age covariates ranged from 0.75 to 0.83 for the five movies that children with ASD and TD children watched.

Imitation of facial expressions is a critical measure of social interaction skills. Studies demonstrate that children with ASD usually imitate prompted stimuli more slowly than TD children [96]. [63] trained the DL model to recognize facial expressions using the FER2013 [97] and CK [98] databases and augmented the model learning with sixteen Chinese children’s facial expressions. The participants imitated seven facial expressions, and their responses were video-recorded. For the ASD group, average expression imitation was lower than 60%, below that of the TD group, a critical ASD deterministic threshold.

[64] studied facial expressions using the Facial Action Coding System (FACS). The OpenFace software extracted the subtle dynamics of social smiles of ASD and TD children from their home recordings. The results suggested that ASD children display happy facial expressions less intensely than their TD counterparts during the first year of life.

4) POSTURAL AND HEAD MOVEMENT DATA
Children with ASD demonstrate a diminished capacity for postural stability [99] and functional balance [100].

Two studies [65], [66] used CVA (computer vision analytics) on recorded videos to measure head postural control in study participants to distinguish ASD and TD groups.

[65] presented social and nonsocial stimuli by asking study participants to watch five movies comprising animated and complex characters and recorded the participants’ rate of head movements using CVA. After adjusting for age, ethnic origin/race, and sex, the ASD group had a faster head movement rate in four of five movies with complex stimuli. After removing the ODD (other developmental delays) group from the non-ASD group, the adjusted rate ratios (95% CI) distinguishing ASD vs. TD were significant.

Reinforcement learning is a subfield of AI (Artificial Intelligence) that guides intelligent entities’ behavior based on a reward-based environment [101]. [66] measured head postures, joint attention, and eye-gaze data [102] in a multiple-stimuli, single Child-Robot Interaction (CRI) session using RGBD sensors and cameras. They used a CNN (Convolutional Neural Network), CVA, and CLNF (Constrained Local Neural Field) to differentiate TD and ASD children. The TD group had better adherence to IJA (Initiation of Joint Attention) and RJA (Responding to Joint Attention) with the therapist and robot than the ASD group. However, the children with ASD displayed higher comfort and engagement with robots and a high IJA towards the therapist during the transition.

In addition, [67] developed and validated a deep neural network (CNN-LSTM architecture) trained on the non-verbal aspects of social interaction from video recordings captured during ADOS-2 assessments that distinguished ASD and TD peers with an accuracy of 80.9%.

5) MOTOR MOVEMENTS
Children with ASD have varying degrees of fine and gross motor skills. Gross motor deficits in children with ASD can impair body balance and make it challenging to participate in sports or do daily tasks [103]. Difficulties with fine motor skills might limit participation in activities that demand hand muscle movements [104]. In this review section, we covered four studies [68], [69], [70], [71] that used motor data to classify ASD children.

[68] used a smart tablet with touch-sensitive screens and inertial movement sensors to capture the study participants’ contact impact data patterns while playing games. They applied the Kolmogorov-Smirnov test (KST) [105] to the sensor dataset, shortlisting the ten most significant features from 262, and classified ASD vs. TD using the RGF2 (Regularized Greedy Forest) [106] algorithm, computing AUC, sensitivity, and specificity scores. [69] analyzed participants’ upper-limb movements in a reach-to-drop task exercise. The participant reached the ball, placed it in the support, and transferred it to the target box hole. They shortlisted seven discriminating features out of seventeen using the Fisher discriminant ratio (FDR) [107] for both the EG and the gender- and mental-age-matched CG. They used the SVM algorithm to identify ASD children using the seven features. [70] collected participants’ body movements in response to visual, auditory, and olfactory stimuli on three real-world virtual reality imitation tasks. They identified joint motions using the DL (Deep Learning) OpenPose and shortlisted critical and extensive body part movements using PCA (Principal Component Analysis) [108] to detect children with ASD. The SVM algorithm classified ASD children with an accuracy of 0.893 using
five joint movements (head, trunk, arms, legs, and feet) in response to visual stimuli. Inter-joint coordination and motor synergies [71] can be potential substrates of ASD markers. Researchers asked ASD and TD participants to engage in a motor task by manipulating a felt-tip pen to draw on a sheet of paper. At the same time, an optoelectronic motion capture system recorded their movement kinematics, which were analyzed by the SVM algorithm to classify ASD and TD participants with 94.7 percent accuracy. The analysis implies that an ecologically valid autism motor signature can predict ASD risk in children.

6) ASSESSMENTS, DATASETS, AND EMR ANALYSIS
We discussed seven papers in this section. Two studies [72], [73] incorporated natural language processing (NLP), and [74] used Electronic Medical Records (EMR). Another two publications analyzed Q-CHAT [77], [78], one study analyzed VABS (Vineland Adaptive Behavior Scales) [76], and [75] used a customized ASD assessment dataset to classify ASD and TD children.

Word2Vec algorithms [109] convert words to vectors, evaluate similarities, and group words logically, allowing the processing of sizeable unstructured text repositories. In addition, LDA (Latent Dirichlet Allocation) [110] uses a prior Dirichlet distribution [111], matching word distributions with logical topics. Combined, LDA and Word2Vec, both part of NLP, can generate discriminative features for a topic based on contextual associations.

[72] analyzed unstructured ASD evaluation referrals by scanning and preprocessing physical records and reading them through OCR (Optical character reader). The dataset was upsampled [112] by adding two simulated positive samples for each positive case, and features were reduced using L1 and L2 regularizations [88] with SVM. Word2Vec predicted ASD risk with precision, recall, and F2 scores of 0.646, 0.911, and 0.842, respectively, outperforming LDA.

[73] predicted ASD risk by asking families of HR children to state social-communication developmental concerns in a sentence. A regression tree algorithm analyzed the textual responses and either suggested ASD risk or presented an additional M-CHAT-R [7] or ASQ [113] question and, after processing, suggested ASD risk. The ML model AUC ranged from 0.36 to 0.54 with text-only analysis and from 0.74 to 0.88 for text combined with the M-CHAT-R [7] questionnaire.

The EMR [114] is usually implemented in clinicians’ offices, clinics, and hospitals to capture notes, assessments, and treatment records cross-sectionally and longitudinally for diagnosis and treatment. [74] extracted 89 features from longitudinal retrospective EMR data and shortlisted 20 features using RF Gini impurity [115] scores. They used SMOTE [115] to upsample and overcome the class imbalance in the ASD dataset. The LR predicted ASD risk with an AUC of 0.727. Researchers obtained ground truth labels for patients (ASD or non-ASD) in the studies [116] from the clinical reports.

In addition to analyzing and classifying multimodal data, a few studies focused on enhancing ML performance. [75] used the Grasshopper Optimization Algorithm (GOA) [117] on three datasets [79] and predicted ASD with near 100% accuracy.

[76] assessed HR and low-risk infants at eight, fourteen, twenty-four, and thirty-six months. The best ML classifier was an SVM (AUC of 0.713) trained on VABS [118] daily living module [119] records that were captured at 14 months, normalized, and z-scored. [77] used ML to investigate the Q-CHAT [120] assessment records to distinguish between ASD and non-ASD children. Of five ML algorithms (RF, NB, SVM, LR, and KNN), the SVM achieved the highest accuracy of 95%. [78] used the Q-CHAT and Q-CHAT-10 (Q-CHAT with ten features) datasets to develop two 5-layer DNNs to detect children with ASD. They compared the performance of both models and observed that the Q-CHAT-10 model reported higher AUROC, sensitivity, and specificity than the SVM and DNN algorithms applied to Q-CHAT data [78]. The findings confirm the role of ML models in reducing the assessment features and predicting an ASD condition.

7) AUDIO DATA
DL models can identify distinctive vocal patterns by analyzing the production of canonical syllables and speech volubility [121]. Canonical syllables [122] have a consonant and a vowel-like component and emerge by the second half-year of life and not later than ten months in TD children. Volubility refers to syllable production frequency and is usually limited in children with ASD [123]. In the review, three studies analyzed audio data; two used syllable production, speech patterns, and canonical babbling [44], [45], and the third used crying patterns [46] to detect ASD.

[44] used a pre-trained feature extraction auto-encoder integrated with a joint optimization method and trained four ML models on the eGeMAPS (Geneva minimalistic acoustic parameter set) dataset [124]. The ML models, SVM, BLSTM (88 features) [125], BLSTM (54 features), and optimized AE BLSTM, were tested on 95 ASD and 130 TD utterances across five vocalization categories: syllables, canonical babbling, calling mother or father, screaming, and crying. The BLSTM AE model outperformed the other ML models with precision, recall, and F1 scores of 0.4526, 0.6869, and 0.5457.

[45] conducted a retrospective study examining the vocalizations of 37 infants from two 5-minute videos in the 9–12 months and the 15–18 months age ranges that included family play, vacations, and familiar routines (e.g., mealtimes). The video recordings were annotated on canonical babbling, syllable production, and speech volubility features. The LR model trained on the canonical babbling ASD features was the strongest predictor, classifying 90% of ASD and 63% of TD infants at 9–12 months. Further, log odds ratios (log OR) confirmed that TD infants reached the canonical babbling [123] stage earlier than other infants who were later diagnosed with ASD.
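The precision, recall, and F-scores quoted in this section for [72] (F2) and [44] (F1) are related through the standard F-beta formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The short check below recomputes both scores from the reported precision and recall. The helper name `f_beta` is hypothetical, and the snippet is only a consistency check on the reported figures, not code from the reviewed studies.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily (F2), beta = 1 gives the F1 score.
    """
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom


# Precision and recall as reported in the text for [72] and [44].
F2_REFERRALS = f_beta(0.646, 0.911, beta=2.0)
F1_AUDIO = f_beta(0.4526, 0.6869, beta=1.0)
```

Both recomputed values round to the scores given in the text (0.842 and 0.5457), which suggests the reported triples are internally consistent.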

[46] collected crying samples (300 ms to 3-sec clips) for ten ASD and TD children, then preprocessed and cleansed them by removing screaming, babbling, or vocalization instances with a closed or non-empty mouth. They used phonation and vocal quality features from the Belalcazar-Bolaños dataset [126] created from audio of Parkinson’s patients. To minimize misclassification, they used a novel SubSet Instance (SSI) method using unsupervised and supervised methods. They shortlisted two discriminative speech features, i.e., an MFCC and a SONE coefficient, to measure a tone’s timbre and loudness with temporal difference variance, forming a basis to screen children with ASD.

TECHNOLOGY-FACILITATED ASD
DETECTION PROCESS?
This section identifies various participant counts, datasets, experiment and control groups, assessment instruments, locations, and durations.

1) PARTICIPANT COUNTS
The study participant counts are reported in Table 2 in the column ‘Participant Count’. The majority of studies reported limited participant enrollment. Only three studies reported more than 250 participants [74], [77], [78], six studies between 150-250 [47], [48], [49], [50], [72], [76], and another six studies between 100-150 participants [52], [61], [62], [65], [67], [73]. Seven studies [46], [51], [54], [56], [57], [60], [68] reported enrollment between 50-100 participants, and nine studies [44], [45], [55], [58], [59], [64], [69], [70], [71] reported between 10-50 enrollments. Two studies reported fewer than 10 participants [53], [66]. The remaining studies reported using datasets for

2) DATASETS
Large-scale datasets give researchers the motivation and necessary sample size to develop, collaborate, and benchmark the performance of ML and DL algorithms. The studies reported use of datasets in the audio [126], [127], assessments [79], facial expression [95], [97], [98], and EMR [74] categories. The studies reported challenges such as data preprocessing, cleaning, and augmenting the audio and EMR datasets as they were neither age-matched [62] nor culturally relevant to the experiment data [63]. Additional datasets are listed in Appendix A, allowing researchers to collaborate and develop ASD detection innovations and improve the current ASD detection process.

3) CONTROLS AND COMPARATORS
While the majority of studies focused on categorizing children with ASD and TD, a few studies included children with ODD as well. For example, [54] performed classification between WS and TD and WS+ vs. WS-. In addition, [46], [47], [48], [49], [50], [52], [54], [62], [65], [68], [76] included HR, ASD, and ODD children in the control group to perform classification tasks.

4) ASSESSMENT TOOLS
Out of the seventeen different psychometric tools, the six most widely used in the review were ADOS, ADI-R, M-CHAT-R, MSEL, CARS, and DSM. The ADOS, ADI-R, CARS-2, and DSM are gold-standard ASD diagnostic tools. The outcomes of these tools are matched with the outcomes of technology-based tools to calculate psychometric properties. The MSEL measures children’s cognitive development and ensures that the controls and comparators recruited in the

The assessment duration is reported in Table 2 in the column ‘Evaluation duration’. Most assessments lasted less than 10 minutes. In addition, studies capturing bio-behavioral data that involved parents and non-experts to perform annotations were conducted in home settings [47], [48], [49], [50], [51], [52], [53], [54], [73]. Studies that used eye contact, postural measures, and facial expressions required extensive setup of sensors and cameras and were conducted in clinics or hospitals [44], [45], [56], [57], [58], [59], [60], [61], [63], [65], [66], [67], [69], [71], [72], [74], [76], [77], [78]. A few studies that captured motor, behavioral, and postural data captured information in home and clinic settings [46], [48], [55], [62], [64], [68]. Studies that used a virtual reality framework to capture motor data required a dedicated VR room [70].

E. RQ4 HOW HAVE RESEARCHERS GATHERED AND
PROCESSED MULTIMODAL DATA? HOW DO
TECHNOLOGICAL INNOVATION’s RESULTS
COMPARE TO CONVENTIONAL ASD
This section lists various data collection methods, shortlisted ASD markers post feature reduction, the performance of ML and DL models, and their psychometric outcomes.

1) DATA COLLECTION METHODS
The reviewed studies used a variety of approaches to capture unstructured data. Eye trackers such as Tobii [56], [61], BeGaze software [57], SensoMotoric with infrared tech sensors [59], and the Mirametrix S2 Eye Tracker [58] captured eye-gaze data. Inertial sensors, tri-axial accelerometers, and gyroscopes captured motor motion data [68], [69], [70].

Further, cameras and RGBD sensors captured videos that were analyzed using CVA to classify facial expressions [62], [63], [64], postural and head movements [55], [65], [66], and socio-emotional behaviors and head positions [55]. In addition, cameras recorded ASD-specific behavior markers [47], [48], [49], [50], [51], [52], [53], [54], and the video frames were manually annotated to generate feature sets on which ML and DL models were trained. The ML models were trained on structured assessments such as VABS [76],
Q-CHAT [77], [78], ADOS-2 [47], [49], and ADI-R [47], [49]. The scanned referrals [72] were processed using OCR (optical character readers), and ASD and TD cases were classified by training ML models with ground truth diagnoses as labels.

2) SHORTLISTED ASD MARKERS
Numerous studies incorporated feature reduction methods, marked in the column ‘reduced features’ in Table 1, and shortlisted critical ASD deterministic landmarks. Researchers trained ML models on these features to perform ASD vs. TD classification using the supervised learning method shown in Figure 6. For example, [72] applied feature reduction on the scanned referral records and shortlisted behavioral patterns such as vocal vowel sounds and mood swings as critical ASD deterministic markers. Additionally, [74] shortlisted parental age, medication use, treatment, and dietary patterns as significant predictors of ASD.

Further, [63] highlighted that children with ASD can comprehend and imitate facial expressions such as happiness and sadness but struggle with complicated facial expressions such as neutrality, aversion, disgust, and surprise. [62] reported that non-ASD children, while watching movies, often raised their eyebrows and opened their mouths, a characteristic of normal development and a feature not displayed by ASD children.

The social communication deficit is a critical marker for ASD. [49], [51], [67] highlighted speech patterns, communicative engagement, language understanding, emotional expression, sensory seeking, responsive social smile, and stereotyped speech as critical markers for ASD. Further, [50], [53], [67], [76] highlighted the child’s stereotyped behaviors, repetitive interests, and poor eye contact as important markers for ASD risk determination. In addition, [55] suggested name-call responses and emotional state analysis as enablers of early ASD warning flags in children. [51], [52], [53] emphasized shorter duration and lower frequencies of eye contact, lack of social smiling, and poor social engagement as ASD risk markers. However, [76] revealed that poor eye contact and repetitive hand movements alone did not accurately diagnose ASD. Individual behaviors such as daily living skills impairments and compliance within the household must be considered in conjunction with other behaviors to predict ASD accurately. Further, [54] used PCA to identify stereotypical hand motions (HM), mother-child communication exchange, and speech analysis as essential behavioral and auditory markers for ASD among children with WS. Thus, ML models can analyze facial expressions, gestural patterns, stereotypical behavior, and communication exchanges to predict ASD risk with high confidence.

While measuring joint attention skills, [61] reported that infants later diagnosed with ASD exhibit considerably atypical IJA but not RJA. In addition, the prevalence of atypical nonverbal behaviors, manifested as uncommon, limited gestural postures decoupled from visual contact, facial affect, and speech in ASD children [67], can lead to ASD identification.

Atypical motor movements can predict the risk for ASD. In an experiment [68], researchers observed that while playing tablet games, gesture velocity was greater in the ASD group, while the time to tap a screen was shorter than in the control group. In another study, a ball drop task [69] indicated an improper wrist angle position, hand inclination, and slower, fragmented movement as critical criteria for ASD and TD classification. Similar findings were reported by [71] in a reaching-grasping paradigm in which children with ASD displayed decreased coupling between DoF (degrees of freedom), which correlated with the severity of their socio-communicative symptoms. During a virtual reality motor movement task, [70] could classify ASD and TD groups with an accuracy of 82.98% using only head movements, and 74.47% and 72.34% using arm and leg movements, respectively. The findings corroborate the literature suggesting that head spinning and banging, body rocking, and foot-stomping are three major stereotyped and repetitive motions associated with ASD.

The findings of [77], [78] indicated that ML algorithms could detect ASD with an accuracy greater than 90 percent from a selection of 14 feature items and greater than 80 percent using only three items of Q-CHAT. In addition, VABS (Vineland Adaptive Behavior Scale) [118] daily living normalized z-scored [119] assessment scores at 14 months reported an AUC of 0.713 [76] for ASD detection.

A study by [60] using eye gaze reported that ASD children exhibited more unstable gaze modulation and demonstrated significantly shorter initial, average, and total fixation durations for social stimuli [56]. Further, [57] suggested that children with ASD show reduced fixation time at the eyes, mouth, and nose, affirming the critical role of fixation on the eyes in detecting autism via eye-tracking. However, the findings of [58] suggested quite the opposite, as preterm children preferred to glance at the eyes or lips of social images or people. Therefore, the ability to process social cues, assessed by analyzing the fixation duration at various body parts, can predict the severity of ASD in children. [45] presented the vocal analysis of the children and confirmed that at 9–12 months, TD infants reached the canonical babbling [123] stage earlier than other infants later diagnosed with ASD. They further confirmed that infants diagnosed later with ASD produced fewer words per minute than TD infants. Therefore, canonical babbling ability and syllable production in younger years can confirm the risk of ASD.

3) MULTIMODAL DATA PROCESSING
The reviewed research utilized seventeen ML algorithms listed in the column ‘Algorithms’ of Table 2. Decision trees, random forests, and support vector machines were the most often used machine learning models. The CNN algorithm is utilized approximately 80% of the time when deep learning methods are employed. The reviewed studies employed six statistical methods, with the ANOVA, t-test, and Chi-squared test being the most often utilized. Ensemble decision trees [128] performed the best on structured data generated from the video annotation of

104902 VOLUME 10, 2022

ASD-relevant behaviors. In eye-tracking, statistical and discriminant analysis were the most effective algorithms. CVA ranked highest in analyzing unstructured facial expressions and postural and head movement data. Additionally, SVM performed well on the structured, feature-reduced data captured from motor movements. MGOA and Word2Vec algorithms outperformed all other algorithms in Assessments, Datasets, and EMR Analysis. Finally, BLSTM (Bidirectional LSTM), AE, and SVM effectively classified audio data to detect ASD conditions.

4) MACHINE LEARNING VS. DEEP LEARNING
An ML model trained on multimodal data can classify ASD and TD children at the current state of the art. However, DL outperformed ML methods in feature extraction and classification tasks on unstructured data. For instance, researchers captured features of interest for ASD classification using DL from facial expressions [62], [63], postural and head movements [65], [66], [67], text analysis using NLP [72], [74], [78], motor movements [68], [69], [70], and audio recordings [44], [45], [46] and incorporated supervised learning techniques as shown in Figure 6 to classify ASD children. As a result, a conclusive DL model trained on multimodal data sourced from one or more of the seven categories can make the ASD diagnosis procedure efficient.

The robustness of the technological solutions can be measured on psychometric properties such as sensitivity, specificity, and accuracy listed in Table 1. Eight studies [46], [47], [48], [49], [53], [69], [74], [75] reported psychometric properties of greater than 0.9 on any one of the sensitivity, specificity, and accuracy measures. Seventeen studies [45], [50], [51], [52], [54], [55], [57], [60], [67], [68], [70], [71], [72], [73], [76], [77], [78] reported psychometric outcomes between 0.7 and 0.9, one study [44] reported less than 0.7, and nine studies [56], [58], [59], [61], [62], [63], [64], [65], [66] did not report any outcomes.

IV. DISCUSSION
The scoping review shortlisted 35 studies after eligibility and inclusion-exclusion assessments. The review analyzes technology’s viability, application, division, and outcomes for the following seven bio-behaviors: (a) stereotypical behavior; (b) facial expressions; (c) eye gaze; (d) motor movements; (e) postural analysis; (f) assessments and EHR datasets; and (g) auditory data. The review data summary is populated in Table 1, which includes multimodal input data, feature reduction steps, environment setting, data processing algorithms, and psychometric outcomes, i.e., sensitivity, specificity, and accuracy. Table 2 lists enrollment counts, software or hardware devices used, assessment tools, assessment duration, limitations, and future directions. The review uses table data and answers four research questions on technology usability, multimodal data capture, data analysis, quality evaluation, limitations, and strengths.

The review contributes to the literature by
1) Shortlisting various multimodal bio-behavioral markers for ASD detection;
2) Analyzing automatic multimodal data extraction, feature optimization, and data processing methods;
3) Highlighting psychometric outcomes from technological innovations and comparing them with traditional methods, and
4) Identifying relevant datasets for researchers to collaborate and cocreate ASD and ODD detection innovations bringing efficacy to the detection process. The review highlights that ASD detection ML and DL methods can be applied to identify children at risk of ODD, including speech and developmental delays and hyperactive challenges. Researchers can shortlist specific feature sets for each condition and train machine learning models with statistically significant data volume. The outcomes of the machine learning models can be measured based on psychometric properties calculated by comparing predicted diagnoses with gold-standard tools.

The subsections below detail the role of technology in ASD detection and internal and external validity threats.

The analysis highlights an upward trend in adopting technology-based ASD detection solutions during 2018-2021 attributed to multiple factors.
1) The demand for low-cost diagnoses [129], universal screening [130], and the availability of research funding [131] have promoted research initiatives to develop technology-based ASD screening innovations.
2) The high penetration of mobile devices, low-cost cameras, and Micro-Electro-Mechanical System (MEMS) sensors such as accelerometers and gyroscopes [132] has enabled the real-time capturing of vast volumes of structured and unstructured data. In clinical situations, cameras and sensors are more practical, less expensive, and less invasive technologies than fMRI and EEG.
3) With technology maturing, researchers have automated the generation and processing of multimodal data by building data pipelines on low-cost cloud infrastructure. Integrating data pipelines with technologies such as AI, ML, and DL has expedited the development of cost-effective and superior detection and at-risk identification of the ASD and ODD population.

However, the traditional ASD diagnostic services are not always accessible, affordable, or data-driven [27], [133]. The review findings suggest that technology-based ASD methods can be extrapolated to the ODD population and can effectively, efficiently, rapidly, and potentially serve larger population groups with improved quality, access, and affordability [27].
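The psychometric properties used throughout this review (sensitivity, specificity, and accuracy) come from comparing a tool's predicted labels with a gold-standard diagnosis. The sketch below shows that comparison on made-up labels. The function name `psychometric_properties`, the label strings, and the example data are assumptions for illustration and do not reproduce any reviewed study.

```python
def psychometric_properties(predicted, reference, positive="ASD"):
    """Compare predicted labels with gold-standard (reference) labels.

    Returns (sensitivity, specificity, accuracy) as floats.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    tp = fp = tn = fn = 0
    for pred, ref in zip(predicted, reference):
        if ref == positive:
            if pred == positive:
                tp += 1  # true positive: ASD correctly flagged
            else:
                fn += 1  # false negative: ASD missed
        elif pred == positive:
            fp += 1      # false positive: non-ASD flagged as ASD
        else:
            tn += 1      # true negative
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = (tp + tn) / len(reference)
    return sensitivity, specificity, accuracy


# Made-up labels for ten children; not data from any reviewed study.
GOLD = ["ASD"] * 4 + ["TD"] * 6
PRED = ["ASD", "ASD", "ASD", "TD", "TD", "TD", "TD", "TD", "ASD", "TD"]
SENS, SPEC, ACC = psychometric_properties(PRED, GOLD)
```

With these toy labels, three of four ASD cases and five of six TD cases are classified correctly, so the three measures can differ noticeably even on the same predictions.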

FIGURE 6. Process flow of Supervised Machine Learning methods.
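A minimal sketch of a supervised flow of this kind (labeled data, feature reduction, training, prediction) is given here under stated assumptions. Features are ranked with the two-class Fisher discriminant ratio in its common form, (mean_a - mean_b)^2 / (var_a + var_b). A nearest-centroid classifier stands in for the SVM and other models used in the reviewed studies, purely to keep the example dependency-free. All function names and the toy data are hypothetical.

```python
from statistics import mean, pvariance


def fisher_ratio(values_a, values_b):
    """Fisher discriminant ratio of one feature for two classes."""
    spread = pvariance(values_a) + pvariance(values_b)
    gap = (mean(values_a) - mean(values_b)) ** 2
    return gap / spread if spread else float("inf")


def select_features(rows, labels, k):
    """Feature reduction: indices of the k features with the highest ratio."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    a = [row for row, y in zip(rows, labels) if y == classes[0]]
    b = [row for row, y in zip(rows, labels) if y == classes[1]]
    scores = [fisher_ratio([r[j] for r in a], [r[j] for r in b])
              for j in range(len(rows[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def fit_centroids(rows, labels, keep):
    """Training: one mean vector per class over the kept features."""
    centroids = {}
    for cls in set(labels):
        members = [[row[j] for j in keep] for row, y in zip(rows, labels) if y == cls]
        centroids[cls] = [mean(col) for col in zip(*members)]
    return centroids


def predict(row, centroids, keep):
    """Prediction: label of the closest class centroid (squared distance)."""
    x = [row[j] for j in keep]
    return min(centroids,
               key=lambda cls: sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[cls])))


# Toy labeled data with three features; values are invented for illustration.
TRAIN_X = [[0.9, 5.0, 1.0], [1.1, 4.0, 1.2], [1.0, 6.0, 0.9],
           [2.0, 5.0, 3.1], [2.2, 4.5, 2.9], [1.9, 5.5, 3.0]]
TRAIN_Y = ["TD", "TD", "TD", "ASD", "ASD", "ASD"]
KEEP = select_features(TRAIN_X, TRAIN_Y, k=2)
CENTROIDS = fit_centroids(TRAIN_X, TRAIN_Y, KEEP)
PREDICTIONS = [predict(r, CENTROIDS, KEEP) for r in [[1.05, 5.2, 1.1], [2.1, 4.8, 3.0]]]
```

In the toy data the middle feature has the same mean in both classes, so the ranking step drops it before training.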

Further, the technology-facilitated innovations are expected to supplement traditional detection methods for the following reasons:
1) Diagnostic methods based on ML and DL can be trained on a large volume of involuntarily generated multimodal data from various bio-behaviors to detect children with ASD and ODD risk.
2) Traditional ASD screening methods can misdiagnose children with borderline ASD or with speech delay or ODD as ASD. These limitations can be overcome using technological innovations such as the inconclusive ML classifier developed by [47], trained solely on misclassified data instances. The method reduces misdiagnosis of comorbid conditions, with an implementation time of under ten minutes, by assigning borderline or ODD instances to an inconclusive class and recommending further evaluation by a clinician. Thus, ML technologies can potentially reduce the misdiagnosis of comorbid, ASD, or ODD borderline conditions such as speech and developmental delays with increased accuracy.
3) A few gold-standard tools, such as CARS-2, can diagnose children only beyond two years. Also, children’s social communication, language, and other critical milestones do not develop until the second or third year of life. Therefore, evaluation of ASD risk in children under two years by an inexperienced clinician can give conflicting results. The review emphasized the extraction and analysis of ASD and ODD landmarks from behavioral [51], [52], [54], eye gaze [58], audio [44], [45], [46], facial expression [62], postural [65], and assessment [73], [74], [76] data to identify children at risk of ASD between 6-18 months, circumventing traditional diagnostic instruments’ age constraints. These improvements can advance the field by promoting early identification, improving clinicians’ capacity, and thereby improving access to early intervention [134] services.

Even though the review highlights that the demand for technology-based detection methods grew from 2018 to 2021, the actual adoption of these innovations has been minimal. These innovations should ideally be usable by non-specialists, available on mobile applications (to ensure widespread adoption), and able to identify TD, ASD, and ODD (speech delay, developmental delay, intellectual delay) based on well-defined minimal distinguishable features in the first three years [135]. The adoption of these technologies can be supported through controlled pilots with the participation of stakeholders such as parents, clinicians, and schools, digitizing downstream detection processes, assessments, and treatments [136]. A digital human-supported ASD and ODD detection and management framework can be initiated and then transition to an autonomous and need-based blended digital model, optimizing cost and maximizing scale [137].

Further, adopting these technologies can be supported with vernacular massive open online courses (MOOCs), training websites, and brief knowledge content, including text-based training procedures with video clips [137].
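The idea of an inconclusive class, mentioned in the second reason above, can be illustrated with a simple three-way rule on a classifier's output probability. This is a deliberately simpler stand-in and not the method of [47], which is described as training a classifier on misclassified instances. The function name `triage` and the 0.3 and 0.7 cut-offs are arbitrary assumptions chosen for illustration.

```python
def triage(probability_asd, lower=0.3, upper=0.7):
    """Map a predicted ASD probability to one of three screening outcomes.

    Scores between the two cut-offs are not forced into ASD or non-ASD;
    they are labeled inconclusive so a clinician can evaluate further.
    """
    if not 0.0 <= lower < upper <= 1.0:
        raise ValueError("cut-offs must satisfy 0 <= lower < upper <= 1")
    if not 0.0 <= probability_asd <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability_asd >= upper:
        return "ASD risk"
    if probability_asd <= lower:
        return "no ASD risk"
    return "inconclusive"


# Three hypothetical model outputs.
OUTCOMES = [triage(p) for p in (0.92, 0.55, 0.10)]
```

Widening the band between the two cut-offs sends more borderline cases to a clinician, trading automation for fewer confident misclassifications.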

TABLE 4. Internal and external validity threats and mitigation measures.

B. VALIDITY THREATS
Internal [138] and external [139] validity threats need to be reviewed and managed to ensure the reliability and robustness of a study’s research methods and outcomes. Internal validity concerns the appropriateness of a study’s method, experimental rigor, protocol, structure, variables, and execution. External validity concerns whether the study’s findings hold in the real world, which leads to broader adoption. While rigorous research procedures can ensure a study’s internal validity, they may limit its generalization, application, and external validation. Table 4 lists internal and external validity threats and suggests methods to overcome them.

V. LIMITATIONS
The access and reach of technology-based ASD detection methods depend on the availability of computers, mobile phones, and the internet. A lack of internet coverage may disproportionately disadvantage people in rural and underserved locations, who are hampered by slow, poor-quality, or unstable connectivity, limited technological ability, and low trust in technology. In addition, the internal and external validity threats listed above limit the acceptance and generalization of the innovations.
Further, although the scoping review’s eligibility criteria for shortlisting studies covered children between two and six years, the following studies had overlapping and higher age ranges. Three studies recruited children between two and eight years [48], [63], [70], and two studies included adolescents and adults [72], [75]. The mismatch between the study eligibility definition and the study selection can limit the validity of the scoping review.

VI. FUTURE DIRECTIONS
The review highlighted the presence of a sophisticated technical stack, which produced promising but non-generalizable results and raised privacy, legal, cultural, and ethical challenges [142]. Therefore, future studies should:
1) Focus on establishing trust with the “vulnerable” population and their families by addressing ethical, legal, and cultural obstacles in addition to the internal and external validity risks discussed in the previous section.
2) Initiate a framework for collaborative research to create new datasets and improve the existing ones described in Appendix A to disseminate research results globally. This can encourage collaborations between academics from developing and developed nations, enabling them to conduct fair performance comparisons of technical solutions, conduct joint experiments, and assure thorough, worldwide, and systematic empirical validation of studies.


3) Focus on policy-level activities involving stakeholders in developing study designs based on the FATE (Fairness, Accountability, Transparency, and Ethics) framework for using AI for ASD detection. In addition, adopting and enforcing strong regulations, policies, and data protection and privacy legislation to prevent inadvertent data leakage can inspire confidence among stakeholders [143].
In addition, the development of technology-facilitated early ASD and ODD detection solutions should be supported downstream with reliable referral and intervention infrastructure, improving the healthcare system’s efficiency, capacity, and efficacy [136].
Most studies in the review captured data from a single bio-behavioral category to develop ASD detection innovations. Future improvements should include capturing multimodal data from diverse bio-behavioral categories. Feature engineering methods can assign weights to multimodal data originating from more than one of the seven categories, such as eye contact, stereotyped behaviors, postural demonstration, and speech, to develop ML and DL models that predict ASD and ODD risk and their severity levels for broader age groups. These improvements can be offered as a service on a mobile application to improve adoption and usability.
Finally, future studies should focus on preventive methods incorporating genetic approaches. For example, [84] used Hidden Markov Models and genetics to examine the risk of having an ASD offspring, as the ASD risk is 40 to 65 times higher for parents with an ASD diagnosis or carrying a risk gene. Therefore, future genetic-focused trials can preempt the risk of ASD in children and help prospective parents make informed decisions about starting a family with possible risk exposure. In addition, technological innovations using trained robots to treat and diagnose ASD in young children, using POMDP (Partially Observable Markov Decision Process) [144], [145], can significantly automate the ASD detection process and should be a focus for future research.

VII. CONCLUSION
The review comprised 35 studies grouped into seven multimodal data categories: (a) stereotyped behavior, (b) facial expressions, (c) eye gaze, (d) motor control and movements, (e) postural analysis, (f) auditory data, and (g) assessments and electronic health record data. The scoping review, based on PRISMA guidelines, revealed a rising trend of technology-based ASD detection tools incorporating multimodal data analyzed through ML and DL methods, and it supports the role and effectiveness of technology applications in improving current ASD screening and diagnosis methods. The review identified internal and external validity threats, with ethical, legal, and dataset issues and restricted participant and control groups as critical challenges. In addition, most solutions reported laboratory-only, non-generalizable outcomes. Therefore, additional intensive cross-cultural trials with large population groups with various other disorders are needed to examine the field preparedness, ethical, legal, and adoption challenges of technological solutions in real-world scenarios. The review can aid academics, clinicians, and practitioners by offering vital inputs for developing technology-based ASD screening and diagnostic solutions that are efficient, cost-effective, and data-driven and can address the current constraints of the industry.

APPENDIX A
This appendix lists important datasets in autism research:
JAFFE [146] – A database of 213 images containing the facial expressions of ten Japanese women. There are seven distinct facial expressions: neutral, happy, sad, surprised, anger, disgust, and fear.
CK+ [147] – An expression database created in the laboratory that includes 593 expression sequences from 123 individuals, 69% female and 31% male, from African Americans, Asians, and South Americans. It comprises seven facial expressions: anger, contempt, disgust, fear, happiness, sadness, and surprise.
FER-2013 [148] – The library contains 35,887 grayscale facial photos representing seven different facial expressions: angry, disgusted, fearful, happy, sad, surprised, and neutral.
MMI [149] – The expression database is divided into two sections: the first is a dynamic data set containing over 2,900 video sequences; the second is a static data set consisting of many high-resolution photographs. The collection contains seven distinct types of expressions.
AFEW [150] – All of the facial photos in the database were extracted from movies and include seven fundamental facial expressions.
SFEW [151] – The expression library consists of static frame images from the AFEW data set containing seven fundamental expressions.
eGeMAPS [152] – A set of acoustic parameters suitable for use in various areas of automatic voice analysis, including paralinguistic and clinical speech analysis. The set is designed to serve as a single reference point for future research evaluations and to prevent discrepancies produced by separate parameter sets or even by different implementations of the same parameter.
The Simons Simplex Collection (SSC) [153] is a resource of the Simons Foundation Autism Research Initiative (SFARI). The SSC established a permanent repository of genetic samples from 2,600 simplex families, each of which has one child affected with an autism spectrum disorder, and unaffected parents and siblings.
The Binghamton University 3D Facial Expression database [95] currently has 100 participants (56% female, 44% male), ranging in age from 18 to 70 years and representing a diversity of ethnic/racial ancestries. Each person made seven different facial expressions in front of the 3D face scanner. Except for the neutral emotion, each of the six prototypical expressions (happiness, disgust, fear, anger, surprise, and sadness) has four intensity levels.
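As an illustration of the multimodal weighting suggested in Section VI, the following minimal Python sketch combines per-category risk scores into a single score by a weighted average. It is not a method taken from any of the reviewed studies: the function name, the category names, and the weight values are all hypothetical and chosen only for illustration, and it assumes each upstream model already outputs a risk score between 0 and 1.

```python
from typing import Dict

# Hypothetical weights per bio-behavioral category (illustrative values only,
# not derived from the review or from any study it cites).
MODALITY_WEIGHTS: Dict[str, float] = {
    "eye_contact": 0.4,
    "stereotyped_behavior": 0.3,
    "posture": 0.1,
    "speech": 0.2,
}


def fuse_risk_scores(
    scores: Dict[str, float],
    weights: Dict[str, float] = MODALITY_WEIGHTS,
) -> float:
    """Return the weighted average of per-modality risk scores in [0, 1].

    Modalities without a score are skipped and the remaining weights are
    renormalized, so a child with partial data still receives a fused score.
    """
    available = {m: w for m, w in weights.items() if m in scores}
    if not available:
        raise ValueError("no recognized modality scores supplied")
    for modality in available:
        if not 0.0 <= scores[modality] <= 1.0:
            raise ValueError(f"score for {modality!r} must lie in [0, 1]")
    total_weight = sum(available.values())
    return sum(scores[m] * w for m, w in available.items()) / total_weight


if __name__ == "__main__":
    # Only two of the four modalities were captured for this child.
    print(fuse_risk_scores({"eye_contact": 0.8, "speech": 0.5}))
```

In practice the weights would be learned or validated on data rather than fixed by hand; the fixed values here only show how scores from several categories could be combined when some modalities are missing.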


TABLE 5. Explanation of technical terms used in the review.

APPENDIX B
See Table 5.

APPENDIX C
PRISMA Checklist: PRISMA-ScR checklist for studies.
CASP Evaluation Sheet: The results of the CASP quality assessment tool for studies. (Microsoft Excel Open XML Spreadsheet (XLSX))

ACKNOWLEDGMENT
Study Registration: Similar to most other scoping reviews, the current scoping study was not registered. Author Contributions: Manu Kohli: conceptualization, methodology, writing (original draft), writing (review and editing), software, formal analysis, investigation, resources, data curation, visualization; Arpan Kumar Kar: conceptualization, methodology, writing (review and editing), supervision, validation, and investigation; and Shuchi Sinha: conceptualization, methodology, writing (review and editing), and supervision.

[3] B. Mozolic-Staunton, M. Donelly, J. Yoxall, and J. Barbaro, “Early detection for better outcomes: Universal developmental surveillance for autism across health and early childhood education settings,” Res. Autism Spectr. Disorders, vol. 71, Mar. 2020, Art. no. 101496.
[4] Early Childhood Development and Disability: A Discussion Paper, World Health Org., Geneva, Switzerland, 2012.
[5] R. C. Sheldrick, S. Marakovitz, D. Garfinkel, A. S. Carter, and E. C. Perrin, “Comparative accuracy of developmental screening questionnaires,” JAMA Pediatrics, vol. 174, no. 4, pp. 366–374, 2020.
[6] M. Khowaja, D. L. Robins, and L. B. Adamson, “Utilizing two-tiered screening for early detection of autism spectrum disorder,” Autism, vol. 22, no. 7, pp. 881–890, Oct. 2018.
[7] D. L. Robins, K. Casagrande, M. Barton, C.-M.-A. Chen, T. Dumont-Mathieu, and D. Fein, “Validation of the modified checklist for autism in toddlers, revised with follow-up (M-CHAT-R/F),” Pediatrics, vol. 133, no. 1, pp. 37–45, Jan. 2014.
[8] S. Ravi, V. Chandrasekaran, S. Kattimani, and M. Subramanian, “Maternal and birth risk factors for children screening positive for autism spectrum disorders on M-CHAT-R,” Asian J. Psychiatry, vol. 22, pp. 17–21, Aug. 2016.
[9] C. Lord, S. Risi, L. Lambrecht, E. H. Cook, B. L. Leventhal, P. C. DiLavore, A. Pickles, and M. Rutter, “The autism diagnostic observation schedule-generic: A standard measure of social and communication deficits associated with the spectrum of autism,” J. Autism Develop. Disorders, vol. 30, no. 3, pp. 205–223, Jun. 2000.
[10] C. Lord, M. Rutter, and A. Le Couteur, “Autism diagnostic interview-revised: A revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders,” J. Autism Develop. Disorders, vol. 24, no. 5, pp. 659–685, Oct. 1994.
[11] K. Pierce, V. H. Gazestani, E. Bacon, C. C. Barnes, D. Cha, S. Nalabolu, L. Lopez, A. Moore, S. Pence-Stophaeros, and E. Courchesne, “Evaluation of the diagnostic stability of the early autism spectrum disorder phenotype in the general population starting at 12 months,” JAMA Pediatrics, vol. 173, no. 6, pp. 578–587, 2019.
[12] S. Jayanath and S. Ozonoff, “First parental concerns and age at diagnosis of autism spectrum disorder: A retrospective review from Malaysia,” Malaysian J. Med. Sci., vol. 27, no. 5, p. 78, 2020.
[13] M. J. Maenner, K. A. Shaw, J. Baio, A. Washington, M. Patrick, M. DiRienzo, D. L. Christensen, L. D. Wiggins, S. Pettygrove, J. G. Andrews, and M. Lopez, “Prevalence of autism spectrum disorder among children aged 8 years-autism and developmental disabilities monitoring network, 11 sites, United States, 2016,” MMWR Surveill. Summaries, vol. 69, no. 4, p. 1, 2020.
[14] H. Manohar, P. Kandasamy, V. Chandrasekaran, and R. P. Rajkumar, “Early diagnosis and intervention for autism spectrum disorder: Need for pediatrician-child psychiatrist liaison,” Indian J. Psychol. Med., vol. 41, no. 1, pp. 87–90, Jan. 2019.
[15] L. Klintwall and S. Eikeseth, “Early and intensive behavioral intervention (EIBI) in autism,” Comprehensive Guide Autism, vol. 1, pp. 117–137, Jun. 2014.
[16] C. Luciano and R. Keller, “Misdiagnosis of high function autism spectrum disorders in adults: An Italian case series,” Autism-Open Access, vol. 4, no. 131, p. 2, 2014.
[17] C. Gesi, G. Migliarese, S. Torriero, M. Capellazzi, A. C. Omboni, G. Cerveri, and C. Mencacci, “Gender differences in misdiagnosis and delayed diagnosis among adults with autism spectrum disorder with no language or intellectual disability,” Brain Sci., vol. 11, no. 7, p. 912, Jul. 2021.
[18] S. Aggarwal and B. Angus, “Misdiagnosis versus missed diagnosis: Diagnosing autism spectrum disorder in adolescents,” Australas. Psychiatry, vol. 23, no. 2, pp. 120–123, Apr. 2015.
[19] Y. D. Keller-Bell, “Disparities in the identification and diagnosis of autism spectrum disorder in culturally and linguistically diverse populations,” Perspect. ASHA Special Interest Groups, vol. 2, no. 14, pp. 68–81,