
Deploying AI-based Driver Monitoring System on NVIDIA DRIVE AGX Xavier


A L K Sainath Medam, Adarsh S*, Saurabh Sinalkar**
* Department of Electronics and Communication Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, India, 641112
** Center of Excellence for AD/ADAS, Tata Consultancy Services, Pune, India
* s [email protected]

Abstract—Driving safety has been a concern ever since the first cars started travelling down the road. According to a 2021 survey by the National Highway Traffic Safety Administration (NHTSA), driver inattention is one of the key contributors to fatal crashes; the main factors contributing to car accidents are driver distraction and inattention. Distracted driving has significantly affected road safety, becoming one of the leading causes of vehicular accidents globally. An in-vehicle, real-time, unobtrusive alerting and monitoring system for driver fatigue, together with a prototype, is suggested to save lives during driving. The work proposes an Artificial Intelligence (AI) based driver monitoring system using a Convolutional Neural Network (CNN) based algorithm. The performance of the model is optimized using the DriveWorks software development kit to make use of the hardware accelerators. The prototype of an AI-based driver fatigue monitoring system built with NVIDIA DRIVE AGX Xavier has a 99.3% accuracy.

Index Terms—DRIVE AGX Xavier, Convolutional Neural Network, eyes, drowsiness, driver.

I. INTRODUCTION

Inattentive driving is the main reason for accidents all over the world. The National Highway Traffic Safety Administration (NHTSA) reported 3522 distracted-driving fatalities in 2021 [1]. Driving is a difficult activity that necessitates balancing numerous tasks at once. While remaining within the permitted limits of the road, drivers must be aware of other road users, anticipate potential hazards, and manage disruptions from both inside and outside the car. To ensure road safety, it is essential to comprehend how drivers perceive the environment, how it affects their thinking, and what causes concentration breaks, especially considering research showing that momentary distractions and poor visual scanning abilities increase the likelihood of accidents [2].

Distractions range from minor to major and can happen with or without the driver's intervention. Causes involving the driver's intervention include using mobile phones, talking with co-passengers, and drinking. Causes occurring without the driver's intervention include improper roads, sleeping, unpredictable climatic conditions, and rash driving by other vehicles on the road, which the driver cannot control. Among these, sleeping has the major impact: according to a National Sleep Foundation (NSF) survey, around 37% of drivers have fallen asleep behind the wheel, and 4% admit to having had a car accident due to drowsiness [3]. These distractions can cause road accidents of severity we can scarcely imagine.

According to recent estimates, Advanced Driver Assistance Systems (ADAS) could potentially prevent as many as thirty percent of highway accidents caused by light vehicles [4]. Nowadays, the majority of commercial DMS use driving metrics, such as steering or lateral control, to assess the driver's mental state [5]. The extensive use of vision-based DMS is essential for fully or partially autonomous driving solutions that comply with Society of Automotive Engineers (SAE) Levels 2-4 [6].

Driver drowsiness and attention warning (DDAW) [7] systems must be installed in new motor vehicles of categories M and N beginning July 6, 2022, and in all new cars beginning July 7, 2024. To accomplish this, the system must be trained to navigate autonomously; good path-planning [8] technologies may aid in task completion, achieving the goal while reducing capital investment. Automated and assisted driving technology has advanced significantly in recent years, which helps reduce traffic accidents brought on by human mistakes.

The Driver Monitoring System is one such effort featured in the Euro New Car Assessment Program (NCAP) 2025 Roadmap towards zero road casualties by 2050 [9]. In these circumstances, a system that monitors the driver's actions and promptly alerts him is essential, which is why implementing an in-car system to track the driver's actions based on eye closure is important. In the proposed work, a non-intrusive, deep learning-based driver monitoring system is presented to avert such situations in real-time, utilizing production-grade high-computing hardware. A Driver-in-Loop setup is fitted with a Sekonix camera to record the driver's face at all times. Using the OpenCV library, the driver's face and eyes are recognized and tracked. DRIVE AGX Xavier is used to validate the model in real-time. A Deep Neural Network (DNN) framework is utilized to extract the observed face and categorize it as either inattentive or attentive based on the closure of the eyes.

The manuscript is organised as follows: Section II reviews the literature; Section III outlines the methodology; Section IV analyzes the results and performance of the system; and Section V draws conclusions and outlines the paper's future focus.
II. LITERATURE REVIEW

Several studies have already been done on detecting tiredness and distraction. If drivers are occupied with activities other than driving, they may not be able to focus on driving adequately during important occurrences [10]. M. Sunagawa et al. [11] proposed a scale based on the driver's activity; the drowsiness levels and their features are represented in Table I.

TABLE I
DROWSINESS LEVELS AND FEATURES

In this article, many existing sleepiness and distraction detection methods are also compared. Different strategies for classifying and detecting sleepiness were compared by M. Ramzan et al. [12]. Kotseruba and Tsotsos [13] carried out a thorough study of the algorithms and data-sets available so far for driver assistance, a selection of which is shown in Table II. Distraction types based on Non-Driving Related Tasks (NDRT) are:
1) Visual - moving gaze away from the road
2) Cognitive - related to thinking
3) Manual - removing hands from the wheel
4) Visuo-Manual - a combination of visual and manual, such as drinking water while thinking about something

TABLE II
ALGORITHMS RELATED TO DISTRACTED DRIVING BASED ON VISUAL FEATURES AND DISTRACTION TYPES

S.No  Algorithm             Visual Features   Distraction types
1     Wollmer (2011) [14]   Head              VM
2     Li (2016) [15]        Gaze+Head+Face    V+M+C
3     Liao (2016) [16]      Gaze+Head         C
4     Wang (2018) [17]      Gaze              C+VM
5     Chen (2021) [18]      Face              C+VM
6     Yang (2021) [19]      Gaze+Face         VM
* C - Cognitive, V - Visual, M - Manual, VM - Visuo-Manual

Drowsiness is accurately identified in [20] by machine learning utilizing eye PERCLOS and yawning. A tiredness detection system that makes use of neural networks and face pictures is proposed in [21]. Using facial landmarks and gaze tracking, Vicente et al. [22] offer a method for detecting inattentiveness. Facial landmarks are used in [23] to identify sleepiness, with a Support Vector Machine (SVM) as the classifier. In [24], good accuracy is achieved using transfer learning to detect driver inattentiveness with a convolutional neural network. Even in low-light conditions, the model in [25] is able to detect the driver's behavior, which is useful for all camera-related applications. These are all detection and prediction models, trained and verified on their respective hardware. No work has been found in which a driver monitoring system is deployed and validated on real-time, high-computing, automotive ECU (Electronic Control Unit) grade hardware that can be installed directly in the vehicle.

Previous studies have employed many features to identify inattention, which complicates the method. In addition, many research studies used sensors to evaluate biological factors. These sensors must be in close proximity to the driver, which may interfere with driving. The inability to use the system in vehicles directly is a shortcoming of several studies.

This research suggests a non-invasive method that uses a Sekonix camera along with self-driving car hardware as a remedy to these problems. The processing and inference times are decreased while maintaining high accuracy, and the system is validated with an automotive-standard camera and DRIVE AGX Xavier.
III. METHODOLOGY

A. Hardware Components
1) Camera: A Sekonix camera is used to capture driver behavior. The sensor data is passed as input to the system, which is used to classify driver behavior. Table III shows the camera specifications.

TABLE III
CAMERA SPECIFICATIONS

Name                              Specification
Model                             Sekonix SF3324
Sensor                            ON Semi AR0231
HFOV (Horizontal Field Of View)   120°
Lens Part                         Sekonix NA1262

2) NVIDIA DRIVE AGX Xavier: NVIDIA DRIVE AGX Xavier hardware is used to deploy our system. It is scalable and allowed us to save development time. Table IV represents the hardware specifications.

TABLE IV
HARDWARE SPECIFICATIONS

Component Description                Details
Deep Learning Accelerators           10 TOPS (INT8)
NVIDIA Volta™-class integrated GPU   20 TOPS (INT8)
Image Signal Processor               1.5 Gigapixels/s
System I/O for Camera                90 Gb/s
Memory Bandwidth                     136 GB/s

3) Driver-in-loop setup: The Driver-in-Loop (DIL) setup, shown in Fig. 1, mimics all the controls of an actual vehicle, such as the steering-wheel angle, gear shifts, acceleration, and braking. The Sekonix camera is mounted on the windshield facing the driver to continuously record the driver's video.
This ongoing data is then transmitted to the NVIDIA DRIVE AGX hardware, which is directly connected to the camera.

Fig. 1. Typical DIL setup diagram.
B. Collection of Drowsiness Data-set

The Sekonix camera is installed on the DIL setup in a position that captures the required region of interest. The recordings are at 30 fps and are stored in H.264 format using the NVIDIA DriveWorks sensor data recording tool.
The images were acquired and annotated based on the following classifications:
• Eyes opened
• Eyes closed
• Yawning
• Not-Yawning
The Yawning Detection Data-set (YawDD) [26] is used along with our data-set. In YawDD, the camera is mounted on the driver's dashboard; each of the 29 videos in the collection, one per subject, covers driving while nodding off, using a phone, and driving in silence. The drowsiness data-set samples are displayed in Fig. 2.

Fig. 2. Data-set samples.
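As a sketch of this preprocessing step, the following Python snippet shows how recorded clips could be split into labeled frames; the file names, directory layout, and frame stride are illustrative assumptions, and the H.264 recordings are assumed to have been exported to a container OpenCV can read.

```python
import cv2
from pathlib import Path

def extract_frames(video_path: str, out_dir: str, label: str, stride: int = 5):
    """Split a recorded clip into JPEG frames for annotation.

    'label' is one of the four classes (eyes_opened, eyes_closed,
    yawning, not_yawning); the layout is a hypothetical example.
    """
    out = Path(out_dir) / label
    out.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:  # keep every 5th frame of the 30 fps stream
            cv2.imwrite(str(out / f"{Path(video_path).stem}_{saved:05d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

# Hypothetical usage:
# extract_frames("driver01_yawning.mp4", "dataset", "yawning")
```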
C. Software Architecture

The software architecture is illustrated in Fig. 3. Data captured from the Sekonix camera is sent to DRIVE AGX Xavier for processing, where it is prepared and passed as input to our model. The inference relies on the detection algorithm designed by Paul Viola and Michael Jones, an effective object detection method called Haar feature-based cascade classifiers [27]. This approach is machine-learning based: a cascade function is learned from a large number of positive and negative images and is then applied to new images to detect objects. For face detection, the classifier was trained with many positive and negative images, and the features were extracted using Haar features; the driver's face is thus detected. After detection, the model classifies the driver based on eye closure, renders an output message on the cluster panel, and gives an indication to the driver through a buzzer.

Fig. 3. Software Flow Diagram.
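For illustration, a minimal OpenCV sketch of the Haar-cascade face and eye detection described above is given below. It loads OpenCV's stock pre-trained cascades rather than the cascades trained in this work, so it is an illustrative stand-in, not the deployed detector.

```python
import cv2

# OpenCV ships pre-trained Haar cascades; the paper trains its own,
# so these stock files are only illustrative stand-ins.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def detect_eye_rois(frame):
    """Return the eye regions inside the largest detected face."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return []
    # Pick the largest face, assuming it belongs to the driver.
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    face_roi = gray[y:y + h, x:x + w]
    eyes = eye_cascade.detectMultiScale(face_roi, scaleFactor=1.1, minNeighbors=5)
    return [face_roi[ey:ey + eh, ex:ex + ew] for (ex, ey, ew, eh) in eyes]
```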
D. Model Training

A Deep Neural Network architecture is used to train the model with more than 6000 annotated images converted to frames from the recorded videos. The model was trained to detect the inattentiveness of the driver, using the categorical cross-entropy loss function and the Adam optimizer; the maximum number of epochs was set to 500 for better results. From the available data-set, a variety of images was created using a data augmentation strategy, which is crucial for the model's successful detection: every epoch produced a fresh set of images by altering each image, with zooms and horizontal shifts applied without changing the image class. This keeps the model from over-fitting. A CNN architecture with convolutional, max-pooling, flatten, dropout, and dense layers and the Rectified Linear Unit (ReLU) activation function was trained on the combined collected and recorded data-sets so that the model works in multiple scenarios. Fig. 4 shows the model visualization.
The driver's face is continuously recorded in real-time for tracking. These videos are converted into frames, which the classification model labels as inattentive or otherwise, and an alarm is raised as an indication.

Fig. 4. Model Visualization.
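The paper does not list exact layer sizes, so the Keras sketch below only mirrors the stated ingredients (convolution, max-pooling, flatten, dropout, dense, ReLU, categorical cross-entropy, Adam, and zoom/horizontal-shift augmentation); the filter counts, input resolution, and dataset path are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Assumed input size and class count; the paper does not specify them.
IMG_SHAPE, NUM_CLASSES = (64, 64, 1), 4

model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=IMG_SHAPE),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Zoom and horizontal shifts, as described, regenerate varied images each epoch.
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1.0 / 255, zoom_range=0.2, width_shift_range=0.1)
train_flow = datagen.flow_from_directory(
    "dataset/", target_size=IMG_SHAPE[:2], color_mode="grayscale",
    class_mode="categorical", batch_size=32)
model.fit(train_flow, epochs=500)
```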
E. Model Optimization

The model was trained in the TensorFlow framework but, for better performance, was converted into an Open Neural Network Exchange (ONNX) model using the tf2onnx Python library (release 1.14.0). The ONNX model is then optimized with the TensorRT optimization tool [28] provided by NVIDIA, a software development kit comprising a run-time and a deep learning inference optimizer that gives inference applications minimal latency and high throughput.
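A minimal sketch of the conversion step, assuming the Keras model from the previous sketch; tf2onnx's from_keras API is used, and the plan-file build is indicated only in a comment because the exact invocation of the DriveWorks TensorRT optimization tool is SDK-version specific.

```python
import tensorflow as tf
import tf2onnx

# 'model' is the trained Keras model from the previous sketch.
spec = (tf.TensorSpec((None, 64, 64, 1), tf.float32, name="input"),)
onnx_model, _ = tf2onnx.convert.from_keras(
    model, input_signature=spec, opset=13, output_path="dms.onnx")

# The ONNX file is then turned into a TensorRT plan file on the target.
# With generic TensorRT tooling this is roughly:
#   trtexec --onnx=dms.onnx --saveEngine=dms.plan
# On DRIVE AGX, the DriveWorks TensorRT optimization tool performs the
# equivalent step; its exact flags depend on the SDK version.
```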

F. Code Workflow

C++ (CPP) is the efficient and supported way of writing code to be flashed onto an embedded device. The code workflow in Fig. 5 proceeds as follows: first, we import DriveWorksSample, which provides the visualization, camera, and DNN frameworks used in our system; then we initialize the camera sensor to capture driver behavior. While initializing the DNN, the following steps are carried out: calculate the total size needed to store the input and output, obtain the classification blob indices, allocate GPU and CPU memory for reading the DNN output, and initialize the data conditioner. In OnProcess, the sensor frame is collected and classification is performed; once Interpret is done, the sample output is rendered on screen. For every new frame, the memory and conditioner are reset.

Fig. 5. Code Work-Flow of the System.

G. Deployment steps

In deployment, we build the system on a host PC (Personal Computer) and, after validation, deploy the system on DRIVE AGX Xavier [29] for testing.
1) Inference Workflow:
• Build the model on the host device.
• Deploy the system on the platform/DRIVE.
The inference workflow for DRIVE AGX Xavier is shown in Fig. 6. The ONNX version of the DNN model was optimized using the DriveWorks TensorRT optimization tool on the host PC to create a plan file. This optimized model was then integrated into the code using the DNN framework APIs (Application Programming Interfaces) of DriveWorks, and the complete system was validated on the host PC. A similar process was then followed to generate a plan file for the hardware, and the system was validated on the hardware as well.

Fig. 6. Inference Workflow for DRIVE AGX Xavier.

IV. RESULT AND DISCUSSION

NVIDIA DRIVE AGX Xavier and a Sekonix camera connected to the DIL setup are used to capture data and perform detection, extraction, and classification. In real-time, the model detects the face of the driver, then identifies the eyes using a cascade classifier, and the identified region of interest (ROI) is given as input to our model, which detects whether the driver is inattentive or not. Based on the frame count, the model gives a warning to the driver to make him attentive.
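The frame-count-based warning logic can be sketched as follows; the consecutive-frame threshold, capture source, and class index are assumptions, and detect_eye_rois and model refer to the earlier sketches.

```python
import cv2
import numpy as np

INATTENTIVE_FRAMES = 15   # assumed threshold: ~0.5 s at 30 fps
EYES_CLOSED_CLASS = 1     # assumed index of the 'eyes closed' label

def run_monitor(model, source=0):
    """Warn when the driver looks inattentive for several consecutive frames."""
    cap = cv2.VideoCapture(source)
    streak = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rois = detect_eye_rois(frame)  # Haar-cascade sketch above
        inattentive = True             # no eyes found counts as inattentive
        if rois:
            roi = cv2.resize(rois[0], (64, 64)).astype("float32") / 255.0
            probs = model.predict(roi[np.newaxis, ..., np.newaxis], verbose=0)
            inattentive = probs[0].argmax() == EYES_CLOSED_CLASS
        streak = streak + 1 if inattentive else 0
        if streak >= INATTENTIVE_FRAMES:
            print("WARNING: driver inattentive")  # buzzer/cluster message in-vehicle
    cap.release()
```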
A. Performance Metrics

Performance metrics are used to track and assess a model's performance throughout training and testing.
1) Accuracy: Classification accuracy is calculated by dividing the number of correct predictions by the total number of predictions and multiplying the result by 100. We implemented this in practice by continuously comparing ground-truth values with predictions. The accuracy of the proposed system is 99.30%; the training loss and accuracy of the model after 500 epochs are displayed in Fig. 7.
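As a worked illustration of the metric on made-up labels:

```python
import numpy as np

y_true = np.array([0, 1, 1, 0, 1, 0])          # ground-truth class indices
y_pred = np.array([0, 1, 0, 0, 1, 0])          # model predictions
accuracy = 100.0 * np.mean(y_pred == y_true)   # correct predictions / total * 100
print(f"{accuracy:.2f}%")                      # 83.33%
```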
Fig. 7. Training Accuracy & Training Loss Graph.

2) Confusion Matrix: A confusion matrix is a table-based representation of ground-truth labels against model predictions. Each row of the matrix represents the instances in a predicted class, while each column represents the instances in an actual class. It is a performance metric for a classification model on test data whose true labels are known. The confusion matrices of the system are displayed in Fig. 8 and Fig. 9.

Fig. 8. Training data Confusion Matrix.

Fig. 9. Testing data Confusion Matrix.
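A minimal sketch of computing such a matrix, using scikit-learn as an assumed stand-in for whatever tooling was actually used:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 1, 0, 1, 0]   # actual classes
y_pred = [0, 1, 0, 0, 1, 0]   # predicted classes
# scikit-learn places actual classes on rows and predictions on columns,
# the transpose of the convention described in the text above.
cm = confusion_matrix(y_true, y_pred)
print(cm)   # [[3 0]
            #  [1 2]]
```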

B. System Inference Time Analysis

System inference time is defined as the time taken to predict the driver's inattentive state based on the trained classifications of the system. A sample system inference is displayed in Fig. 10. The comparison of inference times between our TensorFlow model and the same model optimized with TensorRT, over 15 samples, is displayed in Fig. 11. For every sample, the optimized model took less time to predict the classification than the non-optimized model. With the optimization of the model and the methodology used to implement the system, the driver's inattentiveness is classified in better time. The model optimized using TensorRT is referred to as the TensorRT Optimized model, while the regular model is known as the TensorFlow model.

Fig. 10. Sample System Inference.

Fig. 11. Inference Comparison Graph.
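A sketch of how the per-sample comparison could be timed on the host PC, with an ONNX Runtime session standing in for the TensorRT-optimized engine (on the target, the measurement goes through the DriveWorks runtime instead); model and dms.onnx come from the earlier sketches.

```python
import time
import numpy as np
import onnxruntime as ort

def time_ms(fn, x, runs=15):
    """Average wall-clock inference time per run, in milliseconds."""
    fn(x)  # warm-up run, excluded from timing
    t0 = time.perf_counter()
    for _ in range(runs):
        fn(x)
    return 1000.0 * (time.perf_counter() - t0) / runs

x = np.random.rand(1, 64, 64, 1).astype("float32")
sess = ort.InferenceSession("dms.onnx")
tf_ms = time_ms(lambda a: model.predict(a, verbose=0), x)
onnx_ms = time_ms(lambda a: sess.run(None, {"input": a}), x)
print(f"TensorFlow: {tf_ms:.1f} ms, ONNX Runtime: {onnx_ms:.1f} ms")
```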
The performance of our TensorRT optimized system in terms of inference time is shown in Table V. The optimized model inference time (InferenceTime_O) is less than our Keras TensorFlow system inference time (InferenceTime), i.e., it predicts the output in less time than our Keras TensorFlow trained system. We achieved a decrease in inference time of 83%.

TABLE V
COMPARISON OF SYSTEM PERFORMANCE

Trial   InferenceTime   InferenceTime_O
1       15.6            2.7
2       16.3            3.1
3       14.2            2.6
* Times are in ms.
V. CONCLUSION

In the proposed method, we developed a driver monitoring system that detects and alerts drowsy drivers, and deployed the system on ready-to-use NVIDIA DRIVE AGX Xavier hardware. The system employs the Viola-Jones algorithm to recognize faces in every frame of the video, and the driver's inattention is classified using a Deep Neural Network. By combining our methodology with the NVIDIA optimization technique, we were able to reduce inference time compared to our standard model: inference completes in 2.7 ms, faster than the standard model. The specified system has a 99.33% accuracy.
Future research may concentrate on the Minimum Risk Manoeuvre, which will help to reduce the risk arising from the driver's drowsiness. More training data collected while driving can significantly improve the model's accuracy.
REFERENCES

[1] "National Highway Traffic Safety Administration (NHTSA), Drowsy Driving". [Online]. Available: https://www.nhtsa.gov/risky-driving/distracted-driving. [Accessed: 04-Aug-2023].
[2] C. Robbins and P. Chapman, "How does drivers' visual search change as a function of experience? A systematic review and meta-analysis", Accident Anal. Prevention, vol. 132, Art. no. 105266, Nov. 2019.
[3] "National Sleep Foundation (NSF)". [Online]. Available: https://drowsydriving.org/about/facts-and-stats. [Accessed: 04-Aug-2023].
[4] L. Yue, M. Abdel-Aty, Y. Wu, and L. Wang, "Assessment of the safety benefits of vehicles' advanced driver assistance, connectivity and low level automation systems", Accident Anal. Prevention, vol. 117, pp. 55–64, Aug. 2018.
[5] A. El Khatib, C. Ou, and F. Karray, "Driver inattention detection in the context of next-generation autonomous vehicles design: A survey", IEEE Trans. Intell. Transp. Syst., vol. 21, no. 11, pp. 4483–4496, Nov. 2019.
[6] "Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles", Standard J3016, SAE International, 2018.
[7] "Commission Delegated Regulation (EU)". [Online]. Available: https://members.wto.org/crnattachments/2021/TBT/EEC/21_0621_00_e.pdf. [Accessed: 04-Jun-2023].
[8] S. Sinalkar and B. B. Nair, "Stereo Vision-Based Path Planning System for an Autonomous Harvester", in International Conference on Soft Computing and Signal Processing, vol. 1118, Springer, Singapore, 2020, pp. 499–510.
[9] "Euro NCAP 2025 Roadmap". [Online]. Available: https://cdn.euroncap.com/media/30700/euroncap-roadmap-2025-v4.pdf. [Accessed: Mar. 1, 2022].
[10] K. L. Young and P. M. Salmon, "Examining the relationship between driver distraction and driving errors: A discussion of theory, studies and methods", Safety Science, vol. 50, no. 2, pp. 165–174, 2012.
[11] M. Sunagawa et al., "Comprehensive Drowsiness Level Detection Model Combining Multimodal Information", IEEE Sensors Journal, vol. 20, no. 7, pp. 3709–3717, 1 Apr. 2020, doi: 10.1109/JSEN.2019.2960158.
[12] M. Ramzan, H. U. Khan, S. M. Awan, A. Ismail, M. Ilyas and A. Mahmood, "A Survey on State-of-the-Art Drowsiness Detection Techniques", IEEE Access, vol. 7, pp. 61904–61919, 2019, doi: 10.1109/ACCESS.2019.2914373.
[13] I. Kotseruba and J. K. Tsotsos, "Attention for Vision-Based Assistive and Automated Driving: A Review of Algorithms and Datasets", IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 11, pp. 19907–19928, Nov. 2022, doi: 10.1109/TITS.2022.3186613.
[14] M. Wollmer et al., "Online Driver Distraction Detection Using Long Short-Term Memory", IEEE Trans. Intell. Transp. Syst., vol. 12, no. 2, pp. 574–582, June 2011, doi: 10.1109/TITS.2011.2119483.
[15] N. Li and C. Busso, "Detecting Drivers' Mirror-Checking Actions and Its Application to Maneuver and Secondary Task Recognition", IEEE Trans. Intell. Transp. Syst., vol. 17, no. 4, pp. 980–992, April 2016, doi: 10.1109/TITS.2015.2493451.
[16] Y. Liao, S. E. Li, G. Li, W. Wang, B. Cheng, and F. Chen, "Detection of driver cognitive distraction: An SVM based real-time algorithm and its comparison study in typical driving scenarios", in 2016 IEEE Intelligent Vehicles Symposium (IV), Gothenburg, Sweden, 2016, pp. 394–399, doi: 10.1109/IVS.2016.7535416.
[17] R. Wang, P. V. Amadori, and Y. Demiris, "Real-Time Workload Classification during Driving using HyperNetworks", in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 2018, pp. 3060–3065, doi: 10.1109/IROS.2018.8594305.
[18] J. Chen et al., "Fine-Grained Detection of Driver Distraction Based on Neural Architecture Search", IEEE Trans. Intell. Transp. Syst., vol. 22, no. 9, pp. 5783–5801, Sept. 2021, doi: 10.1109/TITS.2021.3055545.
[19] L. Yang et al., "Recognition of visual-related non-driving activities using a dual-camera monitoring system", Pattern Recognition, vol. 116, Art. no. 107955, Aug. 2021.
[20] S. Misal and B. B. Nair, "A Machine Learning Based Approach to Driver Drowsiness Detection", in Information Communication and Computing Technology, vol. 835, Springer, Singapore, 2018, doi: 10.1007/978-981-13-5992-7_13.
[21] I. Singh and V. Banga, "Development of a Drowsiness Warning System Using Neural Network", Int. J. Adv. Res. Electr. Electron. Instrum. Eng., vol. 2, no. 8, pp. 3614–3623, Aug. 2013.
[22] F. Vicente, Z. Huang, X. Xiong, F. De La Torre, W. Zhang, and D. Levi, "Driver Gaze Tracking and Eyes off the Road Detection System", IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 4, pp. 2014–2027, 2015, doi: 10.1109/TITS.2015.2396031.
[23] A. Kumar and R. Patra, "Driver drowsiness monitoring system using visual behaviour and machine learning", in 2018 IEEE Symposium on Computer Applications Industrial Electronics (ISCAIE), 2018, pp. 339–344, doi: 10.1109/ISCAIE.2018.8405495.
[24] P. M. Manjula, S. Adarsh and K. I. Ramachandran, "Driver Inattention Monitoring System Based on the Orientation of the Face Using Convolutional Neural Network", in 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, 2020, pp. 1–7, doi: 10.1109/ICCCNT49239.2020.9225600.
[25] S. Srivastava, A. S, B. B. Nair and K. I. Ramachandran, "Driver's Face Detection in Poor Illumination for ADAS Applications", in 2021 5th International Conference on Computer, Communication and Signal Processing (ICCCSP), Chennai, India, 2021, pp. 1–6, doi: 10.1109/ICCCSP52374.2021.9465533.
[26] S. Abtahi, M. Omidyeganeh, S. Shirmohammadi, and B. Hariri, "YawDD: Yawning Detection Dataset", IEEE Dataport, August 1, 2020, doi: 10.21227/e1qm-hb90.
[27] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features", in Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Kauai, HI, USA, 2001, pp. I-511–I-518, doi: 10.1109/CVPR.2001.990517.
[28] "NVIDIA TensorRT SDK". [Online]. Available: https://developer.nvidia.com/tensorrt. [Accessed: July 7, 2023].
[29] "DRIVE AGX Xavier". [Online]. Available: https://developer.nvidia.com/drive/documentation#drive-xavier. [Accessed: July 7, 2023].
