AI City Challenge 2024
1. Group Members:
Hridya N - TVE23ECRA08
2. Aim:
• Develop a machine learning model that generates dense video captions for traffic safety
scenarios, focusing on pedestrian accidents, from long videos recorded from multiple
viewpoints.
• Leverage multiple cameras and viewpoints to describe the continuous moments before incidents,
as well as normal scenes, capturing details of the context, attention, location, and behavior of
pedestrians and vehicles. Participants will detail the events leading up to incidents as well as
ordinary scenes, offering deep insights for applications such as insurance inspection and
accident prevention.
3. Deliverables:
• Trained machine learning model capable of fine-grained video captioning for traffic safety
scenarios.
• Detailed documentation outlining the model architecture, training procedure, hyperparameters,
and evaluation results.
• Codebase implementing the model, preprocessing pipeline, and evaluation metrics.
• Evaluation report showing the model's performance on the provided dataset and comparing it
with baseline approaches.
• Details: Proposes an approach using convolutional neural networks (CNNs) and recurrent neural
networks (RNNs) with attention mechanisms for image captioning.
• Pros: Effective in generating descriptive captions by aligning visual and semantic information.
• Cons: May struggle to capture fine-grained details in complex scenes with multiple objects
and interactions.
Paper 2: Attention Mechanisms in Neural Networks for Video Captioning (2017)
• Details: Introduces attention mechanisms for video captioning tasks, allowing the model to focus
on the relevant regions of each video frame.
• Pros: Improves the model's ability to generate informative captions by attending to salient
features.
• Cons: Increased computational complexity and potential performance degradation on long
videos with multiple viewpoints.
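
The attention idea described in the papers above can be illustrated with a minimal sketch. It
assumes each video frame has already been reduced to a feature vector (for example by a CNN) and
that a query vector stands in for the caption decoder's current state. The function name
soft_attention, the toy inputs, and the scaled dot-product scoring are illustrative assumptions,
not the formulation used in either paper, which may score relevance differently.

```python
import math


def soft_attention(frame_features, query):
    """Weight per-frame feature vectors by their relevance to a query vector.

    frame_features: list of per-frame feature vectors (assumed to come from a CNN).
    query: vector of the same length (assumed to be the decoder's hidden state).
    Returns (context, weights): the attention-weighted average of the frame
    features and the per-frame attention weights.
    """
    dim = len(query)
    # One relevance score per frame: scaled dot product with the query.
    scores = [
        sum(f * q for f, q in zip(frame, query)) / math.sqrt(dim)
        for frame in frame_features
    ]
    # Numerically stable softmax turns scores into weights that sum to 1.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: weighted average of the frame features.
    context = [
        sum(w * frame[i] for w, frame in zip(weights, frame_features))
        for i in range(dim)
    ]
    return context, weights


# Toy example: three "frames" with 2-dimensional features (hypothetical values).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
context, weights = soft_attention(frames, hidden)
```

In this toy input, the frames whose features align with the query receive larger weights than the
one that does not. The cost of scoring grows with the number of frames, which is one way to see
the computational concern noted for long, multi-view videos.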
5. Research Gap:
• While existing research has made significant progress in video captioning, there is a need for
specialized models tailored to fine-grained traffic safety scenarios.
• Current approaches may not adequately capture the nuanced details of pedestrian accidents and
the surrounding context, highlighting the importance of developing dedicated solutions for this
domain.