What if a security camera could not only capture video but also understand what is happening, distinguishing routine activities from potentially dangerous behavior in real time? Researchers at the University of Virginia School of Engineering and Applied Science are working toward that future with their latest breakthrough: an AI-driven video analyzer capable of detecting human actions in video footage with unprecedented precision and intelligence.
The research paper is published in the journal IEEE Transactions on Pattern Analysis and Machine Intelligence.
The system, called SMAST (Semantic and Motion-Aware Spatiotemporal Transformer Network), promises a wide range of societal benefits, from improving surveillance systems and public safety to enabling more advanced motion tracking in healthcare and refining how autonomous vehicles navigate complex environments.
“This AI technology opens the door to real-time action detection in some of the most demanding environments,” said Scott T. Acton, professor and chair of the Department of Electrical and Computer Engineering, and principal investigator on the project. “This is the kind of progress that can help prevent accidents, improve diagnostics and even save lives.”
AI-powered innovation for complex video analysis
So how does it work? At its core, SMAST is powered by artificial intelligence. The system relies on two key components to detect and understand complex human behaviors. The first is a multi-feature selective attention model, which helps the AI focus on the most important parts of a scene, such as a person or an object, while ignoring unnecessary details. This makes the system more accurate at identifying what is happening, for example recognizing that someone is throwing a ball rather than just moving their arm.
The second key component is a motion-aware 2D positional encoding algorithm, which helps the AI track how things change over time. Imagine watching a video in which people are constantly changing positions: this tool helps the AI remember those movements and understand how they relate to one another. By integrating these two components, SMAST can accurately recognize complex actions in real time, making it more effective in high-stakes scenarios such as surveillance, healthcare diagnostics and autonomous driving.
SMAST redefines how machines detect and interpret human actions. Current systems struggle with chaotic, unedited, continuous video footage and often miss the context of events. SMAST’s design allows it to capture the dynamic relationships between people and objects with remarkable precision, thanks to the same AI components that allow it to learn and adapt from data.
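To make the idea of selective attention concrete, here is a minimal sketch of generic scaled dot-product attention, the basic mechanism transformer models use to weight some inputs more heavily than others. It is not SMAST's actual multi-feature model: the region names, the two-number feature vectors and the query are hypothetical toy values chosen only for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query over several regions.

    Returns the attention weights and the weighted sum of the values.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, pooled

# Hypothetical toy features for three regions of a scene.
regions = {
    "person": [1.0, 0.9],
    "ball": [0.8, 1.0],
    "background": [-1.0, -0.8],
}
query = [1.0, 1.0]  # hypothetical "what matters for this action?" vector

names = list(regions)
features = [regions[name] for name in names]
weights, pooled = attend(query, features, features)

for name, weight in zip(names, weights):
    print(f"{name}: {weight:.2f}")
```

In this toy setup the person and the ball receive most of the weight while the background receives very little, which is the "focus on what matters, ignore the rest" behavior the article describes.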
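One way to picture a motion-aware 2D positional signal is sketched below. It uses the standard sinusoidal encoding from transformer models, applied to an object's (x, y) position and to how far it has moved since the previous frame. The layout, the function names and the `dim` parameter are assumptions made for illustration; the formulation in the SMAST paper may well differ.

```python
import math

def sinusoidal(value, dim):
    """Standard sinusoidal encoding of one scalar into `dim` numbers (dim must be even)."""
    out = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        out.append(math.sin(value * freq))
        out.append(math.cos(value * freq))
    return out

def motion_aware_encoding(pos, prev_pos, dim=4):
    """Encode a 2D position together with its displacement since the last frame.

    Hypothetical layout: [enc(x), enc(y), enc(dx), enc(dy)].
    """
    x, y = pos
    dx, dy = x - prev_pos[0], y - prev_pos[1]
    return (
        sinusoidal(x, dim)
        + sinusoidal(y, dim)
        + sinusoidal(dx, dim)
        + sinusoidal(dy, dim)
    )

# Same current position, different recent motion.
still = motion_aware_encoding((5, 3), (5, 3))   # object did not move
moving = motion_aware_encoding((5, 3), (2, 3))  # object moved 3 units along x

print(still[:8] == moving[:8])  # position part is identical
print(still[8:] == moving[8:])  # motion part differs
```

Two objects at the same location get the same position part but different motion parts, so a model that consumes this vector can tell a stationary person from one who has just walked into the same spot.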
Setting new standards in action detection technology
This leap in technology allows the AI system to identify actions like a runner crossing a street, a doctor performing a specific procedure, or even a safety threat in a crowded space. SMAST has already outperformed leading solutions in key academic benchmarks, including AVA, UCF101-24 and EPIC-Kitchens, setting new standards in accuracy and efficiency.
“The societal impact could be enormous,” said Matthew Korban, a postdoctoral research associate in Acton’s lab working on the project. “We are excited to see how this AI technology could transform industries, making video systems more intelligent and capable of real-time understanding.”
More information:
Matthew Korban et al, A Semantic and Motion-Aware Spatiotemporal Transformer Network for Action Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence (2024). DOI: 10.1109/TPAMI.2024.3377192
Provided by University of Virginia
Citation: AI-based video analyzer sets new standards in human action detection (October 16, 2024) retrieved October 16, 2024 from