Violent Action Detector - AI Video Surveillance

COMPUTER VISION

Visual Threat Analytics

Executes parallel YOLOv11 inference streams for real-time weapon (knife) classification and skeletal pose estimation, maintaining frame rates suitable for live video feeds.

DEEP LEARNING

Temporal Sequence Processing

Integrates a Bidirectional LSTM network tracking a 150-frame temporal buffer of normalized keypoint velocities to classify suspicious physical gestures and stabbing movements.

SYSTEM INTEGRATION

Automated Alert Logging

Triggers instant logging upon violent activity detection, extracting facial crops from clean video frames and archiving logs for forensic review.
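
The logging step could look roughly like the sketch below, which crops a face region from an un-annotated frame and appends one JSON line per alert. The function names, the record fields, the JSON Lines format, and the reading of "clean" as "without overlays" are all assumptions made for illustration; the page does not describe the project's actual log format.

```python
import json
from datetime import datetime, timezone

def crop_box(frame, box):
    """Crop a region from a frame stored as a list of pixel rows.

    box is (x1, y1, x2, y2) in pixels and is clamped to the frame bounds.
    """
    height, width = len(frame), len(frame[0])
    x1, y1, x2, y2 = box
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    return [row[x1:x2] for row in frame[y1:y2]]

def append_alert(log_path, label, confidence, face_path):
    """Append one alert as a JSON line and return the record (hypothetical schema)."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "label": label,
        "confidence": round(confidence, 3),
        "face_crop": face_path,
    }
    # Append mode keeps earlier alerts intact for later forensic review.
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

Cropping from the raw frame rather than the annotated one keeps bounding boxes and tracking lines out of the archived face image.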

Workflow Architecture

Detection Pipeline Overview

Four integrated steps to analyze, filter, and classify threats in under 30 milliseconds.

Visual Ingestion

Captures live camera feeds or local security video streams at 30 FPS.

Parallel YOLOv11

Runs object detection for weapons and tracks 17-joint skeletal human poses.
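
One way to run the two models side by side is to submit both to a small thread pool for each frame, as sketched below. The page does not say how the parallelism is implemented, so the thread pool is an assumption, and `detect_weapons` and `estimate_poses` are hypothetical stand-ins for the real YOLOv11 calls.

```python
from concurrent.futures import ThreadPoolExecutor

def detect_weapons(frame):
    """Hypothetical stand-in for the knife detector; returns bounding boxes."""
    return []

def estimate_poses(frame):
    """Hypothetical stand-in for the pose model; returns 17-joint skeletons."""
    return []

def analyze_frame(frame, pool, weapon_fn=detect_weapons, pose_fn=estimate_poses):
    # Submit both models on the same frame so neither waits for the other.
    weapons = pool.submit(weapon_fn, frame)
    poses = pool.submit(pose_fn, frame)
    return {"weapons": weapons.result(), "poses": poses.result()}

with ThreadPoolExecutor(max_workers=2) as pool:
    result = analyze_frame(frame=None, pool=pool)
```

Whether threads give a real speedup depends on the inference backend; the sketch only shows the control flow of two streams fed by one frame.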

Torso Normalization

Centers and scales skeleton coordinates relative to shoulders for distance invariance.
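
A minimal sketch of this normalization follows. It assumes the 17 joints use the COCO keypoint order (indices 5 and 6 for the left and right shoulder), centers on the shoulder midpoint, and divides by shoulder width. The exact reference point and scale factor used by the project are not stated on this page, so treat both as illustrative choices.

```python
from math import hypot

# COCO-style shoulder indices in a 17-keypoint layout (assumed here).
LEFT_SHOULDER, RIGHT_SHOULDER = 5, 6

def normalize_skeleton(keypoints, eps=1e-6):
    """Center keypoints on the shoulder midpoint and scale by shoulder width.

    keypoints: list of 17 (x, y) pixel coordinates.
    Returns a list of 17 (x, y) tuples measured in shoulder widths.
    """
    lx, ly = keypoints[LEFT_SHOULDER]
    rx, ry = keypoints[RIGHT_SHOULDER]
    cx, cy = (lx + rx) / 2.0, (ly + ry) / 2.0
    # Guard against a degenerate skeleton where both shoulders coincide.
    scale = max(hypot(lx - rx, ly - ry), eps)
    return [((x - cx) / scale, (y - cy) / scale) for x, y in keypoints]
```

Because both the offset and the scale come from the skeleton itself, a person standing twice as far from the camera produces the same normalized coordinates.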

LSTM Sequence HAR

Classifies gestures over a 150-frame queue to flag active stabbing motions.
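
The 150-frame queue can be sketched with a fixed-length deque that stores per-joint velocities, computed as frame-to-frame differences of the normalized keypoints. The class name, the zero velocity for the first frame, and the flattened output shape are assumptions for illustration; the Bidirectional LSTM that would consume the sequence is not shown. If every frame is processed at 30 FPS, 150 frames cover 5 seconds of motion.

```python
from collections import deque

WINDOW = 150  # frames in the temporal buffer, as stated above

class VelocityBuffer:
    """Sliding window of per-joint velocities (hypothetical helper)."""

    def __init__(self, window=WINDOW):
        self.frames = deque(maxlen=window)  # oldest frame is dropped automatically
        self._prev = None

    def push(self, keypoints):
        """Add one frame of normalized (x, y) keypoints as velocities."""
        if self._prev is None:
            velocity = [(0.0, 0.0)] * len(keypoints)  # no previous frame yet
        else:
            velocity = [(x - px, y - py)
                        for (x, y), (px, py) in zip(keypoints, self._prev)]
        self._prev = list(keypoints)
        self.frames.append(velocity)

    def ready(self):
        """True once the window holds a full sequence."""
        return len(self.frames) == self.frames.maxlen

    def sequence(self):
        """Flatten to rows of length 2 * joints, one row per frame."""
        return [[c for joint in frame for c in joint] for frame in self.frames]
```

A classifier would typically be called only when `ready()` returns True, so that it always sees a full-length sequence.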

Interactive GUI

Real-time Surveillance Monitor

The system features a lightweight Tkinter dashboard designed for operators. It overlays active tracking lines, weapon bounding boxes, and threat classifications (Safe vs. Danger) directly on the video feed.

Dynamic Frame Skipping: Adjust the processing load in real time by setting the frame-skip interval from 1 to 5 frames.
Automated Forensic Captures: Crops the suspect's face from the frame as soon as an alert triggers.
Alert Logging: Appends detailed telemetry logs for security review.
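
Frame skipping can be sketched as a simple modulo check on the frame index. This reads the 1 to 5 setting as "run inference on every Nth frame", which is an assumption about the control, and `should_process` is a hypothetical helper name.

```python
def should_process(frame_index, skip):
    """Return True when this frame should go through inference.

    skip=1 processes every frame; skip=5 processes every fifth frame.
    """
    skip = min(max(int(skip), 1), 5)  # clamp to the 1-5 range offered by the GUI
    return frame_index % skip == 0

# With skip=3, frames 0, 3, 6, and 9 out of the first ten are processed.
processed = [i for i in range(10) if should_process(i, skip=3)]
```

Skipped frames can still be displayed with the most recent overlay, so the video stays smooth while the models run less often.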

Academic Abstract

Research Foundation

"By separating visual features into parallel streams—object classification for weapons (YOLOv11 Knife) and skeletal keypoint estimation (YOLOv11-Pose)—the system isolates physical postures. These pose trajectories are normalized against perspective scaling and tracked across consecutive frames. The temporal sequences are then processed by a Bidirectional LSTM network."

Scientific Background

Thesis Paper Documentation

Author: Filippo Notari • Advisor: Prof. Francesco Santini • Università degli Studi di Perugia

The development of this real-time detection pipeline is backed by a structured academic thesis exploring human activity recognition (HAR), computer vision optimizations, and temporal modeling.

The documentation covers every section of the thesis: research methodology, the models compared (CNN-LSTM vs. 3D-CNNs), training hyperparameters, and experimental results (AUC and flicker rates).

Deploy the Surveillance Pipeline

Explore the installation guides, prerequisites, and code directories to start tracking and classifying stabbing motions.
