How the AI sees you,
frame by frame.
Gaara AI combines real-time computer vision with sequence-aware deep learning — the same techniques used in research labs, optimised to run live in your browser.
Camera to coaching cue in four stages.
MediaPipe Holistic.
Every frame is fed through Google's MediaPipe Holistic model, which extracts full-body landmarks in real time on commodity hardware.
The output is a complete biomechanical snapshot — body, hands, and face — ready to be consumed by the recognition model.
1,662 features per frame.
Landmarks are flattened into a single feature vector consumed by the LSTM. The dimensions add up as follows: pose contributes 33 landmarks × 4 values (x, y, z, visibility) = 132; the face mesh contributes 468 landmarks × 3 values = 1,404; and the two hands contribute 2 × 21 landmarks × 3 values = 126 — for a total of 1,662 features per frame.
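As an illustrative sketch, the flattening step can be written in a few lines of pure Python. The landmark counts match MediaPipe Holistic's documented output; the zero-filled input lists are placeholders standing in for real landmark coordinates:

```python
def flatten_frame(pose, face, left_hand, right_hand):
    """Concatenate all landmark values into one flat feature vector.

    pose:   33 landmarks x (x, y, z, visibility) ->  132 values
    face:  468 landmarks x (x, y, z)             -> 1404 values
    hands: 2 x 21 landmarks x (x, y, z)          ->  126 values
    """
    features = []
    for landmark_set in (pose, face, left_hand, right_hand):
        for landmark in landmark_set:
            features.extend(landmark)
    return features

# Zero-filled placeholders with the real landmark counts.
pose = [[0.0] * 4 for _ in range(33)]
face = [[0.0] * 3 for _ in range(468)]
left_hand = [[0.0] * 3 for _ in range(21)]
right_hand = [[0.0] * 3 for _ in range(21)]

vector = flatten_frame(pose, face, left_hand, right_hand)
# 33*4 + 468*3 + 2*21*3 = 132 + 1404 + 126 = 1662
```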
LSTM neural network.
A Long Short-Term Memory network processes sequences of 30 frames — capturing the temporal dynamics that distinguish a correct movement from a flawed one.
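To make the temporal step concrete, here is a minimal single-layer LSTM cell in pure Python — a sketch of the gating mechanism, not the production model (whose layer sizes and trained weights are not shown here; the toy dimensions and constant weights below are illustrative only):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Matrix-vector product over plain Python lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def lstm_step(x, h, c, params):
    """One LSTM time step: gates decide what to forget, store, and emit."""
    gates = {}
    for name in ("i", "f", "o", "g"):  # input, forget, output, candidate
        Wx, Wh, b = params[name]
        pre = [xi + hi + bi for xi, hi, bi in zip(matvec(Wx, x), matvec(Wh, h), b)]
        act = math.tanh if name == "g" else sigmoid
        gates[name] = [act(p) for p in pre]
    c_new = [f * cp + i * g
             for f, cp, i, g in zip(gates["f"], c, gates["i"], gates["g"])]
    h_new = [o * math.tanh(cc) for o, cc in zip(gates["o"], c_new)]
    return h_new, c_new

INPUT, HIDDEN = 4, 2  # toy sizes; the real model sees 1,662 inputs per frame
params = {name: ([[0.1] * INPUT for _ in range(HIDDEN)],   # input weights
                 [[0.1] * HIDDEN for _ in range(HIDDEN)],  # recurrent weights
                 [0.0] * HIDDEN)                           # bias
          for name in ("i", "f", "o", "g")}

h, c = [0.0] * HIDDEN, [0.0] * HIDDEN
for frame in ([0.5] * INPUT for _ in range(30)):  # a toy 30-frame sequence
    h, c = lstm_step(frame, h, c, params)
# h now summarises the entire 30-frame window in one hidden state
```

The key point the sketch shows: the hidden state carried across all 30 steps is what lets the model reason about motion over time, not just a single pose.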
Sliding window inference at 30fps.
Sliding window
The latest 30 frames are kept in a rolling buffer. Each new frame replaces the oldest, giving the model continuous context.
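A rolling buffer like this can be sketched with Python's `collections.deque` (the window size of 30 comes from the text; the integer "frames" are placeholders for real feature vectors):

```python
from collections import deque

# A deque with maxlen=30 drops the oldest frame automatically
# whenever a new one is appended.
window = deque(maxlen=30)

for frame_id in range(45):      # placeholder "frames" arriving one by one
    window.append(frame_id)
    if len(window) == 30:
        pass  # full 30-frame context available -> ready for inference

# After 45 frames, the window holds frames 15..44.
```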
Throttled inference
Predictions run every 250ms — fast enough to feel instant, throttled enough to avoid burning compute on every single frame.
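The throttle itself is just a timestamp check. A minimal sketch, with an injectable clock so the 30fps stream can be simulated (class and variable names are illustrative, not the production API):

```python
import time

class InferenceThrottle:
    """Allow at most one prediction per `interval` seconds."""

    def __init__(self, interval=0.25, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self.last = float("-inf")

    def ready(self):
        now = self.clock()
        if now - self.last >= self.interval:
            self.last = now
            return True
        return False

# Simulated 30fps stream: frames arrive every ~33ms, but only
# roughly every 8th frame triggers a prediction.
t = [0.0]
throttle = InferenceThrottle(clock=lambda: t[0])
fired = []
for frame in range(30):          # one second of video
    if throttle.ready():
        fired.append(frame)
    t[0] += 1 / 30
# fired -> [0, 8, 16, 24]: four predictions in that second
```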
Stable detection
A shot or pose is confirmed only after 3 consecutive predictions clear the confidence threshold — preventing flicker from noisy frames.
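The debounce can be sketched as a small counter over successive predictions. The 3-in-a-row requirement comes from the text; the 0.7 threshold and the labels in the example stream are hypothetical:

```python
class StableDetector:
    """Confirm a label only after it clears the confidence threshold
    on a required number of consecutive predictions."""

    def __init__(self, threshold=0.7, required=3):  # threshold is assumed
        self.threshold = threshold
        self.required = required
        self.last_label = None
        self.streak = 0

    def update(self, label, confidence):
        confident = confidence >= self.threshold
        if confident and label == self.last_label:
            self.streak += 1
        else:
            # Reset on a label change or a low-confidence frame.
            self.streak = 1 if confident else 0
            self.last_label = label if confident else None
        return self.streak >= self.required and label == self.last_label

detector = StableDetector()
stream = [("jab", 0.90), ("jab", 0.80), ("cross", 0.95),  # flicker resets streak
          ("jab", 0.90), ("jab", 0.85), ("jab", 0.90)]    # 3 in a row -> confirmed
results = [detector.update(label, conf) for label, conf in stream]
# results -> [False, False, False, False, False, True]
```

A single noisy "cross" in the middle of a run of "jab" predictions resets the streak, which is exactly the flicker this stage suppresses.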
What it's built on.
Want to license this stack?
We license our pose-recognition pipeline for custom sports and wellness products. Talk to us about your use case.
