Vision & HCIlive

Responsive Lamp

A gaze-tracked virtual desk lamp with object detection and memory.

6-DOF lamp · real-time gaze + object memory

The problem

Most CV demos show you bounding boxes and stop. This one closes the loop — gaze tells the lamp where you're looking, object detection tells it what's there, and a small store remembers what was on the desk so the lamp (and an LLM behind it) can answer questions about it later.

Who this is for

Anyone interviewing for HCI / spatial-computing / multimodal-agent roles, or curious how MediaPipe + YOLO + an LLM stitch together.

Architecture

MediaPipe iris landmarks: Real-time gaze direction from the webcam; drives where the virtual lamp points.
YOLOv8 detector: Object detection on the same camera frame; populates the spatial memory.
Spatial memory store: Detected objects with timestamps + last-seen positions; queryable by the LLM.
GPT-4o-mini answerer: Grounds spatial-memory queries ("where did I last see my keys?") against the local store instead of hallucinating.
Three.js lamp: 6-DOF lamp model in the browser; its target follows gaze via WebSocket.
WebSocket bridge: Backend (FastAPI) pushes gaze and detection events to the browser without polling.

Request / data flow

01Webcam frame → MediaPipe extracts iris landmarks → gaze vector.
02Same frame → YOLO produces detections → memory store updated.
03Gaze vector pushed over WS → Three.js lamp re-aims.
04User asks "what's near my mug?" → LLM reads memory store, answers with last-seen objects + positions.

Key decisions

Memory store, not raw frame logs.

whyThe LLM should reason over typed events (object, time, position), not over thousands of frames.

WebSocket end-to-end.

whyGaze at 30+ Hz over HTTP polling would either stutter or hammer the server.

GPT-4o-mini for query answering, not a larger model.

whyMemory queries are short and bounded; latency matters more than reasoning depth.

Stack

MediaPipeYOLOv8Three.jsGPT-4o-miniWebSocketFastAPI

If I rebuilt it

›Persist memory beyond process lifetime so "yesterday" queries work.
›Switch detection to a smaller-but-faster YOLO variant; the demo doesn't need 80-class COCO.

← All projects Contact me about this →