Responsive Lamp
A gaze-tracked virtual desk lamp with object detection and memory.
6-DOF lamp · real-time gaze + object memory
The problem
Most CV demos show you bounding boxes and stop. This one closes the loop — gaze tells the lamp where you're looking, object detection tells it what's there, and a small store remembers what was on the desk so the lamp (and an LLM behind it) can answer questions about it later.
Who this is for
Anyone interviewing for HCI / spatial-computing / multimodal-agent roles, or curious how MediaPipe + YOLO + an LLM stitch together.
Architecture
- MediaPipe iris landmarks
- Real-time gaze direction from the webcam; drives where the virtual lamp points.
- YOLOv8 detector
- Object detection on the same camera frame; populates the spatial memory.
- Spatial memory store
- Detected objects with timestamps + last-seen positions; queryable by the LLM.
- GPT-4o-mini answerer
- Grounds spatial-memory queries ("where did I last see my keys?") against the local store instead of hallucinating.
- Three.js lamp
- 6-DOF lamp model in the browser; its target follows gaze via WebSocket.
- WebSocket bridge
- Backend (FastAPI) pushes gaze and detection events to the browser without polling.
Request / data flow
- 01Webcam frame → MediaPipe extracts iris landmarks → gaze vector.
- 02Same frame → YOLO produces detections → memory store updated.
- 03Gaze vector pushed over WS → Three.js lamp re-aims.
- 04User asks "what's near my mug?" → LLM reads memory store, answers with last-seen objects + positions.
Key decisions
Memory store, not raw frame logs.
whyThe LLM should reason over typed events (object, time, position), not over thousands of frames.
WebSocket end-to-end.
whyGaze at 30+ Hz over HTTP polling would either stutter or hammer the server.
GPT-4o-mini for query answering, not a larger model.
whyMemory queries are short and bounded; latency matters more than reasoning depth.
Stack
If I rebuilt it
- ›Persist memory beyond process lifetime so "yesterday" queries work.
- ›Switch detection to a smaller-but-faster YOLO variant; the demo doesn't need 80-class COCO.