Autonomous Robots CS Stream
• Led the end-to-end design and implementation of a ROS2-based multimodal human–robot interaction system, decomposing it into modular nodes for facial recognition, wake-word detection, computer vision, and LLM-driven dialogue
• Engineered a 3D “pointing ray” interaction model using MediaPipe Hands and MiDaS depth estimation to identify and crop user-indicated objects in real time
• Integrated OpenAI Whisper for live speech-to-text transcription and GPT-4.1 for context-aware conversational response generation
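The pointing-ray bullet above can be sketched as pure geometry. This is a minimal illustration, not the project's actual code: it assumes MediaPipe Hands has already supplied 2D pixel landmarks for the index knuckle and fingertip, that MiDaS depth has been scaled to metric values at those pixels, and that the scene is approximated by a plane at a known depth (the real system would instead search the depth map along the ray). The intrinsics `FX, FY, CX, CY` are illustrative, not calibrated.

```python
import numpy as np

# Pinhole intrinsics (illustrative values, not a real calibration).
FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0

def backproject(u, v, z):
    """Lift a pixel (u, v) with metric depth z into camera-frame 3D."""
    return np.array([(u - CX) * z / FX, (v - CY) * z / FY, z])

def pointing_ray_target(knuckle_px, tip_px, knuckle_z, tip_z, plane_z):
    """Cast a ray from the index knuckle through the fingertip and
    intersect it with the plane z = plane_z; return the target pixel."""
    p0 = backproject(*knuckle_px, knuckle_z)
    p1 = backproject(*tip_px, tip_z)
    d = p1 - p0                       # ray direction in camera frame
    t = (plane_z - p0[2]) / d[2]      # ray parameter at the plane
    hit = p0 + t * d                  # 3D intersection point
    u = hit[0] * FX / plane_z + CX    # project back to pixel coords
    v = hit[1] * FY / plane_z + CY
    return u, v

def crop_box(u, v, half=64, w=640, h=480):
    """Axis-aligned crop around the target pixel, clamped to the frame."""
    x0 = int(max(0, u - half)); y0 = int(max(0, v - half))
    x1 = int(min(w, u + half)); y1 = int(min(h, v + half))
    return x0, y0, x1, y1
```

For example, a hand at half a metre pointing toward a surface two metres away yields a target pixel well to the side of the fingertip, and `crop_box` returns the region that would be handed to the vision node.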