Authors - Veeravalli Sri Satya, Anjan Babu G

Abstract - Human-computer interaction with smart consumer electronics still relies predominantly on physical peripherals, which are subject to hardware degradation, raise shared-surface hygiene concerns, and are impractical in hands-free environments. Voice-activated systems provide an alternative but exhibit high latency and degrade under ambient noise. This paper presents a multi-layered touchless gesture control framework that translates hand kinematics into direct system actuation. The architecture uses a standard web camera and the Google MediaPipe framework to extract 21 three-dimensional hand landmarks in real time. To avoid the computational cost of classifying gestures with traditional Convolutional Neural Networks (CNNs), the system employs a custom heuristic algorithm that classifies eight distinct static and dynamic gestures by analyzing the geometric relationships between finger joints. The framework processes these classifications locally and controls Android-based Smart TVs over Wi-Fi using the Android Debug Bridge (ADB) [11]. Evaluated in a controlled environment, the pipeline achieved an average processing time of 35 milliseconds per frame (approximately 30 frames per second) with a network transmission delay of 50 to 80 milliseconds. The results suggest that computationally lightweight computer vision models, paired with structured state-machine logic, can effectively replace physical remote controls without requiring dedicated GPU hardware.
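To make the idea of classifying gestures from joint geometry concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the standard MediaPipe Hands landmark numbering (0 for the wrist; 6/8, 10/12, 14/16, and 18/20 for the PIP joint and tip of the index, middle, ring, and pinky fingers) and treats a finger as extended when its tip lies farther from the wrist than its PIP joint. The gesture names, the gesture-to-key mapping, the device address, and the helpers `extended_fingers`, `classify`, `adb_command`, and `synthetic_hand` are all hypothetical and introduced only for illustration; the thumb and the dynamic gestures mentioned in the abstract are omitted for brevity.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple

Landmark = Tuple[float, float, float]  # (x, y, z) in normalized image coordinates

WRIST = 0
# (PIP joint, fingertip) indices in the MediaPipe Hands landmark numbering.
FINGER_JOINTS: Dict[str, Tuple[int, int]] = {
    "index": (6, 8),
    "middle": (10, 12),
    "ring": (14, 16),
    "pinky": (18, 20),
}

# Hypothetical gesture labels keyed by the set of extended fingers (assumption).
GESTURES: Dict[Tuple[str, ...], str] = {
    (): "fist",
    ("index",): "point",
    ("index", "middle"): "two_fingers",
    ("index", "middle", "ring", "pinky"): "open_palm",
}

# Hypothetical mapping from gesture to Android key event name (assumption).
KEYCODES: Dict[str, str] = {
    "open_palm": "KEYCODE_MEDIA_PLAY_PAUSE",
    "point": "KEYCODE_VOLUME_UP",
    "two_fingers": "KEYCODE_VOLUME_DOWN",
}


def extended_fingers(landmarks: Sequence[Landmark]) -> List[str]:
    """Return the fingers whose tip is farther from the wrist than the PIP joint."""
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")
    wrist = landmarks[WRIST]
    return [
        name
        for name, (pip, tip) in FINGER_JOINTS.items()
        if math.dist(landmarks[tip], wrist) > math.dist(landmarks[pip], wrist)
    ]


def classify(landmarks: Sequence[Landmark]) -> Optional[str]:
    """Map the extended-finger pattern to a gesture label, or None if unknown."""
    return GESTURES.get(tuple(extended_fingers(landmarks)))


def adb_command(gesture: Optional[str], device: str) -> Optional[List[str]]:
    """Build (but do not run) an `adb shell input keyevent` command for a gesture."""
    keycode = KEYCODES.get(gesture or "")
    if keycode is None:
        return None
    return ["adb", "-s", device, "shell", "input", "keyevent", keycode]


def synthetic_hand(raised: Set[str]) -> List[Landmark]:
    """Build a fake upright hand for demonstration; real input would come from MediaPipe."""
    points: List[Landmark] = [(0.5, 0.9, 0.0)] * 21  # every point starts at the wrist
    for name, (pip, tip) in FINGER_JOINTS.items():
        points[pip] = (0.5, 0.6, 0.0)
        # Raised fingers put the tip above the PIP joint; curled ones fold it back.
        points[tip] = (0.5, 0.4, 0.0) if name in raised else (0.5, 0.75, 0.0)
    return points


if __name__ == "__main__":
    hand = synthetic_hand({"index"})
    gesture = classify(hand)
    print(gesture, adb_command(gesture, "192.168.1.50:5555"))
```

Because the comparison uses distances from the wrist rather than raw image coordinates, the check does not depend on the hand being upright in the frame. In a real pipeline the resulting command list could be passed to `subprocess.run` after the TV has been paired with `adb connect`.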