HomeIO | Offline-based Voice-Assisted Home Automation System
Introduction:
The Internet of Things (IoT) has made it easier for people to operate and monitor their electrical and electronic products at home, allowing for a more comfortable lifestyle. However, many home automation systems rely on cloud-based services, which can be vulnerable to cyberattacks and require a stable internet connection. This can be a problem in developing nations where internet quality is low, and support for local languages is limited. To address these issues, our company has developed an offline-based voice-assisted home automation system called HomeIO.
System Architecture:
The system architecture of HomeIO consists of a central control point, which is the smart hub, and multiple smart home devices that can interact and respond to one another via the smart hub. The smart hub can be implemented using a device such as a Raspberry Pi 4, which has enough processing power and memory to run the necessary algorithms for the system. The smart hub includes a Voice Activity Detection (VAD) algorithm that continuously monitors the environment for speech signals and triggers the Wake Word Detection model if it finds any. Once the wake word is detected, the speech-to-text model is activated to convert the audio signals into sentences. The system then uses a Natural Language Understanding (NLU) Engine to extract the scenario, intent, and entity needed for the control system to decide on the appropriate action to take. The NLU model is usually implemented using pre-trained models such as Bidirectional Encoder Representations from Transformers (BERT), which can extract significant information from a sentence.
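The flow described above (VAD, then wake word detection, then speech-to-text, then NLU) can be sketched as a small gating loop. This is a minimal illustration, not HomeIO's actual code: the names `run_pipeline`, `Command`, and `demo`, and all four stage callables, are hypothetical stand-ins for the real models, and audio frames are represented as plain byte strings.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Command:
    """Structured NLU output: the fields named in the architecture."""
    scenario: str
    intent: str
    entity: str


def run_pipeline(
    frames: Iterable[bytes],
    is_speech: Callable[[bytes], bool],      # stand-in for the VAD
    is_wake_word: Callable[[bytes], bool],   # stand-in for wake word detection
    transcribe: Callable[[List[bytes]], str],          # stand-in for speech-to-text
    understand: Callable[[str], Optional[Command]],    # stand-in for the NLU engine
) -> List[Command]:
    """Gate each stage on the previous one and collect recognised commands."""
    commands: List[Command] = []
    awake = False
    utterance: List[bytes] = []

    def finish() -> None:
        # Transcribe the buffered speech, run NLU, then go back to sleep.
        nonlocal awake, utterance
        command = understand(transcribe(utterance))
        if command is not None:
            commands.append(command)
        awake, utterance = False, []

    for frame in frames:
        if is_speech(frame):
            if awake:
                utterance.append(frame)      # buffer the spoken command
            elif is_wake_word(frame):
                awake = True                 # later speech frames are the command
        elif utterance:
            finish()                         # silence ends the utterance
    if utterance:
        finish()                             # flush a trailing utterance
    return commands


def demo() -> List[Command]:
    """Toy run in which each 'frame' is simply one spoken word."""
    def toy_understand(text: str) -> Optional[Command]:
        if text == "turn on lamp":
            return Command("iot", "turn_on", "lamp")
        return None

    frames = [b"", b"hey", b"turn", b"on", b"lamp", b"", b"ignored"]
    return run_pipeline(
        frames,
        is_speech=lambda f: f != b"",
        is_wake_word=lambda f: f == b"hey",
        transcribe=lambda fs: b" ".join(fs).decode("utf-8"),
        understand=toy_understand,
    )
```

Passing the stages in as callables keeps the control flow separate from the models, so each model could be swapped or tested on its own. A real hub would also need timeouts and streaming audio buffers, which this sketch leaves out.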
System Implementation:
HomeIO's smart hub is implemented using a Raspberry Pi 4 with 4 GB of RAM. The hub uses the open-source Google WebRTC Voice Activity Detector for voice activity detection, a simple CNN-based neural network for speech-to-text conversion, and a pre-trained BERT model for natural language understanding. The smart plug socket is implemented using an ESP32 NodeMCU module and includes a relay control circuit and an energy measurement feature based on an HLW8012 breakout board. The system uses the MQTT protocol for two-way communication between the smart hub and the smart plug sockets. The smart hub also includes a dashboard for device management and monitoring, which can be accessed locally from any device on the same Wi-Fi mesh network.
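To illustrate the two-way MQTT exchange, the sketch below shows one possible topic layout and JSON payload format for relay commands (hub to plug) and energy telemetry (plug to hub). The topic names, payload fields, and function names are all assumptions made for illustration, since the source does not specify them. The code only builds and parses messages. An MQTT client library would do the actual publishing and subscribing.

```python
import json
from typing import Dict, Iterable

TOPIC_ROOT = "homeio"  # hypothetical topic prefix, not taken from the source


def command_topic(device_id: str) -> str:
    """Topic the hub publishes relay commands to (assumed layout)."""
    return f"{TOPIC_ROOT}/plug/{device_id}/set"


def telemetry_topic(device_id: str) -> str:
    """Topic the plug publishes power readings to (assumed layout)."""
    return f"{TOPIC_ROOT}/plug/{device_id}/telemetry"


def encode_relay_command(on: bool) -> bytes:
    """Serialise a relay command as a UTF-8 JSON payload."""
    return json.dumps({"relay": "on" if on else "off"}).encode("utf-8")


def decode_telemetry(payload: bytes) -> Dict[str, object]:
    """Parse a telemetry payload with assumed fields 'relay' and 'power_w'."""
    data = json.loads(payload.decode("utf-8"))
    return {"relay": str(data["relay"]), "power_w": float(data["power_w"])}


def energy_wh(power_samples_w: Iterable[float], interval_s: float) -> float:
    """Approximate energy in watt-hours from power samples taken at a fixed
    interval (simple rectangle rule: sum of P * dt, converted to hours)."""
    return sum(power_samples_w) * interval_s / 3600.0
```

For example, three readings of 60 W taken one minute apart give `energy_wh([60.0, 60.0, 60.0], interval_s=60)`, which is 3 Wh. Turning raw HLW8012 pulse output into watts requires device-specific calibration and is not covered here.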
Results and Discussion:
The automatic speech recognition system implemented on the Raspberry Pi 4 was able to detect English-language commands and carry out relay operations in near real time. The system's speech-to-text models achieved a lower Word Error Rate (WER) and a smaller model size than other commonly used models. The natural language understanding model reached an accuracy of 96% on a private dataset. However, further research is needed to test the system with speakers of varied ages and dialects. Additionally, the system could be improved by rewriting the scripts in C++, testing on lower-performance devices, and retraining the models for localization in other languages.
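For reference, WER is conventionally defined as the word-level edit distance between the reference transcript and the recognised text (substitutions, deletions, and insertions), divided by the number of reference words. The function below is a minimal sketch of that standard definition, not the evaluation script used for HomeIO. The function name is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed with a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

With the reference "turn on the lamp" and the hypothesis "turn on lamp", one word is deleted out of four, so the WER is 0.25. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.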
Conclusion:
Our company's offline-based voice-assisted home automation system, HomeIO, offers a secure and reliable solution for controlling and monitoring devices at home. The system's on-device speech-based user interface and device power and security management features make it a valuable addition to the smart home market, especially in areas with weak internet connectivity. With the ability to expand interoperability and create specialized mobile applications, HomeIO has the potential to become a leading home automation system for consumers. Additionally, by retraining models for localization in other languages, HomeIO can cater to a wider range of users, making it a versatile and adaptable solution for any household. Overall, HomeIO is a cost-effective, easy-to-use, and secure system that can enhance the convenience and comfort of any home.