1. Introduction
Interest in NLP research is growing in academia as several industries deploy virtual assistant solutions. [1] reports that 27% of people use Google Assistant, Alexa, Cortana, or Siri; the smartphone leads adoption at 85%, ahead of smart speakers, tablets, laptops, smart TVs, wearable technology, and home automation, and a remarkable 31% of cars nowadays have a virtual assistant. In the automotive industry, [2] pairs manufacturers with virtual assistants: Ford uses Alexa [3], Mercedes-Benz [4] and Hyundai [5] use Google Assistant, BMW [6] and Nissan [7] use Cortana, GM uses IBM Watson [8], Honda uses Hana [9], and Toyota uses YUI [10]. From academia and research communities, relevant patents indicate innovation trends: on the one hand, the Automotive Virtual Personal Assistant [11], a system that actively monitors the car state to provide relevant notifications; on the other hand, the Proactive Virtual Assistant [12], which evaluates user information to provide suggestions and perform actions in advance. Finally, at worldwide events such as the Consumer Electronics Show, carmakers present their technology and innovations, like the Mercedes-Benz User Experience Hyperscreen [13], which integrates a virtual assistant alongside multiple displays; through the “Hey Mercedes” command, car information is retrieved and, taking the GPS location into consideration, information about nearby restaurants, parking lots, and more is provided to the driver.
It is well known that open-source products offer low or no cost for usage and distribution depending on the license type, high quality, security, open access, and flexibility to modify their components; furthermore, collaboration and innovation flourish thanks to the support of development communities, and virtual assistant solutions are no exception to the rule.
Building on the above, this work presents the result of an investigation as a comparison table of relevant features of contemporary open-source virtual assistant solutions, and then details the components, algorithms, and methods of the chosen solution, Mycroft AI. Afterwards, the steps to create a Mycroft AI skill or application are presented, followed by the customization of intent, dialog, and entity files for the automotive instrument panel indicators seat belt, fuel level, and battery level. The main application design relies on dynamic behavior diagrams, a state machine and sequence diagrams, with the final goal of creating a base product: a voice communication module for automotive instrument panel indicators based on Mycroft AI. Finally, achievements, contributions, and future work are listed and discussed.
2. Background
A wide range of open-source tools and technologies related to virtual assistance is available in the market. The investigation covered sites such as makezine [14], where free and private voice assistants are compared based on open-source architecture components; medevel [15], in which the open-source technologies and platforms of popular voice assistants are analyzed; yourtechdiet [16], which lists the origin and up-to-date status of the best open-source voice assistants; and libhunt [17], which reports virtual assistant solutions’ popularity based on activity, commits on the corresponding repositories, and mentions from development communities. Based on this investigation, “Table 1. Contemporary open-source virtual assistant solutions” was created; the table compares different contemporary open-source virtual assistant solutions and their most relevant characteristics. Mycroft [18] stood out from the crowd because it is ready to deploy, well documented, simple to install on a Linux PC, and straightforward to execute.
Assistant | OS/HW | Prog. Language | Popularity | Internet required | Privacy | Customization | Documentation |
---|---|---|---|---|---|---|---|
Mycroft | Linux, RPI | Python, Bash | | | | | |
Leon | Windows, Mac | Node.js, Python, HTTP | | | | | |
Rhasspy | RPI | Docker, Python, Shell | | | | | |
Jasper | RPI | Python | | | | | |
Almond | Linux, Web | JavaScript | | | | | |
OpenAssistant | Windows, Linux, Mac | Own SDK | | | | | |
LinTO | RPI | Docker, Python, Bash, C++, Java | | | | | |
Aimybox | Android, iOS | Apache 2.0 | | | | | |
Kalliope | RPI, Linux, Android | Python, REST, Bash | | | | | |
2.1 Why use Mycroft?
Mycroft is presented in IEEE's Entrepreneurs in Consumer Electronics [19] as an open-source software platform that integrates technologies that have significantly improved in recent years, such as speech recognition, text to speech, and command processing; these technologies make it possible to add voice assistance powered by artificial intelligence to any application executed on laptops, speakers, Raspberry Pis, and cars.
A successful deployment of Mycroft is [20] in which an intelligent robot assistant is created to handle smart homes for the elderly.
Compared with other solutions, Mycroft [21] presents itself as:
Open source: The Mycroft code can be analyzed, copied, modified, and distributed; the same cannot be said of Alexa, Google Assistant, Cortana, or Siri, which are black boxes whose contents are hidden and protected by commercial licenses.
Respectful of users’ privacy: Voice recording works only if user grants permission.
Multiple hardware compliant: RPI, Android, Linux PC.
Light: Designed to be executed on low cost, low power, and low resources hardware.
Community oriented: Vibrant, committed, and helpful community.
2.2 Modular Mycroft
Mycroft implements the Voice Stack components [14] as part of an open-source virtual assistant architecture. These components can be configured, personalized, started, and stopped independently; their openness and flexibility are the main advantage of Mycroft over commercial and other open-source counterparts.
Wake Word Detection: “Hey Mycroft” is the default, and it can be customized through [22]. Because a new Wake Word can be configured simply through phonemes in a text configuration file based on the CMU pronouncing dictionary, it was decided to use PocketSphinx [23], part of CMUSphinx [24], originally based on the SPHINX-II [25] speech recognition system; this system achieves improved unified acoustic and language modeling through normalized feature representations, multiple-codebook semi-continuous hidden Markov models, between-word senones, and multi-pass search algorithms. The Precise software can be used to provide higher precision in Wake Word detection, at the expense of training a neural network on large audio sequences.
Speech To Text: Google STT [26] is the default engine; deep learning progress on voice transcription makes it possible for models such as LSTM RNNs [27] to perform remarkably well in the speech recognition domain and in the subsequent text transcription.
Intent Parser: Adapt [28] is the default software, developed by Mycroft AI to identify utterances or commands as machine-readable data structures after parsing the natural language text input. The Padatious [29] software can be used to provide higher precision in utterance detection, at the expense of training a neural network on the required phrases. A minimal Adapt intent-parsing sketch is provided after this list.
Text to Speech: Mimic [30] is the default software, developed by Mycroft AI together with VocaliD. Mimic is based on CMU's Flite, an open-source text to speech synthesis engine; voice synthesis is achieved by Classification and Regression Tree and Finite State Transducer algorithms. Google TTS [31] can also be used.
Mycroft Skills: Mycroft AI applications, e.g., timers, alarms, weather, time, and date; custom skills can be developed with Mycroft Skills Kit (MSK) support and the Python programming language.
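To make the Intent Parser stage concrete, the following is a minimal, self-contained sketch of how Adapt turns an utterance into a machine-readable data structure; it is illustrative only, and the entity values and intent name are assumptions rather than Mycroft's shipped skills.

```python
# Illustrative Adapt usage (assumed example, not taken from Mycroft's skills):
# entities are registered, an intent is built from required entity types, and
# the engine parses an utterance into a machine-readable intent dictionary.
from adapt.engine import IntentDeterminationEngine
from adapt.intent import IntentBuilder

engine = IntentDeterminationEngine()

# Vocabulary the parser should recognize (illustrative values).
for telltale in ("seat belt", "fuel level", "battery level"):
    engine.register_entity(telltale, "Telltale")
engine.register_entity("status", "StatusKeyword")

# An intent requires one Telltale entity and the keyword "status".
engine.register_intent_parser(
    IntentBuilder("TelltaleStatusIntent")
    .require("Telltale")
    .require("StatusKeyword")
    .build())

for intent in engine.determine_intent("what is the car seat belt status"):
    if intent and intent.get("confidence", 0) > 0:
        # e.g. {'intent_type': 'TelltaleStatusIntent', 'Telltale': 'seat belt', ...}
        print(intent)
```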
2.3 User Centered Design
Mycroft's design and development approach is driven by users' needs. This philosophy, known as User Centered Design (UCD), a concise version of Human Centered Design (HCD), revolves around observing and understanding users, and it assures useful, understandable, pleasant, and enjoyable products to interact with, essentially the final goal of Norman's User Experience [32].
Mycroft makes use of the Design Thinking [33] method, which synthesizes problem handling as follows:
This sequence follows Mycroft's application flow.
3. Methodology
After the hardware to work with is chosen, whether it is an RPI, an Android device, or a Linux PC, the steps in [18] must be followed to install and set up all the dependencies needed to successfully run Mycroft AI. Generic tests are provided to make sure the solution is properly installed and the dependencies are working well; this is especially relevant for the microphone and speaker peripherals and their drivers. In case of trouble, useful help is provided by Mycroft AI in a troubleshooting section [34].
The “Figure 2. Mycroft AI flow to create a Skill” shows the steps to create and modify the Skill to meet custom needs.
3.1 Mycroft Skill Kit
The Mycroft Skills Kit (MSK) utility is installed along with Mycroft to facilitate the creation, upload, and upgrade of skills in the corresponding local directories or repositories. mycroft-msk create is the console command used to run MSK and execute an interactive script that asks for information in order to generate a skill skeleton in the form of a template.
3.2 Creation of Automotive Telltales Skill
An automotive Instrument Panel (IP) presents different ECUs' signals to the driver visually or audibly. This work considers three telltales that are part of the safety-relevant telltales according to the National Highway Traffic Safety Administration (NHTSA) [35]: seat belt, fuel level, and battery level.
For example, for the seat belt telltale, applying the Design Thinking approach previously presented, we get the following output:
When seat belt status is queried and unfastened,
I want to be provided a suggestion to fasten the seat belt,
So, I am able to achieve a trustful trip.
“Table 2. Mycroft Skill Kit script for Skill Telltales” shows the mycroft-msk create command result for the seat belt telltale.
Script question | Answer given |
---|---|
Utterances | What is the car seat belt status |
Dialog | Car seat belt is fastened |
Short description | Telltale status / Indicator level retrieval |
Long description | Skill to retrieve instrument panel's current telltale status or indicator level |
Author | Hernandez Ricardo |
Category | IoT |
Tags | None |
Finally, MSK's script asks whether GitHub repositories should be created for the skill; this is required only if the skill will be published in the Mycroft marketplace. In our case, the MSK output is enough, namely the Skill Telltales located in the local directory /opt/mycroft/skills. “Table 3. Mycroft Skill Telltales output” shows the output files and directories; a sketch of a possible on-disk layout follows the table.
Item | Description |
---|---|
Locale | Directory containing files (intent and dialog) for every language supported, en-us for English as default. |
__init__.py | The skill's Python-based core: it imports libraries, defines a class that inherits from MycroftSkill to work with voice, defines its own methods to handle intents and dialogs, and creates the skill class instance for execution. |
README.md | The skill's human-readable information provided in MSK's script: short and long descriptions, author, and category. |
settingsmeta.yaml | Parameters of Mycroft's profile stored at https://sso.mycroft.ai/home.mycroft.ai, like date, time, time measured, location, voice type, etc. |
manifest.yml | External software dependencies if any. |
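Putting Table 3 together, a possible on-disk layout of the generated skill is sketched below; the folder name telltales-skill and the en-us file names are assumptions for illustration, not the published repository layout.

```
/opt/mycroft/skills/telltales-skill/
├── __init__.py
├── README.md
├── settingsmeta.yaml
├── manifest.yml
└── locale/
    └── en-us/
        ├── telltales.intent
        ├── telltales.dialog
        └── telltales.entity
```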
3.3 Intents, Dialogs and Entity files
Within the Locale directory are the intent file and the dialog file, which contain the phrases originated in MSK's script. Later, by manually editing this pair of files and following Mycroft's guidelines and rules, custom utterances and dialogs were added to them. Besides, the file telltales.entity was manually created to provide flexibility, a wildcard to handle different kinds of phrases for each telltale; the entities are used in both the intent and dialog files. The referenced files and their contents are summarized in “Table 4. Intent, Dialog and Entity files”; an illustrative sketch of such files is shown below. Two facts about the dialogs are worth noting: on the one hand, they are chosen randomly by Mycroft to give a more natural impression; on the other hand, in this work the dialogs are intentionally incomplete, so that each telltale's status or level is handled separately through Python code. The result is the user experience's feedback functionality in action.
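As an illustration of these files (a hedged sketch only; the published entries live in Table 4 and may differ), the entity, intent, and dialog files could contain lines such as the following.

telltales.entity:

```
seat belt
fuel level
battery level
```

telltales.intent, where the {telltales} placeholder expands to the entity entries:

```
what is the car {telltales} status
give me the car {telltales} status
car {telltales} status
```

telltales.dialog, intentionally incomplete as described above, with the status appended from the Python code:

```
Current car {telltales} is
Car {telltales} status is
```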
Furthermore, a sample of the user experience's feedforward functionality, that is, the helpful information provided to the user to decide what to do next, is exemplified as a simulated car trip in three different modes. Every trip mode represents a different level of feedforward functionality, detailed in “Table 5. Trip modes for feedforward functionality”.
Trip mode | System’s behavior description |
---|---|
Notification Mode | The telltale’s status or level is provided as a simple sentence and nothing more. The decision of what to do next relies completely on the user. |
Suggestion Mode | The telltale’s status or level is provided together with a context sentence about the car’s location. The decision of what to do next still relies on the user, but in this mode the decision is facilitated. |
Action Mode | The telltale’s status or level is provided together with a context sentence about the car’s location. The decision of what to do next relies completely on the assumed autonomous car. |
A simple state machine model was created to depict the available Trip Modes; it is presented in “Figure 3. Trip Modes state chart”, and a minimal Python sketch of this state chart follows the table below. To start a trip and enter one of the available Trip Modes, utterances are introduced by adding the intent files presented in “Table 6. Trip modes intent files”.
File | Content |
---|---|
trip_notification.intent | start notification trip |
trip_suggestion.intent | start suggestion trip |
trip_action.intent | start action trip |
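Since Figure 3 is not reproduced here, the sketch below encodes the same idea under simple assumptions: an Idle state, the three Trip Modes, and an assumed "end trip" event returning to Idle. The names are illustrative, not the paper's implementation.

```python
# Minimal sketch (assumed, not the paper's code) of the Trip Modes state chart:
# an Idle state from which each trip utterance enters one of the three modes,
# and an assumed "end trip" event that returns to Idle.
from enum import Enum, auto


class TripMode(Enum):
    IDLE = auto()
    NOTIFICATION = auto()
    SUGGESTION = auto()
    ACTION = auto()


# Allowed transitions: (current state, event) -> next state.
TRANSITIONS = {
    (TripMode.IDLE, "start notification trip"): TripMode.NOTIFICATION,
    (TripMode.IDLE, "start suggestion trip"): TripMode.SUGGESTION,
    (TripMode.IDLE, "start action trip"): TripMode.ACTION,
    (TripMode.NOTIFICATION, "end trip"): TripMode.IDLE,
    (TripMode.SUGGESTION, "end trip"): TripMode.IDLE,
    (TripMode.ACTION, "end trip"): TripMode.IDLE,
}


def next_mode(current: TripMode, event: str) -> TripMode:
    """Return the next Trip Mode, staying in the current one on unknown events."""
    return TRANSITIONS.get((current, event), current)


if __name__ == "__main__":
    mode = TripMode.IDLE
    mode = next_mode(mode, "start suggestion trip")
    print(mode)  # TripMode.SUGGESTION
```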
The simulated trip’s sequence of actions for each Trip Mode, taking into consideration utterances, dialogs, and simulated ECU messages, is shown in the corresponding sequence diagrams. In “Figure 4. Skill Telltales Notification Mode sequence diagram”, after the welcome message is provided by the voice communication module running inside a UX ECU, an utterance from the user is received to start a trip in Notification Mode; then the simulated ECUs (Seat Belt, Fuel Level, and Battery Level) provide their status, which finally triggers notification messages with predefined simple sentence dialogs. The decision of what to do next relies on the user.
In “Figure 5. Skill Telltales Suggestion Mode sequence diagram”, after the welcome message is provided by the voice communication module running inside a UX ECU, an utterance from the user is received to start a trip in Suggestion Mode; then the simulated ECUs (Seat Belt, Fuel Level, and Battery Level) provide their status, which finally triggers suggestion messages with predefined sentence dialogs based on context information about the car’s location. The decision of what to do next still relies on the user, but in this mode the decision is facilitated.
In “Figure 6. Skill Telltales Action Mode sequence diagram”, after the welcome message is provided by the voice communication module running inside a UX ECU, an utterance from the user is received to start a trip in Action Mode; then the simulated ECUs (Seat Belt, Fuel Level, and Battery Level) provide their status, which finally triggers action messages with predefined sentence dialogs based on context information about the car’s location. The decision of what to do next relies completely on the assumed autonomous car.
3.4 Skill Telltales class
The Mycroft-generated and later customized Skill Telltales Python class, through the methods handle_telltales, handle_notification, handle_suggestion, and handle_action, is mainly responsible for processing each telltale's status request and speaking a predefined output according to the dynamic behavior represented in “Figure 3. Trip Modes state chart”, “Figure 4. Skill Telltales Notification Mode sequence diagram”, “Figure 5. Skill Telltales Suggestion Mode sequence diagram”, and “Figure 6. Skill Telltales Action Mode sequence diagram”. A hedged skeleton of this class is sketched below.
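As a rough illustration only (the actual class is not reproduced in the paper; the intent file names reuse those introduced above, the fixed statuses mirror the test cases in “Table 7. Test cases for Skill Telltales”, and the spoken sentences are assumptions), a Padatious-based skeleton of such a class could look like this:

```python
# Minimal sketch (assumed, not the paper's published class) of the Skill
# Telltales skeleton: Padatious intent files trigger the handlers, fixed
# telltale statuses stand in for real ECU signals, and speak() produces
# the voice output.
from mycroft import MycroftSkill, intent_handler

# Fixed, simulated telltale statuses (placeholders for real ECU messages).
TELLTALE_STATUS = {
    "seat belt": "Unfastened",
    "fuel level": "Reserve",
    "battery level": "Low",
}


class TelltalesSkill(MycroftSkill):

    @intent_handler("telltales.intent")
    def handle_telltales(self, message):
        # e.g. "what is the car seat belt status"
        telltale = message.data.get("telltales", "seat belt")
        status = TELLTALE_STATUS.get(telltale, "unknown")
        # The paper completes an intentionally incomplete dialog from Python;
        # here the whole sentence is built in code for brevity.
        self.speak("Current car {} is {}".format(telltale, status))

    @intent_handler("trip_notification.intent")
    def handle_notification(self, message):
        # Notification Mode: plain status sentences, decision left to the user.
        for telltale, status in TELLTALE_STATUS.items():
            self.speak("Car {} is {}".format(telltale, status))

    @intent_handler("trip_suggestion.intent")
    def handle_suggestion(self, message):
        # Suggestion Mode: add (simulated) context about the car's location.
        self.speak("Fuel level is Reserve, there is a gas station nearby")

    @intent_handler("trip_action.intent")
    def handle_action(self, message):
        # Action Mode: the assumed autonomous car acts on the information.
        self.speak("Fuel level is Reserve, driving to the nearest gas station")


def create_skill():
    return TelltalesSkill()
```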
3.5 Test cases
To provide a real example of the Skill Telltales class in action, “Table 7. Test cases for Skill Telltales” summarizes meaningful test cases, that is, the input (called intent, utterance, or command) provided by a human by voice to Mycroft AI and the corresponding output (called dialog) provided by Mycroft AI to the human.
Intent (Human) | Dialog (Mycroft AI) |
---|---|
Car seat belt status | Current car seat belt is Unfastened |
What is the car fuel level status | Fuel level status is Reserve |
Give me the car battery level status | Car battery level status is Low |
Start Notification Trip | Notification messages from simulated trip in Notification Mode |
Start Suggestion Trip | Suggestion messages from simulated trip in Suggestion Mode |
Start Action Trip | Action messages from simulated trip in Action Mode |
4. Results and discussions
The main goals and results achieved in this work are listed as follows:
The investigation of voice-based virtual assistants reveals an increasing trend in the automotive industry, confirmed by their omnipresence in current and short-term future vehicles; this is sufficient proof of their relevance as a high-value asset that deserves attention.
The contemporary open-source virtual assistant solutions presented here as an investigation result evidence the worldwide motivation to work on and contribute to the improvement and refinement of NLP-related methods, algorithms, and techniques.
Mycroft AI proved to be the most complete open-source virtual assistant, based on the facts that installation and setup on a Linux PC are quite simple, that documentation and troubleshooting material is available and helpful, and that the deployment of voice applications or skills is straightforward with the aid of the MSK.
The skill Telltales, created by following the UCD approach and the Design Thinking method over three telltales (seat belt, fuel level, and battery level), represents a voice communication module for automotive instrument panel indicators with the following characteristics: 1) simple, because the skill output folder and dependency files are small and concrete; 2) portable, since it can be deployed on hardware such as an RPI, Android device, or Linux PC that has Mycroft AI instantiated; and 3) customizable, through the adaptation of the intent, dialog, and entity files as well as the Python-based class of the skill.
5. Conclusions
The main contribution of this work is to lay the foundations of the evolution of contemporary automotive Instrument Panels from a mere physical device to a comprehensive virtual device; that is, automobile users demand voice-based virtual assistance from future vehicles, which can readily be achieved by using Mycroft AI. Besides, the distinctive footprint of Mycroft AI is the deployment of the UCD approach and the Design Thinking method, which makes the difference against competitors.
For this work, the scope of the Instrument Panel indicators was reduced to only three safety-relevant telltales: seat belt, fuel level, and battery level; however, real Instrument Panel indicators are numerous, both safety relevant and non-safety relevant. A short list of intents and dialogs was introduced in the corresponding files to keep it simple, and a fixed status for each telltale was set in the corresponding Python class to conclude generic dialogs.
Regarding the applicability of this work's contribution, the automotive industry, whether big companies, mid-range companies, or start-ups, is the main interested party and beneficiary, since voice communication modules are fully compatible with any kind of HMI product inside the car. Because Mycroft AI is an open-source solution, its usage brings economic benefits along with legal responsibilities.
Future work relies on the vast possibilities for connecting and expanding the voice communication module presented in this work, through the usage of countless Python libraries to import any kind, format, or source of information, for example, interconnecting different systems through communication buses and technologies already available in the automotive domain such as CAN, LIN, Ethernet, MOST, and GPS location, among others, in order to obtain, process, and present relevant information to the final user.