Blog

Next Generation User Experience by Integrating Alexa

by Satya K Vivek | December 05, 2017

With real IoT implementations taking shape across market verticals such as smart homes and smart offices, researchers are exploring new input mechanisms to improve the user experience. Amazon's Alexa-controlled Echo speaker, Google Home, and Apple's Siri-based HomeKit are products that aim to improve the user experience by taking voice-based input. Voice-enabled digital assistant services and devices are becoming increasingly popular, and the experience of controlling devices with voice commands is now a significant part of the overall user experience.

In this blog we focus on integrating the Amazon Alexa-based Echo speaker into an IoT implementation to deliver a next-generation user experience. Amazon's Alexa-based Echo speaker, now in its second generation and with several derivative versions available, continues to expand its music, smart-home, and digital-assistant abilities. It is first a wireless speaker, but it is capable of much more. Using your voice, you can play music, search the web, create to-do and shopping lists, shop online, get instant weather reports, and control popular smart-home products, all while your smartphone stays in your pocket.

From market trends, it is becoming clear that voice interaction will soon be an expected offering, either as an alternative or as an add-on to traditional visual interfaces. See the diagram below for a prediction for this market sector.

[Chart: projected growth in users of voice-enabled digital assistants and voice-enabled speakers]

Alexa, an intelligent voice-enabled personal assistant developed by Amazon, currently leads this market sector.

The smart home device market is highly competitive, with many companies introducing feature-rich smart home devices into the consumer market. Providing additional capabilities such as voice-enabled assistance is therefore a must to compete with other products.

Integrate Smart Devices with Alexa

Alexa can be integrated into the smart device hardware itself, or it can be used as a separate service through the Alexa mobile app, an Amazon Echo speaker, etc. The following sections describe the architecture of both approaches.

External Alexa integration

Even if the smart device development is already complete, the device can still be made voice-enabled using external Alexa applications. The high-level architecture diagram is given below.

External Alexa integration

Built-In Alexa Integration

Amazon supports the Alexa SDK on various platforms such as Linux and Raspberry Pi. Smart devices can therefore have built-in Alexa support. The high-level architecture is given below.

Built-In Alexa Integration solution

The major components in the above architectures are:

Alexa Enabled Device 

The Alexa service is invoked with the wake word "Alexa", followed by the user's utterance. These devices continuously listen to speech, monitoring for the wake word, which is set to "Alexa" by default. They recognize natural, lifelike speech through sophisticated natural language processing (NLP) algorithms and integrate with the Alexa Skills Kit.

Alexa Skill Service

The Alexa skill analyzes the voice request, extracting the intent (with slots, if any), and passes them to a Lambda function, which invokes the right API service, extracts data, and prepares the response to return to Alexa so it can give a voice response to the user.

An Alexa skill consists of two main components: a skill adapter and a skill interface. The code is written in the skill adapter, and the skill interface is configured via Amazon's Alexa developer portal. The interaction between the code in the skill service and the configuration in the skill interface yields a working skill.

Skill Adapter

The skill service is where the business logic is implemented. It resides in the cloud and hosts code that receives JSON payloads from the Alexa Voice Service and determines what action to take in response to the user's speech. The skill service layer manages HTTP requests, user account information, session handling, and database access; all of this behavior is the concern of the skill service.
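As an illustration, a minimal skill adapter can be sketched as an AWS Lambda handler in Python. The intent name "TurnOnDeviceIntent" and the slot "Device" are hypothetical, and the real smart-device API call is stubbed out with a comment:

```python
# Minimal AWS Lambda handler sketch for a hypothetical custom skill.
# The intent name "TurnOnDeviceIntent" and slot "Device" are illustrative only.

def build_response(speech_text, end_session=True):
    """Wrap plain text in the JSON envelope Alexa expects back."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": end_session,
        },
    }

def lambda_handler(event, context):
    request = event["request"]
    if request["type"] == "LaunchRequest":
        return build_response("Welcome to the smart hub.", end_session=False)
    if request["type"] == "IntentRequest":
        intent = request["intent"]
        if intent["name"] == "TurnOnDeviceIntent":
            # Slot values arrive alongside the intent; fall back if missing.
            device = intent.get("slots", {}).get("Device", {}).get("value", "device")
            # Here the skill service would call the smart-device cloud API.
            return build_response(f"Turning on the {device}.")
    return build_response("Sorry, I didn't understand that.")
```

The handler inspects the request type first, so launch requests, intent requests, and unrecognized input each get a distinct spoken response.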

Skill Interface

The interface configuration is the second part of creating a skill; here we specify the utterances, which are responsible for processing the user's spoken words. The interaction model resolves spoken words to specific intent events. The skill interface is configured in the Alexa developer portal.
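To make the interaction model concrete, the sketch below builds a small model as a Python dictionary and prints it as the kind of JSON the developer portal accepts. The invocation name, intent, slot type, and sample utterances are all hypothetical:

```python
import json

# Hypothetical interaction model for a smart-hub skill; the real model is
# entered as JSON in the Alexa developer portal's skill interface.
interaction_model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "smart hub",
            "intents": [
                {
                    "name": "TurnOnDeviceIntent",
                    "slots": [{"name": "Device", "type": "DeviceType"}],
                    "samples": [
                        "turn on the {Device}",
                        "switch on the {Device}",
                    ],
                },
            ],
            "types": [
                {
                    "name": "DeviceType",
                    "values": [
                        {"name": {"value": "kitchen light"}},
                        {"name": {"value": "thermostat"}},
                    ],
                },
            ],
        }
    }
}

print(json.dumps(interaction_model, indent=2))
```

Note how the sample utterances reference the slot by name in curly braces; this is how the interaction model ties spoken phrases to intent events.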

Alexa Skill Kit

Amazon provides the Alexa Skills Kit (ASK), a collection of self-service APIs, tools, documentation, and code samples that makes it fast and easy to add skills to Alexa. The Alexa Skills Kit triggers the skill adapter's Lambda function, where the business logic is written.

Alexa Voice Service

The Alexa Voice Service (AVS) is used to voice-enable smart devices that have a microphone and speaker. Once integrated, the device has access to the built-in capabilities of Alexa. The voice service has an audio signal processor that receives the voice and cleans it up for further processing. The data is then split into two sections: the first section is the wake word ("Alexa"); the second section is passed to the audio input processor, which handles the audio input sent to AVS.
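As a toy illustration of that split, assume the recognizer has already produced a text transcript; the wake word can then be separated from the command that follows it. This is a sketch only; real devices perform wake-word detection on the audio stream itself, not on text:

```python
WAKE_WORD = "alexa"

def split_utterance(transcript):
    """Separate the wake word from the command that follows it.

    Returns the command text if the transcript begins with the wake
    word, or None if the device should keep listening.
    """
    words = transcript.lower().strip().split()
    if words and words[0] == WAKE_WORD:
        return " ".join(words[1:])
    return None  # wake word not heard; ignore this audio
```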

Smart Device Cloud Server

The smart device cloud server interfaces with the smart devices. The cloud server receives requests from Alexa and forwards the commands to the IoT devices.
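A minimal sketch of that relay logic is shown below. The in-memory device registry and command names are hypothetical; a production cloud would publish the command to the hub over MQTT or a similar transport instead of mutating local state:

```python
# Hypothetical in-memory device registry standing in for the real
# smart-device cloud's database of registered devices.
DEVICE_REGISTRY = {
    "kitchen-light": {"type": "light", "state": "off"},
}

def handle_alexa_command(device_id, command):
    """Validate a command from the skill service and forward it to the device."""
    device = DEVICE_REGISTRY.get(device_id)
    if device is None:
        return {"status": "error", "reason": "unknown device"}
    if command in ("turn_on", "turn_off"):
        # In production: publish the command to the hub, await an ack.
        device["state"] = "on" if command == "turn_on" else "off"
        return {"status": "ok", "state": device["state"]}
    return {"status": "error", "reason": "unsupported command"}
```

Returning an explicit error for unknown devices or commands lets the skill service turn failures into a helpful spoken response rather than silence.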

Create Skills for Your Smart Device

The first step in building a new skill is to decide what the skill will do. The functionality of the skill determines how it integrates with Alexa and what we need to build. The Alexa Skills Kit supports building the different types of skills listed below:

  • Custom Skills
  • Smart Home Skills
  • Flash Briefing Skills
  • Video Skills
  • List Skills

Smart Home skills and custom skills are the most useful for controlling smart devices. A custom skill can handle any kind of request, as long as there is a proper interaction model and code to fulfill it. The Smart Home Skill API, by contrast, is Amazon's API for controlling smart home devices such as cameras, lights, locks, thermostats, and smart TVs; it simplifies development because a prebuilt voice interaction model is available.

Custom skills provide the best user experience since we can define all possible utterances.

Comparison Between Alexa Custom Skills and Alexa Smart Home Skills

  • Ease of use: Smart Home skills are easy for customers to use; custom skills are the most flexible but also the most complex for customers.
  • Invocation name: A Smart Home skill needs no invocation name; customers can simply say "Alexa, turn on living room lights". A custom skill requires an invocation name that customers must remember, e.g., "Alexa, ask Uber to request a ride".
  • Interaction model: Smart Home skills use Amazon's built-in interaction model; for custom skills we design our own.
  • Device support: Only a limited set of device categories (thermostats, lights, etc.) can be linked to a Smart Home skill; custom sensors or devices such as valves and security devices cannot be integrated. A custom skill can integrate any device.
  • Hosting: A Smart Home skill must use an AWS Lambda function; a custom skill can use a Lambda function or any other server.
  • Scope: Smart Home skills are for cloud-connected smart home devices; custom skills can be used for any purpose.
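Unlike a custom skill, a Smart Home skill receives prebuilt directives rather than custom intents. The sketch below handles a power-control directive; the payload is a simplified version of the Smart Home API's directive envelope, reduced to the fields this sketch actually uses:

```python
# Simplified handler sketch for an Alexa Smart Home power-control directive.
# The envelope shape is reduced here to the fields the sketch uses.

def handle_directive(event, context):
    header = event["directive"]["header"]
    endpoint_id = event["directive"]["endpoint"]["endpointId"]
    if header["namespace"] == "Alexa.PowerController":
        power_state = "ON" if header["name"] == "TurnOn" else "OFF"
        # Here the handler would call the smart-device cloud to switch the
        # device, then report the resulting state back to Alexa.
        return {
            "context": {
                "properties": [{
                    "namespace": "Alexa.PowerController",
                    "name": "powerState",
                    "value": power_state,
                }]
            },
            "event": {
                "header": {"namespace": "Alexa", "name": "Response"},
                "endpoint": {"endpointId": endpoint_id},
            },
        }
    raise ValueError("unsupported directive")
```

Because the directive format and the "turn on" phrasing are defined by Amazon, no custom interaction model is needed; the trade-off is that only the supported device categories and operations are available.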

Sample Use Case

The process of integrating Alexa on a hub that controls Z-Wave based devices (valves, thermostats, etc.) is given below. It can be divided into two main activities.

Create a Skill in Amazon Alexa and link the skill with Smart Device’s cloud

Amazon Alexa skill with Smart Device’s cloud

Once the skill is created, hub users can add the new skill and link their hub on the Alexa device. Amazon handles the voice conversion and invokes the correct interaction model for further processing.

Enable the Smart Device Cloud for Alexa support

Alexa supports OAuth 2.0 authentication for validating smart device users, so the smart device cloud must support OAuth 2.0. The smart device cloud should also support the operations invoked by Alexa and should return proper response messages to Alexa for a good user experience.
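On the cloud side, each request from Alexa carries the access token obtained during account linking, and the server must map it back to a user. The sketch below uses a hypothetical in-memory token store; a real server would validate the token against its OAuth 2.0 authorization server:

```python
# Hypothetical token store; a real deployment would verify the bearer
# token with its OAuth 2.0 authorization server instead.
VALID_TOKENS = {"token-abc123": "user-42"}

def authenticate_request(headers):
    """Resolve the user behind an Alexa request via its bearer token."""
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return None
    token = auth[len("Bearer "):]
    return VALID_TOKENS.get(token)  # user id, or None if the token is invalid
```

Requests that fail this check should produce a clear "please relink your account" style response rather than a raw error, in line with the user-experience guidance below.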

Best Practices to Create a Skill with the Best User Experience

Some of the best practices and tips for developing an Alexa skill are listed below:

  • Provide the best user experience: User experience matters a lot. Users should be able to converse with Alexa just as they would with another human, and the voice interaction model should be designed to support this. Avoid lengthy, complex, or repetitive invocations.
  • Create lots of sample utterances: Design the skill with many utterances so that Alexa understands and acts on most user requests. Create multiple utterances for the same request, e.g., "Switch on the kitchen light", "Turn on the kitchen light", etc.
  • Provide proper responses: Responses should be short and clear. The skill should provide adequate error handling for unexpected or unsupported utterances; the end user should never be directly exposed to errors. The skill should offer "help" for unsupported utterances.
  • Include re-prompt messages: There is always a chance the user did not hear or understand the question, so always include re-prompt messages.
  • Test with a real voice and an Alexa device: The Alexa voice simulator is a good testing tool, but it does not fully reflect real use cases.
  • Start simple, then fine-tune: Start with the primary use cases and add more capabilities later. Let the skill improve along the way, i.e., update it based on commonly used utterances.

Conclusion

Alexa is the most popular voice-enabled assistant, but it is not the only one. Gadgeon can integrate more services such as Google Home, Apple HomeKit, and IFTTT to make your smart device smarter. Gadgeon, an IoT software development company with years of experience in home automation and IoT, can help navigate the challenges and deliver the most appropriate user experience for voice-enabled solutions.

