How to use the Jargon SDK to manage voice content
The Jargon SDK helps you manage your voice content through several best practices, including handling patterns like singular and plural phrases (“you have one coin” vs. “you have two coins”), built-in variation of voice output, and built-in responses to common skill use cases. The SDK is also a prerequisite for Jargon’s localization services, but it is worth looking into even if you don’t plan to release in multiple languages. In this post, I’ll go into detail about how I incorporated the Jargon SDK into one of my Alexa skills, Slot Machine. While my focus is on how to migrate an existing skill, you can apply these same principles if you are authoring a new skill from scratch with Jargon. If you would like to see an in-depth guide to using the SDK, including code samples, visit my Medium post.
To get started using the Jargon SDK, you’ll first need to extract your resources into a separate JSON file. The format of this file is straightforward: it is essentially a set of key/value pairs whose values can be basic strings, variations, or objects. The nuances come as you work to extract the resources from your code. You can also provide named parameters or plural forms (based on the ICU MessageFormat standard for internationalization), which allow you to do substitutions. Plural forms provide a way to avoid awkward responses like “you have 1 items,” and they also let you tailor the response to the number of items a customer has. For example, if a customer has 0 items, you can respond with “you have no items, would you like to learn how to get some?” without having to make a code change.
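To make the plural behavior concrete, here is a minimal sketch of how an ICU-style plural entry resolves to different responses. The key name ITEM_COUNT, the parameter name count, and the formatPlural helper are all assumptions made up for illustration. The helper is a simplified stand-in for a real ICU formatter, not the SDK's implementation, and it only knows English plural rules.

```typescript
// Hypothetical resource entry, shaped like a key/value pair in a resources
// JSON file. The key and parameter names are invented for this example.
const resources = {
  ITEM_COUNT:
    "{count, plural, " +
    "=0 {You have no items, would you like to learn how to get some?} " +
    "one {You have one item.} " +
    "other {You have # items.}}",
};

// Simplified stand-in for an ICU plural formatter (NOT the SDK's code).
// It supports a single, non-nested plural argument and English rules only.
function formatPlural(message: string, count: number): string {
  // Collect each "selector {text}" option, e.g. "=0 {...}" or "one {...}".
  const forms: { [selector: string]: string } = {};
  const option = /(=\d+|\w+)\s*\{([^{}]*)\}/g;
  let match: RegExpExecArray | null;
  while ((match = option.exec(message)) !== null) {
    forms[match[1]] = match[2];
  }

  // An exact match ("=0") wins, then the plural category, then "other".
  const candidates = ["=" + count, count === 1 ? "one" : "other", "other"];
  for (const selector of candidates) {
    const text = forms[selector];
    if (text !== undefined) {
      // "#" is the ICU placeholder for the number itself.
      return text.replace(/#/g, String(count));
    }
  }
  return message; // No plural options found; return the input unchanged.
}

console.log(formatPlural(resources.ITEM_COUNT, 0));
console.log(formatPlural(resources.ITEM_COUNT, 1));
console.log(formatPlural(resources.ITEM_COUNT, 2));
```

The point of the sketch is that the zero-item response lives entirely in the content: adding or rewording the “=0” branch changes what the customer hears without touching the calling code.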
A commonly accepted best practice in voice is to add variety to the output from your Alexa skill. Jargon allows you to do this by including variations within an object with fields named “v1,” “v2,” etc. When you ask Jargon to load the resource, it will randomly select one of these variations. The best part is that this behavior is driven by the content, so no code changes are needed to select a variation. The Jargon documentation notes that more complex methods than random selection will be coming in future versions of the SDK, hinting at the possibility of personalization or A/B testing to intelligently help developers manage content choices. Jargon can also tell you which variation was selected, although in general there are very few use cases where you’ll need this. Personally, I only do this for upsell of in-skill purchases, because I want to track how effective different variants are in completing a sale.
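As a rough illustration of the variation mechanism, the sketch below picks one of the “v1,” “v2” fields at random and reports which one it chose, so the choice can be logged. The resource keys, the sample strings, and the selectVariation helper are hypothetical. This mimics the behavior described above and is not the SDK's own API.

```typescript
// A resource value is either a plain string or an object of variations.
type Variations = { [variant: string]: string };

// Hypothetical entries; the keys and wording are invented for this example.
const resources: { [key: string]: string | Variations } = {
  GOODBYE: "Thanks for playing!",
  UPSELL_PROMPT: {
    v1: "Want to try a new machine? I can tell you more.",
    v2: "There is a premium machine available. Would you like to hear about it?",
  },
};

interface Selection {
  text: string;
  variation: string | null; // null when the entry has no variations
}

// Illustrative helper (not an SDK function). The random source is injectable
// so the selection can be made deterministic in tests.
function selectVariation(
  entry: string | Variations,
  random: () => number = Math.random
): Selection {
  if (typeof entry === "string") {
    return { text: entry, variation: null };
  }
  const names = Object.keys(entry).sort();
  if (names.length === 0) {
    throw new Error("Variation object has no entries");
  }
  const index = Math.min(names.length - 1, Math.floor(random() * names.length));
  const name = names[index];
  return { text: entry[name], variation: name };
}

const chosen = selectVariation(resources["UPSELL_PROMPT"]);
// Recording chosen.variation alongside the purchase result is one way to
// compare how well each upsell variant converts.
console.log(chosen.variation, chosen.text);
```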
One of the sins of localization that I was committing in my skills was building strings from fragments (don’t tell me you don’t do this too!). Often, there are multiple things that can happen in an invocation and it can cause several strings to be put together. For example, in my Slot Machine skill, when a user says “Spin,” I would reiterate how much they bet, which symbols came up, whether they won or lost, and finally some special logic based on whether they lost their full bankroll or not.
The best way to solve this is to put each path as a separate entry in your resources file, where feasible. This allows you to tailor a response to a specific situation, provide variations specific to that condition, and produce the most natural translations for native speakers. I did this by naming my keys based on the path that a user takes through my skill. In the spin example, one of my keys is named SPIN_BET_WIN (which is used when a user invokes the Spin intent while betting a different number of coins, and then wins). I can then tailor fun response variants specific to this case, which makes the game more engaging. The downside is that this forced me to create over 80 different resources, but I appreciate the variety that I am now able to provide between different use cases.
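One way to picture path-based key naming is a small function that maps the outcome of a spin to a single resource key instead of assembling fragments. Only SPIN_BET_WIN comes from my skill as described above. The other key names, the SpinOutcome fields, and the spinResourceKey helper are assumptions made up for this sketch.

```typescript
// Hypothetical description of what happened during one Spin request.
interface SpinOutcome {
  changedBet: boolean;    // the user bet a different number of coins
  won: boolean;           // the spin paid out
  bankrollEmpty: boolean; // the user lost their full bankroll
}

// Builds one key per path through the intent, e.g. SPIN_BET_WIN.
// Each key then maps to a complete, self-contained response in the
// resources file rather than a sentence stitched together in code.
function spinResourceKey(outcome: SpinOutcome): string {
  const parts = ["SPIN"];
  if (outcome.changedBet) {
    parts.push("BET");
  }
  if (outcome.won) {
    parts.push("WIN");
  } else if (outcome.bankrollEmpty) {
    parts.push("BUSTED"); // invented name for the "lost everything" path
  } else {
    parts.push("LOSE");
  }
  return parts.join("_");
}

console.log(spinResourceKey({ changedBet: true, won: true, bankrollEmpty: false }));
```

Every additional condition multiplies the number of keys, which is how a single intent can grow to dozens of entries. The trade-off is more content to maintain in exchange for responses that are whole sentences a translator can work with.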
As an alternative, v1.1 of the Jargon SDK introduced a feature where you can merge multiple strings together in your response. I would only recommend doing this in cases where your response consists of logically independent content and isn’t on the common interaction path of your skill. It is easier to migrate to Jargon with this approach; however, you will miss out on opportunities to provide richer responses and variants for specific use cases that you might be overlooking.
An example where I have used this is in a leaderboard feature. I recently added the ability for customers to give my skill permission to use their first name, as registered with their Amazon account, so that they appear on the leaderboard by name. But I first need to prompt the user to make them aware of this feature, and then read the leaderboard. Because these are distinct concepts in a lesser-used flow and don’t really lend themselves to natural speech variations, I implemented this feature by concatenating strings.
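The leaderboard case can be sketched as two independent resources joined into one response. This is a plain illustration of the idea rather than the SDK's merge feature itself. The key names, the sample wording, and the leaderboardSpeech helper are hypothetical.

```typescript
// Hypothetical resources for two logically independent pieces of speech.
const resources = {
  LEADERBOARD_NAME_PROMPT:
    "You can now appear on the leaderboard by name. Check the Alexa app to grant permission.",
  LEADERBOARD_READ: "You are currently ranked number {rank}.",
};

// Illustrative helper: prepend the one-time prompt only when the user has
// not heard it yet, then always read the leaderboard position.
function leaderboardSpeech(alreadyPrompted: boolean, rank: number): string {
  const parts: string[] = [];
  if (!alreadyPrompted) {
    parts.push(resources.LEADERBOARD_NAME_PROMPT);
  }
  parts.push(resources.LEADERBOARD_READ.replace("{rank}", String(rank)));
  // Each part is a complete sentence, so joining them does not create the
  // fragment problem that makes translation awkward.
  return parts.join(" ");
}

console.log(leaderboardSpeech(false, 12));
```

This works here because each piece stands alone as a full sentence. The same approach applied to partial phrases would bring back the fragment problem.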
As part of your response to Alexa, you can provide a set of directives which can be used for in-skill purchases, screen display, controlling gadgets, or progressive results. Sometimes, you’ll want to include localizable content within these responses, for example an upsell message for premium content. Jargon allows you to either render individual strings to place into the directive within your code, or to place the entire directive object into your resources file which you can manage and load from the content file. Personally, I prefer the second approach as it keeps the code cleaner, but either approach works fine.
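To show what keeping a whole directive in the content file might look like, here is a sketch that stores an upsell directive as a resource object and fills in its parameters by walking the object. The renderObject helper, the productName parameter, and the placeholder product ID are assumptions for illustration. The helper imitates the idea of rendering an object resource and is not the SDK's implementation.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Hypothetical resource: an in-skill purchase upsell directive stored as
// content. The product ID is a placeholder, not a real product.
const upsellDirective: Json = {
  type: "Connections.SendRequest",
  name: "Upsell",
  payload: {
    InSkillProduct: { productId: "amzn1.adg.product.EXAMPLE" },
    upsellMessage: "You can unlock the {productName}. Want to learn more?",
  },
  token: "correlationToken",
};

// Illustrative helper (not an SDK function): recursively replaces {name}
// placeholders in every string found inside the object.
function renderObject(value: Json, params: { [name: string]: string }): Json {
  if (typeof value === "string") {
    return value.replace(/\{(\w+)\}/g, (match: string, name: string) => {
      const replacement = params[name];
      return replacement !== undefined ? replacement : match;
    });
  }
  if (Array.isArray(value)) {
    return value.map((item) => renderObject(item, params));
  }
  if (value !== null && typeof value === "object") {
    const source = value;
    const result: { [key: string]: Json } = {};
    for (const key of Object.keys(source)) {
      const child = source[key];
      if (child !== undefined) {
        result[key] = renderObject(child, params);
      }
    }
    return result;
  }
  return value; // numbers, booleans, and null pass through unchanged
}

const rendered = renderObject(upsellDirective, { productName: "Wild West machine" });
console.log(JSON.stringify(rendered, null, 2));
```

With the directive in the resources file, the handler only supplies parameters and attaches the rendered object to the response, which is what keeps the code cleaner.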
While it can take some effort to refactor your code and resources to use with the Jargon SDK, the effort is worth it. I’m enjoying the ease with which I can now add variations to my skill output, and it’s been great being able to interface with Jargon to localize and launch my skills in new markets.
About Garrett Vargas: Garrett Vargas is CTO at CarRentals.com, part of the Expedia Group. He has over 20 years of engineering leadership experience in technology. Over the past few years, he has published more than a dozen Alexa skills ranging from casual games to a skill for CarRentals.com customers to manage their bookings. For more voice related posts by Garrett Vargas, visit his blog at medium.com/@garrettvargas
This post was originally posted by Garrett Vargas on Medium.
Looking to get started with the Jargon SDK? Check out these additional resources below:
Get Started with Jargon
Jargon SDK Starter Templates