Language understanding is hard. This is perhaps the clearest lesson from the past few years of intensified efforts around the world to build conversational applications.
It’s no wonder that in that historic document from 1955, introducing the first gathering on artificial intelligence, language was one of the first challenges mentioned.
“An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves.”
McCarthy et al., 1955
A little more than 65 years later, where are we? Can we usefully interact with machines using natural language today? If so, how should we think about it, and how should we design dialogical systems?
To begin answering, we need to distinguish between general understanding, which supports open-ended conversations about any topic, and focussed conversations with a specific goal.
Open-ended conversations are very much not a solved problem (whether it is even a solvable problem is still being debated). We do have sophisticated systems such as GPT-3 from OpenAI or LaMDA from Google that can sustain convincing conversations with a human on a broad range of topics. However, such systems are not designed to help you achieve a specific goal, or even to have a sense of what such a goal should be.
Conversational applications, on the other hand, exist to help people achieve an end goal. For conversational applications that problem is far more tractable. We can build useful conversational applications today and we believe that with OpenDialog we are making it simpler, more powerful and more accessible than ever for anyone.
At OpenDialog we believe that the path forward for building useful conversational applications today is to take a pragmatic, context-first approach to conversation design that empowers designers and developers to make the best use of all the tools at hand. We start from conversational context, pass through fine-grained interpreter management and tie everything together with a proactive conversation engine that attempts to exploit both context and NLU interpretation as beneficially as possible.
1. Capture high-level context through conversation design
OpenDialog is an opinionated model of how conversational applications should be built. The approach enables designers to define the main aspects of a conversation in terms of scenarios, conversations, scenes and turns, and to define the relationships between them. The purpose of capturing this high-level context is to reduce reliance on non-deterministic natural language understanding systems to keep track of context. The context we are in at any given moment provides a shortcut to understanding which interactions are useful and how our conversational application can coordinate and collaborate with the user to reach the end goal. It allows us to design conversations that are very focussed without feeling restrained, and makes it possible to help the application recover in ways that are contextually relevant. Take a look at our conversational patterns for indications of where this approach can lead.
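To make the idea concrete, here is a minimal sketch of how such a hierarchy could be modelled and used to narrow down what the application needs to listen for. It is written in Python purely for illustration. The class names mirror the terms above, but every field, function and intent name (such as `intents_in_scope` or `provide_date`) is a hypothetical choice for this example and not OpenDialog's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    name: str
    expected_intents: list[str]  # intents the user may plausibly express here


@dataclass
class Scene:
    name: str
    turns: list[Turn] = field(default_factory=list)


@dataclass
class Conversation:
    name: str
    scenes: list[Scene] = field(default_factory=list)


@dataclass
class Scenario:
    name: str
    conversations: list[Conversation] = field(default_factory=list)


def intents_in_scope(scene: Scene) -> set[str]:
    """Collect only the intents that are anticipated in the given scene."""
    return {intent for turn in scene.turns for intent in turn.expected_intents}


# Hypothetical example: a restaurant booking assistant.
booking = Scenario(
    name="restaurant_booking",
    conversations=[
        Conversation(
            name="make_reservation",
            scenes=[
                Scene(
                    name="collect_details",
                    turns=[
                        Turn("ask_party_size", ["provide_number", "ask_for_help"]),
                        Turn("ask_date", ["provide_date", "ask_for_help"]),
                    ],
                )
            ],
        )
    ],
)

current_scene = booking.conversations[0].scenes[0]
print(sorted(intents_in_scope(current_scene)))
```

Because the scene already tells us that only three intents are in play, whatever interprets the user input has a much smaller problem to solve than it would without that context.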
2. Fine-grained NLU interpreter management
Context on its own cannot solve the entire puzzle of course. Eventually, we will need to interpret user input and match it to a possible intent.
There is no one-size-fits-all approach to interpretation. Depending on the domain and the type of phrases you are looking to understand, different techniques may apply. OpenDialog allows you to associate different interpreters with different contexts of the same conversational application. Whether it is a more generic NLU service such as Google Dialogflow or Microsoft LUIS, or your own model trained using an open-source tool like Rasa, you can mix and match.
OpenDialog supports the definition of interpreters at each conversation context level down to a single conversational turn.
This once again reduces the need for very complex interpreters that need to sort through a large range of possibilities. We rely on context to help us and build small focussed interpreters that can be combined to handle parts of the problem reliably.
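One way to picture context-scoped interpreters is as a lookup that walks from the current turn up towards the scenario and uses the first interpreter it finds. The Python sketch below assumes exactly that "most specific context wins" rule, which is an assumption of this example rather than a description of OpenDialog's internals. The two interpreters are toy stand-ins (a real one might call an external NLU service), and all names are hypothetical.

```python
from typing import Callable, Optional

# An interpreter maps an utterance to an intent name, or None if nothing matches.
Interpreter = Callable[[str], Optional[str]]


def yes_no_interpreter(utterance: str) -> Optional[str]:
    """Small, focussed interpreter: only understands confirmations."""
    text = utterance.strip().lower()
    if text in {"yes", "yep", "sure"}:
        return "affirm"
    if text in {"no", "nope"}:
        return "deny"
    return None


def general_interpreter(utterance: str) -> Optional[str]:
    """Toy stand-in for a broader NLU model or service."""
    if "book" in utterance.lower():
        return "start_booking"
    return None


# Keys are context paths: (scenario, conversation, scene, turn), or any prefix.
INTERPRETERS: dict[tuple[str, ...], Interpreter] = {
    ("booking",): general_interpreter,
    ("booking", "make_reservation", "confirm", "ask_confirmation"): yes_no_interpreter,
}


def resolve_interpreter(path: tuple[str, ...]) -> Optional[Interpreter]:
    """Return the interpreter defined closest to the given context path."""
    for depth in range(len(path), 0, -1):
        interpreter = INTERPRETERS.get(path[:depth])
        if interpreter is not None:
            return interpreter
    return None


turn_path = ("booking", "make_reservation", "confirm", "ask_confirmation")
interpreter = resolve_interpreter(turn_path)
if interpreter is not None:
    print(interpreter("Yep"))
```

In this sketch the confirmation turn gets its own tiny interpreter, while every other turn in the scenario falls back to the general one defined at the scenario level.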
3. Proactive conversation engine
The third piece of the puzzle is the conversation engine itself. The conversation engine uses the context defined by the user to determine what is possible at any given exchange, and it also considers sensible ways to handle situations such as failure.
For example, you can define both a global No-Match conversation and localised no-match turns within a single scene. If the conversation engine cannot match the user input to an anticipated intent, it will look for a local no-match turn where, through contextual conversation design, we can help the application and the user recover. In addition, through conditions, we can ensure that if failures are repeated we escalate the problem appropriately.
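The recovery behaviour described here can be sketched in a few lines of Python. This example assumes that the engine prefers a local no-match response, falls back to the global one when the scene defines none, and escalates once a failure counter passes a threshold. The fallback order, the `MAX_FAILURES` value, the messages and all names are assumptions made for illustration and not OpenDialog's implementation.

```python
from dataclasses import dataclass
from typing import Optional

GLOBAL_NO_MATCH = "Sorry, I didn't understand that."
ESCALATION = "Let me connect you with someone who can help."
MAX_FAILURES = 2  # hypothetical threshold before escalating


@dataclass
class SceneContext:
    name: str
    expected_intents: set[str]
    local_no_match: Optional[str] = None  # contextual recovery message, if any
    failures: int = 0  # consecutive unmatched inputs in this scene


def respond(scene: SceneContext, intent: Optional[str]) -> str:
    """Pick a response for the interpreted intent, recovering on failure."""
    if intent in scene.expected_intents:
        scene.failures = 0  # a successful match resets the counter
        return f"handling:{intent}"

    scene.failures += 1
    if scene.failures > MAX_FAILURES:
        return ESCALATION  # repeated failure: escalate instead of looping
    if scene.local_no_match is not None:
        return scene.local_no_match  # recover using scene-specific guidance
    return GLOBAL_NO_MATCH  # no local turn defined: use the global one


scene = SceneContext(
    name="ask_date",
    expected_intents={"provide_date"},
    local_no_match="Which date would you like? For example, 'next Friday'.",
)
for interpreted in (None, None, None):
    print(respond(scene, interpreted))
```

The local message can remind the user what the application was asking for, which is usually more helpful than a generic apology, and the counter plays the role of the condition that triggers escalation.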
What is coming next
We believe that the way forward for sophisticated multi-turn conversational applications will depend on making the relationship between context, natural language understanding and proactive conversation management increasingly efficient. We are working on ways to make it simpler for designers and developers to mix and match conversational patterns and NLU approaches (for example through the use of knowledge graphs) and to make the conversation engine increasingly smart. OpenDialog enables you to build sophisticated, complex conversational applications today, but we are only getting started!