What is the most accurate way to describe the problem Conversational AI needs to solve? Is it ensuring that interactions feel as human-like as possible? Is it a natural language understanding (NLU) or machine learning (ML) problem? Is it a matter of managing conversational state? How you phrase and approach this question will deeply influence the path you take to solve it.
Typically, Conversational AI is viewed as primarily an NLU/ML problem: how do we get machines to understand (classify) what a user said and then identify an appropriate response? That is certainly a crucial part of the problem, but is it the central one? Framing the challenge as primarily an NLU problem risks limiting the scope of action to just this aspect. You limit yourself to a hammer, and that risks making everything look like a nail.
We can test whether NLU/ML is the only challenge through a simple thought experiment. Imagine a future where you have a perfect NLU tool at your disposal, one as good at understanding language as any human. Are all our interactions now immediately going to be amazing? As most of us experience, often almost daily, conversations still break down even when the counterpart is a human: things still go wrong and misunderstandings still take place. If things still go wrong between humans, perhaps the central problem was never purely one of language understanding.
At OpenDialog, we think the true challenge at the core of Conversational AI is to co-operate, co-ordinate and solve problems effectively, using language as the interface. Language understanding is a means to that end.
The interacting parties, whether all humans, humans and machines, or just machines, need to figure out how to work together to achieve their goals. Each brings their own preconceptions about how to go about it, but they need to reach a common agreement on the “rules of the game” and the expected ways to interact. For example, in a simple question/answer scenario, the rules are that one party asks a question and then waits for the other to provide an answer. If the person asking keeps interrupting while the question is being answered, or keeps piling new questions on top of the previous one, the other party will try to establish some ground rules. They might say: “Let me finish answering the first one, and then we can proceed to the second one.”
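To make the question/answer “rules of the game” more concrete, here is a minimal sketch of how such a rule might be encoded as a small state machine. The class `QuestionAnswerFloor` and its `ask`/`answer` methods are hypothetical names used for illustration only, not OpenDialog's API. The sketch assumes only one question can be open at a time; a question asked before the current one is answered is queued, and the answering party replies with a repair move instead of an answer.

```python
from __future__ import annotations

from collections import deque
from typing import Optional


class QuestionAnswerFloor:
    """Hypothetical sketch of the question/answer "rules of the game".

    Rule: one party asks, then waits for the answer before asking again.
    A question that arrives while another is still open is queued, and
    the answering party responds with a repair move instead of an answer.
    """

    def __init__(self) -> None:
        self.open_question: Optional[str] = None
        self.pending: deque[str] = deque()

    def ask(self, question: str) -> Optional[str]:
        if self.open_question is None:
            # The rule is respected: the floor passes to the answering party.
            self.open_question = question
            return None
        # The rule is broken: a new question on top of an unanswered one.
        self.pending.append(question)
        return ("Let me finish answering the first one, "
                "and then we can proceed to the second one.")

    def answer(self, text: str) -> str:
        if self.open_question is None:
            raise RuntimeError("No open question to answer")
        answered = self.open_question
        # Promote the next queued question, if any, so it is answered next.
        self.open_question = self.pending.popleft() if self.pending else None
        return f"{answered} -> {text}"


floor = QuestionAnswerFloor()
floor.ask("What time do you open?")
print(floor.ask("Do you deliver?"))  # repair move, question is queued
print(floor.answer("At 9am"))
print(floor.answer("Yes, within 5 miles"))
```

Queuing the interrupting question, rather than discarding it, is only one possible design choice; a designer could instead drop it or ask the user which question to address first.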
The more complex the setting, the more we need to either rely on prior social knowledge about what the “rules” are (e.g., people generally know there is a certain process to ordering food, booking a ticket and so on) or build instructions into the conversation itself (e.g., we may need to guide the user through a completely novel experience, such as setting up a new device in their home or carrying out a new procedure in an industrial setting).
All of this points to a need for conversation designers to provide a flexible structure through which this co-ordination and co-operation can happen as smoothly as possible. On top of that comes the added challenge of doing so with imperfect NLP tooling that will not always get things right.
This is why, at OpenDialog, we view Conversational AI primarily as a co-operation and co-ordination problem that also includes an NLP challenge. By looking at the problem through a wider lens, we have a broader range of tools at our disposal to solve it. We are no longer restricted to only hammers and nails!
Luckily, co-operation and co-ordination is a problem that Artificial Intelligence research has been grappling with for a long time, in the field of multi-agent systems. At OpenDialog, we draw relevant ideas and approaches from that field and translate them into easy-to-use tools that conversation designers can deploy as they design the overall system.