Conversational AI

And their Neural Approaches

Aarsh Kariya
5 min read · May 9, 2021

Let’s start with IBM’s definition of conversational AI. Conversational Artificial Intelligence (AI) refers to technologies, like chatbots or voice assistants, which users can talk to.

During the pandemic, many people coped with loneliness by talking to available conversational devices like Siri. It has long been a goal of conversational AI to develop an intelligent dialogue system that not only emulates human conversation but can also plan our travels and answer questions on topics ranging from the latest news to Newton’s laws of mechanics. This goal seemed far-fetched until recently, when promising results began to appear in both the research community and industry, thanks to the large amounts of data now available for training models and to advances in deep learning and reinforcement learning.

This article discusses the neural approaches to conversational AI that were developed for three types of dialogue systems. The first is a question-answering (QA) agent, which can provide concise answers to queries. The second is a task-oriented dialogue system, which can help with tasks such as planning a vacation. The third is a social chatbot, which can converse fluently on open-ended topics.

Photo by Josh Rose on Unsplash

Question-answering Agents (QA Agents)

Using data gathered from web documents and pre-compiled knowledge graphs, a QA agent can provide concise answers to user queries.

Example of a human agent conversation. Courtesy of Jianfeng Gao at Microsoft Research.

A human-agent conversation can be formulated as a hierarchical sequential decision-making process. Think of an organization: a top-level executive selects which person to assign to a particular task, and that person carries the task through to completion. A dialogue works the same way: a top-level process selects a low-level process to handle a particular subtask, and the low-level process chooses primitive actions to complete it.

The mathematical framework of options over Markov Decision Processes (MDPs) is used to formulate such hierarchical decision-making processes: it generalizes primitive actions to higher-level actions called options.
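To make the idea concrete, here is a minimal sketch of an option as a data structure. The three components (initiation set, intra-option policy, termination condition) follow the standard options framework; the state representation and helper names here are illustrative, not from the article.

```python
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    """An option generalizes a primitive action: it has an initiation set
    (where it may start), an intra-option policy, and a termination set."""
    name: str
    initiation_set: Set[str]       # states where the option may be invoked
    policy: Callable[[str], str]   # maps a state to a primitive action
    terminates_in: Set[str]        # states where the option ends

    def available(self, state: str) -> bool:
        return state in self.initiation_set

# A primitive action is just a degenerate option: it can start anywhere
# and terminates immediately after one step.
def as_option(action: str, states: Set[str]) -> Option:
    return Option(name=action,
                  initiation_set=states,
                  policy=lambda s: action,
                  terminates_in=states)
```

Viewed this way, a top-level policy that picks among options and a low-level policy that picks among primitive actions share one interface.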

Reinforcement Learning for dialogue. Courtesy of Jianfeng Gao at Microsoft Research.

By treating each option as an action, the top-level and low-level processes map naturally onto the RL framework shown above.

From this view, the dialogue agent navigates a Markov Decision Process (MDP) by interacting with its environment over a sequence of discrete steps. At each step, the agent observes the current state and follows a policy to choose an action; this cycle continues until the dialogue terminates.
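The observe-act-reward cycle can be sketched as a small rollout loop. Everything here is a toy stand-in: the three-state "dialogue" flow, action names, and reward values are invented for illustration, not taken from any real system.

```python
def run_dialogue_episode(policy, transition, start_state, terminal_states,
                         max_turns=20):
    """Roll out one dialogue episode in a toy MDP: at each step the agent
    observes the current state, follows its policy to pick an action, and
    the environment returns the next state and a reward; the loop ends at
    a terminal state (or a turn cap)."""
    state, total_reward, trajectory = start_state, 0.0, []
    for _ in range(max_turns):
        if state in terminal_states:
            break
        action = policy(state)
        state, reward = transition(state, action)
        trajectory.append((action, reward))
        total_reward += reward
    return trajectory, total_reward

# A toy three-turn flow: greet -> ask -> answer -> done, with a reward
# of 1.0 for successfully closing the task.
flow = {("greet", "ask_name"): ("ask", 0.0),
        ("ask", "give_info"): ("answer", 0.0),
        ("answer", "close"): ("done", 1.0)}
policy = {"greet": "ask_name", "ask": "give_info", "answer": "close"}
trajectory, total_reward = run_dialogue_episode(
    policy.get, lambda s, a: flow[(s, a)], "greet", {"done"})
```

In a real agent the `policy` would be learned (e.g. a neural network over dialogue states) rather than a lookup table.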

The goal of dialogue learning is thus to find an optimal policy, one that maximizes the expected cumulative reward.

Reinforcement learning provides a unified machine-learning framework for building dialogue agents that learn by interacting with real users. However, learning from live interaction can be prohibitively expensive in many domains. To mitigate this, RL is often combined with supervised learning, especially where large corpora of human-human conversations are available.

Task-Oriented Dialogue Systems

Let us first get familiar with the architecture of task-oriented dialogue systems. A typical system is built from four modules:

  1. A Natural Language Understanding (NLU) module that identifies the intent and entities in the user’s utterance.
  2. A dialogue state tracker that maintains the conversation state.
  3. A dialogue policy that selects the next action based on the current state (similar in spirit to the policy in the QA agent above).
  4. A Natural Language Generation (NLG) module that converts the agent’s action into a natural language response.
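The four modules above can be chained into a single turn of dialogue. This is a minimal sketch with toy stand-ins: the keyword-matching NLU, the flight-booking intent, and the response templates are all invented here to show how the pieces connect, not how any production system works.

```python
def nlu(utterance):
    """NLU: map the user utterance to an intent and slots
    (a toy keyword matcher standing in for a trained model)."""
    if "flight" in utterance.lower():
        return {"intent": "book_flight",
                "slots": {"dest": utterance.split()[-1]}}
    return {"intent": "unknown", "slots": {}}

def track_state(state, frame):
    """State tracker: fold the new semantic frame into the dialogue state."""
    state = dict(state)
    state["intent"] = frame["intent"]
    state.setdefault("slots", {}).update(frame["slots"])
    return state

def dialogue_policy(state):
    """Policy: choose the next system action from the current state."""
    if state["intent"] == "book_flight" and "dest" in state["slots"]:
        return ("confirm", state["slots"]["dest"])
    return ("ask_clarify", None)

def nlg(action):
    """NLG: render the chosen action as a natural language response."""
    act, arg = action
    templates = {"confirm": f"Booking a flight to {arg}. Correct?",
                 "ask_clarify": "Sorry, what would you like to do?"}
    return templates[act]

def turn(state, utterance):
    """One full turn: NLU -> state tracking -> policy -> NLG."""
    state = track_state(state, nlu(utterance))
    return state, nlg(dialogue_policy(state))
```

Each hand-written function here is exactly the kind of module that the end-to-end approaches below replace with a learned, differentiable component.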

Traditionally, these modules were implemented and optimized individually. There is now a growing trend of applying deep learning and RL to optimize them jointly.

Now, let’s take a look at how these systems are approached. There are two fronts.

  1. End-to-end learning. The first approach implements the pipeline with differentiable models such as neural networks, so that the whole system can be optimized jointly from user feedback signals using backpropagation and RL.
  2. The second is the use of advanced RL techniques to optimize dialogue policies in more complex scenarios.

Social Bots

This type of system goes by many names: chatbots, social bots, virtual assistants, and so on. Their purpose is to enable smooth, natural interaction between humans and their electronic devices.

Nowadays, researchers rely on the automatic generation of conversational responses. These data-driven response generators are built within the framework of Neural Machine Translation (NMT), typically as encoder-decoder (seq2seq) models.

And they have been successful. The reason? They require little interaction with the user’s environment. Another reason? They cope well with free-form, open-domain text.
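The decoding side of any encoder-decoder responder follows the same skeleton: feed the previous token back in until an end-of-sequence symbol appears. Below is a sketch of that greedy decoding loop; the `step` function and its canned responses are a hypothetical stand-in for a trained neural decoder.

```python
def decode_greedy(step_fn, encoder_state, bos="<s>", eos="</s>", max_len=10):
    """Greedy decoding loop shared by encoder-decoder (seq2seq) responders:
    ask the model for the next token given the source encoding and the
    tokens generated so far, until EOS or the length cap."""
    tokens = [bos]
    for _ in range(max_len):
        nxt = step_fn(encoder_state, tokens)
        if nxt == eos:
            break
        tokens.append(nxt)
    return tokens[1:]  # drop the BOS marker

# A stub step function standing in for a trained model: it just replays
# a canned response for a known "encoded" input.
canned = {"hi": ["hello", "there", "</s>"]}
def step(enc, toks):
    resp = canned.get(enc, ["</s>"])
    i = len(toks) - 1
    return resp[i] if i < len(resp) else "</s>"

print(decode_greedy(step, "hi"))  # → ['hello', 'there']
```

In practice `step_fn` would run a neural decoder and pick the argmax token (or sample), but the surrounding loop is the same.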

However, do neural responses carry meaningful information all the time? A generic response like “OK” or “I don’t know” can serve as a reply to most questions users ask.

To address this blandness, mutual-information models have been proposed and further improved using deep reinforcement learning. Persona-based models have also been introduced to address issues such as speaker consistency.
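The core idea of the mutual-information objective is easy to sketch: instead of ranking candidate responses T by likelihood alone, score them by log p(T|S) − λ·log p(T), so a reply that is probable regardless of the source S (like “I don’t know”) gets penalized. The candidate scores below are made-up numbers for illustration.

```python
def mmi_rerank(candidates, lam=0.5):
    """Maximum Mutual Information reranking: score each candidate response
    T for source S by log p(T|S) - lam * log p(T), demoting generically
    likely replies. Higher score is better."""
    def score(c):
        return c["logp_given_source"] - lam * c["logp_prior"]
    return sorted(candidates, key=score, reverse=True)

# Toy candidates with hypothetical model log-probabilities: the bland
# reply is slightly more likely given the source, but its high prior
# (it fits almost any source) causes MMI to demote it.
cands = [
    {"text": "I don't know.",
     "logp_given_source": -1.0, "logp_prior": -0.5},
    {"text": "The concert starts at 8pm.",
     "logp_given_source": -1.5, "logp_prior": -6.0},
]
best = mmi_rerank(cands)[0]["text"]
```

With λ = 0 this reduces to plain likelihood ranking; raising λ trades fluency for specificity.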

I would like to end this article by presenting a few examples of chatbots that have been made available to the public.

Microsoft’s XiaoIce, Replika, and the Alexa Prize systems.

You can read the book Neural Approaches to Conversational AI here.

Thank you for reading!
