I’m developing a chatbot. Where can I find help with testing it? I would like to share it with some communities, people who like to play with chatbots, etc.
I don’t want to make it open to everyone because I don’t have the infrastructure to serve high traffic. Just to a limited number of people.
Not sure if this is the right place, but here goes. We’d like to create a cross-platform automatically-updating wiki and bot. We got a pretty big budget for this.
Think of the following scenarios:
Someone shares a presentation related to a topic in a public Microsoft Teams channel? Throw it in the wiki (collaboration platform).
Someone’s the account manager for a client? Throw it in the wiki (CRM platform).
Someone’s available to work in 2 weeks? Throw it in the wiki (ERP platform).
The problem right now is that I have no idea at all where to begin. Anyone got any input? Could be platforms, AI, machine learning, anything really. The more specific the better. All I’ve got is ‘start with the data people will actually need / that should be easily available.’
Mercury — a chatbot for Ordering Food using ALBERT & CRF
Unless you have been out of touch with the Deep Learning world, chances are that you have heard about BERT, ALBERT and CRF (Conditional Random Field).
Mercury, named after the Roman messenger god (the counterpart of the Greek Hermes), is a chatbot service which can be integrated with various food-delivery brands such as Swiggy or Zomato, where a user can simply type in their order and send it as a text.
Mercury can then extract the essential information from the order and place the order for the User accordingly.
Here is a list of the technologies involved in Mercury:
Since this article is about “Mercury”, I will only provide a brief summary of each concept, along with some useful links for a more in-depth understanding.
What is BERT?
Let’s take a look at these sentences, where the same word has different meanings:
- I was late to work because I left my phone at home and had to go back.
- Go straight for a mile and then take a left turn.
- Some left-wing parties are shifting towards centrist ideas.
How do you differentiate between each meaning of the word left?
These differences are almost certainly obvious to you, but what about a machine? Can it spot the differences? What understanding of language do you have that a machine does not?
Or rather, what understanding of language do you have that machines did not have earlier?
The answer: context.
Your brain automatically filters out the incorrect meaning of words depending on the other words in the sentence, i.e. depending on the context.
But how does a machine do it?
This is where BERT, a language model which is bidirectionally trained (this is also its key technical innovation), comes into the picture.
This means that machines can now have a deeper sense of language by deriving contextual meaning of each word.
ALBERT was proposed in 2019, with the goal of improving the training and results of the BERT architecture through various techniques:
- Parameter sharing (a drop in the number of parameters of over 80%)
- Inter-sentence coherence loss
- Factorization of the embedding matrix
Results of ALBERT on NLP benchmarks:
[Figure: ALBERT vs. BERT, showing ALBERT achieving SOTA results with 20% of the parameters]
What are Conditional Random Fields?
A CRF classifies each input into one label from a list of potential labels, taking neighbouring predictions into account.
I will be going into a little more detail shortly but for now, just understand that CRFs are used for predicting sequences depending upon previous labels in sentences.
They are often used in NLP in various tasks such as Part-Of-Speech Tagging and Named-Entity Recognition since CRFs excel in modelling sequential data such as words in a sentence.
What is gRPC?
It is an open-source, high-performance Remote Procedure Call (RPC) framework.
Its main advantage is that the client and server can exchange multiple messages over a single TCP connection via the gRPC bidirectional streaming API.
Mercury uses the gRPC bidirectional streaming API to implement its speech-to-text functionality via the Google Speech-To-Text API.
Mercury — What’s under the Hood?
What does Mercury do before placing the order for the User?
How does Mercury know that the text it has received is indeed a request for placing an order?
Let’s take a look at this sentence:
“I would like to have 1 non veg Taco, 3 veg Pizzas and 3 cold drinks from Domino’s.”
How does Mercury go from this to something like this?
This is where Joint-ALBERT (Slot-Filling & Intent-Classification) comes into the picture.
A Sneak Peek under the Hood of Mercury’s Model
Training:
We come up with some desired labels for our model.
Intent label: <OrderFood>
Slot labels: <restaurant_name>, <food_name>, <food_type>, <qty>, <O> (<O> means that the word does not carry much value in the sentence and can be masked or ignored).
We create hundreds of sample sentences, with a label associated with each word.
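For illustration (this is not Mercury’s actual training data, just a made-up sample in the same spirit), one labeled sentence might look like this:

```
Sentence: "I would like to have 1 non veg Taco from Domino's."
Intent label: <OrderFood>
Slot labels:
  I → <O>      would → <O>        like → <O>        to → <O>    have → <O>
  1 → <qty>    non → <food_type>  veg → <food_type>
  Taco → <food_name>    from → <O>    Domino's → <restaurant_name>
```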
ALBERT + Conditional Random Field (Joint-ALBERT):
We have already learnt that CRFs excel in modelling sequential data. So how does it help Mercury?
CRFs essentially help in mapping each word to its appropriate label.
For example:
It can map the number “1” to <qty> denoting quantity.
It can map the word “Domino’s” to <restaurant_name>.
Great! So if CRFs can do this, why do we even need ALBERT?
In our original sentence:
“I would like to have 1 non veg Taco, 3 veg Pizzas and 3 cold drinks from Domino’s.”
How does CRF know that the word “non” is a <B-food_type> and the word “veg” is <I-food_type> (B means beginning & I means continuation of B)?
How does CRF know that the word “non” is not the dictionary meaning “anti”?
As you probably already guessed, ALBERT provides CRF the contextual meaning of each word which helps CRF in classifying each word into the correct slot labels.
CRF does slot identification by considering each word’s possible labels together with those of its neighbours, and figuring out which complete mapping has the highest probability.
[Figure: the bold line represents the most probable mapping]
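To make this concrete, here is a minimal sketch of what a joint ALBERT + CRF model can look like in PyTorch, assuming the Hugging Face transformers package and the pytorch-crf package (this is an illustration, not Mercury’s actual code):

```python
# A sketch only: Mercury's real implementation is not shown in this article.
import torch.nn as nn
from transformers import AlbertModel   # Hugging Face transformers
from torchcrf import CRF               # pytorch-crf package

class JointAlbertCRF(nn.Module):
    def __init__(self, num_slot_labels, num_intents):
        super().__init__()
        self.albert = AlbertModel.from_pretrained("albert-base-v2")
        hidden = self.albert.config.hidden_size
        self.slot_head = nn.Linear(hidden, num_slot_labels)  # per-token emissions
        self.intent_head = nn.Linear(hidden, num_intents)    # sentence-level intent
        self.crf = CRF(num_slot_labels, batch_first=True)

    def forward(self, input_ids, attention_mask):
        out = self.albert(input_ids=input_ids, attention_mask=attention_mask)
        # ALBERT supplies contextual embeddings; the CRF decodes the most
        # probable sequence of slot labels from the per-token emissions.
        emissions = self.slot_head(out.last_hidden_state)
        slot_preds = self.crf.decode(emissions, mask=attention_mask.bool())
        intent_logits = self.intent_head(out.pooler_output)
        return slot_preds, intent_logits
```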
Finally, how is Intent of the sentence predicted?
CRF handles this part too, by learning that a specific sequence of slot labels leads to a specific intent.
For example:
If the slots <food_type>, <food_name> and <restaurant_name> are found in a sentence, then the sentence probably has the intent <OrderFood>.
[Figure: intent prediction based on slot labels]
Flutter Front-End
Mercury has a simple and elegant front-end for the User.
Humans have long been fascinated by the idea of machines thinking and working like humans, at least intellectually. Though there might be potentially devastating consequences of a machine becoming as intelligent as a human, we simply cannot deny that such machines could be of great help and improve our daily lives.
Cognitive AI has been powering virtual assistant services throughout their existence. The capabilities of virtual assistants are increasing year by year, and with various virtual personal assistants showing very advanced intelligence, it is natural to become curious about what a virtual assistant AI is and how far the “cognition” they exhibit can go.
In this article, let us try to understand exactly that. However, before that, we must understand what cognition means and what to think of when software exhibits cognition.
What is Cognition?
Basically, cognition is a state of awareness of one’s surroundings and the ability to evaluate them and give an intelligent response to stimuli. More formally, it is the mental action or process by which a being acquires understanding through thinking, experience, and the senses, which it then uses to communicate with the world.
Other than being able to “sense” the world, the cognitive being must have the intelligence to understand and react to it and also have a working memory.
Humans develop intuitive cognition through many means. A machine exhibiting cognition, however, means that it too is “aware” of its surroundings and gives intelligent responses when we interact with it.
How can we test the Cognition of Software?
In humans, cognitive capabilities are tested by having them perform various activities that require a certain level of cognition, and a similar approach can be used to test and assess the level of cognition of software.
This is essentially the core idea behind the Turing test, a thought experiment that goes like this: a computer is hidden behind the door of a room, and a person communicates with it by exchanging written notes. If the computer manages to answer all queries satisfactorily without the person becoming suspicious, the computer wins!
Artificial intelligence is generally divided into weak AI and strong AI. The former is AI that works only for a subset of problems and requires a lot of training material to solve a new set of problems; the latter is a general AI that can solve almost any problem given to it. Strong AI has not been fully developed yet, and much development is still ongoing.
So, this is the first limitation we see with today’s AI: it can exhibit only finite cognition, in the limited fields on which it is trained. There is another general problem associated with the present state of AI: its black-box nature.
That is, most machine learning models used in practice are black boxes. We do not easily know how exactly a particular decision is made, so we cannot look at the “process” and conclude whether the machine is “thinking” and going through the “right steps” of intelligence in giving an answer.
So, what can we do? One easy way is to look at various examples in which virtual assistants give intelligent responses and analyze how intelligently they handled the situation.
Cognitive Capabilities of a Cognitive Virtual Assistant
Here, let us see various examples of how an intelligent virtual assistant would respond to a query and analyze it from various viewpoints of cognition.
Memory Retention
When it comes to memory, a virtual assistant should have exceptional recall, able to access the appropriate memory whenever there is a particular use for it.
Suppose you are talking to a virtual assistant about going somewhere for a vacation, and that a while ago you told it that you like visiting a particular place in the winter season. Now, when you ask the assistant to suggest a place to visit, it can check that the current season is winter and, from its understanding of your interests, suggest places that really suit you.
Beyond this long-term memory retention, virtual assistants can also retain the current context across previous messages and thus give intelligent answers based on the earlier conversation.
Understanding Spelling Mistakes and Paraphrases
The ability to understand spelling mistakes and paraphrases signifies that a virtual assistant is cognizant enough not to take for granted that what you type is correct and contains the complete intent. Even if you give an incorrect input or do not state your intent exactly, the assistant will extract the correct intent and proceed with the next actions.
For example, suppose you type “when my event is scheduled?”. The assistant might understand that you are asking when your event is scheduled. You also did not specify which event you are asking about, so the assistant tries to guess the most appropriate event and responds based on that.
Understanding long-form Sentences
Understanding long-form sentences signifies that a virtual assistant is able to break sentences down while still understanding the overall intent and meaning.
For example, suppose you write: “Hi, I am John Doe. Currently, I have a mid-tier plan. I am not satisfied with it. Either solve my problem or get me a higher plan.” The assistant would understand that you are somewhat angry and tailor its response accordingly. It would first ask about the problem and, if the problem is not easily solvable, provide information about higher plans. If you are still not satisfied, your request, with all the appropriate information, would be forwarded to a support representative.
Conclusion
From the above analysis, we can conclude that virtual assistants, with the help of cognitive AI technologies, exhibit remarkably good cognitive capabilities: they discern the user’s intent, and they remember and understand the context.
So the limit of a virtual assistant’s cognitive capabilities cannot really be specified, except to say that it is bounded only by our imagination. The existing attributes and level of intelligence are certainly growing at a rapid pace.
In the previous article, I wrote about why good chatbots need context instead of tree-based flows. The benefits of introducing context are that users can engage in a more natural dialogue with your chatbot, get direct replies and change information without restarting the conversation.
I’ll be using Dialogflow and Cloud Functions for Firebase to describe and explain the implementation. The ticket price inquiry example is based on the scenario described in my previous article, so take a look if you have not.
Concept
1. Instead of one intent with the required slot filling parameters, create that intent followed by one intent for each parameter. (See purple boxes above)
2. In those intents with a single slot-filling parameter, make the parameter optional.
3. Put all entities extracted from any intent into the conversation context programmatically. (See the blue box above)
4. Make a functional response for a group of related intents (see the orange box above), so that the chatbot replies based on the user’s intent and information (either mentioned or recalled from context), instead of the intent alone.
Let’s take a closer look at the code.
Intent mapping
Start by creating a map of intents. Let the agent (a webhook client) use the intent map to handle incoming messages.
Remember to create those intents in Dialogflow and turn on the webhook fulfilment.
Intent fulfilment
Next, for each intent that acts on the customer’s query (such as the ticket price inquiry in this case), the chatbot will reply based on the number of participants (children, adults and seniors), the site they wish to visit, and the citizenship.
Extract both the slot filling values (parameters) and the context parameters found in the request body.
If the customer has explicitly mentioned the site, the number of people and/or their citizenship, use that. If not, recall from the context in the current conversation session.
Make a functional reply using those parameters, such as replyTicketPrice(agent, citizenship, participants, site) .
Finally, keep all the parameters (old and new) in the context: agent.context.set({name, lifespan, parameters: { …currentParams, …newParams}}). This is helpful when the bot goes back to steps 1 and 2, preventing the user from having to repeat the details.
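As a sketch of this pattern (the original uses a Node.js Cloud Function; this Python/Flask version is only illustrative, and the parameter names and helper are hypothetical), the fulfilment logic might look like:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def reply_ticket_price(citizenship, participants, site):
    # Hypothetical stand-in for the article's replyTicketPrice(agent, ...):
    # ask for whichever parameter is still missing, else quote the price.
    if not site:
        return "Which site would you like to visit?"
    if not participants:
        return "How many are visiting?"
    if not citizenship:
        return "Are you a local or a tourist?"
    return f"Here are the prices for {participants} ({citizenship}) at {site}: ..."

@app.route("/webhook", methods=["POST"])
def webhook():
    body = request.get_json()
    query = body["queryResult"]
    params = query.get("parameters", {})
    contexts = query.get("outputContexts", [])
    ctx_params = contexts[0].get("parameters", {}) if contexts else {}

    # Steps 1-2: prefer explicitly mentioned values; otherwise recall from context.
    site = params.get("site") or ctx_params.get("site")
    participants = params.get("participants") or ctx_params.get("participants")
    citizenship = params.get("citizenship") or ctx_params.get("citizenship")

    # Step 3: reply based on whatever is known so far.
    reply = reply_ticket_price(citizenship, participants, site)

    # Step 4: keep all (old and new) parameters in the context for the next turn.
    merged = {**ctx_params, "site": site, "participants": participants,
              "citizenship": citizenship}
    return jsonify({
        "fulfillmentText": reply,
        "outputContexts": [{
            "name": body["session"] + "/contexts/ticket-price",
            "lifespanCount": 5,
            "parameters": merged,
        }],
    })
```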
In many cases, a chatbot is designed to fulfil a wide range of customer requests, so a good idea is to assign a topic name in the context to keep the current conversation relevant (be it about ticket price inquiry, membership registration or something else) in case the user informs the chatbot about a different site or a different number of visitors.
So how exactly should the bot reply? There are 8 possible ways, one for each combination of the three parameters being known or unknown (2³ = 8), and the wording of each is up to your conversation design.
If the customer opens with a short and sweet message like “ticket prices”, the chatbot will ask for one of the required parameters. If the customer says “ticket prices to the cloud forest”, the bot is equally likely to ask for the number of participants or for the citizenship. All bot replies take the form of questions that elicit a relevant answer from the user; once all the parameters are known, the bot tells the customer the ticket prices right away.
What happens when the user says “2 adults, 1 child”, “cloud forest”, or “tourist”? The respective parameter-based intent is triggered, not the ticket price intent. In this situation, the chatbot still invokes the replyTicketPrice response based on whatever information has been passed to it so far. There is also the possibility that the customer starts by telling the chatbot they are “interested to visit” without a specific purpose like inquiring about the ticket prices, so the bot may ask “Which site?”.
The fulfilment of this parameter-based intent (site) shares the same design pattern for the other two (participants and citizenship).
Conclusion
Context is a great way to carry important details from intent to intent, especially if the customer changes information, interrupts the flow by going off-topic, or wants the bot to complete multiple requests.
You’ve probably looked up local weather info using a digital assistant, then asked “what about (this city) instead?” and still got the weather forecast. That’s context at work. Can you think of other use cases too? Are you thinking of re-designing your bot dialogues? Or do you have different ways to accomplish context in other bot frameworks?
Opinions expressed are solely my own and do not express the views or opinions of my employer. If you enjoyed this, subscribe to my updates or connect with me over LinkedIn.
Among the various ways you can improve customer satisfaction, chatbots are a powerful solution for helping your customer base. Chatbots are affordable, help scale your business, are fully customizable, help your customers find the right products and services, and help build trust in your business. To prove this, I’ll go through the following content:
What is a machine learning chatbot?
Why are chatbots important in different business spheres?
Build your own NLP-based chatbot using PyTorch.
Deploy the chatbot with JavaScript and Flask.
What is a machine learning chatbot?
A chatbot (Conversational AI) is an automated program that simulates human conversation through text messages, voice chats, or both. It learns to do that based on a lot of inputs, and Natural Language Processing (NLP).
For the sake of semantics, chatbots and conversational assistants will be used interchangeably in this article, they sort of mean the same thing.
Why are chatbots important in different business spheres?
Business Insider reported that the global chatbot market was expected to grow from $2.6 billion in 2019 to $9.4 billion in 2024, forecasting a compound annual growth rate of 29.7%. The same report also suggested that the highest growth in chatbot implementation would be in the retail and ecommerce industries, due to the increasing demand of providing customers with seamless omnichannel experiences.
That alone should be enough to convince you that chatbots are the way to handle customer relationships moving forward, but they will also continue to grow as internal tools for enterprises, and nearly every industry will adopt the technology if it hasn’t already.
Below are the key reasons why more and more businesses are adopting the chatbot strategy and how they are a win-win formula to acquire & retain customers.
Reduce customer waiting time — 21% of consumers see chatbots as the easiest way to contact a business. Bots are a smarter way to ensure that customers receive the immediate response that they are looking for without making them wait in a queue.
24×7 availability — Bots are always available to engage customers with immediate answers to the common questions asked by them. The top potential benefit of using chatbots is 24-hour customer service.
Better customer engagement — Conversational bots can engage customers round the clock by starting proactive conversations and offering personalized recommendations that boost the customer experience.
Save customer service costs — Chatbots will help businesses save more than $8 billion per year. Bots can be easily scaled which saves customer support costs of hiring more resources, infrastructure costs, etc.
Automate lead qualification & sales — You can automate your sales funnel with chatbots to prequalify leads and direct them to the right team for further nurturing. Being able to engage customers instantly increases the number of leads and conversion rates.
There are many platforms where developers, data scientists, and machine learning engineers can create and maintain chatbots, such as Dialogflow and Amazon Lex. But my goal in this article is to show you how to create a chatbot from scratch, to help you understand the concepts of feed-forward networks for natural language processing.
Let’s get started!
You can easily find the complete code in my GitHub repo.
Here is a short plan that I want to follow to build a model.
Theory + NLP concepts (Stemming, Tokenization, bag of words)
Create training data
PyTorch model and training
Save/load model and implement the chat
We will build a chatbot for a coffee and tea supplier; it needs to handle simple questions about hours of operation, reservation options and so on.
A chatbot framework needs a structure in which conversational intents are defined. One clean way to do this is with a JSON file, like this.
Chatbot intents
Each conversational intent contains:
a tag (a unique name)
patterns (sentence patterns for our neural network text classifier)
responses (one will be used as a response)
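A hypothetical intents.json in this style (tags, patterns and responses are illustrative, in the spirit of our coffee-and-tea supplier) could look like:

```json
{
  "intents": [
    {
      "tag": "opening_hours",
      "patterns": ["When are you open?", "What are your hours?"],
      "responses": ["We are open 9am-6pm, Monday to Saturday."]
    },
    {
      "tag": "reservation",
      "patterns": ["Can I book a table?", "Do you take reservations?"],
      "responses": ["Yes! Call us or book through our website."]
    }
  ]
}
```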
So our NLP pipeline looks like this:
Tokenize
Lower + stem
Exclude punctuation characters
Bag of Words
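A minimal sketch of this pipeline, using NLTK’s tokenizer and Porter stemmer together with NumPy (the function names are illustrative; the repo has the full version):

```python
import numpy as np
import nltk
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()
IGNORE = {"?", ".", "!", ","}  # punctuation characters to exclude

def tokenize(sentence):
    # split a sentence into individual words/tokens
    return nltk.word_tokenize(sentence)

def stem(word):
    # lower-case a word and reduce it to its root form
    return stemmer.stem(word.lower())

def bag_of_words(tokenized_sentence, all_words):
    # 1.0 at each position whose vocabulary word occurs in the sentence
    sentence_words = {stem(w) for w in tokenized_sentence if w not in IGNORE}
    return np.array([1.0 if w in sentence_words else 0.0 for w in all_words],
                    dtype=np.float32)
```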
We create a list of documents (sentences), where each sentence is a list of stemmed words and each document is associated with an intent (a class). The full code is in this file.
Then we need to set up the training data and hyperparameters.
After all the needed preprocessing steps, we create a model.py file to define the feed-forward neural network.
Feedforward neural networks are artificial neural networks where the connections between units do not form a cycle. Feedforward neural networks were the first type of artificial neural network invented and are simpler than their counterpart, recurrent neural networks. They are called feedforward because information only travels forward in the network (no loops), first through the input nodes, then through the hidden nodes (if present), and finally through the output nodes.
Be careful! At the end we don’t need an activation function, because we will later use the cross-entropy loss, which applies one (log-softmax) for us internally.
Why do we use ReLU?
ReLUs are simple, fast to compute, and don’t suffer from vanishing gradients the way sigmoid-style functions (logistic, tanh, erf, and similar) do. Their simplicity of implementation makes them suitable for use on GPUs, which are very common today because they are optimised for matrix operations (which are also needed for 3D graphics).
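Put together, model.py can look roughly like this (a sketch with two hidden layers; the layer names and sizes are illustrative):

```python
import torch.nn as nn

class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.l1 = nn.Linear(input_size, hidden_size)
        self.l2 = nn.Linear(hidden_size, hidden_size)
        self.l3 = nn.Linear(hidden_size, num_classes)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.l1(x))
        out = self.relu(self.l2(out))
        # no activation at the end: nn.CrossEntropyLoss applies
        # log-softmax internally
        return self.l3(out)
```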
After defining the cross-entropy loss and the Adam optimizer, we implement the backward pass and the optimizer step.
What do all these lines mean?
We call zero_grad() on the optimizer because in PyTorch, for every mini-batch during the training phase, we need to explicitly set the gradients to zero before starting backpropagation (i.e., updating the weights and biases), because PyTorch accumulates the gradients on subsequent backward passes.
Calling .backward() multiple times accumulates the gradient (by addition) for each parameter. This is why you should call optimizer.zero_grad() after each .step() call. Note that, following the first .backward() call, a second call is only possible after you have performed another forward pass.
optimizer.step() performs a parameter update based on the current gradient (stored in the .grad attribute of each parameter) and the update rule.
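In code, the training loop described above looks roughly like this (a sketch; model and train_loader are assumed to be defined as in the repo, and the hyperparameter values are illustrative):

```python
import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

for epoch in range(1000):                 # illustrative number of epochs
    for words, labels in train_loader:
        outputs = model(words)            # forward pass
        loss = criterion(outputs, labels)
        optimizer.zero_grad()             # clear accumulated gradients
        loss.backward()                   # compute fresh gradients
        optimizer.step()                  # update parameters from .grad
```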
Finally, after running the train.py script, what a wonderful result we got!
And in the last part we need to save our model. Here is the simple way I did it.
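One simple approach (a sketch; the dictionary keys and file name are illustrative) is to save the state dict along with everything needed to rebuild the model and vocabulary later:

```python
import torch

data = {
    "model_state": model.state_dict(),
    "input_size": input_size,
    "hidden_size": hidden_size,
    "output_size": output_size,
    "all_words": all_words,
    "tags": tags,
}
torch.save(data, "data.pth")

# Later, in the chat script:
data = torch.load("data.pth")
model = NeuralNet(data["input_size"], data["hidden_size"], data["output_size"])
model.load_state_dict(data["model_state"])
model.eval()  # inference mode
```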
Chatbot Deployment with Flask and JavaScript
I decided to go further and create this amazing visualization of ChatBot.
You will find all my HTML, CSS and JavaScript scripts in my GitHub repo.
Enjoy!
Conclusion
Now you are aware of what a chatbot is and how important bot technology is for any kind of business, and you will certainly agree that bots have drastically changed the way businesses interact with their customers.
Chatbot technologies will become a vital part of customer engagement strategy going forward. In the near future, bots will advance to augment human capabilities, freeing human agents to be more innovative and to handle strategic activities.
Have you ever wondered how techies build a chatbot? With the development of the Python language, it has become simple, taking just a few lines of code. I’m trying to give some idea of how to build a simple chatbot using Python and NLP, of the kind we come across in our daily interactions with apps and websites.
Recently, the human interaction for most kinds of initial customer support has been handled by these chatbots, which greatly reduces the volume of queries and complaints coming in day to day for any product or service that we buy. The simple and repetitive queries are handled very well, and it’s good to leave these tasks to robots and focus our human energy on more efficient work that adds value to the company.
There are two types of chatbots that we generally come across,
Rule-Based Chatbot: These are decision-tree bots that use a series of defined rules. The structure and answers are all pre-defined, so you are in control of the conversation.
Artificial Intelligence Chatbot: It is powered by NLP (Natural Language Processing). It is better suited for complex queries and a large volume of them.
Some applications of Chatbots:
Without wasting much time, let’s dive into the jupyter notebook and get our hands dirty with coding.
1. Import Required Libraries
Building a chatbot requires only three important libraries:
nltk — Natural Language Toolkit, for natural language processing
string — to process strings in Python
random — to randomly select words or responses
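In the notebook this boils down to a few lines (the NLTK data used later is also downloaded once here):

```python
import nltk
import string
import random

nltk.download("punkt")    # pre-trained Punkt tokenizer
nltk.download("wordnet")  # WordNet dictionary, used for lemmatization
```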
2. Import the Corpus
A corpus, in simple terms, means a collection of texts (strings, words, sentences, etc.). It is the training data the chatbot needs to learn from, and it plays a very important role in deciding the responses. So whatever you want your chatbot to learn and respond to has to be put in a .txt file and saved.
The NLTK data package includes a pre-trained Punkt tokenizer for the English language, so it is preferred here over other tokenizers such as NLTK’s TweetTokenizer or regular-expression tokenizers.
WordNet is a semantically oriented dictionary of English included in the NLTK library.
The corpus is the core of our chatbot. From it we proceed to data preprocessing, where we normalize the text case, converting all the input data to a single case (lower case here) to avoid misinterpretation of words.
3. Tokenization
Tokenization is the process of splitting a sentence into an individual collection of words, as shown below. The Punkt tokenizer is used for this purpose.
Once we have separated the sentences and words using the tokenizer, let’s check whether it was done correctly with the following code.
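A sketch of these steps (the corpus file name is hypothetical):

```python
# read the corpus and normalize it to lower case
raw = open("chatbot_corpus.txt", "r", errors="ignore").read().lower()

sent_tokens = nltk.sent_tokenize(raw)  # list of sentences
word_tokens = nltk.word_tokenize(raw)  # list of words

print(sent_tokens[0])   # inspect the first sentence to verify tokenization
print(word_tokens[:5])  # and the first few words
```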
4. Text Pre-processing
Lemmatization or stemming is the process of finding the similarity between words that share the same root (lemma). For example:
5. Generating Bag Of Words (BOW)
This is the process of converting words into numbers by generating vector embeddings from the tokens produced in the previous steps. For example:
Vector Embeddings
On top of these vector embeddings, one-hot encoding is applied, through which all the words are converted into 0s and 1s, which are then used as input for the ML algorithms.
Code for lemmatization/stemming:
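A typical implementation with NLTK’s WordNetLemmatizer (a sketch, since the original snippet is not reproduced here; it also strips punctuation, as discussed above):

```python
lemmer = nltk.stem.WordNetLemmatizer()

def lem_tokens(tokens):
    return [lemmer.lemmatize(token) for token in tokens]

# map every punctuation character to None so str.translate removes it
remove_punct_dict = dict((ord(p), None) for p in string.punctuation)

def lem_normalize(text):
    return lem_tokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))
```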
6. Defining the Greeting function
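A sketch of such a function (the greeting lists are illustrative); it returns a random greeting when the user’s input contains one, and None otherwise:

```python
GREET_INPUTS = ("hello", "hi", "hey", "greetings", "what's up")
GREET_RESPONSES = ["hi", "hey there", "hello", "I am glad you are talking to me!"]

def greet(sentence):
    for word in sentence.split():
        if word.lower() in GREET_INPUTS:
            return random.choice(GREET_RESPONSES)
    return None  # not a greeting
```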
7. Response Generation function
Import two libraries that are important at this stage:
Tfidf — term frequency (tf) counts how often a word occurs in the corpus, while inverse document frequency (idf) identifies how rare that occurrence is.
Once we have the bag of words (BOW) encoded as 0s and 1s, the cosine_similarity function is used to measure how close the user’s query vector is to each sentence vector in the corpus, producing a normalized score the machine can work with.
Next, we write a function for response generation, so that after we provide certain data (the corpus) to it and ask some questions, we get an answer. In case the user asks something the chatbot doesn’t understand (meaning tfidf == 0), the machine should respond with a fallback message, as shown in the sketch below.
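A sketch of the response function described above, using scikit-learn’s TfidfVectorizer and cosine_similarity (the fallback message is illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def response(user_input):
    sent_tokens.append(user_input)  # temporarily add the query to the corpus
    tfidf = TfidfVectorizer(tokenizer=lem_normalize,
                            stop_words="english").fit_transform(sent_tokens)
    vals = cosine_similarity(tfidf[-1], tfidf)  # query vs. every sentence
    idx = vals.argsort()[0][-2]   # best match other than the query itself
    scores = vals.flatten()
    scores.sort()
    best = sent_tokens[idx]
    sent_tokens.pop()             # restore the corpus
    if scores[-2] == 0:           # tfidf == 0: no word in common
        return "I am sorry! I don't understand you."
    return best
```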
8. Defining conversation start/end protocols
In this section, as soon as the user greets, the chatbot responds with a greeting message, and when the user says bye, it quits. In between, any questions related to our corpus are answered, which makes this chatbot interesting.
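Putting it all together, the conversation loop can be sketched as:

```python
print("BOT: Hi! Ask me anything about our corpus. Type 'bye' to exit.")
while True:
    user_input = input("YOU: ").strip().lower()
    if user_input == "bye":
        print("BOT: Bye! Take care.")
        break
    greet_reply = greet(user_input)
    if greet_reply is not None:
        print("BOT: " + greet_reply)
    else:
        print("BOT: " + response(user_input))
```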
9. Chatbot queries and responses
Now our simple chatbot is ready to respond to the user. Type in an input such as “hi” and it will respond with one of the greeting responses, chosen at random. When we ask a question, it responds with the related words and sentences, which makes sense. Sample chatbot interactions are shown below.
Note: since it is a simple chatbot, it will not answer some of the direct questions, like “what is data science” and things like that.
10. Conclusion
This is one of the simplest chatbots you can build, with very few lines of code. Of course, if you want a more sophisticated chatbot, then it all depends on the scale and vastness of the corpus we supply for training, and on the complexity of the code that helps the chatbot learn and respond to the user’s questions.
I hope these baby steps help my fellow friends and data science aspirants dig deeper and build more complicated chatbots as per their requirements.
Final code for this project can be found at my GitHub repository.