Chatbots: Development and Applications
What You Will Learn
- 1 Introduction
- 2 Fundamentals
- 3 Applications
- 4 Development
- 5 Future Development
- 6 Conclusions
This work gives a general introduction to chatbots by explaining what they are, what they can be used for and how to develop them. No previous domain-specific knowledge is required.
As of writing, topics around chatbots are receiving increasing attention from the media as well as numerous investments from different actors in the industry. At the same time, few potential users know that chatbots exist or in which areas a chatbot could be of assistance. The topic is equally unknown to developers. While the term chatbot is commonly used in the media, its meaning mostly remains ambiguous. There is a need for further explanation of what chatbots are and for further analysis to identify well-suited applications for chatbots. In addition to spreading knowledge about the potential of chatbots and their use cases, more developers should be enabled to create new, innovative chatbots.
The lack of knowledge can be solved by providing answers to the questions of what chatbots are, what benefits they bring and how to create them. An appropriate definition of chatbots can be given by analyzing the fundamental meaning of the term chatbot and by exploring past and current applications. Use cases of chatbots can be identified in existing products. Market trends and attributes of media and technology can be analyzed to find new potential scenarios for the usage of chatbots.
Development is best explained by creating a real chatbot and by using it to present the general principles of the development process. Explaining what chatbots are, demystifying what to use them for and presenting how to create them will help more people to use and create chatbots, and thereby accelerate the development of the chatbot ecosystem. Innovation in technology and the creation of new solutions can help to automate and simplify more tasks, which gives people the opportunity to focus on more interesting issues and accomplish more things. Chatbots have the potential to simplify and automate many existing tasks and thereby accelerate overall technological progress.
The structure of this work follows the three main questions. To begin with, the terminology is defined and applications are explored to form a definition and understanding of what chatbots are. Afterwards use cases of chatbots are identified not only through the collection of existing examples, but also through the exploration of future potentials by analyzing attributes of the relevant technologies. The second half of the work is a case study for the development of a chatbot. The presented example guides through the process of designing user interactions for a chatbot and additionally explains architectural decisions and technological choices, which provide a basis for other developers to build on when creating new chatbots in the future.
Before analyzing a topic it is necessary to have common definitions of its vocabulary. This chapter helps to align previous assumptions.
The term chatbot consists of two other terms – chat and bot. The meaning can be better understood by examining the two components separately. The Oxford Dictionary defines chat as “an informal conversation” and more specifically as “the online exchange of messages in real time with one or more simultaneous users of a computer network”. As apparent in this definition, conversations play a central role in chat and therefore chatbots. Other noteworthy aspects of this definition are the inherent informal format of a chat, and the traits of being online and real time.
Informality does not have to be seen as a strict requirement; however, a chat message and, for example, a classical letter have different degrees of formality.
Being online and thereby not bound to a specific geographic location, device or other physicality can be seen as a critical foundation for determining potential types of systems suitable for such media.
The aspect of limiting communication to real time implies restrictions on possible interactions and sets a baseline for the expected user experience. This also excludes the usage of certain technologies which do not support the desired responsiveness.
A conversation is defined as “a talk, especially an informal one, between two or more people, in which news and ideas are exchanged”. Fundamental to this definition is that there are always at least two parties involved in communication and that information is exchanged. Keeping that in mind, the systems involved should always both receive and provide information; chatbots cannot work with solely unidirectional interaction.
Bot is defined as being “(chiefly in science fiction) a robot” with the specific characteristics of representing “an autonomous program on a network (especially the Internet) which can interact with systems or users, especially one designed to behave like a player in some video games”.
Foremost this provides the information that bots, including chatbots, are programs. The creation of a chatbot implies the creation of an artifact in the form of a computer program. Furthermore, the aspect of autonomy and communication over a network can be connected with the previously described trait of a chat to be online.
The program is given autonomy by not being bound to any specific device. Building on this allows for different solutions than a scenario where the user is in full control of a program’s behavior. Lastly, there is a hint in this definition pointing out that a bot can often be seen as a player in a game. This trend towards game-like mechanisms and the previously mentioned informality suggest the utilization of playful interactions.
Combining these definitions, a chatbot can be defined as an autonomous computer program that interacts with users or systems online and in real time in the form of conversations, which are often playful and informal.
Having defined the concept and character of chatbots, one apparent attribute of the technology under examination is that its domain of application involves interaction with one or more users. Any form of interaction requires an interface as a way to interact. An interface is generally described as a “point where two systems, subjects, organizations, etc. meet and interact” and in the area of computing, it can be further defined as a “device or program enabling a user to communicate with a computer”.
This definition leaves a broad array of possible manifestations. However, the range of suitable means of communication can be further narrowed by including another aspect of the previous definition, namely that interaction happens in the form of conversation.
Many communication mechanisms can be excluded by focusing on this characteristic of the interaction. The term conversation strongly suggests the usage of natural human language as the base for interaction while discouraging the idea of using an interface consisting purely of static or
Further, conversations are not limited to written language. Verbal conversation is also a possible interface for communication.
Focusing on written, text-based communication, it becomes apparent that not all text-based interfaces fit the characteristics of a conversational interface.
Classical command line interfaces, which are often used to interact with computers, are one example of text-based interfaces that do not have the attributes of conversational interfaces. As implied in the naming, these interfaces use commands for interaction. A command is an “authoritative order” which contrasts with a conversation.
The term conversation has a connotation of complex, non-linear communication where each involved party understands the underlying ideas communicated as opposed to merely receiving the characters the words consist of.
Understanding of the intentions a user has and the ability to adjust interactions accordingly set conversational interfaces, and thus chatbots, apart from other text-based interfaces.
This chapter explores different applications of chatbots to understand the characteristics of a chatbot in practice and to see the kind of applications chatbots can be used for. Starting with past work and looking at the present in subsequent sections, different approaches and products are presented. Furthermore, today’s ecosystem is portrayed, including an overview of available platforms, existing products and current approaches for the creation of chatbots. Lastly, the anticipated advantages, potentials and promises of chatbots are identified.
Before exploring new technology, one should examine prior work and learn from past ideas, both successful and failed attempts.
This section presents a selection of events from the last century which introduced the ideas that formed the present definition of a chatbot. It is not an attempt to give an all-encompassing overview of the history of computing; instead, the aim is to explain where the concept of chatbots and the interest in creating them originated.
The Turing Test
Even before the term chatbot was coined, people started working on machines that interact with humans through natural language.
A first milestone was the 1950 paper Computing Machinery and Intelligence by Alan Turing. The ideas he formulated back then are still fundamental to the concept of a chatbot in today’s world and his thoughts are still central to many discussions about artificial intelligence. The most famous idea from this paper is the so-called Turing Test, which is meant to decide whether a machine possesses human-like intelligence or not.
Originally, Turing called the test the imitation game. The experiment consists of a human interacting with two parties via textual messages. One of the parties is another human and one is a machine. The test subject does not know upfront which party is the machine and which one is the human, only that one of them is a machine. During the game, the human interacts with the other parties only through textual messages but is free to use any variation of messages. If the human is not able to tell which of the two parties is the machine and which one is the human, the machine passes the Turing Test.
When creating a chatbot or another kind of artificial intelligence, this test can still be applied to test the human-likeness of the created machinery.
While Alan Turing did not invent the first chatbot, the Turing Test was a crucial motivation for the following developments and even today the test still remains to be challenged by new systems.
Fourteen years after the Turing Test was introduced, Joseph Weizenbaum started working on what would become known as the first program to pass a variation of the Turing Test. He began working at the MIT Artificial Intelligence Laboratory in 1964 and released the ELIZA program in 1966, which can thus be considered the first known chatbot.
ELIZA creates responses to messages that a user inputs via a text-based terminal.
The original version of ELIZA was written in a programming language called MAD-Slip, which was also created by Joseph Weizenbaum himself, and ran on the IBM 7094 computer.
The most famous implementation of ELIZA is called DOCTOR and simulates a Rogerian psychotherapist.
Rogerian psychotherapy is a person-centered therapy intended to let the client realize their own attitudes and behavior. Although relying on mostly simple methods, it remains a popular treatment. Most answers the therapist gives are questions for further details about information which the client mentioned previously. Furthermore, clients mostly keep the assumption that a therapist has specific intentions even when asking non-obvious questions.
ELIZA takes advantage of the structure of the English language; the program takes apart sentences via pattern matching and keywords and reuses phrases after substituting certain words. For example, a client’s answer “Well, my boyfriend made me come here.” can be transformed to “Your boyfriend made you come here?”
Certain signal words, as well as sentences containing no signal words at all, can be answered with generic, static phrases. After detecting the signal word “alike” in the sentence “Men are all alike.”, ELIZA could pick the programmed phrase “In what way?” as an answer.
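The mechanism described above can be illustrated with a minimal sketch. It is not Weizenbaum's original script: the rule list, the pronoun table and the fallback phrase are assumptions chosen only to reproduce the two examples from the text.

```python
import re

# Pronoun swaps applied when a fragment of the user's sentence is echoed back.
# The table is a hypothetical, deliberately tiny selection.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are"}

# (pattern, response template) pairs, checked in order. These rules are
# assumptions made for this sketch, not ELIZA's original script.
RULES = [
    (re.compile(r"\balike\b", re.IGNORECASE), "In what way?"),
    (re.compile(r"\b(my .+)", re.IGNORECASE), "{0}?"),
]

FALLBACK = "Please go on."  # generic phrase used when no signal word is found


def reflect(fragment: str) -> str:
    """Swap first-person words for second-person ones, word by word."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(message: str) -> str:
    """Return the response of the first matching rule, or the fallback."""
    cleaned = message.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(cleaned)
        if match:
            groups = [reflect(group) for group in match.groups()]
            reply = template.format(*groups)
            return reply[0].upper() + reply[1:]
    return FALLBACK


print(respond("Well, my boyfriend made me come here."))
print(respond("Men are all alike."))
```

Even this small rule set shows why the approach can feel convincing: the reply reuses the user's own words, so it appears to refer to what was said without the program understanding any of it.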
Knowing about the nature of Rogerian psychotherapy, Joseph Weizenbaum created ELIZA initially intended as a parody to demonstrate the simple behavior necessary for imitating this therapy. He was surprised that even people who knew about the inner workings of the program ended up having serious conversations with ELIZA. In one anecdote Joseph Weizenbaum told how his secretary, after starting a conversation with ELIZA, asked him: “Would you mind leaving the room, please?”.
Prompted by the success of the experiment, he published the book Computer Power and Human Reason: From Judgment to Calculation in 1976, which presents his thoughts about artificial intelligence, including the differences between machines and humans and the limits of computer intelligence. In the book, he admits that he had not realized “that extremely short exposure to a relatively simple computer program could induce powerful delusional thinking in quite normal people”. This observation gave rise to the term Eliza Effect, which describes the tendency of people to quickly assume that computers behave like humans. The term is still in use today.
Another famous program was published by the psychiatrist Kenneth Colby in 1972. He created PARRY as an attempt to simulate a human with paranoid schizophrenia. The implementation of PARRY is far more complex than ELIZA, but it also models a personality including concepts of how to conduct conversations. The most famous demonstration of PARRY was at the first International Conference on Computer Communications (ICCC) in 1972 where PARRY and ELIZA had a conversation with each other. Later on, in scientific experiments, PARRY also passed a version of the Turing Test.
Further programs have been created to pass the Turing Test and gained the attention of the public. Jabberwacky was started in 1988 and attempted to learn from the user’s input. In 1991, Dr. Sbaitso was released as an ELIZA-like demonstration for a sound card and was one of the first chatbots for MS-DOS-based personal computers. A.L.I.C.E. was released in 1995 and became famous for its realistic behavior, which is based on heuristic patterns instead of static rules.
The origin of the term chatbot itself can be seen in a paper called “ChatterBots, TinyMuds, and the Turing Test: Entering the Loebner Prize Competition” published by Michael L. Mauldin in 1994, whereby chatbot can be seen as a variation of the original term ChatterBots.
To this day, the Turing Test has only been passed within certain limited domains, and there is no chatbot yet that is able to simulate general human behavior indistinguishably from real human beings.
However, although creating a system that is as human-like as possible remains a popular challenge, not all applications of chatbots benefit from this type of behavior. Many systems are instead optimized to provide quick and efficient interactions and behave accordingly without attempting to hide their artificiality.
With more than sixty years of history, the concept of chatbots is not a recent discovery. People have been fascinated with the idea of being able to talk to computers for a long time, but past attempts have mostly been simple experiments or applications focused on the aspect of entertainment associated with machines inspired by science fiction.
Lately, the technology industry and press are increasingly interested in the topic of conversational interfaces and chatbots. The reinforced interest can be explained by observing recent developments of technology and current market trends.
Since Apple’s release of Siri in 2011 customers have become more aware of the possibilities of conversational interfaces. Even though the capabilities were limited at that time, functionality improved quickly in the following years, which can be attributed to the new competition Apple triggered in the market.
At the same time, artificial intelligence gained new traction due to the success of using artificial neural networks for machine learning.
The concept of artificial neural networks “dates back to the 1950s, and many of the key algorithmic breakthroughs occurred in the 1980s and 1990s”, but only now are they being applied successfully. This is mainly due to the increased computing power available today. A second crucial condition is the need for a large amount of available data; big Internet companies specialize in collecting data, originally intended to better target advertisements, which they can now use to train artificial neural networks.
Artificial neural networks are so named because they are modeled after neural networks in the human brain; instead of specifying rules for what a program should do, the machine learns from examples in ways similar to how humans learn. Some tasks that are too complex to be solved with a rule-based program can now be solved by collecting enough example data and letting the machine figure out the solution instead.
These techniques can also be applied in the field of natural language processing, which is essential to understanding and generating text for conversational interfaces.
With new technical possibilities, more people see conversation as an interface not only as an idea of science fiction movies but instead as something that could be possible in the real world.
Also, recent market trends make chatbots more compelling.
“Computing is rapidly shifting to mobile devices” and “messaging apps have surpassed social networks in monthly active users”. As a result of these developments, users do not have space for complex interfaces on the small screens of their devices, they need solutions that are light-weight in data consumption, and they cannot use complicated keyboard shortcuts. At the same time, users are already spending a majority of their time in instant messaging applications and are therefore well accustomed to chatting as a means of communication.
Originating from the current state of technology the increasing interest in conversational interfaces leads to new platforms, products and applications for chatbots.
Unless the software is distributed with dedicated hardware, software products are designed to be executed by and accessed through other software. The underlying software is the platform a product is created for.
Products that target operating systems such as Microsoft Windows or Apple’s iOS require users to install necessary executable files on their local system. Other software uses the web as a platform, whereby customers use a web browser to access the software over the Internet, while the software itself is executed on another computer referred to as a server.
In the case of chatbots, the target platform can be any medium that allows users to send messages to each other. A chatbot can be seen as a counterpart that the user interacts with in the same way as with a human counterpart.
There are numerous platforms available that fulfill these requirements. While messaging platforms provide means of communication, chatbots function similar to software accessible via a web browser; a server executing the chatbot software is needed and the messaging platform communicates with the server in the same way a web browser communicates with a server on the user’s behalf.
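The server side of this arrangement can be sketched with Python's standard library. This is a minimal sketch under stated assumptions: the JSON field names `text` and `reply`, the echo logic in `build_reply` and the choice to answer directly in the HTTP response are all invented for illustration. Real messaging platforms define their own payload formats and often expect replies through a separate API call.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_reply(text: str) -> str:
    """Placeholder chatbot logic: simply echo the user's message."""
    return f"You said: {text}"


class WebhookHandler(BaseHTTPRequestHandler):
    """Handles HTTP POST requests sent by a (hypothetical) messaging platform."""

    def do_POST(self) -> None:
        # Read the JSON body the platform forwards on the user's behalf.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        reply = build_reply(payload.get("text", ""))
        body = json.dumps({"reply": reply}).encode("utf-8")
        # Answer in the HTTP response (an assumption of this sketch).
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        """Silence the default per-request logging."""


def make_server(port: int = 8080) -> HTTPServer:
    """Create the server; call serve_forever() on the result to run it."""
    return HTTPServer(("127.0.0.1", port), WebhookHandler)
```

Calling `make_server().serve_forever()` would start the server locally. The point of the sketch is that the chatbot logic lives entirely in `build_reply`, while the messaging platform plays the role a web browser plays for a web application.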
Because of the wide variety of available messaging platforms, it is not possible to create an all-encompassing collection of available platforms in the context of this work; the following is an overview of the currently most popular platforms, including their capabilities and their area of focus.
One of the most used online communication platforms is E-mail. E-mail, however, lacks the characteristics required of a chatbot platform, as defined in section 2.1, namely informal communication in real time. This disqualifies E-mail as a platform for chatbots, even though in practice many use cases of chatbots overlap with those that can be solved by automating E-mail.
Although users can choose to express themselves less formally and certain E-mail providers deliver E-mail in a very short time period, this statement is based on the current general use case whereby these two attributes are not given. Still, it is indeed possible for these characteristics to change in the future and nothing fundamental about the E-mail protocols is preventing their usage for chatbots.
A well-known communication technology, which is suited for chatbots, is Short Message Service, or SMS.
SMS is primarily used on mobile devices, and users are identified by their phone numbers, which means that communication has to happen through cellular network providers. However, the technology is limited in the number of characters per message, users are often charged by the number of messages sent, and communication is limited to text only. “End of the year 2009 user level for SMS globally was 78%, ie 3.6 Billion” users worldwide, which means it remains one of the most popular communication channels; it is therefore an interesting option for applications with a widespread audience or applications requiring a low entry barrier.
Since chatbots can communicate not only via text but also using voice, phone calls are a possible medium too. They are a common way of communication available to a large number of people. However, for a chatbot that relies solely on voice for communication without any visual feedback, the design of the user experience has to be thought out especially carefully. Furthermore, parsing and generating voice in addition to understanding and generating natural language comes with further technical costs.
Apple’s Siri is another voice-based system available, but as of writing, it is not accessible as a platform for external services. Voice-based systems that can be targeted as platforms are Amazon’s Alexa and Google Assistant. Both systems are general assistants helping the user with a variety of tasks and in both cases, tasks can be delegated to third parties.
Currently, popular target platforms for chatbots are messenger platforms. They are primarily text-based, they mostly come without cost for end users and, in addition to text, they often support multimedia formats such as pictures, audio, locations and stickers. Some platforms also allow developers to display sliders, buttons and other graphical interface elements, which can help to guide users instead of exclusively relying on text for communication. At this point, it is not feasible to create a comprehensive list of available features for each platform, since the field is innovating constantly and many of the platforms add new features almost every single month.
With one billion active users in January 2017, Facebook’s WhatsApp is one of the most popular messenger applications. However, it does not currently allow automated access to its platform, so using it as a chatbot platform is not a viable option. The second messenger application belonging to Facebook, Facebook Messenger, is equally popular, also with one billion active users in January 2017. In contrast to WhatsApp, Facebook Messenger provides a platform for developing chatbots. Next in popularity are two messenger platforms that are mainly popular in China, QQ Mobile and WeChat. Neither currently provides a specific chatbot platform, but there have been successful attempts at creating chatbots for them. Further popular messenger applications include the Japan-focused Line, Microsoft’s Skype, Telegram and the more business-focused Slack. All of these applications provide dedicated platforms for the development of chatbots.
The choice of platform primarily depends on the target market. Different audiences prefer different platforms and therefore in certain scenarios, one product might be better suited than another.
One important factor can be the geographical location of the target audience. As visible in figure 3.1, Facebook Messenger and WhatsApp are the leading messengers globally and, as previously mentioned, the markets in China and Japan are dominated by WeChat and Line respectively, but the data also shows some lesser-known trends; for example, the Thai market is also dominated by Line, and in Iran, Telegram is the most popular messenger application.
As with the creation of other kinds of software, it is possible to release the same chatbot software for multiple target platforms, whereby the interaction between software and platform has to conform to the technical details and protocols of each environment, the usage of platform-specific features has to be adapted individually and the user experience needs to be designed to fit each environment’s expectations.
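One common way to organize such a multi-platform release is to keep the conversation logic independent of any platform and to wrap it in one thin adapter per platform. The following is a minimal sketch of that idea; the platform names and both payload shapes are invented for illustration and do not correspond to the message format of any real messaging platform.

```python
from typing import Callable, Dict


def core_reply(text: str) -> str:
    """Platform-independent chatbot logic (placeholder behavior)."""
    if "hello" in text.lower():
        return "Hello! How can I help?"
    return "Sorry, I did not understand that."


# Each adapter translates between one platform's payload and plain text.
# Both payload shapes are hypothetical.
def handle_platform_a(payload: dict) -> dict:
    reply = core_reply(payload["message"]["text"])
    return {"message": {"text": reply}}


def handle_platform_b(payload: dict) -> dict:
    reply = core_reply(payload["body"])
    return {"body": reply, "format": "plain"}


ADAPTERS: Dict[str, Callable[[dict], dict]] = {
    "platform_a": handle_platform_a,
    "platform_b": handle_platform_b,
}


def dispatch(platform: str, payload: dict) -> dict:
    """Route an incoming payload to the adapter registered for its platform."""
    return ADAPTERS[platform](payload)


print(dispatch("platform_a", {"message": {"text": "Hello there"}}))
print(dispatch("platform_b", {"body": "Hello there"}))
```

Platform-specific features, such as buttons, would be added inside the individual adapters, which keeps `core_reply` unchanged when a new platform is supported.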
There are existing frameworks that allow developers to develop a chatbot once and release it to multiple platforms at the same time without any adjustments to individual platforms. One such framework is API.ai by Google. As of writing, it supports 16 different integrations, including platforms such as Facebook Messenger, Skype, and Slack. However, this framework is more than an adapter to different platforms; it is a complete solution for developing chatbots. API.ai comes with built-in support for natural language processing, so a chatbot is able to have basic conversations out of the box, and developers can train chatbots on new topics by just providing example conversations while the framework handles all of the language parsing. Detected keywords and intents can be forwarded to be handled with custom logic, although for simple use cases this might not be necessary and a chatbot can be developed without writing a single line of code.
However, there are also limits to platforms like API.ai.
First, intent parsing is, in the case of API.ai, currently limited to a finite list of topics. If a chatbot handles topics from a domain unknown to API.ai, this solution is no longer sufficient. Further, developers have no control over the applied machine learning and natural language processing algorithms. There are no possibilities for customization if the parsing results or the generated responses do not match the requirements.
Additionally, while API.ai currently supports 15 different languages, a chatbot is limited to these available languages. If another language needs to be supported, a lot of work might be necessary because the system needs to be recreated with a custom solution.
Moreover, while API.ai supports many platform-specific features such as custom formats for message content and quick reply buttons, some features unique to single platforms are not supported, and since the space is evolving at such a rapid pace, future extensions might not be available either.
Even though API.ai is at the moment free to use for everyone, its operator will very likely look for a sustainable business model in the near future. The business could be made sustainable by charging for the service, collecting data which Google can use elsewhere, binding customers to its services and gaining market share. Regardless of which monetization strategy is chosen, a developer has to be aware that such a service is an external dependency outside their control.
Nowadays there are two fundamental ways a chatbot can communicate with users; some platforms provide user interface elements, such as buttons, that can be displayed to users; otherwise, communication is done solely with natural language. Interface elements limit user input to a number of predefined actions, while natural language has no restrictions on possible inputs. By suggesting possible actions in the form of a list of buttons or similar, possible interactions become obvious and simpler for the user. Even when limiting interaction to certain actions, the main characteristic of a chatbot remains; the interaction is still structured in the form of a conversation, only that the choice of input is restricted.
In certain scenarios, there are too many possible user inputs to fit in a fixed list. In these cases, natural language is a more appropriate input method.
Custom input has to be parsed to extract information. As an example, when the user is asked for a date and time, and repeating times are also allowed, a user could express the time as “every second Tuesday at 6 am and 9 pm”, while in a traditional user interface this would require a non-trivial number of interface elements.
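To give an impression of what such parsing involves, the following minimal sketch extracts a structured schedule from the example phrase using regular expressions. It covers only this narrow phrasing, and it assumes that “every second Tuesday” means every two weeks rather than the second Tuesday of each month; the names `Recurrence` and `parse_recurrence` are hypothetical. A production system would typically rely on a natural language processing service instead.

```python
import re
from dataclasses import dataclass

# Words for the repetition interval in weeks; "" means "every week".
# Assumption: "second" is read as "every two weeks".
ORDINALS = {"": 1, "other": 2, "second": 2, "third": 3, "fourth": 4}
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

SCHEDULE = re.compile(
    r"every\s+(?:(?P<interval>other|second|third|fourth)\s+)?"
    r"(?P<weekday>" + "|".join(WEEKDAYS) + r")\s+at\s+(?P<times>.+)",
    re.IGNORECASE,
)
TIME = re.compile(r"(\d{1,2})(?::(\d{2}))?\s*(am|pm)", re.IGNORECASE)


@dataclass
class Recurrence:
    interval_weeks: int
    weekday: str
    times: list  # (hour, minute) tuples in 24-hour time


def to_24_hour(hour: int, meridiem: str) -> int:
    """Convert a 12-hour clock value to 24-hour time (12 am -> 0, 12 pm -> 12)."""
    hour %= 12
    return hour + 12 if meridiem.lower() == "pm" else hour


def parse_recurrence(text: str):
    """Return a Recurrence for inputs like the example, or None if unparsable."""
    match = SCHEDULE.search(text)
    if match is None:
        return None
    times = [
        (to_24_hour(int(hour), meridiem), int(minute or 0))
        for hour, minute, meridiem in TIME.findall(match.group("times"))
    ]
    if not times:
        return None
    return Recurrence(
        interval_weeks=ORDINALS[(match.group("interval") or "").lower()],
        weekday=match.group("weekday").lower(),
        times=times,
    )


print(parse_recurrence("every second Tuesday at 6 am and 9 pm"))
```

Returning `None` for unrecognized input gives the chatbot a clear signal to ask the user to rephrase instead of guessing.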
If natural language is used for communication, it should be clearly stated what kind of input is expected, so that the user knows which topics and which variations of input the system understands.
One of the main challenges with a natural language interface is handling the non-restricted interaction in a coherent way; since user input is in no way limited to a single topic, all sorts of unexpected user reactions have to be accounted for. As a consequence of this, there are certain conversations every chatbot has to be able to handle in some way. This includes, but is not limited to, simple small talk such as questions like “How are you today?”.
Most platforms allow for a combination of the discussed communication mechanisms; predefined actions can be displayed as suggestions to the user, while the user is also able to use natural language. Combining both methods makes it possible to take advantage of the benefits of each; there is a clear guideline for interactions and, simultaneously, the user is free to express any possible custom input.
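A combined approach could look like the following minimal sketch, in which a prompt carries button suggestions and the same handler accepts either a tapped button or free text. The actions, keywords and the `OutgoingMessage` structure are hypothetical, and the keyword lookup is a crude stand-in for real natural language processing.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical actions: button label -> (action id, keywords accepted in free text).
ACTIONS = {
    "Track order": ("track_order", {"track", "order", "delivery"}),
    "Opening hours": ("opening_hours", {"open", "hours", "closed"}),
}


@dataclass
class OutgoingMessage:
    text: str
    # Rendered as buttons on platforms that support them.
    suggestions: list = field(default_factory=list)


def prompt() -> OutgoingMessage:
    """Ask an open question while suggesting the predefined actions."""
    return OutgoingMessage("How can I help you?", list(ACTIONS))


def resolve_action(user_input: str) -> Optional[str]:
    """Map a tapped button or a free-text message to an action id."""
    normalized = user_input.strip().lower()
    # A tapped button is assumed to arrive as its exact label.
    for label, (action_id, _keywords) in ACTIONS.items():
        if normalized == label.lower():
            return action_id
    # Otherwise fall back to a keyword lookup in the free text.
    words = set(normalized.replace("?", " ").replace(".", " ").split())
    for action_id, keywords in ACTIONS.values():
        if words & keywords:
            return action_id
    return None  # unexpected input, e.g. small talk, needs separate handling


print(prompt().suggestions)
print(resolve_action("When are you open tomorrow?"))
```

Input that maps to no action, such as “How are you today?”, yields `None` here, which is where the handling of small talk and other unexpected input would start.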
With the increasing number of messaging platforms opening up for chatbot development, companies have become interested in releasing their product for this new format and some companies also create new products focusing solely on the chatbot market. It is still a new, not fully formed market, but there are certain trends for what companies are interested in creating. One helpful classification of chatbots is categorizing them in terms of the features they provide.
The following categories are adapted from the article “7 Types of Bots” by Dotan Elharrar, a Product Manager at Microsoft AI & Research.
Single-feature Chatbots A large number of chatbots provide only one single feature. These chatbots are limited in functionality but simple to use. One example is a Facebook chatbot called Instant Translator; in the beginning, the user selects one language to translate to. From there on Instant Translator simply translates all text it receives into the selected target language.
Proactive Chatbots This category describes chatbots which push information to the user instead of answering questions in conversations. Here the user does not need to interact with the chatbot, but only uses it as a service to receive information at certain times. One example would be a service which sends the user a daily weather forecast. Another use case is the chatbot for Facebook Messenger from the airline KLM; users can use the service to get updates and information about their booked flights.
Group Chatbots There is a range of functionality chatbots can provide when they interact with a whole group of people instead of only a single user. These chatbots are limited to platforms which provide the necessary features to use chatbots in group conversations. A simple example for a group chatbot is called Roll for a messaging platform called Kik; when sending a question to Roll, the chatbot answers with a random name picked from the members of the group.
Simplification Chatbots In a few cases chatbots are used to provide users with a simpler interface to complicated existing tasks, which would traditionally involve many bureaucratic and formal steps. One example is a service called DoNotPay. It is advertised as “the world’s first robot lawyer” and the service helps the user with simple legal problems, such as fighting parking tickets.
Entertainment Chatbots A still popular kind of chatbot is one whose only functionality is having conversations with users. These services don’t interact with other resources apart from the conversation itself. ELIZA, described in 3.1.2 on page 7, belongs to this category.
Personal Assistants This category consists of chatbots that combine many different features and can be seen as platforms of their own. Siri and Alexa, which have been mentioned earlier, belong to this category.
Optimization Chatbots This category tries to make existing products more accessible by creating a chatbot for users to connect to a product. The difference to a simplification chatbot is that an optimization chatbot is not built on an external entity, such as the legal system of a state, but it instead connects to a product a company has full control over. By taking advantage of new platforms, companies like to reduce friction for customers to use their products. The currently most obvious aspect of chatbot platforms is the ease for users to access products. Companies like to optimize the use of their products by making them available via the conversational interfaces of chatbots.
Use cases fitting the category of optimization chatbots can be found across many different industries. The article “100 Best Bots For Brands & Businesses” lists examples from different industries using chatbots to optimize access to their products. Examples include beauty brands such as Sephora, consumer goods like Johnnie Walker, entertainment companies including Disney and Marvel, fashion brands such as H&M, financial services like PayPal, food delivery from stores such as Pizza Hut, e-commerce platforms including eBay, traveling services such as Airbnb and Expedia, airlines like Lufthansa and British Airways and many news outlets including Washington Post, New York Times, Forbes and BBC.
As is apparent from the engagement of many well-established companies, brands are very interested in being present on messenger platforms.
While there is growing interest in targeting messaging platforms as new markets, currently user engagement with available chatbots has not reached the popularity of other channels such as mobile applications yet, and most chatbots created so far do not provide much functionality, but instead redirect users to their existing products.
With increasing interest, engagement and financial investments, the rise of more sophisticated products can be expected in the future.
Advantages of Using Chatbots
As explained previously in 3.2 on page 9, the interest in chatbots and conversational interfaces has increased recently, mainly driven by the popularity of messaging platforms, mobile devices, and personal assistants, and also by the advancements made in the field of artificial intelligence. The conditions are right to think about taking advantage of the possibilities in the newly available market. The remaining question, however, is what chatbots can achieve that existing solutions are not good at, both from a user’s point of view and with regard to the interests of companies. From a user’s viewpoint, chatbots can be seen as alternative interfaces to interact with computers.
Existing interfaces are often not intuitive for humans. Humans need to initially learn how to use technology. With every new application one installs and every new website one visits, there is a new interface to adapt to. “Adjusting to a machine does not come naturally to us. With every app, you need to learn how to use it. … Conversations come naturally to us”.
Conversation is a way of communicating that humans already know how to use. It is the fundamental method for humans to interact with other humans. If it is possible to use this way of communicating to interact with machines as well, the interface would be intuitive for humans by default. “The vision for a chatbot: get machines to respond to questions like a human being”.
Further, there is a trend in consumer behavior of “outsourcing their ‘chores’, such as driving, shopping, cleaning, food delivery, errands” to companies that offer these services.
Service companies are not a new occurrence, however, in the past, it took more effort to coordinate the usage of such services. By using technology to automate many steps of the coordination process, not only the cost can be lowered, but also the friction for customers to use a service is reduced significantly. Managing and coordinating the usage of such services are tasks conversational interfaces are particularly suited for because this is a scenario that profits especially from the simplicity and low friction that characterize conversational interfaces.
Users can also profit from technical advantages chatbots have over native applications and websites.
Providing customers a more intuitive and more direct way of interacting with a company’s product is already a compelling reason for a company to be interested in the new platforms. But there are additional benefits companies can draw from conversational interfaces, which are not perceivable by users.
First, “the cost of developing a chatbot is one-third of what is required in developing a mobile app”. This might not be the case for every product, but in general, creating a chatbot is less work than creating a mobile application, because no custom design is required and no code has to be written for the logic controlling the user interface.
Next, “Chat apps also have higher retention and usage rates than most mobile apps”. Since chatbots are part of a chat application they can take advantage of being where the attention of mobile phone users already is. A chatbot can therefore potentially gain more user engagement than a competing website or mobile application.
Another aspect of using natural language as an interface is that “chatbots are able to gain invaluable data and insights on user behavior”, because firstly, users have the freedom to send any kind of information and feedback, and secondly, in the context of an informal conversation people tend to be more talkative than they would be in a more formal environment. Especially for companies such as media outlets or retailers, being able to further profile users can be a useful aid in tailoring personal experiences for users and targeting them with individual offers. Additional context and user data is also available on the platform itself; when interacting with a user via a chatbot on the Facebook Messenger platform, all public information of the user’s Facebook profile is also available for the company to use for further personalization.
Lastly, the most fundamental reason for a company to be interested in chatbots is the aforementioned popularity and ubiquity of messenger applications. “The question brands and publishers now face is how to engage with these private social network users”. When the attention of users is shifting away not only from non-digital media but also from other mobile applications and traditional social networks, companies need to find a way to reach users where they spend most of their time.
While chatbots have many promising properties, chatbots also come with restrictions which frame them as unfeasible solutions in certain scenarios.
One fundamental property of the architecture of chatbot systems is that they are not software running on the device of the user. They mimic the nature of chat, that two parties communicating with each other are using two separate devices. This fundamental design of chatbot implementations is similar to the basic idea of browsing the world wide web, whereby a network connection is a mandatory requirement. Typically network connections are part of the Internet and such a system can be described as purely online.
This underlying design has certain implications for the usage of chatbots.
First and foremost, this implies that all scenarios, which are obliged to work without a network connection, cannot make use of chatbots. Exemplary areas of applications affected by this implication are applications that forbid network connections due to high-security standards and also products targeting rural and remote locations with unstable network connections.
Another consequence is that a certain amount of latency is unavoidable with networked applications. When information needs to be transferred from one device to another, it needs to be transported through more physical space.
For a chatbot based on Facebook Messenger, this means information needs to be transferred from the user’s device to a data center, where Facebook is operating their Messenger platform, further to another server where the developed chatbot software is running, and back to the user through the intermediary data center. Many factors influence the accumulated latency, of which certain factors can hardly be influenced by the developers and operators of the chatbot software. Aspects of networking such as the capabilities of the user’s devices, the conditions the Internet service provider of the user is operating under, domain name lookups, IP packet routing and associated packet loss, the locations of Facebook’s data centers, and the performance of event processing and forwarding of the Messenger platform, are only a selection of unknowns when operating a chatbot.
The unavoidable latency and the unpredictable parties involved in the networking layout make it clear that chatbots are not suitable for time-critical applications. There is no guarantee whether a user receives a response within an unnoticeable period of milliseconds or only after a delay of tens of seconds.
Not only the dependency on the network limits the capabilities of chatbots, but also the very idea of using chat as a medium. While text is undoubtedly a powerful medium, there are certain applications where other media are better suited. Many messaging platforms already enhance communication by adding interface elements such as buttons and by allowing multimedia content such as images and videos to be sent. However, the linear layout fundamental to text communication remains and its immutable nature limits interactivity. While chat can be a very simple and efficient way of communicating and it is appropriate in many scenarios, other use cases are addressed better with existing solutions such as native or browser-based applications. Obvious candidates which are better served with other technology are photo and video editing applications and also 3D gaming.
Due to the limitations of chatbots as a medium, it can be anticipated that, while they address many needs of human-computer interaction, they are not universally applicable and they are not able to replace all other media. Much in the same way that the invention of the radio has not replaced newspapers and neither did the introduction of television replace the radio, chatbots can be seen as a new possible communication concept, but they can coexist with previous technology in a way that each of them focuses on the area they are best suited for. Understanding not only the merits but also the limitations of a medium is essential for deciding whether the medium is a good fit for a use case or if another medium is a better-suited solution.
To not only understand where chatbots can be applied but to also understand how a chatbot can be created in practice, this chapter guides through the development of an example chatbot. The description of the development process is kept on a more general level and many aspects of the presented architecture and solution can be reused when creating other kinds of chatbots.
First of all, a suitable application needs to be chosen and specified in its requirements. Before starting with the implementation, possible usage scenarios need to be defined and matching user stories have to be created. When all requirements are set, appropriate platforms, tooling, and solutions can be selected. After all preparations are done, the technical implementation can take place.
Choosing a Practical Example
A suitable application for a chatbot needs to be selected before thinking about implementation details.
As illustrated earlier, chatbots can be used to cater to a wide variety of applications. Since this application should be a demonstration of the different aspects of developing chatbots, it should not be too simplistic in scope. An appropriate example covers more than one of the product categories described in section 3.5 on page 16, while being, at the same time, not too technically challenging in a specific problem domain outside of chatbot development. A service that accepts image files and returns the same picture with a filter applied might be an interesting and entertaining use case for a chatbot. However, it would not be an adequate example to give a general introduction to chatbot development: even though it would be a technically interesting task to solve regarding image processing, it would not illustrate many technical details in the domain of chatbot development.
The example application selected here is a system for individual language studying.
While this idea arises from a personal need for such a system and an observed shortage in the functionalities existing solutions provide, this application also fulfills the stated requirements of a suitable example application.
The language studying chatbot covers multiple of the categories defined in section 3.5. It is mainly an optimization chatbot improving the task of language studying, but the chatbot can also be classified as a proactive chatbot because it can use proactive features to notify the user when it is time for studying.
Although there is a necessity to understand parts of the problem domain of language studying, the required domain-specific knowledge is minimal.
The idea is to build a system independent from existing learning resources that users can use to study their own vocabulary. Since there is no need to focus on content for specific languages, the main focus remains to implement the chatbot features, which are primarily, that a user is able to input new vocabulary, and then the system tests the user’s knowledge in appropriate studying intervals.
When starting a new project, it is helpful to research existing solutions which solve similar problems. There are already a variety of existing software applications for language studying, which have various use cases and solve different problems.
One significant separation is between software that includes content and software users can utilize and customize to study personal content.
The first segment is the most prominent. This software is intended to enable people to self-study language and, at least partially, replace physical language courses. The main reason for the prominence of this segment is the ability to sell content. Professionally curating the curriculum for a language course requires teaching expertise and takes a lot of effort. Because of this, good content for language studying remains expensive, not only in software but also in the form of physical textbooks. One popular example from this segment is Duolingo. “Duolingo has courses in a handful of languages… The courses are structured in a way like games as well – you earn skill points as you complete lessons”.
Interestingly, Duolingo recently released a chatbot as part of their iPhone application which enables the user to have a text-based conversation about certain topics with a chatbot to learn the appropriate phrases for the given scenario. Although the topics and possible phrases are restricted in each scenario, this is a first example of how chatbots can be used for language studying.
The second segment consists of software which does not provide users a guideline for what to study but is instead intended to support users studying their own vocabulary. Most of the software belonging to this segment are attempts at bringing traditional flashcards to digital media.
One of the most established applications in the second segment is Anki, which has existed for more than ten years and provides a flexible, but also rather complex, interface to create various kinds of studying material. Another more recent competitor is Memrise, which gives users a more intuitive interface that also includes several gamification1 features to make the studying process more appealing. Neither of the mentioned products is restricted to a field of study, and users are able to add their own content. Furthermore, both products also offer mechanisms for users to share content with others, which allows users to reuse what other users created.
For the chatbot example planned in this work, the second segment is more fitting, since there are no resources in the current context to curate professional content.
The following implementation is an attempt at creating a software product with a conversational text interface that is similar in its use cases to products like Anki and Memrise while making use of the unique features that chatbots provide as a medium.
Use Cases and Requirements
Building on the analysis of existing solutions in 4.2 on page 24, features for the chatbot need to be specified. An effective method for gathering crucial features is to find potential users and create usage scenarios for their individual needs.
To apply this method, the fundamental problem the application is solving needs to be defined first.
The issue this chatbot is trying to help with is the study of individual vocabulary. The goal is not to provide studying material in a way a language course or a textbook does, but instead to complement these resources with a tool to study new vocabulary and phrases students pick up while studying or in different situations in everyday life.
The following are two hypothetical scenarios of individuals who might use the chatbot and profit from it in different ways.
Clara is a 22-year-old American. She moved to New York City to go to university. Currently, she is in the last year of her bachelor’s degree in economics. In university, she signed up for an evening class in Mandarin. She uses Facebook Messenger every day to talk to her friends and she discovered the chatbot when a friend sent her a link. For her, the most difficult part of the studies is to write hànzì, the Chinese characters. Now Clara uses the chatbot to write down vocabulary in hànzì during her class, and at home, she revises the new characters by going through them using the chatbot and writing the characters down on paper.
Pierre is 29 years old and was born in Bordeaux, France. He studied computer science and a year ago he moved to Berlin where he found a job in a startup. At work, all communication is done in English since the team consists of people from all around the globe. Because Pierre is not a native English speaker, he picks up new words at work almost every single day. Since moving to Berlin, Pierre has also made a few German friends and he tries to pick up new words they teach him. He found the chatbot on a news website for technology products, and since then whenever Pierre learns a new word he grabs his phone from his pocket and adds the word to the chatbot. Since the chatbot has no restrictions on what to learn, Pierre uses it to save both German and English vocabulary in one place. Pierre’s daily commute to and from work takes 40 minutes each way. Now he uses his commute time to take out his phone and review new vocabulary he picked up in the previous days.
All necessary functional requirements can be extracted from the above-defined user stories. First, a user needs to be able to add new vocabulary. There should not be any restrictions on what can be added and vocabulary should not be limited to single words, because in many cases it is more helpful to add whole phrases instead. Each vocabulary entry consists of the phrase the user tries to memorize and an explanation to help to understand the meaning of the phrase.
Next, the chatbot should provide a way to revise vocabulary. There should be two possible modes for revising: one where users can click a button to tell whether they remembered the phrase correctly or not, and a second mode whereby users type out the phrase themselves. In each case, the system should keep track of whether users knew the correct solution or not. Lastly, it is necessary to determine what to study next. A user should not be required to think about what or when to review. The chatbot needs a system to decide the review time for each vocabulary entry, and ideally, the chatbot sends the user a message when vocabulary is ready to be reviewed.
These three main features can be seen as a sufficient minimum viable product, or MVP.
For demonstration purposes, it is desired to keep the product as simple as possible. The knowledge that can be taken from making decisions about the implementation and walking through the process of creating the chatbot, is mostly independent of this particular product and can be applied to the development of other chatbot products.
Since this is a simple example, non-functional requirements remain minimal. Availability of the service is not a priority, but chatbot software can be scaled similarly to other software, and redundancy can be used to ensure availability. Since messaging platforms act as an intermediary between users and the chatbot software, most platforms also re-send missed messages in case the chatbot is unavailable. The fact that the platform helps to ensure availability further lessens the priority of addressing it in the chatbot software itself.
Similarly, security is not the main focus here, because the messaging platform itself already handles certain security-sensitive functionality such as authentication and encryption of communication. A production scenario, though, would require further care for securing the service.
Performance is equally not a major concern. Because the scope of the example application is limited, the domain-specific logic remains inexpensive in computation. The main performance bottleneck is the aspect of networking and the unknown parties involved, as mentioned in 3.7 on page 20. Employing performance-improving solutions for networking issues won’t be a part of the example chatbot, but performance can be improved by choosing strategically located data centers for deploying the chatbot software.
A more central requirement is reusability. Although the example focuses on solving a specific task, the software architecture should be designed in such a way that appropriate parts can be reused for other chatbots in the future. To ensure reusability, the software should be documented, stable and extensible.
Usability can be seen as the most important non-functional requirement. The focus of developing the example chatbot is to design a good user experience and to explore how the interface and interaction design can be best accomplished with the given medium.
After knowing the features the chatbot should support, technical requirements can be extracted and appropriate technology chosen for the implementation.
As discussed in 3.4 on page 15, there are two fundamental ways for communicating with a user, interface elements and natural language.
Since the previously defined features for the minimum viable product of the example chatbot do not require many options, everything can be represented with unambitious interface elements. However, complementary to guiding the user with interface elements, natural language is required to capture user input. When adding new vocabulary, phrases and their explanations need to be captured, and likewise, when studying, the user’s guesses need to be evaluated. The implementation of this chatbot demonstrates the complementary use of interface elements and natural language side by side.
By relying solely on simple input parsing and no advanced natural language processing techniques, a major source of complexity of chatbot development can be avoided. While there are useful techniques that enable previously impossible use cases, they are not necessary to explore the fundamentals and paradigms of chatbot development.
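As a minimal sketch of what simple input parsing can mean when a typed guess is evaluated, the following compares the guess with the stored phrase after trimming whitespace and ignoring letter case. The function names normalize and IsCorrect are hypothetical, and the choice to ignore case and extra spaces is an assumption made for illustration rather than a rule stated in the text.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize removes leading and trailing whitespace, collapses runs of
// inner whitespace to a single space, and lower-cases the result.
func normalize(s string) string {
	return strings.ToLower(strings.Join(strings.Fields(s), " "))
}

// IsCorrect reports whether the typed guess matches the stored phrase,
// ignoring case and superfluous whitespace (an assumed tolerance).
func IsCorrect(guess, phrase string) bool {
	return normalize(guess) == normalize(phrase)
}

func main() {
	fmt.Println(IsCorrect("  Guten   Morgen ", "guten Morgen"))
	fmt.Println(IsCorrect("Guten Abend", "guten Morgen"))
}
```

No language-specific processing is involved, which matches the requirement that the chatbot places no restrictions on the language of the vocabulary.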
An important question to answer when developing a chatbot is which platform to target. As shown in 3.3 on page 10 there are many possible target platforms and some of them are fundamentally different from others. Deciding on a voice-based platform like Alexa, on SMS or on a messenger platform has consequences for all further decisions.
For the example case, a voice-based interface is less appropriate since users should be able to control the precise spelling of the vocabulary and further, there is currently no voice-based platform available that supports multiple languages simultaneously.
SMS communication is better suited for scenarios that only need to send a low number of messages since there are costs for each sent text message. As of writing, Twilio, a cloud communications service, charges $0.0075 for receiving and $0.085 for sending a message when using their Global Short Message Service API in Germany. Supposing the chatbot has 1000 active users that all study 100 phrases daily, the costs would accumulate to $277,500 of monthly expenses2. These prices are affordable if a company is selling airplane tickets or something similar, but in other scenarios, another messaging platform will be a better option. By choosing a messaging application as a target platform, there are no monthly costs to be taken care of. Additionally, there are further interface elements than only plain text available for interaction. However, there are many major messaging applications, so deciding on one can be difficult.
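The cost estimate can be reproduced with a few lines of code. The sketch below assumes, as the calculation in the text does, that each studied phrase involves one sent and one received message and that a month has 30 days; the function name MonthlySMSCost is hypothetical.

```go
package main

import "fmt"

// MonthlySMSCost estimates the monthly messaging cost in US dollars,
// assuming one sent and one received message per studied phrase.
func MonthlySMSCost(users, phrasesPerDay, days int, sendUSD, receiveUSD float64) float64 {
	exchanges := float64(users * phrasesPerDay * days)
	return exchanges * (sendUSD + receiveUSD)
}

func main() {
	// Prices as quoted in the text for Germany at the time of writing.
	cost := MonthlySMSCost(1000, 100, 30, 0.085, 0.0075)
	fmt.Printf("$%.2f per month\n", cost)
}
```

Changing the number of users or phrases shows how quickly the per-message pricing dominates the operating costs of a study application.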
As shown in 3.3 on page 13, different platforms have geographically different target markets. If a chatbot targets the Chinese market WeChat would be the obvious platform to choose; likewise, Japan would be targeted by using Line. In North America and Europe Facebook Messenger and Facebook’s WhatsApp are currently the leading platforms. Since as of writing, WhatsApp does not provide publicly available access to the platform, Facebook Messenger is the biggest platform one can currently target.
As mentioned in 3.3.1 on page 14, there are also solutions to create chatbots using a framework that allows releasing a chatbot to multiple platforms at the same time, but as previously noted such a framework also has drawbacks.
Given these drawbacks and the goal of keeping the example as simple as possible, it will be implemented for Facebook Messenger as a single platform, without abstracting the process through third-party frameworks.
The registration and configuration of a chatbot for Facebook Messenger are not explained in detail here since Facebook’s online documentation already covers detailed explanations and the majority of the settings are specific to each chatbot.
Facebook refers to a chatbot in Messenger simply as a bot, and to create a Messenger bot it is required to first create a Facebook Page and an application for the page. Afterwards, Messenger can be added as a product to the application. Further information can be found in the developer documentation.3
2 1000 × 100 × 30 × (0.085 + 0.0075) = 277,500
The development of a chatbot for Facebook Messenger is similar to most other platforms. When creating and configuring a chatbot, the developer registers a WebHook. “The concept of a WebHook is simple. A WebHook is an HTTP callback: an HTTP POST that occurs when something happens; a simple event-notification via HTTP POST”. Once the setup is successful, Facebook sends HTTP POST requests to the registered URL containing event information for every message a user sends to the chatbot. This way developers have complete control over handling each message on an arbitrary machine that is reachable via a public URL.
Now that the interaction with the platform is decided, the application can be created on a custom server, and it has to be decided how to structure the application running on this server.
Since all interaction with the platform happens via HTTP, there is no restriction on which programming language to use as long as it can be accessed through HTTP.
Chatbots can be written in any programming language.
Depending on the specific application, different programming languages are better suited than others.
In the example case, there are not many specific technological requirements. However, to be able to send notifications to users when their studies are ready, the system requires a way to schedule timers. The timers need to be lightweight enough to be re-scheduled for every user activity that can affect the time the notification is scheduled for.
Single-threaded programming languages, such as Python, Ruby or PHP, mostly use separate worker processes to handle scheduled jobs, but in this case, there would need to be a worker process for each user, waiting until it is time to send notifications to that specific user. Another way to handle this in single-threaded programming languages is to use a system for asynchronous, event-driven I/O, such as the asyncio module for Python or Node.js.
The example chatbot is created with the Go programming language. Go is a multithreaded language which allows for taking advantage of all available CPUs on a machine without the overhead of creating new processes. It is well suited for scheduling notifications, and Go includes a robust web server in the standard library which can be exposed to the public Internet directly. With the previously mentioned programming languages, by contrast, placing a proxy server in front of the application is a common approach.
With this setup, there are fewer moving parts to take care of and the example can be a simple, self-contained application.
In the example case, data needs to be stored, and it should be stored locally on the same machine to keep the application simple.
Since a user can save new vocabulary, there needs to be a way to store information for each user. For studying itself, further information has to be stored. It is necessary to keep track of correct and incorrect user reviews, it needs to be decided when to study next and the time of the last study needs to be tracked too.
To send notifications additional information has to be stored. It is necessary to know when the user was last active and if the user saw the last notification. Notifications should only be sent when the user is not currently active and saw the last notification.
All of the required information is always bound to a single user. Users can never share information with each other. There are no further relations in the data. Without relational data, the features that relational databases such as MySQL or PostgreSQL provide are not used; instead a simple key-value store can be sufficient for storing the data.
The example uses an embedded key-value store called BoltDB, which does not need a separate process and saves data on disk in a single file.
These technical decisions are the base for the following description of the chatbot implementation.
Features and Interaction Design
Figure 4.1: Search
At this point it is defined what the example chatbot should be able to do, and which technologies are used for its implementation.
Before looking at the technical implementation, the usage of the chatbot is shown from a user’s point of view. The following is a presentation of the features of the chatbot and the design of the interactions to use these features.
A Facebook Page and a Facebook Messenger bot have been created under the name Studybot. By not publishing the Facebook Page, the chatbot remains accessible only to administrators of the Facebook Page.
The demonstration uses the Messenger Android application, but Messenger bots can also be accessed through applications on other platforms or using the web version of Facebook.
When using the Messenger application as administrator, the chatbot can be found by using the search as shown in figure 4.1.
After navigating to the chatbot, a description becomes visible. This can be seen in figure 4.2. It contains the profile image of the Facebook Page the chatbot belongs to, the category the Facebook Page is part of, and a text describing the functionality of the chatbot, whereby all of these elements can be defined by the developer of the chatbot. A button labeled Get Started is displayed at the bottom of the screen.
When pressing the Get Started button, a message is sent to the chatbot, and as indicated by the dots visible in the left image of figure 4.3, the chatbot is active and about to send a reply. In normal conversations, the dots indicate that a user is typing. For chatbots, they indicate that the chatbot received the message and is crafting a reply.
The image in the middle of figure 4.3 shows the first message sent to users. They are greeted with their first name to create a more personal atmosphere. The greeting is followed by two sentences explaining the chatbot.
When working with text as a medium, it is especially critical to focus on the most important information. In graphical interfaces, users can get an impression with a single glance, but to perceive the meaning of text, users need to read word by word. Since the time users are willing to invest in understanding a product is limited in most scenarios, every word in a text has to be selected carefully.
It can be seen in the image on the right of figure 4.3 that a second message is sent to the user. The second message is delivered 5 seconds later than the first one to not overwhelm the users with too much information at once. This message contains instructions to begin using the application. The instructions are additionally illustrated by providing an example for a message in the expected format.
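Sending text in small chunks with a pause in between can be sketched as follows. The function name sendInChunks, the send callback and the message texts are hypothetical; in a real chatbot the callback would call the messenger platform instead of printing.

```go
package main

import (
	"fmt"
	"time"
)

// sendInChunks delivers the texts one at a time and waits for pause
// between two consecutive messages. send is whatever function actually
// delivers a message to the user.
func sendInChunks(send func(text string) error, pause time.Duration, texts ...string) error {
	for i, text := range texts {
		if i > 0 {
			time.Sleep(pause)
		}
		if err := send(text); err != nil {
			return fmt.Errorf("sending message %d: %w", i+1, err)
		}
	}
	return nil
}

func main() {
	toStdout := func(text string) error {
		fmt.Println(text)
		return nil
	}
	// The text describes a pause of 5 seconds; the demo uses a shorter one.
	err := sendInChunks(toStdout, 200*time.Millisecond,
		"Hi Alice! (hypothetical greeting)",
		"(hypothetical instructions with an example message)")
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

Since the function blocks while it waits, a server would typically run it in its own goroutine per conversation so that other users are not delayed.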
Figure 4.4 shows how a user adds three phrases to Studybot. The image in the middle has a button labeled as stop adding displayed at the bottom of the messages, which users can use as an alternative way for interacting with the chatbot when they wish to stop adding phrases.
Figure 4.5 shows that after stopping the adding of phrases, the user is prompted with an array of four possible actions. The actions are labeled study, + phrases, help and done. Each button label also contains an emoji used as an icon for the action to make it easier for users to recognize. When the usage of graphical elements is limited and one has to rely mainly on text for communication, emojis can be a helpful substitute for traditional icons to provide graphical guidance.
The chatbot has different modes of interaction. Internally the chatbot needs to keep track of the current mode of each user to be able to address each message in the correct context. In figure 4.5 the user has left the mode for adding words and entered the menu mode which provides access to other modes.
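Tracking a mode per user can be sketched as a small state machine keyed by user ID. The mode names, the reply texts and the assumption that new users start in the mode for adding phrases are illustrative and not taken from Studybot's code; only the button labels come from the figures described above.

```go
package main

import "fmt"

// mode is the interaction context a user is currently in.
type mode int

const (
	modeAdd   mode = iota // adding phrases (assumed starting mode)
	modeMenu              // choosing the next action
	modeStudy             // reviewing phrases
)

// bot remembers the current mode of every user by user ID.
type bot struct {
	modes map[string]mode
}

func newBot() *bot { return &bot{modes: make(map[string]mode)} }

// handle interprets text in the context of the user's current mode and
// returns a reply. The reply texts are placeholders.
func (b *bot) handle(userID, text string) string {
	switch b.modes[userID] {
	case modeMenu:
		switch text {
		case "study":
			b.modes[userID] = modeStudy
			return "Starting a study session."
		case "+ phrases":
			b.modes[userID] = modeAdd
			return "Send a phrase to add."
		}
		return "Choose: study, + phrases, help, done"
	case modeStudy:
		if text == "done" {
			b.modes[userID] = modeMenu
			return "Choose: study, + phrases, help, done"
		}
		return "Checking answer: " + text
	default: // modeAdd
		if text == "stop adding" {
			b.modes[userID] = modeMenu
			return "Choose: study, + phrases, help, done"
		}
		return "Added phrase: " + text
	}
}

func main() {
	b := newBot()
	fmt.Println(b.handle("u1", "hola"))
	fmt.Println(b.handle("u1", "stop adding"))
	fmt.Println(b.handle("u1", "study"))
}
```

The same text, for example "study", is treated as a phrase in one mode and as a command in another, which is why the mode has to be looked up before a message is interpreted.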
By clicking on the button labeled study, the user switches to the mode for studying the previously added vocabulary. However, as the image on the left of figure 4.6 shows, the recently added phrases cannot be studied yet.
Studybot uses a concept known as a spaced repetition system, or SRS for short, which makes use of the so-called spacing effect. “Information that is spaced over time is better remembered than the same amount of information massed together”. Based on this approach, users need to wait before studying newly added vocabulary. The more often a phrase is guessed correctly, the less frequently the user will be asked to review it.
Since added phrases cannot be studied directly, the user is instead offered a notification message once phrases are ready for review.
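The scheduling idea can be illustrated with a deliberately simple rule: every correct answer doubles the waiting time, and a wrong answer resets it. The text does not state which intervals or which algorithm Studybot uses, so the rule, the constant firstInterval and all names below are assumptions for this sketch.

```go
package main

import (
	"fmt"
	"time"
)

// firstInterval is a hypothetical waiting time for newly added phrases.
const firstInterval = 2 * time.Hour

// card is one phrase together with its review schedule.
type card struct {
	Phrase   string
	Interval time.Duration // current waiting time between reviews
	Due      time.Time     // earliest time of the next review
}

// newCard schedules a freshly added phrase; it is not due immediately.
func newCard(phrase string, now time.Time) card {
	return card{Phrase: phrase, Interval: firstInterval, Due: now.Add(firstInterval)}
}

// due reports whether the card may be reviewed at time now.
func due(c card, now time.Time) bool {
	return !now.Before(c.Due)
}

// review reschedules the card: a correct answer doubles the interval,
// a wrong answer resets it to firstInterval.
func review(c card, correct bool, now time.Time) card {
	if correct {
		c.Interval *= 2
	} else {
		c.Interval = firstInterval
	}
	c.Due = now.Add(c.Interval)
	return c
}

func main() {
	now := time.Date(2017, 1, 1, 12, 0, 0, 0, time.UTC)
	c := newCard("hola", now)
	fmt.Println(due(c, now)) // false
	c = review(c, true, c.Due)
	fmt.Println(c.Interval) // 4h0m0s
}
```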
When sending notifications to users, it is crucial not to send more messages than necessary. Depending on the messenger platform, there are also policies in place on how many messages are permitted. The platform policy of Facebook Messenger clearly requires developers to “respect all requests (either on Messenger or off) by people to block, discontinue, or otherwise opt-out of your using Messenger to communicate with them”, which means that there needs to be a way for users to disable the sending of notifications. Further, the platform policy clarifies that “you may message people within 24 hours of a person’s interaction with your business or Bot…, and until the next interaction, you may send one additional message after this 24 hour period in order to follow up on your conversation”. How notifications can be disabled in Studybot will not be further explained here, but an illustration of the feature can be found in the appendix in figure A.1 on page 47.
In figure 4.7 a notification from the Messenger application is shown on the mobile phone. It is triggered when enough studies are ready to be reviewed. When opening the conversation with the chatbot, the user is now prompted to study. As required in 4.3.2 on page 26, while studying one is free to answer either by using the buttons at the bottom of the screen or by directly typing the correct phrase.
The x button available in the right image of figure 4.7 allows users to delete a phrase. Since Facebook Messenger offers no possibility to display a long list of interactive elements to the user, separate functionality for editing and deleting phrases is difficult to achieve, and studying is the only moment at which a specific phrase can be referred to.
The figures 4.7 and 4.8 show a subtle decision to display the buttons users are most likely to interact with on the right side.
The current interface language is English, which is written from left to right. Since humans scan the interface in the same order as they read, it is more intuitive to show the most important information on the left. However, certain interactions, like studying, need to be performed so frequently that a user will quickly remember the location of the buttons and does not need to scan the interface every single time. In this scenario, it is helpful to have the most used action on the right side: the majority of humans are right-handed and can therefore reach the buttons on the right with less effort, since these are closer to the thumb when using a mobile phone with a single hand.
A feature that is outside of the main flow of Studybot is the sending of feedback to the developers of the chatbot. As shown in figure 4.9, it can be accessed after pressing the help button. This feature is not specific to language studying and is useful for most chatbots. Normally all messages are answered automatically by the chatbot, but there should be a way for users to talk to the human developers or administrators that provide the chatbot. The feature is automated by forwarding the messages that users send as feedback to a Slack channel. Additionally, administrators can reply to user messages directly inside Slack, and the replies are sent back to the correct users within the chatbot.
Figure 4.9 shows the flow for sending feedback from a user’s point of view. The exact implementation is hidden from users, but simplifying the sending of replies by using an existing medium like Slack can minimize the manual work required to maintain a personal feature like this.
All primary interactions the example chatbot provides have been covered above. Many of the implemented interactions can be transferred and reused when creating other kinds of chatbots: addressing users by name, sending text in small chunks with a delay in between, prompting users for specific input with every message, keeping track of the context of each user, asking for permission before sending notifications, consciously ordering buttons and supporting custom user feedback. These patterns can be applied in many different scenarios.
The resulting functionality and implementation of the example chatbot can be summarized as finding the simplest possible interaction for the user to get a task done, which is expressed well in the following quote from the newspaper The Guardian about the lessons they learned by developing a chatbot:
A lot of users responded as they would to a human, and when they got non-human responses, they’d stop using it, said Wilk. So the Guardian went in the opposite direction with its newsbot and aimed for utter simplicity. The lesson, according to Wilk, was: “Don’t build people’s expectations too much of what’s possible, just keep it simple.”
The following is an overview of how Studybot has been created. Instead of covering all details, the focus here is to communicate the underlying concepts, which are not unique to this specific chatbot and thereby enable others to apply these patterns in their own development.
There are numerous resources available today that demonstrate the development of chatbots. However, a majority of existing materials rely on specific services, frameworks or libraries for the implementation. The implementation of Studybot demonstrates all basics necessary for the creation of a chatbot without relying on external tooling; the code base has no external dependencies, with the exception of a package for the database.
The graph has been created with a node graph
The graph in figure 4.10 shows the overall architecture of the chatbot. The main package handles only configuration, setup and tear-down. It instantiates the data store from package brain and starts web servers listening on two separate ports for the packages messenger and admin.
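The wiring described for the main package can be sketched in a single file. Here the packages brain, messenger and admin are represented by a type and two functions so that the example is self-contained; the type store, the helper newServers, the URL paths and the port numbers are assumptions for illustration and not the actual Studybot code.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// store stands in for the data store from package brain.
type store struct{ users int }

func (s *store) userCount() int { return s.users }

// messengerHandler stands in for package messenger.
func messengerHandler(s *store) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/webhook", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "serving %d users\n", s.userCount())
	})
	return mux
}

// adminHandler stands in for package admin.
func adminHandler(s *store) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/stats", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "users: %d\n", s.userCount())
	})
	return mux
}

// newServers builds both servers around the same store.
func newServers(s *store, messengerAddr, adminAddr string) (*http.Server, *http.Server) {
	msgr := &http.Server{Addr: messengerAddr, Handler: messengerHandler(s)}
	adm := &http.Server{Addr: adminAddr, Handler: adminHandler(s)}
	return msgr, adm
}

func main() {
	s := &store{} // setup: one store shared by both servers
	msgr, adm := newServers(s, ":8080", ":8081")
	go func() { log.Fatal(adm.ListenAndServe()) }()
	log.Fatal(msgr.ListenAndServe())
}
```

Serving the admin functionality on its own port makes it possible to expose only the messenger port to the public internet.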
The store has a connection to the database and it is responsible for all domain-specific business logic of Studybot. Both servers use the store to fetch data and they, therefore, depend on package brain.
Package admin only provides functionality for internal use by the administrators of the chatbot; one of its main responsibilities is to handle communication with Slack for the feedback feature, which was mentioned in 4.5 on page 38.
Package messenger is responsible for processing all events received from the Facebook Messenger platform and sending replies back to users. It relies on another package named fbot, which is a simple abstraction over the functionality of the Facebook Messenger platform. Communication in this package happens via JSON over HTTP. Since this is a custom package, it is specifically designed to be used in this chatbot. It only needs to support data types and parameters that are relevant to this chatbot.
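To give an impression of how small such an abstraction can be, the following sketch builds the JSON body for a text message with optional quick reply buttons. The Go type and function names are hypothetical and not those of fbot; the JSON field names follow the Messenger Send API as documented at the time of writing and should be checked against the current platform documentation. The resulting bytes would be sent with an HTTP POST request to the platform.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type recipient struct {
	ID string `json:"id"`
}

type quickReply struct {
	ContentType string `json:"content_type"`
	Title       string `json:"title"`
	Payload     string `json:"payload"`
}

type message struct {
	Text         string       `json:"text"`
	QuickReplies []quickReply `json:"quick_replies,omitempty"`
}

type sendRequest struct {
	Recipient recipient `json:"recipient"`
	Message   message   `json:"message"`
}

// newTextMessage encodes a text message for one user. Each button label
// becomes a quick reply whose payload equals its title.
func newTextMessage(userID, text string, buttons ...string) ([]byte, error) {
	msg := message{Text: text}
	for _, b := range buttons {
		msg.QuickReplies = append(msg.QuickReplies,
			quickReply{ContentType: "text", Title: b, Payload: b})
	}
	return json.Marshal(sendRequest{Recipient: recipient{ID: userID}, Message: msg})
}

func main() {
	body, err := newTextMessage("123", "Hi")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(body)) // {"recipient":{"id":"123"},"message":{"text":"Hi"}}
}
```

Supporting only the fields the chatbot needs keeps the package short, which matches the reasoning given above for writing a custom abstraction.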
Most of the code base and its architecture are structured the same way most other servers are organized; the logic specific to the development of the chatbot is contained in the package messenger. Figure 4.11 illustrates the behavior of this package. For brevity, calls to the packages brain and fbot are not included in the graphic. The upper half of the graphic in 4.11 shows function calls that are invoked from a package […] which belongs to the Bot type. This is used to send replies from Slack back to users.
The visualization has been created with go-callvis.
Call graphs for the packages main, admin and fbot can be found in the appendix starting on page 48. Additionally, the Go documentation for all exported types and functions can be found on page 50. It explains specific implementations in more detail.
Separating and containing the logic specific to the chosen platform, in this case Facebook Messenger, simplifies adapting the chatbot to new platforms in the future. The development of a chatbot can be summarized as being a slight variation of already known server-side development. Chatbot software is fundamentally a server accepting HTTP requests from a messenger platform and sending HTTP responses back to the messenger platform. Developing such a system should be familiar to developers who have created web applications with server-side rendering in the past, since state management and request processing work in the same manner. While the receiving of user events can be approached with known patterns, the sending of replies back to users requires a paradigm shift in thinking. Technically, responses are also rendered and sent to users, but the rendering has no custom underlying interface as is the case when rendering HTML for web pages. Instead, primarily plain text and only a few platform-specific interface elements are available for presenting
Apart from the potential use of complex natural language technology, the main difficulty of chatbot development is the design of user interfaces for the medium chat.
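The server side of this summary can be sketched as an ordinary HTTP handler that decodes incoming events and hands each text message to a reply function. The type and function names are hypothetical, the reply callback stands in for a call to the platform, and the JSON structure follows the Messenger webhook format as documented at the time of writing; the verification request that the platform sends when a webhook is registered is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// The following types cover only the parts of a webhook event that this
// sketch needs.
type sender struct {
	ID string `json:"id"`
}

type textMessage struct {
	Text string `json:"text"`
}

type messagingEvent struct {
	Sender  sender       `json:"sender"`
	Message *textMessage `json:"message"` // nil for non-message events
}

type webhookEvent struct {
	Entry []struct {
		Messaging []messagingEvent `json:"messaging"`
	} `json:"entry"`
}

// incoming is one text message received from a user.
type incoming struct {
	UserID string
	Text   string
}

// parseEvents extracts all text messages from a webhook request body.
func parseEvents(body []byte) ([]incoming, error) {
	var ev webhookEvent
	if err := json.Unmarshal(body, &ev); err != nil {
		return nil, err
	}
	var out []incoming
	for _, e := range ev.Entry {
		for _, m := range e.Messaging {
			if m.Message == nil {
				continue // skip events that carry no message
			}
			out = append(out, incoming{UserID: m.Sender.ID, Text: m.Message.Text})
		}
	}
	return out, nil
}

// webhook returns a handler that passes every received message to reply.
func webhook(reply func(userID, text string)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "cannot read body", http.StatusBadRequest)
			return
		}
		msgs, err := parseEvents(body)
		if err != nil {
			http.Error(w, "invalid JSON", http.StatusBadRequest)
			return
		}
		for _, m := range msgs {
			reply(m.UserID, m.Text)
		}
		w.WriteHeader(http.StatusOK)
	}
}

func main() {
	// A server would register the handler, for example with
	// http.Handle("/webhook", webhook(...)); here only parsing is shown.
	sample := []byte(`{"entry":[{"messaging":[{"sender":{"id":"42"},"message":{"text":"hello"}}]}]}`)
	msgs, err := parseEvents(sample)
	fmt.Println(msgs, err)
}
```

Everything up to the call of reply follows familiar request processing; deciding what reply should send is where the conversational design described above comes in.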
To conclude the exploration of chatbots, a more opinionated take on the merits of chatbots and also ideas in which direction their role could evolve in the future are given based on thoughts from different people active in the field.
To begin with, it should be clarified that text is a great medium for communication, as this quote illustrates:
Text is the most socially useful communication technology. It works well in 1:1, 1:N, and M:N modes. It can be indexed and searched efficiently, even by hand. It can be translated. It can be produced and consumed at variable speeds. It is asynchronous. It can be compared, diffed, clustered, corrected, summarized and filtered algorithmically. It permits multiparty editing. It permits branching conversations, lurking, annotation, quoting, reviewing, summarizing, structured responses, exegesis, even fan fic. The breadth, scale and depth of ways people use text is unmatched by anything.
But no matter how useful text is, users prefer not to type out everything in text. “If something can be tapped/clicked instead of typing, they prefer that”.
The idea of chatbots has existed for a long time, but only recently have messenger platforms started adding features that help developers make chatbots more user-friendly. “Through our journey, we have understood that the best way to build bots for businesses is through a hybrid approach – use of buttons and quick replies along with text-based queries. This approach helps both businesses and customers,” says Mounish, a co-founder of the bot builder platform MindIQ. Nor is natural language processing based on the latest advancements in machine learning and artificial intelligence a solution for every problem:
“Adding natural language for simple domains is overkill,” says Dennis Thomas, CTO at NeuraFlash, which develops AI tools that integrate into Salesforce …
“When you have a visual medium and buttons can accomplish the task in a couple of clicks (think easy re-order), open-ended natural language is not making the user’s life easier.”
Text and natural language enable new kinds of products, where the focus is more on flexible and individual behavior than on accuracy.
A “place where NLP is a big win is when the bot’s objective is focused on helping users with the discovery phase of products or shopping.” Businesses selling goods, newspapers that need to spread their content and travel agencies helping customers to create individual journeys are some of the sectors currently most interested in the possibilities of natural language. Two other promising areas for chatbots are health care and banking, because these fields involve a large amount of customer service, much of which could potentially be automated with machine-based customer support.
Another interesting development, apart from enhancements of the interfaces and natural language, is the category of personal assistants mentioned in 3.5 on page 17. Many chatbots, including the created example, are designed to solve only specific problems. They are forced to prevent users from trying to use them for tasks outside of their area of expertise. Personal assistants do not have these limitations, and systems like Siri or Google Assistant have the potential to evolve into universal assistants as imagined in science fiction movies and literature.
Particularly interesting ideas are open systems where third-party developers can add additional capabilities, and the personal assistant has a mechanism of delegating specific tasks to the third-party systems. An example is the development of skills for Amazon’s Alexa.
Lastly, after focusing on chatbots for the whole time, it can be questioned if this is the right terminology after all:
The term “chatbot” sets an expectation around the user experience that the technology can’t deliver. “Chat” is a very human word. You chat with your friends. You have chat with your neighbor. Chatting has specific connotations it’s very casual and easy. Chats meander, and you can take them any direction you want. … If business users think “chatbots” are trivial, or if they simply prefer a fancier word to refer to the function (“business process automation”) then we’re setting ourselves up for hard conversations with potential business buyers.
Depending on the circumstances, a different term can be more appropriate. No matter whether we call them chatbots, business process automation or simply bots, they offer opportunities to explore new product ideas and bring the power and convenience of text to the applications that users already spend most of their time with.
This work introduced the fundamentals of what chatbots are. It gave an overview of ideas, products and platforms, both from the past and available today. The current interest in chatbots, potential use cases and limitations have been explored in detail. Different aspects of the implementation of a chatbot and of working with conversational interfaces have been presented through the creation of an exemplary chatbot, which included interaction and user experience design and a general, reusable software architecture for chatbots.
While not all aspects can be covered within the context of this work, the goal was to give an overview of what chatbots are, their use cases and how to create them. This knowledge should help explore further possibilities of chatbot usage, and it should enable more developers to apply chatbots to new scenarios and thereby also improve human-machine interaction in general.
B Call Graphs