Building a Chat Bot for Fun and Profit
May 18, 2016
This blog post is based on a lunch and learn talk I gave at Rangle.io on May 13, 2016.
On April 16th, 2016 Telegram unveiled their “$1,000,000 to Bot Developers. For free.” challenge. Developers were incentivized by a chance to win $25,000 USD to build novel, interesting bots on Telegram’s platform. Telegram is not the only major chat platform to embrace bots. Slack, WhatsApp, Facebook, Kik and Skype all have well-developed bot APIs, with bots ranging from image and video search and games to weather, sports and translation.
There’s no shortage of bot ideas out there and many developers are building them. This got me thinking. How hard would it be to build a bot anyway? What separates a bot from a command-line REPL? This blog post details my journey to find out.
What is a bot?
Let’s start with a definition:
A computer program designed to simulate conversation with human users, especially over the Internet. - Google
Okay. A bot is supposed to simulate conversation, so what makes a conversation in bot-land? Conversations are:
- Message based
- Realtime
- Intelligent (contextual)
Therefore we must give our bot all of these qualities. It must have a brain, or at the very least be able to make intelligent assertions about incoming data.
How to make an intelligent bot?
State machines! State machines allow you to give the bot context. Incoming messages can set the bot in a state that lets it know what to expect of the next message.
What are state machines?
Another definition: a state machine (otherwise known as a finite-state machine) consists of:
- An initial state or record of something stored someplace
- A set of possible input events
- A set of new states that may result from the input
- A set of possible actions or output events that result from a new state
Above is an example of a simple finite-state machine. At any time it is in one state of a set of known finite states. From each state are defined transitions to other states.
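To make the four parts of that definition concrete, here is a minimal sketch of a finite-state machine in JavaScript. The turnstile, its state names and its event names are my own illustration (they are not the machine from the figure), and `transition` and `run` are hypothetical helper names.

```javascript
// Hypothetical transition table for a turnstile: state -> event -> next state.
const transitions = {
  LOCKED: { coin: 'UNLOCKED', push: 'LOCKED' },
  UNLOCKED: { coin: 'UNLOCKED', push: 'LOCKED' }
};

// Returns the next state, or the current state when the event is not defined for it.
function transition(state, event) {
  const next = transitions[state] && transitions[state][event];
  return next || state;
}

// Feeds a sequence of input events through the machine, starting from the initial state.
function run(initialState, events) {
  return events.reduce((state, event) => transition(state, event), initialState);
}

console.log(run('LOCKED', ['coin']));         // UNLOCKED
console.log(run('LOCKED', ['coin', 'push'])); // LOCKED
```

The whole machine lives in one lookup table, which is the property we will lean on for the bot: knowing the current state and the incoming input is enough to decide what happens next.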
Operations: Commands vs. Actions
For our bot, there are two types of operations. Here’s the lexicon we’ll be using:
- Commands are global top-level chat operations. They begin with a “/” and can be run at any time in the conversation lifecycle.
- Actions are contextual responses. They are handled by a specific function for each state.
The flowchart below shows the message parsing logic for commands and actions:
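The split between commands and actions can also be sketched in code. This is only an illustration: `parseMessage` and the shape of the object it returns are hypothetical names, and the optional `@botname` suffix is included on the assumption that the bot may be addressed that way in Telegram group chats.

```javascript
// Classifies an incoming message as a global command or a contextual action.
// parseMessage and the returned object shape are hypothetical, for illustration only.
function parseMessage(text) {
  const input = (text || '').trim();
  // "/" + command name, optional "@botname" suffix, optional arguments
  const match = /^\/([a-z0-9_]+)(?:@\w+)?(?:\s+([\s\S]*))?$/i.exec(input);
  if (match) {
    return { type: 'command', name: match[1].toLowerCase(), args: match[2] || '' };
  }
  // Anything else only makes sense in the context of the current state
  return { type: 'action', text: input };
}

console.log(parseMessage('/newaccount'));
console.log(parseMessage('$750.00'));
```

A command can be routed by `name` alone, while an action has to be paired with the user's current state before it can be handled.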
Now that we have the basics, let’s build a bot!
We’re going to be building Expense_Bot, a single-entry accounting bot. This bot will allow you to create accounts and log transactions such as income and expenses to those accounts through a chat-based interface. We’re going to be using Node.js and some database (pick your poison: MongoDB, RethinkDB, /(My|Postgre)?SQL(ite)?/) for this exercise.
Commands
Let’s define a set of commands for our chat bot:
- /start - Returns this list of commands
- /newaccount - Create a new account
- /accounts - List accounts
- /transaction - Log transaction (expense or income)
- /history - List previous transaction history
- /charts - View a chart image summary of your expenses and income
- /spreadsheet - Download full data
- /delete - Delete a transaction (expense or income)
- /deleteaccount - Delete an account
- /deleteall - Delete all info Expense_Bot knows about you
- /cancel - Cancel current operation
For our example we’re going to go through the /newaccount workflow and build a conversation.
States
States are constant string literals defined as follows:
const STATES = {
  NONE: 'NONE',
  NEW_ACCOUNT_TYPE: 'NEW_ACCOUNT_TYPE',
  NEW_ACCOUNT_NAME: 'NEW_ACCOUNT_NAME',
  NEW_ACCOUNT_INITIAL_BALANCE: 'NEW_ACCOUNT_INITIAL_BALANCE'
};
Data structures
I’m going to define a set of data structures to hold our commands and actions.
const commands = {
  start: function* (msg) {
    // Output the list of commands
  },
  newaccount: function* (msg) {
    // Start the new account workflow
  }
};
const actions = {
  [STATES.NONE]: function* (state, msg) {
    // The NONE state launches the commands above
  },
  [STATES.NEW_ACCOUNT_INITIAL_BALANCE]: function* (state, msg) {
    // Validates that the response is a number
    // Transitions to NEW_ACCOUNT_TYPE
  },
  // ...
  [STATES.NEW_ACCOUNT_TYPE]: function* (state, msg) {
    // Validates that the response is an account type
    // Transitions to NEW_ACCOUNT_NAME
  }
  // ...
};
Entry Point
All messages are triaged by an entry point. I’m using ES6 generators to leverage the yield keyword and create coroutines. These are asynchronous function calls that appear synchronous, with the value returned inline, eliminating the need for a callback. The bluebird promise library gives us Promise.coroutine(), a wrapper around generators that allows us to use yield to return the value of a promise. Errors that would normally be given to a .catch are raised, allowing us to use the traditional JavaScript error catching mechanism try { } catch (e) { }. Cool!
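To show what such a wrapper does under the hood, here is a simplified stand-in written with native generators and promises. It is a sketch of the idea, not bluebird's actual implementation, and `coroutine`, `fetchBalance` and `failingCall` are hypothetical names used only for this example.

```javascript
// Simplified stand-in for a coroutine wrapper: drives a generator, resuming it
// with each resolved value and throwing each rejection back into it.
function coroutine(generatorFn) {
  return function (...args) {
    const gen = generatorFn.apply(this, args);
    return new Promise((resolve, reject) => {
      function step(method, value) {
        let result;
        try {
          result = gen[method](value);
        } catch (err) {
          return reject(err); // error not caught inside the generator
        }
        if (result.done) {
          return resolve(result.value);
        }
        Promise.resolve(result.value).then(
          v => step('next', v),
          e => step('throw', e)
        );
      }
      step('next');
    });
  };
}

// Hypothetical async helpers standing in for database or network calls.
const fetchBalance = () => Promise.resolve(750);
const failingCall = () => Promise.reject(new Error('boom'));

const demo = coroutine(function* () {
  const balance = yield fetchBalance(); // reads like a synchronous call
  try {
    yield failingCall();
  } catch (e) {
    // The rejection surfaces as a regular exception
    return `balance ${balance}, recovered from ${e.message}`;
  }
});

demo().then(message => console.log(message)); // logs "balance 750, recovered from boom"
```

The generator body never touches `.then` or `.catch`; the wrapper translates between promises and plain `yield` / `try` / `catch`.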
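const Promise = require('bluebird');
// State (the database model), r (presumably the RethinkDB query handle),
// createNoneState and runAction are assumed to be defined elsewhere.
const main = Promise.coroutine(function* (msg) {
  let state;
  if (msg.text.startsWith('/')) {
    // Global command: discard the current context and start from NONE
    state = yield createNoneState(msg);
  } else {
    // Contextual response: load the user's most recent state
    const collection = yield State
      .orderBy({index: r.desc('createdAt')})
      .filter({telegramId: msg.from.id})
      .limit(1)
      .run();
    state = collection[0] || (yield createNoneState(msg));
  }
  return runAction(state, msg);
});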
This parses the contents of the message. If it begins with a /, it creates a new State record in the database with a state of NONE. The idea is that if the user is issuing a global command, their current context becomes invalid. The global command will then set them in a new state. If their message does not begin with a /, it must be a contextual response, so we retrieve their current state and direct them to the runAction function. This function looks like:
function runAction(state, msg) {
  const action = actions[state.state];
  if (!action) {
    bot.sendMessage(msg.from.id, 'That action is not understood. Run /start to get the list of commands.');
    return;
  }
  // Wrap the generator so the handler can yield promises
  return Promise.coroutine(action)(state, msg);
}
This references the actions data structure above. In each handler we can do validation on the incoming response. So for example when the user is in the NEW_ACCOUNT_INITIAL_BALANCE state, the following handler will be used:
const actions = {
  [STATES.NEW_ACCOUNT_INITIAL_BALANCE]: function* (state, msg) {
    // DOLLAR_REGEX is assumed to capture the numeric amount in group 1
    const validNumber = DOLLAR_REGEX.exec(msg.text);
    if (validNumber && validNumber[1]) {
      // Save the balance and transition the user to NEW_ACCOUNT_TYPE
      yield new State({
        telegramId: msg.from.id,
        state: STATES.NEW_ACCOUNT_TYPE,
        meta: {
          balance: validNumber[1]
        }
      }).save();
      bot.sendMessage(msg.from.id, 'What type of account is it?');
    } else {
      bot.sendMessage(msg.from.id, 'I cannot parse that number. Please enter an initial balance of the format $1234.56.');
    }
  }
  // ...
};
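The handler relies on a DOLLAR_REGEX constant that is not shown in this post. As a hedged guess at what it could look like, here is one possible definition together with a small `parseDollars` helper. Both the pattern and the helper name are my assumptions for illustration; all the handler requires is that capture group 1 holds the amount.

```javascript
// Hypothetical definition: optional "$", digits with optional thousands
// separators, and an optional cents part of one or two digits.
// Capture group 1 is the amount without the dollar sign.
const DOLLAR_REGEX = /^\s*\$?\s*((?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{1,2})?)\s*$/;

// Returns the amount as a number, or null when the text is not a dollar figure.
function parseDollars(text) {
  const match = DOLLAR_REGEX.exec(text);
  if (!match) {
    return null;
  }
  return parseFloat(match[1].replace(/,/g, ''));
}

console.log(parseDollars('$750.00'));  // 750
console.log(parseDollars('1,234.56')); // 1234.56
console.log(parseDollars('Savings'));  // null
```

Converting to a number before saving keeps the stored balance consistent with a value like `{ balance: 750.00 }` instead of the raw string the user typed.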
Conversation Example
Consider the following conversation with an accounting bot:
<robot> Hello, welcome to Expense_Bot!
<human> /newaccount
<robot> What is the initial balance for your new account?
<human> $750.00
<robot> What type of account is it?
<human> Savings
<robot> And finally what is the name of this account?
<human> Royal Bank
<robot> Great! New account "Royal Bank" created.
In the above example:
- /newaccount is a command. At any time it can be run and interrupt the flow of the conversation because it is global.
- $750.00 is an action. It only makes sense if the current state is expecting it. If you asked me “What is the initial balance for your new account?” and I responded $750.00, that conversation would make total sense. However if I walked up to you and said “$750.00” out of nowhere, you would have no idea what I’m talking about. This is the definition of context.
When parsing the second message from the human, the bot knew to expect a dollar figure. Why? When the human asked to create a new account (/newaccount), it put the bot into the NEW_ACCOUNT_INITIAL_BALANCE state. In our bot we can define a specialized handler for the NEW_ACCOUNT_INITIAL_BALANCE state that validates the next response (is it a dollar amount?) and saves any persistent data ({balance: 750.00}). After the user submits a valid dollar figure, we ask what type of account this is and transition them to the NEW_ACCOUNT_TYPE state. After the process is done we save the new account record and transition the human back to the NONE state.
This transition of states and building of data is illustrated in the table below.
Current State | → | Next State | Incoming Message | Data (after message) |
---|---|---|---|---|
NONE | → | NEW_ACCOUNT_INITIAL_BALANCE | /newaccount | {} |
NEW_ACCOUNT_INITIAL_BALANCE | → | NEW_ACCOUNT_TYPE | $750.00 | { balance: 750.00 } |
NEW_ACCOUNT_TYPE | → | NEW_ACCOUNT_NAME | Savings | { balance: 750.00, type: 'Savings' } |
NEW_ACCOUNT_NAME | → | NONE | Royal Bank | { balance: 750.00, type: 'Savings', name: 'Royal Bank' } |
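The transitions in the table can be replayed with a small self-contained sketch. To keep it runnable without a database or Telegram, it swaps the State model for an in-memory Map and uses plain synchronous functions instead of coroutines. `sessions`, `accounts` and `handleMessage` are hypothetical names for this illustration, and the dollar pattern is a simplified assumption.

```javascript
const STATES = {
  NONE: 'NONE',
  NEW_ACCOUNT_INITIAL_BALANCE: 'NEW_ACCOUNT_INITIAL_BALANCE',
  NEW_ACCOUNT_TYPE: 'NEW_ACCOUNT_TYPE',
  NEW_ACCOUNT_NAME: 'NEW_ACCOUNT_NAME'
};

// In-memory stand-ins for the database: current state per user, and saved accounts.
const sessions = new Map();
const accounts = [];
const none = () => ({ state: STATES.NONE, meta: {} });

// Commands are global: each one returns the state it puts the user in, plus a reply.
const commands = new Map([
  ['newaccount', () => ({
    next: { state: STATES.NEW_ACCOUNT_INITIAL_BALANCE, meta: {} },
    reply: 'What is the initial balance for your new account?'
  })]
]);

// Actions are contextual: one handler per state.
const actions = {
  [STATES.NEW_ACCOUNT_INITIAL_BALANCE]: (session, text) => {
    const match = /^\$?(\d+(?:\.\d{1,2})?)$/.exec(text.trim());
    if (!match) {
      return { next: session, reply: 'I cannot parse that number.' };
    }
    const meta = Object.assign({}, session.meta, { balance: parseFloat(match[1]) });
    return { next: { state: STATES.NEW_ACCOUNT_TYPE, meta }, reply: 'What type of account is it?' };
  },
  [STATES.NEW_ACCOUNT_TYPE]: (session, text) => {
    const meta = Object.assign({}, session.meta, { type: text.trim() });
    return { next: { state: STATES.NEW_ACCOUNT_NAME, meta }, reply: 'And finally what is the name of this account?' };
  },
  [STATES.NEW_ACCOUNT_NAME]: (session, text) => {
    const account = Object.assign({}, session.meta, { name: text.trim() });
    accounts.push(account); // commit the finished record
    return { next: none(), reply: `Great! New account "${account.name}" created.` };
  }
};

// Entry point: a command resets the context, anything else goes to the current state's handler.
function handleMessage(userId, text) {
  let result;
  if (text.startsWith('/')) {
    const command = commands.get(text.slice(1));
    result = command ? command() : { next: none(), reply: 'Unknown command.' };
  } else {
    const session = sessions.get(userId) || none();
    const action = actions[session.state];
    result = action ? action(session, text) : { next: session, reply: 'That action is not understood.' };
  }
  sessions.set(userId, result.next);
  return result.reply;
}

['/newaccount', '$750.00', 'Savings', 'Royal Bank'].forEach(text => {
  console.log(`<human> ${text}`);
  console.log(`<robot> ${handleMessage(1, text)}`);
});
console.log(accounts[0]);
```

Each handler returns the next state rather than mutating anything itself, so every row of the table corresponds to exactly one return value.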
After the final step, the data is committed to the database as a new Account model. There you have it. The finite-state machine model of computing fits well with a conversational chat bot. This approach works well with transactional or wizard-style bots that walk users through a number of steps. Now go write your own bot!