By Amanda Stent
If you have used a smartphone personal assistant, you would probably agree that a computer has talked to you in a “natural language” like English or Spanish. However, it may surprise you to learn that the same is true if you have checked your email, used a shopping website, checked the weather online, tweeted with a company, or looked up directions on the Web. In fact, the Internet today is full of a mishmash of human- and computer-generated language.
How do computers generate language? Modern natural language generation (NLG) systems operate over raw numerical data, structured databases, or text input. They generate language for a great variety of useful applications, including weather forecasting, financial and healthcare report generation [1, 2, 3], and review summarization [4]. They produce output using one of three basic methods. The first, and by far the most widely used, is template-based generation: a human writes natural language text with gaps, and the computer fills in the gaps from dictionaries. If you’ve received a form letter from a company, that was template-based generation. The second type of natural language generation is grammar-based: a human writes a set of rules covering the structure of a natural language, and the computer processes the rules to produce natural language. Example grammar-based NLG systems are the open-source SimpleNLG and OpenCCG systems. The third approach to natural language generation is statistical: the computer “reads” a lot of text (e.g. from the Web) and learns the patterns with which people write or speak. Then it can produce those patterns. A variation on statistical natural language generation that allows for more control uses a simple set of rules specifying the structure of the language to produce many possible outputs, and then a statistical model of text to rank those outputs so that the most “human-like” one can be selected.
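The template-based approach, and the overgenerate-and-rank variation described above, can be sketched in a few lines of Python. Everything here is invented for illustration: the templates, the restaurant slot values, and the tiny bigram-count "language model" stand in for the human-authored template files and corpus-trained statistical models a real system would use.

```python
# Hypothetical templates a human writer might author; the {slots} are the gaps.
TEMPLATES = [
    "{name} is a {cuisine} restaurant in {area}.",
    "In {area} you will find {name}, which serves {cuisine} food.",
]

def fill(template, slots):
    """Template-based generation: fill the gaps in a human-written template."""
    return template.format(**slots)

def score(sentence, bigram_counts):
    """Toy statistical ranker: reward word pairs often seen in a corpus."""
    words = sentence.lower().rstrip(".").split()
    return sum(bigram_counts.get(pair, 0) for pair in zip(words, words[1:]))

def generate(slots, bigram_counts):
    """Overgenerate-and-rank: fill every template, return the best-scoring one."""
    candidates = [fill(t, slots) for t in TEMPLATES]
    return max(candidates, key=lambda s: score(s, bigram_counts))
```

For example, given `slots = {"name": "Rob's Bistro", "cuisine": "French", "area": "Madison"}` and bigram counts favoring phrases like "restaurant in", `generate` would prefer the first template's rendering over the second's.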
Now let’s imagine that you wanted to make a system that talked with human users using natural language. For example, you might want to make a mobile app that recommended restaurants, that helped users change their bad habits, that compared the stats of football players for a fantasy football league, or that played a character in a mobile game. What would you want the NLG system to do in each case? At a minimum, you would probably want the system to produce correct, grammatical and natural prompts and responses, in an efficient manner; that is, you would want the system’s output to capture the content of the input accurately, to be easily understandable by a human, and to appear in a reasonable amount of time. These are standard NLG evaluation metrics.
One could argue that there is more than enough natural language on the Web to give any computer a correct, grammatical and natural output for almost any input; that is, if we could learn the mappings from knowledge representations to language outputs, we would never have to build an NLG system by hand again. However, if you wanted to use the NLG system in an interactive context, such as encouraging users while they exercise or playing a character in a game, you would probably also want several other, less obvious, things from your NLG system. For example, you might want the system to exhibit controlled variation. Specifically, you might want the system to adapt its output to the context and to the user (e.g. not keep saying ‘Peyton Manning, the quarterback’ when ‘Manning’ would work for a football fan, or not say ‘Rob’s Bistro, 234 Main Street, Madison’ when the user is right across the street and it could just say ‘Rob’s Bistro, in front of you’). In addition, if the system is representing a company or a character in a game, you might want it to exhibit personality; a villain interacts differently than a hero, and different companies have different corporate personalities. And finally, if the system is very interactive, you would want it to have good ways to manage the interaction – for example, good ways to handle errors and ambiguities. Several of these new metrics arise directly from the interactive nature of the application – essentially, you want users to be sufficiently engaged with the system that they continue the interaction. The problem is that these additional desiderata are easy for humans to understand but hard to quantify and model in a computer program, especially in the absence of user feedback. We need methods for NLG that allow us to model the complexities of interaction as well as take advantage of the many sources of language data on the Web.
What is the big goal for NLG systems for interaction? What would allow us to say this AI problem had been ‘solved’? And what about the science of NLG – how can we use NLG systems to further our understanding of human intelligence? The famous Turing test is a test of an interactive NLG system, but in some ways the test is oddly limited – the system and human are not co-present and can interact only through text, so the interaction does not take into account physical context or the user’s history; the task is a sort of trivia quiz, so the user may not care deeply about success; and there is no social or emotional engagement element, so only a small aspect of human intelligence is examined. At the same time, the famous experiments with the Eliza chatbot showed how easily humans can be fooled into attributing intelligence to a machine. What if we proposed new tests, e.g. a computer system that could convince a user to buy a product, or a virtual standup comedian? Both of these applications involve task-related intelligence, conversational intelligence and social intelligence. Or how about an interactive system that could be so helpful and engaging that a user would choose it over a human personal assistant?
At Yahoo, we are all about creating fun and personalized interactions to support users’ daily habits, and consequently we care deeply about issues of adaptation and engagement. Our applications run the gamut from asynchronous interaction (e.g. Yahoo Answers, Yahoo Groups) to situated interaction (e.g. Yahoo mobile search, Aviate). Furthermore, at Yahoo Labs we have the ability to run experiments at scale, allowing us to automatically identify the subtle features of language use that correspond, for example, to ‘helpful’ adaptation, to ‘informative’ answers or to a ‘fun’ personality. If you are a graduate student or faculty researcher interested in questions around NLG for interaction, we invite you to contact us – we would love to collaborate. Help us design interactive systems for the future that are engaging (e.g. fun, dramatic, beautiful) as well as useful.
[4] Di Fabbrizio, G., Stent, A., & Gaizauskas, R. (2013). Summarizing opinion-related information for mobile devices. In Neustein, A. & Markowitz, J. (eds.), Mobile Speech and Advanced Natural Language Solutions. Springer.