The Capability Dilemma

PolyAI Blog · 2024

One of the most significant factors determining how a user feels about their experience with a technology is the difference between the tech’s inferred capability and its actual capability. Early on in an interaction with a device or interface, a user will subconsciously construct a use image, a mental picture of how the device works and what it can do. When a user discerns a discrepancy between their use image and the technology’s performance, there is potential for an extremely positive or an extremely negative experience. On the one hand, if a user overestimates a device’s capabilities, they’ll be all the more disappointed when it underperforms. On the other hand, a device might so exceed the capabilities inferred by the user that it blows their socks off and positively upends their assumptions about that technology entirely.

Under Promise; Over Deliver

A use image is built in response to cues presented by the device and by past experiences with similar technology. When a device has simple behavior (think of a TV remote control), the use image is simple too. A TV remote cues the user to its capabilities by having a bunch of buttons labeled with their purposes. A user combines the information provided by these cues with their prior experience with TV remotes (and all devices with buttons) and mentally models how they expect the device in their hand to behave: if I push the mute button, the TV will mute.

Naturally, if the mute button actually did nothing at all, that would be a very negative experience because the device is seriously underperforming its advertised capabilities and violating that mental model. Conversely, if the remote integrated some new unexpected capabilities (e.g. nowadays, some smart-TV remotes have a microphone to let you search for movies and shows without typing out their names), the use image is blown wide open and the user might be amazed. In a simple application like a TV remote, it seems obvious that you should always “under promise; over deliver”: you want to keep your users pleasantly surprised by your device’s capabilities, hanging onto your most advanced capabilities as a hidden ace up your sleeve.

Complexity and the Intentional Stance

As a piece of technology and its behavior become more complex, so too does our use image of it. Think now of the behavior of an entire computer, which can do lots of different things with lots of kinds of input. When the use image becomes so complex as to be almost inscrutable, people tend to fall back to a kind of default mental image, assigning agency and intentions to the device. We default to this intentional stance in our language around technology all the time. Think of the last time something unexpected happened when interacting with a computer or application and you said “Oh, it didn’t like that,” or “Seems like it wants me to click on this.” Those kinds of outbursts may seem like cute, superficial ways we personify devices in language, but they are actually indicative of a deeper personification of our use images. This makes perfect sense, since the most complex and nuanced behavior we ever experience is that of other people, who have genuine wants, needs, likes, dislikes, etc.

The Risks and Rewards of a Human Use Image

When it comes to PolyAI’s Conversational User Interfaces (CUIs), the sky’s the limit in terms of behavioral complexity. CUIs capitalize on human intuitions about interacting with other people and deliberately cue the user to construct a human-like use image of how they behave. This design tactic has great potential to make CUI interactions feel extremely natural and frictionless, since we interact with other people effortlessly all day. Also, an extremely important part of maintaining user engagement in a turn-based interface like a voice assistant (VA) is captivating the user enough in the first turn that they’ll even give you a chance. This is even more important for customer service applications, where it is well documented that people already prefer to use a human intermediary rather than a self-service option, and where people might be disillusioned by generations of underperforming traditional phone menus (think, press 1 to speak to the front desk, press 2 to…). Much more than in the case of a TV remote, there is a significant risk in fostering a low-capability use image because the user might abandon the VA before we even get a chance to prove ourselves. This is one of the main reasons PolyAI agents often don’t announce that they are VAs on the very first turn.

That said, a CUI potentially carries the greatest risk of over-advertising its capabilities. If a user has built a human-like use image, they might, even implicitly, expect the entire range of human-like capabilities. Research shows that the experience of using a lower-performing device which fosters a human-like use image is far more frustrating than the experience of the exact same capabilities in the context of a more robotic or machine-like use image. Although PolyAI agents deploy top-of-the-line natural language technology and are carefully customized to the domain of each application, there is always a risk of fostering a use image that will lead to frustration.

Know Your Audience

This seems like a real design catch-22: do we prioritize the concern about losing user engagement by seeming not capable enough, and design our agents to seem maximally capable? Or do we prioritize not overselling our capabilities (perhaps by having our agents announce themselves on the first turn) so as to privilege overall customer experience among those who choose to engage?

As is often the case, the answer lies in the customer. We find that the risks and rewards associated with these two options are not universally the same across every customer demographic. Current research suggests that older adults are enthusiastic about the prospects of AI-enabled technology and that the media targeted toward them tends to oversell the capabilities of AI tech and voice user interfaces. When designing for primarily older-adult customers, it can therefore be especially important to be clear and upfront about what we can and cannot do and provide some suggestions for tasks to complete.

On the other hand, among the younger and middle-aged crowd, who may be more familiar with the range of voice user interfaces (VUIs) and their applications, users underestimating PolyAI’s capabilities may be more of a risk, so we want to prioritize preventing user abandonment and give ourselves a chance to surprise them with performance that far exceeds interactive voice response (IVR) systems and other VUIs.

Designing customer-led voice assistants goes beyond allowing the customer to guide the conversation once they’re engaged. It also involves anticipating customer feelings and assumptions about our technology and adapting to them before they even pick up the phone.
