Understanding the Risks of AI Deployment


The current fascination with Generative AI indicates the long arc of AI’s progress has now burst into public consciousness and debate. As such, the world of customer service and customer experience is now flooded with enthusiastic descriptions of the latest ‘thing that will change everything’.

Of that we have no doubt, given the long view of how new generations of technology soar in popularity, crash and burn under the weight of misplaced hopes and immature competency, then finally become part of everyday life once expertise and cooler heads take charge. It is a trend Gartner captured so memorably in its famous phrase ‘the trough of disillusionment’.

Those with an eye for the way disruption plays out in organisations will recognise how conversations tend to start with plenty of alarmist views then become moderated and focussed as understanding develops.

Right now, industry watchers are observing Generative AI head for the sun, aware of what’s likely to happen next. In anticipation, this article looks at early lessons in making a credible case for investment and laying the foundations for successful AI-infused business models for those still entering the race.

McKinsey has just refreshed its annual tracking of AI’s strategic evolution and adoption, which provides as good a summary of what’s currently happening as any other industry commentary.

Organisations that enjoy informed CXO leadership and faster growth from technology-driven innovation are making impressive progress. In relation to Generative AI, they are deploying use cases (service operations optimisation is top), upskilling, reprioritising digital budgets and generally aiming to take first-mover advantage in the sectors they operate in.

Where they lag is in risk mitigation to bolster digital trust (security, compliance, privacy, etc). Evidently these are not sufficient reasons to pause or wait relative to what they see as a ‘once in a generation’ opportunity that they intend to leverage as fast and completely as possible.

In other words, for some organisations, the case for Generative AI is made. Whatever issues it brings can be dealt with through agile adaptation of existing governance or by rapid iteration of new practices.

As such some customer service leaders will find themselves caught up in their organisation’s broader response to AI. Framework strategies and capabilities needed to manage Generative AI and optimise its value cannot be developed in isolation. It’s going to require strategic co-ordination and investment.

That alignment and execution process inevitably takes time. Meanwhile, contact centres are being wooed into becoming early adopters as CCaaS vendors and providers of other customer service components (knowledge management, quality assurance) infuse their solutions with Generative AI.

We believe confidence in making those investments increases with greater understanding of the challenges and the questions that need asking, as well as reassurance from seeing how other organisations are making progress.

Getting to Know Generative AI

So, what makes Generative AI different, special, and somewhat unsettling?

Back in the day, conversation topics (intents) had to be mapped against all the ways (utterances) in which users expressed them. In 2016, Citibank declared it had identified over 2,000 different ways its customers requested ‘what’s my balance?’. We have been running to catch up ever since.

As users, our experience is that this generation of bot still falls short of the expectations triggered by the promise of natural language. From the bot’s perspective, users will always introduce new and unmapped topics, since this is what naturally happens in human-to-human conversation. It is why veering off topic often needs escalation to human assistance.
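The deterministic, pre-LLM pattern described above can be sketched as a simple lookup table. This is a toy illustration only; the intent names and phrasings are invented:

```python
# Toy intent map in the pre-LLM style: every utterance must be listed in
# advance, and anything unmapped falls through to human escalation.
INTENT_MAP = {
    "what's my balance": "CHECK_BALANCE",
    "how much money do i have": "CHECK_BALANCE",
    "show my account balance": "CHECK_BALANCE",
}

def classify(utterance: str) -> str:
    """Exact lookup: unmapped phrasings escalate to a human agent."""
    return INTENT_MAP.get(utterance.lower().strip(), "ESCALATE_TO_AGENT")
```

Citibank’s 2,000-plus variants of ‘what’s my balance?’ would each need their own entry, which is why unmapped phrasings escalate so often.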

By contrast, large language models are different because of the massive datasets they have already ingested before being deployed. This means almost every intent and utterance is already mapped and addressable. It powers the novel functionality LLMs have introduced: the ability both to understand an idea (prompt) and to generate relevant content in response.

How they achieve these goals is also different.

“LLMs are probabilistic, not deterministic,” says Nicola Morini Bianzino, EY Global Chief Technology Officer and Co-Leader of EY.ai. “Unlike prior IT platforms, giving an LLM a particular input does not lead to the same output every time. GenAI models instead produce a range of outputs with an underlying probability distribution and any approach to measuring confidence needs to similarly adopt a probabilistic approach.”

This perfectly captures the opportunities and challenges of LLMs. Probabilistic (expert guessing) more closely matches what happens in human language. Attributes like nuance, context and meaning, which we associate with the quality of a conversation, suggest there is more subtle interpretation at play than just vocabulary.

Now that the bar for having a decent conversation with LLM-powered bots has moved significantly forward, we must be prepared to embrace and mitigate the consequences. For instance, generating language based on probability means that customers will receive differently worded responses.
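The effect of probabilistic generation can be illustrated with a toy sampler. The vocabulary and probabilities here are invented purely for illustration:

```python
import random

# Invented next-word distribution for completing a reply such as
# "Your balance is available ..."
NEXT_WORD_PROBS = {"now": 0.5, "here": 0.3, "below": 0.2}

def sample_next_word(rng: random.Random) -> str:
    """Sample one continuation weighted by probability: the probabilistic step."""
    words = list(NEXT_WORD_PROBS)
    weights = [NEXT_WORD_PROBS[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

# Twenty customers ask the same question; the sampled wording varies.
rng = random.Random(1)
replies = {sample_next_word(rng) for _ in range(20)}
```

A deterministic system would return the same wording every time; here the set `replies` almost certainly contains more than one.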

When does this matter?

Assuming output is being moderated by tone-of-voice guidelines and sense-checked for accuracy, many would cheer its ability to deliver unique, personalised conversation. But what happens when bias infects the language? Or when the model does not just modify the way an idea is expressed but also alters the veracity of the content? Put bluntly, it makes stuff up.

LLMs have no native concept of right and wrong without the addition of ‘deterministic’ guidance. 

Embracing the fluency of LLMs entails developing strategies for when the probability engine is going to generate ‘wrong’ output.  

In some situations, the consequence is a low NPS score from a frustrated customer. In others, it might escalate to legal threat and reputational damage. And bear in mind the full extent of that liability remains unknown as legislation inevitably lags.

In case all this starts to feel far too unpredictable, here is an encouraging story from NatWest, who augmented their ‘deterministic’ bot with an LLM upgrade, bearing in mind the risks implicit in being a high-profile bank.

The first point to notice in this story is that Cora, the bot, was first launched in 2017 to answer simple questions from retail and business customers. At launch, it handled around 1,000 chats a month. This scaled to 1.4 million conversations every month. NatWest therefore had six years of prior expertise and internal debate to draw on when exploring LLM risks with executive decision makers.

The renamed Cora+ now generates personalised responses to complex queries. It can help customers compare products, find services they need and navigate NatWest’s digital platforms. It can also draw information from various secure sources: a capability not previously available through chat alone.

The vendor’s Client Engineering team tested and validated Cora+ alongside NatWest’s business and technology teams, with a focus on the secure deployment of the AI digital assistant.

During this, they probably focussed on a well-reported LLM vulnerability, prompt injection, in which hackers construct prompts requesting answers on ‘forbidden’ topics (e.g. customer details) by tricking the bot into overriding existing guardrails, for example by posing a fictitious life-threatening scenario.

“With the appropriate guardrails and governance in place ensuring that AI is open, trusted and targeted, banks can deliver an empowering value proposition enabling an even deeper level of customer loyalty.”

The published quote is also instructive. The mitigation strategy is built from a combination of technical controls (guardrails) and risk management (governance) that enables a Generative AI service to be trusted and focussed: an optimal balance of probabilistic and deterministic behaviours.
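A minimal sketch of how a deterministic guardrail might wrap a probabilistic model. The blocked patterns, function names and refusal text are all hypothetical, not NatWest’s actual controls:

```python
# Hypothetical deny-list of prompt-injection phrasings; real guardrails are
# far more sophisticated (classifiers, output filters, policy engines).
BLOCKED_PATTERNS = [
    "ignore previous instructions",
    "pretend you are",
    "another customer",
]

def is_safe(user_prompt: str) -> bool:
    """Deterministic pre-check before the prompt reaches the model."""
    lowered = user_prompt.lower()
    return not any(pattern in lowered for pattern in BLOCKED_PATTERNS)

def call_llm(prompt: str) -> str:
    return f"LLM response to: {prompt}"  # stand-in for a real model call

def answer(user_prompt: str) -> str:
    if not is_safe(user_prompt):
        return "Sorry, I can't help with that request."  # deterministic refusal
    return call_llm(user_prompt)
```

The probabilistic model only ever sees prompts that have passed the deterministic check, which is the ‘optimal balance’ the quote describes.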

As a final point in this section of the discussion, here is an observation that will interest designers of more deterministically constructed bots. It comes from the concluding thoughts of an article comparing old and emerging design principles. 

“Rather than designing everything we want to include in the bot, instead we give the bot a vast information resource and tell it everything we want it to exclude from its responses.”

Issues to Focus On

Customer Service leaders need to be assured that the right answers are being delivered. So what can go wrong?

A currently popular use case is the ability to summarise a knowledge article or pick out relevant sections in response to a specific user intent. Ongoing tracking shows that in controlled tests, hallucination rates range between 3% and 27% depending on the LLM. Associated research suggests that flawed reasoning and the model’s misinterpretation of retrieved data are among the prime causes of factual errors.

While the research community continues to probe root causes, vendors are expediting practical solutions. For instance, by adding a range of filters to ensure safe and accurate answers.

Another issue might be caused by the recency of the information needed. All LLMs are trained at a certain point in time and are restricted to what was known at that point. However, additional information can be supplied without retraining the model, using a workaround called Retrieval-Augmented Generation, or RAG for short.
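The RAG workaround can be sketched in a few lines. This uses toy word-overlap retrieval and invented documents; a production system would use embeddings and a vector store:

```python
# Invented knowledge snippets that postdate the model's training cut-off.
DOCUMENTS = [
    "Branch opening hours changed to 9am-4pm from 1 March.",
    "The new savings account pays interest monthly from April.",
]

def retrieve(question: str) -> str:
    """Pick the document sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(DOCUMENTS, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(question: str) -> str:
    """Prepend retrieved context so the model can answer without retraining."""
    context = retrieve(question)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The model never needs to have ‘learned’ the new opening hours; they arrive inside the prompt at query time.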

Then there is the issue of data leaks. What about an LLM inadvertently exposing personal data from one customer interaction in a conversation with another customer? This could lead to serious privacy breaches and fraud potential, with the threat of regulatory fines and broader reputational damage.

Mitigation strategies are at hand: ensuring that training and fine-tuning data are thoroughly anonymised and free of sensitive information. This can be augmented with policies to limit the retention of sensitive user data within the model’s memory or databases. Organisations can then remain vigilant through regular auditing and monitoring of the model’s outputs to detect and address any inappropriate disclosures of information.
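A minimal anonymisation sketch along these lines. The regex patterns are illustrative only; real pipelines use dedicated PII-detection tooling and human review:

```python
import re

# Illustrative PII patterns: email addresses and 16-digit card numbers.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace detected PII with placeholder tokens before data is reused."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Running redaction before any transcript enters a training or fine-tuning set removes the raw values while keeping the conversational structure intact.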

As a final comment, it is in the interest of key market players to proactively reduce these risks. For instance, Microsoft offers an enterprise-grade version of ChatGPT that organisations can deploy and run privately, providing a similar user experience to the standard ChatGPT without exposing company data.

Similar initiatives and mitigations from others can be expected in the future.  

Concluding Thoughts

AI is an evolving portfolio of capabilities rather than a single thing that emerged at a single point in time. As such, many customer service operations have been using AI for decades. In this sense, given the broadness of the AI label, we need to appreciate that it is both tried and tested and full of risk, depending on which generation you are referring to.

The possibility that by 2030, customer service as we currently deliver it will be transformed in terms of value and cost is what’s driving sector-leading organisations to adopt sooner rather than later. They are ready to deal with the issues as they arise.

They recognise Generative AI as a business imperative. Being late could be a fatal mistake. In the unlikely circumstance that legislation arrives in time to slow down the level of foundation adoption that is now in play, vested stakeholder interests will also prove tough opposition.

Sharing the experience of deploying custom ChatGPT at his company, Philippe Rambach, Schneider Electric’s chief AI officer, offers a final piece of radical advice:

“Still very early, but I would say that the biggest learning for me is to forget the past. Whatever millions you have put into technology, if a new one is coming which is much better, stop (using it) and stop it fast. For example, we decided quickly in June to stop investing in our (existing) chatbots … and decided to move all our chatbots to generative AI.”