Generative AI in the Enterprise: A pragmatic approach to understanding and managing risk

Starting with Generative AI in the enterprise may feel daunting. Here is a pragmatist’s guide to getting started while managing risk, so that you can begin benefiting from the transformational potential of the technology.

Generative AI (GenAI) is a challenging set of technologies to work with. There are a number of reasons for this, but three stand out.

  • It is a general-purpose technology. Potentially on par with steam or electricity, it could drive large-scale transformation. The flip side is that the change management it demands is large-scale as well.
  • It is one of the least mature technologies out there from a governance and best practices perspective. When things go wrong, they can go badly wrong.
  • It evolves very quickly, far faster than enterprise IT departments are used to. It requires constant evaluation, research and innovation.
 

As a result, it generates a double whammy of irrational fears about using it, and unrealistic expectations of its impact, especially in the short term.

 

A Pragmatic GenAI framework for evaluating risk and realising value

The challenge is understanding how to take advantage of what the technology has to offer today while mitigating risks and remaining in control. To do that, you need a pragmatic framework that allows you to reason clearly about what is important. While there is a lot out there around trust and safety in AI, it can often feel quite abstract and high level. The focus is often on the foundational work and not on the application of those technologies to everyday enterprise problems.

Here I’ll give you a pragmatist’s guide to evaluating risk and realising value. It gets you started on the process and can help identify the right use case to begin with.

The framework is based on numerous conversations with enterprises and on the live projects we have at OpenDialog AI, which use GenAI in a way that provides a high level of automation while allowing our customers to remain in control.

It considers four dimensions:

  • Impact of the use case that we are looking to tackle, both in terms of how it impacts people and how it delivers business value.
  • Control that we are exercising on the underlying GenAI technologies. How are we constraining (or not) their behaviour?
  • Supervision of the outcomes and our ability to correct them and understand them. How are we monitoring the outputs and building in the ability to respond to undesired results?
  • Data exchange between other systems and people and GenAI technologies. Do we understand what inputs we are giving and how those are managed?
 

For each dimension, you can assign a value from low to high, then examine which dimensions need to be better managed to reach an overall setting where the risk profile seems reasonable. For example, if you have a high-impact use case but control and supervision are low, you know that there is more work to be done. You can end up evaluating use cases as shown in the table below.

 

Now, let’s consider each dimension in turn.

 

Impact

There are two types of impact to consider: what it means for people dealing with the consequences of automated decision-making and what will be the return on investment (ROI).

If you are starting out with GenAI, you want a use case that has enough ROI to matter but not so much that a failure would be catastrophic. It may feel counter-intuitive, but you are not looking to get the maximum impact from the get-go. Failure makes everyone retreat (irrationally) and the broader opportunity is then lost, all the more so in heavily regulated environments. Instead, you are looking to grow your ability to govern the technology while showcasing its potential so as to justify further investment.

Similarly, and more importantly, when evaluating how automated decision-making will affect people, we want a use case that is low impact until our confidence and trust in the technology are high enough. For example, you would not want to start by automating your entire hiring process, with an LLM directly evaluating CVs and sending out acceptance letters, but you could start by summarising CVs to speed up subsequent human evaluation.

Control

GenAI technologies are generally discussed as black box technologies that are uniquely inscrutable. While this is true, it does not mean that we cannot apply control. There are a number of different ways to achieve this, and when evaluating risk it is important to understand these techniques. For example:

  • Is the user interacting directly with a large language model (LLM) or is that interaction mediated by a control framework? For example, are you just taking user input and sending it to an OpenAI endpoint, or are you pre-processing it so that you have a clear sense of where you are in the overall process before you involve an LLM?
  • Is the output from an LLM reduced to a specific, predefined set of outputs, or are you picking up exactly what was generated from the LLM and presenting it to the user?
  • Are we depending on a model’s understanding and recall of facts or are we feeding it information that it needs to reprocess and repurpose through something like retrieval augmented generation?
 
The balance to strike is finding the sweet spot of gaining the most possible value from the use of the technology while retaining as much control as possible.

For example, for one of the recent use cases that went into production at an OpenDialog AI customer, we limited the use of the LLM to choosing an answer from a specific, predefined set. These outputs are well understood and each one has strategies in place to deal with both correct and wrong determinations. In addition, the user does not interact directly with the LLM; interaction is completely mediated through the OpenDialog framework. The customer was able to automate over 80% of the use case while benefitting from GenAI technologies. They have solved a real problem that no other approach would have been able to solve as efficiently, can see a return on the investment, retain tight control and are growing their ability to integrate and govern the use of GenAI technologies in their business.
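The idea of reducing an LLM's output to a predefined set can be sketched in a few lines of Python. This is a minimal illustration and not the OpenDialog implementation: the labels, the prompt wording, the `choose_answer` helper and the `llm` callable (any function that takes a prompt and returns text) are all assumptions made for the example.

```python
from typing import Callable

# Hypothetical labels; a real deployment would define its own set,
# each with a known handling strategy.
ALLOWED_ANSWERS = ("billing_query", "technical_issue", "handover_to_human")

# Anything outside the allowed set falls back to a safe default.
FALLBACK = "handover_to_human"


def build_prompt(user_text: str) -> str:
    """Ask the model to pick one label rather than generate free text."""
    options = ", ".join(ALLOWED_ANSWERS)
    return (
        "Classify the message into exactly one of these labels: "
        f"{options}. Reply with the label only.\n\nMessage: {user_text}"
    )


def choose_answer(user_text: str, llm: Callable[[str], str]) -> str:
    """Return a label from ALLOWED_ANSWERS, never the raw model output."""
    raw = llm(build_prompt(user_text))
    candidate = raw.strip().lower()
    return candidate if candidate in ALLOWED_ANSWERS else FALLBACK
```

Because the caller only ever receives one of the known labels, whatever the model generates, each outcome can be given its own tested response, and an unexpected reply is routed to a person instead of being shown to the user.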

Supervision

Closely connected to control is supervision. Is the GenAI technology acting independently without the results being verified, or is there a human in the loop or a human reviewing the process after the fact? If there is no human in the loop, can we provide a clear audit trail of all decisions along the way?

LLMs present a very specific challenge since technologically we cannot necessarily explain the inner workings of an LLM. The strategy we take at OpenDialog AI is to make the decision-making as granular as possible so that we can better control it. The OpenDialog engine acts as a moderator of the conversation between a human and any interpreter or reasoning engine, including LLMs. This enables us to constrain the LLM significantly and provide an overarching semantic control mechanism that ensures that only appropriate next steps are considered. This gives us a good degree of control, a clear train of granular decisions that can be managed more accurately, and better auditing of decisions.
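To make the notion of granular, auditable decisions concrete, here is a minimal Python sketch of an audit trail. It is an illustration only and does not describe the OpenDialog engine: the `Decision` and `AuditTrail` names, their fields and the JSON Lines export are assumptions chosen for the example.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _now() -> str:
    """UTC timestamp in ISO 8601 format."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class Decision:
    step: str          # which point in the conversation this was
    options: tuple     # the next steps that were allowed at that point
    chosen: str        # the option that was selected
    decided_by: str    # e.g. "llm", "rule" or "human" (hypothetical values)
    timestamp: str = field(default_factory=_now)


class AuditTrail:
    """Append-only record of each small decision taken in a conversation."""

    def __init__(self) -> None:
        self._decisions: list[Decision] = []

    def record(self, step: str, options, chosen: str, decided_by: str) -> Decision:
        # Refuse to log (and therefore to act on) a choice outside the allowed set.
        if chosen not in options:
            raise ValueError(f"{chosen!r} is not an allowed option at step {step!r}")
        decision = Decision(step, tuple(options), chosen, decided_by)
        self._decisions.append(decision)
        return decision

    def to_json_lines(self) -> str:
        """One JSON object per decision, suitable for later review."""
        return "\n".join(json.dumps(asdict(d)) for d in self._decisions)
```

Recording the allowed options alongside the chosen one means a reviewer can see not only what the system did at each step but also what it was permitted to do, which is usually easier to reason about than the model's internals.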

Data

The final dimension to discuss is data: in particular, how carefully we manage the types of data we are providing to an LLM. This is perhaps one of the issues that got the most attention from early on, as enterprises realised that staff were inputting potentially confidential information into tools like ChatGPT.

Similar to the other dimensions, the task here is to clearly understand what needs to be provided and, crucially, what does not need to be provided for reasoning to take place. For example, if we want to reason about a specific user, do we need to provide their specific information or can we construct anonymous personas that act as substitutes? The reasoning from an AI technology will be the same but the information-sharing risk reduces significantly.
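One way to picture the persona idea is a small function that turns a user record into a generalised stand-in before anything is sent to an LLM. The sketch below is hypothetical: the field names, the allow-list and the age bands are assumptions for illustration, and generalising data in this way reduces re-identification risk without eliminating it.

```python
# Hypothetical allow-list: only these fields may be passed through unchanged.
SAFE_FIELDS = {"plan", "region"}


def age_band(age: int) -> str:
    """Replace an exact age with a coarse band."""
    if age < 30:
        return "under 30"
    if age < 60:
        return "30-59"
    return "60+"


def to_persona(record: dict) -> dict:
    """Build an anonymous persona from a user record.

    Fields not on the allow-list (name, email, and so on) are dropped,
    and the exact age is generalised to a band.
    """
    persona = {key: value for key, value in record.items() if key in SAFE_FIELDS}
    if "age" in record:
        persona["age_band"] = age_band(record["age"])
    return persona
```

Using an allow-list rather than a block-list means a newly added field is withheld by default until someone decides it is needed for the reasoning.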

How to apply the framework

You can now start evaluating different use cases and identifying how different dimensions need to be adjusted in response to elements that are fixed or non-negotiable. For example, in a scenario where data sensitivity is high, you need to ensure that control and supervision are equally high.
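As a rough sketch of how such an evaluation could be written down, the Python below scores a use case on the four dimensions and lists where more work is needed. The three-level scale, the `UseCase` structure and the rule itself (control and supervision should be at least as high as the higher of impact and data sensitivity) are assumptions that simplify the reasoning above, not a formal part of the framework.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class UseCase:
    name: str
    impact: Level
    control: Level
    supervision: Level
    data_sensitivity: Level


def gaps(use_case: UseCase) -> list[str]:
    """Return the dimensions that need strengthening.

    Assumed rule: control and supervision must each be at least as high
    as the higher of impact and data sensitivity.
    """
    required = max(use_case.impact, use_case.data_sensitivity)
    issues = []
    if use_case.control < required:
        issues.append("control")
    if use_case.supervision < required:
        issues.append("supervision")
    return issues
```

For instance, a use case with high impact but low control and supervision would return both dimensions as gaps, while one whose control and supervision already match its impact and data sensitivity would return an empty list.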

Let us consider a post-hospitalisation care scenario as a more practical use case. The goal of the automation is to provide useful post-hospitalisation reminders and guidance to patients so that they can improve their chances of a positive recovery as well as collect feedback that can trigger or inform follow-up care from a medical professional.

Impact on the business: From an ROI perspective this is going to free up medical staff to deal with more complex tasks. It is not the most urgent or critical part of their job but it is significant for the long-term care and well-being of patients. A good balance between providing a clear return and not impacting the most critical systems.

Impact on customers: For customers, this information is already available in a few different ways (website, leaflets, etc.). However, they don’t always make the best use of it. A reminder is helpful, and automation can also provide guidance that is specific to their situation so they don’t have to sift through a lot of material. Again, this is a good example of something that is augmenting and improving without taking over. For a first foray into automation, this is a good fit.

Control: The conversational automation can be constrained in several different ways to provide the appropriate level of control.

  • We can focus on providing answers to questions about post-treatment from material that is pre-approved by the organisation and fits a limited number of the more general topics. More sensitive topics can trigger handover or notification to involve a medical professional.
  • When the user is providing feedback about their current state we can constrain that to well-understood scales and scenarios that have clear follow-on steps from a care perspective.
 
 

Supervision: For a use case like this it is important that relevant staff are proactively notified whenever there is an issue and there is an overall monitoring of the quality of conversations.

Data: For the LLM to be able to provide appropriate recommendations it does not need to process personally identifiable information. It can reason about broad types of users. Again this makes it a good candidate for a starting use case.

 

Conclusions

Organisations can benefit from GenAI technologies today. While it is true that the technology is still very new and best practices are still maturing, you can take a pragmatic approach and get started on the path. The right use case with the right levels of control and supervision enables you to manage risk. Given that we are dealing with a transformative, general-purpose technology, it is important to get started now. As I’ve said in a previous article, getting started early is a matter of survival for enterprises.
