Securing AI: Building Safer LLMs in the Face of Emerging Threats

Published 4/12/2024 11:32:52 AM
Filed under Machine Learning

The future of computing is changing rapidly with the arrival of foundational models like GPT-4. We're learning a lot about how to use these models, but as we're working towards new applications, we shouldn't forget about keeping information safe.

Last week I visited QCon 2024 in London with Joop Snijder, where we learned that there's still a lot of work left for us when it comes to securing AI applications based on large language models. We're facing an uphill battle, it seems, because of how easily you can fool models like GPT-4 into exposing information that shouldn't be visible.

Threat modeling is becoming more important

During one of the presentations, we were shown how you can expose important security information by sending GPT-4 a limerick that alludes to certain sensitive properties you want returned. When asked directly, GPT-4 wouldn't show the information the hacker was looking for. However, when presented with a limerick that alluded to the same information, GPT-4 got weak in the knees and presented the information. And it thanked the user for the attempt at being poetic.

While fun to watch, it made me more than a little worried about the future of LLM applications. The people who wrote the software around the LLM seemed to forget the basics of security. They figured that telling the LLM not to expose sensitive information in the system prompt would be enough to protect their system against abuse. They were wrong.

I think I mentioned this before on my blog a couple of times. Building AI applications isn't about the AI so much as it is about building software. And with building software, you need to spend time thinking about security. More powerful applications like the ones integrating AI need better security measures, even in less regulated industries.

Increasing security of software starts with raising awareness. And to increase security awareness, it's essential to take a practical approach. One such approach that works well in financial environments is to build a threat model of your application.

A threat model helps you understand threats and mitigations for assets in your application that are valuable to you. The practice of threat modeling involves identifying entry points, exit points and assets in your application that you want to protect. You can then analyze threats for these three categories and come up with mitigations for those threats.
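
The three categories can be captured in a small data structure. Here's a minimal sketch of that idea; the `Threat` and `ThreatModel` names are my own illustration, not a real threat modeling tool:

```python
from dataclasses import dataclass, field

@dataclass
class Threat:
    description: str
    mitigations: list[str] = field(default_factory=list)

@dataclass
class ThreatModel:
    # Threats grouped by the three categories you analyze.
    entry_points: dict[str, list[Threat]] = field(default_factory=dict)
    exit_points: dict[str, list[Threat]] = field(default_factory=dict)
    assets: dict[str, list[Threat]] = field(default_factory=dict)

    def open_threats(self) -> list[Threat]:
        # Threats without any mitigation need attention first.
        all_threats = [
            threat
            for category in (self.entry_points, self.exit_points, self.assets)
            for threats in category.values()
            for threat in threats
        ]
        return [t for t in all_threats if not t.mitigations]

model = ThreatModel()
model.entry_points["chat endpoint"] = [
    Threat("Prompt injection via user messages",
           mitigations=["Filter prompts before they reach the LLM"]),
]
model.assets["vector store"] = [
    Threat("Unauthorized retrieval of internal documents"),
]
```

Even a structure this simple makes it easy to spot which threats still lack a mitigation during a review.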

Learning how to build a threat model does take time, though, and you should be aware that it's not something you can do in an afternoon. That's why I recommend integrating it in your software development process.

Integrating security in the development workflow

Integrating threat modeling in your software development process helps break down a complicated problem into manageable pieces. It also helps your team grow in the practice of building secure software. Finally, it keeps your team from building too much security, which would make the software more expensive than it needs to be.

If you're interested in starting with threat modeling as an integral part of your software development process, you'll need to implement three steps:

  1. At the start of the project, you'll want to set up the basic structure of the threat model with a list of immediate concerns that impact the architecture of the software.

  2. During the sprint you want to expand the model by identifying and analyzing threats for entry points, exit points, and assets involved in user stories you're working on.

  3. When the project is in production, you want to review and update your model periodically and after incidents to increase the security of your software as necessary.

Let me go over each step to explore what it involves.

Building the initial threat model

At the start of a project, you don't know what hackers will try to break. You also don't have a complete picture of what the software will look like. However, you'll likely have a good idea of what the structure of the software will look like.

I recommend building a project start architecture. This outlines the context, scope, and solution strategy for your solution. A great way to document these is to use the Arc42 template. As you're thinking about an initial set of requirements, consider the entry points, exit points, and assets involved in those requirements. This can help you identify threats and come up with an initial set of mitigations for them.

There are several methods available to document threats and mitigations. If you choose to follow the Arc42 template, I recommend that you document threats and mitigations as part of the risks chapter in your architecture documentation. You can then refer to other chapters for building blocks involved in the threats and mitigation of the threats.

After you've completed the project start architecture, you'll have a solid basis upon which to build for the remainder of your project.

Integrating threat modeling throughout the project

As you're implementing the project based on the project start architecture, you'll learn more about what assets are valuable to hackers and what entry and exit points need to be protected.

Make sure you add threat modeling as part of the refinement process, just like how you make quality part of the refinement process. One way of integrating quality and security is to ask three questions:

  1. How are we going to test this piece of functionality?

  2. What data are we going to feed into the system, and what is coming out of the system?

  3. Do we know who will be using this user story, and how much can we trust them?

This then leads to a discussion about possible threats, and how we should mitigate them. It's helpful to have the OWASP threat modeling guide at hand for the discussion.

After you've finished refining a user story, make sure to update the architecture documentation with the new threats and mitigations.

I prefer developers to update the architecture documentation because they know all the fine-grained details. I then review the changes to the architecture documentation to make sure we're not forgetting important steps.

As a rule, I recommend keeping the documentation short. You should prefer a few bullet points with a good quality diagram over long pieces of text. It's easier to maintain and easier to understand when you're rushing to resolve a problem later.

As you progress through the user stories, you'll build a well-integrated approach to building secure software and a well-documented system without breaking the bank.

Updating the threat model in production

Eventually, there comes a point where your project enters production. For AI projects, and software projects in general, this is where you expose your system to hackers who will attempt to break your beautifully crafted AI solution.

Incidents will happen; that's the nature of using AI and large language models at this stage. Of course, it's important to fix incidents as they happen. But you'll also want to perform a postmortem to learn from the incident and update the threat model with the new knowledge gained. You can follow an approach similar to how you expanded the threat model when refining user stories. The difference with incidents is that you now have information about the real value of the assets in your solution to hackers.

I find that taking an agile approach to threat modeling has helped me overcome the fear of missing security measures, because I know my team will provide a few extra sets of eyes, increasing the chance that we're doing the right thing.

As you can see, threat modeling is something that can and should be applied to all kinds of software projects. But there are a few extra aspects that you need to think about when you're building projects that involve large language models.

Building protective measures around LLMs

When building an AI project that contains a large language model, you're working with a model that's easily broken by hackers. Let me explain why.

A typical AI solution that contains an LLM uses that LLM to process user input, then accesses systems based on the parsed information and turns the information from those systems into output for the user.

More concretely, if you're building something like a RAG solution, you'll take the user prompt, send it off to a vector store to retrieve internal business information, and turn that internal business information into a response to the user prompt.
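
To make that flow concrete, here's a toy sketch of such a pipeline. The dictionary-based "vector store" and the `complete` function are stand-ins for a real similarity search and LLM client:

```python
# A toy RAG pipeline: retrieve context for a prompt, then ask the LLM
# to answer using only that context. The "vector store" is a plain dict
# and `complete` is a stub standing in for a real LLM call.

DOCUMENTS = {
    "expenses": "Travel expenses are reimbursed within 30 days.",
    "security": "All laptops must use full-disk encryption.",
}

def retrieve(prompt: str) -> list[str]:
    # Stand-in for a vector-store similarity search: naive keyword match.
    return [text for key, text in DOCUMENTS.items() if key in prompt.lower()]

def complete(system: str, user: str) -> str:
    # Stub for the LLM call; a real implementation would call your model here.
    return f"[answer based on: {system}]"

def answer(prompt: str) -> str:
    context = retrieve(prompt)
    if not context:
        return "I don't have information to answer that."
    system = "Answer using only this context: " + " ".join(context)
    return complete(system, prompt)
```

Every stage in this flow is a potential entry point, exit point, or asset for your threat model.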

You should be asking yourself: what happens if someone asks a question about information they shouldn't have access to? You don't want your LLM to provide an answer in that case.

And this is precisely what's broken in many LLM solutions today. Most solutions use guard rails in the system prompt to guide the LLM so that it doesn't provide answers that the user shouldn't see. However, with a clever prompt, you can get the LLM to provide an answer anyway.

There are three important aspects to preventing the LLM from giving unwanted answers:

  1. First, we need to make sure that we protect information sources.

  2. Then, we need to protect the user against malicious responses.

  3. Finally, we have to make sure that we provide adequate answers.

Let's go over each of these aspects to learn how to handle them.

Protecting information sources

You should never trust the system prompt to protect against exposing sensitive information. Instead, make sure you protect the assets that the LLM has access to, like the vector store and the relational database containing the information that you want the user to access through the LLM.

It sounds funny, but you shouldn't give your LLM a personality of its own. In other words, the LLM should always access systems on behalf of the user. Don't call the vector store using a system account; instead, forward the user's token so you can limit access to information that's relevant to that user.
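
Here's a minimal sketch of what retrieving on behalf of the user could look like. The document structure and role check are illustrative assumptions, not a real vector store API:

```python
# Sketch: query the vector store on behalf of the user instead of a
# system account. Each document carries an access list, and retrieval
# filters on the caller's roles, so the LLM never sees documents the
# user couldn't open directly.

DOCUMENTS = [
    {"text": "Q3 salary adjustments", "allowed_roles": {"hr"}},
    {"text": "Office opening hours", "allowed_roles": {"hr", "employee"}},
]

def retrieve_for_user(query: str, user_roles: set[str]) -> list[str]:
    # Only return documents the calling user is allowed to read;
    # a real implementation would combine this with similarity search.
    return [doc["text"] for doc in DOCUMENTS
            if doc["allowed_roles"] & user_roles]
```

With this setup, a clever prompt can no longer trick the LLM into revealing HR documents to a regular employee, because those documents never reach the LLM in the first place.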

Filter malicious and unwanted responses

When I built my first smart assistant for Info Support, I added a list of instructions to our system prompt to guide the LLM not to talk about certain topics. However, I quickly learned that this didn't work at all. Many colleagues, who will remain anonymous, broke the system more than once.

Instead of relying on guard rails to protect against malicious responses, I switched to using a separate machine learning-based filter to catch prompts and responses that we considered unwanted or unsafe.
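
Here's a sketch of where such a filter sits in the flow. The keyword check is a crude stand-in for a trained classifier, purely to show the shape of the solution:

```python
# Sketch of a separate filter around the LLM call. A real setup would
# use an ML-based classifier or moderation model; this stand-in scores
# text against a blocklist only to show where the checks belong.

BLOCKED_TOPICS = {"salary", "password"}

def is_unsafe(text: str) -> bool:
    # Stand-in for a trained classifier returning an unsafe verdict.
    return any(topic in text.lower() for topic in BLOCKED_TOPICS)

def guarded_completion(prompt: str, llm) -> str:
    if is_unsafe(prompt):                # check the prompt going in
        return "I can't help with that request."
    response = llm(prompt)
    if is_unsafe(response):              # check the response coming out
        return "I can't share that information."
    return response
```

The key point is that the filter lives outside the LLM, so a prompt that tricks the model still gets caught on the way out.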

It's important to note that while using external filters is better than using guard rails, it's not perfect and likely will never be perfect. This is where I find it important to discuss how far you want to go. The threat modeling process will help you find a good balance between keeping users safe and sane while keeping costs reasonable.

Providing reasonable answers to user questions

One of the most annoying things about using LLMs is that they tend to wander when you don't provide the right information to answer a user's question. Many people call this hallucination. I'm not sure if this is the appropriate term for a mathematical model, but it seems to stick with everyone.

When building a RAG solution, you want to make sure that you have the appropriate information to answer user questions. Low-quality information assets will break your RAG, leading to unwanted answers and, possibly, unwanted user behavior as a result.

I recommend investing in high-quality evaluation datasets and tools to mitigate the risk of providing low-quality answers. One framework that has helped me a lot is Ragas. This framework lets me easily measure how factually the LLM answers questions and how coherent it is in conversations.
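
This isn't the Ragas API, but a simplified stand-in that shows the shape of such an evaluation loop; `rag_answer` is a hypothetical hook into your own pipeline:

```python
# A simplified evaluation loop in the spirit of what Ragas automates.
# Each test case pairs a question with facts the answer must contain;
# the score is the fraction of cases the system answers correctly.

def evaluate(rag_answer, cases: list[dict]) -> float:
    hits = 0
    for case in cases:
        answer = rag_answer(case["question"]).lower()
        if all(fact.lower() in answer for fact in case["required_facts"]):
            hits += 1
    return hits / len(cases)

cases = [
    {"question": "When are expenses reimbursed?",
     "required_facts": ["30 days"]},
    {"question": "What encryption policy applies to laptops?",
     "required_facts": ["full-disk encryption"]},
]
```

Running a dataset like this after every change to your prompts or information assets tells you quickly whether answer quality regressed.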

No matter how good your data is, the user will always find a way to get a wrong answer, willingly or unwillingly. So, as a final precaution, I recommend that you clearly post notices in the user interface letting users know that they're dealing with generated content that could be wrong.

In some cases, I even recommend not using AI at all. For example, for legal advice, I can't say that AI adds a lot of value when you consider the risk of getting the answers wrong. The same goes for HR related information. I don't recommend using LLMs to answer questions in those cases.

This leads me to another interesting topic that we shouldn't skip when talking about security in AI: regulatory requirements have a role to play when implementing LLMs and AI in general.

The role of AI legislation

When you consider using AI, I highly recommend looking at the upcoming AI Act. Even if you're not living in Europe, it's still a good idea to look at the ideas in this new piece of legislation that's coming into force in April 2026.

As we're learning more about the possibilities of AI, it's important to understand when AI doesn't provide enough value considering the risks we need to take. The AI Act is clear on this topic. There are cases where you shouldn't use LLMs or other machine learning models. For example, manipulating people's behavior with deep fakes or similar techniques is considered extremely dangerous and is therefore prohibited. You also don't want to use AI for social scoring.

The AI Act provides guidance on how to approach security and safety of AI by setting requirements for systems that introduce various levels of risk. For example, high-risk systems require you to be transparent about the data you use and how your models work. This will have an impact on the security measures in your system as well as the solution strategy you need to follow.

There's still a lot we must learn around security and AI. The AI Act acknowledges this fact. It also accounts for the fact that AI is a fast-changing technology area. Therefore, the AI Act has a clause that allows for experimentation. Startups and small to medium-sized organizations are allowed to try out new techniques without having to worry too much about security and safety requirements. However, it's still a good idea to apply the practices that we discussed in this article from the start.

Preparing for regulatory changes

Following the guidelines in the AI Act will provide a solid foundation for more secure and better-quality AI solutions going forward. If you combine the guidelines with threat modeling and a well-documented architecture, you're well on your way to complying with the new legislation.

The steps I provided in this article will work for both small and large organizations, since you don't need to document a lot or spend thousands of euros/dollars on specialized technology.


If there's one thing I learned from QCon London 2024 and my own experience, it's that software developers have an important role to play in building AI solutions. Building safe AI solutions requires you to take different perspectives on what you're building, and it involves many disciplines, ranging from security specialists and developers to architects and data scientists.

We should learn from what came before in security to build better AI systems. And I hope you'll pick up my practices, whether you're building AI solutions or not.

Thank you for reading and see you in the next one!