Sweat the small stuff: Data protection in the age of AI


As concerns about AI security, risk, and compliance continue to escalate, practical solutions remain elusive. While NIST released NIST-AI-600-1, Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile, on July 26, 2024, most organizations are just beginning to digest and implement its guidance, with the formation of internal AI Councils as a first step in AI governance. So as AI adoption and risk increase, it’s time to understand why sweating the small and not-so-small stuff matters and where we go from here.

Data protection in the AI era

Recently, I attended the annual member conference of the ACSC, a non-profit organization focused on improving cybersecurity defense for enterprises, universities, government agencies, and other organizations. From the discussions, it is clear that the critical focus for CISOs, CIOs, CDOs, and CTOs today is twofold: protecting proprietary AI models from attack and preventing proprietary data from being ingested by public AI models.

While a smaller number of organizations are concerned about the former problem, those in this category realize that they must protect against prompt injection attacks that cause models to drift, hallucinate, or fail outright. In these early days of AI deployment, there has been no well-known incident equivalent to the 2013 Target breach to show how such an attack might play out; most of the evidence is still academic. However, executives who have deployed their own models have begun to focus on protecting their integrity, since it is only a matter of time before a major attack becomes public, bringing brand damage and potentially greater harm.

The latter issue, data protection, touches every company. How do you ensure that your core IP, code, customer data, etc., isn’t intentionally or accidentally exfiltrated into a public LLM? It has become ultra-important for CISOs to monitor LLM interactions, track protected source code in cloud repositories (repos), and prevent unauthorized AI indexing of intellectual property and other private data.
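As a minimal sketch of what that monitoring can look like in practice, the Python snippet below screens an outbound prompt against a handful of illustrative patterns (an AWS-style access key, private-key headers, a hypothetical project codename) before it is allowed to reach a public LLM. The patterns, names, and blocking behavior are assumptions for illustration; a real deployment would draw its rules from the security team’s DLP tooling.

```python
import re

# Illustrative patterns only -- a real deployment would pull these from the
# secrets-scanning or DLP rule set maintained by the security team.
BLOCK_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "internal_codename": re.compile(r"\bPROJECT-ORION\b", re.IGNORECASE),  # hypothetical codename
}

def screen_prompt(prompt: str) -> list[str]:
    """Return the names of any rules the outbound prompt violates."""
    return [name for name, pattern in BLOCK_PATTERNS.items() if pattern.search(prompt)]

def send_to_public_llm(prompt: str) -> None:
    """Block and record any prompt that matches a sensitive pattern."""
    violations = screen_prompt(prompt)
    if violations:
        # Refuse the request and surface the event for the security team.
        raise PermissionError(f"Prompt blocked by DLP rules: {violations}")
    # ...otherwise forward the prompt to the approved LLM endpoint here...
```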

Industry security perspectives

From the data presented at the recent conference and from conversations with other industry security executives, it is clear that only a minority of organizations have deployed solutions to protect their enterprises against AI dangers. In general, there appear to be three mindsets at this point in time:

  • Focus on policy and process, adding new AI usage limitations to the rules that are already in place for data protection and privacy
  • Treat AI data risk as a new area where new policies, processes, and product-based logging and defense are required
  • Look at AI as just another tool that is already covered by existing policies and processes. The executives in this category say that data privacy, data and IP exfiltration, and malicious attacks are the pertinent topics to focus on, with or without AI in the mix. AI is no different from other applications or cloud environments already covered by existing defenses and processes.

No matter your beliefs, protecting your organization’s internal data has to be at the top of the CISO checklist. For those with proprietary AI models, preventing malicious attacks, data leaks, and model contamination (whether accidental or intentional) is a critical task.

The time for further deliberation on activating these cyber protections is running out. As AI solutions become more pervasive, organizations need to advance these efforts in 2025.

Key challenges

CISOs are and should be concerned about several AI-related areas in their cybersecurity pursuits. One is the monitoring of employees’ AI use. While many organizations can now track which Large Language Models (LLMs) employees are accessing, can your teams monitor the actual prompt content? For many, that’s a significant blind spot. Additionally, does your enterprise flat-out restrict or permit public LLM access? That is another dilemma technology executives wrestle with. Even if there is a prohibition on corporate networks and assets, will employees find a way around these restrictions if they believe those tools provide a shortcut to getting their work done? This creates the problem of Shadow AI, which organizations should avoid.
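One low-effort way to shrink that blind spot is to mine the egress or proxy logs you already have for traffic to known public LLM endpoints. The sketch below assumes a CSV proxy-log export with user and dest_host columns and uses an illustrative domain list; both are placeholders that will differ by vendor and environment.

```python
import csv
from collections import Counter

# Example public LLM endpoints to watch for in egress logs; extend as needed.
LLM_DOMAINS = {"api.openai.com", "chat.openai.com", "claude.ai", "gemini.google.com"}

def summarize_llm_traffic(proxy_log_csv: str) -> Counter:
    """Count requests per (user, LLM domain) pair from a proxy-log export.

    Assumes a CSV with 'user' and 'dest_host' columns; real log schemas
    vary by proxy vendor.
    """
    hits: Counter = Counter()
    with open(proxy_log_csv, newline="") as fh:
        for row in csv.DictReader(fh):
            host = row.get("dest_host", "").lower()
            if any(host == d or host.endswith("." + d) for d in LLM_DOMAINS):
                hits[(row.get("user", "unknown"), host)] += 1
    return hits
```

Even this crude inventory tells you which teams are already leaning on public LLMs, which is the starting point for deciding whether to restrict, permit, or redirect that usage.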

Second, enterprises face concerns over data protection. AI usage brings the risk of sensitive data exfiltration through AI interactions, so CISOs must balance accessibility with security and oversee the growing demand for logging and tracking capabilities.

For example, a concern that security executives frequently discuss is the movement of source code into public repos or LLMs where it should never flow. Several notorious instances have been made public in which developers foolishly or accidentally used public resources to troubleshoot or seek advice on fixing their code, exposing core IP, API keys, and other sensitive data. So, how do you prevent your source code from being pushed to a public GitHub or GitLab repo or pasted into ChatGPT?
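One common control is a client-side scan before code ever leaves the developer’s machine. The sketch below is a hypothetical pre-commit hook that blocks a commit when the staged diff matches a few example patterns; in practice most teams would lean on a maintained scanner such as gitleaks or trufflehog, plus server-side push protection, rather than hand-rolled rules like these.

```python
#!/usr/bin/env python3
"""Illustrative pre-commit hook: block commits that stage likely secrets.

The patterns and marker below are examples only; real deployments should
use a maintained scanner and organization-specific rules.
"""
import re
import subprocess
import sys

PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                      # AWS access key ID
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # private key material
    re.compile(r"\bCOMPANY-CONFIDENTIAL\b"),                  # hypothetical IP marker
]

def staged_diff() -> str:
    """Return the diff of changes staged for commit."""
    return subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout

def main() -> int:
    diff = staged_diff()
    hits = [p.pattern for p in PATTERNS if p.search(diff)]
    if hits:
        print(f"Commit blocked: staged changes match sensitive patterns: {hits}")
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```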

While employee training is a must to curb these behaviors, in some cases it runs directly against the development team’s desire to maximize productivity and meet schedule deadlines. I’ve talked to development executives who have encouraged the use of public tools and repos for employees who are “stuck.” As one recently told me, uploading code segments to public resources can create a time and quality advantage, so long as each segment is small enough that IP leakage is avoided. But how does one measure the risk of “small enough”? These are the situations driving considerable investment in next-generation data exfiltration detection, protection, and prevention.
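There is no standard way to measure “small enough,” but even a crude, hypothetical heuristic makes the trade-off concrete: flag a snippet for review when it represents more than some fraction of its source file or touches identifiers the organization deems sensitive. The thresholds and names below are arbitrary examples, not a vetted risk model.

```python
# Crude illustration of gating "how much code is too much to share."
SENSITIVE_IDENTIFIERS = {"pricing_engine", "fraud_score"}  # hypothetical names

def snippet_requires_review(snippet: str, full_file: str,
                            max_fraction: float = 0.10) -> bool:
    """Flag a snippet that is a large share of its file or touches sensitive names."""
    snippet_lines = [l for l in snippet.splitlines() if l.strip()]
    file_lines = [l for l in full_file.splitlines() if l.strip()] or ["_"]
    too_large = len(snippet_lines) / len(file_lines) > max_fraction
    touches_sensitive = any(name in snippet for name in SENSITIVE_IDENTIFIERS)
    return too_large or touches_sensitive
```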

Rumored vs. practical threats

While the foundations of AI security threats exist, the current landscape is driven more by preventative concerns than by actual incidents. We’ve heard rumors of model contamination or poisoning, and documented research demonstrates the potential vulnerabilities (e.g., training image recognition models to misidentify objects), but confirmed real-world incidents remain scarce.

Anecdotally, even LLM firewall providers report that they haven’t encountered such attacks in recent months. Sure, we hear about cases of large data leaks, but we’ve yet to hear much about an attack in which an organization’s AI model was contaminated or poisoned.

At this time, organizations need to focus on developing balanced, practical security measures rather than overly restrictive protocols. These will start with existing controls and be augmented with new AI-specific ones. As regulations continue to evolve, businesses need to implement reasonable new security measures while maintaining the productivity benefits of AI tools. The challenge lies in finding the balance between protection and practicality.

Cautionary small steps forward

CISOs should be taking essential small steps to protect against any data breach and any accidental (or malicious) AI-related incident. The first is to have a clear, common-sense policy around data usage, with internal limits on access. Your enterprise should have a policy for LLMs that is no different from current policies for putting information into the cloud or on the web today. Continuous training is essential.

Create an audit trail of employees’ interactions with a specific LLM. Find out what questions employees ask via LLM prompts and what data those prompts might feed into the model. Track these efforts internally for further exploration.
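A minimal sketch of such an audit trail, assuming a hypothetical internal gateway that fronts whichever LLM the organization has approved: every prompt and response is written to an append-only log keyed by request ID and user. The gateway URL, payload schema, and log format here are illustrative assumptions, and the snippet relies on the third-party requests package.

```python
import json
import logging
import time
import uuid

import requests  # assumes the requests package is installed

logging.basicConfig(filename="llm_audit.log", level=logging.INFO)

# Hypothetical internal gateway that fronts the organization's approved LLM.
GATEWAY_URL = "https://llm-gateway.internal.example.com/v1/chat"

def audited_llm_call(user_id: str, prompt: str) -> str:
    """Forward a prompt through the gateway and log both sides of the exchange."""
    request_id = str(uuid.uuid4())
    logging.info(json.dumps({
        "request_id": request_id, "user": user_id,
        "ts": time.time(), "direction": "prompt", "text": prompt,
    }))
    resp = requests.post(GATEWAY_URL, json={"prompt": prompt, "user": user_id}, timeout=60)
    resp.raise_for_status()
    answer = resp.json().get("text", "")
    logging.info(json.dumps({
        "request_id": request_id, "user": user_id,
        "ts": time.time(), "direction": "response", "text": answer,
    }))
    return answer
```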

Consider adopting a private instance explicitly built for the organization’s needs without compromising data privacy. Your enterprise needs to decide whether employees will access public LLMs or a dedicated, isolated version of an AI model trained solely on the organization’s data. A private instance helps ensure that sensitive information remains confidential and is not shared with other users or the broader AI platform, giving the company effective control over its data and model usage.
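One simple guardrail that supports this choice is refusing to talk to anything other than the approved private endpoint. The sketch below assumes the endpoint is supplied via an environment variable and checked against an allow list of internal hosts; the variable name and host names are placeholders.

```python
import os
from urllib.parse import urlparse

# Hosts the organization has approved for LLM traffic -- illustrative values.
APPROVED_LLM_HOSTS = {"llm.internal.example.com"}

def resolve_llm_endpoint() -> str:
    """Return the configured LLM endpoint, refusing anything not on the allow list.

    Assumes the endpoint is supplied via the LLM_ENDPOINT environment variable.
    """
    endpoint = os.environ.get("LLM_ENDPOINT", "https://llm.internal.example.com/v1")
    host = urlparse(endpoint).hostname or ""
    if host not in APPROVED_LLM_HOSTS:
        raise RuntimeError(f"Endpoint {endpoint!r} is not an approved private instance")
    return endpoint
```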

After all these are implemented, CISOs can look at protecting their AI models once they are in place. With bigger use cases to come, that may be the subject of a forthcoming column here.

Doomsday scenarios or the big stuff

Beyond the items discussed above, some CISOs foresee futuristic, doomsday-type scenarios to be wary of. For example, what if a hacker disliked a particular hedge fund manager and launched a coordinated strike against all of the AI models the firm used to trade? That manipulation of financial trading platforms might cause the company and its investors to lose a great deal of money. Or imagine a bad actor going after the self-driving algorithms of automobiles, leading to physical harm. While we all hope these scenarios remain in the realm of speculation, it’s not unreasonable to imagine such attacks becoming possible in the future. And before you say that these models wouldn’t be accessible from the outside world, there is always the risk of an insider threat. Rogue employees exist, and you never know what a disgruntled employee on the opposite end of the political spectrum could do. We hope we won’t have to contend with these extreme cases of model contamination for a very long time, if ever. However, security executives and their AI Councils must focus on preventing these risks, however small, from materializing.
