Scaling retrieval augmented generation systems for enterprises

If your LinkedIn feed is anything like mine, it is full of posts about how easy it is to implement a retrieval augmented generation (RAG) system in just a handful of lines of code. But they tend to leave out the hard parts.

The narrative surrounding the ease of deploying advanced machine learning systems like RAG can be misleading. While it’s true that modern frameworks and pre-built models have simplified the process to an extent, they often gloss over the complexities that arise when scaling to enterprise levels. Let’s dive into the reasons why implementing a RAG system at scale is not as straightforward as it seems.

Considerations before implementing a RAG system

As with most things, once you add scale, even the simplest solution can become complex. Here’s why.

When dealing with enterprise-level amounts of data, the one-size-fits-all approach quickly breaks down. The complexity and diversity of data types, compliance requirements, and the need for integration with existing workflows mean that the plug-and-play methods celebrated on social media rarely hold up in enterprise use.

First: Enterprise file stores have different file types, formats, and languages. How do you identify the right files to use? And are they the right versions?

Enterprises hold vast reservoirs of information in many forms, from PDFs and Word documents to emails and spreadsheets. Making sure a RAG system can handle and accurately interpret this array of formats is a significant undertaking.
Data consistency, version control, and language compatibility across all these formats are foundational challenges that must be addressed before any advanced analytics can occur.
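To make the "right files, right versions" question concrete, here is a minimal sketch of one possible pre-indexing filter. It assumes each file already carries a stable document identifier shared by all of its versions; the `FileRecord` type, the `doc_id` field, and the extension list are hypothetical and only for illustration. Real file stores rarely make this so easy.

```python
from dataclasses import dataclass
from pathlib import PurePath

# Hypothetical allow-list of formats the text extractor can handle.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".eml", ".xlsx"}


@dataclass(frozen=True)
class FileRecord:
    path: str
    modified: float  # last-modified time, epoch seconds
    doc_id: str      # assumed: identifier shared by all versions of a document


def select_latest_supported(records):
    """Keep only supported file types, and only the newest version per doc_id."""
    latest = {}
    for rec in records:
        ext = PurePath(rec.path).suffix.lower()
        if ext not in SUPPORTED_EXTENSIONS:
            continue  # skip formats we cannot extract text from
        current = latest.get(rec.doc_id)
        if current is None or rec.modified > current.modified:
            latest[rec.doc_id] = rec
    # Sort by path so the result is deterministic.
    return sorted(latest.values(), key=lambda r: r.path)
```

Even this toy version raises the hard question: where does a trustworthy `doc_id` come from when the same report exists as three renamed copies in different shares?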

Second: How do you process all that text into some sort of index? Should you use a cloud-hosted VectorDB? On-prem? Maybe a relational database that supports vector search? Or a document database that supports vector and hybrid search? Who is going to support this?

Choosing the right infrastructure for data indexing and retrieval is crucial for performance and scalability. The decision to host on the cloud or on-prem can affect cost, speed, and security.
Picking the appropriate database technology that aligns with enterprise needs while ensuring there’s technical support available is a decision that requires thorough research and strategic foresight.

Third: Arguably the most important part is security. Who can access what data for their RAG queries? Not everyone should be able to see all the proprietary R&D data, or all the HR/Payroll data, right? How do you manage access controls to data while still giving the right groups access to what they need?

Balancing the accessibility of data for RAG applications with stringent security protocols to protect sensitive information is a tightrope walk for enterprises.
The implementation of robust access control mechanisms without impeding the flow of necessary data for intelligent decision-making is a critical aspect of deploying a RAG system in an enterprise environment.
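One common way to approach this is to attach access metadata to every indexed chunk and filter retrieval results against the caller's group memberships before anything reaches the language model. The sketch below assumes a simple group-based model; `Chunk`, `allowed_groups`, and `authorized_chunks` are hypothetical names, not part of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    allowed_groups: frozenset  # groups permitted to read the source document


def authorized_chunks(candidates, user_groups):
    """Return only the retrieved chunks the user is entitled to see.

    A chunk is visible if the user belongs to at least one of its
    allowed groups. A user with no groups sees nothing.
    """
    groups = set(user_groups)
    return [c for c in candidates if c.allowed_groups & groups]
```

Filtering after retrieval is the simplest option to reason about, but it can leave a query with too few results. Many vector stores also support filtering on metadata during the search itself, which is usually the better fit at scale.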

Cohesity Gaia features

To address these challenges, Cohesity Gaia* offers a comprehensive solution that caters to the unique requirements of enterprises. Our data security and management platform, the Cohesity Data Cloud, provides all the necessary tools to handle diverse data types, ensure data consistency, and maintain a high level of security while enabling powerful analytics capabilities.

Architecture: Cohesity Gaia’s architecture consists of a control plane (Gaia-CP) and a data plane (Gaia-DP) that work together to manage and process enterprise data. The control plane is responsible for orchestrating various workflows, managing data models, and providing APIs for user interactions. The data plane, on the other hand, is responsible for accessing and indexing the data stored in the Cohesity cluster. With Cohesity Gaia’s Embedding Service (Gaia-ES), enterprises can effectively extract text from various file formats, create semantic indexes on the data, and use these indexes to gain deeper insights into their data.

Infrastructure: In terms of addressing infrastructure concerns, Cohesity Gaia is designed with flexibility in mind. Design decisions made today are meant to support the deployment of its components on the cloud, on-prem, or a hybrid of both in the future. For example, Cohesity Gaia’s indexing service is designed to abstract away different vector databases from the embedding service, so that in the future, enterprises may be able to choose the most suitable vector database technology based on their specific needs and requirements. Cohesity knows that customers will want a choice of infrastructure and services that align with the organization’s cost, performance, and security objectives.
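The general idea of abstracting the vector database away from the embedding step can be sketched as a small interface with swappable backends. This is an illustration of the pattern only, not Cohesity Gaia's actual code or API; `VectorIndex`, `InMemoryIndex`, and `index_chunks` are hypothetical names, and the in-memory backend stands in for a real vector database.

```python
import math
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Tuple


class VectorIndex(ABC):
    """Hypothetical interface the embedding side codes against."""

    @abstractmethod
    def upsert(self, chunk_id: str, vector: List[float]) -> None: ...

    @abstractmethod
    def search(self, query: List[float], k: int) -> List[Tuple[str, float]]: ...


def _cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryIndex(VectorIndex):
    """Toy brute-force backend; a real deployment would wrap a vector database."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def upsert(self, chunk_id: str, vector: List[float]) -> None:
        self._vectors[chunk_id] = list(vector)

    def search(self, query: List[float], k: int) -> List[Tuple[str, float]]:
        scored = [(cid, _cosine(query, vec)) for cid, vec in self._vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # best match first
        return scored[:k]


def index_chunks(index: VectorIndex, embedded: Iterable[Tuple[str, List[float]]]) -> None:
    """Caller code depends only on the interface, never on a specific backend."""
    for chunk_id, vector in embedded:
        index.upsert(chunk_id, vector)
```

Because `index_chunks` only sees `VectorIndex`, swapping the backend later means writing one new subclass rather than touching the embedding pipeline.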

Security: Security is a top priority for enterprises, and Cohesity Gaia addresses this through the implementation of fine-grained, specialized role-based access control (RBAC) policies. These policies restrict access to Cohesity Gaia APIs, ensuring that only authorized users can access specific data sets. This allows for a balance between data accessibility for RAG applications and the protection of sensitive information.

Align tools to support the long-term vision

By providing a comprehensive solution that addresses the challenges of scaling retrieval augmented generation, Cohesity Gaia empowers enterprises to harness the full potential of their data while maintaining the highest levels of security and compliance.

This is why it is often not in an enterprise’s best interest to build these tools in-house. Instead, enterprises should seriously consider buying tools from companies that understand managing data at scale. Companies specializing in scalable RAG systems have the experience, resources, and expertise to tackle the challenges of large-scale deployment. For enterprises, using these specialized tools can mean the difference between a cumbersome, inefficient system and an agile, insightful one that drives business value.

While it’s tempting to get swept up in the hype of easy RAG implementations, the reality is that deploying a RAG system at enterprise scale is a multifaceted challenge. It requires navigating data complexity, scalability concerns, security protocols, and integration hurdles, all while ensuring the quality and explainability of the outputs. As enterprises consider their options, they must weigh the trade-offs and decide whether to invest in building in-house solutions or to partner with experts in the field. The decision to build or buy is not just about the technology. It’s about understanding the strategic direction of the business and aligning the tools to support that vision for the long term.

*GA expected March 15, 2024.

Written By

Greg Statton

Office of the CTO - Data & AI
