Cohesity Gaia Catalog lets you discover, curate, and publish time-series backup data directly into your AI and analytics platforms — through a secure, read-only S3-compatible endpoint — across cloud and fully on-prem environments.
Integrations with Databricks and Microsoft Fabric are coming soon, with more on the way.
Use Gaia Catalog to curate and expose only the data relevant for your AI/ML use case – without triggering a full downstream copy of your entire data estate.
Every ingestion pipeline your team builds adds maintenance overhead, governance reconfiguration, and weeks of engineering time before data is usable. Gaia Catalog eliminates this cycle – curated datasets are approved, registered, and queryable in Databricks without rebuilding your access controls or classification from scratch.
Once a dataset is curated and approved, it appears inside Databricks as an external data source, readable via an S3-compatible endpoint. Your team queries it directly; the data stays in Cohesity. No permission rebuilds. No migration project. No pipeline engineering required.
RBAC, immutability, and auditability are inherited from the Cohesity Data Cloud — not rebuilt after exposure. Sensitive data is identified and tagged before the dataset is ever shared downstream. Every access through the endpoint is authenticated, logged, and policy-enforced.
Search protected backups using attributes like file type, path, ownership, permissions, and time range. Build governed datasets across historical versions.
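To make the attribute search above concrete, here is a minimal sketch in plain Python. It is not the Gaia Catalog API, which this page does not document; the `FileVersion` record, the `search` function, and the sample catalog are all hypothetical and only illustrate filtering backup versions by path, ownership, permissions, and time range.

```python
from dataclasses import dataclass
from datetime import datetime
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class FileVersion:
    """One backed-up version of a file (hypothetical record shape)."""
    path: str
    owner: str
    mode: int  # POSIX permission bits, e.g. 0o640
    snapshot_time: datetime


def search(
    versions: Iterable[FileVersion],
    path_glob: str = "*",
    owner: Optional[str] = None,
    world_readable: Optional[bool] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> List[FileVersion]:
    """Return the versions that match every supplied filter, oldest first."""
    hits = []
    for v in versions:
        if not fnmatchcase(v.path, path_glob):
            continue
        if owner is not None and v.owner != owner:
            continue
        if world_readable is not None and bool(v.mode & 0o004) != world_readable:
            continue
        if start is not None and v.snapshot_time < start:
            continue
        if end is not None and v.snapshot_time > end:
            continue
        hits.append(v)
    return sorted(hits, key=lambda v: v.snapshot_time)


# Hypothetical catalog holding two historical versions of the same report.
catalog = [
    FileVersion("/finance/q1/report.csv", "alice", 0o640, datetime(2024, 2, 15)),
    FileVersion("/finance/q1/report.csv", "alice", 0o640, datetime(2024, 1, 15)),
    FileVersion("/finance/q1/notes.txt", "alice", 0o644, datetime(2024, 2, 15)),
    FileVersion("/hr/roster.csv", "bob", 0o600, datetime(2024, 2, 20)),
]

dataset = search(
    catalog,
    path_glob="/finance/*.csv",
    start=datetime(2024, 1, 1),
    end=datetime(2024, 3, 1),
)
```

Here `dataset` holds both historical versions of the finance report, which mirrors the idea of building one dataset across versions rather than from a single snapshot.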
Apply intelligent classification models to tag and contextualize unstructured data – helping identify high-value datasets for AI and analytics use cases.
Expose curated datasets through a secure, read-only S3-compatible endpoint directly on top of protected data. No duplication. No new ETL pipelines. No permission rebuilds.
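As a rough illustration of what reading from an S3-compatible endpoint looks like on the consumer side, the sketch below pages through object keys using the standard S3 `ListObjectsV2` request and response fields. The endpoint URL, bucket name, and credentials for a real deployment are not given on this page, so the example runs against `FakeReadOnlyS3`, a hypothetical in-memory stand-in. In practice, a client such as one created with `boto3.client("s3", endpoint_url=...)` offers a `list_objects_v2` method with the same shape.

```python
from typing import Any, Dict, Iterator, List


def iter_keys(s3_client: Any, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every object key under prefix, following ListObjectsV2 pagination."""
    kwargs: Dict[str, Any] = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            yield obj["Key"]
        if not page.get("IsTruncated"):
            return
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


class FakeReadOnlyS3:
    """Hypothetical in-memory stand-in so the sketch runs without a network."""

    def __init__(self, keys: List[str], page_size: int = 2) -> None:
        self._keys = sorted(keys)
        self._page_size = page_size

    def list_objects_v2(
        self, Bucket: str, Prefix: str = "", ContinuationToken: str = "0"
    ) -> Dict[str, Any]:
        matching = [k for k in self._keys if k.startswith(Prefix)]
        start = int(ContinuationToken)
        chunk = matching[start:start + self._page_size]
        truncated = start + self._page_size < len(matching)
        page: Dict[str, Any] = {
            "Contents": [{"Key": k} for k in chunk],
            "IsTruncated": truncated,
        }
        if truncated:
            page["NextContinuationToken"] = str(start + self._page_size)
        return page

    def put_object(self, **_: Any) -> None:
        # Assumption: a read-only endpoint rejects every write request.
        raise PermissionError("endpoint is read-only")


client = FakeReadOnlyS3([
    "claims/2023/a.parquet",
    "claims/2024/b.parquet",
    "claims/2024/c.parquet",
    "logs/x.json",
])
keys = list(iter_keys(client, bucket="curated-dataset", prefix="claims/"))
```

The consumer only lists and reads. No copy job or ETL step appears in the code, and the write call fails by design.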
Role-based access controls (RBAC) are carried forward from the Cohesity Data Cloud into every exposed dataset. When a dataset is registered and approved for access, permissions travel with it – no manual rebuild, no policy reconfiguration required on the receiving platform.
Datasets are read from immutable backup data – the underlying files cannot be altered through the Gaia Catalog layer. Every access through the S3-compatible endpoint is authenticated and logged, giving your security and compliance teams a full, auditable trail of who accessed what and when.
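The "who accessed what and when" trail can be pictured with a small sketch. This is not how Cohesity implements auditing; `AuditedReader`, `AuditEvent`, and the sample principals are hypothetical names chosen only to show the pattern of recording every access attempt, including denied ones, over data that the reader cannot modify.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from types import MappingProxyType
from typing import Iterable, List, Mapping


@dataclass(frozen=True)
class AuditEvent:
    principal: str   # who
    key: str         # what
    at: datetime     # when (UTC)
    allowed: bool


class AuditedReader:
    """Read-only access to objects, with one audit event per attempt."""

    def __init__(self, objects: Mapping[str, bytes], authorized: Iterable[str]) -> None:
        # MappingProxyType gives a read-only view; bytes values are immutable.
        self._objects = MappingProxyType(dict(objects))
        self._authorized = frozenset(authorized)
        self.trail: List[AuditEvent] = []

    def get(self, principal: str, key: str) -> bytes:
        allowed = principal in self._authorized
        # Log before enforcing, so denied attempts are recorded too.
        self.trail.append(
            AuditEvent(principal, key, datetime.now(timezone.utc), allowed)
        )
        if not allowed:
            raise PermissionError(f"{principal} is not authorized")
        return self._objects[key]


reader = AuditedReader(
    {"claims/a.csv": b"id,amount\n1,100\n"},
    authorized={"analyst@example.com"},
)
data = reader.get("analyst@example.com", "claims/a.csv")
try:
    reader.get("unknown@example.com", "claims/a.csv")
except PermissionError:
    pass  # the denied attempt is still in reader.trail
```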
Gaia Catalog applies Data Security Posture Management (DSPM) scanning during the enrichment step – before any dataset is published downstream. Sensitive data is identified, tagged, and flagged at the source. Your analytics platform receives a dataset that’s already been assessed, not one that needs to be re-scanned after it arrives.
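The idea of tagging sensitive data before a dataset is published can be sketched with a toy scanner. Real DSPM scanning is far more sophisticated than this; the two regular expressions, the tag names, and the `prepare_for_publish` helper below are illustrative assumptions, not the product's classifiers.

```python
import re
from typing import Dict, List, Set

# Toy detectors, for illustration only.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def tag_sensitive(text: str) -> Set[str]:
    """Return the names of every pattern found in the text."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}


def prepare_for_publish(files: Dict[str, str]) -> Dict[str, List[str]]:
    """Scan each file at the source and attach its sensitivity tags."""
    return {path: sorted(tag_sensitive(body)) for path, body in files.items()}


files = {
    "hr/roster.txt": "Jane Doe, 123-45-6789, jane@example.com",
    "ops/readme.txt": "Restart the service nightly.",
}
report = prepare_for_publish(files)
```

Because the tags are attached before publication, a downstream platform could filter or mask on `report` without scanning the files again.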