A Fresh Look at Scale-out Files and Objects
In a previous blog, we shared several challenges of scale-out NAS. The first step toward overcoming challenges is to acknowledge them. IT teams have become accustomed to complexity, limitations, silos, inefficiency, and more.
Let's acknowledge the pains and challenges, while we also imagine solutions that transcend the limitations of traditional scale-out NAS.
Data-Centric or Infrastructure-Centric?
With a storage focus, one becomes fixated on adding infrastructure as the answer to every file environment requirement. For example, anti-virus infrastructure is deployed alongside file-serving infrastructure to protect against malicious files. But is additional infrastructure the only way to provide AV protection?
A data-centric approach brings apps and compute to the data. The result is that a large part of NAS infrastructure can be eliminated by running NAS ecosystem apps directly on a storage solution. If the anti-virus application runs on the same NAS solution that it protects, this would eliminate the need for additional anti-virus infrastructure. And so it is with other file ecosystem apps such as file audit and content search.
True data-centricity isn’t just about integration. It’s about data-driven decisions. For example, if you can look inside data files, you can understand the data. Driving compliance, regulatory, and corporate governance from actual data is far beyond what is learned from file metadata alone… or worse… plain old wild guesses.
Efficiency and Scalability
This is important because lacking either one costs you money. With efficiency, more is always better, but we don't always know there are more efficient choices to fit tight budgets. And scalability only matters when you hit scalability limits; yet if scale comes with inefficiency, what is the point? For consolidation projects and future-proofed deployments, some would even say there is no such thing as too much scale. But everyday operations such as snapshots, clones, and limited file attribute parameters impose limits that we simply accept because we are unaware of other options.
Let's assume a world in which there are no limits: number of files, file size, file name length, file path length, and so on. Not only should app data never hit a capacity brick wall; consolidation projects should finish with capacity to grow. "No limits" means that common operations such as snapshots can be scheduled at short intervals for better data protection, without the typical side effects of performance degradation or burdensome snapshot maintenance.
Data reduction technologies allow customers to squeeze more data into the same physical storage space for lower cost. Based on sliding-window variable dedupe, advanced compression, and small-file capacity optimization, customers can realize up to 2.5x greater effective capacity, and sometimes more, from the same disk capacity versus traditional scale-out NAS. And what about data that is only deduped within its own volume silo because it cannot be deduped across the entire data center? Dedupe should not have limits if the real goal is efficiency.
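The "sliding-window variable dedupe" mentioned above can be illustrated with content-defined chunking: chunk boundaries are chosen from the data itself rather than at fixed offsets, so an insertion near the front of a file perturbs only the chunks around it, while later boundaries re-align and those chunks dedupe. Here is a minimal, hypothetical Python sketch; the rolling-sum fingerprint, window size, and boundary mask are illustrative assumptions, not Cohesity's implementation:

```python
import hashlib

def chunk_boundaries(data: bytes, window: int = 16, mask: int = 0x3FF):
    """Yield variable-size chunks using a simple rolling-sum fingerprint.

    A boundary is declared when the fingerprint of the last `window` bytes
    has its low 10 bits set, giving roughly 1 KiB average chunks.
    """
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # slide the window forward
        if i - start >= window and (rolling & mask) == mask:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]  # final tail chunk

def dedupe(streams):
    """Store each unique chunk once, keyed by its SHA-256 digest.

    Returns the chunk store plus one 'recipe' (list of digests) per
    stream, from which each stream can be reconstructed.
    """
    store, recipes = {}, []
    for data in streams:
        recipe = []
        for chunk in chunk_boundaries(data):
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # keep only the first copy
            recipe.append(digest)
        recipes.append(recipe)
    return store, recipes
```

Because boundaries depend only on content, two streams that differ by a small insertion still share most of their chunk digests, which is why variable chunking typically deduplicates better than fixed-size blocks.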
Cybersecurity
A robber is only a problem AFTER they have broken into a home, and so it is with cyber-crime. Deep down, this is often an IT manager's worst fear, if they will admit it. If it's not an unauthorized person gaining access credentials, then it's viruses, malware, hackers, or ransomware attacks. Among the biggest challenges with cybersecurity is not its absence (it is required), but rather deploying and managing it easily. If security is always needed, why is it treated as an optional "add-on" that you have to deploy and manage separately? When you buy a car, the car comes with brakes BY DEFAULT. Safety is never an option. Traditional scale-out NAS does not yet understand this, as integrated cybersecurity remains an external option.
We imagine cybersecurity fully integrated within the data solution, with the ability to protect against all manner of cyber threats: blocking malicious files, detecting anomalous file accesses, identifying high-risk information. And that's just the start. The goal is always bulletproof security, so let's add multi-factor authentication, end-to-end software encryption, an immutable file system, and compliance with federal security specifications to the list of what should be integrated with NAS.
Enterprise Search
If it were as simple as searching the C: drive on your PC, or entering a Google search, this would never be an issue for the enterprise. Unfortunately, conducting file searches across the enterprise can be very challenging. Searching across data center silos, VMs, backups, multiple sites, and even multiple clouds is a project, not just a simple query. Multiple search iterations are the rule, as a single "Google-like" search across the enterprise is generally not possible.
Ideally, file products should provide fast, indexed search of any data, anywhere, across multiple workloads and protocols. Shouldn't it be a single simple search across VMs, backups, data centers, and multiple clouds? And what if I need to search within file contents for compliance and eDiscovery purposes, or for personally identifiable information? Why can't file solutions make this as easy as Google? Well, it can be that easy.
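What makes search "fast and indexed" rather than a crawl is an inverted index built as files are ingested: each term maps to the set of files containing it, so answering a query is set intersection instead of re-reading every file. A toy Python sketch, where the tokenizer and data structures are illustrative assumptions, not any product's internals:

```python
import re
from collections import defaultdict

class ContentIndex:
    """Toy inverted index: term -> set of file names containing it."""

    def __init__(self):
        self.index = defaultdict(set)

    def add(self, name: str, text: str):
        """Index a file's contents at ingest time."""
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            self.index[term].add(name)

    def search(self, query: str):
        """Return files containing every term in the query (AND semantics)."""
        terms = re.findall(r"[a-z0-9]+", query.lower())
        if not terms:
            return set()
        result = self.index.get(terms[0], set()).copy()
        for term in terms[1:]:
            result &= self.index.get(term, set())
        return result
```

The work of reading file contents happens once, at indexing time; every later query, whether for eDiscovery, compliance, or PII review, is answered from the index alone.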
Multi-tier Data Management
There is always a way to do this better. There are always files that should be purged or moved, yet inactive files remain on costly tier-one storage. Data migration is an undesirable management chore. Which data to move? Under what conditions? Will anything break after the move, and will the move be transparent? And why can't large data migrations be as simple as running a backup?
What about seamless and transparent tiering of cold data to a cost-effective tier, all integrated within a file services solution? Why can't file solutions be smart enough to automatically move cold data from almost any tier-one NAS appliance to a more cost-effective storage tier, or to the cloud? If the IT world were perfect, all data migrations would be automated or as simple to execute as a backup.
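Policy-driven tiering of this kind boils down to a rule such as "move anything not accessed in N days to the cold tier." A minimal Python sketch, assuming last-access time (atime) is the policy signal; a production system would leave a stub or symlink behind so reads stay transparent, which this toy omits:

```python
import os
import shutil
import time

def tier_cold_files(source: str, cold_tier: str, max_idle_days: float):
    """Move files not accessed within `max_idle_days` to `cold_tier`.

    Relative paths are preserved so the cold tier mirrors the source
    layout. Returns the list of relative paths that were moved.
    """
    cutoff = time.time() - max_idle_days * 86400
    moved = []
    for root, _dirs, files in os.walk(source):
        for name in files:
            path = os.path.join(root, name)
            if os.stat(path).st_atime < cutoff:  # cold: idle too long
                rel = os.path.relpath(path, source)
                dest = os.path.join(cold_tier, rel)
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                shutil.move(path, dest)
                moved.append(rel)
    return moved
```

Note that atime is a deliberately simple policy signal; real tiering engines typically combine access recency with file size, type, and ownership rules, and schedule moves during quiet periods.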
Multiprotocol Compatibility
Why is data interchange between Windows, Linux, Unix, and the cloud so tricky? Simultaneous access to the same data? File permissions honored? Why should it matter which file protocol data is written with? Does a different protocol for each environment really work in a mixed-OS, hybrid, and cloud world? It's time for a real solution to this madness!
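One reason cross-protocol access is tricky is that each protocol answers "may this identity read this file?" from a different permission model. A unified layer evaluates rights once and serves the same answer to every protocol. A simplified sketch, assuming POSIX mode bits as the single source of truth; real multiprotocol systems also reconcile SMB ACLs, which this toy ignores:

```python
import stat

# Shared rights vocabulary served to NFS and SMB clients alike.
READ, WRITE, EXECUTE = "read", "write", "execute"

def posix_rights(mode: int, is_owner: bool, in_group: bool):
    """Derive one identity's effective rights from POSIX mode bits.

    The owner/group/other triad is selected first, then each bit is
    translated into the protocol-neutral rights vocabulary above.
    """
    if is_owner:
        bits = [(stat.S_IRUSR, READ), (stat.S_IWUSR, WRITE), (stat.S_IXUSR, EXECUTE)]
    elif in_group:
        bits = [(stat.S_IRGRP, READ), (stat.S_IWGRP, WRITE), (stat.S_IXGRP, EXECUTE)]
    else:
        bits = [(stat.S_IROTH, READ), (stat.S_IWOTH, WRITE), (stat.S_IXOTH, EXECUTE)]
    return {right for bit, right in bits if mode & bit}
```

Because every protocol front-end consults the same function, a file written over NFS with mode 0o640 is readable, but not writable, by a group member arriving over SMB, with no per-protocol permission drift.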
Freedom of Choice
Freedom of choice is like a fork in the road. It begins with freedom, but the wrong choice ends it, leading to a dead end. The fork in the scale-out NAS road is a choice between proprietary hardware-based appliances and software-defined file and object services. Can freedom be defined as proprietary hardware lock-in? As lack of support for the same file environment in the cloud? Or as limited and costly hardware choices that are not always optimized for workload requirements?
This is not freedom. Traditional scale-out NAS sits on the proprietary hardware fork. Even when the road's dead end is visible at the time of purchase, customers still opt for this fork for a variety of reasons, but freedom is not one of them.
Real freedom is hardware agnostic and environment agnostic (it will run anywhere), and vendors cannot lock you in. Software-defined actually increases freedom: a wide selection of hardware is available to match workloads, and you can run anywhere, including the cloud. This should be enough, but bigger than all of this is the assurance of a future-proofed environment. It's freedom for the future. Want to move to a hybrid or cloud environment later? The answer is yes. Need to change hardware vendors? Again, yes. Unsure of your IT strategy in five years, but need an approach today that enables you for tomorrow? Software-defined is that approach.
Announcing Cohesity SmartFiles
Today, Cohesity announced the Cohesity SmartFiles solution, part of the Cohesity DataPlatform. SmartFiles addresses the challenges we've described: infrastructure, efficiency, scalability, cybersecurity, enterprise search, multi-tier data management, and compatibility across varied enterprise and cloud environments. Yes, that's a lot of challenges to address.
In addition, the Cohesity DataPlatform is unique in supporting a file ecosystem with integrated apps, without the expense of external hardware infrastructure. Going without anti-virus protection is not an option these days, and two anti-virus app choices are available for SmartFiles within the Cohesity Marketplace. There is also file audit and the ability to detect anomalous file accesses. Content search allows for fast, indexed search of the contents of files; this is the foundation of any data-centric approach, as search results can drive other services and processes. It's also invaluable for eDiscovery, compliance, and gaining control of dark data across the enterprise.
SmartFiles features include multiprotocol support for NFS, SMB, and S3 with unified permissions and an API-first design. The Cohesity DataPlatform is built on Cohesity SpanFS, a fully distributed, shared-nothing file system designed for scale, performance, fast ingest, and limitless snapshots and clones. In addition to file and object services, the consolidated platform provides backup, disaster recovery, cloud integration, ransomware protection, non-disruptive upgrades, and an app engine for a variety of Cohesity Marketplace apps. The platform uniquely addresses the problem of mass data fragmentation by eliminating multiple silos and point solutions across data centers and clouds. The Cohesity DataPlatform is supported on a variety of hardware platforms including Cohesity, HPE, Cisco, and more.
You can view a video on SmartFiles, or view a lightboard video on SmartFiles multi-tier data management for tiering of cold data from NetApp, Dell/EMC Isilon, and Pure Storage file products to a more cost-effective Cohesity tier.