Enrico Signoretti, VP Product and Partnerships, Cubbit
Most of AI today is hype. Of the AI that’s really making money, most of it is machine learning, which is where your data is essential. You need good data, you need control over it, you need it perfectly tagged and with well-defined perimeters. So, data management in general is huge.
It goes beyond just storage. It’s about having a platform that gives you visibility and security, and minimizes duplication of data. It’s in between a storage system, a data management system, and governance. Enterprises need to be aware of what they have and how they can use it. In the past, data management was not a primary concern: petabytes of data, but no idea what was being stored, for how long, or whether it was really important. Now, everything could be useful to train AI, but it needs to be targeted and under control.
For data management in Europe, along with GDPR regulations, sovereignty is becoming the key. If you want to be competitive, you need full control of your data. What happens if a service provider regulated by a foreign country threatens to take advantage of your data, which sits in their data centers? This issue of control is increasingly discussed by CIOs because it’s becoming the key for the future of practically everything.
Everything will be more about understanding your data. Companies will continue to produce a lot of data and recognize it’s essential to any AI project. Some data will be for everyday production, which won't change much. You’ll produce and store it for compliance, etc.
What is changing is that you need to be aware of what you're storing, and how, and why. A law firm, for instance, has contracts and documents. With basic tagging and some additional information, they can build a huge platform to train an AI and then automatically check new documents, analyze laws, and correct or even predict problems. It simplifies document production and improves reliability, and that's great.
But if you don't have any idea beyond the name of the Word file, then it becomes really hard to find those key documents and build a fancy AI on top of them.
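To make the tagging point concrete, here is a minimal sketch of storing documents with descriptive metadata in an S3-compatible object store, so they can be found by attributes rather than by file name alone. The endpoint, bucket name, credentials, and tag fields are hypothetical, and boto3 is used purely for illustration; this is not a description of any specific vendor's platform.

```python
# Sketch: upload documents with metadata that makes them findable later.
# Endpoint, bucket, credentials, and tag fields are hypothetical.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://object-store.example.com",  # any S3-compatible store
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

def store_contract(path: str, client_name: str, doc_type: str, year: str) -> None:
    """Upload a document together with the metadata that describes it."""
    with open(path, "rb") as f:
        s3.put_object(
            Bucket="legal-documents",                   # hypothetical bucket
            Key=f"{year}/{client_name}/{path.split('/')[-1]}",
            Body=f,
            Metadata={                                  # travels with the object
                "client": client_name,
                "doc-type": doc_type,                   # e.g. "contract", "nda"
                "year": year,
            },
        )

def describe(key: str) -> dict:
    """Read the metadata back without downloading the document itself."""
    return s3.head_object(Bucket="legal-documents", Key=key)["Metadata"]
```

Even this small amount of structure is what turns a pile of Word files into a dataset an AI project can actually use.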
Data is your most important asset. Many companies have a multi-cloud infrastructure – but where is your data? You might choose the best application for each use case, but now you’ve got your data in silos. Every time you need to move it around, you’re paying out money and are locked in.
Think about it the other way around and build a huge data repository. All the clouds can access it, but they don't own the data, they just own the compute. You have control of the data, the cost, everything. “Data first” is the key.
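A minimal sketch of that "data first" idea, assuming a single S3-compatible repository: the data stays in one place you control, and compute running in any cloud gets temporary access instead of its own copy. The endpoint, bucket, and object names below are hypothetical.

```python
# Sketch: one central data repository; compute in any cloud reads from it
# via a time-limited presigned URL instead of duplicating the data.
# Endpoint, bucket, and key names are hypothetical.
import boto3

repo = boto3.client(
    "s3",
    endpoint_url="https://data-repo.example.com",  # the single data repository
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

def grant_read_access(key: str, minutes: int = 60) -> str:
    """Return a time-limited URL that a compute job in any cloud can fetch."""
    return repo.generate_presigned_url(
        "get_object",
        Params={"Bucket": "central-data", "Key": key},
        ExpiresIn=minutes * 60,
    )

# Example: hand this URL to a training job on any provider;
# the data never has to be duplicated into that provider's storage.
url = grant_read_access("datasets/contracts-2024.parquet")
```

The point of the design is that the clouds own only the compute: the repository, its costs, and its access rules stay with you.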
There are several types of data storage – block, file, object – and each has its advantages and disadvantages, depending on the kind of workload you have to run. I've always been a proponent of having some very fast, very resilient block storage, because you need it for sub-millisecond latency, so you can easily access your stuff. All the rest could be object storage.
Even files could be on object storage because there are plenty of solutions that convert files to objects and allow you to seamlessly access them. I called it “flash and trash,” because you keep data that you need immediately on flash and all the rest could be on a secondary storage that could be object or files. It needs to be cost-effective, resilient and accessible when you need it. That strategy’s still valid.
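As a rough illustration of "flash and trash", here is a sketch that demotes files untouched for a number of days from a fast local tier to cheaper object storage. The paths, bucket name, and 30-day threshold are hypothetical, and the access-time check assumes the filesystem records atime; real tiering products handle this far more robustly.

```python
# Sketch of a "flash and trash" tiering pass: keep recently used files on
# the fast (flash) tier, push anything idle for MAX_IDLE_DAYS to object storage.
# Paths, bucket, and threshold are hypothetical.
import os
import time
import boto3

s3 = boto3.client("s3", endpoint_url="https://object-store.example.com")

HOT_DIR = "/mnt/flash/data"     # fast block/flash tier
COLD_BUCKET = "archive"         # capacity-oriented object tier
MAX_IDLE_DAYS = 30

def demote_cold_files() -> None:
    """Move files not accessed in MAX_IDLE_DAYS from flash to object storage."""
    cutoff = time.time() - MAX_IDLE_DAYS * 86400
    for root, _dirs, files in os.walk(HOT_DIR):
        for name in files:
            path = os.path.join(root, name)
            # Assumes the filesystem tracks access times (no 'noatime' mount).
            if os.stat(path).st_atime < cutoff:
                key = os.path.relpath(path, HOT_DIR)
                with open(path, "rb") as f:
                    s3.put_object(Bucket=COLD_BUCKET, Key=key, Body=f)
                os.remove(path)  # free up the flash tier
```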
We store everything now, because it could be useful later to train AI. Oil and gas companies, for instance, can use the latest computing techniques and AI innovations to retrieve and research data that was stored decades ago to find new sources of oil. This is petabytes of data being used 20 or 30 years later, so you don't need high speed, just resilient storage.
We're seeing cyber storage that’s resilient to ransomware attacks, but again, it's about your "data first" strategy. A hacker can launch a DDoS attack and stop your operation for one or two days. You lose a lot of money, and that’s damaging. But if you lose your data, you lose everything.
A "data first" strategy means data security too. Before the perimeter, before the devices, you have to think how to protect your data. There are a lot of techniques to protect your data. The problem is the cost. If it becomes too expensive – backing up to two different sites with three different media, for example – then at some point, you give up and use a simpler back-up technique. But it’s about how your data is encrypted in the backend, how it’s moved between various nodes in your infrastructure, etc.
In reality, the most urgent problem is not ransomware encryption, because ransomware encryption is the last visible problem you have. Hackers got access to your systems months before that, and had time to read all your data and steal what they needed. Only when they’ve finished that job do they start encrypting everything. Data is of value to them because they can ask you for millions of dollars.
But what is the additional value of that data to a competitor? What is the value of having millions of your customers’ accounts on the dark web? You need to really protect your company, and your customers, from that threat.
Exactly. It's all about data. And AI is just the last of a long list of applications where data is the only real value that you have.