

The glittering promise of data and AI

Chris Mellor, Founder and Editor, Blocks and Files

AI holds out a shining opportunity. Data storage lies at its heart.

In recent years, an ever-rising tide of data has been distributed throughout the enterprise, whether staff are at the office or elsewhere. Now, the rise of AI encourages access to this far-flung data through a single lens: the AI chatbot or agent.

How will these two different trends be reconciled so that organizations (and their people) get the best of both worlds, enabling us all to live, work and relax in an AI data space? 

The shift to hybrid computing and cloud adoption 

A few years ago, many organizations began adopting public cloud computing; their data and applications ran on the distributed IT infrastructure of their cloud service provider, which came with different business models and management facilities from on-premises IT. Suppliers responded with common virtual machine environments and cloud data storage protocols that let data and applications move between the public cloud and on-premises environments. This became known as hybrid computing. Organizations embraced the subscription-style business models of cloud providers, moving away from outright purchases and perpetual software licenses. 

As organizations and their data centers grew and adopted public cloud IT, their data estates – and the infrastructure needed to manage all of that data – grew as well. Database records about customers, products, internal processes, sales, marketing, operations and so forth mushroomed. File counts grew from thousands to hundreds of thousands, then millions, tens and hundreds of millions, and even billions. 

From disks to SSDs 

In the early days, files were kept on disk drives. Then companies that needed fast access to their data moved to solid state drives (SSDs). SSDs were more expensive than disk drives, which in turn were more expensive than the slow archival tapes where older data was stored for reference. There were tiers of storage, from fast but costly SSDs, through mid-speed, mid-cost disks, to slow and more affordable tape. 

But having IT staff move files from one tier to another is simply not feasible when you have hundreds of millions or billions of files. 

Automating data management with file lifecycle software 

File lifecycle management software arrived to do this automatically. It looked at how often files were accessed, moving seldom-accessed files from SSDs to disk and then to tape. Users didn’t need to know where files were physically located, as the file data management software presented them with a single index, then fetched requested files from wherever they were stored. 
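To make the idea concrete, here is a minimal sketch in Python of the kind of rule such software might apply. The index structure, tier names and "days since last access" thresholds are purely illustrative assumptions, not taken from any particular product; real lifecycle managers use far richer policies.

```python
from datetime import datetime, timedelta, timezone

# Illustrative thresholds; real products expose these as configurable policies.
SSD_TO_DISK_DAYS = 30     # untouched for a month: move off SSD
DISK_TO_TAPE_DAYS = 365   # untouched for a year: move to tape

def choose_tier(last_accessed: datetime) -> str:
    """Pick a storage tier from how long ago the file was last read."""
    idle_days = (datetime.now(timezone.utc) - last_accessed).days
    if idle_days < SSD_TO_DISK_DAYS:
        return "ssd"
    if idle_days < DISK_TO_TAPE_DAYS:
        return "disk"
    return "tape"

def plan_migrations(file_index: dict) -> list:
    """Scan the single index and list files whose tier no longer matches the policy."""
    moves = []
    for path, meta in file_index.items():
        target = choose_tier(meta["last_accessed"])
        if target != meta["tier"]:
            moves.append((path, meta["tier"], target))
    return moves

# A toy index: a real lifecycle manager holds millions or billions of such entries.
index = {
    "/finance/q3-report.xlsx": {"tier": "ssd",
                                "last_accessed": datetime.now(timezone.utc) - timedelta(days=2)},
    "/archive/2019-logs.tar":  {"tier": "disk",
                                "last_accessed": datetime.now(timezone.utc) - timedelta(days=900)},
}
print(plan_migrations(index))   # [('/archive/2019-logs.tar', 'disk', 'tape')]
```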

This system managed all of the dispersed data through a single facility – a single pane of glass, as it were. It knew exactly where each file was: in which data center, office or public cloud region, and on which storage tier. It was as if a multi-branch library maintained a central catalog visible to all of the branches. The data management facility also enabled data to be moved to where it was needed, and to be readily accessible when someone requested it. 

The concept of a unified data space 

The data manager software can orchestrate the data’s location or placement, just as a public reference library can arrange for books, microfiche, or periodicals to be brought to a reading room desk from separate branches. But now the storage, data request, and delivery are all digital, and are not tied to any specific physical location. 

We exist in a kind of pervasive virtual data space. We can change our location, flying from Singapore to London, for example, and still access our data. Indeed, we can work with it on an airplane, confident that the updated information will be synchronized when we land. 

The data space in which we operate was originally tied to wired computer terminals and then PCs in data centers. It spread with the wired Internet to our offices and our homes. 

Then mobile telephony and WiFi changed the world forever. We are no longer tethered by wires and can connect to our data wherever we are. Smart watches, smart glasses, and other devices use our mobile phones and notebooks as relay stations to the Internet. 

Advances in storage capacities and technologies 

The increasing amount of data requires individual storage devices to hold more of it. Tape drives were once the champions here, with individual cartridges storing 15 or 30 compressed terabytes. But tape is slow to access, as it has to be read from the start of the reel to find specific data items. Faster disk drives let you go directly to any part of the disk, and their capacity has caught up with tape cartridges, now holding 32 TB or more of uncompressed data. 

Yet even disk drives are being eclipsed by solid state drives, which are very much faster because they have a direct electrical connection to the storage cells rather than waiting for a disk to spin the stored data location underneath the read/write head. We now have 61 TB SSDs, with 128 TB models announced in the last few weeks. These are built from individual NAND chips with vastly more capacity than before – a terabit per chip, for example, compared with the gigabit-class chips of a few years ago. 

This means a rack of such SSDs can store 50 petabytes or more of data. Storing that amount on disk drives would take around 4,700 of them, filling about 11 racks. The data center space saving from using SSDs instead of HDDs is huge, as is the reduction in the power and cooling needed. 

AI revolution: large language models and data needs 

AI has been revolutionized by the development of large language models, which enable an AI agent or chatbot to accept natural language input and generate a natural language response: anything from a simple answer to a query, to a summary of a patent application, an analysis of hospital X-ray and CT scans, computer program code, an image or even a video. Although at heart these are statistical what-comes-next prediction engines, the sophistication and depth of their responses are astonishing. The larger the training data sets, the better the results. Grant them access to an organization’s proprietary data and they promise to augment simple, relatively low-level human interactions such as first-pass inbound sales inquiries and support calls. Even greater capabilities are promised by what is being called agentic AI, in which chatbot agents talk to one another to accomplish multi-step tasks. 
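As a toy illustration of that "what-comes-next" idea, the sketch below picks a next word from an invented probability distribution. The words and probabilities are made up for illustration; a real model computes such probabilities over a vocabulary of many thousands of tokens.

```python
import random

# Invented probabilities for the word that follows some context, e.g. "AI needs fast ...":
next_word_probs = {"storage": 0.45, "data": 0.30, "networks": 0.15, "coffee": 0.10}

def sample_next(probs: dict) -> str:
    """Pick one candidate word in proportion to its predicted probability."""
    words, weights = zip(*probs.items())
    return random.choices(words, weights=weights, k=1)[0]

# A language model repeats this step, feeding each chosen word back in as context,
# building a full response one token at a time.
print(sample_next(next_word_probs))
```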

The AI data pipeline: preparing data for AI agents 

AI agents need massive lakes of data fed quickly to the GPUs on which they are trained. They also need data from which to generate responses in everyday use, a process called inferencing. Most of the data they need is stored in files and objects. It is made usable by selecting relevant subsets, filtering out any sensitive information, and mathematically transforming what remains into so-called vectors. The chatbot agents search a database of these vectors to generate their responses. So an AI data pipeline is needed to select, filter, transform and then feed the vectorized data to the AI agent for processing. Virtually every database, data warehouse and lakehouse supplier is now building such a pipeline. 
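Here is a minimal sketch of that select-filter-transform-serve flow, assuming a stand-in embed() function and an in-memory list as the "vector database". A production pipeline would use a real embedding model, a proper vector store and far more careful filtering.

```python
import math
import re

def embed(text: str) -> list:
    # Hypothetical embedding: real pipelines call an embedding model here.
    vec = [0.0] * 64
    for i, ch in enumerate(text.lower()):
        vec[i % 64] += ord(ch)
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def redact(text: str) -> str:
    # Filter step: strip obviously sensitive items (an illustrative rule only).
    return re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "[REDACTED]", text)

# Select: the subset of documents deemed relevant to the AI use case.
documents = ["Q3 sales grew 12% in EMEA.", "Support ticket 4512: login failure."]

# Transform: redact, then vectorize, building the searchable index.
vector_db = [(doc, embed(redact(doc))) for doc in documents]

def retrieve(query: str, k: int = 1) -> list:
    """Serve step: return the k documents whose vectors best match the query."""
    q = embed(query)
    scored = sorted(vector_db, key=lambda item: -sum(a * b for a, b in zip(q, item[1])))
    return [doc for doc, _ in scored[:k]]

# The retrieved text would be handed to the AI agent as context for its response.
print(retrieve("How did sales do in Europe?"))
```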

Huawei’s role in the AI ecosystem 

For a supplier like Huawei and its customers, this is an extraordinary moment. The company’s chips can process AI workloads; its servers can train AI agents using data stored in its OceanStor arrays, with pipelines preparing the data for processing; and the arrays and their data lake software can then serve data for inferencing. 

Its network equipment can transfer AI data between servers, storage and endpoints: the PCs, notebooks and smartphones. The very large NAND chip capacities noted above mean that its smartphones, tablets, notebooks and PCs can store immense amounts of data compared with a few years ago, making it quite feasible to run AI applications on them. Indeed, AI chatbots like Perplexity and Grok are already available as smartphone applications. Soon we will be able to speak to them and listen to a spoken response. 

The future of AI: ubiquitous access and smart devices 

The whole range of Huawei’s products, from semiconductor chips, servers, networking and storage to PCs, notebooks, tablets, smartphones, smart glasses, smartwatches and earbuds, can participate in what promises to become a veritable AI feast. This will give it an unrivalled ability to recognize AI technology and usage trends in what we could call an AI data space, and to develop and provide AI capabilities across a wider range of products than any other supplier in the world.