ModelArts 3.0: A True AI Accelerator

As a one-stop development platform for AI, ModelArts 3.0 leads the field in model training.

By Tian Qi, Chief AI Scientist, HUAWEI CLOUD

HUAWEI CLOUD's Enterprise Intelligence (EI) has achieved strong results in numerous industry competitions and evaluations. HUAWEI CLOUD has invested heavily in basic AI research in three domains: computer vision, speech and semantics, and decision optimization. Focusing on four areas (model efficiency, data efficiency, computing power efficiency, and knowledge efficiency), Huawei has proposed six basic research plans:

  • High-performance Model for large models
  • Lightweight Model for small models
  • Data Magic Cube for multi-modal learning
  • Data Iceberg for small sample learning
  • Generic Vision for general knowledge extraction
  • V-R Integration for a new learning paradigm

To help AI empower all industries, the ModelArts enabling platform supports plug-and-play deployment of HUAWEI CLOUD's research results in areas such as automatic machine learning, small sample learning, federated learning, and pre-trained models.

The three basic research areas of AI

In the area of perception, HUAWEI CLOUD continues to be an industry leader in ImageNet large-scale image classification, WebVision large-scale web image classification, MS-COCO two-dimensional object detection, nuScenes three-dimensional object detection, and the evaluation of visual pre-trained models on downstream classification, detection, and segmentation tasks. Perception models driven by ModelArts have been widely used in sectors such as medical image analysis, oil and gas exploration, and fault detection in manufacturing.

In cognition, HUAWEI CLOUD integrates industry data based on its expertise in semantic analysis and knowledge graphs. By managing diverse, complex, and siloed datasets, it has leapt from perception intelligence to cognitive intelligence. Cognitive models driven by ModelArts have been used for a range of tasks, including drug-target prediction, financial fraud analysis, and intelligent after-sales services.

In the field of decision-making, Huawei has built a complete foundation for decision-making based on various algorithms, including operations research optimization, reinforcement learning, and intelligent control algorithms. This has made possible a true intelligent closed loop of perception, cognition, and decision-making.

The decision-making engine driven by ModelArts has already been deployed in multiple sectors, including aircraft stand allocation, industrial manufacturing, intelligent transportation, and gaming and entertainment.

HUAWEI CLOUD's ModelArts 3.0 is a one-stop AI development platform for the AI industry. HUAWEI CLOUD has been exploring ways to use AI to efficiently solve industry challenges, such as training high-precision models with very little data, lowering the barriers to AI adoption in the enterprise sector, and addressing businesses' concerns about the safe use of data. ModelArts 3.0 integrates backbone models, federated learning, intelligent diagnosis, evaluation and optimization, and high-efficiency computing power.

The four new features of ModelArts 3.0

HUAWEI CLOUD's backbone tool-chain, EI-Backbone, integrates model efficiency, data efficiency, computing power efficiency, and knowledge efficiency, and improves the ability of enterprises in different industries to deploy AI. EI-Backbone's capabilities have already been verified in more than 10 sectors; the technology has won more than 10 industry challenge competitions and has been the subject of more than 100 papers at top-tier conferences.

EI-Backbone offers a new paradigm for AI development. For example, medical image segmentation for the lung used to require hundreds or thousands of labeled samples for training, but with EI-Backbone, training can be completed using just dozens of labeled samples, or even as few as ten, reducing labeling costs by 90 percent.

In the past, model selection and hyperparameter tuning for lung image segmentation required extensive expert experience and trial and error. With EI-Backbone's full-space network architecture search and automatic hyperparameter optimization technology, these steps can be completed quickly without manual intervention.
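The article does not describe how EI-Backbone's search works. As a rough illustration of what automatic hyperparameter optimization replaces, the sketch below uses plain random search, a deliberately simple technique swapped in for illustration and not EI-Backbone's algorithm. The search space and the toy `validation_loss` objective are hypothetical stand-ins for training and scoring a real segmentation model.

```python
import math
import random
from typing import Callable, Dict, Optional, Tuple

Params = Dict[str, float]
Sampler = Callable[[random.Random], float]

# Hypothetical search space (assumption): both hyperparameters sampled log-uniformly.
SEARCH_SPACE: Dict[str, Sampler] = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-4, -1),
    "weight_decay": lambda rng: 10 ** rng.uniform(-6, -2),
}

def validation_loss(params: Params) -> float:
    """Hypothetical stand-in for 'train a model with these settings, then score it'.
    This toy objective is smallest near learning_rate=1e-2 and weight_decay=1e-4."""
    return ((math.log10(params["learning_rate"]) + 2) ** 2
            + (math.log10(params["weight_decay"]) + 4) ** 2)

def random_search(objective: Callable[[Params], float],
                  space: Dict[str, Sampler],
                  n_trials: int, seed: int = 0) -> Tuple[Optional[Params], float]:
    """Sample n_trials configurations and keep the one with the lowest objective."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {name: sample(rng) for name, sample in space.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

best, best_loss = random_search(validation_loss, SEARCH_SPACE, n_trials=200)
print(best, round(best_loss, 4))
```

Random search is used here only because it is short and easy to check; more sample-efficient methods, such as Bayesian optimization or evolutionary and gradient-based architecture search, follow the same loop of proposing a configuration, scoring it, and keeping the best.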

Precision is also greatly improved. And instead of the weeks it would take to train from scratch, model training, testing, acceptance, and deployment can be completed in a few hours, or even minutes, by loading a pre-trained model integrated into EI-Backbone. This can lower training costs by more than 90 percent.
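Loading a pre-trained model typically means reusing its learned feature extractor and training only a small task-specific head on the new labeled data, which is why a handful of samples can suffice. The pure-Python sketch below illustrates that idea under loud assumptions: `frozen_backbone` is a hypothetical, fixed stand-in for a pre-trained backbone, and the head is a logistic-regression classifier trained by gradient descent on ten toy examples. It is not EI-Backbone's API.

```python
import math
from typing import List, Tuple

Features = List[float]

def frozen_backbone(x: float) -> Features:
    """Hypothetical stand-in for a pre-trained feature extractor (assumption).
    It is treated as frozen: nothing below updates it."""
    return [x, x * x, 1.0]  # the constant 1.0 acts as a bias feature

def dot(w: List[float], f: Features) -> float:
    return sum(wi * fi for wi, fi in zip(w, f))

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def softplus(t: float) -> float:
    """Numerically stable log(1 + e^t)."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def log_loss(w: List[float], feats: List[Tuple[Features, int]]) -> float:
    """Mean binary cross-entropy of the linear head on precomputed features."""
    return sum(softplus(-dot(w, f)) if y == 1 else softplus(dot(w, f))
               for f, y in feats) / len(feats)

def train_head(feats: List[Tuple[Features, int]], lr: float = 0.5,
               epochs: int = 500) -> List[float]:
    """Fine-tune only the linear (logistic-regression) head with full-batch gradient descent."""
    w = [0.0] * len(feats[0][0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for f, y in feats:
            err = sigmoid(dot(w, f)) - y
            for i, fi in enumerate(f):
                grad[i] += err * fi / len(feats)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def predict(w: List[float], x: float) -> int:
    return int(dot(w, frozen_backbone(x)) >= 0.0)

# Ten labeled toy examples: label 1 when |x| is large.
few_shot = [(-2.0, 1), (-1.5, 1), (-0.5, 0), (-0.2, 0), (0.0, 0),
            (0.3, 0), (0.6, 0), (1.4, 1), (1.8, 1), (2.5, 1)]
feats = [(frozen_backbone(x), y) for x, y in few_shot]  # backbone is only run forward
head = train_head(feats)
print("loss before:", round(log_loss([0.0, 0.0, 0.0], feats), 4))
print("loss after: ", round(log_loss(head, feats), 4))
print("predictions:", [predict(head, x) for x, _ in few_shot])
```

Because only three head weights are trained, the amount of labeled data needed is far smaller than for training the whole network from scratch; in a real system the frozen part would be a deep pre-trained network rather than a fixed formula.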

ModelArts 3.0's latest big feature: federated learning

Data is the cornerstone of AI applications, and intelligent perception by AI depends on diverse data. However, when AI is deployed in industry scenarios, data is often scattered across different data controllers, and the resulting data silos reduce the effectiveness of training AI algorithms.

To solve this issue, HUAWEI CLOUD's ModelArts 3.0 provides federated learning, which supports joint modeling while the data stays where it is. Each participant trains on its local data and exchanges updated, encrypted model parameters rather than the data itself, enabling collaborative training.
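A common way to exchange model parameters instead of data is federated averaging (FedAvg). The minimal sketch below assumes each client fits a one-parameter linear model y ≈ w·x by local gradient descent, and the server averages the returned weights in proportion to each client's sample count. It omits the encryption of exchanged parameters, and every name in it is hypothetical rather than part of the ModelArts API.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs that never leave the client

def local_update(w: float, data: Dataset, lr: float = 0.05, steps: int = 20) -> float:
    """Run a few gradient-descent steps on mean squared error, using only local data."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def fedavg_round(global_w: float, clients: List[Dataset]) -> float:
    """One FedAvg round: clients train locally, the server averages weights by sample count."""
    updates = [(local_update(global_w, data), len(data)) for data in clients]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total

# Three hypothetical clients whose data all follow y = 3x; only weights are shared.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (3.0, 9.0)],
    [(2.5, 7.5)],
]
w = 0.0
for _ in range(10):
    w = fedavg_round(w, clients)
print(round(w, 3))  # → 3.0
```

Only the scalar `w` crosses the client boundary in each round; the `(x, y)` pairs stay inside `local_update`.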

In collaboration with Professor Jiang Hualiang from the Shanghai Institute of Materia Medica, Chinese Academy of Sciences, HUAWEI CLOUD EI applied Huawei's self-developed FedAMP and AutoGenome algorithms to AI tasks in drug research and development. The team was able to accurately predict drug properties such as water solubility, cardiotoxicity, and kinase activity, at a level far exceeding traditional federated learning and deep learning algorithms.
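The article does not explain how FedAMP works. Broadly, personalized federated learning lets each client aggregate more heavily from clients whose models resemble its own, which helps when participants' data differ. The sketch below illustrates only that similarity-weighted idea in simplified form; the weighting function exp(-||w_i - w_j||² / sigma), the fixed self-weight `alpha`, and all names are assumptions for illustration, not FedAMP's actual update rule.

```python
import math
from typing import List

Vector = List[float]

def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def personalized_aggregate(models: List[Vector], sigma: float = 1.0,
                           alpha: float = 0.5) -> List[Vector]:
    """For each client i, keep weight alpha on its own model and spread 1 - alpha over
    the other clients, weighting client j by exp(-||w_i - w_j||^2 / sigma) (assumption)."""
    out = []
    for i, wi in enumerate(models):
        raw = {j: math.exp(-sq_dist(wi, wj) / sigma)
               for j, wj in enumerate(models) if j != i}
        total = sum(raw.values())
        agg = [alpha * v for v in wi]
        for j, r in raw.items():
            share = (1 - alpha) * r / total
            agg = [a + share * v for a, v in zip(agg, models[j])]
        out.append(agg)
    return out

# Hypothetical 2-parameter models from two groups of similar clients.
models = [[1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.2, 4.8]]
personalized = personalized_aggregate(models)
for m in personalized:
    print([round(v, 2) for v in m])
```

A single global average would pull all four clients toward the same model, roughly (3.1, 2.9); here each client instead ends up close to its own group.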

Furthermore, HUAWEI CLOUD's cloud collaboration service supports federated training using data from different locations and customers. Encrypted data can be uploaded to servers, where the global model is updated and then distributed to edge devices. This supports both horizontal federated learning, in which participants' data share the same format, and vertical federated learning, in which participants' data have different formats.
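The two settings differ in how data is split among participants. In horizontal federated learning, participants hold different samples with the same features; in vertical federated learning, they hold different features for the same samples, which must first be aligned on a shared ID. The sketch below shows both splits on a toy table; the column names and the plain ID intersection are illustrative assumptions (production systems generally use private set intersection rather than exchanging raw IDs).

```python
from typing import Dict, List

Row = Dict[str, float]
Table = Dict[str, Row]  # sample ID -> feature values

# Hypothetical toy table keyed by a shared sample ID.
table: Table = {
    "u1": {"age": 34.0, "income": 52.0, "purchases": 3.0},
    "u2": {"age": 45.0, "income": 61.0, "purchases": 7.0},
    "u3": {"age": 29.0, "income": 38.0, "purchases": 1.0},
    "u4": {"age": 52.0, "income": 75.0, "purchases": 9.0},
}

def horizontal_split(data: Table, ids_per_party: List[List[str]]) -> List[Table]:
    """Each party holds different samples (rows) with all features (same format)."""
    return [{i: dict(data[i]) for i in ids} for ids in ids_per_party]

def vertical_split(data: Table, cols_per_party: List[List[str]]) -> List[Table]:
    """Each party holds the same samples but a different subset of features (columns)."""
    return [{i: {c: row[c] for c in cols} for i, row in data.items()}
            for cols in cols_per_party]

def align_ids(parties: List[Table]) -> List[str]:
    """Find the sample IDs that every party holds (plain intersection, for illustration)."""
    common = set(parties[0])
    for party in parties[1:]:
        common &= set(party)
    return sorted(common)

h = horizontal_split(table, [["u1", "u2"], ["u3", "u4"]])
v = vertical_split(table, [["age", "income"], ["purchases"]])
del v[1]["u3"]  # assume the second party has no record for u3

print([sorted(p) for p in h])        # → [['u1', 'u2'], ['u3', 'u4']]
print([sorted(p["u1"]) for p in v])  # → [['age', 'income'], ['purchases']]
print(align_ids(v))                  # → ['u1', 'u2', 'u4']
```

Vertical training can only use the aligned samples, whereas horizontal parties share no samples at all and instead contribute more rows of the same format.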

Users can participate in federated training through the cloud or the computing capabilities of HUAWEI CLOUD edge devices (such as intelligent microsites), which enables intra-industry joint modeling.

Models must be fully evaluated before they are deployed and launched. Models with excellent results are directly put into the production environment, while those with unsatisfactory results need to be further optimized and iterated.

ModelArts provides comprehensive visual evaluation and intelligent diagnosis functions. These let developers intuitively understand how the model performs in every aspect and then either carry out targeted tuning or deploy the model to production.

For example, to evaluate a classification model for epithelial disease cells, ModelArts provides conventional indicators such as accuracy, precision, recall, F1 score, confusion matrix, and ROC curve, as well as a data sensitivity analysis module that evaluates the model's performance in different sub-intervals of data features, helping developers maximize precision.
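The conventional indicators listed above all derive from the confusion matrix, and one plausible reading of data sensitivity analysis is recomputing a metric within sub-intervals of a single feature. The sketch below shows both for a binary classifier; the `cell_size` feature, the interval edges, and the toy labels are hypothetical, the ROC curve is omitted for brevity, and this is not ModelArts' implementation.

```python
from typing import Dict, List, Sequence, Tuple

def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 computed from the confusion matrix."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}

def sensitivity_by_interval(feature: Sequence[float], y_true: Sequence[int],
                            y_pred: Sequence[int], edges: List[float]) -> Dict[str, float]:
    """Accuracy within each feature sub-interval [edges[k], edges[k+1])."""
    result = {}
    for lo, hi in zip(edges, edges[1:]):
        idx = [i for i, f in enumerate(feature) if lo <= f < hi]
        if idx:
            result[f"[{lo}, {hi})"] = sum(y_true[i] == y_pred[i] for i in idx) / len(idx)
    return result

# Hypothetical evaluation data: true labels, predictions, and one feature per sample.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
cell_size = [12, 8, 15, 9, 7, 14, 11, 10]

print(metrics(y_true, y_pred))
print(sensitivity_by_interval(cell_size, y_true, y_pred, [0, 10, 20]))
```

On this toy data the overall accuracy is 0.75, but the per-interval view shows that both errors fall in the `[0, 10)` size range, the kind of weakness targeted tuning would address.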

ModelArts provides statistical analysis of operator-level time and space consumption, along with various overall performance indicators. It also gives suggestions for improving model performance, such as model quantization and distillation. For interpretability, it provides a heat map showing the areas the model relies on to make inferences. ModelArts also offers various methods for evaluating model trustworthiness, with multiple indicators for model security and capability, and can give diagnostic suggestions for improvement based on the model's current performance.
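Quantization typically stores weights as 8-bit integers plus a floating-point scale, which cuts weight storage to roughly a quarter of what 32-bit floats need. The sketch below shows symmetric per-tensor post-training quantization, one common variant chosen here as an assumption; the weight values are made up.

```python
from typing import List, Tuple

def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map [-max|w|, max|w|] onto [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 codes and the shared scale."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.0, 0.9, -0.05]  # hypothetical layer weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_err)
```

Rounding to the nearest integer code bounds the per-weight error by half a quantization step (`scale / 2`); whether that error is acceptable is exactly what re-running the evaluation metrics after quantization checks.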

With the continued growth of distributed training for deep learning, the demand for computing resources for model training has increased. However, the resources allocated to training tasks are often underutilized for a number of reasons, including:

  • Low-quality training algorithm code.
  • Suboptimal model size and hyperparameter settings.
  • Peaks and troughs in overall resource pool utilization. Just as with electricity demand, submissions of training tasks rise and fall over time.

Flexible training is one of ModelArts' core capabilities. It adapts resource allocation to the required training speed, making optimal use of available resources.

ModelArts provides two modes. Turbo mode makes full use of idle resources to accelerate existing training tasks by a factor of 10 or more without affecting the model's convergence precision. Economic mode maximizes resource utilization to give developers the best price/performance ratio, improving it by more than 30 percent in most scenarios.
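The article does not say how Turbo mode preserves convergence when it adds idle resources to a running job. One widely used technique in data-parallel training is the linear scaling rule: when the number of workers, and thus the global batch size, grows by a factor k, scale the learning rate by k, usually with a short warm-up. The sketch below shows that generic rule as an assumption, not a description of ModelArts; the function names and values are illustrative.

```python
from typing import List

def scaled_learning_rate(base_lr: float, base_workers: int, workers: int) -> float:
    """Linear scaling rule: the learning rate grows in proportion to the global batch size,
    assuming a fixed per-worker batch size."""
    return base_lr * workers / base_workers

def warmup_lr(target_lr: float, step: int, warmup_steps: int, start_lr: float) -> float:
    """Linearly ramp from start_lr to target_lr over warmup_steps, then hold."""
    if step >= warmup_steps:
        return target_lr
    return start_lr + (target_lr - start_lr) * step / warmup_steps

# Hypothetical job tuned with lr=0.1 on 8 workers, elastically scaled to 32 idle workers.
base_lr, base_workers, new_workers = 0.1, 8, 32
target = scaled_learning_rate(base_lr, base_workers, new_workers)
schedule: List[float] = [warmup_lr(target, s, warmup_steps=4, start_lr=base_lr)
                         for s in range(6)]
print(target, [round(lr, 3) for lr in schedule])
```

The warm-up avoids applying the full, larger learning rate immediately after the job is rescaled, when the enlarged batch would otherwise take a sudden large step.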

To better support AI R&D with large computing power requirements, the ModelArts platform has been optimized for cluster scale, task volume, and distributed training, and can manage tens of thousands of nodes for large-scale training tasks. With an optimized service framework, it can run large-scale distributed tasks on 10,000 processors while supporting 100,000 operations.

A key capability for distributed training on a large-scale cluster is a high distributed acceleration ratio, which is also a key factor encouraging users to opt for large-scale clusters to accelerate AI services. HUAWEI CLOUD's ModelArts offers industry-leading distributed acceleration: its 512-chip cluster can run the ImageNet-1K image classification MLPerf benchmark in 93.6 seconds, beating the previous 120-second record set with NVIDIA V100 GPUs. Thanks to backbone models, federated learning, model diagnosis and optimization, and efficient computing power, HUAWEI CLOUD's ModelArts will accelerate the application of AI in business scenarios.
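The distributed acceleration ratio is usually defined as the speedup S(N) of a run on N processors over a single-processor run, with parallel efficiency E(N); these are standard definitions rather than ModelArts-specific figures. The second line only applies arithmetic to the two benchmark times quoted in this article; because they come from different systems, it is a wall-clock comparison, not a speedup in the S(N) sense.

```latex
S(N) = \frac{T_1}{T_N}, \qquad E(N) = \frac{S(N)}{N}

\frac{120\ \text{s}}{93.6\ \text{s}} \approx 1.28, \qquad 1 - \frac{93.6}{120} = 0.22
```

In other words, the 512-chip result finishes in about 22 percent less time than the previous record.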

Looking ahead, Huawei will continue to concentrate on the four areas of model efficiency, data efficiency, computing power efficiency, and knowledge efficiency, and invest heavily in AI research. Focusing on its basic research plans in the AI domains of computer vision, speech and semantics, and decision-making optimization, HUAWEI CLOUD will keep striving to provide powerful AI technology to help each developer reach their full potential and create unique value. We will continue to make AI more inclusive and grow together with developers around the world.