NET4AI: Supporting AI as a Service in 6G
We propose a 6G system architecture, named NET4AI, which leverages pervasive edge computing capabilities in 6G and offers an end-to-end solution to network-based AI, from deployment to operations.
Authors (all from Huawei 6G research team): Xu Li 1, Hang Zhang 1, Chenghui Peng 2, Zhe Liu 2, Fei Wang 2
Artificial intelligence (AI) refers to intelligence as exhibited by machines. It perceives external data and takes appropriate actions to achieve goals. AI techniques have become an essential tool for solving challenging problems in business analytics and decision-making. It is anticipated that AI applications will reach every sector of the global economy and affect all aspects of society.
AI techniques can be broadly classified as rule-based AI and learning-based AI. In rule-based AI, such as an expert system, human knowledge is encoded into rules that apply to input data for problem solving. Rule-based AI is limited by its knowledge base and cannot solve unknown problems. In contrast, learning-based AI aims to learn rules from historical data and uses these rules to achieve specific goals. Learning-based AI is at the center of the current resurgence of AI research and development, with machine learning approaches becoming mainstream.
At the core of a machine learning algorithm is a learning model that describes the relationship between input data and output rules. Typical learning models include artificial neural networks (ANNs), genetic algorithms, and regression analysis, to name just a few. This paper draws attention to ANN-based machine learning, in particular deep learning, which is able to extract features from training data and identify which are relevant to the target problem. It is suitable for correlated data and prevails in a variety of applications. In the sequel, we use "learning model" and "AI model" interchangeably.
Figure 1 A DNN with three hidden layers (layers 2–4)
The ANN used in deep learning includes multiple hidden layers between the input and output layers, and is often referred to as a deep neural network (DNN). Figure 1 shows a DNN with three hidden layers. Deep learning relies on frequent data access and intensive computation to train the DNN. To reduce training time, distributed systems have been exploited for parallelizing time-consuming computation and slow I/O access in deep learning.
In parallel to the proliferation of machine learning, mobile computing has entered an exciting new era, where personal devices such as smart phones and tablets are becoming the primary computing platform for many people and applications. These devices have access to an unparalleled amount of data that is often not only personal but also private in nature.
When deep learning meets mobile computing, a new paradigm of privacy-preserving deep learning with decentralized data is revealed. As collecting and storing such sensitive data comes with associated privacy risks, as well as the responsibility to protect the privacy embedded in the data, a large amount of research effort has recently been devoted to differential privacy in deep learning, which aims to protect the exact training data of individual devices to the point that they are indistinguishable.
A learning model is used for inference after it has been trained. Model inference can be performed at different places, depending on how the model is distributed. If the learning model is distributed to the client, inference happens locally. Local inference may place a large computational workload on the client. If the model is held on a server, the client will need to upload inference data to that server, and this can cause information leakage. Recent research suggests splitting the model between the client and the server, so that the client sends intermediate results rather than raw data to the server. This split inference can reduce communication overheads and latency, and protects data privacy as the server cannot derive information about the raw inference data from the intermediate results. This is similar to split learning (SL), which is described later in Section 1.1.
Model inference is technically similar to a forward propagation step in model training. It is triggered by an inference data item, instead of a training data item, and it returns a classification result from the output layer rather than triggering loss function evaluation and back propagation. Due to this technical similarity, we will focus on model training in this paper. However, our solution can be readily applied to model inference. Below we will briefly review existing work related to privacy-preserving deep learning.
In terms of deep learning, differential privacy approaches include adding noise to training data without jeopardizing its statistical properties, so that the trained model still captures features in the original dataset, and applying cryptographic techniques so that learning is based on encrypted data without decryption.
In this paper we draw attention to an alternative, where instead of sending raw training data, clients forward information that appears random. Federated learning (FL) and SL are two typical examples of this approach, and both train a deep learning model (such as a DNN) without requiring raw training data to leave the clients (for example, uploaded to a training server).
In FL, individual clients each train a local model using only their own datasets, and upload the model parameters to a training server where a global model (specifically, global model parameters) is maintained. The training server aggregates updates received from the clients to adjust the global model, the parameters of which are then returned to the clients. Based on this information, the clients update the local model and continue the training. The procedure then repeats until the global model converges. FL can be viewed as a generalized implementation of stochastic gradient descent with flexible batch size and participating clients.
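The aggregation step at the training server can be illustrated with a minimal sketch. The text only states that the server "aggregates" client updates; weighting each client by its local dataset size (as in the common federated averaging recipe) is an assumption made here for illustration, and the function name `federated_average` is hypothetical.

```python
from typing import List, Sequence


def federated_average(client_params: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Combine client parameter vectors into global model parameters.

    Assumption: each client is weighted by its local dataset size.
    The paper does not prescribe a specific aggregation rule.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need exactly one dataset size per client update")
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


# One aggregation round with two hypothetical clients holding 1 and 3 samples.
global_params = federated_average([[1.0, 2.0], [3.0, 6.0]], [1, 3])
```

The server would return `global_params` to the clients, which overwrite their local parameters and continue training until the global model converges.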
In SL, the DNN is split into two disjoint components by a cut layer. The lower component includes the input layer and is run on the client side, while the remaining upper component runs on the server side. A cut normally occurs between two layers of the DNN (for example, between layers 2 and 3 in Figure 1), although in theory it can be freely defined as long as it produces two disjoint partitions of the DNN. Consequently, the two components can be viewed as two concatenated learning models (a client-side model and a server-side model), with the client-side model feeding its output to the server-side model as input. Clients (such as devices) interact with the training server sequentially to train the DNN using their local data, by iteratively sending intermediate results (such as the output of the client-side model) to the server and receiving the corresponding gradients from the server. When a client finishes the training with the server using its local data, it provides the latest model parameters to the next client, which continues the training using its own dataset. Training then proceeds sequentially among clients until all are finished. A new round of training may be initiated as needed.
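The message exchange in SL can be sketched with a deliberately tiny model: one scalar weight on each side of the cut, trained with a squared-error loss. The class names, the learning rate, and the toy datasets are all assumptions for illustration; a real DNN would exchange activation tensors and gradient tensors in the same pattern.

```python
class ClientSideModel:
    """Lower component (contains the input layer); runs on the client."""

    def __init__(self, w: float):
        self.w = w
        self.x = 0.0

    def forward(self, x: float) -> float:
        self.x = x
        return self.w * x  # intermediate result sent to the server

    def backward(self, grad_h: float, lr: float) -> None:
        # Chain rule: dLoss/dw = dLoss/dh * dh/dw, with dh/dw = x.
        self.w -= lr * grad_h * self.x


class ServerSideModel:
    """Upper component (contains the output layer); runs on the server."""

    def __init__(self, w: float):
        self.w = w

    def train_step(self, h: float, target: float, lr: float) -> float:
        err = self.w * h - target   # derivative of 0.5 * (y - target)^2 w.r.t. y
        grad_h = err * self.w       # gradient returned to the client
        self.w -= lr * err * h
        return grad_h


def train_sequentially(datasets, rounds=20, lr=0.05):
    """Clients take turns; each hands its client-side weight to the next."""
    server = ServerSideModel(1.0)
    client_w = 1.0
    for _ in range(rounds):
        for local_data in datasets:              # one client at a time
            client = ClientSideModel(client_w)   # receives latest parameters
            for x, target in local_data:         # raw data never leaves the client
                h = client.forward(x)
                grad_h = server.train_step(h, target, lr)
                client.backward(grad_h, lr)
            client_w = client.w                  # passed on to the next client
    return client_w, server.w


# Two hypothetical clients whose local data follow target = 2 * x.
client_w, server_w = train_sequentially([[(1.0, 2.0)], [(2.0, 4.0)]])
```

After training, the product `client_w * server_w` approximates the slope 2 of the toy data, although neither side ever holds both the raw data and the complete model.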
FL combines simultaneously and individually trained local models to generate a global model. As the local models are based on pure local data that is usually non-IID (independent and identically distributed), FL converges slowly. SL essentially trains the global model directly using all local datasets and can therefore converge fast (in terms of duration, but not necessarily in terms of training rounds). However, it requires synchronization among clients due to its sequential learning nature. A comparative study of FL and SL can be found in. As clients do not send raw training data to the training server, FL and SL both offer differential privacy.
A study has shown that an insider adversary with complete knowledge of the learning model can construct information that is very similar to the training data by taking advantage of the gradual course of model convergence. In FL, this can lead to information leakage to malicious clients without violating differential privacy. SL, in contrast, does not suffer from this problem, as none of its clients have complete knowledge of the deep learning model. That said, if the learning process involves only a small number of clients, severe information leakage is possible as information similarity is narrowed down to the local data of the small set of clients. Consequently, it is desirable to ensure a minimum number k of participating clients, where k is a system parameter. This provides further privacy protection and is known as k-anonymity.
A secure aggregation protocol is proposed in to achieve k-anonymity in FL. The protocol is built on the concept of secret sharing and runs between devices and the server. As k-anonymity relies on the server's involvement, the proposal may not ease device privacy concerns, especially if the devices do not trust the server. This anxiety is largely due to fear of concentrated power: the server is not only the primary entity of the learning that owns the learning model and oversees the entire learning process, it also knows which devices contributed to the learning. The work presented in this paper takes a systemic approach to address this anxiety and resolve the problem.
The work presented in this paper also relates to the shared machine learning system described in. This system protects data security and privacy using specialized hardware that provides trusted execution environments. However, it is not suitable for large-scale public systems such as telecommunications networks.
Privacy-preserving deep learning with decentralized data is tightly coupled with the telecommunications network when the training data sources are mobile terminal devices. It is envisioned that the 6G wireless system will go beyond connectivity provisioning to enable connected intelligence, i.e., distributed learning and inference. The 6G wireless system should therefore provide native support to this new AI computing paradigm.
In this paper, we first generalize SL by extending the cut layer definition so that it covers FL and centralized learning (CL) as special cases, and combine it with FL to obtain a two-level learning framework. The framework inherits the advantages of FL and SL, but not their drawbacks. We propose customizing the learning approach at the bottom level of the two-level framework by selecting proper cuts for the AI model. The cut layer selection considers a number of factors, and the goal is to balance the learning overhead on devices, on servers, and on the network. Such an optimization can result in a mix of local learning (at the client side), CL (at the server side), and SL (at both sides) appearing at the bottom level concurrently.
We then propose the 6G wireless system architecture, NET4AI, which has a service-oriented design. It supports the two-level learning framework and offers learning approach customization as a value-added service. With the NET4AI architecture, the system can provide end-to-end support to network-based AI applications, from deployment to operations. During the operation phase, the AI computing modules of AI applications and user devices (such as UEs) communicate anonymously to finish AI computing (for example, model training and model inference), as coordinated by the system. The system can further enforce k-anonymity for user privacy protection and apply proxy re-encryption to ensure data confidentiality.
The remainder of the paper is organized as follows: we describe important concepts and assumptions in Section 2; we present the two-level learning framework and the NET4AI architecture in Section 3; we identify a number of challenges associated with NET4AI in Section 4; and we offer our conclusions in Section 5.
In this section, we will describe our assumptions about the network infrastructure and introduce radio computing node (RCN), a radio access network (RAN) node equipped with edge computing capabilities. We will also introduce some AI computing-related concepts, including job, task, and routine. The networking logic of an AI computing service is expressed using these concepts.
Algorithm, data, and computing power are the main driving factors of AI innovation. Conventionally, AI computing power is provided by a centralized cloud platform, and data is pushed to the central cloud for processing. In the era of big data and AI, this introduces a number of issues relating to data privacy, latency, and efficiency, which trigger a paradigm shift in the other direction; i.e., bringing computing to data.
Edge computing (EC) is an approach to computing that reduces transmission delay and bandwidth consumption by moving computation and storage close to the network edge, and therefore to end users and data. As EC techniques are maturing, it is anticipated that EC capabilities in the form of edge clouds will be pervasively deployed in the network, for example, collocated with RAN nodes or base stations. When a RAN node or base station is equipped with EC capabilities, it is referred to as an RCN in this paper.
As the wireless network infrastructure offers both computing resources (such as CPU cycles, memory, storage, and I/O access) and communication resources (including radio resources and transport resources), it becomes possible to jointly optimize utilization of distributed and diversified infrastructure resources, and to deliver optimal end-to-end performance to end users. It is assumed that there are one or more entities in the network (for example, resource managers) performing infrastructure resource management.
An AI computing service can be an application-layer service. It can also be a network service that aims to optimize network management or operations.
The AI computing service includes one or more computing modules called service functions. The AI computing service is associated with well-defined computing logic and delivers computational results according to input data. This computing logic includes algorithm implementations for each of these service functions and interactions between them. The latter can be specified in terms of job, task, and routine, as defined below.
The AI computing service includes one or more independent jobs, each of which is focused on a computational goal and exposed to the service consumer. An execution can be triggered at the job level, upon request, on some events, or under certain conditions. When a job is being executed, the service consumer can, for example, join in or contribute to the execution by providing input data and/or by receiving computational results.
A job comprises one or more tasks that may have interdependencies. During the execution of a job, its tasks are executed in accordance with their interdependencies. A task that depends on another must be executed after the other task has been executed. Each task is associated with a service function chain (SFC) that comprises one or more routines, each of which corresponds to two adjacent service functions in the SFC: a routine client function and a routine server function. Within a routine, the routine client function can represent devices, implying that the computing logic of the function runs on the devices. However, the routine server function cannot represent devices.
Figure 2 Illustration of a service, jobs, tasks, routines, and service functions (F)
During the execution of a task, its routines are executed in sequence along the SFC. When a routine is being executed, its client and server functions are triggered to communicate with each other. Figure 2 illustrates the relationship between a service, jobs, tasks, routines, and service functions, where arrowed lines indicate dependency. Using the AI computing service or application as an example, model training and model inference can be two separate jobs of the AI computing service. The (model) training job can include a model training task and its dependent model validation task. The model training task can be associated with a three-function SFC, as illustrated in Figure 3. It includes two routines, which respectively correspond to bottom- and top-level learning in the two-level learning framework introduced later in Section 3.1.
Figure 3 Two-level learning framework
A service function, whether a routine client function or a routine server function, is regarded as a concrete service function if it does not represent devices. Alternatively, a service function representing devices is considered to be an abstract service function. When an AI computing service is deployed in the network, concrete service functions from the AI computing service are instantiated at network locations such as edge clouds. After a concrete service function is instantiated at a network location, an instance of the service function runs on an application server (AS) at the network location. Given a routine, we refer to an instance of the routine client function as a routine client, and an instance of the routine server function as a routine server. When the routine client function represents devices, these devices are considered routine clients.
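The relationships between jobs, tasks, routines, and service functions can be captured in a small data model. The sketch below is one possible encoding, not a NET4AI interface: the class and field names are assumptions, and the `model_validation` function and its two-function SFC are hypothetical placeholders, since the text does not describe the SFC of the validation task.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class ServiceFunction:
    name: str
    represents_devices: bool = False  # True means an abstract service function

    @property
    def is_concrete(self) -> bool:
        return not self.represents_devices


@dataclass
class Task:
    name: str
    sfc: List[ServiceFunction]  # service function chain
    depends_on: Set[str] = field(default_factory=set)

    def routines(self) -> List[Tuple[ServiceFunction, ServiceFunction]]:
        """Each routine pairs two adjacent SFC functions as (client, server)."""
        pairs = list(zip(self.sfc, self.sfc[1:]))
        for _, server in pairs:
            if server.represents_devices:
                raise ValueError("a routine server function cannot represent devices")
        return pairs


@dataclass
class Job:
    name: str
    tasks: Dict[str, Task]

    def execution_order(self) -> List[str]:
        """Order tasks so that every task runs after the tasks it depends on."""
        graph = {name: task.depends_on for name, task in self.tasks.items()}
        return list(TopologicalSorter(graph).static_order())


data_source = ServiceFunction("data_source", represents_devices=True)
local_training = ServiceFunction("local_training")
global_training = ServiceFunction("global_training")
validation = ServiceFunction("model_validation")  # hypothetical placeholder

training_task = Task("model_training",
                     [data_source, local_training, global_training])
validation_task = Task("model_validation",
                       [global_training, validation],  # assumed SFC
                       depends_on={"model_training"})
training_job = Job("training",
                   {t.name: t for t in (validation_task, training_task)})

order = training_job.execution_order()
routine_names = [(c.name, s.name) for c, s in training_task.routines()]
```

Here `routine_names` lists the bottom-level and top-level learning routines of the three-function SFC, and `order` places the training task before its dependent validation task.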
In this section, we will discuss how to offer native support for AI applications in 6G, particularly in regard to privacy-preserving deep learning. This includes a two-level learning framework, which allows learning approach customization, and a service-oriented system architecture, which provides an end-to-end solution to AI computing services. We will also discuss protocol design in RCNs — RAN nodes that provide radio-and-computing integrated resources — and show how the proposed solution works through a case study.
We generalize SL by extending the definition of cut layer to the point where FL and CL become two special cases of SL. In FL, each device has complete knowledge of the AI model and trains the model using its local dataset. FL can be viewed as SL applying a top cut, where the cut layer is above the output layer. On the other hand, CL requires devices to send raw training data to a server and learning happens purely on the server side. As such, CL can be viewed as SL with a bottom cut applied, where the cut layer is below the input layer. Traditional SL corresponds to cases where the AI model is partitioned by a middle cut. Bottom cut, middle cut, and top cut are illustrated in Figure 1.
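The generalized cut can be made concrete with a short sketch. Encoding the cut as the number of layers kept on the client side is an assumption made here for illustration; under that encoding, 0 is the bottom cut, the total layer count is the top cut, and any value in between is a middle cut. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_at_cut(layers: Sequence[str], cut: int) -> Tuple[List[str], List[str]]:
    """Partition a layer list into (client-side, server-side) components.

    `cut` is the number of layers that stay on the client side.
    """
    if not 0 <= cut <= len(layers):
        raise ValueError("cut must lie between 0 and the number of layers")
    return list(layers[:cut]), list(layers[cut:])


def learning_approach(num_layers: int, cut: int) -> str:
    """Name the learning approach that a given cut corresponds to."""
    if not 0 <= cut <= num_layers:
        raise ValueError("cut must lie between 0 and the number of layers")
    if cut == 0:
        return "CL"  # bottom cut: below the input layer, server holds the model
    if cut == num_layers:
        return "FL"  # top cut: above the output layer, client holds the model
    return "SL"      # middle cut: model is split between client and server


# The five-layer DNN of Figure 1, cut between layers 2 and 3.
dnn = ["input", "hidden_2", "hidden_3", "hidden_4", "output"]
client_part, server_part = split_at_cut(dnn, 2)
```

With this encoding, a single integer per device group is enough to express the learning approach used at the bottom level.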
We combine FL and the generalized SL to obtain a generic two-level learning framework. As shown in Figure 3, the framework can be expressed by a task composed of two routines: a bottom-level learning routine and a top-level learning routine. The generalized SL may be applied at the bottom level (the bottom-level learning routine) of the framework and FL at the top level (the top-level learning routine). The bottom-level learning routine runs between two service functions: a data source function and a local training function. The data source function represents devices, while the local training function can be instantiated at local ASs to train local AI models. These local ASs, for example, can be located on RCNs. The top-level learning routine runs between the local training function and a global training function, the latter of which can be instantiated at a global AS. The global AS can be located at an edge cloud relatively far from the RAN so that it can efficiently serve multiple local ASs (RCNs). Within this two-level learning framework, learning approach customization is realized through cut layer selection at the bottom level.
When a middle cut is selected for the bottom-level learning (routine), the two-level learning framework offers the advantages of both FL and SL. As the bottom-level learning is based on the combined datasets of multiple devices, which are less non-IID than a single device's dataset, the trained local AI models at the local ASs are more accurate than those trained by individual devices using their own dataset in FL. Generally, improved local model accuracy leads to accelerated convergence of the global model. As devices do not have complete knowledge of the AI model, information in the training data is not leaked to adversary devices as described in.
When the top cut is selected for the bottom level, local AI models are trained at individual devices, and the local training function receives local AI model parameters and sends them to the global training function in an aggregate form. In this case, the framework reduces to FL and suffers from the information leakage problem. When bottom-level learning applies the bottom cut, the framework does not offer differential privacy. The top cut and bottom cut are not recommended as far as privacy is concerned. However, they may be used due to other factors, as described below.
Devices associated with the same instance of the local training function for bottom-level learning should be assigned the same cut layer, so that they and the local training function instance exhibit consistent behavior during the learning. Under this constraint and considering model structure, device status (such as computing power and energy levels), server conditions (for example, AS loading), device locations, deployment locations of the local training function, and network conditions (for example, bandwidth or congestion), the bottom-level learning can apply a mixed cut to optimize device, server, and network performance at the same time. When a mixed cut is applied, a different cut layer can be selected for different groups of devices, with each group associated with a different local training function instance, as illustrated in Figure 4.
Figure 4 A mixture of different cuts applied to model training
Adhering to the concepts of job, task, and routine, we propose a 6G wireless system architecture to support the two-level learning framework. This architecture is known as NET4AI. A 6G wireless system using this architecture is referred to as a NET4AI system, and such a system can offer NET4AI services to its customers, providing end-to-end support of AI computing services. The NET4AI architecture operates under the assumption that service functions can be located at or run on devices or edge clouds. Figure 5 illustrates the NET4AI architecture.
The NET4AI architecture includes a control plane (CP) and a compute plane (CmP). The control plane manages AI computing services and controls the execution of jobs and tasks for those services, in addition to providing traditional device-related management functionalities. It includes a number of CP entities, such as a service manager, orchestrator, resource manager, access manager, job manager, and task manager. Depending on implementation, some of these control plane entities can be merged; for example, the job manager can be merged into the service manager. Furthermore, and also depending on implementation, the control plane can span the RAN and the core network segments of the system, or dedicate itself to just one of them. The compute plane controls executions of routines for an AI computing service and supports data communication between service functions. It comprises one or more routine managers, as well as a forwarding sub-plane, which can be simply called a forwarding plane (FP). The forwarding plane can correspond to the user plane (UP) in 3GPP 5G system architecture, and includes one or more forwarding plane functions (FPFs) and RAN nodes (specifically, the user plane part of RAN nodes). Note that each FPF can be integrated with a RAN node.
The links between entities in Figure 5 indicate the interfaces they use to communicate with each other and can be defined or created at a per-service granularity. Special attention should be paid to the T2 interface. When the FPF is separate from the RAN node, as illustrated by scenario 1 in Figure 5, the T2 interface corresponds to the RAN node and maps to a radio bearer. When the FPF is integrated with the RAN node, as illustrated by scenario 2 in Figure 5, the RAN node implements the FPF's functionality. In this case, the interface T2 corresponds to the device and can be supported by a radio bearer. In either scenario, the radio bearer can be shared among multiple devices, such as a computing radio bearer (CRB), as described in Section 3.3. Note that in scenario 2, the T4 interface becomes integral to the RAN node in cases where the RAN node integrates with the edge cloud (such as when the RAN node is an RCN).
Figure 5 NET4AI architecture
An authorized application controller (AC) can register an AI computing service with the NET4AI system. The AC belongs to the service provider, and is responsible for managing the AI computing service. During registration, the computing service is instantiated in the system. Every concrete service function of the AI computing service is instantiated at one or more network locations (such as edge clouds). After registration, the NET4AI system supports the AI computing service's operations by coordinating executions of routines, tasks, and jobs of the AI computing service. We describe service instantiation and service operations in detail below.
The AC registers the AI computing service with the NET4AI system by sending information describing service functions, routines, tasks, and jobs of the AI computing service to a service manager, which interacts with the orchestrator to instantiate the AI computing service accordingly. During this process, the orchestrator determines the deployment of the AI computing service and selects an appropriate compute plane.
The deployment decision includes locations of instances of concrete service functions and the resources needed for each of the service function instances. These service function instances include routine servers and possibly routine clients of individual routines of the AI computing service. The deployment decision also includes logical links between routine clients and routine servers for each of the routines. In cases where there is a logical link between a routine client and a routine server, they can communicate with each other via the compute plane during the execution of the routine.
The routine client is connected to an FPF in the compute plane via a T2 or T4 interface, while the routine server is also connected to an FPF in the compute plane via a T4 interface. The two FPFs are either the same entity, or different entities interconnected via a T8 interface. The routine client and the routine server are assigned to the same routine manager in the compute plane. The routine manager manages the execution of the routine with respect to the routine client and the routine server by triggering data communication between them.
The orchestrator implements the deployment decision using the resource manager. The orchestrator informs the service manager about the compute plane selection result, and the service manager configures the compute plane accordingly.
The AI computing service comprises a job that includes a task for training an AI model using the two-level learning framework described in Section 3.1. We refer to this task as a model training task, and it involves a bottom-level learning routine and a top-level learning routine, as illustrated in Figure 3. The service functions involved in the model training task include a data source function, local training function, and global training function. The data source function is the routine client function of the bottom-level learning routine. It represents devices and acts as a source of training data. The local training function is the routine server function of the bottom-level learning routine. We will now elaborate on how the NET4AI system supports service operations related to the job.
The AC requests execution of the job by sending a job order to the service manager, which then informs a selected job manager to execute the job. As described earlier, the service manager and job manager can be combined in some implementations.
When executing the job, the job manager selects a task manager for each of the tasks, including the model training task, within the job, and triggers the task manager(s) to execute the tasks in accordance with their interdependencies. When executing a task, a task manager identifies related routine manager(s) in the compute plane, which correspond to the routines within the task. The task manager requests the routine manager(s) to execute the routines in accordance with their interdependencies.
If the routine client function (for example, the data source function) of a routine represents devices, the task manager selects these devices as routine clients and assigns each of them to a routine manager. The devices are selected among those that are allowed to access the AI computing service, have registered, and have given their consent (to contributing to the job). A device can register to the NET4AI system and provide its consent via the access manager, which may have functionalities similar to those of the access and mobility management function (AMF) in the 3GPP 5G system architecture.
When executing the model training task, the corresponding task manager performs cut layer selection for the bottom-level learning routine, as described in Section 3.1, to customize the learning approach for the AI model. The cut layer selection can be performed jointly with the routine client selection described above. The task manager informs the routine manager about the cut layer selection result.
The NET4AI compute plane is deeply involved in the execution of each routine of the AI computing service. It is responsible for coordinating the participation of routine clients and routine servers in the routine execution, enforcing k-anonymity, and enabling anonymous data communication between the routine clients and the routine servers, as described below.
In the compute plane, a routine manager is assigned routine clients and routine servers for the corresponding routine, and each routine server is associated with one or more routine clients. The server-client association is determined by the orchestrator during service instantiation if the routine client function is a concrete service function (as described in Section 3.2.1), and dynamically by the task manager and/or the routine manager in all other cases. The server-client association is not known at the application layer, which includes routine clients, routine servers, and the AC.
When executing the routine, the routine manager triggers or invites the routine clients associated with a routine server into a data communication with the routine server. This communication is mutually anonymous as the communicating parties do not know about each other. The routine manager first invites the routine server via the forwarding plane. During this step, the routine manager can provide cut layer information to the routine server, if applicable. It then invites the routine clients, and these invitations can be sent via the access manager (for example, when the routine client is a device) or via the forwarding plane.
During data communication, data traffic is routed between a routine client and the routine server by the forwarding plane. Either the routine client or the routine server can notify the routine manager when it finishes communication, so that the routine manager can proceed to the next step, for example, inviting the next routine client to communicate, or notifying the task manager that the routine execution has been completed.
Figure 6 illustrates a routine execution procedure in accordance with the above description. In step 6, the routine server can request to restart the data communication or submit a notification regarding its completion. In the case of a restart request, the routine manager repeats steps 3–6 in step 7. If all routine servers have sent a completion notification, the routine manager notifies the task manager in step 8 that the execution of the routine has been completed.
Figure 6 Routine execution procedure, with respect to a routine server
Note that step 5 can be performed in parallel with steps 3 and 4, unless the routine requires sequential communication. For example, the bottom-level learning routine may require sequential communication if a middle cut is selected for it. In this case, when the routine server is associated with multiple routine clients, the routine manager invites only one such client at a time to communicate with the routine server. When inviting a routine client, the routine manager can provide it with cut layer information.
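The control flow described above can be sketched in a few lines of Python. This is only an illustrative, in-memory model under stated assumptions: the class `RoutineServer`, the function `execute_routine`, the `rounds_needed` counter, and the event names in the log are all hypothetical and do not come from the NET4AI design. Servers are processed one after another for simplicity, and clients are invited one at a time, as in the sequential-communication case.

```python
from dataclasses import dataclass, field


@dataclass
class RoutineServer:
    name: str
    rounds_needed: int = 1            # hypothetical: rounds before completion
    rounds_done: int = 0
    received: list = field(default_factory=list)

    def receive(self, payload):
        # The server sees only the payload, never the client identity.
        self.received.append(payload)

    def finish_round(self):
        # Mirrors the server's choice: request a restart or report completion.
        self.rounds_done += 1
        return "complete" if self.rounds_done >= self.rounds_needed else "restart"


def execute_routine(associations, cut_layer=None):
    """associations: list of (RoutineServer, [client ids]) pairs."""
    log = []
    for server, clients in associations:
        status = "restart"
        while status == "restart":
            # Invite the server first, optionally with cut layer information.
            log.append(("invite_server", server.name, cut_layer))
            for client in clients:    # one client at a time (sequential case)
                log.append(("invite_client", client, cut_layer))
                server.receive(f"payload-{len(server.received)}")
            status = server.finish_round()
    # All servers have reported completion.
    log.append(("notify_task_manager", "routine_complete"))
    return log
```

Only the routine manager (here, `execute_routine`) knows which client talks to which server; the server object never receives a client identifier, which loosely mirrors the mutual anonymity of the data communication.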
During data communication, there may be a strong requirement for user privacy in cases where the routine client is a device. To address such privacy concerns, k-anonymity can be enforced in the compute plane, where the value of k is a system parameter or a requirement from the AC. Assume that the routine manager is assigned m – 1 other routine servers during service instantiation, that is, m routine servers in total. The task manager assigns at least m*k routine clients to the routine manager, which then associates at least k routine clients with each routine server.
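The counting argument can be illustrated with a short sketch. The function name `associate_clients` and the round-robin policy are assumptions made for illustration; the text only requires that each routine server ends up with at least k routine clients, not any particular assignment strategy.

```python
def associate_clients(clients, servers, k):
    """Associate clients with servers so that every server gets at least k.

    Round-robin is a hypothetical policy; any assignment that keeps each
    group at size >= k would satisfy the k-anonymity requirement.
    """
    m = len(servers)
    if len(clients) < m * k:
        raise ValueError("need at least m*k routine clients for k-anonymity")
    groups = {server: [] for server in servers}
    for index, client in enumerate(clients):
        groups[servers[index % m]].append(client)
    return groups
```

With at least m*k clients spread round-robin over m servers, every group has at least k members, so a routine server cannot narrow a contribution down to fewer than k candidate devices.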
There may also be a data confidentiality requirement for data communication. As communicating parties do not know about each other, end-to-end encryption is not applicable in this setting. Instead, data confidentiality can be achieved through proxy re-encryption in the forwarding plane (i.e., FPFs). That is, data originating from a sender is in ciphertext and forwarded by the forwarding plane without decryption. Before the data is forwarded, it is re-encrypted using a re-encryption key, enabling the receiver to recover the original data (plaintext) using their own private key. The re-encryption key can be obtained by the routine manager in step 2 or 3 and configured into the forwarding plane before data communication begins.
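To make the re-encryption step concrete, the following toy sketch uses an ElGamal-style scheme in the spirit of the classic Blaze-Bleumer-Strauss construction. This scheme is chosen here purely for illustration; the paper does not specify which proxy re-encryption scheme NET4AI would use. The group parameters are deliberately tiny and insecure, and deriving the re-encryption key from both private keys is a simplification of this particular toy scheme. All function names are hypothetical.

```python
import secrets

# Toy parameters (insecure): P = 2*Q + 1 is a safe prime, G has order Q.
P, Q, G = 2039, 1019, 4


def keygen():
    sk = secrets.randbelow(Q - 1) + 1          # private key in [1, Q-1]
    return sk, pow(G, sk, P)                   # (private, public)


def encrypt(pk, message):
    """Encrypt an integer message in [1, P-1] under public key pk."""
    r = secrets.randbelow(Q - 1) + 1
    return (message * pow(G, r, P)) % P, pow(pk, r, P)


def rekey(sk_sender, sk_receiver):
    # Re-encryption key b/a mod Q (toy scheme: needs both private keys).
    return (sk_receiver * pow(sk_sender, -1, Q)) % Q


def reencrypt(rk, ciphertext):
    # Performed by the forwarding plane; the plaintext is never exposed.
    c1, c2 = ciphertext
    return c1, pow(c2, rk, P)


def decrypt(sk, ciphertext):
    c1, c2 = ciphertext
    g_r = pow(c2, pow(sk, -1, Q), P)           # recover G^r
    return (c1 * pow(g_r, -1, P)) % P
```

In this sketch, the sender calls `encrypt` with its own public key, the forwarding plane applies `reencrypt` with the configured key, and the receiver calls `decrypt` with its own private key, without either endpoint learning the identity or key of the other.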
The radio interface at a RAN node includes three protocol layers: layer 1 — physical layer, layer 2 — data link layer, and layer 3 — network layer. The NET4AI architecture focuses on protocols at layers 2 and 3.
A radio bearer is a layer 2 logical channel. It bridges layer 1 and layer 3 to support the transfer of user or control data. Legacy radio bearer management assumes that the RAN node is a network access point and that computing happens on the other side of the network. It may not be efficient or suitable for RCNs, where communication and computing are integrated to allow computing within the network. Consequently, the NET4AI compute plane introduces a CRB at layer 2 to enable RCNs to distinguish between data at the computing and user planes for self-loop computing services within RCNs. A CRB connects a UE served by an RCN and one or more service functions deployed on the RCN; in this sense, it provides the T2 interface functionality shown in Figure 5. A deep edge protocol (DEP) is a new simplified protocol, which is proposed at layer 3 to enable efficient exchange of data between the UE and service functions. The core reason for introducing DEP is to enable service functions to be deployed in a wireless network like the RAN. As a result, data transmission protocols can be simplified to a greater extent than through the use of traditional cloud deployments.
Figure 7 Illustration of CRB and DEP
Figure 7 illustrates CRB and DEP, along with 5G layer 2 radio bearers and layer 3 protocols. In the figure, the AMF and the user plane function (UPF) are 5G network functions, which respectively provide some of the access manager and FPF functionalities in the NET4AI architecture. The compute plane function (CPF) is a module within the RCN that implements the FPF's functionality. These radio bearers and protocols can all be present on an RCN to support different scenarios or needs. For instance, data radio bearers (DRBs) and Service Data Adaptation Protocol (SDAP) can be used to support UE connection to a service function that is not deployed on the RCN.
Assume that a UE is served by an RCN. When a UE accesses a service function deployed on the RCN for computing, the RCN allocates a CRB to the UE and the CRB connects the UE to the service function. An example of a basic procedure is as follows: The UE sends a computing request to the control plane of the NET4AI system via the RCN. When the NET4AI control plane notifies the UE that the request has been accepted, the NET4AI control plane also notifies the RCN to establish a DEP session for the UE. The RCN (the control plane part) will then allocate a DEP session to the UE accordingly. The DEP session maps to a CRB, which can be either newly created or an existing one. The RCN then transmits the CRB parameters to the UE over radio resource control (RRC) signaling, enabling the UE and service function to exchange data.
As illustrated in Figure 8, a DEP session may map to one or more CRBs, and a CRB can support one or more DEP sessions. Every CRB is configured with specific QoS capabilities. When a DEP session is mapped to multiple CRBs, these CRBs are configured with different QoS capabilities. Consequently, the DEP session supports multiple QoS flows via the corresponding CRBs. If a DEP session is mapped to a single CRB, the DEP session supports only one QoS flow.
Figure 8 Mapping between CRBs and DEP sessions
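The many-to-many mapping between DEP sessions and CRBs can be modeled with a small data structure. The class names, field names, and QoS labels below are hypothetical and serve only to illustrate the constraints stated in the text: a session mapped to several CRBs uses CRBs with different QoS capabilities, and one CRB may serve several sessions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CRB:
    crb_id: int
    qos: str                      # hypothetical QoS label, e.g. "low-latency"


@dataclass
class DEPSession:
    session_id: int
    crbs: list = field(default_factory=list)

    def map_to(self, crb):
        # CRBs mapped to the same DEP session must differ in QoS capability.
        if any(existing.qos == crb.qos for existing in self.crbs):
            raise ValueError("session already has a CRB with this QoS")
        self.crbs.append(crb)

    def qos_flows(self):
        # One QoS flow per mapped CRB.
        return [crb.qos for crb in self.crbs]
```

For example, a session mapped to two CRBs with labels "low-latency" and "best-effort" supports two QoS flows, while a second session can reuse the "best-effort" CRB and supports a single flow.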
A healthcare institute is building a blood pressure model. The model is a DNN-based AI model designed to capture blood pressure ranges during various times of the day for different age groups in connection with certain geographic regions. The healthcare institute wants to bring the model online for a large population of users to contribute to model training. The healthcare institute registers an AI computing service with the NET4AI system to achieve this goal.
When registering the AI computing service, the healthcare institute provides information about the AI model, including the number of layers, number of neurons per layer, and number of links between every two adjacent layers. The AI computing service includes two concrete service functions: a local training function and a global training function. The local training function supports CL and SL for the AI model, while the global training function implements FL model aggregation logic.
The AI computing service includes a model buildup job, which in turn includes a model training task. The model training task includes a bottom-level learning routine and a top-level learning routine, as shown in Figure 3. The bottom-level learning routine is associated with the data source function, which represents devices, and the local training function. Meanwhile, the top-level learning routine is associated with the local training function and the global training function. The model buildup job can also include a model verification task intended to be executed after the model training task. For simplicity, we ignore the model verification task in this case study.
According to information about the AI computing service received from the healthcare institute, the NET4AI system deploys the AI computing service in the network (for example, at edge clouds). The deployment includes an instance of the global training function and multiple instances of the local training function. Each of these service function instances is attached to the NET4AI system via an attachment point, which is either an FPF or an RCN (when the service function instance is deployed on the RCN).
After the AI computing service is deployed, the healthcare institute requests the NET4AI system to execute the model buildup job. According to the request, the NET4AI system executes the model buildup job by notifying the instances of the local training function to execute the bottom-level learning routine. As this routine requires devices to participate, execution does not start until such participation begins.
Once a device connects to the network, the NET4AI system informs the device about the model buildup job of the AI computing service. The device may then volunteer to contribute to the model buildup job and provide its consent to the NET4AI system. At the same time, the device can provide information about its status (for example, computing power and energy levels) and its privacy requirements, the latter of which indicate whether the device is willing to provide raw data for the model buildup job.
Based on the information received from devices as well as other information (such as network conditions), the NET4AI system divides consenting devices into multiple groups. The NET4AI system selects a cut layer for each of the groups for the bottom-level learning routine, and associates each group of devices with a service function instance, which is either an instance of the local training function or the instance of the global training function. For example, as illustrated in Figure 4, where the three ellipses represent three groups of devices, the NET4AI system selects the bottom cut for a group, a middle cut for another, and the top cut for the remainder. It associates the first two groups respectively with two instances of the local training function, and the third group with the instance of the global training function.
The NET4AI system informs the devices within each of the groups about the cut layer selection result, and connects the devices to the attachment point of the corresponding service function instance. Devices that are assigned with a top cut or a middle cut further obtain the local training function from the NET4AI system (configured with the cut layer) and run it locally.
When sufficient devices are available (at least k devices within a group), the NET4AI system invites these devices into the execution of the bottom-level learning routine. These devices can then send data to the corresponding instance of the local training function via the NET4AI system. During data communication, the devices and the local training function instance do not know about each other. The data sent from the devices depends on the assigned cut: it includes blood pressure readings and user age information for the bottom cut, intermediate results as described in SL for a middle cut, or local model parameters as in FL for the top cut.
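The dependence of the device payload on the selected cut can be illustrated with a toy fully connected network written in plain Python. The function names, the dictionary layout of the payload, and the example weights are assumptions for illustration only; in particular, the top-cut branch simply returns the current weights, whereas a real device would first train them on its local data.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def forward(layers, x, upto):
    """Run the first `upto` layers; each layer is a weight matrix (list of rows)."""
    for weights in layers[:upto]:
        x = relu([sum(w * xi for w, xi in zip(row, x)) for row in weights])
    return x


def device_payload(cut, layers, sample, cut_layer=None):
    if cut == "bottom":
        # Raw data leaves the device (CL-style training on the network side).
        return {"kind": "raw_data", "value": sample}
    if cut == "middle":
        # Only activations at the cut layer leave the device (SL).
        return {"kind": "intermediate", "value": forward(layers, sample, cut_layer)}
    if cut == "top":
        # Only model parameters leave the device (FL); local training omitted.
        return {"kind": "model_parameters", "value": layers}
    raise ValueError(f"unknown cut: {cut}")
```

The higher the cut, the less the network learns about the raw readings: the bottom cut exposes the sample itself, a middle cut exposes only layer activations, and the top cut exposes only parameters.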
When the instances of the local training function that are associated with devices finish computing (when local AI models are established), the NET4AI system knows that the execution of the bottom-level learning routine has finished, and notifies the local training function instances and global training function instance to execute the top-level learning routine. The model buildup job can be executed this way in rounds, until the global training function instance notifies the NET4AI system to stop, for example, when the global model converges.
The global training function instance can provide the model parameters to the healthcare institute via the NET4AI system. The healthcare institute then considers the blood pressure model to be successfully trained.
To support the vision of inclusive intelligence, 6G networks need to consider native AI design instead of overlay design at an architectural level. This introduces new technical challenges beyond traditional connectivity issues, such as data privacy, heterogeneous resources, and energy saving. Of particular concern is the complexity of the wireless edge environment, which arises from unstable wireless connections, large-scale distribution, and the heterogeneity of edge resources. The following issues require further study:
NET4AI may need to support large-scale distributed training within 6G networks. Data communication will be increased significantly for model and parameter synchronization, which results in severe energy consumption challenges when combined with rising computing costs.
Considering the training process, there are generally two ways to reduce communication overheads. The first is to reduce the amount of data exchanged per round, and includes model compression methods such as quantization, sparsification, and knowledge distillation. The second is to reduce the number of communication rounds. For example, FedAvg performs multiple rounds of local updates before aggregation. Further investigation is required to reach a compromise between communication overheads and AI training performance.
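Both approaches can be sketched briefly. The first function below is a simple uniform quantizer that shrinks each parameter to a small integer code; the second is the sample-weighted averaging step used by FedAvg after clients have run their local updates. The function names, the 8-bit default, and the flat-list parameter format are illustrative assumptions rather than part of NET4AI.

```python
def quantize(params, bits=8):
    """Uniformly quantize floats to integer codes in [0, 2**bits - 1]."""
    lo, hi = min(params), max(params)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels or 1.0      # avoid a zero scale for constant input
    codes = [round((p - lo) / scale) for p in params]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    return [lo + code * scale for code in codes]


def fedavg(updates):
    """Aggregate client models, weighted by local sample counts.

    updates: list of (num_samples, params), params being a flat list of floats.
    """
    total = sum(n for n, _ in updates)
    dim = len(updates[0][1])
    return [sum(n * params[i] for n, params in updates) / total for i in range(dim)]
```

Quantization trades a bounded reconstruction error (at most half a quantization step per parameter) for fewer bits per round, while running more local updates before each call to `fedavg` trades fewer communication rounds for potentially slower or less stable convergence.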
Topological structure design for communication networks is another effective method for improving communication efficiency. For example, the Ring Allreduce solution in high-performance computing (HPC) may reduce the communication bandwidth required. However, the topology of wireless networks is not as flexible as that of IoT servers, and applying such mature HPC technologies to wireless networks still requires considerable research.
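As an illustration of the Ring Allreduce idea, the sketch below simulates the two phases (scatter-reduce, then all-gather) for n nodes arranged in a ring, with each node's vector split into n one-element chunks. It is a simplified, single-process simulation: the function name and the one-scalar-per-chunk layout are assumptions, and real implementations exchange the messages concurrently over the network.

```python
def ring_allreduce(vectors):
    """Sum n vectors of length n across n ring nodes; returns (results, messages)."""
    n = len(vectors)
    if any(len(v) != n for v in vectors):
        raise ValueError("this sketch expects n vectors of length n")
    data = [list(v) for v in vectors]
    messages = 0
    # Phase 1: scatter-reduce. In step s, node i sends chunk (i - s) mod n
    # to its right neighbor, which adds it to its own copy of that chunk.
    for s in range(n - 1):
        sends = [(i, (i - s) % n, data[i][(i - s) % n]) for i in range(n)]
        for i, chunk, value in sends:
            data[(i + 1) % n][chunk] += value
            messages += 1
    # Phase 2: all-gather. Node i now owns the fully reduced chunk (i + 1) mod n
    # and forwards completed chunks around the ring.
    for s in range(n - 1):
        sends = [(i, (i + 1 - s) % n, data[i][(i + 1 - s) % n]) for i in range(n)]
        for i, chunk, value in sends:
            data[(i + 1) % n][chunk] = value
            messages += 1
    return data, messages
```

Each node sends 2(n - 1) chunks, each 1/n of the vector, so the volume sent per node stays below twice the vector size regardless of n, instead of concentrating all traffic on a single aggregation point.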
Distributed learning systems may consider certain fault-tolerance mechanisms, such as classical Byzantine fault tolerance. However, this becomes more challenging in wireless networks, partly because resources, including connections, can change dynamically in wireless network environments. For example, an AI training task on a base station may be blocked by extreme burst traffic, or terminal devices may suddenly drop out due to unstable wireless connections. As a result, further research may be needed on mechanisms, such as ignoring failed nodes or performing redundant backups, that natively adapt to dynamic wireless environments.
To achieve NET4AI, we need to manage and schedule large-scale heterogeneous resources within a 6G network. However, designing an efficient distributed scheduling framework and algorithm is a challenging prospect.
To properly and efficiently deploy AI tasks to a wireless network, we must first perform general modeling for various heterogeneous resources, including computing power, memory, storage, and communication bandwidth. After that, the state and action spaces of resource scheduling can be defined, and then the distributed scheduling framework and algorithm can finally be designed. The scheduling algorithm needs to consider how to efficiently allocate tasks among large-scale heterogeneous resources, which is in general an NP-hard problem. Computational complexity increases greatly with scale.
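Because exact allocation is intractable at scale, practical schedulers typically fall back on heuristics. The sketch below shows one common greedy heuristic (largest task first, placed on the node that would finish it earliest) as a baseline illustration. It is not the scheduling algorithm of NET4AI, which the paper leaves open, and it models only computing speed, ignoring memory, storage, and bandwidth. The function name and the input format are assumptions.

```python
def greedy_schedule(tasks, speeds):
    """Assign tasks to heterogeneous nodes with a greedy heuristic.

    tasks:  dict of task name -> amount of work
    speeds: dict of node name -> work units processed per unit time
    Returns (plan, makespan); the result is not guaranteed to be optimal.
    """
    finish = {node: 0.0 for node in speeds}
    plan = {node: [] for node in speeds}
    # Place the largest tasks first.
    for name, work in sorted(tasks.items(), key=lambda item: -item[1]):
        # Pick the node on which this task would complete earliest.
        best = min(speeds, key=lambda node: finish[node] + work / speeds[node])
        finish[best] += work / speeds[best]
        plan[best].append(name)
    return plan, max(finish.values())
```

For instance, with tasks of sizes 8, 4, and 4 and two nodes of speeds 2.0 and 1.0, the heuristic places the largest and one small task on the fast node and the remaining task on the slow node.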
Huge amounts of data (such as sensing data) may be generated, processed, and consumed in 6G networks to drive network intelligence and data sharing and to improve network operation efficiency. However, this introduces challenges for 6G data services, including how to fully exploit data value while also ensuring data security and natively complying with data regulations such as the General Data Protection Regulation (GDPR) in the European Union (EU) and the European Economic Area (EEA) from architectural perspectives.
NET4AI needs to ensure that users have full control of their personal data and can decide whether to share, monetize, or offer the data for training. Certain standalone data protection solutions, such as k-anonymity, l-diversity, t-closeness, and differential privacy, may not be enough. Constructing a complete architecture-level data service framework with a transparent multi-party mechanism is a key challenge for 6G.
In this paper, we proposed NET4AI, a 6G system architecture intended to support future computing services, AI computing services in particular. NET4AI offers end-to-end support to AI computing services, from the deployment phase to the operation phase. It addresses the emerging problem of privacy-aware deep learning and allows for learning approach customization. It can enforce k-anonymity and ensure data confidentiality in the compute plane, offering strong privacy protection. We discussed protocol design at layers 2 and 3 of RCNs, i.e., RAN nodes equipped with edge computing capabilities, and we introduced CRB and DEP to support efficient combined communication and computation on RCNs. We also identified a number of technical challenges associated with NET4AI. However, development of the NET4AI architecture is still in its initial stages. The details of actual implementations, for example, mobility and connection management with respect to the execution of routines, tasks, jobs, and other system procedures, are yet to be developed.