Future Technologies
LLM Application in Wireless Communication Knowledge Management
This paper details the use of RAG to design a Q&A solution for wireless communication knowledge bases and an accompanying evaluation solution.

By Huawei Wireless Technology Lab: Hongwei Hou, Chixiang Ma, Lihong Du, Junhui Li
1 Development of LLMs
In 2005, the use of large n-gram models in machine translation marked the beginning of large language models (LLMs). In 2017, the Transformer network structure was introduced, which redefined natural language processing (NLP) by incorporating an attention mechanism that significantly improved model performance across multiple tasks. The introduction of Bidirectional Encoder Representations from Transformers (BERT) models in 2018 and 2019 further advanced the development of pre-trained language models (PLMs). BERT effectively utilizes context information from both the left and the right through a bidirectional encoder, achieving state-of-the-art (SOTA) performance on multiple NLP tasks. RoBERTa, an advanced edition of BERT, further improves model performance by adjusting hyperparameters and enlarging the training data.
The launch of GPT-3 in 2020 marked an important milestone in LLM development. GPT-3 enhances an LLM's generalization and few-shot learning capabilities by simply increasing the model size. Additionally, GPT-3 excels in text generation, producing samples of news articles that are indistinguishable from human works.
In recent years, LLMs have been increasingly used for multimodal tasks, such as image + text hybrid tasks, in addition to conventional text processing tasks. With the rapid advancement of technology, LLMs face new challenges and research interests in terms of adapting to the ever-changing knowledge in real-world applications through knowledge updates.
The evolution of LLMs involves algorithmic and architectural innovation, as well as advanced research on model training, evaluation, and application, transitioning from simple statistical models to complex neural network models and large pre-trained models. LLMs are expected to advance toward greater explainability, improved efficiency, and optimal integration and processing of multiple data types.
2 Essential LLM Technologies in Knowledge Management
LLMs have shown great potential in the field of knowledge management due to their advanced capabilities. However, they also face several significant challenges. First, they are trained using general-purpose data from the Internet to maximize accessibility and applicability. This lack of professional data in the training process leads to suboptimal LLM performance in professional fields. Second, LLMs often generate seemingly convincing but inaccurate responses, known as hallucinations.
To address these challenges, the industry has developed two common solutions: fine-tuning and retrieval augmented generation (RAG).
2.1 Fine-Tuning
Fine-tuning is a machine learning technology that involves using a small volume of task-specific data to retrain a pre-trained LLM for a new or specific application scenario. This process involves adding one or more output layers to the pre-trained model and using a dataset designed for the task to retrain the model, enabling it to better understand and execute the specific task. Fine-tuning leverages the general knowledge learned by the pre-trained model as a starting point, eliminating the need to train a model from scratch, which can be computationally expensive and time-consuming. BERT is a prime example of fine-tuning. It is pre-trained on a large amount of text data first and then fine-tuned for specific NLP tasks, resulting in significant performance improvements.
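To make the pattern concrete, the following minimal sketch fine-tunes a pre-trained BERT checkpoint for a generic text-classification task with the Hugging Face Transformers library; the checkpoint, dataset, and hyperparameters are illustrative placeholders rather than part of the solution described later.

```python
# Minimal fine-tuning sketch: add a classification head to a pre-trained BERT
# encoder and retrain it on a small task-specific dataset (illustrative only).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# AutoModelForSequenceClassification puts a randomly initialized output layer
# on top of the pre-trained encoder.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

dataset = load_dataset("imdb")                      # placeholder task data
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")
dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-finetuned", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=dataset["test"].select(range(500)),
)
trainer.train()                                     # retrain only for the new task
```

Because the encoder weights start from the pre-trained checkpoint, only a small task-specific dataset and a short training run are needed.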
2.2 RAG
RAG is an innovative approach that combines pre-trained parameterized memory, like LLMs, with non-parameterized memory, such as dense vector indexes from Wikipedia. It dynamically retrieves information from external knowledge resources in language generation tasks, improving the accuracy, diversity, and factuality of the generated content. A typical RAG model includes an LLM as parameterized memory and a retriever that accesses non-parameterized memory such as dense vector indexes.
2.3 Advantages and Disadvantages of RAG and Fine-Tuning
We compare RAG and fine-tuning from six dimensions: dynamic data, external knowledge, model customization, reducing hallucinations, transparency, and technical expertise.
Table 1 Advantages and disadvantages of RAG and fine-tuning

Because RAG demonstrates superior performance in five of the six dimensions, we used RAG to improve the performance of LLMs in wireless communication knowledge management.
3 Solution
3.1 Q&A Solution Design for Wireless Communication Knowledge Bases
The solution comprises two parts: offline construction of wireless communication knowledge bases and online question and answer (Q&A).
3.1.1 Offline Construction of Wireless Communication Knowledge Bases
Figure 1 illustrates the process of offline construction of wireless communication knowledge bases. Initially, users upload different types of documents, such as code files and 3GPP protocols. The uploaded documents undergo parsing, cleaning, and slicing, and can optionally be sent to the LLM to generate Q&A pairs for each slice. Vector indexes and keyword indexes are then created for the Q&A pairs and raw slice data, and are stored in a vector database and a common database, respectively. The creation of vector indexes involves an embedding model.
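The offline pipeline can be approximated in a few lines of Python. The sketch below is a simplified illustration rather than the production implementation: it slices a document with a fixed character window, embeds the slices with a placeholder sentence-transformers model, and writes them to Chroma (the vector database used in Section 3.2.2); the file name and model name are hypothetical, and the keyword index and optional LLM-generated Q&A pairs are only noted in comments.

```python
# Simplified offline construction sketch: parsing/cleaning is assumed done,
# slices are embedded and written to a Chroma collection.
import chromadb
from sentence_transformers import SentenceTransformer

def slice_document(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-window slicing; real pipelines slice by document structure."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

embedder = SentenceTransformer("BAAI/bge-large-zh-v1.5")   # placeholder embedding model
client = chromadb.PersistentClient(path="./kb")
collection = client.get_or_create_collection("wireless_kb")

# Hypothetical uploaded document (e.g., a parsed 3GPP protocol file).
documents = {"3gpp_doc.txt": open("3gpp_doc.txt", encoding="utf-8").read()}
for name, text in documents.items():
    slices = slice_document(text)
    vectors = embedder.encode(slices).tolist()              # vector index
    collection.add(
        ids=[f"{name}-{i}" for i in range(len(slices))],
        documents=slices,
        embeddings=vectors,
        metadatas=[{"source": name}] * len(slices),
    )
# An optional step (not shown) sends each slice to the LLM to generate Q&A
# pairs, which are indexed the same way; keyword indexes go to a common database.
```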
3.1.2 Online Q&A
Figure 1 also illustrates the online Q&A process. A user inputs a question, and the LLM optionally recognizes the user's intent. Based on the intent, the LLM selects the relevant wireless communication knowledge databases or a handling process. Hybrid retrieval is then performed to recall the top K knowledge segments most relevant to the question from the selected databases. These knowledge segments are ranked using a rerank model based on the question to obtain the N most relevant knowledge segments. The question and the N knowledge segments are then organized according to a prompt template and sent to the LLM. The LLM provides an answer based on the input question and the relevant knowledge segments found in the wireless communication knowledge databases.

Figure 1 Diagram of using LLMs in wireless communication knowledge management
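The online flow can be sketched as follows. This is an illustrative outline, not the deployed code: hybrid_retrieve() and rerank() are hypothetical helpers sketched in the next two subsections, the prompt wording only approximates the template in Figure 4, and the LLM is assumed to be served through vLLM's OpenAI-compatible endpoint.

```python
# Online Q&A glue (illustrative): recall top-K segments, rerank to top-N,
# fill a prompt template, and ask the LLM. hybrid_retrieve() and rerank()
# are hypothetical helpers sketched in the next two subsections.
from openai import OpenAI

llm = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed vLLM endpoint

PROMPT_TEMPLATE = (
    "Answer the question using only the knowledge segments below.\n"
    "If the segments are insufficient, say you do not know.\n\n"
    "Knowledge segments:\n{context}\n\nQuestion: {question}\nAnswer:"
)

def answer(question: str, k: int = 20, n: int = 5) -> str:
    candidates = hybrid_retrieve(question, k=k)          # top-K recall
    segments = rerank(question, candidates)[:n]          # top-N after reranking
    prompt = PROMPT_TEMPLATE.format(
        context="\n---\n".join(segments), question=question)
    reply = llm.chat.completions.create(
        model="meta-llama/Meta-Llama-3-70B-Instruct",    # one of the candidate LLMs
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content
```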
3.1.2.1 Hybrid Retrieval
Hybrid retrieval involves semantic retrieval and keyword retrieval. In semantic retrieval, an embedding model is used to vectorize the user's question, match the question vector against those in the vector database, and recall K knowledge segments with similar semantics. Keyword retrieval searches for information in databases based on keywords.
Semantic retrieval supports text with complex semantics and has the following advantages:
- Multi-lingual understanding: English content can be retrieved based on Chinese input.
- Multi-modal understanding: Information can be retrieved for various types of input, such as text, image, audio, and video.
- Advanced fault tolerance: Spelling mistakes and ambiguous descriptions are acceptable.
Despite these advantages, semantic retrieval may deliver suboptimal performance in certain scenarios, for example:
- Searching for a person or item by its name. For instance, the semantic retrieval result of the input "Huawei Mate 60" may include information about Mate 50.
- Searching for an abbreviation or short phrase, for example, "LLM".
In these scenarios, the conventional keyword search approach offers the following advantages:
- Exact match: Product names and person names can be accurately matched.
- Efficient search of short words: Information can be quickly searched based on a few keywords, whereas vector retrieval performs poorly when a query contains only a few keywords.
- Able to match words that are used less frequently: Such words often convey more significant information. For instance, in the sentence "Do you want to have a cup of coffee with me?", the word "coffee" offers more information than common words such as "do", "you", or "with".
Hybrid retrieval integrates the unique advantages of vector retrieval and keyword retrieval to return the most relevant information, which is the primary goal of any text search scenario.
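A minimal sketch of such a fusion is shown below, assuming the Chroma collection and embedder from the offline sketch above; BM25 (via the rank_bm25 package) stands in for the keyword index, whitespace tokenization is assumed, and the fusion weight is arbitrary.

```python
# Hybrid retrieval sketch (illustrative): semantic recall from Chroma plus
# BM25 keyword recall, merged with a simple weighted score fusion.
from rank_bm25 import BM25Okapi

all_slices = collection.get()["documents"]               # reuse the offline collection
bm25 = BM25Okapi([s.split() for s in all_slices])         # whitespace tokenization assumed

def hybrid_retrieve(question: str, k: int = 20, alpha: float = 0.5) -> list[str]:
    # Semantic retrieval: embed the question and query the vector database.
    q_vec = embedder.encode([question]).tolist()
    sem = collection.query(query_embeddings=q_vec, n_results=k)
    sem_scores = {doc: 1.0 - dist                          # smaller distance = closer
                  for doc, dist in zip(sem["documents"][0], sem["distances"][0])}
    # Keyword retrieval: BM25 scores over the same corpus.
    kw_raw = bm25.get_scores(question.split())
    top_kw = sorted(range(len(kw_raw)), key=lambda i: kw_raw[i], reverse=True)[:k]
    max_kw = max(kw_raw[i] for i in top_kw) or 1.0
    kw_scores = {all_slices[i]: kw_raw[i] / max_kw for i in top_kw}
    # Normalize and fuse the two score sets, then keep the top K segments.
    fused = {doc: alpha * sem_scores.get(doc, 0.0) + (1 - alpha) * kw_scores.get(doc, 0.0)
             for doc in set(sem_scores) | set(kw_scores)}
    return sorted(fused, key=fused.get, reverse=True)[:k]
```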
3.1.2.2 Reranking
Hybrid retrieval integrates multiple retrieval technologies to improve the recall rate of search results. To efficiently process the results returned by the different retrieval technologies, it applies a normalization policy that converts them to a standard form or distribution so that they can be quickly compared, analyzed, and processed by the LLM. A crucial ingredient in this conversion process is a scoring system: a rerank model.
A rerank model rearranges the retrieval result by measuring the relevance between the documents in the candidate document list and the semantics of the user's query. Relevance is evaluated based on the relevance score of each candidate document. All items are ranked in descending order of the score.
This technique can also be implemented after keyword retrieval in a non-hybrid retrieval system to significantly improve the recall rate. A rerank model can also benefit vector databases, which often trade retrieval accuracy for computational efficiency, leading to uncertainties in the retrieval result. Such uncertainties may disturb the ranking order (descending order by relevance), meaning that the top K segments in the original retrieval result may not be the most relevant ones. In this scenario, a rerank model can be used to reorganize the retrieval result.
Reranking is not a retrieval technology but an enhancement to retrieval systems. With its simplicity and low complexity, it integrates semantic correlations into search systems without requiring any major infrastructure changes.
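As a sketch, a cross-encoder reranker can be wired in as follows; the checkpoint is a placeholder, and the rerankers evaluated in Section 3.2 can be loaded in the same way or through their own libraries.

```python
# Reranking sketch (illustrative): a cross-encoder scores each (query, segment)
# pair and the candidates are re-sorted by relevance score.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("BAAI/bge-reranker-base", max_length=512)  # placeholder checkpoint

def rerank(question: str, candidates: list[str]) -> list[str]:
    scores = reranker.predict([(question, c) for c in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return [c for c, _ in ranked]       # descending order of relevance score
```

Because the reranker only post-processes a short candidate list, it can be dropped behind any existing retrieval system without infrastructure changes.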
3.2 Model Combination Evaluation
As shown in Figure 1, we used three types of models in our design: LLM, embedding model, and rerank model. The open-source models were deployed locally, and local documents were used to build wireless communication knowledge bases.
3.2.1 Model Selection
3.2.1.1 LLM
We selected Llama-3-70b-Instruct, Command R+, and Qwen1.5-110B-Chat from the LLM leaderboard "LMSYS Chatbot Arena". These models support both Chinese and English.
3.2.1.2 Embedding Model
We referred to the embedding model leaderboard "Massive Text Embedding Benchmark (MTEB) Leaderboard" and used the Retrieval score as the major indicator for selecting embedding models.
We selected 360Zhinao-search, stella-mrl-large-zh-v3.5-1792d, PEG, and bce-embedding-base_v1 for Chinese, and SFR-Embedding-Mistral, gte-large-en-v1.5, GritLM-7B, and bce-embedding-base_v1 for English.
3.2.1.3 Rerank Model
We selected bge-reranker-v2-gemma and bce-reranker-base_v1. These models support both Chinese and English.
3.2.1.4 Model Combination
To select an optimal combination of Chinese and English models, we evaluated the selected models separately for Chinese and English: three candidate LLMs, four candidate embedding models, and two candidate rerank models for each language, forming 24 possible combinations per language. We used vLLM to run the LLMs and Xinference to run the embedding and rerank models.
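The 3 x 4 x 2 grid can be enumerated straightforwardly. In the sketch below, evaluate_combination() is a hypothetical helper that rebuilds the knowledge base with the given embedding model, runs the RAG pipeline against the LLM served by vLLM, and scores the output with Ragas as described in Section 3.2.2; the combination labels follow the zh_x_y_z naming used in the result figures.

```python
# Sketch of the 3 x 4 x 2 evaluation grid for one language (24 combinations).
# evaluate_combination() is a hypothetical helper returning the four Ragas
# indicator scores for one model combination.
import itertools

llms = ["Command R+", "Llama-3-70b-Instruct", "Qwen1.5-110B-Chat"]
embedders = ["stella-mrl-large-zh-v3.5-1792d", "bce-embedding-base_v1",
             "360Zhinao-search", "PEG"]
rerankers = ["bce-reranker-base_v1", "bge-reranker-v2-gemma"]

results = {}
for llm_name, emb_name, rr_name in itertools.product(llms, embedders, rerankers):
    label = f"zh_{llms.index(llm_name)}_{embedders.index(emb_name)}_{rerankers.index(rr_name)}"
    results[label] = evaluate_combination(llm_name, emb_name, rr_name)

# Total score of a combination = sum of its four indicator scores.
best = max(results, key=lambda c: sum(results[c].values()))
print("best combination:", best)
```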
3.2.2 Evaluation Method
We created a Chinese dataset zh_refine.json and an English dataset en_refine.json based on the open-source project RGB. Figure 2 shows the data format.

Figure 2 Raw dataset format
id indicates the data ID, query indicates the question corresponding to the data, answer indicates the answer, positive indicates the text relevant to the question, and negative indicates text irrelevant to the question (interference). The positive and negative texts of 300 Chinese data records are stored in the same file, which is vectorized by the embedding model and imported into the vector database to create a Chinese knowledge base. We used the same approach to create an English knowledge base. The vector database is built on the open-source vector database Chroma.
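For illustration, a single record in this format might look as follows; the text is invented for this example and is not taken from zh_refine.json or en_refine.json, and the exact structure follows Figure 2.

```python
# Illustrative record with the fields described above (content invented).
record = {
    "id": 0,
    "query": "Which frequency ranges are defined for 5G NR?",
    "answer": "FR1 (410 MHz to 7.125 GHz) and FR2 (24.25 GHz to 52.6 GHz).",
    "positive": [
        "3GPP defines two frequency ranges for NR: FR1, covering 410 MHz to "
        "7.125 GHz, and FR2, covering 24.25 GHz to 52.6 GHz."
    ],
    "negative": [
        "LTE Band 1 uses paired spectrum around 2.1 GHz."   # interference text
    ],
}
```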
The evaluation framework is Ragas, which requires the data format in Figure 3.

Figure 3 Data format required by Ragas
question indicates the question, and ground_truths indicates the correct answer. The values of these fields can be obtained from either zh_refine.json or en_refine.json. The LLM generates the values of answer and contexts according to the prompt template we designed, as shown in Figure 4.

Figure 4 Prompt template
We evaluated the created datasets and the Chinese and English model combinations.
3.2.3 Evaluation Results
The evaluation indicators are faithfulness, answer_relevancy, context_precision, and context_recall. For details about each indicator, see the Ragas documentation.
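A minimal sketch of how such an evaluation run can be wired up is shown below. It assumes the Ragas 0.1-era API (evaluation over a Hugging Face Dataset), which may differ from the exact version used; the sample values are illustrative.

```python
# Ragas evaluation sketch; field names follow Figure 3, sample values are invented.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (answer_relevancy, context_precision,
                           context_recall, faithfulness)

eval_data = {
    "question": ["Which frequency ranges are defined for 5G NR?"],
    "ground_truths": [["FR1 (410 MHz to 7.125 GHz) and FR2 (24.25 GHz to 52.6 GHz)."]],
    "answer": ["5G NR defines FR1 up to 7.125 GHz and FR2 from 24.25 to 52.6 GHz."],
    "contexts": [["3GPP defines two frequency ranges for NR: FR1 ... FR2 ..."]],
}
scores = evaluate(
    Dataset.from_dict(eval_data),
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
total = sum(scores[m] for m in ["faithfulness", "answer_relevancy",
                                "context_precision", "context_recall"])
print(scores, total)    # the total score is the sum of the four indicators
```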
3.2.3.1 Chinese Model Combinations
The total score of each of the 24 Chinese model combinations equals the sum of the scores of faithfulness, answer_relevancy, context_precision, and context_recall.
In Figure 5, the horizontal coordinate indicates the name of each model combination, in the format of zh_x_y_z. x indicates the LLM (0: Command R+, 1: Llama-3-70b-Instruct, 2: Qwen1.5-110B-Chat). y indicates the embedding model (0: stella-mrl-large-zh-v3.5-1792d, 1: bce-embedding-base_v1, 2: 360Zhinao-search, 3: PEG). z indicates the rerank model (0: bce-reranker-base_v1, 1: bge-reranker-v2-gemma).

The optimal Chinese model combination deployed for Chinese scenarios is Command R+, stella-mrl-large-zh-v3.5-1792d, and bge-reranker-v2-gemma.
3.2.3.2 English Model Combinations
The total score of each of the 24 English model combinations equals the sum of the scores of faithfulness, answer_relevancy, context_precision, and context_recall.
In Figure 6, the horizontal coordinate indicates the name of each model combination, in the format of en_x_y_z. x indicates the LLM (0: Command R+, 1: Llama-3-70b-Instruct, 2: Qwen1.5-110B-Chat). y indicates the embedding model (0: SFR-Embedding-Mistral, 1: bce-embedding-base_v1, 2: gte-large-en-v1.5, 3: GritLM-7B). z indicates the rerank model (0: bce-reranker-base_v1, 1: bge-reranker-v2-gemma).
The optimal English model combination deployed for English scenarios is Llama-3-70b-Instruct, SFR-Embedding-Mistral, and bce-reranker-base_v1.
3.3 Implementation Results
We used Dify as the bottom-layer framework to implement the Q&A solution for wireless communication knowledge bases. We employed the selected optimal model combinations to create wireless communication knowledge bases based on Huawei documents. The RAG processes are integrated into workflows, which form the LLM application software.

Figure 7 Implementation of the Q&A solution for wireless communication knowledge bases
The question classifier is responsible for intent recognition. If the question concerns 5G, the LLM application software performs RAG until an answer is generated. If the question is unrelated to wireless communications, the software politely refuses to answer.
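The routing logic can be sketched as follows. The deployed system implements this with Dify's question classifier node rather than standalone code; the prompt wording and category labels here are illustrative, and llm and answer() refer to the online Q&A sketch in Section 3.1.2.

```python
# Intent-routing sketch (illustrative; the deployed system uses Dify's
# question classifier node rather than this standalone code).
INTENT_PROMPT = (
    "Classify the user question into exactly one category: "
    "'wireless' if it concerns wireless communications (e.g., 5G), "
    "otherwise 'other'.\nQuestion: {question}\nCategory:"
)

def handle(question: str) -> str:
    reply = llm.chat.completions.create(
        model="meta-llama/Meta-Llama-3-70B-Instruct",
        messages=[{"role": "user", "content": INTENT_PROMPT.format(question=question)}],
    )
    intent = reply.choices[0].message.content.strip().lower()
    if "wireless" in intent:
        return answer(question)      # run the RAG pipeline from Section 3.1.2
    return "Sorry, I can only answer questions about wireless communications."
```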
3.3.1 Questions Related to Wireless Communications
In Figure 8, the user asks a question about wireless communications. The LLM application software performs RAG to generate an answer and provide reference documents.
3.3.2 Questions Unrelated to Wireless Communications
In Figure 9, the LLM application software politely refuses to answer any question unrelated to wireless communications.

Figure 8 Wireless communications-related question answered by the LLM application

Figure 9 Unrelated question answered by the LLM application
4 Prospects
Although the integration of RAG with LLMs and databases significantly reduces hallucinations in the generated content, RAG still faces many challenges. This section outlines these challenges and RAG's future research interests.
- Multi-modal data processing
A large proportion of enterprise digital assets are PowerPoint and PDF files, which contain a massive volume of unstructured data such as images and tables; only a small proportion are text files. However, text is the most stable information source for LLMs, so extracting accurate key information from unstructured data is critical to improving model performance.
- Numerous RAG components and hyperparameters
A RAG application involves many components. For instance, LlamaIndex has more than 80 RAG components that can be selected and integrated based on different scenario requirements. Consequently, developing a RAG application may involve adjusting and optimizing many parameters, including the slicing mode, recall mode, pre-processing, post-processing, and routing parameters. Finding the optimal combination among these components in a huge parameter space is a significant challenge for developers and researchers, leading to a large number of experiments and trial-and-error tests. It is critical to simplify the development process and improve the efficiency of developing RAG applications.
- Integration with enterprises' knowledge bases and search engines
RAG can be integrated with enterprise search engines (such as Elasticsearch) and keyword search engines, eliminating the need for developing knowledge bases from scratch. To meet the requirements of LLMs, such integration requires interface customization. The customized interfaces must be free from restrictions on the context window, be easy to understand, use simplified words and expressions, and have fewer interface parameters. These interfaces can fully unlock the potential of RAG models by further adapting LLMs to search engines, improving the usage of enterprises' knowledge bases and data resources.
- Lack of time attributes
Conflicting datasets in databases might become a major issue for current RAG strategies. For example, after an enterprise policy document is updated, the new rules may conflict with old ones. Since the current RAG recall strategy does not take time attributes into account, it may recall old knowledge segments during vector recall, causing the LLM to provide incorrect answers. This problem can be solved by introducing time attributes of data into the RAG strategy and increasing the recall score of data that is updated and more relevant based on the attributes. This approach can reduce the possibility of old knowledge segments being recalled after the data is updated. Additionally, paying more attention to the time attributes during model training and application can improve model accuracy and adaptability.
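One possible way to fold a time attribute into the recall score is sketched below; it assumes each knowledge segment carries an updated_at timestamp in its metadata, and the half-life and weight are arbitrary illustrative choices.

```python
# Possible time-aware re-scoring (illustrative): newer segments get a boost so
# that outdated policy text is less likely to be recalled. Assumes each segment
# stores an "updated_at" timestamp in its metadata.
import math
import time

HALF_LIFE_DAYS = 180.0   # arbitrary freshness half-life

def time_weighted_score(relevance: float, updated_at: float,
                        now: float | None = None, weight: float = 0.3) -> float:
    now = now or time.time()
    age_days = max(0.0, (now - updated_at) / 86400.0)
    freshness = math.exp(-math.log(2) * age_days / HALF_LIFE_DAYS)   # 1.0 for new data
    return (1 - weight) * relevance + weight * freshness
```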
- Fine-tuning the embedding model and rerank model based on data in professional areas
Most open-source embedding models and rerank models are trained based on common corpora. In professional fields, such as wireless communications, these models may fail to recall the most relevant corpus in the RAG process, delivering suboptimal performance. Consequently, these models need to be fine-tuned based on the data in professional fields to improve RAG performance further.
5 Conclusion
In recent years, the development of LLMs has led to significant innovation in the field of knowledge management. In this paper, we have outlined the challenges and technical schemes involved in knowledge management, including fine-tuning and RAG. We used RAG to create a knowledge management solution, and evaluated different LLMs, embedding models, and rerank models to select the optimal combinations for implementing, with a number of open-source tools, an LLM application that provides Q&A over wireless communication knowledge bases. Our model combinations achieved significant reductions in hallucinations through RAG-based integration with databases. However, RAG still faces challenges in real-world applications, such as processing multimodal data, selecting complex hyperparameters, and integrating enterprise knowledge bases with search engines.
Our work demonstrates the potential of RAG technology in wireless communication knowledge management and lays a foundation for improving LLM application performance in this field.