Optimizing the transport network for a better user experience
Services are in a rapid state of evolution. Mobile networks have quickly evolved from supporting voice and limited amounts of data to supporting massive amounts of video and gaming traffic. The growing shift to video will drive network traffic and necessitate a higher grade of performance to maintain a high-quality user experience. And mobile gaming is a high-value service that attracts customers and aids customer retention, particularly if the user experience is high-quality.
Many operators' networks comprise many layers of technology that have been deployed over a long period of time. Operators have to manage all of these challenges within the realities of their business models and CAPEX constraints, while also considering how the network will evolve to 5G, which promises new applications and business models but also brings many network performance challenges.
Network operators are right to ask, “How can I improve the user experience and my network performance right now, without launching into a time-consuming, expensive next-generation business and network plan?”
User experience is ultimately defined by end-to-end IP network performance
Today’s networks: typically a layered, complex mix of technologies and converged bandwidth
4G is inherently IP. The E2E performance of the IP layer is the aggregate of all of the underpinning network layers and options. All the layers of the network impact performance, and the performance of the IP layer ultimately defines the user experience.
The typical operator’s network is a mix of technologies, layered (DWDM/IP/Microwave), and features equipment of varying ages from multiple vendors. The goal of E2E visibility is challenging, as the different equipment can report at different time intervals with different levels of precision.
Many operators have constructed converged networks to minimize CAPEX. The trade-off is that the transport network becomes more complex. Additionally, unpredictable traffic bursts can negatively impact an under-provisioned or tightly provisioned network.
Historic service assurance is not keeping pace
Hour- or minute-level bandwidth utilization does not reflect the real service experience
With LTE in place and 5G to come, radio access networks are capable of higher speeds, services come with lower latency requirements, and users demand higher network availability. The service mix has changed with the rise of web-scale video. Traffic has become less predictable and shifts between geographic locations. Earlier generations of service assurance sampling techniques aren’t keeping pace with this new, more dynamic environment. Historically, service assurance sampling has been conducted at intervals of minutes to hours. Averaged over such a lengthy interval, the packet loss ratio can sit within acceptable thresholds even when problems occur inside it. The weakness of the approach is that very short-term bursts of traffic can cause short-term packet loss that the interval average hides. A small number of lost packets will not impact an email service but can cause a VoLTE call to be dropped or a video flow to be degraded. The net result is a negative user experience that is immediately detected by the user but may go undetected by the network operator.
With finer sampling granularity, traffic bursts can be clearly identified. Fine-grained sampling is vital to stay on top of network performance and the user experience in today's environment.
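To make the effect of interval length concrete, the following minimal sketch compares the packet loss ratio computed over one 15-minute window with the same data evaluated second by second. The traffic figures, the 900-second window, and the function names are illustrative assumptions rather than measurements; the 1e-4 threshold is taken from the packet loss target cited in the recommendations.

```python
from typing import List, Tuple

# Assumed alarm threshold, borrowed from the packet loss target in the recommendations.
LOSS_THRESHOLD = 1e-4

# Each sample is (packets_sent, packets_lost) for one second of traffic.
Sample = Tuple[int, int]


def loss_ratio(samples: List[Sample]) -> float:
    """Packet loss ratio aggregated over the whole interval."""
    sent = sum(s for s, _ in samples)
    lost = sum(l for _, l in samples)
    return lost / sent if sent else 0.0


def violating_seconds(samples: List[Sample],
                      threshold: float = LOSS_THRESHOLD) -> List[int]:
    """Indices of the one-second samples whose own loss ratio exceeds the threshold."""
    return [i for i, (sent, lost) in enumerate(samples)
            if sent and lost / sent > threshold]


# Hypothetical 15-minute window: 10,000 packets per second, no loss...
samples: List[Sample] = [(10_000, 0)] * 900
# ...except for a two-second burst that drops 200 packets in each second.
samples[300] = (10_000, 200)
samples[301] = (10_000, 200)

coarse = loss_ratio(samples)          # 400 lost out of 9,000,000 sent
bursts = violating_seconds(samples)   # seconds where loss exceeds the threshold

print(f"15-minute loss ratio: {coarse:.2e} (alarm: {coarse > LOSS_THRESHOLD})")
print(f"seconds above threshold: {bursts}")
```

In this example the 15-minute figure stays below the threshold, while the per-second view flags both seconds of the burst, each of which has a 2% loss ratio.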
Experience-oriented transport construction and optimization, and an approach to fine-tune today’s backhaul network
Network planning to optimize user experience while minimizing CAPEX
Network planning is a balancing act. Network operators can elect to overbuild (that is, to over-provision) to minimize network congestion and maintain a high-quality user experience, but they will need more CAPEX to accomplish their goal. Networks can also be optimized based on today's average network requirements. The near-term appeal of this is lower CAPEX, but the risk is that the network is under-provisioned for bursty or unforeseen conditions, which will impact the user experience. Networks are not static: the number of users and services and the amount of bandwidth per user are growing. Networks that are optimally sized for today's conditions may quickly become undersized for future growth scenarios.
In today's 4G networks, some network operators have elected to follow the fully provisioned route to minimize congestion bottlenecks. Backhaul networks have been sized to carry the maximum load of the base station. Statistical multiplexing has deliberately not been employed, in order to avoid packet loss due to network congestion. An operational consideration factored into the fully provisioned backhaul philosophy is that radio access operations teams do not want to troubleshoot transport backhaul networks. An argument could be made that this approach adds to the near-term CAPEX requirement.
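The trade-off between the two sizing philosophies can be shown with simple arithmetic. The sketch below is a hedged illustration only: the site count, the peak and average rates, the 1.5x headroom factor, and all function names are assumptions chosen for the example, not figures from any operator's network.

```python
from typing import List


def fully_provisioned(peaks_mbps: List[int]) -> int:
    """Capacity sized to carry every base station at its maximum load at once."""
    return sum(peaks_mbps)


def average_sized(averages_mbps: List[int], headroom: float = 1.5) -> float:
    """Capacity sized to today's average load plus an assumed headroom factor."""
    return sum(averages_mbps) * headroom


def offered_load(peaks_mbps: List[int], averages_mbps: List[int],
                 n_bursting: int) -> int:
    """Load when the first n_bursting sites run at peak and the rest at average."""
    return sum(peaks_mbps[:n_bursting]) + sum(averages_mbps[n_bursting:])


# Hypothetical aggregation link serving 20 base stations.
peaks = [1000] * 20      # Mbps, maximum load per site (assumed)
averages = [150] * 20    # Mbps, average load per site (assumed)

full = fully_provisioned(peaks)           # 20,000 Mbps
lean = average_sized(averages)            # 4,500 Mbps
burst = offered_load(peaks, averages, 6)  # 6 sites at peak: 8,100 Mbps

print(f"fully provisioned: {full} Mbps, average-sized: {lean:.0f} Mbps")
print(f"burst load: {burst} Mbps, congests average-sized link: {burst > lean}")
```

Under these assumed numbers the fully provisioned link needs more than four times the capacity of the average-sized one, but the average-sized link congests as soon as six sites peak together.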
Increase the service assurance sampling frequency
Increasing the service assurance sampling frequency per QoS flow will address the visibility issue for network operators. Next-generation service assurance has the potential to identify a variety of suboptimal network conditions.
While enabling pinpoint precision in identifying network trouble, next-generation service assurance, with advanced telemetry capabilities, will generate vast amounts of data. In the advanced telemetry era, the challenge is to find the underperforming network item in this data. Big data tools are designed to quickly ingest massive quantities of event data and support low-latency queries on that data.
The new big data analytics approach will need to detect network performance anomalies, and machine learning will be needed to identify network deficiencies and predict future deficiency scenarios. Advanced telemetry may mean a vast number of scenarios sound the alarm, so advanced correlation techniques will be required to distill these down to the root cause of the problem. The next-generation service assurance tool will have an advanced dashboard display to rapidly deliver insights on the network.
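As a rough illustration of anomaly detection on fine-grained telemetry, the sketch below flags samples that deviate strongly from a trailing baseline. It is a simple statistical stand-in, not the machine learning approach described above, and the window length, the 3-sigma rule, the latency values, and the function name are all assumptions made for the example.

```python
from statistics import mean, pstdev
from typing import List


def find_anomalies(values: List[float], window: int = 30,
                   k: float = 3.0, min_std: float = 1e-9) -> List[int]:
    """Return indices whose value lies more than k standard deviations
    away from the mean of the preceding `window` samples."""
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        # Guard against a perfectly flat baseline (zero standard deviation).
        sigma = max(pstdev(baseline), min_std)
        if abs(values[i] - mu) > k * sigma:
            anomalies.append(i)
    return anomalies


# Hypothetical per-second latency samples in ms: a steady 10-11 ms pattern
# with a single spike to 50 ms at index 60.
latency_ms = [10.0 if i % 2 == 0 else 11.0 for i in range(90)]
latency_ms[60] = 50.0

print(find_anomalies(latency_ms))
```

A production system would also need the correlation step described above, since a single fault can trip this kind of detector on many links and counters at once.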
The future: closed loop automation and integration with SDN control
The longer-term goal of network management is closed-loop automation, with more powerful analytical tools in place to both identify and predict network problems, and the underpinning capabilities in place to take corrective action. The advanced telemetry tool can supply future orchestration tools, identifying the network scenario and deficiency, and providing the recommended action. The orchestration tool can decide on the best course of action and then request an action. Bandwidth can be flexed up to address congestion issues. Traffic can be rerouted to preserve the service and the user experience.
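The decision step of such a loop might look like the following minimal sketch. It is purely illustrative: the `LinkState` fields, the thresholds, and the `recommend` function are hypothetical and do not correspond to any real orchestrator or SDN controller API.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    FLEX_BANDWIDTH = "flex_bandwidth"
    REROUTE = "reroute"


@dataclass
class LinkState:
    """Hypothetical summary produced by the telemetry tool for one link."""
    link_id: str
    utilization: float   # peak utilization in the latest window, 0.0 to 1.0
    loss_ratio: float    # packet loss ratio in the latest window
    can_flex: bool       # whether extra bandwidth can be turned up on this link


def recommend(state: LinkState, loss_target: float = 1e-4,
              util_limit: float = 0.9) -> Action:
    """Recommend an action: flex bandwidth where possible, otherwise reroute."""
    congested = state.loss_ratio > loss_target or state.utilization > util_limit
    if not congested:
        return Action.NONE
    return Action.FLEX_BANDWIDTH if state.can_flex else Action.REROUTE


# Example: a congested link that cannot be flexed gets a reroute recommendation.
state = LinkState("agg-1", utilization=0.97, loss_ratio=2e-3, can_flex=False)
print(recommend(state))
```

In a real deployment the recommendation would be passed to the orchestration tool, which decides whether to act on it.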
Conclusions & Recommendations for operators
- The transport network plays a major role in end-to-end (E2E) mobile network performance and the user experience. Network operators need to optimize the transport network in a capital-efficient manner to optimize the user experience. Packet loss of less than 10⁻⁴ and latency of 50 ms are the recommended performance targets for today's networks. Future ultra-reliable and low-latency 5G services will have even more stringent performance requirements.
- The historic service assurance technique of measuring the transport network by utilization in minutes or hours does not provide an accurate view of today's service performance. Network operators need to understand the state of the transport network with fine-grained precision. The standard of precision needs to be improved from minutes or hours to seconds or sub-seconds. With improved network visibility, network operators can consider courses of action for problem remediation.
- Transitioning to a new service assurance operational approach is a major endeavor. Network operators with the necessary skill sets can transition to next-generation service assurance tools on their own. Alternatively, operators can consider partnering with a service assurance specialist to accelerate the transition to a next-generation service assurance paradigm.
- Identifying network trouble spots with precision means many potential remediation actions can be considered. Many actions would require a change in an operational procedure. Some remedies may involve CAPEX. With a more precise knowledge of major network congestion points, operators can deploy capital as efficiently as possible for maximum benefit.