# Open-Source Software Optimization - Optimizing the Merge Sort Algorithm of Go Programming Language

Aug 25, 2021

The sorting algorithm is used to sort a series of data in a specific sequence. For example, the sorting algorithm is used to sort the exam scores of a school or data reports of a company in descending order. Common sorting algorithms include bubble sort, quick sort, dictionary sort, merge sort, and heap sort. Merge sort is a stable and efficient sorting algorithm, and the time complexity is N*log2N. This document uses the merge sort algorithm optimization case of Go in the open-source community as an example to describe the common algorithm analysis and optimization methods.

### 1. Merge Sort Algorithm of Go

The sorting package of Go supports insertion sort, merge sort, quick sort, and heap sort. Merge sort is a stable sorting algorithm implemented by the sorting package. Go implements the merge sort algorithm based on recursion. The code is as follows:

``````// The code for merge sort before optimization uses recursion to merge two ordered subsequences: data[a:m] and data[m:b].
func symMerge(data Interface, a, m, b int) {
// Initialize the values of mid, start, and r. mid indicates the median of [a:b], start indicates the start subscript for data exchange in [a:m], and r indicates the rightmost subscript for data exchange in [a:m].
mid := int(uint(a+b) >> 1)
n := mid + m
var start, r int
if m > mid {
start = n - b
r = mid
} else {
start = a
r = m
}
p := n - 1

// Use binary search for symmetrical comparison of data block [a:m] and [m:b], and find the start subscript position where data block [a:m] begins to be greater than data block [m:b].
for start < r {
c := int(uint(start+r) >> 1)
if !data.Less(p-c, c) {
start = c + 1
} else {
r = c
}
}

// Flip the data blocks [start:m] and [m:end] to ensure that the data block [start:end] is in order.
end := n - start
if start < m && m < end {
rotate(data, start, m, end)
}
// Invoke the recursive function to sort two groups of data sub-blocks [a:start] and [start:mid], and [mid:end] and [end:b].
if a < start && start < mid {
symMerge(data, a, start, mid)
}
if mid < end && end < b {
symMerge(data, mid, end, b)
}
}``````

### 2. Scenario and Problem Analysis

The merge sort algorithm is mainly used for sequential merging of data blocks, for example, merging of sequential slice data [0:20] and [20:40]. The main processing procedure is as follows:

1. Invoke the recursive function symMerge to initialize slice parameters.
2. The slice data [0:20] becomes less than [20:40] through binary search, symmetric comparison, and flipping.
3. Invoke the recursive function symMerge again to merge sort data sub-slices [0:start] and [start:20], and return to step 1.
4. Invoke the recursive function symMerge again to merge sort data sub-slices [20:end] and [end:40], and return to step 1.
5. The entire slices data [0:40] will be in order when all data sub-slices are sequenced.

In processing the common application scenario, it is found that the recursive function needs to be continuously invoked to process the subsequence in merge sort, and this is also a main reason for performance loss in the merge sort algorithm. In merge sort data slices, if recursive processing of sub-slices can be avoided or reduced, running performance of the algorithm can be improved. For example, in a scenario in merge sort the sub-slice data  whose length is 1 and the sub-slice data [1:10] whose length is 9, insertion sort is used to avoid invoking recursion, so that running time of an algorithm can be reduced, and performance of the algorithm can be optimized.

### 3. Optimization Solution and Implementation

Based on the code and scenario analysis of the merge sort algorithm in Go, an optimization method is found. That is, in merge sort, for a data block whose length is 1, a binary insertion sort algorithm is used to process data blocks, so that invoking of the recursive function can be reduced, and algorithm performance can be improved. The open-source Go community also uses the optimization method based on binary insertion sort to improve the performance of the merge sort algorithm. The following figure shows the code comparison before and after the optimization: Optimization code:

``````// The optimization code adds a judgment on the slice length. When the slice length is 1, the binary insertion sort is used to directly sort the slices and return the slices.
func symMerge(data Interface, a, m, b int) {
// If data[a:m] contains only one element, use the binary insertion sort to find the subsequential insertion point and insert data[a] to data[m:b].
// Avoid invoking recursion function to improve the algorithm performance.
if m-a == 1 {
i := m
j := b
// Use binary search to find a proper sequential insertion point.
for i < j {
h := int(uint(i+j) >> 1)
if data.Less(h, a) {
i = h + 1
} else {
j = h
}
}
// Insert data.
for k := a; k < i-1; k++ {
data.Swap(k, k+1)
}
return
}

// If data[m:b] contains only one element, use binary insertion to find the proper sequential insertion point and insert data[m] to data[a:m].
// Avoid invoking recursion function to improve the algorithm performance.
if b-m == 1 {
i := a
j := m
// Use binary search to find a proper sequential insertion point.
for i < j {
h := int(uint(i+j) >> 1)
if !data.Less(m, h) {
i = h + 1
} else {
j = h
}
}
// Insert data.
for k := m; k > i; k-- {
data.Swap(k, k-1)
}
return
}

// Recursive implementation of merge sort.
......
}``````

### 4. Optimization Result

Use Go benchmark to test the algorithm performance before and after the optimization, use Go benchstat to compare the performance test results before and after the optimization, and collect statistics on the number of times that the recursive function is invoked before and after the optimization. The following table lists the statistics:

Test Item Test Case Time per Operation Before Optimization Time per Operation Time After Optimization Operation Time Comparison Total Recursions Before Optimization Total Recursions After Optimization Comparison of Total Recursions
BenchmarkStableString1K-8 1 KB string slice 302278 ns/op 288879 ns/op -2.71% 1110 790 -28.8%
BenchmarkStableInt1K-8 1 KB integer slice 144207 ns/op 139911 ns/op -0.96% 689 586 -14.9%
BenchmarkStableInt64K-8 64 KB integer slice 12291195 ns/op 12119536 ns/op -0.75% 48465 41280 -14.8%
BenchmarkStable1e2-8 1e2(100) structure slice 135357 ns/op 124875 ns/op -7.17% 1079 807 -25.2%
BenchmarkStable1e4-8 1e4(10000) structure slice 43507732 ns/op 40183173 ns/op -7.64% 449735 350205 -22.1%
BenchmarkStable1e6-8 1e6(1000000) structure slice 9005038733 ns/op 8440007994 ns/op -7.69% 79347843 61483110 -22.5%

Note: `-8` indicates the GOMAXPROCS value when the function is running, and `ns/op` indicates the average nanosecond duration for running the function each time.

The preceding table lists the performance test results of different data types, such as string slice, integer slice, and structure slice. The comparison analysis is as follows:

• Compare the 1 KB string slice data with the integer slice data. After the optimization, the recursion times of the string slice data are reduced by 28.8%, and the recursion times of the integer slice data are reduced by only 14.9%. This indicates that the optimization technology plays a greater role in the string slice dataset and the performance is improved greatly.
• The comparison of integer slice data with the length of 64 KB and 1 KB respectively shows that the total number of recursion times decreases by about 14.8% after the optimization, and the performance improvement is 0.5% to 1%.
• Compare the structure slice data whose lengths are 1e2, 1e4, and 1e6. After the optimization, the total numbers of recursion times are all reduced by about 22%, and the performance is all improved by about 7%.

The comparison and analysis of the performance test results show that the performance improvement after optimization is closely related to the test dataset. The more the data slice length of 1 occurred in the recursion, the greater effect the optimization will make, and the more the number of recursion times is reduced, the further the performance will be improved.

### 5. Summary

The optimization case of the merge sort algorithm analyzes the performance problems in a specific scenario, provides an optimization solution based on binary insertion sort, and finally verifies the optimization result. It is an algorithm optimization practice worth learning.