Blazing Fast APT Download Speeds

"Before you marry a person, you should first make them use a computer with slow Internet service to see who they really are." - Will Ferrell

penguins crossing sign

Waiting on a slow Internet connection can be a bummer. This tutorial will walk you through how to get blazing fast speeds for the Advanced Package Tool, or APT https://en.wikipedia.org/wiki/APT_(software). APT originated in the Debian https://en.wikipedia.org/wiki/Debian Linux distribution. I used it on Debian a while back, but now I’m using it on the Ubuntu distro https://en.wikipedia.org/wiki/Ubuntu.

My Process

Ultimately, I decided to write a custom Python script based on a Bash script I found on the Internet. I didn’t try netselect https://github.com/pgollangi/netselect, apt mirror:// https://www.baeldung.com/linux/apt-terminal-choose-fastest-mirror#2-apt-mirror, or curl https://www.baeldung.com/linux/apt-terminal-choose-fastest-mirror#3-curl-transfer-speed because I was steered away from them by https://www.baeldung.com/linux/. I did this on Ubuntu Desktop 22.04.

Code Download

https://github.com/mday299/keypuncher/tree/main/Networking/Python/apt-speedy

Based on: https://www.baeldung.com/linux/apt-terminal-choose-fastest-mirror

Instructions

sudo apt update
sudo apt install git python3 python3-requests curl

Open your favorite editor or IDE https://en.wikipedia.org/wiki/Integrated_development_environment, and start with these Python imports:

import requests
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor

These are the requests https://pypi.org/project/requests/, regular expression (re) https://docs.python.org/3/library/re.html, and subprocess https://docs.python.org/3/library/subprocess.html modules, plus the ThreadPoolExecutor class from concurrent.futures https://docs.python.org/3/library/concurrent.futures.html.

We’ll discuss them as they come up later on in the Python code.

Next add the following to your Python file:

# Fetch the HTML list of Ubuntu mirrors and extract URLs of up-to-date mirrors
response = requests.get('https://launchpad.net/ubuntu/+archivemirrors')
mirrors = re.findall(r'(https?://[^\"]+)', response.text)
filtered_mirrors = [mirror for mirror in mirrors if 'statusUP' in response.text.split(mirror)[1]]

The first line sends an HTTP GET request to the URL https://launchpad.net/ubuntu/+archivemirrors using the requests library. The response from the server is stored in the response variable, and response.text contains the HTML content of the page.

The next line uses the re (regular expression) module to find all substrings in response.text that match the given pattern. The pattern

https?://[^\"]+

matches any URL that starts with http or https and continues until a double quote ("). The re.findall function returns a list of all matching URLs, which is stored in the mirrors variable.
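To see what the pattern picks up, here is a minimal sketch run against a made-up HTML fragment. The sample markup and the example.com / example.org hosts are assumptions for illustration only, not Launchpad's actual page:

```python
import re

# Hypothetical HTML fragment; the real Launchpad page is much larger.
sample_html = (
    '<a href="http://mirror.example.com/ubuntu">http</a>'
    '<a href="https://archive.example.org/ubuntu/">https</a>'
)

# Same pattern as in the script: "http://" or "https://" followed by
# everything up to (but not including) the next double quote.
urls = re.findall(r'(https?://[^\"]+)', sample_html)
print(urls)
```

Note that the pattern is not specific to mirrors: any quoted URL on the page will match, which is why a filtering step follows.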

The next line packs quite a lot into a single line:

List Comprehension (see Python - List Comprehension, w3schools.com):

  • This line uses a list comprehension to create a new list called filtered_mirrors. List comprehensions provide a concise way to generate lists by iterating over an existing list and applying a condition or transformation.

Iteration:

  • The list comprehension iterates over each mirror in the mirrors list.

  • mirrors is a list of URLs extracted from the HTML content of the Ubuntu mirrors page.

Condition:

  • For each mirror, the condition if 'statusUP' in response.text.split(mirror)[1] is checked.

  • This condition checks if the string 'statusUP' appears in the part of the HTML content that comes after the mirror URL.

Splitting the HTML Content:

  • response.text.split(mirror) splits the entire HTML content (response.text) at each occurrence of the mirror URL.

  • This results in a list where the first element is the part of the HTML content before the first occurrence of the mirror URL, and the second element is the part that follows it (up to the next occurrence, if the URL appears more than once).

Accessing the Part After the URL:

  • response.text.split(mirror)[1] accesses the second element of the split result, which is the part of the HTML content that comes after the mirror URL.

Filtering:

  • If the string 'statusUP' is found in the part of the HTML content that comes after the mirror URL, the mirror is included in the filtered_mirrors list.

  • If 'statusUP' is not found in this part, the mirror is excluded from the filtered_mirrors list.
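The split-and-check step can be tried out on a tiny string. This is only a sketch: the page text, the example.com hosts, and the 'statusOTHER' marker are made up for illustration, and the real page layout may differ:

```python
# Hypothetical page text: two mirror URLs, each followed by a status marker.
page = (
    '<a href="http://a.example.com/ubuntu">http</a> statusUP '
    '<a href="http://b.example.com/ubuntu">http</a> statusOTHER'
)
mirrors = ['http://a.example.com/ubuntu', 'http://b.example.com/ubuntu']

checks = {}
for mirror in mirrors:
    # split() cuts the page at every occurrence of the URL;
    # index 1 is the text that follows the first occurrence.
    after = page.split(mirror)[1]
    checks[mirror] = 'statusUP' in after

print(checks)
```

One thing worth knowing: the check looks at everything after the URL, not only the status right next to it. If a later entry on the page is marked statusUP, an earlier mirror passes the filter too, so treat this as a coarse filter.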

Next we define a function to test each mirror’s speed:

# Function to test mirror speed
def test_mirror_speed(mirror):
    try:
        print(f"testing mirror {mirror}")
        result = subprocess.run(
            ['curl', '--max-time', '2', '-r', '0-102400', '-s', '-w', '%{speed_download}', '-o', '/dev/null', f'{mirror}/ls-lR.gz'],
            capture_output=True, text=True
        )
        speed_bps = float(result.stdout.strip())
        speed_kbps = speed_bps / 1024
        return mirror, speed_kbps
    except Exception as e:
        print(f"Error testing mirror {mirror}: {e}")
        return mirror, 0

The rather complex line of code:

result = subprocess.run( ['curl', '--max-time', '2', '-r', '0-102400', '-s', '-w', '%{speed_download}', '-o', '/dev/null', f'{mirror}/ls-lR.gz'], capture_output=True, text=True )

uses the subprocess.run function to execute a curl command (see cURL - Wikipedia). Summarized:

  • curl: This is a command-line tool for transferring data with URLs.

  • --max-time 2: This option sets the maximum time in seconds that the curl command is allowed to take. In this case, it’s set to 2 seconds.

  • -r 0-102400: This option specifies the byte range to retrieve. Here, it fetches bytes 0 through 102400 of the file (the range is inclusive, so roughly the first 100 KB).

  • -s: This option makes curl operate in silent mode, meaning it won’t show progress or error messages.

  • -w '%{speed_download}': This option tells curl to output the download speed after the transfer is complete.

  • -o /dev/null: This option discards the downloaded data by sending it to /dev/null.

  • f'{mirror}/ls-lR.gz': This is the URL of the file to be downloaded, with mirror being a variable that holds the base URL of the mirror.

The subprocess.run function executes this curl command and captures its output. The capture_output=True argument ensures that the output is captured, and text=True means the output will be returned as a string. The result of this command is stored in the result variable, which contains information about the execution, including the download speed in bytes per second.
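If you want to see what subprocess.run hands back without hitting the network, here is a small sketch. It swaps curl for a child Python process that just prints a number (that stand-in command is my assumption, chosen so the example runs anywhere):

```python
import subprocess
import sys

# Stand-in for curl: a child Python process that prints a number to stdout,
# much like curl -w '%{speed_download}' prints the measured speed.
result = subprocess.run(
    [sys.executable, '-c', 'print(123456.0)'],
    capture_output=True, text=True
)

print(result.returncode)      # exit status of the child process
print(repr(result.stdout))    # captured standard output, as a str
print(repr(result.stderr))    # captured standard error, also a str
```

Because the command is passed as a list, no shell is involved and the mirror URL does not need any quoting.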

The next few lines do the following:

speed_bps = float(result.stdout.strip())

  • result.stdout contains the output from the curl command, which is the download speed in bytes per second (Bps).

  • .strip() removes any leading or trailing whitespace from the output.

  • float() converts the cleaned string output to a floating-point number, representing the download speed in bytes per second.

speed_kbps = speed_bps / 1024

  • This line converts the download speed from bytes per second to kilobytes per second by dividing the value by 1024 (since 1 KB = 1024 bytes).

return mirror, speed_kbps

  • This line returns a tuple containing the mirror URL and the calculated download speed in kilobytes per second (see Tuple - Wikipedia).

The last few lines are exception handling — see: https://en.wikipedia.org/wiki/Exception_handling_(programming).
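As a rough illustration of why the try/except matters, here is a sketch of just the parsing step. The helper name parse_speed_kbps is hypothetical (it is not in the script), and it catches only ValueError, whereas the script catches the broader Exception:

```python
def parse_speed_kbps(curl_output):
    """Convert curl's bytes-per-second text to KB/s; return 0 on bad input."""
    try:
        return float(curl_output.strip()) / 1024
    except ValueError:
        # float('') and float('not a number') both raise ValueError
        return 0

good = parse_speed_kbps('204800\n')   # 204800 bytes/s is 200 KB/s
empty = parse_speed_kbps('')          # nothing to parse, falls back to 0
print(good, empty)
```

Returning 0 on failure fits the rest of the script, which later drops every mirror whose speed is not greater than 0.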

The next few lines also do quite a lot:

# Test each mirror with a 2-second timeout using ThreadPoolExecutor
print("Testing mirrors for speed...")
with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(test_mirror_speed, filtered_mirrors))

with ThreadPoolExecutor(max_workers=10) as executor:

  • This line creates a ThreadPoolExecutor instance named executor.

  • ThreadPoolExecutor is a class from the concurrent.futures module that provides a high-level interface for asynchronously executing callables using a pool of threads.

  • The max_workers=10 argument specifies that the thread pool should have a maximum of 10 worker threads.

results = list(executor.map(test_mirror_speed, filtered_mirrors))

  • executor.map(test_mirror_speed, filtered_mirrors):

    • The map method of ThreadPoolExecutor takes a function and an iterable as arguments (see Iterables in Python - Python Geeks). It applies the function to each item in the iterable concurrently using the thread pool.

    • In this case, the test_mirror_speed function is applied to each mirror URL in the filtered_mirrors list.

  • The map method returns an iterator that yields the results of the function calls.

  • list(executor.map(test_mirror_speed, filtered_mirrors))

    • The list function converts the iterator returned by map into a list.

    • This list contains the results of the test_mirror_speed function for each mirror URL in filtered_mirrors.

Context of with statement:

  • The with statement ensures that the ThreadPoolExecutor is properly cleaned up after its block of code is executed. This includes shutting down the thread pool and releasing any resources it holds.

In summary, these lines create a thread pool with up to 10 worker threads, use it to concurrently test the speed of multiple mirrors by applying the test_mirror_speed function to each mirror URL in the filtered_mirrors list, and store the results in the results list. This approach helps to speed up the process of testing multiple mirrors by leveraging concurrent execution (see Concurrency (computer science) - Wikipedia).
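Here is a small sketch of the same pattern that runs without any network access. The fake_speed_test function and the names list are made up for illustration; the function sleeps instead of downloading anything:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_speed_test(name):
    """Hypothetical stand-in for test_mirror_speed: sleeps instead of downloading."""
    time.sleep(0.2)
    return name, len(name)

names = ['alpha', 'be', 'gamma-long']

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as executor:
    # map() yields results in the same order as the input list,
    # even though the calls themselves run concurrently.
    results = list(executor.map(fake_speed_test, names))
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s")
```

Since the three calls overlap, the elapsed time should land well below the 0.6 seconds that running them one after another would take.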

# Filter out mirrors with speeds of 0.0 KB/s
valid_results = [result for result in results if result[1] > 0]

# Sort mirrors by speed and get the top 5
sorted_results = sorted(valid_results, key=lambda x: x[1], reverse=True)[:5]

The first of these lines creates a new list called valid_results.

  • It uses a list comprehension to iterate over each result in the results list.

    • The condition if result[1] > 0 checks if the second element of each result tuple (which represents the speed in KB/s) is greater than 0 and only the results that meet this condition are included in the valid_results list.

  • This effectively filters out any mirrors that have a speed of 0.0 KB/s, ensuring that only valid mirrors with non-zero speeds are considered.

The next line sorts the valid_results list based on the download speed.

  • The sorted function is used to sort the list.

  • The key=lambda x: x[1] argument specifies that the sorting should be done based on the second element of each tuple (the speed in KB/s). See Python Lambda (w3schools.com).

  • The reverse=True argument sorts the list in descending order, so the mirrors with the highest speeds come first.

  • The [:5] slice notation selects the top 5 mirrors from the sorted list.

The resulting list, sorted_results, contains the top 5 mirrors with the highest speeds.
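The filter-then-sort step is easy to try with made-up numbers. The tuples below are invented sample data in the same (mirror, speed) shape the script produces, and the slice keeps the top 2 instead of the top 5 to keep the example short:

```python
# Made-up results in the same (mirror, speed_kbps) shape the script produces.
results = [
    ('http://a.example.com/ubuntu', 350.5),
    ('http://b.example.com/ubuntu', 0),
    ('http://c.example.com/ubuntu', 1200.0),
    ('http://d.example.com/ubuntu', 75.25),
]

# Drop mirrors that reported no speed, then sort fastest-first.
valid_results = [r for r in results if r[1] > 0]
top_two = sorted(valid_results, key=lambda x: x[1], reverse=True)[:2]

for mirror, speed in top_two:
    print(f"{mirror} --> {speed:.2f} KB/s")
```

If fewer mirrors survive the filter than the slice asks for, the slice simply returns what is there, so no extra length check is needed.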

# Print the top 5 fastest mirrors
print("Top 5 fastest mirrors:")
for mirror, speed in sorted_results:
    print(f"{mirror} --> {speed:.2f} KB/s")

And finally, we print the top 5 fastest mirrors.

Running

Python scripts don't need a separate build step. To run the script, enter:

python3 ./apt-speedy.py

at a Bash or similar prompt.

Full Source

import requests
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Fetch the HTML list of Ubuntu mirrors and extract URLs of up-to-date mirrors
response = requests.get('https://launchpad.net/ubuntu/+archivemirrors')
#print(response.text)

mirrors = re.findall(r'(https?://[^\"]+)', response.text)
filtered_mirrors = [mirror for mirror in mirrors if 'statusUP' in response.text.split(mirror)[1]]

# Function to test mirror speed
def test_mirror_speed(mirror):
    try:
        print(f"testing mirror {mirror}")
        result = subprocess.run(
            ['curl', '--max-time', '2', '-r', '0-102400', '-s', '-w', '%{speed_download}', '-o', '/dev/null', f'{mirror}/ls-lR.gz'],
            capture_output=True, text=True
        )
        speed_bps = float(result.stdout.strip())
        speed_kbps = speed_bps / 1024
        return mirror, speed_kbps
    except Exception as e:
        print(f"Error testing mirror {mirror}: {e}")
        return mirror, 0

# Test each mirror with a 2-second timeout using ThreadPoolExecutor
print("Testing mirrors for speed...")
with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(test_mirror_speed, filtered_mirrors))

# Filter out mirrors with speeds of 0.0 KB/s
valid_results = [result for result in results if result[1] > 0]

# Sort mirrors by speed and get the top 5
sorted_results = sorted(valid_results, key=lambda x: x[1], reverse=True)[:5]

# Print the top 5 fastest mirrors
print("Top 5 fastest mirrors:")
for mirror, speed in sorted_results:
    print(f"{mirror} --> {speed:.2f} KB/s")

Make Mirror Permanent:

At some point you are probably going to want to make whatever is at the top of the list permanent. Here’s a step-by-step guide to locate the “Software & Updates” settings in Ubuntu 22.04:

  • Open the Activities Overview:

    • You can do this by clicking on the top-left corner of your screen.

  • Search for “Software & Updates”:

    • In the search bar, type “Software & Updates”.

  • Open “Software & Updates”:

    • Click on the “Software & Updates” application to open it.

  • Select Your Mirror:

    • In the “Software & Updates” window, go to the Ubuntu Software tab.

    • Click on the Download from dropdown menu and select Other.

    • Click on the mirror you want from the list provided by the Python script.

Incidentally I tried to use “Select Best Server,” but I invariably found it was picking one that was NOT the best.

Illustrated version:

activities overview location

Activities Overview as it appears in a default Ubuntu 20.04 installation.

software and updates location

Software & Updates as it appears in the Activities Overview.

Software and Updates Dialog box

Ubuntu Software Download as seen in the Software & Updates app.

Differences between APT and APT-GET

https://itsfoss.com/apt-vs-apt-get-difference/

In a nutshell: apt-get is old and apt is the new way.

Alternatives

Synaptic, https://en.wikipedia.org/wiki/Synaptic_(software), though it also will use the mirror above if provided.

One that I particularly loathe is Snap https://en.wikipedia.org/wiki/Snap_(software) because it doesn’t do well in virtual machines.

The Red Hat one: https://en.wikipedia.org/wiki/RPM_Package_Manager.

Arch pacman: https://en.wikipedia.org/wiki/Arch_Linux#Pacman

Nala was one I didn’t know about when I wrote this article. I might check it out! https://www.omgubuntu.co.uk/2023/01/install-nala-on-ubuntu and https://github.com/volitank/nala.

Feedback

As always, do make a comment or write me an email if you have something to say about this post!
