Stefan's Blog

Catching exceptions in Threadpool executors

Context #

We need to collect data from hundreds of Git repositories, analyzing each one and extracting some information. Each repository contains some metadata, and we need to transform a string formatted as yyyy-mm-dd into a datetime object in Python. Since Python 3.7, we can use the datetime.fromisoformat() method for this.
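A minimal sketch of that conversion, assuming the metadata value is a plain yyyy-mm-dd string (the sample value below is made up). It also compares the result with strptime(), the long-standing equivalent for this format:

```python
from datetime import datetime

raw = "2020-05-17"  # hypothetical metadata value

# fromisoformat() requires Python 3.7 or newer
parsed = datetime.fromisoformat(raw)
print(parsed)

# strptime() with an explicit format yields the same datetime
assert parsed == datetime.strptime(raw, "%Y-%m-%d")
```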

Problem #

We have a lot of repositories, so we'd like to parallelize the data collection. One of the easiest ways is to use the concurrent.futures API, in particular the ThreadPoolExecutor. Each repository analysis calls the fromisoformat() function. We deploy our code on a powerful server and see no errors, yet the data we want is not collected.

We can quickly hypothesize that the ThreadPoolExecutor has something to do with it. The first thing to try is analyzing each repository sequentially, in a plain for loop. I run this on my machine and it yields no errors; everything works as expected. I then deploy the sequential version on the server and finally get the error: fromisoformat() is not available.

What happened? I ran the program on my local machine with Python 3.8, but the server had Python 3.6. The easiest fix would have been to upgrade the server to a newer Python version, but that would not fix our less-than-optimal code.

So we have two issues at hand: the Python versions differ, and no exceptions surface. Let's look at how to make sure we catch the exceptions when using ThreadPoolExecutor or ProcessPoolExecutor.

How to solve the suppression of errors & exceptions #

According to the documentation of Executor.map(): "If a func call raises an exception, then that exception will be raised when its value is retrieved from the iterator." Now, what happens if you have a piece of code like the following?

import concurrent.futures


def process_file(file):
    ''' Read a file and output its content '''
    with open(file, 'r') as f:
        lines = f.readlines()
        print(lines)


def main():
    files = ["a", "b", "c"]
    # use a thread pool executor to call the process_file method concurrently, using four workers
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        executor.map(process_file, files)


if __name__ == "__main__":
    main()

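The answer is: you get no output and no exception. executor.map() returns a lazy iterator, and each exception is only re-raised when the corresponding result is read from it. Since the code never iterates over the results, the FileNotFoundError raised for each missing file never surfaces.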
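A minimal sketch of this behavior, using a hypothetical worker fail_on_b in place of the file reader: the same map() call stays silent when its results are ignored and raises once they are iterated.

```python
import concurrent.futures


def fail_on_b(name):
    """Hypothetical worker: raises for one input, upper-cases the others."""
    if name == "b":
        raise ValueError("cannot process {}".format(name))
    return name.upper()


def run(consume):
    """Map the worker over three inputs; read the results only if consume is True."""
    collected = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        results = executor.map(fail_on_b, ["a", "b", "c"])
        if consume:
            # Iterating re-raises the worker's exception at the failing item
            for value in results:
                collected.append(value)
    return collected


# Results are never read, so the ValueError stays hidden
print(run(consume=False))

try:
    run(consume=True)
except ValueError as exc:
    print("caught:", exc)
```

Simply looping over the return value of map() is therefore enough to make a hidden failure visible in the calling thread.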

The easiest way to fix this is to use a try/except clause inside the function that runs in the worker thread:

import concurrent.futures


def process_file(file):
    ''' Try reading a file and output its content. Print a message if the file does not exist '''
    try:
        with open(file, 'r') as f:
            lines = f.readlines()
            print(lines)
    except FileNotFoundError:
        print("I cannot find the file {}".format(file))


def main():
    files = ["a", "b", "c"]
    # use a thread pool executor to call the process_file method concurrently, using four workers
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        executor.map(process_file, files)


if __name__ == "__main__":
    main()

The output will look like this:

$ python python-concurrency-example.py
I cannot find the file a
I cannot find the file b
I cannot find the file c
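A try/except inside the worker only handles the exceptions you anticipated. As an alternative sketch (not the approach used above), submit() returns one Future per task, and calling result() on it re-raises whatever the worker raised, so unexpected errors are reported as well. The files parameter and the failures dictionary are additions for illustration; the default file names are the same placeholders as before.

```python
import concurrent.futures


def process_file(file):
    """Read a file and return its lines; any exception is stored on the Future."""
    with open(file, "r") as f:
        return f.readlines()


def main(files=("a", "b", "c")):
    failures = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        # Remember which Future belongs to which file
        futures = {executor.submit(process_file, name): name for name in files}
        for future in concurrent.futures.as_completed(futures):
            name = futures[future]
            try:
                print(name, future.result())
            except Exception as exc:
                # result() re-raises the worker's exception in this thread
                failures[name] = exc
                print("{} failed: {!r}".format(name, exc))
    return failures


if __name__ == "__main__":
    main()
```

Collecting the failures in the calling thread keeps the worker function free of error handling and makes it possible to log or retry them in one place.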

Lessons learned #

  1. Always make sure that the local development environment and the server have the same versions of Python and of the libraries. Use venv/virtualenv or a Docker image to ensure that the environments match.
  2. Use try/except on code that might throw exceptions, even if you're 99% sure that it will not.
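One way to turn the first lesson into a fail-fast check is to verify the interpreter version at start-up, in the main thread, before any work is handed to an executor. A minimal sketch, assuming 3.7 is the minimum because of fromisoformat(); the name require_python and its parameters are hypothetical.

```python
import sys

MIN_VERSION = (3, 7)  # assumed minimum: fromisoformat() was added in Python 3.7


def require_python(min_version=MIN_VERSION, current=None):
    """Raise RuntimeError if the running interpreter is older than min_version."""
    if current is None:
        current = tuple(sys.version_info[:2])
    if current < min_version:
        raise RuntimeError(
            "Python {}.{}+ required, found {}.{}".format(*min_version, *current)
        )


# Call this first thing in the program, before creating the thread pool
require_python()
```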