Python: How to await all async tasks

Date: 2019-09-02 | python | async | await |

problem

I'm new to Python, coming from a C# background. I'm building a program that downloads a lot of images, so it would be useful to wrap each download / write in an async task and then run those downloads / IO writes concurrently. How can I parallelize these fetches and writes using the built-in asyncio library (and the external aiohttp library) in a similar fashion to C#'s Task.WhenAll(MYTASKS)?

solution

Here's an example. Obviously there are a bajillion ways to do this, but this example shows how to start your script so that asyncio can drive it, and how to create a list of tasks and await them all.

Note: This is written against the latest (as of writing) Python version, 3.7.3. Other versions may require slightly different code.
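For instance, Python 3.6 and earlier don't have asyncio.run, so you have to drive the event loop yourself. A rough sketch of the older equivalent (older docs usually use asyncio.get_event_loop(); creating a fresh loop explicitly also works):

```python
import asyncio

async def main():
    await asyncio.sleep(0.01)
    return "done"

# pre-3.7 equivalent of asyncio.run(main()):
# create an event loop, run the coroutine to completion, then close the loop
loop = asyncio.new_event_loop()
try:
    result = loop.run_until_complete(main())
finally:
    loop.close()

print(result)
```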

The code:

import asyncio
import time

async def main():
    async_tasks = []
    number_of_tasks = 10

    # wait times of 0 through 9 seconds
    wait_times_seconds = list(range(number_of_tasks))

    # create a coroutine for each wait time -- nothing runs yet
    for wait_seconds in wait_times_seconds:
        async_tasks.append(wait_for_time_async(wait_seconds))

    start_time = time.time()
    # run all of the tasks concurrently and wait for them all to finish
    await asyncio.gather(*async_tasks)
    end_time = time.time()
    print("total operation time = ", (end_time - start_time))

async def wait_for_time_async(seconds: int) -> None:
    # asyncio.sleep stands in for a real async operation
    # (e.g. an image download or file write)
    await asyncio.sleep(seconds)

asyncio.run(main())

So what does this code do?

  • uses asyncio.run(MYFUNCTION) at the bottom of the script to give asyncio control of the async operations
  • creates a list of 10 wait times from [0, 10) and stores it in wait_times_seconds
  • creates the coroutines that actually do the waiting (via asyncio.sleep(SECONDS) inside wait_for_time_async) and stores them in async_tasks
  • awaits all of the tasks at once using asyncio.gather(*ASYNC_TASKS)
  • prints out the total operation time so we can see whether we actually ran concurrently
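One detail worth knowing for the download use case: asyncio.gather also collects the return values of the awaitables, in the same order you passed them in, so you can get each response body back out. A minimal sketch (fetch_image_async is a hypothetical stand-in for a real aiohttp download):

```python
import asyncio

async def fetch_image_async(url: str) -> bytes:
    # hypothetical stand-in for a real download via aiohttp
    await asyncio.sleep(0.01)
    return ("image bytes from " + url).encode()

async def download_all(urls):
    tasks = [fetch_image_async(url) for url in urls]
    # gather returns the results in the same order as the inputs
    return await asyncio.gather(*tasks)

urls = ["https://example.com/a.png", "https://example.com/b.png"]
images = asyncio.run(download_all(urls))
print(len(images))  # one result per url
```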

The output of this on my machine is:

$ python async_example.py
total operation time =  9.002383708953857

So the waits overlapped: this method works for running our tasks concurrently via async.

To sanity check, we know that we created 10 tasks with wait times from 0 - 9 seconds and then awaited all of them. If they had waited sequentially, we would've waited 0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 = 45 seconds, but based on the output we only waited 9 seconds (the largest wait time in our list), so the tasks really did run concurrently.
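You can run the same sanity check much faster with fractional wait times. Assuming the sleeps overlap, the elapsed wall time should be close to the longest single wait, not the sum of all of them:

```python
import asyncio
import time

async def wait_for_time_async(seconds: float) -> None:
    await asyncio.sleep(seconds)

async def main() -> float:
    # five waits: 0.0 + 0.1 + 0.2 + 0.3 + 0.4 = 1.0s if run sequentially,
    # but only ~0.4s (the longest wait) when run concurrently
    waits = [i * 0.1 for i in range(5)]
    start = time.time()
    await asyncio.gather(*(wait_for_time_async(w) for w in waits))
    return time.time() - start

elapsed = asyncio.run(main())
print(elapsed)  # roughly 0.4, far less than the 1.0s sequential total
```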

source

  • I ran into a similar problem while trying to parallelize Unsplash image downloads for my Blinder project.

did this help?

I regularly post about tech topics I run into. You can get a periodic email containing updates on new posts and things I've built by subscribing here.

Want more like this?

The best / easiest way to support my work is by subscribing for future updates and sharing with your network.