I love Aria2, but I’m building a web scraper/crawler and I need to download hundreds of thousands of files. Aria2 locks up around the 20,000-file mark. Is there another download manager that can handle this, or a more recent fork of Aria2?

I believe I have a workaround: use the API to check how many files are in the queue and sleep until there are fewer than 1,000. I’m not sure this is the most effective approach, though, because it significantly slows down the download pipeline.
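A minimal sketch of that throttling workaround, assuming aria2 was started with `--enable-rpc` and is listening on its default JSON-RPC port (6800). The helper names `aria2_rpc` and `feed_queue` are made up for illustration; `aria2.getGlobalStat` and `aria2.addUri` are the RPC methods being called. Rather than polling before every single URL, it asks once how many slots are free and fills them before polling again, which may reduce the slowdown:

```python
import json
import time
import urllib.request

RPC_URL = "http://localhost:6800/jsonrpc"  # assumption: default aria2 RPC endpoint


def aria2_rpc(method, params=None, url=RPC_URL, secret=None):
    """Call one aria2 JSON-RPC method and return its result."""
    params = list(params or [])
    if secret:
        # If --rpc-secret is set, the token goes first in the parameter list.
        params.insert(0, f"token:{secret}")
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": "feeder", "method": method, "params": params}
    ).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["result"]


def feed_queue(urls, rpc=aria2_rpc, max_queued=1000, poll_seconds=2.0, sleep=time.sleep):
    """Add URLs to aria2, pausing while active + waiting downloads fill the budget."""
    added = 0
    free_slots = 0
    for url in urls:
        while free_slots <= 0:
            stat = rpc("aria2.getGlobalStat")  # values come back as strings
            free_slots = max_queued - int(stat["numActive"]) - int(stat["numWaiting"])
            if free_slots <= 0:
                sleep(poll_seconds)
        rpc("aria2.addUri", [[url]])  # one download, given as a list of mirror URIs
        free_slots -= 1
        added += 1
    return added
```

Calling `feed_queue(url_iterable)` from the crawler would then keep at most roughly `max_queued` downloads pending in aria2 at any time.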

The issue seems to lie with connections timing out in aria2: they get locked up and have to be cleared manually. I have my timeout set to 10 seconds, but that doesn’t seem to matter. I’ve considered running a scheduled task to clean them up, but I was going to try downloading with Python first.
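For the Python route, one possible shape is a bounded thread pool where each request carries its own timeout, so a stalled connection fails and is recorded instead of blocking the run. This is only a sketch using the standard library; `fetch` and `download_all` are hypothetical names, and the 10-second value mirrors the timeout mentioned above. Note that the `urlopen` timeout applies to each blocking socket operation, not to the download as a whole.

```python
import shutil
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, dest, timeout=10):
    """Download one URL to dest; timeout covers each blocking socket operation."""
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def download_all(jobs, workers=16, fetch_one=fetch):
    """Run (url, dest) jobs on a bounded thread pool; return (succeeded, failed)."""
    succeeded, failed = [], []

    def attempt(job):
        url, dest = job
        try:
            fetch_one(url, dest)
            return url, None
        except Exception as exc:  # timeouts and HTTP errors are collected, not raised
            return url, exc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the input jobs.
        for url, error in pool.map(attempt, jobs):
            if error is None:
                succeeded.append(url)
            else:
                failed.append((url, error))
    return succeeded, failed
```

The `failed` list can be fed back into `download_all` for a retry pass, which takes the place of manually clearing stuck downloads.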

Any suggestions would be appreciated.

  • tetris11@lemmy.ml
    10 months ago

    Interesting. Do you have a source on the 20,000-file limit? It could just be that you need to increase the number of allowed file descriptors on your OS.
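To check the file descriptor suggestion, the current limits can be read from Python on Unix-like systems with the standard `resource` module (it is not available on Windows). This is a small sketch; `open_file_limits` is a made-up helper name, and it reports the limits of the Python process itself, which a separately launched aria2 may or may not share. The shell equivalent for the soft limit is `ulimit -n`.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = open_file_limits()
print(f"soft limit: {soft}, hard limit: {hard}")
```

If the soft limit turns out to be low, that would be consistent with the file descriptor theory, though it does not prove it.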