Has anyone used ArchiveBox for self-hosted web archiving? If so, how does it compare to the Internet Archive or other publicly available services?

  • hoodlem@hoodlem.me · 3 upvotes · 1 year ago

    I used it, but unfortunately it did not meet my needs. I’m interested in a full mirror of a website, while ArchiveBox focuses on a single webpage and follows links at most 1 level deep. I use wget personally, but if your goal is to archive a single webpage, then ArchiveBox might be a good choice.
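    For reference, a full-site mirror with wget can be sketched roughly as below. This is only an illustration: the helper name `build_mirror_command`, the example URL, and the `mirror` output folder are assumptions, and the exact flags worth using depend on the site. The options themselves are standard GNU wget options.

```python
import shlex


def build_mirror_command(url: str, dest_dir: str = "mirror") -> list[str]:
    """Return a wget argument list for mirroring a site (hypothetical helper)."""
    return [
        "wget",
        "--mirror",            # recursive download with timestamping
        "--convert-links",     # rewrite links so the copy browses offline
        "--adjust-extension",  # give saved HTML/CSS files matching extensions
        "--page-requisites",   # also fetch images, CSS, and scripts
        "--no-parent",         # never climb above the starting directory
        "--wait=1",            # pause one second between requests
        f"--directory-prefix={dest_dir}",  # assumed output folder
        url,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(build_mirror_command("https://example.com/docs/")))
```

    To actually run it, the list can be passed to `subprocess.run(cmd, check=True)`.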

    • stricken_liftoff@feddit.ch (OP) · 2 upvotes · 1 year ago

      Thanks for the info! Single page with no link following is all I need for this project, so I’ll give it a go.

  • ThorrJo@lemmy.sdf.org · 2 upvotes · 1 year ago

    I have been experimenting with it. For what it is, it works pretty well … for now. I’m concerned that it’s a ton of moving parts basically duct-taped together by an abuse of the Django admin (Django is the web app platform it’s based on, and I was a developer for it long ago). Also, the search function is primitive at best. I don’t think it’s going to be my long-term solution for this need, but maybe I’m wrong.

    • oldfart · 1 upvote · 1 year ago

      The archived pages are available as files on disk. I also added a script that generates index.html so I can browse the archive without starting the program. Basically, the only time I run ArchiveBox code is when adding a new site. And I never look at the GUI; it brings nothing to the table.
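      A script like that could look roughly like the sketch below. It is not the script mentioned above, just an illustration under one assumption: that each archived page lives in its own folder under a single archive directory, with an `index.html` inside it. The function names and the default `archive` path are made up for the example.

```python
import html
import sys
from pathlib import Path


def build_index(archive_dir: Path) -> str:
    """Return an HTML page linking to every snapshot folder in archive_dir."""
    items = []
    for snapshot in sorted(p for p in archive_dir.iterdir() if p.is_dir()):
        # Assumed layout: <archive_dir>/<snapshot>/index.html
        if not (snapshot / "index.html").is_file():
            continue  # skip folders that hold no saved page
        href = html.escape(f"{snapshot.name}/index.html", quote=True)
        label = html.escape(snapshot.name)
        items.append(f'<li><a href="{href}">{label}</a></li>')
    body = "\n".join(items)
    return f"<!doctype html>\n<title>Archive index</title>\n<ul>\n{body}\n</ul>\n"


def write_index(archive_dir: Path) -> Path:
    """Write the generated page to <archive_dir>/index.html and return its path."""
    out = archive_dir / "index.html"
    out.write_text(build_index(archive_dir), encoding="utf-8")
    return out


if __name__ == "__main__":
    # Usage: python make_index.py [archive_dir]; "archive" is an assumed default.
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("archive")
    print(write_index(target))
```

      The resulting file uses relative links, so it can be opened straight from disk in a browser.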

  • BustedPancake@lemmy.world · 1 upvote · 1 year ago

    It’s a great tool, but it depends on what you expect from it and on your use case. Personally, I tried it but was always disappointed by it. I always end up using SingleFile(Z) in my browser or on the CLI, along with the usual yt-dlp and the like, and that’s really all I need. If I need to save an entire site, I just use wget or httrack. I don’t really need a browsable archive of my saved pages; I usually sort them by subject when saving.