Thoughts on ArchiveBox for archiving webpages?

stricken_liftoff@feddit.ch · 1 year ago

Thoughts on ArchiveBox for archiving webpages?

hoodlem@hoodlem.me · 1 year ago

I used it but unfortunately it did not meet my needs. I’m interested in a full mirror of a website, while ArchiveBox focuses on a single webpage with a max of 1 level deep. I use wget personally, but if your goal is to archive a single webpage then ArchiveBox might be a good choice.
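
For anyone curious what a full-site mirror with wget can look like, here is a minimal sketch. The flag selection is an assumption, not necessarily the exact setup described above, and `wget_mirror_command` and the `mirror` destination folder are hypothetical names used only for illustration. It just builds the command so it can be inspected before running.

```python
import shlex


def wget_mirror_command(url: str, dest: str = "mirror") -> list[str]:
    """Build a wget command line for mirroring a site (flag choice is an assumption)."""
    return [
        "wget",
        "--mirror",                    # recursive download with timestamping
        "--convert-links",             # rewrite links so the copy works offline
        "--adjust-extension",          # save HTML pages with an .html suffix
        "--page-requisites",           # also fetch images, CSS and similar assets
        "--no-parent",                 # never climb above the starting directory
        f"--directory-prefix={dest}",  # hypothetical output folder
        url,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(wget_mirror_command("https://example.com/")))
```

To actually run it, the list can be passed to `subprocess.run(cmd, check=True)`.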

stricken_liftoff@feddit.ch · 1 year ago

Thanks for the info! Single page with no link following is all I need for this project, so I’ll give it a go.

Fryboyter@discuss.tchncs.de · edited 1 year ago

I don’t particularly like the graphical interface as shown at https://demo.archivebox.io/public/. In my opinion, too much is displayed at once.

For my part, I use Wallabag to save single web pages. I think its graphical interface is better, but it is not perfect either.

stricken_liftoff@feddit.ch · 1 year ago

I’ll check Wallabag out as well

ThorrJo@lemmy.sdf.org · 1 year ago

I have been experimenting with it. For what it is, it works pretty well … for now. I have concerns about the fact that it’s a ton of moving parts basically duct-taped together by an abuse of the Django admin (Django is the web app platform it’s based on, which I was a developer for long ago). Also, the search function is primitive at best. I don’t think it’s going to be my long-term solution for this need, but maybe I’m wrong.

oldfart · 1 year ago

The archived pages are available as files on disk. I also added a script which generates index.html so I can browse the archive without starting the program. Basically the only time I run ArchiveBox code is when adding a new site. And I never look at the GUI; it brings nothing to the table.
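
A script like that can be quite short. Below is a rough sketch, not the actual script mentioned above. It assumes each snapshot sits in its own subfolder with an `index.html` inside, which may not match every setup, and `build_index` and `write_index` are hypothetical names.

```python
import html
from pathlib import Path
from urllib.parse import quote


def build_index(archive_dir: Path) -> str:
    """Return an HTML page that links to every snapshot folder in archive_dir."""
    rows = []
    # Assumption: one subfolder per snapshot, each holding its own index.html.
    for snapshot in sorted(p for p in archive_dir.iterdir() if p.is_dir()):
        if (snapshot / "index.html").is_file():
            href = quote(snapshot.name) + "/index.html"  # URL-safe relative link
            label = html.escape(snapshot.name)           # HTML-safe link text
            rows.append(f'<li><a href="{href}">{label}</a></li>')
    items = "\n".join(rows)
    return f"<!doctype html>\n<title>Archive</title>\n<ul>\n{items}\n</ul>\n"


def write_index(archive_dir: Path) -> Path:
    """Write the generated page to archive_dir/index.html and return its path."""
    out = archive_dir / "index.html"
    out.write_text(build_index(archive_dir), encoding="utf-8")
    return out
```

Calling `write_index(Path("archive"))` after each new capture would keep the static page current, with `archive` standing in for wherever the snapshots live.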

BustedPancake@lemmy.world · 1 year ago

It’s a great tool, but it depends on what you expect from it and on your use case. Personally, I tried it but was always disappointed by it. I always end up using SingleFile(Z) in my browser or on the CLI, along with the usual yt-dlp and the like, and that’s really all I need. And if I need to save an entire site, I just use wget or HTTrack. I don’t really need a browsable archive of my saved pages; I usually sort them by subject when saving.

GitHub - ArchiveBox/ArchiveBox: 🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...