To download a website for offline browsing using wget, you can use the following command:
wget --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains example.com --no-parent http://www.example.com
Replace example.com with the website you want to download. This command will download the entire website, including all the elements that compose the page (images, CSS, etc.), and convert the links so that they work locally and offline[1].
Here’s a brief explanation of the options used in the command:
--recursive: Download the entire website recursively.
--no-clobber: Don't overwrite existing files.
--page-requisites: Download all the elements required to display the page properly (images, CSS, etc.).
--html-extension: Save files with the .html extension (recent wget versions call this option --adjust-extension).
--convert-links: Convert links so that they work locally and offline.
--restrict-file-names=windows: Modify filenames so that they also work on Windows.
--domains example.com: Don't follow links outside the specified domain.
--no-parent: Don't ascend to the parent directory when recursing, so the download stays within the starting path.
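Once the download finishes, wget saves everything under a directory named after the host. As a quick check (assuming the site's root page was saved as index.html, which is not guaranteed), you can open the local copy in a browser:
xdg-open www.example.com/index.html
On macOS, use open instead of xdg-open.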
If the website uses external resources like jQuery, you can try using the --span-hosts option to download resources from other domains as well:
wget --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains example.com --span-hosts --no-parent http://www.example.com
Note that when --span-hosts is combined with --domains, wget only spans to hosts in the --domains list, so add any external hosts you need (for example, the CDN that serves jQuery) to that list.
However, this may also download additional content from other domains that you might not need. To avoid downloading unnecessary content, you can manually download the required resources (like jQuery) and update the HTML files to use the local copies instead.
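As a minimal sketch of that manual approach (the jQuery URL and version are placeholders, the sed rewrite assumes the pages reference that exact URL, and the relative js/ path only resolves correctly for pages at the mirror's root):
wget -P www.example.com/js/ https://code.jquery.com/jquery-3.7.1.min.js
find www.example.com -name '*.html' -exec sed -i 's|https://code.jquery.com/jquery-3.7.1.min.js|js/jquery-3.7.1.min.js|g' {} +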
Keep in mind that some websites may have measures in place to prevent downloading their content using tools like wget. In such cases, you may need to adjust the command options or use alternative methods to download the website for offline browsing[6].
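For example (a sketch, not a guarantee: whether this works, and whether it is permitted, depends on the site), you can present a browser User-Agent, ignore robots.txt, and pace the requests:
wget --recursive --page-requisites --convert-links --no-parent --wait=2 --random-wait -e robots=off -U 'Mozilla/5.0 (X11; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0' http://www.example.com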
Citations:
[1] https://www.linuxjournal.com/content/downloading-entire-web-site-wget
[2] https://winaero.com/make-offline-copy-of-a-site-with-wget-on-windows-and-linux/amp/
[3] https://stackoverflow.com/questions/10842263/wget-download-for-offline-viewing-including-absolute-references
[4] https://askubuntu.com/questions/391622/download-a-whole-website-with-wget-or-other-including-all-its-downloadable-con
[5] https://superuser.com/questions/970323/using-wget-to-copy-website-with-proper-layout-for-offline-browsing
[6] https://www.computerhope.com/unix/wget.htm
[7] https://superuser.com/questions/1672776/download-whole-website-wget
[8] https://gist.github.com/stvhwrd/985dedbe1d3329e68d70
[9] https://simpleit.rocks/linux/how-to-download-a-website-with-wget-the-right-way/
[10] https://www.guyrutenberg.com/2014/05/02/make-offline-mirror-of-a-site-using-wget/
[11] https://linuxreviews.org/Wget:_download_whole_or_parts_of_websites_with_ease
[12] https://brain-dump.space/articles/how-to-get-full-offline-website-copy-using-wget-on-mac-os/
[13] https://dev.to/jjokah/how-to-download-an-entire-website-for-offline-usage-using-wget-2lli
[14] https://alvinalexander.com/linux-unix/how-to-make-offline-mirror-copy-website-with-wget
[15] https://askubuntu.com/questions/979655/using-wget-and-having-websites-working-properly-offline
A popular short form of the same recipe, followed by its long-option equivalent:
wget -mkEpnp http://example.org
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.org
Explanation of the various flags:
--mirror (-m): Turns on recursion and time-stamping with unlimited depth; shorthand for -r -N -l inf --no-remove-listing.
--convert-links (-k): Converts links in the downloaded documents (including references to stylesheets) so they are suitable for offline viewing.
--adjust-extension (-E): Adds suitable extensions (.html, .css) to filenames depending on their content type.
--page-requisites (-p): Downloads things like stylesheets and images required to display the page properly offline.
--no-parent (-np): Does not ascend to the parent directory when recursing, which restricts the download to a portion of the site.
A more aggressive variant that also spans hosts, ignores robots.txt, and identifies itself as Firefox:
wget -mpHkKEb -t 1 -e robots=off -U 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0' http://www.example.com
Here -H spans to other hosts, -K (--backup-converted) keeps the original of each file before link conversion, -b runs the download in the background, -t 1 limits each URL to a single try, -e robots=off disables robots.txt processing, and -U sets the User-Agent string.
Cronjobs
These crontab entries refresh the mirrors every night at 23:00 (the third entry runs only during January), and the 08:00 job kills any download still running and removes any wget-log files left behind:
0 23 * * * cd ~/Documents/Webs/mirror; wget -mpk -t 1 -e robots=off -U 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0' https://example.com
0 23 * * * cd ~/Documents/Webs/mirror; wget -mpkH -t 1 -e robots=off -U 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0' -D example.com https://example.com
0 23 * 1 * cd ~/Documents/Webs/mirror; wget -mpk -t 1 -e robots=off -U 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0' https://example.com
0 8 * * * pkill wget; cd ~/Documents/Webs/mirror/; rm wget*
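To use these, add the lines to your user crontab (the schedule and paths above are just the original poster's setup):
crontab -e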
I have only been using --page-requisites, but this is even better, thanks!