site

files for beauhilton.com
git clone https://git.beauhilton.com/site.git

2099-04-13-wget-outta-my-way.md (1378B)


---
layout: post
title: "Wget Outta My Way, diigo"
toc: true
image: https://source.unsplash.com/OfMq2hIbWMQ
tags:
  - wget
  - web
  - archiving
  - productivity
  - Python
  - Markdown
  - "academic writing"
---

## read-later

I've been thinking a lot about sustainable, preferably third-party-service-free ways to keep track of and use things I've read online.

[wget manual online](https://www.gnu.org/software/wget/manual/wget.html#Download-Options)

`wget -E -k -p https://www.nateliason.com/blog/smart-notes`

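- `-E` - adds the `.html` extension to the filename when the downloaded HTML file doesn't already have one
- `-p` - page requisites (downloads all resources necessary to properly render the page, e.g. images)
- `-nd` - no directories (saves everything into one directory instead of recreating the site's directory hierarchy)
- `-nH` - no host directories (skips the host-named top-level directory, e.g. `www.nateliason.com/`)
- `-H` - spans hosts when recursively retrieving, so requisites hosted on other domains are fetched too
- `-K` - "When converting a file, back up the original version with a ‘.orig’ suffix."
- `-k` - converts links for local viewing: links to downloaded files become relative, links to anything not downloaded become absolute URLs
- `-P` - specifies the download directory, creating it if it doesn't exist. The directory name can go directly after the flag, e.g. a directory called "web" would be specified as `-Pweb`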
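
The short flags are easy to mix up (`-k` vs. `-K`, `-p` vs. `-P`), so here is a minimal sketch that spells them out using the long option names from the wget manual. `LONG_FORM` and `expand_flags` are hypothetical names made up for this illustration, not part of wget or any library.

```python
# Short wget flags from the list above, paired with the long option
# names documented in the GNU Wget manual.
LONG_FORM = {
    "-E": "--adjust-extension",
    "-p": "--page-requisites",
    "-nd": "--no-directories",
    "-nH": "--no-host-directories",
    "-H": "--span-hosts",
    "-K": "--backup-converted",
    "-k": "--convert-links",
    "-P": "--directory-prefix",
}


def expand_flags(args):
    """Return a copy of args with known short flags spelled out.

    Hypothetical helper, for readability only. The attached form
    `-Pweb` is rewritten as `--directory-prefix=web`.
    """
    expanded = []
    for arg in args:
        if arg in LONG_FORM:
            expanded.append(LONG_FORM[arg])
        elif arg.startswith("-P") and len(arg) > 2:
            # Directory name glued to the flag, e.g. -Pweb
            expanded.append(f"--directory-prefix={arg[2:]}")
        else:
            # URLs and anything unrecognized pass through untouched
            expanded.append(arg)
    return expanded


command = ["wget", "-E", "-k", "-p", "https://www.nateliason.com/blog/smart-notes"]
print(" ".join(expand_flags(command)))
```

The long form is wordier, but it reads better in a script you will have to revisit months later.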

From the manual's entry for `-p`: "Actually, to download a single page and all its requisites (even if they exist on separate websites), and make sure the lot displays properly locally, this author likes to use a few options in addition to ‘-p’:"

`wget -E -H -k -K -p -Pweb https://www.nateliason.com/blog/smart-notes`
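
To avoid retyping that command for every page, it could be wrapped in a few lines of Python. This is only a sketch, assuming `wget` is installed and on the `PATH`; `build_wget_command` and `archive` are hypothetical names, and the `web` default simply mirrors the `-Pweb` above.

```python
import shutil
import subprocess


def build_wget_command(url, dest="web"):
    """Build the argument list for the single-page archive command above."""
    return ["wget", "-E", "-H", "-k", "-K", "-p", f"-P{dest}", url]


def archive(url, dest="web"):
    """Run wget for one URL and return its exit code."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget is not installed or not on PATH")
    # check=False: wget can exit non-zero when a single requisite fails
    # to download, even though the page itself was saved.
    result = subprocess.run(build_wget_command(url, dest), check=False)
    return result.returncode
```

Calling `archive("https://www.nateliason.com/blog/smart-notes")` would then run the same command as the one-liner and save the page under `web/`.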