Using wget to recursively fetch a directory with arbitrary files in it

I have a web directory where I store some config files. I'd like to use wget to pull those files down and maintain their current structure. For instance, the remote directory looks like:

http://mysite.com/configs/.vim/

.vim holds multiple files and directories. I want to replicate that on the client using wget. Can't seem to find the right combo of wget flags to get this done. Any ideas?


You have to pass the -np / --no-parent option to wget (in addition to -r / --recursive, of course); otherwise it will follow the link in the site's directory index up to the parent directory. So the command would look like this:

wget --recursive --no-parent http://example.com/configs/.vim/

To avoid downloading the auto-generated index.html files, use the -R / --reject option:

wget -r -np -R "index.html*" http://example.com/configs/.vim/
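The trailing * in the reject pattern matters because auto-generated directory listings often link to sorted variants of themselves, which wget tends to save under names such as index.html?C=N;O=D. The sketch below only approximates the matching with the shell's own glob rules; is_rejected is a hypothetical helper, not part of wget, and the file names are made-up examples.

```shell
#!/bin/sh
# is_rejected is a hypothetical helper, not part of wget. It uses the
# shell's glob matching to approximate how -R "index.html*" treats a
# file name.
is_rejected() {
    case $1 in
        index.html*) return 0 ;;  # matches the reject pattern
        *)           return 1 ;;  # anything else is kept
    esac
}

# Example names (assumed): a plain index, a sorted-listing variant,
# and an ordinary config file.
for name in 'index.html' 'index.html?C=N;O=D' 'vimrc'; do
    if is_rejected "$name"; then
        echo "reject: $name"
    else
        echo "keep:   $name"
    fi
done
```

Without the *, only the plain index.html would match and the sorted variants would still be saved.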

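To download a directory recursively while rejecting index.html* files, and to save it without the hostname, the parent directories, or the rest of the directory structure above it: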

wget -r -nH --cut-dirs=2 --no-parent --reject="index.html*" http://mysite.com/dir1/dir2/data
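To pick a value for --cut-dirs, it can help to work out where a given remote file would land locally. The following is a rough sketch of that path arithmetic, not wget's actual implementation: local_path is a hypothetical helper, and file.txt is an assumed file name under the data directory.

```shell
#!/bin/sh
# local_path is a hypothetical helper: it predicts the local path for a
# URL when wget runs with -nH and --cut-dirs=N. It assumes the URL has
# a scheme and at least one path component.
local_path() {
    url=$1
    cut=$2
    path=${url#*://}   # drop the scheme:          mysite.com/dir1/...
    path=${path#*/}    # drop the host (like -nH): dir1/dir2/...
    i=0
    # Drop up to $cut leading directory components, never the file name.
    while [ "$i" -lt "$cut" ]; do
        case $path in
            */*) path=${path#*/} ;;
            *)   break ;;
        esac
        i=$((i + 1))
    done
    printf '%s\n' "$path"
}

local_path http://mysite.com/dir1/dir2/data/file.txt 2   # prints data/file.txt
```

With --cut-dirs=2, dir1 and dir2 are dropped and the download starts at data/ in the current directory; raising the value to 3 would drop data/ as well.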

For anyone else having similar issues: wget honors robots.txt, which might not allow you to grab the site. No worries, you can turn that off:

wget -e robots=off http://www.example.com/

http://www.gnu.org/software/wget/manual/html_node/Robot-Exclusion.html
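Putting the pieces from this thread together, one possible way to fetch the .vim directory from the question is sketched below. fetch_dir and the DRY_RUN variable are hypothetical names introduced here for illustration; the wget options themselves are the ones discussed above. The value of --cut-dirs=1 assumes the goal is to drop configs/ and keep .vim/ locally.

```shell
#!/bin/sh
# fetch_dir is a hypothetical wrapper around the flags discussed above.
# Arguments: the directory URL and the number of leading directories
# to cut from the saved paths.
fetch_dir() {
    url=$1
    cut_dirs=$2
    # Build the wget command line as the function's positional parameters.
    set -- wget -e robots=off -r -np -nH --cut-dirs="$cut_dirs" \
        -R "index.html*" "$url"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"   # only show the command (shown without quoting)
    else
        "$@"                 # actually run wget
    fi
}

# DRY_RUN is an assumed variable; set it to 0 (or unset it) to download.
DRY_RUN=1
fetch_dir http://mysite.com/configs/.vim/ 1
```

Running it in dry-run mode first makes it easy to check the flags before hitting the server. Keep in mind that disabling robots.txt handling ignores the site owner's stated preferences, so it is best reserved for sites you control.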
