Using for and echo in Linux to batch-extract links containing the numbers 01, 02, 03, 04, 05, ... from https://ftp.ncbi.nlm.nih.gov/blast/db/


Date: 2024-10-16 17:11:55

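In Linux, you can combine `wget` (or `curl`) with a `for` loop, `grep` and `echo` to collect the links under https://ftp.ncbi.nlm.nih.gov/blast/db/ whose file names contain a two-digit number such as 01, 02, 03, .... The directory page is a plain HTML index whose links are relative file names (such as `nt.01.tar.gz`), and it has no search parameter. The script below therefore downloads the index once, loops over the zero-padded numbers, extracts the matching file names and writes the full URLs to an output file:

```sh
#!/bin/bash
base_url="https://ftp.ncbi.nlm.nih.gov/blast/db/"
output_file="ncbi_urls.txt"
: > "$output_file"   # create (or empty) the file that stores the URLs

# Download the directory listing once; -q suppresses progress, -O - writes to stdout
# (with curl: listing=$(curl -s "$base_url"))
listing=$(wget -q -O - "$base_url")

n=10   # largest number to look for; replace with the range you need
for i in $(seq 1 "$n"); do
    num=$(printf '%02d' "$i")   # 1 -> 01, 2 -> 02, ..., 10 -> 10
    # The listing uses relative links such as href="nt.01.tar.gz";
    # keep the file names containing ".<num>.tar.gz" and prepend the base URL
    for f in $(echo "$listing" | grep -oE "href=\"[^\"]*\.${num}\.tar\.gz\"" | cut -d'"' -f2 | sort -u); do
        echo "${base_url}${f}" >> "$output_file"
    done
done

# Download everything in the list (this may be a very large amount of data,
# so check disk space, permissions and bandwidth first):
# wget -c -i "$output_file"
# or with curl: xargs -n 1 curl -L -O < "$output_file"
```

Here `n` is the largest number to match; set it to the range you actually need. Bash brace expansion such as `{1..n}` does not expand variables, so `seq` plus `printf '%02d'` is used instead, which also pads 1 to `01` while leaving 10 as `10`. Before running the script, check that `wget` is available; if it is not, use the `curl` alternatives given in the comments. When the script finishes, `ncbi_urls.txt` contains the list of matching links.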
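If you already know the database name and its volume numbers, the links can also be generated with nothing but `for` and `echo`, without parsing the index page at all. This is a minimal sketch, assuming the volumes follow the `<db>.<NN>.tar.gz` naming pattern (such as `nt.01.tar.gz`); `blast_volume_urls` is a hypothetical helper name, and the database name and range in the example are illustrative, so compare them with the directory listing before downloading.

```shell
#!/bin/bash
# Sketch: generate volume URLs directly with for + echo, without fetching the listing.
# ASSUMPTION: volumes are named <db>.<NN>.tar.gz (e.g. nt.01.tar.gz) and the
# first/last volume numbers are already known.
base_url="https://ftp.ncbi.nlm.nih.gov/blast/db/"

# blast_volume_urls DB FIRST LAST   (hypothetical helper, not an NCBI tool)
blast_volume_urls() {
    local db=$1 first=$2 last=$3 i
    for ((i = first; i <= last; i++)); do
        # printf pads the number to two digits: 1 -> 01, 10 -> 10
        echo "${base_url}${db}.$(printf '%02d' "$i").tar.gz"
    done
}

# Example: print the URLs for volumes 01 to 05 of the (assumed) nt database
blast_volume_urls nt 1 5
```

To save the list, redirect the output, e.g. `blast_volume_urls nt 1 5 > ncbi_urls.txt`. Unlike fetching the index page, this approach does not check that the files actually exist, so a URL for a volume number beyond the last one will only fail at download time.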
