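Week 1-4 Homework
First week of learning Python. Completed exercise week1-4 on May 18: scraping Taylor Swift pictures.
Main goal: download the images located at the image paths shown below.
The code is as follows: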
import requests
from bs4 import BeautifulSoup
import urllib.request
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36',
    'Cookie':'locale=zh-cn; __whiAnonymousID=485c28b9ba9b4de49ee343868aa88679; __qca=P0-358680574-1463536358856; __utma=222371101.1034172345.1463536359.1463536359.1463536359.1; __utmc=222371101; __utmz=222371101.1463536359.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmv=222371101.|27=locale=zh-cn=1; _weheartit_anonymous_session=%7B%22page_views%22%3A1%2C%22search_count%22%3A0%2C%22last_searches%22%3A%5B%5D%2C%22last_page_view_at%22%3A1463536359188%7D; auth=no; _session=49c98296de5f5d42ff9633136b3c7f1c; _ga=GA1.2.1034172345.1463536359'
}
def down_load(url):
    # Fetch one listing page and download every thumbnail image found on it.
    print(url)
    wb_data = requests.get(url, headers=headers)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    downLoad = []
    images = soup.select('#main-container > div > div.grid-thumb.grid-responsive > div > div > div > a > img')
    for image in images:
        downLoad.append(image.get('src'))
    file_path = 'E:/image_download/'  # this folder must already exist
    for imagePath in downLoad:
        print(imagePath)
        # The last 8 characters of the image URL are used as the file name.
        urllib.request.urlretrieve(imagePath, file_path + imagePath[-8:])
        # print('Downloaded one image')
    return None

# http://weheartit.com/inspirations/taylorswift?page=1&before=143392569
# Build the URLs for pages 1 through 9.
full_url = ['http://weheartit.com/inspirations/taylorswift?page={}&before=143392569'.format(str(i))
            for i in range(1, 10, 1)]
for link in full_url:
    down_load(link)
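A side note on the file names: the code above saves each image under the last 8 characters of its URL, so two URLs that end in the same characters would overwrite each other. Below is a minimal sketch of one alternative, assuming the image URLs have at least one path segment. The helper name `filename_from_url` and the `example.com` URLs are made up for illustration and are not part of the exercise.

```python
from urllib.parse import urlsplit


def filename_from_url(url, default='image.jpg'):
    """Build a file name from the last two path segments of a URL.

    Hypothetical helper: the query string is ignored, and `default`
    is returned when the URL has no path segments at all.
    """
    path = urlsplit(url).path                  # drops scheme, host and query
    parts = [p for p in path.split('/') if p]  # non-empty path segments
    if not parts:
        return default
    return '_'.join(parts[-2:])                # e.g. '12345_thumb.jpg'


# Illustrative URL only
print(filename_from_url('http://example.com/images/12345/thumb.jpg?size=1'))
```

Joining two segments is just one possible choice; a running counter would also avoid collisions.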
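Result:
Summary: The image download failed this time. I asked the teacher, who said the problem was on the server side and that the code had no syntax errors. I then switched the download target to a website hosted in China, and the images downloaded successfully.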
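The notes do not record which error appeared, so the following is only a sketch of how a failed request could be inspected to see whether the server answered with an error or could not be reached at all. The function name `check_url` and its messages are assumptions for illustration.

```python
import urllib.error
import urllib.request


def check_url(url, timeout=10):
    """Try to open `url` and return a short description of what happened."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 'ok: status {}'.format(resp.getcode())
    except urllib.error.HTTPError as e:
        # The server was reached but replied with an error code (403, 404, 500, ...).
        return 'server answered with HTTP error {}'.format(e.code)
    except urllib.error.URLError as e:
        # No usable response: DNS failure, refused connection, missing file, ...
        return 'could not reach server: {}'.format(e.reason)
```

`HTTPError` is a subclass of `URLError`, so it has to be caught first. Calling `check_url` on one of the image URLs before `urlretrieve` would show which of the two cases applies.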
Summary
- Learned the basics of using the urllib.request library
- Learned to call the urllib.request.urlretrieve method
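As a small recap of `urllib.request.urlretrieve`, here is a minimal sketch that creates the target folder first and then saves one URL to a file. The helper name `fetch_to_file` is made up, and the demo uses a local `file://` URL (an assumption, so that it runs without network access) instead of a real image address.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def fetch_to_file(url, target_dir, name):
    """Download `url` into target_dir/name and return the saved path."""
    os.makedirs(target_dir, exist_ok=True)  # create the folder if it is missing
    target = os.path.join(target_dir, name)
    # urlretrieve returns a tuple: (local file name, response headers)
    saved_path, _headers = urllib.request.urlretrieve(url, target)
    return saved_path


# Demo: a temporary local file stands in for a remote image.
src_dir = tempfile.mkdtemp()
src = Path(src_dir) / 'sample.bin'
src.write_bytes(b'fake image bytes')

saved = fetch_to_file(src.as_uri(), os.path.join(src_dir, 'out'), 'copy.bin')
print(saved)
```

With a real image URL, the same call would write the downloaded bytes to the given folder.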