Python Practical Plan 1_2
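I had to watch the video several times before I understood it, and I read the BeautifulSoup documentation several times too, but I finally got it working.
Result: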
Library/Frameworks/Python.framework/Versions/3.5/bin/python3.5 "/Applications/PyCharm CE.app/Contents/helpers/pycharm/utrunner.py" /Users/apple/PycharmProjects/untitled/beautifulsouptest.py true
Testing started at 12:47 PM ...
{'price': '$24.99', 'name': 'EarPod', 'star': 5, 'image': 'img/pic_0000_073a9256d9624c92a05dc680fc28865f.jpg', 'rate': '65 reviews'}
{'price': '$64.99', 'name': 'New Pocket', 'star': 4, 'image': 'img/pic_0005_828148335519990171_c234285520ff.jpg', 'rate': '12 reviews'}
{'price': '$74.99', 'name': 'New sunglasses', 'star': 4, 'image': 'img/pic_0006_949802399717918904_339a16e02268.jpg', 'rate': '31 reviews'}
{'price': '$84.99', 'name': 'Art Cup', 'star': 3, 'image': 'img/pic_0008_975641865984412951_ade7a767cfc8.jpg', 'rate': '6 reviews'}
{'price': '$94.99', 'name': 'iphone gamepad', 'star': 4, 'image': 'img/pic_0001_160243060888837960_1c3bcd26f5fe.jpg', 'rate': '18 reviews'}
{'price': '$214.5', 'name': 'Best Bed', 'star': 4, 'image': 'img/pic_0002_556261037783915561_bf22b24b9e4e.jpg', 'rate': '18 reviews'}
{'price': '$500', 'name': 'iWatch', 'star': 4, 'image': 'img/pic_0011_1032030741401174813_4e43d182fce7.jpg', 'rate': '35 reviews'}
{'price': '$15.5', 'name': 'Park tickets', 'star': 4, 'image': 'img/pic_0010_1027323963916688311_09cc2d7648d9.jpg', 'rate': '8 reviews'}
My code:
from bs4 import BeautifulSoup

info = []

# Parse the local copy of the homework page
with open('/Users/apple/Downloads/Plan-for-combating-master/week1/1_2/1_2answer_of_homework/index.html', 'r') as wb_data:
    soup = BeautifulSoup(wb_data, 'lxml')

images = soup.select('body > div > div > div.col-md-9 > div > div > div > img')
prices = soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4.pull-right')
names = soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4 > a')
rates = soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings > p.pull-right')
# Find the paragraph that holds the stars for each product
stars = soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings > p:nth-of-type(2)')

for i in range(len(stars)):
    # Find the filled stars in each paragraph
    filled = stars[i].find_all(class_='glyphicon-star')
    # Count them; len() also works when a product has no filled stars,
    # where indexing filled[0] would raise IndexError
    stars[i] = len(filled)

for name, price, rate, image, star in zip(names, prices, rates, images, stars):
    data = {
        'name': name.string,
        'price': price.string,
        'rate': rate.string,
        'image': image.get('src'),
        'star': star
    }
    info.append(data)
    print(data)
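The star-counting step can be tried on its own. The sketch below runs on a made-up HTML fragment. The markup is an assumption modeled on the selectors above, not copied from index.html, and `count_stars` is a hypothetical helper name. It uses Python's built-in `html.parser` so that lxml is not needed.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating one product's ratings block;
# the real index.html may differ in detail.
SAMPLE_HTML = """
<div class="ratings">
  <p class="pull-right">18 reviews</p>
  <p>
    <span class="glyphicon glyphicon-star"></span>
    <span class="glyphicon glyphicon-star"></span>
    <span class="glyphicon glyphicon-star"></span>
    <span class="glyphicon glyphicon-star"></span>
    <span class="glyphicon glyphicon-star-empty"></span>
  </p>
</div>
"""


def count_stars(paragraph):
    """Return the number of filled-star spans inside one ratings paragraph."""
    # class_ matches individual class tokens, so 'glyphicon-star-empty'
    # is not counted as 'glyphicon-star'.
    return len(paragraph.find_all(class_='glyphicon-star'))


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
# The second <p> inside div.ratings is the one that holds the stars.
star_paragraphs = soup.select('div.ratings > p:nth-of-type(2)')
star_counts = [count_stars(p) for p in star_paragraphs]
print(star_counts)
```

Using `len()` on the result of `find_all` gives 0 for a product with no filled stars, so no special case is needed.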
Summary:
1. Safari's source viewer has no option for copying a CSS selector, so I had to inspect the page in Chrome.
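2. After copying the CSS selector path, delete every :nth-child() part (or replace it with :nth-of-type() where the position matters, as in the stars selector in the code). The copied path pins each step to a single element, so without this change it matches only one product instead of all of them. For example, Chrome copies: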
body > div:nth-child(2) > div > div.col-md-9 > div:nth-child(2) > div:nth-child(1) > div > div.ratings > p:nth-child(2) > span:nth-child(1)
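Removing the :nth-child() parts by hand is easy to get wrong on a long path. The sketch below shows one possible way to do it with a regular expression. `strip_nth_child` is a hypothetical helper name, not part of BeautifulSoup, and the input is the selector copied from Chrome above.

```python
import re

# Selector exactly as copied from Chrome in the example above.
copied = ('body > div:nth-child(2) > div > div.col-md-9 > div:nth-child(2) > '
          'div:nth-child(1) > div > div.ratings > p:nth-child(2) > span:nth-child(1)')


def strip_nth_child(selector):
    """Remove every :nth-child(...) pseudo-class from a CSS selector string."""
    return re.sub(r':nth-child\([^)]*\)', '', selector)


generalized = strip_nth_child(copied)
print(generalized)
```

The cleaned path begins with the same `body > div > div > div.col-md-9 > div > div > div > div.ratings` prefix used in the code, and it now matches the ratings block of every product rather than a single one.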