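First log in to the site in a browser and copy the cookie, convert it to a dictionary, and store it in the COOKIES pool in settings.py. A downloader middleware then attaches one of these cookies to each request, so the spider is treated as logged in.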
1. Convert the cookie to a dictionary
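def cookieChangeToDict(cookie):
    '''
    Convert a cookie string into a dictionary.
    :param cookie: the cookie string copied after logging in
    :return: dictionary mapping cookie names to values
    '''
    cookieList = cookie.split(';')
    cookieDict = {}
    for item in cookieList:
        # Skip empty fragments, e.g. the one left by a trailing ';'
        if '=' not in item:
            continue
        # Split on the first '=' only, because values may contain '='
        name, value = item.split('=', maxsplit=1)
        cookieDict[name.strip()] = value.strip()
    return cookieDict


if __name__ == '__main__':
    cookie = """
    your cookie
    """
    print(cookieChangeToDict(cookie))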
Put the printed dictionary into the custom COOKIES = [] list in settings.py.
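As an alternative to splitting the string by hand, the conversion step can also be sketched with the standard library's http.cookies.SimpleCookie. The function name cookie_header_to_dict and the sample cookie names are made up for illustration. SimpleCookie is stricter than a manual split, so cookies with unusual characters may need the hand-written version above.

```python
from http.cookies import SimpleCookie


def cookie_header_to_dict(raw_cookie: str) -> dict:
    """Parse a 'name=value; name2=value2' cookie string into a plain dict."""
    jar = SimpleCookie()
    # load() accepts the raw header string; surrounding whitespace is stripped first
    jar.load(raw_cookie.strip())
    # Each entry is a Morsel object; .value holds the decoded cookie value
    return {name: morsel.value for name, morsel in jar.items()}


if __name__ == "__main__":
    # Hypothetical cookie string, not taken from any real site
    sample = "sessionid=abc123; csrftoken=xyz789"
    print(cookie_header_to_dict(sample))
```

The resulting dictionary has the same shape as the one produced by cookieChangeToDict, so it can be placed in the COOKIES list in the same way.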
2. Send requests with the logged-in cookie
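Method 1:
Override the Spider class's start_requests method and attach the cookies to the request. Note that scrapy.FormRequest only sends a POST when formdata is passed; without it, as here, the request is a GET.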
def start_requests(self):
    url = ''  # the page that requires login
    # self.cookies is expected to hold the cookie dictionary from step 1
    return [scrapy.FormRequest(url, cookies=self.cookies, callback=self.parse)]
Method 2: Use a downloader middleware:
from scrapy import signals
from scrapy.downloadermiddlewares.cookies import CookiesMiddleware
import random
from renren.settings import COOKIES


class RandomCookieMiddleware(CookiesMiddleware):
    '''
    Random cookie pool
    '''
    def process_request(self, request, spider):
        # Pick one cookie dictionary at random from the pool in settings.py
        cookie = random.choice(COOKIES)
        request.cookies = cookie
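The middleware above imports COOKIES directly from the project's settings module. A variation that reads the pool through Scrapy's settings object instead could look like the sketch below. The class name SettingsCookiePoolMiddleware is hypothetical, and the sketch assumes COOKIES is a list of dictionaries as described earlier. It relies only on the from_crawler hook and crawler.settings.getlist, so it needs no Scrapy import of its own.

```python
import random


class SettingsCookiePoolMiddleware:
    """Sketch: attach a random cookie dict from the COOKIES setting to each request."""

    def __init__(self, cookie_pool):
        # Copy the pool so later changes to the settings object do not affect it
        self.cookie_pool = list(cookie_pool)

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls from_crawler when building the middleware;
        # crawler.settings exposes the values defined in settings.py
        return cls(crawler.settings.getlist("COOKIES"))

    def process_request(self, request, spider):
        # With an empty pool the request is left untouched
        if self.cookie_pool:
            request.cookies = random.choice(self.cookie_pool)
        # Returning None lets the request continue through the middleware chain
        return None
```

Reading the pool from crawler.settings avoids hard-coding the project name in the import, and the empty-pool check avoids the IndexError that random.choice raises on an empty list.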
Set the following in settings.py:
ROBOTSTXT_OBEY = False
COOKIES_ENABLED = True
# Enable the middleware
DOWNLOADER_MIDDLEWARES = {
    'renren.middlewares.RandomCookieMiddleware': 543,
}
# COOKIES pool
COOKIES = [
]
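The pool above is left empty. As a rough illustration of its shape once filled, assuming two logged-in accounts, it might look like the following. Every cookie name and value here is a placeholder, not real data; the actual entries are whatever cookieChangeToDict prints for each account.

```python
# settings.py (sketch): all names and values below are placeholders
COOKIES = [
    {"sessionid": "PLACEHOLDER_ACCOUNT_1", "csrftoken": "PLACEHOLDER_TOKEN_1"},
    {"sessionid": "PLACEHOLDER_ACCOUNT_2", "csrftoken": "PLACEHOLDER_TOKEN_2"},
]
```

Each element is one complete cookie dictionary, so random.choice(COOKIES) selects a whole login session rather than a single cookie.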