Key point: scrapyd and SpiderKeeper must both be running at the same time.
Environment setup:
mkvirtualenv --python=/usr/local/python3/bin/python3.5 server
pip install -i https://pypi.douban.com/simple/ scrapy
pip install -i https://pypi.douban.com/simple/ scrapyd
pip install -i https://pypi.douban.com/simple/ spiderkeeper
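To confirm all three packages landed in the virtualenv, a quick sanity check (a minimal sketch; package names as installed above):
pip show scrapy scrapyd spiderkeeper  # prints name/version/location for each, nothing if a package is missing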
Deploy under the /opt directory:
mkdir demo # download the demo project to verify the environment works
cd demo
git clone https://github.com/scrapy/quotesbot.git
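Before wiring up scrapyd, it is worth running the demo locally once; a minimal sketch (the two spider names come from the quotesbot repo):
cd quotesbot
scrapy list  # should print the demo's two spiders: toscrape-css, toscrape-xpath
scrapy crawl toscrape-css -o quotes.json  # quick local crawl to confirm the environment
cd ..  # back to the demo directory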
mkdir crawl_server # for deploying SpiderKeeper-related files
Configure the project name and endpoint in scrapy.cfg:
[deploy:quotesbot] # note: no spaces around the colon
url=http://10.9.3.251:6800/
project=p_quotesbot
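For context, the complete scrapy.cfg would look roughly like this (a sketch; the [settings] section is generated by the quotesbot project itself):
[settings]
default = quotesbot.settings

[deploy:quotesbot]
url = http://10.9.3.251:6800/
project = p_quotesbot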
Upload the project:
Do not use scrapyd-deploy --build-egg output.egg by itself, or the project will not show up after uploading (note: building the egg first and then sending it separately is fine; for updates, only the egg needs to be rebuilt and re-sent).
scrapyd-deploy quotesbot -p p_quotesbot
Example:
E:\demo\quotesbot (master)
(server) λ scrapyd-deploy quotesbot -p p_quotesbot
Packing version 1511949868
Deploying to project "p_quotesbot" in http://10.9.3.251:6800/addversion.json
Server response (200):
{"node_name": "crawl-server", "status": "ok", "spiders": 2, "version": "1511949868", "project": "p_quotesbot"}
Or use the curl command (untested):
curl http://localhost:6800/addversion.json -F project=myproject -F version=r23 -F egg=@myproject.egg
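Combining the two steps above (build the egg, then send it), a sketch of that flow for this deployment; the version string here is arbitrary:
scrapyd-deploy --build-egg output.egg
curl http://10.9.3.251:6800/addversion.json -F project=p_quotesbot -F version=r24 -F egg=@output.egg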
Scheduling a job (schedule.json)
Supported request method: POST
Parameters:
project (string, required) - the project name
spider (string, required) - the spider name
setting (string, optional) - a Scrapy setting to use when running the spider
jobid (string, optional) - a job id used to identify the job; overrides the default generated UUID
_version (string, optional) - the version of the project to use
Any other parameter is passed as a spider argument
curl http://10.9.3.251:6800/schedule.json -d project=myproject -d spider=somespider -d setting=DOWNLOAD_DELAY=2 -d arg1=val1
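Applied to the project deployed above, the call would look something like this (spider name assumed from the quotesbot demo; the jobid in the response is illustrative, per the scrapyd docs):
curl http://10.9.3.251:6800/schedule.json -d project=p_quotesbot -d spider=toscrape-css
# → {"status": "ok", "jobid": "6487ec79947edab326d6db28a2d86511e8247444"}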
Start and test:
scrapyd
Configuration file: /etc/scrapyd/scrapyd.conf
Start and test:
spiderkeeper
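By default SpiderKeeper serves its UI on port 5000; to point it at the scrapyd instance above, something like the following should work (the --server flag and the default admin/admin login are assumptions taken from SpiderKeeper's README):
spiderkeeper --server=http://10.9.3.251:6800
# then open http://localhost:5000 in a browser (default login: admin / admin)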
Create a deployment wrapper script (Windows)
Because scrapyd-deploy is installed as a plain Python script on Windows (no .exe), adjust the paths as needed and save the following as scrapyd-deploy.bat:
@echo off
"f:\python\python.exe" "f:\python\Scripts\scrapyd-deploy" %1 %2 %3 %4 %5 %6 %7 %8 %9
The server address is specified by the url entry in scrapy.cfg:
[deploy]
url = http://10.9.3.251:6800/
project = quotesbot
List all configured scrapyd targets:
scrapyd-deploy -l
scrapyd-deploy -L quotesbot # list the projects on that target, to check whether the server already has this project
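Alternatively, the scrapyd API itself can confirm the project landed (a sketch; the response shape is from the scrapyd docs):
curl http://10.9.3.251:6800/listprojects.json
# → {"status": "ok", "projects": ["p_quotesbot"]}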
Reference template (note: with bind_address = 127.0.0.1 the API is only reachable locally; to allow remote deploys like http://10.9.3.251:6800/ above, set it to 0.0.0.0):
[scrapyd]
eggs_dir = /opt/crawl_server/eggs
logs_dir = /opt/crawl_server/logs
items_dir =
jobs_to_keep = 5
dbs_dir = /opt/crawl_server/dbs
max_proc = 0
max_proc_per_cpu = 4
finished_to_keep = 100
poll_interval = 5.0
bind_address = 127.0.0.1
http_port = 6800
debug = off
runner = scrapyd.runner
application = scrapyd.app.application
launcher = scrapyd.launcher.Launcher
webroot = scrapyd.website.Root
[services]
schedule.json = scrapyd.webservice.Schedule
cancel.json = scrapyd.webservice.Cancel
addversion.json = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json = scrapyd.webservice.ListSpiders
delproject.json = scrapyd.webservice.DeleteProject
delversion.json = scrapyd.webservice.DeleteVersion
listjobs.json = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
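Since daemonstatus.json is wired up in the [services] section above, it makes a cheap health check after editing the config (response fields per the scrapyd docs; the counts will vary):
curl http://10.9.3.251:6800/daemonstatus.json
# → {"node_name": "crawl-server", "status": "ok", "pending": 0, "running": 0, "finished": 0}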