Crawling the Home directory of the DMOZ site with Scrapy

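I. Background

This experiment crawls all subdirectories of the Home directory on DMOZ (http://www.dmoztools.net/Home/). The Home subdirectories are shown in the figure below.

Figure: Home subdirectories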

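II. Goals

We need to crawl the information for every site under the Home directory. The main fields to collect are:
① the current path at the time a site is crawled (category_path)
② each subdirectory's name (cat_name) and its link, i.e. the internal link (cat_url)
③ each site's title (site_title), URL (site_url) and description (site_desc)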

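III. Site analysis

The DMOZ page structure is essentially a tree in which each node has many child nodes.
First, work out the XPath expressions we need:
1.<-----categories----->

Figure: subcategories, the div blocks that hold the directories

There are two directory div blocks here. Their id attributes differ, but their class attributes are identical, so the XPath for the directory section is:

'//div[@class="cat-list results leaf-nodes"]/div[@class="cat-item"]'
Figure: the directory section

Narrowing this down to the directory link (cat_url) and name (cat_name):

XPath for cat_url = '//div[@class="cat-list results leaf-nodes"]/div[@class="cat-item"]/a/@href'
XPath for cat_name = '//div[@class="cat-list results leaf-nodes"]/div[@class="cat-item"]/a/div/text()'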
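
A minimal sketch of how these two expressions behave, using the standard-library xml.etree.ElementTree rather than Scrapy's selectors. The HTML fragment and its id values are hand-written to mimic the structure described above; they are assumptions, not a copy of the real page. ElementTree supports only a subset of XPath, so the sketch selects the a elements and reads href and the inner div text in Python instead of using /@href and /text().

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment imitating two DMOZ category blocks (not the real page source).
SAMPLE = """
<html><body>
  <div id="subcategories-div" class="cat-list results leaf-nodes">
    <div class="cat-item"><a href="/Home/Cooking/"><div>Cooking</div></a></div>
    <div class="cat-item"><a href="/Home/Gardening/"><div>Gardening</div></a></div>
  </div>
  <div id="related-div" class="cat-list results leaf-nodes">
    <div class="cat-item"><a href="/Recreation/Pets/"><div>Pets</div></a></div>
  </div>
</body></html>
"""


def extract_categories(markup):
    """Return (cat_url, cat_name) pairs from every matching category block."""
    root = ET.fromstring(markup)
    # Same class-based path as above; both blocks match because their class values are equal.
    path = ".//div[@class='cat-list results leaf-nodes']/div[@class='cat-item']/a"
    return [(a.get("href"), a.find("div").text.strip()) for a in root.findall(path)]


categories = extract_categories(SAMPLE)
for url, name in categories:
    print(url, name)
```

Matching on class rather than id is what lets a single expression cover both blocks.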

2.<-----sites----->

Figure: the sites section

The XPath expressions for a site's title (site_title), URL (site_url) and description (site_desc) work out as follows:

XPath for site_title = '//div[@class="site-item "]/div[@class="title-and-desc"]/a/div[@class="site-title"]/text()'
XPath for site_url = '//div[@class="site-item "]/div[@class="title-and-desc"]/a/@href'
XPath for site_desc = '//div[@class="site-item "]/div[@class="title-and-desc"]/div[@class="site-descr "]/text()'
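
Note that the class values "site-item " and "site-descr " end with a space. An attribute predicate compares the whole string, so the space has to be kept. The sketch below illustrates this with the standard-library xml.etree.ElementTree on a hand-written fragment; the fragment, its id and the example.com URL are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment imitating one site entry; note the trailing spaces
# inside class="site-item " and class="site-descr ".
SAMPLE = """
<div id="site-list-content">
  <div class="site-item ">
    <div class="title-and-desc">
      <a href="http://www.example.com/"><div class="site-title">Example Site</div></a>
      <div class="site-descr ">
        A placeholder description.
      </div>
    </div>
  </div>
</div>
"""

root = ET.fromstring(SAMPLE)


def find_sites(class_value):
    """Find title-and-desc blocks under divs whose class equals class_value exactly."""
    path = ".//div[@class='%s']/div[@class='title-and-desc']" % class_value
    return root.findall(path)


with_space = find_sites("site-item ")    # matches the entry
without_space = find_sites("site-item")  # matches nothing

sites = []
for block in with_space:
    sites.append({
        "site_title": block.find("a/div[@class='site-title']").text.strip(),
        "site_url": block.find("a").get("href").strip(),
        "site_desc": block.find("div[@class='site-descr ']").text.strip(),
    })
print(len(with_space), len(without_space), sites)
```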

IV. Creating and editing the project: home

1. Create the project
scrapy startproject home
2. Import the project into PyCharm for editing
Figure: importing the project
3. Edit items.py
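import scrapy


class HomeItem(scrapy.Item):
    category_path = scrapy.Field()  # path of the page being crawled
    categories = scrapy.Field()     # list of dicts holding cat_url and cat_name
    cat_url = scrapy.Field()
    cat_name = scrapy.Field()
    sites = scrapy.Field()          # list of dicts holding site_title, site_url and site_desc
    site_url = scrapy.Field()
    site_desc = scrapy.Field()
    site_title = scrapy.Field()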
4. Create homeSpider.py in the spiders folder and edit it
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule  # Unlike Spider, CrawlSpider has a rules attribute; each Rule defines how links are extracted and handled.

from home.items import HomeItem


class HomeSpider(CrawlSpider):
    name = "home"
    start_urls = ['http://www.dmoztools.net/Home/']
    # Rule defines a crawling rule. LinkExtractor selects the links to extract;
    # callback names the method called for each response fetched from those links;
    # follow says whether links found in those responses are followed in turn
    # (it defaults to True when callback is None).
    # allow keeps only URLs matching 'http://www.dmoztools.net/Home/.*';
    # restrict_xpaths limits extraction to links inside the given region of the page.
    rules = (
        Rule(LinkExtractor(allow=r'http://www.dmoztools.net/Home/.*',
                           restrict_xpaths='//div[@class="cat-list results leaf-nodes"]/div[@class="cat-item"]'),
             callback="parse_item", follow=True),
    )

    def parse_item(self, response):
        item = HomeItem()
        # category_path is the part of the URL after the site prefix, e.g. "Home/Apartment_Living/".
        # str.lstrip('http://www.dmoztools.net') is avoided because it strips a set of
        # characters rather than a prefix.
        prefix = 'http://www.dmoztools.net/'
        url = response.url
        item['category_path'] = url[len(prefix):] if url.startswith(prefix) else url

        item['categories'] = []
        for cat in response.xpath('//div[@class="cat-list results leaf-nodes"]/div[@class="cat-item"]'):
            child_cat = {}
            child_cat['cat_url'] = cat.xpath('a/@href').extract()
            names = cat.xpath('a/div/text()').extract()
            # Remove line breaks and spaces from every text node.
            child_cat['cat_name'] = [name.replace("\r\n", "").replace(" ", "") for name in names]
            item['categories'].append(child_cat)

        item['sites'] = []
        for site in response.xpath('//div[@class="site-item "]/div[@class="title-and-desc"]'):
            result_site = {}
            # default='' avoids calling strip() on None when a node is missing.
            result_site['site_title'] = site.xpath('a/div[@class="site-title"]/text()').extract_first(default='').strip()
            result_site['site_url'] = site.xpath('a/@href').extract_first(default='').strip()
            result_site['site_desc'] = site.xpath('div[@class="site-descr "]/text()').extract_first(default='').strip()
            item['sites'].append(result_site)
        yield item
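
A note on deriving category_path from response.url: str.lstrip treats its argument as a set of characters to strip, not as a prefix. For the /Home/ URLs crawled here the two happen to give the same result, because the capital H is not among the characters of the site prefix. The sketch below compares lstrip with explicit prefix removal; the lowercase /shopping/ URL is a made-up example and the helper names are hypothetical.

```python
BASE = "http://www.dmoztools.net/"


def path_with_lstrip(url):
    # Strips every leading character that occurs anywhere in the argument.
    return url.lstrip("http://www.dmoztools.net")


def path_with_prefix(url):
    # Removes BASE only when the URL really starts with it.
    return url[len(BASE):] if url.startswith(BASE) else url


real = "http://www.dmoztools.net/Home/Cooking/"
hypothetical = "http://www.dmoztools.net/shopping/"  # made-up lowercase path

print(path_with_lstrip(real), path_with_prefix(real))
print(path_with_lstrip(hypothetical), path_with_prefix(hypothetical))
```

For the made-up URL, lstrip also eats the letters s, h, o and p, since all of them occur in the site prefix.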
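5. Edit begin.py

Create a begin.py file in the same directory as the home package and scrapy.cfg, then edit it:

from scrapy import cmdline
cmdline.execute('scrapy crawl home -o home.jl'.split())

If you start the crawler on a cloud server or from the command line, this step is unnecessary. Run the following in the project directory instead; you can then also skip the next step.

scrapy crawl home -o home.jl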
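
The -o home.jl option writes the items in JSON Lines format, one JSON object per line. A minimal sketch for reading such a file back follows. It uses an in-memory sample, abridged from real output, in place of the file, and the helper names read_json_lines and clean_names are hypothetical. Because the spider strips whitespace from every text node, cat_name can contain empty strings, which clean_names drops.

```python
import io
import json


def read_json_lines(lines):
    """Yield one dict per non-empty line of JSON Lines input."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def clean_names(record):
    """Join each cat_name list into one string, dropping empty fragments."""
    return ["".join(part for part in cat["cat_name"] if part)
            for cat in record["categories"]]


# Abridged stand-in for home.jl; with a real file use open("home.jl", encoding="utf-8").
sample = io.StringIO(
    '{"category_path": "Home/Weblogs/", "categories": '
    '[{"cat_url": ["/Home/Cooking/Weblogs/"], "cat_name": ["", "Cooking", ""]}], '
    '"sites": []}\n'
)

records = list(read_json_lines(sample))
print(records[0]["category_path"], clean_names(records[0]))
```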
6. Configure PyCharm before running

② Set the path

③ Click Run

V. Crawl results

Only part of the output is shown here, since there is a lot of data.

{"category_path": "Home/Apartment_Living/", "categories": [{"cat_url": ["/Business/Real_Estate/Residential/Rentals/"], "cat_name": ["", "ApartmentLocators", ""]}, {"cat_url": ["/Home/Apartment_Living/Roommates/"], "cat_name": ["", "Roommates", ""]}, {"cat_url": ["/Society/Issues/Housing/Tenant_Rights/"], "cat_name": ["", "TenantRights", ""]}], "sites": [{"site_title": "Apartment Living", "site_url": "http://ths.gardenweb.com/forums/apt/", "site_desc": "A discussion forum for those living in apartments, condominiums and co-ops. Topics range from roommate problems, maintenance issues, leases to decorating."}, {"site_title": "Apartment Therapy", "site_url": "http://www.apartmenttherapy.com/", "site_desc": "Offers articles and advice on apartment living."}, {"site_title": "Good for Apartment Life", "site_url": "http://www.dogbreedinfo.com/apartment.htm", "site_desc": "Provides a list of dog breeds suited to apartment living. Each breed is linked to a detailed description."}, {"site_title": "Rental Decorating Digest", "site_url": "http://www.rentaldecorating.com/", "site_desc": "Tips and tricks for decorating for the renter."}, {"site_title": "TheDollarStretcher.com: How to Furnish a Studio Apartment", "site_url": "http://www.stretcher.com/stories/970929b.cfm", "site_desc": "Studio apartment solutions for \"hiding and otherwise disguising belongings that needed to be stored in a small space.\""}]}
{"category_path": "Home/Weblogs/", "categories": [{"cat_url": ["/Home/Family/Adoption/Weblogs/"], "cat_name": ["", "Adoption", ""]}, {"cat_url": ["/Home/Gardening/Bonsai_and_Suiseki/Bonsai/Weblogs/"], "cat_name": ["", "Bonsai", ""]}, {"cat_url": ["/Home/Cooking/Weblogs/"], "cat_name": ["", "Cooking", ""]}, {"cat_url": ["/Home/Consumer_Information/Electronics/Weblogs/"], "cat_name": ["", "Electronics", ""]}, {"cat_url": ["/Home/Consumer_Information/Food_and_Drink/Weblogs/"], "cat_name": ["", "FoodandDrink", ""]}, {"cat_url": ["/Home/Personal_Finance/Money_Management/Weblogs/"], "cat_name": ["", "MoneyManagement", ""]}, {"cat_url": ["/Home/Family/Parenting/Mothers/Weblogs/"], "cat_name": ["", "Mothers", ""]}, {"cat_url": ["/Home/Family/Parenting/Fathers/Stay_at_Home_Fathers/Weblogs/"], "cat_name": ["", "StayatHomeFathers", ""]}], "sites": []}
{"category_path": "Home/Software/", "categories": [{"cat_url": ["/Home/Family/Software/"], "cat_name": ["", "Family", ""]}, {"cat_url": ["/Home/Gardening/Software/"], "cat_name": ["", "Gardening", ""]}, {"cat_url": ["/Society/Genealogy/Software/"], "cat_name": ["", "Genealogy", ""]}, {"cat_url": ["/Computers/Home_Automation/Software/"], "cat_name": ["", "HomeAutomation", ""]}, {"cat_url": ["/Home/Personal_Finance/Software/"], "cat_name": ["", "PersonalFinance", ""]}, {"cat_url": ["/Home/Cooking/Recipe_Management/"], "cat_name": ["", "RecipeManagement", ""]}], "sites": [{"site_title": "Agilaire", "site_url": "http://www.agilairecorp.com/", "site_desc": "Environmental data management solutions. Products and support."}, {"site_title": "Chief Architect Inc: Home Designer Software", "site_url": "http://www.homedesignersoftware.com/", "site_desc": "Software package for home remodelling, interior design, decks and landscaping creation. Company info, products list, shop and support."}, {"site_title": "Chore Wars", "site_url": "http://www.chorewars.com/", "site_desc": "Browser-based system loosely based on D&D that allows household members to claim experience points for doing tasks.  Monsters and treasure may be optionally defined with each chore."}, {"site_title": "Kopy Kake", "site_url": "http://www.kopykake.com/", "site_desc": "Cake decorating software for professional bakers and hobbyists."}, {"site_title": "Let's Clean Up", "site_url": "http://www.lets-clean-up.com/", "site_desc": "Software to help the family and small businesses organize cleaning chores and maintenance activities."}, {"site_title": "Punch Software", "site_url": "http://www.punchsoftware.com/", "site_desc": "Offers 3D home design suite for professional home planning with real model technology. Plan a dream house with this architecture design and 3D landscape software."}]}
{"category_path": "Home/News_and_Media/", "categories": [{"cat_url": ["/Home/News_and_Media/Radio_Programs/"], "cat_name": ["", "RadioPrograms", ""]}, {"cat_url": ["/Home/News_and_Media/Television/"], "cat_name": ["", "Television", ""]}, {"cat_url": ["/Home/Homemaking/Frugality/Publications/"], "cat_name": ["", "BudgetLiving", ""]}, {"cat_url": ["/Home/Consumer_Information/News_and_Media/"], "cat_name": ["", "ConsumerInformation", ""]}, {"cat_url": ["/Home/Cooking/Magazines_and_E-zines/"], "cat_name": ["", "Cooking", ""]}, {"cat_url": ["/Home/Family/Publications/"], "cat_name": ["", "Families", ""]}, {"cat_url": ["/Home/Gardening/News_and_Media/"], "cat_name": ["", "Gardens", ""]}, {"cat_url": ["/Home/Home_Improvement/News_and_Media/"], "cat_name": ["", "HomeImprovement", ""]}, {"cat_url": ["/Home/Family/Parenting/Magazines_and_E-zines/"], "cat_name": ["", "Parenting", ""]}, {"cat_url": ["/Recreation/Pets/News_and_Media/"], "cat_name": ["", "Pets", ""]}, {"cat_url": ["/Business/Real_Estate/News_and_Media/"], "cat_name": ["", "RealEstate", ""]}], "sites": [{"site_title": "Better Homes & Gardens", "site_url": "http://www.bhg.com/", "site_desc": "Ideas and improvement projects for your home and garden plus recipes and entertaining ideas."}, {"site_title": "Better Homes and Gardens Australia", "site_url": "http://www.bhg.com.au/", "site_desc": "Offers items from both the weekly television show and monthly magazine."}, {"site_title": "Carolina Home and Garden", "site_url": "http://www.carolinahg.com/", "site_desc": "Features articles on the arts, gardens, local people, interior decor. 
Also includes calendar and resources."}, {"site_title": "Coastal Living Magazine", "site_url": "http://www.coastalliving.com/", "site_desc": "Features articles on homes, decorating, travel, food and living in coastal communities."}, {"site_title": "Country Living Magazine", "site_url": "http://www.countryliving.com/", "site_desc": "Features include home decorating, recipes and antiques and collectibles."}, {"site_title": "Homes and Gardens Magazine", "site_url": "http://www.housetohome.co.uk/homesandgardens", "site_desc": "Features beautiful houses and gardens, decorating ideas, designers and decorators, unusual shops. Includes subscription information, and a range of articles online."}, {"site_title": "House to Home UK", "site_url": "http://www.housetohome.co.uk/", "site_desc": "Provides a look inside British residences, ideas for rooms, videos and guest bloggers."}, {"site_title": "Howdini", "site_url": "http://www.howdini.com/", "site_desc": "An e-zine with video tips on how to do a variety of household tasks such as cooking or caring for a new baby. Includes 'life hacks'."}, {"site_title": "Martha Stewart Living", "site_url": "http://www.marthastewart.com/", "site_desc": "Official site, with links to personal information, Martha's Scrapbook, television highlights, radio guide, virtual studio, recipes, live chat."}, {"site_title": "Mother Earth Living", "site_url": "http://www.motherearthliving.com/", "site_desc": "Offers today's health-conscious, environmentally concerned homeowners information needed to practice earth-inspired living."}, {"site_title": "Mother Earth News", "site_url": "http://www.motherearthnews.com/", "site_desc": "Features and articles covering sustainable, self-reliant living. 
Topics include building, gardening, homesteading, do-it-yourself, kitchen, energy and health."}, {"site_title": "Pioneer Thinking", "site_url": "http://www.pioneerthinking.com/", "site_desc": "Features articles on gardening, cooking, crafts, personal finance, and beauty."}, {"site_title": "Real Simple", "site_url": "http://www.realsimple.com/", "site_desc": "Magazine about simplifying your life. Includes home solutions, meals, special features."}, {"site_title": "SFGate.com: Home and Garden", "site_url": "http://www.sfgate.com/homeandgarden/", "site_desc": "Daily features and articles from the San Francisco Chronicle covering decorating, entertaining, gardening, and family life."}, {"site_title": "Southern Living", "site_url": "http://www.southernliving.com/", "site_desc": "Features about fine interiors, gardens, design, antiques, travel, events, and the arts."}, {"site_title": "Style at Home", "site_url": "http://www.styleathome.com/", "site_desc": "Covers interior decoration and design tips, decorating on a budget, recipes and buying guides."}, {"site_title": "Sunset Magazine and Books", "site_url": "http://www.sunset.com/", "site_desc": "News and feature articles on Western living."}, {"site_title": "The Washington Post: Home & Garden", "site_url": "http://www.washingtonpost.com/wp-dyn/home/", "site_desc": "Daily features and articles covering decorating, home improvement, gardening, pets and family life."}]}
{"category_path": "Home/Urban_Living/", "categories": [{"cat_url": ["/Home/Apartment_Living/"], "cat_name": ["", "ApartmentLiving", ""]}, {"cat_url": ["/Science/Earth_Sciences/Atmospheric_Sciences/Climatology/Urban/"], "cat_name": ["", "Climate", ""]}, {"cat_url": ["/Arts/Photography/Photographers/Urban/Fine_Art/"], "cat_name": ["", "PhotographicExhibits", ""]}, {"cat_url": ["/Society/Subcultures/"], "cat_name": ["", "Subcultures", ""]}], "sites": [{"site_title": "City Noise", "site_url": "http://www.citynoise.org/", "site_desc": "A public photoblog where people with a love for the urban form, modern world, or a general appreciation of their environment gather to post stories, narratives and often upload photos of their favourite cities, hometowns, travels, or current locations."}, {"site_title": "CityCulture", "site_url": "http://cityculture.org/", "site_desc": "World city reviews and a free test which suggests cities and regions around the world that fit your personality."}, {"site_title": "Flickr: Urban Negative", "site_url": "http://www.flickr.com/groups/urban_negative/", "site_desc": "Photos of the negative object, material and people in urban lifestyles."}, {"site_title": "Pedestrian", "site_url": "http://www.turbulence.org/Works/pedestrian/pedestrian2.html", "site_desc": "An artistic work about urban life."}, {"site_title": "Self Sufficientish", "site_url": "http://www.selfsufficientish.com/", "site_desc": "Information on growing plants, wild food recipes, and alternatives to lead a low impact urban life."}, {"site_title": "Urbansome", "site_url": "http://urbansome.com/", "site_desc": "City travel information, tips, deals, news, photography and lots of entertainment brought to you by urban enthusiasts."}]}
{"category_path": "Home/Rural_Living/", "categories": [{"cat_url": ["/Science/Social_Sciences/Economics/Agricultural_and_Rural_Economics/"], "cat_name": ["", "AgriculturalandResourceEconomics", ""]}, {"cat_url": ["/Home/Cooking/"], "cat_name": ["", "Cooking", ""]}, {"cat_url": ["/Society/People/Cowboys/"], "cat_name": ["", "Cowboys", ""]}, {"cat_url": ["/Reference/Education/K_through_12/Rural_Issues/"], "cat_name": ["", "Education", ""]}, {"cat_url": ["/Business/Agriculture_and_Forestry/Farm_Real_Estate/"], "cat_name": ["", "FarmRealEstate", ""]}, {"cat_url": ["/Health/Public_Health_and_Safety/Rural_Health/"], "cat_name": ["", "Health", ""]}, {"cat_url": ["/Home/Rural_Living/Hobby_Farms/"], "cat_name": ["", "HobbyFarms", ""]}, {"cat_url": ["/Reference/Education/K_through_12/Home_Schooling/"], "cat_name": ["", "HomeSchooling", ""]}, {"cat_url": ["/Home/Rural_Living/Homesteading/"], "cat_name": ["", "Homesteadi­ng", ""]}, {"cat_url": ["/Science/Environment/Water_Resources/Wastewater/Household_Wastewater_Management/"], "cat_name": ["", "HouseholdWastewaterManagement", ""]}, {"cat_url": ["/Society/Lifestyle_Choices/Intentional_Communities/"], "cat_name": ["", "IntentionalCommunities", ""]}, {"cat_url": ["/Home/Rural_Living/Personal_Pages/"], "cat_name": ["", "PersonalPages", ""]}, {"cat_url": ["/Science/Technology/Energy/Renewable/"], "cat_name": ["", "RenewableEnergy", ""]}, {"cat_url": ["/Science/Social_Sciences/Sociology/Rural_Sociology/"], "cat_name": ["", "RuralSociology", ""]}, {"cat_url": ["/Science/Agriculture/Sustainable_Agriculture/"], "cat_name": ["", "SustainableAgriculture", ""]}, {"cat_url": ["/Business/Construction_and_Maintenance/Building_Types/Sustainable_Architecture/"], "cat_name": ["", "SustainableArchitecture", ""]}, {"cat_url": ["/Society/Lifestyle_Choices/Voluntary_Simplicity/"], "cat_name": ["", "VoluntarySimplicity", ""]}], "sites": [{"site_title": "Countryside Magazine", "site_url": "http://countrysidenetwork.com/daily/", "site_desc": 
"Selected articles from the printed magazine for readers seeking voluntary simplicity and greater self-reliance with emphasis on home food production. Gardening, cooking, food preservation, and livestock. Has an active forum."}, {"site_title": "DTN: The Progressive Farmer", "site_url": "https://www.dtnpf.com/agriculture/web/ag/home", "site_desc": "Forums and news for farmers and people involved in rural issues."}, {"site_title": "Internet Hay Exchange", "site_url": "http://www.hayexchange.com/", "site_desc": "Free International hay listing service for the US and Canada."}, {"site_title": "Kountry Life", "site_url": "http://www.kountrylife.com/index.htm", "site_desc": "An interactive country and rural living site with discussion forums, photo gallery, articles, how-to information, humor, sounds, and recipes."}, {"site_title": "Renewing the Countryside", "site_url": "http://www.renewingthecountryside.org/", "site_desc": "Aims to strengthen rural areas by highlighting the initiatives and projects of rural communities, farmers, artists, entrepreneurs, educators, and activists."}, {"site_title": "Rural Living Canada", "site_url": "http://rurallivingcanada.4t.com/", "site_desc": "A concise directory of Canadian non-urban lifestyle information, news and websites."}, {"site_title": "Soil and Health Library", "site_url": "http://www.soilandhealth.org/", "site_desc": "A free how-to and encouragement resource for self-supporters with little cash, for non-domineering environmentalists, and folks frustrated with urbanity."}, {"site_title": "The Urban Rancher", "site_url": "http://theurbanrancher.tamu.edu/", "site_desc": "The Texas A&M University site dedicated to improving rural living with information on natural resources, rural life, and the urban-rural interface."}]}
{"category_path": "Home/Personal_Organization/", "categories": [{"cat_url": ["/Home/Personal_Organization/Consultants/"], "cat_name": ["", "Consultant­s", ""]}, {"cat_url": ["/Business/Business_Services/Office_Services/Secretarial_Services_and_Virtual_Assistants/"], "cat_name": ["", "VirtualAssistants", ""]}, {"cat_url": ["/Health/Services/Health_Records_Services/"], "cat_name": ["", "HealthRecordsServices", ""]}, {"cat_url": ["/Society/Crime/Theft/Identity_Theft/"], "cat_name": ["", "IdentityTheft", ""]}, {"cat_url": ["/Shopping/Home_and_Garden/Furniture/Storage/"], "cat_name": ["", "StorageFurnitureShopping", ""]}, {"cat_url": ["/Business/Management/Education_and_Training/Time_Management/"], "cat_name": ["", "TimeManagement", ""]}, {"cat_url": ["/Reference/Knowledge_Management/Information_Overload/"], "cat_name": ["", "InformationOverload", ""]}, {"cat_url": ["/Computers/Software/Operating_Systems/Microsoft_Windows/Software/Shareware/Home_and_Hobby/Financial%2C_Insurance_and_Home_Inventory/"], "cat_name": ["", "InventorySoftware", ""]}, {"cat_url": ["/Computers/Mobile_Computing/"], "cat_name": ["", "MobileComputing", ""]}, {"cat_url": ["/Computers/Software/Freeware/Personal_Information_Managers/"], "cat_name": ["", "PersonalInformationManagersFreeware", ""]}, {"cat_url": ["/Computers/Software/Operating_Systems/Microsoft_Windows/Software/Shareware/Personal_Information_Managers/"], "cat_name": ["", "PersonalInformationManagersShareware", ""]}, {"cat_url": ["/Computers/Internet/On_the_Web/Web_Applications/Personal_Information_Managers/"], "cat_name": ["", "PersonalInformationManagersWebApplications", ""]}], "sites": [{"site_title": "43 Folders", "site_url": "http://www.43folders.com/", "site_desc": "Aids to living including (but not limited to) productivity and time management tips, Mac OS X programs and technologies, ideas about modest ways to improve a person's life and reduce stress, and cool or helpful shortcuts that makes life a bit easier."}, {"site_title": 
"52 Projects", "site_url": "http://www.52projects.com/", "site_desc": "Motivating people to work on whatever projects they have long wanted or needed to do."}, {"site_title": "ABCs of Life Skills", "site_url": "http://lifeskills.endlex.com/", "site_desc": "Articles and references about organizing a successful life. Divided into general knowledge, money, work, family, health, and communication."}, {"site_title": "Checklists.com", "site_url": "http://www.checklists.com/atoz.html", "site_desc": "Free checklists on a large number of activities."}, {"site_title": "Clutterbug Network", "site_url": "http://www.clutterbug.net/", "site_desc": "Organizer directory, free newsletter."}, {"site_title": "Creative Homemaking - Organize", "site_url": "http://www.creativehomemaking.com/organize_1.shtml", "site_desc": "Offers tips for home, family, clutter control, holidays, cooking and time management. Includes a newsletter and check lists."}, {"site_title": "Flylady.net", "site_url": "http://www.flylady.net/", "site_desc": "Offers a system for organizing and managing a home, based on the concept of daily routines and a focus on small, time- and space-limited tasks. Provides resources, tips and newsletter."}, {"site_title": "Get Organized Now", "site_url": "http://www.getorganizednow.com/", "site_desc": "Offers tools, ideas and articles. Features monthly checklists, a discussion forum, e-courses and a newsletter."}, {"site_title": "I Need More Time", "site_url": "http://ineedmoretime.com/", "site_desc": "Free organizing tips, ideas and articles.  Sells additional tips as an ebook."}, {"site_title": "Lifehack.org", "site_url": "http://www.lifehack.org/", "site_desc": "Pointers on productivity, getting things done and lifehacks."}, {"site_title": "List Organizer", "site_url": "http://listorganizer.com/", "site_desc": "Offers planning lists with time management instructions for home, personal, travel, budgets, children, pets. 
Includes a newsletter."}, {"site_title": "Messies Anonymous", "site_url": "http://www.messies.com/", "site_desc": "Dedicated to bringing harmony in the home through understanding and aiding the messie mindset. Provides resources, FAQ and newsletter."}, {"site_title": "National Association of Professional Organizers (NAPO)", "site_url": "http://www.napo.net/", "site_desc": "Non-profit educational association whose members include organizing consultants, speakers, trainers, authors, and manufacturers of organizing products. Includes membership information, events, chapters, FAQ and a newsletter."}, {"site_title": "OrganizeTips", "site_url": "http://www.organizetips.com/", "site_desc": "Tips on organizing daily life.  Free planners, organizers, free software for home, office, wedding, moving, pregnancy, holiday and budget."}, {"site_title": "Printable Checklists", "site_url": "http://www.printablechecklists.com/", "site_desc": "Printable charts and checklists on topics such as parenting, children and special occasions. Includes a newsletter."}, {"site_title": "Professional Organizer Academy", "site_url": "http://professionalorganizeracademy.com/", "site_desc": "Online training academy for professional organizers."}, {"site_title": "Professional Organizers Web Ring", "site_url": "http://www.organizerswebring.com/", "site_desc": "Works to promote the field and to provide information on products and services. Includes events and FAQ."}]}
{"category_path": "Home/Personal_Finance/", "categories": [{"cat_url": ["/Reference/Education/Colleges_and_Universities/Financial_Aid/"], "cat_name": ["", "CollegeFinancialAid", ""]}, {"cat_url": ["/Home/Personal_Finance/Money_Management/Debt_and_Bankruptcy/"], "cat_name": ["", "DebtandBankruptcy", ""]}, {"cat_url": ["/Society/Issues/Violence_and_Abuse/Elder/Financial_Abuse/"], "cat_name": ["", "ElderFraudIssues", ""]}, {"cat_url": ["/Society/Law/Legal_Information/Estate_Planning_and_Administration/"], "cat_name": ["", "EstatePlanning", ""]}, {"cat_url": ["/Society/Crime/Theft/Identity_Theft/"], "cat_name": ["", "IdentifyTheftIssues", ""]}, {"cat_url": ["/Home/Personal_Finance/Insurance/"], "cat_name": ["", "Insurance", ""]}, {"cat_url": ["/Home/Personal_Finance/Investing/"], "cat_name": ["", "Investing", ""]}, {"cat_url": ["/Home/Personal_Finance/Money_Management/Loans/"], "cat_name": ["", "Loans", ""]}, {"cat_url": ["/Home/Personal_Finance/Money_Management/"], "cat_name": ["", "MoneyManagement", ""]}, {"cat_url": ["/Home/Personal_Finance/Money_Management/Loans/Home/"], "cat_name": ["", "Mortgages", ""]}, {"cat_url": ["/Home/Personal_Finance/Philanthropy/"], "cat_name": ["", "Philanthro­py", ""]}, {"cat_url": ["/Home/Personal_Finance/Retirement/"], "cat_name": ["", "Retirement", ""]}, {"cat_url": ["/Home/Personal_Finance/Software/"], "cat_name": ["", "Software", ""]}, {"cat_url": ["/Home/Personal_Finance/Tax_Preparation/"], "cat_name": ["", "TaxPreparatio­n", ""]}, {"cat_url": ["/Home/Personal_Finance/Unclaimed_Money/"], "cat_name": ["", "UnclaimedMoney", ""]}], "sites": [{"site_title": "20 Something Finance", "site_url": "http://20somethingfinance.com/", "site_desc": "Articles focused on helping young people manage their money."}, {"site_title": "AARP - Money and Work", "site_url": "http://www.aarp.org/money/", "site_desc": "Discussion of money matters in considerable depth, especially those related to people who have retired or are planning to in the near 
future."}, {"site_title": "About.com: Financial Planning", "site_url": "http://financialplan.about.com/", "site_desc": "Information on personal financial planning, including budgeting, savings, investing, retirement, insurance, and taxes."}, {"site_title": "American Savings Education Council", "site_url": "http://www.asec.org/", "site_desc": "A coalition of government and industry institutions to educate people on all aspects of personal finance and wealth development, including credit management, college savings, home purchase, and retirement planning."}, {"site_title": "Bankrate.com", "site_url": "http://www.bankrate.com/", "site_desc": "An online publication that provides consumers with  financial data, research and editorial information on       non-investment financial products."}, {"site_title": "CCH Financial Planning Toolkit", "site_url": "http://www.finance.cch.com/", "site_desc": "Information to manage one's personal finances, including investments, insurance, risk and asset management strategies, and tax, retirement and estate planning."}, {"site_title": "CNBC", "site_url": "http://www.cnbc.com/", "site_desc": "Headline news, articles, reports, stocks and quotes, message boards, and a stock ticker."}, {"site_title": "CNN/Money", "site_url": "http://money.cnn.com/", "site_desc": "Combines practical personal finance advice, calculators and investing tips with business news, stock quotes, and financial market coverage from the editors of CNN and Money Magazine."}, {"site_title": "ConsumerReports.org: Money", "site_url": "http://www.consumerreports.org/cro/money/index.htm", "site_desc": "Information about adjustable rate mortgage, investment tools, and personal finance tips."}, {"site_title": "Federal Reserve Board: Consumer Information", "site_url": "http://www.federalreserve.gov/consumers.htm", "site_desc": "Centralized home for articles giving advice and warnings about financial topics, products, and scams."}, {"site_title": "Forbes.com - Personal 
Finance", "site_url": "http://www.forbes.com/finance/", "site_desc": "Financial information previously published in the print version of Forbes."}, {"site_title": "I Retire Early", "site_url": "http://www.iretireearly.com/", "site_desc": "How to save money, advance your career and manage your finances."}, {"site_title": "Inflation Calculator", "site_url": "http://www.westegg.com/inflation/", "site_desc": "Adjusts a given amount of money for inflation, according to the Consumer Price Index."}, {"site_title": "Institute of Consumer Financial Education", "site_url": "http://www.financial-education-icfe.org/", "site_desc": "Offers financial education for all age groups with a special section devoted to teaching children about money."}, {"site_title": "International Foundation for Retirement Education (InFRE)", "site_url": "http://www.infre.org/", "site_desc": "A not-for-profit educational foundation dedicated to empowering working Americans with the motivation and capability to save and plan for a successful retirement."}, {"site_title": "Joe Taxpayer", "site_url": "http://www.joetaxpayer.com/", "site_desc": "Blog covering nearly all of the personal finance topics, with an emphasis on taxes."}, {"site_title": "The JumpStart Coalition for Personal Financial Literacy", "site_url": "http://www.jumpstart.org/", "site_desc": "Purpose is to evaluate the financial literacy of young adults; develop, disseminate, and encourage the use of guidelines for grades K-12; and promote the teaching of personal finance."}, {"site_title": "Kiplinger Online", "site_url": "http://www.kiplinger.com/", "site_desc": "Investing, personal finance, calculators and financial advice."}, {"site_title": "Marketplace", "site_url": "http://www.marketplace.org/", "site_desc": "Public radio business and economic news and commentary."}, {"site_title": "MarketWatch.com: Personal Finance", "site_url": "http://www.marketwatch.com/personal-finance", "site_desc": "Tips and stories for managing your personal 
finances."}, {"site_title": "Money Instructor", "site_url": "http://www.moneyinstructor.com/", "site_desc": "Tools and information to help   teach money and money management, business, the economy, and investing."}, {"site_title": "Money Talks News", "site_url": "http://www.moneytalksnews.com/", "site_desc": "Tips and advice to help you spend less and save more."}, {"site_title": "The Money Ways", "site_url": "http://www.themoneyways.com/", "site_desc": "Advice on money management, saving money, and budgeting."}, {"site_title": "The Motley Fool", "site_url": "http://www.fool.com/", "site_desc": "Investing information and an enjoyably useful site. Updated hourly."}, {"site_title": "MsMoney.com", "site_url": "http://www.msmoney.com/", "site_desc": "Resource for women to learn about financial planning, personal finance and investing."}, {"site_title": "Mymoney.gov - Financial Literacy Education Commission", "site_url": "http://www.mymoney.gov/", "site_desc": "Starting point for information intended by the US government to help improve the financial literacy and education of persons in the United States."}, {"site_title": "The New York Times - Your Money", "site_url": "http://www.nytimes.com/pages/business/yourmoney/", "site_desc": "Articles and features on investing, pensions and retirement plans, mortgage rates, mutual funds, the stock market, bonds and notes.  Also has company research, earnings reports and market insight."}, {"site_title": "Wealth Informatics", "site_url": "http://www.wealthinformatics.com/", "site_desc": "Blog discussing everyday finance topics."}, {"site_title": "Yahoo! Finance", "site_url": "http://finance.yahoo.com/", "site_desc": "Personal finance, investing tips and news."}, {"site_title": "Your Money Page", "site_url": "http://www.yourmoneypage.com/", "site_desc": "Online calculators for financial planning and personal finance."}]}
{"category_path": "Home/Moving_and_Relocating/", "categories": [{"cat_url": ["/Home/Consumer_Information/Home_and_Family/Moving_and_Relocating/"], "cat_name": ["", "ConsumerInformation", ""]}, {"cat_url": ["/Business/Business_Services/Corporate_Relocation/"], "cat_name": ["", "CorporateRelocation", ""]}, {"cat_url": ["/Home/Moving_and_Relocating/Moving/"], "cat_name": ["", "Moving", ""]}, {"cat_url": ["/Home/Moving_and_Relocating/Publications/"], "cat_name": ["", "Publicatio­ns", ""]}, {"cat_url": ["/Business/Real_Estate/Agents_and_Agencies/"], "cat_name": ["", "RealEstateAgencies", ""]}, {"cat_url": ["/Business/Real_Estate/By_Region/"], "cat_name": ["", "RealEstatebyCountry", ""]}, {"cat_url": ["/Business/Real_Estate/Residential/Rentals/"], "cat_name": ["", "ApartmentsandRentals", ""]}, {"cat_url": ["/Society/Gay%2C_Lesbian%2C_and_Bisexual/Home_and_Living/Moving_and_Relocating/"], "cat_name": ["", "Gay,Lesbian,andBisexual", ""]}, {"cat_url": ["/Home/Moving_and_Relocating/International_Relocation/"], "cat_name": ["", "Internatio­nalRelocation", ""]}, {"cat_url": ["/Home/Moving_and_Relocating/Military_Relocation/"], "cat_name": ["", "MilitaryRelocation", ""]}, {"cat_url": ["/Business/Real_Estate/Residential/Rentals/Students/"], "cat_name": ["", "StudentHousing", ""]}], "sites": []}
{"category_path": "Home/Homeowners/", "categories": [{"cat_url": ["/Home/Home_Improvement/Decorating/"], "cat_name": ["", "Decorating", ""]}, {"cat_url": ["/Home/Home_Improvement/Design_and_Construction/"], "cat_name": ["", "DesignandConstruction", ""]}, {"cat_url": ["/Home/Do-It-Yourself/"], "cat_name": ["", "Do-It-Yourself", ""]}, {"cat_url": ["/Home/Home_Improvement/Energy_Efficiency/"], "cat_name": ["", "EnergyEfficiency", ""]}, {"cat_url": ["/Society/Religion_and_Spirituality/Taoism/Feng_Shui/"], "cat_name": ["", "FengShui", ""]}, {"cat_url": ["/Home/Home_Improvement/Automation/"], "cat_name": ["", "HomeAutomation", ""]}, {"cat_url": ["/Home/Homeowners/Home_Buyers/"], "cat_name": ["", "HomeBuyers", ""]}, {"cat_url": ["/Home/Home_Improvement/"], "cat_name": ["", "HomeImprovement", ""]}, {"cat_url": ["/Home/Homeowners/Homeowner_Associations/"], "cat_name": ["", "HomeownerAssociatio­ns", ""]}, {"cat_url": ["/Business/Real_Estate/Residential/Cooperatives/"], "cat_name": ["", "HousingCooperatives", ""]}, {"cat_url": ["/Science/Environment/Air_Quality/Indoor_Air_Quality/"], "cat_name": ["", "IndoorAirQuality", ""]}, {"cat_url": ["/Home/Homeowners/Pest_Control/"], "cat_name": ["", "PestControl", ""]}, {"cat_url": ["/Shopping/Home_and_Garden/Outdoor_Structures/Playsets/"], "cat_name": ["", "Playhouses", ""]}, {"cat_url": ["/Home/Home_Improvement/Restoration/"], "cat_name": ["", "Restoration", ""]}, {"cat_url": ["/Home/Homeowners/Treehouses/"], "cat_name": ["", "Treehouses", ""]}], "sites": [{"site_title": "The Condominium Bluebook (Condominium Laws)", "site_url": "http://www.condobook.com/", "site_desc": "The complete guide to the operations of condominiums, planned developments and other common interest developments in California."}, {"site_title": "EPA.gov - Refrigerant-22 Phaseout", "site_url": "http://www.epa.gov/ozone/title6/phaseout/22phaseout.html", "site_desc": "EPA site with homeowner information for consideration when purchasing or repairing a residential 
system or heat pump."}, {"site_title": "Home Owners Information Center", "site_url": "http://www.ourfamilyplace.com/homeowner/", "site_desc": "Guide to remodeling, refinancing, household budgets and getting the most enjoyment from your home."}, {"site_title": "Homebuilding Pitfalls", "site_url": "http://www.homebuildingpitfalls.com/", "site_desc": "Aims to help consumers save time, money and stress by providing advice on how to properly building their own home"}, {"site_title": "Popular Mechanics: Home Improvement", "site_url": "http://www.popularmechanics.com/home_journal/home_improvement/", "site_desc": "Project information for home and garden. Furniture making,  gardening, home improvement, tools, homeowner's clinic, how it works section. Illustrated."}, {"site_title": "WSJ.com's Real Estate Journal", "site_url": "http://www.realestatejournal.com/", "site_desc": "A guide to buying, selling and maintaining a home."}]}
{"category_path": "Home/Homemaking/", "categories": [{"cat_url": ["/Home/Homemaking/Cleaning_and_Stains/"], "cat_name": ["", "CleaningandStains", ""]}, {"cat_url": ["/Home/Home_Improvement/Decorating/"], "cat_name": ["", "Decorating", ""]}, {"cat_url": ["/Home/Homemaking/Frugality/"], "cat_name": ["", "Frugality", ""]}, {"cat_url": ["/Health/Alternative/Non-Toxic_Living/"], "cat_name": ["", "Non-ToxicLiving", ""]}, {"cat_url": ["/Home/Family/Parenting/"], "cat_name": ["", "Parenting", ""]}, {"cat_url": ["/Home/Personal_Organization/"], "cat_name": ["", "PersonalOrganization", ""]}, {"cat_url": ["/Home/Homemaking/Christian/"], "cat_name": ["", "Christian", ""]}, {"cat_url": ["/Home/Homemaking/Celebrity_Homemakers/"], "cat_name": ["", "CelebrityHomemakers", ""]}, {"cat_url": ["/Home/Homemaking/News_and_Media/"], "cat_name": ["", "NewsandMedia", ""]}], "sites": [{"site_title": "About.com - Housekeeping", "site_url": "http://housekeeping.about.com/", "site_desc": "Offers cleaning articles, how-to's, and product reviews. Includes a newsletter and message board."}, {"site_title": "All About Home by Service Master", "site_url": "http://www.allabouthome.com/", "site_desc": "Offers advice on topics including seasonal issues and disaster preparedness. Features a virtual tour and measurement calculators."}, {"site_title": "AOL Living", "site_url": "http://living.aol.com/", "site_desc": "Features homemaking help, organizational tips, recipes, beauty advice, and decorating ideas."}, {"site_title": "Barefoot Lass's Hints & Tips", "site_url": "http://members.tripod.com/~Barefoot_Lass/", "site_desc": "Offers information on topics such as removing crayon marks from walls, finding the best hangover cure and alternative uses for cola. 
Includes awards and information about trigminal neuralgia."}, {"site_title": "Berkeley Parents Network - Advice About Household Management", "site_url": "http://parents.berkeley.edu/advice/household/index.html", "site_desc": "Offers advice on topics such as health and safety, household organization, cleaning and laundry. Includes subscription information."}, {"site_title": "Bob Allison's Ask Your Neighbor - Helpful Household Hints", "site_url": "http://www.askyourneighbor.com/hhints.htm", "site_desc": "Features tips shared on Bob Allison's Ask your Neighbor radio program. Includes instructions for making cleaning products."}, {"site_title": "Creative Homemaking", "site_url": "http://www.creativehomemaking.com/", "site_desc": "Offers organization, decorating, crafts, frugal living and parenting hints. Includes holiday ideas, recipes and a newsletter."}, {"site_title": "DontForgetTheMilk.com", "site_url": "http://www.dontforgetthemilk.com/", "site_desc": "Creates shopping lists sorted by store and price. Features a message board. Requires free registration."}, {"site_title": "eHow: Organize Your Closet", "site_url": "http://www.ehow.com/closet-organizing/", "site_desc": "Full-length article covers the process of organizing the items in a closet."}, {"site_title": "The F.U.N. Place - Families United on the Net", "site_url": "http://www.thefunplace.com/", "site_desc": "Offers home tips, recipes, crafts and parenting articles. Includes forums, chat and a newsletter."}, {"site_title": "Forums for the Chaotic Home", "site_url": "http://ths.gardenweb.com/forums/", "site_desc": "Offers discussion forums on topics such as the house, cooking, crafts and hobbies and the family. Includes information on meetings."}, {"site_title": "Hints and Things", "site_url": "http://www.hintsandthings.com/", "site_desc": "Offers advice that used to be passed down from generation to generation. 
Features competitions and a newsletter."}, {"site_title": "Hints from Heloise", "site_url": "http://www.heloise.com/", "site_desc": "Offers tips for the home, garden and travel."}, {"site_title": "Home Made Simple", "site_url": "http://www.homemadesimple.com/", "site_desc": "Includes features on home decorating, gardening, and organizing, with ideas to simplify, organize, beautify and inspire life."}, {"site_title": "Homemaking School for Children", "site_url": "http://theparentsite.com/parenting/homemakingschool.asp", "site_desc": "Discusses how to teach children lessons in house cleaning and responsibility. By Monica Resinger."}, {"site_title": "Household Hints by Myra L. Fitch", "site_url": "http://lonestar.texas.net/~fitch/hints/hints.html#stainlesssteel", "site_desc": "Offers advice on topics such as getting whites white and removing hard water marks on polished marble. Includes reactions from readers."}, {"site_title": "Joey Green's WackyUses.com", "site_url": "http://www.wackyuses.com/", "site_desc": "Offers little-known uses for well-known products. Includes histories and facts behind the products."}, {"site_title": "John's House", "site_url": "http://johnshouse.itgo.com/", "site_desc": "Offers advice on topics such as food storage, removing pet hair from clothing and preventing dust build-up on television screens. Includes author's profile."}, {"site_title": "The New Homemaker", "site_url": "http://www.thenewhomemaker.com/", "site_desc": "Offers advice and resources on topics including parenting, thriftiness, kitchen, family health, crafts, decorating, and organization. Features news and chat."}, {"site_title": "Old Fashioned Living", "site_url": "http://oldfashionedliving.com/", "site_desc": "Presents old-fashioned traditions for the modern family. 
Features a newsletter and discussion forum."}, {"site_title": "Organized Home", "site_url": "http://organizedhome.com/", "site_desc": "Offers articles on uncluttering the house, cutting mealtime chaos, streamlining storage and finding more time. Includes a newsletter."}, {"site_title": "Robbie's Kitchen - Household Tips & Tricks", "site_url": "http://www.robbiehaf.com/RobbiesKitchen/RobbiesHints.html", "site_desc": "Offers tips for cleaning, cooking, laundry, and home remedies. Features a message board."}, {"site_title": "Seeking Sources", "site_url": "http://www.seekingsources.com/", "site_desc": "Offers articles about cooking, home finance and the holidays. Topics include gifts from the kitchen, how to take a financial inventory and how to choose the right cookware."}, {"site_title": "Uses for Vinegar", "site_url": "http://www.angelfire.com/cantina/homemaking/vinegar.html", "site_desc": "Offers suggestions on topics including cleaning tools, getting rid of an upset stomach and laundry care."}]}
{"category_path": "Home/Home_Improvement/", "categories": [{"cat_url": ["/Home/Home_Improvement/Appliances/"], "cat_name": ["", "Appliances", ""]}, {"cat_url": ["/Home/Home_Improvement/Bathrooms/"], "cat_name": ["", "Bathrooms", ""]}, {"cat_url": ["/Home/Home_Improvement/Exterior/"], "cat_name": ["", "Exterior", ""]}, {"cat_url": ["/Home/Home_Improvement/Floors/"], "cat_name": ["", "Floors", ""]}, {"cat_url": ["/Home/Home_Improvement/Furniture/"], "cat_name": ["", "Furniture", ""]}, {"cat_url": ["/Home/Home_Improvement/Kitchens/"], "cat_name": ["", "Kitchens", ""]}, {"cat_url": ["/Home/Home_Improvement/Storage/"], "cat_name": ["", "Storage", ""]}, {"cat_url": ["/Home/Home_Improvement/Walls/"], "cat_name": ["", "Walls", ""]}, {"cat_url": ["/Home/Home_Improvement/Windows_and_Doors/"], "cat_name": ["", "WindowsandDoors", ""]}, {"cat_url": ["/Home/Home_Improvement/Automation/"], "cat_name": ["", "Automation", ""]}, {"cat_url": ["/Home/Home_Improvement/Climate_Control/"], "cat_name": ["", "ClimateControl", ""]}, {"cat_url": ["/Home/Home_Improvement/Decorating/"], "cat_name": ["", "Decorating", ""]}, {"cat_url": ["/Home/Home_Improvement/Electrical/"], "cat_name": ["", "Electrical", ""]}, {"cat_url": ["/Home/Home_Improvement/Energy_Efficiency/"], "cat_name": ["", "EnergyEfficiency", ""]}, {"cat_url": ["/Home/Home_Improvement/Lighting/"], "cat_name": ["", "Lighting", ""]}, {"cat_url": ["/Home/Home_Improvement/Painting/"], "cat_name": ["", "Painting", ""]}, {"cat_url": ["/Home/Home_Improvement/Plumbing/"], "cat_name": ["", "Plumbing", ""]}, {"cat_url": ["/Home/Home_Improvement/Restoration/"], "cat_name": ["", "Restoratio­n", ""]}, {"cat_url": ["/Home/Home_Improvement/Safety_and_Security/"], "cat_name": ["", "SafetyandSecurity", ""]}, {"cat_url": ["/Home/Home_Improvement/Welding_and_Soldering/"], "cat_name": ["", "WeldingandSoldering", ""]}, {"cat_url": ["/Home/Home_Improvement/Chats_and_Forums/"], "cat_name": ["", "ChatsandForums", ""]}, {"cat_url": 
["/Business/Construction_and_Maintenance/Commercial_Contractors/"], "cat_name": ["", "CommercialContractors", ""]}, {"cat_url": ["/Home/Home_Improvement/Design_and_Construction/"], "cat_name": ["", "DesignandConstructi­on", ""]}, {"cat_url": ["/Home/Home_Improvement/Glossaries/"], "cat_name": ["", "Glossaries", ""]}, {"cat_url": ["/Home/Home_Improvement/News_and_Media/"], "cat_name": ["", "NewsandMedia", ""]}, {"cat_url": ["/Home/Home_Improvement/Tools_and_Equipment/"], "cat_name": ["", "ToolsandEquipment", ""]}], "sites": [{"site_title": "411 Home Repair", "site_url": "http://www.411homerepair.com/", "site_desc": "Collection of short articles offering tips for home repair, gardens, tools, and appliances."}, {"site_title": "Acme How To", "site_url": "http://www.acmehowto.com/", "site_desc": "Articles on tasks and repairs around the home and garden."}, {"site_title": "Around the House", "site_url": "http://www.thefunplace.com/house/home/", "site_desc": "Articles on pool maintenance, water softeners, lead exposure, doors and windows, and heating safely."}, {"site_title": "Ask The Builder", "site_url": "http://www.askthebuilder.com/", "site_desc": "Information about home building and remodeling."}, {"site_title": "AskToolTalk.com", "site_url": "http://www.asktooltalk.com/", "site_desc": "Home improvement experts feature articles, product reviews, links to manufacturers, and online shopping."}, {"site_title": "Beaver House Addition", "site_url": "http://adam_sb.tripod.com/beaveraddition/", "site_desc": "Homeowner's journal of large, residential building addition project spanning several years. 
Includes photographs, descriptions, and stories."}, {"site_title": "BobVila.com", "site_url": "http://www.bobvila.com/", "site_desc": "Home improvement projects, featured products, tip library, bulletin board, designer tools, and information about television programs hosted by Bob Vila."}, {"site_title": "Construction Resource", "site_url": "http://www.construction-resource.com/", "site_desc": "Employment listings, forums, how to articles, and calculators."}, {"site_title": "Consumer Information Center: Housing", "site_url": "http://publications.usa.gov/USAPubs.php?CatID=8", "site_desc": "Collection of home maintenance articles ranging from two to thirty-six pages available for download or purchase. Most include detailed information and illustrations. From the United States Federal Consumer Information Center."}, {"site_title": "Dave's Shop Talk: Building Confidence", "site_url": "http://daveosborne.com/dave/index.php", "site_desc": "Features articles and plans for the do it yourself person on home renovations and home repair, from a professional carpenter, renovator and contractor."}, {"site_title": "DIY Doctor", "site_url": "http://www.diydoctor.org.uk/projects.htm", "site_desc": "Collection of articles and photographs covering a range of home projects."}, {"site_title": "Diy Fix It", "site_url": "http://www.diyfixit.co.uk/", "site_desc": "Collection of tips and advice covering building and repairs, plumbing, tiling, electrical, wallpapering, painting, and decorating."}, {"site_title": "DIY Not", "site_url": "http://www.diynot.com/", "site_desc": "Information including an encyclopedia, regular articles and a discussion forum."}, {"site_title": "DIY Repair and Home Improvement Forums", "site_url": "http://www.diychatroom.com/", "site_desc": "A community of homeowners and contractors sharing knowledge on painting, construction, electrical, plumbing, carpentry, flooring, and landscaping."}, {"site_title": "DIYData.com", "site_url": "http://www.diydata.com/", 
"site_desc": "Collection of short articles on topics including plumbing, painting, tool usage, and other repair projects around the home."}, {"site_title": "DIYonline.com", "site_url": "http://www.diyonline.com/", "site_desc": "Articles and step-by-step tutorials with photographs for a wide variety of home improvement projects, glossary, cost calculators (United States-based), and design tools."}, {"site_title": "HammerZone.com", "site_url": "http://www.hammerzone.com/", "site_desc": "Photographs and step-by-step instructions for electrical, plumbing, kitchen, bath, windows and doors, exterior, flooring, and carpentry projects."}, {"site_title": "Handymanwire", "site_url": "http://www.handymanwire.com/", "site_desc": "Collection of articles, tips, FAQs, and forums for a variety of home improvement projects."}, {"site_title": "Helpwithdiy.com", "site_url": "http://www.helpwithdiy.com/", "site_desc": "Illustrated tutorials on topics including plumbing, painting, tiling, and decorating."}, {"site_title": "Home Improvement", "site_url": "http://www.home-improvement-home-improvement.com/", "site_desc": "Do-it-yourself articles cover interior and exterior, decks and patios, and decorating."}, {"site_title": "Home Repair", "site_url": "http://homerepair.about.com/index.htm", "site_desc": "Information on do-it-yourself projects or major renovations. 
Offers time and money saving techniques, diagrams, and links to other home repair sites."}, {"site_title": "Home Repair Stuff", "site_url": "http://www.factsfacts.com/MyHomeRepair/", "site_desc": "Frequently asked questions from the alt.home.repair newsgroup, with plumbing, carpentry, and tool tips."}, {"site_title": "Home Repairs", "site_url": "http://www.repair-home.com/", "site_desc": "Home repair tips, discussion forum and contractor search utility."}, {"site_title": "Home Tips", "site_url": "http://www.hometips.com/", "site_desc": "Home repair and improvement advice and tips including \"How Your House Works\", a manual by Don Vandervort, buying guides, ideas for home and shop."}, {"site_title": "HomeDoctor.net", "site_url": "http://homedoctor.net/", "site_desc": "Short articles covering appliance repair, pest control, electrical repairs, energy savings, and heating and cooling in the home. Also includes discussion forums."}, {"site_title": "HomeImprove.com", "site_url": "http://homeimprove.com/", "site_desc": "Collection of short articles on dozens of home improvement and repair topics."}, {"site_title": "Hometime", "site_url": "http://www.hometime.com/Howto/howto.htm", "site_desc": "An assortment of how-to guides, with manufacturer and safety information."}, {"site_title": "How Not to Build an Addition", "site_url": "http://www.homehumor.com/", "site_desc": "All aspects of home improvement from getting a loan to finish carpentry.  
Humor and practical advice."}, {"site_title": "Jackie Craven: The Fix", "site_url": "http://jackiecraven.com/fixit/thefix.htm", "site_desc": "Article archive with answers on a variety of subjects including kitchen and bath, paint and wallpaper, home design, and pests."}, {"site_title": "Jerry Built", "site_url": "http://jerrybuilt.com/", "site_desc": "Woodworking and home-improvement site offering the opportunity to participate in the construction of an actual online project, by joining the \"team\" and submitting ideas, plans, and criticisms."}, {"site_title": "Kenovations", "site_url": "http://www.kenovations.net/", "site_desc": "A homeowner's step-by-step guide to renovating a 1968 Cape Cod home."}, {"site_title": "Mobile Home Doctor", "site_url": "http://www.mobilehomedoctor.com/", "site_desc": "Short articles offering tips for repair and improvement specific to mobile homes. Includes a tutorial on construction of mobile homes."}, {"site_title": "Mobile Home Repair", "site_url": "http://www.mobilehomerepair.com/", "site_desc": "Advice; hardboard siding lawsuit update."}, {"site_title": "Move, Inc. Home Improvement", "site_url": "http://www.realtor.com/advice/home-improvement/", "site_desc": "Offers a variety of how-to guides and do-it-yourself related calculators."}, {"site_title": "National Association of the Remodeling Industry", "site_url": "http://www.nari.org/", "site_desc": "Tips on planning, what to look for when hiring a pro."}, {"site_title": "National Kitchen and Bath Association", "site_url": "http://www.nkba.org/", "site_desc": "Information for consumers and trade professionals. 
Consumers can locate design professionals, research design strategies, and request a free remodeling planning kit."}, {"site_title": "The Natural Handyman", "site_url": "http://www.naturalhandyman.com/", "site_desc": "Home repair help, humor, and encouragement through a collection of articles and a newsletter."}, {"site_title": "Practical DIY", "site_url": "http://www.practicaldiy.com/", "site_desc": "UK-specific information. Small series of Do-It-Yourself home repair articles."}, {"site_title": "Ten Square Metres", "site_url": "http://www.tensquaremetres.com/", "site_desc": "A photographic diary of a DIY project to extend a house by ten square metres."}, {"site_title": "This to That", "site_url": "http://www.thistothat.com/", "site_desc": "Advice about how to glue things to other              things. They are given with humor and good details."}, {"site_title": "Thumb and Hammer", "site_url": "http://www.thumbandhammer.com/", "site_desc": "A do-it-yourselfer's photo reviews of his past projects."}, {"site_title": "You Repair", "site_url": "http://www.yourepair.com/", "site_desc": "Helping people fix things around the house or understanding what a contractor will do to solve your troubles. Managed by Robert Marencin."}]}

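I have to say this site is enormous. While I was writing this article, my spider still had not finished crawling the Home directory. One of my teammates, however, had already written and run his own project to crawl the Home directory (he says it took about an hour and a half, while students in other groups needed as long as four hours to finish their directories), and his result came to more than 13,280 records:

My teammate's run result

Adding my own run result:
Run result of this experiment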

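Why did my teammate get more than 10,000 records while I have only 2,378? It looks odd, but the reason is simple: I store all the sites of a directory as objects inside that directory's sites list, so each directory produces a single item and the overall record count is lower.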
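To compare the two counts on equal terms, one can count the entries inside every sites list rather than the number of items. The following is only a minimal sketch: it assumes the output was saved as a JSON Lines file (one object per line, as in the excerpt above), and the helper name count_items_and_sites is hypothetical.

```python
import json


def count_items_and_sites(lines):
    """Return (number of directory items, total number of site records)."""
    n_items = 0
    n_sites = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        item = json.loads(line)
        n_items += 1
        # Each item holds all sites of one directory in its "sites" list.
        n_sites += len(item.get("sites", []))
    return n_items, n_sites


# Two made-up lines in the same shape as the crawl output.
sample = [
    '{"category_path": "Home/A/", "categories": [], '
    '"sites": [{"site_title": "x"}, {"site_title": "y"}]}',
    '{"category_path": "Home/B/", "categories": [], "sites": []}',
]
print(count_items_and_sites(sample))
```

With a real output file, passing the open file object (for example open("home.json", encoding="utf-8"), where the file name is an assumption) works the same way, because a file object iterates line by line.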

V. Main Problems Encountered

1. Problems in the code

child_cat['cat_name'] = cat.xpath('a/div/text()').extract().replace("\r\n", "").replace(" ", "")

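The line above is my first attempt at getting the directory name. Because the extracted result contains \n, \r, and space characters, I planned to chain replace directly onto it, but running it raised an error: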

AttributeError: 'list' object has no attribute 'replace'

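I later found the cause: extract() on an xpath result returns a list, not a string. replace only works on strings and cannot be applied to a list directly, so the replacement has to be done by iterating over the list elements. (Reference: https://stackoverflow.com/questions/36642782/attributeerror-list-object-has-no-attribute-replace-when-trying-to-remove-c)
So the code was changed to: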

# extract() returns a list of strings, so clean each element in turn
child_cat['cat_name'] = cat.xpath('a/div/text()').extract()
child_cat['cat_name'] = [item.replace("\r\n", "") for item in child_cat['cat_name']]
child_cat['cat_name'] = [item.replace(" ", "") for item in child_cat['cat_name']]

One problem remains, though: this still does not replace ' and ...
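The output shown earlier still carries empty strings around each name (for example ["", "Moving", ""]), and removing every space also glues words together ("DesignandConstruction"). A possible cleanup is sketched below under stated assumptions: the helper clean_cat_name is hypothetical and not part of the original spider, and the removal of the soft hyphen (U+00AD) assumes that this invisible character is what appears inside some of the names.

```python
def clean_cat_name(parts):
    """Join the text fragments of one category link into a single clean name."""
    cleaned = []
    for part in parts:
        # Assumption: some names contain soft hyphens (U+00AD); drop them.
        part = part.replace("\u00ad", "")
        # split() with no arguments splits on any whitespace run (\r, \n,
        # spaces), so joining with one space keeps the words separated.
        text = " ".join(part.split())
        if text:  # skip fragments that were only whitespace
            cleaned.append(text)
    return " ".join(cleaned)


raw = ["\r\n        ", "Design and Construction", "\r\n    "]
print(clean_cat_name(raw))
```

In the spider this would be called on the list returned by cat.xpath('a/div/text()').extract(), so that cat_name becomes one string instead of a list with empty entries.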
2. Being blocked during the crawl
Having crawled Douban before, I knew the crawler needs to present itself as a browser, so I added the following to settings.py:

USER_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"

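Still, mistakes always creep in while writing code. At one point I introduced an infinite loop without noticing it. When I ran the spider, it kept crawling the sites under the same directory and never moved on. The request rate was too high, the site detected it, and my IP was banned, so every request returned a 403 error. Once the ban was lifted, I fixed the problem right away. In settings.py, find: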

#DOWNLOAD_DELAY = 3

Remove the #. This sets a 3-second delay between requests, so the crawl rate stays low.
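For reference, a throttling section of settings.py might look like the sketch below. Only DOWNLOAD_DELAY = 3 comes from this article; the other values are illustrative assumptions and would need tuning for the target site. All names are standard Scrapy settings.

```python
# settings.py (excerpt): a sketch of polite-crawling settings

# Wait about 3 seconds between requests to the same site (value from the article).
DOWNLOAD_DELAY = 3

# Vary the delay randomly (0.5x to 1.5x) so requests are less regular.
# This is already Scrapy's default; it is spelled out here for clarity.
RANDOMIZE_DOWNLOAD_DELAY = True

# Assumed value: limit parallel requests to a single domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Optional: let the AutoThrottle extension adapt the delay to server latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 3   # assumed starting delay, in seconds
AUTOTHROTTLE_MAX_DELAY = 30    # assumed upper bound, in seconds
```

Throttling only limits the request rate. It does not fix a spider that loops over the same pages, so the loop itself still has to be corrected in the spider code.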

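VI. Summary

Before crawling a site, I think you need to analyze its whole structure thoroughly: know which page each link leads to and which div blocks are the same on every page. Only then is it easy to analyze the pages and plan the crawl. All in all, this was the hardest crawler project I have done in the past few weeks, and it made me realize how much I still do not understand about Python and Scrapy, and how many methods and packages I do not yet know how to use. During the project it also helps to talk things over with teammates and classmates and to learn from one another. As for the site itself, it is highly structured, but as you go deeper layer by layer, getting the data still takes careful analysis and the right crawl route. It is a good site for practicing crawling, although the amount of data is rather large.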

References:
https://blog.csdn.net/u012150179/article/details/34913315
https://www.jianshu.com/p/83c73071d3cb
A site for learning Python basics: http://www.runoob.com/python/python-tutorial.html
A very important related link (it helped a great deal with this experiment): https://www.jianshu.com/p/d6b6feb0a504
Another approach from our group to collecting the data under dmoz/Home: https://www.jianshu.com/p/51419fec3915
