Scrapy 1.2.0 has been released.
Scrapy is an asynchronous processing framework based on Twisted: a crawler framework written in pure Python. Users only need to customize a few modules to easily build a crawler that scrapes web page content and images of all kinds.
Changes:
New features
- New FEED_EXPORT_ENCODING setting to customize the encoding used when writing items to a file. This can be used to turn off \uXXXX escapes in JSON output. It is also useful for those wanting something other than UTF-8 for XML or CSV output (#2034).
- The startproject command now supports an optional destination directory to override the default one based on the project name (#2005).
- New SCHEDULER_DEBUG setting to log request serialization failures (#1610).
- The JSON encoder now supports serialization of set instances (#2058).
- Interpret application/json-amazonui-streaming as TextResponse (#1503).
- scrapy is imported by default when using shell tools (shell, inspect_response) (#2248).
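The \uXXXX escaping that FEED_EXPORT_ENCODING can switch off is the standard JSON ASCII-escaping behavior. A minimal standalone illustration with the plain json module (not Scrapy itself) shows the difference the setting makes to exported feeds:

```python
import json

item = {"title": "爬虫"}

# Default JSON serialization escapes non-ASCII characters:
print(json.dumps(item))                      # {"title": "\u722c\u866b"}

# With escaping turned off, the raw UTF-8 text is kept -- which is the
# effect of FEED_EXPORT_ENCODING = 'utf-8' in a project's settings.py:
print(json.dumps(item, ensure_ascii=False))  # {"title": "爬虫"}
```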
Bug fixes
- DefaultRequestHeaders middleware now runs before UserAgent middleware (#2088). Warning: this is technically backwards incompatible, though we consider this a bug fix.
- HTTP cache extension and plugins that use the .scrapy data directory now work outside projects (#1581). Warning: this is technically backwards incompatible, though we consider this a bug fix.
- Selector does not allow passing both response and text anymore (#2153).
- Fixed logging of wrong callback name with scrapy parse (#2169).
- Fix for an odd gzip decompression bug (#1606).
- Fix for selected callbacks when using CrawlSpider with scrapy parse (#2225).
- Fix for invalid JSON and XML files when the spider yields no items (#872).
- Implement flush() for StreamLogger, avoiding a warning in logs (#2125).
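The StreamLogger fix follows a classic pattern: a file-like object that redirects writes to the logging module must also provide flush(), because callers treat it as a real stream. A minimal sketch of that pattern (a hypothetical class written for illustration, not Scrapy's actual implementation):

```python
import logging

class StreamLogger:
    """File-like object that forwards writes to a logger.

    Hypothetical sketch of the pattern behind Scrapy's StreamLogger:
    without flush(), code that calls sys.stdout.flush() after stdout
    has been replaced by this object triggers warnings or errors.
    """
    def __init__(self, logger, level=logging.INFO):
        self.logger = logger
        self.level = level
        self.buffer = ''

    def write(self, message):
        self.buffer += message
        # Emit complete lines only; keep the partial tail buffered.
        while '\n' in self.buffer:
            line, self.buffer = self.buffer.split('\n', 1)
            self.logger.log(self.level, line)

    def flush(self):
        # The fix: provide flush() so stream consumers (e.g.
        # print(..., flush=True)) work; emit any buffered remainder.
        if self.buffer:
            self.logger.log(self.level, self.buffer)
            self.buffer = ''
```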