
Unable to run Scrapy code

I have written the following code:

spiders.test.py code:

    from scrapy.spider import BaseSpider
    from scrapy.selector import HtmlXPathSelector
    from wscraper.items import WscraperItem


    class MySpider(BaseSpider):
        name = "ExampleSpider"
        allowed_domains = ["timeanddate.com"]
        start_urls = ["https://www.timeanddate.com/worldclock/"]

        def parse(self, response):
            hxs = HtmlXPathSelector(response)
            titles = hxs.select("/html/body/div[1]/div[8]/section[2]/div[1]/table/tbody").extract()
            # for title in titles:
            #     text = title.select("a/text()").extract()
            #     link = title.select("a/@href").extract()
            print titles  # was `print title`, which raises a NameError
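
As an aside, BaseSpider and HtmlXPathSelector are long-deprecated names; on any recent Scrapy the same spider would use scrapy.Spider and response.xpath. A minimal sketch of that equivalent, keeping the original XPath and URLs (untested against the live page):

    import scrapy


    class MySpider(scrapy.Spider):
        name = "ExampleSpider"
        allowed_domains = ["timeanddate.com"]
        start_urls = ["https://www.timeanddate.com/worldclock/"]

        def parse(self, response):
            # response.xpath(...) replaces HtmlXPathSelector(response).select(...)
            titles = response.xpath(
                "/html/body/div[1]/div[8]/section[2]/div[1]/table/tbody").extract()
            self.logger.info("extracted %d node(s)", len(titles))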

The code for scraper.items is:

      from scrapy.item import Item, Field

      class WscraperItem(Item):
          # define the fields for your item here like:
          # name = scrapy.Field()
          title = Field()
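
For context, a short sketch of how parse would normally populate and yield this item with the same old-style API the spider uses (the XPath here is illustrative, not from the question):

      def parse(self, response):
          hxs = HtmlXPathSelector(response)
          for link in hxs.select("//table//a"):
              item = WscraperItem()
              # fill the declared `title` field from the anchor text
              item["title"] = link.select("text()").extract()
              yield item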

I get the following error when running the command "scrapy crawl ExampleSpider":

    [boto] ERROR: Caught exception reading instance data
    Traceback (most recent call last):
      File "/usr/lib/python2.7/dist-packages/boto/utils.py", line 210, in retry_url
        r = opener.open(req, timeout=timeout)
      File "/usr/lib/python2.7/urllib2.py", line 429, in open
        response = self._open(req, data)
      File "/usr/lib/python2.7/urllib2.py", line 447, in _open
        '_open', req)
      File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
        result = func(*args)
      File "/usr/lib/python2.7/urllib2.py", line 1228, in http_open
        return self.do_open(httplib.HTTPConnection, req)
      File "/usr/lib/python2.7/urllib2.py", line 1198, in do_open
        raise URLError(err)
    URLError: <urlopen error [Errno 101] Network is unreachable>
    [boto] ERROR: Unable to read instance data, giving up
    [scrapy] ERROR: Error downloading <GET https://www.timeanddate.com/worldclock/>
    Traceback (most recent call last):
      File "/usr/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 45, in mustbe_deferred
        result = f(*args, **kw)
      File "/usr/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/__init__.py", line 41, in download_request
        return handler(request, spider)
      File "/usr/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/http11.py", line 44, in download_request
        return agent.download_request(request)
        d = super(CachingThreadedResolver, self).getHostByName(name, timeout)
      File "/home/priyanka/.local/lib/python2.7/site-packages/twisted/internet/base.py", line 276, in getHostByName
        timeoutDelay = sum(timeout)
    TypeError: 'float' object is not iterable
    [scrapy] INFO: Dumping Scrapy stats:
    {'downloader/exception_count': 1,
     'downloader/exception_type_count/exceptions.TypeError': 1,
     'downloader/request_bytes': 228,
     'log_count/DEBUG': 2,
     'log_count/ERROR': 3,
     'log_count/INFO': 7,
     'scheduler/dequeued': 1,
     'scheduler/dequeued/memory': 1,
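
Two different problems are visible in this log. The [boto] ERROR lines are noise: Scrapy's optional S3 support asks boto to read EC2 instance metadata, which is unreachable on an ordinary machine, hence the URLError. The actual crash is the TypeError inside Twisted's resolver, which typically comes from mixing an old distro-packaged Scrapy (/usr/lib/python2.7/dist-packages) with a much newer Twisted (~/.local): old Scrapy passes the DNS timeout as a float where newer Twisted expects an iterable. A hedged sketch of the usual workaround, assuming the project's settings module (this is not taken from the original thread):

    # wscraper/settings.py (module path assumed)

    # Disable the optional S3 download handler so boto is never loaded
    # and never probes the EC2 instance-metadata service.
    DOWNLOAD_HANDLERS = {'s3': None}

For the TypeError itself, installing a matching Scrapy/Twisted pair usually helps, e.g. pip install --user --upgrade scrapy rather than mixing the dist-packages Scrapy with the newer ~/.local Twisted.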

Source: https://stackoverflow.com/questions/43796230
Updated: 2023-01-13 08:01

Accepted answer

It looks like UserSocialAuth objects now have a .refresh_token() method, which allows you to use .tokens and get the updated token.
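
For reference, a sketch of how that method is typically invoked with python-social-auth's Django integration; the provider name and strategy loading below are assumptions, not part of the answer:

    from social.apps.django_app.utils import load_strategy

    # `user` is a logged-in Django user with a linked social account
    social = user.social_auth.get(provider='google-oauth2')  # provider assumed
    social.refresh_token(load_strategy())      # refreshes and persists the token
    token = social.extra_data['access_token']  # the updated access token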
