Unable to run Scrapy code
I have written the following code:
spiders.test.py code:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

from wscraper.items import WscraperItem


class MySpider(BaseSpider):
    name = "ExampleSpider"
    allowed_domains = ["timeanddate.com"]
    start_urls = ["https://www.timeanddate.com/worldclock/"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select("/html/body/div[1]/div[8]/section[2]/div[1]/table/tbody").extract()
        #for titles in titles:
        #    title = titles.select("a/text()").extract()
        #    link = titles.select("a/@href").extract()
        print titles  # was `print title`, which is undefined with the loop commented out
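The commented-out loop sketches pulling the link text and href out of each anchor in the table body. As a rough, stdlib-only illustration of what that extraction does (the markup below is hypothetical and much smaller than the real timeanddate.com page, and `xml.etree` supports only a subset of the XPath that Scrapy's `HtmlXPathSelector` accepts):

```python
import xml.etree.ElementTree as ET

# Tiny hypothetical stand-in for the page structure the spider's XPath targets.
html = """
<html><body><div>
  <section><div><table><tbody>
    <tr><td><a href="/city/1">London</a></td></tr>
    <tr><td><a href="/city/2">Tokyo</a></td></tr>
  </tbody></table></div></section>
</div></body></html>
"""

root = ET.fromstring(html)
# What the commented-out loop intends: link text plus href for each anchor.
for a in root.iter("a"):
    print(a.text, a.get("href"))
# London /city/1
# Tokyo /city/2
```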
The code for scraper.items is:

from scrapy.item import Item, Field


class WscraperItem(Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = Field()
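A Scrapy `Item` behaves like a dict restricted to its declared fields. This stub mimics that behaviour without requiring Scrapy installed, purely to illustrate why fields such as `title` are declared up front (the class name `WscraperItem` is taken from the question; the dict-based implementation is a simplification, not Scrapy's actual one):

```python
# Dict subclass that, like a Scrapy Item, rejects undeclared fields.
class WscraperItem(dict):
    fields = {"title"}

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError("WscraperItem does not support field: %s" % key)
        dict.__setitem__(self, key, value)

item = WscraperItem()
item["title"] = "London"
print(item)  # {'title': 'London'}
```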
I'm getting the following error when running the command "scrapy crawl ExampleSpider":
[boto] ERROR: Caught exception reading instance data
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/boto/utils.py", line 210, in retry_url
    r = opener.open(req, timeout=timeout)
  File "/usr/lib/python2.7/urllib2.py", line 429, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 447, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1228, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1198, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 101] Network is unreachable>
[boto] ERROR: Unable to read instance data, giving up
[scrapy] ERROR: Error downloading <GET https://www.timeanddate.com/worldclock/>
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 45, in mustbe_deferred
    result = f(*args, **kw)
  File "/usr/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/__init__.py", line 41, in download_request
    return handler(request, spider)
  File "/usr/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/http11.py", line 44, in download_request
    return agent.download_request(request)
    d = super(CachingThreadedResolver, self).getHostByName(name, timeout)
  File "/home/priyanka/.local/lib/python2.7/site-packages/twisted/internet/base.py", line 276, in getHostByName
    timeoutDelay = sum(timeout)
TypeError: 'float' object is not iterable
[scrapy] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 1,
 'downloader/exception_type_count/exceptions.TypeError': 1,
 'downloader/request_bytes': 228,
 'log_count/DEBUG': 2,
 'log_count/ERROR': 3,
 'log_count/INFO': 7,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
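Reading the traceback, the final `TypeError` comes from Twisted's `getHostByName` doing `timeoutDelay = sum(timeout)`: `sum()` needs a sequence of delays, but the installed Scrapy version's resolver appears to hand it a bare float (a known incompatibility between older Scrapy releases and newer Twisted; upgrading Scrapy is the usual remedy). A minimal sketch reproducing just that failure:

```python
# Twisted sums the resolver timeout, so a bare float blows up:
timeout = 60.0
try:
    sum(timeout)
except TypeError as exc:
    print(exc)  # 'float' object is not iterable

# Later Scrapy versions pass a sequence of retry delays instead:
print(sum((60.0,)))  # 60.0
```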
Source: https://stackoverflow.com/questions/43796230
Updated: 2023-01-13 08:01
Accepted answer
It looks like UserSocialAuth objects now have a .refresh_token() method, which allows you to use .tokens and get the updated token.
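The pattern the answer describes is: refresh before use, rather than reading a possibly stale token straight out of the stored data. This is a hypothetical stand-in, not python-social-auth's real model; the class name, the argument-free `refresh_token()`, and `access_token_expired()` are all simplifications of that API, and the stub only illustrates the control flow:

```python
import time

# Stub mimicking a stored social-auth row with token data in extra_data.
class FakeSocialAuth:
    def __init__(self, access_token, expires_at):
        self.extra_data = {"access_token": access_token,
                           "expires_at": expires_at}

    def access_token_expired(self):
        return time.time() >= self.extra_data["expires_at"]

    def refresh_token(self):
        # In real code this would hit the provider's token endpoint.
        self.extra_data["access_token"] = "new-token"
        self.extra_data["expires_at"] = time.time() + 3600

def get_fresh_token(social):
    # Refresh first if needed, then hand back a usable access token.
    if social.access_token_expired():
        social.refresh_token()
    return social.extra_data["access_token"]

social = FakeSocialAuth("old-token", expires_at=0)  # already expired
print(get_fresh_token(social))  # new-token
```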