Unable to run Scrapy code
I have written the following code:
spiders.test.py code:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

from wscraper.items import WscraperItem


class MySpider(BaseSpider):
    name = "ExampleSpider"
    allowed_domains = ["timeanddate.com"]
    start_urls = ["https://www.timeanddate.com/worldclock/"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select("/html/body/div[1]/div[8]/section[2]/div[1]/table/tbody").extract()
        #for titles in titles:
        #    title = titles.select("a/text()").extract()
        #    link = titles.select("a/@href").extract()
        print titles  # was `print title`, which is undefined with the loop commented out
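The commented-out loop sketches pulling the link text and href out of each anchor in the table body. As a rough, stdlib-only illustration of what that extraction does (the markup below is hypothetical and much smaller than the real timeanddate.com page, and `xml.etree` supports only a subset of the XPath that Scrapy's `HtmlXPathSelector` accepts):

```python
import xml.etree.ElementTree as ET

# Tiny hypothetical stand-in for the page structure the spider's XPath targets.
html = """
<html><body><div>
  <section><div><table><tbody>
    <tr><td><a href="/city/1">London</a></td></tr>
    <tr><td><a href="/city/2">Tokyo</a></td></tr>
  </tbody></table></div></section>
</div></body></html>
"""

root = ET.fromstring(html)
# What the commented-out loop intends: link text plus href for each anchor.
for a in root.iter("a"):
    print(a.text, a.get("href"))
# London /city/1
# Tokyo /city/2
```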
The code for scraper.items is:

from scrapy.item import Item, Field


class WscraperItem(Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = Field()
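A Scrapy `Item` behaves like a dict restricted to its declared fields. This stub mimics that behaviour without requiring Scrapy installed, purely to illustrate why fields such as `title` are declared up front (the class name `WscraperItem` is taken from the question; the dict-based implementation is a simplification, not Scrapy's actual one):

```python
# Dict subclass that, like a Scrapy Item, rejects undeclared fields.
class WscraperItem(dict):
    fields = {"title"}

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError("WscraperItem does not support field: %s" % key)
        dict.__setitem__(self, key, value)

item = WscraperItem()
item["title"] = "London"
print(item)  # {'title': 'London'}
```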
I'm getting the following error when running the command "scrapy crawl ExampleSpider":
[boto] ERROR: Caught exception reading instance data
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/boto/utils.py", line 210, in retry_url
    r = opener.open(req, timeout=timeout)
  File "/usr/lib/python2.7/urllib2.py", line 429, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 447, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1228, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1198, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 101] Network is unreachable>
[boto] ERROR: Unable to read instance data, giving up
[scrapy] ERROR: Error downloading <GET https://www.timeanddate.com/worldclock/>
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 45, in mustbe_deferred
    result = f(*args, **kw)
  File "/usr/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/__init__.py", line 41, in download_request
    return handler(request, spider)
  File "/usr/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/http11.py", line 44, in download_request
    return agent.download_request(request)
    d = super(CachingThreadedResolver, self).getHostByName(name, timeout)
  File "/home/priyanka/.local/lib/python2.7/site-packages/twisted/internet/base.py", line 276, in getHostByName
    timeoutDelay = sum(timeout)
TypeError: 'float' object is not iterable
[scrapy] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 1,
 'downloader/exception_type_count/exceptions.TypeError': 1,
 'downloader/request_bytes': 228,
 'log_count/DEBUG': 2,
 'log_count/ERROR': 3,
 'log_count/INFO': 7,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
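Reading the traceback, the final `TypeError` comes from Twisted's `getHostByName` doing `timeoutDelay = sum(timeout)`: `sum()` needs a sequence of delays, but the installed Scrapy version's resolver appears to hand it a bare float (a known incompatibility between older Scrapy releases and newer Twisted; upgrading Scrapy is the usual remedy). A minimal sketch reproducing just that failure:

```python
# Twisted sums the resolver timeout, so a bare float blows up:
timeout = 60.0
try:
    sum(timeout)
except TypeError as exc:
    print(exc)  # 'float' object is not iterable

# Later Scrapy versions pass a sequence of retry delays instead:
print(sum((60.0,)))  # 60.0
```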
Source: https://stackoverflow.com/questions/43796230
Updated: 2023-01-13 08:01
Accepted answer
It looks like UserSocialAuth objects now have a .refresh_token() method, which allows you to use .tokens and get the updated token.
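The pattern the answer describes is: refresh before use, rather than reading a possibly stale token straight out of the stored data. This is a hypothetical stand-in, not python-social-auth's real model; the class name, the argument-free `refresh_token()`, and `access_token_expired()` are all simplifications of that API, and the stub only illustrates the control flow:

```python
import time

# Stub mimicking a stored social-auth row with token data in extra_data.
class FakeSocialAuth:
    def __init__(self, access_token, expires_at):
        self.extra_data = {"access_token": access_token,
                           "expires_at": expires_at}

    def access_token_expired(self):
        return time.time() >= self.extra_data["expires_at"]

    def refresh_token(self):
        # In real code this would hit the provider's token endpoint.
        self.extra_data["access_token"] = "new-token"
        self.extra_data["expires_at"] = time.time() + 3600

def get_fresh_token(social):
    # Refresh first if needed, then hand back a usable access token.
    if social.access_token_expired():
        social.refresh_token()
    return social.extra_data["access_token"]

social = FakeSocialAuth("old-token", expires_at=0)  # already expired
print(get_fresh_token(social))  # new-token
```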