BeautifulSoup find_all("img") not working for all sites
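I'm trying to write a Python script that downloads the images from any website. It works, but inconsistently: find_all("img") returns the images for the first URL below but not for the second one. The script is: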
    # works for http://proof.nationalgeographic.com/2016/02/02/photo-of-the-day-best-of-january-3/
    # but not http://www.nationalgeographic.com/photography/proof/2017/05/lake-chad-desertification/

    import requests
    from PIL import Image
    from io import BytesIO
    from bs4 import BeautifulSoup


    def url_to_image(url, filename):
        # get HTTP response, open as bytes, save the image
        # http://docs.python-requests.org/en/master/user/quickstart/#binary-response-content
        req = requests.get(url)
        i = Image.open(BytesIO(req.content))
        i.save(filename)


    # open page, get HTML request and parse with BeautifulSoup
    html = requests.get("http://proof.nationalgeographic.com/2016/02/02/photo-of-the-day-best-of-january-3/")
    soup = BeautifulSoup(html.text, "html.parser")

    # find all JPEGS in our soup and write their "src" attribute to array
    urls = []
    for img in soup.find_all("img"):
        if img["src"].endswith("jpg"):
            print("endswith jpg")
            urls.append(str(img["src"]))
            print(str(img))

    jpeg_no = 0
    for url in urls:
        url_to_image(url, filename="NatGeoPix/" + str(jpeg_no) + ".jpg")
        jpeg_no += 1
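One possible reason find_all("img") comes back short on some pages is that the HTML returned by requests is not what the browser ends up showing: images may be added later by JavaScript, keep their real URL in a lazy-loading attribute instead of src, or use relative URLs. The sketch below only handles the last two cases. It assumes the lazy-loading attribute is called data-src (a common convention, not something confirmed for the National Geographic page), and extract_jpeg_urls is a hypothetical helper name. Images inserted by JavaScript will still not appear in the parsed soup.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_jpeg_urls(html, base_url):
    """Return absolute URLs of JPEG images referenced by <img> tags.

    Assumption: lazy-loaded images keep their real URL in "data-src";
    adjust the attribute names for the site you are scraping.
    """
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for img in soup.find_all("img"):
        # img.get() returns None when the attribute is missing,
        # whereas img["src"] raises KeyError
        src = img.get("src") or img.get("data-src")
        if not src:
            continue
        # Ignore any query string when checking the file extension
        path = src.split("?", 1)[0].lower()
        if path.endswith((".jpg", ".jpeg")):
            # Resolve relative paths such as "/photos/a.jpg" against the page URL
            urls.append(urljoin(base_url, src))
    return urls


if __name__ == "__main__":
    sample = """
    <img src="/photos/a.jpg">
    <img data-src="https://example.com/b.JPG?w=800">
    <img alt="no source at all">
    <img src="logo.png">
    """
    print(extract_jpeg_urls(sample, "https://example.com/gallery/"))
```

Using img.get() instead of img["src"] also keeps one <img> without a src from aborting the whole loop with a KeyError.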
Source: https://stackoverflow.com/questions/43985554
Updated: 2023-09-20 22:09
Best answer
A Windows service should finish starting as quickly as possible. You can move the initialization code to a separate thread, as follows:

    protected override void OnStart(string[] args)
    {
        Task.Run(() => StartSynchro());
    }
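The idea in this answer is not specific to C#: the start callback should only kick off the slow initialization and return right away, so the service is reported as started. Below is a language-neutral illustration written in Python with the standard threading module. It does not use any Windows service API, and SyncService, on_start and start_synchro are hypothetical names standing in for the C# class, OnStart and StartSynchro.

```python
import threading
import time


class SyncService:
    """Toy stand-in for a service class; not tied to any service framework."""

    def __init__(self, init_seconds=2.0):
        self.init_seconds = init_seconds
        self.ready = threading.Event()  # set once initialization has finished
        self._worker = None

    def start_synchro(self):
        # Placeholder for the slow initialization (reading config, connecting, ...)
        time.sleep(self.init_seconds)
        self.ready.set()

    def on_start(self):
        # Hand the slow work to a background thread and return immediately,
        # similar in spirit to Task.Run(() => StartSynchro()), which queues
        # the work on the .NET thread pool
        self._worker = threading.Thread(target=self.start_synchro, daemon=True)
        self._worker.start()


if __name__ == "__main__":
    service = SyncService(init_seconds=2.0)
    started = time.monotonic()
    service.on_start()
    print(f"on_start returned after {time.monotonic() - started:.3f} s")
    service.ready.wait()
    print("initialization finished in the background")
```

Exposing a ready event lets other code wait for initialization explicitly instead of relying on the start call having blocked.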
Related questions
If your service's StartType is set to Automatic but the service is not running after a reboot, then either your service depends on another service that failed to start, or the service's own startup code failed and eventually stopped it. Check the Windows Event Log for errors (and if you are not logging your own errors, you should be).
Are you blocking OnStart from returning? Normally you would spawn a thread from there to do the work, and let the method return.
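This appears to be caused by artifactory-service.exe putting unusual characters into the service definition. After running installService.bat, I checked the service's "Path to executable" and it was ...\artifactory-pro-5.5.1\bin\artifactory-service.exe //RS//Artifactory. The unusual characters were odd Unicode characters, for example: http://www.fileformat.info/info/unicode/char/0cf4/index.htm This seems to be ...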
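If stray Unicode characters in the service's "Path to executable" are the suspected cause, listing the code point of every non-ASCII or non-printable character makes them visible. This small Python sketch works on any string you paste in, for example the path copied from the service's properties; find_unusual_chars is a hypothetical helper, and it does not read the registry or the service configuration itself. The path in the usage example is made up.

```python
import unicodedata


def find_unusual_chars(text):
    """Return (index, code point, name) for each non-ASCII or non-printable char."""
    hits = []
    for i, ch in enumerate(text):
        if not ch.isascii() or not ch.isprintable():
            # unicodedata.name() raises ValueError for unnamed code points
            # unless a default is supplied
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name>")))
    return hits


if __name__ == "__main__":
    # Hypothetical example path containing a no-break space before the arguments
    path = "C:\\artifactory\\bin\\artifactory-service.exe\u00a0//RS//Artifactory"
    for index, code, name in find_unusual_chars(path):
        print(index, code, name)
```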
Separate concerns. You should return from OnStart as quickly as possible, so I'd suggest spinning up an asynchronous TPL task in your OnStart method and doing nothing else there. Inside the asynchronous task you can do whatever you need. This way your service can complete OnStart and properly transition from starting to started.
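As others have already mentioned, you can't (easily) launch an application directly from a service. I think the simplest way to solve this is to create a process that starts at logon and runs with the logged-on user's credentials, for example an application that sits in the system tray and opens a named pipe or network port for the service to connect to. If the service needs to alert the user, it sends a message over that channel, and the client process can then show its own UI or launch the application. Inter-process communication over a pipe or port is the easiest way to work around the restrictions on Session 0 processes.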
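The channel this answer describes can be sketched with Python's multiprocessing.connection module, which provides a simple authenticated message channel over a local TCP port (on Windows it can also use named pipes). In this sketch the tray application listens and the service connects to send a notification. Both sides run in one process only so the example is self-contained; the authkey value is a placeholder, and nothing here handles Session 0 isolation or starting the tray app at logon, which a real deployment would need.

```python
import threading
from multiprocessing.connection import Client, Listener

AUTHKEY = b"change-me"  # placeholder shared secret for this sketch


def tray_app(listener, received):
    # User-session side: wait for the service to connect, then handle one message
    with listener.accept() as conn:
        message = conn.recv()
        received.append(message)
        # A real tray app would show a notification or launch its UI here
        print("tray app got:", message)


def service_notify(address, text):
    # Service side: connect to the user-session process and send an alert
    with Client(address, authkey=AUTHKEY) as conn:
        conn.send({"type": "alert", "text": text})


if __name__ == "__main__":
    received = []
    # Port 0 lets the OS choose a free local port; listener.address reports it
    with Listener(("localhost", 0), authkey=AUTHKEY) as listener:
        t = threading.Thread(target=tray_app, args=(listener, received))
        t.start()
        service_notify(listener.address, "Sync finished")
        t.join()
```

Having the user-session process own the listening end matches the answer's design: the service never needs to create UI, it only needs to reach a known address.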
Just to let you know: there is no way to add a parameter to a service without changing the "imagepath" in the registry.
Windows service is not starting [2022-01-28]
It's better to get the Windows service started as fast as possible. You could move the initialization code to a separate thread as follows: protected override void OnStart(string[] args) { Task.Run(() => StartSynchro()); }
I think you could wrap the logic in OnStart in a thread, and shut that thread down when the OnStop event is received. Something like this: Thread _ServiceThread; protected override void OnStart(string[] args) { _ServiceThread = new Thread(() => { /* your current OnStart logic here...*/ }); _ServiceThread.Start(); } protected overr ...
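The start/stop pairing in this answer (start a worker thread in OnStart, shut it down when OnStop arrives) looks roughly like this in Python. It is a generic sketch that uses threading.Event as the stop signal rather than any Windows service API; WorkerService, on_start, on_stop and do_work are hypothetical names.

```python
import threading
import time


class WorkerService:
    """Generic start/stop worker; hypothetical names, no service framework."""

    def __init__(self, interval=1.0):
        self.interval = interval
        self.iterations = 0
        self._stop_event = threading.Event()
        self._thread = None

    def do_work(self):
        # Placeholder for one unit of the service's real work
        self.iterations += 1

    def _run(self):
        # Keep working until on_stop() sets the event; Event.wait() acts as an
        # interruptible sleep, so a stop request is noticed without a full delay
        while not self._stop_event.is_set():
            self.do_work()
            self._stop_event.wait(self.interval)

    def on_start(self):
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._run, name="ServiceThread")
        self._thread.start()  # return at once; the loop runs on the new thread

    def on_stop(self, timeout=5.0):
        self._stop_event.set()  # ask the loop to finish
        if self._thread is not None:
            self._thread.join(timeout)  # wait for it, but not indefinitely


if __name__ == "__main__":
    service = WorkerService(interval=0.1)
    service.on_start()
    time.sleep(0.35)
    service.on_stop()
    print("stopped after", service.iterations, "iterations")
```

The bounded join in on_stop keeps a stuck worker from blocking shutdown forever.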
Error starting a Windows Service [2022-03-20]
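This has nothing to do with it being a Windows service specifically. It happens because you have not set up IoC, so Unity does not know what to inject when the constructor asks for an instance. Presumably your AuctionControl.Service.Service1 constructor takes an interface, but you have not told the Unity container which concrete class to bind/resolve for that interface. Edit: do you really need Unity? It does not seem to be doing anything useful. Try: public Service1() { InitializeComponent(); _auctionControl = new Services. ...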
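The failure described here is a dependency-injection container being asked for an interface it has no mapping for. A toy container in Python shows the mechanism: constructor parameters are resolved from their type annotations, and resolution fails unless the abstract type has been registered to a concrete class. This only illustrates the idea and is not how Unity is implemented; Container, IAuctionControl, AuctionControl and Service1 here are hypothetical stand-ins.

```python
import inspect
from abc import ABC, abstractmethod
from typing import get_type_hints


class Container:
    """Minimal constructor-injection container (illustrative only)."""

    def __init__(self):
        self._bindings = {}

    def register(self, abstract, concrete):
        # Tell the container which concrete class to build for an abstract type
        self._bindings[abstract] = concrete

    def resolve(self, cls):
        concrete = self._bindings.get(cls, cls)
        if inspect.isabstract(concrete):
            raise LookupError(f"No concrete type registered for {cls.__name__}")
        # Build each annotated constructor argument recursively
        hints = get_type_hints(concrete.__init__)
        params = inspect.signature(concrete.__init__).parameters
        kwargs = {name: self.resolve(hints[name]) for name in params if name in hints}
        return concrete(**kwargs)


class IAuctionControl(ABC):
    @abstractmethod
    def run(self) -> str: ...


class AuctionControl(IAuctionControl):
    def run(self) -> str:
        return "running"


class Service1:
    def __init__(self, auction_control: IAuctionControl):
        self.auction_control = auction_control


if __name__ == "__main__":
    container = Container()
    try:
        container.resolve(Service1)  # fails: the interface was never registered
    except LookupError as exc:
        print("error:", exc)
    container.register(IAuctionControl, AuctionControl)
    print(container.resolve(Service1).auction_control.run())
```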
How to read Windows Service configuration while starting the Windows Service? [2023-02-25]
The exception suggests that there is something wrong with your configuration file. Check it carefully. There should be more information in the exception or its inner exception, which will help you pinpoint the error more precisely.
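The advice to read the inner exception carries over to Python, where the closest equivalent of .NET's InnerException is the __cause__ attribute (set by "raise ... from err") or, failing that, __context__. This hypothetical helper, not part of any library, walks that chain so every message is visible; load_config is likewise a made-up example.

```python
def exception_chain(exc):
    """Yield exc and each exception behind it, outermost first."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))  # guard against a cycle in the chain
        yield exc
        # Prefer the explicit cause ("raise ... from err"), else the implicit context
        exc = exc.__cause__ if exc.__cause__ is not None else exc.__context__


def load_config(text):
    # Hypothetical loader that wraps a low-level error in a higher-level one
    try:
        return int(text)
    except ValueError as err:
        raise RuntimeError("configuration could not be loaded") from err


if __name__ == "__main__":
    try:
        load_config("not-a-number")
    except RuntimeError as top:
        for depth, exc in enumerate(exception_chain(top)):
            print(f"{depth}: {type(exc).__name__}: {exc}")
```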