首页 \ 问答 \ Img使用bs4和硒进行刮擦(Img scraping using bs4 and selenium)

Img使用bs4和硒进行刮擦(Img scraping using bs4 and selenium)

我正在尝试使用硒和bs4从IG中获取一些img文件。 我有这个脚本来做到这一点,它似乎工作正常,但最终我想它只是打印img src ,一个示例: https://scontent-lax3-2.cdninstagram.com/vp/2592f6b07f88bfc4bfdf6d73400a04b8/5BA6E998/t51.2885-15/s640x640/sh0.08/e35/28752330_1972627949433283_1816022201220988928_n.jpghttps://scontent-lax3-2.cdninstagram.com/vp/2592f6b07f88bfc4bfdf6d73400a04b8/5BA6E998/t51.2885-15/s640x640/sh0.08/e35/28752330_1972627949433283_1816022201220988928_n.jpg ,稍后下载图像。 但现在我需要一些帮助来打印该img src链接,而不需要标签和附加内容。 感谢您的建议。

码:

import requests
from bs4 import BeautifulSoup
import selenium.webdriver as webdriver

url = ('https://www.instagram.com/kitties/')
driver = webdriver.Firefox()
driver.get(url)

soup = BeautifulSoup(driver.page_source, 'lxml')

img_url = soup.find_all('img', class_='_2di5p')

print img_url

I am trying to scrape some img files from IG using selenium and bs4. I have this following script to do it, it seems to work fine, but eventually I'd like it to just print img src, a sample: https://scontent-lax3-2.cdninstagram.com/vp/2592f6b07f88bfc4bfdf6d73400a04b8/5BA6E998/t51.2885-15/s640x640/sh0.08/e35/28752330_1972627949433283_1816022201220988928_n.jpg and download images later. But for now I would need some help to just print that img src link without the tags and extras. Thanks for the advice.

Code:

import requests
from bs4 import BeautifulSoup
import selenium.webdriver as webdriver

url = ('https://www.instagram.com/kitties/')
driver = webdriver.Firefox()
driver.get(url)

soup = BeautifulSoup(driver.page_source, 'lxml')

img_url = soup.find_all('img', class_='_2di5p')

print img_url

原文:https://stackoverflow.com/questions/50592603
更新时间:2021-07-24 13:07

相关文章

更多

最新问答

更多
  • 如何检索Ember.js模型的所有属性(How to retrieve all properties of an Ember.js model)
  • maven中snapshot快照库和release发布库的区别和作用
  • arraylist中的搜索元素(Search element in arraylist)
  • 从mysli_fetch_array中获取选定的值并输出(Get selected value from mysli_fetch_array and output)
  • Windows Phone上的可用共享扩展(Available Share Extensions on Windows Phone)
  • 如何在命令提示符下将日期设置为文件名(How to set file name as date in command prompt)
  • 如何在Laravel 5.2中使用paginate与关系?(How to use paginate with relationships in Laravel 5.2?)
  • 从iframe访问父页面的id元素(accessing id element of parent page from iframe)
  • linux的常用命令干什么用的
  • Feign Client + Eureka POST请求正文(Feign Client + Eureka POST request body)
  • 怎么删除禁用RHEL/CentOS 7上不需要的服务
  • 为什么Gradle运行测试两次?(Why does Gradle run tests twice?)
  • 由于有四个新控制器,Auth刀片是否有任何变化?(Are there any changes in Auth blades due to four new controllers?)
  • 如何交换返回集中的行?(How to swap rows in a return set?)
  • 在android中的活动之间切换?(Switching between activities in android?)
  • Perforce:如何从Depot到Workspace丢失文件?(Perforce: how to get missing file from Depot to Workspace?)
  • Webform页面避免运行服务器(Webform page avoiding runat server)
  • 在ios 7中的UITableView部分周围绘制边界线(draw borderline around UITableView section in ios 7)
  • 内存布局破解(memory layout hack)
  • 使用Boost.Spirit Qi和Lex时的空白队长(Whitespace skipper when using Boost.Spirit Qi and Lex)
  • 我们可以有一个调度程序,你可以异步添加东西,但会同步按顺序执行吗?(Can we have a dispatcher that you can add things todo asynchronously but will be executed in that order synchronously?)
  • “FROM a,b”和“FROM a FULL OUTER JOIN b”之间有什么区别?(What is the difference between “FROM a, b” and “FROM a FULL OUTER JOIN b”?)
  • Java中的不可变类(Immutable class in Java)
  • bat批处理文件结果导出到txt
  • WordPress发布查询(WordPress post query)
  • 如何在关系数据库中存储与IPv6兼容的地址(How to store IPv6-compatible address in a relational database)
  • 是否可以检查对象值的条件并返回密钥?(Is it possible to check the condition of a value of an object and JUST return the key?)
  • 德州新起点计算机培训学校主要课程有什么?
  • GEP分段错误LLVM C ++ API(GEP segmentation fault LLVM C++ API)
  • “latin1_german1_ci”整理来自哪里?(Where is “latin1_german1_ci” collation coming from?)