Img scraping using bs4 and selenium

I am trying to scrape image files from Instagram using Selenium and bs4. The script below seems to work, but it prints the whole list of matching img tags. What I want is to print only each image's src, for example: https://scontent-lax3-2.cdninstagram.com/vp/2592f6b07f88bfc4bfdf6d73400a04b8/5BA6E998/t51.2885-15/s640x640/sh0.08/e35/28752330_1972627949433283_1816022201220988928_n.jpg. I plan to download the images later; for now I only need help printing the src link without the surrounding tag and its other attributes. Thanks for any advice.
Code:
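import requests  # not used yet; kept for downloading the images later
from bs4 import BeautifulSoup
from selenium import webdriver

url = 'https://www.instagram.com/kitties/'
driver = webdriver.Firefox()
driver.get(url)

# Parse the HTML as rendered by the browser
soup = BeautifulSoup(driver.page_source, 'lxml')

# find_all returns a list of <img> Tag objects, not URL strings
img_url = soup.find_all('img', class_='_2di5p')

# Python 3 print call; this still prints the full tags, which is the problem described above
print(img_url)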
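
One possible way to get only the links is to read the src attribute of each Tag that find_all returns. The sketch below is a minimal illustration, not a confirmed answer from the original thread. SAMPLE_HTML, the example.com URLs, and the helper name extract_img_srcs are hypothetical stand-ins so the snippet runs without a browser, and it uses the built-in html.parser instead of lxml for the same reason. The class name _2di5p is taken from the question and may no longer match Instagram's current markup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source, so the sketch runs without a browser.
SAMPLE_HTML = """
<div>
  <img class="_2di5p" src="https://example.com/a.jpg" alt="first">
  <img class="_2di5p" src="https://example.com/b.jpg">
  <img class="other" src="https://example.com/c.jpg">
  <img class="_2di5p">
</div>
"""


def extract_img_srcs(html, css_class):
    """Return the src of every <img> with the given class, skipping tags that have no src."""
    soup = BeautifulSoup(html, "html.parser")
    tags = soup.find_all("img", class_=css_class)
    # Tag.get returns None when the attribute is absent instead of raising KeyError.
    return [tag.get("src") for tag in tags if tag.get("src")]


# Print one URL per line, without the tag or its other attributes.
for src in extract_img_srcs(SAMPLE_HTML, "_2di5p"):
    print(src)
```

In the script from the question, the same helper could be called as extract_img_srcs(driver.page_source, '_2di5p'). Using tag.get("src") rather than tag["src"] is a deliberate choice here, so that an img element without a src is skipped rather than raising an error.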

Source: https://stackoverflow.com/questions/50592603

Updated: 2021-07-24 13:07
