Img scraping using bs4 and selenium
I am trying to scrape some image files from Instagram using selenium and bs4. The following script seems to work fine, but I would like it to print only the img src value, for example:
https://scontent-lax3-2.cdninstagram.com/vp/2592f6b07f88bfc4bfdf6d73400a04b8/5BA6E998/t51.2885-15/s640x640/sh0.08/e35/28752330_1972627949433283_1816022201220988928_n.jpg
I plan to download the images later. For now I just need help printing that img src link without the surrounding tags and other attributes. Thanks for the advice.
Code:
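    import requests
    from bs4 import BeautifulSoup
    import selenium.webdriver as webdriver

    url = 'https://www.instagram.com/kitties/'

    # Load the page in a real browser so JavaScript-rendered content is present
    driver = webdriver.Firefox()
    driver.get(url)

    # Parse the rendered HTML and collect the matching <img> tags
    soup = BeautifulSoup(driver.page_source, 'lxml')
    img_url = soup.find_all('img', class_='_2di5p')

    # Prints the full tags, not just the src values
    print(img_url)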
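One possible way to get only the src values is to read the attribute from each tag that find_all returns, since each result is a Tag object that supports dictionary-style attribute access. The following is a minimal sketch, not a tested fix for the live site: SAMPLE_HTML is a made-up stand-in for driver.page_source, the example.com URLs are placeholders, extract_img_srcs is a hypothetical helper name, and the built-in 'html.parser' is used in place of 'lxml' only so the snippet runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source; the markup and URLs are made up.
SAMPLE_HTML = """
<div>
  <img class="_2di5p" src="https://example.com/a.jpg" alt="first">
  <img class="_2di5p" src="https://example.com/b.jpg" alt="second">
  <img class="other" src="https://example.com/c.jpg">
  <img class="_2di5p" alt="no src attribute">
</div>
"""


def extract_img_srcs(html, css_class):
    """Return the src value of every <img> with the given class that has one."""
    soup = BeautifulSoup(html, "html.parser")
    tags = soup.find_all("img", class_=css_class)
    # Tag.get returns None when the attribute is missing, so skip those tags.
    return [tag.get("src") for tag in tags if tag.get("src")]


srcs = extract_img_srcs(SAMPLE_HTML, "_2di5p")
for src in srcs:
    print(src)
```

In the original script, the same idea would presumably amount to looping over img_url and printing tag.get('src') for each tag. Using get rather than tag['src'] avoids a KeyError if a matched tag happens to lack the attribute.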
Source: https://stackoverflow.com/questions/50592603