Parse the text after using BS4, Python and Selenium

After running my scraping script:
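from selenium import webdriver
from bs4 import BeautifulSoup
import csv

browser = webdriver.Firefox()
browser.get('http://dyn.com/about/events/')
html = browser.page_source
browser.quit()  # close the browser once the page source is captured
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')
# Text of every <p class="pubdate"> element
titles = [tag.text for tag in soup.find_all('p', 'pubdate')]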
 
I get a result that looks like this:

[u'\n\n\t\t\tWEBINAR: How To Expand Your Global Reach To China\xa0\n\t\t\t\n\t\t\tOct 22, 2014\t\t\t\nspeak \n',
 u'\n\n\t\t\tLAUNCH Scale \u2013 San Francisco, CA\xa0\n\t\t\t\n\t\t\tOct 23 - 24, 2014\t\t\t\nattend \n',
 u'\n\n\t\t\tAcquia Engage User Conference \u2013 Boston, MA\xa0\n\t\t\t\n\t\t\tNov 3 - 5, 2014\t\t\t\nexhibitattend \n',
 u'\n\n\t\t\tCloud Expo \u2013 Santa Clara, CA\xa0\n\t\t\t\n\t\t\tNov 4 - 6, 2014\t\t\t\nexhibit \n',
 u'\n\n\t\t\tThe Global Carrier Awards 2014 \u2013 Amsterdam\xa0\n\t\t\t\n\t\t\tNov 4, 2014\t\t\t\n\n',
 u'\n\n\t\t\tWeb Summit \u2013 Dublin, Ireland\xa0\n\t\t\t\n\t\t\tNov 4 - 6, 2014\t\t\t\nspeak \n',
 u'\n\n\t\t\tVelocity Europe \u2013 Barcelona, Spain\xa0\n\t\t\t\n\t\t\tNov 17 - 19, 2014\t\t\t\nexhibit \n',
 u'\n\n\t\t\tNH/VT FIRST LEGO League Championship Event\xa0\n\t\t\t\n\t\t\tDec 6, 2014\t\t\t\nspeak \n']
 
I am new to Python, so could you suggest how I can get the event name, date, and event type from this result?
Thanks!
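
One possible way to pull the three fields out of strings shaped like the ones above is to split each string on newlines and keep the non-empty pieces. This is only a sketch under assumptions: the function name parse_event is made up for illustration, and it assumes the pieces always appear in the order name, date, type, with the type sometimes missing (as in the Global Carrier Awards entry). Three entries from the output above serve as sample input.

```python
# Three entries copied from the scraped output shown in the question.
titles = [
    u'\n\n\t\t\tWEBINAR: How To Expand Your Global Reach To China\xa0\n\t\t\t\n\t\t\tOct 22, 2014\t\t\t\nspeak \n',
    u'\n\n\t\t\tAcquia Engage User Conference \u2013 Boston, MA\xa0\n\t\t\t\n\t\t\tNov 3 - 5, 2014\t\t\t\nexhibitattend \n',
    u'\n\n\t\t\tThe Global Carrier Awards 2014 \u2013 Amsterdam\xa0\n\t\t\t\n\t\t\tNov 4, 2014\t\t\t\n\n',
]


def parse_event(raw):
    """Return (name, date, event_type) for one scraped string.

    Hypothetical helper: assumes the non-empty lines come in the order
    name, date, type, and that the type line may be absent.
    """
    # Turn the non-breaking space (\xa0) into a plain space, then split
    # on newlines and strip the tabs and spaces around each piece.
    parts = [p.strip() for p in raw.replace(u'\xa0', u' ').split('\n')]
    parts = [p for p in parts if p]  # drop the empty pieces
    name = parts[0] if len(parts) > 0 else u''
    date = parts[1] if len(parts) > 1 else u''
    event_type = parts[2] if len(parts) > 2 else u''
    return name, date, event_type


events = [parse_event(t) for t in titles]
for name, date, event_type in events:
    print(name, '|', date, '|', event_type)
```

Note that a value such as exhibitattend stays fused, because the separator between the two types is already lost in tag.text. Splitting it reliably would probably mean reading the individual child elements of each p tag instead of its combined text, which depends on the page markup not shown here.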

Source: https://stackoverflow.com/questions/26484951

Updated: 2022-11-26 07:11

Best answer

You should use an outer join. 
select
    A.ID,
    A.DataA1,
    A.DataA2,
    B.A_ID,
    B.DataB1,
    B.DataB2,
    C.A_ID,
    C.DataC1,
    C.DataC2
from A 
left join B
on A.ID = B.A_ID
left join C
on A.ID = C.A_ID
 
For a good explanation of SQL joins, check out: http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html
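
To see what the left joins in this query do, here is a small sketch that runs it against an in-memory SQLite database from Python. The table and column names come from the query above; the column types and the sample rows are invented for illustration. Parent row 1 has a match in B only, and parent row 2 has no child rows at all.

```python
import sqlite3

# In-memory database; the schema types and sample rows are assumptions.
conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE A (ID INTEGER PRIMARY KEY, DataA1 TEXT, DataA2 TEXT);
    CREATE TABLE B (A_ID INTEGER, DataB1 TEXT, DataB2 TEXT);
    CREATE TABLE C (A_ID INTEGER, DataC1 TEXT, DataC2 TEXT);
    INSERT INTO A VALUES (1, 'a1', 'a2');
    INSERT INTO A VALUES (2, 'a3', 'a4');
    INSERT INTO B VALUES (1, 'b1', 'b2');
""")

# Same query as in the answer, with ORDER BY added so the row order is fixed.
rows = conn.execute("""
    SELECT A.ID, A.DataA1, A.DataA2,
           B.A_ID, B.DataB1, B.DataB2,
           C.A_ID, C.DataC1, C.DataC2
    FROM A
    LEFT JOIN B ON A.ID = B.A_ID
    LEFT JOIN C ON A.ID = C.A_ID
    ORDER BY A.ID
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

Both rows of A come back. The columns of B and C are filled with NULL (None in Python) wherever no matching child row exists, whereas an inner join would have dropped those parent rows.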
