Regular expression for class with whitespaces using BeautifulSoup

I found that the BeautifulSoup.find() method splits the class attribute on whitespace. Because of that, I can't use a regular expression as shown in the code below. Could somebody help me find the right way to select all 'tree children' elements?

import re
from bs4 import BeautifulSoup 

r_html = "<div class='root'>" \
       "<div class='tree children1'>text children 1 </div>" \
       "<div class='tree children2'>text children 2 </div>" \
       "<div class='tree children3'>text children 3 </div>" \
   "</div>"

bs_tab = BeautifulSoup(r_html, "html.parser")
workspace_box_visible = bs_tab.findAll('div', {'class': 'tree children1'})
print(workspace_box_visible)  # result: [<div class="tree children1">text children 1 </div>]
workspace_box_visible = bs_tab.findAll('div', {'class': re.compile(r'^tree children\d')})
print(workspace_box_visible)  # result: [] >>>> empty list because
                              # the class name was split on the whitespace character <<<<

# >>>>>> print all element classes <<<<<<<
def print_class(class_):
    print(class_)
    return False

workspace_box_visible = bs_tab.find('div', {'class': print_class})

# expected: 
# root
# tree children1
# tree children2
# tree children3

# actual:
# root
# tree
# children1
# tree
# children2
# tree
# children3

Thanks in advance,
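
A minimal sketch of why the regex above comes back empty, assuming (as the printed output suggests) that the filter is tested against each class token on its own rather than against the full attribute string. It uses only the standard `re` module; `class_attr`, `tokens`, `per_token`, and `joined` are illustrative names, not part of BeautifulSoup.

```python
import re

pattern = re.compile(r"^tree children\d")
class_attr = "tree children1"

# Illustration only: the class attribute viewed as separate tokens,
# which is how the filter appears to receive it in the output above.
tokens = class_attr.split()  # ['tree', 'children1']

# Testing the pattern against each token separately never succeeds,
# because no single token contains the space the pattern requires.
per_token = [bool(pattern.search(token)) for token in tokens]

# Testing it against the tokens joined back together does succeed.
joined = bool(pattern.search(" ".join(tokens)))

print(per_token)  # [False, False]
print(joined)     # True
```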

==== comments ==========

Stack Overflow doesn't allow comments longer than 500 characters, so I'm adding my comments here:

The example above was meant to show how BeautifulSoup looks up the required classes.

But if I have a DOM structure like:

r_html = "<div class='root'>" \
       "<div class='tree children'>zero</div>" \
       "<div class='tree children first'>first</div>" \
       "<div class='tree children second'>second</div>" \
       "<div class='tree children third'>third</div>" \
   "</div>"

and I need to select the controls whose class attribute is exactly 'tree children' or 'tree children first', none of the methods described in your (Padraic Cunningham) post work.

I found a solution using a regex:

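controls = bs_tab.findAll('div')
for control in controls:
    # Join the class tokens back into one string; divs without a class yield "".
    # The $ anchors keep 'tree children second' and 'tree children third' out.
    if re.search(r"^tree children$|^tree children first$", " ".join(control.get('class', []))):
        print(control)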

And another solution:

bs_tab.findAll('div', class_='tree children') + bs_tab.findAll('div', class_='tree children first')

I know this is not a good solution, and I hope the BeautifulSoup module has an appropriate method for this.
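
One possible way to fold the loop above into a single search is to pass a function as the filter to `find_all`, which BeautifulSoup calls with each tag. The sketch below assumes that approach; `joined_class_matches` and `find_divs` are hypothetical helper names introduced here, not BeautifulSoup APIs, and the `bs4` import is done lazily so the matching helper can be exercised on its own.

```python
import re


def joined_class_matches(tag, pattern):
    """Return True if the tag's class tokens, joined by spaces, match the regex.

    `tag` only needs a dict-like .get(); a bs4 Tag or a plain dict both work.
    """
    classes = tag.get("class") or []
    if isinstance(classes, str):
        # Some parser configurations may keep the attribute as one string.
        classes = classes.split()
    return re.search(pattern, " ".join(classes)) is not None


def find_divs(html, pattern):
    """Hypothetical wrapper: return all <div> tags whose full class string matches."""
    from bs4 import BeautifulSoup  # lazy import; requires the beautifulsoup4 package

    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all(
        lambda tag: tag.name == "div" and joined_class_matches(tag, pattern)
    )
```

With the second `r_html` above, `find_divs(r_html, r"^tree children( first)?$")` should select only the 'zero' and 'first' divs, since the `$` anchor rules out the longer class strings.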


Source: https://stackoverflow.com/questions/38824121
Updated: 2023-07-31 15:07

Best answer

You're best off having four images and masking them with a div using the overflow:hidden property.

// Your markup
<div id="imgMask" style="overflow:hidden; height:200px; width:200px;">
    <div id="inner" style="position:relative; left:0;">
        <img src="images/environments/img0.jpg" />
        <img src="images/environments/img1.jpg" />
        <img src="images/environments/img2.jpg" />
        <img src="images/environments/img3.jpg" />
    </div>
</div>

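// Your js
function slideLeft(){
    $('#inner').animate({
        left: '-200px' // a value with units must be a quoted string
    }, 2000, function(){
        // Move the first image to the end, then reset the container position
        $('#inner img').eq(0).remove().appendTo('#inner');
        $('#inner').css('left', 0);
    });
}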

This way you are only sliding one parent element instead of multiple images. Hope it helps - the above code is untested but assumes you have an image height and width of 200px, and of course the styles are better off in your stylesheet than being inline like this.
