Parsing HTML Tables with BS4

I've been trying different methods of scraping data from this site (http://nflcombineresults.com/nflcombinedata.php?year=1999&pos=WR&college=) and can't get any of them to work. I've tried playing with the indices, but that hasn't helped. I think I've tried too many things at this point, so if someone could point me in the right direction I would really appreciate it.
I would like to pull all of the information and export it to a .csv file, but at this point I'm just trying to get the name and position to print to get started. 
Here's my code: 
import urllib2
from bs4 import BeautifulSoup
import re

url = ('http://nflcombineresults.com/nflcombinedata.php?year=1999&pos=&college=')

page = urllib2.urlopen(url).read()

soup = BeautifulSoup(page)
table = soup.find('table')

for row in table.findAll('tr')[0:]:
    col = row.findAll('tr')
    name = col[1].string
    position = col[3].string
    player = (name, position)
    print "|".join(player)
 
Here's the error I'm getting: line 14, in name = col[1].string IndexError: list index out of range. 
--UPDATE-- 
OK, I've made a little progress. The loop now runs from start to finish, but only because I hard-coded how many rows are in the table. How would I get it to just go through the rows until the end? Updated code:
import urllib2
from bs4 import BeautifulSoup
import re

url = ('http://nflcombineresults.com/nflcombinedata.php?year=1999&pos=&college=')

page = urllib2.urlopen(url).read()

soup = BeautifulSoup(page)
table = soup.find('table')


for row in table.findAll('tr')[1:250]:
    col = row.findAll('td')
    name = col[1].getText()
    position = col[3].getText()
    player = (name, position)
    print "|".join(player)
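
One way to loop over every row without hard-coding the count is to drop the upper bound of the slice and skip any row that has too few cells. The following is only a sketch under a few assumptions: it is written for Python 3 (the code above is Python 2), the function names `extract_players` and `to_csv` are made up for illustration, and the column indices 1 and 3 are simply the ones used in the code above, not verified against the live page.

```python
import csv
import io

from bs4 import BeautifulSoup


def extract_players(html, name_idx=1, pos_idx=3):
    """Return (name, position) tuples from the first table in the HTML.

    name_idx and pos_idx are assumptions copied from the question's code.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    players = []
    # No upper bound: iterate over every <tr> that exists in the table.
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        # Header rows (<th> only) and short spacer rows are skipped,
        # which avoids the IndexError on cells[1] / cells[3].
        if len(cells) <= max(name_idx, pos_idx):
            continue
        players.append((cells[name_idx].get_text(strip=True),
                        cells[pos_idx].get_text(strip=True)))
    return players


def to_csv(players):
    """Serialize the tuples as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "position"])
    writer.writerows(players)
    return buf.getvalue()
```

To try it on the real page, fetch the HTML first (in Python 3, for example, with `urllib.request.urlopen(url).read()`) and pass the result to `extract_players`; the string returned by `to_csv` can then be written to a .csv file.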

Source: https://stackoverflow.com/questions/22078620

Updated: 2022-06-27 10:06

Best answer

In Audacity you have to check the 'Use custom mix' radio button in the Import/Export section of the preferences. This will let you export multi-channel files, and manually assign tracks to channels. 
Other than that, plain old .wav works fine for this. 
But you can also use SoX to create the files in a more automated manner. 
Manually, you can combine (or 'merge', as the documentation calls it) five distinct files into a single five-channel file like this:
sox -M chan1.wav chan2.wav chan3.wav chan4.wav chan5.wav multi.wav
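
For readers who want to see what such a merge does to the samples, here is a rough Python sketch that interleaves mono files into one multi-channel file using only the standard-library `wave` module. It is not how SoX works internally and it is not a replacement for it: `merge_mono_wavs` is a made-up name, the sketch assumes plain PCM WAV inputs that share sample width and rate, and it pads shorter inputs with silence.

```python
import wave


def merge_mono_wavs(in_paths, out_path):
    """Interleave several mono PCM WAV files into one multi-channel file."""
    if not in_paths:
        raise ValueError("need at least one input file")
    readers = [wave.open(path, "rb") for path in in_paths]
    try:
        width = readers[0].getsampwidth()
        rate = readers[0].getframerate()
        for reader in readers:
            if reader.getnchannels() != 1:
                raise ValueError("every input must be mono")
            if reader.getsampwidth() != width or reader.getframerate() != rate:
                raise ValueError("inputs must share sample width and rate")
        n_frames = max(reader.getnframes() for reader in readers)
        # 8-bit WAV is unsigned (silence is 0x80); wider samples are signed.
        silence = b"\x80" if width == 1 else b"\x00" * width
        channels = []
        for reader in readers:
            data = reader.readframes(reader.getnframes())
            # Pad shorter inputs so every channel has n_frames samples.
            data += silence * (n_frames - reader.getnframes())
            channels.append(data)
    finally:
        for reader in readers:
            reader.close()

    n_channels = len(channels)
    frame_size = width * n_channels
    out = bytearray(n_frames * frame_size)
    # Copy each byte of each channel's samples into its slot in every frame.
    for ch, data in enumerate(channels):
        for b in range(width):
            out[ch * width + b::frame_size] = data[b::width]

    with wave.open(out_path, "wb") as writer:
        writer.setnchannels(n_channels)
        writer.setsampwidth(width)
        writer.setframerate(rate)
        writer.writeframes(bytes(out))
    return n_channels, n_frames
```

A call such as `merge_mono_wavs(["chan1.wav", "chan2.wav"], "multi.wav")` would produce a two-channel file. Files that are not plain PCM (for example some 24-bit files written with an extensible header) may be rejected by `wave`, depending on the Python version.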
 
To automate the process I put together a short Bash routine for producing a multichannel file with staggered test tones: 
NUM=5    # Number of channels
LEN=2    # Length of each test tone, in seconds
OVL=0.5  # Overlap between test tones, in seconds

# A one-channel base file containing simple white noise,
# faded at both ends with a quarter-wave envelope to ensure
# smooth equal-power transitions
sox -n -b 24 -c 1 out.wav synth $LEN whitenoise fade q $OVL -0 $OVL

# Instead of white noise you can for example make a 1kHz tone
# like this:
# sox -n -b 24 -c 1 out.wav synth $LEN sine 1k fade q $OVL -0 $OVL

# Or a sweep from 10Hz to 10kHz like this:
# sox -n -b 24 -c 1 out.wav synth $LEN sine 10-10k fade q $OVL -0 $OVL

# Produce the number of seconds of leading silence for each channel:
# (i - 1) * (LEN - OVL)
SEQ=$(for ((i=1; i<=NUM; i++))
do 
  echo "$i 1 - [$LEN $OVL -]x * p" | dc  # reverse-Polish arithmetic
done)

echo $SEQ

# Padding the base file to various degrees and saving them separately
for j in $SEQ
do 
  sox -c 1 out.wav outpad${j}.wav pad $j
done

# Finding the just-produced individual files
FIL=$(ls | grep ^outpad)

# Merging the individual files into a single multi-channel file
sox -M $FIL multi.wav

rm $FIL  # removing the individual files

# Producing a multi-channel waveform plot
ffmpeg -i multi.wav -y -filter_complex "showwavespic=s=2400x900:split_channels=1" -frames:v 1 waveform.png

# Displaying the waveform plot ("open" is macOS; on Linux try xdg-open)
open waveform.png
 
As the waveform plot shows, the result is a single file with five channels, each with the same content, just shifted in time.
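
The shift for each channel comes from the `dc` expression in the script, which evaluates (i - 1) * (LEN - OVL). As a small illustration of that arithmetic (the function name `pad_offsets` is made up here and is not part of the script), the same values can be computed in Python:

```python
def pad_offsets(num_channels, tone_len, overlap):
    """Leading silence in seconds per channel: (i - 1) * (tone_len - overlap)."""
    step = tone_len - overlap
    return [(i - 1) * step for i in range(1, num_channels + 1)]


# Same settings as NUM=5, LEN=2, OVL=0.5 in the Bash script.
offsets = pad_offsets(num_channels=5, tone_len=2, overlap=0.5)
for channel, offset in enumerate(offsets, start=1):
    print(f"channel {channel}: tone starts at {offset} s")
```

With the script's settings the tones start at 0, 1.5, 3, 4.5 and 6 seconds, so each tone begins 0.5 seconds before the previous one ends.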
 
More on reverse-Polish arithmetic using dc: http://wiki.bash-hackers.org/howto/calculate-dc 
More on displaying waveforms using ffmpeg: https://trac.ffmpeg.org/wiki/Waveform
