Parsing HTML Tables with BS4
I've been trying different methods of scraping data from this site (http://nflcombineresults.com/nflcombinedata.php?year=1999&pos=WR&college=) and can't seem to get any of them to work. I've tried playing with the indices, but I still can't make it work. I think I've tried too many things at this point, so if someone could point me in the right direction I would really appreciate it.
I would like to pull all of the information and export it to a .csv file, but at this point I'm just trying to get the name and position to print to get started.
Here's my code:
import urllib2
from bs4 import BeautifulSoup
import re

url = ('http://nflcombineresults.com/nflcombinedata.php?year=1999&pos=&college=')

page = urllib2.urlopen(url).read()

soup = BeautifulSoup(page)
table = soup.find('table')

for row in table.findAll('tr')[0:]:
    col = row.findAll('tr')
    name = col[1].string
    position = col[3].string
    player = (name, position)
    print "|".join(player)
Here's the error I'm getting:
line 14, in <module>
    name = col[1].string
IndexError: list index out of range
--UPDATE--
Ok, I've made a little progress. It now allows me to go from start to finish, but it requires knowing how many rows are in the table. How would I get it to just go through them until the end? Updated Code:
import urllib2
from bs4 import BeautifulSoup
import re

url = ('http://nflcombineresults.com/nflcombinedata.php?year=1999&pos=&college=')

page = urllib2.urlopen(url).read()

soup = BeautifulSoup(page)
table = soup.find('table')

for row in table.findAll('tr')[1:250]:
    col = row.findAll('td')
    name = col[1].getText()
    position = col[3].getText()
    player = (name, position)
    print "|".join(player)
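One possible way to walk the table to the end without knowing the row count is to iterate over every `tr` and skip any row that has too few `td` cells, instead of slicing with a fixed upper bound. The following is only a sketch under stated assumptions: it uses Python 3 (the code above is Python 2), it parses a small made-up HTML string (`SAMPLE_HTML`) rather than downloading the real page, and it assumes the name is in the second cell and the position in the fourth, as in the code above. The helper names `extract_players` and `to_csv` are hypothetical.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>Year</th><th>Name</th><th>College</th><th>POS</th></tr>
  <tr><td>1999</td><td>Player One</td><td>College A</td><td>WR</td></tr>
  <tr><td>1999</td><td>Player Two</td><td>College B</td><td>QB</td></tr>
  <tr><td colspan="4">footer</td></tr>
</table>
"""


def extract_players(html):
    """Return (name, position) tuples for every data row in the first table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    players = []
    # No slice bounds: the loop simply ends when the rows run out.
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        # Header rows (th only) and short rows would raise IndexError, so skip them.
        if len(cells) < 4:
            continue
        name = cells[1].get_text(strip=True)
        position = cells[3].get_text(strip=True)
        players.append((name, position))
    return players


def to_csv(players):
    """Render the tuples as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "position"])
    writer.writerows(players)
    return buffer.getvalue()


players = extract_players(SAMPLE_HTML)
for name, position in players:
    print("|".join((name, position)))
```

The length check is also what avoids the `IndexError` from the first attempt: a row with no `td` cells yields an empty list, so indexing it with `[1]` fails. Writing `to_csv(players)` to a file would give the .csv export mentioned above.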
Source: https://stackoverflow.com/questions/22078620
Best answer
In Audacity you have to check the 'Use custom mix' radio button in the Import/Export section of the preferences. This will let you export multi-channel files, and manually assign tracks to channels.
Other than that, plain old .wav works fine for this.
But you can also use SoX to create the files in a more automated manner.
Manually you can combine (or 'merge' as it's referred to in the documentation) five distinct files into a single five-channel file like this:
sox -M chan1.wav chan2.wav chan3.wav chan4.wav chan5.wav multi.wav
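To illustrate what merging means at the sample level, here is a minimal Python sketch that uses only the standard `wave` and `array` modules. It is not how SoX is implemented, and it is far more limited: it assumes 16-bit mono inputs that share one sample rate, and padding shorter inputs with silence is a choice made by this sketch. The function names `merge_mono_wavs` and `write_mono` and the tiny demo files are hypothetical.

```python
import array
import os
import sys
import tempfile
import wave


def write_mono(path, values, rate=8000):
    """Write a list of 16-bit integer samples as a mono WAV file."""
    data = array.array("h", values)
    if sys.byteorder == "big":
        data.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(rate)
        writer.writeframes(data.tobytes())


def merge_mono_wavs(paths, out_path):
    """Interleave several 16-bit mono WAV files into one multi-channel file."""
    channels = []
    rate = None
    for path in paths:
        with wave.open(path, "rb") as reader:
            if reader.getnchannels() != 1 or reader.getsampwidth() != 2:
                raise ValueError(path + " is not 16-bit mono")
            if rate is None:
                rate = reader.getframerate()
            elif reader.getframerate() != rate:
                raise ValueError("sample rates differ")
            samples = array.array("h")
            samples.frombytes(reader.readframes(reader.getnframes()))
        if sys.byteorder == "big":
            samples.byteswap()
        channels.append(samples)

    count = len(channels)
    longest = max(len(samples) for samples in channels)
    interleaved = array.array("h", [0]) * (longest * count)
    for index, samples in enumerate(channels):
        # Pad shorter channels with silence, then drop them into every
        # count-th slot so frame n holds sample n of each channel in order.
        padded = samples + array.array("h", [0]) * (longest - len(samples))
        interleaved[index::count] = padded
    if sys.byteorder == "big":
        interleaved.byteswap()

    with wave.open(out_path, "wb") as writer:
        writer.setnchannels(count)
        writer.setsampwidth(2)
        writer.setframerate(rate)
        writer.writeframes(interleaved.tobytes())


# Demo: two tiny mono files in a temporary directory, merged into one file.
workdir = tempfile.mkdtemp()
chan1 = os.path.join(workdir, "chan1.wav")
chan2 = os.path.join(workdir, "chan2.wav")
multi = os.path.join(workdir, "multi.wav")
write_mono(chan1, [1, 2, 3])
write_mono(chan2, [10, 20])
merge_mono_wavs([chan1, chan2], multi)
with wave.open(multi, "rb") as reader:
    print(reader.getnchannels(), reader.getnframes())  # 2 3
```

For real material, including the 24-bit files produced by the script in this answer, the `sox -M` command above remains the practical route.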
To automate the process I put together a short Bash routine for producing a multichannel file with staggered test tones:
NUM=5 # Number of channels
LEN=2 # Length of each test tone, in seconds
OVL=0.5 # Overlap between test tones, in seconds

# A one-channel base file containing simple white noise.
# faded at both end with a quarter wave envelope to ensure
# smooth equal power transitions
sox -n -b 24 -c 1 out.wav synth $LEN whitenoise fade q $OVL -0 $OVL

# Instead of white noise you can for example make a 1kHz tone
# like this:
# sox -n -b 24 -c 1 out.wav synth $LEN sine 1k fade q $OVL -0 $OVL
# Or a sweep from 10Hz to 10kHz like this:
# sox -n -b 24 -c 1 out.wav synth $LEN sine 10-10k fade q $OVL -0 $OVL

# Produces a sequence of the number of seconds each channel
# shall be padded with
SEQ=$(for ((i=1; i<=NUM; i++))
do
  echo "$i 1 - [$LEN $OVL -]x * p" | dc # reverse-Polish arithmetic
done)
echo $SEQ

# Padding the base file to various degrees and saving them separately
for j in $SEQ
do
  sox -c 1 out.wav outpad${j}.wav pad $j
done

# Finding the just-produced individual files
FIL=$(ls | grep ^outpad)

# Merging the individual files into a single multi-channel file
sox -M $FIL multi.wav
rm $FIL # removing the individual files

# Producing a multi-channel waveform plot
ffmpeg -i multi.wav -y -filter_complex "showwavespic=s=2400x900:split_channels=1" -frames:v 1 waveform.png

# displaying the waveform plot
open waveform.png
As the waveform plot clearly shows, the result is a file with five channels, each with the same content, just shifted in time.
More on reverse-Polish arithmetic using dc: http://wiki.bash-hackers.org/howto/calculate-dc
More on displaying waveforms using ffmpeg: https://trac.ffmpeg.org/wiki/Waveform
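For readers unfamiliar with reverse-Polish notation, the `dc` expression in the script, `$i 1 - [$LEN $OVL -]x * p`, evaluates to `(i - 1) * (LEN - OVL)`, which is the number of seconds of padding placed in front of channel `i`. A small Python sketch of the same arithmetic follows; the function name `pad_offsets` is made up for illustration.

```python
def pad_offsets(num_channels, tone_length, overlap):
    """Start delay in seconds for each channel: (i - 1) * (length - overlap)."""
    step = tone_length - overlap
    return [(i - 1) * step for i in range(1, num_channels + 1)]


# Same values as NUM, LEN and OVL in the script.
offsets = pad_offsets(5, 2, 0.5)
print(offsets)  # [0.0, 1.5, 3.0, 4.5, 6.0]
```

Each tone therefore starts 1.5 seconds after the previous one, leaving the 0.5 second overlap during which one channel fades out while the next fades in.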
Related Q&A
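Your mapping is wrong. The error message states that there must be exactly one video stream and that it must be the first. MXF is picky, so you have to map the video first, because the mapping order determines the stream order in the output. Second, you are trying to use -map for audio channel selection, but it does not work like that: you have to add -map_channel or the pan audio filter. -map_channel: ffmpeg -i "D:\Media\AUDIO_0.W64" -i "D:\media\NO_AUDIO.mxf" -map 1:v -map 0:a -map 0:a -map 0:a -map_channel 0.0. ...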
-
Have you checked with this?
-
Stereo PCM stored in a WAV file is in LR format: 'L' stands for a left-channel sample and 'R' for a right-channel sample. I guess you have a bug when retrieving or storing the PCM. Maybe you sometimes start at the correct position in the buffer and sometimes at the second sample. It is hard to tell without more information.
-
Audio filters have a different syntax in FFmpeg. You can pan the audio channels without mixing them. Using your example: ffmpeg -f lavfi -i "amovie=inMovie.mov,pan=stereo: c0=c0+c1: c1=c0+c1" -i inMovie.mov -map 0:0 -map 1:0 -vcodec libx264 -vpre medium -b 320k -pass 1 -s 374x210 -threads 0 -acodec libfaac -ab 64k outMov.mp ...
-
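How to mix two audio channels? [2022-12-07]
Mixing is just a weighted addition of both signals. So if you want them to be equal in one mono signal, lower both signals by a factor of 2 and add them. If you want to place them in the stereo space, use different weights on the left and right channels. For example, 0.6 of signal 1 and 0.4 of signal 2 on the left channel, and vice versa on the right channel, will do. For better results a slight shift is needed, but that depends on your needs.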
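The weighted addition described in this answer can be sketched in a few lines of Python. This is only an illustration on plain lists of float samples; the function names `mix_to_mono` and `mix_to_stereo` and the sample values are made up, and the 0.6/0.4 weights are the ones suggested above.

```python
def mix_to_mono(signal_a, signal_b):
    """Halve both signals and add them sample by sample."""
    return [0.5 * a + 0.5 * b for a, b in zip(signal_a, signal_b)]


def mix_to_stereo(signal_a, signal_b, near=0.6, far=0.4):
    """Place signal_a towards the left and signal_b towards the right."""
    left = [near * a + far * b for a, b in zip(signal_a, signal_b)]
    right = [far * a + near * b for a, b in zip(signal_a, signal_b)]
    return left, right


# Two short hypothetical signals with sample values between -1.0 and 1.0.
signal_1 = [1.0, 0.0, -1.0]
signal_2 = [0.0, 1.0, 1.0]

mono = mix_to_mono(signal_1, signal_2)
left, right = mix_to_stereo(signal_1, signal_2)
print(mono)
print(left, right)
```

Because the weights in each output channel sum to 1, the mixed signal cannot exceed the range of the inputs, which is why both signals are lowered before they are added.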
Choppy audio from separating and then joining .wav stereo channels [2023-01-09]
As so often happens, I slept on it and woke up the next day with the solution. The problem was in the combineChannels function. Here is the working code: def combineChannels(self, left, right): stereoData = left for i in range(0, self.CHUNK): index = i*2+1 stereoData = np.insert(stereoData, index, right[i:(i+1)]) ret ...
-
Transloadit supports injection of ffmpeg params, so you can easily add your params to the ffmpeg object as follows: ffmpeg:{ ab: '128k', "ss": "0", "t": "30" } If you have further questions, let us know at support@transloadit.com.
-
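How to access individual channels with core audio from an external audio interface [2021-12-15]
Although your code looks overly complicated for the task it seems to manage, I'll try to answer your question. There is nothing wrong with the concept of retrieving sample data in the callback, but it is not sufficient when dealing with a multi-channel audio device. The number of channels, the channel layout, the format and so on of a given device are queried through AudioStreamBasicDescription. This property is used to initialize the rest of your processing chain. You allocate the audio buffers at initialization, or let the program do it for you (please read the documentation). If you find it easier to copy the data into additional buffers for processing and DSP, you can manage that in the callback (simplified code): Float32 buf[streamForma ...
-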
You can play two independent mono streams out of one of the stereo pairs using MultiplexingSampleProvider or MultiplexingWaveProvider. If you want to treat the whole sound card as a single device, then I find AsioOut tends to be the only option, and you can again use the multiplexing providers to route the individual NAudio wave streams to different device outputs.