Choppy audio from separating and then joining .wav stereo channels
I am currently working on processing .wav files with Python, using PyAudio for streaming the audio and the Python wave library for loading the file data. Later on I plan to process the individual stereo channels, adjusting the amplitude of the signal and the panning of the stereo signal, but for now I'm just trying to separate the two channels of the wave file and stitch them back together, hopefully ending up with data that is identical to the input data.
Below is my code. The method getRawSample works perfectly fine, and I can stream audio through that function. The problem is my getSample method. Somewhere along the line, where I separate the two channels of audio and join them back together, the audio gets distorted. I have even commented out the part where I do the amplitude and panning adjustment, so in theory it's data in -> data out.
Below is an example of my code:

    import struct
    import threading
    import wave

    import numpy as np

    class Sample(threading.Thread) :
        def __init__(self, filepath, chunk):
            super(Sample, self).__init__()
            self.CHUNK = chunk
            self.filepath = filepath
            self.wave = wave.open(self.filepath, 'rb')
            self.amp = 0.5 # varies from 0 to 1
            self.pan = 0 # varies from -pi to pi
            self.WIDTH = self.wave.getsampwidth()
            self.CHANNELS = self.wave.getnchannels()
            self.RATE = self.wave.getframerate()
            self.MAXFRAMEFEEDS = self.wave.getnframes()/self.CHUNK # maximum even number of chunks
            self.unpstr = '<{0}h'.format(self.CHUNK*self.WIDTH) # format for unpacking the sample byte string
            self.pckstr = '<{0}h'.format(self.CHUNK*self.WIDTH) # format for packing the sample byte string
            self.framePos = 0 # keeps track of how many chunks of data fed

        # panning and amplitude adjustment of input sample data
        def panAmp(self, data, panVal, ampVal): # when panning, using constant power panning
            [left, right] = self.getChannels(data)
            #left = np.multiply(0.5, left) #(np.sqrt(2)/2)*(np.cos(panVal) + np.sin(panVal))
            #right = np.multiply(0.5, right) # (np.sqrt(2)/2)*(np.cos(panVal) - np.sin(panVal))
            outputList = self.combineChannels(left, right)
            dataResult = struct.pack(self.pckstr, *outputList)
            return dataResult

        def getChannels(self, data):
            dataPrepare = list(struct.unpack(self.unpstr, data))
            left = dataPrepare[0::self.CHANNELS]
            right = dataPrepare[1::self.CHANNELS]
            return [left, right]

        def combineChannels(self, left, right):
            stereoData = left
            for i in range(0, self.CHUNK/self.WIDTH):
                index = i*2+1
                stereoData = np.insert(stereoData, index, right[i*self.WIDTH:(i+1)*self.WIDTH])
            return stereoData

        def getSample(self, panVal, ampVal):
            data = self.wave.readframes(self.CHUNK)
            self.framePos += 1
            if self.framePos > self.MAXFRAMEFEEDS: # if no more audio samples to process
                self.wave.rewind()
                data = self.wave.readframes(self.CHUNK)
                self.framePos = 1
            return self.panAmp(data, panVal, ampVal)

        def getRawSample(self): # for debugging, bypasses pan and amp functions
            data = self.wave.readframes(self.CHUNK)
            self.framePos += 1
            if self.framePos > self.MAXFRAMEFEEDS: # if no more audio samples to process
                self.wave.rewind()
                data = self.wave.readframes(self.CHUNK)
                self.framePos = 1
            return data

I suspect that the error is in the way I stitch the left and right channels together, but I'm not sure. I load the project with 16-bit, 44.1 kHz .wav files. Below is a link to an audio file so that you can hear the resulting audio output. The first part is two files (both two-channel) run through the getSample method, and the next part is those same files run through the getRawSample method.
https://dl.dropboxusercontent.com/u/24215404/pythonaudiosample.wav
Based on the audio, as said earlier, it seems like the stereo file gets distorted. Looking at the waveform of the file above, it seems as though the right and left channels are exactly the same after going through the getSample method.
If needed, I can also post my code including the main function. Hopefully my question isn't too vague, but I am grateful for any help or input!
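
For reference, the split-and-rejoin round trip described in the question can be sketched with nothing but the struct module. This is only a minimal sketch, assuming 16-bit little-endian PCM and exactly two channels; the helper names split_channels and join_channels and the sample values are made up for illustration and do not come from the code above. Note that the struct format has to count samples (frames times channels), which happens to equal CHUNK*WIDTH only for 16-bit stereo.

```python
import struct

def split_channels(data, channels=2):
    """Unpack 16-bit little-endian PCM bytes into one list per channel."""
    count = len(data) // 2  # two bytes per 16-bit sample
    samples = struct.unpack('<{0}h'.format(count), data)
    # Channel c owns every `channels`-th sample, starting at offset c.
    return [list(samples[c::channels]) for c in range(channels)]

def join_channels(left, right):
    """Interleave two equal-length channels back into PCM bytes."""
    interleaved = [0] * (len(left) + len(right))
    interleaved[0::2] = left    # even positions: left channel
    interleaved[1::2] = right   # odd positions: right channel
    return struct.pack('<{0}h'.format(len(interleaved)), *interleaved)

# Four made-up stereo frames, stored as L, R, L, R, ...
frames = struct.pack('<8h', 1, -1, 2, -2, 3, -3, 4, -4)
left, right = split_channels(frames)
round_trip = join_channels(left, right)
```

Interleaving by slice assignment places each right sample directly after its left partner in a single pass, so the output has the same length and order as the input, and any per-channel processing can be applied to left and right between the two calls.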
Source: https://stackoverflow.com/questions/38943778
Best answer
eSpeak is used by default on Linux; on Windows it is SAPI5 by Microsoft. Sorry, no luck with SAPI, but with eSpeak you can add an explicit pronunciation for a word rather easily (not quite an abbreviation change, but it should be of use).
All you need is the source of the language dictionary file (en_list for English). You can get it with the eSpeak source from here. I took espeak-1.47.11-source.zip.
Then I went to the espeak-1.47.11-source/dictsource directory, opened en_list and entered one line (just before incense):

    inc Insi:dEnt

Then I compiled the dictionary (this places en_dict in /usr/lib/x86_64-linux-gnu/espeak-data/en_dict):

    $ sudo espeak --compile English

Note that the pronunciation goes after the word; see here for the details. That's all. Now my notebook says incident instead of inc. Besides, it never said include in place of inc for me.