Choppy audio from separating and then joining .wav stereo channels

I am currently processing .wav files with Python, using PyAudio to stream the audio and the Python wave library to load the file data. Later on I plan to process the individual stereo channels, adjusting the amplitude of the signal and the panning of the stereo image, but for now I'm just trying to separate the two channels of the wave file and stitch them back together, hopefully ending up with data that is identical to the input data.
Below is my code. The method getRawSample works perfectly fine, and I can stream audio through that function. The problem is my getSample method. Somewhere along the line, where I separate the two channels of audio and join them back together, the audio gets distorted. I have even commented out the part where I do the amplitude and panning adjustment, so in theory it's data in -> data out.
Below is an example of my code:
import struct
import threading
import wave

import numpy as np


class Sample(threading.Thread):

    def __init__(self, filepath, chunk):
        super(Sample, self).__init__()
        self.CHUNK = chunk
        self.filepath = filepath
        self.wave = wave.open(self.filepath, 'rb')
        self.amp = 0.5  # varies from 0 to 1
        self.pan = 0  # varies from -pi to pi
        self.WIDTH = self.wave.getsampwidth()
        self.CHANNELS = self.wave.getnchannels()
        self.RATE = self.wave.getframerate()
        self.MAXFRAMEFEEDS = self.wave.getnframes() // self.CHUNK  # maximum even number of chunks
        self.unpstr = '<{0}h'.format(self.CHUNK*self.WIDTH)  # format for unpacking the sample byte string
        self.pckstr = '<{0}h'.format(self.CHUNK*self.WIDTH)  # format for packing the sample byte string

        self.framePos = 0  # keeps track of how many chunks of data fed

    # panning and amplitude adjustment of input sample data

    def panAmp(self, data, panVal, ampVal):  # when panning, using constant power panning
        [left, right] = self.getChannels(data)
        #left = np.multiply(0.5, left) #(np.sqrt(2)/2)*(np.cos(panVal) + np.sin(panVal))
        #right = np.multiply(0.5, right)  # (np.sqrt(2)/2)*(np.cos(panVal) - np.sin(panVal))
        outputList = self.combineChannels(left, right)
        dataResult = struct.pack(self.pckstr, *outputList)
        return dataResult

    def getChannels(self, data):
        dataPrepare = list(struct.unpack(self.unpstr, data))
        left = dataPrepare[0::self.CHANNELS]
        right = dataPrepare[1::self.CHANNELS]
        return [left, right]

    def combineChannels(self, left, right):
        stereoData = left
        for i in range(0, self.CHUNK // self.WIDTH):
            index = i*2+1
            stereoData = np.insert(stereoData, index, right[i*self.WIDTH:(i+1)*self.WIDTH])
        return stereoData

    def getSample(self, panVal, ampVal):
        data = self.wave.readframes(self.CHUNK)
        self.framePos += 1
        if self.framePos > self.MAXFRAMEFEEDS:  # if no more audio samples to process
            self.wave.rewind()
            data = self.wave.readframes(self.CHUNK)
            self.framePos = 1
        return self.panAmp(data, panVal, ampVal)

    def getRawSample(self):  # for debugging, bypasses pan and amp functions
        data = self.wave.readframes(self.CHUNK)
        self.framePos += 1
        if self.framePos > self.MAXFRAMEFEEDS:  # if no more audio samples to process
            self.wave.rewind()
            data = self.wave.readframes(self.CHUNK)
            self.framePos = 1
        return data
 
I suspect that the error is in the way I stitch the left and right channels together, but I'm not sure. I load the project with 16-bit, 44.1 kHz .wav files. Below is a link to an audio file so that you can hear the resulting audio output. The first part runs two files (both two-channel) through the getSample method, while the next part runs those same files through the getRawSample method.
https://dl.dropboxusercontent.com/u/24215404/pythonaudiosample.wav
Based on the audio, as said earlier, it seems like the stereo file gets distorted. Looking at the waveform of the above file, it seems as though the right and left channels are exactly the same after going through the getSample method.
If needed, I can also post my code including the main function. Hopefully my question isn't too vague; I am grateful for any help or input!
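
A likely culprit, reading the code above, is the loop in combineChannels: on each pass it inserts WIDTH samples of right at index i*2+1, so the result comes out as L0 R0 R1 L1 ... rather than L0 R0 L1 R1 ... The sketch below shows one way to split and re-interleave with NumPy slicing instead. It assumes 16-bit little-endian stereo PCM, as described in the question. The function names split_channels, join_channels and pan_amp are hypothetical, and the gain expressions are taken from the commented-out lines in panAmp, so this is an illustration rather than a confirmed fix.

```python
import numpy as np


def split_channels(data, channels=2):
    """De-interleave raw 16-bit little-endian PCM bytes into one array per channel."""
    samples = np.frombuffer(data, dtype='<i2')
    frames = samples.reshape(-1, channels)  # one row per frame: [left, right]
    return [frames[:, c] for c in range(channels)]


def join_channels(left, right):
    """Interleave two equal-length channels back into bytes (L0 R0 L1 R1 ...)."""
    out = np.empty(len(left) * 2, dtype='<i2')
    out[0::2] = left   # even positions carry the left channel
    out[1::2] = right  # odd positions carry the right channel
    return out.tobytes()


def pan_amp(data, pan_val, amp_val):
    """Apply the gains from the question's commented-out lines, then re-interleave."""
    left, right = split_channels(data)
    gain_l = amp_val * (np.sqrt(2) / 2) * (np.cos(pan_val) + np.sin(pan_val))
    gain_r = amp_val * (np.sqrt(2) / 2) * (np.cos(pan_val) - np.sin(pan_val))
    # Round and clip so the scaled values still fit in a signed 16-bit sample.
    left = np.clip(np.rint(left * gain_l), -32768, 32767)
    right = np.clip(np.rint(right * gain_r), -32768, 32767)
    return join_channels(left, right)
```

With no gain applied, join_channels(*split_channels(data)) should return the same bytes that wave.readframes produced, which gives a quick check that the split and join steps are lossless before any panning is added.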

Original: https://stackoverflow.com/questions/38943778

Updated: 2023-01-09 12:01

Best answer

eSpeak is used by default on Linux; on Windows it is SAPI5 by Microsoft. Sorry, no luck with SAPI, but for eSpeak you can add an explicit pronunciation for a word rather easily (not quite an abbreviation change, but it should be of use).
All you need is the source for the language dictionary file (en_list for English). You can get it with the eSpeak source from here. I took espeak-1.47.11-source.zip.
Then I went to the espeak-1.47.11-source/dictsource directory, opened en_list and entered one line (just before incense):
inc Insi:dEnt
 
Then I compiled the dictionary with the following command (it will place en_dict in /usr/lib/x86_64-linux-gnu/espeak-data/en_dict):
$ sudo espeak --compile English
 
Note that the pronunciation follows the word; see here for the details. That's all. Now my notebook says "incident" instead of "inc". Besides, it never said "include" in place of "inc".
