Choppy audio from separating and then joining .wav stereo channels

I am currently processing .wav files with Python, using PyAudio to stream the audio and Python's wave library to load the file data. Later I plan to process the individual stereo channels, adjusting the amplitude of the signal and the panning of the stereo signal, but for now I'm just trying to separate the two channels of the wave file and stitch them back together, hopefully ending up with data that is identical to the input data.

The method getRawSample works perfectly fine, and I can stream audio through that function. The problem is my getSample method. Somewhere along the line, where I separate the two channels of audio and join them back together, the audio gets distorted. I have even commented out the part where I do amplitude and panning adjustment, so in theory it's data in -> data out.
Below is an example of my code:

import struct
import threading
import wave

import numpy as np


class Sample(threading.Thread):

    def __init__(self, filepath, chunk):
        super(Sample, self).__init__()
        self.CHUNK = chunk
        self.filepath = filepath
        self.wave = wave.open(self.filepath, 'rb')
        self.amp = 0.5  # varies from 0 to 1
        self.pan = 0  # varies from -pi to pi
        self.WIDTH = self.wave.getsampwidth()
        self.CHANNELS = self.wave.getnchannels()
        self.RATE = self.wave.getframerate()
        self.MAXFRAMEFEEDS = self.wave.getnframes() // self.CHUNK  # number of whole chunks in the file
        self.unpstr = '<{0}h'.format(self.CHUNK * self.WIDTH)  # format for unpacking the sample byte string
        self.pckstr = '<{0}h'.format(self.CHUNK * self.WIDTH)  # format for packing the sample byte string

        self.framePos = 0  # keeps track of how many chunks of data fed

    # panning and amplitude adjustment of input sample data

    def panAmp(self, data, panVal, ampVal):  # when panning, using constant power panning
        [left, right] = self.getChannels(data)
        #left = np.multiply(0.5, left) #(np.sqrt(2)/2)*(np.cos(panVal) + np.sin(panVal))
        #right = np.multiply(0.5, right)  # (np.sqrt(2)/2)*(np.cos(panVal) - np.sin(panVal))
        outputList = self.combineChannels(left, right)
        dataResult = struct.pack(self.pckstr, *outputList)
        return dataResult

    def getChannels(self, data):
        dataPrepare = list(struct.unpack(self.unpstr, data))
        left = dataPrepare[0::self.CHANNELS]
        right = dataPrepare[1::self.CHANNELS]
        return [left, right]

    def combineChannels(self, left, right):
        stereoData = left
        for i in range(0, self.CHUNK // self.WIDTH):
            index = i*2+1
            stereoData = np.insert(stereoData, index, right[i*self.WIDTH:(i+1)*self.WIDTH])
        return stereoData

    def getSample(self, panVal, ampVal):
        data = self.wave.readframes(self.CHUNK)
        self.framePos += 1
        if self.framePos > self.MAXFRAMEFEEDS:  # if no more audio samples to process
            self.wave.rewind()
            data = self.wave.readframes(self.CHUNK)
            self.framePos = 1
        return self.panAmp(data, panVal, ampVal)

    def getRawSample(self):  # for debugging, bypasses pan and amp functions
        data = self.wave.readframes(self.CHUNK)
        self.framePos += 1
        if self.framePos > self.MAXFRAMEFEEDS:  # if no more audio samples to process
            self.wave.rewind()
            data = self.wave.readframes(self.CHUNK)
            self.framePos = 1
        return data
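
For reference, the two commented-out gain expressions in panAmp match the usual constant-power pan law: the squared left and right gains always sum to 1. A minimal sketch follows, assuming the pan angle is limited to [-pi/4, pi/4] (the code comment says -pi to pi, but outside the narrower range one of the gains turns negative) and assuming 16-bit samples. The function names `constant_power_gains` and `apply_pan_amp` are hypothetical and not part of the code above.

```python
import numpy as np


def constant_power_gains(pan):
    """Return (left_gain, right_gain) for a pan angle in radians.

    Assumption: pan is clamped to [-pi/4, pi/4]. With the sign convention
    of the commented-out expressions, +pi/4 is hard left and -pi/4 is
    hard right.
    """
    pan = float(np.clip(pan, -np.pi / 4, np.pi / 4))
    scale = np.sqrt(2) / 2
    left_gain = scale * (np.cos(pan) + np.sin(pan))
    right_gain = scale * (np.cos(pan) - np.sin(pan))
    return left_gain, right_gain


def apply_pan_amp(left, right, pan, amp):
    """Scale two int16 channel arrays by the amplitude and pan gains."""
    left_gain, right_gain = constant_power_gains(pan)
    # Compute in float, then clip to the int16 range before converting back.
    out_left = np.clip(amp * left_gain * left.astype(np.float64), -32768, 32767)
    out_right = np.clip(amp * right_gain * right.astype(np.float64), -32768, 32767)
    return out_left.astype(np.int16), out_right.astype(np.int16)
```

With pan = 0 both gains are sqrt(2)/2, so a centered signal is attenuated by about 3 dB in each channel rather than being passed through unchanged.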

I suspect that the error is in the way I stitch together the left and right channels, but I'm not sure. I load the project with 16-bit, 44100 Hz .wav files. Below is a link to an audio file so that you can hear the resulting audio output. The first part runs two files (both two-channel) through the getSample method, while the next part runs those same files through the getRawSample method.

https://dl.dropboxusercontent.com/u/24215404/pythonaudiosample.wav

Based on the audio, as said earlier, it seems like the stereo file gets distorted. Looking at the waveform of the above file, it seems as though the right and left channels are exactly the same after going through the getSample method.
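
The intended round trip (split the interleaved frames, then rejoin them unchanged) can be sketched as follows. This is a minimal sketch, assuming 16-bit little-endian stereo data as returned by wave.readframes. The helper names `split_channels` and `join_channels` are hypothetical. Reading the posted combineChannels, it appears to insert two right-channel values at each odd index, so its output is no longer a strict left/right alternation; the sketch writes the channels into alternating slots with slice assignment instead.

```python
import numpy as np


def split_channels(data, channels=2):
    """Split interleaved 16-bit PCM bytes into one int16 array per channel."""
    # frombuffer uses the actual byte length, so a short final chunk is fine.
    samples = np.frombuffer(data, dtype='<i2')
    return [samples[ch::channels] for ch in range(channels)]


def join_channels(left, right):
    """Interleave two equal-length channel arrays back into PCM bytes."""
    stereo = np.empty(len(left) + len(right), dtype='<i2')
    stereo[0::2] = left   # even slots carry the left channel
    stereo[1::2] = right  # odd slots carry the right channel
    return stereo.tobytes()


# Three stereo frames: left = 1, 2, 3 and right = -1, -2, -3.
frames = np.array([1, -1, 2, -2, 3, -3], dtype='<i2').tobytes()
left, right = split_channels(frames)
assert join_channels(left, right) == frames  # data in -> data out
```

Because the number of samples is taken from the buffer itself, this also avoids hard-coding a sample count into a struct format string: readframes can return fewer frames than requested at the end of the file.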

If needed, I can also post my code including the main function. Hopefully my question isn't too vague; I am grateful for any help or input!


Source: https://stackoverflow.com/questions/38943778
Updated: 2023-01-09 12:01

Best answer

eSpeak is used by default on Linux; on Windows it is SAPI5 by Microsoft. Sorry, no luck with SAPI, but for eSpeak you can add an explicit pronunciation for a word rather easily (not quite an abbreviation change, but it should be of use).

All you need is the source of the language dictionary file (en_list for English). You can get it with the eSpeak source from here. I took espeak-1.47.11-source.zip.

Then I went to the espeak-1.47.11-source/dictsource directory, opened en_list, and entered one line (just before incense):

inc Insi:dEnt

Then I compiled the dictionary with the following command (it places en_dict in /usr/lib/x86_64-linux-gnu/espeak-data/en_dict):

$ sudo espeak --compile English

Note that the pronunciation goes after the word; see here for the details. That's all. Now my notebook says incident instead of inc. Besides, it never said include in place of inc.
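
The manual edit described above (adding one line just before an existing entry) could also be scripted. The following is only a sketch: `insert_entry_before` is a hypothetical helper, the sample lines are made-up placeholders rather than real en_list contents, and it assumes each entry starts with the word followed by whitespace.

```python
def insert_entry_before(lines, entry, before_word):
    """Return a copy of `lines` with `entry` inserted before the first line
    whose first whitespace-separated token equals `before_word`.

    If no such line exists, the entry is appended at the end.
    """
    result = list(lines)
    for index, line in enumerate(result):
        tokens = line.split()
        if tokens and tokens[0] == before_word:
            result.insert(index, entry)
            return result
    result.append(entry)
    return result


# Placeholder excerpt; the phoneme fields are not real eSpeak data.
sample = ["inbox <phonemes>", "incense <phonemes>", "inch <phonemes>"]
updated = insert_entry_before(sample, "inc Insi:dEnt", "incense")
```

The dictionary still has to be recompiled afterwards, as in the command shown earlier, before eSpeak picks up the new entry.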
