ValueError: could not broadcast input array from shape (20,590) into shape (20)

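I am trying to extract features from .wav files by using the MFCCs of the sound files. When I try to convert my list of MFCCs to a numpy array, I get an error. I am fairly sure the error occurs because the list contains MFCC arrays with different shapes, but I am not sure how to solve the problem.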

I have looked at 2 other Stack Overflow posts, but they do not solve my problem because each is too specific to one particular task.

ValueError: could not broadcast input array from shape (128,128,3) into shape (128,128)

Value Error: could not broadcast input array from shape (857,3) into shape (857)

Full error message:

Traceback (most recent call last):
  File "/.... /.../...../Batch_MFCC_Data.py", line 68, in <module>
    X = np.array(MFCCs)
ValueError: could not broadcast input array from shape (20,590) into shape (20)

Code example:

import glob
import numpy as np
from sklearn.preprocessing import OneHotEncoder

all_wav_paths = glob.glob('directory_of_wav_files/**/*.wav', recursive=True)
np.random.shuffle(all_wav_paths)

MFCCs = [] #array to hold all MFCC's
labels = [] #array to hold all labels

for i, wav_path in enumerate(all_wav_paths):

    individual_MFCC = MFCC_from_wav(wav_path)
    #MFCC_from_wav() -> returns the MFCC coefficients 

    label = get_class(wav_path)
    #get_class() -> returns the label of the wav file either 0 or 1

    #add features and label to the array
    MFCCs.append(individual_MFCC)
    labels.append(label)

#Must convert the training data to a Numpy Array for 
#train_test_split and saving to local drive

X = np.array(MFCCs) #THIS LINE CRASHES WITH ABOVE ERROR

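# binary encode labels (OneHotEncoder expects a 2-D array, one row per sample)
# note: `sparse` was renamed to `sparse_output` in scikit-learn 1.2
onehot_encoder = OneHotEncoder(sparse=False)
Y = onehot_encoder.fit_transform(np.array(labels).reshape(-1, 1))

#create train/test data
#sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=0)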

#saving data to local drive
np.save("LABEL_SAVE_PATH", Y)
np.save("TRAINING_DATA_SAVE_PATH", X)

Below is a snapshot of the shapes of the MFCCs (one per .wav file) in the MFCCs list.

The MFCCs list contains the following shapes:

...More above...
(20, 423) #shape of returned MFCC from one of the .wav files
(20, 457)
(20, 1757)
(20, 345)
(20, 835)
(20, 345)
(20, 687)
(20, 774)
(20, 597)
(20, 719)
(20, 1195)
(20, 433)
(20, 728)
(20, 939)
(20, 345)
(20, 1112)
(20, 345)
(20, 591)
(20, 936)
(20, 1161)
....More below....

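As you can see, the MFCCs in the list do not all have the same shape, because the recordings are not all the same length. Is this the reason I cannot convert the list into a numpy array? If so, how can I fix it so that every MFCC in the list has the same shape?

Any code snippets that accomplish this, and any advice, would be greatly appreciated!

Thanks!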

1 Answer

1
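Use the following logic to downsample the arrays to min_shape, i.e. trim the larger arrays down to min_shape by keeping only their first min_shape[1] columns: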
```
min_shape = (20, 345)
MFCCs = [arr1, arr2, arr3, ...]    

for idx, arr in enumerate(MFCCs):
    MFCCs[idx] = arr[:, :min_shape[1]]

batch_arr = np.array(MFCCs)
```
Then you can stack these arrays into a single batch array, as in the minimal example below:
```
In [33]: a1 = np.random.randn(2, 3)    
In [34]: a2 = np.random.randn(2, 5)    
In [35]: a3 = np.random.randn(2, 10)

In [36]: MFCCs = [a1, a2, a3]

In [37]: min_shape = (2, 2)

In [38]: for idx, arr in enumerate(MFCCs):
    ...:     MFCCs[idx] = arr[:, :min_shape[1]]
    ...:     

In [42]: batch_arr = np.array(MFCCs)

In [43]: batch_arr.shape
Out[43]: (3, 2, 2)
```
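The min_shape in the snippets above is hard-coded. As a rough sketch, the shortest length could instead be derived from the data itself. The helper name `truncate_to_shortest` and the random stand-in arrays are assumptions made for illustration, not part of the original code:

```python
import numpy as np

def truncate_to_shortest(mfccs):
    """Crop every (n_coeffs, n_frames) array to the shortest n_frames, then stack."""
    # Shortest time axis found in the list (hypothetical helper, for illustration).
    min_frames = min(m.shape[1] for m in mfccs)
    # Keep only the first min_frames columns of each array.
    return np.stack([m[:, :min_frames] for m in mfccs])

# Stand-in data laid out like the question's MFCCs: 20 coefficients, varying frame counts.
mfccs = [np.random.randn(20, n) for n in (423, 457, 345)]
batch = truncate_to_shortest(mfccs)
print(batch.shape)  # (3, 20, 345)
```

Note that this discards everything after the shortest recording's last frame, so one very short file shortens the whole batch.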
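Now for the second strategy: to upsample the smaller arrays to max_shape, follow similar logic, but fill in the missing values with either zeros or nan values, whichever you prefer.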

Then you can stack the arrays into a batch array of shape (num_arrays, dim1, dim2); so, for your case, the shape should be (num_wav_files, 20, max_column).
Answered 2024-04-30T03:50:16+08:00
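A minimal sketch of this padding strategy, assuming each MFCC is a 2-D float array of shape (20, n_frames). The helper name `pad_to_longest`, its `fill_value` parameter, and the random stand-in data are hypothetical and only there to make the example runnable:

```python
import numpy as np

def pad_to_longest(mfccs, fill_value=0.0):
    """Right-pad every (n_coeffs, n_frames) array to the longest n_frames, then stack."""
    max_frames = max(m.shape[1] for m in mfccs)
    padded = [
        # Pad only the end of the time axis (axis 1); leave the coefficient axis alone.
        np.pad(m, ((0, 0), (0, max_frames - m.shape[1])),
               mode="constant", constant_values=fill_value)
        for m in mfccs
    ]
    return np.stack(padded)

# Stand-in data laid out like the question's MFCCs: 20 coefficients, varying frame counts.
mfccs = [np.random.randn(20, n) for n in (423, 457, 345)]
X = pad_to_longest(mfccs)                    # zero padding
X_nan = pad_to_longest(mfccs, np.nan)        # nan padding instead
print(X.shape)  # (3, 20, 457)
```

Zero padding keeps the array usable by most models directly, while nan padding makes the padded region easy to mask out later; which one fits depends on what consumes the batch.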
