Seaborn: using boxplot causes running out of memory
I would like to plot three boxplots, one for each of the weight_cat values 1, 2 and 3 (these are the only distinct values it has). These boxplots should show the dependency of height on weight category (weight_cat). So I have a dataframe like this:
    print(data.head(5))

             Height    Weight  weight_cat
    Index
    1      65.78331  112.9925           1
    2      71.51521  136.4873           2
    3      69.39874  153.0269           3
    4      68.21660  142.3354           2
    5      67.78781  144.2971           2
The code below ends up eating all my RAM. I believe this is not normal:

    Seaborn.boxplot(x="Height", y="weight_cat", data=data)

What is wrong here? This is the link to the manual. The shape of the dataframe is (25000, 4). This is the link to the CSV file.
This is how you can get the same data:

    import pandas as pd

    data = pd.read_csv('weights_heights.csv', index_col='Index')

    # Map a weight to one of three categories: 1 (light), 2 (medium), 3 (heavy).
    def weight_category(weight):
        newWeight = weight
        if newWeight < 120:
            return 1
        if newWeight >= 150:
            return 3
        else:
            return 2

    data['weight_cat'] = data['Weight'].apply(weight_category)
Source: https://stackoverflow.com/questions/36666562
Best answer
The problem is that you are using np.matrix. Use np.array instead and simply iterate without indexing:

    import numpy as np

    result = np.array([[11, 12, 13], [21, 22, 23], [31, 32, 33]])

    for p in result:
        print(p)

    [11 12 13]
    [21 22 23]
    [31 32 33]
Explanation
What you are seeing is the effect of numpy.matrix requiring each row to have 2 dimensions. This is unnecessary and an anti-pattern for NumPy.

There is a history behind numpy.matrix. It was used initially for the convenience of matrix multiplication operators. But this is no longer an issue, since @ is available (Python 3.5+) in place of nested dot calls. Therefore, by default, use numpy.array.
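To make the explanation above concrete, here is a minimal sketch comparing the two types. It reuses the 3x3 values from the answer; the variable names (`as_matrix`, `as_array`, `nested`, `chained`) are illustrative and not part of the original post. It shows that iterating over an np.matrix yields rows that are still 2-D, while iterating over an np.array yields 1-D rows, and that @ can stand in for nested dot calls.

```python
import numpy as np

data = [[11, 12, 13], [21, 22, 23], [31, 32, 33]]

# Iterating an np.matrix yields 1x3 matrices: each row keeps two dimensions.
as_matrix = np.matrix(data)
matrix_row_shapes = [row.shape for row in as_matrix]

# Iterating an np.array yields plain 1-D rows.
as_array = np.array(data)
array_row_shapes = [row.shape for row in as_array]

print(matrix_row_shapes)  # [(1, 3), (1, 3), (1, 3)]
print(array_row_shapes)   # [(3,), (3,), (3,)]

# With arrays, @ replaces nested dot calls for chained matrix products.
nested = np.dot(np.dot(as_array, as_array), as_array)
chained = as_array @ as_array @ as_array
print(np.array_equal(nested, chained))  # True
```

The extra dimension on each matrix row is why indexing inside the loop behaves differently from what one might expect with a plain array.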
Related Q&A

- Numpy matrix to array [2023-12-09]
  If you'd like something a bit more readable, you can do this: A = np.squeeze(np.asarray(M)). Equivalently, you could also do: A = np.asarray(M).reshape(-1), but that is less easy to read. ...

- Top n rows in a matrix? [2022-09-15]
  I assume you mean that you want the n largest values in the matrix. In that case, "getting the indices of the n largest elements in a matrix" is almost the same question, except that the OP there wanted the largest values of the whole matrix rather than individual maxima. This should do what you need:

      n = 2; % The depth to get
      M = [ 1, 3, 5; ...
            2, 9, 1; ...
            7, 2, 4 ]; % The matrix to look at
      [m, mi] = sort(M, 'descend'); % Sort the t ...

- Iterate over a numpy Matrix rows [2023-05-22]
  The problem is that you are using np.matrix. Use np.array instead and simply iterate without indexing:

      result = np.array([[11, 12, 13], [21, 22, 23], [31, 32, 33]])
      for p in result:
          print(p)

      [11 12 13]
      [21 22 23]
      [31 32 33]

  Explanation: What you are seeing is the effect of numpy.matrix requiring each row to have 2 dimensions. For NumPy this ...

- output[ind.astype(bool)] = full
  By converting the integer values of ind to booleans, you can use boolean indexing to select the rows of output that should be filled with the values of full. Example with a 4x4 array:

      M = 4
      K = 4
      ind = np.array([0,1,0,1])
      full = np.random.rand(sum(ind),K)
      output = np.zeros((M,K))
      output[ind.astype(bool)] = full
      print(output)
      [[ 0. 0 ...

- Take a look at scipy.linalg.circulant:

      In [255]: r
      Out[255]: array([1, 2, 3, 4, 5])
      In [256]: circulant(r).T
      Out[256]:
      array([[1, 2, 3, 4, 5],
             [5, 1, 2, 3, 4],
             [4, 5, 1, 2, 3],
             [3, 4, 5, 1, 2],
             [2, 3, 4, 5, 1]])

  or scipy.linalg.toeplitz: In [ ...

- arr is returned as a matrix type, which may not be an iterable that works well with join. You can convert arr to a list with tolist() and then do your join:

      >>> a = arr.tolist() # now you can manipulate the list.
      >>> for i in a: '|'.join(map(str,i))
      '0|1|2|3|4'
      '0|1|2|3|4'
      '0|1|2|3|4'
      '0|1|2|3|4'
      '0|1|2|3|4'

  Or use numpy.as ...

- How to get a subset of rows from a NumPy Matrix based on a condition? [2021-09-28]

      >>> X[(X[:, 0] == 'rainy').ravel(), :]
      matrix([['rainy', 'mild', 'high', 'FALSE'],
              ['rainy', 'cool', 'normal', 'FALSE'],
              ['rainy', 'cool', 'normal', 'TRUE'],
              ['rainy', 'mild', 'normal', 'FALSE'],
              ['rainy', 'mild', 'high', 'T ...

- Iterate over a matrix, sum over some rows and add the result to another array [2021-12-27]
  Make your data, matrix, a numpy.ndarray object instead of a list of lists, and then just do matrix.sum(axis=1).

      >>> matrix = np.asarray([[ 47, 43, 51, 81, 54, 81, 52, 54, 31, 46],
                               [ 35, 21, 30, 16, 37, 11, 35, 30, 39, 37],
                               [ 8, 17, 11, 2, 5, 4, 11, 9, 17, 10],
                               [ 5, ...

- Numpy: Add Rows of Matrix over another Matrix of different dimension [2022-04-19]
  Your A = and B = commands do not produce matrices but lists. The difference matters, because lists do not come with NumPy's nice vector math. In any case, you can extend A by creating a new axis with [:, None], do the addition, and then swap the axes to get the desired shape:

      >>> A = np.array([[1,2,3], [4,5,6], [7,8,9]])
      >>> B = np.array([[1,1,1], [2,2,2]])
      >>> (A[:, None] + B).swapaxes(0,1)
      array([[[ 2, 3, 4], [ 5 ...

- matrix([[]]) -> shape = (1,0) makes perfect sense; you gave it one row, with zero columns; how is numpy supposed to guess that what you really wanted was zero rows? All of the array creation functions (such as np.zeros((0,0))) can be used for your purpose. ...
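The last related entry can be checked directly. The following is a small sketch, with illustrative variable names that do not come from the original posts, comparing the shape produced by np.matrix([[]]) with an explicitly empty array from np.zeros((0, 0)).

```python
import numpy as np

# One row containing zero columns, as described in the entry above.
one_empty_row = np.matrix([[]])
print(one_empty_row.shape)  # (1, 0)

# An array creation function lets you ask for zero rows explicitly.
no_rows = np.zeros((0, 0))
print(no_rows.shape)  # (0, 0)
```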