Converting pandas dataframe to numpy array with headers and dtypes

I have been trying to convert a pandas DataFrame into a NumPy array while carrying over the dtypes and header names for ease of reference. I need to do this because processing in pandas is far too slow for my use; NumPy is about ten times faster. I have this code from SO that gives me what I need, except that the result does not look like a standard NumPy array: its shape does not show the number of columns.

[In]:
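import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 3), columns=['Acol', 'Ccol', 'Bcol'])
# One tuple per row (df.values stands in for as_matrix(), removed in pandas 1.0)
arr_ip = [tuple(i) for i in df.values]
# Structured dtype built from the column names and column dtypes
dtyp = np.dtype(list(zip(df.dtypes.index, df.dtypes)))
dfnp = np.array(arr_ip, dtype=dtyp)
print(dfnp.shape)
dfnp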

[Out]: 

(10,) #expecting (10,3)

array([(-1.0645345 ,  0.34590193,  0.15063829),
( 1.5010928 ,  0.63312454,  2.38309797),
(-0.10203999, -0.40589525,  0.63262773),
( 0.92725915,  1.07961763,  0.60425353),
( 0.18905164, -0.90602597, -0.27692396),
(-0.48671514,  0.14182815, -0.64240004),
( 0.05012859, -0.01969079, -0.74910076),
( 0.71681329, -0.38473052, -0.57692395),
( 0.60363249, -0.0169229 , -0.16330232),
( 0.04078263,  0.55943898, -0.05783683)],
dtype=[('Acol', '<f8'), ('Ccol', '<f8'), ('Bcol', '<f8')])

Am I missing something, or is there another way of doing this? I have many DataFrames to convert, and their dtypes and column names vary, so I need an automated approach. It also needs to be efficient because of the large number of DataFrames.
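For reference, the `(10,)` shape in the output above is what NumPy reports for any structured array: each row is a single record, and the column names live in the dtype, not in a second axis. A minimal sketch of a more direct conversion, assuming a pandas version that provides `DataFrame.to_records` and `DataFrame.to_numpy`; the sample frame and variable names here are illustrative only:

```python
import numpy as np
import pandas as pd

# Illustrative sample frame reusing the column names from the question.
df = pd.DataFrame(np.random.randn(10, 3), columns=["Acol", "Ccol", "Bcol"])

# to_records builds a record array whose field names and dtypes
# are taken from the DataFrame columns; index=False drops the index.
rec = df.to_records(index=False)
print(rec.shape)        # 1-D: one record per row
print(rec.dtype.names)  # field names come from the column headers

# A field is addressed by name and yields a plain 1-D column.
acol = rec["Acol"]

# When a 2-D shape matters more than names, to_numpy gives a plain array.
plain = df.to_numpy()
print(plain.shape)
```

The trade-off is that a structured array keeps names and per-column dtypes but stays one-dimensional, while the plain array has the `(rows, columns)` shape but no header names.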


Source: https://stackoverflow.com/questions/49734441
Updated: 2022-09-07 18:09

Best answer

I needed to specify the full path to virtualenv:

C:\virtualenvs>C:\python34\Scripts\virtualenv.exe -p C:\Python34\python.exe 

because I was effectively calling this:

C:\virtualenvs>C:\python27\Scripts\virtualenv.exe -p C:\Python34\python.exe

since C:\python27\Scripts is in my PATH. The collision between Python 2.7 and 3.4 was causing the issue.
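To check which executable a bare command name resolves to through PATH, something like the following may help. It is only a sketch: `resolve_on_path` is a hypothetical helper name, and the result depends entirely on the local PATH.

```python
import shutil
import sys


def resolve_on_path(name):
    """Return the full path a bare command name resolves to via PATH, or None."""
    return shutil.which(name)


# Which virtualenv would run if only "virtualenv" were typed?
print("virtualenv resolves to:", resolve_on_path("virtualenv"))
# The interpreter running this script, for comparison.
print("current interpreter:", sys.executable)
```

If the reported path belongs to a different Python installation than the one intended, calling the executable by its full path, as in the commands above, avoids the ambiguity.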
