How to join documents after a pipeline stage in mongo aggregation framework
Let's say that after the first stage of aggregation I have grouped all the documents by center, so I have something like this:
{ center:"A", gender:"Male", count:50 } { center:"A", gender:"Female", count:20 }
I want to join these two documents so that the final document looks something like this:
{ center:"A", Male:50, Female:20 }
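One common way to get this shape is a second `$group` keyed on `center` that sums `count` conditionally per gender. The sketch below writes those stages as Python dictionaries, as they could be passed to pymongo's `collection.aggregate()`. It assumes the field names from the question and that the gender values are exactly "Male" and "Female"; the names `pivot_stages` and `pivot_in_python` are illustrative only. The stages have not been run against a server here, so the small in-memory function mirrors the same logic to show the expected result.

```python
from collections import defaultdict

# Stages to append after the existing grouping stage (assumed field names:
# center, gender, count). Each gender gets its own conditional sum.
pivot_stages = [
    {"$group": {
        "_id": "$center",
        "Male": {"$sum": {"$cond": [{"$eq": ["$gender", "Male"]}, "$count", 0]}},
        "Female": {"$sum": {"$cond": [{"$eq": ["$gender", "Female"]}, "$count", 0]}},
    }},
    # Rename _id back to center and keep the two totals.
    {"$project": {"_id": 0, "center": "$_id", "Male": 1, "Female": 1}},
]


def pivot_in_python(docs):
    """In-memory mirror of pivot_stages, for illustration without a server."""
    totals = defaultdict(lambda: {"Male": 0, "Female": 0})
    for doc in docs:
        counts = totals[doc["center"]]
        if doc["gender"] in counts:
            counts[doc["gender"]] += doc["count"]
    return [dict(center=center, **counts) for center, counts in totals.items()]


stage_one_output = [
    {"center": "A", "gender": "Male", "count": 50},
    {"center": "A", "gender": "Female", "count": 20},
]
print(pivot_in_python(stage_one_output))
```

Hard-coding the two gender keys keeps the stages compatible with older servers; if the set of values is not known in advance, a different approach would be needed.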
Original: https://stackoverflow.com/questions/34627212
Best answer
You can use csv.reader with skipinitialspace=True to skip the spaces, then zip the rows to get the columns. We use itertools.izip_longest because a value in the last column is missing. Convert each column to a set and take the intersection with set.intersection (Python 2):

    from itertools import izip_longest
    import csv

    with open('test') as f:
        reader = csv.reader(f, delimiter=' ', skipinitialspace=True)
        # Transpose rows into columns; short rows are padded with None
        cols = map(set, izip_longest(*reader))
        print set.intersection(*cols)

Watch out: your file is not really a CSV, and if a value is missing in any column other than the last one, the input will be misinterpreted. Consider at least using a delimiter other than a space.

Example

Using StringIO to parse a string and show that it works for the test case:

    from itertools import izip_longest
    import csv
    import StringIO

    data = '''table1 table2 table3 table4 table5
    paper paper pen book book
    pen pencil pencil charger apple
    apple pen charger beatroot sandle
    beatroot mobile apple pen paper
    sandle book paper paper'''

    f = StringIO.StringIO(data)
    reader = csv.reader(f, delimiter=' ', skipinitialspace=True)
    cols = map(set, izip_longest(*reader))
    print set.intersection(*cols)

Output

    set(['paper'])
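The code in this answer targets Python 2 (`izip_longest`, the `print` statement, the `StringIO` module). A minimal Python 3 sketch of the same idea follows, assuming the same space-delimited layout; the helper name `common_values` is made up for illustration.

```python
import csv
import io
from itertools import zip_longest


def common_values(f):
    """Return the values that appear in every column of a space-delimited file."""
    reader = csv.reader(f, delimiter=' ', skipinitialspace=True)
    # zip_longest transposes rows into columns, padding short rows with None
    columns = [set(col) for col in zip_longest(*reader)]
    return set.intersection(*columns)


data = """table1 table2 table3 table4 table5
paper paper pen book book
pen pencil pencil charger apple
apple pen charger beatroot sandle
beatroot mobile apple pen paper
sandle book paper paper"""

common = common_values(io.StringIO(data))
print(common)  # {'paper'}
```

As in the original, the header names and the `None` padding take part in the comparison, which is harmless here because neither can appear in every column.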