MySQL: Get Counts and Averages [duplicate]
select COUNT(pd.property_id) AS `Beginning Total File Count`,
       COUNT(pd.recv_dt) AS `average days in inventory`,
       AVG(pd.status = 'P') AS `average days in pre-marketing`,
       AVG(pd.status NOT IN ('I', 'C')) AS `average days on market`,
       AVG(pd.status = 'U') AS `average days under contract`,
       SUM(pd.status = 'O') AS `Total Files Occupied Status`,
       SUM(pd.status = 'O') / COUNT(pd.property_id) AS `percentage of Occupied / total file count`
from resnet.property_Details pd
I'm trying to get
- Beginning total file count
- Average days in inventory
- Average days in Pre-Marketing
- Average days on market
- Average days under contract
- Total files in occupied status
- Percentage of Occupied / total file count
Not sure if my query is written properly, please help :)
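One thing worth checking before anything else: in MySQL, `AVG(pd.status = 'P')` averages a 0/1 boolean, so it yields the *proportion* of rows in that status, not a number of days; an "average days" metric needs a date difference inside the aggregate. Below is a minimal, hedged sketch of that distinction using an in-memory SQLite stand-in (the table and column names mirror the question, but the sample rows and the `list_dt` column are made up for illustration; `julianday()` is SQLite's substitute for MySQL's `DATEDIFF()`):

```python
# Sketch: conditional aggregation vs. date arithmetic, on invented sample data.
# SQLite stands in for MySQL here; in MySQL you would write
# AVG(DATEDIFF(list_dt, recv_dt)) instead of the julianday() difference.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE property_Details (
    property_id INTEGER, status TEXT, recv_dt TEXT, list_dt TEXT)""")
con.executemany(
    "INSERT INTO property_Details VALUES (?,?,?,?)",
    [(1, 'P', '2017-01-01', '2017-01-11'),   # 10 days
     (2, 'O', '2017-01-05', '2017-01-15'),   # 10 days
     (3, 'U', '2017-02-01', '2017-02-21'),   # 20 days
     (4, 'O', '2017-03-01', '2017-03-06')])  #  5 days

row = con.execute("""
    SELECT COUNT(property_id)  AS total_files,
           SUM(status = 'O')   AS occupied_files,
           AVG(status = 'O')   AS occupied_share,          -- a proportion in 0..1
           AVG(julianday(list_dt) - julianday(recv_dt))
                               AS avg_days_in_inventory    -- an actual day count
    FROM property_Details""").fetchone()
print(row)  # → (4, 2, 0.5, 11.25)
```

So `SUM(status = 'O') / COUNT(property_id)` in the original query is fine for the occupancy percentage, but the "average days" columns need a `DATEDIFF`-style expression rather than `AVG` over a status test.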
Source: https://stackoverflow.com/questions/43689973
Accepted answer
I think you need between with boolean indexing to filter first, then groupby with the size aggregation. The outputs are concatenated, and reindex adds the missing rows filled by 0:

print (df)
         Date ID
0  01/01/2016  a
1  05/01/2016  a
2  10/05/2017  a
3  05/05/2018  b
4  07/09/2014  b
5  07/09/2014  c
6  12/08/2018  b

#convert to datetime (if the first number is the day, add parameter dayfirst)
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

now = pd.Timestamp.today()
print (now)

oneyearbeforenow = now - pd.offsets.DateOffset(years=1)
oneyearafternow = now + pd.offsets.DateOffset(years=1)

#first filter
a = df[df['Date'].between(oneyearbeforenow, now)].groupby('ID').size()
b = df[df['Date'].between(now, oneyearafternow)].groupby('ID').size()
print (a)
ID
a    1
dtype: int64

print (b)
ID
b    2
dtype: int64

df1 = pd.concat([a, b], axis=1).fillna(0).astype(int).reindex(df['ID'].unique(), fill_value=0)
print (df1)
   0  1
a  1  0
b  0  2
c  0  0
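Because the answer's code pivots on today's date, its printed counts change from day to day. A deterministic sketch of the same between → groupby → size → concat → reindex chain, with a fixed reference date standing in for today() (the date 2017-05-15 is an assumption chosen for reproducibility), looks like this:

```python
# Deterministic version of the answer's approach: count per-ID rows in the
# year before and the year after a fixed reference date, then align with
# concat/reindex so every ID appears even when a window is empty.
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['01/01/2016', '05/01/2016', '10/05/2017',
                            '05/05/2018', '07/09/2014', '07/09/2014',
                            '12/08/2018'], dayfirst=True),
    'ID': ['a', 'a', 'a', 'b', 'b', 'c', 'b']})

now = pd.Timestamp('2017-05-15')          # assumed reference date, not today()
one_year = pd.offsets.DateOffset(years=1)

last_year = df[df['Date'].between(now - one_year, now)].groupby('ID').size()
next_year = df[df['Date'].between(now, now + one_year)].groupby('ID').size()

res = (pd.concat([last_year, next_year], axis=1)
         .fillna(0).astype(int)
         .reindex(df['ID'].unique(), fill_value=0))
print(res)
#    0  1
# a  1  0
# b  0  1
# c  0  0
```

The reindex step is what guarantees a row for 'c' even though it matched neither window.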
EDIT:

If you need to compare each date against the group's last date (x.iat[-1]) plus or minus a year offset, you need a custom function with the conditions and a sum of the True values per group:

offs = pd.offsets.DateOffset(years=1)
f = lambda x: pd.Series([(x > x.iat[-1] - offs).sum(), \
                         (x < x.iat[-1] + offs).sum()],
                        index=['last', 'next'])
df = df.groupby('ID')['Date'].apply(f).unstack(fill_value=0).reset_index()
print (df)
  ID  last  next
0  a     1     3
1  b     3     2
2  c     1     1
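The per-group variant above can be sketched end to end on a small made-up dataset (two IDs, deterministic dates, so the counts can be checked by hand). For each ID it counts how many of that group's dates fall within one year before, and within one year after, the group's last recorded date:

```python
# Per-group window counting: x.iat[-1] is the last date within each group
# (groupby preserves the original row order inside a group). Sample data is
# invented so the result is reproducible.
import pandas as pd

df = pd.DataFrame({
    'ID':   ['a', 'a', 'a', 'b', 'b'],
    'Date': pd.to_datetime(['2016-01-01', '2016-01-05', '2017-05-10',
                            '2014-09-07', '2018-08-12'])})

offs = pd.offsets.DateOffset(years=1)

def window_counts(x):
    # count dates after (last date - 1 year) and before (last date + 1 year)
    return pd.Series([(x > x.iat[-1] - offs).sum(),
                      (x < x.iat[-1] + offs).sum()],
                     index=['last', 'next'])

res = (df.groupby('ID')['Date'].apply(window_counts)
         .unstack(fill_value=0)
         .reset_index())
print(res)
#   ID  last  next
# 0  a     1     3
# 1  b     1     2
```

For group 'a' the last date is 2017-05-10, so only that date itself falls in the trailing year (last = 1) while all three dates precede 2018-05-10 (next = 3), which is what makes the strict >/< comparisons easy to verify by hand.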