SQL:在每个成员的多个开始和结束日期范围内识别不同的处理块(SQL: Identify distinct blocks of treatment over multiple start and end date ranges for each member)
目标:确定表中每个成员的连续治疗的不同事件。 每个成员都有诊断和服务日期,并且剧集被定义为每个连续服务之间的时间小于某个数字的所有服务(例如,对于该示例,假设为90天)。 查询将需要遍历每一行并计算日期之间的差异,并返回与每集相关的第一个和最后一个日期。 目标是按成员和剧集开始/结束日期对结果进行分组。
之前已经提出了一个非常类似的问题 ,并且有些帮助。 问题是在自定义代码时,返回的表将排除第一个和最后一个记录。 我不知道该怎么办。
我的数据目前看起来像这样:
MemberCode Diagnosis ServiceDate 1001 ----- ABC ----- 2010-02-04 1001 ----- ABC ----- 2010-03-20 1001 ----- ABC ----- 2010-04-18 1001 ----- ABC ----- 2010-05-22 1001 ----- ABC ----- 2010-09-26 1001 ----- ABC ----- 2010-10-11 1001 ----- ABC ----- 2010-10-19 2002 ----- XYZ ----- 2010-07-10 2002 ----- XYZ ----- 2010-07-21 2002 ----- XYZ ----- 2010-11-08 2002 ----- ABC ----- 2010-06-03 2002 ----- ABC ----- 2010-08-13
在上述数据中,会员1001的第一个记录是2010-02-04 ,并且在2010-09-26 (新剧集开始的日期)之前连续服务之间的差异不超过90天。 因此,会员1001有两个不同的剧集:(1)诊断ABC,从2010-02-04到2010-05-22 ,以及(2)诊断ABC,从2010-09-26到2010-10-19 。
同样,2002年会员有三个不同的剧集:(1)诊断XYZ,从2010-07-10到2010-07-21 ,(2)诊断XYZ,在2010-11-08开始和结束,和(3) )诊断ABC,从2010-06-03到2010-08-13 。
期望的输出:
MemberCode Diagnosis EpisodeStartDate EpisodeEndDate 1001 ----- ABC ----- 2010-02-04 ----- 2010-05-22 1001 ----- ABC ----- 2010-09-26 ----- 2010-10-19 2002 ----- XYZ ----- 2010-07-10 ----- 2010-07-21 2002 ----- XYZ ----- 2010-11-08 ----- 2010-11-08 2002 ----- ABC ----- 2010-06-03 ----- 2010-08-13
我一直在研究这个查询太久了,仍然无法得到我需要的东西。 任何帮助,将不胜感激。 提前致谢!
Objective: Identify distinct episodes of continuous treatment for each member in a table. Each member has a diagnosis and a service date, and an episode is defined as all services where the time between each consecutive service is less than some number (let's say 90 days for this example). The query will need to loop through each row and calculate the difference between dates, and return the first and last date associated with each episode. The goal is to group results by member and episode start/end date.
A very similar question has been asked before, and was somewhat helpful. The problem is that in customizing the code, the returned tables are excluding first and last records. I'm not sure how to proceed.
My data currently looks like this:
MemberCode Diagnosis ServiceDate 1001 ----- ABC ----- 2010-02-04 1001 ----- ABC ----- 2010-03-20 1001 ----- ABC ----- 2010-04-18 1001 ----- ABC ----- 2010-05-22 1001 ----- ABC ----- 2010-09-26 1001 ----- ABC ----- 2010-10-11 1001 ----- ABC ----- 2010-10-19 2002 ----- XYZ ----- 2010-07-10 2002 ----- XYZ ----- 2010-07-21 2002 ----- XYZ ----- 2010-11-08 2002 ----- ABC ----- 2010-06-03 2002 ----- ABC ----- 2010-08-13
In the above data, the first record for Member 1001 is 2010-02-04, and there is not a difference of more than 90 days between consecutive services until 2010-09-26 (the date at which a new episode starts). So Member 1001 has two distinct episodes: (1) Diagnosis ABC, which goes from 2010-02-04 to 2010-05-22, and (2) Diagnosis ABC, which goes from 2010-09-26 to 2010-10-19.
Similarly, Member 2002 has three distinct episodes: (1) Diagnosis XYZ, which goes from 2010-07-10 to 2010-07-21, (2) Diagnosis XYZ, which begins and ends on 2010-11-08, and (3) Diagnosis ABC, which goes from 2010-06-03 to 2010-08-13.
Desired output:
MemberCode Diagnosis EpisodeStartDate EpisodeEndDate 1001 ----- ABC ----- 2010-02-04 ----- 2010-05-22 1001 ----- ABC ----- 2010-09-26 ----- 2010-10-19 2002 ----- XYZ ----- 2010-07-10 ----- 2010-07-21 2002 ----- XYZ ----- 2010-11-08 ----- 2010-11-08 2002 ----- ABC ----- 2010-06-03 ----- 2010-08-13
I've been working on this query for too long, and still can't get exactly what I need. Any help would be appreciated. Thanks in advance!
原文:
最满意答案
您的
rbindlist(lapply(...))
可以使用版本1.9.7的data.table替换为非equi连接specialty.dt[ provider.dt, on = .(p1 <= prob, p2 > prob)]
这将
specialty.dt
直接连接到provider.dt
,使用p1 <= prob
和prob < p2
。
参考
Your
rbindlist(lapply(...))
can be replaced with a non-equi join using version 1.9.7 of data.tablespecialty.dt[ provider.dt, on = .(p1 <= prob, p2 > prob)]
This joins
specialty.dt
ontoprovider.dt
directly, using the condition thatp1 <= prob
, andprob < p2
.
References
相关问答
更多-
TCP/IP模型是一个________。[2023-05-19]
a -
下列中不属于面向对象的编程语言的是?[2022-05-30]
a -
在Rails 4.x中(见http://edgeguides.rubyonrails.org/active_record_querying.html#not-conditions ) GroupUser.where.not(user_id: me) 在Rails 3.x GroupUser.where(GroupUser.arel_table[:user_id].not_eq(me)) 要缩短长度,您可以将GroupUser.arel_table存储在变量中,或者如果在模型GroupUser本身内部使用 ...
-
ANSI SQL具有IS DISTINCT FROM但这在SQL Server中未实现。 它可以以不依赖于魔术常量的方式进行模拟 ,如果在列上使用则可以进行模拟 IF EXISTS (SELECT @A EXCEPT SELECT @B) PRINT 'Different' Here are the results of a performance test that checks for inequality a million times such that 90% of the time ...
-
浮点平等和不平等(Floating-point equality and inequality)[2023-06-27]
是的,这将永远为真,没有数字解释的位模式,运算符==将不会返回完全相反的!=。 对于不同的位模式(负零点0x80 ...等于零0x00 ...)进行比较的位模式有异常,或者与您已经提到的相同的位模式比较为不等(NaN),但是!=总是==的反转。 Yes, this will be always true, there are no number interpretations of bit patterns for which the operator == will not return the exac ... -
否定平等与不平等(negated equal vs. not equal)[2022-07-08]
我真的很喜欢 if ( a != b ) 只是因为你应该少阅读,而且你更快地理解程序员想要传递的信息。 记住程序员花更多的时间阅读代码,而不是编写代码,所以你可以做的越多,代码就越容易理解,越好。 I really prefer if ( a != b ) simply because you are supposed to read less and you understand more quickly the message the programmer wanted to transmit. R ... -
如何在哪里“不平等”?(How to get “not equal” in Where?)[2023-04-16]
你需要使用notEqualTo $where->notEqualTo('field2_id', $field2_id); http://framework.zend.com/apidoc/2.1/classes/Zend.Db.Sql.Predicate.Predicate.html#notEqualTo You need to use notEqualTo $where->notEqualTo('field2_id', $field2_id); http://framework.zend.com/ap ... -
input函数返回一个string ,你应该把它转换为int : qe1 = int(input("What does RAM stand for?: ")) input function return a string, you should cast it to int: qe1 = int(input("What does RAM stand for?: "))
-
这是一个包含在lingprog公式中的下限行的包装器。 请注意,需要更多的错误捕获(例如,每个A矩阵的列数必须相等),这并不意味着是一个健壮的实现。 为了正确的错误捕获,我建议您浏览一下linprog 源代码 。 from scipy.optimize import linprog import numpy as np def linprog_lb_wrapper(c, A_ub=None, b_ub=None, A_lb=None, b_lb=None, A_eq=None, b_eq=None, \ ...
-
加速不平等的加入(speeding up a non-equal join)[2022-05-09]
您的rbindlist(lapply(...))可以使用版本1.9.7的data.table替换为非equi连接 specialty.dt[ provider.dt, on = .(p1 <= prob, p2 > prob)] 这将specialty.dt直接连接到provider.dt ,使用p1 <= prob和prob < p2 。 参考 这是一个类似问题的列表 这是阿伦的一次演讲 Your rbindlist(lapply(...)) can be replaced with a non-equ ...