SQL: Identify distinct blocks of treatment over multiple start and end date ranges for each member

Objective: Identify distinct episodes of continuous treatment for each member in a table. Each record has a member code, a diagnosis, and a service date. An episode is defined as all services, for one member and diagnosis, where the time between each consecutive service is less than some number of days (let's say 90 days for this example). The query needs to compare each row with the previous one, calculate the difference between the dates, and return the first and last date associated with each episode. The goal is to group results by member and episode start/end date.
A very similar question has been asked before, and its answers were somewhat helpful. The problem is that when I customize that code, the returned table excludes the first and last records. I'm not sure how to proceed.
My data currently looks like this: 
MemberCode    Diagnosis    ServiceDate
1001          ABC          2010-02-04
1001          ABC          2010-03-20
1001          ABC          2010-04-18
1001          ABC          2010-05-22
1001          ABC          2010-09-26
1001          ABC          2010-10-11
1001          ABC          2010-10-19
2002          XYZ          2010-07-10
2002          XYZ          2010-07-21
2002          XYZ          2010-11-08
2002          ABC          2010-06-03
2002          ABC          2010-08-13
 
In the above data, the first record for Member 1001 is 2010-02-04, and there is not a difference of more than 90 days between consecutive services until 2010-09-26 (the date at which a new episode starts). So Member 1001 has two distinct episodes: (1) Diagnosis ABC, which goes from 2010-02-04 to 2010-05-22, and (2) Diagnosis ABC, which goes from 2010-09-26 to 2010-10-19.  
Similarly, Member 2002 has three distinct episodes: (1) Diagnosis XYZ, which goes from 2010-07-10 to 2010-07-21, (2) Diagnosis XYZ, which begins and ends on 2010-11-08, and (3) Diagnosis ABC, which goes from 2010-06-03 to 2010-08-13. 
Desired output: 
MemberCode    Diagnosis    EpisodeStartDate    EpisodeEndDate
1001          ABC          2010-02-04          2010-05-22
1001          ABC          2010-09-26          2010-10-19
2002          XYZ          2010-07-10          2010-07-21
2002          XYZ          2010-11-08          2010-11-08
2002          ABC          2010-06-03          2010-08-13
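
To make the episode rule concrete, here is a minimal Python sketch of the grouping logic described above. It is not SQL and not a solution from this thread. It only illustrates the rule: sort by member, diagnosis, and date, and start a new episode whenever the gap to the previous service is 90 days or more. The function name find_episodes and the constant MAX_GAP_DAYS are made up for this illustration. The rows are copied from the sample data.

```python
from datetime import date
from itertools import groupby

MAX_GAP_DAYS = 90  # threshold from the question; a gap below this stays in one episode

# Sample rows copied from the question: (MemberCode, Diagnosis, ServiceDate)
rows = [
    (1001, "ABC", date(2010, 2, 4)),
    (1001, "ABC", date(2010, 3, 20)),
    (1001, "ABC", date(2010, 4, 18)),
    (1001, "ABC", date(2010, 5, 22)),
    (1001, "ABC", date(2010, 9, 26)),
    (1001, "ABC", date(2010, 10, 11)),
    (1001, "ABC", date(2010, 10, 19)),
    (2002, "XYZ", date(2010, 7, 10)),
    (2002, "XYZ", date(2010, 7, 21)),
    (2002, "XYZ", date(2010, 11, 8)),
    (2002, "ABC", date(2010, 6, 3)),
    (2002, "ABC", date(2010, 8, 13)),
]


def find_episodes(rows, max_gap_days=MAX_GAP_DAYS):
    """Return (member, diagnosis, start, end) tuples, one per episode."""
    episodes = []
    # Sorting the tuples orders them by member, then diagnosis, then date.
    ordered = sorted(rows)
    for (member, diagnosis), group in groupby(ordered, key=lambda r: (r[0], r[1])):
        dates = [r[2] for r in group]
        start = prev = dates[0]  # the first record always opens an episode
        for current in dates[1:]:
            if (current - prev).days >= max_gap_days:
                # Gap too large: close the running episode, open a new one.
                episodes.append((member, diagnosis, start, prev))
                start = current
            prev = current
        # Close the last episode explicitly so the final record is not dropped.
        episodes.append((member, diagnosis, start, prev))
    return episodes


for episode in find_episodes(rows):
    print(*episode)
```

The first record of each group opens an episode and the last one is closed after the loop, which are the two places where first or last rows tend to go missing. In SQL the same idea is usually expressed by flagging rows whose gap to the previous row reaches the threshold and taking a running sum of those flags as the episode number, assuming the database supports window functions.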
 
I've been working on this query for too long, and still can't get exactly what I need. Any help would be appreciated. Thanks in advance! 

Updated: 2021-10-16 08:10

Best answer

Your rbindlist(lapply(...)) can be replaced with a non-equi join, available from version 1.9.7 of data.table:
specialty.dt[ provider.dt, on = .(p1 <= prob, p2 > prob)]
 
This joins specialty.dt onto provider.dt directly, matching each row of provider.dt to the rows of specialty.dt where p1 <= prob and prob < p2.
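
For readers who do not use data.table, the matching condition can be sketched in plain Python. This only illustrates the half-open range test p1 <= prob < p2 and does not reproduce data.table's join output. The two lists stand in for specialty.dt and provider.dt. Their contents, the extra column names, and the function name range_join are hypothetical.

```python
# Hypothetical stand-in for specialty.dt: each row covers the range [p1, p2)
specialty = [
    {"specialty": "A", "p1": 0.0, "p2": 0.3},
    {"specialty": "B", "p1": 0.3, "p2": 0.7},
    {"specialty": "C", "p1": 0.7, "p2": 1.0},
]

# Hypothetical stand-in for provider.dt: each row carries a single prob value
provider = [
    {"provider": "x", "prob": 0.10},
    {"provider": "y", "prob": 0.30},
    {"provider": "z", "prob": 0.95},
]


def range_join(specialty, provider):
    """Pair each provider row with every specialty row where p1 <= prob < p2."""
    return [
        {**p, **s}  # merge the two matching rows into one result row
        for p in provider
        for s in specialty
        if s["p1"] <= p["prob"] < s["p2"]
    ]


for row in range_join(specialty, provider):
    print(row["provider"], row["prob"], row["specialty"])
```

The nested loop compares every pair of rows, so it is only meant to show the condition. A real non-equi join avoids that full comparison.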
 
References 
Here's a list of similar questions 
And here's a talk by Arun
