sql server - How to find matching pairs with transitivity of equality in MS SQL -
if have table data this:
x1 y1 x2 y2 a001 1 b001 2 a001 1 b002 2 a002 2 a001 1 c001 2 b003 3 c002 1 b003 3
what sql query (microsoft sql server), achieve result this:
groupid x y 1 a001 1 1 b001 2 1 b002 2 1 a002 2 2 c001 2 2 c002 1 2 b003 3
it's grouping equal pairs like: if == b , b == c == c
well, after some trying found following:
declare @mod int; set @mod=1; declare @newgrp int; set @newgrp=1; create table tbl([x1] varchar(4), [y1] int, [x2] varchar(4), [y2] int); insert tbl ([x1], [y1], [x2], [y2]) select 'a001', 1, 'b001', 2 -- modified input create chained equalities: union select 'b001', 2, 'b002', 2 -- --> replaced a001 1 b001 2 union select 'a002', 2, 'b002', 1 union select 'c001', 2, 'b003', 3 union select 'c002', 1, 'b003', 3; select x,y,0 grp tmp ( select x1 x,y1 y tbl union select x2 x, y2 y tbl ) t; -- set first seed: grp=1 on first id ... update top(1) tmp set grp=1; -- iteratively populate tmp table while @newgrp>0 -- each group begin while @mod>0 begin -- in case of chained equalities update t2 set grp=tmp.grp tmp inner join ( select x1,x2 tbl union select x2,x1 tbl ) -- group assignments in both directions! tt on tt.x1 = tmp.x , tmp.grp>0 inner join tmp t2 on t2.x = x2 , t2.grp=0 set @mod=@@rowcount; end -- ok, move on next group , repeat game ... update top(1) tmp set grp=(select max(grp) tmp)+1 grp=0 select @newgrp=@@rowcount, @mod=1; end -- show result select * tmp
result:
x y grp ---- --- --- a001 1 1 a002 2 1 b001 2 1 b002 2 1 b003 3 2 c001 2 2 c002 1 2
the suggested sample script assumes column y not relevant comparing (the sample data has each x-value 1 y value). if necessary of course possible include y columns comparison process.
edit:
et violá (the comparison includes y columns too): and ... here sqlfiddle go (i put in many semicolons @ first - silly me)!
create table tbl([x1] varchar(4), [y1] int, [x2] varchar(4), [y2] int); insert tbl (x1, y1, x2, y2) select 'a001', 1, 'b001', 2 union select 'a001', 1, 'b002', 2 union select 'a002', 2, 'a001', 1 union select 'd001', 3, 'b003', 3 union select 'd003', 1, 'd001', 3 union select 'd001', 1, 'a001', 1 union select 'c001', 2, 'b003', 3 union select 'c002', 1, 'b003', 3 -- start of processing ... select x,y,0 grp tmp ( select x1 x,y1 y tbl union select x2 x, y2 y tbl ) t; declare @mod int set @mod=1 declare @newgrp int set @newgrp=1 update top(1) tmp set grp=1 -- set first grp-label (seed) -- iteratively populate tmp table while @newgrp>0 -- each group begin while @mod>0 -- in case of chained equalities begin update t2 set grp=tmp.grp tmp inner join ( select x1,y1,x2,y2 tbl union select x2,y2,x1,y1 tbl ) -- group assignments in both directions! tt on tt.x1 = tmp.x , tt.y1 = tmp.y , tmp.grp>0 inner join tmp t2 on t2.x = tt.x2 , t2.y = tt.y2 , t2.grp=0 set @mod=@@rowcount -- ok, move on next group , repeat game ... end update top(1) tmp set grp=(select max(grp) tmp)+1 grp=0 select @newgrp=@@rowcount, @mod=1 end -- show result select * tmp -- , drop tmp again drop table tmp
i added sample data show chained equality ('d001' 3
= 'b003' 3
, 'd003' 1
= 'd001' 3
) , have cases differing y-values ('d001',1
, 'd001' 3
). while
loops caused me head-ache since did not pay enough attention contents of @@rowcount @ first ... should work again!
the chained equailty cases (recursion) main problem in query. if had not been, done in single statement, see @roman pekar)
results of (extended) example:
x y grp ------ --- ---- a001 1 1 a002 2 1 b001 2 1 b002 2 1 b003 3 2 c001 2 2 c002 1 2 d001 1 1 d001 3 2 d003 1 2
Comments
Post a Comment