oracle - Finding duplicate values in table (and getting their pk's) -


i've problem selecting values i've simplefied samples below. basically, have table this:

create table sample_table (   pk_id        number,   business_id  number ) 

now business_id's in table duplicates, , need know pk's of records.

let's assume (further) build , fill table this:

alter table sample_table add (   constraint sample_table_pk  primary key  (pk_id));   create sequence sample_sequence;   create trigger sample_trigger before insert on sample_table each row   begin     :new.pk_id := sample_sequence.nextval;   end;    insert sample_table (business_id) values (1000);  insert sample_table (business_id) values (1001);  insert sample_table (business_id) values (1002);  insert sample_table (business_id) values (1003);  insert sample_table (business_id) values (1003);  insert sample_table (business_id) values (1004); 

now figuring out business_id's duplicate easy:

  select   business_id, count (business_id)       sample_table group   business_id   having   count (business_id) > 1; 

but not want business_id's, want pk_id's.

i can them using above query subquery:

select * sample_table business_id in (   select   business_id       sample_table group   business_id   having   count (business_id) > 1); 

or using count ( * ) on partition subquery factoring

with q  (select   business_id, count ( * ) on (partition business_id) totalcount     sample_table) select * q q.totalcount > 1 

but both of them make query pretty slow (the 1 sample work ok, when work production data of 500.000 rows, performance not great) wondering if there's nicer way this.

as stands table , pk index, first query:

select * sample_table business_id in (   select   business_id       sample_table group   business_id   having   count (business_id) > 1); 

will need full table scan evaluate subquery , main query need full scan given list of business_ids found (the pk index won't of use this.) you'll see plan this:

-----------------------------------------------... | id  | operation             | name         | ... -----------------------------------------------... |   0 | select statement      |              | ... |*  1 |  hash join right semi |              | ... |   2 |   view                | vw_nso_1     | ... |*  3 |    filter             |              | ... |   4 |     hash group     |              | ... |   5 |      table access full| sample_table | ... |   6 |   table access full   | sample_table | ... -----------------------------------------------...  predicate information (identified operation id): ---------------------------------------------------     1 - access("business_id"="business_id")    3 - filter(count(*)>1) 

throw unique index on business_id , pk_id (in order) , should able forgo 2nd table scan , use index duplicate business_ids. (the first table scan unavoidable has check of rows possible duplication.) composite index, oracle can business_id , grab pk_id @ same time without having jump on table.

-------------------------------------------------... | id  | operation             | name            |... -------------------------------------------------... |   0 | select statement      |                 |... |   1 |  nested loops         |                 |... |   2 |   view                | vw_nso_1        |... |*  3 |    filter             |                 |... |   4 |     hash group     |                 |... |   5 |      table access full| sample_table    |... |*  6 |   index range scan    | business_id_idx |... -------------------------------------------------...  predicate information (identified operation id): ---------------------------------------------------     3 - filter(count(*)>1)                              6 - access("business_id"="business_id") 

this should work pretty if duplicates exception. if, worst case scenario, business_ids duplicates, index lookups ugly.

you try bit funkier this:

select business_id, listagg(pk_id) within group (order pk_id) sample_table group business_id having count(*) > 1 

now 1 full table scan, pk_ids glued on same line.


Comments

Popular posts from this blog

matlab - Deleting rows with specific rules -

php - MySQLi multi_query results for later use -