better method for DTS: Is this possible without using cursors?

Hi!

I have 2 tables: Person and Address

Person
(
PersonID int PK
)

Adress
(
AddresID int PK,
PersonID int FK,
Default -- 1 if address is default for person
)

so when I join those table it yelds (for example):

p1 a1 1
p1 a2 0
p1 a3 0
p2 a4 1
p3 a5 0
p3 a6 0

Person may:
- have one default addres and some non-default;
- haven't default address;
- have only default adres

So proper result is:

p1 a1 p1 a1
p2 a4 OR p2 a4
p3 a5 p3 a6

I want to get list of persons and their adresses in following manner:

Get person and
- default adress ID if exists for this person
- any address ID else

Is this possible without using cursors?

Regards,
Walter

Walter:

This can be done with an outer join. The main thing to verify is what you want to do when you have NULL returned for OUTER TABLE results:

select p.personID,
a.addressID,
a.[default]
from Person p
left join address a
on p.personID = a.personID
and a.[default] = 1
-- Sample Output:
-- personID addressID default
-- -- -- -
-- 1 1 1
-- 2 4 1
-- 3 NULL NULL
Dave

|||

Hmmm...

I that statement, you have persons with defaults addresses, or info that there's no default address. But there is no addressID when it is non-default. We don't want NULLs, but any non-default addressID...

Walte

|||

Sorry about that. Is this closer to what you have in mind?

select personID,
addressID
from ( select p.personID,
a.addressID,
a.[default],
row_number () over
( partition by p.personID
order by a.[default] desc, a.addressID
) as seq
from Person p
inner join address a
on p.personID = a.personID
) a
where seq = 1
-- Sample Output:
-- personID addressID
-- -- --
-- 1 1
-- 2 4
-- 3 5

|||

Please mention the version of SQL Server so it is easy to suggest the correct solution that will work in that version or all versions. You can do below in SQL Server 2005:

select ...

from Person as p

cross apply (

select top 1 *

from Address as a

where a.PersonID = p.PersonID

order by a.Default desc

) as pa

And add an index on (Address.PersonID, Address.Default DESC) to get the best performance. If you need solution for older version of SQL Server then please reply back.

|||

I simulated 32767 rows of person data with about 75000 rows of address data with:

drop table dbo.Person
go
create table dbo.Person
(
PersonID int primary key
)
go
drop table dbo.address
go
create table dbo.Address
(
AddressID int primary key,
PersonID int,
[default] tinyint
)
go
insert into person
select iter
from small_iterator
insert into address
select personID,
personID,
0
from person
where personID % 11 > 1
insert into address
select 32768 + personID,
personID,
0
from person
where personID % 11 < 2
insert into address
select 2*32768 + personID,
personID,
0
from person
where personID %17 not in (3, 5, 13)
insert into address
select 3*32768 + personID,
personID,
0
from person
where personID %23 not in (7, 10, 13, 19)

First, I ran this query with these results:

declare @.begDt datetime
set @.begDt = getdate()
select personID,
addressID
from ( select p.personID,
a.addressID,
a.[default],
row_number () over
( partition by p.personID
order by a.[default] desc, a.addressID
) as seq
from Person p
inner join address a
on p.personID = a.personID
) a
where seq = 1
print ' '
select datediff (ms, @.begDt, getdate()) as [Elapsed Time]
-- Sample Output:
-- personID addressID
-- -- --
-- 1 32769
-- 2 2
-- ...
-- 32766 32766
-- 32767 32767

-- |--Filter(WHERE:([Expr1004]=(1)))
-- |--Sequence Project(DEFINE:([Expr1004]=row_number))
-- |--Compute Scalar(DEFINE:([Expr1006]=(1)))
-- |--Segment
-- |--Sort(ORDER BY:([p].[PersonID] ASC, .[default] DESC, .[AddressID] ASC))
-- |--Hash Match(Inner Join, HASH:([p].[PersonID])=(.[PersonID]), RESIDUAL:([Mugambo].[dbo].[Address].[PersonID] as .[PersonID]=[Mugambo].[dbo].[Person].[PersonID] as [p].[PersonID]))
-- |--Clustered Index Scan(OBJECT:([Mugambo].[dbo].[Person].[PK__Person__7D63964E] AS [p]))
-- |--Clustered Index Scan(OBJECT:([Mugambo].[dbo].[Address].[PK__Address__7F4BDEC0] AS ))
-- Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- Table 'Address'. Scan count 1, logical reads 196, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- Table 'Person'. Scan count 1, logical reads 55, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

-- Elapsed Time
--
-- 656

And then ran this query with these results:

declare @.begDt datetime
select @.begDt = getdate()
select p.personID,
pa.addressID
from person p
cross apply
( select top 1 *
from address as a
where a.personID = p.personID
order by a.[default] desc
) pa
print ' '
select datediff (ms, @.begDt, getdate()) as [Elapsed Time]
-- |--Nested Loops(Inner Join, OUTER REFERENCES:([p].[PersonID]))
-- |--Clustered Index Scan(OBJECT:([Mugambo].[dbo].[Person].[PK__Person__7D63964E] AS [p]))
-- |--Sort(TOP 1, ORDER BY:(.[default] DESC))
-- |--Index Spool(SEEK:(.[PersonID]=[Mugambo].[dbo].[Person].[PersonID] as [p].[PersonID]))
-- |--Clustered Index Scan(OBJECT:([Mugambo].[dbo].[Address].[PK__Address__7F4BDEC0] AS ))
-- Table 'Worktable'. Scan count 32767, logical reads 242508, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- Table 'Address'. Scan count 1, logical reads 196, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- Table 'Person'. Scan count 1, logical reads 55, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- Elapsed Time
--
-- 2563

Dave

|||Please add an index on (Address.PersonID, Address.Default DESC) and you will get the best performance with the APPLY approach. Also, the results from both queries will not be identical because of the additional ordering on AddressID in your query.|||

Hi,

thanks for yours posts :) Unfortunately I use SQL SERVER 2000....

Regards,
Walter

|||

The following query might help you

Select Person.PersonID,Isnull(Adress.AddresID,Adress2.AddresID) from Person
left Outer Join Adress on Person.PersonID = Adress.PersonID And Adress.isDefault = 1
left Outer Join Adress Adress2 On Person.PersonID = Adress2.PersonID And Adress2.isDefault = 0
and Adress2.AddresID = (Select Top 1 AddresID From Adress Sub Where Adress2.PersonID = Sub.PersonID Order By newId())

|||

Select Person.PersonID,Isnull(Adress.AddresID,Adress2.AddresID) from Person
left Outer Join Adress on Person.PersonID = Adress.PersonID And Adress.isDefault = 1
left Outer Join Adress Adress2 On Person.PersonID = Adress2.PersonID And Adress2.isDefault = 0
and Adress2.AddresID = (Select Top 1 AddresID From Adress Sub Where Adress2.PersonID = Sub.PersonID Order By newId())

This query has me confused; I am not sure that I am getting the correct return data for my large test case.

|||

Here if there is a default value then it will pull that value..

If there is no default value instead of fetching Min/Max address id it will pull the random address from the DB.

May be your test case fail if you try to get Max/Min address id as expected.

Change your test case as "IN Non Default Address List instead of Particular Address"

|||Well, I am getting nulls for some address IDs and some rows that have duplicate personIDs. And I agree with you in that this might be a data problem. (I gotta get something to eat.)|||

Mani:

Ran the following three queries and got the results that follow:

select count(*) as [Person Records] from person
select count(distinct personID) as [Address Records] from address
select count(distinct a.personID) [Matched Person Records] from person a inner join address b on a.personID = b.personID

-- Person Records
-- --
-- 32767

-- Address Records
--
-- 32767

-- Matched Person Records
-- -
-- 32767

Therefore, I do not think that the problem is a data problem. I re-examined your select and I question this line:

and Adress2.AddresID = (Select Top 1 AddresID From Adress Sub Where Adress2.PersonID = Sub.PersonID Order By newId())

specifically, the "... Where Address2.PersonID ..." portion. I think this is the source of the NULL and duplicate PersonID records. When I change this line to:

and Adress2.AddresID = (Select Top 1 AddresID From Adress Sub Where Adress2.PersonID = Sub.PersonID Order By newId())

I get what I perceive to be "correct" results. Please verify whether or not you aggree.

Also, I completely agree with Umachandar's assessment that we need an additional index. I am therefore going to add a cover index based on (1) personID, (2) default and (3) addressID.

Running one more test.
Dave

|||

Mani:

Ran the following three queries and got the results that follow:

-- Person Records
-- --
-- 32767

-- Address Records
--
-- 32767

-- Matched Person Records
-- -
-- 32767

Therefore, I do not think that the problem is a data problem. I re-examined your select and I question this line:

and Adress2.AddresID = (Select Top 1 AddresID From Adress Sub Where Adress2.PersonID = Sub.PersonID Order By newId())

specifically, the "... Where Address2.PersonID ..." portion. I think this is the source of the NULL and duplicate PersonID records. When I change this line to:

and Adress2.AddresID = (Select Top 1 AddresID From Adress Sub Where Adress2.PersonID = Sub.PersonID Order By newId())

I get what I perceive to be "correct" results. Please verify whether or not you aggree.

Also, I completely agree with Umachandar's assessment that we need an additional index. I am therefore going to add a cover index based on (1) personID, (2) default and (3) addressID.

Running one more test.
Dave

|||

I ran the modified version of Mani's query against my mock tables and got the following results:

-- |--Compute Scalar(DEFINE:([Expr1010]=isnull([Address].[AddressID], [Address].[AddressID])))
-- |--Nested Loops(Left Outer Join, OUTER REFERENCES:([Person].[PersonID]))
-- |--Merge Join(Right Outer Join, MANY-TO-MANY MERGE:([Address].[PersonID])=([Person].[PersonID]), RESIDUAL:([Address].[PersonID]=[Person].[PersonID]))
-- | |--Sort(ORDER BY:([Address].[PersonID] ASC))
-- | | |--Clustered Index Scan(OBJECT:([tempdb].[dbo].[Address].[PK__Address__701695AD]), WHERE:([Address].[default]=1))
-- | |--Clustered Index Scan(OBJECT:([tempdb].[dbo].[Person].[PK__Person__6E2E4D3B]), ORDERED FORWARD)
-- |--Hash Match(Cache, HASH:([Person].[PersonID]), RESIDUAL:([Person].[PersonID]=[Person].[PersonID]))
-- |--Nested Loops(Inner Join, OUTER REFERENCES:([Address].[AddressID]))
-- |--Sort(TOP 1, ORDER BY:([Expr1009] ASC))
-- | |--Compute Scalar(DEFINE:([Address].[AddressID]=[Address].[AddressID], [Expr1009]=newid()))
-- | |--Index Spool(SEEK:([Address].[PersonID]=[Person].[PersonID]))
-- | |--Clustered Index Scan(OBJECT:([tempdb].[dbo].[Address].[PK__Address__701695AD]))
-- |--Clustered Index Seek(OBJECT:([tempdb].[dbo].[Address].[PK__Address__701695AD]), SEEK:([Address].[AddressID]=[Address].[AddressID]), WHERE:([Address].[PersonID]=[Person].[PersonID] AND [Address].[default]=0) ORDERED FORWARD)
-- Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0.
-- Table 'Person'. Scan count 1, logical reads 54, physical reads 0, read-ahead reads 0.
-- Table 'Address'. Scan count 3, logical reads 585, physical reads 0, read-ahead reads 0.
-- Table 'Worktable'. Scan count 85514, logical reads 347474, physical reads 0, read-ahead reads 0.
-- Elapsed Time
--
-- 2186

I next added this index:

create index address_personID_ndx
on address (personID, [default] desc, addressID)

And then I re-tested the modified version of Mani's query and received these improved results:

-- |--Compute Scalar(DEFINE:([Expr1010]=isnull([Address].[AddressID], [Address].[AddressID])))
-- |--Nested Loops(Left Outer Join, OUTER REFERENCES:([Person].[PersonID]))
-- |--Merge Join(Right Outer Join, MANY-TO-MANY MERGE:([Address].[PersonID])=([Person].[PersonID]), RESIDUAL:([Address].[PersonID]=[Person].[PersonID]))
-- | |--Index Scan(OBJECT:([tempdb].[dbo].[Address].[address_personID_ndx]), WHERE:([Address].[default]=1) ORDERED FORWARD)
-- | |--Clustered Index Scan(OBJECT:([tempdb].[dbo].[Person].[PK__Person__6E2E4D3B]), ORDERED FORWARD)
-- |--Hash Match(Cache, HASH:([Person].[PersonID]), RESIDUAL:([Person].[PersonID]=[Person].[PersonID]))
-- |--Nested Loops(Inner Join, OUTER REFERENCES:([Address].[AddressID]))
-- |--Sort(TOP 1, ORDER BY:([Expr1009] ASC))
-- | |--Compute Scalar(DEFINE:([Address].[AddressID]=[Address].[AddressID], [Expr1009]=newid()))
-- | |--Index Seek(OBJECT:([tempdb].[dbo].[Address].[address_personID_ndx]), SEEK:([Address].[PersonID]=[Person].[PersonID]) ORDERED FORWARD)
-- |--Index Seek(OBJECT:([tempdb].[dbo].[Address].[address_personID_ndx]), SEEK:([Address].[PersonID]=[Person].[PersonID] AND [Address].[default]=0 AND [Address].[AddressID]=[Address].[AddressID]) ORDERED FORWARD)
-- Table 'Address'. Scan count 65535, logical reads 131671, physical reads 0, read-ahead reads 0.
-- Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0.
-- Table 'Person'. Scan count 1, logical reads 54, physical reads 0, read-ahead reads 0.
-- Elapsed Time
--
-- 1953

I then tested a different query and obtained the results that follow:

select p.personID,
( select top 1 addressID from address b
where p.personID = b.personID
order by [default] desc
) as addressID
from Person p
-- |--Compute Scalar(DEFINE:(.[AddressID]=.[AddressID]))
-- |--Nested Loops(Left Outer Join, OUTER REFERENCES:([p].[PersonID]))
-- |--Clustered Index Scan(OBJECT:([tempdb].[dbo].[Person].[PK__Person__6E2E4D3B] AS [p]))
-- |--Compute Scalar(DEFINE:(.[AddressID]=.[AddressID]))
-- |--Top(1)
-- |--Index Seek(OBJECT:([tempdb].[dbo].[Address].[address_personID_ndx] AS ), SEEK:(.[PersonID]=[p].[PersonID]) ORDERED FORWARD)
-- Table 'Address'. Scan count 32767, logical reads 65593, physical reads 0, read-ahead reads 0.
-- Table 'Person'. Scan count 1, logical reads 54, physical reads 0, read-ahead reads 0.
-- Elapsed Time
--
-- 1890

There is not a lot of difference with respect to execution time between this query and the modified version of Mani's query. This version has one less join and therefore half the scans and half the logical reads. Also, if you neglect to add the index this query is liable to run slow because of the NESTED LOOP join and associated TABLE SCANS instead of INDEX SEEKS.

Either way, BE SURE TO ADD THE INDEX RECOMMENDED BY Umachandar!

Dave

sql

Monday, March 26, 2012

Is this possible without using cursors?

No comments:

Post a Comment

better method for DTS

Blog Archive

About Me