Wednesday, March 21, 2012

Is this an ok design?

Hey im really confused about this database design ive encountered, and I
wonder if anyone can help me understqand if this is a decent design and what
to do about it. The database contains about 24 billion records. It is split
into a distributed partitioned view on a RAID 5 of 7 different harddrives.
There are no columns that are unique...the closest one is a datetime but
there can be up to 10 records with the same date time...the rest of the
columns can expect to have a few billion dupicates. There is this bizarre
thing...a timestamp column, and the database is clustered on the timestamp
and the view uses a check constraint on that. It is not a unique index. I
understand that timestamp will change whenever any data is modified so I don
t
understand what effect this might have overall. Any thoughts on this?
Message posted via http://www.droptable.comCharlesT via droptable.com wrote:

> Hey im really confused about this database design ive encountered, and I
> wonder if anyone can help me understqand if this is a decent design and wh
at
> to do about it. The database contains about 24 billion records. It is spli
t
> into a distributed partitioned view on a RAID 5 of 7 different harddrives.
> There are no columns that are unique...the closest one is a datetime but
> there can be up to 10 records with the same date time...the rest of the
> columns can expect to have a few billion dupicates. There is this bizarre
> thing...a timestamp column, and the database is clustered on the timestamp
> and the view uses a check constraint on that. It is not a unique index. I
> understand that timestamp will change whenever any data is modified so I d
ont
> understand what effect this might have overall. Any thoughts on this?
> --
> Message posted via http://www.droptable.com
How is the data used? There may be a case for temporarily staging data
like this - if you have an external data feed over which you don't have
direct control for example - but to store data permanently in this form
has to be insanely inefficient. According to your narrative upto 90% of
the data is redundant, which is a heavy price to pay at 24 billion
rows!
Clustering on a TIMESTAMP column makes no sense that I can see. Doing
so will create an index hot-spot and TIMESTAMP is anyway not normally
used in queries without other key columns.
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--|||Hi
Each table in SQL Serverc must have a PRIMERY KEY. Read about it for more
details in the BOL.
What does it give you , having a CI on timestamp column?
"CharlesT via droptable.com" <u18248@.uwe> wrote in message
news:5b6aaa612bed9@.uwe...
> Hey im really confused about this database design ive encountered, and I
> wonder if anyone can help me understqand if this is a decent design and
> what
> to do about it. The database contains about 24 billion records. It is
> split
> into a distributed partitioned view on a RAID 5 of 7 different harddrives.
> There are no columns that are unique...the closest one is a datetime but
> there can be up to 10 records with the same date time...the rest of the
> columns can expect to have a few billion dupicates. There is this bizarre
> thing...a timestamp column, and the database is clustered on the timestamp
> and the view uses a check constraint on that. It is not a unique index. I
> understand that timestamp will change whenever any data is modified so I
> dont
> understand what effect this might have overall. Any thoughts on this?
> --
> Message posted via http://www.droptable.com|||Are you certain that the clustered index/partitioning column is a timestamp
column? I agree that a timestamp *data type* column would make a very poor
choice for the clustered index. However, a clustered index on a
sequentially increasing value (e.g. CreateDate datetime value or identity
column) can greatly improve insert performance for large tables because I/O
is reduced by 'appending' to the end of the table.
For example, I had a single table containing 2-3 billion rows. The PK index
was clustered and included an increasing datetime column as the high-order
column. Insert performance was excellent as were deletes (based on data
age). Queries always specified a datetime range and performance was
directly proportional to number of rows in the range.
I believe you would get similar performance (perhaps even better) with this
type of index design in a DPV, even if the clustered index is not the PK. A
lot depends on the nature of your queries.
Hope this helps.
Dan Guzman
SQL Server MVP
"CharlesT via droptable.com" <u18248@.uwe> wrote in message
news:5b6aaa612bed9@.uwe...
> Hey im really confused about this database design ive encountered, and I
> wonder if anyone can help me understqand if this is a decent design and
> what
> to do about it. The database contains about 24 billion records. It is
> split
> into a distributed partitioned view on a RAID 5 of 7 different harddrives.
> There are no columns that are unique...the closest one is a datetime but
> there can be up to 10 records with the same date time...the rest of the
> columns can expect to have a few billion dupicates. There is this bizarre
> thing...a timestamp column, and the database is clustered on the timestamp
> and the view uses a check constraint on that. It is not a unique index. I
> understand that timestamp will change whenever any data is modified so I
> dont
> understand what effect this might have overall. Any thoughts on this?
> --
> Message posted via http://www.droptable.com|||On Sun, 05 Feb 2006 08:44:54 GMT, CharlesT via droptable.com wrote:

>Hey im really confused about this database design ive encountered, and I
>wonder if anyone can help me understqand if this is a decent design and wha
t
>to do about it. The database contains about 24 billion records. It is split
>into a distributed partitioned view on a RAID 5 of 7 different harddrives.
>There are no columns that are unique...the closest one is a datetime but
>there can be up to 10 records with the same date time...the rest of the
>columns can expect to have a few billion dupicates. There is this bizarre
>thing...a timestamp column, and the database is clustered on the timestamp
>and the view uses a check constraint on that. It is not a unique index. I
>understand that timestamp will change whenever any data is modified so I do
nt
>understand what effect this might have overall. Any thoughts on this?
Hi Charles,
The datatype name "timestamp" is my candidate for the most ill-chosen
name of the last century. :-)
Just to clarify: are you talking about a column that is used to keep a
true timestamp (i.e. the time that something happened), is maybe even
called timestamp but has a datetime datatype? Or are you talking about a
column with a "timestamp" (equivalent to "rowversion") datatype, used to
help in optimistic locking scenario's?
In the first case, it might not be that bad at all - especially if it is
a timestamp that will never change (i.e. the moment a row was entered in
the database). From your message, I'm afraid you'll still have other
issues with your design, but this wouldn't be one of them.
In the latter case, I can only advise you to change this ASAP. Or run
away screaming - your choice <g>
Hugo Kornelis, SQL Server MVP|||yeah this table has no primary key.
its call logs imported from another server.
Sometimes the index just doesnt work and it can take hours to run a normal
query.
There is no natural candidate for a primary key and i am told that there is
no reason to have a surrogate key because of the redundancy of data. I am
also told an index is pointless for the same reason. Queries are typically o
f
the form how many calls with this attribute where made in a certain month. I
t
seems like date could be a candidate for an index then, but its not there. I
m
not sure if the person who designed the database knew that timestamp and
datetime are very different.
Message posted via droptable.com
http://www.droptable.com/Uwe/Forum...server/200602/1|||> yeah this table has no primary key.
Your view is not a true distributed partitioned view if you have no primary
key. In a DPV, you must have a primary key, partition based a column that
is part of the PK and create a check constraint on the partitioning column
that enforces how data are partitioned. This allows SQL Server to eliminate
unneeded tables during query optimization. With a regular UNION ALL view
(non-DPV), all of the underlying tables need to be accessed during queries.
Although you might not have a single column key, I would expect that you
might have a composite key uniquely identifies a row. For example,
CallDateTime, OriginatingNumber and TerminatingNumber can uniquely identify
a call event. If not, then you'll need to introduce a surrogate key to
address dirty data (assuming data scrubbing isn't an option).

> Sometimes the index just doesnt work and it can take hours to run a normal
> query.
I would expect all queries against this view to take hours unless a
timestamp value or narrow range is specified.
You mentioned in your original post that the timestamp column has a check
constraint on it. Can you post the complete DDL for one of the tables?
Hope this helps.
Dan Guzman
SQL Server MVP
"CharlesT via droptable.com" <u18248@.uwe> wrote in message
news:5b792579afb4f@.uwe...
> yeah this table has no primary key.
> its call logs imported from another server.
> Sometimes the index just doesnt work and it can take hours to run a normal
> query.
> There is no natural candidate for a primary key and i am told that there
> is
> no reason to have a surrogate key because of the redundancy of data. I am
> also told an index is pointless for the same reason. Queries are typically
> of
> the form how many calls with this attribute where made in a certain month.
> It
> seems like date could be a candidate for an index then, but its not there.
> Im
> not sure if the person who designed the database knew that timestamp and
> datetime are very different.
> --
> Message posted via droptable.com
> http://www.droptable.com/Uwe/Forum...server/200602/1sql

No comments:

Post a Comment