SQL Server
Author
Discussion

tim_s

Original Poster:

299 posts

278 months

Monday 27th September 2004
I'm working on rather a large project at the moment which uses a SQL Server DB that will soon hold tens of millions of records. I've developed some reporting screens, but these will become very slow as the amount of data being queried increases. Ideally I would like to have 4+ servers to speed things up, but I'm not sure how I could load balance these / make them appear as a single entity. Can anyone offer any advice? I thought clustering would do this, but after some research I've found that it does not.

>>> Edited by tim_s on Monday 27th September 16:49

ATG

23,119 posts

296 months

Monday 27th September 2004
I'm pretty sure a query on SQL Server runs on a single processor. So if you want to get some parallelism going on in what is logically a single query, you'll have to break it up into several queries. If you've got one or two whopping great tables you could try splitting them and putting them on different boxes. Ugly though. Have you already exhausted the performance tweaking stuff, like indexing the tables, having tonnes of RAM, fast discs, etc? Obviously depends on the application, but I've used SQL Server to manage tables with tens of millions of rows and got reasonable performance on a dual proc box with fast SCSI drives ... only a handful of simultaneous users, mind you.

>> Edited by ATG on Monday 27th September 16:56

Liszt

4,334 posts

294 months

Monday 27th September 2004
Put different tables on different machines.

However, for reporting on millions of rows of raw data, I'd look to do some aggregating and store that somewhere to improve response times.
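A sketch of that pre-aggregation idea; the table and column names (dbo.RawRecords, dbo.DailySummary, CreatedAt, CustomerId) are hypothetical:

```sql
-- Hypothetical nightly job: rebuild a small summary table that the reports
-- query, instead of scanning tens of millions of raw rows per page view.
CREATE TABLE dbo.DailySummary (
    SummaryDate char(10) NOT NULL,
    CustomerId  int      NOT NULL,
    RecordCount int      NOT NULL,
    CONSTRAINT PK_DailySummary PRIMARY KEY CLUSTERED (SummaryDate, CustomerId)
);

INSERT INTO dbo.DailySummary (SummaryDate, CustomerId, RecordCount)
SELECT CONVERT(varchar(10), CreatedAt, 120),  -- 'yyyy-mm-dd' day bucket
       CustomerId,
       COUNT(*)
FROM dbo.RawRecords
GROUP BY CONVERT(varchar(10), CreatedAt, 120), CustomerId;
```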

Plotloss

67,280 posts

294 months

Monday 27th September 2004
I take it that Indexes are being perpetuated rather than the usual SQL create/destroy?

What sort of test is selecting the records?

In the long term the best solution is simply to throw more iron at one instance of SQL Server, but the prices of the SQL licences go up spectacularly for multi-proc boxes. It's still cheaper than tweak, tweak, tweak ad infinitum though.

tim_s

Original Poster:

299 posts

278 months

Monday 27th September 2004
Thanks for the replies.

I had thought of an 'ugly' solution, i.e. having 4 separate DBs, doing 4 separate queries and then merging the datatables in memory and displaying that on the page, but that means the webserver will need quite a bit of RAM.

By far the most important thing is the speed of the insert statements, so I can't really optimise the table for selects (i.e. indexing).

Plotloss

67,280 posts

294 months

Monday 27th September 2004
Surely inserting in a file with 10 million records is exactly the same as inserting into a file with 4?

It's the maintenance of the index that's taking the time.

With SQL, is there a static/dynamic index maintenance option?

I don't know much about SQL Server, you see...

tim_s

Original Poster:

299 posts

278 months

Monday 27th September 2004
Plotloss said:
I take it that Indexes are being perpetuated rather than the usual SQL create/destroy?

What sort of test is selecting the records?


Not too sure what you mean here? I'm a .NET developer, but I'll have to manage this new DB.

>> Edited by tim_s on Monday 27th September 17:05

tim_s

Original Poster:

299 posts

278 months

Monday 27th September 2004
Plotloss said:
Surely inserting in a file with 10 million records is exactly the same as inserting into a file with 4?


My understanding was that if a table was optimised for select statements by using indexes, it would slow insert statements down because the index would need to be recreated / added to.

Plotloss

67,280 posts

294 months

Monday 27th September 2004
When you execute a query, if there is no index, SQL creates one for the keyset you are querying and then destroys it at the end of the execute.

You can, somehow (I'm not sure how), perpetuate the index so the query uses that index rather than the SQL-created one (it may even do this for you automatically if one exists; it does on the AS/400).

Plotloss

67,280 posts

294 months

Monday 27th September 2004
tim_s said:

Plotloss said:
Surely inserting in a file with 10 million records is exactly the same as inserting into a file with 4?



My understanding was that if a table was optimised for select statements by using indexes, it would slow insert statements down because the index would need to be recreated / added to.


True, that's the maintenance, but surely there is a way to optimise that maintenance; again, it's there on the AS/400...

ATG

23,119 posts

296 months

Monday 27th September 2004
If you've got no indexing on tables with millions of rows then your selects are going to scan the entire table and they will be slow. Have you tested the impact of a small amount of indexing on your app?

By the way, what kind of data are you managing?
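A minimal way to run that test, assuming a hypothetical dbo.RawRecords table with a CreatedAt column that the reports filter on:

```sql
-- Nonclustered index on the column the reporting queries filter by.
-- Compare insert and select timings with and without it in place.
CREATE NONCLUSTERED INDEX IX_RawRecords_CreatedAt
    ON dbo.RawRecords (CreatedAt);

SET STATISTICS TIME ON;   -- report parse/compile and execution times
SELECT COUNT(*)
FROM dbo.RawRecords
WHERE CreatedAt >= '2004-09-01';
SET STATISTICS TIME OFF;
```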

noelwatson

11,710 posts

266 months

Monday 27th September 2004
tim_s,

Parallelism is turned on by default in SQL Server. We've not found it to improve query response so have disabled it (MAXDOP per query or via Enterprise Manager for all queries).
With correct indexes, I was returning 100 rows from a 30 million row table (with numerous joins to other tables) in around 100ms, but we weren't doing a lot of inserts/updates so could afford to be generous with the indexing.
I believe inserting is better with a clustered index than with no clustered index, but haven't done any testing on how much adding one additional non-clustered index would slow things down.
As someone has suggested, I would see if you can get away with adding an index, and check that your insert performance isn't affected too much.
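For reference, the per-query version of the MAXDOP hint mentioned above looks like this (table and column names hypothetical):

```sql
-- OPTION (MAXDOP 1) restricts this one query to a single processor,
-- overriding the server-wide 'max degree of parallelism' setting.
SELECT CustomerId, COUNT(*)
FROM dbo.RawRecords
GROUP BY CustomerId
OPTION (MAXDOP 1);
```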

tim_s

Original Poster:

299 posts

278 months

Monday 27th September 2004
Thanks for all the replies again.

I'll have to generate some test data and see how indexing affects the performance of inserts.

It's quite reassuring to hear that some of you have large databases and are getting good response times with queries!

wiggy001

7,066 posts

295 months

Monday 27th September 2004
There are so many things to look at here that some more info would be helpful, but here are a few points based on what's been raised:

1. Not having any indexes is bad. End of. Each table should have at least a clustered index.

2. Clustered indexes store the data physically in key order on disk, so if a query can use the clustered index it'll be quicker than using a non-clustered index. It may also be able to utilise 'read-ahead scans'.

3. Additional indexes will slow down inserts to a degree, but the effect shouldn't be that bad with up to half a dozen indexes (you shouldn't need any more).

4. To speed up queries, consider the use of inline selects, 'SELECT TOP n <fields> FROM...' and EXISTS vs IN criteria.

5. May be the most obvious point of all, but make sure tables are joined properly where necessary.

6. Consider table joins vs subselects and, where possible, use IN and EXISTS criteria as opposed to NOT IN and NOT EXISTS.

Just a few random ramblings to get you thinking. Feel free to ask for more info...

HTH
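Point 6 in practice might look something like this (table and column names hypothetical). NOT IN over a large subquery forces the whole set to be evaluated, whereas EXISTS can stop at the first match:

```sql
-- Prefer EXISTS over IN (subquery): SQL Server can stop probing
-- the Orders table as soon as one matching row is found.
SELECT c.CustomerId, c.Name
FROM dbo.Customers AS c
WHERE EXISTS (
    SELECT 1
    FROM dbo.Orders AS o
    WHERE o.CustomerId = c.CustomerId
);
```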


gopher

5,160 posts

283 months

Monday 27th September 2004
A few more points:

If inserts are a worry, have a look at fill factor and index padding - that can help reduce the number of index rebuilds required (read Books Online).
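The fill factor / padding idea as DDL; the index and table names are hypothetical:

```sql
-- Leave 20% free space in the leaf pages (and, via PAD_INDEX, in the
-- intermediate pages too) so inserts cause fewer page splits.
CREATE NONCLUSTERED INDEX IX_RawRecords_CustomerId
    ON dbo.RawRecords (CustomerId)
    WITH PAD_INDEX, FILLFACTOR = 80;
```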

If the DB was built using Enterprise Manager (not a good idea if you want to view the scripts that built it, but common) and a primary key was marked, then a clustered index will have been created on that primary key column. This is fine if it's a natural key, but if it is an auto key then check this is the most appropriate choice (which columns are searched on, is there a natural key, etc.).

When doing a select, always name the columns, even if you want all the columns; if not, SQL Server goes back to the master table to find out which columns are in that table.

Try to use stored procedures; apart from other benefits (security, code separation etc.) the query is optimised, although this can also be achieved in .NET by using the SqlCommand object in System.Data.SqlClient and passing parameters.

Do not prefix stored procedure names with "sp_" (although I think this was an MS recommendation at one time), as it will cause SQL Server to read through system procedures before (or as well as) yours to find the correct procedure (this may be limited to where dbo is the owner).

Unless you want a separate dataset with the number of rows affected returned as well as your data, always use the SET NOCOUNT ON command in stored procs; it improves speed dramatically, especially in large queries.

When reading data, especially for reports, use a SqlDataReader (if using .NET), which is read-only and forward-only, reducing the time taken to read the data. This also sends the query using the TDS (Tabular Data Stream) protocol, which is the native SQL Server protocol for client communication, so removing another layer.

Cheers

Paul
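A small sketch pulling a few of those points together - named columns, parameters, no "sp_" prefix, NOCOUNT on. All object names are hypothetical:

```sql
CREATE PROCEDURE dbo.GetRecentOrders
    @CustomerId int,
    @Since      datetime
AS
BEGIN
    SET NOCOUNT ON;  -- suppress the 'n rows affected' message

    SELECT OrderId, CustomerId, OrderDate, TotalValue  -- name columns explicitly
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId
      AND OrderDate >= @Since;
END
```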

ThatPhilBrettGuy

11,810 posts

264 months

Monday 27th September 2004
tim_s said:
By far the most important thing is the speed of the insert statements, so I can't really optimise the table for selects (i.e. indexing).

We have to store 20,000,000 records a day at a max rate of 25,000 per second... into a table with 2 indexes (one clustered), on a DL380 with 2 3.2GHz processors.

Unless your queries/tables are really complex you should be able to achieve similar rates. Look at bulk copy calls...
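From T-SQL, bulk copy can be done with BULK INSERT (bcp and the bulk copy APIs are the other routes). The file path and table name here are hypothetical:

```sql
-- Load a batch of rows in one minimally-logged operation instead of
-- issuing row-by-row INSERT statements.
BULK INSERT dbo.RawRecords
FROM 'C:\data\records.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    TABLOCK,               -- take a table lock for faster loading
    BATCHSIZE = 50000      -- commit in batches of 50,000 rows
);
```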