Discussion:
Mysql growing pains, 4 days to create index on one table!
matt ryan
2004-06-30 18:50:30 UTC
Permalink
Rebuilding index takes 3 1/2 days!!! Growing pains with mysql..

I've got 2 dedicated servers, each with a slave; all run 32-gig 15k rpm
RAID 5 on U320 PERC RAID cards, Dell 2600/4600s with single-channel
backplanes (the new ones will have dual-channel).

All have 2 gig of ram, but I've never seen mysql use more than 600mb
of ram.

The servers handle huge loads: each day 30 files of 1-2 gig each are
loaded into large tables. Total MySQL data size is 96 gig, and the large
tables are 2-6 gig.

Inserts are done with insert ignore on duplicate keys; this takes hours on
the large files and barely keeps up with the input.

At the bottom of this post I've got the mysql.ini config lines; any
suggestions are welcome. I'm already beyond the MySQL "huge" sample config
they used to include with the program.

Sample table that I load is as follows.

Each day I get about 40% new records in the text file. The input file is
normally 20 MB, but once a week I get one that's 1-2 gig, and those take
all day to load.

I need more multiple-column indexes, as some queries return millions of
rows that must be scanned, but the index size already exceeds the table
size, and the combinations I need would result in a .MYI that's 5x larger
than the data itself.

Here's an example of the speed problem. The index was corrupt, so I
dropped them all and recreated them rather than doing a MyISAM repair. I
think 3+ days is a little excessive for a table that's only 3.428 gig with
a 2.729 gig index. I can't remove the primary key, as it keeps duplicates
out of the system; the input files come from old databases. We use MySQL
to store the data for the web frontend, mostly done in ASP. Most queries
take less than a second, but unfortunately we constantly have big queries
that take way more than the IIS timeout setting, and there's no way around
it; I can't add more indexes without making it even slower :(



I can't tell if the problem is MySQL or the hardware. Here's a screenshot
of the disk I/O: if I copy a file while MySQL is doing the index build,
the I/O shoots way up, which tells me MySQL is NOT maxing out the drives,
and it's also not maxing out the memory.

Unless it's doing lots and lots of seeks on the drive, which is harder to
test using perfmon. Are there any MySQL test setups that would help
identify where the bottleneck is?


Screenshot of disk I/O usage: http://www.geekopolis.com/pics/diskio.jpg

I'm all out of ideas, other than switching to another DB, or splitting
the table & indexes across drives (maybe a 2-channel setup, 4 drives per
channel, each set of 4 a separate RAID 5, one holding data and one holding
indexes), but I can't do that with MySQL.

mysql> alter table hood_stat add primary key
(dic,niin,fr_ric,don,suf,dte_txn,sta) , add index `don` (`don`), add
index `niin` (`niin`), add index `stor` (`stor`), add index `dic`
(`dic`), add index `ctasc` (`ctasc`);

Query OK, 45449534 rows affected (3 days 19 hours 6 min 34.94 sec)
Records: 45449534 Duplicates: 0 Warnings: 0


CREATE TABLE `hood_stat` (
`dic` char(3) NOT NULL default '',
`fr_ric` char(3) NOT NULL default '',
`niin` char(11) NOT NULL default '',
`ui` char(2) NOT NULL default '',
`qty` char(5) NOT NULL default '',
`don` char(14) NOT NULL default '',
`suf` char(1) NOT NULL default '',
`dte_txn` char(5) NOT NULL default '',
`ship_to` char(3) NOT NULL default '',
`sta` char(2) NOT NULL default '',
`lst_sos` char(3) NOT NULL default '',
`esd` char(4) NOT NULL default '',
`stor` char(3) NOT NULL default '',
`d_t` char(4) NOT NULL default '',
`ctasc` char(10) NOT NULL default '',
PRIMARY KEY (`dic`,`niin`,`fr_ric`,`don`,`suf`,`dte_txn`,`sta` ),
KEY `don` (`don`),
KEY `niin` (`niin`),
KEY `stor` (`stor`),
KEY `dic` (`dic`),
KEY `ctasc` (`ctasc`)
) TYPE=MyISAM MAX_ROWS=1000000000 PACK_KEYS=1

skip-locking
set-variable=delay_key_write=ALL
set-variable= key_buffer_size=1500M
set-variable=join_buffer=512M
set-variable= max_allowed_packet=256M
set-variable= table_cache=512
set-variable= sort_buffer=256M
set-variable=tmp_table_size=400M
set-variable= record_buffer=512M
set-variable= thread_cache=8
set-variable=myisam_sort_buffer_size=256M
myisam-recover=BACKUP,FORCE
set-variable=read_buffer_size=512M
set-variable=interactive_timeout=7200
set-variable=wait_timeout=7200
log-bin
server-id=1
replicate-do-db=finlog
set-variable=open-files-limit=500
set-variable=table-cache=400
Lopez David E-r9374c
2004-06-30 21:18:52 UTC
Permalink
matt

1) Inserts using this format:
INSERT INTO table (col1, col2) VALUES (val1,val2), (val3,val4)
are much faster than single-row inserts (see the sketch after this list).
My experience is 2.5 hrs vs. 36 hrs.

2) The PACK_KEYS=1 may be hurting you. I've never used it.

3) There may be a cache somewhere that's too small. You'll
have to do some digging in this area.

4) dup key ignore - what does that mean exactly?

5) What is your OS & rev, and MySQL rev?
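For point 1, against your hood_stat table it would look something like
this (made-up values, only the key columns shown; IGNORE keeps your
duplicate-key behaviour):

INSERT IGNORE INTO hood_stat (dic, niin, fr_ric, don, suf, dte_txn, sta)
VALUES
('AE1','01234567890','S9I','DON0000000001','A','4182','BA'),
('AE2','01234567891','S9I','DON0000000002','A','4182','BA');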

Please post any suggestions that you find valuable so we
can all learn..

david
matt ryan
2004-07-14 19:24:02 UTC
Permalink
Post by Lopez David E-r9374c
1) Inserts using this format:
INSERT INTO table (col1, col2) VALUES (val1,val2), (val3,val4)
are much faster than single-row inserts. My experience is
2.5 hrs vs. 36 hrs.
2) The PACK_KEYS=1 may be hurting you. I've never used it.
3) There may be a cache somewhere that's too small.
4) dup key ignore - what does that mean exactly?
5) What is your OS & rev, and MySQL rev?
If I understand it, pack_keys should help if you're disk-I/O limited vs.
CPU limited; the CPU is usually near idle.

I increased the cache sizes, which helped a little, but not much:

delay_key_write=ALL
key_buffer_size=1000M
read_buffer_size=512M
record_buffer=512M


What would the syntax for that type of insert be?

I have a table with 30,000 records that I need to insert into the main
table with millions of records; I thought that INSERT INTO ... SELECT *
FROM the 2nd table would be the fastest way.

Insert ignore will not insert a record if it violates the primary key; I
do this to keep duplicate records out of the system.
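In other words, the daily load is basically this (table names generic):

INSERT IGNORE INTO hood_stat SELECT * FROM temp_table;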

Windows 2003 and 2000 servers, mysql-nt 4.0.16. I tried the newer
versions but found bugs in all of them; I submitted them to the bug system.

I believe my bottleneck is reading the data to ensure the primary key is
not violated; I see lots of read I/O but little write I/O.

Matt
U***@aol.com
2004-06-30 23:31:26 UTC
Permalink
You may want more indexes but you might be getting killed because you already have too many.

To test - try loading into a table without indexes and see if it makes a difference.

At the very least - check to see if the primary index which starts with 'dic' can make your special 'dic' index superfluous.

If write speed is a bottleneck you might consider Raid-1 instead of Raid-5.

Reading lots of rows via index is a killer. Depending on your hardware it may be cheaper to table scan 50 rows than to read 1 via index. However, this requires partitioning of the data based on some column which appears in every query and acts as an initial filter. If you are lucky enough to be in that situation - consider a MERGE table.
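A minimal sketch of that kind of layout (hypothetical per-period tables,
column list shortened for the example):

CREATE TABLE hist_2003 (don CHAR(14) NOT NULL, dte_txn CHAR(5) NOT NULL,
                        PRIMARY KEY (don, dte_txn)) TYPE=MyISAM;
CREATE TABLE hist_2004 (don CHAR(14) NOT NULL, dte_txn CHAR(5) NOT NULL,
                        PRIMARY KEY (don, dte_txn)) TYPE=MyISAM;
CREATE TABLE hist_all  (don CHAR(14) NOT NULL, dte_txn CHAR(5) NOT NULL,
                        KEY (don, dte_txn))
  TYPE=MERGE INSERT_METHOD=LAST UNION=(hist_2003, hist_2004);
-- hist_all is the combined view; loads and heavy scans can hit the small
-- per-period tables directly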
matt ryan
2004-07-14 19:27:02 UTC
Permalink
Post by U***@aol.com
Reading lots of rows via index is a killer. Depending on your hardware it
may be cheaper to table scan 50 rows than to read 1 via index. However,
this requires partitioning of the data based on some column which appears
in every query and acts as an initial filter. If you are lucky enough to
be in that situation - consider a MERGE table.
These tables are merged; the total table size is huge. This particular
table has 45,449,534 rows, but all the merge tables combined are
258,840,305 records.

Perhaps I should reorder the primary key, putting the longest, most unique
column up front and the least unique at the end; would that speed up the
key check? I can tell that almost everything is read I/O, very little
write I/O.
Justin Swanhart
2004-07-14 20:18:27 UTC
Permalink
Indexes can generate vast amounts of random i/o.
Because of rotational latency, random i/o can really
slow you down, especially if you are using IDE or SATA
disks because they can't do tagged queueing like SCSI
disks can.

If you have the budget for it, I would consider
getting some solid state disks. Because they have
extremely low latency you will be able to get full i/o
bandwidth on your reads. If you can't afford those,
consider adding more disks to your RAID array so that
you can spread the reads over more spindles, which
will help performance.
matt ryan
2004-07-15 13:10:43 UTC
Permalink
I'm using 8x72 gig 15,000 rpm U320 SCSI drives in RAID 5 now; that should
be a fast-read RAID config.

No more will fit in the server, and solid state is $70,000, which is out
of our budget.

I optimize the tables every weekend.

Any other suggestions?

Would it help to defrag? The only way I can do it is to back up every
file, wipe out the server, and then restore the files; there's not enough
free space to do a proper defrag.

Matt
Pete McNeil
2004-07-15 14:53:00 UTC
Permalink

Consider moving to a raid 10 configuration + adding additional drives
externally on alternate scsi chains.

Consider additional ram.

Consider replicating to some slave servers and dividing reads among them.

Hope this helps,

_M
matt ryan
2004-07-15 14:58:31 UTC
Permalink
Post by Pete McNeil
Consider replicating to some slave servers and dividing reads among them.
I already replicate to slaves, and sites do read-only queries off these
slaves.

99.9% of the tables are read-only anyway; the only tables we update or
insert into are very, very small and fast.

These big tables are daily extracts from IBM DB2 sites, in EBCDIC format.
We archive the data, and users then query our site, which is faster,
unless they start doing multiple query options; then things get slow.

If you query only one field it's FAST, but if you query two fields it's
slow, very slow; we need multiple-key-per-query support in MySQL.
a***@asocmedrosario.com.ar
2004-07-15 15:45:05 UTC
Permalink


One thing that can help (at least for me) is working with temporary
tables, splitting one complex query into many little ones.

A simple example:
2 tables: header and details

CREATE TEMPORARY TABLE tmp TYPE=HEAP
SELECT hdr_code FROM header WHERE ...;

ALTER TABLE tmp ADD PRIMARY KEY(hdr_code);

SELECT d.* FROM details d INNER JOIN tmp
ON d.hdr_code=tmp.hdr_code WHERE ...;

DROP TABLE tmp;
--
Alejandro D. Burne
Departamento de Sistemas
Asociación Médica Rosario
España 401 - S2000SBI
Rosario - Santa Fe - Argentina
+54-0341-4252313 Interno 145
Alejandro D. Burne
2004-07-15 17:00:19 UTC
Permalink

I'm sorry, this was an answer to Patrick Drouin.
--
Alejandro D. Burne
Departamento de Sistemas
Asociación Médica Rosario
España 401 - S2000SBI
Rosario - Santa Fe - Argentina
+54-0341-4252313 Interno 145
Egor Egorov
2004-07-01 10:44:19 UTC
Permalink
Are you running this under Microsoft Windows?
matt ryan
2004-07-14 19:24:38 UTC
Permalink
Post by Egor Egorov
Are you running this under Microsoft Windows?
Yes, windows 2k and 2003, mysql-nt 4.0.16
Tim Brody
2004-07-15 14:22:58 UTC
Permalink
I know it's naff but I've found it quicker to use "myisamchk" with row-sort
than it is to get the MySQL daemon to regenerate keys (and if you know
you're not changing the data file you can tell myisamchk not to copy the
data), unless I've missed something in the MySQL config ...

The only way I know to do this for PRIMARY KEYs is a process described by
Kyle J. Munn (but watch out for free space :-):
http://lists.mysql.com/mysql/158737

This boils down to moving the MYD file, truncating the table to create an
empty MYD, modifying your key(s), moving the MYD back, and repairing with
myisamchk to re-create the keys.

Otherwise you can use ALTER TABLE <table> DISABLE KEYS/ENABLE KEYS to
disable keys while you make your inserts, then enable keys to update
non-Primary keys in one go.
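E.g. something like this (a sketch; note that only the non-unique keys
are deferred, the PRIMARY KEY is still checked row by row):

ALTER TABLE hood_stat DISABLE KEYS;
INSERT IGNORE INTO hood_stat SELECT * FROM temp_table;   -- bulk load
ALTER TABLE hood_stat ENABLE KEYS;   -- rebuilds don/niin/stor/dic/ctasc in one pass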

I've never tried PACK_KEYS so can't suggest anything on that, other than it
sounds like it adds more data to your index which is probably a bad thing
with such short values.

You may find that the 'dic' KEY isn't necessary, as it's the first part of
your PRIMARY KEY.

Depending on your data you may be better off storing integer IDs and the
char values in a look-up table.
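For instance (a hypothetical sketch, names invented):

CREATE TABLE dic_lookup (
  dic_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  dic    CHAR(3) NOT NULL,
  PRIMARY KEY (dic_id),
  UNIQUE KEY (dic)
) TYPE=MyISAM;
-- hood_stat would then store dic_id instead of the CHAR(3) value,
-- shrinking every index that contains the column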

I've found better performance for multi-column keys by putting the columns
in order of least variance first, e.g. for a list of dates:
1979-04-23
1979-07-15
1980-02-04
1980-06-04
You want a key on (YEAR-MONTH-DAY)

If you can, you could put the index/data on different disks - not sure how
you would do that in Windows (INDEX DIRECTORY option for CREATE TABLE?).

You should definitely put the binary log file on another disk, but again not
something I've used.

I've found MySQL to be a royal pain working with multi-GB tables (my biggest
is 12GB+13GB index). I've learnt that MySQL is a bit like a confused puppy -
it doesn't know why it wet the floor, but it expects you to know that pained
expression means you need to move it somewhere else ...

All the best,
Tim.

matt ryan
2004-07-15 14:31:41 UTC
Permalink
I need the DIC in the key to keep the record unique; I have thousands of
records with everything identical except the DIC.

I was confused on the multi-key index issue; I thought it would seek
faster if I put the most unique field up front, which I do on most tables,
though I did not on this one. I have one large RAID array now, so I can't
split the data or put the binary log on another disk.

I found MySQL was great up to about 3 gig, then everything hit the brakes
and got really, really, really slow.

I'm scared of joins; every time I do a join in MySQL on indexed fields the
performance is horrible, because the where clause is not on a field that's
in the join, the performance is poopy.

I wish MySQL could use multiple indexes like Oracle to narrow down the
results; I've got some simple queries that take hours due to single-index
use, even though every query field is indexed.
Tim Brody
2004-07-15 17:35:47 UTC
Permalink
Post by matt ryan
I need the DIC in the key to keep the record unique; I have thousands of
records with everything identical except the DIC.
In your schema you had DIC in the PRIMARY KEY and an additional
(unnecessary?) KEY on DIC.
Post by matt ryan
I was confused on the multi-key index issue; I thought it would seek
faster if I put the most unique field up front, which I do on most tables,
though I did not on this one. I have one large RAID array now, so I can't
split the data or put the binary log on another disk.
Having performed a couple of tests on a multi-key index I can't find any
significant difference between different orders on the columns (logical I
guess), although there may be certain uses that might work better in either
situation ...

Perhaps a better point to make would be to say change the order of your
PRIMARY KEY to reflect the queries that you're performing. e.g. as I said
above MySQL will efficiently use the first part of any key. Certainly, if
you're performing a WHERE or ORDER BY on any of the columns in the primary
key put those at the front.
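E.g. (column order picked purely for illustration, and rebuilding the key
on a table that size is of course itself a long job):

ALTER TABLE hood_stat
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (niin, don, dte_txn, dic, fr_ric, suf, sta);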
Post by matt ryan
I found MySQL was great up to about 3 gig, then everything hit the brakes
and got really, really, really slow.
I'm scared of joins; every time I do a join in MySQL on indexed fields the
performance is horrible, because the where clause is not on a field that's
in the join, the performance is poopy.
Hmm, how about (sorry if I'm pointing out the obvious) e.g.:

CREATE TABLE _left (col1 INT, col2 INT, col3 INT, UNIQUE (col3,col1,col2));
CREATE TABLE _right (col1 INT, col2 INT, col3 INT, UNIQUE (col1,col2,col3));

SELECT * FROM _left INNER JOIN _right USING (col1,col2)
WHERE _left.col3=42 AND _right.col3=35;

The analyzer (explain) shows a const at the end of the ref columns for
_right and const for _left.

All the best,
Tim.
Donny Simonton
2004-07-15 21:08:41 UTC
Permalink
Matt,
I've been reading this thread for a while and at this point, I would say
that you would need to provide the table structures and queries that you are
running.

For example, we have one table that has 8 billion rows in it and it's
close to 100 gigs, and we can hammer it all day long without any problems.
It really depends on how you are doing things.

But as far as your mentioning that MySQL doesn't use multiple indexes, it
does. You just have to create an index on multiple fields at one time.
I've got tables with 10 fields in one index; the trick with MySQL is that
you must use all 10 of those fields in your where clause for MySQL to
really take full advantage of the index.
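For example (columns picked only to illustrate; use whichever ones your
real where clauses combine):

ALTER TABLE hood_stat ADD INDEX niin_stor_dte (niin, stor, dte_txn);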

But I would definitely send the list your table structure with your indexes
and some of your selects and inserts. You can always change the names of
things if you don't want people to know the names of everything.

Just my 2 cents.

Donny
matt ryan
2004-07-15 18:38:45 UTC
Permalink
Post by U***@kp.org
You might be out of luck with MySQL ... sorry.
You may need to switch to a database that has a parallel query
facility. Then - every query becomes a massive table scan but gets
divided into multiple concurrent subqueries - and overall the job
finishes in a reasonable amount of time. The epitome of brute force.
It's hard to rationalize initially but after a while you see it's the
only way to go. Remember - indexes are no longer required.
We have a billion row 100GB table the users search any and every way.
Response time is less than a minute.
We are anxiously waiting to see this technology added to MySQL. Maybe
one day we'll have some money to contribute to the effort. Parallel
query is not trivial. That's why these databases are expensive.
I can send you more details privately if you are interested.
I've used it, with Oracle, but Oracle's index searches are better: hit
the best one first, then 2nd best, then 3rd. But I really don't want to
go to Oracle; it's too complicated for my techs.

vs. MySQL: hit the best one first, and use no other.

Query time is a non-issue at this point; it's load time. We load the
daily file into a temp table, then insert ignore into the main table; on
a key violation the violating record is ignored.

Load time is the issue. The server loads files 15 hours a day, and that
big primary key makes loading any table over 2-3 gig VERY slow.

I thought it was a bug; everything was great until you get up to about
3 or 4 gig, then it gets SLOW.
Justin Swanhart
2004-07-15 20:49:38 UTC
Permalink
A few suggestions...

Your slow inserts could be a concurrency issue. If lots of users are
doing selects, those could be interfering with your inserts, especially
if they use an ODBC/JDBC app that locks the table for the entire read.
JDBC reads do that when you tell it to "stream" the contents of the
query, because they wouldn't all fit in memory.

------

Does your import script do an INSERT for each line, or
does it combine lines into multi-value inserts?

Doing an INSERT INTO ... VALUES (...),(...),(...)
will be much faster than doing one insert for each
row.

Since your max packet size is pretty big, you should
be able to consolidate a fairly large number of rows
into one insert.

------

What settings are you using on the Percs? What stripe
size? What write cache are you using (back or thru)?
Are you using read-ahead caching (that can hurt index
performance)?

------

Defragging the filesystem probably won't have any
major impact on the speed of your application.

-------

Is there another way you can approach the duplicate
problem? For instance, if duplicate data can only be
generated in the last few days' worth of imports, you
might not have to rely on your primary key on the main
table for importing. This only works if you can drop
the primary key because it is only used for duplicate
checking and isn't used to speed queries. You could
instead create another table that you do all your
importing to, taking care of dupes with a primary key
on that table, then insert from that table into the
main one. Keep a timestamp in that table and purge
the older records periodically.
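A rough sketch of that flow (names hypothetical; assumes hood_stat_recent
has the same columns plus a load_dte DATE column, and that the primary
key on hood_stat itself has been dropped):

INSERT IGNORE INTO hood_stat_recent
  SELECT t.*, CURDATE() FROM temp_load t;        -- dupes rejected against the small table
INSERT INTO hood_stat
  SELECT dic, fr_ric, niin, ui, qty, don, suf, dte_txn,
         ship_to, sta, lst_sos, esd, stor, d_t, ctasc
  FROM hood_stat_recent
  WHERE load_dte = CURDATE();                    -- only today's new rows
DELETE FROM hood_stat_recent
  WHERE load_dte < DATE_SUB(CURDATE(), INTERVAL 60 DAY);   -- periodic purge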

----
The last thing I could think of would be a "dupe
checker" table. Create an InnoDB table that consists
of only the columns from your big table and make all
the columns the primary key. Essentially you have
just created an "index only table." Insert your new
data into a temporary heap table, then delete from the
heap table where the key is in your "dupe table".
Then insert everything from the heap table into the
big table. Once again, this only works if you don't
need the primary key on the big table. This will use
more CPU/memory but it may get around your read
problems.
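A sketch of that idea, shrunk to two key columns for brevity (all names
hypothetical; MySQL 4.0 has no subqueries, so the delete is written as a
multi-table delete):

CREATE TABLE dupe_check (don CHAR(14) NOT NULL, dte_txn CHAR(5) NOT NULL,
                         PRIMARY KEY (don, dte_txn)) TYPE=InnoDB;

CREATE TEMPORARY TABLE new_rows TYPE=HEAP
  SELECT * FROM temp_load;

DELETE new_rows FROM new_rows, dupe_check
 WHERE new_rows.don = dupe_check.don
   AND new_rows.dte_txn = dupe_check.dte_txn;    -- drop rows already seen

INSERT INTO hood_stat  SELECT * FROM new_rows;   -- big table keeps no primary key
INSERT INTO dupe_check SELECT don, dte_txn FROM new_rows;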




Justin
Justin Swanhart
2004-07-15 21:18:53 UTC
Permalink
Post by matt ryan
I've used it, with Oracle, but Oracle's index searches are better: hit
the best one first, then 2nd best, then 3rd. But I really don't want to
go to Oracle; it's too complicated for my techs.
Oracle rarely performs index merges, but it does have
the ability to do so, which MySQL lacks.
Post by matt ryan
Query time is a non-issue at this point; it's load time. We load the
daily file into a temp table, then insert ignore into the main table; on
a key violation the violating record is ignored.
I know you don't want to go with Oracle, but I will
just add that it could help you here too, because it
can do parallel DML. This is especially useful if you
have access to the partitioning option, because you
could then partition your data by hash and get a
number of bonuses. #1 your index updates will be much
faster because there are a lot fewer rows to look
through. #2 parallel DML can insert into multiple
partitions at once.

I am unsure if MaxDB supports any of those features,
though it may. You may want to look into it to see if
it does.

I don't want to sound like I'm pushing Oracle. I'm
not an Oracle sales rep, or anything like that. I am
a professional Oracle DBA that happens to also use
mySQL a lot and I like both databases. Sometimes one
is better than the other for solving a problem.
Lachlan Mulcahy
2004-07-15 23:53:25 UTC
Permalink
Matt,

I think I might have missed the start of this thread, as I can't seem to
find it.

Could you please post the following info (I don't mind if you just mail it
directly to me to save the list):

MySQL Version:
Server OS:
Server Hardware configuration:
- Memory
- CPU(s)
- Disks (RAIDs and Independent disk speed/types)

Database Structure dump and some pointers on where you are having the
issues.

Lachlan
matt ryan
2004-07-21 15:11:30 UTC
Permalink
Lachlan Mulcahy wrote:

MySQL Version: 4.0.18
Server OS: Windows 2000 or 2003
Memory: 2 gig
CPU(s): dual 2.6-3 GHz Xeon, 500-2mb cache (CPU load is low)
Disks (RAIDs and independent disk speed/types): 8x72 gig 15,000 rpm
SCSI II U320 RAID 5, Dell PERC setup
U***@kp.org
2004-07-15 20:29:42 UTC
Permalink
That's the whole point. Eliminate your indexes and your load problems are
solved. Especially given the fact that you insert ignore and don't use the
primary key to validate uniqueness.






Justin Swanhart
2004-07-15 21:00:29 UTC
Permalink
Insert ignore doesn't insert the record if there is a
duplicate; it simply skips the row. Without
the IGNORE clause, the query would generate an error
instead of silently ignoring the insert.
U***@kp.org
2004-07-15 20:24:35 UTC
Permalink
Reordering the primary key wouldn't necessarily speed up the key check.
The reason for selecting a particular sequence within a primary key is to
put the columns that are available the most often - upfront - so that the
index will have at least something to bite on.

Can you partition further? You want to shoot for reading no more than a
1% slice for a given query, if possible.

Can you partition by ranges of a column? Values 000001 to 1000000 in
partition 1, 1000001 to 2000000 in partition 2, etc.? Anything that's
specified consistently in every query?

Then again -
if your table is huge...
and your queries are all over the place...
and the user can specify any search criteria...
and there's no predictable pattern you can partition on...
and you frequently add and/or remove lots of data from your table...
and your queries pull lots of rows...

You might be out of luck with MySQL ... sorry.

You may need to switch to a database that has a parallel query facility.
Then - every query becomes a massive table scan but gets divided into
multiple concurrent subqueries - and overall the job finishes in a
reasonable amount of time. The epitome of brute force. It's hard to
rationalize initially but after a while you see it's the only way to go.
Remember - indexes are no longer required.

We have a billion row 100GB table the users search any and every way.
Response time is less than a minute.

We are anxiously waiting to see this technology added to MySQL. Maybe one
day we'll have some money to contribute to the effort. Parallel query is
not trivial. That's why these databases are expensive.

I can send you more details privately if you are interested.






matt ryan
2004-07-16 14:43:22 UTC
Permalink
Post by Donny Simonton
I've been reading this thread for a while and at this point, I would say
that you would need to provide the table structures and queries that you
are running.
But I would definitely send the list your table structure with your
indexes and some of your selects and inserts.
The original email has the table structure. Query speed is not an issue
(it is, but I will deal with that later).

The issue is insert speed. I get 150k-1M records daily; of these, only
5-10% are new records. I load the EBCDIC file into a temp table and then
do "insert ignore into historytable select * from temp table".

And I can't create multiple-field indexes; I would need 100 indexes on a
table, and the index size already exceeds the table size. I tried to add
an index on date, but gave up because it had run for 2 days and was not
done yet.
U***@kp.org
2004-07-15 17:48:43 UTC
Permalink
Here's another option to load without requiring a primary key (requires a
LOT of extra disk space and fast CPU, and a batch window to run).

Load the new daily data into the table without checking for dupes.

Then create a new version of the table with distinct values.

Something like this (assuming your table has 2 columns which are char (3)
and char (5) for simplicity's sake) in pseudocode:
(remember to pad all columns to the maximum width to make them uniform,
right spaces on char, left zeroes on numerics)


CREATE NEW_TABLE AS
SELECT SUBSTR(PADDED,1,3) AS COL1,
       SUBSTR(PADDED,4,5) AS COL2
FROM (SELECT DISTINCT RPAD(COL1,3) || RPAD(COL2,5) AS PADDED
      FROM OLD_TABLE)








matt ryan
2004-07-21 15:09:08 UTC
Permalink
I load all the data into a table with no keys.

Then I insert this data into a table with 225 million records; this large
table has the primary key, and this is what takes a LONG time.

Matt
matt ryan
2004-07-16 18:10:31 UTC
Permalink
Post by Lopez David E-r9374c
Since you have a temp table created (no keys I assume), use the command
mysqldump -v -e -n -t dbname tablename > filename.sql
This should create insert statements with many values in a single
insert. Then use the client program to insert them into your db.
mysql -u matt -p dbname < filename.sql
This is a very fast way to insert rows. It speeds up insertion by at
least 10x on my large tables (11 million rows). I noticed someone
with 1000x more rows w/o problems.
BTW, by forcing the table to have packed keys, the docs say it
will slow your insertion. Maybe not that much, I don't know.
David
Isn't the text file it creates going to insert the records back into the
temp table when I load it back in?

Does this do insert ignore or insert replace? I need to control that; on
some tables I do insert ignore, on others I do insert replace.

Almost all of the speed issue is read related: the disk writes are nearly
0 and the reads are as fast as the drive can run, reading to see if the
record violates the primary key, I assume.

About 3 gig seems to be the magic number; less than that is lightning
fast, more than that is extremely slow.
Lopez David E-r9374c
2004-07-16 17:27:58 UTC
Permalink
matt
Post by matt ryan
the issue is insert speed, I get 150k-1M records daily, of these, only
5-10% are new records, I load the EBCDIC file into a temp table, and
then do "insert ignore into historytable select * from temp table"
Since you have a temp table created (no keys I assume), use the command

mysqldump -v -e -n -t dbname tablename > filename.sql

This should create insert statements with many values in a single
insert. Then use the client program to insert them to your db.

mysql -u matt -p dbname < filename.sql

This is a very fast way to insert rows. Speeds up insertion by at
least 10x on my large tables (11 million rows). I noticed someone
with 1000x more rows w/o problems.

BTW, by forcing the table to have packed keys, the docs say it
will slow your insertion. Maybe not that much, I don't know.

David
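
For context, -e is --extended-insert: together with -t (--no-create-info)
and -n (--no-create-db), the dump file contains only multi-row INSERT
statements, roughly of this shape (the table and values below are
invented purely for illustration):

INSERT INTO tablename VALUES ('a','1'),('b','2'),('c','3');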
matt ryan
2004-07-21 17:34:59 UTC
Permalink
Post by Lopez David E-r9374c
Since you have a temp table created (no keys I assume), use the command
mysqldump -v -e -n -t dbname tablename > filename.sql
This creates a file that inserts the records back into the same table

It also does not do an insert ignore

I need the records to go into the historical table, with an insert ignore
gerald_clark
2004-07-21 17:58:25 UTC
Permalink
mysql -i < filename.sql
Post by matt ryan
Post by Lopez David E-r9374c
Since you have a temp table created (no keys I assume), use the command
mysqldump -v -e -n -t dbname tablename > filename.sql
Post by matt ryan
This creates a file that inserts the records back into the same table
it also does not do an insert ignore
I need the records to go into the historical table, with an insert ignore
Justin Swanhart
2004-07-21 18:36:03 UTC
Permalink
I don't see how using a multi value insert would be
any faster than the insert between the tables. It
would certainly be faster than one insert statement
per row, but I don't think it would be faster than
insert ... select ...

The only reason I suggested an extended syntax insert
earlier was because I wasn't aware that a temporary
table was being loaded first.

Do you ever delete from this table?

Can you post the results from
"show variables" for us?

Have you removed the unnecessary duplicate key on the
first column of your primary key?

Thanks,

Justin
Post by gerald_clark
mysql -i < filename.sql
Post by matt ryan
Post by Lopez David E-r9374c
Since you have a temp table created (no keys I assume), use the command
mysqldump -v -e -n -t dbname tablename > filename.sql
Post by matt ryan
This creates a file that inserts the records back into the same table
it also does not do an insert ignore
I need the records to go into the historical table, with an insert ignore
matt ryan
2004-07-21 19:04:55 UTC
Permalink
Post by Justin Swanhart
Do you ever delete from this table?
Temp table is truncated before the EBCDIC file is loaded
Post by Justin Swanhart
Have you removed the unnecessary duplicate key on the first column of your primary key?
Have not touched the DIC index yet, I need a backup server to change
indexes, it would take the main server down for too long, and it won't be
that big an increase, it's only a 3-character index, I also do joins on
that field to other tables, so I was hesitant on removing that index.
Post by Justin Swanhart
Can you post the results from "show variables" for us?
Nope, the list won't let me send an email that big

I did put it on the web though, here's the results from show variables

http://www.geekopolis.com/Query_Result.txt
Justin Swanhart
2004-07-21 20:31:56 UTC
Permalink
Post by matt ryan
Post by Justin Swanhart
Do you ever delete from this table?
Temp table is truncated before the EBCDIC file is loaded
I meant the history table :)
Post by matt ryan
Post by Justin Swanhart
Have you removed the unnecessary duplicate key on the first column of your primary key?
Have not touched the DIC index yet, I need a backup server to change
indexes, it would take the main server down for too long, and it won't be
that big an increase, it's only a 3-character index, I also do joins on
that field to other tables, so I was hesitant on removing that index.
Even if it is only a 3-character index, you still need
to do disk reads/writes to update the values in the
index. With a huge table like yours one index can
make a big difference.

It will still work fine with joins. Since it is the
leading column of another index, it will function just
like a normal index. Having indexes on a and (a,b) is
redundant for searching/joining only a. If you need
to join/search on b, then a separate index is required
for b if you have indexed (a,b).
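
For illustration only (the literal value below is invented; the index
and key names are from the hood_stat table discussed in this thread),
dropping the single-column index still leaves lookups on dic able to
use the leftmost column of the primary key:

ALTER TABLE hood_stat DROP INDEX `dic`;
-- searches and joins on dic alone can still use the leading part of
-- PRIMARY KEY (dic, niin, fr_ric, don, suf, dte_txn, sta)
EXPLAIN SELECT * FROM hood_stat WHERE dic = 'A5A';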


The last option I can think of requires a lot more
work on your part as far as inserting data, but it may
be your only option at this point. This is why I
asked if you delete data from the history table, as it
makes deletes/updates more problematic as well.

Split the myisam table into separate tables. We will
call each table a bucket.

Create a MERGE table of all of them, for selecting the
data.

When inserting, use a hash function on your primary
key values to determine which bucket to insert into.
If you almost always select by primary key, then you
can optimize your select queries to only look at the
correct bucket as well, and to only use the merge
table when you aren't selecting by primary key.

This will speed your inserts because instead of 258M
rows to search through for each insert there are only
8M if you use 32 buckets. The biggest benefit is that
you could also insert in parallel using multiple mysql
connections, because you could calculate the bucket as
an additional column in your temporary table, then do
the inserts for all the buckets at the same time.
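
As an illustration of that layout only (all names below are invented;
four buckets are shown for brevity where 32 are suggested above, and
CREATE TABLE ... LIKE needs MySQL 4.1, otherwise repeat the hood_stat
definition by hand):

CREATE TABLE hood_stat_0 LIKE hood_stat;
CREATE TABLE hood_stat_1 LIKE hood_stat;
CREATE TABLE hood_stat_2 LIKE hood_stat;
CREATE TABLE hood_stat_3 LIKE hood_stat;

-- read-side view over every bucket; inserts go straight to the buckets
CREATE TABLE hood_stat_all LIKE hood_stat;
ALTER TABLE hood_stat_all ENGINE=MERGE
  UNION=(hood_stat_0, hood_stat_1, hood_stat_2, hood_stat_3)
  INSERT_METHOD=NO;

Routing a row then only needs a deterministic hash of its primary-key
columns; one possible expression is sketched after Matt's follow-up
below.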
matt ryan
2004-07-22 12:19:20 UTC
Permalink
Post by Justin Swanhart
Split the myisam table into separate tables. We will
call each table a bucket.
Create a MERGE table of all of them, for selecting the
data.
When inserting, use a hash function on your primary
key values to determine which bucket to insert into.
If you almost always select by primary key, then you
can optimize your select queries to only look at the
correct bucket as well, and to only use the merge
table when you aren't selecting by primary key.
This will speed your inserts because instead of 258M
rows to search through for each insert there are only
8M if you use 32 buckets. The biggest benefit is that
you could also insert in parallel using multiple mysql
connections, because you could calculate the bucket as
an additional column in your temporary table, then do
the inserts for all the buckets at the same time.
I hit duplicate hashes for unique records, not sure why, I think I used
AES_ENCRYPT. How do you recommend creating a hash column via SQL?

I already split the data into separate tables, and use a merge, it
really didn't speed things up that much
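
One hedged way to write that routing hash in plain SQL (CRC32() needs
MySQL 4.1 or later; temp_stat is a stand-in name for the load table, the
columns are the hood_stat primary-key columns, and the modulus has to
match the number of bucket tables, four in the earlier sketch):

INSERT IGNORE INTO hood_stat_0
SELECT * FROM temp_stat
WHERE CRC32(CONCAT_WS('|', dic, niin, fr_ric, don, suf, dte_txn, sta)) % 4 = 0;
-- repeat with = 1, 2, 3 for the other buckets, each on its own connection
-- different rows hashing to the same bucket is expected and harmless:
-- the hash only picks a bucket, and the primary key inside each bucket
-- still keeps real duplicates out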