Post by Scott CareyWell, something is causing the system to alternate between
CPU and disk bound here. (see below).
It would be useful to see what affect the index has.
Ok, I simply deleted the index and repeated the test: I did
not notice any difference. This is probably so because in
fact I am doing just 10 INSERTs.
BTW: I have some more screenshots but as they do not arrive
on the mailing list I keep it. The new graphs are basicly
the same anyway.
Post by Scott Carey(1) The upper dark/gray graph touching the 100% sometimes is
(2) The yellow graph is nearly completly overpainted by (1)
since it is "disk time %".
(3) The brown graph below (1) is "Disk Write Byts/s" divided
by 1.000.000, so around 40 MB/s average.
Looks like it is writing everything twice, or close to it.
Alternatively the index writes occupy half, but that is
unlikely.
'Writing twice': That is the most interesting point I
believe. Why is the data disk doing 40 MB/s *not* including
WAL, however, having 20 MB/s write thoughput in fact. Seems
like: 20 MB for data, 20 MB for X, 20 MB for WAL.
Although that questions is still unanswered: I verified
again that I am disk bound by temporarily replacing the
raid-0 with slower solution: a singly attached sata disk
of the same type: This *did* slow down the test a lot
(approx. 20%). So, yes, I am disk bound but, again, why
that much...
About removing the index on OIDs: No impact (see above).
Post by Scott Carey(4) The read graph is "Disk Time %" of the WAL drive E:,
average approx. 30%.
WAL doesn't look like a bottleneck here, as other tests
have shown. A larger wal_buffers setting might lower this
more, since your record overflows the buffer for sure.
You might want to change your test case to write records
similar size to what you expect (do you expect 80MB?) and
then set wal_buffers up to the size of one checkpoint
segment (16MB) if you expect larger data per transaction.
Ok, without knowing each exact effect I changed some of the
configuration values (from the defaults in 8.2.4), and did
some tests:
(1) First, the most important 8.2.4 defaults (for you to
overlook):
#shared_buffers=32MB
#temp_buffers=8MB
#max_prepared_transactions=5
#work_mem=1MB
#maintenance_work_mem=16MB
#max_stack_depth=2MB
#max_fsm_pages=204800
#max_fsm_relations=1000
#max_files_per_process=1000
#shared_preload_libraries=''
#vacuum_cost_delay=0
#vacuum_cost_page_hit=1
#vacuum_cost_page_miss=10
#vacuum_cost_page_dirty=20
#vacuum_cost_limit=200
#bgwriter_delay=200ms
#bgwriter_lru_percent=1.0
#bgwriter_lru_maxpages=5
#bgwriter_all_percent=0.333
#bgwriter_all_maxpages=5
#fsync=on
#full_page_writes=on
#wal_buffers=64kB
#checkpoint_segments=3
#checkpoint_timeout=5min
#checkpoint_warning=30s
#seq_page_cost=1.0
#random_page_cost=4.0
#cpu_tuple_cost=0.01
#cpu_index_tuple_cost=0.005
#cpu_operator_cost=0.0025
#effective_cache_size=128MB
#default_statistics_target=10
#constraint_exclusion=off
#from_collapse_limit=8
#join_collapse_limit=8
#autovacuum=on
#autovacuum_naptime=1min
#autovacuum_vacuum_threshold=500
#autovacuum_analyze_threshold=250
#autovacuum_vacuum_scale_factor=0.2
#autovacuum_analyze_scale_factor=0.1
#autovacuum_freeze_max_age=200000000
#autovacuum_vacuum_cost_delay=-1
#autovacuum_vacuum_cost_limit=-1
#deadlock_timeout=1s
#max_locks_per_transaction=64
(2) The tests:
Note: The standard speed was about 800MB/40s, so 20MB/s.
a)
What I changed: fsync=off
Result: 35s, so 5s faster.
b) like a) but:
checkpoint_segments=128 (was 3)
autovacuum=off
Result: 35s (no change...?!)
c) like b) but:
temp_buffers=200MB (was 8)
wal_sync_method=open_datasync (was fsync)
wal_buffers=1024kB (was 64)
Result:
The best ever, it took just 29s, so 800MB/29s = 27.5MB/s.
However, having autovacuum=off probably means that deleted
rows will occupy disk space? And I also fear that
checkpoint_segments=128 mean that at some point in the
future there will be a huge delay then (?).
d) also like b) but:
temp_buffers=1000MB
wal_buffers=4096kB
checkpoint_segments=3
autovacuum=on
Result: Again slower 36s
I am not able to interprete that in depth.
Post by Scott Carey(C) My interpretation
(1) Although the data disk G: sometimes hits 100%: All in
all it seems that neither the CPUs nor the data disk
(approx. 65%) nor the WAL disk (approx. 30%) are at their
limits. See also 1000 writes/s, 40MB/s write thoughput.
I think it is alternating. Whatever is causing the 25%
CPU jump during the 'slow' periods is a clue. Some
process on the system must be increasing its time
significantly in these bursts. I suspect it is postgres
flushing data from its shared_buffers to the OS. 8.2 is
not very efficient at its ability to write out to the OS
in a constant stream, and tends to be very 'bursty' like
this. I suspect that 8.3 or 8.4 would perform a lot
better here, when tuned right.
Ok, I've managed to use 8.4 here. Unfortunatelly: There was
nearly no improvement in speed. For example test 2d)
performed in 35s.
Post by Scott CareyThe write troughput is still disappointing to me: Even if we
would find a way to avoid those 'interrupts' of disk
inactivity (see E 1), we are too far beyond serial disk
write throughput (20 MB/s data disk + 20 MB/s other (!) WAL
... or better 20MB/s data disk + 20MB/s unexplained writes
to data disk + 20 MB/s WAL disk...
Post by Scott Careydisk: is far below 100-200 MB/s resp. 40-70 MB/s).
Although reaching 'full' sequential throughput is very
hard because not all of the writes are sequential, there
is a rather large gap here.
Yes, it's a pitty.
Thank You again so much.
Felix
--
Sent via pgsql-performance mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance