Discussion:
8.2.3: Server crashes on Windows using Eclipse/Junit
Laurent Duperval
2007-10-15 17:58:48 UTC
Permalink
Hi,

I have a large number of tests that I run in Eclipse to test my application.
Many of them create and delete a lot of data in PG, and at some point
PG will crash and restart.

I get an error in the logs that states:

Server process exited with exit code -1073741502
.
.
.
Terminating connection because of crash of another server process

If it helps:

- I am using Windows XP
- I have 2 GB of memory
- I am using JPA/Hibernate3 and the Postgres Java driver

Any ideas?

L
Kris Jurka
2007-10-15 19:06:37 UTC
Permalink
Post by Laurent Duperval
I have a large amount of tests I run in Eclipse to test my application.
Many of them create and delete a lot of information in PG and at some
point, PG will crash and restart.
Server process exited with exit code -1073741502
This is likely a server bug. If you can isolate the failing test and
extract a self-contained example, someone can probably fix it.

Kris Jurka

Laurent Duperval
2007-10-16 16:46:42 UTC
Permalink
Post by Kris Jurka
Post by Laurent Duperval
Server process exited with exit code -1073741502
This is likely a server bug. If you can isolate the failing test and
extract a self contained example someone can probably fix it.
It seems to be some sort of interaction between Eclipse and Junit/Postgres
driver. When I run my tests, just before the server crash, I have dozens
and dozens of spawned Postgres processes. When the crash occurs, all
processes are killed and restarted again. And this process continues until
the tests complete.

When I run the tests from an ant script I also see some spawned processes,
but nothing like running it in Eclipse.

If I run each test case separately, I don't see this issue. But when I run
them as a whole (i.e. run all tests defined in my application) I get the
same error every time.

L
Alvaro Herrera
2007-10-16 22:20:10 UTC
Permalink
Post by Laurent Duperval
Post by Kris Jurka
Post by Laurent Duperval
Server process exited with exit code -1073741502
This is likely a server bug. If you can isolate the failing test and
extract a self contained example someone can probably fix it.
It seems to be some sort of interaction between Eclipse and Junit/Postgres
driver. When I run my tests, just before the server crash, I have dozens
and dozens of spawned Postgres processes. When the crash occurs, all
processes are killed and restarted again. And this process continues until
the tests complete.
The fact that all Postgres processes disappear is normal. Postgres
itself (more precisely, the postmaster process) kills all server
processes when one of them dies unexpectedly.
Post by Laurent Duperval
When I run the tests from an ant script I also see some spawned processes,
but nothing like running it in Eclipse.
If I run each test case separately, I don't see this issue. But when I run
them as a whole (i.e. run all tests defined in my application) I get the
same error every time.
Maybe Eclipse is trying to run more of them at a time than ant, and the
extra concurrency is killing the server for some reason. Was this
compiled with Cygwin, or is it the native (mingw) version?
--
Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J
"La vida es para el que se aventura"

Tom Lane
2007-10-16 23:01:07 UTC
Permalink
Post by Alvaro Herrera
Maybe Eclipse is trying to run more of them at a time than ant, and the
extra concurrency is killing the server for some reason. Was this
compiled with Cygwin, or is it the native (mingw) version?
Don't both those builds have some hard-wired upper limit on the number of
child processes? I wonder what Laurent has max_connections set to...
if it's larger than the build could actually support, perhaps this
behavior would be the result.

regards, tom lane

Trevor Talbot
2007-10-16 23:02:32 UTC
Permalink
Post by Alvaro Herrera
Post by Laurent Duperval
Post by Laurent Duperval
Server process exited with exit code -1073741502
FYI, this exit code (0xC0000142, STATUS_DLL_INIT_FAILED) means a DLL's
initialization routine indicated failure during process startup.
Post by Alvaro Herrera
Post by Laurent Duperval
If I run each test case separately, I don't see this issue. But when I run
them as a whole (i.e. run all tests defined in my application) I get the
same error every time.
Maybe Eclipse is trying to run more of them at a time than ant, and the
extra concurrency is killing the server for some reason.
Sounds likely.

Laurent Duperval
2007-10-17 02:21:41 UTC
Permalink
Hi,

Sorry for top-posting, but I am answering questions that don't all
appear in this message:

- I installed the default download of Postgres. I didn't compile it myself,
so it's probably the mingw version

- max_connections is set to 500. I did that originally because I kept
seeing a message about no connection available and I thought it was
because I was not allocating enough connections. My machine has 2GB of RAM.

- How do I determine what DLL is failing and what is causing it to fail in
its initialization routine?

Thanks,

L
Post by Trevor Talbot
Post by Alvaro Herrera
Post by Laurent Duperval
Post by Laurent Duperval
Server process exited with exit code -1073741502
FYI, this exit code means a DLL's initialization routine indicated
failure during process startup.
Post by Alvaro Herrera
Post by Laurent Duperval
If I run each test case separately, I don't see this issue. But when I run
them as a whole (i.e. run all tests defined in my application) I get the
same error every time.
Maybe Eclipse is trying to run more of them at a time than ant, and the
extra concurrency is killing the server for some reason.
Sounds likely.
Laurent Duperval
2007-10-16 18:24:20 UTC
Permalink
I will add that speed may be a factor also. When I increase the amount of
logging by the PG server, I see the problem less often.

L
Post by Laurent Duperval
Hi,
I have a large amount of tests I run in Eclipse to test my application.
Many of them create and delete a lot of information in PG and at some
point, PG will crash and restart.
Server process exited with exit code -1073741502
.
.
.
Terminating connection because of crash of another server process
- I am using Windows XP
- I have 2 GB of memory
- I am using JPA/Hibernate3 and the Postgres Java driver
Any ideas?
L
Magnus Hagander
2007-10-17 05:20:56 UTC
Permalink
Post by Laurent Duperval
Hi,
Sorry for top-posting but since I am answering questions that don't all
- I installed the default download of Postgres. I didn't compile myself,
so it's probably the mingw version
It is.
Post by Laurent Duperval
- Max_connections is set to 500. I did that originally because I kept
seeing a message about no connection available and I thought it was
because I was not allocating enough connections. My machine has 2GB of RAM.
There's your problem. 500 is way above what the Windows version can handle. IIRC the hard max is somewhere around 200, depending on some OS factors that we don't entirely understand. I'd never recommend going above 100-150; with no more than 2GB of RAM, not above 100.

You'll need to figure out what's eating all your connections - it sounds like it's not entirely expected. Perhaps connections are being leaked somewhere?
Post by Laurent Duperval
- How do I determine what DLL is failing and what is causing it to fail in
its initialization routine?
You really can't in this case, but if you could, it wouldn't help you. It's Windows running out of global resources.

/Magnus

Tom Lane
2007-10-17 06:40:14 UTC
Permalink
Post by Magnus Hagander
Post by Laurent Duperval
- Max_connections is set to 500.
There's your problem. 500 is way above what the windows version can
handle. IIRC the hard max is somewhere around 200 depending on some OS
factors that we don't entirely know.
Maybe we should put an #ifdef WIN32 into guc.c to limit max_connections
to something we know the platform can stand? It'd be more comfortable
if we understood exactly where the limit was, but I think I'd rather
have an "I'm sorry Dave, I can't do that" than random-seeming crashes.

regards, tom lane

Magnus Hagander
2007-10-17 07:22:11 UTC
Permalink
Post by Tom Lane
Post by Magnus Hagander
Post by Laurent Duperval
- Max_connections is set to 500.
There's your problem. 500 is way above what the windows version can
handle. IIRC the hard max is somewhere around 200 depending on some OS
factors that we don't entirely know.
Maybe we should put an #ifdef WIN32 into guc.c to limit max_connections
to something we know the platform can stand? It'd be more comfortable
if we understood exactly where the limit was, but I think I'd rather
have an "I'm sorry Dave, I can't do that" than random-seeming crashes.
Yeah, that's probably a good idea - except we never managed to figure out
where the limit is. It appears to vary pretty wildly between different
machines, for reasons we don't really understand (total RAM has some effect
on it, but it's not the only factor, for example).

//Magnus

Joshua D. Drake
2007-10-19 19:48:55 UTC
Permalink
On Wed, 17 Oct 2007 09:22:11 +0200
Post by Magnus Hagander
Post by Tom Lane
Maybe we should put an #ifdef WIN32 into guc.c to limit
max_connections to something we know the platform can stand? It'd
be more comfortable if we understood exactly where the limit was,
but I think I'd rather have an "I'm sorry Dave, I can't do that"
than random-seeming crashes.
Yeayh, that's probably a good idea - except we never managed to
figure out where the limit is. It appears to vary pretty wildly
between different machines, for reasons we don't really know why
(total RAM has some effect on it, but that's not the only one, for
example)
How about we just emit a warning:

WARNING: Connections above 250 on Windows platforms may have
unpredictable results.
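
A sketch of what that could look like at postmaster startup (hypothetical, not committed code; the 250 threshold is just the suggestion above, not a verified limit):

#ifdef WIN32
    if (MaxConnections > 250)
        ereport(WARNING,
                (errmsg("connections above 250 on Windows platforms may have unpredictable results")));
#endif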

Joshua D. Drake
Shelby Cain
2007-10-20 18:29:15 UTC
Permalink
Post by Joshua D. Drake
How about we just emit a warning..
WARNING: Connections above 250 on Windows platforms may have
unpredictable results.
Joshua D. Drake
I'd personally vote for a lower warning limit like 175, as I can
consistently crash PostgreSQL on a Windows system right around the 200th
connection.

Regards,

Shelby Cain

Trevor Talbot
2007-10-20 18:44:52 UTC
Permalink
Post by Shelby Cain
I'd personally vote for a lower warning limit like 175 as I can
consistently crash Postgresql on Windows system right around the 200th
connection.
What error gets logged for your crashes?

Shelby Cain
2007-10-20 21:01:19 UTC
Permalink
Post by Trevor Talbot
Post by Shelby Cain
I'd personally vote for a lower warning limit like 175, as I can
consistently crash PostgreSQL on a Windows system right around the 200th
connection.
What error gets logged for your crashes?
It's been a while but IIRC there wasn't anything in the logs other than
an entry noting that a backend had crashed unexpectedly so the
postmaster was restarting all active backends. I can trivially
reproduce it at work on my workstation if you need the exact error
text.

Regards,

Shelby Cain

Trevor Talbot
2007-10-20 21:25:28 UTC
Permalink
Post by Shelby Cain
Post by Trevor Talbot
Post by Shelby Cain
I'd personally vote for a lower warning limit like 175, as I can
consistently crash PostgreSQL on a Windows system right around the 200th
connection.
What error gets logged for your crashes?
It's been a while but IIRC there wasn't anything in the logs other than
an entry noting that a backend had crashed unexpectedly so the
postmaster was restarting all active backends. I can trivially
reproduce it at work on my workstation if you need the exact error
text.
I think it would be useful; if nothing else, maybe it'll tell us if
you can see the same problem Laurent does, or if it's a different
limit entirely.

Rainer Bauer
2007-10-20 21:47:05 UTC
Permalink
Post by Shelby Cain
Post by Trevor Talbot
Post by Shelby Cain
I'd personally vote for a lower warning limit like 175, as I can
consistently crash PostgreSQL on a Windows system right around the 200th
connection.
What error gets logged for your crashes?
It's been a while but IIRC there wasn't anything in the logs other than
an entry noting that a backend had crashed unexpectedly so the
postmaster was restarting all active backends. I can trivially
reproduce it at work on my workstation if you need the exact error
text.
I could reproduce this here:

Server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing
the request

2007-10-20 23:33:42 LOG: server process (PID 5240) exited with exit code
-1073741502


Shelby, are you using the /3GB switch by chance? This will halve the number of
available handles on your system.

Rainer

Trevor Talbot
2007-10-20 22:15:25 UTC
Permalink
Post by Rainer Bauer
Server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing
the request
2007-10-20 23:33:42 LOG: server process (PID 5240) exited with exit code
-1073741502
How?

Rainer Bauer
2007-10-20 22:27:41 UTC
Permalink
Every backend creates at least 4*<max_connections> semaphores.
Sorry, this must read <max_connections> semaphores, not 4 times.

Rainer


Rainer Bauer
2007-10-20 22:24:30 UTC
Permalink
Hello Trevor,
Post by Rainer Bauer
Server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing
the request
2007-10-20 23:33:42 LOG: server process (PID 5240) exited with exit code
-1073741502
TT> How?

Seems like the mailing list is not catching up fast enough (I am
posting through Usenet)...

Anyway, the problem is the number of semaphores created by Postgres:
every backend creates at least 4*<max_connections> semaphores. Just
increase <max_connections> to an unusually high value (say 10000) and
start creating new connections while monitoring the handle count.
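
If anyone wants to script that instead of watching Task Manager, here is a rough sketch (mine, not Rainer's) that polls a process's handle count once a second; pass the postmaster PID on the command line. GetProcessHandleCount() needs XP SP1 or later.

#define _WIN32_WINNT 0x0501
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    DWORD   pid;
    HANDLE  proc;

    if (argc < 2)
        return 1;
    pid = (DWORD) atoi(argv[1]);        /* postmaster PID */
    proc = OpenProcess(PROCESS_QUERY_INFORMATION, FALSE, pid);
    if (proc == NULL)
    {
        fprintf(stderr, "OpenProcess failed: %lu\n", GetLastError());
        return 1;
    }
    for (;;)
    {
        DWORD   handles = 0;

        if (GetProcessHandleCount(proc, &handles))
            printf("handles: %lu\n", handles);
        Sleep(1000);                    /* poll while connections are opened */
    }
}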

Rainer


Trevor Talbot
2007-10-21 03:55:01 UTC
Permalink
Post by Rainer Bauer
Every backend creates at least 4*<max_connections> semaphores. Just
increase <max_connections> to an unusual high value (say 10000) and
start creating new connections while monitoring the handle count.
Hmm, they're actually the same semaphores, so the only cost is for
slots in each process's handle table, which comes from kernel paged
pool. Testing shows I can easily create about 30 million handles to a
given object on this machine. This is under win2003 with 1.25GB RAM,
which gives it a paged pool limit of 352MB.

I tried going up to 20000 max_connections, and still blew postmaster's
VM space long before paged pool was exhausted. I couldn't test any
higher values, as there's some interaction between max_connections and
shared_buffers that prevents it from mapping the buffer contiguously.

Something's missing though, since I'm not hitting the same issue you
are. How are you generating the connections? I just have an app
calling PQconnectdb() in a loop, but I guess that's not good enough.
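
For reference, a rough reconstruction of that kind of test (a sketch, not the actual program; the connection string is an assumption):

#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    static PGconn *conns[10000];
    int     n = 0;

    while (n < 10000)
    {
        conns[n] = PQconnectdb("host=localhost dbname=postgres user=postgres");
        if (PQstatus(conns[n]) != CONNECTION_OK)
        {
            fprintf(stderr, "connection %d failed: %s",
                    n + 1, PQerrorMessage(conns[n]));
            PQfinish(conns[n]);
            break;
        }
        n++;
    }
    printf("opened %d connections before failure\n", n);
    while (n > 0)
        PQfinish(conns[--n]);
    return 0;
}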

Magnus Hagander
2007-10-21 08:46:50 UTC
Permalink
Post by Trevor Talbot
Post by Rainer Bauer
Every backend creates at least 4*<max_connections> semaphores. Just
increase <max_connections> to an unusual high value (say 10000) and
start creating new connections while monitoring the handle count.
Hmm, they're actually the same semaphores, so the only cost is for
slots in each process's handle table, which comes from kernel paged
pool. Testing shows I can easily create about 30 million handles to a
given object on this machine. This is under win2003 with 1.25GB RAM,
which gives it a paged pool limit of 352MB.
I tried going up to 20000 max_connections, and still blew postmaster's
VM space long before paged pool was exhausted. I couldn't test any
higher values, as there's some interaction between max_connections and
shared_buffers that prevents it from mapping the buffer contiguously.
Something's missing though, since I'm not hitting the same issue you
are. How are you generating the connections? I just have an app
calling PQconnectdb() in a loop, but I guess that's not good enough.
Yeah, something is obviously missing. Are you guys on exactly the
same Windows versions, WRT both version and service pack? Anybody on x64
Windows?

Another thing worth testing - check if the amount of shared memory used
makes a noticeable difference. Try both very small and very large values.

I don't think the paged pool is the problem - I think it's the nonpaged
pool. Would be interesting to track that one in the failing case (using
performance monitor, up to the point where it fails). And the nonpaged
one is smaller... If that looks like it's the problem, it could be
helpful to do a pooltag trace on it (see for example
http://blogs.msdn.com/ntdebugging/archive/2006/12/18/Understanding-Pool-Consumption-and-Event-ID_3A00_--2020-or-2019.aspx)
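
As an alternative to watching Performance Monitor by hand, here is a small sketch (mine, not from the thread) that samples both kernel pools and the system-wide handle count via GetPerformanceInfo(); link with psapi.lib.

#include <windows.h>
#include <psapi.h>
#include <stdio.h>

int main(void)
{
    PERFORMANCE_INFORMATION pi;

    pi.cb = sizeof(pi);
    for (;;)
    {
        if (GetPerformanceInfo(&pi, sizeof(pi)))
            printf("paged pool: %lu KB, nonpaged pool: %lu KB, handles: %lu\n",
                   (unsigned long) (pi.KernelPaged * pi.PageSize / 1024),
                   (unsigned long) (pi.KernelNonpaged * pi.PageSize / 1024),
                   (unsigned long) pi.HandleCount);
        Sleep(1000);        /* sample once per second up to the failure point */
    }
}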

//Magnus

Rainer Bauer
2007-10-21 19:43:27 UTC
Permalink
Post by Magnus Hagander
Post by Trevor Talbot
Post by Rainer Bauer
Every backend creates at least 4*<max_connections> semaphores. Just
increase <max_connections> to an unusual high value (say 10000) and
start creating new connections while monitoring the handle count.
Hmm, they're actually the same semaphores, so the only cost is for
slots in each process's handle table, which comes from kernel paged
pool. Testing shows I can easily create about 30 million handles to a
given object on this machine. This is under win2003 with 1.25GB RAM,
which gives it a paged pool limit of 352MB.
On my system I can only create about 4 million semaphores.
Post by Magnus Hagander
Post by Trevor Talbot
I tried going up to 20000 max_connections, and still blew postmaster's
VM space long before paged pool was exhausted. I couldn't test any
higher values, as there's some interaction between max_connections and
shared_buffers that prevents it from mapping the buffer contiguously.
Something's missing though, since I'm not hitting the same issue you
are. How are you generating the connections? I just have an app
calling PQconnectdb() in a loop, but I guess that's not good enough.
I am using the ASCII version of the psqlODBC driver version 8.2.4.2 to
establish the test connections.
Post by Magnus Hagander
Yeah, something is obviously missing.. Are you guys on the exactly the
same Windows versions? WRT both version and servivepack. Anybody on x64
windows?
No, I am using WinXP SP2 32 bit with 2GB RAM.

These are my altered settings from the default 8.2.5 Postgres installation:
ssl = on
shared_buffers = 512MB
work_mem = 16MB
maintenance_work_mem = 256MB
wal_sync_method = fsync_writethrough
checkpoint_segments = 15
checkpoint_timeout = 30min
random_page_cost = 3.0
effective_cache_size = 1GB
autovacuum_vacuum_scale_factor = 0.10
autovacuum_analyze_scale_factor = 0.05
Post by Magnus Hagander
Another thing worth testing - check if the amount of shared memory used
makes a noticable difference. Try both very small and very large values.
Well, I tried different shared_buffers settings, but the result was consistent:
with max_connections set to 10000, I can create 150 database connections.

However, I checked the handle count at the moment the last connection fails
and it is only at 1.5 million. So it seems the handles are not the primary
problem.

Let me know if you want any other tests performed on this machine. I also have
VS2005 installed, but until now I haven't compiled Postgres here (I was
waiting for 8.3 which fully supports building with VS).

Rainer

Magnus Hagander
2007-10-22 08:23:16 UTC
Permalink
Post by Rainer Bauer
Post by Magnus Hagander
Post by Trevor Talbot
Post by Rainer Bauer
Every backend creates at least 4*<max_connections> semaphores. Just
increase <max_connections> to an unusual high value (say 10000) and
start creating new connections while monitoring the handle count.
Hmm, they're actually the same semaphores, so the only cost is for
slots in each process's handle table, which comes from kernel paged
pool. Testing shows I can easily create about 30 million handles to a
given object on this machine. This is under win2003 with 1.25GB RAM,
which gives it a paged pool limit of 352MB.
On my system I can only create about 4 millions semaphores.
Is that 4 million semaphores, or 4 million handles to a smaller number of
semaphores?
Post by Rainer Bauer
Post by Magnus Hagander
Post by Trevor Talbot
I tried going up to 20000 max_connections, and still blew postmaster's
VM space long before paged pool was exhausted. I couldn't test any
higher values, as there's some interaction between max_connections and
shared_buffers that prevents it from mapping the buffer contiguously.
Something's missing though, since I'm not hitting the same issue you
are. How are you generating the connections? I just have an app
calling PQconnectdb() in a loop, but I guess that's not good enough.
I am using the ASCII version of the psqlODBC driver version 8.2.4.2 to
establish the test connections.
Could you try the same tests with the client running on a different system?
Since the client eats up a bunch of handles and such as well, and that
would eliminate the difference due to different clients.
Post by Rainer Bauer
Post by Magnus Hagander
Yeah, something is obviously missing.. Are you guys on the exactly the
same Windows versions? WRT both version and servivepack. Anybody on x64
windows?
No, I am using WinXP SP2 32 bit with 2GB RAM.
Ok. So one is on XP and one is on 2003. That's interesting - given that 2003
is tuned towards servers, it doesn't surprise me that it allows more
clients before breaking.
Post by Rainer Bauer
ssl = on
Does it make a difference if you turn this off?
Post by Rainer Bauer
shared_buffers = 512MB
As a general note, this is *way* too high. All evidence I've seen points to
keeping shared_buffers as *small* as possible on win32, because memory access
there is slow, and leaving more of the caching up to the OS.
Post by Rainer Bauer
work_mem = 16MB
maintenance_work_mem = 256MB
wal_sync_method = fsync_writethrough
checkpoint_segments = 15
checkpoint_timeout = 30min
random_page_cost = 3.0
effective_cache_size = 1GB
autovacuum_vacuum_scale_factor = 0.10
autovacuum_analyze_scale_factor = 0.05
None of those should make a difference on this.
Post by Rainer Bauer
Post by Magnus Hagander
Another thing worth testing - check if the amount of shared memory used
makes a noticable difference. Try both very small and very large values.
with max_connections set to 10000, I can create 150 database connections.
Ok. But if you decrease max_connections, you can have more connections? Or
the other way around?
Post by Rainer Bauer
However, I checked the handle count at the moment the last connection fails
and it is only at 1,5 million. So it seems the handles are not the primary
problem.
Good, it shouldn't be, but it's good to have that confirmed.

/Magnus

Magnus Hagander
2007-10-22 08:41:14 UTC
Permalink
Post by Magnus Hagander
Post by Rainer Bauer
Post by Trevor Talbot
I tried going up to 20000 max_connections, and still blew postmaster's
VM space long before paged pool was exhausted. I couldn't test any
higher values, as there's some interaction between max_connections and
shared_buffers that prevents it from mapping the buffer contiguously.
Something's missing though, since I'm not hitting the same issue you
are. How are you generating the connections? I just have an app
calling PQconnectdb() in a loop, but I guess that's not good enough.
I am using the ASCII version of the psqlODBC driver version 8.2.4.2 to
establish the test connections.
Could you try the same tests with the client running on a different system?
Since the client eats up a bunch of handles and such as well, and that
would eliminate the difference due to different clients.
Followup, when running these tests, could you check using Process Explorer
if you're hitting close to the limit of either of the two pools? See
http://blogs.technet.com/askperf/archive/2007/03/07/memory-management-understanding-pool-resources.aspx

//Magnus

Magnus Hagander
2007-10-22 12:41:42 UTC
Permalink
Post by Magnus Hagander
Post by Magnus Hagander
Post by Rainer Bauer
Post by Trevor Talbot
I tried going up to 20000 max_connections, and still blew postmaster's
VM space long before paged pool was exhausted. I couldn't test any
higher values, as there's some interaction between max_connections and
shared_buffers that prevents it from mapping the buffer contiguously.
Something's missing though, since I'm not hitting the same issue you
are. How are you generating the connections? I just have an app
calling PQconnectdb() in a loop, but I guess that's not good enough.
I am using the ASCII version of the psqlODBC driver version 8.2.4.2 to
establish the test connections.
Could you try the same tests with the client running on a different system?
Since the client eats up a bunch of handles and such as well, and that
would eliminate the difference due to different clients.
Followup, when running these tests, could you check using Process Explorer
if you're hitting close to the limit of either of the two pools? See
http://blogs.technet.com/askperf/archive/2007/03/07/memory-management-understanding-pool-resources.aspx
Another followup. Been working with Dave on and off today (well, him mostly
on to be honest, me a bit more on and off), and it seems that both our
repros clearly blame the desktop heap, and nothing else. Please use the
desktop heap tool and see if it breaks when the desktop heap usage
approaches 100%:

http://www.microsoft.com/downloads/details.aspx?familyid=5cfc9b74-97aa-4510-b4b9-b2dc98c8ed8b&displaylang=en


It'd still be good to know why the difference is so big between your two
systems.


//Magnus

Dave Page
2007-10-22 13:33:18 UTC
Permalink
Post by Magnus Hagander
Another followup. Been working with Dave on and off today (well, him mostly
on to be honest, me a bit more on and off), and it seems that both our
repros clearly blame the desktop heap, and nothing else. Please use the
desktop heap tool and see if it breaks when the desktop heap usage
http://www.microsoft.com/downloads/details.aspx?familyid=5cfc9b74-97aa-4510-b4b9-b2dc98c8ed8b&displaylang=en
It'd still be good to know why the difference is so big between your two
systems.
Further info on this for the record - on XP Pro (which I'm testing on),
the desktop heap size defaults to 512KB for non-interactive sessions and
3072KB for interactive. In testing I find that I can get up to around 46
or so connections when running as a service before desktop heap is
exhausted and postgres dies.

When running interactively I can get a little over 125 connections
before things start dying; however, in that case it's *not* because of
desktop heap, because I can start a second cluster and run 125
connections on each simultaneously.

If I run three instances up together, one of them will die as soon as
desktop heap gets to 100% usage.

So, we seem to be hitting two limits here - the desktop heap, and
something else which is cluster-specific. Investigation continues...

Regards, Dave

Rainer Bauer
2007-10-22 14:03:42 UTC
Permalink
Post by Dave Page
So, we seem to be hitting two limits here - the desktop heap, and
something else which is cluster-specific. Investigation continues...
I will run these tests tonight or tomorrow morning and will let you know.

Rainer

Dave Page
2007-10-22 14:38:07 UTC
Permalink
Post by Dave Page
So, we seem to be hitting two limits here - the desktop heap, and
something else which is cluster-specific. Investigation continues...
As further info, I've been testing this with the 8.3b1 release build
that we put out with pgInstaller, and a build with all optional
dependencies (OpenSSL, Kerberos, gettext, ldap etc.) disabled. I'm seeing
pretty much the same results with each - roughly 9.6KB of desktop heap
used per connection.

In addition, I've tried with standard pgbench runs, as well as a script
that just does 'select version()'. Again, no differences were observed.

Magnus and I did observe that we're using 1 user object and 4 GDI
objects per connection. If anyone happens to know how we might identify
those, please shout as so far we've drawn a blank :-(

Regards, Dave


Trevor Talbot
2007-10-22 15:04:03 UTC
Permalink
Post by Dave Page
Post by Dave Page
So, we seem to be hitting two limits here - the desktop heap, and
something else which is cluster-specific. Investigation continues...
In further info, I've been testing this with the 8.3b1 release build
that we put out with pgInstaller, and a build with all optional
dependencies (OpenSSL, Kerberos, gettext, ldap etc) disabled. I'm seeing
pretty much the same results with each - roughly 9.6KB of desktop heap
used per connection.
The question is where that's coming from. I wondered if it was
desktop heap originally, but there's no reason it should be using it,
and that seems to be precisely the difference between my system and
the others. Connections here are barely making a dent; at 490 there's
an entire 45KB committed in the service desktop.
Post by Dave Page
Magnus and I did observe that we're using 1 user object and 4 GDI
objects per connection. If anyone happens to know how we might identify
those, please shout as so far we've drawn a blank :-(
Those appear to belong to the console window.

I've yet to do anything that generates real load (lightweight system),
but a simple "select version()" doesn't make any difference here
either, and raising shared buffers just makes postmaster run out of VM
space faster. (I don't think I mentioned that error before, but it
shows up as "FATAL: could not create sigchld waiter thread: error
code 8".)

Magnus Hagander
2007-10-22 15:09:20 UTC
Permalink
Post by Trevor Talbot
Post by Dave Page
Post by Dave Page
So, we seem to be hitting two limits here - the desktop heap, and
something else which is cluster-specific. Investigation continues...
In further info, I've been testing this with the 8.3b1 release build
that we put out with pgInstaller, and a build with all optional
dependencies (OpenSSL, Kerberos, gettext, ldap etc) disabled. I'm seeing
pretty much the same results with each - roughtly 9.6KB of desktop heap
used per connection.
The question is where that's coming from. I wondered if it was
desktop heap originally, but there's no reason it should be using it,
and that seems to be precisely the difference between my system and
the others. Connections here are barely making a dent; at 490 there's
an entire 45KB committed in the service desktop.
Yes, that would be very interesting to know. Because obviously it's
something.

I read somewhere that Vista makes the size of the desktop heap dynamic, but
you were on 2003, right?

Are you running the server as a service or from the command prompt?
Post by Trevor Talbot
Post by Dave Page
Magnus and I did observe that we're using 1 user object and 4 GDI
objects per connection. If anyone happens to know how we might identify
those, please shout as so far we've drawn a blank :-(
Those appear to belong to the console window.
Makes sense - a window, a system menu, etc. There's probably a "hidden
console window" when running as a service...
Post by Trevor Talbot
I've yet to do anything that generates real load (lightweight system),
but a simple "select version()" doesn't make any difference here
either, and raising shared buffers just makes postmaster run out of VM
space faster. (I don't think I mentioned that error before, but it
shows up as "FATAL: could not create sigchld waiter thread: error
code 8".)
Yeah, that makes sense. We need to fix that, but I think that's too big of
a change to push during beta, given how few reports we've had of people
running into it.

//Magnus

Trevor Talbot
2007-10-22 15:30:59 UTC
Permalink
Post by Magnus Hagander
I read somewhere that Vista makes the size of the desktop heap dynamic, but
you were on 2003, right?
Yeah, 32bit 2003 SP2, which has the same limits as XP. It looks like
Vista also has the same limits on actual heap sizes, but manages
kernel address space dynamically, so it doesn't get stuck with
arbitrary limits there. I don't have a Vista machine to verify
though.
Post by Magnus Hagander
Are you running the server as a service or from the commandprompt?
Service, I've been using the standard MSI install of 8.2.5.
Post by Magnus Hagander
Post by Trevor Talbot
Post by Dave Page
Magnus and I did observe that we're using 1 user object and 4 GDI
objects per connection. If anyone happens to know how we might identify
those, please shout as so far we've drawn a blank :-(
Those appear to belong to the console window.
Makes sense - a Windows, a system menu, etc. There's probably a "hidden
console window" when running as a service...
Well, the only thing actually running as a service is pg_ctl; the
other processes just belong to the same desktop. They're all console
executables, so they get the usual objects, but they're not visible
anywhere.

It could be that there's a significant difference between XP and 2003
in how that's handled though. I do have an XP SP2 machine here with
512MB RAM, and I'll try tests on it as soon as I can free up what it's
currently occupied with.

Trevor Talbot
2007-10-22 15:59:18 UTC
Permalink
I wrote:

[ desktop heap usage ]
Post by Trevor Talbot
It could be that there's a significant difference between XP and 2003
in how that's handled though. I do have an XP SP2 machine here with
512MB RAM, and I'll try tests on it as soon as I can free up what it's
currently occupied with.
...yep, under XP I'm using about 3.1KB of the service heap per
connection, which tears through it quite a bit faster. Now to figure
out exactly where it's coming from...

Dave Page
2007-10-22 16:02:27 UTC
Permalink
Post by Trevor Talbot
[ desktop heap usage ]
Post by Trevor Talbot
It could be that there's a significant difference between XP and 2003
in how that's handled though. I do have an XP SP2 machine here with
512MB RAM, and I'll try tests on it as soon as I can free up what it's
currently occupied with.
...yep, under XP I'm using about 3.1KB of the service heap per
connection, which tears through it quite a bit faster. Now to figure
out exactly where it's coming from...
That ties up with what I'm seeing - on 8.3 it's about 9.6KB per
connection, and I get a little under a third as many connections as 8.2
before it dies.

/D

Rainer Bauer
2007-10-23 12:02:03 UTC
Permalink
Post by Trevor Talbot
[ desktop heap usage ]
Post by Trevor Talbot
It could be that there's a significant difference between XP and 2003
in how that's handled though. I do have an XP SP2 machine here with
512MB RAM, and I'll try tests on it as soon as I can free up what it's
currently occupied with.
...yep, under XP I'm using about 3.1KB of the service heap per
connection, which tears through it quite a bit faster. Now to figure
out exactly where it's coming from...
I can confirm this here (WinXP SP2).

I have restored the original postgresql.conf file that was created when the
cluster was initialized with Postgres 8.2.4-1 (the installed version now is
8.2.5-1). The only other change to this installation is that I have moved the
WAL directory pg_xlog to another drive using a junction link.

Here are my numbers from the Sysinternals System Information display:
Paged Limit: 364544KB [356MB]
Nonpaged Limit: 262144KB [256MB]
These limits are never reached.

Using the Desktop Heap Monitor, every new connection consumes 3232 bytes of
the 512KB total heap.
Post by Trevor Talbot
It could be that there's a significant difference between XP and 2003
in how that's handled though. I do have an XP SP2 machine here with
512MB RAM, and I'll try tests on it as soon as I can free up what it's
currently occupied with.
Yeah, Win2003 behaves differently according to this source:
<http://blogs.msdn.com/ntdebugging/archive/2007/01/04/desktop-heap-overview.aspx>

<quote>
Session paged pool allows session specific paged pool allocations. Windows XP
uses regular paged pool, since the number of remote desktop connections is
limited. On the other hand, Windows Server 2003 makes allocations from
session paged pool instead of regular paged pool if Terminal Services
(application server mode) is installed.
</quote>

After increasing the session heap size in the registry from 512KB to 1024KB,
the number of connections roughly doubled. So this might be a solution for
people running out of desktop heap.

Alter the value of the following key
<HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows>

The numeric values following "SharedSection=" control the heap management:
On WinXP these are the default values: "SharedSection=1024,3072,512"
Altering this to "SharedSection=1024,3072,1024" will increase the heap for all
non-interactive window stations to 1024KB.
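
A small read-back sketch (hypothetical, not from the thread) to confirm the edit took; it only reads the value location quoted above, and the change itself still needs a reboot to take effect.

#include <windows.h>
#include <stdio.h>

int main(void)
{
    HKEY    key;
    char    buf[2048];
    DWORD   len = sizeof(buf);

    if (RegOpenKeyExA(HKEY_LOCAL_MACHINE,
                      "SYSTEM\\CurrentControlSet\\Control\\Session Manager\\SubSystems",
                      0, KEY_READ, &key) != ERROR_SUCCESS)
    {
        fprintf(stderr, "could not open SubSystems key\n");
        return 1;
    }
    if (RegQueryValueExA(key, "Windows", NULL, NULL, (LPBYTE) buf, &len) == ERROR_SUCCESS)
        printf("%.*s\n", (int) len, buf);   /* the SharedSection=... settings are in this string */
    RegCloseKey(key);
    return 0;
}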

Rainer

Dave Page
2007-10-23 15:14:14 UTC
Permalink
Post by Rainer Bauer
Post by Trevor Talbot
...yep, under XP I'm using about 3.1KB of the service heap per
connection, which tears through it quite a bit faster. Now to figure
out exactly where it's coming from...
I can confirm this here (WinXP SP2).
It's coming from direct dependencies on user32.dll (from which we use
wsprintf()) and shell32.dll (from which we use SHGetSpecialFolderPath())
and is allocated when ResumeThread() is called to kickstart the new
backend, but before the backend actually does anything (proven with a
while(1) loop in main() for the -forkbackend case with a breakpoint on
ResumeThread() in the postmaster).

I've submitted a patch against 8.3 that removes these dependencies
altogether. Unfortunately, it seems we still have indirect dependencies
on user32.dll which I don't believe we can do anything about. In
testing, the patch reduces the per-connection desktop heap usage from
around 9.7KB to 3.2KB, which is back in line with 8.2.
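
To illustrate the kind of substitution such a patch makes (an illustrative sketch only, not the patch itself): wsprintf() is exported by user32.dll, so every backend that calls it pulls user32, and its per-process desktop heap setup, into the process, while the C runtime's snprintf() gives equivalent formatting with no user32 import.

#include <stdio.h>

/* hypothetical call site, for illustration only */
static void format_path(char *buf, size_t buflen, const char *dir, int pid)
{
    /* before: wsprintf(buf, "%s/%d", dir, pid);   -- links against user32.dll */
    snprintf(buf, buflen, "%s/%d", dir, pid);       /* after: CRT only */
}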

Regards, Dave

Harald Armin Massa
2007-10-23 15:32:29 UTC
Permalink
Dave,
Post by Dave Page
It's coming from direct dependencies on user32.dll (from which we use
wsprintf()) and shell32.dll (from which we use SHGetSpecialFolderPath())
and is allocated when ResumeThread() is called to kickstart the new
backend,
Why does every backend need its own heap for user32.dll or
shell32.dll? Wasn't the point of shared DLLs to be shared?

Harald
Gregory Stark
2007-10-23 15:49:35 UTC
Permalink
Post by Harald Armin Massa
Dave,
Post by Dave Page
It's coming from direct dependencies on user32.dll (from which we use
wsprintf()) and shell32.dll (from which we use SHGetSpecialFolderPath())
and is allocated when ResumeThread() is called to kickstart the new
backend,
why does every backend need its own heap for user32.dll or
shell32.dll? Wasn't the point of shared dlls to be shared?
The Desktop Heap appears to be a place for processes belonging to the same
"desktop" to allocate shared objects such as GUI elements. These are allocated
in shared space so they can be manipulated by any process running in that
"desktop".

Why Shell32 and User32 are allocating space in there just to initialize
themselves or handle these basic utility functions is a bit of a mystery.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

Harald Armin Massa
2007-10-23 16:11:12 UTC
Permalink
Post by Gregory Stark
Post by Harald Armin Massa
why does every backend need its own heap for user32.dll or
shell32.dll? Wasn't the point of shared dlls to be shared?
The Desktop Heap appears to be a place for processes belonging to the same
"desktop" to allocate shared objects such as GUI elements. These are allocated
in shared space so they can be manipulated by any process running in that
"desktop".
Using this knowledge and Dave's response, and looking back at "3.2KB
per backend", I stumbled upon this KB entry:

http://support.microsoft.com/?scid=kb%3Ben-us%3B184802&x=15&y=14

Please pay special attention to the following parts:

%SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows
SharedSection=1024,3072,512 Windows=On SubSystemType=Windows
ServerDll=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3
ServerDll=winsrv:ConServerDllInitialization,2 ProfileControl=Off
MaxRequestThreads=16

"""The second SharedSection value (3072) is the size of the desktop
heap for each desktop that is associated with the "interactive" window
station WinSta0."""

and further down:

"""All services that are executed under the LocalSystem account with
the Allow Service to Interact with Desktop startup option selected
will use "Winsta0\Default". All these processes will share the same
desktop heap associated with the "Default" application desktop."""

Postgres is definitely NOT started under the LocalSystem account; so,
applying a "logical not" to Microsoft's words, could that indicate the reason
why our service backends consume that memory? Add to this that MS SQL runs
as LocalSystem, and as far as I know so does Oracle.

Is this a line of thinking worth pursuing?

Harald
Harald Armin Massa
2007-10-23 16:13:44 UTC
Permalink
Replying to myself....
Post by Harald Armin Massa
Postgres is definitely NOT started as LocalSystem account; so using a
"logical not" on Microsofts Words that could indicate the reason why
our service-backends consume that memory? Add to this that MS SQL runs
as LocalSystem; and as much as I know also Oracle.
Just a few lines further down:

"""Every service process executed under a user account will receive a
new desktop in a noninteractive window station created by the Service
Control Manager (SCM). Thus, each service executed under a user
account will consume the number of kilobytes of desktop heap specified
in the third SharedSection value. All services executed under the
LocalSystem account with Allow Service to Interact with the Desktop
not selected share the desktop heap of the "Default" desktop in the
noninteractive service windows station (Service-0x0-3e7$)."""


It is exactly as suspected ... just starting the service allocates that heap.

Harald
Dave Page
2007-10-23 16:22:11 UTC
Permalink
Post by Harald Armin Massa
Replying to myself....
Post by Harald Armin Massa
Postgres is definitely NOT started as LocalSystem account; so using a
"logical not" on Microsofts Words that could indicate the reason why
our service-backends consume that memory? Add to this that MS SQL runs
as LocalSystem; and as much as I know also Oracle.
"""Every service process executed under a user account will receive a
new desktop in a noninteractive window station created by the Service
Control Manager (SCM). Thus, each service executed under a user
account will consume the number of kilobytes of desktop heap specified
in the third SharedSection value. All services executed under the
LocalSystem account with Allow Service to Interact with the Desktop
not selected share the desktop heap of the "Default" desktop in the
noninteractive service windows station (Service-0x0-3e7$)."""
it is exactly as suspected ... just starting the service allocates that heap
You're missing the point, I think. There's 48MB (IIRC) on XP that is
reserved for desktop heaps. From that, it allocates 64KB for
WinSta0\Disconnect, 128KB for WinSta0\Winlogon and 3072KB for
WinSta0\Default (i.e. the regular desktop). Each additional session
started by the SCM gets allocated the non-interactive default of 512KB.

It's not the 48MB we're running out of, it's the 512KB. That's why if
you look back in the thread, you'll see I found 8.3 was crashing with 46
connections when running as a service, but with much higher numbers of
connections when run from the logon session.

The reason why Microsoft services don't consume so much heap is that
they are multi-threaded, not multi-process, so they don't init user32.dll
etc. for each individual connection like we do, but only once for the
whole server.

/D

Trevor Talbot
2007-10-23 16:43:33 UTC
Permalink
Post by Harald Armin Massa
Post by Gregory Stark
The Desktop Heap appears to be a place for processes belonging to the same
"desktop" to allocate shared objects such as GUI elements. These are allocated
in shared space so they can be manipulated by any process running in that
"desktop".
Using this knowledge and Daves response, also looking back at "3,2kb
http://support.microsoft.com/?scid=kb%3Ben-us%3B184802&x=15&y=14
[...]
Post by Harald Armin Massa
Postgres is definitely NOT started as LocalSystem account; so using a
"logical not" on Microsofts Words that could indicate the reason why
our service-backends consume that memory? Add to this that MS SQL runs
as LocalSystem; and as much as I know also Oracle.
It's not quite what you think. The link Rainer posted upthread does a
decent job describing it, although there's still some room for
confusion: http://blogs.msdn.com/ntdebugging/archive/2007/01/04/desktop-heap-overview.aspx

The hierarchy of containers goes Session, Window Station, Desktop.
Everything relevant is under the same Session, so I'll ignore that for
now.

The console gets a Window Station; this is the interactive one since
the user sitting down works with it directly. It normally contains
one Desktop of interest (Default), which is what the user actually
sees. (It's possible to create multiple desktops as a framework for a
"virtual desktop" type of thing, but that's all third-party stuff.)

Each service registered with the Service Manager has a specific
account it logs in under. For each account, the Service Manager
creates a Window Station to contain it, and all services using the
same account share the default Desktop inside it. Most services run
under one of the 3 canned accounts, which is what that KB article is
talking about with the Local System bit.

Each Desktop created has a fixed-size chunk of memory allocated to it.
Desktops created under the interactive Window Station get the larger
chunk of memory (3072KB) since they expect to contain lots of UI
stuff. Desktops created under other Window Stations get the smaller
chunk of memory (512KB), since they aren't presenting a UI to the
user.

That fixed-size desktop heap is used to track objects handled by the
USER subsystem, which is mostly UI elements like windows and such.
Most of the API interaction for those resources go through user32.dll,
and apparently its initialization procedure grabs some of that heap
space for each process it's loaded into.

The PostgreSQL service is set to log in under its own account, so it
gets its own Window Station, and a default Desktop inside that. This
is a non-interactive Window Station, so the Desktop gets the smaller
heap. All postgres.exe processes run in that Desktop and share one
512KB heap. As each process ends up carving out a chunk of that
space, it uses up all 512KB and fails to create more backends.
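
As a side note, a quick sketch (mine, not part of the thread) that lets a process report which window station and desktop it actually landed in, which makes the above easy to verify for backends started by the service:

#include <windows.h>
#include <stdio.h>

static void print_name(HANDLE obj, const char *what)
{
    char    name[256];
    DWORD   needed = 0;

    if (GetUserObjectInformationA(obj, UOI_NAME, name, sizeof(name), &needed))
        printf("%s: %s\n", what, name);
}

int main(void)
{
    /* e.g. "WinSta0" interactively, or a Service-0x...$ station for services */
    print_name(GetProcessWindowStation(), "window station");
    /* e.g. "Default" */
    print_name(GetThreadDesktop(GetCurrentThreadId()), "desktop");
    return 0;
}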

Dave Page
2007-10-23 15:57:45 UTC
Permalink
Post by Harald Armin Massa
Dave,
Post by Dave Page
It's coming from direct dependencies on user32.dll (from which we use
wsprintf()) and shell32.dll (from which we use SHGetSpecialFolderPath())
and is allocated when ResumeThread() is called to kickstart the new
backend,
why does every backend need its own heap for user32.dll or
shell32.dll? Wasn't the point of shared dlls to be shared?
No idea, and I thought so.

It's quite easy to prove using the test program attached. Just monitor
the desktop heap with dheapmon (from Microsoft's website), and run the
program with a single command line argument to get it to spawn a 100
child processes. You can stop it loading various DLLs by commenting out
the dummy calls to functions in them and rebuilding.
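
For those without the attachment, here's a rough sketch of that kind of test
program (my reconstruction, not Dave's actual code; NUM_CHILDREN and the
sleep time are arbitrary). Build it against user32.lib and shell32.lib, run
it with any argument, and watch the heap with dheapmon:

#include <windows.h>
#include <shlobj.h>
#include <stdio.h>

#define NUM_CHILDREN 100

int main(int argc, char **argv)
{
    char buf[MAX_PATH];

    /* Dummy calls that pull in user32.dll and shell32.dll; comment them
     * out and rebuild to see the effect of each DLL on desktop heap. */
    wsprintfA(buf, "%d", 42);                                 /* user32.dll  */
    SHGetSpecialFolderPathA(NULL, buf, CSIDL_APPDATA, FALSE); /* shell32.dll */

    if (argc > 1)
    {
        /* Parent: spawn copies of ourselves with no arguments. */
        char exepath[MAX_PATH];
        int  i;

        GetModuleFileNameA(NULL, exepath, sizeof(exepath));
        for (i = 0; i < NUM_CHILDREN; i++)
        {
            STARTUPINFOA        si;
            PROCESS_INFORMATION pi;

            ZeroMemory(&si, sizeof(si));
            si.cb = sizeof(si);
            if (!CreateProcessA(exepath, NULL, NULL, NULL, FALSE,
                                0, NULL, NULL, &si, &pi))
            {
                fprintf(stderr, "CreateProcess failed at child %d: %lu\n",
                        i, GetLastError());
                break;
            }
            CloseHandle(pi.hThread);
            CloseHandle(pi.hProcess);
        }
        printf("spawned %d children; press Enter to exit\n", i);
        getchar();
    }
    else
    {
        /* Child: just stay alive so its heap usage can be measured. */
        Sleep(60000);
    }
    return 0;
}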

Of course, none of this would be an issue if we made the backend
multithreaded. :-)

I'll get my coat...

/D
Trevor Talbot
2007-10-23 17:07:14 UTC
Permalink
Post by Rainer Bauer
Post by Trevor Talbot
It could be that there's a significant difference between XP and 2003
in how that's handled though. I do have an XP SP2 machine here with
512MB RAM, and I'll try tests on it as soon as I can free up what it's
currently occupied with.
<http://blogs.msdn.com/ntdebugging/archive/2007/01/04/desktop-heap-overview.aspx>
<quote>
Session paged pool allows session specific paged pool allocations. Windows XP
uses regular paged pool, since the number of remote desktop connections is
limited. On the other hand, Windows Server 2003 makes allocations from
session paged pool instead of regular paged pool if Terminal Services
(application server mode) is installed.
</quote>
That's a little different. There's a specific range of kernel VM
space dedicated to session-specific data, so each session references
the same addresses but it can be backed by different physical memory
(same concept as separate processes). The session paged pool area of
that VM space is used to allocate the individual desktop heaps from.

It's saying that under XP, it's mapped to the main kernel paged pool,
while under 2003 TS it's mapped to session-specific memory, to avoid
depleting the main paged pool. (Each Terminal Services connection
creates an entire Session.) It doesn't change how desktop heap is
actually used though, which is the issue we're running into.

The system I'm testing on doesn't have Terminal Services running in
appserver mode.
Post by Rainer Bauer
After increasing the session heap size in the registry from 512KB to 1024KB
the no. of connections was roughly doubled. So this might be a solution for
people running out of Desktop heap.
Alter the value of the following key
<HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows>
On WinXP these are the default values: "SharedSection=1024,3072,512"
Altering this to "SharedSection=1024,3072,1024" will increase the heap for all
non-interactive window stations to 1024KB.
It's probably safe to do on a typical XP box, but it's unfortunately
not something you want the installer to do, or even suggest as blanket
advice. I also wondered about having postmaster create more desktops
on demand, but that has about the same amount of sanity (i.e. not
much).

I think it boils down to getting postgres to avoid using desktop heap
if at all possible, and if not, advising people to avoid XP for high
concurrency, except for suggesting the above change in specific
circumstances.

I suspect win2000 has the same issue, but I don't have a system to
test. It'd be interesting to know if 2000 Professional behaves any
differently than Server.

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org/
Magnus Hagander
2007-10-23 18:07:22 UTC
Permalink
Post by Rainer Bauer
After increasing the session heap size in the registry from 512KB to 1024KB
the no. of connections was roughly doubled. So this might be a solution for
people running out of Desktop heap.
Alter the value of the following key
<HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows>
On WinXP these are the default values: "SharedSection=1024,3072,512"
Altering this to "SharedSection=1024,3072,1024" will increase the heap for all
non-interactive window stations to 1024KB.
This part should go in the FAQ, I think. It's valid for 8.2 as well,
from what I can tell, and it's valid for 8.3 both before and after the
patch I just applied.

Dave, you're listed as maintainer :-P

//Magnus

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org/
Dave Page
2007-10-26 08:53:39 UTC
Permalink
Post by Magnus Hagander
Post by Rainer Bauer
After increasing the session heap size in the registry from 512KB to 1024KB
the no. of connections was roughly doubled. So this might be a solution for
people running out of Desktop heap.
Alter the value of the following key
<HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows>
On WinXP these are the default values: "SharedSection=1024,3072,512"
Altering this to "SharedSection=1024,3072,1024" will increase the heap for all
non-interactive window stations to 1024KB.
This part should go in the FAQ, I think. It's valid for 8.2 as well,
from what I can tell, and it's valid for 8.3 both before and after the
patch I just applied.
Dave, you're listed as maintainer :-P
done.

/D

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
Rainer Bauer
2007-10-26 17:09:26 UTC
Permalink
Post by Magnus Hagander
Post by Rainer Bauer
After increasing the session heap size in the registry from 512KB to 1024KB
the no. of connections was roughly doubled. So this might be a solution for
people running out of Desktop heap.
Alter the value of the following key
<HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows>
On WinXP these are the default values: "SharedSection=1024,3072,512"
Altering this to "SharedSection=1024,3072,1024" will increase the heap for all
non-interactive window stations to 1024KB.
This part should go in the FAQ, I think. It's valid for 8.2 as well,
from what I can tell, and it's valid for 8.3 both before and after the
patch I just applied.
Dave, you're listed as maintainer :-P
done.
Dave, could you add that it's the third parameter of the "SharedSection" string
that must be changed? I read that KB article, but still had to find the
correct one by trial and error, which required a reboot every time.
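
For what it's worth, here's a small sketch (hypothetical, not from the FAQ;
link against advapi32.lib) that just reads the value back, so you can see all
three SharedSection numbers and confirm that only the third one changed:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    HKEY  key;
    char  value[2048] = "";
    DWORD size = sizeof(value) - 1;

    if (RegOpenKeyExA(HKEY_LOCAL_MACHINE,
                      "SYSTEM\\CurrentControlSet\\Control\\Session Manager\\SubSystems",
                      0, KEY_READ, &key) != ERROR_SUCCESS)
    {
        fprintf(stderr, "could not open the SubSystems key\n");
        return 1;
    }

    /* The "Windows" value holds the long string containing
     * SharedSection=<system>,<interactive>,<non-interactive>. */
    if (RegQueryValueExA(key, "Windows", NULL, NULL,
                         (LPBYTE) value, &size) == ERROR_SUCCESS)
        printf("%s\n", value);

    RegCloseKey(key);
    return 0;
}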

Rainer

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org/
Laurent Duperval
2007-11-30 00:55:30 UTC
Permalink
Hi,

I'm not sure if this is good netiquette, or not. I'm reviving a month-old
thread, because I'm trying to figure out how to resolve the issue.

To summarize: when I run unit tests with eclipse (and with Ant) on
Windows, at some point, I run out of available connections. I tried
increasing the maximum number of connections, but then I started seeing
the postgres server die and restart.

I'm trying to fix this, yet again, but I don't have a clear idea of what
to fix.
Post by Magnus Hagander
Post by Rainer Bauer
After increasing the session heap size in the registry from 512KB to 1024KB
the no. of connections was roughly doubled. So this might be a solution for
people running out of Desktop heap.
Alter the value of the following key
<HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows>
On WinXP these are the default values: "SharedSection=1024,3072,512"
Altering this to "SharedSection=1024,3072,1024" will increase the heap for all
non-interactive window stations to 1024KB.
Does this allow creating more connections? At some point, the discussion
became too technical for me, and I could no longer tell if the answer was
for developers or for users.

I saw other messages dealing with semaphores/connection relations, etc.
But unless I really did not understand the discussion, none of them seemed
to address the issue I was seeing.

I'm thinking that the Java driver combined with Hibernate may be keeping
handles open for too long, because my tests aren't supposed to keep
connections open for very long. I would also expect the connections to
be either closed or released once the statements are executed.
Post by Magnus Hagander
This part should go in the FAQ, I think. It's
valid for 8.2 as well,
Post by Magnus Hagander
from what I can tell, and it's valid for 8.3 both before and after the
patch I just applied.
Dave, you're listed as maintainer :-P
//Magnus
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org/
--
Prenez la parole en public en étant Speak to an audience while being
moins nerveux et plus convaincant! less nervous and more convincing!
Abonnez-vous au bulletin gratuit! Sign up for the free newsletter!

http://www.duperval.com (514) 902-0186


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
Trevor Talbot
2007-11-30 05:51:02 UTC
Permalink
Post by Laurent Duperval
To summarize: when I run unit tests with eclipse (and with Ant) on
Windows, at some point, I run out of available connections. I tried
increasing the maximum number of connections, but then I started seeing
the postgres server die and restart.
The conclusion was that under Windows XP, postgres is normally limited
to a maximum of 125-150 connections. Raising max_connections higher
than that will lead to the crashes you saw.
Post by Laurent Duperval
Post by Rainer Bauer
After increasing the session heap size in the registry from 512KB to 1024KB
the no. of connections was roughly doubled. So this might be a solution for
people running out of Desktop heap.
Alter the value of the following key
<HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows>
On WinXP these are the default values: "SharedSection=1024,3072,512"
Altering this to "SharedSection=1024,3072,1024" will increase the heap for all
non-interactive window stations to 1024KB.
Does this allow creating more connections? At some point, the discussion
became too technical for me, and I no longer could tell if the answer was
for developers of for users.
Yes. After making that change and restarting Windows, postgres will be
able to safely handle 250-300 connections.
Post by Laurent Duperval
I saw other messages dealing with semaphores/connection relations, etc.
But unless I really did not understand the discussion, none of them seemed
to address the issue I was seeing.
Right, we were just trying to find the precise resource limit that was
causing the crash.
Post by Laurent Duperval
I'm thinking that the Java driver combined with Hibernate may be keeping
handles open for too long, because my tests aren't supposed to maintain
connections open for very long. I also would expect the connections to
either be closed or released once the statements are executed.
This is where I would start on your problem. Increasing the max
connections is one thing, but having so very many simultaneous
operations in progress on your database is probably not productive, as
it's likely to spend more time juggling tasks than actually performing
them.

I'm not familiar with Java tools, so someone else will have to chime
in with specific suggestions. It may be something as simple as
limiting how many tests JUnit/Ant tries to run at the same time, or
some parameter buried in Hibernate or the driver.

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
Dave Page
2007-11-30 09:51:07 UTC
Permalink
Post by Laurent Duperval
Does this allow creating more connections? At some point, the discussion
became too technical for me, and I no longer could tell if the answer was
for developers of for users.
Yeah, it did become something of an investigation into the problem which
probably should have been moved to -hackers.

I summarised the info in the FAQ
http://www.postgresql.org/docs/faqs.FAQ_windows.html#4.4 for user
consumption, and included a link to the MS Knowledgebase article that
shows what to tweak in the registry.
Post by Laurent Duperval
I saw other messages dealing with semaphores/connection relations, etc.
But unless I really did not understand the discussion, none of them seemed
to address the issue I was seeing.
Yes, that was all about how we were using threads to manage interprocess
communications. We found a far more efficient way to do that, but my
guess is that that's not your problem.
Post by Laurent Duperval
I'm thinking that the Java driver combined with Hibernate may be keeping
handles open for too long, because my tests aren't supposed to maintain
connections open for very long. I also would expect the connections to
either be closed or released once the statements are executed.
That could be an issue with Hibernate or the other code you're running,
but yes, if it's opening lots of connections and keeping them open, that
could be what's wrong, and I would suggest checking the FAQ above.

Regards, Dave


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Dave Page
2007-10-22 15:42:42 UTC
Permalink
Post by Trevor Talbot
The question is where that's coming from. I wondered if it was
desktop heap originally, but there's no reason it should be using it,
and that seems to be precisely the difference between my system and
the others. Connections here are barely making a dent; at 490 there's
an entire 45KB committed in the service desktop.
Hmm, Greg mentioned to me earlier that he was suspicious of SSPI, which
seems to drag in dependencies on gdi32.dll and user32.dll via
secur32.dll. Sure enough, testing with 8.2.5 on XP Pro, I get to 150
connections running as a service having used 97.2% of desktop heap (vs.
45 connections max with 8.3).

So we have a pretty serious regression in 8.3.

Of course, that still doesn't tally up with what you're seeing on
Win2k3. I'll test on that tomorrow.

Regards, Dave

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
Magnus Hagander
2007-10-22 16:35:34 UTC
Permalink
Post by Dave Page
Post by Trevor Talbot
The question is where that's coming from. I wondered if it was
desktop heap originally, but there's no reason it should be using it,
and that seems to be precisely the difference between my system and
the others. Connections here are barely making a dent; at 490 there's
an entire 45KB committed in the service desktop.
Hmm, Greg mentioned to me earlier that he was suspicious of SSPI which
seems to drag in dependencies on gdi32.dll and user32.dll via
secur32.dll. Sure enough, testing with 8.2.5 on XP Pro, I get to 150
connections running as a service having used 97.2 of desktop heap (vs.
45 connections max with 8.3).
So we have a pretty serious regression in 8.3.
Of course, that still doesn't tally up with what you're seeing on
Win2k3. I'll test on there tomorrow.
Could you try a build without SSPI? It should be as simple as removing
the #define ENABLE_SSPI 1 from port/win32.h. I don't think you need to
touch the linker lines at all, actually, so try without first.

//Magnus

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
Dave Page
2007-10-22 20:00:31 UTC
Permalink
Post by Magnus Hagander
Could you try a build without SSPI? It should be as simple as removing
the #define ENABLE_SSPI 1 from port/win32.h. I don't think you need to
touch the linker lines at all, actually, so try without first.
Nope, doesn't help - still using around 9.7KB per connection. Just to be
sure I did remove the link option, and checking with Depends I see that
there are now only delay load references to secur32.dll, nothing direct
- as is the case with 8.2.

So the only other changes I can think of that might affect things are
the VC++ build or the shared memory changes, though I can't see why they
would cause problems offhand. I'll go try a mingw build...

/D



---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
Magnus Hagander
2007-10-22 20:16:38 UTC
Permalink
Post by Dave Page
Post by Magnus Hagander
Could you try a build without SSPI? It should be as simple as removing
the #define ENABLE_SSPI 1 from port/win32.h. I don't think you need to
touch the linker lines at all, actually, so try without first.
Nope, doesn't help - still using around 9.7KB per connection. Just to be
sure I did remove the link option, and checking with depends see that
there are now only delay load references to secur32.dll, nothing direct
- as is the case with 8.2
ok. That makes sense, actually...
Post by Dave Page
So the only other changes I can think of that might affect things are
the VC++ build or the shared memory changes, though I can't see why they
would cause problems offhand. I'll go try a mingw build...
Yeah, it could be that the newer MSVCRT files do something we don't
like... Other than that, did we upgrade to a different version of some of
our dependencies?

Also, is this the DEBUG or RELEASE build of 8.3?

//Magnus

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
Dave Page
2007-10-22 20:23:14 UTC
Permalink
Post by Magnus Hagander
Yeah, it could be that the newer MSVCRT files do something we don't
like.. Other than that, did we upgrade to a different version of some of
our dependents?
Most of them - but my test build is without any of them:

our $config = {
asserts=>1, # --enable-cassert
integer_datetimes=>1, # --enable-integer-datetimes
nls=>undef, # --enable-nls=<path>
tcl=>undef, # --with-tcl=<path>
perl=>undef, # --with-perl
python=>undef, # --with-python=<path>
krb5=>undef, # --with-krb5=<path>
ldap=>0, # --with-ldap
openssl=>undef, # --with-ssl=<path>
xml=>undef,
xslt=>undef,
iconv=>undef,
zlib=>undef # --with-zlib=<path>
};
Post by Magnus Hagander
Also, is this the DEBUG or RELEASE build of 8.3?
Both behave similarly.

/D

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org/
Dave Page
2007-10-22 20:19:16 UTC
Permalink
Post by Dave Page
So the only other changes I can think of that might affect things are
the VC++ build or the shared memory changes, though I can't see why they
would cause problems offhand. I'll go try a mingw build...
mingw build of stock 8.3b1, no configure options specified at all,
consumes 3.2KB of desktop heap per connection.

So, it's either something we're doing differently with the VC++
compile/link options, or it's the VC8 runtimes using more resources.

Oh, and I still see the second limitation where it bombs out over about
125 connections, so that isn't build/runtime specific.

Shall we take this over to -hackers btw?

/D

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
Rainer Bauer
2007-10-22 14:03:35 UTC
Permalink
Post by Magnus Hagander
Post by Rainer Bauer
Post by Magnus Hagander
Post by Trevor Talbot
Post by Rainer Bauer
Every backend creates at least 4*<max_connections> semaphores. Just
increase <max_connections> to an unusual high value (say 10000) and
start creating new connections while monitoring the handle count.
Hmm, they're actually the same semaphores, so the only cost is for
slots in each process's handle table, which comes from kernel paged
pool. Testing shows I can easily create about 30 million handles to a
given object on this machine. This is under win2003 with 1.25GB RAM,
which gives it a paged pool limit of 352MB.
On my system I can only create about 4 million semaphores.
Is that 4 million semaphores, or 4 million handles to a smaller number of
semaphores?
No, 4 million distinct semaphores by calling:
CreateSemaphore( NULL, 0, 1, NULL );
Post by Magnus Hagander
Post by Rainer Bauer
Post by Magnus Hagander
Post by Trevor Talbot
I tried going up to 20000 max_connections, and still blew postmaster's
VM space long before paged pool was exhausted. I couldn't test any
higher values, as there's some interaction between max_connections and
shared_buffers that prevents it from mapping the buffer contiguously.
Something's missing though, since I'm not hitting the same issue you
are. How are you generating the connections? I just have an app
calling PQconnectdb() in a loop, but I guess that's not good enough.
I am using the ASCII version of the psqlODBC driver version 8.2.4.2 to
establish the test connections.
Could you try the same tests with the client running on a different system?
The client eats up a bunch of handles and such as well, and that
would eliminate the difference due to different clients.
Followup, when running these tests, could you check using Process Explorer
if you're hitting close to the limit of either of the two pools? See
http://blogs.technet.com/askperf/archive/2007/03/07/memory-management-understanding-pool-resources.aspx
Well, after installing Process Explorer and starting the system information
program, the kernel memory section shows me the current count, but not the
limits (it says "no symbols"). I am currently downloading the "Debugging Tools
for Windows". Maybe these limits are shown after the installation.

I just repeated the test with a local connection. After 150 connections, the
following values are displayed:
Paged physical 113000
Paged virtual 120000
Nonpaged 28000

Also there are 1,583,182 handles open.

I will check the behaviour with a remote connection later (have to go now...).
Post by Magnus Hagander
Post by Rainer Bauer
ssl = on
Does it make a difference if you turn this off?
Post by Rainer Bauer
shared_buffers = 512MB
As a general note, this is *way* too high. All the evidence I've seen suggests
that you should have shared_buffers as *small* as possible on win32,
because memory access there is slow, and leave more of the caching up to
the OS.
I followed Josh's advice here:
<http://archives.postgresql.org/pgsql-performance/2007-06/msg00606.php>

What value would you recommend then? The default 32MB?
Post by Magnus Hagander
Post by Rainer Bauer
ssl = on
Does it make a difference if you turn this off?
No.
Post by Magnus Hagander
Post by Rainer Bauer
Post by Magnus Hagander
Another thing worth testing - check if the amount of shared memory used
makes a noticable difference. Try both very small and very large values.
with max_connections set to 10000, I can create 150 database connections.
Ok. But if you decrease max_connections, you can have more connections? Or
the other way around?
A few tests indicated that the maximum no. of connections is 150, regardless
of the <max_connections> setting. But I will have to check whether this is
somehow caused by the ODBC driver.

Rainer

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
Trevor Talbot
2007-10-22 14:25:08 UTC
Permalink
Post by Rainer Bauer
Well after installing Postgres explorer and starting the system information
program the kernel memory section shows me the current count, but not the
limits (it says "no symbols"). I am currently downloading the "Debugging Tools
for Windows". Maybe these limits are shown after the installation.
After you install that, go to Options->Configure Symbols in Process
Explorer. Change it to use the dbghelp.dll installed in the Debugging
Tools directory, and configure the symbol path like this:
http://support.microsoft.com/kb/311503

The limits should show up after that.

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
Magnus Hagander
2007-10-22 14:33:17 UTC
Permalink
Post by Rainer Bauer
Post by Magnus Hagander
Post by Rainer Bauer
shared_buffers = 512MB
As a general note, thsi is *way* too high. All evidence I've seen points to
that you should have shared_buffers as *small* as possible on win32,
because memory access there is slow. And leave more of the caching up to
the OS.
<http://archives.postgresql.org/pgsql-performance/2007-06/msg00606.php>
What value would you recommend then? The default 32MB?
That advice is good - for Unix platforms. For Windows, yes, try with 32MB.


//Magnus

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
Shelby Cain
2007-10-20 22:26:32 UTC
Permalink
Post by Rainer Bauer
Server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing
the request
2007-10-20 23:33:42 LOG: server process (PID 5240) exited with exit code
-1073741502
Shelby, are you using the /3GB switch by chance? This will halve the no. of
available handles on your system.
Rainer
Probably not, although I haven't examined boot.ini. My workstation only
has 1.5 GB of RAM, so I'm highly doubtful that IBM would have configured
it to boot with the /3GB switch.

Regards,

Shelby Cain


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
Trevor Talbot
2007-10-20 16:40:20 UTC
Permalink
Post by Magnus Hagander
Post by Tom Lane
Maybe we should put an #ifdef WIN32 into guc.c to limit max_connections
to something we know the platform can stand? It'd be more comfortable
if we understood exactly where the limit was, but I think I'd rather
have an "I'm sorry Dave, I can't do that" than random-seeming crashes.
Yeah, that's probably a good idea - except we never managed to figure out
where the limit is. It appears to vary pretty wildly between different
machines, for reasons we don't really understand (total RAM has some effect
on it, but that's not the only factor, for example).
I tried generating idle connections in an effort to reproduce
Laurent's problem, but I ran into a local limit instead: for each
backend, postmaster creates a thread and burns 4MB of its 2GB address
space. It fails around 490.
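
(The generator was nothing fancier than an app calling PQconnectdb() in a
loop; a minimal sketch, with placeholder connection parameters:)

#include <stdio.h>
#include <libpq-fe.h>

#define MAX_CONNS 1000

int main(void)
{
    PGconn *conns[MAX_CONNS];
    int     i, opened = 0;

    for (i = 0; i < MAX_CONNS; i++)
    {
        /* Adjust host/dbname/user to taste; these are placeholders. */
        PGconn *c = PQconnectdb("host=localhost dbname=postgres user=postgres");

        if (PQstatus(c) != CONNECTION_OK)
        {
            fprintf(stderr, "failed at connection %d: %s",
                    i + 1, PQerrorMessage(c));
            PQfinish(c);
            break;
        }
        conns[opened++] = c;
    }

    printf("holding %d idle connections; press Enter to close them\n", opened);
    getchar();

    for (i = 0; i < opened; i++)
        PQfinish(conns[i]);
    return 0;
}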

Laurent's issue must depend on other load characteristics. It's
possible to get a trace of DLL loads, but I haven't found a
noninvasive way of doing that. It seems to require a debugger be
attached.

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org/
Magnus Hagander
2007-10-21 09:01:17 UTC
Permalink
Post by Trevor Talbot
Post by Magnus Hagander
Post by Tom Lane
Maybe we should put an #ifdef WIN32 into guc.c to limit max_connections
to something we know the platform can stand? It'd be more comfortable
if we understood exactly where the limit was, but I think I'd rather
have an "I'm sorry Dave, I can't do that" than random-seeming crashes.
Yeayh, that's probably a good idea - except we never managed to figure out
where the limit is. It appears to vary pretty wildly between different
machines, for reasons we don't really know why (total RAM has some effect
on it, but that's not the only one, for example)
I tried generating idle connections in an effort to reproduce
Laurent's problem, but I ran into a local limit instead: for each
backend, postmaster creates a thread and burns 4MB of its 2GB address
space. It fails around 490.
Oh, that's interesting. That's actually a side effect of us increasing
the stack size for the postgres.exe executable in order to work on other
things. By default, it burns 1MB/thread, but ours will do 4MB. I never
really thought about the problem that it'll run out of address space.
Unfortunately, that size can't be changed in the CreateThread() call -
only the initially committed size can be changed there.

There are two ways to get around it - one is not using a thread for each
backend, but a single thread that handles them all and then some sync
objects around it. We originally considered this but said we won't
bother changing it because the current way is simpler, and the overhead
of a thread is tiny compared to a process. I don't think anybody even
thought about the fact that it'd run you out of address space...

The other way is to finish off win64 support :-) Which I plan to look
at, but I don't think that alone should be considered a solution.

The question is whether it's worth fixing that part, if it will just fall
down for other reasons before we reach these 500 connections anyway. Can
you try having your program actually run some queries and so on, and not
just do a PQconnect, to see if it falls over then, because it's been
doing more?
Post by Trevor Talbot
Laurent's issue must depend on other load characteristics. It's
possible to get a trace of DLL loads, but I haven't found a
noninvasive way of doing that. It seems to require a debugger be
attached.
AFAIK, it does require that, yes.

//Magnus

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
Trevor Talbot
2007-10-22 14:43:35 UTC
Permalink
Post by Magnus Hagander
Post by Trevor Talbot
I tried generating idle connections in an effort to reproduce
Laurent's problem, but I ran into a local limit instead: for each
backend, postmaster creates a thread and burns 4MB of its 2GB address
space. It fails around 490.
Oh, that's interesting. That's actually a sideeffect of us increasing
the stack size for the postgres.exe executable in order to work on other
things. By default, it burns 1MB/thread, but ours will do 4MB. Never
really thought of the problem that it'll run out of address space.
Unfortunately, that size can't be changed in the CreateThread() call -
only the initially committed size can be changed there.
There are two ways to get around it - one is not using a thread for each
backend, but a single thread that handles them all and then some sync
objects around it. We originally considered this but said we won't
bother changing it because the current way is simpler, and the overhead
of a thread is tiny compared to a process. I don't think anybody even
thought about the fact that it'd run you out of address space...
I'd probably take the approach of combining win32_waitpid() and
threads. You'd end up with 1 thread per 64 backends; when something
interesting happens the thread could push the info onto a queue, which
the new win32_waitpid() would check. Use APCs to add new backends to
threads with free slots.
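
A rough sketch of what one of those waiter threads might look like (names
and types are invented here, this is not postmaster code; the pid queue and
the APC that adds new backends are assumed to live elsewhere):

#include <windows.h>

#define BATCH_SIZE MAXIMUM_WAIT_OBJECTS   /* 64 */

typedef struct
{
    HANDLE procs[BATCH_SIZE];   /* backend process handles */
    DWORD  pids[BATCH_SIZE];    /* matching pids */
    int    count;
} WaiterBatch;

/* Assumed to exist elsewhere: thread-safe queue read by win32_waitpid(). */
extern void push_dead_pid(DWORD pid);

static DWORD WINAPI waiter_thread(LPVOID arg)
{
    WaiterBatch *batch = (WaiterBatch *) arg;

    for (;;)
    {
        /* Alertable wait, so QueueUserAPC() can add a new backend to a
         * thread that still has free slots. */
        DWORD rc = WaitForMultipleObjectsEx(batch->count, batch->procs,
                                            FALSE, INFINITE, TRUE);

        if (rc == WAIT_IO_COMPLETION)
            continue;                       /* an APC adjusted the batch */

        if (rc < WAIT_OBJECT_0 + (DWORD) batch->count)
        {
            int idx = rc - WAIT_OBJECT_0;

            push_dead_pid(batch->pids[idx]);
            CloseHandle(batch->procs[idx]);

            /* Compact the arrays so we stop waiting on the dead handle. */
            batch->procs[idx] = batch->procs[batch->count - 1];
            batch->pids[idx]  = batch->pids[batch->count - 1];
            if (--batch->count == 0)
                return 0;                   /* nothing left to watch */
        }
        else
            return 1;                       /* WAIT_FAILED: give up */
    }
}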

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org/
Magnus Hagander
2007-10-22 19:19:22 UTC
Permalink
Post by Trevor Talbot
Post by Magnus Hagander
Post by Trevor Talbot
I tried generating idle connections in an effort to reproduce
Laurent's problem, but I ran into a local limit instead: for each
backend, postmaster creates a thread and burns 4MB of its 2GB address
space. It fails around 490.
Oh, that's interesting. That's actually a sideeffect of us increasing
the stack size for the postgres.exe executable in order to work on other
things. By default, it burns 1MB/thread, but ours will do 4MB. Never
really thought of the problem that it'll run out of address space.
Unfortunately, that size can't be changed in the CreateThread() call -
only the initially committed size can be changed there.
There are two ways to get around it - one is not using a thread for each
backend, but a single thread that handles them all and then some sync
objects around it. We originally considered this but said we won't
bother changing it because the current way is simpler, and the overhead
of a thread is tiny compared to a process. I don't think anybody even
thought about the fact that it'd run you out of address space...
I'd probably take the approach of combining win32_waitpid() and
threads. You'd end up with 1 thread per 64 backends; when something
interesting happens the thread could push the info onto a queue, which
the new win32_waitpid() would check. Use APCs to add new backends to
threads with free slots.
I was planning to make it even easier and let Windows do the job for us,
just using RegisterWaitForSingleObject(). Does the same - one thread per
64 backends, but we don't have to deal with the queueing ourselves.
Should be rather trivial to do.

Keeps win32_waitpid() unchanged.

That said, refactoring win32_waitpid() to be based on a queue might be a
good idea *anyway*. Have the callback from above put something in the
queue, and go with your idea for the rest.
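
Something along these lines, roughly (a sketch with invented names; the real
thing would also keep the returned wait handle around for UnregisterWait()
at shutdown):

#include <windows.h>

/* Assumed to exist elsewhere: queue a dead pid for win32_waitpid(). */
extern void push_dead_pid(DWORD pid);

/* Called by the thread pool when the backend's process handle is signalled. */
static VOID CALLBACK backend_exited(PVOID context, BOOLEAN timed_out)
{
    (void) timed_out;                   /* INFINITE timeout, never fires */
    push_dead_pid((DWORD)(ULONG_PTR) context);
}

/* Register a backend with the OS thread pool; one pool thread can wait
 * on up to 64 such handles. */
BOOL watch_backend(HANDLE proc_handle, DWORD pid)
{
    HANDLE wait_handle;

    return RegisterWaitForSingleObject(&wait_handle, proc_handle,
                                       backend_exited,
                                       (PVOID)(ULONG_PTR) pid,
                                       INFINITE, WT_EXECUTEONLYONCE);
}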

//Magnus

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ***@postgresql.org so that your
message can get through to the mailing list cleanly
Trevor Talbot
2007-10-22 20:19:24 UTC
Permalink
Post by Magnus Hagander
Post by Trevor Talbot
I'd probably take the approach of combining win32_waitpid() and
threads. You'd end up with 1 thread per 64 backends; when something
interesting happens the thread could push the info onto a queue, which
the new win32_waitpid() would check. Use APCs to add new backends to
threads with free slots.
I was planning to make it even easier and let Windows do the job for us,
just using RegisterWaitForSingleObject(). Does the same - one thread per
64 backends, but we don't have to deal with the queueing ourselves.
Oh, good call -- I keep forgetting the native thread pool exists.

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
Tom Lane
2007-10-22 20:26:10 UTC
Permalink
Post by Magnus Hagander
I was planning to make it even easier and let Windows do the job for us,
just using RegisterWaitForSingleObject(). Does the same - one thread per
64 backends, but we don't have to deal with the queueing ourselves.
Should be rather trivial to do.
How can that possibly work? Backends have to be able to run
concurrently, and I don't see how they'll do that if they share a stack.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
Magnus Hagander
2007-10-22 20:45:02 UTC
Permalink
Post by Tom Lane
Post by Magnus Hagander
I was planning to make it even easier and let Windows do the job for us,
just using RegisterWaitForSingleObject(). Does the same - one thread per
64 backends, but we don't have to deal with the queueing ourselves.
Should be rather trivial to do.
How can that possibly work? Backends have to be able to run
concurrently, and I don't see how they'll do that if they share a stack.
We're not talking about the backends, we're talking about the backend
waiter threads, whose sole purpose is to wait for a backend to die and
then raise a signal when it does. We can easily have the kernel wait for
a whole bunch of them at once, and have it call our callback function
whenever any one of them dies.

//Magnus

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
Tom Lane
2007-10-22 21:39:56 UTC
Permalink
Post by Magnus Hagander
We're not talking about the backends, we're talking about the backend
waiter threads whose sole purpose is to wait for a backend to die and
then raise a signal when it does.
Oh, OK, I had not twigged to exactly what the threads were being used
for. Never mind ...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
Trevor Talbot
2007-10-22 20:44:29 UTC
Permalink
Post by Tom Lane
Post by Magnus Hagander
I was planning to make it even easier and let Windows do the job for us,
just using RegisterWaitForSingleObject(). Does the same - one thread per
64 backends, but we don't have to deal with the queueing ourselves.
Should be rather trivial to do.
How can that possibly work? Backends have to be able to run
concurrently, and I don't see how they'll do that if they share a stack.
This is about what postmaster does for its SIGCHLD wait equivalent on
win32. The 64 comes from Windows' object/event mechanism, which lets
you perform a blocking wait on up to that many handles in a single
call. Currently postmaster is creating a new thread to wait on only
one backend at a time, so it ends up with too many threads.

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
Magnus Hagander
2007-10-22 18:52:33 UTC
Permalink
Post by Magnus Hagander
Oh, that's interesting. That's actually a sideeffect of us increasing
the stack size for the postgres.exe executable in order to work on other
things. By default, it burns 1MB/thread, but ours will do 4MB. Never
really thought of the problem that it'll run out of address space.
Unfortunately, that size can't be changed in the CreateThread() call -
only the initially committed size can be changed there.
Windows XP supports the STACK_SIZE_PARAM_IS_A_RESERVATION flag, which
apparently allows to reduce the reserved size. It might be better to do
this the other way round, though (leave the reservation at its 1 MB
default, and increase it only when necessary).
It does, but we still support Windows 2000 as well. I think it's better
to use a different method altogether - one not using one thread per child.

//Magnus

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
Florian Weimer
2007-10-22 18:48:34 UTC
Permalink
Post by Magnus Hagander
Oh, that's interesting. That's actually a sideeffect of us increasing
the stack size for the postgres.exe executable in order to work on other
things. By default, it burns 1MB/thread, but ours will do 4MB. Never
really thought of the problem that it'll run out of address space.
Unfortunately, that size can't be changed in the CreateThread() call -
only the initially committed size can be changed there.
Windows XP supports the STACK_SIZE_PARAM_IS_A_RESERVATION flag, which
apparently allows the reserved size to be reduced. It might be better to do
this the other way round, though (leave the reservation at its 1 MB
default, and increase it only when necessary).
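
For illustration (the thread function name is made up), the flag simply
changes the meaning of CreateThread()'s stack-size parameter from "initially
committed" to "reserved", so a waiter thread could reserve far less than the
executable's 4MB default:

/* STACK_SIZE_PARAM_IS_A_RESERVATION requires Windows XP or later. */
#define _WIN32_WINNT 0x0501
#include <windows.h>

DWORD WINAPI backend_waiter(LPVOID arg);    /* defined elsewhere */

HANDLE start_small_stack_thread(LPVOID arg)
{
    /* Reserve only 64KB of address space for this thread's stack. */
    return CreateThread(NULL, 64 * 1024, backend_waiter, arg,
                        STACK_SIZE_PARAM_IS_A_RESERVATION, NULL);
}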

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
Rainer Bauer
2007-10-20 11:22:56 UTC
Permalink
Post by Magnus Hagander
Post by Laurent Duperval
- Max_connections is set to 500. I did that originally because I kept
seeing a message about no connection available and I thought it was
because I was not allocating enough connections. My machine has 2GB of RAM.
There's your problem. 500 is way above what the Windows version can handle. IIRC the hard max is somewhere around 200, depending on some OS factors that we don't entirely understand. I'd never recommend going above 100-150. With no more than 2GB of RAM, not above 100.
My guess is that Windows is running out of handles. Each backend uses about
150 handles. 100 backends means 15000 handles. Depending on how many other
programs are currently running, the no. of startable backends will vary
with the total handle limit Windows imposes.

Rainer

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
Trevor Talbot
2007-10-20 16:53:03 UTC
Permalink
Post by Rainer Bauer
Post by Magnus Hagander
Post by Laurent Duperval
- Max_connections is set to 500. I did that originally because I kept
seeing a message about no connection available and I thought it was
because I was not allocating enough connections. My machine has 2GB of RAM.
There's your problem. 500 is way above what the windows version can handle. IIRC the hard max is somewhere around 200 depending on some OS factors that we don't entirely know. I'd never recommend going above 100-150. With no more than 2Gb ram, not above 100.
My guess is that Windows is running out of handles. Each backend uses about
150 handles. 100 Backends means 15000 handles. Depending how many other
programs are currently running the no. of startable backends will vary
depending on the total handle limit Windows imposes.
Those are kernel object handles; the ceiling does depend on available
kernel memory, but they're cheap, and postgres is in no danger of
running into that limit. Most of the handle limits people talk about
are on USER (window etc) objects, which come from a single shared
pool.

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
Rainer Bauer
2007-10-20 20:15:27 UTC
Permalink
Post by Trevor Talbot
Post by Rainer Bauer
Post by Magnus Hagander
Post by Laurent Duperval
- Max_connections is set to 500. I did that originally because I kept
seeing a message about no connection available and I thought it was
because I was not allocating enough connections. My machine has 2GB of RAM.
There's your problem. 500 is way above what the windows version can handle. IIRC the hard max is somewhere around 200 depending on some OS factors that we don't entirely know. I'd never recommend going above 100-150. With no more than 2Gb ram, not above 100.
My guess is that Windows is running out of handles. Each backend uses about
150 handles. 100 Backends means 15000 handles. Depending how many other
programs are currently running the no. of startable backends will vary
depending on the total handle limit Windows imposes.
Those are kernel object handles; the ceiling does depend on available
kernel memory, but they're cheap, and postgres is in no danger of
running into that limit. Most of the handle limits people talk about
are on USER (window etc) objects, which come from a single shared
pool.
You are right. I just did a quick test, and depending on the handle type these
limits are quite high. I could create 5 million events, 4 million
semaphores, or 3.5 million mutexes before the system returned error 1816
ERROR_NOT_ENOUGH_QUOTA "Not enough quota is available to process this
command.".
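
The test was essentially a loop like this (my reconstruction, not the
actual test code):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    long count = 0;

    /* Create anonymous semaphores until the kernel refuses; the handles
     * are only released when the process exits. */
    for (;;)
    {
        HANDLE h = CreateSemaphoreA(NULL, 0, 1, NULL);

        if (h == NULL)
        {
            printf("failed after %ld semaphores, error %lu\n",
                   count, GetLastError());
            break;
        }
        count++;
    }
    return 0;
}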

Rainer

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ***@postgresql.org so that your
message can get through to the mailing list cleanly
Rainer Bauer
2007-10-20 20:42:00 UTC
Permalink
Post by Rainer Bauer
You are right. I just did a quick test and depending on the handle type these
limits are quite high. I could create 5 millions events or 4 millions
semaphores or 3,5 millions mutexes before the system returned error 1816
ERROR_NOT_ENOUGH_QUOTA "Not enough quota is available to process this
command.".
[Does some further testing] The limit is high, but nonetheless Postgres is
running out of handles. Setting <max_connections> to 10000 and starting
postgres _without_ any connection consumes 40000 handles. This corresponds
to the 4 Postgres processes running after the server was started. Every new
connection consumes another 10000 handles.

I don't know the Postgres code involved, but it seems that every backend
consumes at least <max_connections> handles. Hence increasing this value will
have the opposite effect once a certain threshold is met.

Rainer

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ***@postgresql.org so that your
message can get through to the mailing list cleanly
Magnus Hagander
2007-10-20 06:23:48 UTC
Permalink
Post by Joshua D. Drake
Post by Magnus Hagander
Post by Tom Lane
Maybe we should put an #ifdef WIN32 into guc.c to limit
max_connections to something we know the platform can stand? It'd
be more comfortable if we understood exactly where the limit was,
but I think I'd rather have an "I'm sorry Dave, I can't do that"
than random-seeming crashes.
Yeayh, that's probably a good idea - except we never managed to
figure out where the limit is. It appears to vary pretty wildly
between different machines, for reasons we don't really know why
(total RAM has some effect on it, but that's not the only one, for
example)
How about we just emit a warning..
WARNING: Connections above 250 on Windows platforms may have
unpredictable results.
That's probably a better idea. I'll go look at that, unless people feel we should just stick it in docs/FAQ?

/Magnus

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
Tom Lane
2007-10-20 15:19:20 UTC
Permalink
Post by Magnus Hagander
Post by Joshua D. Drake
How about we just emit a warning..
WARNING: Connections above 250 on Windows platforms may have
unpredictable results.
That's probably a better idea. I'll go look at that unless people feel we should just stick it in docd/faq?
Unless we've got some credible basis for citing a particular number,
I don't think this will help much.
Post by Magnus Hagander
My guess is that Windows is running out of handles. Each backend uses about
150 handles. 100 Backends means 15000 handles. Depending how many other
programs are currently running the no. of startable backends will vary
depending on the total handle limit Windows imposes.
I find this theory very interesting; for one thing it explains the
reported variability of results, since the non-Postgres demand for
handles could be anything. Is there any way we could check it?
If it's accurate, what we ought to be whining about is some
combination of max_connections and max_files_per_process, rather
than only considering the former.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ***@postgresql.org so that your
message can get through to the mailing list cleanly
Magnus Hagander
2007-10-20 15:33:03 UTC
Permalink
Post by Tom Lane
Post by Magnus Hagander
Post by Joshua D. Drake
How about we just emit a warning..
WARNING: Connections above 250 on Windows platforms may have
unpredictable results.
That's probably a better idea. I'll go look at that unless people feel we should just stick it in docd/faq?
Unless we've got some credible basis for citing a particular number,
I don't think this will help much.
ok. Maybe a note in the docs or FAQ at least?
Post by Tom Lane
Post by Magnus Hagander
My guess is that Windows is running out of handles. Each backend uses about
150 handles. 100 Backends means 15000 handles. Depending how many other
programs are currently running the no. of startable backends will vary
depending on the total handle limit Windows imposes.
I find this theory very interesting; for one thing it explains the
reported variability of results, since the non-Postgres demand for
handles could be anything. Is there any way we could check it?
If it's accurate, what we ought to be whining about is some
combination of max_connections and max_files_per_process, rather
than only considering the former.
It's not that simple. Merlin ran some checks, and drastically reducing
max_files_per_process made no measurable difference.

My best guess is it's due to the non-paged pool. Handles are a part of
what goes in there, but only a part.

//Magnus

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
Dave Page
2007-10-26 18:12:38 UTC
Permalink
------- Original Message -------
Sent: 26/10/07, 18:09:26
Subject: Re: [GENERAL] 8.2.3: Server crashes on Windows using Eclipse/Junit
Dave could you add that it's the third parameter of the "SharedSection" string
that must be changed. I read that KB article, but still had to find the
correct one by trial and error, which required a reboot every time.
Err, it does say that:

You can increase the non-interactive Desktop Heap by modifying the third SharedSection value in the registry as described in this Microsoft Knowledgebase article.

/D

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org/
Rainer Bauer
2007-10-26 19:35:05 UTC
Permalink
Post by Dave Page
------- Original Message -------
Sent: 26/10/07, 18:09:26
Subject: Re: [GENERAL] 8.2.3: Server crashes on Windows using Eclipse/Junit
Dave could you add that it's the third parameter of the "SharedSection" string
that must be changed. I read that KB article, but still had to find the
correct one by trial and error, which required a reboot every time.
You can increase the non-interactive Desktop Heap by modifying the third SharedSection value in the registry as described in this Microsoft Knowledgebase article.
Must have overlooked that part. Sorry for the noise.

Rainer

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ***@postgresql.org so that your
message can get through to the mailing list cleanly