Discussion:
improving async support in psycopg
Jan Urbański
2010-03-21 01:07:48 UTC
Permalink
Hi,

I've recently got the need for a asynchronous Python driver for
PostgreSQL and I started looking for alternatives.

One of them is pgasync, but that seems unmaintained and given up on a
long time ago. So I came across the mention of asynchronous operations
support in psycopg2, but then I found the email saying that it's not
really ready for prime time
(http://lists.initd.org/pipermail/psycopg/2009-October/006642.html) and
the documentation states that as well.

We use a lot of Twisted in our shop and currently we use the Twisted API
for doing database stuff, but that involves threading, which we'd really
like to avoid. So I think it would be nice to expose libpq's
asynchronous APIs from psycopg and do it in a way that can be used
directly with select or things like that, from which point it would be
easy to make it work smoothly with Twisted.

It looks like the things expose would be:

* PQconnectStart and PQconnectPoll and make them into a
connect(async=True), connection.fileno() and connection.poll()

* make sure the current async support for executing queries is robust
and that is exposes enough things to wrap it in Twisted

* maybe implement asynchronous COPY support

Does that sound interesting? I can't really promise going and making
that happen, but if there's interest I will certainly try.

Cheers,
Jan
Jan Urbański
2010-03-23 00:38:27 UTC
Permalink
Post by Jan Urbański
Hi,
I've recently got the need for a asynchronous Python driver for
PostgreSQL and I started looking for alternatives.
So here's a first attempt, only only doc changes and tests added.

This is basically a trimmed down version of what Markus Demleitner wrote
a couple of months ago - it's missing the query cancellation feature
that I will try to implement after reading the libpq documentation some
more.

I'll keep on working on exposing more async features of the C library
and adding more unit tests (I've started another thread with the fix for
the test suite for PG 9.0).

Cheers,
Jan
Jan Urbański
2010-03-24 03:38:48 UTC
Permalink
Post by Jan Urbański
I'll keep on working on exposing more async features of the C library
and adding more unit tests (I've started another thread with the fix for
the test suite for PG 9.0).
Hi again,

here are some notes I took when looking through the async code in psycopg2:

PQsendQuery can return an error is the socket buffer to the server is
full. Currently psycopg2 will react with raising an OperationalError.
Another thing is that you should be calling PQflush after using
PQsendQuery until you get a 0 from PQflush. My idea was to add a method
on the cursor that would call PQflush for you, like issent().

This means that the safe way of doing async access to the DB would be:

c = conn.cursor()
# make sure the socket is writable
# so we don't get an exception from PQsendQuery
_, _, _, = select.select([], [c.fileno()], [])

# execute the async command
c.execute("select * from pg_class", async=True)

# check if the data has been sent
wait = not c.issent()

# while there's still data to be sent to the server,
# wait for the socket to become writable and flush
while wait:
_, _, _ = select.select([], [c.fileno()], [])
wait = not c.issent()

# now wait for the data to arrive at the socket
wait = True
while wait:
_, _, _ = select.select([c.fileno()], [], [])
wait = not c.isready()

print c.fetchall()


Another thing that I'm wondering about: why is the fileno() method doing
PQflush? The comment says it's to make sure the user uses select()
safely, but AFAICS it does not really do that. First of all one PQflush
might not be enough to empty the write buffer and second of all, getting
the file descriptor should not attempt to flush the outgoing buffer
queue. Instead, the issent() method would be used for that.


Next question is about pq_fetch and the loop where it does pq_is_busy
and uses select() internally. Is this loop necessary at all? If the
execute() call on the cursor was synchronous, there's no need to check
the status of the connection, or to do a select. PQexec would block
until all data has been transferred.
If the execute() call was asynchronous, then a client should enter a
isready() loop doing select() on his own, and eventually get all the
result, so that loop would also not be necessary. I think it could just
be dropped, maybe after making sure that pending NOTIFY messages are
processed.


That brings me back to the issue of doing a fetch() on an async cursor
that has not received all data yet. The current code blocks until it
gets all data, but I think that an easier solution would be to just
error out if there's an async cursor that has not consumed all input
yet. Anyone using async cursors would implement that select() loop
anyway, so he will consume all input. That would make it possible to
drop some things like needsfetch from the cursor structure and simplify
the code around that quite a bit.


Another issue are server-side asynchronous cursors. ATM I don't see any
sane way to support them, because after creating a named async cursor
and executing something asynchronously with it, using isready() to loop
will just make you wait for the DECLARE CURSOR to return. After that the
only way to actually do the query is to call fetch*(), which will block.
The problem is that the fetch*() method has to return the result, so
it's too late to ask the user to use select() after that. The only thing
I came up was making async server-side cursors throw errors for calls to
fetch() that are not preceded by a call to a new method called load(n),
that will issue the FETCH query and will make it possible to use
isready() to wait for the completion. After that fetch() would have the
data already on the client side and could throw errors if you try to
access data that has not been loaded yet.


I won't bother you with the details of how an async connection building
procedure could look like, I just want to raise one last issue. The
connection class issues a couple of queries, like SHOW client_encoding
or SHOW default_transaction_isolation. These are issues synchronously
and there is no control over them. That would even be OK, hopefully an
application would not create and close lots of connection during it's
lifetime. The bad thing is the BEGIN query that gets issued
synchronously and even less convenient is that COMMITs are synchronous,
and these could take some time on systems with a certain configuration.


As you can see there are quite a few things to look at to be able to use
psycopg2 as a fully async library (remember that my use case is Twisted,
so every call that can potentially block is bad). I'm beginning to
wonder if it's actually feasible...


Thanks for reading that overly long post,
Jan
Daniele Varrazzo
2010-03-24 10:46:56 UTC
Permalink
Post by Jan Urbański
I won't bother you with the details of how an async connection building
procedure could look like, I just want to raise one last issue. The
connection class issues a couple of queries, like SHOW client_encoding or
SHOW default_transaction_isolation. These are issues synchronously and there
is no control over them. That would even be OK, hopefully an application
would not create and close lots of connection during it's lifetime. The bad
thing is the BEGIN query that gets issued synchronously and even less
convenient is that COMMITs are synchronous, and these could take some time
on systems with a certain configuration.
I was wondering if it wouldn't be better to have "async" as a
connection characteristic (a `psycopg2.connect()` parameter) instead
of as a parameter of the single query executions. This way the driver
could performed these steps in asynchronous fashion too. Also the
issues arising by mixing up sync/async operation would disappear.
Jan Urbański
2010-03-24 10:56:07 UTC
Permalink
Post by Daniele Varrazzo
Post by Jan Urbański
I won't bother you with the details of how an async connection building
procedure could look like, I just want to raise one last issue. The
connection class issues a couple of queries, like SHOW client_encoding or
SHOW default_transaction_isolation. These are issues synchronously and there
is no control over them. That would even be OK, hopefully an application
would not create and close lots of connection during it's lifetime. The bad
thing is the BEGIN query that gets issued synchronously and even less
convenient is that COMMITs are synchronous, and these could take some time
on systems with a certain configuration.
I was wondering if it wouldn't be better to have "async" as a
connection characteristic (a `psycopg2.connect()` parameter) instead
of as a parameter of the single query executions. This way the driver
could performed these steps in asynchronous fashion too. Also the
issues arising by mixing up sync/async operation would disappear.
That might make sense. Although then you can't use the same connection
for both sync and async things, so if in your software you sometimes
want to just execute a simple blocking query and then do na asynchronous
one you would have to open two separate connections.

You could argue that an application that uses the async interface will
not want to block ever (that's certainly the case with my Twisted PG
module, I would never use sync execution there). But is it worth it to
impose that limit? It's easy enough to just disallow running anything
while the connection is processing an async query.

Jan
Daniele Varrazzo
2010-03-24 11:55:20 UTC
Permalink
Post by Jan Urbański
Post by Daniele Varrazzo
Post by Jan Urbański
I won't bother you with the details of how an async connection building
procedure could look like, I just want to raise one last issue. The
connection class issues a couple of queries, like SHOW client_encoding or
SHOW default_transaction_isolation. These are issues synchronously and there
is no control over them. That would even be OK, hopefully an application
would not create and close lots of connection during it's lifetime. The bad
thing is the BEGIN query that gets issued synchronously and even less
convenient is that COMMITs are synchronous, and these could take some time
on systems with a certain configuration.
I was wondering if it wouldn't be better to have "async" as a
connection characteristic (a `psycopg2.connect()` parameter) instead
of as a parameter of the single query executions. This way the driver
could performed these steps in asynchronous fashion too. Also the
issues arising by mixing up sync/async operation would disappear.
That might make sense. Although then you can't use the same connection
for both sync and async things, so if in your software you sometimes
want to just execute a simple blocking query and then do na asynchronous
one you would have to open two separate connections.
You could argue that an application that uses the async interface will
not want to block ever (that's certainly the case with my Twisted PG
module, I would never use sync execution there). But is it worth it to
impose that limit? It's easy enough to just disallow running anything
while the connection is processing an async query.
I think an application with mixed needs could easily create two
connections with different sync parameter to mix the call styles. Of
course they couldn't be in the same transaction, but this would be a
still more uncommon use case.

I have a different use case for async psycopg, maybe worth mentioning
for completeness and in order to see if it would be possible to have
an interface useful in different async environments too. Notice that I
am not working directly on these issues, so my knowledge of the issues
is more from code read than from code written, so I may actually be
not completely accurate.

We have coroutine-based program, currently using eventlet_ and running
psycopg in a thread to avoid being blocked (a service offered by
eventlet itself via the db_pool module). We'd like to move to gevent_,
a library built on libevent_. Both eventlet and gevent can
"monkeypatch" pure python code in order to make it non-blocking,
cooperatively yielding the control to another coroutine in presence of
a blocking call. In a pilot program using gevent we used a different
postgres adapter (pg8000, non libpq based) because, being written in
pure python, gevent can replace its socket with a cooperative one (but
pg8000 is rather immature, e.g. we had to patch it to avoid leaving
idle transactions open, something it does by design).

_eventlet: http://wiki.secondlife.com/wiki/Eventlet
_gevent: http://www.gevent.org/
_libevent: http://monkey.org/~provos/libevent/

I think the optimum would be to have an async connection with the
classic dbapi interface built on top, with coroutine-friendly blocking
methods. A dbapi interface (thus without the extra `cursor.execute()`
parameter) would also enable the async psycopg to be used with
existing clients, e.g. SQLAlchemy. I haven't thought if it is possible
to have this independently from the async framework used (twisted,
gevent, libevent, maybe providing an hook point where the libs could
install their callbacks...), I just think that having the concept of
an "async connection" instead of just an "async execute" could be a
better starting point (the sync/async code paths can be widely
different: there could be two C classes to handle them, which could
clean up a lot of code paths).

I'd really like to do myself some tries in this direction, but I don't
think it will be possible in the next days. I'd like to hear some
feedback by people knowing twisted and/or coroutine-based libraries to
hear if there would be show-stoppers I don't know about in this
approach.

-- Daniele
Jan Urbański
2010-03-24 12:19:03 UTC
Permalink
Post by Daniele Varrazzo
Post by Jan Urbański
You could argue that an application that uses the async interface will
not want to block ever (that's certainly the case with my Twisted PG
module, I would never use sync execution there). But is it worth it to
impose that limit? It's easy enough to just disallow running anything
while the connection is processing an async query.
I think an application with mixed needs could easily create two
connections with different sync parameter to mix the call styles. Of
course they couldn't be in the same transaction, but this would be a
still more uncommon use case.
Yeah, I guess we could live with not permitting sync and async
connection at the same time.
Post by Daniele Varrazzo
We have coroutine-based program, currently using eventlet_ and running
psycopg
Well the big advantage of psycopg2 is that it's using the official C
library, so it's getting the same fixes as libpq gets updated by the
upstream project and of course it's faster. So it you'd want to use it
transparently with eventlet you'll have to write some code anyway... My
Twisted wrapper needs to deal with quite a few things, so going async
can never be free IMHO. As you said the way you work with async
connections is quite different.
Post by Daniele Varrazzo
I think the optimum would be to have an async connection with the
classic dbapi interface built on top, with coroutine-friendly blocking
methods.
Yeah, to be honest the whole DBAPI 2.0 concept of connections and
cursors is broken IMHO :( You do queries using cursors, but you commit
using connections... when in fact you are always sending stuff over the
same connection, so the "cursor" is just a layer of indiredction (until
you do real standard SQL cursors, but then they behave very differently).

A dbapi interface (thus without the extra `cursor.execute()`
Post by Daniele Varrazzo
parameter) would also enable the async psycopg to be used with
existing clients, e.g. SQLAlchemy.
I'm not sure I follow: how would SQLAlchemy use the async API? I think
the user needs to be aware that he's using the async features, so he
will do select calls (like Twisted) or will yield control on EAGAIN
errors from the socket (like eventlet).
Post by Daniele Varrazzo
an "async connection" instead of just an "async execute" could be a
better starting point (the sync/async code paths can be widely
different: there could be two C classes to handle them, which could
clean up a lot of code paths).
I agree with that part. Maybe instead of bolting the async features into
the same connection and cursor classes just do them separately, in
separate classes, using separate methods (even if they would look more
or less like DBAPI 2.0). I am fairly sure that frameworks using DBAPI
2.0 (like SQLAlchemy or for instance Django, or even current Twisted DB
integration) will not be able to work seamlessly with an async driver.
So maybe let's just invent our own psycopg2 API for that and let people
write code around that.

Hmm, someone could even come up with an PEP for ADBAPI ;-) But at least
for now that's not going to be me.

Cheers,
Jan
Manlio Perillo
2010-03-24 14:16:53 UTC
Permalink
Post by Jan Urbański
[...]
Well the big advantage of psycopg2 is that it's using the official C
library, so it's getting the same fixes as libpq gets updated by the
upstream project and of course it's faster. So it you'd want to use it
transparently with eventlet you'll have to write some code anyway... My
Twisted wrapper needs to deal with quite a few things, so going async
can never be free IMHO. As you said the way you work with async
connections is quite different.
Some days ago, I wrote a libpq wrapper using ctypes.
http://hg.mperillo.ath.cx/libpq

It is rather low level, but it is not hard to write higher level wrappers.

Of course, the main problem is conversion between PostgreSQL types and
Python types, since it requires some parsing.

One advantage of using the low level libpq API is that you have more
control about what you are doing.
Post by Jan Urbański
[...]
A dbapi interface (thus without the extra `cursor.execute()`
Post by Daniele Varrazzo
parameter) would also enable the async psycopg to be used with
existing clients, e.g. SQLAlchemy.
I'm not sure I follow: how would SQLAlchemy use the async API? I think
the user needs to be aware that he's using the async features, so he
will do select calls (like Twisted) or will yield control on EAGAIN
errors from the socket (like eventlet).
The idea is that you have a generic "reactor" API, with a simple API like:

waitRead(fd)
waitWrite(fd)

or a more generic:

poll(fd, ...)


Using greenlet, these calls will suspend execution of the current
greenlet until a socket is ready.

On top of this API, you can implement an high level API, like:

readn(fd, n)
read n bytes, blocking current greenlet until all data is read
writen(fd, n)

connect(fd, ...)

and so on.


How the reactor is started is unspecified.

As with Daniele, I, too, am interested in greenlet for use in my WSGI
module for Nginx.

In this case the reactor API is implemented on top of Nginx event module.
Post by Jan Urbański
[...]
Regards Manlio
Daniele Varrazzo
2010-03-24 14:36:37 UTC
Permalink
 A dbapi interface (thus without the extra `cursor.execute()`
Post by Daniele Varrazzo
parameter) would also enable the async psycopg to be used with
existing clients, e.g. SQLAlchemy.
I'm not sure I follow: how would SQLAlchemy use the async API? I think
the user needs to be aware that he's using the async features, so he
will do select calls (like Twisted) or will yield control on EAGAIN
errors from the socket (like eventlet).
In case eventlet is used, the single coroutine can get blocked by an
execute: while it is blocked, control is yield to a different
coroutine. From the PoV of the single coroutine the semantic would be
the same of a blocking `execute()`, so SQLAlchemy can be used. How to
obtain this? I was thinking about something along the line of
providing an hook on the connection or on the module (for newly
created connections) that e.g. eventlet could use to install the
connection file descriptor into the libevent reactor
(http://www.gevent.org/gevent.core.html) and enable switching on
block. I think other coroutine-based framework may use the same hook,
provided they expose a way to hook a fd into the select/poll/whatever
but I don't know if twisted+deferred would fit this model.

-- Daniele
Federico Di Gregorio
2010-03-29 07:37:05 UTC
Permalink
Post by Jan Urbański
Yeah, to be honest the whole DBAPI 2.0 concept of connections and
cursors is broken IMHO :( You do queries using cursors, but you commit
using connections... when in fact you are always sending stuff over the
same connection, so the "cursor" is just a layer of indiredction (until
you do real standard SQL cursors, but then they behave very differently).
It is not broken. It is just that you _need_ cursors for some
programming patterns, like doing two selects and comparing results or
doing SELECT/UPDATE or SELECT/INSERT series. True, you can always cache
intermediate results in memory but why reinvent the wheel? A cursor is
exactly that, a lightweight cache for SELECT data and an handle to do
multiple INSERTs/UPDATEs in the same transaction.

federico (slowly catching up with email)
--
Federico Di Gregorio fog-NGVKUo/i/***@public.gmane.org
Ma chi sei?....-il trafficante di Nutella? -- Giorgia
Federico Di Gregorio
2010-03-24 20:47:05 UTC
Permalink
On 24/03/2010 04:38, Jan Urbański wrote:
[snip]
Post by Jan Urbański
Another thing that I'm wondering about: why is the fileno() method doing
PQflush? The comment says it's to make sure the user uses select()
safely, but AFAICS it does not really do that. First of all one PQflush
might not be enough to empty the write buffer and second of all, getting
the file descriptor should not attempt to flush the outgoing buffer
queue. Instead, the issent() method would be used for that.
The main idea was to avoid the need of issent(). Unfortunately a single
PQflush() call does not guarantee that the outgoing buffer if empty so
the call is not safe. I don't like the need to two methods (issent() and
isready()) and two loops just to one single async query.
Post by Jan Urbański
Next question is about pq_fetch and the loop where it does pq_is_busy
and uses select() internally. Is this loop necessary at all? If the
execute() call on the cursor was synchronous, there's no need to check
the status of the connection, or to do a select. PQexec would block
until all data has been transferred.
Right.
Post by Jan Urbański
If the execute() call was asynchronous, then a client should enter a
isready() loop doing select() on his own, and eventually get all the
result, so that loop would also not be necessary. I think it could just
be dropped, maybe after making sure that pending NOTIFY messages are
processed.
I agree. We can try to simplify the loop and see if anything breaks.
Post by Jan Urbański
That brings me back to the issue of doing a fetch() on an async cursor
that has not received all data yet. The current code blocks until it
gets all data, but I think that an easier solution would be to just
error out if there's an async cursor that has not consumed all input
yet. Anyone using async cursors would implement that select() loop
anyway, so he will consume all input. That would make it possible to
drop some things like needsfetch from the cursor structure and simplify
the code around that quite a bit.
But then what happens to the connection? If we just throw an error then
we need to block and flush the buffers anyway because an important
assumption in all psycopg client code is that unless the connection is
broken you can recover from _any_ error by issuing a rollback() and then
you can start using the connection again. So, if we need to block/flush
buffers before raising the exception it does make sense to just use that
information, i.e., transform the fetchXXX() call into a blocking call
and let the user get its data.
Post by Jan Urbański
Another issue are server-side asynchronous cursors. ATM I don't see any
sane way to support them, because after creating a named async cursor
and executing something asynchronously with it, using isready() to loop
will just make you wait for the DECLARE CURSOR to return. After that the
only way to actually do the query is to call fetch*(), which will block.
The problem is that the fetch*() method has to return the result, so
it's too late to ask the user to use select() after that. The only thing
I came up was making async server-side cursors throw errors for calls to
fetch() that are not preceded by a call to a new method called load(n),
that will issue the FETCH query and will make it possible to use
isready() to wait for the completion. After that fetch() would have the
data already on the client side and could throw errors if you try to
access data that has not been loaded yet.
While I don't like to add yet another method I don't see an easy
solution with the current API.
Post by Jan Urbański
I won't bother you with the details of how an async connection building
procedure could look like, I just want to raise one last issue. The
connection class issues a couple of queries, like SHOW client_encoding
or SHOW default_transaction_isolation. These are issues synchronously
and there is no control over them. That would even be OK, hopefully an
application would not create and close lots of connection during it's
lifetime. The bad thing is the BEGIN query that gets issued
synchronously and even less convenient is that COMMITs are synchronous,
and these could take some time on systems with a certain configuration.
As you can see there are quite a few things to look at to be able to use
psycopg2 as a fully async library (remember that my use case is Twisted,
so every call that can potentially block is bad). I'm beginning to
wonder if it's actually feasible...
True. I suppose we can have a flag on the connection saying "do
everything as async" so that even transaction control can be made
asynchronous but that will take a lot of work.

federico
--
Federico Di Gregorio fog-NGVKUo/i/***@public.gmane.org
In science one tries to tell people, in such a way as to be understood
by everyone, something that no one ever knew before. But in poetry,
it's the exact opposite. -- Paul Dirac
Federico Di Gregorio
2010-03-24 20:27:42 UTC
Permalink
Post by Jan Urbański
a couple of months ago - it's missing the query cancellation feature
that I will try to implement after reading the libpq documentation some
more.
I'll keep on working on exposing more async features of the C library
and adding more unit tests (I've started another thread with the fix for
the test suite for PG 9.0).
I'll start by this email. I don't agree on exposing more and more
features of libpq. While async is surely backend specific (especially if
done trhough libpq) I'd like to keep the API generic enough to be
implemented by other database adapters.

Some random toughts:

In theory you should be able to start a query and then manage the cursor
as any file descriptor, using select() or poll() with less possible
psycopg-specific code. I think the best would be to have an attribute on
the connection object to specify the _default_ for cursor.execute() [I
don't like the idea of completely async connections].

Also, having a way to tell from the connection if an async query is
running seems a good idea, so, probably .isready() should exist both at
the cursor and connection level.

If reactors and coroutines require additional methods then we should
provide *one* API that expose enough functionality for all of them (but
I don't use such stuff so someone else will be required to do this work).

Anyway, I applied your patch because it is the first step we need. I'll
add some more tests in the next few days.

federico
--
Federico Di Gregorio fog-NGVKUo/i/***@public.gmane.org
Se sai che hai un ***** di file così, lo manovri subito. -- vodka
Jan Urbański
2010-03-24 21:32:25 UTC
Permalink
Post by Federico Di Gregorio
Post by Jan Urbański
a couple of months ago - it's missing the query cancellation feature
that I will try to implement after reading the libpq documentation some
more.
I'll keep on working on exposing more async features of the C library
and adding more unit tests (I've started another thread with the fix for
the test suite for PG 9.0).
I'll start by this email. I don't agree on exposing more and more
features of libpq. While async is surely backend specific (especially if
done trhough libpq) I'd like to keep the API generic enough to be
implemented by other database adapters.
I agree 100%. Let's start by designing an API then, although I don't
know anything about other databases, so I won't be able to tell if
similar things would be possible for MySQL or Oracle etc.
Post by Federico Di Gregorio
In theory you should be able to start a query and then manage the cursor
as any file descriptor, using select() or poll() with less possible
psycopg-specific code. I think the best would be to have an attribute on
the connection object to specify the _default_ for cursor.execute() [I
don't like the idea of completely async connections].
I'm not entirely sure that's the case. Just grabbing the socket file
descriptor and putting it in a select() without using the
library-specific APIs might break that library's expectations of how
things should be done.
So you're saying you'd prefer a way to tell the connection which kind of
cursors should it be creating - that's OK for me. Would you like that
property to be changeable during the lifetime of a connection? There is
no technical reason to forbid that, but I'd be equally happy with just
having to choose sync vs async at connection creation time.
Post by Federico Di Gregorio
Also, having a way to tell from the connection if an async query is
running seems a good idea, so, probably .isready() should exist both at
the cursor and connection level.
Yeah, asking the connection if it's doing an async query at a given
moment would be useful.
Post by Federico Di Gregorio
If reactors and coroutines require additional methods then we should
provide *one* API that expose enough functionality for all of them (but
I don't use such stuff so someone else will be required to do this work).
I think that deep down under, the frameworks need a way to get a file
descriptor that can be put in a select(), a way to figure out if they
should be be waiting for the fd to become readable or writable and a
method to call after the select() call returns. At least I'm sure that
would be enough for Twisted to be able to wrap such a driver.

I have since implemented a way to build connections asynchronously and
the API looks like this:

conn = psycopg2.connect(tests.dsn, async=True)

state = psycopg2.extensions.POLL_WRITE
while state != psycopg2.extensions.POLL_OK:
if state == psycopg2.extensions.POLL_WRITE:
select.select([], [conn.fileno()], [])
elif state == psycopg2.extensions.POLL_READ:
select.select([conn.fileno()], [], [])
state = conn.poll()

so you create an async connection and then loop in a select() until the
driver tells you that your connection has been built. This required
adding two additional methods on the connection - fileno() and poll().
I'm not sure if poll() is the best name, isready() would be another
choice, but it can't simply return a boolean: libpq support for async
connection building requires you to switch between waiting on write and
read on the connection socket.

So how about we try to design an async API that would make sense and be
useful for both Twisted and Eventlet. Just to get the ball rolling I'll
try to start something, I'd love to hear from the Eventlet and know if
that would be enough for them.

psycopg2.connect() gets a new kwarg, async=False. If it's True, the
connect() call returns immediately and the returned Connection object is
in a state where calling any method except for poll() or close() raises
an exception.

Connection.poll() (or .isready()) returns one of the three values,
POLL_OK, POLL_WRITE or POLL_READ that tell you if you should put the
connection socket in the readfds or the writefds set of select(). It can
also raise an exception, which would be analogous to getting an
exception using a normal blocking call to psycopg2.connect()

Connection.fileno() returns the fd associated with the connection socket.

Connection.issync() returns a boolean saying if the connection is sync
or not.

Connection.executing() returns a boolean saying if there's an async
query being execute at the moment, i.e. if a call to execute() on a
async cursor for this connection will raise an exception (or block) or
not. FIXME: get a better name for that. Maybe use isready() and make use
poll() for connection building?

Connection.cursor() loses the async kwarg. Instead all cursors created
by an async connection are async. Trying to create a named cursor with
an async connection raises an exception (if people want real cursors
they will have to issue the DECLARE CURSOR themselves - sucks, but
that's a first approach).

Connection.commit() and rollback() blocks, throws away any asynchronous
result and commits/rolls back the transaction. If I'd be using that API
I would set it to autocommit mode anyway and do my own BEGINs and
COMMITs, I think.

Connection.set_isolation_level() blocks.

Connection.set_client_encoding() blocks.

Connection.lobject() and large objects in general work the same way as
with a sync connection.

Cursor.execute() on an async cursor returns immediately. If there's
already an async query underway it either throws an exception or blocks
until the previous query has completed, then throws away its result and
runs another execute(). I actually kind of like the latter approach,
because then you can use the async cursors in almost the same way as the
sync ones.

Cursor.isready() if there's no async query in progress tries to flush
the connection's write buffer and returns a boolean saying whether thhe
whole query has been sent already. If there's an async query in progress
it tries to get some data and returns a boolean saying whether it
managed to get all of it. You will still need two select() loops to
safely issue a query, but that's unavoidable: you have to send it and
read the result. But at least you can have just one method to do that.

Cursor.callproc() works just like execute().

Connection.fileno() returns the fd associated with the connection socket.

Cursor.executemany() on an async cursor raises an exception.

Cursor.fetch*() blocks until it gets the result and gives it to the client.

Cursor.scroll() also blocks and then does the scroll.

Cursor.copy.from/to/expert always blocks, at least for now.

I'll publish my psycopg2 branch after I clean it up a bit, but I would
love to agree on an API before implementing the whole thing. Right now I
have async connection building and an issent() method on the cursor, and
I can now see that it should be simply folded into isready().

Cheers,
Jan
Tim Roberts
2010-03-24 21:39:20 UTC
Permalink
Post by Jan Urbański
Post by Federico Di Gregorio
I'll start by this email. I don't agree on exposing more and more
features of libpq. While async is surely backend specific (especially if
done trhough libpq) I'd like to keep the API generic enough to be
implemented by other database adapters.
I agree 100%. Let's start by designing an API then, although I don't
know anything about other databases, so I won't be able to tell if
similar things would be possible for MySQL or Oracle etc.
Let us not lose sight of the fact that psycopg2 is intended to be an
abstraction, based on a database-neutral standard (DBAPI 2). It is not
intended to provide low-level access to Postgres. I would be quite
upset to find a whole set of new Postgres-specific features added to
psycopg2 at this point. That's not what it's for.

Now, I'm not arguing that low-level access is necessarily a bad thing
(although, having done a number of very painful database transitions
over the years, I'm certainly willing to argue that), but it doesn't
belong in psycopg2. If someone wants to create a separate package that
wraps the Postgres native API for Python in a direct and low-level way,
that's fine, but it should not be part of psycopg2. I should be able to
take my psycopg2 application, replace the "psycopg2" instance with
"pyMySQL", and have it Just Work.
--
Tim Roberts, ***@probo.com
Providenza & Boekelheide, Inc.
Jan Urbański
2010-03-24 21:48:22 UTC
Permalink
Post by Tim Roberts
Post by Jan Urbański
I agree 100%. Let's start by designing an API then, although I don't
know anything about other databases, so I won't be able to tell if
similar things would be possible for MySQL or Oracle etc.
Let us not lose sight of the fact that psycopg2 is intended to be an
abstraction, based on a database-neutral standard (DBAPI 2).
If someone wants to create a separate package that
wraps the Postgres native API for Python in a direct and low-level way,
that's fine, but it should not be part of psycopg2. I should be able to
take my psycopg2 application, replace the "psycopg2" instance with
"pyMySQL", and have it Just Work.
Of course the default behaviour should not change, so existing
application would continue to work. If you don't use the extentions, you
won't ever see an async query. But I can't see anything wrong with
exposing more native API, as long as it's optional and has no effect on
people that are not willing to use it.

Cheers,
Jan
Jan Urbański
2010-03-26 03:20:20 UTC
Permalink
Post by Jan Urbański
I'll publish my psycopg2 branch after I clean it up a bit
And here it is:

http://git.wulczer.org/?p=psycopg2.git;a=summary

I have implemented all (or so I think) the API I wrote about earlier,
with one change:

cursor.isready() got changed to cursor.poll(). This method just can't
return a boolean, it has to somehow tell you which event should you wait
for: socket writability or readability. I'm not terribly attached to the
name, but since the API is changing, it might as well change the method
name. Otherwise code that uses async might not notice that it's broken
for a long time.
If someone has a better proposal than poll(), or if the consensus is
that isready() could return something that's not a boolean, I'm all ears.

Documentation patches are missing from that repository, I'll wait with
learning Sphinx until you guys give me some feedback on this ;) But of
course if the feature would get accepted, I'll update the docs as well.

I haven't done any Windows testing, but I'm almost 100% sure it's going
to work. I also haven't tested against other PGs than 9.0devel, but then
again the worst thing that can happen is that the unit tests will need
some adjusting.

Question: I don't really know what's the custom on this list: should I
attach the patchset to a mail? Or is just pushing the changes to a git
repo OK for you?

My previous patches were done against the master branch of the git repo
on initd.org, but I then noticed that the python2 branch is much more
advanced... So I rebased against python2. Maybe there should be a notice
somewhere that the active branch is python2, it took me some time to
realize I'm working on old code :(

Cheers,
Jan
Jan Urbański
2010-03-26 14:58:45 UTC
Permalink
Hi,

I noticed that you didn't CC: the list - is that intentional? I will CC:
the list on reply, I hope you don't mind.
- is it necessary to expose all the states in the connection building,
(CONN_STATUS_{SEND|SENT|GET}_DATESTYLE etc)? I suspect they are an
implementation detail of the current psycopg version and I don't know
if it would be safe to build Python code on top of them.
Oh, you're right, no need to expose those to Python. The connection
poll() method returns the same values as the cursor poll() method
(meaning psycopg2.extensions.POLL_{READ, WRITE, OK}) and only these need
to be exposed. Thanks for pointing that out, will fix.
what about dropping it? I don't think there is any production code
using it (I may be wrong, in this case it is just to be pointed out in
the docs).
Yes, the parameter on the cursor is now redundant, and actually there's
code that when you try to execute asynchronously from a sync connection
(or the other way around) will just raise an exception.
I left it for backwards compat, and to not have to touch the cursor
subclasses from psycopg2.extras. I'm afraid people might have cursor
subclasses that chain up to the superclass execute() method passing the
async parameter and this would break them. But of course I'm also fine
with just dropping it.
Post by Jan Urbański
Documentation patches are missing from that repository, I'll wait with
learning Sphinx until you guys give me some feedback on this ;) But of
course if the feature would get accepted, I'll update the docs as well.
Don't worry for the docs: I'll be happy to fix them as soon as the
interface is settled.
Thanks, that would be awesome.
On top of what you've done, I'd like to add a hook on the connection
in order to call automatically an user-provided function after execute
(i.e. like the `wait_for_query()` in your test case): I'd like to test
with it to see if it would be useful to use async psycopg in a "green"
environment. It seems easy enough and I'll try to do something in the
afternoon.
Great! I already started a Twisted wrapper around it, so we might
actually get two real usecases and see if the interface is good enough
for both of them.
Another question: how do you think async connections and named cursors
mix? Basically in named cursors the fetch* methods are blocking too.
Does your patch already address them?
Yes, it raises an exception when you try to create a named cursor from
an async connection :o)

I outlined a possible approach for named cursor in a different email,
but basically I think it might not be worth it. The problem is that the
cursor gets declared when you do execute() (and this would work
asynchronously), but then when you do fetch() it actually issues another
query (FETCH FROM) and this means that the fetch() method would have to
be async... I think it could be solved by adding yet another method that
would mediate between execute() and fetch(), which I proposed to call
load().

You can always issue your DECLARE CURSOR and FETCH FROM calls yourself.
For instance, to really use async connections you will have to turn off
the psycopg2 transaction management and issue BEGIN and COMMIT yourself,
because currently they both are blocking.

Cheers,
Jan
Daniele Varrazzo
2010-03-26 15:26:21 UTC
Permalink
Post by Jan Urbański
Hi,
the list on reply, I hope you don't mind.
No, I simply hit the wrong button (which I regularly do, but then I
usually notice it a second later and CC the list too...)
Post by Jan Urbański
Another question: how do you think async connections and named cursors
mix? Basically in named cursors the fetch* methods are blocking too.
Does your patch already address them?
Yes, it raises an exception when you try to create a named cursor from
an async connection :o)
I outlined a possible approach for named cursor in a different email,
but basically I think it might not be worth it. The problem is that the
cursor gets declared when you do execute() (and this would work
asynchronously), but then when you do fetch() it actually issues another
query (FETCH FROM) and this means that the fetch() method would have to
be async... I think it could be solved by adding yet another method that
would mediate between execute() and fetch(), which I proposed to call
load().
Yes, the correct (awsome) behavior would be to perform an async fetch.

I read again the mail where you propose load(). I am thinking that
having the "wait hook" in place, it could get called by `fetch*()`
before returning the value, but I am talking a little bit too much
about this yet-vapor feature, so I'd rater write something before.
Post by Jan Urbański
You can always issue your DECLARE CURSOR and FETCH FROM calls yourself.
For instance, to really use async connections you will have to turn off
the psycopg2 transaction management and issue BEGIN and COMMIT yourself,
because currently they both are blocking.
Oh, I haven't notice that. Can't they be handled using the same async machinery?

Regards,

-- Daniele
Jan Urbański
2010-03-26 15:36:39 UTC
Permalink
Post by Daniele Varrazzo
Post by Jan Urbański
You can always issue your DECLARE CURSOR and FETCH FROM calls yourself.
For instance, to really use async connections you will have to turn off
the psycopg2 transaction management and issue BEGIN and COMMIT yourself,
because currently they both are blocking.
Oh, I haven't notice that. Can't they be handled using the same async machinery?
It's possible, you would only have to add a select() loop after
begin/commit and use conn.poll() after doing rollback(). Also, psycopg2
issues a BEGIN for you when you first do an execute() after a commit(),
so we'd have to use the same trick the connection class uses currently,
issue the BEGIN asynchronously and only after it returns execute the
user's code. Possible, although a bit complex.

Federico suggested that a nice property of psycopg2 connections is that
you can always issue a rollback() and get the connection back into a
workable state. With async ROLLBACK that would no longer be the case.
I'm torn on whether ROLLBACK should always be sync, whether there should
be an argument on rollback() that specifies if you want to block on it,
or something still different should happen...

Cheers, Jan
Daniele Varrazzo
2010-03-26 19:29:57 UTC
Permalink
Post by Jan Urbański
On top of what you've done, I'd like to add a hook on the connection
in order to call automatically an user-provided function after execute
(i.e. like the `wait_for_query()` in your test case): I'd like to test
with it to see if it would be useful to use async psycopg in a "green"
environment. It seems easy enough and I'll try to do something in the
afternoon.
Great! I already started a Twisted wrapper around it, so we might
actually get two real usecases and see if the interface is good enough
for both of them.
I've written a first version of the wait_callback idea: code is
available from the 'async' branch of
http://piro.develer.com/psycopg2.git (gitweb:
https://www.develer.com/gitweb/pub?p=users/piro/psycopg2.git;a=summary).

The connection has a new read/write wait_callback attribute: if set it
should be a callback receiving a cursor argument. The idea is that an
async connection calls the callback at each operation that would
block. If the callback is set to None nothing is invoked, so the
connection is left with the need to be poll()ed. Currently is it
called by execute() and callproc().

I've checked interoperability with the stdlib select() and with the
greenlet-based libraries gevent and eventlet: test cases are included.

The current interface is a proof of concept: it is not complete
because the queries issued during connect() can't take advantage of
the wait_callback. In order to overcome this limitation, I think there
should be a module attribute or function, e.g.
psycopg.extensions.register_wait_callback() so that a callback can be
inherited by every newly created connection. Currently the cb takes a
cursor, which is necessary to invoke poll(), so it wouldn't work e.g.
for connect() where a cursor doesn't exist. Probably the cb should
have a signature f(conn, cur=None) so that poll() can be invoked on
the right object.

I think with this callback automatically called by psycopg, the async
features could be extended to other operations:

- connection initialization
- begin, commit, rollback
- fetch*() for named cursors
- copy?
- lobjects?

Do you think this strategy could be useful?

-- Daniele
Daniele Varrazzo
2010-03-27 02:21:44 UTC
Permalink
On Fri, Mar 26, 2010 at 7:29 PM, Daniele Varrazzo
Post by Daniele Varrazzo
I've written a first version of the wait_callback idea: code is
available from the 'async' branch of
https://www.develer.com/gitweb/pub?p=users/piro/psycopg2.git;a=summary).
[...] Probably the cb should
have a signature f(conn, cur=None) so that poll() can be invoked on
the right object.
I updated the above branch with definition of the wait callback as
f(conn, cur=None).

Here is a testing script I wrote to check that the async feature works
as expected (using gevent). Here, after connect(), the callback is
used manually to put the connection in the correct state: as discussed
a better interface for this operation can be provided.

# make urllib2 coroutine-friendly
import gevent
import gevent.monkey
gevent.monkey.patch_all()
import urllib2 # green

# have an async connection to play well with gevent
import psycopg2
from test_async_gevent import gevent_wait_callback
conn = psycopg2.connect("dbname=postgres", async=1)
gevent_wait_callback(conn)
conn.wait_callback = gevent_wait_callback

import logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
logger = logging.getLogger()

def download(num, secs):
url = "http://localhost:8000/sleep/%d/" % secs
for i in range(num):
logger.info("download %d start", i)
data = urllib2.urlopen(url).read()
logger.info("download %d end", i)

def fetch(num, secs):
cur = conn.cursor()
for i in range(num):
logger.info("fetch %d start", i)
cur.execute("select pg_sleep(%s)", (secs,))
logger.info("fetch %d end", i)

# two concurrent tasks: one downloading 2 urls, each taking 3 secs,
# the other running 3 queries, each blocking for 2 secs
logger.info("making jobs")
jobs = [
gevent.spawn(download, 2, 3),
gevent.spawn(fetch, 3, 2),
]

logger.info("join begin")
gevent.joinall(jobs)
logger.info("join end")


The output of the script is what expected:

$ python nonblock.py
2010-03-27 02:01:02,004 making jobs
2010-03-27 02:01:02,022 join begin
2010-03-27 02:01:02,031 download 0 start
2010-03-27 02:01:02,046 fetch 0 start
2010-03-27 02:01:04,106 fetch 0 end
2010-03-27 02:01:04,107 fetch 1 start
2010-03-27 02:01:05,203 download 0 end
2010-03-27 02:01:05,204 download 1 start
2010-03-27 02:01:06,110 fetch 1 end
2010-03-27 02:01:06,111 fetch 2 start
2010-03-27 02:01:08,115 fetch 2 end
2010-03-27 02:01:08,381 download 1 end
2010-03-27 02:01:08,383 join end


-- Daniele
Jan Urbański
2010-03-27 02:40:57 UTC
Permalink
Post by Daniele Varrazzo
On Fri, Mar 26, 2010 at 7:29 PM, Daniele Varrazzo
Post by Daniele Varrazzo
I've written a first version of the wait_callback idea: code is
available from the 'async' branch of
https://www.develer.com/gitweb/pub?p=users/piro/psycopg2.git;a=summary).
[...]
Here is a testing script I wrote to check that the async feature works
as expected (using gevent).
Awesome, I'll try to take a look over the weekend. Looks like you beat
me to writing an example using $your_favourite_async_library ;)

Skimming over the patches I have a question: is it really necessary to
add the callback feature into the C library? AFAICS the callback is
executed on asynchronous cursors everytime there is a execute() attempt.
Couldn't you just write a Python psycopg2.extensions.cursor subclass
that overrides execute() by calling that callback?

Cheers,
Jan
Jan Urbański
2010-03-27 20:29:43 UTC
Permalink
Post by Jan Urbański
Skimming over the patches I have a question: is it really necessary to
add the callback feature into the C library? AFAICS the callback is
executed on asynchronous cursors everytime there is a execute() attempt.
Couldn't you just write a Python psycopg2.extensions.cursor subclass
that overrides execute() by calling that callback?
I just tried using something like:

class GreenCursor(psycopg2.extensions.cursor):
def execute(self, sql, args=None):
ret = psycopg2.extensions.cursor.execute(self, sql, args)
gevent_wait_callback(self)
return ret

and then doing cur = conn.cursor(cursor_factory=GreenCursor) and your
gevent test worked without having to add the callback feature to the
factory. All it takes is some helper functions to wrap the basic async
support, which I think is the way to go: only make psycopg2 expose the
minimal interface and make the libraries wrap it the way the like.

Cheers,
Jan
Daniele Varrazzo
2010-03-28 01:35:22 UTC
Permalink
Skimming over the patches I have a question: is it really necessary to add
the callback feature into the C library? AFAICS the callback is executed on
asynchronous cursors everytime there is a execute() attempt. Couldn't you
just write a Python psycopg2.extensions.cursor subclass that overrides
execute() by calling that callback?
As things are now, execute and callproc being the only methods using
async features, this is the case. Probably it will be too extending
support to other currently blocking methods (begin/commit/rollback
etc).

Yes, it is probably the case to think about Python subclasses to add a
wait wrapper around async methods. the psycopg.extras module could
provide such wrappers exposing a method, equivalent to the callback in
my test, to be customized by async libraries to fit their framework.

Regards,

-- Daniele
Daniele Varrazzo
2010-03-28 02:10:52 UTC
Permalink
On Sun, Mar 28, 2010 at 2:35 AM, Daniele Varrazzo
Post by Daniele Varrazzo
Skimming over the patches I have a question: is it really necessary to add
the callback feature into the C library? AFAICS the callback is executed on
asynchronous cursors everytime there is a execute() attempt. Couldn't you
just write a Python psycopg2.extensions.cursor subclass that overrides
execute() by calling that callback?
As things are now, execute and callproc being the only methods using
async features, this is the case. Probably it will be too extending
support to other currently blocking methods (begin/commit/rollback
etc).
Yes, it is probably the case to think about Python subclasses to add a
wait wrapper around async methods. the psycopg.extras module could
provide such wrappers exposing a method, equivalent to the callback in
my test, to be customized by async libraries to fit their framework.
OTOH, looking at the pqpath.c, many functions call PQexec (via
pq_execute_command_locked) and set the connection state according to
its result (begin, commit and rollback). reset() instead executes a
sequence of 2 or 3 operations (something similar to what connect()
does). If those methods were to be made async, it seems to me that the
knowledge about "how to block" should be available at C level.

-- Daniele
Jan Urbański
2010-03-28 08:42:55 UTC
Permalink
Post by Daniele Varrazzo
On Sun, Mar 28, 2010 at 2:35 AM, Daniele Varrazzo
OTOH, looking at the pqpath.c, many functions call PQexec (via
pq_execute_command_locked) and set the connection state according to
its result (begin, commit and rollback). reset() instead executes a
sequence of 2 or 3 operations (something similar to what connect()
does). If those methods were to be made async, it seems to me that the
knowledge about "how to block" should be available at C level.
Yeah, there are places where psycopg2 issues queries behind your back :(
That's why I said that a user of the async features would probably have
to set the transaction isolation level to autocommit and issue his own
BEGINs and COMMITs from a Cursor.execute() instead.

I think that the way to fix it would be to use a similar trick to the
one in async connecting: when you execute something for the first time
in a cursor, issue a BEGIN behind the user's back, make him poll the
cursor and then when the BEGIN finishes, issue the real query and
continue polling. This way it's more or less trasparent. COMMIT,
possibly ROLLBACK and things like encoding setting would need to be
handled this way as well... But since you can manage transactions
yourself, I didn't look into implementing that behaviour yet.

Cheers,
Jan
Federico Di Gregorio
2010-03-29 07:13:47 UTC
Permalink
Post by Jan Urbański
Post by Daniele Varrazzo
On Sun, Mar 28, 2010 at 2:35 AM, Daniele Varrazzo
OTOH, looking at the pqpath.c, many functions call PQexec (via
pq_execute_command_locked) and set the connection state according to
its result (begin, commit and rollback). reset() instead executes a
sequence of 2 or 3 operations (something similar to what connect()
does). If those methods were to be made async, it seems to me that the
knowledge about "how to block" should be available at C level.
Yeah, there are places where psycopg2 issues queries behind your back :(
That's why I said that a user of the async features would probably have
to set the transaction isolation level to autocommit and issue his own
BEGINs and COMMITs from a Cursor.execute() instead.
I still haven't skimmed over the patches but the more I read the
paragraph above, the more I like it. It is reasonable that the client
code using psycopg in async mode will usually be some asynchronous
framework of some sort and not code written directly by the user. Then
it make sense to depythonize the API a little bit and let the client
code use psycopg as a facility to issue asynchronous queries. Also I
don't think that keeping a single path in the code is such a good idea,
so why not have a connect() call that when invoked with async=True takes
a completely different path and setup everything for async mode? It will
put the connection into autocommit and disable (meaning that they will
raise an exception) methods such as commit(), rollback() and
set_transaction_isolation(). We just need a list of what psycopg should
or should not do.

federico
--
Federico Di Gregorio fog-NGVKUo/i/***@public.gmane.org
God is real. Unless declared integer. -- Anonymous FORTRAN programmer
Jan Urbański
2010-03-29 22:17:44 UTC
Permalink
Post by Federico Di Gregorio
Post by Jan Urbański
Yeah, there are places where psycopg2 issues queries behind your back :(
That's why I said that a user of the async features would probably have
to set the transaction isolation level to autocommit and issue his own
BEGINs and COMMITs from a Cursor.execute() instead.
code using psycopg in async mode will usually be some asynchronous
framework of some sort and not code written directly by the user. Then
it make sense to depythonize the API a little bit and let the client
code use psycopg as a facility to issue asynchronous queries. Also I
don't think that keeping a single path in the code is such a good idea,
so why not have a connect() call that when invoked with async=True takes
a completely different path and setup everything for async mode? It will
put the connection into autocommit and disable (meaning that they will
raise an exception) methods such as commit(), rollback() and
set_transaction_isolation(). We just need a list of what psycopg should
or should not do.
This sounds good. In my patches the connect() call is choosing different
codepaths sync and async connecting (conn_sync_connect vs
conn_async_connect).

The behaviour I proposed in a previous mail was that repeated execute()
attempts, commit() and rollback() were blocking, so app code would have
to make sure it's not calling and take care of serialising execute()
calls. Same for fetch() - if the cursor has not finished executing yet,
it was blocking until it got the result. That was an attempt to make
async cursors as safe as possible: that way you could almost use them as
sync cursor and should not really notice. But I guess there's no value
in that.

Maybe it could work like this:
* async connections start in isolation level 0
* connection methods that raise an exception if called on an async
connection:
* commit
* rollback
* set_isolation_level
* set_client_encoding
* lobject
* named cursor creation
* cursior methods that raise an exception if connection is async:
* execute if another execute is underway
* same for callproc
* executemany
* scroll
* calling fetch before the query ended results in an error (not
sure about that one, though, might be better to block)
* copy_{from, to, expert) (might become supported later)

I don't really mind making these error out instead of trying to make
them work, even if they'd have to block. If I'm to use it, it will be
via a Twisted wrapper, that will present me with a different API that
will take care of not allowing multiple executes and will not have
things like scroll etc.

Cheers,
Jan
Federico Di Gregorio
2010-03-30 13:17:37 UTC
Permalink
On 30/03/2010 00:17, Jan Urbański wrote:
[snip]
Post by Jan Urbański
This sounds good. In my patches the connect() call is choosing different
codepaths sync and async connecting (conn_sync_connect vs
conn_async_connect).
The behaviour I proposed in a previous mail was that repeated execute()
attempts, commit() and rollback() were blocking, so app code would have
to make sure it's not calling and take care of serialising execute()
That was also my initial idea. But more I think about it more I feel
that async code should be _completely_ async and the client code should
cope with it (and get loud exceptions in case of errors).
Post by Jan Urbański
* async connections start in isolation level 0
* connection methods that raise an exception if called on an async
* commit
* rollback
* set_isolation_level
* set_client_encoding
* lobject
Agreed.
Post by Jan Urbański
* named cursor creation
Initially. In the future I'd like to have support for async server-side
cursors too.
Post by Jan Urbański
* execute if another execute is underway
* same for callproc
* executemany
Agreed.
Post by Jan Urbański
* scroll
scroll() should raise an exception only if the cursor is in the middle
of a query. If all data has been fetched then scroll should keep its
currrent behaviour.
Post by Jan Urbański
* calling fetch before the query ended results in an error (not sure
about that one, though, might be better to block)
I'd like it to raise.
Post by Jan Urbański
* copy_{from, to, expert) (might become supported later)
Agreed.
Post by Jan Urbański
I don't really mind making these error out instead of trying to make
them work, even if they'd have to block. If I'm to use it, it will be
via a Twisted wrapper, that will present me with a different API that
will take care of not allowing multiple executes and will not have
things like scroll etc.
During the next few days I'll start pulling from your tree. I won't pull
any change that does non-trivial changes to the current code base but
everything that add a different code path is ok. I'll post on this list
as I progress.

Thank you very much for your contribution,

federico
--
Federico Di Gregorio fog-NGVKUo/i/***@public.gmane.org
Come sai se il primo Ú ben dato? In realtà Ú "Beh il secondo bacio ben
dato non si rifiuta". -- Alessandra
Jan Urbański
2010-03-31 00:06:59 UTC
Permalink
Post by Federico Di Gregorio
Post by Jan Urbański
The behaviour I proposed in a previous mail was that repeated execute()
attempts, commit() and rollback() were blocking, so app code would have
to make sure it's not calling and take care of serialising execute()
That was also my initial idea. But more I think about it more I feel
that async code should be _completely_ async and the client code should
cope with it (and get loud exceptions in case of errors).
Post by Jan Urbański
[things raise errors in async mode]
I've pushed these changes to my branch, along with a few bugfixes in the
async execution/connection implementation.
Post by Federico Di Gregorio
During the next few days I'll start pulling from your tree. I won't pull
any change that does non-trivial changes to the current code base but
everything that add a different code path is ok. I'll post on this list
as I progress.
Thanks! If you'd like me to rework anything, just say so. I'm looking
forward to having a fully operational async psycopg2 driver. On the PG
Python dirver market psycopg2 is the only choice for me and I hope that
it will eventually estabilish itself as the de facto standard for PG+Python.

Cheers,
Jan
Jan Urbański
2010-03-31 00:15:32 UTC
Permalink
Post by Daniele Varrazzo
Skimming over the patches I have a question: is it really necessary to add
the callback feature into the C library? AFAICS the callback is executed on
asynchronous cursors everytime there is a execute() attempt. Couldn't you
just write a Python psycopg2.extensions.cursor subclass that overrides
execute() by calling that callback?
As things are now, execute and callproc being the only methods using
async features, this is the case. Probably it will be too extending
support to other currently blocking methods (begin/commit/rollback
etc).
From the discussion it seems that Federico would prefer to shift the
responsability of safely using async features to the framerworks instead
of instrumenting more psycopg methods.

OTOH I saw your post on the gevent mailing list and it indeed would be
cool if you could change your apps to non-blocking without changing
their code, by just dropping in the gevent module. I'm not convinced
that's fully possible, though. I think gevent could provide connection
and cursor subclasses that would emulate connection.commit() by issuing
a COMMIT query and waiting (non-blockingly) for it to return. You could
probably do the same thing for other methods and build up a fully async
drop-in replacement that you could use with django or whatnot.

I think it makes sense to only expose the raw stuff and let frameworks
use it the way they see fit.

All that sounds very interesting, please keep posting your results!

Cheers,
Jan
Daniele Varrazzo
2010-03-31 11:29:50 UTC
Permalink
Post by Jan Urbański
Post by Daniele Varrazzo
Skimming over the patches I have a question: is it really necessary to add
the callback feature into the C library? AFAICS the callback is executed on
asynchronous cursors everytime there is a execute() attempt. Couldn't you
just write a Python psycopg2.extensions.cursor subclass that overrides
execute() by calling that callback?
As things are now, execute and callproc being the only methods using
async features, this is the case. Probably it will be too extending
support to other currently blocking methods (begin/commit/rollback
etc).
From the discussion it seems that Federico would prefer to shift the
responsability of safely using async features to the framerworks instead of
instrumenting more psycopg methods.
OTOH I saw your post on the gevent mailing list and it indeed would be cool
if you could change your apps to non-blocking without changing their code,
by just dropping in the gevent module. I'm not convinced that's fully
possible, though. I think gevent could provide connection and cursor
subclasses that would emulate connection.commit() by issuing a COMMIT query
and waiting (non-blockingly) for it to return. You could probably do the
same thing for other methods and build up a fully async drop-in replacement
that you could use with django or whatnot.
I think it makes sense to only expose the raw stuff and let frameworks use
it the way they see fit.
All that sounds very interesting, please keep posting your results!
Hello, I have been working on the "other way". Couldn't write anything
in the last two days, but what I've done has contributed to clarify
myself things a little bit.

What I've understood is that what I want to do, after all... is not
async support :) In the sense that what I want to do doesn't allow,
for instance, to send 2 async queries and put both them in the same
select() call. Each call is still blocking, and still blocks on its
own. The only difference is that it gives the external environment a
possibility to use their way of blocking. For coroutine frameworks,
this block can yield to another waiting coroutine, enabling a
cooperative switch in the middle of a libpq blocking call. I think it
wouldn't work fine for "normal" system, where there is a single Python
stack (Twisted or an UI event loop): in these systems the call would
simply block.

So I think we are really doing two different things: with your
evolution psycopg will be usable with larger freedom and scheduled to
block whenever the caller decides, at the price of not being usable as
DBAPI adapter anymore, but more as a "low level tool" as described.
With my evolution psycopg would be usable from stackless environments
maintaining compatibility with everything built on top of it, but with
an imposed constraint: the select() is called when a psycopg function
is called, non later at users' will.

My patch is relatively little invasive: it only affects where PQexec
is called, i.e. only pq_execute_command_locked (there are also calls
in conn_setup but can probably be refactored to pass through the above
function) and where PQconnectdb is called. These calls are replaced by
the matching async calls, invoking the "wait hook" in the middle. It
is not very different from what e.g. PQexec does in the libpq (it
calls the async version and then blocks using a poll() or a select()).

My plan at this point is to stop calling what I'm doing "async" :) and
let you build the async path as is best for the frameworks you know
about. What I am really touching is the "sync" code path, adding a
"stackless friendly hook". I've currently implemented the hook call
and the connection, which I think was the trickiest part: as soon as i
can I shall implement the PQexec replacement, refactor the conn_setup
to use it and check that all the tests pass using an user supplied
hook (my target is to keep the full DBAPI semantics).

Thank you very much, have a nice day.

-- Daniele
Federico Di Gregorio
2010-03-31 13:03:40 UTC
Permalink
Post by Daniele Varrazzo
Post by Jan Urbański
From the discussion it seems that Federico would prefer to shift the
Post by Jan Urbański
responsability of safely using async features to the framerworks instead of
instrumenting more psycopg methods.
[snippety snip]
Post by Daniele Varrazzo
My plan at this point is to stop calling what I'm doing "async" :) and
let you build the async path as is best for the frameworks you know
about. What I am really touching is the "sync" code path, adding a
"stackless friendly hook". I've currently implemented the hook call
and the connection, which I think was the trickiest part: as soon as i
can I shall implement the PQexec replacement, refactor the conn_setup
to use it and check that all the tests pass using an user supplied
hook (my target is to keep the full DBAPI semantics).
Thank you very much, have a nice day.
Hi Daniele,

thank you very much for sharing this. I agree that the async
implementation and the stackless friendly hook are aiming at two
different targets and I'd like to have both in a future psycopg
implementation. I'll concentrate on async because quite some people
asked for it and Jan's code is a very good start. But please, if you see
any change that will make more difficult to write the stackless friendly
hook in the future, just stand up and we'll find a different way.

federico
--
Federico Di Gregorio <mailto:fog-NGVKUo/i/***@public.gmane.org> <jid:fog-6VaHXZgVb86aHsy+***@public.gmane.org>
DISCLAIMER. If I receive a message from you, you are agreeing that:
1. I am by definition, "the intended recipient".
2. All information in the email is mine to do with as I see fit and
make such financial profit, political mileage, or good joke as it lends
itself to. In particular, I may quote it on USENET or the WWW.
3. I may take the contents as representing the views of your company.
4. This overrides any disclaimer or statement of confidentiality that
may be included on your message.
Loading...