Discussion:
Multi-handle hanging up immediately after creating FTP data socket (using GCD)
Bill Doyle
2013-08-07 01:55:16 UTC
Permalink
Hi! I'm in the process of building a simple libcurl FTP wrapper for a
Cocoa app, and I'm using Grand Central Dispatch as the backing for a
multi-handle. I've got the easy-handle configured properly, as it will
upload fine when run with curl_easy_perform. However, when the
multi-handle attempts the upload, it gets as far as opening the data
socket to the server and then immediately gives up and closes the
control socket. I've been poking at this for several days with no
success, so I was hoping that someone here would be able to offer a
bit of assistance. Here are the logs as provided by my
CURLOPT_DEBUGFUNCTION, plus a few extras:
-----

new timeout = 1
[CURLInterface] Added handle to curlMulti.
About to connect() to unsquared.co port 21 (#0)
Trying 205.134.239.195...
Adding handle: conn: 0x103811000
Adding handle: send: 0
Adding handle: recv: 0
Curl_addHandleToPipeline: length: 1
- Conn 0 (0x103811000) send_pipe: 1, recv_pipe: 0
Creating CurlSocketContext_S for 5
CURL_POLL_OUT for 5
new timeout = 1
new timeout = 1
Connected to unsquared.co (205.134.239.195) port 21 (#0)
CURL_POLL_IN for 5
220---------- Welcome to Pure-FTPd [privsep] [TLS] ----------
220-You are user number 1 of 150 allowed.
220-Local time is now 17:40. Server port: 21.
220-This is a private system - No anonymous login
220-IPv6 connections are also welcome on this server.
220 You will be disconnected after 30 minutes of inactivity.
220-You are user number 1 of 150 allowed.
220-Local time is now 17:40. Server port: 21.
220-This is a private system - No anonymous login
220-IPv6 connections are also welcome on this server.
220 You will be disconnected after 30 minutes of inactivity.
220-Local time is now 17:40. Server port: 21.
220-This is a private system - No anonymous login
220-IPv6 connections are also welcome on this server.
220 You will be disconnected after 30 minutes of inactivity.
220-This is a private system - No anonymous login
220-IPv6 connections are also welcome on this server.
220 You will be disconnected after 30 minutes of inactivity.
220-IPv6 connections are also welcome on this server.
220 You will be disconnected after 30 minutes of inactivity.
220 You will be disconnected after 30 minutes of inactivity.
USER sphere6
331 User sphere6 OK. Password required
nutes of inactivity.

220-You are user number 1 of 150 allowed.
220-Local time is now 17:40. Server port: 21.
220-This is a private system - No anonymous login
220-IPv6 connections are also welcome on this server.
220 You will be disconnected after 30 minutes of inactivity.
PASS **************
230 OK. Current restricted directory is /
es of inactivity.

220-You are user number 1 of 150 allowed.
220-Local time is now 17:40. Server port: 21.
220-This is a private system - No anonymous login
220-IPv6 connections are also welcome on this server.
220 You will be disconnected after 30 minutes of inactivity.
PWD
257 "/" is your current location
ry is /
Entry path is '/'
CWD public_html
250 OK. Current directory is /public_html
es of inactivity.

220-You are user number 1 of 150 allowed.
220-Local time is now 17:40. Server port: 21.
220-This is a private system - No anonymous login
220-IPv6 connections are also welcome on this server.
220 You will be disconnected after 30 minutes of inactivity.
CWD files
250 OK. Current directory is /public_html/files
inactivity.

220-You are user number 1 of 150 allowed.
220-Local time is now 17:40. Server port: 21.
220-This is a private system - No anonymous login
220-IPv6 connections are also welcome on this server.
220 You will be disconnected after 30 minutes of inactivity.
EPSV
Connect data stream passively
229 Extended Passive mode OK (|||31818|)
files
Trying 205.134.239.195...
Connecting to 205.134.239.195 (205.134.239.195) port 31818
TYPE I
200 TYPE is now 8-bit binary
|||31818|)
STOR test.jpg
Creating CurlSocketContext_S for 7
CURL_POLL_IN for 7
CURL_POLL_REMOVE for 5
cancelling 5
cancelling 5

-----
The duplicated messages, especially after the initial login, along
with the small fragments of previous messages that are interspersed,
lead me to believe that this is threading related, but I can't pick
out what I've done wrong.

I based the code on the existing GCD "crash-test" example
(http://curl.haxx.se/mail/lib-2013-03/att-0309/multi-gcd-crashtest.c),
and I've tried several variations on it. For example, it makes no
difference whether my CURLOPT_SOCKOPTFUNCTION is present. I've
attached the relevant Objective-C implementation file. Another
interesting note is that libcurl never asks to adjust the timeout
value (CURLOPT_TIMERFUNCTION), though that may just be due to the lack
of any uploading actually happening. I'm not exactly sure when that's
meant to happen. (You can see the timeout being set to the initial 1ms
to kick things off at the beginning of the logs.)

It's entirely possible that this is more of a GCD issue than a libcurl
issue, and if that's the case, I'd be happy to move this question
elsewhere. Either way, thanks in advance for any help you can provide!
Bill
Daniel Stenberg
2013-08-07 20:31:48 UTC
Permalink
Post by Bill Doyle
Hi! I'm in the process of building a simple libcurl FTP wrapper for a
Cocoa app, and I'm using Grand Central Dispatch as the backing for a
multi-handle.
That's MacOS then, right? Which libcurl version are you using?
Post by Bill Doyle
it gets as far as opening the data socket to the server and then immediately
gives up and closes the control socket.
I can't say that I recall this problem but when we switched over to doing only
multi internally (before... was it 7.28?) I did fix a number of problems with
the multi interface.
Post by Bill Doyle
The duplicated messages, especially after the initial login, along with the
small fragments of previous messages that are interspersed, lead me to
believe that this is threading related, but I can't pick out what I've done
wrong.
Those weird-looking lines combined with your words about threading makes me
scared. What are you doing multi-threaded?

I did quickly glance over the code you attached but it was very complicated to
follow.

Can you try to implement your logic in a plain (non-threaded) C program and
see if that works?
Post by Bill Doyle
Another interesting note is that libcurl never asks to adjust the timeout
value (CURLOPT_TIMERFUNCTION),
Never? For every handle you add it will set a 1 ms timeout so it certainly
should...
Post by Bill Doyle
It's entirely possible that this is more of a GCD issue than a libcurl
issue, and if that's the case, I'd be happy to move this question elsewhere.
Perhaps, but I can't say. I know nothing about GCD and objective-C is mostly
confusing to me...
--
/ daniel.haxx.se
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Nick Zitzmann
2013-08-07 21:26:11 UTC
Permalink
Post by Daniel Stenberg
Post by Bill Doyle
Hi! I'm in the process of building a simple libcurl FTP wrapper for a
Cocoa app, and I'm using Grand Central Dispatch as the backing for a
multi-handle.
That's MacOS then, right?
Possibly, although GCD has a C interface (it's not just Objective-C), and has been ported to a few other operating systems, including GNU/Linux, so you don't need an Apple operating system to use it anymore.

Nick Zitzmann
<http://www.chronosnet.com/>


-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Bill Doyle
2013-08-07 21:39:57 UTC
Permalink
Post by Daniel Stenberg
That's MacOS then, right? Which libcurl version are you using?
Yep, libcurl 7.30.0 on OS X, built using the MacOSX-Framework script.
Post by Daniel Stenberg
Those weird-looking lines combined with your words about threading makes me
scared. What are you doing multi-threaded?
This is where it gets a bit tricky: GCD is outwardly queue-based, but
manages threads internally. I have little control over what it chooses
to do thread-wise, though I have told it to use a serial queue, so
everything should be getting executed in the proper order. Evidently,
something isn't working quite as I expected it to, though. I'm
continuing to poke around at that aspect of it.
Post by Daniel Stenberg
I did quickly glance over the code you attached but it was very complicated
to follow.
Yeah, sorry about that. I'm still in the process of learning GCD, so
this has been patched together as I translated it from C to
Objective-C. I'm planning on re-writing this, but I need it working
prior to doing so. :)
Post by Daniel Stenberg
Can you try to implement your logic in a plain (non-threaded) C program and
see if that works?
That would essentially just be the crash-test program that I linked,
since I did the opposite to get it into this state. But, like I said
above, I'm not actually doing anything with threads, GCD is, so that
code *should* have roughly the same behavior. I'm currently trying to
get that program compiled and running to ensure that it does in fact
work, but given that it was written in March of this year, I can't see
there being any libcurl-version-related issues there.
Post by Daniel Stenberg
Post by Bill Doyle
Another interesting note is that libcurl never asks to adjust the timeout
value (CURLOPT_TIMERFUNCTION),
Never? For every handle you add it will set a 1 ms timeout so it certainly
should...
Well, never after initially setting it to 1ms. I wasn't certain
whether it was supposed to attempt changing the timeout again later in
the process.
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Daniel Stenberg
2013-08-08 09:50:31 UTC
Permalink
Post by Bill Doyle
Post by Daniel Stenberg
Can you try to implement your logic in a plain (non-threaded) C program and
see if that works?
That would essentially just be the crash-test program that I linked,
Right, but that was also highly unreadable and hard to follow so that wasn't
really a good read for me either.
Post by Bill Doyle
Post by Daniel Stenberg
Post by Bill Doyle
Another interesting note is that libcurl never asks to adjust the timeout
value (CURLOPT_TIMERFUNCTION),
Never? For every handle you add it will set a 1 ms timeout so it certainly
should...
Well, never after initially setting it to 1ms. I wasn't certain whether it
was supposed to attempt changing the timeout again later in the process.
After the first timeout, you should get another soon enough for 300 seconds
into the future which is the name resolver timeout and then things will of
course depend a lot on your build setup and what options you use to libcurl.

Can you perhaps describe in words which libcurl functions you're using in this
program and in which order, like just a high-level description of what it is
supposed to do?
--
/ daniel.haxx.se
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Bill Doyle
2013-08-08 13:44:04 UTC
Permalink
Post by Daniel Stenberg
Right, but that was also highly unreadable and hard to follow so that wasn't
really a good read for me either.
That I can't argue with, it took me a week of poking at it to figure
out what was going on. I chalked it up to not entirely understanding
libcurl or GCD, but good to know that it isn't just me.
Post by Daniel Stenberg
Can you perhaps describe in words which libcurl functions you're using in
this program and in which order, like just a high-level description of what
it is supposed to do?
Yeah, I can give it a try. I'm still in the process of learning both
components here, but I think I understand what's *supposed* to be
going on:

When the CURLInterface object is initially created (so everything
inside the -init method), I go through the process of a) setting up
the GCD queue, and b) the curl multi-handle. The queue is pretty easy,
just a single call to dispatch_queue_create() telling it that I want a
serial queue, and then creating/attaching the initial timeout to kick
everything off. How this happens is described a few paragraphs down.
(I haven't set a specific timer interval here, but that doesn't seem
to affect it, so I assume that it defaults to not doing anything? I
haven't been able to find any info on that, I'm going to ping the
#macdev room on Freenode later today and see if anyone knows.)

Once the object is set up, it simply waits for someone to call
-addUpload:… with all the various info it needs to do so. The first
thing it does is construct a simple data object that stores all of the
state info about the handle, such as current byte, status code, debug
output, etc. It then creates an easy-handle, sets all of the various
options on it, and adds it to the multi-handle. In particular, the
state object (an instance of CURLHandleData) is attached using
CURLOPT_PRIVATE and also as the userData pointer for most of the
callbacks. It is also returned to whoever called -addUpload:… so that
they can use it to track the upload's progress.
Post by Daniel Stenberg
From there, it follows the typical multi-handle flow. socket_callback
and timer_callback handle requests from the multi-handle to attach
event handlers to GCD. socket_callback needs access to the
CURLInterface to attach things properly, so that's what it receives as
its userData pointer. Because of that, you can think of the
socket_callback and -attachSourceOfType:… method as being a single
piece. socket_callback's socketData pointer is configured to point at
a simple struct that keeps track of the GCD sources used to watch for
events on the specified socket, so they can be updated or removed as
needed.

When activity is detected on the sockets, or from the timer, a call to
-performPartialUpload:forSocket: is added to the GCD queue. This
method calls curl_multi_socket_action, and then reads through any
available info from curl_multi_info_read so that it can notify the
object's delegate (whoever is managing the uploads) that something has
happened, be it success, failure, whatever. It also performs the
cleanup of completed easy-handles and their attached CURLHandleData
object.

As far as the callbacks for the easy-handles go:
- read_function is reading from an NSData object, which is basically
just a wrapper around some data stored in memory. It uses the
CURLHandleData object assigned as its userData pointer to keep track
this NSData object, and also its current position within the data.
- progress_callback simply updates the progress that's also stored in
the CURLHandleData object, and also calls a method on the delegate to
inform it of the progress.
- debug_callback writes the debug information into the CURLHandleData object.
- socket_function sets a keepalive on the sockets, but it doesn't seem
to have any effect on how things work (or don't).

Sorry for the long post, but I think that covers just about anything.
It's entirely possible that I've forgotten something, as I'm typing
this in a hurry. Just poke at me and I can add info later this
evening.

-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Daniel Stenberg
2013-08-08 21:28:48 UTC
Permalink
On Thu, 8 Aug 2013, Bill Doyle wrote:

Thanks. There's clearly still a distance between where I sit and where you
are. Your description is still filled with references I have a hard time to
figure out...
I go through the process of a) setting up the GCD queue, and b) the curl
multi-handle. The queue is pretty easy, just a single call to
dispatch_queue_create() telling it that I want a serial queue, and then
creating/attaching the initial timeout to kick everything off. How this
happens is described a few paragraphs down. (I haven't set a specific timer
interval here, but that doesn't seem to affect it, so I assume that it
defaults to not doing anything?
This wording makes me curious. What timer interval are you talking about?

Your event dispatcher should monitor all file descriptors/sockets you've asked
it to monitor and tell you when the timeout is reached. There should be no
timer interval involved!
It then creates an easy-handle, sets all of the various options on it, and
adds it to the multi-handle. In particular, the state object (an instance of
CURLHandleData) is attached using CURLOPT_PRIVATE and also as the userData
pointer for most of the callbacks. It is also returned to whoever called
-addUpload:… so that they can use it to track the upload's progress.
Okay so after you've called curl_multi_add_handle() to add that brand new easy
handle to the multi handle, what libcurl function do you call then? If you
don't explicitly call a function, you then just call one when the next action
happens?
- socket_function sets a keepalive on the sockets, but it doesn't seem
to have any effect on how things work (or don't).
I think you're talking about TCP keepalive here? I recommend
CURLOPT_TCP_KEEPALIVE and friends instead for that!
--
/ daniel.haxx.se
Bill Doyle
2013-08-09 16:05:44 UTC
Permalink
Post by Daniel Stenberg
This wording makes me curious. What timer interval are you talking about?
Your event dispatcher should monitor all file descriptors/sockets you've
asked it to monitor and tell you when the timeout is reached. There should
be no timer interval involved!
Alright, this might be one of the points that's causing me trouble
then. So when the CURLMOPT_TIMERFUNCTION is told to set a timeout, is
it always supposed to be a one-time thing? GCD's timers are
interval-based, so if that's the case then I'll either have to do some
trickery to stop them again after the first time they run, or switch
to using dispatch_after().

In the parenthesized portion of that sentence, I'm referring to when I
initially set up the timer source's block and give it to GCD. I'm
simply reusing the same timer source whenever libcurl requests a
timeout, so when I initially create the CURLHandle object, since
nothing has been added yet, I just don't set the interval. It appears
that this has the desired effect of "doing nothing", but I'm working
on confirming that. If I switch to dispatch_after(), of course, that
will be irrelevant.
Post by Daniel Stenberg
Okay so after you've called curl_multi_add_handle() to add that brand new
easy handle to the multi handle, what libcurl function do you call then? If
you don't explicitly call a function, you then just call one when the next
action happens?
I'm not calling anything after adding the handle. Should I be? I
thought that libcurl would set the first timeout when I added a handle
and do everything on its own from there on out. It appears to connect
to the server just fine the way I'm doing it now, but if I'm missing a
step, that would explain a lot.
Post by Daniel Stenberg
I think you're talking about TCP keepalive here? I recommend
CURLOPT_TCP_KEEPALIVE and friends instead for that!
To be honest, I just copied it verbatim from the GCD example that I
was following. Didn't even have it in there originally, I just wanted
to see if it would have any effect on it. If I do end up needing it,
I'll definitely switch to libcurl's built-in option.
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Daniel Stenberg
2013-08-09 21:48:02 UTC
Permalink
Post by Bill Doyle
Alright, this might be one of the points that's causing me trouble
then. So when the CURLMOPT_TIMERFUNCTION is told to set a timeout, is
it always supposed to be a one-time thing?
Yes. It is always one single (active) timer and it is always single-shot. If
you get a new timeout callback (before or after the existing one expires),
that's the new time to set for the timer.
Post by Bill Doyle
Post by Daniel Stenberg
Okay so after you've called curl_multi_add_handle() to add that brand new
easy handle to the multi handle, what libcurl function do you call then? If
you don't explicitly call a function, you then just call one when the next
action happens?
I'm not calling anything after adding the handle. Should I be?
Sorry, I was a bit rusty with the specific details but I read up on the source
code and I also just updated the curl_multi_add_handle() man page to clarify:
when you add a new handle it will cause the timer callback to get called with
the updated timeout timer (1ms) so when that expires you'll of course call
curl_multi_socket_action() on that handle and then it is kicked off. You
should of course be able to see this.

Depending on what name resolver backend you use, you should get some timeout
set after the 1ms timeout but then there's no other default timeout if I
remember correctly and it should only be doing socket callbacks to alter what
you should wait for.
--
/ daniel.haxx.se
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Bill Doyle
2013-08-09 23:36:15 UTC
Permalink
Post by Daniel Stenberg
Yes. It is always one single (active) timer and it is always single-shot. If
you get a new timeout callback (before or after the existing one expires),
that's the new time to set for the timer.
Alright, cool. This makes it a bit more complicated, as there's no way
to cancel a dispatch_after() call to change an existing timeout. Will
I run into issues if I leave the previous timeout running after
creating a new one? It would cause an extra call to
curl_multi_socket_action(), and I'm not sure how much damage this
could cause. If it's a bad idea, then I'll have to use the timer
source as I am now, with some trickery to have it cancel itself after
a single run. Probably not too difficult, but not very clean.
Post by Daniel Stenberg
Sorry, I was a bit rusty with the specific details but I read up on the
source code and I also just updated the curl_multi_add_handle() man page to
clarify: when you add a new handle it will cause the timer callback to get
called with the updated timeout timer (1ms) so when that expires you'll of
course call curl_multi_socket_action() on that handle and then it is kicked
off. You should of course be able to see this.
Depending on what name resolver backend you use, you should get some timeout
set after the 1ms timeout but then there's no other default timeout if I
remember correctly and it should only be doing socket callbacks to alter
what you should wait for.
Excellent, this appears to be (at least close to) the behavior that
I'm already seeing. I'm using the default resolver, and it doesn't
seem to set a timeout, or if it does, it's not doing it through my
TIMEOUTFUNCTION. This leads me into the new discoveries that I've
made.

I was switching over to using dispatch_after() to only run the timeout
once, just to get a jump on seeing if it even worked. While it doesn't
have any *extra* issues, it's not getting any further than it was.
However, I made progress on two issues while working on this. The
first is that I fixed the odd looking log output. Turns out that this
wasn't thread-related at all, but due to the fact that I had neglected
to limit the output to the length of the string provided to my
debug_function. Since it isn't null-terminated, it was spewing extra
copies of contents everywhere each time I printed it. That's why the
extra content got one line shorter every time it printed, because the
string was being advanced by one line of output on each call. So
that's good.

Secondly, and more importantly, I discovered that it isn't outright
hanging after dropping the control connection. I ran out of ideas and
decided to let it just sit there after running as far as it usually
does, and, lo and behold, it 15 minutes later it finally timed out
with code 55! Before stopping, it printed "150 Data Connection
Accepted", meaning that it must have received that from the server
prior to hanging. The fact that it didn't print it out prior to this
makes me think that maybe that message was stuck in the socket
somehow, and my dispatch queue sources didn't pick it up?

In addition, it called the read_function twice or so (probably just
from the false action on the socket being picked up?), print a bit of
what I assume was the contents of the test file (a jpeg, so it was
nonsense), and then finally let me know that it "timed out after x
milliseconds with 0 out of -1 bytes received". A fifteen minute
timeout was never set using my timeout_callback, so where is this
occuring? Is it within libcurl, or is this some system thing giving up
and forcing the socket closed?

This last part also makes me curious: It appears that it's letting me
know that it *downloaded* 0 out of -1 bytes, which makes sense, being
an upload. However, does that fact that it didn't tell me about the
*upload* state mean that some part of it is stuck thinking that it
should be receiving data instead of sending it? I've set
CURLOPT_UPLOAD, and it sends the STOR command to the server, so this
seems unlikely, but it made me wonder.

Finally, I'm seeing a very odd and extremely intermittent issue: every
once in a while, when I perform the upload (with no variation in the
process, at least that *I'm* causing), rather than just getting the
normal messages about reading on a socket, I see a massive amount of
writes. I'm logging it from within the dispatch source that monitors
the socket. I attached a log that shows this event, it looks identical
to a normal run other than the wall of "socket 6: write" messages. The
log also contains the timeout at the end, though it was shortened by
the addition of a CURLOPT_TIMEOUT on the easy-handle so that it isn't
taking 15 minutes anymore.
Daniel Stenberg
2013-08-10 11:28:41 UTC
Permalink
Post by Daniel Stenberg
It is always one single (active) timer and it is always single-shot
Alright, cool. This makes it a bit more complicated, as there's no way to
cancel a dispatch_after() call to change an existing timeout. Will I run
into issues if I leave the previous timeout running after creating a new
one? It would cause an extra call to curl_multi_socket_action()
While I think that's a rather sever limitation of a timer functionality,
calling curl_multi_socket_action() a little bit too often won't cause any harm
at all. It'll just not be as optimal as otherwise.
I'm already seeing. I'm using the default resolver, and it doesn't
seem to set a timeout, or if it does, it's not doing it through my
TIMEOUTFUNCTION. This leads me into the new discoveries that I've
made.
The default name resolver is synchronous so it'll block until done so it
doesn't use a timer.
I fixed the odd looking log output.
Great, removing doubts and problems one by one is indeed good and will help us
to eventually get down to the actual culprit!
Secondly, and more importantly, I discovered that it isn't outright hanging
after dropping the control connection. I ran out of ideas and decided to let
it just sit there after running as far as it usually does, and, lo and
behold, it 15 minutes later it finally timed out with code 55! Before
stopping, it printed "150 Data Connection Accepted", meaning that it must
have received that from the server prior to hanging. The fact that it didn't
print it out prior to this makes me think that maybe that message was stuck
in the socket somehow, and my dispatch queue sources didn't pick it up?
A very strange effect. I won't rule out the risk that this is actually a
libcurl bug so that perhaps you've not been told to wait for the correct
socket action and the 15 minutes timeout that occurs is probably something the
server end has implemented...

I'll see if I can get some time soon to write an event-based ftp upload
application in plain C with something like libevent and verify how that works.
Unfortunately we don't have any event-based tests in the test suite. I have
that on my TODO for the coming months but I can't promise anything. (And I'll
bring this up in a separate topic on the list in a short while.)
In addition, it called the read_function twice or so (probably just
from the false action on the socket being picked up?), print a bit of
what I assume was the contents of the test file (a jpeg, so it was
nonsense), and then finally let me know that it "timed out after x
milliseconds with 0 out of -1 bytes received". A fifteen minute
timeout was never set using my timeout_callback, so where is this
occuring? Is it within libcurl, or is this some system thing giving up
and forcing the socket closed?
From what I can tell, those outputs are all shown when libcurl considers a
timeout to have expired. (They exist in two places in the code.)
This last part also makes me curious: It appears that it's letting me know
that it *downloaded* 0 out of -1 bytes, which makes sense, being an upload.
I believe that's just a flaw in the output message. It shows that completely
unconditionally of what operation that was attempted. We can consider that
room for improvement!
Finally, I'm seeing a very odd and extremely intermittent issue: every once
in a while, when I perform the upload (with no variation in the process, at
least that *I'm* causing), rather than just getting the normal messages
about reading on a socket, I see a massive amount of writes.
Are those happening when libcurl has told the app to wait for a socket
writability and the socket turns writable? Or who's telling that socket write
thing? It seems wrong to wait for a writable socket when libcurl waits for a
command response...

There's a chance that the problems you see and have are depending on timing so
perhaps every once in a while it'll act different due to the timing and things
happening slighlty slower or faster or in another order...
--
/ daniel.haxx.se
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Bill Doyle
2013-08-10 16:26:23 UTC
Permalink
Post by Daniel Stenberg
While I think that's a rather sever limitation of a timer functionality,
calling curl_multi_socket_action() a little bit too often won't cause any
harm at all. It'll just not be as optimal as otherwise.
It is definitely an annoying limitation. I'll work on wrapping around
one of the two methods a bit later to see if I can get it behaving
properly, but until then, it's good to know that it isn't hurting
anything. I know that Apple provides this functionality in the
higher-level NSOperation class, but in the interests of keeping this
simple/lightweight, I'll try working it out on my own.
Post by Daniel Stenberg
The default name resolver is synchronous so it'll block until done so it
doesn't use a timer.
That would be why, then! :)
Post by Daniel Stenberg
A very strange effect. I won't rule out the risk that this is actually a
libcurl bug so that perhaps you've not been told to wait for the correct
socket action and the 15 minutes timeout that occurs is probably something
the server end has implemented...
I checked the manual for pure-ftpd, and 15 minutes is indeed the
default idle timeout, so libcurl is definitely making no attempt to
send any data (or to tell me to do so) after opening the second
socket.
Post by Daniel Stenberg
I'll see if I can get some time soon to write an event-based ftp upload
application in plain C with something like libevent and verify how that
works. Unfortunately we don't have any event-based tests in the test suite.
I have that on my TODO for the coming months but I can't promise anything.
(And I'll bring this up in a separate topic on the list in a short while.)
Ok, and I'll keep bashing on it from this side as well. I poked around
in the libcurl source a little bit the other day, but it'll take a
while to become familiar enough with it to really be able to make a
dent in it. Also, I think I'll try reversing the polarity (heh) and
see if a download completes successfully. That will at least give some
direction to my poking.
Post by Daniel Stenberg
Post by Bill Doyle
Finally, I'm seeing a very odd and extremely intermittent issue: every
once in a while, when I perform the upload (with no variation in the
process, at least that *I'm* causing), rather than just getting the normal
messages about reading on a socket, I see a massive amount of writes.
Are those happening when libcurl has told the app to wait for a socket
writability and the socket turns writable? Or who's telling that socket
write thing? It seems wrong to wait for a writable socket when libcurl waits
for a command response...
As far as I can tell, libcurl is instructing me to watch that socket
just like the others. It happens on the same socket as the normal
exchange of commands, so I'm not sure what's prompting it to register
for write availability. On a normal run it only registers for
CURL_POLL_IN, but on the anomolous runs it also registers for
CURL_POLL_OUT (and this happens before registering for CURL_POLL_IN,
if that makes a difference). The writes start immediately when the
socket is opened and the Curl_addHandleToPipeline happens, before the
welcome message has even arrived from the server, so something seems
to be getting very confused.
Post by Daniel Stenberg
There's a chance that the problems you see and have are depending on timing
so perhaps every once in a while it'll act different due to the timing and
things happening slighlty slower or faster or in another order...
Timing was definitely my first thought, just because there seems to be
no rhyme or reason as to when it happens. I think I might try to set
up some packet-sniffing software and see if I can't get a handle on
exactly what is getting sent where when it happens. I'll also switch
on verbose logging on my test FTP server to see if that has any clues.

As I was finishing up the paragraph on the strange writes, I noticed
something: when it opens the data connection to the server, it's
registering for CURL_POLL_IN. Shouldn't this be CURL_POLL_OUT for an
upload? That would explain why it stops, it's waiting for data to
arrive when none ever will, instead of trying to send it. I verified
that this isn't me mis-interpreting the value passed to
socket_callback as the action somehow, it is in fact what libcurl is
asking for.
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Bill Doyle
2013-08-10 17:05:26 UTC
Permalink
Post by Bill Doyle
As I was finishing up the paragraph on the strange writes, I noticed
something: when it opens the data connection to the server, it's
registering for CURL_POLL_IN. Shouldn't this be CURL_POLL_OUT for an
upload? That would explain why it stops, it's waiting for data to
arrive when none ever will, instead of trying to send it. I verified
that this isn't me mis-interpreting the value passed to
socket_callback as the action somehow, it is in fact what libcurl is
asking for.
Not sure if replying to yourself is good form, but I have an important
development: I forced it to register for both CURL_POLL_IN and
CURL_POLL_OUT on all sockets, and the upload worked perfectly on the
first try. So it does appear that libcurl is requesting the incorrect
type when registering the data socket, for whatever reason.
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Daniel Stenberg
2013-08-10 17:39:00 UTC
Permalink
I forced it to register for both CURL_POLL_IN and CURL_POLL_OUT on all
sockets, and the upload worked perfectly on the first try. So it does appear
that libcurl is requesting the incorrect type when registering the data
socket, for whatever reason.
Ah, very useful experiement and yes I agree it certainly seems like libcurl
doesn't tell you the correct action to monitor for, or possibly on the wrong
socket since it has two in the FTP case.

My initial guess is then that transfer.c:Curl_single_getsock() isn't good
enough for the FTP upload perform state...
--
/ daniel.haxx.se
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Daniel Stenberg
2013-08-14 12:26:59 UTC
Permalink
Post by Daniel Stenberg
Ah, very useful experiement and yes I agree it certainly seems like libcurl
doesn't tell you the correct action to monitor for, or possibly on the wrong
socket since it has two in the FTP case.
My initial guess is then that transfer.c:Curl_single_getsock() isn't good
enough for the FTP upload perform state...
I've now been able to reproduce the problem using test case 107 with my brand
new event-based curl_easy_perform() replacement.

I'll soon A) post my event-based thing as a patch for review and comments and
B) see if I can use this new setup to fix test 107 when run event-based!
--
/ daniel.haxx.se
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Bill Doyle
2013-08-14 16:11:13 UTC
Permalink
Post by Daniel Stenberg
I've now been able to reproduce the problem using test case 107 with my
brand new event-based curl_easy_perform() replacement.
I'll soon A) post my event-based thing as a patch for review and comments
and B) see if I can use this new setup to fix test 107 when run event-based!
Excellent, glad to hear that I'm not crazy. :) Thanks for all your
help with this issue, I would have had no idea where to start. In the
meantime, will forcing it to register for read and write on all
sockets cause any problems? If not, I'll just leave that work-around
in place for the moment. If so, I'll probably set up a slightly more
specific one that tries to target only the proper socket.

Thanks again for your help.
Bill
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Daniel Stenberg
2013-08-14 20:37:54 UTC
Permalink
Excellent, glad to hear that I'm not crazy. :) Thanks for all your help with
this issue, I would have had no idea where to start. In the meantime, will
forcing it to register for read and write on all sockets cause any problems?
I don't think so. You mostlyrisk that it isn't as efficient as otherwise since
it may then get signalled to act on sockets where it has nothing to do.

The attached patch is what I'm trying out right now and I expect to push this
to git after some additional local testing... It fixes the problem for me.
--
/ daniel.haxx.se
Loading...