Discussion:
Using reverse execution
Stan Shebs
2005-09-13 01:17:30 UTC
Permalink
Hi all, I've been spending some time recently on the reverse execution
idea that's been getting some airplay since May or so
(http://sources.redhat.com/ml/gdb/2005-05/msg00145.html and
subsequent), and would like to bring people up to date on some of
my thinking, plus solicit ideas.

The context is Darwin aka Mac OS X native debugging (surprise), and
the idea is to make it something that any application developer using
the Xcode environment could use. There are of course a hundred
practical difficulties (how do you un-execute an IP packet send? how
do you reverse a billion instructions executed on multi-GHz
hardware?), so this is more of a research project at this point; a
real-life usable implementation probably entails extensive kernel
hacking, but right now we don't know enough even to tell kernel people
what facilities we want them to add. In this message, I'm going to
focus on the user model, without trying to tie things down to a
specific implementation.

So my big question is: what is reverse execution good for? Thinking
about some of the difficulties I allude to above, it's easy to
conclude that maybe reverse execution is just "party tricks" - an
impressive demo perhaps, but not a feature that real-life users would
ever adopt. Since the May discussion, I've been watching myself while
debugging other things (like a GDB merge :-) ), and asking "if I had
reverse execution, when would I use it?".

The thing that jumped out at me most strongly was reverse execution as
"undo" facility.

For instance, when stepping through unfamiliar and complicated code,
it's very common to "next" over an apparently innocuous function, then
say "oh sh*t" - your data has magically changed for the worse and so
the function you nexted over must be the guilty party. But it's often
too late - you've passed by the key bit of wrong code, and need to
re-run. Much of the time this is OK, and only takes a moment; but if
your application is complicated (like iTunes), or if you have a
complicated sequence of breaks and continues and user input to get to
the point of interest, re-running starts to get slow and
error-prone. You may also have a situation where the bug is
intermittent and not guaranteed to appear on rerunning (if that sounds
familiar, hold the thought for a moment). So in these kinds of cases,
what you really want is to undo that last "next" so you can do a "step"
and descend into the function of unexpected interest.
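
To make it concrete, the interaction I'm imagining would look
something like this (the reverse command name is hypothetical -
nothing like it exists in GDB today):

  (gdb) next            # stepped over an innocuous-looking call...
  ...                   # ...and the data has gone bad
  (gdb) reverse-next    # hypothetical: undo that last "next"
  (gdb) step            # this time descend into the suspect function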

A similar case might occur with single-stepping through a series of
calculations. I suspect everybody has at one time or another stepped
over something like "a *= b;", printed the value of a only to get a
bogus value, then either mentally calculated a / b or typed in "p a /
b", as a quick way to recover the last value of a. It would have been
easier and faster just to back up by one line and print (or watch)
a. If the calculation is complicated, the manual un-calculate exposes
the user to blind alleys if the calculation was mistaken. For instance,
if you try to manually undo some pointer arithmetic, you might
mentally adjust by chars when you should be adjusting by ints, and
then be misled because you think that the bug is that the program is
writing bad stuff into memory, when it's the pointer value that's
mistaken.
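
Here's a tiny made-up C example of that trap - the program advanced
the pointer by ints, while the mental "undo" adjusted it by chars:

  #include <stdio.h>
  int main(void)
  {
      int buf[16];
      int *p = buf + 5;                      /* what the program did */
      int *wrong = (int *)((char *)p - 5);   /* mental undo "by chars" */
      int *right = p - 5;                    /* undo that matches the code */
      printf("wrong %s buf, right %s buf\n",
             wrong == buf ? "==" : "!=",
             right == buf ? "==" : "!=");
      return 0;
  }

The pointer value was fine all along; only the hand-undone arithmetic
was off, which is exactly the kind of blind alley I mean.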

The key tradeoff for reverse execution as undo facility is complexity
of rerunning. If rerunning is a cheap part of the debugging session,
then the undo facility is not going to seem that important.

Another use for reverse execution is a more general form of zeroing in
on a bug. Suppose you have a bogus pointer that was supposedly
returned by malloc() somewhere earlier in the program's
execution. That pointer may sit only briefly in a named variable, and the rest
of the time it wanders around in various other data structures. There's
no single thing to watch in this case; it's not the memory being
pointed to that's the problem, it's that the pointer itself goes "off
the reservation". So what you want to do is start from the point at
which you've discovered the pointer is bad (segfault perhaps), watch
the location holding the bad pointer, and run backwards until either
a) somebody modifies the pointer in place, or b) the bogus pointer is
copied from elsewhere, in which case you watch it and continue
backwards. In many cases you'll get to the bad code sooner than by
running forwards, where you have to anticipate which malloc will
produce the later-to-be-scrambled pointer, and generally trace along
trying to anticipate who's going to do what to it before the bad thing
happens. (The display of ignored breakpoint hits was introduced as a
way to speed this up some.) Again, as with undo, the efficiency of
this process vs re-running depends on whether the actual bug occurs
closer to the beginning of execution, or closer to the point of
failure. One could make an argument that most root-cause bugs tend to
occur closer to failure points than to the beginning of program
execution, but that's kind of a philosophical point about program
design for which I have no concrete evidence.
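
In debugger terms, that hunt might look like this (again with a
hypothetical reverse command, and a made-up structure field):

  (gdb) run                           # forward, until the segfault
  (gdb) print &obj->ptr               # the location holding the bad pointer
  (gdb) watch *(void **) &obj->ptr    # watch that location...
  (gdb) reverse-continue              # ...and run backward to its last writer

If the stop shows the pointer being copied in from somewhere else, you
move the watchpoint to the source location and reverse-continue again.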

Then there is stepping backwards by instructions to retrace what is
happening at the machine level. I'm less inclined to say this is
valuable; picking apart registers and raw memory is a rather
painstaking activity, so slow (at the human level), that the time to
re-run up to the line in question is usually negligible by
comparison. Even so, I can see it becoming very natural for a user to
do a step, see bogus data that simply can't be explained by the source
line on the screen, do a reverse-step and then multiple stepi's to
"slo-mo" the calculations of that line's compiled code.

I touched on hard-to-repeat cases briefly above - GDB mavens will
recognize this as one of the rationales for the tracepoint facility.
Reverse execution is similar in that once you've gotten the program
into a state where a problem manifests, you want to poke around in the
program's immediate past states. Tracepoints, however, are designed such
that the user needs to anticipate what data will be interesting;
sensible in a decoupled remote debugging context, but not so good for
the data-driven spur-of-the-moment experimentation that is part of a
productive debugging session. So a working reverse execution gives the
user freedom to look around a program's entire state while moving up
and down along the flow of execution. (Ironically, this capability
might work against good program design, in that it takes away some
incentive to design a program with repeatable behavior. For instance,
programs using random number generators often include machinery to
display and input RNG seeds, one of the uses being to guarantee
predictability while re-running under a debugger.)
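
The seed machinery I have in mind is nothing fancier than this (a
generic sketch, not taken from any particular program):

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  int main(void)
  {
      const char *s = getenv("SEED");      /* let the user force a seed */
      unsigned seed = s ? (unsigned) strtoul(s, NULL, 10)
                        : (unsigned) time(NULL);
      fprintf(stderr, "RNG seed: %u\n", seed);  /* log it so the run can repeat */
      srand(seed);
      printf("first draw: %d\n", rand());
      return 0;
  }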

But will users actually use any of this in real life? "Undo" is pretty
easy - everybody understands "undo", even nonprogrammers, with many
GUIs giving it a dedicated keystroke. Tracking data backwards through
a program is a powerful tool for a tough class of bugs, but as we know
from past experience, powerful features that are at all hard to use
are often ignored. Single-instruction reverse stepping is conceptually
simpler, but likely to see more interest from the low-level
developers, and may only be interesting if available for kernel
debugging and the like. Reproducibility problems crop up regularly, so
I can see people wanting to use reverse execution after a breakpoint
sets them down in rarely-executed code.

Once we have an idea of what we think users will want from the
feature, we'll have a better idea of what characteristics and
limitations might be acceptable in an implementation.

Stan
Eli Zaretskii
2005-09-13 03:42:19 UTC
Permalink
Date: Mon, 12 Sep 2005 18:17:30 -0700
But will users actually use any of this in real life? "Undo" is pretty
easy - everybody understands "undo", even nonprogrammers, with many
GUIs giving it a dedicated keystroke. Tracking data backwards through
a program is a powerful tool for a tough class of bugs, but as we know
from past experience, powerful features that are at all hard to use
are often ignored. Single-instruction reverse stepping is conceptually
simpler, but likely to see more interest from the low-level
developers, and may only be interesting if available for kernel
debugging and the like. Reproducibility problems crop up regularly, so
I can see people wanting to use reverse execution after a breakpoint
sets them down in rarely-executed code.
Once we have an idea of what we think users will want from the
feature, we'll have a better idea of what characteristics and
limitations might be acceptable in an implementation.
You said it yourself several times: looking for a bug is fundamentally
a go-backward problem. When you debug a program, you start from some
manifestation of a bug, and mentally go backwards trying to find out
"whodunit".

The various "undo" facilities are different: they let you correct a
mistakenly taken action. That's not what backwards debugging is
about.

The challenge is to implement going backwards in a way that won't
overcomplicate its usage. But I think we can manage that, if we rely
on checkpoints and implement going backwards by unrolling to the
previous checkpoint and then stepping forward.
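
A minimal sketch of what I mean, with made-up target hooks standing in
for whatever checkpoint/restore machinery the target would really have
to provide:

  #include <stdio.h>

  static long steps_since_checkpoint;   /* bumped on every forward step */

  /* Hypothetical target hooks -- stubs here, real stub/kernel work in life. */
  static void target_restore_checkpoint(int id) { printf("restore %d\n", id); }
  static void target_stepi(void)                { printf("stepi\n"); }

  /* "Step back one instruction" = roll back to the nearest checkpoint,
     then replay forward to one instruction short of where we were. */
  static void reverse_stepi(int checkpoint_id)
  {
      long replay = steps_since_checkpoint - 1;
      target_restore_checkpoint(checkpoint_id);
      for (long i = 0; i < replay; i++)
          target_stepi();
      steps_since_checkpoint = replay;
  }

  int main(void)
  {
      steps_since_checkpoint = 5;   /* pretend we are 5 steps past checkpoint 1 */
      reverse_stepi(1);
      return 0;
  }

Reverse-next and reverse-continue would fall out of the same replay
loop, with the usual stepping and breakpoint logic applied while going
forward.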
Stan Shebs
2005-09-14 00:36:19 UTC
Permalink
[...] looking for a bug is fundamentally
a go-backward problem. When you debug a program, you start from some
manifestation of a bug, and mentally go backwards trying to find out
"whodunit".
I think it's a strong and unproven claim to say that the backwards
reasoning of the mental deduction process implies that users will
find reverse execution a natural mode of debugging. For example,
imperative programming is strongly time-oriented - "do this, then
that, then the other thing". Stepping backwards from the else clause
of an if, or from a label to the goto, can be kind of jarring.
Stepping backwards through an operation that can't actually be undone
is going to require one to remember that the packet is not really
un-sent; multiple
backward steps (say through remote.c) could turn into a new and
unpleasant sort of memory game.

It may indeed be true that users will instantly take to reverse
execution - I believe the Simics folks have made that claim - but
I don't think we yet have enough data to justify the general
conclusion.
The various "undo" facilities are different: they let you correct a
mistakenly taken action. That's not what backwards debugging is
about.
What I'm getting at is that maybe that's the only thing users would
want to use reverse execution for. It would be ironic if we worked up
elaborate machinery, only to find users caring about nothing more than
getting back to the last checkpoint so they can try single-stepping
forward again.
The challenge is to implement going backwards in a way that won't
overcomplicate its usage. But I think we can manage that, if we rely
on checkpoints and implement going backwards by unrolling to the
previous checkpoint and then stepping forward.
That's the general implementation theory, complicated by shared
memory, thread switching, multiple CPUs, and other nasty things. :-)

Stan
Eli Zaretskii
2005-09-14 03:41:59 UTC
Permalink
Date: Tue, 13 Sep 2005 17:36:19 -0700
I think it's a strong and unproven claim to say that the backwards
reasoning of the mental deduction process implies that users will
find reverse execution a natural mode of debugging.
I find this natural to the degree that it doesn't need any proof.
Stepping backwards from the else clause
of an if, or from a label to the goto can be kind of jarring.
That's not what you do when you trace a bug. You start from the place
where, e.g., the program gets a SIGSEGV, and then unroll it back to
possible places where the corruption could have happened. That is,
you try to guess where the problem could have originated from, and
then get there and look around for clues. I don't find this jarring
in any way.
Post by Eli Zaretskii
The challenge is to implement going backwards in a way that won't
overcomplicate its usage. But I think we can manage that, if we rely
on checkpoints and implement going backwards by unrolling to the
previous checkpoint and then stepping forward.
That's the general implementation theory, complicated by shared
memory, thread switching, multiple CPUs, and other nasty things. :-)
Life is full of complications. That doesn't mean we should do
nothing, just that we should understand the limitations.
Stan Shebs
2005-09-14 22:34:28 UTC
Permalink
Post by Eli Zaretskii
Date: Tue, 13 Sep 2005 17:36:19 -0700
Stepping backwards from the else clause
of an if, or from a label to the goto can be kind of jarring.
That's not what you do when you trace a bug. You start from the place
where, e.g., the program gets a SIGSEGV, and then unroll it back to
possible places where the corruption could have happened. That is,
you try to guess where the problem could have originated from, and
then get there and look around for clues. I don't find this jarring
in any way.
But have you actually done any debugging by reverse execution yourself?
What you describe is the reason we hypothesize that reverse execution
is a useful feature, not the evidence that our users will flock to it.

As a comparison, for tracepoints we came up with various scenarios for
how they would be amazingly useful and powerful, and yet after nearly
a decade they remain a curiosity in GDB. One could argue that they're
lacking native support, or need better documentation, or whatever,
but if that's true, then the attractiveness of tracepoints depends as
much on getting those details right as on the general concept.

So that's the kind of question I'm asking for reverse execution - what
do we think it takes to make it useful? Do we have to be able to
undo all system calls, or is it sufficient to just skip over them
somehow? Should executing forward after reversal re-execute system
calls, or skip over them, or should there be a sort of virtual/real
option? Do we have to be able to unroll back to the beginning of the
program, or can we usefully limit the range? Is there any more risk
to users than they incur now when calling a function in the inferior?
Reversing is likely to be slower - how much is acceptable? Will an
incomplete mechanism still be interesting, or would it get a bad
reputation such that no one will use it?

Stan
Eli Zaretskii
2005-09-15 03:37:35 UTC
Permalink
Date: Wed, 14 Sep 2005 15:34:28 -0700
But have you actually done any debugging by reverse execution yourself?
Yes.
As a comparison, for tracepoints we came up with various scenarios for
how they would be amazingly useful and powerful, and yet after nearly
a decade they remain a curiosity in GDB.
IMHO, tracepoints remain a curiosity because they were never
implemented on a large enough number of platforms. Lack of native
support, in particular, is the main reason for their non-use.
So that's the kind of question I'm asking for reverse execution - what
do we think it takes to make it useful? Do we have to be able to
undo all system calls, or is it sufficient to just skip over them
somehow? Should executing forward after reversal re-execute system
calls, or skip over them, or should there be a sort of virtual/real
option? Do we have to be able to unroll back to the beginning of the
program, or can we usefully limit the range? Is there any more risk
to users than they incur now when calling a function in the inferior?
Reversing is likely to be slower - how much is acceptable? Will an
incomplete mechanism still be interesting, or would it get a bad
reputation such that no one will use it?
We could discuss these questions one by one. But we shouldn't fear
them to the degree that prevents us from starting to implement this
feature.
Stan Shebs
2005-09-15 05:36:34 UTC
Permalink
Post by Eli Zaretskii
Date: Wed, 14 Sep 2005 15:34:28 -0700
But have you actually done any debugging by reverse execution yourself?
Yes.
Cool! Care to share any details??
Post by Eli Zaretskii
As a comparison, for tracepoints we came up with various scenarios for
how they would be amazingly useful and powerful, and yet after nearly
a decade they remain a curiosity in GDB.
IMHO, tracepoints remain a curiosity because they were never
implemented on a large enough number of platforms. Lack of native
support, in particular, is the main reason for their non-use.
But don't you think it's telling that not one single person was
willing to go to the trouble of implementing it on more platforms?
When breakpoints don't work on a platform, users don't say "oh well,
we'll just have to do without". Apparently tracepoints are just not
a must-have.
Post by Eli Zaretskii
So that's the kind of question I'm asking for reverse execution - what
do we think it takes to make it useful? Do we have to be able to
undo all system calls, or is it sufficient to just skip over them
somehow? Should executing forward after reversal re-execute system
calls, or skip over them, or should there be a sort of virtual/real
option? Do we have to be able to unroll back to the beginning of the
program, or can we usefully limit the range? Is there any more risk
to users than they incur now when calling a function in the inferior?
Reversing is likely to be slower - how much is acceptable? Will an
incomplete mechanism still be interesting, or would it get a bad
reputation such that no one will use it?
We could discuss these questions one by one. But we shouldn't fear
them to the degree that prevents us from starting to implement this
feature.
Depending on the answers, the project could be fatally flawed. For
instance, if the ability to undo system calls is critical for
usability, that pretty much relegates reversal to simulator targets
only - not interesting for my user base. That's why I wanted to talk
about usage patterns; if users don't need the debugger to do the
incredibly hard things, then we can get to something useful sooner.

Stan
Eli Zaretskii
2005-09-15 15:14:19 UTC
Permalink
Date: Wed, 14 Sep 2005 22:36:34 -0700
Cool! Care to share any details??
I thought I was doing just that...

If you mean to try to answer the questions you raised, like whether to
try to undo system calls, then I'm afraid I don't remember what
happened on the system where I used such a debugger (it was quite some
time ago).

Anyway, one of the latest issues of DrDobb's ran an article about
debuggers that support similar features, with pointers to existing
products, so you could try to find them to get some ideas about
usability of this feature.
Post by Eli Zaretskii
IMHO, tracepoints remain a curiosity because they were never
implemented on a large enough number of platforms. Lack of native
support, in particular, is the main reason for their non-use.
But don't you think it's telling that not one single person was
willing to go to the trouble of implementing it on more platforms?
I can only speak for myself. You once wrote here that tracepoints in
native debugging are something to kill for, but I myself didn't have
time and resources to make that happen.

Basically, the lesson from tracepoints is, I think, that features that
GDB developers (as opposed to users) don't themselves need very much will not
materialize.
Post by Eli Zaretskii
We could discuss these questions one by one. But we shouldn't fear
them to the degree that prevents us from starting to implement this
feature.
Depending on the answers, the project could be fatally flawed.
I don't think so.
For instance, if the ability to undo system calls is critical for
usability, that pretty much relegates reversal to simulator targets
only - not interesting for my user base. That's why I wanted to talk
about usage patterns; if users don't need the debugger to do the
incredibly hard things, then we can get to something useful sooner.
I suspect that answers to most or all of your questions are
"sometimes". I.e., sometimes the user will want to undo the system
call, and sometimes not. I even think that sometimes they will want
to _redo_ the system call, since the bug might only happen when the
syscall is made.

This might mean we will have to put in code to ask the user what to do
with a syscall.
Jason Molenda
2005-09-15 18:02:23 UTC
Permalink
Post by Eli Zaretskii
Anyway, one of the latest issues of DrDobb's ran an article about
debuggers that support similar features, with pointers to existing
products, so you could try to find them to get some ideas about
usability of this feature.
June 2005, issue #373, "Omniscient Debugging" by Bil Lewis.
http://www.lambdacs.com/debugger/debugger.html
Stan Shebs
2005-09-15 20:12:19 UTC
Permalink
Post by Jason Molenda
Post by Eli Zaretskii
Anyway, one of the latest issues of DrDobb's ran an article about
debuggers that support similar features, with pointers to existing
products, so you could try to find them to get some ideas about
usability of this feature.
June 2005, issue #373, "Omniscient Debugging" by Bil Lewis.
http://www.lambdacs.com/debugger/debugger.html
The whole omniscient debugging thing is actually a good example of
why I'm asking questions about user experience - while there are lots
of people interested and it gets magazine writeups, etc., there is very
little feedback from actual users that I could find. (Despite the
date of the Dr Dobbs article, this particular project goes back a
couple of years.) Real user experience with ODB would be especially
interesting because it is strictly about walking around in collected
trace data; it's not possible to execute anything a second time.

Stan
Eli Zaretskii
2005-09-16 10:42:50 UTC
Permalink
Date: Thu, 15 Sep 2005 13:12:19 -0700
The whole omniscient debugging thing is actually a good example of
why I'm asking questions about user experience - while there are lots
of people interested, it gets magazine writeups, etc, there is very
little feedback from actual users that I could find.
I mentioned that article because I think it explains quite a bit why a
feature like this is very useful. You asked some questions about that
aspect.

In addition, one could download the software linked to from that
article and play with it a little, thus gaining some first-hand
experience.

Anyway, I really don't understand why we need to discuss all this at
such length. Either there is a volunteer who is ready to do the job
of adding this, or there isn't. In the latter case, there's no sense
arguing about the value of the feature; in the former case, will we
really consider rejecting the patches that implement the feature
because some of us are unsure how useful it will be?
Stan Shebs
2005-09-16 14:00:27 UTC
Permalink
Post by Eli Zaretskii
Anyway, I really don't understand why we need to discuss all this at
such length. Either there is a volunteer who is ready to do the job
of adding this, or there isn't. In the latter case, there's no sense
arguing about the value of the feature; in the former case, will we
really consider rejecting the patches that implement the feature
because some of us are unsure how useful it will be?
I must not be getting my point across very well then. Apple is quite
interested in getting reverse execution going outside of the simulator
context, and I've been playing with some prototype machinery. However,
it's clear that a full-blown handles-every-situation implementation
will require a huge amount of kernel hacking in addition to the GDB
part. I don't want to get into a situation like that of tracepoints,
where the feature ultimately falls by the wayside because it's too
narrow in applicability and implementation.

So I'm not questioning the value of the feature, but trying to get a
sense of the user requirements. Undoing a single a=b+c is relatively
easy, and my prototype can do that now, but reversing through 15
minutes of iTunes usage is fiendishly hard, and would require a major
commitment by Apple involving multiple software groups. The level of
effort I'll be able to get depends on what we think the requirements
look like.

This is a rare opportunity to weigh in on a feature *before* it's
implemented; an unfamiliar situation, I know :-), but something
people ought to take advantage of rather than brush off. A lot of
past GDB projects would have come out better had there been open
discussion earlier in the process.

Stan
Eli Zaretskii
2005-09-16 16:21:50 UTC
Permalink
Date: Fri, 16 Sep 2005 07:00:27 -0700
it's clear that a full-blown handles-every-situation implementation
will require a huge amount of kernel hacking in addition to the GDB
part. I don't want to get into a situation like that of tracepoints,
where the feature ultimately falls by the wayside because it's too
narrow in applicability and implementation.
What features can be implemented without hacking the kernel?
Stan Shebs
2005-09-16 18:02:50 UTC
Permalink
Post by Eli Zaretskii
Date: Fri, 16 Sep 2005 07:00:27 -0700
it's clear that a full-blown handles-every-situation implementation
will require a huge amount of kernel hacking in addition to the GDB
part. I don't want to get into a situation like that of tracepoints,
where the feature ultimately falls by the wayside because it's too
narrow in applicability and implementation.
What features can be implemented without hacking the kernel?
If you limited reversing to designated regions, and single-stepped
every instruction in the range, collected the exact data changes
(by disassembling the instructions) and only allowed examination
rather than re-executing from any given point, all that just needs
existing GDB machinery. Is it useful? At least somebody thinks so,
because I just described how the omniscient debugger works (using
Java bytecodes instead of machine instructions).
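
As a very rough illustration, here is a GDB command-file sketch that
single-steps a designated region and logs each instruction and the
registers to a file (the addresses are made up, and a real version
would also record memory writes by looking at what each instruction
does):

  set logging file region.log
  set logging on
  break *0x2720          # start of the region of interest
  run
  while $pc < 0x2780     # end of the region
    x/i $pc              # the instruction about to execute
    info registers       # state before it executes
    stepi
  end
  set logging off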

Stan
Eli Zaretskii
2005-09-16 20:50:09 UTC
Permalink
Date: Fri, 16 Sep 2005 11:02:50 -0700
Post by Eli Zaretskii
What features can be implemented without hacking the kernel?
If you limited reversing to designated regions, and single-stepped
every instruction in the range, collected the exact data changes
(by disassembling the instructions) and only allowed examination
rather than re-executing from any given point, all that just needs
existing GDB machinery. Is it useful?
Sorry, you lost me. Can you describe this in smaller words?
At least somebody thinks so, because I just described how the
omniscient debugger works (using Java bytecodes instead of machine
instructions).
As I understood the description of the omniscient debugger, it would
be very useful.
Stan Shebs
2005-09-23 23:20:20 UTC
Permalink
(Been in Cupertino talking about reverse execution to Apple people,
just now getting caught up)
Post by Eli Zaretskii
Date: Fri, 16 Sep 2005 11:02:50 -0700
Post by Eli Zaretskii
What features can be implemented without hacking the kernel?
If you limited reversing to designated regions, and single-stepped
every instruction in the range, collected the exact data changes
(by disassembling the instructions) and only allowed examination
rather than re-executing from any given point, all that just needs
existing GDB machinery. Is it useful?
Sorry, you lost me. Can you describe this in smaller words?
ODB works by hacking up the bytecode interpreter to log what every
bytecode does, recording addresses and data values. After the
program runs, you have a gigantic log, and you look at it with a
debugger-type interface. Since the log contains every before-and-after
change to objects etc, it's just a matter of rooting through it to
be able to display any value at any time. But the program execution
is already over, so there's no way to set a variable and continue,
call a method, or do anything else to alter execution.

An analogy I used this week is that of a "time-traveler's telescope" -
you can focus in on any point in the past, and study it closely from
the present day, but you can't alter it. A handy way to sidestep
paradox, but it also precludes "what-if" experiments; you'd have to
restart the program before you could make the program go a different way.
Post by Eli Zaretskii
At least somebody thinks so, because I just described how the
omniscient debugger works (using Java bytecodes instead of machine
instructions).
As I understood the description of the omniscient debugger, it would
be very useful.
Indeed it seems so. That would be a very nice outcome - we could get
a powerful new feature without having to make a scarily-large
commitment to a feature that doesn't currently have a large body of
users demanding it already.

Stan
Ian Lance Taylor
2005-09-16 17:49:51 UTC
Permalink
Post by Stan Shebs
So I'm not questioning the value of the feature, but trying to get a
sense of the user requirements.
Well, here are some comments based on how I use gdb.


First, debugging a program with which I am not familiar, in order to
locate a bug. Let's assume this is a reproducible bug. The first job
is to find out where the bug is occurring. Reverse execution can help
here with one of the examples that Stan gave. When I 'next' over a
function, the bug might occur there. It would be convenient to be
able to reverse over the function, and then step into it.

In order to make that work most conveniently, it would be ideal if
network and file input were replayed when going forward again.
Basically, any invocation of the read system call would ideally return
the same results as before. Similarly for the open, socket, and
accept system calls and for system calls like ioctl (FIONREAD). For
super extra credit, SIGIO and SIGURG signals would be repeated at the
same time as before.

For my purposes it's not necessary to replay network and file output.
In fact, in some cases that would actually be less helpful. It would
be better to just skip the output system calls. To put it another
way, it's not necessary for my purposes to undo any system actions
when stepping backward; it's merely helpful to replay the input calls
when stepping forward again.

Mind you, reverse execution would be useful even without the ability
to replay system calls. It's just that it would be more useful if
they could be replayed.


Second, once the bug has been located, how did it happen? When this
is not obvious, it's generally an issue of memory corruption or
mysterious data structure manipulation. For this the most useful
feature would be a reverse watchpoint, to walk backward to the point
where the data was changed. Here again it is not necessary to undo
system calls. I should note that this would only be useful if the
reverse watchpoint were quite efficient; I don't know how feasible
that is. I recall that non-hardware watchpoints were unusable.


Third, a different scenario: the bug which cannot be recreated at
will. gdb isn't normally too helpful in these cases, but let's say
for the sake of argument that we have a core dump or we can attach to
a running process which shows the bug. Here again reverse execution
could be helpful to pin down where the bug happened. However, it's
hard for me to imagine this actually getting implemented, so I won't
discuss it in detail.


Hope this helps.

Ian
Eli Zaretskii
2005-09-16 10:43:12 UTC
Permalink
Date: Thu, 15 Sep 2005 11:02:23 -0700
Post by Eli Zaretskii
Anyway, one of the latest issues of DrDobb's ran an article about
debuggers that support similar features, with pointers to existing
products, so you could try to find them to get some ideas about
usability of this feature.
June 2005, issue #373, "Omniscient Debugging" by Bil Lewis.
http://www.lambdacs.com/debugger/debugger.html
That's the one. Thanks, Jason.
Min Xu (Hsu)
2005-09-13 18:11:05 UTC
Permalink
I found myself often "simulating" reverse execution when using gdb
by recording the sequence of actions (breakpoints) I issued in order
to get to specific points in a program.

I would second the "bookmarks" idea, which was discussed back
in May. To me it is the first step toward reverse execution without
needing any special OS support. The assumptions there were:

1. The program is deterministic. For a single-threaded program, this
largely means the inputs are repeatable. GDB may have to record
the command-line parameters, but the program should need nothing
more from GDB to repeat its inputs.

2. Re-running the program from the beginning is fast. Therefore, no
checkpoint mechanism is required.

I imagine the bookmarks lying on a linear time history of the program's
execution. The user should be able to name them and inspect variable
values previously displayed. Internally, gdb could locate a bookmark
by counting backward branches and setting an appropriate breakpoint
after enough backward branches have been passed.
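
For a deterministic, single-threaded program, even plain breakpoint
machinery can fake a crude bookmark today: note how many times a
breakpoint had been hit when the bookmark was taken, then on the
re-run let GDB skip that many hits (only an approximation; the
backward-branch counting described above would need target support):

  (gdb) info breakpoints   # suppose breakpoint 2 has been hit 17 times
  (gdb) ignore 2 16        # on the re-run, silently pass the first 16 hits
  (gdb) run                # same inputs; stops on the 17th hit - the bookmark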
Post by Stan Shebs
[...]
Jim Blandy
2005-09-13 22:00:35 UTC
Permalink
ptrace allows you to stop the program just before and after it makes a
system call; this allows programs like strace to recover the
parameters that were passed, and the values returned. It's also
enough control to allow you to "replay" the system calls with values
you've saved earlier. Michael Chastain has written a program to do
this. The system call record would have to be related in the
appropriate way to whatever bookmarks or checkpoints the system
retained.
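
Here is a bare-bones illustration of that ptrace mechanism (Linux
flavor, x86-64 registers; Michael's replayer presumably does much more
- this only logs the syscall number and return value at each
entry/exit stop, whereas replaying would mean rewriting the return
register at the exit stop instead):

  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/ptrace.h>
  #include <sys/user.h>
  #include <sys/wait.h>
  #include <unistd.h>

  int main(int argc, char **argv)
  {
      if (argc < 2)
          return 1;
      pid_t pid = fork();
      if (pid == 0) {
          ptrace(PTRACE_TRACEME, 0, NULL, NULL);
          execvp(argv[1], &argv[1]);      /* the program to trace */
          _exit(127);
      }
      int status;
      waitpid(pid, &status, 0);           /* initial stop after exec */
      for (;;) {
          ptrace(PTRACE_SYSCALL, pid, NULL, NULL);   /* run to syscall entry */
          waitpid(pid, &status, 0);
          if (WIFEXITED(status)) break;
          struct user_regs_struct regs;
          ptrace(PTRACE_GETREGS, pid, NULL, &regs);
          long nr = regs.orig_rax;                   /* syscall number */
          ptrace(PTRACE_SYSCALL, pid, NULL, NULL);   /* run to syscall exit */
          waitpid(pid, &status, 0);
          if (WIFEXITED(status)) break;
          ptrace(PTRACE_GETREGS, pid, NULL, &regs);
          printf("syscall %ld returned %ld\n", nr, (long) regs.rax);
      }
      return 0;
  }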

Some obvious restrictions:

- if you rewind and then modify the inferior's state, it may not make
the same system calls it did before. The illusion can't be
sustained, and you just have to punt somehow.

- Getting signals like SIGIO or SIGWINCH to arrive at exactly the same
points is something I just don't know how to do. It's clear how to
take and restore snapshots, but it's not clear how to recognize when
a system has re-reached a given state and should have its signal
re-delivered.

But it still seems like enough to be helpful in a lot of cases.
Stan Shebs
2005-09-14 00:42:32 UTC
Permalink
Post by Jim Blandy
- Getting signals like SIGIO or SIGWINCH to arrive at exactly the same
points is something I just don't know how to do. It's clear how to
take and restore snapshots, but it's not clear how to recognize when
a system has re-reached a given state and should have its signal
re-delivered.
I think you'd have to get hints from the system, and similarly for
thread switching.

Stan
Ramana Radhakrishnan
2005-09-16 11:56:14 UTC
Permalink
Hi,

I remember us having quite a bit of discussion on this when LIZARD was
happening here at Codito. The features you are discussing here resemble
some of the work we did back then. It was an effort with gdb and Linux
to get gdb to do some reverse execution.

I don't know if you saw this too.

You can check the Lizard site at http://lizard.sourceforge.net/

The features are here:

http://lizard.sourceforge.net/features.html


and there is a mailing list at
http://sourceforge.net/mailarchive/forum.php?forum=lizard-hackers

cheers
Ramana
Post by Stan Shebs
[...]
Michael Snyder
2005-09-20 22:46:51 UTC
Permalink
Post by Stan Shebs
Post by Eli Zaretskii
That's not what you do when you trace a bug. You start from
the place where, e.g., the program gets a SIGSEGV, and then
unroll it back to possible places where the corruption could
have happened. That is, you try to guess where the problem
could have originated from, and then get there and look around
for clues. I don't find this jarring in any way.
But have you actually done any debugging by reverse execution
yourself?
I have. I've been using it to debug real bugs, difficult ones,
in a realtime embedded OS. I've got a prototype gdb working
with the Simics simulator, with all of the reverse-* commands
pretty much working: reverse-continue, step, stepi, next,
nexti, and finish. Breakpoints and watchpoints also work
in reverse.

I'll give you my best example, which follows a scenario
that Stan outlined near the beginning of this thread.

I've got multiple threads, and one of them is blowing its
stack. Unfortunately it doesn't cause an immediate problem --
it isn't detected until the scheduler does a sanity check at
the next task switch point, and discovers that the guard word
at the end of the stack is gone. At that point, it panics.
This is essentially like seg faulting when you write thru a
bad pointer -- you need to know who wrote the bad value to
the pointer, and that will be the LAST person who changed
it. Many people may have changed it before then.

But -- all I had to do was run forward until the stack
corruption was detected (by analogy, to the segfault),
and then put a watchpoint on the clobbered memory
location and run backward. Bingo -- the first time
the watchpoint triggers, I have my culprit.
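
In session terms the whole hunt was essentially this (the address is
made up for illustration; reverse-continue is from my Simics-backed
prototype):

  (gdb) continue                       # forward, until the panic trips
  (gdb) watch *(long *) 0x42f000       # the clobbered guard word's address
  (gdb) reverse-continue               # backward: first stop is the culprit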

Michael Snyder
(still at Red Hat, don't be confused by the email address)
Michael Snyder
2005-09-20 22:56:07 UTC
Permalink
Post by Stan Shebs
Post by Eli Zaretskii
Post by Stan Shebs
As a comparison, for tracepoints we came up with various
scenarios for how they would be amazingly useful and powerful,
and yet after nearly a decade they remain a curiosity in GDB.
IMHO, tracepoints remain a curiosity because they were never
implemented on a large enough number of platforms. Lack of
native support, in particular, is the main reason for their non-use.
But don't you think it's telling that not one single person
was willing to go to the trouble of implementing it on more
platforms? When breakpoints don't work on a platform, users
don't say "oh well, we'll just have to do without". Apparently
tracepoints are just not a must-have.
Eli remarked that the usefulness of reverse execution was a
no-brainer for him, and it's obviously a no-brainer for you
and me and a number of other GDB maintainers.

And yet -- I have a target audience of engineers to whom
I've been trying to "sell" reverse execution -- and I have
a working implementation that I can demo, live, and a real-life
bug that I can show to be easy to debug with reverse execution,
and pretty damn hard otherwise. And the majority of them will
go "wow", but they aren't jumping up and down demanding access
to this cool facility.

I think this is a familiar concept to us, but an unfamiliar
one for many users, and they may have to get their hands on
it and actually use it and play with it before they start to
get a feel for its true power.

The same may have been true for tracepoints. There were some
people who went "wow", and even a few who took a stab at doing
a target implementation -- but few people ever actually got to
get their hands on it and play with it. Even a live demo is
not always as convincing as that.
Ian Lance Taylor
2005-09-20 23:13:55 UTC
Permalink
Post by Michael Snyder
The same may have been true for tracepoints. There were some
people who went "wow", and even a few who took a stab at doing
a target implementation -- but few people ever actually got to
get their hands on it and play with it. Even a live demo is
not always as convincing as that.
For what it's worth, I can think of reasons why I might want to use
reverse execution--see my earlier message.

I have no idea why I would ever want to use tracepoints. As far as I
can see, anything I can do with a tracepoint I can do by logging data
in my program--and if I add the logging code to the program, the code
is ready and waiting for the next time I have a problem. There is
probably some cool use for which tracepoints are the obvious right
answer, but I don't know what it is.

Ian
Eli Zaretskii
2005-09-21 03:39:42 UTC
Permalink
Date: 20 Sep 2005 16:13:55 -0700
There is probably some cool use for which tracepoints are the
obvious right answer, but I don't know what it is.
In native debugging, tracepoints would be very useful to debug a
real-time program, or, more generally, a program where timing issues
are crucial to its correct operation. With such programs, normal GDB
usage disrupts the program operation and might even cause the program
to fail in ways that are unrelated to the bug you are looking for.
Ian Lance Taylor
2005-09-21 03:59:58 UTC
Permalink
Post by Eli Zaretskii
Date: 20 Sep 2005 16:13:55 -0700
There is probably some cool use for which tracepoints are the
obvious right answer, but I don't know what it is.
In native debugging, tracepoints would be very useful to debug a
real-time program, or, more generally, a program where timing issues
are crucial to its correct operation. With such programs, normal GDB
usage disrupts the program operation and might even cause the program
to fail in ways that are unrelated to the bug you are looking for.
I get that that is the idea, it's just that I wouldn't tackle that
problem that way. I would put a logging framework in the program
itself. That's how I've debugged this sort of issue in the past, and
the logging framework generally pays off for itself over time.

(I had several communicating programs with real time interactions. I
arranged for each one to spit out log lines into a separate
multilog-like program to add timestamps, and then after the fact
sorted the lines together to see what was happening.)

I can imagine an embedded system with no output facilities, where it
would be helpful to use gdb to run tracepoints over JTAG--only gdb
doesn't really support JTAG anyhow, so it still seems kind of
hypothetical.

Obviously I'm not saying that tracepoints should be removed or
anything like that. I'm just responding to Stan's comment that
tracepoints have been around for a while and not used, by mentioning
that I personally have never seen any important use for them.

Ian
Eli Zaretskii
2005-09-21 17:51:57 UTC
Permalink
Date: 20 Sep 2005 20:59:58 -0700
I would put a logging framework in the program itself. That's how
I've debugged this sort of issue in the past, and the logging
framework generally pays off for itself over time.
We've all used some kind of logging system to debug real-time or near
real-time programs---because that's about the only way to debug them,
if you don't have something like tracepoints. (Well, there's also
oscilloscope debugging for hard real-time programs, if you know what I
mean ;-)

However, debugging through a logging system is akin to printf
debugging; it has all the same deficiencies: you need to recompile
the program to add logging code, and the resulting code and timing
changes can cause hard Heisenbugs to change behavior or even go away.
(I had several communicating programs with real time interactions. I
arranged for each one to spit out log lines into a separate
multilog-like program to add timestamps, and then after the fact
sorted the lines together to see what was happening.)
Yep, been-there-done-that.
Obviously I'm not saying that tracepoints should be removed or
anything like that. I'm just responding to Stan's comment that
tracepoints have been around for a while and not used, by mentioning
that I personally have never seen any important use for them.
You will find in the archives that I said a few years ago that native
tracepoints are a feature to kill for. Not surprisingly, at the time I
was working on a large real-time software project. A sophisticated
logging system, augmented by deliberate abort-and-core-dump code in
strategic places, was the best replacement I came up with. That was on
an SGI machine that needed to respond to an interrupt and run the
application code that serviced the interrupt within 500 microseconds.
I used a scope to convince myself that this hard real-time requirement
was being met.
Michael Snyder
2005-09-21 20:36:56 UTC
Permalink
Post by Ian Lance Taylor
Post by Eli Zaretskii
Date: 20 Sep 2005 16:13:55 -0700
There is probably some cool use for which tracepoints are the
obvious right answer, but I don't know what it is.
In native debugging, tracepoints would be very useful to debug a
real-time program, or, more generally, a program where timing issues
are crucial to its correct operation. With such programs, normal GDB
usage disrupts the program operation and might even cause the program
to fail in ways that are unrelated to the bug you are looking for.
I get that that is the idea, it's just that I wouldn't tackle that
problem that way. I would put a logging framework in the program
itself. That's how I've debugged this sort of issue in the past, and
the logging framework generally pays off for itself over time.
That's what tracepoint debugging is, Ian -- it's a re-usable
logging framework. It just frees you up from having to write
that logging code over and over into different projects, and
recompile your project whenever you want to log something
different.

Well, that and a way-cool interactive data review and
presentation mode. ;-)
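
For anyone who hasn't tried it, a session looks very roughly like
this (the location and variable names here are made up, and the
details vary by target):

(gdb) trace myfile.c:123          # set like a breakpoint
(gdb) actions
> collect myvar, *myptr, $regs    # what to snapshot at each hit
> end
(gdb) tstart                      # program runs at (nearly) full speed
   [ ... exercise the program ... ]
(gdb) tstop
(gdb) tfind start                 # walk the collected snapshots
(gdb) tdump                       # and inspect them interactively

No recompile, and the tfind/tdump part is the interactive review you
don't get from printf-style logging.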

But let's not hijack this thread to talk about tracepoints,
unless it's to compare their use and utility to reverse execution.

Michael
Stan Shebs
2005-09-24 00:46:11 UTC
Permalink
Post by Michael Snyder
Post by Ian Lance Taylor
Post by Eli Zaretskii
Date: 20 Sep 2005 16:13:55 -0700
There is probably some cool use for which tracepoints are the
obvious right answer, but I don't know what it is.
In native debugging, tracepoints would be very useful to debug a
real-time program, or, more generally, a program where timing issues
are crucial to its correct operation. With such programs, normal GDB
usage disrupts the program operation and might even cause the program
to fail in ways that are unrelated to the bug you are looking for.
I get that that is the idea, it's just that I wouldn't tackle that
problem that way. I would put a logging framework in the program
itself. That's how I've debugged this sort of issue in the past, and
the logging framework generally pays off for itself over time.
That's what tracepoint debugging is, Ian -- it's a re-usable
logging framework. It just frees you up from having to write
that logging code over and over into different projects, and
recompile your project whenever you want to log something
different.
Well, that and a way-cool interactive data review and
presentation mode. ;-)
But let's not hijack this thread to talk about tracepoints,
unless it's to compare their use and utility to reverse execution.
Ian does touch on an important general point, which is that
debugger features ought to be uniquely available, not just
repackaging of functionality that can be accomplished nearly
as well in other ways.

For instance, when I have a plain old breakpoint that lets me stop
and interactively look at a backtrace, that is something that is
(usually) not possible to do without the help of a debugger, and
everybody agrees that this is a good feature to have.

Conversely, if I have a tracepoint that just prints out one of my
program's variables, that doesn't give me much that I can't get
with a printf. However, if the tracepoint is collecting raw
registers, that's more difficult to manage using only print
functions, and then the tracepoint starts to look more interesting.
Ditto if I'm in a context where printf is not available, or so slow
that it affects critical real-time behavior.

In the case of reverse execution, one Appleite wondered why anybody
would bother, since you could repeatedly start the program over. And
indeed, GDB makes the restarting process pretty quick and easy; just
type "r". So reverse execution is not going to be a must-have unless
rerunning is either impossible (as in the case of intermittent bugs),
or very slow (as in the case of spending fifteen minutes giving
iTunes a particular pattern of mouse clicks and CD insertions, just
to get to the failing code).

Stan
Michael Snyder
2005-09-24 01:09:45 UTC
Permalink
Post by Stan Shebs
Post by Michael Snyder
Post by Ian Lance Taylor
Post by Eli Zaretskii
Date: 20 Sep 2005 16:13:55 -0700
There is probably some cool use for which tracepoints are the
obvious right answer, but I don't know what it is.
In native debugging, tracepoints would be very useful to debug a
real-time program, or, more generally, a program where timing issues
are crucial to its correct operation. With such programs, normal GDB
usage disrupts the program operation and might even cause the program
to fail in ways that are unrelated to the bug you are looking for.
I get that that is the idea, it's just that I wouldn't tackle that
problem that way. I would put a logging framework in the program
itself. That's how I've debugged this sort of issue in the past, and
the logging framework generally pays off for itself over time.
That's what tracepoint debugging is, Ian -- it's a re-usable
logging framework. It just frees you up from having to write
that logging code over and over into different projects, and
recompile your project whenever you want to log something
different.
Well, that and a way-cool interactive data review and
presentation mode. ;-)
But let's not hijack this thread to talk about tracepoints,
unless it's to compare their use and utility to reverse execution.
Ian does touch on an important general point, which is that
debugger features ought to be uniquely available, not just
repackaging of functionality that can be accomplished nearly
as well in other ways.
For instance, when I have a plain old breakpoint that lets me stop
and interactively look at a backtrace, that is something that is
(usually) not possible to do without the help of a debugger, and
everybody agrees that this is a good feature to have.
Conversely, if I have a tracepoint that just prints out one of my
program's variables, that doesn't give me much that I can't get
with a printf. However, if the tracepoint is collecting raw
registers, that's more difficult to manage using only print
functions, and then the tracepoint starts to look more interesting.
Ditto if I'm in a context where printf is not available, or so slow
that it affects critical real-time behavior.
In the case of reverse execution, one Appleite wondered why anybody
would bother, since you could repeatedly start the program over. And
indeed, GDB makes the restarting process pretty quick and easy; just
type "r". So reverse execution is not going to be a must-have unless
rerunning is either impossible (as in the case of intermittent bugs),
or very slow (as in the case of spending fifteen minutes giving
iTunes a particular pattern of mouse clicks and CD insertions, just
to get to the failing code).
No, it's not terribly useful for easy problems.
What do the Marines say? The difficult we do immediately...
Eli Zaretskii
2005-09-24 10:05:46 UTC
Permalink
Date: Fri, 23 Sep 2005 17:46:11 -0700
In the case of reverse execution, one Appleite wondered why anybody
would bother, since you could repeatedly start the program over. And
indeed, GDB makes the restarting process pretty quick and easy; just
type "r". So reverse execution is not going to be a must-have unless
rerunning is either impossible (as in the case of intermittent bugs),
or very slow (as in the case of spending fifteen minutes giving
iTunes a particular pattern of mouse clicks and CD insertions, just
to get to the failing code).
You gave the answer yourself: most non-trivial bugs require prolonged
and complicated sequences of actions (program setup and user commands)
to reproduce the buggy behavior. Restarting would mean that one needs
to redo all that every time, which is a PITA.
Jim Blandy
2005-09-27 21:59:13 UTC
Permalink
Post by Stan Shebs
Conversely, if I have a tracepoint that just prints out one of my
program's variables, that doesn't give me much that I can't get
with a printf. However, if the tracepoint is collecting raw
registers, that's more difficult to manage using only print
functions, and then the tracepoint starts to look more interesting.
Ditto if I'm in a context where printf is not available, or so slow
that it affects critical real-time behavior.
Tracepoints can collect (at least partial, and usually complete) stack
backtraces, too.

(Again, not to sidetrack the discussion...)

Daniel Jacobowitz
2005-09-21 04:01:54 UTC
Permalink
Post by Eli Zaretskii
Date: 20 Sep 2005 16:13:55 -0700
There is probably some cool use for which tracepoints are the
obvious right answer, but I don't know what it is.
In native debugging, tracepoints would be very useful to debug a
real-time program, or, more generally, a program where timing issues
are crucial to its correct operation. With such programs, normal GDB
usage disrupts the program operation and might even cause the program
to fail in ways that are unrelated to the bug you are looking for.
Yes - and I've definitely tried to debug programs using GDB where this
would have been helpful -- e.g., when working on the MIPS port of NPTL.
I ended up doing the manual equivalent using printf, but printf can
actually be much higher overhead than in-memory tracepointing, and
recompiling libc every time I needed to change the debugging output got
to be a bit of a drag.
--
Daniel Jacobowitz
CodeSourcery, LLC
Paul Gilliam
2005-09-21 16:55:56 UTC
Permalink
Post by Michael Snyder
And yet -- I have a target audience of engineers to whom
I've been trying to "sell" reverse execution -- and I have
a working implementation that I can demo, live, and a real-life
bug that I can show to be easy to debug with reverse execution,
and pretty damn hard otherwise. And the majority of them will
go "wow", but they aren't jumping up and down demanding access
to this cool facility.
How 'bout a bunch of you start a new gdb branch and add as much
'reverse execution' as would be needed to "prove your point"?

If you do, I pledge to use it as my primary debugger and give copious feedback.

-=# Paul #=-
Stan Shebs
2005-09-23 23:43:45 UTC
Permalink
[...] I have a target audience of engineers to whom
I've been trying to "sell" reverse execution -- and I have
a working implementation that I can demo, live, and a real-life
bug that I can show to be easy to debug with reverse execution,
and pretty damn hard otherwise. And the majority of them will
go "wow", but they aren't jumping up and down demanding access
to this cool facility.
This is a really important data point.
I think this is a familiar concept to us, but an unfamiliar
one for many users, and they may have to get their hands on
it and actually use it and play with it before they start to
get a feel for its true power.
My intuition is that this is the most accurate description of
the current reality. It makes things harder for us, because
it means we have to make a really good first impression; a lame
or unreliable implementation of reversal will have users hearing
through the grapevine that they should avoid it or that it is
useless, and many people will never even give it that first try.

Stan
Michael Snyder
2005-09-20 23:10:56 UTC
Permalink
Post by Stan Shebs
Depending on the answers, the project could be fatally flawed.
For instance, if the ability to undo system calls is critical
for usability, that pretty much relegates reversal to simulator
targets only - not interesting for my user base. That's why I
wanted to talk about usage patterns; if users don't need the
debugger to do the incredibly hard things, then we can get to
something useful sooner.
Here's the thing, though, Stan --

We can separate the debugger implementation questions from
the target-side implementation questions. Whether I/O can
be "undone", whether system calls can be reversed over, even
whether the target can proceed forward again from a point
that it has reversed back to -- these are all things about
which gdb need not concern itself. They're target-side
details.

Think about forward execution. Does gdb know anything
about system calls? In general, not. Does it know anything
about I/O? Definitely not, except in some special cases.
GDB knows about step, continue, and why-did-we-stop?
Those are its primitives.

If we make the CORE PART of gdb do nothing more than use
similar primitives for backward debugging, then it will
"just work". I know this, 'cause I've done it. We may
need to build some more intimate details into SOME gdb
back-ends, or implement a separate module that can do
certain things such as checkpoints for a target that
can't do them for itself -- but the core part of gdb
doesn't need to know about that, and those considerations
need not hold up the development of reverse execution
in the core part of gdb.

Separate the debugging of reverse-execution from the
question of how the reverse-execution is to be done.
I know, you need to consider both, and there's definitely
cross-over, but what I am saying is that we CAN
separate them, and that gdb will be better if we do.
The part of gdb that controls execution (infrun and
infcmd, for instance) SHOULD not know how the backend
or the target "works".

The target, on the other hand, may have lots of
capabilities, and it may not. Maybe it can only
"back up" until the first system call, and then
it gives up. Well, then gdb just needs to know
how to handle a target that can do some reverse
executing, but then can't do more. That's general --
because another target may have a "buffer" of saved
state for reverse execution, and it may eventually
reach the beginning of that buffer. Infrun doesn't
necessarily need to know WHY the target can't go
backward any more, just that it can't. Although
of course we might encode some common reasons and
give some meaningful failure message, it isn't
essential to the implementation of reverse debugging.
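
Just to show the shape of the separation I mean, here's a made-up
interface -- NOT what is in gdb today, just a sketch of how little
the core would have to know:

#include <stdio.h>

/* Hypothetical, for illustration only.  The core asks the target to
   resume in reverse and asks why it stopped; how the target manages
   to run backward -- checkpoints, a trace buffer, a simulator -- is
   the target's own business.  */

enum reverse_stop_reason
{
  REV_STOPPED_AT_BREAKPOINT,   /* hit a breakpoint while going backward */
  REV_STOPPED_STEP_DONE,       /* finished a single reverse step */
  REV_NO_MORE_HISTORY          /* target can't go back any further */
};

struct reverse_target_ops
{
  /* Resume execution backward; STEP means reverse-step one unit.  */
  void (*to_reverse_resume) (int step);

  /* Block until the target stops again, and report why.  */
  enum reverse_stop_reason (*to_reverse_wait) (void);
};

/* The core's handling stays generic: it doesn't need to know whether
   the target hit the start of its history, an un-reversible system
   call, or something else -- only that it can't go back any more.  */
void
handle_reverse_stop (enum reverse_stop_reason reason)
{
  switch (reason)
    {
    case REV_NO_MORE_HISTORY:
      printf ("Target cannot run any further backward from here.\n");
      break;
    default:
      /* Report the stop just like a forward stop.  */
      break;
    }
}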
Stan Shebs
2005-09-24 00:06:54 UTC
Permalink
Post by Michael Snyder
Think about forward execution. Does gdb know anything
about system calls? In general, not. Does it know anything
about I/O? Definitely not, except in some special cases.
GDB knows about step, continue, and why-did-we-stop?
Those are its primitives.
I think we know these primitives work for forward execution
because we have 40+ years of experience that says they are
useful. Students learn those in school as part of their formative
experience with debugging, so there is also a bit of tautology -
"these are useful for debugging because these are what we use
for debugging". It's so ingrained that researchers in
algorithmic and specification-based debugging continually
have a hard time getting anybody interested in their approach.
Post by Michael Snyder
The target, on the other hand, may have lots of
capabilities, and it may not. Maybe it can only
"back up" until the first system call, and then
it gives up.
This comes back to the "first impression" issue. App writers
on Mac OS X have a huge number of libraries (frameworks) that
they're supposed to use, and at the bottom of those there are lots
and lots of system calls going on. If the reverse step button is
perpetually grayed out for every line more complicated than
"a = b + c", users are going to see reversing as a useless feature
and never get in the habit of using it to solve their problems.

Fortunately I don't think we have to give up that easily; an adroitly
designed mechanism could treat system call behavior as a black box
requiring a bit of footwork to reverse past, just as GDB does some
footwork with breakpoints to cover up how we're scribbling trap
instructions all over supposedly read-only memory. :-)
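
One way to picture that footwork -- purely illustrative, not the
mechanism we actually have in mind, and of course it doesn't undo any
real I/O -- is to take a cheap checkpoint of the inferior before
letting it run through the opaque region; "reversing past" the region
then just means abandoning the current state and going back to the
checkpoint.  In a sketch, fork() can stand in for the checkpoint:

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef pid_t checkpoint_t;

/* Take a copy-on-write snapshot of the current process state (think
   of this as running on behalf of the inferior).  The child just
   sits there, holding the old memory image, until it is either
   discarded or resumed in place of the parent.  */
checkpoint_t
checkpoint_take (void)
{
  pid_t child = fork ();
  if (child == 0)
    {
      pause ();                 /* wait to be discarded or resumed */
      _exit (0);
    }
  return child;
}

/* Throw away a checkpoint we no longer need.  */
void
checkpoint_discard (checkpoint_t cp)
{
  kill (cp, SIGKILL);
  waitpid (cp, NULL, 0);
}

Resuming from the checkpoint would mean the debugger quietly switches
to controlling the child instead of the parent -- that's the footwork.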

Stan