Discussion:
Help with recursive destructive function
Eric Abrahamsen
2018-05-05 01:04:57 UTC
Permalink
So I'm revisiting my nemesis, eieio-persistent, and trying to write a
function for serializing data to disk in a generalized, data-agnostic
way.

The function needs to take a totally arbitrary list of data, and run a
function over it so that each element in the (possibly very nested) list
is tested and, if the test is positive, destructively replaced by a new
value.

I'd like to make it destructive because there could potentially be a
very large amount of data, and very few of the elements will actually
need to be altered, and I'm afraid copying enormous structures will be
slow.

I'm pretending that the job is to walk an arbitrary tree, and upcase any
and all strings in that tree. That's not the job, but I'm confident that
if I can make this work, it will be relatively simple to translate into
the actual job. (It will also have to handle hash-tables and vectors,
again I think that's all relatively straightforward.)

So here's my test data:

(setq test-data '("a" 1 "b" ("c" (2 ("d" . 3)) (4 . "e") "f")))

And the result I want (the destructive alteration of `test-data') is:

("A" 1 "B" ("C" (2 ("D" . 3)) (4 . "E") "F"))

The function I came up with always operates at one higher level of
nesting, because that's the only way to get a setf-able handle to
internal elements of the data structure: if you `mapc' a function over a
list, you can't use the resulting lambda argument symbol to set
anything. So I stay a level "up", in order to use setf, setcar, setcdr,
etc.

I did briefly consider `cl-symbol-macrolet', since it seems to be made
to address these sorts of problems, but this other solution more or less
came clear before I got a clear mental grasp of symbol-macrolet.

I'm hoping that someone might be willing to lend me some brain, and tell
me if I've done something conceptually wrong. Will this function do what
I expect it to do?

(defun walk (thing)
(cond ((listp (cdr thing))
(dotimes (i (length thing))
(cond ((stringp (nth i thing))
(setf (nth i thing) (upcase (nth i thing))))
((listp (nth i thing))
(walk (nth i thing))))))
((consp thing)
(when (stringp (car thing))
(setcar thing (upcase (car thing))))
(when (stringp (cdr thing))
(setcdr thing (upcase (cdr thing)))))))
Stefan Monnier
2018-05-05 01:18:56 UTC
Permalink
Post by Eric Abrahamsen
(cond ((listp (cdr thing))
(dotimes (i (length thing))
(cond ((stringp (nth i thing))
(setf (nth i thing) (upcase (nth i thing))))
((listp (nth i thing))
(walk (nth i thing))))))
Each `nth i` will take time O(i) so you have an O(n^2) complexity right there.

You want to do something more like

(let ((xs thing))
(while (consp xs)
(let ((x (car xs)))
(cond
((stringp x) (setf (car xs) (upcase x)))
((listp x) (walk x)))
(setq xs (cdr xs)))))


-- Stefan
Michael Heerdegen
2018-05-05 01:37:00 UTC
Permalink
Post by Stefan Monnier
You want to do something more like
(let ((xs thing))
(while (consp xs)
(let ((x (car xs)))
(cond
((stringp x) (setf (car xs) (upcase x)))
((listp x) (walk x)))
(setq xs (cdr xs)))))
`cl-loop' can do this out of the box (not recursively, though):

‘for VAR being the elements of-ref SEQUENCE’
This clause iterates over a sequence, with VAR a ‘setf’-able
reference onto the elements; see ‘in-ref’ above.

Exactly what he wants.

I'm not sure, however, if it's a good idea to use a recursive function
to do that. Recursing on cdrs can soon hit Emacs limits for very long
lists (and Eric, remember all the nested quotes you will need to
traverse ;-) ). I guess I would use an iterator: the definition would
still looks recursive, but the execution isn't problematic any more if
done right.


Michael.
Michael Heerdegen
2018-05-05 15:41:41 UTC
Permalink
I guess I would use an iterator: the definition would still looks
recursive, but the execution isn't problematic any more if done right.
Here is an example:

#+begin_src emacs-lisp
(iter-defun iter-tree-example (tree)
(cl-loop for thing in-ref tree by #'cdr do
(if (consp thing)
(iter-yield-from (iter-tree-example thing))
(iter-yield
(ignore
(when (stringp thing)
(cl-callf upcase thing)))))))

(let ((tree '("a" 1 ("b" "c" ("d")) "e")))
(iter-do (_ (iter-tree-example tree)))
tree)
==>
("A" 1
("B" "C"
("D"))
"E")

(let ((huge-list (number-sequence 1 100000)))
(setcdr (last huge-list) (cons "a" nil))
(iter-do (_ (iter-tree-example huge-list)))
(ignore huge-list))
|-- takes some time but doesn't crash
#+end_src

Yield values don't have any purpose but I guess without yielding you
would get no CPS rewrite but a standard recursive function that would be
problematic with the HUGE-LIST.


Michael.
Eric Abrahamsen
2018-05-06 17:29:39 UTC
Permalink
Post by Michael Heerdegen
I guess I would use an iterator: the definition would still looks
recursive, but the execution isn't problematic any more if done right.
Thanks to both of you! This was some good food for thought. I made
Stefan's suggested change to my original function and it works fine. It
still looks ugly to me because I'm doing the same test-and-set in three
different places, but with sufficient poking I can probably get it all
inside the same loop.

All else being equal I prefer this more "basic" version, simply because
I understand everything that's happening in it. I haven't used `cl-loop'
before, but I assume it's not doing anything that

(while (consp thing)
...
(setq thing (cdr thing))

Isn't doing? Oh, but then you wouldn't be able to use cl-callf directly
on thing.

Recursion is an issue, but the original version recurses on car, not
cdr, which I think (?) is much less of a problem. It went through your
huge-list with no trouble (and faster than the iterative version). I
suppose someone might have accumulated 801 levels of nested quotes in
their Gnus registry (god, I hope not), but otherwise I'm not sure it's a
worry.

On the third hand, if "bulletproof" is the goal, maybe it's best not to
risk it...
Post by Michael Heerdegen
#+begin_src emacs-lisp
(iter-defun iter-tree-example (tree)
(cl-loop for thing in-ref tree by #'cdr do
(if (consp thing)
(iter-yield-from (iter-tree-example thing))
(iter-yield
(ignore
(when (stringp thing)
(cl-callf upcase thing)))))))
Yield values don't have any purpose but I guess without yielding you
would get no CPS rewrite but a standard recursive function that would be
problematic with the HUGE-LIST.
I guess this works because the calls to `iter-yield' and
`iter-yield-from' fully return from the function? Also, what does "CPS
rewrite" mean?

Thanks again,
Eric
Michael Heerdegen
2018-05-06 19:29:34 UTC
Permalink
Post by Eric Abrahamsen
I guess this works because the calls to `iter-yield' and
`iter-yield-from' fully return from the function? Also, what does "CPS
rewrite" mean?
"Continuation passing style". You can google it.


Michael.
Eric Abrahamsen
2018-05-06 19:34:39 UTC
Permalink
Post by Michael Heerdegen
Post by Eric Abrahamsen
I guess this works because the calls to `iter-yield' and
`iter-yield-from' fully return from the function? Also, what does "CPS
rewrite" mean?
"Continuation passing style". You can google it.
Ah, thanks. I know the concept, but not the acronym.
Eric Abrahamsen
2018-05-06 18:27:42 UTC
Permalink
Post by Michael Heerdegen
I guess I would use an iterator: the definition would still looks
recursive, but the execution isn't problematic any more if done right.
#+begin_src emacs-lisp
(iter-defun iter-tree-example (tree)
(cl-loop for thing in-ref tree by #'cdr do
(if (consp thing)
(iter-yield-from (iter-tree-example thing))
(iter-yield
(ignore
(when (stringp thing)
(cl-callf upcase thing)))))))
(let ((tree '("a" 1 ("b" "c" ("d")) "e")))
(iter-do (_ (iter-tree-example tree)))
tree)
==>
("A" 1
("B" "C"
("D"))
"E")
Oh hang on, this doesn't work for all cases: the "by #'cdr" prevents it
from hitting the cdr of cons cells:

(let ((tree '("c" (2 ("d" . 3)) (4 . "e") "f")))
(iter-do (_ (iter-tree-example tree)))
tree)

--> ("C" (2 ("D" . 3)) (4 . "e") "F")

The best I've been able to come up with is:

(defun useless (val)
(while (consp val)
(let ((head (car val)))
(cond ((stringp head)
(setcar val (upcase head)))
((listp head)
(useless head)))
(when (stringp (cdr val))
(setcdr val (upcase (cdr val))))
(setq val (cdr val)))))

Recurses on car, and only repeats the test twice. I'd love to know if
that could be simplified...

Thanks again,
Eric
Michael Heerdegen
2018-05-07 02:01:03 UTC
Permalink
Post by Eric Abrahamsen
Oh hang on, this doesn't work for all cases: the "by #'cdr" prevents
(let ((tree '("c" (2 ("d" . 3)) (4 . "e") "f")))
(iter-do (_ (iter-tree-example tree)))
tree)
--> ("C" (2 ("D" . 3)) (4 . "e") "F")
Not only that (it's easy to fix), but I also realized that this approach
doesn't circumvent the recursion issue. But if your data is not
extremely deeply nested, recursion should not be such a problem.
Post by Eric Abrahamsen
(defun useless (val)
(while (consp val)
(let ((head (car val)))
(cond ((stringp head)
(setcar val (upcase head)))
((listp head)
(useless head)))
(when (stringp (cdr val))
(setcdr val (upcase (cdr val))))
(setq val (cdr val)))))
Recurses on car, and only repeats the test twice. I'd love to know if
that could be simplified...
Looks ok to me in principle.


Michael.
Eric Abrahamsen
2018-05-07 03:01:02 UTC
Permalink
Post by Michael Heerdegen
Post by Eric Abrahamsen
Oh hang on, this doesn't work for all cases: the "by #'cdr" prevents
(let ((tree '("c" (2 ("d" . 3)) (4 . "e") "f")))
(iter-do (_ (iter-tree-example tree)))
tree)
--> ("C" (2 ("D" . 3)) (4 . "e") "F")
Not only that (it's easy to fix), but I also realized that this approach
doesn't circumvent the recursion issue. But if your data is not
extremely deeply nested, recursion should not be such a problem.
Well... but... it might be deeply nested...

I'd like to have an iterative approach to work with as well -- it may
end up being slower, but if it's only a little slower, the slow-down is
worth the safety. If you have nothing to do on a Sunday afternoon (here)
and want to contribute a non-recursive iterative version, I would look
forward to testing!

Just saying,
Eric
Clément Pit-Claudel
2018-05-07 04:16:12 UTC
Permalink
Post by Eric Abrahamsen
I'd like to have an iterative approach to work with as well
Here's a quick attempt:

(defun deep-edit (f data)
(let ((cur (list #'car #'setcar data))
(stack (list (list #'cdr #'setcdr data))))
(while cur
(let* ((getter (car cur))
(setter (cadr cur))
(tree (caddr cur))
(subtree (funcall getter tree)))
(funcall setter tree (funcall f subtree))
(cond
((consp subtree)
(push (list #'cdr #'setcdr subtree) stack)
(setq cur (list #'car #'setcar subtree)))
(t (setq cur (pop stack))))))))

(setq test-data '("a" 1 "b" ("c" (2 ("d" . 3)) (4 . "e") "f")))

(defun f (subtree)
(if (stringp subtree)
(upcase subtree)
subtree))

(deep-edit #'f test-data)

… I'd actually tend to write it like this, if pcase is OK:

(defun deep-edit (f data)
(let ((cur `(car setcar ,data))
(stack `((cdr setcdr ,data))))
(while (or cur stack)
(pcase-let* ((`(,getter ,setter ,cell) cur)
(subtree (funcall getter cell)))
(funcall setter cell (funcall f subtree))
(cond
((consp subtree)
(push `(cdr setcdr ,subtree) stack)
(setq cur `(car setcar ,subtree)))
(t (setq cur (pop stack))))))))

Let me know if it needs clarification. I haven't tested it much, either.

Clément.
Michael Heerdegen
2018-05-07 14:14:36 UTC
Permalink
Post by Clément Pit-Claudel
(defun deep-edit (f data)
(let ((cur `(car setcar ,data))
(stack `((cdr setcdr ,data))))
(while (or cur stack)
(pcase-let* ((`(,getter ,setter ,cell) cur)
(subtree (funcall getter cell)))
(funcall setter cell (funcall f subtree))
(cond
((consp subtree)
(push `(cdr setcdr ,subtree) stack)
(setq cur `(car setcar ,subtree)))
(t (setq cur (pop stack))))))))
Well done!

A random example and two corner cases seem to work ok:

(defun edit-elt (thing)
(if (stringp thing) (upcase thing) thing))

(let ((tree '("c" (2 ("d" . 3)) (4 . "e") "f" (("g" . "h")))))
(deep-edit #'edit-elt tree)
tree)
==>
("C"
(2
("D" . 3))
(4 . "E")
"F"
(("G" . "H")))

(let ((huge-list (number-sequence 1 100000)))
(setcdr (last huge-list) (cons "a" nil))
(deep-edit #'edit-elt huge-list))
|-- no crash

(let ((deeply-nested-list (list nil)))
(let ((pointer deeply-nested-list))
(dotimes (_ 100000)
(setf (car pointer) (list nil)
pointer (car pointer)))
(setf (car pointer) "a"))
(deep-edit #'edit-elt deeply-nested-list))
|- no crash


Michael.
Stefan Monnier
2018-05-07 16:26:45 UTC
Permalink
Post by Clément Pit-Claudel
(defun deep-edit (f data)
(let ((cur (list #'car #'setcar data))
(stack (list (list #'cdr #'setcdr data))))
(while cur
(let* ((getter (car cur))
(setter (cadr cur))
(tree (caddr cur))
(subtree (funcall getter tree)))
(funcall setter tree (funcall f subtree))
(cond
((consp subtree)
(push (list #'cdr #'setcdr subtree) stack)
(setq cur (list #'car #'setcar subtree)))
(t (setq cur (pop stack))))))))
BTW, a "cleaner" (tho less efficient) way uses gv-ref:

(defun deep-edit (f data)
(let ((cur (gv-ref (car data)))
(stack (list (gv-ref (cdr data)))))
(while (or cur stack)
;; Probably should use (cl-callf f (gv-deref cell)).
(pcase (setf (gv-deref cur) (funcall f (gv-deref cell)))
((and (pred consp) subtree)
(push (gv-ref (cdr subtree)) stack)
(setq cur (gv-ref (car subtree))))
(t (setq cur (pop stack)))))))

tho the above has a bug when cur is nil, so it won't work as is.


Stefan "it'd be great to be able to represent a gv-ref as
a 3-element tuple so as to make it as efficient as your
code, tho"
Eric Abrahamsen
2018-05-07 16:52:49 UTC
Permalink
Post by Clément Pit-Claudel
Post by Clément Pit-Claudel
(defun deep-edit (f data)
(let ((cur (list #'car #'setcar data))
(stack (list (list #'cdr #'setcdr data))))
(while cur
(let* ((getter (car cur))
(setter (cadr cur))
(tree (caddr cur))
(subtree (funcall getter tree)))
(funcall setter tree (funcall f subtree))
(cond
((consp subtree)
(push (list #'cdr #'setcdr subtree) stack)
(setq cur (list #'car #'setcar subtree)))
(t (setq cur (pop stack))))))))
(defun deep-edit (f data)
(let ((cur (gv-ref (car data)))
(stack (list (gv-ref (cdr data)))))
(while (or cur stack)
;; Probably should use (cl-callf f (gv-deref cell)).
(pcase (setf (gv-deref cur) (funcall f (gv-deref cell)))
((and (pred consp) subtree)
(push (gv-ref (cdr subtree)) stack)
(setq cur (gv-ref (car subtree))))
(t (setq cur (pop stack)))))))
tho the above has a bug when cur is nil, so it won't work as is.
All this went very rapidly over my head, but I guess that's what I was
hoping for! I'll wait and see if you all work this through any further,
then play around with the results, and update bug #29541 with a new patch.

Thanks!

Eric
Eric Abrahamsen
2018-05-11 02:12:40 UTC
Permalink
I tweaked it a bit and extended the idea for arrays. Hope the comments
are helpful. Uses seq.el and cl-lib.
- Uses gv places as references.
This is great, and I really like the use of gv -- morally it seems like
the right tool. I will have time this weekend to apply all this to eieio-persistent.
Michael Heerdegen
2018-05-14 14:27:32 UTC
Permalink
Post by Eric Abrahamsen
This is great, and I really like the use of gv -- morally it seems like
the right tool.
Didn't profile speed, however - hope it's good enough.
Post by Eric Abrahamsen
I will have time this weekend to apply all this to eieio-persistent.
Where can I have a look at your work?


Michael.
Eric Abrahamsen
2018-05-14 16:57:34 UTC
Permalink
Post by Michael Heerdegen
Post by Eric Abrahamsen
This is great, and I really like the use of gv -- morally it seems like
the right tool.
Didn't profile speed, however - hope it's good enough.
I'm trying to break the changes into discrete steps, and benchmark each
step, so hopefully we'll know.
Post by Michael Heerdegen
Post by Eric Abrahamsen
I will have time this weekend to apply all this to eieio-persistent.
Where can I have a look at your work?
Nowhere yet -- I got partway through this, and ran up against a dumb
problem maybe you can help me with.

For backwards compatibility, we need to be able to handle lists that are
quoted, or that start with the symbol `list'. This will sound familiar
to you... In the non-destructive version, it was easy enough just to
return (cdr thing) instead of thing.

In the destructive version, this means that `handle-refs' would need to
first edit and *then* traverse the cons, which is not what it's set up
to do, obviously. I could probably cheat and move the backward
compatibility into some other part of `deep-edit' itself, but I was
trying to avoid that because that function could be useful elsewhere as
part of the general library.

Here's what the problem actually looks like:

#+BEGIN_SRC elisp
(defun edit-func (proposed-value)
(cond ((and (consp proposed-value)
(eq (car proposed-value) 'list))
#'cdr)
((and (consp proposed-value)
(eq (car proposed-value) 'quote))
#'cadr)
((stringp proposed-value)
#'upcase)
(t nil)))

(let ((tree '("b" '((first ("one" "two" "three"))
(second ("four" "five" "six")))
"c" (list "seven" "eight" "nine"))))
(deep-edit #'edit-func
(lambda (thing) (consp thing))
tree)
tree)
#+END_SRC

Do you have any good ideas about this?

Thanks again,
Eric
Michael Heerdegen
2018-05-14 23:16:45 UTC
Permalink
Post by Eric Abrahamsen
For backwards compatibility, we need to be able to handle lists that are
quoted, or that start with the symbol `list'. This will sound familiar
to you... In the non-destructive version, it was easy enough just to
return (cdr thing) instead of thing.
What will this code do when the saved list was really quoted, or was a
list with the symbol list as first element? Wouldn't the backward
compatible version cause errors?
Post by Eric Abrahamsen
In the destructive version, this means that `handle-refs' would need to
first edit and *then* traverse the cons, which is not what it's set up
to do, obviously. I could probably cheat and move the backward
compatibility into some other part of `deep-edit' itself, but I was
trying to avoid that because that function could be useful elsewhere as
part of the general library.
My implementation so far doesn't perfectly fit that case (as you
noticed).

Do you expect something occurring like '1 or (list) - i.e. cases where
the remaining thing is an atom (like 1 and nil above)?

If not, I would just tweak `handle-refs' so that it calls the modify
function _and_ additionally pushes the replacement to the stack when
some condition is met. You could make it so that `edit-func' returns a
special value for this case, like in

#+begin_src emacs-lisp
(defun edit-func (proposed-value)
(cond ((and (consp proposed-value)
(eq (car proposed-value) 'list))
(list t #'cdr))
((and (consp proposed-value)
(eq (car proposed-value) 'quote))
(list t #'cadr))
((stringp proposed-value)
#'upcase)
(t nil)))
#+end_src

where (list t FUN) would mean that we want to modify the value with FUN
and additionally push the resulting (remaining) tree to the stack. Not
much magic here...


Michael.
Eric Abrahamsen
2018-05-15 00:28:31 UTC
Permalink
Post by Michael Heerdegen
Post by Eric Abrahamsen
For backwards compatibility, we need to be able to handle lists that are
quoted, or that start with the symbol `list'. This will sound familiar
to you... In the non-destructive version, it was easy enough just to
return (cdr thing) instead of thing.
What will this code do when the saved list was really quoted, or was a
list with the symbol list as first element? Wouldn't the backward
compatible version cause errors?
Yup. But no more errors than it already would. This will also be an
issue when I try to unambiguously identify lists-that-should-be-objects.
The only surefire way around this that I can think of is writing an
eieio-persistent version number into the files.
Post by Michael Heerdegen
Post by Eric Abrahamsen
In the destructive version, this means that `handle-refs' would need to
first edit and *then* traverse the cons, which is not what it's set up
to do, obviously. I could probably cheat and move the backward
compatibility into some other part of `deep-edit' itself, but I was
trying to avoid that because that function could be useful elsewhere as
part of the general library.
My implementation so far doesn't perfectly fit that case (as you
noticed).
Do you expect something occurring like '1 or (list) - i.e. cases where
the remaining thing is an atom (like 1 and nil above)?
Yes -- there's a check in the current code precisely for the (list)
case.
Post by Michael Heerdegen
If not, I would just tweak `handle-refs' so that it calls the modify
function _and_ additionally pushes the replacement to the stack when
some condition is met. You could make it so that `edit-func' returns a
special value for this case, like in
#+begin_src emacs-lisp
(defun edit-func (proposed-value)
(cond ((and (consp proposed-value)
(eq (car proposed-value) 'list))
(list t #'cdr))
((and (consp proposed-value)
(eq (car proposed-value) 'quote))
(list t #'cadr))
((stringp proposed-value)
#'upcase)
(t nil)))
#+end_src
where (list t FUN) would mean that we want to modify the value with FUN
and additionally push the resulting (remaining) tree to the stack. Not
much magic here...
I had been treating it as magic :) I'll try to loosen up a bit and
experiment with something that might also handle the (list) case.
Loading...