Discussion:
bug#30626: 26.0.91; Crash when traversing a `stream-of-directory-files'
Michael Heerdegen
2018-02-27 09:22:26 UTC
Permalink
Hello,

Traversing a `stream-of-directory-files' over a huge directory hierarchy
crashes my Emacs.

Here is a recipe: I have a file
"/home/micha/Treasure/today/stream-crash.el" with these contents:

#+begin_src emacs-lisp
;; -*- lexical-binding: t -*-

(require 'stream)

(seq-doseq (_file (stream-of-directory-files
"/home/micha" t nil t nil
(lambda (file) (and (file-readable-p file) (file-regular-p file)))))
nil)
#+end_src

Then I

micha> emacs -Q -L /home/micha/software/elpa/packages/stream\
-l /home/micha/Treasure/today/stream-crash.el

and I get a segfault after ~ 1 minute.

This kind of crash only seems to occur when the traversed directory is
sufficiently large.


Thanks in advance,

Michael.


In GNU Emacs 26.0.91 (build 2, x86_64-pc-linux-gnu, GTK+ Version 3.22.28)
of 2018-02-26 built on drachen
Repository revision: ea8f9fd7be138708a52aad418d09d5d4ca6b2a7e
Windowing system distributor 'The X.Org Foundation', version 11.0.11906000
System Description: Debian GNU/Linux testing (buster)
Eli Zaretskii
2018-02-27 11:21:30 UTC
Permalink
Post by Michael Heerdegen
Hello,
Traversing a `stream-of-directory-files' over a huge directory
hierarchy
crashes my Emacs.
Here is a recipe: I have a file
#+begin_src emacs-lisp
;; -*- lexical-binding: t -*-
(require 'stream)
(seq-doseq (_file (stream-of-directory-files
"/home/micha" t nil t nil
(lambda (file) (and (file-readable-p file) (file-regular-p file)))))
nil)
#+end_src
Then I
micha> emacs -Q -L /home/micha/software/elpa/packages/stream\
-l /home/micha/Treasure/today/stream-crash.el
and I get a segfault after ~ 1 minute.
This kind of crash only seems to occur when the traversed directory is
sufficiently large.
I guess it's a stack overflow: that function recurses into subdirectories.

To avoid such problems, the function should be rewritten to work by BFS, not DFS.
Michael Heerdegen
2018-02-27 11:39:47 UTC
Permalink
Post by Eli Zaretskii
I guess it's a stack overflow: that function recurses into
subdirectories.
To avoid such problems, the function should be rewritten to work by BFS, not DFS.
Here is the backtrace I'm now able to produce. Looks like the crash
happens in gc:

| [lots of more lines like these cut]
| #104019 0x00000000005e0526 in mark_object (arg=XIL(0x5a15413)) at alloc.c:6624
| #104020 0x00000000005dfab5 in mark_vectorlike (ptr=0x5a584e0) at alloc.c:6227
| #104021 0x00000000005e0526 in mark_object (arg=XIL(0x5bd7cf3)) at alloc.c:6624
| #104022 0x00000000005dfab5 in mark_vectorlike (ptr=0x5873a80) at alloc.c:6227
| #104023 0x00000000005e0526 in mark_object (arg=XIL(0x5bd7cb3)) at alloc.c:6624
| #104024 0x00000000006070ea in mark_specpdl (first=0x58a1348, ptr=0x58a1b90) at eval.c:3868
| #104025 0x00000000006856b2 in mark_one_thread (thread=0xcd5340 <main_thread>) at thread.c:614
| #104026 0x00000000006857c7 in mark_threads_callback (ignore=0x0) at thread.c:649
| #104027 0x00000000005dda03 in flush_stack_call_func (func=0x685781 <mark_threads_callback>, arg=0x0) at alloc.c:5220
| #104028 0x00000000006857f5 in mark_threads () at thread.c:656
| #104029 0x00000000005df2d2 in garbage_collect_1 (end=0x7fffffff63f0) at alloc.c:5997
| #104030 0x00000000005df929 in Fgarbage_collect () at alloc.c:6168
| #104031 0x000000000055746c in maybe_gc () at lisp.h:4749
| #104032 0x00000000006047cb in Ffuncall (nargs=4, args=0x7fffffff64d0) at eval.c:2751
| #104033 0x000000000064cc41 in exec_byte_code (bytestr=XIL(0x42f83c4), vector=XIL(0x3255bb5), maxdepth=make_number(13), args_template=make_number(128), nargs=2, args=0x7fffffff6b38) at bytecode.c:629
| #104034 0x000000000060529c in funcall_lambda (fun=XIL(0x3255c15), nargs=2, arg_vector=0x7fffffff6b38) at eval.c:2970
| #104035 0x00000000006048d4 in Ffuncall (nargs=3, args=0x7fffffff6b30) at eval.c:2771
| #104036 0x000000000060395b in Fapply (nargs=3, args=0x7fffffff6b30) at eval.c:2346
| #104037 0x0000000000604b87 in funcall_subr (subr=0xc3d7c0 <Sapply>, numargs=3, args=0x7fffffff6b30) at eval.c:2824
| #104038 0x0000000000604890 in Ffuncall (nargs=4, args=0x7fffffff6b28) at eval.c:2769
| #104039 0x000000000064cc41 in exec_byte_code (bytestr=XIL(0x42f83e4), vector=XIL(0x15372775), maxdepth=make_number(8), args_template=make_number(256), nargs=0, args=0x7fffffff6ff8) at bytecode.c:629
| #104040 0x000000000060529c in funcall_lambda (fun=XIL(0x153727c5), nargs=0, arg_vector=0x7fffffff6ff8) at eval.c:2970
| #104041 0x00000000006048d4 in Ffuncall (nargs=1, args=0x7fffffff6ff0) at eval.c:2771
| #104042 0x000000000064cc41 in exec_byte_code (bytestr=XIL(0x42b7ad4), vector=XIL(0x930825), maxdepth=make_number(2), args_template=make_number(257), nargs=1, args=0x7fffffff7460) at bytecode.c:629
| #104043 0x000000000060529c in funcall_lambda (fun=XIL(0x3b73d75), nargs=1, arg_vector=0x7fffffff7458) at eval.c:2970
| #104044 0x00000000006048d4 in Ffuncall (nargs=2, args=0x7fffffff7450) at eval.c:2771
| #104045 0x000000000064cc41 in exec_byte_code (bytestr=XIL(0x42f82f4), vector=XIL(0x3255aa5), maxdepth=make_number(3), args_template=make_number(257), nargs=1, args=0x7fffffff78e0) at bytecode.c:629
| #104046 0x000000000060529c in funcall_lambda (fun=XIL(0x3255ab5), nargs=1, arg_vector=0x7fffffff78d8) at eval.c:2970
| #104047 0x00000000006048d4 in Ffuncall (nargs=2, args=0x7fffffff78d0) at eval.c:2771
| #104048 0x000000000064cc41 in exec_byte_code (bytestr=XIL(0x42f83e4), vector=XIL(0x153727f5), maxdepth=make_number(8), args_template=make_number(256), nargs=0, args=0x7fffffff7da8) at bytecode.c:629
| #104049 0x000000000060529c in funcall_lambda (fun=XIL(0x15372845), nargs=0, arg_vector=0x7fffffff7da8) at eval.c:2970
| #104050 0x00000000006048d4 in Ffuncall (nargs=1, args=0x7fffffff7da0) at eval.c:2771
| #104051 0x000000000064cc41 in exec_byte_code (bytestr=XIL(0x42b7ad4), vector=XIL(0x930825), maxdepth=make_number(2), args_template=make_number(257), nargs=1, args=0x7fffffff8210) at bytecode.c:629
| #104052 0x000000000060529c in funcall_lambda (fun=XIL(0x3b73d75), nargs=1, arg_vector=0x7fffffff8208) at eval.c:2970
| #104053 0x00000000006048d4 in Ffuncall (nargs=2, args=0x7fffffff8200) at eval.c:2771
| #104054 0x000000000064cc41 in exec_byte_code (bytestr=XIL(0x42f82f4), vector=XIL(0x3255aa5), maxdepth=make_number(3), args_template=make_number(257), nargs=1, args=0x7fffffff8690) at bytecode.c:629
| #104055 0x000000000060529c in funcall_lambda (fun=XIL(0x3255ab5), nargs=1, arg_vector=0x7fffffff8688) at eval.c:2970
| #104056 0x00000000006048d4 in Ffuncall (nargs=2, args=0x7fffffff8680) at eval.c:2771
| #104057 0x000000000064cc41 in exec_byte_code (bytestr=XIL(0x42f83e4), vector=XIL(0x15372875), maxdepth=make_number(8), args_template=make_number(256), nargs=0, args=0x7fffffff8b58) at bytecode.c:629
| #104058 0x000000000060529c in funcall_lambda (fun=XIL(0x153728c5), nargs=0, arg_vector=0x7fffffff8b58) at eval.c:2970
| #104059 0x00000000006048d4 in Ffuncall (nargs=1, args=0x7fffffff8b50) at eval.c:2771
| #104060 0x000000000064cc41 in exec_byte_code (bytestr=XIL(0x42b7ad4), vector=XIL(0x930825), maxdepth=make_number(2), args_template=make_number(257), nargs=1, args=0x7fffffff8fc0) at bytecode.c:629
| #104061 0x000000000060529c in funcall_lambda (fun=XIL(0x3b73d75), nargs=1, arg_vector=0x7fffffff8fb8) at eval.c:2970
| #104062 0x00000000006048d4 in Ffuncall (nargs=2, args=0x7fffffff8fb0) at eval.c:2771
| #104063 0x000000000064cc41 in exec_byte_code (bytestr=XIL(0x42f82f4), vector=XIL(0x3255aa5), maxdepth=make_number(3), args_template=make_number(257), nargs=1, args=0x7fffffff9440) at bytecode.c:629
| #104064 0x000000000060529c in funcall_lambda (fun=XIL(0x3255ab5), nargs=1, arg_vector=0x7fffffff9438) at eval.c:2970
| #104065 0x00000000006048d4 in Ffuncall (nargs=2, args=0x7fffffff9430) at eval.c:2771
| #104066 0x000000000064cc41 in exec_byte_code (bytestr=XIL(0x42f83e4), vector=XIL(0x153728f5), maxdepth=make_number(8), args_template=make_number(256), nargs=0, args=0x7fffffff9908) at bytecode.c:629
| #104067 0x000000000060529c in funcall_lambda (fun=XIL(0x15372945), nargs=0, arg_vector=0x7fffffff9908) at eval.c:2970
| #104068 0x00000000006048d4 in Ffuncall (nargs=1, args=0x7fffffff9900) at eval.c:2771
| #104069 0x000000000064cc41 in exec_byte_code (bytestr=XIL(0x42b7ad4), vector=XIL(0x930825), maxdepth=make_number(2), args_template=make_number(257), nargs=1, args=0x7fffffff9d70) at bytecode.c:629
| #104070 0x000000000060529c in funcall_lambda (fun=XIL(0x3b73d75), nargs=1, arg_vector=0x7fffffff9d68) at eval.c:2970
| #104071 0x00000000006048d4 in Ffuncall (nargs=2, args=0x7fffffff9d60) at eval.c:2771
| #104072 0x000000000064cc41 in exec_byte_code (bytestr=XIL(0x42f82f4), vector=XIL(0x3255aa5), maxdepth=make_number(3), args_template=make_number(257), nargs=1, args=0x7fffffffa1e8) at bytecode.c:629
| #104073 0x000000000060529c in funcall_lambda (fun=XIL(0x3255ab5), nargs=1, arg_vector=0x7fffffffa1e0) at eval.c:2970
| #104074 0x00000000006048d4 in Ffuncall (nargs=2, args=0x7fffffffa1d8) at eval.c:2771
| #104075 0x000000000064cc41 in exec_byte_code (bytestr=XIL(0x42f8c64), vector=XIL(0x153729a5), maxdepth=make_number(7), args_template=make_number(256), nargs=0, args=0x7fffffffa6a8) at bytecode.c:629
| #104076 0x000000000060529c in funcall_lambda (fun=XIL(0x153729f5), nargs=0, arg_vector=0x7fffffffa6a8) at eval.c:2970
| #104077 0x00000000006048d4 in Ffuncall (nargs=1, args=0x7fffffffa6a0) at eval.c:2771
| #104078 0x000000000064cc41 in exec_byte_code (bytestr=XIL(0x42b7ad4), vector=XIL(0x930825), maxdepth=make_number(2), args_template=make_number(257), nargs=1, args=0x7fffffffab10) at bytecode.c:629
| #104079 0x000000000060529c in funcall_lambda (fun=XIL(0x3b73d75), nargs=1, arg_vector=0x7fffffffab08) at eval.c:2970
| #104080 0x00000000006048d4 in Ffuncall (nargs=2, args=0x7fffffffab00) at eval.c:2771
| #104081 0x000000000064cc41 in exec_byte_code (bytestr=XIL(0x42f82f4), vector=XIL(0x3255aa5), maxdepth=make_number(3), args_template=make_number(257), nargs=1, args=0x7fffffffaf88) at bytecode.c:629
| #104082 0x000000000060529c in funcall_lambda (fun=XIL(0x3255ab5), nargs=1, arg_vector=0x7fffffffaf80) at eval.c:2970
| #104083 0x00000000006048d4 in Ffuncall (nargs=2, args=0x7fffffffaf78) at eval.c:2771
| #104084 0x000000000064cc41 in exec_byte_code (bytestr=XIL(0x42f8c04), vector=XIL(0x331ede5), maxdepth=make_number(5), args_template=make_number(514), nargs=2, args=0x7fffffffb5d0) at bytecode.c:629
| #104085 0x000000000060529c in funcall_lambda (fun=XIL(0x331ee05), nargs=2, arg_vector=0x7fffffffb5c0) at eval.c:2970
| #104086 0x00000000006048d4 in Ffuncall (nargs=3, args=0x7fffffffb5b8) at eval.c:2771
| #104087 0x000000000060390d in Fapply (nargs=4, args=0x7fffffffb5b8) at eval.c:2342
| #104088 0x0000000000604b87 in funcall_subr (subr=0xc3d7c0 <Sapply>, numargs=4, args=0x7fffffffb5b8) at eval.c:2824
| #104089 0x0000000000604890 in Ffuncall (nargs=5, args=0x7fffffffb5b0) at eval.c:2769
| #104090 0x000000000064cc41 in exec_byte_code (bytestr=XIL(0x42fb574), vector=XIL(0x425f285), maxdepth=make_number(15), args_template=make_number(642), nargs=2, args=0x7fffffffba40) at bytecode.c:629
| #104091 0x000000000060529c in funcall_lambda (fun=XIL(0x425f305), nargs=2, arg_vector=0x7fffffffba30) at eval.c:2970
| #104092 0x000000000060500d in apply_lambda (fun=XIL(0x425f305), args=XIL(0x5bd89c3), count=30) at eval.c:2906
| #104093 0x0000000000603602 in eval_sub (form=XIL(0x5bd89d3)) at eval.c:2279
| #104094 0x0000000000632c69 in readevalloop_eager_expand_eval (val=XIL(0x5bd8a73), macroexpand=XIL(0xda5d0)) at lread.c:1850
| #104095 0x00000000006335f0 in readevalloop (readcharfun=XIL(0x5b75ce5), infile0=0x0, sourcename=XIL(0x55c2e14), printflag=false, unibyte=XIL(0), readfun=XIL(0), start=XIL(0), end=XIL(0)) at lread.c:2036
| #104096 0x0000000000633a0a in Feval_buffer (buffer=XIL(0), printflag=XIL(0), filename=XIL(0x56ea3c4), unibyte=XIL(0), do_allow_print=XIL(0)) at lread.c:2103
| #104097 0x0000000000604d25 in funcall_subr (subr=0xc40240 <Seval_buffer>, numargs=0, args=0x7fffffffbfa0) at eval.c:2856
| #104098 0x0000000000604890 in Ffuncall (nargs=1, args=0x7fffffffbf98) at eval.c:2769
| #104099 0x00000000005fc8f7 in Ffuncall_interactively (nargs=1, args=0x7fffffffbf98) at callint.c:252
| #104100 0x0000000000604b87 in funcall_subr (subr=0xc3cfc0 <Sfuncall_interactively>, numargs=1, args=0x7fffffffbf98) at eval.c:2824
| #104101 0x0000000000604890 in Ffuncall (nargs=2, args=0x7fffffffbf90) at eval.c:2769
| #104102 0x00000000005fec50 in Fcall_interactively (function=XIL(0xad320), record_flag=XIL(0xaad0), keys=XIL(0x5a55745)) at callint.c:854
| #104103 0x0000000000604cac in funcall_subr (subr=0xc3d000 <Scall_interactively>, numargs=3, args=0x7fffffffc300) at eval.c:2849
| #104104 0x0000000000604890 in Ffuncall (nargs=4, args=0x7fffffffc2f8) at eval.c:2769
| #104105 0x000000000064cc41 in exec_byte_code (bytestr=XIL(0x9f4bec), vector=XIL(0x9f4c0d), maxdepth=make_number(13), args_template=make_number(1025), nargs=2, args=0x7fffffffc868) at bytecode.c:629
| #104106 0x000000000060529c in funcall_lambda (fun=XIL(0x9f4bbd), nargs=2, arg_vector=0x7fffffffc858) at eval.c:2970
| #104107 0x00000000006048d4 in Ffuncall (nargs=3, args=0x7fffffffc850) at eval.c:2771
| #104108 0x000000000064cc41 in exec_byte_code (bytestr=XIL(0x9f489c), vector=XIL(0x9f48bd), maxdepth=make_number(15), args_template=make_number(769), nargs=3, args=0x7fffffffce00) at bytecode.c:629
| #104109 0x000000000060529c in funcall_lambda (fun=XIL(0x9f485d), nargs=3, arg_vector=0x7fffffffcde8) at eval.c:2970
| #104110 0x00000000006048d4 in Ffuncall (nargs=4, args=0x7fffffffcde0) at eval.c:2771
| #104111 0x0000000000603cdb in Fapply (nargs=2, args=0x7fffffffcfc0) at eval.c:2389
| #104112 0x0000000000604b87 in funcall_subr (subr=0xc3d7c0 <Sapply>, numargs=2, args=0x7fffffffcfc0) at eval.c:2824
| #104113 0x0000000000604890 in Ffuncall (nargs=3, args=0x7fffffffcfb8) at eval.c:2769
| #104114 0x000000000064cc41 in exec_byte_code (bytestr=XIL(0x505da94), vector=XIL(0x5044095), maxdepth=make_number(3), args_template=XIL(0), nargs=0, args=0x0) at bytecode.c:629
| #104115 0x0000000000605615 in funcall_lambda (fun=XIL(0x50c3185), nargs=5, arg_vector=0x5044095) at eval.c:3052
| #104116 0x00000000006048d4 in Ffuncall (nargs=6, args=0x7fffffffd450) at eval.c:2771
| #104117 0x0000000000603cdb in Fapply (nargs=3, args=0x7fffffffd648) at eval.c:2389
| #104118 0x0000000000604b87 in funcall_subr (subr=0xc3d7c0 <Sapply>, numargs=3, args=0x7fffffffd648) at eval.c:2824
| #104119 0x0000000000604890 in Ffuncall (nargs=4, args=0x7fffffffd640) at eval.c:2769
| #104120 0x000000000064cc41 in exec_byte_code (bytestr=XIL(0x14402c4), vector=XIL(0x50441c5), maxdepth=make_number(5), args_template=make_number(128), nargs=4, args=0x7fffffffdbf0) at bytecode.c:629
| #104121 0x000000000060529c in funcall_lambda (fun=XIL(0x50441f5), nargs=4, arg_vector=0x7fffffffdbf0) at eval.c:2970
| #104122 0x00000000006048d4 in Ffuncall (nargs=5, args=0x7fffffffdbe8) at eval.c:2771
| #104123 0x00000000005fc8f7 in Ffuncall_interactively (nargs=5, args=0x7fffffffdbe8) at callint.c:252
| #104124 0x0000000000604b87 in funcall_subr (subr=0xc3cfc0 <Sfuncall_interactively>, numargs=5, args=0x7fffffffdbe8) at eval.c:2824
| #104125 0x0000000000604890 in Ffuncall (nargs=6, args=0x7fffffffdbe0) at eval.c:2769
| #104126 0x0000000000603cdb in Fapply (nargs=3, args=0x7fffffffdd90) at eval.c:2389
| #104127 0x00000000005fcd7d in Fcall_interactively (function=XIL(0xda3c0), record_flag=XIL(0), keys=XIL(0x5a3e0a5)) at callint.c:389
| #104128 0x0000000000604cac in funcall_subr (subr=0xc3d000 <Scall_interactively>, numargs=3, args=0x7fffffffdfe0) at eval.c:2849
| #104129 0x0000000000604890 in Ffuncall (nargs=4, args=0x7fffffffdfd8) at eval.c:2769
| #104130 0x000000000064cc41 in exec_byte_code (bytestr=XIL(0x9f4bec), vector=XIL(0x9f4c0d), maxdepth=make_number(13), args_template=make_number(1025), nargs=1, args=0x7fffffffe520) at bytecode.c:629
| #104131 0x000000000060529c in funcall_lambda (fun=XIL(0x9f4bbd), nargs=1, arg_vector=0x7fffffffe518) at eval.c:2970
| #104132 0x00000000006048d4 in Ffuncall (nargs=2, args=0x7fffffffe510) at eval.c:2771
| #104133 0x00000000006042a0 in call1 (fn=XIL(0x3f00), arg1=XIL(0xda3c0)) at eval.c:2620
| #104134 0x000000000055ec02 in command_loop_1 () at keyboard.c:1482
| #104135 0x00000000006013b2 in internal_condition_case (bfun=0x55e45e <command_loop_1>, handlers=XIL(0x5250), hfun=0x55dc1d <cmd_error>) at eval.c:1332
| #104136 0x000000000055e151 in command_loop_2 (ignore=XIL(0)) at keyboard.c:1110
| #104137 0x0000000000600c94 in internal_catch (tag=XIL(0xc6f0), func=0x55e124 <command_loop_2>, arg=XIL(0)) at eval.c:1097
| #104138 0x000000000055e0ef in command_loop () at keyboard.c:1089
| #104139 0x000000000055d7ef in recursive_edit_1 () at keyboard.c:695
| #104140 0x000000000055d970 in Frecursive_edit () at keyboard.c:766
| #104141 0x000000000055b5f0 in main (argc=1, argv=0x7fffffffe9f8) at emacs.c:1713
| Cannot access memory at address 0x7fffff66ff4f



Michael.
Michael Heerdegen
2018-02-27 12:08:59 UTC
Permalink
Post by Michael Heerdegen
Here is the backtrace I'm now able to produce. Looks like the crash
happens in gc.
Here is a much simpler example that crashes as well:

#+begin_src emacs-lisp
(seq-doseq (_ (stream-range 1 1000000)) nil)
#+end_src

Note that this is executed as a loop due how to streams are implemented,
although the definition of `seq-doseq' looks recursive. But it seems
that gc has a problem with the large number of conses created when
processing that.


Michael.
Eli Zaretskii
2018-02-27 18:08:56 UTC
Permalink
Date: Tue, 27 Feb 2018 13:08:59 +0100
#+begin_src emacs-lisp
(seq-doseq (_ (stream-range 1 1000000)) nil)
#+end_src
Note that this is executed as a loop due how to streams are implemented,
although the definition of `seq-doseq' looks recursive. But it seems
that gc has a problem with the large number of conses created when
processing that.
What can we do instead in such cases? Stack-overflow protection
cannot work in GC, so you are shooting yourself in the foot by
creating such large recursive structures. By the time we get to GC,
where the problem will happen, it's too late, because the memory was
already allocated.

Does anyone has a reasonable idea for avoiding the crash in such
programs?
Noam Postavsky
2018-02-28 01:29:21 UTC
Permalink
Post by Eli Zaretskii
Date: Tue, 27 Feb 2018 13:08:59 +0100
#+begin_src emacs-lisp
(seq-doseq (_ (stream-range 1 1000000)) nil)
#+end_src
Note that this is executed as a loop due how to streams are implemented,
although the definition of `seq-doseq' looks recursive.
Doesn't look recursive to me, it expands to a call to seq-do, which uses
a simple loop.
Post by Eli Zaretskii
But it seems that gc has a problem with the large number of conses
created when processing that.
What can we do instead in such cases? Stack-overflow protection
cannot work in GC, so you are shooting yourself in the foot by
creating such large recursive structures. By the time we get to GC,
where the problem will happen, it's too late, because the memory was
already allocated.
Does anyone has a reasonable idea for avoiding the crash in such
programs?
I don't have a quick answer for the general case, but I think it's a bug
in stream.el that it's creating such large structures in the first
place. As far as I understand it, the point of streams is to handle
long lists by encoding them as

(FIRST-VALUE . FUNCTION-TO-PRODUCE-REST-OF-LIST)

so as to avoid allocating large amounts of memory. Is there an easy way
to find out what the large structures are, and where they are coming
from?
Michael Heerdegen
2018-02-28 10:58:49 UTC
Permalink
Post by Noam Postavsky
Post by Eli Zaretskii
Post by Michael Heerdegen
#+begin_src emacs-lisp
(seq-doseq (_ (stream-range 1 1000000)) nil)
#+end_src
Note that this is executed as a loop due how to streams are
implemented, although the definition of `seq-doseq' looks
recursive.
Doesn't look recursive to me, it expands to a call to seq-do, which uses
a simple loop.
I was imprecise, I meant that the streams are defined recursively (most
of the time). Though, it's a delayed type of recursion. Anyway, I
think that this doesn't matter here.
Post by Noam Postavsky
Post by Eli Zaretskii
Does anyone has a reasonable idea for avoiding the crash in such
programs?
I don't have a quick answer for the general case, but I think it's a bug
in stream.el that it's creating such large structures in the first
place. As far as I understand it, the point of streams is to handle
long lists by encoding them as
(FIRST-VALUE . FUNCTION-TO-PRODUCE-REST-OF-LIST)
Yes, that's exactly how it's implemented. When requesting more elements
from the stream, that becomes

(FIRST-VALUE .
(SECOND-VALUE . FUNCTION-TO-PRODUCE-MORE-REST-OF-LIST))

When we loop over the string, the cons whose car is the FIRST-VALUE,
let's call it cons1, is immediately thrown away, and we continue with

(SECOND-VALUE . FUNCTION-TO-PRODUCE-MORE-REST-OF-LIST)

etc.

AFAIU the problem is that the cons1 still exists in memory until garbage
collection kicks in. When that happens, the cons1 points to a largely
recursive cons structure, though this structure is never referenced from
Lisp in this form.
Post by Noam Postavsky
so as to avoid allocating large amounts of memory. Is there an easy way
to find out what the large structures are, and where they are coming
from?
I think I've answered that. At least, I think so. What I don't
understand is that when I force the `seq-doseq' to call
`garbace-collect' explicitly every 1000 cycles, or so, it doesn't help:
the crash still happens after generating ~ 70 000 elements, or some
more, but I can't avoid the crash, no matter how often I call gc. So
I'm not sure whether these long lists are the problem or something else.
The FUNCTION-TO-PRODUCE-MORE-REST-OF-LIST looks harmless when I print
it, even after thousands of iterations, so I would not understand why
that could be problematic. streams.el implements things in a way that
these rest functions are not deeply nested lambdas.


Michael.
Eli Zaretskii
2018-02-28 16:00:00 UTC
Permalink
Date: Wed, 28 Feb 2018 11:58:49 +0100
Post by Noam Postavsky
(FIRST-VALUE . FUNCTION-TO-PRODUCE-REST-OF-LIST)
Yes, that's exactly how it's implemented. When requesting more elements
from the stream, that becomes
(FIRST-VALUE .
(SECOND-VALUE . FUNCTION-TO-PRODUCE-MORE-REST-OF-LIST))
When we loop over the string, the cons whose car is the FIRST-VALUE,
let's call it cons1, is immediately thrown away, and we continue with
(SECOND-VALUE . FUNCTION-TO-PRODUCE-MORE-REST-OF-LIST)
etc.
How do you see that the car is immediately thrown away?
AFAIU the problem is that the cons1 still exists in memory until garbage
collection kicks in. When that happens, the cons1 points to a largely
recursive cons structure, though this structure is never referenced from
Lisp in this form.
What is "cons1" in this context?
Michael Heerdegen
2018-02-28 16:20:53 UTC
Permalink
Post by Eli Zaretskii
How do you see that the car is immediately thrown away?
After expansion and dispatch, this is what gets executed:

#+begin_src emacs-lisp
(let ((stream (stream-range 1 1000000)))
(while (not (stream-empty-p stream))
(funcall #'ignore (stream-first stream))
(setq stream (stream-rest stream))))
#+end_src

Creating an element and throwing away the outermost cons alternate.
Post by Eli Zaretskii
Post by Michael Heerdegen
AFAIU the problem is that the cons1 still exists in memory until
garbage collection kicks in. When that happens, the cons1 points to
a largely recursive cons structure, though this structure is never
referenced from Lisp in this form.
What is "cons1" in this context?
I said it some lines above: I defined it as name of the "first cons",
i.e. the original stream. The return value of `stream-range' in the
above example.


Michael.
Eli Zaretskii
2018-02-28 17:22:57 UTC
Permalink
Date: Wed, 28 Feb 2018 17:20:53 +0100
Post by Eli Zaretskii
How do you see that the car is immediately thrown away?
#+begin_src emacs-lisp
(let ((stream (stream-range 1 1000000)))
(while (not (stream-empty-p stream))
(funcall #'ignore (stream-first stream))
(setq stream (stream-rest stream))))
#+end_src
Creating an element and throwing away the outermost cons alternate.
I don't think it's thrown away from the POV of GC. But you can easily
see what is going on if you trace the GC on the C level. You should
be able to see which object causes recursion in mark_object. I didn't
look long enough, but what I did see looks very much like the entire
unwound stream.
Michael Heerdegen
2018-02-28 18:25:45 UTC
Permalink
Post by Eli Zaretskii
I don't think it's thrown away from the POV of GC. But you can easily
see what is going on if you trace the GC on the C level. You should
be able to see which object causes recursion in mark_object. I didn't
look long enough, but what I did see looks very much like the entire
unwound stream.
I have an idea what could be going on. In the stream-range example,
this is how the stream is build:

#+begin_src emacs-lisp
(list stream--identifier
(let
(forced val)
(lambda
(&optional check)
(if
check
forced
(unless
forced
(setf
val
(progn
(cons
start
(stream-range (+ start step) end step))))
(setf forced t))
val))))
#+end_src

The inner `stream-range' call results in a closure, and I guess that
this closure includes a reference to the outside VAL, which is the
stream from one step back (though there isn't a lexical reference to the
variable...does that make sense?)

So there could be a chain of references via closure variables back to
the first cons.


Michael.
Michael Heerdegen
2018-03-01 11:25:43 UTC
Permalink
Post by Michael Heerdegen
The inner `stream-range' call results in a closure, and I guess that
this closure includes a reference to the outside VAL, which is the
stream from one step back (though there isn't a lexical reference to the
variable...does that make sense?)
This is probably only a part of the puzzle. Some examples:

test1.el: This is more or less the `stream-range' example reimplemented
without dependencies so that it works in emacs -Q:

#+begin_src emacs-lisp
;; -*- lexical-binding: t -*-

(defun my-range (start)
(let (forced val)
(lambda ()
(unless forced
(setq val (cons start (my-range (1+ start)))
forced t))
val)))

(defun my-test ()
(let ((range-object (my-range 1))
current-cons)
(while (< (car (setq current-cons (funcall range-object))) (* 1000 1000))
(message "%d" (car current-cons))
(setq range-object (cdr current-cons)))))

(my-test)
#+end_src

Loading this file test1.el crashes Emacs - but if you compile it,
loading the compiled file doesn't crash. This is what I expected from
my previous thoughts, because only the uncompiled closures include a
reference to the outer VALs.


test2.el:

#+begin_src emacs-lisp
;; -*- lexical-binding: t -*-

(require 'stream)

(let ((stream (stream-range 1 1000000)))
(stream-flush stream))
#+end_src

This traverses the whole stream ignoring the elements. Loading this
file crashes Emacs, no matter if compiled or not. I'm surprised it
doesn't work even when compiled.


test3.el:

#+begin_src emacs-lisp
;; -*- lexical-binding: t -*-

(require 'stream)

(let ((stream (stream-range 1 1000000)))
(while (not (stream-empty-p stream))
(cl-callf stream-rest stream)))
#+end_src

This is semantically exactly like test2.el, only the call to
`stream-flush' has been replaced by literally writing out the
definition. Nonetheless, the compiled file suddenly doesn't crash Emacs
when loaded. Loading the uncompiled file test3.el still crashes.

Seems that whether we get a crash or not depends on details in the
implementation of lexical-binding.

Michael.
Eli Zaretskii
2018-03-01 15:00:33 UTC
Permalink
Date: Thu, 01 Mar 2018 12:25:43 +0100
Seems that whether we get a crash or not depends on details in the
implementation of lexical-binding.
Byte compilation doesn't just produce byte code, it also changes the
code into an equivalent one, but "equivalence" in this context doesn't
include side effects like stack usage. As an extreme example,
consider a tail-recursive program that the byte compiler converts (or
at least might convert theoretically) into a loop.
Noam Postavsky
2018-03-02 14:11:46 UTC
Permalink
Post by Michael Heerdegen
#+begin_src emacs-lisp
;; -*- lexical-binding: t -*-
(require 'stream)
(let ((stream (stream-range 1 1000000)))
(while (not (stream-empty-p stream))
(cl-callf stream-rest stream)))
#+end_src
This is semantically exactly like test2.el, only the call to
`stream-flush' has been replaced by literally writing out the
definition. Nonetheless, the compiled file suddenly doesn't crash Emacs
when loaded. Loading the uncompiled file test3.el still crashes.
Aha, but the following also crashes, whether compiled or not:

;; -*- lexical-binding: t -*-

(require 'stream)

(let* ((stream0 (stream-range 1 1000000))
(stream stream0))
(while (not (stream-empty-p stream))
(cl-callf stream-rest stream)))

So the problem is that the initial stream0 object can reach the entire
unfolding stream as it goes, and just holding on to that reference is
enough to keep the whole thing in memory.

Now, I can see that letting stream0 automagically get access to the
unfolded result can be an optimization in some cases, although in this
case it's a pessimization. It could also affect the semantics if
unfolding the stream has side effects, not sure if stream.el makes
guarantees about that though.
Michael Heerdegen
2018-03-02 15:06:49 UTC
Permalink
Post by Michael Heerdegen
;; -*- lexical-binding: t -*-
(require 'stream)
(let* ((stream0 (stream-range 1 1000000))
(stream stream0))
(while (not (stream-empty-p stream))
(cl-callf stream-rest stream)))
I guess you should have some awe if you write something like
(stream-range 1 1000000) and keep in mind what happens with this
gigantic thing. Though, could the object `stream0' references not
already be garbage-collected when the loop has been entered, since there
is no lexical reference to that variable there?

BTW, it gets even worse if you append streams, since the original
streams are not copied and magically unfolded when you generate elements
from the concatenation.
Post by Michael Heerdegen
Now, I can see that letting stream0 automagically get access to the
unfolded result can be an optimization in some cases, although in this
case it's a pessimization. It could also affect the semantics if
unfolding the stream has side effects, not sure if stream.el makes
guarantees about that though.
AFAICT we make no such guarantees at all. When generating stream
elements has side effects (which is not ideal, but it's not forbidden),
then you must know what you are doing. In my experience, side effects
often directly correlate with element generation - e.g. for a stream of
search matches, a side effect is to set a variable to the position where
to continue searching. These kind of side effects are not problematic.

For non-trivial side-effects, dunno, never wanted something like that.
But this problem also concerns other forms of delayed evaluation,
including thunks, and generally everything you can do with closures.


Michael.
Eli Zaretskii
2018-03-02 15:43:27 UTC
Permalink
Date: Fri, 02 Mar 2018 16:06:49 +0100
Post by Noam Postavsky
(let* ((stream0 (stream-range 1 1000000))
(stream stream0))
(while (not (stream-empty-p stream))
(cl-callf stream-rest stream)))
I guess you should have some awe if you write something like
(stream-range 1 1000000) and keep in mind what happens with this
gigantic thing. Though, could the object `stream0' references not
already be garbage-collected when the loop has been entered, since there
is no lexical reference to that variable there?
It's still referenced by the let* form, I think.

But even if it didn't, you cannot rely on it being GC'ed, because
Emacs triggers GC at times which are hard or even impossible to
predict.
Nicolas Petton
2018-03-02 20:16:36 UTC
Permalink
Post by Noam Postavsky
Now, I can see that letting stream0 automagically get access to the
unfolded result can be an optimization in some cases, although in this
case it's a pessimization.
stream.el is at its core just an implementation of lazy-cons cells, so
not letting stream0 get access to the result would mean changing the
core implementation (I'm not necessarily against it).

If we accept that `seq-elt', and other positional functions of seq.el
should not work on streams, then I could rewrite stream.el to make it a
positioned stream where previous elements are discarded after each
element generation. However the list of supported functions from seq.el
API would be significantly reduced.
Post by Noam Postavsky
It could also affect the semantics if unfolding the stream has side
effects, not sure if stream.el makes guarantees about that though.
No, it doesn't.

Nico
Nicolas Petton
2018-03-02 20:58:43 UTC
Permalink
Post by Nicolas Petton
If we accept that `seq-elt', and other positional functions of seq.el
should not work on streams, then I could rewrite stream.el to make it a
positioned stream where previous elements are discarded after each
element generation. However the list of supported functions from seq.el
API would be significantly reduced.
I had something like the following in mind:

(cl-defstruct nstream current next-function)

(cl-defmethod nstream-next ((stream nstream))
(setf (nstream-current stream) (funcall (nstream-next-function stream)
(nstream-current stream))))

(defun nstream-range (&optional start end step)
(unless start (setq start 0))
(unless step (setq step 1))
(make-nstream :current start
:next-function (lambda (cur)
(if (equal cur end)
nil
(+ cur step)))))

Cheers,
Nico
Michael Heerdegen
2018-03-03 07:56:05 UTC
Permalink
Post by Nicolas Petton
(cl-defstruct nstream current next-function)
(cl-defmethod nstream-next ((stream nstream))
(setf (nstream-current stream) (funcall (nstream-next-function stream)
(nstream-current stream))))
(defun nstream-range (&optional start end step)
(unless start (setq start 0))
(unless step (setq step 1))
(make-nstream :current start
:next-function (lambda (cur)
(if (equal cur end)
nil
(+ cur step)))))
This is an implementation of iterators, not streams. We already have an
implementation of iterators in Emacs.


Michael.
Michael Heerdegen
2018-03-03 07:54:02 UTC
Permalink
Post by Nicolas Petton
stream.el is at its core just an implementation of lazy-cons cells, so
not letting stream0 get access to the result would mean changing the
core implementation (I'm not necessarily against it).
Please don't do this. The semantics of streams is worth keeping. I
make use of it in el-search and other stuff.


Michael.
Nicolas Petton
2018-03-03 08:47:41 UTC
Permalink
Post by Michael Heerdegen
Please don't do this. The semantics of streams is worth keeping. I
make use of it in el-search and other stuff.
I'm not saying that I want this :)

Nico
John Mastro
2018-03-02 21:48:23 UTC
Permalink
Post by Michael Heerdegen
;; -*- lexical-binding: t -*-
(require 'stream)
(let* ((stream0 (stream-range 1 1000000))
(stream stream0))
(while (not (stream-empty-p stream))
(cl-callf stream-rest stream)))
So the problem is that the initial stream0 object can reach the entire
unfolding stream as it goes, and just holding on to that reference is
enough to keep the whole thing in memory.
Now, I can see that letting stream0 automagically get access to the
unfolded result can be an optimization in some cases, although in this
case it's a pessimization. It could also affect the semantics if
unfolding the stream has side effects, not sure if stream.el makes
guarantees about that though.
Clojure has/had a similar issue. They describe this scenario (having a
live reference to the beginning of the stream, preventing GC from
collecting it) as "holding onto the head" of the stream (in Clojure
they're called lazy seqs).

Their solution was what they call "locals clearing". The compiler tracks
the lifetimes of local bindings and "clears" them (by setting them to
nil/null) after their point of last use, e.g.:

(let* ((stream0 (stream-range 1 1000000))
(stream stream0))
(setq stream0 nil) ;; <<< Inserted by compiler
(while (not (stream-empty-p stream))
(cl-callf stream-rest stream)))

If the code does reference stream0 later, locals clearing can't help
you, but that's considered a "if it hurts, don't do it" situation.

This probably isn't practical for Emacs, especially since it could only
work for byte-compiled code, but thought the prior art may be of
interest.

John
Noam Postavsky
2018-03-03 23:00:00 UTC
Permalink
Post by Noam Postavsky
(let* ((stream0 (stream-range 1 1000000))
(stream stream0))
(setq stream0 nil) ;; <<< Inserted by compiler
(while (not (stream-empty-p stream))
(cl-callf stream-rest stream)))
If the code does reference stream0 later, locals clearing can't help
you, but that's considered a "if it hurts, don't do it" situation.
This probably isn't practical for Emacs, especially since it could only
work for byte-compiled code, but thought the prior art may be of
interest.
Not sure how doable this solution is, but the fact that it works only
for byte-compiled code seems fine to me. The interpreted case is doomed
to fail anyway since the interpreter doesn't prune redundant variables
from closures.
Noam Postavsky
2018-03-04 15:56:36 UTC
Permalink
Post by Noam Postavsky
Post by Noam Postavsky
(let* ((stream0 (stream-range 1 1000000))
(stream stream0))
(setq stream0 nil) ;; <<< Inserted by compiler
(while (not (stream-empty-p stream))
(cl-callf stream-rest stream)))
If the code does reference stream0 later, locals clearing can't help
you, but that's considered a "if it hurts, don't do it" situation.
This probably isn't practical for Emacs, especially since it could only
work for byte-compiled code, but thought the prior art may be of
interest.
Not sure how doable this solution is, but the fact that it works only
for byte-compiled code seems fine to me. The interpreted case is doomed
to fail anyway since the interpreter doesn't prune redundant variables
from closures.
Hmm, I think it won't work by itself though, just doing

(stream-flush (stream-range 1 1000000))

also crashes, due to the head of the stream being referenced from the C
stack somewhere (I can get the address from gdb, but I can't figure out
how to get to the corresponding C variable from there).
Eli Zaretskii
2018-03-04 17:02:03 UTC
Permalink
Date: Sun, 04 Mar 2018 10:56:36 -0500
Hmm, I think it won't work by itself though, just doing
(stream-flush (stream-range 1 1000000))
also crashes, due to the head of the stream being referenced from the C
stack somewhere (I can get the address from gdb, but I can't figure out
how to get to the corresponding C variable from there).
Did you try "info symbol ADDRESS"? (I'm not sure this will work for
automatic variables, though.)

You could also try "info locals" after "set print address on" and/or
"set print symbol on".
Noam Postavsky
2018-03-11 18:52:22 UTC
Permalink
Post by Eli Zaretskii
Post by Noam Postavsky
also crashes, due to the head of the stream being referenced from the C
stack somewhere (I can get the address from gdb, but I can't figure out
how to get to the corresponding C variable from there).
Did you try "info symbol ADDRESS"? (I'm not sure this will work for
automatic variables, though.)
Doesn't seem to work. I guess it wouldn't work if the address was in
the middle of an array either.
Post by Eli Zaretskii
You could also try "info locals" after "set print address on" and/or
"set print symbol on".
Those settings don't seem to help.

I first guessed that the problem is due to saving function arguments
during funcall, so I tried the following to check it:

--- i/src/bytecode.c
+++ w/src/bytecode.c
@@ -387,7 +387,10 @@ exec_byte_code (Lisp_Object bytestr, Lisp_Object vector, Lisp_Object maxdepth,
make_number (nargs)));
ptrdiff_t pushedargs = min (nonrest, nargs);
for (ptrdiff_t i = 0; i < pushedargs; i++, args++)
- PUSH (*args);
+ {
+ PUSH (*args);
+ *args = Qnil;
+ }
if (nonrest < nargs)
PUSH (Flist (nargs - nonrest, args));
else

This did change the backtrace (from starting with mark_specpdl to
mark_stack), meaning I did find one reference, but it still crashes, so
there must be more.
Eli Zaretskii
2018-03-11 20:31:09 UTC
Permalink
Date: Sun, 11 Mar 2018 14:52:22 -0400
This did change the backtrace (from starting with mark_specpdl to
mark_stack), meaning I did find one reference, but it still crashes, so
there must be more.
If you have the address, you could first find the stack frame to which
it belongs, right? Then go to that stack frame and type "info
locals", which should give you the locals in that frame. One of them
is your variable. It could also be a temporary slot, in which case
disassembly of that frame's function should show it.
Noam Postavsky
2018-03-11 21:51:19 UTC
Permalink
Post by Eli Zaretskii
Date: Sun, 11 Mar 2018 14:52:22 -0400
This did change the backtrace (from starting with mark_specpdl to
mark_stack), meaning I did find one reference, but it still crashes, so
there must be more.
If you have the address, you could first find the stack frame to which
it belongs, right?
Um, how do I do that part?
Eli Zaretskii
2018-03-12 03:28:19 UTC
Permalink
Date: Sun, 11 Mar 2018 17:51:19 -0400
Post by Eli Zaretskii
If you have the address, you could first find the stack frame to which
it belongs, right?
Um, how do I do that part?
By comparing the address with the value of $bp in each frame, I'd say.
Noam Postavsky
2018-03-13 01:59:57 UTC
Permalink
Post by Eli Zaretskii
Date: Sun, 11 Mar 2018 17:51:19 -0400
Post by Eli Zaretskii
If you have the address, you could first find the stack frame to which
it belongs, right?
Um, how do I do that part?
By comparing the address with the value of $bp in each frame, I'd say.
Hmm, I found a match, but it doesn't make any sense.

#4851 0x0000000000611d4f in mark_vectorlike (ptr=0x2e64c90) at ../../src/alloc.c:6227
#4852 0x0000000000612b42 in mark_object (arg=XIL(0x2e64c95)) at ../../src/alloc.c:6624
#4853 0x000000000060f3ce in mark_maybe_pointer (p=0x2e64c90) at ../../src/alloc.c:4936
#4854 0x000000000060f452 in mark_memory (start=0x7fffffffa520, end=0x7fffffffe868)
at ../../src/alloc.c:4985
#4855 0x000000000060f493 in mark_stack (bottom=0x7fffffffe868 "a\036h\364\377\177",
end=0x7fffffffa520 "0\245\377\377\377\177") at ../../src/alloc.c:5193

(gdb) frame 4854
#4854 0x000000000060f452 in mark_memory (start=0x7fffffffa520, end=0x7fffffffe868)
at ../../src/alloc.c:4985
4985 mark_maybe_pointer (*(void **) pp);
(gdb) p pp
$28 = 0x7fffffffa968 "\220L\346\002"

(gdb) frame 4864
#4864 0x000000000068d950 in exec_byte_code (bytestr=XIL(0x2e7aad4), vector=XIL(0x2e72715),
maxdepth=make_number(18), args_template=make_number(768), nargs=3, args=0x7fffffffad20)
at ../../src/bytecode.c:632
632 TOP = Ffuncall (op + 1, &TOP);
(gdb) p $rbp
$29 = (void *) 0x7fffffffabd0

(gdb) p/x $rbp - $28
$32 = 0x268

(gdb) disas /s
[...]
1180 CASE (Bbuffer_substring):
1181 {
1182 Lisp_Object v1 = POP;
0x000000000068fea4 <+13154>: mov -0x40(%rbp),%rax
0x000000000068fea8 <+13158>: lea -0x8(%rax),%rdx
0x000000000068feac <+13162>: mov %rdx,-0x40(%rbp)
0x000000000068feb0 <+13166>: mov (%rax),%rax
0x000000000068feb3 <+13169>: mov %rax,-0x268(%rbp)

1183 TOP = Fbuffer_substring (TOP, v1);
0x000000000068feba <+13176>: mov -0x268(%rbp),%rdx
0x000000000068fec1 <+13183>: mov -0x40(%rbp),%rax
0x000000000068fec5 <+13187>: mov %rdx,%rsi
0x000000000068fec8 <+13190>: mov (%rax),%rdi
0x000000000068fecb <+13193>: callq 0x627e0a <Fbuffer_substring>

It can't be a buffer-substring arg, but that's the only reference to
-0x268(%rbp) in that function.
Eli Zaretskii
2018-03-13 16:06:23 UTC
Permalink
Date: Mon, 12 Mar 2018 21:59:57 -0400
4985 mark_maybe_pointer (*(void **) pp);
(gdb) p pp
$28 = 0x7fffffffa968 "\220L\346\002"
Should you look at pp or at *pp?

Also note that for Lisp objects that are marked you need to reset
their mark bit before trying to determine their type and value.

If none of the above helps, please walk me through the steps that led
you to look at -0x268(%rbp), because I'm not sure I follow.
Noam Postavsky
2018-03-14 00:09:17 UTC
Permalink
Post by Eli Zaretskii
Should you look at pp or at *pp?
I think it should be pp, but I'm not sure. The context:

#4854 0x000000000060f452 in mark_memory (start=0x7fffffffa520, end=0x7fffffffe868)
at ../../src/alloc.c:4985
#4855 0x000000000060f493 in mark_stack (bottom=0x7fffffffe868 "a\036h\364\377\177",
end=0x7fffffffa520 "0\245\377\377\377\177") at ../../src/alloc.c:5193

mark_memory (void *start, void *end)
{
...
for (pp = start; (void *) pp < end; pp += GC_POINTER_ALIGNMENT)
{
mark_maybe_pointer (*(void **) pp);
mark_maybe_object (*(Lisp_Object *) pp);
}

So the value of pp ranges over stack addresses and *pp would be the
contents of the stack location.
Post by Eli Zaretskii
Also note that for Lisp objects that are marked you need to reset
their mark bit before trying to determine their type and value.
I think I'm looking for a C variable, and not a Lisp object (although
the C variable presumably contains/points to a Lisp object).
Post by Eli Zaretskii
If none of the above helps, please walk me through the steps that led
you to look at -0x268(%rbp), because I'm not sure I follow.
Starting with the value of pp, I then go up looking for a close value of
$rbp:

(gdb) p pp
$39 = 0x7fffffffa968 "\220L\346\002"

(gdb) up
#4855 0x000000000060f493 in mark_stack (bottom=0x7fffffffe868 "a\036h\364\377\177",
end=0x7fffffffa520 "0\245\377\377\377\177") at ../../src/alloc.c:5193
5193 mark_memory (bottom, end);
(gdb) p $rbp
$40 = (void *) 0x7fffffffa420
(gdb) up
#4856 0x00000000006cdd75 in mark_one_thread (thread=0xe103e0 <main_thread>) at ../../src/thread.c:616
616 mark_stack (thread->m_stack_bottom, stack_top);
(gdb) p $rbp
$41 = (void *) 0x7fffffffa470
[...]
(gdb) up
#4863 0x000000000063c2cb in Ffuncall (nargs=6, args=0x7fffffffa7f8) at ../../src/eval.c:2751
2751 maybe_gc ();
(gdb) p $rbp
$48 = (void *) 0x7fffffffa780
(gdb) up
#4864 0x000000000068d950 in exec_byte_code (bytestr=XIL(0x2e7aad4), vector=XIL(0x2e72715),
maxdepth=make_number(18), args_template=make_number(768), nargs=3, args=0x7fffffffad20)
at ../../src/bytecode.c:632
632 TOP = Ffuncall (op + 1, &TOP);
(gdb) p $rbp
$49 = (void *) 0x7fffffffabd0

Now I see that $rbp is higher than the target address, and the
difference is 0x268, so the target location should be -0x268(%rbp).

(gdb) p $rbp - 0x7fffffffa968
$52 = (void *) 0x268

Except something must be wrong in my reasoning, since the only
ocurrences of -0x268(%rbp) are the buffer-string args, which could only
hold integers or markers (neither of which could further point to long
chains of objects).
Eli Zaretskii
2018-03-15 16:34:16 UTC
Permalink
Date: Tue, 13 Mar 2018 20:09:17 -0400
Post by Eli Zaretskii
Should you look at pp or at *pp?
#4854 0x000000000060f452 in mark_memory (start=0x7fffffffa520, end=0x7fffffffe868)
at ../../src/alloc.c:4985
#4855 0x000000000060f493 in mark_stack (bottom=0x7fffffffe868 "a\036h\364\377\177",
end=0x7fffffffa520 "0\245\377\377\377\177") at ../../src/alloc.c:5193
mark_memory (void *start, void *end)
{
...
for (pp = start; (void *) pp < end; pp += GC_POINTER_ALIGNMENT)
{
mark_maybe_pointer (*(void **) pp);
mark_maybe_object (*(Lisp_Object *) pp);
}
So the value of pp ranges over stack addresses and *pp would be the
contents of the stack location.
But the call to mark_maybe_pointer means that we consider pp to be a
pointer (in)to a Lisp object.

Anyway, wouldn't it be easier to look one frame lower? We have this:

#4850 0x0000000000612b42 in mark_object (arg=XIL(0x2efcb83)) at ../../src/alloc.c:6624
#4851 0x0000000000611d4f in mark_vectorlike (ptr=0x2e64c90) at ../../src/alloc.c:6227
#4852 0x0000000000612b42 in mark_object (arg=XIL(0x2e64c95)) at ../../src/alloc.c:6624
#4853 0x000000000060f3ce in mark_maybe_pointer (p=0x2e64c90) at ../../src/alloc.c:4936
#4854 0x000000000060f452 in mark_memory (start=0x7fffffffa520, end=0x7fffffffe868)
at ../../src/alloc.c:4985
#4855 0x000000000060f493 in mark_stack (bottom=0x7fffffffe868 "a\036h\364\377\177",
end=0x7fffffffa520 "0\245\377\377\377\177") at ../../src/alloc.c:5193

In frame #4852, we have found an object, and we are marking it. Did
Post by Eli Zaretskii
Also note that for Lisp objects that are marked you need to reset
their mark bit before trying to determine their type and value.
Noam Postavsky
2018-03-17 15:53:36 UTC
Permalink
Post by Eli Zaretskii
In frame #4852, we have found an object, and we are marking it. Did
Post by Eli Zaretskii
Also note that for Lisp objects that are marked you need to reset
their mark bit before trying to determine their type and value.
Okay, I think xpr takes care of that, right? (I've restarted the debug
sessions a few times, so numbers may not match exactly)

#4853 0x000000000060f429 in mark_maybe_pointer (p=0x2e64c90) at ../../src/alloc.c:4936
4936 mark_object (obj);
(gdb) p obj
$60 = XIL(0x2e64c95)
(gdb) xpr
Lisp_Vectorlike
PVEC_NORMAL_VECTOR
$61 = (struct Lisp_Vector *) 0x2e64c90
{XIL(0x2efcb63), make_number(1000000), XIL(0x2efcb53), XIL(0x2efcb73), XIL(0x2efcb83),
XIL(0x20ab5b0), XIL(0xc090)}
(gdb) p $61->contents[0]
$62 = XIL(0x2efcb63)
(gdb) xpr
Lisp_Cons
$63 = (struct Lisp_Cons *) 0x2efcb60
{
[...]
car = make_number(2369),
[...]
chain = 0x0
[...]
(gdb) p $61->contents[1]
$64 = make_number(1000000)
(gdb) p $61->contents[2]
$65 = XIL(0x2efcb53)
(gdb) xpr
Lisp_Cons
[...] car = make_number(1), [...] chain = 0x0 [...]
(gdb) p $61->contents[3]
$67 = XIL(0x2efcb73)
(gdb) xpr
[...] car = XIL(0xc090), [...] chain = 0x0 [...]
(gdb) p $61->contents[4]
[...] car = XIL(0x2efc443), [...] chain = 0x0 [...]
(gdb) p $61->contents[5]
[...]
$72 = (struct Lisp_Symbol *) 0x2e97ef0
"stream-range"
(gdb) p $61->contents[6]
[...]
$74 = (struct Lisp_Symbol *) 0xdf89d0 <lispsym+49296>
"t"


It looks like a the lexical environment of a bytecode function, probably
the initial stream, e.g., (stream-range 1 1000000) gives:

(--stream-- #[256 "\211\203\303\242\207\303\242\204 \304\300\242\305\300\242\302\242\\\301\302\242#B\240\210\303\306\240\210\304\242\207"
[(1) 1000000 (1) (nil) (nil) stream-range t] 7 "
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(fn &optional CHECK)"])

Not sure where to go next with this.
Eli Zaretskii
2018-03-17 16:10:24 UTC
Permalink
Date: Sat, 17 Mar 2018 11:53:36 -0400
It looks like a the lexical environment of a bytecode function, probably
(--stream-- #[256 "\211\203\303\242\207\303\242\204 \304\300\242\305\300\242\302\242\\\301\302\242#B\240\210\303\306\240\210\304\242\207"
[(1) 1000000 (1) (nil) (nil) stream-range t] 7 "
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(fn &optional CHECK)"])
Not sure where to go next with this.
The goal was to find out which variable holds a reference to the
entire long stream, right? So it sounds like a pointer to it is kept
in an automatic variable on the stack of exec_byte_code, right? Which
kinda makes sense, since the stream is still being processed, I think.

Or am I confused?
Eli Zaretskii
2018-03-17 16:27:38 UTC
Permalink
Date: Sat, 17 Mar 2018 18:10:24 +0200
Post by Noam Postavsky
(--stream-- #[256 "\211\203\303\242\207\303\242\204 \304\300\242\305\300\242\302\242\\\301\302\242#B\240\210\303\306\240\210\304\242\207"
[(1) 1000000 (1) (nil) (nil) stream-range t] 7 "
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(fn &optional CHECK)"])
Not sure where to go next with this.
The goal was to find out which variable holds a reference to the
entire long stream, right? So it sounds like a pointer to it is kept
in an automatic variable on the stack of exec_byte_code, right? Which
kinda makes sense, since the stream is still being processed, I think.
Or am I confused?
Actually, there's still some mystery: if this object is a 7-element
vector, where do all the other GC frame come from? Hmm... how
long/deep is each of the cons cells in elements 1 through 4 of the
vector? If they are deeply nested, then that's the answer we've been
looking for, I think.
Noam Postavsky
2018-03-17 17:28:22 UTC
Permalink
Post by Eli Zaretskii
Post by Eli Zaretskii
The goal was to find out which variable holds a reference to the
entire long stream, right? So it sounds like a pointer to it is kept
in an automatic variable on the stack of exec_byte_code, right? Which
kinda makes sense, since the stream is still being processed, I think.
Yeah, but which variable exactly? I'd like to find it and add 'X =
Qnil;' to confirm we've found where it is.
Post by Eli Zaretskii
Actually, there's still some mystery: if this object is a 7-element
vector, where do all the other GC frame come from? Hmm... how
long/deep is each of the cons cells in elements 1 through 4 of the
vector? If they are deeply nested, then that's the answer we've been
looking for, I think.
It's a bit confusing because of the indirection: stream-range uses the
stream-cons macro, which uses the stream-make macro, which uses the
thunk-delay macro. I believe the end result is that the lexical
environment of the resulting closure has access to the next
stream-element in the chain, so the nesting depth is the length of the
stream (i.e., 100000 in the example). Perhaps this example makes it
clearer:

(setq print-circle t)

(let* ((s0 (stream-range 1 2))
(s1 (stream-rest s0)))
(list s0 s1))
;=>
((--stream--
#[256 "\211\203..."
[(1) 2 (1) (t)
((1 . #1=(--stream--
#[256 "\211\203..."
[(nil) (nil) nil t]
3 "\n\n(fn &optional CHECK)"])))
stream-range t]
7 "\n\n(fn &optional CHECK)"])
#1#)
Eli Zaretskii
2018-03-19 20:05:58 UTC
Permalink
Date: Sat, 17 Mar 2018 13:28:22 -0400
Post by Eli Zaretskii
The goal was to find out which variable holds a reference to the
entire long stream, right? So it sounds like a pointer to it is kept
in an automatic variable on the stack of exec_byte_code, right? Which
kinda makes sense, since the stream is still being processed, I think.
Yeah, but which variable exactly? I'd like to find it and add 'X =
Qnil;' to confirm we've found where it is.
I don't think you will be able to tell without stepping through the
byte-code interpreter code, keeping track of what it stores where.
Michael Heerdegen
2018-02-28 11:05:09 UTC
Permalink
Hello,

I had written that
Post by Eli Zaretskii
Post by Michael Heerdegen
(seq-doseq (_ (stream-range 1 1000000)) nil)
crashes. CC'ing Nicolas, the author of stream.el.
Post by Eli Zaretskii
What can we do instead in such cases? Stack-overflow protection
cannot work in GC, so you are shooting yourself in the foot by
creating such large recursive structures. By the time we get to GC,
where the problem will happen, it's too late, because the memory was
already allocated.
Does anyone has a reasonable idea for avoiding the crash in such
programs?
I would appreciate any effort to fix that, because it seems that
currently streams are broken by design, and there is no way to fix that
from the Lisp implementation.

Yes, we could implement iterators instead of streams - that's what we
get when we avoid the consing. But it's something different and not
always an alternative, depending on what you want to do.


Michael.
Nicolas Petton
2018-02-28 13:20:42 UTC
Permalink
Post by Michael Heerdegen
Hello,
Hi,
Post by Michael Heerdegen
I had written that
Post by Michael Heerdegen
(seq-doseq (_ (stream-range 1 1000000)) nil)
crashes. CC'ing Nicolas, the author of stream.el.
Thanks, I'll look into it.

Cheers,
Nico
Daniel Colascione
2018-03-01 10:44:54 UTC
Permalink
Post by Eli Zaretskii
Date: Tue, 27 Feb 2018 13:08:59 +0100
#+begin_src emacs-lisp
(seq-doseq (_ (stream-range 1 1000000)) nil)
#+end_src
Note that this is executed as a loop due how to streams are implemented,
although the definition of `seq-doseq' looks recursive. But it seems
that gc has a problem with the large number of conses created when
processing that.
What can we do instead in such cases? Stack-overflow protection
cannot work in GC, so you are shooting yourself in the foot by
creating such large recursive structures. By the time we get to GC,
where the problem will happen, it's too late, because the memory was
already allocated.
Does anyone has a reasonable idea for avoiding the crash in such
programs?
We need to fix GC being deeply recursive once and for all. Tweaking
stack sizes on various platforms and trying to spot-fix GC for the
occasional deeply recursive structure is annoying. Here's my proposal:

Turn garbage_collect_1 into a queue-draining loop, initializing the
object queue with the GC roots before draining it. We'll make
mark_object put an object on this queue, turning the existing
mark_object code into a mark_queued_object function.

garbage_collect_1 will just call mark_queued_object in a loop;
mark_queued_object can call mark_object, but since mark_object just
enqueues an object and doesn't recurse, we can't exhaust the stack with
deep object graphs. (We'll repurpose the mark bit to mean that the
object is on the to-mark queue; by the time we fully drain the queue,
just before we sweep, the mark bit will have the same meaning it does now.)

We can't allocate memory to hold the queue during GC, so we'll have to
pre-allocate it. We can implement the queue as a list of queue blocks,
where each queue block is an array of 16k or so Lisp_Objects. During
allocation, we'll just make sure we have one Lisp_Object queue-block
slot for every non-self-representing Lisp object we allocate.

Since we know that we'll have enough queue blocks for the worst GC case,
we can have mark_object pull queue blocks from a free list, aborting if
for some reason it ever runs out of queue blocks. (The previous
paragraph guarantees we won't.) garbage_collect_1 will churn through
these heap blocks and place each back on the free list after it's called
mark_queued_object on every Lisp_Object in the queue block.

In this way, in non-pathological cases of GC, we'll end up using the
same few queue blocks over and over. That's a nice optimization, because
we can MADV_DONTNEED unused queue blocks so the OS doesn't actually have
to remember their contents.

In this way, I think we can make the current GC model recursion-proof
without drastically changing how we allocate Lisp objects. The
additional memory requirements should be modest: it's basically one
Lisp_Object per Lisp object allocated.

The naive version of this scheme needs about 4.6MB of overhead on my
current 20MB Emacs heap, but it should be possible to reduce the
overhead significantly by taking advantage of the block allocation we do
for conses and other types --- we can put whole blocks on the queue
instead of pointers to individual block parts, so we can get away with a
much smaller queue. Under this approach, the reserved-queue-block scheme
would impose an overhead of somewhere around 1MB on the same heap. This
amount of overhead seems reasonable. We may end up actually using less
memory that we would for recursive mark_object stack invocation.
Noam Postavsky
2018-03-01 15:51:31 UTC
Permalink
We need to fix GC being deeply recursive once and for all. Tweaking stack
sizes on various platforms and trying to spot-fix GC for the occasional
deeply recursive structure is annoying.
Agreed, but could you please open a new thread for it? AFAICT, this
bug thread is about streams.el functions producing structures
proportional in size to the length of the entire stream, which is a
bug in itself, regardless of whether or not the GC can handle the
result.
Michael Heerdegen
2018-03-01 16:54:57 UTC
Permalink
Post by Noam Postavsky
Agreed, but could you please open a new thread for it? AFAICT, this
bug thread is about streams.el functions producing structures
proportional in size to the length of the entire stream, which is a
bug in itself, regardless of whether or not the GC can handle the
result.
No, it is a feature: we want streams to produce these structures,
streams are delayed lists, not just iterators. Lists are also
potentially unlimited nested conses, and that's not a bug. Streams are
the same realized with delayed conses.


Michael.
Noam Postavsky
2018-03-01 17:15:43 UTC
Permalink
On Thu, Mar 1, 2018 at 11:54 AM, Michael Heerdegen
Post by Michael Heerdegen
Post by Noam Postavsky
Agreed, but could you please open a new thread for it? AFAICT, this
bug thread is about streams.el functions producing structures
proportional in size to the length of the entire stream, which is a
bug in itself, regardless of whether or not the GC can handle the
result.
No, it is a feature: we want streams to produce these structures,
streams are delayed lists, not just iterators. Lists are also
potentially unlimited nested conses, and that's not a bug. Streams are
the same realized with delayed conses.
Maybe I've misunderstood, but is it not the case that iterating over
(stream-range 1 n) should require only a constant amount of memory,
regardless of the value of n?
Michael Heerdegen
2018-03-02 07:08:55 UTC
Permalink
Post by Noam Postavsky
Maybe I've misunderstood, but is it not the case that iterating over
(stream-range 1 n) should require only a constant amount of memory,
regardless of the value of n?
Depends on your definition of `require'. Like, for example,

(dolist (i 1000000) (message "%S" (cons i (1+ i))))

each iteration step creates and discards a new cons (or a constant
number of conses). Not a exceptional thing in Lisp. When iterating
over (stream-range 1 n), the garbage collector seems to have problems
with how the garbage is structured.


Michael.
Noam Postavsky
2018-03-02 13:01:16 UTC
Permalink
Post by Michael Heerdegen
Post by Noam Postavsky
Maybe I've misunderstood, but is it not the case that iterating over
(stream-range 1 n) should require only a constant amount of memory,
regardless of the value of n?
Depends on your definition of `require'. Like, for example,
(dolist (i 1000000) (message "%S" (cons i (1+ i))))
each iteration step creates and discards a new cons (or a constant
number of conses). Not a exceptional thing in Lisp. When iterating
over (stream-range 1 n), the garbage collector seems to have problems
with how the garbage is structured.
Ah, so let me be more precise. Iterating over (stream-range 1 n) should
require only a constant amount of *reachable* memory at any particular
instant. So your example above is okay, but the following one would be
not acceptable (I mean, it's fine if some random lisp code does that,
but stream-range should not be creating such long lists, or equivalently
large structures):

(let ((list nil))
(dotimes (i 1000000)
(push i list)
(message "%S" (car list))))
Michael Heerdegen
2018-03-02 13:13:59 UTC
Permalink
Post by Noam Postavsky
Ah, so let me be more precise. Iterating over (stream-range 1 n) should
require only a constant amount of *reachable* memory at any particular
instant. So your example above is okay, but the following one would be
not acceptable (I mean, it's fine if some random lisp code does that,
but stream-range should not be creating such long lists, or equivalently
(let ((list nil))
(dotimes (i 1000000)
(push i list)
(message "%S" (car list))))
Yes, absolutely agreed.

Using streams, you can logically refer to the complete list of elements
as one object, but the programmer must ensure that the referable list of
generated elements doesn't get too large.

Michael.
Michael Heerdegen
2018-03-02 13:04:53 UTC
Permalink
Post by Noam Postavsky
Maybe I've misunderstood, but is it not the case that iterating over
(stream-range 1 n) should require only a constant amount of memory,
regardless of the value of n?
But in this regard, we have a problem with how lexical-binding is
implemented for interpreted code. Nested thunks (as implemented in
"thunk.el") accumulate useless variable bindings - e.g.

(defun test ()
(thunk-force
(thunk-delay
(thunk-force
(thunk-delay
(thunk-force
(thunk-delay
(lambda () 1))))))))
(test)
==>
#1=(closure
((check)
(#:val . #1#)
(#:forced . t)
(check)
(#:val . #1#)
(#:forced . t)
(check)
(#:val . #1#)
(#:forced . t)
t)
nil 1)

The length of the variable list is equivalent to the number of thunk
wrappers. I believe that these useless variable lists are responsible
for the crashes of the uncompiled versions of the test files I had
posted. I think this problem is different from the gc issue.

Streams use nested thunks. Of course does thunk.el not explicitly add
such variable lists to the result - this is how closures are built in
interpreted code. For nested thunks these just add up.

BTW, if you byte-compile the above `test' function, then

(disassemble (test))
==>
byte code:
args: nil
0 constant 1
1 return

and this problem is gone.


Michael.
Eli Zaretskii
2018-02-27 18:00:39 UTC
Permalink
Date: Tue, 27 Feb 2018 12:39:47 +0100
Post by Eli Zaretskii
I guess it's a stack overflow: that function recurses into
subdirectories.
To avoid such problems, the function should be rewritten to work by BFS, not DFS.
Here is the backtrace I'm now able to produce. Looks like the crash
GC is deeply-recursive, and you have exacerbated that by using up a
lot of stack.
Loading...