* RE: Bug#953421: dash: Resident Set Size growth is unbound (memory leak) on an infinite shell loop [not found] ` <CACujMDPfs5mJs8CVaxqM6wkCRANYQ71wTUkvHiNvOg+MPSTECQ@mail.gmail.com> @ 2020-03-29 17:54 ` Vitaly Zuevsky 2020-03-29 19:06 ` Harald van Dijk 0 siblings, 1 reply; 7+ messages in thread From: Vitaly Zuevsky @ 2020-03-29 17:54 UTC (permalink / raw) To: 'Andrej Shadura', 953421, dash Cc: 'Debian Bug Tracking System' I have now fixed this bug locally. The leak is in jobtab array (jobs.c). I concluded that the most logical approach would be eliminating inconsistency between makejob() and dowait() functions. My fix in a forked repo: https://salsa.debian.org/psvz-guest/dash/-/commit/5e3ea90cb3355d1308c482661a471883d36af5e7 I suspect that imbalance between allocating and freeing jobtab slots was introduced at evolving SIGCHLD handling, because I can see two waitforjob() calls, wherein the first call could wait for both done jobs and gotsigchld is set. At the second call, gotsigchld is cleared and waitforjob() does nothing. So 1 of 2 jobtab slots goes astray. Simplified bug repro steps: Script ------- while true do true & sleep .1 done Execution ------------- # dash Script & You can now observe resident set size (RSS) is bumped every several seconds: # ps aux |grep 'RSS\|dash' What do I do to have it peer-reviewed and, potentially, merged into mainline? Cheers, Vitaly ----------fixDetails: Function makejob() calls freejob() when appropriate. Specifically, it detects a free slot in jobtab array by this condition: (jp->state != JOBDONE || !jp->waited) The problem is that dowait() never sets waited flag together with JOBDONE state. Consequently, jobtab may leak: repro steps are provided in a bug report #953421. One solution could be not relying on waited flag at all, i.e. converting the makejob()'s condition into (jp->state != JOBDONE). As an alternative, I am setting waited flag in dowait(). ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Bug#953421: dash: Resident Set Size growth is unbound (memory leak) on an infinite shell loop 2020-03-29 17:54 ` Bug#953421: dash: Resident Set Size growth is unbound (memory leak) on an infinite shell loop Vitaly Zuevsky @ 2020-03-29 19:06 ` Harald van Dijk 2020-03-29 22:07 ` Jilles Tjoelker ` (2 more replies) 0 siblings, 3 replies; 7+ messages in thread From: Harald van Dijk @ 2020-03-29 19:06 UTC (permalink / raw) To: Vitaly Zuevsky, 'Andrej Shadura', 953421, dash Cc: 'Debian Bug Tracking System' Hi Vitaly, On 29/03/2020 18:54, Vitaly Zuevsky wrote: > I have now fixed this bug locally. > > The leak is in jobtab array (jobs.c). I concluded that the most logical approach would be eliminating inconsistency between makejob() and dowait() functions. My fix in a forked repo: > > https://salsa.debian.org/psvz-guest/dash/-/commit/5e3ea90cb3355d1308c482661a471883d36af5e7 This change is incorrect. The reason dash keeps on allocating memory is because dash needs to keep on allocating memory. Consider this script: set -- $(seq 1 100) for i do : & sleep .1 done for i do wait %$i done This is a valid script and works fine in dash. Your change breaks this by not keeping the jobs around long enough, and I hope this test script shows that there is no way to keep the jobs around long enough but by allocating ever more memory. Your change makes it impossible to keep track of the background process's status, but if you do not care about that anyway, you can avoid the increasing memory use without modifying dash by launching a background process without including it in the current shell's job table, by launching it from a subshell: while true do (true &) sleep .1 done Cheers, Harald van Dijk ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Bug#953421: dash: Resident Set Size growth is unbound (memory leak) on an infinite shell loop 2020-03-29 19:06 ` Harald van Dijk @ 2020-03-29 22:07 ` Jilles Tjoelker 2020-03-29 23:07 ` Harald van Dijk 2020-03-31 19:07 ` Vitaly Zuevsky 2020-04-02 13:18 ` Vitaly Zuevsky 2 siblings, 1 reply; 7+ messages in thread From: Jilles Tjoelker @ 2020-03-29 22:07 UTC (permalink / raw) To: Harald van Dijk Cc: Vitaly Zuevsky, 'Andrej Shadura', 953421, dash, 'Debian Bug Tracking System' On Sun, Mar 29, 2020 at 08:06:31PM +0100, Harald van Dijk wrote: > On 29/03/2020 18:54, Vitaly Zuevsky wrote: > > I have now fixed this bug locally. > > The leak is in jobtab array (jobs.c). I concluded that the most > > logical approach would be eliminating inconsistency between > > makejob() and dowait() functions. My fix in a forked repo: > > https://salsa.debian.org/psvz-guest/dash/-/commit/5e3ea90cb3355d1308c482661a471883d36af5e7 > This change is incorrect. The reason dash keeps on allocating memory is > because dash needs to keep on allocating memory. Consider this script: > set -- $(seq 1 100) > for i > do > : & > sleep .1 > done > for i > do > wait %$i > done > This is a valid script and works fine in dash. Your change breaks this by > not keeping the jobs around long enough, and I hope this test script shows > that there is no way to keep the jobs around long enough but by allocating > ever more memory. I agree that the change is incorrect, but I do not agree that this kind of script must leak memory. Per POSIX.1-2008 XCU 2.9.3.1 Asynchronous Lists, an implementation has additional ways to forget about jobs than just an appropriate completion of the wait utility: if another asynchronous job is started when $! was not referenced or if the number of known process IDs would exceed {CHILD_MAX} (which tends to be rather big, though). POSIX does not seem to expect using %<jobid> in scripts like this; it seems highly fragile to me anyway (although $! has problems with process ID reuse). FreeBSD sh implements forgetting when $! was not referenced (and the job has terminated), but not the {CHILD_MAX} limit. This avoids the increasing memory usage in the example script. > Your change makes it impossible to keep track of the background process's > status, but if you do not care about that anyway, you can avoid the > increasing memory use without modifying dash by launching a background > process without including it in the current shell's job table, by launching > it from a subshell: > while true > do > (true &) > sleep .1 > done Certainly a good idea. Another option may be to include regular invocations of the wait utility without parameters, although this is not suitable for all scripts. -- Jilles Tjoelker ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Bug#953421: dash: Resident Set Size growth is unbound (memory leak) on an infinite shell loop 2020-03-29 22:07 ` Jilles Tjoelker @ 2020-03-29 23:07 ` Harald van Dijk 0 siblings, 0 replies; 7+ messages in thread From: Harald van Dijk @ 2020-03-29 23:07 UTC (permalink / raw) To: Jilles Tjoelker Cc: Vitaly Zuevsky, 'Andrej Shadura', 953421, dash, 'Debian Bug Tracking System' On 29/03/2020 23:07, Jilles Tjoelker wrote: > On Sun, Mar 29, 2020 at 08:06:31PM +0100, Harald van Dijk wrote: >> On 29/03/2020 18:54, Vitaly Zuevsky wrote: >>> I have now fixed this bug locally. > >>> The leak is in jobtab array (jobs.c). I concluded that the most >>> logical approach would be eliminating inconsistency between >>> makejob() and dowait() functions. My fix in a forked repo: > >>> https://salsa.debian.org/psvz-guest/dash/-/commit/5e3ea90cb3355d1308c482661a471883d36af5e7 > >> This change is incorrect. The reason dash keeps on allocating memory is >> because dash needs to keep on allocating memory. Consider this script: > >> set -- $(seq 1 100) >> for i >> do >> : & >> sleep .1 >> done >> for i >> do >> wait %$i >> done > >> This is a valid script and works fine in dash. Your change breaks this by >> not keeping the jobs around long enough, and I hope this test script shows >> that there is no way to keep the jobs around long enough but by allocating >> ever more memory. > > I agree that the change is incorrect, but I do not agree that this kind > of script must leak memory. Per POSIX.1-2008 XCU 2.9.3.1 Asynchronous > Lists, an implementation has additional ways to forget about jobs than > just an appropriate completion of the wait utility: if another > asynchronous job is started when $! was not referenced or if the number > of known process IDs would exceed {CHILD_MAX} (which tends to be rather > big, though). That says the process ID need not be remembered, which makes sense: process IDs are a very limited resource on some systems. This reasoning does not apply to job IDs and I do not see any corresponding wording for those. Am I overlooking something? Regardless, it does not leak memory. It continually uses more memory, but it is not a leak if the memory is still in use and can/will be freed when it is no longer needed. Memory used for jobs is released when the job is removed from the job table. > POSIX does not seem to expect using %<jobid> in scripts like this; it > seems highly fragile to me anyway (although $! has problems with process > ID reuse). In what way does it not seem to expect this? POSIX says "The job control job ID type of pid is only available on systems supporting the User Portability Utilities option" but does not appear to have any other relevant restrictions. As for fragile, sure, but if a script is correct, it should be processed correctly by a shell. > FreeBSD sh implements forgetting when $! was not referenced (and the job > has terminated), but not the {CHILD_MAX} limit. This avoids the > increasing memory usage in the example script. See above for the note on process ID vs job ID. >> Your change makes it impossible to keep track of the background process's >> status, but if you do not care about that anyway, you can avoid the >> increasing memory use without modifying dash by launching a background >> process without including it in the current shell's job table, by launching >> it from a subshell: > >> while true >> do >> (true &) >> sleep .1 >> done > > Certainly a good idea. > > Another option may be to include regular invocations of the wait utility > without parameters, although this is not suitable for all scripts. A third option is invoking the 'jobs' command, possibly redirecting the output to /dev/null. When the 'jobs' command reports that any background job is done, that job is removed from the list of jobs the shell has to remembers. Cheers, Harald van Dijk ^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: Bug#953421: dash: Resident Set Size growth is unbound (memory leak) on an infinite shell loop 2020-03-29 19:06 ` Harald van Dijk 2020-03-29 22:07 ` Jilles Tjoelker @ 2020-03-31 19:07 ` Vitaly Zuevsky 2020-03-31 21:04 ` Harald van Dijk 2020-04-02 13:18 ` Vitaly Zuevsky 2 siblings, 1 reply; 7+ messages in thread From: Vitaly Zuevsky @ 2020-03-31 19:07 UTC (permalink / raw) To: 'Harald van Dijk', 'Andrej Shadura', 953421, dash Cc: 'Debian Bug Tracking System' Hi Harald, > set -- $(seq 1 100) > for i > do > : & > sleep .1 > done > for i > do > wait %$i > done > >This is a valid script and works fine in dash. Your change breaks this by not keeping the jobs around long enough, and I hope this test script shows that there is no way to keep the jobs around long enough but by allocating ever more memory. I must have confused two concepts: waited process in OS -vs- waited job inside shell interpreter. I am trying to see how it work in practice: # true & false & # [2] + Done(1) false [1] + Done true # wait 2 # echo $? 127 As we preserve job exit codes, I would expect wait command to read them and free associated jobtab slots. In above example I expect to see 1 in place of 127. What do I miss here? Many thanks for your help. Best, Vitaly ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Bug#953421: dash: Resident Set Size growth is unbound (memory leak) on an infinite shell loop 2020-03-31 19:07 ` Vitaly Zuevsky @ 2020-03-31 21:04 ` Harald van Dijk 0 siblings, 0 replies; 7+ messages in thread From: Harald van Dijk @ 2020-03-31 21:04 UTC (permalink / raw) To: Vitaly Zuevsky, 'Andrej Shadura', 953421, dash Cc: 'Debian Bug Tracking System' Hi Vitaly, On 31/03/2020 20:07, Vitaly Zuevsky wrote: > I must have confused two concepts: waited process in OS -vs- waited job inside shell interpreter. I am trying to see how it work in practice: > > # true & false & > # > [2] + Done(1) false > [1] + Done true > # wait 2 > # echo $? > 127 > > As we preserve job exit codes, I would expect wait command to read them and free associated jobtab slots. In above example I expect to see 1 in place of 127. What do I miss here? There are two problems here. The first is that waiting for jobs involves the % symbol. wait 2 always means "wait for process 2", never "wait for job 2". To wait for job 2, the syntax is "wait %2". The second problem is that as soon as you see that "[2] + Done(1)", the job has been removed the job table already, so "wait %2" no longer works after that. The blank line (except for the #) indicates that you pressed Return at that point. In interactive shells, jobs are checked for completion and removed from the job table automatically when prompting for input. Since it was practically impossible to press Return before the "true" and "false" commands had finished, when the next prompt would have been shown, the jobs were cleaned up. If you had typed "wait %2" before pressing Return, it would have worked and the subsequent "echo $?" would have printed "1". > Many thanks for your help. > > Best, > Vitaly ^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: Bug#953421: dash: Resident Set Size growth is unbound (memory leak) on an infinite shell loop 2020-03-29 19:06 ` Harald van Dijk 2020-03-29 22:07 ` Jilles Tjoelker 2020-03-31 19:07 ` Vitaly Zuevsky @ 2020-04-02 13:18 ` Vitaly Zuevsky 2 siblings, 0 replies; 7+ messages in thread From: Vitaly Zuevsky @ 2020-04-02 13:18 UTC (permalink / raw) To: 'Harald van Dijk', 'Andrej Shadura', 953421, dash Cc: 'Debian Bug Tracking System' Hi Harald, Thanks for comprehensive account of the job flow - all worked as expected now. Interestingly, I originally assumed it was a bug due to observed discrepancy with bash... On 29/03/2020 23:07, Jilles Tjoelker wrote: > I agree that the change is incorrect, but I do not agree that this kind > of script must leak memory. Per POSIX.1-2008 XCU 2.9.3.1 Asynchronous > Lists, an implementation has additional ways to forget about jobs than > just an appropriate completion of the wait utility: if another > asynchronous job is started when $! was not referenced or if the number > of known process IDs would exceed {CHILD_MAX} (which tends to be rather > big, though). I can see in Bash man pages: CHILD_MAX Set the number of exited child status values for the shell to remember. Bash will not allow this value to be decreased below a POSIX-mandated minimum, and there is a maximum value (currently 8192) that this may not exceed. The minimum value is system-dependent. Running above scripts under bash shell manifests growing RSS just initially, and with sleep .1 that growth will end after some 800 seconds as one could deduce from the man pages. That is, RSS growth is bound. I understand dash philosophy could be focussed on the most minimalistic solution possible. Such solution would inherently be less forgiving in terms of mistakes. On the other hand, the mistake in question can be difficult to detect and leads all the way to OOM, negatively affecting user experience. It is up to you and community where that balance should rest. And if you decide the change would be worthy of doing - I will be more than happy to contribute. Cheers, Vitaly ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2020-04-02 13:18 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <158376996556.31988.8584094104007124674.reportbug@ec2-34-240-101-198.eu-west-1.compute.amazonaws.com> [not found] ` <CACujMDPfs5mJs8CVaxqM6wkCRANYQ71wTUkvHiNvOg+MPSTECQ@mail.gmail.com> 2020-03-29 17:54 ` Bug#953421: dash: Resident Set Size growth is unbound (memory leak) on an infinite shell loop Vitaly Zuevsky 2020-03-29 19:06 ` Harald van Dijk 2020-03-29 22:07 ` Jilles Tjoelker 2020-03-29 23:07 ` Harald van Dijk 2020-03-31 19:07 ` Vitaly Zuevsky 2020-03-31 21:04 ` Harald van Dijk 2020-04-02 13:18 ` Vitaly Zuevsky
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).