All of lore.kernel.org
 help / color / mirror / Atom feed
* [LTP] [RFC] Shell API timeout sleep orphan processes
@ 2021-04-30 11:08 Joerg Vehlow
  2021-05-04  6:52 ` Petr Vorel
  0 siblings, 1 reply; 8+ messages in thread
From: Joerg Vehlow @ 2021-04-30 11:08 UTC (permalink / raw)
  To: ltp

Hi,

I am looking into getting rid of our custom patches for ltp.
One of these patches fixes the problem, that the timeout sleep process 
is orphaned, if the test does not timeout.

The kill code is not working as expected, because it only kills the 
shell process spawned by "sleep $sec && _tst_kill_test &".
We are running single ltp tests using robot framework and robot waits 
until all processes of session have finished.

This can also be seen by piping the output of a testrun into cat (eg. 
with timeout02.sh from newlib_test/shell):
$ time sh -c './timeout02.sh >/dev/null | cat'
timeout02 1 TINFO: timeout per run is 0h 0m 2s
timeout02 1 TPASS: timeout 2 set (LTP_TIMEOUT_MUL='1')

[snip]

real??? 0m2,011s


The test does nothing, and completes in < 100ms. This can be seen 
without piping through cat:

time sh -c 'PATH="$PWD:$PWD/../../../testcases/lib/:$PATH" ./timeout02.sh'
timeout02 1 TINFO: timeout per run is 0h 0m 2s
timeout02 1 TPASS: timeout 2 set (LTP_TIMEOUT_MUL='1')

[snip]

real??? 0m0,010s


I am not sure what the best approach for fixing these sleep orphans is. 
Out patch uses "set -m" around the start of the timer, this makes most 
of the shells create a new process group, but it failed (at least did 
not work) in zsh. The killing of the timeout process is then changed to 
kill the process group (kill -- -$_tst_setup_timer_pid).
This works fine@least for some shells.

The only way to fix this really portable I can think of is moving the 
timeout code (including the logic in _tst_kill_test) into c code. This 
way there would only be one binary, that can be killed flawlessly.

Do you have any other idea or do you think this "bug" is not relevant 
enough to be fixed?

Thanks,
Joerg

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [LTP] [RFC] Shell API timeout sleep orphan processes
  2021-04-30 11:08 [LTP] [RFC] Shell API timeout sleep orphan processes Joerg Vehlow
@ 2021-05-04  6:52 ` Petr Vorel
  2021-05-04  8:04   ` Joerg Vehlow
  0 siblings, 1 reply; 8+ messages in thread
From: Petr Vorel @ 2021-05-04  6:52 UTC (permalink / raw)
  To: ltp

Hi Joerg,

[ Cc: Cyril and Li ]

> I am looking into getting rid of our custom patches for ltp.
> One of these patches fixes the problem, that the timeout sleep process is
> orphaned, if the test does not timeout.

> The kill code is not working as expected, because it only kills the shell
> process spawned by "sleep $sec && _tst_kill_test &".
> We are running single ltp tests using robot framework and robot waits until
> all processes of session have finished.

Interesting. Do you mean $_tst_setup_timer_pid from _tst_setup_timer was left
running if the test does not timeout? Because I was not able to find it.

> This can also be seen by piping the output of a testrun into cat (eg. with
> timeout02.sh from newlib_test/shell):
> $ time sh -c './timeout02.sh >/dev/null | cat'
> timeout02 1 TINFO: timeout per run is 0h 0m 2s
> timeout02 1 TPASS: timeout 2 set (LTP_TIMEOUT_MUL='1')

> [snip]

> real??? 0m2,011s


> The test does nothing, and completes in < 100ms. This can be seen without
> piping through cat:

> time sh -c 'PATH="$PWD:$PWD/../../../testcases/lib/:$PATH" ./timeout02.sh'
> timeout02 1 TINFO: timeout per run is 0h 0m 2s
> timeout02 1 TPASS: timeout 2 set (LTP_TIMEOUT_MUL='1')

> [snip]

> real??? 0m0,010s

Interesting slowdown. It looks to me it's exit $ret in final _tst_do_exit()
takes so much time. I have no idea why, but it was here before 25ad54dba
("tst_test.sh: Run cleanup also after test timeout").



> I am not sure what the best approach for fixing these sleep orphans is. Out
> patch uses "set -m" around the start of the timer, this makes most of the
> shells create a new process group, but it failed (at least did not work) in
> zsh. The killing of the timeout process is then changed to kill the process
> group (kill -- -$_tst_setup_timer_pid).
> This works fine at least for some shells.
Please do send the patch. "set -m" is supported by dash and busbox sh, IMHO it's
safe to use it.

> The only way to fix this really portable I can think of is moving the
> timeout code (including the logic in _tst_kill_test) into c code. This way
> there would only be one binary, that can be killed flawlessly.
Maybe set -m would be enough. But sure, rewriting C is usually the best approach
for shell problems, we use quite a lot of C helpers for shell already.

> Do you have any other idea or do you think this "bug" is not relevant enough
> to be fixed?

Kind regards,
Petr

> Thanks,
> Joerg

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [LTP] [RFC] Shell API timeout sleep orphan processes
  2021-05-04  6:52 ` Petr Vorel
@ 2021-05-04  8:04   ` Joerg Vehlow
  2021-05-04  8:47     ` Petr Vorel
  0 siblings, 1 reply; 8+ messages in thread
From: Joerg Vehlow @ 2021-05-04  8:04 UTC (permalink / raw)
  To: ltp

Hi Petr,
>> The kill code is not working as expected, because it only kills the shell
>> process spawned by "sleep $sec && _tst_kill_test &".
>> We are running single ltp tests using robot framework and robot waits until
>> all processes of session have finished.
> Interesting. Do you mean $_tst_setup_timer_pid from _tst_setup_timer was left
> running if the test does not timeout? Because I was not able to find it.
Ups there was a bug in my command. Redirection of the output of the test 
to /dev/null does not trigger the long delay:
Please try with time sh -c './timeout02.sh | cat'
Sorry for that...

The line "sleep $sec && _tst_kill_test &" spawns two processes:
sleep and a shell process, that is (probably) forked from the running 
shell. The pid returned by $! is the pid of this shell.
When killing the timeout process, only this shell process, but not the 
sleep is killed. That is also were the slowdown comes from.

However, this might be shell implementation specific. At least for 
busybox sh and I think dash and bash the behavior is the same.

> Interesting slowdown. It looks to me it's exit $ret in final _tst_do_exit()
> takes so much time. I have no idea why, but it was here before 25ad54dba
> ("tst_test.sh: Run cleanup also after test timeout").
I think what actually is consuming the time is the sleep process, that 
has stdout still opened.
Redirecting the output of sleep to /dev/null, fixes the hanging, but 
there is still the orphaned sleep process lingering around.
Try "sleep $sec >/dev/null && _tst_kill_test &"

$ ps; time sh -c 'PATH="$PWD:$PWD/../../../testcases/lib/:$PATH" 
./timeout02.sh | cat' ; ps
 ??? PID TTY????????? TIME CMD
 ?? 2352 pts/5??? 00:00:00 bash
 ? 19981 pts/5??? 00:00:00 ps
timeout02 1 TINFO: timeout per run is 0h 0m 2s
timeout02 1 TPASS: timeout 2 set (LTP_TIMEOUT_MUL='1')

Summary:
passed?? 1
failed?? 0
broken?? 0
skipped? 0
warnings 0

real??? 0m0,013s
user??? 0m0,012s
sys??? 0m0,005s
 ??? PID TTY????????? TIME CMD
 ?? 2352 pts/5??? 00:00:00 bash
 ? 19998 pts/5??? 00:00:00 sleep
 ? 20001 pts/5??? 00:00:00 ps

>> The only way to fix this really portable I can think of is moving the
>> timeout code (including the logic in _tst_kill_test) into c code. This way
>> there would only be one binary, that can be killed flawlessly.
> Maybe set -m would be enough. But sure, rewriting C is usually the best approach
> for shell problems, we use quite a lot of C helpers for shell already.
I will send the patch, if this introduces any new issues, we can still 
switch to a c based implementation.

J?rg

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [LTP] [RFC] Shell API timeout sleep orphan processes
  2021-05-04  8:04   ` Joerg Vehlow
@ 2021-05-04  8:47     ` Petr Vorel
  2021-05-04 10:35       ` Joerg Vehlow
  0 siblings, 1 reply; 8+ messages in thread
From: Petr Vorel @ 2021-05-04  8:47 UTC (permalink / raw)
  To: ltp

> Hi Petr,
> > > The kill code is not working as expected, because it only kills the shell
> > > process spawned by "sleep $sec && _tst_kill_test &".
> > > We are running single ltp tests using robot framework and robot waits until
> > > all processes of session have finished.
> > Interesting. Do you mean $_tst_setup_timer_pid from _tst_setup_timer was left
> > running if the test does not timeout? Because I was not able to find it.
> Ups there was a bug in my command. Redirection of the output of the test to
> /dev/null does not trigger the long delay:
> Please try with time sh -c './timeout02.sh | cat'
> Sorry for that...

> The line "sleep $sec && _tst_kill_test &" spawns two processes:
> sleep and a shell process, that is (probably) forked from the running shell.
> The pid returned by $! is the pid of this shell.
> When killing the timeout process, only this shell process, but not the sleep
> is killed. That is also were the slowdown comes from.

> However, this might be shell implementation specific. At least for busybox
> sh and I think dash and bash the behavior is the same.

> > Interesting slowdown. It looks to me it's exit $ret in final _tst_do_exit()
> > takes so much time. I have no idea why, but it was here before 25ad54dba
> > ("tst_test.sh: Run cleanup also after test timeout").
> I think what actually is consuming the time is the sleep process, that has
> stdout still opened.
> Redirecting the output of sleep to /dev/null, fixes the hanging, but there
> is still the orphaned sleep process lingering around.
> Try "sleep $sec >/dev/null && _tst_kill_test &"
Indeed, redirection helps. Interesting.

> $ ps; time sh -c 'PATH="$PWD:$PWD/../../../testcases/lib/:$PATH"
> ./timeout02.sh | cat' ; ps
> ??? PID TTY????????? TIME CMD
> ?? 2352 pts/5??? 00:00:00 bash
> ? 19981 pts/5??? 00:00:00 ps
> timeout02 1 TINFO: timeout per run is 0h 0m 2s
> timeout02 1 TPASS: timeout 2 set (LTP_TIMEOUT_MUL='1')

> Summary:
> passed?? 1
> failed?? 0
> broken?? 0
> skipped? 0
> warnings 0

> real??? 0m0,013s
> user??? 0m0,012s
> sys??? 0m0,005s
> ??? PID TTY????????? TIME CMD
> ?? 2352 pts/5??? 00:00:00 bash
> ? 19998 pts/5??? 00:00:00 sleep
> ? 20001 pts/5??? 00:00:00 ps
Yep, you're right :(. Thanks a lot for your analysis!

> > > The only way to fix this really portable I can think of is moving the
> > > timeout code (including the logic in _tst_kill_test) into c code. This way
> > > there would only be one binary, that can be killed flawlessly.
> > Maybe set -m would be enough. But sure, rewriting C is usually the best approach
> > for shell problems, we use quite a lot of C helpers for shell already.
> I will send the patch, if this introduces any new issues, we can still
> switch to a c based implementation.
Thank you!

Kind regards,
Petr

> J?rg

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [LTP] [RFC] Shell API timeout sleep orphan processes
  2021-05-04  8:47     ` Petr Vorel
@ 2021-05-04 10:35       ` Joerg Vehlow
  2021-05-04 15:07         ` Petr Vorel
  2021-05-06 12:23         ` Li Wang
  0 siblings, 2 replies; 8+ messages in thread
From: Joerg Vehlow @ 2021-05-04 10:35 UTC (permalink / raw)
  To: ltp

Hi Petr,

>>>> The only way to fix this really portable I can think of is moving the
>>>> timeout code (including the logic in _tst_kill_test) into c code. This way
>>>> there would only be one binary, that can be killed flawlessly.
>>> Maybe set -m would be enough. But sure, rewriting C is usually the best approach
>>> for shell problems, we use quite a lot of C helpers for shell already.
>> I will send the patch, if this introduces any new issues, we can still
>> switch to a c based implementation.
> Thank you!

I guess I will not submit the fix with set -m...
My colleague working testing on a new target just ran into a problem 
with out fix using set -m.
For busybox sh, it only works, if a tty is allocated, otherwise it 
refuses to enable job monitor mode and the whole construct fails...

Since using the monitor mode only exploits the side effect, that it 
creates a process group and we already found a first issue, I think the 
only solution is switching to a c-based solution for the timeout stuff.
Personally I like that it is written in shell script, but I see no other 
option to do it in pure shell. There is no builtin way, to create a 
process group and just redirecting the output of sleep to null does not 
seem to be a very good solution to me, because there would still be a 
lot of orphans around.

J?rg

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [LTP] [RFC] Shell API timeout sleep orphan processes
  2021-05-04 10:35       ` Joerg Vehlow
@ 2021-05-04 15:07         ` Petr Vorel
  2021-05-06 12:23         ` Li Wang
  1 sibling, 0 replies; 8+ messages in thread
From: Petr Vorel @ 2021-05-04 15:07 UTC (permalink / raw)
  To: ltp

Hi Joerg,

> Hi Petr,

> > > > > The only way to fix this really portable I can think of is moving the
> > > > > timeout code (including the logic in _tst_kill_test) into c code. This way
> > > > > there would only be one binary, that can be killed flawlessly.
> > > > Maybe set -m would be enough. But sure, rewriting C is usually the best approach
> > > > for shell problems, we use quite a lot of C helpers for shell already.
> > > I will send the patch, if this introduces any new issues, we can still
> > > switch to a c based implementation.
> > Thank you!

> I guess I will not submit the fix with set -m...
> My colleague working testing on a new target just ran into a problem with
> out fix using set -m.
> For busybox sh, it only works, if a tty is allocated, otherwise it refuses
> to enable job monitor mode and the whole construct fails...
OK, C will be needed (unless anybody has an idea how to safely implement it in
shell).

> Since using the monitor mode only exploits the side effect, that it creates
> a process group and we already found a first issue, I think the only
> solution is switching to a c-based solution for the timeout stuff.
> Personally I like that it is written in shell script, but I see no other
> option to do it in pure shell. There is no builtin way, to create a process
> group and just redirecting the output of sleep to null does not seem to be a
> very good solution to me, because there would still be a lot of orphans
> around.

There will be git freeze in very soon, so I guess the fix won't make it to the
release (unless you have time to work on it next week).

Kind regards,
Petr

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [LTP] [RFC] Shell API timeout sleep orphan processes
  2021-05-04 10:35       ` Joerg Vehlow
  2021-05-04 15:07         ` Petr Vorel
@ 2021-05-06 12:23         ` Li Wang
  2021-05-07  4:48           ` Joerg Vehlow
  1 sibling, 1 reply; 8+ messages in thread
From: Li Wang @ 2021-05-06 12:23 UTC (permalink / raw)
  To: ltp

Hi Joerg, Petr,

On Tue, May 4, 2021 at 6:34 PM Joerg Vehlow <lkml@jv-coder.de> wrote:
>
> Hi Petr,
>
> >>>> The only way to fix this really portable I can think of is moving the
> >>>> timeout code (including the logic in _tst_kill_test) into c code. This way
> >>>> there would only be one binary, that can be killed flawlessly.
> >>> Maybe set -m would be enough. But sure, rewriting C is usually the best approach
> >>> for shell problems, we use quite a lot of C helpers for shell already.
> >> I will send the patch, if this introduces any new issues, we can still
> >> switch to a c based implementation.
> > Thank you!
>
> I guess I will not submit the fix with set -m...
> My colleague working testing on a new target just ran into a problem
> with out fix using set -m.
> For busybox sh, it only works, if a tty is allocated, otherwise it
> refuses to enable job monitor mode and the whole construct fails...
>
> Since using the monitor mode only exploits the side effect, that it
> creates a process group and we already found a first issue, I think the
> only solution is switching to a c-based solution for the timeout stuff.
> Personally I like that it is written in shell script, but I see no other
> option to do it in pure shell. There is no builtin way, to create a
> process group and just redirecting the output of sleep to null does not
> seem to be a very good solution to me, because there would still be a
> lot of orphans around.

(Sorry for the late reply, I was just back from the Holidays)

Before we decide to rewrite in C, can we think about this below method?

--- a/testcases/lib/tst_test.sh
+++ b/testcases/lib/tst_test.sh
@@ -26,7 +26,7 @@ trap "tst_brk TBROK 'test interrupted or timed out'" INT
 _tst_cleanup_timer()
 {
        if [ -n "$_tst_setup_timer_pid" ]; then
-               kill $_tst_setup_timer_pid 2>/dev/null
+               kill -TERM $_tst_setup_timer_pid 2>/dev/null
                wait $_tst_setup_timer_pid 2>/dev/null
        fi
 }
@@ -486,7 +486,7 @@ _tst_setup_timer()
        tst_res TINFO "timeout per run is ${h}h ${m}m ${s}s"

        _tst_cleanup_timer
-       sleep $sec && _tst_kill_test &
+       (trap 'kill $!; exit' TERM; sleep $sec & wait $! && _tst_kill_test)&

        _tst_setup_timer_pid=$!
 }


-- 
Regards,
Li Wang


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [LTP] [RFC] Shell API timeout sleep orphan processes
  2021-05-06 12:23         ` Li Wang
@ 2021-05-07  4:48           ` Joerg Vehlow
  0 siblings, 0 replies; 8+ messages in thread
From: Joerg Vehlow @ 2021-05-07  4:48 UTC (permalink / raw)
  To: ltp

Hi Li,

> (Sorry for the late reply, I was just back from the Holidays)
>
> Before we decide to rewrite in C, can we think about this below method?
>
> --- a/testcases/lib/tst_test.sh
> +++ b/testcases/lib/tst_test.sh
> @@ -26,7 +26,7 @@ trap "tst_brk TBROK 'test interrupted or timed out'" INT
>   _tst_cleanup_timer()
>   {
>          if [ -n "$_tst_setup_timer_pid" ]; then
> -               kill $_tst_setup_timer_pid 2>/dev/null
> +               kill -TERM $_tst_setup_timer_pid 2>/dev/null
>                  wait $_tst_setup_timer_pid 2>/dev/null
>          fi
>   }
> @@ -486,7 +486,7 @@ _tst_setup_timer()
>          tst_res TINFO "timeout per run is ${h}h ${m}m ${s}s"
>
>          _tst_cleanup_timer
> -       sleep $sec && _tst_kill_test &
> +       (trap 'kill $!; exit' TERM; sleep $sec & wait $! && _tst_kill_test)&
>
>          _tst_setup_timer_pid=$!
>   }
Thanks, good idea. This looks like it works. My tests are passing. I 
will test this a bit more.
Although I already submitted a rewrite in c, I would prefer this 
solution, maybe with a bit improved readability.
I will post a new patch when I finished testing later today.

J?rg

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-05-07  4:48 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-30 11:08 [LTP] [RFC] Shell API timeout sleep orphan processes Joerg Vehlow
2021-05-04  6:52 ` Petr Vorel
2021-05-04  8:04   ` Joerg Vehlow
2021-05-04  8:47     ` Petr Vorel
2021-05-04 10:35       ` Joerg Vehlow
2021-05-04 15:07         ` Petr Vorel
2021-05-06 12:23         ` Li Wang
2021-05-07  4:48           ` Joerg Vehlow

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.