trinity.vger.kernel.org archive mirror
* test processes are not all killed
From: Dai Xiang @ 2017-08-01  9:38 UTC
  To: trinity

Hi!
I ran the commands below (as root) to test with trinity, and found an interesting issue:

cmd="trinity -q -q -l off -s $seed -x get_robust_list -x remap_file_pages -N 999999999"
cd /tmp
chroot --userspec nobody:nogroup / $cmd 2>&1 &
pid=$!
sleep 300s
kill -9 $pid

After the run finished, pgrep showed that the test processes had not been
killed, although I think the test logic is right:

5292 trinity -q -q -l off -s 3648957937 -x get_robust_list -x remap_file_pages -N 999999999
5293 trinity-watchdo
5294 trinity -q -q -l off -s 3648957937 -x get_robust_list -x remap_file_pages -N 999999999
70558 trinity -q -q -l off -s 3648957937 -x get_robust_list -x remap_file_pages -N 999999999

I did some simple tests, and in those all processes could be killed.

Does trinity suppress the kill, or can a process running in the background
not be killed this way?

Thanks
Xiang


* Re: test processes are not all killed
From: Dave Jones @ 2017-08-01 15:38 UTC
  To: Dai Xiang; +Cc: trinity

On Tue, Aug 01, 2017 at 05:38:13PM +0800, Dai Xiang wrote:
 > Hi!
 > I ran the commands below (as root) to test with trinity, and found an interesting issue:
 > 
 > cmd="trinity -q -q -l off -s $seed -x get_robust_list -x remap_file_pages -N 999999999"
 > cd /tmp
 > chroot --userspec nobody:nogroup / $cmd 2>&1 &
 > pid=$!
 > sleep 300s
 > kill -9 $pid
 > 
 > After the run finished, pgrep showed that the test processes had not been
 > killed, although I think the test logic is right:
 > 
 > 5292 trinity -q -q -l off -s 3648957937 -x get_robust_list -x remap_file_pages -N 999999999
 > 5293 trinity-watchdo
 > 5294 trinity -q -q -l off -s 3648957937 -x get_robust_list -x remap_file_pages -N 999999999
 > 70558 trinity -q -q -l off -s 3648957937 -x get_robust_list -x remap_file_pages -N 999999999
 > 
 > I did some simple tests, and in those all processes could be killed.
 > 
 > Does trinity suppress the kill, or can a process running in the background
 > not be killed this way?

It doesn't do anything special to mask signals (unless it happened to
call some of the signal syscalls with the right random arguments, which
is unlikely - the sanitize routines for the signal syscalls are pretty
dumb, or missing entirely).

More likely, you've found a kernel bug, or the processes are blocked
on something.

Looking at /proc/<pid>/stack can sometimes give clues as to where a
process is stuck.
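A small loop can collect that for every leftover trinity process at once. This is a sketch, not part of trinity: the function name dump_stacks is hypothetical, PROC_ROOT is an override that exists only so the function can be tried against a fake directory, and reading /proc/<pid>/stack normally requires root.

```shell
#!/usr/bin/env bash
# Print the kernel stack of each pid given as an argument.
# PROC_ROOT defaults to /proc; it is overridable only so the function
# can be exercised against a fake directory tree.
dump_stacks() {
    local proc_root=${PROC_ROOT:-/proc}
    local pid
    for pid in "$@"; do
        echo "== $pid"
        if [ -r "$proc_root/$pid/stack" ]; then
            cat "$proc_root/$pid/stack"
        else
            echo "(no readable stack for $pid)"
        fi
    done
}

# Example (as root): dump_stacks $(pgrep trinity)
```

If several pids show the same stack, they are most likely all waiting on the same kernel operation.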

Also, a script like this is useful for tracing stuck pids:

cd /sys/kernel/debug/tracing/
echo $1 >> set_ftrace_pid
echo function_graph >> current_tracer
echo 1 >> tracing_on
sleep 5
echo 0 >> tracing_on

cat /sys/kernel/debug/tracing/trace
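The same steps can be wrapped in a function that also restores the tracer afterwards, so later tracing is not left filtered to a stale pid. This is a sketch under assumptions: trace_pid, its optional duration argument and the TRACING_DIR override are hypothetical, and depending on the kernel, tracefs may be mounted at /sys/kernel/tracing instead. Writing nop to current_tracer disables the tracer; opening set_ftrace_pid with > (truncate) should clear the pid list, whereas >> as used above appends to it.

```shell
#!/usr/bin/env bash
# Trace one pid with the function_graph tracer for a few seconds,
# print the trace, then reset the tracer and the pid filter.
# TRACING_DIR is overridable so the function can be tried on a fake dir.
trace_pid() {
    local pid=$1 secs=${2:-5}
    local dir=${TRACING_DIR:-/sys/kernel/debug/tracing}
    if [ -z "$pid" ]; then
        echo "usage: trace_pid <pid> [seconds]" >&2
        return 1
    fi
    echo "$pid" > "$dir/set_ftrace_pid"          # trace only this pid
    echo function_graph > "$dir/current_tracer"  # record call graphs
    echo 1 > "$dir/tracing_on"
    sleep "$secs"
    echo 0 > "$dir/tracing_on"
    cat "$dir/trace"
    # Restore defaults so the next user is not filtered to this pid.
    echo nop > "$dir/current_tracer"
    echo > "$dir/set_ftrace_pid"
}

# Example (as root): trace_pid 30504 5
```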


Actually, looking again, I see you have a trinity-watchdog process, which
current versions don't have, so maybe try updating to 1.7 (or better, the git
version) and see if it's reproducible there.  I don't even remember
what bugs got fixed that long ago.

	Dave


* Re: test processes are not all killed
From: Dai Xiang @ 2017-08-02  3:09 UTC
  To: Dave Jones; +Cc: trinity

On Tue, Aug 01, 2017 at 11:38:23AM -0400, Dave Jones wrote:
> Actually, looking again, I see you have a trinity-watchdog process, which
> current versions don't have, so maybe try updating to 1.7 (or better, the git
> version) and see if it's reproducible there.  I don't even remember
> what bugs got fixed that long ago.

I used apt to install version 1.7 and can still reproduce it:
root@local ~# pgrep -a trinity
30480 trinity -q -q -l off -s -x get_robust_list -x remap_file_pages -N 999999999
30504 trinity -q -q -l off -s -x get_robust_list -x remap_file_pages -N 999999999
30558 trinity -q -q -l off -s -x get_robust_list -x remap_file_pages -N 999999999
30564 trinity -q -q -l off -s -x get_robust_list -x remap_file_pages -N 999999999
30565 trinity -q -q -l off -s -x get_robust_list -x remap_file_pages -N 999999999
30573 trinity -q -q -l off -s -x get_robust_list -x remap_file_pages -N 999999999
30587 trinity -q -q -l off -s -x get_robust_list -x remap_file_pages -N 999999999
30600 trinity -q -q -l off -s -x get_robust_list -x remap_file_pages -N 999999999

root@local ~# cat /proc/30504/stack
[<ffffffff8122b5ac>] wb_wait_for_completion+0x5c/0x90
[<ffffffff8122ed26>] sync_inodes_sb+0x96/0x200
[<ffffffff81235135>] sync_inodes_one_sb+0x15/0x20
[<ffffffff81204913>] iterate_supers+0xc3/0x120
[<ffffffff81235455>] sys_sync+0x35/0x90
[<ffffffff818fd39e>] tracesys_phase2+0x84/0x89
[<ffffffffffffffff>] 0xffffffffffffffff

The test script:
#!/bin/bash

cmd="trinity -q -q -l off -s $seed -x get_robust_list -x remap_file_pages -N 999999999"
chroot --userspec nobody:nogroup / $cmd 2>&1 &
pid=$!
echo $pid
sleep 300
kill -9 $pid
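One thing stands out in this script: $seed is never assigned, so trinity actually receives "-s -x get_robust_list ...", as the pgrep output above also shows. That likely explains why the run log reports a user-passed seed of 0 and marks only remap_file_pages as disabled: -s appears to consume the following -x as its argument. A minimal sketch of the launcher with the seed made explicit and the arguments kept in a bash array; SEED, its $RANDOM fallback, build_trinity_cmd and TRINITY_CMD are hypothetical names for illustration.

```shell
#!/usr/bin/env bash
# Build the trinity command line used in this thread with an explicit seed.
# SEED, the $RANDOM fallback, build_trinity_cmd and TRINITY_CMD are
# hypothetical names chosen for this sketch.
build_trinity_cmd() {
    local seed=${SEED:-$RANDOM}
    # A bash array keeps every argument separate, so an empty value
    # cannot make -s swallow the following -x option.
    TRINITY_CMD=(trinity -q -q -l off -s "$seed"
                 -x get_robust_list -x remap_file_pages -N 999999999)
}

# Example (as root):
#   build_trinity_cmd
#   chroot --userspec nobody:nogroup / "${TRINITY_CMD[@]}" 2>&1 &
#   pid=$!
```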

Run log:
23182   <===
Trinity 1.7  Dave Jones <davej@codemonkey.org.uk> <===
shm:0x7f2beff1c000-0x7f2bfc898da0 (4 pages)
[main] Marking syscall remap_file_pages (64bit:216 32bit:257) as to be disabled.
[main] Couldn't chmod tmp/ to 0777.
[main] Using user passed random seed: 0.
Marking all syscalls as enabled.
[main] Disabling syscalls marked as disabled by command line options
[main] Marked 64-bit syscall remap_file_pages (216) as deactivated.
[main] Marked 32-bit syscall remap_file_pages (257) as deactivated.
[main] 32-bit syscalls: 378 enabled, 2 disabled.  64-bit syscalls: 330 enabled, 2 disabled.
[main] Using pid_max = 32768
[main] There are 12 entries in the 0 list (@0x5586de2afe50).
[main]  start: 0x7f2befee0000 size:4KB  name: anon(PROT_READ | PROT_WRITE)
[main]  start: 0x7f2befedf000 size:4KB  name: anon(PROT_READ)
[main]  start: 0x7f2befede000 size:4KB  name: anon(PROT_WRITE)
[main]  start: 0x7f2befdde000 size:1MB  name: anon(PROT_READ | PROT_WRITE)
[main]  start: 0x7f2bee2ef000 size:1MB  name: anon(PROT_READ)
[main]  start: 0x7f2bee1ef000 size:1MB  name: anon(PROT_WRITE)
[main]  start: 0x7f2bedfef000 size:2MB  name: anon(PROT_READ | PROT_WRITE)
[main]  start: 0x7f2beddef000 size:2MB  name: anon(PROT_READ)
[main]  start: 0x7f2bedbef000 size:2MB  name: anon(PROT_WRITE)
[main]  start: 0x7f2befddd000 size:4KB  name: anon(PROT_READ | PROT_WRITE)
[main]  start: 0x7f2befddc000 size:4KB  name: anon(PROT_READ)
[main]  start: 0x7f2befddb000 size:4KB  name: anon(PROT_WRITE)
[main] Reserved/initialized 10 futexes.
[main] Added 25 filenames from /dev
[main] Added 25305 filenames from /proc
[main] Added 8175 filenames from /sys
[main] There are 8 entries in the 3 list (@0x5586de4987f0).
[main] pipefd:293
[main] pipefd:294
[main] pipefd:295
[main] pipefd:296
[main] pipefd:297
[main] pipefd:298
[main] pipefd:299
[main] pipefd:300
[main] Couldn't open socket 2:5:0. Socket type not supported
[main] Couldn't open socket 3:2:0. Address family not supported by protocol
[main] Couldn't open socket 3:3:0. Address family not supported by protocol
[main] Couldn't open socket 3:5:0. Address family not supported by protocol
[main] Couldn't open socket 3:5:1. Address family not supported by protocol
[main] Couldn't open socket 3:5:207. Address family not supported by protocol
[main] Couldn't open socket 4:2:0. Address family not supported by protocol
[main] Couldn't open socket 5:2:0. Address family not supported by protocol
[main] Couldn't open socket 5:3:0. Address family not supported by protocol
[main] Couldn't open socket 6:5:0. Address family not supported by protocol
[main] Couldn't open socket 9:5:0. Address family not supported by protocol
[main] Couldn't open socket 12:5:2. Address family not supported by protocol
[main] Couldn't open socket 12:1:2. Address family not supported by protocol
[main] Couldn't open socket 26:2:0. Address family not supported by protocol
[main] Couldn't open socket 26:1:0. Address family not supported by protocol
[main] Couldn't open socket 16:2:14. Protocol not supported
[main] Couldn't open socket 16:3:14. Protocol not supported
[main] Couldn't open socket 17:10:768. Operation not permitted
[main] Couldn't open socket 17:3:768. Operation not permitted
[main] Couldn't open socket 19:5:0. Address family not supported by protocol
[main] Couldn't open socket 21:5:0. Address family not supported by protocol
[main] Couldn't open socket 23:2:0. Address family not supported by protocol
[main] Couldn't open socket 23:2:1. Address family not supported by protocol
[main] Couldn't open socket 23:5:0. Address family not supported by protocol
[main] Couldn't open socket 23:1:0. Address family not supported by protocol
[main] Couldn't open socket 26:2:0. Address family not supported by protocol
[main] Couldn't open socket 26:1:0. Address family not supported by protocol
[main] Couldn't open socket 29:3:1. Address family not supported by protocol
[main] Couldn't open socket 29:2:2. Address family not supported by protocol
[main] Couldn't open socket 30:2:0. Address family not supported by protocol
[main] Couldn't open socket 30:5:0. Address family not supported by protocol
[main] Couldn't open socket 30:1:0. Address family not supported by protocol
[main] Couldn't open socket 31:5:0. Address family not supported by protocol
[main] Couldn't open socket 31:5:2. Address family not supported by protocol
[main] Couldn't open socket 31:1:0. Address family not supported by protocol
[main] Couldn't open socket 31:1:3. Address family not supported by protocol
[main] Couldn't open socket 31:3:0. Address family not supported by protocol
[main] Couldn't open socket 31:3:1. Address family not supported by protocol
[main] Couldn't open socket 31:3:3. Address family not supported by protocol
[main] Couldn't open socket 31:3:4. Address family not supported by protocol
[main] Couldn't open socket 31:3:5. Address family not supported by protocol
[main] Couldn't open socket 31:3:6. Address family not supported by protocol
[main] Couldn't open socket 31:3:7. Address family not supported by protocol
[main] Couldn't open socket 31:2:0. Address family not supported by protocol
[main] Couldn't open socket 33:2:2. Address family not supported by protocol
[main] Couldn't open socket 35:2:0. Address family not supported by protocol
[main] Couldn't open socket 35:5:0. Address family not supported by protocol
[main] Couldn't open socket 35:2:1. Address family not supported by protocol
[main] Couldn't open socket 35:5:2. Address family not supported by protocol
[main] Couldn't open socket 37:5:0. Address family not supported by protocol
[main] Couldn't open socket 37:5:1. Address family not supported by protocol
[main] Couldn't open socket 37:5:2. Address family not supported by protocol
[main] Couldn't open socket 37:5:3. Address family not supported by protocol
[main] Couldn't open socket 37:5:4. Address family not supported by protocol
[main] Couldn't open socket 37:5:5. Address family not supported by protocol
[main] Couldn't open socket 37:1:0. Address family not supported by protocol
[main] Couldn't open socket 37:1:1. Address family not supported by protocol
[main] Couldn't open socket 37:1:2. Address family not supported by protocol
[main] Couldn't open socket 37:1:3. Address family not supported by protocol
[main] Couldn't open socket 37:1:4. Address family not supported by protocol
[main] Couldn't open socket 37:1:5. Address family not supported by protocol
[main] Couldn't open socket 39:5:0. Address family not supported by protocol
[main] Couldn't open socket 39:3:0. Address family not supported by protocol
[main] Couldn't open socket 39:2:1. Address family not supported by protocol
[main] Couldn't open socket 39:1:1. Address family not supported by protocol
[main] Couldn't open socket 39:3:1. Address family not supported by protocol
[main] Couldn't open socket 41:10:0. Address family not supported by protocol
[main] Couldn't open socket 41:2:0. Address family not supported by protocol
[main] There are 20 entries in the 2 list (@0x5586de5e9180).
[main]  start: 0x7f2befd7e000 size:4KB  name: trinity-testfile1
[main]  start: 0x7f2befd7d000 size:4KB  name: trinity-testfile2
[main]  start: 0x7f2befd7c000 size:4KB  name: trinity-testfile3
[main]  start: 0x7f2bed400000 size:4KB  name: trinity-testfile4
[main]  start: 0x7f2befd7b000 size:4KB  name: trinity-testfile1
[main]  start: 0x7f2befd7a000 size:4KB  name: trinity-testfile2
[main]  start: 0x7f2befd79000 size:4KB  name: trinity-testfile3
[main]  start: 0x7f2befd78000 size:4KB  name: trinity-testfile4
[main]  start: 0x7f2befd77000 size:4KB  name: trinity-testfile1
[main]  start: 0x7f2befd76000 size:4KB  name: trinity-testfile2
[main]  start: 0x7f2befd75000 size:4KB  name: trinity-testfile3
[main]  start: 0x7f2befd74000 size:4KB  name: trinity-testfile4
[main]  start: 0x7f2befd73000 size:4KB  name: trinity-testfile1
[main]  start: 0x7f2befd72000 size:4KB  name: trinity-testfile2
[main]  start: 0x7f2befd71000 size:4KB  name: trinity-testfile3
[main]  start: 0x7f2befd70000 size:4KB  name: trinity-testfile4
[main]  start: 0x7f2befd6f000 size:4KB  name: trinity-testfile1
[main]  start: 0x7f2befd6e000 size:4KB  name: trinity-testfile2
[main]  start: 0x7f2befd6d000 size:4KB  name: trinity-testfile3
[main]  start: 0x41aba000 size:4KB  name: trinity-testfile4
[main] Enabled 13/14 fd providers. initialized:13.
[main] 11222 iterations. [F:8431 S:2745 HI:1573]
[main] 22548 iterations. [F:16928 S:5535 HI:2212]
[main] 33796 iterations. [F:25466 S:8211 HI:3806]
[main] 44419 iterations. [F:33558 S:10718 HI:3806]
[main] 54513 iterations. [F:41165 S:13178 HI:4445 STALLED:1]
[main] 64799 iterations. [F:48968 S:15625 HI:4445]
[main] 75504 iterations. [F:56938 S:18327 HI:4445]
[main] 85566 iterations. [F:64472 S:20816 HI:4445]
[main] 96687 iterations. [F:72892 S:23475 HI:4445]
[main] 107252 iterations. [F:80984 S:25902 HI:4445]
[main] 117292 iterations. [F:88535 S:28347 HI:4445]
[main] 127929 iterations. [F:96598 S:30879 HI:4445]
[main] 138578 iterations. [F:104592 S:33502 HI:4445]
[main] 148618 iterations. [F:112194 S:35879 HI:4445]

What confuses me is that none of these pids is the one I echoed (23182).
Running `diff /proc/30558/stack /proc/30504/stack` shows no difference.

$ ps aux | grep trinity
nobody   30480  0.0  0.4  56612 36160 ?        Ds   10:35   0:00 trinity -q -q -l off -s -x get_robust_list -x remap_file_pages -N 999999999
nobody   30504  0.0  0.4  54804 34172 ?        DNs  10:35   0:00 trinity -q -q -l off -s -x get_robust_list -x remap_file_pages -N 999999999
nobody   30558  0.0  0.3  56180 29256 ?        DNs  10:35   0:00 trinity -q -q -l off -s -x get_robust_list -x remap_file_pages -N 999999999
nobody   30564  0.0  0.2  55160 21320 pts/0    D    10:35   0:00 trinity -q -q -l off -s -x get_robust_list -x remap_file_pages -N 999999999
nobody   30565  0.0  0.3  57504 28472 ?        Ds   10:35   0:00 trinity -q -q -l off -s -x get_robust_list -x remap_file_pages -N 999999999
Their states are all D, so I cannot kill them.
I would like to know when these processes will exit on their own.
Is this a bug?
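In ps output, a STAT beginning with D means uninterruptible sleep (usually IO); N marks a low-priority (niced) process, s a session leader, and L pages locked into memory. A SIGKILL sent to a task in D state stays pending until the task leaves that wait, which fits the sys_sync stack above. A small filter, sketched here under the hypothetical name list_dstate, picks out such tasks from ps:

```shell
#!/usr/bin/env bash
# Print "pid name" for every task whose STAT starts with D
# (uninterruptible sleep). Expects "pid stat comm" lines on stdin,
# as produced by: ps -eo pid=,stat=,comm=
list_dstate() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}

# Example: ps -eo pid=,stat=,comm= | list_dstate
```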

Thanks
Xiang

> 
> 	Dave
> 


* Re: test processes are not all killed
From: Dave Jones @ 2017-08-02 12:37 UTC
  To: Dai Xiang; +Cc: trinity

On Wed, Aug 02, 2017 at 11:09:21AM +0800, Dai Xiang wrote:
 
 > root@local ~# cat /proc/30504/stack
 > [<ffffffff8122b5ac>] wb_wait_for_completion+0x5c/0x90
 > [<ffffffff8122ed26>] sync_inodes_sb+0x96/0x200
 > [<ffffffff81235135>] sync_inodes_one_sb+0x15/0x20
 > [<ffffffff81204913>] iterate_supers+0xc3/0x120
 > [<ffffffff81235455>] sys_sync+0x35/0x90
 > [<ffffffff818fd39e>] tracesys_phase2+0x84/0x89
 > [<ffffffffffffffff>] 0xffffffffffffffff

Ah. You might just have a *lot* of dirty pages to write out.
Does iotop show that journald is writing?
If IO is progressing, it's not a bug. If it's completely idle, and we're
still stuck here, that's a kernel bug.

Unless you're particularly focussed on stressing filesystems, you might
want to skip the sync-related syscalls (fsync, fdatasync, sync, syncfs).
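Besides iotop, one way to tell whether IO is progressing is to sample the Dirty and Writeback counters in /proc/meminfo: if they keep shrinking between samples, sync() is making progress. A sketch; writeback_status and the MEMINFO override are hypothetical names.

```shell
#!/usr/bin/env bash
# Print the Dirty and Writeback lines from /proc/meminfo (values in kB).
# If both keep shrinking between samples, writeback is progressing.
# MEMINFO is overridable only so the function can be tested.
writeback_status() {
    awk '/^(Dirty|Writeback):/ { printf "%s %s kB\n", $1, $2 }' \
        "${MEMINFO:-/proc/meminfo}"
}

# Example: while sleep 5; do writeback_status; echo; done
```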

	Dave


* Re: test processes are not all killed
From: Tommi Rantala @ 2017-08-02 14:57 UTC
  To: Dai Xiang; +Cc: Dave Jones, trinity

2017-08-02 6:09 GMT+03:00 Dai Xiang <xiangx.dai@intel.com>:
> On Tue, Aug 01, 2017 at 11:38:23AM -0400, Dave Jones wrote:
>> On Tue, Aug 01, 2017 at 05:38:13PM +0800, Dai Xiang wrote:
>>  > Hi!
>>  > I ran the commands below (as root) to test with trinity, and found an interesting issue:
>>  >
>>  > cmd="trinity -q -q -l off -s $seed -x get_robust_list -x remap_file_pages -N 999999999"
>>  > cd /tmp
>>  > chroot --userspec nobody:nogroup / $cmd 2>&1 &
>>  > pid=$!
>>  > sleep 300s
>>  > kill -9 $pid

Hi,

"kill -9 $pid" only kills the main trinity pid, right?
So the watchdog and all the forked child processes will not get killed.

Maybe Dave knows better if the other trinity processes will kill
themselves if they get re-parented.

Perhaps you could do "killall -9 trinity" or something like that.

-Tommi


* Re: test processes are not all killed
From: Dave Jones @ 2017-08-02 16:41 UTC
  To: Tommi Rantala; +Cc: Dai Xiang, trinity

On Wed, Aug 02, 2017 at 05:57:21PM +0300, Tommi Rantala wrote:
 > Hi,
 > 
 > "kill -9 $pid" only kills the main trinity pid, right?
 > So the watchdog and all the forked child processes will not get killed.
 > 
 > Maybe Dave knows better if the other trinity processes will kill
 > themselves if they get re-parented.

They should, but only after they return from that sync() syscall.
There's a check in periodic_work() that verifies every 10 syscalls
that the main pid is still around.
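The pattern described here, a child periodically checking that the main process still exists and exiting if it does not, can be illustrated in shell with kill -0, which tests whether a pid can be signalled without sending anything. This is only an illustration of the idea, not trinity's periodic_work(), which is C; worker_loop and its arguments are hypothetical. A child blocked inside sync() never reaches the check, which is why the processes in this thread linger.

```shell
#!/usr/bin/env bash
# Illustration only: do up to max_iter units of work and, every 10
# iterations, check whether main_pid still exists.
# kill -0 sends no signal; it only reports whether the pid can be
# signalled, so it also fails (EPERM) for another user's process.
worker_loop() {
    local main_pid=$1 max_iter=${2:-1000} i
    for ((i = 1; i <= max_iter; i++)); do
        : # one unit of work would go here (one syscall, in trinity's case)
        if (( i % 10 == 0 )) && ! kill -0 "$main_pid" 2>/dev/null; then
            echo "main pid $main_pid gone after $i iterations"
            return 0
        fi
    done
    echo "finished $max_iter iterations"
}
```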

	Dave


* Re: test processes are not all killed
From: Dai Xiang @ 2017-08-03  3:17 UTC
  To: Dave Jones; +Cc: trinity

On Wed, Aug 02, 2017 at 08:37:35AM -0400, Dave Jones wrote:
> Ah. You might just have a *lot* of dirty pages to write out.
> Does iotop show that journald is writing?

I tested again, this time only echoing the pid and, instead of kill -9 $pid,
running killall -9 trinity:
nobody   29553  0.0  0.4  55288 33212 pts/0    SL   09:42   0:00 trinity -q -q -l off -s -x get_robust_list -x remap_file_pages -N 999999999
nobody   30107  0.0  0.3  56360 28308 ?        Ds   09:51   0:00 trinity -q -q -l off -s -x get_robust_list -x remap_file_pages -N 999999999
nobody   30109  0.0  0.4  55608 34932 ?        DNLs 09:51   0:00 trinity -q -q -l off -s -x get_robust_list -x remap_file_pages -N 999999999
nobody   30110  0.0  0.3  55424 30156 ?        Ds   09:51   0:00 trinity -q -q -l off -s -x get_robust_list -x remap_file_pages -N 999999999
nobody   30111  0.0  0.4  55464 32788 ?        DNs  09:51   0:00 trinity -q -q -l off -s -x get_robust_list -x remap_file_pages -N 999999999
nobody   30114  0.0  0.4  56584 32792 ?        Ds   09:51   0:00 trinity -q -q -l off -s -x get_robust_list -x remap_file_pages -N 999999999
nobody   30161  0.0  0.2  55464 23704 ?        Ds   09:51   0:00 trinity -q -q -l off -s -x get_robust_list -x remap_file_pages -N 999999999
nobody   30165  0.0  0.3  58564 30568 ?        DNs  09:51   0:00 trinity -q -q -l off -s -x get_robust_list -x remap_file_pages -N 999999999
nobody   30186  0.0  0.1  55436 14984 ?        Ds   09:51   0:00 trinity -q -q -l off -s -x get_robust_list -x remap_file_pages -N 999999999

29553 is the pid that was printed, but killall failed:
$ killall -9 trinity
trinity: no process found


pstree shows:
  |-trinity-main -q -q -l off -s 3648957937 -x get_robust_list -x remap_file_pages -N 999999999
  |   |-trinity-c0 -q -q -l off -s 3648957937 -x get_robust_list -x remap_file_pages -N 999999999
  |   |-trinity-c1 -q -q -l off -s 3648957937 -x get_robust_list -x remap_file_pages -N 999999999
  |   |-trinity-c2 -q -q -l off -s 3648957937 -x get_robust_list -x remap_file_pages -N 999999999
  |   |-trinity-c3 -q -q -l off -s 3648957937 -x get_robust_list -x remap_file_pages -N 999999999
  |   |-trinity-c4 -q -q -l off -s 3648957937 -x get_robust_list -x remap_file_pages -N 999999999
  |   |-trinity-c5 -q -q -l off -s 3648957937 -x get_robust_list -x remap_file_pages -N 999999999
  |   |-trinity-c6 -q -q -l off -s 3648957937 -x get_robust_list -x remap_file_pages -N 999999999
  |   `-trinity-c7 -q -q -l off -s 3648957937 -x get_robust_list -x remap_file_pages -N 999999999
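The pstree output above shows trinity renaming its processes (trinity-main, trinity-c0 to trinity-c7), which likely explains the killall failure: killall needs an exact match on the process name, whereas pgrep and pkill match a regular expression, so pkill -9 '^trinity' should find them. The same lookup can be sketched by reading /proc/<pid>/comm directly; trinity_pids and the PROC_ROOT override are hypothetical names.

```shell
#!/usr/bin/env bash
# List pids whose process name (/proc/<pid>/comm) starts with "trinity",
# e.g. trinity-main and trinity-c0..c7.
# PROC_ROOT is overridable only so the function can be tested.
trinity_pids() {
    local proc_root=${PROC_ROOT:-/proc} d comm
    for d in "$proc_root"/[0-9]*; do
        # Processes can exit while we scan; skip entries that vanish.
        { read -r comm < "$d/comm"; } 2>/dev/null || continue
        case $comm in
            trinity*) echo "${d##*/}" ;;
        esac
    done
}

# Example (as root): kill -9 $(trinity_pids)
```

Either way, a task sitting in D state inside sync() only dies once that call returns.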

iotop -o shows:
111 be/4 root        0.00 B/s    0.00 B/s  0.00 % 99.99 %
[kworker/0:2] <== sometimes kworker/1:1 and so on
Sometimes more kworker processes are listed, but trinity never appears.

Thanks

Xiang
> If IO is progressing, it's not a bug. If it's completely idle, and we're
> still stuck here, that's a kernel bug.
> 
> Unless you're particularly focussed on stressing filesystems, you might
> want to skip the sync-related syscalls (fsync, fdatasync, sync, syncfs).

How can I skip those syscalls?

> 
> 	Dave
> 


* Re: test processes are not all killed
From: Dave Jones @ 2017-08-03  3:22 UTC
  To: Dai Xiang; +Cc: trinity

On Thu, Aug 03, 2017 at 11:17:12AM +0800, Dai Xiang wrote:
 
 > > Ah. You might just have a *lot* of dirty pages to write out.
 > > Does iotop show that journald is writing?
 > 
 > iotop -o shows:
 > 111 be/4 root        0.00 B/s    0.00 B/s  0.00 % 99.99 %
 > [kworker/0:2] <== sometimes kworker/1:1 and so on
 > Sometimes more kworker processes are listed, but trinity never appears.
 
You can try the ftrace stuff I suggested to trace what those kworker
threads are doing, but it really sounds like it's just accumulated a ton
of dirty pages to write out.

 > > Unless you're particularly focussed on stressing filesystems, you might
 > > want to skip the sync-related syscalls (fsync, fdatasync, sync, syncfs).
 > 
 > How can I skip those syscalls?

Use multiple -x arguments.
This is common enough that it might be worth adding a --no-sync
argument that does this for all relevant syscalls.
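For example, the four sync-related syscalls named above can be expanded into -x arguments with a small helper. This is a sketch: sync_excludes is a hypothetical name, and it only generates the -x flags trinity already accepts, since --no-sync is just a suggestion in this thread.

```shell
#!/usr/bin/env bash
# Expand syscall names into "-x <name>" pairs for trinity.
# With no arguments, use the sync-related syscalls named in this thread.
sync_excludes() {
    local names=("$@") name out=()
    if [ "${#names[@]}" -eq 0 ]; then
        names=(fsync fdatasync sync syncfs)
    fi
    for name in "${names[@]}"; do
        out+=(-x "$name")
    done
    printf '%s\n' "${out[*]}"
}

# Example (unquoted on purpose, so each flag becomes its own argument):
#   trinity -q -q -l off -s "$seed" -x get_robust_list -x remap_file_pages \
#       $(sync_excludes) -N 999999999
```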

	Dave

