linux-kernel.vger.kernel.org archive mirror
* Re: Tracing busy processes/threads freezes/stalls the whole machine
       [not found] <293cfb1d-8a53-21e1-83c1-cdb6e2f32c65@samba.org>
@ 2021-05-04 12:26 ` Stefan Metzmacher
       [not found] ` <20210504092404.6b12aba4@gandalf.local.home>
  1 sibling, 0 replies; 8+ messages in thread
From: Stefan Metzmacher @ 2021-05-04 12:26 UTC (permalink / raw)
  To: Steven Rostedt, Ingo Molnar; +Cc: linux-trace-devel, io-uring, linux-kernel

+ linux-kernel

I just retested with 5.12 final:

As long as I give the VM more than one cpu, I can easily reproduce the problem with:

> while true; do cat linux-image-5.12.0-rc8-dbg_5.12.0-rc8-22_amd64.deb > /dev/null; done
> and
> pidofcat=$(pidof cat); echo $pidofcat; trace-cmd record -e all -P $pidofcat

On a single cpu VM it doesn't seem to trigger.

Is anyone else able to reproduce this?

Any ideas how to debug this?

It's really annoying to fear a machine freeze when trying to trace problems
on recent kernels...

Thanks for any possible hint...
metze

Am 22.04.21 um 16:26 schrieb Stefan Metzmacher:
> 
> Hi Steven, hi Ingo,
> 
> I recently tried to analyze the performance of Samba using io-uring.
> 
> I was using ubuntu 20.04 with the 5.10.0-1023-oem kernel, which is based on v5.10.25, see:
> https://kernel.ubuntu.com/git/kernel-ppa/mirror/ubuntu-oem-5.10-focal.git/log/?h=oem-5.10-prep
> trace-cmd is at version 2.8.3-4build1.
> 
> In order to find the bottleneck I tried to use:
> 
>   trace-cmd record -e all -P ${pid_of_io_uring_worker}
> 
> As a result the server was completely dead immediately.
> 
> I tried to reproduce this in a virtual machine (inside virtualbox).
> 
> I used a modified 'io_uring-cp' that loops forever, see:
> https://github.com/metze-samba/liburing/commit/5e98efed053baf03521692e786c1c55690b04d8e
> 
> When I run './io_uring-cp-forever link-cp.c file',
> I see an 'io_wq_manager' and an 'io_wqe_worker-0' kernel thread,
> while './io_uring-cp-forever link-cp.c file' as well as 'io_wqe_worker-0'
> consume about 25% cpu each.
> 
> When I run 'trace-cmd record -e all -P $pid' for 'io_wqe_worker-0' or 'io_wq_manager'
> I can reproduce the problem. I then found that the same seems to happen
> also for other kernel threads, e.g. '[kworker/1:1-events]'; it seems to
> happen for all kernel threads that are not completely idle.
> 
> With this:
> 
>  From 'top':
>    1341 root      20   0    2512    576    508 S  33,4   0,1   0:10.39 io_uring-cp-for
>    1343 root      20   0       0      0      0 R  29,8   0,0   0:08.43 io_wqe_worker-0
>       7 root      20   0       0      0      0 I   0,3   0,0   0:00.31 kworker/0:1-events
> 
>    PID 5 is [kworker/0:0-ata_sff]
> 
> # trace-cmd record -e all -P 5
> Hit Ctrl^C to stop recording
> ^CCPU0 data recorded at offset=0x7b8000
>     0 bytes in size
> CPU1 data recorded at offset=0x7b8000
>     69632 bytes in size
> 
> # But
> # trace-cmd record -e all -P 7
> => machine unresponsive (no blinking cursor on the console anymore)
> On the host 'top' shows that the VirtualBoxVM cpu emulator thread 'EMT-1'
> uses 100% cpu, so I guess the guest kernel is in something like an endless
> recursion loop. Maybe a trace function recursing to itself?
> 
> On the same VM I tried a 5.12rc8 kernel and there I can also reproduce the
> problem.
> 
> I also managed to reproduce the problem without io-uring, just using:
> 
>  while true; do cat linux-image-5.12.0-rc8-dbg_5.12.0-rc8-22_amd64.deb > /dev/null; done
> 
> in order to keep some kernel threads moving.
> This happens with 5.12rc8 and 5.10.0-1023-oem, but I was not able to
> reproduce any of this using the 5.8.0-50-generic kernel, see
> https://kernel.ubuntu.com/git/ubuntu/ubuntu-focal.git/log/?h=Ubuntu-hwe-5.8-5.8.0-50.56_20.04.1
> 
> I was also able to reproduce this with a ubuntu 21.04 vm using
> the 5.11.0-14-generic kernel, see:
> https://kernel.ubuntu.com/git/ubuntu/ubuntu-hirsute.git/log/?h=Ubuntu-5.11.0-14.15
> On this one I only managed to reproduce the problem with
> './io_uring-cp-forever link-cp.c file', but not with
> 'while true; do cat linux-image-5.12.0-rc8-dbg_5.12.0-rc8-22_amd64.deb > /dev/null; done'
> 
> 
> So it seems the problem was introduced after 5.8 and is not really related to
> io-uring. And it may not be purely related to kernel threads.
> 
> With this on 5.12-rc8 (again):
> 
>   └─tmux: server,903
>       ├─bash,904
>       │   └─io_uring-cp-for,925 link-cp.c file
>       │       ├─{iou-mgr-925},926
>       │       └─{iou-wrk-925},927
>       └─bash,929
>           └─pstree,938 -a -t -p
> 
> I was able to trace once:
> 
> root@ub1704-166:~# trace-cmd record -e all -P 925
> Hit Ctrl^C to stop recording
> ^CCPU0 data recorded at offset=0x7b8000
>     10842112 bytes in size
> CPU1 data recorded at offset=0x120f000
>     36450304 bytes in size
> 
> But the 2nd run reproduced the problem:
> root@ub1704-166:~# trace-cmd record -e all -P 925
> 
> I was also able to reproduce it with:
> 
> while true; do cat linux-image-5.12.0-rc8-dbg_5.12.0-rc8-22_amd64.deb > /dev/null; done
> and
> pidofcat=$(pidof cat); echo $pidofcat; trace-cmd record -e all -P $pidofcat
> 
> So it seems any busy thread (user or kernel) triggers the problem.
> 
> Any ideas what has changed after 5.8?
> 
> Thanks!
> metze
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* sched_waking vs. set_event_pid crash (Re: Tracing busy processes/threads freezes/stalls the whole machine)
       [not found]           ` <90c806a0-8a2f-1257-7337-6761100217c9@samba.org>
@ 2021-07-22 22:43             ` Stefan Metzmacher
  2021-07-23  1:41               ` Steven Rostedt
  2021-07-23  2:51               ` Steven Rostedt
  0 siblings, 2 replies; 8+ messages in thread
From: Stefan Metzmacher @ 2021-07-22 22:43 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, linux-trace-devel, io-uring, linux-kernel


[-- Attachment #1.1: Type: text/plain, Size: 8084 bytes --]


Hi Steve,

After some days of training:
https://training.linuxfoundation.org/training/linux-kernel-debugging-and-security/
I was able to get much closer to the problem :-)

In order to reproduce it and get reliable kexec crash dumps,
I needed to give the VM at least 3 cores.

While running './io_uring-cp-forever link-cp.c file' (from:
https://github.com/metze-samba/liburing/commits/io_uring-cp-forever )
in one window, the following simple sequence triggered the problem in most cases:

echo 1 > /sys/kernel/tracing/events/sched/sched_waking/enable
echo 1 > /sys/kernel/tracing/set_event_pid

It triggered the following:

> [  192.924023] ------------[ cut here ]------------
> [  192.924026] WARNING: CPU: 2 PID: 1696 at arch/x86/include/asm/kfence.h:44 kfence_protect_page+0x33/0xc0
> [  192.924034] Modules linked in: vboxsf intel_rapl_msr intel_rapl_common crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd rapl vboxvideo drm_vram_helper drm_ttm_helper ttm snd_intel8x0 input_leds snd_ac97_codec joydev ac97_bus serio_raw snd_pcm drm_kms_helper cec mac_hid vboxguest snd_timer rc_core snd fb_sys_fops syscopyarea sysfillrect soundcore sysimgblt kvm_intel kvm sch_fq_codel drm sunrpc ip_tables x_tables autofs4 hid_generic usbhid hid crc32_pclmul psmouse ahci libahci e1000 i2c_piix4 pata_acpi video
> [  192.924068] CPU: 2 PID: 1696 Comm: io_uring-cp-for Kdump: loaded Not tainted 5.13.0-1007-oem #7-Ubuntu
> [  192.924071] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> [  192.924072] RIP: 0010:kfence_protect_page+0x33/0xc0
> [  192.924075] Code: 53 89 f3 48 8d 75 e4 48 83 ec 10 65 48 8b 04 25 28 00 00 00 48 89 45 e8 31 c0 e8 98 1f da ff 48 85 c0 74 06 83 7d e4 01 74 06 <0f> 0b 31 c0 eb 39 48 8b 38 48 89 c2 84 db 75 47 48 89 f8 0f 1f 40
> [  192.924077] RSP: 0018:ffff980c0077f918 EFLAGS: 00010046
> [  192.924079] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffbcc10000
> [  192.924080] RDX: ffff980c0077f91c RSI: 0000000000032000 RDI: 0000000000000000
> [  192.924082] RBP: ffff980c0077f938 R08: 0000000000000000 R09: 0000000000000000
> [  192.924083] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000032000
> [  192.924084] R13: 0000000000000000 R14: ffff980c0077fb58 R15: 0000000000000000
> [  192.924085] FS:  00007f2491207540(0000) GS:ffff8ccb5bd00000(0000) knlGS:0000000000000000
> [  192.924087] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  192.924088] CR2: 00000000000325f8 CR3: 0000000102572005 CR4: 00000000000706e0
> [  192.924091] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  192.924092] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  192.924093] Call Trace:
> [  192.924096]  kfence_unprotect+0x17/0x30
> [  192.924099]  kfence_handle_page_fault+0x97/0x250
> [  192.924102]  ? cmp_ex_sort+0x30/0x30
> [  192.924104]  page_fault_oops+0xa0/0x2a0
> [  192.924106]  ? trace_event_buffer_reserve+0x22/0xb0
> [  192.924110]  ? search_exception_tables+0x5b/0x60
> [  192.924113]  kernelmode_fixup_or_oops+0x92/0xf0
> [  192.924115]  __bad_area_nosemaphore+0x14d/0x190
> [  192.924116]  __bad_area+0x5f/0x80
> [  192.924118]  bad_area+0x16/0x20
> [  192.924119]  do_user_addr_fault+0x368/0x640
> [  192.924121]  ? aa_file_perm+0x11d/0x470
> [  192.924123]  exc_page_fault+0x7d/0x170
> [  192.924127]  asm_exc_page_fault+0x1e/0x30
> [  192.924129] RIP: 0010:trace_event_buffer_reserve+0x22/0xb0
> [  192.924132] Code: 00 00 00 00 0f 1f 40 00 55 48 89 e5 41 56 41 55 49 89 d5 41 54 49 89 f4 53 48 89 fb 4c 8b 76 10 f6 46 49 02 74 29 48 8b 46 28 <48> 8b 88 b8 00 00 00 48 8b 90 c0 00 00 00 48 09 d1 74 12 48 8b 40
> [  192.924134] RSP: 0018:ffff980c0077fc00 EFLAGS: 00010002
> [  192.924135] RAX: 0000000000032540 RBX: ffff980c0077fc38 RCX: 0000000000000002
> [  192.924136] RDX: 0000000000000028 RSI: ffffffffbcdb7e80 RDI: ffff980c0077fc38
> [  192.924137] RBP: ffff980c0077fc20 R08: ffff8ccb46814cb8 R09: 0000000000000010
> [  192.924138] R10: 000000007ffff000 R11: 0000000000000000 R12: ffffffffbcdb7e80
> [  192.924140] R13: 0000000000000028 R14: 0000000000000000 R15: ffff8ccb49b6ed0c
> [  192.924142]  trace_event_raw_event_sched_wakeup_template+0x63/0xf0
> [  192.924146]  try_to_wake_up+0x285/0x570
> [  192.924148]  wake_up_process+0x15/0x20
> [  192.924149]  io_wqe_activate_free_worker+0x5b/0x70
> [  192.924152]  io_wqe_enqueue+0xfb/0x190
> [  192.924154]  io_wq_enqueue+0x1e/0x20
> [  192.924156]  io_queue_async_work+0xa0/0x120
> [  192.924158]  __io_queue_sqe+0x17e/0x360
> [  192.924160]  ? io_req_free_batch_finish+0x8d/0xf0
> [  192.924162]  io_queue_sqe+0x199/0x2a0
> [  192.924164]  io_submit_sqes+0x70d/0x7f0
> [  192.924166]  __x64_sys_io_uring_enter+0x1b8/0x3d0
> [  192.924168]  do_syscall_64+0x40/0xb0
> [  192.924170]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  192.924172] RIP: 0033:0x7f249112f89d
> [  192.924174] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 f5 0c 00 f7 d8 64 89 01 48
> [  192.924179] RSP: 002b:00007fff56c491c8 EFLAGS: 00000216 ORIG_RAX: 00000000000001aa
> [  192.924180] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f249112f89d
> [  192.924182] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000005
> [  192.924182] RBP: 00007fff56c49200 R08: 0000000000000000 R09: 0000000000000008
> [  192.924183] R10: 0000000000000000 R11: 0000000000000216 R12: 0000000000000000
> [  192.924184] R13: 00007fff56c49380 R14: 0000000000000dcf R15: 000055c533b6f2a0
> [  192.924186] ---[ end trace d1211902aae73d20 ]---

The problem seems to happen in this line of trace_event_ignore_this_pid():

pid_list = rcu_dereference_raw(tr->filtered_pids);

It seems it got inlined into trace_event_buffer_reserve().
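
For reference, the surrounding logic looks roughly like this (paraphrased
from memory rather than copied from the 5.13 tree, so treat the exact field
names as an assumption):

	static inline bool
	trace_event_ignore_this_pid(struct trace_event_file *trace_file)
	{
		struct trace_array *tr = trace_file->tr;   /* <- the bogus "tr" */
		struct trace_pid_list *pid_list;
		struct trace_pid_list *no_pid_list;

		pid_list = rcu_dereference_raw(tr->filtered_pids);
		no_pid_list = rcu_dereference_raw(tr->filtered_no_pids);

		if (!pid_list && !no_pid_list)
			return false;

		/* per-cpu flag maintained by the sched pid-filter probes */
		return this_cpu_ptr(tr->array_buffer.data)->ignore_pid;
	}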

The strangest things I found so far are these:

crash> sym global_trace
ffffffffbcdb7e80 (d) global_trace
crash> list ftrace_trace_arrays
ffffffffbcdb7e70
ffffffffbcdb7e80

trace_array has size 7672, but ffffffffbcdb7e70 is only 16 bytes away from
ffffffffbcdb7e80.

Also this:

crash> struct trace_array.events -o global_trace
struct trace_array {
  [ffffffffbcdb9be0] struct list_head events;
}
crash> list -s trace_event_file.tr -o trace_event_file.list ffffffffbcdb9be0
ffffffffbcdb9be0
  tr = 0xffffffffbcdb7b20
ffff8ccb45cdfb00
  tr = 0xffffffffbcdb7e80
ffff8ccb45cdf580
  tr = 0xffffffffbcdb7e80
ffff8ccb45cdfe18
  tr = 0xffffffffbcdb7e80
...
  tr = 0xffffffffbcdb7e80

The first one, 0xffffffffbcdb7b20, is only 864 bytes away from 0xffffffffbcdb7e80.

Additional information can be found here:
https://www.samba.org/~metze/202107221802.reproduce-freeze-05-first-time.v3-pid1/
-rw-r--r-- 1 metze metze 37250108 Jul 22 18:02 dump.202107221802
-rw-r--r-- 1 metze metze    63075 Jul 22 18:02 dmesg.202107221802
-rw-r--r-- 1 metze metze     8820 Jul 22 18:13 crash-bt-a-s.txt
-rw-r--r-- 1 metze metze    36586 Jul 22 18:14 crash-bt-a-l-FF-s.txt
-rw-r--r-- 1 metze metze     4798 Jul 22 23:49 crash-bt-p.txt
-rw-r--r-- 1 metze metze      946 Jul 23 00:13 strange-things.txt
-rw-r--r-- 1 metze metze      482 Jul 23 00:24 code-and-vmlinux.txt

The code can be found here:
https://kernel.ubuntu.com/git/kernel-ppa/mirror/ubuntu-oem-5.10-focal.git/tag/?h=Ubuntu-oem-5.13-5.13.0-1007.7

And the /usr/lib/debug/boot/vmlinux-5.13.0-1007-oem file can be extracted from here:
http://ddebs.ubuntu.com/pool/main/l/linux-oem-5.13/linux-image-unsigned-5.13.0-1007-oem-dbgsym_5.13.0-1007.7_amd64.ddeb

I also needed the latest code from https://github.com/crash-utility/crash.git
(at commit f53b73e8380bca054cebd2b61ff118c46609429b).

It would be really great if you (or anyone else on the lists) could have a closer look
in order to get the problem fixed :-)

I've learned a lot this week and tried hard to find the problem myself,
but I have to move back to other work for now.

Thanks!
metze


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: sched_waking vs. set_event_pid crash (Re: Tracing busy processes/threads freezes/stalls the whole machine)
  2021-07-22 22:43             ` sched_waking vs. set_event_pid crash (Re: Tracing busy processes/threads freezes/stalls the whole machine) Stefan Metzmacher
@ 2021-07-23  1:41               ` Steven Rostedt
  2021-07-23  2:51               ` Steven Rostedt
  1 sibling, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2021-07-23  1:41 UTC (permalink / raw)
  To: Stefan Metzmacher; +Cc: Ingo Molnar, linux-trace-devel, io-uring, linux-kernel

On Fri, 23 Jul 2021 00:43:13 +0200
Stefan Metzmacher <metze@samba.org> wrote:

> Hi Steve,

Hi Stefan,

> 
> After some days of training:
> https://training.linuxfoundation.org/training/linux-kernel-debugging-and-security/
> I was able to get much closer to the problem :-)
> 
> In order to reproduce it and get reliable kexec crash dumps,
> I needed to give the VM at least 3 cores.
> 
> While running './io-uring_cp-forever link-cp.c file' (from:
> https://github.com/metze-samba/liburing/commits/io_uring-cp-forever )
> in one window, the following simple sequence triggered the problem in most cases:
> 
> echo 1 > /sys/kernel/tracing/events/sched/sched_waking/enable
> echo 1 > /sys/kernel/tracing/set_event_pid

I was able to reproduce it by running hackbench in a while loop and,
in another terminal, executing the above two lines.

I think I found the bug. Can you test this patch?

Thanks,

-- Steve

diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 976bf8ce8039..fc32821f8240 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -299,8 +299,8 @@ static int tracepoint_add_func(struct tracepoint *tp,
 	 * a pointer to it.  This array is referenced by __DO_TRACE from
 	 * include/linux/tracepoint.h using rcu_dereference_sched().
 	 */
-	rcu_assign_pointer(tp->funcs, tp_funcs);
 	tracepoint_update_call(tp, tp_funcs, false);
+	rcu_assign_pointer(tp->funcs, tp_funcs);
 	static_key_enable(&tp->key);
 
 	release_probes(old);

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: sched_waking vs. set_event_pid crash (Re: Tracing busy processes/threads freezes/stalls the whole machine)
  2021-07-22 22:43             ` sched_waking vs. set_event_pid crash (Re: Tracing busy processes/threads freezes/stalls the whole machine) Stefan Metzmacher
  2021-07-23  1:41               ` Steven Rostedt
@ 2021-07-23  2:51               ` Steven Rostedt
  2021-07-23 10:35                 ` Stefan Metzmacher
  1 sibling, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2021-07-23  2:51 UTC (permalink / raw)
  To: Stefan Metzmacher; +Cc: Ingo Molnar, linux-trace-devel, io-uring, linux-kernel

On Fri, 23 Jul 2021 00:43:13 +0200
Stefan Metzmacher <metze@samba.org> wrote:

> Hi Steve,
> 
> After some days of training:
> https://training.linuxfoundation.org/training/linux-kernel-debugging-and-security/
> I was able to get much closer to the problem :-)

Well, knowing what to look for was not going to be easy. And I'm sure
you were shocked to see what I posted as a fix ;-)

Assuming this does fix your issue, I sent out a real patch with the
explanation of what happened in the change log, so that you can see why
that change was your issue.

> 
> In order to reproduce it and get reliable kexec crash dumps,
> I needed to give the VM at least 3 cores.

Yes, it did require having this run on multiple CPUs for the race
condition to trigger, and with two cores it would have been hard to hit. I ran
it on 8 cores and it triggered rather easily.

> 
> While running './io-uring_cp-forever link-cp.c file' (from:
> https://github.com/metze-samba/liburing/commits/io_uring-cp-forever )
> in one window, the following simple sequence triggered the problem in most cases:
> 
> echo 1 > /sys/kernel/tracing/events/sched/sched_waking/enable
> echo 1 > /sys/kernel/tracing/set_event_pid

All it took was something busy that did a lot of wakeups while setting
set_event_pid, to be able to hit the race easily. As I stated, I
triggered it by running hackbench instead of the io-uring code. In
fact, this bug had nothing to do with io-uring; you were only
triggering it because you were putting enough load on the system to
make the race happen often.

> 
> It triggered the following:
> 
> > [  192.924023] ------------[ cut here ]------------
> > [  192.924026] WARNING: CPU: 2 PID: 1696 at arch/x86/include/asm/kfence.h:44 kfence_protect_page+0x33/0xc0
> > [... full oops snipped, quoted in full in the report above ...]
> > [  192.924186] ---[ end trace d1211902aae73d20 ]---  
> 
> The problem seems to happen in this line of trace_event_ignore_this_pid():
> 
> pid_list = rcu_dereference_raw(tr->filtered_pids);

That "tr" comes from the trace_event_file that is passed in by the
"data" field of the callback. Hence, this callback got the data field
of the event_filter_pid_sched_wakeup_probe_pre() callback that is
called before all events when the set_event_pid file is set. That
means, the "tr" being dereferened was not the "tr" you were looking for.

> 
> It seems it got inlined within trace_event_buffer_reserve()
> 
> The strangest things I found so far are these:
> 
> crash> sym global_trace  
> ffffffffbcdb7e80 (d) global_trace
> crash> list ftrace_trace_arrays  
> ffffffffbcdb7e70
> ffffffffbcdb7e80
> 
> trace_array has size 7672, but ffffffffbcdb7e70 is only 16 bytes away from
> ffffffffbcdb7e80.

ftrace_trace_arrays is a list_head, and I doubt you created any
instances, so the list has only one entry, and that is
global_trace. Hence, it points to global_trace and to itself. It just so
happens that a list_head is 16 bytes.
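
If it helps, here is a tiny standalone illustration of that layout (plain
userspace C with made-up stand-in types, not the kernel structures):

	#include <stdio.h>

	struct list_head { struct list_head *next, *prev; };

	/* stand-in for struct trace_array, which embeds its list node */
	struct trace_array_stub {
		struct list_head list;
		/* ... ~7 KB of other fields in the real struct ... */
	};

	static struct list_head head;            /* plays ftrace_trace_arrays */
	static struct trace_array_stub only_one; /* plays global_trace        */

	int main(void)
	{
		/* a list with a single entry: head <-> only_one.list <-> head */
		head.next = head.prev = &only_one.list;
		only_one.list.next = only_one.list.prev = &head;

		/* walking without skipping the head visits both nodes, which is
		 * exactly what "crash> list" prints when -H is not given */
		printf("sizeof(struct list_head) = %zu\n", sizeof(struct list_head));
		printf("head  node at %p\n", (void *)&head);
		printf("entry node at %p\n", (void *)&only_one.list);
		return 0;
	}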


> 
> Also this:
> 
> crash> struct trace_array.events -o global_trace  
> struct trace_array {
>   [ffffffffbcdb9be0] struct list_head events;
> }
> crash> list -s trace_event_file.tr -o trace_event_file.list ffffffffbcdb9be0  
> ffffffffbcdb9be0
>   tr = 0xffffffffbcdb7b20
> ffff8ccb45cdfb00
>   tr = 0xffffffffbcdb7e80
> ffff8ccb45cdf580
>   tr = 0xffffffffbcdb7e80
> ffff8ccb45cdfe18
>   tr = 0xffffffffbcdb7e80
> ...
>   tr = 0xffffffffbcdb7e80
> 
> The first one 0xffffffffbcdb7b20 is only 864 bytes away from 0xffffffffbcdb7e80

I'm thinking it is confused by hitting the ftrace_trace_arrays
list_head itself.

> 
> Additional information can be found here:
> https://www.samba.org/~metze/202107221802.reproduce-freeze-05-first-time.v3-pid1/
> -rw-r--r-- 1 metze metze 37250108 Jul 22 18:02 dump.202107221802
> -rw-r--r-- 1 metze metze    63075 Jul 22 18:02 dmesg.202107221802
> -rw-r--r-- 1 metze metze     8820 Jul 22 18:13 crash-bt-a-s.txt
> -rw-r--r-- 1 metze metze    36586 Jul 22 18:14 crash-bt-a-l-FF-s.txt
> -rw-r--r-- 1 metze metze     4798 Jul 22 23:49 crash-bt-p.txt
> -rw-r--r-- 1 metze metze      946 Jul 23 00:13 strange-things.txt
> -rw-r--r-- 1 metze metze      482 Jul 23 00:24 code-and-vmlinux.txt
> 
> The code can be found here:
> https://kernel.ubuntu.com/git/kernel-ppa/mirror/ubuntu-oem-5.10-focal.git/tag/?h=Ubuntu-oem-5.13-5.13.0-1007.7
> 
> And the /usr/lib/debug/boot/vmlinux-5.13.0-1007-oem file can be extracted from here:
> http://ddebs.ubuntu.com/pool/main/l/linux-oem-5.13/linux-image-unsigned-5.13.0-1007-oem-dbgsym_5.13.0-1007.7_amd64.ddeb
> 
> I also needed the latest code from https://github.com/crash-utility/crash.git
> (at commit f53b73e8380bca054cebd2b61ff118c46609429b).
> 
> It would be really great if you (or anyone else on the lists) could have a closer look
> in order to get the problem fixed :-)

Once I triggered it and started looking at what was happening, it
didn't take me too long to figure out where the culprit was.

> 
> I've learned a lot this week and tried hard to find the problem myself,
> but I have to move back to other work for now.
> 

I'm glad you learned a lot. There's a lot of complex code in there, and
it's getting more complex, as you can see with the static_calls.

This is because tracing tries hard to avoid the heisenbug effect. You
know, you see a bug, turn on tracing, and then that bug goes away!

Thus it pulls out all the tricks it can to be as unintrusive on the
system as it can be. And that causes things to get complex really quickly.

Cheers, and thanks for keeping up on this bug!

-- Steve

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: sched_waking vs. set_event_pid crash (Re: Tracing busy processes/threads freezes/stalls the whole machine)
  2021-07-23  2:51               ` Steven Rostedt
@ 2021-07-23 10:35                 ` Stefan Metzmacher
  2021-07-23 11:29                   ` Steven Rostedt
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Metzmacher @ 2021-07-23 10:35 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, linux-trace-devel, io-uring, linux-kernel


[-- Attachment #1.1: Type: text/plain, Size: 5748 bytes --]


Hi Steve,
>> After some days of training:
>> https://training.linuxfoundation.org/training/linux-kernel-debugging-and-security/
>> I was able to get much closer to the problem :-)
> 
> Well, knowing what to look for was not going to be easy.
>
> And I'm sure you were shocked to see what I posted as a fix ;-)

Not really, such problems typically just have one-line fixes :-)

> Assuming this does fix your issue, I sent out a real patch with the
> explanation of what happened in the change log, so that you can see why
> that change was your issue.

Yes, it does the trick, thanks very much!

>> In order to reproduce it and get reliable kexec crash dumps,
>> I needed to give the VM at least 3 cores.
>
> Yes, it did require having this run on multiple CPUs to have a race
> condition trigger, and two cores would have been hard to hit it. I ran
> it on 8 cores and it triggered rather easily.

I mean I was able to trigger the problem with 2 cores, but then
the machine was completely stuck, without triggering the kexec reboot
to get the crash dump. I don't understand why, as the additional CPU
is not really involved at all...

Any idea why kexec could be so unreliable?
(even with 3 or 6 CPUs it sometimes just didn't trigger)

>> While running './io-uring_cp-forever link-cp.c file' (from:
>> https://github.com/metze-samba/liburing/commits/io_uring-cp-forever )
>> in one window, the following simple sequence triggered the problem in most cases:
>>
>> echo 1 > /sys/kernel/tracing/events/sched/sched_waking/enable
>> echo 1 > /sys/kernel/tracing/set_event_pid
> 
> All it took was something busy that did a lot of wakeups while setting
> the set_event_pid, to be able to hit the race easily. As I stated, I
> triggered it with running hackbench instead of the io-uring code. In
> fact, this bug had nothing to do with io-uring, and you were only
> triggering it because you were making enough of a load on the system to
> make the race happen often.

Yes, it was just the example that made it most reliable to trigger for me.

>> pid_list = rcu_dereference_raw(tr->filtered_pids);
> 
> That "tr" comes from the trace_event_file that is passed in by the
> "data" field of the callback. Hence, this callback got the data field
> of the event_filter_pid_sched_wakeup_probe_pre() callback that is
> called before all events when the set_event_pid file is set. That
> means, the "tr" being dereferened was not the "tr" you were looking for.

Yes, that's what I assumed, but didn't find the reason for.

>>
>> It seems it got inlined within trace_event_buffer_reserve()
>>
>> The strangest things I found so far are these:
>>
>> crash> sym global_trace  
>> ffffffffbcdb7e80 (d) global_trace
>> crash> list ftrace_trace_arrays  
>> ffffffffbcdb7e70
>> ffffffffbcdb7e80
>>
>> trace_array has size 7672, but ffffffffbcdb7e70 is only 16 bytes away from
>> ffffffffbcdb7e80.
> 
> ftrace_trace_arrays is a list_head, and I doubt you created any
> instances, thus the list head has only one instance, and that is
> global_trace. Hence, it points to global_trace and itself. It just so
> happens that a list_head is 16 bytes.

Ah, that explains a lot:

crash> struct list_head -o ftrace_trace_arrays
struct list_head {
  [ffffffffbcdb7e70] struct list_head *next;
  [ffffffffbcdb7e78] struct list_head *prev;
}
SIZE: 16
crash> list  ftrace_trace_arrays
ffffffffbcdb7e70
ffffffffbcdb7e80

I guess this is what I want:

crash> list -H ftrace_trace_arrays
ffffffffbcdb7e80

>> Also this:
>>
>> crash> struct trace_array.events -o global_trace  
>> struct trace_array {
>>   [ffffffffbcdb9be0] struct list_head events;
>> }
>> crash> list -s trace_event_file.tr -o trace_event_file.list ffffffffbcdb9be0  
>> ffffffffbcdb9be0
>>   tr = 0xffffffffbcdb7b20
>> ffff8ccb45cdfb00
>>   tr = 0xffffffffbcdb7e80
>> ffff8ccb45cdf580
>>   tr = 0xffffffffbcdb7e80
>> ffff8ccb45cdfe18
>>   tr = 0xffffffffbcdb7e80
>> ...
>>   tr = 0xffffffffbcdb7e80
>>
>> The first one 0xffffffffbcdb7b20 is only 864 bytes away from 0xffffffffbcdb7e80
> 
> I'm thinking it is confused by hitting the ftrace_trace_arrays
> list_head itself.

Yes, I needed the -H here too:

list -s trace_event_file.tr -o trace_event_file.list -H ffffffffbcdb9be0


>> It would be really great if you (or anyone else on the lists) could have a closer look
>> in order to get the problem fixed :-)
> 
> Once I triggered it and started looking at what was happening, it
> didn't take me to long to figure out where the culprit was.

Great!

>>
>> I've learned a lot this week and tried hard to find the problem myself,
>> but I have to move back to other work for now.
>>
> 
> I'm glad you learned a lot. There's a lot of complex code in there, and
> its getting more complex, as you can see with the static_calls.
> 
> This is because tracing tries hard to avoid the heisenbug effect. You
> know, you see a bug, turn on tracing, and then that bug goes away!
> 
> Thus it pulls out all the tricks it can to be as least intrusive on the
> system as it can be. And that causes things to get complex really quick.

I'll have a look; I may need something similar for my smbdirect socket driver
in the future.

> Cheers, and thanks for keeping up on this bug!

Thanks for fixing the bug!

Now I can finally use:

trace-cmd record -e all -P $(pidof io_uring-cp-forever)

But that doesn't include the iou-wrk-* threads,
and the '-c' option seems to work only with forking.

Is there a way to specify "trace *all* threads of the given pid"?
(Note the threads are coming and going, so it's not possible to
specify -P more than once.)

Thanks!
metze


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: sched_waking vs. set_event_pid crash (Re: Tracing busy processes/threads freezes/stalls the whole machine)
  2021-07-23 10:35                 ` Stefan Metzmacher
@ 2021-07-23 11:29                   ` Steven Rostedt
  2021-07-23 11:53                     ` Stefan Metzmacher
  0 siblings, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2021-07-23 11:29 UTC (permalink / raw)
  To: Stefan Metzmacher; +Cc: Ingo Molnar, linux-trace-devel, io-uring, linux-kernel

On Fri, 23 Jul 2021 12:35:09 +0200
Stefan Metzmacher <metze@samba.org> wrote:

> > Assuming this does fix your issue, I sent out a real patch with the
> > explanation of what happened in the change log, so that you can see why
> > that change was your issue.  
> 
> Yes, it does the trick, thanks very much!

Can I get a "Tested-by" from you?

> Now I can finally use:
> 
> trace-cmd record -e all -P $(pidof io_uring-cp-forever)
> 
> But that doesn't include the iou-wrk-* threads
> and the '-c' option seems to only work with forking.

Have you tried it? It should work for threads as well. It hooks to the
sched_process_fork tracepoint, which should be triggered even when a new
thread is created.

Or do you mean that you want that process and all of its threads that are
already running, too? I could probably have it try to add them via the /proc file
system in that case.

Can you start the task via trace-cmd?

  trace-cmd record -e all -F -c io_uring-cp-forever ...


> 
> Is there a way to specify "trace *all* threads of the given pid"?
> (Note the threads are comming and going, so it's not possible to
> specifiy -P more than once)

Right, although you could append tasks manually to the set_event_pid file
from another terminal after starting trace-cmd. Once a pid is added to that
file, all children it makes will also be added. That could be a workaround
until we have trace-cmd do it.
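
Untested sketch of that workaround (assuming tracefs is mounted at
/sys/kernel/tracing and the binary really is called io_uring-cp-forever):

  # seed the filter with every thread that exists right now;
  # '>>' appends to the pid list instead of replacing it
  pid=$(pidof io_uring-cp-forever)
  ls /proc/$pid/task >> /sys/kernel/tracing/set_event_pid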

Care to write a bugzilla report for this feature?

  https://bugzilla.kernel.org/buglist.cgi?component=Trace-cmd%2FKernelshark&list_id=1088173

-- Steve

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: sched_waking vs. set_event_pid crash (Re: Tracing busy processes/threads freezes/stalls the whole machine)
  2021-07-23 11:29                   ` Steven Rostedt
@ 2021-07-23 11:53                     ` Stefan Metzmacher
  2021-07-23 12:41                       ` Steven Rostedt
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Metzmacher @ 2021-07-23 11:53 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, linux-trace-devel, io-uring, linux-kernel


[-- Attachment #1.1: Type: text/plain, Size: 2088 bytes --]


Hi Steve,

>>> Assuming this does fix your issue, I sent out a real patch with the
>>> explanation of what happened in the change log, so that you can see why
>>> that change was your issue.  
>>
>> Yes, it does the trick, thanks very much!
> 
> Can I get a "Tested-by" from you?

Sure!

Can you check whether the static_key_disable() and static_key_enable()
calls are correctly ordered compared to rcu_assign_pointer(),
and explain why they are? I don't really understand how that case is
different from tracepoint_update_call() vs. rcu_assign_pointer().

>> Now I can finally use:
>>
>> trace-cmd record -e all -P $(pidof io_uring-cp-forever)
>>
>> But that doesn't include the iou-wrk-* threads
>> and the '-c' option seems to only work with forking.
> 
> Have you tried it? It should work for threads as well. It hooks to the
> sched_process_fork tracepoint, which should be triggered even when a new
> thread is created.
> 
> Or do you mean that you want that process and all its threads too that are
> already running? I could probably have it try to add it via the /proc file
> system in that case.
> 
> Can you start the task via trace-cmd?
> 
>   trace-cmd record -e all -F -c io_uring-cp-forever ...

I think that would work, but I typically want to analyze a process
that is already running.

>> Is there a way to specify "trace *all* threads of the given pid"?
> >> (Note the threads are coming and going, so it's not possible to
> >> specify -P more than once.)  
> 
> Right, although, you could append tasks manually to the set_event_pid file
> from another terminal after starting trace-cmd. Once a pid is added to that
> file, all children it makes will also be added. That could be a work around
> until we have trace-cmd do it.

Sure, but it will always be racy.

With children, do new threads also count as children, or only fork() children?

> Care to write a bugzilla report for this feature?
> 
>   https://bugzilla.kernel.org/buglist.cgi?component=Trace-cmd%2FKernelshark&list_id=1088173

I may do later...

Thanks!
metze


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: sched_waking vs. set_event_pid crash (Re: Tracing busy processes/threads freezes/stalls the whole machine)
  2021-07-23 11:53                     ` Stefan Metzmacher
@ 2021-07-23 12:41                       ` Steven Rostedt
  0 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2021-07-23 12:41 UTC (permalink / raw)
  To: Stefan Metzmacher; +Cc: Ingo Molnar, linux-trace-devel, io-uring, linux-kernel

On Fri, 23 Jul 2021 13:53:41 +0200
Stefan Metzmacher <metze@samba.org> wrote:

> Hi Steve,
> 
> >>> Assuming this does fix your issue, I sent out a real patch with the
> >>> explanation of what happened in the change log, so that you can see why
> >>> that change was your issue.    
> >>
> >> Yes, it does the trick, thanks very much!  
> > 
> > Can I get a "Tested-by" from you?  
> 
> Sure!

Thanks.

> 
> Can you check whether the static_key_disable() and static_key_enable()
> calls are correctly ordered compared to rcu_assign_pointer(),
> and explain why they are? I don't really understand how that case is
> different from tracepoint_update_call() vs. rcu_assign_pointer().

The order doesn't even matter. I'm assuming you are talking about the
static_key_disable/enable with respect to enabling the tracepoint.

The reason it doesn't matter is that the rcu_assign_pointer is
updating an array of elements that hold both the function to call and
the data it needs. Inside the tracepoint loop where the callbacks are
called, in the iterator code (not the static call), you will see:

		it_func_ptr =						\
			rcu_dereference_raw((&__tracepoint_##_name)->funcs); \
		if (it_func_ptr) {					\
			do {						\
				it_func = READ_ONCE((it_func_ptr)->func); \
				__data = (it_func_ptr)->data;		\
				((void(*)(void *, proto))(it_func))(__data, args); \
			} while ((++it_func_ptr)->func);		\
		}

What that does is get either the old array or the new array and place
it into it_func_ptr. Each element of this array contains
the callback (in .func) and the callback's data (in .data).

The enabling or disabling of the tracepoint doesn't really make a
difference with respect to the order of updating the funcs array.
That's because the users of this will either see the old array, the new
array, or NULL, in that it_func_ptr. This is how RCU works.

The bug we hit was because the static_call was updated separately from
the array. That makes it more imperative that you get the order
correct. As my email stated, with static_calls we have this:

		it_func_ptr =						\
			rcu_dereference_raw((&__tracepoint_##name)->funcs); \
		if (it_func_ptr) {					\
			__data = (it_func_ptr)->data;			\
			static_call(tp_func_##name)(__data, args);	\
		}

Where the issue is that the static_call which chooses which callback to
make, is updated asynchronously from the update of the array.
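
As a rough timeline of that window (my reconstruction of your particular
crash; the concrete callback names are an assumption based on the backtrace,
not something I traced):

	/*
	 *  CPU0: tracepoint_add_func()          CPU1: __DO_TRACE() via static_call
	 *  -----------------------------        -----------------------------------
	 *  rcu_assign_pointer(tp->funcs, new);
	 *                                       it_func_ptr = rcu_dereference(tp->funcs);
	 *                                       __data = it_func_ptr->data;
	 *                                           // first entry of the NEW array: the
	 *                                           // pid-filter "probe_pre", data == tr
	 *                                       static_call(...)(__data, args);
	 *                                           // still the OLD target: the event's
	 *                                           // direct probe, which expects a
	 *                                           // trace_event_file, not a trace_array
	 *  tracepoint_update_call(tp, new);     // too late, the mismatched call happened
	 */

That mismatched data pointer is the bogus "tr" that
trace_event_ignore_this_pid() then dereferenced.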

> 
> >> Now I can finally use:
> >>
> >> trace-cmd record -e all -P $(pidof io_uring-cp-forever)
> >>
> >> But that doesn't include the iou-wrk-* threads
> >> and the '-c' option seems to only work with forking.  
> > 
> > Have you tried it? It should work for threads as well. It hooks to the
> > sched_process_fork tracepoint, which should be triggered even when a new
> > thread is created.
> > 
> > Or do you mean that you want that process and all its threads too that are
> > already running? I could probably have it try to add it via the /proc file
> > system in that case.
> > 
> > Can you start the task via trace-cmd?
> > 
> >   trace-cmd record -e all -F -c io_uring-cp-forever ...  
> 
> I think that would work, but I typically want to analyze a process
> that is already running.
> 
> >> Is there a way to specify "trace *all* threads of the given pid"?
> >> (Note the threads are coming and going, so it's not possible to
> >> specify -P more than once.)  
> > 
> > Right, although, you could append tasks manually to the set_event_pid file
> > from another terminal after starting trace-cmd. Once a pid is added to that
> > file, all children it makes will also be added. That could be a work around
> > until we have trace-cmd do it.  
> 
> Sure, but it will always be racy.

Not really. It depends on what you mean by "racy". You won't be able to
"instantaneously" enable a process and all its threads, but you can
capture all of them after a given point. As you are attaching to a
process that is already running, you have already missed events because
you were not yet tracing. But once you have a thread, you will always have it.

> 
> With children, does new threads also count as children or only fork() children?

New threads. It's the kernel's point of view of a task, which does not
differentiate processes from threads. Everything that can be scheduled
on a CPU is called a "task". How a task interacts with other tasks is
what defines it as a "thread" or a "process".

> 
> > Care to write a bugzilla report for this feature?
> > 
> >   https://bugzilla.kernel.org/buglist.cgi?component=Trace-cmd%2FKernelshark&list_id=1088173  
> 
> I may do later...
> 

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-07-23 12:41 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <293cfb1d-8a53-21e1-83c1-cdb6e2f32c65@samba.org>
2021-05-04 12:26 ` Tracing busy processes/threads freezes/stalls the whole machine Stefan Metzmacher
     [not found] ` <20210504092404.6b12aba4@gandalf.local.home>
     [not found]   ` <f590b26d-c027-cc5a-bcbd-1dc734f72e7e@samba.org>
     [not found]     ` <20210504093550.5719d4bd@gandalf.local.home>
     [not found]       ` <f351bdfa-5223-e457-0396-a24ffa09d6b5@samba.org>
     [not found]         ` <8bb757fb-a83b-0ed5-5247-8273be3925c5@samba.org>
     [not found]           ` <90c806a0-8a2f-1257-7337-6761100217c9@samba.org>
2021-07-22 22:43             ` sched_waking vs. set_event_pid crash (Re: Tracing busy processes/threads freezes/stalls the whole machine) Stefan Metzmacher
2021-07-23  1:41               ` Steven Rostedt
2021-07-23  2:51               ` Steven Rostedt
2021-07-23 10:35                 ` Stefan Metzmacher
2021-07-23 11:29                   ` Steven Rostedt
2021-07-23 11:53                     ` Stefan Metzmacher
2021-07-23 12:41                       ` Steven Rostedt
