archive mirror
 help / color / mirror / Atom feed
From: Stefan Metzmacher <>
To: Steven Rostedt <>
Cc: Ingo Molnar <>,,
	io-uring <>,
	"" <>
Subject: Re: sched_waking vs. set_event_pid crash (Re: Tracing busy processes/threads freezes/stalls the whole machine)
Date: Fri, 23 Jul 2021 12:35:09 +0200	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <20210722225124.6d7d7153@rorschach.local.home>

[-- Attachment #1.1: Type: text/plain, Size: 5748 bytes --]

Hi Steve,
>> After some days of training:
>> I was able to get much closer to the problem :-)
> Well, knowing what to look for was not going to be easy.
> And I'm sure you were shocked to see what I posted as a fix ;-)

Not really, such problems typically just have one line fixes :-)

> Assuming this does fix your issue, I sent out a real patch with the
> explanation of what happened in the change log, so that you can see why
> that change was your issue.

Yes, it does the trick, thanks very much!

>> In order to reproduce it and get reliable kexec crash dumps,
>> I needed to give the VM at least 3 cores.
> Yes, it did require having this run on multiple CPUs to have a race
> condition trigger, and two cores would have been hard to hit it. I ran
> it on 8 cores and it triggered rather easily.

I mean I was able to trigger the problem with 2 cores, but then
the machine was completely stuck, without triggering the kexec reboot
to get the crash dump. I don't understand why as the additional cpu
is not really involved at all..

Any idea why kexec could be so unreliable?
(even with 3 or 6 cpus it sometimes just didn't trigger)

>> While running './io-uring_cp-forever link-cp.c file' (from:
>> )
>> in one window, the following simple sequence triggered the problem in most cases:
>> echo 1 > /sys/kernel/tracing/events/sched/sched_waking/enable
>> echo 1 > /sys/kernel/tracing/set_event_pid
> All it took was something busy that did a lot of wakeups while setting
> the set_event_pid, to be able to hit the race easily. As I stated, I
> triggered it with running hackbench instead of the io-uring code. In
> fact, this bug had nothing to do with io-uring, and you were only
> triggering it because you were making enough of a load on the system to
> make the race happen often.

Yes, it was just the example that made it most reliable to trigger for me.

>> pid_list = rcu_dereference_raw(tr->filtered_pids);
> That "tr" comes from the trace_event_file that is passed in by the
> "data" field of the callback. Hence, this callback got the data field
> of the event_filter_pid_sched_wakeup_probe_pre() callback that is
> called before all events when the set_event_pid file is set. That
> means, the "tr" being dereferened was not the "tr" you were looking for.

Yes, that's what I assumed, but didn't find the reason for.

>> It seems it got inlined within trace_event_buffer_reserve()
>> There strangest things I found so far is this:
>> crash> sym global_trace  
>> ffffffffbcdb7e80 (d) global_trace
>> crash> list ftrace_trace_arrays  
>> ffffffffbcdb7e70
>> ffffffffbcdb7e80
>> trace_array has size 7672, but ffffffffbcdb7e70 is only 16 bytes away from
>> ffffffffbcdb7e80.
> ftrace_trace_arrays is a list_head, and I doubt you created any
> instances, thus the list head has only one instance, and that is
> global_trace. Hence, it points to global_trace and itself. It just so
> happens that a list_head is 16 bytes.

Ah, that explains a lot:

crash> struct list_head -o ftrace_trace_arrays
struct list_head {
  [ffffffffbcdb7e70] struct list_head *next;
  [ffffffffbcdb7e78] struct list_head *prev;
SIZE: 16
crash> list  ftrace_trace_arrays

I guess this is what I want:

crash> list -H ftrace_trace_arrays

>> Also this:
>> crash> struct -o global_trace  
>> struct trace_array {
>>   [ffffffffbcdb9be0] struct list_head events;
>> }
>> crash> list -s -o trace_event_file.list ffffffffbcdb9be0  
>> ffffffffbcdb9be0
>>   tr = 0xffffffffbcdb7b20
>> ffff8ccb45cdfb00
>>   tr = 0xffffffffbcdb7e80
>> ffff8ccb45cdf580
>>   tr = 0xffffffffbcdb7e80
>> ffff8ccb45cdfe18
>>   tr = 0xffffffffbcdb7e80
>> ...
>>   tr = 0xffffffffbcdb7e80
>> The first one 0xffffffffbcdb7b20 is only 864 bytes away from 0xffffffffbcdb7e80
> I'm thinking it is confused by hitting the ftrace_trace_arrays
> list_head itself.

Yes, I needed the -H here too:

list -s -o trace_event_file.list -H ffffffffbcdb9be0

>> It would be really great if you (or anyone else on the lists) could have a closer look
>> in order to get the problem fixed :-)
> Once I triggered it and started looking at what was happening, it
> didn't take me to long to figure out where the culprit was.


>> I've learned a lot this week and tried hard to find the problem myself,
>> but I have to move back to other work for now.
> I'm glad you learned a lot. There's a lot of complex code in there, and
> its getting more complex, as you can see with the static_calls.
> This is because tracing tries hard to avoid the heisenbug effect. You
> know, you see a bug, turn on tracing, and then that bug goes away!
> Thus it pulls out all the tricks it can to be as least intrusive on the
> system as it can be. And that causes things to get complex really quick.

I'll have a look, I may need something similar for my smbdirect socket driver
in future.

> Cheers, and thanks for keeping up on this bug!

Thanks for fixing the bug!

Now I can finally use:

trace-cmd record -e all -P $(pidof io_uring-cp-forever)

But that doesn't include the iou-wrk-* threads
and the '-c' option seems to only work with forking.

Is there a way to specify "trace *all* threads of the given pid"?
(Note the threads are comming and going, so it's not possible to
specifiy -P more than once)


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2021-07-23 10:35 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <>
2021-05-04 12:26 ` Tracing busy processes/threads freezes/stalls the whole machine Stefan Metzmacher
     [not found] ` <20210504092404.6b12aba4@gandalf.local.home>
     [not found]   ` <>
     [not found]     ` <20210504093550.5719d4bd@gandalf.local.home>
     [not found]       ` <>
     [not found]         ` <>
     [not found]           ` <>
2021-07-22 22:43             ` sched_waking vs. set_event_pid crash (Re: Tracing busy processes/threads freezes/stalls the whole machine) Stefan Metzmacher
2021-07-23  1:41               ` Steven Rostedt
2021-07-23  2:51               ` Steven Rostedt
2021-07-23 10:35                 ` Stefan Metzmacher [this message]
2021-07-23 11:29                   ` Steven Rostedt
2021-07-23 11:53                     ` Stefan Metzmacher
2021-07-23 12:41                       ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).