io-uring.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Stefan Metzmacher <metze@samba.org>
To: Steven Rostedt <rostedt@goodmis.org>, Ingo Molnar <mingo@redhat.com>
Cc: linux-trace-devel@vger.kernel.org,
	io-uring <io-uring@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: Tracing busy processes/threads freezes/stalls the whole machine
Date: Tue, 4 May 2021 14:26:18 +0200	[thread overview]
Message-ID: <92e31509-c32e-9f2f-633a-e0ad3d3d1f1b@samba.org> (raw)
In-Reply-To: <293cfb1d-8a53-21e1-83c1-cdb6e2f32c65@samba.org>

+ linux-kernel

I just retested with 5.12 final:

As long as I give the VM more than one cpu, I can easily reproduce the problem with:

> while true; do cat linux-image-5.12.0-rc8-dbg_5.12.0-rc8-22_amd64.deb > /dev/null; done
> and
> pidofcat=$(pidof cat); echo $pidofcat; trace-cmd record -e all -P $pidofcat

On a single cpu VM it doesn't seem to trigger.

Is anyone else able to reproduce this?

Any ideas how to debug this?

It's really annoying to fear a machine freeze when trying to trace problems
on recent kernels...

Thanks for any possible hint...
metze

Am 22.04.21 um 16:26 schrieb Stefan Metzmacher:
> 
> Hi Steven, hi Ingo,
> 
> I recently tried to analyze the performance of Samba using io-uring.
> 
> I was using ubuntu 20.04 with the 5.10.0-1023-oem kernel, which is based on v5.10.25, see:
> https://kernel.ubuntu.com/git/kernel-ppa/mirror/ubuntu-oem-5.10-focal.git/log/?h=oem-5.10-prep
> trace-cmd is at version 2.8.3-4build1.
> 
> In order to find the bottleneck I tried to use (trace-cmd is at version 2.8.3-4build1):
> 
>   trace-cmd -e all -P ${pid_of_io_uring_worker}
> 
> As a result the server was completely dead immediately.
> 
> I tried to reproduce this in a virtual machine (inside virtualbox).
> 
> I used a modified 'io_uring-cp' that loops forever, see:
> https://github.com/metze-samba/liburing/commit/5e98efed053baf03521692e786c1c55690b04d8e
> 
> When I run './io_uring-cp-forever link-cp.c file',
> I see a 'io_wq_manager' and a 'io_wqe_worker-0' kernel thread,
> while './io_uring-cp-forever link-cp.c file' as well as 'io_wqe_worker-0'
> consume about 25% cpu each.
> 
> When I run 'trace-cmd -e all -P $pid' for 'io_wqe_worker-0' or 'io_wq_manager'
> I can reproduce the problem, then I found that the same seems to happen for
> also for other kernel threads e.g. '[kworker/1:1-events]', it seems that
> it happens for all kernel threads, which are not completely idle.
> 
> Which this:
> 
>  From 'top':
>    1341 root      20   0    2512    576    508 S  33,4   0,1   0:10.39 io_uring-cp-for
>    1343 root      20   0       0      0      0 R  29,8   0,0   0:08.43 io_wqe_worker-0
>       7 root      20   0       0      0      0 I   0,3   0,0   0:00.31 kworker/0:1-events
> 
>    PID 5 is [kworker/0:0-ata_sff]
> 
> # trace-cmd record -e all -P 5'
> Hit Ctrl^C to stop recording
> ^CCPU0 data recorded at offset=0x7b8000
>     0 bytes in size
> CPU1 data recorded at offset=0x7b8000
>     69632 bytes in size
> 
> # But
> # trace-cmd record -e all -P 7
> => machine unresponsive (no blinking cursor on the console anymore)
> On the host 'top' shows that the VirtualBoxVM cpu emulator thread 'EMT-1'
> uses 100% cpu, so I guess the guest kernel is in something like an endless
> recursion loop. Maybe a trace function recursing to itself?
> 
> On the same VM I tried a 5.12rc8 kernel and there I can also reproduce the
> problem.
> 
> I also managed to reproduce the problem without io-uring, just using:
> 
>  while true; do cat linux-image-5.12.0-rc8-dbg_5.12.0-rc8-22_amd64.deb > /dev/null; done
> 
> in order to keep some kernel threads moving.
> This happens with 5.12rc8 and 5.10.0-1023-oem, but I was not able to
> reproduce any of this using the 5.8.0-50-generic kernel, see
> https://kernel.ubuntu.com/git/ubuntu/ubuntu-focal.git/log/?h=Ubuntu-hwe-5.8-5.8.0-50.56_20.04.1
> 
> I was also able to reproduce this with a ubuntu 21.04 vm using
> the 5.11.0-14-generic kernel, see:
> https://kernel.ubuntu.com/git/ubuntu/ubuntu-hirsute.git/log/?h=Ubuntu-5.11.0-14.15
> On this one I only managed to reproduce the problem with
> './io_uring-cp-forever link-cp.c file', but not with
> 'while true; do cat linux-image-5.12.0-rc8-dbg_5.12.0-rc8-22_amd64.deb > /dev/null; done'
> 
> 
> So it seems the problem was introduced after 5.8 and is not really related to
> io-uring. And it may not be purely related to kernel threads.
> 
> With this on 5.12-rc8 (again):
> 
>   └─tmux: server,903
>       ├─bash,904
>       │   └─io_uring-cp-for,925 link-cp.c file
>       │       ├─{iou-mgr-925},926
>       │       └─{iou-wrk-925},927
>       └─bash,929
>           └─pstree,938 -a -t -p
> 
> I was able to to trace once:
> 
> root@ub1704-166:~# trace-cmd record -e all -P 925
> Hit Ctrl^C to stop recording
> ^CCPU0 data recorded at offset=0x7b8000
>     10842112 bytes in size
> CPU1 data recorded at offset=0x120f000
>     36450304 bytes in size
> 
> But the 2nd run reproduced the problem:
> root@ub1704-166:~# trace-cmd record -e all -P 925
> 
> I was also able to reproduce it with:
> 
> while true; do cat linux-image-5.12.0-rc8-dbg_5.12.0-rc8-22_amd64.deb > /dev/null; done
> and
> pidofcat=$(pidof cat); echo $pidofcat; trace-cmd record -e all -P $pidofcat
> 
> So it seems any busy thread (user or kernel) triggers the problem.
> 
> Any ideas what has changed after 5.8?
> 
> Thanks!
> metze
> 


  parent reply	other threads:[~2021-05-04 12:26 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-22 14:26 Tracing busy processes/threads freezes/stalls the whole machine Stefan Metzmacher
2021-04-22 14:55 ` Stefan Metzmacher
2021-05-04 12:26 ` Stefan Metzmacher [this message]
2021-05-04 13:24 ` Steven Rostedt
2021-05-04 13:28   ` Stefan Metzmacher
2021-05-04 13:32     ` Steven Rostedt
2021-05-04 13:35     ` Steven Rostedt
2021-05-05  9:50       ` Stefan Metzmacher
2021-05-31 15:39         ` Stefan Metzmacher
2021-07-19 17:07           ` Stefan Metzmacher
2021-07-22 22:43             ` sched_waking vs. set_event_pid crash (Re: Tracing busy processes/threads freezes/stalls the whole machine) Stefan Metzmacher
2021-07-23  1:41               ` Steven Rostedt
2021-07-23  2:51               ` Steven Rostedt
2021-07-23 10:35                 ` Stefan Metzmacher
2021-07-23 11:29                   ` Steven Rostedt
2021-07-23 11:53                     ` Stefan Metzmacher
2021-07-23 12:41                       ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=92e31509-c32e-9f2f-633a-e0ad3d3d1f1b@samba.org \
    --to=metze@samba.org \
    --cc=io-uring@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-devel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).