linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Bruno Goncalves <bgoncalv@redhat.com>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: CKI Project <cki-project@redhat.com>,
	linux-kernel@vger.kernel.org, nathan@kernel.org,
	Memory Management <mm-qe@redhat.com>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: WARNING: CPU: 112 PID: 2041 at kernel/sched/sched.h:1453
Date: Thu, 29 Jul 2021 14:36:56 +0200	[thread overview]
Message-ID: <CA+QYu4rCRR_pNQVxSwGpzcLWJKLUA3F7LAtBiU9hPzz3D0k0Cg@mail.gmail.com> (raw)
In-Reply-To: <d86333e7-bcde-2adb-e566-21ec97cda8bf@arm.com>

On Wed, Jul 28, 2021 at 5:55 PM Dietmar Eggemann
<dietmar.eggemann@arm.com> wrote:
>
> On 28/07/2021 15:11, Bruno Goncalves wrote:
> > Hello,
> >
> > Since this commit (Commit: 45312bd762d3 - Merge tag 'zonefs-5.14-rc2')
> > we started to see the following call trace, it seems to be
> > reproducible only on aarch64.
>
> It should happen on platforms using a slow-switching cpufreq driver.
>
> Only in this case you have n (depends on nbr of frequency domains)
> special-purpose DL threads when using schedutil CPUFreq governor:
>
> root@juno: ps -eTo comm,pid,pri,class | grep sugov
>
> sugov:0           132 140 DLN
> sugov:1           134 140 DLN
>
> >
> > [  384.485614] ------------[ cut here ]------------
> > [  384.490227] rq->clock_update_flags < RQCF_ACT_SKIP
> > [  384.490232] WARNING: CPU: 112 PID: 2041 at
> > kernel/sched/sched.h:1453 sub_running_bw.isra.0+0x190/0x1a0
> > [  384.504312] Modules linked in: mlx5_ib ib_uverbs ib_core rfkill
> > sunrpc acpi_ipmi ipmi_ssif mlx5_core mlxfw psample ipmi_devintf
> > arm_cmn ipmi_msghandler arm_dsu_pmu cppc_cpufreq acpi_tad vfat fat
> > fuse zram ip_tables x_tables xfs crct10dif_ce ghash_ce ast
> > i2c_algo_bit drm_vram_helper sbsa_gwdt drm_kms_helper syscopyarea
> > sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm nvme
> > nvme_core drm xgene_hwmon aes_neon_bs
> > [  384.541165] CPU: 112 PID: 2041 Comm: sugov:112 Tainted: G        W
> >        5.14.0-rc1 #1
> > [  384.549244] Hardware name: WIWYNN Mt.Jade Server System
> > B81.030Z1.0007/Mt.Jade Motherboard, BIOS 1.6.20210526 (SCP:
> > 1.06.20210526) 2021/05/26
> > [  384.561922] pstate: 404000c9 (nZcv daIF +PAN -UAO -TCO BTYPE=--)
> > [  384.567918] pc : sub_running_bw.isra.0+0x190/0x1a0
> > [  384.572698] lr : sub_running_bw.isra.0+0x190/0x1a0
> > [  384.577477] sp : ffff800024c4bb20
> > [  384.580779] x29: ffff800024c4bb20 x28: 0000000000000000 x27: ffffb9a9bbe1d200
> > [  384.587904] x26: 0000000000000074 x25: 0000000000000011 x24: ffffb9a9bdff9000
> > [  384.595029] x23: ffff07ffb36fcaa0 x22: ffff401ee09b65c0 x21: ffffb9a9bbe1de00
> > [  384.602153] x20: ffff401ee09a3360 x19: ffff401ee09b6f58 x18: 0000000000000000
> > [  384.609277] x17: ffff867522f0c000 x16: ffff800010384000 x15: 0000000000000030
> > [  384.616401] x14: 0000000000000000 x13: 50494b535f544341 x12: 5f46435152203c20
> > [  384.623526] x11: ffff401ee04b0ea8 x10: ffff401ee021e068 x9 : ffffb9a9bbe4214c
> > [  384.630650] x8 : 0000000000010ea8 x7 : ffff401ee01e0000 x6 : 0000000000017ffd
> > [  384.637774] x5 : ffff401ee09a3490 x4 : 0000000000000001 x3 : ffff867522f0c000
> > [  384.644898] x2 : ffff401ee09a3498 x1 : ffff07ffb53cc000 x0 : 0000000000000026
> > [  384.652022] Call trace:
> > [  384.654457]  sub_running_bw.isra.0+0x190/0x1a0
> > [  384.658890]  migrate_task_rq_dl+0xf8/0x1e0
> > [  384.662975]  set_task_cpu+0xa8/0x1f0
> > [  384.666540]  try_to_wake_up+0x150/0x3d4
> > [  384.670365]  wake_up_q+0x64/0xc0
> > [  384.673582]  __up_write+0xd0/0x1c0
> > [  384.676974]  up_write+0x4c/0x2b0
> > [  384.680191]  cppc_set_perf+0x120/0x2d0
> > [  384.683931]  cppc_cpufreq_set_target+0xe0/0x1a4 [cppc_cpufreq]
> > [  384.689756]  __cpufreq_driver_target+0x74/0x140
> > [  384.694277]  sugov_work+0x64/0x80
> > [  384.697580]  kthread_worker_fn+0xe0/0x230
> > [  384.701580]  kthread+0x138/0x140
> > [  384.704797]  ret_from_fork+0x10/0x18
>
> Don't quite get this.
> `sugov:112` should be a special DL entity (dl_se->flags &
> SCHED_FLAG_SUGOV) so sub_running_bw() should not call __sub_running_bw()
> and hence there won't be a call to cpufreq_update_util() which calls
> q_clock(rq) -> assert_clock_updated()?
>
> Can't reproduce it on my Juno (arm64) (slow-switching (scpi-cpufreq
> driver)).

We seem to be able to reproduce this only on Ampere Altra machines,
specifically on mtjade and mtsnow cpus.

# cpupower frequency-info
analyzing CPU 0:
  driver: cppc_cpufreq
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 1000 MHz - 2.80 GHz
  available cpufreq governors: conservative ondemand userspace
powersave performance schedutil
  current policy: frequency should be within 2.00 GHz and 2.80 GHz.
                  The governor "schedutil" may decide which speed to use
                  within this range.
  current CPU frequency: 1.55 GHz (asserted by call to hardware)

# ps -eTo comm,pid,pri,class | grep sugov
sugov:0            1082 140 DLN
sugov:1            1085 140 DLN
...
sugov:78           1319 140 DLN
sugov:79           1320 140 DLN


Bruno


>


  reply	other threads:[~2021-07-29 12:37 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-28 13:11 WARNING: CPU: 112 PID: 2041 at kernel/sched/sched.h:1453 Bruno Goncalves
2021-07-28 15:55 ` Dietmar Eggemann
2021-07-29 12:36   ` Bruno Goncalves [this message]
2021-07-29 14:38     ` Dietmar Eggemann
2021-07-30 12:22       ` Dietmar Eggemann
2021-07-30 15:23         ` Bruno Goncalves
2021-08-02  8:43           ` Dietmar Eggemann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+QYu4rCRR_pNQVxSwGpzcLWJKLUA3F7LAtBiU9hPzz3D0k0Cg@mail.gmail.com \
    --to=bgoncalv@redhat.com \
    --cc=cki-project@redhat.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mm-qe@redhat.com \
    --cc=nathan@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).