From: Veronika Kabatova <vkabatov@redhat.com>
To: CKI Project <cki-project@redhat.com>
Cc: skt-results-master@redhat.com,
	Linux Stable maillist <stable@vger.kernel.org>,
	Li Wang <liwang@redhat.com>, Memory Management <mm-qe@redhat.com>,
	Jan Stancek <jstancek@redhat.com>
Subject: Re: ❌ FAIL: Test report for kernel 5.13.2-rc1 (stable, 949241ad)
Date: Tue, 13 Jul 2021 18:31:23 +0200	[thread overview]
Message-ID: <CA+tGwnnq848NswpBS1Es0oobf+Wgqn7sfbsj-=+zV0QFnkazBg@mail.gmail.com> (raw)
In-Reply-To: <cki.AEAEEEA6D2.TXY6U4DQIG@redhat.com>

On Tue, Jul 13, 2021 at 6:16 PM CKI Project <cki-project@redhat.com> wrote:
>
>
> Hello,
>
> We ran automated tests on a recent commit from this kernel tree:
>
>        Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
>             Commit: 949241ad55a9 - Linux 5.13.2-rc1
>
> The results of these automated tests are provided below.
>
>     Overall result: FAILED (see details below)
>              Merge: OK
>            Compile: OK
>              Tests: FAILED
>
> All kernel binaries, config files, and logs are available for download here:
>
>   https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefix=datawarehouse-public/2021/07/12/335680374
>
> One or more kernel tests failed:
>
>     ppc64le:
>      ❌ LTP
>
>     aarch64:
>      ❌ Boot test
>

Hi,

I'm not sure why the reporter is ignoring a panic on ppc64le, but it is visible
in the console log. Direct link:

https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/datawarehouse-public/2021/07/12/335680374/build_ppc64le_redhat%3A1418169689/tests/10288413_ppc64le_1_console.log

Look for "call trace" or "bfq_finish_requeue_request". The panic is 100%
reproducible on POWER8 and goes away if the following patch is reverted:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/commit/?h=linux-5.13.y&id=a483f513670541227e6a31ac7141826b8c785842
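
One way to pull that signature out of a saved copy of the console log is a grep along these lines (a minimal sketch: the file name and the inline sample log are stand-ins, not the real artifact):

```shell
# Stand-in for the downloaded ppc64le console log; in practice this file
# would come from the artifacts link above.
cat > console.log <<'EOF'
[  100.000000] Call Trace:
[  100.000001] bfq_finish_requeue_request
EOF

# Case-insensitive search for the panic signature, with line numbers.
grep -inE 'call trace|bfq_finish_requeue_request' console.log
```

Both strings are matched case-insensitively, so "Call Trace:" in the real log is caught as well.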

We cannot reproduce the aarch64 boot failure reliably. The console log is
available via the artifacts link above; direct link:

https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2021/07/12/335680374/build_aarch64_redhat:1418169681/tests/10288401_aarch64_1_console.log

We couldn't reproduce the lockup trace, but while trying we instead saw some
RCU stalls; the trace was the same on two different runs. We are not sure
whether they are related to the original boot lockup:

rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
rcu: 85-...0: (0 ticks this GP) idle=b9e/1/0x4000000000000000 softirq=421/421 fqs=1706
[  178.152710] (detected by 60, t=6003 jiffies, g=4649, q=3457)
[  178.158450] Task dump for CPU 85:
[  178.161756] task:kworker/u513:2  state:R  running task     stack:    0 pid: 1650 ppid:     2 flags:0x0000000a
[  178.171670] Workqueue: writeback wb_workfn (flush-8:0)
[  178.176813] Call trace:
[  178.179248]  __switch_to+0x108/0x134
[  178.182822]  0xffff000807de4180
[-- MARK -- Tue Jul 13 15:35:00 2021]
[  183.157641] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 85-... } 6485 jiffies s: 833 root: 0x20/.
[  183.168282] rcu: blocking rcu_node structures (internal RCU debug): l=1:80-95:0x20/.
[  183.176025] Task dump for CPU 85:
[  183.179345] task:kworker/u513:2  state:R  running task     stack:    0 pid: 1650 ppid:     2 flags:0x0000000a
[  183.189271] Workqueue: writeback wb_workfn (flush-8:0)
[  183.194409] Call trace:
[  183.196844]  __switch_to+0x108/0x134
[  183.200428]  0xffff000807de4180
[  358.186779] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  358.192699] rcu: 85-...0: (0 ticks this GP) idle=b9e/1/0x4000000000000000 softirq=421/421 fqs=6863
[  jiffies, g=4649, q=12800)
[  358.207852] Task dump for CPU 85:
[  358.211157] task:kworker/u513:2  state:R  running task     stack:    0 pid: 1650 ppid:     2 flags:0x0000000a
[  358.221068] Workqueue: writeback wb_workfn (flush-8:0)
[  358.226206] Call trace:
[  358.228641]  __switch_to+0x108/0x134
[  358.232212]  0xffff000807de4180
[  367.476761] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 85-... } 24917 jiffies s: 833 root: 0x20/.
[  367.487481] rcu: blocking rcu_node structures (internal RCU debug): l=1:80-95:0x20/.
[  367.495224] Task dump for CPU 85:
[  367.498538] task:kworker/u513:2  state:R  running task     stack:    0 pid: 1650 ppid:     2 flags:0x0000000a
[  367.508456] Workqueue: writeback wb_workfn (flush-8:0)
[  367.513594] Call trace:
[  367.516029]  __switch_to+0x108/0x134
[  367.519607]  0xffff000807de4180
[-- MARK -- Tue Jul 13 15:40:00 2021]
[  538.235960] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  538.241879] rcu: 85-...0: (0 ticks this GP) idle=b9e/1/0x4000000000000000 softirq=421/421 fqs=12725
[  538.251112] (detected by 16, t=42013 jiffies, g=4649, q=22145)
[  538.257026] Task dump for CPU 85:
[  538.260331] task:kworker/u513:2  state:R  running task     stack:    0 pid: 1650 ppid:     2 flags:0x0000000a
[  538.270241] Workqueue: writeback wb_workfn (flush-8:0)
[  538.275379] Call trace:
[  538.277814]  __switch_to+0x108/0x134
[  538.281385]  0xffff000807de4180
[  551.795927] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 85-... } 43349 jiffies s: 833 root: 0x20/.
[  551.806641] rcu: blocking rcu_node structures (internal RCU debug): l=1:80-95:0x20/.
[  551.814384] Task dump for CPU 85:
[  551.817697] task:kworker/u513:2  state:R  running task     stack:    0 pid: 1650 ppid:     2 flags:0x0000000a
[  551.827614] Workqueue: writeback wb_workfn (flush-8:0)
[  551.832751] Call trace:
[  551.835186]  __switch_to+0x108/0x134
[  551.838763]  0xffff000807de4180
[  718.285222] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  718.291140] rcu: 85-...0: (0 ticks this GP) idle=b9e/1/0x4000000000000000 softirq=421/421 fqs=18713
[  718.300377] (detected by 48, t=60018 jiffies, g=4649, q=30763)
[  718.306291] Task dump for CPU 85:
[  718.309595] task:kworker/u513:2  state:R  running task     stack:    0 pid: 1650 ppid:     2 flags:0x0000000a
[  718.319505] Workqueue: writeback wb_workfn (flush-8:0)
[  718.324642] Call trace:
[  718.327077]  __switch_to+0x108/0x134
[  718.330647]  0xffff000807de4180
[  736.115176] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 85-... } 61781 jiffies s: 833 root: 0x20/.
[  736.125890] rcu: blocking rcu_node structures (internal RCU debug): l=1:80-95:0x20/.
[  736.133632] Task dump for CPU 85:
[  736.136943] task:kworker/u513:2  state:R  running task     stack:    0 pid: 1650 ppid:     2 flags:0x0000000a
[  736.146861] Workqueue: writeback wb_workfn (flush-8:0)
[  736.151998] Call trace:
[  736.154432]  __switch_to+0x108/0x134
[-- MARK -- Tue Jul 13 15:45:00 2021]
[  891.634611] ------------[ cut here ]------------
[  891.639223] NETDEV WATCHDOG: enp11s0f0np0 (mlx5_core): transmit queue 23 timed out
WARNING: CPU: 63 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x374/0x37c
[  891.655173] Modules linked in: rfkill vfat fat rpcrdma sunrpc
rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser
libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm
mlx5_ib ib_uverbs ib_core acpi_ipmi joydev ipmi_ssif i2c_smbus
mlx5_core psample mlxfw ipmi_devintf ipmi_msghandler thunderx2_pmu
cppc_cpufreq fuse zram ip_tables xfs ast crcm_helper drm_kms_helper
syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm
drm gpio_xlp i2c_xlp9xx uas usb_storage aes_neon_bs
[  891.703738] CPU: 63 PID: 0 Comm: swapper/63 Not tainted 5.13.2-rc1 #1
[  891.710171] Hardware name: HPE ASSY,ARx4z SystemBoard/Comanche_2S_CN99X_ARM, BIOS L50_5.13_1.11 06/18/2019
[  891.719901] pstate: 40400009 (nZcv daif +PAN -UAO -TCO BTYPE=--)
[  891.725900] pc : dev_watchdog+0x374/0x37c
lr : dev_watchdog+0x374/0x37c
[  891.733904] sp : ffff800012b83d60
[  891.737207] x29: ffff800012b83d60 x28: 0000000000000000 x27: ffff800011813000
[  891.744340] x26: 0000000000000001 x25: 0000000000000140 x24: 00000000ffffffff
[  891.751470] x23: 000000000000003f x22: ffff000854f00480 x21: ffff800011d37000
[  891.758600] x20: ffff000854f00000 x19: 0000000000000017 x18: 0000000000000000
[  891.765731] x17: 0000000000000001 x16: 0000000000000019 x15: 000000000000fff x13: ffff800092b83a8f x12: ffff800012b83a97
[  891.779989] x11: ffff008f53a80000 x10: 00000000ffff0000 x9 : ffff800010232988
[  891.787119] x8 : 0000000000000000 x7 : ffff008f53500000 x6 : 000000000002fffd
[  891.794249] x5 : 0000000000000000 x4 : ffff000f5c5e3148 x3 : ffff000f5c5f0ef0
[  891.801377] x2 : ffff000f5c5e3148 x1 : ffff800f4adcd000 x0 : 0000000000000046
[  891.808507] Call trace:
[  891.810944]  dev_watchdog+0x374/0x37c
[  891.818256]  __run_timers.part.0+0x290/0x330
[  891.822517]  run_timer_softirq+0x48/0x80
[  891.826430]  __do_softirq+0x128/0x388
[  891.830084]  __irq_exit_rcu+0x168/0x170
[  891.833915]  irq_exit+0x1c/0x30
[  891.837048]  __handle_domain_irq+0x8c/0xec
[  891.841139]  gic_handle_irq+0x5c/0xdc
[  891.844791]  el1_irq+0xc0/0x148
[  891.847923]  arch_cpu_idle+0x18/0x30
[  891.851490]  default_idle_call+0x4c/0x160
[  891.855494]  do_idle+0xbc/0x110
[  891.862718]  cpu_startup_entry+0x34/0x9c
[  891.866633]  secondary_start_kernel+0xf4/0x120
[  891.871072] ---[ end trace 4867eab29f724990 ]---
[  891.875689] mlx5_core 0000:0b:00.0 enp11s0f0np0: TX timeout detected
[  891.882060] mlx5_core 0000:0b:00.0 enp11s0f0np0: TX timeout on
queue: 23, SQ: 0x195, CQ: 0x9a, SQ Cons: 0x1 SQ Prod: 0x2, usecs since
last trans: 29710000
[  891.895925] mlx5_core 0000:0b:00.0 enp 0xa2
[  891.903522] mlx5_core 0000:0b:00.0 enp11s0f0np0: Recovered 2 eqes on EQ 0x2a
[  898.334537] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
rcu: 85-...0: (0 ticks this GP) idle=b9e/1/0x4000000000000000 softirq=421/421 fqs=24701
[  898.349691] (detected by 54, t=78023 jiffies, g=4649, q=39150)
[  898.355606] Task dump for CPU 85:
[  898.358910] task:kworker/u513:2  state:R  running task     stack:    0 pid: 1650 ppid:     2 flags:0x0000000a
[  898.368821] Workqueue: writeback wb_workfn (flush-8:0)
[  898.373959] Call trace:
[  898.376395]  __switch_to+0x108/0x134
[  898.379965]  0xffff000807de4180
[  920.434478] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 85-... } 80213 jiffies s: 833 root: 0x20/.
[  920.445191] rcu: blocking rcu_node structures (internal RCU debug):
l=1:80-95:0x20/.
[  920.452933] Task dump for CPU 85:
task:kworker/u513:2  state:R  running task     stack:    0 pid: 1650 ppid:     2 flags:0x0000000a
[  920.466163] Workqueue: writeback wb_workfn (flush-8:0)
[  920.471299] Call trace:
[  920.473734]  __switch_to+0x108/0x134
[  920.477312]  0xffff000807de4180
[ 1078.383857] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:



Veronika

> We hope that these logs can help you find the problem quickly. For the full
> detail on our testing procedures, please scroll to the bottom of this message.
>
> Please reply to this email if you have any questions about the tests that we
> ran or if you have any suggestions on how to make future tests more effective.
>
>         ,-.   ,-.
>        ( C ) ( K )  Continuous
>         `-',-.`-'   Kernel
>           ( I )     Integration
>            `-'
> ______________________________________________________________________________
>
> Compile testing
> ---------------
>
> We compiled the kernel for 4 architectures:
>

<snip>


