From: Karsten Graul <kgraul@linux.ibm.com>
To: Tony Lu <tonylu@linux.alibaba.com>, Alaa Hleihel <alaa@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>, Jason Gunthorpe <jgg@ziepe.ca>,
linux-rdma@vger.kernel.org
Subject: Re: RDMA/mlx5: Regression since v5.15-rc5: Kernel panic when called ib_dereg_mr
Date: Tue, 21 Dec 2021 09:20:28 +0100
Message-ID: <8b764027-4f25-e27d-15f9-7466343cf845@linux.ibm.com>
In-Reply-To: <9974ea8c-f1cb-aeb4-cf1b-19d37536894a@linux.alibaba.com>
On 21/12/2021 09:04, Tony Lu wrote:
> Hello,
>
> While developing and testing SMC (net/smc), we found a problem:
> when SMC released a link group or link, it called ib_dereg_mr to release
> resources, and the kernel panicked in mlx5_ib_dereg_mr. After investigation,
> we found that the panic was introduced by this commit:
>
> f0ae4afe3d35 ("RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow")
+1, we hit this panic in our environment too:
[ 380.055202] smc: SMC-R lg 00000200 link removed: id 00000201, peerid 00000101, ibdev mlx5_0, ibport 1
[ 380.055230] smc: SMC-R lg 00000100 state changed: SINGLE, pnetid NET10
[ 380.055605] Unable to handle kernel pointer dereference in virtual kernel address space
[ 380.055607] Failing address: 7563745f64657000 TEID: 7563745f64657803
[ 380.055609] Fault in home space mode while using kernel ASCE.
[ 380.055613] AS:0000000124abc007 R3:0000000000000024
[ 380.055650] Oops: 0038 ilc:3 [#1] SMP
[ 380.055655] Modules linked in: dummy smc_diag smc tcp_diag ...
[ 380.055698] CPU: 2 PID: 21939 Comm: kworker/2:22 Not tainted 5.16.0-20211220.rc5.git0.c4a510cd6ab8.300.fc35.s390x #1
[ 380.055700] Hardware name: IBM 8561 T01 701 (z/VM 7.2.0)
[ 380.055702] Workqueue: events smc_link_down_work [smc]
[ 380.055717] Krnl PSW : 0704e00180000000 000000012311abbc (dma_unmap_sg_attrs+0x1c/0x68)
[ 380.055729] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[ 380.055732] Krnl GPRS: 0000000000000018 000000012311aba0 7563745f64657461 000000010232f003
[ 380.055735] 0000000002330003 0000000000000000 0000000000000000 0000000000000000
[ 380.055738] 0000000000000000 000000008fe64000 0000000084cd6000 000000008fe64000
[ 380.055740] 0000000035244200 00000000b669c248 000003800a077a68 000003800a077a10
[ 380.055748] Krnl Code: 000000012311abac: b90400ef lgr %r14,%r15
000000012311abb0: e3f0ffa8ff71 lay %r15,-88(%r15)
#000000012311abb6: e3e0f0980024 stg %r14,152(%r15)
>000000012311abbc: e3b021300002 ltg %r11,304(%r2)
000000012311abc2: a7840013 brc 8,000000012311abe8
000000012311abc6: ec52001d027f clij %r5,2,2,000000012311ac00
000000012311abcc: e310b0580002 ltg %r1,88(%r11)
000000012311abd2: a7840005 brc 8,000000012311abdc
[ 380.055775] Call Trace:
[ 380.055777] [<000000012311abbc>] dma_unmap_sg_attrs+0x1c/0x68
[ 380.055780] [<000003ff80560bd2>] __ib_umem_release+0xc2/0xd8 [ib_uverbs]
[ 380.055797] [<000003ff805610a6>] ib_umem_release+0x4e/0xe0 [ib_uverbs]
[ 380.055806] [<000003ff804fe7ca>] mlx5_ib_dereg_mr.localalias+0x212/0x480 [mlx5_ib]
[ 380.055830] [<000003ff803a0ddc>] ib_dereg_mr_user+0x5c/0xe0 [ib_core]
[ 380.055878] [<000003ff806c249c>] smcr_buf_unmap_link+0x64/0xe0 [smc]
[ 380.055887] [<000003ff806c2cb2>] smcr_link_clear.part.0+0x72/0x230 [smc]
[ 380.055896] [<000003ff806c6364>] smcr_link_down+0xc4/0x1b8 [smc]
[ 380.055902] [<000003ff806c64be>] smc_link_down_work+0x66/0x88 [smc]
[ 380.055909] [<00000001230a2b02>] process_one_work+0x1fa/0x470
[ 380.055913] [<00000001230a32a4>] worker_thread+0x64/0x498
[ 380.055915] [<00000001230aaf5c>] kthread+0x17c/0x188
[ 380.055919] [<00000001230333c4>] __ret_from_fork+0x3c/0x58
[ 380.055922] [<0000000123bc46ba>] ret_from_fork+0xa/0x40
[ 380.055927] Last Breaking-Event-Address:
[ 380.055929] [<000003ff8054e2a8>] 0x3ff8054e2a8
[ 380.055940] Kernel panic - not syncing: Fatal exception: panic_on_oops