linux-rdma.vger.kernel.org archive mirror
From: Karsten Graul <kgraul@linux.ibm.com>
To: Tony Lu <tonylu@linux.alibaba.com>, Alaa Hleihel <alaa@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>, Jason Gunthorpe <jgg@ziepe.ca>,
	linux-rdma@vger.kernel.org
Subject: Re: RDMA/mlx5: Regression since v5.15-rc5: Kernel panic when called ib_dereg_mr
Date: Tue, 21 Dec 2021 09:20:28 +0100
Message-ID: <8b764027-4f25-e27d-15f9-7466343cf845@linux.ibm.com>
In-Reply-To: <9974ea8c-f1cb-aeb4-cf1b-19d37536894a@linux.alibaba.com>

On 21/12/2021 09:04, Tony Lu wrote:
> Hello,
> 
> While developing and testing SMC (net/smc), we found a problem:
> when SMC released a link group or link, it called ib_dereg_mr to release
> resources and then panicked in mlx5_ib_dereg_mr. After investigation,
> we found that this panic was introduced by this commit:
> 
>     f0ae4afe3d35 ("RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow")

+1, we see the same panic in our environment:

[  380.055202] smc: SMC-R lg 00000200 link removed: id 00000201, peerid 00000101, ibdev mlx5_0, ibport 1
[  380.055230] smc: SMC-R lg 00000100 state changed: SINGLE, pnetid NET10           
[  380.055605] Unable to handle kernel pointer dereference in virtual kernel address space
[  380.055607] Failing address: 7563745f64657000 TEID: 7563745f64657803
[  380.055609] Fault in home space mode while using kernel ASCE.
[  380.055613] AS:0000000124abc007 R3:0000000000000024 
[  380.055650] Oops: 0038 ilc:3 [#1] SMP 
[  380.055655] Modules linked in: dummy smc_diag smc tcp_diag ...
[  380.055698] CPU: 2 PID: 21939 Comm: kworker/2:22 Not tainted 5.16.0-20211220.rc5.git0.c4a510cd6ab8.300.fc35.s390x #1
[  380.055700] Hardware name: IBM 8561 T01 701 (z/VM 7.2.0)
[  380.055702] Workqueue: events smc_link_down_work [smc]
[  380.055717] Krnl PSW : 0704e00180000000 000000012311abbc (dma_unmap_sg_attrs+0x1c/0x68)
[  380.055729]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[  380.055732] Krnl GPRS: 0000000000000018 000000012311aba0 7563745f64657461 000000010232f003
[  380.055735]            0000000002330003 0000000000000000 0000000000000000 0000000000000000
[  380.055738]            0000000000000000 000000008fe64000 0000000084cd6000 000000008fe64000
[  380.055740]            0000000035244200 00000000b669c248 000003800a077a68 000003800a077a10
[  380.055748] Krnl Code: 000000012311abac: b90400ef		lgr	%r14,%r15
                          000000012311abb0: e3f0ffa8ff71	lay	%r15,-88(%r15)
                         #000000012311abb6: e3e0f0980024	stg	%r14,152(%r15)
                         >000000012311abbc: e3b021300002	ltg	%r11,304(%r2)
                          000000012311abc2: a7840013		brc	8,000000012311abe8
                          000000012311abc6: ec52001d027f	clij	%r5,2,2,000000012311ac00
                          000000012311abcc: e310b0580002	ltg	%r1,88(%r11)
                          000000012311abd2: a7840005		brc	8,000000012311abdc
[  380.055775] Call Trace:
[  380.055777]  [<000000012311abbc>] dma_unmap_sg_attrs+0x1c/0x68 
[  380.055780]  [<000003ff80560bd2>] __ib_umem_release+0xc2/0xd8 [ib_uverbs] 
[  380.055797]  [<000003ff805610a6>] ib_umem_release+0x4e/0xe0 [ib_uverbs] 
[  380.055806]  [<000003ff804fe7ca>] mlx5_ib_dereg_mr.localalias+0x212/0x480 [mlx5_ib] 
[  380.055830]  [<000003ff803a0ddc>] ib_dereg_mr_user+0x5c/0xe0 [ib_core] 
[  380.055878]  [<000003ff806c249c>] smcr_buf_unmap_link+0x64/0xe0 [smc] 
[  380.055887]  [<000003ff806c2cb2>] smcr_link_clear.part.0+0x72/0x230 [smc] 
[  380.055896]  [<000003ff806c6364>] smcr_link_down+0xc4/0x1b8 [smc] 
[  380.055902]  [<000003ff806c64be>] smc_link_down_work+0x66/0x88 [smc] 
[  380.055909]  [<00000001230a2b02>] process_one_work+0x1fa/0x470 
[  380.055913]  [<00000001230a32a4>] worker_thread+0x64/0x498 
[  380.055915]  [<00000001230aaf5c>] kthread+0x17c/0x188 
[  380.055919]  [<00000001230333c4>] __ret_from_fork+0x3c/0x58 
[  380.055922]  [<0000000123bc46ba>] ret_from_fork+0xa/0x40 
[  380.055927] Last Breaking-Event-Address:
[  380.055929]  [<000003ff8054e2a8>] 0x3ff8054e2a8
[  380.055940] Kernel panic - not syncing: Fatal exception: panic_on_oops
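
As a quick sanity check of the register dump above, here is a minimal
Python sketch (not part of the original report). It takes the %r2 value
from the Krnl GPRS line, the displacement from the faulting instruction
"ltg %r11,304(%r2)", and the reported failing address, all copied from
the oops. The helper names as_ascii and page_base are made up for
illustration, and the 4 KiB page size is an assumption. In the s390x
calling convention %r2 carries the first argument, which for
dma_unmap_sg_attrs is the struct device pointer.

```python
# Values copied from the oops above.
R2 = 0x7563745F64657461               # Krnl GPRS, third value (%r2)
DISPLACEMENT = 304                    # from "ltg %r11,304(%r2)"
FAILING_ADDRESS = 0x7563745F64657000  # from the "Failing address" line


def as_ascii(value, width=8):
    """Render a register value as big-endian bytes (s390x is big-endian)."""
    return value.to_bytes(width, "big").decode("ascii", errors="replace")


def page_base(addr, page_size=4096):
    """Round an address down to its page boundary (4 KiB page assumed)."""
    return addr & ~(page_size - 1)


# Address the faulting load tried to read from.
effective = R2 + DISPLACEMENT

print("r2 as text:       ", as_ascii(R2))
print("page of r2 + 304: ", hex(page_base(effective)))
print("matches oops:     ", page_base(effective) == FAILING_ADDRESS)
```

The %r2 value decodes to printable ASCII, and the page of %r2 + 304
matches the reported failing address. That would be consistent with the
device pointer having been read from memory that actually holds string
data rather than a valid pointer, though the dump alone does not show
where that data came from.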


Thread overview: 5+ messages
2021-12-21  8:04 RDMA/mlx5: Regression since v5.15-rc5: Kernel panic when called ib_dereg_mr Tony Lu
2021-12-21  8:20 ` Karsten Graul [this message]
2021-12-21 10:26   ` Leon Romanovsky
2021-12-22  7:37 ` Thorsten Leemhuis
2022-01-08 11:41   ` RDMA/mlx5: Regression since v5.15-rc5: Kernel panic when called ib_dereg_mr #forregzbot Thorsten Leemhuis
