All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net 0/2] Two fixes for SMCRv2
@ 2023-05-26 11:48 Wen Gu
  2023-05-26 11:49 ` [PATCH net 1/2] net/smc: Scan from current RMB list when no position specified Wen Gu
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Wen Gu @ 2023-05-26 11:48 UTC (permalink / raw)
  To: kgraul, wenjia, jaka, davem, edumazet, kuba, pabeni
  Cc: linux-s390, netdev, linux-kernel

This patch set includes two bugfix for SMCRv2.

Wen Gu (2):
  net/smc: Scan from current RMB list when no position specified
  net/smc: Don't use RMBs not mapped to new link in SMCRv2 ADD LINK

 net/smc/smc_llc.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH net 1/2] net/smc: Scan from current RMB list when no position specified
  2023-05-26 11:48 [PATCH net 0/2] Two fixes for SMCRv2 Wen Gu
@ 2023-05-26 11:49 ` Wen Gu
  2023-05-30 20:08   ` Wenjia Zhang
  2023-05-26 11:49 ` [PATCH net 2/2] net/smc: Don't use RMBs not mapped to new link in SMCRv2 ADD LINK Wen Gu
  2023-05-30  9:30 ` [PATCH net 0/2] Two fixes for SMCRv2 patchwork-bot+netdevbpf
  2 siblings, 1 reply; 9+ messages in thread
From: Wen Gu @ 2023-05-26 11:49 UTC (permalink / raw)
  To: kgraul, wenjia, jaka, davem, edumazet, kuba, pabeni
  Cc: linux-s390, netdev, linux-kernel

When finding the first RMB of link group, it should start from the
current RMB list whose index is 0. So fix it.

Fixes: b4ba4652b3f8 ("net/smc: extend LLC layer for SMC-Rv2")
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
 net/smc/smc_llc.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c
index a0840b8..8423e8e 100644
--- a/net/smc/smc_llc.c
+++ b/net/smc/smc_llc.c
@@ -578,7 +578,10 @@ static struct smc_buf_desc *smc_llc_get_next_rmb(struct smc_link_group *lgr,
 {
 	struct smc_buf_desc *buf_next;
 
-	if (!buf_pos || list_is_last(&buf_pos->list, &lgr->rmbs[*buf_lst])) {
+	if (!buf_pos)
+		return _smc_llc_get_next_rmb(lgr, buf_lst);
+
+	if (list_is_last(&buf_pos->list, &lgr->rmbs[*buf_lst])) {
 		(*buf_lst)++;
 		return _smc_llc_get_next_rmb(lgr, buf_lst);
 	}
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net 2/2] net/smc: Don't use RMBs not mapped to new link in SMCRv2 ADD LINK
  2023-05-26 11:48 [PATCH net 0/2] Two fixes for SMCRv2 Wen Gu
  2023-05-26 11:49 ` [PATCH net 1/2] net/smc: Scan from current RMB list when no position specified Wen Gu
@ 2023-05-26 11:49 ` Wen Gu
  2023-05-27 10:22   ` Wenjia Zhang
  2023-05-30  9:30 ` [PATCH net 0/2] Two fixes for SMCRv2 patchwork-bot+netdevbpf
  2 siblings, 1 reply; 9+ messages in thread
From: Wen Gu @ 2023-05-26 11:49 UTC (permalink / raw)
  To: kgraul, wenjia, jaka, davem, edumazet, kuba, pabeni
  Cc: linux-s390, netdev, linux-kernel

We encountered a crash when using SMCRv2. It is caused by a logical
error in smc_llc_fill_ext_v2().

 BUG: kernel NULL pointer dereference, address: 0000000000000014
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 7 PID: 453 Comm: kworker/7:4 Kdump: loaded Tainted: G        W   E      6.4.0-rc3+ #44
 Workqueue: events smc_llc_add_link_work [smc]
 RIP: 0010:smc_llc_fill_ext_v2+0x117/0x280 [smc]
 RSP: 0018:ffffacb5c064bd88 EFLAGS: 00010282
 RAX: ffff9a6bc1c3c02c RBX: ffff9a6be3558000 RCX: 0000000000000000
 RDX: 0000000000000002 RSI: 0000000000000002 RDI: 000000000000000a
 RBP: ffffacb5c064bdb8 R08: 0000000000000040 R09: 000000000000000c
 R10: ffff9a6bc0910300 R11: 0000000000000002 R12: 0000000000000000
 R13: 0000000000000002 R14: ffff9a6bc1c3c02c R15: ffff9a6be3558250
 FS:  0000000000000000(0000) GS:ffff9a6eefdc0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000014 CR3: 000000010b078003 CR4: 00000000003706e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  <TASK>
  smc_llc_send_add_link+0x1ae/0x2f0 [smc]
  smc_llc_srv_add_link+0x2c9/0x5a0 [smc]
  ? cc_mkenc+0x40/0x60
  smc_llc_add_link_work+0xb8/0x140 [smc]
  process_one_work+0x1e5/0x3f0
  worker_thread+0x4d/0x2f0
  ? __pfx_worker_thread+0x10/0x10
  kthread+0xe5/0x120
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x2c/0x50
  </TASK>

When an alernate RNIC is available in system, SMC will try to add a new
link based on the RNIC for resilience. All the RMBs in use will be mapped
to the new link. Then the RMBs' MRs corresponding to the new link will be
filled into SMCRv2 LLC ADD LINK messages.

However, smc_llc_fill_ext_v2() mistakenly accesses to unused RMBs which
haven't been mapped to the new link and have no valid MRs, thus causing
a crash. So this patch fixes the logic.

Fixes: b4ba4652b3f8 ("net/smc: extend LLC layer for SMC-Rv2")
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
 net/smc/smc_llc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c
index 8423e8e..7a8d916 100644
--- a/net/smc/smc_llc.c
+++ b/net/smc/smc_llc.c
@@ -617,6 +617,8 @@ static int smc_llc_fill_ext_v2(struct smc_llc_msg_add_link_v2_ext *ext,
 		goto out;
 	buf_pos = smc_llc_get_first_rmb(lgr, &buf_lst);
 	for (i = 0; i < ext->num_rkeys; i++) {
+		while (buf_pos && !(buf_pos)->used)
+			buf_pos = smc_llc_get_next_rmb(lgr, &buf_lst, buf_pos);
 		if (!buf_pos)
 			break;
 		rmb = buf_pos;
@@ -626,8 +628,6 @@ static int smc_llc_fill_ext_v2(struct smc_llc_msg_add_link_v2_ext *ext,
 			cpu_to_be64((uintptr_t)rmb->cpu_addr) :
 			cpu_to_be64((u64)sg_dma_address(rmb->sgt[lnk_idx].sgl));
 		buf_pos = smc_llc_get_next_rmb(lgr, &buf_lst, buf_pos);
-		while (buf_pos && !(buf_pos)->used)
-			buf_pos = smc_llc_get_next_rmb(lgr, &buf_lst, buf_pos);
 	}
 	len += i * sizeof(ext->rt[0]);
 out:
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net 2/2] net/smc: Don't use RMBs not mapped to new link in SMCRv2 ADD LINK
  2023-05-26 11:49 ` [PATCH net 2/2] net/smc: Don't use RMBs not mapped to new link in SMCRv2 ADD LINK Wen Gu
@ 2023-05-27 10:22   ` Wenjia Zhang
  2023-05-27 15:20     ` Wen Gu
  0 siblings, 1 reply; 9+ messages in thread
From: Wenjia Zhang @ 2023-05-27 10:22 UTC (permalink / raw)
  To: Wen Gu, kgraul, jaka, davem, edumazet, kuba, pabeni
  Cc: linux-s390, netdev, linux-kernel



On 26.05.23 13:49, Wen Gu wrote:
> We encountered a crash when using SMCRv2. It is caused by a logical
> error in smc_llc_fill_ext_v2().
> 
>   BUG: kernel NULL pointer dereference, address: 0000000000000014
>   #PF: supervisor read access in kernel mode
>   #PF: error_code(0x0000) - not-present page
>   PGD 0 P4D 0
>   Oops: 0000 [#1] PREEMPT SMP PTI
>   CPU: 7 PID: 453 Comm: kworker/7:4 Kdump: loaded Tainted: G        W   E      6.4.0-rc3+ #44
>   Workqueue: events smc_llc_add_link_work [smc]
>   RIP: 0010:smc_llc_fill_ext_v2+0x117/0x280 [smc]
>   RSP: 0018:ffffacb5c064bd88 EFLAGS: 00010282
>   RAX: ffff9a6bc1c3c02c RBX: ffff9a6be3558000 RCX: 0000000000000000
>   RDX: 0000000000000002 RSI: 0000000000000002 RDI: 000000000000000a
>   RBP: ffffacb5c064bdb8 R08: 0000000000000040 R09: 000000000000000c
>   R10: ffff9a6bc0910300 R11: 0000000000000002 R12: 0000000000000000
>   R13: 0000000000000002 R14: ffff9a6bc1c3c02c R15: ffff9a6be3558250
>   FS:  0000000000000000(0000) GS:ffff9a6eefdc0000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 0000000000000014 CR3: 000000010b078003 CR4: 00000000003706e0
>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>   Call Trace:
>    <TASK>
>    smc_llc_send_add_link+0x1ae/0x2f0 [smc]
>    smc_llc_srv_add_link+0x2c9/0x5a0 [smc]
>    ? cc_mkenc+0x40/0x60
>    smc_llc_add_link_work+0xb8/0x140 [smc]
>    process_one_work+0x1e5/0x3f0
>    worker_thread+0x4d/0x2f0
>    ? __pfx_worker_thread+0x10/0x10
>    kthread+0xe5/0x120
>    ? __pfx_kthread+0x10/0x10
>    ret_from_fork+0x2c/0x50
>    </TASK>
> 
> When an alernate RNIC is available in system, SMC will try to add a new
> link based on the RNIC for resilience. All the RMBs in use will be mapped
> to the new link. Then the RMBs' MRs corresponding to the new link will be
> filled into SMCRv2 LLC ADD LINK messages.
> 
> However, smc_llc_fill_ext_v2() mistakenly accesses to unused RMBs which
> haven't been mapped to the new link and have no valid MRs, thus causing
> a crash. So this patch fixes the logic.
> 
> Fixes: b4ba4652b3f8 ("net/smc: extend LLC layer for SMC-Rv2")
> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
> ---
>   net/smc/smc_llc.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c
> index 8423e8e..7a8d916 100644
> --- a/net/smc/smc_llc.c
> +++ b/net/smc/smc_llc.c
> @@ -617,6 +617,8 @@ static int smc_llc_fill_ext_v2(struct smc_llc_msg_add_link_v2_ext *ext,
>   		goto out;
>   	buf_pos = smc_llc_get_first_rmb(lgr, &buf_lst);
>   	for (i = 0; i < ext->num_rkeys; i++) {
> +		while (buf_pos && !(buf_pos)->used)
> +			buf_pos = smc_llc_get_next_rmb(lgr, &buf_lst, buf_pos);
>   		if (!buf_pos)
>   			break;
>   		rmb = buf_pos;
> @@ -626,8 +628,6 @@ static int smc_llc_fill_ext_v2(struct smc_llc_msg_add_link_v2_ext *ext,
>   			cpu_to_be64((uintptr_t)rmb->cpu_addr) :
>   			cpu_to_be64((u64)sg_dma_address(rmb->sgt[lnk_idx].sgl));
>   		buf_pos = smc_llc_get_next_rmb(lgr, &buf_lst, buf_pos);
> -		while (buf_pos && !(buf_pos)->used)
> -			buf_pos = smc_llc_get_next_rmb(lgr, &buf_lst, buf_pos);
>   	}
>   	len += i * sizeof(ext->rt[0]);
>   out:

I'm wondering if this crash is introduced by the first fix patch you wrote.

Thanks,
Wenjia

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net 2/2] net/smc: Don't use RMBs not mapped to new link in SMCRv2 ADD LINK
  2023-05-27 10:22   ` Wenjia Zhang
@ 2023-05-27 15:20     ` Wen Gu
  2023-05-30 20:34       ` Wenjia Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Wen Gu @ 2023-05-27 15:20 UTC (permalink / raw)
  To: Wenjia Zhang, kgraul, jaka, davem, edumazet, kuba, pabeni
  Cc: linux-s390, netdev, linux-kernel



On 2023/5/27 18:22, Wenjia Zhang wrote:
> 
> I'm wondering if this crash is introduced by the first fix patch you wrote.
> 
> Thanks,
> Wenjia

Hi Wenjia,

No, the crash can be reproduced without my two patches by the following steps:

1. Each side activates only one RNIC firstly and set the default sndbuf/RMB sizes to more
    than 16KB, such as 64KB, through sysctl net.smc.{wmem | rmem}.
    (The reason why initial sndbufs/RMBs size needs to be larger than 16KB will be explained later)

2. Use SMCRv2 in any test, just to create a link group that has some alloced RMBs.

    Example of step #1 #2:

    [server]
    smcr ueid add 1234
    sysctl net.smc.rmem=65536
    sysctl net.smc.wmem=65536
    smc_run sockperf sr --tcp

    [client]
    smcr ueid add 1234
    sysctl net.smc.rmem=65536
    sysctl net.smc.wmem=65536
    smc_run sockperf pp --tcp -i <server ip> -t <time>


3. Change the default sndbuf/RMB sizes, make sure they are larger than initial size above,
    such as 256KB.

4. Then rerun the test, and there will be some bigger RMBs alloced. And when the test is
    running, activate the second alternate RNIC of each side. It will trigger to add a new
    link and do what I described in the second patch's commit log, that only map the in-use
    256KB RMBs to new link but try to access the unused 64KB RMBs' invalid mr[new_link->lnk_idx].

    Example of step #3 #4:

    [server]
    sysctl net.smc.rmem=262144
    sysctl net.smc.wmem=262144
    smc_run sockperf sr --tcp

    [client]
    sysctl net.smc.rmem=262144
    sysctl net.smc.wmem=262144
    smc_run sockperf pp --tcp -i <server ip> -t <time>

    When the sockperf is running:

    [server/client]
    ip link set dev <2nd RNIC> up	# activate the second alternate RNIC, then crash occurs.


At the beginning, I only found the crash in the second patch. But when I try to fix it,
I found the issue descibed in the first patch.

In first patch, if I understand correctly, smc_llc_get_first_rmb() is aimed to get the first
RMB in lgr->rmb[*]. If so, It should start from lgr->rmbs[0] instead of lgr->rmbs[1], right?

Then back to the reason needs to be explained in step #1. Because of the issue mentioned
above in smc_llc_get_first_rmb(), if we set the initial sndbuf/RMB sizes to 16KB, these 16KB
RMBs (in lgr->rmbs[0]) alloced in step #2 will happen not to be accessed in step #4, so the
potential crash is hided.

So, the crash is not introduced by the first fix. Instead, it is the first issue that may hide
the second issue(crash) in special cases.

I am a little curious why you think the first fix patch caused the second crash? Is
something wrong in the first fix patch?

Thanks for your review!

Regards,
Wen Gu

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net 0/2] Two fixes for SMCRv2
  2023-05-26 11:48 [PATCH net 0/2] Two fixes for SMCRv2 Wen Gu
  2023-05-26 11:49 ` [PATCH net 1/2] net/smc: Scan from current RMB list when no position specified Wen Gu
  2023-05-26 11:49 ` [PATCH net 2/2] net/smc: Don't use RMBs not mapped to new link in SMCRv2 ADD LINK Wen Gu
@ 2023-05-30  9:30 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 9+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-05-30  9:30 UTC (permalink / raw)
  To: Wen Gu
  Cc: kgraul, wenjia, jaka, davem, edumazet, kuba, pabeni, linux-s390,
	netdev, linux-kernel

Hello:

This series was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Fri, 26 May 2023 19:48:59 +0800 you wrote:
> This patch set includes two bugfix for SMCRv2.
> 
> Wen Gu (2):
>   net/smc: Scan from current RMB list when no position specified
>   net/smc: Don't use RMBs not mapped to new link in SMCRv2 ADD LINK
> 
>  net/smc/smc_llc.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)

Here is the summary with links:
  - [net,1/2] net/smc: Scan from current RMB list when no position specified
    https://git.kernel.org/netdev/net/c/b24aa141c2ff
  - [net,2/2] net/smc: Don't use RMBs not mapped to new link in SMCRv2 ADD LINK
    https://git.kernel.org/netdev/net/c/71c6aa0305e3

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net 1/2] net/smc: Scan from current RMB list when no position specified
  2023-05-26 11:49 ` [PATCH net 1/2] net/smc: Scan from current RMB list when no position specified Wen Gu
@ 2023-05-30 20:08   ` Wenjia Zhang
  0 siblings, 0 replies; 9+ messages in thread
From: Wenjia Zhang @ 2023-05-30 20:08 UTC (permalink / raw)
  To: Wen Gu, kgraul, jaka, davem, edumazet, kuba, pabeni
  Cc: linux-s390, netdev, linux-kernel



On 26.05.23 13:49, Wen Gu wrote:
> When finding the first RMB of link group, it should start from the
> current RMB list whose index is 0. So fix it.
> 
> Fixes: b4ba4652b3f8 ("net/smc: extend LLC layer for SMC-Rv2")
> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
> ---
>   net/smc/smc_llc.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c
> index a0840b8..8423e8e 100644
> --- a/net/smc/smc_llc.c
> +++ b/net/smc/smc_llc.c
> @@ -578,7 +578,10 @@ static struct smc_buf_desc *smc_llc_get_next_rmb(struct smc_link_group *lgr,
>   {
>   	struct smc_buf_desc *buf_next;
>   
> -	if (!buf_pos || list_is_last(&buf_pos->list, &lgr->rmbs[*buf_lst])) {
> +	if (!buf_pos)
> +		return _smc_llc_get_next_rmb(lgr, buf_lst);
> +
> +	if (list_is_last(&buf_pos->list, &lgr->rmbs[*buf_lst])) {
>   		(*buf_lst)++;
>   		return _smc_llc_get_next_rmb(lgr, buf_lst);
>   	}
It seems too late, but still, why not? :

-	if (!buf_pos || list_is_last(&buf_pos->list, &lgr->rmbs[*buf_lst])) {
-  		(*buf_lst)++;
+	if (list_is_last(&buf_pos->list, &lgr->rmbs[(*buf_lst])++)) {


Thanks,
Wenjia

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net 2/2] net/smc: Don't use RMBs not mapped to new link in SMCRv2 ADD LINK
  2023-05-27 15:20     ` Wen Gu
@ 2023-05-30 20:34       ` Wenjia Zhang
  2023-06-01  8:37         ` Wen Gu
  0 siblings, 1 reply; 9+ messages in thread
From: Wenjia Zhang @ 2023-05-30 20:34 UTC (permalink / raw)
  To: Wen Gu, kgraul, jaka, davem, edumazet, kuba, pabeni
  Cc: linux-s390, netdev, linux-kernel



On 27.05.23 17:20, Wen Gu wrote:
> 
> 
> On 2023/5/27 18:22, Wenjia Zhang wrote:
>>
>> I'm wondering if this crash is introduced by the first fix patch you 
>> wrote.
>>
>> Thanks,
>> Wenjia
> 
> Hi Wenjia,
> 
> No, the crash can be reproduced without my two patches by the following 
> steps:
> 
> 1. Each side activates only one RNIC firstly and set the default 
> sndbuf/RMB sizes to more
>     than 16KB, such as 64KB, through sysctl net.smc.{wmem | rmem}.
>     (The reason why initial sndbufs/RMBs size needs to be larger than 
> 16KB will be explained later)
> 
> 2. Use SMCRv2 in any test, just to create a link group that has some 
> alloced RMBs.
> 
>     Example of step #1 #2:
> 
>     [server]
>     smcr ueid add 1234
>     sysctl net.smc.rmem=65536
>     sysctl net.smc.wmem=65536
>     smc_run sockperf sr --tcp
> 
>     [client]
>     smcr ueid add 1234
>     sysctl net.smc.rmem=65536
>     sysctl net.smc.wmem=65536
>     smc_run sockperf pp --tcp -i <server ip> -t <time>
> 
> 
> 3. Change the default sndbuf/RMB sizes, make sure they are larger than 
> initial size above,
>     such as 256KB.
> 
> 4. Then rerun the test, and there will be some bigger RMBs alloced. And 
> when the test is
>     running, activate the second alternate RNIC of each side. It will 
> trigger to add a new
>     link and do what I described in the second patch's commit log, that 
> only map the in-use
>     256KB RMBs to new link but try to access the unused 64KB RMBs' 
> invalid mr[new_link->lnk_idx].
> 
>     Example of step #3 #4:
> 
>     [server]
>     sysctl net.smc.rmem=262144
>     sysctl net.smc.wmem=262144
>     smc_run sockperf sr --tcp
> 
>     [client]
>     sysctl net.smc.rmem=262144
>     sysctl net.smc.wmem=262144
>     smc_run sockperf pp --tcp -i <server ip> -t <time>
> 
>     When the sockperf is running:
> 
>     [server/client]
>     ip link set dev <2nd RNIC> up    # activate the second alternate 
> RNIC, then crash occurs.
> 
> 
> At the beginning, I only found the crash in the second patch. But when I 
> try to fix it,
> I found the issue descibed in the first patch.
> 
> In first patch, if I understand correctly, smc_llc_get_first_rmb() is 
> aimed to get the first
> RMB in lgr->rmb[*]. If so, It should start from lgr->rmbs[0] instead of 
> lgr->rmbs[1], right?
> 
> Then back to the reason needs to be explained in step #1. Because of the 
> issue mentioned
> above in smc_llc_get_first_rmb(), if we set the initial sndbuf/RMB sizes 
> to 16KB, these 16KB
> RMBs (in lgr->rmbs[0]) alloced in step #2 will happen not to be accessed 
> in step #4, so the
> potential crash is hided.
> 
> So, the crash is not introduced by the first fix. Instead, it is the 
> first issue that may hide
> the second issue(crash) in special cases.
> 
> I am a little curious why you think the first fix patch caused the 
> second crash? Is
> something wrong in the first fix patch?
> 
> Thanks for your review!
> 
> Regards,
> Wen Gu

Hi Wen,

Sorry for the late answer because of the public holiday here!

I really like the test scenario, thank you for the elaboration and the 
fixes!
They look good to me.

Why I asked that was that the first patch looked very reasonable, but I 
was wondering why I didn't meet any problem with that before ;-) and if 
it would trigger some problem during processing the SMCRv1 ADD Link 
Continuation Messages. After checking the code again, I don't think 
there would be any problem with the patch, because in the case of 
processing the SMCRv1 ADD Link Continuation Messages, it's about the 
same RMB.

Hi @Paolo, I would appreciate it if you could give us more time to 
review and test the patches. Because we have to make sure that they can 
work on our platform (s390) without problem, not only on x86.

Thanks
Wenjia




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net 2/2] net/smc: Don't use RMBs not mapped to new link in SMCRv2 ADD LINK
  2023-05-30 20:34       ` Wenjia Zhang
@ 2023-06-01  8:37         ` Wen Gu
  0 siblings, 0 replies; 9+ messages in thread
From: Wen Gu @ 2023-06-01  8:37 UTC (permalink / raw)
  To: Wenjia Zhang, kgraul, jaka, davem, edumazet, kuba, pabeni
  Cc: linux-s390, netdev, linux-kernel



On 2023/5/31 04:34, Wenjia Zhang wrote:
> 
> 
> 
> Hi Wen,
> 
> Sorry for the late answer because of the public holiday here!
> 
> I really like the test scenario, thank you for the elaboration and the fixes!
> They look good to me.
> 
> Why I asked that was that the first patch looked very reasonable, but I was wondering why I didn't meet any problem with 
> that before ;-) and if it would trigger some problem during processing the SMCRv1 ADD Link Continuation Messages. After 
> checking the code again, I don't think there would be any problem with the patch, because in the case of processing the 
> SMCRv1 ADD Link Continuation Messages, it's about the same RMB.
> 
> Hi @Paolo, I would appreciate it if you could give us more time to review and test the patches. Because we have to make 
> sure that they can work on our platform (s390) without problem, not only on x86.
> 
> Thanks
> Wenjia
> 
> 

Inspired by your comments, I check the SMCRv1 and find it has the similar issue in smc_llc_add_link_cont().
The cause and way to reproduce it are similar to the issue in SMCRv2. I will fix this as well.

[  361.813390] BUG: kernel NULL pointer dereference, address: 0000000000000014
[  361.814121] #PF: supervisor read access in kernel mode
[  361.814646] #PF: error_code(0x0000) - not-present page
[  361.815160] PGD 0 P4D 0
[  361.815431] Oops: 0000 [#1] PREEMPT SMP PTI
[  361.815866] CPU: 5 PID: 48 Comm: kworker/5:0 Kdump: loaded Tainted: G        W   E      6.4.0-rc3+ #49
[  361.817952] Workqueue: events smc_llc_add_link_work [smc]
[  361.818527] RIP: 0010:smc_llc_add_link_cont+0x160/0x270 [smc]
[  361.820973] RSP: 0018:ffffa737801d3d50 EFLAGS: 00010286
[  361.821517] RAX: ffff964f82144000 RBX: ffffa737801d3dd8 RCX: 0000000000000000
[  361.822246] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff964f81370c30
[  361.822957] RBP: ffffa737801d3dd4 R08: ffff964f81370000 R09: ffffa737801d3db0
[  361.823678] R10: 0000000000000001 R11: 0000000000000060 R12: ffff964f82e70000
[  361.824409] R13: ffff964f81370c38 R14: ffffa737801d3dd3 R15: 0000000000000001
[  361.825119] FS:  0000000000000000(0000) GS:ffff9652bfd40000(0000) knlGS:0000000000000000
[  361.825934] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  361.826515] CR2: 0000000000000014 CR3: 000000008fa20004 CR4: 00000000003706e0
[  361.827251] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  361.827989] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  361.828712] Call Trace:
[  361.828964]  <TASK>
[  361.829182]  smc_llc_srv_rkey_exchange+0xa7/0x190 [smc]
[  361.829726]  smc_llc_srv_add_link+0x3ae/0x5a0 [smc]
[  361.830246]  smc_llc_add_link_work+0xb8/0x140 [smc]
[  361.830752]  process_one_work+0x1e5/0x3f0
[  361.831173]  worker_thread+0x4d/0x2f0
[  361.831531]  ? __pfx_worker_thread+0x10/0x10
[  361.831925]  kthread+0xe5/0x120
[  361.832239]  ? __pfx_kthread+0x10/0x10
[  361.832630]  ret_from_fork+0x2c/0x50
[  361.833004]  </TASK>
[  361.833236] Modules linked in: binfmt_misc(E) smc_diag(E) smc(E) rfkill(E) intel_rapl_msr(E) intel_rapl_common(E) 
mousedev(E) psmouse(E) i2c_piix4(E) pcspkr(E) ip_tables(E) mlx5_ib(E) ib_uverbs(E) ib_core(E) cirrus(E) ata_generic(E) 
drm_shmem_helper(E) drm_kms_helper(E) syscopyarea(E) ata_piix(E) sysfillrect(E) crct10dif_pclmul(E) sysimgblt(E) 
mlx5_core(E) crc32_pclmul(E) drm(E) virtio_net(E) mlxfw(E) crc32c_intel(E) ghash_clmulni_intel(E) net_failover(E) 
psample(E) i2c_core(E) failover(E) pci_hyperv_intf(E) serio_raw(E) libata(E) dm_mirror(E) dm_region_hash(E) dm_log(E) 
dm_mod(E)
[  361.839180] CR2: 0000000000000014

Thanks,
Wen Gu

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-06-01  8:40 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-26 11:48 [PATCH net 0/2] Two fixes for SMCRv2 Wen Gu
2023-05-26 11:49 ` [PATCH net 1/2] net/smc: Scan from current RMB list when no position specified Wen Gu
2023-05-30 20:08   ` Wenjia Zhang
2023-05-26 11:49 ` [PATCH net 2/2] net/smc: Don't use RMBs not mapped to new link in SMCRv2 ADD LINK Wen Gu
2023-05-27 10:22   ` Wenjia Zhang
2023-05-27 15:20     ` Wen Gu
2023-05-30 20:34       ` Wenjia Zhang
2023-06-01  8:37         ` Wen Gu
2023-05-30  9:30 ` [PATCH net 0/2] Two fixes for SMCRv2 patchwork-bot+netdevbpf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.