linux-rdma.vger.kernel.org archive mirror
From: "Pearson, Robert B" <robert.pearson2@hpe.com>
To: Olga Kornievskaia <aglo@umich.edu>, Bob Pearson <rpearsonhpe@gmail.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
	Zhu Yanjun <zyjzyj2000@gmail.com>,
	linux-rdma <linux-rdma@vger.kernel.org>
Subject: RE: [PATCH for-next] RDMA/rxe: Fix bug in rxe_net.c
Date: Thu, 22 Jul 2021 01:55:58 +0000
Message-ID: <CS1PR8401MB10968C0943041FEDCBEBF8BDBCE49@CS1PR8401MB1096.NAMPRD84.PROD.OUTLOOK.COM>
In-Reply-To: <CAN-5tyEkkBN49HCghKSCfPb8e_+0C2PCt8o51TOaMBS=3L7AuA@mail.gmail.com>

OK, I'll look at this tomorrow. I need to know more about your setup: which versions of the kernel and rdma-core, and what application software, are you running? I'd like to try to reproduce your results.

Regards,

Bob Pearson

-----Original Message-----
From: Olga Kornievskaia <aglo@umich.edu> 
Sent: Wednesday, July 21, 2021 7:31 PM
To: Bob Pearson <rpearsonhpe@gmail.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>; Zhu Yanjun <zyjzyj2000@gmail.com>; linux-rdma <linux-rdma@vger.kernel.org>
Subject: Re: [PATCH for-next] RDMA/rxe: Fix bug in rxe_net.c

On Wed, Jul 21, 2021 at 5:42 PM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>
> An earlier patch removed setting of tot_len in IPV4 headers because it 
> was also set in ip_local_out. However, this change resulted in an 
> incorrect ICRC being computed because the tot_len field is not masked 
> out. This patch restores that line. This fixes the bug reported by Zhu Yanjun.
> This bug would have also affected anyone using rxe.
>
> Fixes: 230bb836ee88 ("RDMA/rxe: Fix redundant call to ip_send_check")
> Reported_by: Zhu Yanjun <zyjzyj2000@gmail.com>
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_net.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index dec92928a1cd..5ac27f28ace1 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -259,6 +259,7 @@ static void prepare_ipv4_hdr(struct dst_entry *dst, struct sk_buff *skb,
>
>         iph->version    =       IPVERSION;
>         iph->ihl        =       sizeof(struct iphdr) >> 2;
> +       iph->tot_len    =       htons(skb->len);
>         iph->frag_off   =       df;
>         iph->protocol   =       proto;
>         iph->tos        =       tos;
> --
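
For readers following the thread, the tot_len/ICRC dependency described in the commit message can be illustrated with a small user-space sketch. This is not the rxe driver code. The helper names (crc32_update, set_tot_len, masked_hdr_crc) are hypothetical, the CRC covers only a bare 20-byte IPv4 header rather than the full RoCEv2 pseudo header, and the seed is a generic one. It assumes the usual RoCEv2 rule that TOS, TTL and the header checksum are replaced with ones before the CRC is taken, while tot_len is left as-is.

```c
/* Sketch only: hypothetical helpers, not taken from drivers/infiniband/sw/rxe. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IPV4_HDR_LEN 20

/* Byte offsets inside a 20-byte IPv4 header without options. */
enum {
	OFF_TOS     = 1,
	OFF_TOT_LEN = 2,	/* 16 bits, network byte order */
	OFF_TTL     = 8,
	OFF_CHECK   = 10,	/* 16 bits */
};

/* Plain bitwise CRC-32 using the reflected polynomial 0xEDB88320. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Store a total length in network byte order. */
static void set_tot_len(uint8_t *hdr, uint16_t len)
{
	hdr[OFF_TOT_LEN]     = (uint8_t)(len >> 8);
	hdr[OFF_TOT_LEN + 1] = (uint8_t)(len & 0xff);
}

/*
 * Copy the header, set the fields that routers may rewrite to all ones,
 * and take the CRC over the copy. tot_len is deliberately NOT masked,
 * so whatever value it holds at this point feeds into the result.
 */
static uint32_t masked_hdr_crc(const uint8_t hdr[IPV4_HDR_LEN])
{
	uint8_t ps[IPV4_HDR_LEN];

	assert(hdr != NULL);
	memcpy(ps, hdr, sizeof(ps));
	ps[OFF_TOS]       = 0xff;
	ps[OFF_TTL]       = 0xff;
	ps[OFF_CHECK]     = 0xff;
	ps[OFF_CHECK + 1] = 0xff;

	return ~crc32_update(0xffffffffu, ps, sizeof(ps));
}
```

Calling masked_hdr_crc() on two headers that differ only in TTL or checksum gives the same value, while two headers that differ only in tot_len (for example one left at zero and one set with set_tot_len()) give different values. That is the situation the patch addresses: if tot_len is only filled in later by ip_local_out, the sender computes the ICRC over a header the receiver never sees.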

This patch made the server crash (just like one of the other crashes I've seen and posted to the list).

The client logs:

[  206.437839] rdma_rxe: bad ICRC from 192.168.1.92
[  211.043978] rdma_rxe: bad ICRC from 192.168.1.92
[  215.652973] rdma_rxe: bad ICRC from 192.168.1.92


Server crash:

[11568.440098] BUG: unable to handle page fault for address: ffffaddb21f61180
[11568.442923] #PF: supervisor write access in kernel mode
[11568.444452] #PF: error_code(0x0002) - not-present page
[11568.445996] PGD 1000067 P4D 1000067 PUD 11b9067 PMD 0
[11568.447527] Oops: 0002 [#1] SMP PTI
[11568.448606] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W         5.14.0-rc1+ #42
[11568.450911] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/22/2020
[11568.454072] RIP: 0010:rxe_cq_post+0x98/0x210 [rdma_rxe]
[11568.455613] Code: 8b b3 48 01 00 00 4d 8b 48 08 41 8b 48 28 49 8d b9 80 01 00 00 85 f6 0f 84 78 01 00 00 41 8b 50 34 d3 e2 48 01 fa 48 8b 4d 00 <48> 89 0a 48 8b 4d 08 48 89 4a 08 48 8b 4d 10 48 89 4a 10 48 8b 4d
[11568.461093] RSP: 0018:ffffaddb004c0988 EFLAGS: 00010082
[11568.462621] RAX: 0000000000000246 RBX: ffff9c9137df1a00 RCX: 0000000000000000
[11568.464695] RDX: ffffaddb21f61180 RSI: 0000000000000001 RDI: ffffaddb05f5f180
[11568.466779] RBP: ffffaddb004c0a30 R08: ffff9c9123186c00 R09: ffffaddb05f5f000
[11568.468902] R10: 80139a1c70550000 R11: 400000005d050000 R12: 0000000000000000
[11568.470977] R13: ffff9c9137df1b40 R14: ffff9c9137d50008 R15: 000000000000000a
[11568.473050] FS:  0000000000000000(0000) GS:ffff9c917be40000(0000) knlGS:0000000000000000
[11568.475395] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11568.477090] CR2: ffffaddb21f61180 CR3: 0000000043476005 CR4: 00000000001706e0
[11568.479170] Call Trace:
[11568.479966]  <IRQ>
[11568.480683]  rxe_responder+0x612/0x2470 [rdma_rxe]
[11568.482122]  rxe_do_task+0x89/0x100 [rdma_rxe]
[11568.483427]  rxe_rcv+0x2eb/0x900 [rdma_rxe]
[11568.484655]  ? __udp4_lib_lookup+0x2c8/0x440
[11568.486159]  rxe_udp_encap_recv+0x68/0xa0 [rdma_rxe]
[11568.487721]  ? rxe_enable_task+0x10/0x10 [rdma_rxe]
[11568.489223]  udp_queue_rcv_one_skb+0x1df/0x4e0
[11568.490528]  udp_unicast_rcv_skb.isra.67+0x74/0x90
[11568.491926]  __udp4_lib_rcv+0x555/0xb90
[11568.493053]  ip_protocol_deliver_rcu+0xe8/0x1b0
[11568.494479]  ip_local_deliver_finish+0x44/0x50
[11568.496204]  ip_local_deliver+0xf1/0x100
[11568.497621]  ? ip_protocol_deliver_rcu+0x1b0/0x1b0
[11568.499147]  ip_rcv+0xcb/0xe0
[11568.500032]  __netif_receive_skb_core+0x3a2/0x1010
[11568.501491]  ? packet_rcv+0x40/0x4b0
[11568.502661]  ? select_idle_sibling+0x29/0x970
[11568.504019]  __netif_receive_skb_one_core+0x3c/0xa0
[11568.505455]  netif_receive_skb+0x3d/0x130
[11568.506650]  vmxnet3_rq_rx_complete+0x5f0/0xdc0 [vmxnet3]
[11568.508808]  vmxnet3_poll_rx_only+0x31/0xa0 [vmxnet3]
[11568.510526]  __napi_poll+0x2b/0x120
[11568.511596]  net_rx_action+0xe2/0x240
[11568.512678]  ? vmxnet3_msix_rx+0x4a/0x60 [vmxnet3]
[11568.514084]  __do_softirq+0xd9/0x2a1
[11568.515218]  irq_exit_rcu+0xba/0xd0
[11568.516272]  common_interrupt+0x77/0x90
[11568.517438]  </IRQ>
[11568.518059]  asm_common_interrupt+0x1e/0x40
[11568.519291] RIP: 0010:acpi_idle_do_entry+0x4c/0x50
[11568.520680] Code: 08 48 8b 15 3a e3 94 01 ed c3 e9 5f fc ff ff 65 48 8b 04 25 00 6f 01 00 48 8b 00 a8 08 75 ea eb 07 0f 00 2d 40 41 50 00 fb f4 <fa> c3 cc cc 0f 1f 44 00 00 41 55 41 89 d5 41 54 49 89 f4 55 53 48
[11568.526026] RSP: 0018:ffffaddb0009be68 EFLAGS: 00000246
[11568.527569] RAX: 0000000000004000 RBX: 0000000000000001 RCX: ffff9c917be40000
[11568.529627] RDX: 0000000000000001 RSI: ffffffff9dcc99c0 RDI: ffff9c917c03b464
[11568.531723] RBP: ffff9c9105f63400 R08: ffff9c917c03b400 R09: 000000000000b0e0
[11568.533772] R10: 0000000000001e99 R11: ffff9c917be6a984 R12: ffffffff9dcc9a40
[11568.535918] R13: ffffffff9dcc99c0 R14: 0000000000000001 R15: 0000000000000000
[11568.538558]  ? sched_clock_cpu+0x9/0xa0
[11568.539706]  acpi_idle_enter+0x4d/0xb0
[11568.540912]  cpuidle_enter_state+0x8c/0x350
[11568.542164]  cpuidle_enter+0x29/0x40
[11568.543211]  do_idle+0x257/0x2a0
[11568.544303]  cpu_startup_entry+0x19/0x20
[11568.545455]  start_secondary+0x116/0x150
[11568.546928]  secondary_startup_64_no_verify+0xc2/0xcb
[11568.548479] Modules linked in: rpcrdma rdma_ucm rdma_cm iw_cm ib_cm rdma_rxe ip6_udp_tunnel udp_tunnel ib_uverbs ib_core fuse rfcomm xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 nft_counter nft_compat nf_tables nfnetlink tun bridge stp llc vmw_vsock_vmci_transport vsock bnep snd_seq_midi snd_seq_midi_event intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul vmw_balloon ghash_clmulni_intel joydev pcspkr btusb btrtl btbcm btintel bluetooth uvcvideo rfkill videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 snd_ens1371 snd_ac97_codec ac97_bus snd_seq videobuf2_common snd_pcm videodev mc ecdh_generic ecc snd_timer snd_rawmidi snd_seq_device snd soundcore vmw_vmci i2c_piix4 auth_rpcgss sunrpc ip_tables xfs libcrc32c sr_mod cdrom sg crc32c_intel ata_generic vmwgfx ttm serio_raw nvme drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops nvme_core t10_pi cec ata_piix ahci vmxnet3 libahci drm libata
[11568.575542] CR2: ffffaddb21f61180
[11568.577210] ---[ end trace 8afcc89bb91d9b85 ]---
[11568.578573] RIP: 0010:rxe_cq_post+0x98/0x210 [rdma_rxe]



> 2.30.2
>
