From: Zhu Yanjun <zyjzyj2000@gmail.com>
To: Leon Romanovsky <leon@kernel.org>
Cc: Doug Ledford <dledford@redhat.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Leon Romanovsky <leonro@nvidia.com>,
	Dan Carpenter <dan.carpenter@oracle.com>,
	RDMA mailing list <linux-rdma@vger.kernel.org>,
	Maor Gottlieb <maorg@nvidia.com>, Shay Drory <shayd@nvidia.com>
Subject: Re: [PATCH rdma-rc 5/5] RDMA/rxe: Return CQE error if invalid lkey was supplied
Date: Tue, 11 May 2021 14:12:40 +0800
Message-ID: <CAD=hENe+uoX6gtRjXWiiSfAvhYHSx+NkTPZj=R30xBtQdORPHA@mail.gmail.com>
In-Reply-To: <11e7b553f3a6f5371c6bb3f57c494bb52b88af99.1620711734.git.leonro@nvidia.com>

On Tue, May 11, 2021 at 1:48 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> From: Leon Romanovsky <leonro@nvidia.com>
>
> RXE does not update the WQE status on LOCAL_WRITE failures.
> This caused the following kernel panic when someone sent an atomic
> operation with a deliberately wrong lkey.
>
> [leonro@vm ~]$ mkt test
> test_atomic_invalid_lkey (tests.test_atomic.AtomicTest) ... [   43.860977] ------------[ cut here ]------------
>  WARNING: CPU: 5 PID: 263 at drivers/infiniband/sw/rxe/rxe_comp.c:740 rxe_completer+0x1a6d/0x2e30 [rdma_rxe]
>  Modules linked in: crc32_generic rdma_rxe ip6_udp_tunnel udp_tunnel rdma_ucm rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5_core ptp pps_core
>  CPU: 5 PID: 263 Comm: python3 Not tainted 5.13.0-rc1+ #2936
>  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
>  RIP: 0010:rxe_completer+0x1a6d/0x2e30 [rdma_rxe]
>  Code: 03 0f 8e 65 0e 00 00 3b 93 10 06 00 00 0f 84 82 0a 00 00 4c 89 ff 4c 89 44 24 38 e8 2d 74 a9 e1 4c 8b 44 24 38 e9 1c f5 ff ff <0f> 0b e9 0c e8 ff ff b8 05 00 00 00 41 bf 05 00 00 00 e9 ab e7 ff
>  RSP: 0018:ffff8880158af090 EFLAGS: 00010246
>  RAX: 0000000000000000 RBX: ffff888016a78000 RCX: ffffffffa0cf1652
>  RDX: 1ffff9200004b442 RSI: 0000000000000004 RDI: ffffc9000025a210
>  RBP: dffffc0000000000 R08: 00000000ffffffea R09: ffff88801617740b
>  R10: ffffed1002c2ee81 R11: 0000000000000007 R12: ffff88800f3b63e8
>  R13: ffff888016a78008 R14: ffffc9000025a180 R15: 000000000000000c
>  FS:  00007f88b622a740(0000) GS:ffff88806d540000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00007f88b5a1fa10 CR3: 000000000d848004 CR4: 0000000000370ea0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  Call Trace:
>   ? lock_release+0x1f9/0x6c0
>   ? rxe_comp_queue_pkt+0xb0/0xb0 [rdma_rxe]
>   ? lock_downgrade+0x6d0/0x6d0
>   ? lock_downgrade+0x6d0/0x6d0
>   ? lockdep_hardirqs_on_prepare+0x273/0x3e0
>   ? _raw_spin_unlock_irqrestore+0x2d/0x40
>   ? rxe_comp_queue_pkt+0xb0/0xb0 [rdma_rxe]
>   rxe_do_task+0x130/0x230 [rdma_rxe]
>   ? _raw_spin_unlock_irqrestore+0x2d/0x40
>   rxe_rcv+0xb11/0x1df0 [rdma_rxe]
>   ? rxe_crc32.isra.0+0x120/0x120 [rdma_rxe]
>   ? prepare_ack_packet+0x50b/0xa20 [rdma_rxe]
>   rxe_loopback+0x157/0x1e0 [rdma_rxe]
>   ? rxe_send+0x520/0x520 [rdma_rxe]
>   ? lockdep_hardirqs_on_prepare+0x273/0x3e0
>   rxe_responder+0x5532/0x7620 [rdma_rxe]
>   ? rxe_resp_queue_pkt+0xa0/0xa0 [rdma_rxe]
>   ? lock_downgrade+0x6d0/0x6d0
>   ? rxe_crc32.isra.0+0x8e/0x120 [rdma_rxe]
>   ? lock_is_held_type+0x98/0x110
>   ? find_held_lock+0x2d/0x110
>   ? lock_release+0x1f9/0x6c0
>   ? rxe_do_task+0xe5/0x230 [rdma_rxe]
>   ? lock_downgrade+0x6d0/0x6d0
>   ? rxe_resp_queue_pkt+0x19/0xa0 [rdma_rxe]
>   ? lockdep_hardirqs_on_prepare+0x273/0x3e0
>   ? _raw_spin_unlock_irqrestore+0x2d/0x40
>   ? rxe_resp_queue_pkt+0xa0/0xa0 [rdma_rxe]
>   rxe_do_task+0x130/0x230 [rdma_rxe]
>   rxe_rcv+0x9c8/0x1df0 [rdma_rxe]
>   ? __kmalloc_track_caller+0x174/0x390
>   ? rxe_crc32.isra.0+0x120/0x120 [rdma_rxe]
>   rxe_loopback+0x157/0x1e0 [rdma_rxe]
>   ? rxe_send+0x520/0x520 [rdma_rxe]
>   rxe_requester+0x1efd/0x58c0 [rdma_rxe]
>   ? lock_is_held_type+0x98/0x110
>   ? find_held_lock+0x2d/0x110
>   ? rnr_nak_timer+0x70/0x70 [rdma_rxe]
>   ? lock_release+0x1f9/0x6c0
>   ? rxe_do_task+0xe5/0x230 [rdma_rxe]
>   ? lock_downgrade+0x6d0/0x6d0
>   ? lockdep_hardirqs_on_prepare+0x273/0x3e0
>   ? _raw_spin_unlock_irqrestore+0x2d/0x40
>   ? rnr_nak_timer+0x70/0x70 [rdma_rxe]
>   rxe_do_task+0x130/0x230 [rdma_rxe]
>   ? rxe_poll_cq+0x450/0x450 [rdma_rxe]
>   rxe_post_send+0x998/0x1860 [rdma_rxe]
>   ? lock_is_held_type+0x98/0x110
>   ? lock_is_held_type+0x98/0x110
>   ? rdma_lookup_get_uobject+0x22c/0x4a0 [ib_uverbs]
>   ? rxe_poll_cq+0x450/0x450 [rdma_rxe]
>   ib_uverbs_post_send+0xd5f/0x1220 [ib_uverbs]
>   ? lock_acquire+0x1a9/0x6d0
>   ? lock_is_held_type+0x98/0x110
>   ? ib_uverbs_ex_create_wq+0xb00/0xb00 [ib_uverbs]
>   ? lock_release+0x1f9/0x6c0
>   ? __might_fault+0xba/0x160
>   ? lock_downgrade+0x6d0/0x6d0
>   ib_uverbs_write+0x847/0xc80 [ib_uverbs]
>   ? ib_uverbs_open+0x810/0x810 [ib_uverbs]
>   ? vfs_fileattr_set+0x990/0x990
>   ? __up_read+0x1a1/0x7b0
>   vfs_write+0x1c5/0x840
>   ksys_write+0x176/0x1d0
>   ? __x64_sys_read+0xb0/0xb0
>   ? lockdep_hardirqs_on_prepare+0x273/0x3e0
>   ? syscall_enter_from_user_mode+0x1d/0x50
>   do_syscall_64+0x3f/0x80
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
>  RIP: 0033:0x7f88b64917a7
>  Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
>  RSP: 002b:00007ffee189e6c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
>  RAX: ffffffffffffffda RBX: 000055c17c8b2660 RCX: 00007f88b64917a7
>  RDX: 0000000000000020 RSI: 00007ffee189e6e0 RDI: 0000000000000003
>  RBP: 0000000000000000 R08: 00007f88b5cbbc80 R09: 00007f88b689fdc0
>  R10: 00007f88b66a0510 R11: 0000000000000246 R12: 00007f88b6072180
>  R13: 0000000000000000 R14: 00007f88b5e6e9d0 R15: 0000000000000008
>  irq event stamp: 809719
>  hardirqs last  enabled at (809727): [<ffffffff813990c4>] console_unlock+0x434/0x850
>  hardirqs last disabled at (809734): [<ffffffff81399279>] console_unlock+0x5e9/0x850
>  softirqs last  enabled at (809016): [<ffffffff8129089e>] irq_exit_rcu+0x11e/0x1a0
>  softirqs last disabled at (808963): [<ffffffff8129089e>] irq_exit_rcu+0x11e/0x1a0
>  ---[ end trace 1e302e4b7857843b ]---
>
> Fixes: 8700e3e7c485 ("Soft RoCE driver")
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

Thanks,
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>

Zhu Yanjun

> ---
>  drivers/infiniband/sw/rxe/rxe_comp.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
> index 2af26737d32d..a6712e373eed 100644
> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
> @@ -346,13 +346,15 @@ static inline enum comp_state do_read(struct rxe_qp *qp,
>         ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
>                         &wqe->dma, payload_addr(pkt),
>                         payload_size(pkt), to_mr_obj, NULL);
> -       if (ret)
> +       if (ret) {
> +               wqe->status = IB_WC_LOC_PROT_ERR;
>                 return COMPST_ERROR;
> +       }
>
>         if (wqe->dma.resid == 0 && (pkt->mask & RXE_END_MASK))
>                 return COMPST_COMP_ACK;
> -       else
> -               return COMPST_UPDATE_COMP;
> +
> +       return COMPST_UPDATE_COMP;
>  }
>
>  static inline enum comp_state do_atomic(struct rxe_qp *qp,
> @@ -366,10 +368,12 @@ static inline enum comp_state do_atomic(struct rxe_qp *qp,
>         ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
>                         &wqe->dma, &atomic_orig,
>                         sizeof(u64), to_mr_obj, NULL);
> -       if (ret)
> +       if (ret) {
> +               wqe->status = IB_WC_LOC_PROT_ERR;
>                 return COMPST_ERROR;
> -       else
> -               return COMPST_COMP_ACK;
> +       }
> +
> +       return COMPST_COMP_ACK;
>  }
>
>  static void make_send_cqe(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
> --
> 2.31.1
>
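
For readers following the thread, here is a minimal user-space sketch of
the pattern the patch applies: record a failure status on the WQE before
returning the error state. Everything in it is a hypothetical stand-in
(the struct, the enum values, mock_copy_data() and the lkey numbers are
not the real rxe definitions). It also assumes, without the source
quoted here confirming it, that the completer's error path expects a
non-success status on the WQE.

```c
#include <assert.h>

/* Hypothetical stand-ins for the driver's types; NOT the real rxe code. */
enum comp_state { COMPST_COMP_ACK, COMPST_ERROR };
enum wc_status { WC_SUCCESS = 0, WC_LOC_PROT_ERR = 1 };

struct send_wqe {
	enum wc_status status;
};

/* Mock of copy_data(): fails when the supplied lkey is not the valid one. */
int mock_copy_data(unsigned int lkey, unsigned int valid_lkey)
{
	return lkey == valid_lkey ? 0 : -1;
}

/* Before the patch: the error state is returned, status stays untouched. */
enum comp_state do_atomic_unpatched(struct send_wqe *wqe,
				    unsigned int lkey, unsigned int valid_lkey)
{
	if (mock_copy_data(lkey, valid_lkey))
		return COMPST_ERROR;

	return COMPST_COMP_ACK;
}

/* After the patch: the failure is recorded on the WQE first. */
enum comp_state do_atomic_patched(struct send_wqe *wqe,
				  unsigned int lkey, unsigned int valid_lkey)
{
	if (mock_copy_data(lkey, valid_lkey)) {
		wqe->status = WC_LOC_PROT_ERR;
		return COMPST_ERROR;
	}

	return COMPST_COMP_ACK;
}

/* Assumed invariant of the error path: status must not still be success. */
int error_path_is_consistent(const struct send_wqe *wqe)
{
	return wqe->status != WC_SUCCESS;
}

/* Exercise both variants with a wrong lkey (values are arbitrary). */
void demo(void)
{
	struct send_wqe before = { WC_SUCCESS };
	struct send_wqe after = { WC_SUCCESS };

	assert(do_atomic_unpatched(&before, 0xdead, 0x1234) == COMPST_ERROR);
	assert(!error_path_is_consistent(&before));

	assert(do_atomic_patched(&after, 0xdead, 0x1234) == COMPST_ERROR);
	assert(error_path_is_consistent(&after));
}
```

Calling demo() runs both variants with a wrong lkey. Only the patched
one leaves the WQE in a state that the assumed error-path check accepts.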
