Subject: Re: [PATCH for-next v5 07/10] RDMA/rxe: Add support for bind MW work requests
To: Zhu Yanjun
Cc: Jason Gunthorpe, RDMA mailing list, Bob Pearson
References: <20210422161341.41929-1-rpearson@hpe.com> <20210422161341.41929-8-rpearson@hpe.com>
From: "Pearson, Robert B"
Message-ID: <05be50f7-2af6-2323-71dd-e7c6efd75a5d@gmail.com>
Date: Wed, 28 Apr 2021 11:14:59 -0500
X-Mailing-List: linux-rdma@vger.kernel.org

On 4/24/2021 11:29 PM, Zhu Yanjun wrote:
> On Fri, Apr 23, 2021 at 12:13 AM Bob Pearson wrote:
>> Add support for bind MW work requests from user space.
>> Since rdma/core does not support bind mw in ib_send_wr,
>> there is no way to support bind mw in kernel space.
>>
>> Added bind_mw local operation in rxe_req.c
>> Added bind_mw WR operation in rxe_opcode.c
>> Added bind_mw WC in rxe_comp.c
>> Added additional fields to rxe_mw in rxe_verbs.h
>> Added do_dealloc_mw() subroutine to clean up an mw
>> when rxe_dealloc_mw is called.
>> Added code to implement bind_mw operation in rxe_mw.c
>>
>> Signed-off-by: Bob Pearson
>> ---
>> v3:
>>   do_bind_mw() in rxe_mw.c is changed to be void instead of
>>   returning an int.
>> v2:
>>   Dropped kernel support for bind_mw in rxe_mw.c
>>   Replaced umw with mw in struct rxe_send_wr
>> ---
>>  drivers/infiniband/sw/rxe/rxe_comp.c   |   1 +
>>  drivers/infiniband/sw/rxe/rxe_loc.h    |   1 +
>>  drivers/infiniband/sw/rxe/rxe_mw.c     | 202 ++++++++++++++++++++++++-
>>  drivers/infiniband/sw/rxe/rxe_opcode.c |   7 +
>>  drivers/infiniband/sw/rxe/rxe_req.c    |   9 ++
>>  drivers/infiniband/sw/rxe/rxe_verbs.h  |  15 +-
>>  6 files changed, 230 insertions(+), 5 deletions(-)
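
(Aside, not part of the patch: for anyone wanting to exercise this path,
user space would post a bind roughly as in the untested sketch below,
using rdma-core's extended post-send API. It assumes the QP was created
with ibv_create_qp_ex() and send_ops_flags including
IBV_QP_EX_WITH_BIND_MW, and that the MR was registered with
IBV_ACCESS_MW_BIND; all names here are illustrative.)

    #include <stdint.h>
    #include <infiniband/verbs.h>

    static int post_bind(struct ibv_qp_ex *qpx, struct ibv_mw *mw,
                         struct ibv_mr *mr, uint64_t addr, uint64_t len)
    {
            struct ibv_mw_bind_info info = {
                    .mr = mr,
                    .addr = addr,           /* must fall inside mr (C10-75) */
                    .length = len,
                    .mw_access_flags = IBV_ACCESS_REMOTE_READ |
                                       IBV_ACCESS_REMOTE_WRITE,
            };

            ibv_wr_start(qpx);
            qpx->wr_id = 1;
            qpx->wr_flags = IBV_SEND_SIGNALED;
            /* the consumer supplies the new 8-bit key part;
             * ibv_inc_rkey() bumps it, so it always differs from the
             * current key as the checks in this patch require */
            ibv_wr_bind_mw(qpx, mw, ibv_inc_rkey(mw->rkey), &info);
            return ibv_wr_complete(qpx);
    }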
>> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
>> index 2af26737d32d..bc5488af5f55 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
>> @@ -103,6 +103,7 @@ static enum ib_wc_opcode wr_to_wc_opcode(enum ib_wr_opcode opcode)
>>  	case IB_WR_RDMA_READ_WITH_INV:		return IB_WC_RDMA_READ;
>>  	case IB_WR_LOCAL_INV:			return IB_WC_LOCAL_INV;
>>  	case IB_WR_REG_MR:			return IB_WC_REG_MR;
>> +	case IB_WR_BIND_MW:			return IB_WC_BIND_MW;
>>
>>  	default:
>>  		return 0xff;
>> diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
>> index edf575930a98..e6f574973298 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_loc.h
>> +++ b/drivers/infiniband/sw/rxe/rxe_loc.h
>> @@ -110,6 +110,7 @@ int advance_dma_data(struct rxe_dma_info *dma, unsigned int length);
>>  /* rxe_mw.c */
>>  int rxe_alloc_mw(struct ib_mw *ibmw, struct ib_udata *udata);
>>  int rxe_dealloc_mw(struct ib_mw *ibmw);
>> +int rxe_bind_mw(struct rxe_qp *qp, struct rxe_send_wqe *wqe);
>>  void rxe_mw_cleanup(struct rxe_pool_entry *arg);
>>
>>  /* rxe_net.c */
>> diff --git a/drivers/infiniband/sw/rxe/rxe_mw.c b/drivers/infiniband/sw/rxe/rxe_mw.c
>> index 69128e298d44..c018e8865876 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_mw.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_mw.c
>> @@ -29,6 +29,29 @@ int rxe_alloc_mw(struct ib_mw *ibmw, struct ib_udata *udata)
>>  	return 0;
>>  }
>>
>> +static void do_dealloc_mw(struct rxe_mw *mw)
> rxe_do_dealloc_mw if it is not used outside of soft-RoCE.

Same issue here. If a name is local, adding the prefix doesn't add any
value and makes the code harder to read.

>
>> +{
>> +	if (mw->mr) {
>> +		struct rxe_mr *mr = mw->mr;
>> +
>> +		mw->mr = NULL;
>> +		atomic_dec(&mr->num_mw);
>> +		rxe_drop_ref(mr);
>> +	}
>> +
>> +	if (mw->qp) {
>> +		struct rxe_qp *qp = mw->qp;
>> +
>> +		mw->qp = NULL;
>> +		rxe_drop_ref(qp);
>> +	}
>> +
>> +	mw->access = 0;
>> +	mw->addr = 0;
>> +	mw->length = 0;
>> +	mw->state = RXE_MW_STATE_INVALID;
>> +}
>> +
>>  int rxe_dealloc_mw(struct ib_mw *ibmw)
>>  {
>>  	struct rxe_mw *mw = to_rmw(ibmw);
>> @@ -36,7 +59,7 @@ int rxe_dealloc_mw(struct ib_mw *ibmw)
>>  	unsigned long flags;
>>
>>  	spin_lock_irqsave(&mw->lock, flags);
>> -	mw->state = RXE_MW_STATE_INVALID;
>> +	do_dealloc_mw(mw);
>>  	spin_unlock_irqrestore(&mw->lock, flags);
>>
>>  	rxe_drop_ref(mw);
>> @@ -45,6 +68,183 @@ int rxe_dealloc_mw(struct ib_mw *ibmw)
>>  	return 0;
>>  }
>>
>> +static int check_bind_mw(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
> rxe_check_bind_mw
>
>> +			 struct rxe_mw *mw, struct rxe_mr *mr)
>> +{
>> +	if (mw->ibmw.type == IB_MW_TYPE_1) {
>> +		if (unlikely(mw->state != RXE_MW_STATE_VALID)) {
>> +			pr_err_once(
>> +				"attempt to bind a type 1 MW not in the valid state\n");
>> +			return -EINVAL;
>> +		}
>> +
>> +		/* o10-36.2.2 */
>> +		if (unlikely((mw->access & IB_ZERO_BASED))) {
>> +			pr_err_once("attempt to bind a zero based type 1 MW\n");
>> +			return -EINVAL;
>> +		}
>> +	}
>> +
>> +	if (mw->ibmw.type == IB_MW_TYPE_2) {
>> +		/* o10-37.2.30 */
>> +		if (unlikely(mw->state != RXE_MW_STATE_FREE)) {
>> +			pr_err_once(
>> +				"attempt to bind a type 2 MW not in the free state\n");
>> +			return -EINVAL;
>> +		}
>> +
>> +		/* C10-72 */
>> +		if (unlikely(qp->pd != to_rpd(mw->ibmw.pd))) {
>> +			pr_err_once(
>> +				"attempt to bind type 2 MW with qp with different PD\n");
>> +			return -EINVAL;
>> +		}
>> +
>> +		/* o10-37.2.40 */
>> +		if (unlikely(!mr || wqe->wr.wr.mw.length == 0)) {
>> +			pr_err_once(
>> +				"attempt to invalidate type 2 MW by binding with NULL or zero length MR\n");
>> +			return -EINVAL;
>> +		}
>> +	}
>> +
>> +	if (unlikely((wqe->wr.wr.mw.rkey & 0xff) == (mw->ibmw.rkey & 0xff))) {
>> +		pr_err_once("attempt to bind MW with same key\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	/* remaining checks only apply to a nonzero MR */
>> +	if (!mr)
>> +		return 0;
>> +
>> +	if (unlikely(mr->access & IB_ZERO_BASED)) {
>> +		pr_err_once("attempt to bind MW to zero based MR\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	/* C10-73 */
>> +	if (unlikely(!(mr->access & IB_ACCESS_MW_BIND))) {
>> +		pr_err_once(
>> +			"attempt to bind an MW to an MR without bind access\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	/* C10-74 */
>> +	if (unlikely((mw->access &
>> +		      (IB_ACCESS_REMOTE_WRITE | IB_ACCESS_REMOTE_ATOMIC)) &&
>> +		     !(mr->access & IB_ACCESS_LOCAL_WRITE))) {
>> +		pr_err_once(
>> +			"attempt to bind a writeable MW to an MR without local write access\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	/* C10-75 */
>> +	if (mw->access & IB_ZERO_BASED) {
>> +		if (unlikely(wqe->wr.wr.mw.length > mr->length)) {
>> +			pr_err_once(
>> +				"attempt to bind a ZB MW outside of the MR\n");
>> +			return -EINVAL;
>> +		}
>> +	} else {
>> +		if (unlikely((wqe->wr.wr.mw.addr < mr->iova) ||
>> +			     ((wqe->wr.wr.mw.addr + wqe->wr.wr.mw.length) >
>> +			      (mr->iova + mr->length)))) {
>> +			pr_err_once(
>> +				"attempt to bind a VA MW outside of the MR\n");
>> +			return -EINVAL;
>> +		}
>> +	}
>> +
>> +	return 0;
>> +}
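
(Illustration only, not in the patch: rxe treats an rkey as a 24-bit
pool index plus an 8-bit consumer-owned key, which is what the "same
key" check above enforces. A hypothetical helper view of the split:)

    #include <stdint.h>

    /* index half selects the MW object; key half is chosen per bind */
    static inline uint32_t mw_index(uint32_t rkey) { return rkey >> 8; }
    static inline uint32_t mw_key(uint32_t rkey)   { return rkey & 0xff; }

    /* mirrors the new rkey composition done by do_bind_mw() below */
    static inline uint32_t compose_rkey(uint32_t old_rkey, uint32_t wr_rkey)
    {
            return (old_rkey & 0xffffff00) | (wr_rkey & 0x000000ff);
    }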
>> +
>> +static void do_bind_mw(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
> rxe_do_bind_mw
>
>> +		       struct rxe_mw *mw, struct rxe_mr *mr)
>> +{
>> +	u32 rkey;
>> +	u32 new_rkey;
>> +
>> +	rkey = mw->ibmw.rkey;
>> +	new_rkey = (rkey & 0xffffff00) | (wqe->wr.wr.mw.rkey & 0x000000ff);
>> +
>> +	mw->ibmw.rkey = new_rkey;
>> +	mw->access = wqe->wr.wr.mw.access;
>> +	mw->state = RXE_MW_STATE_VALID;
>> +	mw->addr = wqe->wr.wr.mw.addr;
>> +	mw->length = wqe->wr.wr.mw.length;
>> +
>> +	if (mw->mr) {
>> +		atomic_dec(&mw->mr->num_mw);
>> +		rxe_drop_ref(mw->mr);
>> +		mw->mr = NULL;
>> +	}
>> +
>> +	if (mw->length) {
>> +		mw->mr = mr;
>> +		atomic_inc(&mr->num_mw);
>> +		rxe_add_ref(mr);
>> +	}
>> +
>> +	if (mw->ibmw.type == IB_MW_TYPE_2) {
>> +		rxe_add_ref(qp);
>> +		mw->qp = qp;
>> +	}
>> +}
>> +
>> +int rxe_bind_mw(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
>> +{
>> +	int ret;
>> +	struct rxe_mw *mw;
>> +	struct rxe_mr *mr;
>> +	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
>> +	unsigned long flags;
>> +
>> +	mw = rxe_pool_get_index(&rxe->mw_pool,
>> +				wqe->wr.wr.mw.mw_rkey >> 8);
>> +	if (unlikely(!mw)) {
>> +		ret = -EINVAL;
>> +		goto err;
>> +	}
>> +
>> +	if (unlikely(mw->ibmw.rkey != wqe->wr.wr.mw.mw_rkey)) {
>> +		ret = -EINVAL;
>> +		goto err_drop_mw;
>> +	}
>> +
>> +	if (likely(wqe->wr.wr.mw.length)) {
>> +		mr = rxe_pool_get_index(&rxe->mr_pool,
>> +					wqe->wr.wr.mw.mr_lkey >> 8);
>> +		if (unlikely(!mr)) {
>> +			ret = -EINVAL;
>> +			goto err_drop_mw;
>> +		}
>> +
>> +		if (unlikely(mr->ibmr.lkey != wqe->wr.wr.mw.mr_lkey)) {
>> +			ret = -EINVAL;
>> +			goto err_drop_mr;
>> +		}
>> +	} else {
>> +		mr = NULL;
>> +	}
>> +
>> +	spin_lock_irqsave(&mw->lock, flags);
>> +
>> +	ret = check_bind_mw(qp, wqe, mw, mr);
>> +	if (ret)
>> +		goto err_unlock;
>> +
>> +	do_bind_mw(qp, wqe, mw, mr);
>> +err_unlock:
>> +	spin_unlock_irqrestore(&mw->lock, flags);
>> +err_drop_mr:
>> +	if (mr)
>> +		rxe_drop_ref(mr);
>> +err_drop_mw:
>> +	rxe_drop_ref(mw);
>> +err:
>> +	return ret;
>> +}
>> +
>>  void rxe_mw_cleanup(struct rxe_pool_entry *elem)
>>  {
>>  	struct rxe_mw *mw = container_of(elem, typeof(*mw), pelem);
>> diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.c b/drivers/infiniband/sw/rxe/rxe_opcode.c
>> index 1e4b67b048f3..3ef5a10a6efd 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_opcode.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_opcode.c
>> @@ -96,6 +96,13 @@ struct rxe_wr_opcode_info rxe_wr_opcode_info[] = {
>>  			[IB_QPT_RC]	= WR_LOCAL_OP_MASK,
>>  		},
>>  	},
>> +	[IB_WR_BIND_MW] = {
>> +		.name	= "IB_WR_BIND_MW",
>> +		.mask	= {
>> +			[IB_QPT_RC]	= WR_LOCAL_OP_MASK,
>> +			[IB_QPT_UC]	= WR_LOCAL_OP_MASK,
>> +		},
>> +	},
>>  };
>>
>>  struct rxe_opcode_info rxe_opcode[RXE_NUM_OPCODE] = {
>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>> index 0cf97e3db29f..243602584a28 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>> @@ -561,6 +561,7 @@ static int do_local_ops(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
>>  	struct rxe_dev *rxe;
>>  	struct rxe_mr *mr;
>>  	u32 rkey;
>> +	int ret;
>>
>>  	switch (opcode) {
>>  	case IB_WR_LOCAL_INV:
>> @@ -587,6 +588,14 @@ static int do_local_ops(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
>>  		mr->iova = wqe->wr.wr.reg.mr->iova;
>>  		rxe_drop_ref(mr);
>>  		break;
>> +	case IB_WR_BIND_MW:
>> +		ret = rxe_bind_mw(qp, wqe);
>> +		if (ret) {
>> +			wqe->state = wqe_state_error;
>> +			wqe->status = IB_WC_MW_BIND_ERR;
>> +			return -EINVAL;
>> +		}
>> +		break;
>>  	default:
>>  		pr_err("Unexpected send wqe opcode %d\n", opcode);
>>  		wqe->state = wqe_state_error;
>> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
>> index c8597ae8c833..7da47b8c707b 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
>> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
>> @@ -312,6 +312,8 @@ struct rxe_mr {
>>  	u32			num_map;
>>
>>  	struct rxe_map		**map;
>> +
>> +	atomic_t		num_mw;
>>  };
>>
>>  enum rxe_mw_state {
>> @@ -321,10 +323,15 @@ enum rxe_mw_state {
>>  };
>>
>>  struct rxe_mw {
>> -	struct ib_mw ibmw;
>> -	struct rxe_pool_entry pelem;
>> -	spinlock_t lock;
>> -	enum rxe_mw_state state;
>> +	struct ib_mw		ibmw;
>> +	struct rxe_pool_entry	pelem;
>> +	spinlock_t		lock;
>> +	enum rxe_mw_state	state;
>> +	struct rxe_qp		*qp; /* Type 2 only */
>> +	struct rxe_mr		*mr;
>> +	int			access;
>> +	u64			addr;
>> +	u64			length;
>>  };
>>
>>  struct rxe_mc_grp {
>> --
>> 2.27.0
>>
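
(Again illustrative, not part of the series: a full type 2 MW
round-trip from user space, tying the pieces together. post_bind() is
the earlier sketch; pd, qpx, and mr come from the usual verbs setup.)

    #include <errno.h>
    #include <stdint.h>
    #include <infiniband/verbs.h>

    static int mw_roundtrip(struct ibv_pd *pd, struct ibv_qp_ex *qpx,
                            struct ibv_mr *mr, uint64_t addr, uint64_t len)
    {
            struct ibv_mw *mw = ibv_alloc_mw(pd, IBV_MW_TYPE_2);
            int ret;

            if (!mw)
                    return errno;

            ret = post_bind(qpx, mw, mr, addr, len);
            if (ret)
                    goto out;

            /* poll the CQ for IBV_WC_BIND_MW, then advertise mw->rkey
             * to the peer for RDMA access through the window */
    out:
            /* drives rxe_dealloc_mw() -> do_dealloc_mw(), which drops
             * the MR/QP references even if the window is still bound */
            ibv_dealloc_mw(mw);
            return ret;
    }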