linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH for-next v2 0/2] RDMA/erdma: Add non-4K page size support
@ 2023-03-07 10:29 Cheng Xu
  2023-03-07 10:29 ` [PATCH for-next v2 1/2] RDMA/erdma: Use fixed hardware page size Cheng Xu
  2023-03-07 10:29 ` [PATCH for-next v2 2/2] RDMA/erdma: Support non-4K page size in doorbell allocation Cheng Xu
  0 siblings, 2 replies; 23+ messages in thread
From: Cheng Xu @ 2023-03-07 10:29 UTC (permalink / raw)
  To: jgg, leon; +Cc: linux-rdma, KaiShen

Hi,

This series introduces non-4K page size support. For some aarch64
distributions, the default page size is 64K, not 4K. Current erdma can
not work correctly for these systems. There are two reasons: one is that
the kernel driver passes the kernel's page size to HW, but HW always
treats 4K as the basic page size. Another reason is that the user
space provider always uses 4K to map the doorbell space which is not
right if page size is not 4K.

So, we fix these issues in this patchset:
- #1 fixes the issue that put wrong value in CMD to HW if PAGE_SIZE is
  not 4096.
- #2 returns the necessary information for userspace to call mmap. This
  commit requires changes in userspace provider. PR is [1].

v2 -> v1:
- Add fixes line in the first patch.
- Update commit description of the two patches.

Thanks,
Cheng Xu

[1] https://github.com/linux-rdma/rdma-core/pull/1313

Cheng Xu (2):
  RDMA/erdma: Use fixed hardware page size
  RDMA/erdma: Support non-4K page size in doorbell allocation

 drivers/infiniband/hw/erdma/erdma_hw.h    |  4 ++
 drivers/infiniband/hw/erdma/erdma_verbs.c | 57 +++++++++++------------
 drivers/infiniband/hw/erdma/erdma_verbs.h |  5 +-
 include/uapi/rdma/erdma-abi.h             |  5 +-
 4 files changed, 36 insertions(+), 35 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH for-next v2 1/2] RDMA/erdma: Use fixed hardware page size
  2023-03-07 10:29 [PATCH for-next v2 0/2] RDMA/erdma: Add non-4K page size support Cheng Xu
@ 2023-03-07 10:29 ` Cheng Xu
  2023-03-24 14:34   ` Jason Gunthorpe
  2023-03-07 10:29 ` [PATCH for-next v2 2/2] RDMA/erdma: Support non-4K page size in doorbell allocation Cheng Xu
  1 sibling, 1 reply; 23+ messages in thread
From: Cheng Xu @ 2023-03-07 10:29 UTC (permalink / raw)
  To: jgg, leon; +Cc: linux-rdma, KaiShen

Hardware's page size is 4096, but the kernel's page size may vary. Driver
should use hardware's page size when communicating with hardware.

Fixes: 155055771704 ("RDMA/erdma: Add verbs implementation")
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
---
 drivers/infiniband/hw/erdma/erdma_hw.h    |  4 ++++
 drivers/infiniband/hw/erdma/erdma_verbs.c | 17 +++++++++--------
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/erdma/erdma_hw.h b/drivers/infiniband/hw/erdma/erdma_hw.h
index 4c38d99c73f1..56e0c7a3e8f8 100644
--- a/drivers/infiniband/hw/erdma/erdma_hw.h
+++ b/drivers/infiniband/hw/erdma/erdma_hw.h
@@ -112,6 +112,10 @@
 
 #define ERDMA_PAGE_SIZE_SUPPORT 0x7FFFF000
 
+/* Hardware page size definition */
+#define ERDMA_HW_PAGE_SHIFT 12
+#define ERDMA_HW_PAGE_SIZE 4096
+
 /* WQE related. */
 #define EQE_SIZE 16
 #define EQE_SHIFT 4
diff --git a/drivers/infiniband/hw/erdma/erdma_verbs.c b/drivers/infiniband/hw/erdma/erdma_verbs.c
index 9c30d78730aa..83e1b0d55977 100644
--- a/drivers/infiniband/hw/erdma/erdma_verbs.c
+++ b/drivers/infiniband/hw/erdma/erdma_verbs.c
@@ -38,7 +38,7 @@ static int create_qp_cmd(struct erdma_dev *dev, struct erdma_qp *qp)
 		   FIELD_PREP(ERDMA_CMD_CREATE_QP_PD_MASK, pd->pdn);
 
 	if (rdma_is_kernel_res(&qp->ibqp.res)) {
-		u32 pgsz_range = ilog2(SZ_1M) - PAGE_SHIFT;
+		u32 pgsz_range = ilog2(SZ_1M) - ERDMA_HW_PAGE_SHIFT;
 
 		req.sq_cqn_mtt_cfg =
 			FIELD_PREP(ERDMA_CMD_CREATE_QP_PAGE_SIZE_MASK,
@@ -66,13 +66,13 @@ static int create_qp_cmd(struct erdma_dev *dev, struct erdma_qp *qp)
 		user_qp = &qp->user_qp;
 		req.sq_cqn_mtt_cfg = FIELD_PREP(
 			ERDMA_CMD_CREATE_QP_PAGE_SIZE_MASK,
-			ilog2(user_qp->sq_mtt.page_size) - PAGE_SHIFT);
+			ilog2(user_qp->sq_mtt.page_size) - ERDMA_HW_PAGE_SHIFT);
 		req.sq_cqn_mtt_cfg |=
 			FIELD_PREP(ERDMA_CMD_CREATE_QP_CQN_MASK, qp->scq->cqn);
 
 		req.rq_cqn_mtt_cfg = FIELD_PREP(
 			ERDMA_CMD_CREATE_QP_PAGE_SIZE_MASK,
-			ilog2(user_qp->rq_mtt.page_size) - PAGE_SHIFT);
+			ilog2(user_qp->rq_mtt.page_size) - ERDMA_HW_PAGE_SHIFT);
 		req.rq_cqn_mtt_cfg |=
 			FIELD_PREP(ERDMA_CMD_CREATE_QP_CQN_MASK, qp->rcq->cqn);
 
@@ -162,7 +162,7 @@ static int create_cq_cmd(struct erdma_dev *dev, struct erdma_cq *cq)
 	if (rdma_is_kernel_res(&cq->ibcq.res)) {
 		page_size = SZ_32M;
 		req.cfg0 |= FIELD_PREP(ERDMA_CMD_CREATE_CQ_PAGESIZE_MASK,
-				       ilog2(page_size) - PAGE_SHIFT);
+				       ilog2(page_size) - ERDMA_HW_PAGE_SHIFT);
 		req.qbuf_addr_l = lower_32_bits(cq->kern_cq.qbuf_dma_addr);
 		req.qbuf_addr_h = upper_32_bits(cq->kern_cq.qbuf_dma_addr);
 
@@ -175,8 +175,9 @@ static int create_cq_cmd(struct erdma_dev *dev, struct erdma_cq *cq)
 			cq->kern_cq.qbuf_dma_addr + (cq->depth << CQE_SHIFT);
 	} else {
 		mtt = &cq->user_cq.qbuf_mtt;
-		req.cfg0 |= FIELD_PREP(ERDMA_CMD_CREATE_CQ_PAGESIZE_MASK,
-				       ilog2(mtt->page_size) - PAGE_SHIFT);
+		req.cfg0 |=
+			FIELD_PREP(ERDMA_CMD_CREATE_CQ_PAGESIZE_MASK,
+				   ilog2(mtt->page_size) - ERDMA_HW_PAGE_SHIFT);
 		if (mtt->mtt_nents == 1) {
 			req.qbuf_addr_l = lower_32_bits(*(u64 *)mtt->mtt_buf);
 			req.qbuf_addr_h = upper_32_bits(*(u64 *)mtt->mtt_buf);
@@ -636,7 +637,7 @@ static int init_user_qp(struct erdma_qp *qp, struct erdma_ucontext *uctx,
 	u32 rq_offset;
 	int ret;
 
-	if (len < (PAGE_ALIGN(qp->attrs.sq_size * SQEBB_SIZE) +
+	if (len < (ALIGN(qp->attrs.sq_size * SQEBB_SIZE, ERDMA_HW_PAGE_SIZE) +
 		   qp->attrs.rq_size * RQE_SIZE))
 		return -EINVAL;
 
@@ -646,7 +647,7 @@ static int init_user_qp(struct erdma_qp *qp, struct erdma_ucontext *uctx,
 	if (ret)
 		return ret;
 
-	rq_offset = PAGE_ALIGN(qp->attrs.sq_size << SQEBB_SHIFT);
+	rq_offset = ALIGN(qp->attrs.sq_size << SQEBB_SHIFT, ERDMA_HW_PAGE_SIZE);
 	qp->user_qp.rq_offset = rq_offset;
 
 	ret = get_mtt_entries(qp->dev, &qp->user_qp.rq_mtt, va + rq_offset,
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH for-next v2 2/2] RDMA/erdma: Support non-4K page size in doorbell allocation
  2023-03-07 10:29 [PATCH for-next v2 0/2] RDMA/erdma: Add non-4K page size support Cheng Xu
  2023-03-07 10:29 ` [PATCH for-next v2 1/2] RDMA/erdma: Use fixed hardware page size Cheng Xu
@ 2023-03-07 10:29 ` Cheng Xu
  2023-03-14 10:23   ` Leon Romanovsky
  1 sibling, 1 reply; 23+ messages in thread
From: Cheng Xu @ 2023-03-07 10:29 UTC (permalink / raw)
  To: jgg, leon; +Cc: linux-rdma, KaiShen

Doorbell resources are exposed to userspace by mmap. The size unit of mmap
is PAGE_SIZE, previous implementation can not work correctly if PAGE_SIZE
is not 4K. We support non-4K page size in this commit.

Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
---
 drivers/infiniband/hw/erdma/erdma_verbs.c | 40 ++++++++++-------------
 drivers/infiniband/hw/erdma/erdma_verbs.h |  5 ++-
 include/uapi/rdma/erdma-abi.h             |  5 ++-
 3 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/drivers/infiniband/hw/erdma/erdma_verbs.c b/drivers/infiniband/hw/erdma/erdma_verbs.c
index 83e1b0d55977..0cd8dd61f569 100644
--- a/drivers/infiniband/hw/erdma/erdma_verbs.c
+++ b/drivers/infiniband/hw/erdma/erdma_verbs.c
@@ -1137,8 +1137,8 @@ void erdma_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
 static void alloc_db_resources(struct erdma_dev *dev,
 			       struct erdma_ucontext *ctx)
 {
-	u32 bitmap_idx;
 	struct erdma_devattr *attrs = &dev->attrs;
+	u32 bitmap_idx, hw_page_idx;
 
 	if (attrs->disable_dwqe)
 		goto alloc_normal_db;
@@ -1151,11 +1151,9 @@ static void alloc_db_resources(struct erdma_dev *dev,
 		spin_unlock(&dev->db_bitmap_lock);
 
 		ctx->sdb_type = ERDMA_SDB_PAGE;
-		ctx->sdb_idx = bitmap_idx;
-		ctx->sdb_page_idx = bitmap_idx;
+		ctx->sdb_bitmap_idx = bitmap_idx;
 		ctx->sdb = dev->func_bar_addr + ERDMA_BAR_SQDB_SPACE_OFFSET +
-			   (bitmap_idx << PAGE_SHIFT);
-		ctx->sdb_page_off = 0;
+			   (bitmap_idx << ERDMA_HW_PAGE_SHIFT);
 
 		return;
 	}
@@ -1166,13 +1164,13 @@ static void alloc_db_resources(struct erdma_dev *dev,
 		spin_unlock(&dev->db_bitmap_lock);
 
 		ctx->sdb_type = ERDMA_SDB_ENTRY;
-		ctx->sdb_idx = bitmap_idx;
-		ctx->sdb_page_idx = attrs->dwqe_pages +
+		ctx->sdb_bitmap_idx = bitmap_idx;
+		hw_page_idx = attrs->dwqe_pages +
 				    bitmap_idx / ERDMA_DWQE_TYPE1_CNT_PER_PAGE;
-		ctx->sdb_page_off = bitmap_idx % ERDMA_DWQE_TYPE1_CNT_PER_PAGE;
 
+		ctx->sdb_entid = bitmap_idx % ERDMA_DWQE_TYPE1_CNT_PER_PAGE;
 		ctx->sdb = dev->func_bar_addr + ERDMA_BAR_SQDB_SPACE_OFFSET +
-			   (ctx->sdb_page_idx << PAGE_SHIFT);
+			   (hw_page_idx << PAGE_SHIFT);
 
 		return;
 	}
@@ -1181,11 +1179,8 @@ static void alloc_db_resources(struct erdma_dev *dev,
 
 alloc_normal_db:
 	ctx->sdb_type = ERDMA_SDB_SHARED;
-	ctx->sdb_idx = 0;
-	ctx->sdb_page_idx = ERDMA_SDB_SHARED_PAGE_INDEX;
-	ctx->sdb_page_off = 0;
-
-	ctx->sdb = dev->func_bar_addr + (ctx->sdb_page_idx << PAGE_SHIFT);
+	ctx->sdb = dev->func_bar_addr +
+		   (ERDMA_SDB_SHARED_PAGE_INDEX << ERDMA_HW_PAGE_SHIFT);
 }
 
 static void erdma_uctx_user_mmap_entries_remove(struct erdma_ucontext *uctx)
@@ -1215,11 +1210,6 @@ int erdma_alloc_ucontext(struct ib_ucontext *ibctx, struct ib_udata *udata)
 	ctx->rdb = dev->func_bar_addr + ERDMA_BAR_RQDB_SPACE_OFFSET;
 	ctx->cdb = dev->func_bar_addr + ERDMA_BAR_CQDB_SPACE_OFFSET;
 
-	if (udata->outlen < sizeof(uresp)) {
-		ret = -EINVAL;
-		goto err_out;
-	}
-
 	ctx->sq_db_mmap_entry = erdma_user_mmap_entry_insert(
 		ctx, (void *)ctx->sdb, PAGE_SIZE, ERDMA_MMAP_IO_NC, &uresp.sdb);
 	if (!ctx->sq_db_mmap_entry) {
@@ -1243,9 +1233,13 @@ int erdma_alloc_ucontext(struct ib_ucontext *ibctx, struct ib_udata *udata)
 
 	uresp.dev_id = dev->pdev->device;
 	uresp.sdb_type = ctx->sdb_type;
-	uresp.sdb_offset = ctx->sdb_page_off;
+	uresp.sdb_entid = ctx->sdb_entid;
+	uresp.sdb_off = ctx->sdb & ~PAGE_MASK;
+	uresp.rdb_off = ctx->rdb & ~PAGE_MASK;
+	uresp.cdb_off = ctx->cdb & ~PAGE_MASK;
 
-	ret = ib_copy_to_udata(udata, &uresp, sizeof(uresp));
+	ret = ib_copy_to_udata(udata, &uresp,
+			       min(sizeof(uresp), udata->outlen));
 	if (ret)
 		goto err_out;
 
@@ -1264,9 +1258,9 @@ void erdma_dealloc_ucontext(struct ib_ucontext *ibctx)
 
 	spin_lock(&dev->db_bitmap_lock);
 	if (ctx->sdb_type == ERDMA_SDB_PAGE)
-		clear_bit(ctx->sdb_idx, dev->sdb_page);
+		clear_bit(ctx->sdb_bitmap_idx, dev->sdb_page);
 	else if (ctx->sdb_type == ERDMA_SDB_ENTRY)
-		clear_bit(ctx->sdb_idx, dev->sdb_entry);
+		clear_bit(ctx->sdb_bitmap_idx, dev->sdb_entry);
 
 	erdma_uctx_user_mmap_entries_remove(ctx);
 
diff --git a/drivers/infiniband/hw/erdma/erdma_verbs.h b/drivers/infiniband/hw/erdma/erdma_verbs.h
index e0a993bc032a..4dbef1483027 100644
--- a/drivers/infiniband/hw/erdma/erdma_verbs.h
+++ b/drivers/infiniband/hw/erdma/erdma_verbs.h
@@ -35,9 +35,8 @@ struct erdma_ucontext {
 	struct ib_ucontext ibucontext;
 
 	u32 sdb_type;
-	u32 sdb_idx;
-	u32 sdb_page_idx;
-	u32 sdb_page_off;
+	u32 sdb_bitmap_idx;
+	u32 sdb_entid;
 	u64 sdb;
 	u64 rdb;
 	u64 cdb;
diff --git a/include/uapi/rdma/erdma-abi.h b/include/uapi/rdma/erdma-abi.h
index b7a0222f978f..57f8942a3c56 100644
--- a/include/uapi/rdma/erdma-abi.h
+++ b/include/uapi/rdma/erdma-abi.h
@@ -40,10 +40,13 @@ struct erdma_uresp_alloc_ctx {
 	__u32 dev_id;
 	__u32 pad;
 	__u32 sdb_type;
-	__u32 sdb_offset;
+	__u32 sdb_entid;
 	__aligned_u64 sdb;
 	__aligned_u64 rdb;
 	__aligned_u64 cdb;
+	__u32 sdb_off;
+	__u32 rdb_off;
+	__u32 cdb_off;
 };
 
 #endif
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v2 2/2] RDMA/erdma: Support non-4K page size in doorbell allocation
  2023-03-07 10:29 ` [PATCH for-next v2 2/2] RDMA/erdma: Support non-4K page size in doorbell allocation Cheng Xu
@ 2023-03-14 10:23   ` Leon Romanovsky
       [not found]     ` <5b0cc34d-a185-d9b4-c312-27bc959d929d@linux.alibaba.com>
  2023-03-14 11:50     ` Cheng Xu
  0 siblings, 2 replies; 23+ messages in thread
From: Leon Romanovsky @ 2023-03-14 10:23 UTC (permalink / raw)
  To: Cheng Xu; +Cc: jgg, linux-rdma, KaiShen

On Tue, Mar 07, 2023 at 06:29:24PM +0800, Cheng Xu wrote:
> Doorbell resources are exposed to userspace by mmap. The size unit of mmap
> is PAGE_SIZE, previous implementation can not work correctly if PAGE_SIZE
> is not 4K. We support non-4K page size in this commit.

Why do you need this information in rdma-core?
Can you use sysconf(_SC_PAGESIZE) there to understand the page size like
other providers?

Thanks

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v2 2/2] RDMA/erdma: Support non-4K page size in doorbell allocation
       [not found]     ` <5b0cc34d-a185-d9b4-c312-27bc959d929d@linux.alibaba.com>
@ 2023-03-14 11:34       ` Cheng Xu
  0 siblings, 0 replies; 23+ messages in thread
From: Cheng Xu @ 2023-03-14 11:34 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: jgg, linux-rdma, KaiShen



On 3/14/23 7:31 PM, Cheng Xu wrote:
> 
> 
> On 3/14/23 6:23 PM, Leon Romanovsky wrote:
>> On Tue, Mar 07, 2023 at 06:29:24PM +0800, Cheng Xu wrote:
>>> Doorbell resources are exposed to userspace by mmap. The size unit of mmap
>>> is PAGE_SIZE, previous implementation can not work correctly if PAGE_SIZE
>>> is not 4K. We support non-4K page size in this commit.
>>
>> Why do you need this information in rdma-core?
>> Can you use sysconf(_SC_PAGESIZE) there to understand the page size like
>> other providers?
>>
> 
> I don't expose PAGE_SIZE to userspace in this patchset, but the *offset* in
> PAGE of the DBs:
> 
> diff --git a/include/uapi/rdma/erdma-abi.h b/include/uapi/rdma/erdma-abi.h
> index b7a0222f978f..57f8942a3c56 100644
> --- a/include/uapi/rdma/erdma-abi.h
> +++ b/include/uapi/rdma/erdma-abi.h
> @@ -40,10 +40,13 @@ struct erdma_uresp_alloc_ctx {
>  	__u32 dev_id;
>  	__u32 pad;
>  	__u32 sdb_type;
> -	__u32 sdb_offset;
> +	__u32 sdb_entid;
>  	__aligned_u64 sdb;
>  	__aligned_u64 rdb;
>  	__aligned_u64 cdb;
> +	__u32 sdb_off;
> +	__u32 rdb_off;
> +	__u32 cdb_off;
>  };
> 
> Our doorbell space is aligned to 4096
> 

Please ignore these two response. There is something wrong with my Thunderbird:
When I press ctrl + s, it send the email, not save it.

Sorry for this.

Cheng Xu

> 
> 
>> Thanks

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v2 2/2] RDMA/erdma: Support non-4K page size in doorbell allocation
  2023-03-14 10:23   ` Leon Romanovsky
       [not found]     ` <5b0cc34d-a185-d9b4-c312-27bc959d929d@linux.alibaba.com>
@ 2023-03-14 11:50     ` Cheng Xu
  2023-03-14 14:10       ` Leon Romanovsky
  1 sibling, 1 reply; 23+ messages in thread
From: Cheng Xu @ 2023-03-14 11:50 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: jgg, linux-rdma, KaiShen



On 3/14/23 6:23 PM, Leon Romanovsky wrote:
> On Tue, Mar 07, 2023 at 06:29:24PM +0800, Cheng Xu wrote:
>> Doorbell resources are exposed to userspace by mmap. The size unit of mmap
>> is PAGE_SIZE, previous implementation can not work correctly if PAGE_SIZE
>> is not 4K. We support non-4K page size in this commit.
> 
> Why do you need this information in rdma-core?
> Can you use sysconf(_SC_PAGESIZE) there to understand the page size like
> other providers?
> 

I don't expose PAGE_SIZE to userspace in this patchset, but the *offset* in
PAGE of the DBs:

diff --git a/include/uapi/rdma/erdma-abi.h b/include/uapi/rdma/erdma-abi.h
index b7a0222f978f..57f8942a3c56 100644
--- a/include/uapi/rdma/erdma-abi.h
+++ b/include/uapi/rdma/erdma-abi.h
@@ -40,10 +40,13 @@ struct erdma_uresp_alloc_ctx {
 	__u32 dev_id;
 	__u32 pad;
 	__u32 sdb_type;
-	__u32 sdb_offset;
+	__u32 sdb_entid;
 	__aligned_u64 sdb;
 	__aligned_u64 rdb;
 	__aligned_u64 cdb;
+	__u32 sdb_off;
+	__u32 rdb_off;
+	__u32 cdb_off;
 };

Our doorbell space is aligned to 4096, this works fine when PAGE_SIZE is
also 4096, and the doorbell space starts from the mapped page. When
PAGE_SIZE is not 4096, the doorbell space may starts from the middle of
the mapped page.

For example, our SQ doorbell starts from the offset 4096 in PCIe bar 0.
When we map the first SQ doorbell to userspace when PAGE_SIZE is 64K,
the doorbell space starts from the offset 4096 in mmap returned address.

So the userspace needs to know the doorbell space offset in mmaped page.

Thanks,
Cheng Xu



> Thanks

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v2 2/2] RDMA/erdma: Support non-4K page size in doorbell allocation
  2023-03-14 11:50     ` Cheng Xu
@ 2023-03-14 14:10       ` Leon Romanovsky
  2023-03-15  1:58         ` Cheng Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Leon Romanovsky @ 2023-03-14 14:10 UTC (permalink / raw)
  To: Cheng Xu; +Cc: jgg, linux-rdma, KaiShen

On Tue, Mar 14, 2023 at 07:50:19PM +0800, Cheng Xu wrote:
> 
> 
> On 3/14/23 6:23 PM, Leon Romanovsky wrote:
> > On Tue, Mar 07, 2023 at 06:29:24PM +0800, Cheng Xu wrote:
> >> Doorbell resources are exposed to userspace by mmap. The size unit of mmap
> >> is PAGE_SIZE, previous implementation can not work correctly if PAGE_SIZE
> >> is not 4K. We support non-4K page size in this commit.
> > 
> > Why do you need this information in rdma-core?
> > Can you use sysconf(_SC_PAGESIZE) there to understand the page size like
> > other providers?
> > 
> 
> I don't expose PAGE_SIZE to userspace in this patchset, but the *offset* in
> PAGE of the DBs:
> 
> diff --git a/include/uapi/rdma/erdma-abi.h b/include/uapi/rdma/erdma-abi.h
> index b7a0222f978f..57f8942a3c56 100644
> --- a/include/uapi/rdma/erdma-abi.h
> +++ b/include/uapi/rdma/erdma-abi.h
> @@ -40,10 +40,13 @@ struct erdma_uresp_alloc_ctx {
>  	__u32 dev_id;
>  	__u32 pad;
>  	__u32 sdb_type;
> -	__u32 sdb_offset;
> +	__u32 sdb_entid;
>  	__aligned_u64 sdb;
>  	__aligned_u64 rdb;
>  	__aligned_u64 cdb;
> +	__u32 sdb_off;
> +	__u32 rdb_off;
> +	__u32 cdb_off;
>  };
> 
> Our doorbell space is aligned to 4096, this works fine when PAGE_SIZE is
> also 4096, and the doorbell space starts from the mapped page. When
> PAGE_SIZE is not 4096, the doorbell space may starts from the middle of
> the mapped page.
> 
> For example, our SQ doorbell starts from the offset 4096 in PCIe bar 0.
> When we map the first SQ doorbell to userspace when PAGE_SIZE is 64K,
> the doorbell space starts from the offset 4096 in mmap returned address.
> 
> So the userspace needs to know the doorbell space offset in mmaped page.

And can't you preserve same alignment in the kernel for doorbells for every page size?
Just always start from 0.

> 
> Thanks,
> Cheng Xu
> 
> 
> 
> > Thanks

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v2 2/2] RDMA/erdma: Support non-4K page size in doorbell allocation
  2023-03-14 14:10       ` Leon Romanovsky
@ 2023-03-15  1:58         ` Cheng Xu
  2023-03-15 10:22           ` Leon Romanovsky
  0 siblings, 1 reply; 23+ messages in thread
From: Cheng Xu @ 2023-03-15  1:58 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: jgg, linux-rdma, KaiShen



On 3/14/23 10:10 PM, Leon Romanovsky wrote:
> On Tue, Mar 14, 2023 at 07:50:19PM +0800, Cheng Xu wrote:
>>
>>
<...>
>>
>> Our doorbell space is aligned to 4096, this works fine when PAGE_SIZE is
>> also 4096, and the doorbell space starts from the mapped page. When
>> PAGE_SIZE is not 4096, the doorbell space may starts from the middle of
>> the mapped page.
>>
>> For example, our SQ doorbell starts from the offset 4096 in PCIe bar 0.
>> When we map the first SQ doorbell to userspace when PAGE_SIZE is 64K,
>> the doorbell space starts from the offset 4096 in mmap returned address.
>>
>> So the userspace needs to know the doorbell space offset in mmaped page.
> 
> And can't you preserve same alignment in the kernel for doorbells for every page size?
> Just always start from 0.
> 

I've considered this option before, but unfortunately can't, at least for CQ DB.
The size of our PCIe bar 0 is 512K, and offset [484K, 508K] are CQ doorbells.
CQ doorbell space is located in offset [36K, 60K] when PAGE_SIZE = 64K, and can't
start from offset 0 in this case.

Another reason is that we want to organize SQ doorbell space in unit of 4096.
In current implementation, each ucontext will be assigned a SQ doorbell space
for both normal doorbell and direct wqe usage. Unit of 4096, compared with
larger unit, more ucontexts can be assigned exclusive doorbell space for direct
wqe.

Thanks,
Cheng Xu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v2 2/2] RDMA/erdma: Support non-4K page size in doorbell allocation
  2023-03-15  1:58         ` Cheng Xu
@ 2023-03-15 10:22           ` Leon Romanovsky
  2023-03-21 14:30             ` Jason Gunthorpe
  0 siblings, 1 reply; 23+ messages in thread
From: Leon Romanovsky @ 2023-03-15 10:22 UTC (permalink / raw)
  To: Cheng Xu; +Cc: jgg, linux-rdma, KaiShen

On Wed, Mar 15, 2023 at 09:58:06AM +0800, Cheng Xu wrote:
> 
> 
> On 3/14/23 10:10 PM, Leon Romanovsky wrote:
> > On Tue, Mar 14, 2023 at 07:50:19PM +0800, Cheng Xu wrote:
> >>
> >>
> <...>
> >>
> >> Our doorbell space is aligned to 4096, this works fine when PAGE_SIZE is
> >> also 4096, and the doorbell space starts from the mapped page. When
> >> PAGE_SIZE is not 4096, the doorbell space may starts from the middle of
> >> the mapped page.
> >>
> >> For example, our SQ doorbell starts from the offset 4096 in PCIe bar 0.
> >> When we map the first SQ doorbell to userspace when PAGE_SIZE is 64K,
> >> the doorbell space starts from the offset 4096 in mmap returned address.
> >>
> >> So the userspace needs to know the doorbell space offset in mmaped page.
> > 
> > And can't you preserve same alignment in the kernel for doorbells for every page size?
> > Just always start from 0.
> > 
> 
> I've considered this option before, but unfortunately can't, at least for CQ DB.
> The size of our PCIe bar 0 is 512K, and offset [484K, 508K] are CQ doorbells.
> CQ doorbell space is located in offset [36K, 60K] when PAGE_SIZE = 64K, and can't
> start from offset 0 in this case.
> 
> Another reason is that we want to organize SQ doorbell space in unit of 4096.
> In current implementation, each ucontext will be assigned a SQ doorbell space
> for both normal doorbell and direct wqe usage. Unit of 4096, compared with
> larger unit, more ucontexts can be assigned exclusive doorbell space for direct
> wqe.

I have a feeling that there is an existing API for it already.
Let's give a chance for Jason to chime in.

Thanks

> 
> Thanks,
> Cheng Xu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v2 2/2] RDMA/erdma: Support non-4K page size in doorbell allocation
  2023-03-15 10:22           ` Leon Romanovsky
@ 2023-03-21 14:30             ` Jason Gunthorpe
  2023-03-22  7:05               ` Cheng Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Jason Gunthorpe @ 2023-03-21 14:30 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Cheng Xu, linux-rdma, KaiShen

On Wed, Mar 15, 2023 at 12:22:10PM +0200, Leon Romanovsky wrote:
> On Wed, Mar 15, 2023 at 09:58:06AM +0800, Cheng Xu wrote:
> > 
> > 
> > On 3/14/23 10:10 PM, Leon Romanovsky wrote:
> > > On Tue, Mar 14, 2023 at 07:50:19PM +0800, Cheng Xu wrote:
> > >>
> > >>
> > <...>
> > >>
> > >> Our doorbell space is aligned to 4096, this works fine when PAGE_SIZE is
> > >> also 4096, and the doorbell space starts from the mapped page. When
> > >> PAGE_SIZE is not 4096, the doorbell space may starts from the middle of
> > >> the mapped page.
> > >>
> > >> For example, our SQ doorbell starts from the offset 4096 in PCIe bar 0.
> > >> When we map the first SQ doorbell to userspace when PAGE_SIZE is 64K,
> > >> the doorbell space starts from the offset 4096 in mmap returned address.
> > >>
> > >> So the userspace needs to know the doorbell space offset in mmaped page.
> > > 
> > > And can't you preserve same alignment in the kernel for doorbells for every page size?
> > > Just always start from 0.
> > > 
> > 
> > I've considered this option before, but unfortunately can't, at least for CQ DB.
> > The size of our PCIe bar 0 is 512K, and offset [484K, 508K] are CQ doorbells.
> > CQ doorbell space is located in offset [36K, 60K] when PAGE_SIZE = 64K, and can't
> > start from offset 0 in this case.
> > 
> > Another reason is that we want to organize SQ doorbell space in unit of 4096.
> > In current implementation, each ucontext will be assigned a SQ doorbell space
> > for both normal doorbell and direct wqe usage. Unit of 4096, compared with
> > larger unit, more ucontexts can be assigned exclusive doorbell space for direct
> > wqe.
> 
> I have a feeling that there is an existing API for it already.
> Let's give a chance for Jason to chime in.

This sounds similar to how mlx5 chops up its doorbell space

But I don't understand your device security model.

In mlx5 it is not allowed to share doorbells between unrelated
processes. Doorbells have built in security and a doorbell can only
trigger QP/CQ's that are explicitly linked to that doorbell.

Thus mlx5 is careful to only mmap doorbells that are linked to the
QPs. On 64K page size userspace receives alot of doorbells per mmap,
all linked to the same security context.

Improperly sharing MMIO pages can result in various security problems
if a hostile userspace can write arbitary things to MMIO space.

Here you seem to be talking about overmapping stuff. What is the
security argument that it is safe to leak to userspace parts of the
device MMIO BAR beyond that strictly cotnained to the single doorbell?

This has to be clearly explained in the commit message and a comment.

Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v2 2/2] RDMA/erdma: Support non-4K page size in doorbell allocation
  2023-03-21 14:30             ` Jason Gunthorpe
@ 2023-03-22  7:05               ` Cheng Xu
  2023-03-22 11:54                 ` Jason Gunthorpe
  0 siblings, 1 reply; 23+ messages in thread
From: Cheng Xu @ 2023-03-22  7:05 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky; +Cc: linux-rdma, KaiShen



On 3/21/23 10:30 PM, Jason Gunthorpe wrote:
> On Wed, Mar 15, 2023 at 12:22:10PM +0200, Leon Romanovsky wrote:
<...>
> 
> This sounds similar to how mlx5 chops up its doorbell space
> 
> But I don't understand your device security model.
> 
> In mlx5 it is not allowed to share doorbells between unrelated
> processes. Doorbells have built in security and a doorbell can only
> trigger QP/CQ's that are explicitly linked to that doorbell.
> 
> Thus mlx5 is careful to only mmap doorbells that are linked to the
> QPs. On 64K page size userspace receives alot of doorbells per mmap,
> all linked to the same security context.
> 
> Improperly sharing MMIO pages can result in various security problems
> if a hostile userspace can write arbitary things to MMIO space.
> 
> Here you seem to be talking about overmapping stuff. What is the
> security argument that it is safe to leak to userspace parts of the
> device MMIO BAR beyond that strictly cotnained to the single doorbell?

Thank you for your explanation. IIUC, this security mechanism of mlx5
needs the support of HW, and the HW can reject doorbells from unauthorized
doorbell space.

The current generation of erdma devices do not have this capability due to
implementation complexity. Without this HW capability, isolating the MMIO
space in software doesn't prevent the attack, because the malicious APPs
can map mmio itself, not through verbs interface.

Our consideration of security is mainly focused on the VM/VF level,
not at the process/ucontext level: no matter what the user does inside the
VM, it cannot affect other VMs. Therefore, The userspace isolation of mmio
inside one VM is incomplete and shared mmio pages cannot be avoided in
this generation.

> This has to be clearly explained in the commit message and a comment.

All right, I will do this in v3.

Thanks,
Cheng Xu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v2 2/2] RDMA/erdma: Support non-4K page size in doorbell allocation
  2023-03-22  7:05               ` Cheng Xu
@ 2023-03-22 11:54                 ` Jason Gunthorpe
  2023-03-22 13:30                   ` Cheng Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Jason Gunthorpe @ 2023-03-22 11:54 UTC (permalink / raw)
  To: Cheng Xu; +Cc: Leon Romanovsky, linux-rdma, KaiShen

On Wed, Mar 22, 2023 at 03:05:29PM +0800, Cheng Xu wrote:

> The current generation of erdma devices do not have this capability due to
> implementation complexity. Without this HW capability, isolating the MMIO
> space in software doesn't prevent the attack, because the malicious APPs
> can map mmio itself, not through verbs interface.

This doesn't meet the security model of Linux, verbs HW is expected to
protect one process from another process.

if this is the case we should consider restricting this HW to
CAP_SYS_RAW_IO only.

You should come with an explanation why this HW is safe enough to
avoid this.

Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v2 2/2] RDMA/erdma: Support non-4K page size in doorbell allocation
  2023-03-22 11:54                 ` Jason Gunthorpe
@ 2023-03-22 13:30                   ` Cheng Xu
  2023-03-22 14:01                     ` Jason Gunthorpe
  0 siblings, 1 reply; 23+ messages in thread
From: Cheng Xu @ 2023-03-22 13:30 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma, KaiShen



On 3/22/23 7:54 PM, Jason Gunthorpe wrote:
> On Wed, Mar 22, 2023 at 03:05:29PM +0800, Cheng Xu wrote:
> 
>> The current generation of erdma devices do not have this capability due to
>> implementation complexity. Without this HW capability, isolating the MMIO
>> space in software doesn't prevent the attack, because the malicious APPs
>> can map mmio itself, not through verbs interface.
> 
> This doesn't meet the security model of Linux, verbs HW is expected to
> protect one process from another process.

OK, I see.

So the key point is that HW should restrict each process to use its own doorbell
space. If hardware can do this, share or do not share MMIO pages both will meet
the security requirement. Do I get it right? 

It seems that EFA uses shared MMIO pages with hardware security assurance.

> if this is the case we should consider restricting this HW to
> CAP_SYS_RAW_IO only.
> 

Please give us a chance to fix this issue first.

> You should come with an explanation why this HW is safe enough to
> avoid this.

I need to discuss with our HW guys and implement the similar security check
in HW, and this won't be long.

Thanks,
Cheng Xu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v2 2/2] RDMA/erdma: Support non-4K page size in doorbell allocation
  2023-03-22 13:30                   ` Cheng Xu
@ 2023-03-22 14:01                     ` Jason Gunthorpe
  2023-03-22 15:09                       ` Gal Pressman
  2023-03-23  6:57                       ` Cheng Xu
  0 siblings, 2 replies; 23+ messages in thread
From: Jason Gunthorpe @ 2023-03-22 14:01 UTC (permalink / raw)
  To: Cheng Xu
  Cc: Leon Romanovsky, linux-rdma, KaiShen, Yossi Leybovich, Gal Pressman

On Wed, Mar 22, 2023 at 09:30:41PM +0800, Cheng Xu wrote:
> 
> 
> On 3/22/23 7:54 PM, Jason Gunthorpe wrote:
> > On Wed, Mar 22, 2023 at 03:05:29PM +0800, Cheng Xu wrote:
> > 
> >> The current generation of erdma devices do not have this capability due to
> >> implementation complexity. Without this HW capability, isolating the MMIO
> >> space in software doesn't prevent the attack, because the malicious APPs
> >> can map mmio itself, not through verbs interface.
> > 
> > This doesn't meet the security model of Linux, verbs HW is expected to
> > protect one process from another process.
> 
> OK, I see.
> 
> So the key point is that HW should restrict each process to use its own doorbell
> space. If hardware can do this, share or do not share MMIO pages both will meet
> the security requirement. Do I get it right? 

HW can never do that, HW is supposed to rely on the system MMU to
isolate doorbell registers

The HW responsibility is to make doorbell MMIO registers safe in the
hands of other processes.

Simple doorbells that only 'kick' and don't convey any information are
probably safe to share, and don't require HW checks between the
doorbell page and the PD/QP/CQ/etc

Doorbells that deliver data - eg a head pointer - are not safe because
the wrong head pointer can corrupt the HW state. Process B must not be
able to corrupt the head pointer of a QP/CQ owned by Process A under
any circumstances. Definitely they cannot have access to the MMIO and
also the HW must ensure that writes coming from process B are rejected
if they touch resources owned by process a (eg by PD/QPN/CQN checks in
HW)

Doorbells that accept entire WQE's are definately not safe as a
hostile process could execute a WQE on a QP it does not own.

> It seems that EFA uses shared MMIO pages with hardware security assurance.

I'm not sure how EFA works, it writes this:

        EFA_SET(&db, EFA_IO_REGS_CQ_DB_CONSUMER_INDEX, cq->cc);
        EFA_SET(&db, EFA_IO_REGS_CQ_DB_CMD_SN, cq->cmd_sn & 0x3);
        EFA_SET(&db, EFA_IO_REGS_CQ_DB_ARM, arm);

But interestingly there is no CQN value, so I have no idea how it
associates the doorbell register with the CQ to load the data into.

If it is using a DB register offset to learn the CQN then obviously it
cannot share the same doorbell page (or associated CQNs) across verbs
FD contexts, that would be a security bug. EFA folks?

Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v2 2/2] RDMA/erdma: Support non-4K page size in doorbell allocation
  2023-03-22 14:01                     ` Jason Gunthorpe
@ 2023-03-22 15:09                       ` Gal Pressman
  2023-03-23  6:57                       ` Cheng Xu
  1 sibling, 0 replies; 23+ messages in thread
From: Gal Pressman @ 2023-03-22 15:09 UTC (permalink / raw)
  To: Jason Gunthorpe, Cheng Xu
  Cc: Leon Romanovsky, linux-rdma, KaiShen, Yossi Leybovich

On 22/03/2023 16:01, Jason Gunthorpe wrote:
>> It seems that EFA uses shared MMIO pages with hardware security assurance.
> 
> I'm not sure how EFA works, it writes this:
> 
>         EFA_SET(&db, EFA_IO_REGS_CQ_DB_CONSUMER_INDEX, cq->cc);
>         EFA_SET(&db, EFA_IO_REGS_CQ_DB_CMD_SN, cq->cmd_sn & 0x3);
>         EFA_SET(&db, EFA_IO_REGS_CQ_DB_ARM, arm);
> 
> But interestingly there is no CQN value, so I have no idea how it
> associates the doorbell register with the CQ to load the data into.
> 
> If it is using a DB register offset to learn the CQN then obviously it
> cannot share the same doorbell page (or associated CQNs) across verbs
> FD contexts, that would be a security bug. EFA folks?

The doorbell's BAR address is dictated by the device, and the CQN is
derived from the address. Each doorbell resides in a separate page,
there is no sharing of doorbells pages between different UARs.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v2 2/2] RDMA/erdma: Support non-4K page size in doorbell allocation
  2023-03-22 14:01                     ` Jason Gunthorpe
  2023-03-22 15:09                       ` Gal Pressman
@ 2023-03-23  6:57                       ` Cheng Xu
  2023-03-23 11:53                         ` Jason Gunthorpe
  1 sibling, 1 reply; 23+ messages in thread
From: Cheng Xu @ 2023-03-23  6:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, linux-rdma, KaiShen, Yossi Leybovich, Gal Pressman



On 3/22/23 10:01 PM, Jason Gunthorpe wrote:
> On Wed, Mar 22, 2023 at 09:30:41PM +0800, Cheng Xu wrote:
>>
>>
>> On 3/22/23 7:54 PM, Jason Gunthorpe wrote:
>>> On Wed, Mar 22, 2023 at 03:05:29PM +0800, Cheng Xu wrote:
>>>
>>>> The current generation of erdma devices do not have this capability due to
>>>> implementation complexity. Without this HW capability, isolating the MMIO
>>>> space in software doesn't prevent the attack, because the malicious APPs
>>>> can map mmio itself, not through verbs interface.
>>>
>>> This doesn't meet the security model of Linux, verbs HW is expected to
>>> protect one process from another process.
>>
>> OK, I see.
>>
>> So the key point is that HW should restrict each process to use its own doorbell
>> space. If hardware can do this, share or do not share MMIO pages both will meet
>> the security requirement. Do I get it right? 
> 
> HW can never do that, HW is supposed to rely on the system MMU to
> isolate doorbell registers
> 
> The HW responsibility is to make doorbell MMIO registers safe in the
> hands of other processes.
> 
> Simple doorbells that only 'kick' and don't convey any information are
> probably safe to share, and don't require HW checks between the
> doorbell page and the PD/QP/CQ/etc
> 
> Doorbells that deliver data - eg a head pointer - are not safe because
> the wrong head pointer can corrupt the HW state. Process B must not be
> able to corrupt the head pointer of a QP/CQ owned by Process A under
> any circumstances. Definitely they cannot have access to the MMIO and
> also the HW must ensure that writes coming from process B are rejected
> if they touch resources owned by process a (eg by PD/QPN/CQN checks in
> HW)
> 
> Doorbells that accept entire WQE's are definately not safe as a
> hostile process could execute a WQE on a QP it does not own.
> 

It's much clear, thanks for your explanation and patience.

Back to erdma context, we have rethought our implementation. For QPs,
we have a field *wqe_index* in SQE/RQE, which indicates the validity
of the current WQE. Incorrect doorbell value from other processes can
not corrupt the QPC in hardware due to PI range and WQE content
validation in HW.

Unlike SQ/RQ, for CQ doorbell, It seems that we need some more works
to protect it. We have started analyzing the details.

Thanks,
Cheng Xu




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v2 2/2] RDMA/erdma: Support non-4K page size in doorbell allocation
  2023-03-23  6:57                       ` Cheng Xu
@ 2023-03-23 11:53                         ` Jason Gunthorpe
  2023-03-23 12:33                           ` Cheng Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Jason Gunthorpe @ 2023-03-23 11:53 UTC (permalink / raw)
  To: Cheng Xu
  Cc: Leon Romanovsky, linux-rdma, KaiShen, Yossi Leybovich, Gal Pressman

On Thu, Mar 23, 2023 at 02:57:49PM +0800, Cheng Xu wrote:
> 
> 
> On 3/22/23 10:01 PM, Jason Gunthorpe wrote:
> > On Wed, Mar 22, 2023 at 09:30:41PM +0800, Cheng Xu wrote:
> >>
> >>
> >> On 3/22/23 7:54 PM, Jason Gunthorpe wrote:
> >>> On Wed, Mar 22, 2023 at 03:05:29PM +0800, Cheng Xu wrote:
> >>>
> >>>> The current generation of erdma devices do not have this capability due to
> >>>> implementation complexity. Without this HW capability, isolating the MMIO
> >>>> space in software doesn't prevent the attack, because the malicious APPs
> >>>> can map mmio itself, not through verbs interface.
> >>>
> >>> This doesn't meet the security model of Linux, verbs HW is expected to
> >>> protect one process from another process.
> >>
> >> OK, I see.
> >>
> >> So the key point is that HW should restrict each process to use its own doorbell
> >> space. If hardware can do this, share or do not share MMIO pages both will meet
> >> the security requirement. Do I get it right? 
> > 
> > HW can never do that, HW is supposed to rely on the system MMU to
> > isolate doorbell registers
> > 
> > The HW responsibility is to make doorbell MMIO registers safe in the
> > hands of other processes.
> > 
> > Simple doorbells that only 'kick' and don't convey any information are
> > probably safe to share, and don't require HW checks between the
> > doorbell page and the PD/QP/CQ/etc
> > 
> > Doorbells that deliver data - eg a head pointer - are not safe because
> > the wrong head pointer can corrupt the HW state. Process B must not be
> > able to corrupt the head pointer of a QP/CQ owned by Process A under
> > any circumstances. Definitely they cannot have access to the MMIO and
> > also the HW must ensure that writes coming from process B are rejected
> > if they touch resources owned by process a (eg by PD/QPN/CQN checks in
> > HW)
> > 
> > Doorbells that accept entire WQE's are definately not safe as a
> > hostile process could execute a WQE on a QP it does not own.
> > 
> 
> It's much clear, thanks for your explanation and patience.
> 
> Back to erdma context, we have rethought our implementation. For QPs,
> we have a field *wqe_index* in SQE/RQE, which indicates the validity
> of the current WQE. Incorrect doorbell value from other processes can
> not corrupt the QPC in hardware due to PI range and WQE content
> validation in HW.

No, validating the DB content is not acceptable security. The attacker
process can always generate valid content if it tries hard enough.

The only acceptable answer is to do like every other NIC did and link
the DB register to the HW object it is allowed to affect.

Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v2 2/2] RDMA/erdma: Support non-4K page size in doorbell allocation
  2023-03-23 11:53                         ` Jason Gunthorpe
@ 2023-03-23 12:33                           ` Cheng Xu
  2023-03-23 13:05                             ` Jason Gunthorpe
  0 siblings, 1 reply; 23+ messages in thread
From: Cheng Xu @ 2023-03-23 12:33 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, linux-rdma, KaiShen, Yossi Leybovich, Gal Pressman



On 3/23/23 7:53 PM, Jason Gunthorpe wrote:
> On Thu, Mar 23, 2023 at 02:57:49PM +0800, Cheng Xu wrote:
>>
>>
>> On 3/22/23 10:01 PM, Jason Gunthorpe wrote:
>>> On Wed, Mar 22, 2023 at 09:30:41PM +0800, Cheng Xu wrote:
>>>>
>>>>
>>>> On 3/22/23 7:54 PM, Jason Gunthorpe wrote:
<...>
>>
>> It's much clear, thanks for your explanation and patience.
>>
>> Back to erdma context, we have rethought our implementation. For QPs,
>> we have a field *wqe_index* in SQE/RQE, which indicates the validity
>> of the current WQE. Incorrect doorbell value from other processes can
>> not corrupt the QPC in hardware due to PI range and WQE content
>> validation in HW.
> 
> No, validating the DB content is not acceptable security. The attacker
> process can always generate valid content if it tries hard enough.
>

Oh, you may misunderstand what I said, our HW validates the *WQE* content,
not *DB* content. The attacker can not generate the WQE of other QPs. This
protection and correction is already implemented in our HW.

> The only acceptable answer is to do like every other NIC did and link
> the DB register to the HW object it is allowed to affect.
> 

Emm, still not acceptable with WQE content validation? If it's acceptable,
will reduce some works.

Thanks,
Cheng Xu



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v2 2/2] RDMA/erdma: Support non-4K page size in doorbell allocation
  2023-03-23 12:33                           ` Cheng Xu
@ 2023-03-23 13:05                             ` Jason Gunthorpe
  2023-03-23 14:10                               ` Cheng Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Jason Gunthorpe @ 2023-03-23 13:05 UTC (permalink / raw)
  To: Cheng Xu
  Cc: Leon Romanovsky, linux-rdma, KaiShen, Yossi Leybovich, Gal Pressman

On Thu, Mar 23, 2023 at 08:33:53PM +0800, Cheng Xu wrote:
> 
> 
> On 3/23/23 7:53 PM, Jason Gunthorpe wrote:
> > On Thu, Mar 23, 2023 at 02:57:49PM +0800, Cheng Xu wrote:
> >>
> >>
> >> On 3/22/23 10:01 PM, Jason Gunthorpe wrote:
> >>> On Wed, Mar 22, 2023 at 09:30:41PM +0800, Cheng Xu wrote:
> >>>>
> >>>>
> >>>> On 3/22/23 7:54 PM, Jason Gunthorpe wrote:
> <...>
> >>
> >> It's much clear, thanks for your explanation and patience.
> >>
> >> Back to erdma context, we have rethought our implementation. For QPs,
> >> we have a field *wqe_index* in SQE/RQE, which indicates the validity
> >> of the current WQE. Incorrect doorbell value from other processes can
> >> not corrupt the QPC in hardware due to PI range and WQE content
> >> validation in HW.
> > 
> > No, validating the DB content is not acceptable security. The attacker
> > process can always generate valid content if it tries hard enough.
> >
> 
> Oh, you may misunderstand what I said, our HW validates the *WQE* content,
> not *DB* content. The attacker can not generate the WQE of other QPs. This
> protection and correction is already implemented in our HW.

Why are you talking about WQEs in a discussion about doorbell
security?

WQE's are read via DMA from their SQ/RQs - that path doesn't have a
doorbell security problem.

The issue comes if you try to deliver the WQE via a doorbell write.

Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v2 2/2] RDMA/erdma: Support non-4K page size in doorbell allocation
  2023-03-23 13:05                             ` Jason Gunthorpe
@ 2023-03-23 14:10                               ` Cheng Xu
  2023-03-23 14:18                                 ` Jason Gunthorpe
  0 siblings, 1 reply; 23+ messages in thread
From: Cheng Xu @ 2023-03-23 14:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, linux-rdma, KaiShen, Yossi Leybovich, Gal Pressman



On 3/23/23 9:05 PM, Jason Gunthorpe wrote:
> On Thu, Mar 23, 2023 at 08:33:53PM +0800, Cheng Xu wrote:
>>
>>
>> On 3/23/23 7:53 PM, Jason Gunthorpe wrote:
>>> On Thu, Mar 23, 2023 at 02:57:49PM +0800, Cheng Xu wrote:
>>>>
>>>>
>>>> On 3/22/23 10:01 PM, Jason Gunthorpe wrote:
>>>>> On Wed, Mar 22, 2023 at 09:30:41PM +0800, Cheng Xu wrote:
>>>>>>
>>>>>>
>>>>>> On 3/22/23 7:54 PM, Jason Gunthorpe wrote:
>> <...>
>>>>
>>>> It's much clear, thanks for your explanation and patience.
>>>>
>>>> Back to erdma context, we have rethought our implementation. For QPs,
>>>> we have a field *wqe_index* in SQE/RQE, which indicates the validity
>>>> of the current WQE. Incorrect doorbell value from other processes can
>>>> not corrupt the QPC in hardware due to PI range and WQE content
>>>> validation in HW.
>>>
>>> No, validating the DB content is not acceptable security. The attacker
>>> process can always generate valid content if it tries hard enough.
>>>
>>
>> Oh, you may misunderstand what I said, our HW validates the *WQE* content,
>> not *DB* content. The attacker can not generate the WQE of other QPs. This
>> protection and correction is already implemented in our HW.
> 
> Why are you talking about WQEs in a discussion about doorbell
> security?

Malicious doorbell will corrupt the head pointer in QPC, and then invalid WQEs
will be processed. My point is that WQE validation can correct the head pointer
carried in malicious doorbell, and can prevent such attack.

It looks like the first kind of the doorbell types you mentioned before, but only
conveying the QPN.

Thanks,
Cheng Xu 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v2 2/2] RDMA/erdma: Support non-4K page size in doorbell allocation
  2023-03-23 14:10                               ` Cheng Xu
@ 2023-03-23 14:18                                 ` Jason Gunthorpe
  2023-03-26  0:10                                   ` Cheng Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Jason Gunthorpe @ 2023-03-23 14:18 UTC (permalink / raw)
  To: Cheng Xu
  Cc: Leon Romanovsky, linux-rdma, KaiShen, Yossi Leybovich, Gal Pressman

On Thu, Mar 23, 2023 at 10:10:25PM +0800, Cheng Xu wrote:
> 
> 
> On 3/23/23 9:05 PM, Jason Gunthorpe wrote:
> > On Thu, Mar 23, 2023 at 08:33:53PM +0800, Cheng Xu wrote:
> >>
> >>
> >> On 3/23/23 7:53 PM, Jason Gunthorpe wrote:
> >>> On Thu, Mar 23, 2023 at 02:57:49PM +0800, Cheng Xu wrote:
> >>>>
> >>>>
> >>>> On 3/22/23 10:01 PM, Jason Gunthorpe wrote:
> >>>>> On Wed, Mar 22, 2023 at 09:30:41PM +0800, Cheng Xu wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 3/22/23 7:54 PM, Jason Gunthorpe wrote:
> >> <...>
> >>>>
> >>>> It's much clear, thanks for your explanation and patience.
> >>>>
> >>>> Back to erdma context, we have rethought our implementation. For QPs,
> >>>> we have a field *wqe_index* in SQE/RQE, which indicates the validity
> >>>> of the current WQE. Incorrect doorbell value from other processes can
> >>>> not corrupt the QPC in hardware due to PI range and WQE content
> >>>> validation in HW.
> >>>
> >>> No, validating the DB content is not acceptable security. The attacker
> >>> process can always generate valid content if it tries hard enough.
> >>>
> >>
> >> Oh, you may misunderstand what I said, our HW validates the *WQE* content,
> >> not *DB* content. The attacker can not generate the WQE of other QPs. This
> >> protection and correction is already implemented in our HW.
> > 
> > Why are you talking about WQEs in a discussion about doorbell
> > security?
> 
> Malicious doorbell will corrupt the head pointer in QPC, and then invalid WQEs
> will be processed. My point is that WQE validation can correct the head pointer
> carried in malicious doorbell, and can prevent such attack.

No, if the head pointer is incorrect an attack can stall the QP by
guessing a head pointer that is valid but before already submitted
WQEs.

There is no WQE based recovery for such a thing.

Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v2 1/2] RDMA/erdma: Use fixed hardware page size
  2023-03-07 10:29 ` [PATCH for-next v2 1/2] RDMA/erdma: Use fixed hardware page size Cheng Xu
@ 2023-03-24 14:34   ` Jason Gunthorpe
  0 siblings, 0 replies; 23+ messages in thread
From: Jason Gunthorpe @ 2023-03-24 14:34 UTC (permalink / raw)
  To: Cheng Xu; +Cc: leon, linux-rdma, KaiShen

On Tue, Mar 07, 2023 at 06:29:23PM +0800, Cheng Xu wrote:
> Hardware's page size is 4096, but the kernel's page size may vary. Driver
> should use hardware's page size when communicating with hardware.
> 
> Fixes: 155055771704 ("RDMA/erdma: Add verbs implementation")
> Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
> ---
>  drivers/infiniband/hw/erdma/erdma_hw.h    |  4 ++++
>  drivers/infiniband/hw/erdma/erdma_verbs.c | 17 +++++++++--------
>  2 files changed, 13 insertions(+), 8 deletions(-)

This patch applied to for-next

Thanks,
Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v2 2/2] RDMA/erdma: Support non-4K page size in doorbell allocation
  2023-03-23 14:18                                 ` Jason Gunthorpe
@ 2023-03-26  0:10                                   ` Cheng Xu
  0 siblings, 0 replies; 23+ messages in thread
From: Cheng Xu @ 2023-03-26  0:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, linux-rdma, KaiShen, Yossi Leybovich, Gal Pressman



On 3/23/23 10:18 PM, Jason Gunthorpe wrote:
> On Thu, Mar 23, 2023 at 10:10:25PM +0800, Cheng Xu wrote:
>>
>>
<...>
>>
>> Malicious doorbell will corrupt the head pointer in QPC, and then invalid WQEs
>> will be processed. My point is that WQE validation can correct the head pointer
>> carried in malicious doorbell, and can prevent such attack.
> 
> No, if the head pointer is incorrect an attack can stall the QP by
> guessing a head pointer that is valid but before already submitted
> WQEs.

You are right.

Thanks very much for this discussion and your advisement.

Cheng Xu

> 
> There is no WQE based recovery for such a thing.
> 
> Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2023-03-26  0:11 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-07 10:29 [PATCH for-next v2 0/2] RDMA/erdma: Add non-4K page size support Cheng Xu
2023-03-07 10:29 ` [PATCH for-next v2 1/2] RDMA/erdma: Use fixed hardware page size Cheng Xu
2023-03-24 14:34   ` Jason Gunthorpe
2023-03-07 10:29 ` [PATCH for-next v2 2/2] RDMA/erdma: Support non-4K page size in doorbell allocation Cheng Xu
2023-03-14 10:23   ` Leon Romanovsky
     [not found]     ` <5b0cc34d-a185-d9b4-c312-27bc959d929d@linux.alibaba.com>
2023-03-14 11:34       ` Cheng Xu
2023-03-14 11:50     ` Cheng Xu
2023-03-14 14:10       ` Leon Romanovsky
2023-03-15  1:58         ` Cheng Xu
2023-03-15 10:22           ` Leon Romanovsky
2023-03-21 14:30             ` Jason Gunthorpe
2023-03-22  7:05               ` Cheng Xu
2023-03-22 11:54                 ` Jason Gunthorpe
2023-03-22 13:30                   ` Cheng Xu
2023-03-22 14:01                     ` Jason Gunthorpe
2023-03-22 15:09                       ` Gal Pressman
2023-03-23  6:57                       ` Cheng Xu
2023-03-23 11:53                         ` Jason Gunthorpe
2023-03-23 12:33                           ` Cheng Xu
2023-03-23 13:05                             ` Jason Gunthorpe
2023-03-23 14:10                               ` Cheng Xu
2023-03-23 14:18                                 ` Jason Gunthorpe
2023-03-26  0:10                                   ` Cheng Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).