* [PATCH] nvmet-rdma: Don't use the inline buffer in order to avoid allocation for small reads
@ 2016-08-01  8:36 ` Sagi Grimberg
From: Sagi Grimberg @ 2016-08-01  8:36 UTC (permalink / raw)
  To: linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org
  Cc: Jens Axboe, Christoph Hellwig, Steve Wise, Jay Freyensee, Ming Lin

Under extreme conditions this might cause data corruption. By using the
inline buffer for reads we repost the buffer for receive and then post
this buffer for the device to send. If we happen to use shared receive
queues the device might write to the buffer before it sends it (there is
no ordering between the send and recv queues). Without SRQs we probably
won't hit this as long as the host doesn't misbehave and send more than
we allowed it, but relying on that is not really a good idea.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/target/rdma.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index e06d504bdf0c..4e83d92d6bdd 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -615,15 +615,10 @@ static u16 nvmet_rdma_map_sgl_keyed(struct nvmet_rdma_rsp *rsp,
 	if (!len)
 		return 0;
 
-	/* use the already allocated data buffer if possible */
-	if (len <= NVMET_RDMA_INLINE_DATA_SIZE && rsp->queue->host_qid) {
-		nvmet_rdma_use_inline_sg(rsp, len, 0);
-	} else {
-		status = nvmet_rdma_alloc_sgl(&rsp->req.sg, &rsp->req.sg_cnt,
-				len);
-		if (status)
-			return status;
-	}
+	status = nvmet_rdma_alloc_sgl(&rsp->req.sg, &rsp->req.sg_cnt,
+			len);
+	if (status)
+		return status;
 
 	ret = rdma_rw_ctx_init(&rsp->rw, cm_id->qp, cm_id->port_num,
 			rsp->req.sg, rsp->req.sg_cnt, 0, addr, key,
-- 
1.9.1

* Re: [PATCH] nvmet-rdma: Don't use the inline buffer in order to avoid allocation for small reads
  2016-08-01  8:36 ` Sagi Grimberg
@ 2016-08-02 12:50     ` Christoph Hellwig
From: Christoph Hellwig @ 2016-08-02 12:50 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
	Jens Axboe, Christoph Hellwig, Steve Wise, Jay Freyensee, Ming Lin

On Mon, Aug 01, 2016 at 11:36:39AM +0300, Sagi Grimberg wrote:
> Under extreme conditions this might cause data corruption. By using the
> inline buffer for reads we repost the buffer for receive and then post
> this buffer for the device to send. If we happen to use shared receive
> queues the device might write to the buffer before it sends it (there is
> no ordering between the send and recv queues). Without SRQs we probably
> won't hit this as long as the host doesn't misbehave and send more than
> we allowed it, but relying on that is not really a good idea.

Pity - it seems so wasteful not being able to use these buffers for
anything that isn't an inline write.  I fully agree on the SRQ case,
but I think we should offer it for the non-SRQ case.

* Re: [PATCH] nvmet-rdma: Don't use the inline buffer in order to avoid allocation for small reads
  2016-08-02 12:50     ` Christoph Hellwig
@ 2016-08-02 13:38         ` Sagi Grimberg
From: Sagi Grimberg @ 2016-08-02 13:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
	Jens Axboe, Steve Wise, Jay Freyensee, Ming Lin


>> Under extreme conditions this might cause data corruption. By using the
>> inline buffer for reads we repost the buffer for receive and then post
>> this buffer for the device to send. If we happen to use shared receive
>> queues the device might write to the buffer before it sends it (there is
>> no ordering between the send and recv queues). Without SRQs we probably
>> won't hit this as long as the host doesn't misbehave and send more than
>> we allowed it, but relying on that is not really a good idea.
>
> Pity - it seems so wasteful not being able to use these buffers for
> anything that isn't an inline write.

Totally agree, I'm open to smart ideas on this...

> I fully agree on the SRQ case, but I think we should offer it for the non-SRQ case.

As I wrote, even in the non-SRQ case, if the host sends even a single
write beyond the negotiated queue size, the data can land in the buffer
that is currently being sent (it's a rare race condition, but
theoretically possible). The reason is that we repost the inline data
buffer for receive before we post the send request. We used to have
it the other way around (which eliminates the issue), but we then saw
some latency bubbles due to the HW sending RNR NAKs to the host for
lack of a receive buffer (in iWARP the problem was even worse because
there is no flow control).
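
Rough sketch of the two orderings (reusing the made-up helpers from the
sketch in the patch description, not the actual driver functions):

	/* current order: the host never sees an RNR NAK, but a newly
	 * arriving command may overwrite inline_buf while we are still
	 * sending the read data out of it */
	sketch_post_recv(cmd->inline_buf);
	sketch_post_read_data(cmd->inline_buf, cmd->len);

	/* previous order: safe, but a burst of commands can find no
	 * receive buffer posted and the HW answers with RNR NAKs
	 * (worse on iWARP, which has no flow control at all) */
	sketch_post_read_data(cmd->inline_buf, cmd->len);
	/* ... wait for this command's send completion ... */
	sketch_post_recv(cmd->inline_buf);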

Do you think it's OK to risk data corruption if the host is
misbehaving?

* Re: [PATCH] nvmet-rdma: Don't use the inline buffer in order to avoid allocation for small reads
  2016-08-02 13:38         ` Sagi Grimberg
@ 2016-08-02 16:15             ` Jason Gunthorpe
From: Jason Gunthorpe @ 2016-08-02 16:15 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, linux-nvme@lists.infradead.org,
	linux-rdma@vger.kernel.org, Jens Axboe, Steve Wise,
	Jay Freyensee, Ming Lin

On Tue, Aug 02, 2016 at 04:38:58PM +0300, Sagi Grimberg wrote:
> that is currently being sent (it's a rare race condition, but
> theoretically possible). The reason is that we repost the inline data
> buffer for receive before we post the send request. We used to have

?? The same buffer is posted at the same time for send and recv? That
is never OK, SRQ or not.

Jason

* Re: [PATCH] nvmet-rdma: Don't use the inline buffer in order to avoid allocation for small reads
  2016-08-02 16:15             ` Jason Gunthorpe
@ 2016-08-03  9:48                 ` Sagi Grimberg
From: Sagi Grimberg @ 2016-08-03  9:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, linux-nvme@lists.infradead.org,
	linux-rdma@vger.kernel.org, Jens Axboe, Steve Wise,
	Jay Freyensee, Ming Lin


>> that is currently being sent (it's a rare race condition, but
>> theoretically possible). The reason is that we repost the inline data
>> buffer for receive before we post the send request. We used to have
>
> ?? The same buffer is posted at the same time for send and recv? That
> is never OK, SRQ or not.

I agree.

But I agree it's a shame to lose this. Maybe if we over-allocate cmds
and restore the receive repost to after the send completion? I'm not
too fond of that either (it's not just commands but also inline
pages...)
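
Very roughly, the over-allocation idea would look something like this
(hypothetical sketch, made-up names and numbers):

	/* keep a few spare commands (and their inline pages) beyond the
	 * negotiated queue size, so a fresh receive buffer can always be
	 * posted immediately while each command's own inline buffer is
	 * recycled only after its send completes */
	#define SKETCH_SPARE_CMDS	8
	nr_cmds = negotiated_queue_size + SKETCH_SPARE_CMDS;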

* Re: [PATCH] nvmet-rdma: Don't use the inline buffer in order to avoid allocation for small reads
  2016-08-02 16:15             ` Jason Gunthorpe
@ 2016-08-03  9:49                 ` Christoph Hellwig
  -1 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2016-08-03  9:49 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Sagi Grimberg, Christoph Hellwig, linux-nvme@lists.infradead.org,
	linux-rdma@vger.kernel.org, Jens Axboe, Steve Wise,
	Jay Freyensee, Ming Lin

On Tue, Aug 02, 2016 at 10:15:26AM -0600, Jason Gunthorpe wrote:
> On Tue, Aug 02, 2016 at 04:38:58PM +0300, Sagi Grimberg wrote:
> > that is currently being sent (it's a rare race condition, but
> > theoretically possible). The reason is that we repost the inline data
> > buffer for receive before we post the send request. We used to have
> 
> ?? The same buffer is posted at the same time for send and recv? That
> is never OK, SRQ or not.

We will never POST it for a SEND, but it would be used as the target
of RDMA READ / WRITE operations.

* Re: [PATCH] nvmet-rdma: Don't use the inline buffer in order to avoid allocation for small reads
  2016-08-03  9:49                 ` Christoph Hellwig
@ 2016-08-03 10:37                     ` Sagi Grimberg
  -1 siblings, 0 replies; 16+ messages in thread
From: Sagi Grimberg @ 2016-08-03 10:37 UTC (permalink / raw)
  To: Christoph Hellwig, Jason Gunthorpe
  Cc: linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
	Jens Axboe, Steve Wise, Jay Freyensee, Ming Lin


>>> that is currently being sent (it's a rare race condition, but
>>> theoretically possible). The reason is that we repost the inline data
>>> buffer for receive before we post the send request. We used to have
>>
>> ?? The same buffer is posted at the same time for send and recv? That
>> is never OK, SRQ or not.
>
> We will never POST it for a SEND, but it would be used as the target
> of RDMA READ / WRITE operations.

Jason's comment still holds.

* Re: [PATCH] nvmet-rdma: Don't use the inline buffer in order to avoid allocation for small reads
  2016-08-01  8:36 ` Sagi Grimberg
@ 2016-08-04 11:49     ` Christoph Hellwig
From: Christoph Hellwig @ 2016-08-04 11:49 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
	Jens Axboe, Christoph Hellwig, Steve Wise, Jay Freyensee, Ming Lin

On Mon, Aug 01, 2016 at 11:36:39AM +0300, Sagi Grimberg wrote:
> Under extreme conditions this might cause data corruption. By using the
> inline buffer for reads we repost the buffer for receive and then post
> this buffer for the device to send. If we happen to use shared receive
> queues the device might write to the buffer before it sends it (there is
> no ordering between the send and recv queues). Without SRQs we probably
> won't hit this as long as the host doesn't misbehave and send more than
> we allowed it, but relying on that is not really a good idea.

Alright, after the discussion:

Reviewed-by: Christoph Hellwig <hch@lst.de>
