* [PATCH 1/2] nvme-tcp: set MSG_SENDPAGE_NOTLAST with MSG_MORE when we have more to send
@ 2020-05-05 5:20 Sagi Grimberg
2020-05-05 5:20 ` [PATCH 2/2] nvmet-tcp: " Sagi Grimberg
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Sagi Grimberg @ 2020-05-05 5:20 UTC (permalink / raw)
To: linux-nvme, Christoph Hellwig, Keith Busch
Cc: Anil Vasudevan, Mark Wunderlich
We can signal the stack that this is not the last page coming, and the
stack can build a larger TSO segment, so go ahead and use it.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/tcp.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index c79e248b9f43..7c7c1886642f 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -885,7 +885,7 @@ static int nvme_tcp_try_send_data(struct nvme_tcp_request *req)
if (last && !queue->data_digest)
flags |= MSG_EOR;
else
- flags |= MSG_MORE;
+ flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
/* can't zcopy slab pages */
if (unlikely(PageSlab(page))) {
@@ -924,11 +924,16 @@ static int nvme_tcp_try_send_cmd_pdu(struct nvme_tcp_request *req)
struct nvme_tcp_queue *queue = req->queue;
struct nvme_tcp_cmd_pdu *pdu = req->pdu;
bool inline_data = nvme_tcp_has_inline_data(req);
- int flags = MSG_DONTWAIT | (inline_data ? MSG_MORE : MSG_EOR);
u8 hdgst = nvme_tcp_hdgst_len(queue);
int len = sizeof(*pdu) + hdgst - req->offset;
+ int flags = MSG_DONTWAIT;
int ret;
+ if (inline_data)
+ flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
+ else
+ flags |= MSG_EOR;
+
if (queue->hdr_digest && !req->offset)
nvme_tcp_hdgst(queue->snd_hash, pdu, sizeof(*pdu));
@@ -967,7 +972,7 @@ static int nvme_tcp_try_send_data_pdu(struct nvme_tcp_request *req)
ret = kernel_sendpage(queue->sock, virt_to_page(pdu),
offset_in_page(pdu) + req->offset, len,
- MSG_DONTWAIT | MSG_MORE);
+ MSG_DONTWAIT | MSG_MORE | MSG_SENDPAGE_NOTLAST);
if (unlikely(ret <= 0))
return ret;
--
2.20.1
_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
* [PATCH 2/2] nvmet-tcp: set MSG_SENDPAGE_NOTLAST with MSG_MORE when we have more to send
2020-05-05 5:20 [PATCH 1/2] nvme-tcp: set MSG_SENDPAGE_NOTLAST with MSG_MORE when we have more to send Sagi Grimberg
@ 2020-05-05 5:20 ` Sagi Grimberg
2020-05-12 16:13 ` Christoph Hellwig
2020-05-05 6:09 ` [PATCH 1/2] nvme-tcp: " Christoph Hellwig
2020-05-12 16:12 ` Christoph Hellwig
2 siblings, 1 reply; 12+ messages in thread
From: Sagi Grimberg @ 2020-05-05 5:20 UTC (permalink / raw)
To: linux-nvme, Christoph Hellwig, Keith Busch
Cc: Anil Vasudevan, Mark Wunderlich
We can signal the stack that this is not the last page coming, and the
stack can build a larger TSO segment, so go ahead and use it.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/target/tcp.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index f0da04e960f4..c08aec62115e 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -510,7 +510,7 @@ static int nvmet_try_send_data_pdu(struct nvmet_tcp_cmd *cmd)
ret = kernel_sendpage(cmd->queue->sock, virt_to_page(cmd->data_pdu),
offset_in_page(cmd->data_pdu) + cmd->offset,
- left, MSG_DONTWAIT | MSG_MORE);
+ left, MSG_DONTWAIT | MSG_MORE | MSG_SENDPAGE_NOTLAST);
if (ret <= 0)
return ret;
@@ -538,7 +538,7 @@ static int nvmet_try_send_data(struct nvmet_tcp_cmd *cmd, bool last_in_batch)
if ((!last_in_batch && cmd->queue->send_list_len) ||
cmd->wbytes_done + left < cmd->req.transfer_len ||
queue->data_digest || !queue->nvme_sq.sqhd_disabled)
- flags |= MSG_MORE;
+ flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
ret = kernel_sendpage(cmd->queue->sock, page, cmd->offset,
left, flags);
@@ -585,7 +585,7 @@ static int nvmet_try_send_response(struct nvmet_tcp_cmd *cmd,
int ret;
if (!last_in_batch && cmd->queue->send_list_len)
- flags |= MSG_MORE;
+ flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
else
flags |= MSG_EOR;
@@ -614,7 +614,7 @@ static int nvmet_try_send_r2t(struct nvmet_tcp_cmd *cmd, bool last_in_batch)
int ret;
if (!last_in_batch && cmd->queue->send_list_len)
- flags |= MSG_MORE;
+ flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
else
flags |= MSG_EOR;
--
2.20.1
* Re: [PATCH 1/2] nvme-tcp: set MSG_SENDPAGE_NOTLAST with MSG_MORE when we have more to send
2020-05-05 5:20 [PATCH 1/2] nvme-tcp: set MSG_SENDPAGE_NOTLAST with MSG_MORE when we have more to send Sagi Grimberg
2020-05-05 5:20 ` [PATCH 2/2] nvmet-tcp: " Sagi Grimberg
@ 2020-05-05 6:09 ` Christoph Hellwig
2020-05-05 6:50 ` Sagi Grimberg
2020-05-12 16:12 ` Christoph Hellwig
2 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2020-05-05 6:09 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Keith Busch, Anil Vasudevan, Mark Wunderlich, Christoph Hellwig,
linux-nvme
On Mon, May 04, 2020 at 10:20:01PM -0700, Sagi Grimberg wrote:
> We can signal the stack that this is not the last page coming, and the
> stack can build a larger TSO segment, so go ahead and use it.
Maybe you want a little helper that returns the flags based on a last
flag? Something like:
static int nvme_tcp_msg_flags(bool last_page)
{
if (last_page)
return MSG_DONTWAIT | MSG_MORE | MSG_SENDPAGE_NOTLAST;
return MSG_DONTWAIT | MSG_EOR;
}
or do we have a case where we don't want to set EOR? At least the
target seems to currently have such a case.
* Re: [PATCH 1/2] nvme-tcp: set MSG_SENDPAGE_NOTLAST with MSG_MORE when we have more to send
2020-05-05 6:09 ` [PATCH 1/2] nvme-tcp: " Christoph Hellwig
@ 2020-05-05 6:50 ` Sagi Grimberg
2020-05-05 10:23 ` Christoph Hellwig
0 siblings, 1 reply; 12+ messages in thread
From: Sagi Grimberg @ 2020-05-05 6:50 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Mark Wunderlich, Anil Vasudevan, linux-nvme
>> We can signal the stack that this is not the last page coming, and the
>> stack can build a larger TSO segment, so go ahead and use it.
>
> Maybe you want a little helper that returns the flags based on a last
> flag? Something like:
>
> static int nvme_tcp_msg_flags(bool last_page)
> {
> if (last_page)
> return MSG_DONTWAIT | MSG_MORE | MSG_SENDPAGE_NOTLAST;
> return MSG_DONTWAIT | MSG_EOR;
> }
You have it reversed, the last_page branch as written returns the flags
that mean more is coming...
Let me see if it is useful to have; I'll let you know...
>
> or do we have a case where we don't want to set EOR? At least the
> target seems to currently have such a case.
As a design goal, we try to tell the stack explicitly if we have more
to send and if not we want to push it down to reduce latency. So
I think we need to have it in the target as well.
* Re: [PATCH 1/2] nvme-tcp: set MSG_SENDPAGE_NOTLAST with MSG_MORE when we have more to send
2020-05-05 6:50 ` Sagi Grimberg
@ 2020-05-05 10:23 ` Christoph Hellwig
2020-05-05 21:53 ` Sagi Grimberg
0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2020-05-05 10:23 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Keith Busch, Mark Wunderlich, Anil Vasudevan, Christoph Hellwig,
linux-nvme
On Mon, May 04, 2020 at 11:50:57PM -0700, Sagi Grimberg wrote:
>
>>> We can signal the stack that this is not the last page coming, and the
>>> stack can build a larger TSO segment, so go ahead and use it.
>>
>> Maybe you want a little helper that returns the flags based on a last
>> flag? Something like:
>>
>> static int nvme_tcp_msg_flags(bool last_page)
>> {
>> if (last_page)
>> return MSG_DONTWAIT | MSG_MORE | MSG_SENDPAGE_NOTLAST;
>> return MSG_DONTWAIT | MSG_EOR;
>> }
>
> You have it reversed, the flag here probably means more...
>
> Let me see if it is useful to have, will let you know...
>
>>
>> or do we have a case where we don't want to set EOR? At least the
>> target seems to currently have such a case.
>
> As a design goal, we try to tell the stack explicitly if we have more
> to send and if not we want to push it down to reduce latency. So
> I think we need to have it in the target as well.
What I mean is that nvmet_try_send_data and nvmet_try_send_ddgst may set
neither MSG_MORE nor MSG_EOR. Is that intentional?
* Re: [PATCH 1/2] nvme-tcp: set MSG_SENDPAGE_NOTLAST with MSG_MORE when we have more to send
2020-05-05 10:23 ` Christoph Hellwig
@ 2020-05-05 21:53 ` Sagi Grimberg
2020-05-06 4:27 ` Christoph Hellwig
0 siblings, 1 reply; 12+ messages in thread
From: Sagi Grimberg @ 2020-05-05 21:53 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Mark Wunderlich, Anil Vasudevan, linux-nvme
>>>> We can signal the stack that this is not the last page coming, and the
>>>> stack can build a larger TSO segment, so go ahead and use it.
>>>
>>> Maybe you want a little helper that returns the flags based on a last
>>> flag? Something like:
>>>
>>> static int nvme_tcp_msg_flags(bool last_page)
>>> {
>>> if (last_page)
>>> return MSG_DONTWAIT | MSG_MORE | MSG_SENDPAGE_NOTLAST;
>>> return MSG_DONTWAIT | MSG_EOR;
>>> }
>>
>> You have it reversed, the flag here probably means more...
>>
>> Let me see if it is useful to have, will let you know...
>>
>>>
>>> or do we have a case where we don't want to set EOR? At least the
>>> target seems to currently have such a case.
>>
>> As a design goal, we try to tell the stack explicitly if we have more
>> to send and if not we want to push it down to reduce latency. So
>> I think we need to have it in the target as well.
>
> What I mean is that nvmet_try_send_data and nvmet_try_send_ddgst may set
> neither MSG_MORE nor MSG_EOR. Is that intentional?
nvmet_try_send_data should set MSG_EOR if it doesn't have more to send,
and so should nvmet_try_send_ddgst. So it's not intentional.
* Re: [PATCH 1/2] nvme-tcp: set MSG_SENDPAGE_NOTLAST with MSG_MORE when we have more to send
2020-05-05 21:53 ` Sagi Grimberg
@ 2020-05-06 4:27 ` Christoph Hellwig
2020-05-08 0:50 ` Sagi Grimberg
0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2020-05-06 4:27 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Keith Busch, Mark Wunderlich, Anil Vasudevan, Christoph Hellwig,
linux-nvme
On Tue, May 05, 2020 at 02:53:40PM -0700, Sagi Grimberg wrote:
>> What I mean is that nvmet_try_send_data and nvmet_try_send_ddgst may set
>> neither MSG_MORE nor MSG_EOR. Is that intentional?
>
> nvmet_try_send_data should set MSG_EOR if it doesn't have more to send,
> and so should nvmet_try_send_ddgst. So it's not intentional.
Ok. Can you send it with a little helper like I suggested (probably one
each for host and target) that ensures the right flags are set
everywhere?
* Re: [PATCH 1/2] nvme-tcp: set MSG_SENDPAGE_NOTLAST with MSG_MORE when we have more to send
2020-05-06 4:27 ` Christoph Hellwig
@ 2020-05-08 0:50 ` Sagi Grimberg
2020-05-08 7:35 ` Christoph Hellwig
0 siblings, 1 reply; 12+ messages in thread
From: Sagi Grimberg @ 2020-05-08 0:50 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Mark Wunderlich, Anil Vasudevan, linux-nvme
>>> What I mean is that nvmet_try_send_data and nvmet_try_send_ddgst may set
>>> neither MSG_MORE nor MSG_EOR. Is that intentional?
>>
>> nvmet_try_send_data should set MSG_EOR if it doesn't have more to send,
>> and so should nvmet_try_send_ddgst. So it's not intentional.
>
> Ok. Can you send it with a little helper like I suggested (probably one
> each for host and target) that ensures the right flags are set
> everywhere?
I think it's actually better without the helper. MSG_SENDPAGE_NOTLAST is
designed only for sendpage, not for sendmsg, which we use for the ddgst
(although the net stack code appears to ignore it, but still), and when we
send a pdu header that has data, we don't need the condition, because it's
not last for sure.
So the helpers capture ~60% of the call-sites... seems to me like it's
better off without them at the moment. WDYT?
* Re: [PATCH 1/2] nvme-tcp: set MSG_SENDPAGE_NOTLAST with MSG_MORE when we have more to send
2020-05-08 0:50 ` Sagi Grimberg
@ 2020-05-08 7:35 ` Christoph Hellwig
2020-05-08 7:38 ` Sagi Grimberg
0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2020-05-08 7:35 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Keith Busch, Mark Wunderlich, Anil Vasudevan, Christoph Hellwig,
linux-nvme
On Thu, May 07, 2020 at 05:50:38PM -0700, Sagi Grimberg wrote:
>
>>>> What I mean is that nvmet_try_send_data and nvmet_try_send_ddgst may set
>>>> neither MSG_MORE nor MSG_EOR. Is that intentional?
>>>
>>> nvmet_try_send_data should set MSG_EOR if it doesn't have more to send,
>>> and so should nvmet_try_send_ddgst. So it's not intentional.
>>
>> Ok. Can you send it with a little helper like I suggested (probably one
>> each for host and target) that ensures the right flags are set
>> everywhere?
>
> I think it's actually better without the helper. MSG_SENDPAGE_NOTLAST is
> designed only for sendpage, not for sendmsg, which we use for the ddgst
> (although the net stack code appears to ignore it, but still), and when we
> send a pdu header that has data, we don't need the condition, because it's
> not last for sure.
>
> So the helpers capture ~60% of the call-sites... seems to me like it's
> better off without them at the moment. WDYT?
Ok. Are you going to resend with the nvmet_try_send_data fix thrown in?
* Re: [PATCH 1/2] nvme-tcp: set MSG_SENDPAGE_NOTLAST with MSG_MORE when we have more to send
2020-05-08 7:35 ` Christoph Hellwig
@ 2020-05-08 7:38 ` Sagi Grimberg
0 siblings, 0 replies; 12+ messages in thread
From: Sagi Grimberg @ 2020-05-08 7:38 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Anil Vasudevan, Mark Wunderlich, linux-nvme
>>>>> What I mean is that nvmet_try_send_data and nvmet_try_send_ddgst may set
>>>>> neither MSG_MORE nor MSG_EOR. Is that intentional?
>>>>
>>>> nvmet_try_send_data should set MSG_EOR if it doesn't have more to send,
>>>> and so should nvmet_try_send_ddgst. So it's not intentional.
>>>
>>> Ok. Can you send it with a little helper like I suggested (probably one
>>> each for host and target) that ensures the right flags are set
>>> everywhere?
>>
>> I think it's actually better without the helper. MSG_SENDPAGE_NOTLAST is
>> designed only for sendpage, not for sendmsg, which we use for the ddgst
>> (although the net stack code appears to ignore it, but still), and when we
>> send a pdu header that has data, we don't need the condition, because it's
>> not last for sure.
>>
>> So the helpers capture ~60% of the call-sites... seems to me like it's
>> better off without them at the moment. WDYT?
>
> Ok. Are you going to resend with the nvmet_try_send_data fix thrown in?
I don't want to overload the patch itself with more logical changes; I'll
send a separate patch. These should be good to go...
Thanks.
* Re: [PATCH 1/2] nvme-tcp: set MSG_SENDPAGE_NOTLAST with MSG_MORE when we have more to send
2020-05-05 5:20 [PATCH 1/2] nvme-tcp: set MSG_SENDPAGE_NOTLAST with MSG_MORE when we have more to send Sagi Grimberg
2020-05-05 5:20 ` [PATCH 2/2] nvmet-tcp: " Sagi Grimberg
2020-05-05 6:09 ` [PATCH 1/2] nvme-tcp: " Christoph Hellwig
@ 2020-05-12 16:12 ` Christoph Hellwig
2 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2020-05-12 16:12 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Keith Busch, Anil Vasudevan, Mark Wunderlich, Christoph Hellwig,
linux-nvme
Applied to nvme-5.8.
* Re: [PATCH 2/2] nvmet-tcp: set MSG_SENDPAGE_NOTLAST with MSG_MORE when we have more to send
2020-05-05 5:20 ` [PATCH 2/2] nvmet-tcp: " Sagi Grimberg
@ 2020-05-12 16:13 ` Christoph Hellwig
0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2020-05-12 16:13 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Keith Busch, Anil Vasudevan, Mark Wunderlich, Christoph Hellwig,
linux-nvme
On Mon, May 04, 2020 at 10:20:02PM -0700, Sagi Grimberg wrote:
> We can signal the stack that this is not the last page coming, and the
> stack can build a larger TSO segment, so go ahead and use it.
Applied to nvme-5.8.
Can you prepare a patch to add the missing MSG_EOR flag in
nvmet_try_send_ddgst?