All of lore.kernel.org
 help / color / mirror / Atom feed
From: Maxim Mikityanskiy <maximmi@nvidia.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Boris Pismenny <borisp@nvidia.com>,
	John Fastabend <john.fastabend@gmail.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"David S. Miller" <davem@davemloft.net>,
	Paolo Abeni <pabeni@redhat.com>, Tariq Toukan <tariqt@nvidia.com>,
	Aviad Yehezkel <aviadye@mellanox.com>,
	Ilya Lesokhin <ilyal@mellanox.com>,
	netdev@vger.kernel.org
Subject: Re: [PATCH net] tls: Skip tls_append_frag on zero copy size
Date: Thu, 21 Apr 2022 12:47:18 +0300	[thread overview]
Message-ID: <da984a08-1730-1b0c-d845-cf7ec732ba4c@nvidia.com> (raw)
In-Reply-To: <3c90d3cd-5224-4224-e9d9-e45546ce51c6@nvidia.com>

On 2022-04-18 17:56, Maxim Mikityanskiy wrote:
> On 2022-04-14 13:28, Jakub Kicinski wrote:
>> On Wed, 13 Apr 2022 16:49:56 +0300 Maxim Mikityanskiy wrote:
>>> Calling tls_append_frag when max_open_record_len == record->len might
>>> add an empty fragment to the TLS record if the call happens to be on the
>>> page boundary. Normally tls_append_frag coalesces the zero-sized
>>> fragment to the previous one, but not if it's on page boundary.
>>>
>>> If a resync happens then, the mlx5 driver posts dump WQEs in
>>> tx_post_resync_dump, and the empty fragment may become a data segment
>>> with byte_count == 0, which will confuse the NIC and lead to a CQE
>>> error.
>>>
>>> This commit fixes the described issue by skipping tls_append_frag on
>>> zero size to avoid adding empty fragments. The fix is not in the driver,
>>> because an empty fragment is hardly the desired behavior.
>>>
>>> Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
>>> Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
>>> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
>>> ---
>>>   net/tls/tls_device.c | 12 +++++++-----
>>>   1 file changed, 7 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
>>> index 12f7b56771d9..af875ad4a822 100644
>>> --- a/net/tls/tls_device.c
>>> +++ b/net/tls/tls_device.c
>>> @@ -483,11 +483,13 @@ static int tls_push_data(struct sock *sk,
>>>           copy = min_t(size_t, size, (pfrag->size - pfrag->offset));
>>>           copy = min_t(size_t, copy, (max_open_record_len - 
>>> record->len));
>>> -        rc = tls_device_copy_data(page_address(pfrag->page) +
>>> -                      pfrag->offset, copy, msg_iter);
>>> -        if (rc)
>>> -            goto handle_error;
>>> -        tls_append_frag(record, pfrag, copy);
>>> +        if (copy) {
>>> +            rc = tls_device_copy_data(page_address(pfrag->page) +
>>> +                          pfrag->offset, copy, msg_iter);
>>> +            if (rc)
>>> +                goto handle_error;
>>> +            tls_append_frag(record, pfrag, copy);
>>> +        }
>>
>> I appreciate you're likely trying to keep the fix minimal but Greg
>> always says "fix it right, worry about backports later".
>>
>> I think we should skip more, we can reorder the mins and if
>> min(size, rec space) == 0 then we can skip the allocation as well.
> 
> Sorry, I didn't get the idea. Could you elaborate?
> 
> Reordering the mins:
> 
> copy = min_t(size_t, size, max_open_record_len - record->len);
> copy = min_t(size_t, copy, pfrag->size - pfrag->offset);
> 
> I assume by skipping the allocation you mean skipping 
> tls_do_allocation(), right? Do you suggest to skip it if the result of 
> the first min_t() is 0?
> 
> record->len used in the first min_t() comes from ctx->open_record, which 
> either exists or is allocated by tls_do_allocation(). If we move the 
> copy == 0 check above the tls_do_allocation() call, first we'll have to 
> check whether ctx->open_record is NULL, which is currently checked by 
> tls_do_allocation() itself.
> 
> If open_record is not NULL, there isn't much to skip in 
> tls_do_allocation on copy == 0, the main part is already skipped, 
> regardless of the value of copy. If open_record is NULL, we can't skip 
> tls_do_allocation, and copy won't be 0 afterwards.
> 
> To compare, before (pseudocode):
> 
> tls_do_allocation {
>      if (!ctx->open_record)
>          ALLOCATE RECORD
>          Now ctx->open_record is not NULL
>      if (!sk_page_frag_refill(sk, pfrag))
>          return -ENOMEM
> }
> handle errors from tls_do_allocation
> copy = min(size, pfrag->size - pfrag->offset)
> copy = min(copy, max_open_record_len - ctx->open_record->len)
> if (copy)
>      copy data and append frag
> 
> After:
> 
> if (ctx->open_record) {
>      copy = min(size, max_open_record_len - ctx->open_record->len)
>      if (copy) {
>          // You want to put this part of tls_do_allocation under if (copy)?
>          if (!sk_page_frag_refill(sk, pfrag))
>              handle errors
>          copy = min(copy, pfrag->size - pfrag->offset)
>          if (copy)
>              copy data and append frag
>      }
> } else {
>      ALLOCATE RECORD
>      if (!sk_page_frag_refill(sk, pfrag))
>          handle errors
>      // Have to do this after the allocation anyway.
>      copy = min(size, max_open_record_len - ctx->open_record->len)
>      copy = min(copy, pfrag->size - pfrag->offset)
>      if (copy)
>          copy data and append frag
> }
> 
> Either I totally don't get what you suggested, or it doesn't make sense 
> to me, because we have +1 branch in the common path when a record is 
> open and copy is not 0, no changes when there is no record, and more 
> repeating code hard to compress.
> 
> If I missed your idea, please explain in more details.

Jakub, is your comment still relevant after my response? If not, can the 
patch be merged?

>> Maybe some application wants to do zero-length sends to flush the
>> MSG_MORE and would benefit that way?
> 
> If it's a zero-length send, it means that size is 0 initially, and 
> max_open_record_len - ctx->open_record->len isn't 0 (otherwise the 
> record would have been closed at a previous iteration). That doesn't 
> sound related to swapping the mins and skipping tls_do_allocation on 
> copy == 0.
> 
> Thanks,
> Max
> 
>>>           size -= copy;
>>>           if (!size) {
>>
> 


  reply	other threads:[~2022-04-21  9:47 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-13 13:49 [PATCH net] tls: Skip tls_append_frag on zero copy size Maxim Mikityanskiy
2022-04-14 10:28 ` Jakub Kicinski
2022-04-18 14:56   ` Maxim Mikityanskiy
2022-04-21  9:47     ` Maxim Mikityanskiy [this message]
2022-04-22 14:55       ` Jakub Kicinski
2022-04-26 15:48         ` Maxim Mikityanskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=da984a08-1730-1b0c-d845-cf7ec732ba4c@nvidia.com \
    --to=maximmi@nvidia.com \
    --cc=aviadye@mellanox.com \
    --cc=borisp@nvidia.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=ilyal@mellanox.com \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=tariqt@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.