Re: [PATCH net] tls: Skip tls_append_frag on zero copy size

From: Maxim Mikityanskiy <maximmi@nvidia.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Boris Pismenny <borisp@nvidia.com>,
	John Fastabend <john.fastabend@gmail.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"David S. Miller" <davem@davemloft.net>,
	Paolo Abeni <pabeni@redhat.com>, Tariq Toukan <tariqt@nvidia.com>,
	Aviad Yehezkel <aviadye@mellanox.com>,
	Ilya Lesokhin <ilyal@mellanox.com>,
	netdev@vger.kernel.org
Subject: Re: [PATCH net] tls: Skip tls_append_frag on zero copy size
Date: Thu, 21 Apr 2022 12:47:18 +0300	[thread overview]
Message-ID: <da984a08-1730-1b0c-d845-cf7ec732ba4c@nvidia.com> (raw)
In-Reply-To: <3c90d3cd-5224-4224-e9d9-e45546ce51c6@nvidia.com>

On 2022-04-18 17:56, Maxim Mikityanskiy wrote:
> On 2022-04-14 13:28, Jakub Kicinski wrote:
>> On Wed, 13 Apr 2022 16:49:56 +0300 Maxim Mikityanskiy wrote:
>>> Calling tls_append_frag when max_open_record_len == record->len might
>>> add an empty fragment to the TLS record if the call happens to be on the
>>> page boundary. Normally tls_append_frag coalesces the zero-sized
>>> fragment to the previous one, but not if it's on page boundary.
>>>
>>> If a resync happens then, the mlx5 driver posts dump WQEs in
>>> tx_post_resync_dump, and the empty fragment may become a data segment
>>> with byte_count == 0, which will confuse the NIC and lead to a CQE
>>> error.
>>>
>>> This commit fixes the described issue by skipping tls_append_frag on
>>> zero size to avoid adding empty fragments. The fix is not in the driver,
>>> because an empty fragment is hardly the desired behavior.
>>>
>>> Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
>>> Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
>>> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
>>> ---
>>>   net/tls/tls_device.c | 12 +++++++-----
>>>   1 file changed, 7 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
>>> index 12f7b56771d9..af875ad4a822 100644
>>> --- a/net/tls/tls_device.c
>>> +++ b/net/tls/tls_device.c
>>> @@ -483,11 +483,13 @@ static int tls_push_data(struct sock *sk,
>>>           copy = min_t(size_t, size, (pfrag->size - pfrag->offset));
>>>           copy = min_t(size_t, copy, (max_open_record_len - 
>>> record->len));
>>> -        rc = tls_device_copy_data(page_address(pfrag->page) +
>>> -                      pfrag->offset, copy, msg_iter);
>>> -        if (rc)
>>> -            goto handle_error;
>>> -        tls_append_frag(record, pfrag, copy);
>>> +        if (copy) {
>>> +            rc = tls_device_copy_data(page_address(pfrag->page) +
>>> +                          pfrag->offset, copy, msg_iter);
>>> +            if (rc)
>>> +                goto handle_error;
>>> +            tls_append_frag(record, pfrag, copy);
>>> +        }
>>
>> I appreciate you're likely trying to keep the fix minimal but Greg
>> always says "fix it right, worry about backports later".
>>
>> I think we should skip more, we can reorder the mins and if
>> min(size, rec space) == 0 then we can skip the allocation as well.
> 
> Sorry, I didn't get the idea. Could you elaborate?
> 
> Reordering the mins:
> 
> copy = min_t(size_t, size, max_open_record_len - record->len);
> copy = min_t(size_t, copy, pfrag->size - pfrag->offset);
> 
> I assume by skipping the allocation you mean skipping 
> tls_do_allocation(), right? Do you suggest to skip it if the result of 
> the first min_t() is 0?
> 
> record->len used in the first min_t() comes from ctx->open_record, which 
> either exists or is allocated by tls_do_allocation(). If we move the 
> copy == 0 check above the tls_do_allocation() call, first we'll have to 
> check whether ctx->open_record is NULL, which is currently checked by 
> tls_do_allocation() itself.
> 
> If open_record is not NULL, there isn't much to skip in 
> tls_do_allocation on copy == 0, the main part is already skipped, 
> regardless of the value of copy. If open_record is NULL, we can't skip 
> tls_do_allocation, and copy won't be 0 afterwards.
> 
> To compare, before (pseudocode):
> 
> tls_do_allocation {
>      if (!ctx->open_record)
>          ALLOCATE RECORD
>          Now ctx->open_record is not NULL
>      if (!sk_page_frag_refill(sk, pfrag))
>          return -ENOMEM
> }
> handle errors from tls_do_allocation
> copy = min(size, pfrag->size - pfrag->offset)
> copy = min(copy, max_open_record_len - ctx->open_record->len)
> if (copy)
>      copy data and append frag
> 
> After:
> 
> if (ctx->open_record) {
>      copy = min(size, max_open_record_len - ctx->open_record->len)
>      if (copy) {
>          // You want to put this part of tls_do_allocation under if (copy)?
>          if (!sk_page_frag_refill(sk, pfrag))
>              handle errors
>          copy = min(copy, pfrag->size - pfrag->offset)
>          if (copy)
>              copy data and append frag
>      }
> } else {
>      ALLOCATE RECORD
>      if (!sk_page_frag_refill(sk, pfrag))
>          handle errors
>      // Have to do this after the allocation anyway.
>      copy = min(size, max_open_record_len - ctx->open_record->len)
>      copy = min(copy, pfrag->size - pfrag->offset)
>      if (copy)
>          copy data and append frag
> }
> 
> Either I totally don't get what you suggested, or it doesn't make sense 
> to me, because we have +1 branch in the common path when a record is 
> open and copy is not 0, no changes when there is no record, and more 
> repeating code hard to compress.
> 
> If I missed your idea, please explain in more details.

Jakub, is your comment still relevant after my response? If not, can the 
patch be merged?

>> Maybe some application wants to do zero-length sends to flush the
>> MSG_MORE and would benefit that way?
> 
> If it's a zero-length send, it means that size is 0 initially, and 
> max_open_record_len - ctx->open_record->len isn't 0 (otherwise the 
> record would have been closed at a previous iteration). That doesn't 
> sound related to swapping the mins and skipping tls_do_allocation on 
> copy == 0.
> 
> Thanks,
> Max
> 
>>>           size -= copy;
>>>           if (!size) {
>>
>