Re: cifs.ko page management during reads

From: ronnie sahlberg <ronniesahlberg@gmail.com>
To: Pavel Shilovsky <piastryyy@gmail.com>
Cc: Steve French <smfrench@gmail.com>,
	Shyam Prasad N <nspmangalore@gmail.com>,
	CIFS <linux-cifs@vger.kernel.org>
Subject: Re: cifs.ko page management during reads
Date: Wed, 14 Jul 2021 10:15:13 +1000	[thread overview]
Message-ID: <CAN05THQL3GTN0oG8rREQ3W9bxEoAV+NppbSgAuytdzPyH3cuhA@mail.gmail.com> (raw)
In-Reply-To: <CAKywueR3aTzrC-QM=P3tJ3RuS1AWAPsgcK-eqyX55HYtH-M_bQ@mail.gmail.com>

On Wed, Jul 14, 2021 at 8:58 AM Pavel Shilovsky <piastryyy@gmail.com> wrote:
>
> пн, 12 июл. 2021 г. в 22:41,Steve French <smfrench@gmail.com>:
> >
> > And don't forget "esize" mount option to allow offload of decryption...
> >
> > On Tue, Jul 13, 2021, 00:37 Shyam Prasad N <nspmangalore@gmail.com> wrote:
> >>
> >> Hi all,
> >>
> >> I'm trying to understand our read path (big picture is to integrate
> >> the new netfs helpers into cifs.ko), and I wanted to get some
> >> understanding about why things are the way they are. @Pavel Shilovsky
> >> I'm guessing that you've worked on most of this code, so hoping that
> >> you'll be able to answer some of these:
>
> Thanks for taking a look at this.
>
> >>
> >> 1. for cases where both encryption and signing are disabled, it looks
> >> like we read from the socket page-by-page directly into pages and map
> >> those to the inode address space
>
> Yes, we read directly into pre-allocated pages. For direct IO we read
> into pages supplied by the user and for non-direct IO (including page
> IO) we allocate intermediate pages.
>
> >>
> >> 2. for cases where encryption is enabled, we read the encrypted data
> >> into a set of pages first, then call crypto APIs, which decrypt the
> >> data in-place to the same pages. We then copy out the decrypted data
> >> from these pages into the pages in the address space that are already
> >> mapped.
>
> Correct. Regardless of whether offloading is used or not there is an extra copy.
>
> >>
> >> 3. similarly for the signing case, we read the data into a set of
> >> pages first, then verify signature, then copy the pages to the mapped
> >> pages in the inode address space.
>
> Yes.
>
> >>
> >> for case 1, can't we read from the socket directly into the mapped
> >> pages?
>
> Yes - assuming no signing or encryption, we should be able to read
> directly into the mapped pages. But I doubt many people use SMB2
> without signing or encryption although there may be some use cases
> requiring performance over safety.
>
> >> for case 2 and 3, instead of allocating temporary pages to read
> >> into, can't we read directly into the pages of the inode address
> >> space? the only risk I see is that if decryption or sign verification
> >> fails, it would overwrite the data that is already present in the page
> >> cache. but I see that as acceptable, because we're reading from the
> >> server, since the data we have in the cache is stale anyways.
>
> Theoretically we may try doing this with signing but we need to make
> sure that no one can modify those pages locally which may lead to
> signing errors. I don't think today we reconnect an SMB session on
> those today but we probably should.
>
> For encryption, we do not know where read data starts in an encrypted
> SMB packet, so there is almost no point to read directly because we
> would need to shift (copy) the data afterwards anyway.

Yes. But even if we knew where the data starts, it will not be aligned
to a page anyway so we need to memcpy() it anyway.

But I think it is possible, with a lot of effort. And probably not
worth the effort.

But, I have been toying with the idea of trying to do "zero-copy" SMB2
READ in libsmb2.
It, does read all the unencrypted READ payloads straight into the
application buffer without a copy
already, but just like cifs.ko it uses an intermediate buffer for
encrypted packets.And thus an extra memcpy()

The solution I was thinking of would be something like :
1, Read the first 4 + 64 bytes into a buffer.  This will be
enough to hold, either
1a, unencrypted case:
      inspect the SMB2 header to find the opcode, then read the
response header  (16 bytes for smb2 read/write responses)
      then read the payload straight into the application buffer.
      I do this already and it works without too much pain.
      Now we are finished for the unencrypted case.
2, encrypted case:
      In this case we have the full transform header and also part of
an encrypted smb2 header.
      We know how big the transform header is so we can copy the
partial smb2 header into a 64 byte
      temp buffer and then read the remaining part of the smb2 header.
3,   Start decryption, but only run it for the first 64 bytes, i.e.
the smb2 header.
3a, OPCODE != READ: Inspect the decrypted smb2 header, IF it is NOT a
read response then just read all remaining data into
      a temp buffer, continue the decryption and handle it the normal way.
4,   If the opcode is a READ, then read an additional 16 bytes into a
temp buffer and continue the decryption of these 16 bytes.
5,   What follows now is the read response payload so read this
decrypted data straight into the application buffer (or pages)
      and again, continue the decryption and decrypt the data in-place.
      Voila, zero-copy encrypted read handling.  Which would really
help on a PS2 since memcpy() is pretty expensive on its very slow
memory.

I think this would work, and it would be very interesting to implement.
Not sure if it is worth it to do in kernel code because there will be
hairy special conditions to handle.

For kernel code I think it would be better/safer to have fast hw
offload for memcpy()

regards
ronnie sahlberg

>
> --
> Best regards,
> Pavel Shilovsky