linux-cifs.vger.kernel.org archive mirror
* cifs.ko page management during reads
@ 2021-07-13  5:37 Shyam Prasad N
       [not found] ` <CAH2r5mvxj-A3F1tPr9OH1D00bdznVYyx7FzyjLZt=Xq41TCVxw@mail.gmail.com>
  0 siblings, 1 reply; 5+ messages in thread
From: Shyam Prasad N @ 2021-07-13  5:37 UTC (permalink / raw)
  To: Pavel Shilovsky, Steve French, CIFS

Hi all,

I'm trying to understand our read path (the big picture is to integrate
the new netfs helpers into cifs.ko), and I wanted to understand why
things are the way they are. @Pavel Shilovsky, I'm guessing that you've
worked on most of this code, so I'm hoping you'll be able to answer
some of these:

1. For cases where both encryption and signing are disabled, it looks
like we read from the socket page-by-page directly into pages and map
those to the inode address space.

2. For cases where encryption is enabled, we read the encrypted data
into a set of pages first, then call crypto APIs, which decrypt the
data in place in the same pages. We then copy the decrypted data out
of these pages into the pages in the address space that are already
mapped.

3. Similarly, for the signing case, we read the data into a set of
pages first, then verify the signature, then copy the pages to the
mapped pages in the inode address space.
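
In other words, my mental model of cases 2 and 3 is roughly the
following (a hand-wavy user-space sketch just to check my reading of
the code; none of these names are real cifs.ko symbols):

  #include <stdlib.h>
  #include <string.h>

  #define PAGE_SIZE 4096

  /* stand-ins for the real socket receive and crypto/signing steps */
  extern int recv_exact(int sock, void *buf, size_t len);
  extern int decrypt_or_verify_in_place(void *buf, size_t len);

  static int read_into_cache_pages(int sock, unsigned char **cache_pages,
                                   unsigned int npages, size_t len)
  {
          /* 1. temporary pages for the encrypted/signed payload */
          unsigned char *tmp = malloc((size_t)npages * PAGE_SIZE);
          size_t off;
          int rc;

          if (!tmp)
                  return -1;

          rc = recv_exact(sock, tmp, len);
          if (rc)
                  goto out;

          /* 2. decrypt (or verify the signature) in place */
          rc = decrypt_or_verify_in_place(tmp, len);
          if (rc)
                  goto out;

          /* 3. the extra copy: plaintext -> already-mapped page cache pages */
          for (off = 0; off < len; off += PAGE_SIZE) {
                  size_t chunk = len - off < PAGE_SIZE ? len - off : PAGE_SIZE;
                  memcpy(cache_pages[off / PAGE_SIZE], tmp + off, chunk);
          }
  out:
          free(tmp);
          return rc;
  }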

For case 1, can't we read from the socket directly into the mapped
pages? For cases 2 and 3, instead of allocating temporary pages to read
into, can't we read directly into the pages of the inode address
space? The only risk I see is that if decryption or signature
verification fails, the read would overwrite the data that is already
present in the page cache. But I see that as acceptable: since we're
reading from the server, the data we have in the cache is stale anyway.

Seeking opinions from experts who've worked on this codepath before.
Please correct me if I'm wrong about any of these.

-- 
Regards,
Shyam


* Re: cifs.ko page management during reads
       [not found] ` <CAH2r5mvxj-A3F1tPr9OH1D00bdznVYyx7FzyjLZt=Xq41TCVxw@mail.gmail.com>
@ 2021-07-13 22:58   ` Pavel Shilovsky
  2021-07-14  0:15     ` ronnie sahlberg
  0 siblings, 1 reply; 5+ messages in thread
From: Pavel Shilovsky @ 2021-07-13 22:58 UTC (permalink / raw)
  To: Steve French; +Cc: Shyam Prasad N, CIFS

Mon, 12 Jul 2021 at 22:41, Steve French <smfrench@gmail.com>:
>
> And don't forget "esize" mount option to allow offload of decryption...
>
> On Tue, Jul 13, 2021, 00:37 Shyam Prasad N <nspmangalore@gmail.com> wrote:
>>
>> Hi all,
>>
>> I'm trying to understand our read path (big picture is to integrate
>> the new netfs helpers into cifs.ko), and I wanted to get some
>> understanding about why things are the way they are. @Pavel Shilovsky
>> I'm guessing that you've worked on most of this code, so hoping that
>> you'll be able to answer some of these:

Thanks for taking a look at this.

>>
>> 1. for cases where both encryption and signing are disabled, it looks
>> like we read from the socket page-by-page directly into pages and map
>> those to the inode address space

Yes, we read directly into pre-allocated pages. For direct IO we read
into pages supplied by the user and for non-direct IO (including page
IO) we allocate intermediate pages.
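
In other words (purely illustrative, not the actual cifs.ko code or
symbols):

  /* Purely illustrative -- not the real cifs.ko types or helpers. */
  extern unsigned char **alloc_intermediate_pages(unsigned int npages);

  static unsigned char **read_target_pages(int direct_io,
                                           unsigned char **user_pages,
                                           unsigned int npages)
  {
          if (direct_io)
                  return user_pages;                 /* caller-supplied pages */
          return alloc_intermediate_pages(npages);   /* buffered / page IO */
  }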

>>
>> 2. for cases where encryption is enabled, we read the encrypted data
>> into a set of pages first, then call crypto APIs, which decrypt the
>> data in-place to the same pages. We then copy out the decrypted data
>> from these pages into the pages in the address space that are already
>> mapped.

Correct. Regardless of whether offloading is used or not, there is an extra copy.

>>
>> 3. similarly for the signing case, we read the data into a set of
>> pages first, then verify signature, then copy the pages to the mapped
>> pages in the inode address space.

Yes.

>>
>> for case 1, can't we read from the socket directly into the mapped
>> pages?

Yes - assuming no signing or encryption, we should be able to read
directly into the mapped pages. But I doubt many people use SMB2
without signing or encryption although there may be some use cases
requiring performance over safety.

>> for case 2 and 3, instead of allocating temporary pages to read
>> into, can't we read directly into the pages of the inode address
>> space? the only risk I see is that if decryption or sign verification
>> fails, it would overwrite the data that is already present in the page
>> cache. but I see that as acceptable, because we're reading from the
>> server, since the data we have in the cache is stale anyways.

Theoretically we may try doing this with signing, but we need to make
sure that no one can modify those pages locally, which may lead to
signing errors. I don't think we reconnect the SMB session in those
cases today, but we probably should.

For encryption, we do not know where the read data starts in an
encrypted SMB packet, so there is almost no point in reading directly,
because we would need to shift (copy) the data afterwards anyway.
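
To illustrate (rough layout from memory; see MS-SMB2 for the
authoritative definitions, and take the struct/field names below as
approximate):

  #include <stdint.h>

  /* Illustrative packed layouts, from memory (MS-SMB2 2.2.41 and 2.2.20);
   * field names are approximate, not necessarily the kernel's. */
  #pragma pack(push, 1)
  struct transform_hdr {
          uint8_t  ProtocolId[4];          /* 0xFD 'S' 'M' 'B' */
          uint8_t  Signature[16];
          uint8_t  Nonce[16];
          uint32_t OriginalMessageSize;
          uint16_t Reserved;
          uint16_t Flags;
          uint64_t SessionId;
  };                                       /* 52 bytes */

  struct read_rsp {
          uint16_t StructureSize;          /* 17 */
          uint8_t  DataOffset;             /* data offset from the SMB2 header */
          uint8_t  Reserved;
          uint32_t DataLength;
          uint32_t DataRemaining;
          uint32_t Reserved2;
  };                                       /* 16-byte fixed part */
  #pragma pack(pop)

  /* The read data starts at DataOffset (typically 0x50) from the SMB2
   * header, i.e. roughly 52 + 80 bytes into the encrypted message, so
   * even if we received the ciphertext straight into page cache pages
   * the plaintext would not be page aligned, and we only learn
   * DataOffset after decrypting the header. */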

--
Best regards,
Pavel Shilovsky


* Re: cifs.ko page management during reads
  2021-07-13 22:58   ` Pavel Shilovsky
@ 2021-07-14  0:15     ` ronnie sahlberg
  2021-07-14  6:50       ` Shyam Prasad N
  0 siblings, 1 reply; 5+ messages in thread
From: ronnie sahlberg @ 2021-07-14  0:15 UTC (permalink / raw)
  To: Pavel Shilovsky; +Cc: Steve French, Shyam Prasad N, CIFS

On Wed, Jul 14, 2021 at 8:58 AM Pavel Shilovsky <piastryyy@gmail.com> wrote:
>
> Mon, 12 Jul 2021 at 22:41, Steve French <smfrench@gmail.com>:
> >
> > And don't forget "esize" mount option to allow offload of decryption...
> >
> > On Tue, Jul 13, 2021, 00:37 Shyam Prasad N <nspmangalore@gmail.com> wrote:
> >>
> >> Hi all,
> >>
> >> I'm trying to understand our read path (big picture is to integrate
> >> the new netfs helpers into cifs.ko), and I wanted to get some
> >> understanding about why things are the way they are. @Pavel Shilovsky
> >> I'm guessing that you've worked on most of this code, so hoping that
> >> you'll be able to answer some of these:
>
> Thanks for taking a look at this.
>
> >>
> >> 1. for cases where both encryption and signing are disabled, it looks
> >> like we read from the socket page-by-page directly into pages and map
> >> those to the inode address space
>
> Yes, we read directly into pre-allocated pages. For direct IO we read
> into pages supplied by the user and for non-direct IO (including page
> IO) we allocate intermediate pages.
>
> >>
> >> 2. for cases where encryption is enabled, we read the encrypted data
> >> into a set of pages first, then call crypto APIs, which decrypt the
> >> data in-place to the same pages. We then copy out the decrypted data
> >> from these pages into the pages in the address space that are already
> >> mapped.
>
> Correct. Regardless of whether offloading is used or not there is an extra copy.
>
> >>
> >> 3. similarly for the signing case, we read the data into a set of
> >> pages first, then verify signature, then copy the pages to the mapped
> >> pages in the inode address space.
>
> Yes.
>
> >>
> >> for case 1, can't we read from the socket directly into the mapped
> >> pages?
>
> Yes - assuming no signing or encryption, we should be able to read
> directly into the mapped pages. But I doubt many people use SMB2
> without signing or encryption although there may be some use cases
> requiring performance over safety.
>
> >> for case 2 and 3, instead of allocating temporary pages to read
> >> into, can't we read directly into the pages of the inode address
> >> space? the only risk I see is that if decryption or sign verification
> >> fails, it would overwrite the data that is already present in the page
> >> cache. but I see that as acceptable, because we're reading from the
> >> server, since the data we have in the cache is stale anyways.
>
> Theoretically we may try doing this with signing but we need to make
> sure that no one can modify those pages locally which may lead to
> signing errors. I don't think today we reconnect an SMB session on
> those today but we probably should.
>
> For encryption, we do not know where read data starts in an encrypted
> SMB packet, so there is almost no point to read directly because we
> would need to shift (copy) the data afterwards anyway.

Yes. But even if we knew where the data starts, it would not be
page aligned, so we would still need to memcpy() it.

But I think it is possible, with a lot of effort. And probably not
worth the effort.

That said, I have been toying with the idea of doing "zero-copy" SMB2
READ in libsmb2. It already reads all the unencrypted READ payloads
straight into the application buffer without a copy, but just like
cifs.ko it uses an intermediate buffer for encrypted packets, and thus
an extra memcpy().

The solution I was thinking of would be something like this (there is
a rough code sketch after the steps):
1. Read the first 4 + 64 bytes into a buffer. This is enough to hold either:
1a. Unencrypted case:
      Inspect the SMB2 header to find the opcode, then read the
      response header (16 bytes for SMB2 read/write responses), then
      read the payload straight into the application buffer.
      I do this already and it works without too much pain.
      Now we are finished with the unencrypted case.
2. Encrypted case:
      Here we have the full transform header and also part of an
      encrypted SMB2 header. We know how big the transform header is,
      so we can copy the partial SMB2 header into a 64-byte temp
      buffer and then read the remaining part of the SMB2 header.
3. Start decryption, but only run it for the first 64 bytes, i.e. the
      SMB2 header.
3a. Inspect the decrypted SMB2 header; if it is NOT a read response,
      just read all remaining data into a temp buffer, continue the
      decryption and handle it the normal way.
4. If the opcode is a READ, read an additional 16 bytes into a temp
      buffer and continue the decryption of these 16 bytes.
5. What follows now is the read response payload, so read it straight
      into the application buffer (or pages) and continue the
      decryption, decrypting the data in place.
      Voila, zero-copy encrypted read handling. This would really help
      on a PS2, since memcpy() is pretty expensive on its very slow
      memory.
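
Roughly, steps 1-5 above could look like the sketch below (untested;
every helper here is made up, dec_update()/dec_final() stand in for
whatever incremental AES-CCM/GCM decrypt primitive we would actually
use, and error/length bookkeeping is hand-waved):

  #include <stdint.h>
  #include <string.h>

  #define XFRM_HDR_SIZE  52   /* SMB2 transform header */
  #define SMB2_HDR_SIZE  64
  #define READ_RSP_SIZE  16   /* fixed part of the READ response */
  #define SMB2_READ      0x08

  struct dec_ctx;             /* opaque incremental-decrypt state */

  /* all of these are made up for the sketch */
  extern int  recv_exact(int sock, void *buf, size_t len);
  extern int  recv_to_temp(int sock, size_t len);
  extern int  recv_and_decrypt_to_temp(int sock, struct dec_ctx *d, size_t len);
  extern void dec_init(struct dec_ctx *d, const unsigned char *xfrm_hdr);
  extern void dec_update(struct dec_ctx *d, unsigned char *buf, size_t len);
  extern int  dec_final(struct dec_ctx *d);   /* verifies the signature/tag */
  extern uint32_t transport_len(const unsigned char *buf);
  extern int  smb2_opcode(const unsigned char *smb2_hdr);

  static int recv_smb2_reply(int sock, struct dec_ctx *dec,
                             unsigned char *app_buf, size_t app_len)
  {
          unsigned char first[4 + SMB2_HDR_SIZE];
          unsigned char hdr[SMB2_HDR_SIZE], rsp[READ_RSP_SIZE];
          size_t part = sizeof(first) - 4 - XFRM_HDR_SIZE;  /* 12 bytes */
          uint32_t pdu_len;

          /* 1. transport length prefix + 64 bytes: enough for either case */
          recv_exact(sock, first, sizeof(first));
          pdu_len = transport_len(first);

          if (first[4] == 0xFE) {                     /* 1a. unencrypted */
                  if (smb2_opcode(first + 4) != SMB2_READ)
                          return recv_to_temp(sock, pdu_len - SMB2_HDR_SIZE);
                  recv_exact(sock, rsp, READ_RSP_SIZE);
                  return recv_exact(sock, app_buf, app_len);
          }

          /* 2. encrypted: we hold the 52-byte transform header plus the
           *    first 12 bytes of the SMB2 header; stash those and fetch
           *    the rest of the header */
          dec_init(dec, first + 4);
          memcpy(hdr, first + 4 + XFRM_HDR_SIZE, part);
          recv_exact(sock, hdr + part, SMB2_HDR_SIZE - part);

          /* 3. decrypt just the 64-byte SMB2 header, in place */
          dec_update(dec, hdr, SMB2_HDR_SIZE);
          if (smb2_opcode(hdr) != SMB2_READ)          /* 3a. not a READ */
                  return recv_and_decrypt_to_temp(sock, dec, pdu_len);

          /* 4. 16-byte READ response structure, decrypted in place */
          recv_exact(sock, rsp, READ_RSP_SIZE);
          dec_update(dec, rsp, READ_RSP_SIZE);

          /* 5. the remainder is the read payload: receive it straight
           *    into the application buffer and keep decrypting in place */
          recv_exact(sock, app_buf, app_len);
          dec_update(dec, app_buf, app_len);
          return dec_final(dec);                      /* tag check at the end */
  }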


I think this would work, and it would be very interesting to implement.
Not sure if it is worth doing in kernel code, because there will be
hairy special conditions to handle.


For kernel code I think it would be better/safer to have fast hw
offload for memcpy()


regards
ronnie sahlberg


>
> --
> Best regards,
> Pavel Shilovsky


* Re: cifs.ko page management during reads
  2021-07-14  0:15     ` ronnie sahlberg
@ 2021-07-14  6:50       ` Shyam Prasad N
  2021-07-14 16:34         ` Pavel Shilovsky
  0 siblings, 1 reply; 5+ messages in thread
From: Shyam Prasad N @ 2021-07-14  6:50 UTC (permalink / raw)
  To: ronnie sahlberg; +Cc: Pavel Shilovsky, Steve French, CIFS

On Wed, Jul 14, 2021 at 5:45 AM ronnie sahlberg
<ronniesahlberg@gmail.com> wrote:
>
> On Wed, Jul 14, 2021 at 8:58 AM Pavel Shilovsky <piastryyy@gmail.com> wrote:
> >
> > Mon, 12 Jul 2021 at 22:41, Steve French <smfrench@gmail.com>:
> > >
> > > And don't forget "esize" mount option to allow offload of decryption...
> > >
> > > On Tue, Jul 13, 2021, 00:37 Shyam Prasad N <nspmangalore@gmail.com> wrote:
> > >>
> > >> Hi all,
> > >>
> > >> I'm trying to understand our read path (big picture is to integrate
> > >> the new netfs helpers into cifs.ko), and I wanted to get some
> > >> understanding about why things are the way they are. @Pavel Shilovsky
> > >> I'm guessing that you've worked on most of this code, so hoping that
> > >> you'll be able to answer some of these:
> >
> > Thanks for taking a look at this.
> >
> > >>
> > >> 1. for cases where both encryption and signing are disabled, it looks
> > >> like we read from the socket page-by-page directly into pages and map
> > >> those to the inode address space
> >
> > Yes, we read directly into pre-allocated pages. For direct IO we read
> > into pages supplied by the user and for non-direct IO (including page
> > IO) we allocate intermediate pages.
> >
> > >>
> > >> 2. for cases where encryption is enabled, we read the encrypted data
> > >> into a set of pages first, then call crypto APIs, which decrypt the
> > >> data in-place to the same pages. We then copy out the decrypted data
> > >> from these pages into the pages in the address space that are already
> > >> mapped.
> >
> > Correct. Regardless of whether offloading is used or not there is an extra copy.
> >
> > >>
> > >> 3. similarly for the signing case, we read the data into a set of
> > >> pages first, then verify signature, then copy the pages to the mapped
> > >> pages in the inode address space.
> >
> > Yes.
> >
> > >>
> > >> for case 1, can't we read from the socket directly into the mapped
> > >> pages?
> >
> > Yes - assuming no signing or encryption, we should be able to read
> > directly into the mapped pages. But I doubt many people use SMB2
> > without signing or encryption although there may be some use cases
> > requiring performance over safety.
> >
> > >> for case 2 and 3, instead of allocating temporary pages to read
> > >> into, can't we read directly into the pages of the inode address
> > >> space? the only risk I see is that if decryption or sign verification
> > >> fails, it would overwrite the data that is already present in the page
> > >> cache. but I see that as acceptable, because we're reading from the
> > >> server, since the data we have in the cache is stale anyways.
> >
> > Theoretically we may try doing this with signing but we need to make
> > sure that no one can modify those pages locally which may lead to
> > signing errors. I don't think today we reconnect an SMB session on
> > those today but we probably should.
> >

How can someone modify those pages? Don't we keep the pages locked
when the read is in progress?

> > For encryption, we do not know where read data starts in an encrypted
> > SMB packet, so there is almost no point to read directly because we
> > would need to shift (copy) the data afterwards anyway.
>

Understood.

> Yes. But even if we knew where the data starts, it will not be aligned
> to a page anyway so we need to memcpy() it anyway.
>
> But I think it is possible, with a lot of effort. And probably not
> worth the effort.
>
> But, I have been toying with the idea of trying to do "zero-copy" SMB2
> READ in libsmb2.
> It, does read all the unencrypted READ payloads straight into the
> application buffer without a copy
> already, but just like cifs.ko it uses an intermediate buffer for
> encrypted packets.And thus an extra memcpy()
>
> The solution I was thinking of would be something like :
> 1, Read the first 4 + 64 bytes into a buffer.  This will be
> enough to hold, either
> 1a, unencrypted case:
>       inspect the SMB2 header to find the opcode, then read the
> response header  (16 bytes for smb2 read/write responses)
>       then read the payload straight into the application buffer.
>       I do this already and it works without too much pain.
>       Now we are finished for the unencrypted case.
> 2, encrypted case:
>       In this case we have the full transform header and also part of
> an encrypted smb2 header.
>       We know how big the transform header is so we can copy the
> partial smb2 header into a 64 byte
>       temp buffer and then read the remaining part of the smb2 header.
> 3,   Start decryption, but only run it for the first 64 bytes, i.e.
> the smb2 header.
> 3a, OPCODE != READ: Inspect the decrypted smb2 header, IF it is NOT a
> read response then just read all remaining data into
>       a temp buffer, continue the decryption and handle it the normal way.
> 4,   If the opcode is a READ, then read an additional 16 bytes into a
> temp buffer and continue the decryption of these 16 bytes.
> 5,   What follows now is the read response payload so read this
> decrypted data straight into the application buffer (or pages)
>       and again, continue the decryption and decrypt the data in-place.
>       Voila, zero-copy encrypted read handling.  Which would really
> help on a PS2 since memcpy() is pretty expensive on its very slow
> memory.
>
>
> I think this would work, and it would be very interesting to implement.
> Not sure if it is worth it to do in kernel code because there will be
> hairy special conditions to handle.
>

Very interesting.
Understood that this would mean significant changes. But I'm curious:
what special conditions are you referring to?

>
> For kernel code I think it would be better/safer to have fast hw
> offload for memcpy()
>

Understood.

>
> regards
> ronnie sahlberg
>
>
> >
> > --
> > Best regards,
> > Pavel Shilovsky



-- 
Regards,
Shyam


* Re: cifs.ko page management during reads
  2021-07-14  6:50       ` Shyam Prasad N
@ 2021-07-14 16:34         ` Pavel Shilovsky
  0 siblings, 0 replies; 5+ messages in thread
From: Pavel Shilovsky @ 2021-07-14 16:34 UTC (permalink / raw)
  To: Shyam Prasad N; +Cc: ronnie sahlberg, Steve French, CIFS

Tue, 13 Jul 2021 at 23:51, Shyam Prasad N <nspmangalore@gmail.com>:
>
> On Wed, Jul 14, 2021 at 5:45 AM ronnie sahlberg
> <ronniesahlberg@gmail.com> wrote:
> >
> > On Wed, Jul 14, 2021 at 8:58 AM Pavel Shilovsky <piastryyy@gmail.com> wrote:
> > >
> > > Mon, 12 Jul 2021 at 22:41, Steve French <smfrench@gmail.com>:
> > > >
> > > > And don't forget "esize" mount option to allow offload of decryption...
> > > >
> > > > On Tue, Jul 13, 2021, 00:37 Shyam Prasad N <nspmangalore@gmail.com> wrote:
> > > >>
> > > >> Hi all,
> > > >>
> > > >> I'm trying to understand our read path (big picture is to integrate
> > > >> the new netfs helpers into cifs.ko), and I wanted to get some
> > > >> understanding about why things are the way they are. @Pavel Shilovsky
> > > >> I'm guessing that you've worked on most of this code, so hoping that
> > > >> you'll be able to answer some of these:
> > >
> > > Thanks for taking a look at this.
> > >
> > > >>
> > > >> 1. for cases where both encryption and signing are disabled, it looks
> > > >> like we read from the socket page-by-page directly into pages and map
> > > >> those to the inode address space
> > >
> > > Yes, we read directly into pre-allocated pages. For direct IO we read
> > > into pages supplied by the user and for non-direct IO (including page
> > > IO) we allocate intermediate pages.
> > >
> > > >>
> > > >> 2. for cases where encryption is enabled, we read the encrypted data
> > > >> into a set of pages first, then call crypto APIs, which decrypt the
> > > >> data in-place to the same pages. We then copy out the decrypted data
> > > >> from these pages into the pages in the address space that are already
> > > >> mapped.
> > >
> > > Correct. Regardless of whether offloading is used or not there is an extra copy.
> > >
> > > >>
> > > >> 3. similarly for the signing case, we read the data into a set of
> > > >> pages first, then verify signature, then copy the pages to the mapped
> > > >> pages in the inode address space.
> > >
> > > Yes.
> > >
> > > >>
> > > >> for case 1, can't we read from the socket directly into the mapped
> > > >> pages?
> > >
> > > Yes - assuming no signing or encryption, we should be able to read
> > > directly into the mapped pages. But I doubt many people use SMB2
> > > without signing or encryption although there may be some use cases
> > > requiring performance over safety.
> > >
> > > >> for case 2 and 3, instead of allocating temporary pages to read
> > > >> into, can't we read directly into the pages of the inode address
> > > >> space? the only risk I see is that if decryption or sign verification
> > > >> fails, it would overwrite the data that is already present in the page
> > > >> cache. but I see that as acceptable, because we're reading from the
> > > >> server, since the data we have in the cache is stale anyways.
> > >
> > > Theoretically we may try doing this with signing but we need to make
> > > sure that no one can modify those pages locally which may lead to
> > > signing errors. I don't think today we reconnect an SMB session on
> > > those today but we probably should.
> > >
>
> How can someone modify those pages? Don't we keep the pages locked
> when the read is in progress?
>

Agreed. As I said, theoretically we can do such an optimization. I am
not sure it is worth it, given the overall signing overhead.

--
Best regards,
Pavel Shilovsky


end of thread, other threads:[~2021-07-14 16:34 UTC | newest]

Thread overview: 5+ messages
-- links below jump to the message on this page --
2021-07-13  5:37 cifs.ko page management during reads Shyam Prasad N
     [not found] ` <CAH2r5mvxj-A3F1tPr9OH1D00bdznVYyx7FzyjLZt=Xq41TCVxw@mail.gmail.com>
2021-07-13 22:58   ` Pavel Shilovsky
2021-07-14  0:15     ` ronnie sahlberg
2021-07-14  6:50       ` Shyam Prasad N
2021-07-14 16:34         ` Pavel Shilovsky
