From: Shyam Prasad N <firstname.lastname@example.org> To: ronnie sahlberg <email@example.com> Cc: Pavel Shilovsky <firstname.lastname@example.org>, Steve French <email@example.com>, CIFS <firstname.lastname@example.org> Subject: Re: cifs.ko page management during reads Date: Wed, 14 Jul 2021 12:20:56 +0530 [thread overview] Message-ID: <CANT5p=oCJ5MRSVcNHV3-+Jg2TcGc3m785GGJBzScGThkS24eMw@mail.gmail.com> (raw) In-Reply-To: <CAN05THQL3GTN0oG8rREQ3W9bxEoAV+NppbSgAuytdzPyH3cuhA@mail.gmail.com> On Wed, Jul 14, 2021 at 5:45 AM ronnie sahlberg <email@example.com> wrote: > > On Wed, Jul 14, 2021 at 8:58 AM Pavel Shilovsky <firstname.lastname@example.org> wrote: > > > > пн, 12 июл. 2021 г. в 22:41,Steve French <email@example.com>: > > > > > > And don't forget "esize" mount option to allow offload of decryption... > > > > > > On Tue, Jul 13, 2021, 00:37 Shyam Prasad N <firstname.lastname@example.org> wrote: > > >> > > >> Hi all, > > >> > > >> I'm trying to understand our read path (big picture is to integrate > > >> the new netfs helpers into cifs.ko), and I wanted to get some > > >> understanding about why things are the way they are. @Pavel Shilovsky > > >> I'm guessing that you've worked on most of this code, so hoping that > > >> you'll be able to answer some of these: > > > > Thanks for taking a look at this. > > > > >> > > >> 1. for cases where both encryption and signing are disabled, it looks > > >> like we read from the socket page-by-page directly into pages and map > > >> those to the inode address space > > > > Yes, we read directly into pre-allocated pages. For direct IO we read > > into pages supplied by the user and for non-direct IO (including page > > IO) we allocate intermediate pages. > > > > >> > > >> 2. for cases where encryption is enabled, we read the encrypted data > > >> into a set of pages first, then call crypto APIs, which decrypt the > > >> data in-place to the same pages. We then copy out the decrypted data > > >> from these pages into the pages in the address space that are already > > >> mapped. > > > > Correct. Regardless of whether offloading is used or not there is an extra copy. > > > > >> > > >> 3. similarly for the signing case, we read the data into a set of > > >> pages first, then verify signature, then copy the pages to the mapped > > >> pages in the inode address space. > > > > Yes. > > > > >> > > >> for case 1, can't we read from the socket directly into the mapped > > >> pages? > > > > Yes - assuming no signing or encryption, we should be able to read > > directly into the mapped pages. But I doubt many people use SMB2 > > without signing or encryption although there may be some use cases > > requiring performance over safety. > > > > >> for case 2 and 3, instead of allocating temporary pages to read > > >> into, can't we read directly into the pages of the inode address > > >> space? the only risk I see is that if decryption or sign verification > > >> fails, it would overwrite the data that is already present in the page > > >> cache. but I see that as acceptable, because we're reading from the > > >> server, since the data we have in the cache is stale anyways. > > > > Theoretically we may try doing this with signing but we need to make > > sure that no one can modify those pages locally which may lead to > > signing errors. I don't think today we reconnect an SMB session on > > those today but we probably should. > > How can someone modify those pages? Don't we keep the pages locked when the read is in progress? > > For encryption, we do not know where read data starts in an encrypted > > SMB packet, so there is almost no point to read directly because we > > would need to shift (copy) the data afterwards anyway. > Understood. > Yes. But even if we knew where the data starts, it will not be aligned > to a page anyway so we need to memcpy() it anyway. > > But I think it is possible, with a lot of effort. And probably not > worth the effort. > > But, I have been toying with the idea of trying to do "zero-copy" SMB2 > READ in libsmb2. > It, does read all the unencrypted READ payloads straight into the > application buffer without a copy > already, but just like cifs.ko it uses an intermediate buffer for > encrypted packets.And thus an extra memcpy() > > The solution I was thinking of would be something like : > 1, Read the first 4 + 64 bytes into a buffer. This will be > enough to hold, either > 1a, unencrypted case: > inspect the SMB2 header to find the opcode, then read the > response header (16 bytes for smb2 read/write responses) > then read the payload straight into the application buffer. > I do this already and it works without too much pain. > Now we are finished for the unencrypted case. > 2, encrypted case: > In this case we have the full transform header and also part of > an encrypted smb2 header. > We know how big the transform header is so we can copy the > partial smb2 header into a 64 byte > temp buffer and then read the remaining part of the smb2 header. > 3, Start decryption, but only run it for the first 64 bytes, i.e. > the smb2 header. > 3a, OPCODE != READ: Inspect the decrypted smb2 header, IF it is NOT a > read response then just read all remaining data into > a temp buffer, continue the decryption and handle it the normal way. > 4, If the opcode is a READ, then read an additional 16 bytes into a > temp buffer and continue the decryption of these 16 bytes. > 5, What follows now is the read response payload so read this > decrypted data straight into the application buffer (or pages) > and again, continue the decryption and decrypt the data in-place. > Voila, zero-copy encrypted read handling. Which would really > help on a PS2 since memcpy() is pretty expensive on its very slow > memory. > > > I think this would work, and it would be very interesting to implement. > Not sure if it is worth it to do in kernel code because there will be > hairy special conditions to handle. > Very interesting. Understood that this will be significant changes. But curious as to what special conditions you're referring to? > > For kernel code I think it would be better/safer to have fast hw > offload for memcpy() > Understood. > > regards > ronnie sahlberg > > > > > > -- > > Best regards, > > Pavel Shilovsky -- Regards, Shyam
next prev parent reply other threads:[~2021-07-14 6:51 UTC|newest] Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-07-13 5:37 Shyam Prasad N [not found] ` <CAH2r5mvxj-A3F1tPr9OH1D00bdznVYyx7FzyjLZt=Xq41TCVxw@mail.gmail.com> 2021-07-13 22:58 ` Pavel Shilovsky 2021-07-14 0:15 ` ronnie sahlberg 2021-07-14 6:50 ` Shyam Prasad N [this message] 2021-07-14 16:34 ` Pavel Shilovsky
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to='CANT5p=oCJ5MRSVcNHV3-+Jg2TcGc3m785GGJBzScGThkS24eMw@mail.gmail.com' \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --subject='Re: cifs.ko page management during reads' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).