* cifs.ko page management during reads @ 2021-07-13 5:37 Shyam Prasad N [not found] ` <CAH2r5mvxj-A3F1tPr9OH1D00bdznVYyx7FzyjLZt=Xq41TCVxw@mail.gmail.com> 0 siblings, 1 reply; 5+ messages in thread From: Shyam Prasad N @ 2021-07-13 5:37 UTC (permalink / raw) To: Pavel Shilovsky, Steve French, CIFS Hi all, I'm trying to understand our read path (big picture is to integrate the new netfs helpers into cifs.ko), and I wanted to get some understanding about why things are the way they are. @Pavel Shilovsky I'm guessing that you've worked on most of this code, so hoping that you'll be able to answer some of these: 1. for cases where both encryption and signing are disabled, it looks like we read from the socket page-by-page directly into pages and map those to the inode address space 2. for cases where encryption is enabled, we read the encrypted data into a set of pages first, then call crypto APIs, which decrypt the data in-place to the same pages. We then copy out the decrypted data from these pages into the pages in the address space that are already mapped. 3. similarly for the signing case, we read the data into a set of pages first, then verify signature, then copy the pages to the mapped pages in the inode address space. for case 1, can't we read from the socket directly into the mapped pages? for case 2 and 3, instead of allocating temporary pages to read into, can't we read directly into the pages of the inode address space? the only risk I see is that if decryption or sign verification fails, it would overwrite the data that is already present in the page cache. but I see that as acceptable, because we're reading from the server, since the data we have in the cache is stale anyways. Seeking opinions from experts who've worked on this codepath before. Please correct me if I'm wrong with any of these. -- Regards, Shyam ^ permalink raw reply [flat|nested] 5+ messages in thread
[parent not found: <CAH2r5mvxj-A3F1tPr9OH1D00bdznVYyx7FzyjLZt=Xq41TCVxw@mail.gmail.com>]
* Re: cifs.ko page management during reads [not found] ` <CAH2r5mvxj-A3F1tPr9OH1D00bdznVYyx7FzyjLZt=Xq41TCVxw@mail.gmail.com> @ 2021-07-13 22:58 ` Pavel Shilovsky 2021-07-14 0:15 ` ronnie sahlberg 0 siblings, 1 reply; 5+ messages in thread From: Pavel Shilovsky @ 2021-07-13 22:58 UTC (permalink / raw) To: Steve French; +Cc: Shyam Prasad N, CIFS пн, 12 июл. 2021 г. в 22:41, Steve French <smfrench@gmail.com>: > > And don't forget "esize" mount option to allow offload of decryption... > > On Tue, Jul 13, 2021, 00:37 Shyam Prasad N <nspmangalore@gmail.com> wrote: >> >> Hi all, >> >> I'm trying to understand our read path (big picture is to integrate >> the new netfs helpers into cifs.ko), and I wanted to get some >> understanding about why things are the way they are. @Pavel Shilovsky >> I'm guessing that you've worked on most of this code, so hoping that >> you'll be able to answer some of these: Thanks for taking a look at this. >> >> 1. for cases where both encryption and signing are disabled, it looks >> like we read from the socket page-by-page directly into pages and map >> those to the inode address space Yes, we read directly into pre-allocated pages. For direct IO we read into pages supplied by the user and for non-direct IO (including page IO) we allocate intermediate pages. >> >> 2. for cases where encryption is enabled, we read the encrypted data >> into a set of pages first, then call crypto APIs, which decrypt the >> data in-place to the same pages. We then copy out the decrypted data >> from these pages into the pages in the address space that are already >> mapped. Correct. Regardless of whether offloading is used or not there is an extra copy. >> >> 3. similarly for the signing case, we read the data into a set of >> pages first, then verify signature, then copy the pages to the mapped >> pages in the inode address space. Yes. >> >> for case 1, can't we read from the socket directly into the mapped >> pages? Yes - assuming no signing or encryption, we should be able to read directly into the mapped pages. But I doubt many people use SMB2 without signing or encryption although there may be some use cases requiring performance over safety. >> for case 2 and 3, instead of allocating temporary pages to read >> into, can't we read directly into the pages of the inode address >> space? the only risk I see is that if decryption or sign verification >> fails, it would overwrite the data that is already present in the page >> cache. but I see that as acceptable, because we're reading from the >> server, since the data we have in the cache is stale anyways. Theoretically we may try doing this with signing but we need to make sure that no one can modify those pages locally which may lead to signing errors. I don't think today we reconnect an SMB session on those today but we probably should. For encryption, we do not know where read data starts in an encrypted SMB packet, so there is almost no point to read directly because we would need to shift (copy) the data afterwards anyway. -- Best regards, Pavel Shilovsky ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: cifs.ko page management during reads 2021-07-13 22:58 ` Pavel Shilovsky @ 2021-07-14 0:15 ` ronnie sahlberg 2021-07-14 6:50 ` Shyam Prasad N 0 siblings, 1 reply; 5+ messages in thread From: ronnie sahlberg @ 2021-07-14 0:15 UTC (permalink / raw) To: Pavel Shilovsky; +Cc: Steve French, Shyam Prasad N, CIFS On Wed, Jul 14, 2021 at 8:58 AM Pavel Shilovsky <piastryyy@gmail.com> wrote: > > пн, 12 июл. 2021 г. в 22:41,Steve French <smfrench@gmail.com>: > > > > And don't forget "esize" mount option to allow offload of decryption... > > > > On Tue, Jul 13, 2021, 00:37 Shyam Prasad N <nspmangalore@gmail.com> wrote: > >> > >> Hi all, > >> > >> I'm trying to understand our read path (big picture is to integrate > >> the new netfs helpers into cifs.ko), and I wanted to get some > >> understanding about why things are the way they are. @Pavel Shilovsky > >> I'm guessing that you've worked on most of this code, so hoping that > >> you'll be able to answer some of these: > > Thanks for taking a look at this. > > >> > >> 1. for cases where both encryption and signing are disabled, it looks > >> like we read from the socket page-by-page directly into pages and map > >> those to the inode address space > > Yes, we read directly into pre-allocated pages. For direct IO we read > into pages supplied by the user and for non-direct IO (including page > IO) we allocate intermediate pages. > > >> > >> 2. for cases where encryption is enabled, we read the encrypted data > >> into a set of pages first, then call crypto APIs, which decrypt the > >> data in-place to the same pages. We then copy out the decrypted data > >> from these pages into the pages in the address space that are already > >> mapped. > > Correct. Regardless of whether offloading is used or not there is an extra copy. > > >> > >> 3. similarly for the signing case, we read the data into a set of > >> pages first, then verify signature, then copy the pages to the mapped > >> pages in the inode address space. > > Yes. > > >> > >> for case 1, can't we read from the socket directly into the mapped > >> pages? > > Yes - assuming no signing or encryption, we should be able to read > directly into the mapped pages. But I doubt many people use SMB2 > without signing or encryption although there may be some use cases > requiring performance over safety. > > >> for case 2 and 3, instead of allocating temporary pages to read > >> into, can't we read directly into the pages of the inode address > >> space? the only risk I see is that if decryption or sign verification > >> fails, it would overwrite the data that is already present in the page > >> cache. but I see that as acceptable, because we're reading from the > >> server, since the data we have in the cache is stale anyways. > > Theoretically we may try doing this with signing but we need to make > sure that no one can modify those pages locally which may lead to > signing errors. I don't think today we reconnect an SMB session on > those today but we probably should. > > For encryption, we do not know where read data starts in an encrypted > SMB packet, so there is almost no point to read directly because we > would need to shift (copy) the data afterwards anyway. Yes. But even if we knew where the data starts, it will not be aligned to a page anyway so we need to memcpy() it anyway. But I think it is possible, with a lot of effort. And probably not worth the effort. But, I have been toying with the idea of trying to do "zero-copy" SMB2 READ in libsmb2. It, does read all the unencrypted READ payloads straight into the application buffer without a copy already, but just like cifs.ko it uses an intermediate buffer for encrypted packets.And thus an extra memcpy() The solution I was thinking of would be something like : 1, Read the first 4 + 64 bytes into a buffer. This will be enough to hold, either 1a, unencrypted case: inspect the SMB2 header to find the opcode, then read the response header (16 bytes for smb2 read/write responses) then read the payload straight into the application buffer. I do this already and it works without too much pain. Now we are finished for the unencrypted case. 2, encrypted case: In this case we have the full transform header and also part of an encrypted smb2 header. We know how big the transform header is so we can copy the partial smb2 header into a 64 byte temp buffer and then read the remaining part of the smb2 header. 3, Start decryption, but only run it for the first 64 bytes, i.e. the smb2 header. 3a, OPCODE != READ: Inspect the decrypted smb2 header, IF it is NOT a read response then just read all remaining data into a temp buffer, continue the decryption and handle it the normal way. 4, If the opcode is a READ, then read an additional 16 bytes into a temp buffer and continue the decryption of these 16 bytes. 5, What follows now is the read response payload so read this decrypted data straight into the application buffer (or pages) and again, continue the decryption and decrypt the data in-place. Voila, zero-copy encrypted read handling. Which would really help on a PS2 since memcpy() is pretty expensive on its very slow memory. I think this would work, and it would be very interesting to implement. Not sure if it is worth it to do in kernel code because there will be hairy special conditions to handle. For kernel code I think it would be better/safer to have fast hw offload for memcpy() regards ronnie sahlberg > > -- > Best regards, > Pavel Shilovsky ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: cifs.ko page management during reads 2021-07-14 0:15 ` ronnie sahlberg @ 2021-07-14 6:50 ` Shyam Prasad N 2021-07-14 16:34 ` Pavel Shilovsky 0 siblings, 1 reply; 5+ messages in thread From: Shyam Prasad N @ 2021-07-14 6:50 UTC (permalink / raw) To: ronnie sahlberg; +Cc: Pavel Shilovsky, Steve French, CIFS On Wed, Jul 14, 2021 at 5:45 AM ronnie sahlberg <ronniesahlberg@gmail.com> wrote: > > On Wed, Jul 14, 2021 at 8:58 AM Pavel Shilovsky <piastryyy@gmail.com> wrote: > > > > пн, 12 июл. 2021 г. в 22:41,Steve French <smfrench@gmail.com>: > > > > > > And don't forget "esize" mount option to allow offload of decryption... > > > > > > On Tue, Jul 13, 2021, 00:37 Shyam Prasad N <nspmangalore@gmail.com> wrote: > > >> > > >> Hi all, > > >> > > >> I'm trying to understand our read path (big picture is to integrate > > >> the new netfs helpers into cifs.ko), and I wanted to get some > > >> understanding about why things are the way they are. @Pavel Shilovsky > > >> I'm guessing that you've worked on most of this code, so hoping that > > >> you'll be able to answer some of these: > > > > Thanks for taking a look at this. > > > > >> > > >> 1. for cases where both encryption and signing are disabled, it looks > > >> like we read from the socket page-by-page directly into pages and map > > >> those to the inode address space > > > > Yes, we read directly into pre-allocated pages. For direct IO we read > > into pages supplied by the user and for non-direct IO (including page > > IO) we allocate intermediate pages. > > > > >> > > >> 2. for cases where encryption is enabled, we read the encrypted data > > >> into a set of pages first, then call crypto APIs, which decrypt the > > >> data in-place to the same pages. We then copy out the decrypted data > > >> from these pages into the pages in the address space that are already > > >> mapped. > > > > Correct. Regardless of whether offloading is used or not there is an extra copy. > > > > >> > > >> 3. similarly for the signing case, we read the data into a set of > > >> pages first, then verify signature, then copy the pages to the mapped > > >> pages in the inode address space. > > > > Yes. > > > > >> > > >> for case 1, can't we read from the socket directly into the mapped > > >> pages? > > > > Yes - assuming no signing or encryption, we should be able to read > > directly into the mapped pages. But I doubt many people use SMB2 > > without signing or encryption although there may be some use cases > > requiring performance over safety. > > > > >> for case 2 and 3, instead of allocating temporary pages to read > > >> into, can't we read directly into the pages of the inode address > > >> space? the only risk I see is that if decryption or sign verification > > >> fails, it would overwrite the data that is already present in the page > > >> cache. but I see that as acceptable, because we're reading from the > > >> server, since the data we have in the cache is stale anyways. > > > > Theoretically we may try doing this with signing but we need to make > > sure that no one can modify those pages locally which may lead to > > signing errors. I don't think today we reconnect an SMB session on > > those today but we probably should. > > How can someone modify those pages? Don't we keep the pages locked when the read is in progress? > > For encryption, we do not know where read data starts in an encrypted > > SMB packet, so there is almost no point to read directly because we > > would need to shift (copy) the data afterwards anyway. > Understood. > Yes. But even if we knew where the data starts, it will not be aligned > to a page anyway so we need to memcpy() it anyway. > > But I think it is possible, with a lot of effort. And probably not > worth the effort. > > But, I have been toying with the idea of trying to do "zero-copy" SMB2 > READ in libsmb2. > It, does read all the unencrypted READ payloads straight into the > application buffer without a copy > already, but just like cifs.ko it uses an intermediate buffer for > encrypted packets.And thus an extra memcpy() > > The solution I was thinking of would be something like : > 1, Read the first 4 + 64 bytes into a buffer. This will be > enough to hold, either > 1a, unencrypted case: > inspect the SMB2 header to find the opcode, then read the > response header (16 bytes for smb2 read/write responses) > then read the payload straight into the application buffer. > I do this already and it works without too much pain. > Now we are finished for the unencrypted case. > 2, encrypted case: > In this case we have the full transform header and also part of > an encrypted smb2 header. > We know how big the transform header is so we can copy the > partial smb2 header into a 64 byte > temp buffer and then read the remaining part of the smb2 header. > 3, Start decryption, but only run it for the first 64 bytes, i.e. > the smb2 header. > 3a, OPCODE != READ: Inspect the decrypted smb2 header, IF it is NOT a > read response then just read all remaining data into > a temp buffer, continue the decryption and handle it the normal way. > 4, If the opcode is a READ, then read an additional 16 bytes into a > temp buffer and continue the decryption of these 16 bytes. > 5, What follows now is the read response payload so read this > decrypted data straight into the application buffer (or pages) > and again, continue the decryption and decrypt the data in-place. > Voila, zero-copy encrypted read handling. Which would really > help on a PS2 since memcpy() is pretty expensive on its very slow > memory. > > > I think this would work, and it would be very interesting to implement. > Not sure if it is worth it to do in kernel code because there will be > hairy special conditions to handle. > Very interesting. Understood that this will be significant changes. But curious as to what special conditions you're referring to? > > For kernel code I think it would be better/safer to have fast hw > offload for memcpy() > Understood. > > regards > ronnie sahlberg > > > > > > -- > > Best regards, > > Pavel Shilovsky -- Regards, Shyam ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: cifs.ko page management during reads 2021-07-14 6:50 ` Shyam Prasad N @ 2021-07-14 16:34 ` Pavel Shilovsky 0 siblings, 0 replies; 5+ messages in thread From: Pavel Shilovsky @ 2021-07-14 16:34 UTC (permalink / raw) To: Shyam Prasad N; +Cc: ronnie sahlberg, Steve French, CIFS вт, 13 июл. 2021 г. в 23:51, Shyam Prasad N <nspmangalore@gmail.com>: > > On Wed, Jul 14, 2021 at 5:45 AM ronnie sahlberg > <ronniesahlberg@gmail.com> wrote: > > > > On Wed, Jul 14, 2021 at 8:58 AM Pavel Shilovsky <piastryyy@gmail.com> wrote: > > > > > > пн, 12 июл. 2021 г. в 22:41,Steve French <smfrench@gmail.com>: > > > > > > > > And don't forget "esize" mount option to allow offload of decryption... > > > > > > > > On Tue, Jul 13, 2021, 00:37 Shyam Prasad N <nspmangalore@gmail.com> wrote: > > > >> > > > >> Hi all, > > > >> > > > >> I'm trying to understand our read path (big picture is to integrate > > > >> the new netfs helpers into cifs.ko), and I wanted to get some > > > >> understanding about why things are the way they are. @Pavel Shilovsky > > > >> I'm guessing that you've worked on most of this code, so hoping that > > > >> you'll be able to answer some of these: > > > > > > Thanks for taking a look at this. > > > > > > >> > > > >> 1. for cases where both encryption and signing are disabled, it looks > > > >> like we read from the socket page-by-page directly into pages and map > > > >> those to the inode address space > > > > > > Yes, we read directly into pre-allocated pages. For direct IO we read > > > into pages supplied by the user and for non-direct IO (including page > > > IO) we allocate intermediate pages. > > > > > > >> > > > >> 2. for cases where encryption is enabled, we read the encrypted data > > > >> into a set of pages first, then call crypto APIs, which decrypt the > > > >> data in-place to the same pages. We then copy out the decrypted data > > > >> from these pages into the pages in the address space that are already > > > >> mapped. > > > > > > Correct. Regardless of whether offloading is used or not there is an extra copy. > > > > > > >> > > > >> 3. similarly for the signing case, we read the data into a set of > > > >> pages first, then verify signature, then copy the pages to the mapped > > > >> pages in the inode address space. > > > > > > Yes. > > > > > > >> > > > >> for case 1, can't we read from the socket directly into the mapped > > > >> pages? > > > > > > Yes - assuming no signing or encryption, we should be able to read > > > directly into the mapped pages. But I doubt many people use SMB2 > > > without signing or encryption although there may be some use cases > > > requiring performance over safety. > > > > > > >> for case 2 and 3, instead of allocating temporary pages to read > > > >> into, can't we read directly into the pages of the inode address > > > >> space? the only risk I see is that if decryption or sign verification > > > >> fails, it would overwrite the data that is already present in the page > > > >> cache. but I see that as acceptable, because we're reading from the > > > >> server, since the data we have in the cache is stale anyways. > > > > > > Theoretically we may try doing this with signing but we need to make > > > sure that no one can modify those pages locally which may lead to > > > signing errors. I don't think today we reconnect an SMB session on > > > those today but we probably should. > > > > > How can someone modify those pages? Don't we keep the pages locked > when the read is in progress? > Agree. As I said - theoretically we can do such an optimization. I am not sure if it is worth it given the whole signing overhead. -- Best regards, Pavel Shilovsky ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-07-14 16:34 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-07-13 5:37 cifs.ko page management during reads Shyam Prasad N [not found] ` <CAH2r5mvxj-A3F1tPr9OH1D00bdznVYyx7FzyjLZt=Xq41TCVxw@mail.gmail.com> 2021-07-13 22:58 ` Pavel Shilovsky 2021-07-14 0:15 ` ronnie sahlberg 2021-07-14 6:50 ` Shyam Prasad N 2021-07-14 16:34 ` Pavel Shilovsky
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).