From: Jeff Layton <jlayton@kernel.org>
To: "Xiubo Li" <xiubli@redhat.com>, "Luís Henriques" <lhenriques@suse.de>
Cc: Ilya Dryomov <idryomov@gmail.com>,
	ceph-devel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] ceph: invalidate pages when doing DIO in encrypted inodes
Date: Wed, 06 Apr 2022 09:41:04 -0400
Message-ID: <f0ed169ed02fe810076e959e9ec5455d9de4b4ff.camel@kernel.org>
In-Reply-To: <321104e6-36db-c143-a7ba-58f9199e6fb7@redhat.com>

On Wed, 2022-04-06 at 21:10 +0800, Xiubo Li wrote:
> On 4/6/22 7:48 PM, Jeff Layton wrote:
> > On Wed, 2022-04-06 at 12:33 +0100, Luís Henriques wrote:
> > > Xiubo Li <xiubli@redhat.com> writes:
> > > 
> > > > On 4/6/22 6:57 PM, Luís Henriques wrote:
> > > > > Xiubo Li <xiubli@redhat.com> writes:
> > > > > 
> > > > > > On 4/1/22 9:32 PM, Luís Henriques wrote:
> > > > > > > > When doing DIO on an encrypted inode, we need to invalidate the page cache in
> > > > > > > > the range being written to, otherwise the cache will include invalid data.
> > > > > > > 
> > > > > > > Signed-off-by: Luís Henriques <lhenriques@suse.de>
> > > > > > > ---
> > > > > > >     fs/ceph/file.c | 11 ++++++++++-
> > > > > > >     1 file changed, 10 insertions(+), 1 deletion(-)
> > > > > > > 
> > > > > > > Changes since v1:
> > > > > > > - Replaced truncate_inode_pages_range() by invalidate_inode_pages2_range
> > > > > > > - Call fscache_invalidate with FSCACHE_INVAL_DIO_WRITE if we're doing DIO
> > > > > > > 
> > > > > > > Note: I'm not really sure this last change is required, it doesn't really
> > > > > > > affect generic/647 result, but seems to be the most correct.
> > > > > > > 
> > > > > > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > > > > > > index 5072570c2203..b2743c342305 100644
> > > > > > > --- a/fs/ceph/file.c
> > > > > > > +++ b/fs/ceph/file.c
> > > > > > > @@ -1605,7 +1605,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
> > > > > > >     	if (ret < 0)
> > > > > > >     		return ret;
> > > > > > >     -	ceph_fscache_invalidate(inode, false);
> > > > > > > +	ceph_fscache_invalidate(inode, (iocb->ki_flags & IOCB_DIRECT));
> > > > > > >     	ret = invalidate_inode_pages2_range(inode->i_mapping,
> > > > > > >     					    pos >> PAGE_SHIFT,
> > > > > > >     					    (pos + count - 1) >> PAGE_SHIFT);
> > > > > > > The above has already invalidated the pages, so why doesn't it work?
> > > > > I suspect the reason is because later on we loop through the number of
> > > > > pages, call copy_page_from_iter() and then ceph_fscrypt_encrypt_pages().
> > > > I checked copy_page_from_iter(): it will kmap the pages but will kunmap
> > > > them again later. It shouldn't update the i_mapping, unless I missed
> > > > something important.
> > > > 
> > > > ceph_fscrypt_encrypt_pages() encrypts/decrypts the content in place. IMO,
> > > > if it needs to map the page, it should also unmap it, just like
> > > > copy_page_from_iter() does.
> > > > 
> > > > I thought it might happen when we need to do RMW, which could update the
> > > > i_mapping when reading the contents, but I checked the code and didn't find
> > > > any place doing this. So I am wondering where the cached pages come from.
> > > > If they really come from reading the contents, shouldn't we discard them
> > > > instead of flushing them back?
> > > > 
> > > > BTW, what's the problem without this fix? Does an xfstest fail?
> > > Yes, generic/647 fails if you run it with test_dummy_encryption.  And I've
> > > also checked that the RMW code was never executed in this test.
> > > 
> > > But yeah I have assumed (perhaps wrongly) that the kmap/kunmap could
> > > change the inode->i_mapping.
> > > 
> > No, kmap/kunmap are all about high memory and 32-bit architectures. Those
> > functions are usually no-ops on 64-bit arches.
> 
> Yeah, right.
> 
> So they do nothing here.
> 
> > > In my debugging this seemed to be the case
> > > for the O_DIRECT path.  That's why I added this extra call here.
> > > 
> > I agree with Xiubo that we really shouldn't need to invalidate multiple
> > times.
> > 
> > I guess in this test, we have a DIO write racing with an mmap read.
> > Probably what's happening is either that we can't invalidate the page
> > because it needs to be cleaned, or the mmap read is racing in just after
> > the invalidate occurs but before writeback.
> 
> This sounds like a possible case.
> 
> 
> > In any case, it might be interesting to see whether you're getting
> > -EBUSY back from the new invalidate_inode_pages2 calls with your patch.
> > 
> If that's really the case, maybe this should be retried somewhere?
> 

Possibly, or we may need to implement ->launder_folio.

Either way, we need to understand what's happening first and then we can
figure out a solution for it.
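
To make the "retry on -EBUSY" idea a bit more concrete, here is a toy
userspace model. It is only a sketch and not kernel code: every sketch_*
name is made up, the page size is assumed to be 4 KiB, and "a dirty page
cannot be invalidated" is a deliberate simplification of what
invalidate_inode_pages2_range() really does. The index arithmetic is the
same as in the quoted hunk (pos >> PAGE_SHIFT and
(pos + count - 1) >> PAGE_SHIFT).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define SKETCH_NR_PAGES   16	/* size of the toy page cache */

/* Toy stand-in for an address_space: one pair of flags per page index. */
struct sketch_mapping {
	bool cached[SKETCH_NR_PAGES];
	bool dirty[SKETCH_NR_PAGES];
};

/* First page index touched by a write starting at pos. */
static unsigned long sketch_first_index(long long pos)
{
	return (unsigned long)(pos >> SKETCH_PAGE_SHIFT);
}

/* Last page index touched by a write of count bytes starting at pos. */
static unsigned long sketch_last_index(long long pos, size_t count)
{
	assert(count > 0);
	return (unsigned long)((pos + (long long)count - 1) >> SKETCH_PAGE_SHIFT);
}

/*
 * Drop clean cached pages in [start, end]. Returns -EBUSY if at least
 * one dirty page had to be left in place (simplified model).
 */
static int sketch_invalidate_range(struct sketch_mapping *m,
				   unsigned long start, unsigned long end)
{
	int ret = 0;

	for (unsigned long i = start; i <= end && i < SKETCH_NR_PAGES; i++) {
		if (!m->cached[i])
			continue;
		if (m->dirty[i]) {
			ret = -EBUSY;
			continue;
		}
		m->cached[i] = false;
	}
	return ret;
}

/* Toy writeback: mark every page in [start, end] clean. */
static void sketch_writeback_range(struct sketch_mapping *m,
				   unsigned long start, unsigned long end)
{
	for (unsigned long i = start; i <= end && i < SKETCH_NR_PAGES; i++)
		m->dirty[i] = false;
}

/* Invalidate; if that reports -EBUSY, write back and try once more. */
static int sketch_invalidate_with_retry(struct sketch_mapping *m,
					unsigned long start, unsigned long end)
{
	int ret = sketch_invalidate_range(m, start, end);

	if (ret == -EBUSY) {
		sketch_writeback_range(m, start, end);
		ret = sketch_invalidate_range(m, start, end);
	}
	return ret;
}
```

In this model a single retry is always enough because nothing re-dirties
the page in between. In the real race with an mmap reader that guarantee
does not hold, which is why we need to see what is actually happening
before picking a fix.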
-- 
Jeff Layton <jlayton@kernel.org>
