From: Jeff Layton <jlayton@kernel.org>
To: Jan Kara <jack@suse.cz>
Cc: Ilya Dryomov <idryomov@gmail.com>, ceph-devel@vger.kernel.org
Subject: Re: Hole punch races in Ceph
Date: Thu, 22 Apr 2021 08:05:12 -0400
Message-ID: <8f2e47965340c4a5dcd7e6b025b1bcd7a588f058.camel@kernel.org>
In-Reply-To: <20210422120255.GH26221@quack2.suse.cz>

On Thu, 2021-04-22 at 14:02 +0200, Jan Kara wrote:
> On Thu 22-04-21 07:43:16, Jeff Layton wrote:
> > On Thu, 2021-04-22 at 13:15 +0200, Jan Kara wrote:
> > > Hello,
> > > 
> > > I'm looking into how Ceph protects against races between page faults and
> > > hole punching (I'm unifying protection against this kind of race across
> > > filesystems) and AFAICT it does not. What I have in mind in particular is a
> > > race like:
> > > 
> > > CPU1					CPU2
> > > 
> > > ceph_fallocate()
> > >   ...
> > >   ceph_zero_pagecache_range()
> > > 					ceph_filemap_fault()
> > > 					  faults in page in the range being
> > > 					  punched
> > >   ceph_zero_objects()
> > > 
> > > And now we have a page in the punched range with invalid data. If
> > > ceph_page_mkwrite() manages to squeeze in at the right moment, we might
> > > even associate invalid metadata with the page, I'd assume (but I'm not sure
> > > whether that would be harmful). Am I missing something?
> > > 
> > > 								Honza
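
A rough userspace sketch of the window described above; the CephFS mount
path below is hypothetical, and actually hitting the race is timing-
dependent, so treat this as an illustration rather than a reliable
reproducer:

    /*
     * One thread punches a hole while another faults the same range
     * back in through a shared mmap. Run until interrupted.
     */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define SZ (1 << 20)

    static int fd;

    /* CPU2 in the diagram: fault pages in while the punch is in flight */
    static void *faulter(void *arg)
    {
            for (;;) {
                    char *p = mmap(NULL, SZ, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, 0);
                    if (p == MAP_FAILED)
                            exit(1);
                    for (off_t off = 0; off < SZ; off += 4096)
                            p[off]++;  /* fault + mkwrite in the punched range */
                    munmap(p, SZ);
            }
            return NULL;
    }

    int main(void)
    {
            pthread_t t;

            fd = open("/mnt/cephfs/race", O_RDWR | O_CREAT, 0644);
            if (fd < 0 || ftruncate(fd, SZ))
                    return 1;
            pthread_create(&t, NULL, faulter, NULL);

            /* CPU1: ceph_zero_pagecache_range() then ceph_zero_objects()
             * run inside this call, with the racy window in between */
            for (;;)
                    fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                              0, SZ);
    }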
> > 
> > No, I don't think you're missing anything. If ceph_page_mkwrite() happens
> > to get called at an inopportune time, then we'd probably end up writing
> > that page back into the punched range too. What would be the best way to
> > fix this, do you think?
> > 
> > One idea:
> > 
> > We could lock the pages we're planning to punch out first, then
> > zero/punch out the objects on the OSDs, and then do the hole punch in
> > the pagecache? Would that be sufficient to close the race?
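
To make the proposed ordering concrete, here is a toy model (not kernel
code: the "page cache" and "OSD objects" are just arrays, and the helper
names are made up) of lock pages -> zero objects -> punch pagecache:

    #include <pthread.h>
    #include <string.h>

    #define NPAGES 16

    /* toy stand-ins for the page cache and the OSD-side objects */
    static char cache[NPAGES], objects[NPAGES];
    static pthread_mutex_t page_lock[NPAGES];

    /* proposal: hold the page locks across both zeroing steps */
    static void punch_hole(int first, int last)
    {
            for (int i = first; i <= last; i++)
                    pthread_mutex_lock(&page_lock[i]);      /* 1. lock pages  */
            memset(objects + first, 0, last - first + 1);   /* 2. zero objects */
            memset(cache + first, 0, last - first + 1);     /* 3. punch cache  */
            for (int i = first; i <= last; i++)
                    pthread_mutex_unlock(&page_lock[i]);
    }

    /* a fault must take the page lock, so it can't slip in between
     * steps 2 and 3 the way ceph_filemap_fault() can today */
    static char fault_in(int i)
    {
            pthread_mutex_lock(&page_lock[i]);
            char v = cache[i] = objects[i];
            pthread_mutex_unlock(&page_lock[i]);
            return v;
    }

    int main(void)
    {
            for (int i = 0; i < NPAGES; i++)
                    pthread_mutex_init(&page_lock[i], NULL);
            punch_hole(0, NPAGES - 1);
            return fault_in(3);
    }

Jan's point below is exactly why this is awkward: for a 4GB punch the
pages, and hence the things being locked, needn't even exist in the cache.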
> 
> Yes, that would be sufficient, but very awkward, e.g. if you want to punch
> out 4GB of data that needn't even be in the page cache. But all
> filesystems have this problem - e.g. ext4, xfs, etc. already have their
> private locks to avoid races like this. I'm now working on lifting the
> fs-private solutions into a generic one, so I'll fix Ceph along the way as
> well. I was just making sure I'm not missing some other protection
> mechanism in Ceph.
> 

Even better! I'll keep an eye out for your patches.
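
The shape of such a generic solution, sketched with a plain rwlock (the
lock name and the model are made up here, not the eventual kernel API):
hole punch takes an invalidation lock exclusive across both zeroing
steps, and the fault path takes it shared.

    #include <pthread.h>
    #include <string.h>

    #define NPAGES 16

    static char cache[NPAGES], objects[NPAGES];
    /* one per-mapping lock instead of per-page locks */
    static pthread_rwlock_t invalidate_lock = PTHREAD_RWLOCK_INITIALIZER;

    static void punch_hole(int first, int last)
    {
            /* exclusive: no page can be faulted in until we're done */
            pthread_rwlock_wrlock(&invalidate_lock);
            memset(cache + first, 0, last - first + 1);     /* punch cache  */
            memset(objects + first, 0, last - first + 1);   /* zero objects */
            pthread_rwlock_unlock(&invalidate_lock);
    }

    static char fault_in(int i)
    {
            /* shared: faults still run concurrently with each other */
            pthread_rwlock_rdlock(&invalidate_lock);
            char v = cache[i] = objects[i];
            pthread_rwlock_unlock(&invalidate_lock);
            return v;
    }

    int main(void)
    {
            punch_hole(0, NPAGES - 1);
            return fault_in(5);
    }

The win over per-page locking: one lock per mapping, taken once,
regardless of how much of the punched range is actually cached.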

Thanks,
-- 
Jeff Layton <jlayton@kernel.org>

