linux-fsdevel.vger.kernel.org archive mirror
From: Jan Kara <jack@suse.cz>
To: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Jan Kara <jack@suse.cz>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	device-mapper development <dm-devel@redhat.com>,
	linux-btrfs@vger.kernel.org, axboe@fb.com, zab@zabbo.net,
	neilb@suse.de
Subject: Re: Proposal for annotating _unstable_ pages
Date: Thu, 21 May 2015 21:21:12 +0200
Message-ID: <20150521192112.GA2665@quack.suse.cz>
In-Reply-To: <20150521180954.GA27397@kmo-pixel>

On Thu 21-05-15 11:09:55, Kent Overstreet wrote:
> On Thu, May 21, 2015 at 06:54:53PM +0200, Jan Kara wrote:
> > On Wed 20-05-15 18:04:40, Kent Overstreet wrote:
> > > > Yeah.  I never figured out a sane way to migrate pages and keep everything
> > > > else happy.  Daniel Phillips is having a go at page forking for tux3; let's
> > > > see if the questions about that get resolved.
> > > 
> > > That would be great, we need something.
> > > 
> > > I'd also be really curious what btrfs is doing today - is it just bouncing
> > > everything internally, or did they come up with something more clever?
> > 
> > Btrfs is just waiting for IO to complete.
> > 
> > > > > Also, there's probably always going to be situations where we're reading or
> > > > > writing to pages user space can stomp on (dio) - IMO we need to add a bio flag
> > > > > to annotate this - "if you need this to be stable you have to bounce it".
> > > > > Otherwise either filesystems/block drivers are going to be stuck bouncing
> > > > > everything, or it'll just (continue to be) buggy.
> > > > 
> > > > Well, for now there's BIO_SNAP_STABLE that forces the block layer to bounce it,
> > > > but right now ext3 is the last user of it, and afaict btrfs is the only other
> > > > FS that takes care of stable pages on its own.
> > > 
> > > I have no idea what BIO_SNAP_STABLE was supposed to be for, but I don't see how
> > > it's useful for anything sane.
> > 
> > It's for the case where lower layer requests it needs stable pages but
> > upper layer isn't able to provide them (as is the case of ext3). Then block
> > layer bounces the data for the caller.
> > 
> > > But that's the complete opposite of the problem stable pages are supposed to
> > > solve: stable pages are for when the _lower_ layer (be it filesystem, bcache,
> > > md, lvm) needs the memory being either read to or written from (both, it's not
> > > just writes) to not be diddled over while the IO is in flight.
> > > 
> > > Now, a point that I think has been missed is that stable pages are _not_ a
> > > complete solution, at least for consumers in the block layer.
> > > 
> > > The situation today is that if I'm in the block layer, and I get handed a read
> > > or write bio, I _don't know_ if it's from something that's going to diddle over
> > > those pages or not. So if I require stable pages - be it for data checksumming
> > > or for other things - I've just got to bounce the bio myself.
> > > 
> > > And then the really annoying thing is that if you've got stacked things that all
> > > need stable pages (maybe btrfs on top of bcache on top of md) - they _all_ have
> > > to assume the pages aren't going to be stable, so if they need them they _all_
> > > have to bounce - even though once the first layer bounced the bio that made it
> > > stable for everything underneath it.
> > 
> > The current design is that if you need stable pages for your device, set
> > bdi capability BDI_CAP_STABLE_WRITES, fs then takes care of not scribbling
> > over your page while it is under writeback or uses BIO_SNAP_STABLE if it
> > cannot.
> 
> But if I need stable pages, I still have to bounce because that _does not_
> guarantee stable pages, it only gives me stable pages for some of the IOs and in
> the lower layers you can't tell which is which.
> 
> Do you see the problem? What good is BDI_CAP_STABLE_WRITES if it's not a
> guarantee and I can't tell if I need to bounce or not?
  So fix the upper layers to make it a guarantee? You mentioned direct IO
needs fixing. Anything else?
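
To make the stacked-bounce concern from the quoted text concrete, here is a
toy user-space model in Python. It is only a sketch under stated assumptions:
Bio, Layer, the "unstable" flag, and run_stack() are hypothetical names made
up for illustration, not kernel structures or APIs. It counts how many copies
a stack of layers makes when each layer cannot tell whether the pages are
stable, versus when a per-bio annotation tells it.

```python
from dataclasses import dataclass


@dataclass
class Bio:
    """Toy stand-in for an I/O request (not the kernel's struct bio)."""
    data: bytearray
    unstable: bool  # hypothetical flag: submitter may modify data in flight


@dataclass
class Layer:
    """Toy stand-in for one layer in a storage stack."""
    name: str
    needs_stable: bool
    bounces: int = 0

    def submit(self, bio: Bio, trust_flag: bool) -> Bio:
        # Without a trustworthy annotation, a layer that needs stable
        # pages has to assume the worst and copy the data itself.
        must_bounce = self.needs_stable and (bio.unstable or not trust_flag)
        if must_bounce:
            self.bounces += 1
            # The private copy is stable for everything underneath.
            bio = Bio(data=bytearray(bio.data), unstable=False)
        return bio


def run_stack(layers, bio, trust_flag):
    """Pass bio down through layers; return the total number of bounces."""
    for layer in layers:
        bio = layer.submit(bio, trust_flag)
    return sum(layer.bounces for layer in layers)
```

With three layers that all need stable pages (say btrfs on bcache on md) and
an unstable bio from direct IO, run_stack(..., trust_flag=False) bounces once
per layer, while trust_flag=True bounces only at the first layer. If the
upper layers really did guarantee stability, the bio would arrive with
unstable=False and no layer would need to bounce at all.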

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
