All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	Jens Axboe <axboe@fb.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-block@vger.kernel.org, Jan Kara <jack@suse.com>
Subject: Re: [PATCH v2 5/7] fs: notify superblocks of backing-device death
Date: Thu, 26 Nov 2015 17:27:28 +1100	[thread overview]
Message-ID: <20151126062728.GR19199@dastard> (raw)
In-Reply-To: <CAPcyv4i_wjZLNPN5C=fy7TL4N+QCedMX5U-T4gdMxhMmFztfJA@mail.gmail.com>

On Wed, Nov 25, 2015 at 02:09:34PM -0800, Dan Williams wrote:
> On Wed, Nov 25, 2015 at 1:50 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Wed, Nov 25, 2015 at 10:37:24AM -0800, Dan Williams wrote:
> >> Set SB_I_BDIDEAD when a block device is no longer available to service
> >> requests.  This will be used in several places where an fs should give
> >> up early because the block device is gone.  Letting the fs continue on
> >> as if the block device is still present can lead to long latencies
> >> waiting for an fs to detect the loss of its backing device, trigger
> >> crashes, or generate misleasing warnings.
> >>
> >> Cc: Jan Kara <jack@suse.com>
> >> Cc: Jens Axboe <axboe@fb.com>
> >> Suggested-by: Dave Chinner <david@fromorbit.com>
> >
> > This isn't what I suggested. :/
> >
> > .....
> >
> >> diff --git a/fs/block_dev.c b/fs/block_dev.c
> >> index 1dd416bf72f7..d0233d643d33 100644
> >> --- a/fs/block_dev.c
> >> +++ b/fs/block_dev.c
> >> @@ -1795,6 +1795,23 @@ int __invalidate_device(struct block_device *bdev, bool kill_dirty)
> >>  }
> >>  EXPORT_SYMBOL(__invalidate_device);
> >>
> >> +void kill_bdev_super(struct gendisk *disk, int partno)
> >> +{
> >> +     struct block_device *bdev = bdget_disk(disk, partno);
> >> +     struct super_block *sb;
> >> +
> >> +     if (!bdev)
> >> +             return;
> >> +     sb = get_super(bdev);
> >> +     if (!sb)
> >> +             goto out;
> >> +
> >> +     sb->s_iflags |= SB_I_BDI_DEAD;
> >> +     drop_super(sb);
> >> + out:
> >> +     bdput(bdev);
> >> +}
> >
> > That's not a notification to the filesystem - that's a status flag
> > the filesystem has to explicitly check for *on every operation*. We
> > already have checks like these throughout filesystems, but they are
> > filesystem specific and need to propagate into fs-specific
> > subsystems that have knowledge of VFS level superblocks.
> >
> > To that end, what I actually suggested what a callback - something
> > like a function in the super operations structure so that the
> > filesystem can take *immediate action* when the block device is
> > dying. i.e.
> >
> > void kill_bdev_super(struct gendisk *disk, int partno)
> > {
> >         struct block_device *bdev = bdget_disk(disk, partno);
> >         struct super_block *sb;
> >
> >         if (!bdev)
> >                 return;
> >         sb = get_super(bdev);
> >         if (!sb)
> >                 goto out;
> >
> >         if (sb->s_ops->shutdown)
> >                 sb->s_ops->shutdown(sb);
> >
> >         drop_super(sb);
> >  out:
> >         bdput(bdev);
> > }
> >
> > and then we implement ->shutdown somthing like this in XFS:
> >
> > xfs_fs_shutdown(struct superblock *sb)
> > {
> >         xfs_force_shutdown(XFS_M(sb), SHUTDOWN_DEVICE_REQ);
> > }
> >
> > and so we immediately notify the entire filesystem that a shutdown
> > state has been entered and the appropriate actions are immediately
> > taken.
> >
> 
> That sounds good in theory.  However, in the case of xfs it is already
> calling xfs_force_shutdown(),

Where? If XFS does not do any metadata IO, then it won't shut the
filesystem down. We almost always allocate/map blocks without doing
any IO, which means we cannot guarantee erroring out page faults
during get_blocks until a shutdown has be triggered by other
means....

> but that does not prevent it from
> continuing to wait indefinitely at umount.

Which is a bug we need to fix - I don't see how a shutdown
implementation problem is at all relevant to triggering shutdowns in
a prompt manner...

> For the ext4 the
> mark_inode_dirty() warning we're triggering the error in generic code.

So? XFS doesn't use that generic code, but we have filesystem
specific issues that we need to handle sanely.

> None of this fixes the problem of dax mappings leaking past block
> device remove.  That can be done generically without needing per-fs
> code.

No, the shutdown is intended to close the race between the device
being removed, the mappings being invalidated and the filesytem
racing creating new mappings during page faults because it doesn't
know the device has been unplugged until it does IO that gets some
error in an unrecoverable path...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2015-11-26  6:27 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-25 18:36 [PATCH v2 0/7] dax cleanups and lifetime fixes Dan Williams
2015-11-25 18:37 ` [PATCH v2 1/7] pmem, dax: clean up clear_pmem() Dan Williams
2015-11-25 18:37 ` [PATCH v2 2/7] dax: increase granularity of dax_clear_blocks() operations Dan Williams
2015-11-25 18:37 ` [PATCH v2 3/7] dax: guarantee page aligned results from bdev_direct_access() Dan Williams
2015-11-25 18:37 ` [PATCH v2 4/7] dax: fix lifetime of in-kernel dax mappings with dax_map_atomic() Dan Williams
2015-11-25 18:37 ` [PATCH v2 5/7] fs: notify superblocks of backing-device death Dan Williams
2015-11-25 21:50   ` Dave Chinner
2015-11-25 22:09     ` Dan Williams
2015-11-26  6:27       ` Dave Chinner [this message]
2015-11-26  7:11         ` Dan Williams
2015-12-01  4:03           ` Dave Chinner
2015-12-01  4:20             ` Dan Williams
2015-11-25 18:37 ` [PATCH v2 6/7] ext4: skip inode dirty when backing device is gone Dan Williams
2015-11-25 18:37 ` [PATCH v2 7/7] mm, dax: unmap dax mappings at bdev shutdown Dan Williams
2015-11-30 22:03   ` Dan Williams
2015-12-01  4:21     ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151126062728.GR19199@dastard \
    --to=david@fromorbit.com \
    --cc=axboe@fb.com \
    --cc=dan.j.williams@intel.com \
    --cc=jack@suse.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.