All of lore.kernel.org
 help / color / mirror / Atom feed
From: snitzer@redhat.com (Mike Snitzer)
Subject: [PATCH v3 07/12] md: suspend i/o during runtime blk_integrity_unregister
Date: Wed, 13 Jan 2016 14:51:37 -0500	[thread overview]
Message-ID: <20160113195137.GA3896@redhat.com> (raw)
In-Reply-To: <CAPcyv4ieU0EjB=i1sxs8QGisNyygYoqcx5rvkE_f7Q9Kp-j9Vg@mail.gmail.com>

On Wed, Jan 13 2016 at 12:11am -0500,
Dan Williams <dan.j.williams@intel.com> wrote:

> On Tue, Jan 12, 2016@7:10 PM, NeilBrown <neilb@suse.com> wrote:
> > On Wed, Jan 13 2016, Dan Williams wrote:
> >
> >> On Tue, Jan 12, 2016@6:14 PM, NeilBrown <neilb@suse.com> wrote:
> >>> On Thu, Oct 22 2015, Dan Williams wrote:
> >>>
> >>>> Synchronize pending i/o against a change in the integrity profile to
> >>>> avoid the possibility of spurious integrity errors.  Given linear_add()
> >>>> is suspending the mddev before manipulating the mddev, do the same for
> >>>> the other personalities.
> >>>>
> >>>> Acked-by: NeilBrown <neilb at suse.com>
> >>>> Signed-off-by: Dan Williams <dan.j.williams at intel.com>
> >>>> ---
> >>>>  drivers/md/md.c        |    1 +
> >>>>  drivers/md/multipath.c |    2 ++
> >>>>  drivers/md/raid1.c     |    2 ++
> >>>>  drivers/md/raid10.c    |    2 ++
> >>>>  4 files changed, 7 insertions(+)
> >>>>
> >>>
> >>>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> >>>> index 7c99a4037715..6f0ec107996a 100644
> >>>> --- a/drivers/md/raid10.c
> >>>> +++ b/drivers/md/raid10.c
> >>>> @@ -1736,7 +1736,9 @@ static int raid10_add_disk(struct mddev *mddev, struct md_rdev *rdev)
> >>>>               rcu_assign_pointer(p->rdev, rdev);
> >>>>               break;
> >>>>       }
> >>>> +     mddev_suspend(mddev);
> >>>>       md_integrity_add_rdev(rdev, mddev);
> >>>> +     mddev_resume(mddev);
> >>>>       if (mddev->queue && blk_queue_discard(bdev_get_queue(rdev->bdev)))
> >>>>               queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
> >>>>
> >>>
> >>> Hi Dan,
> >>>
> >>> Somewhat belatedly I am testing this code, and it deadlocks.  Which is
> >>> obvious once I think about it.
> >>>
> >>> raid10_add_disk() is called from
> >>>   raid10d -> md_check_recovery -> remove_and_add_spares
> >>>
> >>> so this call to mddev_suspend() will block raid10d until all pending IO
> >>> completes.  But raid10d might be needed to complete some IO -
> >>> particularly in the case of errors.  I don't know exactly what is
> >>> deadlocking in my test, but it doesn't surprise me that something does.
> >>
> >> Ugh.
> >>
> >>> Can you explain in more detail what needs to be synchronised here?  If
> >>> the synchronisation doesn't happen, what can do wrong?
> >>> Because I cannot imagine what this might actually be protecting against.
> >>
> >> My thinking is that if the integrity profile changes while i/o is in
> >> flight we can get spurious errors.  For example when enabling
> >> integrity a checksum for in-flight commands will be missing, so wait
> >> for those to complete.
> >
> > Who/what adds that checksum?  The filesystem?
> > mddev_suspend() doesn't stop the filesystem from submitting requests.
> > It just stops them from getting into md.  So when mddev_resume()
> > completes, old requests can arrive.
> 
> The block layer generates the checksum when the bio is queued.
> 
> You're right the suspend does nothing to protect against the integrity
> generated for the array.

MD and DM are _not_ using the Linux block layer's ability to generate
protection information metadata (via bio_integrity_generate).  As such
they are only providing pass-through of upper layer application
generated protection information (DIX).

A bit more explanation:
bio-based MD and DM's cloning of protection information causes their
cloned bios to have bio->bi_integrity -- as such the underlying
request-based driver skips bio_integrity_generate() via
blk_integrity_prep() (because blk_queue_bio()'s call to
bio_integrity_enabled() returns false for MD and DM's cloned bios).

> > If stability of the integrity profile is important, then we presumably
> > need to make sure it never changes (while the array is active - or at
> > least while it is mounted).  So don't accept drives which don't support
> > the current profile.  Also don't upgrade the profile just because all
> > current devices support some new feature.
> >
> > Maybe the array should have a persistent knowledge of what profile it
> > has and only accept devices which match?
> > Or maybe integrity should be completely disabled on arrays which can
> > hot-add devices (so support it on RAID0 and LINEAR, but not on
> > RAID1/4/5/6/10).
> 
> For -stable I'll develop an immediate fix that disallows live changes
> of the integrity similar to DM.
> 
> However, given no filesystem is sending down app tags, I'm wondering
> why md has integrity enabled at the array level in the first place?
> Is this just future preparation for eventual filesystem enabling?
> Otherwise, the integrity handling for each component device is
> sufficient.  I'm probably missing something.

AFAIK, Oracle DB is the only application known to be using DIX.

  reply	other threads:[~2016-01-13 19:51 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-21 17:19 [PATCH v3 00/12] Simplify block integrity registration v3 Dan Williams
2015-10-21 17:19 ` [PATCH v3 01/12] block: Move integrity kobject to struct gendisk Dan Williams
2015-10-21 17:19 ` [PATCH v3 02/12] block: Consolidate static integrity profile properties Dan Williams
2015-10-21 17:19 ` [PATCH v3 03/12] block: Reduce the size of struct blk_integrity Dan Williams
2015-10-21 17:19 ` [PATCH v3 04/12] block: Export integrity data interval size in sysfs Dan Williams
2015-10-21 17:19 ` [PATCH v3 05/12] block: Inline blk_integrity in struct gendisk Dan Williams
2015-10-21 17:19 ` [PATCH v3 06/12] md, dm, scsi, nvme, libnvdimm: drop blk_integrity_unregister() at shutdown Dan Williams
2016-01-14 10:32   ` Sagi Grimberg
2015-10-21 17:20 ` [PATCH v3 07/12] md: suspend i/o during runtime blk_integrity_unregister Dan Williams
2016-01-13  2:14   ` NeilBrown
2016-01-13  2:24     ` Dan Williams
2016-01-13  2:59       ` Mike Snitzer
2016-01-13  3:10       ` NeilBrown
2016-01-13  5:11         ` Dan Williams
2016-01-13 19:51           ` Mike Snitzer [this message]
2016-01-14  0:00     ` [PATCH] md/raid: only permit hot-add of compatible integrity profiles Dan Williams
2016-01-14  0:00       ` Dan Williams
2016-01-14  0:56       ` NeilBrown
2016-01-14  0:56         ` NeilBrown
2015-10-21 17:20 ` [PATCH v3 08/12] nvme: suspend i/o during runtime blk_integrity_unregister Dan Williams
2016-01-14 10:32   ` Sagi Grimberg
2015-10-21 17:20 ` [PATCH v3 09/12] block: generic request_queue reference counting Dan Williams
2016-01-14 10:35   ` Sagi Grimberg
2015-10-21 17:20 ` [PATCH v3 10/12] block: move blk_integrity to request_queue Dan Williams
2016-01-14 10:36   ` Sagi Grimberg
2015-10-21 17:20 ` [PATCH v3 11/12] block: blk_flush_integrity() for bio-based drivers Dan Williams
2016-01-14 10:36   ` Sagi Grimberg
2015-10-21 17:20 ` [PATCH v3 12/12] block, libnvdimm, nvme: provide a built-in blk_integrity nop profile Dan Williams
2016-01-14 10:37   ` Sagi Grimberg
2015-10-22 15:26 ` [PATCH v3 00/12] Simplify block integrity registration v3 Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160113195137.GA3896@redhat.com \
    --to=snitzer@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.