linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* blk_integrity_revalidate() clears BDI_CAP_STABLE_WRITES
@ 2017-02-20 18:45 Ilya Dryomov
  2017-02-22  4:41 ` Martin K. Petersen
  0 siblings, 1 reply; 7+ messages in thread
From: Ilya Dryomov @ 2017-02-20 18:45 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Ceph Development, linux-block, Dan Williams, Christoph Hellwig

Hello,

rbd requires stable pages if ceph message data CRCs are enabled.  To
that end we set BDI_CAP_STABLE_WRITES in rbd_init_disk(), but since
commit 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk",
went into 4.4) this bit is almost immediately cleared:

add_disk
  register_disk
    blkdev_get
      rescan_partitions
        blk_integrity_revalidate

void blk_integrity_revalidate(struct gendisk *disk)
{
        struct blk_integrity *bi = &disk->queue->integrity;

        if (!(disk->flags & GENHD_FL_UP))
                return;

        if (bi->profile)
                disk->queue->backing_dev_info.capabilities |=
                        BDI_CAP_STABLE_WRITES;
        else
                disk->queue->backing_dev_info.capabilities &=
                        ~BDI_CAP_STABLE_WRITES;
}

ceph messenger is responsible for generating/verifying CRCs, so we
don't call blk_integrity_register() -- bi->profile is NULL.

Martin, could you please explain blk_integrity_revalidate() and its
GENHD_FL_UP check in particular?  We have the queue, bi->profile can't
be NULL after blk_integrity_register(), and since the latter "must" be
used for registering the profile with the block layer, wouldn't the
following be sufficient for blk_integrity users?

diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index d69c5c79f98e..319f2e4f4a8b 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -417,7 +417,7 @@ void blk_integrity_register(struct gendisk *disk,
struct blk_integrity *template
        bi->tuple_size = template->tuple_size;
        bi->tag_size = template->tag_size;

-       blk_integrity_revalidate(disk);
+       disk->queue->backing_dev_info.capabilities |= BDI_CAP_STABLE_WRITES;
 }
 EXPORT_SYMBOL(blk_integrity_register);

@@ -430,26 +430,11 @@ EXPORT_SYMBOL(blk_integrity_register);
  */
 void blk_integrity_unregister(struct gendisk *disk)
 {
-       blk_integrity_revalidate(disk);
+       disk->queue->backing_dev_info.capabilities &= ~BDI_CAP_STABLE_WRITES;
        memset(&disk->queue->integrity, 0, sizeof(struct blk_integrity));
 }
 EXPORT_SYMBOL(blk_integrity_unregister);

blk_integrity_revalidate() can then go away -- I can send the full
patch later.

The alternative seems to be to set up a bogus blk_integrity_profile
(nop_profile won't do -- this one would have to be truly bogus w/ NULL
->*_fn) under BLK_DEV_INTEGRITY ifdefs and hope that nothing breaks.
That can't be the right thing to do here...

Thanks,

                Ilya

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: blk_integrity_revalidate() clears BDI_CAP_STABLE_WRITES
  2017-02-20 18:45 blk_integrity_revalidate() clears BDI_CAP_STABLE_WRITES Ilya Dryomov
@ 2017-02-22  4:41 ` Martin K. Petersen
  2017-02-22 14:41   ` Ilya Dryomov
  0 siblings, 1 reply; 7+ messages in thread
From: Martin K. Petersen @ 2017-02-22  4:41 UTC (permalink / raw)
  To: Ilya Dryomov
  Cc: Martin K. Petersen, Ceph Development, linux-block, Dan Williams,
	Christoph Hellwig

>>>>> "Ilya" == Ilya Dryomov <idryomov@gmail.com> writes:

Ilya,

Ilya> could you please explain blk_integrity_revalidate() and
Ilya> its GENHD_FL_UP check in particular?  We have the queue,
Ilya> bi->profile can't be NULL after blk_integrity_register(), and
Ilya> since the latter "must" be used for registering the profile with
Ilya> the block layer, wouldn't the following be sufficient for
Ilya> blk_integrity users?

IIrc, the FL_UP check fixed a registration problem in the nvme driver.

The rationale behind revalidate was that we need to handle devices which
lose the integrity capability at runtime (i.e. a integrity-enabled DM
device is extended with a non-cable drive forcing the feature to be
turned off). The clearing of the integrity profile is more important in
that case than zapping the stable pages flag. But that was the original
reason for not just ORing BDI_CAP_STABLE_WRITES.

I don't have a huge problem with keeping stable pages on if a device
suddenly stops being integrity capable. However, I'd like to understand
your use case a bit better.

Ilya> The alternative seems to be to set up a bogus
Ilya> blk_integrity_profile (nop_profile won't do -- this one would have
Ilya> to be truly bogus w/ NULL *_fn) under BLK_DEV_INTEGRITY ifdefs and
Ilya> hope that nothing breaks.

Can you point me to the relevant code on your end?

Thanks,
Martin

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: blk_integrity_revalidate() clears BDI_CAP_STABLE_WRITES
  2017-02-22  4:41 ` Martin K. Petersen
@ 2017-02-22 14:41   ` Ilya Dryomov
  2017-02-23 23:49     ` Martin K. Petersen
  0 siblings, 1 reply; 7+ messages in thread
From: Ilya Dryomov @ 2017-02-22 14:41 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Ceph Development, linux-block, Dan Williams, Christoph Hellwig

[-- Attachment #1: Type: text/plain, Size: 2941 bytes --]

On Wed, Feb 22, 2017 at 5:41 AM, Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>>>>>> "Ilya" == Ilya Dryomov <idryomov@gmail.com> writes:
>
> Ilya,
>
> Ilya> could you please explain blk_integrity_revalidate() and
> Ilya> its GENHD_FL_UP check in particular?  We have the queue,
> Ilya> bi->profile can't be NULL after blk_integrity_register(), and
> Ilya> since the latter "must" be used for registering the profile with
> Ilya> the block layer, wouldn't the following be sufficient for
> Ilya> blk_integrity users?
>
> IIrc, the FL_UP check fixed a registration problem in the nvme driver.

Did it have something to do with a NULL disk->queue?

>
> The rationale behind revalidate was that we need to handle devices which
> lose the integrity capability at runtime (i.e. a integrity-enabled DM
> device is extended with a non-cable drive forcing the feature to be
> turned off). The clearing of the integrity profile is more important in
> that case than zapping the stable pages flag. But that was the original
> reason for not just ORing BDI_CAP_STABLE_WRITES.
>
> I don't have a huge problem with keeping stable pages on if a device
> suddenly stops being integrity capable. However, I'd like to understand
> your use case a bit better.

Well, blk_integrity_revalidate() doesn't clear the profile, it just
clears the stable pages flag.  Whoever calls blk_integrity_unregister()
to clear the profile can also clear the stable pages flag -- why not
let blk_integrity_unregister() clear the flag like I suggested?

>
> Ilya> The alternative seems to be to set up a bogus
> Ilya> blk_integrity_profile (nop_profile won't do -- this one would have
> Ilya> to be truly bogus w/ NULL *_fn) under BLK_DEV_INTEGRITY ifdefs and
> Ilya> hope that nothing breaks.
>
> Can you point me to the relevant code on your end?

This code doesn't exist yet -- it's just something I thought I'd have
to do if my patch or something along those lines isn't accepted.  There
is the nop_profile with stub *_fn callbacks, which is used by nvme and
nvdimm to make bio_integrity_enabled() return true, so that per-interval
buffers are allocated in bio_integrity_prep().  "I want some space for
per-interval metadata, but no integrity checksums" is use case there.

In the rbd case, we don't want that.  We just want to set the stable
pages flag and make sure it's not reset by blk_integrity code.  So, if
blk_integrity_revalidate() isn't fixed, rbd will need to set up a bogus
profile with NULL *_fn callbacks to make bio_integrity_enabled() return
false.  This is obviously fragile, because blk_get_integrity() will no
longer return NULL...

We are setting BDI_CAP_STABLE_WRITES in rbd_init_disk(), see commit
bae818ee1577 ("rbd: require stable pages if message data CRCs are
enabled").  The CRCs are generated in write_partial_message_data() and
verified in read_partial_msg_data().

I'm attaching the patch I had in mind.

Thanks,

                Ilya

[-- Attachment #2: 0001-block-get-rid-of-blk_integrity_revalidate.patch --]
[-- Type: text/x-patch, Size: 4240 bytes --]

From 24a0cf8bc437a65b3986e9ab25cf2f05e6bbd5d0 Mon Sep 17 00:00:00 2001
From: Ilya Dryomov <idryomov@gmail.com>
Date: Mon, 20 Feb 2017 18:50:50 +0100
Subject: [PATCH] block: get rid of blk_integrity_revalidate()

Commit 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
introduced blk_integrity_revalidate(), which clears the stable pages
bit if no blk_integrity profile is registered.  It's called from
revalidate_disk() and rescan_partitions(), making it impossible to set
BDI_CAP_STABLE_WRITES for drivers that support partitions and don't use
blk_integrity.  One such driver is rbd, where the ceph messenger is
responsible for generating/verifying CRCs.

Since blk_integrity_{un,}register() "must" be used for (un)registering
the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES
setting there.  This way drivers that call blk_integrity_register() and
use integrity infrastructure won't interfere with drivers that don't
but still want stable pages.

Fixes: 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 block/blk-integrity.c     | 19 ++-----------------
 block/partition-generic.c |  1 -
 fs/block_dev.c            |  1 -
 include/linux/genhd.h     |  2 --
 4 files changed, 2 insertions(+), 21 deletions(-)

diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index d69c5c79f98e..319f2e4f4a8b 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -417,7 +417,7 @@ void blk_integrity_register(struct gendisk *disk, struct blk_integrity *template
 	bi->tuple_size = template->tuple_size;
 	bi->tag_size = template->tag_size;
 
-	blk_integrity_revalidate(disk);
+	disk->queue->backing_dev_info.capabilities |= BDI_CAP_STABLE_WRITES;
 }
 EXPORT_SYMBOL(blk_integrity_register);
 
@@ -430,26 +430,11 @@ EXPORT_SYMBOL(blk_integrity_register);
  */
 void blk_integrity_unregister(struct gendisk *disk)
 {
-	blk_integrity_revalidate(disk);
+	disk->queue->backing_dev_info.capabilities &= ~BDI_CAP_STABLE_WRITES;
 	memset(&disk->queue->integrity, 0, sizeof(struct blk_integrity));
 }
 EXPORT_SYMBOL(blk_integrity_unregister);
 
-void blk_integrity_revalidate(struct gendisk *disk)
-{
-	struct blk_integrity *bi = &disk->queue->integrity;
-
-	if (!(disk->flags & GENHD_FL_UP))
-		return;
-
-	if (bi->profile)
-		disk->queue->backing_dev_info.capabilities |=
-			BDI_CAP_STABLE_WRITES;
-	else
-		disk->queue->backing_dev_info.capabilities &=
-			~BDI_CAP_STABLE_WRITES;
-}
-
 void blk_integrity_add(struct gendisk *disk)
 {
 	if (kobject_init_and_add(&disk->integrity_kobj, &integrity_ktype,
diff --git a/block/partition-generic.c b/block/partition-generic.c
index 7afb9907821f..0171a2faad68 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -497,7 +497,6 @@ int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
 
 	if (disk->fops->revalidate_disk)
 		disk->fops->revalidate_disk(disk);
-	blk_integrity_revalidate(disk);
 	check_disk_size_change(disk, bdev);
 	bdev->bd_invalidated = 0;
 	if (!get_capacity(disk) || !(state = check_partition(disk, bdev)))
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 3c47614a4b32..b94e2a4974a1 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1422,7 +1422,6 @@ int revalidate_disk(struct gendisk *disk)
 
 	if (disk->fops->revalidate_disk)
 		ret = disk->fops->revalidate_disk(disk);
-	blk_integrity_revalidate(disk);
 	bdev = bdget_disk(disk, 0);
 	if (!bdev)
 		return ret;
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 76f39754e7b0..76d6a1cd4153 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -722,11 +722,9 @@ static inline void part_nr_sects_write(struct hd_struct *part, sector_t size)
 #if defined(CONFIG_BLK_DEV_INTEGRITY)
 extern void blk_integrity_add(struct gendisk *);
 extern void blk_integrity_del(struct gendisk *);
-extern void blk_integrity_revalidate(struct gendisk *);
 #else	/* CONFIG_BLK_DEV_INTEGRITY */
 static inline void blk_integrity_add(struct gendisk *disk) { }
 static inline void blk_integrity_del(struct gendisk *disk) { }
-static inline void blk_integrity_revalidate(struct gendisk *disk) { }
 #endif	/* CONFIG_BLK_DEV_INTEGRITY */
 
 #else /* CONFIG_BLOCK */
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: blk_integrity_revalidate() clears BDI_CAP_STABLE_WRITES
  2017-02-22 14:41   ` Ilya Dryomov
@ 2017-02-23 23:49     ` Martin K. Petersen
  2017-02-28  8:19       ` Ilya Dryomov
  0 siblings, 1 reply; 7+ messages in thread
From: Martin K. Petersen @ 2017-02-23 23:49 UTC (permalink / raw)
  To: Ilya Dryomov
  Cc: Martin K. Petersen, Ceph Development, linux-block, Dan Williams,
	Christoph Hellwig

>>>>> "Ilya" == Ilya Dryomov <idryomov@gmail.com> writes:

Ilya,

Ilya> Well, blk_integrity_revalidate() doesn't clear the profile, it
Ilya> just clears the stable pages flag.  Whoever calls
Ilya> blk_integrity_unregister() to clear the profile can also clear the
Ilya> stable pages flag -- why not let blk_integrity_unregister() clear
Ilya> the flag like I suggested?

That's what it used to do.

blk_integrity_revalidate() was obviously introduced to overcome some
problem. Unfortunately, I can't recall what that was and Google isn't
being particularly helpful. I suspect it was either in the NVDIMM or
NVMe camps since that's where the churn was.

I don't have a problem with your patch as long as we're sure there are
no regressions. I would carry the gendisk check over, though.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: blk_integrity_revalidate() clears BDI_CAP_STABLE_WRITES
  2017-02-23 23:49     ` Martin K. Petersen
@ 2017-02-28  8:19       ` Ilya Dryomov
  2017-03-02  3:24         ` Martin K. Petersen
  0 siblings, 1 reply; 7+ messages in thread
From: Ilya Dryomov @ 2017-02-28  8:19 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Ceph Development, linux-block, Dan Williams, Christoph Hellwig,
	Sagi Grimberg, Mike Snitzer, Jens Axboe, linux-nvme

On Fri, Feb 24, 2017 at 12:49 AM, Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>>>>>> "Ilya" == Ilya Dryomov <idryomov@gmail.com> writes:
>
> Ilya,
>
> Ilya> Well, blk_integrity_revalidate() doesn't clear the profile, it
> Ilya> just clears the stable pages flag.  Whoever calls
> Ilya> blk_integrity_unregister() to clear the profile can also clear the
> Ilya> stable pages flag -- why not let blk_integrity_unregister() clear
> Ilya> the flag like I suggested?
>
> That's what it used to do.
>
> blk_integrity_revalidate() was obviously introduced to overcome some
> problem. Unfortunately, I can't recall what that was and Google isn't
> being particularly helpful. I suspect it was either in the NVDIMM or
> NVMe camps since that's where the churn was.

Adding more people from 25520d55cdb6 -- does anyone remember the story
behind blk_integrity_revalidate()?

>
> I don't have a problem with your patch as long as we're sure there are
> no regressions. I would carry the gendisk check over, though.

I spent quite some time trying to do basic tests on nvme, since it uses
both nop_profile and real integrity profiles.  Unfortunately I don't
have access to an nvme card with separate-metadata capability and
upstream qemu driver doesn't support any metadata capabilities at all,
so I turned to https://github.com/OpenChannelSSD/qemu-nvme.  It lets
you configure mc and multiple lbafs, but

    commit ff646a5841f743fdd9cfef138ac6be1f0ba4fbfb
    Author: Keith Busch <keith.busch@intel.com> Thu Oct 23 20:02:44 2014

    As part of this, I am removing all meta-data support. It's not well
    supported in any immediatly available operating system, so the code
    is not well tested.

In the end I got it into a semi-working state by checking out an old
version and dropping some sanity checks.  I formatted the namespace in
a number of ways -- bdi/stable_pages_required and integrity/format
behaved as expected ("none", "nop", "T10-DIF-TYPE1-CRC").

Given the above, I'm not sure what the baseline is -- blk_integrity
code isn't invoked for data-only lbafs.  Could nvme folks please look
at this?  rbd regression caused by integrity revalidate change goes
back to 4.4 and I'd really like to get it fixed.  The proposed patch
is attached to my earlier reply on linux-block.

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: blk_integrity_revalidate() clears BDI_CAP_STABLE_WRITES
  2017-02-28  8:19       ` Ilya Dryomov
@ 2017-03-02  3:24         ` Martin K. Petersen
  2017-03-10 16:49           ` Ilya Dryomov
  0 siblings, 1 reply; 7+ messages in thread
From: Martin K. Petersen @ 2017-03-02  3:24 UTC (permalink / raw)
  To: Ilya Dryomov
  Cc: Jens Axboe, Sagi Grimberg, Martin K. Petersen, Mike Snitzer,
	linux-nvme, Christoph Hellwig, linux-block, Ceph Development,
	Dan Williams

>>>>> "Ilya" == Ilya Dryomov <idryomov@gmail.com> writes:

Ilya,

Ilya> Given the above, I'm not sure what the baseline is --
Ilya> blk_integrity code isn't invoked for data-only lbafs.  Could nvme
Ilya> folks please look at this?  rbd regression caused by integrity
Ilya> revalidate change goes back to 4.4 and I'd really like to get it
Ilya> fixed.  The proposed patch is attached to my earlier reply on
Ilya> linux-block.

I've had a couple of fires to attend to this week. I'll try and give
your patch a spin on my FC setup tomorrow or Friday.

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: blk_integrity_revalidate() clears BDI_CAP_STABLE_WRITES
  2017-03-02  3:24         ` Martin K. Petersen
@ 2017-03-10 16:49           ` Ilya Dryomov
  0 siblings, 0 replies; 7+ messages in thread
From: Ilya Dryomov @ 2017-03-10 16:49 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Jens Axboe, Sagi Grimberg, Mike Snitzer, linux-nvme,
	Christoph Hellwig, linux-block, Ceph Development, Dan Williams

On Thu, Mar 2, 2017 at 4:24 AM, Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>>>>>> "Ilya" == Ilya Dryomov <idryomov@gmail.com> writes:
>
> Ilya,
>
> Ilya> Given the above, I'm not sure what the baseline is --
> Ilya> blk_integrity code isn't invoked for data-only lbafs.  Could nvme
> Ilya> folks please look at this?  rbd regression caused by integrity
> Ilya> revalidate change goes back to 4.4 and I'd really like to get it
> Ilya> fixed.  The proposed patch is attached to my earlier reply on
> Ilya> linux-block.
>
> I've had a couple of fires to attend to this week. I'll try and give
> your patch a spin on my FC setup tomorrow or Friday.

Hi Martin,

Ping...  Did you get a chance to try it?

Thanks,

                Ilya

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-03-10 16:49 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-20 18:45 blk_integrity_revalidate() clears BDI_CAP_STABLE_WRITES Ilya Dryomov
2017-02-22  4:41 ` Martin K. Petersen
2017-02-22 14:41   ` Ilya Dryomov
2017-02-23 23:49     ` Martin K. Petersen
2017-02-28  8:19       ` Ilya Dryomov
2017-03-02  3:24         ` Martin K. Petersen
2017-03-10 16:49           ` Ilya Dryomov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).