* [PATCH v4 0/3] Couple of fixes detected with blktests
@ 2019-03-11 22:02 Sagi Grimberg
  2019-03-11 22:02 ` [PATCH v4 1/3] nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE Sagi Grimberg
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Sagi Grimberg @ 2019-03-11 22:02 UTC (permalink / raw)


blktests nvme test 009 uncovered a few issues, both with and without
multipathing.

1. if a namespace has a block size larger than PAGE_SIZE we either panic
   on a divide by zero with multipathing, or we see I/O errors without it
   (see the truncation sketch after this list).
   - fix nvmet to always export a block size that is at most PAGE_SIZE
   - fix the core to set 0 capacity on such a namespace

2. if a namespace allocation fails after nvme_init_ns_head we leak
   subsystems due to a missing ns_head ref put
   - fix it by putting the ns_head reference on nvme_alloc_ns error flow
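
For reference, a minimal user-space sketch of the truncation behind issue 1
(illustrative only, but nvme_update_disk_info() really does compute
"unsigned short bs = 1 << ns->lba_shift"):

#include <stdio.h>

int main(void)
{
	unsigned lba_shift = 20;		/* 1M blocks, as seen on ext4 tempfiles */
	unsigned short bs = 1 << lba_shift;	/* mirrors the nvme host code */

	/* 1 << 20 has only bit 20 set; the low 16 bits are all zero, so
	 * bs truncates to 0 -- and a zero logical block size is what the
	 * block layer later divides by when stacking limits */
	printf("bs = %u\n", bs);		/* prints: bs = 0 */
	return 0;
}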

Changes from v3:
- don't have the host fail if ns allocation lba_shift exceeds PAGE_SHIFT,
  instead set 0 capacity

Changes from v2:
- add comment to nvmet file ns lba_shift clamping
- add host side protection against such a namespace

Sagi Grimberg (3):
  nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE
  nvme: put ns_head ref if namespace fails allocation
  nvmet-file: clamp-down file namespace lba_shift

 drivers/nvme/host/core.c          | 8 +++++++-
 drivers/nvme/target/io-cmd-file.c | 7 ++++++-
 2 files changed, 13 insertions(+), 2 deletions(-)

-- 
2.17.1

* [PATCH v4 1/3] nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE
  2019-03-11 22:02 [PATCH v4 0/3] Couple of fixes detected with blktests Sagi Grimberg
@ 2019-03-11 22:02 ` Sagi Grimberg
  2019-03-11 22:06   ` Keith Busch
                     ` (2 more replies)
  2019-03-11 22:02 ` [PATCH v4 2/3] nvme: put ns_head ref if namespace fails allocation Sagi Grimberg
  2019-03-11 22:02 ` [PATCH v4 3/3] nvmet-file: clamp-down file namespace lba_shift Sagi Grimberg
  2 siblings, 3 replies; 22+ messages in thread
From: Sagi Grimberg @ 2019-03-11 22:02 UTC (permalink / raw)


If the target exposes a namespace with a block size greater than
PAGE_SIZE, set 0 capacity on the namespace, as we do not support it.

This issue was encountered when the nvmet namespace was backed by a tempfile.
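
For context, the capacity this patch zeroes out is computed from the
namespace size in LBAs; a small worked example of that arithmetic (the
nsze value here is made up for illustration):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t nsze = 262144;		/* hypothetical namespace size in LBAs */
	unsigned lba_shift = 12;	/* 4k LBAs */

	/* mirrors: capacity = le64_to_cpup(&id->nsze) << (ns->lba_shift - 9);
	 * the "- 9" converts LBA units into 512-byte sectors (2^9 == 512) */
	uint64_t capacity = nsze << (lba_shift - 9);

	printf("%llu sectors = %llu MiB\n",
	       (unsigned long long)capacity,
	       (unsigned long long)(capacity >> 11));
	/* prints: 2097152 sectors = 1024 MiB */
	return 0;
}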

Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/core.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7321da21f8c9..e08979094eb5 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1610,6 +1610,10 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	sector_t capacity = le64_to_cpup(&id->nsze) << (ns->lba_shift - 9);
 	unsigned short bs = 1 << ns->lba_shift;
 
+	if (ns->lba_shift > PAGE_SHIFT) {
+		/* unsupported block size, set capacity to 0 later */
+		bs = (1 << 9);
+	}
 	blk_mq_freeze_queue(disk->queue);
 	blk_integrity_unregister(disk);
 
@@ -1621,7 +1625,8 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	if (ns->ms && !ns->ext &&
 	    (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
 		nvme_init_integrity(disk, ns->ms, ns->pi_type);
-	if (ns->ms && !nvme_ns_has_pi(ns) && !blk_get_integrity(disk))
+	if ((ns->ms && !nvme_ns_has_pi(ns) && !blk_get_integrity(disk)) ||
+	    ns->lba_shift > PAGE_SHIFT)
 		capacity = 0;
 
 	set_capacity(disk, capacity);
-- 
2.17.1

* [PATCH v4 2/3] nvme: put ns_head ref if namespace fails allocation
  2019-03-11 22:02 [PATCH v4 0/3] Couple of fixes detected with blktests Sagi Grimberg
  2019-03-11 22:02 ` [PATCH v4 1/3] nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE Sagi Grimberg
@ 2019-03-11 22:02 ` Sagi Grimberg
  2019-03-12 14:47   ` Christoph Hellwig
  2019-03-11 22:02 ` [PATCH v4 3/3] nvmet-file: clamp-down file namespace lba_shift Sagi Grimberg
  2 siblings, 1 reply; 22+ messages in thread
From: Sagi Grimberg @ 2019-03-11 22:02 UTC (permalink / raw)


If nvme_alloc_ns fails after we initialize the ns_head but before we
add the ns to the controller's namespaces list, we need to explicitly
put the ns_head reference: controller teardown will not find the ns,
so the reference is never dropped and we eventually leak a dangling
subsystem.
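
The shape of the fix, reduced to a self-contained sketch (all names here
are made up for illustration; only the unwind pattern matches the driver):

#include <stdio.h>
#include <stdlib.h>

struct ns_head { int refcount; };

static void put_ns_head(struct ns_head *head)
{
	if (--head->refcount == 0) {
		printf("ns_head released\n");
		free(head);
	}
}

static int late_setup(void) { return -1; }	/* simulate a late failure */

static int alloc_ns_sketch(struct ns_head *head)
{
	head->refcount++;		/* the ns_head init step takes a reference */

	if (late_setup() < 0)
		goto out_put_head;	/* any failure past this point must unwind it */
	return 0;

out_put_head:
	put_ns_head(head);		/* the put this patch adds -- without it,
					 * teardown never finds the ns and the
					 * reference (and its subsystem) leaks */
	return -1;
}

int main(void)
{
	struct ns_head *head = calloc(1, sizeof(*head));

	head->refcount = 1;		/* creator's reference */
	alloc_ns_sketch(head);
	put_ns_head(head);		/* drops to 0: released, no leak */
	return 0;
}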

Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e08979094eb5..4530243ef706 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3314,6 +3314,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	mutex_lock(&ctrl->subsys->lock);
 	list_del_rcu(&ns->siblings);
 	mutex_unlock(&ctrl->subsys->lock);
+	nvme_put_ns_head(ns->head);
  out_free_id:
 	kfree(id);
  out_free_queue:
-- 
2.17.1

* [PATCH v4 3/3] nvmet-file: clamp-down file namespace lba_shift
  2019-03-11 22:02 [PATCH v4 0/3] Couple of fixes detected with blktests Sagi Grimberg
  2019-03-11 22:02 ` [PATCH v4 1/3] nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE Sagi Grimberg
  2019-03-11 22:02 ` [PATCH v4 2/3] nvme: put ns_head ref if namespace fails allocation Sagi Grimberg
@ 2019-03-11 22:02 ` Sagi Grimberg
  2019-03-12 14:33   ` Christoph Hellwig
  2 siblings, 1 reply; 22+ messages in thread
From: Sagi Grimberg @ 2019-03-11 22:02 UTC (permalink / raw)


When the backing file is a tempfile, for example, the inode i_blkbits
can correspond to a block size as large as 1M, which hosts cannot
support as the disk block size.
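
A quick illustration of the clamp applied below (min_t is re-implemented
minimally here, and 4k pages, i.e. PAGE_SHIFT == 12, are assumed):

#include <stdio.h>

#define PAGE_SHIFT 12	/* assumed 4k pages */
#define min_t(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

int main(void)
{
	unsigned char i_blkbits = 20;	/* 1M blocks, as seen on ext4 tempfiles */
	unsigned char blksize_shift = min_t(unsigned char, i_blkbits, PAGE_SHIFT);

	/* the exported LBA size is clamped from 1M down to one page */
	printf("lba_shift %u -> block size %u\n",
	       blksize_shift, 1u << blksize_shift);
	/* prints: lba_shift 12 -> block size 4096 */
	return 0;
}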

Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/target/io-cmd-file.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
index 517522305e5c..d52fe965a5d5 100644
--- a/drivers/nvme/target/io-cmd-file.c
+++ b/drivers/nvme/target/io-cmd-file.c
@@ -49,7 +49,12 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
 		goto err;
 
 	ns->size = stat.size;
-	ns->blksize_shift = file_inode(ns->file)->i_blkbits;
+	/*
+	 * inode i_blkbits can be greater than the universally accepted upper
+	 * bound, so make sure we export a sane namespace lba_shift.
+	 */
+	ns->blksize_shift = min_t(u8,
+			file_inode(ns->file)->i_blkbits, PAGE_SHIFT);
 
 	ns->bvec_cache = kmem_cache_create("nvmet-bvec",
 			NVMET_MAX_MPOOL_BVEC * sizeof(struct bio_vec),
-- 
2.17.1

* [PATCH v4 1/3] nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE
  2019-03-11 22:02 ` [PATCH v4 1/3] nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE Sagi Grimberg
@ 2019-03-11 22:06   ` Keith Busch
  2019-03-12 14:32   ` Christoph Hellwig
  2019-04-25 14:52   ` Christoph Hellwig
  2 siblings, 0 replies; 22+ messages in thread
From: Keith Busch @ 2019-03-11 22:06 UTC (permalink / raw)


On Mon, Mar 11, 2019 at 03:02:25PM -0700, Sagi Grimberg wrote:
> If the target exposes a namespace with a block size greater than
> PAGE_SIZE, set 0 capacity on the namespace, as we do not support it.
> 
> This issue was encountered when the nvmet namespace was backed by a tempfile.
> 
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>

I responded to the v3 with the review by mistake. :)

Reviewed-by: Keith Busch <keith.busch at intel.com>

* [PATCH v4 1/3] nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE
  2019-03-11 22:02 ` [PATCH v4 1/3] nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE Sagi Grimberg
  2019-03-11 22:06   ` Keith Busch
@ 2019-03-12 14:32   ` Christoph Hellwig
  2019-03-12 15:35     ` Christoph Hellwig
  2019-03-12 21:15     ` Sagi Grimberg
  2019-04-25 14:52   ` Christoph Hellwig
  2 siblings, 2 replies; 22+ messages in thread
From: Christoph Hellwig @ 2019-03-12 14:32 UTC (permalink / raw)


On Mon, Mar 11, 2019 at 03:02:25PM -0700, Sagi Grimberg wrote:
> If the target exposes a namespace with a block size greater than
> PAGE_SIZE, set 0 capacity on the namespace, as we do not support it.
> 
> This issue was encountered when the nvmet namespace was backed by a tempfile.
> 
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
> ---
>  drivers/nvme/host/core.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 7321da21f8c9..e08979094eb5 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1610,6 +1610,10 @@ static void nvme_update_disk_info(struct gendisk *disk,
>  	sector_t capacity = le64_to_cpup(&id->nsze) << (ns->lba_shift - 9);
>  	unsigned short bs = 1 << ns->lba_shift;
>  
> +	if (ns->lba_shift > PAGE_SHIFT) {
> +		/* unsupported block size, set capacity to 0 later */
> +		bs = (1 << 9);
> +	}
>  	blk_mq_freeze_queue(disk->queue);
>  	blk_integrity_unregister(disk);
>  
> @@ -1621,7 +1625,8 @@ static void nvme_update_disk_info(struct gendisk *disk,
>  	if (ns->ms && !ns->ext &&
>  	    (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
>  		nvme_init_integrity(disk, ns->ms, ns->pi_type);
> -	if (ns->ms && !nvme_ns_has_pi(ns) && !blk_get_integrity(disk))
> +	if ((ns->ms && !nvme_ns_has_pi(ns) && !blk_get_integrity(disk)) ||
> +	    ns->lba_shift > PAGE_SHIFT)
>  		capacity = 0;

I like the idea behind this, but it looks rather convoluted.  I think
for the unusable namespace case we should warn and have a common label
that just sets the capacity, not touching anything else.

Does something like this work for you?

---
From ec4ce1708eb4d4f536b93f09dab952067f58ba4b Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Tue, 12 Mar 2019 15:31:09 +0100
Subject: nvme: ignore namespaces with an unusable block size

And also refactor the handling for unusable namespaces to use a common
goto label, and print a warning for invalid metadata formats as well
while we are at it.

Reported-by: Sagi Grimberg <sagi at grimberg.me>
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/core.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 1597efae6af8..0cf5741ecb7b 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1599,6 +1599,12 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	blk_mq_freeze_queue(disk->queue);
 	blk_integrity_unregister(disk);
 
+	if (bs > PAGE_SIZE) {
+		dev_warn(ns->ctrl->device,
+			"unsupported LBA size %u, ignoring namespace\n", bs);
+		goto ignore;
+	}
+
 	blk_queue_logical_block_size(disk->queue, bs);
 	blk_queue_physical_block_size(disk->queue, bs);
 	blk_queue_io_min(disk->queue, bs);
@@ -1606,8 +1612,11 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	if (ns->ms && !ns->ext &&
 	    (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
 		nvme_init_integrity(disk, ns->ms, ns->pi_type);
-	if (ns->ms && !nvme_ns_has_pi(ns) && !blk_get_integrity(disk))
-		capacity = 0;
+	if (ns->ms && !nvme_ns_has_pi(ns) && !blk_get_integrity(disk)) {
+		dev_warn(ns->ctrl->device,
+			"unsupported metadata format, ignoring namespace\n");
+		goto ignore;
+	}
 
 	set_capacity(disk, capacity);
 	nvme_ns_config_oncs(ns);
@@ -1618,6 +1627,10 @@ static void nvme_update_disk_info(struct gendisk *disk,
 		set_disk_ro(disk, false);
 
 	blk_mq_unfreeze_queue(disk->queue);
+	return;
+ignore:
+	blk_mq_unfreeze_queue(disk->queue);
+	set_capacity(disk, 0);
 }
 
 static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
-- 
2.20.1

* [PATCH v4 3/3] nvmet-file: clamp-down file namespace lba_shift
  2019-03-11 22:02 ` [PATCH v4 3/3] nvmet-file: clamp-down file namespace lba_shift Sagi Grimberg
@ 2019-03-12 14:33   ` Christoph Hellwig
  2019-03-12 21:07     ` Sagi Grimberg
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2019-03-12 14:33 UTC (permalink / raw)


On Mon, Mar 11, 2019 at 03:02:27PM -0700, Sagi Grimberg wrote:
> When the backing file is a tempfile, for example, the inode i_blkbits
> can correspond to a block size as large as 1M, which hosts cannot
> support as the disk block size.
> 
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
> ---
>  drivers/nvme/target/io-cmd-file.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
> index 517522305e5c..d52fe965a5d5 100644
> --- a/drivers/nvme/target/io-cmd-file.c
> +++ b/drivers/nvme/target/io-cmd-file.c
> @@ -49,7 +49,12 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
>  		goto err;
>  
>  	ns->size = stat.size;
> -	ns->blksize_shift = file_inode(ns->file)->i_blkbits;
> +	/*
> +	 * inode i_blkbits can be greater than the universally accepted upper
> +	 * bound, so make sure we export a sane namespace lba_shift.
> +	 */
> +	ns->blksize_shift = min_t(u8,
> +			file_inode(ns->file)->i_blkbits, PAGE_SHIFT);

I still wonder why we clamp this down to PAGE_SIZE rather than 4k.

Everything larger than 4k has a huge chance of not working with most
remote hosts, even if our target system has a larger page size.

* [PATCH v4 2/3] nvme: put ns_head ref if namespace fails allocation
  2019-03-11 22:02 ` [PATCH v4 2/3] nvme: put ns_head ref if namespace fails allocation Sagi Grimberg
@ 2019-03-12 14:47   ` Christoph Hellwig
  0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2019-03-12 14:47 UTC (permalink / raw)


Thanks,

applied to nvme-5.1.

* [PATCH v4 1/3] nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE
  2019-03-12 14:32   ` Christoph Hellwig
@ 2019-03-12 15:35     ` Christoph Hellwig
  2019-03-12 21:15     ` Sagi Grimberg
  1 sibling, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2019-03-12 15:35 UTC (permalink / raw)


And while we are at it - Dave Chinner has been actively working on
blocksize > PAGE_SIZE support for XFS.  So we might be able to use
"too large" sectors sizes very soon.

* [PATCH v4 3/3] nvmet-file: clamp-down file namespace lba_shift
  2019-03-12 14:33   ` Christoph Hellwig
@ 2019-03-12 21:07     ` Sagi Grimberg
  2019-03-15 16:29       ` Christoph Hellwig
  0 siblings, 1 reply; 22+ messages in thread
From: Sagi Grimberg @ 2019-03-12 21:07 UTC (permalink / raw)



>> When the backing file is a tempfile, for example, the inode i_blkbits
>> can correspond to a block size as large as 1M, which hosts cannot
>> support as the disk block size.
>>
>> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
>> ---
>>   drivers/nvme/target/io-cmd-file.c | 7 ++++++-
>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
>> index 517522305e5c..d52fe965a5d5 100644
>> --- a/drivers/nvme/target/io-cmd-file.c
>> +++ b/drivers/nvme/target/io-cmd-file.c
>> @@ -49,7 +49,12 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
>>   		goto err;
>>   
>>   	ns->size = stat.size;
>> -	ns->blksize_shift = file_inode(ns->file)->i_blkbits;
>> +	/*
>> +	 * inode i_blkbits can be greater than the universally accepted upper
>> +	 * bound, so make sure we export a sane namespace lba_shift.
>> +	 */
>> +	ns->blksize_shift = min_t(u8,
>> +			file_inode(ns->file)->i_blkbits, PAGE_SHIFT);
> 
> I still wonder why we clamp this down to PAGE_SIZE rather than 4k.
> 
> Everything larger than 4k has a huge chance of not working with most
> remote hosts, even if our target system has a larger page size.

loop does the same thing, and I never heard complaints about that..

* [PATCH v4 1/3] nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE
  2019-03-12 14:32   ` Christoph Hellwig
  2019-03-12 15:35     ` Christoph Hellwig
@ 2019-03-12 21:15     ` Sagi Grimberg
  2019-03-15 16:28       ` Christoph Hellwig
  1 sibling, 1 reply; 22+ messages in thread
From: Sagi Grimberg @ 2019-03-12 21:15 UTC (permalink / raw)



> I like the idea behind this, but it looks rather convoluted.  I think
> for the unusable namespace case we should warn and have a common label
> that just sets the capacity, not touching anything else.
> 
> Does something like this work for you?

No, this is what I had done originally, but we need to always have the
queue set to a decent block size, otherwise blk_queue_stack_limits()
panics on div by 0..

* [PATCH v4 1/3] nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE
  2019-03-12 21:15     ` Sagi Grimberg
@ 2019-03-15 16:28       ` Christoph Hellwig
  2019-03-15 16:38         ` Keith Busch
  2019-03-15 17:17         ` Sagi Grimberg
  0 siblings, 2 replies; 22+ messages in thread
From: Christoph Hellwig @ 2019-03-15 16:28 UTC (permalink / raw)


On Tue, Mar 12, 2019 at 02:15:26PM -0700, Sagi Grimberg wrote:
>
>> I like the idea behind this, but it looks rather convoluted.  I think
>> for the unusable namespace case we should warn and have a common label
>> that just sets the capacity, not touching anything else.
>>
>> Does something like this work for you?
>
> No, this is what I had done originally, but we need to always have the
> queue set to a decent block size, otherwise blk_queue_stack_limits()
> panics on div by 0..

I actually tested it by manually hacking an 8k block size into nvmet
and it works just fine for me.  Where do you see a division by
zero with this patch exactly?

* [PATCH v4 3/3] nvmet-file: clamp-down file namespace lba_shift
  2019-03-12 21:07     ` Sagi Grimberg
@ 2019-03-15 16:29       ` Christoph Hellwig
  2019-03-15 17:18         ` Sagi Grimberg
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2019-03-15 16:29 UTC (permalink / raw)


On Tue, Mar 12, 2019 at 02:07:38PM -0700, Sagi Grimberg wrote:
>> I still wonder why we clamp this down to PAGE_SIZE rather than 4k.
>>
>> Everything larger than 4k has a huge chance of not working with most
>> remote hosts, even if our target system has a larger page size.
>
> loop does the same thing, and I never heard complaints about that..

loop is always used on the same system exporting things..

* [PATCH v4 1/3] nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE
  2019-03-15 16:28       ` Christoph Hellwig
@ 2019-03-15 16:38         ` Keith Busch
  2019-03-15 16:40           ` Christoph Hellwig
  2019-03-15 17:17         ` Sagi Grimberg
  1 sibling, 1 reply; 22+ messages in thread
From: Keith Busch @ 2019-03-15 16:38 UTC (permalink / raw)


On Fri, Mar 15, 2019 at 05:28:37PM +0100, Christoph Hellwig wrote:
> On Tue, Mar 12, 2019 at 02:15:26PM -0700, Sagi Grimberg wrote:
> >
> >> I like the idea behind this, but it looks rather convoluted.  I think
> >> for the unusable namespace case we should warn and have a common label
> >> that just sets the capacity, not touching anything else.
> >>
> >> Does something like this work for you?
> >
> > No, this is what I had done originally, but we need to always have the
> > queue set to a decent block size, otherwise blk_queue_stack_limits()
> > panics on div by 0..
> 
> > I actually tested it by manually hacking an 8k block size into nvmet
> > and it works just fine for me.  Where do you see a division by
> > zero with this patch exactly?

I'm not sure about a divide-by-zero, but I just hacked up qemu to report
an 8k block size and get this on boot. It happens because alloc_page_buffers
won't allocate a buffer_head when the block size is greater than the page size:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
 #PF error: [normal kernel read fault]
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP
 CPU: 3 PID: 391 Comm: kworker/u18:1 Not tainted 5.0.0+ #42
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
 Workqueue: nvme-wq nvme_scan_work [nvme_core]
 RIP: 0010:create_empty_buffers+0x24/0x100
 Code: eb cb 0f 1f 40 00 0f 1f 44 00 00 41 54 55 49 89 d4 53 ba 01 00 00 00 48 89 fb e8 87 fe ff ff 48 89 c5 48 89 c2 eb 03 48 89 ca <48> 8b 4a 08 4c 09 22 48 85 c9 75 f1 48 89 6a 08 48 8b 43 18 48 8d
 RSP: 0018:ffffbd9ec05cf880 EFLAGS: 00010286
 RAX: 0000000000000000 RBX: ffffe0fec03a38c0 RCX: ffff99f4c751d000
 RDX: 0000000000000000 RSI: ffff99f4c751d000 RDI: ffffe0fec03a38c0
 RBP: 0000000000000000 R08: dead0000000000ff R09: 0000000000000003
 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000200 R15: ffffe0fec03a38c0
 FS:  0000000000000000(0000) GS:ffff99f4cfd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000008 CR3: 000000000eb28000 CR4: 00000000000006e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  create_page_buffers+0x4d/0x60
  block_read_full_page+0x47/0x310
  ? __add_to_page_cache_locked+0x288/0x330
  ? check_disk_change+0x60/0x60
  ? count_shadow_nodes+0x130/0x130
  do_read_cache_page+0x31c/0x6b0
  ? blkdev_writepages+0x10/0x10
  read_dev_sector+0x28/0xc0
  read_lba+0x126/0x210
  ? kmem_cache_alloc_trace+0x19b/0x1b0
  efi_partition+0x137/0x780
  ? vsnprintf+0x2ae/0x4a0
  ? vsnprintf+0xec/0x4a0
  ? snprintf+0x45/0x70
  ? is_gpt_valid.part.6+0x400/0x400
  ? check_partition+0x137/0x240
  check_partition+0x137/0x240
  rescan_partitions+0xab/0x350
  __blkdev_get+0x342/0x560
  ? inode_insert5+0x11f/0x1e0
  blkdev_get+0x11f/0x310
  ? unlock_new_inode+0x44/0x60
  ? bdget+0xff/0x110
  __device_add_disk+0x426/0x470
  nvme_validate_ns+0x35e/0x7c0 [nvme_core]
  ? nvme_identify_ctrl.isra.56+0x7e/0xc0 [nvme_core]
  ? update_load_avg+0x89/0x550
  nvme_scan_work+0xe5/0x370 [nvme_core]
  ? __synchronize_srcu.part.18+0x91/0xc0
  ? try_to_wake_up+0x55/0x430
  process_one_work+0x1e9/0x3e0
  worker_thread+0x21a/0x3d0
  ? process_one_work+0x3e0/0x3e0
  kthread+0x111/0x130
  ? kthread_park+0x90/0x90
  ret_from_fork+0x1f/0x30
 Modules linked in: nvme nvme_core serio_raw
 CR2: 0000000000000008
 ---[ end trace b38bdf1b424f36e9 ]---
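
For reference, a hedged reconstruction of why the head comes back NULL
(this mirrors my reading of the alloc_page_buffers() sizing loop in
fs/buffer.c of that era; it is not the verbatim kernel code):

#include <stdio.h>

#define PAGE_SIZE 4096L

int main(void)
{
	long size = 8192;	/* the hacked-up 8k block size */
	long offset = PAGE_SIZE;
	int nr_buffer_heads = 0;

	/* with size > PAGE_SIZE the first decrement already goes negative,
	 * so the loop body never runs and no buffer_head is allocated */
	while ((offset -= size) >= 0)
		nr_buffer_heads++;

	/* the caller gets back a NULL head, which create_empty_buffers()
	 * then dereferences -- matching the oops above */
	printf("buffer_heads allocated: %d\n", nr_buffer_heads);	/* 0 */
	return 0;
}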

* [PATCH v4 1/3] nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE
  2019-03-15 16:38         ` Keith Busch
@ 2019-03-15 16:40           ` Christoph Hellwig
  2019-03-15 16:47             ` Keith Busch
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2019-03-15 16:40 UTC (permalink / raw)


On Fri, Mar 15, 2019 at 10:38:43AM -0600, Keith Busch wrote:
> On Fri, Mar 15, 2019 at 05:28:37PM +0100, Christoph Hellwig wrote:
> > On Tue, Mar 12, 2019 at 02:15:26PM -0700, Sagi Grimberg wrote:
> > >
> > >> I like the idea behind this, but it looks rather convoluted.  I think
> > >> for the unusable namespace case we should warn and have a common label
> > >> that just sets the capacity, not touching anything else.
> > >>
> > >> Does something like this work for you?
> > >
> > > No, this is what I had done originally, but we need to always have the
> > > queue set to a decent block size, otherwise blk_queue_stack_limits()
> > > panics on div by 0..
> > 
> > I actually tested it by manually hacking an 8k block size into nvmet
> > and it works just fine for me.  Where do you see a division by
> > zero with this patch exactly?
> 
> > I'm not sure about a divide-by-zero, but I just hacked up qemu to report
> > an 8k block size and get this on boot. It happens because alloc_page_buffers
> > won't allocate a buffer_head when the block size is greater than the page size:

But that is without either the patch from Sagi or the one from me,
right?

* [PATCH v4 1/3] nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE
  2019-03-15 16:40           ` Christoph Hellwig
@ 2019-03-15 16:47             ` Keith Busch
  0 siblings, 0 replies; 22+ messages in thread
From: Keith Busch @ 2019-03-15 16:47 UTC (permalink / raw)


On Fri, Mar 15, 2019 at 05:40:57PM +0100, Christoph Hellwig wrote:
> On Fri, Mar 15, 2019 at 10:38:43AM -0600, Keith Busch wrote:
> > I'm not sure about a divide-by-zero, but I just hacked up qemu to report
> > an 8k block size and get this on boot. It happens because alloc_page_buffers
> > won't allocate a buffer_head when the block size is greater than the page size:
> 
> But that is without either the patch from Sagi or the one from me,
> right?

Sorry, I misunderstood and thought you were saying only that Sagi's patch
wasn't necessary. Yes, either patch works.

* [PATCH v4 1/3] nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE
  2019-03-15 16:28       ` Christoph Hellwig
  2019-03-15 16:38         ` Keith Busch
@ 2019-03-15 17:17         ` Sagi Grimberg
  2019-03-15 18:24           ` Christoph Hellwig
  1 sibling, 1 reply; 22+ messages in thread
From: Sagi Grimberg @ 2019-03-15 17:17 UTC (permalink / raw)



>>> I like the idea behind this, but it looks rather convoluted.  I think
>>> for the unusable namespace case we should warn and have a common label
>>> that just sets the capacity, not touching anything else.
>>>
>>> Does something like this work for you?
>>
>> No, this is what I had done originally, but we need to always have the
>> queue set to a decent block size, otherwise blk_queue_stack_limits()
>> panics on div by 0..
> 
> I actually tested it by manually hacking an 8k block size into nvmet
> and it works just fine for me.  Where do you see a division by
> zero with this patch exactly?

You need your block size to overflow the u16 bs variable.

Then run blktests/tests/nvme/009 with multipath on.

* [PATCH v4 3/3] nvmet-file: clamp-down file namespace lba_shift
  2019-03-15 16:29       ` Christoph Hellwig
@ 2019-03-15 17:18         ` Sagi Grimberg
  0 siblings, 0 replies; 22+ messages in thread
From: Sagi Grimberg @ 2019-03-15 17:18 UTC (permalink / raw)



>>> I still wonder why we clamp this down to PAGE_SIZE rather than 4k.
>>>
>>> Everything larger than 4k has a huge chance of not working with most
>>> remote hosts, even if our target system has a larger page size.
>>
>> loop does the same thing, and I never heard complaints about that..
> 
> loop is always used on the same system exporting things..

Yea, I see what you're saying... I can change it to 4k.

* [PATCH v4 1/3] nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE
  2019-03-15 17:17         ` Sagi Grimberg
@ 2019-03-15 18:24           ` Christoph Hellwig
  2019-03-15 20:44             ` Sagi Grimberg
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2019-03-15 18:24 UTC (permalink / raw)


On Fri, Mar 15, 2019 at 10:17:18AM -0700, Sagi Grimberg wrote:
>
>>>> I like the idea behind this, but it looks rather convoluted.  I think
>>>> for the unusable namespace case we should warn and have a common label
>>>> that just sets the capacity, not touching anything else.
>>>>
>>>> Does something like this work for you?
>>>
>>> No, this is what I had done originally, but we need to always have the
>>> queue set to a decent block size, otherwise blk_queue_stack_limits()
>>> panics on div by 0..
>>
>> I actually tested it by manually hacking an 8k block size into nvmet
>> and it works just fine for me.  Where do you see a division by
>> zero with this patch exactly?
>
> You need your block size to overflow the u16 bs variable.

I'll need to look into actually testing something that large, but
the whole point of my patch is that we never even use the bs value
for anything. So I'm not sure why that would matter.

* [PATCH v4 1/3] nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE
  2019-03-15 18:24           ` Christoph Hellwig
@ 2019-03-15 20:44             ` Sagi Grimberg
  2019-04-24 18:50               ` Sagi Grimberg
  0 siblings, 1 reply; 22+ messages in thread
From: Sagi Grimberg @ 2019-03-15 20:44 UTC (permalink / raw)



>>> I actually tested it by manually hacking an 8k block size into nvmet
>>> and it works just fine for me.  Where do you see a division by
>>> zero with this patch exactly?
>>
>> You need your block size to overflow the u16 bs variable.
> 
> I'll need to look into actually testing something that large, 

just run blktests nvme/009 test with multipath, nothing fancy (I used
ext4).

From a quick look I think that ext4 tempfiles are coming from
the transaction subsystem so maybe that's why the inode i_blkbits
is 20? Not sure at all though...

Anyway, it's very easy to reproduce.

> but the whole point of my patch is that we never even use the bs value
> for anything. So I'm not sure why that would matter.

But then with multipath, blk_queue_stack_limits assumes that we have a
valid bs in the lower device queue.

* [PATCH v4 1/3] nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE
  2019-03-15 20:44             ` Sagi Grimberg
@ 2019-04-24 18:50               ` Sagi Grimberg
  0 siblings, 0 replies; 22+ messages in thread
From: Sagi Grimberg @ 2019-04-24 18:50 UTC (permalink / raw)


>>>> I actually tested it by manually hacking an 8k block size into nvmet
>>>> and it works just fine for me.  Where do you see a division by
>>>> zero with this patch exactly?
>>>
>>> You need your block size to overflow the u16 bs variable.
>>
>> I'll need to look into actually testing something that large, 
> 
> just run blktests nvme/009 test with multipath, nothing fancy (I used
> ext4).
> 
> From a quick look I think that ext4 tempfiles are coming from
> the transaction subsystem so maybe that's why the inode i_blkbits
> is 20? Not sure at all though...
> 
> Anyway, it's very easy to reproduce.
> 
>> but the whole point of my patch is that we never even use the bs value
>> for anything. So I'm not sure why that would matter.
> 
> But then with multipath, blk_queue_stack_limits assumes that we have a
> valid bs in the lower device queue.

Reviving this,

This is still failing without this patch.

Reproducer:
- use ext4 filesystem
- set nvme.multipath=1
- run blktests: tests/nvme/009

bang...

Christoph,

The reason bs is set to 512 in nvme_update_disk_info() is that
blk_queue_stack_limits assumes that q->limits.logical_block_size of the
underlying device is not zero (it automatically divides by it).
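
Putting the two halves together as a minimal sketch (illustrative only:
the exact expression inside blk_queue_stack_limits() is more involved,
but it does divide by the lower queue's logical block size):

#include <stdio.h>

int main(void)
{
	unsigned short bs = 1 << 20;	/* lba_shift 20 truncated to a u16: 0 */
	unsigned offset = 4096;

	/* stand-in for the stacking code dividing by the lower device's
	 * logical_block_size; with bs == 0 this deliberately traps with
	 * a divide by zero -- the multipath panic that the 512-byte
	 * fallback in nvme_update_disk_info() avoids */
	printf("%u\n", offset / bs);
	return 0;
}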

* [PATCH v4 1/3] nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE
  2019-03-11 22:02 ` [PATCH v4 1/3] nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE Sagi Grimberg
  2019-03-11 22:06   ` Keith Busch
  2019-03-12 14:32   ` Christoph Hellwig
@ 2019-04-25 14:52   ` Christoph Hellwig
  2 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2019-04-25 14:52 UTC (permalink / raw)


Thanks,

applied to nvme-5.2.
