linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [RFC] nvme: set block size during namespace validation
@ 2020-12-23 15:01 Minwoo Im
  2020-12-23 15:49 ` Christoph Hellwig
  0 siblings, 1 reply; 5+ messages in thread
From: Minwoo Im @ 2020-12-23 15:01 UTC (permalink / raw)
  To: linux-nvme, linux-block
  Cc: Keith Busch, Jens Axboe, Minwoo Im, Christoph Hellwig, Sagi Grimberg

  Let's say we have 2 LBA format for 4096B and 512B LBA size for a
NVMe namespace.  Assume that current LBA format is 4096B and in case
we convert namespace to 512B and 4096B back again:

  nvme format /dev/nvme0n1 --lbaf=1 --force  # 512B
  nvme format /dev/nvme0n1 --lbaf=0 --force  # 4096B

  Then we can see the following errors during the BLKRRPART ioctl from
the nvme-cli format subcommand:

  [   10.771740] blk_update_request: operation not supported error, dev nvme0n1, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
  [   10.780262] Buffer I/O error on dev nvme0n1, logical block 0, async page read
  ...

  Also, we can see the Read commands followed by the Format command due
to BLKRRPART ioctl with Number of LBAs to 0xffff which is under-flowed
because the request for the Read commands are coming with 512B.

  kworker/0:1H-56      [000] ....   913.456922: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=1, cmdid=216, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=0, len=65535, ctrl=0x0, dsmgmt=0, reftag=0)
  ksoftirqd/0-9       [000] .Ns.   916.566351: nvme_complete_rq: nvme0: disk=nvme0n1, qid=1, cmdid=216, res=0x0, retries=0, flags=0x0, status=0x4002

  Before we have commit 5ff9f19231a0 ("block: simplify
set_init_blocksize"), block size used to be bumped up to the
4K(PAGE_SIZE) in this example and we have not seen these errors.  But
with this patch, we have to make sure that bdev->bd_inode->i_blkbits to
make sure that BLKRRPART ioctl pass proper request length based on the
changed logical block size.

  Currently, nvme_setup_rw() does not consider under-flow case when it
fills the cmnd->rw.length:

  cmnd->rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);

  Maybe we can have some statement to prevent under-flow case here, but
this patch set the blocksize to the block layer when validating
namespace where the logical_block_size of request_queue also is set.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
---
 drivers/nvme/host/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ce1b61519441..6f2e8eb78e00 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2085,6 +2085,8 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	}
 
 	blk_queue_logical_block_size(disk->queue, bs);
+	set_blocksize(disk->part0, bs);
+
 	/*
 	 * Linux filesystems assume writing a single physical block is
 	 * an atomic operation. Hence limit the physical block size to the
-- 
2.17.1


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC] nvme: set block size during namespace validation
  2020-12-23 15:01 [RFC] nvme: set block size during namespace validation Minwoo Im
@ 2020-12-23 15:49 ` Christoph Hellwig
  2020-12-23 16:16   ` Minwoo Im
  0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2020-12-23 15:49 UTC (permalink / raw)
  To: Minwoo Im
  Cc: Jens Axboe, Sagi Grimberg, linux-nvme, linux-block, Keith Busch,
	Christoph Hellwig

set_blocksize just sets the block sise used for buffer heads and should
not be called by the driver.  blkdev_get updates the block size, so
you must already have the fd re-reading the partition table open?
I'm not entirely sure how we can work around this except by avoiding
buffer head I/O in the partition reread code.  Note that this affects
all block drivers where the block size could change at runtime.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] nvme: set block size during namespace validation
  2020-12-23 15:49 ` Christoph Hellwig
@ 2020-12-23 16:16   ` Minwoo Im
  2020-12-23 16:27     ` Christoph Hellwig
  0 siblings, 1 reply; 5+ messages in thread
From: Minwoo Im @ 2020-12-23 16:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-block, Keith Busch, Jens Axboe, linux-nvme, Sagi Grimberg

Hello,

On 20-12-23 16:49:04, Christoph Hellwig wrote:
> set_blocksize just sets the block sise used for buffer heads and should
> not be called by the driver.  blkdev_get updates the block size, so
> you must already have the fd re-reading the partition table open?
> I'm not entirely sure how we can work around this except by avoiding
> buffer head I/O in the partition reread code.  Note that this affects
> all block drivers where the block size could change at runtime.

Thank you Christoph for your comment on this.

Agreed.  BLKRRPART leads us to block_read_full_page which takes buffer
heads for I/O.

Yes, __blkdev_get() sets i_blkbits of block device inode via
set_init_blocksize.  And Yes again as nvme-cli already opened the block
device fd and requests the BLKRRPART with that fd.  Also, __bdev_get()
only updates the i_blkbits(blocksize) in case bdev->bd_openers == 0 which
is the first time to open this block device.

Then, how about having NVMe driver prevent underflow case for the
request->__data_len is smaller than the logical block size like:

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ce1b61519441..030353d203bf 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -803,7 +803,11 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
        cmnd->rw.opcode = op;
        cmnd->rw.nsid = cpu_to_le32(ns->head->ns_id);
        cmnd->rw.slba = cpu_to_le64(nvme_sect_to_lba(ns, blk_rq_pos(req)));
-       cmnd->rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
+
+       if (unlikely(blk_rq_bytes(req) < (1 << ns->lba_shift)))
+               cmnd->rw.length = 0;
+       else
+               cmnd->rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
 
        if (req_op(req) == REQ_OP_WRITE && ctrl->nr_streams)
                nvme_assign_write_stream(ctrl, req, &control, &dsmgmt);

Thanks,

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC] nvme: set block size during namespace validation
  2020-12-23 16:16   ` Minwoo Im
@ 2020-12-23 16:27     ` Christoph Hellwig
  2020-12-23 18:31       ` Minwoo Im
  0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2020-12-23 16:27 UTC (permalink / raw)
  To: Minwoo Im
  Cc: Jens Axboe, Sagi Grimberg, linux-nvme, linux-block, Keith Busch,
	Christoph Hellwig

On Thu, Dec 24, 2020 at 01:16:50AM +0900, Minwoo Im wrote:
> Hello,
> 
> On 20-12-23 16:49:04, Christoph Hellwig wrote:
> > set_blocksize just sets the block sise used for buffer heads and should
> > not be called by the driver.  blkdev_get updates the block size, so
> > you must already have the fd re-reading the partition table open?
> > I'm not entirely sure how we can work around this except by avoiding
> > buffer head I/O in the partition reread code.  Note that this affects
> > all block drivers where the block size could change at runtime.
> 
> Thank you Christoph for your comment on this.
> 
> Agreed.  BLKRRPART leads us to block_read_full_page which takes buffer
> heads for I/O.
> 
> Yes, __blkdev_get() sets i_blkbits of block device inode via
> set_init_blocksize.  And Yes again as nvme-cli already opened the block
> device fd and requests the BLKRRPART with that fd.  Also, __bdev_get()
> only updates the i_blkbits(blocksize) in case bdev->bd_openers == 0 which
> is the first time to open this block device.
> 
> Then, how about having NVMe driver prevent underflow case for the
> request->__data_len is smaller than the logical block size like:

Not sure this helps.  I think we need to fix this proper and in the
block layer.  The long term fix is to stop messing with i_blksize
at all, but that is going to take very long.

I think for now the only thing we can do is to set a flag in the
gendisk when the block size changes and then reject all I/O until
the next first open that sets the blocksize.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] nvme: set block size during namespace validation
  2020-12-23 16:27     ` Christoph Hellwig
@ 2020-12-23 18:31       ` Minwoo Im
  0 siblings, 0 replies; 5+ messages in thread
From: Minwoo Im @ 2020-12-23 18:31 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-block, Keith Busch, Jens Axboe, linux-nvme, Sagi Grimberg

On 20-12-23 17:27:37, Christoph Hellwig wrote:
> On Thu, Dec 24, 2020 at 01:16:50AM +0900, Minwoo Im wrote:
> > Hello,
> > 
> > On 20-12-23 16:49:04, Christoph Hellwig wrote:
> > > set_blocksize just sets the block sise used for buffer heads and should
> > > not be called by the driver.  blkdev_get updates the block size, so
> > > you must already have the fd re-reading the partition table open?
> > > I'm not entirely sure how we can work around this except by avoiding
> > > buffer head I/O in the partition reread code.  Note that this affects
> > > all block drivers where the block size could change at runtime.
> > 
> > Thank you Christoph for your comment on this.
> > 
> > Agreed.  BLKRRPART leads us to block_read_full_page which takes buffer
> > heads for I/O.
> > 
> > Yes, __blkdev_get() sets i_blkbits of block device inode via
> > set_init_blocksize.  And Yes again as nvme-cli already opened the block
> > device fd and requests the BLKRRPART with that fd.  Also, __bdev_get()
> > only updates the i_blkbits(blocksize) in case bdev->bd_openers == 0 which
> > is the first time to open this block device.
> > 
> > Then, how about having NVMe driver prevent underflow case for the
> > request->__data_len is smaller than the logical block size like:
> 
> Not sure this helps.  I think we need to fix this proper and in the
> block layer.  The long term fix is to stop messing with i_blksize
> at all, but that is going to take very long.

Agreed.

> 
> I think for now the only thing we can do is to set a flag in the
> gendisk when the block size changes and then reject all I/O until
> the next first open that sets the blocksize.

Let me prepare work around patch for this issue soon.

Thanks!

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-12-23 18:31 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-23 15:01 [RFC] nvme: set block size during namespace validation Minwoo Im
2020-12-23 15:49 ` Christoph Hellwig
2020-12-23 16:16   ` Minwoo Im
2020-12-23 16:27     ` Christoph Hellwig
2020-12-23 18:31       ` Minwoo Im

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).