* [PATCH] virtio_blk: fix defaults for max_hw_sectors and max_segment_size
@ 2014-11-20 19:00 Mike Snitzer
  2014-11-20 20:30 ` Michael S. Tsirkin
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Mike Snitzer @ 2014-11-20 19:00 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, martin.petersen, hch, mst, rusty, dm-devel

virtio_blk incorrectly established -1U as the default for these
queue_limits.  Set these limits to sane default values to avoid crashing
the kernel.  But the virtio-blk protocol should probably be extended to
allow proper stacking of the disk's limits from the host.

This change fixes a crash that was reported when virtio-blk was used to
test linux-dm.git commit 604ea90641b4 ("dm thin: adjust max_sectors_kb
based on thinp blocksize") that will initially set max_sectors to
max_hw_sectors and then rounddown to the first power-of-2 factor of the
DM thin-pool's blocksize.  Basically that commit assumes drivers don't
suck when establishing max_hw_sectors so it acted like a canary in the
coal mine.

In the case of a DM thin-pool built on top of a virtio-blk data device,
these are the insane limits that were established for the DM thin-pool:

  # cat /sys/block/dm-6/queue/max_sectors_kb
  1073741824
  # cat /sys/block/dm-6/queue/max_hw_sectors_kb
  2147483647

by stacking the virtio-blk device's limits:

  # cat /sys/block/vdb/queue/max_sectors_kb
  512
  # cat /sys/block/vdb/queue/max_hw_sectors_kb
  2147483647

Attempting to mkfs.xfs against a thin device from this thin-pool quickly
resulted in fs/direct-io.c:dio_send_cur_page()'s BUG_ON.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
---
 drivers/block/virtio_blk.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index c6a27d5..68efbdc 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -674,8 +674,11 @@ static int virtblk_probe(struct virtio_device *vdev)
 	/* No need to bounce any requests */
 	blk_queue_bounce_limit(q, BLK_BOUNCE_ANY);
 
-	/* No real sector limit. */
-	blk_queue_max_hw_sectors(q, -1U);
+	/*
+	 * Limited by disk's max_hw_sectors in host, but
+	 * without that info establish a sane default.
+	 */
+	blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
 
 	/* Host can optionally specify maximum segment size and number of
 	 * segments. */
@@ -684,7 +687,7 @@ static int virtblk_probe(struct virtio_device *vdev)
 	if (!err)
 		blk_queue_max_segment_size(q, v);
 	else
-		blk_queue_max_segment_size(q, -1U);
+		blk_queue_max_segment_size(q, BLK_MAX_SEGMENT_SIZE);
 
 	/* Host can optionally specify the block size of the device */
 	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
-- 
1.7.4.4



* Re: [PATCH] virtio_blk: fix defaults for max_hw_sectors and max_segment_size
  2014-11-20 19:00 [PATCH] virtio_blk: fix defaults for max_hw_sectors and max_segment_size Mike Snitzer
@ 2014-11-20 20:30 ` Michael S. Tsirkin
  2014-11-20 21:15   ` Mike Snitzer
  2014-11-21  1:59 ` Mike Snitzer
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Michael S. Tsirkin @ 2014-11-20 20:30 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: axboe, linux-kernel, martin.petersen, hch, rusty, dm-devel

On Thu, Nov 20, 2014 at 02:00:59PM -0500, Mike Snitzer wrote:
> virtio_blk incorrectly established -1U as the default for these
> queue_limits.  Set these limits to sane default values to avoid crashing
> the kernel.  But the virtio-blk protocol should probably be extended to
> allow proper stacking of the disk's limits from the host.
> 
> This change fixes a crash that was reported when virtio-blk was used to
> test linux-dm.git commit 604ea90641b4 ("dm thin: adjust max_sectors_kb
> based on thinp blocksize") that will initially set max_sectors to
> max_hw_sectors and then rounddown to the first power-of-2 factor of the
> DM thin-pool's blocksize.  Basically that commit assumes drivers don't
> suck when establishing max_hw_sectors so it acted like a canary in the
> coal mine.
> 
> In the case of a DM thin-pool built ontop of virtio-blk data device
> these are the insane limits that were established for the DM thin-pool:
> 
>   # cat /sys/block/dm-6/queue/max_sectors_kb
>   1073741824
>   # cat /sys/block/dm-6/queue/max_hw_sectors_kb
>   2147483647
> 
> by stacking the virtio-blk device's limits:
> 
>   # cat /sys/block/vdb/queue/max_sectors_kb
>   512
>   # cat /sys/block/vdb/queue/max_hw_sectors_kb
>   2147483647
> 
> Attempting to mkfs.xfs against a thin device from this thin-pool quickly
> resulted in fs/direct-io.c:dio_send_cur_page()'s BUG_ON.

Why exactly does it BUG_ON?
Did some memory allocation fail?

Will it still BUG_ON if host gives us high values?

If linux makes assumptions about hardware limits, won't
it be better to put them in blk core and not in
individual drivers?


> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> Cc: stable@vger.kernel.org
> ---
>  drivers/block/virtio_blk.c |    9 ++++++---
>  1 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index c6a27d5..68efbdc 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -674,8 +674,11 @@ static int virtblk_probe(struct virtio_device *vdev)
>  	/* No need to bounce any requests */
>  	blk_queue_bounce_limit(q, BLK_BOUNCE_ANY);
>  
> -	/* No real sector limit. */
> -	blk_queue_max_hw_sectors(q, -1U);
> +	/*
> +	 * Limited by disk's max_hw_sectors in host, but
> +	 * without that info establish a sane default.
> +	 */
> +	blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);

I see
drivers/usb/storage/scsiglue.c: blk_queue_max_hw_sectors(sdev->request_queue, 0x7FFFFF);

so maybe we should go higher, and use INT_MAX too?


>  
>  	/* Host can optionally specify maximum segment size and number of
>  	 * segments. */
> @@ -684,7 +687,7 @@ static int virtblk_probe(struct virtio_device *vdev)
>  	if (!err)
>  		blk_queue_max_segment_size(q, v);
>  	else
> -		blk_queue_max_segment_size(q, -1U);
> +		blk_queue_max_segment_size(q, BLK_MAX_SEGMENT_SIZE);
>  
>  	/* Host can optionally specify the block size of the device */
>  	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,

Here too, I see some drivers asking for more:
drivers/block/mtip32xx/mtip32xx.c: blk_queue_max_segment_size(dd->queue, 0x400000);




> -- 
> 1.7.4.4


* Re: virtio_blk: fix defaults for max_hw_sectors and max_segment_size
  2014-11-20 20:30 ` Michael S. Tsirkin
@ 2014-11-20 21:15   ` Mike Snitzer
  2014-11-26  5:58     ` Rusty Russell
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2014-11-20 21:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: axboe, linux-kernel, martin.petersen, hch, rusty, dm-devel

On Thu, Nov 20 2014 at  3:30pm -0500,
Michael S. Tsirkin <mst@redhat.com> wrote:

> On Thu, Nov 20, 2014 at 02:00:59PM -0500, Mike Snitzer wrote:
> > virtio_blk incorrectly established -1U as the default for these
> > queue_limits.  Set these limits to sane default values to avoid crashing
> > the kernel.
...
> > Attempting to mkfs.xfs against a thin device from this thin-pool quickly
> > resulted in fs/direct-io.c:dio_send_cur_page()'s BUG_ON.
> 
> Why exactly does it BUG_ON?
> Did some memory allocation fail?

No idea, the kernel log doesn't say... all it has is "kernel BUG" pointing
to fs/direct-io.c:dio_send_cur_page()'s BUG_ON.

I could dig deeper on _why_ but honestly, there really isn't much point.
virtio-blk doesn't get to live in fantasy-land just because it happens
to think it is limitless.

> Will it still BUG_ON if host gives us high values?

Maybe, if/when virtio-blk allows the host to inject a value for
max_hw_sectors.  But my fix doesn't stack the host's limits up; it sets
a value that isn't prone to make the block/fs layers BUG.

> If linux makes assumptions about hardware limits, won't
> it be better to put them in blk core and not in
> individual drivers?

The individual block driver is meant to establish sane values for these
limits.

Block core _does_ have some sane wrappers for stacking these limits
(e.g. blk_stack_limits, etc).  All of those wrappers are meant to allow
virtual drivers to build up limits that respect the underlying
hardware's limits.

But virtio-blk doesn't use any of them, because the virtio-blk driver
relies on the virtio-blk protocol to encapsulate each and every one of
these limits.

> > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > Cc: stable@vger.kernel.org
> > ---
> >  drivers/block/virtio_blk.c |    9 ++++++---
> >  1 files changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > index c6a27d5..68efbdc 100644
> > --- a/drivers/block/virtio_blk.c
> > +++ b/drivers/block/virtio_blk.c
> > @@ -674,8 +674,11 @@ static int virtblk_probe(struct virtio_device *vdev)
> >  	/* No need to bounce any requests */
> >  	blk_queue_bounce_limit(q, BLK_BOUNCE_ANY);
> >  
> > -	/* No real sector limit. */
> > -	blk_queue_max_hw_sectors(q, -1U);
> > +	/*
> > +	 * Limited by disk's max_hw_sectors in host, but
> > +	 * without that info establish a sane default.
> > +	 */
> > +	blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
> 
> I see
> drivers/usb/storage/scsiglue.c: blk_queue_max_hw_sectors(sdev->request_queue, 0x7FFFFF);
> 
> so maybe we should go higher, and use INT_MAX too?

No, higher doesn't help _at all_ if the driver itself doesn't actually
take care to stack the underlying driver's limits.  Without limits
stacking (which virtio-blk doesn't really have), it is the lack of
reality-based default values that is _the_ problem that induced this
BUG.

blk_stack_limits() does a lot of min_t(top, bottom), etc.  So you want
the default "top" of a stacking driver to be high enough so as not to
artificially limit the resulting stacked limit.  Which is why we have
things like blk_set_stacking_limits().  You'll note that
blk_set_stacking_limits() properly establishes UINT_MAX, etc.  BUT it is
"proper" purely because drivers that call it (e.g. DM) also make use of
the block layer's limits stacking functions (again,
e.g. blk_stack_limits).
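
To make that concrete, a stacking driver does roughly the following (a
minimal sketch, not actual DM code; error handling omitted):

  #include <linux/blkdev.h>

  /* Sketch: build a stacked device's limits from one underlying bdev. */
  static void example_stack_limits(struct request_queue *top_q,
                                   struct block_device *bottom_bdev)
  {
          struct queue_limits limits;

          /* Start from the permissive "stacking" defaults (UINT_MAX, etc). */
          blk_set_stacking_limits(&limits);

          /* Clamp every limit to what the underlying device advertises. */
          if (blk_stack_limits(&limits, &bdev_get_queue(bottom_bdev)->limits, 0))
                  pr_warn("stacked device has misaligned limits\n");

          top_q->limits = limits;
  }

DM repeats the blk_stack_limits() step for every underlying device in a
table, so the end result is the intersection of all of their limits.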

> >  
> >  	/* Host can optionally specify maximum segment size and number of
> >  	 * segments. */
> > @@ -684,7 +687,7 @@ static int virtblk_probe(struct virtio_device *vdev)
> >  	if (!err)
> >  		blk_queue_max_segment_size(q, v);
> >  	else
> > -		blk_queue_max_segment_size(q, -1U);
> > +		blk_queue_max_segment_size(q, BLK_MAX_SEGMENT_SIZE);
> >  
> >  	/* Host can optionally specify the block size of the device */
> >  	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
> 
> Here too, I see some drivers asking for more:
> drivers/block/mtip32xx/mtip32xx.c: blk_queue_max_segment_size(dd->queue, 0x400000);

Those drivers you listed could be equally broken.

For virtio-blk the issue is that the limits it establishes don't reflect
the underlying host's hardware capabilities.  This was a virtio-blk time
bomb waiting to go off.

And to be clear, I only fixed the blk_queue_max_segment_size(q, -1U)
call because it is blatantly wrong given that the block layer already
defines BLK_MAX_SEGMENT_SIZE for this purpose.

The bug that was reported is purely due to max_hw_sectors_kb being ~2TB
and the resulting max_sectors_kb being ~1TB.
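
For reference, the block layer's defaults in include/linux/blkdev.h are
roughly the following (from memory, so double-check the exact values):

  enum blk_default_limits {
          BLK_MAX_SEGMENTS        = 128,
          BLK_SAFE_MAX_SECTORS    = 255,
          BLK_DEF_MAX_SECTORS     = 1024,
          BLK_MAX_SEGMENT_SIZE    = 65536,
          BLK_SEG_BOUNDARY_MASK   = 0xFFFFFFFFUL,
  };

So with my patch virtio-blk advertises max_hw_sectors of 1024 512-byte
sectors (max_hw_sectors_kb = 512) and a 64K max_segment_size, until the
protocol grows a way to pass the host's real limits up.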


* Re: virtio_blk: fix defaults for max_hw_sectors and max_segment_size
  2014-11-20 19:00 [PATCH] virtio_blk: fix defaults for max_hw_sectors and max_segment_size Mike Snitzer
  2014-11-20 20:30 ` Michael S. Tsirkin
@ 2014-11-21  1:59 ` Mike Snitzer
  2014-11-21  2:11 ` [PATCH v2] " Mike Snitzer
  2014-11-21  9:54 ` [PATCH] " Christoph Hellwig
  3 siblings, 0 replies; 15+ messages in thread
From: Mike Snitzer @ 2014-11-21  1:59 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, martin.petersen, hch, mst, rusty, dm-devel

On Thu, Nov 20 2014 at  2:00pm -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> virtio_blk incorrectly established -1U as the default for these
> queue_limits.  Set these limits to sane default values to avoid crashing
> the kernel.  But the virtio-blk protocol should probably be extended to
> allow proper stacking of the disk's limits from the host.
> 
> This change fixes a crash that was reported when virtio-blk was used to
> test linux-dm.git commit 604ea90641b4 ("dm thin: adjust max_sectors_kb
> based on thinp blocksize") that will initially set max_sectors to
> max_hw_sectors and then rounddown to the first power-of-2 factor of the
> DM thin-pool's blocksize.  Basically that commit assumes drivers don't
> suck when establishing max_hw_sectors so it acted like a canary in the
> coal mine.

I have changed that DM thinp code to not be so fragile with this
follow-on fix:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.19&id=971ab7029b61ec10e0765bfb96331448ce5c3094

> In the case of a DM thin-pool built ontop of virtio-blk data device
> these are the insane limits that were established for the DM thin-pool:
> 
>   # cat /sys/block/dm-6/queue/max_sectors_kb
>   1073741824
>   # cat /sys/block/dm-6/queue/max_hw_sectors_kb
>   2147483647
> 
> by stacking the virtio-blk device's limits:
> 
>   # cat /sys/block/vdb/queue/max_sectors_kb
>   512
>   # cat /sys/block/vdb/queue/max_hw_sectors_kb
>   2147483647
> 
> Attempting to mkfs.xfs against a thin device from this thin-pool quickly
> resulted in fs/direct-io.c:dio_send_cur_page()'s BUG_ON.

But virtio_blk really must be fixed.  I'll post v2 of this patch with a
revised header that skips all the references to DM thinp, etc.


* [PATCH v2] virtio_blk: fix defaults for max_hw_sectors and max_segment_size
  2014-11-20 19:00 [PATCH] virtio_blk: fix defaults for max_hw_sectors and max_segment_size Mike Snitzer
  2014-11-20 20:30 ` Michael S. Tsirkin
  2014-11-21  1:59 ` Mike Snitzer
@ 2014-11-21  2:11 ` Mike Snitzer
  2014-11-21  9:54 ` [PATCH] " Christoph Hellwig
  3 siblings, 0 replies; 15+ messages in thread
From: Mike Snitzer @ 2014-11-21  2:11 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, martin.petersen, hch, mst, rusty, dm-devel

virtio_blk incorrectly established -1U as the default for these
queue_limits.  Set these limits to sane default values to avoid crashing
the kernel.  But the virtio-blk protocol should probably be extended to
allow proper stacking of the disk's limits from the host.

This change fixes a crash that can occur if max_sectors_kb is raised to
even half of virtio_blk's advertised max_hw_sectors_kb:

  # cat /sys/block/vdb/queue/max_sectors_kb
  512
  # cat /sys/block/vdb/queue/max_hw_sectors_kb
  2147483647
  # echo 1073741824 > /sys/block/vdb/queue/max_sectors_kb

Attempting to mkfs.xfs against /dev/vdb will result in hitting
fs/direct-io.c:dio_send_cur_page()'s BUG_ON.

While fixing the blk_queue_max_hw_sectors(q, -1U) call, it was
immediately apparent that the blk_queue_max_segment_size(q, -1U) call
should also be fixed.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
---
 drivers/block/virtio_blk.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

v2: revise header to simplify the scope of the problem explanation

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index c6a27d5..68efbdc 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -674,8 +674,11 @@ static int virtblk_probe(struct virtio_device *vdev)
 	/* No need to bounce any requests */
 	blk_queue_bounce_limit(q, BLK_BOUNCE_ANY);
 
-	/* No real sector limit. */
-	blk_queue_max_hw_sectors(q, -1U);
+	/*
+	 * Limited by disk's max_hw_sectors in host, but
+	 * without that info establish a sane default.
+	 */
+	blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
 
 	/* Host can optionally specify maximum segment size and number of
 	 * segments. */
@@ -684,7 +687,7 @@ static int virtblk_probe(struct virtio_device *vdev)
 	if (!err)
 		blk_queue_max_segment_size(q, v);
 	else
-		blk_queue_max_segment_size(q, -1U);
+		blk_queue_max_segment_size(q, BLK_MAX_SEGMENT_SIZE);
 
 	/* Host can optionally specify the block size of the device */
 	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
-- 
1.7.4.4



* Re: [PATCH] virtio_blk: fix defaults for max_hw_sectors and max_segment_size
  2014-11-20 19:00 [PATCH] virtio_blk: fix defaults for max_hw_sectors and max_segment_size Mike Snitzer
                   ` (2 preceding siblings ...)
  2014-11-21  2:11 ` [PATCH v2] " Mike Snitzer
@ 2014-11-21  9:54 ` Christoph Hellwig
  2014-11-21 15:49   ` Mike Snitzer
  3 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2014-11-21  9:54 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: axboe, linux-kernel, martin.petersen, hch, mst, rusty, dm-devel,
	Paolo Bonzini, qemu-devel

On Thu, Nov 20, 2014 at 02:00:59PM -0500, Mike Snitzer wrote:
> virtio_blk incorrectly established -1U as the default for these
> queue_limits.  Set these limits to sane default values to avoid crashing
> the kernel.  But the virtio-blk protocol should probably be extended to
> allow proper stacking of the disk's limits from the host.
> 
> This change fixes a crash that was reported when virtio-blk was used to
> test linux-dm.git commit 604ea90641b4 ("dm thin: adjust max_sectors_kb
> based on thinp blocksize") that will initially set max_sectors to
> max_hw_sectors and then rounddown to the first power-of-2 factor of the
> DM thin-pool's blocksize.  Basically that commit assumes drivers don't
> suck when establishing max_hw_sectors so it acted like a canary in the
> coal mine.

Is that a crash in the host or guest?  What kind of mishandling did you
see?  Unless the recent virtio standard changed anything, the host
should be able to handle our arbitrary limits, and even if it doesn't,
that's something we need to hash out with qemu and the virtio standards
folks.


* Re: virtio_blk: fix defaults for max_hw_sectors and max_segment_size
  2014-11-21  9:54 ` [PATCH] " Christoph Hellwig
@ 2014-11-21 15:49   ` Mike Snitzer
  2014-11-26 19:48     ` Jens Axboe
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2014-11-21 15:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, mst, rusty, linux-kernel, qemu-devel,
	dm-devel, Paolo Bonzini

On Fri, Nov 21 2014 at  4:54am -0500,
Christoph Hellwig <hch@infradead.org> wrote:

> On Thu, Nov 20, 2014 at 02:00:59PM -0500, Mike Snitzer wrote:
> > virtio_blk incorrectly established -1U as the default for these
> > queue_limits.  Set these limits to sane default values to avoid crashing
> > the kernel.  But the virtio-blk protocol should probably be extended to
> > allow proper stacking of the disk's limits from the host.
> > 
> > This change fixes a crash that was reported when virtio-blk was used to
> > test linux-dm.git commit 604ea90641b4 ("dm thin: adjust max_sectors_kb
> > based on thinp blocksize") that will initially set max_sectors to
> > max_hw_sectors and then rounddown to the first power-of-2 factor of the
> > DM thin-pool's blocksize.  Basically that commit assumes drivers don't
> > suck when establishing max_hw_sectors so it acted like a canary in the
> > coal mine.
> 
> Is that a crash in the host or guest?  What kind of mishandling did you
> see?  Unless the recent virtio standard changed anything the host
> should be able to handle our arbitrary limits, and even if it doesn't
> that something we need to hash out with qemu and the virtio standards
> folks.

Some good news: this guest crash isn't an issue with recent kernels (so
upstream, Fedora 20, RHEL7, etc. aren't impacted -- Jens, feel free to
drop my virtio_blk patch; even though some of its limits are clearly
broken I'll defer to the virtio_blk developers on the best way forward
-- sorry for the noise!).

The BUG I saw only seems to impact RHEL6 kernels so far (note to self,
actually _test_ on upstream before reporting a crash against upstream!)

[root@RHEL-6 ~]# echo 1073741824 > /sys/block/vdc/queue/max_sectors_kb
[root@RHEL-6 ~]# lvs

Message from syslogd@RHEL-6 at Nov 21 15:32:15 ...
 kernel:Kernel panic - not syncing: Fatal exception

Here is the RHEL6 guest crash, just for full disclosure:

kernel BUG at fs/direct-io.c:696!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/block/dm-4/dev
CPU 0
Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6 ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio libcrc32c dm_mirror dm_region_hash dm_log dm_mod microcode virtio_balloon i2c_piix4 i2c_core virtio_net ext4 jbd2 mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix [last unloaded: speedstep_lib]

Pid: 1679, comm: lvs Not tainted 2.6.32 #6 Bochs Bochs
RIP: 0010:[<ffffffff811ce336>]  [<ffffffff811ce336>] __blockdev_direct_IO_newtrunc+0x986/0x1270
RSP: 0018:ffff88011a11ba48  EFLAGS: 00010287
RAX: 0000000000000000 RBX: ffff8801192fbd28 RCX: 0000000000001000
RDX: ffffea0003b3d218 RSI: ffff88011aac4300 RDI: ffff880118572378
RBP: ffff88011a11bbe8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801192fbd00
R13: 0000000000000000 R14: ffff880118c3cac0 R15: 0000000000000000
FS:  00007fde78bc37a0(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000012706f0 CR3: 000000011a432000 CR4: 00000000000407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process lvs (pid: 1679, threadinfo ffff88011a11a000, task ffff8801185a4aa0)
Stack:
 ffff88011a11bb48 ffff88011a11baa8 ffff88010000000c ffff88011a11bb18
<d> 0000000000000000 0000000000000000 ffff88011a11bdc8 ffff88011a11beb8
<d> 0000000c1a11baa8 ffff880118c3cb98 0000000000000000 0000000018c3ccb8
Call Trace:
 [<ffffffff811c9e90>] ? blkdev_get_block+0x0/0x20
 [<ffffffff811cec97>] __blockdev_direct_IO+0x77/0xe0
 [<ffffffff811c9e90>] ? blkdev_get_block+0x0/0x20
 [<ffffffff811caf17>] blkdev_direct_IO+0x57/0x60
 [<ffffffff811c9e90>] ? blkdev_get_block+0x0/0x20
 [<ffffffff8112619b>] generic_file_aio_read+0x6bb/0x700
 [<ffffffff811cba60>] ? blkdev_get+0x10/0x20
 [<ffffffff811cba70>] ? blkdev_open+0x0/0xc0
 [<ffffffff8118af4f>] ? __dentry_open+0x23f/0x360
 [<ffffffff811ca2d1>] blkdev_aio_read+0x51/0x80
 [<ffffffff8118dc6a>] do_sync_read+0xfa/0x140
 [<ffffffff8109eaf0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff811ca22c>] ? block_ioctl+0x3c/0x40
 [<ffffffff811a34c2>] ? vfs_ioctl+0x22/0xa0
 [<ffffffff811a3664>] ? do_vfs_ioctl+0x84/0x580
 [<ffffffff8122cee6>] ? security_file_permission+0x16/0x20
 [<ffffffff8118e625>] vfs_read+0xb5/0x1a0
 [<ffffffff8118e761>] sys_read+0x51/0x90
 [<ffffffff810e5aae>] ? __audit_syscall_exit+0x25e/0x290
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Code: fe ff ff c7 85 fc fe ff ff 00 00 00 00 48 89 95 10 ff ff ff 8b 95 34 ff ff ff e8 46 ac ff ff 3b 85 34 ff ff ff 0f 84 fc 02 00 00 <0f> 0b eb fe 8b 9d 34 ff ff ff 8b 85 30 ff ff ff 01 d8 85 c0 0f
RIP  [<ffffffff811ce336>] __blockdev_direct_IO_newtrunc+0x986/0x1270
 RSP <ffff88011a11ba48>
---[ end trace 73be5dcaf8939399 ]---
Kernel panic - not syncing: Fatal exception


* Re: virtio_blk: fix defaults for max_hw_sectors and max_segment_size
  2014-11-20 21:15   ` Mike Snitzer
@ 2014-11-26  5:58     ` Rusty Russell
  2014-11-26 14:10       ` Mike Snitzer
  0 siblings, 1 reply; 15+ messages in thread
From: Rusty Russell @ 2014-11-26  5:58 UTC (permalink / raw)
  To: Mike Snitzer, Michael S. Tsirkin
  Cc: axboe, linux-kernel, martin.petersen, hch, dm-devel

Mike Snitzer <snitzer@redhat.com> writes:
> On Thu, Nov 20 2014 at  3:30pm -0500,
> Michael S. Tsirkin <mst@redhat.com> wrote:
>
>> On Thu, Nov 20, 2014 at 02:00:59PM -0500, Mike Snitzer wrote:
>> > virtio_blk incorrectly established -1U as the default for these
>> > queue_limits.  Set these limits to sane default values to avoid crashing
>> > the kernel.
> ...
>> > Attempting to mkfs.xfs against a thin device from this thin-pool quickly
>> > resulted in fs/direct-io.c:dio_send_cur_page()'s BUG_ON.
>> 
>> Why exactly does it BUG_ON?
>> Did some memory allocation fail?
>
> No idea, kernel log doesn't say.. all it has is "kernel BUG" pointing to
> fs/direct-io.c:dio_send_cur_page()'s BUG_ON.
>
> I could dig deeper on _why_ but honestly, there really isn't much point.

There is *always* a point in understanding the code you are modifying.

> virtio-blk doesn't get to live in fantasy-land just because it happens
> to think it is limitless.

Calm down please.

We don't have a sector limit.  We have a segment limit, which is set
above this line.

Cheers,
Rusty.


* Re: virtio_blk: fix defaults for max_hw_sectors and max_segment_size
  2014-11-26  5:58     ` Rusty Russell
@ 2014-11-26 14:10       ` Mike Snitzer
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Snitzer @ 2014-11-26 14:10 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michael S. Tsirkin, axboe, linux-kernel, martin.petersen, hch, dm-devel

On Wed, Nov 26 2014 at 12:58am -0500,
Rusty Russell <rusty@rustcorp.com.au> wrote:

> Mike Snitzer <snitzer@redhat.com> writes:
> > On Thu, Nov 20 2014 at  3:30pm -0500,
> > Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> >> On Thu, Nov 20, 2014 at 02:00:59PM -0500, Mike Snitzer wrote:
> >> > virtio_blk incorrectly established -1U as the default for these
> >> > queue_limits.  Set these limits to sane default values to avoid crashing
> >> > the kernel.
> > ...
> >> > Attempting to mkfs.xfs against a thin device from this thin-pool quickly
> >> > resulted in fs/direct-io.c:dio_send_cur_page()'s BUG_ON.
> >> 
> >> Why exactly does it BUG_ON?
> >> Did some memory allocation fail?
> >
> > No idea, kernel log doesn't say.. all it has is "kernel BUG" pointing to
> > fs/direct-io.c:dio_send_cur_page()'s BUG_ON.
> >
> > I could dig deeper on _why_ but honestly, there really isn't much point.
> 
> There is *always* a point in understanding the code you are modifying.

Yes, I agree (and understanding the BUG in question will be pursued).
But in the context of the patch I proposed it was irrelevant.
virtio-blk still _should_ fix its limits to reflect those of the block
device it stacks on.  My patch was a stop-gap until proper virtio-blk
protocol extensions were added.  But you don't seem inclined to care.

> > virtio-blk doesn't get to live in fantasy-land just because it happens
> > to think it is limitless.
> 
> Calm down please.

Sure, but it would have helped if the virtio-blk developers had
acknowledged that a stacking block driver should stack the limits of the
underlying device.  Instead you decided to trim all related portions of
my reply to mst that were measured and helpful.

> We don't have a sector limit.  We have a segment limit, which is set
> above this line.

Then at a minimum max_hw_sectors should reflect that segment limit.
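
For context, that segment limit is negotiated near the top of
virtblk_probe() via VIRTIO_BLK_F_SEG_MAX, roughly like this (paraphrased
from the current driver, so details may differ slightly):

  /* Ask the host for its segment limit; assume 1 if it says nothing. */
  err = virtio_cread_feature(vdev, VIRTIO_BLK_F_SEG_MAX,
                             struct virtio_blk_config, seg_max,
                             &sg_elems);
  if (err || !sg_elems)
          sg_elems = 1;
  sg_elems += 2;          /* extra elements for the header and status */
  ...
  blk_queue_max_segments(q, sg_elems - 2);

Purely as an illustration (assuming the worst case of one page per
segment, which is what you get without VIRTIO_BLK_F_SIZE_MAX), deriving
max_hw_sectors from that limit could look something like:

  blk_queue_max_hw_sectors(q, (sg_elems - 2) * (PAGE_SIZE >> 9));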

But again, the underlying device has limits that should be stacked up.
Why is that irrelevant to virtio-blk?  Plus, this is a matter of not
allowing a user to shoot themselves in the foot by fiddling with
traditional block limits only to find that in some kernels (*cough*
RHEL6) they hit a BUG.

Mike


* Re: virtio_blk: fix defaults for max_hw_sectors and max_segment_size
  2014-11-21 15:49   ` Mike Snitzer
@ 2014-11-26 19:48     ` Jens Axboe
  2014-11-26 20:51       ` Mike Snitzer
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2014-11-26 19:48 UTC (permalink / raw)
  To: Mike Snitzer, Christoph Hellwig
  Cc: martin.petersen, mst, rusty, linux-kernel, qemu-devel, dm-devel,
	Paolo Bonzini

On 11/21/2014 08:49 AM, Mike Snitzer wrote:
> On Fri, Nov 21 2014 at  4:54am -0500,
> Christoph Hellwig <hch@infradead.org> wrote:
> 
>> On Thu, Nov 20, 2014 at 02:00:59PM -0500, Mike Snitzer wrote:
>>> virtio_blk incorrectly established -1U as the default for these
>>> queue_limits.  Set these limits to sane default values to avoid crashing
>>> the kernel.  But the virtio-blk protocol should probably be extended to
>>> allow proper stacking of the disk's limits from the host.
>>>
>>> This change fixes a crash that was reported when virtio-blk was used to
>>> test linux-dm.git commit 604ea90641b4 ("dm thin: adjust max_sectors_kb
>>> based on thinp blocksize") that will initially set max_sectors to
>>> max_hw_sectors and then rounddown to the first power-of-2 factor of the
>>> DM thin-pool's blocksize.  Basically that commit assumes drivers don't
>>> suck when establishing max_hw_sectors so it acted like a canary in the
>>> coal mine.
>>
>> Is that a crash in the host or guest?  What kind of mishandling did you
>> see?  Unless the recent virtio standard changed anything the host
>> should be able to handle our arbitrary limits, and even if it doesn't
>> that something we need to hash out with qemu and the virtio standards
>> folks.
> 
> Some good news: this guest crash isn't an issue with recent kernels (so
> upstream, fedora 20, RHEL7, etc aren't impacted -- Jens feel free to
> drop my virtio_blk patch; even though some of it's limits are clearly
> broken I'll defer to the virtio_blk developers on the best way forward
> -- sorry for the noise!).
> 
> The BUG I saw only seems to impact RHEL6 kernels so far (note to self,
> actually _test_ on upstream before reporting a crash against upstream!)
> 
> [root@RHEL-6 ~]# echo 1073741824 > /sys/block/vdc/queue/max_sectors_kb
> [root@RHEL-6 ~]# lvs
> 
> Message from syslogd@RHEL-6 at Nov 21 15:32:15 ...
>  kernel:Kernel panic - not syncing: Fatal exception
> 
> Here is the RHEL6 guest crash, just for full disclosure:
> 
> kernel BUG at fs/direct-io.c:696!
> invalid opcode: 0000 [#1] SMP
> last sysfs file: /sys/devices/virtual/block/dm-4/dev
> CPU 0
> Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6 ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio libcrc32c dm_mirror dm_region_hash dm_log dm_mod microcode virtio_balloon i2c_piix4 i2c_core virtio_net ext4 jbd2 mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix [last unloaded: speedstep_lib]
> 
> Pid: 1679, comm: lvs Not tainted 2.6.32 #6 Bochs Bochs
> RIP: 0010:[<ffffffff811ce336>]  [<ffffffff811ce336>] __blockdev_direct_IO_newtrunc+0x986/0x1270
> RSP: 0018:ffff88011a11ba48  EFLAGS: 00010287
> RAX: 0000000000000000 RBX: ffff8801192fbd28 RCX: 0000000000001000
> RDX: ffffea0003b3d218 RSI: ffff88011aac4300 RDI: ffff880118572378
> RBP: ffff88011a11bbe8 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801192fbd00
> R13: 0000000000000000 R14: ffff880118c3cac0 R15: 0000000000000000
> FS:  00007fde78bc37a0(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000012706f0 CR3: 000000011a432000 CR4: 00000000000407f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process lvs (pid: 1679, threadinfo ffff88011a11a000, task ffff8801185a4aa0)
> Stack:
>  ffff88011a11bb48 ffff88011a11baa8 ffff88010000000c ffff88011a11bb18
> <d> 0000000000000000 0000000000000000 ffff88011a11bdc8 ffff88011a11beb8
> <d> 0000000c1a11baa8 ffff880118c3cb98 0000000000000000 0000000018c3ccb8
> Call Trace:
>  [<ffffffff811c9e90>] ? blkdev_get_block+0x0/0x20
>  [<ffffffff811cec97>] __blockdev_direct_IO+0x77/0xe0
>  [<ffffffff811c9e90>] ? blkdev_get_block+0x0/0x20
>  [<ffffffff811caf17>] blkdev_direct_IO+0x57/0x60
>  [<ffffffff811c9e90>] ? blkdev_get_block+0x0/0x20
>  [<ffffffff8112619b>] generic_file_aio_read+0x6bb/0x700
>  [<ffffffff811cba60>] ? blkdev_get+0x10/0x20
>  [<ffffffff811cba70>] ? blkdev_open+0x0/0xc0
>  [<ffffffff8118af4f>] ? __dentry_open+0x23f/0x360
>  [<ffffffff811ca2d1>] blkdev_aio_read+0x51/0x80
>  [<ffffffff8118dc6a>] do_sync_read+0xfa/0x140
>  [<ffffffff8109eaf0>] ? autoremove_wake_function+0x0/0x40
>  [<ffffffff811ca22c>] ? block_ioctl+0x3c/0x40
>  [<ffffffff811a34c2>] ? vfs_ioctl+0x22/0xa0
>  [<ffffffff811a3664>] ? do_vfs_ioctl+0x84/0x580
>  [<ffffffff8122cee6>] ? security_file_permission+0x16/0x20
>  [<ffffffff8118e625>] vfs_read+0xb5/0x1a0
>  [<ffffffff8118e761>] sys_read+0x51/0x90
>  [<ffffffff810e5aae>] ? __audit_syscall_exit+0x25e/0x290
>  [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
> Code: fe ff ff c7 85 fc fe ff ff 00 00 00 00 48 89 95 10 ff ff ff 8b 95 34 ff ff ff e8 46 ac ff ff 3b 85 34 ff ff ff 0f 84 fc 02 00 00 <0f> 0b eb fe 8b 9d 34 ff ff ff 8b 85 30 ff ff ff 01 d8 85 c0 0f
> RIP  [<ffffffff811ce336>] __blockdev_direct_IO_newtrunc+0x986/0x1270
>  RSP <ffff88011a11ba48>
> ---[ end trace 73be5dcaf8939399 ]---
> Kernel panic - not syncing: Fatal exception

That code isn't even in mainline, as far as I can tell...

-- 
Jens Axboe



* Re: virtio_blk: fix defaults for max_hw_sectors and max_segment_size
  2014-11-26 19:48     ` Jens Axboe
@ 2014-11-26 20:51       ` Mike Snitzer
  2014-11-26 20:54         ` Jens Axboe
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2014-11-26 20:51 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, martin.petersen, mst, rusty, linux-kernel,
	qemu-devel, dm-devel, Paolo Bonzini

On Wed, Nov 26 2014 at  2:48pm -0500,
Jens Axboe <axboe@kernel.dk> wrote:

> On 11/21/2014 08:49 AM, Mike Snitzer wrote:
> > On Fri, Nov 21 2014 at  4:54am -0500,
> > Christoph Hellwig <hch@infradead.org> wrote:
> > 
> >> On Thu, Nov 20, 2014 at 02:00:59PM -0500, Mike Snitzer wrote:
> >>> virtio_blk incorrectly established -1U as the default for these
> >>> queue_limits.  Set these limits to sane default values to avoid crashing
> >>> the kernel.  But the virtio-blk protocol should probably be extended to
> >>> allow proper stacking of the disk's limits from the host.
> >>>
> >>> This change fixes a crash that was reported when virtio-blk was used to
> >>> test linux-dm.git commit 604ea90641b4 ("dm thin: adjust max_sectors_kb
> >>> based on thinp blocksize") that will initially set max_sectors to
> >>> max_hw_sectors and then rounddown to the first power-of-2 factor of the
> >>> DM thin-pool's blocksize.  Basically that commit assumes drivers don't
> >>> suck when establishing max_hw_sectors so it acted like a canary in the
> >>> coal mine.
> >>
> >> Is that a crash in the host or guest?  What kind of mishandling did you
> >> see?  Unless the recent virtio standard changed anything the host
> >> should be able to handle our arbitrary limits, and even if it doesn't
> >> that something we need to hash out with qemu and the virtio standards
> >> folks.
> > 
> > Some good news: this guest crash isn't an issue with recent kernels (so
> > upstream, fedora 20, RHEL7, etc aren't impacted -- Jens feel free to
> > drop my virtio_blk patch; even though some of it's limits are clearly
> > broken I'll defer to the virtio_blk developers on the best way forward
> > -- sorry for the noise!).
> > 
> > The BUG I saw only seems to impact RHEL6 kernels so far (note to self,
> > actually _test_ on upstream before reporting a crash against upstream!)
> > 
> > [root@RHEL-6 ~]# echo 1073741824 > /sys/block/vdc/queue/max_sectors_kb
> > [root@RHEL-6 ~]# lvs
> > 
> > Message from syslogd@RHEL-6 at Nov 21 15:32:15 ...
> >  kernel:Kernel panic - not syncing: Fatal exception
> > 
> > Here is the RHEL6 guest crash, just for full disclosure:
> > 
> > kernel BUG at fs/direct-io.c:696!
> > invalid opcode: 0000 [#1] SMP
> > last sysfs file: /sys/devices/virtual/block/dm-4/dev
> > CPU 0
> > Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6 ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio libcrc32c dm_mirror dm_region_hash dm_log dm_mod microcode virtio_balloon i2c_piix4 i2c_core virtio_net ext4 jbd2 mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix [last unloaded: speedstep_lib]
> > 
> > Pid: 1679, comm: lvs Not tainted 2.6.32 #6 Bochs Bochs
> > RIP: 0010:[<ffffffff811ce336>]  [<ffffffff811ce336>] __blockdev_direct_IO_newtrunc+0x986/0x1270
> > RSP: 0018:ffff88011a11ba48  EFLAGS: 00010287
> > RAX: 0000000000000000 RBX: ffff8801192fbd28 RCX: 0000000000001000
> > RDX: ffffea0003b3d218 RSI: ffff88011aac4300 RDI: ffff880118572378
> > RBP: ffff88011a11bbe8 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801192fbd00
> > R13: 0000000000000000 R14: ffff880118c3cac0 R15: 0000000000000000
> > FS:  00007fde78bc37a0(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00000000012706f0 CR3: 000000011a432000 CR4: 00000000000407f0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Process lvs (pid: 1679, threadinfo ffff88011a11a000, task ffff8801185a4aa0)
> > Stack:
> >  ffff88011a11bb48 ffff88011a11baa8 ffff88010000000c ffff88011a11bb18
> > <d> 0000000000000000 0000000000000000 ffff88011a11bdc8 ffff88011a11beb8
> > <d> 0000000c1a11baa8 ffff880118c3cb98 0000000000000000 0000000018c3ccb8
> > Call Trace:
> >  [<ffffffff811c9e90>] ? blkdev_get_block+0x0/0x20
> >  [<ffffffff811cec97>] __blockdev_direct_IO+0x77/0xe0
> >  [<ffffffff811c9e90>] ? blkdev_get_block+0x0/0x20
> >  [<ffffffff811caf17>] blkdev_direct_IO+0x57/0x60
> >  [<ffffffff811c9e90>] ? blkdev_get_block+0x0/0x20
> >  [<ffffffff8112619b>] generic_file_aio_read+0x6bb/0x700
> >  [<ffffffff811cba60>] ? blkdev_get+0x10/0x20
> >  [<ffffffff811cba70>] ? blkdev_open+0x0/0xc0
> >  [<ffffffff8118af4f>] ? __dentry_open+0x23f/0x360
> >  [<ffffffff811ca2d1>] blkdev_aio_read+0x51/0x80
> >  [<ffffffff8118dc6a>] do_sync_read+0xfa/0x140
> >  [<ffffffff8109eaf0>] ? autoremove_wake_function+0x0/0x40
> >  [<ffffffff811ca22c>] ? block_ioctl+0x3c/0x40
> >  [<ffffffff811a34c2>] ? vfs_ioctl+0x22/0xa0
> >  [<ffffffff811a3664>] ? do_vfs_ioctl+0x84/0x580
> >  [<ffffffff8122cee6>] ? security_file_permission+0x16/0x20
> >  [<ffffffff8118e625>] vfs_read+0xb5/0x1a0
> >  [<ffffffff8118e761>] sys_read+0x51/0x90
> >  [<ffffffff810e5aae>] ? __audit_syscall_exit+0x25e/0x290
> >  [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
> > Code: fe ff ff c7 85 fc fe ff ff 00 00 00 00 48 89 95 10 ff ff ff 8b 95 34 ff ff ff e8 46 ac ff ff 3b 85 34 ff ff ff 0f 84 fc 02 00 00 <0f> 0b eb fe 8b 9d 34 ff ff ff 8b 85 30 ff ff ff 01 d8 85 c0 0f
> > RIP  [<ffffffff811ce336>] __blockdev_direct_IO_newtrunc+0x986/0x1270
> >  RSP <ffff88011a11ba48>
> > ---[ end trace 73be5dcaf8939399 ]---
> > Kernel panic - not syncing: Fatal exception
> 
> That code isn't even in mainline, as far as I can tell...

Right, it is old RHEL6 code.

But I've yet to determine what changed upstream that enables this to
"just work" with a really large max_sectors (I haven't been looking
either).


* Re: virtio_blk: fix defaults for max_hw_sectors and max_segment_size
  2014-11-26 20:51       ` Mike Snitzer
@ 2014-11-26 20:54         ` Jens Axboe
  2014-11-26 21:51           ` Mike Snitzer
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2014-11-26 20:54 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Christoph Hellwig, martin.petersen, mst, rusty, linux-kernel,
	qemu-devel, dm-devel, Paolo Bonzini

On 11/26/2014 01:51 PM, Mike Snitzer wrote:
> On Wed, Nov 26 2014 at  2:48pm -0500,
> Jens Axboe <axboe@kernel.dk> wrote:
> 
>> On 11/21/2014 08:49 AM, Mike Snitzer wrote:
>>> On Fri, Nov 21 2014 at  4:54am -0500,
>>> Christoph Hellwig <hch@infradead.org> wrote:
>>>
>>>> On Thu, Nov 20, 2014 at 02:00:59PM -0500, Mike Snitzer wrote:
>>>>> virtio_blk incorrectly established -1U as the default for these
>>>>> queue_limits.  Set these limits to sane default values to avoid crashing
>>>>> the kernel.  But the virtio-blk protocol should probably be extended to
>>>>> allow proper stacking of the disk's limits from the host.
>>>>>
>>>>> This change fixes a crash that was reported when virtio-blk was used to
>>>>> test linux-dm.git commit 604ea90641b4 ("dm thin: adjust max_sectors_kb
>>>>> based on thinp blocksize") that will initially set max_sectors to
>>>>> max_hw_sectors and then rounddown to the first power-of-2 factor of the
>>>>> DM thin-pool's blocksize.  Basically that commit assumes drivers don't
>>>>> suck when establishing max_hw_sectors so it acted like a canary in the
>>>>> coal mine.
>>>>
>>>> Is that a crash in the host or guest?  What kind of mishandling did you
>>>> see?  Unless the recent virtio standard changed anything the host
>>>> should be able to handle our arbitrary limits, and even if it doesn't
>>>> that something we need to hash out with qemu and the virtio standards
>>>> folks.
>>>
>>> Some good news: this guest crash isn't an issue with recent kernels (so
>>> upstream, fedora 20, RHEL7, etc aren't impacted -- Jens feel free to
>>> drop my virtio_blk patch; even though some of it's limits are clearly
>>> broken I'll defer to the virtio_blk developers on the best way forward
>>> -- sorry for the noise!).
>>>
>>> The BUG I saw only seems to impact RHEL6 kernels so far (note to self,
>>> actually _test_ on upstream before reporting a crash against upstream!)
>>>
>>> [root@RHEL-6 ~]# echo 1073741824 > /sys/block/vdc/queue/max_sectors_kb
>>> [root@RHEL-6 ~]# lvs
>>>
>>> Message from syslogd@RHEL-6 at Nov 21 15:32:15 ...
>>>  kernel:Kernel panic - not syncing: Fatal exception
>>>
>>> Here is the RHEL6 guest crash, just for full disclosure:
>>>
>>> kernel BUG at fs/direct-io.c:696!
>>> invalid opcode: 0000 [#1] SMP
>>> last sysfs file: /sys/devices/virtual/block/dm-4/dev
>>> CPU 0
>>> Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6 ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio libcrc32c dm_mirror dm_region_hash dm_log dm_mod microcode virtio_balloon i2c_piix4 i2c_core virtio_net ext4 jbd2 mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix [last unloaded: speedstep_lib]
>>>
>>> Pid: 1679, comm: lvs Not tainted 2.6.32 #6 Bochs Bochs
>>> RIP: 0010:[<ffffffff811ce336>]  [<ffffffff811ce336>] __blockdev_direct_IO_newtrunc+0x986/0x1270
>>> RSP: 0018:ffff88011a11ba48  EFLAGS: 00010287
>>> RAX: 0000000000000000 RBX: ffff8801192fbd28 RCX: 0000000000001000
>>> RDX: ffffea0003b3d218 RSI: ffff88011aac4300 RDI: ffff880118572378
>>> RBP: ffff88011a11bbe8 R08: 0000000000000000 R09: 0000000000000000
>>> R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801192fbd00
>>> R13: 0000000000000000 R14: ffff880118c3cac0 R15: 0000000000000000
>>> FS:  00007fde78bc37a0(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 00000000012706f0 CR3: 000000011a432000 CR4: 00000000000407f0
>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>> Process lvs (pid: 1679, threadinfo ffff88011a11a000, task ffff8801185a4aa0)
>>> Stack:
>>>  ffff88011a11bb48 ffff88011a11baa8 ffff88010000000c ffff88011a11bb18
>>> <d> 0000000000000000 0000000000000000 ffff88011a11bdc8 ffff88011a11beb8
>>> <d> 0000000c1a11baa8 ffff880118c3cb98 0000000000000000 0000000018c3ccb8
>>> Call Trace:
>>>  [<ffffffff811c9e90>] ? blkdev_get_block+0x0/0x20
>>>  [<ffffffff811cec97>] __blockdev_direct_IO+0x77/0xe0
>>>  [<ffffffff811c9e90>] ? blkdev_get_block+0x0/0x20
>>>  [<ffffffff811caf17>] blkdev_direct_IO+0x57/0x60
>>>  [<ffffffff811c9e90>] ? blkdev_get_block+0x0/0x20
>>>  [<ffffffff8112619b>] generic_file_aio_read+0x6bb/0x700
>>>  [<ffffffff811cba60>] ? blkdev_get+0x10/0x20
>>>  [<ffffffff811cba70>] ? blkdev_open+0x0/0xc0
>>>  [<ffffffff8118af4f>] ? __dentry_open+0x23f/0x360
>>>  [<ffffffff811ca2d1>] blkdev_aio_read+0x51/0x80
>>>  [<ffffffff8118dc6a>] do_sync_read+0xfa/0x140
>>>  [<ffffffff8109eaf0>] ? autoremove_wake_function+0x0/0x40
>>>  [<ffffffff811ca22c>] ? block_ioctl+0x3c/0x40
>>>  [<ffffffff811a34c2>] ? vfs_ioctl+0x22/0xa0
>>>  [<ffffffff811a3664>] ? do_vfs_ioctl+0x84/0x580
>>>  [<ffffffff8122cee6>] ? security_file_permission+0x16/0x20
>>>  [<ffffffff8118e625>] vfs_read+0xb5/0x1a0
>>>  [<ffffffff8118e761>] sys_read+0x51/0x90
>>>  [<ffffffff810e5aae>] ? __audit_syscall_exit+0x25e/0x290
>>>  [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
>>> Code: fe ff ff c7 85 fc fe ff ff 00 00 00 00 48 89 95 10 ff ff ff 8b 95 34 ff ff ff e8 46 ac ff ff 3b 85 34 ff ff ff 0f 84 fc 02 00 00 <0f> 0b eb fe 8b 9d 34 ff ff ff 8b 85 30 ff ff ff 01 d8 85 c0 0f
>>> RIP  [<ffffffff811ce336>] __blockdev_direct_IO_newtrunc+0x986/0x1270
>>>  RSP <ffff88011a11ba48>
>>> ---[ end trace 73be5dcaf8939399 ]---
>>> Kernel panic - not syncing: Fatal exception
>>
>> That code isn't even in mainline, as far as I can tell...
> 
> Right, it is old RHEL6 code.
> 
> But I've yet to determine what changed upstream that enables this to
> "just work" with a really large max_sectors (I haven't been looking
> either).

Kind of hard for the rest of us to say, since it's triggering a BUG in
code we don't have :-)

-- 
Jens Axboe



* Re: virtio_blk: fix defaults for max_hw_sectors and max_segment_size
  2014-11-26 20:54         ` Jens Axboe
@ 2014-11-26 21:51           ` Mike Snitzer
  2014-11-26 21:53             ` Jens Axboe
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2014-11-26 21:51 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, martin.petersen, mst, rusty, linux-kernel,
	qemu-devel, dm-devel, Paolo Bonzini

On Wed, Nov 26 2014 at  3:54pm -0500,
Jens Axboe <axboe@kernel.dk> wrote:

> On 11/26/2014 01:51 PM, Mike Snitzer wrote:
> > On Wed, Nov 26 2014 at  2:48pm -0500,
> > Jens Axboe <axboe@kernel.dk> wrote:
> > 
> >>
> >> That code isn't even in mainline, as far as I can tell...
> > 
> > Right, it is old RHEL6 code.
> > 
> > But I've yet to determine what changed upstream that enables this to
> > "just work" with a really large max_sectors (I haven't been looking
> > either).
> 
> Kind of hard for the rest of us to say, since it's triggering a BUG in
> code we don't have :-)

I never asked you or others to weigh in on old RHEL6 code.  Once I
realized upstream worked even if max_sectors is _really_ high I said
"sorry for the noise".

But while you're here, I wouldn't mind getting your take on virtio-blk
setting max_hw_sectors to -1U.

As I said in my original reply to mst: it only makes sense to set a
really high initial upper bound like that in a driver if that driver
goes on to stack an underlying device's limit.


* Re: virtio_blk: fix defaults for max_hw_sectors and max_segment_size
  2014-11-26 21:51           ` Mike Snitzer
@ 2014-11-26 21:53             ` Jens Axboe
  2014-11-26 23:00               ` Mike Snitzer
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2014-11-26 21:53 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Christoph Hellwig, martin.petersen, mst, rusty, linux-kernel,
	qemu-devel, dm-devel, Paolo Bonzini

On 11/26/2014 02:51 PM, Mike Snitzer wrote:
> On Wed, Nov 26 2014 at  3:54pm -0500,
> Jens Axboe <axboe@kernel.dk> wrote:
> 
>> On 11/26/2014 01:51 PM, Mike Snitzer wrote:
>>> On Wed, Nov 26 2014 at  2:48pm -0500,
>>> Jens Axboe <axboe@kernel.dk> wrote:
>>>
>>>>
>>>> That code isn't even in mainline, as far as I can tell...
>>>
>>> Right, it is old RHEL6 code.
>>>
>>> But I've yet to determine what changed upstream that enables this to
>>> "just work" with a really large max_sectors (I haven't been looking
>>> either).
>>
>> Kind of hard for the rest of us to say, since it's triggering a BUG in
>> code we don't have :-)
> 
> I never asked you or others to weigh in on old RHEL6 code.  Once I
> realized upstream worked even if max_sectors is _really_ high I said
> "sorry for the noise".
> 
> But while you're here, I wouldn't mind getting your take on virtio-blk
> setting max_hw_sectors to -1U.
> 
> As I said in my original reply to mst: it only makes sense to set a
> really high initial upper bound like that in a driver if that driver
> goes on to stack an underlying device's limit.

-1U should just work, IMHO, there's no reason we should need to cap it
at some synthetic value. That said, it seems it should be one of those
parameters that should be negotiated up and set appropriately.

-- 
Jens Axboe



* Re: virtio_blk: fix defaults for max_hw_sectors and max_segment_size
  2014-11-26 21:53             ` Jens Axboe
@ 2014-11-26 23:00               ` Mike Snitzer
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Snitzer @ 2014-11-26 23:00 UTC (permalink / raw)
  To: Jens Axboe
  Cc: martin.petersen, mst, rusty, qemu-devel, linux-kernel,
	Christoph Hellwig, dm-devel, Paolo Bonzini

On Wed, Nov 26 2014 at  4:53pm -0500,
Jens Axboe <axboe@kernel.dk> wrote:

> On 11/26/2014 02:51 PM, Mike Snitzer wrote:
> > 
> > But while you're here, I wouldn't mind getting your take on virtio-blk
> > setting max_hw_sectors to -1U.
> > 
> > As I said in my original reply to mst: it only makes sense to set a
> > really high initial upper bound like that in a driver if that driver
> > goes on to stack an underlying device's limit.
> 
> -1U should just work, IMHO, there's no reason we should need to cap it
> at some synthetic value.  That said, it seems it should be one of
> those parameters that should be negotiated up and set appropriately. 

I'm saying set it to the underlying device's value for max_hw_sectors --
not some synthetic value.  So I think we're saying the same thing.

But it isn't immediately clear (to me) how that benefits virtio-blk
users (obviously they are getting by today).  So until that is pinned
down I imagine nobody will care to extend the virtio-blk protocol to
allow stacking max_hw_sectors and max_sectors up.
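
Just to sketch what such an extension might look like (purely
hypothetical -- VIRTIO_BLK_F_MAX_SECTORS and the config field below are
made up for illustration and are not part of the virtio spec):

  /* Hypothetical feature bit; the value is chosen arbitrarily here. */
  #define VIRTIO_BLK_F_MAX_SECTORS  15

  struct virtio_blk_config {
          ...
          __u32 max_sectors;  /* host's max_hw_sectors, in 512-byte sectors */
  };

and then in virtblk_probe(), mirroring how VIRTIO_BLK_F_SIZE_MAX is
handled today:

  /* Stack the host's limit if offered, else fall back to a sane default. */
  err = virtio_cread_feature(vdev, VIRTIO_BLK_F_MAX_SECTORS,
                             struct virtio_blk_config, max_sectors, &v);
  if (!err)
          blk_queue_max_hw_sectors(q, v);
  else
          blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);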

