stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request()
@ 2019-05-23 17:23 Song Liu
  2019-05-23 17:23 ` [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios Song Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Song Liu @ 2019-05-23 17:23 UTC (permalink / raw)
  To: linux-raid, stable; +Cc: axboe, Guilherme G. Piccoli, Song Liu

From: "Guilherme G. Piccoli" <gpiccoli@canonical.com>

Commit 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently
with device removal triggers a crash") introduced a NULL pointer
dereference in generic_make_request(). The patch sets q to NULL and
enter_succeeded to false; right after, there's an 'if (enter_succeeded)'
which is not taken, and then the 'else' will dereference q in
blk_queue_dying(q).

This patch just moves the 'q = NULL' to a point in which it won't trigger
the oops, although the semantics of this NULLification remains untouched.

A simple test case/reproducer is as follows:
a) Build kernel v5.2-rc1 with CONFIG_BLK_CGROUP=n.

b) Create a raid0 md array with 2 NVMe devices as members, and mount it
with an ext4 filesystem.

c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
(dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
echo 1 > /sys/block/nvme0n1/device/device/remove
(whereas nvme0n1 is the 2nd array member)

This will trigger the following oops:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
RIP: 0010:generic_make_request+0x32b/0x400
Call Trace:
 submit_bio+0x73/0x140
 ext4_io_submit+0x4d/0x60
 ext4_writepages+0x626/0xe90
 do_writepages+0x4b/0xe0
[...]

This patch has no functional changes and preserves the md/raid0 behavior
when a member is removed before kernel v4.17.

Cc: stable@vger.kernel.org # v4.17
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Eric Ren <renzhengeek@gmail.com>
Fixes: 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 block/blk-core.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a55389ba8779..d24a29244cb8 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1074,10 +1074,8 @@ blk_qc_t generic_make_request(struct bio *bio)
 			flags = 0;
 			if (bio->bi_opf & REQ_NOWAIT)
 				flags = BLK_MQ_REQ_NOWAIT;
-			if (blk_queue_enter(q, flags) < 0) {
+			if (blk_queue_enter(q, flags) < 0)
 				enter_succeeded = false;
-				q = NULL;
-			}
 		}
 
 		if (enter_succeeded) {
@@ -1108,6 +1106,7 @@ blk_qc_t generic_make_request(struct bio *bio)
 				bio_wouldblock_error(bio);
 			else
 				bio_io_error(bio);
+			q = NULL;
 		}
 		bio = bio_list_pop(&bio_list_on_stack[0]);
 	} while (bio);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-05-23 17:23 [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request() Song Liu
@ 2019-05-23 17:23 ` Song Liu
  2019-06-12 12:40   ` Guilherme Piccoli
  2019-06-12 16:37   ` Guilherme G. Piccoli
  2019-05-23 17:25 ` [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request() Song Liu
  2019-06-12 16:36 ` Guilherme G. Piccoli
  2 siblings, 2 replies; 24+ messages in thread
From: Song Liu @ 2019-05-23 17:23 UTC (permalink / raw)
  To: linux-raid, stable
  Cc: axboe, Guilherme G. Piccoli, Ming Lei, Song Liu, Tetsuo Handa, Song Liu

From: "Guilherme G. Piccoli" <gpiccoli@canonical.com>

Commit cd4a4ae4683d ("block: don't use blocking queue entered for
recursive bio submits") introduced the flag BIO_QUEUE_ENTERED in order
split bios bypass the blocking queue entering routine and use the live
non-blocking version. It was a result of an extensive discussion in
a linux-block thread[0], and the purpose of this change was to prevent
a hung task waiting on a reference to drop.

Happens that md raid0 split bios all the time, and more important,
it changes their underlying device to the raid member. After the change
introduced by this flag's usage, we experience various crashes if a raid0
member is removed during a large write. This happens because the bio
reaches the live queue entering function when the queue of the raid0
member is dying.

A simple reproducer of this behavior is presented below:
a) Build kernel v5.2-rc1 with CONFIG_BLK_DEV_THROTTLING=y.

b) Create a raid0 md array with 2 NVMe devices as members, and mount it
with an ext4 filesystem.

c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
(dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
echo 1 > /sys/block/nvme0n1/device/device/remove
(whereas nvme0n1 is the 2nd array member)

This will trigger the following warning/oops:

------------[ cut here ]------------
no blkg associated for bio on block-device: nvme0n1
WARNING: CPU: 9 PID: 184 at ./include/linux/blk-cgroup.h:785
generic_make_request_checks+0x4dd/0x690
[...]
BUG: unable to handle kernel NULL pointer dereference at 0000000000000155
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
RIP: 0010:blk_throtl_bio+0x45/0x970
[...]
Call Trace:
 generic_make_request_checks+0x1bf/0x690
 generic_make_request+0x64/0x3f0
 raid0_make_request+0x184/0x620 [raid0]
 ? raid0_make_request+0x184/0x620 [raid0]
 ? blk_queue_split+0x384/0x6d0
 md_handle_request+0x126/0x1a0
 md_make_request+0x7b/0x180
 generic_make_request+0x19e/0x3f0
 submit_bio+0x73/0x140
[...]

This patch changes raid0 driver to fallback to the "old" blocking queue
entering procedure, by clearing the BIO_QUEUE_ENTERED from raid0 bios.
This prevents the crashes and restores the regular behavior of raid0
arrays when a member is removed during a large write.

[0] https://marc.info/?l=linux-block&m=152638475806811

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Song Liu <liu.song.a23@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: stable@vger.kernel.org # v4.18
Fixes: cd4a4ae4683d ("block: don't use blocking queue entered for recursive bio submits")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 drivers/md/raid0.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index f3fb5bb8c82a..d5bdc79e0835 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -547,6 +547,7 @@ static void raid0_handle_discard(struct mddev *mddev, struct bio *bio)
 			trace_block_bio_remap(bdev_get_queue(rdev->bdev),
 				discard_bio, disk_devt(mddev->gendisk),
 				bio->bi_iter.bi_sector);
+		bio_clear_flag(bio, BIO_QUEUE_ENTERED);
 		generic_make_request(discard_bio);
 	}
 	bio_endio(bio);
@@ -602,6 +603,7 @@ static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
 				disk_devt(mddev->gendisk), bio_sector);
 	mddev_check_writesame(mddev, bio);
 	mddev_check_write_zeroes(mddev, bio);
+	bio_clear_flag(bio, BIO_QUEUE_ENTERED);
 	generic_make_request(bio);
 	return true;
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request()
  2019-05-23 17:23 [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request() Song Liu
  2019-05-23 17:23 ` [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios Song Liu
@ 2019-05-23 17:25 ` Song Liu
  2019-06-12 16:36 ` Guilherme G. Piccoli
  2 siblings, 0 replies; 24+ messages in thread
From: Song Liu @ 2019-05-23 17:25 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-raid, stable, Jens Axboe, Guilherme G. Piccoli

On Thu, May 23, 2019 at 10:24 AM Song Liu <songliubraving@fb.com> wrote:
>
> From: "Guilherme G. Piccoli" <gpiccoli@canonical.com>
>
> Commit 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently
> with device removal triggers a crash") introduced a NULL pointer
> dereference in generic_make_request(). The patch sets q to NULL and
> enter_succeeded to false; right after, there's an 'if (enter_succeeded)'
> which is not taken, and then the 'else' will dereference q in
> blk_queue_dying(q).
>
> This patch just moves the 'q = NULL' to a point in which it won't trigger
> the oops, although the semantics of this NULLification remains untouched.
>
> A simple test case/reproducer is as follows:
> a) Build kernel v5.2-rc1 with CONFIG_BLK_CGROUP=n.
>
> b) Create a raid0 md array with 2 NVMe devices as members, and mount it
> with an ext4 filesystem.
>
> c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
> (dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
> echo 1 > /sys/block/nvme0n1/device/device/remove
> (whereas nvme0n1 is the 2nd array member)
>
> This will trigger the following oops:
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> RIP: 0010:generic_make_request+0x32b/0x400
> Call Trace:
>  submit_bio+0x73/0x140
>  ext4_io_submit+0x4d/0x60
>  ext4_writepages+0x626/0xe90
>  do_writepages+0x4b/0xe0
> [...]
>
> This patch has no functional changes and preserves the md/raid0 behavior
> when a member is removed before kernel v4.17.
>
> Cc: stable@vger.kernel.org # v4.17
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Tested-by: Eric Ren <renzhengeek@gmail.com>
> Fixes: 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash")
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
> Signed-off-by: Song Liu <songliubraving@fb.com>

Please note this patchset is only for stable.

Thanks,
Song

> ---
>  block/blk-core.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index a55389ba8779..d24a29244cb8 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1074,10 +1074,8 @@ blk_qc_t generic_make_request(struct bio *bio)
>                         flags = 0;
>                         if (bio->bi_opf & REQ_NOWAIT)
>                                 flags = BLK_MQ_REQ_NOWAIT;
> -                       if (blk_queue_enter(q, flags) < 0) {
> +                       if (blk_queue_enter(q, flags) < 0)
>                                 enter_succeeded = false;
> -                               q = NULL;
> -                       }
>                 }
>
>                 if (enter_succeeded) {
> @@ -1108,6 +1106,7 @@ blk_qc_t generic_make_request(struct bio *bio)
>                                 bio_wouldblock_error(bio);
>                         else
>                                 bio_io_error(bio);
> +                       q = NULL;
>                 }
>                 bio = bio_list_pop(&bio_list_on_stack[0]);
>         } while (bio);
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-05-23 17:23 ` [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios Song Liu
@ 2019-06-12 12:40   ` Guilherme Piccoli
  2019-06-12 12:48     ` Greg KH
  2019-06-12 16:37   ` Guilherme G. Piccoli
  1 sibling, 1 reply; 24+ messages in thread
From: Guilherme Piccoli @ 2019-06-12 12:40 UTC (permalink / raw)
  To: stable, gregkh, sashal
  Cc: Song Liu, linux-raid, Jens Axboe, Ming Lei, Song Liu, Tetsuo Handa

Hi Greg and Sasha, is there any news about these patches?
Just checked the stable branches 5.1.y and 5.0.y, they seem not merged.

If there's anything pending from my side, let me know.
Thanks in advance,


Guilherme

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-06-12 12:40   ` Guilherme Piccoli
@ 2019-06-12 12:48     ` Greg KH
  2019-06-12 16:38       ` Guilherme G. Piccoli
  0 siblings, 1 reply; 24+ messages in thread
From: Greg KH @ 2019-06-12 12:48 UTC (permalink / raw)
  To: Guilherme Piccoli
  Cc: stable, sashal, Song Liu, linux-raid, Jens Axboe, Ming Lei,
	Song Liu, Tetsuo Handa

On Wed, Jun 12, 2019 at 09:40:17AM -0300, Guilherme Piccoli wrote:
> Hi Greg and Sasha, is there any news about these patches?

What patches?

> Just checked the stable branches 5.1.y and 5.0.y, they seem not merged.

Are they merged in Linus's tree?  What are the git commit ids?

I have no record of these patches in my queue at the moment, sorry.  If
these were a backport, please resend them in the proper format.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request()
  2019-05-23 17:23 [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request() Song Liu
  2019-05-23 17:23 ` [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios Song Liu
  2019-05-23 17:25 ` [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request() Song Liu
@ 2019-06-12 16:36 ` Guilherme G. Piccoli
  2 siblings, 0 replies; 24+ messages in thread
From: Guilherme G. Piccoli @ 2019-06-12 16:36 UTC (permalink / raw)
  To: stable; +Cc: Song Liu, linux-raid, axboe, gregkh, sashal

+ Greg, Sasha


On 23/05/2019 14:23, Song Liu wrote:
> From: "Guilherme G. Piccoli" <gpiccoli@canonical.com>
> 
> Commit 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently
> with device removal triggers a crash") introduced a NULL pointer
> dereference in generic_make_request(). The patch sets q to NULL and
> enter_succeeded to false; right after, there's an 'if (enter_succeeded)'
> which is not taken, and then the 'else' will dereference q in
> blk_queue_dying(q).
> 
> This patch just moves the 'q = NULL' to a point in which it won't trigger
> the oops, although the semantics of this NULLification remains untouched.
> 
> A simple test case/reproducer is as follows:
> a) Build kernel v5.2-rc1 with CONFIG_BLK_CGROUP=n.
> 
> b) Create a raid0 md array with 2 NVMe devices as members, and mount it
> with an ext4 filesystem.
> 
> c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
> (dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
> echo 1 > /sys/block/nvme0n1/device/device/remove
> (whereas nvme0n1 is the 2nd array member)
> 
> This will trigger the following oops:
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> RIP: 0010:generic_make_request+0x32b/0x400
> Call Trace:
>  submit_bio+0x73/0x140
>  ext4_io_submit+0x4d/0x60
>  ext4_writepages+0x626/0xe90
>  do_writepages+0x4b/0xe0
> [...]
> 
> This patch has no functional changes and preserves the md/raid0 behavior
> when a member is removed before kernel v4.17.
> 
> Cc: stable@vger.kernel.org # v4.17
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Tested-by: Eric Ren <renzhengeek@gmail.com>
> Fixes: 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash")
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  block/blk-core.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index a55389ba8779..d24a29244cb8 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1074,10 +1074,8 @@ blk_qc_t generic_make_request(struct bio *bio)
>  			flags = 0;
>  			if (bio->bi_opf & REQ_NOWAIT)
>  				flags = BLK_MQ_REQ_NOWAIT;
> -			if (blk_queue_enter(q, flags) < 0) {
> +			if (blk_queue_enter(q, flags) < 0)
>  				enter_succeeded = false;
> -				q = NULL;
> -			}
>  		}
>  
>  		if (enter_succeeded) {
> @@ -1108,6 +1106,7 @@ blk_qc_t generic_make_request(struct bio *bio)
>  				bio_wouldblock_error(bio);
>  			else
>  				bio_io_error(bio);
> +			q = NULL;
>  		}
>  		bio = bio_list_pop(&bio_list_on_stack[0]);
>  	} while (bio);
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-05-23 17:23 ` [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios Song Liu
  2019-06-12 12:40   ` Guilherme Piccoli
@ 2019-06-12 16:37   ` Guilherme G. Piccoli
  2019-06-12 16:49     ` Greg KH
  1 sibling, 1 reply; 24+ messages in thread
From: Guilherme G. Piccoli @ 2019-06-12 16:37 UTC (permalink / raw)
  To: stable
  Cc: Song Liu, linux-raid, axboe, Ming Lei, Song Liu, Tetsuo Handa,
	gregkh, sashal

+Greg, Sasha


On 23/05/2019 14:23, Song Liu wrote:
> From: "Guilherme G. Piccoli" <gpiccoli@canonical.com>
> 
> Commit cd4a4ae4683d ("block: don't use blocking queue entered for
> recursive bio submits") introduced the flag BIO_QUEUE_ENTERED in order
> split bios bypass the blocking queue entering routine and use the live
> non-blocking version. It was a result of an extensive discussion in
> a linux-block thread[0], and the purpose of this change was to prevent
> a hung task waiting on a reference to drop.
> 
> Happens that md raid0 split bios all the time, and more important,
> it changes their underlying device to the raid member. After the change
> introduced by this flag's usage, we experience various crashes if a raid0
> member is removed during a large write. This happens because the bio
> reaches the live queue entering function when the queue of the raid0
> member is dying.
> 
> A simple reproducer of this behavior is presented below:
> a) Build kernel v5.2-rc1 with CONFIG_BLK_DEV_THROTTLING=y.
> 
> b) Create a raid0 md array with 2 NVMe devices as members, and mount it
> with an ext4 filesystem.
> 
> c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
> (dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
> echo 1 > /sys/block/nvme0n1/device/device/remove
> (whereas nvme0n1 is the 2nd array member)
> 
> This will trigger the following warning/oops:
> 
> ------------[ cut here ]------------
> no blkg associated for bio on block-device: nvme0n1
> WARNING: CPU: 9 PID: 184 at ./include/linux/blk-cgroup.h:785
> generic_make_request_checks+0x4dd/0x690
> [...]
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000155
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> RIP: 0010:blk_throtl_bio+0x45/0x970
> [...]
> Call Trace:
>  generic_make_request_checks+0x1bf/0x690
>  generic_make_request+0x64/0x3f0
>  raid0_make_request+0x184/0x620 [raid0]
>  ? raid0_make_request+0x184/0x620 [raid0]
>  ? blk_queue_split+0x384/0x6d0
>  md_handle_request+0x126/0x1a0
>  md_make_request+0x7b/0x180
>  generic_make_request+0x19e/0x3f0
>  submit_bio+0x73/0x140
> [...]
> 
> This patch changes raid0 driver to fallback to the "old" blocking queue
> entering procedure, by clearing the BIO_QUEUE_ENTERED from raid0 bios.
> This prevents the crashes and restores the regular behavior of raid0
> arrays when a member is removed during a large write.
> 
> [0] https://marc.info/?l=linux-block&m=152638475806811
> 
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Song Liu <liu.song.a23@gmail.com>
> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: stable@vger.kernel.org # v4.18
> Fixes: cd4a4ae4683d ("block: don't use blocking queue entered for recursive bio submits")
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  drivers/md/raid0.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index f3fb5bb8c82a..d5bdc79e0835 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -547,6 +547,7 @@ static void raid0_handle_discard(struct mddev *mddev, struct bio *bio)
>  			trace_block_bio_remap(bdev_get_queue(rdev->bdev),
>  				discard_bio, disk_devt(mddev->gendisk),
>  				bio->bi_iter.bi_sector);
> +		bio_clear_flag(bio, BIO_QUEUE_ENTERED);
>  		generic_make_request(discard_bio);
>  	}
>  	bio_endio(bio);
> @@ -602,6 +603,7 @@ static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
>  				disk_devt(mddev->gendisk), bio_sector);
>  	mddev_check_writesame(mddev, bio);
>  	mddev_check_write_zeroes(mddev, bio);
> +	bio_clear_flag(bio, BIO_QUEUE_ENTERED);
>  	generic_make_request(bio);
>  	return true;
>  }
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-06-12 12:48     ` Greg KH
@ 2019-06-12 16:38       ` Guilherme G. Piccoli
  0 siblings, 0 replies; 24+ messages in thread
From: Guilherme G. Piccoli @ 2019-06-12 16:38 UTC (permalink / raw)
  To: Greg KH
  Cc: stable, sashal, Song Liu, linux-raid, Jens Axboe, Ming Lei,
	Song Liu, Tetsuo Handa

On 12/06/2019 09:48, Greg KH wrote:
> On Wed, Jun 12, 2019 at 09:40:17AM -0300, Guilherme Piccoli wrote:
>> Hi Greg and Sasha, is there any news about these patches?
> 
> What patches?
> 
>> Just checked the stable branches 5.1.y and 5.0.y, they seem not merged.
> 
> Are they merged in Linus's tree?  What are the git commit ids?
> 

Hi Greg, thanks for your prompt response! The story behind these patches
is a bit confusing; I'll summarize below.

I've submitted 2 patches with fixes to linux-raid in April; if you wanna
take a look, they are:

https://marc.info/?l=linux-raid&m=155839017427565
https://marc.info/?l=linux-raid&m=155839017827569

After some discussion, it was determined a patchset from Ming Lei fixed
the same issues I've reported/fixed, but it was a rework (and depends on
the removal of legacy IO path).
That said, Song Liu submitted both of my fixes as stable-only patches.
I couldn't find a stable archive (and I don't subscribe to that), so I
cannot point links here.

I just resubmitted both patches, this time CCing you and Sasha. Let me
know if there's anything needed - I'd prefer to have them upstream too,
but the discussion with Christoph/Ming/Song reached a consensus that it
wouldn't make sense to add them and soon after add Ming's patchset,
which removes that code.
But backporting Ming's series is not really simple/feasible.

Thanks,


Guilherme


> I have no record of these patches in my queue at the moment, sorry.  If
> these were a backport, please resend them in the proper format.
> 
> thanks,
> 
> greg k-h
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-06-12 16:37   ` Guilherme G. Piccoli
@ 2019-06-12 16:49     ` Greg KH
  2019-06-12 18:07       ` Guilherme Piccoli
  0 siblings, 1 reply; 24+ messages in thread
From: Greg KH @ 2019-06-12 16:49 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: stable, Song Liu, linux-raid, axboe, Ming Lei, Song Liu,
	Tetsuo Handa, sashal

On Wed, Jun 12, 2019 at 01:37:24PM -0300, Guilherme G. Piccoli wrote:
> +Greg, Sasha

Please resend them in a format that they can be applied in.

Also, I need a TON of descriptions about why this differs from what is
in Linus's tree, as it is, what you have below does not show that at
all, they seem to be valud for 5.2-rc1.

And I need acks from the maintainers of the subsystems.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-06-12 16:49     ` Greg KH
@ 2019-06-12 18:07       ` Guilherme Piccoli
  2019-06-12 18:36         ` Greg KH
  2019-06-12 18:43         ` Sasha Levin
  0 siblings, 2 replies; 24+ messages in thread
From: Guilherme Piccoli @ 2019-06-12 18:07 UTC (permalink / raw)
  To: Greg KH
  Cc: stable, Song Liu, linux-raid, Jens Axboe, Ming Lei, Song Liu,
	Tetsuo Handa, sashal

On Wed, Jun 12, 2019 at 1:50 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Wed, Jun 12, 2019 at 01:37:24PM -0300, Guilherme G. Piccoli wrote:
> > +Greg, Sasha
>
> Please resend them in a format that they can be applied in.
>
> Also, I need a TON of descriptions about why this differs from what is
> in Linus's tree, as it is, what you have below does not show that at
> all, they seem to be valud for 5.2-rc1.

Thanks Greg, I'll work on it. Can this "ton" of description be a cover-letter?
Cheers,


Guilherme

>
> And I need acks from the maintainers of the subsystems.
>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-06-12 18:07       ` Guilherme Piccoli
@ 2019-06-12 18:36         ` Greg KH
  2019-06-12 18:43         ` Sasha Levin
  1 sibling, 0 replies; 24+ messages in thread
From: Greg KH @ 2019-06-12 18:36 UTC (permalink / raw)
  To: Guilherme Piccoli
  Cc: stable, Song Liu, linux-raid, Jens Axboe, Ming Lei, Song Liu,
	Tetsuo Handa, sashal

On Wed, Jun 12, 2019 at 03:07:11PM -0300, Guilherme Piccoli wrote:
> On Wed, Jun 12, 2019 at 1:50 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Wed, Jun 12, 2019 at 01:37:24PM -0300, Guilherme G. Piccoli wrote:
> > > +Greg, Sasha
> >
> > Please resend them in a format that they can be applied in.
> >
> > Also, I need a TON of descriptions about why this differs from what is
> > in Linus's tree, as it is, what you have below does not show that at
> > all, they seem to be valud for 5.2-rc1.
> 
> Thanks Greg, I'll work on it. Can this "ton" of description be a cover-letter?

Nope, in the commit log itself as we need it in the tree if we apply it.

Remember, 99% of all patches that we take in the stable tree that are
NOT in Linus's tree end up having bugs.  Yours will, and we want lots of
documentation about exactly why we are thinking we are justified in
taking this patch :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-06-12 18:07       ` Guilherme Piccoli
  2019-06-12 18:36         ` Greg KH
@ 2019-06-12 18:43         ` Sasha Levin
  2019-06-12 18:48           ` Guilherme Piccoli
  1 sibling, 1 reply; 24+ messages in thread
From: Sasha Levin @ 2019-06-12 18:43 UTC (permalink / raw)
  To: Guilherme Piccoli
  Cc: Greg KH, stable, Song Liu, linux-raid, Jens Axboe, Ming Lei,
	Song Liu, Tetsuo Handa

On Wed, Jun 12, 2019 at 03:07:11PM -0300, Guilherme Piccoli wrote:
>On Wed, Jun 12, 2019 at 1:50 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>>
>> On Wed, Jun 12, 2019 at 01:37:24PM -0300, Guilherme G. Piccoli wrote:
>> > +Greg, Sasha
>>
>> Please resend them in a format that they can be applied in.
>>
>> Also, I need a TON of descriptions about why this differs from what is
>> in Linus's tree, as it is, what you have below does not show that at
>> all, they seem to be valud for 5.2-rc1.
>
>Thanks Greg, I'll work on it. Can this "ton" of description be a cover-letter?

Please just add it to the commit message. We might need to refer to it
in the future and cover letter will just get lost.

--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-06-12 18:43         ` Sasha Levin
@ 2019-06-12 18:48           ` Guilherme Piccoli
  0 siblings, 0 replies; 24+ messages in thread
From: Guilherme Piccoli @ 2019-06-12 18:48 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Greg KH, stable, Song Liu, linux-raid, Jens Axboe, Ming Lei,
	Song Liu, Tetsuo Handa

OK perfect, thank you both!

On Wed, Jun 12, 2019 at 3:43 PM Sasha Levin <sashal@kernel.org> wrote:
>
> On Wed, Jun 12, 2019 at 03:07:11PM -0300, Guilherme Piccoli wrote:
> >On Wed, Jun 12, 2019 at 1:50 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> >>
> >> On Wed, Jun 12, 2019 at 01:37:24PM -0300, Guilherme G. Piccoli wrote:
> >> > +Greg, Sasha
> >>
> >> Please resend them in a format that they can be applied in.
> >>
> >> Also, I need a TON of descriptions about why this differs from what is
> >> in Linus's tree, as it is, what you have below does not show that at
> >> all, they seem to be valud for 5.2-rc1.
> >
> >Thanks Greg, I'll work on it. Can this "ton" of description be a cover-letter?
>
> Please just add it to the commit message. We might need to refer to it
> in the future and cover letter will just get lost.
>
> --
> Thanks,
> Sasha

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-05-20 16:23                 ` Song Liu
@ 2019-05-20 19:25                   ` Guilherme Piccoli
  0 siblings, 0 replies; 24+ messages in thread
From: Guilherme Piccoli @ 2019-05-20 19:25 UTC (permalink / raw)
  To: Song Liu
  Cc: Guilherme G. Piccoli, Wols Lists, axboe, linux-block, linux-raid,
	dm-devel, Gavin Guo, Jay Vosburgh, Ming Lei, Tetsuo Handa,
	stable

On Mon, May 20, 2019 at 1:24 PM Song Liu <liu.song.a23@gmail.com> wrote:
>
> On Fri, May 17, 2019 at 9:19 AM Guilherme G. Piccoli
>
> I will process it. It was delayed due to the merge window.
>
> Thanks,
> Song


Thank you Song! I'm ready to send a v2, just to match the v2 of patch
"1/2" of this series (but no
change in this one, except rebase to 5.2-rc1).

Cheers,

Guilherme

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-05-17 16:19               ` Guilherme G. Piccoli
@ 2019-05-20 16:23                 ` Song Liu
  2019-05-20 19:25                   ` Guilherme Piccoli
  0 siblings, 1 reply; 24+ messages in thread
From: Song Liu @ 2019-05-20 16:23 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Wols Lists, axboe, Guilherme G. Piccoli, linux-block, linux-raid,
	dm-devel, Gavin Guo, Jay Vosburgh, Ming Lei, Tetsuo Handa,
	stable

On Fri, May 17, 2019 at 9:19 AM Guilherme G. Piccoli
<kernel@gpiccoli.net> wrote:
>
> Jens / Song, any news in this one?
>
> I think would be good to have this raid0 fix rather sooner than later
> if possible - it's easy
> to reproduce the crash.
>
> Thanks,
>
>
> Guilherme

I will process it. It was delayed due to the merge window.

Thanks,
Song

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-05-08 16:52             ` Wols Lists
@ 2019-05-17 16:19               ` Guilherme G. Piccoli
  2019-05-20 16:23                 ` Song Liu
  0 siblings, 1 reply; 24+ messages in thread
From: Guilherme G. Piccoli @ 2019-05-17 16:19 UTC (permalink / raw)
  To: Wols Lists, Song Liu, axboe
  Cc: Guilherme G. Piccoli, linux-block, linux-raid, dm-devel,
	Gavin Guo, Jay Vosburgh, Ming Lei, Tetsuo Handa, stable

Jens / Song, any news in this one?

I think would be good to have this raid0 fix rather sooner than later
if possible - it's easy
to reproduce the crash.

Thanks,


Guilherme

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-05-08 14:52           ` Guilherme G. Piccoli
@ 2019-05-08 16:52             ` Wols Lists
  2019-05-17 16:19               ` Guilherme G. Piccoli
  0 siblings, 1 reply; 24+ messages in thread
From: Wols Lists @ 2019-05-08 16:52 UTC (permalink / raw)
  To: Guilherme G. Piccoli, Song Liu
  Cc: Guilherme G. Piccoli, linux-block, linux-raid, dm-devel, axboe,
	Gavin Guo, Jay Vosburgh, kernel, Ming Lei, Tetsuo Handa, stable

On 08/05/19 15:52, Guilherme G. Piccoli wrote:
> Hi, I understand your concern. But all other raid levels contains
> failure-event mechanisms. For example, in all my tests with raid5 or
> raid1, it first complained the device was removed, then it failed in
> super_written() when no other available device was present.
> On the other hand, raid0 does "blind-writes": it just selects the device
> in which that bio should be written (given the stripe math) and change
> the bio's device, sending it back via generic_make_request(). It's
> dummy, but not in a bad way, but rather for performance reasons. It has
> no "intelligence" for failures, as all other raid levels.
> 
> That said, we could fix md.c for all raid levels, but I personally think
> it's a bazooka shot, only raid0 shows consistently this issue.
> 
The academic in me says we should push that error handling into
generic_make_request() or some raid function in md.c that deals with
those problems. Sounds like there's a fair bit of duplicate
functionality that could be re-factored out.
>>
>> Academic purity versus engineering practicality :-)
> 
> Heheh you have good points here! Thanks for the input =)
> Cheers,
> 
Doesn't help when there's not an architect to design an overall "grand
scheme", but my usual way of working is to design top down academically,
and then ask myself "what do I need" before implementing bottom-up.
Hopefully with a load of documentation saying "I haven't done this
because I don't need it, but this is where it goes".

Cheers,
Wol


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-05-08  9:29         ` Wols Lists
@ 2019-05-08 14:52           ` Guilherme G. Piccoli
  2019-05-08 16:52             ` Wols Lists
  0 siblings, 1 reply; 24+ messages in thread
From: Guilherme G. Piccoli @ 2019-05-08 14:52 UTC (permalink / raw)
  To: Wols Lists, Song Liu
  Cc: Guilherme G. Piccoli, linux-block, linux-raid, dm-devel, axboe,
	Gavin Guo, Jay Vosburgh, kernel, Ming Lei, Tetsuo Handa, stable

On 5/8/19 6:29 AM, Wols Lists wrote:
> On 06/05/19 22:07, Song Liu wrote:
>> Could you please run a quick test with raid5? I am wondering whether
>> some race condition could get us into similar crash. If we cannot easily
>> trigger the bug, we can process with this version.
> 
> Bear in mind I just read the list and write documentation, but ...
> 
> My gut feeling is that if it can theoretically happen for all raid
> modes, it should be fixed for all raid modes. What happens if code
> changes elsewhere and suddenly it really does happen for say raid-5?
> 
> On the other hand, if fixing it in md.c only gets tested for raid-0, how
> do we know it will actually work for other raids if they do suddenly
> start falling through.

Hi, I understand your concern. But all other raid levels contains 
failure-event mechanisms. For example, in all my tests with raid5 or 
raid1, it first complained the device was removed, then it failed in 
super_written() when no other available device was present.
On the other hand, raid0 does "blind-writes": it just selects the device 
in which that bio should be written (given the stripe math) and change 
the bio's device, sending it back via generic_make_request(). It's 
dummy, but not in a bad way, but rather for performance reasons. It has 
no "intelligence" for failures, as all other raid levels.

That said, we could fix md.c for all raid levels, but I personally think 
it's a bazooka shot, only raid0 shows consistently this issue.

> 
> Academic purity versus engineering practicality :-)

Heheh you have good points here! Thanks for the input =)
Cheers,


Guilherme


> 
> Cheers,
> Wol
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-05-06 21:07       ` Song Liu
  2019-05-07 21:51         ` Guilherme G. Piccoli
@ 2019-05-08  9:29         ` Wols Lists
  2019-05-08 14:52           ` Guilherme G. Piccoli
  1 sibling, 1 reply; 24+ messages in thread
From: Wols Lists @ 2019-05-08  9:29 UTC (permalink / raw)
  To: Song Liu, Guilherme G. Piccoli
  Cc: linux-block, linux-raid, dm-devel, axboe, Gavin Guo,
	Jay Vosburgh, kernel, Ming Lei, Tetsuo Handa, stable

On 06/05/19 22:07, Song Liu wrote:
> Could you please run a quick test with raid5? I am wondering whether
> some race condition could get us into similar crash. If we cannot easily
> trigger the bug, we can process with this version.

Bear in mind I just read the list and write documentation, but ...

My gut feeling is that if it can theoretically happen for all raid
modes, it should be fixed for all raid modes. What happens if code
changes elsewhere and suddenly it really does happen for say raid-5?

On the other hand, if fixing it in md.c only gets tested for raid-0, how
do we know it will actually work for other raids if they do suddenly
start falling through.

Academic purity versus engineering practicality :-)

Cheers,
Wol

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-05-06 21:07       ` Song Liu
@ 2019-05-07 21:51         ` Guilherme G. Piccoli
  2019-05-08  9:29         ` Wols Lists
  1 sibling, 0 replies; 24+ messages in thread
From: Guilherme G. Piccoli @ 2019-05-07 21:51 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-block, linux-raid, dm-devel, axboe, Gavin Guo,
	Jay Vosburgh, kernel, Ming Lei, Tetsuo Handa, stable

On 06/05/2019 18:07, Song Liu wrote:
>> [...]
>> I understand this could in theory affects all the RAID levels, but in
>> practice I don't think it'll happen. RAID0 is the only "blind" mode of
>> RAID, in the sense it's the only one that doesn't care at all with
>> failures. In fact, this was the origin of my other thread [0], regarding
>> the change of raid0's behavior in error cases..because it currently does
>> not care with members being removed and rely only in filesystem failures
>> (after submitting many BIOs to the removed device).
>>
>> That said, in this change I've only took care of raid0, since in my
>> understanding the other levels won't submit BIOs to dead devices; we can
>> experiment that to see if it's true.
> 
> Could you please run a quick test with raid5? I am wondering whether
> some race condition could get us into similar crash. If we cannot easily
> trigger the bug, we can process with this version.
> 
> Thanks,
> Song

Hi Song, I've tested both RAID5 (with 3 disks, removing one at a time),
and also RAID 1 (2 disks, also removing one at a time); no issues
observed in kernel 5.1. We can see one interesting message in kernel
log: "super_written gets error=10", which corresponds to md detecting
the error (bi_status == BLK_STS_IOERROR) and instantly failing the
write, making FS read-only.

So, I think really the issue happens only in RAID0, which writes
"blindly" to its components.
Let me know your thoughts - thanks again for your input!

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-05-06 18:48     ` Guilherme G. Piccoli
@ 2019-05-06 21:07       ` Song Liu
  2019-05-07 21:51         ` Guilherme G. Piccoli
  2019-05-08  9:29         ` Wols Lists
  0 siblings, 2 replies; 24+ messages in thread
From: Song Liu @ 2019-05-06 21:07 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: linux-block, linux-raid, dm-devel, axboe, Gavin Guo,
	Jay Vosburgh, kernel, Ming Lei, Tetsuo Handa, stable

On Mon, May 6, 2019 at 2:48 PM Guilherme G. Piccoli
<gpiccoli@canonical.com> wrote:
>
> On 06/05/2019 13:50, Song Liu wrote:
> > [...]
> > IIUC, we need this for all raid types. Is it possible to fix that in md.c so
> > all types get the fix?
> >
> > Thanks,
> > Song
> >
>
> Hi Song, thanks again for reviewing my code and provide input, much
> appreciated!
>
> I understand this could in theory affects all the RAID levels, but in
> practice I don't think it'll happen. RAID0 is the only "blind" mode of
> RAID, in the sense it's the only one that doesn't care at all with
> failures. In fact, this was the origin of my other thread [0], regarding
> the change of raid0's behavior in error cases..because it currently does
> not care with members being removed and rely only in filesystem failures
> (after submitting many BIOs to the removed device).
>
> That said, in this change I've only took care of raid0, since in my
> understanding the other levels won't submit BIOs to dead devices; we can
> experiment that to see if it's true.

Could you please run a quick test with raid5? I am wondering whether
some race condition could get us into similar crash. If we cannot easily
trigger the bug, we can process with this version.

Thanks,
Song

>
> But I'd be happy to change all other levels also if you think it's
> appropriate (or a simple generic change to md.c if it is enough). Do you
> think we could go ahead with this change, and further improve that (to
> cover all raid cases if necessary)?
>
> Cheers,
>
>
> Guilherme
>
>
>
> [0] https://marc.info/?l=linux-raid&m=155562509905735

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-05-06 16:50   ` Song Liu
@ 2019-05-06 18:48     ` Guilherme G. Piccoli
  2019-05-06 21:07       ` Song Liu
  0 siblings, 1 reply; 24+ messages in thread
From: Guilherme G. Piccoli @ 2019-05-06 18:48 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-block, linux-raid, dm-devel, axboe, Gavin Guo,
	Jay Vosburgh, kernel, Ming Lei, Tetsuo Handa, stable

On 06/05/2019 13:50, Song Liu wrote:
> [...] 
> IIUC, we need this for all raid types. Is it possible to fix that in md.c so
> all types get the fix?
> 
> Thanks,
> Song
> 

Hi Song, thanks again for reviewing my code and provide input, much
appreciated!

I understand this could in theory affects all the RAID levels, but in
practice I don't think it'll happen. RAID0 is the only "blind" mode of
RAID, in the sense it's the only one that doesn't care at all with
failures. In fact, this was the origin of my other thread [0], regarding
the change of raid0's behavior in error cases..because it currently does
not care with members being removed and rely only in filesystem failures
(after submitting many BIOs to the removed device).

That said, in this change I've only took care of raid0, since in my
understanding the other levels won't submit BIOs to dead devices; we can
experiment that to see if it's true.

But I'd be happy to change all other levels also if you think it's
appropriate (or a simple generic change to md.c if it is enough). Do you
think we could go ahead with this change, and further improve that (to
cover all raid cases if necessary)?

Cheers,


Guilherme



[0] https://marc.info/?l=linux-raid&m=155562509905735

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-04-30 22:37 ` [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios Guilherme G. Piccoli
@ 2019-05-06 16:50   ` Song Liu
  2019-05-06 18:48     ` Guilherme G. Piccoli
  0 siblings, 1 reply; 24+ messages in thread
From: Song Liu @ 2019-05-06 16:50 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: linux-block, linux-raid, dm-devel, axboe, Gavin Guo,
	Jay Vosburgh, kernel, Ming Lei, Tetsuo Handa, stable

On Tue, Apr 30, 2019 at 6:38 PM Guilherme G. Piccoli
<gpiccoli@canonical.com> wrote:
>
> Commit cd4a4ae4683d ("block: don't use blocking queue entered for
> recursive bio submits") introduced the flag BIO_QUEUE_ENTERED in order
> split bios bypass the blocking queue entering routine and use the live
> non-blocking version. It was a result of an extensive discussion in
> a linux-block thread[0], and the purpose of this change was to prevent
> a hung task waiting on a reference to drop.
>
> Happens that md raid0 split bios all the time, and more important,
> it changes their underlying device to the raid member. After the change
> introduced by this flag's usage, we experience various crashes if a raid0
> member is removed during a large write. This happens because the bio
> reaches the live queue entering function when the queue of the raid0
> member is dying.
>
> A simple reproducer of this behavior is presented below:
> a) Build kernel v5.1-rc7 with CONFIG_BLK_DEV_THROTTLING=y.
>
> b) Create a raid0 md array with 2 NVMe devices as members, and mount it
> with an ext4 filesystem.
>
> c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
> (dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
> echo 1 > /sys/block/nvme0n1/device/device/remove
> (whereas nvme0n1 is the 2nd array member)
>
> This will trigger the following warning/oops:
>
> ------------[ cut here ]------------
> no blkg associated for bio on block-device: nvme0n1
> WARNING: CPU: 9 PID: 184 at ./include/linux/blk-cgroup.h:785
> generic_make_request_checks+0x4dd/0x690
> [...]
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000155
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> RIP: 0010:blk_throtl_bio+0x45/0x970
> [...]
> Call Trace:
>  generic_make_request_checks+0x1bf/0x690
>  generic_make_request+0x64/0x3f0
>  raid0_make_request+0x184/0x620 [raid0]
>  ? raid0_make_request+0x184/0x620 [raid0]
>  ? blk_queue_split+0x384/0x6d0
>  md_handle_request+0x126/0x1a0
>  md_make_request+0x7b/0x180
>  generic_make_request+0x19e/0x3f0
>  submit_bio+0x73/0x140
> [...]
>
> This patch changes raid0 driver to fallback to the "old" blocking queue
> entering procedure, by clearing the BIO_QUEUE_ENTERED from raid0 bios.
> This prevents the crashes and restores the regular behavior of raid0
> arrays when a member is removed during a large write.
>
> [0] https://marc.info/?l=linux-block&m=152638475806811
>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: stable@vger.kernel.org # v4.18
> Fixes: cd4a4ae4683d ("block: don't use blocking queue entered for recursive bio submits")
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>

IIUC, we need this for all raid types. Is it possible to fix that in md.c so
all types get the fix?

Thanks,
Song

> ---
>  drivers/md/raid0.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index f3fb5bb8c82a..d5bdc79e0835 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -547,6 +547,7 @@ static void raid0_handle_discard(struct mddev *mddev, struct bio *bio)
>                         trace_block_bio_remap(bdev_get_queue(rdev->bdev),
>                                 discard_bio, disk_devt(mddev->gendisk),
>                                 bio->bi_iter.bi_sector);
> +               bio_clear_flag(bio, BIO_QUEUE_ENTERED);
>                 generic_make_request(discard_bio);
>         }
>         bio_endio(bio);
> @@ -602,6 +603,7 @@ static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
>                                 disk_devt(mddev->gendisk), bio_sector);
>         mddev_check_writesame(mddev, bio);
>         mddev_check_write_zeroes(mddev, bio);
> +       bio_clear_flag(bio, BIO_QUEUE_ENTERED);
>         generic_make_request(bio);
>         return true;
>  }
> --
> 2.21.0
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0  bios
  2019-04-30 22:37 Guilherme G. Piccoli
@ 2019-04-30 22:37 ` Guilherme G. Piccoli
  2019-05-06 16:50   ` Song Liu
  0 siblings, 1 reply; 24+ messages in thread
From: Guilherme G. Piccoli @ 2019-04-30 22:37 UTC (permalink / raw)
  To: linux-block, linux-raid
  Cc: dm-devel, axboe, gavin.guo, jay.vosburgh, gpiccoli, kernel,
	Ming Lei, Tetsuo Handa, stable

Commit cd4a4ae4683d ("block: don't use blocking queue entered for
recursive bio submits") introduced the flag BIO_QUEUE_ENTERED in order
split bios bypass the blocking queue entering routine and use the live
non-blocking version. It was a result of an extensive discussion in
a linux-block thread[0], and the purpose of this change was to prevent
a hung task waiting on a reference to drop.

Happens that md raid0 split bios all the time, and more important,
it changes their underlying device to the raid member. After the change
introduced by this flag's usage, we experience various crashes if a raid0
member is removed during a large write. This happens because the bio
reaches the live queue entering function when the queue of the raid0
member is dying.

A simple reproducer of this behavior is presented below:
a) Build kernel v5.1-rc7 with CONFIG_BLK_DEV_THROTTLING=y.

b) Create a raid0 md array with 2 NVMe devices as members, and mount it
with an ext4 filesystem.

c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
(dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
echo 1 > /sys/block/nvme0n1/device/device/remove
(whereas nvme0n1 is the 2nd array member)

This will trigger the following warning/oops:

------------[ cut here ]------------
no blkg associated for bio on block-device: nvme0n1
WARNING: CPU: 9 PID: 184 at ./include/linux/blk-cgroup.h:785
generic_make_request_checks+0x4dd/0x690
[...]
BUG: unable to handle kernel NULL pointer dereference at 0000000000000155
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
RIP: 0010:blk_throtl_bio+0x45/0x970
[...]
Call Trace:
 generic_make_request_checks+0x1bf/0x690
 generic_make_request+0x64/0x3f0
 raid0_make_request+0x184/0x620 [raid0]
 ? raid0_make_request+0x184/0x620 [raid0]
 ? blk_queue_split+0x384/0x6d0
 md_handle_request+0x126/0x1a0
 md_make_request+0x7b/0x180
 generic_make_request+0x19e/0x3f0
 submit_bio+0x73/0x140
[...]

This patch changes raid0 driver to fallback to the "old" blocking queue
entering procedure, by clearing the BIO_QUEUE_ENTERED from raid0 bios.
This prevents the crashes and restores the regular behavior of raid0
arrays when a member is removed during a large write.

[0] https://marc.info/?l=linux-block&m=152638475806811

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: stable@vger.kernel.org # v4.18
Fixes: cd4a4ae4683d ("block: don't use blocking queue entered for recursive bio submits")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
---
 drivers/md/raid0.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index f3fb5bb8c82a..d5bdc79e0835 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -547,6 +547,7 @@ static void raid0_handle_discard(struct mddev *mddev, struct bio *bio)
 			trace_block_bio_remap(bdev_get_queue(rdev->bdev),
 				discard_bio, disk_devt(mddev->gendisk),
 				bio->bi_iter.bi_sector);
+		bio_clear_flag(bio, BIO_QUEUE_ENTERED);
 		generic_make_request(discard_bio);
 	}
 	bio_endio(bio);
@@ -602,6 +603,7 @@ static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
 				disk_devt(mddev->gendisk), bio_sector);
 	mddev_check_writesame(mddev, bio);
 	mddev_check_write_zeroes(mddev, bio);
+	bio_clear_flag(bio, BIO_QUEUE_ENTERED);
 	generic_make_request(bio);
 	return true;
 }
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2019-06-12 18:48 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-23 17:23 [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request() Song Liu
2019-05-23 17:23 ` [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios Song Liu
2019-06-12 12:40   ` Guilherme Piccoli
2019-06-12 12:48     ` Greg KH
2019-06-12 16:38       ` Guilherme G. Piccoli
2019-06-12 16:37   ` Guilherme G. Piccoli
2019-06-12 16:49     ` Greg KH
2019-06-12 18:07       ` Guilherme Piccoli
2019-06-12 18:36         ` Greg KH
2019-06-12 18:43         ` Sasha Levin
2019-06-12 18:48           ` Guilherme Piccoli
2019-05-23 17:25 ` [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request() Song Liu
2019-06-12 16:36 ` Guilherme G. Piccoli
  -- strict thread matches above, loose matches on Subject: below --
2019-04-30 22:37 Guilherme G. Piccoli
2019-04-30 22:37 ` [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios Guilherme G. Piccoli
2019-05-06 16:50   ` Song Liu
2019-05-06 18:48     ` Guilherme G. Piccoli
2019-05-06 21:07       ` Song Liu
2019-05-07 21:51         ` Guilherme G. Piccoli
2019-05-08  9:29         ` Wols Lists
2019-05-08 14:52           ` Guilherme G. Piccoli
2019-05-08 16:52             ` Wols Lists
2019-05-17 16:19               ` Guilherme G. Piccoli
2019-05-20 16:23                 ` Song Liu
2019-05-20 19:25                   ` Guilherme Piccoli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).