All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request()
@ 2019-05-23 17:23 ` Song Liu
  0 siblings, 0 replies; 24+ messages in thread
From: Song Liu @ 2019-05-23 17:23 UTC (permalink / raw)
  To: linux-raid, stable; +Cc: axboe, Guilherme G. Piccoli, Song Liu

From: "Guilherme G. Piccoli" <gpiccoli@canonical.com>

Commit 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently
with device removal triggers a crash") introduced a NULL pointer
dereference in generic_make_request(). The patch sets q to NULL and
enter_succeeded to false; right after, there's an 'if (enter_succeeded)'
which is not taken, and then the 'else' will dereference q in
blk_queue_dying(q).

This patch just moves the 'q = NULL' to a point in which it won't trigger
the oops, although the semantics of this NULLification remains untouched.

A simple test case/reproducer is as follows:
a) Build kernel v5.2-rc1 with CONFIG_BLK_CGROUP=n.

b) Create a raid0 md array with 2 NVMe devices as members, and mount it
with an ext4 filesystem.

c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
(dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
echo 1 > /sys/block/nvme0n1/device/device/remove
(whereas nvme0n1 is the 2nd array member)

This will trigger the following oops:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
RIP: 0010:generic_make_request+0x32b/0x400
Call Trace:
 submit_bio+0x73/0x140
 ext4_io_submit+0x4d/0x60
 ext4_writepages+0x626/0xe90
 do_writepages+0x4b/0xe0
[...]

This patch has no functional changes and preserves the md/raid0 behavior
when a member is removed before kernel v4.17.

Cc: stable@vger.kernel.org # v4.17
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Eric Ren <renzhengeek@gmail.com>
Fixes: 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 block/blk-core.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a55389ba8779..d24a29244cb8 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1074,10 +1074,8 @@ blk_qc_t generic_make_request(struct bio *bio)
 			flags = 0;
 			if (bio->bi_opf & REQ_NOWAIT)
 				flags = BLK_MQ_REQ_NOWAIT;
-			if (blk_queue_enter(q, flags) < 0) {
+			if (blk_queue_enter(q, flags) < 0)
 				enter_succeeded = false;
-				q = NULL;
-			}
 		}
 
 		if (enter_succeeded) {
@@ -1108,6 +1106,7 @@ blk_qc_t generic_make_request(struct bio *bio)
 				bio_wouldblock_error(bio);
 			else
 				bio_io_error(bio);
+			q = NULL;
 		}
 		bio = bio_list_pop(&bio_list_on_stack[0]);
 	} while (bio);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request()
@ 2019-05-23 17:23 ` Song Liu
  0 siblings, 0 replies; 24+ messages in thread
From: Song Liu @ 2019-05-23 17:23 UTC (permalink / raw)
  To: linux-raid, stable; +Cc: axboe, Guilherme G. Piccoli, Song Liu

From: "Guilherme G. Piccoli" <gpiccoli@canonical.com>

Commit 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently
with device removal triggers a crash") introduced a NULL pointer
dereference in generic_make_request(). The patch sets q to NULL and
enter_succeeded to false; right after, there's an 'if (enter_succeeded)'
which is not taken, and then the 'else' will dereference q in
blk_queue_dying(q).

This patch just moves the 'q = NULL' to a point in which it won't trigger
the oops, although the semantics of this NULLification remains untouched.

A simple test case/reproducer is as follows:
a) Build kernel v5.2-rc1 with CONFIG_BLK_CGROUP=n.

b) Create a raid0 md array with 2 NVMe devices as members, and mount it
with an ext4 filesystem.

c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
(dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
echo 1 > /sys/block/nvme0n1/device/device/remove
(whereas nvme0n1 is the 2nd array member)

This will trigger the following oops:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
RIP: 0010:generic_make_request+0x32b/0x400
Call Trace:
 submit_bio+0x73/0x140
 ext4_io_submit+0x4d/0x60
 ext4_writepages+0x626/0xe90
 do_writepages+0x4b/0xe0
[...]

This patch has no functional changes and preserves the md/raid0 behavior
when a member is removed before kernel v4.17.

Cc: stable@vger.kernel.org # v4.17
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Eric Ren <renzhengeek@gmail.com>
Fixes: 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 block/blk-core.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a55389ba8779..d24a29244cb8 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1074,10 +1074,8 @@ blk_qc_t generic_make_request(struct bio *bio)
 			flags = 0;
 			if (bio->bi_opf & REQ_NOWAIT)
 				flags = BLK_MQ_REQ_NOWAIT;
-			if (blk_queue_enter(q, flags) < 0) {
+			if (blk_queue_enter(q, flags) < 0)
 				enter_succeeded = false;
-				q = NULL;
-			}
 		}
 
 		if (enter_succeeded) {
@@ -1108,6 +1106,7 @@ blk_qc_t generic_make_request(struct bio *bio)
 				bio_wouldblock_error(bio);
 			else
 				bio_io_error(bio);
+			q = NULL;
 		}
 		bio = bio_list_pop(&bio_list_on_stack[0]);
 	} while (bio);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-05-23 17:23 ` Song Liu
@ 2019-05-23 17:23   ` Song Liu
  -1 siblings, 0 replies; 24+ messages in thread
From: Song Liu @ 2019-05-23 17:23 UTC (permalink / raw)
  To: linux-raid, stable
  Cc: axboe, Guilherme G. Piccoli, Ming Lei, Song Liu, Tetsuo Handa, Song Liu

From: "Guilherme G. Piccoli" <gpiccoli@canonical.com>

Commit cd4a4ae4683d ("block: don't use blocking queue entered for
recursive bio submits") introduced the flag BIO_QUEUE_ENTERED in order
split bios bypass the blocking queue entering routine and use the live
non-blocking version. It was a result of an extensive discussion in
a linux-block thread[0], and the purpose of this change was to prevent
a hung task waiting on a reference to drop.

Happens that md raid0 split bios all the time, and more important,
it changes their underlying device to the raid member. After the change
introduced by this flag's usage, we experience various crashes if a raid0
member is removed during a large write. This happens because the bio
reaches the live queue entering function when the queue of the raid0
member is dying.

A simple reproducer of this behavior is presented below:
a) Build kernel v5.2-rc1 with CONFIG_BLK_DEV_THROTTLING=y.

b) Create a raid0 md array with 2 NVMe devices as members, and mount it
with an ext4 filesystem.

c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
(dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
echo 1 > /sys/block/nvme0n1/device/device/remove
(whereas nvme0n1 is the 2nd array member)

This will trigger the following warning/oops:

------------[ cut here ]------------
no blkg associated for bio on block-device: nvme0n1
WARNING: CPU: 9 PID: 184 at ./include/linux/blk-cgroup.h:785
generic_make_request_checks+0x4dd/0x690
[...]
BUG: unable to handle kernel NULL pointer dereference at 0000000000000155
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
RIP: 0010:blk_throtl_bio+0x45/0x970
[...]
Call Trace:
 generic_make_request_checks+0x1bf/0x690
 generic_make_request+0x64/0x3f0
 raid0_make_request+0x184/0x620 [raid0]
 ? raid0_make_request+0x184/0x620 [raid0]
 ? blk_queue_split+0x384/0x6d0
 md_handle_request+0x126/0x1a0
 md_make_request+0x7b/0x180
 generic_make_request+0x19e/0x3f0
 submit_bio+0x73/0x140
[...]

This patch changes raid0 driver to fallback to the "old" blocking queue
entering procedure, by clearing the BIO_QUEUE_ENTERED from raid0 bios.
This prevents the crashes and restores the regular behavior of raid0
arrays when a member is removed during a large write.

[0] https://marc.info/?l=linux-block&m=152638475806811

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Song Liu <liu.song.a23@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: stable@vger.kernel.org # v4.18
Fixes: cd4a4ae4683d ("block: don't use blocking queue entered for recursive bio submits")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 drivers/md/raid0.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index f3fb5bb8c82a..d5bdc79e0835 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -547,6 +547,7 @@ static void raid0_handle_discard(struct mddev *mddev, struct bio *bio)
 			trace_block_bio_remap(bdev_get_queue(rdev->bdev),
 				discard_bio, disk_devt(mddev->gendisk),
 				bio->bi_iter.bi_sector);
+		bio_clear_flag(bio, BIO_QUEUE_ENTERED);
 		generic_make_request(discard_bio);
 	}
 	bio_endio(bio);
@@ -602,6 +603,7 @@ static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
 				disk_devt(mddev->gendisk), bio_sector);
 	mddev_check_writesame(mddev, bio);
 	mddev_check_write_zeroes(mddev, bio);
+	bio_clear_flag(bio, BIO_QUEUE_ENTERED);
 	generic_make_request(bio);
 	return true;
 }
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
@ 2019-05-23 17:23   ` Song Liu
  0 siblings, 0 replies; 24+ messages in thread
From: Song Liu @ 2019-05-23 17:23 UTC (permalink / raw)
  To: linux-raid, stable
  Cc: axboe, Guilherme G. Piccoli, Ming Lei, Song Liu, Tetsuo Handa, Song Liu

From: "Guilherme G. Piccoli" <gpiccoli@canonical.com>

Commit cd4a4ae4683d ("block: don't use blocking queue entered for
recursive bio submits") introduced the flag BIO_QUEUE_ENTERED in order
split bios bypass the blocking queue entering routine and use the live
non-blocking version. It was a result of an extensive discussion in
a linux-block thread[0], and the purpose of this change was to prevent
a hung task waiting on a reference to drop.

Happens that md raid0 split bios all the time, and more important,
it changes their underlying device to the raid member. After the change
introduced by this flag's usage, we experience various crashes if a raid0
member is removed during a large write. This happens because the bio
reaches the live queue entering function when the queue of the raid0
member is dying.

A simple reproducer of this behavior is presented below:
a) Build kernel v5.2-rc1 with CONFIG_BLK_DEV_THROTTLING=y.

b) Create a raid0 md array with 2 NVMe devices as members, and mount it
with an ext4 filesystem.

c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
(dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
echo 1 > /sys/block/nvme0n1/device/device/remove
(whereas nvme0n1 is the 2nd array member)

This will trigger the following warning/oops:

------------[ cut here ]------------
no blkg associated for bio on block-device: nvme0n1
WARNING: CPU: 9 PID: 184 at ./include/linux/blk-cgroup.h:785
generic_make_request_checks+0x4dd/0x690
[...]
BUG: unable to handle kernel NULL pointer dereference at 0000000000000155
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
RIP: 0010:blk_throtl_bio+0x45/0x970
[...]
Call Trace:
 generic_make_request_checks+0x1bf/0x690
 generic_make_request+0x64/0x3f0
 raid0_make_request+0x184/0x620 [raid0]
 ? raid0_make_request+0x184/0x620 [raid0]
 ? blk_queue_split+0x384/0x6d0
 md_handle_request+0x126/0x1a0
 md_make_request+0x7b/0x180
 generic_make_request+0x19e/0x3f0
 submit_bio+0x73/0x140
[...]

This patch changes raid0 driver to fallback to the "old" blocking queue
entering procedure, by clearing the BIO_QUEUE_ENTERED from raid0 bios.
This prevents the crashes and restores the regular behavior of raid0
arrays when a member is removed during a large write.

[0] https://marc.info/?l=linux-block&m=152638475806811

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Song Liu <liu.song.a23@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: stable@vger.kernel.org # v4.18
Fixes: cd4a4ae4683d ("block: don't use blocking queue entered for recursive bio submits")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 drivers/md/raid0.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index f3fb5bb8c82a..d5bdc79e0835 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -547,6 +547,7 @@ static void raid0_handle_discard(struct mddev *mddev, struct bio *bio)
 			trace_block_bio_remap(bdev_get_queue(rdev->bdev),
 				discard_bio, disk_devt(mddev->gendisk),
 				bio->bi_iter.bi_sector);
+		bio_clear_flag(bio, BIO_QUEUE_ENTERED);
 		generic_make_request(discard_bio);
 	}
 	bio_endio(bio);
@@ -602,6 +603,7 @@ static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
 				disk_devt(mddev->gendisk), bio_sector);
 	mddev_check_writesame(mddev, bio);
 	mddev_check_write_zeroes(mddev, bio);
+	bio_clear_flag(bio, BIO_QUEUE_ENTERED);
 	generic_make_request(bio);
 	return true;
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request()
  2019-05-23 17:23 ` Song Liu
  (?)
  (?)
@ 2019-05-23 17:25 ` Song Liu
  -1 siblings, 0 replies; 24+ messages in thread
From: Song Liu @ 2019-05-23 17:25 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-raid, stable, Jens Axboe, Guilherme G. Piccoli

On Thu, May 23, 2019 at 10:24 AM Song Liu <songliubraving@fb.com> wrote:
>
> From: "Guilherme G. Piccoli" <gpiccoli@canonical.com>
>
> Commit 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently
> with device removal triggers a crash") introduced a NULL pointer
> dereference in generic_make_request(). The patch sets q to NULL and
> enter_succeeded to false; right after, there's an 'if (enter_succeeded)'
> which is not taken, and then the 'else' will dereference q in
> blk_queue_dying(q).
>
> This patch just moves the 'q = NULL' to a point in which it won't trigger
> the oops, although the semantics of this NULLification remains untouched.
>
> A simple test case/reproducer is as follows:
> a) Build kernel v5.2-rc1 with CONFIG_BLK_CGROUP=n.
>
> b) Create a raid0 md array with 2 NVMe devices as members, and mount it
> with an ext4 filesystem.
>
> c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
> (dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
> echo 1 > /sys/block/nvme0n1/device/device/remove
> (whereas nvme0n1 is the 2nd array member)
>
> This will trigger the following oops:
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> RIP: 0010:generic_make_request+0x32b/0x400
> Call Trace:
>  submit_bio+0x73/0x140
>  ext4_io_submit+0x4d/0x60
>  ext4_writepages+0x626/0xe90
>  do_writepages+0x4b/0xe0
> [...]
>
> This patch has no functional changes and preserves the md/raid0 behavior
> when a member is removed before kernel v4.17.
>
> Cc: stable@vger.kernel.org # v4.17
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Tested-by: Eric Ren <renzhengeek@gmail.com>
> Fixes: 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash")
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
> Signed-off-by: Song Liu <songliubraving@fb.com>

Please note this patchset is only for stable.

Thanks,
Song

> ---
>  block/blk-core.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index a55389ba8779..d24a29244cb8 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1074,10 +1074,8 @@ blk_qc_t generic_make_request(struct bio *bio)
>                         flags = 0;
>                         if (bio->bi_opf & REQ_NOWAIT)
>                                 flags = BLK_MQ_REQ_NOWAIT;
> -                       if (blk_queue_enter(q, flags) < 0) {
> +                       if (blk_queue_enter(q, flags) < 0)
>                                 enter_succeeded = false;
> -                               q = NULL;
> -                       }
>                 }
>
>                 if (enter_succeeded) {
> @@ -1108,6 +1106,7 @@ blk_qc_t generic_make_request(struct bio *bio)
>                                 bio_wouldblock_error(bio);
>                         else
>                                 bio_io_error(bio);
> +                       q = NULL;
>                 }
>                 bio = bio_list_pop(&bio_list_on_stack[0]);
>         } while (bio);
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-05-23 17:23   ` Song Liu
  (?)
@ 2019-06-12 12:40   ` Guilherme Piccoli
  2019-06-12 12:48     ` Greg KH
  -1 siblings, 1 reply; 24+ messages in thread
From: Guilherme Piccoli @ 2019-06-12 12:40 UTC (permalink / raw)
  To: stable, gregkh, sashal
  Cc: Song Liu, linux-raid, Jens Axboe, Ming Lei, Song Liu, Tetsuo Handa

Hi Greg and Sasha, is there any news about these patches?
Just checked the stable branches 5.1.y and 5.0.y, they seem not merged.

If there's anything pending from my side, let me know.
Thanks in advance,


Guilherme

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-06-12 12:40   ` Guilherme Piccoli
@ 2019-06-12 12:48     ` Greg KH
  2019-06-12 16:38       ` Guilherme G. Piccoli
  0 siblings, 1 reply; 24+ messages in thread
From: Greg KH @ 2019-06-12 12:48 UTC (permalink / raw)
  To: Guilherme Piccoli
  Cc: stable, sashal, Song Liu, linux-raid, Jens Axboe, Ming Lei,
	Song Liu, Tetsuo Handa

On Wed, Jun 12, 2019 at 09:40:17AM -0300, Guilherme Piccoli wrote:
> Hi Greg and Sasha, is there any news about these patches?

What patches?

> Just checked the stable branches 5.1.y and 5.0.y, they seem not merged.

Are they merged in Linus's tree?  What are the git commit ids?

I have no record of these patches in my queue at the moment, sorry.  If
these were a backport, please resend them in the proper format.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request()
  2019-05-23 17:23 ` Song Liu
                   ` (2 preceding siblings ...)
  (?)
@ 2019-06-12 16:36 ` Guilherme G. Piccoli
  -1 siblings, 0 replies; 24+ messages in thread
From: Guilherme G. Piccoli @ 2019-06-12 16:36 UTC (permalink / raw)
  To: stable; +Cc: Song Liu, linux-raid, axboe, gregkh, sashal

+ Greg, Sasha


On 23/05/2019 14:23, Song Liu wrote:
> From: "Guilherme G. Piccoli" <gpiccoli@canonical.com>
> 
> Commit 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently
> with device removal triggers a crash") introduced a NULL pointer
> dereference in generic_make_request(). The patch sets q to NULL and
> enter_succeeded to false; right after, there's an 'if (enter_succeeded)'
> which is not taken, and then the 'else' will dereference q in
> blk_queue_dying(q).
> 
> This patch just moves the 'q = NULL' to a point in which it won't trigger
> the oops, although the semantics of this NULLification remains untouched.
> 
> A simple test case/reproducer is as follows:
> a) Build kernel v5.2-rc1 with CONFIG_BLK_CGROUP=n.
> 
> b) Create a raid0 md array with 2 NVMe devices as members, and mount it
> with an ext4 filesystem.
> 
> c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
> (dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
> echo 1 > /sys/block/nvme0n1/device/device/remove
> (whereas nvme0n1 is the 2nd array member)
> 
> This will trigger the following oops:
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> RIP: 0010:generic_make_request+0x32b/0x400
> Call Trace:
>  submit_bio+0x73/0x140
>  ext4_io_submit+0x4d/0x60
>  ext4_writepages+0x626/0xe90
>  do_writepages+0x4b/0xe0
> [...]
> 
> This patch has no functional changes and preserves the md/raid0 behavior
> when a member is removed before kernel v4.17.
> 
> Cc: stable@vger.kernel.org # v4.17
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Tested-by: Eric Ren <renzhengeek@gmail.com>
> Fixes: 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash")
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  block/blk-core.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index a55389ba8779..d24a29244cb8 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1074,10 +1074,8 @@ blk_qc_t generic_make_request(struct bio *bio)
>  			flags = 0;
>  			if (bio->bi_opf & REQ_NOWAIT)
>  				flags = BLK_MQ_REQ_NOWAIT;
> -			if (blk_queue_enter(q, flags) < 0) {
> +			if (blk_queue_enter(q, flags) < 0)
>  				enter_succeeded = false;
> -				q = NULL;
> -			}
>  		}
>  
>  		if (enter_succeeded) {
> @@ -1108,6 +1106,7 @@ blk_qc_t generic_make_request(struct bio *bio)
>  				bio_wouldblock_error(bio);
>  			else
>  				bio_io_error(bio);
> +			q = NULL;
>  		}
>  		bio = bio_list_pop(&bio_list_on_stack[0]);
>  	} while (bio);
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-05-23 17:23   ` Song Liu
  (?)
  (?)
@ 2019-06-12 16:37   ` Guilherme G. Piccoli
  2019-06-12 16:49     ` Greg KH
  -1 siblings, 1 reply; 24+ messages in thread
From: Guilherme G. Piccoli @ 2019-06-12 16:37 UTC (permalink / raw)
  To: stable
  Cc: Song Liu, linux-raid, axboe, Ming Lei, Song Liu, Tetsuo Handa,
	gregkh, sashal

+Greg, Sasha


On 23/05/2019 14:23, Song Liu wrote:
> From: "Guilherme G. Piccoli" <gpiccoli@canonical.com>
> 
> Commit cd4a4ae4683d ("block: don't use blocking queue entered for
> recursive bio submits") introduced the flag BIO_QUEUE_ENTERED in order
> split bios bypass the blocking queue entering routine and use the live
> non-blocking version. It was a result of an extensive discussion in
> a linux-block thread[0], and the purpose of this change was to prevent
> a hung task waiting on a reference to drop.
> 
> Happens that md raid0 split bios all the time, and more important,
> it changes their underlying device to the raid member. After the change
> introduced by this flag's usage, we experience various crashes if a raid0
> member is removed during a large write. This happens because the bio
> reaches the live queue entering function when the queue of the raid0
> member is dying.
> 
> A simple reproducer of this behavior is presented below:
> a) Build kernel v5.2-rc1 with CONFIG_BLK_DEV_THROTTLING=y.
> 
> b) Create a raid0 md array with 2 NVMe devices as members, and mount it
> with an ext4 filesystem.
> 
> c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
> (dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
> echo 1 > /sys/block/nvme0n1/device/device/remove
> (whereas nvme0n1 is the 2nd array member)
> 
> This will trigger the following warning/oops:
> 
> ------------[ cut here ]------------
> no blkg associated for bio on block-device: nvme0n1
> WARNING: CPU: 9 PID: 184 at ./include/linux/blk-cgroup.h:785
> generic_make_request_checks+0x4dd/0x690
> [...]
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000155
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> RIP: 0010:blk_throtl_bio+0x45/0x970
> [...]
> Call Trace:
>  generic_make_request_checks+0x1bf/0x690
>  generic_make_request+0x64/0x3f0
>  raid0_make_request+0x184/0x620 [raid0]
>  ? raid0_make_request+0x184/0x620 [raid0]
>  ? blk_queue_split+0x384/0x6d0
>  md_handle_request+0x126/0x1a0
>  md_make_request+0x7b/0x180
>  generic_make_request+0x19e/0x3f0
>  submit_bio+0x73/0x140
> [...]
> 
> This patch changes raid0 driver to fallback to the "old" blocking queue
> entering procedure, by clearing the BIO_QUEUE_ENTERED from raid0 bios.
> This prevents the crashes and restores the regular behavior of raid0
> arrays when a member is removed during a large write.
> 
> [0] https://marc.info/?l=linux-block&m=152638475806811
> 
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Song Liu <liu.song.a23@gmail.com>
> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: stable@vger.kernel.org # v4.18
> Fixes: cd4a4ae4683d ("block: don't use blocking queue entered for recursive bio submits")
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  drivers/md/raid0.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index f3fb5bb8c82a..d5bdc79e0835 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -547,6 +547,7 @@ static void raid0_handle_discard(struct mddev *mddev, struct bio *bio)
>  			trace_block_bio_remap(bdev_get_queue(rdev->bdev),
>  				discard_bio, disk_devt(mddev->gendisk),
>  				bio->bi_iter.bi_sector);
> +		bio_clear_flag(bio, BIO_QUEUE_ENTERED);
>  		generic_make_request(discard_bio);
>  	}
>  	bio_endio(bio);
> @@ -602,6 +603,7 @@ static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
>  				disk_devt(mddev->gendisk), bio_sector);
>  	mddev_check_writesame(mddev, bio);
>  	mddev_check_write_zeroes(mddev, bio);
> +	bio_clear_flag(bio, BIO_QUEUE_ENTERED);
>  	generic_make_request(bio);
>  	return true;
>  }
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-06-12 12:48     ` Greg KH
@ 2019-06-12 16:38       ` Guilherme G. Piccoli
  0 siblings, 0 replies; 24+ messages in thread
From: Guilherme G. Piccoli @ 2019-06-12 16:38 UTC (permalink / raw)
  To: Greg KH
  Cc: stable, sashal, Song Liu, linux-raid, Jens Axboe, Ming Lei,
	Song Liu, Tetsuo Handa

On 12/06/2019 09:48, Greg KH wrote:
> On Wed, Jun 12, 2019 at 09:40:17AM -0300, Guilherme Piccoli wrote:
>> Hi Greg and Sasha, is there any news about these patches?
> 
> What patches?
> 
>> Just checked the stable branches 5.1.y and 5.0.y, they seem not merged.
> 
> Are they merged in Linus's tree?  What are the git commit ids?
> 

Hi Greg, thanks for your prompt response! The story behind these patches
is a bit confusing; I'll summarize below.

I've submitted 2 patches with fixes to linux-raid in April; if you wanna
take a look, they are:

https://marc.info/?l=linux-raid&m=155839017427565
https://marc.info/?l=linux-raid&m=155839017827569

After some discussion, it was determined a patchset from Ming Lei fixed
the same issues I've reported/fixed, but it was a rework (and depends on
the removal of legacy IO path).
That said, Song Liu submitted both of my fixes as stable-only patches.
I couldn't find a stable archive (and I don't subscribe to that), so I
cannot point links here.

I just resubmitted both patches, this time CCing you and Sasha. Let me
know if there's anything needed - I'd prefer to have them upstream too,
but the discussion with Christoph/Ming/Song reached a consensus that it
wouldn't make sense to add them and soon after add Ming's patchset,
which removes that code.
But backporting Ming's series is not really simple/feasible.

Thanks,


Guilherme


> I have no record of these patches in my queue at the moment, sorry.  If
> these were a backport, please resend them in the proper format.
> 
> thanks,
> 
> greg k-h
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-06-12 16:37   ` Guilherme G. Piccoli
@ 2019-06-12 16:49     ` Greg KH
  2019-06-12 18:07       ` Guilherme Piccoli
  0 siblings, 1 reply; 24+ messages in thread
From: Greg KH @ 2019-06-12 16:49 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: stable, Song Liu, linux-raid, axboe, Ming Lei, Song Liu,
	Tetsuo Handa, sashal

On Wed, Jun 12, 2019 at 01:37:24PM -0300, Guilherme G. Piccoli wrote:
> +Greg, Sasha

Please resend them in a format that they can be applied in.

Also, I need a TON of descriptions about why this differs from what is
in Linus's tree, as it is, what you have below does not show that at
all, they seem to be valud for 5.2-rc1.

And I need acks from the maintainers of the subsystems.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-06-12 16:49     ` Greg KH
@ 2019-06-12 18:07       ` Guilherme Piccoli
  2019-06-12 18:36         ` Greg KH
  2019-06-12 18:43           ` Sasha Levin
  0 siblings, 2 replies; 24+ messages in thread
From: Guilherme Piccoli @ 2019-06-12 18:07 UTC (permalink / raw)
  To: Greg KH
  Cc: stable, Song Liu, linux-raid, Jens Axboe, Ming Lei, Song Liu,
	Tetsuo Handa, sashal

On Wed, Jun 12, 2019 at 1:50 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Wed, Jun 12, 2019 at 01:37:24PM -0300, Guilherme G. Piccoli wrote:
> > +Greg, Sasha
>
> Please resend them in a format that they can be applied in.
>
> Also, I need a TON of descriptions about why this differs from what is
> in Linus's tree, as it is, what you have below does not show that at
> all, they seem to be valud for 5.2-rc1.

Thanks Greg, I'll work on it. Can this "ton" of description be a cover-letter?
Cheers,


Guilherme

>
> And I need acks from the maintainers of the subsystems.
>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-06-12 18:07       ` Guilherme Piccoli
@ 2019-06-12 18:36         ` Greg KH
  2019-06-12 18:43           ` Sasha Levin
  1 sibling, 0 replies; 24+ messages in thread
From: Greg KH @ 2019-06-12 18:36 UTC (permalink / raw)
  To: Guilherme Piccoli
  Cc: stable, Song Liu, linux-raid, Jens Axboe, Ming Lei, Song Liu,
	Tetsuo Handa, sashal

On Wed, Jun 12, 2019 at 03:07:11PM -0300, Guilherme Piccoli wrote:
> On Wed, Jun 12, 2019 at 1:50 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Wed, Jun 12, 2019 at 01:37:24PM -0300, Guilherme G. Piccoli wrote:
> > > +Greg, Sasha
> >
> > Please resend them in a format that they can be applied in.
> >
> > Also, I need a TON of descriptions about why this differs from what is
> > in Linus's tree, as it is, what you have below does not show that at
> > all, they seem to be valud for 5.2-rc1.
> 
> Thanks Greg, I'll work on it. Can this "ton" of description be a cover-letter?

Nope, in the commit log itself as we need it in the tree if we apply it.

Remember, 99% of all patches that we take in the stable tree that are
NOT in Linus's tree end up having bugs.  Yours will, and we want lots of
documentation about exactly why we are thinking we are justified in
taking this patch :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-06-12 18:07       ` Guilherme Piccoli
@ 2019-06-12 18:43           ` Sasha Levin
  2019-06-12 18:43           ` Sasha Levin
  1 sibling, 0 replies; 24+ messages in thread
From: Sasha Levin @ 2019-06-12 18:43 UTC (permalink / raw)
  To: Guilherme Piccoli
  Cc: Greg KH, stable, Song Liu, linux-raid, Jens Axboe, Ming Lei,
	Song Liu, Tetsuo Handa

On Wed, Jun 12, 2019 at 03:07:11PM -0300, Guilherme Piccoli wrote:
>On Wed, Jun 12, 2019 at 1:50 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>>
>> On Wed, Jun 12, 2019 at 01:37:24PM -0300, Guilherme G. Piccoli wrote:
>> > +Greg, Sasha
>>
>> Please resend them in a format that they can be applied in.
>>
>> Also, I need a TON of descriptions about why this differs from what is
>> in Linus's tree, as it is, what you have below does not show that at
>> all, they seem to be valud for 5.2-rc1.
>
>Thanks Greg, I'll work on it. Can this "ton" of description be a cover-letter?

Please just add it to the commit message. We might need to refer to it
in the future and cover letter will just get lost.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
@ 2019-06-12 18:43           ` Sasha Levin
  0 siblings, 0 replies; 24+ messages in thread
From: Sasha Levin @ 2019-06-12 18:43 UTC (permalink / raw)
  To: Guilherme Piccoli
  Cc: Greg KH, stable, Song Liu, linux-raid, Jens Axboe, Ming Lei,
	Song Liu, Tetsuo Handa

On Wed, Jun 12, 2019 at 03:07:11PM -0300, Guilherme Piccoli wrote:
>On Wed, Jun 12, 2019 at 1:50 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>>
>> On Wed, Jun 12, 2019 at 01:37:24PM -0300, Guilherme G. Piccoli wrote:
>> > +Greg, Sasha
>>
>> Please resend them in a format that they can be applied in.
>>
>> Also, I need a TON of descriptions about why this differs from what is
>> in Linus's tree, as it is, what you have below does not show that at
>> all, they seem to be valud for 5.2-rc1.
>
>Thanks Greg, I'll work on it. Can this "ton" of description be a cover-letter?

Please just add it to the commit message. We might need to refer to it
in the future and cover letter will just get lost.

--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-06-12 18:43           ` Sasha Levin
  (?)
@ 2019-06-12 18:48           ` Guilherme Piccoli
  -1 siblings, 0 replies; 24+ messages in thread
From: Guilherme Piccoli @ 2019-06-12 18:48 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Greg KH, stable, Song Liu, linux-raid, Jens Axboe, Ming Lei,
	Song Liu, Tetsuo Handa

OK perfect, thank you both!

On Wed, Jun 12, 2019 at 3:43 PM Sasha Levin <sashal@kernel.org> wrote:
>
> On Wed, Jun 12, 2019 at 03:07:11PM -0300, Guilherme Piccoli wrote:
> >On Wed, Jun 12, 2019 at 1:50 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> >>
> >> On Wed, Jun 12, 2019 at 01:37:24PM -0300, Guilherme G. Piccoli wrote:
> >> > +Greg, Sasha
> >>
> >> Please resend them in a format that they can be applied in.
> >>
> >> Also, I need a TON of descriptions about why this differs from what is
> >> in Linus's tree, as it is, what you have below does not show that at
> >> all, they seem to be valud for 5.2-rc1.
> >
> >Thanks Greg, I'll work on it. Can this "ton" of description be a cover-letter?
>
> Please just add it to the commit message. We might need to refer to it
> in the future and cover letter will just get lost.
>
> --
> Thanks,
> Sasha

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request()
  2019-05-17 16:17   ` Guilherme G. Piccoli
@ 2019-05-20  2:43     ` Eric Ren
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Ren @ 2019-05-20  2:43 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Guilherme G. Piccoli, linux-block, linux-raid, dm-devel, axboe,
	Gavin Guo, Jay Vosburgh, Bart Van Assche, stable

Hi,

On Sat, 18 May 2019 at 00:17, Guilherme G. Piccoli <kernel@gpiccoli.net> wrote:
>
> On Fri, May 17, 2019 at 12:33 AM Eric Ren <renzhengeek@gmail.com> wrote:
> >
> > Hello,
> > [...]
> > Thanks for the bugfix. I also had a panic having very similar
> > calltrace below as this one,
> > when using devicemapper in container scenario and deleting many thin
> > snapshots by dmsetup
> > remove_all -f, meanwhile executing lvm command like vgs.
> >
> > After applied this one, my testing doesn't crash kernel any more for
> > one week.  Could the block
> > developers please give more feedback/priority on this one?
> >
>
> Thanks Eric, for the testing! I think you could send your Tested-by[0]
> tag, which could be added
> in the patch before merge. It's good to know the patch helped somebody
> and your testing improves
> confidence in the change.

Please consider Ming's comments and send patch v2, then feel free to add:
Tested-by: Eric Ren <renzhengeek@gmail.com>

Thanks!
Eric

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request()
  2019-04-30 22:37 Guilherme G. Piccoli
@ 2019-05-17 22:04   ` Ming Lei
  2019-05-17  3:33 ` Eric Ren
  2019-05-17 22:04   ` Ming Lei
  2 siblings, 0 replies; 24+ messages in thread
From: Ming Lei @ 2019-05-17 22:04 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: linux-block, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	open list:DEVICE-MAPPER (LVM),
	Jens Axboe, Gavin Guo, Jay Vosburgh, kernel, Bart Van Assche,
	stable

On Wed, May 1, 2019 at 6:38 AM Guilherme G. Piccoli
<gpiccoli@canonical.com> wrote:
>
> Commit 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently
> with device removal triggers a crash") introduced a NULL pointer
> dereference in generic_make_request(). The patch sets q to NULL and
> enter_succeeded to false; right after, there's an 'if (enter_succeeded)'
> which is not taken, and then the 'else' will dereference q in
> blk_queue_dying(q).
>
> This patch just moves the 'q = NULL' to a point in which it won't trigger
> the oops, although the semantics of this NULLification remains untouched.
>
> A simple test case/reproducer is as follows:
> a) Build kernel v5.1-rc7 with CONFIG_BLK_CGROUP=n.
>
> b) Create a raid0 md array with 2 NVMe devices as members, and mount it
> with an ext4 filesystem.
>
> c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
> (dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
> echo 1 > /sys/block/nvme0n1/device/device/remove
> (whereas nvme0n1 is the 2nd array member)
>
> This will trigger the following oops:
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> RIP: 0010:generic_make_request+0x32b/0x400
> Call Trace:
>  submit_bio+0x73/0x140
>  ext4_io_submit+0x4d/0x60
>  ext4_writepages+0x626/0xe90
>  do_writepages+0x4b/0xe0
> [...]
>
> This patch has no functional changes and preserves the md/raid0 behavior
> when a member is removed before kernel v4.17.
>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: stable@vger.kernel.org # v4.17
> Fixes: 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash")

BTW, the legacy IO request path is removed since 5.0, the above fault
commit can be
removed as done by the following patches:

https://lore.kernel.org/linux-block/20190515030310.20393-1-ming.lei@redhat.com/T/#t

> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
> ---
>  block/blk-core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index a55389ba8779..e21856a7f3fa 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1076,7 +1076,6 @@ blk_qc_t generic_make_request(struct bio *bio)
>                                 flags = BLK_MQ_REQ_NOWAIT;
>                         if (blk_queue_enter(q, flags) < 0) {
>                                 enter_succeeded = false;
> -                               q = NULL;
>                         }

Please remove '{}'.

>                 }
>
> @@ -1108,6 +1107,7 @@ blk_qc_t generic_make_request(struct bio *bio)
>                                 bio_wouldblock_error(bio);
>                         else
>                                 bio_io_error(bio);
> +                       q = NULL;
>                 }
>                 bio = bio_list_pop(&bio_list_on_stack[0]);
>         } while (bio);
> --
> 2.21.0
>

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request()
@ 2019-05-17 22:04   ` Ming Lei
  0 siblings, 0 replies; 24+ messages in thread
From: Ming Lei @ 2019-05-17 22:04 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: linux-block, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	open list:DEVICE-MAPPER (LVM),
	Jens Axboe, Gavin Guo, Jay Vosburgh, kernel, Bart Van Assche,
	stable

On Wed, May 1, 2019 at 6:38 AM Guilherme G. Piccoli
<gpiccoli@canonical.com> wrote:
>
> Commit 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently
> with device removal triggers a crash") introduced a NULL pointer
> dereference in generic_make_request(). The patch sets q to NULL and
> enter_succeeded to false; right after, there's an 'if (enter_succeeded)'
> which is not taken, and then the 'else' will dereference q in
> blk_queue_dying(q).
>
> This patch just moves the 'q = NULL' to a point in which it won't trigger
> the oops, although the semantics of this NULLification remains untouched.
>
> A simple test case/reproducer is as follows:
> a) Build kernel v5.1-rc7 with CONFIG_BLK_CGROUP=n.
>
> b) Create a raid0 md array with 2 NVMe devices as members, and mount it
> with an ext4 filesystem.
>
> c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
> (dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
> echo 1 > /sys/block/nvme0n1/device/device/remove
> (whereas nvme0n1 is the 2nd array member)
>
> This will trigger the following oops:
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> RIP: 0010:generic_make_request+0x32b/0x400
> Call Trace:
>  submit_bio+0x73/0x140
>  ext4_io_submit+0x4d/0x60
>  ext4_writepages+0x626/0xe90
>  do_writepages+0x4b/0xe0
> [...]
>
> This patch has no functional changes and preserves the md/raid0 behavior
> when a member is removed before kernel v4.17.
>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: stable@vger.kernel.org # v4.17
> Fixes: 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash")

BTW, the legacy IO request path is removed since 5.0, the above fault
commit can be
removed as done by the following patches:

https://lore.kernel.org/linux-block/20190515030310.20393-1-ming.lei@redhat.com/T/#t

> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
> ---
>  block/blk-core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index a55389ba8779..e21856a7f3fa 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1076,7 +1076,6 @@ blk_qc_t generic_make_request(struct bio *bio)
>                                 flags = BLK_MQ_REQ_NOWAIT;
>                         if (blk_queue_enter(q, flags) < 0) {
>                                 enter_succeeded = false;
> -                               q = NULL;
>                         }

Please remove '{}'.

>                 }
>
> @@ -1108,6 +1107,7 @@ blk_qc_t generic_make_request(struct bio *bio)
>                                 bio_wouldblock_error(bio);
>                         else
>                                 bio_io_error(bio);
> +                       q = NULL;
>                 }
>                 bio = bio_list_pop(&bio_list_on_stack[0]);
>         } while (bio);
> --
> 2.21.0
>

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request()
  2019-05-17  3:33 ` Eric Ren
@ 2019-05-17 16:17   ` Guilherme G. Piccoli
  2019-05-20  2:43     ` Eric Ren
  0 siblings, 1 reply; 24+ messages in thread
From: Guilherme G. Piccoli @ 2019-05-17 16:17 UTC (permalink / raw)
  To: Eric Ren
  Cc: Guilherme G. Piccoli, linux-block, linux-raid, dm-devel, axboe,
	Gavin Guo, Jay Vosburgh, Bart Van Assche, stable

On Fri, May 17, 2019 at 12:33 AM Eric Ren <renzhengeek@gmail.com> wrote:
>
> Hello,
> [...]
> Thanks for the bugfix. I also had a panic having very similar
> calltrace below as this one,
> when using devicemapper in container scenario and deleting many thin
> snapshots by dmsetup
> remove_all -f, meanwhile executing lvm command like vgs.
>
> After applied this one, my testing doesn't crash kernel any more for
> one week.  Could the block
> developers please give more feedback/priority on this one?
>

Thanks Eric, for the testing! I think you could send your Tested-by[0]
tag, which could be added
in the patch before merge. It's good to know the patch helped somebody
and your testing improves
confidence in the change.

Cheers,


Guilherme

[0] https://www.kernel.org/doc/html/latest/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request()
  2019-04-30 22:37 Guilherme G. Piccoli
  2019-04-30 22:55   ` Bart Van Assche
@ 2019-05-17  3:33 ` Eric Ren
  2019-05-17 16:17   ` Guilherme G. Piccoli
  2019-05-17 22:04   ` Ming Lei
  2 siblings, 1 reply; 24+ messages in thread
From: Eric Ren @ 2019-05-17  3:33 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: linux-block, linux-raid, dm-devel, axboe, gavin.guo,
	jay.vosburgh, kernel, Bart Van Assche, stable

Hello,

On Wed, 1 May 2019 at 06:38, Guilherme G. Piccoli
<gpiccoli@canonical.com> wrote:
>
> Commit 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently
> with device removal triggers a crash") introduced a NULL pointer
> dereference in generic_make_request(). The patch sets q to NULL and
> enter_succeeded to false; right after, there's an 'if (enter_succeeded)'
> which is not taken, and then the 'else' will dereference q in
> blk_queue_dying(q).
>
> This patch just moves the 'q = NULL' to a point in which it won't trigger
> the oops, although the semantics of this NULLification remains untouched.
>
> A simple test case/reproducer is as follows:
> a) Build kernel v5.1-rc7 with CONFIG_BLK_CGROUP=n.
>
> b) Create a raid0 md array with 2 NVMe devices as members, and mount it
> with an ext4 filesystem.
>
> c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
> (dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
> echo 1 > /sys/block/nvme0n1/device/device/remove
> (whereas nvme0n1 is the 2nd array member)
>
> This will trigger the following oops:
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> RIP: 0010:generic_make_request+0x32b/0x400
> Call Trace:
>  submit_bio+0x73/0x140
>  ext4_io_submit+0x4d/0x60
>  ext4_writepages+0x626/0xe90
>  do_writepages+0x4b/0xe0
> [...]
>
> This patch has no functional changes and preserves the md/raid0 behavior
> when a member is removed before kernel v4.17.
>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: stable@vger.kernel.org # v4.17
> Fixes: 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash")
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>

Thanks for the bugfix. I also had a panic having very similar
calltrace below as this one,
when using devicemapper in container scenario and deleting many thin
snapshots by dmsetup
remove_all -f, meanwhile executing lvm command like vgs.

After applied this one, my testing doesn't crash kernel any more for
one week.  Could the block
developers please give more feedback/priority on this one?

My panic trace:
```
50515.136279] BUG: unable to handle kernel NULL pointer dereference at
00000000000003b8
[50515.144704] PGD 0 P4D 0
[50515.147576] Oops: 0000 [#1] SMP PTI
[50515.151403] CPU: 24 PID: 45287 Comm: vgs Kdump: loaded Not tainted
4.19.24-9.x86_64#1
[50515.169295] RIP: 0010:generic_make_request+0x24/0x350
[50515.174684] Code: e9 e1 45 42 00 90 0f 1f 44 00 00 55 48 89 e5 41
55 41 54 53 48 89 fb 48 83 e4 f0 48 83 ec 20 48 8b 47 08 f6 47 15 08
8b 77 10 <4c> 8b a0 b8 03 00 00 0f 84 82 00 00 00 49 8b 84 24 50 07 00
00 a8
[50515.194303] RSP: 0018:ffffa90862373a10 EFLAGS: 00010246
[50515.199869] RAX: 0000000000000000 RBX: ffff99d7338b7800 RCX: 0000000000000000
[50515.207347] RDX: ffff99d51d89c380 RSI: 0000000000000000 RDI: ffff99d7338b7800
[50515.214828] RBP: ffffa90862373a48 R08: 0000000000000000 R09: ffff99a840007300
[50515.222305] R10: ffffa90862373b88 R11: 0000000000000000 R12: ffff99d833592200
[50515.229782] R13: ffffa90862373b58 R14: 0000000000000001 R15: 0000000000000000
[50515.237264] FS:  00007fc36b322880(0000) GS:ffff99d83f000000(0000)
knlGS:0000000000000000
[50515.245944] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[50515.252036] CR2: 00000000000003b8 CR3: 0000005c53ed0001 CR4: 00000000003626a0
[50515.259519] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[50515.266996] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[50515.274473] Call Trace:
[50515.277264]  ? bio_add_page+0x42/0x50
[50515.281267]  submit_bio+0x6e/0x130
[50515.285013]  new_read+0xa2/0xf0 [dm_bufio]
[50515.289454]  dm_bm_read_lock+0x21/0x70 [dm_persistent_data]
[50515.295369]  ro_step+0x31/0x60 [dm_persistent_data]
[50515.300589]  dm_btree_find_key+0xb0/0x180 [dm_persistent_data]
[50515.306765]  dm_thin_get_highest_mapped_block+0x75/0x90 [dm_thin_pool]
[50515.313639]  thin_status+0x145/0x290 [dm_thin_pool]
...
```
Regards,
Eric

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request()
  2019-04-30 22:37 Guilherme G. Piccoli
@ 2019-04-30 22:55   ` Bart Van Assche
  2019-05-17  3:33 ` Eric Ren
  2019-05-17 22:04   ` Ming Lei
  2 siblings, 0 replies; 24+ messages in thread
From: Bart Van Assche @ 2019-04-30 22:55 UTC (permalink / raw)
  To: Guilherme G. Piccoli, linux-block, linux-raid
  Cc: dm-devel, axboe, gavin.guo, jay.vosburgh, kernel, stable

On Tue, 2019-04-30 at 19:37 -0300, Guilherme G. Piccoli wrote:
> Commit 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently
> with device removal triggers a crash") introduced a NULL pointer
> dereference in generic_make_request(). The patch sets q to NULL and
> enter_succeeded to false; right after, there's an 'if (enter_succeeded)'
> which is not taken, and then the 'else' will dereference q in
> blk_queue_dying(q).
> 
> This patch just moves the 'q = NULL' to a point in which it won't trigger
> the oops, although the semantics of this NULLification remains untouched.
> 
> A simple test case/reproducer is as follows:
> a) Build kernel v5.1-rc7 with CONFIG_BLK_CGROUP=n.
> 
> b) Create a raid0 md array with 2 NVMe devices as members, and mount it
> with an ext4 filesystem.
> 
> c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
> (dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
> echo 1 > /sys/block/nvme0n1/device/device/remove
> (whereas nvme0n1 is the 2nd array member)

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] block: Fix a NULL pointer dereference in  generic_make_request()
@ 2019-04-30 22:55   ` Bart Van Assche
  0 siblings, 0 replies; 24+ messages in thread
From: Bart Van Assche @ 2019-04-30 22:55 UTC (permalink / raw)
  To: Guilherme G. Piccoli, linux-block, linux-raid
  Cc: dm-devel, axboe, gavin.guo, jay.vosburgh, kernel, stable

On Tue, 2019-04-30 at 19:37 -0300, Guilherme G. Piccoli wrote:
> Commit 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently
> with device removal triggers a crash") introduced a NULL pointer
> dereference in generic_make_request(). The patch sets q to NULL and
> enter_succeeded to false; right after, there's an 'if (enter_succeeded)'
> which is not taken, and then the 'else' will dereference q in
> blk_queue_dying(q).
> 
> This patch just moves the 'q = NULL' to a point in which it won't trigger
> the oops, although the semantics of this NULLification remains untouched.
> 
> A simple test case/reproducer is as follows:
> a) Build kernel v5.1-rc7 with CONFIG_BLK_CGROUP=n.
> 
> b) Create a raid0 md array with 2 NVMe devices as members, and mount it
> with an ext4 filesystem.
> 
> c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
> (dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
> echo 1 > /sys/block/nvme0n1/device/device/remove
> (whereas nvme0n1 is the 2nd array member)

Reviewed-by: Bart Van Assche <bvanassche@acm.org>



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/2] block: Fix a NULL pointer dereference in  generic_make_request()
@ 2019-04-30 22:37 Guilherme G. Piccoli
  2019-04-30 22:55   ` Bart Van Assche
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Guilherme G. Piccoli @ 2019-04-30 22:37 UTC (permalink / raw)
  To: linux-block, linux-raid
  Cc: dm-devel, axboe, gavin.guo, jay.vosburgh, gpiccoli, kernel,
	Bart Van Assche, stable

Commit 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently
with device removal triggers a crash") introduced a NULL pointer
dereference in generic_make_request(). The patch sets q to NULL and
enter_succeeded to false; right after, there's an 'if (enter_succeeded)'
which is not taken, and then the 'else' will dereference q in
blk_queue_dying(q).

This patch just moves the 'q = NULL' to a point in which it won't trigger
the oops, although the semantics of this NULLification remains untouched.

A simple test case/reproducer is as follows:
a) Build kernel v5.1-rc7 with CONFIG_BLK_CGROUP=n.

b) Create a raid0 md array with 2 NVMe devices as members, and mount it
with an ext4 filesystem.

c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
(dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
echo 1 > /sys/block/nvme0n1/device/device/remove
(whereas nvme0n1 is the 2nd array member)

This will trigger the following oops:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
RIP: 0010:generic_make_request+0x32b/0x400
Call Trace:
 submit_bio+0x73/0x140
 ext4_io_submit+0x4d/0x60
 ext4_writepages+0x626/0xe90
 do_writepages+0x4b/0xe0
[...]

This patch has no functional changes and preserves the md/raid0 behavior
when a member is removed before kernel v4.17.

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: stable@vger.kernel.org # v4.17
Fixes: 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
---
 block/blk-core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a55389ba8779..e21856a7f3fa 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1076,7 +1076,6 @@ blk_qc_t generic_make_request(struct bio *bio)
 				flags = BLK_MQ_REQ_NOWAIT;
 			if (blk_queue_enter(q, flags) < 0) {
 				enter_succeeded = false;
-				q = NULL;
 			}
 		}
 
@@ -1108,6 +1107,7 @@ blk_qc_t generic_make_request(struct bio *bio)
 				bio_wouldblock_error(bio);
 			else
 				bio_io_error(bio);
+			q = NULL;
 		}
 		bio = bio_list_pop(&bio_list_on_stack[0]);
 	} while (bio);
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2019-06-12 18:48 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-23 17:23 [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request() Song Liu
2019-05-23 17:23 ` Song Liu
2019-05-23 17:23 ` [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios Song Liu
2019-05-23 17:23   ` Song Liu
2019-06-12 12:40   ` Guilherme Piccoli
2019-06-12 12:48     ` Greg KH
2019-06-12 16:38       ` Guilherme G. Piccoli
2019-06-12 16:37   ` Guilherme G. Piccoli
2019-06-12 16:49     ` Greg KH
2019-06-12 18:07       ` Guilherme Piccoli
2019-06-12 18:36         ` Greg KH
2019-06-12 18:43         ` Sasha Levin
2019-06-12 18:43           ` Sasha Levin
2019-06-12 18:48           ` Guilherme Piccoli
2019-05-23 17:25 ` [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request() Song Liu
2019-06-12 16:36 ` Guilherme G. Piccoli
  -- strict thread matches above, loose matches on Subject: below --
2019-04-30 22:37 Guilherme G. Piccoli
2019-04-30 22:55 ` Bart Van Assche
2019-04-30 22:55   ` Bart Van Assche
2019-05-17  3:33 ` Eric Ren
2019-05-17 16:17   ` Guilherme G. Piccoli
2019-05-20  2:43     ` Eric Ren
2019-05-17 22:04 ` Ming Lei
2019-05-17 22:04   ` Ming Lei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.