linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] block: Fix a NULL pointer dereference in  generic_make_request()
@ 2019-04-30 22:37 Guilherme G. Piccoli
  2019-04-30 22:37 ` [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios Guilherme G. Piccoli
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Guilherme G. Piccoli @ 2019-04-30 22:37 UTC (permalink / raw)
  To: linux-block, linux-raid
  Cc: dm-devel, axboe, gavin.guo, jay.vosburgh, gpiccoli, kernel,
	Bart Van Assche, stable

Commit 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently
with device removal triggers a crash") introduced a NULL pointer
dereference in generic_make_request(). The patch sets q to NULL and
enter_succeeded to false; right after, there's an 'if (enter_succeeded)'
which is not taken, and then the 'else' will dereference q in
blk_queue_dying(q).

This patch just moves the 'q = NULL' to a point in which it won't trigger
the oops, although the semantics of this NULLification remains untouched.

A simple test case/reproducer is as follows:
a) Build kernel v5.1-rc7 with CONFIG_BLK_CGROUP=n.

b) Create a raid0 md array with 2 NVMe devices as members, and mount it
with an ext4 filesystem.

c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
(dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
echo 1 > /sys/block/nvme0n1/device/device/remove
(whereas nvme0n1 is the 2nd array member)

This will trigger the following oops:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
RIP: 0010:generic_make_request+0x32b/0x400
Call Trace:
 submit_bio+0x73/0x140
 ext4_io_submit+0x4d/0x60
 ext4_writepages+0x626/0xe90
 do_writepages+0x4b/0xe0
[...]

This patch has no functional changes and preserves the md/raid0 behavior
when a member is removed before kernel v4.17.

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: stable@vger.kernel.org # v4.17
Fixes: 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
---
 block/blk-core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a55389ba8779..e21856a7f3fa 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1076,7 +1076,6 @@ blk_qc_t generic_make_request(struct bio *bio)
 				flags = BLK_MQ_REQ_NOWAIT;
 			if (blk_queue_enter(q, flags) < 0) {
 				enter_succeeded = false;
-				q = NULL;
 			}
 		}
 
@@ -1108,6 +1107,7 @@ blk_qc_t generic_make_request(struct bio *bio)
 				bio_wouldblock_error(bio);
 			else
 				bio_io_error(bio);
+			q = NULL;
 		}
 		bio = bio_list_pop(&bio_list_on_stack[0]);
 	} while (bio);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0  bios
  2019-04-30 22:37 [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request() Guilherme G. Piccoli
@ 2019-04-30 22:37 ` Guilherme G. Piccoli
  2019-05-06 16:50   ` Song Liu
  2019-04-30 22:55 ` [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request() Bart Van Assche
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Guilherme G. Piccoli @ 2019-04-30 22:37 UTC (permalink / raw)
  To: linux-block, linux-raid
  Cc: dm-devel, axboe, gavin.guo, jay.vosburgh, gpiccoli, kernel,
	Ming Lei, Tetsuo Handa, stable

Commit cd4a4ae4683d ("block: don't use blocking queue entered for
recursive bio submits") introduced the flag BIO_QUEUE_ENTERED in order
split bios bypass the blocking queue entering routine and use the live
non-blocking version. It was a result of an extensive discussion in
a linux-block thread[0], and the purpose of this change was to prevent
a hung task waiting on a reference to drop.

Happens that md raid0 split bios all the time, and more important,
it changes their underlying device to the raid member. After the change
introduced by this flag's usage, we experience various crashes if a raid0
member is removed during a large write. This happens because the bio
reaches the live queue entering function when the queue of the raid0
member is dying.

A simple reproducer of this behavior is presented below:
a) Build kernel v5.1-rc7 with CONFIG_BLK_DEV_THROTTLING=y.

b) Create a raid0 md array with 2 NVMe devices as members, and mount it
with an ext4 filesystem.

c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
(dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
echo 1 > /sys/block/nvme0n1/device/device/remove
(whereas nvme0n1 is the 2nd array member)

This will trigger the following warning/oops:

------------[ cut here ]------------
no blkg associated for bio on block-device: nvme0n1
WARNING: CPU: 9 PID: 184 at ./include/linux/blk-cgroup.h:785
generic_make_request_checks+0x4dd/0x690
[...]
BUG: unable to handle kernel NULL pointer dereference at 0000000000000155
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
RIP: 0010:blk_throtl_bio+0x45/0x970
[...]
Call Trace:
 generic_make_request_checks+0x1bf/0x690
 generic_make_request+0x64/0x3f0
 raid0_make_request+0x184/0x620 [raid0]
 ? raid0_make_request+0x184/0x620 [raid0]
 ? blk_queue_split+0x384/0x6d0
 md_handle_request+0x126/0x1a0
 md_make_request+0x7b/0x180
 generic_make_request+0x19e/0x3f0
 submit_bio+0x73/0x140
[...]

This patch changes raid0 driver to fallback to the "old" blocking queue
entering procedure, by clearing the BIO_QUEUE_ENTERED from raid0 bios.
This prevents the crashes and restores the regular behavior of raid0
arrays when a member is removed during a large write.

[0] https://marc.info/?l=linux-block&m=152638475806811

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: stable@vger.kernel.org # v4.18
Fixes: cd4a4ae4683d ("block: don't use blocking queue entered for recursive bio submits")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
---
 drivers/md/raid0.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index f3fb5bb8c82a..d5bdc79e0835 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -547,6 +547,7 @@ static void raid0_handle_discard(struct mddev *mddev, struct bio *bio)
 			trace_block_bio_remap(bdev_get_queue(rdev->bdev),
 				discard_bio, disk_devt(mddev->gendisk),
 				bio->bi_iter.bi_sector);
+		bio_clear_flag(bio, BIO_QUEUE_ENTERED);
 		generic_make_request(discard_bio);
 	}
 	bio_endio(bio);
@@ -602,6 +603,7 @@ static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
 				disk_devt(mddev->gendisk), bio_sector);
 	mddev_check_writesame(mddev, bio);
 	mddev_check_write_zeroes(mddev, bio);
+	bio_clear_flag(bio, BIO_QUEUE_ENTERED);
 	generic_make_request(bio);
 	return true;
 }
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] block: Fix a NULL pointer dereference in  generic_make_request()
  2019-04-30 22:37 [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request() Guilherme G. Piccoli
  2019-04-30 22:37 ` [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios Guilherme G. Piccoli
@ 2019-04-30 22:55 ` Bart Van Assche
  2019-05-17  3:33 ` Eric Ren
  2019-05-17 22:04 ` Ming Lei
  3 siblings, 0 replies; 17+ messages in thread
From: Bart Van Assche @ 2019-04-30 22:55 UTC (permalink / raw)
  To: Guilherme G. Piccoli, linux-block, linux-raid
  Cc: dm-devel, axboe, gavin.guo, jay.vosburgh, kernel, stable

On Tue, 2019-04-30 at 19:37 -0300, Guilherme G. Piccoli wrote:
> Commit 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently
> with device removal triggers a crash") introduced a NULL pointer
> dereference in generic_make_request(). The patch sets q to NULL and
> enter_succeeded to false; right after, there's an 'if (enter_succeeded)'
> which is not taken, and then the 'else' will dereference q in
> blk_queue_dying(q).
> 
> This patch just moves the 'q = NULL' to a point in which it won't trigger
> the oops, although the semantics of this NULLification remains untouched.
> 
> A simple test case/reproducer is as follows:
> a) Build kernel v5.1-rc7 with CONFIG_BLK_CGROUP=n.
> 
> b) Create a raid0 md array with 2 NVMe devices as members, and mount it
> with an ext4 filesystem.
> 
> c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
> (dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
> echo 1 > /sys/block/nvme0n1/device/device/remove
> (whereas nvme0n1 is the 2nd array member)

Reviewed-by: Bart Van Assche <bvanassche@acm.org>



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-04-30 22:37 ` [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios Guilherme G. Piccoli
@ 2019-05-06 16:50   ` Song Liu
  2019-05-06 18:48     ` Guilherme G. Piccoli
  0 siblings, 1 reply; 17+ messages in thread
From: Song Liu @ 2019-05-06 16:50 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: linux-block, linux-raid, dm-devel, axboe, Gavin Guo,
	Jay Vosburgh, kernel, Ming Lei, Tetsuo Handa, stable

On Tue, Apr 30, 2019 at 6:38 PM Guilherme G. Piccoli
<gpiccoli@canonical.com> wrote:
>
> Commit cd4a4ae4683d ("block: don't use blocking queue entered for
> recursive bio submits") introduced the flag BIO_QUEUE_ENTERED in order
> split bios bypass the blocking queue entering routine and use the live
> non-blocking version. It was a result of an extensive discussion in
> a linux-block thread[0], and the purpose of this change was to prevent
> a hung task waiting on a reference to drop.
>
> Happens that md raid0 split bios all the time, and more important,
> it changes their underlying device to the raid member. After the change
> introduced by this flag's usage, we experience various crashes if a raid0
> member is removed during a large write. This happens because the bio
> reaches the live queue entering function when the queue of the raid0
> member is dying.
>
> A simple reproducer of this behavior is presented below:
> a) Build kernel v5.1-rc7 with CONFIG_BLK_DEV_THROTTLING=y.
>
> b) Create a raid0 md array with 2 NVMe devices as members, and mount it
> with an ext4 filesystem.
>
> c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
> (dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
> echo 1 > /sys/block/nvme0n1/device/device/remove
> (whereas nvme0n1 is the 2nd array member)
>
> This will trigger the following warning/oops:
>
> ------------[ cut here ]------------
> no blkg associated for bio on block-device: nvme0n1
> WARNING: CPU: 9 PID: 184 at ./include/linux/blk-cgroup.h:785
> generic_make_request_checks+0x4dd/0x690
> [...]
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000155
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> RIP: 0010:blk_throtl_bio+0x45/0x970
> [...]
> Call Trace:
>  generic_make_request_checks+0x1bf/0x690
>  generic_make_request+0x64/0x3f0
>  raid0_make_request+0x184/0x620 [raid0]
>  ? raid0_make_request+0x184/0x620 [raid0]
>  ? blk_queue_split+0x384/0x6d0
>  md_handle_request+0x126/0x1a0
>  md_make_request+0x7b/0x180
>  generic_make_request+0x19e/0x3f0
>  submit_bio+0x73/0x140
> [...]
>
> This patch changes raid0 driver to fallback to the "old" blocking queue
> entering procedure, by clearing the BIO_QUEUE_ENTERED from raid0 bios.
> This prevents the crashes and restores the regular behavior of raid0
> arrays when a member is removed during a large write.
>
> [0] https://marc.info/?l=linux-block&m=152638475806811
>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: stable@vger.kernel.org # v4.18
> Fixes: cd4a4ae4683d ("block: don't use blocking queue entered for recursive bio submits")
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>

IIUC, we need this for all raid types. Is it possible to fix that in md.c so
all types get the fix?

Thanks,
Song

> ---
>  drivers/md/raid0.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index f3fb5bb8c82a..d5bdc79e0835 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -547,6 +547,7 @@ static void raid0_handle_discard(struct mddev *mddev, struct bio *bio)
>                         trace_block_bio_remap(bdev_get_queue(rdev->bdev),
>                                 discard_bio, disk_devt(mddev->gendisk),
>                                 bio->bi_iter.bi_sector);
> +               bio_clear_flag(bio, BIO_QUEUE_ENTERED);
>                 generic_make_request(discard_bio);
>         }
>         bio_endio(bio);
> @@ -602,6 +603,7 @@ static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
>                                 disk_devt(mddev->gendisk), bio_sector);
>         mddev_check_writesame(mddev, bio);
>         mddev_check_write_zeroes(mddev, bio);
> +       bio_clear_flag(bio, BIO_QUEUE_ENTERED);
>         generic_make_request(bio);
>         return true;
>  }
> --
> 2.21.0
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-05-06 16:50   ` Song Liu
@ 2019-05-06 18:48     ` Guilherme G. Piccoli
  2019-05-06 21:07       ` Song Liu
  0 siblings, 1 reply; 17+ messages in thread
From: Guilherme G. Piccoli @ 2019-05-06 18:48 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-block, linux-raid, dm-devel, axboe, Gavin Guo,
	Jay Vosburgh, kernel, Ming Lei, Tetsuo Handa, stable

On 06/05/2019 13:50, Song Liu wrote:
> [...] 
> IIUC, we need this for all raid types. Is it possible to fix that in md.c so
> all types get the fix?
> 
> Thanks,
> Song
> 

Hi Song, thanks again for reviewing my code and provide input, much
appreciated!

I understand this could in theory affects all the RAID levels, but in
practice I don't think it'll happen. RAID0 is the only "blind" mode of
RAID, in the sense it's the only one that doesn't care at all with
failures. In fact, this was the origin of my other thread [0], regarding
the change of raid0's behavior in error cases..because it currently does
not care with members being removed and rely only in filesystem failures
(after submitting many BIOs to the removed device).

That said, in this change I've only took care of raid0, since in my
understanding the other levels won't submit BIOs to dead devices; we can
experiment that to see if it's true.

But I'd be happy to change all other levels also if you think it's
appropriate (or a simple generic change to md.c if it is enough). Do you
think we could go ahead with this change, and further improve that (to
cover all raid cases if necessary)?

Cheers,


Guilherme



[0] https://marc.info/?l=linux-raid&m=155562509905735

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-05-06 18:48     ` Guilherme G. Piccoli
@ 2019-05-06 21:07       ` Song Liu
  2019-05-07 21:51         ` Guilherme G. Piccoli
  2019-05-08  9:29         ` Wols Lists
  0 siblings, 2 replies; 17+ messages in thread
From: Song Liu @ 2019-05-06 21:07 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: linux-block, linux-raid, dm-devel, axboe, Gavin Guo,
	Jay Vosburgh, kernel, Ming Lei, Tetsuo Handa, stable

On Mon, May 6, 2019 at 2:48 PM Guilherme G. Piccoli
<gpiccoli@canonical.com> wrote:
>
> On 06/05/2019 13:50, Song Liu wrote:
> > [...]
> > IIUC, we need this for all raid types. Is it possible to fix that in md.c so
> > all types get the fix?
> >
> > Thanks,
> > Song
> >
>
> Hi Song, thanks again for reviewing my code and provide input, much
> appreciated!
>
> I understand this could in theory affects all the RAID levels, but in
> practice I don't think it'll happen. RAID0 is the only "blind" mode of
> RAID, in the sense it's the only one that doesn't care at all with
> failures. In fact, this was the origin of my other thread [0], regarding
> the change of raid0's behavior in error cases..because it currently does
> not care with members being removed and rely only in filesystem failures
> (after submitting many BIOs to the removed device).
>
> That said, in this change I've only took care of raid0, since in my
> understanding the other levels won't submit BIOs to dead devices; we can
> experiment that to see if it's true.

Could you please run a quick test with raid5? I am wondering whether
some race condition could get us into similar crash. If we cannot easily
trigger the bug, we can process with this version.

Thanks,
Song

>
> But I'd be happy to change all other levels also if you think it's
> appropriate (or a simple generic change to md.c if it is enough). Do you
> think we could go ahead with this change, and further improve that (to
> cover all raid cases if necessary)?
>
> Cheers,
>
>
> Guilherme
>
>
>
> [0] https://marc.info/?l=linux-raid&m=155562509905735

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-05-06 21:07       ` Song Liu
@ 2019-05-07 21:51         ` Guilherme G. Piccoli
  2019-05-08  9:29         ` Wols Lists
  1 sibling, 0 replies; 17+ messages in thread
From: Guilherme G. Piccoli @ 2019-05-07 21:51 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-block, linux-raid, dm-devel, axboe, Gavin Guo,
	Jay Vosburgh, kernel, Ming Lei, Tetsuo Handa, stable

On 06/05/2019 18:07, Song Liu wrote:
>> [...]
>> I understand this could in theory affects all the RAID levels, but in
>> practice I don't think it'll happen. RAID0 is the only "blind" mode of
>> RAID, in the sense it's the only one that doesn't care at all with
>> failures. In fact, this was the origin of my other thread [0], regarding
>> the change of raid0's behavior in error cases..because it currently does
>> not care with members being removed and rely only in filesystem failures
>> (after submitting many BIOs to the removed device).
>>
>> That said, in this change I've only took care of raid0, since in my
>> understanding the other levels won't submit BIOs to dead devices; we can
>> experiment that to see if it's true.
> 
> Could you please run a quick test with raid5? I am wondering whether
> some race condition could get us into similar crash. If we cannot easily
> trigger the bug, we can process with this version.
> 
> Thanks,
> Song

Hi Song, I've tested both RAID5 (with 3 disks, removing one at a time),
and also RAID 1 (2 disks, also removing one at a time); no issues
observed in kernel 5.1. We can see one interesting message in kernel
log: "super_written gets error=10", which corresponds to md detecting
the error (bi_status == BLK_STS_IOERROR) and instantly failing the
write, making FS read-only.

So, I think really the issue happens only in RAID0, which writes
"blindly" to its components.
Let me know your thoughts - thanks again for your input!

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-05-06 21:07       ` Song Liu
  2019-05-07 21:51         ` Guilherme G. Piccoli
@ 2019-05-08  9:29         ` Wols Lists
  2019-05-08 14:52           ` Guilherme G. Piccoli
  1 sibling, 1 reply; 17+ messages in thread
From: Wols Lists @ 2019-05-08  9:29 UTC (permalink / raw)
  To: Song Liu, Guilherme G. Piccoli
  Cc: linux-block, linux-raid, dm-devel, axboe, Gavin Guo,
	Jay Vosburgh, kernel, Ming Lei, Tetsuo Handa, stable

On 06/05/19 22:07, Song Liu wrote:
> Could you please run a quick test with raid5? I am wondering whether
> some race condition could get us into similar crash. If we cannot easily
> trigger the bug, we can process with this version.

Bear in mind I just read the list and write documentation, but ...

My gut feeling is that if it can theoretically happen for all raid
modes, it should be fixed for all raid modes. What happens if code
changes elsewhere and suddenly it really does happen for say raid-5?

On the other hand, if fixing it in md.c only gets tested for raid-0, how
do we know it will actually work for other raids if they do suddenly
start falling through.

Academic purity versus engineering practicality :-)

Cheers,
Wol

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-05-08  9:29         ` Wols Lists
@ 2019-05-08 14:52           ` Guilherme G. Piccoli
  2019-05-08 16:52             ` Wols Lists
  0 siblings, 1 reply; 17+ messages in thread
From: Guilherme G. Piccoli @ 2019-05-08 14:52 UTC (permalink / raw)
  To: Wols Lists, Song Liu
  Cc: Guilherme G. Piccoli, linux-block, linux-raid, dm-devel, axboe,
	Gavin Guo, Jay Vosburgh, kernel, Ming Lei, Tetsuo Handa, stable

On 5/8/19 6:29 AM, Wols Lists wrote:
> On 06/05/19 22:07, Song Liu wrote:
>> Could you please run a quick test with raid5? I am wondering whether
>> some race condition could get us into similar crash. If we cannot easily
>> trigger the bug, we can process with this version.
> 
> Bear in mind I just read the list and write documentation, but ...
> 
> My gut feeling is that if it can theoretically happen for all raid
> modes, it should be fixed for all raid modes. What happens if code
> changes elsewhere and suddenly it really does happen for say raid-5?
> 
> On the other hand, if fixing it in md.c only gets tested for raid-0, how
> do we know it will actually work for other raids if they do suddenly
> start falling through.

Hi, I understand your concern. But all other raid levels contains 
failure-event mechanisms. For example, in all my tests with raid5 or 
raid1, it first complained the device was removed, then it failed in 
super_written() when no other available device was present.
On the other hand, raid0 does "blind-writes": it just selects the device 
in which that bio should be written (given the stripe math) and change 
the bio's device, sending it back via generic_make_request(). It's 
dummy, but not in a bad way, but rather for performance reasons. It has 
no "intelligence" for failures, as all other raid levels.

That said, we could fix md.c for all raid levels, but I personally think 
it's a bazooka shot, only raid0 shows consistently this issue.

> 
> Academic purity versus engineering practicality :-)

Heheh you have good points here! Thanks for the input =)
Cheers,


Guilherme


> 
> Cheers,
> Wol
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-05-08 14:52           ` Guilherme G. Piccoli
@ 2019-05-08 16:52             ` Wols Lists
  2019-05-17 16:19               ` Guilherme G. Piccoli
  0 siblings, 1 reply; 17+ messages in thread
From: Wols Lists @ 2019-05-08 16:52 UTC (permalink / raw)
  To: Guilherme G. Piccoli, Song Liu
  Cc: Guilherme G. Piccoli, linux-block, linux-raid, dm-devel, axboe,
	Gavin Guo, Jay Vosburgh, kernel, Ming Lei, Tetsuo Handa, stable

On 08/05/19 15:52, Guilherme G. Piccoli wrote:
> Hi, I understand your concern. But all other raid levels contains
> failure-event mechanisms. For example, in all my tests with raid5 or
> raid1, it first complained the device was removed, then it failed in
> super_written() when no other available device was present.
> On the other hand, raid0 does "blind-writes": it just selects the device
> in which that bio should be written (given the stripe math) and change
> the bio's device, sending it back via generic_make_request(). It's
> dummy, but not in a bad way, but rather for performance reasons. It has
> no "intelligence" for failures, as all other raid levels.
> 
> That said, we could fix md.c for all raid levels, but I personally think
> it's a bazooka shot, only raid0 shows consistently this issue.
> 
The academic in me says we should push that error handling into
generic_make_request() or some raid function in md.c that deals with
those problems. Sounds like there's a fair bit of duplicate
functionality that could be re-factored out.
>>
>> Academic purity versus engineering practicality :-)
> 
> Heheh you have good points here! Thanks for the input =)
> Cheers,
> 
Doesn't help when there's not an architect to design an overall "grand
scheme", but my usual way of working is to design top down academically,
and then ask myself "what do I need" before implementing bottom-up.
Hopefully with a load of documentation saying "I haven't done this
because I don't need it, but this is where it goes".

Cheers,
Wol


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request()
  2019-04-30 22:37 [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request() Guilherme G. Piccoli
  2019-04-30 22:37 ` [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios Guilherme G. Piccoli
  2019-04-30 22:55 ` [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request() Bart Van Assche
@ 2019-05-17  3:33 ` Eric Ren
  2019-05-17 16:17   ` Guilherme G. Piccoli
  2019-05-17 22:04 ` Ming Lei
  3 siblings, 1 reply; 17+ messages in thread
From: Eric Ren @ 2019-05-17  3:33 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: linux-block, linux-raid, dm-devel, axboe, gavin.guo,
	jay.vosburgh, kernel, Bart Van Assche, stable

Hello,

On Wed, 1 May 2019 at 06:38, Guilherme G. Piccoli
<gpiccoli@canonical.com> wrote:
>
> Commit 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently
> with device removal triggers a crash") introduced a NULL pointer
> dereference in generic_make_request(). The patch sets q to NULL and
> enter_succeeded to false; right after, there's an 'if (enter_succeeded)'
> which is not taken, and then the 'else' will dereference q in
> blk_queue_dying(q).
>
> This patch just moves the 'q = NULL' to a point in which it won't trigger
> the oops, although the semantics of this NULLification remains untouched.
>
> A simple test case/reproducer is as follows:
> a) Build kernel v5.1-rc7 with CONFIG_BLK_CGROUP=n.
>
> b) Create a raid0 md array with 2 NVMe devices as members, and mount it
> with an ext4 filesystem.
>
> c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
> (dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
> echo 1 > /sys/block/nvme0n1/device/device/remove
> (whereas nvme0n1 is the 2nd array member)
>
> This will trigger the following oops:
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> RIP: 0010:generic_make_request+0x32b/0x400
> Call Trace:
>  submit_bio+0x73/0x140
>  ext4_io_submit+0x4d/0x60
>  ext4_writepages+0x626/0xe90
>  do_writepages+0x4b/0xe0
> [...]
>
> This patch has no functional changes and preserves the md/raid0 behavior
> when a member is removed before kernel v4.17.
>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: stable@vger.kernel.org # v4.17
> Fixes: 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash")
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>

Thanks for the bugfix. I also had a panic having very similar
calltrace below as this one,
when using devicemapper in container scenario and deleting many thin
snapshots by dmsetup
remove_all -f, meanwhile executing lvm command like vgs.

After applied this one, my testing doesn't crash kernel any more for
one week.  Could the block
developers please give more feedback/priority on this one?

My panic trace:
```
50515.136279] BUG: unable to handle kernel NULL pointer dereference at
00000000000003b8
[50515.144704] PGD 0 P4D 0
[50515.147576] Oops: 0000 [#1] SMP PTI
[50515.151403] CPU: 24 PID: 45287 Comm: vgs Kdump: loaded Not tainted
4.19.24-9.x86_64#1
[50515.169295] RIP: 0010:generic_make_request+0x24/0x350
[50515.174684] Code: e9 e1 45 42 00 90 0f 1f 44 00 00 55 48 89 e5 41
55 41 54 53 48 89 fb 48 83 e4 f0 48 83 ec 20 48 8b 47 08 f6 47 15 08
8b 77 10 <4c> 8b a0 b8 03 00 00 0f 84 82 00 00 00 49 8b 84 24 50 07 00
00 a8
[50515.194303] RSP: 0018:ffffa90862373a10 EFLAGS: 00010246
[50515.199869] RAX: 0000000000000000 RBX: ffff99d7338b7800 RCX: 0000000000000000
[50515.207347] RDX: ffff99d51d89c380 RSI: 0000000000000000 RDI: ffff99d7338b7800
[50515.214828] RBP: ffffa90862373a48 R08: 0000000000000000 R09: ffff99a840007300
[50515.222305] R10: ffffa90862373b88 R11: 0000000000000000 R12: ffff99d833592200
[50515.229782] R13: ffffa90862373b58 R14: 0000000000000001 R15: 0000000000000000
[50515.237264] FS:  00007fc36b322880(0000) GS:ffff99d83f000000(0000)
knlGS:0000000000000000
[50515.245944] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[50515.252036] CR2: 00000000000003b8 CR3: 0000005c53ed0001 CR4: 00000000003626a0
[50515.259519] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[50515.266996] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[50515.274473] Call Trace:
[50515.277264]  ? bio_add_page+0x42/0x50
[50515.281267]  submit_bio+0x6e/0x130
[50515.285013]  new_read+0xa2/0xf0 [dm_bufio]
[50515.289454]  dm_bm_read_lock+0x21/0x70 [dm_persistent_data]
[50515.295369]  ro_step+0x31/0x60 [dm_persistent_data]
[50515.300589]  dm_btree_find_key+0xb0/0x180 [dm_persistent_data]
[50515.306765]  dm_thin_get_highest_mapped_block+0x75/0x90 [dm_thin_pool]
[50515.313639]  thin_status+0x145/0x290 [dm_thin_pool]
...
```
Regards,
Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request()
  2019-05-17  3:33 ` Eric Ren
@ 2019-05-17 16:17   ` Guilherme G. Piccoli
  2019-05-20  2:43     ` Eric Ren
  0 siblings, 1 reply; 17+ messages in thread
From: Guilherme G. Piccoli @ 2019-05-17 16:17 UTC (permalink / raw)
  To: Eric Ren
  Cc: Guilherme G. Piccoli, linux-block, linux-raid, dm-devel, axboe,
	Gavin Guo, Jay Vosburgh, Bart Van Assche, stable

On Fri, May 17, 2019 at 12:33 AM Eric Ren <renzhengeek@gmail.com> wrote:
>
> Hello,
> [...]
> Thanks for the bugfix. I also had a panic having very similar
> calltrace below as this one,
> when using devicemapper in container scenario and deleting many thin
> snapshots by dmsetup
> remove_all -f, meanwhile executing lvm command like vgs.
>
> After applied this one, my testing doesn't crash kernel any more for
> one week.  Could the block
> developers please give more feedback/priority on this one?
>

Thanks Eric, for the testing! I think you could send your Tested-by[0]
tag, which could be added
in the patch before merge. It's good to know the patch helped somebody
and your testing improves
confidence in the change.

Cheers,


Guilherme

[0] https://www.kernel.org/doc/html/latest/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-05-08 16:52             ` Wols Lists
@ 2019-05-17 16:19               ` Guilherme G. Piccoli
  2019-05-20 16:23                 ` Song Liu
  0 siblings, 1 reply; 17+ messages in thread
From: Guilherme G. Piccoli @ 2019-05-17 16:19 UTC (permalink / raw)
  To: Wols Lists, Song Liu, axboe
  Cc: Guilherme G. Piccoli, linux-block, linux-raid, dm-devel,
	Gavin Guo, Jay Vosburgh, Ming Lei, Tetsuo Handa, stable

Jens / Song, any news in this one?

I think would be good to have this raid0 fix rather sooner than later
if possible - it's easy
to reproduce the crash.

Thanks,


Guilherme

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request()
  2019-04-30 22:37 [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request() Guilherme G. Piccoli
                   ` (2 preceding siblings ...)
  2019-05-17  3:33 ` Eric Ren
@ 2019-05-17 22:04 ` Ming Lei
  3 siblings, 0 replies; 17+ messages in thread
From: Ming Lei @ 2019-05-17 22:04 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: linux-block, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	open list:DEVICE-MAPPER (LVM),
	Jens Axboe, Gavin Guo, Jay Vosburgh, kernel, Bart Van Assche,
	stable

On Wed, May 1, 2019 at 6:38 AM Guilherme G. Piccoli
<gpiccoli@canonical.com> wrote:
>
> Commit 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently
> with device removal triggers a crash") introduced a NULL pointer
> dereference in generic_make_request(). The patch sets q to NULL and
> enter_succeeded to false; right after, there's an 'if (enter_succeeded)'
> which is not taken, and then the 'else' will dereference q in
> blk_queue_dying(q).
>
> This patch just moves the 'q = NULL' to a point in which it won't trigger
> the oops, although the semantics of this NULLification remains untouched.
>
> A simple test case/reproducer is as follows:
> a) Build kernel v5.1-rc7 with CONFIG_BLK_CGROUP=n.
>
> b) Create a raid0 md array with 2 NVMe devices as members, and mount it
> with an ext4 filesystem.
>
> c) Run the following oneliner (supposing the raid0 is mounted in /mnt):
> (dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3;
> echo 1 > /sys/block/nvme0n1/device/device/remove
> (whereas nvme0n1 is the 2nd array member)
>
> This will trigger the following oops:
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> RIP: 0010:generic_make_request+0x32b/0x400
> Call Trace:
>  submit_bio+0x73/0x140
>  ext4_io_submit+0x4d/0x60
>  ext4_writepages+0x626/0xe90
>  do_writepages+0x4b/0xe0
> [...]
>
> This patch has no functional changes and preserves the md/raid0 behavior
> when a member is removed before kernel v4.17.
>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: stable@vger.kernel.org # v4.17
> Fixes: 37f9579f4c31 ("blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash")

BTW, the legacy IO request path is removed since 5.0, the above fault
commit can be
removed as done by the following patches:

https://lore.kernel.org/linux-block/20190515030310.20393-1-ming.lei@redhat.com/T/#t

> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
> ---
>  block/blk-core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index a55389ba8779..e21856a7f3fa 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1076,7 +1076,6 @@ blk_qc_t generic_make_request(struct bio *bio)
>                                 flags = BLK_MQ_REQ_NOWAIT;
>                         if (blk_queue_enter(q, flags) < 0) {
>                                 enter_succeeded = false;
> -                               q = NULL;
>                         }

Please remove '{}'.

>                 }
>
> @@ -1108,6 +1107,7 @@ blk_qc_t generic_make_request(struct bio *bio)
>                                 bio_wouldblock_error(bio);
>                         else
>                                 bio_io_error(bio);
> +                       q = NULL;
>                 }
>                 bio = bio_list_pop(&bio_list_on_stack[0]);
>         } while (bio);
> --
> 2.21.0
>

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request()
  2019-05-17 16:17   ` Guilherme G. Piccoli
@ 2019-05-20  2:43     ` Eric Ren
  0 siblings, 0 replies; 17+ messages in thread
From: Eric Ren @ 2019-05-20  2:43 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Guilherme G. Piccoli, linux-block, linux-raid, dm-devel, axboe,
	Gavin Guo, Jay Vosburgh, Bart Van Assche, stable

Hi,

On Sat, 18 May 2019 at 00:17, Guilherme G. Piccoli <kernel@gpiccoli.net> wrote:
>
> On Fri, May 17, 2019 at 12:33 AM Eric Ren <renzhengeek@gmail.com> wrote:
> >
> > Hello,
> > [...]
> > Thanks for the bugfix. I also had a panic having very similar
> > calltrace below as this one,
> > when using devicemapper in container scenario and deleting many thin
> > snapshots by dmsetup
> > remove_all -f, meanwhile executing lvm command like vgs.
> >
> > After applied this one, my testing doesn't crash kernel any more for
> > one week.  Could the block
> > developers please give more feedback/priority on this one?
> >
>
> Thanks Eric, for the testing! I think you could send your Tested-by[0]
> tag, which could be added
> in the patch before merge. It's good to know the patch helped somebody
> and your testing improves
> confidence in the change.

Please consider Ming's comments and send patch v2, then feel free to add:
Tested-by: Eric Ren <renzhengeek@gmail.com>

Thanks!
Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-05-17 16:19               ` Guilherme G. Piccoli
@ 2019-05-20 16:23                 ` Song Liu
  2019-05-20 19:25                   ` Guilherme Piccoli
  0 siblings, 1 reply; 17+ messages in thread
From: Song Liu @ 2019-05-20 16:23 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Wols Lists, axboe, Guilherme G. Piccoli, linux-block, linux-raid,
	dm-devel, Gavin Guo, Jay Vosburgh, Ming Lei, Tetsuo Handa,
	stable

On Fri, May 17, 2019 at 9:19 AM Guilherme G. Piccoli
<kernel@gpiccoli.net> wrote:
>
> Jens / Song, any news in this one?
>
> I think would be good to have this raid0 fix rather sooner than later
> if possible - it's easy
> to reproduce the crash.
>
> Thanks,
>
>
> Guilherme

I will process it. It was delayed due to the merge window.

Thanks,
Song

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios
  2019-05-20 16:23                 ` Song Liu
@ 2019-05-20 19:25                   ` Guilherme Piccoli
  0 siblings, 0 replies; 17+ messages in thread
From: Guilherme Piccoli @ 2019-05-20 19:25 UTC (permalink / raw)
  To: Song Liu
  Cc: Guilherme G. Piccoli, Wols Lists, axboe, linux-block, linux-raid,
	dm-devel, Gavin Guo, Jay Vosburgh, Ming Lei, Tetsuo Handa,
	stable

On Mon, May 20, 2019 at 1:24 PM Song Liu <liu.song.a23@gmail.com> wrote:
>
> On Fri, May 17, 2019 at 9:19 AM Guilherme G. Piccoli
>
> I will process it. It was delayed due to the merge window.
>
> Thanks,
> Song


Thank you Song! I'm ready to send a v2, just to match the v2 of patch
"1/2" of this series (but no
change in this one, except rebase to 5.2-rc1).

Cheers,

Guilherme

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-05-20 19:26 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-30 22:37 [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request() Guilherme G. Piccoli
2019-04-30 22:37 ` [PATCH 2/2] md/raid0: Do not bypass blocking queue entered for raid0 bios Guilherme G. Piccoli
2019-05-06 16:50   ` Song Liu
2019-05-06 18:48     ` Guilherme G. Piccoli
2019-05-06 21:07       ` Song Liu
2019-05-07 21:51         ` Guilherme G. Piccoli
2019-05-08  9:29         ` Wols Lists
2019-05-08 14:52           ` Guilherme G. Piccoli
2019-05-08 16:52             ` Wols Lists
2019-05-17 16:19               ` Guilherme G. Piccoli
2019-05-20 16:23                 ` Song Liu
2019-05-20 19:25                   ` Guilherme Piccoli
2019-04-30 22:55 ` [PATCH 1/2] block: Fix a NULL pointer dereference in generic_make_request() Bart Van Assche
2019-05-17  3:33 ` Eric Ren
2019-05-17 16:17   ` Guilherme G. Piccoli
2019-05-20  2:43     ` Eric Ren
2019-05-17 22:04 ` Ming Lei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).