All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] md/bitmap: avoid read out of the disk
@ 2017-10-10 21:20 Shaohua Li
  2017-10-11 12:41 ` Joshua Kinard
  2017-10-12  3:09 ` NeilBrown
  0 siblings, 2 replies; 22+ messages in thread
From: Shaohua Li @ 2017-10-10 21:20 UTC (permalink / raw)
  To: linux-raid; +Cc: kumba, Shaohua Li, Song Liu

From: Shaohua Li <shli@fb.com>

If PAGE_SIZE is bigger than 4k, we could read out of the disk boundary. Limit
the read size to the end of disk. Write path already has similar limitation.

Fix: 8031c3ddc70a(md/bitmap: copy correct data for bitmap super)
Reported-by: Joshua Kinard <kumba@gentoo.org>
Tested-by: Joshua Kinard <kumba@gentoo.org>
Cc: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 drivers/md/bitmap.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index d2121637b4ab..f68ec973fbdd 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -153,6 +153,7 @@ static int read_sb_page(struct mddev *mddev, loff_t offset,
 
 	struct md_rdev *rdev;
 	sector_t target;
+	int target_size;
 
 	rdev_for_each(rdev, mddev) {
 		if (! test_bit(In_sync, &rdev->flags)
@@ -161,9 +162,12 @@ static int read_sb_page(struct mddev *mddev, loff_t offset,
 			continue;
 
 		target = offset + index * (PAGE_SIZE/512);
+		target_size = min_t(u64, size, i_size_read(rdev->bdev->bd_inode) -
+			((target + rdev->sb_start) << 9));
+		target_size = roundup(target_size,
+			bdev_logical_block_size(rdev->bdev));
 
-		if (sync_page_io(rdev, target,
-				 roundup(size, bdev_logical_block_size(rdev->bdev)),
+		if (sync_page_io(rdev, target, target_size,
 				 page, REQ_OP_READ, 0, true)) {
 			page->index = index;
 			return 0;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] md/bitmap: avoid read out of the disk
  2017-10-10 21:20 [PATCH] md/bitmap: avoid read out of the disk Shaohua Li
@ 2017-10-11 12:41 ` Joshua Kinard
  2017-10-12  3:09 ` NeilBrown
  1 sibling, 0 replies; 22+ messages in thread
From: Joshua Kinard @ 2017-10-11 12:41 UTC (permalink / raw)
  To: Shaohua Li, linux-raid; +Cc: Shaohua Li, Song Liu

On 10/10/2017 17:20, Shaohua Li wrote:
> From: Shaohua Li <shli@fb.com>
> 
> If PAGE_SIZE is bigger than 4k, we could read out of the disk boundary. Limit
> the read size to the end of disk. Write path already has similar limitation.
> 
> Fix: 8031c3ddc70a(md/bitmap: copy correct data for bitmap super)
> Reported-by: Joshua Kinard <kumba@gentoo.org>
> Tested-by: Joshua Kinard <kumba@gentoo.org>
> Cc: Song Liu <songliubraving@fb.com>
> Signed-off-by: Shaohua Li <shli@fb.com>
> ---
>  drivers/md/bitmap.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
> index d2121637b4ab..f68ec973fbdd 100644
> --- a/drivers/md/bitmap.c
> +++ b/drivers/md/bitmap.c
> @@ -153,6 +153,7 @@ static int read_sb_page(struct mddev *mddev, loff_t offset,
>  
>  	struct md_rdev *rdev;
>  	sector_t target;
> +	int target_size;
>  
>  	rdev_for_each(rdev, mddev) {
>  		if (! test_bit(In_sync, &rdev->flags)
> @@ -161,9 +162,12 @@ static int read_sb_page(struct mddev *mddev, loff_t offset,
>  			continue;
>  
>  		target = offset + index * (PAGE_SIZE/512);
> +		target_size = min_t(u64, size, i_size_read(rdev->bdev->bd_inode) -
> +			((target + rdev->sb_start) << 9));
> +		target_size = roundup(target_size,
> +			bdev_logical_block_size(rdev->bdev));
>  
> -		if (sync_page_io(rdev, target,
> -				 roundup(size, bdev_logical_block_size(rdev->bdev)),
> +		if (sync_page_io(rdev, target, target_size,
>  				 page, REQ_OP_READ, 0, true)) {
>  			page->index = index;
>  			return 0;
> 

Acked-by: Joshua Kinard <kumba@gentoo.org>

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] md/bitmap: avoid read out of the disk
  2017-10-10 21:20 [PATCH] md/bitmap: avoid read out of the disk Shaohua Li
  2017-10-11 12:41 ` Joshua Kinard
@ 2017-10-12  3:09 ` NeilBrown
  2017-10-12 17:30   ` Shaohua Li
  1 sibling, 1 reply; 22+ messages in thread
From: NeilBrown @ 2017-10-12  3:09 UTC (permalink / raw)
  To: Shaohua Li, linux-raid; +Cc: kumba, Shaohua Li, Song Liu

[-- Attachment #1: Type: text/plain, Size: 2511 bytes --]

On Tue, Oct 10 2017, Shaohua Li wrote:

> From: Shaohua Li <shli@fb.com>
>
> If PAGE_SIZE is bigger than 4k, we could read out of the disk boundary. Limit
> the read size to the end of disk. Write path already has similar limitation.
>
> Fix: 8031c3ddc70a(md/bitmap: copy correct data for bitmap super)
> Reported-by: Joshua Kinard <kumba@gentoo.org>
> Tested-by: Joshua Kinard <kumba@gentoo.org>
> Cc: Song Liu <songliubraving@fb.com>
> Signed-off-by: Shaohua Li <shli@fb.com>

Given that this bug was introduced by
Commit: 8031c3ddc70a ("md/bitmap: copy correct data for bitmap super")

and that patch is markted:

    Cc: stable@vger.kernel.org (4.10+)

I think this patch should be tagged "CC: stable" too.

However ... that earlier patch looks strange to me.
Why is it that "raid5 cache could write bitmap superblock before bitmap superblock is
initialized."  Can we just get raid5 cache *not* to write the bitmap
superblock too early?
I think that would better than breaking code that previously worked.

We really need to stop assuming that PAGE_SIZE is 4K.  I'm probably as
guilty as anyone else, but it would be good to stop.

Thanks,
NeilBrown


> ---
>  drivers/md/bitmap.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
> index d2121637b4ab..f68ec973fbdd 100644
> --- a/drivers/md/bitmap.c
> +++ b/drivers/md/bitmap.c
> @@ -153,6 +153,7 @@ static int read_sb_page(struct mddev *mddev, loff_t offset,
>  
>  	struct md_rdev *rdev;
>  	sector_t target;
> +	int target_size;
>  
>  	rdev_for_each(rdev, mddev) {
>  		if (! test_bit(In_sync, &rdev->flags)
> @@ -161,9 +162,12 @@ static int read_sb_page(struct mddev *mddev, loff_t offset,
>  			continue;
>  
>  		target = offset + index * (PAGE_SIZE/512);
> +		target_size = min_t(u64, size, i_size_read(rdev->bdev->bd_inode) -
> +			((target + rdev->sb_start) << 9));
> +		target_size = roundup(target_size,
> +			bdev_logical_block_size(rdev->bdev));
>  
> -		if (sync_page_io(rdev, target,
> -				 roundup(size, bdev_logical_block_size(rdev->bdev)),
> +		if (sync_page_io(rdev, target, target_size,
>  				 page, REQ_OP_READ, 0, true)) {
>  			page->index = index;
>  			return 0;
> -- 
> 2.11.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] md/bitmap: avoid read out of the disk
  2017-10-12  3:09 ` NeilBrown
@ 2017-10-12 17:30   ` Shaohua Li
  2017-10-12 17:53     ` Song Liu
  2017-10-12 21:44     ` [PATCH] md/bitmap: avoid read out of the disk NeilBrown
  0 siblings, 2 replies; 22+ messages in thread
From: Shaohua Li @ 2017-10-12 17:30 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid, kumba, Shaohua Li, Song Liu

On Thu, Oct 12, 2017 at 02:09:21PM +1100, Neil Brown wrote:
> On Tue, Oct 10 2017, Shaohua Li wrote:
> 
> > From: Shaohua Li <shli@fb.com>
> >
> > If PAGE_SIZE is bigger than 4k, we could read out of the disk boundary. Limit
> > the read size to the end of disk. Write path already has similar limitation.
> >
> > Fix: 8031c3ddc70a(md/bitmap: copy correct data for bitmap super)
> > Reported-by: Joshua Kinard <kumba@gentoo.org>
> > Tested-by: Joshua Kinard <kumba@gentoo.org>
> > Cc: Song Liu <songliubraving@fb.com>
> > Signed-off-by: Shaohua Li <shli@fb.com>
> 
> Given that this bug was introduced by
> Commit: 8031c3ddc70a ("md/bitmap: copy correct data for bitmap super")
> 
> and that patch is markted:
> 
>     Cc: stable@vger.kernel.org (4.10+)
> 
> I think this patch should be tagged "CC: stable" too.

I thought the Fix tag is enough, but I'll add the stable 
> However ... that earlier patch looks strange to me.
> Why is it that "raid5 cache could write bitmap superblock before bitmap superblock is
> initialized."  Can we just get raid5 cache *not* to write the bitmap
> superblock too early?
> I think that would better than breaking code that previously worked.

That's the log reply code, which must update superblock and hence bitmap
superblock, because reply happens very earlier. I agree the reply might still
have problem with bitmap. We'd better defer reply after the raid is fully
initialized. Song, any idea?

Thanks,
Shaohua 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] md/bitmap: avoid read out of the disk
  2017-10-12 17:30   ` Shaohua Li
@ 2017-10-12 17:53     ` Song Liu
  2017-10-12 21:46       ` NeilBrown
  2017-10-13  5:16       ` NeilBrown
  2017-10-12 21:44     ` [PATCH] md/bitmap: avoid read out of the disk NeilBrown
  1 sibling, 2 replies; 22+ messages in thread
From: Song Liu @ 2017-10-12 17:53 UTC (permalink / raw)
  To: Shaohua Li; +Cc: NeilBrown, linux-raid, kumba, Shaohua Li


> On Oct 12, 2017, at 10:30 AM, Shaohua Li <shli@kernel.org> wrote:
> 
> On Thu, Oct 12, 2017 at 02:09:21PM +1100, Neil Brown wrote:
>> On Tue, Oct 10 2017, Shaohua Li wrote:
>> 
>>> From: Shaohua Li <shli@fb.com>
>>> 
>>> If PAGE_SIZE is bigger than 4k, we could read out of the disk boundary. Limit
>>> the read size to the end of disk. Write path already has similar limitation.
>>> 
>>> Fix: 8031c3ddc70a(md/bitmap: copy correct data for bitmap super)
>>> Reported-by: Joshua Kinard <kumba@gentoo.org>
>>> Tested-by: Joshua Kinard <kumba@gentoo.org>
>>> Cc: Song Liu <songliubraving@fb.com>
>>> Signed-off-by: Shaohua Li <shli@fb.com>
>> 
>> Given that this bug was introduced by
>> Commit: 8031c3ddc70a ("md/bitmap: copy correct data for bitmap super")
>> 
>> and that patch is markted:
>> 
>>    Cc: stable@vger.kernel.org (4.10+)
>> 
>> I think this patch should be tagged "CC: stable" too.
> 
> I thought the Fix tag is enough, but I'll add the stable 
>> However ... that earlier patch looks strange to me.
>> Why is it that "raid5 cache could write bitmap superblock before bitmap superblock is
>> initialized."  Can we just get raid5 cache *not* to write the bitmap
>> superblock too early?
>> I think that would better than breaking code that previously worked.
> 
> That's the log reply code, which must update superblock and hence bitmap
> superblock, because reply happens very earlier. I agree the reply might still
> have problem with bitmap. We'd better defer reply after the raid is fully
> initialized. Song, any idea?
> 

With write back cache, there are two different types of stripes in recovery:
data-parity, and data-only. For data-parity stripes, we can simply replay data
from the journal. But for data-only stripes, we need to do rcw or rmw to update
parities. Currently, the writes are handled with raid5 state machine. Therefore,
we wake up mddev->thread in r5l_recovery_log(). It is necessary to finish these 
stripes before we fully initialize the array, because these stripes need to be 
handled with write back state machine; while we we always start the array with 
write through journal_mode. 

Maybe we can fix this by change the order of initialization in md_run(), 
specifically, moving bitmap_create() before pers->run(). 

Thanks,
Song






^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] md/bitmap: avoid read out of the disk
  2017-10-12 17:30   ` Shaohua Li
  2017-10-12 17:53     ` Song Liu
@ 2017-10-12 21:44     ` NeilBrown
  1 sibling, 0 replies; 22+ messages in thread
From: NeilBrown @ 2017-10-12 21:44 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid, kumba, Shaohua Li, Song Liu

[-- Attachment #1: Type: text/plain, Size: 2143 bytes --]

On Thu, Oct 12 2017, Shaohua Li wrote:

> On Thu, Oct 12, 2017 at 02:09:21PM +1100, Neil Brown wrote:
>> On Tue, Oct 10 2017, Shaohua Li wrote:
>> 
>> > From: Shaohua Li <shli@fb.com>
>> >
>> > If PAGE_SIZE is bigger than 4k, we could read out of the disk boundary. Limit
>> > the read size to the end of disk. Write path already has similar limitation.
>> >
>> > Fix: 8031c3ddc70a(md/bitmap: copy correct data for bitmap super)
>> > Reported-by: Joshua Kinard <kumba@gentoo.org>
>> > Tested-by: Joshua Kinard <kumba@gentoo.org>
>> > Cc: Song Liu <songliubraving@fb.com>
>> > Signed-off-by: Shaohua Li <shli@fb.com>
>> 
>> Given that this bug was introduced by
>> Commit: 8031c3ddc70a ("md/bitmap: copy correct data for bitmap super")
>> 
>> and that patch is markted:
>> 
>>     Cc: stable@vger.kernel.org (4.10+)
>> 
>> I think this patch should be tagged "CC: stable" too.
>
> I thought the Fix tag is enough, but I'll add the stable 

My experience is that Fixes: sometimes causes patched to go to stable,
but not always.  Cc: stable *always* causes the stable team to look at
the patch.
I prefer to keep each having a well defined meaning.  Fixes is
documentation about a relationship between commits, Cc:stable is a
statement about the importance of a patch.

Thanks,
Neilbrown


>> However ... that earlier patch looks strange to me.
>> Why is it that "raid5 cache could write bitmap superblock before bitmap superblock is
>> initialized."  Can we just get raid5 cache *not* to write the bitmap
>> superblock too early?
>> I think that would better than breaking code that previously worked.
>
> That's the log reply code, which must update superblock and hence bitmap
> superblock, because reply happens very earlier. I agree the reply might still
> have problem with bitmap. We'd better defer reply after the raid is fully
> initialized. Song, any idea?
>
> Thanks,
> Shaohua 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] md/bitmap: avoid read out of the disk
  2017-10-12 17:53     ` Song Liu
@ 2017-10-12 21:46       ` NeilBrown
  2017-10-12 22:51         ` Shaohua Li
  2017-10-13  5:16       ` NeilBrown
  1 sibling, 1 reply; 22+ messages in thread
From: NeilBrown @ 2017-10-12 21:46 UTC (permalink / raw)
  To: Song Liu, Shaohua Li; +Cc: linux-raid, kumba, Shaohua Li

[-- Attachment #1: Type: text/plain, Size: 2866 bytes --]

On Thu, Oct 12 2017, Song Liu wrote:

>> On Oct 12, 2017, at 10:30 AM, Shaohua Li <shli@kernel.org> wrote:
>> 
>> On Thu, Oct 12, 2017 at 02:09:21PM +1100, Neil Brown wrote:
>>> On Tue, Oct 10 2017, Shaohua Li wrote:
>>> 
>>>> From: Shaohua Li <shli@fb.com>
>>>> 
>>>> If PAGE_SIZE is bigger than 4k, we could read out of the disk boundary. Limit
>>>> the read size to the end of disk. Write path already has similar limitation.
>>>> 
>>>> Fix: 8031c3ddc70a(md/bitmap: copy correct data for bitmap super)
>>>> Reported-by: Joshua Kinard <kumba@gentoo.org>
>>>> Tested-by: Joshua Kinard <kumba@gentoo.org>
>>>> Cc: Song Liu <songliubraving@fb.com>
>>>> Signed-off-by: Shaohua Li <shli@fb.com>
>>> 
>>> Given that this bug was introduced by
>>> Commit: 8031c3ddc70a ("md/bitmap: copy correct data for bitmap super")
>>> 
>>> and that patch is markted:
>>> 
>>>    Cc: stable@vger.kernel.org (4.10+)
>>> 
>>> I think this patch should be tagged "CC: stable" too.
>> 
>> I thought the Fix tag is enough, but I'll add the stable 
>>> However ... that earlier patch looks strange to me.
>>> Why is it that "raid5 cache could write bitmap superblock before bitmap superblock is
>>> initialized."  Can we just get raid5 cache *not* to write the bitmap
>>> superblock too early?
>>> I think that would better than breaking code that previously worked.
>> 
>> That's the log reply code, which must update superblock and hence bitmap
>> superblock, because reply happens very earlier. I agree the reply might still
>> have problem with bitmap. We'd better defer reply after the raid is fully
>> initialized. Song, any idea?
>> 
>
> With write back cache, there are two different types of stripes in recovery:
> data-parity, and data-only. For data-parity stripes, we can simply replay data
> from the journal. But for data-only stripes, we need to do rcw or rmw to update
> parities. Currently, the writes are handled with raid5 state machine. Therefore,
> we wake up mddev->thread in r5l_recovery_log(). It is necessary to finish these 
> stripes before we fully initialize the array, because these stripes need to be 
> handled with write back state machine; while we we always start the array with 
> write through journal_mode. 
>
> Maybe we can fix this by change the order of initialization in md_run(), 
> specifically, moving bitmap_create() before pers->run(). 

I was thinking exactly this as I was looking at the code.  ->run can
start recovery happening, and having that happen before the bitmap is
ready is wrong.
Maybe there is enough locking that things won't happen in the wrong
order, but it does look a bit odd.
Thanks for the explanation about the interaction between the journal and
the bitmaps.  I'll dig into the code and see what I find.  Then maybe we
can compare notes.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] md/bitmap: avoid read out of the disk
  2017-10-12 21:46       ` NeilBrown
@ 2017-10-12 22:51         ` Shaohua Li
  0 siblings, 0 replies; 22+ messages in thread
From: Shaohua Li @ 2017-10-12 22:51 UTC (permalink / raw)
  To: NeilBrown; +Cc: Song Liu, linux-raid, kumba, Shaohua Li

On Fri, Oct 13, 2017 at 08:46:35AM +1100, Neil Brown wrote:
> On Thu, Oct 12 2017, Song Liu wrote:
> 
> >> On Oct 12, 2017, at 10:30 AM, Shaohua Li <shli@kernel.org> wrote:
> >> 
> >> On Thu, Oct 12, 2017 at 02:09:21PM +1100, Neil Brown wrote:
> >>> On Tue, Oct 10 2017, Shaohua Li wrote:
> >>> 
> >>>> From: Shaohua Li <shli@fb.com>
> >>>> 
> >>>> If PAGE_SIZE is bigger than 4k, we could read out of the disk boundary. Limit
> >>>> the read size to the end of disk. Write path already has similar limitation.
> >>>> 
> >>>> Fix: 8031c3ddc70a(md/bitmap: copy correct data for bitmap super)
> >>>> Reported-by: Joshua Kinard <kumba@gentoo.org>
> >>>> Tested-by: Joshua Kinard <kumba@gentoo.org>
> >>>> Cc: Song Liu <songliubraving@fb.com>
> >>>> Signed-off-by: Shaohua Li <shli@fb.com>
> >>> 
> >>> Given that this bug was introduced by
> >>> Commit: 8031c3ddc70a ("md/bitmap: copy correct data for bitmap super")
> >>> 
> >>> and that patch is markted:
> >>> 
> >>>    Cc: stable@vger.kernel.org (4.10+)
> >>> 
> >>> I think this patch should be tagged "CC: stable" too.
> >> 
> >> I thought the Fix tag is enough, but I'll add the stable 
> >>> However ... that earlier patch looks strange to me.
> >>> Why is it that "raid5 cache could write bitmap superblock before bitmap superblock is
> >>> initialized."  Can we just get raid5 cache *not* to write the bitmap
> >>> superblock too early?
> >>> I think that would better than breaking code that previously worked.
> >> 
> >> That's the log reply code, which must update superblock and hence bitmap
> >> superblock, because reply happens very earlier. I agree the reply might still
> >> have problem with bitmap. We'd better defer reply after the raid is fully
> >> initialized. Song, any idea?
> >> 
> >
> > With write back cache, there are two different types of stripes in recovery:
> > data-parity, and data-only. For data-parity stripes, we can simply replay data
> > from the journal. But for data-only stripes, we need to do rcw or rmw to update
> > parities. Currently, the writes are handled with raid5 state machine. Therefore,
> > we wake up mddev->thread in r5l_recovery_log(). It is necessary to finish these 
> > stripes before we fully initialize the array, because these stripes need to be 
> > handled with write back state machine; while we we always start the array with 
> > write through journal_mode. 
> >
> > Maybe we can fix this by change the order of initialization in md_run(), 
> > specifically, moving bitmap_create() before pers->run(). 
> 
> I was thinking exactly this as I was looking at the code.  ->run can
> start recovery happening, and having that happen before the bitmap is
> ready is wrong.
> Maybe there is enough locking that things won't happen in the wrong
> order, but it does look a bit odd.
> Thanks for the explanation about the interaction between the journal and
> the bitmaps.  I'll dig into the code and see what I find.  Then maybe we
> can compare notes.

On the other hand, is there any value to let bitmap and journal co-exist?
Sounds not to me, if journal is enabled, bitmap is useless. Should we just not
allow array with both bitmap and journal enabled?

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] md/bitmap: avoid read out of the disk
  2017-10-12 17:53     ` Song Liu
  2017-10-12 21:46       ` NeilBrown
@ 2017-10-13  5:16       ` NeilBrown
  2017-10-13 19:51         ` Shaohua Li
  1 sibling, 1 reply; 22+ messages in thread
From: NeilBrown @ 2017-10-13  5:16 UTC (permalink / raw)
  To: Song Liu, Shaohua Li; +Cc: linux-raid, kumba, Shaohua Li

[-- Attachment #1: Type: text/plain, Size: 3051 bytes --]

On Thu, Oct 12 2017, Song Liu wrote:

>> On Oct 12, 2017, at 10:30 AM, Shaohua Li <shli@kernel.org> wrote:
>> 
>> On Thu, Oct 12, 2017 at 02:09:21PM +1100, Neil Brown wrote:
>>> On Tue, Oct 10 2017, Shaohua Li wrote:
>>> 
>>>> From: Shaohua Li <shli@fb.com>
>>>> 
>>>> If PAGE_SIZE is bigger than 4k, we could read out of the disk boundary. Limit
>>>> the read size to the end of disk. Write path already has similar limitation.
>>>> 
>>>> Fix: 8031c3ddc70a(md/bitmap: copy correct data for bitmap super)
>>>> Reported-by: Joshua Kinard <kumba@gentoo.org>
>>>> Tested-by: Joshua Kinard <kumba@gentoo.org>
>>>> Cc: Song Liu <songliubraving@fb.com>
>>>> Signed-off-by: Shaohua Li <shli@fb.com>
>>> 
>>> Given that this bug was introduced by
>>> Commit: 8031c3ddc70a ("md/bitmap: copy correct data for bitmap super")
>>> 
>>> and that patch is markted:
>>> 
>>>    Cc: stable@vger.kernel.org (4.10+)
>>> 
>>> I think this patch should be tagged "CC: stable" too.
>> 
>> I thought the Fix tag is enough, but I'll add the stable 
>>> However ... that earlier patch looks strange to me.
>>> Why is it that "raid5 cache could write bitmap superblock before bitmap superblock is
>>> initialized."  Can we just get raid5 cache *not* to write the bitmap
>>> superblock too early?
>>> I think that would better than breaking code that previously worked.
>> 
>> That's the log reply code, which must update superblock and hence bitmap
>> superblock, because reply happens very earlier. I agree the reply might still
>> have problem with bitmap. We'd better defer reply after the raid is fully
>> initialized. Song, any idea?
>> 
>
> With write back cache, there are two different types of stripes in recovery:
> data-parity, and data-only. For data-parity stripes, we can simply replay data
> from the journal. But for data-only stripes, we need to do rcw or rmw to update
> parities. Currently, the writes are handled with raid5 state machine. Therefore,
> we wake up mddev->thread in r5l_recovery_log(). It is necessary to finish these 
> stripes before we fully initialize the array, because these stripes need to be 
> handled with write back state machine; while we we always start the array with 
> write through journal_mode. 
>
> Maybe we can fix this by change the order of initialization in md_run(), 
> specifically, moving bitmap_create() before pers->run(). 

I've looked at some of the details here now.

I think I would like raid5-cache to not perform any recovery until we
reach


	md_wakeup_thread(mddev->thread);
	md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */


in do_md_run().  Before that point it is possible to fail and abort -
e.g. if bitmap_load() fails.

Possibly we could insert another personality call here "->start()" ??
That could then do whatever is needed before

	set_capacity(mddev->gendisk, mddev->array_sectors);
	revalidate_disk(mddev->gendisk);

makes the array accessible.

Might that be reasonable?

Thanks,
NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] md/bitmap: avoid read out of the disk
  2017-10-13  5:16       ` NeilBrown
@ 2017-10-13 19:51         ` Shaohua Li
  2017-10-16 16:21           ` Song Liu
  0 siblings, 1 reply; 22+ messages in thread
From: Shaohua Li @ 2017-10-13 19:51 UTC (permalink / raw)
  To: NeilBrown; +Cc: Song Liu, linux-raid, kumba, Shaohua Li

On Fri, Oct 13, 2017 at 04:16:33PM +1100, Neil Brown wrote:
> On Thu, Oct 12 2017, Song Liu wrote:
> 
> >> On Oct 12, 2017, at 10:30 AM, Shaohua Li <shli@kernel.org> wrote:
> >> 
> >> On Thu, Oct 12, 2017 at 02:09:21PM +1100, Neil Brown wrote:
> >>> On Tue, Oct 10 2017, Shaohua Li wrote:
> >>> 
> >>>> From: Shaohua Li <shli@fb.com>
> >>>> 
> >>>> If PAGE_SIZE is bigger than 4k, we could read out of the disk boundary. Limit
> >>>> the read size to the end of disk. Write path already has similar limitation.
> >>>> 
> >>>> Fix: 8031c3ddc70a(md/bitmap: copy correct data for bitmap super)
> >>>> Reported-by: Joshua Kinard <kumba@gentoo.org>
> >>>> Tested-by: Joshua Kinard <kumba@gentoo.org>
> >>>> Cc: Song Liu <songliubraving@fb.com>
> >>>> Signed-off-by: Shaohua Li <shli@fb.com>
> >>> 
> >>> Given that this bug was introduced by
> >>> Commit: 8031c3ddc70a ("md/bitmap: copy correct data for bitmap super")
> >>> 
> >>> and that patch is markted:
> >>> 
> >>>    Cc: stable@vger.kernel.org (4.10+)
> >>> 
> >>> I think this patch should be tagged "CC: stable" too.
> >> 
> >> I thought the Fix tag is enough, but I'll add the stable 
> >>> However ... that earlier patch looks strange to me.
> >>> Why is it that "raid5 cache could write bitmap superblock before bitmap superblock is
> >>> initialized."  Can we just get raid5 cache *not* to write the bitmap
> >>> superblock too early?
> >>> I think that would better than breaking code that previously worked.
> >> 
> >> That's the log reply code, which must update superblock and hence bitmap
> >> superblock, because reply happens very earlier. I agree the reply might still
> >> have problem with bitmap. We'd better defer reply after the raid is fully
> >> initialized. Song, any idea?
> >> 
> >
> > With write back cache, there are two different types of stripes in recovery:
> > data-parity, and data-only. For data-parity stripes, we can simply replay data
> > from the journal. But for data-only stripes, we need to do rcw or rmw to update
> > parities. Currently, the writes are handled with raid5 state machine. Therefore,
> > we wake up mddev->thread in r5l_recovery_log(). It is necessary to finish these 
> > stripes before we fully initialize the array, because these stripes need to be 
> > handled with write back state machine; while we we always start the array with 
> > write through journal_mode. 
> >
> > Maybe we can fix this by change the order of initialization in md_run(), 
> > specifically, moving bitmap_create() before pers->run(). 
> 
> I've looked at some of the details here now.
> 
> I think I would like raid5-cache to not perform any recovery until we
> reach
> 
> 
> 	md_wakeup_thread(mddev->thread);
> 	md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */
> 
> 
> in do_md_run().  Before that point it is possible to fail and abort -
> e.g. if bitmap_load() fails.
> 
> Possibly we could insert another personality call here "->start()" ??
> That could then do whatever is needed before
> 
> 	set_capacity(mddev->gendisk, mddev->array_sectors);
> 	revalidate_disk(mddev->gendisk);
> 
> makes the array accessible.
> 
> Might that be reasonable?

Looks good. I think we should call the ->start before
md_wakeup_thread(mddev->thread); because we don't want to start recovery before
log is recovered.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] md/bitmap: avoid read out of the disk
  2017-10-13 19:51         ` Shaohua Li
@ 2017-10-16 16:21           ` Song Liu
  2017-10-16 21:15             ` NeilBrown
  0 siblings, 1 reply; 22+ messages in thread
From: Song Liu @ 2017-10-16 16:21 UTC (permalink / raw)
  To: Shaohua Li; +Cc: NeilBrown, linux-raid, kumba, Shaohua Li


> On Oct 13, 2017, at 12:51 PM, Shaohua Li <shli@kernel.org> wrote:
> 
> On Fri, Oct 13, 2017 at 04:16:33PM +1100, Neil Brown wrote:
>> On Thu, Oct 12 2017, Song Liu wrote:
>> 
>>>> On Oct 12, 2017, at 10:30 AM, Shaohua Li <shli@kernel.org> wrote:
>>>> 
>>>> On Thu, Oct 12, 2017 at 02:09:21PM +1100, Neil Brown wrote:
>>>>> On Tue, Oct 10 2017, Shaohua Li wrote:
>>>>> 
>>>>>> From: Shaohua Li <shli@fb.com>
>>>>>> 
>>>>>> If PAGE_SIZE is bigger than 4k, we could read out of the disk boundary. Limit
>>>>>> the read size to the end of disk. Write path already has similar limitation.
>>>>>> 
>>>>>> Fix: 8031c3ddc70a(md/bitmap: copy correct data for bitmap super)
>>>>>> Reported-by: Joshua Kinard <kumba@gentoo.org>
>>>>>> Tested-by: Joshua Kinard <kumba@gentoo.org>
>>>>>> Cc: Song Liu <songliubraving@fb.com>
>>>>>> Signed-off-by: Shaohua Li <shli@fb.com>
>>>>> 
>>>>> Given that this bug was introduced by
>>>>> Commit: 8031c3ddc70a ("md/bitmap: copy correct data for bitmap super")
>>>>> 
>>>>> and that patch is markted:
>>>>> 
>>>>>   Cc: stable@vger.kernel.org (4.10+)
>>>>> 
>>>>> I think this patch should be tagged "CC: stable" too.
>>>> 
>>>> I thought the Fix tag is enough, but I'll add the stable 
>>>>> However ... that earlier patch looks strange to me.
>>>>> Why is it that "raid5 cache could write bitmap superblock before bitmap superblock is
>>>>> initialized."  Can we just get raid5 cache *not* to write the bitmap
>>>>> superblock too early?
>>>>> I think that would better than breaking code that previously worked.
>>>> 
>>>> That's the log reply code, which must update superblock and hence bitmap
>>>> superblock, because reply happens very earlier. I agree the reply might still
>>>> have problem with bitmap. We'd better defer reply after the raid is fully
>>>> initialized. Song, any idea?
>>>> 
>>> 
>>> With write back cache, there are two different types of stripes in recovery:
>>> data-parity, and data-only. For data-parity stripes, we can simply replay data
>>> from the journal. But for data-only stripes, we need to do rcw or rmw to update
>>> parities. Currently, the writes are handled with raid5 state machine. Therefore,
>>> we wake up mddev->thread in r5l_recovery_log(). It is necessary to finish these 
>>> stripes before we fully initialize the array, because these stripes need to be 
>>> handled with write back state machine; while we we always start the array with 
>>> write through journal_mode. 
>>> 
>>> Maybe we can fix this by change the order of initialization in md_run(), 
>>> specifically, moving bitmap_create() before pers->run(). 
>> 
>> I've looked at some of the details here now.
>> 
>> I think I would like raid5-cache to not perform any recovery until we
>> reach
>> 
>> 
>> 	md_wakeup_thread(mddev->thread);
>> 	md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */
>> 
>> 
>> in do_md_run().  Before that point it is possible to fail and abort -
>> e.g. if bitmap_load() fails.
>> 
>> Possibly we could insert another personality call here "->start()" ??
>> That could then do whatever is needed before
>> 
>> 	set_capacity(mddev->gendisk, mddev->array_sectors);
>> 	revalidate_disk(mddev->gendisk);
>> 
>> makes the array accessible.
>> 
>> Might that be reasonable?
> 
> Looks good. I think we should call the ->start before
> md_wakeup_thread(mddev->thread); because we don't want to start recovery before
> log is recovered.

I also like this idea. In the coming month, I won't have much bandwidth to 
implement this. Please let me know if you want to make the change. Otherwise, 
I will do it later (in December, I guess). 

Thanks,
Song







^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] md/bitmap: avoid read out of the disk
  2017-10-16 16:21           ` Song Liu
@ 2017-10-16 21:15             ` NeilBrown
  2017-10-16 23:56               ` Shaohua Li
  0 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2017-10-16 21:15 UTC (permalink / raw)
  To: Song Liu, Shaohua Li; +Cc: linux-raid, kumba, Shaohua Li

[-- Attachment #1: Type: text/plain, Size: 4377 bytes --]

On Mon, Oct 16 2017, Song Liu wrote:

>> On Oct 13, 2017, at 12:51 PM, Shaohua Li <shli@kernel.org> wrote:
>> 
>> On Fri, Oct 13, 2017 at 04:16:33PM +1100, Neil Brown wrote:
>>> On Thu, Oct 12 2017, Song Liu wrote:
>>> 
>>>>> On Oct 12, 2017, at 10:30 AM, Shaohua Li <shli@kernel.org> wrote:
>>>>> 
>>>>> On Thu, Oct 12, 2017 at 02:09:21PM +1100, Neil Brown wrote:
>>>>>> On Tue, Oct 10 2017, Shaohua Li wrote:
>>>>>> 
>>>>>>> From: Shaohua Li <shli@fb.com>
>>>>>>> 
>>>>>>> If PAGE_SIZE is bigger than 4k, we could read out of the disk boundary. Limit
>>>>>>> the read size to the end of disk. Write path already has similar limitation.
>>>>>>> 
>>>>>>> Fix: 8031c3ddc70a(md/bitmap: copy correct data for bitmap super)
>>>>>>> Reported-by: Joshua Kinard <kumba@gentoo.org>
>>>>>>> Tested-by: Joshua Kinard <kumba@gentoo.org>
>>>>>>> Cc: Song Liu <songliubraving@fb.com>
>>>>>>> Signed-off-by: Shaohua Li <shli@fb.com>
>>>>>> 
>>>>>> Given that this bug was introduced by
>>>>>> Commit: 8031c3ddc70a ("md/bitmap: copy correct data for bitmap super")
>>>>>> 
>>>>>> and that patch is markted:
>>>>>> 
>>>>>>   Cc: stable@vger.kernel.org (4.10+)
>>>>>> 
>>>>>> I think this patch should be tagged "CC: stable" too.
>>>>> 
>>>>> I thought the Fix tag is enough, but I'll add the stable 
>>>>>> However ... that earlier patch looks strange to me.
>>>>>> Why is it that "raid5 cache could write bitmap superblock before bitmap superblock is
>>>>>> initialized."  Can we just get raid5 cache *not* to write the bitmap
>>>>>> superblock too early?
>>>>>> I think that would better than breaking code that previously worked.
>>>>> 
>>>>> That's the log reply code, which must update superblock and hence bitmap
>>>>> superblock, because reply happens very earlier. I agree the reply might still
>>>>> have problem with bitmap. We'd better defer reply after the raid is fully
>>>>> initialized. Song, any idea?
>>>>> 
>>>> 
>>>> With write back cache, there are two different types of stripes in recovery:
>>>> data-parity, and data-only. For data-parity stripes, we can simply replay data
>>>> from the journal. But for data-only stripes, we need to do rcw or rmw to update
>>>> parities. Currently, the writes are handled with raid5 state machine. Therefore,
>>>> we wake up mddev->thread in r5l_recovery_log(). It is necessary to finish these 
>>>> stripes before we fully initialize the array, because these stripes need to be 
>>>> handled with write back state machine; while we we always start the array with 
>>>> write through journal_mode. 
>>>> 
>>>> Maybe we can fix this by change the order of initialization in md_run(), 
>>>> specifically, moving bitmap_create() before pers->run(). 
>>> 
>>> I've looked at some of the details here now.
>>> 
>>> I think I would like raid5-cache to not perform any recovery until we
>>> reach
>>> 
>>> 
>>> 	md_wakeup_thread(mddev->thread);
>>> 	md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */
>>> 
>>> 
>>> in do_md_run().  Before that point it is possible to fail and abort -
>>> e.g. if bitmap_load() fails.
>>> 
>>> Possibly we could insert another personality call here "->start()" ??
>>> That could then do whatever is needed before
>>> 
>>> 	set_capacity(mddev->gendisk, mddev->array_sectors);
>>> 	revalidate_disk(mddev->gendisk);
>>> 
>>> makes the array accessible.
>>> 
>>> Might that be reasonable?
>> 
>> Looks good. I think we should call the ->start before
>> md_wakeup_thread(mddev->thread); because we don't want to start recovery before
>> log is recovered.
>
> I also like this idea. In the coming month, I won't have much bandwidth to 
> implement this. Please let me know if you want to make the change. Otherwise, 
> I will do it later (in December, I guess). 

It isn't something we should rush so take your time.
However I think we need to clean up the patches that have gone to
-stable.
I think
 Commit: 8031c3ddc70a ("md/bitmap: copy correct data for bitmap super")
should be reverted (in -stable too) and possibly be replaced by a patch
which refuses any attempt to combine a bitmap with a journal.
As Shaohua pointed that, that shouldn't really be needed anyway.
That would address the issue that 8031c3ddc70a was meant to fix.
I can write that patch if necessary.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] md/bitmap: avoid read out of the disk
  2017-10-16 21:15             ` NeilBrown
@ 2017-10-16 23:56               ` Shaohua Li
  2017-10-17  3:24                 ` [PATCH] md: forbid a RAID5 from having both a bitmap and a journal NeilBrown
  0 siblings, 1 reply; 22+ messages in thread
From: Shaohua Li @ 2017-10-16 23:56 UTC (permalink / raw)
  To: NeilBrown; +Cc: Song Liu, linux-raid, kumba, Shaohua Li

On Tue, Oct 17, 2017 at 08:15:33AM +1100, Neil Brown wrote:
> On Mon, Oct 16 2017, Song Liu wrote:
> 
> >> On Oct 13, 2017, at 12:51 PM, Shaohua Li <shli@kernel.org> wrote:
> >> 
> >> On Fri, Oct 13, 2017 at 04:16:33PM +1100, Neil Brown wrote:
> >>> On Thu, Oct 12 2017, Song Liu wrote:
> >>> 
> >>>>> On Oct 12, 2017, at 10:30 AM, Shaohua Li <shli@kernel.org> wrote:
> >>>>> 
> >>>>> On Thu, Oct 12, 2017 at 02:09:21PM +1100, Neil Brown wrote:
> >>>>>> On Tue, Oct 10 2017, Shaohua Li wrote:
> >>>>>> 
> >>>>>>> From: Shaohua Li <shli@fb.com>
> >>>>>>> 
> >>>>>>> If PAGE_SIZE is bigger than 4k, we could read out of the disk boundary. Limit
> >>>>>>> the read size to the end of disk. Write path already has similar limitation.
> >>>>>>> 
> >>>>>>> Fix: 8031c3ddc70a(md/bitmap: copy correct data for bitmap super)
> >>>>>>> Reported-by: Joshua Kinard <kumba@gentoo.org>
> >>>>>>> Tested-by: Joshua Kinard <kumba@gentoo.org>
> >>>>>>> Cc: Song Liu <songliubraving@fb.com>
> >>>>>>> Signed-off-by: Shaohua Li <shli@fb.com>
> >>>>>> 
> >>>>>> Given that this bug was introduced by
> >>>>>> Commit: 8031c3ddc70a ("md/bitmap: copy correct data for bitmap super")
> >>>>>> 
> >>>>>> and that patch is markted:
> >>>>>> 
> >>>>>>   Cc: stable@vger.kernel.org (4.10+)
> >>>>>> 
> >>>>>> I think this patch should be tagged "CC: stable" too.
> >>>>> 
> >>>>> I thought the Fix tag is enough, but I'll add the stable 
> >>>>>> However ... that earlier patch looks strange to me.
> >>>>>> Why is it that "raid5 cache could write bitmap superblock before bitmap superblock is
> >>>>>> initialized."  Can we just get raid5 cache *not* to write the bitmap
> >>>>>> superblock too early?
> >>>>>> I think that would better than breaking code that previously worked.
> >>>>> 
> >>>>> That's the log reply code, which must update superblock and hence bitmap
> >>>>> superblock, because reply happens very earlier. I agree the reply might still
> >>>>> have problem with bitmap. We'd better defer reply after the raid is fully
> >>>>> initialized. Song, any idea?
> >>>>> 
> >>>> 
> >>>> With write back cache, there are two different types of stripes in recovery:
> >>>> data-parity, and data-only. For data-parity stripes, we can simply replay data
> >>>> from the journal. But for data-only stripes, we need to do rcw or rmw to update
> >>>> parities. Currently, the writes are handled with raid5 state machine. Therefore,
> >>>> we wake up mddev->thread in r5l_recovery_log(). It is necessary to finish these 
> >>>> stripes before we fully initialize the array, because these stripes need to be 
> >>>> handled with write back state machine; while we we always start the array with 
> >>>> write through journal_mode. 
> >>>> 
> >>>> Maybe we can fix this by change the order of initialization in md_run(), 
> >>>> specifically, moving bitmap_create() before pers->run(). 
> >>> 
> >>> I've looked at some of the details here now.
> >>> 
> >>> I think I would like raid5-cache to not perform any recovery until we
> >>> reach
> >>> 
> >>> 
> >>> 	md_wakeup_thread(mddev->thread);
> >>> 	md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */
> >>> 
> >>> 
> >>> in do_md_run().  Before that point it is possible to fail and abort -
> >>> e.g. if bitmap_load() fails.
> >>> 
> >>> Possibly we could insert another personality call here "->start()" ??
> >>> That could then do whatever is needed before
> >>> 
> >>> 	set_capacity(mddev->gendisk, mddev->array_sectors);
> >>> 	revalidate_disk(mddev->gendisk);
> >>> 
> >>> makes the array accessible.
> >>> 
> >>> Might that be reasonable?
> >> 
> >> Looks good. I think we should call the ->start before
> >> md_wakeup_thread(mddev->thread); because we don't want to start recovery before
> >> log is recovered.
> >
> > I also like this idea. In the coming month, I won't have much bandwidth to 
> > implement this. Please let me know if you want to make the change. Otherwise, 
> > I will do it later (in December, I guess). 
> 
> It isn't something we should rush so take your time.
> However I think we need to clean up the patches that have gone to
> -stable.
> I think
>  Commit: 8031c3ddc70a ("md/bitmap: copy correct data for bitmap super")
> should be reverted (in -stable too) and possibly be replaced by a patch
> which refuses any attempt to combine a bitmap with a journal.
> As Shaohua pointed that, that shouldn't really be needed anyway.
> That would address the issue that 8031c3ddc70a was meant to fix.
> I can write that patch if necessary.

Sounds good to me. Appreciate if you write the patch.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] md: forbid a RAID5 from having both a bitmap and a journal.
  2017-10-16 23:56               ` Shaohua Li
@ 2017-10-17  3:24                 ` NeilBrown
  2017-10-17 20:41                   ` John Stoffel
                                     ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: NeilBrown @ 2017-10-17  3:24 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Song Liu, linux-raid, kumba, Shaohua Li

[-- Attachment #1: Type: text/plain, Size: 2380 bytes --]


Having both a bitmap and a journal is pointless.
Attempting to do so can corrupt the bitmap if the journal
replay happens before the bitmap is initialized.
Rather than try to avoid this corruption, simply
refuse to allow arrays with both a bitmap and a journal.
So:
 - if raid5_run sees both are present, fail.
 - if adding a bitmap finds a journal is present, fail
 - if adding a journal finds a bitmap is present, fail.

Cc: stable@vger.kernel.org (4.10+)
Signed-off-by: NeilBrown <neilb@suse.com>
---

This patch should replace 8031c3ddc70ab, which should be reverted.

Thanks,
NeilBrown


 drivers/md/bitmap.c | 6 ++++++
 drivers/md/md.c     | 2 +-
 drivers/md/raid5.c  | 7 +++++++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index cae57b5be817..f425905c97fa 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -1816,6 +1816,12 @@ struct bitmap *bitmap_create(struct mddev *mddev, int slot)
 
 	BUG_ON(file && mddev->bitmap_info.offset);
 
+	if (test_bit(MD_HAS_JOURNAL, &mddev->flags)) {
+		pr_notice("md/raid:%s: array with journal cannot have bitmap\n",
+			  mdname(mddev));
+		return ERR_PTR(-EBUSY);
+	}
+
 	bitmap = kzalloc(sizeof(*bitmap), GFP_KERNEL);
 	if (!bitmap)
 		return ERR_PTR(-ENOMEM);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 63ecfb063b76..bf06ff017eda 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6384,7 +6384,7 @@ static int add_new_disk(struct mddev *mddev, mdu_disk_info_t *info)
 					break;
 				}
 			}
-			if (has_journal) {
+			if (has_journal || mddev->bitmap) {
 				export_rdev(rdev);
 				return -EBUSY;
 			}
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 4f40ccd21cbb..e070e5c68801 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7134,6 +7134,13 @@ static int raid5_run(struct mddev *mddev)
 			min_offset_diff = diff;
 	}
 
+	if ((test_bit(MD_HAS_JOURNAL, &mddev->flags) || journal_dev) &&
+	    (mddev->bitmap_info.offset || mddev->bitmap_info.file)) {
+		pr_notice("md/raid:%s: array cannot have both journal and bitmap\n",
+			  mdname(mddev));
+		return -EINVAL;
+	}
+
 	if (mddev->reshape_position != MaxSector) {
 		/* Check that we can continue the reshape.
 		 * Difficulties arise if the stripe we would write to
-- 
2.14.0.rc0.dirty


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] md: forbid a RAID5 from having both a bitmap and a journal.
  2017-10-17  3:24                 ` [PATCH] md: forbid a RAID5 from having both a bitmap and a journal NeilBrown
@ 2017-10-17 20:41                   ` John Stoffel
  2017-10-17 21:03                     ` NeilBrown
  2017-10-18  1:50                   ` Joshua Kinard
  2017-10-19  3:16                   ` Shaohua Li
  2 siblings, 1 reply; 22+ messages in thread
From: John Stoffel @ 2017-10-17 20:41 UTC (permalink / raw)
  To: NeilBrown; +Cc: Shaohua Li, Song Liu, linux-raid, kumba, Shaohua Li

>>>>> "NeilBrown" == NeilBrown  <neilb@suse.com> writes:

NeilBrown> Having both a bitmap and a journal is pointless.
NeilBrown> Attempting to do so can corrupt the bitmap if the journal
NeilBrown> replay happens before the bitmap is initialized.
NeilBrown> Rather than try to avoid this corruption, simply
NeilBrown> refuse to allow arrays with both a bitmap and a journal.
NeilBrown> So:
NeilBrown>  - if raid5_run sees both are present, fail.

So what happens if there's someone out there with an array setup with
both already?  Should the journal or the bitmap be removed at this
time?  

NeilBrown>  - if adding a bitmap finds a journal is present, fail
NeilBrown>  - if adding a journal finds a bitmap is present, fail.

NeilBrown> Cc: stable@vger.kernel.org (4.10+)
NeilBrown> Signed-off-by: NeilBrown <neilb@suse.com>
NeilBrown> ---

NeilBrown> This patch should replace 8031c3ddc70ab, which should be reverted.

NeilBrown> Thanks,
NeilBrown> NeilBrown


NeilBrown>  drivers/md/bitmap.c | 6 ++++++
NeilBrown>  drivers/md/md.c     | 2 +-
NeilBrown>  drivers/md/raid5.c  | 7 +++++++
NeilBrown>  3 files changed, 14 insertions(+), 1 deletion(-)

NeilBrown> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
NeilBrown> index cae57b5be817..f425905c97fa 100644
NeilBrown> --- a/drivers/md/bitmap.c
NeilBrown> +++ b/drivers/md/bitmap.c
NeilBrown> @@ -1816,6 +1816,12 @@ struct bitmap *bitmap_create(struct mddev *mddev, int slot)
 
NeilBrown>  	BUG_ON(file && mddev->bitmap_info.offset);
 
NeilBrown> +	if (test_bit(MD_HAS_JOURNAL, &mddev->flags)) {
NeilBrown> +		pr_notice("md/raid:%s: array with journal cannot have bitmap\n",
NeilBrown> +			  mdname(mddev));
NeilBrown> +		return ERR_PTR(-EBUSY);
NeilBrown> +	}
NeilBrown> +
NeilBrown>  	bitmap = kzalloc(sizeof(*bitmap), GFP_KERNEL);
NeilBrown>  	if (!bitmap)
NeilBrown>  		return ERR_PTR(-ENOMEM);
NeilBrown> diff --git a/drivers/md/md.c b/drivers/md/md.c
NeilBrown> index 63ecfb063b76..bf06ff017eda 100644
NeilBrown> --- a/drivers/md/md.c
NeilBrown> +++ b/drivers/md/md.c
NeilBrown> @@ -6384,7 +6384,7 @@ static int add_new_disk(struct mddev *mddev, mdu_disk_info_t *info)
NeilBrown>  					break;
NeilBrown>  				}
NeilBrown>  			}
NeilBrown> -			if (has_journal) {
NeilBrown> +			if (has_journal || mddev->bitmap) {
NeilBrown>  				export_rdev(rdev);
NeilBrown>  				return -EBUSY;
NeilBrown>  			}
NeilBrown> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
NeilBrown> index 4f40ccd21cbb..e070e5c68801 100644
NeilBrown> --- a/drivers/md/raid5.c
NeilBrown> +++ b/drivers/md/raid5.c
NeilBrown> @@ -7134,6 +7134,13 @@ static int raid5_run(struct mddev *mddev)
NeilBrown>  			min_offset_diff = diff;
NeilBrown>  	}
 
NeilBrown> +	if ((test_bit(MD_HAS_JOURNAL, &mddev->flags) || journal_dev) &&
NeilBrown> +	    (mddev->bitmap_info.offset || mddev->bitmap_info.file)) {
NeilBrown> +		pr_notice("md/raid:%s: array cannot have both journal and bitmap\n",
NeilBrown> +			  mdname(mddev));
NeilBrown> +		return -EINVAL;
NeilBrown> +	}
NeilBrown> +
NeilBrown>  	if (mddev->reshape_position != MaxSector) {
NeilBrown>  		/* Check that we can continue the reshape.
NeilBrown>  		 * Difficulties arise if the stripe we would write to
NeilBrown> -- 
NeilBrown> 2.14.0.rc0.dirty

NeilBrown> [DELETED ATTACHMENT signature.asc, application/pgp-signature]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] md: forbid a RAID5 from having both a bitmap and a journal.
  2017-10-17 20:41                   ` John Stoffel
@ 2017-10-17 21:03                     ` NeilBrown
  2017-10-18  1:51                       ` Joshua Kinard
  2017-10-18 14:48                       ` John Stoffel
  0 siblings, 2 replies; 22+ messages in thread
From: NeilBrown @ 2017-10-17 21:03 UTC (permalink / raw)
  To: John Stoffel; +Cc: Shaohua Li, Song Liu, linux-raid, kumba, Shaohua Li

[-- Attachment #1: Type: text/plain, Size: 3956 bytes --]

On Tue, Oct 17 2017, John Stoffel wrote:

>>>>>> "NeilBrown" == NeilBrown  <neilb@suse.com> writes:
>
> NeilBrown> Having both a bitmap and a journal is pointless.
> NeilBrown> Attempting to do so can corrupt the bitmap if the journal
> NeilBrown> replay happens before the bitmap is initialized.
> NeilBrown> Rather than try to avoid this corruption, simply
> NeilBrown> refuse to allow arrays with both a bitmap and a journal.
> NeilBrown> So:
> NeilBrown>  - if raid5_run sees both are present, fail.
>
> So what happens if there's someone out there with an array setup with
> both already?  Should the journal or the bitmap be removed at this
> time?  

If someone has an array like that they can assemble it with
  --update=no-bitmap

I'd rather not automatically disable things.

Thanks,
NeilBrown


>
> NeilBrown>  - if adding a bitmap finds a journal is present, fail
> NeilBrown>  - if adding a journal finds a bitmap is present, fail.
>
> NeilBrown> Cc: stable@vger.kernel.org (4.10+)
> NeilBrown> Signed-off-by: NeilBrown <neilb@suse.com>
> NeilBrown> ---
>
> NeilBrown> This patch should replace 8031c3ddc70ab, which should be reverted.
>
> NeilBrown> Thanks,
> NeilBrown> NeilBrown
>
>
> NeilBrown>  drivers/md/bitmap.c | 6 ++++++
> NeilBrown>  drivers/md/md.c     | 2 +-
> NeilBrown>  drivers/md/raid5.c  | 7 +++++++
> NeilBrown>  3 files changed, 14 insertions(+), 1 deletion(-)
>
> NeilBrown> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
> NeilBrown> index cae57b5be817..f425905c97fa 100644
> NeilBrown> --- a/drivers/md/bitmap.c
> NeilBrown> +++ b/drivers/md/bitmap.c
> NeilBrown> @@ -1816,6 +1816,12 @@ struct bitmap *bitmap_create(struct mddev *mddev, int slot)
>  
> NeilBrown>  	BUG_ON(file && mddev->bitmap_info.offset);
>  
> NeilBrown> +	if (test_bit(MD_HAS_JOURNAL, &mddev->flags)) {
> NeilBrown> +		pr_notice("md/raid:%s: array with journal cannot have bitmap\n",
> NeilBrown> +			  mdname(mddev));
> NeilBrown> +		return ERR_PTR(-EBUSY);
> NeilBrown> +	}
> NeilBrown> +
> NeilBrown>  	bitmap = kzalloc(sizeof(*bitmap), GFP_KERNEL);
> NeilBrown>  	if (!bitmap)
> NeilBrown>  		return ERR_PTR(-ENOMEM);
> NeilBrown> diff --git a/drivers/md/md.c b/drivers/md/md.c
> NeilBrown> index 63ecfb063b76..bf06ff017eda 100644
> NeilBrown> --- a/drivers/md/md.c
> NeilBrown> +++ b/drivers/md/md.c
> NeilBrown> @@ -6384,7 +6384,7 @@ static int add_new_disk(struct mddev *mddev, mdu_disk_info_t *info)
> NeilBrown>  					break;
> NeilBrown>  				}
> NeilBrown>  			}
> NeilBrown> -			if (has_journal) {
> NeilBrown> +			if (has_journal || mddev->bitmap) {
> NeilBrown>  				export_rdev(rdev);
> NeilBrown>  				return -EBUSY;
> NeilBrown>  			}
> NeilBrown> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> NeilBrown> index 4f40ccd21cbb..e070e5c68801 100644
> NeilBrown> --- a/drivers/md/raid5.c
> NeilBrown> +++ b/drivers/md/raid5.c
> NeilBrown> @@ -7134,6 +7134,13 @@ static int raid5_run(struct mddev *mddev)
> NeilBrown>  			min_offset_diff = diff;
> NeilBrown>  	}
>  
> NeilBrown> +	if ((test_bit(MD_HAS_JOURNAL, &mddev->flags) || journal_dev) &&
> NeilBrown> +	    (mddev->bitmap_info.offset || mddev->bitmap_info.file)) {
> NeilBrown> +		pr_notice("md/raid:%s: array cannot have both journal and bitmap\n",
> NeilBrown> +			  mdname(mddev));
> NeilBrown> +		return -EINVAL;
> NeilBrown> +	}
> NeilBrown> +
> NeilBrown>  	if (mddev->reshape_position != MaxSector) {
> NeilBrown>  		/* Check that we can continue the reshape.
> NeilBrown>  		 * Difficulties arise if the stripe we would write to
> NeilBrown> -- 
> NeilBrown> 2.14.0.rc0.dirty
>
> NeilBrown> [DELETED ATTACHMENT signature.asc, application/pgp-signature]
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] md: forbid a RAID5 from having both a bitmap and a journal.
  2017-10-17  3:24                 ` [PATCH] md: forbid a RAID5 from having both a bitmap and a journal NeilBrown
  2017-10-17 20:41                   ` John Stoffel
@ 2017-10-18  1:50                   ` Joshua Kinard
  2017-10-19  3:16                   ` Shaohua Li
  2 siblings, 0 replies; 22+ messages in thread
From: Joshua Kinard @ 2017-10-18  1:50 UTC (permalink / raw)
  To: NeilBrown, Shaohua Li; +Cc: Song Liu, linux-raid, Shaohua Li

On 10/16/2017 23:24, NeilBrown wrote:
> 
> Having both a bitmap and a journal is pointless.
> Attempting to do so can corrupt the bitmap if the journal
> replay happens before the bitmap is initialized.
> Rather than try to avoid this corruption, simply
> refuse to allow arrays with both a bitmap and a journal.
> So:
>  - if raid5_run sees both are present, fail.
>  - if adding a bitmap finds a journal is present, fail
>  - if adding a journal finds a bitmap is present, fail.
> 
> Cc: stable@vger.kernel.org (4.10+)
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
> 
> This patch should replace 8031c3ddc70ab, which should be reverted.
> 
> Thanks,
> NeilBrown
> 
> 
>  drivers/md/bitmap.c | 6 ++++++
>  drivers/md/md.c     | 2 +-
>  drivers/md/raid5.c  | 7 +++++++
>  3 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
> index cae57b5be817..f425905c97fa 100644
> --- a/drivers/md/bitmap.c
> +++ b/drivers/md/bitmap.c
> @@ -1816,6 +1816,12 @@ struct bitmap *bitmap_create(struct mddev *mddev, int slot)
>  
>  	BUG_ON(file && mddev->bitmap_info.offset);
>  
> +	if (test_bit(MD_HAS_JOURNAL, &mddev->flags)) {
> +		pr_notice("md/raid:%s: array with journal cannot have bitmap\n",
> +			  mdname(mddev));
> +		return ERR_PTR(-EBUSY);
> +	}
> +
>  	bitmap = kzalloc(sizeof(*bitmap), GFP_KERNEL);
>  	if (!bitmap)
>  		return ERR_PTR(-ENOMEM);
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 63ecfb063b76..bf06ff017eda 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -6384,7 +6384,7 @@ static int add_new_disk(struct mddev *mddev, mdu_disk_info_t *info)
>  					break;
>  				}
>  			}
> -			if (has_journal) {
> +			if (has_journal || mddev->bitmap) {
>  				export_rdev(rdev);
>  				return -EBUSY;
>  			}
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 4f40ccd21cbb..e070e5c68801 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -7134,6 +7134,13 @@ static int raid5_run(struct mddev *mddev)
>  			min_offset_diff = diff;
>  	}
>  
> +	if ((test_bit(MD_HAS_JOURNAL, &mddev->flags) || journal_dev) &&
> +	    (mddev->bitmap_info.offset || mddev->bitmap_info.file)) {
> +		pr_notice("md/raid:%s: array cannot have both journal and bitmap\n",
> +			  mdname(mddev));
> +		return -EINVAL;
> +	}
> +
>  	if (mddev->reshape_position != MaxSector) {
>  		/* Check that we can continue the reshape.
>  		 * Difficulties arise if the stripe we would write to
> 

Currently running the above patch, with 8031c3ddc70ab reverted, on both my SGI
Octane and larger SGI Onyx2, both using md/raid5 with PAGE_SIZE=64K, and no new
issues noticed thus far with a few heavy compile runs.  Thanks for looking into
this!

Tested-by: Joshua Kinard <kumba@gentoo.org>
Acked-by: Joshua Kinard <kumba@gentoo.org>

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] md: forbid a RAID5 from having both a bitmap and a journal.
  2017-10-17 21:03                     ` NeilBrown
@ 2017-10-18  1:51                       ` Joshua Kinard
  2017-10-19 23:16                         ` Wols Lists
  2017-10-18 14:48                       ` John Stoffel
  1 sibling, 1 reply; 22+ messages in thread
From: Joshua Kinard @ 2017-10-18  1:51 UTC (permalink / raw)
  To: NeilBrown, John Stoffel; +Cc: Shaohua Li, Song Liu, linux-raid, Shaohua Li

On 10/17/2017 17:03, NeilBrown wrote:
> On Tue, Oct 17 2017, John Stoffel wrote:
> 
>>>>>>> "NeilBrown" == NeilBrown  <neilb@suse.com> writes:
>>
>> NeilBrown> Having both a bitmap and a journal is pointless.
>> NeilBrown> Attempting to do so can corrupt the bitmap if the journal
>> NeilBrown> replay happens before the bitmap is initialized.
>> NeilBrown> Rather than try to avoid this corruption, simply
>> NeilBrown> refuse to allow arrays with both a bitmap and a journal.
>> NeilBrown> So:
>> NeilBrown>  - if raid5_run sees both are present, fail.
>>
>> So what happens if there's someone out there with an array setup with
>> both already?  Should the journal or the bitmap be removed at this
>> time?  
> 
> If someone has an array like that they can assemble it with
>   --update=no-bitmap

IMHO, this should probably get documented in the Linux RAID Wiki once it goes
through.


> I'd rather not automatically disable things.
> 
> Thanks,
> NeilBrown

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] md: forbid a RAID5 from having both a bitmap and a journal.
  2017-10-17 21:03                     ` NeilBrown
  2017-10-18  1:51                       ` Joshua Kinard
@ 2017-10-18 14:48                       ` John Stoffel
  2017-10-19 23:21                         ` Wols Lists
  1 sibling, 1 reply; 22+ messages in thread
From: John Stoffel @ 2017-10-18 14:48 UTC (permalink / raw)
  To: NeilBrown
  Cc: John Stoffel, Shaohua Li, Song Liu, linux-raid, kumba, Shaohua Li

>>>>> "NeilBrown" == NeilBrown  <neilb@suse.com> writes:

NeilBrown> On Tue, Oct 17 2017, John Stoffel wrote:
>>>>>>> "NeilBrown" == NeilBrown  <neilb@suse.com> writes:
>> 
NeilBrown> Having both a bitmap and a journal is pointless.
NeilBrown> Attempting to do so can corrupt the bitmap if the journal
NeilBrown> replay happens before the bitmap is initialized.
NeilBrown> Rather than try to avoid this corruption, simply
NeilBrown> refuse to allow arrays with both a bitmap and a journal.
NeilBrown> So:
NeilBrown> - if raid5_run sees both are present, fail.
>> 
>> So what happens if there's someone out there with an array setup with
>> both already?  Should the journal or the bitmap be removed at this
>> time?  

NeilBrown> If someone has an array like that they can assemble it with
NeilBrown>   --update=no-bitmap

NeilBrown> I'd rather not automatically disable things.

That works for me, assuming there's a useful message in the logs
explaining what's gone wrong here.  Either mdadm emits the message, or
the kernel does.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] md: forbid a RAID5 from having both a bitmap and a journal.
  2017-10-17  3:24                 ` [PATCH] md: forbid a RAID5 from having both a bitmap and a journal NeilBrown
  2017-10-17 20:41                   ` John Stoffel
  2017-10-18  1:50                   ` Joshua Kinard
@ 2017-10-19  3:16                   ` Shaohua Li
  2 siblings, 0 replies; 22+ messages in thread
From: Shaohua Li @ 2017-10-19  3:16 UTC (permalink / raw)
  To: NeilBrown; +Cc: Song Liu, linux-raid, kumba, Shaohua Li

On Tue, Oct 17, 2017 at 02:24:09PM +1100, Neil Brown wrote:
> 
> Having both a bitmap and a journal is pointless.
> Attempting to do so can corrupt the bitmap if the journal
> replay happens before the bitmap is initialized.
> Rather than try to avoid this corruption, simply
> refuse to allow arrays with both a bitmap and a journal.
> So:
>  - if raid5_run sees both are present, fail.
>  - if adding a bitmap finds a journal is present, fail
>  - if adding a journal finds a bitmap is present, fail.
> 
> Cc: stable@vger.kernel.org (4.10+)
> Signed-off-by: NeilBrown <neilb@suse.com>

Applied, thanks!
> ---
> 
> This patch should replace 8031c3ddc70ab, which should be reverted.
> 
> Thanks,
> NeilBrown
> 
> 
>  drivers/md/bitmap.c | 6 ++++++
>  drivers/md/md.c     | 2 +-
>  drivers/md/raid5.c  | 7 +++++++
>  3 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
> index cae57b5be817..f425905c97fa 100644
> --- a/drivers/md/bitmap.c
> +++ b/drivers/md/bitmap.c
> @@ -1816,6 +1816,12 @@ struct bitmap *bitmap_create(struct mddev *mddev, int slot)
>  
>  	BUG_ON(file && mddev->bitmap_info.offset);
>  
> +	if (test_bit(MD_HAS_JOURNAL, &mddev->flags)) {
> +		pr_notice("md/raid:%s: array with journal cannot have bitmap\n",
> +			  mdname(mddev));
> +		return ERR_PTR(-EBUSY);
> +	}
> +
>  	bitmap = kzalloc(sizeof(*bitmap), GFP_KERNEL);
>  	if (!bitmap)
>  		return ERR_PTR(-ENOMEM);
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 63ecfb063b76..bf06ff017eda 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -6384,7 +6384,7 @@ static int add_new_disk(struct mddev *mddev, mdu_disk_info_t *info)
>  					break;
>  				}
>  			}
> -			if (has_journal) {
> +			if (has_journal || mddev->bitmap) {
>  				export_rdev(rdev);
>  				return -EBUSY;
>  			}
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 4f40ccd21cbb..e070e5c68801 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -7134,6 +7134,13 @@ static int raid5_run(struct mddev *mddev)
>  			min_offset_diff = diff;
>  	}
>  
> +	if ((test_bit(MD_HAS_JOURNAL, &mddev->flags) || journal_dev) &&
> +	    (mddev->bitmap_info.offset || mddev->bitmap_info.file)) {
> +		pr_notice("md/raid:%s: array cannot have both journal and bitmap\n",
> +			  mdname(mddev));
> +		return -EINVAL;
> +	}
> +
>  	if (mddev->reshape_position != MaxSector) {
>  		/* Check that we can continue the reshape.
>  		 * Difficulties arise if the stripe we would write to
> -- 
> 2.14.0.rc0.dirty
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] md: forbid a RAID5 from having both a bitmap and a journal.
  2017-10-18  1:51                       ` Joshua Kinard
@ 2017-10-19 23:16                         ` Wols Lists
  0 siblings, 0 replies; 22+ messages in thread
From: Wols Lists @ 2017-10-19 23:16 UTC (permalink / raw)
  To: Joshua Kinard, NeilBrown, John Stoffel
  Cc: Shaohua Li, Song Liu, linux-raid, Shaohua Li

On 18/10/17 02:51, Joshua Kinard wrote:
> On 10/17/2017 17:03, NeilBrown wrote:
>> On Tue, Oct 17 2017, John Stoffel wrote:
>>
>>>>>>>> "NeilBrown" == NeilBrown  <neilb@suse.com> writes:
>>>
>>> NeilBrown> Having both a bitmap and a journal is pointless.
>>> NeilBrown> Attempting to do so can corrupt the bitmap if the journal
>>> NeilBrown> replay happens before the bitmap is initialized.
>>> NeilBrown> Rather than try to avoid this corruption, simply
>>> NeilBrown> refuse to allow arrays with both a bitmap and a journal.
>>> NeilBrown> So:
>>> NeilBrown>  - if raid5_run sees both are present, fail.
>>>
>>> So what happens if there's someone out there with an array setup with
>>> both already?  Should the journal or the bitmap be removed at this
>>> time?  
>>
>> If someone has an array like that they can assemble it with
>>   --update=no-bitmap
> 
> IMHO, this should probably get documented in the Linux RAID Wiki once it goes
> through.
> 
Done! I'm not sure where it should go, so under "When things go wrogn",
I've put it in the "easy fixes" section. It simply notes that you are
now not allowed to combine journals and bitmaps, and gives the fix. (It
also says when "now" is :-)
> 
>> I'd rather not automatically disable things.
>>
Cheers,
Wol


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] md: forbid a RAID5 from having both a bitmap and a journal.
  2017-10-18 14:48                       ` John Stoffel
@ 2017-10-19 23:21                         ` Wols Lists
  0 siblings, 0 replies; 22+ messages in thread
From: Wols Lists @ 2017-10-19 23:21 UTC (permalink / raw)
  To: John Stoffel, NeilBrown
  Cc: Shaohua Li, Song Liu, linux-raid, kumba, Shaohua Li

On 18/10/17 15:48, John Stoffel wrote:
>>>>>> "NeilBrown" == NeilBrown  <neilb@suse.com> writes:
> 
> NeilBrown> On Tue, Oct 17 2017, John Stoffel wrote:
>>>>>>>> "NeilBrown" == NeilBrown  <neilb@suse.com> writes:
>>>
> NeilBrown> Having both a bitmap and a journal is pointless.
> NeilBrown> Attempting to do so can corrupt the bitmap if the journal
> NeilBrown> replay happens before the bitmap is initialized.
> NeilBrown> Rather than try to avoid this corruption, simply
> NeilBrown> refuse to allow arrays with both a bitmap and a journal.
> NeilBrown> So:
> NeilBrown> - if raid5_run sees both are present, fail.
>>>
>>> So what happens if there's someone out there with an array setup with
>>> both already?  Should the journal or the bitmap be removed at this
>>> time?  
> 
> NeilBrown> If someone has an array like that they can assemble it with
> NeilBrown>   --update=no-bitmap
> 
> NeilBrown> I'd rather not automatically disable things.
> 
> That works for me, assuming there's a useful message in the logs
> explaining what's gone wrong here.  Either mdadm emits the message, or
> the kernel does.
> 
Or are we better off, initially, just adding the code that prevents
adding a bitmap or journal? Allow existing arrays to run, emitting
messages to the console and/or log that this is not a good idea?

Then maybe a few versions down the line actually refuse to start an
array in this state.

I don't know how many arrays are in this state now, but doing this
should reduce the number of people getting a nasty shock when their
system fails to boot ...

Cheers,
Wol

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-10-19 23:21 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-10 21:20 [PATCH] md/bitmap: avoid read out of the disk Shaohua Li
2017-10-11 12:41 ` Joshua Kinard
2017-10-12  3:09 ` NeilBrown
2017-10-12 17:30   ` Shaohua Li
2017-10-12 17:53     ` Song Liu
2017-10-12 21:46       ` NeilBrown
2017-10-12 22:51         ` Shaohua Li
2017-10-13  5:16       ` NeilBrown
2017-10-13 19:51         ` Shaohua Li
2017-10-16 16:21           ` Song Liu
2017-10-16 21:15             ` NeilBrown
2017-10-16 23:56               ` Shaohua Li
2017-10-17  3:24                 ` [PATCH] md: forbid a RAID5 from having both a bitmap and a journal NeilBrown
2017-10-17 20:41                   ` John Stoffel
2017-10-17 21:03                     ` NeilBrown
2017-10-18  1:51                       ` Joshua Kinard
2017-10-19 23:16                         ` Wols Lists
2017-10-18 14:48                       ` John Stoffel
2017-10-19 23:21                         ` Wols Lists
2017-10-18  1:50                   ` Joshua Kinard
2017-10-19  3:16                   ` Shaohua Li
2017-10-12 21:44     ` [PATCH] md/bitmap: avoid read out of the disk NeilBrown

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.