* raid5 crash on system which PAGE_SIZE is 64KB
From: Xiao Ni @ 2021-03-15 13:44 UTC
  To: yuyufen, song, linux-raid, Nigel Croxon
  Cc: Heinz Mauelshagen, kent.overstreet

Hi all

We have encountered a raid5 crash problem on a POWER system whose PAGE_SIZE is 64KB.
I can reproduce this problem 100% of the time, and it can be reproduced with the latest upstream kernel.

The steps are:
mdadm -CR /dev/md0 -l5 -n3 /dev/sda1 /dev/sdc1 /dev/sdd1
mkfs.xfs /dev/md0 -f
mount /dev/md0 /mnt/test

The error message is:
mount: /mnt/test: mount(2) system call failed: Structure needs cleaning.

We can see the following error messages in dmesg:
[ 6455.761545] XFS (md0): Metadata CRC error detected at xfs_agf_read_verify+0x118/0x160 [xfs], xfs_agf block 0x2105c008
[ 6455.761570] XFS (md0): Unmount and run xfs_repair
[ 6455.761575] XFS (md0): First 128 bytes of corrupted metadata buffer:
[ 6455.761581] 00000000: fe ed ba be 00 00 00 00 00 00 00 02 00 00 00 00  ................
[ 6455.761586] 00000010: 00 00 00 00 00 00 03 c0 00 00 00 01 00 00 00 00  ................
[ 6455.761590] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 6455.761594] 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 6455.761598] 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 6455.761601] 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 6455.761605] 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 6455.761609] 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 6455.761662] XFS (md0): metadata I/O error in "xfs_read_agf+0xb4/0x1a0 [xfs]" at daddr 0x2105c008 len 8 error 74
[ 6455.761673] XFS (md0): Error -117 recovering leftover CoW allocations.
[ 6455.761685] XFS (md0): Corruption of in-memory data detected. Shutting down filesystem
[ 6455.761690] XFS (md0): Please unmount the filesystem and rectify the problem(s)

This problem doesn't happen when the raid device is created with --assume-clean,
so the crash only happens when resync and normal write I/O run at the same time.
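
For comparison, the control case that avoids the problem is simply skipping the
initial resync, e.g. (same commands as above, only adding --assume-clean; the
filesystem mounts cleanly in that case):

mdadm -CR /dev/md0 -l5 -n3 --assume-clean /dev/sda1 /dev/sdc1 /dev/sdd1
mkfs.xfs /dev/md0 -f
mount /dev/md0 /mnt/test    # mounts cleanly here, no CRC error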

I tried reverting the patch set "Save memory for stripe_head buffer" and that makes
the problem go away. I'm looking into this, but I haven't found the root cause yet.
Could you have a look?

By the way, there is a place that I can't understand. Is it a bug?
Should we do it this way:
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 5d57a5b..4a5e8ae 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1479,7 +1479,7 @@ static struct page **to_addr_page(struct raid5_percpu *percpu, int i)
 static addr_conv_t *to_addr_conv(struct stripe_head *sh,
                                  struct raid5_percpu *percpu, int i)
 {
-       return (void *) (to_addr_page(percpu, i) + sh->disks + 2);
+       return (void *) (to_addr_page(percpu, i) + sizeof(struct page*)*(sh->disks + 2));
 }

 /*
@@ -1488,7 +1488,7 @@ static addr_conv_t *to_addr_conv(struct stripe_head *sh,
 static unsigned int *
 to_addr_offs(struct stripe_head *sh, struct raid5_percpu *percpu)
 {
-       return (unsigned int *) (to_addr_conv(sh, percpu, 0) + sh->disks + 2);
+       return (unsigned int *) (to_addr_conv(sh, percpu, 0) + sizeof(addr_conv_t)*(sh->disks + 2));
 }

This was introduced by commit b330e6a49d ("md: convert to kvmalloc").
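
For reference, my reading of the per-cpu scribble layout that these helpers assume
(a rough sketch based only on the code above, not something taken from the patch set):

/*
 * scribble slot i, as implied by the helpers above:
 *
 *   struct page *[disks + 2]    <- to_addr_page(percpu, i)
 *   addr_conv_t  [disks + 2]    <- to_addr_conv(sh, percpu, i)
 *   unsigned int [...]          <- to_addr_offs(sh, percpu), always taken from slot 0
 */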

Regards
Xiao






* Re: raid5 crash on system which PAGE_SIZE is 64KB
From: Yufen Yu @ 2021-03-16  9:20 UTC
  To: Xiao Ni, song, linux-raid, Nigel Croxon
  Cc: Heinz Mauelshagen, kent.overstreet



On 2021/3/15 21:44, Xiao Ni wrote:
> Hi all
> 
> We encounter one raid5 crash problem on POWER system which PAGE_SIZE is 64KB.
> I can reproduce this problem 100%.  This problem can be reproduced with latest upstream kernel.
> 
> The steps are:
> mdadm -CR /dev/md0 -l5 -n3 /dev/sda1 /dev/sdc1 /dev/sdd1
> mkfs.xfs /dev/md0 -f
> mount /dev/md0 /mnt/test
> 
> The error message is:
> mount: /mnt/test: mount(2) system call failed: Structure needs cleaning.
> 
> We can see error message in dmesg:
> [ 6455.761545] XFS (md0): Metadata CRC error detected at xfs_agf_read_verify+0x118/0x160 [xfs], xfs_agf block 0x2105c008
> [...]
> 
> This problem doesn't happen when creating raid device with --assume-clean. So the crash only happens when sync and normal
> I/O write at the same time.
> 
> I tried to revert the patch set "Save memory for stripe_head buffer" and the problem can be fixed. I'm looking at this problem,
> but I haven't found the root cause. Could you have a look?

Thanks for reporting this bug. Please give me some time to debug it;
my time has been very limited recently.

Thanks,
Yufen



* Re: raid5 crash on system which PAGE_SIZE is 64KB
From: Song Liu @ 2021-03-22 17:28 UTC
  To: Yufen Yu
  Cc: Xiao Ni, linux-raid, Nigel Croxon, Heinz Mauelshagen, kent.overstreet

On Tue, Mar 16, 2021 at 2:20 AM Yufen Yu <yuyufen@huawei.com> wrote:
>
>
>
> On 2021/3/15 21:44, Xiao Ni wrote:
> > [...]
>
> Thanks for reporting this bug. Please give me some times to debug it,
> recently time is very limited for me.
>
> Thanks,
> Yufen

Hi Yufen,

Have you got time to look into this?

>
> >
> > By the way, there is a place that I can't understand. Is it a bug? Should we do in this way:
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index 5d57a5b..4a5e8ae 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -1479,7 +1479,7 @@ static struct page **to_addr_page(struct raid5_percpu *percpu, int i)
> >   static addr_conv_t *to_addr_conv(struct stripe_head *sh,
> >                                   struct raid5_percpu *percpu, int i)
> >   {
> > -       return (void *) (to_addr_page(percpu, i) + sh->disks + 2);
> > +       return (void *) (to_addr_page(percpu, i) + sizeof(struct page*)*(sh->disks + 2));

I guess we don't need this change. to_addr_page() returns "struct page **", which
should have the same size as "struct page *", no?
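
That is, pointer arithmetic on a "struct page **" already advances in units of
sizeof(struct page *), so the proposed multiplication would over-scale the offset.
A minimal user-space sketch of that rule (an illustration only, not kernel code):

#include <assert.h>
#include <stdio.h>

struct page;                             /* opaque stand-in for the kernel type */

int main(void)
{
        struct page *slots[16];          /* stands in for the (disks + 2) page-pointer area */
        struct page **base = slots;
        int n = 5;                       /* plays the role of sh->disks + 2 */

        void *scaled  = base + n;                                  /* the compiler scales n by sizeof(struct page *) */
        void *bybytes = (char *)base + sizeof(struct page *) * n;  /* the same offset done by hand in bytes */

        assert(scaled == bybytes);       /* both expressions name the same address */
        printf("%p == %p\n", scaled, bybytes);
        return 0;
}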

Thanks,
Song


* Re: raid5 crash on system which PAGE_SIZE is 64KB
From: Xiao Ni @ 2021-03-23  5:04 UTC
  To: Song Liu, Yufen Yu
  Cc: linux-raid, Nigel Croxon, Heinz Mauelshagen, kent.overstreet



On 03/23/2021 01:28 AM, Song Liu wrote:
> On Tue, Mar 16, 2021 at 2:20 AM Yufen Yu <yuyufen@huawei.com> wrote:
>> [...]
> Hi Yufen,
>
> Have you got time to look into this?
>
>>> By the way, there is a place that I can't understand. Is it a bug? Should we do in this way:
>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>>> index 5d57a5b..4a5e8ae 100644
>>> --- a/drivers/md/raid5.c
>>> +++ b/drivers/md/raid5.c
>>> @@ -1479,7 +1479,7 @@ static struct page **to_addr_page(struct raid5_percpu *percpu, int i)
>>>    static addr_conv_t *to_addr_conv(struct stripe_head *sh,
>>>                                    struct raid5_percpu *percpu, int i)
>>>    {
>>> -       return (void *) (to_addr_page(percpu, i) + sh->disks + 2);
>>> +       return (void *) (to_addr_page(percpu, i) + sizeof(struct page*)*(sh->disks + 2));
> I guess we don't need this change. to_add_page() returns "struct page **", which
> should have same size of "struct page*", no?

You are right, we don't need to change this. I'm still looking at this
problem too, and I'll report back once I find new hints.

Regards
Xiao



* Re: raid5 crash on system which PAGE_SIZE is 64KB
From: Yufen Yu @ 2021-03-23  7:41 UTC
  To: Song Liu
  Cc: Xiao Ni, linux-raid, Nigel Croxon, Heinz Mauelshagen, kent.overstreet

Hi,

On 2021/3/23 1:28, Song Liu wrote:
> On Tue, Mar 16, 2021 at 2:20 AM Yufen Yu <yuyufen@huawei.com> wrote:
>> [...]
> 
> Hi Yufen,
> 
> Have you got time to look into this?
> 

I can also reproduce this problem on my qemu VM, with three 10G disks.
But there is no problem when I change the mkfs.xfs option 'agcount' (the default
value is 16 on my system). For example, if I set agcount=15, xfs mounts without
any problem:

mkfs.xfs -d agcount=15 -f /dev/md0
mount /dev/md0 /mnt/test

In addition, I tried writing a 128MB file to /dev/md0 during md resync and then
reading it back; the two are identical according to md5sum:

dd if=randfile of=/dev/md0 bs=1M count=128 oflag=direct seek=10240
dd if=/dev/md0 of=out.randfile bs=1M count=128 oflag=direct skip=10240

BTW, I found that mkfs.xfs has some options related to raid devices, such as
sunit, su, swidth, and sw. I guess this problem may be caused by data alignment,
but I have no idea how it happens. More time may be needed.
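
For reference, those options can also be set explicitly at mkfs time. Assuming the
mdadm default 512KiB chunk and the two data disks of a 3-disk raid5 (my assumption
for this example, not taken from the report), that would look roughly like:

mkfs.xfs -f -d su=512k,sw=2 /dev/md0    # su = chunk size, sw = number of data disks

(mkfs.xfs normally picks these values up from the md device automatically, so this
is mainly useful for experimenting with a deliberately different alignment.)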

Thanks
Yufen



* Re: raid5 crash on system which PAGE_SIZE is 64KB
From: Xiao Ni @ 2021-03-24  8:02 UTC
  To: Yufen Yu, Song Liu
  Cc: linux-raid, Nigel Croxon, Heinz Mauelshagen, kent.overstreet

>>
>
> I can also reproduce this problem on my qemu vm system, with 3 10G disks.
> But, there is no problem when I change mkfs.xfs option 'agcount' (default
> value is 16 for my system). For example, if I set agcount=15, there is no
> problem when mount xfs, likely:
>
> mkfs.xfs -d agcount=15 -f /dev/md0
> mount /dev/md0 /mnt/test

Hi Yufen

I did a test with agcount=15, and the problem still exists in my environment.

Test1:
[root@ibm-p8-11 ~]# mdadm -CR /dev/md0 -l5 -n3 /dev/sd[b-d]1 --size=20G
[root@ibm-p8-11 ~]# mkfs.xfs /dev/md0 -f
meta-data=/dev/md0               isize=512    agcount=16, agsize=655232 blks
...
[root@ibm-p8-11 ~]# mount /dev/md0 /mnt/test
mount: /mnt/test: mount(2) system call failed: Structure needs cleaning.

Test2:
[root@ibm-p8-11 ~]# mkfs.xfs /dev/md0 -f -d agcount=15
Warning: AG size is a multiple of stripe width.  This can cause performance
problems by aligning all AGs on the same disk.  To avoid this, run mkfs with
an AG size that is one stripe unit smaller or larger, for example 699008.
meta-data=/dev/md0               isize=512    agcount=15, agsize=699136 blks
...
[root@ibm-p8-11 ~]# mount /dev/md0 /mnt/test
mount: /mnt/test: mount(2) system call failed: Structure needs cleaning.


>
> In addition, I try to write a 128MB file to /dev/md0 and then read it out
> during md resync, they are same by checking md5sum, likely:
>
> dd if=randfile of=/dev/md0 bs=1M count=128 oflag=direct seek=10240
> dd if=/dev/md0 of=out.randfile bs=1M count=128 oflag=direct skip=10240
>
> BTW, I found mkfs.xfs have some options related to raid device, such as
> sunit, su, swidth, sw. I guess this problem may be caused by data alignment.
> But, I have no idea how it happen. More time may needed.

The problem doesn't happen if mkfs runs without a resync in progress. Is there a
possibility that resync and mkfs write to the same page?
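
One way to run that "mkfs without resync" control is to let the initial resync
finish first (a sketch of the control case; creating the array with --assume-clean,
as mentioned earlier, avoids the concurrent resync as well):

mdadm -CR /dev/md0 -l5 -n3 /dev/sd[b-d]1 --size=20G
mdadm --wait /dev/md0        # let the initial resync finish before any writes
mkfs.xfs -f /dev/md0
mount /dev/md0 /mnt/test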

Regards
Xiao


