From: Xiao Ni <xni@redhat.com>
To: yuyufen@huawei.com, song@kernel.org,
	linux-raid <linux-raid@vger.kernel.org>,
	Nigel Croxon <ncroxon@redhat.com>
Cc: Heinz Mauelshagen <heinzm@redhat.com>, kent.overstreet@gmail.com
Subject: raid5 crash on system which PAGE_SIZE is 64KB
Date: Mon, 15 Mar 2021 21:44:02 +0800
Message-ID: <225718c0-475c-7bd7-e067-778f7097a923@redhat.com>

Hi all

We have hit a raid5 crash on a POWER system where PAGE_SIZE is 64KB.
I can reproduce it 100% of the time, including with the latest
upstream kernel.

The steps are:
mdadm -CR /dev/md0 -l5 -n3 /dev/sda1 /dev/sdc1 /dev/sdd1
mkfs.xfs /dev/md0 -f
mount /dev/md0 /mnt/test

The error message is:
mount: /mnt/test: mount(2) system call failed: Structure needs cleaning.

dmesg shows the following errors:
[ 6455.761545] XFS (md0): Metadata CRC error detected at xfs_agf_read_verify+0x118/0x160 [xfs], xfs_agf block 0x2105c008
[ 6455.761570] XFS (md0): Unmount and run xfs_repair
[ 6455.761575] XFS (md0): First 128 bytes of corrupted metadata buffer:
[ 6455.761581] 00000000: fe ed ba be 00 00 00 00 00 00 00 02 00 00 00 00  ................
[ 6455.761586] 00000010: 00 00 00 00 00 00 03 c0 00 00 00 01 00 00 00 00  ................
[ 6455.761590] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 6455.761594] 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 6455.761598] 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 6455.761601] 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 6455.761605] 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 6455.761609] 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 6455.761662] XFS (md0): metadata I/O error in "xfs_read_agf+0xb4/0x1a0 [xfs]" at daddr 0x2105c008 len 8 error 74
[ 6455.761673] XFS (md0): Error -117 recovering leftover CoW allocations.
[ 6455.761685] XFS (md0): Corruption of in-memory data detected. Shutting down filesystem
[ 6455.761690] XFS (md0): Please unmount the filesystem and rectify the problem(s)

The problem does not happen when the array is created with
--assume-clean, so it appears to occur only when resync and normal
write I/O run at the same time.

Reverting the patch set "Save memory for stripe_head buffer" makes the
problem go away. I'm still looking into it, but I haven't found the
root cause yet. Could you have a look?

By the way, there is one place in raid5.c that I don't understand. Is
it a bug? Should it be done this way instead:
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 5d57a5b..4a5e8ae 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1479,7 +1479,7 @@ static struct page **to_addr_page(struct raid5_percpu *percpu, int i)
  static addr_conv_t *to_addr_conv(struct stripe_head *sh,
                                   struct raid5_percpu *percpu, int i)
  {
-       return (void *) (to_addr_page(percpu, i) + sh->disks + 2);
+       return (void *) (to_addr_page(percpu, i) + sizeof(struct page*)*(sh->disks + 2));
  }

  /*
@@ -1488,7 +1488,7 @@ static addr_conv_t *to_addr_conv(struct stripe_head *sh,
  static unsigned int *
  to_addr_offs(struct stripe_head *sh, struct raid5_percpu *percpu)
  {
-       return (unsigned int *) (to_addr_conv(sh, percpu, 0) + sh->disks + 2);
+       return (unsigned int *) (to_addr_conv(sh, percpu, 0) + sizeof(addr_conv_t)*(sh->disks + 2));
  }

This code was introduced by commit b330e6a49d (md: convert to kvmalloc).

Regards
Xiao




