From: NeilBrown <neilb@suse.de>
To: Brassow Jonathan <jbrassow@redhat.com>
Cc: "linux-raid@vger.kernel.org Raid" <linux-raid@vger.kernel.org>
Subject: Re: raid1 data corruption during resync
Date: Tue, 9 Sep 2014 11:08:13 +1000	[thread overview]
Message-ID: <20140909110813.3c6a15b2@notabene.brown> (raw)
In-Reply-To: <CEEAB94E-18F3-4087-A866-B412582F7B8D@redhat.com>

On Mon, 8 Sep 2014 10:52:49 -0500 Brassow Jonathan <jbrassow@redhat.com>
wrote:

> 
> 
> Begin forwarded message:
> 
> > From: Brassow Jonathan <jbrassow@redhat.com>
> > Date: September 4, 2014 9:01:58 AM CDT
> > To: NeilBrown <neilb@suse.de>
> > Cc: Eivind Sarto <eivindsarto@gmail.com>, "linux-raid@vger.kernel.org Raid" <linux-raid@vger.kernel.org>
> > Subject: Re: raid1 data corruption during resync
> > 
> > 
> > On Sep 4, 2014, at 12:28 AM, NeilBrown wrote:
> > 
> >> 
> >> Neither of these explain the hang you are seeing.
> >> I note that the "md0-resync" thread isn't listed.  I don't suppose you know
> >> what it was doing (stack trace)??
> >> Also, had  "md: md0: sync done" appeared in syslog yet?
> > 
> > The sync has not yet completed (no message).  I'm not sure why the resync thread didn't automatically report, but I've grabbed the entire trace from the machine ('echo t > /proc/sysrq-trigger') and it appears there.  The traces are attached.  (Would you rather have something so large posted in-line?)
> 
> I didn't see this message come through yet.  I am resending it with only the trace you requested from the mdX_resync thread.  If you need the entire list of traces, I can try resending that.
> 
>  brassow
> 
> Sep  4 08:52:00 bp-01 kernel: mdX_resync      D 0000000000000008     0 12374      2 0x00000080
> Sep  4 08:52:00 bp-01 kernel: ffff880207a5bb38 0000000000000046 0000000000000296 ffff88021727efc0
> Sep  4 08:52:00 bp-01 kernel: ffff880207a58010 0000000000012bc0 0000000000012bc0 ffff880214de0f40
> Sep  4 08:52:00 bp-01 kernel: ffff880207a5bb60 ffff88040171e178 ffff88040171e100 ffff880207a5bb58
> Sep  4 08:52:00 bp-01 kernel: Call Trace:
> Sep  4 08:52:00 bp-01 kernel: [<ffffffff81580549>] schedule+0x29/0x70
> Sep  4 08:52:00 bp-01 kernel: [<ffffffffa038a122>] raise_barrier+0xe2/0x160 [raid1]
> Sep  4 08:52:00 bp-01 kernel: [<ffffffff8108eb00>] ? bit_waitqueue+0xe0/0xe0
> Sep  4 08:52:00 bp-01 kernel: [<ffffffffa038b701>] sync_request+0x161/0x9e0 [raid1]
> Sep  4 08:52:00 bp-01 kernel: [<ffffffff8108ee13>] ? __wake_up+0x53/0x70
> Sep  4 08:52:00 bp-01 kernel: [<ffffffff81460009>] md_do_sync+0x849/0xd40
> Sep  4 08:52:00 bp-01 kernel: [<ffffffff8108847f>] ? put_prev_entity+0x2f/0x400
> Sep  4 08:52:00 bp-01 kernel: [<ffffffff81460816>] md_thread+0x116/0x150
> Sep  4 08:52:00 bp-01 kernel: [<ffffffff8158006e>] ? __schedule+0x34e/0x6e0
> Sep  4 08:52:00 bp-01 kernel: [<ffffffff81460700>] ? md_rdev_init+0x110/0x110
> Sep  4 08:52:00 bp-01 kernel: [<ffffffff8107083e>] kthread+0xce/0xf0
> Sep  4 08:52:00 bp-01 kernel: [<ffffffff81070770>] ? kthread_freezable_should_stop+0x70/0x70
> Sep  4 08:52:00 bp-01 kernel: [<ffffffff8158412c>] ret_from_fork+0x7c/0xb0
> Sep  4 08:52:00 bp-01 kernel: [<ffffffff81070770>] ? kthread_freezable_should_stop+0x70/0x70
> 

I did get the original, thanks.  I guess it just didn't make it to the list
as well.  Probably there is a limit on attachment sizes, which isn't entirely
unreasonable.

No brilliant ideas yet...
The fact that there are no kernel messages (no "sync done", "redirecting
sector" or "unrecoverable I/O") is a little surprising...

It appears that conf->barrier is elevated, implying that there is a resync
request that is still in-flight.  However I cannot think where it would be.
It might help if I could see the disassembly of raise_barrier() so I could
confirm whether "raise_barrier+0xe2/0x160" is in the first or the second
"wait_event_lock_irq" call.  I assume it is in the first, and is waiting for
the request that kcopyd wants to submit, to complete.
If it were the second then it would be waiting for ->start_next_window
to increase.  That happens when allow_barrier() is called.  If that were
the blockage, it would mean that some normal write is in-flight, rather than
a sync request.  But I don't know where that could be either.
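
For anyone following along, here is a rough userspace model of the two
waits being discussed.  It is only a sketch: the struct, the helper names
and the MODEL_* constants are made up for illustration, and the wait
conditions are an assumption based on how raise_barrier() in
drivers/md/raid1.c appears to be structured in kernels of this vintage.
Check the source of the kernel actually being run.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones are in raid1.c. */
#define MODEL_RESYNC_DEPTH   32
#define MODEL_RESYNC_SECTORS 128

/* Hypothetical stand-in for the r1conf fields the two waits look at. */
struct barrier_model {
	int      nr_waiting;        /* normal I/O queued behind the barrier */
	int      barrier;           /* resync requests currently in flight */
	bool     array_frozen;
	uint64_t next_resync;
	uint64_t start_next_window;
};

enum blocked_at { NOT_BLOCKED, FIRST_WAIT, SECOND_WAIT };

/* First wait (assumed): nothing may be queued waiting on the barrier. */
static bool first_wait_done(const struct barrier_model *c)
{
	return c->nr_waiting == 0;
}

/* Second wait (assumed): not frozen, resync depth not exceeded, and the
 * normal-I/O window has moved clear of the next resync range. */
static bool second_wait_done(const struct barrier_model *c)
{
	return !c->array_frozen &&
	       c->barrier < MODEL_RESYNC_DEPTH &&
	       c->start_next_window >= c->next_resync + MODEL_RESYNC_SECTORS;
}

/* Report which wait a resync thread would sleep in for a given state.
 * Works on a copy because the barrier count is raised between the waits. */
static enum blocked_at where_blocked(struct barrier_model c)
{
	if (!first_wait_done(&c))
		return FIRST_WAIT;
	c.barrier++;
	if (!second_wait_done(&c))
		return SECOND_WAIT;
	return NOT_BLOCKED;
}
```

Feeding where_blocked() a state with nr_waiting > 0 corresponds to the
first case, and a state where start_next_window has not advanced past
next_resync + MODEL_RESYNC_SECTORS corresponds to the second.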

So - still mystified.

NeilBrown

