From: bugzilla-daemon@bugzilla.kernel.org
To: linux-ext4@vger.kernel.org
Subject: [Bug 102731] I have a cough.
Date: Mon, 28 Sep 2015 17:06:41 +0000
Message-ID: <bug-102731-13602-iGtpT7UGll@https.bugzilla.kernel.org/>
In-Reply-To: <bug-102731-13602@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=102731

--- Comment #13 from Theodore Tso <tytso@mit.edu> ---
So it's been 12 days, and previously when you were using the Debian 3.16
kernel, it was triggering once every four days, right?  Can I assume that your
silence indicates that you haven't seen a problem to date?

If so, then it really does seem that it might be an interaction between LVM/MD
and KVM.

So if that's the case, then the next thing to ask is to try to figure out what
might be the triggering cause.   A couple of things come to mind:

1) Some failure to properly handle a flush cache command being sent to the MD
device.  This, combined with either a power failure or a crash of the guest OS
(depending on how KVM is configured), might explain a block update getting
lost.   The fact that the block bitmap is out of sync with the block group
descriptor is consistent with this failure.  However, if you were seeing
failures once every four days, that would imply that the guest OS and/or host
OS was crashing at about that frequency, and you haven't reported that.

2) Some kind of race between a 4k write and a RAID1 resync leading to a block
write getting lost.  Again, the reported data corruption is consistent with
this theory, but this also requires the guest OS going down due to some kind
of kernel crash or KVM/qemu shutdown and/or host OS crash / power failure, as
in (1) above.  If you weren't seeing such crashes once every four days or so,
then this isn't a likely explanation.

3)  Some kind of corruption caused by the TRIM command being sent to the
RAID/MD device, possibly racing with a block bitmap update.  This could be
caused either by the file system being mounted with the -o discard mount
option, or by fstrim getting run out of cron, or by e2fsck explicitly being
asked to discard unused blocks (with the "-E discard" option).

4)  Some kind of rarely triggered bug in qemu, the host kernel, or the guest
kernel, depending on how the guest communicates with the virtual disk (e.g.,
virtio, scsi, ide).   Virtio is the most likely use case, and so switching to
scsi emulation might be an interesting experiment.  (OTOH, if the problem is
specific to the MD layer, then this possibility is less likely.)

So as far as #3 is concerned, can you check to see if you had fstrim enabled,
or are mounting the file system with -o discard?
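
For the -o discard half of that question, here is a minimal sketch of one way
to check, assuming the mount table is in the usual /proc/mounts layout (device,
mount point, fs type, comma-separated options).  The function name
mounts_with_discard is made up for illustration, and it only looks at mount
options; it does not detect fstrim run from cron or e2fsck "-E discard".

```python
def mounts_with_discard(mounts_text, fstypes=("ext4",)):
    """Return (device, mountpoint) pairs mounted with the 'discard' option.

    mounts_text is expected to look like the contents of /proc/mounts:
    one mount per line, fields separated by whitespace, with the
    comma-separated mount options in the fourth field.
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        # Match the exact option name, so 'nodiscard' is not a false hit.
        if fstype in fstypes and "discard" in options.split(","):
            hits.append((device, mountpoint))
    return hits
```

Something like mounts_with_discard(open("/proc/mounts").read()) would then list
the affected ext4 file systems.  An empty result only rules out the mount
option, so the fstrim question still needs a look at cron.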

