From mboxrd@z Thu Jan  1 00:00:00 1970
From: greg@enjellic.com
Subject: Re: mismatch_cnt again
Date: Thu, 12 Nov 2009 13:20:13 -0600
Message-ID: <200911121920.nACJKDew011818@wind.enjellic.com>
References: 
Reply-To: greg@enjellic.com
Return-path: 
In-Reply-To: Eyal Lebedinsky "Re: mismatch_cnt again" (Nov 10, 9:03am)
Sender: linux-raid-owner@vger.kernel.org
To: Eyal Lebedinsky , linux-raid list
Cc: neilb@suse.de
List-Id: linux-raid.ids

On Nov 10, 9:03am, Eyal Lebedinsky wrote:
} Subject: Re: mismatch_cnt again

Good day to everyone.

> Thanks everyone,
> I wish to narrow down the issue to my question: Are there situations
> known to cause this without an actual hardware failure?
>
> Meaning, are there known *software* issues with this configuration
>     2.6.30.5-43.fc11.x86_64, ext3, raid5, sata, Adaptec 1430SA
> that can lead to a mismatch?
>
> It is not root, not swap, has weekly smartd scans and weekly
> (different days) raid 'check's.  Only report is a growing
> mismatch_cnt.
>
> I noted the raid1 as mentioned in the thread.

I have concerns that there is a big ugly issue waiting to rear its head
in the Linux storage community, particularly after reading Martin's
note about pages not being pinned through the duration of an I/O.

Speaking directly to your concerns, Eyal: one of my staff members runs
recent Fedora on his desktop with software RAID1.  On a brand new box,
shortly after installation, he is seeing large mismatch_cnt's on the
RAID pairs.

He posted about the issue a month or so ago to the linux-raid list.  He
received no definitive responses other than some vague hand-waving that
ext3 could cause this.  I believe he is running ext4 on the RAID1
volumes in question.

Interestingly enough, a filesystem check comes up normal.  So there are
mismatches, but they do not seem to be manifesting themselves.  It
would seem that others confirm this issue.

More to the point, we manage geographically mirrored storage systems.
Linux initiators receive fibre-channel based block devices from two
separate mirrors.  The block devices are used as the basis for a RAID1
volume with persistent bitmaps.

In the data-centers we have SCST based Linux storage targets.  The
target 'disks' are LVM based logical volumes layered on top of software
RAID5 volumes.

We are seeing, in some cases, large mismatch_cnts on the RAID1
initiators.  Check runs on each of the two RAID5 target volumes show no
mismatches.  So the mismatch is occurring at the RAID1 level and is
independent of what is happening at the physical storage level.

The filesystems on the RAID1 volumes are ext3 running under moderate to
heavy load.  Initiator kernels, in general, have been reasonably new,
2.6.27.x and forward, with RHEL5 userspace.

I suspect there are one or more subtle factors which are making the
non-pinned pages more of an issue than they appear to be at first
analysis.  Jens and company have been putzing with the I/O schedulers
and related issues.  One possible bit of hand-waving is that all of
this may be somehow confounded by elevator-induced latencies.

Our I/O latencies are longer due to the physical issues of shooting I/O
through a fair amount of glass and multi-trunked switch architectures.
In addition, we configure somewhat deeper queue depths on the targets,
which may compound the problem.  But that doesn't explain Eyal's and
others' issues with this showing up on desktop systems.

In any case, I am convinced the problem is real and potentially
significant.  What seems to be perplexing is why it isn't showing up as
corrupted files and the like.  We are not hearing anything from the
user side which would suggest manifestation of the problem.
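For anyone who wants to watch for this on their own arrays: the counts
being quoted in this thread come straight from the md sysfs interface,
and the weekly 'check' runs amount to writing "check" to sync_action
and reading mismatch_cnt once the array goes idle again.  A minimal
sketch, assuming the stock /sys/block/<dev>/md layout and root
privileges ('md0' below is just a placeholder device name):

    #!/usr/bin/env python
    #
    # Minimal sketch: drive an md 'check' and report mismatch_cnt once
    # it completes.  Assumes the standard md sysfs layout and root
    # access; 'md0' is only a placeholder default.

    import sys
    import time

    def check_array(dev):
        base = "/sys/block/%s/md" % dev

        # Equivalent to: echo check > /sys/block/<dev>/md/sync_action
        open(base + "/sync_action", "w").write("check\n")

        # Poll until the array returns to 'idle'.
        while open(base + "/sync_action").read().strip() != "idle":
            time.sleep(30)

        # mismatch_cnt is the number of (512-byte) sectors the check
        # found to be inconsistent across the members.
        return int(open(base + "/mismatch_cnt").read().strip())

    if __name__ == "__main__":
        dev = sys.argv[1] if len(sys.argv) > 1 else "md0"
        print("%s: mismatch_cnt = %d" % (dev, check_array(dev)))

The distribution cron jobs which do the weekly 'check's boil down to
the same two sysfs files.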
More troubling, in my opinion, is how widespread the problem might be
and how we go about fixing it.  Automatic repair is problematic, as has
been discussed, particularly in the case of a two-member RAID1 volume.
I'm equally apprehensive about doing a casino roll with data by blindly
running a 'repair'.

The obvious alternative is to compare the mismatches and figure out
which block is correct.  Pragmatically, that is a somewhat daunting
task with potentially thousands of mismatches on multi-hundred-gigabyte
filesystems.  Much more so when one considers the qualitative
assessment issue and the need to do this off-line to avoid Heisenberg
issues.

> cheers
> 	Eyal

So I think the problem is real and one we need to respond to as a
community sooner rather than later.  I shudder at the thought of an LWN
or Slashdot article heralding the fact that there might be silent
corruption on thousands of filesystems around the planet... :-)

Neil/Martin, what do you think?  I'm happy to hunt if we can do
anything from our end.

Best wishes for a pleasant weekend to everyone.

Greg

}-- End of excerpt from Eyal Lebedinsky

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"If I'd listened to customers, I'd have given them a faster horse."
                                                        -- Henry Ford