* mismatch_cnt again
@ 2009-11-07  0:41 Eyal Lebedinsky
  2009-11-07  1:53 ` berk walker
  2009-11-09 22:03 ` Eyal Lebedinsky
  0 siblings, 2 replies; 58+ messages in thread
From: Eyal Lebedinsky @ 2009-11-07  0:41 UTC (permalink / raw)
  To: linux-raid list

For years I found mismatch_cnt rising regularly every few weeks and could
never relate it to any events.

I have since replaced the computer, installed Fedora 11 (it was a very old
Debian) and kept only the array itself (ext3 on a 5x1TB raid5). I ran a
raid 'repair' to bring it back to mismatch_cnt=0.

I thought I had seen the last of these. I had a good run for almost three
months, then last week I saw the first mismatch_cnt=184. It was still so
on this week's 'check'.
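For readers following along, both the weekly 'check' and the 'repair' pass
are driven through md's sysfs interface; a minimal sketch, run as root
(md0 is a placeholder device name):

```shell
# Count of mismatched sectors found by the last check/repair pass
cat /sys/block/md0/md/mismatch_cnt

# Kick off a read-only consistency scan (what the weekly job does)
echo check > /sys/block/md0/md/sync_action

# Or rewrite inconsistent stripes so the count returns to zero
echo repair > /sys/block/md0/md/sync_action
```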

I cannot see any bad event logged.

Are there situations known to cause this without an actual hardware failure?
I know that this came up in the past (often) but I see little recent
discussion and wonder what the current status is.

For the last 6 weeks (my uptime) the machine runs
	2.6.30.5-43.fc11.x86_64 #1 SMP

The raid holds data (no root or swap) used mostly as DVR (nothing heavy).
smartd checks each week and so far no errors. The disks are modern,
one-year-old "SAMSUNG HD103UJ" units.

TIA

-- 
Eyal Lebedinsky	(eyal@eyal.emu.id.au)

^ permalink raw reply	[flat|nested] 58+ messages in thread
* Re: mismatch_cnt again
@ 2009-11-12 19:20 greg
  2009-11-13  2:28 ` Neil Brown
  0 siblings, 1 reply; 58+ messages in thread
From: greg @ 2009-11-12 19:20 UTC (permalink / raw)
  To: Eyal Lebedinsky, linux-raid list; +Cc: neilb

On Nov 10,  9:03am, Eyal Lebedinsky wrote:
} Subject: Re: mismatch_cnt again

Good day to everyone.

> Thanks everyone,

> I wish to narrow down the issue to my question: are there situations
> known to cause this without an actual hardware failure?
>
> Meaning, are there known *software* issues with this configuration
> 	2.6.30.5-43.fc11.x86_64, ext3, raid5, sata, Adaptec 1430SA
> that can lead to a mismatch?
>
> It is not root, not swap, has weekly smartd scans and weekly
> (different days) raid 'check's. Only report is a growing
> mismatch_cnt.
>
> I noted the raid1 as mentioned in the thread.

I have concerns there is a big ugly issue waiting to rear its head in
the Linux storage community.  Particularly after reading Martin's note
about pages not being pinned through the duration of an I/O.

Speaking directly to your concerns, Eyal.  One of my staff members runs
recent Fedora on his desktop with software RAID1.  On a brand new box
shortly after installation he is noting large mismatch_cnt's on the
RAID pairs.

He posted about the issue a month or so ago to the linux-raid list.
He received no definitive responses other than some vague hand waving
that ext3 could cause this.  I believe he is running ext4 on the RAID1
volumes in question.

Interestingly enough a filesystem check comes up normal.  So there are
mismatches but they do not seem to be manifesting themselves.  It
would seem that others confirm this issue.

More to the point we manage geographically mirrored storage systems.
Linux initiators receive fiber-channel based block devices from two
separate mirrors.  The block devices are used as the basis for a RAID1
volume with persistent bitmaps.
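A setup of that shape can be reproduced with mdadm; a sketch with
placeholder device names (here /dev/sdb and /dev/sdc stand in for the
two fiber-channel block devices):

```shell
# RAID1 across the two remote mirrors, with a persistent (internal)
# write-intent bitmap so a dropped leg resyncs incrementally
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      --bitmap=internal /dev/sdb /dev/sdc
```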

In the data-centers we have SCST based Linux storage targets.  The
target 'disks' are LVM based logical volumes platformed on top of
software RAID5 volumes.

We are seeing, in some cases, large mismatch_cnts on the RAID1
initiators.  Check runs on each of the two RAID5 target volumes show
no mismatches.  So the mismatch is occurring at the RAID1 level and is
independent of what is happening at the physical storage level.

The filesystems on the RAID1 volumes are ext3 running under moderate
to heavy load.  Initiator kernels, in general, have been reasonably
new, 2.6.27.x and forward, with RHEL5 userspace.

I suspect there are one or more subtle factors which are making the
non-pinned pages more of an issue than they appear to be at first
analysis.  Jens and company have been putzing with the I/O schedulers
and related issues.  One possible bit of hand waving is that all of
this may be somehow confounded by elevator induced latencies.

Our I/O latencies are longer due to the physical issues of shooting
I/O through a fair amount of glass and multi-trunked switch
architectures.  In addition we configure somewhat deeper queue depths
on the targets which may compound the problem.  But that doesn't
explain Eyal's and other's issues with this showing up on desktop
systems.

In any case I am convinced the problem is real and potentially
significant.  What seems to be perplexing is why it isn't showing up
as corrupted files and the like.  We are not hearing anything from the
user side which would suggest manifestation of the problem.

More troubling in my opinion is how widespread the problem might be
and how do we fix it?  Automatic repair is problematic as has been
discussed, particularly in the case of a two pair RAID1 volume.  I'm
also equally apprehensive about doing a casino roll with data by
blindly running a 'repair'.

The obvious alternative is to compare the mismatches and figure out
which block is correct.  Pragmatically, a somewhat daunting task on
potentially thousands of mismatches on multi-hundred gigabyte
filesystems.  Much more so when one considers the qualitative
assessment issue and the need to do this off-line to avoid Heisenberg
issues.
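As a starting point for that comparison, the raw members of a two-leg
RAID1 can be diffed directly; a sketch with placeholder device names
(the array should be stopped or fully quiescent first, or in-flight
writes will show up as false differences):

```shell
# List differing bytes (offset, octal values) between the two members;
# the offsets can then be mapped back through the filesystem
cmp -l /dev/sdb1 /dev/sdc1 | head -n 20
```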

> cheers Eyal

So I think the problem is real and one we need to respond to as a
community sooner rather than later.  I shudder at the thought of an
LWN or Slashdot article heralding the fact there might be silent
corruption on thousands of filesystems around the planet... :-)

Neil/Martin what do you think?

I'm happy to hunt if we can do anything from our end.

Best wishes for a pleasant weekend to everyone.

Greg

}-- End of excerpt from Eyal Lebedinsky

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"If I'd listened to customers, I'd have given them a faster horse."
                                -- Henry Ford

* Re: mismatch_cnt again
@ 2009-11-16 21:36 greg
  2009-11-16 22:14 ` Neil Brown
  0 siblings, 1 reply; 58+ messages in thread
From: greg @ 2009-11-16 21:36 UTC (permalink / raw)
  To: Neil Brown, greg; +Cc: Eyal Lebedinsky, linux-raid list

On Nov 13,  1:28pm, Neil Brown wrote:
} Subject: Re: mismatch_cnt again

Good afternoon to everyone, hope your week is starting well.

> On Thursday November 12, greg@enjellic.com wrote:
> > 
> > Neil/Martin what do you think?

> I think that if you found out which blocks were different and mapped
> that back through the filesystem, you would find that those blocks
> are not a part of any file, or possibly are part of a file that is
> currently being written.

I can buy the issue of the mismatches being part of a file being
written but that doesn't explain machines where the RAID1 array was
initialized and allowed to synchronize and which now show persistent
counts of mismatched sectors.

I can certainly buy the issue of the mismatches not being part of an
active file.  I still think this leaves the issue of why the
mismatches were generated unless we want to assume that whatever
causes the mismatch only affects areas of the filesystem which don't
have useful files.  Not a reassuring assumption.
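For what it is worth, once md logs the addresses, mapping them back on
ext3 could be done with debugfs; a sketch where the block and inode
numbers are made up for illustration:

```shell
# Which inode, if any, owns filesystem block 123456?
debugfs -R "icheck 123456" /dev/md0

# And which path does that inode correspond to?
debugfs -R "ncheck 7890" /dev/md0
```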

> I guess I need to start logging the error address so people can
> start dealing with facts rather than fears.

I think that would be a good starting point.  If for no other reason
than to allow people to easily figure out the possible ramifications of a
mismatch count.

One other issue to consider.  We have RAID1 volumes with mismatch
counts over a wide variety of hardware platforms and Linux kernels.
In all cases the number of mismatched blocks is an exact multiple of
128.  That doesn't seem to suggest some type of random corruption.
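That multiple is consistent with how the count is kept: mismatch_cnt is
reported in 512-byte sectors, and (as I understand it) the check pass
compares in 64 KiB windows, so every differing window adds the same
amount:

```shell
# Sectors added to mismatch_cnt per differing 64 KiB check window
echo $(( 64 * 1024 / 512 ))    # prints 128
```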

This issue may all be innocuous but we have about the worst situation
we could have: an issue which may be generating false positives for
potential corruption, amplified by the fact that major distributions
are generating what will be interpreted as warning e-mails about their
existence.  So even if the problem is innocuous the list is guaranteed
to be spammed with these reports, let alone your inbox.... :-)

Just a thought in moving forward.

The 'check' option is primarily useful for its role in scrubbing RAID*
volumes with an eye toward making sure that silent corruption
scenarios don't arise which would thwart a resync.  Particularly since
you implemented the ability to attempt a sector re-write to trigger
block re-allocations.  This is a nice deterministic repair mechanism
which has fixed problems for us on a number of occasions.

I think what is needed is a 'scrub' directive which carries out this
function without incrementing mismatch counts and the like.  That
would leave a possibly enhanced 'check' command to report on
mismatches and carry out any remedial action, if any, that the group
can think of.

If a scrub directive were to be implemented it would be beneficial to
make it interruptible.  A 'halt' or similar directive would shut down
the scrub and latch the last block number which had been examined.
That would allow a scrub to be resumed from that point in a subsequent
session.

With some of these large block devices it is difficult to get through
an entire 'check/scrub' in whatever late night window is left after
backups have run.  The above infrastructure would allow userspace to
gate the checking into whatever windows are available for these types
of activities.
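Some of that gating may already be approximable with the existing sysfs
knobs, if I am reading them right; a sketch (md0 and the sector bound
are placeholders):

```shell
# Bound tonight's pass to the first ~200 GB worth of sectors
echo 0         > /sys/block/md0/md/sync_min
echo 400000000 > /sys/block/md0/md/sync_max
echo check     > /sys/block/md0/md/sync_action

# Note where it got to, for resuming tomorrow night
cat /sys/block/md0/md/sync_completed
```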

> NeilBrown

Hope the above comments are helpful.

Best wishes for a productive week.

}-- End of excerpt from Neil Brown

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"When I am working on a problem I never think about beauty.  I only
 think about how to solve the problem.  But when I have finished, if
 the solution is not beautiful, I know it is wrong."
                                -- Buckminster Fuller


Thread overview: 58+ messages
2009-11-07  0:41 mismatch_cnt again Eyal Lebedinsky
2009-11-07  1:53 ` berk walker
2009-11-07  7:49   ` Eyal Lebedinsky
2009-11-07  8:08     ` Michael Evans
2009-11-07  8:42       ` Eyal Lebedinsky
2009-11-07 13:51       ` Goswin von Brederlow
2009-11-07 14:58         ` Doug Ledford
2009-11-07 16:23           ` Piergiorgio Sartor
2009-11-07 16:37             ` Doug Ledford
2009-11-07 22:25               ` Eyal Lebedinsky
2009-11-07 22:57                 ` Doug Ledford
2009-11-08 15:32             ` Goswin von Brederlow
2009-11-09 18:08               ` Bill Davidsen
2009-11-07 22:19           ` Eyal Lebedinsky
2009-11-07 22:58             ` Doug Ledford
2009-11-08 15:46           ` Goswin von Brederlow
2009-11-08 16:04             ` Piergiorgio Sartor
2009-11-09 18:22               ` Bill Davidsen
2009-11-09 21:50                 ` NeilBrown
2009-11-10 18:05                   ` Bill Davidsen
2009-11-10 22:17                     ` Peter Rabbitson
2009-11-13  2:15                     ` Neil Brown
2009-11-09 19:13               ` Goswin von Brederlow
2009-11-08 22:51             ` Peter Rabbitson
2009-11-09 18:56               ` Piergiorgio Sartor
2009-11-09 21:14                 ` NeilBrown
2009-11-09 21:54                   ` Piergiorgio Sartor
2009-11-10  0:17                     ` NeilBrown
2009-11-10  9:09                       ` Peter Rabbitson
2009-11-10 14:03                         ` Martin K. Petersen
2009-11-12 22:40                           ` Bill Davidsen
2009-11-13 17:12                             ` Martin K. Petersen
2009-11-14 17:01                               ` Bill Davidsen
2009-11-17  5:19                                 ` Martin K. Petersen
2009-11-14 19:04                               ` Goswin von Brederlow
2009-11-17  5:22                                 ` Martin K. Petersen
2009-11-10 19:52                       ` Piergiorgio Sartor
2009-11-13  2:37                         ` Neil Brown
2009-11-13  5:30                           ` Goswin von Brederlow
2009-11-13  9:33                           ` Peter Rabbitson
2009-11-15 21:05                           ` Piergiorgio Sartor
2009-11-15 22:29                             ` Guy Watkins
2009-11-16  1:23                               ` Goswin von Brederlow
2009-11-16  1:37                               ` Neil Brown
2009-11-16  5:21                                 ` Goswin von Brederlow
2009-11-16  5:35                                   ` Neil Brown
2009-11-16  7:40                                     ` Goswin von Brederlow
2009-11-12 22:57                       ` Bill Davidsen
2009-11-09 18:11           ` Bill Davidsen
2009-11-09 20:58             ` Doug Ledford
2009-11-09 22:03 ` Eyal Lebedinsky
2009-11-12 19:20 greg
2009-11-13  2:28 ` Neil Brown
2009-11-13  5:19   ` Goswin von Brederlow
2009-11-15  1:54   ` Bill Davidsen
2009-11-16 21:36 greg
2009-11-16 22:14 ` Neil Brown
2009-11-17  4:50   ` Goswin von Brederlow
