From mboxrd@z Thu Jan 1 00:00:00 1970 From: Neil Brown Subject: Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5) Date: Tue, 23 Jan 2007 13:06:40 +1100 Message-ID: <17845.28080.887134.987438@notabene.brown> References: <45B5261B.1050104@redhat.com> <17845.13256.284461.992275@notabene.brown> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: message from Dan Williams on Monday January 22 Sender: linux-raid-owner@vger.kernel.org To: Dan Williams Cc: Chuck Ebbert , Justin Piszcz , linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, xfs@oss.sgi.com List-Id: linux-raid.ids On Monday January 22, dan.j.williams@gmail.com wrote: > On 1/22/07, Neil Brown wrote: > > On Monday January 22, cebbert@redhat.com wrote: > > > Justin Piszcz wrote: > > > > My .config is attached, please let me know if any other information is > > > > needed and please CC (lkml) as I am not on the list, thanks! > > > > > > > > Running Kernel 2.6.19.2 on a MD RAID5 volume. Copying files over Samba to > > > > the RAID5 running XFS. > > > > > > > > Any idea what happened here? > > .... > > > > > > > Without digging too deeply, I'd say you've hit the same bug Sami Farin > > > and others > > > have reported starting with 2.6.19: pages mapped with kmap_atomic() > > > become unmapped > > > during memcpy() or similar operations. Try disabling preempt -- that > > > seems to be the > > > common factor. > > > > That is exactly the conclusion I had just come to (a kmap_atomic page > > must be being unmapped during memcpy). I wasn't aware that others had > > reported it - thanks for that. > > > > Turning off CONFIG_PREEMPT certainly seems like a good idea. > > > Coming from an ARM background I am not yet versed in the inner > workings of kmap_atomic, but if you have time for a question I am > curious as to why spin_lock(&sh->lock) is not sufficient pre-emption > protection for copy_data() in this case? > Presumably there is a bug somewhere. kmap_atomic itself calls inc_preempt_count so that preemption should be disabled at least until the kunmap_atomic is called. But apparently not. The symptoms point exactly to the page getting unmapped when it shouldn't. Until that bug is found and fixed, the work around of turning of CONFIG_PREEMPT seems to make sense. Of course it would be great if someone who can easily reproduce this bug could do the 'git bisect' thing to find out where the bug crept in..... NeilBrown