From: Jon Nelson
Date: Sun, 12 Dec 2010 04:18:29 -0600
Subject: Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
To: "Ted Ts'o", Jon Nelson, Matt, Chris Mason, Andi Kleen, Mike Snitzer, Milan Broz, linux-btrfs, dm-devel, Linux Kernel, htd, htejun, linux-ext4
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Dec 11, 2010 at 9:16 PM, Jon Nelson wrote:
> On Sat, Dec 11, 2010 at 7:40 PM, Ted Ts'o wrote:
>> Yes, indeed.  Is this in the virtualized environment or on real
>> hardware at this point?  And how many CPUs do you have configured in
>> your virtualized environment, and how much memory?  Is having a
>> certain number of CPUs critical for reproducing the problem?  Is
>> constricting the amount of memory important?
>
> Originally, I observed the behavior on really real hardware.
>
> Since then, I have been able to reproduce it in VirtualBox and
> qemu-kvm, with openSUSE 11.3 and Kubuntu. All of the more recent tests
> have been with qemu-kvm.
>
> I have one CPU configured in the environment, 512MB of memory.
> I have not done any memory-constriction tests whatsoever.
>
>> It'll be a lot easier if I can reproduce it locally, which is why I'm
>> asking all of these questions.
>
> On Sat, Dec 11, 2010 at 8:34 PM, Ted Ts'o wrote:
>> One experiment --- can you try this with the file system mounted with
>> data=writeback, and see if the problem reproduces in that journalling
>> mode?
>
> That test is running now, first with encryption. I will report if it
> shows problems. If it does, I will wait until I have been able to see
> that a few times, and move to a no-encryption test. Typically, I have
> to run quite a few more iterations of that test before problems show
> up (if they will at all).
>
>> I want to rule out (if possible) journal_submit_inode_data_buffers()
>> racing with mpage_da_submit_io().  I don't think that's the issue, but
>> I'd prefer to do the experiment to make sure.  So if you can use a
>> kernel and system configuration which triggers the problem, and then
>> try changing the mount options to include data=writeback, and then
>> rerun the test, and let me know if the problem still reproduces, I'd
>> be really grateful.

Using 2.6.37-rc5 with data=writeback,noatime and LUKS encryption, I hit
the problem in 71 of 173 iterations (about 41%).

--
Jon