From: Jon Nelson
Date: Sat, 11 Dec 2010 21:16:18 -0600
Subject: Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
To: "Ted Ts'o", Jon Nelson, Matt, Chris Mason, Andi Kleen, Mike Snitzer, Milan Broz, linux-btrfs, dm-devel, Linux Kernel, htd, htejun, linux-ext4
In-Reply-To: <20101212023415.GG3059@thunk.org>
References: <20101209201359.GG2921@thunk.org> <20101209231616.GA12515@basil.fritz.box> <1291945065-sup-1838@think> <20101210023852.GB3059@thunk.org> <20101212023415.GG3059@thunk.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Dec 11, 2010 at 7:40 PM, Ted Ts'o wrote:
> Yes, indeed.  Is this in the virtualized environment or on real
> hardware at this point?  And how many CPU's do you have configured in
> your virtualized environment, and how much memory?  Is having a
> certain number of CPU's critical for reproducing the problem?  Is
> constricting the amount of memory important?

Originally, I observed the behavior on real hardware. Since then, I
have been able to reproduce it in VirtualBox and qemu-kvm, with
openSUSE 11.3 and Kubuntu. All of the more recent tests have been with
qemu-kvm. The environment is configured with one CPU and 512MB of
memory. I have not done any memory-constriction tests whatsoever.

> It'll be a lot easier if I can reproduce it locally, which is why I'm
> asking all of these questions.

On Sat, Dec 11, 2010 at 8:34 PM, Ted Ts'o wrote:
> One experiment --- can you try this with the file system mounted with
> data=writeback, and see if the problem reproduces in that journalling
> mode?

That test is running now, first with encryption. I will report if it
shows problems. If it does, I will wait until I have been able to see
that a few times, and move to a no-encryption test. Typically, I have
to run quite a few more iterations of that test before problems show
up (if they will at all).

> I want to rule out (if possible) journal_submit_inode_data_buffers()
> racing with mpage_da_submit_io().  I don't think that's the issue, but
> I'd prefer to do the experiment to make sure.  So if you can use a
> kernel and system configuration which triggers the problem, and then
> try changing the mount options to include data=writeback, and then
> rerun the test, and let me know if the problem still reproduces, I'd
> be really grateful.

--
Jon