From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S265902AbTLaA3h (ORCPT ); Tue, 30 Dec 2003 19:29:37 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S265879AbTLaA3e (ORCPT ); Tue, 30 Dec 2003 19:29:34 -0500 Received: from fw.osdl.org ([65.172.181.6]:29343 "EHLO mail.osdl.org") by vger.kernel.org with ESMTP id S265877AbTLaA3c (ORCPT ); Tue, 30 Dec 2003 19:29:32 -0500 Subject: Re: [PATCH linux-2.6.0-test10-mm1] dio-read-race-fix From: Daniel McNeil To: Suparna Bhattacharya Cc: Janet Morgan , Badari Pulavarty , "linux-aio@kvack.org" , Linux Kernel Mailing List , Andrew Morton In-Reply-To: <20031230045334.GA3484@in.ibm.com> References: <3FCD4B66.8090905@us.ibm.com> <1070674185.1929.9.camel@ibm-c.pdx.osdl.net> <1070907814.707.2.camel@ibm-c.pdx.osdl.net> <1071190292.1937.13.camel@ibm-c.pdx.osdl.net> <20031230045334.GA3484@in.ibm.com> Content-Type: text/plain Organization: Message-Id: <1072830557.712.49.camel@ibm-c.pdx.osdl.net> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-5) Date: 30 Dec 2003 16:29:17 -0800 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 2003-12-29 at 20:53, Suparna Bhattacharya wrote: > On Thu, Dec 11, 2003 at 04:51:33PM -0800, Daniel McNeil wrote: > > I've done more testing with added debug code. > > > > It looks like the filemap_write_and_wait() call is returning > > with data that has not made it disk. > > > > I added code to filemap_write_and_wait() to check if > > mapping->dirty_pages is not empty after calling filemap_fdatawrite() > > and filemap_fdatawait() and retry. Even with the retry, the test still > > sees uninitialized data when running tests overnight (I added a printk > > so I know the retry is happening). There are pages left on the > > dirty_pages list even after the write and wait. > > There are two calls to filemap_write_and_wait() for a DIO read > -- do you know in which if these cases you see dirty pages after > the write and wait ? The first is called without i_sem protection, > so it is possible for pages to be dirtied by a parallel buffered > write which is OK because this is just an extra/superfluous call > when it comes to DIO reads. The second call, however happens with i_sem > held and is used to guarantee that there are no exposures, so if > you are seeing remant dirty pages in this case it would be something > to worry about. > Yes there are two calls. I'm assuming that the retry was caused by the call w/o holding the i_sem, since pages can be dirtied in parallel. My debug output shows every filemap_write_and_wait() and which pages are on the dirty, io, and locked list and which pages are on the clean list after the wait. (yes, this is lots of output) I changed the test a little bit -- if the test sees non-zero data, it opens a new file descriptor and re-reads the data without O_DIRECT and also re-reads the 1st page of the non-zero data using O_DIRECT. What the debug output shows is: The 1st filemap_write_and_wait() writes out some data. The 2nd filemap_write_and_wait() sees different dirty pages. The pages that are seen non-zero by the test (many pages -- 243 in one case) are on the dirty_pages list before the write and on the clean_pages list after the wait. But some of the pages at the end of the clean list, are seen by the test program as non-zero. Since I added the DIO re-read, the debug output shows for the 2nd re-read: the 1st filemap_write_and_wait() with NO dirty pages the 2nd filemap_write_and_wait() with dirty pages AFTER the pages that mis-matched and the original plus these pages on the clean list after the wait. The 2nd re-read and the read without O_DIRECT both get zeroed data. Conclusions: =========== I'm not sure. :( It appears like the pages are put on the clean list and PG_writeback is cleared while the i/o is in flight and the DIO reads are put on the queue ahead of the writeback. I am trying to figure out how to add more debug code to prove this or figure out what else is going on. I'm open to suggestions. Daniel