On Thu, 28 Dec 2006, Andrew Morton wrote:
>
> It would be interesting to convert your app to do fsync() before
> FADV_DONTNEED. That would take WB_SYNC_NONE out of the picture as well
> (apart from pdflush activity).

I get corruption - but the whole point is that it's very much pdflush
that should be writing these pages out.

Andrew - give my test-program a try. It can run in about 1 minute if you
have a 256MB machine (I didn't, but "mem=256M" is my friend), and it
seems to cause corruption very consistently. What I do is:

	# Make sure we write back aggressively
	echo 5 > /proc/sys/vm/dirty_ratio

as root, and then just run the thing. Tons of corruption.

But the corruption goes away if I just leave the default dirty ratio
alone (I can then increase the file size to trigger it, of course - but
that also makes the test run a lot slower).

Now, with a pre-2.6.19 kernel, I bet you won't get the corruption as
easily (at least with the "fsync()"), but that has less to do with
anything new, and is probably just because then you simply won't have
any pdflushing going on - since the kernel won't even notice that you
have tons of dirty pages ;)

It might also depend on the speed of your disk drive - the machine I
test this on has a slow 4200 rpm laptop drive in it, and that probably
makes things go south more easily. That's _especially_ true if this is
related to any "bdi_write_congested()" logic.

Now, it could also be related to various code snippets like

	...
	if (wbc->sync_mode != WB_SYNC_NONE)
		wait_on_page_writeback(page);

	if (PageWriteback(page) ||
	    !clear_page_dirty_for_io(page)) {
		unlock_page(page);
		continue;
	}
	...

where the WB_SYNC_NONE case will hit the "PageWriteback()" test and just
not do the writeback at all (but it also won't clear the dirty bit, so
it's certainly not an *OBVIOUS* bug).

We also have code like this (in "pageout()"):

	if (clear_page_dirty_for_io(page)) {
		int res;
		struct writeback_control wbc = {
			.sync_mode = WB_SYNC_NONE,
			..
		};
		...
		res = mapping->a_ops->writepage(page, &wbc);

and in this case, if "WB_SYNC_NONE" means that the "writepage()" call
won't do anything at all because of congestion, then that would be a
_bad_ thing, and would certainly explain how something didn't get
written out.

But that particular path should only trigger in the "shrink_page_list()"
case, and it's not the path I seem to be exercising with my
low-dirty_ratio testing.

		Linus
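[Editor's note: the reproduction recipe in the mail - write a large file, fsync(), drop the pages with FADV_DONTNEED, then re-read and verify, with vm.dirty_ratio lowered to force early pdflush writeback - can be sketched as below. This is a hypothetical reconstruction, not Linus's actual test program; the file name, chunk size, and fill pattern are illustrative.]

```c
/* Sketch of the write/fsync/FADV_DONTNEED/re-read pattern described
 * above.  Not Linus's actual test program; sizes and patterns are
 * made up for illustration. */
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK (1 << 20)		/* 1 MB per chunk */

/* Write `chunks` patterned 1MB chunks to `path`, fsync(), evict the
 * pages from the page cache with POSIX_FADV_DONTNEED, then re-read
 * from disk and verify.  Returns the number of corrupted chunks, or
 * -1 on I/O error. */
static int run_corruption_test(const char *path, int chunks)
{
	char *wr = malloc(CHUNK), *rd = malloc(CHUNK);
	int bad = 0;
	int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0644);

	if (fd < 0 || !wr || !rd)
		return -1;

	for (int i = 0; i < chunks; i++) {
		memset(wr, 'a' + (i % 26), CHUNK);
		if (write(fd, wr, CHUNK) != CHUNK)
			return -1;
	}

	/* fsync() first, as Andrew suggested, so WB_SYNC_NONE writeback
	 * is out of the picture; then drop the clean pages so that the
	 * re-read has to go back to the disk. */
	if (fsync(fd) < 0)
		return -1;
	posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;
	for (int i = 0; i < chunks; i++) {
		memset(wr, 'a' + (i % 26), CHUNK);
		if (read(fd, rd, CHUNK) != CHUNK)
			return -1;
		if (memcmp(wr, rd, CHUNK) != 0)
			bad++;
	}

	close(fd);
	unlink(path);
	free(wr);
	free(rd);
	return bad;
}
```

To follow the recipe in the mail, run this as root after `echo 5 > /proc/sys/vm/dirty_ratio`, with a file size large relative to RAM (e.g. on a "mem=256M" boot) so that pdflush, not the application, does the writeback. On a non-buggy kernel it should report zero corrupted chunks.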