From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chris Mason
Subject: Re: [PATCH] btrfs file write debugging patch
Date: Mon, 28 Feb 2011 09:00:17 -0500
Message-ID: <1298901582-sup-6518@think>
References: <1298857223-sup-5612@think> <201102281114.00018.johannes.hirte@fem.tu-ilmenau.de>
Content-Type: text/plain; charset=UTF-8
Cc: Mitch Harder, Maria Wikström, "Zhong, Xin", "linux-btrfs@vger.kernel.org"
To: Johannes Hirte
Return-path:
In-reply-to: <201102281114.00018.johannes.hirte@fem.tu-ilmenau.de>
List-ID:

Excerpts from Johannes Hirte's message of 2011-02-28 05:13:59 -0500:
> On Monday 28 February 2011 02:46:05 Chris Mason wrote:
> > Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500:
> > > Some clarification on my previous message...
> > >
> > > After looking at my ftrace log more closely, I can see where Btrfs is
> > > trying to release the allocated pages. However, the calculation for
> > > the number of dirty_pages is equal to 1 when "copied == 0".
> > >
> > > So I'm seeing at least two problems:
> > > (1) It keeps looping when "copied == 0".
> > > (2) One dirty page is not being released on every loop even though
> > > "copied == 0" (at least this problem keeps it from being an infinite
> > > loop by eventually exhausting reservable space on the disk).
> >
> > Hi everyone,
> >
> > There are actually two bugs here. First the one that Mitch hit, and a
> > second one that still results in bad file_write results with my
> > debugging hunks (the first two hunks below) in place.
> >
> > My patch fixes Mitch's bug by checking for copied == 0 after
> > btrfs_copy_from_user and doing the correct delalloc accounting. This
> > one looks solved, but you'll notice the patch is bigger.
> >
> > First, I add some random failures to btrfs_copy_from_user() by failing
> > every once in a while. This was much more reliable than trying to
> > use memory pressure to make copy_from_user fail.
> >
> > If copy_from_user fails and we partially update a page, we end up with a
> > page that may go away due to memory pressure. But, btrfs_file_write
> > assumes that only the first and last page may have good data that needs
> > to be read off the disk.
> >
> > This patch ditches that code and puts it into prepare_pages instead.
> > But I'm still having some errors during long stress.sh runs. Ideas are
> > more than welcome, hopefully some other timezones will kick in ideas
> > while I sleep.
>
> At least it doesn't fix the emerge-problem for me. The behavior is now the same
> as with 2.6.38-rc3. It needs an 'emerge --oneshot dev-libs/libgcrypt' with no
> further interaction to get the emerge-process to hang with a svn-process
> consuming 100% CPU. I can cancel the emerge-process with ctrl-c but the
> spawned svn-process stays and it needs a reboot to get rid of it.

I think your problem really is more enospc related. Still working on
that as well. But please don't try the patch without removing the
debugging hunk at the top (anything that mentions jiffies).

-chris
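
A note on the accounting Mitch describes: the userspace C sketch below
shows how a round-up page count can come out as 1 even when nothing was
copied, and how a copied == 0 check avoids that. The round-up formula,
the 4 KiB page size, and the function names dirty_pages_naive and
dirty_pages_checked are assumptions made for illustration; this is not
the code from the patch.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages. The real value depends on the architecture. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Hypothetical round-up page count: how many pages are touched by
 * `copied` bytes starting at in-page offset `offset`.
 * With copied == 0 and offset > 0 this still returns 1, which matches
 * the symptom described in the thread.
 */
static size_t dirty_pages_naive(size_t offset, size_t copied)
{
    return (copied + offset + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/*
 * Same count, but a failed copy (copied == 0) dirties no pages, so the
 * caller would not account for a page that was never written.
 */
static size_t dirty_pages_checked(size_t offset, size_t copied)
{
    if (copied == 0)
        return 0;
    return dirty_pages_naive(offset, copied);
}
```

Under these assumptions, a failed copy at in-page offset 100 counts one
dirty page in the naive version and zero in the checked one, while a
successful copy gives the same count in both.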