* [PATCH] writeback: guard against jiffies wraparound on inode->dirtied_when checks (try #3)
@ 2009-04-01 17:37 Jeff Layton
  2009-04-01 20:22 ` Andi Kleen
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff Layton @ 2009-04-01 17:37 UTC
  To: akpm; +Cc: linux-kernel, linux-fsdevel, fengguang.wu

This is the third version of this patch. The main difference from the
last patch is the addition by Wu of a helper function that encapsulates
the check.

The dirtied_when value on an inode is supposed to represent the first
time that an inode has one of its pages dirtied. This value is in units
of jiffies. It's used in several places in the writeback code to
determine when to write out an inode.

The problem is that these checks assume that dirtied_when is updated
periodically. If an inode is continuously being used for I/O it can be
persistently marked as dirty and will continue to age. Once the time
difference between dirtied_when and the jiffies value it is being
compared to is greater than or equal to half the maximum of the jiffies
type, the logic of the time_*() macros inverts and the opposite of what
is needed is returned. On 32-bit architectures that's just under 25 days
(assuming HZ == 1000).
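
To make the inversion concrete, here is a minimal userspace sketch, not
kernel code. It assumes a 32-bit counter and models the comparison on
the kernel's time_after(); the names time_after32(),
days_until_inversion() and wraparound_examples() are made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative 32-bit model of time_after(a, b): true if a is after b. */
static inline bool time_after32(uint32_t a, uint32_t b)
{
	/* The sign of the wrapped difference decides the ordering. */
	return (int32_t)(b - a) < 0;
}

/* Days until the comparison inverts: 2^31 ticks at the given tick rate. */
static inline double days_until_inversion(unsigned int hz)
{
	return 2147483648.0 / hz / 86400.0;
}

/* Worked cases; call from main() to run them. */
static inline void wraparound_examples(void)
{
	uint32_t dirtied_when = 1000u;

	/* One tick short of half the range: still correctly "in the past". */
	assert(!time_after32(dirtied_when, dirtied_when + 0x7fffffffu));
	/* At half the range the result inverts: the stamp looks like the future. */
	assert(time_after32(dirtied_when, dirtied_when + 0x80000000u));
	/* With HZ == 1000 that point is just under 25 days. */
	assert(days_until_inversion(1000) > 24.8 &&
	       days_until_inversion(1000) < 25.0);
}
```

The second assertion is the failure mode described above: a timestamp
that is 2^31 ticks old compares as if it were later than the value it is
checked against.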

As the least-recently dirtied inode, it'll end up being the first one
that pdflush will try to write out. sync_sb_inodes does this check:

	/* Was this inode dirtied after sync_sb_inodes was called? */
	if (time_after(inode->dirtied_when, start))
		break;

...but now dirtied_when appears to be in the future. sync_sb_inodes
bails out without attempting to write any dirty inodes. When this
occurs, pdflush will stop writing out inodes for this superblock.
Nothing can unwedge it until jiffies moves out of the problematic
window.

This patch fixes this problem by changing the checks against
dirtied_when to also check whether it appears to be in the future. If it
does, then we consider the value to be far in the past.

This should shrink the problematic window of time to such a small period
(30s) as not to matter.
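
As a rough illustration of the guarded check, the sketch below models it
in userspace with a 32-bit counter. It is an approximation under stated
assumptions, not the patch itself: dirtied_after32(), the *32 comparison
helpers and guard_examples() are hypothetical names, and "now" is passed
in explicitly where the kernel code reads jiffies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative 32-bit models of time_after() and time_before_eq(). */
static inline bool time_after32(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

static inline bool time_before_eq32(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) >= 0;
}

/* Models the guarded check: "after t" only counts if not after now. */
static inline bool dirtied_after32(uint32_t dirtied_when, uint32_t t,
				   uint32_t now)
{
	return time_after32(dirtied_when, t) &&
	       time_before_eq32(dirtied_when, now);
}

/* Worked cases; call from main() to run them. */
static inline void guard_examples(void)
{
	uint32_t now = 1000u + 0x80000000u + 30000u;
	uint32_t start = now - 30000u;

	/* Freshly dirtied inode: dirtied after start, not after now. */
	assert(dirtied_after32(now - 10u, start, now));
	/* Stuck inode: the plain check is fooled, the guarded one is not. */
	assert(time_after32(1000u, start));
	assert(!dirtied_after32(1000u, start, now));
}
```

A timestamp that appears to be later than the current time cannot be
genuine, so the guarded check treats it as old and writeback proceeds.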

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Ian Kent <raven@themaw.net>
---
 fs/fs-writeback.c |   26 ++++++++++++++++++++++----
 1 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index e3fe991..cf9192c 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -196,7 +196,7 @@ static void redirty_tail(struct inode *inode)
 		struct inode *tail_inode;
 
 		tail_inode = list_entry(sb->s_dirty.next, struct inode, i_list);
-		if (!time_after_eq(inode->dirtied_when,
+		if (time_before(inode->dirtied_when,
 				tail_inode->dirtied_when))
 			inode->dirtied_when = jiffies;
 	}
@@ -220,6 +220,21 @@ static void inode_sync_complete(struct inode *inode)
 	wake_up_bit(&inode->i_state, __I_SYNC);
 }
 
+static bool inode_dirtied_after(struct inode *inode, unsigned long t)
+{
+	bool ret = time_after(inode->dirtied_when, t);
+#ifndef CONFIG_64BIT
+	/*
+	 * For inodes being constantly redirtied, dirtied_when can get stuck.
+	 * It _appears_ to be in the future, but is actually in distant past.
+	 * This test is necessary to prevent such wrapped-around relative times
+	 * from permanently stopping the whole pdflush writeback.
+	 */
+	ret = ret && time_before_eq(inode->dirtied_when, jiffies);
+#endif
+	return ret;
+}
+
 /*
  * Move expired dirty inodes from @delaying_queue to @dispatch_queue.
  */
@@ -231,7 +246,7 @@ static void move_expired_inodes(struct list_head *delaying_queue,
 		struct inode *inode = list_entry(delaying_queue->prev,
 						struct inode, i_list);
 		if (older_than_this &&
-			time_after(inode->dirtied_when, *older_than_this))
+		    inode_dirtied_after(inode, *older_than_this))
 			break;
 		list_move(&inode->i_list, dispatch_queue);
 	}
@@ -492,8 +507,11 @@ void generic_sync_sb_inodes(struct super_block *sb,
 			continue;		/* blockdev has wrong queue */
 		}
 
-		/* Was this inode dirtied after sync_sb_inodes was called? */
-		if (time_after(inode->dirtied_when, start))
+		/*
+		 * Was this inode dirtied after sync_sb_inodes was called?
+		 * This keeps sync from extra jobs and livelock.
+		 */
+		if (inode_dirtied_after(inode, start))
 			break;
 
 		/* Is another pdflush already flushing this queue? */
-- 
1.5.5.6



* Re: [PATCH] writeback: guard against jiffies wraparound on inode->dirtied_when checks (try #3)
  2009-04-01 17:37 [PATCH] writeback: guard against jiffies wraparound on inode->dirtied_when checks (try #3) Jeff Layton
@ 2009-04-01 20:22 ` Andi Kleen
  2009-04-01 21:26   ` Jeff Layton
  0 siblings, 1 reply; 5+ messages in thread
From: Andi Kleen @ 2009-04-01 20:22 UTC
  To: Jeff Layton; +Cc: akpm, linux-kernel, linux-fsdevel, fengguang.wu

Jeff Layton <jlayton@redhat.com> writes:
>
> The problem is that these checks assume that dirtied_when is updated
> periodically. If an inode is continuously being used for I/O it can be
> persistently marked as dirty and will continue to age. Once the time
> difference between dirtied_when and the jiffies value it is being
> compared to is greater than or equal to half the maximum of the jiffies
> type, the logic of the time_*() macros inverts and the opposite of what
> is needed is returned. On 32-bit architectures that's just under 25 days
> (assuming HZ == 1000).

I wonder if this can happen in other places using jiffies timestamps
too. Why not? Perhaps that check macro should be in timer.h and some
auditing done over the whole code base?

-Andi


-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [PATCH] writeback: guard against jiffies wraparound on inode->dirtied_when checks (try #3)
  2009-04-01 20:22 ` Andi Kleen
@ 2009-04-01 21:26   ` Jeff Layton
  2009-04-01 22:12     ` Andi Kleen
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff Layton @ 2009-04-01 21:26 UTC
  To: Andi Kleen; +Cc: akpm, linux-kernel, linux-fsdevel, fengguang.wu

On Wed, 01 Apr 2009 22:22:06 +0200
Andi Kleen <andi@firstfloor.org> wrote:

> Jeff Layton <jlayton@redhat.com> writes:
> >
> > The problem is that these checks assume that dirtied_when is updated
> > periodically. If an inode is continuously being used for I/O it can be
> > persistently marked as dirty and will continue to age. Once the time
> > difference between dirtied_when and the jiffies value it is being
> > compared to is greater than or equal to half the maximum of the jiffies
> > type, the logic of the time_*() macros inverts and the opposite of what
> > is needed is returned. On 32-bit architectures that's just under 25 days
> > (assuming HZ == 1000).
> 
> I wonder if this can happen in other places using jiffies timestamps
> too. Why not? Perhaps that check macro should be in timer.h and some
> auditing done over the whole code base?
> 

It certainly can happen in other places. We've seen very similar
problems in NFS, and they were fixed in similar ways. That's where the
time_in_range macro came from. I agree that a thorough audit of jiffies
usage would be a fine thing...

One possibility might be a new debugging option. We could add
replacement time_after() and time_before() macros that also check
whether the difference in times is beyond a certain threshold
(maybe a day or week or so), and pop a printk or otherwise record
info about it when one is detected?

That wouldn't find all of the problem cases, but it might help ID some
of them.
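
Something along these lines, as a userspace sketch only. Everything here
is hypothetical: time_after32_debug(), SUSPICIOUS_GAP, HZ_ASSUMED and
the suspicious_hits counter are invented for illustration, the one-day
threshold is just the example figure from above, and fprintf() stands in
for whatever reporting a kernel version would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define HZ_ASSUMED	1000u				/* assumed tick rate */
#define SUSPICIOUS_GAP	(24u * 60u * 60u * HZ_ASSUMED)	/* one day of ticks */

/* Number of comparisons whose operands were more than a day apart. */
static unsigned long suspicious_hits;

/* Same result as a 32-bit time_after(a, b), plus a check on the gap. */
static inline bool time_after32_debug(uint32_t a, uint32_t b,
				      const char *where)
{
	int32_t diff = (int32_t)(b - a);
	/* Magnitude of the signed difference, computed without overflow. */
	uint32_t gap = diff < 0 ? 0u - (uint32_t)diff : (uint32_t)diff;

	if (gap > SUSPICIOUS_GAP) {
		suspicious_hits++;
		fprintf(stderr, "%s: timestamps %u ticks apart\n",
			where, (unsigned int)gap);
	}
	return diff < 0;
}

/* Worked cases; call from main() to run them. */
static inline void debug_examples(void)
{
	bool recent, stale;

	suspicious_hits = 0;
	/* A one-second gap is reported as "after" and is not flagged. */
	recent = time_after32_debug(5000u, 4000u, "recent");
	assert(recent && suspicious_hits == 0);
	/* A two-day gap gets the normal answer but is also flagged. */
	stale = time_after32_debug(1000u, 1000u + 2u * SUSPICIOUS_GAP, "stale");
	assert(!stale && suspicious_hits == 1);
}
```

The comparison result is unchanged; the only extra work is the
magnitude check and the report.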

-- 
Jeff Layton <jlayton@redhat.com>


* Re: [PATCH] writeback: guard against jiffies wraparound on inode->dirtied_when checks (try #3)
  2009-04-01 21:26   ` Jeff Layton
@ 2009-04-01 22:12     ` Andi Kleen
  2009-04-02 11:58       ` Jeff Layton
  0 siblings, 1 reply; 5+ messages in thread
From: Andi Kleen @ 2009-04-01 22:12 UTC
  To: Jeff Layton; +Cc: Andi Kleen, akpm, linux-kernel, linux-fsdevel, fengguang.wu

On Wed, Apr 01, 2009 at 05:26:30PM -0400, Jeff Layton wrote:
> One possibility might be a new debugging option. We could add
> replacement time_after() and time_before() macros that also check
> whether the difference in times is beyond a certain threshold
> (maybe a day or week or so), and pop a printk or otherwise record
> info about it when one is detected?

Makes sense. However it might be hard to get people to run kernels
with heavy debugging options for that long.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [PATCH] writeback: guard against jiffies wraparound on inode->dirtied_when checks (try #3)
  2009-04-01 22:12     ` Andi Kleen
@ 2009-04-02 11:58       ` Jeff Layton
  0 siblings, 0 replies; 5+ messages in thread
From: Jeff Layton @ 2009-04-02 11:58 UTC
  To: Andi Kleen; +Cc: akpm, linux-kernel, linux-fsdevel, fengguang.wu

On Thu, 2 Apr 2009 00:12:24 +0200
Andi Kleen <andi@firstfloor.org> wrote:

> On Wed, Apr 01, 2009 at 05:26:30PM -0400, Jeff Layton wrote:
> > One possibility might be a new debugging option. We could add
> > replacement time_after() and time_before() macros that also check
> > whether the difference in times is beyond a certain threshold
> > (maybe a day or week or so), and pop a printk or otherwise record
> > info about it when one is detected?
> 
> Makes sense. However it might be hard to get people to run kernels
> with heavy debugging options for that long.
> 

Good point. That would limit the usefulness. I also worry that these
macros get used in sensitive places that might not be conducive to
printk's. Plus, we'd have to worry about ratelimiting them since they
could potentially pop often once you did hit the issue.

I'm not sure there's much we can do other than good old-fashioned
review. Identifying places where jiffies-based timestamps might live a
long time is ultimately going to come down to understanding how they're
used in the code.

-- 
Jeff Layton <jlayton@redhat.com>
