* [PATCH v2] writeback: avoid race when update bandwidth
@ 2012-06-12 11:46 Wanpeng Li
  2012-06-12 11:52 ` Fengguang Wu
  0 siblings, 1 reply; 9+ messages in thread
From: Wanpeng Li @ 2012-06-12 11:46 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: linux-kernel, Gavin Shan, Wanpeng Li

From: Wanpeng Li <liwp@linux.vnet.ibm.com>

"V1 -> V2"
* remove dirty_lock

Since bdi->wb.list_lock is used to protect the b_* lists,
flushers that call wb_writeback to write back pages will get
stuck while the bandwidth update path holds this lock. To
avoid this race, introduce a new bandwidth_lock that is
dedicated to protecting the bandwidth update path.

Signed-off-by: Wanpeng Li <liwp.linux@gmail.com>

---
 mm/page-writeback.c |    9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index c833bf0..e28d36e 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -815,7 +815,6 @@ static void global_update_bandwidth(unsigned long thresh,
 				    unsigned long dirty,
 				    unsigned long now)
 {
-	static DEFINE_SPINLOCK(dirty_lock);
 	static unsigned long update_time;
 
 	/*
@@ -824,12 +823,10 @@ static void global_update_bandwidth(unsigned long thresh,
 	if (time_before(now, update_time + BANDWIDTH_INTERVAL))
 		return;
 
-	spin_lock(&dirty_lock);
 	if (time_after_eq(now, update_time + BANDWIDTH_INTERVAL)) {
 		update_dirty_limit(thresh, dirty);
 		update_time = now;
 	}
-	spin_unlock(&dirty_lock);
 }
 
 /*
@@ -1032,12 +1029,14 @@ static void bdi_update_bandwidth(struct backing_dev_info *bdi,
 				 unsigned long bdi_dirty,
 				 unsigned long start_time)
 {
+	static DEFINE_SPINLOCK(bandwidth_lock);
+
 	if (time_is_after_eq_jiffies(bdi->bw_time_stamp + BANDWIDTH_INTERVAL))
 		return;
-	spin_lock(&bdi->wb.list_lock);
+	spin_lock(&bandwidth_lock);
 	__bdi_update_bandwidth(bdi, thresh, bg_thresh, dirty,
 			       bdi_thresh, bdi_dirty, start_time);
-	spin_unlock(&bdi->wb.list_lock);
+	spin_unlock(&bandwidth_lock);
 }
 
 /*
-- 
1.7.9.5
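[Editorial sketch: the hunks above preserve the check-lock-recheck pattern of global_update_bandwidth(). A user-space approximation of that pattern — hypothetical names, a pthread mutex standing in for the kernel spinlock — could look like:]

```c
#include <pthread.h>
#include <time.h>

/* User-space sketch of the check-lock-recheck pattern in
 * global_update_bandwidth(). All names here are illustrative, not
 * kernel API: a pthread mutex stands in for DEFINE_SPINLOCK(). */

#define BANDWIDTH_INTERVAL_SEC 1

static pthread_mutex_t bandwidth_lock = PTHREAD_MUTEX_INITIALIZER;
time_t update_time;             /* last update time, guarded by bandwidth_lock */

static void update_dirty_limit(void)
{
	/* stand-in for the real dirty-limit update work */
}

void global_update_bandwidth(time_t now)
{
	/* Cheap unlocked check: a stale read here is harmless, it only
	 * means we occasionally take the lock when no update is due. */
	if (now < update_time + BANDWIDTH_INTERVAL_SEC)
		return;

	pthread_mutex_lock(&bandwidth_lock);
	/* Recheck under the lock: a racing thread may have just done the
	 * update, so only one thread updates per interval. */
	if (now >= update_time + BANDWIDTH_INTERVAL_SEC) {
		update_dirty_limit();
		update_time = now;
	}
	pthread_mutex_unlock(&bandwidth_lock);
}
```

The unlocked check filters out the common case cheaply; the recheck under the lock is what guarantees a single update per interval when several threads race past the first check.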


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] writeback: avoid race when update bandwidth
  2012-06-12 11:46 [PATCH v2] writeback: avoid race when update bandwidth Wanpeng Li
@ 2012-06-12 11:52 ` Fengguang Wu
  2012-06-12 11:58   ` Wanpeng Li
  2012-06-13  3:59   ` Dave Chinner
  0 siblings, 2 replies; 9+ messages in thread
From: Fengguang Wu @ 2012-06-12 11:52 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: linux-kernel, Gavin Shan

On Tue, Jun 12, 2012 at 07:46:01PM +0800, Wanpeng Li wrote:
> From: Wanpeng Li <liwp@linux.vnet.ibm.com>
> 
> "V1 -> V2"
> * remove dirty_lock
> 
> Since bdi->wb.list_lock is used to protect the b_* lists,
> flushers that call wb_writeback to write back pages will get
> stuck while the bandwidth update path holds this lock. To
> avoid this race, introduce a new bandwidth_lock that is
> dedicated to protecting the bandwidth update path.
> 
> Signed-off-by: Wanpeng Li <liwp.linux@gmail.com>

Applied with a new title "writeback: use a standalone lock for
updating write bandwidth". "race" is sensitive because it often
refers to some locking error.

Thank you!

Fengguang


* Re: [PATCH v2] writeback: avoid race when update bandwidth
  2012-06-12 11:52 ` Fengguang Wu
@ 2012-06-12 11:58   ` Wanpeng Li
  2012-06-13  3:59   ` Dave Chinner
  1 sibling, 0 replies; 9+ messages in thread
From: Wanpeng Li @ 2012-06-12 11:58 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: linux-kernel, Gavin Shan, Wanpeng Li

On Tue, Jun 12, 2012 at 07:52:19PM +0800, Fengguang Wu wrote:
>On Tue, Jun 12, 2012 at 07:46:01PM +0800, Wanpeng Li wrote:
>> From: Wanpeng Li <liwp@linux.vnet.ibm.com>
>> 
>> "V1 -> V2"
>> * remove dirty_lock
>> 
>> Since bdi->wb.list_lock is used to protect the b_* lists,
>> flushers that call wb_writeback to write back pages will get
>> stuck while the bandwidth update path holds this lock. To
>> avoid this race, introduce a new bandwidth_lock that is
>> dedicated to protecting the bandwidth update path.
>> 
>> Signed-off-by: Wanpeng Li <liwp.linux@gmail.com>
>
>Applied with a new title "writeback: use a standalone lock for
>updating write bandwidth". "race" is sensitive because it often
>refers to some locking error.

OK, Thanks a lot.

Regards,
Wanpeng Li


* Re: [PATCH v2] writeback: avoid race when update bandwidth
  2012-06-12 11:52 ` Fengguang Wu
  2012-06-12 11:58   ` Wanpeng Li
@ 2012-06-13  3:59   ` Dave Chinner
  2012-06-13 12:14     ` Fengguang Wu
  1 sibling, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2012-06-13  3:59 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: Wanpeng Li, linux-kernel, Gavin Shan

On Tue, Jun 12, 2012 at 07:52:19PM +0800, Fengguang Wu wrote:
> On Tue, Jun 12, 2012 at 07:46:01PM +0800, Wanpeng Li wrote:
> > From: Wanpeng Li <liwp@linux.vnet.ibm.com>
> > 
> > "V1 -> V2"
> > * remove dirty_lock
> > 
> > Since bdi->wb.list_lock is used to protect the b_* lists,
> > flushers that call wb_writeback to write back pages will get
> > stuck while the bandwidth update path holds this lock. To
> > avoid this race, introduce a new bandwidth_lock that is
> > dedicated to protecting the bandwidth update path.
> > 
> > Signed-off-by: Wanpeng Li <liwp.linux@gmail.com>
> 
> Applied with a new title "writeback: use a standalone lock for
> updating write bandwidth". "race" is sensitive because it often
> refers to some locking error.

Fengguang - can we get some evidence that this is a contended lock
before changing the scope of it? All of the previous "breaking up
global locks" have been done based on lock contention data, so
moving back to a global lock for this needs to have the same
analysis provided...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH v2] writeback: avoid race when update bandwidth
  2012-06-13  3:59   ` Dave Chinner
@ 2012-06-13 12:14     ` Fengguang Wu
  2012-06-14  2:05       ` Dave Chinner
  0 siblings, 1 reply; 9+ messages in thread
From: Fengguang Wu @ 2012-06-13 12:14 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Wanpeng Li, linux-kernel, Gavin Shan

On Wed, Jun 13, 2012 at 01:59:20PM +1000, Dave Chinner wrote:
> On Tue, Jun 12, 2012 at 07:52:19PM +0800, Fengguang Wu wrote:
> > On Tue, Jun 12, 2012 at 07:46:01PM +0800, Wanpeng Li wrote:
> > > [...]
> > 
> > Applied with a new title "writeback: use a standalone lock for
> > updating write bandwidth". "race" is sensitive because it often
> > refers to some locking error.
> 
> Fengguang - can we get some evidence that this is a contended lock
> before changing the scope of it? All of the previous "breaking up
> global locks" have been done based on lock contention data, so
> moving back to a global lock for this needs to have the same
> analysis provided...

Good point. Attached is the lockstat for the case "10 disks each runs
100 dd dirtier tasks":

        lkp-ne02/JBOD-10HDD-thresh=4G/xfs-100dd-1-3.2.0-rc5

The wb->list_lock contention is much better than I expected, which is
good.  What stand out are
                                                        waittime-total
- &rq->lock             by double_rq_lock()             6738952.13
- clockevents_lock      by clockevents_notify()         2155554.37
- mapping->tree_lock    by test_clear_page_writeback()   931550.13
- sb_lock               by grab_super_passive()          918815.87
- &zone->lru_lock       by pagevec_lru_move_fn()         912681.05

- sysfs_mutex           by sysfs_permission()           24029975.20 # mutex
- ip->i_lock            by xfs_ilock()                  18428284.10 # mrlock

Thanks,
Fengguang


* Re: [PATCH v2] writeback: avoid race when update bandwidth
  2012-06-13 12:14     ` Fengguang Wu
@ 2012-06-14  2:05       ` Dave Chinner
  2012-06-14 14:00         ` Fengguang Wu
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2012-06-14  2:05 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: Wanpeng Li, linux-kernel, Gavin Shan

On Wed, Jun 13, 2012 at 08:14:34PM +0800, Fengguang Wu wrote:
> On Wed, Jun 13, 2012 at 01:59:20PM +1000, Dave Chinner wrote:
> > On Tue, Jun 12, 2012 at 07:52:19PM +0800, Fengguang Wu wrote:
> > > On Tue, Jun 12, 2012 at 07:46:01PM +0800, Wanpeng Li wrote:
> > > > [...]
> > > 
> > > Applied with a new title "writeback: use a standalone lock for
> > > updating write bandwidth". "race" is sensitive because it often
> > > refers to some locking error.
> > 
> > Fengguang - can we get some evidence that this is a contended lock
> > before changing the scope of it? All of the previous "breaking up
> > global locks" have been done based on lock contention data, so
> > moving back to a global lock for this needs to have the same
> > analysis provided...
> 
> Good point. Attached is the lockstat for the case "10 disks each runs
> 100 dd dirtier tasks":
> 
>         lkp-ne02/JBOD-10HDD-thresh=4G/xfs-100dd-1-3.2.0-rc5

(nothing attached)

> The wb->list_lock contention is much better than I expected, which is
> good.  What stand out are
>                                                         waittime-total
> - &rq->lock             by double_rq_lock()             6738952.13
> - clockevents_lock      by clockevents_notify()         2155554.37
> - mapping->tree_lock    by test_clear_page_writeback()   931550.13
> - sb_lock               by grab_super_passive()          918815.87
> - &zone->lru_lock       by pagevec_lru_move_fn()         912681.05
> 
> - sysfs_mutex           by sysfs_permission()           24029975.20 # mutex
> - ip->i_lock            by xfs_ilock()                  18428284.10 # mrlock

The wait time is not really an indication of contention problems.
Large wait time is usually an indication that the lock is being used
a lot.

What matters is the number of contentions vs the number of
acquisitions, and the number of those contentions that bounced the
lock. If the number of contentions is >= 0.5% of the acquisitions,
then the lock can be considered hot and needing some work. If I look
here:

http://lists.linux.hp.com/~enw/ext4/3.2/3.2-full-lockstats.2/ffsb_fsscale.xfs.large_file_creates_threads=192/profiling/iteration.1/lock_stat

Which is a 192 thread concurrent write on a 48-core machine, the
wb.list_lock shows 5,532 acquisitions for the entire test, while the
mapping tree lock took 440 million! So your test isn't really one
that shows wb.list_lock contention. The 192-thread mailserver
workload from the same machine:

http://lists.linux.hp.com/~enw/ext4/3.2/3.2-full-lockstats.2/ffsb_fsscale.xfs.mail_server_threads=192/profiling/iteration.1/lock_stat

Shows about 7.1m acquisitions of the wb.list_lock, but only 28,000
contentions. So it isn't really contended enough to justify
replacing it with a global lock.
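[Editorial aside: the ~0.5% rule of thumb above is simply contentions divided by acquisitions. An illustrative helper, plugged with the figures quoted in this thread:]

```c
/* Contention rate as a percentage of acquisitions. With the numbers
 * quoted above: wb.list_lock is 28,000 / 7.1M, about 0.39%, below the
 * ~0.5% "hot" threshold, while the XFS delayed write queue lock is
 * 600,000 / 25M, about 2.4%. Illustrative helper, not kernel code. */
static double contention_pct(unsigned long contentions,
			     unsigned long acquisitions)
{
	return acquisitions ? 100.0 * contentions / acquisitions : 0.0;
}
```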

FWIW, the third most contended lock on that workload is the XFS
delayed write queue lock - 25M acquisitions for 600k contentions - a
rate of about 2% which means quite severe contention.  That lock no
longer exists in 3.5 - Christoph completely reworked the delayed
write buffer support to remove the global list and lock because it
was showing up in profiles like this...

Indeed, that profile shows that XFS owns 7 of the 10 most contended
locks, and 3 of them have had significant work done to reduce the
contention since 3.2 as a result of recent profile results like this.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH v2] writeback: avoid race when update bandwidth
  2012-06-14  2:05       ` Dave Chinner
@ 2012-06-14 14:00         ` Fengguang Wu
  2012-06-15  0:06           ` Dave Chinner
  0 siblings, 1 reply; 9+ messages in thread
From: Fengguang Wu @ 2012-06-14 14:00 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Wanpeng Li, linux-kernel, Gavin Shan

On Thu, Jun 14, 2012 at 12:05:59PM +1000, Dave Chinner wrote:
> On Wed, Jun 13, 2012 at 08:14:34PM +0800, Fengguang Wu wrote:
> > On Wed, Jun 13, 2012 at 01:59:20PM +1000, Dave Chinner wrote:
> > > On Tue, Jun 12, 2012 at 07:52:19PM +0800, Fengguang Wu wrote:
> > > > On Tue, Jun 12, 2012 at 07:46:01PM +0800, Wanpeng Li wrote:
> > > > > [...]
> > > > 
> > > > Applied with a new title "writeback: use a standalone lock for
> > > > updating write bandwidth". "race" is sensitive because it often
> > > > refers to some locking error.
> > > 
> > > Fengguang - can we get some evidence that this is a contended lock
> > > before changing the scope of it? All of the previous "breaking up
> > > global locks" have been done based on lock contention data, so
> > > moving back to a global lock for this needs to have the same
> > > analysis provided...
> > 
> > Good point. Attached is the lockstat for the case "10 disks each runs
> > 100 dd dirtier tasks":
> > 
> >         lkp-ne02/JBOD-10HDD-thresh=4G/xfs-100dd-1-3.2.0-rc5
> 
> (nothing attached)
> 
> > The wb->list_lock contention is much better than I expected, which is
> > good.  What stands out are
> >                                                         waittime-total
> > - &rq->lock             by double_rq_lock()             6738952.13
> > - clockevents_lock      by clockevents_notify()         2155554.37
> > - mapping->tree_lock    by test_clear_page_writeback()   931550.13
> > - sb_lock               by grab_super_passive()          918815.87
> > - &zone->lru_lock       by pagevec_lru_move_fn()         912681.05
> > 
> > - sysfs_mutex           by sysfs_permission()           24029975.20 # mutex
> > - ip->i_lock            by xfs_ilock()                  18428284.10 # mrlock
> 
> The wait time is not really an indication of contention problems.
> Large wait time is usually an indication that the lock is being used
> a lot.

Right.

> What matters is the number of contentions vs the number of
> acquisitions, and the number of those contentions that bounced the
> lock. If the number of contentions is >= 0.5% of the acquisitions,
> then the lock can be considered hot and needing some work. If I look
> here:

I wonder if anyone has a simple script for sorting lock_stat output
based on that (and perhaps other selectable) criterion? It should be
possible to write one myself, but still.. ;-)

Default lock_stat output is sorted by absolute number of contentions.
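[A minimal sketch of such a sorter — hypothetical, and assuming the name/contentions/acquisitions columns have already been pulled out of /proc/lock_stat — might rank by contention rate rather than absolute contentions:]

```c
#include <stdlib.h>

/* Illustrative only: rank lock records by contention rate
 * (contentions / acquisitions), descending. Parsing the actual
 * /proc/lock_stat layout is left out of this sketch. */
struct lock_rec {
	const char *name;
	unsigned long contentions;
	unsigned long acquisitions;
};

static double rate(const struct lock_rec *r)
{
	return r->acquisitions ?
		(double)r->contentions / r->acquisitions : 0.0;
}

static int by_rate_desc(const void *a, const void *b)
{
	double ra = rate(a), rb = rate(b);

	return (ra < rb) - (ra > rb);	/* higher rate sorts first */
}

void sort_by_contention_rate(struct lock_rec *recs, size_t n)
{
	qsort(recs, n, sizeof(*recs), by_rate_desc);
}
```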

> http://lists.linux.hp.com/~enw/ext4/3.2/3.2-full-lockstats.2/ffsb_fsscale.xfs.large_file_creates_threads=192/profiling/iteration.1/lock_stat
> 
> Which is a 192 thread concurrent write on a 48-core machine, the
> wb.list_lock shows 5,532 acquistions for the entire test, while the
> mapping tree lock took 440 million!. So your test isn't really one
> that shows wb.list_lock contention. The 192-thread mailserver
> workload from the same machine:
> 
> http://lists.linux.hp.com/~enw/ext4/3.2/3.2-full-lockstats.2/ffsb_fsscale.xfs.mail_server_threads=192/profiling/iteration.1/lock_stat
> 
> Shows about 7.1m acquisitions of the wb.list_lock, but only 28,000
> contentions. So it isn't really contended enough to justify
> replacing it with a global lock.

Right.

> FWIW, the third most contended lock on that workload is the XFS
> delayed write queue lock - 25M acquisitions for 600k contentions - a
> rate of about 2% which means quite severe contention.  That lock no
> longer exists in 3.5 - Christoph completely reworked the delayed
> write buffer support to remove the global list and lock because it
> was showing up in profiles like this...
> 
> Indeed, that profile shows that XFS owns 7 of the 10 most contended
> locks, and 3 of them have had significant work done to reduce the
> contention since 3.2 as a result of recent profile results like this.

Nice work!

Thanks,
Fengguang


* Re: [PATCH v2] writeback: avoid race when update bandwidth
  2012-06-14 14:00         ` Fengguang Wu
@ 2012-06-15  0:06           ` Dave Chinner
  2012-06-15  0:29             ` Fengguang Wu
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2012-06-15  0:06 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: Wanpeng Li, linux-kernel, Gavin Shan

On Thu, Jun 14, 2012 at 10:00:06PM +0800, Fengguang Wu wrote:
> On Thu, Jun 14, 2012 at 12:05:59PM +1000, Dave Chinner wrote:
> > On Wed, Jun 13, 2012 at 08:14:34PM +0800, Fengguang Wu wrote:
> I wonder if anyone has a simple script for sorting lock_stat output
> based on that (and perhaps other selectable) criterion? It should be
> possible to write on myself, but still.. ;-)
> 
> Default lock_stat output is sorted by absolute number of contentions.

Not that I know of. The default is pretty sane, because a highly
contended lock that is causing performance problems will always show
up near the top. If it's not in the top 10, then it's usually not
worth worrying about until you've dealt with those above it....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH v2] writeback: avoid race when update bandwidth
  2012-06-15  0:06           ` Dave Chinner
@ 2012-06-15  0:29             ` Fengguang Wu
  0 siblings, 0 replies; 9+ messages in thread
From: Fengguang Wu @ 2012-06-15  0:29 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Wanpeng Li, linux-kernel, Gavin Shan

On Fri, Jun 15, 2012 at 10:06:33AM +1000, Dave Chinner wrote:
> On Thu, Jun 14, 2012 at 10:00:06PM +0800, Fengguang Wu wrote:
> > On Thu, Jun 14, 2012 at 12:05:59PM +1000, Dave Chinner wrote:
> > > On Wed, Jun 13, 2012 at 08:14:34PM +0800, Fengguang Wu wrote:
> > I wonder if anyone has a simple script for sorting lock_stat output
> > based on that (and perhaps other selectable) criterion? It should be
> > possible to write on myself, but still.. ;-)
> > 
> > Default lock_stat output is sorted by absolute number of contentions.
> 
> No that I know of. The default is pretty sane, because a highly
> contended lock that is causing performance problems will always show
> up near the top. If it's not in the top 10, then it's usually not
> worth worrying about until you've dealt with the those above it....

Okay, thanks for the tip!

Thanks,
Fengguang

