* [PATCH] writeback: avoid race when update bandwidth
@ 2012-06-12 10:26 Wanpeng Li
  2012-06-12 11:21 ` Fengguang Wu
  0 siblings, 1 reply; 8+ messages in thread
From: Wanpeng Li @ 2012-06-12 10:26 UTC
  To: Fengguang Wu; +Cc: linux-kernel, Gavin Shan, Wanpeng Li, Wanpeng Li

From: Wanpeng Li <liwp@linux.vnet.ibm.com>

bdi->wb.list_lock protects the b_* lists, so flushers that call
wb_writeback() to write back pages will get stuck while the bandwidth
update policy holds this lock. To avoid this race, introduce a new
bandwidth_lock that is responsible for protecting the bandwidth
update policy.

Signed-off-by: Wanpeng Li <liswp@linux.vnet.ibm.com>
---
 mm/page-writeback.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index c8945e0..b3b08fb 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1032,12 +1032,14 @@ static void bdi_update_bandwidth(struct backing_dev_info *bdi,
 				 unsigned long bdi_dirty,
 				 unsigned long start_time)
 {
+	static DEFINE_SPINLOCK(bandwidth_lock);
+
 	if (time_is_after_eq_jiffies(bdi->bw_time_stamp + BANDWIDTH_INTERVAL))
 		return;
-	spin_lock(&bdi->wb.list_lock);
+	spin_lock(&bandwidth_lock);
 	__bdi_update_bandwidth(bdi, thresh, bg_thresh, dirty,
 			       bdi_thresh, bdi_dirty, start_time);
-	spin_unlock(&bdi->wb.list_lock);
+	spin_unlock(&bandwidth_lock);
 }
 
 /*
-- 
1.7.9.5



* Re: [PATCH] writeback: avoid race when update bandwidth
  2012-06-12 10:26 [PATCH] writeback: avoid race when update bandwidth Wanpeng Li
@ 2012-06-12 11:21 ` Fengguang Wu
  2012-06-12 11:29   ` Wanpeng Li
  2012-06-13  3:56   ` Dave Chinner
  0 siblings, 2 replies; 8+ messages in thread
From: Fengguang Wu @ 2012-06-12 11:21 UTC
  To: Wanpeng Li; +Cc: linux-kernel, Gavin Shan, Wanpeng Li

On Tue, Jun 12, 2012 at 06:26:43PM +0800, Wanpeng Li wrote:
> From: Wanpeng Li <liwp@linux.vnet.ibm.com>

That email address is no longer in use?

> Since bdi->wb.list_lock is used to protect the b_* lists,
> so the flushers who call wb_writeback to writeback pages will
> stuck when bandwidth update policy holds this lock. In order
> to avoid this race we can introduce a new bandwidth_lock who
> is responsible for protecting bandwidth update policy.

This looks good to me. wb.list_lock could be contended and it's better
for bdi_update_bandwidth() to use a standalone and hardly contended
lock.

btw, with this change, the dirty_lock in global_update_bandwidth() can
be eliminated.

> Signed-off-by: Wanpeng Li <liswp@linux.vnet.ibm.com>
> ---
>  mm/page-writeback.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index c8945e0..b3b08fb 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -1032,12 +1032,14 @@ static void bdi_update_bandwidth(struct backing_dev_info *bdi,
>  				 unsigned long bdi_dirty,
>  				 unsigned long start_time)
>  {
> +	static DEFINE_SPINLOCK(bandwidth_lock);
> +
>  	if (time_is_after_eq_jiffies(bdi->bw_time_stamp + BANDWIDTH_INTERVAL))
>  		return;
> -	spin_lock(&bdi->wb.list_lock);
> +	spin_lock(&bandwidth_lock);
>  	__bdi_update_bandwidth(bdi, thresh, bg_thresh, dirty,
>  			       bdi_thresh, bdi_dirty, start_time);
> -	spin_unlock(&bdi->wb.list_lock);
> +	spin_unlock(&bandwidth_lock);
>  }
>  
>  /*
> -- 
> 1.7.9.5


* Re: [PATCH] writeback: avoid race when update bandwidth
  2012-06-12 11:21 ` Fengguang Wu
@ 2012-06-12 11:29   ` Wanpeng Li
  2012-06-12 11:33     ` Fengguang Wu
  2012-06-13  3:56   ` Dave Chinner
  1 sibling, 1 reply; 8+ messages in thread
From: Wanpeng Li @ 2012-06-12 11:29 UTC
  To: Fengguang Wu; +Cc: linux-kernel, Gavin Shan, Wanpeng Li

On Tue, Jun 12, 2012 at 07:21:29PM +0800, Fengguang Wu wrote:
>On Tue, Jun 12, 2012 at 06:26:43PM +0800, Wanpeng Li wrote:
>> From: Wanpeng Li <liwp@linux.vnet.ibm.com>
>
>That email address is no longer in use?

No, :), but you had better use "Wanpeng Li <liwp@linux.vnet.ibm.com>" to
commit my patch, because this email address will be available again next
month. It will also help my colleagues count my patches for this year.
Thanks a lot.
>
>> Since bdi->wb.list_lock is used to protect the b_* lists,
>> so the flushers who call wb_writeback to writeback pages will
>> stuck when bandwidth update policy holds this lock. In order
>> to avoid this race we can introduce a new bandwidth_lock who
>> is responsible for protecting bandwidth update policy.
>
>This looks good to me. wb.list_lock could be contended and it's better
>for bdi_update_bandwidth() to use a standalone and hardly contended
>lock.
>
>btw, with this change, the dirty_lock in global_update_bandwidth() can
>be eliminated.

Ok, I will resend the patch.

Regards,
Wanpeng Li
>
>> Signed-off-by: Wanpeng Li <liswp@linux.vnet.ibm.com>
>> ---
>>  mm/page-writeback.c |    6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>> 
>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>> index c8945e0..b3b08fb 100644
>> --- a/mm/page-writeback.c
>> +++ b/mm/page-writeback.c
>> @@ -1032,12 +1032,14 @@ static void bdi_update_bandwidth(struct backing_dev_info *bdi,
>>  				 unsigned long bdi_dirty,
>>  				 unsigned long start_time)
>>  {
>> +	static DEFINE_SPINLOCK(bandwidth_lock);
>> +
>>  	if (time_is_after_eq_jiffies(bdi->bw_time_stamp + BANDWIDTH_INTERVAL))
>>  		return;
>> -	spin_lock(&bdi->wb.list_lock);
>> +	spin_lock(&bandwidth_lock);
>>  	__bdi_update_bandwidth(bdi, thresh, bg_thresh, dirty,
>>  			       bdi_thresh, bdi_dirty, start_time);
>> -	spin_unlock(&bdi->wb.list_lock);
>> +	spin_unlock(&bandwidth_lock);
>>  }
>>  
>>  /*
>> -- 
>> 1.7.9.5


* Re: [PATCH] writeback: avoid race when update bandwidth
  2012-06-12 11:29   ` Wanpeng Li
@ 2012-06-12 11:33     ` Fengguang Wu
  0 siblings, 0 replies; 8+ messages in thread
From: Fengguang Wu @ 2012-06-12 11:33 UTC
  To: Wanpeng Li; +Cc: linux-kernel, Gavin Shan

On Tue, Jun 12, 2012 at 07:29:49PM +0800, Wanpeng Li wrote:
> On Tue, Jun 12, 2012 at 07:21:29PM +0800, Fengguang Wu wrote:
> >On Tue, Jun 12, 2012 at 06:26:43PM +0800, Wanpeng Li wrote:
> >> From: Wanpeng Li <liwp@linux.vnet.ibm.com>
> >
> >That email address is no longer in use?
> 
> No, :), you better use "Wanpeng Li <liwp@linux.vnet.ibm.com>" to commit
> my patch. Because next month this email address will be available again,
> and this also can help my colleagues to add my patches count this year,
> thanks a lot.

Then at least don't add liwp@linux.vnet.ibm.com in the email CC list.
It's annoying to receive error notifications.

Thanks,
Fengguang


* Re: [PATCH] writeback: avoid race when update bandwidth
  2012-06-12 11:21 ` Fengguang Wu
  2012-06-12 11:29   ` Wanpeng Li
@ 2012-06-13  3:56   ` Dave Chinner
  2012-06-13  4:21     ` Fengguang Wu
  1 sibling, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2012-06-13  3:56 UTC
  To: Fengguang Wu; +Cc: Wanpeng Li, linux-kernel, Gavin Shan, Wanpeng Li

On Tue, Jun 12, 2012 at 07:21:29PM +0800, Fengguang Wu wrote:
> On Tue, Jun 12, 2012 at 06:26:43PM +0800, Wanpeng Li wrote:
> > From: Wanpeng Li <liwp@linux.vnet.ibm.com>
> 
> That email address is no longer in use?
> 
> > Since bdi->wb.list_lock is used to protect the b_* lists,
> > so the flushers who call wb_writeback to writeback pages will
> > stuck when bandwidth update policy holds this lock. In order
> > to avoid this race we can introduce a new bandwidth_lock who
> > is responsible for protecting bandwidth update policy.

This is not a race condition - it is a lock contention condition.


> This looks good to me. wb.list_lock could be contended and it's better
> for bdi_update_bandwidth() to use a standalone and hardly contended
> lock.

I'm not sure it will be "hardly contended". That's a global lock, so
now we'll end up with updates on different bdis contending and it's
not uncommon to see a couple of thousand processes on large machines
beating on balance_dirty_pages().  Putting a global scope lock
around such a function doesn't seem like a good solution to me.

Oh, and if you want to remove the dirty_lock from
global_update_limit(), then replacing the lock with a cmpxchg loop
will do it just fine....
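
For illustration only (this is not code from the thread), a cmpxchg
loop of the kind mentioned above could look roughly like the userspace
sketch below. It uses C11 atomics rather than the kernel's cmpxchg(),
and claim_update(), UPDATE_INTERVAL and the tick values are all
hypothetical names chosen for the example:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define UPDATE_INTERVAL 200L /* hypothetical: ticks between updates */

/*
 * Hypothetical helper: try to claim the update for the current interval.
 * Among callers racing on the same time stamp, at most one gets true;
 * the others back off without ever taking a lock.
 */
static bool claim_update(_Atomic unsigned long *stamp, unsigned long now)
{
	unsigned long old;

	assert(stamp != NULL);
	old = atomic_load(stamp);

	do {
		/* Still inside the interval: someone already did the update. */
		if ((long)(now - old) <= UPDATE_INTERVAL)
			return false;
		/* A failed exchange reloads 'old', so the check above reruns. */
	} while (!atomic_compare_exchange_weak(stamp, &old, now));

	return true;
}
```

The winner of the exchange would then perform the update itself; how
the updated value is published to readers is left out of this sketch.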

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] writeback: avoid race when update bandwidth
  2012-06-13  3:56   ` Dave Chinner
@ 2012-06-13  4:21     ` Fengguang Wu
  2012-06-14  1:36       ` Dave Chinner
  0 siblings, 1 reply; 8+ messages in thread
From: Fengguang Wu @ 2012-06-13  4:21 UTC
  To: Dave Chinner; +Cc: Wanpeng Li, linux-kernel, Gavin Shan, Wanpeng Li

On Wed, Jun 13, 2012 at 01:56:47PM +1000, Dave Chinner wrote:
> On Tue, Jun 12, 2012 at 07:21:29PM +0800, Fengguang Wu wrote:
> > On Tue, Jun 12, 2012 at 06:26:43PM +0800, Wanpeng Li wrote:
> > > From: Wanpeng Li <liwp@linux.vnet.ibm.com>
> > 
> > That email address is no longer in use?
> > 
> > > Since bdi->wb.list_lock is used to protect the b_* lists,
> > > so the flushers who call wb_writeback to writeback pages will
> > > stuck when bandwidth update policy holds this lock. In order
> > > to avoid this race we can introduce a new bandwidth_lock who
> > > is responsible for protecting bandwidth update policy.
> 
> This is not a race condition - it is a lock contention condition.

Nod.

> > This looks good to me. wb.list_lock could be contended and it's better
> > for bdi_update_bandwidth() to use a standalone and hardly contended
> > lock.
> 
> I'm not sure it will be "hardly contended". That's a global lock, so
> now we'll end up with updates on different bdis contending and it's
> not uncommon to see a couple of thousand processes on large machines
> beating on balance_dirty_pages().  Putting a global scope lock
> around such a function doesn't seem like a good solution to me.

What matters is the number of bdis rather than the number of processes,
because there is a per-bdi 200ms ratelimit here:

bdi_update_bandwidth():

        if (time_is_after_eq_jiffies(bdi->bw_time_stamp + BANDWIDTH_INTERVAL))
                return;
        // lock it

So a global lock should be enough when there are only dozens of disks.

However, the global bandwidth_lock will probably become a problem once
there are hundreds of disks. If there are (or will be) such setups,
I'm fine with reverting to the old per-bdi locking.
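
As a rough userspace sketch of the ratelimit check described above
(not the kernel code itself), the early return boils down to a
wraparound-safe comparison on a free-running tick counter, in the
style of the kernel's time_after_eq(). The names tick_after_eq(),
ratelimited() and INTERVAL_TICKS are hypothetical:

```c
#include <stdbool.h>

#define INTERVAL_TICKS 200UL /* hypothetical: one tick stands in for 1ms */

/* Wraparound-safe "a is at or after b" on a free-running tick counter. */
static bool tick_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/*
 * True while the caller is still inside the interval that started at
 * 'stamp' and should return early without touching any lock.
 */
static bool ratelimited(unsigned long stamp, unsigned long now)
{
	return tick_after_eq(stamp + INTERVAL_TICKS, now);
}
```

Because the comparison is done on the signed difference, the check
keeps working when the counter wraps around.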

> Oh, and if you want to remove the dirty_lock from
> global_update_limit(), then replacing the lock with a cmpxchg loop
> will do it just fine....

Yes. But to be frank, I don't care about that dirty_lock at all,
because it has its own 200ms rate limiting :-)

Thanks,
Fengguang


* Re: [PATCH] writeback: avoid race when update bandwidth
  2012-06-13  4:21     ` Fengguang Wu
@ 2012-06-14  1:36       ` Dave Chinner
  2012-06-14 13:48         ` Fengguang Wu
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2012-06-14  1:36 UTC
  To: Fengguang Wu; +Cc: Wanpeng Li, linux-kernel, Gavin Shan, Wanpeng Li

On Wed, Jun 13, 2012 at 12:21:15PM +0800, Fengguang Wu wrote:
> On Wed, Jun 13, 2012 at 01:56:47PM +1000, Dave Chinner wrote:
> > On Tue, Jun 12, 2012 at 07:21:29PM +0800, Fengguang Wu wrote:
> > > On Tue, Jun 12, 2012 at 06:26:43PM +0800, Wanpeng Li wrote:
> > > > From: Wanpeng Li <liwp@linux.vnet.ibm.com>
> > > 
> > > That email address is no longer in use?
> > > 
> > > > Since bdi->wb.list_lock is used to protect the b_* lists,
> > > > so the flushers who call wb_writeback to writeback pages will
> > > > stuck when bandwidth update policy holds this lock. In order
> > > > to avoid this race we can introduce a new bandwidth_lock who
> > > > is responsible for protecting bandwidth update policy.
> > 
> > This is not a race condition - it is a lock contention condition.
> 
> Nod.
> 
> > > This looks good to me. wb.list_lock could be contended and it's better
> > > for bdi_update_bandwidth() to use a standalone and hardly contended
> > > lock.
> > 
> > I'm not sure it will be "hardly contended". That's a global lock, so
> > now we'll end up with updates on different bdis contending and it's
> > not uncommon to see a couple of thousand processes on large machines
> > beating on balance_dirty_pages().  Putting a global scope lock
> > around such a function doesn't seem like a good solution to me.
> 
> It's more about the number of bdi's than the number of processes that matters.
> Because here is a per-bdi 200ms ratelimit:
> 
> bdi_update_bandwidth():
> 
>        if (time_is_after_eq_jiffies(bdi->bw_time_stamp + BANDWIDTH_INTERVAL))
>                 return;         
>        // lock it

So now you get a thousand processes on a thousand CPUs all hitting that
case at the same time because they are all writing to disk at the
same time, all nicely synchronised by MPI. Lock contention ahoy!

> So a global should be enough when there are only dozens of disks.

Only needs one bdi, just with lots of processes trying to hit it at
the same time such that they all pass the time after check.

> However, the global bandwidth_lock will probably become a problem when
> there comes hundreds of disks. If there are (or will be) such setups,
> I'm fine to revert to the old per-bdi locking.

There are setups with hundreds of disks. They also tend to
have hundreds of CPUs, too....

> > Oh, and if you want to remove the dirty_lock from
> > global_update_limit(), then replacing the lock with a cmpxchg loop
> > will do it just fine....
> 
> Yes. But to be frank, I don't care about that dirty_lock at all,
> because it has its own 200ms rate limiting :-)

That has the same problem, only it's currently nested inside another
lock which isolates it from contention.  This is why measurement is
important - until there is evidence showing that the lock contention
is a problem, don't change it, because such a change generally has an
unpredictable cascading effect that often results in worse contention
than was there originally....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] writeback: avoid race when update bandwidth
  2012-06-14  1:36       ` Dave Chinner
@ 2012-06-14 13:48         ` Fengguang Wu
  0 siblings, 0 replies; 8+ messages in thread
From: Fengguang Wu @ 2012-06-14 13:48 UTC
  To: Dave Chinner; +Cc: Wanpeng Li, linux-kernel, Gavin Shan, Wanpeng Li

On Thu, Jun 14, 2012 at 11:36:45AM +1000, Dave Chinner wrote:
> On Wed, Jun 13, 2012 at 12:21:15PM +0800, Fengguang Wu wrote:
> > On Wed, Jun 13, 2012 at 01:56:47PM +1000, Dave Chinner wrote:
> > > On Tue, Jun 12, 2012 at 07:21:29PM +0800, Fengguang Wu wrote:
> > > > On Tue, Jun 12, 2012 at 06:26:43PM +0800, Wanpeng Li wrote:
> > > > > From: Wanpeng Li <liwp@linux.vnet.ibm.com>
> > > > 
> > > > That email address is no longer in use?
> > > > 
> > > > > Since bdi->wb.list_lock is used to protect the b_* lists,
> > > > > so the flushers who call wb_writeback to writeback pages will
> > > > > stuck when bandwidth update policy holds this lock. In order
> > > > > to avoid this race we can introduce a new bandwidth_lock who
> > > > > is responsible for protecting bandwidth update policy.
> > > 
> > > This is not a race condition - it is a lock contention condition.
> > 
> > Nod.
> > 
> > > > This looks good to me. wb.list_lock could be contended and it's better
> > > > for bdi_update_bandwidth() to use a standalone and hardly contended
> > > > lock.
> > > 
> > > I'm not sure it will be "hardly contended". That's a global lock, so
> > > now we'll end up with updates on different bdis contending and it's
> > > not uncommon to see a couple of thousand processes on large machines
> > > beating on balance_dirty_pages().  Putting a global scope lock
> > > around such a function doesn't seem like a good solution to me.
> > 
> > It's more about the number of bdi's than the number of processes that matters.
> > Because here is a per-bdi 200ms ratelimit:
> > 
> > bdi_update_bandwidth():
> > 
> >        if (time_is_after_eq_jiffies(bdi->bw_time_stamp + BANDWIDTH_INTERVAL))
> >                 return;         
> >        // lock it
> 
> So now you get a thousand processes on a thousand CPUs all hit that
> case at the same time because they are all writing to disk at the
> same time, all nicely synchronised by MPI. Lock contention ahoy!

Yeah, the cost does increase fast with number of CPUs...

> > So a global should be enough when there are only dozens of disks.
> 
> Only needs one bdi, just with lots of processes trying to hit it at
> the same time such that they all pass the time after check.

It's more related to the number of CPUs: once task A updates
bdi->bw_time_stamp, the other tasks B, C, D, ... will see the updated
value and will all back off for the next 200ms period.
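
A minimal userspace sketch of that back-off behaviour, assuming the
time stamp is checked again once the lock is held: tasks that were
already waiting on the lock then see the new stamp and drop out
without repeating the update. struct dev_stats, dev_update() and
INTERVAL_TICKS are hypothetical, and a C11 atomic_flag stands in for
the per-bdi spinlock:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define INTERVAL_TICKS 200L /* hypothetical: one tick stands in for 1ms */

struct dev_stats {                   /* hypothetical per-device state */
	atomic_flag lock;            /* per-device spinlock */
	_Atomic unsigned long stamp; /* tick of the last update */
	unsigned long updates;       /* updates done, protected by lock */
};

/* Wraparound-safe check that more than one interval has passed. */
static bool interval_elapsed(unsigned long stamp, unsigned long now)
{
	return (long)(now - stamp) > INTERVAL_TICKS;
}

/* Returns true if this caller performed the update. */
static bool dev_update(struct dev_stats *d, unsigned long now)
{
	bool done = false;

	assert(d != NULL);

	/* Lockless fast path: most callers leave here. */
	if (!interval_elapsed(atomic_load(&d->stamp), now))
		return false;

	while (atomic_flag_test_and_set_explicit(&d->lock, memory_order_acquire))
		; /* spin until the lock is free */

	/* Re-check: a task that held the lock first may have updated already. */
	if (interval_elapsed(atomic_load(&d->stamp), now)) {
		d->updates++;
		atomic_store(&d->stamp, now);
		done = true;
	}

	atomic_flag_clear_explicit(&d->lock, memory_order_release);
	return done;
}
```

The tasks that pass the first check at the same moment still queue on
the lock once, which is the cost that grows with the number of CPUs.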

> > However, the global bandwidth_lock will probably become a problem when
> > there comes hundreds of disks. If there are (or will be) such setups,
> > I'm fine to revert to the old per-bdi locking.
> 
> There are setups with hundreds of disks. They also tend to
> have hundreds of CPUs, too....

OK.. I'll drop the change.

> > > Oh, and if you want to remove the dirty_lock from
> > > global_update_limit(), then replacing the lock with a cmpxchg loop
> > > will do it just fine....
> > 
> > Yes. But to be frank, I don't care about that dirty_lock at all,
> > because it has its own 200ms rate limiting :-)
> 
> That has the same problem, only it's currently nested inside another
> lock which isolates it from contention.  This is why measurement is
> important - until there is that evidence shows that the lock
> contention is a problem, don't change it because it generally has a
> unpredictable cascading effect that often results in worse
> contention that was there originally....

You are right, it's a good attitude to avoid "might be better" changes
for some "suspected problem".

Thanks,
Fengguang


end of thread, other threads:[~2012-06-14 13:48 UTC | newest]

Thread overview: 8+ messages
2012-06-12 10:26 [PATCH] writeback: avoid race when update bandwidth Wanpeng Li
2012-06-12 11:21 ` Fengguang Wu
2012-06-12 11:29   ` Wanpeng Li
2012-06-12 11:33     ` Fengguang Wu
2012-06-13  3:56   ` Dave Chinner
2012-06-13  4:21     ` Fengguang Wu
2012-06-14  1:36       ` Dave Chinner
2012-06-14 13:48         ` Fengguang Wu
