* [RFC PATCH] mm: let the bdi_writeout fraction respond more quickly
@ 2010-06-14 13:58 Richard Kennedy
  2010-06-14 14:44   ` Richard Kennedy
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Kennedy @ 2010-06-14 13:58 UTC (permalink / raw)
  To: Jens Axboe, Peter Zijlstra, Andrew Morton, Wu Fengguang; +Cc: lkml, linux-mm


Hi all,
The fraction of vm cache allowed to each BDI, as calculated by
get_dirty_limits (mm/page-writeback.c), responds very slowly to changes in
workload.

Running a simple test that alternately writes 1GB to sda then sdb,
twice, shows the bdi_threshold taking approximately 15 seconds to reach
a steady-state value. This prevents an application from using all of the
available cache and forces it to write to the physical disk earlier than
strictly necessary.
As you can see from the attached graph, bdi_thresh_before.png, our
current control system responds to this kind of workload very slowly.

The patch below speeds up the recalculation and lets it reach a
steady-state value in a couple of seconds. See bdi_thresh_after.png.

I get better throughput with this patch applied and have been running
some variation of this on and off for some months without any obvious
problems.

(These tests were all run on 2.6.35-rc3,
where dm-2 is a SATA drive (LVM/ext4) and sdb is an IDE drive (ext4).
I've got lots more results and graphs but won't bore you all with
them ;) )
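
For anyone wanting to approximate the alternating-write test described
above, a minimal user-space sketch might look like the following. It is
not the program used to produce the attached graphs: the function names
timed_write and alternate_writes, the 1 MiB buffer and any file paths
passed in are hypothetical, and it only times buffered writes. It does
not sample bdi_threshold itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/*
 * Write `total` zero bytes to `path` through the page cache (no fsync,
 * no O_DIRECT) and return the elapsed wall-clock seconds, or -1.0 if
 * the file cannot be opened or a write fails.
 */
static double timed_write(const char *path, size_t total)
{
	static char buf[1 << 20];	/* 1 MiB of zeros, arbitrary size */
	struct timespec t0, t1;
	size_t left = total;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -1.0;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	while (left > 0) {
		size_t chunk = left < sizeof(buf) ? left : sizeof(buf);
		ssize_t n = write(fd, buf, chunk);

		if (n <= 0) {
			close(fd);
			return -1.0;
		}
		left -= (size_t)n;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	close(fd);

	return (double)(t1.tv_sec - t0.tv_sec) +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/*
 * Write `total` bytes to `path_a`, then to `path_b`, and repeat that
 * `rounds` times, printing the time taken by each write.
 * Returns 0 on success, -1 if any write fails.
 */
static int alternate_writes(const char *path_a, const char *path_b,
			    size_t total, int rounds)
{
	for (int i = 0; i < rounds; i++) {
		double ta = timed_write(path_a, total);
		double tb = timed_write(path_b, total);

		if (ta < 0 || tb < 0)
			return -1;
		printf("round %d: %s %.3fs, %s %.3fs\n",
		       i + 1, path_a, ta, path_b, tb);
	}
	return 0;
}
```

With the two paths on different disks, a call such as
alternate_writes(path_on_first_disk, path_on_second_disk, 1UL << 30, 2)
would write 1 GiB to each disk in turn, twice. Because the writes are
buffered, the timings reflect how soon the writer gets throttled rather
than raw disk speed, which is the effect this patch is meant to change.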

I see this as a considerable improvement, but I found the magic
number of -4 empirically, so it may just be tuned to my system. I'm not
sure how to decide on a value that is suitable for everyone.

Does anyone have any suggestions or thoughts?

Unfortunately I don't have any other hardware to try this on, so I would
be very interested to hear if anyone tries this on their favourite
workload.

regards
Richard
 
patch against 2.6.35-rc3

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 2fdda90..315dd04 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -144,7 +144,7 @@ static int calc_period_shift(void)
 	else
 		dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) /
 				100;
-	return 2 + ilog2(dirty_total - 1);
+	return ilog2(dirty_total - 1) - 4;
 }
 
 /*


[-- Attachment #2: bdi_thresh_before.png --]
[-- Type: image/png, Size: 4098 bytes --]

[-- Attachment #3: bdi_thresh_after.png --]
[-- Type: image/png, Size: 2398 bytes --]


* Re: [RFC PATCH] mm: let the bdi_writeout fraction respond more quickly
  2010-06-14 13:58 [RFC PATCH] mm: let the bdi_writeout fraction respond more quickly Richard Kennedy
@ 2010-06-14 14:44   ` Richard Kennedy
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Kennedy @ 2010-06-14 14:44 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Peter Zijlstra, Andrew Morton, Wu Fengguang, lkml, linux-mm

On Mon, 2010-06-14 at 14:58 +0100, Richard Kennedy wrote:
> Hi all,
> The fraction of vm cache allowed to each BDI as calculated by
> get_dirty_limits (mm/page-writeback.c) responds very slowly to changes in
> workload.
> 
> Running a simple test that alternately writes 1Gb to sda then sdb,
> twice, shows the bdi_threshold taking approximately 15 seconds to reach
> a steady state value. This prevents an application from using all of the
> available cache and forces it to write to the physical disk earlier than
> strictly necessary.  
> As you can see from the attached graph, bdi_thresh_before.png, our
> current control system responds to this kind of workload very slowly.
> 
> The below patch speeds up the recalculation and lets it reach a steady
> state value in a couple of seconds. See bdi_thresh_after.png.
> 
> I get better throughput with this patch applied and have been running
> some variation of this on and off for some months without any obvious
> problems.
> 
> (These tests were all run on 2.6.35-rc3,
> where dm-2 is a sata drive lvm/ext4 and sdb is ide ext4.
> I've got lots more results and graphs but won't bore you all with
> them ;) )
> 
> I see this as a considerable improvement but I have found the magic
> number of -4 empirically so it may just be tuned to my system. I'm not
> sure how to decide on a value that is suitable for everyone. 
> 
> Does anyone have any suggestions or thoughts?
> 
> Unfortunately I don't have any other hardware to try this on, so I would
> be very interested to hear if anyone tries this on their favourite
> workload.
> 
> regards
> Richard
>  
> patch against 2.6.35-rc3
> 
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 2fdda90..315dd04 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -144,7 +144,7 @@ static int calc_period_shift(void)
>  	else
>  		dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) /
>  				100;
> -	return 2 + ilog2(dirty_total - 1);
> +	return ilog2(dirty_total - 1) - 4;
>  }
>  
>  /*
> 
Fixed Jens' email address. I can send you the graphs privately if you
haven't already got them.

regards
Richard




* Re: [RFC PATCH] mm: let the bdi_writeout fraction respond more quickly
  2010-06-14 14:44   ` Richard Kennedy
@ 2010-06-16 18:54     ` Peter Zijlstra
  -1 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2010-06-16 18:54 UTC (permalink / raw)
  To: Richard Kennedy; +Cc: Jens Axboe, Andrew Morton, Wu Fengguang, lkml, linux-mm

On Mon, 2010-06-14 at 15:44 +0100, Richard Kennedy wrote:
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index 2fdda90..315dd04 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -144,7 +144,7 @@ static int calc_period_shift(void)
> >       else
> >               dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) /
> >                               100;
> > -     return 2 + ilog2(dirty_total - 1);
> > +     return ilog2(dirty_total - 1) - 4;
> >  } 

IIRC I suggested similar things in the past and all we needed to do was
find people doing the measurements on different bits of hardware or so..

I don't have any problems with the approach, all we need to make sure is
that we never return 0 or a negative number (possibly ensure a minimum
positive shift value).




* Re: [RFC PATCH] mm: let the bdi_writeout fraction respond more quickly
  2010-06-16 18:54     ` Peter Zijlstra
@ 2010-06-17 11:39       ` Richard Kennedy
  -1 siblings, 0 replies; 11+ messages in thread
From: Richard Kennedy @ 2010-06-17 11:39 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Jens Axboe, Andrew Morton, Wu Fengguang, lkml, linux-mm

On Wed, 2010-06-16 at 20:54 +0200, Peter Zijlstra wrote:
> On Mon, 2010-06-14 at 15:44 +0100, Richard Kennedy wrote:
> > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > > index 2fdda90..315dd04 100644
> > > --- a/mm/page-writeback.c
> > > +++ b/mm/page-writeback.c
> > > @@ -144,7 +144,7 @@ static int calc_period_shift(void)
> > >       else
> > >               dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) /
> > >                               100;
> > > -     return 2 + ilog2(dirty_total - 1);
> > > +     return ilog2(dirty_total - 1) - 4;
> > >  } 
> 
> IIRC I suggested similar things in the past and all we needed to do was
> find people doing the measurements on different bits of hardware or so..
> 
> I don't have any problems with the approach, all we need to make sure is
> that we never return 0 or a negative number (possibly ensure a minimum
> positive shift value).

Yep, that sounds reasonable. Would a minimum shift of 4 be OK?

something like

	max(ilog2(dirty_total - 1) - 4, 4);
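
As a rough user-space illustration of that clamp (a sketch only, not the
kernel code): ilog2_sketch is a stand-in for the kernel's ilog2(), and
MIN_PERIOD_SHIFT and clamped_period_shift are hypothetical names that do
not appear in the patch.

```c
/* Hypothetical name for the suggested minimum shift of 4. */
#define MIN_PERIOD_SHIFT 4

/*
 * Floor of log2(v) for v > 0, standing in for the kernel's ilog2().
 * This helper returns -1 for v == 0.
 */
static int ilog2_sketch(unsigned long v)
{
	int r = -1;

	while (v) {
		v >>= 1;
		r++;
	}
	return r;
}

/*
 * The patched calculation, clamped so the result never drops below
 * MIN_PERIOD_SHIFT. dirty_total is the dirty limit in pages and is
 * assumed to be at least 1.
 */
static int clamped_period_shift(unsigned long dirty_total)
{
	int shift = ilog2_sketch(dirty_total - 1) - 4;

	return shift > MIN_PERIOD_SHIFT ? shift : MIN_PERIOD_SHIFT;
}
```

The clamp only matters for very small dirty limits: any dirty_total of
256 pages or fewer gives an unclamped value of 3 or less, which the
floor raises to 4.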

Unfortunately volunteers don't seem to be leaping out of the woodwork;
maybe Andrew could be persuaded to try this in his tree for a while and
see if anyone squeaks?

regards
Richard



* Re: [RFC PATCH] mm: let the bdi_writeout fraction respond more  quickly
  2010-06-17 11:39       ` Richard Kennedy
@ 2010-06-17 11:41         ` Jens Axboe
  -1 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2010-06-17 11:41 UTC (permalink / raw)
  To: Richard Kennedy
  Cc: Peter Zijlstra, Andrew Morton, Wu Fengguang, lkml, linux-mm

On 2010-06-17 13:39, Richard Kennedy wrote:
> On Wed, 2010-06-16 at 20:54 +0200, Peter Zijlstra wrote:
>> On Mon, 2010-06-14 at 15:44 +0100, Richard Kennedy wrote:
>>>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>>>> index 2fdda90..315dd04 100644
>>>> --- a/mm/page-writeback.c
>>>> +++ b/mm/page-writeback.c
>>>> @@ -144,7 +144,7 @@ static int calc_period_shift(void)
>>>>       else
>>>>               dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) /
>>>>                               100;
>>>> -     return 2 + ilog2(dirty_total - 1);
>>>> +     return ilog2(dirty_total - 1) - 4;
>>>>  } 
>>
>> IIRC I suggested similar things in the past and all we needed to do was
>> find people doing the measurements on different bits of hardware or so..
>>
>> I don't have any problems with the approach, all we need to make sure is
>> that we never return 0 or a negative number (possibly ensure a minimum
>> positive shift value).
> 
> Yep that sounds reasonable. would minimum shift of 4 be ok ?
> 
> something like
> 
> 	max ( (ilog2(dirty_total - 1)- 4) , 4);
> 
> Unfortunately volunteers don't seem to be leaping out of the woodwork,
> maybe Andrew could be persuaded to try this in his tree for a while and
> see if any one squeaks ?

I'm pretty sure that most volunteers are curious what to actually test,
so they shy away from it. If you added a good explanation of an easy way
to test the before and after, then it would be more approachable.

I'll give it a spin here.

-- 
Jens Axboe



* Re: [RFC PATCH] mm: let the bdi_writeout fraction respond more  quickly
  2010-06-17 11:41         ` Jens Axboe
@ 2010-06-17 18:45           ` Richard Kennedy
  -1 siblings, 0 replies; 11+ messages in thread
From: Richard Kennedy @ 2010-06-17 18:45 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Peter Zijlstra, Andrew Morton, Wu Fengguang, lkml, linux-mm

On Thu, 2010-06-17 at 13:41 +0200, Jens Axboe wrote:
> On 2010-06-17 13:39, Richard Kennedy wrote:
> > On Wed, 2010-06-16 at 20:54 +0200, Peter Zijlstra wrote:
> >> On Mon, 2010-06-14 at 15:44 +0100, Richard Kennedy wrote:
> >>>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> >>>> index 2fdda90..315dd04 100644
> >>>> --- a/mm/page-writeback.c
> >>>> +++ b/mm/page-writeback.c
> >>>> @@ -144,7 +144,7 @@ static int calc_period_shift(void)
> >>>>       else
> >>>>               dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) /
> >>>>                               100;
> >>>> -     return 2 + ilog2(dirty_total - 1);
> >>>> +     return ilog2(dirty_total - 1) - 4;
> >>>>  } 
> >>
> >> IIRC I suggested similar things in the past and all we needed to do was
> >> find people doing the measurements on different bits of hardware or so..
> >>
> >> I don't have any problems with the approach, all we need to make sure is
> >> that we never return 0 or a negative number (possibly ensure a minimum
> >> positive shift value).
> > 
> > Yep that sounds reasonable. would minimum shift of 4 be ok ?
> > 
> > something like
> > 
> > 	max ( (ilog2(dirty_total - 1)- 4) , 4);
> > 
> > Unfortunately volunteers don't seem to be leaping out of the woodwork,
> > maybe Andrew could be persuaded to try this in his tree for a while and
> > see if any one squeaks ?
> 
> I'm pretty sure that most volunteers are curious what to actually test,
> so they shy away from it. If you added a good explanation of an easy way
> to test the before and after, then it would be more approachable.
> 
> I'll give it a spin here.
> 

Ah, sorry, but I thought what it did was obvious ;)

Finding a test that shows a difference isn't going to be that easy. The
patch has no effect on writes to a single bdi; it only affects workloads
writing to two (or more) disks.

calc_period_shift() controls the speed at which the bdi dirty threshold
gets updated, which in turn controls how much of the vm_dirty cache a bdi
can use.
The first graph shows that it is currently rather slow to react to
change: when you switch the writes from sda to sdb, the threshold
doesn't react quickly enough, so sdb isn't allowed to use its fair
share of the cache and is forced to write to the spinning disk sooner.
It is therefore slower overall. But the speed difference is highly
dependent on the size of the write vs. the size of the cache and the
speed of the disk vs. the speed of writing to memory.
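
To put rough numbers on the change: for any given dirty_total the patch
lowers the value returned by calc_period_shift() by 6. If the length of
the averaging period scales as a power of two of that shift (an
assumption here, not something spelled out in this thread), the period
becomes 2^6 = 64 times shorter. The sketch below just evaluates the two
expressions from the diff in user space; floor_log2, shift_before,
shift_after and show_shifts are hypothetical names.

```c
#include <stdio.h>

/* Floor of log2(v) for v > 0, a stand-in for the kernel's ilog2(). */
static int floor_log2(unsigned long v)
{
	int r = -1;

	while (v) {
		v >>= 1;
		r++;
	}
	return r;
}

/* Expression on the removed ("-") line of the diff. */
static int shift_before(unsigned long dirty_total)
{
	return 2 + floor_log2(dirty_total - 1);
}

/* Expression on the added ("+") line of the diff. */
static int shift_after(unsigned long dirty_total)
{
	return floor_log2(dirty_total - 1) - 4;
}

/* Print both shifts for a dirty limit given in pages. */
static void show_shifts(unsigned long dirty_total)
{
	printf("dirty_total=%lu pages: shift %d -> %d\n",
	       dirty_total, shift_before(dirty_total),
	       shift_after(dirty_total));
}
```

For an illustrative dirty_total of 100000 pages, show_shifts() reports a
shift of 18 before the patch and 12 after it.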

The tests I run here write a large file to one disk and then, after a
small delay, start a small write to the second disk, but it's not easy to
get repeatable results from them.

I don't have a simple test, but the patch will improve the fairness of
the vm_dirty cache sharing. I had in mind the sort of server workloads
where some disks are dedicated to particular applications and others to
general use. There may also be some desktop improvements but they are
difficult to pin down.  

I'm sorry I wasn't clearer before and hope this has explained what I've
been trying to do.

regards
Richard

