All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] mm: swap: Use swapfiles in priority order
@ 2014-02-13 16:27 Christian Ehrhardt
  0 siblings, 0 replies; 13+ messages in thread
From: Christian Ehrhardt @ 2014-02-13 16:27 UTC (permalink / raw)
  To: linux-mm, mgorman

Hi,
first of all I hope this patching in of the Mesage-ID works for threaded 
views :-)

On 13/02/14 10:42:31, Mel Gorman wrote:
 >
 >
 > [prev in list] [next in list] [prev in thread] [next in thread]
 >
 > List:       linux-mm
 > Subject:    [PATCH] mm: swap: Use swapfiles in priority order
 > From:       Mel Gorman <mgorman () suse ! de>
 > Date:       2014-02-13 10:42:31
 > Message-ID: 20140213104231.GX6732 () suse ! de
 > [Download message RAW]
 >
 > According to the swapon documentation
 >
 > 	Swap  pages  are  allocated  from  areas  in priority order,
 > 	highest priority first.  For areas with different priorities, a
 > 	higher-priority area is exhausted before using a lower-priority area.
 >
 > A user reported

That was me and I can confirm that for all my setup were we encountered 
the issue is fixed with the new patch.

On top of that it also fixed a long running issue that swap gets slower 
the more swap targets you have - which was formerly discussed in detail 
here http://www.spinics.net/lists/linux-mm/msg68624.html

 > that the reality is different. When multiple swap files
 > are enabled and a memory consumer started, the swap files are consumed in
 > pairs after the highest priority file is exhausted. Early in the lifetime
 > of the test, swapfile consumptions looks like
 >
[...]
 >
 > Signed-off-by: Mel Gorman <mgorman@suse.de>
 > ---
 >  mm/swapfile.c | 2 +-
 >  1 file changed, 1 insertion(+), 1 deletion(-)
 >
 > diff --git a/mm/swapfile.c b/mm/swapfile.c
 > index 4a7f7e6..6d0ac2b 100644
 > --- a/mm/swapfile.c
 > +++ b/mm/swapfile.c
 > @@ -651,7 +651,7 @@ swp_entry_t get_swap_page(void)
 >  		goto noswap;
 >  	atomic_long_dec(&nr_swap_pages);
 >
 > -	for (type = swap_list.next; type >= 0 && wrapped < 2; type = next) {
 > +	for (type = swap_list.head; type >= 0 && wrapped < 2; type = next) {
 >  		hp_index = atomic_xchg(&highest_priority_index, -1);
 >  		/*
 >  		 * highest_priority_index records current highest priority swap


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: swap: Use swapfiles in priority order
  2014-02-16  2:59     ` Weijie Yang
@ 2014-02-24  8:28         ` Hugh Dickins
  0 siblings, 0 replies; 13+ messages in thread
From: Hugh Dickins @ 2014-02-24  8:28 UTC (permalink / raw)
  To: Weijie Yang
  Cc: Andrew Morton, Mel Gorman, Michal Hocko, Christian Ehrhardt,
	linux-mm, linux-kernel

On Sun, 16 Feb 2014, Weijie Yang wrote:
>  On Fri, Feb 14, 2014 at 9:10 PM, Christian Ehrhardt
> <ehrhardt@linux.vnet.ibm.com> wrote:
> > Weijie Yang <weijie.yang.kh <at> gmail.com> writes:
> >
> >>
> >> On Thu, Feb 13, 2014 at 6:42 PM, Mel Gorman <mgorman <at> suse.de> wrote:
> > [...]
> >> > -       for (type = swap_list.next; type >= 0 && wrapped < 2; type = next) {
> >> > +       for (type = swap_list.head; type >= 0 && wrapped < 2; type = next) {
> >>
> > [...]
> >> Does it lead to a "schlemiel the painter's algorithm"?
> >> (please forgive my rude words, but I can't find a precise word to describe it
> >>
> >> How about modify it like this?
> >>
> > [...]
> >> - next = swap_list.head;
> >> + next = type;
> > [...]
> >
> > Hi,
> > unfortunately withou studying the code more thoroughly I'm not even sure if
> > you meant you code to extend or replace Mels patch.
> >
> > To be sure about your intention.  You refered to algorithm scaling because
> > you were afraid the new code would scan the full list all the time right ?
> >
> > But simply letting the machines give a try for both options I can now
> > qualify both.
> >
> > Just your patch creates a behaviour of jumping over priorities (see the
> > following example), so I hope you meant combining both patches.
> > With that in mind the patch I eventually tested the combined patch looking
> > like this:
> 
> Hi Christian,
> 
> My patch is not appropriate, so there is no need to combine it with Mel's patch.
> 
> What I worried about Mel's patch is not only the search efficiency,
> actually it has
> negligible impact on system, but also the following scenario:
> 
> If two swapfiles have the same priority, in ordinary semantic, they
> should be used
> in balance. But with Mel's patch, it will always get the free
> swap_entry from the
> swap_list.head in priority order, I worry it could break the balance.
> 
> I think you can test this scenario if you have available test machines.
> 
> Appreciate for your done.

Weijie, I agree with you on both points: Schlemiel effect of repeatedly
restarting from head (already an unintended defect before Mel's patch),
and more importantly the breakage of swapfiles at the same priority.

Sorry, it has to be a Nak to Mel's patch, which fixes one behavior
at the expense of another.  And if we were to go that way, better
just to rip out all of swap_list.next and highest_priority_index.

I had hoped to respond today with a better patch; but I just haven't
got it right yet either.  I think we don't need to rush to fix it,
but fix it we certainly should.

Christian, congratulations on discovering this wrong behavior: at
first I assumed it came from Shaohua's 3.9 highest_priority_index
changes, but no; then I assumed it came from my 2.6.14 swap_lock
changes; but now I think it goes back even before 2.4.0, probably
ever since there have been swap priorities.

Hugh

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: swap: Use swapfiles in priority order
@ 2014-02-24  8:28         ` Hugh Dickins
  0 siblings, 0 replies; 13+ messages in thread
From: Hugh Dickins @ 2014-02-24  8:28 UTC (permalink / raw)
  To: Weijie Yang
  Cc: Andrew Morton, Mel Gorman, Michal Hocko, Christian Ehrhardt,
	linux-mm, linux-kernel

On Sun, 16 Feb 2014, Weijie Yang wrote:
>  On Fri, Feb 14, 2014 at 9:10 PM, Christian Ehrhardt
> <ehrhardt@linux.vnet.ibm.com> wrote:
> > Weijie Yang <weijie.yang.kh <at> gmail.com> writes:
> >
> >>
> >> On Thu, Feb 13, 2014 at 6:42 PM, Mel Gorman <mgorman <at> suse.de> wrote:
> > [...]
> >> > -       for (type = swap_list.next; type >= 0 && wrapped < 2; type = next) {
> >> > +       for (type = swap_list.head; type >= 0 && wrapped < 2; type = next) {
> >>
> > [...]
> >> Does it lead to a "schlemiel the painter's algorithm"?
> >> (please forgive my rude words, but I can't find a precise word to describe it
> >>
> >> How about modify it like this?
> >>
> > [...]
> >> - next = swap_list.head;
> >> + next = type;
> > [...]
> >
> > Hi,
> > unfortunately withou studying the code more thoroughly I'm not even sure if
> > you meant you code to extend or replace Mels patch.
> >
> > To be sure about your intention.  You refered to algorithm scaling because
> > you were afraid the new code would scan the full list all the time right ?
> >
> > But simply letting the machines give a try for both options I can now
> > qualify both.
> >
> > Just your patch creates a behaviour of jumping over priorities (see the
> > following example), so I hope you meant combining both patches.
> > With that in mind the patch I eventually tested the combined patch looking
> > like this:
> 
> Hi Christian,
> 
> My patch is not appropriate, so there is no need to combine it with Mel's patch.
> 
> What I worried about Mel's patch is not only the search efficiency,
> actually it has
> negligible impact on system, but also the following scenario:
> 
> If two swapfiles have the same priority, in ordinary semantic, they
> should be used
> in balance. But with Mel's patch, it will always get the free
> swap_entry from the
> swap_list.head in priority order, I worry it could break the balance.
> 
> I think you can test this scenario if you have available test machines.
> 
> Appreciate for your done.

Weijie, I agree with you on both points: Schlemiel effect of repeatedly
restarting from head (already an unintended defect before Mel's patch),
and more importantly the breakage of swapfiles at the same priority.

Sorry, it has to be a Nak to Mel's patch, which fixes one behavior
at the expense of another.  And if we were to go that way, better
just to rip out all of swap_list.next and highest_priority_index.

I had hoped to respond today with a better patch; but I just haven't
got it right yet either.  I think we don't need to rush to fix it,
but fix it we certainly should.

Christian, congratulations on discovering this wrong behavior: at
first I assumed it came from Shaohua's 3.9 highest_priority_index
changes, but no; then I assumed it came from my 2.6.14 swap_lock
changes; but now I think it goes back even before 2.4.0, probably
ever since there have been swap priorities.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: swap: Use swapfiles in priority order
  2014-02-14 13:10   ` Christian Ehrhardt
@ 2014-02-16  2:59     ` Weijie Yang
  2014-02-24  8:28         ` Hugh Dickins
  0 siblings, 1 reply; 13+ messages in thread
From: Weijie Yang @ 2014-02-16  2:59 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: linux-mm

 On Fri, Feb 14, 2014 at 9:10 PM, Christian Ehrhardt
<ehrhardt@linux.vnet.ibm.com> wrote:
> Weijie Yang <weijie.yang.kh <at> gmail.com> writes:
>
>>
>> On Thu, Feb 13, 2014 at 6:42 PM, Mel Gorman <mgorman <at> suse.de> wrote:
> [...]
>> > -       for (type = swap_list.next; type >= 0 && wrapped < 2; type = next) {
>> > +       for (type = swap_list.head; type >= 0 && wrapped < 2; type = next) {
>>
> [...]
>> Does it lead to a "schlemiel the painter's algorithm"?
>> (please forgive my rude words, but I can't find a precise word to describe it
>>
>> How about modify it like this?
>>
> [...]
>> - next = swap_list.head;
>> + next = type;
> [...]
>
> Hi,
> unfortunately withou studying the code more thoroughly I'm not even sure if
> you meant you code to extend or replace Mels patch.
>
> To be sure about your intention.  You refered to algorithm scaling because
> you were afraid the new code would scan the full list all the time right ?
>
> But simply letting the machines give a try for both options I can now
> qualify both.
>
> Just your patch creates a behaviour of jumping over priorities (see the
> following example), so I hope you meant combining both patches.
> With that in mind the patch I eventually tested the combined patch looking
> like this:

Hi Christian,

My patch is not appropriate, so there is no need to combine it with Mel's patch.

What I worried about Mel's patch is not only the search efficiency,
actually it has
negligible impact on system, but also the following scenario:

If two swapfiles have the same priority, in ordinary semantic, they
should be used
in balance. But with Mel's patch, it will always get the free
swap_entry from the
swap_list.head in priority order, I worry it could break the balance.

I think you can test this scenario if you have available test machines.

Appreciate for your done.

> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 612a7c9..53a3873 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -650,7 +650,7 @@ swp_entry_t get_swap_page(void)
>                 goto noswap;
>         atomic_long_dec(&nr_swap_pages);
>
> -       for (type = swap_list.next; type >= 0 && wrapped < 2; type = next) {
> +       for (type = swap_list.head; type >= 0 && wrapped < 2; type = next) {
>                 hp_index = atomic_xchg(&highest_priority_index, -1);
>                 /*
>                  * highest_priority_index records current highest priority swap
> @@ -675,7 +675,7 @@ swp_entry_t get_swap_page(void)
>                 next = si->next;
>                 if (next < 0 ||
>                     (!wrapped && si->prio != swap_info[next]->prio)) {
> -                       next = swap_list.head;
> +                       next = type;
>                         wrapped++;
>                 }
>
>
> At least for the two different cases we identified to fix with it the new
> code works as well:
> I) incrementing swap now in proper priority order
> Filename                                Type            Size    Used    Priority
> /testswap1                              file            100004  100004  8
> /testswap2                              file            100004  100004  7
> /testswap3                              file            100004  100004  6
> /testswap4                              file            100004  100004  5
> /testswap5                              file            100004  100004  4
> /testswap6                              file            100004  68764   3
> /testswap7                              file            100004  0       2
> /testswap8                              file            100004  0       1
>
> II) comparing a memory based block device "as one" vs "split into 8 pieces"
> as swap target(s).
> Like with Mels patch alone I'm able to achieve 1.5G/s TP on the
> overcommitted memory no matter how much swap targets I split it into.
>
> So while I can't speak for the logical correctness of your addition to the
> patch at least in terms of effectiveness it seems fine.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: swap: Use swapfiles in priority order
  2014-02-14 10:17     ` Mel Gorman
@ 2014-02-14 13:33       ` Weijie Yang
  -1 siblings, 0 replies; 13+ messages in thread
From: Weijie Yang @ 2014-02-14 13:33 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Andrew Morton, Hugh Dickins, Michal Hocko, Linux-MM, LKML

On Fri, Feb 14, 2014 at 6:17 PM, Mel Gorman <mgorman@suse.de> wrote:
> On Thu, Feb 13, 2014 at 11:58:05PM +0800, Weijie Yang wrote:
>> On Thu, Feb 13, 2014 at 6:42 PM, Mel Gorman <mgorman@suse.de> wrote:
>> > According to the swapon documentation
>> >
>> >         Swap  pages  are  allocated  from  areas  in priority order,
>> >         highest priority first.  For areas with different priorities, a
>> >         higher-priority area is exhausted before using a lower-priority area.
>> >
>> > A user reported that the reality is different. When multiple swap files
>> > are enabled and a memory consumer started, the swap files are consumed in
>> > pairs after the highest priority file is exhausted. Early in the lifetime
>> > of the test, swapfile consumptions looks like
>> >
>> > Filename                                Type            Size    Used    Priority
>> > /testswap1                              file            100004  100004  8
>> > /testswap2                              file            100004  23764   7
>> > /testswap3                              file            100004  23764   6
>> > /testswap4                              file            100004  0       5
>> > /testswap5                              file            100004  0       4
>> > /testswap6                              file            100004  0       3
>> > /testswap7                              file            100004  0       2
>> > /testswap8                              file            100004  0       1
>> >
>> > This patch fixes the swap_list search in get_swap_page to use the swap files
>> > in the correct order. When applied the swap file consumptions looks like
>> >
>> > Filename                                Type            Size    Used    Priority
>> > /testswap1                              file            100004  100004  8
>> > /testswap2                              file            100004  100004  7
>> > /testswap3                              file            100004  29372   6
>> > /testswap4                              file            100004  0       5
>> > /testswap5                              file            100004  0       4
>> > /testswap6                              file            100004  0       3
>> > /testswap7                              file            100004  0       2
>> > /testswap8                              file            100004  0       1
>> >
>> > Signed-off-by: Mel Gorman <mgorman@suse.de>
>> > ---
>> >  mm/swapfile.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/mm/swapfile.c b/mm/swapfile.c
>> > index 4a7f7e6..6d0ac2b 100644
>> > --- a/mm/swapfile.c
>> > +++ b/mm/swapfile.c
>> > @@ -651,7 +651,7 @@ swp_entry_t get_swap_page(void)
>> >                 goto noswap;
>> >         atomic_long_dec(&nr_swap_pages);
>> >
>> > -       for (type = swap_list.next; type >= 0 && wrapped < 2; type = next) {
>> > +       for (type = swap_list.head; type >= 0 && wrapped < 2; type = next) {
>>
>> Does it lead to a "schlemiel the painter's algorithm"?
>> (please forgive my rude words, but I can't find a precise word to describe it
>> because English is not my native language. My apologize.)
>>
>> How about modify it like this?
>>
>
> I blindly applied your version without review to see how it behaved and
> found it uses every second swapfile like this

I am sorry to waste your time, I should have tested it.
I will review the code more carefully, and send a tested patch if I find a
better way.
Apologize again.

> Filename                                Type            Size    Used    Priority
> /testswap1                              file            100004  100004  8
> /testswap2                              file            100004  16      7
> /testswap3                              file            100004  100004  6
> /testswap4                              file            100004  8       5
> /testswap5                              file            100004  100004  4
> /testswap6                              file            100004  8       3
> /testswap7                              file            100004  100004  2
> /testswap8                              file            100004  23504   1
>
> I admit I did not review the swap priority search algorithm in detail
> because the fix superficially looked straight forward but this
> alternative is not the answer either.
>
> --
> Mel Gorman
> SUSE Labs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: swap: Use swapfiles in priority order
@ 2014-02-14 13:33       ` Weijie Yang
  0 siblings, 0 replies; 13+ messages in thread
From: Weijie Yang @ 2014-02-14 13:33 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Andrew Morton, Hugh Dickins, Michal Hocko, Linux-MM, LKML

On Fri, Feb 14, 2014 at 6:17 PM, Mel Gorman <mgorman@suse.de> wrote:
> On Thu, Feb 13, 2014 at 11:58:05PM +0800, Weijie Yang wrote:
>> On Thu, Feb 13, 2014 at 6:42 PM, Mel Gorman <mgorman@suse.de> wrote:
>> > According to the swapon documentation
>> >
>> >         Swap  pages  are  allocated  from  areas  in priority order,
>> >         highest priority first.  For areas with different priorities, a
>> >         higher-priority area is exhausted before using a lower-priority area.
>> >
>> > A user reported that the reality is different. When multiple swap files
>> > are enabled and a memory consumer started, the swap files are consumed in
>> > pairs after the highest priority file is exhausted. Early in the lifetime
>> > of the test, swapfile consumptions looks like
>> >
>> > Filename                                Type            Size    Used    Priority
>> > /testswap1                              file            100004  100004  8
>> > /testswap2                              file            100004  23764   7
>> > /testswap3                              file            100004  23764   6
>> > /testswap4                              file            100004  0       5
>> > /testswap5                              file            100004  0       4
>> > /testswap6                              file            100004  0       3
>> > /testswap7                              file            100004  0       2
>> > /testswap8                              file            100004  0       1
>> >
>> > This patch fixes the swap_list search in get_swap_page to use the swap files
>> > in the correct order. When applied the swap file consumptions looks like
>> >
>> > Filename                                Type            Size    Used    Priority
>> > /testswap1                              file            100004  100004  8
>> > /testswap2                              file            100004  100004  7
>> > /testswap3                              file            100004  29372   6
>> > /testswap4                              file            100004  0       5
>> > /testswap5                              file            100004  0       4
>> > /testswap6                              file            100004  0       3
>> > /testswap7                              file            100004  0       2
>> > /testswap8                              file            100004  0       1
>> >
>> > Signed-off-by: Mel Gorman <mgorman@suse.de>
>> > ---
>> >  mm/swapfile.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/mm/swapfile.c b/mm/swapfile.c
>> > index 4a7f7e6..6d0ac2b 100644
>> > --- a/mm/swapfile.c
>> > +++ b/mm/swapfile.c
>> > @@ -651,7 +651,7 @@ swp_entry_t get_swap_page(void)
>> >                 goto noswap;
>> >         atomic_long_dec(&nr_swap_pages);
>> >
>> > -       for (type = swap_list.next; type >= 0 && wrapped < 2; type = next) {
>> > +       for (type = swap_list.head; type >= 0 && wrapped < 2; type = next) {
>>
>> Does it lead to a "schlemiel the painter's algorithm"?
>> (please forgive my rude words, but I can't find a precise word to describe it
>> because English is not my native language. My apologize.)
>>
>> How about modify it like this?
>>
>
> I blindly applied your version without review to see how it behaved and
> found it uses every second swapfile like this

I am sorry to waste your time, I should have tested it.
I will review the code more carefully, and send a tested patch if I find a
better way.
Apologize again.

> Filename                                Type            Size    Used    Priority
> /testswap1                              file            100004  100004  8
> /testswap2                              file            100004  16      7
> /testswap3                              file            100004  100004  6
> /testswap4                              file            100004  8       5
> /testswap5                              file            100004  100004  4
> /testswap6                              file            100004  8       3
> /testswap7                              file            100004  100004  2
> /testswap8                              file            100004  23504   1
>
> I admit I did not review the swap priority search algorithm in detail
> because the fix superficially looked straight forward but this
> alternative is not the answer either.
>
> --
> Mel Gorman
> SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: swap: Use swapfiles in priority order
  2014-02-13 15:58   ` Weijie Yang
  (?)
  (?)
@ 2014-02-14 13:10   ` Christian Ehrhardt
  2014-02-16  2:59     ` Weijie Yang
  -1 siblings, 1 reply; 13+ messages in thread
From: Christian Ehrhardt @ 2014-02-14 13:10 UTC (permalink / raw)
  To: linux-mm

Weijie Yang <weijie.yang.kh <at> gmail.com> writes:

> 
> On Thu, Feb 13, 2014 at 6:42 PM, Mel Gorman <mgorman <at> suse.de> wrote:
[...]
> > -       for (type = swap_list.next; type >= 0 && wrapped < 2; type = next) {
> > +       for (type = swap_list.head; type >= 0 && wrapped < 2; type = next) {
> 
[...]
> Does it lead to a "schlemiel the painter's algorithm"?
> (please forgive my rude words, but I can't find a precise word to describe it
> 
> How about modify it like this?
> 
[...]
> - next = swap_list.head;
> + next = type;
[...]

Hi,
unfortunately withou studying the code more thoroughly I'm not even sure if
you meant you code to extend or replace Mels patch.

To be sure about your intention.  You refered to algorithm scaling because
you were afraid the new code would scan the full list all the time right ?

But simply letting the machines give a try for both options I can now
qualify both.

Just your patch creates a behaviour of jumping over priorities (see the
following example), so I hope you meant combining both patches.
With that in mind the patch I eventually tested the combined patch looking
like this:

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 612a7c9..53a3873 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -650,7 +650,7 @@ swp_entry_t get_swap_page(void)
                goto noswap;
        atomic_long_dec(&nr_swap_pages);
 
-       for (type = swap_list.next; type >= 0 && wrapped < 2; type = next) {
+       for (type = swap_list.head; type >= 0 && wrapped < 2; type = next) {
                hp_index = atomic_xchg(&highest_priority_index, -1);
                /*
                 * highest_priority_index records current highest priority swap
@@ -675,7 +675,7 @@ swp_entry_t get_swap_page(void)
                next = si->next;
                if (next < 0 ||
                    (!wrapped && si->prio != swap_info[next]->prio)) {
-                       next = swap_list.head;
+                       next = type;
                        wrapped++;
                }


At least for the two different cases we identified to fix with it the new
code works as well:
I) incrementing swap now in proper priority order
Filename                                Type            Size    Used    Priority
/testswap1                              file            100004  100004  8
/testswap2                              file            100004  100004  7
/testswap3                              file            100004  100004  6
/testswap4                              file            100004  100004  5
/testswap5                              file            100004  100004  4
/testswap6                              file            100004  68764   3
/testswap7                              file            100004  0       2
/testswap8                              file            100004  0       1

II) comparing a memory based block device "as one" vs "split into 8 pieces"
as swap target(s).
Like with Mels patch alone I'm able to achieve 1.5G/s TP on the
overcommitted memory no matter how much swap targets I split it into.

So while I can't speak for the logical correctness of your addition to the
patch at least in terms of effectiveness it seems fine.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: swap: Use swapfiles in priority order
  2014-02-13 15:58   ` Weijie Yang
@ 2014-02-14 10:17     ` Mel Gorman
  -1 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2014-02-14 10:17 UTC (permalink / raw)
  To: Weijie Yang; +Cc: Andrew Morton, Hugh Dickins, Michal Hocko, Linux-MM, LKML

On Thu, Feb 13, 2014 at 11:58:05PM +0800, Weijie Yang wrote:
> On Thu, Feb 13, 2014 at 6:42 PM, Mel Gorman <mgorman@suse.de> wrote:
> > According to the swapon documentation
> >
> >         Swap  pages  are  allocated  from  areas  in priority order,
> >         highest priority first.  For areas with different priorities, a
> >         higher-priority area is exhausted before using a lower-priority area.
> >
> > A user reported that the reality is different. When multiple swap files
> > are enabled and a memory consumer started, the swap files are consumed in
> > pairs after the highest priority file is exhausted. Early in the lifetime
> > of the test, swapfile consumptions looks like
> >
> > Filename                                Type            Size    Used    Priority
> > /testswap1                              file            100004  100004  8
> > /testswap2                              file            100004  23764   7
> > /testswap3                              file            100004  23764   6
> > /testswap4                              file            100004  0       5
> > /testswap5                              file            100004  0       4
> > /testswap6                              file            100004  0       3
> > /testswap7                              file            100004  0       2
> > /testswap8                              file            100004  0       1
> >
> > This patch fixes the swap_list search in get_swap_page to use the swap files
> > in the correct order. When applied the swap file consumptions looks like
> >
> > Filename                                Type            Size    Used    Priority
> > /testswap1                              file            100004  100004  8
> > /testswap2                              file            100004  100004  7
> > /testswap3                              file            100004  29372   6
> > /testswap4                              file            100004  0       5
> > /testswap5                              file            100004  0       4
> > /testswap6                              file            100004  0       3
> > /testswap7                              file            100004  0       2
> > /testswap8                              file            100004  0       1
> >
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> > ---
> >  mm/swapfile.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 4a7f7e6..6d0ac2b 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -651,7 +651,7 @@ swp_entry_t get_swap_page(void)
> >                 goto noswap;
> >         atomic_long_dec(&nr_swap_pages);
> >
> > -       for (type = swap_list.next; type >= 0 && wrapped < 2; type = next) {
> > +       for (type = swap_list.head; type >= 0 && wrapped < 2; type = next) {
> 
> Does it lead to a "schlemiel the painter's algorithm"?
> (please forgive my rude words, but I can't find a precise word to describe it
> because English is not my native language. My apologize.)
> 
> How about modify it like this?
> 

I blindly applied your version without review to see how it behaved and
found it uses every second swapfile like this

Filename                                Type            Size    Used    Priority
/testswap1                              file            100004  100004  8
/testswap2                              file            100004  16      7
/testswap3                              file            100004  100004  6
/testswap4                              file            100004  8       5
/testswap5                              file            100004  100004  4
/testswap6                              file            100004  8       3
/testswap7                              file            100004  100004  2
/testswap8                              file            100004  23504   1

I admit I did not review the swap priority search algorithm in detail
because the fix superficially looked straight forward but this
alternative is not the answer either.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: swap: Use swapfiles in priority order
@ 2014-02-14 10:17     ` Mel Gorman
  0 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2014-02-14 10:17 UTC (permalink / raw)
  To: Weijie Yang; +Cc: Andrew Morton, Hugh Dickins, Michal Hocko, Linux-MM, LKML

On Thu, Feb 13, 2014 at 11:58:05PM +0800, Weijie Yang wrote:
> On Thu, Feb 13, 2014 at 6:42 PM, Mel Gorman <mgorman@suse.de> wrote:
> > According to the swapon documentation
> >
> >         Swap  pages  are  allocated  from  areas  in priority order,
> >         highest priority first.  For areas with different priorities, a
> >         higher-priority area is exhausted before using a lower-priority area.
> >
> > A user reported that the reality is different. When multiple swap files
> > are enabled and a memory consumer started, the swap files are consumed in
> > pairs after the highest priority file is exhausted. Early in the lifetime
> > of the test, swapfile consumptions looks like
> >
> > Filename                                Type            Size    Used    Priority
> > /testswap1                              file            100004  100004  8
> > /testswap2                              file            100004  23764   7
> > /testswap3                              file            100004  23764   6
> > /testswap4                              file            100004  0       5
> > /testswap5                              file            100004  0       4
> > /testswap6                              file            100004  0       3
> > /testswap7                              file            100004  0       2
> > /testswap8                              file            100004  0       1
> >
> > This patch fixes the swap_list search in get_swap_page to use the swap files
> > in the correct order. When applied the swap file consumptions looks like
> >
> > Filename                                Type            Size    Used    Priority
> > /testswap1                              file            100004  100004  8
> > /testswap2                              file            100004  100004  7
> > /testswap3                              file            100004  29372   6
> > /testswap4                              file            100004  0       5
> > /testswap5                              file            100004  0       4
> > /testswap6                              file            100004  0       3
> > /testswap7                              file            100004  0       2
> > /testswap8                              file            100004  0       1
> >
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> > ---
> >  mm/swapfile.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 4a7f7e6..6d0ac2b 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -651,7 +651,7 @@ swp_entry_t get_swap_page(void)
> >                 goto noswap;
> >         atomic_long_dec(&nr_swap_pages);
> >
> > -       for (type = swap_list.next; type >= 0 && wrapped < 2; type = next) {
> > +       for (type = swap_list.head; type >= 0 && wrapped < 2; type = next) {
> 
> Does it lead to a "schlemiel the painter's algorithm"?
> (please forgive my rude words, but I can't find a precise word to describe it
> because English is not my native language. My apologize.)
> 
> How about modify it like this?
> 

I blindly applied your version without review to see how it behaved and
found it uses every second swapfile like this

Filename                                Type            Size    Used    Priority
/testswap1                              file            100004  100004  8
/testswap2                              file            100004  16      7
/testswap3                              file            100004  100004  6
/testswap4                              file            100004  8       5
/testswap5                              file            100004  100004  4
/testswap6                              file            100004  8       3
/testswap7                              file            100004  100004  2
/testswap8                              file            100004  23504   1

I admit I did not review the swap priority search algorithm in detail
because the fix superficially looked straight forward but this
alternative is not the answer either.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: swap: Use swapfiles in priority order
  2014-02-13 10:42 ` Mel Gorman
@ 2014-02-13 15:58   ` Weijie Yang
  -1 siblings, 0 replies; 13+ messages in thread
From: Weijie Yang @ 2014-02-13 15:58 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Andrew Morton, Hugh Dickins, Michal Hocko, Linux-MM, LKML

On Thu, Feb 13, 2014 at 6:42 PM, Mel Gorman <mgorman@suse.de> wrote:
> According to the swapon documentation
>
>         Swap  pages  are  allocated  from  areas  in priority order,
>         highest priority first.  For areas with different priorities, a
>         higher-priority area is exhausted before using a lower-priority area.
>
> A user reported that the reality is different. When multiple swap files
> are enabled and a memory consumer started, the swap files are consumed in
> pairs after the highest priority file is exhausted. Early in the lifetime
> of the test, swapfile consumptions looks like
>
> Filename                                Type            Size    Used    Priority
> /testswap1                              file            100004  100004  8
> /testswap2                              file            100004  23764   7
> /testswap3                              file            100004  23764   6
> /testswap4                              file            100004  0       5
> /testswap5                              file            100004  0       4
> /testswap6                              file            100004  0       3
> /testswap7                              file            100004  0       2
> /testswap8                              file            100004  0       1
>
> This patch fixes the swap_list search in get_swap_page to use the swap files
> in the correct order. When applied the swap file consumptions looks like
>
> Filename                                Type            Size    Used    Priority
> /testswap1                              file            100004  100004  8
> /testswap2                              file            100004  100004  7
> /testswap3                              file            100004  29372   6
> /testswap4                              file            100004  0       5
> /testswap5                              file            100004  0       4
> /testswap6                              file            100004  0       3
> /testswap7                              file            100004  0       2
> /testswap8                              file            100004  0       1
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
>  mm/swapfile.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 4a7f7e6..6d0ac2b 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -651,7 +651,7 @@ swp_entry_t get_swap_page(void)
>                 goto noswap;
>         atomic_long_dec(&nr_swap_pages);
>
> -       for (type = swap_list.next; type >= 0 && wrapped < 2; type = next) {
> +       for (type = swap_list.head; type >= 0 && wrapped < 2; type = next) {

Does it lead to a "schlemiel the painter's algorithm"?
(please forgive my rude words, but I can't find a precise word to describe it
because English is not my native language. My apologize.)

How about modify it like this?

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 4a7f7e6..d64aa55 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -676,7 +676,7 @@ swp_entry_t get_swap_page(void)
  next = si->next;
  if (next < 0 ||
     (!wrapped && si->prio != swap_info[next]->prio)) {
- next = swap_list.head;
+ next = type;
  wrapped++;
  }

>                 hp_index = atomic_xchg(&highest_priority_index, -1);
>                 /*
>                  * highest_priority_index records current highest priority swap
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: swap: Use swapfiles in priority order
@ 2014-02-13 15:58   ` Weijie Yang
  0 siblings, 0 replies; 13+ messages in thread
From: Weijie Yang @ 2014-02-13 15:58 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Andrew Morton, Hugh Dickins, Michal Hocko, Linux-MM, LKML

On Thu, Feb 13, 2014 at 6:42 PM, Mel Gorman <mgorman@suse.de> wrote:
> According to the swapon documentation
>
>         Swap  pages  are  allocated  from  areas  in priority order,
>         highest priority first.  For areas with different priorities, a
>         higher-priority area is exhausted before using a lower-priority area.
>
> A user reported that the reality is different. When multiple swap files
> are enabled and a memory consumer started, the swap files are consumed in
> pairs after the highest priority file is exhausted. Early in the lifetime
> of the test, swapfile consumptions looks like
>
> Filename                                Type            Size    Used    Priority
> /testswap1                              file            100004  100004  8
> /testswap2                              file            100004  23764   7
> /testswap3                              file            100004  23764   6
> /testswap4                              file            100004  0       5
> /testswap5                              file            100004  0       4
> /testswap6                              file            100004  0       3
> /testswap7                              file            100004  0       2
> /testswap8                              file            100004  0       1
>
> This patch fixes the swap_list search in get_swap_page to use the swap files
> in the correct order. When applied the swap file consumptions looks like
>
> Filename                                Type            Size    Used    Priority
> /testswap1                              file            100004  100004  8
> /testswap2                              file            100004  100004  7
> /testswap3                              file            100004  29372   6
> /testswap4                              file            100004  0       5
> /testswap5                              file            100004  0       4
> /testswap6                              file            100004  0       3
> /testswap7                              file            100004  0       2
> /testswap8                              file            100004  0       1
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
>  mm/swapfile.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 4a7f7e6..6d0ac2b 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -651,7 +651,7 @@ swp_entry_t get_swap_page(void)
>                 goto noswap;
>         atomic_long_dec(&nr_swap_pages);
>
> -       for (type = swap_list.next; type >= 0 && wrapped < 2; type = next) {
> +       for (type = swap_list.head; type >= 0 && wrapped < 2; type = next) {

Does it lead to a "schlemiel the painter's algorithm"?
(please forgive my rude words, but I can't find a precise word to describe it
because English is not my native language. My apologize.)

How about modify it like this?

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 4a7f7e6..d64aa55 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -676,7 +676,7 @@ swp_entry_t get_swap_page(void)
  next = si->next;
  if (next < 0 ||
     (!wrapped && si->prio != swap_info[next]->prio)) {
- next = swap_list.head;
+ next = type;
  wrapped++;
  }

>                 hp_index = atomic_xchg(&highest_priority_index, -1);
>                 /*
>                  * highest_priority_index records current highest priority swap
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH] mm: swap: Use swapfiles in priority order
@ 2014-02-13 10:42 ` Mel Gorman
  0 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2014-02-13 10:42 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Hugh Dickins, Michal Hocko, Linux-MM, LKML

According to the swapon documentation

	Swap  pages  are  allocated  from  areas  in priority order,
	highest priority first.  For areas with different priorities, a
	higher-priority area is exhausted before using a lower-priority area.

A user reported that the reality is different. When multiple swap files
are enabled and a memory consumer started, the swap files are consumed in
pairs after the highest priority file is exhausted. Early in the lifetime
of the test, swapfile consumptions looks like

Filename                                Type            Size    Used    Priority
/testswap1                              file            100004  100004  8
/testswap2                              file            100004  23764   7
/testswap3                              file            100004  23764   6
/testswap4                              file            100004  0       5
/testswap5                              file            100004  0       4
/testswap6                              file            100004  0       3
/testswap7                              file            100004  0       2
/testswap8                              file            100004  0       1

This patch fixes the swap_list search in get_swap_page to use the swap files
in the correct order. When applied the swap file consumptions looks like

Filename				Type		Size	Used	Priority
/testswap1                              file		100004	100004	8
/testswap2                              file		100004	100004	7
/testswap3                              file		100004	29372	6
/testswap4                              file		100004	0	5
/testswap5                              file		100004	0	4
/testswap6                              file		100004	0	3
/testswap7                              file		100004	0	2
/testswap8                              file		100004	0	1

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/swapfile.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 4a7f7e6..6d0ac2b 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -651,7 +651,7 @@ swp_entry_t get_swap_page(void)
 		goto noswap;
 	atomic_long_dec(&nr_swap_pages);
 
-	for (type = swap_list.next; type >= 0 && wrapped < 2; type = next) {
+	for (type = swap_list.head; type >= 0 && wrapped < 2; type = next) {
 		hp_index = atomic_xchg(&highest_priority_index, -1);
 		/*
 		 * highest_priority_index records current highest priority swap

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH] mm: swap: Use swapfiles in priority order
@ 2014-02-13 10:42 ` Mel Gorman
  0 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2014-02-13 10:42 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Hugh Dickins, Michal Hocko, Linux-MM, LKML

According to the swapon documentation

	Swap  pages  are  allocated  from  areas  in priority order,
	highest priority first.  For areas with different priorities, a
	higher-priority area is exhausted before using a lower-priority area.

A user reported that the reality is different. When multiple swap files
are enabled and a memory consumer started, the swap files are consumed in
pairs after the highest priority file is exhausted. Early in the lifetime
of the test, swapfile consumptions looks like

Filename                                Type            Size    Used    Priority
/testswap1                              file            100004  100004  8
/testswap2                              file            100004  23764   7
/testswap3                              file            100004  23764   6
/testswap4                              file            100004  0       5
/testswap5                              file            100004  0       4
/testswap6                              file            100004  0       3
/testswap7                              file            100004  0       2
/testswap8                              file            100004  0       1

This patch fixes the swap_list search in get_swap_page to use the swap files
in the correct order. When applied the swap file consumptions looks like

Filename				Type		Size	Used	Priority
/testswap1                              file		100004	100004	8
/testswap2                              file		100004	100004	7
/testswap3                              file		100004	29372	6
/testswap4                              file		100004	0	5
/testswap5                              file		100004	0	4
/testswap6                              file		100004	0	3
/testswap7                              file		100004	0	2
/testswap8                              file		100004	0	1

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/swapfile.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 4a7f7e6..6d0ac2b 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -651,7 +651,7 @@ swp_entry_t get_swap_page(void)
 		goto noswap;
 	atomic_long_dec(&nr_swap_pages);
 
-	for (type = swap_list.next; type >= 0 && wrapped < 2; type = next) {
+	for (type = swap_list.head; type >= 0 && wrapped < 2; type = next) {
 		hp_index = atomic_xchg(&highest_priority_index, -1);
 		/*
 		 * highest_priority_index records current highest priority swap

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2014-02-24  8:28 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-13 16:27 [PATCH] mm: swap: Use swapfiles in priority order Christian Ehrhardt
  -- strict thread matches above, loose matches on Subject: below --
2014-02-13 10:42 Mel Gorman
2014-02-13 10:42 ` Mel Gorman
2014-02-13 15:58 ` Weijie Yang
2014-02-13 15:58   ` Weijie Yang
2014-02-14 10:17   ` Mel Gorman
2014-02-14 10:17     ` Mel Gorman
2014-02-14 13:33     ` Weijie Yang
2014-02-14 13:33       ` Weijie Yang
2014-02-14 13:10   ` Christian Ehrhardt
2014-02-16  2:59     ` Weijie Yang
2014-02-24  8:28       ` Hugh Dickins
2014-02-24  8:28         ` Hugh Dickins

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.