* Regression in mobility grouping?
@ 2016-09-28  1:41 Johannes Weiner
From: Johannes Weiner @ 2016-09-28  1:41 UTC (permalink / raw)
  To: Vlastimil Babka, Mel Gorman, Joonsoo Kim
  Cc: linux-mm, linux-kernel, kernel-team

[-- Attachment #1: Type: text/plain, Size: 2597 bytes --]

Hi guys,

we noticed what looks like a regression in page mobility grouping
during an upgrade from 3.10 to 4.0. Identical machines, workloads, and
uptime, but /proc/pagetypeinfo on 3.10 looks like this:

Number of blocks type     Unmovable  Reclaimable      Movable      Reserve      Isolate 
Node 1, zone   Normal          815          433        31518            2            0 

and on 4.0 like this:

Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate 
Node 1, zone   Normal         3880         3530        25356            2            0            0 

4.0 is either polluting pageblocks more aggressively at allocation, or
is not able to make pageblocks movable again when the reclaimable and
unmovable allocations are released. Invoking compaction manually
(/proc/sys/vm/compact_memory) is not bringing them back, either.
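
To be clear, "invoking compaction manually" here means writing any
value to /proc/sys/vm/compact_memory, which compacts all zones on all
nodes. A minimal sketch of that trigger (helper name made up):

#include <fcntl.h>
#include <unistd.h>

static int compact_all_memory(void)
{
        int fd = open("/proc/sys/vm/compact_memory", O_WRONLY);
        int ret = -1;

        if (fd >= 0) {
                /* Any written value triggers full compaction. */
                ret = (write(fd, "1", 1) == 1) ? 0 : -1;
                close(fd);
        }
        return ret;
}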

The problem we are debugging is that these machines have a very high
rate of order-3 allocations (fdtable during fork, network rx), and
after the upgrade allocstalls have increased dramatically. I'm not
entirely sure this is the same issue, since even order-0 allocations
are struggling, but the mobility grouping in itself looks problematic.

I'm still going through the changes relevant to mobility grouping in
that timeframe, but if this rings a bell for anyone, it would help. I
hate blaming random patches, but these caught my eye:

9c0415e mm: more aggressive page stealing for UNMOVABLE allocations
3a1086f mm: always steal split buddies in fallback allocations
99592d5 mm: when stealing freepages, also take pages created by splitting buddy page

The changelog states that by aggressively stealing split buddy pages
during a fallback allocation we avoid subsequent stealing. But since
there are generally more movable/reclaimable pages available, and so
less falling back and stealing freepages on behalf of movable, won't
this mean that we could expect exactly that result - growing numbers
of unmovable blocks, while rarely stealing them back in movable alloc
fallbacks? And the expansion of !MOVABLE blocks would over time make
compaction less and less effective too, seeing as it doesn't consider
anything !MOVABLE suitable migration targets?

Attached are the full /proc/pagetypeinfo and /proc/buddyinfo from both
kernels on machines with similar uptimes and directly after invoking
compaction. As you can see, the buddy lists are much more fragmented
on 4.0, with unmovable/reclaimable allocations polluting more blocks.

Any thoughts on this would be greatly appreciated. I can test patches.

Thanks!

[-- Attachment #2: buddyinfo-3.10.txt --]
[-- Type: text/plain, Size: 400 bytes --]

Node 0, zone      DMA      0      0      0      1      2      1      1      0      1      1      3 
Node 0, zone    DMA32   1062   1491   1641   1725    478     77      5      1      0      0      0 
Node 0, zone   Normal  10436  16239   5903    696    130    729   1298    550    109      0      0 
Node 1, zone   Normal   5956     15      5     28     11      8      2      0      0      0      0 

[-- Attachment #3: buddyinfo-4.0.txt --]
[-- Type: text/plain, Size: 400 bytes --]

Node 0, zone      DMA      1      1      0      1      1      1      1      0      0      1      3 
Node 0, zone    DMA32   9462   6148   2297     27      0      0      0      0      0      0      0 
Node 0, zone   Normal 130376  36589   3777     94      1      0      0      0      0      0      0 
Node 1, zone   Normal 190988  77269   3896    332      6      0      0      0      0      0      0 

[-- Attachment #4: pagetypeinfo-3.10.txt --]
[-- Type: text/plain, Size: 3302 bytes --]

Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10 
Node    0, zone      DMA, type    Unmovable      0      0      0      1      2      1      1      0      1      0      0 
Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone      DMA, type      Movable      0      0      0      0      0      0      0      0      0      0      3 
Node    0, zone      DMA, type      Reserve      0      0      0      0      0      0      0      0      0      1      0 
Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone    DMA32, type    Unmovable    488    221    286      0      0      0      0      0      0      0      0 
Node    0, zone    DMA32, type  Reclaimable      1    725    741      0      0      0      0      0      0      0      0 
Node    0, zone    DMA32, type      Movable    431   1735   1073    105      0      0      0      0      0      0      0 
Node    0, zone    DMA32, type      Reserve      0      0      0     17      1      0      0      0      0      0      0 
Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone   Normal, type    Unmovable   1922     16      1     19      0      0      0      0      0      0      0 
Node    0, zone   Normal, type  Reclaimable   4549      0      0      1      0      0      0      0      0      0      0 
Node    0, zone   Normal, type      Movable      3      0      2      3      0      0      0      0      0      0      0 
Node    0, zone   Normal, type      Reserve      0      0      1     22      1      2      1      0      0      0      0 
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0 

Number of blocks type     Unmovable  Reclaimable      Movable      Reserve      Isolate 
Node 0, zone      DMA            1            0            6            1            0 
Node 0, zone    DMA32           96           21          898            1            0 
Node 0, zone   Normal         1105          497        30140            2            0 
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10 
Node    1, zone   Normal, type    Unmovable   5746      3      0      0      0      0      0      0      0      0      0 
Node    1, zone   Normal, type  Reclaimable     53     10      0      0      0      0      0      0      0      0      0 
Node    1, zone   Normal, type      Movable      1   2919   1131      0      0      0      0      0      0      0      0 
Node    1, zone   Normal, type      Reserve      0      0      0      0      0      1      2      0      0      0      0 
Node    1, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0 

Number of blocks type     Unmovable  Reclaimable      Movable      Reserve      Isolate 
Node 1, zone   Normal          868          433        31465            2            0 

[-- Attachment #5: pagetypeinfo-4.0.txt --]
[-- Type: text/plain, Size: 3868 bytes --]

Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10 
Node    0, zone      DMA, type    Unmovable      1      1      0      1      1      1      1      0      0      0      0 
Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone      DMA, type      Movable      0      0      0      0      0      0      0      0      0      0      3 
Node    0, zone      DMA, type      Reserve      0      0      0      0      0      0      0      0      0      1      0 
Node    0, zone      DMA, type          CMA      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone    DMA32, type    Unmovable   2717   4401    895      1      0      0      0      0      0      0      0 
Node    0, zone    DMA32, type  Reclaimable   6004   1784      0      0      0      0      0      0      0      0      0 
Node    0, zone    DMA32, type      Movable      1      0      0      0      0      0      0      0      0      0      0 
Node    0, zone    DMA32, type      Reserve      0      0      3      0      0      0      0      0      0      0      0 
Node    0, zone    DMA32, type          CMA      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone   Normal, type    Unmovable 115050  40237   3785      0      0      0      0      0      0      0      0 
Node    0, zone   Normal, type  Reclaimable  51921  14109    659      0      0      0      0      0      0      0      0 
Node    0, zone   Normal, type      Movable      1  41954    984      0      0      0      0      0      0      0      0 
Node    0, zone   Normal, type      Reserve      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0 
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0 

Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate 
Node 0, zone      DMA            1            0            6            1            0            0 
Node 0, zone    DMA32          620          184          211            1            0            0 
Node 0, zone   Normal         6634         3757        21351            2            0            0 
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10 
Node    1, zone   Normal, type    Unmovable  58723    366     15      6      0      0      0      0      0      0      0 
Node    1, zone   Normal, type  Reclaimable    163     74      5      1      0      0      0      0      0      0      0 
Node    1, zone   Normal, type      Movable   1217    283     10      0      0      0      0      0      0      0      0 
Node    1, zone   Normal, type      Reserve      0      0      0      3      0      0      0      0      0      0      0 
Node    1, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0 
Node    1, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0 

Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate 
Node 1, zone   Normal         3903         3518        25345            2            0            0 

[-- Attachment #6: extfrag-3.10.txt --]
[-- Type: text/plain, Size: 388 bytes --]

Node 0, zone      DMA -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 
Node 0, zone    DMA32 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 0.981 0.991 0.996 0.998 
Node 0, zone   Normal -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 0.977 0.989 0.995 0.998 
Node 1, zone   Normal -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 0.982 0.991 0.996 0.998 

[-- Attachment #7: extfrag-4.0.txt --]
[-- Type: text/plain, Size: 383 bytes --]

Node 0, zone      DMA -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 
Node 0, zone    DMA32 -1.000 -1.000 -1.000 -1.000 0.868 0.934 0.967 0.984 0.992 0.996 0.998 
Node 0, zone   Normal -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 0.972 0.986 0.993 0.997 0.999 
Node 1, zone   Normal -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 0.972 0.986 0.993 0.997 0.999 

* Re: Regression in mobility grouping?
  2016-09-28  1:41 Regression in mobility grouping? Johannes Weiner
@ 2016-09-28  9:00   ` Vlastimil Babka
From: Vlastimil Babka @ 2016-09-28  9:00 UTC (permalink / raw)
  To: Johannes Weiner, Mel Gorman, Joonsoo Kim
  Cc: linux-mm, linux-kernel, kernel-team

On 09/28/2016 03:41 AM, Johannes Weiner wrote:
> Hi guys,
> 
> we noticed what looks like a regression in page mobility grouping
> during an upgrade from 3.10 to 4.0. Identical machines, workloads, and
> uptime, but /proc/pagetypeinfo on 3.10 looks like this:
> 
> Number of blocks type     Unmovable  Reclaimable      Movable      Reserve      Isolate 
> Node 1, zone   Normal          815          433        31518            2            0 
> 
> and on 4.0 like this:
> 
> Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate 
> Node 1, zone   Normal         3880         3530        25356            2            0            0 

It's worth keeping in mind that this doesn't reflect where the actual
unmovable pages reside. It might be that in 3.10 they are spread within
the movable pages. IIRC enabling page_owner (not sure if in 4.0, there
were some later fixes I think) can augment pagetypeinfo with at least
some statistics of polluted pageblocks.

Does e.g. /proc/meminfo suggest how much unmovable/reclaimable memory
should be allocated, and whether it would fill the respective
pageblocks or whether they are poorly utilized?

> 4.0 is either polluting pageblocks more aggressively at allocation, or
> is not able to make pageblocks movable again when the reclaimable and
> unmovable allocations are released. Invoking compaction manually
> (/proc/sys/vm/compact_memory) is not bringing them back, either.
>
> The problem we are debugging is that these machines have a very high
> rate of order-3 allocations (fdtable during fork, network rx), and
> after the upgrade allocstalls have increased dramatically. I'm not
> entirely sure this is the same issue, since even order-0 allocations
> are struggling, but the mobility grouping in itself looks problematic.
> 
> I'm still going through the changes relevant to mobility grouping in
> that timeframe, but if this rings a bell for anyone, it would help. I
> hate blaming random patches, but these caught my eye:
> 
> 9c0415e mm: more aggressive page stealing for UNMOVABLE allocations
> 3a1086f mm: always steal split buddies in fallback allocations
> 99592d5 mm: when stealing freepages, also take pages created by splitting buddy page

Check also the changelogs for mentions of earlier commits, e.g. 99592d5
should be restoring behavior that changed in 3.12-3.13 and you are
upgrading from 3.10.

> The changelog states that by aggressively stealing split buddy pages
> during a fallback allocation we avoid subsequent stealing. But since
> there are generally more movable/reclaimable pages available, and so
> less falling back and stealing freepages on behalf of movable, won't
> this mean that we could expect exactly that result - growing numbers
> of unmovable blocks, while rarely stealing them back in movable alloc
> fallbacks? And the expansion of !MOVABLE blocks would over time make
> compaction less and less effective too, seeing as it doesn't consider
> anything !MOVABLE suitable migration targets?

Yeah, this is an issue with compaction that was brought up recently and
that I want to tackle next.

> Attached are the full /proc/pagetypeinfo and /proc/buddyinfo from both
> kernels on machines with similar uptimes and directly after invoking
> compaction. As you can see, the buddy lists are much more fragmented
> on 4.0, with unmovable/reclaimable allocations polluting more blocks.
> 
> Any thoughts on this would be greatly appreciated. I can test patches.

I guess testing a revert of 9c0415e could give us some idea. Commit
3a1086f shouldn't result in pageblock marking differences, and as I said
above, 99592d5 should just be restoring what 3.10 did.

* Re: Regression in mobility grouping?
  2016-09-28  1:41 Regression in mobility grouping? Johannes Weiner
@ 2016-09-28 10:26   ` Mel Gorman
From: Mel Gorman @ 2016-09-28 10:26 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Vlastimil Babka, Joonsoo Kim, linux-mm, linux-kernel, kernel-team

On Tue, Sep 27, 2016 at 09:41:48PM -0400, Johannes Weiner wrote:
> Hi guys,
> 
> we noticed what looks like a regression in page mobility grouping
> during an upgrade from 3.10 to 4.0. Identical machines, workloads, and
> uptime, but /proc/pagetypeinfo on 3.10 looks like this:
> 
> Number of blocks type     Unmovable  Reclaimable      Movable      Reserve      Isolate 
> Node 1, zone   Normal          815          433        31518            2            0 
> 
> and on 4.0 like this:
> 
> Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate 
> Node 1, zone   Normal         3880         3530        25356            2            0            0 
> 

The number of unmovable pageblocks is not necessarily related to the
number of unmovable pages in the system, although it is obviously a
concern.
Basically there are two usual approaches to investigating this -- close
attention to the extfrag tracepoint and analysing high-order allocation
failures.

It's drastic, but when migration grouping was first implemented it was
necessary to use a variation of PAGE_OWNER to walk the movable pageblocks
identifying unmovable allocations in there. I also used to have a
debugging patch that would print out the owner of all pages that failed
to migrate within an unmovable block. Unfortunately I don't have these
patches any more and they wouldn't apply anyway but it'd be easier to
implement today than it was 7-8 years ago.

> 4.0 is either polluting pageblocks more aggressively at allocation, or
> is not able to make pageblocks movable again when the reclaimable and
> unmovable allocations are released. Invoking compaction manually
> (/proc/sys/vm/compact_memory) is not bringing them back, either.
> 
> The problem we are debugging is that these machines have a very high
> rate of order-3 allocations (fdtable during fork, network rx), and
> after the upgrade allocstalls have increased dramatically. I'm not
> entirely sure this is the same issue, since even order-0 allocations
> are struggling, but the mobility grouping in itself looks problematic.
> 

Network RX is likely to be atomic allocations. Another potential place
to focus on is the use of HighAtomic pageblocks, either increasing
their size or protecting them more aggressively.

> I'm still going through the changes relevant to mobility grouping in
> that timeframe, but if this rings a bell for anyone, it would help. I
> hate blaming random patches, but these caught my eye:
> 
> 9c0415e mm: more aggressive page stealing for UNMOVABLE allocations
> 3a1086f mm: always steal split buddies in fallback allocations
> 99592d5 mm: when stealing freepages, also take pages created by splitting buddy page
> 
> The changelog states that by aggressively stealing split buddy pages
> during a fallback allocation we avoid subsequent stealing. But since
> there are generally more movable/reclaimable pages available, and so
> less falling back and stealing freepages on behalf of movable, won't
> this mean that we could expect exactly that result - growing numbers
> of unmovable blocks, while rarely stealing them back in movable alloc
> fallbacks? And the expansion of !MOVABLE blocks would over time make
> compaction less and less effective too, seeing as it doesn't consider
> anything !MOVABLE suitable migration targets?
> 

It's a solid theory. There has been a lot of activity to weaken
fragmentation avoidance protection to reduce latency. Unfortunately,
external fragmentation continues to be one of those topics that is very
difficult to pin down precisely, because whether it matters at all is
itself a matter of definition.

Another avenue worth considering is that compaction used to scan unmovable
pageblocks and migrate movable pages out of there but that was weakened
over time trying to allocate THP pages from direct allocation context
quickly enough. I'm not exactly sure what we do there at the moment and
whether kcompactd cleans unmovable pageblocks or not. It takes time but
it also reduces unmovable pageblock steals over time (or at least it did
a few years ago when I last investigated this in depth).

Unfortunately I do not have any suggestions offhand on how it could be
easily improved without going back to first principles and identifying
what pages end up in awkward positions, why and whether the cost of
"cleaning" unmovable pageblocks during compaction for a high-order
allocation is justified or not.

-- 
Mel Gorman
SUSE Labs

* Re: Regression in mobility grouping?
  2016-09-28  9:00   ` Vlastimil Babka
@ 2016-09-28 15:39     ` Johannes Weiner
From: Johannes Weiner @ 2016-09-28 15:39 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Mel Gorman, Joonsoo Kim, linux-mm, linux-kernel, kernel-team

Hi Vlastimil,

On Wed, Sep 28, 2016 at 11:00:15AM +0200, Vlastimil Babka wrote:
> On 09/28/2016 03:41 AM, Johannes Weiner wrote:
> > Hi guys,
> > 
> > we noticed what looks like a regression in page mobility grouping
> > during an upgrade from 3.10 to 4.0. Identical machines, workloads, and
> > uptime, but /proc/pagetypeinfo on 3.10 looks like this:
> > 
> > Number of blocks type     Unmovable  Reclaimable      Movable      Reserve      Isolate 
> > Node 1, zone   Normal          815          433        31518            2            0 
> > 
> > and on 4.0 like this:
> > 
> > Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate 
> > Node 1, zone   Normal         3880         3530        25356            2            0            0 
> 
> It's worth keeping in mind that this doesn't reflect where the actual
> unmovable pages reside. It might be that in 3.10 they are spread within
> the movable pages. IIRC enabling page_owner (not sure if in 4.0, there
> were some later fixes I think) can augment pagetypeinfo with at least
> some statistics of polluted pageblocks.

Thanks, I'll look at the mixed block counts. I failed to make clear
that we saw this issue in the switch from 3.10 to 4.0, and I mentioned
those two kernels as last known good / first known bad. But later
kernels - we tried with 4.6 - look the same. This appears to be a
regression in (higher-order) allocation service quality somewhere
after 3.10 that persists into current kernels.

> Does e.g. /proc/meminfo suggest how much unmovable/reclaimable memory
> should be allocated, and whether it would fill the respective
> pageblocks or whether they are poorly utilized?

They are very poorly utilized. On a machine with 90% anon/cache pages
alone we saw 50% of the page blocks unmovable.

> > 4.0 is either polluting pageblocks more aggressively at allocation, or
> > is not able to make pageblocks movable again when the reclaimable and
> > unmovable allocations are released. Invoking compaction manually
> > (/proc/sys/vm/compact_memory) is not bringing them back, either.
> >
> > The problem we are debugging is that these machines have a very high
> > rate of order-3 allocations (fdtable during fork, network rx), and
> > after the upgrade allocstalls have increased dramatically. I'm not
> > entirely sure this is the same issue, since even order-0 allocations
> > are struggling, but the mobility grouping in itself looks problematic.
> > 
> > I'm still going through the changes relevant to mobility grouping in
> > that timeframe, but if this rings a bell for anyone, it would help. I
> > hate blaming random patches, but these caught my eye:
> > 
> > 9c0415e mm: more aggressive page stealing for UNMOVABLE allocations
> > 3a1086f mm: always steal split buddies in fallback allocations
> > 99592d5 mm: when stealing freepages, also take pages created by splitting buddy page
> 
> Check also the changelogs for mentions of earlier commits, e.g. 99592d5
> should be restoring behavior that changed in 3.12-3.13 and you are
> upgrading from 3.10.

Good point.

> > The changelog states that by aggressively stealing split buddy pages
> > during a fallback allocation we avoid subsequent stealing. But since
> > there are generally more movable/reclaimable pages available, and so
> > less falling back and stealing freepages on behalf of movable, won't
> > this mean that we could expect exactly that result - growing numbers
> > of unmovable blocks, while rarely stealing them back in movable alloc
> > fallbacks? And the expansion of !MOVABLE blocks would over time make
> > compaction less and less effective too, seeing as it doesn't consider
> > anything !MOVABLE suitable migration targets?
> 
> Yeah, this is an issue with compaction that was brought up recently and
> that I want to tackle next.

Agreed, it would be nice if compaction could reclaim unmovable and
reclaimable blocks whose polluting allocations have since been freed.

But there is a limit to how lazy mobility grouping can be while still
expecting compaction to fix things up. If 50% of the page blocks are marked
unmovable, we don't pack incoming polluting allocations. When spread
out the right way, even just a few of those can have a devastating
impact on overall compactability.

So regardless of future compaction improvements, we need to get
anti-frag accuracy in the allocator closer to 3.10 levels again.

> > Attached are the full /proc/pagetypeinfo and /proc/buddyinfo from both
> > kernels on machines with similar uptimes and directly after invoking
> > compaction. As you can see, the buddy lists are much more fragmented
> > on 4.0, with unmovable/reclaimable allocations polluting more blocks.
> > 
> > Any thoughts on this would be greatly appreciated. I can test patches.
> 
> I guess testing a revert of 9c0415e could give us some idea. Commit
> 3a1086f shouldn't result in pageblock marking differences, and as I said
> above, 99592d5 should just be restoring what 3.10 did.

I can give this a shot, but note that this commit makes only unmovable
stealing more aggressive. We see reclaimable blocks up as well.

The workload is fairly variable, so it'll take about a day to smooth
out a meaningful average.

Thanks for your insights, Vlastimil!

* Re: Regression in mobility grouping?
  2016-09-28 10:26   ` Mel Gorman
@ 2016-09-28 16:37     ` Johannes Weiner
From: Johannes Weiner @ 2016-09-28 16:37 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Vlastimil Babka, Joonsoo Kim, linux-mm, linux-kernel, kernel-team

Hi Mel,

On Wed, Sep 28, 2016 at 11:26:09AM +0100, Mel Gorman wrote:
> On Tue, Sep 27, 2016 at 09:41:48PM -0400, Johannes Weiner wrote:
> > Hi guys,
> > 
> > we noticed what looks like a regression in page mobility grouping
> > during an upgrade from 3.10 to 4.0. Identical machines, workloads, and
> > uptime, but /proc/pagetypeinfo on 3.10 looks like this:
> > 
> > Number of blocks type     Unmovable  Reclaimable      Movable      Reserve      Isolate 
> > Node 1, zone   Normal          815          433        31518            2            0 
> > 
> > and on 4.0 like this:
> > 
> > Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate 
> > Node 1, zone   Normal         3880         3530        25356            2            0            0 
> > 
> 
> The number of unmovable pageblocks is not necessarily related to the
> number of unmovable pages in the system, although it is obviously a
> concern.
> Basically there are two usual approaches to investigating this -- close
> attention to the extfrag tracepoint and analysing high-order allocation
> failures.
> 
> It's drastic, but when migration grouping was first implemented it was
> necessary to use a variation of PAGE_OWNER to walk the movable pageblocks
> identifying unmovable allocations in there. I also used to have a
> debugging patch that would print out the owner of all pages that failed
> to migrate within an unmovable block. Unfortunately I don't have these
> patches any more and they wouldn't apply anyway but it'd be easier to
> implement today than it was 7-8 years ago.

I've stared at the extfrag tracepoint for a while, and there really is
a high rate of block conversion going on, even after some uptime. But
it's not entirely obvious why. You'd think with large parts of memory
already in unmovable blocks - and we know them to be sparse based on
consumer breakdown in /proc/meminfo - there should be enough existing
blocks to choose from.

The PAGE_OWNER part of /proc/pagetypeinfo should be a good start for
seeing how efficiently we're packing by type. Thanks, I'll check that.

> > 4.0 is either polluting pageblocks more aggressively at allocation, or
> > is not able to make pageblocks movable again when the reclaimable and
> > unmovable allocations are released. Invoking compaction manually
> > (/proc/sys/vm/compact_memory) is not bringing them back, either.
> > 
> > The problem we are debugging is that these machines have a very high
> > rate of order-3 allocations (fdtable during fork, network rx), and
> > after the upgrade allocstalls have increased dramatically. I'm not
> > entirely sure this is the same issue, since even order-0 allocations
> > are struggling, but the mobility grouping in itself looks problematic.
> > 
> 
> Network RX is likely to be atomic allocations. Another potential place
> to focus on is the use of HighAtomic pageblocks, either increasing
> their size or protecting them more aggressively.

That's a good point in general for these machines and their workloads,
since we push them pretty hard with a combination of high memory
utilization and heavy network traffic with large packet sizes.

But note that MIGRATE_HIGHATOMIC was introduced only after the first
bad kernel.

> > I'm still going through the changes relevant to mobility grouping in
> > that timeframe, but if this rings a bell for anyone, it would help. I
> > hate blaming random patches, but these caught my eye:
> > 
> > 9c0415e mm: more aggressive page stealing for UNMOVABLE allocations
> > 3a1086f mm: always steal split buddies in fallback allocations
> > 99592d5 mm: when stealing freepages, also take pages created by splitting buddy page
> > 
> > The changelog states that by aggressively stealing split buddy pages
> > during a fallback allocation we avoid subsequent stealing. But since
> > there are generally more movable/reclaimable pages available, and so
> > less falling back and stealing freepages on behalf of movable, won't
> > this mean that we could expect exactly that result - growing numbers
> > of unmovable blocks, while rarely stealing them back in movable alloc
> > fallbacks? And the expansion of !MOVABLE blocks would over time make
> > compaction less and less effective too, seeing as it doesn't consider
> > anything !MOVABLE suitable migration targets?
> > 
> 
> It's a solid theory. There has been a lot of activity to weaken
> fragmentation avoidance protection to reduce latency. Unfortunately,
> external fragmentation continues to be one of those topics that is very
> difficult to pin down precisely, because whether it matters at all is
> itself a matter of definition.

While I generally agree that it's a matter of degree, and a trade-off
between cost and accuracy, what we're observing here is a continued
deterioration of mobility grouping accuracy with uptime, to the point
of over half of memory being in unmovable/reclaimable blocks when the
majority of memory is movable allocations.

The consequences of that are devastating, because actually unmovable
allocations will be grouped less and less efficiently, and that in
turn affects the cost and effectiveness of every compaction run.

> Another avenue worth considering is that compaction used to scan unmovable
> pageblocks and migrate movable pages out of there but that was weakened
> over time trying to allocate THP pages from direct allocation context
> quickly enough. I'm not exactly sure what we do there at the moment and
> whether kcompactd cleans unmovable pageblocks or not. It takes time but
> it also reduces unmovable pageblock steals over time (or at least it did
> a few years ago when I last investigated this in depth).

I don't believe it does. There is a migrate_async_suitable() check
that skips over everything that isn't MOVABLE, but in spite of the
name this check is done for all compaction modes, see:

 isolate_freepages()
  suitable_migration_target()
   migrate_async_suitable()

That's why not even /proc/sys/vm/compact_memory would be able to
defragment these blocks right now.
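
Roughly, that check boils down to the following - paraphrasing from
memory, so not verbatim 4.6 code:

/*
 * Only MOVABLE (and CMA) pageblocks are accepted as freepage targets,
 * even for sync/manual compaction, despite the "async" in the name.
 */
static inline bool migrate_async_suitable(int migratetype)
{
        return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
}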

But the more I think about this issue, the more I think compaction is
the wrong place to address this. Inefficiently packed unmovable blocks
will be less compactable, regardless of how many times the compaction
scanner looks at them. Compaction might be able to get a few chunks in
between actually unmovable pages, but the maximum size of these chunks
will be severely limited, and they won't be able to coalesce with the
surrounding chunks. Compaction cannot really fix up what mobility
grouping lets slide, so reallocating effort from allocation grouping
to compaction scanning will always be a net loss at higher uptimes.

> Unfortunately I do not have any suggestions offhand on how it could be
> easily improved without going back to first principles and identifying
> what pages end up in awkward positions, why and whether the cost of
> "cleaning" unmovable pageblocks during compaction for a high-order
> allocation is justified or not.

I don't think this particular case is a trade-off situation. From 3.10
to current kernels, we have seen both allocation latencies and overall
throughput (number of DB reqs handled per second) get worse.

Thanks!

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Regression in mobility grouping?
@ 2016-09-28 16:37     ` Johannes Weiner
  0 siblings, 0 replies; 43+ messages in thread
From: Johannes Weiner @ 2016-09-28 16:37 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Vlastimil Babka, Joonsoo Kim, linux-mm, linux-kernel, kernel-team

Hi Mel,

On Wed, Sep 28, 2016 at 11:26:09AM +0100, Mel Gorman wrote:
> On Tue, Sep 27, 2016 at 09:41:48PM -0400, Johannes Weiner wrote:
> > Hi guys,
> > 
> > we noticed what looks like a regression in page mobility grouping
> > during an upgrade from 3.10 to 4.0. Identical machines, workloads, and
> > uptime, but /proc/pagetypeinfo on 3.10 looks like this:
> > 
> > Number of blocks type     Unmovable  Reclaimable      Movable      Reserve      Isolate 
> > Node 1, zone   Normal          815          433        31518            2            0 
> > 
> > and on 4.0 like this:
> > 
> > Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate 
> > Node 1, zone   Normal         3880         3530        25356            2            0            0 
> > 
> 
> Unmovable pageblocks is not necessarily related to the number of
> unmovable pages in the system although it is obviously a concern.
> Basically there are two usual approaches to investigating this -- close
> attention to the extfrag tracepoint and analysing high-order allocation
> failures.
> 
> It's drastic, but when migration grouping was first implemented it was
> necessary to use a variation of PAGE_OWNER to walk the movable pageblocks
> identifying unmovable allocations in there. I also used to have a
> debugging patch that would print out the owner of all pages that failed
> to migrate within an unmovable block. Unfortunately I don't have these
> patches any more and they wouldn't apply anyway but it'd be easier to
> implement today than it was 7-8 years ago.

I've stared at the extfrag tracepoint for a while, and there really is
a high rate of block conversion going on, even after some uptime. But
it's not entirely obvious why. You'd think with large parts of memory
already in unmovable blocks - and we know them to be sparse based on
consumer breakdown in /proc/meminfo - there should be enough existing
blocks to choose from.

The PAGE_OWNER part of /proc/pagetypeinfo should be a good start for
seeing how efficiently we're packing by type. Thanks, I'll check that.

> > 4.0 is either polluting pageblocks more aggressively at allocation, or
> > is not able to make pageblocks movable again when the reclaimable and
> > unmovable allocations are released. Invoking compaction manually
> > (/proc/sys/vm/compact_memory) is not bringing them back, either.
> > 
> > The problem we are debugging is that these machines have a very high
> > rate of order-3 allocations (fdtable during fork, network rx), and
> > after the upgrade allocstalls have increased dramatically. I'm not
> > entirely sure this is the same issue, since even order-0 allocations
> > are struggling, but the mobility grouping in itself looks problematic.
> > 
> 
> Network RX is likely to be atomic allocations. Another potentially place
> to focus on is the use of HighAtomic pageblocks and either increasing
> them in size or protecting them more aggressively.

That's a good point in general for these machines and their workloads,
since we push them pretty hard with a combination of high memory
utilization and heavy network traffic with large packets sizes.

But note that MIGRATE_HIGHATOMIC was introduced only after the first
bad kernel.

> > I'm still going through the changes relevant to mobility grouping in
> > that timeframe, but if this rings a bell for anyone, it would help. I
> > hate blaming random patches, but these caught my eye:
> > 
> > 9c0415e mm: more aggressive page stealing for UNMOVABLE allocations
> > 3a1086f mm: always steal split buddies in fallback allocations
> > 99592d5 mm: when stealing freepages, also take pages created by splitting buddy page
> > 
> > The changelog states that by aggressively stealing split buddy pages
> > during a fallback allocation we avoid subsequent stealing. But since
> > there are generally more movable/reclaimable pages available, and so
> > less falling back and stealing freepages on behalf of movable, won't
> > this mean that we could expect exactly that result - growing numbers
> > of unmovable blocks, while rarely stealing them back in movable alloc
> > fallbacks? And the expansion of !MOVABLE blocks would over time make
> > compaction less and less effective too, seeing as it doesn't consider
> > anything !MOVABLE suitable migration targets?
> > 
> 
> It's a solid theory. There has been a lot of activity to weaken fragmentation
> avoidance protection to reduce latency. Unfortunately external fragmentation
> continues to be one of those topics that is very difficult to precisely
> define because it's a matter of definition whether it's important or
> not.

While I generally agree that it's a matter of degree, and a trade-off
between cost and accuracy, what we're observing here is a continued
deterioration of mobility grouping accuracy with uptime, to the point
of over half of memory being in unmovable/reclaimable blocks when the
majority of memory is movable allocations.

The consequences of that are devastating, because actually unmovable
allocations will be grouped less and less efficiently, and that in
turn affects the cost and effectiveness of every compaction run.

> Another avenue worth considering is that compaction used to scan unmovable
> pageblocks and migrate movable pages out of there but that was weakened
> over time trying to allocate THP pages from direct allocation context
> quickly enough. I'm not exactly sure what we do there at the moment and
> whether kcompactd cleans unmovable pageblocks or not. It takes time but
> it also reduces unmovable pageblock steals over time (or at least it did
> a few years ago when I last investigated this in depth).

I don't believe it does. There is a migrate_async_suitable() check
that skips over everything that isn't MOVABLE, but in spite of the
name this check is done for all compaction modes, see:

 isolate_freepages()
  suitable_migration_target()
   migrate_async_suitable()

That's why not even /proc/sys/vm/compact_memory would be able to
defragment these blocks right now.

But the more I think about this issue, the more I think compaction is
the wrong place to address this. Inefficiently packed unmovable blocks
will be less compactable, regardless of how many times the compaction
scanner looks at them. Compaction might be able to get a few chunks in
between actually unmovable pages, but the maximum size of these chunks
will be severely limited, and they won't be able to coalesce with the
surrounding chunks. Compaction cannot really fix up what mobility
grouping lets slide, so reallocating effort from allocation grouping
to compaction scanning will always be a net loss at higher uptimes.

> Unfortunately I do not have any suggestions offhand on how it could be
> easily improved without going back to first principals and identifying
> what pages end up in awkward positions, why and whether the cost of
> "cleaning" unmovable pageblocks during compaction for a high-order
> allocation is justified or not.

I don't think this particular case is a trade-off situation. From 3.10
to current kernels, we have seen both allocation latencies and overall
throughput (number of DB reqs handled per second) get worse.

Thanks!

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Regression in mobility grouping?
  2016-09-28 15:39     ` Johannes Weiner
@ 2016-09-29  2:25       ` Johannes Weiner
  -1 siblings, 0 replies; 43+ messages in thread
From: Johannes Weiner @ 2016-09-29  2:25 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Mel Gorman, Joonsoo Kim, linux-mm, linux-kernel, kernel-team

On Wed, Sep 28, 2016 at 11:39:25AM -0400, Johannes Weiner wrote:
> On Wed, Sep 28, 2016 at 11:00:15AM +0200, Vlastimil Babka wrote:
> > I guess testing revert of 9c0415e could give us some idea. Commit
> > 3a1086f shouldn't result in pageblock marking differences and as I said
> > above, 99592d5 should be just restoring to what 3.10 did.
> 
> I can give this a shot, but note that this commit makes only unmovable
> stealing more aggressive. We see reclaimable blocks up as well.

Quick update, I reverted back to stealing eagerly only on behalf of
MIGRATE_RECLAIMABLE allocations in a 4.6 kernel:

static bool can_steal_fallback(unsigned int order, int start_mt)
{
        if (order >= pageblock_order / 2 ||
            start_mt == MIGRATE_RECLAIMABLE ||
            page_group_by_mobility_disabled)
                return true;

        return false;
}

Yet, I still see UNMOVABLE growing to the thousands within minutes,
whereas 3.10 didn't reach those numbers even after days of uptime.

Okay, that wasn't it. However, there is something fishy going on,
because I see extfrag traces like these:

<idle>-0     [006] d.s.  1110.217281: mm_page_alloc_extfrag: page=ffffea0064142000 pfn=26235008 alloc_order=3 fallback_order=3 pageblock_order=9 alloc_migratetype=0 fallback_migratetype=2 fragmenting=1 change_ownership=1

enum {
        MIGRATE_UNMOVABLE,
        MIGRATE_MOVABLE,
        MIGRATE_RECLAIMABLE,
        MIGRATE_PCPTYPES,       /* the number of types on the pcp lists */
        MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
	...
};

This is an UNMOVABLE order-3 allocation falling back to RECLAIMABLE.
According to can_steal_fallback(), this allocation shouldn't steal the
pageblock, yet change_ownership=1 indicates the block is UNMOVABLE.

Who converted it? I wonder if there is a bug in ownership management,
and there was an UNMOVABLE block on the RECLAIMABLE freelist from the
beginning. AFAICS we never validate list/mt consistency anywhere.
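
If it comes to that, a debug check along these lines (untested sketch,
the function name is made up) could walk the free lists under
zone->lock and warn whenever a free page's pageblock type disagrees
with the list it sits on:

/*
 * Untested debugging aid: report free pages whose pageblock migratetype
 * doesn't match the free list they are queued on. Some divergence is
 * tolerated by design, so this only shows how common it is.
 */
static void check_freelist_migratetypes(struct zone *zone)
{
	unsigned int order;
	int mt;
	unsigned long flags;
	struct page *page;

	spin_lock_irqsave(&zone->lock, flags);
	for_each_migratetype_order(order, mt) {
		list_for_each_entry(page,
				&zone->free_area[order].free_list[mt], lru) {
			int block_mt = get_pageblock_migratetype(page);

			if (block_mt != mt)
				pr_warn("%s: order-%u page on list %d, pageblock is %d\n",
					zone->name, order, mt, block_mt);
		}
	}
	spin_unlock_irqrestore(&zone->lock, flags);
}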

I'll continue looking tomorrow.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Regression in mobility grouping?
  2016-09-29  2:25       ` Johannes Weiner
@ 2016-09-29  6:14         ` Joonsoo Kim
  -1 siblings, 0 replies; 43+ messages in thread
From: Joonsoo Kim @ 2016-09-29  6:14 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Vlastimil Babka, Mel Gorman, linux-mm, linux-kernel, kernel-team

On Wed, Sep 28, 2016 at 10:25:40PM -0400, Johannes Weiner wrote:
> On Wed, Sep 28, 2016 at 11:39:25AM -0400, Johannes Weiner wrote:
> > On Wed, Sep 28, 2016 at 11:00:15AM +0200, Vlastimil Babka wrote:
> > > I guess testing revert of 9c0415e could give us some idea. Commit
> > > 3a1086f shouldn't result in pageblock marking differences and as I said
> > > above, 99592d5 should be just restoring to what 3.10 did.
> > 
> > I can give this a shot, but note that this commit makes only unmovable
> > stealing more aggressive. We see reclaimable blocks up as well.
> 
> Quick update, I reverted back to stealing eagerly only on behalf of
> MIGRATE_RECLAIMABLE allocations in a 4.6 kernel:

Hello, Johannes.

I think it would be better to check 3.10 with the above patches.
Fragmentation depends not only on the policy itself but also on the
allocation/free pattern. There is a good chance that the allocation/free
pattern has changed across such a large kernel version difference.

> 
> static bool can_steal_fallback(unsigned int order, int start_mt)
> {
>         if (order >= pageblock_order / 2 ||
>             start_mt == MIGRATE_RECLAIMABLE ||
>             page_group_by_mobility_disabled)
>                 return true;
> 
>         return false;
> }
> 
> Yet, I still see UNMOVABLE growing to the thousands within minutes,
> whereas 3.10 didn't reach those numbers even after days of uptime.
> 
> Okay, that wasn't it. However, there is something fishy going on,
> because I see extfrag traces like these:
> 
> <idle>-0     [006] d.s.  1110.217281: mm_page_alloc_extfrag: page=ffffea0064142000 pfn=26235008 alloc_order=3 fallback_order=3 pageblock_order=9 alloc_migratetype=0 fallback_migratetype=2 fragmenting=1 change_ownership=1
> 
> enum {
>         MIGRATE_UNMOVABLE,
>         MIGRATE_MOVABLE,
>         MIGRATE_RECLAIMABLE,
>         MIGRATE_PCPTYPES,       /* the number of types on the pcp lists */
>         MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
> 	...
> };
> 
> This is an UNMOVABLE order-3 allocation falling back to RECLAIMABLE.
> According to can_steal_fallback(), this allocation shouldn't steal the
> pageblock, yet change_ownership=1 indicates the block is UNMOVABLE.
> 
> Who converted it? I wonder if there is a bug in ownership management,
> and there was an UNMOVABLE block on the RECLAIMABLE freelist from the
> beginning. AFAICS we never validate list/mt consistency anywhere.

According to my code review, that would be possible. When stealing
happens, we move the stolen buddy pages to the buddy list of the
currently requested migratetype. If an allocation request of another
migratetype then comes in and steals from the buddy list of the
previously requested migratetype, change_ownership will show '1' even
though no ownership actually changed.

Thanks.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Regression in mobility grouping?
  2016-09-29  2:25       ` Johannes Weiner
@ 2016-09-29  7:17         ` Vlastimil Babka
  -1 siblings, 0 replies; 43+ messages in thread
From: Vlastimil Babka @ 2016-09-29  7:17 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Mel Gorman, Joonsoo Kim, linux-mm, linux-kernel, kernel-team

On 09/29/2016 04:25 AM, Johannes Weiner wrote:
> On Wed, Sep 28, 2016 at 11:39:25AM -0400, Johannes Weiner wrote:
>> On Wed, Sep 28, 2016 at 11:00:15AM +0200, Vlastimil Babka wrote:
>>> I guess testing revert of 9c0415e could give us some idea. Commit
>>> 3a1086f shouldn't result in pageblock marking differences and as I said
>>> above, 99592d5 should be just restoring to what 3.10 did.
>>
>> I can give this a shot, but note that this commit makes only unmovable
>> stealing more aggressive. We see reclaimable blocks up as well.
>
> Quick update, I reverted back to stealing eagerly only on behalf of
> MIGRATE_RECLAIMABLE allocations in a 4.6 kernel:
>
> static bool can_steal_fallback(unsigned int order, int start_mt)
> {
>         if (order >= pageblock_order / 2 ||
>             start_mt == MIGRATE_RECLAIMABLE ||
>             page_group_by_mobility_disabled)
>                 return true;
>
>         return false;
> }
>
> Yet, I still see UNMOVABLE growing to the thousands within minutes,
> whereas 3.10 didn't reach those numbers even after days of uptime.
>
> Okay, that wasn't it. However, there is something fishy going on,
> because I see extfrag traces like these:
>
> <idle>-0     [006] d.s.  1110.217281: mm_page_alloc_extfrag: page=ffffea0064142000 pfn=26235008 alloc_order=3 fallback_order=3 pageblock_order=9 alloc_migratetype=0 fallback_migratetype=2 fragmenting=1 change_ownership=1
>
> enum {
>         MIGRATE_UNMOVABLE,
>         MIGRATE_MOVABLE,
>         MIGRATE_RECLAIMABLE,
>         MIGRATE_PCPTYPES,       /* the number of types on the pcp lists */
>         MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
> 	...
> };
>
> This is an UNMOVABLE order-3 allocation falling back to RECLAIMABLE.
> According to can_steal_fallback(), this allocation shouldn't steal the
> pageblock, yet change_ownership=1 indicates the block is UNMOVABLE.
>
> Who converted it? I wonder if there is a bug in ownership management,
> and there was an UNMOVABLE block on the RECLAIMABLE freelist from the
> beginning. AFAICS we never validate list/mt consistency anywhere.

Hm yes, for performance reasons there are e.g. no strong guarantees that
a free page sits on the freelist matching its pageblock's migratetype,
except for ISOLATE. IIRC the pageblock type is checked when putting a
page on the pcplist, and may then diverge before the page is flushed to
the freelist. So it's possible the fallback page was on the RECLAIMABLE
list while its pageblock was marked as UNMOVABLE.
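
IIRC the free path looks roughly like this (simplified and from memory,
so the exact code may differ):

/* at free time, the pageblock type is read once and cached in the page */
void free_hot_cold_page(struct page *page, bool cold)
{
	...
	migratetype = get_pfnblock_migratetype(page, pfn);
	set_pcppage_migratetype(page, migratetype);
	...
}

/* when the pcplist is drained, the cached value picks the freelist */
static void free_pcppages_bulk(struct zone *zone, int count,
					struct per_cpu_pages *pcp)
{
	...
	mt = get_pcppage_migratetype(page);
	...
	__free_one_page(page, page_to_pfn(page), zone, 0, mt);
	...
}

AFAICS only the ISOLATE case re-reads the pageblock type at that point,
which is why that one stays consistent.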

Also, the tracepoint is deliberately racy, so that steal_suitable_fallback()
doesn't have to communicate back whether it truly stole the whole pageblock.

> I'll continue looking tomorrow.
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Regression in mobility grouping?
  2016-09-29  6:14         ` Joonsoo Kim
@ 2016-09-29 16:14           ` Johannes Weiner
  -1 siblings, 0 replies; 43+ messages in thread
From: Johannes Weiner @ 2016-09-29 16:14 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Vlastimil Babka, Mel Gorman, linux-mm, linux-kernel, kernel-team

On Thu, Sep 29, 2016 at 03:14:33PM +0900, Joonsoo Kim wrote:
> On Wed, Sep 28, 2016 at 10:25:40PM -0400, Johannes Weiner wrote:
> > On Wed, Sep 28, 2016 at 11:39:25AM -0400, Johannes Weiner wrote:
> > > On Wed, Sep 28, 2016 at 11:00:15AM +0200, Vlastimil Babka wrote:
> > > > I guess testing revert of 9c0415e could give us some idea. Commit
> > > > 3a1086f shouldn't result in pageblock marking differences and as I said
> > > > above, 99592d5 should be just restoring to what 3.10 did.
> > > 
> > > I can give this a shot, but note that this commit makes only unmovable
> > > stealing more aggressive. We see reclaimable blocks up as well.
> > 
> > Quick update, I reverted back to stealing eagerly only on behalf of
> > MIGRATE_RECLAIMABLE allocations in a 4.6 kernel:
> 
> Hello, Johannes.
> 
> I think it would be better to check 3.10 with the above patches.
> Fragmentation depends not only on the policy itself but also on the
> allocation/free pattern. There is a good chance that the allocation/free
> pattern has changed across such a large kernel version difference.

You mean backport suspicious patches to 3.10 until I can reproduce it
there? I'm not sure. You're correct, the patterns very likely *have*
changed. But that alone cannot explain mobility grouping breaking that
badly. There is a reproducible bad behavior. It should be easier to
track down than to try to recreate it in the last-known-good kernel.

> > This is an UNMOVABLE order-3 allocation falling back to RECLAIMABLE.
> > According to can_steal_fallback(), this allocation shouldn't steal the
> > pageblock, yet change_ownership=1 indicates the block is UNMOVABLE.
> > 
> > Who converted it? I wonder if there is a bug in ownership management,
> > and there was an UNMOVABLE block on the RECLAIMABLE freelist from the
> > beginning. AFAICS we never validate list/mt consistency anywhere.
> 
> According to my code review, that would be possible. When stealing
> happens, we move the stolen buddy pages to the buddy list of the
> currently requested migratetype. If an allocation request of another
> migratetype then comes in and steals from the buddy list of the
> previously requested migratetype, change_ownership will show '1' even
> though no ownership actually changed.

These two paths should exclude each other through the zone->lock, no?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC 0/4] try to reduce fragmenting fallbacks
  2016-09-28  1:41 Regression in mobility grouping? Johannes Weiner
@ 2016-09-29 21:05   ` Vlastimil Babka
  2016-09-28 10:26   ` Mel Gorman
  2016-09-29 21:05   ` Vlastimil Babka
  2 siblings, 0 replies; 43+ messages in thread
From: Vlastimil Babka @ 2016-09-29 21:05 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Mel Gorman, Joonsoo Kim, linux-kernel, linux-mm, kernel-team,
	Vlastimil Babka

Hi Johannes,

here's something quick to try or to ponder. It's untested, however, since
it's too late here. Based on mmotm-2016-09-27-16-08 plus this fix [1]

[1] http://lkml.kernel.org/r/<cadadd38-6456-f58e-504f-cc18ddc47b3f@suse.cz>

Vlastimil Babka (4):
  mm, compaction: change migrate_async_suitable() to
    suitable_migration_source()
  mm, compaction: add migratetype to compact_control
  mm, compaction: restrict async compaction to matching migratetype
  mm, page_alloc: disallow migratetype fallback in fastpath

 include/linux/mmzone.h |  5 +++++
 mm/compaction.c        | 41 +++++++++++++++++++++++++----------------
 mm/internal.h          |  2 ++
 mm/page_alloc.c        | 34 +++++++++++++++++++++++-----------
 4 files changed, 55 insertions(+), 27 deletions(-)

-- 
2.10.0

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC 1/4] mm, compaction: change migrate_async_suitable() to suitable_migration_source()
  2016-09-29 21:05   ` Vlastimil Babka
@ 2016-09-29 21:05     ` Vlastimil Babka
  -1 siblings, 0 replies; 43+ messages in thread
From: Vlastimil Babka @ 2016-09-29 21:05 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Mel Gorman, Joonsoo Kim, linux-kernel, linux-mm, kernel-team,
	Vlastimil Babka

Preparation for making the decisions more complex and dependent on
compact_control flags. No functional change.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/mmzone.h |  5 +++++
 mm/compaction.c        | 19 +++++++++++--------
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 454495cc00fe..9cd3ee58ab2b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -72,6 +72,11 @@ extern char * const migratetype_names[MIGRATE_TYPES];
 #  define is_migrate_cma(migratetype) false
 #endif
 
+static inline bool is_migrate_movable(int mt)
+{
+	return is_migrate_cma(mt) || mt == MIGRATE_MOVABLE;
+}
+
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
 		for (type = 0; type < MIGRATE_TYPES; type++)
diff --git a/mm/compaction.c b/mm/compaction.c
index 6e77b2016da1..673a81618534 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -88,11 +88,6 @@ static void map_pages(struct list_head *list)
 	list_splice(&tmp_list, list);
 }
 
-static inline bool migrate_async_suitable(int migratetype)
-{
-	return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
-}
-
 #ifdef CONFIG_COMPACTION
 
 int PageMovable(struct page *page)
@@ -996,6 +991,15 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
 #endif /* CONFIG_COMPACTION || CONFIG_CMA */
 #ifdef CONFIG_COMPACTION
 
+static bool suitable_migration_source(struct compact_control *cc,
+							struct page *page)
+{
+	if (cc->mode != MIGRATE_ASYNC)
+		return true;
+
+	return is_migrate_movable(get_pageblock_migratetype(page));
+}
+
 /* Returns true if the page is within a block suitable for migration to */
 static bool suitable_migration_target(struct compact_control *cc,
 							struct page *page)
@@ -1015,7 +1019,7 @@ static bool suitable_migration_target(struct compact_control *cc,
 	}
 
 	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
-	if (migrate_async_suitable(get_pageblock_migratetype(page)))
+	if (is_migrate_movable(get_pageblock_migratetype(page)))
 		return true;
 
 	/* Otherwise skip the block */
@@ -1250,8 +1254,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 		 * Async compaction is optimistic to see if the minimum amount
 		 * of work satisfies the allocation.
 		 */
-		if (cc->mode == MIGRATE_ASYNC &&
-		    !migrate_async_suitable(get_pageblock_migratetype(page)))
+		if (!suitable_migration_source(cc, page))
 			continue;
 
 		/* Perform the isolation */
-- 
2.10.0

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC 2/4] mm, compaction: add migratetype to compact_control
  2016-09-29 21:05   ` Vlastimil Babka
@ 2016-09-29 21:05     ` Vlastimil Babka
  -1 siblings, 0 replies; 43+ messages in thread
From: Vlastimil Babka @ 2016-09-29 21:05 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Mel Gorman, Joonsoo Kim, linux-kernel, linux-mm, kernel-team,
	Vlastimil Babka

Preparation patch. We are going to need migratetype at lower layers than
compact_zone() and compact_finished().

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 15 +++++++--------
 mm/internal.h   |  1 +
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 673a81618534..823538353b80 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1290,11 +1290,12 @@ static inline bool is_via_compact_memory(int order)
 	return order == -1;
 }
 
-static enum compact_result __compact_finished(struct zone *zone, struct compact_control *cc,
-			    const int migratetype)
+static enum compact_result __compact_finished(struct zone *zone,
+						struct compact_control *cc)
 {
 	unsigned int order;
 	unsigned long watermark;
+	const int migratetype = cc->migratetype;
 
 	if (cc->contended || fatal_signal_pending(current))
 		return COMPACT_CONTENDED;
@@ -1357,12 +1358,11 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
 }
 
 static enum compact_result compact_finished(struct zone *zone,
-			struct compact_control *cc,
-			const int migratetype)
+			struct compact_control *cc)
 {
 	int ret;
 
-	ret = __compact_finished(zone, cc, migratetype);
+	ret = __compact_finished(zone, cc);
 	trace_mm_compaction_finished(zone, cc->order, ret);
 	if (ret == COMPACT_NO_SUITABLE_PAGE)
 		ret = COMPACT_CONTINUE;
@@ -1497,9 +1497,9 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 	enum compact_result ret;
 	unsigned long start_pfn = zone->zone_start_pfn;
 	unsigned long end_pfn = zone_end_pfn(zone);
-	const int migratetype = gfpflags_to_migratetype(cc->gfp_mask);
 	const bool sync = cc->mode != MIGRATE_ASYNC;
 
+	cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
 	ret = compaction_suitable(zone, cc->order, cc->alloc_flags,
 							cc->classzone_idx);
 	/* Compaction is likely to fail */
@@ -1549,8 +1549,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 
 	migrate_prep_local();
 
-	while ((ret = compact_finished(zone, cc, migratetype)) ==
-						COMPACT_CONTINUE) {
+	while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
 		int err;
 
 		switch (isolate_migratepages(zone, cc)) {
diff --git a/mm/internal.h b/mm/internal.h
index 537ac9951f5f..1fee63010dcc 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -182,6 +182,7 @@ struct compact_control {
 	bool direct_compaction;		/* False from kcompactd or /proc/... */
 	bool whole_zone;		/* Whole zone should/has been scanned */
 	int order;			/* order a direct compactor needs */
+	int migratetype;		/* migratetype of direct compactor */
 	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
 	const unsigned int alloc_flags;	/* alloc flags of a direct compactor */
 	const int classzone_idx;	/* zone index of a direct compactor */
-- 
2.10.0

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC 3/4] mm, compaction: restrict async compaction to matching migratetype
  2016-09-29 21:05   ` Vlastimil Babka
@ 2016-09-29 21:05     ` Vlastimil Babka
  -1 siblings, 0 replies; 43+ messages in thread
From: Vlastimil Babka @ 2016-09-29 21:05 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Mel Gorman, Joonsoo Kim, linux-kernel, linux-mm, kernel-team,
	Vlastimil Babka

The migrate scanner in async compaction is currently limited to MIGRATE_MOVABLE
pageblocks. This is a heuristic intended to reduce latency, based on the
assumption that non-MOVABLE pageblocks are unlikely to contain movable pages.

However, with the exception of THPs, most high-order allocations are not
movable. Should the async compaction succeed, this increases the chance that
the non-MOVABLE allocations will fall back to a MOVABLE pageblock, making the
long-term fragmentation worse.

This patch attempts to help the situation by changing async direct compaction
so that the migrate scanner only scans the pageblocks of the requested
migratetype. If it's a non-MOVABLE type and there are such pageblocks that do
contain movable pages, chances are that the allocation can succeed within one
of those pageblocks, removing the need for a fallback. If that fails, the
subsequent sync attempt will ignore this restriction.

Not-yet-signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 11 +++++++++--
 mm/page_alloc.c | 20 +++++++++++++-------
 2 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 823538353b80..eb4ccd403543 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -994,10 +994,17 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
 static bool suitable_migration_source(struct compact_control *cc,
 							struct page *page)
 {
-	if (cc->mode != MIGRATE_ASYNC)
+	int block_mt;
+
+	if ((cc->mode != MIGRATE_ASYNC) || !cc->direct_compaction)
 		return true;
 
-	return is_migrate_movable(get_pageblock_migratetype(page));
+	block_mt = get_pageblock_migratetype(page);
+
+	if (cc->migratetype == MIGRATE_MOVABLE)
+		return is_migrate_movable(block_mt);
+	else
+		return block_mt == cc->migratetype;
 }
 
 /* Returns true if the page is within a block suitable for migration to */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fab8b6913179..0c00beec9336 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3492,6 +3492,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 						struct alloc_context *ac)
 {
 	bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
+	const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;
 	struct page *page = NULL;
 	unsigned int alloc_flags;
 	unsigned long did_some_progress;
@@ -3539,12 +3540,17 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 
 	/*
 	 * For costly allocations, try direct compaction first, as it's likely
-	 * that we have enough base pages and don't need to reclaim. Don't try
-	 * that for allocations that are allowed to ignore watermarks, as the
-	 * ALLOC_NO_WATERMARKS attempt didn't yet happen.
+	 * that we have enough base pages and don't need to reclaim. For non-
+	 * movable high-order allocations, do that as well, as compaction will
+	 * try prevent permanent fragmentation by migrating from blocks of the
+	 * same migratetype.
+	 * Don't try this for allocations that are allowed to ignore
+	 * watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happen.
 	 */
-	if (can_direct_reclaim && order > PAGE_ALLOC_COSTLY_ORDER &&
-		!gfp_pfmemalloc_allowed(gfp_mask)) {
+	if (can_direct_reclaim &&
+			(costly_order ||
+			   (order > 0 && ac->migratetype != MIGRATE_MOVABLE))
+			&& !gfp_pfmemalloc_allowed(gfp_mask)) {
 		page = __alloc_pages_direct_compact(gfp_mask, order,
 						alloc_flags, ac,
 						INIT_COMPACT_PRIORITY,
@@ -3556,7 +3562,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		 * Checks for costly allocations with __GFP_NORETRY, which
 		 * includes THP page fault allocations
 		 */
-		if (gfp_mask & __GFP_NORETRY) {
+		if (costly_order && (gfp_mask & __GFP_NORETRY)) {
 			/*
 			 * If compaction is deferred for high-order allocations,
 			 * it is because sync compaction recently failed. If
@@ -3651,7 +3657,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * Do not retry costly high order allocations unless they are
 	 * __GFP_REPEAT
 	 */
-	if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
+	if (costly_order && !(gfp_mask & __GFP_REPEAT))
 		goto nopage;
 
 	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
-- 
2.10.0

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC 4/4] mm, page_alloc: disallow migratetype fallback in fastpath
  2016-09-29 21:05   ` Vlastimil Babka
@ 2016-09-29 21:05     ` Vlastimil Babka
  -1 siblings, 0 replies; 43+ messages in thread
From: Vlastimil Babka @ 2016-09-29 21:05 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Mel Gorman, Joonsoo Kim, linux-kernel, linux-mm, kernel-team,
	Vlastimil Babka

The previous patch adjusted async compaction so that it helps against
long-term fragmentation when compacting for a non-MOVABLE high-order
allocation. The goal of this patch is to force such allocations to go through
compaction once before being allowed to fall back to a pageblock of a
different migratetype (e.g. MOVABLE). In contexts where compaction is not
allowed (and for order-0 allocations), this delayed fallback possibility can
still help by trying a different zone where fallback might not be needed, and
by potentially waking up kswapd earlier.

Not-yet-signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/internal.h   |  1 +
 mm/page_alloc.c | 14 ++++++++++----
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 1fee63010dcc..a46eab383e8d 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -466,6 +466,7 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_HIGH		0x20 /* __GFP_HIGH set */
 #define ALLOC_CPUSET		0x40 /* check for correct cpuset */
 #define ALLOC_CMA		0x80 /* allow allocations from CMA areas */
+#define ALLOC_FALLBACK		0x100 /* allow fallback of migratetype */
 
 enum ttu_flags;
 struct tlbflush_unmap_batch;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0c00beec9336..8a8ef9ebeb4d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2163,7 +2163,7 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
  * Call me with the zone->lock already held.
  */
 static struct page *__rmqueue(struct zone *zone, unsigned int order,
-				int migratetype)
+				int migratetype, bool allow_fallback)
 {
 	struct page *page;
 
@@ -2172,7 +2172,7 @@ static struct page *__rmqueue(struct zone *zone, unsigned int order,
 		if (migratetype == MIGRATE_MOVABLE)
 			page = __rmqueue_cma_fallback(zone, order);
 
-		if (!page)
+		if (!page && allow_fallback)
 			page = __rmqueue_fallback(zone, order, migratetype);
 	}
 
@@ -2193,7 +2193,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 
 	spin_lock(&zone->lock);
 	for (i = 0; i < count; ++i) {
-		struct page *page = __rmqueue(zone, order, migratetype);
+		struct page *page = __rmqueue(zone, order, migratetype, true);
 		if (unlikely(page == NULL))
 			break;
 
@@ -2626,7 +2626,10 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
 					trace_mm_page_alloc_zone_locked(page, order, migratetype);
 			}
 			if (!page)
-				page = __rmqueue(zone, order, migratetype);
+				page = __rmqueue(zone, order, migratetype,
+						alloc_flags &
+						(ALLOC_FALLBACK |
+						 ALLOC_NO_WATERMARKS));
 		} while (page && check_new_pages(page, order));
 		spin_unlock(&zone->lock);
 		if (!page)
@@ -3583,6 +3586,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		}
 	}
 
+	/* async direct compaction didn't help, now allow fallback */
+	alloc_flags |= ALLOC_FALLBACK;
+
 retry:
 	/* Ensure kswapd doesn't accidentally go to sleep as long as we loop */
 	if (gfp_mask & __GFP_KSWAPD_RECLAIM)
-- 
2.10.0

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC 5/4] mm, page_alloc: split smallest stolen page in fallback
  2016-09-29 21:05   ` Vlastimil Babka
@ 2016-10-07  8:32     ` Vlastimil Babka
  -1 siblings, 0 replies; 43+ messages in thread
From: Vlastimil Babka @ 2016-10-07  8:32 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Mel Gorman, Joonsoo Kim, linux-kernel, linux-mm, kernel-team,
	Vlastimil Babka

__rmqueue_fallback() is called when there's no free page of the requested
migratetype, so we need to steal from a different one. There are various
heuristics to make this event infrequent and to reduce permanent fragmentation.
The main one is to try stealing from the pageblock that has the most free pages,
and possibly steal them all at once and convert the whole pageblock. Precise
searching for such a pageblock would be expensive, so instead the heuristic
walks the free lists from MAX_ORDER down to the requested order and assumes
that the block with the highest-order free page is also likely to have the most
free pages in total.

So chances are that together with the highest-order page, we also steal pages
of lower orders from the same block. But then we still split the highest-order
page. This is wasteful and can contribute to fragmentation instead of avoiding
it.

This patch thus changes __rmqueue_fallback() to only steal the page(s) and
put them on the freelist of the requested migratetype, and to only report
whether it was successful. Then we pick the smallest page with
__rmqueue_smallest(). This all happens under the zone lock, so nobody can steal
the page from us in the process. This should reduce fragmentation due to
fallbacks. At worst we steal only a single highest-order page and waste some
cycles by moving it between lists and then removing it, but fallback is not
exactly a hot path, so that should not be a concern. As a side benefit, the
patch removes some duplicate code by reusing __rmqueue_smallest().

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_alloc.c | 50 ++++++++++++++++++++++++++------------------------
 1 file changed, 26 insertions(+), 24 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8a8ef9ebeb4d..2ccd80079d22 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1957,14 +1957,24 @@ static bool can_steal_fallback(unsigned int order, int start_mt)
  * use it's pages as requested migratetype in the future.
  */
 static void steal_suitable_fallback(struct zone *zone, struct page *page,
-							  int start_type)
+					 int start_type, bool whole_block)
 {
 	unsigned int current_order = page_order(page);
+	struct free_area *area;
 	int pages;
 
 	/* Take ownership for orders >= pageblock_order */
 	if (current_order >= pageblock_order) {
 		change_pageblock_range(page, current_order, start_type);
+		area = &zone->free_area[current_order];
+		list_move(&page->lru, &area->free_list[start_type]);
+		return;
+	}
+
+	/* We are not allowed to try stealing from the whole block */
+	if (!whole_block) {
+		area = &zone->free_area[current_order];
+		list_move(&page->lru, &area->free_list[start_type]);
 		return;
 	}
 
@@ -2108,8 +2118,13 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
 	}
 }
 
-/* Remove an element from the buddy allocator from the fallback list */
-static inline struct page *
+/*
+ * Try finding a free buddy page on the fallback list and put it on the free
+ * list of requested migratetype, possibly along with other pages from the same
+ * block, depending on fragmentation avoidance heuristics. Returns true if
+ * fallback was found so that __rmqueue_smallest() can grab it.
+ */
+static inline bool
 __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
 {
 	struct free_area *area;
@@ -2130,32 +2145,16 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
 
 		page = list_first_entry(&area->free_list[fallback_mt],
 						struct page, lru);
-		if (can_steal)
-			steal_suitable_fallback(zone, page, start_migratetype);
-
-		/* Remove the page from the freelists */
-		area->nr_free--;
-		list_del(&page->lru);
-		rmv_page_order(page);
 
-		expand(zone, page, order, current_order, area,
-					start_migratetype);
-		/*
-		 * The pcppage_migratetype may differ from pageblock's
-		 * migratetype depending on the decisions in
-		 * find_suitable_fallback(). This is OK as long as it does not
-		 * differ for MIGRATE_CMA pageblocks. Those can be used as
-		 * fallback only via special __rmqueue_cma_fallback() function
-		 */
-		set_pcppage_migratetype(page, start_migratetype);
+		steal_suitable_fallback(zone, page, start_migratetype, can_steal);
 
 		trace_mm_page_alloc_extfrag(page, order, current_order,
 			start_migratetype, fallback_mt);
 
-		return page;
+		return true;
 	}
 
-	return NULL;
+	return false;
 }
 
 /*
@@ -2167,13 +2166,16 @@ static struct page *__rmqueue(struct zone *zone, unsigned int order,
 {
 	struct page *page;
 
+retry:
 	page = __rmqueue_smallest(zone, order, migratetype);
 	if (unlikely(!page)) {
 		if (migratetype == MIGRATE_MOVABLE)
 			page = __rmqueue_cma_fallback(zone, order);
 
-		if (!page && allow_fallback)
-			page = __rmqueue_fallback(zone, order, migratetype);
+		if (!page && allow_fallback) {
+			if (__rmqueue_fallback(zone, order, migratetype))
+				goto retry;
+		}
 	}
 
 	trace_mm_page_alloc_zone_locked(page, order, migratetype);
-- 
2.10.0

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [RFC 0/4] try to reduce fragmenting fallbacks
  2016-09-29 21:05   ` Vlastimil Babka
@ 2016-10-10 17:16     ` Johannes Weiner
  -1 siblings, 0 replies; 43+ messages in thread
From: Johannes Weiner @ 2016-10-10 17:16 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Mel Gorman, Joonsoo Kim, linux-kernel, linux-mm, kernel-team

Hi Vlastimil,

sorry for the delay, I just got back from traveling.

On Thu, Sep 29, 2016 at 11:05:44PM +0200, Vlastimil Babka wrote:
> Hi Johannes,
> 
> here's something quick to try or ponder about. However, untested since it's too
> late here. Based on mmotm-2016-09-27-16-08 plus this fix [1]
> 
> [1] http://lkml.kernel.org/r/<cadadd38-6456-f58e-504f-cc18ddc47b3f@suse.cz>

Thanks for whipping something up, I'll give these a shot. 4/4 is
something I wondered about too. Let's see how this performs.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC 6/4] mm, page_alloc: introduce MIGRATE_MIXED migratetype
  2016-09-29 21:05   ` Vlastimil Babka
@ 2016-10-11 13:11     ` Vlastimil Babka
  -1 siblings, 0 replies; 43+ messages in thread
From: Vlastimil Babka @ 2016-10-11 13:11 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Mel Gorman, Joonsoo Kim, linux-kernel, linux-mm, kernel-team,
	Vlastimil Babka

Page mobility grouping tries to minimize the number of pageblocks that contain
non-migratable pages by distinguishing MOVABLE, UNMOVABLE and RECLAIMABLE
pageblock migratetypes. Changing a pageblock's migratetype is allowed if an
allocation of a different migratetype steals more than half of the pages from
it.

That means it's possible to have pageblocks that contain some UNMOVABLE and
RECLAIMABLE pages yet are still marked as MOVABLE, and the next time stealing
happens, another MOVABLE pageblock might get polluted. On the other hand, if we
duly marked all polluted pageblocks (even those polluted by just a single page)
as UNMOVABLE or RECLAIMABLE, further allocations and freeing of pages would
tend to spread over all of them, and there would be little pressure for them to
eventually become fully free and MOVABLE again.

This patch thus introduces a new migratetype, MIGRATE_MIXED, which is intended
to mark pageblocks that contain some UNMOVABLE or RECLAIMABLE pages, but not
enough to mark the whole pageblock as such. These pageblocks become the
preferred fallback before an UNMOVABLE/RECLAIMABLE allocation steals from a
MOVABLE pageblock, or vice versa. This should help page mobility grouping:

- UNMOVABLE and RECLAIMABLE allocations will try to be satisfied from their
  respective pageblocks. If these are full, polluting other pageblocks is
  limited to MIGRATE_MIXED pageblocks. MIGRATE_MOVABLE pageblocks remain pure.
  If temporary pressure on UNMOVABLE and RECLAIMABLE pageblocks disappears and
  such allocations can again be satisfied without fallback, the MIXED
  pageblocks might eventually fully recover from the polluted pages.

- MOVABLE allocations will exhaust MOVABLE pageblocks first, then fall back
  to MIXED pageblocks second. This leaves free pages in UNMOVABLE and
  RECLAIMABLE pageblocks as a last resort, so allocations of those types don't
  have to fall back so much.

Not-yet-signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---

This is not a new idea, but maybe it's time to give it a shot. Perhaps it will
have to be complemented by the compaction migrate scanner reclassifying
pageblocks from one migratetype to another depending on how many free and
migratable pages it finds in them, since the fallback events alone have limited
information and unpredictable timing.
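
To spell out the new decision in one place (an illustrative helper only - the
real logic is the updated steal_suitable_fallback() in the diff below, and the
page_group_by_mobility_disabled case is ignored here):

        /*
         * Illustrative only: the pageblock's new migratetype after we stole
         * free_pages pages from a block of old_type on behalf of an
         * allocation of start_type.
         */
        static int new_pageblock_type(int old_type, int start_type,
                                      int free_pages)
        {
                if (free_pages >= (1 << (pageblock_order - 1))) {
                        /*
                         * Over half the block was free: claim it fully,
                         * except for MOVABLE, which would need a fully free
                         * block and therefore only gets MIGRATE_MIXED.
                         */
                        if (start_type == MIGRATE_MOVABLE)
                                return MIGRATE_MIXED;
                        return start_type;
                }
                /*
                 * Stole too little to convert; just record that a MOVABLE
                 * block has been polluted by a non-MOVABLE allocation.
                 */
                if (old_type == MIGRATE_MOVABLE && start_type != MIGRATE_MOVABLE)
                        return MIGRATE_MIXED;
                return old_type;
        }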

 include/linux/mmzone.h |  1 +
 mm/compaction.c        | 14 +++++++++--
 mm/page_alloc.c        | 63 ++++++++++++++++++++++++++++++++++++--------------
 3 files changed, 59 insertions(+), 19 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 9cd3ee58ab2b..e4e0a1f64801 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -41,6 +41,7 @@ enum {
 	MIGRATE_RECLAIMABLE,
 	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
 	MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
+	MIGRATE_MIXED,
 #ifdef CONFIG_CMA
 	/*
 	 * MIGRATE_CMA migration type is designed to mimic the way
diff --git a/mm/compaction.c b/mm/compaction.c
index eb4ccd403543..79bff09a5cac 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1001,6 +1001,9 @@ static bool suitable_migration_source(struct compact_control *cc,
 
 	block_mt = get_pageblock_migratetype(page);
 
+	if (block_mt == MIGRATE_MIXED)
+		return true;
+
 	if (cc->migratetype == MIGRATE_MOVABLE)
 		return is_migrate_movable(block_mt);
 	else
@@ -1011,6 +1014,8 @@ static bool suitable_migration_source(struct compact_control *cc,
 static bool suitable_migration_target(struct compact_control *cc,
 							struct page *page)
 {
+	int block_mt;
+
 	if (cc->ignore_block_suitable)
 		return true;
 
@@ -1025,8 +1030,13 @@ static bool suitable_migration_target(struct compact_control *cc,
 			return false;
 	}
 
-	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
-	if (is_migrate_movable(get_pageblock_migratetype(page)))
+	block_mt = get_pageblock_migratetype(page);
+
+	/*
+	 * If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration.
+	 * Allow also mixed pageblocks so we are not so restrictive.
+	 */
+	if (is_migrate_movable(block_mt) || block_mt == MIGRATE_MIXED)
 		return true;
 
 	/* Otherwise skip the block */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2ccd80079d22..6c5bc6a7858c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -229,6 +229,7 @@ char * const migratetype_names[MIGRATE_TYPES] = {
 	"Movable",
 	"Reclaimable",
 	"HighAtomic",
+	"Mixed",
 #ifdef CONFIG_CMA
 	"CMA",
 #endif
@@ -1814,9 +1815,9 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
  * the free lists for the desirable migrate type are depleted
  */
 static int fallbacks[MIGRATE_TYPES][4] = {
-	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_TYPES },
-	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_TYPES },
-	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
+	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MIXED, MIGRATE_MOVABLE,   MIGRATE_TYPES },
+	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MIXED, MIGRATE_MOVABLE,   MIGRATE_TYPES },
+	[MIGRATE_MOVABLE]     = { MIGRATE_MIXED, MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
 #ifdef CONFIG_CMA
 	[MIGRATE_CMA]         = { MIGRATE_TYPES }, /* Never used */
 #endif
@@ -1937,13 +1938,12 @@ static bool can_steal_fallback(unsigned int order, int start_mt)
 	 * but, below check doesn't guarantee it and that is just heuristic
 	 * so could be changed anytime.
 	 */
-	if (order >= pageblock_order)
+	if (order >= pageblock_order || page_group_by_mobility_disabled)
 		return true;
 
 	if (order >= pageblock_order / 2 ||
 		start_mt == MIGRATE_RECLAIMABLE ||
-		start_mt == MIGRATE_UNMOVABLE ||
-		page_group_by_mobility_disabled)
+		start_mt == MIGRATE_UNMOVABLE)
 		return true;
 
 	return false;
@@ -1962,6 +1962,7 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
 	unsigned int current_order = page_order(page);
 	struct free_area *area;
 	int pages;
+	int old_block_type, new_block_type;
 
 	/* Take ownership for orders >= pageblock_order */
 	if (current_order >= pageblock_order) {
@@ -1975,15 +1976,40 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
 	if (!whole_block) {
 		area = &zone->free_area[current_order];
 		list_move(&page->lru, &area->free_list[start_type]);
-		return;
+		pages = 1 << current_order;
+	} else {
+		pages = move_freepages_block(zone, page, start_type);
 	}
 
-	pages = move_freepages_block(zone, page, start_type);
+	new_block_type = old_block_type = get_pageblock_migratetype(page);
+	if (page_group_by_mobility_disabled)
+		new_block_type = start_type;
 
-	/* Claim the whole block if over half of it is free */
-	if (pages >= (1 << (pageblock_order-1)) ||
-			page_group_by_mobility_disabled)
-		set_pageblock_migratetype(page, start_type);
+	if (pages >= (1 << (pageblock_order-1))) {
+		/*
+		 * Claim the whole block if over half of it is free. The
+		 * exception is the transition to MIGRATE_MOVABLE where we
+		 * require it to be fully free so that MIGRATE_MOVABLE
+		 * pageblocks consist of purely movable pages. So if we steal
+		 * less than whole pageblock, mark it as MIGRATE_MIXED.
+		 */
+		if (start_type == MIGRATE_MOVABLE)
+			new_block_type = MIGRATE_MIXED;
+		else
+			new_block_type = start_type;
+	} else {
+		/*
+		 * We didn't steal enough to change the block's migratetype.
+		 * But if we are stealing from a MOVABLE block for a
+		 * non-MOVABLE allocation, mark the block as MIXED.
+		 */
+		if (old_block_type == MIGRATE_MOVABLE
+					&& start_type != MIGRATE_MOVABLE)
+			new_block_type = MIGRATE_MIXED;
+	}
+
+	if (new_block_type != old_block_type)
+		set_pageblock_migratetype(page, new_block_type);
 }
 
 /*
@@ -2526,16 +2552,18 @@ int __isolate_free_page(struct page *page, unsigned int order)
 	rmv_page_order(page);
 
 	/*
-	 * Set the pageblock if the isolated page is at least half of a
-	 * pageblock
+	 * Set the pageblock's migratetype to MIXED if the isolated page is
+	 * at least half of a pageblock, MOVABLE if at least whole pageblock
 	 */
 	if (order >= pageblock_order - 1) {
 		struct page *endpage = page + (1 << order) - 1;
+		int new_mt = (order >= pageblock_order) ?
+					MIGRATE_MOVABLE : MIGRATE_MIXED;
 		for (; page < endpage; page += pageblock_nr_pages) {
 			int mt = get_pageblock_migratetype(page);
-			if (!is_migrate_isolate(mt) && !is_migrate_cma(mt))
-				set_pageblock_migratetype(page,
-							  MIGRATE_MOVABLE);
+
+			if (!is_migrate_isolate(mt) && !is_migrate_movable(mt))
+				set_pageblock_migratetype(page, new_mt);
 		}
 	}
 
@@ -4213,6 +4241,7 @@ static void show_migration_types(unsigned char type)
 		[MIGRATE_MOVABLE]	= 'M',
 		[MIGRATE_RECLAIMABLE]	= 'E',
 		[MIGRATE_HIGHATOMIC]	= 'H',
+		[MIGRATE_MIXED]		= 'M',
 #ifdef CONFIG_CMA
 		[MIGRATE_CMA]		= 'C',
 #endif
-- 
2.10.0

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [RFC 4/4] mm, page_alloc: disallow migratetype fallback in fastpath
  2016-09-29 21:05     ` Vlastimil Babka
@ 2016-10-12 14:51       ` Vlastimil Babka
  -1 siblings, 0 replies; 43+ messages in thread
From: Vlastimil Babka @ 2016-10-12 14:51 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Mel Gorman, Joonsoo Kim, linux-kernel, linux-mm, kernel-team

On 09/29/2016 11:05 PM, Vlastimil Babka wrote:
> The previous patch has adjusted async compaction so that it helps against
> longterm fragmentation when compacting for a non-MOVABLE high-order allocation.
> The goal of this patch is to force such allocations go through compaction
> once before being allowed to fallback to a pageblock of different migratetype
> (e.g. MOVABLE). In contexts where compaction is not allowed (and for order-0
> allocations), this delayed fallback possibility can still help by trying a
> different zone where fallback might not be needed and potentially waking up
> kswapd earlier.
> 
> Not-yet-signed-off-by: Vlastimil Babka <vbabka@suse.cz>

I forgot that compaction itself also needs to be told not to allow fallback;
otherwise it finishes with COMPACT_SUCCESS without actually doing anything...

From 93acabcc744eab5a4aa965322e9083d0d9f990fc Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Wed, 12 Oct 2016 16:36:35 +0200
Subject: fixup! mm, page_alloc: disallow migratetype fallback in fastpath

We want to force compaction to run even when the requested page is potentially
available, but of the wrong migratetype. This won't work unless compaction
itself is modified not to immediately declare success when it sees such a page.
---
 mm/compaction.c | 22 +++++++++++++++++-----
 mm/internal.h   |  3 ++-
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index eb4ccd403543..eeb9200f7b7e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1354,9 +1354,11 @@ static enum compact_result __compact_finished(struct zone *zone,
 #endif
 		/*
 		 * Job done if allocation would steal freepages from
-		 * other migratetype buddy lists.
+		 * other migratetype buddy lists. This is not allowed
+		 * for async direct compaction.
 		 */
-		if (find_suitable_fallback(area, order, migratetype,
+		if (!cc->prevent_fallback &&
+			find_suitable_fallback(area, order, migratetype,
 						true, &can_steal) != -1)
 			return COMPACT_SUCCESS;
 	}
@@ -1509,8 +1511,17 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 	cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
 	ret = compaction_suitable(zone, cc->order, cc->alloc_flags,
 							cc->classzone_idx);
-	/* Compaction is likely to fail */
-	if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED)
+	/*
+	 * Compaction should not be needed. If we don't allow stealing from
+	 * pageblocks of different migratetype, the watermark checks cannot
+	 * distinguish that, so assume we would need to steal, and leave the
+	 * thorough check to compact_finished().
+	 */
+	if (ret == COMPACT_SUCCESS && !cc->prevent_fallback)
+		return ret;
+
+	/* Compaction is likely to fail due to insufficient free pages */
+	if (ret == COMPACT_SKIPPED)
 		return ret;
 
 	/* huh, compaction_suitable is returning something unexpected */
@@ -1678,7 +1689,8 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 		.direct_compaction = true,
 		.whole_zone = (prio == MIN_COMPACT_PRIORITY),
 		.ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY),
-		.ignore_block_suitable = (prio == MIN_COMPACT_PRIORITY)
+		.ignore_block_suitable = (prio == MIN_COMPACT_PRIORITY),
+		.prevent_fallback = (prio == COMPACT_PRIO_ASYNC)
 	};
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
diff --git a/mm/internal.h b/mm/internal.h
index a46eab383e8d..bb01d9bd60a8 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -181,13 +181,14 @@ struct compact_control {
 	bool ignore_block_suitable;	/* Scan blocks considered unsuitable */
 	bool direct_compaction;		/* False from kcompactd or /proc/... */
 	bool whole_zone;		/* Whole zone should/has been scanned */
+	bool contended;			/* Signal lock or sched contention */
+	bool prevent_fallback;		/* Stealing migratetypes not allowed */
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* migratetype of direct compactor */
 	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
 	const unsigned int alloc_flags;	/* alloc flags of a direct compactor */
 	const int classzone_idx;	/* zone index of a direct compactor */
 	struct zone *zone;
-	bool contended;			/* Signal lock or sched contention */
 };
 
 unsigned long
-- 
2.10.0

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: Regression in mobility grouping?
  2016-09-29 16:14           ` Johannes Weiner
@ 2016-10-13  7:33             ` Joonsoo Kim
  -1 siblings, 0 replies; 43+ messages in thread
From: Joonsoo Kim @ 2016-10-13  7:33 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Vlastimil Babka, Mel Gorman, linux-mm, linux-kernel, kernel-team

Sorry for the late response.

On Thu, Sep 29, 2016 at 12:14:02PM -0400, Johannes Weiner wrote:
> On Thu, Sep 29, 2016 at 03:14:33PM +0900, Joonsoo Kim wrote:
> > On Wed, Sep 28, 2016 at 10:25:40PM -0400, Johannes Weiner wrote:
> > > On Wed, Sep 28, 2016 at 11:39:25AM -0400, Johannes Weiner wrote:
> > > > On Wed, Sep 28, 2016 at 11:00:15AM +0200, Vlastimil Babka wrote:
> > > > > I guess testing revert of 9c0415e could give us some idea. Commit
> > > > > 3a1086f shouldn't result in pageblock marking differences and as I said
> > > > > above, 99592d5 should be just restoring to what 3.10 did.
> > > > 
> > > > I can give this a shot, but note that this commit makes only unmovable
> > > > stealing more aggressive. We see reclaimable blocks up as well.
> > > 
> > > Quick update, I reverted back to stealing eagerly only on behalf of
> > > MIGRATE_RECLAIMABLE allocations in a 4.6 kernel:
> > 
> > Hello, Johannes.
> > 
> > I think that it would be better to check 3.10 with above patches.
> > Fragmentation depends on not only policy itself but also
> > allocation/free pattern. There might be a large probability that
> > allocation/free pattern is changed in this large kernel version
> > difference.
> 
> You mean backport suspicious patches to 3.10 until I can reproduce it
> there? I'm not sure. You're correct, the patterns very likely *have*
> changed. But that alone cannot explain mobility grouping breaking that
> badly. There is a reproducable bad behavior. It should be easier to
> track down than to try to recreate it in the last-known-good kernel.

Okay. It is just my two cents.

> 
> > > This is an UNMOVABLE order-3 allocation falling back to RECLAIMABLE.
> > > According to can_steal_fallback(), this allocation shouldn't steal the
> > > pageblock, yet change_ownership=1 indicates the block is UNMOVABLE.
> > > 
> > > Who converted it? I wonder if there is a bug in ownership management,
> > > and there was an UNMOVABLE block on the RECLAIMABLE freelist from the
> > > beginning. AFAICS we never validate list/mt consistency anywhere.
> > 
> > According to my code review, it would be possible. When stealing
> > happens, we moved those buddy pages to current requested migratetype
> > buddy list. If the other migratetype allocation request comes and
> > stealing from the buddy list of previous requested migratetype
> > happens, change_ownership will show '1' even if there is no ownership
> > changing.
> 
> These two paths should exclude each other through the zone->lock, no?

zone->lock ensures that changes to a pageblock's migratetype happen
sequentially. But it doesn't control which buddy list the free pages of a
pageblock are actually attached to. For example, a free page in an unmovable
pageblock could sit on the movable buddy list, so a misleading
change_ownership=1 would be possible even though no ownership actually changed.
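
If it would help, something like the following rough walk (illustrative only,
untested, and it would need to run under zone->lock) could show how common
such list/pageblock mismatches are on a live system:

        /*
         * Debug-only sketch: count free pages whose buddy list migratetype
         * differs from their pageblock's migratetype. Caller holds zone->lock.
         * Not proposed for merging.
         */
        static unsigned long count_freelist_mismatches(struct zone *zone)
        {
                unsigned long mismatches = 0;
                unsigned int order;
                struct page *page;
                int mt;

                for (order = 0; order < MAX_ORDER; order++) {
                        for (mt = 0; mt < MIGRATE_TYPES; mt++) {
                                list_for_each_entry(page,
                                    &zone->free_area[order].free_list[mt], lru)
                                        if (get_pageblock_migratetype(page) != mt)
                                                mismatches++;
                        }
                }

                return mismatches;
        }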

Thanks.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 4/4] mm, page_alloc: disallow migratetype fallback in fastpath
  2016-09-29 21:05     ` Vlastimil Babka
@ 2016-10-13  7:58       ` Joonsoo Kim
  -1 siblings, 0 replies; 43+ messages in thread
From: Joonsoo Kim @ 2016-10-13  7:58 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Johannes Weiner, Mel Gorman, linux-kernel, linux-mm, kernel-team

On Thu, Sep 29, 2016 at 11:05:48PM +0200, Vlastimil Babka wrote:
> The previous patch has adjusted async compaction so that it helps against
> longterm fragmentation when compacting for a non-MOVABLE high-order allocation.
> The goal of this patch is to force such allocations go through compaction
> once before being allowed to fallback to a pageblock of different migratetype
> (e.g. MOVABLE). In contexts where compaction is not allowed (and for order-0
> allocations), this delayed fallback possibility can still help by trying a
> different zone where fallback might not be needed and potentially waking up
> kswapd earlier.

Hmm... can we justify this compaction overhead when there are high-order
freepages in pageblocks of another migratetype? There is no guarantee that
long-term fragmentation will happen and affect system performance.

And it would easily fail to compact an unmovable pageblock, since there would
be no migratable pages in it if everything works as we intended. So I guess
that checking it over and over doesn't help to reduce fragmentation and just
increases allocation latency.

Thanks.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 4/4] mm, page_alloc: disallow migratetype fallback in fastpath
  2016-10-13  7:58       ` Joonsoo Kim
@ 2016-10-13 11:46         ` Vlastimil Babka
  -1 siblings, 0 replies; 43+ messages in thread
From: Vlastimil Babka @ 2016-10-13 11:46 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Johannes Weiner, Mel Gorman, linux-kernel, linux-mm, kernel-team

On 10/13/2016 09:58 AM, Joonsoo Kim wrote:
> On Thu, Sep 29, 2016 at 11:05:48PM +0200, Vlastimil Babka wrote:
>> The previous patch has adjusted async compaction so that it helps against
>> longterm fragmentation when compacting for a non-MOVABLE high-order allocation.
>> The goal of this patch is to force such allocations go through compaction
>> once before being allowed to fallback to a pageblock of different migratetype
>> (e.g. MOVABLE). In contexts where compaction is not allowed (and for order-0
>> allocations), this delayed fallback possibility can still help by trying a
>> different zone where fallback might not be needed and potentially waking up
>> kswapd earlier.
>
> Hmm... can we justify this compaction overhead in case of that there is
> high order freepages in other migratetype pageblock? There is no guarantee
> that longterm fragmentation happens and it affects the system
> peformance.

Yeah, I hoped testing would show whether this makes any difference and what
the overhead is, and then we can decide whether it's worth it.

> And, it would easilly fail to compact in unmovable pageblock since
> there would not be migratable pages if everything works as our
> intended. So, I guess that checking it over and over doesn't help to
> reduce fragmentation and just increase latency of allocation.

Compaction's pageblock isolation_suitable heuristics should mitigate
repeatedly rescanning blocks without success. We could also add a per-zone
flag that gets set on a fallback allocation event and cleared when compaction
finishes, or something like that.
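
Something like this, perhaps - completely untested, and the field name is made
up, just to illustrate the idea:

        /*
         * Sketch only: per-zone hint that a fallback steal happened since
         * compaction last finished on this zone.
         */
        struct zone {
                ...
                bool    fallback_since_compact;
                ...
        };

        /* in steal_suitable_fallback(), under zone->lock: */
        zone->fallback_since_compact = true;

        /* in compact_finished(), when a compaction run completes: */
        cc->zone->fallback_since_compact = false;

Compaction for this purpose would then only be attempted while the flag is set.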

> Thanks.
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC 7/4] mm, page_alloc: count movable pages when stealing
  2016-09-29 21:05   ` Vlastimil Babka
@ 2016-10-13 14:11     ` Vlastimil Babka
  -1 siblings, 0 replies; 43+ messages in thread
From: Vlastimil Babka @ 2016-10-13 14:11 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Mel Gorman, Joonsoo Kim, linux-kernel, linux-mm, kernel-team,
	Vlastimil Babka

When stealing pages from a pageblock of a different migratetype, we count how
many free pages were stolen and change the pageblock's migratetype if more
than half of the pageblock was free. This might be too conservative, as there
might be other pages that are not free but were allocated with the same
migratetype as the one our allocation requested.

While we cannot determine the migratetype of allocated pages precisely (at
least not without the page_owner functionality enabled), we can count the
pages that compaction would try to isolate for migration - those are either on
the LRU or __PageMovable(). The rest can be assumed to be MIGRATE_RECLAIMABLE
or MIGRATE_UNMOVABLE, which we cannot easily distinguish. This counting can be
done as part of free page stealing with little additional overhead.

The page stealing code is changed so that it considers free pages plus pages
of the "good" migratetype when deciding whether to change the pageblock's
migratetype. For changing a pageblock to MIGRATE_MOVABLE, we require that all
pages are either free or appear to be movable; otherwise we use MIGRATE_MIXED.

The result should be pageblock migratetypes that more accurately reflect the
pages actually present in them when stealing from semi-occupied pageblocks.
This should help with page grouping by mobility.
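
To make the arithmetic concrete (assuming the common pageblock_order of 9,
i.e. 512 base pages per pageblock; the numbers are made up):

  Stealing for an UNMOVABLE allocation from a block with 180 free pages and
  120 pages that look movable (on the LRU or __PageMovable()):

    good_pages = 512 - 180 - 120 = 212    (assumed unmovable/reclaimable)
    free_pages + good_pages = 392 >= 256  (half a block)

  so the block is converted to UNMOVABLE. For a MOVABLE allocation from the
  same block, good_pages would be the 120 movable-looking pages, and
  180 + 120 = 300 is over half but short of 512, so the block would only be
  marked MIGRATE_MIXED.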

Not-yet-signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/page-isolation.h |  5 +--
 mm/page_alloc.c                | 73 +++++++++++++++++++++++++++++-------------
 mm/page_isolation.c            |  5 +--
 3 files changed, 54 insertions(+), 29 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 047d64706f2a..d4cd2014fa6f 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,10 +33,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 			 bool skip_hwpoisoned_pages);
 void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
-				int migratetype);
-int move_freepages(struct zone *zone,
-			  struct page *start_page, struct page *end_page,
-			  int migratetype);
+				int migratetype, int *num_movable);
 
 /*
  * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6c5bc6a7858c..29e44364a02d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1842,9 +1842,9 @@ static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
  * Note that start_page and end_pages are not aligned on a pageblock
  * boundary. If alignment is required, use move_freepages_block()
  */
-int move_freepages(struct zone *zone,
+static int move_freepages(struct zone *zone,
 			  struct page *start_page, struct page *end_page,
-			  int migratetype)
+			  int migratetype, int *num_movable)
 {
 	struct page *page;
 	unsigned int order;
@@ -1861,6 +1861,9 @@ int move_freepages(struct zone *zone,
 	VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
 #endif
 
+	if (num_movable)
+		*num_movable = 0;
+
 	for (page = start_page; page <= end_page;) {
 		/* Make sure we are not inadvertently changing nodes */
 		VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
@@ -1870,23 +1873,33 @@ int move_freepages(struct zone *zone,
 			continue;
 		}
 
-		if (!PageBuddy(page)) {
-			page++;
+		if (PageBuddy(page)) {
+			order = page_order(page);
+			list_move(&page->lru,
+				  &zone->free_area[order].free_list[migratetype]);
+			page += 1 << order;
+			pages_moved += 1 << order;
 			continue;
 		}
 
-		order = page_order(page);
-		list_move(&page->lru,
-			  &zone->free_area[order].free_list[migratetype]);
-		page += 1 << order;
-		pages_moved += 1 << order;
+		page++;
+		if (!num_movable)
+			continue;
+
+		/*
+		 * We assume that pages that could be isolated for migration are
+		 * movable. But we don't actually try isolating, as that would be
+		 * expensive.
+		 */
+		if (PageLRU(page) || __PageMovable(page))
+			(*num_movable)++;
 	}
 
 	return pages_moved;
 }
 
 int move_freepages_block(struct zone *zone, struct page *page,
-				int migratetype)
+				int migratetype, int *num_movable)
 {
 	unsigned long start_pfn, end_pfn;
 	struct page *start_page, *end_page;
@@ -1903,7 +1916,8 @@ int move_freepages_block(struct zone *zone, struct page *page,
 	if (!zone_spans_pfn(zone, end_pfn))
 		return 0;
 
-	return move_freepages(zone, start_page, end_page, migratetype);
+	return move_freepages(zone, start_page, end_page, migratetype,
+								num_movable);
 }
 
 static void change_pageblock_range(struct page *pageblock_page,
@@ -1961,7 +1975,7 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
 {
 	unsigned int current_order = page_order(page);
 	struct free_area *area;
-	int pages;
+	int free_pages, good_pages;
 	int old_block_type, new_block_type;
 
 	/* Take ownership for orders >= pageblock_order */
@@ -1976,24 +1990,37 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
 	if (!whole_block) {
 		area = &zone->free_area[current_order];
 		list_move(&page->lru, &area->free_list[start_type]);
-		pages = 1 << current_order;
+		free_pages = 1 << current_order;
+		/* We didn't scan the block, so be pessimistic */
+		good_pages = 0;
 	} else {
-		pages = move_freepages_block(zone, page, start_type);
+		free_pages = move_freepages_block(zone, page, start_type,
+							&good_pages);
+		/*
+		 * good_pages is now the number of movable pages, but if we
+		 * want UNMOVABLE or RECLAIMABLE, we consider all non-movable as
+		 * good (but we can't fully distinguish them)
+		 */
+		if (start_type != MIGRATE_MOVABLE)
+			good_pages = pageblock_nr_pages - free_pages -
+								good_pages;
 	}
 
 	new_block_type = old_block_type = get_pageblock_migratetype(page);
 	if (page_group_by_mobility_disabled)
 		new_block_type = start_type;
 
-	if (pages >= (1 << (pageblock_order-1))) {
+	if (free_pages + good_pages >= (1 << (pageblock_order-1))) {
 		/*
-		 * Claim the whole block if over half of it is free. The
-		 * exception is the transition to MIGRATE_MOVABLE where we
-		 * require it to be fully free so that MIGRATE_MOVABLE
-		 * pageblocks consist of purely movable pages. So if we steal
-		 * less than whole pageblock, mark it as MIGRATE_MIXED.
+		 * Claim the whole block if over half of it is free or of a good
+		 * type. The exception is the transition to MIGRATE_MOVABLE
+		 * where we require it to be fully free/good so that
+		 * MIGRATE_MOVABLE pageblocks consist of purely movable pages.
+		 * So if we steal less than whole pageblock, mark it as
+		 * MIGRATE_MIXED.
 		 */
-		if (start_type == MIGRATE_MOVABLE)
+		if ((start_type == MIGRATE_MOVABLE) &&
+				free_pages + good_pages < pageblock_nr_pages)
 			new_block_type = MIGRATE_MIXED;
 		else
 			new_block_type = start_type;
@@ -2079,7 +2106,7 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
 			!is_migrate_isolate(mt) && !is_migrate_cma(mt)) {
 		zone->nr_reserved_highatomic += pageblock_nr_pages;
 		set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
-		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC);
+		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
 	}
 
 out_unlock:
@@ -2136,7 +2163,7 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
 			 * may increase.
 			 */
 			set_pageblock_migratetype(page, ac->migratetype);
-			move_freepages_block(zone, page, ac->migratetype);
+			move_freepages_block(zone, page, ac->migratetype, NULL);
 			spin_unlock_irqrestore(&zone->lock, flags);
 			return;
 		}
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index a5594bfcc5ed..29c2f9b9aba7 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -66,7 +66,8 @@ static int set_migratetype_isolate(struct page *page,
 
 		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
 		zone->nr_isolate_pageblock++;
-		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
+		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
+									NULL);
 
 		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
 	}
@@ -120,7 +121,7 @@ static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 	 * pageblock scanning for freepage moving.
 	 */
 	if (!isolated_page) {
-		nr_pages = move_freepages_block(zone, page, migratetype);
+		nr_pages = move_freepages_block(zone, page, migratetype, NULL);
 		__mod_zone_freepage_state(zone, nr_pages, migratetype);
 	}
 	set_pageblock_migratetype(page, migratetype);
-- 
2.10.0

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC 7/4] mm, page_alloc: count movable pages when stealing
@ 2016-10-13 14:11     ` Vlastimil Babka
  0 siblings, 0 replies; 43+ messages in thread
From: Vlastimil Babka @ 2016-10-13 14:11 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Mel Gorman, Joonsoo Kim, linux-kernel, linux-mm, kernel-team,
	Vlastimil Babka

When stealing pages from a pageblock of a different migratetype, we count how
many free pages were stolen, and change the pageblock's migratetype if more
than half of the pageblock was free. This might be too conservative: there
might be other pages that are not free, but were allocated with the same
migratetype that our allocation requested.

While we cannot determine the migratetype of allocated pages precisely (at
least without the page_owner functionality enabled), we can count pages that
compaction would try to isolate for migration - those are either on LRU or
__PageMovable(). The rest can be assumed to be MIGRATE_RECLAIMABLE or
MIGRATE_UNMOVABLE, which we cannot easily distinguish. This counting can be
done as part of free page stealing with little additional overhead.
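
A minimal sketch of the per-page classification done while scanning the
pageblock (illustrative only; it mirrors the move_freepages() hunk below):

	if (PageBuddy(page)) {
		/* free page: move it to the free list of the target type */
	} else if (PageLRU(page) || __PageMovable(page)) {
		/* allocated, but compaction could likely migrate it */
		(*num_movable)++;
	} else {
		/* assumed MIGRATE_UNMOVABLE or MIGRATE_RECLAIMABLE */
	}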

The page stealing code is changed so that it considers free pages plus pages
of the "good" migratetype when deciding whether to change the pageblock's
migratetype. For changing a pageblock to MIGRATE_MOVABLE, we require that all
pages are either free or appear to be movable; otherwise we use MIGRATE_MIXED.
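
As a worked example, assume pageblock_order == 9 (512 pages per pageblock,
as with 2MB hugepages on x86-64) and a scan that moved 200 free pages and
counted 80 movable pages:

	/* falling back for an UNMOVABLE (or RECLAIMABLE) allocation */
	good_pages = 512 - 200 - 80;		/* 232 pages assumed "good" */
	if (200 + 232 >= 512 / 2)		/* 432 >= 256: over half */
		new_block_type = start_type;	/* claim the whole block */

	/* falling back for a MOVABLE allocation */
	good_pages = 80;			/* only apparently-movable pages count */
	if (200 + 80 >= 512 / 2)		/* 280 >= 256, but 280 < 512 */
		new_block_type = MIGRATE_MIXED;	/* not purely free/movable */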

The result should be a pageblock migratetype that more accurately reflects
the pages actually present in the pageblock when stealing from semi-occupied
pageblocks. This should help with page grouping by mobility.

Not-yet-signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/page-isolation.h |  5 +--
 mm/page_alloc.c                | 73 +++++++++++++++++++++++++++++-------------
 mm/page_isolation.c            |  5 +--
 3 files changed, 54 insertions(+), 29 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 047d64706f2a..d4cd2014fa6f 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,10 +33,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 			 bool skip_hwpoisoned_pages);
 void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
-				int migratetype);
-int move_freepages(struct zone *zone,
-			  struct page *start_page, struct page *end_page,
-			  int migratetype);
+				int migratetype, int *num_movable);
 
 /*
  * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6c5bc6a7858c..29e44364a02d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1842,9 +1842,9 @@ static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
  * Note that start_page and end_pages are not aligned on a pageblock
  * boundary. If alignment is required, use move_freepages_block()
  */
-int move_freepages(struct zone *zone,
+static int move_freepages(struct zone *zone,
 			  struct page *start_page, struct page *end_page,
-			  int migratetype)
+			  int migratetype, int *num_movable)
 {
 	struct page *page;
 	unsigned int order;
@@ -1861,6 +1861,9 @@ int move_freepages(struct zone *zone,
 	VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
 #endif
 
+	if (num_movable)
+		*num_movable = 0;
+
 	for (page = start_page; page <= end_page;) {
 		/* Make sure we are not inadvertently changing nodes */
 		VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
@@ -1870,23 +1873,33 @@ int move_freepages(struct zone *zone,
 			continue;
 		}
 
-		if (!PageBuddy(page)) {
-			page++;
+		if (PageBuddy(page)) {
+			order = page_order(page);
+			list_move(&page->lru,
+				  &zone->free_area[order].free_list[migratetype]);
+			page += 1 << order;
+			pages_moved += 1 << order;
 			continue;
 		}
 
-		order = page_order(page);
-		list_move(&page->lru,
-			  &zone->free_area[order].free_list[migratetype]);
-		page += 1 << order;
-		pages_moved += 1 << order;
+		page++;
+		if (!num_movable)
+			continue;
+
+		/*
+		 * We assume that pages that could be isolated for migration are
+		 * movable. But we don't actually try isolating, as that would be
+		 * expensive.
+		 */
+		if (PageLRU(page) || __PageMovable(page))
+			(*num_movable)++;
 	}
 
 	return pages_moved;
 }
 
 int move_freepages_block(struct zone *zone, struct page *page,
-				int migratetype)
+				int migratetype, int *num_movable)
 {
 	unsigned long start_pfn, end_pfn;
 	struct page *start_page, *end_page;
@@ -1903,7 +1916,8 @@ int move_freepages_block(struct zone *zone, struct page *page,
 	if (!zone_spans_pfn(zone, end_pfn))
 		return 0;
 
-	return move_freepages(zone, start_page, end_page, migratetype);
+	return move_freepages(zone, start_page, end_page, migratetype,
+								num_movable);
 }
 
 static void change_pageblock_range(struct page *pageblock_page,
@@ -1961,7 +1975,7 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
 {
 	unsigned int current_order = page_order(page);
 	struct free_area *area;
-	int pages;
+	int free_pages, good_pages;
 	int old_block_type, new_block_type;
 
 	/* Take ownership for orders >= pageblock_order */
@@ -1976,24 +1990,37 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
 	if (!whole_block) {
 		area = &zone->free_area[current_order];
 		list_move(&page->lru, &area->free_list[start_type]);
-		pages = 1 << current_order;
+		free_pages = 1 << current_order;
+		/* We didn't scan the block, so be pessimistic */
+		good_pages = 0;
 	} else {
-		pages = move_freepages_block(zone, page, start_type);
+		free_pages = move_freepages_block(zone, page, start_type,
+							&good_pages);
+		/*
+		 * good_pages is now the number of movable pages, but if we
+		 * want UNMOVABLE or RECLAIMABLE, we consider all non-movable as
+		 * good (but we can't fully distinguish them)
+		 */
+		if (start_type != MIGRATE_MOVABLE)
+			good_pages = pageblock_nr_pages - free_pages -
+								good_pages;
 	}
 
 	new_block_type = old_block_type = get_pageblock_migratetype(page);
 	if (page_group_by_mobility_disabled)
 		new_block_type = start_type;
 
-	if (pages >= (1 << (pageblock_order-1))) {
+	if (free_pages + good_pages >= (1 << (pageblock_order-1))) {
 		/*
-		 * Claim the whole block if over half of it is free. The
-		 * exception is the transition to MIGRATE_MOVABLE where we
-		 * require it to be fully free so that MIGRATE_MOVABLE
-		 * pageblocks consist of purely movable pages. So if we steal
-		 * less than whole pageblock, mark it as MIGRATE_MIXED.
+		 * Claim the whole block if over half of it is free or of a good
+		 * type. The exception is the transition to MIGRATE_MOVABLE
+		 * where we require it to be fully free/good so that
+		 * MIGRATE_MOVABLE pageblocks consist of purely movable pages.
+		 * So if we steal less than whole pageblock, mark it as
+		 * MIGRATE_MIXED.
 		 */
-		if (start_type == MIGRATE_MOVABLE)
+		if ((start_type == MIGRATE_MOVABLE) &&
+				free_pages + good_pages < pageblock_nr_pages)
 			new_block_type = MIGRATE_MIXED;
 		else
 			new_block_type = start_type;
@@ -2079,7 +2106,7 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
 			!is_migrate_isolate(mt) && !is_migrate_cma(mt)) {
 		zone->nr_reserved_highatomic += pageblock_nr_pages;
 		set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
-		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC);
+		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
 	}
 
 out_unlock:
@@ -2136,7 +2163,7 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
 			 * may increase.
 			 */
 			set_pageblock_migratetype(page, ac->migratetype);
-			move_freepages_block(zone, page, ac->migratetype);
+			move_freepages_block(zone, page, ac->migratetype, NULL);
 			spin_unlock_irqrestore(&zone->lock, flags);
 			return;
 		}
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index a5594bfcc5ed..29c2f9b9aba7 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -66,7 +66,8 @@ static int set_migratetype_isolate(struct page *page,
 
 		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
 		zone->nr_isolate_pageblock++;
-		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
+		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
+									NULL);
 
 		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
 	}
@@ -120,7 +121,7 @@ static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 	 * pageblock scanning for freepage moving.
 	 */
 	if (!isolated_page) {
-		nr_pages = move_freepages_block(zone, page, migratetype);
+		nr_pages = move_freepages_block(zone, page, migratetype, NULL);
 		__mod_zone_freepage_state(zone, nr_pages, migratetype);
 	}
 	set_pageblock_migratetype(page, migratetype);
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2016-10-13 14:12 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-28  1:41 Regression in mobility grouping? Johannes Weiner
2016-09-28  9:00 ` Vlastimil Babka
2016-09-28  9:00   ` Vlastimil Babka
2016-09-28 15:39   ` Johannes Weiner
2016-09-28 15:39     ` Johannes Weiner
2016-09-29  2:25     ` Johannes Weiner
2016-09-29  2:25       ` Johannes Weiner
2016-09-29  6:14       ` Joonsoo Kim
2016-09-29  6:14         ` Joonsoo Kim
2016-09-29 16:14         ` Johannes Weiner
2016-09-29 16:14           ` Johannes Weiner
2016-10-13  7:33           ` Joonsoo Kim
2016-10-13  7:33             ` Joonsoo Kim
2016-09-29  7:17       ` Vlastimil Babka
2016-09-29  7:17         ` Vlastimil Babka
2016-09-28 10:26 ` Mel Gorman
2016-09-28 10:26   ` Mel Gorman
2016-09-28 16:37   ` Johannes Weiner
2016-09-28 16:37     ` Johannes Weiner
2016-09-29 21:05 ` [RFC 0/4] try to reduce fragmenting fallbacks Vlastimil Babka
2016-09-29 21:05   ` Vlastimil Babka
2016-09-29 21:05   ` [RFC 1/4] mm, compaction: change migrate_async_suitable() to suitable_migration_source() Vlastimil Babka
2016-09-29 21:05     ` Vlastimil Babka
2016-09-29 21:05   ` [RFC 2/4] mm, compaction: add migratetype to compact_control Vlastimil Babka
2016-09-29 21:05     ` Vlastimil Babka
2016-09-29 21:05   ` [RFC 3/4] mm, compaction: restrict async compaction to matching migratetype Vlastimil Babka
2016-09-29 21:05     ` Vlastimil Babka
2016-09-29 21:05   ` [RFC 4/4] mm, page_alloc: disallow migratetype fallback in fastpath Vlastimil Babka
2016-09-29 21:05     ` Vlastimil Babka
2016-10-12 14:51     ` Vlastimil Babka
2016-10-12 14:51       ` Vlastimil Babka
2016-10-13  7:58     ` Joonsoo Kim
2016-10-13  7:58       ` Joonsoo Kim
2016-10-13 11:46       ` Vlastimil Babka
2016-10-13 11:46         ` Vlastimil Babka
2016-10-07  8:32   ` [RFC 5/4] mm, page_alloc: split smallest stolen page in fallback Vlastimil Babka
2016-10-07  8:32     ` Vlastimil Babka
2016-10-10 17:16   ` [RFC 0/4] try to reduce fragmenting fallbacks Johannes Weiner
2016-10-10 17:16     ` Johannes Weiner
2016-10-11 13:11   ` [RFC 6/4] mm, page_alloc: introduce MIGRATE_MIXED migratetype Vlastimil Babka
2016-10-11 13:11     ` Vlastimil Babka
2016-10-13 14:11   ` [RFC 7/4] mm, page_alloc: count movable pages when stealing Vlastimil Babka
2016-10-13 14:11     ` Vlastimil Babka

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.