From: Mel Gorman <mgorman@suse.de>
To: Huang Ying <ying.huang@intel.com>
Cc: LKML <linux-kernel@vger.kernel.org>, LKP ML <lkp@01.org>
Subject: Re: [LKP] [mm] 3484b2de949: -46.2% aim7.jobs-per-min
Date: Wed, 25 Mar 2015 10:54:48 +0000
Message-ID: <20150325105448.GH4701@suse.de>
In-Reply-To: <1427100381.17170.2.camel@intel.com>

On Mon, Mar 23, 2015 at 04:46:21PM +0800, Huang Ying wrote:
> > My attention is occupied by the automatic NUMA regression at the moment
> > but I haven't forgotten this. Even with the high client count, I was not
> > able to reproduce this, so it appears to depend on having enough CPUs
> > stressing the allocator that requests bypass the per-cpu allocator and
> > contend heavily on the zone lock. I'm hoping to think of a
> > better alternative than adding more padding and increasing the cache
> > footprint of the allocator but so far I haven't thought of a good
> > alternative. Moving the lock to the end of the freelists would probably
> > address the problem but still increases the footprint for order-0
> > allocations by a cache line.
> 
> Any update on this?  Do you have a better idea?  I guess this may be
> fixed by putting some fields that are only read during order-0
> allocation in the same cache line as the lock, if there are any.
> 

Sorry for the delay; the automatic NUMA regression took a long time to
close, and it potentially affected anybody with a NUMA machine, not just
stress tests on large machines.

Moving it beside other fields just shifts the problem. The lock is
related to the free areas so it really belongs nearby, and from my own
testing it does not affect mid-sized machines. I'd rather not put the
lock in its own cache line unless we have to. Can you try the following
patch instead? It is untested, but it builds and should be safe.
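
For what it's worth, giving the lock its own line would mean something
like

	spinlock_t		lock ____cacheline_aligned_in_smp;

which burns most of a cache line on a lock that is a handful of bytes,
for every zone, whether or not the machine is big enough to need it.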

It'll increase the cache footprint of the page allocator, but so would
padding. It means the lock will contend with high-order free page
breakups, but those are not likely to happen during stress tests. It
also collides with the zone flags, but they are updated relatively
rarely.
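
In case it is not obvious from the diff alone, the ZONE_PADDING markers
the patch moves the lock between are the existing mechanism mmzone.h
uses to split struct zone into cache-line-aligned groups. From memory
it is defined roughly as follows (check your tree for the exact form):

	#if defined(CONFIG_SMP)
	struct zone_padding {
		char x[0];
	} ____cacheline_internodealigned_in_smp;	/* pad to next line */
	#define ZONE_PADDING(name)	struct zone_padding name;
	#else
	#define ZONE_PADDING(name)
	#endif

so the patch keeps the lock inside the _pad1_/_pad2_ group that already
holds the free areas instead of adding a third padded group.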


diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index f279d9c158cd..2782df47101e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -474,16 +474,15 @@ struct zone {
 	unsigned long		wait_table_bits;
 
 	ZONE_PADDING(_pad1_)
-
-	/* Write-intensive fields used from the page allocator */
-	spinlock_t		lock;
-
 	/* free areas of different sizes */
 	struct free_area	free_area[MAX_ORDER];
 
 	/* zone flags, see below */
 	unsigned long		flags;
 
+	/* Write-intensive fields used from the page allocator */
+	spinlock_t		lock;
+
 	ZONE_PADDING(_pad2_)
 
 	/* Write-intensive fields used by page reclaim */
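
If it helps to see the effect being discussed in isolation, here is a
minimal userspace sketch of the false sharing (illustrative only, not
kernel code; all names are made up). Build with gcc -O2 -pthread and
time it with and without the pad member:

	/* Two hot fields bouncing one cache line between two CPUs. */
	#include <pthread.h>
	#include <stdio.h>

	#define CACHELINE 64
	#define ITERS 100000000L

	struct shared {
		volatile long a;	/* hammered by thread A */
		char pad[CACHELINE - sizeof(long)]; /* remove to see contention */
		volatile long b;	/* hammered by thread B */
	} __attribute__((aligned(CACHELINE)));

	static struct shared s;

	static void *hammer_a(void *arg)
	{
		for (long i = 0; i < ITERS; i++)
			s.a++;
		return NULL;
	}

	static void *hammer_b(void *arg)
	{
		for (long i = 0; i < ITERS; i++)
			s.b++;
		return NULL;
	}

	int main(void)
	{
		pthread_t ta, tb;

		pthread_create(&ta, NULL, hammer_a, NULL);
		pthread_create(&tb, NULL, hammer_b, NULL);
		pthread_join(ta, NULL);
		pthread_join(tb, NULL);
		printf("a=%ld b=%ld\n", s.a, s.b);
		return 0;
	}

With the pad in place the two counters live in separate lines and the
threads run independently; without it, every increment forces the line
to migrate between the CPUs, which is the same pattern as zone->lock
sharing a line with other write-intensive fields.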

Thread overview: 26+ messages

2015-02-27  7:21 [LKP] [mm] 3484b2de949: -46.2% aim7.jobs-per-min Huang Ying
2015-02-27 11:53 ` Mel Gorman
2015-02-28  1:24   ` Huang Ying
2015-02-28  7:57   ` Huang Ying
2015-02-28  1:46 ` Mel Gorman
2015-02-28  2:30   ` Huang Ying
2015-02-28  2:42     ` Huang Ying
2015-02-28  7:30   ` Huang Ying
2015-03-05  5:34     ` Huang Ying
2015-03-05 10:26       ` Mel Gorman
2015-03-23  8:46         ` Huang Ying
2015-03-25 10:54           ` Mel Gorman [this message]
2015-03-27  8:49             ` Huang Ying
