From: Daniel J Blueman <daniel@numascale.com>
To: Waiman Long <waiman.long@hp.com>, Mel Gorman <mgorman@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	nzimmer <nzimmer@sgi.com>, Dave Hansen <dave.hansen@intel.com>,
	Scott Norton <scott.norton@hp.com>, Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Steffen Persvold <sp@numascale.com>
Subject: Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
Date: Sat, 23 May 2015 11:49:33 +0800	[thread overview]
Message-ID: <1432352973.6359.2@cpanel21.proisp.no> (raw)
In-Reply-To: <555F6404.4010905@hp.com>



-- 
Daniel J Blueman
Principal Software Engineer, Numascale

On Sat, May 23, 2015 at 1:14 AM, Waiman Long <waiman.long@hp.com> wrote:
> On 05/22/2015 05:33 AM, Mel Gorman wrote:
>> On Fri, May 22, 2015 at 02:30:01PM +0800, Daniel J Blueman wrote:
>>> On Thu, May 14, 2015 at 6:03 PM, Daniel J Blueman <daniel@numascale.com> wrote:
>>>> On Thu, May 14, 2015 at 12:31 AM, Mel Gorman <mgorman@suse.de> wrote:
>>>>> On Wed, May 13, 2015 at 10:53:33AM -0500, nzimmer wrote:
>>>>>> I just noticed a hang on my largest box.
>>>>>> I can only reproduce it with large core counts; if I turn down the
>>>>>> number of CPUs, it doesn't have an issue.
>>>>>> 
>>>>> Odd. The core count should make little difference, as only one
>>>>> CPU per node should be in use. Does sysrq+t give any indication
>>>>> of how or where it is hanging?
>>>> I was seeing the same behaviour of 1000ms increasing to 5500ms
>>>> [1]; this suggests either lock contention or O(n) behaviour.
>>>> 
>>>> Nathan, can you check with this ordering of patches from Andrew's
>>>> cache [2]? I was seeing hangs until I found them all.
>>>> 
>>>> I'll follow up with timing data.
>>> 7TB over 216 NUMA nodes, 1728 cores, from kernel 4.0.4 load to login:
>>> 
>>> 1. 2086s with patches 01-19 [1]
>>> 
>>> 2. 2026s adding "Take into account that large system caches scale
>>> linearly with memory", which has:
>>> min(2UL << (30 - PAGE_SHIFT), (pgdat->node_spanned_pages >> 3));
>>>
>>> 3. 2442s fixing min() to max():
>>> max(2UL << (30 - PAGE_SHIFT), (pgdat->node_spanned_pages >> 3));
>>>
>>> 4. 2064s adjusting the minimum and shift to:
>>> max(512UL << (20 - PAGE_SHIFT), (pgdat->node_spanned_pages >> 8));
>>>
>>> 5. 1934s adjusting the minimum and shift to:
>>> max(128UL << (20 - PAGE_SHIFT), (pgdat->node_spanned_pages >> 8));
>>> 
>>> 6. 930s with #5 plus the non-temporal PMD init patch I proposed
>>> earlier (I'll pursue that separately)
>>> 
>>> The scaling patch isn't in -mm.
>> That patch was superseded by "mm: meminit: finish
>> initialisation of struct pages before basic setup" and
>> "mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix"
>> so that's ok.
>> 
>> FWIW, I think you should still go ahead with the non-temporal patches
>> because there is potential benefit there beyond the initialisation. If
>> there were an arch-optional implementation of a non-temporal clear,
>> it would also be worth considering whether __GFP_ZERO should use
>> non-temporal stores. At a greater stretch, it would be worth
>> considering whether kswapd should zero pages as it frees them, to
>> avoid zeroing on the allocation side in the general case, as that
>> would be more generally useful and a stepping stone towards what the
>> series "Sanitizing freed pages" attempts.

Good tip, Mel; I'll take a look and gather some data when time allows,
though I expect it will only be a win when the clearing happens on a
different node from the allocation.
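To make the variants in the quoted timing list (2 to 5) concrete, here is a small user-space sketch that evaluates each formula for a single node. It assumes 4 KiB pages (PAGE_SHIFT of 12) and treats the 7TB as 7 TiB split evenly over the 216 nodes; early_pages(), example_node_pages() and print_variants() are hypothetical helpers for illustration, not kernel functions.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: 4 KiB pages (PAGE_SHIFT 12), as on x86-64. */
#define PAGE_SHIFT 12

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * Pages whose struct pages are initialised up front on one node, for the
 * variants numbered 2 to 5 in the timing list. "spanned" stands in for
 * pgdat->node_spanned_pages.
 */
static unsigned long early_pages(int variant, unsigned long spanned)
{
	switch (variant) {
	case 2: /* at most 2 GiB, or 1/8 of the node if that is smaller */
		return min_ul(2UL << (30 - PAGE_SHIFT), spanned >> 3);
	case 3: /* at least 2 GiB, or 1/8 of the node if that is larger */
		return max_ul(2UL << (30 - PAGE_SHIFT), spanned >> 3);
	case 4: /* at least 512 MiB, or 1/256 of the node if that is larger */
		return max_ul(512UL << (20 - PAGE_SHIFT), spanned >> 8);
	case 5: /* at least 128 MiB, or 1/256 of the node if that is larger */
		return max_ul(128UL << (20 - PAGE_SHIFT), spanned >> 8);
	default:
		assert(0 && "only variants 2 to 5 have a formula");
		return 0;
	}
}

/* Hypothetical node size: 7 TiB of 4 KiB pages split evenly over 216 nodes. */
static unsigned long example_node_pages(void)
{
	return (7UL << (40 - PAGE_SHIFT)) / 216;
}

/* Print each variant for one node, in pages and in MiB (rounded down). */
static void print_variants(unsigned long spanned)
{
	for (int v = 2; v <= 5; v++) {
		unsigned long pages = early_pages(v, spanned);
		printf("variant %d: %lu pages (%lu MiB)\n", v, pages,
		       pages >> (20 - PAGE_SHIFT));
	}
}
```

Under these assumptions each node spans 8,699,297 pages (about 33 GiB): variant 2 caps the early range at 2 GiB, variant 3 raises it to about 4.1 GiB, variant 4 floors it at 512 MiB, and variant 5 falls through to the 1/256 fraction, about 133 MiB. That ordering may help explain why the boot times improve from variant 3 to variant 5.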

> I think the non-temporal patch mainly benefits AMD systems. I have
> tried the patch on both DragonHawk and it actually made it boot up a
> little bit slower. I think the Intel-optimized "rep stosb"
> instruction (used in memset) is performing well. I did a similar
> test on the zero page code and the performance gain was inconclusive.

I suspect 'rep stosb' on modern Intel hardware can write whole
cachelines atomically, avoiding the read-modify-write (RMW), or that
the read part of the RMW is optimally prefetched. Open-coded stores
just can't reach the level of pipeline saturation that the microcode can.
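For anyone wanting to try the comparison Waiman describes, here is a minimal user-space sketch of the two strategies: a cached clear through memset and a non-temporal clear built from SSE2 streaming stores (_mm_stream_si128 followed by _mm_sfence). The function names and PAGE_BYTES are hypothetical, it assumes x86-64 with SSE2 (falling back to memset otherwise), and it is neither the kernel's clear_page() nor the non-temporal PMD init patch mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_setzero_si128, _mm_stream_si128, _mm_sfence */
#endif

#define PAGE_BYTES 4096 /* assumption: 4 KiB pages */

/* Cached clear: the libc memset may use "rep stosb" or vector stores,
 * depending on the libc, the size and the CPU. */
static void clear_page_cached(void *page)
{
	memset(page, 0, PAGE_BYTES);
}

/* Hypothetical non-temporal clear: streaming stores write around the
 * cache, so destination lines are not read for ownership and are not
 * left occupying cache. Falls back to memset without SSE2. */
static void clear_page_nt(void *page)
{
	/* _mm_stream_si128 requires a 16-byte aligned destination. */
	assert(((uintptr_t)page & 15) == 0);
#if defined(__SSE2__)
	__m128i zero = _mm_setzero_si128();
	__m128i *p = (__m128i *)page;
	for (size_t i = 0; i < PAGE_BYTES / sizeof(__m128i); i++)
		_mm_stream_si128(p + i, zero);
	/* Streaming stores are weakly ordered; the fence makes them
	 * globally visible before any later store. */
	_mm_sfence();
#else
	memset(page, 0, PAGE_BYTES);
#endif
}

/* Returns 1 if every byte of the page is zero, else 0. */
static int page_is_zero(const void *page)
{
	const unsigned char *b = page;
	for (size_t i = 0; i < PAGE_BYTES; i++)
		if (b[i])
			return 0;
	return 1;
}
```

Which one wins is hardware-dependent, as the DragonHawk result suggests; timing each over a buffer much larger than the last-level cache, ideally with the buffer on a node other than the one that later uses it, is one way to check on a given machine.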

Daniel

