nvdimm.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Pavel Tatashin <pasha.tatashin@soleen.com>
To: Alexander Duyck <alexander.h.duyck@linux.intel.com>,
	daniel.m.jordan@oracle.com
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	sparclinux@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nvdimm@lists.01.org, davem@davemloft.net,
	pavel.tatashin@microsoft.com, mhocko@suse.com, mingo@kernel.org,
	kirill.shutemov@linux.intel.com, dan.j.williams@intel.com,
	dave.jiang@intel.com, rppt@linux.vnet.ibm.com,
	willy@infradead.org, vbabka@suse.cz, khalid.aziz@oracle.com,
	ldufour@linux.vnet.ibm.com, mgorman@techsingularity.net,
	yi.z.zhang@linux.intel.com
Subject: Re: [mm PATCH v5 0/7] Deferred page init improvements
Date: Fri, 9 Nov 2018 19:00:06 -0500	[thread overview]
Message-ID: <20181110000006.tmcfnzynelaznn7u@xakep.localdomain> (raw)
In-Reply-To: <18b6634b912af7b4ec01396a2b0f3b31737c9ea2.camel@linux.intel.com>

On 18-11-09 15:14:35, Alexander Duyck wrote:
> On Fri, 2018-11-09 at 16:15 -0500, Pavel Tatashin wrote:
> > On 18-11-05 13:19:25, Alexander Duyck wrote:
> > > This patchset is essentially a refactor of the page initialization logic
> > > that is meant to provide for better code reuse while providing a
> > > significant improvement in deferred page initialization performance.
> > > 
> > > In my testing on an x86_64 system with 384GB of RAM and 3TB of persistent
> > > memory per node I have seen the following. In the case of regular memory
> > > initialization the deferred init time was decreased from 3.75s to 1.06s on
> > > average. For the persistent memory the initialization time dropped from
> > > 24.17s to 19.12s on average. This amounts to a 253% improvement for the
> > > deferred memory initialization performance, and a 26% improvement in the
> > > persistent memory initialization performance.
> > 
> > Hi Alex,
> > 
> > Please try to run your persistent memory init experiment with Daniel's
> > patches:
> > 
> > https://lore.kernel.org/lkml/20181105165558.11698-1-daniel.m.jordan@oracle.com/
> 
> I've taken a quick look at it. It seems like a bit of a brute force way
> to try and speed things up. I would be worried about it potentially

There is a limit to max number of threads that ktasks start. The memory
throughput is *much* higher than what one CPU can maxout in a node, so
there is no reason to leave the other CPUs sit idle during boot when
they can help to initialize.

> introducing performance issues if the number of CPUs thrown at it end
> up exceeding the maximum throughput of the memory.
> 
> The data provided with patch 11 seems to point to issues such as that.
> In the case of the E7-8895 example cited it is increasing the numbers
> of CPUs used from memory initialization from 8 to 72, a 9x increase in
> the number of CPUs but it is yeilding only a 3.88x speedup.

Yes, but in both cases we are far from maxing out the memory throughput.
The 3.88x is indeed low, and I do not know what slows it down.

Daniel,

Could you please check why multi-threading efficiency is so low here?

I bet, there is some atomic operation introduces a contention within a
node. It should be possible to resolve.

> 
> > The performance should improve by much more than 26%.
> 
> The 26% improvement, or speedup of 1.26x using the ktask approach, was
> for persistent memory, not deferred memory init. The ktask patch
> doesn't do anything for persistent memory since it is takes the hot-
> plug path and isn't handled via the deferred memory init.

Ah, I thought in your experiment persistent memory takes deferred init
path. So, what exactly in your patches make this 1.26x speedup?

> 
> I had increased deferred memory init to about 3.53x the original speed
> (3.75s to 1.06s) on the system which I was testing. I do agree the two
> patches should be able to synergistically boost each other though as
> this patch set was meant to make the init much more cache friendly so
> as a result it should scale better as you add additional cores. I know
> I had done some playing around with fake numa to split up a single node
> into 8 logical nodes and I had seen a similar speedup of about 3.85x
> with my test memory initializing in about 275ms.
> 
> > Overall, your works looks good, but it needs to be considered how easy it will be
> > to merge with ktask. I will try to complete the review today.
> > 
> > Thank you,
> > Pasha
> 
> Looking over the patches they are still in the RFC stage and the data
> is in need of updates since it is referencing 4.15-rc kernels as its
> baseline. If anything I really think the ktask patch 11 would be easier
> to rebase around my patch set then the other way around. Also, this
> series is in Andrew's mmots as of a few days ago, so I think it will be
> in the next mmotm that comes out.

I do not disagree, I think these two patch series should complement each
other. But, if your changes make it impossible for ktask, I would strongly argue
against it, as the potential improvements with ktasks are much higher.
But, so far I do not see anything, so I think they can work together. I
am still reviewing your work.

> 
> The integration with the ktask code should be pretty straight forward.
> If anything I think my code would probably make it easier since it gets
> rid of the need to do all this in two passes. The only new limitation
> it would add is that you would probably want to split up the work along
> either max order or section aligned boundaries. What it would

Which is totally OK, it should make ktasks scale even better.

> essentially do is make things so that each of the ktask threads would
> probably look more like deferred_grow_zone which after my patch set is
> actually a fairly simple function.
> 
> Thanks.

Thank you,
Pasha

  reply	other threads:[~2018-11-10  0:00 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-05 21:19 [mm PATCH v5 0/7] Deferred page init improvements Alexander Duyck
2018-11-05 21:19 ` [mm PATCH v5 1/7] mm: Use mm_zero_struct_page from SPARC on all 64b architectures Alexander Duyck
2018-11-05 21:19 ` [mm PATCH v5 2/7] mm: Drop meminit_pfn_in_nid as it is redundant Alexander Duyck
2018-11-05 21:19 ` [mm PATCH v5 3/7] mm: Implement new zone specific memblock iterator Alexander Duyck
2018-11-09 23:26   ` Pavel Tatashin
2018-11-09 23:58     ` Alexander Duyck
2018-11-10  0:11       ` Pavel Tatashin
2018-11-05 21:19 ` [mm PATCH v5 4/7] mm: Initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections Alexander Duyck
2018-11-10  1:02   ` Pavel Tatashin
2018-11-19 18:53     ` Alexander Duyck
2018-11-05 21:19 ` [mm PATCH v5 5/7] mm: Move hot-plug specific memory init into separate functions and optimize Alexander Duyck
2018-11-10  2:07   ` Pavel Tatashin
2018-11-05 21:19 ` [mm PATCH v5 6/7] mm: Add reserved flag setting to set_page_links Alexander Duyck
2018-11-10  2:11   ` Pavel Tatashin
2018-11-05 21:20 ` [mm PATCH v5 7/7] mm: Use common iterator for deferred_init_pages and deferred_free_pages Alexander Duyck
2018-11-10  4:13   ` Pavel Tatashin
2018-11-12 15:12     ` Alexander Duyck
2018-11-09 21:15 ` [mm PATCH v5 0/7] Deferred page init improvements Pavel Tatashin
2018-11-09 23:14   ` Alexander Duyck
2018-11-10  0:00     ` Pavel Tatashin [this message]
2018-11-10  0:46       ` Alexander Duyck
2018-11-10  1:16         ` Pavel Tatashin
2018-11-12 19:10           ` Alexander Duyck
2018-11-12 20:37             ` Pavel Tatashin
2018-11-12 16:25       ` Daniel Jordan
2018-11-14 15:07 ` Michal Hocko
2018-11-14 19:12   ` Pavel Tatashin
2018-11-14 21:35     ` Michal Hocko
2018-11-15  0:50   ` Alexander Duyck
2018-11-15  1:55     ` Mike Rapoport
2018-11-15 19:09       ` Mike Rapoport
2018-11-15  8:10     ` Michal Hocko
2018-11-15 16:02       ` Alexander Duyck
2018-11-15 16:40         ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181110000006.tmcfnzynelaznn7u@xakep.localdomain \
    --to=pasha.tatashin@soleen.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=daniel.m.jordan@oracle.com \
    --cc=dave.jiang@intel.com \
    --cc=davem@davemloft.net \
    --cc=khalid.aziz@oracle.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=ldufour@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=mingo@kernel.org \
    --cc=pavel.tatashin@microsoft.com \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=sparclinux@vger.kernel.org \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=yi.z.zhang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).