RE: [RFC PATCH v4 11/13] mm: parallelize deferred struct page initialization within each node

From: "Elliott, Robert (Persistent Memory)" <elliott@hpe.com>
To: Daniel Jordan <daniel.m.jordan@oracle.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Cc: "aarcange@redhat.com" <aarcange@redhat.com>,
	"aaron.lu@intel.com" <aaron.lu@intel.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"bsd@redhat.com" <bsd@redhat.com>,
	"darrick.wong@oracle.com" <darrick.wong@oracle.com>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"jgg@mellanox.com" <jgg@mellanox.com>,
	"jwadams@google.com" <jwadams@google.com>,
	"jiangshanlai@gmail.com" <jiangshanlai@gmail.com>,
	"mhocko@kernel.org" <mhocko@kernel.org>,
	"mike.kravetz@oracle.com" <mike.kravetz@oracle.com>,
	"Pavel.Tatashin@microsoft.com" <Pavel.Tatashin@microsoft.com>,
	"prasad.singamsetty@oracle.com" <prasad.singamsetty@oracle.com>,
	"rdunlap@infradead.org" <rdunlap@infradead.org>,
	"steven.sistare@ora
Subject: RE: [RFC PATCH v4 11/13] mm: parallelize deferred struct page initialization within each node
Date: Sat, 10 Nov 2018 03:48:14 +0000
Message-ID: <AT5PR8401MB1169798EBEF1EE5EBA3ABFFFABC70@AT5PR8401MB1169.NAMPRD84.PROD.OUTLOOK.COM>
In-Reply-To: <20181105165558.11698-12-daniel.m.jordan@oracle.com>

> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org <linux-kernel-
> owner@vger.kernel.org> On Behalf Of Daniel Jordan
> Sent: Monday, November 05, 2018 10:56 AM
> Subject: [RFC PATCH v4 11/13] mm: parallelize deferred struct page
> initialization within each node
> 
> ...  The kernel doesn't
> know the memory bandwidth of a given system to get the most efficient
> number of threads, so there's some guesswork involved.  

The ACPI HMAT (Heterogeneous Memory Attribute Table) is designed to report
that kind of information, and could facilitate automatic tuning.

There was discussion last year about kernel support for it:
https://lore.kernel.org/lkml/20171214021019.13579-1-ross.zwisler@linux.intel.com/
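To illustrate what such tuning might look like, here is a rough userspace sketch, not kernel code. It assumes a per-node bandwidth figure is available (for example from HMAT) and that the bandwidth a single init thread can consume has been measured separately; both inputs, and the name threads_for_bandwidth(), are hypothetical. When no per-thread figure is known it falls back to the quarter-of-the-CPUs heuristic from the patch.

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper: pick a thread count so that the threads together
 * roughly saturate the node's memory bandwidth.
 *
 * node_bw_mbps       - node bandwidth in MB/s (assumed to come from HMAT)
 * per_thread_bw_mbps - bandwidth one init thread consumes (assumed measured)
 * nr_cpus            - CPUs available on the node
 */
static unsigned int threads_for_bandwidth(unsigned long node_bw_mbps,
					  unsigned long per_thread_bw_mbps,
					  unsigned int nr_cpus)
{
	unsigned long threads;

	if (nr_cpus == 0)
		return 0;

	/* No per-thread figure: fall back to a quarter of the CPUs. */
	if (per_thread_bw_mbps == 0)
		return DIV_ROUND_UP(nr_cpus, 4);

	threads = node_bw_mbps / per_thread_bw_mbps;

	/* Clamp to the range [1, nr_cpus]. */
	if (threads < 1)
		threads = 1;
	if (threads > nr_cpus)
		threads = nr_cpus;

	return (unsigned int)threads;
}
```

With made-up numbers, a node reporting 100000 MB/s and threads that each consume 10000 MB/s would get 10 threads on a 64-CPU node, where the quarter heuristic would start 16.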

> In testing, a reasonable value turned out to be about a quarter of the
> CPUs on the node.
...
> +	/*
> +	 * We'd like to know the memory bandwidth of the chip to calculate the
> +	 * most efficient number of threads to start, but we can't.
> +	 * In testing, a good value for a variety of systems was a quarter of the CPUs on the node.
> +	 */
> +	nr_node_cpus = DIV_ROUND_UP(cpumask_weight(cpumask), 4);

You might want to base that calculation on physical cores rather than
hyperthreads, and limit the thread count to the number of physical cores.
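A rough userspace model of the idea, not kernel code: count each physical core once, then take a quarter of that. The core_id[] array and both function names are hypothetical stand-ins; in the kernel the sibling information would presumably come from the topology helpers (e.g. topology_sibling_cpumask()) rather than from an array.

```c
#include <assert.h>
#include <stddef.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Count the distinct physical cores among the CPUs of one node.
 * core_id[i] is the (hypothetical) core identifier of the i-th CPU;
 * SMT siblings share the same core_id.
 */
static unsigned int count_physical_cores(const int *core_id, size_t nr_cpus)
{
	unsigned int cores = 0;

	assert(core_id != NULL || nr_cpus == 0);

	for (size_t i = 0; i < nr_cpus; i++) {
		int seen = 0;

		/* Count a CPU only if no earlier CPU shares its core. */
		for (size_t j = 0; j < i; j++) {
			if (core_id[j] == core_id[i]) {
				seen = 1;
				break;
			}
		}
		if (!seen)
			cores++;
	}

	return cores;
}

/* A quarter of the node's physical cores, rounded up. */
static unsigned int deferred_init_threads(const int *core_id, size_t nr_cpus)
{
	return DIV_ROUND_UP(count_physical_cores(core_id, nr_cpus), 4);
}
```

On a node with 8 cores and 2 threads per core, this starts 2 threads, where cpumask_weight() / 4 over the 16 logical CPUs would start 4.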

---
Robert Elliott, HPE Persistent Memory