linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Keith Busch <keith.busch@intel.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux ACPI <linux-acpi@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	Greg KH <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Dave Hansen <dave.hansen@intel.com>
Subject: Re: [PATCHv2 02/12] acpi/hmat: Parse and report heterogeneous memory
Date: Tue, 11 Dec 2018 12:29:45 -0800	[thread overview]
Message-ID: <CAPcyv4id0mgjdBPPw8Y26rodOEQ=EHfaTrgasU5g4X7u=dS2xw@mail.gmail.com> (raw)
In-Reply-To: <20181211165518.GB8101@localhost.localdomain>

On Tue, Dec 11, 2018 at 8:58 AM Keith Busch <keith.busch@intel.com> wrote:
>
> On Mon, Dec 10, 2018 at 10:03:40PM -0800, Dan Williams wrote:
> > I have a use case to detect the presence of a memory-side-cache early
> > at init time [1]. To me this means that hmat_init() needs to happen as
> > a part of acpi_numa_init(). Subsequently I think that also means that
> > the sysfs portion needs to be broken out to its own init path that can
> > probably run at module_init() priority.
> >
> > Perhaps we should split this patch set into two? The table parsing
> > with an in-kernel user is a bit easier to reason about and can go in
> > first. Towards that end can I steal / refllow patches 1 & 2 into the
> > memory randomization series? Other ideas how to handle this?
> >
> > [1]: https://lkml.org/lkml/2018/10/12/309
>
> To that end, will something like the following work for you? This just
> needs to happen after patch 1.
>
> ---
> diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
> index f5e09c39ff22..03ef3c8ba4ea 100644
> --- a/drivers/acpi/numa.c
> +++ b/drivers/acpi/numa.c
> @@ -40,6 +40,8 @@ static int pxm_to_node_map[MAX_PXM_DOMAINS]
>  static int node_to_pxm_map[MAX_NUMNODES]
>                         = { [0 ... MAX_NUMNODES - 1] = PXM_INVAL };
>
> +static unsigned long node_side_cached[BITS_TO_LONGS(MAX_PXM_DOMAINS)];
> +
>  unsigned char acpi_srat_revision __initdata;
>  int acpi_numa __initdata;
>
> @@ -262,6 +264,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
>         u64 start, end;
>         u32 hotpluggable;
>         int node, pxm;
> +       bool side_cached;
>
>         if (srat_disabled())
>                 goto out_err;
> @@ -308,6 +311,11 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
>                 pr_warn("SRAT: Failed to mark hotplug range [mem %#010Lx-%#010Lx] in memblock\n",
>                         (unsigned long long)start, (unsigned long long)end - 1);
>
> +       side_cached = test_bit(pxm, node_side_cached);
> +       if (side_cached && memblock_mark_sidecached(start, ma->length))
> +               pr_warn("SRAT: Failed to mark side cached range [mem %#010Lx-%#010Lx] in memblock\n",
> +                       (unsigned long long)start, (unsigned long long)end - 1);
> +
>         max_possible_pfn = max(max_possible_pfn, PFN_UP(end - 1));
>
>         return 0;
> @@ -411,6 +419,19 @@ acpi_parse_memory_affinity(union acpi_subtable_headers * header,
>         return 0;
>  }
>
> +static int __init
> +acpi_parse_cache(union acpi_subtable_headers *header, const unsigned long end)
> +{
> +       struct acpi_hmat_cache *cache = (void *)header;
> +       u32 attrs;
> +
> +       attrs = cache->cache_attributes;
> +       if (((attrs & ACPI_HMAT_CACHE_ASSOCIATIVITY) >> 8) ==
> +                                               ACPI_HMAT_CA_DIRECT_MAPPED)
> +               set_bit(cache->memory_PD, node_side_cached);

I'm not sure I see a use case for 'node_side_cached'. Instead I need
to know if a cache intercepts a "System RAM" resource, because a cache
in front of a reserved address range would not be impacted by page
allocator randomization. Or, are you saying have memblock generically
describes this capability and move the responsibility of acting on
that data to a higher level?

The other detail to consider is the cache ratio size, but that would
be a follow on feature. The use case is to automatically determine the
ratio to pass to numa_emulation:

    cc9aec03e58f x86/numa_emulation: Introduce uniform split capability

> +       return 0;
> +}
> +
>  static int __init acpi_parse_srat(struct acpi_table_header *table)
>  {
>         struct acpi_table_srat *srat = (struct acpi_table_srat *)table;
> @@ -422,6 +443,11 @@ static int __init acpi_parse_srat(struct acpi_table_header *table)
>         return 0;
>  }
>
> +static __init int acpi_parse_hmat(struct acpi_table_header *table)
> +{
> +       return 0;
> +}

What's this acpi_parse_hmat() stub for?

> +
>  static int __init
>  acpi_table_parse_srat(enum acpi_srat_type id,
>                       acpi_tbl_entry_handler handler, unsigned int max_entries)
> @@ -460,6 +486,16 @@ int __init acpi_numa_init(void)
>                                         sizeof(struct acpi_table_srat),
>                                         srat_proc, ARRAY_SIZE(srat_proc), 0);
>
> +               if (!acpi_table_parse(ACPI_SIG_HMAT, acpi_parse_hmat)) {
> +                       struct acpi_subtable_proc hmat_proc;
> +
> +                       memset(&hmat_proc, 0, sizeof(hmat_proc));
> +                       hmat_proc.handler = acpi_parse_cache;
> +                       hmat_proc.id = ACPI_HMAT_TYPE_CACHE;
> +                       acpi_table_parse_entries_array(ACPI_SIG_HMAT,
> +                                               sizeof(struct acpi_table_hmat),
> +                                               &hmat_proc, 1, 0);
> +               }
>                 cnt = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
>                                             acpi_parse_memory_affinity, 0);
>         }
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index aee299a6aa76..a24c918a4496 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -44,6 +44,7 @@ enum memblock_flags {
>         MEMBLOCK_HOTPLUG        = 0x1,  /* hotpluggable region */
>         MEMBLOCK_MIRROR         = 0x2,  /* mirrored region */
>         MEMBLOCK_NOMAP          = 0x4,  /* don't add to kernel direct mapping */
> +       MEMBLOCK_SIDECACHED     = 0x8,  /* System side caches memory access */

I'm concerned that we may be stretching memblock past its intended use
case especially for just this randomization case. For example, I think
memblock_find_in_range() gets confused in the presence of
MEMBLOCK_SIDECACHED memblocks.

  reply	other threads:[~2018-12-11 20:30 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-11  1:02 [PATCHv2 00/12] Heterogeneous memory node attributes Keith Busch
2018-12-11  1:02 ` [PATCHv2 01/12] acpi: Create subtable parsing infrastructure Keith Busch
2018-12-11  9:44   ` Rafael J. Wysocki
2018-12-19 23:19     ` Schmauss, Erik
2018-12-19 23:59       ` Dan Williams
2018-12-20  1:15         ` Schmauss, Erik
2018-12-20  8:57           ` Rafael J. Wysocki
2018-12-20 19:00             ` Schmauss, Erik
2018-12-13  9:05   ` kbuild test robot
2018-12-19 22:10   ` kbuild test robot
2018-12-11  1:03 ` [PATCHv2 02/12] acpi/hmat: Parse and report heterogeneous memory Keith Busch
2018-12-11  6:03   ` Dan Williams
2018-12-11 16:55     ` Keith Busch
2018-12-11 20:29       ` Dan Williams [this message]
2018-12-11 20:44         ` Keith Busch
2018-12-11 22:50           ` Dan Williams
2018-12-11  1:03 ` [PATCHv2 03/12] node: Link memory nodes to their compute nodes Keith Busch
2018-12-11  1:03 ` [PATCHv2 04/12] Documentation/ABI: Add new node sysfs attributes Keith Busch
2018-12-11  1:03 ` [PATCHv2 05/12] acpi/hmat: Register processor domain to its memory Keith Busch
2018-12-11  1:03 ` [PATCHv2 06/12] node: Add heterogenous memory performance Keith Busch
2018-12-11  1:03 ` [PATCHv2 07/12] Documentation/ABI: Add node performance attributes Keith Busch
2018-12-11  1:03 ` [PATCHv2 08/12] acpi/hmat: Register " Keith Busch
2018-12-11  1:03 ` [PATCHv2 09/12] node: Add memory caching attributes Keith Busch
2018-12-11  1:03 ` [PATCHv2 10/12] Documentation/ABI: Add node cache attributes Keith Busch
2018-12-11  1:03 ` [PATCHv2 11/12] acpi/hmat: Register memory side " Keith Busch
2018-12-11  1:03 ` [PATCHv2 12/12] doc/mm: New documentation for memory performance Keith Busch
2018-12-11  6:45   ` Mike Rapoport
2018-12-12  4:53   ` Aneesh Kumar K.V
2018-12-12 14:45     ` Keith Busch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAPcyv4id0mgjdBPPw8Y26rodOEQ=EHfaTrgasU5g4X7u=dS2xw@mail.gmail.com' \
    --to=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=keith.busch@intel.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=rafael@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).