* [PATCH 1/5] ACPI: NUMA: Add 'nohmat' option
2020-03-02 22:19 [PATCH 0/5] Manual definition of Soft Reserved memory devices Dan Williams
@ 2020-03-02 22:20 ` Dan Williams
2020-03-18 0:08 ` Dan Williams
2020-03-02 22:20 ` [PATCH 2/5] efi/fake_mem: Arrange for a resource entry per efi_fake_mem instance Dan Williams
` (4 subsequent siblings)
5 siblings, 1 reply; 15+ messages in thread
From: Dan Williams @ 2020-03-02 22:20 UTC (permalink / raw)
To: linux-acpi
Cc: x86, Rafael J. Wysocki, Dave Hansen, Andy Lutomirski,
Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
H. Peter Anvin, ard.biesheuvel, linux-nvdimm, linux-kernel
Disable parsing of the HMAT for debug, to work around broken platform
instances, or cases where it is otherwise not wanted.
Cc: x86@kernel.org
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
arch/x86/mm/numa.c | 4 ++++
drivers/acpi/numa/hmat.c | 3 ++-
include/acpi/acpi_numa.h | 1 +
3 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 59ba008504dc..22de2e2610c1 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -44,6 +44,10 @@ static __init int numa_setup(char *opt)
#ifdef CONFIG_ACPI_NUMA
if (!strncmp(opt, "noacpi", 6))
acpi_numa = -1;
+#ifdef CONFIG_ACPI_HMAT
+ if (!strncmp(opt, "nohmat", 6))
+ hmat_disable = 1;
+#endif
#endif
return 0;
}
diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index 2c32cfb72370..d3db121e393a 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -26,6 +26,7 @@
#include <linux/sysfs.h>
static u8 hmat_revision;
+int hmat_disable __initdata;
static LIST_HEAD(targets);
static LIST_HEAD(initiators);
@@ -814,7 +815,7 @@ static __init int hmat_init(void)
enum acpi_hmat_type i;
acpi_status status;
- if (srat_disabled())
+ if (srat_disabled() || hmat_disable)
return 0;
status = acpi_get_table(ACPI_SIG_SRAT, 0, &tbl);
diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h
index fdebcfc6c8df..48ca468e9b61 100644
--- a/include/acpi/acpi_numa.h
+++ b/include/acpi/acpi_numa.h
@@ -18,6 +18,7 @@ extern int node_to_pxm(int);
extern int acpi_map_pxm_to_node(int);
extern unsigned char acpi_srat_revision;
extern int acpi_numa __initdata;
+extern int hmat_disable __initdata;
extern void bad_srat(void);
extern int srat_disabled(void);
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 1/5] ACPI: NUMA: Add 'nohmat' option
2020-03-02 22:20 ` [PATCH 1/5] ACPI: NUMA: Add 'nohmat' option Dan Williams
@ 2020-03-18 0:08 ` Dan Williams
2020-03-18 8:24 ` Rafael J. Wysocki
0 siblings, 1 reply; 15+ messages in thread
From: Dan Williams @ 2020-03-18 0:08 UTC (permalink / raw)
To: Linux ACPI
Cc: X86 ML, Rafael J. Wysocki, Dave Hansen, Andy Lutomirski,
Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
H. Peter Anvin, Ard Biesheuvel, linux-nvdimm,
Linux Kernel Mailing List
On Mon, Mar 2, 2020 at 2:36 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Disable parsing of the HMAT for debug, to work around broken platform
> instances, or cases where it is otherwise not wanted.
Rafael, any heartburn with this change to the numa= option?
...as I look at this I realize I failed to also update
Documentation/x86/x86_64/boot-options.rst, will fix.
>
> Cc: x86@kernel.org
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
> arch/x86/mm/numa.c | 4 ++++
> drivers/acpi/numa/hmat.c | 3 ++-
> include/acpi/acpi_numa.h | 1 +
> 3 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index 59ba008504dc..22de2e2610c1 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -44,6 +44,10 @@ static __init int numa_setup(char *opt)
> #ifdef CONFIG_ACPI_NUMA
> if (!strncmp(opt, "noacpi", 6))
> acpi_numa = -1;
> +#ifdef CONFIG_ACPI_HMAT
> + if (!strncmp(opt, "nohmat", 6))
> + hmat_disable = 1;
> +#endif
> #endif
> return 0;
> }
> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> index 2c32cfb72370..d3db121e393a 100644
> --- a/drivers/acpi/numa/hmat.c
> +++ b/drivers/acpi/numa/hmat.c
> @@ -26,6 +26,7 @@
> #include <linux/sysfs.h>
>
> static u8 hmat_revision;
> +int hmat_disable __initdata;
>
> static LIST_HEAD(targets);
> static LIST_HEAD(initiators);
> @@ -814,7 +815,7 @@ static __init int hmat_init(void)
> enum acpi_hmat_type i;
> acpi_status status;
>
> - if (srat_disabled())
> + if (srat_disabled() || hmat_disable)
> return 0;
>
> status = acpi_get_table(ACPI_SIG_SRAT, 0, &tbl);
> diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h
> index fdebcfc6c8df..48ca468e9b61 100644
> --- a/include/acpi/acpi_numa.h
> +++ b/include/acpi/acpi_numa.h
> @@ -18,6 +18,7 @@ extern int node_to_pxm(int);
> extern int acpi_map_pxm_to_node(int);
> extern unsigned char acpi_srat_revision;
> extern int acpi_numa __initdata;
> +extern int hmat_disable __initdata;
>
> extern void bad_srat(void);
> extern int srat_disabled(void);
>
* Re: [PATCH 1/5] ACPI: NUMA: Add 'nohmat' option
2020-03-18 0:08 ` Dan Williams
@ 2020-03-18 8:24 ` Rafael J. Wysocki
2020-03-18 17:39 ` Dan Williams
0 siblings, 1 reply; 15+ messages in thread
From: Rafael J. Wysocki @ 2020-03-18 8:24 UTC (permalink / raw)
To: Dan Williams
Cc: Linux ACPI, X86 ML, Rafael J. Wysocki, Dave Hansen,
Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, H. Peter Anvin, Ard Biesheuvel, linux-nvdimm,
Linux Kernel Mailing List
On Wed, Mar 18, 2020 at 1:09 AM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Mon, Mar 2, 2020 at 2:36 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > Disable parsing of the HMAT for debug, to work around broken platform
> > instances, or cases where it is otherwise not wanted.
>
> Rafael, any heartburn with this change to the numa= option?
>
> ...as I look at this I realize I failed to also update
> Documentation/x86/x86_64/boot-options.rst, will fix.
Thanks!
Apart from this just a minor nit below.
> >
> > Cc: x86@kernel.org
> > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> > arch/x86/mm/numa.c | 4 ++++
> > drivers/acpi/numa/hmat.c | 3 ++-
> > include/acpi/acpi_numa.h | 1 +
> > 3 files changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> > index 59ba008504dc..22de2e2610c1 100644
> > --- a/arch/x86/mm/numa.c
> > +++ b/arch/x86/mm/numa.c
> > @@ -44,6 +44,10 @@ static __init int numa_setup(char *opt)
> > #ifdef CONFIG_ACPI_NUMA
> > if (!strncmp(opt, "noacpi", 6))
> > acpi_numa = -1;
> > +#ifdef CONFIG_ACPI_HMAT
> > + if (!strncmp(opt, "nohmat", 6))
> > + hmat_disable = 1;
> > +#endif
I wonder if IS_ENABLED() would work here?
> > #endif
> > return 0;
> > }
> > diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> > index 2c32cfb72370..d3db121e393a 100644
> > --- a/drivers/acpi/numa/hmat.c
> > +++ b/drivers/acpi/numa/hmat.c
> > @@ -26,6 +26,7 @@
> > #include <linux/sysfs.h>
> >
> > static u8 hmat_revision;
> > +int hmat_disable __initdata;
> >
> > static LIST_HEAD(targets);
> > static LIST_HEAD(initiators);
> > @@ -814,7 +815,7 @@ static __init int hmat_init(void)
> > enum acpi_hmat_type i;
> > acpi_status status;
> >
> > - if (srat_disabled())
> > + if (srat_disabled() || hmat_disable)
> > return 0;
> >
> > status = acpi_get_table(ACPI_SIG_SRAT, 0, &tbl);
> > diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h
> > index fdebcfc6c8df..48ca468e9b61 100644
> > --- a/include/acpi/acpi_numa.h
> > +++ b/include/acpi/acpi_numa.h
> > @@ -18,6 +18,7 @@ extern int node_to_pxm(int);
> > extern int acpi_map_pxm_to_node(int);
> > extern unsigned char acpi_srat_revision;
> > extern int acpi_numa __initdata;
> > +extern int hmat_disable __initdata;
> >
> > extern void bad_srat(void);
> > extern int srat_disabled(void);
> >
* Re: [PATCH 1/5] ACPI: NUMA: Add 'nohmat' option
2020-03-18 8:24 ` Rafael J. Wysocki
@ 2020-03-18 17:39 ` Dan Williams
2020-03-19 9:30 ` Rafael J. Wysocki
0 siblings, 1 reply; 15+ messages in thread
From: Dan Williams @ 2020-03-18 17:39 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux ACPI, X86 ML, Rafael J. Wysocki, Dave Hansen,
Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, H. Peter Anvin, Ard Biesheuvel, linux-nvdimm,
Linux Kernel Mailing List
On Wed, Mar 18, 2020 at 1:24 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Mar 18, 2020 at 1:09 AM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > On Mon, Mar 2, 2020 at 2:36 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > Disable parsing of the HMAT for debug, to work around broken platform
> > > instances, or cases where it is otherwise not wanted.
> >
> > Rafael, any heartburn with this change to the numa= option?
> >
> > ...as I look at this I realize I failed to also update
> > Documentation/x86/x86_64/boot-options.rst, will fix.
>
> Thanks!
>
> Apart from this just a minor nit below.
>
> > >
> > > Cc: x86@kernel.org
> > > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> > > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > > Cc: Andy Lutomirski <luto@kernel.org>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Ingo Molnar <mingo@redhat.com>
> > > Cc: Borislav Petkov <bp@alien8.de>
> > > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > > ---
> > > arch/x86/mm/numa.c | 4 ++++
> > > drivers/acpi/numa/hmat.c | 3 ++-
> > > include/acpi/acpi_numa.h | 1 +
> > > 3 files changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> > > index 59ba008504dc..22de2e2610c1 100644
> > > --- a/arch/x86/mm/numa.c
> > > +++ b/arch/x86/mm/numa.c
> > > @@ -44,6 +44,10 @@ static __init int numa_setup(char *opt)
> > > #ifdef CONFIG_ACPI_NUMA
> > > if (!strncmp(opt, "noacpi", 6))
> > > acpi_numa = -1;
> > > +#ifdef CONFIG_ACPI_HMAT
> > > + if (!strncmp(opt, "nohmat", 6))
> > > + hmat_disable = 1;
> > > +#endif
>
> I wonder if IS_ENABLED() would work here?
I took a look. hmat_disable, acpi_numa, and numa_emu_cmdline() are in
other compilation units. I could wrap the writes to those variables in
helper functions, and change numa_emu_cmdline(), so that they compile
away when their respective configuration options are not present.
Should we do that in general to have a touch point to report "you
specified an option that is invalid for your current kernel
configuration"? I'm happy to do that as a follow-on if you think it's
worthwhile.
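
A minimal userspace sketch of the helper-function idea described above,
not the actual follow-on patch. HAVE_ACPI_HMAT, disable_hmat(), and
numa_setup_sketch() are hypothetical names; the macro stands in for an
IS_ENABLED(CONFIG_ACPI_HMAT) style check, and the -1 return exists only
to make the rejected case observable here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a kernel config switch; set to 0 to "compile out" HMAT. */
#define HAVE_ACPI_HMAT 1

static int hmat_disable;

/*
 * Hypothetical helper that owns the write to hmat_disable.
 * Returns false when the option is not supported by this build, which
 * gives the caller a single place to report an invalid option.
 */
static bool disable_hmat(void)
{
	if (!HAVE_ACPI_HMAT)
		return false;
	hmat_disable = 1;
	return true;
}

/* Simplified option parser: only the "nohmat" case is modeled. */
static int numa_setup_sketch(const char *opt)
{
	assert(opt != NULL);

	if (!strncmp(opt, "nohmat", 6) && !disable_hmat()) {
		fprintf(stderr, "numa=nohmat ignored: HMAT support not built in\n");
		return -1;
	}
	return 0;
}
```

With the write wrapped this way, the parser needs no #ifdef of its own,
and an unsupported option can be reported instead of silently ignored.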
* Re: [PATCH 1/5] ACPI: NUMA: Add 'nohmat' option
2020-03-18 17:39 ` Dan Williams
@ 2020-03-19 9:30 ` Rafael J. Wysocki
0 siblings, 0 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2020-03-19 9:30 UTC (permalink / raw)
To: Dan Williams
Cc: Rafael J. Wysocki, Linux ACPI, X86 ML, Rafael J. Wysocki,
Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, H. Peter Anvin, Ard Biesheuvel,
linux-nvdimm, Linux Kernel Mailing List
On Wed, Mar 18, 2020 at 6:39 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Wed, Mar 18, 2020 at 1:24 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Wed, Mar 18, 2020 at 1:09 AM Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > On Mon, Mar 2, 2020 at 2:36 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > > >
> > > > Disable parsing of the HMAT for debug, to work around broken platform
> > > > instances, or cases where it is otherwise not wanted.
> > >
> > > Rafael, any heartburn with this change to the numa= option?
> > >
> > > ...as I look at this I realize I failed to also update
> > > Documentation/x86/x86_64/boot-options.rst, will fix.
> >
> > Thanks!
> >
> > Apart from this just a minor nit below.
> >
> > > >
> > > > Cc: x86@kernel.org
> > > > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> > > > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > > > Cc: Andy Lutomirski <luto@kernel.org>
> > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > Cc: Ingo Molnar <mingo@redhat.com>
> > > > Cc: Borislav Petkov <bp@alien8.de>
> > > > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > > > ---
> > > > arch/x86/mm/numa.c | 4 ++++
> > > > drivers/acpi/numa/hmat.c | 3 ++-
> > > > include/acpi/acpi_numa.h | 1 +
> > > > 3 files changed, 7 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> > > > index 59ba008504dc..22de2e2610c1 100644
> > > > --- a/arch/x86/mm/numa.c
> > > > +++ b/arch/x86/mm/numa.c
> > > > @@ -44,6 +44,10 @@ static __init int numa_setup(char *opt)
> > > > #ifdef CONFIG_ACPI_NUMA
> > > > if (!strncmp(opt, "noacpi", 6))
> > > > acpi_numa = -1;
> > > > +#ifdef CONFIG_ACPI_HMAT
> > > > + if (!strncmp(opt, "nohmat", 6))
> > > > + hmat_disable = 1;
> > > > +#endif
> >
> > I wonder if IS_ENABLED() would work here?
>
> I took a look. hmat_disable, acpi_numa, and numa_emu_cmdline() are in
> other compilation units. I could wrap writing those variables with
> helper functions, and change numa_emu_cmdline(), to compile away when
> their respective configuration options are not present.
>
> Should we do that in general to have a touch point to report "you
> specified an option that is invalid for your current kernel
> configuration"? I'm happy to do that as a follow-on if you think it's
> worthwhile.
Yes, please.
* [PATCH 2/5] efi/fake_mem: Arrange for a resource entry per efi_fake_mem instance
2020-03-02 22:19 [PATCH 0/5] Manual definition of Soft Reserved memory devices Dan Williams
2020-03-02 22:20 ` [PATCH 1/5] ACPI: NUMA: Add 'nohmat' option Dan Williams
@ 2020-03-02 22:20 ` Dan Williams
2020-03-03 8:01 ` Ard Biesheuvel
2020-03-02 22:20 ` [PATCH 3/5] ACPI: HMAT: Refactor hmat_register_target_device to hmem_register_device Dan Williams
` (3 subsequent siblings)
5 siblings, 1 reply; 15+ messages in thread
From: Dan Williams @ 2020-03-02 22:20 UTC (permalink / raw)
To: linux-acpi
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
x86, Ard Biesheuvel, peterz, dave.hansen, ard.biesheuvel,
linux-nvdimm, linux-kernel
In preparation for attaching a platform device per iomem resource, teach
the efi_fake_mem code to create an e820 entry per instance. Similar to
E820_TYPE_PRAM, bypass merging resources when the e820 map is sanitized.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
arch/x86/kernel/e820.c | 16 +++++++++++++++-
drivers/firmware/efi/x86_fake_mem.c | 12 +++++++++---
2 files changed, 24 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index c5399e80c59c..96babb3a6629 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -305,6 +305,20 @@ static int __init cpcompare(const void *a, const void *b)
return (ap->addr != ap->entry->addr) - (bp->addr != bp->entry->addr);
}
+static bool e820_nomerge(enum e820_type type)
+{
+ /*
+ * These types may indicate distinct platform ranges aligned to
+ * numa node, protection domain, performance domain, or other
+ * boundaries. Do not merge them.
+ */
+ if (type == E820_TYPE_PRAM)
+ return true;
+ if (type == E820_TYPE_SOFT_RESERVED)
+ return true;
+ return false;
+}
+
int __init e820__update_table(struct e820_table *table)
{
struct e820_entry *entries = table->entries;
@@ -380,7 +394,7 @@ int __init e820__update_table(struct e820_table *table)
}
/* Continue building up new map based on this information: */
- if (current_type != last_type || current_type == E820_TYPE_PRAM) {
+ if (current_type != last_type || e820_nomerge(current_type)) {
if (last_type != 0) {
new_entries[new_nr_entries].size = change_point[chg_idx]->addr - last_addr;
/* Move forward only if the new size was non-zero: */
diff --git a/drivers/firmware/efi/x86_fake_mem.c b/drivers/firmware/efi/x86_fake_mem.c
index e5d6d5a1b240..0bafcc1bb0f6 100644
--- a/drivers/firmware/efi/x86_fake_mem.c
+++ b/drivers/firmware/efi/x86_fake_mem.c
@@ -38,7 +38,7 @@ void __init efi_fake_memmap_early(void)
m_start = mem->range.start;
m_end = mem->range.end;
for_each_efi_memory_desc(md) {
- u64 start, end;
+ u64 start, end, size;
if (md->type != EFI_CONVENTIONAL_MEMORY)
continue;
@@ -58,11 +58,17 @@ void __init efi_fake_memmap_early(void)
*/
start = max(start, m_start);
end = min(end, m_end);
+ size = end - start + 1;
if (end <= start)
continue;
- e820__range_update(start, end - start + 1, E820_TYPE_RAM,
- E820_TYPE_SOFT_RESERVED);
+
+ /*
+ * Ensure each efi_fake_mem instance results in
+ * a unique e820 resource
+ */
+ e820__range_remove(start, size, E820_TYPE_RAM, 1);
+ e820__range_add(start, size, E820_TYPE_SOFT_RESERVED);
e820__update_table(e820_table);
}
}
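
The effect of a "no merge" type on table sanitizing can be illustrated
with a much simplified userspace sketch. This is not the change-point
algorithm that e820__update_table() uses; struct range, range_nomerge(),
and coalesce() are hypothetical names, and the input is assumed to be
sorted and non-overlapping:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range types mirroring the roles of the e820 types in the patch. */
enum range_type { RANGE_RAM = 1, RANGE_SOFT_RESERVED, RANGE_PRAM };

struct range {
	uint64_t addr;
	uint64_t size;
	enum range_type type;
};

/* Types whose adjacent entries must stay distinct. */
static bool range_nomerge(enum range_type type)
{
	return type == RANGE_PRAM || type == RANGE_SOFT_RESERVED;
}

/*
 * Coalesce adjacent same-type entries in place and return the new count.
 * Entries of a "no merge" type are always kept as separate entries.
 */
static size_t coalesce(struct range *r, size_t n)
{
	size_t out = 0;

	assert(r != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (out > 0 &&
		    r[out - 1].type == r[i].type &&
		    r[out - 1].addr + r[out - 1].size == r[i].addr &&
		    !range_nomerge(r[i].type)) {
			r[out - 1].size += r[i].size;
			continue;
		}
		r[out++] = r[i];
	}
	return out;
}
```

Two adjacent RAM entries collapse into one, while two adjacent soft
reserved entries remain separate, so each efi_fake_mem instance can
still map to its own resource.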
* Re: [PATCH 2/5] efi/fake_mem: Arrange for a resource entry per efi_fake_mem instance
2020-03-02 22:20 ` [PATCH 2/5] efi/fake_mem: Arrange for a resource entry per efi_fake_mem instance Dan Williams
@ 2020-03-03 8:01 ` Ard Biesheuvel
0 siblings, 0 replies; 15+ messages in thread
From: Ard Biesheuvel @ 2020-03-03 8:01 UTC (permalink / raw)
To: Dan Williams
Cc: ACPI Devel Mailing List, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
Peter Zijlstra, Dave Hansen, linux-nvdimm,
Linux Kernel Mailing List
On Mon, 2 Mar 2020 at 23:36, Dan Williams <dan.j.williams@intel.com> wrote:
>
> In preparation for attaching a platform device per iomem resource, teach
> the efi_fake_mem code to create an e820 entry per instance. Similar to
> E820_TYPE_PRAM, bypass merging resources when the e820 map is sanitized.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/x86/kernel/e820.c | 16 +++++++++++++++-
> drivers/firmware/efi/x86_fake_mem.c | 12 +++++++++---
> 2 files changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index c5399e80c59c..96babb3a6629 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -305,6 +305,20 @@ static int __init cpcompare(const void *a, const void *b)
> return (ap->addr != ap->entry->addr) - (bp->addr != bp->entry->addr);
> }
>
> +static bool e820_nomerge(enum e820_type type)
> +{
> + /*
> + * These types may indicate distinct platform ranges aligned to
> + * numa node, protection domain, performance domain, or other
> + * boundaries. Do not merge them.
> + */
> + if (type == E820_TYPE_PRAM)
> + return true;
> + if (type == E820_TYPE_SOFT_RESERVED)
> + return true;
> + return false;
> +}
> +
> int __init e820__update_table(struct e820_table *table)
> {
> struct e820_entry *entries = table->entries;
> @@ -380,7 +394,7 @@ int __init e820__update_table(struct e820_table *table)
> }
>
> /* Continue building up new map based on this information: */
> - if (current_type != last_type || current_type == E820_TYPE_PRAM) {
> + if (current_type != last_type || e820_nomerge(current_type)) {
> if (last_type != 0) {
> new_entries[new_nr_entries].size = change_point[chg_idx]->addr - last_addr;
> /* Move forward only if the new size was non-zero: */
> diff --git a/drivers/firmware/efi/x86_fake_mem.c b/drivers/firmware/efi/x86_fake_mem.c
> index e5d6d5a1b240..0bafcc1bb0f6 100644
> --- a/drivers/firmware/efi/x86_fake_mem.c
> +++ b/drivers/firmware/efi/x86_fake_mem.c
> @@ -38,7 +38,7 @@ void __init efi_fake_memmap_early(void)
> m_start = mem->range.start;
> m_end = mem->range.end;
> for_each_efi_memory_desc(md) {
> - u64 start, end;
> + u64 start, end, size;
>
> if (md->type != EFI_CONVENTIONAL_MEMORY)
> continue;
> @@ -58,11 +58,17 @@ void __init efi_fake_memmap_early(void)
> */
> start = max(start, m_start);
> end = min(end, m_end);
> + size = end - start + 1;
>
> if (end <= start)
> continue;
> - e820__range_update(start, end - start + 1, E820_TYPE_RAM,
> - E820_TYPE_SOFT_RESERVED);
> +
> + /*
> + * Ensure each efi_fake_mem instance results in
> + * a unique e820 resource
> + */
> + e820__range_remove(start, size, E820_TYPE_RAM, 1);
> + e820__range_add(start, size, E820_TYPE_SOFT_RESERVED);
> e820__update_table(e820_table);
> }
> }
>
* [PATCH 3/5] ACPI: HMAT: Refactor hmat_register_target_device to hmem_register_device
2020-03-02 22:19 [PATCH 0/5] Manual definition of Soft Reserved memory devices Dan Williams
2020-03-02 22:20 ` [PATCH 1/5] ACPI: NUMA: Add 'nohmat' option Dan Williams
2020-03-02 22:20 ` [PATCH 2/5] efi/fake_mem: Arrange for a resource entry per efi_fake_mem instance Dan Williams
@ 2020-03-02 22:20 ` Dan Williams
2020-03-02 22:20 ` [PATCH 4/5] resource: Report parent to walk_iomem_res_desc() callback Dan Williams
` (2 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: Dan Williams @ 2020-03-02 22:20 UTC (permalink / raw)
To: linux-acpi
Cc: Rafael J. Wysocki, peterz, dave.hansen, ard.biesheuvel,
linux-nvdimm, linux-kernel
In preparation for exposing "Soft Reserved" memory ranges without an
HMAT, move the hmem device registration to its own compilation unit and
make the implementation generic.
The generic implementation drops usage of acpi_map_pxm_to_online_node()
that was translating ACPI proximity domain values and instead relies on
numa_map_to_online_node() to determine the numa node for the device.
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/acpi/numa/hmat.c | 68 ++++-----------------------------------------
drivers/dax/Kconfig | 4 +++
drivers/dax/Makefile | 3 +-
drivers/dax/hmem/Makefile | 5 +++
drivers/dax/hmem/device.c | 64 ++++++++++++++++++++++++++++++++++++++++++
drivers/dax/hmem/hmem.c | 2 +
include/linux/dax.h | 8 +++++
7 files changed, 89 insertions(+), 65 deletions(-)
create mode 100644 drivers/dax/hmem/Makefile
create mode 100644 drivers/dax/hmem/device.c
rename drivers/dax/{hmem.c => hmem/hmem.c} (98%)
diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index d3db121e393a..2379efcea570 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -24,6 +24,7 @@
#include <linux/mutex.h>
#include <linux/node.h>
#include <linux/sysfs.h>
+#include <linux/dax.h>
static u8 hmat_revision;
int hmat_disable __initdata;
@@ -635,66 +636,6 @@ static void hmat_register_target_perf(struct memory_target *target)
node_set_perf_attrs(mem_nid, &target->hmem_attrs, 0);
}
-static void hmat_register_target_device(struct memory_target *target,
- struct resource *r)
-{
- /* define a clean / non-busy resource for the platform device */
- struct resource res = {
- .start = r->start,
- .end = r->end,
- .flags = IORESOURCE_MEM,
- };
- struct platform_device *pdev;
- struct memregion_info info;
- int rc, id;
-
- rc = region_intersects(res.start, resource_size(&res), IORESOURCE_MEM,
- IORES_DESC_SOFT_RESERVED);
- if (rc != REGION_INTERSECTS)
- return;
-
- id = memregion_alloc(GFP_KERNEL);
- if (id < 0) {
- pr_err("memregion allocation failure for %pr\n", &res);
- return;
- }
-
- pdev = platform_device_alloc("hmem", id);
- if (!pdev) {
- pr_err("hmem device allocation failure for %pr\n", &res);
- goto out_pdev;
- }
-
- pdev->dev.numa_node = acpi_map_pxm_to_online_node(target->memory_pxm);
- info = (struct memregion_info) {
- .target_node = acpi_map_pxm_to_node(target->memory_pxm),
- };
- rc = platform_device_add_data(pdev, &info, sizeof(info));
- if (rc < 0) {
- pr_err("hmem memregion_info allocation failure for %pr\n", &res);
- goto out_pdev;
- }
-
- rc = platform_device_add_resources(pdev, &res, 1);
- if (rc < 0) {
- pr_err("hmem resource allocation failure for %pr\n", &res);
- goto out_resource;
- }
-
- rc = platform_device_add(pdev);
- if (rc < 0) {
- dev_err(&pdev->dev, "device add failed for %pr\n", &res);
- goto out_resource;
- }
-
- return;
-
-out_resource:
- put_device(&pdev->dev);
-out_pdev:
- memregion_free(id);
-}
-
static void hmat_register_target_devices(struct memory_target *target)
{
struct resource *res;
@@ -706,8 +647,11 @@ static void hmat_register_target_devices(struct memory_target *target)
if (!IS_ENABLED(CONFIG_DEV_DAX_HMEM))
return;
- for (res = target->memregions.child; res; res = res->sibling)
- hmat_register_target_device(target, res);
+ for (res = target->memregions.child; res; res = res->sibling) {
+ int target_nid = acpi_map_pxm_to_node(target->memory_pxm);
+
+ hmem_register_device(target_nid, res);
+ }
}
static void hmat_register_target(struct memory_target *target)
diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index 3b6c06f07326..a229f45d34aa 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -48,6 +48,10 @@ config DEV_DAX_HMEM
Say M if unsure.
+config DEV_DAX_HMEM_DEVICES
+ depends on DEV_DAX_HMEM
+ def_bool y
+
config DEV_DAX_KMEM
tristate "KMEM DAX: volatile-use of persistent memory"
default DEV_DAX
diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile
index 80065b38b3c4..9d4ba672d305 100644
--- a/drivers/dax/Makefile
+++ b/drivers/dax/Makefile
@@ -2,11 +2,10 @@
obj-$(CONFIG_DAX) += dax.o
obj-$(CONFIG_DEV_DAX) += device_dax.o
obj-$(CONFIG_DEV_DAX_KMEM) += kmem.o
-obj-$(CONFIG_DEV_DAX_HMEM) += dax_hmem.o
dax-y := super.o
dax-y += bus.o
device_dax-y := device.o
-dax_hmem-y := hmem.o
obj-y += pmem/
+obj-y += hmem/
diff --git a/drivers/dax/hmem/Makefile b/drivers/dax/hmem/Makefile
new file mode 100644
index 000000000000..a9d353d0c9ed
--- /dev/null
+++ b/drivers/dax/hmem/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_DEV_DAX_HMEM) += dax_hmem.o
+obj-$(CONFIG_DEV_DAX_HMEM_DEVICES) += device.o
+
+dax_hmem-y := hmem.o
diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
new file mode 100644
index 000000000000..99bc15a8b031
--- /dev/null
+++ b/drivers/dax/hmem/device.c
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/platform_device.h>
+#include <linux/memregion.h>
+#include <linux/module.h>
+#include <linux/mm.h>
+
+void hmem_register_device(int target_nid, struct resource *r)
+{
+ /* define a clean / non-busy resource for the platform device */
+ struct resource res = {
+ .start = r->start,
+ .end = r->end,
+ .flags = IORESOURCE_MEM,
+ };
+ struct platform_device *pdev;
+ struct memregion_info info;
+ int rc, id;
+
+ rc = region_intersects(res.start, resource_size(&res), IORESOURCE_MEM,
+ IORES_DESC_SOFT_RESERVED);
+ if (rc != REGION_INTERSECTS)
+ return;
+
+ id = memregion_alloc(GFP_KERNEL);
+ if (id < 0) {
+ pr_err("memregion allocation failure for %pr\n", &res);
+ return;
+ }
+
+ pdev = platform_device_alloc("hmem", id);
+ if (!pdev) {
+ pr_err("hmem device allocation failure for %pr\n", &res);
+ goto out_pdev;
+ }
+
+ pdev->dev.numa_node = numa_map_to_online_node(target_nid);
+ info = (struct memregion_info) {
+ .target_node = target_nid,
+ };
+ rc = platform_device_add_data(pdev, &info, sizeof(info));
+ if (rc < 0) {
+ pr_err("hmem memregion_info allocation failure for %pr\n", &res);
+ goto out_pdev;
+ }
+
+ rc = platform_device_add_resources(pdev, &res, 1);
+ if (rc < 0) {
+ pr_err("hmem resource allocation failure for %pr\n", &res);
+ goto out_resource;
+ }
+
+ rc = platform_device_add(pdev);
+ if (rc < 0) {
+ dev_err(&pdev->dev, "device add failed for %pr\n", &res);
+ goto out_resource;
+ }
+
+ return;
+
+out_resource:
+ put_device(&pdev->dev);
+out_pdev:
+ memregion_free(id);
+}
diff --git a/drivers/dax/hmem.c b/drivers/dax/hmem/hmem.c
similarity index 98%
rename from drivers/dax/hmem.c
rename to drivers/dax/hmem/hmem.c
index fe7214daf62e..29ceb5795297 100644
--- a/drivers/dax/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -3,7 +3,7 @@
#include <linux/memregion.h>
#include <linux/module.h>
#include <linux/pfn_t.h>
-#include "bus.h"
+#include "../bus.h"
static int dax_hmem_probe(struct platform_device *pdev)
{
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 9bd8528bd305..9f6c282e9140 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -239,4 +239,12 @@ static inline bool dax_mapping(struct address_space *mapping)
return mapping->host && IS_DAX(mapping->host);
}
+#ifdef CONFIG_DEV_DAX_HMEM_DEVICES
+void hmem_register_device(int target_nid, struct resource *r);
+#else
+static inline void hmem_register_device(int target_nid, struct resource *r)
+{
+}
+#endif
+
#endif
* [PATCH 4/5] resource: Report parent to walk_iomem_res_desc() callback
2020-03-02 22:19 [PATCH 0/5] Manual definition of Soft Reserved memory devices Dan Williams
` (2 preceding siblings ...)
2020-03-02 22:20 ` [PATCH 3/5] ACPI: HMAT: Refactor hmat_register_target_device to hmem_register_device Dan Williams
@ 2020-03-02 22:20 ` Dan Williams
2020-03-05 14:42 ` Tom Lendacky
2020-03-02 22:20 ` [PATCH 5/5] ACPI: HMAT: Attach a device for each soft-reserved range Dan Williams
2020-03-06 20:07 ` [PATCH 0/5] Manual definition of Soft Reserved memory devices Jeff Moyer
5 siblings, 1 reply; 15+ messages in thread
From: Dan Williams @ 2020-03-02 22:20 UTC (permalink / raw)
To: linux-acpi
Cc: Jason Gunthorpe, Dave Hansen, Wei Yang, Tom Lendacky, peterz,
ard.biesheuvel, linux-nvdimm, linux-kernel
In support of detecting whether a resource might have been claimed,
report the parent to the walk_iomem_res_desc() callback. For example,
the ACPI HMAT parser publishes "hmem" platform devices per target range.
However, if the HMAT is disabled or missing, a fallback driver can attach
devices to the raw memory ranges if it sees unclaimed / orphan
"Soft Reserved" resources in the resource tree.
Without this change, find_next_iomem_res() returns a resource whose
res->parent field holds garbage data from the stack allocation in
__walk_iomem_res_desc().
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
kernel/resource.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/resource.c b/kernel/resource.c
index 76036a41143b..6e22e312fd55 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -386,6 +386,7 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
res->end = min(end, p->end);
res->flags = p->flags;
res->desc = p->desc;
+ res->parent = p->parent;
}
read_unlock(&resource_lock);
* Re: [PATCH 4/5] resource: Report parent to walk_iomem_res_desc() callback
2020-03-02 22:20 ` [PATCH 4/5] resource: Report parent to walk_iomem_res_desc() callback Dan Williams
@ 2020-03-05 14:42 ` Tom Lendacky
2020-03-17 22:04 ` Dan Williams
0 siblings, 1 reply; 15+ messages in thread
From: Tom Lendacky @ 2020-03-05 14:42 UTC (permalink / raw)
To: Dan Williams, linux-acpi
Cc: Jason Gunthorpe, Dave Hansen, Wei Yang, peterz, ard.biesheuvel,
linux-nvdimm, linux-kernel
On 3/2/20 4:20 PM, Dan Williams wrote:
> In support of detecting whether a resource might have been claimed,
> report the parent to the walk_iomem_res_desc() callback. For example,
> the ACPI HMAT parser publishes "hmem" platform devices per target range.
> However, if the HMAT is disabled / missing a fallback driver can attach
> devices to the raw memory ranges as a fallback if it sees unclaimed /
> orphan "Soft Reserved" resources in the resource tree.
>
> Otherwise, find_next_iomem_res() returns a resource with garbage data
> from the stack allocation in __walk_iomem_res_desc() for the res->parent
> field.
Should we instead copy the complete resource struct and override only
the start and end values? That way, if some code in the future wants to
look at sibling or child values, another change isn't needed.
Just a thought.
Thanks,
Tom
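For illustration, the whole-struct copy suggested above might look like the following userspace sketch. It is an assumption-laden model, not kernel code: struct res_model and model_copy_res() are hypothetical names, and the sibling and child fields are included only to show what such a copy would carry along.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct resource. */
struct res_model {
	uint64_t start, end;
	unsigned long flags, desc;
	struct res_model *parent, *sibling, *child;
};

/*
 * Copy every field of tree node @p into @res, then override only the
 * range so that it is clamped to [start, end].
 */
void model_copy_res(uint64_t start, uint64_t end,
		    const struct res_model *p, struct res_model *res)
{
	assert(p != NULL && res != NULL);
	*res = *p;
	if (res->start < start)
		res->start = start;
	if (res->end > end)
		res->end = end;
}
```

The trade-off in this model is that the copy also hands out the sibling and child pointers of the live tree, so a caller would presumably need to treat them with care once the lock protecting the tree is dropped.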
>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Wei Yang <richardw.yang@linux.intel.com>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
> kernel/resource.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 76036a41143b..6e22e312fd55 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -386,6 +386,7 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
> res->end = min(end, p->end);
> res->flags = p->flags;
> res->desc = p->desc;
> + res->parent = p->parent;
> }
>
> read_unlock(&resource_lock);
>
* Re: [PATCH 4/5] resource: Report parent to walk_iomem_res_desc() callback
2020-03-05 14:42 ` Tom Lendacky
@ 2020-03-17 22:04 ` Dan Williams
0 siblings, 0 replies; 15+ messages in thread
From: Dan Williams @ 2020-03-17 22:04 UTC (permalink / raw)
To: Tom Lendacky
Cc: Linux ACPI, Jason Gunthorpe, Dave Hansen, Wei Yang,
Peter Zijlstra, Ard Biesheuvel, linux-nvdimm,
Linux Kernel Mailing List
On Thu, Mar 5, 2020 at 6:42 AM Tom Lendacky <thomas.lendacky@amd.com> wrote:
>
> On 3/2/20 4:20 PM, Dan Williams wrote:
> > In support of detecting whether a resource might have been claimed,
> > report the parent to the walk_iomem_res_desc() callback. For example,
> > the ACPI HMAT parser publishes "hmem" platform devices per target range.
> > However, if the HMAT is disabled / missing a fallback driver can attach
> > devices to the raw memory ranges as a fallback if it sees unclaimed /
> > orphan "Soft Reserved" resources in the resource tree.
> >
> > Otherwise, find_next_iomem_res() returns a resource with garbage data
> > from the stack allocation in __walk_iomem_res_desc() for the res->parent
> > field.
>
> Just wondering if we shouldn't just copy the complete resource struct and
> just override the start and end values? That way, if some code in the
> future wants to look at sibling or child values, another change isn't needed.
>
> Just a thought.
Thanks for taking a look. I think it's OK to come back and update this
if that need arises.
* [PATCH 5/5] ACPI: HMAT: Attach a device for each soft-reserved range
2020-03-02 22:19 [PATCH 0/5] Manual definition of Soft Reserved memory devices Dan Williams
` (3 preceding siblings ...)
2020-03-02 22:20 ` [PATCH 4/5] resource: Report parent to walk_iomem_res_desc() callback Dan Williams
@ 2020-03-02 22:20 ` Dan Williams
2020-03-06 20:07 ` [PATCH 0/5] Manual definition of Soft Reserved memory devices Jeff Moyer
5 siblings, 0 replies; 15+ messages in thread
From: Dan Williams @ 2020-03-02 22:20 UTC (permalink / raw)
To: linux-acpi
Cc: Jonathan Cameron, Brice Goglin, Ard Biesheuvel,
Rafael J. Wysocki, Jeff Moyer, peterz, dave.hansen, linux-nvdimm,
linux-kernel
The hmem enabling in commit 'cf8741ac57ed ("ACPI: NUMA: HMAT: Register
"soft reserved" memory as an "hmem" device")' only registered ranges to
the hmem driver for each soft-reservation that also appeared in the
HMAT. While this is meant to encourage platform firmware to "do the
right thing" and publish an HMAT, the corollary is that platforms that
fail to publish an accurate HMAT will strand memory from Linux usage.
Additionally, memory defined via the "efi_fake_mem" kernel command line
option is stranded by default without an HMAT.
Arrange for "soft reserved" memory that goes unclaimed by HMAT entries
to be published as raw resource ranges for the hmem driver to consume.
Include a module parameter to disable either this fallback behavior, or
the HMAT enabling, from creating hmem devices. The module parameter
requires the hmem device enabling to have a unique name in the module
namespace: "device_hmem".
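The fallback selection can be sketched as a userspace model. Everything here is hypothetical and simplified: struct model_res, model_iomem_root, and model_count_fallback() stand in for struct resource, iomem_resource, and the walk performed at init time, and a single flag stands in for the module parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor values; only "soft reserved" matters here. */
enum model_desc { MODEL_DESC_NONE, MODEL_DESC_SOFT_RESERVED };

/* Hypothetical, simplified stand-in for the kernel's struct resource. */
struct model_res {
	const char *name;
	enum model_desc desc;
	const struct model_res *parent;
};

/* Plays the role of iomem_resource, the root of the resource tree. */
const struct model_res model_iomem_root = { "root", MODEL_DESC_NONE, NULL };

/*
 * Count the ranges the fallback would register: soft-reserved entries
 * that are still direct children of the root. Entries that sit under
 * another resource are treated as already claimed by the HMAT path.
 * Returns 0 when registration is disabled.
 */
size_t model_count_fallback(const struct model_res *tbl, size_t n,
			    bool disabled)
{
	size_t i, count = 0;

	assert(tbl != NULL || n == 0);
	if (disabled)
		return 0;
	for (i = 0; i < n; i++) {
		if (tbl[i].desc != MODEL_DESC_SOFT_RESERVED)
			continue;
		if (tbl[i].parent != &model_iomem_root)
			continue;
		count++;
	}
	return count;
}
```

The model depends on the walker reporting a valid parent for each range, which is what the previous patch in this series provides.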
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/dax/Kconfig | 1 +
drivers/dax/hmem/Makefile | 3 ++-
drivers/dax/hmem/device.c | 33 +++++++++++++++++++++++++++++++++
3 files changed, 36 insertions(+), 1 deletion(-)
diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index a229f45d34aa..163edde6ba41 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -50,6 +50,7 @@ config DEV_DAX_HMEM
config DEV_DAX_HMEM_DEVICES
depends on DEV_DAX_HMEM
+ select NUMA_KEEP_MEMINFO if NUMA
def_bool y
config DEV_DAX_KMEM
diff --git a/drivers/dax/hmem/Makefile b/drivers/dax/hmem/Makefile
index a9d353d0c9ed..57377b4c3d47 100644
--- a/drivers/dax/hmem/Makefile
+++ b/drivers/dax/hmem/Makefile
@@ -1,5 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_DEV_DAX_HMEM) += dax_hmem.o
-obj-$(CONFIG_DEV_DAX_HMEM_DEVICES) += device.o
+obj-$(CONFIG_DEV_DAX_HMEM_DEVICES) += device_hmem.o
+device_hmem-y := device.o
dax_hmem-y := hmem.o
diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
index 99bc15a8b031..f9c5fa8b1880 100644
--- a/drivers/dax/hmem/device.c
+++ b/drivers/dax/hmem/device.c
@@ -4,6 +4,9 @@
#include <linux/module.h>
#include <linux/mm.h>
+static bool nohmem;
+module_param_named(disable, nohmem, bool, 0444);
+
void hmem_register_device(int target_nid, struct resource *r)
{
/* define a clean / non-busy resource for the platform device */
@@ -16,6 +19,9 @@ void hmem_register_device(int target_nid, struct resource *r)
struct memregion_info info;
int rc, id;
+ if (nohmem)
+ return;
+
rc = region_intersects(res.start, resource_size(&res), IORESOURCE_MEM,
IORES_DESC_SOFT_RESERVED);
if (rc != REGION_INTERSECTS)
@@ -62,3 +68,30 @@ void hmem_register_device(int target_nid, struct resource *r)
out_pdev:
memregion_free(id);
}
+
+static __init int hmem_register_one(struct resource *res, void *data)
+{
+ /*
+ * If the resource is not a top-level resource it was already
+ * assigned to a device by the HMAT parsing.
+ */
+ if (res->parent != &iomem_resource)
+ return 0;
+
+ hmem_register_device(phys_to_target_node(res->start), res);
+
+ return 0;
+}
+
+static __init int hmem_init(void)
+{
+ walk_iomem_res_desc(IORES_DESC_SOFT_RESERVED,
+ IORESOURCE_MEM, 0, -1, NULL, hmem_register_one);
+ return 0;
+}
+
+/*
+ * As this is a fallback for address ranges unclaimed by the ACPI HMAT
+ * parsing it must be at an initcall level greater than hmat_init().
+ */
+late_initcall(hmem_init);
* Re: [PATCH 0/5] Manual definition of Soft Reserved memory devices
2020-03-02 22:19 [PATCH 0/5] Manual definition of Soft Reserved memory devices Dan Williams
` (4 preceding siblings ...)
2020-03-02 22:20 ` [PATCH 5/5] ACPI: HMAT: Attach a device for each soft-reserved range Dan Williams
@ 2020-03-06 20:07 ` Jeff Moyer
2020-03-06 21:05 ` Dan Williams
5 siblings, 1 reply; 15+ messages in thread
From: Jeff Moyer @ 2020-03-06 20:07 UTC (permalink / raw)
To: Dan Williams
Cc: linux-acpi, Jason Gunthorpe, Peter Zijlstra, Ard Biesheuvel,
Jonathan Cameron, Borislav Petkov, Wei Yang, x86, H. Peter Anvin,
Brice Goglin, Thomas Gleixner, Ingo Molnar, Dave Hansen,
Rafael J. Wysocki, Ard Biesheuvel, Andy Lutomirski, Tom Lendacky,
linux-nvdimm, linux-kernel
Dan Williams <dan.j.williams@intel.com> writes:
> Given the current dearth of systems that supply an ACPI HMAT table, and
> the utility of being able to manually define device-dax "hmem" instances
> via the efi_fake_mem= option, relax the requirements for creating these
> devices. Specifically, add an option (numa=nohmat) to optionally disable
> consideration of the HMAT and update efi_fake_mem= to behave like
> memmap=nn!ss in terms of delimiting device boundaries.
So, am I correct in deducing that your primary motivation is testing
without hardware/firmware support? This looks like a bit of a hack to
me, and I think maybe it would be better to just emulate the HMAT using
qemu. I don't have a strong objection, though.
-Jeff
>
> All review welcome of course, but the E820 changes want an x86
> maintainer ack, the efi_fake_mem update needs Ard, and Rafael has
> previously shepherded the HMAT changes. For the changes to
> kernel/resource.c, where there is no clear maintainer, I just copied the
> last few people to make thoughtful changes in that area. I am happy to
> take these through the nvdimm tree along with these prerequisites
> already in -next:
>
> b2ca916ce392 ACPI: NUMA: Up-level "map to online node" functionality
> 4fcbe96e4d0b mm/numa: Skip NUMA_NO_NODE and online nodes in numa_map_to_online_node()
> 575e23b6e13c powerpc/papr_scm: Switch to numa_map_to_online_node()
> 1e5d8e1e47af x86/mm: Introduce CONFIG_NUMA_KEEP_MEMINFO
> 5d30f92e7631 x86/NUMA: Provide a range-to-target_node lookup facility
> 7b27a8622f80 libnvdimm/e820: Retrieve and populate correct 'target_node' info
>
> Tested with:
>
> numa=nohmat efi_fake_mem=4G@9G:0x40000,4G@13G:0x40000
>
> ...to create two device-dax instances:
>
> # daxctl list -RDu
> [
> {
> "path":"\/platform\/hmem.1",
> "id":1,
> "size":"4.00 GiB (4.29 GB)",
> "align":2097152,
> "devices":[
> {
> "chardev":"dax1.0",
> "size":"4.00 GiB (4.29 GB)",
> "target_node":3,
> "mode":"devdax"
> }
> ]
> },
> {
> "path":"\/platform\/hmem.0",
> "id":0,
> "size":"4.00 GiB (4.29 GB)",
> "align":2097152,
> "devices":[
> {
> "chardev":"dax0.0",
> "size":"4.00 GiB (4.29 GB)",
> "target_node":2,
> "mode":"devdax"
> }
> ]
> }
> ]
>
> ---
>
> Dan Williams (5):
> ACPI: NUMA: Add 'nohmat' option
> efi/fake_mem: Arrange for a resource entry per efi_fake_mem instance
> ACPI: HMAT: Refactor hmat_register_target_device to hmem_register_device
> resource: Report parent to walk_iomem_res_desc() callback
> ACPI: HMAT: Attach a device for each soft-reserved range
>
>
> arch/x86/kernel/e820.c | 16 +++++-
> arch/x86/mm/numa.c | 4 +
> drivers/acpi/numa/hmat.c | 71 +++-----------------------
> drivers/dax/Kconfig | 5 ++
> drivers/dax/Makefile | 3 -
> drivers/dax/hmem/Makefile | 6 ++
> drivers/dax/hmem/device.c | 97 +++++++++++++++++++++++++++++++++++
> drivers/dax/hmem/hmem.c | 2 -
> drivers/firmware/efi/x86_fake_mem.c | 12 +++-
> include/acpi/acpi_numa.h | 1
> include/linux/dax.h | 8 +++
> kernel/resource.c | 1
> 12 files changed, 156 insertions(+), 70 deletions(-)
> create mode 100644 drivers/dax/hmem/Makefile
> create mode 100644 drivers/dax/hmem/device.c
> rename drivers/dax/{hmem.c => hmem/hmem.c} (98%)
>
> base-commit: 7b27a8622f802761d5c6abd6c37b22312a35343c
* Re: [PATCH 0/5] Manual definition of Soft Reserved memory devices
2020-03-06 20:07 ` [PATCH 0/5] Manual definition of Soft Reserved memory devices Jeff Moyer
@ 2020-03-06 21:05 ` Dan Williams
0 siblings, 0 replies; 15+ messages in thread
From: Dan Williams @ 2020-03-06 21:05 UTC (permalink / raw)
To: Jeff Moyer
Cc: Linux ACPI, Jason Gunthorpe, Peter Zijlstra, Ard Biesheuvel,
Jonathan Cameron, Borislav Petkov, Wei Yang, X86 ML,
H. Peter Anvin, Brice Goglin, Thomas Gleixner, Ingo Molnar,
Dave Hansen, Rafael J. Wysocki, Ard Biesheuvel, Andy Lutomirski,
Tom Lendacky, linux-nvdimm, Linux Kernel Mailing List,
Joao Martins
On Fri, Mar 6, 2020 at 12:07 PM Jeff Moyer <jmoyer@redhat.com> wrote:
>
> Dan Williams <dan.j.williams@intel.com> writes:
>
> > Given the current dearth of systems that supply an ACPI HMAT table, and
> > the utility of being able to manually define device-dax "hmem" instances
> > via the efi_fake_mem= option, relax the requirements for creating these
> > devices. Specifically, add an option (numa=nohmat) to optionally disable
> > consideration of the HMAT and update efi_fake_mem= to behave like
> > memmap=nn!ss in terms of delimiting device boundaries.
>
> So, am I correct in deducing that your primary motivation is testing
> without hardware/firmware support?
My primary motivation is making the dax_kmem facility useful to
shipping platforms that have performance differentiated memory, but
may not have EFI-defined soft-reservations / HMAT (or
non-EFI-ACPI-platform equivalent). I'm anticipating HMAT enabled
platforms where the platform firmware policy for what is
soft-reserved, or not, is not the policy the system owner would pick.
I'd also highlight Joao's work [1] (see the TODO section) as an
indication of the demand for custom carving of memory resources and
applying the device-dax memory management interface.
> This looks like a bit of a hack to
> me, and I think maybe it would be better to just emulate the HMAT using
> qemu. I don't have a strong objection, though.
Yeah, qemu emulation does not help when you, the system owner, have a
different use case than what the bare-metal platform-firmware
envisioned for "specific-purpose memory".
[1]: https://lore.kernel.org/lkml/20200110190313.17144-1-joao.m.martins@oracle.com/