linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 1/2] x86/mm/ident_map: Add PUD level 1GB page support
@ 2017-05-04  1:42 Xunlei Pang
  2017-05-04  1:42 ` [PATCH v3 2/2] x86_64/kexec: Use PUD level 1GB page for identity mapping if available Xunlei Pang
  2017-05-08  8:01 ` [tip:x86/urgent] x86/mm: Add support for gbpages to kernel_ident_mapping_init() tip-bot for Xunlei Pang
  0 siblings, 2 replies; 10+ messages in thread
From: Xunlei Pang @ 2017-05-04  1:42 UTC (permalink / raw)
  To: linux-kernel, Ingo Molnar
  Cc: kexec, akpm, Eric Biederman, Dave Young, x86, Borislav Petkov,
	Thomas Gleixner, Yinghai Lu, H. Peter Anvin, Andy Lutomirski,
	Xunlei Pang

The current kernel_ident_mapping_init() always creates the identity
mapping using 2MB pages (PMD level); this patch adds 1GB page
(PUD level) support.

The primary advantage would be better TLB coverage/performance,
because we'd utilize 1GB TLB entries instead of 2MB ones.

It is also useful on machines with large amounts of memory: it saves
paging structure allocations (around 4MB/TB using 2MB pages) when
setting up identity mappings for all of memory; with 1GB pages the
overhead drops to only about 8KB/TB.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
---
 arch/x86/boot/compressed/pagetable.c |  2 +-
 arch/x86/include/asm/init.h          |  3 ++-
 arch/x86/kernel/machine_kexec_64.c   |  2 +-
 arch/x86/mm/ident_map.c              | 14 +++++++++++++-
 arch/x86/power/hibernate_64.c        |  2 +-
 5 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c
index 56589d0..1d78f17 100644
--- a/arch/x86/boot/compressed/pagetable.c
+++ b/arch/x86/boot/compressed/pagetable.c
@@ -70,7 +70,7 @@ static void *alloc_pgt_page(void *context)
  * Due to relocation, pointers must be assigned at run time not build time.
  */
 static struct x86_mapping_info mapping_info = {
-	.pmd_flag       = __PAGE_KERNEL_LARGE_EXEC,
+	.page_flag       = __PAGE_KERNEL_LARGE_EXEC,
 };
 
 /* Locates and clears a region for a new top level page table. */
diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
index 737da62..474eb8c 100644
--- a/arch/x86/include/asm/init.h
+++ b/arch/x86/include/asm/init.h
@@ -4,8 +4,9 @@
 struct x86_mapping_info {
 	void *(*alloc_pgt_page)(void *); /* allocate buf for page table */
 	void *context;			 /* context for alloc_pgt_page */
-	unsigned long pmd_flag;		 /* page flag for PMD entry */
+	unsigned long page_flag;	 /* page flag for PMD or PUD entry */
 	unsigned long offset;		 /* ident mapping offset */
+	bool direct_gbpages;		 /* PUD level 1GB page support */
 };
 
 int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 085c3b3..1d4f2b0 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -113,7 +113,7 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 	struct x86_mapping_info info = {
 		.alloc_pgt_page	= alloc_pgt_page,
 		.context	= image,
-		.pmd_flag	= __PAGE_KERNEL_LARGE_EXEC,
+		.page_flag	= __PAGE_KERNEL_LARGE_EXEC,
 	};
 	unsigned long mstart, mend;
 	pgd_t *level4p;
diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
index 04210a2..adab159 100644
--- a/arch/x86/mm/ident_map.c
+++ b/arch/x86/mm/ident_map.c
@@ -13,7 +13,7 @@ static void ident_pmd_init(struct x86_mapping_info *info, pmd_t *pmd_page,
 		if (pmd_present(*pmd))
 			continue;
 
-		set_pmd(pmd, __pmd((addr - info->offset) | info->pmd_flag));
+		set_pmd(pmd, __pmd((addr - info->offset) | info->page_flag));
 	}
 }
 
@@ -30,6 +30,18 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
 		if (next > end)
 			next = end;
 
+		if (info->direct_gbpages) {
+			pud_t pudval;
+
+			if (pud_present(*pud))
+				continue;
+
+			addr &= PUD_MASK;
+			pudval = __pud((addr - info->offset) | info->page_flag);
+			set_pud(pud, pudval);
+			continue;
+		}
+
 		if (pud_present(*pud)) {
 			pmd = pmd_offset(pud, 0);
 			ident_pmd_init(info, pmd, addr, next);
diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
index 6a61194..a6e21fe 100644
--- a/arch/x86/power/hibernate_64.c
+++ b/arch/x86/power/hibernate_64.c
@@ -104,7 +104,7 @@ static int set_up_temporary_mappings(void)
 {
 	struct x86_mapping_info info = {
 		.alloc_pgt_page	= alloc_pgt_page,
-		.pmd_flag	= __PAGE_KERNEL_LARGE_EXEC,
+		.page_flag	= __PAGE_KERNEL_LARGE_EXEC,
 		.offset		= __PAGE_OFFSET,
 	};
 	unsigned long mstart, mend;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 2/2] x86_64/kexec: Use PUD level 1GB page for identity mapping if available
  2017-05-04  1:42 [PATCH v3 1/2] x86/mm/ident_map: Add PUD level 1GB page support Xunlei Pang
@ 2017-05-04  1:42 ` Xunlei Pang
  2017-05-05  6:52   ` Ingo Molnar
  2017-05-08  8:01   ` [tip:x86/urgent] x86/kexec/64: Use gbpages for identity mappings " tip-bot for Xunlei Pang
  2017-05-08  8:01 ` [tip:x86/urgent] x86/mm: Add support for gbpages to kernel_ident_mapping_init() tip-bot for Xunlei Pang
  1 sibling, 2 replies; 10+ messages in thread
From: Xunlei Pang @ 2017-05-04  1:42 UTC (permalink / raw)
  To: linux-kernel, Ingo Molnar
  Cc: kexec, akpm, Eric Biederman, Dave Young, x86, Borislav Petkov,
	Thomas Gleixner, Yinghai Lu, H. Peter Anvin, Andy Lutomirski,
	Xunlei Pang

Kexec sets up all identity mappings before booting into the new
kernel, which causes extra memory consumption for paging structures
that is quite considerable on modern machines with huge amounts of
memory.

E.g. on one 32TB machine, in the kdump case, around 128MB (around
4MB/TB) of the reserved memory was wasted after kexec set up all the
identity mappings using the current 2MB pages; together with the
loaded kdump kernel, initramfs, etc., this caused the kexec syscall
to fail with -ENOMEM. As a result, we had to enlarge the reserved
memory via "crashkernel=X".

This causes some trouble for distributions that use policies
to evaluate the proper "crashkernel=X" value for users.

Given that the 1GB page feature is very likely available on machines
with large amounts of memory, and that kernel_ident_mapping_init()
supports PUD level 1GB pages, solve this problem by using 1GB pages
to create the identity mapping page tables for kdump whenever the
feature is available.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
---
 arch/x86/kernel/machine_kexec_64.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 1d4f2b0..c25d277 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -122,6 +122,10 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 
 	level4p = (pgd_t *)__va(start_pgtable);
 	clear_page(level4p);
+
+	if (direct_gbpages)
+		info.direct_gbpages = true;
+
 	for (i = 0; i < nr_pfn_mapped; i++) {
 		mstart = pfn_mapped[i].start << PAGE_SHIFT;
 		mend   = pfn_mapped[i].end << PAGE_SHIFT;
-- 
1.8.3.1


* Re: [PATCH v3 2/2] x86_64/kexec: Use PUD level 1GB page for identity mapping if available
  2017-05-04  1:42 ` [PATCH v3 2/2] x86_64/kexec: Use PUD level 1GB page for identity mapping if available Xunlei Pang
@ 2017-05-05  6:52   ` Ingo Molnar
  2017-05-05  7:32     ` Xunlei Pang
  2017-05-08  8:01   ` [tip:x86/urgent] x86/kexec/64: Use gbpages for identity mappings " tip-bot for Xunlei Pang
  1 sibling, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2017-05-05  6:52 UTC (permalink / raw)
  To: Xunlei Pang
  Cc: linux-kernel, kexec, akpm, Eric Biederman, Dave Young, x86,
	Borislav Petkov, Thomas Gleixner, Yinghai Lu, H. Peter Anvin,
	Andy Lutomirski


* Xunlei Pang <xlpang@redhat.com> wrote:

> @@ -122,6 +122,10 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>  
>  	level4p = (pgd_t *)__va(start_pgtable);
>  	clear_page(level4p);
> +
> +	if (direct_gbpages)
> +		info.direct_gbpages = true;

No, this should be keyed off the CPU feature (X86_FEATURE_GBPAGES) automatically, 
not set blindly! AFAICS this patch will crash kexec on any CPU that does not 
support gbpages.

I only noticed this problem after having fixed/enhanced all the changelogs - so 
please pick up the new changelog up from the log below.

Thanks,

	Ingo


============================>

Author: Xunlei Pang <xlpang@redhat.com>

x86/mm: Add support for gbpages to kernel_ident_mapping_init()

Kernel identity mappings on x86-64 kernels are created in two
ways: by the early x86 boot code, or by kernel_ident_mapping_init().

Native kernels (which is the dominant use case) use the former,
but the kexec and hibernation code uses kernel_ident_mapping_init().

There's a subtle difference between these two ways of creating
identity mappings: the current kernel_ident_mapping_init() code
always creates identity mappings using 2MB pages (PMD level), while
the native kernel boot path also utilizes gbpages where available.

This difference is suboptimal both for performance and for memory
usage: kernel_ident_mapping_init() needs to allocate pages for the
page tables when creating the new identity mappings.

This patch adds 1GB page (PUD level) support to
kernel_ident_mapping_init() to address these concerns.

The primary advantage would be better TLB coverage/performance,
because we'd utilize 1GB TLB entries instead of 2MB ones.

It is also useful on machines with large amounts of memory: it saves
paging structure allocations (around 4MB/TB using 2MB pages) when
setting up identity mappings for all of memory; with 1GB pages the
overhead drops to only about 8KB/TB.

( Note that this change alone does not activate gbpages in kexec,
  we are doing that in a separate patch. )


* Re: [PATCH v3 2/2] x86_64/kexec: Use PUD level 1GB page for identity mapping if available
  2017-05-05  6:52   ` Ingo Molnar
@ 2017-05-05  7:32     ` Xunlei Pang
  2017-05-05  9:20       ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Xunlei Pang @ 2017-05-05  7:32 UTC (permalink / raw)
  To: Ingo Molnar, Xunlei Pang
  Cc: linux-kernel, kexec, akpm, Eric Biederman, Dave Young, x86,
	Borislav Petkov, Thomas Gleixner, Yinghai Lu, H. Peter Anvin,
	Andy Lutomirski

On 05/05/2017 at 02:52 PM, Ingo Molnar wrote:
> * Xunlei Pang <xlpang@redhat.com> wrote:
>
>> @@ -122,6 +122,10 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>>  
>>  	level4p = (pgd_t *)__va(start_pgtable);
>>  	clear_page(level4p);
>> +
>> +	if (direct_gbpages)
>> +		info.direct_gbpages = true;
> No, this should be keyed off the CPU feature (X86_FEATURE_GBPAGES) automatically, 
> not set blindly! AFAICS this patch will crash kexec on any CPU that does not 
> support gbpages.

It should be fine, probe_page_size_mask() already takes care of this:
    if (direct_gbpages && boot_cpu_has(X86_FEATURE_GBPAGES)) {
        printk(KERN_INFO "Using GB pages for direct mapping\n");
        page_size_mask |= 1 << PG_LEVEL_1G;
    } else {
        direct_gbpages = 0;
    }

So if X86_FEATURE_GBPAGES is not supported, direct_gbpages will be set to 0.

>
> I only noticed this problem after having fixed/enhanced all the changelogs - so 
> please pick up the new changelog up from the log below.

Thanks for the rewrite, it looks better.

Regards,
Xunlei

>
> Thanks,
>
> 	Ingo
>
>
> ============================>
>
> Author: Xunlei Pang <xlpang@redhat.com>
>
> x86/mm: Add support for gbpages to kernel_ident_mapping_init()
>
> Kernel identity mappings on x86-64 kernels are created in two
> ways: by the early x86 boot code, or by kernel_ident_mapping_init().
>
> Native kernels (which is the dominant usecase) use the former,
> but the kexec and the hibernation code uses kernel_ident_mapping_init().
>
> There's a subtle difference between these two ways of how identity
> mappings are created, the current kernel_ident_mapping_init() code
> creates identity mappings always using 2MB page(PMD level) - while
> the native kernel boot path also utilizes gbpages where available.
>
> This difference is suboptimal both for performance and for memory
> usage: kernel_ident_mapping_init() needs to allocate pages for the
> page tables when creating the new identity mappings.
>
> This patch adds 1GB page(PUD level) support to kernel_ident_mapping_init()
> to address these concerns.
>
> The primary advantage would be better TLB coverage/performance,
> because we'd utilize 1GB TLBs instead of 2MB ones.
>
> It is also useful for machines with large number of memory to
> save paging structure allocations(around 4MB/TB using 2MB page)
> when setting identity mappings for all the memory, after using
> 1GB page it will consume only 8KB/TB.
>
> ( Note that this change alone does not activate gbpages in kexec,
>   we are doing that in a separate patch. )
>


* Re: [PATCH v3 2/2] x86_64/kexec: Use PUD level 1GB page for identity mapping if available
  2017-05-05  7:32     ` Xunlei Pang
@ 2017-05-05  9:20       ` Ingo Molnar
  2017-05-05 10:50         ` Xunlei Pang
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2017-05-05  9:20 UTC (permalink / raw)
  To: xlpang
  Cc: linux-kernel, kexec, akpm, Eric Biederman, Dave Young, x86,
	Borislav Petkov, Thomas Gleixner, Yinghai Lu, H. Peter Anvin,
	Andy Lutomirski


* Xunlei Pang <xpang@redhat.com> wrote:

> On 05/05/2017 at 02:52 PM, Ingo Molnar wrote:
> > * Xunlei Pang <xlpang@redhat.com> wrote:
> >
> >> @@ -122,6 +122,10 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
> >>  
> >>  	level4p = (pgd_t *)__va(start_pgtable);
> >>  	clear_page(level4p);
> >> +
> >> +	if (direct_gbpages)
> >> +		info.direct_gbpages = true;
> > No, this should be keyed off the CPU feature (X86_FEATURE_GBPAGES) automatically, 
> > not set blindly! AFAICS this patch will crash kexec on any CPU that does not 
> > support gbpages.
> 
> It should be fine, probe_page_size_mask() already takes care of this:
>     if (direct_gbpages && boot_cpu_has(X86_FEATURE_GBPAGES)) {
>         printk(KERN_INFO "Using GB pages for direct mapping\n");
>         page_size_mask |= 1 << PG_LEVEL_1G;
>     } else {
>         direct_gbpages = 0;
>     }
> 
> So if X86_FEATURE_GBPAGES is not supported, direct_gbpages will be set to 0.

So why is the introduction of the info.direct_gbpages flag necessary? AFAICS it 
just duplicates the kernel's direct_gbpages flag. One outcome is that hibernation 
won't use gbpages, which is silly.

Thanks,

	Ingo


* Re: [PATCH v3 2/2] x86_64/kexec: Use PUD level 1GB page for identity mapping if available
  2017-05-05  9:20       ` Ingo Molnar
@ 2017-05-05 10:50         ` Xunlei Pang
  2017-05-08  6:29           ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Xunlei Pang @ 2017-05-05 10:50 UTC (permalink / raw)
  To: Ingo Molnar, xlpang
  Cc: linux-kernel, kexec, akpm, Eric Biederman, Dave Young, x86,
	Borislav Petkov, Thomas Gleixner, Yinghai Lu, H. Peter Anvin,
	Andy Lutomirski

On 05/05/2017 at 05:20 PM, Ingo Molnar wrote:
> * Xunlei Pang <xpang@redhat.com> wrote:
>
>> On 05/05/2017 at 02:52 PM, Ingo Molnar wrote:
>>> * Xunlei Pang <xlpang@redhat.com> wrote:
>>>
>>>> @@ -122,6 +122,10 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>>>>  
>>>>  	level4p = (pgd_t *)__va(start_pgtable);
>>>>  	clear_page(level4p);
>>>> +
>>>> +	if (direct_gbpages)
>>>> +		info.direct_gbpages = true;
>>> No, this should be keyed off the CPU feature (X86_FEATURE_GBPAGES) automatically, 
>>> not set blindly! AFAICS this patch will crash kexec on any CPU that does not 
>>> support gbpages.
>> It should be fine, probe_page_size_mask() already takes care of this:
>>     if (direct_gbpages && boot_cpu_has(X86_FEATURE_GBPAGES)) {
>>         printk(KERN_INFO "Using GB pages for direct mapping\n");
>>         page_size_mask |= 1 << PG_LEVEL_1G;
>>     } else {
>>         direct_gbpages = 0;
>>     }
>>
>> So if X86_FEATURE_GBPAGES is not supported, direct_gbpages will be set to 0.
> So why is the introduction of the info.direct_gbpages flag necessary? AFAICS it 
> just duplicates the kernel's direct_gbpages flag. One outcome is that hibernation 
> won't use gbpages, which is silly.

boot/compressed/pagetable.c also uses kernel_ident_mapping_init() for KASLR; at that
point we don't have a "direct_gbpages" definition or X86_FEATURE_GBPAGES feature detection.

I thought we could change the other call sites once that is found to be really needed.

Regards,
Xunlei


* Re: [PATCH v3 2/2] x86_64/kexec: Use PUD level 1GB page for identity mapping if available
  2017-05-05 10:50         ` Xunlei Pang
@ 2017-05-08  6:29           ` Ingo Molnar
  2017-05-08  7:24             ` Xunlei Pang
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2017-05-08  6:29 UTC (permalink / raw)
  To: xlpang
  Cc: linux-kernel, kexec, akpm, Eric Biederman, Dave Young, x86,
	Borislav Petkov, Thomas Gleixner, Yinghai Lu, H. Peter Anvin,
	Andy Lutomirski


* Xunlei Pang <xpang@redhat.com> wrote:

> On 05/05/2017 at 05:20 PM, Ingo Molnar wrote:
> > * Xunlei Pang <xpang@redhat.com> wrote:
> >
> >> On 05/05/2017 at 02:52 PM, Ingo Molnar wrote:
> >>> * Xunlei Pang <xlpang@redhat.com> wrote:
> >>>
> >>>> @@ -122,6 +122,10 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
> >>>>  
> >>>>  	level4p = (pgd_t *)__va(start_pgtable);
> >>>>  	clear_page(level4p);
> >>>> +
> >>>> +	if (direct_gbpages)
> >>>> +		info.direct_gbpages = true;
> >>> No, this should be keyed off the CPU feature (X86_FEATURE_GBPAGES) automatically, 
> >>> not set blindly! AFAICS this patch will crash kexec on any CPU that does not 
> >>> support gbpages.
> >> It should be fine, probe_page_size_mask() already takes care of this:
> >>     if (direct_gbpages && boot_cpu_has(X86_FEATURE_GBPAGES)) {
> >>         printk(KERN_INFO "Using GB pages for direct mapping\n");
> >>         page_size_mask |= 1 << PG_LEVEL_1G;
> >>     } else {
> >>         direct_gbpages = 0;
> >>     }
> >>
> >> So if X86_FEATURE_GBPAGES is not supported, direct_gbpages will be set to 0.
> > So why is the introduction of the info.direct_gbpages flag necessary? AFAICS it 
> > just duplicates the kernel's direct_gbpages flag. One outcome is that hibernation 
> > won't use gbpages, which is silly.
> 
> boot/compressed/pagetable.c also uses kernel_ident_mapping_init() for kaslr, at 
> the moment we don't have "direct_gbpages" definition or X86_FEATURE_GBPAGES 
> feature detection.
> 
> I thought that we can change the other call sites when found really needed.

Ok, you are right - I'll use the original patches as submitted, with the updated 
changelogs.

Thanks,

	Ingo


* Re: [PATCH v3 2/2] x86_64/kexec: Use PUD level 1GB page for identity mapping if available
  2017-05-08  6:29           ` Ingo Molnar
@ 2017-05-08  7:24             ` Xunlei Pang
  0 siblings, 0 replies; 10+ messages in thread
From: Xunlei Pang @ 2017-05-08  7:24 UTC (permalink / raw)
  To: Ingo Molnar, xlpang
  Cc: linux-kernel, kexec, akpm, Eric Biederman, Dave Young, x86,
	Borislav Petkov, Thomas Gleixner, Yinghai Lu, H. Peter Anvin,
	Andy Lutomirski

On 05/08/2017 at 02:29 PM, Ingo Molnar wrote:
> * Xunlei Pang <xpang@redhat.com> wrote:
>
>> On 05/05/2017 at 05:20 PM, Ingo Molnar wrote:
>>> * Xunlei Pang <xpang@redhat.com> wrote:
>>>
>>>> On 05/05/2017 at 02:52 PM, Ingo Molnar wrote:
>>>>> * Xunlei Pang <xlpang@redhat.com> wrote:
>>>>>
>>>>>> @@ -122,6 +122,10 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>>>>>>  
>>>>>>  	level4p = (pgd_t *)__va(start_pgtable);
>>>>>>  	clear_page(level4p);
>>>>>> +
>>>>>> +	if (direct_gbpages)
>>>>>> +		info.direct_gbpages = true;
>>>>> No, this should be keyed off the CPU feature (X86_FEATURE_GBPAGES) automatically, 
>>>>> not set blindly! AFAICS this patch will crash kexec on any CPU that does not 
>>>>> support gbpages.
>>>> It should be fine, probe_page_size_mask() already takes care of this:
>>>>     if (direct_gbpages && boot_cpu_has(X86_FEATURE_GBPAGES)) {
>>>>         printk(KERN_INFO "Using GB pages for direct mapping\n");
>>>>         page_size_mask |= 1 << PG_LEVEL_1G;
>>>>     } else {
>>>>         direct_gbpages = 0;
>>>>     }
>>>>
>>>> So if X86_FEATURE_GBPAGES is not supported, direct_gbpages will be set to 0.
>>> So why is the introduction of the info.direct_gbpages flag necessary? AFAICS it 
>>> just duplicates the kernel's direct_gbpages flag. One outcome is that hibernation 
>>> won't use gbpages, which is silly.
>> boot/compressed/pagetable.c also uses kernel_ident_mapping_init() for kaslr, at 
>> the moment we don't have "direct_gbpages" definition or X86_FEATURE_GBPAGES 
>> feature detection.
>>
>> I thought that we can change the other call sites when found really needed.
> Ok, you are right - I'll use the original patches as submitted, with the updated 
> changelogs.

Thanks!

Regards,
Xunlei


* [tip:x86/urgent] x86/mm: Add support for gbpages to kernel_ident_mapping_init()
  2017-05-04  1:42 [PATCH v3 1/2] x86/mm/ident_map: Add PUD level 1GB page support Xunlei Pang
  2017-05-04  1:42 ` [PATCH v3 2/2] x86_64/kexec: Use PUD level 1GB page for identity mapping if available Xunlei Pang
@ 2017-05-08  8:01 ` tip-bot for Xunlei Pang
  1 sibling, 0 replies; 10+ messages in thread
From: tip-bot for Xunlei Pang @ 2017-05-08  8:01 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, tglx, ebiederm, luto, xlpang, dvlasenk, peterz, torvalds,
	hpa, yinghai, bp, dyoung, linux-kernel, brgerst, jpoimboe

Commit-ID:  66aad4fdf2bf0af29c7decb4433dc5ec6c7c5451
Gitweb:     http://git.kernel.org/tip/66aad4fdf2bf0af29c7decb4433dc5ec6c7c5451
Author:     Xunlei Pang <xlpang@redhat.com>
AuthorDate: Thu, 4 May 2017 09:42:50 +0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 8 May 2017 08:28:40 +0200

x86/mm: Add support for gbpages to kernel_ident_mapping_init()

Kernel identity mappings on x86-64 kernels are created in two
ways: by the early x86 boot code, or by kernel_ident_mapping_init().

Native kernels (which is the dominant use case) use the former,
but the kexec and hibernation code uses kernel_ident_mapping_init().

There's a subtle difference between these two ways of creating
identity mappings: the current kernel_ident_mapping_init() code
always creates identity mappings using 2MB pages (PMD level), while
the native kernel boot path also utilizes gbpages where available.

This difference is suboptimal both for performance and for memory
usage: kernel_ident_mapping_init() needs to allocate pages for the
page tables when creating the new identity mappings.

This patch adds 1GB page (PUD level) support to
kernel_ident_mapping_init() to address these concerns.

The primary advantage would be better TLB coverage/performance,
because we'd utilize 1GB TLB entries instead of 2MB ones.

It is also useful on machines with large amounts of memory: it saves
paging structure allocations (around 4MB/TB using 2MB pages) when
setting up identity mappings for all of memory; with 1GB pages the
overhead drops to only about 8KB/TB.

( Note that this change alone does not activate gbpages in kexec,
  we are doing that in a separate patch. )

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: akpm@linux-foundation.org
Cc: kexec@lists.infradead.org
Link: http://lkml.kernel.org/r/1493862171-8799-1-git-send-email-xlpang@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/boot/compressed/pagetable.c |  2 +-
 arch/x86/include/asm/init.h          |  3 ++-
 arch/x86/kernel/machine_kexec_64.c   |  2 +-
 arch/x86/mm/ident_map.c              | 14 +++++++++++++-
 arch/x86/power/hibernate_64.c        |  2 +-
 5 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c
index 56589d0..1d78f17 100644
--- a/arch/x86/boot/compressed/pagetable.c
+++ b/arch/x86/boot/compressed/pagetable.c
@@ -70,7 +70,7 @@ static unsigned long level4p;
  * Due to relocation, pointers must be assigned at run time not build time.
  */
 static struct x86_mapping_info mapping_info = {
-	.pmd_flag       = __PAGE_KERNEL_LARGE_EXEC,
+	.page_flag       = __PAGE_KERNEL_LARGE_EXEC,
 };
 
 /* Locates and clears a region for a new top level page table. */
diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
index 737da62..474eb8c 100644
--- a/arch/x86/include/asm/init.h
+++ b/arch/x86/include/asm/init.h
@@ -4,8 +4,9 @@
 struct x86_mapping_info {
 	void *(*alloc_pgt_page)(void *); /* allocate buf for page table */
 	void *context;			 /* context for alloc_pgt_page */
-	unsigned long pmd_flag;		 /* page flag for PMD entry */
+	unsigned long page_flag;	 /* page flag for PMD or PUD entry */
 	unsigned long offset;		 /* ident mapping offset */
+	bool direct_gbpages;		 /* PUD level 1GB page support */
 };
 
 int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 085c3b3..1d4f2b0 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -113,7 +113,7 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 	struct x86_mapping_info info = {
 		.alloc_pgt_page	= alloc_pgt_page,
 		.context	= image,
-		.pmd_flag	= __PAGE_KERNEL_LARGE_EXEC,
+		.page_flag	= __PAGE_KERNEL_LARGE_EXEC,
 	};
 	unsigned long mstart, mend;
 	pgd_t *level4p;
diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
index 04210a2..adab159 100644
--- a/arch/x86/mm/ident_map.c
+++ b/arch/x86/mm/ident_map.c
@@ -13,7 +13,7 @@ static void ident_pmd_init(struct x86_mapping_info *info, pmd_t *pmd_page,
 		if (pmd_present(*pmd))
 			continue;
 
-		set_pmd(pmd, __pmd((addr - info->offset) | info->pmd_flag));
+		set_pmd(pmd, __pmd((addr - info->offset) | info->page_flag));
 	}
 }
 
@@ -30,6 +30,18 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
 		if (next > end)
 			next = end;
 
+		if (info->direct_gbpages) {
+			pud_t pudval;
+
+			if (pud_present(*pud))
+				continue;
+
+			addr &= PUD_MASK;
+			pudval = __pud((addr - info->offset) | info->page_flag);
+			set_pud(pud, pudval);
+			continue;
+		}
+
 		if (pud_present(*pud)) {
 			pmd = pmd_offset(pud, 0);
 			ident_pmd_init(info, pmd, addr, next);
diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
index 6a61194..a6e21fe 100644
--- a/arch/x86/power/hibernate_64.c
+++ b/arch/x86/power/hibernate_64.c
@@ -104,7 +104,7 @@ static int set_up_temporary_mappings(void)
 {
 	struct x86_mapping_info info = {
 		.alloc_pgt_page	= alloc_pgt_page,
-		.pmd_flag	= __PAGE_KERNEL_LARGE_EXEC,
+		.page_flag	= __PAGE_KERNEL_LARGE_EXEC,
 		.offset		= __PAGE_OFFSET,
 	};
 	unsigned long mstart, mend;


* [tip:x86/urgent] x86/kexec/64: Use gbpages for identity mappings if available
  2017-05-04  1:42 ` [PATCH v3 2/2] x86_64/kexec: Use PUD level 1GB page for identity mapping if available Xunlei Pang
  2017-05-05  6:52   ` Ingo Molnar
@ 2017-05-08  8:01   ` tip-bot for Xunlei Pang
  1 sibling, 0 replies; 10+ messages in thread
From: tip-bot for Xunlei Pang @ 2017-05-08  8:01 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jpoimboe, torvalds, mingo, bp, yinghai, dyoung, luto,
	linux-kernel, brgerst, ebiederm, xlpang, peterz, tglx, dvlasenk,
	hpa

Commit-ID:  8638100c52bb7782462b14aad102a4aaf0c7094c
Gitweb:     http://git.kernel.org/tip/8638100c52bb7782462b14aad102a4aaf0c7094c
Author:     Xunlei Pang <xlpang@redhat.com>
AuthorDate: Thu, 4 May 2017 09:42:51 +0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 8 May 2017 08:28:44 +0200

x86/kexec/64: Use gbpages for identity mappings if available

Kexec sets up all identity mappings before booting into the new
kernel, and this will cause extra memory consumption for paging
structures which is quite considerable on modern machines with
huge memory sizes.

E.g. on a 32TB machine that is kdumping, it could waste around
128MB (around 4MB/TB) of the reserved memory after kexec sets up
all the identity mappings using the current 2MB pages.

Add to that the memory needed for the loaded kdump kernel, initramfs,
etc., and it causes a kexec syscall -ENOMEM failure.

As a result, we had to enlarge reserved memory via "crashkernel=X"
to work around this problem.

This causes some trouble for distributions that use policies
to evaluate the proper "crashkernel=X" value for users.

So enable gbpages for kexec mappings.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: akpm@linux-foundation.org
Cc: kexec@lists.infradead.org
Link: http://lkml.kernel.org/r/1493862171-8799-2-git-send-email-xlpang@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/machine_kexec_64.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 1d4f2b0..c25d277 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -122,6 +122,10 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 
 	level4p = (pgd_t *)__va(start_pgtable);
 	clear_page(level4p);
+
+	if (direct_gbpages)
+		info.direct_gbpages = true;
+
 	for (i = 0; i < nr_pfn_mapped; i++) {
 		mstart = pfn_mapped[i].start << PAGE_SHIFT;
 		mend   = pfn_mapped[i].end << PAGE_SHIFT;

