* [PATCH 1/2] kexec: remove unnecessary unusable_pages
@ 2016-07-12  4:56 ` zhongjiang
  0 siblings, 0 replies; 26+ messages in thread
From: zhongjiang @ 2016-07-12  4:56 UTC (permalink / raw)
  To: ebiederm, dyoung, horms, vgoyal, yinghai, akpm; +Cc: kexec, linux-mm

From: zhong jiang <zhongjiang@huawei.com>

In general, kexec allocates pages from the buddy system, so a page
cannot exceed the highest physical address in the system.

This patch just removes the unnecessary code; no functional change.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 include/linux/kexec.h |  1 -
 kernel/kexec_core.c   | 13 -------------
 2 files changed, 14 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index e8acb2b..26e4917 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -162,7 +162,6 @@ struct kimage {
 
 	struct list_head control_pages;
 	struct list_head dest_pages;
-	struct list_head unusable_pages;
 
 	/* Address of next control page to allocate for crash kernels. */
 	unsigned long control_page;
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 56b3ed0..448127d 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -257,9 +257,6 @@ struct kimage *do_kimage_alloc_init(void)
 	/* Initialize the list of destination pages */
 	INIT_LIST_HEAD(&image->dest_pages);
 
-	/* Initialize the list of unusable pages */
-	INIT_LIST_HEAD(&image->unusable_pages);
-
 	return image;
 }
 
@@ -517,10 +514,6 @@ static void kimage_free_extra_pages(struct kimage *image)
 {
 	/* Walk through and free any extra destination pages I may have */
 	kimage_free_page_list(&image->dest_pages);
-
-	/* Walk through and free any unusable pages I have cached */
-	kimage_free_page_list(&image->unusable_pages);
-
 }
 void kimage_terminate(struct kimage *image)
 {
@@ -647,12 +640,6 @@ static struct page *kimage_alloc_page(struct kimage *image,
 		page = kimage_alloc_pages(gfp_mask, 0);
 		if (!page)
 			return NULL;
-		/* If the page cannot be used file it away */
-		if (page_to_pfn(page) >
-				(KEXEC_SOURCE_MEMORY_LIMIT >> PAGE_SHIFT)) {
-			list_add(&page->lru, &image->unusable_pages);
-			continue;
-		}
 		addr = page_to_pfn(page) << PAGE_SHIFT;
 
 		/* If it is the destination page we want use it */
-- 
1.8.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/2] kexec: add a pmd huge entry condition during the page table
  2016-07-12  4:56 ` zhongjiang
@ 2016-07-12  4:56   ` zhongjiang
  -1 siblings, 0 replies; 26+ messages in thread
From: zhongjiang @ 2016-07-12  4:56 UTC (permalink / raw)
  To: ebiederm, dyoung, horms, vgoyal, yinghai, akpm; +Cc: kexec, linux-mm

From: zhong jiang <zhongjiang@huawei.com>

When an image is loaded into the kernel, we need to set up page tables
for it, and every valid pfn also gets a new mapping. The kernel tends
to establish the pmd entry as a large page if pud_present is true. The
code segment that relocate_kernel points to can therefore sit inside a
pmd huge entry in init_transition_pgtable, so we need to take this
situation into account.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 arch/x86/kernel/machine_kexec_64.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 5a294e4..c33e344 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -14,6 +14,7 @@
 #include <linux/gfp.h>
 #include <linux/reboot.h>
 #include <linux/numa.h>
+#include <linux/hugetlb.h>
 #include <linux/ftrace.h>
 #include <linux/io.h>
 #include <linux/suspend.h>
@@ -34,6 +35,17 @@ static struct kexec_file_ops *kexec_file_loaders[] = {
 };
 #endif
 
+static void split_pmd(pmd_t *pmd, pte_t *pte)
+{
+	unsigned long pfn = pmd_pfn(*pmd);
+	int i = 0;
+
+	do {
+		set_pte(pte, pfn_pte(pfn, PAGE_KERNEL_EXEC));
+		pfn++;
+	} while (pte++, i++, i < PTRS_PER_PTE);
+}
+
 static void free_transition_pgtable(struct kimage *image)
 {
 	free_page((unsigned long)image->arch.pud);
@@ -68,15 +80,19 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
 		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
 	}
 	pmd = pmd_offset(pud, vaddr);
-	if (!pmd_present(*pmd)) {
+	if (!pmd_present(*pmd) || pmd_huge(*pmd)) {
 		pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
 		if (!pte)
 			goto err;
 		image->arch.pte = pte;
-		set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
+		if (pmd_huge(*pmd))
+			split_pmd(pmd, pte);
+		else
+			set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
 	}
 	pte = pte_offset_kernel(pmd, vaddr);
 	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
+
 	return 0;
 err:
 	free_transition_pgtable(image);
-- 
1.8.3.1

* Re: [PATCH 1/2] kexec: remove unnecessary unusable_pages
  2016-07-12  4:56 ` zhongjiang
@ 2016-07-12 15:19   ` Eric W. Biederman
  -1 siblings, 0 replies; 26+ messages in thread
From: Eric W. Biederman @ 2016-07-12 15:19 UTC (permalink / raw)
  To: zhongjiang; +Cc: dyoung, horms, vgoyal, yinghai, akpm, kexec, linux-mm

zhongjiang <zhongjiang@huawei.com> writes:

> From: zhong jiang <zhongjiang@huawei.com>
>
> In general, kexec allocates pages from the buddy system, so a page
> cannot exceed the highest physical address in the system.
>
> This patch just removes the unnecessary code; no functional change.

On 32bit systems with highmem support kexec can very easily receive a
page from the buddy allocator that can exceed 4GiB.  This doesn't show
up on 64bit systems as typically the memory limits are less than the
address space.  But this code is very necessary on some systems and
removing it is not ok.

Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>


>
> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
> ---
>  include/linux/kexec.h |  1 -
>  kernel/kexec_core.c   | 13 -------------
>  2 files changed, 14 deletions(-)
>
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index e8acb2b..26e4917 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -162,7 +162,6 @@ struct kimage {
>  
>  	struct list_head control_pages;
>  	struct list_head dest_pages;
> -	struct list_head unusable_pages;
>  
>  	/* Address of next control page to allocate for crash kernels. */
>  	unsigned long control_page;
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 56b3ed0..448127d 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -257,9 +257,6 @@ struct kimage *do_kimage_alloc_init(void)
>  	/* Initialize the list of destination pages */
>  	INIT_LIST_HEAD(&image->dest_pages);
>  
> -	/* Initialize the list of unusable pages */
> -	INIT_LIST_HEAD(&image->unusable_pages);
> -
>  	return image;
>  }
>  
> @@ -517,10 +514,6 @@ static void kimage_free_extra_pages(struct kimage *image)
>  {
>  	/* Walk through and free any extra destination pages I may have */
>  	kimage_free_page_list(&image->dest_pages);
> -
> -	/* Walk through and free any unusable pages I have cached */
> -	kimage_free_page_list(&image->unusable_pages);
> -
>  }
>  void kimage_terminate(struct kimage *image)
>  {
> @@ -647,12 +640,6 @@ static struct page *kimage_alloc_page(struct kimage *image,
>  		page = kimage_alloc_pages(gfp_mask, 0);
>  		if (!page)
>  			return NULL;
> -		/* If the page cannot be used file it away */
> -		if (page_to_pfn(page) >
> -				(KEXEC_SOURCE_MEMORY_LIMIT >> PAGE_SHIFT)) {
> -			list_add(&page->lru, &image->unusable_pages);
> -			continue;
> -		}
>  		addr = page_to_pfn(page) << PAGE_SHIFT;
>  
>  		/* If it is the destination page we want use it */

* Re: [PATCH 2/2] kexec: add a pmd huge entry condition during the page table
  2016-07-12  4:56   ` zhongjiang
@ 2016-07-12 15:46     ` Eric W. Biederman
  -1 siblings, 0 replies; 26+ messages in thread
From: Eric W. Biederman @ 2016-07-12 15:46 UTC (permalink / raw)
  To: zhongjiang; +Cc: dyoung, horms, vgoyal, yinghai, akpm, linux-mm, kexec

zhongjiang <zhongjiang@huawei.com> writes:

> From: zhong jiang <zhongjiang@huawei.com>
>
> When an image is loaded into the kernel, we need to set up page tables
> for it, and every valid pfn also gets a new mapping. The kernel tends
> to establish the pmd entry as a large page if pud_present is true. The
> code segment that relocate_kernel points to can therefore sit inside a
> pmd huge entry in init_transition_pgtable, so we need to take this
> situation into account.

I can see how in theory this might be necessary, but when is a kernel virtual
address on x86_64 that is above 0x8000000000000000 in conflict with the
identity-mapped physical addresses, which are all below 0x8000000000000000?

If anything the code could be simplified to always assume those mappings
are unoccupied.

Did you run into an actual failure somewhere?

Eric


> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
> ---
>  arch/x86/kernel/machine_kexec_64.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 5a294e4..c33e344 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -14,6 +14,7 @@
>  #include <linux/gfp.h>
>  #include <linux/reboot.h>
>  #include <linux/numa.h>
> +#include <linux/hugetlb.h>
>  #include <linux/ftrace.h>
>  #include <linux/io.h>
>  #include <linux/suspend.h>
> @@ -34,6 +35,17 @@ static struct kexec_file_ops *kexec_file_loaders[] = {
>  };
>  #endif
>  
> +static void split_pmd(pmd_t *pmd, pte_t *pte)
> +{
> +	unsigned long pfn = pmd_pfn(*pmd);
> +	int i = 0;
> +
> +	do {
> +		set_pte(pte, pfn_pte(pfn, PAGE_KERNEL_EXEC));
> +		pfn++;
> +	} while (pte++, i++, i < PTRS_PER_PTE);
> +}
> +
>  static void free_transition_pgtable(struct kimage *image)
>  {
>  	free_page((unsigned long)image->arch.pud);
> @@ -68,15 +80,19 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
>  		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
>  	}
>  	pmd = pmd_offset(pud, vaddr);
> -	if (!pmd_present(*pmd)) {
> +	if (!pmd_present(*pmd) || pmd_huge(*pmd)) {
>  		pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
>  		if (!pte)
>  			goto err;
>  		image->arch.pte = pte;
> -		set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
> +		if (pmd_huge(*pmd))
> +			split_pmd(pmd, pte);
> +		else
> +			set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
>  	}
>  	pte = pte_offset_kernel(pmd, vaddr);
>  	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
> +
>  	return 0;
>  err:
>  	free_transition_pgtable(image);

* Re: [PATCH 1/2] kexec: remove unnecessary unusable_pages
  2016-07-12 15:19   ` Eric W. Biederman
@ 2016-07-13  4:08     ` zhong jiang
  -1 siblings, 0 replies; 26+ messages in thread
From: zhong jiang @ 2016-07-13  4:08 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: dyoung, horms, vgoyal, yinghai, akpm, kexec, linux-mm

On 2016/7/12 23:19, Eric W. Biederman wrote:
> zhongjiang <zhongjiang@huawei.com> writes:
>
>> From: zhong jiang <zhongjiang@huawei.com>
>>
>> In general, kexec allocates pages from the buddy system, so a page
>> cannot exceed the highest physical address in the system.
>>
>> This patch just removes the unnecessary code; no functional change.
> On 32bit systems with highmem support kexec can very easily receive a
> page from the buddy allocator that can exceed 4GiB.  This doesn't show
> up on 64bit systems as typically the memory limits are less than the
> address space.  But this code is very necessary on some systems and
> removing it is not ok.
>
> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
  This viewpoint is opposed to mine: 32-bit systems architecturally cannot
  access beyond 4GiB, highmem or not. But there is one exception: when PAE
  is enabled, the physical address width is extended to 36 bits and a new
  paging mechanism is established for it; therefore a page from the buddy
  allocator can exceed 4GiB.

  Moreover, on 32-bit systems I cannot understand why KEXEC_SOURCE_MEMORY_LIMIT
  is defined as -1UL; with that, every page kimage_alloc_page allocates would be
  added to unusable_pages.

  Thanks
  zhongjiang
>> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
>> ---
>>  include/linux/kexec.h |  1 -
>>  kernel/kexec_core.c   | 13 -------------
>>  2 files changed, 14 deletions(-)
>>
>> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>> index e8acb2b..26e4917 100644
>> --- a/include/linux/kexec.h
>> +++ b/include/linux/kexec.h
>> @@ -162,7 +162,6 @@ struct kimage {
>>  
>>  	struct list_head control_pages;
>>  	struct list_head dest_pages;
>> -	struct list_head unusable_pages;
>>  
>>  	/* Address of next control page to allocate for crash kernels. */
>>  	unsigned long control_page;
>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>> index 56b3ed0..448127d 100644
>> --- a/kernel/kexec_core.c
>> +++ b/kernel/kexec_core.c
>> @@ -257,9 +257,6 @@ struct kimage *do_kimage_alloc_init(void)
>>  	/* Initialize the list of destination pages */
>>  	INIT_LIST_HEAD(&image->dest_pages);
>>  
>> -	/* Initialize the list of unusable pages */
>> -	INIT_LIST_HEAD(&image->unusable_pages);
>> -
>>  	return image;
>>  }
>>  
>> @@ -517,10 +514,6 @@ static void kimage_free_extra_pages(struct kimage *image)
>>  {
>>  	/* Walk through and free any extra destination pages I may have */
>>  	kimage_free_page_list(&image->dest_pages);
>> -
>> -	/* Walk through and free any unusable pages I have cached */
>> -	kimage_free_page_list(&image->unusable_pages);
>> -
>>  }
>>  void kimage_terminate(struct kimage *image)
>>  {
>> @@ -647,12 +640,6 @@ static struct page *kimage_alloc_page(struct kimage *image,
>>  		page = kimage_alloc_pages(gfp_mask, 0);
>>  		if (!page)
>>  			return NULL;
>> -		/* If the page cannot be used file it away */
>> -		if (page_to_pfn(page) >
>> -				(KEXEC_SOURCE_MEMORY_LIMIT >> PAGE_SHIFT)) {
>> -			list_add(&page->lru, &image->unusable_pages);
>> -			continue;
>> -		}
>>  		addr = page_to_pfn(page) << PAGE_SHIFT;
>>  
>>  		/* If it is the destination page we want use it */
> .
>


* Re: [PATCH 1/2] kexec: remove unnecessary unusable_pages
  2016-07-13  4:08     ` zhong jiang
@ 2016-07-13  5:07       ` Eric W. Biederman
  -1 siblings, 0 replies; 26+ messages in thread
From: Eric W. Biederman @ 2016-07-13  5:07 UTC (permalink / raw)
  To: zhong jiang; +Cc: dyoung, horms, vgoyal, yinghai, akpm, kexec, linux-mm

zhong jiang <zhongjiang@huawei.com> writes:

> On 2016/7/12 23:19, Eric W. Biederman wrote:
>> zhongjiang <zhongjiang@huawei.com> writes:
>>
>>> From: zhong jiang <zhongjiang@huawei.com>
>>>
>>> In general, kexec allocates pages from the buddy system, so a page
>>> cannot exceed the highest physical address in the system.
>>>
>>> This patch just removes the unnecessary code; no functional change.
>> On 32bit systems with highmem support kexec can very easily receive a
>> page from the buddy allocator that can exceed 4GiB.  This doesn't show
>> up on 64bit systems as typically the memory limits are less than the
>> address space.  But this code is very necessary on some systems and
>> removing it is not ok.
>>
>> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>
>   This viewpoint is opposed to mine: 32-bit systems architecturally cannot
>   access beyond 4GiB, highmem or not. But there is one exception: when PAE
>   is enabled, the physical address width is extended to 36 bits and a new
>   paging mechanism is established for it; therefore a page from the buddy
>   allocator can exceed 4GiB.

Exactly.  And I was dealing with PAE systems in 2001 or so with > 4GiB
of RAM.  Which is where the unusable_pages work comes from.

Other architectures such as ARM also followed a similar path, so
it isn't just x86 that has 32bit systems with > 32 address lines.

>   Moreover, on 32-bit systems I cannot understand why KEXEC_SOURCE_MEMORY_LIMIT
>   is defined as -1UL; with that, every page kimage_alloc_page allocates would be
>   added to unusable_pages.

-1UL is a short way of writing 0xffffffffUL, which is as close as you
can get to writing 0x100000000UL in 32 bits.

kimage_alloc_page won't always add to unusable_pages, as there is memory
below 4GiB; it just isn't easily found, so there may temporarily be a
memory shortage as it allocates its way there.  Unfortunately, whenever
I have looked, there are memory zones that line up with the memory
kexec is looking for.  So it does a little bit of a weird dance to get
the memory it needs and to discard the memory it can't use.
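
To make the -1UL idiom concrete, here is a small user-space C sketch
(fixed-width types so it behaves the same on any host; the names
`source_memory_limit32` and `page_usable` are illustrative, not the
kernel's actual API):

```c
#include <assert.h>
#include <stdint.h>

/* On a 32-bit unsigned long, -1UL wraps to 0xffffffff, the highest
 * representable address; model that with a fixed-width type. */
static uint64_t source_memory_limit32(void)
{
    return (uint32_t)-1;   /* 0xffffffff */
}

/* Hypothetical check: a page is usable for kexec only if it lies
 * entirely at or below the limit.  With PAE the buddy allocator can
 * hand back pages above 4GiB, and those fail this test. */
static int page_usable(uint64_t page_start, uint64_t page_size,
                       uint64_t limit)
{
    return page_start + page_size - 1 <= limit;
}
```

The highest 4KiB page below 4GiB passes the check, while any PAE page
at or above 0x100000000 fails it and would be set aside as unusable.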

Eric

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] kexec: add a pmd huge entry condition during the page table
  2016-07-12 15:46     ` Eric W. Biederman
@ 2016-07-13  7:01       ` zhong jiang
  -1 siblings, 0 replies; 26+ messages in thread
From: zhong jiang @ 2016-07-13  7:01 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: dyoung, horms, vgoyal, yinghai, akpm, linux-mm, kexec

On 2016/7/12 23:46, Eric W. Biederman wrote:
> zhongjiang <zhongjiang@huawei.com> writes:
>
>> From: zhong jiang <zhongjiang@huawei.com>
>>
>> when image is loaded into kernel, we need set up page table for it. and 
>> all valid pfn also set up new mapping. it will tend to establish a pmd 
>> page table in the form of a large page if pud_present is true. relocate_kernel 
>> points to code segment can locate in the pmd huge entry in init_transtion_pgtable. 
>> therefore, we need to take the situation into account.
> I can see how in theory this might be necessary, but when is a kernel virtual
> address on x86_64 that is above 0x8000000000000000 in conflict with
> identity-mapped physical addresses that are all below 0x8000000000000000?
>
> If anything the code could be simplified to always assume those mappings
> are unoccupied.
>
> Did you run into an actual failure somewhere?
>
> Eric
>
   I do not understand what you are trying to say; maybe I am missing
   your point.

   The key question is how to ensure that the pmd entry covering the
   address relocate_kernel points to is not a huge page.

   Thanks
   zhongjiang
 
>> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
>> ---
>>  arch/x86/kernel/machine_kexec_64.c | 20 ++++++++++++++++++--
>>  1 file changed, 18 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
>> index 5a294e4..c33e344 100644
>> --- a/arch/x86/kernel/machine_kexec_64.c
>> +++ b/arch/x86/kernel/machine_kexec_64.c
>> @@ -14,6 +14,7 @@
>>  #include <linux/gfp.h>
>>  #include <linux/reboot.h>
>>  #include <linux/numa.h>
>> +#include <linux/hugetlb.h>
>>  #include <linux/ftrace.h>
>>  #include <linux/io.h>
>>  #include <linux/suspend.h>
>> @@ -34,6 +35,17 @@ static struct kexec_file_ops *kexec_file_loaders[] = {
>>  };
>>  #endif
>>  
>> +static void split_pmd(pmd_t *pmd, pte_t *pte)
>> +{
>> +	unsigned long pfn = pmd_pfn(*pmd);
>> +	int i = 0;
>> +
>> +	do {
>> +		set_pte(pte, pfn_pte(pfn, PAGE_KERNEL_EXEC));
>> +		pfn++;
>> +	} while (pte++, i++, i < PTRS_PER_PTE);
>> +}
>> +
>>  static void free_transition_pgtable(struct kimage *image)
>>  {
>>  	free_page((unsigned long)image->arch.pud);
>> @@ -68,15 +80,19 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
>>  		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
>>  	}
>>  	pmd = pmd_offset(pud, vaddr);
>> -	if (!pmd_present(*pmd)) {
>> +	if (!pmd_present(*pmd) || pmd_huge(*pmd)) {
>>  		pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
>>  		if (!pte)
>>  			goto err;
>>  		image->arch.pte = pte;
>> -		set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
>> +		if (pmd_huge(*pmd))
>> +			split_pmd(pmd, pte);
>> +		else
>> +			set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
>>  	}
>>  	pte = pte_offset_kernel(pmd, vaddr);
>>  	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
>> +
>>  	return 0;
>>  err:
>>  	free_transition_pgtable(image);
> .
>



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] kexec: remove unnecessary unusable_pages
  2016-07-13  5:07       ` Eric W. Biederman
@ 2016-07-13  7:07         ` zhong jiang
  -1 siblings, 0 replies; 26+ messages in thread
From: zhong jiang @ 2016-07-13  7:07 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: dyoung, horms, vgoyal, yinghai, akpm, kexec, linux-mm

On 2016/7/13 13:07, Eric W. Biederman wrote:
> zhong jiang <zhongjiang@huawei.com> writes:
>
>> On 2016/7/12 23:19, Eric W. Biederman wrote:
>>> zhongjiang <zhongjiang@huawei.com> writes:
>>>
>>>> From: zhong jiang <zhongjiang@huawei.com>
>>>>
>>>> In general, kexec alloc pages from buddy system, it cannot exceed
>>>> the physical address in the system.
>>>>
>>>> The patch just remove this unnecessary code, no functional change.
>>> On 32bit systems with highmem support kexec can very easily receive a
>>> page from the buddy allocator that can exceed 4GiB.  This doesn't show
>>> up on 64bit systems as typically the memory limits are less than the
>>> address space.  But this code is very necessary on some systems and
>>> removing it is not ok.
>>>
>>> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>>
>>   This viewpoint is opposed to mine: 32-bit systems architecturally cannot
>>   access beyond 4GiB, highmem or not.  But there is one exception:
>>   when PAE is enabled, the physical address is extended to 36 bits and a new
>>   paging mechanism is established for it.  Therefore, a page from the buddy
>>   allocator can exceed 4GiB.
> Exactly.  And I was dealing with PAE systems in 2001 or so with > 4GiB
> of RAM.  Which is where the unusable_pages work comes from.
>
> Other architectures such as ARM also followed a similar path, so
> it isn't just x86 that has 32bit systems with > 32 address lines.
>
>>   moreover, on 32-bit systems I cannot understand why KEXEC_SOURCE_MEMORY_LIMIT
>>   is defined as -1UL; kimage_alloc_page will then always add the allocated page to unusable_pages.
> -1UL is a short way of writing 0xffffffffUL  Which is as close as you
> can get to writing 0x100000000UL in 32bits.
>
> kimage_alloc_page won't always add to unusable_pages as there is memory
> below 4GiB but it isn't easily found so there may temporarily be a
> memory shortage, as it allocates its way there.  Unfortunately whenever
> I have looked there are memory zones that line up with the memory the
> kexec is looking for.  So it does a little bit of a weird dance to get
> the memory it needs and to discard the memory it can't use.
>
> Eric
>
  Thanks, I get it.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] kexec: add a pmd huge entry condition during the page table
  2016-07-13  7:01       ` zhong jiang
@ 2016-07-14 13:19         ` Eric W. Biederman
  -1 siblings, 0 replies; 26+ messages in thread
From: Eric W. Biederman @ 2016-07-14 13:19 UTC (permalink / raw)
  To: zhong jiang; +Cc: dyoung, horms, vgoyal, yinghai, akpm, linux-mm, kexec

zhong jiang <zhongjiang@huawei.com> writes:

> On 2016/7/12 23:46, Eric W. Biederman wrote:
>> zhongjiang <zhongjiang@huawei.com> writes:
>>
>>> From: zhong jiang <zhongjiang@huawei.com>
>>>
>>> when image is loaded into kernel, we need set up page table for it. and 
>>> all valid pfn also set up new mapping. it will tend to establish a pmd 
>>> page table in the form of a large page if pud_present is true. relocate_kernel 
>>> points to code segment can locate in the pmd huge entry in init_transtion_pgtable. 
>>> therefore, we need to take the situation into account.
>> I can see how in theory this might be necessary, but when is a kernel virtual
>> address on x86_64 that is above 0x8000000000000000 in conflict with
>> identity-mapped physical addresses that are all below 0x8000000000000000?
>>
>> If anything the code could be simplified to always assume those mappings
>> are unoccupied.
>>
>> Did you run into an actual failure somewhere?
>>
>> Eric
>>
>    I  do not understand what you trying to say,  Maybe I miss your point.
>   
>   The key is how to ensure that relocate_kernel points to the pmd
>   entry is not huge page.

Kernel virtual addresses are in the negative half of the address space.
Identity-mapped physical addresses are in the positive half of the
address space.

As the entire negative half of the address space is unmapped at the time
that page table entry is being created, there are no huge pages present.

Even testing pmd_present is redundant, and that is probably the bug.
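
The negative/positive-half distinction can be sketched in user-space C
(a simplified model of x86_64 canonical addressing; `in_negative_half`
is an illustrative helper, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* x86_64 kernel virtual addresses live in the negative half of the
 * canonical address space (bit 63 set); identity-mapped physical
 * addresses live in the positive half (bit 63 clear).  Interpreting
 * the address as signed makes the split a simple sign test. */
static int in_negative_half(uint64_t vaddr)
{
    return (int64_t)vaddr < 0;
}
```

A kernel-text style address such as 0xffffffff81000000 lands in the
negative half, while any identity-mapped physical address below
0x8000000000000000 lands in the positive half, so the two can never
share a page-table slot in the transition page table.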

Eric


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] kexec: add a pmd huge entry condition during the page table
  2016-07-14 13:19         ` Eric W. Biederman
@ 2016-07-20  7:25           ` zhong jiang
  -1 siblings, 0 replies; 26+ messages in thread
From: zhong jiang @ 2016-07-20  7:25 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: dyoung, horms, vgoyal, yinghai, akpm, linux-mm, kexec

On 2016/7/14 21:19, Eric W. Biederman wrote:
> zhong jiang <zhongjiang@huawei.com> writes:
>
>> On 2016/7/12 23:46, Eric W. Biederman wrote:
>>> zhongjiang <zhongjiang@huawei.com> writes:
>>>
>>>> From: zhong jiang <zhongjiang@huawei.com>
>>>>
>>>> when image is loaded into kernel, we need set up page table for it. and 
>>>> all valid pfn also set up new mapping. it will tend to establish a pmd 
>>>> page table in the form of a large page if pud_present is true. relocate_kernel 
>>>> points to code segment can locate in the pmd huge entry in init_transtion_pgtable. 
>>>> therefore, we need to take the situation into account.
>>> I can see how in theory this might be necessary, but when is a kernel virtual
>>> address on x86_64 that is above 0x8000000000000000 in conflict with
>>> identity-mapped physical addresses that are all below 0x8000000000000000?
>>>
>>> If anything the code could be simplified to always assume those mappings
>>> are unoccupied.
>>>
>>> Did you run into an actual failure somewhere?
>>>
>>> Eric
>>>
>>    I  do not understand what you trying to say,  Maybe I miss your point.
>>   
>>   The key is how to ensure that relocate_kernel points to the pmd
>>   entry is not huge page.
> Kernel virtual addresses are in the negative half of the address space.
> Identity mapped physical addresses are in the positive half of the
> address space.
>
> As the entire negative half of the address space is unmapped at the time
> that page table entry is being created, there are no huge pages present.
>
> Even testing pmd_present is redundant, and that is probably the bug.
>
> Eric
>
> .
  OK, I see what you mean.  We allocate a new pgd page, the
  control_code_page, to rebuild the new mapping mechanism in init_pgtable,
  because relocate_kernel is in the negative half of the address space and
  the page table has not been established for the new pgd.  What surprises
  me is: if the page table does not exist, why do we need to check
  p(g,u,m)d_present() at all?  And if it can be present, I still think a
  huge pmd can exist.

  Or maybe I misunderstand its meaning.

  Thanks
  zhongjiang


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] kexec: add a pmd huge entry condition during the page table
  2016-07-11 20:25     ` Andrew Morton
@ 2016-07-12  2:21       ` zhong jiang
  -1 siblings, 0 replies; 26+ messages in thread
From: zhong jiang @ 2016-07-12  2:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel

On 2016/7/12 4:25, Andrew Morton wrote:
> On Mon, 11 Jul 2016 14:36:01 +0800 zhongjiang <zhongjiang@huawei.com> wrote:
>
>> From: zhong jiang <zhongjiang@huawei.com>
>>
>> when image is loaded into kernel, we need set up page table for it.
>> and all valid pfn also set up new mapping. it will set up a pmd huge
>> entry if pud_present is true.  relocate_kernel points to code segment
>> can locate in the pmd huge entry in init_transtion_pgtable. therefore,
>> we need to take the situation into account.
> Sorry, I just don't understand this changelog.  The second sentence is
> particularly hard.
>
> So can you please have another attempt at preparing the changelog text?
> Then resend the patches, and this time be sure to Cc the kexec
> maintainers.  I suggest this list:
>
> Cc: kexec@lists.infradead.org
> Cc: Eric Biederman <ebiederm@xmission.com>
> Cc: Dave Young <dyoung@redhat.com>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Simon Horman <horms@verge.net.au>
>
>
> .
>
 OK, I will modify the changelog and resend to this list.  Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] kexec: add a pmd huge entry condition during the page table
  2016-07-11  6:36   ` zhongjiang
@ 2016-07-11 20:25     ` Andrew Morton
  -1 siblings, 0 replies; 26+ messages in thread
From: Andrew Morton @ 2016-07-11 20:25 UTC (permalink / raw)
  To: zhongjiang; +Cc: linux-mm, linux-kernel

On Mon, 11 Jul 2016 14:36:01 +0800 zhongjiang <zhongjiang@huawei.com> wrote:

> From: zhong jiang <zhongjiang@huawei.com>
> 
> when image is loaded into kernel, we need set up page table for it.
> and all valid pfn also set up new mapping. it will set up a pmd huge
> entry if pud_present is true.  relocate_kernel points to code segment
> can locate in the pmd huge entry in init_transtion_pgtable. therefore,
> we need to take the situation into account.

Sorry, I just don't understand this changelog.  The second sentence is
particularly hard.

So can you please have another attempt at preparing the changelog text?
Then resend the patches, and this time be sure to Cc the kexec
maintainers.  I suggest this list:

Cc: kexec@lists.infradead.org
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Simon Horman <horms@verge.net.au>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 2/2] kexec: add a pmd huge entry condition during the page table
  2016-07-11  6:36 zhongjiang
@ 2016-07-11  6:36   ` zhongjiang
  0 siblings, 0 replies; 26+ messages in thread
From: zhongjiang @ 2016-07-11  6:36 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel

From: zhong jiang <zhongjiang@huawei.com>

when image is loaded into kernel, we need set up page table for it.
and all valid pfn also set up new mapping. it will set up a pmd huge
entry if pud_present is true.  relocate_kernel points to code segment
can locate in the pmd huge entry in init_transtion_pgtable. therefore,
we need to take the situation into account.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 arch/x86/kernel/machine_kexec_64.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 5a294e4..c33e344 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -14,6 +14,7 @@
 #include <linux/gfp.h>
 #include <linux/reboot.h>
 #include <linux/numa.h>
+#include <linux/hugetlb.h>
 #include <linux/ftrace.h>
 #include <linux/io.h>
 #include <linux/suspend.h>
@@ -34,6 +35,17 @@ static struct kexec_file_ops *kexec_file_loaders[] = {
 };
 #endif
 
+static void split_pmd(pmd_t *pmd, pte_t *pte)
+{
+	unsigned long pfn = pmd_pfn(*pmd);
+	int i = 0;
+
+	do {
+		set_pte(pte, pfn_pte(pfn, PAGE_KERNEL_EXEC));
+		pfn++;
+	} while (pte++, i++, i < PTRS_PER_PTE);
+}
+
 static void free_transition_pgtable(struct kimage *image)
 {
 	free_page((unsigned long)image->arch.pud);
@@ -68,15 +80,19 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
 		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
 	}
 	pmd = pmd_offset(pud, vaddr);
-	if (!pmd_present(*pmd)) {
+	if (!pmd_present(*pmd) || pmd_huge(*pmd)) {
 		pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
 		if (!pte)
 			goto err;
 		image->arch.pte = pte;
-		set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
+		if (pmd_huge(*pmd))
+			split_pmd(pmd, pte);
+		else
+			set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
 	}
 	pte = pte_offset_kernel(pmd, vaddr);
 	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
+
 	return 0;
 err:
 	free_transition_pgtable(image);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2016-07-20  7:37 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-12  4:56 [PATCH 1/2] kexec: remove unnecessary unusable_pages zhongjiang
2016-07-12  4:56 ` zhongjiang
2016-07-12  4:56 ` [PATCH 2/2] kexec: add a pmd huge entry condition during the page table zhongjiang
2016-07-12  4:56   ` zhongjiang
2016-07-12 15:46   ` Eric W. Biederman
2016-07-12 15:46     ` Eric W. Biederman
2016-07-13  7:01     ` zhong jiang
2016-07-13  7:01       ` zhong jiang
2016-07-14 13:19       ` Eric W. Biederman
2016-07-14 13:19         ` Eric W. Biederman
2016-07-20  7:25         ` zhong jiang
2016-07-20  7:25           ` zhong jiang
2016-07-12 15:19 ` [PATCH 1/2] kexec: remove unnecessary unusable_pages Eric W. Biederman
2016-07-12 15:19   ` Eric W. Biederman
2016-07-13  4:08   ` zhong jiang
2016-07-13  4:08     ` zhong jiang
2016-07-13  5:07     ` Eric W. Biederman
2016-07-13  5:07       ` Eric W. Biederman
2016-07-13  7:07       ` zhong jiang
2016-07-13  7:07         ` zhong jiang
  -- strict thread matches above, loose matches on Subject: below --
2016-07-11  6:36 zhongjiang
2016-07-11  6:36 ` [PATCH 2/2] kexec: add a pmd huge entry condition during the page table zhongjiang
2016-07-11  6:36   ` zhongjiang
2016-07-11 20:25   ` Andrew Morton
2016-07-11 20:25     ` Andrew Morton
2016-07-12  2:21     ` zhong jiang
2016-07-12  2:21       ` zhong jiang
