linux-mm.kvack.org archive mirror
* [PATCH v2 00/10] change the implementation of the PageHighMem()
@ 2020-04-29  3:26 js1304
  2020-04-29  3:26 ` [PATCH v2 01/10] mm/page-flags: introduce PageHighMemZone() js1304
                   ` (10 more replies)
  0 siblings, 11 replies; 33+ messages in thread
From: js1304 @ 2020-04-29  3:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Vlastimil Babka, Laura Abbott,
	Aneesh Kumar K . V, Mel Gorman, Michal Hocko, Johannes Weiner,
	Roman Gushchin, Minchan Kim, Rik van Riel, Christian Koenig,
	Huang Rui, Eric Biederman, Rafael J . Wysocki, Pavel Machek,
	kernel-team, Christoph Hellwig, Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Changes on v2
- add "acked-by", "reviewed-by" tags
- replace PageHighMem() with an open-coded check, instead of using the
new PageHighMemZone() macro. Related file is "include/linux/migrate.h"

Hello,

This patchset separates the two use cases of PageHighMem() by
introducing a PageHighMemZone() macro, and it changes the
implementation of PageHighMem() to reflect the actual meaning of the
macro. This patchset is a preparation step for the patchset,
"mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE" [1].

PageHighMem() is used for two different cases. One is to check whether
there is a direct mapping for a page or not. The other is to check the
zone of a page, that is, whether it is a highmem type zone or not.

Until now, both cases have been exactly the same thing, so the
implementation of PageHighMem() uses the check for whether the zone of
the page is a highmem type zone or not.

"#define PageHighMem(__p) is_highmem_idx(page_zonenum(__p))"

ZONE_MOVABLE is special. It is considered a normal type zone on
!CONFIG_HIGHMEM, but a highmem type zone on CONFIG_HIGHMEM. Let's
focus on the latter case. In that case, all pages in ZONE_MOVABLE have
had no direct mapping until now.

However, the following patchset,
"mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE",
which was once merged and then reverted, will be tried again and will
break the assumption that all pages in ZONE_MOVABLE have no direct
mapping. Hence, ZONE_MOVABLE, which is considered a highmem type zone,
could contain both types of pages, direct mapped and not. Since
ZONE_MOVABLE could contain both types of pages, __GFP_HIGHMEM is still
required to allocate memory from it, and we conservatively need to keep
considering ZONE_MOVABLE a highmem type zone.

Even in this situation, PageHighMem(), when called on a ZONE_MOVABLE
page to check for a direct mapping, should return the correct result.
The current implementation of PageHighMem() just returns true if the
zone of the page is a highmem type zone, so it could be wrong if a
page in ZONE_MOVABLE is actually direct mapped.

To solve this potential problem, this patch introduces a new
PageHighMemZone() macro. In the following patches, the two use cases of
PageHighMem() are separated by calling the proper macro, PageHighMem()
or PageHighMemZone(). Then, the implementation of PageHighMem() will be
changed to just check whether the direct mapping exists or not,
regardless of the zone of the page.

Note that there are some rules to determine the proper macro.

1. If PageHighMem() is called to check whether the direct mapping
exists or not, use PageHighMem().
2. If PageHighMem() is used to predict the gfp_flags this page was
previously allocated with, use PageHighMemZone(). The zone of the page
is determined by those gfp_flags.
3. If the purpose of calling PageHighMem() is to count highmem pages
and to interact with the system using this count, use
PageHighMemZone(). Such a counter is usually used to calculate the
memory available for a kernel allocation, and pages in a highmem zone
are not available for a kernel allocation.
4. Otherwise, use PageHighMemZone(). It's safe since its implementation
is just a copy of the previous PageHighMem() implementation and won't
be changed.

My final plan is to rename PageHighMem() to PageNoDirectMapped(), or
something else that represents its proper meaning.

This patchset is based on next-20200428; the full patchset can be
found at the following link.

https://github.com/JoonsooKim/linux/tree/page_highmem-cleanup-v2.00-next-20200428

Thanks.

[1]: https://lore.kernel.org/linux-mm/1512114786-5085-1-git-send-email-iamjoonsoo.kim@lge.com

Joonsoo Kim (10):
  mm/page-flags: introduce PageHighMemZone()
  drm/ttm: separate PageHighMem() and PageHighMemZone() use case
  kexec: separate PageHighMem() and PageHighMemZone() use case
  power: separate PageHighMem() and PageHighMemZone() use case
  mm/gup: separate PageHighMem() and PageHighMemZone() use case
  mm/hugetlb: separate PageHighMem() and PageHighMemZone() use case
  mm: separate PageHighMem() and PageHighMemZone() use case
  mm/page_alloc: correct the use of is_highmem_idx()
  mm/migrate: replace PageHighMem() with open-code
  mm/page-flags: change the implementation of the PageHighMem()

 drivers/gpu/drm/ttm/ttm_memory.c         |  4 ++--
 drivers/gpu/drm/ttm/ttm_page_alloc.c     |  2 +-
 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c |  2 +-
 drivers/gpu/drm/ttm/ttm_tt.c             |  2 +-
 include/linux/migrate.h                  |  4 +++-
 include/linux/page-flags.h               | 10 +++++++++-
 kernel/kexec_core.c                      |  2 +-
 kernel/power/snapshot.c                  | 12 ++++++------
 mm/gup.c                                 |  2 +-
 mm/hugetlb.c                             |  2 +-
 mm/memory_hotplug.c                      |  2 +-
 mm/page_alloc.c                          |  4 ++--
 12 files changed, 29 insertions(+), 19 deletions(-)

-- 
2.7.4



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v2 01/10] mm/page-flags: introduce PageHighMemZone()
  2020-04-29  3:26 [PATCH v2 00/10] change the implementation of the PageHighMem() js1304
@ 2020-04-29  3:26 ` js1304
  2020-04-29  3:26 ` [PATCH v2 02/10] drm/ttm: separate PageHighMem() and PageHighMemZone() use case js1304
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 33+ messages in thread
From: js1304 @ 2020-04-29  3:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Vlastimil Babka, Laura Abbott,
	Aneesh Kumar K . V, Mel Gorman, Michal Hocko, Johannes Weiner,
	Roman Gushchin, Minchan Kim, Rik van Riel, Christian Koenig,
	Huang Rui, Eric Biederman, Rafael J . Wysocki, Pavel Machek,
	kernel-team, Christoph Hellwig, Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

PageHighMem() is used for two different cases. One is to check whether
there is a direct mapping for a page or not. The other is to check the
zone of a page, that is, whether it is a highmem type zone or not.

Until now, both cases have been exactly the same thing, so the
implementation of PageHighMem() uses the check for whether the zone of
the page is a highmem type zone or not.

"#define PageHighMem(__p) is_highmem_idx(page_zonenum(__p))"

ZONE_MOVABLE is special. It is considered a normal type zone on
!CONFIG_HIGHMEM, but a highmem type zone on CONFIG_HIGHMEM. Let's
focus on the latter case. In that case, all pages in ZONE_MOVABLE have
had no direct mapping until now.

However, the following patchset,
"mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE",
which was once merged and then reverted, will be tried again and will
break the assumption that all pages in ZONE_MOVABLE have no direct
mapping. Hence, ZONE_MOVABLE, which is considered a highmem type zone,
could contain both types of pages, direct mapped and not. Since
ZONE_MOVABLE could contain both types of pages, __GFP_HIGHMEM is still
required to allocate memory from it, and we conservatively need to keep
considering ZONE_MOVABLE a highmem type zone.

Even in this situation, PageHighMem(), when called on a ZONE_MOVABLE
page to check for a direct mapping, should return the correct result.
The current implementation of PageHighMem() just returns true if the
zone of the page is a highmem type zone, so it could be wrong if a
page in ZONE_MOVABLE is actually direct mapped.

To solve this potential problem, this patch introduces a new
PageHighMemZone() macro. In the following patches, the two use cases of
PageHighMem() are separated by calling the proper macro, PageHighMem()
or PageHighMemZone(). Then, the implementation of PageHighMem() will be
changed to just check whether the direct mapping exists or not,
regardless of the zone of the page.

Note that there are some rules to determine the proper macro.

1. If PageHighMem() is called to check whether the direct mapping
exists or not, use PageHighMem().
2. If PageHighMem() is used to predict the gfp_flags this page was
previously allocated with, use PageHighMemZone(). The zone of the page
is determined by those gfp_flags.
3. If the purpose of calling PageHighMem() is to count highmem pages
and to interact with the system using this count, use
PageHighMemZone(). Such a counter is usually used to calculate the
memory available for a kernel allocation, and pages in a highmem zone
are not available for a kernel allocation.
4. Otherwise, use PageHighMemZone(). It's safe since its implementation
is just a copy of the previous PageHighMem() implementation and won't
be changed.

My final plan is to rename PageHighMem() to PageNoDirectMapped(), or
something else that represents its proper meaning.

Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 include/linux/page-flags.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 222f6f7..fca0cce 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -378,10 +378,16 @@ PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
 /*
  * Must use a macro here due to header dependency issues. page_zone() is not
  * available at this point.
+ * PageHighMem() is for checking if the direct mapping exists or not.
+ * PageHighMemZone() is for checking the zone the page belongs to, in
+ * order to predict previous gfp_flags or to count something for system
+ * memory management.
  */
 #define PageHighMem(__p) is_highmem_idx(page_zonenum(__p))
+#define PageHighMemZone(__p) is_highmem_idx(page_zonenum(__p))
 #else
 PAGEFLAG_FALSE(HighMem)
+PAGEFLAG_FALSE(HighMemZone)
 #endif
 
 #ifdef CONFIG_SWAP
-- 
2.7.4




* [PATCH v2 02/10] drm/ttm: separate PageHighMem() and PageHighMemZone() use case
  2020-04-29  3:26 [PATCH v2 00/10] change the implementation of the PageHighMem() js1304
  2020-04-29  3:26 ` [PATCH v2 01/10] mm/page-flags: introduce PageHighMemZone() js1304
@ 2020-04-29  3:26 ` js1304
  2020-04-29  3:26 ` [PATCH v2 03/10] kexec: " js1304
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 33+ messages in thread
From: js1304 @ 2020-04-29  3:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Vlastimil Babka, Laura Abbott,
	Aneesh Kumar K . V, Mel Gorman, Michal Hocko, Johannes Weiner,
	Roman Gushchin, Minchan Kim, Rik van Riel, Christian Koenig,
	Huang Rui, Eric Biederman, Rafael J . Wysocki, Pavel Machek,
	kernel-team, Christoph Hellwig, Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Until now, PageHighMem() has been used for two different cases. One is
to check whether there is a direct mapping for a page or not. The other
is to check the zone of a page, that is, whether it is a highmem type
zone or not.

Now, we have separate macros, PageHighMem() and PageHighMemZone(), for
each case. Use the appropriate one.

Note that there are some rules to determine the proper macro.

1. If PageHighMem() is called to check whether the direct mapping
exists or not, use PageHighMem().
2. If PageHighMem() is used to predict the gfp_flags this page was
previously allocated with, use PageHighMemZone(). The zone of the page
is determined by those gfp_flags.
3. If the purpose of calling PageHighMem() is to count highmem pages
and to interact with the system using this count, use
PageHighMemZone(). Such a counter is usually used to calculate the
memory available for a kernel allocation, and pages in a highmem zone
are not available for a kernel allocation.
4. Otherwise, use PageHighMemZone(). It's safe since its implementation
is just a copy of the previous PageHighMem() implementation and won't
be changed.

I apply rule #4 in this patch.

Acked-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 drivers/gpu/drm/ttm/ttm_memory.c         | 4 ++--
 drivers/gpu/drm/ttm/ttm_page_alloc.c     | 2 +-
 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 2 +-
 drivers/gpu/drm/ttm/ttm_tt.c             | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_memory.c b/drivers/gpu/drm/ttm/ttm_memory.c
index acd63b7..d071b71 100644
--- a/drivers/gpu/drm/ttm/ttm_memory.c
+++ b/drivers/gpu/drm/ttm/ttm_memory.c
@@ -641,7 +641,7 @@ int ttm_mem_global_alloc_page(struct ttm_mem_global *glob,
 	 */
 
 #ifdef CONFIG_HIGHMEM
-	if (PageHighMem(page) && glob->zone_highmem != NULL)
+	if (PageHighMemZone(page) && glob->zone_highmem != NULL)
 		zone = glob->zone_highmem;
 #else
 	if (glob->zone_dma32 && page_to_pfn(page) > 0x00100000UL)
@@ -656,7 +656,7 @@ void ttm_mem_global_free_page(struct ttm_mem_global *glob, struct page *page,
 	struct ttm_mem_zone *zone = NULL;
 
 #ifdef CONFIG_HIGHMEM
-	if (PageHighMem(page) && glob->zone_highmem != NULL)
+	if (PageHighMemZone(page) && glob->zone_highmem != NULL)
 		zone = glob->zone_highmem;
 #else
 	if (glob->zone_dma32 && page_to_pfn(page) > 0x00100000UL)
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c
index b40a467..847fabe 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
@@ -530,7 +530,7 @@ static int ttm_alloc_new_pages(struct list_head *pages, gfp_t gfp_flags,
 		/* gfp flags of highmem page should never be dma32 so we
 		 * we should be fine in such case
 		 */
-		if (PageHighMem(p))
+		if (PageHighMemZone(p))
 			continue;
 
 #endif
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
index faefaae..338b2a2 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
@@ -747,7 +747,7 @@ static int ttm_dma_pool_alloc_new_pages(struct dma_pool *pool,
 		/* gfp flags of highmem page should never be dma32 so we
 		 * we should be fine in such case
 		 */
-		if (PageHighMem(p))
+		if (PageHighMemZone(p))
 			continue;
 #endif
 
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 2ec448e..6e094dd 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -119,7 +119,7 @@ static int ttm_tt_set_page_caching(struct page *p,
 {
 	int ret = 0;
 
-	if (PageHighMem(p))
+	if (PageHighMemZone(p))
 		return 0;
 
 	if (c_old != tt_cached) {
-- 
2.7.4




* [PATCH v2 03/10] kexec: separate PageHighMem() and PageHighMemZone() use case
  2020-04-29  3:26 [PATCH v2 00/10] change the implementation of the PageHighMem() js1304
  2020-04-29  3:26 ` [PATCH v2 01/10] mm/page-flags: introduce PageHighMemZone() js1304
  2020-04-29  3:26 ` [PATCH v2 02/10] drm/ttm: separate PageHighMem() and PageHighMemZone() use case js1304
@ 2020-04-29  3:26 ` js1304
  2020-05-01 14:03   ` Eric W. Biederman
  2020-04-29  3:26 ` [PATCH v2 04/10] power: " js1304
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: js1304 @ 2020-04-29  3:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Vlastimil Babka, Laura Abbott,
	Aneesh Kumar K . V, Mel Gorman, Michal Hocko, Johannes Weiner,
	Roman Gushchin, Minchan Kim, Rik van Riel, Christian Koenig,
	Huang Rui, Eric Biederman, Rafael J . Wysocki, Pavel Machek,
	kernel-team, Christoph Hellwig, Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Until now, PageHighMem() has been used for two different cases. One is
to check whether there is a direct mapping for a page or not. The other
is to check the zone of a page, that is, whether it is a highmem type
zone or not.

Now, we have separate macros, PageHighMem() and PageHighMemZone(), for
each case. Use the appropriate one.

Note that there are some rules to determine the proper macro.

1. If PageHighMem() is called to check whether the direct mapping
exists or not, use PageHighMem().
2. If PageHighMem() is used to predict the gfp_flags this page was
previously allocated with, use PageHighMemZone(). The zone of the page
is determined by those gfp_flags.
3. If the purpose of calling PageHighMem() is to count highmem pages
and to interact with the system using this count, use
PageHighMemZone(). Such a counter is usually used to calculate the
memory available for a kernel allocation, and pages in a highmem zone
are not available for a kernel allocation.
4. Otherwise, use PageHighMemZone(). It's safe since its implementation
is just a copy of the previous PageHighMem() implementation and won't
be changed.

I apply rule #2 in this patch.

Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 kernel/kexec_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index ba1d91e..33097b7 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -766,7 +766,7 @@ static struct page *kimage_alloc_page(struct kimage *image,
 			 * gfp_flags honor the ones passed in.
 			 */
 			if (!(gfp_mask & __GFP_HIGHMEM) &&
-			    PageHighMem(old_page)) {
+			    PageHighMemZone(old_page)) {
 				kimage_free_pages(old_page);
 				continue;
 			}
-- 
2.7.4




* [PATCH v2 04/10] power: separate PageHighMem() and PageHighMemZone() use case
  2020-04-29  3:26 [PATCH v2 00/10] change the implementation of the PageHighMem() js1304
                   ` (2 preceding siblings ...)
  2020-04-29  3:26 ` [PATCH v2 03/10] kexec: " js1304
@ 2020-04-29  3:26 ` js1304
  2020-05-01 12:22   ` Christoph Hellwig
  2020-04-29  3:26 ` [PATCH v2 05/10] mm/gup: " js1304
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: js1304 @ 2020-04-29  3:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Vlastimil Babka, Laura Abbott,
	Aneesh Kumar K . V, Mel Gorman, Michal Hocko, Johannes Weiner,
	Roman Gushchin, Minchan Kim, Rik van Riel, Christian Koenig,
	Huang Rui, Eric Biederman, Rafael J . Wysocki, Pavel Machek,
	kernel-team, Christoph Hellwig, Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Until now, PageHighMem() has been used for two different cases. One is
to check whether there is a direct mapping for a page or not. The other
is to check the zone of a page, that is, whether it is a highmem type
zone or not.

Now, we have separate macros, PageHighMem() and PageHighMemZone(), for
each case. Use the appropriate one.

Note that there are some rules to determine the proper macro.

1. If PageHighMem() is called to check whether the direct mapping
exists or not, use PageHighMem().
2. If PageHighMem() is used to predict the gfp_flags this page was
previously allocated with, use PageHighMemZone(). The zone of the page
is determined by those gfp_flags.
3. If the purpose of calling PageHighMem() is to count highmem pages
and to interact with the system using this count, use
PageHighMemZone(). Such a counter is usually used to calculate the
memory available for a kernel allocation, and pages in a highmem zone
are not available for a kernel allocation.
4. Otherwise, use PageHighMemZone(). It's safe since its implementation
is just a copy of the previous PageHighMem() implementation and won't
be changed.

I apply rule #3 in this patch.

Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 kernel/power/snapshot.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 6598001..be759a6 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1227,7 +1227,7 @@ static struct page *saveable_highmem_page(struct zone *zone, unsigned long pfn)
 	if (!page || page_zone(page) != zone)
 		return NULL;
 
-	BUG_ON(!PageHighMem(page));
+	BUG_ON(!PageHighMemZone(page));
 
 	if (swsusp_page_is_forbidden(page) ||  swsusp_page_is_free(page))
 		return NULL;
@@ -1291,7 +1291,7 @@ static struct page *saveable_page(struct zone *zone, unsigned long pfn)
 	if (!page || page_zone(page) != zone)
 		return NULL;
 
-	BUG_ON(PageHighMem(page));
+	BUG_ON(PageHighMemZone(page));
 
 	if (swsusp_page_is_forbidden(page) || swsusp_page_is_free(page))
 		return NULL;
@@ -1529,7 +1529,7 @@ static unsigned long preallocate_image_pages(unsigned long nr_pages, gfp_t mask)
 		if (!page)
 			break;
 		memory_bm_set_bit(&copy_bm, page_to_pfn(page));
-		if (PageHighMem(page))
+		if (PageHighMemZone(page))
 			alloc_highmem++;
 		else
 			alloc_normal++;
@@ -1625,7 +1625,7 @@ static unsigned long free_unnecessary_pages(void)
 		unsigned long pfn = memory_bm_next_pfn(&copy_bm);
 		struct page *page = pfn_to_page(pfn);
 
-		if (PageHighMem(page)) {
+		if (PageHighMemZone(page)) {
 			if (!to_free_highmem)
 				continue;
 			to_free_highmem--;
@@ -2264,7 +2264,7 @@ static unsigned int count_highmem_image_pages(struct memory_bitmap *bm)
 	memory_bm_position_reset(bm);
 	pfn = memory_bm_next_pfn(bm);
 	while (pfn != BM_END_OF_MAP) {
-		if (PageHighMem(pfn_to_page(pfn)))
+		if (PageHighMemZone(pfn_to_page(pfn)))
 			cnt++;
 
 		pfn = memory_bm_next_pfn(bm);
@@ -2541,7 +2541,7 @@ static void *get_buffer(struct memory_bitmap *bm, struct chain_allocator *ca)
 		return ERR_PTR(-EFAULT);
 
 	page = pfn_to_page(pfn);
-	if (PageHighMem(page))
+	if (PageHighMemZone(page))
 		return get_highmem_page_buffer(page, ca);
 
 	if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page))
-- 
2.7.4




* [PATCH v2 05/10] mm/gup: separate PageHighMem() and PageHighMemZone() use case
  2020-04-29  3:26 [PATCH v2 00/10] change the implementation of the PageHighMem() js1304
                   ` (3 preceding siblings ...)
  2020-04-29  3:26 ` [PATCH v2 04/10] power: " js1304
@ 2020-04-29  3:26 ` js1304
  2020-05-01 12:24   ` Christoph Hellwig
  2020-04-29  3:26 ` [PATCH v2 06/10] mm/hugetlb: " js1304
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: js1304 @ 2020-04-29  3:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Vlastimil Babka, Laura Abbott,
	Aneesh Kumar K . V, Mel Gorman, Michal Hocko, Johannes Weiner,
	Roman Gushchin, Minchan Kim, Rik van Riel, Christian Koenig,
	Huang Rui, Eric Biederman, Rafael J . Wysocki, Pavel Machek,
	kernel-team, Christoph Hellwig, Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Until now, PageHighMem() has been used for two different cases. One is
to check whether there is a direct mapping for a page or not. The other
is to check the zone of a page, that is, whether it is a highmem type
zone or not.

Now, we have separate macros, PageHighMem() and PageHighMemZone(), for
each case. Use the appropriate one.

Note that there are some rules to determine the proper macro.

1. If PageHighMem() is called to check whether the direct mapping
exists or not, use PageHighMem().
2. If PageHighMem() is used to predict the gfp_flags this page was
previously allocated with, use PageHighMemZone(). The zone of the page
is determined by those gfp_flags.
3. If the purpose of calling PageHighMem() is to count highmem pages
and to interact with the system using this count, use
PageHighMemZone(). Such a counter is usually used to calculate the
memory available for a kernel allocation, and pages in a highmem zone
are not available for a kernel allocation.
4. Otherwise, use PageHighMemZone(). It's safe since its implementation
is just a copy of the previous PageHighMem() implementation and won't
be changed.

I apply rule #2 in this patch.

Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/gup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/gup.c b/mm/gup.c
index 11fda53..9652eed 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1608,7 +1608,7 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private)
 	 */
 	gfp_t gfp_mask = GFP_USER | __GFP_NOWARN;
 
-	if (PageHighMem(page))
+	if (PageHighMemZone(page))
 		gfp_mask |= __GFP_HIGHMEM;
 
 #ifdef CONFIG_HUGETLB_PAGE
-- 
2.7.4




* [PATCH v2 06/10] mm/hugetlb: separate PageHighMem() and PageHighMemZone() use case
  2020-04-29  3:26 [PATCH v2 00/10] change the implementation of the PageHighMem() js1304
                   ` (4 preceding siblings ...)
  2020-04-29  3:26 ` [PATCH v2 05/10] mm/gup: " js1304
@ 2020-04-29  3:26 ` js1304
  2020-05-01 12:26   ` Christoph Hellwig
  2020-04-29  3:26 ` [PATCH v2 07/10] mm: " js1304
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: js1304 @ 2020-04-29  3:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Vlastimil Babka, Laura Abbott,
	Aneesh Kumar K . V, Mel Gorman, Michal Hocko, Johannes Weiner,
	Roman Gushchin, Minchan Kim, Rik van Riel, Christian Koenig,
	Huang Rui, Eric Biederman, Rafael J . Wysocki, Pavel Machek,
	kernel-team, Christoph Hellwig, Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Until now, PageHighMem() has been used for two different cases. One is
to check whether there is a direct mapping for a page or not. The other
is to check the zone of a page, that is, whether it is a highmem type
zone or not.

Now, we have separate macros, PageHighMem() and PageHighMemZone(), for
each case. Use the appropriate one.

Note that there are some rules to determine the proper macro.

1. If PageHighMem() is called to check whether the direct mapping
exists or not, use PageHighMem().
2. If PageHighMem() is used to predict the gfp_flags this page was
previously allocated with, use PageHighMemZone(). The zone of the page
is determined by those gfp_flags.
3. If the purpose of calling PageHighMem() is to count highmem pages
and to interact with the system using this count, use
PageHighMemZone(). Such a counter is usually used to calculate the
memory available for a kernel allocation, and pages in a highmem zone
are not available for a kernel allocation.
4. Otherwise, use PageHighMemZone(). It's safe since its implementation
is just a copy of the previous PageHighMem() implementation and won't
be changed.

I apply rule #3 in this patch.

Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/hugetlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5548e88..56c9143 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2639,7 +2639,7 @@ static void try_to_free_low(struct hstate *h, unsigned long count,
 		list_for_each_entry_safe(page, next, freel, lru) {
 			if (count >= h->nr_huge_pages)
 				return;
-			if (PageHighMem(page))
+			if (PageHighMemZone(page))
 				continue;
 			list_del(&page->lru);
 			update_and_free_page(h, page);
-- 
2.7.4




* [PATCH v2 07/10] mm: separate PageHighMem() and PageHighMemZone() use case
  2020-04-29  3:26 [PATCH v2 00/10] change the implementation of the PageHighMem() js1304
                   ` (5 preceding siblings ...)
  2020-04-29  3:26 ` [PATCH v2 06/10] mm/hugetlb: " js1304
@ 2020-04-29  3:26 ` js1304
  2020-05-01 12:30   ` Christoph Hellwig
  2020-04-29  3:26 ` [PATCH v2 08/10] mm/page_alloc: correct the use of is_highmem_idx() js1304
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: js1304 @ 2020-04-29  3:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Vlastimil Babka, Laura Abbott,
	Aneesh Kumar K . V, Mel Gorman, Michal Hocko, Johannes Weiner,
	Roman Gushchin, Minchan Kim, Rik van Riel, Christian Koenig,
	Huang Rui, Eric Biederman, Rafael J . Wysocki, Pavel Machek,
	kernel-team, Christoph Hellwig, Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Until now, PageHighMem() has been used for two different cases. One is
to check whether there is a direct mapping for a page or not. The other
is to check the zone of a page, that is, whether it is a highmem type
zone or not.

Now, we have separate macros, PageHighMem() and PageHighMemZone(), for
each case. Use the appropriate one.

Note that there are some rules to determine the proper macro.

1. If PageHighMem() is called to check whether the direct mapping
exists or not, use PageHighMem().
2. If PageHighMem() is used to predict the gfp_flags this page was
previously allocated with, use PageHighMemZone(). The zone of the page
is determined by those gfp_flags.
3. If the purpose of calling PageHighMem() is to count highmem pages
and to interact with the system using this count, use
PageHighMemZone(). Such a counter is usually used to calculate the
memory available for a kernel allocation, and pages in a highmem zone
are not available for a kernel allocation.
4. Otherwise, use PageHighMemZone(). It's safe since its implementation
is just a copy of the previous PageHighMem() implementation and won't
be changed.

I apply rule #3 in this patch.

Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/memory_hotplug.c | 2 +-
 mm/page_alloc.c     | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 555137b..891c214 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -593,7 +593,7 @@ void generic_online_page(struct page *page, unsigned int order)
 	__free_pages_core(page, order);
 	totalram_pages_add(1UL << order);
 #ifdef CONFIG_HIGHMEM
-	if (PageHighMem(page))
+	if (PageHighMemZone(page))
 		totalhigh_pages_add(1UL << order);
 #endif
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fc5919e..7fe5115 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7444,7 +7444,7 @@ void adjust_managed_page_count(struct page *page, long count)
 	atomic_long_add(count, &page_zone(page)->managed_pages);
 	totalram_pages_add(count);
 #ifdef CONFIG_HIGHMEM
-	if (PageHighMem(page))
+	if (PageHighMemZone(page))
 		totalhigh_pages_add(count);
 #endif
 }
-- 
2.7.4




* [PATCH v2 08/10] mm/page_alloc: correct the use of is_highmem_idx()
  2020-04-29  3:26 [PATCH v2 00/10] change the implementation of the PageHighMem() js1304
                   ` (6 preceding siblings ...)
  2020-04-29  3:26 ` [PATCH v2 07/10] mm: " js1304
@ 2020-04-29  3:26 ` js1304
  2020-04-29  3:26 ` [PATCH v2 09/10] mm/migrate: replace PageHighMem() with open-code js1304
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 33+ messages in thread
From: js1304 @ 2020-04-29  3:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Vlastimil Babka, Laura Abbott,
	Aneesh Kumar K . V, Mel Gorman, Michal Hocko, Johannes Weiner,
	Roman Gushchin, Minchan Kim, Rik van Riel, Christian Koenig,
	Huang Rui, Eric Biederman, Rafael J . Wysocki, Pavel Machek,
	kernel-team, Christoph Hellwig, Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

What we'd like to check here is whether the page has a direct mapping
or not. Use PageHighMem() since it matches this purpose exactly.

Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7fe5115..da473c7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1399,7 +1399,7 @@ static void __meminit __init_single_page(struct page *page, unsigned long pfn,
 	INIT_LIST_HEAD(&page->lru);
 #ifdef WANT_PAGE_VIRTUAL
 	/* The shift won't overflow because ZONE_NORMAL is below 4G. */
-	if (!is_highmem_idx(zone))
+	if (!PageHighMem(page))
 		set_page_address(page, __va(pfn << PAGE_SHIFT));
 #endif
 }
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 09/10] mm/migrate: replace PageHighMem() with open-code
  2020-04-29  3:26 [PATCH v2 00/10] change the implementation of the PageHighMem() js1304
                   ` (7 preceding siblings ...)
  2020-04-29  3:26 ` [PATCH v2 08/10] mm/page_alloc: correct the use of is_highmem_idx() js1304
@ 2020-04-29  3:26 ` js1304
  2020-04-29  3:26 ` [PATCH v2 10/10] mm/page-flags: change the implementation of the PageHighMem() js1304
  2020-04-30  1:47 ` [PATCH v2 00/10] " Andrew Morton
  10 siblings, 0 replies; 33+ messages in thread
From: js1304 @ 2020-04-29  3:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Vlastimil Babka, Laura Abbott,
	Aneesh Kumar K . V, Mel Gorman, Michal Hocko, Johannes Weiner,
	Roman Gushchin, Minchan Kim, Rik van Riel, Christian Koenig,
	Huang Rui, Eric Biederman, Rafael J . Wysocki, Pavel Machek,
	kernel-team, Christoph Hellwig, Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

The implementation of PageHighMem() will be changed in the following
patches. Before that, open-code the check to avoid any side effect of
the implementation change on PageHighMem().

Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 include/linux/migrate.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 3e546cb..a9cfd8e 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -37,6 +37,7 @@ static inline struct page *new_page_nodemask(struct page *page,
 	gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL;
 	unsigned int order = 0;
 	struct page *new_page = NULL;
+	int zidx;
 
 	if (PageHuge(page))
 		return alloc_huge_page_nodemask(page_hstate(compound_head(page)),
@@ -47,7 +48,8 @@ static inline struct page *new_page_nodemask(struct page *page,
 		order = HPAGE_PMD_ORDER;
 	}
 
-	if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
+	zidx = zone_idx(page_zone(page));
+	if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE)
 		gfp_mask |= __GFP_HIGHMEM;
 
 	new_page = __alloc_pages_nodemask(gfp_mask, order,
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 10/10] mm/page-flags: change the implementation of the PageHighMem()
  2020-04-29  3:26 [PATCH v2 00/10] change the implementation of the PageHighMem() js1304
                   ` (8 preceding siblings ...)
  2020-04-29  3:26 ` [PATCH v2 09/10] mm/migrate: replace PageHighMem() with open-code js1304
@ 2020-04-29  3:26 ` js1304
  2020-04-30  1:47 ` [PATCH v2 00/10] " Andrew Morton
  10 siblings, 0 replies; 33+ messages in thread
From: js1304 @ 2020-04-29  3:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Vlastimil Babka, Laura Abbott,
	Aneesh Kumar K . V, Mel Gorman, Michal Hocko, Johannes Weiner,
	Roman Gushchin, Minchan Kim, Rik van Riel, Christian Koenig,
	Huang Rui, Eric Biederman, Rafael J . Wysocki, Pavel Machek,
	kernel-team, Christoph Hellwig, Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Until now, PageHighMem() has been used for two different cases. One is
to check if there is a direct mapping for this page or not. The other is
to check the zone of this page, that is, whether it is a highmem type
zone or not.

Previous patches introduced the PageHighMemZone() macro and separated
both cases strictly. So, now, PageHighMem() is used just for checking
whether there is a direct mapping for this page or not.

In the following patchset, ZONE_MOVABLE, which could be considered a
highmem type zone in some configurations, could have both types of
pages: direct mapped pages and unmapped pages. So the current
implementation of PageHighMem(), which checks the zone rather than the
page itself in order to check whether a direct mapping exists, would
become invalid. This patch prepares for that case by implementing
PageHighMem() in terms of max_low_pfn.

Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 include/linux/page-flags.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index fca0cce..7ac5fc8 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -375,6 +375,8 @@ PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
 	TESTCLEARFLAG(Readahead, reclaim, PF_NO_COMPOUND)
 
 #ifdef CONFIG_HIGHMEM
+extern unsigned long max_low_pfn;
+
 /*
  * Must use a macro here due to header dependency issues. page_zone() is not
  * available at this point.
@@ -383,7 +385,7 @@ PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
  * in order to predict previous gfp_flags or to count something for system
  * memory management.
  */
-#define PageHighMem(__p) is_highmem_idx(page_zonenum(__p))
+#define PageHighMem(__p) (page_to_pfn(__p) >= max_low_pfn)
 #define PageHighMemZone(__p) is_highmem_idx(page_zonenum(__p))
 #else
 PAGEFLAG_FALSE(HighMem)
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 00/10] change the implementation of the PageHighMem()
  2020-04-29  3:26 [PATCH v2 00/10] change the implementation of the PageHighMem() js1304
                   ` (9 preceding siblings ...)
  2020-04-29  3:26 ` [PATCH v2 10/10] mm/page-flags: change the implementation of the PageHighMem() js1304
@ 2020-04-30  1:47 ` Andrew Morton
  2020-05-01 10:52   ` Joonsoo Kim
  10 siblings, 1 reply; 33+ messages in thread
From: Andrew Morton @ 2020-04-30  1:47 UTC (permalink / raw)
  To: js1304
  Cc: linux-mm, linux-kernel, Vlastimil Babka, Laura Abbott,
	Aneesh Kumar K . V, Mel Gorman, Michal Hocko, Johannes Weiner,
	Roman Gushchin, Minchan Kim, Rik van Riel, Christian Koenig,
	Huang Rui, Eric Biederman, Rafael J . Wysocki, Pavel Machek,
	kernel-team, Christoph Hellwig, Joonsoo Kim

On Wed, 29 Apr 2020 12:26:33 +0900 js1304@gmail.com wrote:

> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> Changes on v2
> - add "acked-by", "reviewed-by" tags
> - replace PageHighMem() with use open-code, instead of using
> new PageHighMemZone() macro. Related file is "include/linux/migrate.h"
> 
> Hello,
> 
> This patchset separates two use cases of PageHighMem() by introducing
> PageHighMemZone() macro. And, it changes the implementation of
> PageHighMem() to reflect the actual meaning of this macro. This patchset
> is a preparation step for the patchset,
> "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE" [1].
> 
> PageHighMem() is used for two different cases. One is to check if there
> is a direct mapping for this page or not. The other is to check the
> zone of this page, that is, weather it is the highmem type zone or not.
> 
> Until now, both the cases are the perfectly same thing. So, implementation
> of the PageHighMem() uses the one case that checks if the zone of the page
> is the highmem type zone or not.
> 
> "#define PageHighMem(__p) is_highmem_idx(page_zonenum(__p))"
> 
> ZONE_MOVABLE is special. It is considered as normal type zone on
> !CONFIG_HIGHMEM, but, it is considered as highmem type zone
> on CONFIG_HIGHMEM. Let's focus on later case. In later case, all pages
> on the ZONE_MOVABLE has no direct mapping until now.
> 
> However, following patchset
> "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE"
> , which is once merged and reverted, will be tried again and will break
> this assumption that all pages on the ZONE_MOVABLE has no direct mapping.
> Hence, the ZONE_MOVABLE which is considered as highmem type zone could
> have the both types of pages, direct mapped and not. Since
> the ZONE_MOVABLE could have both type of pages, __GFP_HIGHMEM is still
> required to allocate the memory from it. And, we conservatively need to
> consider the ZONE_MOVABLE as highmem type zone.
> 
> Even in this situation, PageHighMem() for the pages on the ZONE_MOVABLE
> when it is called for checking the direct mapping should return correct
> result. Current implementation of PageHighMem() just returns TRUE
> if the zone of the page is on a highmem type zone. So, it could be wrong
> if the page on the MOVABLE_ZONE is actually direct mapped.
> 
> To solve this potential problem, this patch introduces a new
> PageHighMemZone() macro. In following patches, two use cases of
> PageHighMem() are separated by calling proper macro, PageHighMem() and
> PageHighMemZone(). Then, implementation of PageHighMem() will be changed
> as just checking if the direct mapping exists or not, regardless of
> the zone of the page.
> 
> Note that there are some rules to determine the proper macro.
> 
> 1. If PageHighMem() is called for checking if the direct mapping exists
> or not, use PageHighMem().
> 2. If PageHighMem() is used to predict the previous gfp_flags for
> this page, use PageHighMemZone(). The zone of the page is related to
> the gfp_flags.
> 3. If purpose of calling PageHighMem() is to count highmem page and
> to interact with the system by using this count, use PageHighMemZone().
> This counter is usually used to calculate the available memory for an
> kernel allocation and pages on the highmem zone cannot be available
> for an kernel allocation.
> 4. Otherwise, use PageHighMemZone(). It's safe since it's implementation
> is just copy of the previous PageHighMem() implementation and won't
> be changed.

hm, this won't improve maintainability :(

- Everyone will need to remember when to use PageHighMem() and when
  to use PageHighMemZone().  If they get it wrong, they're unlikely to
  notice any problem in their runtime testing, correct?

- New code will pop up which gets it wrong and nobody will notice for
  a long time.

So I guess we need to be pretty confident that the series "mm/cma:
manage the memory of the CMA area by using the ZONE_MOVABLE" will be
useful and merged before proceeding with this, yes?

On the other hand, this whole series is a no-op until [10/10]
(correct?) so it can be effectively reverted with a single line change,
with later cleanups which revert the other 9 patches.

So I think I'd like to take another look at "mm/cma: manage the memory
of the CMA area by using the ZONE_MOVABLE" before figuring out what to
do here.  Mainly to answer the question "is the new feature valuable
enough to justify the maintainability impact".  So please do take some
care in explaining the end-user benefit when preparing the new version
of that patchset.



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 00/10] change the implementation of the PageHighMem()
  2020-04-30  1:47 ` [PATCH v2 00/10] " Andrew Morton
@ 2020-05-01 10:52   ` Joonsoo Kim
  2020-05-01 10:55     ` Christoph Hellwig
  0 siblings, 1 reply; 33+ messages in thread
From: Joonsoo Kim @ 2020-05-01 10:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux Memory Management List, LKML, Vlastimil Babka,
	Laura Abbott, Aneesh Kumar K . V, Mel Gorman, Michal Hocko,
	Johannes Weiner, Roman Gushchin, Minchan Kim, Rik van Riel,
	Christian Koenig, Huang Rui, Eric Biederman, Rafael J . Wysocki,
	Pavel Machek, kernel-team, Christoph Hellwig, Joonsoo Kim

On Thu, Apr 30, 2020 at 10:47 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Wed, 29 Apr 2020 12:26:33 +0900 js1304@gmail.com wrote:
>
> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >
> > Changes on v2
> > - add "acked-by", "reviewed-by" tags
> > - replace PageHighMem() with use open-code, instead of using
> > new PageHighMemZone() macro. Related file is "include/linux/migrate.h"
> >
> > Hello,
> >
> > This patchset separates two use cases of PageHighMem() by introducing
> > PageHighMemZone() macro. And, it changes the implementation of
> > PageHighMem() to reflect the actual meaning of this macro. This patchset
> > is a preparation step for the patchset,
> > "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE" [1].
> >
> > PageHighMem() is used for two different cases. One is to check if there
> > is a direct mapping for this page or not. The other is to check the
> > zone of this page, that is, weather it is the highmem type zone or not.
> >
> > Until now, both the cases are the perfectly same thing. So, implementation
> > of the PageHighMem() uses the one case that checks if the zone of the page
> > is the highmem type zone or not.
> >
> > "#define PageHighMem(__p) is_highmem_idx(page_zonenum(__p))"
> >
> > ZONE_MOVABLE is special. It is considered as normal type zone on
> > !CONFIG_HIGHMEM, but, it is considered as highmem type zone
> > on CONFIG_HIGHMEM. Let's focus on later case. In later case, all pages
> > on the ZONE_MOVABLE has no direct mapping until now.
> >
> > However, following patchset
> > "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE"
> > , which is once merged and reverted, will be tried again and will break
> > this assumption that all pages on the ZONE_MOVABLE has no direct mapping.
> > Hence, the ZONE_MOVABLE which is considered as highmem type zone could
> > have the both types of pages, direct mapped and not. Since
> > the ZONE_MOVABLE could have both type of pages, __GFP_HIGHMEM is still
> > required to allocate the memory from it. And, we conservatively need to
> > consider the ZONE_MOVABLE as highmem type zone.
> >
> > Even in this situation, PageHighMem() for the pages on the ZONE_MOVABLE
> > when it is called for checking the direct mapping should return correct
> > result. Current implementation of PageHighMem() just returns TRUE
> > if the zone of the page is on a highmem type zone. So, it could be wrong
> > if the page on the MOVABLE_ZONE is actually direct mapped.
> >
> > To solve this potential problem, this patch introduces a new
> > PageHighMemZone() macro. In following patches, two use cases of
> > PageHighMem() are separated by calling proper macro, PageHighMem() and
> > PageHighMemZone(). Then, implementation of PageHighMem() will be changed
> > as just checking if the direct mapping exists or not, regardless of
> > the zone of the page.
> >
> > Note that there are some rules to determine the proper macro.
> >
> > 1. If PageHighMem() is called for checking if the direct mapping exists
> > or not, use PageHighMem().
> > 2. If PageHighMem() is used to predict the previous gfp_flags for
> > this page, use PageHighMemZone(). The zone of the page is related to
> > the gfp_flags.
> > 3. If purpose of calling PageHighMem() is to count highmem page and
> > to interact with the system by using this count, use PageHighMemZone().
> > This counter is usually used to calculate the available memory for an
> > kernel allocation and pages on the highmem zone cannot be available
> > for an kernel allocation.
> > 4. Otherwise, use PageHighMemZone(). It's safe since it's implementation
> > is just copy of the previous PageHighMem() implementation and won't
> > be changed.
>
> hm, this won't improve maintainability :(
>
> - Everyone will need to remember when to use PageHighMem() and when
>   to use PageHighMemZone().  If they get it wrong, they're unlikely to
>   notice any problem in their runtime testing, correct?
>
> - New code will pop up which gets it wrong and nobody will notice for
>   a long time.

Hmm... I think that it's not that hard to decide on the correct macro.
If we rename PageHighMem() to PageDirectMapped(), then the two macros,
PageDirectMapped() and PageHighMemZone(), are self-explanatory. There
would be no confusion about which one to use.

> So I guess we need to be pretty confident that the series "mm/cma:
> manage the memory of the CMA area by using the ZONE_MOVABLE" will be
> useful and merged before proceeding with this, yes?

Yes, and my assumption is that we (MM) have agreed on the usefulness of
the CMA series.

> On the other hand, this whole series is a no-op until [10/10]
> (correct?) so it can be effectively reverted with a single line change,

Correct!

> with later cleanups which revert the other 9 patches.
>
> So I think I'd like to take another look at "mm/cma: manage the memory
> of the CMA area by using the ZONE_MOVABLE" before figuring out what to
> do here.  Mainly to answer the question "is the new feature valuable
> enough to justify the maintainability impact".  So please do take some
> care in explaining the end-user benefit when preparing the new version
> of that patchset.

So, do you mean to send a new version of the CMA patchset with more
explanation before merging this patchset? If yes, I can do that. But I'm
not sure it's worth doing. The problems with CMA are still not solved,
although the utilization problem will be partially addressed by Roman's
"mm,page_alloc,cma: conditionally prefer cma pageblocks for movable
allocations" patch in this (v5.7) release. The rationale on which we
agreed for the CMA patchset still stands.

Anyway, if that is what you mean, I will send the CMA patchset with more
explanation.

Thanks.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 00/10] change the implementation of the PageHighMem()
  2020-05-01 10:52   ` Joonsoo Kim
@ 2020-05-01 10:55     ` Christoph Hellwig
  2020-05-01 12:15       ` Joonsoo Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2020-05-01 10:55 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Linux Memory Management List, LKML,
	Vlastimil Babka, Laura Abbott, Aneesh Kumar K . V, Mel Gorman,
	Michal Hocko, Johannes Weiner, Roman Gushchin, Minchan Kim,
	Rik van Riel, Christian Koenig, Huang Rui, Eric Biederman,
	Rafael J . Wysocki, Pavel Machek, kernel-team, Christoph Hellwig,
	Joonsoo Kim

On Fri, May 01, 2020 at 07:52:35PM +0900, Joonsoo Kim wrote:
> > - New code will pop up which gets it wrong and nobody will notice for
> >   a long time.
> 
> Hmm... I think that it's not that hard to decide correct macro. If we rename
> PageHighMem() with PageDirectMapped(), they, PageDirectMapped() and
> PageHighMemZone(), are self-explanation macro. There would be no
> confusion to use.

What confuses me is why we even need PageHighMemZone - mostly code
should not care about particular zones.  Maybe just open coding
PageHighMemZone makes more sense - it is a little more cumbersome, but
at least it makes explicit what we check for.  I already sent you
an incremental diff for one obvious place, but maybe we need to look
through the remaining ones to see if we can kill them or open code
them in an obvious way.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 00/10] change the implementation of the PageHighMem()
  2020-05-01 10:55     ` Christoph Hellwig
@ 2020-05-01 12:15       ` Joonsoo Kim
  2020-05-01 12:34         ` Christoph Hellwig
  0 siblings, 1 reply; 33+ messages in thread
From: Joonsoo Kim @ 2020-05-01 12:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Linux Memory Management List, LKML,
	Vlastimil Babka, Laura Abbott, Aneesh Kumar K . V, Mel Gorman,
	Michal Hocko, Johannes Weiner, Roman Gushchin, Minchan Kim,
	Rik van Riel, Christian Koenig, Huang Rui, Eric Biederman,
	Rafael J . Wysocki, Pavel Machek, kernel-team, Joonsoo Kim

On Fri, May 1, 2020 at 7:55 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Fri, May 01, 2020 at 07:52:35PM +0900, Joonsoo Kim wrote:
> > > - New code will pop up which gets it wrong and nobody will notice for
> > >   a long time.
> >
> > Hmm... I think that it's not that hard to decide correct macro. If we rename
> > PageHighMem() with PageDirectMapped(), they, PageDirectMapped() and
> > PageHighMemZone(), are self-explanation macro. There would be no
> > confusion to use.
>
> What confuses me is why we even need PageHighMemZone - mostly code
> should not care about particular zones.  Maybe just open coding
> PageHighMemZone makes more sense - it is a little more cumersome, but
> at least it makes it explicit what we check for.  I already sent you
> an incremental diff for one obvious place, but maybe we need to look
> through the remaining ones if we can kill them or open code them in an
> obvious way.

I think that PageHighMemZone() is long and complicated enough to have
a macro.

PageHighMemZone(page) = is_highmem_idx(zone_idx(page_zone(page)))

Instead of open-coding, how about changing the style of the macro to
something like page_from_highmem()? What PageHighMemZone() represents is
a derived attribute of the page, so the PageXXX() style may not be
appropriate.

Thanks.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 04/10] power: separate PageHighMem() and PageHighMemZone() use case
  2020-04-29  3:26 ` [PATCH v2 04/10] power: " js1304
@ 2020-05-01 12:22   ` Christoph Hellwig
  2020-05-04  3:01     ` Joonsoo Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2020-05-01 12:22 UTC (permalink / raw)
  To: js1304
  Cc: Andrew Morton, linux-mm, linux-kernel, Vlastimil Babka,
	Laura Abbott, Aneesh Kumar K . V, Mel Gorman, Michal Hocko,
	Johannes Weiner, Roman Gushchin, Minchan Kim, Rik van Riel,
	Christian Koenig, Huang Rui, Eric Biederman, Rafael J . Wysocki,
	Pavel Machek, kernel-team, Christoph Hellwig, Joonsoo Kim

On Wed, Apr 29, 2020 at 12:26:37PM +0900, js1304@gmail.com wrote:
> index 6598001..be759a6 100644
> --- a/kernel/power/snapshot.c
> +++ b/kernel/power/snapshot.c
> @@ -1227,7 +1227,7 @@ static struct page *saveable_highmem_page(struct zone *zone, unsigned long pfn)
>  	if (!page || page_zone(page) != zone)
>  		return NULL;
>  
> -	BUG_ON(!PageHighMem(page));
> +	BUG_ON(!PageHighMemZone(page));

The above check already checks for the highmem zone.  So if we want
to keep the BUG_ON it needs to stay PageHighMem to make sense.  That
being said, I'd rather remove it entirely.

> -	BUG_ON(PageHighMem(page));
> +	BUG_ON(PageHighMemZone(page));

Same here.

> @@ -1529,7 +1529,7 @@ static unsigned long preallocate_image_pages(unsigned long nr_pages, gfp_t mask)
>  		if (!page)
>  			break;
>  		memory_bm_set_bit(&copy_bm, page_to_pfn(page));
> -		if (PageHighMem(page))
> +		if (PageHighMemZone(page))
>  			alloc_highmem++;
>  		else
>  			alloc_normal++;

I don't fully understand the changelog here.  Can Pavel or Rafael
clarify why swsusp would care about the exact zone?


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 05/10] mm/gup: separate PageHighMem() and PageHighMemZone() use case
  2020-04-29  3:26 ` [PATCH v2 05/10] mm/gup: " js1304
@ 2020-05-01 12:24   ` Christoph Hellwig
  2020-05-04  3:02     ` Joonsoo Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2020-05-01 12:24 UTC (permalink / raw)
  To: js1304
  Cc: Andrew Morton, linux-mm, linux-kernel, Vlastimil Babka,
	Laura Abbott, Aneesh Kumar K . V, Mel Gorman, Michal Hocko,
	Johannes Weiner, Roman Gushchin, Minchan Kim, Rik van Riel,
	Christian Koenig, Huang Rui, Eric Biederman, Rafael J . Wysocki,
	Pavel Machek, kernel-team, Christoph Hellwig, Joonsoo Kim

On Wed, Apr 29, 2020 at 12:26:38PM +0900, js1304@gmail.com wrote:
> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> Until now, PageHighMem() is used for two different cases. One is to check
> if there is a direct mapping for this page or not. The other is to check
> the zone of this page, that is, weather it is the highmem type zone or not.
> 
> Now, we have separate functions, PageHighMem() and PageHighMemZone() for
> each cases. Use appropriate one.
> 
> Note that there are some rules to determine the proper macro.
> 
> 1. If PageHighMem() is called for checking if the direct mapping exists
> or not, use PageHighMem().
> 2. If PageHighMem() is used to predict the previous gfp_flags for
> this page, use PageHighMemZone(). The zone of the page is related to
> the gfp_flags.
> 3. If purpose of calling PageHighMem() is to count highmem page and
> to interact with the system by using this count, use PageHighMemZone().
> This counter is usually used to calculate the available memory for an
> kernel allocation and pages on the highmem zone cannot be available
> for an kernel allocation.
> 4. Otherwise, use PageHighMemZone(). It's safe since it's implementation
> is just copy of the previous PageHighMem() implementation and won't
> be changed.
> 
> I apply the rule #2 for this patch.
> 
> Acked-by: Roman Gushchin <guro@fb.com>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  mm/gup.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 11fda53..9652eed 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1608,7 +1608,7 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private)
>  	 */
>  	gfp_t gfp_mask = GFP_USER | __GFP_NOWARN;
>  
> -	if (PageHighMem(page))
> +	if (PageHighMemZone(page))
>  		gfp_mask |= __GFP_HIGHMEM;

I think this wants to stay PageHighMem.  This migrates CMA pages to
other places before doing a long term pin.  Anything that didn't have
a direct mapping before won't need one for the new page, which could
also include non-highmem zones without a highmem mapping.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 06/10] mm/hugetlb: separate PageHighMem() and PageHighMemZone() use case
  2020-04-29  3:26 ` [PATCH v2 06/10] mm/hugetlb: " js1304
@ 2020-05-01 12:26   ` Christoph Hellwig
  2020-05-04  3:03     ` Joonsoo Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2020-05-01 12:26 UTC (permalink / raw)
  To: js1304
  Cc: Andrew Morton, linux-mm, linux-kernel, Vlastimil Babka,
	Laura Abbott, Aneesh Kumar K . V, Mel Gorman, Michal Hocko,
	Johannes Weiner, Roman Gushchin, Minchan Kim, Rik van Riel,
	Christian Koenig, Huang Rui, Eric Biederman, Rafael J . Wysocki,
	Pavel Machek, kernel-team, Christoph Hellwig, Joonsoo Kim

On Wed, Apr 29, 2020 at 12:26:39PM +0900, js1304@gmail.com wrote:
> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> Until now, PageHighMem() is used for two different cases. One is to check
> if there is a direct mapping for this page or not. The other is to check
> the zone of this page, that is, weather it is the highmem type zone or not.
> 
> Now, we have separate functions, PageHighMem() and PageHighMemZone() for
> each cases. Use appropriate one.
> 
> Note that there are some rules to determine the proper macro.
> 
> 1. If PageHighMem() is called for checking if the direct mapping exists
> or not, use PageHighMem().
> 2. If PageHighMem() is used to predict the previous gfp_flags for
> this page, use PageHighMemZone(). The zone of the page is related to
> the gfp_flags.
> 3. If purpose of calling PageHighMem() is to count highmem page and
> to interact with the system by using this count, use PageHighMemZone().
> This counter is usually used to calculate the available memory for an
> kernel allocation and pages on the highmem zone cannot be available
> for an kernel allocation.
> 4. Otherwise, use PageHighMemZone(). It's safe since it's implementation
> is just copy of the previous PageHighMem() implementation and won't
> be changed.
> 
> I apply the rule #3 for this patch.
> 
> Acked-by: Roman Gushchin <guro@fb.com>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Why do we care about the zone here?  This only cares about having
kernel direct mapped pages as far as I can tell.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 07/10] mm: separate PageHighMem() and PageHighMemZone() use case
  2020-04-29  3:26 ` [PATCH v2 07/10] mm: " js1304
@ 2020-05-01 12:30   ` Christoph Hellwig
  2020-05-04  3:08     ` Joonsoo Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2020-05-01 12:30 UTC (permalink / raw)
  To: js1304
  Cc: Andrew Morton, linux-mm, linux-kernel, Vlastimil Babka,
	Laura Abbott, Aneesh Kumar K . V, Mel Gorman, Michal Hocko,
	Johannes Weiner, Roman Gushchin, Minchan Kim, Rik van Riel,
	Christian Koenig, Huang Rui, Eric Biederman, Rafael J . Wysocki,
	Pavel Machek, kernel-team, Christoph Hellwig, Joonsoo Kim

On Wed, Apr 29, 2020 at 12:26:40PM +0900, js1304@gmail.com wrote:
> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> Until now, PageHighMem() is used for two different cases. One is to check
> if there is a direct mapping for this page or not. The other is to check
> the zone of this page, that is, weather it is the highmem type zone or not.
> 
> Now, we have separate functions, PageHighMem() and PageHighMemZone() for
> each cases. Use appropriate one.
> 
> Note that there are some rules to determine the proper macro.
> 
> 1. If PageHighMem() is called for checking if the direct mapping exists
> or not, use PageHighMem().
> 2. If PageHighMem() is used to predict the previous gfp_flags for
> this page, use PageHighMemZone(). The zone of the page is related to
> the gfp_flags.
> 3. If purpose of calling PageHighMem() is to count highmem page and
> to interact with the system by using this count, use PageHighMemZone().
> This counter is usually used to calculate the available memory for an
> kernel allocation and pages on the highmem zone cannot be available
> for an kernel allocation.
> 4. Otherwise, use PageHighMemZone(). It's safe since it's implementation
> is just copy of the previous PageHighMem() implementation and won't
> be changed.
> 
> I apply the rule #3 for this patch.
> 
> Acked-by: Roman Gushchin <guro@fb.com>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  mm/memory_hotplug.c | 2 +-
>  mm/page_alloc.c     | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 555137b..891c214 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -593,7 +593,7 @@ void generic_online_page(struct page *page, unsigned int order)
>  	__free_pages_core(page, order);
>  	totalram_pages_add(1UL << order);
>  #ifdef CONFIG_HIGHMEM
> -	if (PageHighMem(page))
> +	if (PageHighMemZone(page))
>  		totalhigh_pages_add(1UL << order);
>  #endif
>  }
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index fc5919e..7fe5115 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7444,7 +7444,7 @@ void adjust_managed_page_count(struct page *page, long count)
>  	atomic_long_add(count, &page_zone(page)->managed_pages);
>  	totalram_pages_add(count);
>  #ifdef CONFIG_HIGHMEM
> -	if (PageHighMem(page))
> +	if (PageHighMemZone(page))
>  		totalhigh_pages_add(count);
>  #endif

This function already uses the page_zone structure above; I think
life would be easier if you compared against that, as that makes
the code more obvious.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 00/10] change the implementation of the PageHighMem()
  2020-05-01 12:15       ` Joonsoo Kim
@ 2020-05-01 12:34         ` Christoph Hellwig
  2020-05-04  3:09           ` Joonsoo Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2020-05-01 12:34 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Christoph Hellwig, Andrew Morton, Linux Memory Management List,
	LKML, Vlastimil Babka, Laura Abbott, Aneesh Kumar K . V,
	Mel Gorman, Michal Hocko, Johannes Weiner, Roman Gushchin,
	Minchan Kim, Rik van Riel, Christian Koenig, Huang Rui,
	Eric Biederman, Rafael J . Wysocki, Pavel Machek, kernel-team,
	Joonsoo Kim

On Fri, May 01, 2020 at 09:15:30PM +0900, Joonsoo Kim wrote:
> I think that PageHighMemZone() is long and complicated enough to have
> a macro.

It is.  But then again it also shouldn't really be used by anything
but MM internals.

> 
> PageHighMemZone(page) = is_highmem_idx(zone_idx(page_zone(page))
> 
> Instead of open-coding it, how about changing the style of the macro
> to something like page_from_highmem()? What PageHighMemZone() represents
> is a derived attribute of the page, so the PageXXX() style may not be appropriate.

Maybe page_is_highmem_zone() with a big kerneldoc comment explaining
the use case?  Bonus points for killing enough users that it can be
in mm/internal.h.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 03/10] kexec: separate PageHighMem() and PageHighMemZone() use case
  2020-04-29  3:26 ` [PATCH v2 03/10] kexec: " js1304
@ 2020-05-01 14:03   ` Eric W. Biederman
  2020-05-04  3:10     ` Joonsoo Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Eric W. Biederman @ 2020-05-01 14:03 UTC (permalink / raw)
  To: js1304
  Cc: Andrew Morton, linux-mm, linux-kernel, Vlastimil Babka,
	Laura Abbott, Aneesh Kumar K . V, Mel Gorman, Michal Hocko,
	Johannes Weiner, Roman Gushchin, Minchan Kim, Rik van Riel,
	Christian Koenig, Huang Rui, Rafael J . Wysocki, Pavel Machek,
	kernel-team, Christoph Hellwig, Joonsoo Kim

js1304@gmail.com writes:

> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> Until now, PageHighMem() is used for two different cases. One is to check
> if there is a direct mapping for this page or not. The other is to check
> the zone of this page, that is, whether it is a highmem type zone or not.
>
> Now, we have separate functions, PageHighMem() and PageHighMemZone(), for
> each case. Use the appropriate one.
>
> Note that there are some rules to determine the proper macro.
>
> 1. If PageHighMem() is called for checking if the direct mapping exists
> or not, use PageHighMem().
> 2. If PageHighMem() is used to predict the previous gfp_flags for
> this page, use PageHighMemZone(). The zone of the page is related to
> the gfp_flags.
> 3. If the purpose of calling PageHighMem() is to count highmem pages and
> to interact with the system by using this count, use PageHighMemZone().
> This counter is usually used to calculate the available memory for a
> kernel allocation, and pages on the highmem zone cannot be available
> for a kernel allocation.
> 4. Otherwise, use PageHighMemZone(). It's safe since its implementation
> is just a copy of the previous PageHighMem() implementation and won't
> be changed.
>
> I apply rule #2 for this patch.

Hmm.

What happened to the notion of deprecating and reducing the usage of
highmem?  I know that we have some embedded architectures where it is
still important but this feels like it flies in the face of that.


This part of kexec would be much more maintainable if it had a proper
mm layer helper that tested to see if the page matched the passed in
gfp flags.  That way the mm layer could keep changing and doing weird
gyrations and this code would not care.


What would be really helpful is if there was a straightforward way to
allocate memory whose physical address fits in the native word size.


All I know for certain about this patch is that it takes a piece of code
that looked like it made sense, and transforms it into something I can
not easily verify, and can not maintain.

As it makes the code unmaintainable.
Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>


Not to say that the code isn't questionable as it is, but this change just
pushes it over the edge into gobbledygook.

Eric


> Acked-by: Roman Gushchin <guro@fb.com>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  kernel/kexec_core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index ba1d91e..33097b7 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -766,7 +766,7 @@ static struct page *kimage_alloc_page(struct kimage *image,
>  			 * gfp_flags honor the ones passed in.
>  			 */
>  			if (!(gfp_mask & __GFP_HIGHMEM) &&
> -			    PageHighMem(old_page)) {
> +			    PageHighMemZone(old_page)) {
>  				kimage_free_pages(old_page);
>  				continue;
>  			}


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 04/10] power: separate PageHighMem() and PageHighMemZone() use case
  2020-05-01 12:22   ` Christoph Hellwig
@ 2020-05-04  3:01     ` Joonsoo Kim
  0 siblings, 0 replies; 33+ messages in thread
From: Joonsoo Kim @ 2020-05-04  3:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Linux Memory Management List, LKML,
	Vlastimil Babka, Laura Abbott, Aneesh Kumar K . V, Mel Gorman,
	Michal Hocko, Johannes Weiner, Roman Gushchin, Minchan Kim,
	Rik van Riel, Christian Koenig, Huang Rui, Eric Biederman,
	Rafael J . Wysocki, Pavel Machek, kernel-team, Joonsoo Kim

On Fri, May 1, 2020 at 9:22 PM, Christoph Hellwig <hch@infradead.org> wrote:
>
> On Wed, Apr 29, 2020 at 12:26:37PM +0900, js1304@gmail.com wrote:
> > index 6598001..be759a6 100644
> > --- a/kernel/power/snapshot.c
> > +++ b/kernel/power/snapshot.c
> > @@ -1227,7 +1227,7 @@ static struct page *saveable_highmem_page(struct zone *zone, unsigned long pfn)
> >       if (!page || page_zone(page) != zone)
> >               return NULL;
> >
> > -     BUG_ON(!PageHighMem(page));
> > +     BUG_ON(!PageHighMemZone(page));
>
> The above check already checks for the highmem zone.  So if we want
> to keep the BUG_ON it needs to stay PageHighMem to make sense.  That being
> said I'd rather remove it entirely.

Okay.

> > -     BUG_ON(PageHighMem(page));
> > +     BUG_ON(PageHighMemZone(page));
>
> Same here.

Okay.

Thanks.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 05/10] mm/gup: separate PageHighMem() and PageHighMemZone() use case
  2020-05-01 12:24   ` Christoph Hellwig
@ 2020-05-04  3:02     ` Joonsoo Kim
  0 siblings, 0 replies; 33+ messages in thread
From: Joonsoo Kim @ 2020-05-04  3:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Linux Memory Management List, LKML,
	Vlastimil Babka, Laura Abbott, Aneesh Kumar K . V, Mel Gorman,
	Michal Hocko, Johannes Weiner, Roman Gushchin, Minchan Kim,
	Rik van Riel, Christian Koenig, Huang Rui, Eric Biederman,
	Rafael J . Wysocki, Pavel Machek, kernel-team, Joonsoo Kim

On Fri, May 1, 2020 at 9:24 PM, Christoph Hellwig <hch@infradead.org> wrote:
>
> On Wed, Apr 29, 2020 at 12:26:38PM +0900, js1304@gmail.com wrote:
> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >
> > Until now, PageHighMem() is used for two different cases. One is to check
> > if there is a direct mapping for this page or not. The other is to check
> > the zone of this page, that is, whether it is a highmem type zone or not.
> >
> > Now, we have separate functions, PageHighMem() and PageHighMemZone(), for
> > each case. Use the appropriate one.
> >
> > Note that there are some rules to determine the proper macro.
> >
> > 1. If PageHighMem() is called for checking if the direct mapping exists
> > or not, use PageHighMem().
> > 2. If PageHighMem() is used to predict the previous gfp_flags for
> > this page, use PageHighMemZone(). The zone of the page is related to
> > the gfp_flags.
> > 3. If the purpose of calling PageHighMem() is to count highmem pages and
> > to interact with the system by using this count, use PageHighMemZone().
> > This counter is usually used to calculate the available memory for a
> > kernel allocation, and pages on the highmem zone cannot be available
> > for a kernel allocation.
> > 4. Otherwise, use PageHighMemZone(). It's safe since its implementation
> > is just a copy of the previous PageHighMem() implementation and won't
> > be changed.
> >
> > I apply rule #2 for this patch.
> >
> > Acked-by: Roman Gushchin <guro@fb.com>
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > ---
> >  mm/gup.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 11fda53..9652eed 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -1608,7 +1608,7 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private)
> >        */
> >       gfp_t gfp_mask = GFP_USER | __GFP_NOWARN;
> >
> > -     if (PageHighMem(page))
> > +     if (PageHighMemZone(page))
> >               gfp_mask |= __GFP_HIGHMEM;
>
> I think this wants to stay PageHighMem.  This migrates CMA pages to
> other places before doing a long term pin.  Anything that didn't have
> a direct mapping before won't need one for the new page, which could
> also include non-highmem zones without a highmem mapping.

What we want to do here is to guess the allocation gfp flags of the
original page in order to allocate a new page with the most relaxed
gfp flags. That depends on the zone the page belongs to rather than on
the existence of a direct mapping. Until now, the existence of a direct
mapping implied the type of zone, so there was no problem.

After my future CMA patchset, direct mapped CMA pages will be on
ZONE_MOVABLE. And, a page on ZONE_MOVABLE should be allocated with
__GFP_HIGHMEM | __GFP_MOVABLE. So, the most relaxed gfp flags for this
CMA page would include __GFP_HIGHMEM. If PageHighMem() were used here,
__GFP_HIGHMEM would be lost since this CMA page has a direct mapping.

Therefore, PageHighMemZone() is the right one here.

Anyway, I saw Eric's comment in another e-mail that an abstraction is
needed to guess the gfp flags of the original page, and I agree with
it. This site can also benefit from such a change.

Thanks.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 06/10] mm/hugetlb: separate PageHighMem() and PageHighMemZone() use case
  2020-05-01 12:26   ` Christoph Hellwig
@ 2020-05-04  3:03     ` Joonsoo Kim
  0 siblings, 0 replies; 33+ messages in thread
From: Joonsoo Kim @ 2020-05-04  3:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Linux Memory Management List, LKML,
	Vlastimil Babka, Laura Abbott, Aneesh Kumar K . V, Mel Gorman,
	Michal Hocko, Johannes Weiner, Roman Gushchin, Minchan Kim,
	Rik van Riel, Christian Koenig, Huang Rui, Eric Biederman,
	Rafael J . Wysocki, Pavel Machek, kernel-team, Joonsoo Kim

On Fri, May 1, 2020 at 9:26 PM, Christoph Hellwig <hch@infradead.org> wrote:
>
> On Wed, Apr 29, 2020 at 12:26:39PM +0900, js1304@gmail.com wrote:
> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >
> > Until now, PageHighMem() is used for two different cases. One is to check
> > if there is a direct mapping for this page or not. The other is to check
> > the zone of this page, that is, whether it is a highmem type zone or not.
> >
> > Now, we have separate functions, PageHighMem() and PageHighMemZone(), for
> > each case. Use the appropriate one.
> >
> > Note that there are some rules to determine the proper macro.
> >
> > 1. If PageHighMem() is called for checking if the direct mapping exists
> > or not, use PageHighMem().
> > 2. If PageHighMem() is used to predict the previous gfp_flags for
> > this page, use PageHighMemZone(). The zone of the page is related to
> > the gfp_flags.
> > 3. If the purpose of calling PageHighMem() is to count highmem pages and
> > to interact with the system by using this count, use PageHighMemZone().
> > This counter is usually used to calculate the available memory for a
> > kernel allocation, and pages on the highmem zone cannot be available
> > for a kernel allocation.
> > 4. Otherwise, use PageHighMemZone(). It's safe since its implementation
> > is just a copy of the previous PageHighMem() implementation and won't
> > be changed.
> >
> > I apply rule #3 for this patch.
> >
> > Acked-by: Roman Gushchin <guro@fb.com>
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> Why do we care about the zone here?  This only cares about having
> kernel direct mapped pages as far as I can tell.

My understanding is that what we want to do here is to first free memory
that can be used for kernel allocations. If a direct mapped page is on a
zone that cannot be used for kernel allocations, there is no point in
freeing this page first. So, we need to take care of the zone here.

Thanks.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 07/10] mm: separate PageHighMem() and PageHighMemZone() use case
  2020-05-01 12:30   ` Christoph Hellwig
@ 2020-05-04  3:08     ` Joonsoo Kim
  0 siblings, 0 replies; 33+ messages in thread
From: Joonsoo Kim @ 2020-05-04  3:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Linux Memory Management List, LKML,
	Vlastimil Babka, Laura Abbott, Aneesh Kumar K . V, Mel Gorman,
	Michal Hocko, Johannes Weiner, Roman Gushchin, Minchan Kim,
	Rik van Riel, Christian Koenig, Huang Rui, Eric Biederman,
	Rafael J . Wysocki, Pavel Machek, kernel-team, Joonsoo Kim

On Fri, May 1, 2020 at 9:30 PM, Christoph Hellwig <hch@infradead.org> wrote:
>
> On Wed, Apr 29, 2020 at 12:26:40PM +0900, js1304@gmail.com wrote:
> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >
> > Until now, PageHighMem() is used for two different cases. One is to check
> > if there is a direct mapping for this page or not. The other is to check
> > the zone of this page, that is, whether it is a highmem type zone or not.
> >
> > Now, we have separate functions, PageHighMem() and PageHighMemZone(), for
> > each case. Use the appropriate one.
> >
> > Note that there are some rules to determine the proper macro.
> >
> > 1. If PageHighMem() is called for checking if the direct mapping exists
> > or not, use PageHighMem().
> > 2. If PageHighMem() is used to predict the previous gfp_flags for
> > this page, use PageHighMemZone(). The zone of the page is related to
> > the gfp_flags.
> > 3. If the purpose of calling PageHighMem() is to count highmem pages and
> > to interact with the system by using this count, use PageHighMemZone().
> > This counter is usually used to calculate the available memory for a
> > kernel allocation, and pages on the highmem zone cannot be available
> > for a kernel allocation.
> > 4. Otherwise, use PageHighMemZone(). It's safe since its implementation
> > is just a copy of the previous PageHighMem() implementation and won't
> > be changed.
> >
> > I apply rule #3 for this patch.
> >
> > Acked-by: Roman Gushchin <guro@fb.com>
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > ---
> >  mm/memory_hotplug.c | 2 +-
> >  mm/page_alloc.c     | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index 555137b..891c214 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -593,7 +593,7 @@ void generic_online_page(struct page *page, unsigned int order)
> >       __free_pages_core(page, order);
> >       totalram_pages_add(1UL << order);
> >  #ifdef CONFIG_HIGHMEM
> > -     if (PageHighMem(page))
> > +     if (PageHighMemZone(page))
> >               totalhigh_pages_add(1UL << order);
> >  #endif
> >  }
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index fc5919e..7fe5115 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -7444,7 +7444,7 @@ void adjust_managed_page_count(struct page *page, long count)
> >       atomic_long_add(count, &page_zone(page)->managed_pages);
> >       totalram_pages_add(count);
> >  #ifdef CONFIG_HIGHMEM
> > -     if (PageHighMem(page))
> > +     if (PageHighMemZone(page))
> >               totalhigh_pages_add(count);
> >  #endif
>
> This function already uses the page_zone structure above.  I think
> life would be easier if you compared against that, as that makes
> the code more obvious.

If I can kill all the PageHighMemZone() users, I will use page_zone()
above. However, if that's not possible, I will leave it as it is. That
would be simpler than your suggestion.

Thanks.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 00/10] change the implementation of the PageHighMem()
  2020-05-01 12:34         ` Christoph Hellwig
@ 2020-05-04  3:09           ` Joonsoo Kim
  0 siblings, 0 replies; 33+ messages in thread
From: Joonsoo Kim @ 2020-05-04  3:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Linux Memory Management List, LKML,
	Vlastimil Babka, Laura Abbott, Aneesh Kumar K . V, Mel Gorman,
	Michal Hocko, Johannes Weiner, Roman Gushchin, Minchan Kim,
	Rik van Riel, Christian Koenig, Huang Rui, Eric Biederman,
	Rafael J . Wysocki, Pavel Machek, kernel-team, Joonsoo Kim

On Fri, May 1, 2020 at 9:34 PM, Christoph Hellwig <hch@infradead.org> wrote:
>
> On Fri, May 01, 2020 at 09:15:30PM +0900, Joonsoo Kim wrote:
> > I think that PageHighMemZone() is long and complicated enough to have
> > a macro.
>
> It is.  But then again it also shouldn't really be used by anything
> but MM internals.

I'm not sure that we can make it MM internal but I will try.

> >
> > PageHighMemZone(page) = is_highmem_idx(zone_idx(page_zone(page)))
> >
> > Instead of open-coding it, how about changing the style of the macro
> > to something like page_from_highmem()? What PageHighMemZone() represents
> > is a derived attribute of the page, so the PageXXX() style may not be appropriate.
>
> Maybe page_is_highmem_zone() with a big kerneldoc comment explaining
> the use case?  Bonus points for killing enough users that it can be
> in mm/internal.h.

I will try to kill as many page_is_highmem_zone() users as possible in the next version.

Thanks.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 03/10] kexec: separate PageHighMem() and PageHighMemZone() use case
  2020-05-01 14:03   ` Eric W. Biederman
@ 2020-05-04  3:10     ` Joonsoo Kim
  2020-05-04 14:03       ` Eric W. Biederman
  0 siblings, 1 reply; 33+ messages in thread
From: Joonsoo Kim @ 2020-05-04  3:10 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Linux Memory Management List, LKML,
	Vlastimil Babka, Laura Abbott, Aneesh Kumar K . V, Mel Gorman,
	Michal Hocko, Johannes Weiner, Roman Gushchin, Minchan Kim,
	Rik van Riel, Christian Koenig, Huang Rui, Rafael J . Wysocki,
	Pavel Machek, kernel-team, Christoph Hellwig, Joonsoo Kim

On Fri, May 1, 2020 at 11:06 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> js1304@gmail.com writes:
>
> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >
> > Until now, PageHighMem() is used for two different cases. One is to check
> > if there is a direct mapping for this page or not. The other is to check
> > the zone of this page, that is, whether it is a highmem type zone or not.
> >
> > Now, we have separate functions, PageHighMem() and PageHighMemZone(), for
> > each case. Use the appropriate one.
> >
> > Note that there are some rules to determine the proper macro.
> >
> > 1. If PageHighMem() is called for checking if the direct mapping exists
> > or not, use PageHighMem().
> > 2. If PageHighMem() is used to predict the previous gfp_flags for
> > this page, use PageHighMemZone(). The zone of the page is related to
> > the gfp_flags.
> > 3. If the purpose of calling PageHighMem() is to count highmem pages and
> > to interact with the system by using this count, use PageHighMemZone().
> > This counter is usually used to calculate the available memory for a
> > kernel allocation, and pages on the highmem zone cannot be available
> > for a kernel allocation.
> > 4. Otherwise, use PageHighMemZone(). It's safe since its implementation
> > is just a copy of the previous PageHighMem() implementation and won't
> > be changed.
> >
> > I apply rule #2 for this patch.
>
> Hmm.
>
> What happened to the notion of deprecating and reducing the usage of
> highmem?  I know that we have some embedded architectures where it is
> still important but this feels like it flies in the face of that.

AFAIK, deprecating highmem requires some more time and, before then,
we need to support it.

>
> This part of kexec would be much more maintainable if it had a proper
> mm layer helper that tested to see if the page matched the passed in
> gfp flags.  That way the mm layer could keep changing and doing weird
> gyrations and this code would not care.

Good idea! I will do it.

>
> What would be really helpful is if there was a straightforward way to
> allocate memory whose physical address fits in the native word size.
>
>
> All I know for certain about this patch is that it takes a piece of code
> that looked like it made sense, and transforms it into something I can
> not easily verify, and can not maintain.

Although I've decided to make a helper as you described above, I don't
understand why you think the new code isn't maintainable. It is just
the same thing with a different name. Could you elaborate on why
you think so?

Thanks.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 03/10] kexec: separate PageHighMem() and PageHighMemZone() use case
  2020-05-04  3:10     ` Joonsoo Kim
@ 2020-05-04 14:03       ` Eric W. Biederman
  2020-05-04 21:59         ` [RFC][PATCH] kexec: Teach indirect pages how to live in high memory Eric W. Biederman
  2020-05-06  5:23         ` [PATCH v2 03/10] kexec: separate PageHighMem() and PageHighMemZone() use case Joonsoo Kim
  0 siblings, 2 replies; 33+ messages in thread
From: Eric W. Biederman @ 2020-05-04 14:03 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Linux Memory Management List, LKML,
	Vlastimil Babka, Laura Abbott, Aneesh Kumar K . V, Mel Gorman,
	Michal Hocko, Johannes Weiner, Roman Gushchin, Minchan Kim,
	Rik van Riel, Christian Koenig, Huang Rui, Rafael J . Wysocki,
	Pavel Machek, kernel-team, Christoph Hellwig, Joonsoo Kim,
	Kexec Mailing List


I have added the kexec mailing list.

Looking at the patch we are discussing, it appears that the kexec code
could be doing much better in highmem situations today, but is not.


Joonsoo Kim <js1304@gmail.com> writes:

> On Fri, May 1, 2020 at 11:06 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> js1304@gmail.com writes:
>>
>> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> >
>> > Until now, PageHighMem() is used for two different cases. One is to check
>> > if there is a direct mapping for this page or not. The other is to check
>> > the zone of this page, that is, whether it is a highmem type zone or not.
>> >
>> > Now, we have separate functions, PageHighMem() and PageHighMemZone(), for
>> > each case. Use the appropriate one.
>> >
>> > Note that there are some rules to determine the proper macro.
>> >
>> > 1. If PageHighMem() is called for checking if the direct mapping exists
>> > or not, use PageHighMem().
>> > 2. If PageHighMem() is used to predict the previous gfp_flags for
>> > this page, use PageHighMemZone(). The zone of the page is related to
>> > the gfp_flags.
>> > 3. If the purpose of calling PageHighMem() is to count highmem pages and
>> > to interact with the system by using this count, use PageHighMemZone().
>> > This counter is usually used to calculate the available memory for a
>> > kernel allocation, and pages on the highmem zone cannot be available
>> > for a kernel allocation.
>> > 4. Otherwise, use PageHighMemZone(). It's safe since its implementation
>> > is just a copy of the previous PageHighMem() implementation and won't
>> > be changed.
>> >
>> > I apply rule #2 for this patch.
>>
>> Hmm.
>>
>> What happened to the notion of deprecating and reducing the usage of
>> highmem?  I know that we have some embedded architectures where it is
>> still important but this feels like it flies in the face of that.
>
> AFAIK, deprecating highmem requires some more time and, before then,
> we need to support it.

But it at least makes sense to look at what we are doing with highmem
and ask if it makes sense.

>> This part of kexec would be much more maintainable if it had a proper
>> mm layer helper that tested to see if the page matched the passed in
>> gfp flags.  That way the mm layer could keep changing and doing weird
>> gyrations and this code would not care.
>
> Good idea! I will do it.
>
>>
>> What would be really helpful is if there was a straightforward way to
>> allocate memory whose physical address fits in the native word size.
>>
>>
>> All I know for certain about this patch is that it takes a piece of code
>> that looked like it made sense, and transforms it into something I can
>> not easily verify, and can not maintain.
>
> Although I've decided to make a helper as you described above, I don't
> understand why you think the new code isn't maintainable. It is just
> the same thing with a different name. Could you elaborate on why
> you think so?

Because the current code is already wrong.  It does not handle
the general case of what it claims to handle.  When the only distinction
that needs to be drawn is highmem or not highmem that is likely fine.
But now you are making it possible to draw more distinctions.  At which
point I have no idea which distinction needs to be drawn.


The code and the logic are about 20 years old.  When it was written I
don't recall taking NUMA seriously, and the kernel only had 3 zones
as I recall (DMA, aka the now deprecated GFP_DMA, NORMAL, and HIGH).

The code attempts to work around limitations of those old zones and play
nice in a highmem world by allocating HIGH memory and not using
it if the memory was above 4G (on 32bit).

Looking at it now, the kernel has GFP_DMA32, so on 32bit with highmem we
should probably be using that when allocating memory.




Further, in dealing with this memory management situation, there are only
two situations in which we call kimage_alloc_page.

For an indirect page which must have a valid page_address(page).
We could probably relax that if we cared to.

For a general kexec page to store the next kernel in until we switch.
The general pages can be in high memory.

In a highmem world all of those pages should be below 32bit.



Given that we fundamentally have two situations my sense is that we
should just refactor the code so that we never have to deal with:


			/* The old page I have found cannot be a
			 * destination page, so return it if it's
			 * gfp_flags honor the ones passed in.
			 */
			if (!(gfp_mask & __GFP_HIGHMEM) &&
			    PageHighMem(old_page)) {
				kimage_free_pages(old_page);
				continue;
			}

Either we teach kimage_add_entry how to work with high memory pages
(still 32bit accessible) or we teach kimage_alloc_page to notice it is
an indirect page allocation and to always skip trying to reuse the page
it found in that case.

That way the code does not need to know about forever changing mm internals.



We should probably investigate GFP_DMA32 at the same time, and switch to
that for 32bit rather than continuing to use GFP_HIGHUSER.

Eric


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC][PATCH] kexec: Teach indirect pages how to live in high memory
  2020-05-04 14:03       ` Eric W. Biederman
@ 2020-05-04 21:59         ` Eric W. Biederman
  2020-05-05 17:44           ` Hari Bathini
  2020-05-06  5:23         ` [PATCH v2 03/10] kexec: separate PageHighMem() and PageHighMemZone() use case Joonsoo Kim
  1 sibling, 1 reply; 33+ messages in thread
From: Eric W. Biederman @ 2020-05-04 21:59 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Linux Memory Management List, LKML,
	Vlastimil Babka, Laura Abbott, Aneesh Kumar K . V, Mel Gorman,
	Michal Hocko, Johannes Weiner, Roman Gushchin, Minchan Kim,
	Rik van Riel, Christian Koenig, Huang Rui, Rafael J . Wysocki,
	Pavel Machek, kernel-team, Christoph Hellwig, Joonsoo Kim,
	Kexec Mailing List


Recently a patch was proposed to kimage_alloc_page to slightly alter
the logic of how pages allocated with incompatible flags were
detected.  The logic was being altered because the semantics of the
page allocator were changing yet again.

Looking at that case I realized that there is no reason for it to even
exist.  Either the indirect page allocations and the source page
allocations could be separated out, or I could do as I am doing now
and simply teach the indirect pages to live in high memory.

This patch replaces pointers of type kimage_entry_t * with a new type
kimage_entry_pos_t.  This new type holds the physical address of the
indirect page and the offset within that page of the next indirect
entry to write.  A special constant KIMAGE_ENTRY_POS_INVALID is added,
to which kimage_entry_pos_t variables that don't currently have a valid
value may be set.

Two new functions, kimage_read_entry and kimage_write_entry, have been
provided to access entries in a way that works if they live in high
memory.

The now unnecessary checks to see if a destination entry is non-zero,
and to increment it if so, have been removed.  For safety, new indirect
pages are now cleared, so we have a guarantee that everything that has not
been used yet is zero.  Along with this, writing an extra trailing 0
entry has been removed, as it is known all trailing entries are now 0.

With highmem support implemented for indirect pages,
kimage_alloc_page has been updated to always allocate
GFP_HIGHUSER pages, and the handling of pages with different
gfp flags has been removed.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---

I have not done more than compile test this but I think this will remove
that tricky case in the kexec highmem support.

Any comments?  Does anyone have a 32bit highmem system where they can
test this code?  I can probably do something with a 32bit x86 kernel
but it has been a few days.

Does anyone know how we can more effectively allocate memory below
whatever maximum limit kexec supports?  Typically below
4G on 32bit and below 2^64 on 64bits.

Eric

 include/linux/kexec.h |   5 +-
 kernel/kexec_core.c   | 119 +++++++++++++++++++++++++-----------------
 2 files changed, 73 insertions(+), 51 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 1776eb2e43a4..6d3f6f4cb926 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -69,6 +69,8 @@
  */
 
 typedef unsigned long kimage_entry_t;
+typedef unsigned long kimage_entry_pos_t;
+#define KIMAGE_ENTRY_POS_INVALID ((kimage_entry_pos_t)-2)
 
 struct kexec_segment {
 	/*
@@ -243,8 +245,7 @@ int kexec_elf_probe(const char *buf, unsigned long len);
 #endif
 struct kimage {
 	kimage_entry_t head;
-	kimage_entry_t *entry;
-	kimage_entry_t *last_entry;
+	kimage_entry_pos_t entry_pos;
 
 	unsigned long start;
 	struct page *control_code_page;
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index c19c0dad1ebe..45862fda9e60 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -142,7 +142,6 @@ EXPORT_SYMBOL_GPL(kexec_crash_loaded);
 #define PAGE_COUNT(x) (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
 
 static struct page *kimage_alloc_page(struct kimage *image,
-				       gfp_t gfp_mask,
 				       unsigned long dest);
 
 int sanity_check_segment_list(struct kimage *image)
@@ -261,8 +260,7 @@ struct kimage *do_kimage_alloc_init(void)
 		return NULL;
 
 	image->head = 0;
-	image->entry = &image->head;
-	image->last_entry = &image->head;
+	image->entry_pos = KIMAGE_ENTRY_POS_INVALID;
 	image->control_page = ~0; /* By default this does not apply */
 	image->type = KEXEC_TYPE_DEFAULT;
 
@@ -531,28 +529,56 @@ int kimage_crash_copy_vmcoreinfo(struct kimage *image)
 	return 0;
 }
 
-static int kimage_add_entry(struct kimage *image, kimage_entry_t entry)
+static kimage_entry_t kimage_read_entry(kimage_entry_pos_t pos)
 {
-	if (*image->entry != 0)
-		image->entry++;
+	kimage_entry_t *arr, entry;
+	struct page *page;
+	unsigned long off;
+
+	page = boot_pfn_to_page(pos >> PAGE_SHIFT);
+	off = pos & ~PAGE_MASK;
+	arr = kmap_atomic(page);
+	entry = arr[off];
+	kunmap_atomic(arr);
+
+	return entry;
+}
 
-	if (image->entry == image->last_entry) {
-		kimage_entry_t *ind_page;
+static void kimage_write_entry(kimage_entry_pos_t pos, kimage_entry_t entry)
+{
+	kimage_entry_t *arr;
+	struct page *page;
+	unsigned long off;
+
+	page = boot_pfn_to_page(pos >> PAGE_SHIFT);
+	off = pos & ~PAGE_MASK;
+	arr = kmap_atomic(page);
+	arr[off] = entry;
+	kunmap_atomic(arr);
+}
+
+#define LAST_KIMAGE_ENTRY ((PAGE_SIZE/sizeof(kimage_entry_t)) - 1)
+static int kimage_add_entry(struct kimage *image, kimage_entry_t entry)
+{
+	if ((image->entry_pos == KIMAGE_ENTRY_POS_INVALID) ||
+	    ((image->entry_pos & ~PAGE_MASK) == LAST_KIMAGE_ENTRY)) {
+		unsigned long ind_addr;
 		struct page *page;
 
-		page = kimage_alloc_page(image, GFP_KERNEL, KIMAGE_NO_DEST);
+		page = kimage_alloc_page(image, KIMAGE_NO_DEST);
 		if (!page)
 			return -ENOMEM;
 
-		ind_page = page_address(page);
-		*image->entry = virt_to_boot_phys(ind_page) | IND_INDIRECTION;
-		image->entry = ind_page;
-		image->last_entry = ind_page +
-				      ((PAGE_SIZE/sizeof(kimage_entry_t)) - 1);
+		ind_addr = page_to_boot_pfn(page) << PAGE_SHIFT;
+		kimage_write_entry(image->entry_pos, ind_addr | IND_INDIRECTION);
+
+		clear_highpage(page);
+
+		image->entry_pos = ind_addr;
 	}
-	*image->entry = entry;
-	image->entry++;
-	*image->entry = 0;
+
+	kimage_write_entry(image->entry_pos, entry);
+	image->entry_pos++;
 
 	return 0;
 }
@@ -597,16 +623,14 @@ int __weak machine_kexec_post_load(struct kimage *image)
 
 void kimage_terminate(struct kimage *image)
 {
-	if (*image->entry != 0)
-		image->entry++;
-
-	*image->entry = IND_DONE;
+	kimage_write_entry(image->entry_pos, IND_DONE);
 }
 
-#define for_each_kimage_entry(image, ptr, entry) \
-	for (ptr = &image->head; (entry = *ptr) && !(entry & IND_DONE); \
-		ptr = (entry & IND_INDIRECTION) ? \
-			boot_phys_to_virt((entry & PAGE_MASK)) : ptr + 1)
+#define for_each_kimage_entry(image, pos, entry) 				\
+	for (entry = image->head, pos = KIMAGE_ENTRY_POS_INVALID;		\
+	     entry && !(entry & IND_DONE);					\
+	     pos = ((entry & IND_INDIRECTION) ? (entry & PAGE_MASK) : pos + 1), \
+	     entry = kimage_read_entry(pos))
 
 static void kimage_free_entry(kimage_entry_t entry)
 {
@@ -618,8 +642,8 @@ static void kimage_free_entry(kimage_entry_t entry)
 
 void kimage_free(struct kimage *image)
 {
-	kimage_entry_t *ptr, entry;
-	kimage_entry_t ind = 0;
+	kimage_entry_t entry, ind = 0;
+	kimage_entry_pos_t pos;
 
 	if (!image)
 		return;
@@ -630,7 +654,7 @@ void kimage_free(struct kimage *image)
 	}
 
 	kimage_free_extra_pages(image);
-	for_each_kimage_entry(image, ptr, entry) {
+	for_each_kimage_entry(image, pos, entry) {
 		if (entry & IND_INDIRECTION) {
 			/* Free the previous indirection page */
 			if (ind & IND_INDIRECTION)
@@ -662,27 +686,27 @@ void kimage_free(struct kimage *image)
 	kfree(image);
 }
 
-static kimage_entry_t *kimage_dst_used(struct kimage *image,
-					unsigned long page)
+static kimage_entry_pos_t kimage_dst_used(struct kimage *image,
+					  unsigned long page)
 {
-	kimage_entry_t *ptr, entry;
 	unsigned long destination = 0;
+	kimage_entry_pos_t pos;
+	kimage_entry_t entry;
 
-	for_each_kimage_entry(image, ptr, entry) {
+	for_each_kimage_entry(image, pos, entry) {
 		if (entry & IND_DESTINATION)
 			destination = entry & PAGE_MASK;
 		else if (entry & IND_SOURCE) {
 			if (page == destination)
-				return ptr;
+				return pos;
 			destination += PAGE_SIZE;
 		}
 	}
 
-	return NULL;
+	return KIMAGE_ENTRY_POS_INVALID;
 }
 
 static struct page *kimage_alloc_page(struct kimage *image,
-					gfp_t gfp_mask,
 					unsigned long destination)
 {
 	/*
@@ -719,10 +743,10 @@ static struct page *kimage_alloc_page(struct kimage *image,
 	}
 	page = NULL;
 	while (1) {
-		kimage_entry_t *old;
+		kimage_entry_pos_t pos;
 
 		/* Allocate a page, if we run out of memory give up */
-		page = kimage_alloc_pages(gfp_mask, 0);
+		page = kimage_alloc_pages(GFP_HIGHUSER, 0);
 		if (!page)
 			return NULL;
 		/* If the page cannot be used file it away */
@@ -747,26 +771,23 @@ static struct page *kimage_alloc_page(struct kimage *image,
 		 * See if there is already a source page for this
 		 * destination page.  And if so swap the source pages.
 		 */
-		old = kimage_dst_used(image, addr);
-		if (old) {
+		pos = kimage_dst_used(image, addr);
+		if (pos != KIMAGE_ENTRY_POS_INVALID) {
 			/* If so move it */
+			kimage_entry_t old, replacement;
 			unsigned long old_addr;
 			struct page *old_page;
 
-			old_addr = *old & PAGE_MASK;
+			old = kimage_read_entry(pos);
+			old_addr = old & PAGE_MASK;
 			old_page = boot_pfn_to_page(old_addr >> PAGE_SHIFT);
 			copy_highpage(page, old_page);
-			*old = addr | (*old & ~PAGE_MASK);
+			replacement = addr | (old & ~PAGE_MASK);
+			kimage_write_entry(pos, replacement);
 
 			/* The old page I have found cannot be a
-			 * destination page, so return it if it's
-			 * gfp_flags honor the ones passed in.
+			 * destination page, so return it.
 			 */
-			if (!(gfp_mask & __GFP_HIGHMEM) &&
-			    PageHighMem(old_page)) {
-				kimage_free_pages(old_page);
-				continue;
-			}
 			addr = old_addr;
 			page = old_page;
 			break;
@@ -805,7 +826,7 @@ static int kimage_load_normal_segment(struct kimage *image,
 		char *ptr;
 		size_t uchunk, mchunk;
 
-		page = kimage_alloc_page(image, GFP_HIGHUSER, maddr);
+		page = kimage_alloc_page(image, maddr);
 		if (!page) {
 			result  = -ENOMEM;
 			goto out;
-- 
2.25.0



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH] kexec: Teach indirect pages how to live in high memory
  2020-05-04 21:59         ` [RFC][PATCH] kexec: Teach indirect pages how to live in high memory Eric W. Biederman
@ 2020-05-05 17:44           ` Hari Bathini
  2020-05-05 18:39             ` Eric W. Biederman
  0 siblings, 1 reply; 33+ messages in thread
From: Hari Bathini @ 2020-05-05 17:44 UTC (permalink / raw)
  To: Eric W. Biederman, Joonsoo Kim
  Cc: kernel-team, Michal Hocko, Minchan Kim, Aneesh Kumar K . V,
	Rik van Riel, Rafael J . Wysocki, LKML, Christian Koenig,
	Christoph Hellwig, Linux Memory Management List, Huang Rui,
	Kexec Mailing List, Pavel Machek, Johannes Weiner, Joonsoo Kim,
	Andrew Morton, Laura Abbott, Mel Gorman, Roman Gushchin,
	Vlastimil Babka



On 05/05/20 3:29 am, Eric W. Biederman wrote:
> 
> Recently a patch was proposed to kimage_alloc_page to slightly alter
> the logic of how pages allocated with incompatible flags were
> detected.  The logic was being altered because the semantics of the
> page allocator were changing yet again.
> 
> Looking at that case I realized that there is no reason for it to even
> exist.  Either the indirect page allocations and the source page
> allocations could be separated out, or I could do as I am doing now
> and simply teach the indirect pages to live in high memory.
> 
> This patch replaces pointers of type kimage_entry_t * with a new type
> kimage_entry_pos_t.  This new type holds the physical address of the
> indirect page and the offset within that page of the next indirect
> entry to write.  A special constant KIMAGE_ENTRY_POS_INVALID is added
> that kimage_entry_pos_t variables that don't currently have a valid
> position may be set to.
> 
> Two new functions kimage_read_entry and kimage_write_entry have been
> provided to read and write entries in a way that works if they live in high
> memory.
> 
> The now unnecessary checks to see if a destination entry is non-zero
> and to increment it if so have been removed.  For safety new indirect
> pages are now cleared so we have a guarantee everything that has not
> been used yet is zero.  Along with this writing an extra trailing 0
> entry has been removed, as it is known all trailing entries are now 0.
> 
> With highmem support implemented for indirect pages
> kimage_alloc_page has been updated to always allocate
> GFP_HIGHUSER pages, and handling of pages with different
> gfp flags has been removed.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Eric, the patch failed with data access exception on ppc64. Using the below patch on top
got me going...


diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 45862fd..bef52f1 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -570,7 +570,12 @@ static int kimage_add_entry(struct kimage *image, kimage_entry_t entry)
 			return -ENOMEM;
 
 		ind_addr = page_to_boot_pfn(page) << PAGE_SHIFT;
-		kimage_write_entry(image->entry_pos, ind_addr | IND_INDIRECTION);
+
+		/* If it is the first entry, handle it here */
+		if (!image->head)
+			image->head = ind_addr | IND_INDIRECTION;
+		else
+			kimage_write_entry(image->entry_pos, ind_addr | IND_INDIRECTION);
 
 		clear_highpage(page);
 
@@ -623,7 +628,11 @@ int __weak machine_kexec_post_load(struct kimage *image)
 
 void kimage_terminate(struct kimage *image)
 {
-	kimage_write_entry(image->entry_pos, IND_DONE);
+	/* This could be the only entry in case of kdump */
+	if (!image->head)
+		image->head = IND_DONE;
+	else
+		kimage_write_entry(image->entry_pos, IND_DONE);
 }
 
 #define for_each_kimage_entry(image, pos, entry) 				\


Thanks
Hari



* Re: [RFC][PATCH] kexec: Teach indirect pages how to live in high memory
  2020-05-05 17:44           ` Hari Bathini
@ 2020-05-05 18:39             ` Eric W. Biederman
  2020-10-09  1:35               ` Joonsoo Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Eric W. Biederman @ 2020-05-05 18:39 UTC (permalink / raw)
  To: Hari Bathini
  Cc: Joonsoo Kim, kernel-team, Michal Hocko, Minchan Kim,
	Aneesh Kumar K . V, Rik van Riel, Rafael J . Wysocki, LKML,
	Christian Koenig, Christoph Hellwig,
	Linux Memory Management List, Huang Rui, Kexec Mailing List,
	Pavel Machek, Johannes Weiner, Joonsoo Kim, Andrew Morton,
	Laura Abbott, Mel Gorman, Roman Gushchin, Vlastimil Babka

Hari Bathini <hbathini@linux.ibm.com> writes:

> On 05/05/20 3:29 am, Eric W. Biederman wrote:
>> 
>> Recently a patch was proposed to kimage_alloc_page to slightly alter
>> the logic of how pages allocated with incompatible flags were
>> detected.  The logic was being altered because the semantics of the
>> page allocator were changing yet again.
>> 
>> Looking at that case I realized that there is no reason for it to even
>> exist.  Either the indirect page allocations and the source page
>> allocations could be separated out, or I could do as I am doing now
>> and simply teach the indirect pages to live in high memory.
>> 
>> This patch replaces pointers of type kimage_entry_t * with a new type
>> kimage_entry_pos_t.  This new type holds the physical address of the
>> indirect page and the offset within that page of the next indirect
>> entry to write.  A special constant KIMAGE_ENTRY_POS_INVALID is added
>> that kimage_entry_pos_t variables that don't currently have a valid
>> position may be set to.
>> 
>> Two new functions kimage_read_entry and kimage_write_entry have been
>> provided to read and write entries in a way that works if they live in high
>> memory.
>> 
>> The now unnecessary checks to see if a destination entry is non-zero
>> and to increment it if so have been removed.  For safety new indirect
>> pages are now cleared so we have a guarantee everything that has not
>> been used yet is zero.  Along with this writing an extra trailing 0
>> entry has been removed, as it is known all trailing entries are now 0.
>> 
>> With highmem support implemented for indirect pages
>> kimage_alloc_page has been updated to always allocate
>> GFP_HIGHUSER pages, and handling of pages with different
>> gfp flags has been removed.
>> 
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
> Eric, the patch failed with data access exception on ppc64. Using the below patch on top
> got me going...

Doh!  Somehow I thought I had put that logic or something equivalent
into kimage_write_entry and it appears I did not.  I will see if I can
respin the patch.

Thank you very much for testing.

Eric


> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 45862fd..bef52f1 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -570,7 +570,12 @@ static int kimage_add_entry(struct kimage *image, kimage_entry_t entry)
>  			return -ENOMEM;
>  
>  		ind_addr = page_to_boot_pfn(page) << PAGE_SHIFT;
> -		kimage_write_entry(image->entry_pos, ind_addr | IND_INDIRECTION);
> +
> +		/* If it is the first entry, handle it here */
> +		if (!image->head)
> +			image->head = ind_addr | IND_INDIRECTION;
> +		else
> +			kimage_write_entry(image->entry_pos, ind_addr | IND_INDIRECTION);
>  
>  		clear_highpage(page);
>  
> @@ -623,7 +628,11 @@ int __weak machine_kexec_post_load(struct kimage *image)
>  
>  void kimage_terminate(struct kimage *image)
>  {
> -	kimage_write_entry(image->entry_pos, IND_DONE);
> +	/* This could be the only entry in case of kdump */
> +	if (!image->head)
> +		image->head = IND_DONE;
> +	else
> +		kimage_write_entry(image->entry_pos, IND_DONE);
>  }
>  
>  #define for_each_kimage_entry(image, pos, entry) 				\
>
>
> Thanks
> Hari



* Re: [PATCH v2 03/10] kexec: separate PageHighMem() and PageHighMemZone() use case
  2020-05-04 14:03       ` Eric W. Biederman
  2020-05-04 21:59         ` [RFC][PATCH] kexec: Teach indirect pages how to live in high memory Eric W. Biederman
@ 2020-05-06  5:23         ` Joonsoo Kim
  1 sibling, 0 replies; 33+ messages in thread
From: Joonsoo Kim @ 2020-05-06  5:23 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Linux Memory Management List, LKML,
	Vlastimil Babka, Laura Abbott, Aneesh Kumar K . V, Mel Gorman,
	Michal Hocko, Johannes Weiner, Roman Gushchin, Minchan Kim,
	Rik van Riel, Christian Koenig, Huang Rui, Rafael J . Wysocki,
	Pavel Machek, kernel-team, Christoph Hellwig, Kexec Mailing List

On Mon, May 04, 2020 at 09:03:56AM -0500, Eric W. Biederman wrote:
> 
> I have added the kexec mailing list.
> 
> Looking at the patch we are discussing it appears that the kexec code
> could be doing much better in highmem situations today but is not.

Sound great!

> 
> 
> Joonsoo Kim <js1304@gmail.com> writes:
> 
> > 2020년 5월 1일 (금) 오후 11:06, Eric W. Biederman <ebiederm@xmission.com>님이 작성:
> >>
> >> js1304@gmail.com writes:
> >>
> >> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >> >
> >> > Until now, PageHighMem() is used for two different cases. One is to check
> >> > if there is a direct mapping for this page or not. The other is to check
> >> > the zone of this page, that is, whether it is the highmem type zone or not.
> >> >
> >> > Now, we have separate functions, PageHighMem() and PageHighMemZone() for
> >> > each case. Use the appropriate one.
> >> >
> >> > Note that there are some rules to determine the proper macro.
> >> >
> >> > 1. If PageHighMem() is called for checking if the direct mapping exists
> >> > or not, use PageHighMem().
> >> > 2. If PageHighMem() is used to predict the previous gfp_flags for
> >> > this page, use PageHighMemZone(). The zone of the page is related to
> >> > the gfp_flags.
> >> > 3. If purpose of calling PageHighMem() is to count highmem page and
> >> > to interact with the system by using this count, use PageHighMemZone().
> >> > This counter is usually used to calculate the available memory for a
> >> > kernel allocation and pages on the highmem zone cannot be available
> >> > for a kernel allocation.
> >> > 4. Otherwise, use PageHighMemZone(). It's safe since its implementation
> >> > is just a copy of the previous PageHighMem() implementation and won't
> >> > be changed.
> >> >
> >> > I apply the rule #2 for this patch.
> >>
> >> Hmm.
> >>
> >> What happened to the notion of deprecating and reducing the usage of
> >> highmem?  I know that we have some embedded architectures where it is
> >> still important but this feels like it flies in the face of that.
> >
> > AFAIK, deprecating highmem requires some more time and, before then,
> > we need to support it.
> 
> But it at least makes sense to look at what we are doing with highmem
> and ask if it makes sense.
> 
> >> This part of kexec would be much more maintainable if it had a proper
> >> mm layer helper that tested to see if the page matched the passed in
> >> gfp flags.  That way the mm layer could keep changing and doing weird
> >> gyrations and this code would not care.
> >
> > Good idea! I will do it.
> >
> >>
> >> What would be really helpful is if there was a straight forward way to
> >> allocate memory whose physical address fits in the native word size.
> >>
> >>
> >> All I know for certain about this patch is that it takes a piece of code
> >> that looked like it made sense, and transfroms it into something I can
> >> not easily verify, and can not maintain.
> >
> > Although I decide to make a helper as you described above, I don't
> > understand why you think that a new code isn't maintainable. It is just
> > the same thing with different name. Could you elaborate more why do
> > you think so?
> 
> Because the current code is already wrong.  It does not handle
> the general case of what it claims to handle.  When the only distinction
> that needs to be drawn is highmem or not highmem that is likely fine.
> But now you are making it possible to draw more distinctions.  At which
> point I have no idea which distinction needs to be drawn.
> 
> 
> The code and the logic is about 20 years old.  When it was written I
> don't recally taking numa seriously and the kernel only had 3 zones
> as I recall (DMA aka the now deprecated GFP_DMA, NORMAL, and HIGH).
> 
> The code attempts to work around limitations of those old zones and play
> nice in a highmem world by allocating HIGH memory and not using
> it if the memory was above 4G ( on 32bit ).
> 
> Looking at it now, the kernel has GFP_DMA32, so on 32bit with highmem we should
> probably be using that, when allocating memory.
> 

From a quick investigation, unfortunately, ZONE_DMA32 isn't available on
x86 32bit now, so using GFP_DMA32 to allocate memory below 4G would not
work. Enabling ZONE_DMA32 on x86 32bit would not be simple, so, IMHO, it
would be better to leave the code as it is.

> 
> 
> Further in dealing with this memory management situation we only
> have two situations we call kimage_alloc_page.
> 
> For an indirect page which must have a valid page_address(page).
> We could probably relax that if we cared to.
> 
> For a general kexec page to store the next kernel in until we switch.
> The general pages can be in high memory.
> 
> In a highmem world all of those pages should be below 32bit.
> 
> 
> 
> Given that we fundamentally have two situations my sense is that we
> should just refactor the code so that we never have to deal with:
> 
> 
> 			/* The old page I have found cannot be a
> 			 * destination page, so return it if it's
> 			 * gfp_flags honor the ones passed in.
> 			 */
> 			if (!(gfp_mask & __GFP_HIGHMEM) &&
> 			    PageHighMem(old_page)) {
> 				kimage_free_pages(old_page);
> 				continue;
> 			}
> 
> Either we teach kimage_add_entry how to work with high memory pages
> (still 32bit accessible) or we teach kimage_alloc_page to notice it is
> an indirect page allocation and to always skip trying to reuse the page
> it found in that case.
> 
> That way the code does not need to know about forever changing mm internals.

Nice! I have already seen your patch and found that the above two lines
related to HIGHMEM are removed. Thanks for your help.

Thanks.



* Re: [RFC][PATCH] kexec: Teach indirect pages how to live in high memory
  2020-05-05 18:39             ` Eric W. Biederman
@ 2020-10-09  1:35               ` Joonsoo Kim
  0 siblings, 0 replies; 33+ messages in thread
From: Joonsoo Kim @ 2020-10-09  1:35 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Hari Bathini, kernel-team, Michal Hocko, Minchan Kim,
	Aneesh Kumar K . V, Rik van Riel, Rafael J . Wysocki, LKML,
	Christian Koenig, Christoph Hellwig,
	Linux Memory Management List, Huang Rui, Kexec Mailing List,
	Pavel Machek, Johannes Weiner, Andrew Morton, Laura Abbott,
	Mel Gorman, Roman Gushchin, Vlastimil Babka

On Tue, May 05, 2020 at 01:39:16PM -0500, Eric W. Biederman wrote:
> Hari Bathini <hbathini@linux.ibm.com> writes:
> 
> > On 05/05/20 3:29 am, Eric W. Biederman wrote:
> >> 
> >> Recently a patch was proposed to kimage_alloc_page to slightly alter
> >> the logic of how pages allocated with incompatible flags were
> >> detected.  The logic was being altered because the semantics of the
> >> page allocator were changing yet again.
> >> 
> >> Looking at that case I realized that there is no reason for it to even
> >> exist.  Either the indirect page allocations and the source page
> >> allocations could be separated out, or I could do as I am doing now
> >> and simply teach the indirect pages to live in high memory.
> >> 
> >> This patch replaces pointers of type kimage_entry_t * with a new type
> >> kimage_entry_pos_t.  This new type holds the physical address of the
> >> indirect page and the offset within that page of the next indirect
> >> entry to write.  A special constant KIMAGE_ENTRY_POS_INVALID is added
> >> that kimage_entry_pos_t variables that don't currently have a valid
> >> position may be set to.
> >> 
> >> Two new functions kimage_read_entry and kimage_write_entry have been
> >> provided to read and write entries in a way that works if they live in high
> >> memory.
> >> 
> >> The now unnecessary checks to see if a destination entry is non-zero
> >> and to increment it if so have been removed.  For safety new indirect
> >> pages are now cleared so we have a guarantee everything that has not
> >> been used yet is zero.  Along with this writing an extra trailing 0
> >> entry has been removed, as it is known all trailing entries are now 0.
> >> 
> >> With highmem support implemented for indirect pages
> >> kimage_alloc_page has been updated to always allocate
> >> GFP_HIGHUSER pages, and handling of pages with different
> >> gfp flags has been removed.
> >> 
> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> >
> > Eric, the patch failed with data access exception on ppc64. Using the below patch on top
> > got me going...
> 
> Doh!  Somehow I thought I had put that logic or something equivalent
> into kimage_write_entry and it appears I did not.  I will see if I can
> respin the patch.
> 
> Thank you very much for testing.

Hello, Eric.

It seems that this patch isn't upstreamed.
Could you respin the patch?
I've tested this one on x86_32 (highmem enabled) and it works well.

Thanks.



end of thread, other threads:[~2020-10-09  1:35 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-29  3:26 [PATCH v2 00/10] change the implementation of the PageHighMem() js1304
2020-04-29  3:26 ` [PATCH v2 01/10] mm/page-flags: introduce PageHighMemZone() js1304
2020-04-29  3:26 ` [PATCH v2 02/10] drm/ttm: separate PageHighMem() and PageHighMemZone() use case js1304
2020-04-29  3:26 ` [PATCH v2 03/10] kexec: " js1304
2020-05-01 14:03   ` Eric W. Biederman
2020-05-04  3:10     ` Joonsoo Kim
2020-05-04 14:03       ` Eric W. Biederman
2020-05-04 21:59         ` [RFC][PATCH] kexec: Teach indirect pages how to live in high memory Eric W. Biederman
2020-05-05 17:44           ` Hari Bathini
2020-05-05 18:39             ` Eric W. Biederman
2020-10-09  1:35               ` Joonsoo Kim
2020-05-06  5:23         ` [PATCH v2 03/10] kexec: separate PageHighMem() and PageHighMemZone() use case Joonsoo Kim
2020-04-29  3:26 ` [PATCH v2 04/10] power: " js1304
2020-05-01 12:22   ` Christoph Hellwig
2020-05-04  3:01     ` Joonsoo Kim
2020-04-29  3:26 ` [PATCH v2 05/10] mm/gup: " js1304
2020-05-01 12:24   ` Christoph Hellwig
2020-05-04  3:02     ` Joonsoo Kim
2020-04-29  3:26 ` [PATCH v2 06/10] mm/hugetlb: " js1304
2020-05-01 12:26   ` Christoph Hellwig
2020-05-04  3:03     ` Joonsoo Kim
2020-04-29  3:26 ` [PATCH v2 07/10] mm: " js1304
2020-05-01 12:30   ` Christoph Hellwig
2020-05-04  3:08     ` Joonsoo Kim
2020-04-29  3:26 ` [PATCH v2 08/10] mm/page_alloc: correct the use of is_highmem_idx() js1304
2020-04-29  3:26 ` [PATCH v2 09/10] mm/migrate: replace PageHighMem() with open-code js1304
2020-04-29  3:26 ` [PATCH v2 10/10] mm/page-flags: change the implementation of the PageHighMem() js1304
2020-04-30  1:47 ` [PATCH v2 00/10] " Andrew Morton
2020-05-01 10:52   ` Joonsoo Kim
2020-05-01 10:55     ` Christoph Hellwig
2020-05-01 12:15       ` Joonsoo Kim
2020-05-01 12:34         ` Christoph Hellwig
2020-05-04  3:09           ` Joonsoo Kim
