* [RFT PATCH v1 0/5] fix up inaccurate zone->present_pages
@ 2012-11-18 16:07 ` Jiang Liu
0 siblings, 0 replies; 98+ messages in thread
From: Jiang Liu @ 2012-11-18 16:07 UTC (permalink / raw)
To: Andrew Morton, Wen Congyang, David Rientjes
Cc: Jiang Liu, Maciej Rutecki, Chris Clayton, Rafael J . Wysocki,
Mel Gorman, Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko,
Jianguo Wu, linux-mm, linux-kernel
Commit 7f1290f2f2a4 ("mm: fix-up zone present pages") tried to
resolve an issue caused by inaccurate zone->present_pages, but that
fix was incomplete and caused regressions with HIGHMEM. It has been
reverted by commit
5576646 revert "mm: fix-up zone present pages"
This is a follow-up patchset for the issue above. It introduces a
new field named "managed_pages" to struct zone, which counts the
pages in the zone managed by the buddy system, and reverts
zone->present_pages to counting the pages existing in the zone,
which is spanned_pages - absent_pages.
This way, zone->present_pages is kept consistent with
pgdat->node_present_pages, which is the sum of zone->present_pages
over all zones of the node.
This patchset has only been tested on x86_64 with nobootmem.c, so we
need help testing it on machines that:
1) use bootmem.c
2) have highmem
This patchset applies to "f4a75d2e Linux 3.7-rc6" from
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Any comments and help are welcome!
Jiang Liu (5):
mm: introduce new field "managed_pages" to struct zone
mm: replace zone->present_pages with zone->managed_pages if
appropriate
mm: set zone->present_pages to number of existing pages in the zone
mm: provide more accurate estimation of pages occupied by memmap
mm: increase totalram_pages when free pages allocated by bootmem
allocator
include/linux/mmzone.h | 1 +
mm/bootmem.c | 14 ++++++++
mm/memory_hotplug.c | 6 ++++
mm/mempolicy.c | 2 +-
mm/nobootmem.c | 15 ++++++++
mm/page_alloc.c | 89 +++++++++++++++++++++++++++++++-----------------
mm/vmscan.c | 16 ++++-----
mm/vmstat.c | 8 +++--
8 files changed, 108 insertions(+), 43 deletions(-)
--
1.7.9.5
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
^ permalink raw reply [flat|nested] 98+ messages in thread
* [RFT PATCH v1 1/5] mm: introduce new field "managed_pages" to struct zone
2012-11-18 16:07 ` Jiang Liu
@ 2012-11-18 16:07 ` Jiang Liu
0 siblings, 0 replies; 98+ messages in thread
From: Jiang Liu @ 2012-11-18 16:07 UTC (permalink / raw)
To: Andrew Morton, Wen Congyang, David Rientjes
Cc: Jiang Liu, Maciej Rutecki, Chris Clayton, Rafael J . Wysocki,
Mel Gorman, Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko,
Jianguo Wu, linux-mm, linux-kernel
Currently a zone's present_pages is calculated as below, which is
inaccurate and may cause trouble for memory hotplug:
spanned_pages - absent_pages - memmap_pages - dma_reserve
While fixing bugs caused by the inaccurate zone->present_pages, we
found that zone->present_pages has been abused: the field may mean
different things in different contexts:
1) pages existing in a zone.
2) pages managed by the buddy system.
For more discussions about the issue, please refer to:
http://lkml.org/lkml/2012/11/5/866
https://patchwork.kernel.org/patch/1346751/
This patchset introduces a new field named "managed_pages" to
struct zone, which counts "pages managed by the buddy system", and
reverts zone->present_pages to counting "physical pages existing in a
zone", which also keeps it consistent with pgdat->node_present_pages.
We set an initial value for zone->managed_pages in
free_area_init_core(); it will be adjusted later if that initial value
turns out to be inaccurate.
For DMA/normal zones, the initial value is set to:
(spanned_pages - absent_pages - memmap_pages - dma_reserve)
Later zone->managed_pages will be adjusted to the accurate value when
the bootmem allocator frees all free pages to the buddy system in
function free_all_bootmem_node() and free_all_bootmem().
The bootmem allocator doesn't touch highmem pages, so highmem zones'
managed_pages is set to the accurate value "spanned_pages - absent_pages"
in function free_area_init_core() and won't be updated anymore.
This patch also exports the new "managed_pages" field via
/proc/zoneinfo and the sysrq show-mem output.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
---
include/linux/mmzone.h | 1 +
mm/bootmem.c | 14 ++++++++++++++
mm/memory_hotplug.c | 5 +++++
mm/nobootmem.c | 15 +++++++++++++++
mm/page_alloc.c | 37 +++++++++++++++++++++++--------------
mm/vmstat.c | 6 ++++--
6 files changed, 62 insertions(+), 16 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index a23923b..c61fb6d 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -480,6 +480,7 @@ struct zone {
*/
unsigned long spanned_pages; /* total size, including holes */
unsigned long present_pages; /* amount of memory (excluding holes) */
+ unsigned long managed_pages; /* pages managed by the Buddy */
/*
* rarely used fields:
diff --git a/mm/bootmem.c b/mm/bootmem.c
index f468185..a813e5b 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -229,6 +229,15 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
return count;
}
+static void reset_node_lowmem_managed_pages(pg_data_t *pgdat)
+{
+ struct zone *z;
+
+ for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
+ if (!is_highmem(z))
+ z->managed_pages = 0;
+}
+
/**
* free_all_bootmem_node - release a node's free pages to the buddy allocator
* @pgdat: node to be released
@@ -238,6 +247,7 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
unsigned long __init free_all_bootmem_node(pg_data_t *pgdat)
{
register_page_bootmem_info_node(pgdat);
+ reset_node_lowmem_managed_pages(pgdat);
return free_all_bootmem_core(pgdat->bdata);
}
@@ -250,6 +260,10 @@ unsigned long __init free_all_bootmem(void)
{
unsigned long total_pages = 0;
bootmem_data_t *bdata;
+ struct pglist_data *pgdat;
+
+ for_each_online_pgdat(pgdat)
+ reset_node_lowmem_managed_pages(pgdat);
list_for_each_entry(bdata, &bdata_list, list)
total_pages += free_all_bootmem_core(bdata);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e4eeaca..acd3993 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -106,6 +106,7 @@ static void get_page_bootmem(unsigned long info, struct page *page,
void __ref put_page_bootmem(struct page *page)
{
unsigned long type;
+ static DEFINE_MUTEX(ppb_lock);
type = (unsigned long) page->lru.next;
BUG_ON(type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
@@ -115,7 +116,9 @@ void __ref put_page_bootmem(struct page *page)
ClearPagePrivate(page);
set_page_private(page, 0);
INIT_LIST_HEAD(&page->lru);
+ mutex_lock(&ppb_lock);
__free_pages_bootmem(page, 0);
+ mutex_unlock(&ppb_lock);
}
}
@@ -514,6 +517,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
return ret;
}
+ zone->managed_pages += onlined_pages;
zone->present_pages += onlined_pages;
zone->zone_pgdat->node_present_pages += onlined_pages;
if (onlined_pages) {
@@ -961,6 +965,7 @@ repeat:
/* reset pagetype flags and makes migrate type to be MOVABLE */
undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
/* removal success */
+ zone->managed_pages -= offlined_pages;
zone->present_pages -= offlined_pages;
zone->zone_pgdat->node_present_pages -= offlined_pages;
totalram_pages -= offlined_pages;
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index bd82f6b..9c55c18 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -137,6 +137,15 @@ unsigned long __init free_low_memory_core_early(int nodeid)
return count;
}
+static void reset_node_lowmem_managed_pages(pg_data_t *pgdat)
+{
+ struct zone *z;
+
+ for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
+ if (!is_highmem(z))
+ z->managed_pages = 0;
+}
+
/**
* free_all_bootmem_node - release a node's free pages to the buddy allocator
* @pgdat: node to be released
@@ -146,6 +155,7 @@ unsigned long __init free_low_memory_core_early(int nodeid)
unsigned long __init free_all_bootmem_node(pg_data_t *pgdat)
{
register_page_bootmem_info_node(pgdat);
+ reset_node_lowmem_managed_pages(pgdat);
/* free_low_memory_core_early(MAX_NUMNODES) will be called later */
return 0;
@@ -158,6 +168,11 @@ unsigned long __init free_all_bootmem_node(pg_data_t *pgdat)
*/
unsigned long __init free_all_bootmem(void)
{
+ struct pglist_data *pgdat;
+
+ for_each_online_pgdat(pgdat)
+ reset_node_lowmem_managed_pages(pgdat);
+
/*
* We need to use MAX_NUMNODES instead of NODE_DATA(0)->node_id
* because in some case like Node0 doesn't have RAM installed
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7bb35ac..a41ee64 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -745,6 +745,7 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
set_page_count(p, 0);
}
+ page_zone(page)->managed_pages += 1 << order;
set_page_refcounted(page);
__free_pages(page, order);
}
@@ -2950,6 +2951,7 @@ void show_free_areas(unsigned int filter)
" isolated(anon):%lukB"
" isolated(file):%lukB"
" present:%lukB"
+ " managed:%lukB"
" mlocked:%lukB"
" dirty:%lukB"
" writeback:%lukB"
@@ -2979,6 +2981,7 @@ void show_free_areas(unsigned int filter)
K(zone_page_state(zone, NR_ISOLATED_ANON)),
K(zone_page_state(zone, NR_ISOLATED_FILE)),
K(zone->present_pages),
+ K(zone->managed_pages),
K(zone_page_state(zone, NR_MLOCK)),
K(zone_page_state(zone, NR_FILE_DIRTY)),
K(zone_page_state(zone, NR_WRITEBACK)),
@@ -4455,48 +4458,54 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
for (j = 0; j < MAX_NR_ZONES; j++) {
struct zone *zone = pgdat->node_zones + j;
- unsigned long size, realsize, memmap_pages;
+ unsigned long size, realsize, freesize, memmap_pages;
size = zone_spanned_pages_in_node(nid, j, zones_size);
- realsize = size - zone_absent_pages_in_node(nid, j,
+ realsize = freesize = size - zone_absent_pages_in_node(nid, j,
zholes_size);
/*
- * Adjust realsize so that it accounts for how much memory
+ * Adjust freesize so that it accounts for how much memory
* is used by this zone for memmap. This affects the watermark
* and per-cpu initialisations
*/
memmap_pages =
PAGE_ALIGN(size * sizeof(struct page)) >> PAGE_SHIFT;
- if (realsize >= memmap_pages) {
- realsize -= memmap_pages;
+ if (freesize >= memmap_pages) {
+ freesize -= memmap_pages;
if (memmap_pages)
printk(KERN_DEBUG
" %s zone: %lu pages used for memmap\n",
zone_names[j], memmap_pages);
} else
printk(KERN_WARNING
- " %s zone: %lu pages exceeds realsize %lu\n",
- zone_names[j], memmap_pages, realsize);
+ " %s zone: %lu pages exceeds freesize %lu\n",
+ zone_names[j], memmap_pages, freesize);
/* Account for reserved pages */
- if (j == 0 && realsize > dma_reserve) {
- realsize -= dma_reserve;
+ if (j == 0 && freesize > dma_reserve) {
+ freesize -= dma_reserve;
printk(KERN_DEBUG " %s zone: %lu pages reserved\n",
zone_names[0], dma_reserve);
}
if (!is_highmem_idx(j))
- nr_kernel_pages += realsize;
- nr_all_pages += realsize;
+ nr_kernel_pages += freesize;
+ nr_all_pages += freesize;
zone->spanned_pages = size;
- zone->present_pages = realsize;
+ zone->present_pages = freesize;
+ /*
+ * Set an approximate value for lowmem here, it will be adjusted
+ * when the bootmem allocator frees pages into the buddy system.
+ * And all highmem pages will be managed by the buddy system.
+ */
+ zone->managed_pages = is_highmem_idx(j) ? realsize : freesize;
#ifdef CONFIG_NUMA
zone->node = nid;
- zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio)
+ zone->min_unmapped_pages = (freesize*sysctl_min_unmapped_ratio)
/ 100;
- zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / 100;
+ zone->min_slab_pages = (freesize * sysctl_min_slab_ratio) / 100;
#endif
zone->name = zone_names[j];
spin_lock_init(&zone->lock);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index c737057..e47d31c 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -992,14 +992,16 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
"\n high %lu"
"\n scanned %lu"
"\n spanned %lu"
- "\n present %lu",
+ "\n present %lu"
+ "\n managed %lu",
zone_page_state(zone, NR_FREE_PAGES),
min_wmark_pages(zone),
low_wmark_pages(zone),
high_wmark_pages(zone),
zone->pages_scanned,
zone->spanned_pages,
- zone->present_pages);
+ zone->present_pages,
+ zone->managed_pages);
for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
seq_printf(m, "\n %-12s %lu", vmstat_text[i],
--
1.7.9.5
^ permalink raw reply related [flat|nested] 98+ messages in thread
* Re: [RFT PATCH v1 1/5] mm: introduce new field "managed_pages" to struct zone
2012-11-18 16:07 ` Jiang Liu
@ 2012-11-19 23:38 ` Andrew Morton
0 siblings, 0 replies; 98+ messages in thread
From: Andrew Morton @ 2012-11-19 23:38 UTC (permalink / raw)
To: Jiang Liu
Cc: Wen Congyang, David Rientjes, Jiang Liu, Maciej Rutecki,
Chris Clayton, Rafael J . Wysocki, Mel Gorman, Minchan Kim,
KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu, linux-mm,
linux-kernel
On Mon, 19 Nov 2012 00:07:26 +0800
Jiang Liu <liuj97@gmail.com> wrote:
> Currently a zone's present_pages is calcuated as below, which is
> inaccurate and may cause trouble to memory hotplug.
> spanned_pages - absent_pages - memmap_pages - dma_reserve.
>
> During fixing bugs caused by inaccurate zone->present_pages, we found
> zone->present_pages has been abused. The field zone->present_pages
> may have different meanings in different contexts:
> 1) pages existing in a zone.
> 2) pages managed by the buddy system.
>
> For more discussions about the issue, please refer to:
> http://lkml.org/lkml/2012/11/5/866
> https://patchwork.kernel.org/patch/1346751/
>
> This patchset tries to introduce a new field named "managed_pages" to
> struct zone, which counts "pages managed by the buddy system". And
> revert zone->present_pages to count "physical pages existing in a zone",
> which also keep in consistence with pgdat->node_present_pages.
>
> We will set an initial value for zone->managed_pages in function
> free_area_init_core() and will be adjusted later if the initial value is
> inaccurate.
>
> For DMA/normal zones, the initial value is set to:
> (spanned_pages - absent_pages - memmap_pages - dma_reserve)
> Later zone->managed_pages will be adjusted to the accurate value when
> the bootmem allocator frees all free pages to the buddy system in
> function free_all_bootmem_node() and free_all_bootmem().
>
> The bootmem allocator doesn't touch highmem pages, so highmem zones'
> managed_pages is set to the accurate value "spanned_pages - absent_pages"
> in function free_area_init_core() and won't be updated anymore.
>
> This patch also adds a new field "managed_pages" to /proc/zoneinfo
> and sysrq showmem.
hoo boy, what a mess we made. I'd like to merge these patches and get
them into -next for some testing, but -next has stopped for a couple of
weeks. Oh well, let's see what can be done.
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -480,6 +480,7 @@ struct zone {
> */
> unsigned long spanned_pages; /* total size, including holes */
> unsigned long present_pages; /* amount of memory (excluding holes) */
> + unsigned long managed_pages; /* pages managed by the Buddy */
Can you please add a nice big comment over these three fields which
fully describes what they do and the relationship between them?
Basically that stuff that's in the changelog.
Also, the existing comment tells us that spanned_pages and
present_pages are protected by span_seqlock but has not been updated to
describe the locking (if any) for managed_pages.
> /*
> * rarely used fields:
> diff --git a/mm/bootmem.c b/mm/bootmem.c
> index f468185..a813e5b 100644
> --- a/mm/bootmem.c
> +++ b/mm/bootmem.c
> @@ -229,6 +229,15 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
> return count;
> }
>
> +static void reset_node_lowmem_managed_pages(pg_data_t *pgdat)
> +{
> + struct zone *z;
> +
> + for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
> + if (!is_highmem(z))
Needs a comment explaining why we skip the highmem zone, please.
> + z->managed_pages = 0;
> +}
> +
>
> ...
>
> @@ -106,6 +106,7 @@ static void get_page_bootmem(unsigned long info, struct page *page,
> void __ref put_page_bootmem(struct page *page)
> {
> unsigned long type;
> + static DEFINE_MUTEX(ppb_lock);
>
> type = (unsigned long) page->lru.next;
> BUG_ON(type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
> @@ -115,7 +116,9 @@ void __ref put_page_bootmem(struct page *page)
> ClearPagePrivate(page);
> set_page_private(page, 0);
> INIT_LIST_HEAD(&page->lru);
> + mutex_lock(&ppb_lock);
> __free_pages_bootmem(page, 0);
> + mutex_unlock(&ppb_lock);
The mutex is odd. Nothing in the changelog, no code comment.
__free_pages_bootmem() is called from a lot of places but only this one
has locking. I'm madly guessing that the lock is here to handle two or
more concurrent memory hotpluggings, but I shouldn't need to guess!!
> }
>
> }
>
> ...
>
^ permalink raw reply [flat|nested] 98+ messages in thread
* Re: [RFT PATCH v1 1/5] mm: introduce new field "managed_pages" to struct zone
@ 2012-11-19 23:38 ` Andrew Morton
0 siblings, 0 replies; 98+ messages in thread
From: Andrew Morton @ 2012-11-19 23:38 UTC (permalink / raw)
To: Jiang Liu
Cc: Wen Congyang, David Rientjes, Jiang Liu, Maciej Rutecki,
Chris Clayton, Rafael J . Wysocki, Mel Gorman, Minchan Kim,
KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu, linux-mm,
linux-kernel
On Mon, 19 Nov 2012 00:07:26 +0800
Jiang Liu <liuj97@gmail.com> wrote:
> Currently a zone's present_pages is calcuated as below, which is
> inaccurate and may cause trouble to memory hotplug.
> spanned_pages - absent_pages - memmap_pages - dma_reserve.
>
> During fixing bugs caused by inaccurate zone->present_pages, we found
> zone->present_pages has been abused. The field zone->present_pages
> may have different meanings in different contexts:
> 1) pages existing in a zone.
> 2) pages managed by the buddy system.
>
> For more discussions about the issue, please refer to:
> http://lkml.org/lkml/2012/11/5/866
> https://patchwork.kernel.org/patch/1346751/
>
> This patchset tries to introduce a new field named "managed_pages" to
> struct zone, which counts "pages managed by the buddy system". And
> revert zone->present_pages to count "physical pages existing in a zone",
> which also keep in consistence with pgdat->node_present_pages.
>
> We will set an initial value for zone->managed_pages in function
> free_area_init_core() and will be adjusted later if the initial value is
> inaccurate.
>
> For DMA/normal zones, the initial value is set to:
> (spanned_pages - absent_pages - memmap_pages - dma_reserve)
> Later zone->managed_pages will be adjusted to the accurate value when
> the bootmem allocator frees all free pages to the buddy system in
> function free_all_bootmem_node() and free_all_bootmem().
>
> The bootmem allocator doesn't touch highmem pages, so highmem zones'
> managed_pages is set to the accurate value "spanned_pages - absent_pages"
> in function free_area_init_core() and won't be updated anymore.
>
> This patch also adds a new field "managed_pages" to /proc/zoneinfo
> and sysrq showmem.
hoo boy, what a mess we made. I'd like to merge these patches and get
them into -next for some testing, but -next has stopped for a couple of
weeks. Oh well, let's see what can be done.
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -480,6 +480,7 @@ struct zone {
> */
> unsigned long spanned_pages; /* total size, including holes */
> unsigned long present_pages; /* amount of memory (excluding holes) */
> + unsigned long managed_pages; /* pages managed by the Buddy */
Can you please add a nice big comment over these three fields which
fully describes what they do and the relationship between them?
Basically that stuff that's in the changelog.
Also, the existing comment tells us that spanned_pages and
present_pages are protected by span_seqlock but has not been updated to
describe the locking (if any) for managed_pages.
> /*
> * rarely used fields:
> diff --git a/mm/bootmem.c b/mm/bootmem.c
> index f468185..a813e5b 100644
> --- a/mm/bootmem.c
> +++ b/mm/bootmem.c
> @@ -229,6 +229,15 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
> return count;
> }
>
> +static void reset_node_lowmem_managed_pages(pg_data_t *pgdat)
> +{
> + struct zone *z;
> +
> + for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
> + if (!is_highmem(z))
Needs a comment explaining why we skip the highmem zone, please.
> + z->managed_pages = 0;
> +}
> +
>
> ...
>
> @@ -106,6 +106,7 @@ static void get_page_bootmem(unsigned long info, struct page *page,
> void __ref put_page_bootmem(struct page *page)
> {
> unsigned long type;
> + static DEFINE_MUTEX(ppb_lock);
>
> type = (unsigned long) page->lru.next;
> BUG_ON(type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
> @@ -115,7 +116,9 @@ void __ref put_page_bootmem(struct page *page)
> ClearPagePrivate(page);
> set_page_private(page, 0);
> INIT_LIST_HEAD(&page->lru);
> + mutex_lock(&ppb_lock);
> __free_pages_bootmem(page, 0);
> + mutex_unlock(&ppb_lock);
The mutex is odd. Nothing in the changelog, no code comment.
__free_pages_bootmem() is called from a lot of places but only this one
has locking. I'm madly guessing that the lock is here to handle two or
more concurrent memory hotpluggings, but I shouldn't need to guess!!
> }
>
> }
>
> ...
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
^ permalink raw reply [flat|nested] 98+ messages in thread
* Re: [RFT PATCH v1 1/5] mm: introduce new field "managed_pages" to struct zone
2012-11-19 23:38 ` Andrew Morton
@ 2012-11-20 14:56 ` Jiang Liu
-1 siblings, 0 replies; 98+ messages in thread
From: Jiang Liu @ 2012-11-20 14:56 UTC (permalink / raw)
To: Andrew Morton
Cc: Wen Congyang, David Rientjes, Jiang Liu, Maciej Rutecki,
Chris Clayton, Rafael J . Wysocki, Mel Gorman, Minchan Kim,
KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu, linux-mm,
linux-kernel
On 11/20/2012 07:38 AM, Andrew Morton wrote:
> On Mon, 19 Nov 2012 00:07:26 +0800
> Jiang Liu <liuj97@gmail.com> wrote:
>
>> Currently a zone's present_pages is calculated as below, which is
>> inaccurate and may cause trouble for memory hotplug.
>> spanned_pages - absent_pages - memmap_pages - dma_reserve.
>>
>> During fixing bugs caused by inaccurate zone->present_pages, we found
>> zone->present_pages has been abused. The field zone->present_pages
>> may have different meanings in different contexts:
>> 1) pages existing in a zone.
>> 2) pages managed by the buddy system.
>>
>> For more discussions about the issue, please refer to:
>> http://lkml.org/lkml/2012/11/5/866
>> https://patchwork.kernel.org/patch/1346751/
>>
>> This patchset tries to introduce a new field named "managed_pages" to
>> struct zone, which counts "pages managed by the buddy system". And it
>> reverts zone->present_pages to count "physical pages existing in a zone",
>> which also keeps it consistent with pgdat->node_present_pages.
>>
>> We will set an initial value for zone->managed_pages in function
>> free_area_init_core(), and it will be adjusted later if the initial value
>> is inaccurate.
>>
>> For DMA/normal zones, the initial value is set to:
>> (spanned_pages - absent_pages - memmap_pages - dma_reserve)
>> Later zone->managed_pages will be adjusted to the accurate value when
>> the bootmem allocator frees all free pages to the buddy system in
>> function free_all_bootmem_node() and free_all_bootmem().
>>
>> The bootmem allocator doesn't touch highmem pages, so highmem zones'
>> managed_pages is set to the accurate value "spanned_pages - absent_pages"
>> in function free_area_init_core() and won't be updated anymore.
>>
>> This patch also adds a new field "managed_pages" to /proc/zoneinfo
>> and sysrq showmem.
>
> hoo boy, what a mess we made. I'd like to merge these patches and get
> them into -next for some testing, but -next has stopped for a couple of
> weeks. Oh well, let's see what can be done.
Hi Andrew,
Really sorry for the delay. Over the last few weeks, I could only
find time for programming after work hours or on weekends. :(
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -480,6 +480,7 @@ struct zone {
>> */
>> unsigned long spanned_pages; /* total size, including holes */
>> unsigned long present_pages; /* amount of memory (excluding holes) */
>> + unsigned long managed_pages; /* pages managed by the Buddy */
>
> Can you please add a nice big comment over these three fields which
> fully describes what they do and the relationship between them?
> Basically that stuff that's in the changelog.
>
> Also, the existing comment tells us that spanned_pages and
> present_pages are protected by span_seqlock but has not been updated to
> describe the locking (if any) for managed_pages.
How about this?
/*
* spanned_pages is the total pages spanned by the zone, including
* holes, which is calculated as:
* spanned_pages = zone_end_pfn - zone_start_pfn;
*
* present_pages is physical pages existing within the zone, which
* is calculated as:
* present_pages = spanned_pages - absent_pages(pages in holes);
*
* managed_pages is present pages managed by the buddy system, which
* is calculated as (reserved_pages includes pages allocated by the
* bootmem allocator):
* managed_pages = present_pages - reserved_pages;
*
* So present_pages may be used by memory hotplug or memory power
* management logic to figure out unmanaged pages by checking
* (present_pages - managed_pages). And managed_pages should be used
* by page allocator and vm scanner to calculate all kinds of watermarks
* and thresholds.
*
* Lock Rules:
*
* zone_start_pfn, spanned_pages are protected by span_seqlock.
* It is a seqlock because it has to be read outside of zone->lock,
* and it is done in the main allocator path. But, it is written
* quite infrequently.
*
* The span_seq lock is declared along with zone->lock because it is
* frequently read in proximity to zone->lock. It's good to
* give them a chance of being in the same cacheline.
*
* Write access to present_pages and managed_pages at runtime should
* be protected by lock_memory_hotplug()/unlock_memory_hotplug().
* Any reader who can't tolerate drift of present_pages and
* managed_pages should hold the memory hotplug lock to get a stable value.
*/
unsigned long spanned_pages;
unsigned long present_pages;
unsigned long managed_pages;
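The arithmetic relationships documented in the comment above can be checked with a small standalone sketch. The struct and function names below are illustrative stand-ins for this example, not the kernel's actual code:

```c
/* Minimal stand-in for struct zone's page-accounting fields,
 * mirroring the relationships documented in the comment above. */
struct zone_counts {
	unsigned long zone_start_pfn;
	unsigned long zone_end_pfn;
	unsigned long absent_pages;	/* pages in holes */
	unsigned long reserved_pages;	/* includes bootmem allocations */
	unsigned long spanned_pages;
	unsigned long present_pages;
	unsigned long managed_pages;
};

static void compute_counts(struct zone_counts *z)
{
	/* spanned_pages = zone_end_pfn - zone_start_pfn */
	z->spanned_pages = z->zone_end_pfn - z->zone_start_pfn;
	/* present_pages = spanned_pages - absent_pages */
	z->present_pages = z->spanned_pages - z->absent_pages;
	/* managed_pages = present_pages - reserved_pages */
	z->managed_pages = z->present_pages - z->reserved_pages;
}
```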
>
>> /*
>> * rarely used fields:
>> diff --git a/mm/bootmem.c b/mm/bootmem.c
>> index f468185..a813e5b 100644
>> --- a/mm/bootmem.c
>> +++ b/mm/bootmem.c
>> @@ -229,6 +229,15 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
>> return count;
>> }
>>
>> +static void reset_node_lowmem_managed_pages(pg_data_t *pgdat)
>> +{
>> + struct zone *z;
>> +
>> + for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
>> + if (!is_highmem(z))
>
> Needs a comment explaining why we skip the highmem zone, please.
How about this?
/*
* In free_area_init_core(), highmem zone's managed_pages is set to
* present_pages, and bootmem allocator doesn't allocate from highmem
* zones. So there's no need to recalculate managed_pages because all
* highmem pages will be managed by the buddy system. Here highmem
* zone also includes highmem movable zone.
*/
>> + z->managed_pages = 0;
>> +}
>> +
>>
>> ...
>>
>> @@ -106,6 +106,7 @@ static void get_page_bootmem(unsigned long info, struct page *page,
>> void __ref put_page_bootmem(struct page *page)
>> {
>> unsigned long type;
>> + static DEFINE_MUTEX(ppb_lock);
>>
>> type = (unsigned long) page->lru.next;
>> BUG_ON(type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
>> @@ -115,7 +116,9 @@ void __ref put_page_bootmem(struct page *page)
>> ClearPagePrivate(page);
>> set_page_private(page, 0);
>> INIT_LIST_HEAD(&page->lru);
>> + mutex_lock(&ppb_lock);
>> __free_pages_bootmem(page, 0);
>> + mutex_unlock(&ppb_lock);
>
> The mutex is odd. Nothing in the changelog, no code comment.
> __free_pages_bootmem() is called from a lot of places but only this one
> has locking. I'm madly guessing that the lock is here to handle two or
> more concurrent memory hotpluggings, but I shouldn't need to guess!!
Actually I'm a little hesitant about whether we should add a lock here.
All callers of __free_pages_bootmem() other than put_page_bootmem() are
only used at startup time. And currently the only caller of put_page_bootmem()
is already protected by pgdat_resize_lock(pgdat, &flags). So there's
no real need for the lock; it's just defensive.
I'm not sure which is the best solution here.
1) Add a comment to __free_pages_bootmem() stating that callers should
serialize themselves.
2) Use a dedicated lock to serialize updates to zone->managed_pages; this needs
modifications to page_alloc.c and memory_hotplug.c.
3) The solution above, serializing in put_page_bootmem().
What's your suggestion here?
Thanks
Gerry
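For reference, option 2 could be sketched roughly as follows. This is a userspace illustration only: a pthread mutex stands in for a kernel mutex, and the struct and function names are hypothetical:

```c
#include <pthread.h>

/* Hypothetical minimal zone holding only the field being protected. */
struct zone_stub {
	unsigned long managed_pages;
};

/* Option 2: one dedicated lock serializing all runtime updates to
 * managed_pages; every writer funnels through a single helper. */
static pthread_mutex_t managed_pages_lock = PTHREAD_MUTEX_INITIALIZER;

static void mod_zone_managed_pages(struct zone_stub *zone, long delta)
{
	pthread_mutex_lock(&managed_pages_lock);
	zone->managed_pages += delta;
	pthread_mutex_unlock(&managed_pages_lock);
}
```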
* Re: [RFT PATCH v1 1/5] mm: introduce new field "managed_pages" to struct zone
2012-11-20 14:56 ` Jiang Liu
@ 2012-11-20 19:31 ` Andrew Morton
-1 siblings, 0 replies; 98+ messages in thread
From: Andrew Morton @ 2012-11-20 19:31 UTC (permalink / raw)
To: Jiang Liu
Cc: Wen Congyang, David Rientjes, Jiang Liu, Maciej Rutecki,
Chris Clayton, Rafael J . Wysocki, Mel Gorman, Minchan Kim,
KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu, linux-mm,
linux-kernel
On Tue, 20 Nov 2012 22:56:11 +0800
Jiang Liu <liuj97@gmail.com> wrote:
> On 11/20/2012 07:38 AM, Andrew Morton wrote:
> > On Mon, 19 Nov 2012 00:07:26 +0800
> > Jiang Liu <liuj97@gmail.com> wrote:
>
> ...
>
> > Also, the existing comment tells us that spanned_pages and
> > present_pages are protected by span_seqlock but has not been updated to
> > describe the locking (if any) for managed_pages.
> How about this?
>
> ...
>
Looks nice.
> >> + for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
> >> + if (!is_highmem(z))
> >
> > Needs a comment explaining why we skip the highmem zone, please.
> How about this?
>
> ...
>
Ditto.
> >> @@ -106,6 +106,7 @@ static void get_page_bootmem(unsigned long info, struct page *page,
> >> void __ref put_page_bootmem(struct page *page)
> >> {
> >> unsigned long type;
> >> + static DEFINE_MUTEX(ppb_lock);
> >>
> >> type = (unsigned long) page->lru.next;
> >> BUG_ON(type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
> >> @@ -115,7 +116,9 @@ void __ref put_page_bootmem(struct page *page)
> >> ClearPagePrivate(page);
> >> set_page_private(page, 0);
> >> INIT_LIST_HEAD(&page->lru);
> >> + mutex_lock(&ppb_lock);
> >> __free_pages_bootmem(page, 0);
> >> + mutex_unlock(&ppb_lock);
> >
> > The mutex is odd. Nothing in the changelog, no code comment.
> > __free_pages_bootmem() is called from a lot of places but only this one
> > has locking. I'm madly guessing that the lock is here to handle two or
> > more concurrent memory hotpluggings, but I shouldn't need to guess!!
> Actually I'm a little hesitant about whether we should add a lock here.
>
> All callers of __free_pages_bootmem() other than put_page_bootmem() are
> only used at startup time. And currently the only caller of put_page_bootmem()
> is already protected by pgdat_resize_lock(pgdat, &flags). So there's
> no real need for the lock; it's just defensive.
>
> I'm not sure which is the best solution here.
> 1) Add a comment to __free_pages_bootmem() stating that callers should
> serialize themselves.
> 2) Use a dedicated lock to serialize updates to zone->managed_pages; this needs
> modifications to page_alloc.c and memory_hotplug.c.
> 3) The solution above, serializing in put_page_bootmem().
> What's your suggestion here?
Firstly, let's be clear about what *data* we're protecting here. I
think it's only ->managed_pages?
I agree that no locking is needed during the init-time code.
So afaict we only need be concerned about concurrent updates to
->managed_pages via memory hotplug, and lock_memory_hotplug() is
sufficient there. We don't need to be concerned about readers of
managed_pages because it is an unsigned long (a u64 on 32-bit machines
would be a problem).
All correct? If so, the code is OK as-is and this can all be
described/formalised in code comments. If one wants to be really
confident, we could do something along the lines of
void mod_zone_managed_pages(struct zone *zone, signed long delta)
{
WARN_ON(system_state != SYSTEM_BOOTING &&
!is_locked_memory_hotplug());
zone->managed_pages += delta;
}
And yes, is_locked_memory_hotplug() is a dopey name.
[un]lock_memory_hotplug() should have been called
memory_hotplug_[un]lock()!
* Re: [RFT PATCH v1 1/5] mm: introduce new field "managed_pages" to struct zone
2012-11-20 19:31 ` Andrew Morton
@ 2012-11-21 14:36 ` Jiang Liu
-1 siblings, 0 replies; 98+ messages in thread
From: Jiang Liu @ 2012-11-21 14:36 UTC (permalink / raw)
To: Andrew Morton
Cc: Wen Congyang, David Rientjes, Jiang Liu, Maciej Rutecki,
Chris Clayton, Rafael J . Wysocki, Mel Gorman, Minchan Kim,
KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu, linux-mm,
linux-kernel
On 11/21/2012 03:31 AM, Andrew Morton wrote:
>>>> @@ -106,6 +106,7 @@ static void get_page_bootmem(unsigned long info, struct page *page,
>>>> void __ref put_page_bootmem(struct page *page)
>>>> {
>>>> unsigned long type;
>>>> + static DEFINE_MUTEX(ppb_lock);
>>>>
>>>> type = (unsigned long) page->lru.next;
>>>> BUG_ON(type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
>>>> @@ -115,7 +116,9 @@ void __ref put_page_bootmem(struct page *page)
>>>> ClearPagePrivate(page);
>>>> set_page_private(page, 0);
>>>> INIT_LIST_HEAD(&page->lru);
>>>> + mutex_lock(&ppb_lock);
>>>> __free_pages_bootmem(page, 0);
>>>> + mutex_unlock(&ppb_lock);
>>>
>>> The mutex is odd. Nothing in the changelog, no code comment.
>>> __free_pages_bootmem() is called from a lot of places but only this one
>>> has locking. I'm madly guessing that the lock is here to handle two or
>>> more concurrent memory hotpluggings, but I shouldn't need to guess!!
>> Actually I'm a little hesitant about whether we should add a lock here.
>>
>> All callers of __free_pages_bootmem() other than put_page_bootmem() are
>> only used at startup time. And currently the only caller of put_page_bootmem()
>> is already protected by pgdat_resize_lock(pgdat, &flags). So there's
>> no real need for the lock; it's just defensive.
>>
>> I'm not sure which is the best solution here.
>> 1) Add a comment to __free_pages_bootmem() stating that callers should
>> serialize themselves.
>> 2) Use a dedicated lock to serialize updates to zone->managed_pages; this needs
>> modifications to page_alloc.c and memory_hotplug.c.
>> 3) The solution above, serializing in put_page_bootmem().
>> What's your suggestion here?
>
> Firstly, let's be clear about what *data* we're protecting here. I
> think it's only ->managed_pages?
Yes, we are just trying to protect ->managed_pages.
> I agree that no locking is needed during the init-time code.
>
> So afaict we only need be concerned about concurrent updates to
> ->managed_pages via memory hotplug, and lock_memory_hotplug() is
> sufficient there. We don't need to be concerned about readers of
> managed_pages because it is an unsigned long (a u64 on 32-bit machines
> would be a problem).
>
> All correct? If so, the code is OK as-is and this can all be
> described/formalised in code comments. If one wants to be really
> confident, we could do something along the lines of
OK, I will add some comments describing this.
> void mod_zone_managed_pages(struct zone *zone, signed long delta)
> {
> WARN_ON(system_state != SYSTEM_BOOTING &&
> !is_locked_memory_hotplug());
> zone->managed_pages += delta;
> }
This seems to add a little overhead, because __free_pages_bootmem() is on the
hot path and will be called many times at boot time.
Regards!
Gerry
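If boot-path overhead is the worry, one common pattern is to split the helper: an unchecked double-underscore variant for the boot-time hot path, and a checked wrapper for the rare hotplug path. A userspace sketch, with is_locked_memory_hotplug() replaced by a hypothetical flag and all names illustrative:

```c
#include <stdio.h>
#include <stdbool.h>

struct zone_stub {
	unsigned long managed_pages;
};

/* Stand-in for is_locked_memory_hotplug(); purely illustrative. */
static bool memory_hotplug_locked;

/* Boot-time hot path: no checking, as cheap as a bare update. */
static inline void __mod_zone_managed_pages(struct zone_stub *zone, long delta)
{
	zone->managed_pages += delta;
}

/* Hotplug path: rare, so it can afford the sanity check. */
static void mod_zone_managed_pages(struct zone_stub *zone, long delta)
{
	if (!memory_hotplug_locked)
		fprintf(stderr, "WARN: managed_pages update without hotplug lock\n");
	__mod_zone_managed_pages(zone, delta);
}
```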
* Re: [RFT PATCH v1 1/5] mm: introduce new field "managed_pages" to struct zone
2012-11-21 14:36 ` Jiang Liu
@ 2012-11-21 19:31 ` Andrew Morton
-1 siblings, 0 replies; 98+ messages in thread
From: Andrew Morton @ 2012-11-21 19:31 UTC (permalink / raw)
To: Jiang Liu
Cc: Wen Congyang, David Rientjes, Jiang Liu, Maciej Rutecki,
Chris Clayton, Rafael J . Wysocki, Mel Gorman, Minchan Kim,
KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu, linux-mm,
linux-kernel
On Wed, 21 Nov 2012 22:36:56 +0800
Jiang Liu <liuj97@gmail.com> wrote:
> > void mod_zone_managed_pages(struct zone *zone, signed long delta)
> > {
> > WARN_ON(system_state != SYSTEM_BOOTING &&
> > !is_locked_memory_hotplug());
> > zone->managed_pages += delta;
> > }
> This seems a little overhead because __free_pages_bootmem() is on the hot path
> and will be called many times at boot time.
Maybe, maybe not. These things are measurable so let's not just guess.
But I'm not really recommending that we do this - there are all sorts
of things we *could* check and warn about, but we don't. Potential
errors in this area don't seem terribly important.
* [RFT PATCH v2 1/5] mm: introduce new field "managed_pages" to struct zone
2012-11-20 19:31 ` Andrew Morton
@ 2012-11-21 15:06 ` Jiang Liu
-1 siblings, 0 replies; 98+ messages in thread
From: Jiang Liu @ 2012-11-21 15:06 UTC (permalink / raw)
To: Andrew Morton, Wen Congyang, David Rientjes
Cc: Jiang Liu, Maciej Rutecki, Chris Clayton, Rafael J . Wysocki,
Mel Gorman, Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko,
Jianguo Wu, linux-mm, linux-kernel
Currently a zone's present_pages is calculated as below, which is
inaccurate and may cause trouble for memory hotplug:
spanned_pages - absent_pages - memmap_pages - dma_reserve
While fixing bugs caused by inaccurate zone->present_pages, we found
that zone->present_pages has been abused. The field zone->present_pages
may have different meanings in different contexts:
1) pages existing in a zone.
2) pages managed by the buddy system.
For more discussions about the issue, please refer to:
http://lkml.org/lkml/2012/11/5/866
https://patchwork.kernel.org/patch/1346751/
This patchset introduces a new field named "managed_pages" to
struct zone, which counts "pages managed by the buddy system", and
reverts zone->present_pages to counting "physical pages existing in a
zone", which also keeps it consistent with pgdat->node_present_pages.
We will set an initial value for zone->managed_pages in function
free_area_init_core() and will adjust it later if the initial value is
inaccurate.
For DMA/normal zones, the initial value is set to:
(spanned_pages - absent_pages - memmap_pages - dma_reserve)
Later zone->managed_pages will be adjusted to the accurate value when
the bootmem allocator frees all free pages to the buddy system in
function free_all_bootmem_node() and free_all_bootmem().
The bootmem allocator doesn't touch highmem pages, so highmem zones'
managed_pages is set to the accurate value "spanned_pages - absent_pages"
in function free_area_init_core() and won't be updated anymore.
This patch also adds a new field "managed_pages" to /proc/zoneinfo
and sysrq showmem.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
---
include/linux/mmzone.h | 41 ++++++++++++++++++++++++++++++++++-------
mm/bootmem.c | 21 +++++++++++++++++++++
mm/memory_hotplug.c | 10 ++++++++++
mm/nobootmem.c | 22 ++++++++++++++++++++++
mm/page_alloc.c | 44 ++++++++++++++++++++++++++++++--------------
mm/vmstat.c | 6 ++++--
6 files changed, 121 insertions(+), 23 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index a23923b..47f5672 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -469,17 +469,44 @@ struct zone {
unsigned long zone_start_pfn;
/*
- * zone_start_pfn, spanned_pages and present_pages are all
- * protected by span_seqlock. It is a seqlock because it has
- * to be read outside of zone->lock, and it is done in the main
- * allocator path. But, it is written quite infrequently.
+ * spanned_pages is the total pages spanned by the zone, including
+ * holes, which is calculated as:
+ * spanned_pages = zone_end_pfn - zone_start_pfn;
*
- * The lock is declared along with zone->lock because it is
+ * present_pages is physical pages existing within the zone, which
+ * is calculated as:
+ * present_pages = spanned_pages - absent_pages(pages in holes);
+ *
+ * managed_pages is present pages managed by the buddy system, which
+ * is calculated as (reserved_pages includes pages allocated by the
+ * bootmem allocator):
+ * managed_pages = present_pages - reserved_pages;
+ *
+ * So present_pages may be used by memory hotplug or memory power
+ * management logic to figure out unmanaged pages by checking
+ * (present_pages - managed_pages). And managed_pages should be used
+ * by page allocator and vm scanner to calculate all kinds of watermarks
+ * and thresholds.
+ *
+ * Lock Rules:
+ *
+ * zone_start_pfn, spanned_pages are protected by span_seqlock.
+ * It is a seqlock because it has to be read outside of zone->lock,
+ * and it is done in the main allocator path. But, it is written
+ * quite infrequently.
+ *
+ * The span_seq lock is declared along with zone->lock because it is
* frequently read in proximity to zone->lock. It's good to
* give them a chance of being in the same cacheline.
+ *
+ * Writing access to present_pages and managed_pages at runtime should
+ * be protected by lock_memory_hotplug()/unlock_memory_hotplug().
+ * Any reader who can't tolerate drift of present_pages and
+ * managed_pages should hold memory hotplug lock to get a stable value.
*/
- unsigned long spanned_pages; /* total size, including holes */
- unsigned long present_pages; /* amount of memory (excluding holes) */
+ unsigned long spanned_pages;
+ unsigned long present_pages;
+ unsigned long managed_pages;
/*
* rarely used fields:
diff --git a/mm/bootmem.c b/mm/bootmem.c
index f468185..4117294 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -229,6 +229,22 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
return count;
}
+static void reset_node_lowmem_managed_pages(pg_data_t *pgdat)
+{
+ struct zone *z;
+
+ /*
+ * In free_area_init_core(), highmem zone's managed_pages is set to
+ * present_pages, and bootmem allocator doesn't allocate from highmem
+ * zones. So there's no need to recalculate managed_pages because all
+ * highmem pages will be managed by the buddy system. Here highmem
+ * zone also includes highmem movable zone.
+ */
+ for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
+ if (!is_highmem(z))
+ z->managed_pages = 0;
+}
+
/**
* free_all_bootmem_node - release a node's free pages to the buddy allocator
* @pgdat: node to be released
@@ -238,6 +254,7 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
unsigned long __init free_all_bootmem_node(pg_data_t *pgdat)
{
register_page_bootmem_info_node(pgdat);
+ reset_node_lowmem_managed_pages(pgdat);
return free_all_bootmem_core(pgdat->bdata);
}
@@ -250,6 +267,10 @@ unsigned long __init free_all_bootmem(void)
{
unsigned long total_pages = 0;
bootmem_data_t *bdata;
+ struct pglist_data *pgdat;
+
+ for_each_online_pgdat(pgdat)
+ reset_node_lowmem_managed_pages(pgdat);
list_for_each_entry(bdata, &bdata_list, list)
total_pages += free_all_bootmem_core(bdata);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e4eeaca..1a9e675 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -106,6 +106,7 @@ static void get_page_bootmem(unsigned long info, struct page *page,
void __ref put_page_bootmem(struct page *page)
{
unsigned long type;
+ static DEFINE_MUTEX(ppb_lock);
type = (unsigned long) page->lru.next;
BUG_ON(type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
@@ -115,7 +116,14 @@ void __ref put_page_bootmem(struct page *page)
ClearPagePrivate(page);
set_page_private(page, 0);
INIT_LIST_HEAD(&page->lru);
+
+ /*
+ * Please refer to comment for __free_pages_bootmem()
+ * for why we serialize here.
+ */
+ mutex_lock(&ppb_lock);
__free_pages_bootmem(page, 0);
+ mutex_unlock(&ppb_lock);
}
}
@@ -514,6 +522,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
return ret;
}
+ zone->managed_pages += onlined_pages;
zone->present_pages += onlined_pages;
zone->zone_pgdat->node_present_pages += onlined_pages;
if (onlined_pages) {
@@ -961,6 +970,7 @@ repeat:
/* reset pagetype flags and makes migrate type to be MOVABLE */
undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
/* removal success */
+ zone->managed_pages -= offlined_pages;
zone->present_pages -= offlined_pages;
zone->zone_pgdat->node_present_pages -= offlined_pages;
totalram_pages -= offlined_pages;
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index bd82f6b..b8294fc 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -137,6 +137,22 @@ unsigned long __init free_low_memory_core_early(int nodeid)
return count;
}
+static void reset_node_lowmem_managed_pages(pg_data_t *pgdat)
+{
+ struct zone *z;
+
+ /*
+ * In free_area_init_core(), highmem zone's managed_pages is set to
+ * present_pages, and bootmem allocator doesn't allocate from highmem
+ * zones. So there's no need to recalculate managed_pages because all
+ * highmem pages will be managed by the buddy system. Here highmem
+ * zone also includes highmem movable zone.
+ */
+ for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
+ if (!is_highmem(z))
+ z->managed_pages = 0;
+}
+
/**
* free_all_bootmem_node - release a node's free pages to the buddy allocator
* @pgdat: node to be released
@@ -146,6 +162,7 @@ unsigned long __init free_low_memory_core_early(int nodeid)
unsigned long __init free_all_bootmem_node(pg_data_t *pgdat)
{
register_page_bootmem_info_node(pgdat);
+ reset_node_lowmem_managed_pages(pgdat);
/* free_low_memory_core_early(MAX_NUMNODES) will be called later */
return 0;
@@ -158,6 +175,11 @@ unsigned long __init free_all_bootmem_node(pg_data_t *pgdat)
*/
unsigned long __init free_all_bootmem(void)
{
+ struct pglist_data *pgdat;
+
+ for_each_online_pgdat(pgdat)
+ reset_node_lowmem_managed_pages(pgdat);
+
/*
* We need to use MAX_NUMNODES instead of NODE_DATA(0)->node_id
* because in some case like Node0 doesn't have RAM installed
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7bb35ac..4418a04 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -730,6 +730,13 @@ static void __free_pages_ok(struct page *page, unsigned int order)
local_irq_restore(flags);
}
+/*
+ * Read access to zone->managed_pages is safe because it's unsigned long,
+ * but we still need to serialize writers. Currently all callers of
+ * __free_pages_bootmem() except put_page_bootmem() are only used at
+ * boot time. So to keep boot time short, we shift the burden onto
+ * put_page_bootmem() to serialize writers.
+ */
void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
{
unsigned int nr_pages = 1 << order;
@@ -745,6 +752,7 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
set_page_count(p, 0);
}
+ page_zone(page)->managed_pages += 1 << order;
set_page_refcounted(page);
__free_pages(page, order);
}
@@ -2950,6 +2958,7 @@ void show_free_areas(unsigned int filter)
" isolated(anon):%lukB"
" isolated(file):%lukB"
" present:%lukB"
+ " managed:%lukB"
" mlocked:%lukB"
" dirty:%lukB"
" writeback:%lukB"
@@ -2979,6 +2988,7 @@ void show_free_areas(unsigned int filter)
K(zone_page_state(zone, NR_ISOLATED_ANON)),
K(zone_page_state(zone, NR_ISOLATED_FILE)),
K(zone->present_pages),
+ K(zone->managed_pages),
K(zone_page_state(zone, NR_MLOCK)),
K(zone_page_state(zone, NR_FILE_DIRTY)),
K(zone_page_state(zone, NR_WRITEBACK)),
@@ -4455,48 +4465,54 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
for (j = 0; j < MAX_NR_ZONES; j++) {
struct zone *zone = pgdat->node_zones + j;
- unsigned long size, realsize, memmap_pages;
+ unsigned long size, realsize, freesize, memmap_pages;
size = zone_spanned_pages_in_node(nid, j, zones_size);
- realsize = size - zone_absent_pages_in_node(nid, j,
+ realsize = freesize = size - zone_absent_pages_in_node(nid, j,
zholes_size);
/*
- * Adjust realsize so that it accounts for how much memory
+ * Adjust freesize so that it accounts for how much memory
* is used by this zone for memmap. This affects the watermark
* and per-cpu initialisations
*/
memmap_pages =
PAGE_ALIGN(size * sizeof(struct page)) >> PAGE_SHIFT;
- if (realsize >= memmap_pages) {
- realsize -= memmap_pages;
+ if (freesize >= memmap_pages) {
+ freesize -= memmap_pages;
if (memmap_pages)
printk(KERN_DEBUG
" %s zone: %lu pages used for memmap\n",
zone_names[j], memmap_pages);
} else
printk(KERN_WARNING
- " %s zone: %lu pages exceeds realsize %lu\n",
- zone_names[j], memmap_pages, realsize);
+ " %s zone: %lu pages exceeds freesize %lu\n",
+ zone_names[j], memmap_pages, freesize);
/* Account for reserved pages */
- if (j == 0 && realsize > dma_reserve) {
- realsize -= dma_reserve;
+ if (j == 0 && freesize > dma_reserve) {
+ freesize -= dma_reserve;
printk(KERN_DEBUG " %s zone: %lu pages reserved\n",
zone_names[0], dma_reserve);
}
if (!is_highmem_idx(j))
- nr_kernel_pages += realsize;
- nr_all_pages += realsize;
+ nr_kernel_pages += freesize;
+ nr_all_pages += freesize;
zone->spanned_pages = size;
- zone->present_pages = realsize;
+ zone->present_pages = freesize;
+ /*
+ * Set an approximate value for lowmem here; it will be adjusted
+ * when the bootmem allocator frees pages into the buddy system.
+ * And all highmem pages will be managed by the buddy system.
+ */
+ zone->managed_pages = is_highmem_idx(j) ? realsize : freesize;
#ifdef CONFIG_NUMA
zone->node = nid;
- zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio)
+ zone->min_unmapped_pages = (freesize*sysctl_min_unmapped_ratio)
/ 100;
- zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / 100;
+ zone->min_slab_pages = (freesize * sysctl_min_slab_ratio) / 100;
#endif
zone->name = zone_names[j];
spin_lock_init(&zone->lock);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index c737057..e47d31c 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -992,14 +992,16 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
"\n high %lu"
"\n scanned %lu"
"\n spanned %lu"
- "\n present %lu",
+ "\n present %lu"
+ "\n managed %lu",
zone_page_state(zone, NR_FREE_PAGES),
min_wmark_pages(zone),
low_wmark_pages(zone),
high_wmark_pages(zone),
zone->pages_scanned,
zone->spanned_pages,
- zone->present_pages);
+ zone->present_pages,
+ zone->managed_pages);
for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
seq_printf(m, "\n %-12s %lu", vmstat_text[i],
--
1.7.9.5
^ permalink raw reply related [flat|nested] 98+ messages in thread
* [RFT PATCH v1 2/5] mm: replace zone->present_pages with zone->managed_pages if appreciated
2012-11-18 16:07 ` Jiang Liu
@ 2012-11-18 16:07 ` Jiang Liu
-1 siblings, 0 replies; 98+ messages in thread
From: Jiang Liu @ 2012-11-18 16:07 UTC (permalink / raw)
To: Andrew Morton, Wen Congyang, David Rientjes
Cc: Jiang Liu, Maciej Rutecki, Chris Clayton, Rafael J . Wysocki,
Mel Gorman, Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko,
Jianguo Wu, linux-mm, linux-kernel
Now that we have zone->managed_pages for "pages managed by the buddy
system in the zone", replace zone->present_pages with
zone->managed_pages wherever what the user really wants is the number
of pages managed by the buddy system.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
---
mm/mempolicy.c | 2 +-
mm/page_alloc.c | 32 ++++++++++++++++----------------
mm/vmscan.c | 16 ++++++++--------
mm/vmstat.c | 2 +-
4 files changed, 26 insertions(+), 26 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d04a8a5..8367070 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -147,7 +147,7 @@ static int is_valid_nodemask(const nodemask_t *nodemask)
for (k = 0; k <= policy_zone; k++) {
z = &NODE_DATA(nd)->node_zones[k];
- if (z->present_pages > 0)
+ if (z->managed_pages > 0)
return 1;
}
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a41ee64..fe1cf48 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2791,7 +2791,7 @@ static unsigned int nr_free_zone_pages(int offset)
struct zonelist *zonelist = node_zonelist(numa_node_id(), GFP_KERNEL);
for_each_zone_zonelist(zone, z, zonelist, offset) {
- unsigned long size = zone->present_pages;
+ unsigned long size = zone->managed_pages;
unsigned long high = high_wmark_pages(zone);
if (size > high)
sum += size - high;
@@ -2844,7 +2844,7 @@ void si_meminfo_node(struct sysinfo *val, int nid)
val->totalram = pgdat->node_present_pages;
val->freeram = node_page_state(nid, NR_FREE_PAGES);
#ifdef CONFIG_HIGHMEM
- val->totalhigh = pgdat->node_zones[ZONE_HIGHMEM].present_pages;
+ val->totalhigh = pgdat->node_zones[ZONE_HIGHMEM].managed_pages;
val->freehigh = zone_page_state(&pgdat->node_zones[ZONE_HIGHMEM],
NR_FREE_PAGES);
#else
@@ -3883,7 +3883,7 @@ static int __meminit zone_batchsize(struct zone *zone)
*
* OK, so we don't know how big the cache is. So guess.
*/
- batch = zone->present_pages / 1024;
+ batch = zone->managed_pages / 1024;
if (batch * PAGE_SIZE > 512 * 1024)
batch = (512 * 1024) / PAGE_SIZE;
batch /= 4; /* We effectively *= 4 below */
@@ -3967,7 +3967,7 @@ static void __meminit setup_zone_pageset(struct zone *zone)
if (percpu_pagelist_fraction)
setup_pagelist_highmark(pcp,
- (zone->present_pages /
+ (zone->managed_pages /
percpu_pagelist_fraction));
}
}
@@ -5077,8 +5077,8 @@ static void calculate_totalreserve_pages(void)
/* we treat the high watermark as reserved pages. */
max += high_wmark_pages(zone);
- if (max > zone->present_pages)
- max = zone->present_pages;
+ if (max > zone->managed_pages)
+ max = zone->managed_pages;
reserve_pages += max;
/*
* Lowmem reserves are not available to
@@ -5110,7 +5110,7 @@ static void setup_per_zone_lowmem_reserve(void)
for_each_online_pgdat(pgdat) {
for (j = 0; j < MAX_NR_ZONES; j++) {
struct zone *zone = pgdat->node_zones + j;
- unsigned long present_pages = zone->present_pages;
+ unsigned long managed_pages = zone->managed_pages;
zone->lowmem_reserve[j] = 0;
@@ -5124,9 +5124,9 @@ static void setup_per_zone_lowmem_reserve(void)
sysctl_lowmem_reserve_ratio[idx] = 1;
lower_zone = pgdat->node_zones + idx;
- lower_zone->lowmem_reserve[j] = present_pages /
+ lower_zone->lowmem_reserve[j] = managed_pages /
sysctl_lowmem_reserve_ratio[idx];
- present_pages += lower_zone->present_pages;
+ managed_pages += lower_zone->managed_pages;
}
}
}
@@ -5145,14 +5145,14 @@ static void __setup_per_zone_wmarks(void)
/* Calculate total number of !ZONE_HIGHMEM pages */
for_each_zone(zone) {
if (!is_highmem(zone))
- lowmem_pages += zone->present_pages;
+ lowmem_pages += zone->managed_pages;
}
for_each_zone(zone) {
u64 tmp;
spin_lock_irqsave(&zone->lock, flags);
- tmp = (u64)pages_min * zone->present_pages;
+ tmp = (u64)pages_min * zone->managed_pages;
do_div(tmp, lowmem_pages);
if (is_highmem(zone)) {
/*
@@ -5166,7 +5166,7 @@ static void __setup_per_zone_wmarks(void)
*/
int min_pages;
- min_pages = zone->present_pages / 1024;
+ min_pages = zone->managed_pages / 1024;
if (min_pages < SWAP_CLUSTER_MAX)
min_pages = SWAP_CLUSTER_MAX;
if (min_pages > 128)
@@ -5235,7 +5235,7 @@ static void __meminit calculate_zone_inactive_ratio(struct zone *zone)
unsigned int gb, ratio;
/* Zone size in gigabytes */
- gb = zone->present_pages >> (30 - PAGE_SHIFT);
+ gb = zone->managed_pages >> (30 - PAGE_SHIFT);
if (gb)
ratio = int_sqrt(10 * gb);
else
@@ -5321,7 +5321,7 @@ int sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write,
return rc;
for_each_zone(zone)
- zone->min_unmapped_pages = (zone->present_pages *
+ zone->min_unmapped_pages = (zone->managed_pages *
sysctl_min_unmapped_ratio) / 100;
return 0;
}
@@ -5337,7 +5337,7 @@ int sysctl_min_slab_ratio_sysctl_handler(ctl_table *table, int write,
return rc;
for_each_zone(zone)
- zone->min_slab_pages = (zone->present_pages *
+ zone->min_slab_pages = (zone->managed_pages *
sysctl_min_slab_ratio) / 100;
return 0;
}
@@ -5379,7 +5379,7 @@ int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
for_each_populated_zone(zone) {
for_each_possible_cpu(cpu) {
unsigned long high;
- high = zone->present_pages / percpu_pagelist_fraction;
+ high = zone->managed_pages / percpu_pagelist_fraction;
setup_pagelist_highmark(
per_cpu_ptr(zone->pageset, cpu), high);
}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 48550c6..7240e89 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1935,7 +1935,7 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
* a reasonable chance of completing and allocating the page
*/
balance_gap = min(low_wmark_pages(zone),
- (zone->present_pages + KSWAPD_ZONE_BALANCE_GAP_RATIO-1) /
+ (zone->managed_pages + KSWAPD_ZONE_BALANCE_GAP_RATIO-1) /
KSWAPD_ZONE_BALANCE_GAP_RATIO);
watermark = high_wmark_pages(zone) + balance_gap + (2UL << sc->order);
watermark_ok = zone_watermark_ok_safe(zone, 0, watermark, 0, 0);
@@ -2416,14 +2416,14 @@ static void age_active_anon(struct zone *zone, struct scan_control *sc)
static bool pgdat_balanced(pg_data_t *pgdat, unsigned long balanced_pages,
int classzone_idx)
{
- unsigned long present_pages = 0;
+ unsigned long managed_pages = 0;
int i;
for (i = 0; i <= classzone_idx; i++)
- present_pages += pgdat->node_zones[i].present_pages;
+ managed_pages += pgdat->node_zones[i].managed_pages;
/* A special case here: if zone has no page, we think it's balanced */
- return balanced_pages >= (present_pages >> 2);
+ return balanced_pages >= (managed_pages >> 2);
}
/*
@@ -2471,7 +2471,7 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
* is to sleep
*/
if (zone->all_unreclaimable) {
- balanced += zone->present_pages;
+ balanced += zone->managed_pages;
continue;
}
@@ -2479,7 +2479,7 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
i, 0))
all_zones_ok = false;
else
- balanced += zone->present_pages;
+ balanced += zone->managed_pages;
}
/*
@@ -2645,7 +2645,7 @@ loop_again:
* of the zone, whichever is smaller.
*/
balance_gap = min(low_wmark_pages(zone),
- (zone->present_pages +
+ (zone->managed_pages +
KSWAPD_ZONE_BALANCE_GAP_RATIO-1) /
KSWAPD_ZONE_BALANCE_GAP_RATIO);
/*
@@ -2712,7 +2712,7 @@ loop_again:
*/
zone_clear_flag(zone, ZONE_CONGESTED);
if (i <= *classzone_idx)
- balanced += zone->present_pages;
+ balanced += zone->managed_pages;
}
}
diff --git a/mm/vmstat.c b/mm/vmstat.c
index e47d31c..b2925d1 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -142,7 +142,7 @@ int calculate_normal_threshold(struct zone *zone)
* 125 1024 10 16-32 GB 9
*/
- mem = zone->present_pages >> (27 - PAGE_SHIFT);
+ mem = zone->managed_pages >> (27 - PAGE_SHIFT);
threshold = 2 * fls(num_online_cpus()) * (1 + fls(mem));
--
1.7.9.5
^ permalink raw reply related [flat|nested] 98+ messages in thread
* [RFT PATCH v1 3/5] mm: set zone->present_pages to number of existing pages in the zone
2012-11-18 16:07 ` Jiang Liu
@ 2012-11-18 16:07 ` Jiang Liu
-1 siblings, 0 replies; 98+ messages in thread
From: Jiang Liu @ 2012-11-18 16:07 UTC (permalink / raw)
To: Andrew Morton, Wen Congyang, David Rientjes
Cc: Jiang Liu, Maciej Rutecki, Chris Clayton, Rafael J . Wysocki,
Mel Gorman, Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko,
Jianguo Wu, linux-mm, linux-kernel
Now that all users that really want "number of pages managed by the buddy
system" have been converted to use zone->managed_pages, set
zone->present_pages to what it really should be:
present_pages = spanned_pages - absent_pages;
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fe1cf48..5b327d7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4494,7 +4494,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
nr_all_pages += freesize;
zone->spanned_pages = size;
- zone->present_pages = freesize;
+ zone->present_pages = realsize;
/*
* Set an approximate value for lowmem here, it will be adjusted
* when the bootmem allocator frees pages into the buddy system.
--
1.7.9.5
* [RFT PATCH v1 4/5] mm: provide more accurate estimation of pages occupied by memmap
2012-11-18 16:07 ` Jiang Liu
@ 2012-11-18 16:07 ` Jiang Liu
-1 siblings, 0 replies; 98+ messages in thread
From: Jiang Liu @ 2012-11-18 16:07 UTC (permalink / raw)
To: Andrew Morton, Wen Congyang, David Rientjes
Cc: Jiang Liu, Maciej Rutecki, Chris Clayton, Rafael J . Wysocki,
Mel Gorman, Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko,
Jianguo Wu, linux-mm, linux-kernel
If SPARSEMEM is enabled, it won't build page structures for
non-existing pages (holes) within a zone, so provide a more accurate
estimation of pages occupied by memmap if there are big holes within
the zone.
And pages for highmem zones' memmap will be allocated from lowmem,
so charge nr_kernel_pages for that.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
---
mm/page_alloc.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5b327d7..eb25679 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4435,6 +4435,22 @@ void __init set_pageblock_order(void)
#endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
+static unsigned long calc_memmap_size(unsigned long spanned_pages,
+ unsigned long present_pages)
+{
+ unsigned long pages = spanned_pages;
+
+ /*
+ * Provide a more accurate estimation if there are big holes within
+ * the zone and SPARSEMEM is in use.
+ */
+ if (spanned_pages > present_pages + (present_pages >> 4) &&
+ IS_ENABLED(CONFIG_SPARSEMEM))
+ pages = present_pages;
+
+ return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT;
+}
+
/*
* Set up the zone data structures:
* - mark all pages reserved
@@ -4469,8 +4485,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
* is used by this zone for memmap. This affects the watermark
* and per-cpu initialisations
*/
- memmap_pages =
- PAGE_ALIGN(size * sizeof(struct page)) >> PAGE_SHIFT;
+ memmap_pages = calc_memmap_size(size, realsize);
if (freesize >= memmap_pages) {
freesize -= memmap_pages;
if (memmap_pages)
@@ -4491,6 +4506,9 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
if (!is_highmem_idx(j))
nr_kernel_pages += freesize;
+ /* Charge for highmem memmap if there are enough kernel pages */
+ else if (nr_kernel_pages > memmap_pages * 2)
+ nr_kernel_pages -= memmap_pages;
nr_all_pages += freesize;
zone->spanned_pages = size;
--
1.7.9.5
* Re: [RFT PATCH v1 4/5] mm: provide more accurate estimation of pages occupied by memmap
2012-11-18 16:07 ` Jiang Liu
@ 2012-11-19 23:42 ` Andrew Morton
-1 siblings, 0 replies; 98+ messages in thread
From: Andrew Morton @ 2012-11-19 23:42 UTC (permalink / raw)
To: Jiang Liu
Cc: Wen Congyang, David Rientjes, Jiang Liu, Maciej Rutecki,
Chris Clayton, Rafael J . Wysocki, Mel Gorman, Minchan Kim,
KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu, linux-mm,
linux-kernel
On Mon, 19 Nov 2012 00:07:29 +0800
Jiang Liu <liuj97@gmail.com> wrote:
> If SPARSEMEM is enabled, it won't build page structures for
> non-existing pages (holes) within a zone, so provide a more accurate
> estimation of pages occupied by memmap if there are big holes within
> the zone.
>
> And pages for highmem zones' memmap will be allocated from lowmem,
> so charge nr_kernel_pages for that.
>
> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4435,6 +4435,22 @@ void __init set_pageblock_order(void)
>
> #endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
>
> +static unsigned long calc_memmap_size(unsigned long spanned_pages,
> + unsigned long present_pages)
> +{
> + unsigned long pages = spanned_pages;
> +
> + /*
> + * Provide a more accurate estimation if there are big holes within
> + * the zone and SPARSEMEM is in use.
> + */
> + if (spanned_pages > present_pages + (present_pages >> 4) &&
> + IS_ENABLED(CONFIG_SPARSEMEM))
> + pages = present_pages;
> +
> + return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT;
> +}
Please explain the ">> 4" heuristic more completely - preferably in both
the changelog and code comments. Why can't we calculate this
requirement exactly? That might require a second pass, but that's OK for
code like this?
* Re: [RFT PATCH v1 4/5] mm: provide more accurate estimation of pages occupied by memmap
2012-11-19 23:42 ` Andrew Morton
@ 2012-11-20 15:18 ` Jiang Liu
-1 siblings, 0 replies; 98+ messages in thread
From: Jiang Liu @ 2012-11-20 15:18 UTC (permalink / raw)
To: Andrew Morton
Cc: Wen Congyang, David Rientjes, Jiang Liu, Maciej Rutecki,
Chris Clayton, Rafael J . Wysocki, Mel Gorman, Minchan Kim,
KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu, linux-mm,
linux-kernel
On 11/20/2012 07:42 AM, Andrew Morton wrote:
> On Mon, 19 Nov 2012 00:07:29 +0800
> Jiang Liu <liuj97@gmail.com> wrote:
>
>> If SPARSEMEM is enabled, it won't build page structures for
>> non-existing pages (holes) within a zone, so provide a more accurate
>> estimation of pages occupied by memmap if there are big holes within
>> the zone.
>>
>> And pages for highmem zones' memmap will be allocated from lowmem,
>> so charge nr_kernel_pages for that.
>>
>> ...
>>
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -4435,6 +4435,22 @@ void __init set_pageblock_order(void)
>>
>> #endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
>>
>> +static unsigned long calc_memmap_size(unsigned long spanned_pages,
>> + unsigned long present_pages)
>> +{
>> + unsigned long pages = spanned_pages;
>> +
>> + /*
>> + * Provide a more accurate estimation if there are big holes within
>> + * the zone and SPARSEMEM is in use.
>> + */
>> + if (spanned_pages > present_pages + (present_pages >> 4) &&
>> + IS_ENABLED(CONFIG_SPARSEMEM))
>> + pages = present_pages;
>> +
>> + return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT;
>> +}
>
> Please explain the ">> 4" heuristic more completely - preferably in both
> the changelog and code comments. Why can't we calculate this
> requirement exactly? That might require a second pass, but that's OK for
> code like this?
Hi Andrew,
A normal x86 platform always has some holes within the DMA ZONE,
so the ">> 4" heuristic is to avoid applying this adjustment to the DMA
ZONE on x86 platforms.
Because the memmap_size is just an estimate, I feel it's OK to
remove the ">> 4" heuristic; that shouldn't affect much.
Thanks
Gerry
* Re: [RFT PATCH v1 4/5] mm: provide more accurate estimation of pages occupied by memmap
2012-11-20 15:18 ` Jiang Liu
@ 2012-11-20 19:19 ` Andrew Morton
-1 siblings, 0 replies; 98+ messages in thread
From: Andrew Morton @ 2012-11-20 19:19 UTC (permalink / raw)
To: Jiang Liu
Cc: Wen Congyang, David Rientjes, Jiang Liu, Maciej Rutecki,
Chris Clayton, Rafael J . Wysocki, Mel Gorman, Minchan Kim,
KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu, linux-mm,
linux-kernel
On Tue, 20 Nov 2012 23:18:34 +0800
Jiang Liu <liuj97@gmail.com> wrote:
> >> +static unsigned long calc_memmap_size(unsigned long spanned_pages,
> >> + unsigned long present_pages)
> >> +{
> >> + unsigned long pages = spanned_pages;
> >> +
> >> + /*
> >> + * Provide a more accurate estimation if there are big holes within
> >> + * the zone and SPARSEMEM is in use.
> >> + */
> >> + if (spanned_pages > present_pages + (present_pages >> 4) &&
> >> + IS_ENABLED(CONFIG_SPARSEMEM))
> >> + pages = present_pages;
> >> +
> >> + return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT;
> >> +}
> >
> > Please explain the ">> 4" heuristic more completely - preferably in both
> > the changelog and code comments. Why can't we calculate this
> > requirement exactly? That might require a second pass, but that's OK for
> > code like this?
> Hi Andrew,
> A normal x86 platform always has some holes within the DMA ZONE,
> so the ">> 4" heuristic is to avoid applying this adjustment to the DMA
> ZONE on x86 platforms.
> Because the memmap_size is just an estimate, I feel it's OK to
> remove the ">> 4" heuristic; that shouldn't affect much.
Again: why can't we calculate this requirement exactly? That might
require a second pass, but that's OK for code like this?
* Re: [RFT PATCH v1 4/5] mm: provide more accurate estimation of pages occupied by memmap
2012-11-20 19:19 ` Andrew Morton
@ 2012-11-21 14:52 ` Jiang Liu
-1 siblings, 0 replies; 98+ messages in thread
From: Jiang Liu @ 2012-11-21 14:52 UTC (permalink / raw)
To: Andrew Morton
Cc: Wen Congyang, David Rientjes, Jiang Liu, Maciej Rutecki,
Chris Clayton, Rafael J . Wysocki, Mel Gorman, Minchan Kim,
KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu, linux-mm,
linux-kernel
On 11/21/2012 03:19 AM, Andrew Morton wrote:
> On Tue, 20 Nov 2012 23:18:34 +0800
> Jiang Liu <liuj97@gmail.com> wrote:
>
>>>> +static unsigned long calc_memmap_size(unsigned long spanned_pages,
>>>> + unsigned long present_pages)
>>>> +{
>>>> + unsigned long pages = spanned_pages;
>>>> +
>>>> + /*
>>>> + * Provide a more accurate estimation if there are big holes within
>>>> + * the zone and SPARSEMEM is in use.
>>>> + */
>>>> + if (spanned_pages > present_pages + (present_pages >> 4) &&
>>>> + IS_ENABLED(CONFIG_SPARSEMEM))
>>>> + pages = present_pages;
>>>> +
>>>> + return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT;
>>>> +}
>>>
>>> Please explain the ">> 4" heuristic more completely - preferably in both
>>> the changelog and code comments. Why can't we calculate this
>>> requirement exactly? That might require a second pass, but that's OK for
>>> code like this?
>> Hi Andrew,
>> A normal x86 platform always has some holes within the DMA ZONE,
>> so the ">> 4" heuristic is to avoid applying this adjustment to the DMA
>> ZONE on x86 platforms.
>> Because the memmap_size is just an estimation, I feel it's OK to
>> remove the ">> 4" heuristic, that shouldn't affect much.
>
> Again: why can't we calculate this requirement exactly? That might
> require a second pass, but that's OK for code like this?
Hi Andrew,
If there are holes within a zone, it may cost us one or two extra pages
for each populated region within the zone, due to alignment, because the
memmap for each populated region may not be naturally aligned on a page
boundary. The ">> 4" heuristic was originally meant to trade off these
extra memmap pages, especially for small zones like the DMA zone.
Thanks!
* Re: [RFT PATCH v1 4/5] mm: provide more accurate estimation of pages occupied by memmap
2012-11-21 14:52 ` Jiang Liu
@ 2012-11-21 19:35 ` Andrew Morton
0 siblings, 0 replies; 98+ messages in thread
From: Andrew Morton @ 2012-11-21 19:35 UTC (permalink / raw)
To: Jiang Liu
Cc: Wen Congyang, David Rientjes, Jiang Liu, Maciej Rutecki,
Chris Clayton, Rafael J . Wysocki, Mel Gorman, Minchan Kim,
KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu, linux-mm,
linux-kernel
On Wed, 21 Nov 2012 22:52:29 +0800
Jiang Liu <liuj97@gmail.com> wrote:
> On 11/21/2012 03:19 AM, Andrew Morton wrote:
> > On Tue, 20 Nov 2012 23:18:34 +0800
> > Jiang Liu <liuj97@gmail.com> wrote:
> >
> >>>> +static unsigned long calc_memmap_size(unsigned long spanned_pages,
> >>>> + unsigned long present_pages)
> >>>> +{
> >>>> + unsigned long pages = spanned_pages;
> >>>> +
> >>>> + /*
> >>>> + * Provide a more accurate estimation if there are big holes within
> >>>> + * the zone and SPARSEMEM is in use.
> >>>> + */
> >>>> + if (spanned_pages > present_pages + (present_pages >> 4) &&
> >>>> + IS_ENABLED(CONFIG_SPARSEMEM))
> >>>> + pages = present_pages;
> >>>> +
> >>>> + return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT;
> >>>> +}
> >>>
> >>> Please explain the ">> 4" heuristic more completely - preferably in both
> >>> the changelog and code comments. Why can't we calculate this
> >>> requirement exactly? That might require a second pass, but that's OK for
> >>> code like this?
> >> Hi Andrew,
> >> A normal x86 platform always has some holes within the DMA ZONE,
> >> so the ">> 4" heuristic is to avoid applying this adjustment to the DMA
> >> ZONE on x86 platforms.
> >> Because the memmap_size is just an estimation, I feel it's OK to
> >> remove the ">> 4" heuristic, that shouldn't affect much.
> >
> > Again: why can't we calculate this requirement exactly? That might
> > require a second pass, but that's OK for code like this?
>
> Hi Andrew,
> If there are holes within a zone, it may cost us one or two extra pages
> for each populated region within the zone due to alignment, because the
> memmap for each populated region may not be naturally aligned on a page boundary.
Right. So with an additional pass across the zone and a bit of
arithmetic, we can calculate the exact space requirement for memmap?
No need for kludgy heuristics?
> Originally the ">> 4" heuristic was to trade off these extra memmap pages,
> especially for small zones like the DMA zone.
* Re: [RFT PATCH v1 4/5] mm: provide more accurate estimation of pages occupied by memmap
2012-11-21 19:35 ` Andrew Morton
@ 2012-11-22 16:17 ` Jiang Liu
0 siblings, 0 replies; 98+ messages in thread
From: Jiang Liu @ 2012-11-22 16:17 UTC (permalink / raw)
To: Andrew Morton
Cc: Wen Congyang, David Rientjes, Jiang Liu, Maciej Rutecki,
Chris Clayton, Rafael J . Wysocki, Mel Gorman, Minchan Kim,
KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu, linux-mm,
linux-kernel
On 11/22/2012 03:35 AM, Andrew Morton wrote:
> On Wed, 21 Nov 2012 22:52:29 +0800
> Jiang Liu <liuj97@gmail.com> wrote:
>
>> On 11/21/2012 03:19 AM, Andrew Morton wrote:
>>> On Tue, 20 Nov 2012 23:18:34 +0800
>>> Jiang Liu <liuj97@gmail.com> wrote:
>>>
>>>>>> +static unsigned long calc_memmap_size(unsigned long spanned_pages,
>>>>>> + unsigned long present_pages)
>>>>>> +{
>>>>>> + unsigned long pages = spanned_pages;
>>>>>> +
>>>>>> + /*
>>>>>> + * Provide a more accurate estimation if there are big holes within
>>>>>> + * the zone and SPARSEMEM is in use.
>>>>>> + */
>>>>>> + if (spanned_pages > present_pages + (present_pages >> 4) &&
>>>>>> + IS_ENABLED(CONFIG_SPARSEMEM))
>>>>>> + pages = present_pages;
>>>>>> +
>>>>>> + return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT;
>>>>>> +}
>>>>>
>>>>> Please explain the ">> 4" heuristic more completely - preferably in both
>>>>> the changelog and code comments. Why can't we calculate this
>>>>> requirement exactly? That might require a second pass, but that's OK for
>>>>> code like this?
>>>> Hi Andrew,
>>>> A normal x86 platform always has some holes within the DMA ZONE,
>>>> so the ">> 4" heuristic is to avoid applying this adjustment to the DMA
>>>> ZONE on x86 platforms.
>>>> Because the memmap_size is just an estimation, I feel it's OK to
>>>> remove the ">> 4" heuristic, that shouldn't affect much.
>>>
>>> Again: why can't we calculate this requirement exactly? That might
>>> require a second pass, but that's OK for code like this?
>>
>> Hi Andrew,
>> If there are holes within a zone, it may cost us one or two extra pages
>> for each populated region within the zone due to alignment, because the
>> memmap for each populated region may not be naturally aligned on a page boundary.
>
> Right. So with an additional pass across the zone and a bit of
> arithmetic, we can calculate the exact space requirement for memmap?
> No need for kludgy heuristics?
Hi Andrew:
Happy Thanksgiving!
The way to calculate the exact space requirement for memmap seems a little
complex; it depends on:
CONFIG_SPARSEMEM_VMEMMAP
CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
arch implementation of alloc_remap()
Actually, the original motivation was to reduce the deviation on a platform
such as:
node 0: 0-2G,4G-255G (a 2G hole between 2-4G)
node 1: 256G - 511G
node 2: 512G - 767G
node 3: 768G - 1023G
node 0: 1024G - 1026G (memory recovered from the hole)
So I just tried to reduce the deviation rather than calculate the memmap size exactly.
Regards!
Gerry
* [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap
2012-11-20 19:19 ` Andrew Morton
@ 2012-11-21 15:09 ` Jiang Liu
0 siblings, 0 replies; 98+ messages in thread
From: Jiang Liu @ 2012-11-21 15:09 UTC (permalink / raw)
To: Andrew Morton, Wen Congyang, David Rientjes
Cc: Jiang Liu, Maciej Rutecki, Chris Clayton, Rafael J . Wysocki,
Mel Gorman, Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko,
Jianguo Wu, linux-mm, linux-kernel
If SPARSEMEM is enabled, the kernel won't build page structures for
non-existing pages (holes) within a zone, so provide a more accurate
estimation of pages occupied by memmap if there are bigger holes within
the zone.
And pages for highmem zones' memmap will be allocated from lowmem, so
charge nr_kernel_pages for that.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
---
mm/page_alloc.c | 26 ++++++++++++++++++++++++--
1 file changed, 24 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fc10071..9bbac97 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4442,6 +4442,26 @@ void __init set_pageblock_order(void)
#endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
+static unsigned long calc_memmap_size(unsigned long spanned_pages,
+ unsigned long present_pages)
+{
+ unsigned long pages = spanned_pages;
+
+ /*
+ * Provide a more accurate estimation if there are holes within
+ * the zone and SPARSEMEM is in use. If there are holes within the
+ * zone, each populated memory region may cost us one or two extra
+ * memmap pages due to alignment because memmap pages for each
+ * populated region may not be naturally aligned on a page boundary.
+ * So the (present_pages >> 4) heuristic is a tradeoff for that.
+ */
+ if (spanned_pages > present_pages + (present_pages >> 4) &&
+ IS_ENABLED(CONFIG_SPARSEMEM))
+ pages = present_pages;
+
+ return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT;
+}
+
/*
* Set up the zone data structures:
* - mark all pages reserved
@@ -4476,8 +4496,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
* is used by this zone for memmap. This affects the watermark
* and per-cpu initialisations
*/
- memmap_pages =
- PAGE_ALIGN(size * sizeof(struct page)) >> PAGE_SHIFT;
+ memmap_pages = calc_memmap_size(size, realsize);
if (freesize >= memmap_pages) {
freesize -= memmap_pages;
if (memmap_pages)
@@ -4498,6 +4517,9 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
if (!is_highmem_idx(j))
nr_kernel_pages += freesize;
+ /* Charge for highmem memmap if there are enough kernel pages */
+ else if (nr_kernel_pages > memmap_pages * 2)
+ nr_kernel_pages -= memmap_pages;
nr_all_pages += freesize;
zone->spanned_pages = size;
--
1.7.9.5
* Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap
2012-11-21 15:09 ` Jiang Liu
@ 2012-11-28 23:52 ` Andrew Morton
0 siblings, 0 replies; 98+ messages in thread
From: Andrew Morton @ 2012-11-28 23:52 UTC (permalink / raw)
To: Jiang Liu
Cc: Wen Congyang, David Rientjes, Jiang Liu, Maciej Rutecki,
Chris Clayton, Rafael J . Wysocki, Mel Gorman, Minchan Kim,
KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu, linux-mm,
linux-kernel
On Wed, 21 Nov 2012 23:09:46 +0800
Jiang Liu <liuj97@gmail.com> wrote:
> Subject: Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap
How are people to test this? "does it boot"?
> If SPARSEMEM is enabled, it won't build page structures for
> non-existing pages (holes) within a zone, so provide a more accurate
> estimation of pages occupied by memmap if there are bigger holes within
> the zone.
>
> And pages for highmem zones' memmap will be allocated from lowmem, so
> charge nr_kernel_pages for that.
>
> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4442,6 +4442,26 @@ void __init set_pageblock_order(void)
>
> #endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
>
> +static unsigned long calc_memmap_size(unsigned long spanned_pages,
> + unsigned long present_pages)
> +{
> + unsigned long pages = spanned_pages;
> +
> + /*
> + * Provide a more accurate estimation if there are holes within
> + * the zone and SPARSEMEM is in use. If there are holes within the
> + * zone, each populated memory region may cost us one or two extra
> + * memmap pages due to alignment because memmap pages for each
> + * populated region may not be naturally aligned on a page boundary.
> + * So the (present_pages >> 4) heuristic is a tradeoff for that.
> + */
> + if (spanned_pages > present_pages + (present_pages >> 4) &&
> + IS_ENABLED(CONFIG_SPARSEMEM))
> + pages = present_pages;
> +
> + return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT;
> +}
> +
I spose we should do this, although it makes no difference as the
compiler will inline calc_memmap_size() into its caller:
--- a/mm/page_alloc.c~mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap-fix
+++ a/mm/page_alloc.c
@@ -4526,8 +4526,8 @@ void __init set_pageblock_order(void)
#endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
-static unsigned long calc_memmap_size(unsigned long spanned_pages,
- unsigned long present_pages)
+static unsigned long __paginginit calc_memmap_size(unsigned long spanned_pages,
+ unsigned long present_pages)
{
unsigned long pages = spanned_pages;
* Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap
2012-11-28 23:52 ` Andrew Morton
@ 2012-11-29 2:25 ` Jianguo Wu
0 siblings, 0 replies; 98+ messages in thread
From: Jianguo Wu @ 2012-11-29 2:25 UTC (permalink / raw)
To: Andrew Morton
Cc: Jiang Liu, Wen Congyang, David Rientjes, Jiang Liu,
Maciej Rutecki, Chris Clayton, Rafael J . Wysocki, Mel Gorman,
Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko, linux-mm,
linux-kernel
On 2012/11/29 7:52, Andrew Morton wrote:
> On Wed, 21 Nov 2012 23:09:46 +0800
> Jiang Liu <liuj97@gmail.com> wrote:
>
>> Subject: Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap
>
> How are people to test this? "does it boot"?
>
I have tested this on x86_64; it does boot.
Node 0, zone DMA
pages free 3972
min 1
low 1
high 1
scanned 0
spanned 4080
present 3979
managed 3972
Node 0, zone DMA32
pages free 448783
min 172
low 215
high 258
scanned 0
spanned 1044480
present 500799
managed 444545
Node 0, zone Normal
pages free 2375547
min 1394
low 1742
high 2091
scanned 0
spanned 3670016
present 3670016
managed 3585105
Thanks,
Jianguo Wu
>> If SPARSEMEM is enabled, it won't build page structures for
>> non-existing pages (holes) within a zone, so provide a more accurate
>> estimation of pages occupied by memmap if there are bigger holes within
>> the zone.
>>
>> And pages for highmem zones' memmap will be allocated from lowmem, so
>> charge nr_kernel_pages for that.
>>
>> ...
>>
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -4442,6 +4442,26 @@ void __init set_pageblock_order(void)
>>
>> #endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
>>
>> +static unsigned long calc_memmap_size(unsigned long spanned_pages,
>> + unsigned long present_pages)
>> +{
>> + unsigned long pages = spanned_pages;
>> +
>> + /*
>> + * Provide a more accurate estimation if there are holes within
>> + * the zone and SPARSEMEM is in use. If there are holes within the
>> + * zone, each populated memory region may cost us one or two extra
>> + * memmap pages due to alignment because memmap pages for each
>> + * populated region may not be naturally aligned on a page boundary.
>> + * So the (present_pages >> 4) heuristic is a tradeoff for that.
>> + */
>> + if (spanned_pages > present_pages + (present_pages >> 4) &&
>> + IS_ENABLED(CONFIG_SPARSEMEM))
>> + pages = present_pages;
>> +
>> + return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT;
>> +}
>> +
>
> I spose we should do this, although it makes no difference as the
> compiler will inline calc_memmap_size() into its caller:
>
> --- a/mm/page_alloc.c~mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap-fix
> +++ a/mm/page_alloc.c
> @@ -4526,8 +4526,8 @@ void __init set_pageblock_order(void)
>
> #endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
>
> -static unsigned long calc_memmap_size(unsigned long spanned_pages,
> - unsigned long present_pages)
> +static unsigned long __paginginit calc_memmap_size(unsigned long spanned_pages,
> + unsigned long present_pages)
> {
> unsigned long pages = spanned_pages;
>
>
>
> .
>
^ permalink raw reply [flat|nested] 98+ messages in thread
* Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap
@ 2012-11-29 2:25 ` Jianguo Wu
0 siblings, 0 replies; 98+ messages in thread
From: Jianguo Wu @ 2012-11-29 2:25 UTC (permalink / raw)
To: Andrew Morton
Cc: Jiang Liu, Wen Congyang, David Rientjes, Jiang Liu,
Maciej Rutecki, Chris Clayton, Rafael J . Wysocki, Mel Gorman,
Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko, linux-mm,
linux-kernel
On 2012/11/29 7:52, Andrew Morton wrote:
> On Wed, 21 Nov 2012 23:09:46 +0800
> Jiang Liu <liuj97@gmail.com> wrote:
>
>> Subject: Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap
>
> How are people to test this? "does it boot"?
>
I have tested this in x86_64, it does boot.
Node 0, zone DMA
pages free 3972
min 1
low 1
high 1
scanned 0
spanned 4080
present 3979
managed 3972
Node 0, zone DMA32
pages free 448783
min 172
low 215
high 258
scanned 0
spanned 1044480
present 500799
managed 444545
Node 0, zone Normal
pages free 2375547
min 1394
low 1742
high 2091
scanned 0
spanned 3670016
present 3670016
managed 3585105
Thanks,
Jianguo Wu
>> If SPARSEMEM is enabled, it won't build page structures for
>> non-existing pages (holes) within a zone, so provide a more accurate
>> estimation of pages occupied by memmap if there are bigger holes within
>> the zone.
>>
>> And pages for highmem zones' memmap will be allocated from lowmem, so
>> charge nr_kernel_pages for that.
>>
>> ...
>>
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -4442,6 +4442,26 @@ void __init set_pageblock_order(void)
>>
>> #endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
>>
>> +static unsigned long calc_memmap_size(unsigned long spanned_pages,
>> + unsigned long present_pages)
>> +{
>> + unsigned long pages = spanned_pages;
>> +
>> + /*
>> + * Provide a more accurate estimation if there are holes within
>> + * the zone and SPARSEMEM is in use. If there are holes within the
>> + * zone, each populated memory region may cost us one or two extra
>> + * memmap pages due to alignment because memmap pages for each
>> + * populated region may not be naturally aligned on page boundaries.
>> + * So the (present_pages >> 4) heuristic is a tradeoff for that.
>> + */
>> + if (spanned_pages > present_pages + (present_pages >> 4) &&
>> + IS_ENABLED(CONFIG_SPARSEMEM))
>> + pages = present_pages;
>> +
>> + return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT;
>> +}
>> +
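As a sanity check, here is a Python transcription of the hunk above, applied to the DMA32 figures from Jianguo's report. The constants are assumptions for x86_64 (4 KiB pages, 64-byte struct page), not values taken from this thread:

```python
PAGE_SHIFT = 12                 # assumption: 4 KiB pages (x86)
PAGE_SIZE = 1 << PAGE_SHIFT
STRUCT_PAGE_SIZE = 64           # assumption: sizeof(struct page) on x86_64

def page_align(nbytes):
    return (nbytes + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)

def calc_memmap_size(spanned_pages, present_pages, sparsemem=True):
    pages = spanned_pages
    # With SPARSEMEM, holes cost no memmap, so estimate from present_pages
    # once the holes exceed roughly 1/16 of present memory.
    if sparsemem and spanned_pages > present_pages + (present_pages >> 4):
        pages = present_pages
    return page_align(pages * STRUCT_PAGE_SIZE) >> PAGE_SHIFT

# DMA32 zone above: spanned 1044480, present 500799 -> big hole,
# so the estimate is based on present_pages.
print(calc_memmap_size(1044480, 500799))   # -> 7825 memmap pages
# Normal zone has no holes, so the two estimates coincide.
print(calc_memmap_size(3670016, 3670016))  # -> 57344 memmap pages
```

Under these assumptions the SPARSEMEM branch roughly halves the DMA32 memmap estimate compared to using spanned_pages, which is the inaccuracy the patch is correcting.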
>
> I spose we should do this, although it makes no difference as the
> compiler will inline calc_memmap_size() into its caller:
>
> --- a/mm/page_alloc.c~mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap-fix
> +++ a/mm/page_alloc.c
> @@ -4526,8 +4526,8 @@ void __init set_pageblock_order(void)
>
> #endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
>
> -static unsigned long calc_memmap_size(unsigned long spanned_pages,
> - unsigned long present_pages)
> +static unsigned long __paginginit calc_memmap_size(unsigned long spanned_pages,
> + unsigned long present_pages)
> {
> unsigned long pages = spanned_pages;
>
>
>
> .
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
^ permalink raw reply [flat|nested] 98+ messages in thread
* Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap
2012-11-28 23:52 ` Andrew Morton
@ 2012-11-29 10:52 ` Chris Clayton
-1 siblings, 0 replies; 98+ messages in thread
From: Chris Clayton @ 2012-11-29 10:52 UTC (permalink / raw)
To: Andrew Morton
Cc: Jiang Liu, Wen Congyang, David Rientjes, Jiang Liu,
Maciej Rutecki, Rafael J . Wysocki, Mel Gorman, Minchan Kim,
KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu, linux-mm,
linux-kernel
On 11/28/12 23:52, Andrew Morton wrote:
> On Wed, 21 Nov 2012 23:09:46 +0800
> Jiang Liu <liuj97@gmail.com> wrote:
>
>> Subject: Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap
>
> How are people to test this? "does it boot"?
>
I've been running kernels with Gerry's 5 patches applied for 11 days
now. This is on a 64bit laptop but with a 32bit kernel + HIGHMEM. I
joined the conversation because my laptop would not resume from suspend
to disk - it either froze or rebooted. With the patches applied the
laptop does successfully resume and has been stable.
Since Monday, I have been running a kernel with the patches (plus,
from today, the patch you mailed yesterday) applied to 3.7-rc7, without
problems.
Thanks,
Chris
^ permalink raw reply [flat|nested] 98+ messages in thread
* Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap
2012-11-29 10:52 ` Chris Clayton
@ 2012-12-02 19:55 ` Chris Clayton
-1 siblings, 0 replies; 98+ messages in thread
From: Chris Clayton @ 2012-12-02 19:55 UTC (permalink / raw)
To: Andrew Morton
Cc: Jiang Liu, Wen Congyang, David Rientjes, Jiang Liu,
Maciej Rutecki, Rafael J . Wysocki, Mel Gorman, Minchan Kim,
KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu, linux-mm,
linux-kernel
On 11/29/12 10:52, Chris Clayton wrote:
> On 11/28/12 23:52, Andrew Morton wrote:
>> On Wed, 21 Nov 2012 23:09:46 +0800
>> Jiang Liu <liuj97@gmail.com> wrote:
>>
>>> Subject: Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation
>>> of pages occupied by memmap
>>
>> How are people to test this? "does it boot"?
>>
>
> I've been running kernels with Gerry's 5 patches applied for 11 days
> now. This is on a 64bit laptop but with a 32bit kernel + HIGHMEM. I
> joined the conversation because my laptop would not resume from suspend
> to disk - it either froze or rebooted. With the patches applied the
> laptop does successfully resume and has been stable.
>
> Since Monday, I have been running a kernel with the patches (plus,
> from today, the patch you mailed yesterday) applied to 3.7-rc7, without
> problems.
>
I've been running 3.7-rc7 with the patches listed below for a week now
and it has been perfectly stable. In particular, my laptop will now
successfully resume from suspend to disk, which always failed without
the patches.
From Jiang Liu:
1. [RFT PATCH v2 1/5] mm: introduce new field "managed_pages" to struct zone
2. [RFT PATCH v1 2/5] mm: replace zone->present_pages with
zone->managed_pages if appreciated
3. [RFT PATCH v1 3/5] mm: set zone->present_pages to number of existing
pages in the zone
4. [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages
occupied by memmap
5. [RFT PATCH v1 5/5] mm: increase totalram_pages when free pages
allocated by bootmem allocator
From Andrew Morton:
6. mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap.patch
Tested-by: Chris Clayton <chris2553@googlemail.com>
> Thanks,
> Chris
>
>
^ permalink raw reply [flat|nested] 98+ messages in thread
* Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap
2012-12-02 19:55 ` Chris Clayton
@ 2012-12-03 7:26 ` Chris Clayton
-1 siblings, 0 replies; 98+ messages in thread
From: Chris Clayton @ 2012-12-03 7:26 UTC (permalink / raw)
To: Andrew Morton
Cc: Jiang Liu, Wen Congyang, David Rientjes, Jiang Liu,
Maciej Rutecki, Rafael J . Wysocki, Mel Gorman, Minchan Kim,
KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu, linux-mm,
linux-kernel
On 12/02/12 19:55, Chris Clayton wrote:
>
>
> On 11/29/12 10:52, Chris Clayton wrote:
>> On 11/28/12 23:52, Andrew Morton wrote:
>>> On Wed, 21 Nov 2012 23:09:46 +0800
>>> Jiang Liu <liuj97@gmail.com> wrote:
>>>
>>>> Subject: Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation
>>>> of pages occupied by memmap
>>>
>>> How are people to test this? "does it boot"?
>>>
>>
>> I've been running kernels with Gerry's 5 patches applied for 11 days
>> now. This is on a 64bit laptop but with a 32bit kernel + HIGHMEM. I
>> joined the conversation because my laptop would not resume from suspend
>> to disk - it either froze or rebooted. With the patches applied the
>> laptop does successfully resume and has been stable.
>>
>> Since Monday, I have been running a kernel with the patches (plus,
>> from today, the patch you mailed yesterday) applied to 3.7-rc7, without
>> problems.
>>
>
> I've been running 3.7-rc7 with the patches listed below for a week now
> and it has been perfectly stable. In particular, my laptop will now
> successfully resume from suspend to disk, which always failed without
> the patches.
>
I should have said, of course, that it was -rc6 and earlier that would
not boot without Jiang Liu's patches. I applied those patches to -rc6
and my resume after suspend to disk problem was fixed. For a subsequent
week I have been running with the patches applied to -rc7, with Andrew's
patch also applied for the last 3 days. -rc7 was not subject to the
resume problem because the patch which broke it had been reverted.
All this has been on a 64bit laptop, but running a 32bit kernel with
HIGHMEM.
Apologies for yesterday's inaccuracy. I shouldn't send testing reports
when I'm in a hurry.
> From Jiang Liu:
> 1. [RFT PATCH v2 1/5] mm: introduce new field "managed_pages" to struct
> zone
> 2. [RFT PATCH v1 2/5] mm: replace zone->present_pages with
> zone->managed_pages if appreciated
> 3. [RFT PATCH v1 3/5] mm: set zone->present_pages to number of existing
> pages in the zone
> 4. [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages
> occupied by memmap
> 5. [RFT PATCH v1 5/5] mm: increase totalram_pages when free pages
> allocated by bootmem allocator
>
> From Andrew Morton:
> 6. mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap.patch
>
> Tested-by: Chris Clayton <chris2553@googlemail.com>
>
>> Thanks,
>> Chris
>>
>>
^ permalink raw reply [flat|nested] 98+ messages in thread
* Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap
2012-12-02 19:55 ` Chris Clayton
@ 2012-12-03 23:17 ` Andrew Morton
-1 siblings, 0 replies; 98+ messages in thread
From: Andrew Morton @ 2012-12-03 23:17 UTC (permalink / raw)
To: Chris Clayton
Cc: Jiang Liu, Wen Congyang, David Rientjes, Jiang Liu,
Maciej Rutecki, Rafael J . Wysocki, Mel Gorman, Minchan Kim,
KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu, linux-mm,
linux-kernel
On Sun, 02 Dec 2012 19:55:09 +0000
Chris Clayton <chris2553@googlemail.com> wrote:
>
>
> On 11/29/12 10:52, Chris Clayton wrote:
> > On 11/28/12 23:52, Andrew Morton wrote:
> >> On Wed, 21 Nov 2012 23:09:46 +0800
> >> Jiang Liu <liuj97@gmail.com> wrote:
> >>
> >>> Subject: Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation
> >>> of pages occupied by memmap
> >>
> >> How are people to test this? "does it boot"?
> >>
> >
> > I've been running kernels with Gerry's 5 patches applied for 11 days
> > now. This is on a 64bit laptop but with a 32bit kernel + HIGHMEM. I
> > joined the conversation because my laptop would not resume from suspend
> > to disk - it either froze or rebooted. With the patches applied the
> > laptop does successfully resume and has been stable.
> >
> > Since Monday, I have been running a kernel with the patches (plus,
> > from today, the patch you mailed yesterday) applied to 3.7-rc7, without
> > problems.
> >
>
> I've been running 3.7-rc7 with the patches listed below for a week now
> and it has been perfectly stable. In particular, my laptop will now
> successfully resume from suspend to disk, which always failed without
> the patches.
>
> From Jiang Liu:
> 1. [RFT PATCH v2 1/5] mm: introduce new field "managed_pages" to struct zone
> 2. [RFT PATCH v1 2/5] mm: replace zone->present_pages with
> zone->managed_pages if appreciated
> 3. [RFT PATCH v1 3/5] mm: set zone->present_pages to number of existing
> pages in the zone
> 4. [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages
> occupied by memmap
> 5. [RFT PATCH v1 5/5] mm: increase totalram_pages when free pages
> allocated by bootmem allocator
>
> From Andrew Morton:
> 6. mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap.patch
>
> Tested-by: Chris Clayton <chris2553@googlemail.com>
Thanks.
I have only two of these five patches queued for 3.8:
mm-introduce-new-field-managed_pages-to-struct-zone.patch and
mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap.patch.
I don't recall what happened with the other three.
^ permalink raw reply [flat|nested] 98+ messages in thread
* Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap
2012-12-03 23:17 ` Andrew Morton
@ 2012-12-04 1:21 ` Jiang Liu
-1 siblings, 0 replies; 98+ messages in thread
From: Jiang Liu @ 2012-12-04 1:21 UTC (permalink / raw)
To: Andrew Morton
Cc: Chris Clayton, Jiang Liu, Wen Congyang, David Rientjes,
Maciej Rutecki, Rafael J . Wysocki, Mel Gorman, Minchan Kim,
KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu, linux-mm,
linux-kernel
On 2012-12-4 7:17, Andrew Morton wrote:
> On Sun, 02 Dec 2012 19:55:09 +0000
> Chris Clayton <chris2553@googlemail.com> wrote:
>
>>
>>
>> On 11/29/12 10:52, Chris Clayton wrote:
>>> On 11/28/12 23:52, Andrew Morton wrote:
>>>> On Wed, 21 Nov 2012 23:09:46 +0800
>>>> Jiang Liu <liuj97@gmail.com> wrote:
>>>>
>>>>> Subject: Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation
>>>>> of pages occupied by memmap
>>>>
>>>> How are people to test this? "does it boot"?
>>>>
>>>
>>> I've been running kernels with Gerry's 5 patches applied for 11 days
>>> now. This is on a 64bit laptop but with a 32bit kernel + HIGHMEM. I
>>> joined the conversation because my laptop would not resume from suspend
>>> to disk - it either froze or rebooted. With the patches applied the
>>> laptop does successfully resume and has been stable.
>>>
>>> Since Monday, I have been running a kernel with the patches (plus,
>>> from today, the patch you mailed yesterday) applied to 3.7-rc7, without
>>> problems.
>>>
>>
>> I've been running 3.7-rc7 with the patches listed below for a week now
>> and it has been perfectly stable. In particular, my laptop will now
>> successfully resume from suspend to disk, which always failed without
>> the patches.
>>
>> From Jiang Liu:
>> 1. [RFT PATCH v2 1/5] mm: introduce new field "managed_pages" to struct zone
>> 2. [RFT PATCH v1 2/5] mm: replace zone->present_pages with
>> zone->managed_pages if appreciated
>> 3. [RFT PATCH v1 3/5] mm: set zone->present_pages to number of existing
>> pages in the zone
>> 4. [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages
>> occupied by memmap
>> 5. [RFT PATCH v1 5/5] mm: increase totalram_pages when free pages
>> allocated by bootmem allocator
>>
>> From Andrew Morton:
>> 6. mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap.patch
>>
>> Tested-by: Chris Clayton <chris2553@googlemail.com>
>
> Thanks.
>
> I have only two of these five patches queued for 3.8:
> mm-introduce-new-field-managed_pages-to-struct-zone.patch and
> mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap.patch.
> I don't recall what happened with the other three.
Hi Andrew,
I think the story is that:
1) I sent out all five patches for the first version.
2) I later updated and resent the first and fourth patches according to
review comments, without touching the other three patches.
So you may have just picked up the updated two without noticing the
other three from the initial patchset.
Sorry for the trouble.
Regards!
Gerry
^ permalink raw reply [flat|nested] 98+ messages in thread
* Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap
2012-12-03 23:17 ` Andrew Morton
@ 2012-12-04 10:05 ` Chris Clayton
-1 siblings, 0 replies; 98+ messages in thread
From: Chris Clayton @ 2012-12-04 10:05 UTC (permalink / raw)
To: Andrew Morton
Cc: Jiang Liu, Wen Congyang, David Rientjes, Jiang Liu,
Maciej Rutecki, Rafael J . Wysocki, Mel Gorman, Minchan Kim,
KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu, linux-mm,
linux-kernel
On 12/03/12 23:17, Andrew Morton wrote:
> On Sun, 02 Dec 2012 19:55:09 +0000
> Chris Clayton <chris2553@googlemail.com> wrote:
>
>>
>>
>> On 11/29/12 10:52, Chris Clayton wrote:
>>> On 11/28/12 23:52, Andrew Morton wrote:
>>>> On Wed, 21 Nov 2012 23:09:46 +0800
>>>> Jiang Liu <liuj97@gmail.com> wrote:
>>>>
>>>>> Subject: Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation
>>>>> of pages occupied by memmap
>>>>
>>>> How are people to test this? "does it boot"?
>>>>
>>>
>>> I've been running kernels with Gerry's 5 patches applied for 11 days
>>> now. This is on a 64bit laptop but with a 32bit kernel + HIGHMEM. I
>>> joined the conversation because my laptop would not resume from suspend
>>> to disk - it either froze or rebooted. With the patches applied the
>>> laptop does successfully resume and has been stable.
>>>
>>> Since Monday, I have been running a kernel with the patches (plus,
>>> from today, the patch you mailed yesterday) applied to 3.7-rc7, without
>>> problems.
>>>
>>
>> I've been running 3.7-rc7 with the patches listed below for a week now
>> and it has been perfectly stable. In particular, my laptop will now
>> successfully resume from suspend to disk, which always failed without
>> the patches.
>>
>> From Jiang Liu:
>> 1. [RFT PATCH v2 1/5] mm: introduce new field "managed_pages" to struct zone
>> 2. [RFT PATCH v1 2/5] mm: replace zone->present_pages with
>> zone->managed_pages if appreciated
>> 3. [RFT PATCH v1 3/5] mm: set zone->present_pages to number of existing
>> pages in the zone
>> 4. [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages
>> occupied by memmap
>> 5. [RFT PATCH v1 5/5] mm: increase totalram_pages when free pages
>> allocated by bootmem allocator
>>
>> From Andrew Morton:
>> 6. mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap.patch
>>
>> Tested-by: Chris Clayton <chris2553@googlemail.com>
>
> Thanks.
>
> I have only two of these five patches queued for 3.8:
> mm-introduce-new-field-managed_pages-to-struct-zone.patch and
> mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap.patch.
> I don't recall what happened with the other three.
>
Gerry posted version 1 of the five patches on 18 November. Version 2 of
the first and fourth patches were posted on 21 November.
BTW, the title of Andrew's patch that I applied is wrong above. It
should be
mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap-fix.
Chris
^ permalink raw reply [flat|nested] 98+ messages in thread
* Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap
@ 2012-12-04 10:05 ` Chris Clayton
0 siblings, 0 replies; 98+ messages in thread
From: Chris Clayton @ 2012-12-04 10:05 UTC (permalink / raw)
To: Andrew Morton
Cc: Jiang Liu, Wen Congyang, David Rientjes, Jiang Liu,
Maciej Rutecki, Rafael J . Wysocki, Mel Gorman, Minchan Kim,
KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu, linux-mm,
linux-kernel
On 12/03/12 23:17, Andrew Morton wrote:
> On Sun, 02 Dec 2012 19:55:09 +0000
> Chris Clayton <chris2553@googlemail.com> wrote:
>
>>
>>
>> On 11/29/12 10:52, Chris Clayton wrote:
>>> On 11/28/12 23:52, Andrew Morton wrote:
>>>> On Wed, 21 Nov 2012 23:09:46 +0800
>>>> Jiang Liu <liuj97@gmail.com> wrote:
>>>>
>>>>> Subject: Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation
>>>>> of pages occupied by memmap
>>>>
>>>> How are people to test this? "does it boot"?
>>>>
>>>
>>> I've been running kernels with Gerry's 5 patches applied for 11 days
>>> now. This is on a 64bit laptop but with a 32bit kernel + HIGHMEM. I
>>> joined the conversation because my laptop would not resume from suspend
>>> to disk - it either froze or rebooted. With the patches applied the
>>> laptop does successfully resume and has been stable.
>>>
>>> Since Monday, I have have been running a kernel with the patches (plus,
>>> from today, the patch you mailed yesterday) applied to 3.7rc7, without
>>> problems.
>>>
>>
>> I've been running 3.7-rc7 with the patches listed below for a week now
>> and it has been perfectly stable. In particular, my laptop will now
>> successfully resume from suspend to disk, which always failed without
>> the patches.
>>
>> From Jiang Liu:
>> 1. [RFT PATCH v2 1/5] mm: introduce new field "managed_pages" to struct zone
>> 2. [RFT PATCH v1 2/5] mm: replace zone->present_pages with
>> zone->managed_pages if appreciated
>> 3. [RFT PATCH v1 3/5] mm: set zone->present_pages to number of existing
>> pages in the zone
>> 4. [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages
>> occupied by memmap
>> 5. [RFT PATCH v1 5/5] mm: increase totalram_pages when free pages
>> allocated by bootmem allocator
>>
>> From Andrew Morton:
>> 6. mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap.patch
>>
>> Tested-by: Chris Clayton <chris2553@googlemail.com>
>
> Thanks.
>
> I have only two of these five patches queued for 3.8:
> mm-introduce-new-field-managed_pages-to-struct-zone.patch and
> mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap.patch.
> I don't recall what happened with the other three.
>
Gerry posted version 1 of the five patches on 18 November. Version 2 of
the first and fourth patches were posted on 21 November.
BTW, the title of Andrew's patch that I applied is wrong above. It
should be
mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap-fix.
Chris
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org
^ permalink raw reply [flat|nested] 98+ messages in thread
* Re: [RFT PATCH v1 4/5] mm: provide more accurate estimation of pages occupied by memmap
2012-11-18 16:07 ` Jiang Liu
@ 2012-11-20 2:15 ` Jaegeuk Hanse
-1 siblings, 0 replies; 98+ messages in thread
From: Jaegeuk Hanse @ 2012-11-20 2:15 UTC (permalink / raw)
To: Jiang Liu
Cc: Andrew Morton, Wen Congyang, David Rientjes, Jiang Liu,
Maciej Rutecki, Chris Clayton, Rafael J . Wysocki, Mel Gorman,
Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu,
linux-mm, linux-kernel
On 11/19/2012 12:07 AM, Jiang Liu wrote:
> If SPARSEMEM is enabled, it won't build page structures for
> non-existing pages (holes) within a zone, so provide a more accurate
> estimation of pages occupied by memmap if there are big holes within
> the zone.
>
> And pages for highmem zones' memmap will be allocated from lowmem,
> so charge nr_kernel_pages for that.
>
> Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
> ---
> mm/page_alloc.c | 22 ++++++++++++++++++++--
> 1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5b327d7..eb25679 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4435,6 +4435,22 @@ void __init set_pageblock_order(void)
>
> #endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
>
> +static unsigned long calc_memmap_size(unsigned long spanned_pages,
> + unsigned long present_pages)
> +{
> + unsigned long pages = spanned_pages;
> +
> + /*
> + * Provide a more accurate estimation if there are big holes within
> + * the zone and SPARSEMEM is in use.
> + */
> + if (spanned_pages > present_pages + (present_pages >> 4) &&
> + IS_ENABLED(CONFIG_SPARSEMEM))
> + pages = present_pages;
> +
> + return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT;
> +}
> +
> /*
> * Set up the zone data structures:
> * - mark all pages reserved
> @@ -4469,8 +4485,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
> * is used by this zone for memmap. This affects the watermark
> * and per-cpu initialisations
> */
> - memmap_pages =
> - PAGE_ALIGN(size * sizeof(struct page)) >> PAGE_SHIFT;
> + memmap_pages = calc_memmap_size(size, realsize);
> if (freesize >= memmap_pages) {
> freesize -= memmap_pages;
> if (memmap_pages)
> @@ -4491,6 +4506,9 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>
> if (!is_highmem_idx(j))
> nr_kernel_pages += freesize;
> + /* Charge for highmem memmap if there are enough kernel pages */
> + else if (nr_kernel_pages > memmap_pages * 2)
> + nr_kernel_pages -= memmap_pages;
Since this is in the else branch, what happens if nr_kernel_pages is
still 0 at this point during initialization?
> nr_all_pages += freesize;
>
> zone->spanned_pages = size;
^ permalink raw reply [flat|nested] 98+ messages in thread
* [RFT PATCH v1 5/5] mm: increase totalram_pages when free pages allocated by bootmem allocator
2012-11-18 16:07 ` Jiang Liu
@ 2012-11-18 16:07 ` Jiang Liu
-1 siblings, 0 replies; 98+ messages in thread
From: Jiang Liu @ 2012-11-18 16:07 UTC (permalink / raw)
To: Andrew Morton, Wen Congyang, David Rientjes
Cc: Jiang Liu, Maciej Rutecki, Chris Clayton, Rafael J . Wysocki,
Mel Gorman, Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko,
Jianguo Wu, linux-mm, linux-kernel
Function put_page_bootmem() is used to free pages allocated by bootmem
allocator to the buddy system, so it should increase totalram_pages
when freeing pages.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
---
mm/memory_hotplug.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index acd3993..6b07e0c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -119,6 +119,7 @@ void __ref put_page_bootmem(struct page *page)
mutex_lock(&ppb_lock);
__free_pages_bootmem(page, 0);
mutex_unlock(&ppb_lock);
+ totalram_pages++;
}
}
--
1.7.9.5
^ permalink raw reply related [flat|nested] 98+ messages in thread
* Re: [RFT PATCH v1 0/5] fix up inaccurate zone->present_pages
2012-11-18 16:07 ` Jiang Liu
@ 2012-11-18 20:36 ` Chris Clayton
-1 siblings, 0 replies; 98+ messages in thread
From: Chris Clayton @ 2012-11-18 20:36 UTC (permalink / raw)
To: Jiang Liu
Cc: Andrew Morton, Wen Congyang, David Rientjes, Jiang Liu,
Maciej Rutecki, Rafael J . Wysocki, Mel Gorman, Minchan Kim,
KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu, linux-mm,
linux-kernel
Hi Gerry.
On 11/18/12 16:07, Jiang Liu wrote:
> Commit 7f1290f2f2a4 ("mm: fix-up zone present pages") tried to
> resolve an issue caused by an inaccurate zone->present_pages, but the
> fix was incomplete and caused regressions with HIGHMEM, so it has been
> reverted by commit
> 5576646 revert "mm: fix-up zone present pages"
>
> This is a follow-up patchset for the issue above. It introduces a
> new field named "managed_pages" to struct zone, which counts the
> pages in the zone managed by the buddy system, while
> zone->present_pages counts the pages existing in the zone, i.e.
> spanned_pages - absent_pages.
>
> That way, zone->present_pages is kept consistent with
> pgdat->node_present_pages, which is the sum of zone->present_pages
> over all zones in the node.
>
> This patchset has only been tested on x86_64 with nobootmem.c, so I
> need help testing it on machines that:
> 1) use bootmem.c
> 2) have highmem
>
> This patchset applies to "f4a75d2e Linux 3.7-rc6" from
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>
I've applied the five patches to Linus' 3.7.0-rc6 and can confirm that
the kernel allows my system to resume from a suspend to disc. Although
my laptop is 64 bit, I run a 32 bit kernel with HIGHMEM (I have 8GB RAM):
[chris:~/kernel/tmp/linux-3.7-rc6-resume]$ grep -E HIGHMEM\|X86_32 .config
CONFIG_X86_32=y
CONFIG_X86_32_SMP=y
CONFIG_X86_32_LAZY_GS=y
# CONFIG_X86_32_IRIS is not set
# CONFIG_NOHIGHMEM is not set
# CONFIG_HIGHMEM4G is not set
CONFIG_HIGHMEM64G=y
CONFIG_HIGHMEM=y
I can also say that a quick browse of the dmesg output shows nothing
out of the ordinary. I have insufficient knowledge to comment on the
patches, but I will run the kernel over the next few days and report
back later in the week.
Chris
> Any comments and help are welcome!
>
> Jiang Liu (5):
> mm: introduce new field "managed_pages" to struct zone
> mm: replace zone->present_pages with zone->managed_pages if
> appreciated
> mm: set zone->present_pages to number of existing pages in the zone
> mm: provide more accurate estimation of pages occupied by memmap
> mm: increase totalram_pages when free pages allocated by bootmem
> allocator
>
> include/linux/mmzone.h | 1 +
> mm/bootmem.c | 14 ++++++++
> mm/memory_hotplug.c | 6 ++++
> mm/mempolicy.c | 2 +-
> mm/nobootmem.c | 15 ++++++++
> mm/page_alloc.c | 89 +++++++++++++++++++++++++++++++-----------------
> mm/vmscan.c | 16 ++++-----
> mm/vmstat.c | 8 +++--
> 8 files changed, 108 insertions(+), 43 deletions(-)
>
^ permalink raw reply [flat|nested] 98+ messages in thread
* Re: [RFT PATCH v1 0/5] fix up inaccurate zone->present_pages
2012-11-18 20:36 ` Chris Clayton
@ 2012-11-22 9:23 ` Chris Clayton
-1 siblings, 0 replies; 98+ messages in thread
From: Chris Clayton @ 2012-11-22 9:23 UTC (permalink / raw)
To: Chris Clayton
Cc: Jiang Liu, Andrew Morton, Wen Congyang, David Rientjes,
Jiang Liu, Maciej Rutecki, Rafael J . Wysocki, Mel Gorman,
Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu,
linux-mm, linux-kernel
>> This patchset has only been tested on x86_64 with nobootmem.c, so I
>> need help testing it on machines that:
>> 1) use bootmem.c
>> 2) have highmem
>>
>> This patchset applies to "f4a75d2e Linux 3.7-rc6" from
>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>
>
> I've applied the five patches to Linus' 3.7.0-rc6 and can confirm that
> the kernel allows my system to resume from a suspend to disc. Although
> my laptop is 64 bit, I run a 32 bit kernel with HIGHMEM (I have 8GB RAM):
>
> [chris:~/kernel/tmp/linux-3.7-rc6-resume]$ grep -E HIGHMEM\|X86_32 .config
> CONFIG_X86_32=y
> CONFIG_X86_32_SMP=y
> CONFIG_X86_32_LAZY_GS=y
> # CONFIG_X86_32_IRIS is not set
> # CONFIG_NOHIGHMEM is not set
> # CONFIG_HIGHMEM4G is not set
> CONFIG_HIGHMEM64G=y
> CONFIG_HIGHMEM=y
>
> I can also say that a quick browse of the output of dmesg, shows nothing
> out of the ordinary. I have insufficient knowledge to comment on the
> patches, but I will run the kernel over the next few days and report
> back later in the week.
>
Well, I've been running the kernel since Sunday and have had no problems
with my normal work mix of browsing the internet, video editing,
listening to music and building software. I'm now running a kernel built
with the new patches 1 and 4 from yesterday (plus the original 1, 2 and
5). All seems OK so far, including a couple of resumes from suspend to
disk.
^ permalink raw reply [flat|nested] 98+ messages in thread
* Re: [RFT PATCH v1 0/5] fix up inaccurate zone->present_pages
2012-11-22 9:23 ` Chris Clayton
@ 2012-11-26 9:46 ` Chris Clayton
-1 siblings, 0 replies; 98+ messages in thread
From: Chris Clayton @ 2012-11-26 9:46 UTC (permalink / raw)
To: Jiang Liu
Cc: Andrew Morton, Wen Congyang, David Rientjes, Jiang Liu,
Maciej Rutecki, Rafael J . Wysocki, Mel Gorman, Minchan Kim,
KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu, linux-mm,
linux-kernel
On 11/22/12 09:23, Chris Clayton wrote:
>>> This patchset has only been tested on x86_64 with nobootmem.c, so I
>>> need help testing it on machines that:
>>> 1) use bootmem.c
>>> 2) have highmem
>>>
>>> This patchset applies to "f4a75d2e Linux 3.7-rc6" from
>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>>
>>
>> I've applied the five patches to Linus' 3.7.0-rc6 and can confirm that
>> the kernel allows my system to resume from a suspend to disc. Although
>> my laptop is 64 bit, I run a 32 bit kernel with HIGHMEM (I have 8GB RAM):
>>
>> [chris:~/kernel/tmp/linux-3.7-rc6-resume]$ grep -E HIGHMEM\|X86_32
>> .config
>> CONFIG_X86_32=y
>> CONFIG_X86_32_SMP=y
>> CONFIG_X86_32_LAZY_GS=y
>> # CONFIG_X86_32_IRIS is not set
>> # CONFIG_NOHIGHMEM is not set
>> # CONFIG_HIGHMEM4G is not set
>> CONFIG_HIGHMEM64G=y
>> CONFIG_HIGHMEM=y
>>
>> I can also say that a quick browse of the output of dmesg, shows nothing
>> out of the ordinary. I have insufficient knowledge to comment on the
>> patches, but I will run the kernel over the next few days and report
>> back later in the week.
>>
>
> Well, I've been running the kernel since Sunday and have had no problems
> with my normal work mix of browsing the internet, video editing,
> listening to music and building software. I'm now running a kernel built
> with the new patches 1 and 4 from yesterday (plus the original 1, 2 and
> 5). All seems OK so far, including a couple of resumes from suspend to
> disk.
>
>
-rc6 with Gerry's patches has run fine, including numerous resumes from
suspend to disk, which fails (freezing or rebooting) without the
patches. I've now applied the patches to -rc7 (they apply with a few
offsets, but look OK) and will run that for a day or two.
^ permalink raw reply [flat|nested] 98+ messages in thread
* Re: [RFT PATCH v1 0/5] fix up inaccurate zone->present_pages
2012-11-18 16:07 ` Jiang Liu
@ 2012-11-19 21:36 ` Maciej Rutecki
-1 siblings, 0 replies; 98+ messages in thread
From: Maciej Rutecki @ 2012-11-19 21:36 UTC (permalink / raw)
To: Jiang Liu
Cc: Andrew Morton, Wen Congyang, David Rientjes, Jiang Liu,
Chris Clayton, Rafael J . Wysocki, Mel Gorman, Minchan Kim,
KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu, linux-mm,
linux-kernel
On niedziela, 18 listopada 2012 o 17:07:25 Jiang Liu wrote:
> Commit 7f1290f2f2a4 ("mm: fix-up zone present pages") tried to
> resolve an issue caused by an inaccurate zone->present_pages, but the
> fix was incomplete and caused regressions with HIGHMEM, so it has been
> reverted by commit
> 5576646 revert "mm: fix-up zone present pages"
>
> This is a follow-up patchset for the issue above. It introduces a
> new field named "managed_pages" to struct zone, which counts the
> pages in the zone managed by the buddy system, while
> zone->present_pages counts the pages existing in the zone, i.e.
> spanned_pages - absent_pages.
>
> That way, zone->present_pages is kept consistent with
> pgdat->node_present_pages, which is the sum of zone->present_pages
> over all zones in the node.
>
> This patchset has only been tested on x86_64 with nobootmem.c, so I
> need help testing it on machines that:
> 1) use bootmem.c
> 2) have highmem
>
> This patchset applies to "f4a75d2e Linux 3.7-rc6" from
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>
> Any comments and help are welcome!
>
> Jiang Liu (5):
> mm: introduce new field "managed_pages" to struct zone
> mm: replace zone->present_pages with zone->managed_pages if
> appreciated
> mm: set zone->present_pages to number of existing pages in the zone
> mm: provide more accurate estimation of pages occupied by memmap
> mm: increase totalram_pages when free pages allocated by bootmem
> allocator
>
> include/linux/mmzone.h | 1 +
> mm/bootmem.c | 14 ++++++++
> mm/memory_hotplug.c | 6 ++++
> mm/mempolicy.c | 2 +-
> mm/nobootmem.c | 15 ++++++++
> mm/page_alloc.c | 89
> +++++++++++++++++++++++++++++++----------------- mm/vmscan.c |
> 16 ++++-----
> mm/vmstat.c | 8 +++--
> 8 files changed, 108 insertions(+), 43 deletions(-)
Tested on 32-bit Linux with HIGHMEM. Seems to be OK.
Regards
--
Maciej Rutecki
http://www.mrutecki.pl
^ permalink raw reply [flat|nested] 98+ messages in thread
* Re: [RFT PATCH v1 0/5] fix up inaccurate zone->present_pages
2012-11-19 21:36 ` Maciej Rutecki
@ 2012-11-20 16:03 ` Jiang Liu
-1 siblings, 0 replies; 98+ messages in thread
From: Jiang Liu @ 2012-11-20 16:03 UTC (permalink / raw)
To: maciej.rutecki
Cc: Andrew Morton, Wen Congyang, David Rientjes, Jiang Liu,
Chris Clayton, Rafael J . Wysocki, Mel Gorman, Minchan Kim,
KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu, linux-mm,
linux-kernel
Thanks, Maciej!
On 11/20/2012 05:36 AM, Maciej Rutecki wrote:
> On niedziela, 18 listopada 2012 o 17:07:25 Jiang Liu wrote:
>> Commit 7f1290f2f2a4 ("mm: fix-up zone present pages") tried to
>> resolve an issue caused by an inaccurate zone->present_pages, but the
>> fix was incomplete and caused regressions with HIGHMEM, so it has been
>> reverted by commit
>> 5576646 revert "mm: fix-up zone present pages"
>>
>> This is a follow-up patchset for the issue above. It introduces a
>> new field named "managed_pages" to struct zone, which counts the
>> pages in the zone managed by the buddy system, while
>> zone->present_pages counts the pages existing in the zone, i.e.
>> spanned_pages - absent_pages.
>>
>> That way, zone->present_pages is kept consistent with
>> pgdat->node_present_pages, which is the sum of zone->present_pages
>> over all zones in the node.
>>
>> This patchset has only been tested on x86_64 with nobootmem.c, so I
>> need help testing it on machines that:
>> 1) use bootmem.c
>> 2) have highmem
>>
>> This patchset applies to "f4a75d2e Linux 3.7-rc6" from
>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>
>> Any comments and help are welcome!
>>
>> Jiang Liu (5):
>> mm: introduce new field "managed_pages" to struct zone
>> mm: replace zone->present_pages with zone->managed_pages if
>> appreciated
>> mm: set zone->present_pages to number of existing pages in the zone
>> mm: provide more accurate estimation of pages occupied by memmap
>> mm: increase totalram_pages when free pages allocated by bootmem
>> allocator
>>
>> include/linux/mmzone.h | 1 +
>> mm/bootmem.c | 14 ++++++++
>> mm/memory_hotplug.c | 6 ++++
>> mm/mempolicy.c | 2 +-
>> mm/nobootmem.c | 15 ++++++++
>> mm/page_alloc.c | 89
>> +++++++++++++++++++++++++++++++----------------- mm/vmscan.c |
>> 16 ++++-----
>> mm/vmstat.c | 8 +++--
>> 8 files changed, 108 insertions(+), 43 deletions(-)
> Tested in 32 bit linux with HIGHMEM. Seems be OK.
>
> Regards
>
^ permalink raw reply [flat|nested] 98+ messages in thread
* Re: [RFT PATCH v1 0/5] fix up inaccurate zone->present_pages
2012-11-18 16:07 ` Jiang Liu
@ 2012-11-20 2:13 ` Jaegeuk Hanse
-1 siblings, 0 replies; 98+ messages in thread
From: Jaegeuk Hanse @ 2012-11-20 2:13 UTC (permalink / raw)
To: Jiang Liu
Cc: Andrew Morton, Wen Congyang, David Rientjes, Jiang Liu,
Maciej Rutecki, Chris Clayton, Rafael J . Wysocki, Mel Gorman,
Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu,
linux-mm, linux-kernel
On 11/19/2012 12:07 AM, Jiang Liu wrote:
> The commit 7f1290f2f2a4 ("mm: fix-up zone present pages") tries to
> resolve an issue caused by inaccurate zone->present_pages, but that
> fix is incomplete and causes regressions with HIGHMEM. It has been
> reverted by commit
> 5576646 revert "mm: fix-up zone present pages"
>
> This is a follow-up patchset for the issue above. It introduces a
> new field named "managed_pages" to struct zone, which counts the
> pages in the zone managed by the buddy system. zone->present_pages
> is then used to count the pages existing in the zone, which is
> spanned_pages - absent_pages.
>
> That way, zone->present_pages is kept consistent with
> pgdat->node_present_pages, which is the sum of zone->present_pages
> over the node's zones.
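[The accounting relationships described above can be sketched as follows. This is an illustrative model only, not the real struct zone: the field reserved_pages is a hypothetical stand-in for pages reserved by the boot-time allocator and never handed to the buddy system.]

```c
#include <assert.h>

/* Illustrative model of the zone page accounting described above:
 *
 *   present_pages = spanned_pages - absent_pages   (pages that exist)
 *   managed_pages = present_pages - reserved_pages (pages in the buddy system)
 */
struct zone_sketch {
	unsigned long spanned_pages;  /* pages covered by the zone's PFN range */
	unsigned long absent_pages;   /* holes in that range */
	unsigned long reserved_pages; /* reserved at boot, never freed to buddy */
};

static unsigned long zone_present_pages(const struct zone_sketch *z)
{
	return z->spanned_pages - z->absent_pages;
}

static unsigned long zone_managed_pages(const struct zone_sketch *z)
{
	return zone_present_pages(z) - z->reserved_pages;
}

/* pgdat->node_present_pages is the sum of zone->present_pages over the
 * node's zones, so with this definition the two stay consistent by
 * construction. */
static unsigned long node_present_pages(const struct zone_sketch *zones, int n)
{
	unsigned long sum = 0;
	for (int i = 0; i < n; i++)
		sum += zone_present_pages(&zones[i]);
	return sum;
}
```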
>
> This patchset has only been tested on x86_64 with nobootmem.c, so we
> need help testing it on machines that:
> 1) use bootmem.c
Does only x86_32 use bootmem.c instead of nobootmem.c? How can I confirm this?
> 2) have highmem
>
> This patchset applies to "f4a75d2e Linux 3.7-rc6" from
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>
> Any comments and help are welcome!
>
> Jiang Liu (5):
> mm: introduce new field "managed_pages" to struct zone
> mm: replace zone->present_pages with zone->managed_pages if
> appreciated
> mm: set zone->present_pages to number of existing pages in the zone
> mm: provide more accurate estimation of pages occupied by memmap
> mm: increase totalram_pages when free pages allocated by bootmem
> allocator
>
> include/linux/mmzone.h | 1 +
> mm/bootmem.c | 14 ++++++++
> mm/memory_hotplug.c | 6 ++++
> mm/mempolicy.c | 2 +-
> mm/nobootmem.c | 15 ++++++++
> mm/page_alloc.c | 89 +++++++++++++++++++++++++++++++-----------------
> mm/vmscan.c | 16 ++++-----
> mm/vmstat.c | 8 +++--
> 8 files changed, 108 insertions(+), 43 deletions(-)
>
^ permalink raw reply [flat|nested] 98+ messages in thread
* Re: [RFT PATCH v1 0/5] fix up inaccurate zone->present_pages
2012-11-20 2:13 ` Jaegeuk Hanse
@ 2012-11-20 2:43 ` Jiang Liu
-1 siblings, 0 replies; 98+ messages in thread
From: Jiang Liu @ 2012-11-20 2:43 UTC (permalink / raw)
To: Jaegeuk Hanse
Cc: Jiang Liu, Andrew Morton, Wen Congyang, David Rientjes,
Maciej Rutecki, Chris Clayton, Rafael J . Wysocki, Mel Gorman,
Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu,
linux-mm, linux-kernel
On 2012-11-20 10:13, Jaegeuk Hanse wrote:
> On 11/19/2012 12:07 AM, Jiang Liu wrote:
>> The commit 7f1290f2f2a4 ("mm: fix-up zone present pages") tries to
>> resolve an issue caused by inaccurate zone->present_pages, but that
>> fix is incomplete and causes regresions with HIGHMEM. And it has been
>> reverted by commit
>> 5576646 revert "mm: fix-up zone present pages"
>>
>> This is a following-up patchset for the issue above. It introduces a
>> new field named "managed_pages" to struct zone, which counts pages
>> managed by the buddy system from the zone. And zone->present_pages
>> is used to count pages existing in the zone, which is
>> spanned_pages - absent_pages.
>>
>> But that way, zone->present_pages will be kept in consistence with
>> pgdat->node_present_pages, which is sum of zone->present_pages.
>>
>> This patchset has only been tested on x86_64 with nobootmem.c. So need
>> help to test this patchset on machines:
>> 1) use bootmem.c
>
> If only x86_32 use bootmem.c instead of nobootmem.c? How could I confirm it?
Hi Jaegeuk,
Thanks for reviewing this patch set.
Currently x86/x86_64/Sparc have been converted to use nobootmem.c,
while the other arches still use bootmem.c, so we need to test it on
other arches, such as ARM. Yesterday we tested this patchset on an
Itanium platform, so bootmem.c should work as expected too.
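[On the question above of how to confirm which allocator a given kernel uses: the build config is the deciding signal (CONFIG_NO_BOOTMEM=y selects mm/nobootmem.c). A minimal sketch, assuming the config file is available at one of the usual locations; check_bootmem is a hypothetical helper name:]

```shell
# Hypothetical helper: report which boot-time allocator a kernel
# config file selects. CONFIG_NO_BOOTMEM=y means mm/nobootmem.c is
# built; otherwise mm/bootmem.c is used.
check_bootmem() {
    if grep -q '^CONFIG_NO_BOOTMEM=y' "$1"; then
        echo "nobootmem"
    else
        echo "bootmem"
    fi
}

# Typical usage (config locations vary by distro/build):
#   check_bootmem /boot/config-"$(uname -r)"
#   zcat /proc/config.gz > /tmp/config && check_bootmem /tmp/config
```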
Thanks!
Gerry
^ permalink raw reply [flat|nested] 98+ messages in thread
* Re: [RFT PATCH v1 0/5] fix up inaccurate zone->present_pages
2012-11-20 2:43 ` Jiang Liu
@ 2012-11-20 3:20 ` Jaegeuk Hanse
-1 siblings, 0 replies; 98+ messages in thread
From: Jaegeuk Hanse @ 2012-11-20 3:20 UTC (permalink / raw)
To: Jiang Liu
Cc: Jiang Liu, Andrew Morton, Wen Congyang, David Rientjes,
Maciej Rutecki, Chris Clayton, Rafael J . Wysocki, Mel Gorman,
Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu,
linux-mm, linux-kernel
On 11/20/2012 10:43 AM, Jiang Liu wrote:
> On 2012-11-20 10:13, Jaegeuk Hanse wrote:
>> On 11/19/2012 12:07 AM, Jiang Liu wrote:
>>> The commit 7f1290f2f2a4 ("mm: fix-up zone present pages") tries to
>>> resolve an issue caused by inaccurate zone->present_pages, but that
>>> fix is incomplete and causes regressions with HIGHMEM. It has been
>>> reverted by commit
>>> 5576646 revert "mm: fix-up zone present pages"
>>>
>>> This is a follow-up patchset for the issue above. It introduces a
>>> new field named "managed_pages" to struct zone, which counts the
>>> pages in the zone managed by the buddy system. zone->present_pages
>>> is then used to count the pages existing in the zone, which is
>>> spanned_pages - absent_pages.
>>>
>>> That way, zone->present_pages is kept consistent with
>>> pgdat->node_present_pages, which is the sum of zone->present_pages
>>> over the node's zones.
>>>
>>> This patchset has only been tested on x86_64 with nobootmem.c, so we
>>> need help testing it on machines that:
>>> 1) use bootmem.c
>> Does only x86_32 use bootmem.c instead of nobootmem.c? How can I confirm this?
> Hi Jaegeuk,
> Thanks for reviewing this patch set.
> Currently x86/x86_64/Sparc have been converted to use nobootmem.c,
> while the other arches still use bootmem.c, so we need to test it on
> other arches, such as ARM. Yesterday we tested this patchset on an
> Itanium platform, so bootmem.c should work as expected too.
Hi Jiang,
Were any code changes made in x86/x86_64 to match the nobootmem.c logic?
I mean, if I remove
config NO_BOOTMEM
	def_bool y
from arch/x86/Kconfig, can x86/x86_64 still work with bootmem.c
or not?
Regards,
Jaegeuk
> Thanks!
> Gerry
>
>
^ permalink raw reply [flat|nested] 98+ messages in thread
* Re: [RFT PATCH v1 0/5] fix up inaccurate zone->present_pages
2012-11-20 3:20 ` Jaegeuk Hanse
@ 2012-11-20 3:46 ` Jiang Liu
-1 siblings, 0 replies; 98+ messages in thread
From: Jiang Liu @ 2012-11-20 3:46 UTC (permalink / raw)
To: Jaegeuk Hanse
Cc: Jiang Liu, Andrew Morton, Wen Congyang, David Rientjes,
Maciej Rutecki, Chris Clayton, Rafael J . Wysocki, Mel Gorman,
Minchan Kim, KAMEZAWA Hiroyuki, Michal Hocko, Jianguo Wu,
linux-mm, linux-kernel
On 2012-11-20 11:20, Jaegeuk Hanse wrote:
> On 11/20/2012 10:43 AM, Jiang Liu wrote:
>> On 2012-11-20 10:13, Jaegeuk Hanse wrote:
>>> On 11/19/2012 12:07 AM, Jiang Liu wrote:
>> Hi Jaegeuk,
>> Thanks for reviewing this patch set.
>> Currently x86/x86_64/Sparc have been converted to use nobootmem.c,
>> while the other arches still use bootmem.c, so we need to test it on
>> other arches, such as ARM. Yesterday we tested this patchset on an
>> Itanium platform, so bootmem.c should work as expected too.
>
> Hi Jiang,
>
> Were any code changes made in x86/x86_64 to match the nobootmem.c logic? I mean, if I remove
> config NO_BOOTMEM
> 	def_bool y
> from arch/x86/Kconfig, can x86/x86_64 still work with bootmem.c or not?
There were code changes in the x86/x86_64 arch directories to convert from
bootmem.c to nobootmem.c, so you can't simply drop the NO_BOOTMEM Kconfig
item. The APIs of bootmem.c and nobootmem.c also differ:
for example, free_low_memory_core_early() is only provided by nobootmem.c,
and init_bootmem_node() is only provided by bootmem.c.
Thanks!
^ permalink raw reply [flat|nested] 98+ messages in thread