From: Wen Congyang <wency@cn.fujitsu.com>
To: Jiang Liu <liuj97@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Jiang Liu <jiang.liu@huawei.com>,
	Maciej Rutecki <maciej.rutecki@gmail.com>,
	Jianguo Wu <wujianguo@huawei.com>,
	Chris Clayton <chris2553@googlemail.com>,
	"Rafael J. Wysocki" <rjw@sisk.pl>, Mel Gorman <mgorman@suse.de>,
	Minchan Kim <minchan@kernel.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Michal Hocko <mhocko@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Daniel Vetter <daniel.vetter@ffwll.ch>
Subject: Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d
Date: Thu, 15 Nov 2012 17:22:37 +0800
Message-ID: <50A4B45D.5000905@cn.fujitsu.com>
In-Reply-To: <50A3B013.4030207@gmail.com>

Hi, Liu Jiang

At 11/14/2012 10:52 PM, Jiang Liu Wrote:
> On 11/07/2012 04:43 AM, Andrew Morton wrote:
>> On Tue, 6 Nov 2012 09:31:57 +0800
>> Jiang Liu <jiang.liu@huawei.com> wrote:
>>
>>> Changeset 7f1290f2f2 tries to fix an issue when calculating
>>> zone->present_pages, but it causes a regression on 32-bit systems with
>>> HIGHMEM. With that changeset, function reset_zone_present_pages()
>>> resets all zone->present_pages to zero, and fixup_zone_present_pages()
>>> is called to recalculate zone->present_pages when the boot allocator frees
>>> core memory pages into the buddy allocator. Because highmem pages are not
>>> freed by the bootmem allocator, all highmem zones' present_pages become
>>> zero.
>>>
>>> Actually there's no need to recalculate present_pages for highmem zones
>>> because the bootmem allocator never allocates pages from them. So fix the
>>> regression by skipping highmem in function reset_zone_present_pages()
>>> and fixup_zone_present_pages().
>>>
>>> ...
>>>
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -6108,7 +6108,8 @@ void reset_zone_present_pages(void)
>>>  	for_each_node_state(nid, N_HIGH_MEMORY) {
>>>  		for (i = 0; i < MAX_NR_ZONES; i++) {
>>>  			z = NODE_DATA(nid)->node_zones + i;
>>> -			z->present_pages = 0;
>>> +			if (!is_highmem(z))
>>> +				z->present_pages = 0;
>>>  		}
>>>  	}
>>>  }
>>> @@ -6123,10 +6124,11 @@ void fixup_zone_present_pages(int nid, unsigned long start_pfn,
>>>  
>>>  	for (i = 0; i < MAX_NR_ZONES; i++) {
>>>  		z = NODE_DATA(nid)->node_zones + i;
>>> +		if (is_highmem(z))
>>> +			continue;
>>> +
>>>  		zone_start_pfn = z->zone_start_pfn;
>>>  		zone_end_pfn = zone_start_pfn + z->spanned_pages;
>>> -
>>> -		/* if the two regions intersect */
>>>  		if (!(zone_start_pfn >= end_pfn	|| zone_end_pfn <= start_pfn))
>>>  			z->present_pages += min(end_pfn, zone_end_pfn) -
>>>  					    max(start_pfn, zone_start_pfn);
>>
>> This ...  isn't very nice.  It embeds within
>> reset_zone_present_pages() and fixup_zone_present_pages() knowledge
>> about their caller's state.  Or, more specifically, it is embedding
>> knowledge about the overall state of the system when these functions
>> are called.
>>
>> I mean, a function called "reset_zone_present_pages" should reset
>> ->present_pages!
>>
>> The fact that fixup_zone_present_pages() has multiple call sites makes
>> this all even more risky.  And what are the interactions between this
>> and memory hotplug?
>>
>> Can we find a cleaner fix?
>>
>> Please tell us more about what's happening here.  Is it the case that
>> reset_zone_present_pages() is being called *after* highmem has been
>> populated?  If so, then fixup_zone_present_pages() should work
>> correctly for highmem?  Or is it the case that highmem hasn't yet been
>> setup?  IOW, what is the sequence of operations here?
>>
>> Is the problem that we're *missing* a call to
>> fixup_zone_present_pages(), perhaps?  If we call
>> fixup_zone_present_pages() after highmem has been populated,
>> fixup_zone_present_pages() should correctly fill in the highmem zone's
>> ->present_pages?
> Hi Andrew,
> 	Sorry for the late response :(
> 	I have investigated further following your suggestions. Currently
> we only call fixup_zone_present_pages() for memory freed by the bootmem
> allocator, and miss HIGHMEM pages. We could also call fixup_zone_present_pages()
> for HIGHMEM pages, but that would require changing arch-specific code for x86, powerpc,
> sparc, microblaze, arm, mips, um, tile, etc. That seems like a bit too much overhead.
> 	Sadly, I also found that the quick fix is still incomplete. The original
> patch has another issue: reset_zone_present_pages() is only called
> for IA64, so it will cause trouble for other arches that use "bootmem.c".
> 	I felt a little guilty and tried to find a cleaner solution without
> touching arch-specific code, but things are more complex than I expected and
> I'm still working on that.
> 	So how about completely reverting changeset 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125,
> and I will post another version once I find a cleaner way?

I think fixup_zone_present_pages() is very useful for memory hotplug.

We calculate zone->present_pages in free_area_init_core(), but the value it
computes is wrong. That is why we fix it up in fixup_zone_present_pages().
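
For reference, the per-range arithmetic that the quoted diff applies in
fixup_zone_present_pages() can be modeled in user space. This is only a
rough sketch, not kernel code: overlap_pages() is a name made up for
illustration, and both ranges are assumed to be half-open [start, end).

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel function): number of pages in the
 * freed range [start_pfn, end_pfn) that fall inside the zone range
 * [zone_start_pfn, zone_end_pfn).
 */
unsigned long overlap_pages(unsigned long zone_start_pfn,
			    unsigned long zone_end_pfn,
			    unsigned long start_pfn,
			    unsigned long end_pfn)
{
	unsigned long lo, hi;

	/* Assumption of this sketch: callers pass well-formed ranges. */
	assert(zone_start_pfn <= zone_end_pfn && start_pfn <= end_pfn);

	/* Same "do the two regions intersect" test as in the quoted diff. */
	if (zone_start_pfn >= end_pfn || zone_end_pfn <= start_pfn)
		return 0;

	lo = start_pfn > zone_start_pfn ? start_pfn : zone_start_pfn;
	hi = end_pfn < zone_end_pfn ? end_pfn : zone_end_pfn;
	return hi - lo;
}
```

A zone only gets credit for pages that are passed through this
computation, so a zone whose pages are never handed to it after a reset
stays at zero.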

What about this:
1. Initialize zone->present_pages to the number of present pages in the
   zone (including bootmem).
2. Don't reset zone->present_pages for HIGHMEM zones.

We never allocate bootmem from HIGHMEM, so a highmem zone's present_pages
is already initialized correctly in step 1 and needs no fixup, which is why
step 2 can skip it.
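
To make the intended sequence concrete, here is a toy user-space model of
the two steps. It is only a sketch under my own assumptions: struct
toy_zone and the toy_*() functions are invented for illustration and are
not the kernel's struct zone or its helpers, and the "highmem" flag stands
in for is_highmem().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct zone; not the kernel's definition. */
struct toy_zone {
	unsigned long start_pfn;
	unsigned long spanned_pages;
	unsigned long present_pages;	/* step 1: set at init time */
	bool highmem;
};

/* Step 2: reset only the zones whose count is rebuilt from bootmem frees. */
void toy_reset_present_pages(struct toy_zone *zones, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (!zones[i].highmem)
			zones[i].present_pages = 0;
	}
}

/* Credit a freed range [start_pfn, end_pfn) to the lowmem zones it overlaps. */
void toy_fixup_present_pages(struct toy_zone *zones, size_t nr,
			     unsigned long start_pfn, unsigned long end_pfn)
{
	assert(start_pfn <= end_pfn);

	for (size_t i = 0; i < nr; i++) {
		struct toy_zone *z = &zones[i];
		unsigned long zs = z->start_pfn;
		unsigned long ze = zs + z->spanned_pages;

		/* Highmem keeps the value it was given in step 1. */
		if (z->highmem)
			continue;
		if (zs >= end_pfn || ze <= start_pfn)
			continue;

		z->present_pages += (end_pfn < ze ? end_pfn : ze) -
				    (start_pfn > zs ? start_pfn : zs);
	}
}
```

With a lowmem zone covering pfns [0, 1000) and a highmem zone covering
[1000, 3000), calling toy_reset_present_pages() and then
toy_fixup_present_pages() for the freed lowmem ranges rebuilds the lowmem
count while the highmem count from step 1 is left untouched.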

Is that OK?

If so, I will resend the patch for step 1 (the patch is from laijs).

Thanks
Wen Congyang

> 	Thanks!
> 	Gerry
> 
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>>

