From: Zhang Yanfei <zhangyanfei.yes@gmail.com>
To: Tejun Heo <tj@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Rafael J . Wysocki" <rjw@sisk.pl>,
	lenb@kernel.org, Thomas Gleixner <tglx@linutronix.de>,
	mingo@elte.hu, Toshi Kani <toshi.kani@hp.com>,
	Wanpeng Li <liwanp@linux.vnet.ibm.com>,
	Thomas Renninger <trenn@suse.de>, Yinghai Lu <yinghai@kernel.org>,
	Jiang Liu <jiang.liu@huawei.com>,
	Wen Congyang <wency@cn.fujitsu.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	isimatu.yasuaki@jp.fujitsu.com, izumi.taku@jp.fujitsu.com,
	Mel Gorman <mgorman@suse.de>, Minchan Kim <minchan@kernel.org>,
	mina86@mina86.com, gong.chen@linux.intel.com,
	vasilis.liaskovitis@profitbricks.com, lwoodman@redhat.com,
	Rik van Riel <riel@redhat.com>,
	jweiner@redhat.com, prarit@redhat.com,
	"x86@kernel.org" <x86@kernel.org>,
	linux-doc@vger.kernel.org,
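	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	linux-acpi@vger.kernel.org, imtangchen@gmail.com,
	Zhang Yanfei <zhangyanfei@cn.fujitsu.com>,
	Tang Chen <tangchen@cn.fujitsu.com>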
Subject: Re: [PATCH part1 v6 4/6] x86/mem-hotplug: Support initialize page tables in bottom-up
Date: Thu, 10 Oct 2013 01:14:23 +0800
Message-ID: <52558EEF.4050009@gmail.com>
In-Reply-To: <20131009164449.GG22495@htj.dyndns.org>

Hello Tejun,

Thanks for the response :)

On 10/10/2013 12:44 AM, Tejun Heo wrote:
> Hello,
> 
> On Wed, Oct 09, 2013 at 01:36:36AM +0800, Zhang Yanfei wrote:
>>> I'm still seriously concerned about this.  This unconditionally
>>> introduces new behavior which may very well break some classes of
> 
> This is an optional behavior which is triggered by a very specific
> kernel boot param, which I suspect is gonna need to stick around to
> support memory hotplug in the current setup unless we add another
> layer of address translation to support memory hotplug.

Yeah, I have explained that this is conditional.

> 
>>> systems -- the whole point of creating the page tables top down is
>>> because the kernel tends to be allocated in lower memory, which is also
>>> the memory that some devices need for DMA.
> 
> Would that really matter for the target use cases here?  These are
> likely fairly huge highend machines.  ISA DMA limit is below the
> kernel image and 32bit limit is pretty big in comparison and at this
> point even that limit is likely to be irrelevant at least for the
> target machines, which are gonna be almost inherently extremely niche.
> 
>>> so if we allocate memory close to the kernel image,
>>>   it's likely that we don't contaminate hotpluggable node.  We're
>>>   talking about few megs at most right after the kernel image.  I
>>>   can't see how that would make any noticeable difference.
>>
>> You meant that the memory size is about few megs. But here, page tables
>> seems to be large enough in big memory machines, so that page tables will
> 
> Hmmm?  Even with 4k mappings and, say, 16Gigs of memory, it's still
> somewhere above 32MiB, right?  And, these physical mappings don't
> usually use 4k mappings to begin with.  Unless we're worrying about
> ISA DMA limit, I don't think it'd be problematic.

I think Peter meant machines with very large memory, say 2T? In the worst
case, this may need about 2G of memory for page tables, which seems huge.
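
As a rough check of the sizes discussed above, here is a minimal sketch of the
arithmetic. It counts only the lowest-level tables and assumes 8-byte entries
(the x86_64 entry size); the upper levels add a fraction of a percent on top.
The function name and constants are illustrative, not kernel code.

```python
KIB = 1 << 10
MIB = 1 << 20
GIB = 1 << 30
TIB = 1 << 40

ENTRY_SIZE = 8  # bytes per page-table entry, assuming x86_64


def last_level_table_bytes(mem_bytes, page_size):
    """Bytes of lowest-level page tables needed to map mem_bytes."""
    entries = mem_bytes // page_size  # one entry per mapped page
    return entries * ENTRY_SIZE


for mem, page, label in [
    (16 * GIB, 4 * KIB, "16 GiB with 4 KiB pages"),
    (2 * TIB, 4 * KIB, "2 TiB with 4 KiB pages"),
    (2 * TIB, 2 * MIB, "2 TiB with 2 MiB pages"),
]:
    size = last_level_table_bytes(mem, page)
    print(f"{label}: {size / MIB:g} MiB of page tables")
```

Under these assumptions, 16 GiB mapped with 4 KiB pages needs 32 MiB, which
matches the figure quoted above. 2 TiB mapped with 4 KiB pages comes to about
4 GiB, the same order of magnitude as the worst case I mentioned, while 2 MiB
mappings bring that down to about 8 MiB.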

Also, I am not familiar with the ISA DMA limit. Does it mean the memory
below 4G, like ZONE_DMA32 on x86_64? (The 16MB limit does not seem to be
the case here.)

> 
>> consume the precious lower memory. So I think we may really reorder
>> the page table setup after we get the hotplug info in some way. Just like
>> we have done in patch 5, we reorder reserve_crashkernel() to be called
>> after initmem_init().
>>
>> So do you still have any objection to the pagetable setup reorder?
> 
> I still feel quite uneasy about pulling SRAT parsing and ACPI initrd
> overriding into early boot.
> 

I have been reading through the earlier discussion. Maybe it was the very
first patchset that made you uneasy about parsing SRAT earlier; that patchset
may have done too much splitting and registering. So I am wondering whether
we could combine two things to make this cleaner:

1. Introduce bottom-up allocation to allocate memory near the kernel before
   we parse SRAT.
2. Peter has serious concerns about setting up the page tables bottom-up,
   and Ingo also said we had better not touch the current top-down pagetable
   setup. So could we just put acpi_initrd_override and the numa_init related
   functions before init_mem_mapping()? After the NUMA info is parsed
   (including SRAT), we reset the allocation direction back to top-down, so
   we needn't change the page table setup process. Before the NUMA info is
   parsed, we use bottom-up allocation to make sure all memory allocated by
   memblock is near the kernel image.
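
The ordering proposed in the two points above can be sketched with a toy
model. This is not kernel code: ToyMemblock, boot_sequence and the sizes are
made up for illustration, and the model tracks only a single free range with
a direction flag. It does not model hotpluggable ranges at all.

```python
MIB = 1 << 20
GIB = 1 << 30


class ToyMemblock:
    """Toy boot-time allocator over one free range [low, high)."""

    def __init__(self, kernel_end, mem_end):
        self.low = kernel_end    # first free byte after the kernel image
        self.high = mem_end      # end of usable memory
        self.bottom_up = False   # top-down by default

    def alloc(self, size):
        if size > self.high - self.low:
            raise MemoryError("toy allocator exhausted")
        if self.bottom_up:
            addr = self.low      # take from just above the kernel image
            self.low += size
        else:
            self.high -= size    # take from the top of memory
            addr = self.high
        return addr


def boot_sequence():
    mb = ToyMemblock(kernel_end=64 * MIB, mem_end=64 * GIB)

    # Before NUMA info is parsed: stay near the kernel image.
    mb.bottom_up = True
    early = [mb.alloc(1 * MIB) for _ in range(3)]

    # NUMA info (including SRAT) would be parsed at this point.

    # Afterwards: back to top-down, so page table setup is unchanged.
    mb.bottom_up = False
    page_tables = mb.alloc(32 * MIB)
    return early, page_tables


early, page_tables = boot_sequence()
print([hex(a) for a in early], hex(page_tables))
```

In this sketch the early allocations land directly after the kernel image,
and the page tables are still taken from the top of memory as they are today.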

What do you think?

-- 
Thanks.
Zhang Yanfei

