All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Zhang Yanfei <zhangyanfei.yes@gmail.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>,
	konrad.wilk@oracle.com, robert.moore@intel.com,
	lv.zheng@intel.com, rjw@sisk.pl, lenb@kernel.org,
	tglx@linutronix.de, mingo@elte.hu, hpa@zytor.com,
	akpm@linux-foundation.org, trenn@suse.de, yinghai@kernel.org,
	jiang.liu@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
	isimatu.yasuaki@jp.fujitsu.com, izumi.taku@jp.fujitsu.com,
	mgorman@suse.de, minchan@kernel.org, mina86@mina86.com,
	gong.chen@linux.intel.com, vasilis.liaskovitis@profitbricks.com,
	lwoodman@redhat.com, riel@redhat.com, jweiner@redhat.com,
	prarit@redhat.com, zhangyanfei@cn.fujitsu.com,
	yanghy@cn.fujitsu.com, x86@kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-acpi@vger.kernel.org
Subject: Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Date: Wed, 21 Aug 2013 11:36:39 -0400	[thread overview]
Message-ID: <20130821153639.GA17432@htj.dyndns.org> (raw)
In-Reply-To: <5214D60A.2090309@gmail.com>

Hello,

On Wed, Aug 21, 2013 at 11:00:26PM +0800, Zhang Yanfei wrote:
> In current boot order, before we get the SRAT, we have a big consumer of early
> allocations: we are setting up the page table in top-down (The idea was proposed by HPA,
> Link: https://lkml.org/lkml/2012/10/4/701). That said, this kind of page table
> setup will make the page tables as high as possible in memory, since memory at low 
> addresses is precious (for stupid DMA devices, for things like  kexec/kdump, and so on.)

With huge mappings, they are fairly small, right?  And this whole
thing needs a kernel param anyway at this point, so the allocation
direction can be made dependent on that or huge mapping availability
and, even with 4k mappings, we aren't talking about gigabytes of
memory, are we?

> So if we are trying to make early allocations close to kernel image, we should
> rewrite the way we are setting up page table totally. That is not a easy thing
> to do.

It has been a while since I looked at the code so can you please
elaborate why that is not easy?  It's pretty simple conceptually.

> * For memory hotplug, we need ACPI SRAT at early time to be aware of which memory
>   ranges are hotpluggable, and tell the kernel to try to stay away from hotpluggable
>   nodes.
> 
> This one is the current requirement of us but may be very helpful for future change:
> 
> * As suggested by Yinghai, we should allocate page tables in local node. This also
>   needs SRAT before direct mapping page tables are setup.

Does this even matter for huge mappings?

> * As mentioned by Toshi Kani <toshi.kani@hp.com>, ACPI SCPR/DBGP/DBG2 tables
>   allow the OS to initialize serial console/debug ports at early boot time. The
>   earlier it can be initialized, the better this feature will be.  These tables
>   are not currently used by Linux due to a licensing issue, but it could be
>   addressed some time soon.
> 
> So we decided to firstly make ACPI override earlier and use BRK (this is obviously
> near the kernel image range) to store the found ACPI tables.

I don't know.  The whole effort seems way overcomplicated compared to
the benefits it would bring.  For NUMA memory hotunplug, what's the
point of doing all this when the kernel doesn't have any control over
where its image is gonna be?  Some megabytes at the tail aren't gonna
make a huge difference and if you wanna do this properly, you need to
determine the load address of the kernel considering the node
boundaries and hotpluggability of each node, which has to happen
before the early kernel boot code executes.  And if there's a code
piece which does that, that might as well place the kernel image such
that extra allocation afterwards doesn't interfere with memory
hotunplugging.

It looks like a lot of code changes for a mechanism which doesn't seem
all that useful.  This code is already too late in boot sequence to be
a proper solution so I don't see the point in pushing the coverage to
the maximum from here.  It's kinda silly.

The last point - early init of debug facility - makes some sense but
again how extra coverage are we talking about?  The code path between
the two points is fairly short and the change doesn't come free.  It
means we add more fragile firmware-specific code path before the
execution environment is stable and get to do things like traveling
the same code paths multiple times in different environments.  Doesn't
seem like a win.  We want to reach stable execution environment as
soon as possible.  Shoving whole more logic before that in the name of
"earlier debugging" doesn't make a lot of sense.

Thanks.

-- 
tejun

WARNING: multiple messages have this Message-ID (diff)
From: Tejun Heo <tj@kernel.org>
To: Zhang Yanfei <zhangyanfei.yes@gmail.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>,
	konrad.wilk@oracle.com, robert.moore@intel.com,
	lv.zheng@intel.com, rjw@sisk.pl, lenb@kernel.org,
	tglx@linutronix.de, mingo@elte.hu, hpa@zytor.com,
	akpm@linux-foundation.org, trenn@suse.de, yinghai@kernel.org,
	jiang.liu@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
	isimatu.yasuaki@jp.fujitsu.com, izumi.taku@jp.fujitsu.com,
	mgorman@suse.de, minchan@kernel.org, mina86@mina86.com,
	gong.chen@linux.intel.com, vasilis.liaskovitis@profitbricks.com,
	lwoodman@redhat.com, riel@redhat.com, jweiner@redhat.com,
	prarit@redhat.com, zhangyanfei@cn.fujitsu.com,
	yanghy@cn.fujitsu.com, x86@kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-acpi@vger.kernel.org
Subject: Re: [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier.
Date: Wed, 21 Aug 2013 11:36:39 -0400	[thread overview]
Message-ID: <20130821153639.GA17432@htj.dyndns.org> (raw)
In-Reply-To: <5214D60A.2090309@gmail.com>

Hello,

On Wed, Aug 21, 2013 at 11:00:26PM +0800, Zhang Yanfei wrote:
> In current boot order, before we get the SRAT, we have a big consumer of early
> allocations: we are setting up the page table in top-down (The idea was proposed by HPA,
> Link: https://lkml.org/lkml/2012/10/4/701). That said, this kind of page table
> setup will make the page tables as high as possible in memory, since memory at low 
> addresses is precious (for stupid DMA devices, for things like  kexec/kdump, and so on.)

With huge mappings, they are fairly small, right?  And this whole
thing needs a kernel param anyway at this point, so the allocation
direction can be made dependent on that or huge mapping availability
and, even with 4k mappings, we aren't talking about gigabytes of
memory, are we?

> So if we are trying to make early allocations close to kernel image, we should
> rewrite the way we are setting up page table totally. That is not a easy thing
> to do.

It has been a while since I looked at the code so can you please
elaborate why that is not easy?  It's pretty simple conceptually.

> * For memory hotplug, we need ACPI SRAT at early time to be aware of which memory
>   ranges are hotpluggable, and tell the kernel to try to stay away from hotpluggable
>   nodes.
> 
> This one is the current requirement of us but may be very helpful for future change:
> 
> * As suggested by Yinghai, we should allocate page tables in local node. This also
>   needs SRAT before direct mapping page tables are setup.

Does this even matter for huge mappings?

> * As mentioned by Toshi Kani <toshi.kani@hp.com>, ACPI SCPR/DBGP/DBG2 tables
>   allow the OS to initialize serial console/debug ports at early boot time. The
>   earlier it can be initialized, the better this feature will be.  These tables
>   are not currently used by Linux due to a licensing issue, but it could be
>   addressed some time soon.
> 
> So we decided to firstly make ACPI override earlier and use BRK (this is obviously
> near the kernel image range) to store the found ACPI tables.

I don't know.  The whole effort seems way overcomplicated compared to
the benefits it would bring.  For NUMA memory hotunplug, what's the
point of doing all this when the kernel doesn't have any control over
where its image is gonna be?  Some megabytes at the tail aren't gonna
make a huge difference and if you wanna do this properly, you need to
determine the load address of the kernel considering the node
boundaries and hotpluggability of each node, which has to happen
before the early kernel boot code executes.  And if there's a code
piece which does that, that might as well place the kernel image such
that extra allocation afterwards doesn't interfere with memory
hotunplugging.

It looks like a lot of code changes for a mechanism which doesn't seem
all that useful.  This code is already too late in boot sequence to be
a proper solution so I don't see the point in pushing the coverage to
the maximum from here.  It's kinda silly.

The last point - early init of debug facility - makes some sense but
again how extra coverage are we talking about?  The code path between
the two points is fairly short and the change doesn't come free.  It
means we add more fragile firmware-specific code path before the
execution environment is stable and get to do things like traveling
the same code paths multiple times in different environments.  Doesn't
seem like a win.  We want to reach stable execution environment as
soon as possible.  Shoving whole more logic before that in the name of
"earlier debugging" doesn't make a lot of sense.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2013-08-21 15:36 UTC|newest]

Thread overview: 116+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-21 10:15 [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier Tang Chen
2013-08-21 10:15 ` Tang Chen
2013-08-21 10:15 ` [PATCH 1/8] x86: Make get_ramdisk_{image|size}() global Tang Chen
2013-08-21 10:15   ` Tang Chen
2013-08-21 10:15 ` [PATCH 2/8] x86, microcode: Use get_ramdisk_{image|size}() in microcode handling Tang Chen
2013-08-21 10:15   ` Tang Chen
2013-08-21 10:15 ` [PATCH 3/8] x86, acpi: Move table_sigs[] to stack Tang Chen
2013-08-21 10:15   ` Tang Chen
2013-08-21 10:15 ` [PATCH 4/8] x86, acpi, brk: Extend BRK 256KB to store acpi override tables Tang Chen
2013-08-21 10:15   ` Tang Chen
2013-08-21 10:15 ` [PATCH 5/8] x86, brk: Make extend_brk() available with va/pa Tang Chen
2013-08-21 10:15   ` Tang Chen
2013-08-21 12:26   ` Konrad Rzeszutek Wilk
2013-08-21 12:26     ` Konrad Rzeszutek Wilk
2013-08-21 12:35     ` H. Peter Anvin
2013-08-21 12:35       ` H. Peter Anvin
2013-08-21 14:42       ` Konrad Rzeszutek Wilk
2013-08-21 14:42         ` Konrad Rzeszutek Wilk
2013-08-21 15:04         ` H. Peter Anvin
2013-08-21 15:04           ` H. Peter Anvin
2013-08-21 10:15 ` [PATCH 6/8] x86, acpi: Make acpi_initrd_override() available with va or pa Tang Chen
2013-08-21 10:15   ` Tang Chen
2013-08-21 10:15 ` [PATCH 7/8] x86, acpi, brk: Make early_alloc_acpi_override_tables_buf() available with va/pa Tang Chen
2013-08-21 10:15   ` Tang Chen
2013-08-21 10:15 ` [PATCH 8/8] x86, acpi: Do acpi_initrd_override() earlier in head_32.S/head64.c Tang Chen
2013-08-21 10:15   ` Tang Chen
2013-08-21 10:42 ` [PATCH 0/8] x86, acpi: Move acpi_initrd_override() earlier Tang Chen
2013-08-21 10:42   ` Tang Chen
2013-08-21 13:06 ` Tejun Heo
2013-08-21 13:06   ` Tejun Heo
2013-08-21 15:00   ` Zhang Yanfei
2013-08-21 15:00     ` Zhang Yanfei
2013-08-21 15:36     ` Tejun Heo [this message]
2013-08-21 15:36       ` Tejun Heo
2013-08-21 19:31       ` Toshi Kani
2013-08-21 19:31         ` Toshi Kani
2013-08-21 19:54         ` Tejun Heo
2013-08-21 19:54           ` Tejun Heo
2013-08-21 20:29           ` Toshi Kani
2013-08-21 20:29             ` Toshi Kani
2013-08-21 20:40             ` Tejun Heo
2013-08-21 20:40               ` Tejun Heo
2013-08-21 22:36               ` Toshi Kani
2013-08-21 22:36                 ` Toshi Kani
2013-08-22  3:32                 ` Tejun Heo
2013-08-22  3:32                   ` Tejun Heo
2013-08-22 15:52                   ` Toshi Kani
2013-08-22 15:52                     ` Toshi Kani
2013-08-22 18:31                     ` Tejun Heo
2013-08-22 18:31                       ` Tejun Heo
2013-08-22 19:39                       ` Zhang Yanfei
2013-08-22 19:39                         ` Zhang Yanfei
2013-08-22 19:45                         ` Tejun Heo
2013-08-22 19:45                           ` Tejun Heo
2013-08-22 20:11                       ` Toshi Kani
2013-08-22 20:11                         ` Toshi Kani
2013-08-22 20:21                         ` Tejun Heo
2013-08-22 20:21                           ` Tejun Heo
2013-08-22 20:35                           ` Tejun Heo
2013-08-22 20:35                             ` Tejun Heo
2013-08-22 21:06                           ` Toshi Kani
2013-08-22 21:06                             ` Toshi Kani
2013-08-22 21:21                             ` Tejun Heo
2013-08-22 21:21                               ` Tejun Heo
2013-08-22 22:17                               ` Toshi Kani
2013-08-22 22:17                                 ` Toshi Kani
2013-08-23 13:04                                 ` Tejun Heo
2013-08-23 13:04                                   ` Tejun Heo
2013-08-23 13:08                                   ` H. Peter Anvin
2013-08-23 13:08                                     ` H. Peter Anvin
2013-08-23 14:19                                     ` Tejun Heo
2013-08-23 14:19                                       ` Tejun Heo
2013-08-23 14:24                                       ` H. Peter Anvin
2013-08-23 14:24                                         ` H. Peter Anvin
2013-08-23 14:24                                         ` H. Peter Anvin
2013-08-23 14:35                                         ` Tejun Heo
2013-08-23 14:35                                           ` Tejun Heo
2013-08-23 14:57                                           ` Tejun Heo
2013-08-23 14:57                                             ` Tejun Heo
2013-08-23 16:14                                   ` Toshi Kani
2013-08-23 16:14                                     ` Toshi Kani
2013-08-23 16:24                                     ` Tejun Heo
2013-08-23 16:24                                       ` Tejun Heo
2013-08-23 17:13                                       ` Toshi Kani
2013-08-23 17:13                                         ` Toshi Kani
2013-08-23 17:29                                         ` Zhang Yanfei
2013-08-23 17:29                                           ` Zhang Yanfei
2013-08-23 16:54                                     ` Zhang Yanfei
2013-08-23 16:54                                       ` Zhang Yanfei
2013-08-23 18:18                                       ` Yinghai Lu
2013-08-23 18:18                                         ` Yinghai Lu
2013-08-23 18:25                                         ` H. Peter Anvin
2013-08-23 20:08                                           ` Yinghai Lu
2013-08-23 20:30                                             ` Russ Anderson
2013-08-23 20:48                                               ` Yinghai Lu
2013-08-23 21:50                                                 ` chen tang
2013-08-23 21:52                                                   ` Moore, Robert
2013-08-23 22:05                                                     ` Yinghai Lu
2013-08-23 22:08                                                   ` Yinghai Lu
2013-08-23 22:40                                                     ` chen tang
2013-08-23 23:04                                                       ` Yinghai Lu
2013-08-24  2:41                                             ` Russ Anderson
2013-08-23 20:33                                         ` chen tang
2013-08-23 20:33                                           ` chen tang
2013-08-23 21:08                                           ` Yinghai Lu
2013-08-23 21:08                                             ` Yinghai Lu
2013-08-23 22:27                                             ` chen tang
2013-08-23 22:27                                               ` chen tang
2013-08-23 18:29                                       ` Toshi Kani
2013-08-23 18:29                                         ` Toshi Kani
2013-08-23 21:37                                         ` chen tang
2013-08-23 21:37                                           ` chen tang
2013-08-23 21:52                                           ` Tejun Heo
2013-08-23 21:52                                             ` Tejun Heo
2013-08-23 23:56                                             ` chen tang
2013-08-23 23:56                                               ` chen tang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130821153639.GA17432@htj.dyndns.org \
    --to=tj@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=gong.chen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=isimatu.yasuaki@jp.fujitsu.com \
    --cc=izumi.taku@jp.fujitsu.com \
    --cc=jiang.liu@huawei.com \
    --cc=jweiner@redhat.com \
    --cc=konrad.wilk@oracle.com \
    --cc=laijs@cn.fujitsu.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lv.zheng@intel.com \
    --cc=lwoodman@redhat.com \
    --cc=mgorman@suse.de \
    --cc=mina86@mina86.com \
    --cc=minchan@kernel.org \
    --cc=mingo@elte.hu \
    --cc=prarit@redhat.com \
    --cc=riel@redhat.com \
    --cc=rjw@sisk.pl \
    --cc=robert.moore@intel.com \
    --cc=tangchen@cn.fujitsu.com \
    --cc=tglx@linutronix.de \
    --cc=trenn@suse.de \
    --cc=vasilis.liaskovitis@profitbricks.com \
    --cc=wency@cn.fujitsu.com \
    --cc=x86@kernel.org \
    --cc=yanghy@cn.fujitsu.com \
    --cc=yinghai@kernel.org \
    --cc=zhangyanfei.yes@gmail.com \
    --cc=zhangyanfei@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.