linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC v1 0/8] x86/init: Linux linker tables
@ 2015-12-15 22:16 Luis R. Rodriguez
  2015-12-15 22:16 ` [RFC v1 1/8] paravirt: rename paravirt_enabled to paravirt_legacy Luis R. Rodriguez
                   ` (10 more replies)
  0 siblings, 11 replies; 60+ messages in thread
From: Luis R. Rodriguez @ 2015-12-15 22:16 UTC (permalink / raw)
  To: hpa, tglx, mingo, bp, konrad.wilk
  Cc: rusty, luto, boris.ostrovsky, mcb30, jgross, JBeulich, joro,
	ryabinin.a.a, andreyknvl, long.wanglong, qiuxishi, aryabinin,
	mchehab, valentinrothberg, peter.senna, mcgrof, x86, xen-devel,
	linux-kernel

From: "Luis R. Rodriguez" <mcgrof@suse.com>

             A long time ago in a galaxy far,
             far away...

Konrad Rzeszutek Wilk posted patches which eventually got merged to help
with modularizing the IOMMUs we have on x86 [0]. This work was done due to
the complex relationship that exists on IOMMUs and the requirements on
careful execution. The solution also provided a mechanism which jettisoned
unused IOMMUs during run-time.

During review, even though the code was merged, hpa did note that we tend
to encounter this type of problem "often enough that we should implement a
generic facility for it" [1], hpa acknowledged that it obviously has to be
based on sections and even noted that perhaps we might be able in the future to
automate its creation. He noted that the gPXE folks had done just this with
linker tables and suggested that "presumably we'd need a few different flavors
for init tables and so on, but this would make it a generic mechanism."

The IOMMU code got merged and this was left on someone's mental backburner.
I've had an itch to scratch recently to try to avoid issues which are possible
if one does not jettison other code carefully due to the large complexity of
implicit dependencies of certain code on x86 in particular with possible dead
code on x86 due to paravirtualization, and the IOMMU jettison strategy turned
out to be my favorite solution so far. I've taken on hpa's suggestions from
back in the day to review gPXE's solution to see if we could embrace it on
Linux for a generic section solution to help jettison code carefully.

What this patch set does exactly:

This RFC patch set attempts to add support for such a generic solution.
In the end, it turns out that the best solution possible was the best of
both worlds: a combination of what Konrad had implemented in addition to
what Michael Brown had implemented on the gXPE front. The IOMMU solution
enables simple semantic annotations for dependency relationships, this
however requires a run time sort. The gPXE solution grants the option to
simply sort at build time. One of gPXE's solution primary goals however was
also to help avoid bit-rot on code that's possible from #ifdef'ery. The Linux
linker table solution enables developers to pick and choose what they
need, with linker tables being the simplest solution. Contrary to gPXE
which strives to force compilation of all linker table solutions we
let developers pick *when* they want this as part of their solution.
As can be seen from the suggested x86 init specific use of linker tables
proposed you can also take advantage of both, linker sorting, optional
compilation when needed (at developer's discretion), and even careful
semantics annotation for dependency / relationship annotations. Although
the x86 init solution here is heavily inspired by the IOMMU solution it
diverges with strong semantics, and a new optional subarchitecture
annotation. Sorting of init sequences is structure specific, as such
each subsystem must defing their own solution unless semantics could
be shared. I considered sharing semantics but in the end this proved
pointless so this keeps things separate. A series of changes were made
to the x86 init sequence in contrast to the IOMMU solution to be *extremely
pedantic* on semantics, review of this changes can be studied on the
table-init tree [2].

Quick review of gPXE's solution and prospects on further changes:

In my review from gPXE's solution it was not clear what hpa meant by
gXPE folks having automated this process, they actually use linker tables
all around, forcing compilation of *everything* and just do linking of
enabled features at link time. You still need to build linker tables on
your own. What I do see more potential for in the future is enabling to
evolve stronger semantics over time, and this would also be subsystem specific.
This will be evident in this patch set on the x86 init use of linker tables.
I also see potential in strenghtening semantics for linker sorting, any of
these types of features however would impose requiring newer binutils. For
instance, gPXE's linker solution currently relies on SORT(), that defaults to
SORT_BY_NAME(). This sorts lexicographically, gPXE's solution uses two digits
to enable SORT_BY_NAME()'s lexicographical sort to sort orde by numeric
priority. Since one is in control of order-level numbers one can provide
guarantee that this sort should work as intended, however binutils also now has
a SORT_BY_INIT_PRIORITY() which sorts specifically based on digits.
SORT_BY_INIT_PRIORITY() was designed specifically for init_array sections
though. Refer to the userspace mockup solution table-init git tree [2] commit
6deba47ee1ad461e90 for more details on this.  One thing I can envision to help
here further are prospects for future linker enhancements, these however must
be carefully considered in light of possible requirements of newer binutils
and also of compatibility with other toolchains. For now we resort to the good
'ol SORT() which even Linux has made use of for ages already. In that regards
here, nothing new here. gPXE folks however did find some fun ICC compatiblity
issues, and have also developed fixes for them, these were obviously carried
over but should be reviewed carefully. Lastly, I should point out that
essentially what we're developing are different forms loose and strong
semantics -- in the most complex form what we really are after are feature
graphs. Mauro explained to me that for media driver they needed to build a
feature graphs, IMHO that could be a next level of strong semantics we might
want to consider. For now though the combination of gPXE's linker-table
solution based on linker order-level sort, and our old IOMMU init solution
and its simple dependency map (and possible extensions, see subarchitecture)
should likely suffice as a light weight solution for where we need semantics
for at least on the x86 init path.

Motivation and possible enhancements:

To understand what made me tick to work on this feel free to read the dead
code concerns with paravirtualization posts I've made, which also go into
and detail Xen's alternative entry point [4] [5]. That's not the end of
possibilities to help address to possible "dead code" on Linux, I have other
ideas but I have a bit more work to do before publishing anything about it.
If anything -- later work should likely supplement this solution further.

Currently the earliest I was able to make use of boot_params on x86-64 for
linker tables was after load_idt(), so I've decided to stick with the earliest
init call for x86 init sequences starting on x86_64_start_reservation().
However, if we're able to make use of boot_params earlier (help appreciated),
in particular just boot_params.hdr.hardware_subarch we should be able to make
clearer and careful annotations on early x86-64 init code. Using the
boot_params.hdr.hardware_subarch and requiring clear subarchitecture support
annotations on early init code should provide a *proactive* means to avoid issues
such as the cr4 shadow oversight [6] which later caused crashes on Xen. The
latest similar type of issue is when KAsan was introduced, after it was
integrated I suspected KAsan would probably break if enabled on Xen, I reported
this in March 2015 [7] and Andrey confirmed this might be the case but since he
wanted it enabled for allyesconfig and allmodconfig he could not think of a
clean way to disable it or address support for it then [8]. During the
development of this patch set I confirmed KAsan crashes Xen dom0 and therefore
should obviously crash Xen PV guests as well.  Using the same kernel KAsan
still worked on KVM and bare-metal. Provided we had early access to
boot_params.hdr.hardware_subarch, in x86_64_start_kernel() we could annotate
kasan_early_init() and friends not supported on Xen, however since we don't
have a way to disable KAsan at run time at this time we wouldn't have any other
option but to crash early. Because of this though we should perhaps just consider
disabling KAsan for Xen configs and KAsan folks might just have to bite the
bullet. If KAsan folks are gung-ho about not disabling KAsan when Xen is
enabled and if its easier to add support to disable KAsan at run time than
it is to add KAsan support for Xen one alternative might be to use linker
tables to annotate this and disable KAsan at run time for Xen.

One of the points of the use of the x86 linker table solution and specific
semantics there in is to *force* developers to think about requirements on x86
carefully. So for instance by requiring itemizing supported subarchitectures
*early* on in development we should be able to avoid proactively issues such as
the crash due to the cr4 shadow changes not added on the Xen entry path, and
also missing the requirement to develop a solution for Kasan for Xen. The
proposed x86 linker table solution is not the first to use
boot_params.hdr.hardware_subarch, its first use was actually on i386, and for
lguest. We take advantage of it to avoid further extending pv_ops, and
*actually* to see if we could simplify pv_ops over time further. This patch set
also includes a renaming of paravirt_enabled() to paravirt_legacy() 

An unexpected long term side goal which *might* be possible due to linker
tables on x86 is to help unify the different C entry points for Linux x86-64.
I've taken a stab at it after these patches [9] but it fails on Xen, likely
because the stack is not set up right due to the different calls / argument
requirements. If this is desirable folks could take a look and perhaps help
on that front.

Where to get code alternatively and testing:

All of these patches are RFCs, if anything the only patches worth consdering
merging sooner rather than later might be "paravirt: rename paravirt_enabled
to paravirt_legacy" and "x86/boot: add BIT() to boot/bitops.h". After review
I can send those separately in patch form if agreeable. Since these are all
just RFCs I've based these patches on top of Linus' v4.4-rc5. If you want all
patches in one file you can get them here [10], and a respective linux-next
version which applies on top of next-20151215 here [11]. I also have two
git trees set up with branches for this code, one based on Linus' tree [12]
and another based on linux-next next-20151215 [13]. I'll see what issues
zero-day bot testing finds out. I've just run time tested this on x86-64
bare metal, KVM, and Xen dom0 so far.

[0] https://marc.info/?l=linux-kernel&m=128284562303565&w=2
[1] https://marc.info/?l=linux-kernel&m=128285216913266&w=2
[2] https://github.com/mcgrof/table-init/
[3] https://github.com/mcgrof/table-init/commit/6deba47ee1ad461e90e0fbba226a535cfc1c58f3
[4] http://www.do-not-panic.com/2015/12/avoiding-dead-code-pvops-not-silver-bullet.html
[5] http://www.do-not-panic.com/2015/12/xen-and-x86-linux-zero-page.html
[6] http://lists.xenproject.org/archives/html/xen-devel/2015-02/msg02742.html
[7] http://lkml.kernel.org/r/CAB=NE6Xs5fepzNtymzT4CueeJZ0KMPETpda114DpL4eMtDswtw@mail.gmail.com
[8] http://lkml.kernel.org/r/54F5B3DA.70203@samsung.com
[9] http://drvbp1.linux-foundation.org/~mcgrof/patches/2015/12/15/x86-merge-x86-init-v1.patch
[10] http://drvbp1.linux-foundation.org/~mcgrof/patches/2015/12/15/pend-20151215-rfc-v1-linker-tables.patch
[11] http://drvbp1.linux-foundation.org/~mcgrof/patches/2015/12/15/pend-next-20151215-rfc-v1-linker-tables.patch
[12] https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux.git/log/?h=20151215-rfc-v1-linker-tables
[13] https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux-next.git/log/?h=20151215-rfc-v1-linker-tables

Luis R. Rodriguez (8):
  paravirt: rename paravirt_enabled to paravirt_legacy
  tables.h: add linker table support
  x86/boot: add BIT() to boot/bitops.h
  x86/init: add linker table support
  x86/init: move ebda reservations into linker table
  x86/init: use linker table for i386 early setup
  x86/init: user linker table for ce4100 early setup
  x86/init: use linker table for mid early setup

 Documentation/kbuild/makefiles.txt      |  19 ++
 arch/x86/Kconfig.debug                  |  47 ++++
 arch/x86/boot/bitops.h                  |   2 +
 arch/x86/boot/boot.h                    |   2 +-
 arch/x86/include/asm/bios_ebda.h        |   2 -
 arch/x86/include/asm/paravirt.h         |   4 +-
 arch/x86/include/asm/paravirt_types.h   |  11 +-
 arch/x86/include/asm/processor.h        |   2 +-
 arch/x86/include/asm/setup.h            |  12 -
 arch/x86/include/asm/x86_init.h         |   1 +
 arch/x86/include/asm/x86_init_fn.h      | 267 ++++++++++++++++++++
 arch/x86/kernel/Makefile                |   4 +-
 arch/x86/kernel/apm_32.c                |   2 +-
 arch/x86/kernel/asm-offsets.c           |   2 +-
 arch/x86/kernel/cpu/intel.c             |   2 +-
 arch/x86/kernel/cpu/microcode/core.c    |   2 +-
 arch/x86/kernel/dbg-tables/Makefile     |  18 ++
 arch/x86/kernel/dbg-tables/alpha.c      |  10 +
 arch/x86/kernel/dbg-tables/beta.c       |  18 ++
 arch/x86/kernel/dbg-tables/delta.c      |  10 +
 arch/x86/kernel/dbg-tables/gamma.c      |  18 ++
 arch/x86/kernel/dbg-tables/gamma.h      |   3 +
 arch/x86/kernel/{head.c => ebda.c}      |   6 +-
 arch/x86/kernel/head32.c                |  22 +-
 arch/x86/kernel/head64.c                |   4 +-
 arch/x86/kernel/init.c                  |  55 +++++
 arch/x86/kernel/kvm.c                   |   9 +-
 arch/x86/kernel/paravirt.c              |   2 +-
 arch/x86/kernel/sort-init.c             | 114 +++++++++
 arch/x86/kernel/tboot.c                 |   2 +-
 arch/x86/kernel/vmlinux.lds.S           |  25 ++
 arch/x86/lguest/boot.c                  |   2 +-
 arch/x86/mm/dump_pagetables.c           |   2 +-
 arch/x86/platform/ce4100/ce4100.c       |   4 +-
 arch/x86/platform/intel-mid/intel-mid.c |   4 +-
 arch/x86/tools/relocs.c                 |   1 +
 arch/x86/xen/enlighten.c                |   2 +-
 drivers/pnp/pnpbios/core.c              |   2 +-
 include/linux/tables.h                  | 421 ++++++++++++++++++++++++++++++++
 scripts/Makefile.build                  |   4 +-
 scripts/Makefile.clean                  |   1 +
 scripts/Makefile.lib                    |  12 +
 42 files changed, 1091 insertions(+), 61 deletions(-)
 create mode 100644 arch/x86/include/asm/x86_init_fn.h
 create mode 100644 arch/x86/kernel/dbg-tables/Makefile
 create mode 100644 arch/x86/kernel/dbg-tables/alpha.c
 create mode 100644 arch/x86/kernel/dbg-tables/beta.c
 create mode 100644 arch/x86/kernel/dbg-tables/delta.c
 create mode 100644 arch/x86/kernel/dbg-tables/gamma.c
 create mode 100644 arch/x86/kernel/dbg-tables/gamma.h
 rename arch/x86/kernel/{head.c => ebda.c} (94%)
 create mode 100644 arch/x86/kernel/init.c
 create mode 100644 arch/x86/kernel/sort-init.c
 create mode 100644 include/linux/tables.h

-- 
2.6.2


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [RFC v1 1/8] paravirt: rename paravirt_enabled to paravirt_legacy
  2015-12-15 22:16 [RFC v1 0/8] x86/init: Linux linker tables Luis R. Rodriguez
@ 2015-12-15 22:16 ` Luis R. Rodriguez
       [not found]   ` <20160120193241.GA4769@char.us.oracle.com>
  2015-12-15 22:16 ` [RFC v1 2/8] tables.h: add linker table support Luis R. Rodriguez
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 60+ messages in thread
From: Luis R. Rodriguez @ 2015-12-15 22:16 UTC (permalink / raw)
  To: hpa, tglx, mingo, bp, konrad.wilk
  Cc: rusty, luto, boris.ostrovsky, mcb30, jgross, JBeulich, joro,
	ryabinin.a.a, andreyknvl, long.wanglong, qiuxishi, aryabinin,
	mchehab, valentinrothberg, peter.senna, mcgrof, x86, xen-devel,
	linux-kernel, cocci

From: "Luis R. Rodriguez" <mcgrof@suse.com>

paravirt_enabled conveys the idea that if this is set or if
paravirt_enabled() returns true you are in a paravirtualized
environment. This is not true, this is really only useful to
determine if you have a paravirtualization hypervisor that uses
supports legacy paravirtualized guests. At run time, this tells
us if we've booted into a Linux guest with support for legacy
paravirtualized guests.

Make emphasis about this by renaming the member and helper.
While at it, make the type bool, and document it therefore
avoiding special cased comments trying to explain it.

The rename is done using the following Coccinelle SmPL patch:

@ rename_paravirt_enabled @
@@

-paravirt_enabled()
+paravirt_legacy()

@ rename_pv_info_pv_enabled @
@@
-pv_info.paravirt_enabled
+pv_info.paravirt_legacy

@ is_pv @
identifier pv;
@@
struct pv_info pv = {
};

@ rename_struct_pv_enabled depends on is_pv @
identifier is_pv.pv;
expression val;
@@

struct pv_info pv = {
-	.paravirt_enabled
+	.paravirt_legacy
	= val,
};

Generated-by: Coccinelle SmPL
Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: cocci@systeme.lip6.fr
cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 arch/x86/include/asm/paravirt.h       |  4 ++--
 arch/x86/include/asm/paravirt_types.h | 11 ++++++++++-
 arch/x86/include/asm/processor.h      |  2 +-
 arch/x86/kernel/apm_32.c              |  2 +-
 arch/x86/kernel/asm-offsets.c         |  2 +-
 arch/x86/kernel/cpu/intel.c           |  2 +-
 arch/x86/kernel/cpu/microcode/core.c  |  2 +-
 arch/x86/kernel/head.c                |  2 +-
 arch/x86/kernel/kvm.c                 |  9 +--------
 arch/x86/kernel/paravirt.c            |  2 +-
 arch/x86/kernel/tboot.c               |  2 +-
 arch/x86/lguest/boot.c                |  2 +-
 arch/x86/mm/dump_pagetables.c         |  2 +-
 arch/x86/xen/enlighten.c              |  2 +-
 drivers/pnp/pnpbios/core.c            |  2 +-
 15 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 10d0596433f8..38e698ff8cae 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -14,9 +14,9 @@
 #include <linux/types.h>
 #include <linux/cpumask.h>
 
-static inline int paravirt_enabled(void)
+static inline bool paravirt_legacy(void)
 {
-	return pv_info.paravirt_enabled;
+	return pv_info.paravirt_legacy;
 }
 
 static inline void load_sp0(struct tss_struct *tss,
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 31247b5bff7c..3d89e8e878bc 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -61,6 +61,15 @@ struct paravirt_callee_save {
 };
 
 /* general info */
+
+/**
+ * struct pv_info - hypervisor information
+ *
+ * @paravirt_legacy: true if this hypervisor supports legacy
+ * 	paravirtualized guests. Examples of features that
+ * 	characterize legacy paravirtualized guests are
+ * 	things such as the need for APM, PNP BIOS.
+ */
 struct pv_info {
 	unsigned int kernel_rpl;
 	int shared_kernel_pmd;
@@ -69,7 +78,7 @@ struct pv_info {
 	u16 extra_user_64bit_cs;  /* __USER_CS if none */
 #endif
 
-	int paravirt_enabled;
+	bool paravirt_legacy;
 	const char *name;
 };
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 67522256c7ff..354a0d010c59 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -471,7 +471,7 @@ static inline unsigned long current_top_of_stack(void)
 #include <asm/paravirt.h>
 #else
 #define __cpuid			native_cpuid
-#define paravirt_enabled()	0
+#define paravirt_legacy()	false
 
 static inline void load_sp0(struct tss_struct *tss,
 			    struct thread_struct *thread)
diff --git a/arch/x86/kernel/apm_32.c b/arch/x86/kernel/apm_32.c
index 052c9c3026cc..74a3e8ca2c6d 100644
--- a/arch/x86/kernel/apm_32.c
+++ b/arch/x86/kernel/apm_32.c
@@ -2267,7 +2267,7 @@ static int __init apm_init(void)
 
 	dmi_check_system(apm_dmi_table);
 
-	if (apm_info.bios.version == 0 || paravirt_enabled() || machine_is_olpc()) {
+	if (apm_info.bios.version == 0 || paravirt_legacy() || machine_is_olpc()) {
 		printk(KERN_INFO "apm: BIOS not found.\n");
 		return -ENODEV;
 	}
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 439df975bc7a..20a7a0df2c65 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -59,7 +59,7 @@ void common(void) {
 
 #ifdef CONFIG_PARAVIRT
 	BLANK();
-	OFFSET(PARAVIRT_enabled, pv_info, paravirt_enabled);
+	OFFSET(PARAVIRT_legacy, pv_info, paravirt_legacy);
 	OFFSET(PARAVIRT_PATCH_pv_cpu_ops, paravirt_patch_template, pv_cpu_ops);
 	OFFSET(PARAVIRT_PATCH_pv_irq_ops, paravirt_patch_template, pv_irq_ops);
 	OFFSET(PV_IRQ_irq_disable, pv_irq_ops, irq_disable);
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 209ac1e7d1f0..134b5d5d9c74 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -220,7 +220,7 @@ static void intel_workarounds(struct cpuinfo_x86 *c)
 	 * The Quark is also family 5, but does not have the same bug.
 	 */
 	clear_cpu_bug(c, X86_BUG_F00F);
-	if (!paravirt_enabled() && c->x86 == 5 && c->x86_model < 9) {
+	if (!paravirt_legacy() && c->x86 == 5 && c->x86_model < 9) {
 		static int f00f_workaround_enabled;
 
 		set_cpu_bug(c, X86_BUG_F00F);
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index b3e94ef461fd..022cc8dad97f 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -630,7 +630,7 @@ int __init microcode_init(void)
 	struct cpuinfo_x86 *c = &boot_cpu_data;
 	int error;
 
-	if (paravirt_enabled() || dis_ucode_ldr)
+	if (paravirt_legacy() || dis_ucode_ldr)
 		return -EINVAL;
 
 	if (c->x86_vendor == X86_VENDOR_INTEL)
diff --git a/arch/x86/kernel/head.c b/arch/x86/kernel/head.c
index 992f442ca155..279fad7288f8 100644
--- a/arch/x86/kernel/head.c
+++ b/arch/x86/kernel/head.c
@@ -38,7 +38,7 @@ void __init reserve_ebda_region(void)
 	 * that the paravirt case can handle memory setup
 	 * correctly, without our help.
 	 */
-	if (paravirt_enabled())
+	if (paravirt_legacy())
 		return;
 
 	/* end of low (conventional) memory */
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 47190bd399e7..e2368f73bc14 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -283,14 +283,7 @@ NOKPROBE_SYMBOL(do_async_page_fault);
 static void __init paravirt_ops_setup(void)
 {
 	pv_info.name = "KVM";
-
-	/*
-	 * KVM isn't paravirt in the sense of paravirt_enabled.  A KVM
-	 * guest kernel works like a bare metal kernel with additional
-	 * features, and paravirt_enabled is about features that are
-	 * missing.
-	 */
-	pv_info.paravirt_enabled = 0;
+	pv_info.paravirt_legacy = false;
 
 	if (kvm_para_has_feature(KVM_FEATURE_NOP_IO_DELAY))
 		pv_cpu_ops.io_delay = kvm_io_delay;
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index c2130aef3f9d..326df1ce2a11 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -312,7 +312,7 @@ enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
 
 struct pv_info pv_info = {
 	.name = "bare hardware",
-	.paravirt_enabled = 0,
+	.paravirt_legacy = false,
 	.kernel_rpl = 0,
 	.shared_kernel_pmd = 1,	/* Only used when CONFIG_X86_PAE is set */
 
diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
index 91a4496db434..87a5027ff242 100644
--- a/arch/x86/kernel/tboot.c
+++ b/arch/x86/kernel/tboot.c
@@ -75,7 +75,7 @@ void __init tboot_probe(void)
 	}
 
 	/* only a natively booted kernel should be using TXT */
-	if (paravirt_enabled()) {
+	if (paravirt_legacy()) {
 		pr_warning("non-0 tboot_addr but pv_ops is enabled\n");
 		return;
 	}
diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
index a0d09f6c6533..eb0a42bbf41b 100644
--- a/arch/x86/lguest/boot.c
+++ b/arch/x86/lguest/boot.c
@@ -1409,7 +1409,7 @@ __init void lguest_init(void)
 	/* We're under lguest. */
 	pv_info.name = "lguest";
 	/* Paravirt is enabled. */
-	pv_info.paravirt_enabled = 1;
+	pv_info.paravirt_legacy = true;
 	/* We're running at privilege level 1, not 0 as normal. */
 	pv_info.kernel_rpl = 1;
 	/* Everyone except Xen runs with this set. */
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index a035c2aa7801..daf87f12a0af 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -365,7 +365,7 @@ static inline bool is_hypervisor_range(int idx)
 	 * ffff800000000000 - ffff87ffffffffff is reserved for
 	 * the hypervisor.
 	 */
-	return paravirt_enabled() &&
+	return paravirt_legacy() &&
 		(idx >= pgd_index(__PAGE_OFFSET) - 16) &&
 		(idx < pgd_index(__PAGE_OFFSET));
 }
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 5774800ff583..95060b31e49c 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1186,7 +1186,7 @@ static unsigned xen_patch(u8 type, u16 clobbers, void *insnbuf,
 }
 
 static const struct pv_info xen_info __initconst = {
-	.paravirt_enabled = 1,
+	.paravirt_legacy = true,
 	.shared_kernel_pmd = 0,
 
 #ifdef CONFIG_X86_64
diff --git a/drivers/pnp/pnpbios/core.c b/drivers/pnp/pnpbios/core.c
index facd43b8516c..40557bc16005 100644
--- a/drivers/pnp/pnpbios/core.c
+++ b/drivers/pnp/pnpbios/core.c
@@ -521,7 +521,7 @@ static int __init pnpbios_init(void)
 	int ret;
 
 	if (pnpbios_disabled || dmi_check_system(pnpbios_dmi_table) ||
-	    paravirt_enabled()) {
+	    paravirt_legacy()) {
 		printk(KERN_INFO "PnPBIOS: Disabled\n");
 		return -ENODEV;
 	}
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [RFC v1 2/8] tables.h: add linker table support
  2015-12-15 22:16 [RFC v1 0/8] x86/init: Linux linker tables Luis R. Rodriguez
  2015-12-15 22:16 ` [RFC v1 1/8] paravirt: rename paravirt_enabled to paravirt_legacy Luis R. Rodriguez
@ 2015-12-15 22:16 ` Luis R. Rodriguez
       [not found]   ` <20160120200428.GB4769@char.us.oracle.com>
  2015-12-15 22:16 ` [RFC v1 3/8] x86/boot: add BIT() to boot/bitops.h Luis R. Rodriguez
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 60+ messages in thread
From: Luis R. Rodriguez @ 2015-12-15 22:16 UTC (permalink / raw)
  To: hpa, tglx, mingo, bp, konrad.wilk
  Cc: rusty, luto, boris.ostrovsky, mcb30, jgross, JBeulich, joro,
	ryabinin.a.a, andreyknvl, long.wanglong, qiuxishi, aryabinin,
	mchehab, valentinrothberg, peter.senna, mcgrof, x86, xen-devel,
	linux-kernel

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This adds Linux table support to Linux, based on Michael Brown's
gPXE's linker table solution. Linker tables enable an extremely
light weight linker build time solution for feature ordering and
selection, this can help to both simplify init sequences in a generic
fashion and helps avoiding code bit-rotting when desirable.

Bit rotting avoidance is enabled by *granting the option* to force
compilation of code and only enable linking object code in when
specific features have been enabled. It accomplishes this by using
linker sections, the lightweight feature ordering of code is enabled
by taking advantage of the old binutils ld SORT() on features
specific sections.

Contrary to gPXE's solution, which strives to force compilation
of all features, Linux' solution strives to encourage usage of linker
tables to force compilation of code *only on designated code*. Linker
tables can be used without requiring one to force code compilation
of features that use it though, to take advantage of the simplification
of init sequences.

Linux code that uses linker tables *and* wishes to always require
compilation but selectively linking must use new kbuild target object
that helps accomplishes this, table-y. Linux code that uses linker
tables but does not want to require forcing compilation can use the
good 'ol obj-y targets.

Used commit c23508d796 from gPXE upstream [0] as the starting point
for development and evaluation of Linux's integration, further changes
made and evaluation of different linker table options for Linux are
available on the table-init userspace mockup solution [1].

[0] git://git.etherboot.org/scm/gpxe.git
[1] https://github.com/mcgrof/table-init.git

Cc: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 Documentation/kbuild/makefiles.txt |  19 ++
 include/linux/tables.h             | 421 +++++++++++++++++++++++++++++++++++++
 scripts/Makefile.build             |   4 +-
 scripts/Makefile.clean             |   1 +
 scripts/Makefile.lib               |  12 ++
 5 files changed, 455 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/tables.h

diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt
index 13f888a02a3d..0a21b8db2f90 100644
--- a/Documentation/kbuild/makefiles.txt
+++ b/Documentation/kbuild/makefiles.txt
@@ -1088,6 +1088,25 @@ When kbuild executes, the following steps are followed (roughly):
 	In this example, extra-y is used to list object files that
 	shall be built, but shall not be linked as part of built-in.o.
 
+    table-y table-m and table-
+
+	To avoid code bit-rot you may wish to avoid using #ifdefs on your
+	code. Bit-rot is possible if certain features may not be enabled
+	on certain build environments. The table-y, table-m specifies
+	targets which you always wish to force compilation on, but wish
+	to only enable linking in if the feature is enabled, these are
+	features taking advantage of Linux's linker tables. For details
+	on how to implement a feature using linker tablesrefer to
+	include/linux/tables.h.
+
+	Example:
+		table-$(CONFIG-FEATURE_FOO) += foo.o
+
+	An alternative to using table-y, table-m is to use extra-y followed
+	by the respective obj-y, obj-m:
+
+		extra-y += foo.o
+		obj-$(CONFIG-FEATURE_FOO) += foo.o
 
 --- 6.7 Commands useful for building a boot image
 
diff --git a/include/linux/tables.h b/include/linux/tables.h
new file mode 100644
index 000000000000..380b8220bed8
--- /dev/null
+++ b/include/linux/tables.h
@@ -0,0 +1,421 @@
+#ifndef _LINUX_LINKER_TABLES_H
+#define _LINUX_LINKER_TABLES_H
+/*
+ * Linux linker tables
+ *
+ * Copyright (C) 2005-2009 Michael Brown <mcb30@ipxe.org>
+ * Copyright (C) 2015 Luis R. Rodriguez <mcgrof@do-not-panic.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * Due to this file being licensed under the GPL there is controversy over
+ * whether this permits you to write a module that #includes this file
+ * without placing your module under the GPL.  Please consult a lawyer for
+ * advice before doing this.
+ */
+
+/**
+ * DOC: linker tables - simplifying inits and for when #ifdefs are harmful
+ *
+ * Linker tables help you simplify init sequences by using ELF sections, linker
+ * build time selective sorting (disabled options get ignored), and can also be
+ * used to help you avoid bit-rot code due to #ifdery collateral.
+ *
+ * The code bit-rot prolem:
+ *
+ * Overuse of C #ifdefs can be problematic for certain types of code.  Linux
+ * provides a rich array of features, but all these features take up valuable
+ * space in a kernel image. The traditional solution to this problem has been
+ * for each feature to have its own Kconfig entry and for the respective code
+ * to be wrapped around #ifdefs, allowing the feature to be compiled in only
+ * if desired.
+ *
+ * The problem with this is that over time it becomes very difficult and time
+ * consuming to compile, let alone test, all possible versions of Linux. Code
+ * that is not typically used tends to suffer from bit-rot over time. It can
+ * become difficult to predict which combinations of compile-time options will
+ * result in code that can compile and link correctly.
+ *
+ * To solve this problem linker tables can be used on Linux, it enables you to
+ * always force compiling of select features that one wishes to avoid bit-rot
+ * while still enabling you to disable linking feature code into the final
+ * kernel image or building certain modules if the features have been disabled
+ * via Kconfig. The code is derivative of gPXE linker table's solution.  For
+ * more details on history and to see how this code evolved refer to the
+ * userspace mockup solution on github [0]. This repository can also be used to
+ * for ease of testing of extensions and sampling of changes prior to inclusion
+ * into Linux, it is intended to be kept up to date to match Linux's solution.
+ * Contrary to gPXE's solution, which stives to force compilation of
+ * *everything*, Linux's solution allows for developers to be selective over
+ * where one should use linker tables.
+ *
+ * [0] https://github.com/mcgrof/table-init/
+ *
+ * To use linker tables, features should be implemented in separate C files,
+ * and should always be compiled -- they should not be guarded with a C code
+ * #ifdef CONFIG_FOO statements. By making code always compile, you avoid
+ * the problem of bit-rot in rarely-used code. The trick to linker tables
+ * is by forcing compilation but only linking if the feature has been enabled.
+ * This is accomplished with a new target for Makefiles documented below.
+ *
+ * Let's assume we want to always force compilation of feature FOO in the
+ * kernel but avoid linking it. When you enable the FOO feature via Kconfig
+ * you'd end up with:
+ *
+ *   #define CONFIG_FOO 1
+ *
+ * You typically would then just use this on your Makefile to selectively
+ * compile and link the feature:
+ *
+ * obj-$(CONFIG_FOO) += foo.o
+ *
+ * You'd instead use the new linker table object:
+ *
+ * table-$(CONFIG_FOO) += foo.o
+ *
+ * Alternatively this would be the equivalent of listing:
+ *
+ * extra += foo.o
+ * obj-$(CONFIG_FOO) += foo.o
+ *
+ * The reason why this works cleanly is due to how linker tables work.
+ * Traditionally, we would implement features in C code as follows:
+ *
+ *	foo_init();
+ *
+ * You'd then have a foo.h which would have:
+ *
+ *  #ifdef CONFIG_FOO
+ *  void foo(void);
+ *  #else
+ *  static inline void foo(void) { }
+ *  #endif
+ *
+ * Simplifying inits:
+ *
+ * With linker tables this is no longer necessary as your init routines would
+ * be implicit, you'd instead call:
+ *
+ *	call_init_fns();
+ *
+ * call_init_fns() would call all functions present in your init table and if
+ * and only if foo.o gets linked in, then its initialisation function will be
+ * called.
+ *
+ * The linker script takes care of assembling the tables for us. All of our
+ * table sections have names of the format .tbl.NAME.NN. NAME designates the
+ * data structure stored in the table. For instance if you had defined a struct
+ * init_fn as the data type for your init sequences you would have a respective
+ * init_fns table name declared as reference for these init sequences. NN is a
+ * two-digit decimal number used to impose an "order level" upon the tables if
+ * required. NN=00 is reserved for the symbol indicating "table start", and
+ * NN=99 is reserved for the symbol indicating "table end". In order for the
+ * call_init_fns() to work behind the scenes the custom linker script would
+ * need to define the beginning of the table, the end of the table, and in
+ * between it should use SORT() to give order-level effect. To account for all
+ * of your struct init_fn init sequences, in your linker script you would have:
+ *
+ *   .tbl            : {
+ *	__tbl_start_init_fns = .;
+ * 	*(SORT(.tbl.init_fns.*))
+ *	__tbl_end_init_fns = .;
+ *
+ *	Other init tables can go here as well for different
+ *	structures.
+ *   }
+ *
+ * The order-level is really only a helper, if only one order level is
+ * used, the next contributing factor to order is the order of the code
+ * in the C file, and the order of the objects in the Makefile. Using
+ * an order level then should not really be needed in most cases, its
+ * use however enables to compartamentalize code into tables where ordering
+ * through C file or through the Makefile would otherwise be very difficult
+ * or if one wanted to enable very specific initialization semantics.
+ *
+ * As an example, suppose that we want to create a "frobnicator"
+ * feature framework, and allow for several independent modules to
+ * provide frobnicating services. Then we would create a frob.h
+ * header file containing e.g.
+ *
+ *   struct frobnicator {
+ *      const char *name;
+ *	void (*frob) (void);
+ *   };
+ *
+ *   #define FROBNICATORS __table(struct frobnicator, "frobnicators")
+ *   #define __frobnicator __table_entry(FROBNICATORS, 01)
+ *
+ * Any module providing frobnicating services would look something
+ * like
+ *
+ *   #include "frob.h"
+ *
+ *   static void my_frob(void) {
+ *	... Do my frobnicating
+ *   }
+ *
+ *   struct frob my_frobnicator __frobnicator = {
+ *	.name = "my_frob",
+ *	.frob = my_frob,
+ *   };
+ *
+ * The central frobnicator code (frob.c) would use the frobnicating
+ * modules as follows
+ *
+ *   #include "frob.h"
+ *
+ *   void frob_all(void) {
+ *	struct frob *frob;
+ *
+ *	for_each_table(frob, FROBNICATORS) {
+ *         pr_info("Calling frobnicator \"%s\"\n", frob->name);
+ *	   frob->frob();
+ *	}
+ *   }
+ */
+
+/*
+ * __table - Declares a linker table
+ *
+ * @type: data type
+ * @name: table name
+ *
+ * Declares a linker table.
+ */
+#define __table(type, name) (type, name)
+
+/*
+ * __table_type() - get linker table data type
+ *
+ * @table: linker table
+ *
+ * Gives you the linker table data type.
+ */
+#define __table_type(table) __table_extract_type table
+#define __table_extract_type(type, name) type
+
+/*
+ * __table_name - get linker table name
+ *
+ * @table: linker table
+ *
+ * Gives you the table name.
+ */
+#define __table_name(table) __table_extract_name table
+#define __table_extract_name(type, name) name
+
+/*
+ * __table_section - get linker table section name
+ *
+ * @table: linker table
+ * @idx: order level, this is the sub-table index
+ *
+ * Declares the section name.
+ */
+#define __table_section(table, idx) \
+	".tbl." __table_name (table) "." __table_str (idx)
+#define __table_str(x) #x
+
+/*
+ * __table_alignment - get linker table alignment
+ *
+ * @table: linker table
+ *
+ * Gives you the linker table alignment.
+ */
+#define __table_alignment( table ) __alignof__ (__table_type(table))
+
+/*
+ * __table_entry - declare a linker table entry
+ *
+ * @table: linker table
+ * @idx: order level, the sub-table index
+ *
+ * Example usage:
+ *
+ *   #define FROBNICATORS __table (struct frobnicator, "frobnicators")
+ *   #define __frobnicator __table_entry(FROBNICATORS, 01)
+ *
+ *   struct frobnicator my_frob __frobnicator = {
+ *      ...
+ *   };
+ */
+#define __table_entry(table, idx)					\
+	__attribute__ ((__section__(__table_section(table, idx)),	\
+			__aligned__(__table_alignment(table))))
+
+/*
+ * __table_entries - get start of linker table entries
+ *
+ * @table: linker table
+ * @idx: order level, the sub-table index
+ *
+ * This gives you the start of the respective linker table entries
+ */
+#define __table_entries(table, idx) ( {					\
+	static __table_type(table) __table_entries[0]			\
+		__table_entry(table, idx); 				\
+	__table_entries; } )
+
+/*
+ * table_start - get start of linker table
+ *
+ * @table: linker table
+ *
+ * This gives you the start of the respective linker table.
+ *
+ * Example usage:
+ *
+ *   #define FROBNICATORS __table (struct frobnicator, "frobnicators")
+ *
+ *   struct frobnicator *frobs = table_start(FROBNICATORS);
+ */
+#define table_start(table) __table_entries(table, 00)
+
+/*
+ * table_end - get end of linker table
+ *
+ * @table: linker table
+ *
+ * This gives you the end of the respective linker table.
+ *
+ * Example usage:
+ *
+ *   #define FROBNICATORS __table (struct frobnicator, "frobnicators")
+ *
+ *   struct frobnicator *frobs_end = table_end(FROBNICATORS);
+ */
+#define table_end(table) __table_entries(table, 99)
+
+/*
+ * table_num_entries - get number of entries in linker table
+ *
+ * @table: linker table
+ *
+ * This gives you the number of entries in linker table.
+ *
+ * Example usage:
+ *
+ *   #define FROBNICATORS __table(struct frobnicator, "frobnicators")
+ *
+ *   unsigned int num_frobs = table_num_entries(FROBNICATORS);
+ */
+#define table_num_entries(table)					\
+	((unsigned int) (table_end(table) -				\
+			 table_start(table)))
+
+/*
+ * for_each_table_entry - iterate through all entries within a linker table
+ *
+ * @pointer: entry pointer
+ * @table: linker table
+ *
+ * Example usage:
+ *
+ *   #define FROBNICATORS __table(struct frobnicator, "frobnicators")
+ *
+ *   struct frobnicator *frob;
+ *
+ *   for_each_table_entry(frob, FROBNICATORS) {
+ *     ...
+ *   }
+ */
+#define for_each_table_entry(pointer, table)				\
+	for (pointer = table_start(table);				\
+	     pointer < table_end(table);				\
+	     pointer++)
+
+/**
+ * for_each_table_entry_reverse - iterate through linker table in reverse order
+ *
+ * @pointer: entry pointer
+ * @table: linker table
+ *
+ * Example usage:
+ *
+ *   #define FROBNICATORS __table(struct frobnicator, "frobnicators")
+ *
+ *   struct frobnicator *frob;
+ *
+ *   for_each_table_entry_reverse(frob, FROBNICATORS) {
+ *     ...
+ *   }
+ */
+#define for_each_table_entry_reverse(pointer, table)			\
+	for (pointer = (table_end(table) - 1 );				\
+	     pointer >= table_start(table);				\
+	     pointer--)
+
+/*
+ *
+ * Intel's C compiler chokes on several of the constructs used in this
+ * file. The workarounds are ugly, so we use them only for an icc
+ * build.
+ */
+#define ICC_ALIGN_HACK_FACTOR 128
+#ifdef __ICC
+
+/*
+ * icc miscompiles zero-length arrays by inserting padding to a length
+ * of two array elements. We therefore have to generate the
+ * __table_entries() symbols by hand in asm.
+ */
+#undef __table_entries
+#define __table_entries(table, idx) ( {					\
+	extern __table_type(table)					\
+		__table_temp_sym(idx, __LINE__) []			\
+		__table_entry(table, idx) 				\
+		asm (__table_entries_sym(table, idx));			\
+	__asm__( ".ifndef %c0\n\t"					\
+		  ".section " __table_section(table, idx) "\n\t"	\
+		  ".align %c1\n\t"					\
+	          "\n%c0:\n\t"						\
+		  ".previous\n\t" 					\
+		  ".endif\n\t"						\
+		  : : "i" (__table_temp_sym(idx, __LINE__ )),		\
+		      "i" (__table_alignment(table)));			\
+	__table_temp_sym ( idx, __LINE__ ); } )
+#define __table_entries_sym(table, idx)					\
+	"__tbl_" __table_name(table) "_" #idx
+#define __table_temp_sym(a, b)						\
+	___table_temp_sym(__table_, a, _, b)
+#define ___table_temp_sym(a, b, c, d) a ## b ## c ## d
+
+/*
+ * icc ignores __attribute__ (( aligned (x) )) when it is used to
+ * decrease the compiler's default choice of alignment (which may be
+ * higher than the alignment actually required by the structure).  We
+ * work around this by forcing the alignment to a large multiple of
+ * the required value (so that we are never attempting to decrease the
+ * default alignment) and then postprocessing the object file to
+ * reduce the alignment back down to the "real" value.
+ *
+ */
+#undef __table_alignment
+#define __table_alignment(table) \
+	(ICC_ALIGN_HACK_FACTOR * __alignof__(__table_type(table)))
+
+/*
+ * Because of the alignment hack, we must ensure that the compiler
+ * never tries to place multiple objects within the same section,
+ * otherwise the assembler will insert padding to the (incorrect)
+ * alignment boundary.  Do this by appending the line number to table
+ * section names.
+ *
+ * Note that we don't need to worry about padding between array
+ * elements, since the alignment is declared on the variable (i.e. the
+ * whole array) rather than on the type (i.e. on all individual array
+ * elements).
+ */
+#undef __table_section
+#define __table_section(table, idx) \
+	".tbl." __table_name(table) "." __table_str(idx) \
+	"." __table_xstr(__LINE__)
+#define __table_xstr(x) __table_str(x)
+
+#endif /* __ICC */
+
+#endif /* _LINUX_LINKER_TABLES_H */
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 01df30af4d4a..4210c6493a17 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -91,7 +91,7 @@ modorder-target := $(obj)/modules.order
 
 # We keep a list of all modules in $(MODVERDIR)
 
-__build: $(if $(KBUILD_BUILTIN),$(builtin-target) $(lib-target) $(extra-y)) \
+__build: $(if $(KBUILD_BUILTIN),$(builtin-target) $(lib-target) $(extra-y) $(table-y)) \
 	 $(if $(KBUILD_MODULES),$(obj-m) $(modorder-target)) \
 	 $(subdir-ym) $(always)
 	@:
@@ -294,7 +294,7 @@ $(obj)/%.o: $(src)/%.S FORCE
 	$(call if_changed_dep,as_o_S)
 
 targets += $(real-objs-y) $(real-objs-m) $(lib-y)
-targets += $(extra-y) $(MAKECMDGOALS) $(always)
+targets += $(extra-y) $(table-y) $(MAKECMDGOALS) $(always)
 
 # Linker scripts preprocessor (.lds.S -> .lds)
 # ---------------------------------------------------------------------------
diff --git a/scripts/Makefile.clean b/scripts/Makefile.clean
index 55c96cb8070f..0af1e12c9749 100644
--- a/scripts/Makefile.clean
+++ b/scripts/Makefile.clean
@@ -36,6 +36,7 @@ subdir-ymn	:= $(addprefix $(obj)/,$(subdir-ymn))
 # directory
 
 __clean-files	:= $(extra-y) $(extra-m) $(extra-)       \
+		   $(table-y) $(table-m) $(table-)       \
 		   $(always) $(targets) $(clean-files)   \
 		   $(host-progs)                         \
 		   $(hostprogs-y) $(hostprogs-m) $(hostprogs-)
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 79e86613712f..3d826e5fff9d 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -12,6 +12,16 @@ export KBUILD_SUBDIR_CCFLAGS := $(KBUILD_SUBDIR_CCFLAGS) $(subdir-ccflags-y)
 # Figure out what we need to build from the various variables
 # ===========================================================================
 
+# Linker tables objects always wish to be built to avoid bit-rot in
+# code, but only linked in *iff* they were enabled. We accomplish this
+# using pegging linker table objects into extra-y, which forces
+# compilation and then using the respective table-y and table-m as
+# as hints for things we do want enabled. Objects which we want to
+# avoid linking in will be in table-, not table-y and table-m.
+extra-y += $(table-)
+obj-m += $(table-m)
+obj-y += $(table-y)
+
 # When an object is listed to be built compiled-in and modular,
 # only build the compiled-in version
 
@@ -72,6 +82,8 @@ real-objs-m := $(foreach m, $(obj-m), $(if $(strip $($(m:.o=-objs)) $($(m:.o=-y)
 # Add subdir path
 
 extra-y		:= $(addprefix $(obj)/,$(extra-y))
+table-y		:= $(addprefix $(obj)/,$(table-y))
+table-m		:= $(addprefix $(obj)/,$(table-m))
 always		:= $(addprefix $(obj)/,$(always))
 targets		:= $(addprefix $(obj)/,$(targets))
 modorder	:= $(addprefix $(obj)/,$(modorder))
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [RFC v1 3/8] x86/boot: add BIT() to boot/bitops.h
  2015-12-15 22:16 [RFC v1 0/8] x86/init: Linux linker tables Luis R. Rodriguez
  2015-12-15 22:16 ` [RFC v1 1/8] paravirt: rename paravirt_enabled to paravirt_legacy Luis R. Rodriguez
  2015-12-15 22:16 ` [RFC v1 2/8] tables.h: add linker table support Luis R. Rodriguez
@ 2015-12-15 22:16 ` Luis R. Rodriguez
       [not found]   ` <20160120201718.GC4769@char.us.oracle.com>
  2015-12-15 22:16 ` [RFC v1 4/8] x86/init: add linker table support Luis R. Rodriguez
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 60+ messages in thread
From: Luis R. Rodriguez @ 2015-12-15 22:16 UTC (permalink / raw)
  To: hpa, tglx, mingo, bp, konrad.wilk
  Cc: rusty, luto, boris.ostrovsky, mcb30, jgross, JBeulich, joro,
	ryabinin.a.a, andreyknvl, long.wanglong, qiuxishi, aryabinin,
	mchehab, valentinrothberg, peter.senna, mcgrof, x86, xen-devel,
	linux-kernel

From: "Luis R. Rodriguez" <mcgrof@suse.com>

The boot/bitops.h guards against included the regular bitops,
it only implements what we need at early boot. We'll be making
use of BIT() later so add it.

Users of boot/boot.h must include it prior to asm/setup.h
otherwise the guard protection devise against the regular
linux/bitops.h will not take effect.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 arch/x86/boot/bitops.h | 2 ++
 arch/x86/boot/boot.h   | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/bitops.h b/arch/x86/boot/bitops.h
index 878e4b9940d9..232cff0ff4e3 100644
--- a/arch/x86/boot/bitops.h
+++ b/arch/x86/boot/bitops.h
@@ -40,4 +40,6 @@ static inline void set_bit(int nr, void *addr)
 	asm("btsl %1,%0" : "+m" (*(u32 *)addr) : "Ir" (nr));
 }
 
+#define BIT(x)	(1 << x)
+
 #endif /* BOOT_BITOPS_H */
diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h
index 9011a88353de..4fb53da1f48a 100644
--- a/arch/x86/boot/boot.h
+++ b/arch/x86/boot/boot.h
@@ -23,8 +23,8 @@
 #include <stdarg.h>
 #include <linux/types.h>
 #include <linux/edd.h>
-#include <asm/setup.h>
 #include "bitops.h"
+#include <asm/setup.h>
 #include "ctype.h"
 #include "cpuflags.h"
 
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [RFC v1 4/8] x86/init: add linker table support
  2015-12-15 22:16 [RFC v1 0/8] x86/init: Linux linker tables Luis R. Rodriguez
                   ` (2 preceding siblings ...)
  2015-12-15 22:16 ` [RFC v1 3/8] x86/boot: add BIT() to boot/bitops.h Luis R. Rodriguez
@ 2015-12-15 22:16 ` Luis R. Rodriguez
       [not found]   ` <20160120210014.GF4769@char.us.oracle.com>
  2015-12-15 22:16 ` [RFC v1 5/8] x86/init: move ebda reservations into linker table Luis R. Rodriguez
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 60+ messages in thread
From: Luis R. Rodriguez @ 2015-12-15 22:16 UTC (permalink / raw)
  To: hpa, tglx, mingo, bp, konrad.wilk
  Cc: rusty, luto, boris.ostrovsky, mcb30, jgross, JBeulich, joro,
	ryabinin.a.a, andreyknvl, long.wanglong, qiuxishi, aryabinin,
	mchehab, valentinrothberg, peter.senna, mcgrof, x86, xen-devel,
	linux-kernel

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Any failure on the x86 init path can be catastrophic.
A simple shift of a call from one place to another can
easily break things. Likewise adding a new call to
one path without considering all x86 requirements
can make certain x86 run time environments crash.
We currently account for these requirements through
peer code review and run time testing. We could do
much better if we had a clean and simple way to annotate
strong semantics for run time requirements, init sequence
dependencies, and detection mechanisms for additions of
new x86 init sequences.

This adds support for linker tables for x86 in order
to help define strong semantics for x86 init sequences.
The defined struct x86_init_fn is inspired by both
gPXE's sample definitions but is also also heavily
inspired by Linux's own IOMMU initialiation solution
which enables to extend initialization semantics to
support init routine custom detection routines, and
enables adding annotations for init routine dependency
mapping. A newly featured solution in this design
is to build up on the hardware subarchitecture
added to the x86 boot protocol 2.07 in 2007 by Rusty
but never really taken advantage of except for lguest,
using it as a stop-gap for new x86 features which
have failed to take into consideration the dual x86-64
entry points possible due to paravirtualization yielding
requirements. The hardware subarchitecture could also
potentially be used in the future to help unify Linux x86
entry points. The current disjoint entry points for x86-64 can
be summarized as follows.

Bare metal, KVM, Xen HVM                      Xen PV / dom0
      startup_64()                             startup_xen()
             \                                     /
     x86_64_start_kernel()                 xen_start_kernel()
                          \               /
                     x86_64_start_reservations()
                                  |
                             start_kernel()
                             [   ...        ]
                             [ setup_arch() ]
                             [   ...        ]
                                 init

Inherited by the nature of using linker tables we also
gain the ability to sort init sequences using the linker
through the use of order-level and linker SORT() and only
optionally enable use specific init sequence sorting when
init seqeuence dependencies semantics are needed. By using
linker tables we can also avoid #ifdefer'y on select code
when and if desirable, this is also completley optional but
it is a feature we inherit by using linker tables.

For *new x86 features* this enables strong semantics to be
considered. For *existing x86 feature code* this enables the
opportunity to clarify requirements and dependencies,
which ultimately also provides a new way to ensure that code
that should not run never runs, that is, it provides one
mechanism to help prevent dead code. For a more elaborate
description on what dead code is exactly and how it can relate
to init sequences in particular to Xen due to its init sequence
refer to [0] and [1].

Four debug x86 features are included to help test x86 linker
table support. You'll want CONFIG_X86_DEBUG_LINKER_TABLES to
enable those. The features defined with table-$(CONFIG-x)
will *always* be compiled but only linked in when enabled.
The features defined with the good 'ol obj-y will only be
compiled as we're used to. If CONFIG_X86_DEBUG_LINKER_TABLES is
disabled nothing currently runs as there are no init sequences
yet using any of this.

The goal is to evolve the semantics carefully as needed. We start
with one basic callback, an early_init() called at the beginning
of x86_64_start_reservations().

[0] http://www.do-not-panic.com/2015/12/avoiding-dead-code-pvops-not-silver-bullet.html
[1] http://www.do-not-panic.com/2015/12/xen-and-x86-linux-zero-page.html

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 arch/x86/Kconfig.debug              |  47 +++++++
 arch/x86/include/asm/x86_init.h     |   1 +
 arch/x86/include/asm/x86_init_fn.h  | 267 ++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/Makefile            |   2 +
 arch/x86/kernel/dbg-tables/Makefile |  18 +++
 arch/x86/kernel/dbg-tables/alpha.c  |  10 ++
 arch/x86/kernel/dbg-tables/beta.c   |  18 +++
 arch/x86/kernel/dbg-tables/delta.c  |  10 ++
 arch/x86/kernel/dbg-tables/gamma.c  |  18 +++
 arch/x86/kernel/dbg-tables/gamma.h  |   3 +
 arch/x86/kernel/head32.c            |   4 +
 arch/x86/kernel/head64.c            |   4 +
 arch/x86/kernel/init.c              |  55 ++++++++
 arch/x86/kernel/sort-init.c         | 114 +++++++++++++++
 arch/x86/kernel/vmlinux.lds.S       |  25 ++++
 arch/x86/tools/relocs.c             |   1 +
 16 files changed, 597 insertions(+)
 create mode 100644 arch/x86/include/asm/x86_init_fn.h
 create mode 100644 arch/x86/kernel/dbg-tables/Makefile
 create mode 100644 arch/x86/kernel/dbg-tables/alpha.c
 create mode 100644 arch/x86/kernel/dbg-tables/beta.c
 create mode 100644 arch/x86/kernel/dbg-tables/delta.c
 create mode 100644 arch/x86/kernel/dbg-tables/gamma.c
 create mode 100644 arch/x86/kernel/dbg-tables/gamma.h
 create mode 100644 arch/x86/kernel/init.c
 create mode 100644 arch/x86/kernel/sort-init.c

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 137dfa96aa14..e66b030e82bc 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -400,4 +400,51 @@ config PUNIT_ATOM_DEBUG
 	  The current power state can be read from
 	  /sys/kernel/debug/punit_atom/dev_power_state
 
+config X86_DEBUG_LINKER_TABLES
+	bool "x86 linker table debug"
+	depends on DEBUG_KERNEL
+	default n
+	---help---
+	  Enabling this should enable debugging linker tables on x86
+	  with its own solution. To help debug it enables the "x86 beta"
+	  and "x86 gamma" built-in feature. Both "x86 alpha" and "x86 delta"
+	  built-in features are left disabled. You should see all tables
+	  compiled, except delta, only the beta and gamma feature will be
+	  linked in. The delta feature will only be compiled and linked in
+	  if and only if its enabled the old way.
+
+	  For more details on these linker table refer to:
+
+	  include/linux/tables.h
+
+	  If unsure, say N.
+
+config X86_DEBUG_LINKER_TABLE_ALPHA
+	bool "x86 linker table alpha"
+	default n
+	depends on X86_DEBUG_LINKER_TABLES
+	---help---
+	  Enabling this should enable the linker table "x86 alpha" feature.
+
+config X86_DEBUG_LINKER_TABLE_BETA
+	bool "x86 linker table beta"
+	default y
+	depends on X86_DEBUG_LINKER_TABLES
+	---help---
+	  Enabling this should enable the linker table "x86 beta" feature.
+
+config X86_DEBUG_LINKER_TABLE_GAMMA
+	bool "x86 linker table gamma"
+	default y
+	depends on X86_DEBUG_LINKER_TABLES
+	---help---
+	  Enabling this should enable the linker table "x86 gamma" feature.
+
+config X86_DEBUG_LINKER_TABLE_DELTA
+	bool "x86 linker table delta"
+	default n
+	depends on X86_DEBUG_LINKER_TABLES
+	---help---
+	  Enabling this should enable the linker table "x86 delta" feature.
+
 endmenu
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index cd0fc0cc78bc..b1b941e9ccd7 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -2,6 +2,7 @@
 #define _ASM_X86_PLATFORM_H
 
 #include <asm/bootparam.h>
+#include <asm/x86_init_fn.h>
 
 struct mpc_bus;
 struct mpc_cpu;
diff --git a/arch/x86/include/asm/x86_init_fn.h b/arch/x86/include/asm/x86_init_fn.h
new file mode 100644
index 000000000000..2c9e7a5f31c6
--- /dev/null
+++ b/arch/x86/include/asm/x86_init_fn.h
@@ -0,0 +1,267 @@
+#ifndef __X86_INIT_TABLES_H
+#define __X86_INIT_TABLES_H
+
+#include <linux/types.h>
+#include <linux/tables.h>
+#include <linux/init.h>
+#include <linux/bitops.h>
+
+/**
+ * struct x86_init_fn - x86 generic kernel init call
+ *
+ * Linux x86 features vary in complexity, features may require work done at
+ * different levels of the full x86 init sequence. Today there are also two
+ * different possible entry points for Linux on x86, one for bare metal, KVM
+ * and Xen HVM, and another for Xen PV guests / dom0.  Assuming a bootloader
+ * has set up 64-bit mode, roughly the x86 init sequence follows this path:
+ *
+ * Bare metal, KVM, Xen HVM                      Xen PV / dom0
+ *       startup_64()                             startup_xen()
+ *              \                                     /
+ *      x86_64_start_kernel()                 xen_start_kernel()
+ *                           \               /
+ *                      x86_64_start_reservations()
+ *                                   |
+ *                              start_kernel()
+ *                              [   ...        ]
+ *                              [ setup_arch() ]
+ *                              [   ...        ]
+ *                                  init
+ *
+ * x86_64_start_kernel() and xen_start_kernel() are the respective first C code
+ * entry starting points. The different entry points exist to enable Xen to
+ * skip a lot of hardware setup already done and managed on behalf of the
+ * hypervisor, we refer to this as "paravirtualization yielding". The different
+ * levels of init calls on the x86 init sequence exist to account for these
+ * slight differences and requirements. These different entry points also share
+ * a common entry x86 specific path, x86_64_start_reservations().
+ *
+ * A generic x86 feature can have different initialization calls, one on each
+ * of the different main x86 init sequences, but must also address both entry
+ * points in order to work properly across the board on all supported x86
+ * subarchitectures. Since x86 features can also have dependencies on other
+ * setup code or features, x86 features can at times be subordinate to other
+ * x86 features, or conditions. struct x86_init_fn enables feature developers
+ * to annotate dependency relationships to ensure subsequent init calls only
+ * run once a subordinate's dependencies have run. When needed custom
+ * dependency requirements can also be spelled out through a custom dependency
+ * checker. In order to account for the dual entry point nature of x86-64 Linux
+ * for "paravirtualization yielding" and to make annotations for support for
+ * these explicit each struct x86_init_fn must specify supported
+ * subarchitectures. The earliest x86-64 code can read the subarchitecture
+ * though is after load_idt(), as such the earliest we can currently rely on
+ * subarchitecture for semantics and a common init sequences is on the shared
+ * common x86_64_start_reservations().  Each struct x86_init_fn must also
+ * declare a two-digit decimal number to impose an ordering relative to other
+ * features when required.
+ *
+ * x86_init_fn enables strong semantics and dependencies to be defined and
+ * implemented on the full x86 initialization sequence.
+ *
+ * @order_level: must be set, linker order level, this corresponds to the table
+ * 	section sub-table index, we record this only for semantic validation
+ * 	purposes.  Order-level is always required however you typically would
+ * 	only use X86_INIT_NORMAL*() and leave ordering to be done by placement
+ * 	of code in a C file and the order of objects through a Makefile. Custom
+ * 	order-levels can be used when order on C file and order of objects on
+ * 	Makfiles does not suffice or much further refinements are needed.
+ * @supp_hardware_subarch: must be set, it represents the bitmask of supported
+ *	subarchitectures.  We require each struct x86_init_fn to have this set
+ *	to require developer considerations for each supported x86
+ *	subarchitecture and to build strong annotations of different possible
+ *	run time states particularly in consideration for the two main
+ *	different entry points for x86 Linux, to account for paravirtualization
+ *	yielding.
+ *
+ *	The subarchitecture is read by the kernel at early boot from the
+ *	struct boot_params hardware_subarch. Support for the subarchitecture
+ *	exists as of x86 boot protocol 2.07. The bootloader would have set up
+ *	the respective hardware_subarch on the boot sector as per
+ *	Documentation/x86/boot.txt.
+ *
+ *	What x86 entry point is used is determined at run time by the
+ *	bootloader. Linux pv_ops was designed to help enable to build one Linux
+ *	binary to support bare metal and different hypervisors.  pv_ops setup
+ *	code however is limited in that all pv_ops setup code is run late in
+ *	the x86 init sequence, during setup_arch(). In fact cpu_has_hypervisor
+ *	only works after early_cpu_init() during setup_arch(). If an x86
+ *	feature requires an earlier determination of what hypervisor was used,
+ *	or if it needs to annotate only support for certain hypervisors, the
+ *	x86 hardware_subarch should be set by the bootloader and
+ *	@supp_hardware_subarch set by the x86 feature. Using hardware_subarch
+ *	enables x86 features to fill the semantic gap between the Linux x86
+ *	entry point used and what pv_ops has to offer through a hypervisor
+ *	agnostic mechanism.
+ *
+ *	Each supported subarchitecture is set using the respective
+ *	X86_SUBARCH_* as a bit in the bitmask. For instance if a feature
+ *	is supported on PC and Xen subarchitectures only you would set this
+ *	bitmask to:
+ *
+ *		BIT(X86_SUBARCH_PC) |
+ *		BIT(X86_SUBARCH_XEN);
+ *
+ * @detect: optional, if set returns true if the feature has been detected to
+ *	be required, it returns false if the feature has been detected to not
+ *	be required.
+ * @depend: optional, if set this set of init routines must be called prior to
+ * 	the init routine who's respective detect routine we have set this
+ * 	depends callback to. This is only used for sorting purposes given
+ * 	all current init callbacks have a void return type. Sorting is
+ * 	implemented via x86_init_fn_sort(), it must be called only once,
+ * 	however you can delay sorting until you need it if you can ensure
+ * 	only @order_level and @supp_hardware_subarch can account for proper
+ * 	ordering and dependency requirements for all init sequences prior.
+ *	If you do not have a depend callback set its assumed the order level
+ *	(__x86_init_fn(level)) set by the init routine suffices to set the
+ *	order for when the feature's respective callbacks are called with
+ *	respect to other calls. Sorting of init calls with the same order level
+ *	is determined by linker order, determined by order placement on C code
+ *	and order listed on a Makefile. A routine that depends on another is
+ *	known as being subordinate to the init routine it depends on. Routines
+ *	that are subordinate must have an order-level of lower priority or
+ *	equal priority than the order-level of the init sequence it depends on.
+ * @early_init: required, routine which will run in x86_64_start_reservations()
+ *	after we ensure boot_params.hdr.hardware_subarch is accessible and
+ *	properly set. Memory is not yet available. This the earliest we can
+ *	currently define a common shared callback since all callbacks need to
+ *	check for boot_params.hdr.hardware_subarch and this becomes accessible
+ *	on x86-64 until after load_idt().
+ * @flags: optional, bitmask of enum x86_init_fn_flags
+ */
+struct x86_init_fn {
+	__u32 order_level;
+	__u32 supp_hardware_subarch;
+	bool (*detect)(void);
+	bool (*depend)(void);
+	void (*early_init)(void);
+	__u32 flags;
+};
+
+/**
+ * enum x86_init_fn_flags: flags for init sequences
+ *
+ * X86_INIT_FINISH_IF_DETECTED: tells the core that once this init sequence
+ *	has completed it can break out of the loop for init sequences on
+ *	its own level.
+ * X86_INIT_DETECTED: private flag. Used by the x86 core to annotate that this
+ * 	init sequence has been detected and it all of its callbacks
+ * 	must be run during initialization.
+ */
+enum x86_init_fn_flags {
+	X86_INIT_FINISH_IF_DETECTED = BIT(0),
+	X86_INIT_DETECTED = BIT(1),
+};
+
+/* The x86 initialisation function table */
+#define X86_INIT_FNS __table(struct x86_init_fn, "x86_init_fns")
+
+/* Used to declares an x86 initialization table */
+#define __x86_init_fn(order_level) __table_entry(X86_INIT_FNS, order_level)
+
+/* Init order levels, we can start at 01 but reserve 01-09 for now */
+#define X86_INIT_ORDER_EARLY	10
+#define X86_INIT_ORDER_NORMAL	30
+#define X86_INIT_ORDER_LATE	50
+
+/*
+ * Use LTO_REFERENCE_INITCALL just in case of issues with old versions of gcc.
+ * This might not be needed for linker tables due to how we compartamentalize
+ * sections and then order them at linker time, but just in case.
+ */
+
+#define x86_init(__level,						\
+		 __supp_hardware_subarch,				\
+		 __detect,						\
+		 __depend,						\
+		 __early_init)						\
+	static struct x86_init_fn __x86_init_fn_##__early_init __used	\
+		__x86_init_fn(__level) = {				\
+		.order_level = __level,					\
+		.supp_hardware_subarch = __supp_hardware_subarch,	\
+		.detect = __detect,					\
+		.depend = __depend,					\
+		.early_init = __early_init,				\
+	};								\
+	LTO_REFERENCE_INITCALL(__x86_init_fn_##__early_init);
+
+#define x86_init_early(__supp_hardware_subarch,				\
+		       __detect,					\
+		       __depend,					\
+		       __early_init)					\
+	x86_init(X86_INIT_ORDER_EARLY, __supp_hardware_subarch,		\
+		 __detect, __depend,					\
+		 __early_init);
+
+#define x86_init_normal(__supp_hardware_subarch,			\
+		       __detect,					\
+		       __depend,					\
+		       __early_init)					\
+	x86_init(__name, X86_INIT_ORDER_NORMAL, __supp_hardware_subarch,\
+		 __detect, __depend,					\
+		 __early_init);
+
+#define x86_init_early_all(__detect,					\
+			   __depend,					\
+			   __early_init)				\
+	x86_init_early(__name, X86_SUBARCH_ALL_SUBARCHS,		\
+		       __detect, __depend,				\
+		       __early_init);
+
+#define x86_init_early_pc(__detect,					\
+			  __depend,					\
+			  __early_init)					\
+	x86_init_early(BIT(X86_SUBARCH_PC),				\
+		       __detect, __depend,				\
+		       __early_init);
+
+#define x86_init_early_pc_simple(__early_init)				\
+	x86_init_early((BIT(X86_SUBARCH_PC)), NULL, NULL,		\
+		       __early_init);
+
+#define x86_init_normal_all(__detect,					\
+			    __depend,					\
+			    __early_init)				\
+	x86_init_normal(X86_SUBARCH_ALL_SUBARCHS,			\
+		        __detect, __depend,				\
+		        __early_init);
+
+#define x86_init_normal_pc(__detect,					\
+			   __depend,					\
+			   __early_init)				\
+	x86_init_normal((BIT(X86_SUBARCH_PC)),				\
+		        __detect, __depend,				\
+		        __early_init);
+
+
+#define x86_init_normal_xen(__detect,					\
+			    __depend,					\
+			    __early_init)				\
+	x86_init_normal((BIT(X86_SUBARCH_XEN)),				\
+		        __detect, __depend,				\
+		        __early_init);
+
+/**
+ * x86_init_fn_early_init: call all early_init() callbacks
+ *
+ * This calls all early_init() callbacks on the x86_init_fns linker table.
+ */
+void x86_init_fn_early_init(void);
+
+/**
+ * x86_init_fn_init_tables - sort and check x86 linker table
+ *
+ * This sorts struct x86_init_fn init sequences in the x86_init_fns linker
+ * table by ensuring that init sequences that depend on other init sequences
+ * are placed later in the linker table. Init sequences that do not have
+ * dependencies are left in place. Circular dependencies are not allowed.
+ * The order-level of subordinate init sequences, that is of init sequences
+ * that depend on other init sequences, must have an order-level of lower
+ * or equal priority to the init sequence it depends on.
+ *
+ * This also validates semantics of all struct x86_init_fn init sequences
+ * on the x86_init_fns linker table.
+ */
+void x86_init_fn_init_tables(void);
+
+#endif /* __X86_INIT_TABLES_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index b1b78ffe01d0..be167a0a5e2c 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -23,6 +23,8 @@ KASAN_SANITIZE_dumpstack_$(BITS).o := n
 CFLAGS_irq.o := -I$(src)/../include/asm/trace
 
 obj-y			:= process_$(BITS).o signal.o
+obj-y			+= init.o sort-init.o
+obj-$(CONFIG_X86_DEBUG_LINKER_TABLES) += dbg-tables/
 obj-$(CONFIG_COMPAT)	+= signal_compat.o
 obj-y			+= traps.o irq.o irq_$(BITS).o dumpstack_$(BITS).o
 obj-y			+= time.o ioport.o dumpstack.o nmi.o
diff --git a/arch/x86/kernel/dbg-tables/Makefile b/arch/x86/kernel/dbg-tables/Makefile
new file mode 100644
index 000000000000..02d12c502ad0
--- /dev/null
+++ b/arch/x86/kernel/dbg-tables/Makefile
@@ -0,0 +1,18 @@
+# You should see all these compile but at run time you'd
+# only see the ones that were linked in.
+
+# This ensures we always compile but only link
+# in alpha if it was enabled. This is typically what
+# you are expected to use to avoid code bit-rot.
+table-$(CONFIG_X86_DEBUG_LINKER_TABLE_ALPHA)	+= alpha.o
+
+# This accomplishes the same, but requires 2 lines.
+extra-y						+= beta.o
+obj-$(CONFIG_X86_DEBUG_LINKER_TABLE_BETA)	+= beta.o
+
+table-$(CONFIG_X86_DEBUG_LINKER_TABLE_GAMMA)	+= gamma.o
+
+# If you *know* you only want to enable compilation of
+# a feature when its selected you can just use the good
+# ol' obj
+obj-$(CONFIG_X86_DEBUG_LINKER_TABLE_DELTA)	+= delta.o
diff --git a/arch/x86/kernel/dbg-tables/alpha.c b/arch/x86/kernel/dbg-tables/alpha.c
new file mode 100644
index 000000000000..54754f893a08
--- /dev/null
+++ b/arch/x86/kernel/dbg-tables/alpha.c
@@ -0,0 +1,10 @@
+#define pr_fmt(fmt) "debug-alpha: " fmt
+
+#include <linux/kernel.h>
+#include <asm/x86_init.h>
+
+static void early_init_dbg_alpha(void) {
+	pr_info("early_init triggered\n");
+}
+
+x86_init_early_pc_simple(early_init_dbg_alpha);
diff --git a/arch/x86/kernel/dbg-tables/beta.c b/arch/x86/kernel/dbg-tables/beta.c
new file mode 100644
index 000000000000..7384a57fc386
--- /dev/null
+++ b/arch/x86/kernel/dbg-tables/beta.c
@@ -0,0 +1,18 @@
+#define pr_fmt(fmt) "debug-beta: " fmt
+
+#include <linux/kernel.h>
+#include <asm/x86_init.h>
+
+#include "gamma.h"
+
+static bool x86_dbg_detect_beta(void)
+{
+	return true;
+}
+
+static void early_init_dbg_beta(void) {
+	pr_info("early_init triggered\n");
+}
+x86_init_early_pc(x86_dbg_detect_beta,
+		  x86_dbg_detect_gamma,
+		  early_init_dbg_beta);
diff --git a/arch/x86/kernel/dbg-tables/delta.c b/arch/x86/kernel/dbg-tables/delta.c
new file mode 100644
index 000000000000..9d38c68e602a
--- /dev/null
+++ b/arch/x86/kernel/dbg-tables/delta.c
@@ -0,0 +1,10 @@
+#define pr_fmt(fmt) "debug-delta: " fmt
+
+#include <linux/kernel.h>
+#include <asm/x86_init.h>
+
+static void early_init_dbg_delta(void) {
+	pr_info("early_init triggered\n");
+}
+
+x86_init_early_pc_simple(early_init_dbg_delta);
diff --git a/arch/x86/kernel/dbg-tables/gamma.c b/arch/x86/kernel/dbg-tables/gamma.c
new file mode 100644
index 000000000000..7b663c1f08f4
--- /dev/null
+++ b/arch/x86/kernel/dbg-tables/gamma.c
@@ -0,0 +1,18 @@
+#define pr_fmt(fmt) "debug-gamma: " fmt
+
+#include <linux/kernel.h>
+#include <asm/x86_init.h>
+
+bool x86_dbg_detect_gamma(void)
+{
+	return true;
+}
+
+static void early_init_dbg_gamma(void)
+{
+	pr_info("early_init triggered\n");
+}
+
+x86_init_early_pc(x86_dbg_detect_gamma,
+		  NULL,
+		  early_init_dbg_gamma);
diff --git a/arch/x86/kernel/dbg-tables/gamma.h b/arch/x86/kernel/dbg-tables/gamma.h
new file mode 100644
index 000000000000..810d450ddd14
--- /dev/null
+++ b/arch/x86/kernel/dbg-tables/gamma.h
@@ -0,0 +1,3 @@
+#include <asm/x86_init.h>
+
+bool x86_dbg_detect_gamma(void);
diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index 2911ef3a9f1c..d93f3e42e61b 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -19,6 +19,7 @@
 #include <asm/bios_ebda.h>
 #include <asm/tlbflush.h>
 #include <asm/bootparam_utils.h>
+#include <asm/x86_init.h>
 
 static void __init i386_default_early_setup(void)
 {
@@ -47,5 +48,8 @@ asmlinkage __visible void __init i386_start_kernel(void)
 		break;
 	}
 
+	x86_init_fn_init_tables();
+	x86_init_fn_early_init();
+
 	start_kernel();
 }
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index f129a9af6357..f83263a8d0ed 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -28,6 +28,7 @@
 #include <asm/bootparam_utils.h>
 #include <asm/microcode.h>
 #include <asm/kasan.h>
+#include <asm/x86_init.h>
 
 /*
  * Manage page tables very early on.
@@ -190,6 +191,9 @@ void __init x86_64_start_reservations(char *real_mode_data)
 	if (!boot_params.hdr.version)
 		copy_bootdata(__va(real_mode_data));
 
+	x86_init_fn_init_tables();
+	x86_init_fn_early_init();
+
 	reserve_ebda_region();
 
 	start_kernel();
diff --git a/arch/x86/kernel/init.c b/arch/x86/kernel/init.c
new file mode 100644
index 000000000000..5f9b662e2b34
--- /dev/null
+++ b/arch/x86/kernel/init.c
@@ -0,0 +1,55 @@
+#define pr_fmt(fmt) "x86-init: " fmt
+
+#include <linux/bug.h>
+#include <linux/kernel.h>
+
+#include <asm/x86_init_fn.h>
+#include <asm/bootparam.h>
+#include <asm/boot.h>
+#include <asm/setup.h>
+
+extern struct x86_init_fn __tbl_x86_start_init_fns[], __tbl_x86_end_init_fns[];
+
+static bool x86_init_fn_supports_subarch(struct x86_init_fn *fn)
+{
+	if (!fn->supp_hardware_subarch) {
+		pr_err("Init sequence fails to declares any supported subarchs: %pF\n", fn->early_init);
+		WARN_ON(1);
+	}
+	if (BIT(boot_params.hdr.hardware_subarch) & fn->supp_hardware_subarch)
+		return true;
+	return false;
+}
+
+void x86_init_fn_early_init(void)
+{
+	int ret;
+	struct x86_init_fn *init_fn;
+	unsigned int num_inits = table_num_entries(X86_INIT_FNS);
+
+	if (!num_inits)
+		return;
+
+	pr_debug("Number of init entries: %d\n", num_inits);
+
+	for_each_table_entry(init_fn, X86_INIT_FNS) {
+		if (!x86_init_fn_supports_subarch(init_fn))
+			continue;
+		if (!init_fn->detect)
+			init_fn->flags |= X86_INIT_DETECTED;
+		else {
+			ret = init_fn->detect();
+			if (ret > 0)
+				init_fn->flags |= X86_INIT_DETECTED;
+		}
+
+		if (init_fn->flags & X86_INIT_DETECTED) {
+			init_fn->flags |= X86_INIT_DETECTED;
+			pr_debug("Running early init %pF ...\n", init_fn->early_init);
+			init_fn->early_init();
+			pr_debug("Completed early init %pF\n", init_fn->early_init);
+			if (init_fn->flags & X86_INIT_FINISH_IF_DETECTED)
+				break;
+		}
+	}
+}
diff --git a/arch/x86/kernel/sort-init.c b/arch/x86/kernel/sort-init.c
new file mode 100644
index 000000000000..f230f3ddb0aa
--- /dev/null
+++ b/arch/x86/kernel/sort-init.c
@@ -0,0 +1,114 @@
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <asm/x86_init_fn.h>
+
+extern struct x86_init_fn __tbl_x86_start_init_fns[], __tbl_x86_end_init_fns[];
+
+static struct x86_init_fn *x86_init_fn_find_dep(struct x86_init_fn *start,
+						struct x86_init_fn *finish,
+						struct x86_init_fn *q)
+{
+	struct x86_init_fn *p;
+
+	if (!q)
+		return NULL;
+
+	for (p = start; p < finish; p++)
+		if (p->detect == q->depend)
+			return p;
+
+	return NULL;
+}
+
+static void x86_init_fn_sort(struct x86_init_fn *start,
+			     struct x86_init_fn *finish)
+{
+
+	struct x86_init_fn *p, *q, tmp;
+
+	for (p = start; p < finish; p++) {
+again:
+		q = x86_init_fn_find_dep(start, finish, p);
+		/*
+		 * We are bit sneaky here. We use the memory address to figure
+		 * out if the node we depend on is past our point, if so, swap.
+		 */
+		if (q > p) {
+			tmp = *p;
+			memmove(p, q, sizeof(*p));
+			*q = tmp;
+			goto again;
+		}
+	}
+
+}
+
+static void x86_init_fn_check(struct x86_init_fn *start,
+			      struct x86_init_fn *finish)
+{
+	struct x86_init_fn *p, *q, *x;
+
+	/* Simple cyclic dependency checker. */
+	for (p = start; p < finish; p++) {
+		if (!p->depend)
+			continue;
+		q = x86_init_fn_find_dep(start, finish, p);
+		x = x86_init_fn_find_dep(start, finish, q);
+		if (p == x) {
+			pr_info("CYCLIC DEPENDENCY FOUND! %pF depends on %pF and vice-versa. BREAKING IT.\n",
+				p->early_init, q->early_init);
+			/* Heavy handed way..*/
+			x->depend = 0;
+		}
+	}
+
+	/*
+	 * Validate sorting semantics.
+	 *
+	 * p depends on q so:
+	 * 	- q must run first, so q < p. If q > p that's an issue
+	 * 	  as its saying p must run prior to q. We already sorted
+	 * 	  this table, this is a problem.
+	 *
+	 * 	- q's order level must be <= than p's as it should run first
+	 */
+	for (p = start; p < finish; p++) {
+		if (!p->depend)
+			continue;
+		/*
+		 * Be pedantic and do a full search on the entire table,
+		 * if we need further validation, after this is called
+		 * one could use an optimized version which just searches
+		 * on x86_init_fn_find_dep(p, finish, p), as we would have
+		 * guarantee on proper ordering both at the dependency level
+		 * and by order level.
+		 */
+		q = x86_init_fn_find_dep(start, finish, p);
+		if (q && q > p) {
+			pr_info("EXECUTION ORDER INVALID! %pF should be called before %pF!\n",
+				p->early_init, q->early_init);
+		}
+
+		/*
+		 * Technically this would still work as the memmove() would
+		 * have forced the dependency to run first, however we want
+		 * strong semantics, so lets avoid these.
+		 */
+		if (q && q->order_level > p->order_level) {
+			pr_info("INVALID ORDER LEVEL! %pF should have an order level <= be called before %pF!\n",
+				p->early_init, q->early_init);
+		}
+	}
+}
+
+void x86_init_fn_init_tables(void)
+{
+	unsigned int num_inits = table_num_entries(X86_INIT_FNS);
+
+	if (!num_inits)
+		return;
+
+	x86_init_fn_sort(__tbl_x86_start_init_fns, __tbl_x86_end_init_fns);
+	x86_init_fn_check(__tbl_x86_start_init_fns, __tbl_x86_end_init_fns);
+}
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 74e4bf11f562..75c8f3f80e73 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -265,6 +265,31 @@ SECTIONS
 		__apicdrivers_end = .;
 	}
 
+	/*
+	 * x86 linker tables: this will sort all order levels
+	 * lexicographically. Something like SORT_BY_INIT_PRIORITY()
+	 * which sorts specifically on numeric order is desirable but
+	 * that would impose a change on binutils to generalize
+	 * SORT_BY_INIT_PRIORITY() for any section, SORT_BY_INIT_PRIORITY()
+	 * is specific to the init sections, and that would also mean having
+	 * a bump on requirments for building Linux. Using SORT() should
+	 * suffice, * its what gpxe uses. For things in the same order
+	 * level we rely on order in the C file, and also order on the
+	 * Makefile. Since this solution also embraces the IOMMU init solution
+	 * it has further sorting semantics implemented in C and its semantics
+	 * are defined in x86_init_fn.h and implemented in sort-init.c.
+	 */
+	.tbl            : {
+		__tbl_x86_start_init_fns = .;
+		*(SORT(.tbl.x86_init_fns.*))
+		__tbl_x86_end_init_fns = .;
+
+		/* ... etc ... */
+
+		/* You would add other tables here as needed */
+	}
+  . = ALIGN(8);
+
 	. = ALIGN(8);
 	/*
 	 * .exit.text is discard at runtime, not link time, to deal with
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 0c2fae8d929d..22b1616de37c 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -56,6 +56,7 @@ static const char * const sym_regex_kernel[S_NSYMTYPES] = {
 	"__x86_cpu_dev_(start|end)|"
 	"(__parainstructions|__alt_instructions)(|_end)|"
 	"(__iommu_table|__apicdrivers|__smp_locks)(|_end)|"
+	"(__tbl_x86_(start|end)_init_fns|"
 	"__(start|end)_pci_.*|"
 	"__(start|end)_builtin_fw|"
 	"__(start|stop)___ksymtab(|_gpl|_unused|_unused_gpl|_gpl_future)|"
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [RFC v1 5/8] x86/init: move ebda reservations into linker table
  2015-12-15 22:16 [RFC v1 0/8] x86/init: Linux linker tables Luis R. Rodriguez
                   ` (3 preceding siblings ...)
  2015-12-15 22:16 ` [RFC v1 4/8] x86/init: add linker table support Luis R. Rodriguez
@ 2015-12-15 22:16 ` Luis R. Rodriguez
  2015-12-17 20:48   ` Andy Lutomirski
  2015-12-15 22:16 ` [RFC v1 6/8] x86/init: use linker table for i386 early setup Luis R. Rodriguez
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 60+ messages in thread
From: Luis R. Rodriguez @ 2015-12-15 22:16 UTC (permalink / raw)
  To: hpa, tglx, mingo, bp, konrad.wilk
  Cc: rusty, luto, boris.ostrovsky, mcb30, jgross, JBeulich, joro,
	ryabinin.a.a, andreyknvl, long.wanglong, qiuxishi, aryabinin,
	mchehab, valentinrothberg, peter.senna, mcgrof, x86, xen-devel,
	linux-kernel

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This lets us annotate its requirements specifically for
PC and lguest subarchitectures. While at it since head.c
just has ebda data rename it. Since we're using linker tables
and both x86 32-bit and 64-bit have them we don't need
to declare a call for this separately. This lets us
also keep this declared static now.

Since we're using linker tables and have support to annotate
subarchitecture do that instead. pv_legacy() is incorrect as
its really only for legacy PV guests. There's no need for
pv_legacy() check anymore now.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 arch/x86/include/asm/bios_ebda.h   | 2 --
 arch/x86/kernel/Makefile           | 2 +-
 arch/x86/kernel/{head.c => ebda.c} | 6 +++---
 arch/x86/kernel/head32.c           | 2 --
 arch/x86/kernel/head64.c           | 2 --
 5 files changed, 4 insertions(+), 10 deletions(-)
 rename arch/x86/kernel/{head.c => ebda.c} (94%)

diff --git a/arch/x86/include/asm/bios_ebda.h b/arch/x86/include/asm/bios_ebda.h
index aa6a3170ab5a..e63347eb3804 100644
--- a/arch/x86/include/asm/bios_ebda.h
+++ b/arch/x86/include/asm/bios_ebda.h
@@ -38,8 +38,6 @@ static inline unsigned int get_bios_ebda_length(void)
 	return length;
 }
 
-void reserve_ebda_region(void);
-
 #ifdef CONFIG_X86_CHECK_BIOS_CORRUPTION
 /*
  * This is obviously not a great place for this, but we want to be
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index be167a0a5e2c..e7a43f08bccc 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -2,7 +2,7 @@
 # Makefile for the linux kernel.
 #
 
-extra-y                := head_$(BITS).o head$(BITS).o head.o vmlinux.lds
+extra-y                := head_$(BITS).o head$(BITS).o ebda.o vmlinux.lds
 
 CPPFLAGS_vmlinux.lds += -U$(UTS_MACHINE)
 
diff --git a/arch/x86/kernel/head.c b/arch/x86/kernel/ebda.c
similarity index 94%
rename from arch/x86/kernel/head.c
rename to arch/x86/kernel/ebda.c
index 279fad7288f8..defc2f07a80b 100644
--- a/arch/x86/kernel/head.c
+++ b/arch/x86/kernel/ebda.c
@@ -4,6 +4,7 @@
 
 #include <asm/setup.h>
 #include <asm/bios_ebda.h>
+#include <asm/x86_init.h>
 
 /*
  * The BIOS places the EBDA/XBDA at the top of conventional
@@ -26,7 +27,7 @@
 #define LOWMEM_CAP		0x9f000U	/* Absolute maximum */
 #define INSANE_CUTOFF		0x20000U	/* Less than this = insane */
 
-void __init reserve_ebda_region(void)
+static void __init reserve_ebda_region(void)
 {
 	unsigned int lowmem, ebda_addr;
 
@@ -38,8 +39,6 @@ void __init reserve_ebda_region(void)
 	 * that the paravirt case can handle memory setup
 	 * correctly, without our help.
 	 */
-	if (paravirt_legacy())
-		return;
 
 	/* end of low (conventional) memory */
 	lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES);
@@ -69,3 +68,4 @@ void __init reserve_ebda_region(void)
 	/* reserve all memory between lowmem and the 1MB mark */
 	memblock_reserve(lowmem, 0x100000 - lowmem);
 }
+x86_init_early_pc_simple(reserve_ebda_region);
diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index d93f3e42e61b..768fa3888066 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -26,8 +26,6 @@ static void __init i386_default_early_setup(void)
 	/* Initialize 32bit specific setup functions */
 	x86_init.resources.reserve_resources = i386_reserve_resources;
 	x86_init.mpparse.setup_ioapic_ids = setup_ioapic_ids_from_mpc;
-
-	reserve_ebda_region();
 }
 
 asmlinkage __visible void __init i386_start_kernel(void)
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index f83263a8d0ed..c913b7eb5056 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -194,7 +194,5 @@ void __init x86_64_start_reservations(char *real_mode_data)
 	x86_init_fn_init_tables();
 	x86_init_fn_early_init();
 
-	reserve_ebda_region();
-
 	start_kernel();
 }
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [RFC v1 6/8] x86/init: use linker table for i386 early setup
  2015-12-15 22:16 [RFC v1 0/8] x86/init: Linux linker tables Luis R. Rodriguez
                   ` (4 preceding siblings ...)
  2015-12-15 22:16 ` [RFC v1 5/8] x86/init: move ebda reservations into linker table Luis R. Rodriguez
@ 2015-12-15 22:16 ` Luis R. Rodriguez
       [not found]   ` <20160120211424.GG4769@char.us.oracle.com>
  2015-12-15 22:16 ` [RFC v1 7/8] x86/init: user linker table for ce4100 " Luis R. Rodriguez
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 60+ messages in thread
From: Luis R. Rodriguez @ 2015-12-15 22:16 UTC (permalink / raw)
  To: hpa, tglx, mingo, bp, konrad.wilk
  Cc: rusty, luto, boris.ostrovsky, mcb30, jgross, JBeulich, joro,
	ryabinin.a.a, andreyknvl, long.wanglong, qiuxishi, aryabinin,
	mchehab, valentinrothberg, peter.senna, mcgrof, x86, xen-devel,
	linux-kernel

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This also annotates this is only used for PC and
lguest hardware subarchitectures.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 arch/x86/kernel/head32.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index 768fa3888066..1db8eec5b0e2 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -21,12 +21,15 @@
 #include <asm/bootparam_utils.h>
 #include <asm/x86_init.h>
 
-static void __init i386_default_early_setup(void)
+static void __init i386_set_setup_funcs(void)
 {
 	/* Initialize 32bit specific setup functions */
 	x86_init.resources.reserve_resources = i386_reserve_resources;
 	x86_init.mpparse.setup_ioapic_ids = setup_ioapic_ids_from_mpc;
 }
+x86_init_early(BIT(X86_SUBARCH_PC) |
+	       BIT(X86_SUBARCH_LGUEST),
+	       NULL, NULL, i386_set_setup_funcs);
 
 asmlinkage __visible void __init i386_start_kernel(void)
 {
@@ -41,9 +44,6 @@ asmlinkage __visible void __init i386_start_kernel(void)
 	case X86_SUBARCH_CE4100:
 		x86_ce4100_early_setup();
 		break;
-	default:
-		i386_default_early_setup();
-		break;
 	}
 
 	x86_init_fn_init_tables();
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [RFC v1 7/8] x86/init: user linker table for ce4100 early setup
  2015-12-15 22:16 [RFC v1 0/8] x86/init: Linux linker tables Luis R. Rodriguez
                   ` (5 preceding siblings ...)
  2015-12-15 22:16 ` [RFC v1 6/8] x86/init: use linker table for i386 early setup Luis R. Rodriguez
@ 2015-12-15 22:16 ` Luis R. Rodriguez
  2015-12-15 22:16 ` [RFC v1 8/8] x86/init: use linker table for mid " Luis R. Rodriguez
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 60+ messages in thread
From: Luis R. Rodriguez @ 2015-12-15 22:16 UTC (permalink / raw)
  To: hpa, tglx, mingo, bp, konrad.wilk
  Cc: rusty, luto, boris.ostrovsky, mcb30, jgross, JBeulich, joro,
	ryabinin.a.a, andreyknvl, long.wanglong, qiuxishi, aryabinin,
	mchehab, valentinrothberg, peter.senna, mcgrof, x86, xen-devel,
	linux-kernel

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Using the linker table removes the need for the #ifdef'ery
and clutter on head32.c.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 arch/x86/include/asm/setup.h      | 6 ------
 arch/x86/kernel/head32.c          | 3 ---
 arch/x86/platform/ce4100/ce4100.c | 4 +++-
 3 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 11af24e09c8a..f1e111a9d558 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -51,12 +51,6 @@ extern void x86_intel_mid_early_setup(void);
 static inline void x86_intel_mid_early_setup(void) { }
 #endif
 
-#ifdef CONFIG_X86_INTEL_CE
-extern void x86_ce4100_early_setup(void);
-#else
-static inline void x86_ce4100_early_setup(void) { }
-#endif
-
 #ifndef _SETUP
 
 #include <asm/espfix.h>
diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index 1db8eec5b0e2..ec6912873395 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -41,9 +41,6 @@ asmlinkage __visible void __init i386_start_kernel(void)
 	case X86_SUBARCH_INTEL_MID:
 		x86_intel_mid_early_setup();
 		break;
-	case X86_SUBARCH_CE4100:
-		x86_ce4100_early_setup();
-		break;
 	}
 
 	x86_init_fn_init_tables();
diff --git a/arch/x86/platform/ce4100/ce4100.c b/arch/x86/platform/ce4100/ce4100.c
index 701fd5843c87..db2a709402e2 100644
--- a/arch/x86/platform/ce4100/ce4100.c
+++ b/arch/x86/platform/ce4100/ce4100.c
@@ -24,6 +24,7 @@
 #include <asm/io.h>
 #include <asm/io_apic.h>
 #include <asm/emergency-restart.h>
+#include <asm/x86_init.h>
 
 static int ce4100_i8042_detect(void)
 {
@@ -144,7 +145,7 @@ static void sdv_pci_init(void)
  * CE4100 specific x86_init function overrides and early setup
  * calls.
  */
-void __init x86_ce4100_early_setup(void)
+static void __init x86_ce4100_early_setup(void)
 {
 	x86_init.oem.arch_setup = sdv_arch_setup;
 	x86_platform.i8042_detect = ce4100_i8042_detect;
@@ -166,3 +167,4 @@ void __init x86_ce4100_early_setup(void)
 
 	pm_power_off = ce4100_power_off;
 }
+x86_init_early(BIT(X86_SUBARCH_CE4100), NULL, NULL, x86_ce4100_early_setup);
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [RFC v1 8/8] x86/init: use linker table for mid early setup
  2015-12-15 22:16 [RFC v1 0/8] x86/init: Linux linker tables Luis R. Rodriguez
                   ` (6 preceding siblings ...)
  2015-12-15 22:16 ` [RFC v1 7/8] x86/init: user linker table for ce4100 " Luis R. Rodriguez
@ 2015-12-15 22:16 ` Luis R. Rodriguez
  2015-12-15 22:59 ` [RFC v1 0/8] x86/init: Linux linker tables H. Peter Anvin
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 60+ messages in thread
From: Luis R. Rodriguez @ 2015-12-15 22:16 UTC (permalink / raw)
  To: hpa, tglx, mingo, bp, konrad.wilk
  Cc: rusty, luto, boris.ostrovsky, mcb30, jgross, JBeulich, joro,
	ryabinin.a.a, andreyknvl, long.wanglong, qiuxishi, aryabinin,
	mchehab, valentinrothberg, peter.senna, mcgrof, x86, xen-devel,
	linux-kernel

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Using the linker table removes the need for the #ifdef'ery
and clutter on head32.c.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 arch/x86/include/asm/setup.h            | 6 ------
 arch/x86/kernel/head32.c                | 7 -------
 arch/x86/platform/intel-mid/intel-mid.c | 4 +++-
 3 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index f1e111a9d558..1345c1de7f99 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -45,12 +45,6 @@ extern void reserve_standard_io_resources(void);
 extern void i386_reserve_resources(void);
 extern void setup_default_timer_irq(void);
 
-#ifdef CONFIG_X86_INTEL_MID
-extern void x86_intel_mid_early_setup(void);
-#else
-static inline void x86_intel_mid_early_setup(void) { }
-#endif
-
 #ifndef _SETUP
 
 #include <asm/espfix.h>
diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index ec6912873395..b23d16a2a5d5 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -36,13 +36,6 @@ asmlinkage __visible void __init i386_start_kernel(void)
 	cr4_init_shadow();
 	sanitize_boot_params(&boot_params);
 
-	/* Call the subarch specific early setup function */
-	switch (boot_params.hdr.hardware_subarch) {
-	case X86_SUBARCH_INTEL_MID:
-		x86_intel_mid_early_setup();
-		break;
-	}
-
 	x86_init_fn_init_tables();
 	x86_init_fn_early_init();
 
diff --git a/arch/x86/platform/intel-mid/intel-mid.c b/arch/x86/platform/intel-mid/intel-mid.c
index 1bbc21e2e4ae..27bc33193a06 100644
--- a/arch/x86/platform/intel-mid/intel-mid.c
+++ b/arch/x86/platform/intel-mid/intel-mid.c
@@ -167,7 +167,7 @@ static unsigned char intel_mid_get_nmi_reason(void)
  * Moorestown specific x86_init function overrides and early setup
  * calls.
  */
-void __init x86_intel_mid_early_setup(void)
+static void __init x86_intel_mid_early_setup(void)
 {
 	x86_init.resources.probe_roms = x86_init_noop;
 	x86_init.resources.reserve_resources = x86_init_noop;
@@ -199,6 +199,8 @@ void __init x86_intel_mid_early_setup(void)
 	x86_init.mpparse.get_smp_config = x86_init_uint_noop;
 	set_bit(MP_BUS_ISA, mp_bus_not_pci);
 }
+x86_init_early(BIT(X86_SUBARCH_INTEL_MID), NULL, NULL,
+	       x86_intel_mid_early_setup);
 
 /*
  * if user does not want to use per CPU apb timer, just give it a lower rating
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [RFC v1 0/8] x86/init: Linux linker tables
  2015-12-15 22:16 [RFC v1 0/8] x86/init: Linux linker tables Luis R. Rodriguez
                   ` (7 preceding siblings ...)
  2015-12-15 22:16 ` [RFC v1 8/8] x86/init: use linker table for mid " Luis R. Rodriguez
@ 2015-12-15 22:59 ` H. Peter Anvin
  2015-12-17 20:38 ` H. Peter Anvin
       [not found] ` <20160120211649.GJ4769@char.us.oracle.com>
  10 siblings, 0 replies; 60+ messages in thread
From: H. Peter Anvin @ 2015-12-15 22:59 UTC (permalink / raw)
  To: Luis R. Rodriguez, tglx, mingo, bp, konrad.wilk
  Cc: rusty, luto, boris.ostrovsky, mcb30, jgross, JBeulich, joro,
	ryabinin.a.a, andreyknvl, long.wanglong, qiuxishi, aryabinin,
	mchehab, valentinrothberg, peter.senna, mcgrof, x86, xen-devel,
	linux-kernel

On December 15, 2015 2:16:29 PM PST, "Luis R. Rodriguez" <mcgrof@do-not-panic.com> wrote:
>From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
>             A long time ago in a galaxy far,
>             far away...
>
>Konrad Rzeszutek Wilk posted patches which eventually got merged to
>help
>with modularizing the IOMMUs we have on x86 [0]. This work was done due
>to
>the complex relationship that exists on IOMMUs and the requirements on
>careful execution. The solution also provided a mechanism which
>jettisoned
>unused IOMMUs during run-time.
>
>During review, even though the code was merged, hpa did note that we
>tend
>to encounter this type of problem "often enough that we should
>implement a
>generic facility for it" [1], hpa acknowledged that it obviously has to
>be
>based on sections and even noted that perhaps we might be able in the
>future to
>automate its creation. He noted that the gPXE folks had done just this
>with
>linker tables and suggested that "presumably we'd need a few different
>flavors
>for init tables and so on, but this would make it a generic mechanism."
>
>The IOMMU code got merged and this was left on someone's mental
>backburner.
>I've had an itch to scratch recently to try to avoid issues which are
>possible
>if one does not jettison other code carefully due to the large
>complexity of
>implicit dependencies of certain code on x86 in particular with
>possible dead
>code on x86 due to paravirtualization, and the IOMMU jettison strategy
>turned
>out to be my favorite solution so far. I've taken on hpa's suggestions
>from
>back in the day to review gPXE's solution to see if we could embrace it
>on
>Linux for a generic section solution to help jettison code carefully.
>
>What this patch set does exactly:
>
>This RFC patch set attempts to add support for such a generic solution.
>In the end, it turns out that the best solution possible was the best
>of
>both worlds: a combination of what Konrad had implemented in addition
>to
>what Michael Brown had implemented on the gXPE front. The IOMMU
>solution
>enables simple semantic annotations for dependency relationships, this
>however requires a run time sort. The gPXE solution grants the option
>to
>simply sort at build time. One of gPXE's solution primary goals however
>was
>also to help avoid bit-rot on code that's possible from #ifdef'ery. The
>Linux
>linker table solution enables developers to pick and choose what they
>need, with linker tables being the simplest solution. Contrary to gPXE
>which strives to force compilation of all linker table solutions we
>let developers pick *when* they want this as part of their solution.
>As can be seen from the suggested x86 init specific use of linker
>tables
>proposed you can also take advantage of both, linker sorting, optional
>compilation when needed (at developer's discretion), and even careful
>semantics annotation for dependency / relationship annotations.
>Although
>the x86 init solution here is heavily inspired by the IOMMU solution it
>diverges with strong semantics, and a new optional subarchitecture
>annotation. Sorting of init sequences is structure specific, as such
>each subsystem must defing their own solution unless semantics could
>be shared. I considered sharing semantics but in the end this proved
>pointless so this keeps things separate. A series of changes were made
>to the x86 init sequence in contrast to the IOMMU solution to be
>*extremely
>pedantic* on semantics, review of this changes can be studied on the
>table-init tree [2].
>
>Quick review of gPXE's solution and prospects on further changes:
>
>In my review from gPXE's solution it was not clear what hpa meant by
>gXPE folks having automated this process, they actually use linker
>tables
>all around, forcing compilation of *everything* and just do linking of
>enabled features at link time. You still need to build linker tables on
>your own. What I do see more potential for in the future is enabling to
>evolve stronger semantics over time, and this would also be subsystem
>specific.
>This will be evident in this patch set on the x86 init use of linker
>tables.
>I also see potential in strenghtening semantics for linker sorting, any
>of
>these types of features however would impose requiring newer binutils.
>For
>instance, gPXE's linker solution currently relies on SORT(), that
>defaults to
>SORT_BY_NAME(). This sorts lexicographically, gPXE's solution uses two
>digits
>to enable SORT_BY_NAME()'s lexicographical sort to sort orde by numeric
>priority. Since one is in control of order-level numbers one can
>provide
>guarantee that this sort should work as intended, however binutils also
>now has
>a SORT_BY_INIT_PRIORITY() which sorts specifically based on digits.
>SORT_BY_INIT_PRIORITY() was designed specifically for init_array
>sections
>though. Refer to the userspace mockup solution table-init git tree [2]
>commit
>6deba47ee1ad461e90 for more details on this.  One thing I can envision
>to help
>here further are prospects for future linker enhancements, these
>however must
>be carefully considered in light of possible requirements of newer
>binutils
>and also of compatibility with other toolchains. For now we resort to
>the good
>'ol SORT() which even Linux has made use of for ages already. In that
>regards
>here, nothing new here. gPXE folks however did find some fun ICC
>compatiblity
>issues, and have also developed fixes for them, these were obviously
>carried
>over but should be reviewed carefully. Lastly, I should point out that
>essentially what we're developing are different forms loose and strong
>semantics -- in the most complex form what we really are after are
>feature
>graphs. Mauro explained to me that for media driver they needed to
>build a
>feature graphs, IMHO that could be a next level of strong semantics we
>might
>want to consider. For now though the combination of gPXE's linker-table
>solution based on linker order-level sort, and our old IOMMU init
>solution
>and its simple dependency map (and possible extensions, see
>subarchitecture)
>should likely suffice as a light weight solution for where we need
>semantics
>for at least on the x86 init path.
>
>Motivation and possible enhancements:
>
>To understand what made me tick to work on this feel free to read the
>dead
>code concerns with paravirtualization posts I've made, which also go
>into
>and detail Xen's alternative entry point [4] [5]. That's not the end of
>possibilities to help address to possible "dead code" on Linux, I have
>other
>ideas but I have a bit more work to do before publishing anything about
>it.
>If anything -- later work should likely supplement this solution
>further.
>
>Currently the earliest I was able to make use of boot_params on x86-64
>for
>linker tables was after load_idt(), so I've decided to stick with the
>earliest
>init call for x86 init sequences starting on
>x86_64_start_reservation().
>However, if we're able to make use of boot_params earlier (help
>appreciated),
>in particular just boot_params.hdr.hardware_subarch we should be able
>to make
>clearer and careful annotations on early x86-64 init code. Using the
>boot_params.hdr.hardware_subarch and requiring clear subarchitecture
>support
>annotations on early init code should provide a *proactive* means to
>avoid issues
>such as the cr4 shadow oversight [6] which later caused crashes on Xen.
>The
>latest similar type of issue is when KAsan was introduced, after it was
>integrated I suspected KAsan would probably break if enabled on Xen, I
>reported
>this in March 2015 [7] and Andrey confirmed this might be the case but
>since he
>wanted it enabled for allyesconfig and allmodconfig he could not think
>of a
>clean way to disable it or address support for it then [8]. During the
>development of this patch set I confirmed KAsan crashes Xen dom0 and
>therefore
>should obviously crash Xen PV guests as well.  Using the same kernel
>KAsan
>still worked on KVM and bare-metal. Provided we had early access to
>boot_params.hdr.hardware_subarch, in x86_64_start_kernel() we could
>annotate
>kasan_early_init() and friends not supported on Xen, however since we
>don't
>have a way to disable KAsan at run time at this time we wouldn't have
>any other
>option but to crash early. Because of this though we should perhaps
>just consider
>disabling KAsan for Xen configs and KAsan folks might just have to bite
>the
>bullet. If KAsan folks are gung-ho about not disabling KAsan when Xen
>is
>enabled and if its easier to add support to disable KAsan at run time
>than
>it is to add KAsan support for Xen one alternative might be to use
>linker
>tables to annotate this and disable KAsan at run time for Xen.
>
>One of the points of the use of the x86 linker table solution and
>specific
>semantics there in is to *force* developers to think about requirements
>on x86
>carefully. So for instance by requiring itemizing supported
>subarchitectures
>*early* on in development we should be able to avoid proactively issues
>such as
>the crash due to the cr4 shadow changes not added on the Xen entry
>path, and
>also missing the requirement to develop a solution for Kasan for Xen.
>The
>proposed x86 linker table solution is not the first to use
>boot_params.hdr.hardware_subarch, its first use was actually on i386,
>and for
>lguest. We take advantage of it to avoid further extending pv_ops, and
>*actually* to see if we could simplify pv_ops over time further. This
>patch set
>also includes a renaming of paravirt_enabled() to paravirt_legacy() 
>
>An unexpected long term side goal which *might* be possible due to
>linker
>tables on x86 is to help unify the different C entry points for Linux
>x86-64.
>I've taken a stab at it after these patches [9] but it fails on Xen,
>likely
>because the stack is not set up right due to the different calls /
>argument
>requirements. If this is desirable folks could take a look and perhaps
>help
>on that front.
>
>Where to get code alternatively and testing:
>
>All of these patches are RFCs, if anything the only patches worth
>consdering
>merging sooner rather than later might be "paravirt: rename
>paravirt_enabled
>to paravirt_legacy" and "x86/boot: add BIT() to boot/bitops.h". After
>review
>I can send those separately in patch form if agreeable. Since these are
>all
>just RFCs I've based these patches on top of Linus' v4.4-rc5. If you
>want all
>patches in one file you can get them here [10], and a respective
>linux-next
>version which applies on top of next-20151215 here [11]. I also have
>two
>git trees set up with branches for this code, one based on Linus' tree
>[12]
>and another based on linux-next next-20151215 [13]. I'll see what
>issues
>zero-day bot testing finds out. I've just run time tested this on
>x86-64
>bare metal, KVM, and Xen dom0 so far.
>
>[0] https://marc.info/?l=linux-kernel&m=128284562303565&w=2
>[1] https://marc.info/?l=linux-kernel&m=128285216913266&w=2
>[2] https://github.com/mcgrof/table-init/
>[3]
>https://github.com/mcgrof/table-init/commit/6deba47ee1ad461e90e0fbba226a535cfc1c58f3
>[4]
>http://www.do-not-panic.com/2015/12/avoiding-dead-code-pvops-not-silver-bullet.html
>[5]
>http://www.do-not-panic.com/2015/12/xen-and-x86-linux-zero-page.html
>[6]
>http://lists.xenproject.org/archives/html/xen-devel/2015-02/msg02742.html
>[7]
>http://lkml.kernel.org/r/CAB=NE6Xs5fepzNtymzT4CueeJZ0KMPETpda114DpL4eMtDswtw@mail.gmail.com
>[8] http://lkml.kernel.org/r/54F5B3DA.70203@samsung.com
>[9]
>http://drvbp1.linux-foundation.org/~mcgrof/patches/2015/12/15/x86-merge-x86-init-v1.patch
>[10]
>http://drvbp1.linux-foundation.org/~mcgrof/patches/2015/12/15/pend-20151215-rfc-v1-linker-tables.patch
>[11]
>http://drvbp1.linux-foundation.org/~mcgrof/patches/2015/12/15/pend-next-20151215-rfc-v1-linker-tables.patch
>[12]
>https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux.git/log/?h=20151215-rfc-v1-linker-tables
>[13]
>https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux-next.git/log/?h=20151215-rfc-v1-linker-tables
>
>Luis R. Rodriguez (8):
>  paravirt: rename paravirt_enabled to paravirt_legacy
>  tables.h: add linker table support
>  x86/boot: add BIT() to boot/bitops.h
>  x86/init: add linker table support
>  x86/init: move ebda reservations into linker table
>  x86/init: use linker table for i386 early setup
>  x86/init: user linker table for ce4100 early setup
>  x86/init: use linker table for mid early setup
>
> Documentation/kbuild/makefiles.txt      |  19 ++
> arch/x86/Kconfig.debug                  |  47 ++++
> arch/x86/boot/bitops.h                  |   2 +
> arch/x86/boot/boot.h                    |   2 +-
> arch/x86/include/asm/bios_ebda.h        |   2 -
> arch/x86/include/asm/paravirt.h         |   4 +-
> arch/x86/include/asm/paravirt_types.h   |  11 +-
> arch/x86/include/asm/processor.h        |   2 +-
> arch/x86/include/asm/setup.h            |  12 -
> arch/x86/include/asm/x86_init.h         |   1 +
> arch/x86/include/asm/x86_init_fn.h      | 267 ++++++++++++++++++++
> arch/x86/kernel/Makefile                |   4 +-
> arch/x86/kernel/apm_32.c                |   2 +-
> arch/x86/kernel/asm-offsets.c           |   2 +-
> arch/x86/kernel/cpu/intel.c             |   2 +-
> arch/x86/kernel/cpu/microcode/core.c    |   2 +-
> arch/x86/kernel/dbg-tables/Makefile     |  18 ++
> arch/x86/kernel/dbg-tables/alpha.c      |  10 +
> arch/x86/kernel/dbg-tables/beta.c       |  18 ++
> arch/x86/kernel/dbg-tables/delta.c      |  10 +
> arch/x86/kernel/dbg-tables/gamma.c      |  18 ++
> arch/x86/kernel/dbg-tables/gamma.h      |   3 +
> arch/x86/kernel/{head.c => ebda.c}      |   6 +-
> arch/x86/kernel/head32.c                |  22 +-
> arch/x86/kernel/head64.c                |   4 +-
> arch/x86/kernel/init.c                  |  55 +++++
> arch/x86/kernel/kvm.c                   |   9 +-
> arch/x86/kernel/paravirt.c              |   2 +-
> arch/x86/kernel/sort-init.c             | 114 +++++++++
> arch/x86/kernel/tboot.c                 |   2 +-
> arch/x86/kernel/vmlinux.lds.S           |  25 ++
> arch/x86/lguest/boot.c                  |   2 +-
> arch/x86/mm/dump_pagetables.c           |   2 +-
> arch/x86/platform/ce4100/ce4100.c       |   4 +-
> arch/x86/platform/intel-mid/intel-mid.c |   4 +-
> arch/x86/tools/relocs.c                 |   1 +
> arch/x86/xen/enlighten.c                |   2 +-
> drivers/pnp/pnpbios/core.c              |   2 +-
>include/linux/tables.h                  | 421
>++++++++++++++++++++++++++++++++
> scripts/Makefile.build                  |   4 +-
> scripts/Makefile.clean                  |   1 +
> scripts/Makefile.lib                    |  12 +
> 42 files changed, 1091 insertions(+), 61 deletions(-)
> create mode 100644 arch/x86/include/asm/x86_init_fn.h
> create mode 100644 arch/x86/kernel/dbg-tables/Makefile
> create mode 100644 arch/x86/kernel/dbg-tables/alpha.c
> create mode 100644 arch/x86/kernel/dbg-tables/beta.c
> create mode 100644 arch/x86/kernel/dbg-tables/delta.c
> create mode 100644 arch/x86/kernel/dbg-tables/gamma.c
> create mode 100644 arch/x86/kernel/dbg-tables/gamma.h
> rename arch/x86/kernel/{head.c => ebda.c} (94%)
> create mode 100644 arch/x86/kernel/init.c
> create mode 100644 arch/x86/kernel/sort-init.c
> create mode 100644 include/linux/tables.h

By the way, please refer to iPXE as gPXE is a dead project.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 0/8] x86/init: Linux linker tables
  2015-12-15 22:16 [RFC v1 0/8] x86/init: Linux linker tables Luis R. Rodriguez
                   ` (8 preceding siblings ...)
  2015-12-15 22:59 ` [RFC v1 0/8] x86/init: Linux linker tables H. Peter Anvin
@ 2015-12-17 20:38 ` H. Peter Anvin
  2015-12-17 23:46   ` Luis R. Rodriguez
       [not found] ` <20160120211649.GJ4769@char.us.oracle.com>
  10 siblings, 1 reply; 60+ messages in thread
From: H. Peter Anvin @ 2015-12-17 20:38 UTC (permalink / raw)
  To: Luis R. Rodriguez, tglx, mingo, bp, konrad.wilk
  Cc: rusty, luto, boris.ostrovsky, mcb30, jgross, JBeulich, joro,
	ryabinin.a.a, andreyknvl, long.wanglong, qiuxishi, aryabinin,
	mchehab, valentinrothberg, peter.senna, mcgrof, x86, xen-devel,
	linux-kernel

I think we can make this even more generic.  In particular, I would love
to see a solution for link tables that:

a) can be used for any kind of data structures, not just function
pointers (the latter is a specialization of the former);
b) doesn't need any changes to the linker scripts once the initial
enabling is done for any one architecture.

Key to this is to be able to define tables by name only, which is really
why SORT_BY_NAME() is used: the name sorts before the priority simply by
putting the name before the class.

Instead of .tbl.* naming of sections I think we should have the first
component be the type of section (.rodata, .data, .init_rodata,
.read_mostly etc.) which makes it easier to write a linker script that
properly sorts it into the right section.  The other thing is to take a
clue from the implementation in iPXE, which uses priority levels 00 and
99 (or we could use non-integers which sort appropriately instead of
using "real" levels) to contain the start and end symbols, which
eliminates any need for linker script modifications to add new tables.

Making this a generic facility we could eventually eliminate a bunch of
ad hoc hacks we currently have.

Oh, and the link table feature should NOT be x86-specific.

	-hpa


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 5/8] x86/init: move ebda reservations into linker table
  2015-12-15 22:16 ` [RFC v1 5/8] x86/init: move ebda reservations into linker table Luis R. Rodriguez
@ 2015-12-17 20:48   ` Andy Lutomirski
  2015-12-17 20:55     ` H. Peter Anvin
  0 siblings, 1 reply; 60+ messages in thread
From: Andy Lutomirski @ 2015-12-17 20:48 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Konrad Rzeszutek Wilk, Rusty Russell, Boris Ostrovsky, mcb30,
	Juergen Gross, Jan Beulich, Joerg Roedel, Andrey Ryabinin,
	Andrey Konovalov, long.wanglong, qiuxishi, Andrey Ryabinin,
	Mauro Carvalho Chehab, valentinrothberg, peter.senna,
	Luis Rodriguez, X86 ML, Xen Devel, linux-kernel

On Tue, Dec 15, 2015 at 2:16 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> This lets us annotate its requirements specifically for
> PC and lguest subarchitectures. While at it since head.c
> just has ebda data rename it. Since we're using linker tables
> and both x86 32-bit and 64-bit have them we don't need
> to declare a call for this separately. This lets us
> also keep this declared static now.
>
> Since we're using linker tables and have support to annotate
> subarchitecture do that instead. pv_legacy() is incorrect as
> its really only for legacy PV guests. There's no need for
> pv_legacy() check anymore now.

I'm entirely ignorant of anything going on in gPXE/iPXE.

Can you explain what a linker table *does*?  It looks like all you've
done in this patch is to move code around.  What actually happens?

--Andy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 5/8] x86/init: move ebda reservations into linker table
  2015-12-17 20:48   ` Andy Lutomirski
@ 2015-12-17 20:55     ` H. Peter Anvin
  2015-12-17 20:57       ` Andy Lutomirski
  0 siblings, 1 reply; 60+ messages in thread
From: H. Peter Anvin @ 2015-12-17 20:55 UTC (permalink / raw)
  To: Andy Lutomirski, Luis R. Rodriguez
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Konrad Rzeszutek Wilk, Rusty Russell, Boris Ostrovsky, mcb30,
	Juergen Gross, Jan Beulich, Joerg Roedel, Andrey Ryabinin,
	Andrey Konovalov, long.wanglong, qiuxishi, Andrey Ryabinin,
	Mauro Carvalho Chehab, valentinrothberg, peter.senna,
	Luis Rodriguez, X86 ML, Xen Devel, linux-kernel

On 12/17/15 12:48, Andy Lutomirski wrote:
> 
> I'm entirely ignorant of anything going on in gPXE/iPXE.
> 
> Can you explain what a linker table *does*?  It looks like all you've
> done in this patch is to move code around.  What actually happens?
> 

A linker table is a data structure that is stitched together from items
in multiple object files.

We already have a *bunch* of linker tables in Linux, mostly the init
tables, but they are all built in an ad hoc manner which requires linker
script modifications, which are of course per architecture.

My desire would be to make a general linker table facility so that a new
linker table can be implemented by changing C code only.

	-hpa



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 5/8] x86/init: move ebda reservations into linker table
  2015-12-17 20:55     ` H. Peter Anvin
@ 2015-12-17 20:57       ` Andy Lutomirski
  2015-12-17 23:40         ` Luis R. Rodriguez
  0 siblings, 1 reply; 60+ messages in thread
From: Andy Lutomirski @ 2015-12-17 20:57 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Luis R. Rodriguez, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Konrad Rzeszutek Wilk, Rusty Russell, Boris Ostrovsky, mcb30,
	Juergen Gross, Jan Beulich, Joerg Roedel, Andrey Ryabinin,
	Andrey Konovalov, long.wanglong, qiuxishi, Andrey Ryabinin,
	Mauro Carvalho Chehab, valentinrothberg, peter.senna,
	Luis Rodriguez, X86 ML, Xen Devel, linux-kernel

On Thu, Dec 17, 2015 at 12:55 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 12/17/15 12:48, Andy Lutomirski wrote:
>>
>> I'm entirely ignorant of anything going on in gPXE/iPXE.
>>
>> Can you explain what a linker table *does*?  It looks like all you've
>> done in this patch is to move code around.  What actually happens?
>>
>
> A linker table is a data structure that is stitched together from items
> in multiple object files.
>
> We already have a *bunch* of linker tables in Linux, mostly the init
> tables, but they are all built in an ad hoc manner which requires linker
> script modifications, which are of course per architecture.
>
> My desire would be to make a general linker table facility so that a new
> linker table can be implemented by changing C code only.

Sounds good to me.

--Andy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 5/8] x86/init: move ebda reservations into linker table
  2015-12-17 20:57       ` Andy Lutomirski
@ 2015-12-17 23:40         ` Luis R. Rodriguez
  0 siblings, 0 replies; 60+ messages in thread
From: Luis R. Rodriguez @ 2015-12-17 23:40 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: H. Peter Anvin, Luis R. Rodriguez, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Konrad Rzeszutek Wilk, Rusty Russell,
	Boris Ostrovsky, mcb30, Juergen Gross, Jan Beulich, Joerg Roedel,
	Andrey Ryabinin, Andrey Konovalov, long.wanglong, qiuxishi,
	Andrey Ryabinin, Mauro Carvalho Chehab, valentinrothberg,
	peter.senna, Michal Marek, X86 ML, Xen Devel, linux-kernel

On Thu, Dec 17, 2015 at 12:57:49PM -0800, Andy Lutomirski wrote:
> On Thu, Dec 17, 2015 at 12:55 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> > On 12/17/15 12:48, Andy Lutomirski wrote:
> >>
> >> I'm entirely ignorant of anything going on in gPXE/iPXE.
> >>
> >> Can you explain what a linker table *does*?  It looks like all you've
> >> done in this patch is to move code around.  What actually happens?
> >>
> >
> > A linker table is a data structure that is stitched together from items
> > in multiple object files.
> >
> > We already have a *bunch* of linker tables in Linux, mostly the init
> > tables, but they are all built in an ad hoc manner which requires linker
> > script modifications, which are of course per architecture.
> >
> > My desire would be to make a general linker table facility so that a new
> > linker table can be implemented by changing C code only.
> 
> Sounds good to me.

That's what this actually accomplishes, there are just a few caveats to consider,
more on this shortly on the other thread in this patch series.

  Luis

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 0/8] x86/init: Linux linker tables
  2015-12-17 20:38 ` H. Peter Anvin
@ 2015-12-17 23:46   ` Luis R. Rodriguez
  2015-12-17 23:58     ` Luis R. Rodriguez
  2015-12-18  4:25     ` H. Peter Anvin
  0 siblings, 2 replies; 60+ messages in thread
From: Luis R. Rodriguez @ 2015-12-17 23:46 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Luis R. Rodriguez, tglx, mingo, bp, konrad.wilk, rusty, luto,
	boris.ostrovsky, mcb30, jgross, JBeulich, joro, ryabinin.a.a,
	andreyknvl, long.wanglong, qiuxishi, aryabinin, mchehab,
	valentinrothberg, peter.senna, x86, Michal Marek, xen-devel,
	Michael Matz, linux-kernel

On Thu, Dec 17, 2015 at 12:38:10PM -0800, H. Peter Anvin wrote:
> I think we can make this even more generic.  In particular, I would love
> to see a solution for link tables that:
> 
> a) can be used for any kind of data structures, not just function
> pointers (the latter is a specialization of the former);

Sure, I can give this a shot. I'll sync back to the userspace
table-init.git tree what I had in this series and then build
first there a solution to make it agnostic to the use case.

> b) doesn't need any changes to the linker scripts once the initial
> enabling is done for any one architecture.

I'm glad you brought this up. I considered this as well, and sadly I
forgot to document the resolution. So here it is:

iPXE uses *one* entry on their linker script, that's it. All else is
just C code. That works well as all code in iPXE is happy to work
with one linker script, and the goal and use case for the linker
tables are *all* for initialization. iPXE also does *not* make use
of any further semantics for dependency annotations.

For Linux we have different requirements and goals, and we also wanted to take
advantage of other possible semantics when desirable. Consider the proposed use
case of a linker table with struct x86_init_fn, for its init sequences. We
need at the very least ***one linker table entry* to catch and peg these
sections, just as iPXE has. If we wanted to we could live with that but for
Linux on x86 that means vmlinux.lds.S, that's already architecture specific.
Do we want a more generic place ? If so can anyone suggest one ?

So to be clear, the solution in place suggested here already enables this, but
its current form does use vmlinux.lds.S. Although the current patch made this
change:

        .tbl            : {                                                     
                __tbl_x86_start_init_fns = .;                                   
                *(SORT(.tbl.x86_init_fns.*))                                    
                __tbl_x86_end_init_fns = .;                                     
                                                                                
                /* ... etc ... */                                               
                                                                                
                /* You would add other tables here as needed */                 
        }  

This was on purpose, a generic form could have simply been used as well:


    __tbl = .;
    *(SORT(.tbl.*))
    __tbl_end = .;


I made this change on the mockup table-init through two commits here:

https://github.com/mcgrof/table-init/commit/1a330c1dadbf7fa255eadcbaf214452f7e9ea29c
https://github.com/mcgrof/table-init/commit/ad11b4010ab2a5ffae7be39cce0ed4d052fe18fd

I explain why I do that there but the gist of it is that on Linux we may also
want stronger semantics for specific linker table solutions, and solutions such
as those devised on the IOMMU init stuff do memmove() for sorting depending on
semantics defined (in the simplest case here so far dependency between init
sequences), this makes each set of sequences very subsystem specific. An issue
with *one* subsystem could make things really bad for others. I thought about
this quite a bit and figured its best left to the subsystem maintainers to
decide.

So we have the freedom to do we wish still -- its just we need to understand
the implications of using something like a run time sort, and also when we want
to have the option to use different sized set of routines for different sections.
Linker sorting should fortunately give us the freedom to sort without concern
or regard to size of the structs / data used, but *iff* different sized data
is used in that section then one has to be careful that the sort routine be
very subsystem specific. So as I see it we'd need linker script annotations
for start and end on subsystems that want to use run time sort. This would
compartmentalize their sorting and bugs. If run time sorting is not needed
a catch-it-all script section as the above simple one should catch it.

To try to avoid issues between these two types of cases we could just document
the requirement. Now, it'd nice if we had a way to provide the section start/end
through C code, and not modify anything on linker scripts even if you had run
time sort requirements, I just can't think of a way to do that right now. Ideas
welcomed. Maybe some Kconfig magic to custom linker script section ? Then the
main linker script can just include this generated linker script section ?

> Key to this is to be able to define tables by name only, which is really
> why SORT_BY_NAME() is used: the name sorts before the priority simply by
> putting the name before the class.

Right, tbl.init_fns.01 for example would be one section if your scruct
was .init_fns and 01 for the first "priority". That still puts a requirement
if you want to use a run time sort so you can identify the start/end of
the range.

> Instead of .tbl.* naming of sections I think we should have the first
> component be the type of section (.rodata, .data, .init_rodata,
> .read_mostly etc.) which makes it easier to write a linker script that
> properly sorts it into the right section.

Sure, we can peg that as an argument to the __table(), so perhaps:

Perhaps a new sections.h file (you tell me) which documents the different
section components:

/* document this *really* well */
#define SECTION_RODATA		".rodata"
#define SECTION_INIT		".init"
#define SECTION_INIT_RODATA	".init_rodata"
#define SECTION_READ_MOSTLY	".read_mostly"

Then on tables.h we add the section components support:

#define __table(component, type, name) (component, type, name) 

#define __table_component(table) __table_extract_component table                          
#define __table_extract_component(component, type, name) component

#define __table_type(table) __table_extract_type table                          
#define __table_extract_type(component, type, name) type

#define __table_name(table) __table_extract_name table                          
#define __table_extract_name(component, type, name) name 

#define __table_str(x) #x 

#define __table_section(table, idx) \                                           
        "." __table_component (table) ".tbl." __table_name (table) "." __table_str (idx)                      

#define __table_entry(table, idx)                                       \       
        __attribute__ ((__section__(__table_section(table, idx)),       \       
                        __aligned__(__table_alignment(table))))

A user could then be something as follows:

#define X86_INIT_FNS __table(SECTION_INIT, struct x86_init_fn, "x86_init_fns") 
#define __x86_init_fn(order_level) __table_entry(X86_INIT_FNS, order_level)

If that's what you mean?

I'm a bit wary about having the linker sort any of the above SECTION_*'s, but
if we're happy to do that perhaps a simple first step might be to see if 0-day
but would be happy with just the sort without any consequences to any
architecture. Thoughts?

> The other thing is to take a
> clue from the implementation in iPXE, which uses priority levels 00 and
> 99 (or we could use non-integers which sort appropriately instead of
> using "real" levels) to contain the start and end symbols, which
> eliminates any need for linker script modifications to add new tables.

This solution uses that as well. The only need for adding custom sections
is when they have a requirement for a custom run time sort, and also to
ensure they don't cause regressions on other subsystems if they have a buggy
sort. The run time sorting is all subsystem specific and up to their own
semantics.

> Making this a generic facility we could eventually eliminate a bunch of
> ad hoc hacks we currently have.

Towards the end of my development I did look into seeing if I could use this
sort of stuff also for the other section sorting hacks we have in place, I ran
into an issue with the type stuff, if we make it agnostic to data structure,
then perhaps indeed it might be possible to generalize this even to those
scary cob-web corner infested places in the kernel.

> Oh, and the link table feature should NOT be x86-specific.

As far as I can tell tables.h is not architecture specific at this point, the
way arch/x86/include/asm/x86_init_fn.h uses it is, but that's fine. Even with
the above changes it would still be architecture agnostic. If there's some
discrepancies on how some architecture linkers work -- now that's something
to consider now. Its also why I've been mocking this up in userspace, we
should be able to test that much easier than a full blown kernel.

Now would be a good time to get linker folks involved and eyeballing all this.

  Luis

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 0/8] x86/init: Linux linker tables
  2015-12-17 23:46   ` Luis R. Rodriguez
@ 2015-12-17 23:58     ` Luis R. Rodriguez
  2015-12-18  4:25     ` H. Peter Anvin
  1 sibling, 0 replies; 60+ messages in thread
From: Luis R. Rodriguez @ 2015-12-17 23:58 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Luis R. Rodriguez, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Konrad Rzeszutek Wilk, Rusty Russell, Andy Lutomirski,
	Boris Ostrovsky, mcb30, Juergen Gross, Jan Beulich, joro,
	Andrey Ryabinin, andreyknvl, long.wanglong, qiuxishi, aryabinin,
	Mauro Carvalho Chehab, Valentin Rothberg, Peter Senna Tschudin,
	X86 ML, Michal Marek, xen-devel, Michael Matz, linux-kernel

On Thu, Dec 17, 2015 at 3:46 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> solutions such
> as those devised on the IOMMU init stuff do memmove() for sorting depending on
> semantics defined (in the simplest case here so far dependency between init
> sequences), this makes each set of sequences very subsystem specific

I should also note -- since the sort uses memmove() we're constrained
to using run time sort only once we can ensure the kernel can do that
without issues. I'd hope x86_64_start_reservations() is a safe place,
if not please let me know. Likewise, since one prospect here is to
bring further strong subarchitecture semantics even earlier (help
appreciated), say as early as possible in x86_64_start_kernel(), it'd
be nice to know the earliest memmove() is safe there.

In this case the solution doesn't yet make use of calls in between
x86_64_start_kernel() and  x86_64_start_reservations() but if it could
(we'd need a way for x86_init_fn_early_init() to access the
subarchitecture, help appreciated) we *might* in the future want run
time sort this early. It might still be possible to *not* do a run
time sort until later so long as the priority level linker sort
mechanism suffices for all early init routines. That's another
possibility, should memmove() be a problem that early.

 Luis

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 0/8] x86/init: Linux linker tables
  2015-12-17 23:46   ` Luis R. Rodriguez
  2015-12-17 23:58     ` Luis R. Rodriguez
@ 2015-12-18  4:25     ` H. Peter Anvin
  2015-12-18  4:40       ` H. Peter Anvin
  2015-12-18 18:50       ` Luis R. Rodriguez
  1 sibling, 2 replies; 60+ messages in thread
From: H. Peter Anvin @ 2015-12-18  4:25 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Luis R. Rodriguez, tglx, mingo, bp, konrad.wilk, rusty, luto,
	boris.ostrovsky, mcb30, jgross, JBeulich, joro, ryabinin.a.a,
	andreyknvl, long.wanglong, qiuxishi, aryabinin, mchehab,
	valentinrothberg, peter.senna, x86, Michal Marek, xen-devel,
	Michael Matz, linux-kernel

On 12/17/15 15:46, Luis R. Rodriguez wrote:
> 
> I explain why I do that there but the gist of it is that on Linux we may also
> want stronger semantics for specific linker table solutions, and solutions such
> as those devised on the IOMMU init stuff do memmove() for sorting depending on
> semantics defined (in the simplest case here so far dependency between init
> sequences), this makes each set of sequences very subsystem specific. An issue
> with *one* subsystem could make things really bad for others. I thought about
> this quite a bit and figured its best left to the subsystem maintainers to
> decide.
> 

A table that needs sorting or other runtime handling is just a
read-write table for the purpose of the linker table construct.  It
presents to C as an array of initialized data.

> Perhaps a new sections.h file (you tell me) which documents the different
> section components:
> 
> /* document this *really* well */
> #define SECTION_RODATA		".rodata"
> #define SECTION_INIT		".init"
> #define SECTION_INIT_RODATA	".init_rodata"
> #define SECTION_READ_MOSTLY	".read_mostly"
> 
> Then on tables.h we add the section components support:

Yes, something like that.  How to macroize it cleanly is another matter;
we may want to use slightly different conventions that iPXE to match our
own codebase.

> #define __table(component, type, name) (component, type, name) 
> 
> #define __table_component(table) __table_extract_component table                          
> #define __table_extract_component(component, type, name) component
> 
> #define __table_type(table) __table_extract_type table                          
> #define __table_extract_type(component, type, name) type
> 
> #define __table_name(table) __table_extract_name table                          
> #define __table_extract_name(component, type, name) name 
> 
> #define __table_str(x) #x 
> 
> #define __table_section(table, idx) \                                           
>         "." __table_component (table) ".tbl." __table_name (table) "." __table_str (idx)                      
> 
> #define __table_entry(table, idx)                                       \       
>         __attribute__ ((__section__(__table_section(table, idx)),       \       
>                         __aligned__(__table_alignment(table))))
> 
> A user could then be something as follows:
> 
> #define X86_INIT_FNS __table(SECTION_INIT, struct x86_init_fn, "x86_init_fns") 
> #define __x86_init_fn(order_level) __table_entry(X86_INIT_FNS, order_level)

Yes, but in particular the common case of function initialization tables
should be generic.

I'm kind of thinking a syntax like this:

DECLARE_LINKTABLE_RO(struct foo, tablename);
DEFINE_LINKTABLE_RO(struct foo, tablename);
LINKTABLE_RO(tablename,level) = /* contents */;
LINKTABLE_SIZE(tablename)

... which would turn into something like this once it goes through all
the preprocessing phases

/* DECLARE_LINKTABLE_RO */
extern const struct foo tablename[], tablename__end[];

/* DEFINE_LINKTABLE_RO */
DECLARE_LINKTABLE_RO(struct foo, tablename);

const struct
foo__attribute__((used,section(".rodata.tbl.tablename.0"))) tablename[0];

const struct
foo__attribute__((used,section(".rodata.tbl.tablename.999")))
tablename__end[0];

/* LINKTABLE_RO */
static const __typeof__(tablename)
__attribute__((used,section(".rodata.tbl.tablename.50")))
__tbl_tablename_12345

/* LINKTABLE_SIZE */
((tablename__end) - (tablename))

... and so on for all the possible sections where we may want tables.

Note: I used 0 and 999 above since they sort before and after all
possible 2-digit decimal numbers, but that's just cosmetic.

> If that's what you mean?
> 
> I'm a bit wary about having the linker sort any of the above SECTION_*'s, but
> if we're happy to do that perhaps a simple first step might be to see if 0-day
> but would be happy with just the sort without any consequences to any
> architecture. Thoughts?

I don't see what is dangerous about it.  The section names are such that
a lexographical sort will do the right thing, and we can simply use
SORT(.rodata.tbl.*) in the linker script, for example.

>> The other thing is to take a
>> clue from the implementation in iPXE, which uses priority levels 00 and
>> 99 (or we could use non-integers which sort appropriately instead of
>> using "real" levels) to contain the start and end symbols, which
>> eliminates any need for linker script modifications to add new tables.
> 
> This solution uses that as well. The only need for adding custom sections
> is when they have a requirement for a custom run time sort, and also to
> ensure they don't cause regressions on other subsystems if they have a buggy
> sort. The run time sorting is all subsystem specific and up to their own
> semantics.

Again, from a linker table POV this is nothing other than a read-write
table; there is a runtime function that then operates on that read-write
table.

	-hpa



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 0/8] x86/init: Linux linker tables
  2015-12-18  4:25     ` H. Peter Anvin
@ 2015-12-18  4:40       ` H. Peter Anvin
  2016-01-21 20:19         ` H. Peter Anvin
  2015-12-18 18:50       ` Luis R. Rodriguez
  1 sibling, 1 reply; 60+ messages in thread
From: H. Peter Anvin @ 2015-12-18  4:40 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Luis R. Rodriguez, tglx, mingo, bp, konrad.wilk, rusty, luto,
	boris.ostrovsky, mcb30, jgross, JBeulich, joro, ryabinin.a.a,
	andreyknvl, long.wanglong, qiuxishi, aryabinin, mchehab,
	valentinrothberg, peter.senna, x86, Michal Marek, xen-devel,
	Michael Matz, linux-kernel

On 12/17/15 20:25, H. Peter Anvin wrote:
> 
> /* DECLARE_LINKTABLE_RO */
> extern const struct foo tablename[], tablename__end[];
> 
> /* DEFINE_LINKTABLE_RO */
> DECLARE_LINKTABLE_RO(struct foo, tablename);
> 
> const struct
> foo__attribute__((used,section(".rodata.tbl.tablename.0"))) tablename[0];
> 
> const struct
> foo__attribute__((used,section(".rodata.tbl.tablename.999")))
> tablename__end[0];
> 
> /* LINKTABLE_RO */
> static const __typeof__(tablename)
> __attribute__((used,section(".rodata.tbl.tablename.50")))
> __tbl_tablename_12345
> 
> /* LINKTABLE_SIZE */
> ((tablename__end) - (tablename))
> 
> ... and so on for all the possible sections where we may want tables.
> 

Come to think of it, we could even eliminate the need for a DEFINE
entirely if we made the start and end symbols static.  However, this
would generate an awful lot of identical-but-local symbols which
probably would make the linker slower and definitely would bloat the
debug data.

	-hpa



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 0/8] x86/init: Linux linker tables
  2015-12-18  4:25     ` H. Peter Anvin
  2015-12-18  4:40       ` H. Peter Anvin
@ 2015-12-18 18:50       ` Luis R. Rodriguez
  2015-12-18 18:58         ` H. Peter Anvin
  1 sibling, 1 reply; 60+ messages in thread
From: Luis R. Rodriguez @ 2015-12-18 18:50 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Luis R. Rodriguez, tglx, mingo, bp, konrad.wilk, rusty, luto,
	boris.ostrovsky, mcb30, jgross, JBeulich, joro, ryabinin.a.a,
	andreyknvl, long.wanglong, qiuxishi, aryabinin, mchehab,
	valentinrothberg, peter.senna, x86, Michal Marek, xen-devel,
	Michael Matz, linux-kernel

On Thu, Dec 17, 2015 at 08:25:19PM -0800, H. Peter Anvin wrote:
> On 12/17/15 15:46, Luis R. Rodriguez wrote:
> > 
> > I explain why I do that there but the gist of it is that on Linux we may also
> > want stronger semantics for specific linker table solutions, and solutions such
> > as those devised on the IOMMU init stuff do memmove() for sorting depending on
> > semantics defined (in the simplest case here so far dependency between init
> > sequences), this makes each set of sequences very subsystem specific. An issue
> > with *one* subsystem could make things really bad for others. I thought about
> > this quite a bit and figured its best left to the subsystem maintainers to
> > decide.
> > 
> 
> A table that needs sorting or other runtime handling is just a
> read-write table for the purpose of the linker table construct.  It
> presents to C as an array of initialized data.

Sure but what I was getting at was that since some run time changes to the
table *might* be desirable, depending on the subsystem, the subsystem table needs
to be able to know the start and end address of its table, and that's a
linker script change. But come to think of it, that was me just being pedantic
and careful, I'll try even a run time sort with a few tables of different
structure size to ensure its both possible to just declare them in C and also
sort them without a linker script change.

> > Perhaps a new sections.h file (you tell me) which documents the different
> > section components:
> > 
> > /* document this *really* well */
> > #define SECTION_RODATA		".rodata"
> > #define SECTION_INIT		".init"
> > #define SECTION_INIT_RODATA	".init_rodata"
> > #define SECTION_READ_MOSTLY	".read_mostly"
> > 
> > Then on tables.h we add the section components support:
> 
> Yes, something like that.  How to macroize it cleanly is another matter;
> we may want to use slightly different conventions that iPXE to match our
> own codebase.

Sure, the style below is from iPXE, the one in the patch set matches the
Linux coding style. Other than that I'd welcome feedback on any issues
or recommendations with style on our proposed version of tables.h
Seems you provided some pointers below, so I'll go try to incorporate
those suggestions.

> > #define __table(component, type, name) (component, type, name) 
> > 
> > #define __table_component(table) __table_extract_component table                          
> > #define __table_extract_component(component, type, name) component
> > 
> > #define __table_type(table) __table_extract_type table                          
> > #define __table_extract_type(component, type, name) type
> > 
> > #define __table_name(table) __table_extract_name table                          
> > #define __table_extract_name(component, type, name) name 
> > 
> > #define __table_str(x) #x 
> > 
> > #define __table_section(table, idx) \                                           
> >         "." __table_component (table) ".tbl." __table_name (table) "." __table_str (idx)                      
> > 
> > #define __table_entry(table, idx)                                       \       
> >         __attribute__ ((__section__(__table_section(table, idx)),       \       
> >                         __aligned__(__table_alignment(table))))
> > 
> > A user could then be something as follows:
> > 
> > #define X86_INIT_FNS __table(SECTION_INIT, struct x86_init_fn, "x86_init_fns") 
> > #define __x86_init_fn(order_level) __table_entry(X86_INIT_FNS, order_level)
> 
> Yes, but in particular the common case of function initialization tables
> should be generic.
> 
> I'm kind of thinking a syntax like this:
> 
> DECLARE_LINKTABLE_RO(struct foo, tablename);
> DEFINE_LINKTABLE_RO(struct foo, tablename);
> LINKTABLE_RO(tablename,level) = /* contents */;
> LINKTABLE_SIZE(tablename)
> 
> ... which would turn into something like this once it goes through all
> the preprocessing phases
> 
> /* DECLARE_LINKTABLE_RO */
> extern const struct foo tablename[], tablename__end[];
> 
> /* DEFINE_LINKTABLE_RO */
> DECLARE_LINKTABLE_RO(struct foo, tablename);
> 
> const struct
> foo__attribute__((used,section(".rodata.tbl.tablename.0"))) tablename[0];
> 
> const struct
> foo__attribute__((used,section(".rodata.tbl.tablename.999")))
> tablename__end[0];
> 
> /* LINKTABLE_RO */
> static const __typeof__(tablename)
> __attribute__((used,section(".rodata.tbl.tablename.50")))
> __tbl_tablename_12345
> 
> /* LINKTABLE_SIZE */
> ((tablename__end) - (tablename))
> 
> ... and so on for all the possible sections where we may want tables.

OK thanks so much! Will try working with that.

> Note: I used 0 and 999 above since they sort before and after all
> possible 2-digit decimal numbers, but that's just cosmetic.

Ah neat, so we could still use the two digits 99 and 00 for order
level if we wanted then.

> > If that's what you mean?
> > 
> > I'm a bit wary about having the linker sort any of the above SECTION_*'s, but
> > if we're happy to do that perhaps a simple first step might be to see if 0-day
> > but would be happy with just the sort without any consequences to any
> > architecture. Thoughts?
> 
> I don't see what is dangerous about it.  The section names are such that
> a lexographical sort will do the right thing, and we can simply use
> SORT(.rodata.tbl.*) in the linker script, for example.

OK I'll leave it up to you, I'll go test sorting the sections broadly first.

> >> The other thing is to take a
> >> clue from the implementation in iPXE, which uses priority levels 00 and
> >> 99 (or we could use non-integers which sort appropriately instead of
> >> using "real" levels) to contain the start and end symbols, which
> >> eliminates any need for linker script modifications to add new tables.
> > 
> > This solution uses that as well. The only need for adding custom sections
> > is when they have a requirement for a custom run time sort, and also to
> > ensure they don't cause regressions on other subsystems if they have a buggy
> > sort. The run time sorting is all subsystem specific and up to their own
> > semantics.
> 
> Again, from a linker table POV this is nothing other than a read-write
> table; there is a runtime function that then operates on that read-write
> table.

OK, I'm convinced, I'll go try these changes now.

  Luis

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 0/8] x86/init: Linux linker tables
  2015-12-18 18:50       ` Luis R. Rodriguez
@ 2015-12-18 18:58         ` H. Peter Anvin
  0 siblings, 0 replies; 60+ messages in thread
From: H. Peter Anvin @ 2015-12-18 18:58 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Luis R. Rodriguez, tglx, mingo, bp, konrad.wilk, rusty, luto,
	boris.ostrovsky, mcb30, jgross, JBeulich, joro, ryabinin.a.a,
	andreyknvl, long.wanglong, qiuxishi, aryabinin, mchehab,
	valentinrothberg, peter.senna, x86, Michal Marek, xen-devel,
	Michael Matz, linux-kernel

On December 18, 2015 10:50:40 AM PST, "Luis R. Rodriguez" <mcgrof@suse.com> wrote:
>On Thu, Dec 17, 2015 at 08:25:19PM -0800, H. Peter Anvin wrote:
>> On 12/17/15 15:46, Luis R. Rodriguez wrote:
>> > 
>> > I explain why I do that there but the gist of it is that on Linux
>we may also
>> > want stronger semantics for specific linker table solutions, and
>solutions such
>> > as those devised on the IOMMU init stuff do memmove() for sorting
>depending on
>> > semantics defined (in the simplest case here so far dependency
>between init
>> > sequences), this makes each set of sequences very subsystem
>specific. An issue
>> > with *one* subsystem could make things really bad for others. I
>thought about
>> > this quite a bit and figured its best left to the subsystem
>maintainers to
>> > decide.
>> > 
>> 
>> A table that needs sorting or other runtime handling is just a
>> read-write table for the purpose of the linker table construct.  It
>> presents to C as an array of initialized data.
>
>Sure but what I was getting at was that since some run time changes to
>the
>table *might* be desirable, depending on the subsystem, the subsystem
>table needs
>to be able to know the start and end address of its table, and that's a
>linker script change. But come to think of it, that was me just being
>pedantic
>and careful, I'll try even a run time sort with a few tables of
>different
>structure size to ensure its both possible to just declare them in C
>and also
>sort them without a linker script change.
>
>> > Perhaps a new sections.h file (you tell me) which documents the
>different
>> > section components:
>> > 
>> > /* document this *really* well */
>> > #define SECTION_RODATA		".rodata"
>> > #define SECTION_INIT		".init"
>> > #define SECTION_INIT_RODATA	".init_rodata"
>> > #define SECTION_READ_MOSTLY	".read_mostly"
>> > 
>> > Then on tables.h we add the section components support:
>> 
>> Yes, something like that.  How to macroize it cleanly is another
>matter;
>> we may want to use slightly different conventions that iPXE to match
>our
>> own codebase.
>
>Sure, the style below is from iPXE, the one in the patch set matches
>the
>Linux coding style. Other than that I'd welcome feedback on any issues
>or recommendations with style on our proposed version of tables.h
>Seems you provided some pointers below, so I'll go try to incorporate
>those suggestions.
>
>> > #define __table(component, type, name) (component, type, name) 
>> > 
>> > #define __table_component(table) __table_extract_component table   
>                      
>> > #define __table_extract_component(component, type, name) component
>> > 
>> > #define __table_type(table) __table_extract_type table             
>            
>> > #define __table_extract_type(component, type, name) type
>> > 
>> > #define __table_name(table) __table_extract_name table             
>            
>> > #define __table_extract_name(component, type, name) name 
>> > 
>> > #define __table_str(x) #x 
>> > 
>> > #define __table_section(table, idx) \                              
>            
>> >         "." __table_component (table) ".tbl." __table_name (table)
>"." __table_str (idx)                      
>> > 
>> > #define __table_entry(table, idx)                                  
>    \       
>> >         __attribute__ ((__section__(__table_section(table, idx)),  
>    \       
>> >                         __aligned__(__table_alignment(table))))
>> > 
>> > A user could then be something as follows:
>> > 
>> > #define X86_INIT_FNS __table(SECTION_INIT, struct x86_init_fn,
>"x86_init_fns") 
>> > #define __x86_init_fn(order_level) __table_entry(X86_INIT_FNS,
>order_level)
>> 
>> Yes, but in particular the common case of function initialization
>tables
>> should be generic.
>> 
>> I'm kind of thinking a syntax like this:
>> 
>> DECLARE_LINKTABLE_RO(struct foo, tablename);
>> DEFINE_LINKTABLE_RO(struct foo, tablename);
>> LINKTABLE_RO(tablename,level) = /* contents */;
>> LINKTABLE_SIZE(tablename)
>> 
>> ... which would turn into something like this once it goes through
>all
>> the preprocessing phases
>> 
>> /* DECLARE_LINKTABLE_RO */
>> extern const struct foo tablename[], tablename__end[];
>> 
>> /* DEFINE_LINKTABLE_RO */
>> DECLARE_LINKTABLE_RO(struct foo, tablename);
>> 
>> const struct
>> foo__attribute__((used,section(".rodata.tbl.tablename.0")))
>tablename[0];
>> 
>> const struct
>> foo__attribute__((used,section(".rodata.tbl.tablename.999")))
>> tablename__end[0];
>> 
>> /* LINKTABLE_RO */
>> static const __typeof__(tablename)
>> __attribute__((used,section(".rodata.tbl.tablename.50")))
>> __tbl_tablename_12345
>> 
>> /* LINKTABLE_SIZE */
>> ((tablename__end) - (tablename))
>> 
>> ... and so on for all the possible sections where we may want tables.
>
>OK thanks so much! Will try working with that.
>
>> Note: I used 0 and 999 above since they sort before and after all
>> possible 2-digit decimal numbers, but that's just cosmetic.
>
>Ah neat, so we could still use the two digits 99 and 00 for order
>level if we wanted then.
>
>> > If that's what you mean?
>> > 
>> > I'm a bit wary about having the linker sort any of the above
>SECTION_*'s, but
>> > if we're happy to do that perhaps a simple first step might be to
>see if 0-day
>> > but would be happy with just the sort without any consequences to
>any
>> > architecture. Thoughts?
>> 
>> I don't see what is dangerous about it.  The section names are such
>that
>> a lexographical sort will do the right thing, and we can simply use
>> SORT(.rodata.tbl.*) in the linker script, for example.
>
>OK I'll leave it up to you, I'll go test sorting the sections broadly
>first.
>
>> >> The other thing is to take a
>> >> clue from the implementation in iPXE, which uses priority levels
>00 and
>> >> 99 (or we could use non-integers which sort appropriately instead
>of
>> >> using "real" levels) to contain the start and end symbols, which
>> >> eliminates any need for linker script modifications to add new
>tables.
>> > 
>> > This solution uses that as well. The only need for adding custom
>sections
>> > is when they have a requirement for a custom run time sort, and
>also to
>> > ensure they don't cause regressions on other subsystems if they
>have a buggy
>> > sort. The run time sorting is all subsystem specific and up to
>their own
>> > semantics.
>> 
>> Again, from a linker table POV this is nothing other than a
>read-write
>> table; there is a runtime function that then operates on that
>read-write
>> table.
>
>OK, I'm convinced, I'll go try these changes now.
>
>  Luis

No, start and end symbols are provided by having zero-length sections which sort at the very beginning and very end.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 3/8] x86/boot: add BIT() to boot/bitops.h
       [not found]   ` <20160120201718.GC4769@char.us.oracle.com>
@ 2016-01-20 20:33     ` Luis R. Rodriguez
  0 siblings, 0 replies; 60+ messages in thread
From: Luis R. Rodriguez @ 2016-01-20 20:33 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Rusty Russell, Andy Lutomirski, Boris Ostrovsky, mcb30,
	Juergen Gross, Jan Beulich, joro, Andrey Ryabinin, andreyknvl,
	long.wanglong, qiuxishi, aryabinin, Mauro Carvalho Chehab,
	Valentin Rothberg, Peter Senna Tschudin, X86 ML, xen-devel,
	linux-kernel

On Wed, Jan 20, 2016 at 12:17 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> Where? Could you paste in the name of the patch in the description?

+x86_init_early(BIT(X86_SUBARCH_INTEL_MID), NULL, NULL,
+              x86_intel_mid_early_setup);

Is an example, so users of the subarch. Likewise the macro helpers to
make these smaller, see x86_init_early_all() on the patch "x86/init:
add linker table support".

And then of course the checkers:

+static bool x86_init_fn_supports_subarch(struct x86_init_fn *fn)
+{
+       if (!fn->supp_hardware_subarch) {
+               pr_err("Init sequence fails to declares any supported
subarchs: %pF\n", fn->early_init);
+               WARN_ON(1);
+       }
+       if (BIT(boot_params.hdr.hardware_subarch) & fn->supp_hardware_subarch)
+               return true;
+       return false;
+}

  Luis

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 4/8] x86/init: add linker table support
       [not found]   ` <20160120210014.GF4769@char.us.oracle.com>
@ 2016-01-20 21:33     ` Luis R. Rodriguez
  2016-01-20 21:41       ` H. Peter Anvin
  2016-01-21  8:38       ` Roger Pau Monné
  0 siblings, 2 replies; 60+ messages in thread
From: Luis R. Rodriguez @ 2016-01-20 21:33 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Boris Ostrovsky, Roger Pau Monné,
	Stefano Stabellini
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Rusty Russell, Andy Lutomirski, mcb30, Juergen Gross,
	Jan Beulich, joro, Andrey Ryabinin, andreyknvl, long.wanglong,
	qiuxishi, aryabinin, Mauro Carvalho Chehab, Valentin Rothberg,
	Peter Senna Tschudin, X86 ML, xen-devel, linux-kernel

On Wed, Jan 20, 2016 at 1:00 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
>> +static bool x86_init_fn_supports_subarch(struct x86_init_fn *fn)
>> +{
>> +     if (!fn->supp_hardware_subarch) {
>> +             pr_err("Init sequence fails to declares any supported subarchs: %pF\n", fn->early_init);
>> +             WARN_ON(1);
>> +     }
>> +     if (BIT(boot_params.hdr.hardware_subarch) & fn->supp_hardware_subarch)
>> +             return true;
>> +     return false;
>> +}
>
> So the logic for this working is that boot_params.hdr.hardware_subarch
>
> And the macros define two: BIT(X86_SUBARCH_PC) or BIT(X86_SUBARCH_XEN).
>
> But hardware_subarch by default is set to zero. Which means if GRUB2, PXELinux, Xen multiboot1
> don't set it - then the X86_SUBARCH_PC is choosen right?
>
>  1 << 0 & 1 << X86_SUBARCH_PC (which is zero).
>
> For this to nicely work with Xen it ought to do this:
>
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index 993b7a7..6cf9afd 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1676,6 +1676,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
>         boot_params.hdr.ramdisk_image = initrd_start;
>         boot_params.hdr.ramdisk_size = xen_start_info->mod_len;
>         boot_params.hdr.cmd_line_ptr = __pa(xen_start_info->cmd_line);
> +       boot_params.hdr.hardware_subarch = X86_SUBARCH_XEN;
>
>         if (!xen_initial_domain()) {
>                 add_preferred_console("xenboot", 0, NULL);
>
>
> ?

That's correct for PV and PVH, likewise when qemu is required for HVM
qemu could set it. I have the qemu change done but that should only
cover HVM. A common place to set this as well could be the hypervisor,
but currently the hypervisor doesn't set any boot_params, instead a
generic struct is passed and the kernel code (for any OS) is expected
to interpret this and then set the required values for the OS in the
init path. Long term though if we wanted to merge init further one way
could be to have the hypervisor just set the zero page cleanly for the
different modes. If we needed more data other than the
hardware_subarch we also have the hardware_subarch_data, that's a u64
, and how that is used would be up to the subarch. In Xen's case it
could do what it wants with it. That would still mean perhaps defining
as part of a Xen boot protocol a place where xen specific code can
count on finding more Xen data passed by the hypervisor, the
xen_start_info. That is, if we wanted to merge init paths this is
something to consider.

One thing I considered on the question of who should set the zero page
for Xen with the prospect of merging inits, or at least this subarch
for both short term and long term are the obvious implications in
terms of hypervisor / kernel / qemu combination requirements if the
subarch is needed. Having it set in the kernel is an obvious immediate
choice for PV / PVH but it means we can't merge init paths completely
(down to asm inits), we'd still be able to merge some C init paths
though, the first entry would still be different. Having the zero page
set on the hypervisor would go long ways but it would mean a
hypervisor change required.

These prospects are worth discussing, specially in light of Boris's
hvmlite work.

  Luis

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 4/8] x86/init: add linker table support
  2016-01-20 21:33     ` Luis R. Rodriguez
@ 2016-01-20 21:41       ` H. Peter Anvin
  2016-01-20 22:12         ` Luis R. Rodriguez
  2016-01-21  8:38       ` Roger Pau Monné
  1 sibling, 1 reply; 60+ messages in thread
From: H. Peter Anvin @ 2016-01-20 21:41 UTC (permalink / raw)
  To: Luis R. Rodriguez, Konrad Rzeszutek Wilk, Boris Ostrovsky,
	Roger Pau Monné,
	Stefano Stabellini
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Rusty Russell,
	Andy Lutomirski, mcb30, Juergen Gross, Jan Beulich, joro,
	Andrey Ryabinin, andreyknvl, long.wanglong, qiuxishi, aryabinin,
	Mauro Carvalho Chehab, Valentin Rothberg, Peter Senna Tschudin,
	X86 ML, xen-devel, linux-kernel

On 01/20/16 13:33, Luis R. Rodriguez wrote:
> 
> That's correct for PV and PVH, likewise when qemu is required for HVM
> qemu could set it. I have the qemu change done but that should only
> cover HVM. A common place to set this as well could be the hypervisor,
> but currently the hypervisor doesn't set any boot_params, instead a
> generic struct is passed and the kernel code (for any OS) is expected
> to interpret this and then set the required values for the OS in the
> init path. Long term though if we wanted to merge init further one way
> could be to have the hypervisor just set the zero page cleanly for the
> different modes. If we needed more data other than the
> hardware_subarch we also have the hardware_subarch_data, that's a u64
> , and how that is used would be up to the subarch. In Xen's case it
> could do what it wants with it. That would still mean perhaps defining
> as part of a Xen boot protocol a place where xen specific code can
> count on finding more Xen data passed by the hypervisor, the
> xen_start_info. That is, if we wanted to merge init paths this is
> something to consider.
> 
> One thing I considered on the question of who should set the zero page
> for Xen with the prospect of merging inits, or at least this subarch
> for both short term and long term are the obvious implications in
> terms of hypervisor / kernel / qemu combination requirements if the
> subarch is needed. Having it set in the kernel is an obvious immediate
> choice for PV / PVH but it means we can't merge init paths completely
> (down to asm inits), we'd still be able to merge some C init paths
> though, the first entry would still be different. Having the zero page
> set on the hypervisor would go long ways but it would mean a
> hypervisor change required.
> 
> These prospects are worth discussing, specially in light of Boris's
> hvmlite work.
> 

The above doesn't make sense to me.  hardware_subarch is really used
when the boot sequence is somehow nonstandard.  HVM probably doesn't
need that.

	-hpa

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 6/8] x86/init: use linker table for i386 early setup
       [not found]   ` <20160120211424.GG4769@char.us.oracle.com>
@ 2016-01-20 21:41     ` Luis R. Rodriguez
  2016-02-11 19:55       ` Luis R. Rodriguez
  0 siblings, 1 reply; 60+ messages in thread
From: Luis R. Rodriguez @ 2016-01-20 21:41 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Rusty Russell, Andy Lutomirski, Boris Ostrovsky, mcb30,
	Juergen Gross, Jan Beulich, joro, Andrey Ryabinin, long.wanglong,
	qiuxishi, aryabinin, Mauro Carvalho Chehab, Valentin Rothberg,
	Peter Senna Tschudin, X86 ML, xen-devel, linux-kernel

On Wed, Jan 20, 2016 at 1:14 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Tue, Dec 15, 2015 at 02:16:35PM -0800, Luis R. Rodriguez wrote:
>> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>>
>> This also annotates this is only used for PC and
>> lguest hardware subarchitectures.
>>
>> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
>
> On Xen the code path (before this patch) was:
> xen_start_kernel->i386_start_kernel->i386_default_early_setup
>
> and now you remove the i386_default_early_setup call?

No you're right, this should include the Xen subarch as well, thanks for that.

 Luis

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 0/8] x86/init: Linux linker tables
       [not found] ` <20160120211649.GJ4769@char.us.oracle.com>
@ 2016-01-20 21:49   ` Luis R. Rodriguez
  0 siblings, 0 replies; 60+ messages in thread
From: Luis R. Rodriguez @ 2016-01-20 21:49 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Rusty Russell, Andy Lutomirski, Boris Ostrovsky, mcb30,
	Juergen Gross, Jan Beulich, joro, Andrey Ryabinin, andreyknvl,
	long.wanglong, qiuxishi, aryabinin, Mauro Carvalho Chehab,
	Valentin Rothberg, Peter Senna Tschudin, X86 ML, xen-devel,
	linux-kernel

On Wed, Jan 20, 2016 at 1:16 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Tue, Dec 15, 2015 at 02:16:29PM -0800, Luis R. Rodriguez wrote:
>> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>>
>>              A long time ago in a galaxy far,
>>              far away...
>
> :-)
>
> You wouldn't have this on a git tree by any chance?

You bet, refer to "Where to get code alternatively and testing:" on
the original version of this e-mail. As you noted though the i386
change needs the bit for xen subarch, and also the PV enlighten.c code
needs the subarch set too.

> And thank you for doing this!

You did the original nice work on the nice shiny dependency thing,
this just combines it with what ipxe has defined. So thank you!

 Lusi

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 4/8] x86/init: add linker table support
  2016-01-20 21:41       ` H. Peter Anvin
@ 2016-01-20 22:12         ` Luis R. Rodriguez
  2016-01-20 22:20           ` H. Peter Anvin
  0 siblings, 1 reply; 60+ messages in thread
From: Luis R. Rodriguez @ 2016-01-20 22:12 UTC (permalink / raw)
  To: H. Peter Anvin, Takashi Iwai, Avi Kivity
  Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, Roger Pau Monné,
	Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Rusty Russell, Andy Lutomirski, mcb30,
	Juergen Gross, Jan Beulich, joro, Andrey Ryabinin, andreyknvl,
	long.wanglong, qiuxishi, aryabinin, Mauro Carvalho Chehab,
	Valentin Rothberg, Peter Senna Tschudin, X86 ML, xen-devel,
	linux-kernel

On Wed, Jan 20, 2016 at 1:41 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 01/20/16 13:33, Luis R. Rodriguez wrote:
>>
>> That's correct for PV and PVH, likewise when qemu is required for HVM
>> qemu could set it. I have the qemu change done but that should only
>> cover HVM. A common place to set this as well could be the hypervisor,
>> but currently the hypervisor doesn't set any boot_params, instead a
>> generic struct is passed and the kernel code (for any OS) is expected
>> to interpret this and then set the required values for the OS in the
>> init path. Long term though if we wanted to merge init further one way
>> could be to have the hypervisor just set the zero page cleanly for the
>> different modes. If we needed more data other than the
>> hardware_subarch we also have the hardware_subarch_data, that's a u64
>> , and how that is used would be up to the subarch. In Xen's case it
>> could do what it wants with it. That would still mean perhaps defining
>> as part of a Xen boot protocol a place where xen specific code can
>> count on finding more Xen data passed by the hypervisor, the
>> xen_start_info. That is, if we wanted to merge init paths this is
>> something to consider.
>>
>> One thing I considered on the question of who should set the zero page
>> for Xen with the prospect of merging inits, or at least this subarch
>> for both short term and long term are the obvious implications in
>> terms of hypervisor / kernel / qemu combination requirements if the
>> subarch is needed. Having it set in the kernel is an obvious immediate
>> choice for PV / PVH but it means we can't merge init paths completely
>> (down to asm inits), we'd still be able to merge some C init paths
>> though, the first entry would still be different. Having the zero page
>> set on the hypervisor would go long ways but it would mean a
>> hypervisor change required.
>>
>> These prospects are worth discussing, specially in light of Boris's
>> hvmlite work.
>>
>
> The above doesn't make sense to me.  hardware_subarch is really used
> when the boot sequence is somehow nonstandard.

Thanks for the feedback -- as it stands today hardware_subarch is only
used by lguest, Moorestown, and CE4100 even though we had definitions
for it for Xen -- this is not used yet. Its documentation does make
references to differences for a paravirtualized environment, and uses
a few examples but doesn't go into great depths about restrictions so
its limitations in how we could use it were not clear to me.

>  HVM probably doesn't need that.

Today HVM doesn't need it, but perhaps that is because it has not
needed changes early on boot. Will it, or could it? I'd even invite us
to consider the same for other hypervisors or PV hypervisors. I'll
note that things like cpu_has_hypervisor() or derivatives
(kvm_para_available() which is now used on drivers even, see
sound/pci/intel8x0.c) requires init_hypervisor_platform() run, in
terms of the x86 init sequence this is run pretty late at
setup_arch(). Should code need to know hypervisor info anytime before
that they have no generic option available.

I'm fine if we want to restrict hardware_subarch but I'll note the
semantics were not that explicit to delineate clear differences and I
just wanted to highlight the current early boot restriction of
cpu_has_hypervisor().

  Luis

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 4/8] x86/init: add linker table support
  2016-01-20 22:12         ` Luis R. Rodriguez
@ 2016-01-20 22:20           ` H. Peter Anvin
  2016-01-22  0:25             ` Luis R. Rodriguez
  0 siblings, 1 reply; 60+ messages in thread
From: H. Peter Anvin @ 2016-01-20 22:20 UTC (permalink / raw)
  To: Luis R. Rodriguez, Takashi Iwai, Avi Kivity
  Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, Roger Pau Monné,
	Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Rusty Russell, Andy Lutomirski, mcb30,
	Juergen Gross, Jan Beulich, joro, Andrey Ryabinin, andreyknvl,
	long.wanglong, qiuxishi, aryabinin, Mauro Carvalho Chehab,
	Valentin Rothberg, Peter Senna Tschudin, X86 ML, xen-devel,
	linux-kernel

On January 20, 2016 2:12:49 PM PST, "Luis R. Rodriguez" <mcgrof@do-not-panic.com> wrote:
>On Wed, Jan 20, 2016 at 1:41 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 01/20/16 13:33, Luis R. Rodriguez wrote:
>>>
>>> That's correct for PV and PVH, likewise when qemu is required for
>HVM
>>> qemu could set it. I have the qemu change done but that should only
>>> cover HVM. A common place to set this as well could be the
>hypervisor,
>>> but currently the hypervisor doesn't set any boot_params, instead a
>>> generic struct is passed and the kernel code (for any OS) is
>expected
>>> to interpret this and then set the required values for the OS in the
>>> init path. Long term though if we wanted to merge init further one
>way
>>> could be to have the hypervisor just set the zero page cleanly for
>the
>>> different modes. If we needed more data other than the
>>> hardware_subarch we also have the hardware_subarch_data, that's a
>u64
>>> , and how that is used would be up to the subarch. In Xen's case it
>>> could do what it wants with it. That would still mean perhaps
>defining
>>> as part of a Xen boot protocol a place where xen specific code can
>>> count on finding more Xen data passed by the hypervisor, the
>>> xen_start_info. That is, if we wanted to merge init paths this is
>>> something to consider.
>>>
>>> One thing I considered on the question of who should set the zero
>page
>>> for Xen with the prospect of merging inits, or at least this subarch
>>> for both short term and long term are the obvious implications in
>>> terms of hypervisor / kernel / qemu combination requirements if the
>>> subarch is needed. Having it set in the kernel is an obvious
>immediate
>>> choice for PV / PVH but it means we can't merge init paths
>completely
>>> (down to asm inits), we'd still be able to merge some C init paths
>>> though, the first entry would still be different. Having the zero
>page
>>> set on the hypervisor would go long ways but it would mean a
>>> hypervisor change required.
>>>
>>> These prospects are worth discussing, specially in light of Boris's
>>> hvmlite work.
>>>
>>
>> The above doesn't make sense to me.  hardware_subarch is really used
>> when the boot sequence is somehow nonstandard.
>
>Thanks for the feedback -- as it stands today hardware_subarch is only
>used by lguest, Moorestown, and CE4100 even though we had definitions
>for it for Xen -- this is not used yet. Its documentation does make
>references to differences for a paravirtualized environment, and uses
>a few examples but doesn't go into great depths about restrictions so
>its limitations in how we could use it were not clear to me.
>
>>  HVM probably doesn't need that.
>
>Today HVM doesn't need it, but perhaps that is because it has not
>needed changes early on boot. Will it, or could it? I'd even invite us
>to consider the same for other hypervisors or PV hypervisors. I'll
>note that things like cpu_has_hypervisor() or derivatives
>(kvm_para_available() which is now used on drivers even, see
>sound/pci/intel8x0.c) requires init_hypervisor_platform() run, in
>terms of the x86 init sequence this is run pretty late at
>setup_arch(). Should code need to know hypervisor info anytime before
>that they have no generic option available.
>
>I'm fine if we want to restrict hardware_subarch but I'll note the
>semantics were not that explicit to delineate clear differences and I
>just wanted to highlight the current early boot restriction of
>cpu_has_hypervisor().
>
>  Luis

Basically, if the hardware is enumerable using standard PC mechanisms (PCI, ACPI) and doesn't need a special boot flow it should use type 0.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 2/8] tables.h: add linker table support
       [not found]   ` <20160120200428.GB4769@char.us.oracle.com>
@ 2016-01-20 23:15     ` Michael Brown
  2016-01-20 23:24       ` H. Peter Anvin
  0 siblings, 1 reply; 60+ messages in thread
From: Michael Brown @ 2016-01-20 23:15 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Luis R. Rodriguez
  Cc: hpa, tglx, mingo, bp, rusty, luto, boris.ostrovsky, jgross,
	JBeulich, joro, ryabinin.a.a, andreyknvl, long.wanglong,
	qiuxishi, aryabinin, mchehab, valentinrothberg, peter.senna,
	mcgrof, x86, xen-devel, linux-kernel

>> + * To solve this problem linker tables can be used on Linux, it enables you to
>> + * always force compiling of select features that one wishes to avoid bit-rot
>> + * while still enabling you to disable linking feature code into the final
>> + * kernel image or building certain modules if the features have been disabled
>> + * via Kconfig. The code is derivative of gPXE linker table's solution.

I missed the start of this thread.  However, asking as the author of the 
original Etherboot/gPXE/iPXE linker table solution: please change all 
references from "gPXE" to "iPXE".

Thanks,

Michael

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 2/8] tables.h: add linker table support
  2016-01-20 23:15     ` Michael Brown
@ 2016-01-20 23:24       ` H. Peter Anvin
  0 siblings, 0 replies; 60+ messages in thread
From: H. Peter Anvin @ 2016-01-20 23:24 UTC (permalink / raw)
  To: Michael Brown, Konrad Rzeszutek Wilk, Luis R. Rodriguez
  Cc: tglx, mingo, bp, rusty, luto, boris.ostrovsky, jgross, JBeulich,
	joro, ryabinin.a.a, andreyknvl, long.wanglong, qiuxishi,
	aryabinin, mchehab, valentinrothberg, peter.senna, mcgrof, x86,
	xen-devel, linux-kernel

On January 20, 2016 3:15:48 PM PST, Michael Brown <mcb30@ipxe.org> wrote:
>>> + * To solve this problem linker tables can be used on Linux, it
>enables you to
>>> + * always force compiling of select features that one wishes to
>avoid bit-rot
>>> + * while still enabling you to disable linking feature code into
>the final
>>> + * kernel image or building certain modules if the features have
>been disabled
>>> + * via Kconfig. The code is derivative of gPXE linker table's
>solution.
>
>I missed the start of this thread.  However, asking as the author of
>the 
>original Etherboot/gPXE/iPXE linker table solution: please change all 
>references from "gPXE" to "iPXE".
>
>Thanks,
>
>Michael

Yes, that request has already been made :)
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 4/8] x86/init: add linker table support
  2016-01-20 21:33     ` Luis R. Rodriguez
  2016-01-20 21:41       ` H. Peter Anvin
@ 2016-01-21  8:38       ` Roger Pau Monné
  2016-01-21 13:45         ` Boris Ostrovsky
  1 sibling, 1 reply; 60+ messages in thread
From: Roger Pau Monné @ 2016-01-21  8:38 UTC (permalink / raw)
  To: Luis R. Rodriguez, Konrad Rzeszutek Wilk, Boris Ostrovsky,
	Stefano Stabellini
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Rusty Russell, Andy Lutomirski, mcb30, Juergen Gross,
	Jan Beulich, joro, Andrey Ryabinin, andreyknvl, long.wanglong,
	qiuxishi, aryabinin, Mauro Carvalho Chehab, Valentin Rothberg,
	Peter Senna Tschudin, X86 ML, xen-devel, linux-kernel

El 20/01/16 a les 22.33, Luis R. Rodriguez ha escrit:
> On Wed, Jan 20, 2016 at 1:00 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
>>> +static bool x86_init_fn_supports_subarch(struct x86_init_fn *fn)
>>> +{
>>> +     if (!fn->supp_hardware_subarch) {
>>> +             pr_err("Init sequence fails to declares any supported subarchs: %pF\n", fn->early_init);
>>> +             WARN_ON(1);
>>> +     }
>>> +     if (BIT(boot_params.hdr.hardware_subarch) & fn->supp_hardware_subarch)
>>> +             return true;
>>> +     return false;
>>> +}
>>
>> So the logic for this working is that boot_params.hdr.hardware_subarch
>>
>> And the macros define two: BIT(X86_SUBARCH_PC) or BIT(X86_SUBARCH_XEN).
>>
>> But hardware_subarch by default is set to zero. Which means if GRUB2, PXELinux, Xen multiboot1
>> don't set it - then the X86_SUBARCH_PC is choosen right?
>>
>>  1 << 0 & 1 << X86_SUBARCH_PC (which is zero).
>>
>> For this to nicely work with Xen it ought to do this:
>>
>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> index 993b7a7..6cf9afd 100644
>> --- a/arch/x86/xen/enlighten.c
>> +++ b/arch/x86/xen/enlighten.c
>> @@ -1676,6 +1676,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
>>         boot_params.hdr.ramdisk_image = initrd_start;
>>         boot_params.hdr.ramdisk_size = xen_start_info->mod_len;
>>         boot_params.hdr.cmd_line_ptr = __pa(xen_start_info->cmd_line);
>> +       boot_params.hdr.hardware_subarch = X86_SUBARCH_XEN;
>>
>>         if (!xen_initial_domain()) {
>>                 add_preferred_console("xenboot", 0, NULL);
>>
>>
>> ?
> 
> That's correct for PV and PVH, likewise when qemu is required for HVM
> qemu could set it. I have the qemu change done but that should only
> cover HVM. A common place to set this as well could be the hypervisor,
> but currently the hypervisor doesn't set any boot_params, instead a
> generic struct is passed and the kernel code (for any OS) is expected
> to interpret this and then set the required values for the OS in the
> init path. Long term though if we wanted to merge init further one way
> could be to have the hypervisor just set the zero page cleanly for the
> different modes. If we needed more data other than the
> hardware_subarch we also have the hardware_subarch_data, that's a u64
> , and how that is used would be up to the subarch. In Xen's case it
> could do what it wants with it. That would still mean perhaps defining
> as part of a Xen boot protocol a place where xen specific code can
> count on finding more Xen data passed by the hypervisor, the
> xen_start_info. That is, if we wanted to merge init paths this is
> something to consider.
> 
> One thing I considered on the question of who should set the zero page
> for Xen with the prospect of merging inits, or at least this subarch
> for both short term and long term are the obvious implications in
> terms of hypervisor / kernel / qemu combination requirements if the
> subarch is needed. Having it set in the kernel is an obvious immediate
> choice for PV / PVH but it means we can't merge init paths completely
> (down to asm inits), we'd still be able to merge some C init paths
> though, the first entry would still be different. Having the zero page
> set on the hypervisor would go long ways but it would mean a
> hypervisor change required.

I don't think the hypervisor should be setting Linux specific boot
related parameters, the boot ABI should be OS agnostic. IMHO, a small
shim should be added to Linux in order to set what Linux requires when
entering from a Xen entry point.

Roger.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 4/8] x86/init: add linker table support
  2016-01-21  8:38       ` Roger Pau Monné
@ 2016-01-21 13:45         ` Boris Ostrovsky
  2016-01-21 19:25           ` H. Peter Anvin
  0 siblings, 1 reply; 60+ messages in thread
From: Boris Ostrovsky @ 2016-01-21 13:45 UTC (permalink / raw)
  To: Roger Pau Monné,
	Luis R. Rodriguez, Konrad Rzeszutek Wilk, Stefano Stabellini
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Rusty Russell, Andy Lutomirski, mcb30, Juergen Gross,
	Jan Beulich, joro, Andrey Ryabinin, andreyknvl, long.wanglong,
	qiuxishi, aryabinin, Mauro Carvalho Chehab, Valentin Rothberg,
	Peter Senna Tschudin, X86 ML, xen-devel, linux-kernel

On 01/21/2016 03:38 AM, Roger Pau Monné wrote:
> El 20/01/16 a les 22.33, Luis R. Rodriguez ha escrit:
>> On Wed, Jan 20, 2016 at 1:00 PM, Konrad Rzeszutek Wilk
>> <konrad.wilk@oracle.com> wrote:
>>>> +static bool x86_init_fn_supports_subarch(struct x86_init_fn *fn)
>>>> +{
>>>> +     if (!fn->supp_hardware_subarch) {
>>>> +             pr_err("Init sequence fails to declares any supported subarchs: %pF\n", fn->early_init);
>>>> +             WARN_ON(1);
>>>> +     }
>>>> +     if (BIT(boot_params.hdr.hardware_subarch) & fn->supp_hardware_subarch)
>>>> +             return true;
>>>> +     return false;
>>>> +}
>>> So the logic for this working is that boot_params.hdr.hardware_subarch
>>>
>>> And the macros define two: BIT(X86_SUBARCH_PC) or BIT(X86_SUBARCH_XEN).
>>>
>>> But hardware_subarch by default is set to zero. Which means if GRUB2, PXELinux, Xen multiboot1
>>> don't set it - then the X86_SUBARCH_PC is choosen right?
>>>
>>>   1 << 0 & 1 << X86_SUBARCH_PC (which is zero).
>>>
>>> For this to nicely work with Xen it ought to do this:
>>>
>>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>>> index 993b7a7..6cf9afd 100644
>>> --- a/arch/x86/xen/enlighten.c
>>> +++ b/arch/x86/xen/enlighten.c
>>> @@ -1676,6 +1676,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
>>>          boot_params.hdr.ramdisk_image = initrd_start;
>>>          boot_params.hdr.ramdisk_size = xen_start_info->mod_len;
>>>          boot_params.hdr.cmd_line_ptr = __pa(xen_start_info->cmd_line);
>>> +       boot_params.hdr.hardware_subarch = X86_SUBARCH_XEN;
>>>
>>>          if (!xen_initial_domain()) {
>>>                  add_preferred_console("xenboot", 0, NULL);
>>>
>>>
>>> ?
>> That's correct for PV and PVH, likewise when qemu is required for HVM
>> qemu could set it. I have the qemu change done but that should only
>> cover HVM. A common place to set this as well could be the hypervisor,
>> but currently the hypervisor doesn't set any boot_params, instead a
>> generic struct is passed and the kernel code (for any OS) is expected
>> to interpret this and then set the required values for the OS in the
>> init path. Long term though if we wanted to merge init further one way
>> could be to have the hypervisor just set the zero page cleanly for the
>> different modes. If we needed more data other than the
>> hardware_subarch we also have the hardware_subarch_data, that's a u64
>> , and how that is used would be up to the subarch. In Xen's case it
>> could do what it wants with it. That would still mean perhaps defining
>> as part of a Xen boot protocol a place where xen specific code can
>> count on finding more Xen data passed by the hypervisor, the
>> xen_start_info. That is, if we wanted to merge init paths this is
>> something to consider.
>>
>> One thing I considered on the question of who should set the zero page
>> for Xen with the prospect of merging inits, or at least this subarch
>> for both short term and long term are the obvious implications in
>> terms of hypervisor / kernel / qemu combination requirements if the
>> subarch is needed. Having it set in the kernel is an obvious immediate
>> choice for PV / PVH but it means we can't merge init paths completely
>> (down to asm inits), we'd still be able to merge some C init paths
>> though, the first entry would still be different. Having the zero page
>> set on the hypervisor would go long ways but it would mean a
>> hypervisor change required.
> I don't think the hypervisor should be setting Linux specific boot
> related parameters, the boot ABI should be OS agnostic. IMHO, a small
> shim should be added to Linux in order to set what Linux requires when
> entering from a Xen entry point.

And that's exactly what HVMlite does. Most of this shim layer is setting 
up boot_params, after which we jump to standard x86 boot path (i.e. 
startup_{32|64}). With hardware_subarch set to zero.

-boris

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 4/8] x86/init: add linker table support
  2016-01-21 13:45         ` Boris Ostrovsky
@ 2016-01-21 19:25           ` H. Peter Anvin
  2016-01-21 19:46             ` Luis R. Rodriguez
  0 siblings, 1 reply; 60+ messages in thread
From: H. Peter Anvin @ 2016-01-21 19:25 UTC (permalink / raw)
  To: Boris Ostrovsky, Roger Pau Monné,
	Luis R. Rodriguez, Konrad Rzeszutek Wilk, Stefano Stabellini
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Rusty Russell,
	Andy Lutomirski, mcb30, Juergen Gross, Jan Beulich, joro,
	Andrey Ryabinin, andreyknvl, long.wanglong, qiuxishi, aryabinin,
	Mauro Carvalho Chehab, Valentin Rothberg, Peter Senna Tschudin,
	X86 ML, xen-devel, linux-kernel

On 01/21/16 05:45, Boris Ostrovsky wrote:
>> I don't think the hypervisor should be setting Linux specific boot
>> related parameters, the boot ABI should be OS agnostic. IMHO, a small
>> shim should be added to Linux in order to set what Linux requires when
>> entering from a Xen entry point.

For the record, this is exactly what hardware_subarch is meant to do:
revector to a stub to do this kind of setup, for a subarchitecture which
doesn't have a natural stub like BIOS or EFI.  In the case of Xen PV, or
lguest, there are special care that has to be done very early in the
path due to the nonstandard handling of page tables, which is another
reason for this field.

> And that's exactly what HVMlite does. Most of this shim layer is setting
> up boot_params, after which we jump to standard x86 boot path (i.e.
> startup_{32|64}). With hardware_subarch set to zero.

Which is the way to do it as long as the early code can be the same.

	-hpa

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 4/8] x86/init: add linker table support
  2016-01-21 19:25           ` H. Peter Anvin
@ 2016-01-21 19:46             ` Luis R. Rodriguez
  2016-01-21 19:50               ` H. Peter Anvin
  0 siblings, 1 reply; 60+ messages in thread
From: Luis R. Rodriguez @ 2016-01-21 19:46 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Boris Ostrovsky, Roger Pau Monné,
	Konrad Rzeszutek Wilk, Stefano Stabellini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Rusty Russell, Andy Lutomirski,
	mcb30, Juergen Gross, Jan Beulich, joro, Andrey Ryabinin,
	andreyknvl, long.wanglong, qiuxishi, aryabinin,
	Mauro Carvalho Chehab, Valentin Rothberg, Peter Senna Tschudin,
	X86 ML, xen-devel, linux-kernel

On Thu, Jan 21, 2016 at 11:25 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> And that's exactly what HVMlite does. Most of this shim layer is setting
>> up boot_params, after which we jump to standard x86 boot path (i.e.
>> startup_{32|64}). With hardware_subarch set to zero.
>
> Which is the way to do it as long as the early code can be the same.

To be clear, with the subarchand linker table suggested in my patch
series, it should be possible to have the same exact entry point, the
Xen PV setup code could run early in the order. For instance in the
linker table we could use the reserved order levels 01-09 for PV
hypervisor code:

+/* Init order levels, we can start at 01 but reserve 01-09 for now */
+#define X86_INIT_ORDER_EARLY   10
+#define X86_INIT_ORDER_NORMAL  30
+#define X86_INIT_ORDER_LATE    50

So perhaps X86_INIT_ORDER_PV as 05 later.

The standard x86 init would just then be:

asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
{
        x86_init_fn_init_tables();
        x86_init_fn_early_init();
}

The PV init code would kick in early and could parse the
boot_params.hdr.hardware_subarch_data pointer as it sees fit.

 Luis

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 4/8] x86/init: add linker table support
  2016-01-21 19:46             ` Luis R. Rodriguez
@ 2016-01-21 19:50               ` H. Peter Anvin
  2016-01-21 19:52                 ` H. Peter Anvin
  2016-01-21 20:05                 ` Luis R. Rodriguez
  0 siblings, 2 replies; 60+ messages in thread
From: H. Peter Anvin @ 2016-01-21 19:50 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Boris Ostrovsky, Roger Pau Monné,
	Konrad Rzeszutek Wilk, Stefano Stabellini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Rusty Russell, Andy Lutomirski,
	mcb30, Juergen Gross, Jan Beulich, joro, Andrey Ryabinin,
	andreyknvl, long.wanglong, qiuxishi, aryabinin,
	Mauro Carvalho Chehab, Valentin Rothberg, Peter Senna Tschudin,
	X86 ML, xen-devel, linux-kernel

On 01/21/16 11:46, Luis R. Rodriguez wrote:
> On Thu, Jan 21, 2016 at 11:25 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>> And that's exactly what HVMlite does. Most of this shim layer is setting
>>> up boot_params, after which we jump to standard x86 boot path (i.e.
>>> startup_{32|64}). With hardware_subarch set to zero.
>>
>> Which is the way to do it as long as the early code can be the same.
> 
> To be clear, with the subarchand linker table suggested in my patch
> series, it should be possible to have the same exact entry point, the
> Xen PV setup code could run early in the order. For instance in the
> linker table we could use the reserved order levels 01-09 for PV
> hypervisor code:
> 
> +/* Init order levels, we can start at 01 but reserve 01-09 for now */
> +#define X86_INIT_ORDER_EARLY   10
> +#define X86_INIT_ORDER_NORMAL  30
> +#define X86_INIT_ORDER_LATE    50
> 
> So perhaps X86_INIT_ORDER_PV as 05 later.
> 
> The standard x86 init would just then be:
> 
> asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
> {
>         x86_init_fn_init_tables();
>         x86_init_fn_early_init();
> }
> 
> The PV init code would kick in early and could parse the
> boot_params.hdr.hardware_subarch_data pointer as it sees fit.
> 

Right... we already do that.

I don't even think you need to initialize any tables.  At least on i386,
we have to do this in assembly code.  However, it is just a simple table
walk.  :)

	-hpa

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 4/8] x86/init: add linker table support
  2016-01-21 19:50               ` H. Peter Anvin
@ 2016-01-21 19:52                 ` H. Peter Anvin
  2016-01-22  0:19                   ` Luis R. Rodriguez
  2016-01-21 20:05                 ` Luis R. Rodriguez
  1 sibling, 1 reply; 60+ messages in thread
From: H. Peter Anvin @ 2016-01-21 19:52 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Boris Ostrovsky, Roger Pau Monné,
	Konrad Rzeszutek Wilk, Stefano Stabellini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Rusty Russell, Andy Lutomirski,
	mcb30, Juergen Gross, Jan Beulich, joro, Andrey Ryabinin,
	andreyknvl, long.wanglong, qiuxishi, aryabinin,
	Mauro Carvalho Chehab, Valentin Rothberg, Peter Senna Tschudin,
	X86 ML, xen-devel, linux-kernel

On 01/21/16 11:50, H. Peter Anvin wrote:
> 
> Right... we already do that.
> 
> I don't even think you need to initialize any tables.  At least on i386,
> we have to do this in assembly code.  However, it is just a simple table
> walk.  :)
> 

It might make more sense to make subarch its own table, though, although
I haven't looked at your code in enough detail to say.

	-hpa

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 4/8] x86/init: add linker table support
  2016-01-21 19:50               ` H. Peter Anvin
  2016-01-21 19:52                 ` H. Peter Anvin
@ 2016-01-21 20:05                 ` Luis R. Rodriguez
  2016-01-21 21:36                   ` H. Peter Anvin
  1 sibling, 1 reply; 60+ messages in thread
From: Luis R. Rodriguez @ 2016-01-21 20:05 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Boris Ostrovsky, Roger Pau Monné,
	Konrad Rzeszutek Wilk, Stefano Stabellini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Rusty Russell, Andy Lutomirski,
	mcb30, Juergen Gross, Jan Beulich, joro, Andrey Ryabinin,
	long.wanglong, qiuxishi, aryabinin, Mauro Carvalho Chehab,
	Valentin Rothberg, Peter Senna Tschudin, X86 ML, xen-devel,
	linux-kernel

On Thu, Jan 21, 2016 at 11:50 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> The PV init code would kick in early and could parse the
>> boot_params.hdr.hardware_subarch_data pointer as it sees fit.
>>
>
> Right... we already do that.
>
> I don't even think you need to initialize any tables.

The init above is for the sorting. Its not needed unless your
subsystem uses it, its the same sort as with the IOMMU stuff only with
stronger semantics. In theory one would *not* need to do this so
early, it could wait, provided you at least are confident the linker
order suffices for all routines called early. To be clear, the struct
proposed defines a set of callbacks for calling feature's callbacks at
different levels of the x86 init path. As this patch series is
concerned it had one for x86_64_start_reservations(). This could
easily be extended to also include one to be called for instance at
setup_arch(). If we were able to solve the ability to make use of
subarch so early as of we could then have a callback for
x86_64_start_kernel(), then one at x86_64_start_reservations() and
perhaps one at setup_arch() later -- Kasan is an example that could
fit this example in the future. What I mean is that we could avoid the
sort or wait for the run time sort later until
x86_64_start_reservations() or even setup_arch() provided the deafult
linker order suffices for the series of previous callbacks. Its up to
us but I'm taken care to be pedantic about semantics here.

> At least on i386,
> we have to do this in assembly code.

Neat! How is that order kept?

> However, it is just a simple table walk.  :)

 Luis

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 0/8] x86/init: Linux linker tables
  2015-12-18  4:40       ` H. Peter Anvin
@ 2016-01-21 20:19         ` H. Peter Anvin
  2016-01-21 20:33           ` Luis R. Rodriguez
  2016-01-22 13:44           ` Michael Matz
  0 siblings, 2 replies; 60+ messages in thread
From: H. Peter Anvin @ 2016-01-21 20:19 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Luis R. Rodriguez, tglx, mingo, bp, konrad.wilk, rusty, luto,
	boris.ostrovsky, mcb30, jgross, JBeulich, joro, ryabinin.a.a,
	andreyknvl, long.wanglong, qiuxishi, aryabinin, mchehab,
	valentinrothberg, peter.senna, x86, Michal Marek, xen-devel,
	Michael Matz, linux-kernel

On 12/17/15 20:40, H. Peter Anvin wrote:
>>
>> const struct
>> foo__attribute__((used,section(".rodata.tbl.tablename.0"))) tablename[0];
>>
>> const struct
>> foo__attribute__((used,section(".rodata.tbl.tablename.999")))
>> tablename__end[0];
>>

(Over)thinking about this some more, I suggest using the empty string
for the start and "~" for the end.  And, yes, I did check that ~ works
as part of a section name.

Something that confuses me is that gcc seems to give these sections the
"aw" attributes which makes as complain.  This might be a gcc bug.
Worst case we have to use an assembly statement to create these
sections; it isn't a big deal and shouldn't make it any more
architecture-specific.

	-hpa

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 0/8] x86/init: Linux linker tables
  2016-01-21 20:19         ` H. Peter Anvin
@ 2016-01-21 20:33           ` Luis R. Rodriguez
  2016-01-21 21:37             ` Konrad Rzeszutek Wilk
  2016-01-22 13:44           ` Michael Matz
  1 sibling, 1 reply; 60+ messages in thread
From: Luis R. Rodriguez @ 2016-01-21 20:33 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Konrad Rzeszutek Wilk, Rusty Russell, Andy Lutomirski,
	Boris Ostrovsky, mcb30, Juergen Gross, Jan Beulich, joro,
	Andrey Ryabinin, andreyknvl, long.wanglong, qiuxishi, aryabinin,
	Mauro Carvalho Chehab, Valentin Rothberg, Peter Senna Tschudin,
	X86 ML, Michal Marek, xen-devel, Michael Matz, linux-kernel

On Thu, Jan 21, 2016 at 12:19 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 12/17/15 20:40, H. Peter Anvin wrote:
>>>
>>> const struct
>>> foo__attribute__((used,section(".rodata.tbl.tablename.0"))) tablename[0];
>>>
>>> const struct
>>> foo__attribute__((used,section(".rodata.tbl.tablename.999")))
>>> tablename__end[0];
>>>
>
> (Over)thinking about this some more, I suggest using the empty string
> for the start and "~" for the end.  And, yes, I did check that ~ works
> as part of a section name.

Sure, do we know if that ICC compatible? Do we care? There are a
series of ICC hacks put in place on ipxe's original solution which
I've folded in, it seems that works but if we care about ICC those
folks should perhaps help review as well.

> Something that confuses me is that gcc seems to give these sections the
> "aw" attributes which makes as complain.  This might be a gcc bug.
> Worst case we have to use an assembly statement to create these
> sections; it isn't a big deal and shouldn't make it any more
> architecture-specific.

OK!

 Luis

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 4/8] x86/init: add linker table support
  2016-01-21 20:05                 ` Luis R. Rodriguez
@ 2016-01-21 21:36                   ` H. Peter Anvin
  0 siblings, 0 replies; 60+ messages in thread
From: H. Peter Anvin @ 2016-01-21 21:36 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Boris Ostrovsky, Roger Pau Monné,
	Konrad Rzeszutek Wilk, Stefano Stabellini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Rusty Russell, Andy Lutomirski,
	mcb30, Juergen Gross, Jan Beulich, joro, Andrey Ryabinin,
	long.wanglong, qiuxishi, aryabinin, Mauro Carvalho Chehab,
	Valentin Rothberg, Peter Senna Tschudin, X86 ML, xen-devel,
	linux-kernel

On 01/21/16 12:05, Luis R. Rodriguez wrote:
> 
>> At least on i386,
>> we have to do this in assembly code.
> 
> Neat! How is that order kept?
> 

Right now subarch_entries[] is just an array indexed by subarch number
hardcoded in head_32.S.

However, if you have a list of (id, target) then you could just iterate
over it.

	-hpa

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 0/8] x86/init: Linux linker tables
  2016-01-21 20:33           ` Luis R. Rodriguez
@ 2016-01-21 21:37             ` Konrad Rzeszutek Wilk
  2016-01-21 22:25               ` [Xen-devel] " Luis R. Rodriguez
  2016-01-22  8:15               ` Michael Brown
  0 siblings, 2 replies; 60+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-01-21 21:37 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Rusty Russell, Andy Lutomirski, Boris Ostrovsky, mcb30,
	Juergen Gross, Jan Beulich, joro, Andrey Ryabinin, andreyknvl,
	long.wanglong, qiuxishi, aryabinin, Mauro Carvalho Chehab,
	Valentin Rothberg, Peter Senna Tschudin, X86 ML, Michal Marek,
	xen-devel, Michael Matz, linux-kernel

On Thu, Jan 21, 2016 at 12:33:43PM -0800, Luis R. Rodriguez wrote:
> On Thu, Jan 21, 2016 at 12:19 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> > On 12/17/15 20:40, H. Peter Anvin wrote:
> >>>
> >>> const struct
> >>> foo__attribute__((used,section(".rodata.tbl.tablename.0"))) tablename[0];
> >>>
> >>> const struct
> >>> foo__attribute__((used,section(".rodata.tbl.tablename.999")))
> >>> tablename__end[0];
> >>>
> >
> > (Over)thinking about this some more, I suggest using the empty string
> > for the start and "~" for the end.  And, yes, I did check that ~ works
> > as part of a section name.
> 
> Sure, do we know if that ICC compatible? Do we care? There are a
> series of ICC hacks put in place on ipxe's original solution which
> I've folded in, it seems that works but if we care about ICC those
> folks should perhaps help review as well.

I didn't know the kernel could even be compiled with ICC? Thought
only GCC worked?

Anyhow - it may be that those fixes were for quite old ICC versions.
Does the latest one manifest these oddities?

> 
> > Something that confuses me is that gcc seems to give these sections the
> > "aw" attributes which makes as complain.  This might be a gcc bug.
> > Worst case we have to use an assembly statement to create these
> > sections; it isn't a big deal and shouldn't make it any more
> > architecture-specific.
> 
> OK!
> 
>  Luis

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Xen-devel] [RFC v1 0/8] x86/init: Linux linker tables
  2016-01-21 21:37             ` Konrad Rzeszutek Wilk
@ 2016-01-21 22:25               ` Luis R. Rodriguez
  2016-01-21 23:56                 ` H. Peter Anvin
  2016-01-22  8:15               ` Michael Brown
  1 sibling, 1 reply; 60+ messages in thread
From: Luis R. Rodriguez @ 2016-01-21 22:25 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Michael Brown
  Cc: Peter Senna Tschudin, Andrey Ryabinin, Jan Beulich,
	H. Peter Anvin, qiuxishi, Boris Ostrovsky, xen-devel, joro,
	X86 ML, Ingo Molnar, aryabinin, Michael Matz,
	Mauro Carvalho Chehab, andreyknvl, Rusty Russell, Michal Marek,
	Borislav Petkov, Thomas Gleixner, Valentin Rothberg,
	Juergen Gross, linux-kernel, Andy Lutomirski, long.wanglong

On Thu, Jan 21, 2016 at 1:37 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
>> Sure, do we know if that ICC compatible? Do we care? There are a
>> series of ICC hacks put in place on ipxe's original solution which
>> I've folded in, it seems that works but if we care about ICC those
>> folks should perhaps help review as well.
>
> I didn't know the kernel could even be compiled with ICC? Thought
> only GCC worked?

I'm happy with that, just wanted to make sure I raise the flag concern
given the icc hacks on the linker tables.

> Anyhow - it may be that those fixes were for quite old ICC versions.
> Does the latest one manifest these oddities?

I am not sure, I yield to Michael as the author of the original ICC
compatibility pieces. If we don't care about ICC let me know and I'll
just drop the stuff. In lack of such statements I'll just keep the
work arounds in place, but I'm more than trilled to drop it.

 Luis

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Xen-devel] [RFC v1 0/8] x86/init: Linux linker tables
  2016-01-21 22:25               ` [Xen-devel] " Luis R. Rodriguez
@ 2016-01-21 23:56                 ` H. Peter Anvin
  2016-01-22  0:28                   ` Luis R. Rodriguez
  0 siblings, 1 reply; 60+ messages in thread
From: H. Peter Anvin @ 2016-01-21 23:56 UTC (permalink / raw)
  To: Luis R. Rodriguez, Konrad Rzeszutek Wilk, Michael Brown
  Cc: Peter Senna Tschudin, Andrey Ryabinin, Jan Beulich, qiuxishi,
	Boris Ostrovsky, xen-devel, joro, X86 ML, Ingo Molnar, aryabinin,
	Michael Matz, Mauro Carvalho Chehab, andreyknvl, Rusty Russell,
	Michal Marek, Borislav Petkov, Thomas Gleixner,
	Valentin Rothberg, Juergen Gross, linux-kernel, Andy Lutomirski,
	long.wanglong

On 01/21/16 14:25, Luis R. Rodriguez wrote:
> On Thu, Jan 21, 2016 at 1:37 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
>>> Sure, do we know if that ICC compatible? Do we care? There are a
>>> series of ICC hacks put in place on ipxe's original solution which
>>> I've folded in, it seems that works but if we care about ICC those
>>> folks should perhaps help review as well.
>>
>> I didn't know the kernel could even be compiled with ICC? Thought
>> only GCC worked?
> 
> I'm happy with that, just wanted to make sure I raise the flag concern
> given the icc hacks on the linker tables.
> 
>> Anyhow - it may be that those fixes were for quite old ICC versions.
>> Does the latest one manifest these oddities?
> 
> I am not sure, I yield to Michael as the author of the original ICC
> compatibility pieces. If we don't care about ICC let me know and I'll
> just drop the stuff. In lack of such statements I'll just keep the
> work arounds in place, but I'm more than trilled to drop it.
> 

In general we let the ICC and Clang/LLVM teams communicate with out a
post facto.  We can't just guess what their requirements are, especially
since they are likely to change between revisions.

	-hpa

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 4/8] x86/init: add linker table support
  2016-01-21 19:52                 ` H. Peter Anvin
@ 2016-01-22  0:19                   ` Luis R. Rodriguez
  0 siblings, 0 replies; 60+ messages in thread
From: Luis R. Rodriguez @ 2016-01-22  0:19 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Boris Ostrovsky, Roger Pau Monné,
	Konrad Rzeszutek Wilk, Stefano Stabellini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Rusty Russell, Andy Lutomirski,
	Michael Brown, Juergen Gross, Jan Beulich, joro, Andrey Ryabinin,
	long.wanglong, qiuxishi, aryabinin, Mauro Carvalho Chehab,
	Valentin Rothberg, Peter Senna Tschudin, X86 ML, xen-devel,
	linux-kernel, Luis R. Rodriguez

On Thu, Jan 21, 2016 at 11:52 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 01/21/16 11:50, H. Peter Anvin wrote:
>>
>> Right... we already do that.
>>
>> I don't even think you need to initialize any tables.  At least on i386,
>> we have to do this in assembly code.  However, it is just a simple table
>> walk.  :)
>>
>
> It might make more sense to make subarch its own table, though, although
> I haven't looked at your code in enough detail to say.

The gist of it:

void x86_init_fn_early_init() {
  for_each_table_entry(init_fn, X86_INIT_FNS) {
     if supported subarc
      init_fn->early_init();
}

void x86_init_fn_setup_arch() {
  for_each_table_entry(init_fn, X86_INIT_FNS) {
     if supported subarc
      init_fn->setup_archt();
}

Right now the code defines just an init routine at the stage for
x86_64_start_reservations(), we call the callback early_init(), ie
setup_arch() doesn't exist yet. Since certain routines can run on
either Xen or bare-metal we allow a bitwise OR for the subarch as a
bitmask. I'll think a bit more about how to use subarch for a table,
but can't fit it yet. I'd like to later revise kfree'ing unused
sections, and a table per subarch might be nice for that, but
otherwise this flow seems easy to follow, so if we can kfree sections
within a table we could still accomplish that later.

  Luis

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 4/8] x86/init: add linker table support
  2016-01-20 22:20           ` H. Peter Anvin
@ 2016-01-22  0:25             ` Luis R. Rodriguez
  2016-01-22  0:42               ` H. Peter Anvin
  0 siblings, 1 reply; 60+ messages in thread
From: Luis R. Rodriguez @ 2016-01-22  0:25 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Takashi Iwai, Avi Kivity, Konrad Rzeszutek Wilk, Boris Ostrovsky,
	Roger Pau Monné,
	Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Rusty Russell, Andy Lutomirski, Michael Brown,
	Juergen Gross, Jan Beulich, joro, Andrey Ryabinin, long.wanglong,
	qiuxishi, aryabinin, Mauro Carvalho Chehab, Valentin Rothberg,
	Peter Senna Tschudin, X86 ML, xen-devel, linux-kernel,
	Luis R. Rodriguez

On Wed, Jan 20, 2016 at 2:20 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On January 20, 2016 2:12:49 PM PST, "Luis R. Rodriguez" <mcgrof@do-not-panic.com> wrote:
>>On Wed, Jan 20, 2016 at 1:41 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>> On 01/20/16 13:33, Luis R. Rodriguez wrote:
>>>>
>>>> That's correct for PV and PVH, likewise when qemu is required for
>>HVM
>>>> qemu could set it. I have the qemu change done but that should only
>>>> cover HVM. A common place to set this as well could be the
>>hypervisor,
>>>> but currently the hypervisor doesn't set any boot_params, instead a
>>>> generic struct is passed and the kernel code (for any OS) is
>>expected
>>>> to interpret this and then set the required values for the OS in the
>>>> init path. Long term though if we wanted to merge init further one
>>way
>>>> could be to have the hypervisor just set the zero page cleanly for
>>the
>>>> different modes. If we needed more data other than the
>>>> hardware_subarch we also have the hardware_subarch_data, that's a
>>u64
>>>> , and how that is used would be up to the subarch. In Xen's case it
>>>> could do what it wants with it. That would still mean perhaps
>>defining
>>>> as part of a Xen boot protocol a place where xen specific code can
>>>> count on finding more Xen data passed by the hypervisor, the
>>>> xen_start_info. That is, if we wanted to merge init paths this is
>>>> something to consider.
>>>>
>>>> One thing I considered on the question of who should set the zero
>>page
>>>> for Xen with the prospect of merging inits, or at least this subarch
>>>> for both short term and long term are the obvious implications in
>>>> terms of hypervisor / kernel / qemu combination requirements if the
>>>> subarch is needed. Having it set in the kernel is an obvious
>>immediate
>>>> choice for PV / PVH but it means we can't merge init paths
>>completely
>>>> (down to asm inits), we'd still be able to merge some C init paths
>>>> though, the first entry would still be different. Having the zero
>>page
>>>> set on the hypervisor would go long ways but it would mean a
>>>> hypervisor change required.
>>>>
>>>> These prospects are worth discussing, specially in light of Boris's
>>>> hvmlite work.
>>>>
>>>
>>> The above doesn't make sense to me.  hardware_subarch is really used
>>> when the boot sequence is somehow nonstandard.
>>
>>Thanks for the feedback -- as it stands today hardware_subarch is only
>>used by lguest, Moorestown, and CE4100 even though we had definitions
>>for it for Xen -- this is not used yet. Its documentation does make
>>references to differences for a paravirtualized environment, and uses
>>a few examples but doesn't go into great depths about restrictions so
>>its limitations in how we could use it were not clear to me.
>>
>>>  HVM probably doesn't need that.
>>
>>Today HVM doesn't need it, but perhaps that is because it has not
>>needed changes early on boot. Will it, or could it? I'd even invite us
>>to consider the same for other hypervisors or PV hypervisors. I'll
>>note that things like cpu_has_hypervisor() or derivatives
>>(kvm_para_available() which is now used on drivers even, see
>>sound/pci/intel8x0.c) requires init_hypervisor_platform() run, in
>>terms of the x86 init sequence this is run pretty late at
>>setup_arch(). Should code need to know hypervisor info anytime before
>>that they have no generic option available.
>>
>>I'm fine if we want to restrict hardware_subarch but I'll note the
>>semantics were not that explicit to delineate clear differences and I
>>just wanted to highlight the current early boot restriction of
>>cpu_has_hypervisor().
>>
>>  Luis
>
> Basically, if the hardware is enumerable using standard PC mechanisms (PCI, ACPI) and doesn't need a special boot flow it should use type 0.

I can extend the documentation as part of this to be clear.

I will note though that this still leaves a gap on code which might
want to access the question "are we in a virt environment, and if so
on which one" in between really early boot and right before
init_hypervisor_platform(). Or rather, given subarch can be used by
Xen and lguest but not KVM it means KVM doesn't get to use it. It may
not need it, but its also rather trivial to set up on qemu, and I have
a patch for that if we wanted one for KVM. That would extend the
definition of subarch a bit more, but as it stands today its use was
rather limited. Heck, subharch_data is to this day unused.

  Luis

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Xen-devel] [RFC v1 0/8] x86/init: Linux linker tables
  2016-01-21 23:56                 ` H. Peter Anvin
@ 2016-01-22  0:28                   ` Luis R. Rodriguez
  0 siblings, 0 replies; 60+ messages in thread
From: Luis R. Rodriguez @ 2016-01-22  0:28 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Konrad Rzeszutek Wilk, Michael Brown, Peter Senna Tschudin,
	Andrey Ryabinin, Jan Beulich, qiuxishi, Boris Ostrovsky,
	xen-devel, joro, X86 ML, Ingo Molnar, aryabinin, Michael Matz,
	Mauro Carvalho Chehab, andreyknvl, Rusty Russell, Michal Marek,
	Borislav Petkov, Thomas Gleixner, Valentin Rothberg,
	Juergen Gross, linux-kernel, Andy Lutomirski, long.wanglong

On Thu, Jan 21, 2016 at 03:56:35PM -0800, H. Peter Anvin wrote:
> On 01/21/16 14:25, Luis R. Rodriguez wrote:
> > On Thu, Jan 21, 2016 at 1:37 PM, Konrad Rzeszutek Wilk
> > <konrad.wilk@oracle.com> wrote:
> >>> Sure, do we know if that ICC compatible? Do we care? There are a
> >>> series of ICC hacks put in place on ipxe's original solution which
> >>> I've folded in, it seems that works but if we care about ICC those
> >>> folks should perhaps help review as well.
> >>
> >> I didn't know the kernel could even be compiled with ICC? Thought
> >> only GCC worked?
> > 
> > I'm happy with that, just wanted to make sure I raise the flag concern
> > given the icc hacks on the linker tables.
> > 
> >> Anyhow - it may be that those fixes were for quite old ICC versions.
> >> Does the latest one manifest these oddities?
> > 
> > I am not sure, I yield to Michael as the author of the original ICC
> > compatibility pieces. If we don't care about ICC let me know and I'll
> > just drop the stuff. In lack of such statements I'll just keep the
> > work arounds in place, but I'm more than trilled to drop it.
> > 
> 
> In general we let the ICC and Clang/LLVM teams communicate with out a
> post facto.  We can't just guess what their requirements are, especially
> since they are likely to change between revisions.

Great. I'm going to drop ICC hacks.

  Luis

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 4/8] x86/init: add linker table support
  2016-01-22  0:25             ` Luis R. Rodriguez
@ 2016-01-22  0:42               ` H. Peter Anvin
  2016-02-11 20:45                 ` Luis R. Rodriguez
  0 siblings, 1 reply; 60+ messages in thread
From: H. Peter Anvin @ 2016-01-22  0:42 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Takashi Iwai, Avi Kivity, Konrad Rzeszutek Wilk, Boris Ostrovsky,
	Roger Pau Monné,
	Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Rusty Russell, Andy Lutomirski, Michael Brown,
	Juergen Gross, Jan Beulich, joro, Andrey Ryabinin, long.wanglong,
	qiuxishi, aryabinin, Mauro Carvalho Chehab, Valentin Rothberg,
	Peter Senna Tschudin, X86 ML, xen-devel, linux-kernel,
	Luis R. Rodriguez

On 01/21/16 16:25, Luis R. Rodriguez wrote:
>>
>> Basically, if the hardware is enumerable using standard PC mechanisms (PCI, ACPI) and doesn't need a special boot flow it should use type 0.
> 
> I can extend the documentation as part of this to be clear.
> 
> I will note though that this still leaves a gap on code which might
> want to access the question "are we in a virt environment, and if so
> on which one" in between really early boot and right before
> init_hypervisor_platform(). Or rather, given subarch can be used by
> Xen and lguest but not KVM it means KVM doesn't get to use it. It may
> not need it, but its also rather trivial to set up on qemu, and I have
> a patch for that if we wanted one for KVM. That would extend the
> definition of subarch a bit more, but as it stands today its use was
> rather limited. Heck, subharch_data is to this day unused.
> 

KVM is not a subarch, and Xen HVM isn't either; the subarch was meant to
be specifically to handle nonstandard boot entries; the CE4100 extension
was itself kind of a hack.

If you have a genuine need for a "hypervisor type" then that is a
separate thing and should be treated separately from subarch.  However,
you need to consider that some hypervisors can emulate other hypervisors
and you may have more than one hypervisor API available.

	-hpa

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 0/8] x86/init: Linux linker tables
  2016-01-21 21:37             ` Konrad Rzeszutek Wilk
  2016-01-21 22:25               ` [Xen-devel] " Luis R. Rodriguez
@ 2016-01-22  8:15               ` Michael Brown
  1 sibling, 0 replies; 60+ messages in thread
From: Michael Brown @ 2016-01-22  8:15 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Luis R. Rodriguez
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Rusty Russell, Andy Lutomirski, Boris Ostrovsky, Juergen Gross,
	Jan Beulich, joro, Andrey Ryabinin, andreyknvl, long.wanglong,
	qiuxishi, aryabinin, Mauro Carvalho Chehab, Valentin Rothberg,
	Peter Senna Tschudin, X86 ML, Michal Marek, xen-devel,
	Michael Matz, linux-kernel

On 21/01/16 21:37, Konrad Rzeszutek Wilk wrote:
> On Thu, Jan 21, 2016 at 12:33:43PM -0800, Luis R. Rodriguez wrote:
>> Sure, do we know if that ICC compatible? Do we care? There are a
>> series of ICC hacks put in place on ipxe's original solution which
>> I've folded in, it seems that works but if we care about ICC those
>> folks should perhaps help review as well.
>
> I didn't know the kernel could even be compiled with ICC? Thought
> only GCC worked?
>
> Anyhow - it may be that those fixes were for quite old ICC versions.
> Does the latest one manifest these oddities?

I haven't tested building iPXE with icc for some time.  (The support for 
icc was originally added with the plan to be able to compile to EFI Byte 
Code; this plan was swiftly abandoned.)

The most recent version of icc that I have personally used with that 
code was 10.1.018.

Michael

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 0/8] x86/init: Linux linker tables
  2016-01-21 20:19         ` H. Peter Anvin
  2016-01-21 20:33           ` Luis R. Rodriguez
@ 2016-01-22 13:44           ` Michael Matz
  2016-01-22 19:06             ` H. Peter Anvin
  2016-02-02 23:48             ` H. Peter Anvin
  1 sibling, 2 replies; 60+ messages in thread
From: Michael Matz @ 2016-01-22 13:44 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Luis R. Rodriguez, Luis R. Rodriguez, tglx, mingo, bp,
	konrad.wilk, rusty, luto, boris.ostrovsky, mcb30, jgross,
	JBeulich, joro, ryabinin.a.a, andreyknvl, long.wanglong,
	qiuxishi, aryabinin, mchehab, valentinrothberg, peter.senna, x86,
	Michal Marek, xen-devel, linux-kernel

Hi,

On Thu, 21 Jan 2016, H. Peter Anvin wrote:

> Something that confuses me is that gcc seems to give these sections the 
> "aw" attributes which makes as complain.  This might be a gcc bug.

Workaround: use an (possibly empty) intializer:

struct foo {int i;};
const struct foo
__attribute__((used,section(".rodata.tbl.tablename.0"))) tablename[0] = {};


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 0/8] x86/init: Linux linker tables
  2016-01-22 13:44           ` Michael Matz
@ 2016-01-22 19:06             ` H. Peter Anvin
  2016-01-22 21:52               ` Luis R. Rodriguez
  2016-02-02 23:48             ` H. Peter Anvin
  1 sibling, 1 reply; 60+ messages in thread
From: H. Peter Anvin @ 2016-01-22 19:06 UTC (permalink / raw)
  To: Michael Matz
  Cc: Luis R. Rodriguez, Luis R. Rodriguez, tglx, mingo, bp,
	konrad.wilk, rusty, luto, boris.ostrovsky, mcb30, jgross,
	JBeulich, joro, ryabinin.a.a, andreyknvl, long.wanglong,
	qiuxishi, aryabinin, mchehab, valentinrothberg, peter.senna, x86,
	Michal Marek, xen-devel, linux-kernel

On 01/22/2016 05:44 AM, Michael Matz wrote:
> Hi,
> 
> On Thu, 21 Jan 2016, H. Peter Anvin wrote:
> 
>> Something that confuses me is that gcc seems to give these sections the 
>> "aw" attributes which makes as complain.  This might be a gcc bug.
> 
> Workaround: use an (possibly empty) intializer:
> 
> struct foo {int i;};
> const struct foo
> __attribute__((used,section(".rodata.tbl.tablename.0"))) tablename[0] = {};
> 

And indeed that works.  Awesome!  Much better than having to do an
assembly hack.

	-hpa

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 0/8] x86/init: Linux linker tables
  2016-01-22 19:06             ` H. Peter Anvin
@ 2016-01-22 21:52               ` Luis R. Rodriguez
  2016-02-03  0:22                 ` Luis R. Rodriguez
  0 siblings, 1 reply; 60+ messages in thread
From: Luis R. Rodriguez @ 2016-01-22 21:52 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Michael Matz, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Konrad Rzeszutek Wilk, Rusty Russell, Andy Lutomirski,
	Boris Ostrovsky, Michael Brown, Juergen Gross, Jan Beulich,
	Joerg Roedel, Andrey Ryabinin, andreyknvl, long.wanglong,
	qiuxishi, aryabinin, Mauro Carvalho Chehab, Valentin Rothberg,
	Peter Senna Tschudin, X86 ML, Michal Marek, xen-devel,
	linux-kernel

On Fri, Jan 22, 2016 at 11:06 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 01/22/2016 05:44 AM, Michael Matz wrote:
>> Hi,
>>
>> On Thu, 21 Jan 2016, H. Peter Anvin wrote:
>>
>>> Something that confuses me is that gcc seems to give these sections the
>>> "aw" attributes which makes as complain.  This might be a gcc bug.
>>
>> Workaround: use an (possibly empty) intializer:
>>
>> struct foo {int i;};
>> const struct foo
>> __attribute__((used,section(".rodata.tbl.tablename.0"))) tablename[0] = {};
>>
>
> And indeed that works.  Awesome!  Much better than having to do an
> assembly hack.

BTW before we set these in stone given that subarch (lguest, Xen, PC)
really does provide a split in run time code we *know* we could
technically free code for subarchs not needed. I don't expect this to
happen now, but its possibility to do later seems worthy for us to
consider on architecture here on the way we define the linker tables.
As I have it linker tables are associated associated with a struct
(which hpa asked to enable anyone to peg *anything* not just structs),
these structs then have a priority which uses the linker later to sort
things for us, and it also has a subarch bitmask which tells us the
supported subarchs.

Should it be possible to resuse free_init_pages() and/or
free_reserved_area() only for routines (members in the array in this
case of a struct of fns) that don't meet our subarch once we're done
iterating over the routies and know we can discard things we know we
can drop? Through a cursory glance, *I think* its possible as-is, we
would just need easy access to the respective start and end addresses
and I guess there lies the challenge. Question is, is would that be
clean enough for us? Or are there other things you can think of that
perhaps might make this prospect cleaner later to add?

I figure better ask now for architectural purposes than later after merged.

 Luis

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 0/8] x86/init: Linux linker tables
  2016-01-22 13:44           ` Michael Matz
  2016-01-22 19:06             ` H. Peter Anvin
@ 2016-02-02 23:48             ` H. Peter Anvin
  2016-02-03  0:15               ` Luis R. Rodriguez
  1 sibling, 1 reply; 60+ messages in thread
From: H. Peter Anvin @ 2016-02-02 23:48 UTC (permalink / raw)
  To: Michael Matz
  Cc: Luis R. Rodriguez, Luis R. Rodriguez, tglx, mingo, bp,
	konrad.wilk, rusty, luto, boris.ostrovsky, mcb30, jgross,
	JBeulich, joro, ryabinin.a.a, andreyknvl, long.wanglong,
	qiuxishi, aryabinin, mchehab, valentinrothberg, peter.senna, x86,
	Michal Marek, xen-devel, linux-kernel

On 01/22/2016 05:44 AM, Michael Matz wrote:
> Hi,
> 
> On Thu, 21 Jan 2016, H. Peter Anvin wrote:
> 
>> Something that confuses me is that gcc seems to give these sections the 
>> "aw" attributes which makes as complain.  This might be a gcc bug.
> 
> Workaround: use an (possibly empty) intializer:
> 
> struct foo {int i;};
> const struct foo
> __attribute__((used,section(".rodata.tbl.tablename.0"))) tablename[0] = {};
> 

Any forward progress on this?

	-hpa

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 0/8] x86/init: Linux linker tables
  2016-02-02 23:48             ` H. Peter Anvin
@ 2016-02-03  0:15               ` Luis R. Rodriguez
  0 siblings, 0 replies; 60+ messages in thread
From: Luis R. Rodriguez @ 2016-02-03  0:15 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Michael Matz, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Konrad Rzeszutek Wilk, Rusty Russell, Andy Lutomirski,
	Boris Ostrovsky, Michael Brown, Juergen Gross, Jan Beulich,
	Joerg Roedel, Andrey Ryabinin, andreyknvl, long.wanglong,
	qiuxishi, aryabinin, Mauro Carvalho Chehab, Valentin Rothberg,
	Peter Senna Tschudin, X86 ML, Michal Marek, xen-devel,
	linux-kernel

On Tue, Feb 2, 2016 at 3:48 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 01/22/2016 05:44 AM, Michael Matz wrote:
>> Hi,
>>
>> On Thu, 21 Jan 2016, H. Peter Anvin wrote:
>>
>>> Something that confuses me is that gcc seems to give these sections the
>>> "aw" attributes which makes as complain.  This might be a gcc bug.
>>
>> Workaround: use an (possibly empty) intializer:
>>
>> struct foo {int i;};
>> const struct foo
>> __attribute__((used,section(".rodata.tbl.tablename.0"))) tablename[0] = {};
>>
>
> Any forward progress on this?

Yes, I'll reply inline to my only concern.

 Luis

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 0/8] x86/init: Linux linker tables
  2016-01-22 21:52               ` Luis R. Rodriguez
@ 2016-02-03  0:22                 ` Luis R. Rodriguez
  2016-02-03  0:25                   ` H. Peter Anvin
  0 siblings, 1 reply; 60+ messages in thread
From: Luis R. Rodriguez @ 2016-02-03  0:22 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Michael Matz, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Konrad Rzeszutek Wilk, Rusty Russell, Andy Lutomirski,
	Boris Ostrovsky, Michael Brown, Juergen Gross, Jan Beulich,
	Joerg Roedel, Andrey Ryabinin, long.wanglong, qiuxishi,
	aryabinin, Mauro Carvalho Chehab, Valentin Rothberg,
	Peter Senna Tschudin, X86 ML, Michal Marek, xen-devel,
	linux-kernel

On Fri, Jan 22, 2016 at 1:52 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> On Fri, Jan 22, 2016 at 11:06 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 01/22/2016 05:44 AM, Michael Matz wrote:
>>> Hi,
>>>
>>> On Thu, 21 Jan 2016, H. Peter Anvin wrote:
>>>
>>>> Something that confuses me is that gcc seems to give these sections the
>>>> "aw" attributes which makes as complain.  This might be a gcc bug.
>>>
>>> Workaround: use an (possibly empty) intializer:
>>>
>>> struct foo {int i;};
>>> const struct foo
>>> __attribute__((used,section(".rodata.tbl.tablename.0"))) tablename[0] = {};
>>>
>>
>> And indeed that works.  Awesome!  Much better than having to do an
>> assembly hack.
>
> BTW before we set these in stone given that subarch (lguest, Xen, PC)
> really does provide a split in run time code we *know* we could
> technically free code for subarchs not needed. I don't expect this to
> happen now, but its possibility to do later seems worthy for us to
> consider on architecture here on the way we define the linker tables.
> As I have it linker tables are associated associated with a struct
> (which hpa asked to enable anyone to peg *anything* not just structs),
> these structs then have a priority which uses the linker later to sort
> things for us, and it also has a subarch bitmask which tells us the
> supported subarchs.
>
> Should it be possible to resuse free_init_pages() and/or
> free_reserved_area() only for routines (members in the array in this
> case of a struct of fns) that don't meet our subarch once we're done
> iterating over the routies and know we can discard things we know we
> can drop? Through a cursory glance, *I think* its possible as-is, we
> would just need easy access to the respective start and end addresses
> and I guess there lies the challenge. Question is, is would that be
> clean enough for us? Or are there other things you can think of that
> perhaps might make this prospect cleaner later to add?
>
> I figure better ask now for architectural purposes than later after merged.

I don't think its needed we iron out in a solution *now* to be able to
free code we know we won't need at run time but having a solid
understanding adding this feature later without much impact to users
might be worthy. As such I was pursuing a very basic proof of concept
to ensure this is possible first given I didn't hear back if folks
were sure this might be possible. I don't think a proof of concept
should take long so just want to get fleshed out.

Current dev tree for making this proof of concept:

https://github.com/mcgrof/kbox

If you don't think being prudent about this in terms of architecture
is necessary let me know and I'll just respin based on existing
discussion and we can leave this for later and deal with any required
changes later.

 Luis

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 0/8] x86/init: Linux linker tables
  2016-02-03  0:22                 ` Luis R. Rodriguez
@ 2016-02-03  0:25                   ` H. Peter Anvin
  2016-02-03  0:28                     ` Luis R. Rodriguez
  0 siblings, 1 reply; 60+ messages in thread
From: H. Peter Anvin @ 2016-02-03  0:25 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Michael Matz, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Konrad Rzeszutek Wilk, Rusty Russell, Andy Lutomirski,
	Boris Ostrovsky, Michael Brown, Juergen Gross, Jan Beulich,
	Joerg Roedel, Andrey Ryabinin, long.wanglong, qiuxishi,
	aryabinin, Mauro Carvalho Chehab, Valentin Rothberg,
	Peter Senna Tschudin, X86 ML, Michal Marek, xen-devel,
	linux-kernel

On 02/02/2016 04:22 PM, Luis R. Rodriguez wrote:
>>
>> Should it be possible to resuse free_init_pages() and/or
>> free_reserved_area() only for routines (members in the array in this
>> case of a struct of fns) that don't meet our subarch once we're done
>> iterating over the routies and know we can discard things we know we
>> can drop? Through a cursory glance, *I think* its possible as-is, we
>> would just need easy access to the respective start and end addresses
>> and I guess there lies the challenge. Question is, is would that be
>> clean enough for us? Or are there other things you can think of that
>> perhaps might make this prospect cleaner later to add?
>>
>> I figure better ask now for architectural purposes than later after merged.
> 
> I don't think its needed we iron out in a solution *now* to be able to
> free code we know we won't need at run time but having a solid
> understanding adding this feature later without much impact to users
> might be worthy. As such I was pursuing a very basic proof of concept
> to ensure this is possible first given I didn't hear back if folks
> were sure this might be possible. I don't think a proof of concept
> should take long so just want to get fleshed out.
> 

This applies to the specific subarch use rather than generic linker
tables, right?

	-hpa

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 0/8] x86/init: Linux linker tables
  2016-02-03  0:25                   ` H. Peter Anvin
@ 2016-02-03  0:28                     ` Luis R. Rodriguez
  2016-02-03  0:48                       ` Luis R. Rodriguez
  0 siblings, 1 reply; 60+ messages in thread
From: Luis R. Rodriguez @ 2016-02-03  0:28 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Michael Matz, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Konrad Rzeszutek Wilk, Rusty Russell, Andy Lutomirski,
	Boris Ostrovsky, Michael Brown, Juergen Gross, Jan Beulich,
	Joerg Roedel, Andrey Ryabinin, long.wanglong, qiuxishi,
	aryabinin, Mauro Carvalho Chehab, Valentin Rothberg,
	Peter Senna Tschudin, X86 ML, Michal Marek, xen-devel,
	linux-kernel

On Tue, Feb 2, 2016 at 4:25 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 02/02/2016 04:22 PM, Luis R. Rodriguez wrote:
>>>
>>> Should it be possible to resuse free_init_pages() and/or
>>> free_reserved_area() only for routines (members in the array in this
>>> case of a struct of fns) that don't meet our subarch once we're done
>>> iterating over the routies and know we can discard things we know we
>>> can drop? Through a cursory glance, *I think* its possible as-is, we
>>> would just need easy access to the respective start and end addresses
>>> and I guess there lies the challenge. Question is, is would that be
>>> clean enough for us? Or are there other things you can think of that
>>> perhaps might make this prospect cleaner later to add?
>>>
>>> I figure better ask now for architectural purposes than later after merged.
>>
>> I don't think its needed we iron out in a solution *now* to be able to
>> free code we know we won't need at run time but having a solid
>> understanding adding this feature later without much impact to users
>> might be worthy. As such I was pursuing a very basic proof of concept
>> to ensure this is possible first given I didn't hear back if folks
>> were sure this might be possible. I don't think a proof of concept
>> should take long so just want to get fleshed out.
>>
>
> This applies to the specific subarch use rather than generic linker
> tables, right?

Well both, given that for instance the kernel frees unused kernel code
using the __init section, so a neat generic section solution for this
perhaps in consideration for the subarch might be worthy to consider.
You had also suggested perhaps the linker table could be pivoted based
on the subarch at one point too, so I gave that a though. It would
make the freeing much easier but for run time it wouldn't fit well
into the current design yet.

 Luis

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 0/8] x86/init: Linux linker tables
  2016-02-03  0:28                     ` Luis R. Rodriguez
@ 2016-02-03  0:48                       ` Luis R. Rodriguez
  0 siblings, 0 replies; 60+ messages in thread
From: Luis R. Rodriguez @ 2016-02-03  0:48 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Michael Matz, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Konrad Rzeszutek Wilk, Rusty Russell, Andy Lutomirski,
	Boris Ostrovsky, Michael Brown, Juergen Gross, Jan Beulich,
	Joerg Roedel, Andrey Ryabinin, long.wanglong, qiuxishi,
	aryabinin, Mauro Carvalho Chehab, Valentin Rothberg,
	Peter Senna Tschudin, X86 ML, Michal Marek, xen-devel,
	linux-kernel, Brenden Blanco, Pere Monclus

On Tue, Feb 2, 2016 at 4:28 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Tue, Feb 2, 2016 at 4:25 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 02/02/2016 04:22 PM, Luis R. Rodriguez wrote:
>>s>>
>>>> Should it be possible to resuse free_init_pages() and/or
>>>> free_reserved_area() only for routines (members in the array in this
>>>> case of a struct of fns) that don't meet our subarch once we're done
>>>> iterating over the routies and know we can discard things we know we
>>>> can drop? Through a cursory glance, *I think* its possible as-is, we
>>>> would just need easy access to the respective start and end addresses
>>>> and I guess there lies the challenge. Question is, is would that be
>>>> clean enough for us? Or are there other things you can think of that
>>>> perhaps might make this prospect cleaner later to add?
>>>>
>>>> I figure better ask now for architectural purposes than later after merged.
>>>
>>> I don't think its needed we iron out in a solution *now* to be able to
>>> free code we know we won't need at run time but having a solid
>>> understanding adding this feature later without much impact to users
>>> might be worthy. As such I was pursuing a very basic proof of concept
>>> to ensure this is possible first given I didn't hear back if folks
>>> were sure this might be possible. I don't think a proof of concept
>>> should take long so just want to get fleshed out.
>>>
>>
>> This applies to the specific subarch use rather than generic linker
>> tables, right?
>
> Well both, given that for instance the kernel frees unused kernel code
> using the __init section, so a neat generic section solution for this
> perhaps in consideration for the subarch might be worthy to consider.
> You had also suggested perhaps the linker table could be pivoted based
> on the subarch at one point too, so I gave that a though. It would
> make the freeing much easier but for run time it wouldn't fit well
> into the current design yet.

OK so I've decided that to make the Proof of Concept also immediately
useful I could try to convert one of the *existing* ad hoc
link tables to generic using this new solution. The POC will not be
required as part of the patch series, just knowing it works is all
that would be needed.

To be clear then what I'll do:

  * I'll be respinning this series *now* to incorporate all feedback
given but *will not post* until I also have a proof of concept of a
conversion done and tested that includes free unused code
  * Since the renaming paravirt_enabled() patches are separate but
folks agree with them I'll submit that as an independent series,
shortly
  * Since the patch x86/boot: add BIT() to boot/bitops." is
independent I'll submit as independent single patch, shortly

 Luis

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 1/8] paravirt: rename paravirt_enabled to paravirt_legacy
       [not found]   ` <20160120193241.GA4769@char.us.oracle.com>
@ 2016-02-04 22:50     ` Luis R. Rodriguez
  0 siblings, 0 replies; 60+ messages in thread
From: Luis R. Rodriguez @ 2016-02-04 22:50 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Toshi Kani
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Rusty Russell, Andy Lutomirski, Boris Ostrovsky, Michael Brown,
	Juergen Gross, Jan Beulich, Joerg Roedel, Andrey Ryabinin,
	long.wanglong, qiuxishi, aryabinin, Mauro Carvalho Chehab,
	Valentin Rothberg, Peter Senna Tschudin, X86 ML, xen-devel,
	linux-kernel, cocci, Gary Lin, Andrew Cooper

On Wed, Jan 20, 2016 at 11:32 AM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Tue, Dec 15, 2015 at 02:16:30PM -0800, Luis R. Rodriguez wrote:
>> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> Hey Luis,
>
> Sorry for the long time to respond..
>>
>> paravirt_enabled conveys the idea that if this is set or if
>> paravirt_enabled() returns true you are in a paravirtualized
>> environment. This is not true, this is really only useful to
>> determine if you have a paravirtualization hypervisor that uses
>
> s/users//
>> supports legacy paravirtualized guests. At run time, this tells
>> us if we've booted into a Linux guest with support for legacy
>> paravirtualized guests.
>
> I know I suggested the name but when I look at the words
> 'paravirtualized legacy' immediately I thought about legacy PV
> drivers.. which we do not have in the code.

OK...

> The description:
>
>> + *   paravirtualized guests. Examples of features that
>> + *   characterize legacy paravirtualized guests are
>> + *   things such as the need for APM, PNP BIOS.
>
> Explains it very nicely.
>
> Perhaps (And sorry for that) it should be just called
>
> legacy_platform() ?
>
> Or
> ancient_platform() ?

No, we need to prefix this as its paravirt related, I've given this
silly little rename quite a bit of thought and have decided to stick
to paravirt_legacy() and just adding paravirt_legacy_free() as a
separate helper. The clarification can simply be placed through kdoc
form in documentation. More on this below.

> Since that would lead folks to immediately think of things
> that existed long time ago - such as APM or PNP BIOS.

Right.

> Other suggestions would be to piggyback on Microsoft pushing
> the option in the BIOS to have an "Legacy Free" option (PS/2,
> PnP, serial port, parallel port - all are disabled):
> https://en.wikipedia.org/wiki/Legacy-free_PC

Interesting, yes I like this as it has stronger semantics. I looked
further and it seems this concept of "legacy free" comes from the PC
system design guide [0] by both Microsoft and Intel. The latest of
such documents and the latest definitions of a PC 2001 contains the
concept of "legacy free" and defines it well on Page 50 [1] under the
"PC 2001 requirements". It notes:

----
The basic goal for the legacy-free system requirements is that the
operating system and devices do not use any of the following

  * ISA slots or devices
  * Legacy FDC
  *  PS/2, serial, parallel, and game ports

Revisions to the ACPI specification provide a mechanism that allows
the BIOS to report whether a system provides the services of these
components and interfaces. If the BIOS reports that the system is
legacy free, the system must meet the requirements provided in this
section.
----

[0] http://tech-insider.org/windows/research/2000/1102.html
[1] http://tech-insider.org/windows/research/acrobat/001102/03sys-2001.pdf

> Perhaps:
>
> legacy_free() ?

That's a negation of the question "is legacy?", so this would be a
separate call. Also this doesn't have a prefix.

Note that what is legacy on x86 today may be very legacy tomorrow and
later on there may be other notions of legacy... so sticking loosely
to at least the "PC 2001" definition seems to make sense here and
would limit scope and date the notion of "legacy" here for x86 and for
this paravirt hook. For now I don't see a need to include a date check
though as its a paravirt check anyway, but at least the definition
needs to limit the scope in its documentation for now.

This particular check is *if your hypervisor* supports legacy PC 2001
systems, so we do in the end need a prefix given the question as to
whether or not the kernel can determine you are on a legacy free
system or not is a separate possible question. The above definition
seems to indicate that ACPI should allow the BIOS to reoport if a
system is legacy free or not, so if such mechanisms exist, code that
currently has sprinkled paravirt_enabled() (we'll rename this soon)
could also have such checks for legacy free as well. Once, and if,
such system check is available perhaps we should prevent a legacy free
hypervisor from booting on a legacy system.

Likewise with such system check present if a PV legacy capable
hypervisor boots a legacy free system -- the code checks for
subsystems that are legacy would avoid running given that although
pavarit_supports_legacy() gives true, is_system_legacy() would give
false and avoid running.

Anyway, unless I hear of better suggestions I'm going to stick to
paravirt_legacy() and add paravirt_legacy_free(), and just expand on
the documentation to help clarify this is not for PV legacy drivers.

> Thank you!

 Luis

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 6/8] x86/init: use linker table for i386 early setup
  2016-01-20 21:41     ` Luis R. Rodriguez
@ 2016-02-11 19:55       ` Luis R. Rodriguez
  0 siblings, 0 replies; 60+ messages in thread
From: Luis R. Rodriguez @ 2016-02-11 19:55 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Boris Ostrovsky
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Rusty Russell, Andy Lutomirski, Michael Brown, Juergen Gross,
	Jan Beulich, Joerg Roedel, Andrey Ryabinin, long.wanglong,
	qiuxishi, aryabinin, Mauro Carvalho Chehab, Valentin Rothberg,
	Peter Senna Tschudin, X86 ML, xen-devel, linux-kernel

On Wed, Jan 20, 2016 at 1:41 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> On Wed, Jan 20, 2016 at 1:14 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
>> On Tue, Dec 15, 2015 at 02:16:35PM -0800, Luis R. Rodriguez wrote:
>>> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>>>
>>> This also annotates this is only used for PC and
>>> lguest hardware subarchitectures.
>>>
>>> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
>>
>> On Xen the code path (before this patch) was:
>> xen_start_kernel->i386_start_kernel->i386_default_early_setup
>>
>> and now you remove the i386_default_early_setup call?
>
> No you're right, this should include the Xen subarch as well, thanks for that.

FYI - I've mended this fix into my upcoming v2 series, I've also made
a separate patch to make annotate the XEN subarch on the PV guest boot
path (xen_start_kernel()). While at it, I'll go and also add an
extension of documentation to the subarch based on the conversations
on this thread, and make lguest explicitly use LGUEST subarch rather
than just the digit.

 Luis

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [RFC v1 4/8] x86/init: add linker table support
  2016-01-22  0:42               ` H. Peter Anvin
@ 2016-02-11 20:45                 ` Luis R. Rodriguez
  0 siblings, 0 replies; 60+ messages in thread
From: Luis R. Rodriguez @ 2016-02-11 20:45 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Takashi Iwai, Avi Kivity, Konrad Rzeszutek Wilk, Boris Ostrovsky,
	Roger Pau Monné,
	Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Rusty Russell, Andy Lutomirski, Michael Brown,
	Juergen Gross, Jan Beulich, Joerg Roedel, Andrey Ryabinin,
	long.wanglong, qiuxishi, aryabinin, Mauro Carvalho Chehab,
	Valentin Rothberg, Peter Senna Tschudin, X86 ML, xen-devel,
	linux-kernel

On Thu, Jan 21, 2016 at 4:42 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 01/21/16 16:25, Luis R. Rodriguez wrote:
>>>
>>> Basically, if the hardware is enumerable using standard PC mechanisms (PCI, ACPI) and doesn't need a special boot flow it should use type 0.
>>
>> I can extend the documentation as part of this to be clear.
>>
>> I will note though that this still leaves a gap on code which might
>> want to access the question "are we in a virt environment, and if so
>> on which one" in between really early boot and right before
>> init_hypervisor_platform(). Or rather, given subarch can be used by
>> Xen and lguest but not KVM it means KVM doesn't get to use it. It may
>> not need it, but its also rather trivial to set up on qemu, and I have
>> a patch for that if we wanted one for KVM. That would extend the
>> definition of subarch a bit more, but as it stands today its use was
>> rather limited. Heck, subharch_data is to this day unused.
>>
>
> KVM is not a subarch, and Xen HVM isn't either; the subarch was meant to
> be specifically to handle nonstandard boot entries; the CE4100 extension
> was itself kind of a hack.
>
> If you have a genuine need for a "hypervisor type" then that is a
> separate thing and should be treated separately from subarch.  However,
> you need to consider that some hypervisors can emulate other hypervisors
> and you may have more than one hypervisor API available.

Thanks, I've pegged this onto my series as a documentation extensions
for boot params.

 Luis

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2016-02-11 20:45 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-15 22:16 [RFC v1 0/8] x86/init: Linux linker tables Luis R. Rodriguez
2015-12-15 22:16 ` [RFC v1 1/8] paravirt: rename paravirt_enabled to paravirt_legacy Luis R. Rodriguez
     [not found]   ` <20160120193241.GA4769@char.us.oracle.com>
2016-02-04 22:50     ` Luis R. Rodriguez
2015-12-15 22:16 ` [RFC v1 2/8] tables.h: add linker table support Luis R. Rodriguez
     [not found]   ` <20160120200428.GB4769@char.us.oracle.com>
2016-01-20 23:15     ` Michael Brown
2016-01-20 23:24       ` H. Peter Anvin
2015-12-15 22:16 ` [RFC v1 3/8] x86/boot: add BIT() to boot/bitops.h Luis R. Rodriguez
     [not found]   ` <20160120201718.GC4769@char.us.oracle.com>
2016-01-20 20:33     ` Luis R. Rodriguez
2015-12-15 22:16 ` [RFC v1 4/8] x86/init: add linker table support Luis R. Rodriguez
     [not found]   ` <20160120210014.GF4769@char.us.oracle.com>
2016-01-20 21:33     ` Luis R. Rodriguez
2016-01-20 21:41       ` H. Peter Anvin
2016-01-20 22:12         ` Luis R. Rodriguez
2016-01-20 22:20           ` H. Peter Anvin
2016-01-22  0:25             ` Luis R. Rodriguez
2016-01-22  0:42               ` H. Peter Anvin
2016-02-11 20:45                 ` Luis R. Rodriguez
2016-01-21  8:38       ` Roger Pau Monné
2016-01-21 13:45         ` Boris Ostrovsky
2016-01-21 19:25           ` H. Peter Anvin
2016-01-21 19:46             ` Luis R. Rodriguez
2016-01-21 19:50               ` H. Peter Anvin
2016-01-21 19:52                 ` H. Peter Anvin
2016-01-22  0:19                   ` Luis R. Rodriguez
2016-01-21 20:05                 ` Luis R. Rodriguez
2016-01-21 21:36                   ` H. Peter Anvin
2015-12-15 22:16 ` [RFC v1 5/8] x86/init: move ebda reservations into linker table Luis R. Rodriguez
2015-12-17 20:48   ` Andy Lutomirski
2015-12-17 20:55     ` H. Peter Anvin
2015-12-17 20:57       ` Andy Lutomirski
2015-12-17 23:40         ` Luis R. Rodriguez
2015-12-15 22:16 ` [RFC v1 6/8] x86/init: use linker table for i386 early setup Luis R. Rodriguez
     [not found]   ` <20160120211424.GG4769@char.us.oracle.com>
2016-01-20 21:41     ` Luis R. Rodriguez
2016-02-11 19:55       ` Luis R. Rodriguez
2015-12-15 22:16 ` [RFC v1 7/8] x86/init: user linker table for ce4100 " Luis R. Rodriguez
2015-12-15 22:16 ` [RFC v1 8/8] x86/init: use linker table for mid " Luis R. Rodriguez
2015-12-15 22:59 ` [RFC v1 0/8] x86/init: Linux linker tables H. Peter Anvin
2015-12-17 20:38 ` H. Peter Anvin
2015-12-17 23:46   ` Luis R. Rodriguez
2015-12-17 23:58     ` Luis R. Rodriguez
2015-12-18  4:25     ` H. Peter Anvin
2015-12-18  4:40       ` H. Peter Anvin
2016-01-21 20:19         ` H. Peter Anvin
2016-01-21 20:33           ` Luis R. Rodriguez
2016-01-21 21:37             ` Konrad Rzeszutek Wilk
2016-01-21 22:25               ` [Xen-devel] " Luis R. Rodriguez
2016-01-21 23:56                 ` H. Peter Anvin
2016-01-22  0:28                   ` Luis R. Rodriguez
2016-01-22  8:15               ` Michael Brown
2016-01-22 13:44           ` Michael Matz
2016-01-22 19:06             ` H. Peter Anvin
2016-01-22 21:52               ` Luis R. Rodriguez
2016-02-03  0:22                 ` Luis R. Rodriguez
2016-02-03  0:25                   ` H. Peter Anvin
2016-02-03  0:28                     ` Luis R. Rodriguez
2016-02-03  0:48                       ` Luis R. Rodriguez
2016-02-02 23:48             ` H. Peter Anvin
2016-02-03  0:15               ` Luis R. Rodriguez
2015-12-18 18:50       ` Luis R. Rodriguez
2015-12-18 18:58         ` H. Peter Anvin
     [not found] ` <20160120211649.GJ4769@char.us.oracle.com>
2016-01-20 21:49   ` Luis R. Rodriguez

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).