xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [RFC v2 0/6] x86/init: use linker table
@ 2016-02-19 14:15 Luis R. Rodriguez
  2016-02-19 14:15 ` [RFC v2 1/6] x86/boot: add BIT() to boot/bitops.h Luis R. Rodriguez
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Luis R. Rodriguez @ 2016-02-19 14:15 UTC (permalink / raw)
  To: hpa, tglx, mingo, bp
  Cc: jgross, rusty, x86, linux-kernel, luto, Luis R. Rodriguez,
	xen-devel, david.vrabel, boris.ostrovsky, andriy.shevchenko,
	mcb30

This v2 is a re-work of the original linker table work [0],
I've split that original series out into 3:

  * paravirt_enabled() work [1]
  * linker table alone and proof of concepts [2]
  * this series - a new use for linker tables on x86 init

Since the crux of this work is the the linker tables,
I expect most feedback to come through there, but this
also had a few comments, which I've addressed in this
series.

The key here is we drop the direct use of the subarch and
instead make them an explicit requirement consideration
for early x86 boot code. Without this we have no ways but
code review to ensure mismatches will not happen. In
theory code review should catch this, in practice -- its
not happening. This takes a proactive approach to the
problem.

The full set of patches (from all related series) are also
available on linux-next [3] and Linus based tree [4].

[0] http://lkml.kernel.org/r/1450217797-19295-1-git-send-email-mcgrof@do-not-panic.com
[1] http://lkml.kernel.org/r/1455887316-9223-1-git-send-email-mcgrof@kernel.org
[2] http://lkml.kernel.org/r/1455889559-9428-1-git-send-email-mcgrof@kernel.org
[3] https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux-next.git/log/?h=20160219-linker-table-v2
[4] https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux.git/log/?h=20160219-linker-table-v2

Luis R. Rodriguez (6):
  x86/boot: add BIT() to boot/bitops.h
  x86/init: use linker tables to simplify x86 init and annotate
    dependencies
  x86/init: move ebda reservations into linker table
  x86/init: use linker table for i386 early setup
  x86/init: user linker table for ce4100 early setup
  x86/init: use linker table for mid early setup

 arch/x86/Kconfig.debug                  |  47 ++++++
 arch/x86/Makefile                       |   1 -
 arch/x86/boot/bitops.h                  |   2 +
 arch/x86/boot/boot.h                    |   2 +-
 arch/x86/include/asm/bios_ebda.h        |   2 -
 arch/x86/include/asm/setup.h            |  12 --
 arch/x86/include/asm/x86_init.h         |   1 +
 arch/x86/include/asm/x86_init_fn.h      | 263 ++++++++++++++++++++++++++++++++
 arch/x86/kernel/Makefile                |   5 +-
 arch/x86/kernel/dbg-tables/Makefile     |  18 +++
 arch/x86/kernel/dbg-tables/alpha.c      |  10 ++
 arch/x86/kernel/dbg-tables/beta.c       |  18 +++
 arch/x86/kernel/dbg-tables/delta.c      |  10 ++
 arch/x86/kernel/dbg-tables/gamma.c      |  18 +++
 arch/x86/kernel/dbg-tables/gamma.h      |   3 +
 arch/x86/kernel/{head.c => ebda.c}      |   6 +-
 arch/x86/kernel/head32.c                |  23 +--
 arch/x86/kernel/head64.c                |  12 +-
 arch/x86/kernel/init.c                  |  55 +++++++
 arch/x86/kernel/sort-init.c             | 114 ++++++++++++++
 arch/x86/platform/ce4100/ce4100.c       |   4 +-
 arch/x86/platform/intel-mid/intel-mid.c |   4 +-
 22 files changed, 584 insertions(+), 46 deletions(-)
 create mode 100644 arch/x86/include/asm/x86_init_fn.h
 create mode 100644 arch/x86/kernel/dbg-tables/Makefile
 create mode 100644 arch/x86/kernel/dbg-tables/alpha.c
 create mode 100644 arch/x86/kernel/dbg-tables/beta.c
 create mode 100644 arch/x86/kernel/dbg-tables/delta.c
 create mode 100644 arch/x86/kernel/dbg-tables/gamma.c
 create mode 100644 arch/x86/kernel/dbg-tables/gamma.h
 rename arch/x86/kernel/{head.c => ebda.c} (94%)
 create mode 100644 arch/x86/kernel/init.c
 create mode 100644 arch/x86/kernel/sort-init.c

-- 
2.7.0

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC v2 1/6] x86/boot: add BIT() to boot/bitops.h
  2016-02-19 14:15 [RFC v2 0/6] x86/init: use linker table Luis R. Rodriguez
@ 2016-02-19 14:15 ` Luis R. Rodriguez
  2016-02-19 15:14   ` Boris Ostrovsky
  2016-02-19 14:15 ` [RFC v2 2/6] x86/init: use linker tables to simplify x86 init and annotate dependencies Luis R. Rodriguez
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Luis R. Rodriguez @ 2016-02-19 14:15 UTC (permalink / raw)
  To: hpa, tglx, mingo, bp
  Cc: jgross, rusty, x86, linux-kernel, luto, Luis R. Rodriguez,
	xen-devel, david.vrabel, boris.ostrovsky, andriy.shevchenko,
	mcb30

The boot/bitops.h has guards against including the
regular bitops (include/asm-generic/bitops.h), it only
implements what we need at early boot. We'll be making
use of BIT() later so add it.

Users of boot/boot.h must include it prior to asm/setup.h
otherwise the guard protection devised against the regular
linux/bitops.h will not take effect.

v2: spelling fixes, and language descriptipon enhancements
    by Konrad.

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
---
 arch/x86/boot/bitops.h | 2 ++
 arch/x86/boot/boot.h   | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/bitops.h b/arch/x86/boot/bitops.h
index 878e4b9940d9..232cff0ff4e3 100644
--- a/arch/x86/boot/bitops.h
+++ b/arch/x86/boot/bitops.h
@@ -40,4 +40,6 @@ static inline void set_bit(int nr, void *addr)
 	asm("btsl %1,%0" : "+m" (*(u32 *)addr) : "Ir" (nr));
 }
 
+#define BIT(x)	(1 << x)
+
 #endif /* BOOT_BITOPS_H */
diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h
index 9011a88353de..4fb53da1f48a 100644
--- a/arch/x86/boot/boot.h
+++ b/arch/x86/boot/boot.h
@@ -23,8 +23,8 @@
 #include <stdarg.h>
 #include <linux/types.h>
 #include <linux/edd.h>
-#include <asm/setup.h>
 #include "bitops.h"
+#include <asm/setup.h>
 #include "ctype.h"
 #include "cpuflags.h"
 
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC v2 2/6] x86/init: use linker tables to simplify x86 init and annotate dependencies
  2016-02-19 14:15 [RFC v2 0/6] x86/init: use linker table Luis R. Rodriguez
  2016-02-19 14:15 ` [RFC v2 1/6] x86/boot: add BIT() to boot/bitops.h Luis R. Rodriguez
@ 2016-02-19 14:15 ` Luis R. Rodriguez
  2016-02-19 16:40   ` Andy Lutomirski
  2016-02-19 14:15 ` [RFC v2 3/6] x86/init: move ebda reservations into linker table Luis R. Rodriguez
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Luis R. Rodriguez @ 2016-02-19 14:15 UTC (permalink / raw)
  To: hpa, tglx, mingo, bp
  Cc: x86, linux-kernel, luto, boris.ostrovsky, rusty, david.vrabel,
	konrad.wilk, mcb30, jgross, andriy.shevchenko, xen-devel,
	Luis R. Rodriguez

Any failure on the x86 init path can be catastrophic.
A simple shift of a call from one place to another can
easily break things. Likewise adding a new call to
one path without considering all x86 requirements
can make certain x86 run time environments crash.
We currently account for these requirements through
peer code review and run time testing. We could do
much better if we had a clean and simple way to annotate
strong semantics for run time requirements, init sequence
dependencies, and detection mechanisms for additions of
new x86 init sequences.

This adds support for linker tables for x86 in order
to help define strong semantics for x86 init sequences
and as collateral help simplify the x86 init sequence.
The defined struct x86_init_fn is inspired by both
iPXE's sample definitions but is also also heavily
inspired by Linux's own IOMMU initialiation solution
which enables to extend initialization semantics to
support init routine custom detection routines, and
enables adding annotations for init routine dependency
mapping. A newly featured solution in this design
is to build up on the hardware subarchitecture
added to the x86 boot protocol 2.07 in 2007 by Rusty
but never really taken advantage of except for lguest,
using it as a stop-gap for new x86 features which
have failed to take into consideration the dual x86-64
entry points possible due to paravirtualization yielding
requirements. The hardware subarchitecture could also
potentially be used in the future to help unify Linux x86
entry points. The current disjoint entry points for x86-64 can
be summarized as follows.

Bare metal, KVM, Xen HVM                      Xen PV / dom0
      startup_64()                             startup_xen()
             \                                     /
     x86_64_start_kernel()                 xen_start_kernel()
                          \               /
                     x86_64_start_reservations()
                                  |
                             start_kernel()
                             [   ...        ]
                             [ setup_arch() ]
                             [   ...        ]
                                 init

Inherited by the nature of using linker tables we also
gain the ability to sort init sequences using the linker
through the use of order-level and linker SORT() and only
optionally enable use specific init sequence sorting when
init seqeuence dependencies semantics are needed. By using
linker tables we can also avoid #ifdefer'y on select code
when and if desirable, this is also completley optional but
it is a feature we inherit by using linker tables.

For *new x86 features* this enables strong semantics to be
considered. For *existing x86 feature code* this enables the
opportunity to clarify requirements and dependencies,
which ultimately also provides a new way to ensure that code
that should not run never runs, that is, it provides one
mechanism to help prevent dead code. For a more elaborate
description on what dead code is exactly and how it can relate
to init sequences in particular to Xen due to its init sequence
refer to [0] and [1].

Four debug x86 features are included to help test x86 linker
table support. You'll want CONFIG_X86_DEBUG_LINKER_TABLES to
enable those. The features defined with table-$(CONFIG-x)
will *always* be compiled but only linked in when enabled.
The features defined with the good 'ol obj-y will only be
compiled as we're used to. If CONFIG_X86_DEBUG_LINKER_TABLES is
disabled nothing currently runs as there are no init sequences
yet using any of this.

The goal is to evolve the semantics carefully as needed. We start
with one basic callback, an early_init() called at the beginning
of x86_64_start_reservations().

v2:

- port to new linker tables as discussed
- since the start and end of a table are now the empty string and
  "~" string, we can use any digit range for our order levels.
  Because of this bump the order level ranges for x86 from 01.99
  to 0000-9999. We can arbitrarilly change this later, but this also
  gives us a lot of leg room for easy adjustments later.
- since we are now using the basic .init section we have coverge
  support for modpost to check section mismatch issues for us! This
  also means we had to peg __ref on a few of our callers which used
  .init section code.
- Since we're using the standard .init section we can drop now our
  custom modification of both arch/x86/tools/relocs.c and
  arch/x86/kernel/vmlinux.lds.S ! To be clear we don't need any
  further linker script hacks now when making use of linker tables.

[0] http://www.do-not-panic.com/2015/12/avoiding-dead-code-pvops-not-silver-bullet.html
[1] http://www.do-not-panic.com/2015/12/xen-and-x86-linux-zero-page.html

Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
---
 arch/x86/Kconfig.debug              |  47 +++++++
 arch/x86/include/asm/x86_init.h     |   1 +
 arch/x86/include/asm/x86_init_fn.h  | 263 ++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/Makefile            |   2 +
 arch/x86/kernel/dbg-tables/Makefile |  18 +++
 arch/x86/kernel/dbg-tables/alpha.c  |  10 ++
 arch/x86/kernel/dbg-tables/beta.c   |  18 +++
 arch/x86/kernel/dbg-tables/delta.c  |  10 ++
 arch/x86/kernel/dbg-tables/gamma.c  |  18 +++
 arch/x86/kernel/dbg-tables/gamma.h  |   3 +
 arch/x86/kernel/head32.c            |   4 +
 arch/x86/kernel/head64.c            |   4 +
 arch/x86/kernel/init.c              |  55 ++++++++
 arch/x86/kernel/sort-init.c         | 114 ++++++++++++++++
 14 files changed, 567 insertions(+)
 create mode 100644 arch/x86/include/asm/x86_init_fn.h
 create mode 100644 arch/x86/kernel/dbg-tables/Makefile
 create mode 100644 arch/x86/kernel/dbg-tables/alpha.c
 create mode 100644 arch/x86/kernel/dbg-tables/beta.c
 create mode 100644 arch/x86/kernel/dbg-tables/delta.c
 create mode 100644 arch/x86/kernel/dbg-tables/gamma.c
 create mode 100644 arch/x86/kernel/dbg-tables/gamma.h
 create mode 100644 arch/x86/kernel/init.c
 create mode 100644 arch/x86/kernel/sort-init.c

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 9b18ed97a8a2..af5582d33dd8 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -383,4 +383,51 @@ config PUNIT_ATOM_DEBUG
 	  The current power state can be read from
 	  /sys/kernel/debug/punit_atom/dev_power_state
 
+config X86_DEBUG_LINKER_TABLES
+	bool "x86 linker table debug"
+	depends on DEBUG_KERNEL
+	default n
+	---help---
+	  Enabling this should enable debugging linker tables on x86
+	  with its own solution. To help debug it enables the "x86 beta"
+	  and "x86 gamma" built-in feature. Both "x86 alpha" and "x86 delta"
+	  built-in features are left disabled. You should see all tables
+	  compiled, except delta, only the beta and gamma feature will be
+	  linked in. The delta feature will only be compiled and linked in
+	  if and only if its enabled the old way.
+
+	  For more details on these linker table refer to:
+
+	  include/linux/tables.h
+
+	  If unsure, say N.
+
+config X86_DEBUG_LINKER_TABLE_ALPHA
+	bool "x86 linker table alpha"
+	default n
+	depends on X86_DEBUG_LINKER_TABLES
+	---help---
+	  Enabling this should enable the linker table "x86 alpha" feature.
+
+config X86_DEBUG_LINKER_TABLE_BETA
+	bool "x86 linker table beta"
+	default y
+	depends on X86_DEBUG_LINKER_TABLES
+	---help---
+	  Enabling this should enable the linker table "x86 beta" feature.
+
+config X86_DEBUG_LINKER_TABLE_GAMMA
+	bool "x86 linker table gamma"
+	default y
+	depends on X86_DEBUG_LINKER_TABLES
+	---help---
+	  Enabling this should enable the linker table "x86 gamma" feature.
+
+config X86_DEBUG_LINKER_TABLE_DELTA
+	bool "x86 linker table delta"
+	default n
+	depends on X86_DEBUG_LINKER_TABLES
+	---help---
+	  Enabling this should enable the linker table "x86 delta" feature.
+
 endmenu
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 1ae89a2721d6..0df68c814147 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -2,6 +2,7 @@
 #define _ASM_X86_PLATFORM_H
 
 #include <asm/bootparam.h>
+#include <asm/x86_init_fn.h>
 
 struct mpc_bus;
 struct mpc_cpu;
diff --git a/arch/x86/include/asm/x86_init_fn.h b/arch/x86/include/asm/x86_init_fn.h
new file mode 100644
index 000000000000..1521a546c734
--- /dev/null
+++ b/arch/x86/include/asm/x86_init_fn.h
@@ -0,0 +1,263 @@
+#ifndef __X86_INIT_TABLES_H
+#define __X86_INIT_TABLES_H
+
+#include <linux/types.h>
+#include <linux/tables.h>
+#include <linux/init.h>
+#include <linux/bitops.h>
+
+/**
+ * struct x86_init_fn - x86 generic kernel init call
+ *
+ * Linux x86 features vary in complexity, features may require work done at
+ * different levels of the full x86 init sequence. Today there are also two
+ * different possible entry points for Linux on x86, one for bare metal, KVM
+ * and Xen HVM, and another for Xen PV guests / dom0.  Assuming a bootloader
+ * has set up 64-bit mode, roughly the x86 init sequence follows this path:
+ *
+ * Bare metal, KVM, Xen HVM                      Xen PV / dom0
+ *       startup_64()                             startup_xen()
+ *              \                                     /
+ *      x86_64_start_kernel()                 xen_start_kernel()
+ *                           \               /
+ *                      x86_64_start_reservations()
+ *                                   |
+ *                              start_kernel()
+ *                              [   ...        ]
+ *                              [ setup_arch() ]
+ *                              [   ...        ]
+ *                                  init
+ *
+ * x86_64_start_kernel() and xen_start_kernel() are the respective first C code
+ * entry starting points. The different entry points exist to enable Xen to
+ * skip a lot of hardware setup already done and managed on behalf of the
+ * hypervisor, we refer to this as "paravirtualization yielding". The different
+ * levels of init calls on the x86 init sequence exist to account for these
+ * slight differences and requirements. These different entry points also share
+ * a common entry x86 specific path, x86_64_start_reservations().
+ *
+ * A generic x86 feature can have different initialization calls, one on each
+ * of the different main x86 init sequences, but must also address both entry
+ * points in order to work properly across the board on all supported x86
+ * subarchitectures. Since x86 features can also have dependencies on other
+ * setup code or features, x86 features can at times be subordinate to other
+ * x86 features, or conditions. struct x86_init_fn enables feature developers
+ * to annotate dependency relationships to ensure subsequent init calls only
+ * run once a subordinate's dependencies have run. When needed custom
+ * dependency requirements can also be spelled out through a custom dependency
+ * checker. In order to account for the dual entry point nature of x86-64 Linux
+ * for "paravirtualization yielding" and to make annotations for support for
+ * these explicit each struct x86_init_fn must specify supported
+ * subarchitectures. The earliest x86-64 code can read the subarchitecture
+ * though is after load_idt(), as such the earliest we can currently rely on
+ * subarchitecture for semantics and a common init sequences is on the shared
+ * common x86_64_start_reservations().  Each struct x86_init_fn must also
+ * declare a two-digit decimal number to impose an ordering relative to other
+ * features when required.
+ *
+ * x86_init_fn enables strong semantics and dependencies to be defined and
+ * implemented on the full x86 initialization sequence.
+ *
+ * @order_level: must be set, linker order level, this corresponds to the table
+ * 	section sub-table index, we record this only for semantic validation
+ * 	purposes.  Order-level is always required however you typically would
+ * 	only use X86_INIT_NORMAL*() and leave ordering to be done by placement
+ * 	of code in a C file and the order of objects through a Makefile. Custom
+ * 	order-levels can be used when order on C file and order of objects on
+ * 	Makfiles does not suffice or much further refinements are needed.
+ * @supp_hardware_subarch: must be set, it represents the bitmask of supported
+ *	subarchitectures.  We require each struct x86_init_fn to have this set
+ *	to require developer considerations for each supported x86
+ *	subarchitecture and to build strong annotations of different possible
+ *	run time states particularly in consideration for the two main
+ *	different entry points for x86 Linux, to account for paravirtualization
+ *	yielding.
+ *
+ *	The subarchitecture is read by the kernel at early boot from the
+ *	struct boot_params hardware_subarch. Support for the subarchitecture
+ *	exists as of x86 boot protocol 2.07. The bootloader would have set up
+ *	the respective hardware_subarch on the boot sector as per
+ *	Documentation/x86/boot.txt.
+ *
+ *	What x86 entry point is used is determined at run time by the
+ *	bootloader. Linux pv_ops was designed to help enable to build one Linux
+ *	binary to support bare metal and different hypervisors.  pv_ops setup
+ *	code however is limited in that all pv_ops setup code is run late in
+ *	the x86 init sequence, during setup_arch(). In fact cpu_has_hypervisor
+ *	only works after early_cpu_init() during setup_arch(). If an x86
+ *	feature requires an earlier determination of what hypervisor was used,
+ *	or if it needs to annotate only support for certain hypervisors, the
+ *	x86 hardware_subarch should be set by the bootloader and
+ *	@supp_hardware_subarch set by the x86 feature. Using hardware_subarch
+ *	enables x86 features to fill the semantic gap between the Linux x86
+ *	entry point used and what pv_ops has to offer through a hypervisor
+ *	agnostic mechanism.
+ *
+ *	Each supported subarchitecture is set using the respective
+ *	X86_SUBARCH_* as a bit in the bitmask. For instance if a feature
+ *	is supported on PC and Xen subarchitectures only you would set this
+ *	bitmask to:
+ *
+ *		BIT(X86_SUBARCH_PC) |
+ *		BIT(X86_SUBARCH_XEN);
+ *
+ * @detect: optional, if set returns true if the feature has been detected to
+ *	be required, it returns false if the feature has been detected to not
+ *	be required.
+ * @depend: optional, if set this set of init routines must be called prior to
+ * 	the init routine who's respective detect routine we have set this
+ * 	depends callback to. This is only used for sorting purposes given
+ * 	all current init callbacks have a void return type. Sorting is
+ * 	implemented via x86_init_fn_sort(), it must be called only once,
+ * 	however you can delay sorting until you need it if you can ensure
+ * 	only @order_level and @supp_hardware_subarch can account for proper
+ * 	ordering and dependency requirements for all init sequences prior.
+ *	If you do not have a depend callback set its assumed the order level
+ *	(__x86_init_fn(level)) set by the init routine suffices to set the
+ *	order for when the feature's respective callbacks are called with
+ *	respect to other calls. Sorting of init calls with the same order level
+ *	is determined by linker order, determined by order placement on C code
+ *	and order listed on a Makefile. A routine that depends on another is
+ *	known as being subordinate to the init routine it depends on. Routines
+ *	that are subordinate must have an order-level of lower priority or
+ *	equal priority than the order-level of the init sequence it depends on.
+ * @early_init: required, routine which will run in x86_64_start_reservations()
+ *	after we ensure boot_params.hdr.hardware_subarch is accessible and
+ *	properly set. Memory is not yet available. This the earliest we can
+ *	currently define a common shared callback since all callbacks need to
+ *	check for boot_params.hdr.hardware_subarch and this becomes accessible
+ *	on x86-64 until after load_idt().
+ * @flags: optional, bitmask of enum x86_init_fn_flags
+ */
+struct x86_init_fn {
+	__u32 order_level;
+	__u32 supp_hardware_subarch;
+	bool (*detect)(void);
+	bool (*depend)(void);
+	void (*early_init)(void);
+	__u32 flags;
+};
+
+/**
+ * enum x86_init_fn_flags: flags for init sequences
+ *
+ * X86_INIT_FINISH_IF_DETECTED: tells the core that once this init sequence
+ *	has completed it can break out of the loop for init sequences on
+ *	its own level.
+ * X86_INIT_DETECTED: private flag. Used by the x86 core to annotate that this
+ * 	init sequence has been detected and it all of its callbacks
+ * 	must be run during initialization.
+ */
+enum x86_init_fn_flags {
+	X86_INIT_FINISH_IF_DETECTED = BIT(0),
+	X86_INIT_DETECTED = BIT(1),
+};
+
+DECLARE_LINKTABLE_INIT(struct x86_init_fn, x86_init_fns);
+
+/* Init order levels, we can start at 0000 but reserve 0000-0999 for now */
+#define X86_INIT_ORDER_EARLY	1000
+#define X86_INIT_ORDER_NORMAL	3000
+#define X86_INIT_ORDER_LATE	5000
+
+/*
+ * Use LTO_REFERENCE_INITCALL just in case of issues with old versions of gcc.
+ * This might not be needed for linker tables due to how we compartamentalize
+ * sections and then order them at linker time, but just in case.
+ */
+
+#define x86_init(__level,						\
+		 __supp_hardware_subarch,				\
+		 __detect,						\
+		 __depend,						\
+		 __early_init)						\
+	static LINKTABLE_INIT(x86_init_fns, __level)			\
+	__x86_init_fn_##__early_init2 = {				\
+		.order_level = __level,					\
+		.supp_hardware_subarch = __supp_hardware_subarch,	\
+		.detect = __detect,					\
+		.depend = __depend,					\
+		.early_init = __early_init,				\
+	};								\
+	LTO_REFERENCE_INITCALL(__x86_init_fn_##__early_init);
+
+#define x86_init_early(__supp_hardware_subarch,			\
+		       __detect,					\
+		       __depend,					\
+		       __early_init)					\
+	x86_init(X86_INIT_ORDER_EARLY, __supp_hardware_subarch,	\
+		 __detect, __depend,					\
+		 __early_init);
+
+#define x86_init_normal(__supp_hardware_subarch,			\
+		       __detect,					\
+		       __depend,					\
+		       __early_init)					\
+	x86_init(__name, X86_INIT_ORDER_NORMAL, __supp_hardware_subarch,\
+		 __detect, __depend,					\
+		 __early_init);
+
+#define x86_init_early_all(__detect,					\
+			   __depend,					\
+			   __early_init)				\
+	x86_init_early(__name, X86_SUBARCH_ALL_SUBARCHS,		\
+		       __detect, __depend,				\
+		       __early_init);
+
+#define x86_init_early_pc(__detect,					\
+			  __depend,					\
+			  __early_init)					\
+	x86_init_early(BIT(X86_SUBARCH_PC),				\
+		       __detect, __depend,				\
+		       __early_init);
+
+#define x86_init_early_pc_simple(__early_init)				\
+	x86_init_early((BIT(X86_SUBARCH_PC)), NULL, NULL,		\
+		        __early_init);
+
+#define x86_init_normal_all(__detect,					\
+			    __depend,					\
+			    __early_init)				\
+	x86_init_normal(X86_SUBARCH_ALL_SUBARCHS,			\
+		        __detect, __depend,				\
+		        __early_init);
+
+#define x86_init_normal_pc(__detect,					\
+			   __depend,					\
+			   __early_init)				\
+	x86_init_normal((BIT(X86_SUBARCH_PC)),				\
+		        __detect, __depend,				\
+		        __early_init);
+
+
+#define x86_init_normal_xen(__detect,					\
+			    __depend,					\
+			    __early_init)				\
+	x86_init_normal((BIT(X86_SUBARCH_XEN)),				\
+		        __detect, __depend,				\
+		        __early_init);
+
+/**
+ * x86_init_fn_early_init: call all early_init() callbacks
+ *
+ * This calls all early_init() callbacks on the x86_init_fns linker table.
+ */
+void x86_init_fn_early_init(void);
+
+/**
+ * x86_init_fn_init_tables - sort and check x86 linker table
+ *
+ * This sorts struct x86_init_fn init sequences in the x86_init_fns linker
+ * table by ensuring that init sequences that depend on other init sequences
+ * are placed later in the linker table. Init sequences that do not have
+ * dependencies are left in place. Circular dependencies are not allowed.
+ * The order-level of subordinate init sequences, that is of init sequences
+ * that depend on other init sequences, must have an order-level of lower
+ * or equal priority to the init sequence it depends on.
+ *
+ * This also validates semantics of all struct x86_init_fn init sequences
+ * on the x86_init_fns linker table.
+ */
+void x86_init_fn_init_tables(void);
+
+#endif /* __X86_INIT_TABLES_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index b1b78ffe01d0..be167a0a5e2c 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -23,6 +23,8 @@ KASAN_SANITIZE_dumpstack_$(BITS).o := n
 CFLAGS_irq.o := -I$(src)/../include/asm/trace
 
 obj-y			:= process_$(BITS).o signal.o
+obj-y			+= init.o sort-init.o
+obj-$(CONFIG_X86_DEBUG_LINKER_TABLES) += dbg-tables/
 obj-$(CONFIG_COMPAT)	+= signal_compat.o
 obj-y			+= traps.o irq.o irq_$(BITS).o dumpstack_$(BITS).o
 obj-y			+= time.o ioport.o dumpstack.o nmi.o
diff --git a/arch/x86/kernel/dbg-tables/Makefile b/arch/x86/kernel/dbg-tables/Makefile
new file mode 100644
index 000000000000..02d12c502ad0
--- /dev/null
+++ b/arch/x86/kernel/dbg-tables/Makefile
@@ -0,0 +1,18 @@
+# You should see all these compile but at run time you'd
+# only see the ones that were linked in.
+
+# This ensures we always compile but only link
+# in alpha if it was enabled. This is typically what
+# you are expected to use to avoid code bit-rot.
+table-$(CONFIG_X86_DEBUG_LINKER_TABLE_ALPHA)	+= alpha.o
+
+# This accomplishes the same, but requires 2 lines.
+extra-y						+= beta.o
+obj-$(CONFIG_X86_DEBUG_LINKER_TABLE_BETA)	+= beta.o
+
+table-$(CONFIG_X86_DEBUG_LINKER_TABLE_GAMMA)	+= gamma.o
+
+# If you *know* you only want to enable compilation of
+# a feature when its selected you can just use the good
+# ol' obj
+obj-$(CONFIG_X86_DEBUG_LINKER_TABLE_DELTA)	+= delta.o
diff --git a/arch/x86/kernel/dbg-tables/alpha.c b/arch/x86/kernel/dbg-tables/alpha.c
new file mode 100644
index 000000000000..54754f893a08
--- /dev/null
+++ b/arch/x86/kernel/dbg-tables/alpha.c
@@ -0,0 +1,10 @@
+#define pr_fmt(fmt) "debug-alpha: " fmt
+
+#include <linux/kernel.h>
+#include <asm/x86_init.h>
+
+static void early_init_dbg_alpha(void) {
+	pr_info("early_init triggered\n");
+}
+
+x86_init_early_pc_simple(early_init_dbg_alpha);
diff --git a/arch/x86/kernel/dbg-tables/beta.c b/arch/x86/kernel/dbg-tables/beta.c
new file mode 100644
index 000000000000..7384a57fc386
--- /dev/null
+++ b/arch/x86/kernel/dbg-tables/beta.c
@@ -0,0 +1,18 @@
+#define pr_fmt(fmt) "debug-beta: " fmt
+
+#include <linux/kernel.h>
+#include <asm/x86_init.h>
+
+#include "gamma.h"
+
+static bool x86_dbg_detect_beta(void)
+{
+	return true;
+}
+
+static void early_init_dbg_beta(void) {
+	pr_info("early_init triggered\n");
+}
+x86_init_early_pc(x86_dbg_detect_beta,
+		  x86_dbg_detect_gamma,
+		  early_init_dbg_beta);
diff --git a/arch/x86/kernel/dbg-tables/delta.c b/arch/x86/kernel/dbg-tables/delta.c
new file mode 100644
index 000000000000..9d38c68e602a
--- /dev/null
+++ b/arch/x86/kernel/dbg-tables/delta.c
@@ -0,0 +1,10 @@
+#define pr_fmt(fmt) "debug-delta: " fmt
+
+#include <linux/kernel.h>
+#include <asm/x86_init.h>
+
+static void early_init_dbg_delta(void) {
+	pr_info("early_init triggered\n");
+}
+
+x86_init_early_pc_simple(early_init_dbg_delta);
diff --git a/arch/x86/kernel/dbg-tables/gamma.c b/arch/x86/kernel/dbg-tables/gamma.c
new file mode 100644
index 000000000000..7b663c1f08f4
--- /dev/null
+++ b/arch/x86/kernel/dbg-tables/gamma.c
@@ -0,0 +1,18 @@
+#define pr_fmt(fmt) "debug-gamma: " fmt
+
+#include <linux/kernel.h>
+#include <asm/x86_init.h>
+
+bool x86_dbg_detect_gamma(void)
+{
+	return true;
+}
+
+static void early_init_dbg_gamma(void)
+{
+	pr_info("early_init triggered\n");
+}
+
+x86_init_early_pc(x86_dbg_detect_gamma,
+		  NULL,
+		  early_init_dbg_gamma);
diff --git a/arch/x86/kernel/dbg-tables/gamma.h b/arch/x86/kernel/dbg-tables/gamma.h
new file mode 100644
index 000000000000..810d450ddd14
--- /dev/null
+++ b/arch/x86/kernel/dbg-tables/gamma.h
@@ -0,0 +1,3 @@
+#include <asm/x86_init.h>
+
+bool x86_dbg_detect_gamma(void);
diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index 2911ef3a9f1c..d93f3e42e61b 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -19,6 +19,7 @@
 #include <asm/bios_ebda.h>
 #include <asm/tlbflush.h>
 #include <asm/bootparam_utils.h>
+#include <asm/x86_init.h>
 
 static void __init i386_default_early_setup(void)
 {
@@ -47,5 +48,8 @@ asmlinkage __visible void __init i386_start_kernel(void)
 		break;
 	}
 
+	x86_init_fn_init_tables();
+	x86_init_fn_early_init();
+
 	start_kernel();
 }
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 2c0f3407bd1f..a3b56aecaeda 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -28,6 +28,7 @@
 #include <asm/bootparam_utils.h>
 #include <asm/microcode.h>
 #include <asm/kasan.h>
+#include <asm/x86_init.h>
 
 /*
  * Manage page tables very early on.
@@ -190,6 +191,9 @@ void __init x86_64_start_reservations(char *real_mode_data)
 	if (!boot_params.hdr.version)
 		copy_bootdata(__va(real_mode_data));
 
+	x86_init_fn_init_tables();
+	x86_init_fn_early_init();
+
 	reserve_ebda_region();
 
 	switch (boot_params.hdr.hardware_subarch) {
diff --git a/arch/x86/kernel/init.c b/arch/x86/kernel/init.c
new file mode 100644
index 000000000000..b012c59f11dc
--- /dev/null
+++ b/arch/x86/kernel/init.c
@@ -0,0 +1,55 @@
+#define pr_fmt(fmt) "x86-init: " fmt
+
+#include <linux/bug.h>
+#include <linux/kernel.h>
+
+#include <asm/x86_init_fn.h>
+#include <asm/bootparam.h>
+#include <asm/boot.h>
+#include <asm/setup.h>
+
+DEFINE_LINKTABLE_INIT(struct x86_init_fn, x86_init_fns);
+
+static bool x86_init_fn_supports_subarch(struct x86_init_fn *fn)
+{
+	if (!fn->supp_hardware_subarch) {
+		pr_err("Init sequence fails to declares any supported subarchs: %pF\n", fn->early_init);
+		WARN_ON(1);
+	}
+	if (BIT(boot_params.hdr.hardware_subarch) & fn->supp_hardware_subarch)
+		return true;
+	return false;
+}
+
+void __ref x86_init_fn_early_init(void)
+{
+	int ret;
+	struct x86_init_fn *init_fn;
+	unsigned int num_inits = LINKTABLE_SIZE(x86_init_fns);
+
+	if (!num_inits)
+		return;
+
+	pr_debug("Number of init entries: %d\n", num_inits);
+
+	LINKTABLE_FOR_EACH(init_fn, x86_init_fns) {
+		if (!x86_init_fn_supports_subarch(init_fn))
+			continue;
+		if (!init_fn->detect)
+			init_fn->flags |= X86_INIT_DETECTED;
+		else {
+			ret = init_fn->detect();
+			if (ret > 0)
+				init_fn->flags |= X86_INIT_DETECTED;
+		}
+
+		if (init_fn->flags & X86_INIT_DETECTED) {
+			init_fn->flags |= X86_INIT_DETECTED;
+			pr_debug("Running early init %pF ...\n", init_fn->early_init);
+			init_fn->early_init();
+			pr_debug("Completed early init %pF\n", init_fn->early_init);
+			if (init_fn->flags & X86_INIT_FINISH_IF_DETECTED)
+				break;
+		}
+	}
+}
diff --git a/arch/x86/kernel/sort-init.c b/arch/x86/kernel/sort-init.c
new file mode 100644
index 000000000000..c03669f6f9d6
--- /dev/null
+++ b/arch/x86/kernel/sort-init.c
@@ -0,0 +1,114 @@
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <asm/x86_init_fn.h>
+
+static struct x86_init_fn *x86_init_fn_find_dep(struct x86_init_fn *start,
+						struct x86_init_fn *finish,
+						struct x86_init_fn *q)
+{
+	struct x86_init_fn *p;
+
+	if (!q)
+		return NULL;
+
+	for (p = start; p < finish; p++)
+		if (p->detect == q->depend)
+			return p;
+
+	return NULL;
+}
+
+static void x86_init_fn_sort(struct x86_init_fn *start,
+			     struct x86_init_fn *finish)
+{
+
+	struct x86_init_fn *p, *q, tmp;
+
+	for (p = start; p < finish; p++) {
+again:
+		q = x86_init_fn_find_dep(start, finish, p);
+		/*
+		 * We are bit sneaky here. We use the memory address to figure
+		 * out if the node we depend on is past our point, if so, swap.
+		 */
+		if (q > p) {
+			tmp = *p;
+			memmove(p, q, sizeof(*p));
+			*q = tmp;
+			goto again;
+		}
+	}
+
+}
+
+static void x86_init_fn_check(struct x86_init_fn *start,
+			      struct x86_init_fn *finish)
+{
+	struct x86_init_fn *p, *q, *x;
+
+	/* Simple cyclic dependency checker. */
+	for (p = start; p < finish; p++) {
+		if (!p->depend)
+			continue;
+		q = x86_init_fn_find_dep(start, finish, p);
+		x = x86_init_fn_find_dep(start, finish, q);
+		if (p == x) {
+			pr_info("CYCLIC DEPENDENCY FOUND! %pF depends on %pF and vice-versa. BREAKING IT.\n",
+				p->early_init, q->early_init);
+			/* Heavy handed way..*/
+			x->depend = 0;
+		}
+	}
+
+	/*
+	 * Validate sorting semantics.
+	 *
+	 * p depends on q so:
+	 * 	- q must run first, so q < p. If q > p that's an issue
+	 * 	  as its saying p must run prior to q. We already sorted
+	 * 	  this table, this is a problem.
+	 *
+	 * 	- q's order level must be <= than p's as it should run first
+	 */
+	for (p = start; p < finish; p++) {
+		if (!p->depend)
+			continue;
+		/*
+		 * Be pedantic and do a full search on the entire table,
+		 * if we need further validation, after this is called
+		 * one could use an optimized version which just searches
+		 * on x86_init_fn_find_dep(p, finish, p), as we would have
+		 * guarantee on proper ordering both at the dependency level
+		 * and by order level.
+		 */
+		q = x86_init_fn_find_dep(start, finish, p);
+		if (q && q > p) {
+			pr_info("EXECUTION ORDER INVALID! %pF should be called before %pF!\n",
+				p->early_init, q->early_init);
+		}
+
+		/*
+		 * Technically this would still work as the memmove() would
+		 * have forced the dependency to run first, however we want
+		 * strong semantics, so lets avoid these.
+		 */
+		if (q && q->order_level > p->order_level) {
+			pr_info("INVALID ORDER LEVEL! %pF should have an order level <= be called before %pF!\n",
+				p->early_init, q->early_init);
+		}
+	}
+}
+
+void __ref x86_init_fn_init_tables(void)
+{
+	unsigned int num_inits = LINKTABLE_SIZE(x86_init_fns);
+
+	if (!num_inits)
+		return;
+
+	x86_init_fn_sort(LINKTABLE_START(x86_init_fns),
+			 LINKTABLE_END(x86_init_fns));
+	x86_init_fn_check(LINKTABLE_START(x86_init_fns),
+			  LINKTABLE_END(x86_init_fns));
+}
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC v2 3/6] x86/init: move ebda reservations into linker table
  2016-02-19 14:15 [RFC v2 0/6] x86/init: use linker table Luis R. Rodriguez
  2016-02-19 14:15 ` [RFC v2 1/6] x86/boot: add BIT() to boot/bitops.h Luis R. Rodriguez
  2016-02-19 14:15 ` [RFC v2 2/6] x86/init: use linker tables to simplify x86 init and annotate dependencies Luis R. Rodriguez
@ 2016-02-19 14:15 ` Luis R. Rodriguez
  2016-02-19 14:15 ` [RFC v2 4/6] x86/init: use linker table for i386 early setup Luis R. Rodriguez
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Luis R. Rodriguez @ 2016-02-19 14:15 UTC (permalink / raw)
  To: hpa, tglx, mingo, bp
  Cc: jgross, rusty, x86, linux-kernel, luto, Luis R. Rodriguez,
	xen-devel, david.vrabel, boris.ostrovsky, andriy.shevchenko,
	mcb30

This lets us annotate its requirements specifically for
PC and lguest subarchitectures. While at it since head.c
just has ebda data rename it. Since we're using linker tables
and both x86 32-bit and 64-bit have them we don't need
to declare a call for this separately. This lets us
also keep this declared static now.

Since we're using linker tables and have support to annotate
subarchitecture do that instead. pv_legacy() is incorrect as
its really only for legacy PV guests. There's no need for
pv_legacy() check anymore now.

v2:

- Fix 0-day bot testing of on linking now rebranded
  head.o as ebda.o
- Since we dropped paravirt_enabled() --> paravirt_legacy()
  rename in favor of just removing paravirt_enabled() this
  is based on that assumption. This removes one use case.

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
---
 arch/x86/Makefile                  | 1 -
 arch/x86/include/asm/bios_ebda.h   | 2 --
 arch/x86/kernel/Makefile           | 3 ++-
 arch/x86/kernel/{head.c => ebda.c} | 6 +++---
 arch/x86/kernel/head32.c           | 2 --
 arch/x86/kernel/head64.c           | 2 --
 6 files changed, 5 insertions(+), 11 deletions(-)
 rename arch/x86/kernel/{head.c => ebda.c} (94%)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 4086abca0b32..ec9a798ffe6c 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -208,7 +208,6 @@ endif
 
 head-y := arch/x86/kernel/head_$(BITS).o
 head-y += arch/x86/kernel/head$(BITS).o
-head-y += arch/x86/kernel/head.o
 
 libs-y  += arch/x86/lib/
 
diff --git a/arch/x86/include/asm/bios_ebda.h b/arch/x86/include/asm/bios_ebda.h
index aa6a3170ab5a..e63347eb3804 100644
--- a/arch/x86/include/asm/bios_ebda.h
+++ b/arch/x86/include/asm/bios_ebda.h
@@ -38,8 +38,6 @@ static inline unsigned int get_bios_ebda_length(void)
 	return length;
 }
 
-void reserve_ebda_region(void);
-
 #ifdef CONFIG_X86_CHECK_BIOS_CORRUPTION
 /*
  * This is obviously not a great place for this, but we want to be
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index be167a0a5e2c..f5b3de07142b 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -2,7 +2,7 @@
 # Makefile for the linux kernel.
 #
 
-extra-y                := head_$(BITS).o head$(BITS).o head.o vmlinux.lds
+extra-y                := head_$(BITS).o head$(BITS).o vmlinux.lds
 
 CPPFLAGS_vmlinux.lds += -U$(UTS_MACHINE)
 
@@ -22,6 +22,7 @@ KASAN_SANITIZE_dumpstack_$(BITS).o := n
 
 CFLAGS_irq.o := -I$(src)/../include/asm/trace
 
+obj-y			:= ebda.o
 obj-y			:= process_$(BITS).o signal.o
 obj-y			+= init.o sort-init.o
 obj-$(CONFIG_X86_DEBUG_LINKER_TABLES) += dbg-tables/
diff --git a/arch/x86/kernel/head.c b/arch/x86/kernel/ebda.c
similarity index 94%
rename from arch/x86/kernel/head.c
rename to arch/x86/kernel/ebda.c
index 4e3be58a1a77..defc2f07a80b 100644
--- a/arch/x86/kernel/head.c
+++ b/arch/x86/kernel/ebda.c
@@ -4,6 +4,7 @@
 
 #include <asm/setup.h>
 #include <asm/bios_ebda.h>
+#include <asm/x86_init.h>
 
 /*
  * The BIOS places the EBDA/XBDA at the top of conventional
@@ -26,7 +27,7 @@
 #define LOWMEM_CAP		0x9f000U	/* Absolute maximum */
 #define INSANE_CUTOFF		0x20000U	/* Less than this = insane */
 
-void __init reserve_ebda_region(void)
+static void __init reserve_ebda_region(void)
 {
 	unsigned int lowmem, ebda_addr;
 
@@ -38,8 +39,6 @@ void __init reserve_ebda_region(void)
 	 * that the paravirt case can handle memory setup
 	 * correctly, without our help.
 	 */
-	if (boot_params.hdr.hardware_subarch != X86_SUBARCH_PC)
-		return;
 
 	/* end of low (conventional) memory */
 	lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES);
@@ -69,3 +68,4 @@ void __init reserve_ebda_region(void)
 	/* reserve all memory between lowmem and the 1MB mark */
 	memblock_reserve(lowmem, 0x100000 - lowmem);
 }
+x86_init_early_pc_simple(reserve_ebda_region);
diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index d93f3e42e61b..768fa3888066 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -26,8 +26,6 @@ static void __init i386_default_early_setup(void)
 	/* Initialize 32bit specific setup functions */
 	x86_init.resources.reserve_resources = i386_reserve_resources;
 	x86_init.mpparse.setup_ioapic_ids = setup_ioapic_ids_from_mpc;
-
-	reserve_ebda_region();
 }
 
 asmlinkage __visible void __init i386_start_kernel(void)
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index a3b56aecaeda..a823447139f5 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -194,8 +194,6 @@ void __init x86_64_start_reservations(char *real_mode_data)
 	x86_init_fn_init_tables();
 	x86_init_fn_early_init();
 
-	reserve_ebda_region();
-
 	switch (boot_params.hdr.hardware_subarch) {
 	case X86_SUBARCH_INTEL_MID:
 		x86_intel_mid_early_setup();
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC v2 4/6] x86/init: use linker table for i386 early setup
  2016-02-19 14:15 [RFC v2 0/6] x86/init: use linker table Luis R. Rodriguez
                   ` (2 preceding siblings ...)
  2016-02-19 14:15 ` [RFC v2 3/6] x86/init: move ebda reservations into linker table Luis R. Rodriguez
@ 2016-02-19 14:15 ` Luis R. Rodriguez
  2016-02-19 14:15 ` [RFC v2 5/6] x86/init: user linker table for ce4100 " Luis R. Rodriguez
  2016-02-19 14:15 ` [RFC v2 6/6] x86/init: use linker table for mid " Luis R. Rodriguez
  5 siblings, 0 replies; 13+ messages in thread
From: Luis R. Rodriguez @ 2016-02-19 14:15 UTC (permalink / raw)
  To: hpa, tglx, mingo, bp
  Cc: jgross, rusty, x86, linux-kernel, luto, Luis R. Rodriguez,
	xen-devel, david.vrabel, boris.ostrovsky, andriy.shevchenko,
	mcb30

This also annotates this is only used for PC and
lguest hardware subarchitectures.

v2: add X86_SUBARCH_XEN as well, as noted by Konrad,
    now tested by 0-day bot.

Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
---
 arch/x86/kernel/head32.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index 768fa3888066..0b41626e57fc 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -21,12 +21,16 @@
 #include <asm/bootparam_utils.h>
 #include <asm/x86_init.h>
 
-static void __init i386_default_early_setup(void)
+static void __init i386_set_setup_funcs(void)
 {
 	/* Initialize 32bit specific setup functions */
 	x86_init.resources.reserve_resources = i386_reserve_resources;
 	x86_init.mpparse.setup_ioapic_ids = setup_ioapic_ids_from_mpc;
 }
+x86_init_early(BIT(X86_SUBARCH_PC) |
+	       BIT(X86_SUBARCH_LGUEST) |
+	       BIT(X86_SUBARCH_XEN),
+	       NULL, NULL, i386_set_setup_funcs);
 
 asmlinkage __visible void __init i386_start_kernel(void)
 {
@@ -41,9 +45,6 @@ asmlinkage __visible void __init i386_start_kernel(void)
 	case X86_SUBARCH_CE4100:
 		x86_ce4100_early_setup();
 		break;
-	default:
-		i386_default_early_setup();
-		break;
 	}
 
 	x86_init_fn_init_tables();
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC v2 5/6] x86/init: user linker table for ce4100 early setup
  2016-02-19 14:15 [RFC v2 0/6] x86/init: use linker table Luis R. Rodriguez
                   ` (3 preceding siblings ...)
  2016-02-19 14:15 ` [RFC v2 4/6] x86/init: use linker table for i386 early setup Luis R. Rodriguez
@ 2016-02-19 14:15 ` Luis R. Rodriguez
  2016-02-19 14:15 ` [RFC v2 6/6] x86/init: use linker table for mid " Luis R. Rodriguez
  5 siblings, 0 replies; 13+ messages in thread
From: Luis R. Rodriguez @ 2016-02-19 14:15 UTC (permalink / raw)
  To: hpa, tglx, mingo, bp
  Cc: x86, linux-kernel, luto, boris.ostrovsky, rusty, david.vrabel,
	konrad.wilk, mcb30, jgross, andriy.shevchenko, xen-devel,
	Luis R. Rodriguez

Using the linker table removes the need for the #ifdef'ery
and clutter on head32.c.

Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
---
 arch/x86/include/asm/setup.h      | 6 ------
 arch/x86/kernel/head32.c          | 3 ---
 arch/x86/platform/ce4100/ce4100.c | 4 +++-
 3 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 11af24e09c8a..f1e111a9d558 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -51,12 +51,6 @@ extern void x86_intel_mid_early_setup(void);
 static inline void x86_intel_mid_early_setup(void) { }
 #endif
 
-#ifdef CONFIG_X86_INTEL_CE
-extern void x86_ce4100_early_setup(void);
-#else
-static inline void x86_ce4100_early_setup(void) { }
-#endif
-
 #ifndef _SETUP
 
 #include <asm/espfix.h>
diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index 0b41626e57fc..9357feb09863 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -42,9 +42,6 @@ asmlinkage __visible void __init i386_start_kernel(void)
 	case X86_SUBARCH_INTEL_MID:
 		x86_intel_mid_early_setup();
 		break;
-	case X86_SUBARCH_CE4100:
-		x86_ce4100_early_setup();
-		break;
 	}
 
 	x86_init_fn_init_tables();
diff --git a/arch/x86/platform/ce4100/ce4100.c b/arch/x86/platform/ce4100/ce4100.c
index 701fd5843c87..db2a709402e2 100644
--- a/arch/x86/platform/ce4100/ce4100.c
+++ b/arch/x86/platform/ce4100/ce4100.c
@@ -24,6 +24,7 @@
 #include <asm/io.h>
 #include <asm/io_apic.h>
 #include <asm/emergency-restart.h>
+#include <asm/x86_init.h>
 
 static int ce4100_i8042_detect(void)
 {
@@ -144,7 +145,7 @@ static void sdv_pci_init(void)
  * CE4100 specific x86_init function overrides and early setup
  * calls.
  */
-void __init x86_ce4100_early_setup(void)
+static void __init x86_ce4100_early_setup(void)
 {
 	x86_init.oem.arch_setup = sdv_arch_setup;
 	x86_platform.i8042_detect = ce4100_i8042_detect;
@@ -166,3 +167,4 @@ void __init x86_ce4100_early_setup(void)
 
 	pm_power_off = ce4100_power_off;
 }
+x86_init_early(BIT(X86_SUBARCH_CE4100), NULL, NULL, x86_ce4100_early_setup);
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC v2 6/6] x86/init: use linker table for mid early setup
  2016-02-19 14:15 [RFC v2 0/6] x86/init: use linker table Luis R. Rodriguez
                   ` (4 preceding siblings ...)
  2016-02-19 14:15 ` [RFC v2 5/6] x86/init: user linker table for ce4100 " Luis R. Rodriguez
@ 2016-02-19 14:15 ` Luis R. Rodriguez
  5 siblings, 0 replies; 13+ messages in thread
From: Luis R. Rodriguez @ 2016-02-19 14:15 UTC (permalink / raw)
  To: hpa, tglx, mingo, bp
  Cc: x86, linux-kernel, luto, boris.ostrovsky, rusty, david.vrabel,
	konrad.wilk, mcb30, jgross, andriy.shevchenko, xen-devel,
	Luis R. Rodriguez

Using the linker table removes the need for the #ifdef'ery
and clutter on head32.c and head64.c, if this is linked in
and the subarch matches it will run.

Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
---
 arch/x86/include/asm/setup.h            | 6 ------
 arch/x86/kernel/head32.c                | 7 -------
 arch/x86/kernel/head64.c                | 8 --------
 arch/x86/platform/intel-mid/intel-mid.c | 4 +++-
 4 files changed, 3 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index f1e111a9d558..1345c1de7f99 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -45,12 +45,6 @@ extern void reserve_standard_io_resources(void);
 extern void i386_reserve_resources(void);
 extern void setup_default_timer_irq(void);
 
-#ifdef CONFIG_X86_INTEL_MID
-extern void x86_intel_mid_early_setup(void);
-#else
-static inline void x86_intel_mid_early_setup(void) { }
-#endif
-
 #ifndef _SETUP
 
 #include <asm/espfix.h>
diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index 9357feb09863..f28360c20496 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -37,13 +37,6 @@ asmlinkage __visible void __init i386_start_kernel(void)
 	cr4_init_shadow();
 	sanitize_boot_params(&boot_params);
 
-	/* Call the subarch specific early setup function */
-	switch (boot_params.hdr.hardware_subarch) {
-	case X86_SUBARCH_INTEL_MID:
-		x86_intel_mid_early_setup();
-		break;
-	}
-
 	x86_init_fn_init_tables();
 	x86_init_fn_early_init();
 
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index a823447139f5..c913b7eb5056 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -194,13 +194,5 @@ void __init x86_64_start_reservations(char *real_mode_data)
 	x86_init_fn_init_tables();
 	x86_init_fn_early_init();
 
-	switch (boot_params.hdr.hardware_subarch) {
-	case X86_SUBARCH_INTEL_MID:
-		x86_intel_mid_early_setup();
-		break;
-	default:
-		break;
-	}
-
 	start_kernel();
 }
diff --git a/arch/x86/platform/intel-mid/intel-mid.c b/arch/x86/platform/intel-mid/intel-mid.c
index 90bb997ed0a2..6a20a134d4d8 100644
--- a/arch/x86/platform/intel-mid/intel-mid.c
+++ b/arch/x86/platform/intel-mid/intel-mid.c
@@ -167,7 +167,7 @@ static unsigned char intel_mid_get_nmi_reason(void)
  * Moorestown specific x86_init function overrides and early setup
  * calls.
  */
-void __init x86_intel_mid_early_setup(void)
+static void __init x86_intel_mid_early_setup(void)
 {
 	x86_init.resources.probe_roms = x86_init_noop;
 	x86_init.resources.reserve_resources = x86_init_noop;
@@ -199,6 +199,8 @@ void __init x86_intel_mid_early_setup(void)
 	x86_init.mpparse.get_smp_config = x86_init_uint_noop;
 	set_bit(MP_BUS_ISA, mp_bus_not_pci);
 }
+x86_init_early(BIT(X86_SUBARCH_INTEL_MID), NULL, NULL,
+	       x86_intel_mid_early_setup);
 
 /*
  * if user does not want to use per CPU apb timer, just give it a lower rating
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC v2 1/6] x86/boot: add BIT() to boot/bitops.h
  2016-02-19 14:15 ` [RFC v2 1/6] x86/boot: add BIT() to boot/bitops.h Luis R. Rodriguez
@ 2016-02-19 15:14   ` Boris Ostrovsky
  2016-02-19 21:03     ` Luis R. Rodriguez
  0 siblings, 1 reply; 13+ messages in thread
From: Boris Ostrovsky @ 2016-02-19 15:14 UTC (permalink / raw)
  To: Luis R. Rodriguez, hpa, tglx, mingo, bp
  Cc: x86, linux-kernel, luto, rusty, david.vrabel, konrad.wilk, mcb30,
	jgross, andriy.shevchenko, xen-devel



On 02/19/2016 09:15 AM, Luis R. Rodriguez wrote:
> diff --git a/arch/x86/boot/bitops.h b/arch/x86/boot/bitops.h
> index 878e4b9940d9..232cff0ff4e3 100644
> --- a/arch/x86/boot/bitops.h
> +++ b/arch/x86/boot/bitops.h
> @@ -40,4 +40,6 @@ static inline void set_bit(int nr, void *addr)
>   	asm("btsl %1,%0" : "+m" (*(u32 *)addr) : "Ir" (nr));
>   }
>   
> +#define BIT(x)	(1 << x)

Parenthesize x.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v2 2/6] x86/init: use linker tables to simplify x86 init and annotate dependencies
  2016-02-19 14:15 ` [RFC v2 2/6] x86/init: use linker tables to simplify x86 init and annotate dependencies Luis R. Rodriguez
@ 2016-02-19 16:40   ` Andy Lutomirski
  2016-02-20  0:42     ` Luis R. Rodriguez
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2016-02-19 16:40 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	X86 ML, linux-kernel, Boris Ostrovsky, Rusty Russell,
	David Vrabel, Konrad Rzeszutek Wilk, Michael Brown,
	Juergen Gross, Andy Shevchenko, Xen Devel

On Fri, Feb 19, 2016 at 6:15 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> Any failure on the x86 init path can be catastrophic.
> A simple shift of a call from one place to another can
> easily break things. Likewise adding a new call to
> one path without considering all x86 requirements
> can make certain x86 run time environments crash.
> We currently account for these requirements through
> peer code review and run time testing. We could do
> much better if we had a clean and simple way to annotate
> strong semantics for run time requirements, init sequence
> dependencies, and detection mechanisms for additions of
> new x86 init sequences.

Please document this in a way that will be useful for people trying to
understand what the code does as opposed to just people who are trying
to understand why you wrote it.  More below.



> +/**
> + * struct x86_init_fn - x86 generic kernel init call
> + *
> + * Linux x86 features vary in complexity, features may require work done at
> + * different levels of the full x86 init sequence. Today there are also two
> + * different possible entry points for Linux on x86, one for bare metal, KVM
> + * and Xen HVM, and another for Xen PV guests / dom0.  Assuming a bootloader
> + * has set up 64-bit mode, roughly the x86 init sequence follows this path:
> + *
> + * Bare metal, KVM, Xen HVM                      Xen PV / dom0
> + *       startup_64()                             startup_xen()
> + *              \                                     /
> + *      x86_64_start_kernel()                 xen_start_kernel()
> + *                           \               /
> + *                      x86_64_start_reservations()
> + *                                   |
> + *                              start_kernel()
> + *                              [   ...        ]
> + *                              [ setup_arch() ]
> + *                              [   ...        ]
> + *                                  init
> + *


I don't think this paragraph below is necessary.  I also think it's
very confusing.  Keep in mind that the reader has no idea what a
"level" is at this point and that the reader also doesn't need to
think about terms like "paravirtualization yielding".

> + * x86_64_start_kernel() and xen_start_kernel() are the respective first C code
> + * entry starting points. The different entry points exist to enable Xen to
> + * skip a lot of hardware setup already done and managed on behalf of the
> + * hypervisor, we refer to this as "paravirtualization yielding". The different
> + * levels of init calls on the x86 init sequence exist to account for these
> + * slight differences and requirements. These different entry points also share
> + * a common entry x86 specific path, x86_64_start_reservations().
> + *

And here, I don't even know what a "feature" is.

> + * A generic x86 feature can have different initialization calls, one on each
> + * of the different main x86 init sequences, but must also address both entry
> + * points in order to work properly across the board on all supported x86
> + * subarchitectures. Since x86 features can also have dependencies on other
> + * setup code or features, x86 features can at times be subordinate to other
> + * x86 features, or conditions. struct x86_init_fn enables feature developers
> + * to annotate dependency relationships to ensure subsequent init calls only
> + * run once a subordinate's dependencies have run. When needed custom
> + * dependency requirements can also be spelled out through a custom dependency
> + * checker. In order to account for the dual entry point nature of x86-64 Linux
> + * for "paravirtualization yielding" and to make annotations for support for
> + * these explicit each struct x86_init_fn must specify supported
> + * subarchitectures. The earliest x86-64 code can read the subarchitecture
> + * though is after load_idt(), as such the earliest we can currently rely on
> + * subarchitecture for semantics and a common init sequences is on the shared
> + * common x86_64_start_reservations().  Each struct x86_init_fn must also
> + * declare a two-digit decimal number to impose an ordering relative to other
> + * features when required.

I'm totally lost in the paragraph above.


> + *
> + * x86_init_fn enables strong semantics and dependencies to be defined and
> + * implemented on the full x86 initialization sequence.

Please try explaining what this is, instead.  For example:

An x86_init_fn represents a function to be called at a certain point
during initialization on a specific set of subarchitectures.

> + *
> + * @order_level: must be set, linker order level, this corresponds to the table
> + *     section sub-table index, we record this only for semantic validation
> + *     purposes.

I read this as "this is purely a debugging feature".  Is it?  I think it's not.

>  Order-level is always required however you typically would
> + *     only use X86_INIT_NORMAL*() and leave ordering to be done by placement
> + *     of code in a C file and the order of objects through a Makefile. Custom
> + *     order-levels can be used when order on C file and order of objects on
> + *     Makfiles does not suffice or much further refinements are needed.


Assuming I understand this correctly, here's how I'd write it:

@order_level: The temporal order of this x86_init_fn.  Lower
order_level numbers are called first.  Ties are broken by order found
in the Makefile and then by order in the C file.

Note, however, that my proposed explanation can't be right because it
appears to conflict with "depend".  Please adjust accordingly.

> + * @supp_hardware_subarch: must be set, it represents the bitmask of supported
> + *     subarchitectures.  We require each struct x86_init_fn to have this set
> + *     to require developer considerations for each supported x86
> + *     subarchitecture and to build strong annotations of different possible
> + *     run time states particularly in consideration for the two main
> + *     different entry points for x86 Linux, to account for paravirtualization
> + *     yielding.
'
Too much motivation, too little documentation.

@supp_hardware_subarch: A bitmask of subarchitectures on which to call
this init function.

--- start big deletion ---

> + *
> + *     The subarchitecture is read by the kernel at early boot from the
> + *     struct boot_params hardware_subarch. Support for the subarchitecture
> + *     exists as of x86 boot protocol 2.07. The bootloader would have set up
> + *     the respective hardware_subarch on the boot sector as per
> + *     Documentation/x86/boot.txt.
> + *
> + *     What x86 entry point is used is determined at run time by the
> + *     bootloader. Linux pv_ops was designed to help enable to build one Linux
> + *     binary to support bare metal and different hypervisors.  pv_ops setup
> + *     code however is limited in that all pv_ops setup code is run late in
> + *     the x86 init sequence, during setup_arch(). In fact cpu_has_hypervisor
> + *     only works after early_cpu_init() during setup_arch(). If an x86
> + *     feature requires an earlier determination of what hypervisor was used,
> + *     or if it needs to annotate only support for certain hypervisors, the
> + *     x86 hardware_subarch should be set by the bootloader and
> + *     @supp_hardware_subarch set by the x86 feature. Using hardware_subarch
> + *     enables x86 features to fill the semantic gap between the Linux x86
> + *     entry point used and what pv_ops has to offer through a hypervisor
> + *     agnostic mechanism.

--- end big deletion ---

> + *
> + *     Each supported subarchitecture is set using the respective
> + *     X86_SUBARCH_* as a bit in the bitmask. For instance if a feature
> + *     is supported on PC and Xen subarchitectures only you would set this
> + *     bitmask to:
> + *
> + *             BIT(X86_SUBARCH_PC) |
> + *             BIT(X86_SUBARCH_XEN);

I like this part, but how about "For instance, if an init function
should be called on PC and Xen subarchitectures only, you would set
the bitmask to..."?

> + *
> + * @detect: optional, if set returns true if the feature has been detected to
> + *     be required, it returns false if the feature has been detected to not
> + *     be required.

I have absolutely no idea what this means.

> + * @depend: optional, if set this set of init routines must be called prior to
> + *     the init routine who's respective detect routine we have set this
> + *     depends callback to. This is only used for sorting purposes given
> + *     all current init callbacks have a void return type. Sorting is
> + *     implemented via x86_init_fn_sort(), it must be called only once,
> + *     however you can delay sorting until you need it if you can ensure
> + *     only @order_level and @supp_hardware_subarch can account for proper
> + *     ordering and dependency requirements for all init sequences prior.
> + *     If you do not have a depend callback set its assumed the order level
> + *     (__x86_init_fn(level)) set by the init routine suffices to set the
> + *     order for when the feature's respective callbacks are called with
> + *     respect to other calls. Sorting of init calls with the same order level
> + *     is determined by linker order, determined by order placement on C code
> + *     and order listed on a Makefile. A routine that depends on another is
> + *     known as being subordinate to the init routine it depends on. Routines
> + *     that are subordinate must have an order-level of lower priority or
> + *     equal priority than the order-level of the init sequence it depends on.

I don't understand this at all.  I assume you're saying that some kind
of topological sorting happens.  This leads to a question: why is
"depend" a function pointer?  Shouldn't it be x86_init_fn*?  Also,
what happens if you depend on something that is disabled on the
running subarch?  And what happens if the depend-implied order is
inconsistent with order_level.

I would hope that order_level breaks ties in the topological sort and
that there's a bit fat warning if the order_level ordering is
inconsistent with the topological ordering.  Preferably the warning
would be at build time, in which case it could be an error.

> + * @early_init: required, routine which will run in x86_64_start_reservations()
> + *     after we ensure boot_params.hdr.hardware_subarch is accessible and
> + *     properly set. Memory is not yet available. This the earliest we can
> + *     currently define a common shared callback since all callbacks need to
> + *     check for boot_params.hdr.hardware_subarch and this becomes accessible
> + *     on x86-64 until after load_idt().

What's this for?  Under what conditions is it called?  What order?
Why is it part of x86_init_fn at all?

> + * @flags: optional, bitmask of enum x86_init_fn_flags

What are these flags?  What do they do?

> + */
> +struct x86_init_fn {
> +       __u32 order_level;
> +       __u32 supp_hardware_subarch;
> +       bool (*detect)(void);
> +       bool (*depend)(void);
> +       void (*early_init)(void);
> +       __u32 flags;
> +};
> +
> +/**
> + * enum x86_init_fn_flags: flags for init sequences
> + *
> + * X86_INIT_FINISH_IF_DETECTED: tells the core that once this init sequence
> + *     has completed it can break out of the loop for init sequences on
> + *     its own level.

What does this mean?

> + * X86_INIT_DETECTED: private flag. Used by the x86 core to annotate that this
> + *     init sequence has been detected and it all of its callbacks
> + *     must be run during initialization.

Please make this an entry in a new field scratch_space or similar and
just document the entire field as "private to the init core code".

> +static struct x86_init_fn *x86_init_fn_find_dep(struct x86_init_fn *start,
> +                                               struct x86_init_fn *finish,
> +                                               struct x86_init_fn *q)
> +{
> +       struct x86_init_fn *p;
> +
> +       if (!q)
> +               return NULL;
> +
> +       for (p = start; p < finish; p++)
> +               if (p->detect == q->depend)
> +                       return p;

That's very strange indeed, and it doesn't seem consistent with my
explanation.  Please fix up the docs to explain what's going on.
Again, as a reviewer and eventual user of this code, I do not need to
know what historical problem it solves.  I need to know what the code
does and how to use it.

> +
> +void __ref x86_init_fn_init_tables(void)

Why is this __ref and not __init?

--Andy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v2 1/6] x86/boot: add BIT() to boot/bitops.h
  2016-02-19 15:14   ` Boris Ostrovsky
@ 2016-02-19 21:03     ` Luis R. Rodriguez
  0 siblings, 0 replies; 13+ messages in thread
From: Luis R. Rodriguez @ 2016-02-19 21:03 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Luis R. Rodriguez, hpa, tglx, mingo, bp, x86, linux-kernel, luto,
	rusty, david.vrabel, konrad.wilk, mcb30, jgross,
	andriy.shevchenko, xen-devel

On Fri, Feb 19, 2016 at 10:14:42AM -0500, Boris Ostrovsky wrote:
> 
> 
> On 02/19/2016 09:15 AM, Luis R. Rodriguez wrote:
> >diff --git a/arch/x86/boot/bitops.h b/arch/x86/boot/bitops.h
> >index 878e4b9940d9..232cff0ff4e3 100644
> >--- a/arch/x86/boot/bitops.h
> >+++ b/arch/x86/boot/bitops.h
> >@@ -40,4 +40,6 @@ static inline void set_bit(int nr, void *addr)
> >  	asm("btsl %1,%0" : "+m" (*(u32 *)addr) : "Ir" (nr));
> >  }
> >+#define BIT(x)	(1 << x)
> 
> Parenthesize x.

I'll just use (1UL << (nr)) then as well to match what
include/linux/bitmap.h uses.

 Luis

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v2 2/6] x86/init: use linker tables to simplify x86 init and annotate dependencies
  2016-02-19 16:40   ` Andy Lutomirski
@ 2016-02-20  0:42     ` Luis R. Rodriguez
  2016-02-20  7:30       ` Andy Lutomirski
  0 siblings, 1 reply; 13+ messages in thread
From: Luis R. Rodriguez @ 2016-02-20  0:42 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Luis R. Rodriguez, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, X86 ML, linux-kernel, Boris Ostrovsky,
	Rusty Russell, David Vrabel, Konrad Rzeszutek Wilk,
	Michael Brown, Juergen Gross, Andy Shevchenko, Xen Devel,
	rafael.j.wysocki, mchehab, vegard.nossum

On Fri, Feb 19, 2016 at 08:40:49AM -0800, Andy Lutomirski wrote:
> On Fri, Feb 19, 2016 at 6:15 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > Any failure on the x86 init path can be catastrophic.
> > A simple shift of a call from one place to another can
> > easily break things. Likewise adding a new call to
> > one path without considering all x86 requirements
> > can make certain x86 run time environments crash.
> > We currently account for these requirements through
> > peer code review and run time testing. We could do
> > much better if we had a clean and simple way to annotate
> > strong semantics for run time requirements, init sequence
> > dependencies, and detection mechanisms for additions of
> > new x86 init sequences.
> 
> Please document this in a way that will be useful for people trying to
> understand what the code does as opposed to just people who are trying
> to understand why you wrote it.  More below.

Sure. I'm kind of glad people are getting tired of me trying to
explain *why* I wrote it though, I figured that might be the harder
thing to explain. Hopefully it sinks in.

> > +/**
> > + * struct x86_init_fn - x86 generic kernel init call
> > + *
> > + * Linux x86 features vary in complexity, features may require work done at
> > + * different levels of the full x86 init sequence. Today there are also two
> > + * different possible entry points for Linux on x86, one for bare metal, KVM
> > + * and Xen HVM, and another for Xen PV guests / dom0.  Assuming a bootloader
> > + * has set up 64-bit mode, roughly the x86 init sequence follows this path:
> > + *
> > + * Bare metal, KVM, Xen HVM                      Xen PV / dom0
> > + *       startup_64()                             startup_xen()
> > + *              \                                     /
> > + *      x86_64_start_kernel()                 xen_start_kernel()
> > + *                           \               /
> > + *                      x86_64_start_reservations()
> > + *                                   |
> > + *                              start_kernel()
> > + *                              [   ...        ]
> > + *                              [ setup_arch() ]
> > + *                              [   ...        ]
> > + *                                  init
> > + *
> 
> 
> I don't think this paragraph below is necessary.  I also think it's
> very confusing.  Keep in mind that the reader has no idea what a
> "level" is at this point and that the reader also doesn't need to
> think about terms like "paravirtualization yielding".
> 
> > + * x86_64_start_kernel() and xen_start_kernel() are the respective first C code
> > + * entry starting points. The different entry points exist to enable Xen to
> > + * skip a lot of hardware setup already done and managed on behalf of the
> > + * hypervisor, we refer to this as "paravirtualization yielding". The different
> > + * levels of init calls on the x86 init sequence exist to account for these
> > + * slight differences and requirements. These different entry points also share
> > + * a common entry x86 specific path, x86_64_start_reservations().

OK sure, I'll nuke.

> > + *
> 
> And here, I don't even know what a "feature" is.

Kasan is an example.

> > + * A generic x86 feature can have different initialization calls, one on each
> > + * of the different main x86 init sequences, but must also address both entry
> > + * points in order to work properly across the board on all supported x86
> > + * subarchitectures. Since x86 features can also have dependencies on other
> > + * setup code or features, x86 features can at times be subordinate to other
> > + * x86 features, or conditions. struct x86_init_fn enables feature developers
> > + * to annotate dependency relationships to ensure subsequent init calls only
> > + * run once a subordinate's dependencies have run. When needed custom
> > + * dependency requirements can also be spelled out through a custom dependency
> > + * checker. In order to account for the dual entry point nature of x86-64 Linux
> > + * for "paravirtualization yielding" and to make annotations for support for
> > + * these explicit each struct x86_init_fn must specify supported
> > + * subarchitectures. The earliest x86-64 code can read the subarchitecture
> > + * though is after load_idt(), as such the earliest we can currently rely on
> > + * subarchitecture for semantics and a common init sequences is on the shared
> > + * common x86_64_start_reservations().  Each struct x86_init_fn must also
> > + * declare a two-digit decimal number to impose an ordering relative to other
> > + * features when required.
> 
> I'm totally lost in the paragraph above.

As an example Kasan could be defined as a struct x86_init_fn with a respective
callback on x86_64_start_kernel() (kasan_early_init()) and later setup_arch()
(kasan_init()). Since we can't support for struct x86_init_fn calls to start all
the way from x86_64_start_kernel() (given the issues I had noted before)
the earliest a struct x86_init_fn can be used for is for a callback on
x86_64_start_reservations().

It just means we can different callbacks used for different parts of the
x86 init process, it explains the limits to the callbacks we can have right now.

> > + *
> > + * x86_init_fn enables strong semantics and dependencies to be defined and
> > + * implemented on the full x86 initialization sequence.
> 
> Please try explaining what this is, instead.  For example:
> 
> An x86_init_fn represents a function to be called at a certain point
> during initialization on a specific set of subarchitectures.

OK thanks.

> > + *
> > + * @order_level: must be set, linker order level, this corresponds to the table
> > + *     section sub-table index, we record this only for semantic validation
> > + *     purposes.
> 
> I read this as "this is purely a debugging feature".  Is it?  I think it's not.

Nope, the order_level really is used for linker table sorting, but since we can
modify the table for these structs with a run time sort later, we want to ensure
that the order level is still respected after we sort at run time.

> >  Order-level is always required however you typically would
> > + *     only use X86_INIT_NORMAL*() and leave ordering to be done by placement
> > + *     of code in a C file and the order of objects through a Makefile. Custom
> > + *     order-levels can be used when order on C file and order of objects on
> > + *     Makfiles does not suffice or much further refinements are needed.
> 
> 
> Assuming I understand this correctly, here's how I'd write it:
> 
> @order_level: The temporal order of this x86_init_fn.  Lower
> order_level numbers are called first.  Ties are broken by order found
> in the Makefile and then by order in the C file.

Thanks... yes.

> 
> Note, however, that my proposed explanation can't be right because it
> appears to conflict with "depend".  Please adjust accordingly.
> 
> > + * @supp_hardware_subarch: must be set, it represents the bitmask of supported
> > + *     subarchitectures.  We require each struct x86_init_fn to have this set
> > + *     to require developer considerations for each supported x86
> > + *     subarchitecture and to build strong annotations of different possible
> > + *     run time states particularly in consideration for the two main
> > + *     different entry points for x86 Linux, to account for paravirtualization
> > + *     yielding.
> '
> Too much motivation, too little documentation.
> 
> @supp_hardware_subarch: A bitmask of subarchitectures on which to call
> this init function.

OK thanks.

> --- start big deletion ---
> 
> > + *
> > + *     The subarchitecture is read by the kernel at early boot from the
> > + *     struct boot_params hardware_subarch. Support for the subarchitecture
> > + *     exists as of x86 boot protocol 2.07. The bootloader would have set up
> > + *     the respective hardware_subarch on the boot sector as per
> > + *     Documentation/x86/boot.txt.
> > + *
> > + *     What x86 entry point is used is determined at run time by the
> > + *     bootloader. Linux pv_ops was designed to help enable to build one Linux
> > + *     binary to support bare metal and different hypervisors.  pv_ops setup
> > + *     code however is limited in that all pv_ops setup code is run late in
> > + *     the x86 init sequence, during setup_arch(). In fact cpu_has_hypervisor
> > + *     only works after early_cpu_init() during setup_arch(). If an x86
> > + *     feature requires an earlier determination of what hypervisor was used,
> > + *     or if it needs to annotate only support for certain hypervisors, the
> > + *     x86 hardware_subarch should be set by the bootloader and
> > + *     @supp_hardware_subarch set by the x86 feature. Using hardware_subarch
> > + *     enables x86 features to fill the semantic gap between the Linux x86
> > + *     entry point used and what pv_ops has to offer through a hypervisor
> > + *     agnostic mechanism.
> 
> --- end big deletion ---

OK!

> > + *
> > + *     Each supported subarchitecture is set using the respective
> > + *     X86_SUBARCH_* as a bit in the bitmask. For instance if a feature
> > + *     is supported on PC and Xen subarchitectures only you would set this
> > + *     bitmask to:
> > + *
> > + *             BIT(X86_SUBARCH_PC) |
> > + *             BIT(X86_SUBARCH_XEN);
> 
> I like this part, but how about "For instance, if an init function
> should be called on PC and Xen subarchitectures only, you would set
> the bitmask to..."?

Sure.

> > + *
> > + * @detect: optional, if set returns true if the feature has been detected to
> > + *     be required, it returns false if the feature has been detected to not
> > + *     be required.
> 
> I have absolutely no idea what this means.

Well, so the order level stuff is used at link time, there however are some
run time environment conditions that might be important to consider as well,
the detect lets you write a C code routine which will check to see if the
feature should run, it returns false if the feature should not run or is
not required to run.

A cheesy way to implement a cheesy graph is to then use the same callback
as a pointer for dependency, so that if a feature depends on another, both
are on the same order level, and if they have a run time dependency
annotation, they can further annotate this relationship by having the
thing that depends, set its depends pointer to refer to the other
feature's detect callback.

This is borrowed from the IOMMU init code in the kernel. Same logic.
Only thing is we also get link order support too. Link order semantics
and logic should suffice for most things, but not all as with the IOMMU
init stuff.

> > + * @depend: optional, if set this set of init routines must be called prior to
> > + *     the init routine who's respective detect routine we have set this
> > + *     depends callback to. This is only used for sorting purposes given
> > + *     all current init callbacks have a void return type. Sorting is
> > + *     implemented via x86_init_fn_sort(), it must be called only once,
> > + *     however you can delay sorting until you need it if you can ensure
> > + *     only @order_level and @supp_hardware_subarch can account for proper
> > + *     ordering and dependency requirements for all init sequences prior.
> > + *     If you do not have a depend callback set its assumed the order level
> > + *     (__x86_init_fn(level)) set by the init routine suffices to set the
> > + *     order for when the feature's respective callbacks are called with
> > + *     respect to other calls. Sorting of init calls with the same order level
> > + *     is determined by linker order, determined by order placement on C code
> > + *     and order listed on a Makefile. A routine that depends on another is
> > + *     known as being subordinate to the init routine it depends on. Routines
> > + *     that are subordinate must have an order-level of lower priority or
> > + *     equal priority than the order-level of the init sequence it depends on.
> 
> I don't understand this at all.  I assume you're saying that some kind
> of topological sorting happens.  This leads to a question: why is
> "depend" a function pointer?  Shouldn't it be x86_init_fn*?

That's fine too. It was just borrowing the implementation from
the iommu solution. I've modified that considerably so could not
just extend it. After some more though I also considered generalizing
it but decided against it given that run time sorting can depend
on very custom semantics and most importantly data structure size.

> Also,
> what happens if you depend on something that is disabled on the
> running subarch? 

We have a flag that annotates if a callback runs, if the flag is not
set then a depends thing will ensure to not call the routine that
has depends set.

> And what happens if the depend-implied order is
> inconsistent with order_level.

Ah -- that's what I was trying to explain above in the comment
about inconsistent state -- a run time check *after* we do run
time sorting goes through and checks for sanity.

All this is a light weight feature graph, if you will, other
folks are working on different solutions for this and while
I have considered sharing a solution for early init code this
seems a bit more ideal for now. Later in the future though we
should probably get together and talk about a proper feature
graph, with more consistent checks, etc. Its another reason
why I am also curious to see a SAT solver at run time in the
future, but this is super long term [0]...

[0] http://kernelnewbies.org/KernelProjects/kconfig-sat

> I would hope that order_level breaks ties in the topological sort and
> that there's a bit fat warning if the order_level ordering is
> inconsistent with the topological ordering.  Preferably the warning
> would be at build time, in which case it could be an error.

Order level will always be respected at link time. The depend
and detect thing are heuristics used at run time and since
it depends on run time state we can't do this at build time.
The big fat hairy nasty warning is there though. Curious
enough I ended up finding out that the sanity check for
IOMMU was intended to only be run at debug time but the
debug thing was *always* set to true, so we've been running
the sanity checker in the kernel for ages. Its why I also
feel a bit more comfortable in keeping it live and matching
more close to its implementation. I've gone to pains to
build off of the IOMMU init solution and simply extend
on the semantics there. Turns out there was much room for
enhancements. All the fine tuning are documented in the
userspace tree.

> > + * @early_init: required, routine which will run in x86_64_start_reservations()
> > + *     after we ensure boot_params.hdr.hardware_subarch is accessible and
> > + *     properly set. Memory is not yet available. This the earliest we can
> > + *     currently define a common shared callback since all callbacks need to
> > + *     check for boot_params.hdr.hardware_subarch and this becomes accessible
> > + *     on x86-64 until after load_idt().
> 
> What's this for?  Under what conditions is it called?  What order?
> Why is it part of x86_init_fn at all?

Its one of the reasons we are adding this, callbacks to for features.
Its called if the its requirements are met (subarch, and depends).
I'll work on extending this to be clearer.

> > + * @flags: optional, bitmask of enum x86_init_fn_flags
> 
> What are these flags?  What do they do?

They tell us the supported subarchs that must be met matched
at run time in order for the feature's callback to run.

> > + */
> > +struct x86_init_fn {
> > +       __u32 order_level;
> > +       __u32 supp_hardware_subarch;
> > +       bool (*detect)(void);
> > +       bool (*depend)(void);
> > +       void (*early_init)(void);
> > +       __u32 flags;
> > +};
> > +
> > +/**
> > + * enum x86_init_fn_flags: flags for init sequences
> > + *
> > + * X86_INIT_FINISH_IF_DETECTED: tells the core that once this init sequence
> > + *     has completed it can break out of the loop for init sequences on
> > + *     its own level.
> 
> What does this mean?

It borrows the logic from the IOMMU stuff, it enables the core to detect
if *any* routine in the linker table has completed we can then now just
bail and stop going through the rest of the linker table.

> > + * X86_INIT_DETECTED: private flag. Used by the x86 core to annotate that this
> > + *     init sequence has been detected and it all of its callbacks
> > + *     must be run during initialization.
>
> 
> Please make this an entry in a new field scratch_space or similar and
> just document the entire field as "private to the init core code".

OK.

> > +static struct x86_init_fn *x86_init_fn_find_dep(struct x86_init_fn *start,
> > +                                               struct x86_init_fn *finish,
> > +                                               struct x86_init_fn *q)
> > +{
> > +       struct x86_init_fn *p;
> > +
> > +       if (!q)
> > +               return NULL;
> > +
> > +       for (p = start; p < finish; p++)
> > +               if (p->detect == q->depend)
> > +                       return p;
> 
> That's very strange indeed, and it doesn't seem consistent with my
> explanation.  Please fix up the docs to explain what's going on.
> Again, as a reviewer and eventual user of this code, I do not need to
> know what historical problem it solves.  I need to know what the code
> does and how to use it.

Understood. To help please note that this borrows from the IOMMU stuff,
which has even less documentation. I was trying to actually put some
good effort to fix that for our sake and to enhance the semantics
further. If things seem confusing just keep in mind this is the
IOMMU init solution with with enhanced semantics and linker table
support.

> > +
> > +void __ref x86_init_fn_init_tables(void)
> 
> Why is this __ref and not __init?

Yeah __init should work better, thanks.

 Luis

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v2 2/6] x86/init: use linker tables to simplify x86 init and annotate dependencies
  2016-02-20  0:42     ` Luis R. Rodriguez
@ 2016-02-20  7:30       ` Andy Lutomirski
  2016-02-27  0:19         ` Luis R. Rodriguez
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2016-02-20  7:30 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Luis R. Rodriguez, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, X86 ML, linux-kernel, Boris Ostrovsky,
	Rusty Russell, David Vrabel, Konrad Rzeszutek Wilk,
	Michael Brown, Juergen Gross, Andy Shevchenko, Xen Devel,
	Rafael J. Wysocki, Mauro Carvalho Chehab, vegard.nossum

On Fri, Feb 19, 2016 at 4:42 PM, Luis R. Rodriguez <mcgrof@kernl.org> wrote:
> On Fri, Feb 19, 2016 at 08:40:49AM -0800, Andy Lutomirski wrote:
>> On Fri, Feb 19, 2016 at 6:15 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>> > Any failure on the x86 init path can be catastrophic.
>> > A simple shift of a call from one place to another can
>> > easily break things. Likewise adding a new call to
>> > one path without considering all x86 requirements
>> > can make certain x86 run time environments crash.
>> > We currently account for these requirements through
>> > peer code review and run time testing. We could do
>> > much better if we had a clean and simple way to annotate
>> > strong semantics for run time requirements, init sequence
>> > dependencies, and detection mechanisms for additions of
>> > new x86 init sequences.
>>
>> Please document this in a way that will be useful for people trying to
>> understand what the code does as opposed to just people who are trying
>> to understand why you wrote it.  More below.
>
> Sure. I'm kind of glad people are getting tired of me trying to
> explain *why* I wrote it though, I figured that might be the harder
> thing to explain. Hopefully it sinks in.
>
>> > +/**
>> > + * struct x86_init_fn - x86 generic kernel init call
>> > + *
>> > + * Linux x86 features vary in complexity, features may require work done at
>> > + * different levels of the full x86 init sequence. Today there are also two
>> > + * different possible entry points for Linux on x86, one for bare metal, KVM
>> > + * and Xen HVM, and another for Xen PV guests / dom0.  Assuming a bootloader
>> > + * has set up 64-bit mode, roughly the x86 init sequence follows this path:
>> > + *
>> > + * Bare metal, KVM, Xen HVM                      Xen PV / dom0
>> > + *       startup_64()                             startup_xen()
>> > + *              \                                     /
>> > + *      x86_64_start_kernel()                 xen_start_kernel()
>> > + *                           \               /
>> > + *                      x86_64_start_reservations()
>> > + *                                   |
>> > + *                              start_kernel()
>> > + *                              [   ...        ]
>> > + *                              [ setup_arch() ]
>> > + *                              [   ...        ]
>> > + *                                  init
>> > + *
>>
>>
>> I don't think this paragraph below is necessary.  I also think it's
>> very confusing.  Keep in mind that the reader has no idea what a
>> "level" is at this point and that the reader also doesn't need to
>> think about terms like "paravirtualization yielding".
>>
>> > + * x86_64_start_kernel() and xen_start_kernel() are the respective first C code
>> > + * entry starting points. The different entry points exist to enable Xen to
>> > + * skip a lot of hardware setup already done and managed on behalf of the
>> > + * hypervisor, we refer to this as "paravirtualization yielding". The different
>> > + * levels of init calls on the x86 init sequence exist to account for these
>> > + * slight differences and requirements. These different entry points also share
>> > + * a common entry x86 specific path, x86_64_start_reservations().
>
> OK sure, I'll nuke.
>
>> > + *
>>
>> And here, I don't even know what a "feature" is.
>
> Kasan is an example.

Is kasan different from the cr4 shadow in this context?  I don't
really feel like calling the cr4 shadow a "feature" is meaningful.  Am
I misunderstanding something?

>
>> > + * A generic x86 feature can have different initialization calls, one on each
>> > + * of the different main x86 init sequences, but must also address both entry
>> > + * points in order to work properly across the board on all supported x86
>> > + * subarchitectures. Since x86 features can also have dependencies on other
>> > + * setup code or features, x86 features can at times be subordinate to other
>> > + * x86 features, or conditions. struct x86_init_fn enables feature developers
>> > + * to annotate dependency relationships to ensure subsequent init calls only
>> > + * run once a subordinate's dependencies have run. When needed custom
>> > + * dependency requirements can also be spelled out through a custom dependency
>> > + * checker. In order to account for the dual entry point nature of x86-64 Linux
>> > + * for "paravirtualization yielding" and to make annotations for support for
>> > + * these explicit each struct x86_init_fn must specify supported
>> > + * subarchitectures. The earliest x86-64 code can read the subarchitecture
>> > + * though is after load_idt(), as such the earliest we can currently rely on
>> > + * subarchitecture for semantics and a common init sequences is on the shared
>> > + * common x86_64_start_reservations().  Each struct x86_init_fn must also
>> > + * declare a two-digit decimal number to impose an ordering relative to other
>> > + * features when required.
>>
>> I'm totally lost in the paragraph above.
>
> As an example Kasan could be defined as a struct x86_init_fn with a respective
> callback on x86_64_start_kernel() (kasan_early_init()) and later setup_arch()
> (kasan_init()). Since we can't support for struct x86_init_fn calls to start all
> the way from x86_64_start_kernel() (given the issues I had noted before)
> the earliest a struct x86_init_fn can be used for is for a callback on
> x86_64_start_reservations().

I'm still lost.  Kasan is a pile of code that does something.  Kasan
has some initialization code.  Do you mean that kasan's initialization
code could be encapsulated in one or more x86_init_fn instances?

>
> It just means we can different callbacks used for different parts of the
> x86 init process, it explains the limits to the callbacks we can have right now.

Do you mean that there would be different linker tables containing
x86_init_fn instances for each of these callback points?  Or is
"early_init" the only such callback point.

BTW, why is early_init called early_init?  Why not call it "callback"?
 Then you could reuse this for other things.

>
>> > + *
>> > + * x86_init_fn enables strong semantics and dependencies to be defined and
>> > + * implemented on the full x86 initialization sequence.
>>
>> Please try explaining what this is, instead.  For example:
>>
>> An x86_init_fn represents a function to be called at a certain point
>> during initialization on a specific set of subarchitectures.
>
> OK thanks.
>
>> > + *
>> > + * @order_level: must be set, linker order level, this corresponds to the table
>> > + *     section sub-table index, we record this only for semantic validation
>> > + *     purposes.
>>
>> I read this as "this is purely a debugging feature".  Is it?  I think it's not.
>
> Nope, the order_level really is used for linker table sorting, but since we can
> modify the table for these structs with a run time sort later, we want to ensure
> that the order level is still respected after we sort at run time.
>
>> >  Order-level is always required however you typically would
>> > + *     only use X86_INIT_NORMAL*() and leave ordering to be done by placement
>> > + *     of code in a C file and the order of objects through a Makefile. Custom
>> > + *     order-levels can be used when order on C file and order of objects on
>> > + *     Makfiles does not suffice or much further refinements are needed.
>>
>>
>> Assuming I understand this correctly, here's how I'd write it:
>>
>> @order_level: The temporal order of this x86_init_fn.  Lower
>> order_level numbers are called first.  Ties are broken by order found
>> in the Makefile and then by order in the C file.
>
> Thanks... yes.
>
>>
>> Note, however, that my proposed explanation can't be right because it
>> appears to conflict with "depend".  Please adjust accordingly.
>>
>> > + * @supp_hardware_subarch: must be set, it represents the bitmask of supported
>> > + *     subarchitectures.  We require each struct x86_init_fn to have this set
>> > + *     to require developer considerations for each supported x86
>> > + *     subarchitecture and to build strong annotations of different possible
>> > + *     run time states particularly in consideration for the two main
>> > + *     different entry points for x86 Linux, to account for paravirtualization
>> > + *     yielding.
>> '
>> Too much motivation, too little documentation.
>>
>> @supp_hardware_subarch: A bitmask of subarchitectures on which to call
>> this init function.
>
> OK thanks.
>

>
>> > + *
>> > + * @detect: optional, if set returns true if the feature has been detected to
>> > + *     be required, it returns false if the feature has been detected to not
>> > + *     be required.
>>
>> I have absolutely no idea what this means.
>
> Well, so the order level stuff is used at link time, there however are some
> run time environment conditions that might be important to consider as well,
> the detect lets you write a C code routine which will check to see if the
> feature should run, it returns false if the feature should not run or is
> not required to run.

So how is:

static bool detect_foo(void) { return whatever; }
statis void init_foo(void) { do something; }

.detect = detect_foo,
.early_init = init_foo,

different from:

static void init_foo(void)
{
  if (whatever) do something;
}

.early_init = init_foo,


>
> A cheesy way to implement a cheesy graph is to then use the same callback
> as a pointer for dependency, so that if a feature depends on another, both
> are on the same order level, and if they have a run time dependency
> annotation, they can further annotate this relationship by having the
> thing that depends, set its depends pointer to refer to the other
> feature's detect callback.
>
> This is borrowed from the IOMMU init code in the kernel. Same logic.
> Only thing is we also get link order support too. Link order semantics
> and logic should suffice for most things, but not all as with the IOMMU
> init stuff.

But this doesn't work at all if .detect isn't set.

>> And what happens if the depend-implied order is
>> inconsistent with order_level.
>
> Ah -- that's what I was trying to explain above in the comment
> about inconsistent state -- a run time check *after* we do run
> time sorting goes through and checks for sanity.
>
> All this is a light weight feature graph, if you will, other
> folks are working on different solutions for this and while
> I have considered sharing a solution for early init code this
> seems a bit more ideal for now. Later in the future though we
> should probably get together and talk about a proper feature
> graph, with more consistent checks, etc. Its another reason
> why I am also curious to see a SAT solver at run time in the
> future, but this is super long term [0]...
>
> [0] http://kernelnewbies.org/KernelProjects/kconfig-sat
>
>> I would hope that order_level breaks ties in the topological sort and
>> that there's a bit fat warning if the order_level ordering is
>> inconsistent with the topological ordering.  Preferably the warning
>> would be at build time, in which case it could be an error.
>
> Order level will always be respected at link time. The depend
> and detect thing are heuristics used at run time and since
> it depends on run time state we can't do this at build time.
> The big fat hairy nasty warning is there though. Curious
> enough I ended up finding out that the sanity check for
> IOMMU was intended to only be run at debug time but the
> debug thing was *always* set to true, so we've been running
> the sanity checker in the kernel for ages. Its why I also
> feel a bit more comfortable in keeping it live and matching
> more close to its implementation. I've gone to pains to
> build off of the IOMMU init solution and simply extend
> on the semantics there. Turns out there was much room for
> enhancements. All the fine tuning are documented in the
> userspace tree.

So is this what you're saying:

Callbacks are called in an order consistent with the dependencies
specified by @depend.  Ties are broken first by @order_level and then
by linker order.  It is an error for an x86_init_fn with lower
@order_level to depend on an x86_init_fn with higher @order_level.

>
>> > + * @early_init: required, routine which will run in x86_64_start_reservations()
>> > + *     after we ensure boot_params.hdr.hardware_subarch is accessible and
>> > + *     properly set. Memory is not yet available. This the earliest we can
>> > + *     currently define a common shared callback since all callbacks need to
>> > + *     check for boot_params.hdr.hardware_subarch and this becomes accessible
>> > + *     on x86-64 until after load_idt().
>>
>> What's this for?  Under what conditions is it called?  What order?
>> Why is it part of x86_init_fn at all?
>
> Its one of the reasons we are adding this, callbacks to for features.
> Its called if the its requirements are met (subarch, and depends).
> I'll work on extending this to be clearer.

I think this should just be ".callback".  Let the context by implied
by the linker table in which it appears.
.
>
>> > + */
>> > +struct x86_init_fn {
>> > +       __u32 order_level;
>> > +       __u32 supp_hardware_subarch;
>> > +       bool (*detect)(void);
>> > +       bool (*depend)(void);
>> > +       void (*early_init)(void);
>> > +       __u32 flags;
>> > +};
>> > +
>> > +/**
>> > + * enum x86_init_fn_flags: flags for init sequences
>> > + *
>> > + * X86_INIT_FINISH_IF_DETECTED: tells the core that once this init sequence
>> > + *     has completed it can break out of the loop for init sequences on
>> > + *     its own level.
>>
>> What does this mean?
>
> It borrows the logic from the IOMMU stuff, it enables the core to detect
> if *any* routine in the linker table has completed we can then now just
> bail and stop going through the rest of the linker table.

Can you give an example for which this would be used?

--Andy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v2 2/6] x86/init: use linker tables to simplify x86 init and annotate dependencies
  2016-02-20  7:30       ` Andy Lutomirski
@ 2016-02-27  0:19         ` Luis R. Rodriguez
  0 siblings, 0 replies; 13+ messages in thread
From: Luis R. Rodriguez @ 2016-02-27  0:19 UTC (permalink / raw)
  To: Andy Lutomirski, Konrad Rzeszutek Wilk, Rafael J. Wysocki,
	Mauro Carvalho Chehab, vegard.nossum, Joerg Roedel
  Cc: Luis R. Rodriguez, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, X86 ML, linux-kernel, Boris Ostrovsky,
	Rusty Russell, David Vrabel, Michael Brown, Juergen Gross,
	Andy Shevchenko, Xen Devel

On Fri, Feb 19, 2016 at 11:30:43PM -0800, Andy Lutomirski wrote:
> On Fri, Feb 19, 2016 at 4:42 PM, Luis R. Rodriguez <mcgrof@kernl.org> wrote:
> > On Fri, Feb 19, 2016 at 08:40:49AM -0800, Andy Lutomirski wrote:
> >> On Fri, Feb 19, 2016 at 6:15 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> >> And here, I don't even know what a "feature" is.
> >
> > Kasan is an example.
> 
> Is kasan different from the cr4 shadow in this context?  I don't
> really feel like calling the cr4 shadow a "feature" is meaningful.  Am
> I misunderstanding something?

Its an early boot call that we can share with the different boot entries,
or that each boot entry must call. Kasan is different in that is has a series
of requirements for each boot entry, it needs an early boot call, then one
at setup_arch(). In the context of how I'm implementing callbacks just as
an example, cr4 shadow would just have one callback, whereas kasan could
have an early_init() callback and a setup_arch() callback.

> >> > + * A generic x86 feature can have different initialization calls, one on each
> >> > + * of the different main x86 init sequences, but must also address both entry
> >> > + * points in order to work properly across the board on all supported x86
> >> > + * subarchitectures. Since x86 features can also have dependencies on other
> >> > + * setup code or features, x86 features can at times be subordinate to other
> >> > + * x86 features, or conditions. struct x86_init_fn enables feature developers
> >> > + * to annotate dependency relationships to ensure subsequent init calls only
> >> > + * run once a subordinate's dependencies have run. When needed custom
> >> > + * dependency requirements can also be spelled out through a custom dependency
> >> > + * checker. In order to account for the dual entry point nature of x86-64 Linux
> >> > + * for "paravirtualization yielding" and to make annotations for support for
> >> > + * these explicit each struct x86_init_fn must specify supported
> >> > + * subarchitectures. The earliest x86-64 code can read the subarchitecture
> >> > + * though is after load_idt(), as such the earliest we can currently rely on
> >> > + * subarchitecture for semantics and a common init sequences is on the shared
> >> > + * common x86_64_start_reservations().  Each struct x86_init_fn must also
> >> > + * declare a two-digit decimal number to impose an ordering relative to other
> >> > + * features when required.
> >>
> >> I'm totally lost in the paragraph above.
> >
> > As an example Kasan could be defined as a struct x86_init_fn with a respective
> > callback on x86_64_start_kernel() (kasan_early_init()) and later setup_arch()
> > (kasan_init()). Since we can't support for struct x86_init_fn calls to start all
> > the way from x86_64_start_kernel() (given the issues I had noted before)
> > the earliest a struct x86_init_fn can be used for is for a callback on
> > x86_64_start_reservations().
> 
> I'm still lost.  Kasan is a pile of code that does something.  Kasan
> has some initialization code.  Do you mean that kasan's initialization
> code could be encapsulated in one or more x86_init_fn instances?

Yup.

Kasan has for instance:

arch/x86/kernel/head64.c:

asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
{
	...
	kasan_early_init()
	...
}

Then on arch/x86/kernel/setup.c:

void __init setup_arch(char **cmdline_p)
{
	...
	kasan_init();
	...
}

With x86_init_fn we'd have something like within arch/x86/mm/kasan_init_64.c:

static LINKTABLE_INIT_DATA(x86_init_fns, X86_INIT_ORDER_EARLY)
	__x86_init_fn_kasan_early_init = {
	.order_level = X86_INIT_ORDER_EARLY,
	.supp_hardware_subarch = BIT(X86_SUBARCH_PC),
	.early_init = kasan_early_init,
	.setup_arch = kasan_init,
}; 

Only we'd wrap this in a macro wrapper to make it look something like:

x86_init_early_pc_setup(kasan_early_init, kasan_init);

With that in place we can remove the explicit kasan_init() call on setup.c
and kasan_early_init() in head64.c as x86_64_start_kernel() would have
a single x86_init_fn_early_init(); and setup_arch() would have something
like x86_init_fn_setup_arch() -- each of those would iterate over the
x86_init_fns linker table, and call their respective callback. Please
note though that since subarch is currently only readable until after
load_idt() it means though that the we can't fold kasan into this
framework yet, its why I placed x86_init_fn_early_init() until
x86_64_start_reservations().

The proactive measure here would be that if a developer were to add a feature
similar in nature to Kasan they'd have to think about what subarch they
support and vet / ensure they work, and if they can't get it to work at
least they'd need to design it in a way that enables it to be disabled
at run time. Right now Kasan cannot be disabled at run time.

> > It just means we can different callbacks used for different parts of the
> > x86 init process, it explains the limits to the callbacks we can have right now.
> 
> Do you mean that there would be different linker tables containing
> x86_init_fn instances for each of these callback points?  Or is
> "early_init" the only such callback point.

No we'd have only one linker table: x86_init_fns, and in it would be
structs of type struct x86_init_fn. We can iterate over the linker
the table anywhere, in our case since we're using it to help with
the init process, and since x86 features can have calls in different
parts of the x86 init process we can just iterate over the linker table
during these difference key places and have it call the needed callback
for each place.

So x86_init_fn_early_init() would call early_init(). Its the only callback
we have right now. Later, we may add a setup_arch() callback and then
place a x86_init_fn_setup_arch() on setup_arch() to remove clutter there.

> BTW, why is early_init called early_init?  Why not call it "callback"?
> Then you could reuse this for other things.

Because its trying to make it clear where in the x86 init path its going
to be called, in this case today the earliest I can get subarch to be
read is x86_64_start_reservations().

> >> > + * @detect: optional, if set returns true if the feature has been detected to
> >> > + *     be required, it returns false if the feature has been detected to not
> >> > + *     be required.
> >>
> >> I have absolutely no idea what this means.
> >
> > Well, so the order level stuff is used at link time, there however are some
> > run time environment conditions that might be important to consider as well,
> > the detect lets you write a C code routine which will check to see if the
> > feature should run, it returns false if the feature should not run or is
> > not required to run.
> 
> So how is:
> 
> static bool detect_foo(void) { return whatever; }
> statis void init_foo(void) { do something; }
> 
> .detect = detect_foo,
> .early_init = init_foo,
> 
> different from:
> 
> static void init_foo(void)
> {
>   if (whatever) do something;
> }
> 
> .early_init = init_foo,

Its not, but the key here is not @detect but rather the pair of both @detect
and @depend, the @depend would refer to a @detect routine from another
"feature". We then just run time sort things to kick off in the right order --
this would help for cases the linker sort would not suffice (things which can
only be determined at run time).  The run time sort was added for IOMMU
originally to handle a complex dependency chain which could only be determined
at run time. Instead of sprinkling checks all over we have the dependencies
clearly annotated with this simple structure and have a run time sort, sort the
init sequences as they need to happen. Its important to not only ensure order
but also to ensure things that *should not run* don't run.

Refer to the implementation in arch/x86/kernel/sort-init.c which is
heavily borrowed from the existing IOMMU run time sort we do to handle
the run time dependencies there. The reason we can't share sort between
x86 and IOMMU is the structs vary slightly, so the structs are of
different size and the semantics may end up being very different. Lastly
if a bug in sort / semantics in IOMMU we could end up regressing x86
init as well. We can't possibly assume we can share the same semantics
across the board for all users of linker tables. In this case its best to
just keep these linker tables separately and enable their own semantics to
be defined independently.

> > A cheesy way to implement a cheesy graph is to then use the same callback
> > as a pointer for dependency, so that if a feature depends on another, both
> > are on the same order level, and if they have a run time dependency
> > annotation, they can further annotate this relationship by having the
> > thing that depends, set its depends pointer to refer to the other
> > feature's detect callback.
> >
> > This is borrowed from the IOMMU init code in the kernel. Same logic.
> > Only thing is we also get link order support too. Link order semantics
> > and logic should suffice for most things, but not all as with the IOMMU
> > init stuff.
> 
> But this doesn't work at all if .detect isn't set.

I've changed the semantics for x86. So for x86 this is optional, if
detect is null we ignore it. Likewise if depend is null, we ignore it
for sorting purposes.

> >> And what happens if the depend-implied order is
> >> inconsistent with order_level.
> >
> > Ah -- that's what I was trying to explain above in the comment
> > about inconsistent state -- a run time check *after* we do run
> > time sorting goes through and checks for sanity.
> >
> > All this is a light weight feature graph, if you will, other
> > folks are working on different solutions for this and while
> > I have considered sharing a solution for early init code this
> > seems a bit more ideal for now. Later in the future though we
> > should probably get together and talk about a proper feature
> > graph, with more consistent checks, etc. Its another reason
> > why I am also curious to see a SAT solver at run time in the
> > future, but this is super long term [0]...
> >
> > [0] http://kernelnewbies.org/KernelProjects/kconfig-sat
> >
> >> I would hope that order_level breaks ties in the topological sort and
> >> that there's a bit fat warning if the order_level ordering is
> >> inconsistent with the topological ordering.  Preferably the warning
> >> would be at build time, in which case it could be an error.
> >
> > Order level will always be respected at link time. The depend
> > and detect thing are heuristics used at run time and since
> > it depends on run time state we can't do this at build time.
> > The big fat hairy nasty warning is there though. Curious
> > enough I ended up finding out that the sanity check for
> > IOMMU was intended to only be run at debug time but the
> > debug thing was *always* set to true, so we've been running
> > the sanity checker in the kernel for ages. Its why I also
> > feel a bit more comfortable in keeping it live and matching
> > more close to its implementation. I've gone to pains to
> > build off of the IOMMU init solution and simply extend
> > on the semantics there. Turns out there was much room for
> > enhancements. All the fine tuning are documented in the
> > userspace tree.
> 
> So is this what you're saying:
> 
> Callbacks are called in an order consistent with the dependencies
> specified by @depend.

No, @depend just helps with run time sorting the linker table so things that
need to go first which require run time semantics get tucked in first. This
should help with cruft checks, waiting on dependencies, so it should speed
up the process as well. For this x86 init stuff there will be one callback
per each major x86 init path, we'll start off with x86_init_fn_early_init()
which is the earliest we can get the subarch.

> Ties are broken first by @order_level and then
> by linker order.

Right.

> It is an error for an x86_init_fn with lower
> @order_level to depend on an x86_init_fn with higher @order_level.

Right. Another way to say this:

If p @depends on q, q's order level must be <= than p's as it should run first.

Note that technically if we didn't want this check we could remove it,
I tested that it would force the run time sort to then override the
order level, but that make semantics sloppy, so I've pegged this
as a requirement in definition and built in a checker for it. This is
in x86_init_fn_check().

> >> > + * @early_init: required, routine which will run in x86_64_start_reservations()
> >> > + *     after we ensure boot_params.hdr.hardware_subarch is accessible and
> >> > + *     properly set. Memory is not yet available. This the earliest we can
> >> > + *     currently define a common shared callback since all callbacks need to
> >> > + *     check for boot_params.hdr.hardware_subarch and this becomes accessible
> >> > + *     on x86-64 until after load_idt().
> >>
> >> What's this for?  Under what conditions is it called?  What order?
> >> Why is it part of x86_init_fn at all?
> >
> > Its one of the reasons we are adding this, callbacks to for features.
> > Its called if the its requirements are met (subarch, and depends).
> > I'll work on extending this to be clearer.
> 
> I think this should just be ".callback".  Let the context by implied
> by the linker table in which it appears.

The thing is we want to be specific and allow one struct per main
x86 init path major call site, this lets you have one struct for
Kasan for instance, and you have semantics well spelled out.

> >
> >> > + */
> >> > +struct x86_init_fn {
> >> > +       __u32 order_level;
> >> > +       __u32 supp_hardware_subarch;
> >> > +       bool (*detect)(void);
> >> > +       bool (*depend)(void);
> >> > +       void (*early_init)(void);
> >> > +       __u32 flags;
> >> > +};
> >> > +
> >> > +/**
> >> > + * enum x86_init_fn_flags: flags for init sequences
> >> > + *
> >> > + * X86_INIT_FINISH_IF_DETECTED: tells the core that once this init sequence
> >> > + *     has completed it can break out of the loop for init sequences on
> >> > + *     its own level.
> >>
> >> What does this mean?
> >
> > It borrows the logic from the IOMMU stuff, it enables the core to detect
> > if *any* routine in the linker table has completed we can then now just
> > bail and stop going through the rest of the linker table.
> 
> Can you give an example for which this would be used?

I'm going to rip this out.

I know this will be useful but since I can't see anything obvious right now
that simple link order won't address on early boot I'm going to rip this out
in the next series. If we find we may need it later we can consider it later.
Such a sort run time sort thing might prove more useful later in boot where
asynchronous behaviour is more prevalent. Folks working in higher layers
may find more use for it.

  Luis

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-02-27  0:19 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-19 14:15 [RFC v2 0/6] x86/init: use linker table Luis R. Rodriguez
2016-02-19 14:15 ` [RFC v2 1/6] x86/boot: add BIT() to boot/bitops.h Luis R. Rodriguez
2016-02-19 15:14   ` Boris Ostrovsky
2016-02-19 21:03     ` Luis R. Rodriguez
2016-02-19 14:15 ` [RFC v2 2/6] x86/init: use linker tables to simplify x86 init and annotate dependencies Luis R. Rodriguez
2016-02-19 16:40   ` Andy Lutomirski
2016-02-20  0:42     ` Luis R. Rodriguez
2016-02-20  7:30       ` Andy Lutomirski
2016-02-27  0:19         ` Luis R. Rodriguez
2016-02-19 14:15 ` [RFC v2 3/6] x86/init: move ebda reservations into linker table Luis R. Rodriguez
2016-02-19 14:15 ` [RFC v2 4/6] x86/init: use linker table for i386 early setup Luis R. Rodriguez
2016-02-19 14:15 ` [RFC v2 5/6] x86/init: user linker table for ce4100 " Luis R. Rodriguez
2016-02-19 14:15 ` [RFC v2 6/6] x86/init: use linker table for mid " Luis R. Rodriguez

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).