* [PATCH 4.4 00/37] 4.4.110-stable review
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC
  To: linux-kernel
  Cc: Greg Kroah-Hartman, torvalds, akpm, linux, shuahkh, patches,
	ben.hutchings, lkft-triage, stable

This is the start of the stable review cycle for the 4.4.110 release.
There are 37 patches in this series; all will be posted as responses
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Fri Jan  5 19:50:38 UTC 2018.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.110-rc1.gz
or in the git tree and branch at:
  git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Linux 4.4.110-rc1

Kees Cook <keescook@chromium.org>
    KPTI: Report when enabled

Kees Cook <keescook@chromium.org>
    KPTI: Rename to PAGE_TABLE_ISOLATION

Borislav Petkov <bp@suse.de>
    x86/kaiser: Move feature detection up

Jiri Kosina <jkosina@suse.cz>
    kaiser: disabled on Xen PV

Borislav Petkov <bp@suse.de>
    x86/kaiser: Reenable PARAVIRT

Thomas Gleixner <tglx@linutronix.de>
    x86/paravirt: Dont patch flush_tlb_single

Hugh Dickins <hughd@google.com>
    kaiser: kaiser_flush_tlb_on_return_to_user() check PCID

Hugh Dickins <hughd@google.com>
    kaiser: asm/tlbflush.h handle noPGE at lower level

Hugh Dickins <hughd@google.com>
    kaiser: drop is_atomic arg to kaiser_pagetable_walk()

Hugh Dickins <hughd@google.com>
    kaiser: use ALTERNATIVE instead of x86_cr3_pcid_noflush

Borislav Petkov <bp@suse.de>
    x86/kaiser: Check boottime cmdline params

Borislav Petkov <bp@suse.de>
    x86/kaiser: Rename and simplify X86_FEATURE_KAISER handling

Hugh Dickins <hughd@google.com>
    kaiser: add "nokaiser" boot option, using ALTERNATIVE

Hugh Dickins <hughd@google.com>
    kaiser: fix unlikely error in alloc_ldt_struct()

Hugh Dickins <hughd@google.com>
    kaiser: _pgd_alloc() without __GFP_REPEAT to avoid stalls

Hugh Dickins <hughd@google.com>
    kaiser: paranoid_entry pass cr3 need to paranoid_exit

Hugh Dickins <hughd@google.com>
    kaiser: x86_cr3_pcid_noflush and x86_cr3_pcid_user

Hugh Dickins <hughd@google.com>
    kaiser: PCID 0 for kernel and 128 for user

Hugh Dickins <hughd@google.com>
    kaiser: load_new_mm_cr3() let SWITCH_USER_CR3 flush user

Dave Hansen <dave.hansen@linux.intel.com>
    kaiser: enhanced by kernel and user PCIDs

Hugh Dickins <hughd@google.com>
    kaiser: vmstat show NR_KAISERTABLE as nr_overhead

Hugh Dickins <hughd@google.com>
    kaiser: delete KAISER_REAL_SWITCH option

Hugh Dickins <hughd@google.com>
    kaiser: name that 0x1000 KAISER_SHADOW_PGD_OFFSET

Hugh Dickins <hughd@google.com>
    kaiser: cleanups while trying for gold link

Hugh Dickins <hughd@google.com>
    kaiser: kaiser_remove_mapping() move along the pgd

Hugh Dickins <hughd@google.com>
    kaiser: tidied up kaiser_add/remove_mapping slightly

Hugh Dickins <hughd@google.com>
    kaiser: tidied up asm/kaiser.h somewhat

Hugh Dickins <hughd@google.com>
    kaiser: ENOMEM if kaiser_pagetable_walk() NULL

Hugh Dickins <hughd@google.com>
    kaiser: fix perf crashes

Hugh Dickins <hughd@google.com>
    kaiser: fix regs to do_nmi() ifndef CONFIG_KAISER

Hugh Dickins <hughd@google.com>
    kaiser: KAISER depends on SMP

Hugh Dickins <hughd@google.com>
    kaiser: fix build and FIXME in alloc_ldt_struct()

Hugh Dickins <hughd@google.com>
    kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE

Hugh Dickins <hughd@google.com>
    kaiser: do not set _PAGE_NX on pgd_none

Dave Hansen <dave.hansen@linux.intel.com>
    kaiser: merged update

Richard Fellner <richard.fellner@student.tugraz.at>
    KAISER: Kernel Address Isolation

Tom Lendacky <thomas.lendacky@amd.com>
    x86/boot: Add early cmdline parsing for options with arguments


-------------

Diffstat:

 Documentation/kernel-parameters.txt         |   8 +
 Makefile                                    |   4 +-
 arch/x86/boot/compressed/misc.h             |   1 +
 arch/x86/entry/entry_64.S                   | 164 ++++++++--
 arch/x86/entry/entry_64_compat.S            |   7 +
 arch/x86/include/asm/cmdline.h              |   2 +
 arch/x86/include/asm/cpufeature.h           |   4 +
 arch/x86/include/asm/desc.h                 |   2 +-
 arch/x86/include/asm/hw_irq.h               |   2 +-
 arch/x86/include/asm/kaiser.h               | 141 +++++++++
 arch/x86/include/asm/pgtable.h              |  28 +-
 arch/x86/include/asm/pgtable_64.h           |  25 +-
 arch/x86/include/asm/pgtable_types.h        |  29 +-
 arch/x86/include/asm/processor.h            |   2 +-
 arch/x86/include/asm/tlbflush.h             |  74 ++++-
 arch/x86/include/uapi/asm/processor-flags.h |   3 +-
 arch/x86/kernel/cpu/common.c                |  28 +-
 arch/x86/kernel/cpu/perf_event_intel_ds.c   |  57 +++-
 arch/x86/kernel/espfix_64.c                 |  10 +
 arch/x86/kernel/head_64.S                   |  35 ++-
 arch/x86/kernel/irqinit.c                   |   2 +-
 arch/x86/kernel/ldt.c                       |  25 +-
 arch/x86/kernel/paravirt_patch_64.c         |   2 -
 arch/x86/kernel/process.c                   |   2 +-
 arch/x86/kernel/setup.c                     |   7 +
 arch/x86/kernel/tracepoint.c                |   2 +
 arch/x86/kvm/x86.c                          |   3 +-
 arch/x86/lib/cmdline.c                      | 105 +++++++
 arch/x86/mm/Makefile                        |   1 +
 arch/x86/mm/init.c                          |   2 +-
 arch/x86/mm/init_64.c                       |  10 +
 arch/x86/mm/kaiser.c                        | 455 ++++++++++++++++++++++++++++
 arch/x86/mm/pageattr.c                      |  63 +++-
 arch/x86/mm/pgtable.c                       |  16 +-
 arch/x86/mm/tlb.c                           |  39 ++-
 include/asm-generic/vmlinux.lds.h           |   7 +
 include/linux/kaiser.h                      |  52 ++++
 include/linux/mmzone.h                      |   3 +-
 include/linux/percpu-defs.h                 |  32 +-
 init/main.c                                 |   2 +
 kernel/fork.c                               |   6 +
 mm/vmstat.c                                 |   1 +
 security/Kconfig                            |  10 +
 43 files changed, 1375 insertions(+), 98 deletions(-)


* [PATCH 4.4 01/37] x86/boot: Add early cmdline parsing for options with arguments
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Tom Lendacky, Thomas Gleixner,
	Alexander Potapenko, Andrey Ryabinin, Andy Lutomirski,
	Arnd Bergmann, Borislav Petkov, Brijesh Singh, Dave Young,
	Dmitry Vyukov, Jonathan Corbet, Konrad Rzeszutek Wilk,
	Larry Woodman, Linus Torvalds, Matt Fleming, Michael S. Tsirkin,
	Paolo Bonzini, Peter Zijlstra, Radim Krčmář,
	Rik van Riel, Toshimitsu Kani, kasan-dev, kvm, linux-arch,
	linux-doc, linux-efi, linux-mm, Ingo Molnar

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Tom Lendacky <thomas.lendacky@amd.com>

commit e505371dd83963caae1a37ead9524e8d997341be upstream.

Add a cmdline_find_option() function to look for cmdline options that
take arguments. The argument is returned in a supplied buffer and the
argument length (regardless of whether it fits in the supplied buffer)
is returned, with -1 indicating not found.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Toshimitsu Kani <toshi.kani@hpe.com>
Cc: kasan-dev@googlegroups.com
Cc: kvm@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/36b5f97492a9745dce27682305f990fc20e5cf8a.1500319216.git.thomas.lendacky@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/include/asm/cmdline.h |    2 
 arch/x86/lib/cmdline.c         |  105 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 107 insertions(+)

--- a/arch/x86/include/asm/cmdline.h
+++ b/arch/x86/include/asm/cmdline.h
@@ -2,5 +2,7 @@
 #define _ASM_X86_CMDLINE_H
 
 int cmdline_find_option_bool(const char *cmdline_ptr, const char *option);
+int cmdline_find_option(const char *cmdline_ptr, const char *option,
+			char *buffer, int bufsize);
 
 #endif /* _ASM_X86_CMDLINE_H */
--- a/arch/x86/lib/cmdline.c
+++ b/arch/x86/lib/cmdline.c
@@ -82,3 +82,108 @@ int cmdline_find_option_bool(const char
 
 	return 0;	/* Buffer overrun */
 }
+
+/*
+ * Find a non-boolean option (i.e. option=argument). In accordance with
+ * standard Linux practice, if this option is repeated, this returns the
+ * last instance on the command line.
+ *
+ * @cmdline: the cmdline string
+ * @max_cmdline_size: the maximum size of cmdline
+ * @option: option string to look for
+ * @buffer: memory buffer to return the option argument
+ * @bufsize: size of the supplied memory buffer
+ *
+ * Returns the length of the argument (regardless of if it was
+ * truncated to fit in the buffer), or -1 on not found.
+ */
+static int
+__cmdline_find_option(const char *cmdline, int max_cmdline_size,
+		      const char *option, char *buffer, int bufsize)
+{
+	char c;
+	int pos = 0, len = -1;
+	const char *opptr = NULL;
+	char *bufptr = buffer;
+	enum {
+		st_wordstart = 0,	/* Start of word/after whitespace */
+		st_wordcmp,	/* Comparing this word */
+		st_wordskip,	/* Miscompare, skip */
+		st_bufcpy,	/* Copying this to buffer */
+	} state = st_wordstart;
+
+	if (!cmdline)
+		return -1;      /* No command line */
+
+	/*
+	 * This 'pos' check ensures we do not overrun
+	 * a non-NULL-terminated 'cmdline'
+	 */
+	while (pos++ < max_cmdline_size) {
+		c = *(char *)cmdline++;
+		if (!c)
+			break;
+
+		switch (state) {
+		case st_wordstart:
+			if (myisspace(c))
+				break;
+
+			state = st_wordcmp;
+			opptr = option;
+			/* fall through */
+
+		case st_wordcmp:
+			if ((c == '=') && !*opptr) {
+				/*
+				 * We matched all the way to the end of the
+				 * option we were looking for, prepare to
+				 * copy the argument.
+				 */
+				len = 0;
+				bufptr = buffer;
+				state = st_bufcpy;
+				break;
+			} else if (c == *opptr++) {
+				/*
+				 * We are currently matching, so continue
+				 * to the next character on the cmdline.
+				 */
+				break;
+			}
+			state = st_wordskip;
+			/* fall through */
+
+		case st_wordskip:
+			if (myisspace(c))
+				state = st_wordstart;
+			break;
+
+		case st_bufcpy:
+			if (myisspace(c)) {
+				state = st_wordstart;
+			} else {
+				/*
+				 * Increment len, but don't overrun the
+				 * supplied buffer and leave room for the
+				 * NULL terminator.
+				 */
+				if (++len < bufsize)
+					*bufptr++ = c;
+			}
+			break;
+		}
+	}
+
+	if (bufsize)
+		*bufptr = '\0';
+
+	return len;
+}
+
+int cmdline_find_option(const char *cmdline, const char *option, char *buffer,
+			int bufsize)
+{
+	return __cmdline_find_option(cmdline, COMMAND_LINE_SIZE, option,
+				     buffer, bufsize);
+}
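
The return-value contract described in the commit message (full argument length even when truncated, -1 when not found, last instance wins) can be tried out in user space. What follows is a minimal sketch, not the kernel code itself: find_option() is a hypothetical user-space port of the loop in the patch, myisspace() is assumed to treat any byte <= ' ' as whitespace, and the command lines in self_check() are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Assumption: whitespace is approximated as any byte <= ' '. */
static int myisspace(unsigned char c) { return c <= ' '; }

/*
 * Hypothetical user-space port of the parsing loop from the patch.
 * Returns the full argument length (even if truncated), or -1 if absent.
 */
static int find_option(const char *cmdline, int max_size,
		       const char *option, char *buffer, int bufsize)
{
	enum { WORDSTART, WORDCMP, WORDSKIP, BUFCPY } state = WORDSTART;
	const char *opptr = NULL;
	char *bufptr = buffer;
	int pos = 0, len = -1;
	char c;

	if (!cmdline)
		return -1;

	/* Stop at max_size bytes or at the terminating NUL. */
	while (pos++ < max_size && (c = *cmdline++) != '\0') {
		switch (state) {
		case WORDSTART:
			if (myisspace(c))
				break;
			state = WORDCMP;
			opptr = option;
			/* fall through */
		case WORDCMP:
			if (c == '=' && !*opptr) {
				/* Full option name matched: restart the copy. */
				len = 0;
				bufptr = buffer;
				state = BUFCPY;
				break;
			}
			if (c == *opptr++)
				break;
			state = WORDSKIP;
			/* fall through */
		case WORDSKIP:
			if (myisspace(c))
				state = WORDSTART;
			break;
		case BUFCPY:
			if (myisspace(c))
				state = WORDSTART;
			else if (++len < bufsize)
				*bufptr++ = c;	/* keeps one byte free for the NUL */
			break;
		}
	}

	if (bufsize)
		*bufptr = '\0';
	return len;
}

/* Exercises the documented return values with made-up command lines. */
static void self_check(void)
{
	const char *cl = "ro mem=512M quiet mem=2G memmap=1";
	char buf[8], tiny[3];

	/* Repeated option: the last instance wins; "memmap" is not "mem". */
	assert(find_option(cl, 256, "mem", buf, sizeof(buf)) == 2);
	assert(strcmp(buf, "2G") == 0);

	/* Truncation: full length is returned, buffer stays terminated. */
	assert(find_option("root=/dev/sda1", 256, "root", tiny, sizeof(tiny)) == 9);
	assert(strcmp(tiny, "/d") == 0);

	/* A word without '=' is not reported as an option with an argument. */
	assert(find_option(cl, 256, "quiet", buf, sizeof(buf)) == -1);
}
```

Calling self_check() from a small main() runs the three cases. A caller that cares about truncation can compare the return value against its buffer size: a result greater than or equal to bufsize means the argument did not fit.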

* [PATCH 4.4 02/37] KAISER: Kernel Address Isolation
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC
  To: linux-kernel, kernel-hardening
  Cc: Greg Kroah-Hartman, stable, clementine.maurice, moritz.lipp,
	Michael Schwarz, Richard Fellner, Ingo Molnar, kirill.shutemov,
	anders.fogh, Daniel Gruss, Jiri Kosina, Hugh Dickins

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Richard Fellner <richard.fellner@student.tugraz.at>


This patch introduces our implementation of KAISER (Kernel Address Isolation to
have Side-channels Efficiently Removed), a kernel isolation technique to close
hardware side channels on kernel address information.

More information about the patch can be found on:

        https://github.com/IAIK/KAISER

From: Richard Fellner <richard.fellner@student.tugraz.at>
From: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
X-Subject: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
Date: Thu, 4 May 2017 14:26:50 +0200
Link: http://marc.info/?l=linux-kernel&m=149390087310405&w=2
Kaiser-4.10-SHA1: c4b1831d44c6144d3762ccc72f0c4e71a0c713e5

To: <linux-kernel@vger.kernel.org>
To: <kernel-hardening@lists.openwall.com>
Cc: <clementine.maurice@iaik.tugraz.at>
Cc: <moritz.lipp@iaik.tugraz.at>
Cc: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Cc: Richard Fellner <richard.fellner@student.tugraz.at>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <kirill.shutemov@linux.intel.com>
Cc: <anders.fogh@gdata-adan.de>

After several recent works [1,2,3] KASLR on x86_64 was basically
considered dead by many researchers. We have been working on an
efficient but effective fix for this problem and found that not mapping
the kernel space when running in user mode is the solution to this
problem [4] (the corresponding paper [5] will be presented at ESSoS17).

With this RFC patch we allow anybody to configure their kernel with the
flag CONFIG_KAISER to add our defense mechanism.

If there are any questions we would love to answer them.
We also appreciate any comments!

Cheers,
Daniel (+ the KAISER team from Graz University of Technology)

[1] http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
[2] https://www.blackhat.com/docs/us-16/materials/us-16-Fogh-Using-Undocumented-CPU-Behaviour-To-See-Into-Kernel-Mode-And-Break-KASLR-In-The-Process.pdf
[3] https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX.pdf
[4] https://github.com/IAIK/KAISER
[5] https://gruss.cc/files/kaiser.pdf

[patch based also on
https://raw.githubusercontent.com/IAIK/KAISER/master/KAISER/0001-KAISER-Kernel-Address-Isolation.patch]

Signed-off-by: Richard Fellner <richard.fellner@student.tugraz.at>
Signed-off-by: Moritz Lipp <moritz.lipp@iaik.tugraz.at>
Signed-off-by: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
Signed-off-by: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/entry/entry_64.S            |   19 +++-
 arch/x86/entry/entry_64_compat.S     |    6 +
 arch/x86/include/asm/hw_irq.h        |    2 
 arch/x86/include/asm/kaiser.h        |  113 ++++++++++++++++++++++++
 arch/x86/include/asm/pgtable.h       |    4 
 arch/x86/include/asm/pgtable_64.h    |   21 ++++
 arch/x86/include/asm/pgtable_types.h |   12 ++
 arch/x86/include/asm/processor.h     |    7 +
 arch/x86/kernel/cpu/common.c         |    4 
 arch/x86/kernel/espfix_64.c          |    6 +
 arch/x86/kernel/head_64.S            |   16 ++-
 arch/x86/kernel/irqinit.c            |    2 
 arch/x86/kernel/process.c            |    2 
 arch/x86/mm/Makefile                 |    1 
 arch/x86/mm/kaiser.c                 |  160 +++++++++++++++++++++++++++++++++++
 arch/x86/mm/pageattr.c               |    2 
 arch/x86/mm/pgtable.c                |   26 +++++
 include/asm-generic/vmlinux.lds.h    |   11 ++
 include/linux/percpu-defs.h          |   30 ++++++
 init/main.c                          |    6 +
 kernel/fork.c                        |    8 +
 security/Kconfig                     |    7 +
 22 files changed, 449 insertions(+), 16 deletions(-)
 create mode 100644 arch/x86/include/asm/kaiser.h
 create mode 100644 arch/x86/mm/kaiser.c

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -35,6 +35,7 @@
 #include <asm/asm.h>
 #include <asm/smap.h>
 #include <asm/pgtable_types.h>
+#include <asm/kaiser.h>
 #include <linux/err.h>
 
 /* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this.  */
@@ -135,6 +136,7 @@ ENTRY(entry_SYSCALL_64)
 	 * it is too small to ever cause noticeable irq latency.
 	 */
 	SWAPGS_UNSAFE_STACK
+	SWITCH_KERNEL_CR3_NO_STACK
 	/*
 	 * A hypervisor implementation might want to use a label
 	 * after the swapgs, so that it can do the swapgs
@@ -207,9 +209,10 @@ entry_SYSCALL_64_fastpath:
 	testl	$_TIF_ALLWORK_MASK, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
 	jnz	int_ret_from_sys_call_irqs_off	/* Go to the slow path */
 
-	RESTORE_C_REGS_EXCEPT_RCX_R11
 	movq	RIP(%rsp), %rcx
 	movq	EFLAGS(%rsp), %r11
+	RESTORE_C_REGS_EXCEPT_RCX_R11
+	SWITCH_USER_CR3
 	movq	RSP(%rsp), %rsp
 	/*
 	 * 64-bit SYSRET restores rip from rcx,
@@ -347,10 +350,12 @@ GLOBAL(int_ret_from_sys_call)
 syscall_return_via_sysret:
 	/* rcx and r11 are already restored (see code above) */
 	RESTORE_C_REGS_EXCEPT_RCX_R11
+	SWITCH_USER_CR3
 	movq	RSP(%rsp), %rsp
 	USERGS_SYSRET64
 
 opportunistic_sysret_failed:
+	SWITCH_USER_CR3
 	SWAPGS
 	jmp	restore_c_regs_and_iret
 END(entry_SYSCALL_64)
@@ -509,6 +514,7 @@ END(irq_entries_start)
 	 * tracking that we're in kernel mode.
 	 */
 	SWAPGS
+	SWITCH_KERNEL_CR3
 
 	/*
 	 * We need to tell lockdep that IRQs are off.  We can't do this until
@@ -568,6 +574,7 @@ GLOBAL(retint_user)
 	mov	%rsp,%rdi
 	call	prepare_exit_to_usermode
 	TRACE_IRQS_IRETQ
+	SWITCH_USER_CR3
 	SWAPGS
 	jmp	restore_regs_and_iret
 
@@ -625,6 +632,7 @@ native_irq_return_ldt:
 	pushq	%rax
 	pushq	%rdi
 	SWAPGS
+	SWITCH_KERNEL_CR3
 	movq	PER_CPU_VAR(espfix_waddr), %rdi
 	movq	%rax, (0*8)(%rdi)		/* RAX */
 	movq	(2*8)(%rsp), %rax		/* RIP */
@@ -640,6 +648,7 @@ native_irq_return_ldt:
 	andl	$0xffff0000, %eax
 	popq	%rdi
 	orq	PER_CPU_VAR(espfix_stack), %rax
+	SWITCH_USER_CR3
 	SWAPGS
 	movq	%rax, %rsp
 	popq	%rax
@@ -1007,6 +1016,7 @@ ENTRY(paranoid_entry)
 	testl	%edx, %edx
 	js	1f				/* negative -> in kernel */
 	SWAPGS
+	SWITCH_KERNEL_CR3
 	xorl	%ebx, %ebx
 1:	ret
 END(paranoid_entry)
@@ -1029,6 +1039,7 @@ ENTRY(paranoid_exit)
 	testl	%ebx, %ebx			/* swapgs needed? */
 	jnz	paranoid_exit_no_swapgs
 	TRACE_IRQS_IRETQ
+	SWITCH_USER_CR3_NO_STACK
 	SWAPGS_UNSAFE_STACK
 	jmp	paranoid_exit_restore
 paranoid_exit_no_swapgs:
@@ -1058,6 +1069,7 @@ ENTRY(error_entry)
 	 * from user mode due to an IRET fault.
 	 */
 	SWAPGS
+	SWITCH_KERNEL_CR3
 
 .Lerror_entry_from_usermode_after_swapgs:
 	/*
@@ -1110,7 +1122,7 @@ ENTRY(error_entry)
 	 * Switch to kernel gsbase:
 	 */
 	SWAPGS
-
+	SWITCH_KERNEL_CR3
 	/*
 	 * Pretend that the exception came from user mode: set up pt_regs
 	 * as if we faulted immediately after IRET and clear EBX so that
@@ -1210,6 +1222,7 @@ ENTRY(nmi)
 	 */
 
 	SWAPGS_UNSAFE_STACK
+	SWITCH_KERNEL_CR3_NO_STACK
 	cld
 	movq	%rsp, %rdx
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
@@ -1250,6 +1263,7 @@ ENTRY(nmi)
 	 * work, because we don't want to enable interrupts.  Fortunately,
 	 * do_nmi doesn't modify pt_regs.
 	 */
+	SWITCH_USER_CR3
 	SWAPGS
 	jmp	restore_c_regs_and_iret
 
@@ -1461,6 +1475,7 @@ end_repeat_nmi:
 	testl	%ebx, %ebx			/* swapgs needed? */
 	jnz	nmi_restore
 nmi_swapgs:
+	SWITCH_USER_CR3_NO_STACK
 	SWAPGS_UNSAFE_STACK
 nmi_restore:
 	RESTORE_EXTRA_REGS
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -13,6 +13,7 @@
 #include <asm/irqflags.h>
 #include <asm/asm.h>
 #include <asm/smap.h>
+#include <asm/kaiser.h>
 #include <linux/linkage.h>
 #include <linux/err.h>
 
@@ -50,6 +51,7 @@ ENDPROC(native_usergs_sysret32)
 ENTRY(entry_SYSENTER_compat)
 	/* Interrupts are off on entry. */
 	SWAPGS_UNSAFE_STACK
+	SWITCH_KERNEL_CR3_NO_STACK
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
 	/*
@@ -161,6 +163,7 @@ ENDPROC(entry_SYSENTER_compat)
 ENTRY(entry_SYSCALL_compat)
 	/* Interrupts are off on entry. */
 	SWAPGS_UNSAFE_STACK
+	SWITCH_KERNEL_CR3_NO_STACK
 
 	/* Stash user ESP and switch to the kernel stack. */
 	movl	%esp, %r8d
@@ -208,6 +211,7 @@ ENTRY(entry_SYSCALL_compat)
 	/* Opportunistic SYSRET */
 sysret32_from_system_call:
 	TRACE_IRQS_ON			/* User mode traces as IRQs on. */
+	SWITCH_USER_CR3
 	movq	RBX(%rsp), %rbx		/* pt_regs->rbx */
 	movq	RBP(%rsp), %rbp		/* pt_regs->rbp */
 	movq	EFLAGS(%rsp), %r11	/* pt_regs->flags (in r11) */
@@ -269,6 +273,7 @@ ENTRY(entry_INT80_compat)
 	PARAVIRT_ADJUST_EXCEPTION_FRAME
 	ASM_CLAC			/* Do this early to minimize exposure */
 	SWAPGS
+	SWITCH_KERNEL_CR3_NO_STACK
 
 	/*
 	 * User tracing code (ptrace or signal handlers) might assume that
@@ -311,6 +316,7 @@ ENTRY(entry_INT80_compat)
 
 	/* Go back to user mode. */
 	TRACE_IRQS_ON
+	SWITCH_USER_CR3
 	SWAPGS
 	jmp	restore_regs_and_iret
 END(entry_INT80_compat)
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -187,7 +187,7 @@ extern char irq_entries_start[];
 #define VECTOR_RETRIGGERED	((void *)~0UL)
 
 typedef struct irq_desc* vector_irq_t[NR_VECTORS];
-DECLARE_PER_CPU(vector_irq_t, vector_irq);
+DECLARE_PER_CPU_USER_MAPPED(vector_irq_t, vector_irq);
 
 #endif /* !ASSEMBLY_ */
 
--- /dev/null
+++ b/arch/x86/include/asm/kaiser.h
@@ -0,0 +1,113 @@
+#ifndef _ASM_X86_KAISER_H
+#define _ASM_X86_KAISER_H
+
+/* This file includes the definitions for the KAISER feature.
+ * KAISER is a countermeasure against x86_64 side-channel attacks on the kernel virtual memory.
+ * It keeps a shadow pgd for every process. The shadow pgd maps only a minimal set of kernel pages,
+ * but includes the whole user memory. On entry to the kernel, or when an interrupt is handled,
+ * the pgd is switched to the normal one. When the system switches to user mode, the shadow pgd is enabled.
+ * This way the virtual memory caches (TLB entries) are flushed, and user mode cannot attack the whole kernel memory.
+ *
+ * The minimal kernel mapping holds the parts that must stay mapped in user mode, such as the
+ * entry/exit functions for user space and the stacks.
+ */
+#ifdef __ASSEMBLY__
+#ifdef CONFIG_KAISER
+
+.macro _SWITCH_TO_KERNEL_CR3 reg
+movq %cr3, \reg
+andq $(~0x1000), \reg
+movq \reg, %cr3
+.endm
+
+.macro _SWITCH_TO_USER_CR3 reg
+movq %cr3, \reg
+orq $(0x1000), \reg
+movq \reg, %cr3
+.endm
+
+.macro SWITCH_KERNEL_CR3
+pushq %rax
+_SWITCH_TO_KERNEL_CR3 %rax
+popq %rax
+.endm
+
+.macro SWITCH_USER_CR3
+pushq %rax
+_SWITCH_TO_USER_CR3 %rax
+popq %rax
+.endm
+
+.macro SWITCH_KERNEL_CR3_NO_STACK
+movq %rax, PER_CPU_VAR(unsafe_stack_register_backup)
+_SWITCH_TO_KERNEL_CR3 %rax
+movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
+.endm
+
+
+.macro SWITCH_USER_CR3_NO_STACK
+
+movq %rax, PER_CPU_VAR(unsafe_stack_register_backup)
+_SWITCH_TO_USER_CR3 %rax
+movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
+
+.endm
+
+#else /* CONFIG_KAISER */
+
+.macro SWITCH_KERNEL_CR3 reg
+.endm
+.macro SWITCH_USER_CR3 reg
+.endm
+.macro SWITCH_USER_CR3_NO_STACK
+.endm
+.macro SWITCH_KERNEL_CR3_NO_STACK
+.endm
+
+#endif /* CONFIG_KAISER */
+#else /* __ASSEMBLY__ */
+
+
+#ifdef CONFIG_KAISER
+// Upon kernel/user mode switch, it may happen that
+// the address space has to be switched before the registers have been stored.
+// To change the address space, another register is needed.
+// A register therefore has to be stored/restored.
+//
+DECLARE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
+
+#endif /* CONFIG_KAISER */
+
+/**
+ *  kaiser_add_mapping - map a virtual memory range into the shadow mapping
+ *  @addr: the start address of the range
+ *  @size: the size of the range
+ *  @flags: the mapping flags of the pages
+ *
+ *  The mapping is done on a global scope, so no further synchronization is needed.
+ *  The pages have to be unmapped manually when they are no longer needed.
+ */
+extern void kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags);
+
+
+/**
+ *  kaiser_remove_mapping - unmap a virtual memory range from the shadow mapping
+ *  @start: the start address of the range
+ *  @size: the size of the range
+ */
+extern void kaiser_remove_mapping(unsigned long start, unsigned long size);
+
+/**
+ *  kaiser_init - initialize the shadow mapping
+ *
+ *  Most parts of the shadow mapping can be mapped at boot time.
+ *  Only the thread stacks have to be mapped at runtime.
+ *  The regions mapped here are never unmapped.
+ */
+extern void kaiser_init(void);
+
+#endif
+
+
+
+#endif /* _ASM_X86_KAISER_H */
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -856,6 +856,10 @@ static inline void pmdp_set_wrprotect(st
 static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
 {
        memcpy(dst, src, count * sizeof(pgd_t));
+#ifdef CONFIG_KAISER
+	// clone the shadow pgd part as well
+	memcpy(native_get_shadow_pgd(dst), native_get_shadow_pgd(src), count * sizeof(pgd_t));
+#endif
 }
 
 #define PTE_SHIFT ilog2(PTRS_PER_PTE)
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -106,9 +106,30 @@ static inline void native_pud_clear(pud_
 	native_set_pud(pud, native_make_pud(0));
 }
 
+#ifdef CONFIG_KAISER
+static inline pgd_t * native_get_shadow_pgd(pgd_t *pgdp) {
+	return (pgd_t *)(void*)((unsigned long)(void*)pgdp | (unsigned long)PAGE_SIZE);
+}
+
+static inline pgd_t * native_get_normal_pgd(pgd_t *pgdp) {
+	return (pgd_t *)(void*)((unsigned long)(void*)pgdp &  ~(unsigned long)PAGE_SIZE);
+}
+#endif /* CONFIG_KAISER */
+
 static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
 {
+#ifdef CONFIG_KAISER
+	// A pgd is page aligned, so the offset within the page identifies the entry.
+	// Entries in the lower half of the page map user space.
+	// Those entries are mirrored into the shadow pgd.
+	if ((((unsigned long)pgdp) % PAGE_SIZE) < (PAGE_SIZE / 2)) {
+		native_get_shadow_pgd(pgdp)->pgd = pgd.pgd;
+	}
+
+	pgdp->pgd = pgd.pgd & ~_PAGE_USER;
+#else /* CONFIG_KAISER */
 	*pgdp = pgd;
+#endif
 }
 
 static inline void native_pgd_clear(pgd_t *pgd)
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -39,7 +39,11 @@
 #define _PAGE_ACCESSED	(_AT(pteval_t, 1) << _PAGE_BIT_ACCESSED)
 #define _PAGE_DIRTY	(_AT(pteval_t, 1) << _PAGE_BIT_DIRTY)
 #define _PAGE_PSE	(_AT(pteval_t, 1) << _PAGE_BIT_PSE)
-#define _PAGE_GLOBAL	(_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL)
+#ifdef CONFIG_KAISER
+#define _PAGE_GLOBAL	(_AT(pteval_t, 0))
+#else
+#define _PAGE_GLOBAL  (_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL)
+#endif
 #define _PAGE_SOFTW1	(_AT(pteval_t, 1) << _PAGE_BIT_SOFTW1)
 #define _PAGE_SOFTW2	(_AT(pteval_t, 1) << _PAGE_BIT_SOFTW2)
 #define _PAGE_PAT	(_AT(pteval_t, 1) << _PAGE_BIT_PAT)
@@ -89,7 +93,11 @@
 #define _PAGE_NX	(_AT(pteval_t, 0))
 #endif
 
-#define _PAGE_PROTNONE	(_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
+#ifdef CONFIG_KAISER
+#define _PAGE_PROTNONE	(_AT(pteval_t, 0))
+#else
+#define _PAGE_PROTNONE  (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
+#endif
 
 #define _PAGE_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |	\
 			 _PAGE_ACCESSED | _PAGE_DIRTY)
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -305,7 +305,7 @@ struct tss_struct {
 
 } ____cacheline_aligned;
 
-DECLARE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss);
+DECLARE_PER_CPU_SHARED_ALIGNED_USER_MAPPED(struct tss_struct, cpu_tss);
 
 #ifdef CONFIG_X86_32
 DECLARE_PER_CPU(unsigned long, cpu_current_top_of_stack);
@@ -332,6 +332,11 @@ union irq_stack_union {
 		char gs_base[40];
 		unsigned long stack_canary;
 	};
+
+	struct {
+		char irq_stack_pointer[64];
+		char unused[IRQ_STACK_SIZE - 64];
+	};
 };
 
 DECLARE_PER_CPU_FIRST(union irq_stack_union, irq_stack_union) __visible;
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -92,7 +92,7 @@ static const struct cpu_dev default_cpu
 
 static const struct cpu_dev *this_cpu = &default_cpu;
 
-DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {
+DEFINE_PER_CPU_PAGE_ALIGNED_USER_MAPPED(struct gdt_page, gdt_page) = { .gdt = {
 #ifdef CONFIG_X86_64
 	/*
 	 * We need valid kernel segments for data and code in long mode too
@@ -1229,7 +1229,7 @@ static const unsigned int exception_stac
 	  [DEBUG_STACK - 1]			= DEBUG_STKSZ
 };
 
-static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks
+DEFINE_PER_CPU_PAGE_ALIGNED_USER_MAPPED(char, exception_stacks
 	[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]);
 
 /* May not be marked __init: used by software suspend */
--- a/arch/x86/kernel/espfix_64.c
+++ b/arch/x86/kernel/espfix_64.c
@@ -41,6 +41,7 @@
 #include <asm/pgalloc.h>
 #include <asm/setup.h>
 #include <asm/espfix.h>
+#include <asm/kaiser.h>
 
 /*
  * Note: we only need 6*8 = 48 bytes for the espfix stack, but round
@@ -126,6 +127,11 @@ void __init init_espfix_bsp(void)
 	/* Install the espfix pud into the kernel page directory */
 	pgd_p = &init_level4_pgt[pgd_index(ESPFIX_BASE_ADDR)];
 	pgd_populate(&init_mm, pgd_p, (pud_t *)espfix_pud_page);
+#ifdef CONFIG_KAISER
+	// add the esp stack pud to the shadow mapping here.
+	// This can be done directly, because the fixup stack has its own pud
+	set_pgd(native_get_shadow_pgd(pgd_p), __pgd(_PAGE_TABLE | __pa((pud_t *)espfix_pud_page)));
+#endif
 
 	/* Randomize the locations */
 	init_espfix_random();
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -441,6 +441,14 @@ early_idt_ripmsg:
 	.balign	PAGE_SIZE; \
 GLOBAL(name)
 
+#ifdef CONFIG_KAISER
+#define NEXT_PGD_PAGE(name) \
+	.balign 2 * PAGE_SIZE; \
+GLOBAL(name)
+#else
+#define NEXT_PGD_PAGE(name) NEXT_PAGE(name)
+#endif
+
 /* Automate the creation of 1 to 1 mapping pmd entries */
 #define PMDS(START, PERM, COUNT)			\
 	i = 0 ;						\
@@ -450,7 +458,7 @@ GLOBAL(name)
 	.endr
 
 	__INITDATA
-NEXT_PAGE(early_level4_pgt)
+NEXT_PGD_PAGE(early_level4_pgt)
 	.fill	511,8,0
 	.quad	level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
 
@@ -460,10 +468,10 @@ NEXT_PAGE(early_dynamic_pgts)
 	.data
 
 #ifndef CONFIG_XEN
-NEXT_PAGE(init_level4_pgt)
-	.fill	512,8,0
+NEXT_PGD_PAGE(init_level4_pgt)
+	.fill	2*512,8,0
 #else
-NEXT_PAGE(init_level4_pgt)
+NEXT_PGD_PAGE(init_level4_pgt)
 	.quad   level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
 	.org    init_level4_pgt + L4_PAGE_OFFSET*8, 0
 	.quad   level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -51,7 +51,7 @@ static struct irqaction irq2 = {
 	.flags = IRQF_NO_THREAD,
 };
 
-DEFINE_PER_CPU(vector_irq_t, vector_irq) = {
+DEFINE_PER_CPU_USER_MAPPED(vector_irq_t, vector_irq) = {
 	[0 ... NR_VECTORS - 1] = VECTOR_UNUSED,
 };
 
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -39,7 +39,7 @@
  * section. Since TSS's are completely CPU-local, we want them
  * on exact cacheline boundaries, to eliminate cacheline ping-pong.
  */
-__visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = {
+__visible DEFINE_PER_CPU_SHARED_ALIGNED_USER_MAPPED(struct tss_struct, cpu_tss) = {
 	.x86_tss = {
 		.sp0 = TOP_OF_INIT_STACK,
 #ifdef CONFIG_X86_32
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -32,3 +32,4 @@ obj-$(CONFIG_ACPI_NUMA)		+= srat.o
 obj-$(CONFIG_NUMA_EMU)		+= numa_emulation.o
 
 obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
+obj-$(CONFIG_KAISER)		+= kaiser.o
--- /dev/null
+++ b/arch/x86/mm/kaiser.c
@@ -0,0 +1,160 @@
+
+
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/string.h>
+#include <linux/types.h>
+#include <linux/bug.h>
+#include <linux/init.h>
+#include <linux/spinlock.h>
+#include <linux/mm.h>
+
+#include <linux/uaccess.h>
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>
+#include <asm/desc.h>
+#ifdef CONFIG_KAISER
+
+__visible DEFINE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
+
+/**
+ * Get the physical address for a virtual address in the kernel mapping.
+ * @param address The virtual address
+ * @return the physical address
+ */
+static inline unsigned long get_pa_from_mapping (unsigned long address)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+
+	pgd = pgd_offset_k(address);
+	BUG_ON(pgd_none(*pgd) || pgd_large(*pgd));
+
+	pud = pud_offset(pgd, address);
+	BUG_ON(pud_none(*pud));
+
+	if (pud_large(*pud)) {
+		return (pud_pfn(*pud) << PAGE_SHIFT) | (address & ~PUD_PAGE_MASK);
+	}
+
+	pmd = pmd_offset(pud, address);
+	BUG_ON(pmd_none(*pmd));
+
+	if (pmd_large(*pmd)) {
+		return (pmd_pfn(*pmd) << PAGE_SHIFT) | (address & ~PMD_PAGE_MASK);
+	}
+
+	pte = pte_offset_kernel(pmd, address);
+	BUG_ON(pte_none(*pte));
+
+	return (pte_pfn(*pte) << PAGE_SHIFT) | (address & ~PAGE_MASK);
+}
+
+void _kaiser_copy (unsigned long start_addr, unsigned long size,
+					unsigned long flags)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+	unsigned long address;
+	unsigned long end_addr = start_addr + size;
+	unsigned long target_address;
+
+	for (address = PAGE_ALIGN(start_addr - (PAGE_SIZE - 1));
+			address < PAGE_ALIGN(end_addr); address += PAGE_SIZE) {
+		target_address = get_pa_from_mapping(address);
+
+		pgd = native_get_shadow_pgd(pgd_offset_k(address));
+
+		BUG_ON(pgd_none(*pgd) && "All shadow pgds should be mapped at this time\n");
+		BUG_ON(pgd_large(*pgd));
+
+		pud = pud_offset(pgd, address);
+		if (pud_none(*pud)) {
+			set_pud(pud, __pud(_PAGE_TABLE | __pa(pmd_alloc_one(0, address))));
+		}
+		BUG_ON(pud_large(*pud));
+
+		pmd = pmd_offset(pud, address);
+		if (pmd_none(*pmd)) {
+			set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte_alloc_one_kernel(0, address))));
+		}
+		BUG_ON(pmd_large(*pmd));
+
+		pte = pte_offset_kernel(pmd, address);
+		if (pte_none(*pte)) {
+			set_pte(pte, __pte(flags | target_address));
+		} else {
+			BUG_ON(__pa(pte_page(*pte)) != target_address);
+		}
+	}
+}
+
+// first, allocate a pud for every pgd entry in the kernel half of the shadow mapping
+static inline void __init _kaiser_init(void)
+{
+	pgd_t *pgd;
+	int i = 0;
+
+	pgd = native_get_shadow_pgd(pgd_offset_k((unsigned long )0));
+	for (i = PTRS_PER_PGD / 2; i < PTRS_PER_PGD; i++) {
+		set_pgd(pgd + i, __pgd(_PAGE_TABLE |__pa(pud_alloc_one(0, 0))));
+	}
+}
+
+extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
+spinlock_t shadow_table_lock;
+void __init kaiser_init(void)
+{
+	int cpu;
+	spin_lock_init(&shadow_table_lock);
+
+	spin_lock(&shadow_table_lock);
+
+	_kaiser_init();
+
+	for_each_possible_cpu(cpu) {
+		// map the per cpu user variables
+		_kaiser_copy(
+				(unsigned long) (__per_cpu_user_mapped_start + per_cpu_offset(cpu)),
+				(unsigned long) __per_cpu_user_mapped_end - (unsigned long) __per_cpu_user_mapped_start,
+				__PAGE_KERNEL);
+	}
+
+	// map the entry/exit text section, which is responsible for switching between user and kernel mode
+	_kaiser_copy(
+			(unsigned long) __entry_text_start,
+			(unsigned long) __entry_text_end - (unsigned long) __entry_text_start,
+			__PAGE_KERNEL_RX);
+
+	// the fixed map address of the idt_table
+	_kaiser_copy(
+			(unsigned long) idt_descr.address,
+			sizeof(gate_desc) * NR_VECTORS,
+			__PAGE_KERNEL_RO);
+
+	spin_unlock(&shadow_table_lock);
+}
+
+// add a mapping to the shadow-mapping, and synchronize the mappings
+void kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags)
+{
+	spin_lock(&shadow_table_lock);
+	_kaiser_copy(addr, size, flags);
+	spin_unlock(&shadow_table_lock);
+}
+
+extern void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end);
+void kaiser_remove_mapping(unsigned long start, unsigned long size)
+{
+	pgd_t *pgd = native_get_shadow_pgd(pgd_offset_k(start));
+	spin_lock(&shadow_table_lock);
+	do {
+		unmap_pud_range(pgd, start, start + size);
+	} while (pgd++ != native_get_shadow_pgd(pgd_offset_k(start + size)));
+	spin_unlock(&shadow_table_lock);
+}
+#endif /* CONFIG_KAISER */
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -829,7 +829,7 @@ static void unmap_pmd_range(pud_t *pud,
 			pud_clear(pud);
 }
 
-static void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end)
+void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end)
 {
 	pud_t *pud = pud_offset(pgd, start);
 
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -342,12 +342,38 @@ static inline void _pgd_free(pgd_t *pgd)
 #else
 static inline pgd_t *_pgd_alloc(void)
 {
+#ifdef CONFIG_KAISER
+	// Instead of one PML4, we acquire two PML4s and, thus, an 8kb-aligned memory
+	// block. Therefore, we have to allocate at least 3 pages. However,
+	// __get_free_pages returns 4 pages. Hence, we store the base pointer at
+	// the beginning of the page that follows our 8kb-aligned memory block in
+	// order to free it correctly afterwards.
+
+	unsigned long pages = __get_free_pages(PGALLOC_GFP, get_order(4*PAGE_SIZE));
+
+	if(native_get_normal_pgd((pgd_t*) pages) == (pgd_t*) pages)
+	{
+		*((unsigned long*)(pages + 2 * PAGE_SIZE)) = pages;
+		return (pgd_t *) pages;
+	}
+	else
+	{
+		*((unsigned long*)(pages + 3 * PAGE_SIZE)) = pages;
+		return (pgd_t *) (pages + PAGE_SIZE);
+	}
+#else
 	return (pgd_t *)__get_free_page(PGALLOC_GFP);
+#endif
 }
 
 static inline void _pgd_free(pgd_t *pgd)
 {
+#ifdef CONFIG_KAISER
+  unsigned long pages = *((unsigned long*) ((char*) pgd + 2 * PAGE_SIZE));
+	free_pages(pages, get_order(4*PAGE_SIZE));
+#else
 	free_page((unsigned long)pgd);
+#endif
 }
 #endif /* CONFIG_X86_PAE */
 
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -725,7 +725,16 @@
  */
 #define PERCPU_INPUT(cacheline)						\
 	VMLINUX_SYMBOL(__per_cpu_start) = .;				\
-	*(.data..percpu..first)						\
+	\
+	VMLINUX_SYMBOL(__per_cpu_user_mapped_start) = .;        \
+	*(.data..percpu..first)           \
+	. = ALIGN(cacheline);           \
+	*(.data..percpu..user_mapped)            \
+	*(.data..percpu..user_mapped..shared_aligned)        \
+	. = ALIGN(PAGE_SIZE);           \
+	*(.data..percpu..user_mapped..page_aligned)          \
+	VMLINUX_SYMBOL(__per_cpu_user_mapped_end) = .;        \
+	\
 	. = ALIGN(PAGE_SIZE);						\
 	*(.data..percpu..page_aligned)					\
 	. = ALIGN(cacheline);						\
--- a/include/linux/percpu-defs.h
+++ b/include/linux/percpu-defs.h
@@ -35,6 +35,12 @@
 
 #endif
 
+#ifdef CONFIG_KAISER
+#define USER_MAPPED_SECTION "..user_mapped"
+#else
+#define USER_MAPPED_SECTION ""
+#endif
+
 /*
  * Base implementations of per-CPU variable declarations and definitions, where
  * the section in which the variable is to be placed is provided by the
@@ -115,6 +121,12 @@
 #define DEFINE_PER_CPU(type, name)					\
 	DEFINE_PER_CPU_SECTION(type, name, "")
 
+#define DECLARE_PER_CPU_USER_MAPPED(type, name)         \
+	DECLARE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION)
+
+#define DEFINE_PER_CPU_USER_MAPPED(type, name)          \
+	DEFINE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION)
+
 /*
  * Declaration/definition used for per-CPU variables that must come first in
  * the set of variables.
@@ -144,6 +156,14 @@
 	DEFINE_PER_CPU_SECTION(type, name, PER_CPU_SHARED_ALIGNED_SECTION) \
 	____cacheline_aligned_in_smp
 
+#define DECLARE_PER_CPU_SHARED_ALIGNED_USER_MAPPED(type, name)			\
+	DECLARE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION PER_CPU_SHARED_ALIGNED_SECTION) \
+	____cacheline_aligned_in_smp
+
+#define DEFINE_PER_CPU_SHARED_ALIGNED_USER_MAPPED(type, name)			\
+	DEFINE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION PER_CPU_SHARED_ALIGNED_SECTION) \
+	____cacheline_aligned_in_smp
+
 #define DECLARE_PER_CPU_ALIGNED(type, name)				\
 	DECLARE_PER_CPU_SECTION(type, name, PER_CPU_ALIGNED_SECTION)	\
 	____cacheline_aligned
@@ -162,6 +182,16 @@
 #define DEFINE_PER_CPU_PAGE_ALIGNED(type, name)				\
 	DEFINE_PER_CPU_SECTION(type, name, "..page_aligned")		\
 	__aligned(PAGE_SIZE)
+/*
+ * Declaration/definition used for per-CPU variables that must be page aligned and need to be mapped in user mode.
+ */
+#define DECLARE_PER_CPU_PAGE_ALIGNED_USER_MAPPED(type, name)      \
+  DECLARE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION"..page_aligned")   \
+  __aligned(PAGE_SIZE)
+
+#define DEFINE_PER_CPU_PAGE_ALIGNED_USER_MAPPED(type, name)       \
+  DEFINE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION"..page_aligned")    \
+  __aligned(PAGE_SIZE)
 
 /*
  * Declaration/definition used for per-CPU variables that must be read mostly.
--- a/init/main.c
+++ b/init/main.c
@@ -87,6 +87,9 @@
 #include <asm/setup.h>
 #include <asm/sections.h>
 #include <asm/cacheflush.h>
+#ifdef CONFIG_KAISER
+#include <asm/kaiser.h>
+#endif
 
 static int kernel_init(void *);
 
@@ -492,6 +495,9 @@ static void __init mm_init(void)
 	pgtable_init();
 	vmalloc_init();
 	ioremap_huge_init();
+#ifdef CONFIG_KAISER
+	kaiser_init();
+#endif
 }
 
 asmlinkage __visible void __init start_kernel(void)
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -167,8 +167,12 @@ static struct thread_info *alloc_thread_
 	return page ? page_address(page) : NULL;
 }
 
+extern void kaiser_remove_mapping(unsigned long start_addr, unsigned long size);
 static inline void free_thread_info(struct thread_info *ti)
 {
+#ifdef CONFIG_KAISER
+	kaiser_remove_mapping((unsigned long)ti, THREAD_SIZE);
+#endif
 	free_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER);
 }
 # else
@@ -331,6 +335,7 @@ void set_task_stack_end_magic(struct tas
 	*stackend = STACK_END_MAGIC;	/* for overflow detection */
 }
 
+extern void kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags);
 static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 {
 	struct task_struct *tsk;
@@ -352,6 +357,9 @@ static struct task_struct *dup_task_stru
 		goto free_ti;
 
 	tsk->stack = ti;
+#ifdef CONFIG_KAISER
+	kaiser_add_mapping((unsigned long)tsk->stack, THREAD_SIZE, __PAGE_KERNEL);
+#endif
 #ifdef CONFIG_SECCOMP
 	/*
 	 * We must handle setting up seccomp filters once we're under
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -30,6 +30,13 @@ config SECURITY
 	  model will be used.
 
 	  If you are unsure how to answer this question, answer N.
+config KAISER
+	bool "Remove the kernel mapping in user mode"
+	depends on X86_64
+	depends on !PARAVIRT
+	help
+	  This enforces a strict kernel and user space isolation in order to close
+	  hardware side channels on kernel address information.
 
 config SECURITYFS
 	bool "Enable the securityfs filesystem"


* [kernel-hardening] [PATCH 4.4 02/37] KAISER: Kernel Address Isolation
@ 2018-01-03 20:11   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel, kernel-hardening
  Cc: Greg Kroah-Hartman, stable, clementine.maurice, moritz.lipp,
	Michael Schwarz, Richard Fellner, Ingo Molnar, kirill.shutemov,
	anders.fogh, Daniel Gruss, Jiri Kosina, Hugh Dickins

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Richard Fellner <richard.fellner@student.tugraz.at>


This patch introduces our implementation of KAISER (Kernel Address Isolation to
have Side-channels Efficiently Removed), a kernel isolation technique to close
hardware side channels on kernel address information.

More information about the patch can be found on:

        https://github.com/IAIK/KAISER

From: Richard Fellner <richard.fellner@student.tugraz.at>
From: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
X-Subject: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
Date: Thu, 4 May 2017 14:26:50 +0200
Link: http://marc.info/?l=linux-kernel&m=149390087310405&w=2
Kaiser-4.10-SHA1: c4b1831d44c6144d3762ccc72f0c4e71a0c713e5

To: <linux-kernel@vger.kernel.org>
To: <kernel-hardening@lists.openwall.com>
Cc: <clementine.maurice@iaik.tugraz.at>
Cc: <moritz.lipp@iaik.tugraz.at>
Cc: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Cc: Richard Fellner <richard.fellner@student.tugraz.at>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <kirill.shutemov@linux.intel.com>
Cc: <anders.fogh@gdata-adan.de>

After several recent works [1,2,3] KASLR on x86_64 was basically
considered dead by many researchers. We have been working on an
efficient but effective fix for this problem and found that not mapping
the kernel space when running in user mode is the solution to this
problem [4] (the corresponding paper [5] will be presented at ESSoS17).

With this RFC patch we allow anybody to configure their kernel with the
flag CONFIG_KAISER to add our defense mechanism.

If there are any questions we would love to answer them.
We also appreciate any comments!

Cheers,
Daniel (+ the KAISER team from Graz University of Technology)

[1] http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
[2] https://www.blackhat.com/docs/us-16/materials/us-16-Fogh-Using-Undocumented-CPU-Behaviour-To-See-Into-Kernel-Mode-And-Break-KASLR-In-The-Process.pdf
[3] https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX.pdf
[4] https://github.com/IAIK/KAISER
[5] https://gruss.cc/files/kaiser.pdf

[patch based also on
https://raw.githubusercontent.com/IAIK/KAISER/master/KAISER/0001-KAISER-Kernel-Address-Isolation.patch]

Signed-off-by: Richard Fellner <richard.fellner@student.tugraz.at>
Signed-off-by: Moritz Lipp <moritz.lipp@iaik.tugraz.at>
Signed-off-by: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
Signed-off-by: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/entry/entry_64.S            |   19 +++-
 arch/x86/entry/entry_64_compat.S     |    6 +
 arch/x86/include/asm/hw_irq.h        |    2 
 arch/x86/include/asm/kaiser.h        |  113 ++++++++++++++++++++++++
 arch/x86/include/asm/pgtable.h       |    4 
 arch/x86/include/asm/pgtable_64.h    |   21 ++++
 arch/x86/include/asm/pgtable_types.h |   12 ++
 arch/x86/include/asm/processor.h     |    7 +
 arch/x86/kernel/cpu/common.c         |    4 
 arch/x86/kernel/espfix_64.c          |    6 +
 arch/x86/kernel/head_64.S            |   16 ++-
 arch/x86/kernel/irqinit.c            |    2 
 arch/x86/kernel/process.c            |    2 
 arch/x86/mm/Makefile                 |    1 
 arch/x86/mm/kaiser.c                 |  160 +++++++++++++++++++++++++++++++++++
 arch/x86/mm/pageattr.c               |    2 
 arch/x86/mm/pgtable.c                |   26 +++++
 include/asm-generic/vmlinux.lds.h    |   11 ++
 include/linux/percpu-defs.h          |   30 ++++++
 init/main.c                          |    6 +
 kernel/fork.c                        |    8 +
 security/Kconfig                     |    7 +
 22 files changed, 449 insertions(+), 16 deletions(-)
 create mode 100644 arch/x86/include/asm/kaiser.h
 create mode 100644 arch/x86/mm/kaiser.c

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -35,6 +35,7 @@
 #include <asm/asm.h>
 #include <asm/smap.h>
 #include <asm/pgtable_types.h>
+#include <asm/kaiser.h>
 #include <linux/err.h>
 
 /* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this.  */
@@ -135,6 +136,7 @@ ENTRY(entry_SYSCALL_64)
 	 * it is too small to ever cause noticeable irq latency.
 	 */
 	SWAPGS_UNSAFE_STACK
+	SWITCH_KERNEL_CR3_NO_STACK
 	/*
 	 * A hypervisor implementation might want to use a label
 	 * after the swapgs, so that it can do the swapgs
@@ -207,9 +209,10 @@ entry_SYSCALL_64_fastpath:
 	testl	$_TIF_ALLWORK_MASK, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
 	jnz	int_ret_from_sys_call_irqs_off	/* Go to the slow path */
 
-	RESTORE_C_REGS_EXCEPT_RCX_R11
 	movq	RIP(%rsp), %rcx
 	movq	EFLAGS(%rsp), %r11
+	RESTORE_C_REGS_EXCEPT_RCX_R11
+	SWITCH_USER_CR3
 	movq	RSP(%rsp), %rsp
 	/*
 	 * 64-bit SYSRET restores rip from rcx,
@@ -347,10 +350,12 @@ GLOBAL(int_ret_from_sys_call)
 syscall_return_via_sysret:
 	/* rcx and r11 are already restored (see code above) */
 	RESTORE_C_REGS_EXCEPT_RCX_R11
+	SWITCH_USER_CR3
 	movq	RSP(%rsp), %rsp
 	USERGS_SYSRET64
 
 opportunistic_sysret_failed:
+	SWITCH_USER_CR3
 	SWAPGS
 	jmp	restore_c_regs_and_iret
 END(entry_SYSCALL_64)
@@ -509,6 +514,7 @@ END(irq_entries_start)
 	 * tracking that we're in kernel mode.
 	 */
 	SWAPGS
+	SWITCH_KERNEL_CR3
 
 	/*
 	 * We need to tell lockdep that IRQs are off.  We can't do this until
@@ -568,6 +574,7 @@ GLOBAL(retint_user)
 	mov	%rsp,%rdi
 	call	prepare_exit_to_usermode
 	TRACE_IRQS_IRETQ
+	SWITCH_USER_CR3
 	SWAPGS
 	jmp	restore_regs_and_iret
 
@@ -625,6 +632,7 @@ native_irq_return_ldt:
 	pushq	%rax
 	pushq	%rdi
 	SWAPGS
+	SWITCH_KERNEL_CR3
 	movq	PER_CPU_VAR(espfix_waddr), %rdi
 	movq	%rax, (0*8)(%rdi)		/* RAX */
 	movq	(2*8)(%rsp), %rax		/* RIP */
@@ -640,6 +648,7 @@ native_irq_return_ldt:
 	andl	$0xffff0000, %eax
 	popq	%rdi
 	orq	PER_CPU_VAR(espfix_stack), %rax
+	SWITCH_USER_CR3
 	SWAPGS
 	movq	%rax, %rsp
 	popq	%rax
@@ -1007,6 +1016,7 @@ ENTRY(paranoid_entry)
 	testl	%edx, %edx
 	js	1f				/* negative -> in kernel */
 	SWAPGS
+	SWITCH_KERNEL_CR3
 	xorl	%ebx, %ebx
 1:	ret
 END(paranoid_entry)
@@ -1029,6 +1039,7 @@ ENTRY(paranoid_exit)
 	testl	%ebx, %ebx			/* swapgs needed? */
 	jnz	paranoid_exit_no_swapgs
 	TRACE_IRQS_IRETQ
+	SWITCH_USER_CR3_NO_STACK
 	SWAPGS_UNSAFE_STACK
 	jmp	paranoid_exit_restore
 paranoid_exit_no_swapgs:
@@ -1058,6 +1069,7 @@ ENTRY(error_entry)
 	 * from user mode due to an IRET fault.
 	 */
 	SWAPGS
+	SWITCH_KERNEL_CR3
 
 .Lerror_entry_from_usermode_after_swapgs:
 	/*
@@ -1110,7 +1122,7 @@ ENTRY(error_entry)
 	 * Switch to kernel gsbase:
 	 */
 	SWAPGS
-
+	SWITCH_KERNEL_CR3
 	/*
 	 * Pretend that the exception came from user mode: set up pt_regs
 	 * as if we faulted immediately after IRET and clear EBX so that
@@ -1210,6 +1222,7 @@ ENTRY(nmi)
 	 */
 
 	SWAPGS_UNSAFE_STACK
+	SWITCH_KERNEL_CR3_NO_STACK
 	cld
 	movq	%rsp, %rdx
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
@@ -1250,6 +1263,7 @@ ENTRY(nmi)
 	 * work, because we don't want to enable interrupts.  Fortunately,
 	 * do_nmi doesn't modify pt_regs.
 	 */
+	SWITCH_USER_CR3
 	SWAPGS
 	jmp	restore_c_regs_and_iret
 
@@ -1461,6 +1475,7 @@ end_repeat_nmi:
 	testl	%ebx, %ebx			/* swapgs needed? */
 	jnz	nmi_restore
 nmi_swapgs:
+	SWITCH_USER_CR3_NO_STACK
 	SWAPGS_UNSAFE_STACK
 nmi_restore:
 	RESTORE_EXTRA_REGS
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -13,6 +13,7 @@
 #include <asm/irqflags.h>
 #include <asm/asm.h>
 #include <asm/smap.h>
+#include <asm/kaiser.h>
 #include <linux/linkage.h>
 #include <linux/err.h>
 
@@ -50,6 +51,7 @@ ENDPROC(native_usergs_sysret32)
 ENTRY(entry_SYSENTER_compat)
 	/* Interrupts are off on entry. */
 	SWAPGS_UNSAFE_STACK
+	SWITCH_KERNEL_CR3_NO_STACK
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
 	/*
@@ -161,6 +163,7 @@ ENDPROC(entry_SYSENTER_compat)
 ENTRY(entry_SYSCALL_compat)
 	/* Interrupts are off on entry. */
 	SWAPGS_UNSAFE_STACK
+	SWITCH_KERNEL_CR3_NO_STACK
 
 	/* Stash user ESP and switch to the kernel stack. */
 	movl	%esp, %r8d
@@ -208,6 +211,7 @@ ENTRY(entry_SYSCALL_compat)
 	/* Opportunistic SYSRET */
 sysret32_from_system_call:
 	TRACE_IRQS_ON			/* User mode traces as IRQs on. */
+	SWITCH_USER_CR3
 	movq	RBX(%rsp), %rbx		/* pt_regs->rbx */
 	movq	RBP(%rsp), %rbp		/* pt_regs->rbp */
 	movq	EFLAGS(%rsp), %r11	/* pt_regs->flags (in r11) */
@@ -269,6 +273,7 @@ ENTRY(entry_INT80_compat)
 	PARAVIRT_ADJUST_EXCEPTION_FRAME
 	ASM_CLAC			/* Do this early to minimize exposure */
 	SWAPGS
+	SWITCH_KERNEL_CR3_NO_STACK
 
 	/*
 	 * User tracing code (ptrace or signal handlers) might assume that
@@ -311,6 +316,7 @@ ENTRY(entry_INT80_compat)
 
 	/* Go back to user mode. */
 	TRACE_IRQS_ON
+	SWITCH_USER_CR3
 	SWAPGS
 	jmp	restore_regs_and_iret
 END(entry_INT80_compat)
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -187,7 +187,7 @@ extern char irq_entries_start[];
 #define VECTOR_RETRIGGERED	((void *)~0UL)
 
 typedef struct irq_desc* vector_irq_t[NR_VECTORS];
-DECLARE_PER_CPU(vector_irq_t, vector_irq);
+DECLARE_PER_CPU_USER_MAPPED(vector_irq_t, vector_irq);
 
 #endif /* !ASSEMBLY_ */
 
--- /dev/null
+++ b/arch/x86/include/asm/kaiser.h
@@ -0,0 +1,113 @@
+#ifndef _ASM_X86_KAISER_H
+#define _ASM_X86_KAISER_H
+
+/* This file includes the definitions for the KAISER feature.
+ * KAISER is a countermeasure against x86_64 side channel attacks on the kernel virtual memory.
+ * It has a shadow-pgd for every process. The shadow-pgd has a minimalistic kernel-set mapped,
+ * but includes the whole user memory. Within a kernel context switch, or when an interrupt is handled,
+ * the pgd is switched to the normal one. When the system switches to user mode, the shadow pgd is enabled.
+ * By this, the virtual memory caches are freed, and the user may not attack the whole kernel memory.
+ *
+ * A minimalistic kernel mapping holds the parts needed to be mapped in user mode, as the entry/exit functions
+ * of the user space, or the stacks.
+ */
+#ifdef __ASSEMBLY__
+#ifdef CONFIG_KAISER
+
+.macro _SWITCH_TO_KERNEL_CR3 reg
+movq %cr3, \reg
+andq $(~0x1000), \reg
+movq \reg, %cr3
+.endm
+
+.macro _SWITCH_TO_USER_CR3 reg
+movq %cr3, \reg
+orq $(0x1000), \reg
+movq \reg, %cr3
+.endm
+
+.macro SWITCH_KERNEL_CR3
+pushq %rax
+_SWITCH_TO_KERNEL_CR3 %rax
+popq %rax
+.endm
+
+.macro SWITCH_USER_CR3
+pushq %rax
+_SWITCH_TO_USER_CR3 %rax
+popq %rax
+.endm
+
+.macro SWITCH_KERNEL_CR3_NO_STACK
+movq %rax, PER_CPU_VAR(unsafe_stack_register_backup)
+_SWITCH_TO_KERNEL_CR3 %rax
+movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
+.endm
+
+
+.macro SWITCH_USER_CR3_NO_STACK
+
+movq %rax, PER_CPU_VAR(unsafe_stack_register_backup)
+_SWITCH_TO_USER_CR3 %rax
+movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
+
+.endm
+
+#else /* CONFIG_KAISER */
+
+.macro SWITCH_KERNEL_CR3 reg
+.endm
+.macro SWITCH_USER_CR3 reg
+.endm
+.macro SWITCH_USER_CR3_NO_STACK
+.endm
+.macro SWITCH_KERNEL_CR3_NO_STACK
+.endm
+
+#endif /* CONFIG_KAISER */
+#else /* __ASSEMBLY__ */
+
+
+#ifdef CONFIG_KAISER
+// Upon kernel/user mode switch, it may happen that
+// the address space has to be switched before the registers have been stored.
+// To change the address space, another register is needed.
+// A register therefore has to be stored/restored.
+//
+DECLARE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
+
+#endif /* CONFIG_KAISER */
+
+/**
+ *  shadowmem_add_mapping - map a virtual memory part to the shadow mapping
+ *  @addr: the start address of the range
+ *  @size: the size of the range
+ *  @flags: The mapping flags of the pages
+ *
+ *  the mapping is done on a global scope, so no bigger synchronization has to be done.
+ *  the pages have to be manually unmapped again when they are not needed any longer.
+ */
+extern void kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags);
+
+
+/**
+ *  shadowmem_remove_mapping - unmap a virtual memory part of the shadow mapping
+ *  @addr: the start address of the range
+ *  @size: the size of the range
+ */
+extern void kaiser_remove_mapping(unsigned long start, unsigned long size);
+
+/**
+ *  shadowmem_initialize_mapping - Initialize the shadow mapping
+ *
+ *  most parts of the shadow mapping can be mapped upon boot time.
+ *  only the thread stacks have to be mapped on runtime.
+ *  the mapped regions are not unmapped at all.
+ */
+extern void kaiser_init(void);
+
+#endif
+
+
+
+#endif /* _ASM_X86_KAISER_H */
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -856,6 +856,10 @@ static inline void pmdp_set_wrprotect(st
 static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
 {
        memcpy(dst, src, count * sizeof(pgd_t));
+#ifdef CONFIG_KAISER
+	// clone the shadow pgd part as well
+	memcpy(native_get_shadow_pgd(dst), native_get_shadow_pgd(src), count * sizeof(pgd_t));
+#endif
 }
 
 #define PTE_SHIFT ilog2(PTRS_PER_PTE)
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -106,9 +106,30 @@ static inline void native_pud_clear(pud_
 	native_set_pud(pud, native_make_pud(0));
 }
 
+#ifdef CONFIG_KAISER
+static inline pgd_t * native_get_shadow_pgd(pgd_t *pgdp) {
+	return (pgd_t *)(void*)((unsigned long)(void*)pgdp | (unsigned long)PAGE_SIZE);
+}
+
+static inline pgd_t * native_get_normal_pgd(pgd_t *pgdp) {
+	return (pgd_t *)(void*)((unsigned long)(void*)pgdp &  ~(unsigned long)PAGE_SIZE);
+}
+#endif /* CONFIG_KAISER */
+
 static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
 {
+#ifdef CONFIG_KAISER
+	// We know that a pgd is page aligned.
+	// Therefore the lower indices have to be mapped to user space.
+	// These pages are mapped to the shadow mapping.
+	if ((((unsigned long)pgdp) % PAGE_SIZE) < (PAGE_SIZE / 2)) {
+		native_get_shadow_pgd(pgdp)->pgd = pgd.pgd;
+	}
+
+	pgdp->pgd = pgd.pgd & ~_PAGE_USER;
+#else /* CONFIG_KAISER */
 	*pgdp = pgd;
+#endif
 }
 
 static inline void native_pgd_clear(pgd_t *pgd)
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -39,7 +39,11 @@
 #define _PAGE_ACCESSED	(_AT(pteval_t, 1) << _PAGE_BIT_ACCESSED)
 #define _PAGE_DIRTY	(_AT(pteval_t, 1) << _PAGE_BIT_DIRTY)
 #define _PAGE_PSE	(_AT(pteval_t, 1) << _PAGE_BIT_PSE)
-#define _PAGE_GLOBAL	(_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL)
+#ifdef CONFIG_KAISER
+#define _PAGE_GLOBAL	(_AT(pteval_t, 0))
+#else
+#define _PAGE_GLOBAL  (_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL)
+#endif
 #define _PAGE_SOFTW1	(_AT(pteval_t, 1) << _PAGE_BIT_SOFTW1)
 #define _PAGE_SOFTW2	(_AT(pteval_t, 1) << _PAGE_BIT_SOFTW2)
 #define _PAGE_PAT	(_AT(pteval_t, 1) << _PAGE_BIT_PAT)
@@ -89,7 +93,11 @@
 #define _PAGE_NX	(_AT(pteval_t, 0))
 #endif
 
-#define _PAGE_PROTNONE	(_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
+#ifdef CONFIG_KAISER
+#define _PAGE_PROTNONE	(_AT(pteval_t, 0))
+#else
+#define _PAGE_PROTNONE  (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
+#endif
 
 #define _PAGE_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |	\
 			 _PAGE_ACCESSED | _PAGE_DIRTY)
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -305,7 +305,7 @@ struct tss_struct {
 
 } ____cacheline_aligned;
 
-DECLARE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss);
+DECLARE_PER_CPU_SHARED_ALIGNED_USER_MAPPED(struct tss_struct, cpu_tss);
 
 #ifdef CONFIG_X86_32
 DECLARE_PER_CPU(unsigned long, cpu_current_top_of_stack);
@@ -332,6 +332,11 @@ union irq_stack_union {
 		char gs_base[40];
 		unsigned long stack_canary;
 	};
+
+	struct {
+		char irq_stack_pointer[64];
+		char unused[IRQ_STACK_SIZE - 64];
+	};
 };
 
 DECLARE_PER_CPU_FIRST(union irq_stack_union, irq_stack_union) __visible;
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -92,7 +92,7 @@ static const struct cpu_dev default_cpu
 
 static const struct cpu_dev *this_cpu = &default_cpu;
 
-DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {
+DEFINE_PER_CPU_PAGE_ALIGNED_USER_MAPPED(struct gdt_page, gdt_page) = { .gdt = {
 #ifdef CONFIG_X86_64
 	/*
 	 * We need valid kernel segments for data and code in long mode too
@@ -1229,7 +1229,7 @@ static const unsigned int exception_stac
 	  [DEBUG_STACK - 1]			= DEBUG_STKSZ
 };
 
-static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks
+DEFINE_PER_CPU_PAGE_ALIGNED_USER_MAPPED(char, exception_stacks
 	[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]);
 
 /* May not be marked __init: used by software suspend */
--- a/arch/x86/kernel/espfix_64.c
+++ b/arch/x86/kernel/espfix_64.c
@@ -41,6 +41,7 @@
 #include <asm/pgalloc.h>
 #include <asm/setup.h>
 #include <asm/espfix.h>
+#include <asm/kaiser.h>
 
 /*
  * Note: we only need 6*8 = 48 bytes for the espfix stack, but round
@@ -126,6 +127,11 @@ void __init init_espfix_bsp(void)
 	/* Install the espfix pud into the kernel page directory */
 	pgd_p = &init_level4_pgt[pgd_index(ESPFIX_BASE_ADDR)];
 	pgd_populate(&init_mm, pgd_p, (pud_t *)espfix_pud_page);
+#ifdef CONFIG_KAISER
+	// add the esp stack pud to the shadow mapping here.
+	// This can be done directly, because the fixup stack has its own pud
+	set_pgd(native_get_shadow_pgd(pgd_p), __pgd(_PAGE_TABLE | __pa((pud_t *)espfix_pud_page)));
+#endif
 
 	/* Randomize the locations */
 	init_espfix_random();
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -441,6 +441,14 @@ early_idt_ripmsg:
 	.balign	PAGE_SIZE; \
 GLOBAL(name)
 
+#ifdef CONFIG_KAISER
+#define NEXT_PGD_PAGE(name) \
+	.balign 2 * PAGE_SIZE; \
+GLOBAL(name)
+#else
+#define NEXT_PGD_PAGE(name) NEXT_PAGE(name)
+#endif
+
 /* Automate the creation of 1 to 1 mapping pmd entries */
 #define PMDS(START, PERM, COUNT)			\
 	i = 0 ;						\
@@ -450,7 +458,7 @@ GLOBAL(name)
 	.endr
 
 	__INITDATA
-NEXT_PAGE(early_level4_pgt)
+NEXT_PGD_PAGE(early_level4_pgt)
 	.fill	511,8,0
 	.quad	level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
 
@@ -460,10 +468,10 @@ NEXT_PAGE(early_dynamic_pgts)
 	.data
 
 #ifndef CONFIG_XEN
-NEXT_PAGE(init_level4_pgt)
-	.fill	512,8,0
+NEXT_PGD_PAGE(init_level4_pgt)
+	.fill	2*512,8,0
 #else
-NEXT_PAGE(init_level4_pgt)
+NEXT_PGD_PAGE(init_level4_pgt)
 	.quad   level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
 	.org    init_level4_pgt + L4_PAGE_OFFSET*8, 0
 	.quad   level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -51,7 +51,7 @@ static struct irqaction irq2 = {
 	.flags = IRQF_NO_THREAD,
 };
 
-DEFINE_PER_CPU(vector_irq_t, vector_irq) = {
+DEFINE_PER_CPU_USER_MAPPED(vector_irq_t, vector_irq) = {
 	[0 ... NR_VECTORS - 1] = VECTOR_UNUSED,
 };
 
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -39,7 +39,7 @@
  * section. Since TSS's are completely CPU-local, we want them
  * on exact cacheline boundaries, to eliminate cacheline ping-pong.
  */
-__visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = {
+__visible DEFINE_PER_CPU_SHARED_ALIGNED_USER_MAPPED(struct tss_struct, cpu_tss) = {
 	.x86_tss = {
 		.sp0 = TOP_OF_INIT_STACK,
 #ifdef CONFIG_X86_32
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -32,3 +32,4 @@ obj-$(CONFIG_ACPI_NUMA)		+= srat.o
 obj-$(CONFIG_NUMA_EMU)		+= numa_emulation.o
 
 obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
+obj-$(CONFIG_KAISER)		+= kaiser.o
--- /dev/null
+++ b/arch/x86/mm/kaiser.c
@@ -0,0 +1,160 @@
+
+
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/string.h>
+#include <linux/types.h>
+#include <linux/bug.h>
+#include <linux/init.h>
+#include <linux/spinlock.h>
+#include <linux/mm.h>
+
+#include <linux/uaccess.h>
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>
+#include <asm/desc.h>
+#ifdef CONFIG_KAISER
+
+__visible DEFINE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
+
+/**
+ * Get the real ppn from an address in kernel mapping.
+ * @param address The virtual address
+ * @return the physical address
+ */
+static inline unsigned long get_pa_from_mapping (unsigned long address)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+
+	pgd = pgd_offset_k(address);
+	BUG_ON(pgd_none(*pgd) || pgd_large(*pgd));
+
+	pud = pud_offset(pgd, address);
+	BUG_ON(pud_none(*pud));
+
+	if (pud_large(*pud)) {
+		return (pud_pfn(*pud) << PAGE_SHIFT) | (address & ~PUD_PAGE_MASK);
+	}
+
+	pmd = pmd_offset(pud, address);
+	BUG_ON(pmd_none(*pmd));
+
+	if (pmd_large(*pmd)) {
+		return (pmd_pfn(*pmd) << PAGE_SHIFT) | (address & ~PMD_PAGE_MASK);
+	}
+
+	pte = pte_offset_kernel(pmd, address);
+	BUG_ON(pte_none(*pte));
+
+	return (pte_pfn(*pte) << PAGE_SHIFT) | (address & ~PAGE_MASK);
+}
+
+void _kaiser_copy (unsigned long start_addr, unsigned long size,
+					unsigned long flags)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+	unsigned long address;
+	unsigned long end_addr = start_addr + size;
+	unsigned long target_address;
+
+	for (address = PAGE_ALIGN(start_addr - (PAGE_SIZE - 1));
+			address < PAGE_ALIGN(end_addr); address += PAGE_SIZE) {
+		target_address = get_pa_from_mapping(address);
+
+		pgd = native_get_shadow_pgd(pgd_offset_k(address));
+
+		BUG_ON(pgd_none(*pgd) && "All shadow pgds should be mapped at this time\n");
+		BUG_ON(pgd_large(*pgd));
+
+		pud = pud_offset(pgd, address);
+		if (pud_none(*pud)) {
+			set_pud(pud, __pud(_PAGE_TABLE | __pa(pmd_alloc_one(0, address))));
+		}
+		BUG_ON(pud_large(*pud));
+
+		pmd = pmd_offset(pud, address);
+		if (pmd_none(*pmd)) {
+			set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte_alloc_one_kernel(0, address))));
+		}
+		BUG_ON(pmd_large(*pmd));
+
+		pte = pte_offset_kernel(pmd, address);
+		if (pte_none(*pte)) {
+			set_pte(pte, __pte(flags | target_address));
+		} else {
+			BUG_ON(__pa(pte_page(*pte)) != target_address);
+		}
+	}
+}
+
+// at first, add a pmd for every pgd entry in the shadowmem-kernel-part of the kernel mapping
+static inline void __init _kaiser_init(void)
+{
+	pgd_t *pgd;
+	int i = 0;
+
+	pgd = native_get_shadow_pgd(pgd_offset_k((unsigned long )0));
+	for (i = PTRS_PER_PGD / 2; i < PTRS_PER_PGD; i++) {
+		set_pgd(pgd + i, __pgd(_PAGE_TABLE |__pa(pud_alloc_one(0, 0))));
+	}
+}
+
+extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
+spinlock_t shadow_table_lock;
+void __init kaiser_init(void)
+{
+	int cpu;
+	spin_lock_init(&shadow_table_lock);
+
+	spin_lock(&shadow_table_lock);
+
+	_kaiser_init();
+
+	for_each_possible_cpu(cpu) {
+		// map the per cpu user variables
+		_kaiser_copy(
+				(unsigned long) (__per_cpu_user_mapped_start + per_cpu_offset(cpu)),
+				(unsigned long) __per_cpu_user_mapped_end - (unsigned long) __per_cpu_user_mapped_start,
+				__PAGE_KERNEL);
+	}
+
+	// map the entry/exit text section, which is responsible to switch between user- and kernel mode
+	_kaiser_copy(
+			(unsigned long) __entry_text_start,
+			(unsigned long) __entry_text_end - (unsigned long) __entry_text_start,
+			__PAGE_KERNEL_RX);
+
+	// the fixed map address of the idt_table
+	_kaiser_copy(
+			(unsigned long) idt_descr.address,
+			sizeof(gate_desc) * NR_VECTORS,
+			__PAGE_KERNEL_RO);
+
+	spin_unlock(&shadow_table_lock);
+}
+
+// add a mapping to the shadow-mapping, and synchronize the mappings
+void kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags)
+{
+	spin_lock(&shadow_table_lock);
+	_kaiser_copy(addr, size, flags);
+	spin_unlock(&shadow_table_lock);
+}
+
+extern void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end);
+void kaiser_remove_mapping(unsigned long start, unsigned long size)
+{
+	pgd_t *pgd = native_get_shadow_pgd(pgd_offset_k(start));
+	spin_lock(&shadow_table_lock);
+	do {
+		unmap_pud_range(pgd, start, start + size);
+	} while (pgd++ != native_get_shadow_pgd(pgd_offset_k(start + size)));
+	spin_unlock(&shadow_table_lock);
+}
+#endif /* CONFIG_KAISER */
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -829,7 +829,7 @@ static void unmap_pmd_range(pud_t *pud,
 			pud_clear(pud);
 }
 
-static void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end)
+void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end)
 {
 	pud_t *pud = pud_offset(pgd, start);
 
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -342,12 +342,38 @@ static inline void _pgd_free(pgd_t *pgd)
 #else
 static inline pgd_t *_pgd_alloc(void)
 {
+#ifdef CONFIG_KAISER
+	// Instead of one PML4, we acquire two PML4s and, thus, an 8kb-aligned memory
+	// block. Therefore, we have to allocate at least 3 pages. However, the
+	// __get_free_pages returns us 4 pages. Hence, we store the base pointer at
+	// the beginning of the page of our 8kb-aligned memory block in order to
+	// correctly free it afterwards.
+
+	unsigned long pages = __get_free_pages(PGALLOC_GFP, get_order(4*PAGE_SIZE));
+
+	if(native_get_normal_pgd((pgd_t*) pages) == (pgd_t*) pages)
+	{
+		*((unsigned long*)(pages + 2 * PAGE_SIZE)) = pages;
+		return (pgd_t *) pages;
+	}
+	else
+	{
+		*((unsigned long*)(pages + 3 * PAGE_SIZE)) = pages;
+		return (pgd_t *) (pages + PAGE_SIZE);
+	}
+#else
 	return (pgd_t *)__get_free_page(PGALLOC_GFP);
+#endif
 }
 
 static inline void _pgd_free(pgd_t *pgd)
 {
+#ifdef CONFIG_KAISER
+  unsigned long pages = *((unsigned long*) ((char*) pgd + 2 * PAGE_SIZE));
+	free_pages(pages, get_order(4*PAGE_SIZE));
+#else
 	free_page((unsigned long)pgd);
+#endif
 }
 #endif /* CONFIG_X86_PAE */
 
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -725,7 +725,16 @@
  */
 #define PERCPU_INPUT(cacheline)						\
 	VMLINUX_SYMBOL(__per_cpu_start) = .;				\
-	*(.data..percpu..first)						\
+	\
+	VMLINUX_SYMBOL(__per_cpu_user_mapped_start) = .;        \
+	*(.data..percpu..first)           \
+	. = ALIGN(cacheline);           \
+	*(.data..percpu..user_mapped)            \
+	*(.data..percpu..user_mapped..shared_aligned)        \
+	. = ALIGN(PAGE_SIZE);           \
+	*(.data..percpu..user_mapped..page_aligned)          \
+	VMLINUX_SYMBOL(__per_cpu_user_mapped_end) = .;        \
+	\
 	. = ALIGN(PAGE_SIZE);						\
 	*(.data..percpu..page_aligned)					\
 	. = ALIGN(cacheline);						\
--- a/include/linux/percpu-defs.h
+++ b/include/linux/percpu-defs.h
@@ -35,6 +35,12 @@
 
 #endif
 
+#ifdef CONFIG_KAISER
+#define USER_MAPPED_SECTION "..user_mapped"
+#else
+#define USER_MAPPED_SECTION ""
+#endif
+
 /*
  * Base implementations of per-CPU variable declarations and definitions, where
  * the section in which the variable is to be placed is provided by the
@@ -115,6 +121,12 @@
 #define DEFINE_PER_CPU(type, name)					\
 	DEFINE_PER_CPU_SECTION(type, name, "")
 
+#define DECLARE_PER_CPU_USER_MAPPED(type, name)         \
+	DECLARE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION)
+
+#define DEFINE_PER_CPU_USER_MAPPED(type, name)          \
+	DEFINE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION)
+
 /*
  * Declaration/definition used for per-CPU variables that must come first in
  * the set of variables.
@@ -144,6 +156,14 @@
 	DEFINE_PER_CPU_SECTION(type, name, PER_CPU_SHARED_ALIGNED_SECTION) \
 	____cacheline_aligned_in_smp
 
+#define DECLARE_PER_CPU_SHARED_ALIGNED_USER_MAPPED(type, name)			\
+	DECLARE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION PER_CPU_SHARED_ALIGNED_SECTION) \
+	____cacheline_aligned_in_smp
+
+#define DEFINE_PER_CPU_SHARED_ALIGNED_USER_MAPPED(type, name)			\
+	DEFINE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION PER_CPU_SHARED_ALIGNED_SECTION) \
+	____cacheline_aligned_in_smp
+
 #define DECLARE_PER_CPU_ALIGNED(type, name)				\
 	DECLARE_PER_CPU_SECTION(type, name, PER_CPU_ALIGNED_SECTION)	\
 	____cacheline_aligned
@@ -162,6 +182,16 @@
 #define DEFINE_PER_CPU_PAGE_ALIGNED(type, name)				\
 	DEFINE_PER_CPU_SECTION(type, name, "..page_aligned")		\
 	__aligned(PAGE_SIZE)
+/*
+ * Declaration/definition used for per-CPU variables that must be page aligned and need to be mapped in user mode.
+ */
+#define DECLARE_PER_CPU_PAGE_ALIGNED_USER_MAPPED(type, name)      \
+  DECLARE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION"..page_aligned")   \
+  __aligned(PAGE_SIZE)
+
+#define DEFINE_PER_CPU_PAGE_ALIGNED_USER_MAPPED(type, name)       \
+  DEFINE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION"..page_aligned")    \
+  __aligned(PAGE_SIZE)
 
 /*
  * Declaration/definition used for per-CPU variables that must be read mostly.
--- a/init/main.c
+++ b/init/main.c
@@ -87,6 +87,9 @@
 #include <asm/setup.h>
 #include <asm/sections.h>
 #include <asm/cacheflush.h>
+#ifdef CONFIG_KAISER
+#include <asm/kaiser.h>
+#endif
 
 static int kernel_init(void *);
 
@@ -492,6 +495,9 @@ static void __init mm_init(void)
 	pgtable_init();
 	vmalloc_init();
 	ioremap_huge_init();
+#ifdef CONFIG_KAISER
+	kaiser_init();
+#endif
 }
 
 asmlinkage __visible void __init start_kernel(void)
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -167,8 +167,12 @@ static struct thread_info *alloc_thread_
 	return page ? page_address(page) : NULL;
 }
 
+extern void kaiser_remove_mapping(unsigned long start_addr, unsigned long size);
 static inline void free_thread_info(struct thread_info *ti)
 {
+#ifdef CONFIG_KAISER
+	kaiser_remove_mapping((unsigned long)ti, THREAD_SIZE);
+#endif
 	free_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER);
 }
 # else
@@ -331,6 +335,7 @@ void set_task_stack_end_magic(struct tas
 	*stackend = STACK_END_MAGIC;	/* for overflow detection */
 }
 
+extern void kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags);
 static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 {
 	struct task_struct *tsk;
@@ -352,6 +357,9 @@ static struct task_struct *dup_task_stru
 		goto free_ti;
 
 	tsk->stack = ti;
+#ifdef CONFIG_KAISER
+	kaiser_add_mapping((unsigned long)tsk->stack, THREAD_SIZE, __PAGE_KERNEL);
+#endif
 #ifdef CONFIG_SECCOMP
 	/*
 	 * We must handle setting up seccomp filters once we're under
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -30,6 +30,13 @@ config SECURITY
 	  model will be used.
 
 	  If you are unsure how to answer this question, answer N.
+config KAISER
+	bool "Remove the kernel mapping in user mode"
+	depends on X86_64
+	depends on !PARAVIRT
+	help
+	  This enforces a strict kernel and user space isolation in order to close
+	  hardware side channels on kernel address information.
 
 config SECURITYFS
 	bool "Enable the securityfs filesystem"

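The _pgd_alloc()/_pgd_free() hunk in the patch above relies on a small
bookkeeping trick: four pages are allocated, the first 8 KiB-aligned pair
becomes the kernel and shadow PGD, and the original base pointer is
stashed in the page directly after that pair so the block can be freed
later. The sketch below reproduces that scheme in user space. It is an
illustration under stated assumptions, not the kernel code: it uses C11
aligned_alloc() in place of __get_free_pages(), assumes 4 KiB pages, adds
a NULL check that the patch does not have, and the names pgd_pair_alloc,
pgd_pair_free and pgd_pair_self_check are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Returns an 8 KiB-aligned pair of pages carved out of a 4-page block. */
static void *pgd_pair_alloc(void)
{
	/* Page-aligned, but not necessarily 8 KiB-aligned. */
	unsigned char *pages = aligned_alloc(PAGE_SIZE, 4 * PAGE_SIZE);
	unsigned char *pgd;

	if (!pages)
		return NULL;

	/* If bit 12 of the base is set, skip one page to reach 8 KiB alignment. */
	pgd = ((uintptr_t)pages & PAGE_SIZE) ? pages + PAGE_SIZE : pages;

	/* Stash the real base in the page that follows the pair. */
	*(unsigned char **)(pgd + 2 * PAGE_SIZE) = pages;
	return pgd;
}

/* Recovers the stashed base pointer and releases the whole block. */
static void pgd_pair_free(void *pgd)
{
	unsigned char *pages =
		*(unsigned char **)((unsigned char *)pgd + 2 * PAGE_SIZE);

	free(pages);
}

/* Allocates one pair, checks its alignment, and frees it again. */
static void pgd_pair_self_check(void)
{
	void *pgd = pgd_pair_alloc();

	assert(pgd != NULL);
	assert((uintptr_t)pgd % (2 * PAGE_SIZE) == 0);
	pgd_pair_free(pgd);
}
```

In both branches the stash lands at pgd + 2 * PAGE_SIZE, which is why
the free path can read it back without knowing which branch was taken
at allocation time.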

* [PATCH 4.4 03/37] kaiser: merged update
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dave Hansen <dave.hansen@linux.intel.com>


Merged fixes and cleanups, rebased to 4.4.89 tree (no 5-level paging).

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/entry/entry_64.S            |  106 ++++++++++-
 arch/x86/include/asm/kaiser.h        |   43 ++--
 arch/x86/include/asm/pgtable.h       |   18 +
 arch/x86/include/asm/pgtable_64.h    |   48 ++++-
 arch/x86/include/asm/pgtable_types.h |    6 
 arch/x86/kernel/espfix_64.c          |   13 -
 arch/x86/kernel/head_64.S            |   19 +-
 arch/x86/kernel/ldt.c                |   27 ++
 arch/x86/kernel/tracepoint.c         |    2 
 arch/x86/mm/kaiser.c                 |  318 +++++++++++++++++++++++++----------
 arch/x86/mm/pageattr.c               |   63 +++++-
 arch/x86/mm/pgtable.c                |   40 +---
 include/linux/kaiser.h               |   26 ++
 kernel/fork.c                        |    9 
 security/Kconfig                     |    5 
 15 files changed, 553 insertions(+), 190 deletions(-)
 create mode 100644 include/linux/kaiser.h

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -212,6 +212,13 @@ entry_SYSCALL_64_fastpath:
 	movq	RIP(%rsp), %rcx
 	movq	EFLAGS(%rsp), %r11
 	RESTORE_C_REGS_EXCEPT_RCX_R11
+	/*
+	 * This opens a window where we have a user CR3, but are
+	 * running in the kernel.  This makes using the CS
+	 * register useless for telling whether or not we need to
+	 * switch CR3 in NMIs.  Normal interrupts are OK because
+	 * they are off here.
+	 */
 	SWITCH_USER_CR3
 	movq	RSP(%rsp), %rsp
 	/*
@@ -350,11 +357,25 @@ GLOBAL(int_ret_from_sys_call)
 syscall_return_via_sysret:
 	/* rcx and r11 are already restored (see code above) */
 	RESTORE_C_REGS_EXCEPT_RCX_R11
+	/*
+	 * This opens a window where we have a user CR3, but are
+	 * running in the kernel.  This makes using the CS
+	 * register useless for telling whether or not we need to
+	 * switch CR3 in NMIs.  Normal interrupts are OK because
+	 * they are off here.
+	 */
 	SWITCH_USER_CR3
 	movq	RSP(%rsp), %rsp
 	USERGS_SYSRET64
 
 opportunistic_sysret_failed:
+	/*
+	 * This opens a window where we have a user CR3, but are
+	 * running in the kernel.  This makes using the CS
+	 * register useless for telling whether or not we need to
+	 * switch CR3 in NMIs.  Normal interrupts are OK because
+	 * they are off here.
+	 */
 	SWITCH_USER_CR3
 	SWAPGS
 	jmp	restore_c_regs_and_iret
@@ -1059,6 +1080,13 @@ ENTRY(error_entry)
 	cld
 	SAVE_C_REGS 8
 	SAVE_EXTRA_REGS 8
+	/*
+	 * error_entry() always returns with a kernel gsbase and
+	 * CR3.  We must also have a kernel CR3/gsbase before
+	 * calling TRACE_IRQS_*.  Just unconditionally switch to
+	 * the kernel CR3 here.
+	 */
+	SWITCH_KERNEL_CR3
 	xorl	%ebx, %ebx
 	testb	$3, CS+8(%rsp)
 	jz	.Lerror_kernelspace
@@ -1069,7 +1097,6 @@ ENTRY(error_entry)
 	 * from user mode due to an IRET fault.
 	 */
 	SWAPGS
-	SWITCH_KERNEL_CR3
 
 .Lerror_entry_from_usermode_after_swapgs:
 	/*
@@ -1122,7 +1149,7 @@ ENTRY(error_entry)
 	 * Switch to kernel gsbase:
 	 */
 	SWAPGS
-	SWITCH_KERNEL_CR3
+
 	/*
 	 * Pretend that the exception came from user mode: set up pt_regs
 	 * as if we faulted immediately after IRET and clear EBX so that
@@ -1222,7 +1249,10 @@ ENTRY(nmi)
 	 */
 
 	SWAPGS_UNSAFE_STACK
-	SWITCH_KERNEL_CR3_NO_STACK
+	/*
+	 * percpu variables are mapped with user CR3, so no need
+	 * to switch CR3 here.
+	 */
 	cld
 	movq	%rsp, %rdx
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
@@ -1256,14 +1286,33 @@ ENTRY(nmi)
 
 	movq	%rsp, %rdi
 	movq	$-1, %rsi
+#ifdef CONFIG_KAISER
+	/* Unconditionally use kernel CR3 for do_nmi() */
+	/* %rax is saved above, so OK to clobber here */
+	movq	%cr3, %rax
+	pushq	%rax
+#ifdef CONFIG_KAISER_REAL_SWITCH
+	andq	$(~0x1000), %rax
+#endif
+	movq	%rax, %cr3
+#endif
 	call	do_nmi
+	/*
+	 * Unconditionally restore CR3.  I know we return to
+	 * kernel code that needs user CR3, but do we ever return
+	 * to "user mode" where we need the kernel CR3?
+	 */
+#ifdef CONFIG_KAISER
+	popq	%rax
+	mov	%rax, %cr3
+#endif
 
 	/*
 	 * Return back to user mode.  We must *not* do the normal exit
-	 * work, because we don't want to enable interrupts.  Fortunately,
-	 * do_nmi doesn't modify pt_regs.
+	 * work, because we don't want to enable interrupts.  Do not
+	 * switch to user CR3: we might be going back to kernel code
+	 * that had a user CR3 set.
 	 */
-	SWITCH_USER_CR3
 	SWAPGS
 	jmp	restore_c_regs_and_iret
 
@@ -1459,23 +1508,54 @@ end_repeat_nmi:
 	ALLOC_PT_GPREGS_ON_STACK
 
 	/*
-	 * Use paranoid_entry to handle SWAPGS, but no need to use paranoid_exit
-	 * as we should not be calling schedule in NMI context.
-	 * Even with normal interrupts enabled. An NMI should not be
-	 * setting NEED_RESCHED or anything that normal interrupts and
-	 * exceptions might do.
+	 * Use the same approach as paranoid_entry to handle SWAPGS, but
+	 * without CR3 handling since we do that differently in NMIs.  No
+	 * need to use paranoid_exit as we should not be calling schedule
+	 * in NMI context.  Even with normal interrupts enabled. An NMI
+	 * should not be setting NEED_RESCHED or anything that normal
+	 * interrupts and exceptions might do.
 	 */
-	call	paranoid_entry
+	cld
+	SAVE_C_REGS
+	SAVE_EXTRA_REGS
+	movl	$1, %ebx
+	movl	$MSR_GS_BASE, %ecx
+	rdmsr
+	testl	%edx, %edx
+	js	1f				/* negative -> in kernel */
+	SWAPGS
+	xorl	%ebx, %ebx
+1:
+#ifdef CONFIG_KAISER
+	/* Unconditionally use kernel CR3 for do_nmi() */
+	/* %rax is saved above, so OK to clobber here */
+	movq	%cr3, %rax
+	pushq	%rax
+#ifdef CONFIG_KAISER_REAL_SWITCH
+	andq	$(~0x1000), %rax
+#endif
+	movq	%rax, %cr3
+#endif
 
 	/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
 	movq	%rsp, %rdi
+	addq	$8, %rdi /* point %rdi at ptregs, fixed up for CR3 */
 	movq	$-1, %rsi
 	call	do_nmi
+	/*
+	 * Unconditionally restore CR3.  We might be returning to
+	 * kernel code that needs user CR3, like just before
+	 * a sysret.
+	 */
+#ifdef CONFIG_KAISER
+	popq	%rax
+	mov	%rax, %cr3
+#endif
 
 	testl	%ebx, %ebx			/* swapgs needed? */
 	jnz	nmi_restore
 nmi_swapgs:
-	SWITCH_USER_CR3_NO_STACK
+	/* We fixed up CR3 above, so no need to switch it here */
 	SWAPGS_UNSAFE_STACK
 nmi_restore:
 	RESTORE_EXTRA_REGS
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -16,13 +16,17 @@
 
 .macro _SWITCH_TO_KERNEL_CR3 reg
 movq %cr3, \reg
+#ifdef CONFIG_KAISER_REAL_SWITCH
 andq $(~0x1000), \reg
+#endif
 movq \reg, %cr3
 .endm
 
 .macro _SWITCH_TO_USER_CR3 reg
 movq %cr3, \reg
+#ifdef CONFIG_KAISER_REAL_SWITCH
 orq $(0x1000), \reg
+#endif
 movq \reg, %cr3
 .endm
 
@@ -65,48 +69,53 @@ movq PER_CPU_VAR(unsafe_stack_register_b
 .endm
 
 #endif /* CONFIG_KAISER */
+
 #else /* __ASSEMBLY__ */
 
 
 #ifdef CONFIG_KAISER
-// Upon kernel/user mode switch, it may happen that
-// the address space has to be switched before the registers have been stored.
-// To change the address space, another register is needed.
-// A register therefore has to be stored/restored.
-//
-DECLARE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
+/*
+ * Upon kernel/user mode switch, it may happen that the address
+ * space has to be switched before the registers have been
+ * stored.  To change the address space, another register is
+ * needed.  A register therefore has to be stored/restored.
+ */
 
-#endif /* CONFIG_KAISER */
+DECLARE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
 
 /**
- *  shadowmem_add_mapping - map a virtual memory part to the shadow mapping
+ *  kaiser_add_mapping - map a virtual memory part to the shadow (user) mapping
  *  @addr: the start address of the range
  *  @size: the size of the range
  *  @flags: The mapping flags of the pages
  *
- *  the mapping is done on a global scope, so no bigger synchronization has to be done.
- *  the pages have to be manually unmapped again when they are not needed any longer.
+ *  The mapping is done on a global scope, so no bigger
+ *  synchronization has to be done.  The pages have to be
+ *  manually unmapped again when they are not needed any longer.
  */
-extern void kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags);
+extern int kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags);
 
 
 /**
- *  shadowmem_remove_mapping - unmap a virtual memory part of the shadow mapping
+ *  kaiser_remove_mapping - unmap a virtual memory part of the shadow mapping
  *  @addr: the start address of the range
  *  @size: the size of the range
  */
 extern void kaiser_remove_mapping(unsigned long start, unsigned long size);
 
 /**
- *  shadowmem_initialize_mapping - Initalize the shadow mapping
+ *  kaiser_init - Initialize the shadow mapping
  *
- *  most parts of the shadow mapping can be mapped upon boot time.
- *  only the thread stacks have to be mapped on runtime.
- *  the mapped regions are not unmapped at all.
+ *  Most parts of the shadow mapping can be mapped upon boot
+ *  time.  Only per-process things like the thread stacks
+ *  or a new LDT have to be mapped at runtime.  These boot-
+ *  time mappings are permanent and never unmapped.
  */
 extern void kaiser_init(void);
 
-#endif
+#endif /* CONFIG_KAISER */
+
+#endif /* __ASSEMBLY__ */
 
 
 
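Note on the CR3 macros in the kaiser.h hunk above: with CONFIG_KAISER_REAL_SWITCH set, the switch reduces to clearing or setting bit 12 of the CR3 value, which works because the kernel and shadow PGDs sit in adjacent 4k halves of one 8k-aligned block. A minimal userspace sketch of that arithmetic follows; the helper names and the KAISER_SHADOW_PGD_OFFSET constant are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: the 0x1000 literal used by the asm macros above. */
#define KAISER_SHADOW_PGD_OFFSET 0x1000ULL

/* Mirrors "andq $(~0x1000), reg": select the kernel half of the PGD pair. */
static inline uint64_t cr3_to_kernel(uint64_t cr3)
{
	return cr3 & ~KAISER_SHADOW_PGD_OFFSET;
}

/* Mirrors "orq $(0x1000), reg": select the shadow (user) half. */
static inline uint64_t cr3_to_user(uint64_t cr3)
{
	return cr3 | KAISER_SHADOW_PGD_OFFSET;
}
```

Both helpers are idempotent, which is presumably why the entry code can switch to the kernel CR3 unconditionally without first checking which half is live.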
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -653,7 +653,17 @@ static inline pud_t *pud_offset(pgd_t *p
 
 static inline int pgd_bad(pgd_t pgd)
 {
-	return (pgd_flags(pgd) & ~_PAGE_USER) != _KERNPG_TABLE;
+	pgdval_t ignore_flags = _PAGE_USER;
+	/*
+	 * We set NX on KAISER pgds that map userspace memory so
+	 * that userspace can not meaningfully use the kernel
+	 * page table by accident; it will fault on the first
+	 * instruction it tries to run.  See native_set_pgd().
+	 */
+	if (IS_ENABLED(CONFIG_KAISER))
+		ignore_flags |= _PAGE_NX;
+
+	return (pgd_flags(pgd) & ~ignore_flags) != _KERNPG_TABLE;
 }
 
 static inline int pgd_none(pgd_t pgd)
@@ -857,8 +867,10 @@ static inline void clone_pgd_range(pgd_t
 {
        memcpy(dst, src, count * sizeof(pgd_t));
 #ifdef CONFIG_KAISER
-	// clone the shadow pgd part as well
-	memcpy(native_get_shadow_pgd(dst), native_get_shadow_pgd(src), count * sizeof(pgd_t));
+	/* Clone the shadow pgd part as well */
+	memcpy(native_get_shadow_pgd(dst),
+	       native_get_shadow_pgd(src),
+	       count * sizeof(pgd_t));
 #endif
 }
 
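Note on the pgd_bad() change above: the check now masks out NX as well as USER when KAISER is enabled, because native_set_pgd() sets NX on kernel-side entries that map userspace. A rough standalone sketch of the flag test, assuming the usual x86-64 bit positions; the unprefixed macro names and pgd_flags_bad() are illustrative, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 page table flag bits, renamed to avoid kernel headers. */
#define PAGE_PRESENT	(1ULL << 0)
#define PAGE_RW		(1ULL << 1)
#define PAGE_USER	(1ULL << 2)
#define PAGE_ACCESSED	(1ULL << 5)
#define PAGE_DIRTY	(1ULL << 6)
#define PAGE_NX		(1ULL << 63)
#define KERNPG_TABLE	(PAGE_PRESENT | PAGE_RW | PAGE_ACCESSED | PAGE_DIRTY)

/* Hypothetical helper: takes the flag bits only, not a full pgd_t. */
static inline bool pgd_flags_bad(uint64_t flags, bool kaiser_enabled)
{
	uint64_t ignore_flags = PAGE_USER;

	/* With KAISER, NX on a pgd is expected and must not count as "bad". */
	if (kaiser_enabled)
		ignore_flags |= PAGE_NX;

	return (flags & ~ignore_flags) != KERNPG_TABLE;
}
```

Without the extra mask, every userspace pgd poisoned with NX would be reported as corrupt.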
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -107,26 +107,58 @@ static inline void native_pud_clear(pud_
 }
 
 #ifdef CONFIG_KAISER
-static inline pgd_t * native_get_shadow_pgd(pgd_t *pgdp) {
+static inline pgd_t * native_get_shadow_pgd(pgd_t *pgdp)
+{
 	return (pgd_t *)(void*)((unsigned long)(void*)pgdp | (unsigned long)PAGE_SIZE);
 }
 
-static inline pgd_t * native_get_normal_pgd(pgd_t *pgdp) {
+static inline pgd_t * native_get_normal_pgd(pgd_t *pgdp)
+{
 	return (pgd_t *)(void*)((unsigned long)(void*)pgdp &  ~(unsigned long)PAGE_SIZE);
 }
+#else
+static inline pgd_t * native_get_shadow_pgd(pgd_t *pgdp)
+{
+	BUILD_BUG_ON(1);
+	return NULL;
+}
+static inline pgd_t * native_get_normal_pgd(pgd_t *pgdp)
+{
+	return pgdp;
+}
 #endif /* CONFIG_KAISER */
 
+/*
+ * Page table pages are page-aligned.  The lower half of the top
+ * level is used for userspace and the top half for the kernel.
+ * This returns true for user pages that need to get copied into
+ * both the user and kernel copies of the page tables, and false
+ * for kernel pages that should only be in the kernel copy.
+ */
+static inline bool is_userspace_pgd(void *__ptr)
+{
+	unsigned long ptr = (unsigned long)__ptr;
+
+	return ((ptr % PAGE_SIZE) < (PAGE_SIZE / 2));
+}
+
 static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
 {
 #ifdef CONFIG_KAISER
-	// We know that a pgd is page aligned.
-	// Therefore the lower indices have to be mapped to user space.
-	// These pages are mapped to the shadow mapping.
-	if ((((unsigned long)pgdp) % PAGE_SIZE) < (PAGE_SIZE / 2)) {
+	pteval_t extra_kern_pgd_flags = 0;
+	/* Do we need to also populate the shadow pgd? */
+	if (is_userspace_pgd(pgdp)) {
 		native_get_shadow_pgd(pgdp)->pgd = pgd.pgd;
+		/*
+		 * Even if the entry is *mapping* userspace, ensure
+		 * that userspace can not use it.  This way, if we
+		 * get out to userspace running on the kernel CR3,
+		 * userspace will crash instead of running.
+		 */
+		extra_kern_pgd_flags = _PAGE_NX;
 	}
-
-	pgdp->pgd = pgd.pgd & ~_PAGE_USER;
+	pgdp->pgd = pgd.pgd;
+	pgdp->pgd |= extra_kern_pgd_flags;
 #else /* CONFIG_KAISER */
 	*pgdp = pgd;
 #endif
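Note on native_set_pgd() above: entries in the lower half of the PGD page (userspace) are mirrored into the shadow PGD, and the kernel copy additionally gets NX; entries in the upper half (kernel) are written to the kernel copy only. The following userspace simulation is a sketch under stated assumptions: the sim_* names are hypothetical, a plain uint64_t stands in for pgd_t, and an 8k-aligned static array stands in for the order-1 PGD allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_PAGE_SIZE	4096UL
#define SIM_PAGE_NX	(1ULL << 63)

/* 8k-aligned pair: entries 0..511 kernel PGD, 512..1023 shadow PGD. */
_Alignas(8192) uint64_t sim_pgd[1024];

/* Lower half of a PGD page covers userspace addresses. */
static inline bool sim_is_userspace_pgd(const void *ptr)
{
	return ((uintptr_t)ptr % SIM_PAGE_SIZE) < (SIM_PAGE_SIZE / 2);
}

/* Setting bit 12 moves from the kernel PGD to its shadow twin. */
static inline uint64_t *sim_get_shadow_pgd(uint64_t *pgdp)
{
	return (uint64_t *)((uintptr_t)pgdp | SIM_PAGE_SIZE);
}

static inline void sim_set_pgd(uint64_t *pgdp, uint64_t val)
{
	uint64_t extra_kern_flags = 0;

	if (sim_is_userspace_pgd(pgdp)) {
		/* The shadow copy gets the entry unchanged ... */
		*sim_get_shadow_pgd(pgdp) = val;
		/* ... while the kernel copy is made non-executable. */
		extra_kern_flags = SIM_PAGE_NX;
	}
	*pgdp = val | extra_kern_flags;
}
```

The bit-12 trick only holds because the backing block is 8k-aligned, which is what PGD_ALLOCATION_ORDER 1 and the `.balign 2 * PAGE_SIZE` in head_64.S are there to provide.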
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -42,7 +42,7 @@
 #ifdef CONFIG_KAISER
 #define _PAGE_GLOBAL	(_AT(pteval_t, 0))
 #else
-#define _PAGE_GLOBAL  (_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL)
+#define _PAGE_GLOBAL	(_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL)
 #endif
 #define _PAGE_SOFTW1	(_AT(pteval_t, 1) << _PAGE_BIT_SOFTW1)
 #define _PAGE_SOFTW2	(_AT(pteval_t, 1) << _PAGE_BIT_SOFTW2)
@@ -93,11 +93,7 @@
 #define _PAGE_NX	(_AT(pteval_t, 0))
 #endif
 
-#ifdef CONFIG_KAISER
-#define _PAGE_PROTNONE	(_AT(pteval_t, 0))
-#else
 #define _PAGE_PROTNONE  (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
-#endif
 
 #define _PAGE_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |	\
 			 _PAGE_ACCESSED | _PAGE_DIRTY)
--- a/arch/x86/kernel/espfix_64.c
+++ b/arch/x86/kernel/espfix_64.c
@@ -127,11 +127,14 @@ void __init init_espfix_bsp(void)
 	/* Install the espfix pud into the kernel page directory */
 	pgd_p = &init_level4_pgt[pgd_index(ESPFIX_BASE_ADDR)];
 	pgd_populate(&init_mm, pgd_p, (pud_t *)espfix_pud_page);
-#ifdef CONFIG_KAISER
-	// add the esp stack pud to the shadow mapping here.
-	// This can be done directly, because the fixup stack has its own pud
-	set_pgd(native_get_shadow_pgd(pgd_p), __pgd(_PAGE_TABLE | __pa((pud_t *)espfix_pud_page)));
-#endif
+	/*
+	 * Just copy the top-level PGD that is mapping the espfix
+	 * area to ensure it is mapped into the shadow user page
+	 * tables.
+	 */
+	if (IS_ENABLED(CONFIG_KAISER))
+		set_pgd(native_get_shadow_pgd(pgd_p),
+			__pgd(_KERNPG_TABLE | __pa((pud_t *)espfix_pud_page)));
 
 	/* Randomize the locations */
 	init_espfix_random();
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -442,11 +442,24 @@ early_idt_ripmsg:
 GLOBAL(name)
 
 #ifdef CONFIG_KAISER
+/*
+ * Each PGD needs to be 8k long and 8k aligned.  We do not
+ * ever go out to userspace with these, so we do not
+ * strictly *need* the second page, but this allows us to
+ * have a single set_pgd() implementation that does not
+ * need to worry about whether it has 4k or 8k to work
+ * with.
+ *
+ * This ensures PGDs are 8k long:
+ */
+#define KAISER_USER_PGD_FILL	512
+/* This ensures they are 8k-aligned: */
 #define NEXT_PGD_PAGE(name) \
 	.balign 2 * PAGE_SIZE; \
 GLOBAL(name)
 #else
 #define NEXT_PGD_PAGE(name) NEXT_PAGE(name)
+#define KAISER_USER_PGD_FILL	0
 #endif
 
 /* Automate the creation of 1 to 1 mapping pmd entries */
@@ -461,6 +474,7 @@ GLOBAL(name)
 NEXT_PGD_PAGE(early_level4_pgt)
 	.fill	511,8,0
 	.quad	level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
+	.fill	KAISER_USER_PGD_FILL,8,0
 
 NEXT_PAGE(early_dynamic_pgts)
 	.fill	512*EARLY_DYNAMIC_PAGE_TABLES,8,0
@@ -469,7 +483,8 @@ NEXT_PAGE(early_dynamic_pgts)
 
 #ifndef CONFIG_XEN
 NEXT_PGD_PAGE(init_level4_pgt)
-	.fill	2*512,8,0
+	.fill	512,8,0
+	.fill	KAISER_USER_PGD_FILL,8,0
 #else
 NEXT_PGD_PAGE(init_level4_pgt)
 	.quad   level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
@@ -478,6 +493,7 @@ NEXT_PGD_PAGE(init_level4_pgt)
 	.org    init_level4_pgt + L4_START_KERNEL*8, 0
 	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
 	.quad   level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
+	.fill	KAISER_USER_PGD_FILL,8,0
 
 NEXT_PAGE(level3_ident_pgt)
 	.quad	level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
@@ -488,6 +504,7 @@ NEXT_PAGE(level2_ident_pgt)
 	 */
 	PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
 #endif
+	.fill	KAISER_USER_PGD_FILL,8,0
 
 NEXT_PAGE(level3_kernel_pgt)
 	.fill	L3_START_KERNEL,8,0
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -18,6 +18,7 @@
 #include <linux/uaccess.h>
 
 #include <asm/ldt.h>
+#include <asm/kaiser.h>
 #include <asm/desc.h>
 #include <asm/mmu_context.h>
 #include <asm/syscalls.h>
@@ -34,11 +35,21 @@ static void flush_ldt(void *current_mm)
 	set_ldt(pc->ldt->entries, pc->ldt->size);
 }
 
+static void __free_ldt_struct(struct ldt_struct *ldt)
+{
+	if (ldt->size * LDT_ENTRY_SIZE > PAGE_SIZE)
+		vfree(ldt->entries);
+	else
+		free_page((unsigned long)ldt->entries);
+	kfree(ldt);
+}
+
 /* The caller must call finalize_ldt_struct on the result. LDT starts zeroed. */
 static struct ldt_struct *alloc_ldt_struct(int size)
 {
 	struct ldt_struct *new_ldt;
 	int alloc_size;
+	int ret = 0;
 
 	if (size > LDT_ENTRIES)
 		return NULL;
@@ -66,6 +77,14 @@ static struct ldt_struct *alloc_ldt_stru
 		return NULL;
 	}
 
+	/* Map the LDT into the shadow tables; on failure, free and bail. */
+	/* kaiser_add_mapping() returns 0 on success, nonzero on error. */
+	ret = kaiser_add_mapping((unsigned long)new_ldt->entries, alloc_size,
+				 __PAGE_KERNEL);
+	if (ret) {
+		__free_ldt_struct(new_ldt);
+		return NULL;
+	}
 	new_ldt->size = size;
 	return new_ldt;
 }
@@ -92,12 +111,10 @@ static void free_ldt_struct(struct ldt_s
 	if (likely(!ldt))
 		return;
 
+	kaiser_remove_mapping((unsigned long)ldt->entries,
+			      ldt->size * LDT_ENTRY_SIZE);
 	paravirt_free_ldt(ldt->entries, ldt->size);
-	if (ldt->size * LDT_ENTRY_SIZE > PAGE_SIZE)
-		vfree(ldt->entries);
-	else
-		free_page((unsigned long)ldt->entries);
-	kfree(ldt);
+	__free_ldt_struct(ldt);
 }
 
 /*
--- a/arch/x86/kernel/tracepoint.c
+++ b/arch/x86/kernel/tracepoint.c
@@ -9,10 +9,12 @@
 #include <linux/atomic.h>
 
 atomic_t trace_idt_ctr = ATOMIC_INIT(0);
+__aligned(PAGE_SIZE)
 struct desc_ptr trace_idt_descr = { NR_VECTORS * 16 - 1,
 				(unsigned long) trace_idt_table };
 
 /* No need to be aligned, but done to keep all IDTs defined the same way. */
+__aligned(PAGE_SIZE)
 gate_desc trace_idt_table[NR_VECTORS] __page_aligned_bss;
 
 static int trace_irq_vector_refcount;
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -1,160 +1,306 @@
-
-
+#include <linux/bug.h>
 #include <linux/kernel.h>
 #include <linux/errno.h>
 #include <linux/string.h>
 #include <linux/types.h>
 #include <linux/bug.h>
 #include <linux/init.h>
+#include <linux/interrupt.h>
 #include <linux/spinlock.h>
 #include <linux/mm.h>
-
 #include <linux/uaccess.h>
+#include <linux/ftrace.h>
+
+#include <asm/kaiser.h>
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 #include <asm/desc.h>
 #ifdef CONFIG_KAISER
 
 __visible DEFINE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
+/*
+ * At runtime, the only things we map are some things for CPU
+ * hotplug, and stacks for new processes.  No two CPUs will ever
+ * be populating the same addresses, so we only need to ensure
+ * that we protect between two CPUs trying to allocate and
+ * populate the same page table page.
+ *
+ * Only take this lock when doing a set_p[4um]d(), but it is not
+ * needed for doing a set_pte().  We assume that only the *owner*
+ * of a given allocation will be doing this for _their_
+ * allocation.
+ *
+ * This ensures that once a system has been running for a while
+ * and there have been stacks all over and these page tables
+ * are fully populated, there will be no further acquisitions of
+ * this lock.
+ */
+static DEFINE_SPINLOCK(shadow_table_allocation_lock);
 
-/**
- * Get the real ppn from a address in kernel mapping.
- * @param address The virtual adrress
- * @return the physical address
+/*
+ * Returns -1 on error.
  */
-static inline unsigned long get_pa_from_mapping (unsigned long address)
+static inline unsigned long get_pa_from_mapping(unsigned long vaddr)
 {
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
 
-	pgd = pgd_offset_k(address);
-	BUG_ON(pgd_none(*pgd) || pgd_large(*pgd));
+	pgd = pgd_offset_k(vaddr);
+	/*
+	 * We made all the kernel PGDs present in kaiser_init().
+	 * We expect them to stay that way.
+	 */
+	BUG_ON(pgd_none(*pgd));
+	/*
+	 * PGDs are either 512GB or 128TB on all x86_64
+	 * configurations.  We don't handle these.
+	 */
+	BUG_ON(pgd_large(*pgd));
+
+	pud = pud_offset(pgd, vaddr);
+	if (pud_none(*pud)) {
+		WARN_ON_ONCE(1);
+		return -1;
+	}
 
-	pud = pud_offset(pgd, address);
-	BUG_ON(pud_none(*pud));
+	if (pud_large(*pud))
+		return (pud_pfn(*pud) << PAGE_SHIFT) | (vaddr & ~PUD_PAGE_MASK);
 
-	if (pud_large(*pud)) {
-		return (pud_pfn(*pud) << PAGE_SHIFT) | (address & ~PUD_PAGE_MASK);
+	pmd = pmd_offset(pud, vaddr);
+	if (pmd_none(*pmd)) {
+		WARN_ON_ONCE(1);
+		return -1;
 	}
 
-	pmd = pmd_offset(pud, address);
-	BUG_ON(pmd_none(*pmd));
+	if (pmd_large(*pmd))
+		return (pmd_pfn(*pmd) << PAGE_SHIFT) | (vaddr & ~PMD_PAGE_MASK);
 
-	if (pmd_large(*pmd)) {
-		return (pmd_pfn(*pmd) << PAGE_SHIFT) | (address & ~PMD_PAGE_MASK);
+	pte = pte_offset_kernel(pmd, vaddr);
+	if (pte_none(*pte)) {
+		WARN_ON_ONCE(1);
+		return -1;
 	}
 
-	pte = pte_offset_kernel(pmd, address);
-	BUG_ON(pte_none(*pte));
-
-	return (pte_pfn(*pte) << PAGE_SHIFT) | (address & ~PAGE_MASK);
+	return (pte_pfn(*pte) << PAGE_SHIFT) | (vaddr & ~PAGE_MASK);
 }
 
-void _kaiser_copy (unsigned long start_addr, unsigned long size,
-					unsigned long flags)
+/*
+ * This is a relatively normal page table walk, except that it
+ * also tries to allocate page table pages along the way.
+ *
+ * Returns a pointer to a PTE on success, or NULL on failure.
+ */
+static pte_t *kaiser_pagetable_walk(unsigned long address, bool is_atomic)
 {
-	pgd_t *pgd;
-	pud_t *pud;
 	pmd_t *pmd;
-	pte_t *pte;
-	unsigned long address;
-	unsigned long end_addr = start_addr + size;
-	unsigned long target_address;
+	pud_t *pud;
+	pgd_t *pgd = native_get_shadow_pgd(pgd_offset_k(address));
+	gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
 
-	for (address = PAGE_ALIGN(start_addr - (PAGE_SIZE - 1));
-			address < PAGE_ALIGN(end_addr); address += PAGE_SIZE) {
-		target_address = get_pa_from_mapping(address);
+	might_sleep();
+	if (is_atomic) {
+		gfp &= ~GFP_KERNEL;
+		gfp |= __GFP_HIGH | __GFP_ATOMIC;
+	}
 
-		pgd = native_get_shadow_pgd(pgd_offset_k(address));
+	if (pgd_none(*pgd)) {
+		WARN_ONCE(1, "All shadow pgds should have been populated");
+		return NULL;
+	}
+	BUILD_BUG_ON(pgd_large(*pgd) != 0);
 
-		BUG_ON(pgd_none(*pgd) && "All shadow pgds should be mapped at this time\n");
-		BUG_ON(pgd_large(*pgd));
+	pud = pud_offset(pgd, address);
+	/* The shadow page tables do not use large mappings: */
+	if (pud_large(*pud)) {
+		WARN_ON(1);
+		return NULL;
+	}
+	if (pud_none(*pud)) {
+		unsigned long new_pmd_page = __get_free_page(gfp);
+		if (!new_pmd_page)
+			return NULL;
+		spin_lock(&shadow_table_allocation_lock);
+		if (pud_none(*pud))
+			set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
+		else
+			free_page(new_pmd_page);
+		spin_unlock(&shadow_table_allocation_lock);
+	}
 
-		pud = pud_offset(pgd, address);
-		if (pud_none(*pud)) {
-			set_pud(pud, __pud(_PAGE_TABLE | __pa(pmd_alloc_one(0, address))));
-		}
-		BUG_ON(pud_large(*pud));
+	pmd = pmd_offset(pud, address);
+	/* The shadow page tables do not use large mappings: */
+	if (pmd_large(*pmd)) {
+		WARN_ON(1);
+		return NULL;
+	}
+	if (pmd_none(*pmd)) {
+		unsigned long new_pte_page = __get_free_page(gfp);
+		if (!new_pte_page)
+			return NULL;
+		spin_lock(&shadow_table_allocation_lock);
+		if (pmd_none(*pmd))
+			set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
+		else
+			free_page(new_pte_page);
+		spin_unlock(&shadow_table_allocation_lock);
+	}
 
-		pmd = pmd_offset(pud, address);
-		if (pmd_none(*pmd)) {
-			set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte_alloc_one_kernel(0, address))));
-		}
-		BUG_ON(pmd_large(*pmd));
+	return pte_offset_kernel(pmd, address);
+}
 
-		pte = pte_offset_kernel(pmd, address);
+int kaiser_add_user_map(const void *__start_addr, unsigned long size,
+			unsigned long flags)
+{
+	int ret = 0;
+	pte_t *pte;
+	unsigned long start_addr = (unsigned long)__start_addr;
+	unsigned long address = start_addr & PAGE_MASK;
+	unsigned long end_addr = PAGE_ALIGN(start_addr + size);
+	unsigned long target_address;
+
+	for (; address < end_addr; address += PAGE_SIZE) {
+		target_address = get_pa_from_mapping(address);
+		if (target_address == -1) {
+			ret = -EIO;
+			break;
+		}
+		pte = kaiser_pagetable_walk(address, false);
 		if (pte_none(*pte)) {
 			set_pte(pte, __pte(flags | target_address));
 		} else {
-			BUG_ON(__pa(pte_page(*pte)) != target_address);
+			pte_t tmp;
+			set_pte(&tmp, __pte(flags | target_address));
+			WARN_ON_ONCE(!pte_same(*pte, tmp));
 		}
 	}
+	return ret;
 }
 
-// at first, add a pmd for every pgd entry in the shadowmem-kernel-part of the kernel mapping
-static inline void __init _kaiser_init(void)
+static int kaiser_add_user_map_ptrs(const void *start, const void *end, unsigned long flags)
+{
+	unsigned long size = end - start;
+
+	return kaiser_add_user_map(start, size, flags);
+}
+
+/*
+ * Ensure that the top level of the (shadow) page tables are
+ * entirely populated.  This ensures that all processes that get
+ * forked have the same entries.  This way, we do not have to
+ * ever go set up new entries in older processes.
+ *
+ * Note: we never free these, so there are no updates to them
+ * after this.
+ */
+static void __init kaiser_init_all_pgds(void)
 {
 	pgd_t *pgd;
 	int i = 0;
 
 	pgd = native_get_shadow_pgd(pgd_offset_k((unsigned long )0));
 	for (i = PTRS_PER_PGD / 2; i < PTRS_PER_PGD; i++) {
-		set_pgd(pgd + i, __pgd(_PAGE_TABLE |__pa(pud_alloc_one(0, 0))));
+		pgd_t new_pgd;
+		pud_t *pud = pud_alloc_one(&init_mm, PAGE_OFFSET + i * PGDIR_SIZE);
+		if (!pud) {
+			WARN_ON(1);
+			break;
+		}
+		new_pgd = __pgd(_KERNPG_TABLE | __pa(pud));
+		/*
+		 * Make sure not to stomp on some other pgd entry.
+		 */
+		if (!pgd_none(pgd[i])) {
+			WARN_ON(1);
+			continue;
+		}
+		set_pgd(pgd + i, new_pgd);
 	}
 }
 
+#define kaiser_add_user_map_early(start, size, flags) do {	\
+	int __ret = kaiser_add_user_map(start, size, flags);	\
+	WARN_ON(__ret);						\
+} while (0)
+
+#define kaiser_add_user_map_ptrs_early(start, end, flags) do {		\
+	int __ret = kaiser_add_user_map_ptrs(start, end, flags);	\
+	WARN_ON(__ret);							\
+} while (0)
+
 extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
-spinlock_t shadow_table_lock;
+/*
+ * If anything in here fails, we will likely die on one of the
+ * first kernel->user transitions and init will die.  But, we
+ * will have most of the kernel up by then and should be able to
+ * get a clean warning out of it.  If we BUG_ON() here, we run
+ * the risk of being before we have good console output.
+ */
 void __init kaiser_init(void)
 {
 	int cpu;
-	spin_lock_init(&shadow_table_lock);
-
-	spin_lock(&shadow_table_lock);
 
-	_kaiser_init();
+	kaiser_init_all_pgds();
 
 	for_each_possible_cpu(cpu) {
-		// map the per cpu user variables
-		_kaiser_copy(
-				(unsigned long) (__per_cpu_user_mapped_start + per_cpu_offset(cpu)),
-				(unsigned long) __per_cpu_user_mapped_end - (unsigned long) __per_cpu_user_mapped_start,
-				__PAGE_KERNEL);
-	}
-
-	// map the entry/exit text section, which is responsible to switch between user- and kernel mode
-	_kaiser_copy(
-			(unsigned long) __entry_text_start,
-			(unsigned long) __entry_text_end - (unsigned long) __entry_text_start,
-			__PAGE_KERNEL_RX);
-
-	// the fixed map address of the idt_table
-	_kaiser_copy(
-			(unsigned long) idt_descr.address,
-			sizeof(gate_desc) * NR_VECTORS,
-			__PAGE_KERNEL_RO);
+		void *percpu_vaddr = __per_cpu_user_mapped_start +
+				     per_cpu_offset(cpu);
+		unsigned long percpu_sz = __per_cpu_user_mapped_end -
+					  __per_cpu_user_mapped_start;
+		kaiser_add_user_map_early(percpu_vaddr, percpu_sz,
+					  __PAGE_KERNEL);
+	}
 
-	spin_unlock(&shadow_table_lock);
+	/*
+	 * Map the entry/exit text section, which is needed at
+	 * transitions between user and kernel mode.
+	 */
+	kaiser_add_user_map_ptrs_early(__entry_text_start, __entry_text_end,
+				       __PAGE_KERNEL_RX);
+
+#if defined(CONFIG_FUNCTION_GRAPH_TRACER) || defined(CONFIG_KASAN)
+	kaiser_add_user_map_ptrs_early(__irqentry_text_start,
+				       __irqentry_text_end,
+				       __PAGE_KERNEL_RX);
+#endif
+	kaiser_add_user_map_early((void *)idt_descr.address,
+				  sizeof(gate_desc) * NR_VECTORS,
+				  __PAGE_KERNEL_RO);
+#ifdef CONFIG_TRACING
+	kaiser_add_user_map_early(&trace_idt_descr,
+				  sizeof(trace_idt_descr),
+				  __PAGE_KERNEL);
+	kaiser_add_user_map_early(&trace_idt_table,
+				  sizeof(gate_desc) * NR_VECTORS,
+				  __PAGE_KERNEL);
+#endif
+	kaiser_add_user_map_early(&debug_idt_descr, sizeof(debug_idt_descr),
+				  __PAGE_KERNEL);
+	kaiser_add_user_map_early(&debug_idt_table,
+				  sizeof(gate_desc) * NR_VECTORS,
+				  __PAGE_KERNEL);
 }
 
+extern void unmap_pud_range_nofree(pgd_t *pgd, unsigned long start, unsigned long end);
 // add a mapping to the shadow-mapping, and synchronize the mappings
-void kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags)
+int kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags)
 {
-	spin_lock(&shadow_table_lock);
-	_kaiser_copy(addr, size, flags);
-	spin_unlock(&shadow_table_lock);
+	return kaiser_add_user_map((const void *)addr, size, flags);
 }
 
-extern void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end);
 void kaiser_remove_mapping(unsigned long start, unsigned long size)
 {
-	pgd_t *pgd = native_get_shadow_pgd(pgd_offset_k(start));
-	spin_lock(&shadow_table_lock);
-	do {
-		unmap_pud_range(pgd, start, start + size);
-	} while (pgd++ != native_get_shadow_pgd(pgd_offset_k(start + size)));
-	spin_unlock(&shadow_table_lock);
+	unsigned long end = start + size;
+	unsigned long addr;
+
+	for (addr = start; addr < end; addr += PGDIR_SIZE) {
+		pgd_t *pgd = native_get_shadow_pgd(pgd_offset_k(addr));
+		/*
+		 * unmap_pud_range_nofree() handles > PGDIR_SIZE unmaps,
+		 * so no need to trim 'end'.
+		 */
+		unmap_pud_range_nofree(pgd, addr, end);
+	}
 }
 #endif /* CONFIG_KAISER */
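Note on the loop bounds in kaiser_add_user_map() above: the start address is rounded down to a page boundary and the end is rounded up, so a range that straddles a boundary maps both pages. A small sketch of that arithmetic, assuming 4k pages; the SIM_* macros and sim_pages_touched() are hypothetical stand-ins for PAGE_MASK, PAGE_ALIGN() and the loop itself.

```c
#include <assert.h>

#define SIM_PAGE_SIZE		4096UL
#define SIM_PAGE_MASK		(~(SIM_PAGE_SIZE - 1))
#define SIM_PAGE_ALIGN(x)	(((x) + SIM_PAGE_SIZE - 1) & SIM_PAGE_MASK)

/* Count the pages the mapping loop would visit for [start, start + size). */
static inline unsigned long sim_pages_touched(unsigned long start,
					      unsigned long size)
{
	unsigned long address = start & SIM_PAGE_MASK;
	unsigned long end_addr = SIM_PAGE_ALIGN(start + size);
	unsigned long pages = 0;

	for (; address < end_addr; address += SIM_PAGE_SIZE)
		pages++;

	return pages;
}
```

For example, a 0x20-byte object starting at 0x1ff0 crosses into the next page, so two PTEs are populated in the shadow tables.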
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -52,6 +52,7 @@ static DEFINE_SPINLOCK(cpa_lock);
 #define CPA_FLUSHTLB 1
 #define CPA_ARRAY 2
 #define CPA_PAGES_ARRAY 4
+#define CPA_FREE_PAGETABLES 8
 
 #ifdef CONFIG_PROC_FS
 static unsigned long direct_pages_count[PG_LEVEL_NUM];
@@ -723,10 +724,13 @@ static int split_large_page(struct cpa_d
 	return 0;
 }
 
-static bool try_to_free_pte_page(pte_t *pte)
+static bool try_to_free_pte_page(struct cpa_data *cpa, pte_t *pte)
 {
 	int i;
 
+	if (!(cpa->flags & CPA_FREE_PAGETABLES))
+		return false;
+
 	for (i = 0; i < PTRS_PER_PTE; i++)
 		if (!pte_none(pte[i]))
 			return false;
@@ -735,10 +739,13 @@ static bool try_to_free_pte_page(pte_t *
 	return true;
 }
 
-static bool try_to_free_pmd_page(pmd_t *pmd)
+static bool try_to_free_pmd_page(struct cpa_data *cpa, pmd_t *pmd)
 {
 	int i;
 
+	if (!(cpa->flags & CPA_FREE_PAGETABLES))
+		return false;
+
 	for (i = 0; i < PTRS_PER_PMD; i++)
 		if (!pmd_none(pmd[i]))
 			return false;
@@ -759,7 +766,9 @@ static bool try_to_free_pud_page(pud_t *
 	return true;
 }
 
-static bool unmap_pte_range(pmd_t *pmd, unsigned long start, unsigned long end)
+static bool unmap_pte_range(struct cpa_data *cpa, pmd_t *pmd,
+			    unsigned long start,
+			    unsigned long end)
 {
 	pte_t *pte = pte_offset_kernel(pmd, start);
 
@@ -770,22 +779,23 @@ static bool unmap_pte_range(pmd_t *pmd,
 		pte++;
 	}
 
-	if (try_to_free_pte_page((pte_t *)pmd_page_vaddr(*pmd))) {
+	if (try_to_free_pte_page(cpa, (pte_t *)pmd_page_vaddr(*pmd))) {
 		pmd_clear(pmd);
 		return true;
 	}
 	return false;
 }
 
-static void __unmap_pmd_range(pud_t *pud, pmd_t *pmd,
+static void __unmap_pmd_range(struct cpa_data *cpa, pud_t *pud, pmd_t *pmd,
 			      unsigned long start, unsigned long end)
 {
-	if (unmap_pte_range(pmd, start, end))
-		if (try_to_free_pmd_page((pmd_t *)pud_page_vaddr(*pud)))
+	if (unmap_pte_range(cpa, pmd, start, end))
+		if (try_to_free_pmd_page(cpa, (pmd_t *)pud_page_vaddr(*pud)))
 			pud_clear(pud);
 }
 
-static void unmap_pmd_range(pud_t *pud, unsigned long start, unsigned long end)
+static void unmap_pmd_range(struct cpa_data *cpa, pud_t *pud,
+			    unsigned long start, unsigned long end)
 {
 	pmd_t *pmd = pmd_offset(pud, start);
 
@@ -796,7 +806,7 @@ static void unmap_pmd_range(pud_t *pud,
 		unsigned long next_page = (start + PMD_SIZE) & PMD_MASK;
 		unsigned long pre_end = min_t(unsigned long, end, next_page);
 
-		__unmap_pmd_range(pud, pmd, start, pre_end);
+		__unmap_pmd_range(cpa, pud, pmd, start, pre_end);
 
 		start = pre_end;
 		pmd++;
@@ -809,7 +819,8 @@ static void unmap_pmd_range(pud_t *pud,
 		if (pmd_large(*pmd))
 			pmd_clear(pmd);
 		else
-			__unmap_pmd_range(pud, pmd, start, start + PMD_SIZE);
+			__unmap_pmd_range(cpa, pud, pmd,
+					  start, start + PMD_SIZE);
 
 		start += PMD_SIZE;
 		pmd++;
@@ -819,17 +830,19 @@ static void unmap_pmd_range(pud_t *pud,
 	 * 4K leftovers?
 	 */
 	if (start < end)
-		return __unmap_pmd_range(pud, pmd, start, end);
+		return __unmap_pmd_range(cpa, pud, pmd, start, end);
 
 	/*
 	 * Try again to free the PMD page if haven't succeeded above.
 	 */
 	if (!pud_none(*pud))
-		if (try_to_free_pmd_page((pmd_t *)pud_page_vaddr(*pud)))
+		if (try_to_free_pmd_page(cpa, (pmd_t *)pud_page_vaddr(*pud)))
 			pud_clear(pud);
 }
 
-void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end)
+static void __unmap_pud_range(struct cpa_data *cpa, pgd_t *pgd,
+			      unsigned long start,
+			      unsigned long end)
 {
 	pud_t *pud = pud_offset(pgd, start);
 
@@ -840,7 +853,7 @@ void unmap_pud_range(pgd_t *pgd, unsigne
 		unsigned long next_page = (start + PUD_SIZE) & PUD_MASK;
 		unsigned long pre_end	= min_t(unsigned long, end, next_page);
 
-		unmap_pmd_range(pud, start, pre_end);
+		unmap_pmd_range(cpa, pud, start, pre_end);
 
 		start = pre_end;
 		pud++;
@@ -854,7 +867,7 @@ void unmap_pud_range(pgd_t *pgd, unsigne
 		if (pud_large(*pud))
 			pud_clear(pud);
 		else
-			unmap_pmd_range(pud, start, start + PUD_SIZE);
+			unmap_pmd_range(cpa, pud, start, start + PUD_SIZE);
 
 		start += PUD_SIZE;
 		pud++;
@@ -864,7 +877,7 @@ void unmap_pud_range(pgd_t *pgd, unsigne
 	 * 2M leftovers?
 	 */
 	if (start < end)
-		unmap_pmd_range(pud, start, end);
+		unmap_pmd_range(cpa, pud, start, end);
 
 	/*
 	 * No need to try to free the PUD page because we'll free it in
@@ -872,6 +885,24 @@ void unmap_pud_range(pgd_t *pgd, unsigne
 	 */
 }
 
+static void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end)
+{
+	struct cpa_data cpa = {
+		.flags = CPA_FREE_PAGETABLES,
+	};
+
+	__unmap_pud_range(&cpa, pgd, start, end);
+}
+
+void unmap_pud_range_nofree(pgd_t *pgd, unsigned long start, unsigned long end)
+{
+	struct cpa_data cpa = {
+		.flags = 0,
+	};
+
+	__unmap_pud_range(&cpa, pgd, start, end);
+}
+
 static void unmap_pgd_range(pgd_t *root, unsigned long addr, unsigned long end)
 {
 	pgd_t *pgd_entry = root + pgd_index(addr);
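Note on the pageattr.c change above: the new CPA_FREE_PAGETABLES flag lets one unmap walker serve two callers. unmap_pud_range() passes the flag and frees empty page-table pages as before; unmap_pud_range_nofree() passes 0, so the shadow page tables stay allocated after their entries are cleared. A stripped-down sketch of that gating follows; the sim_* names are hypothetical, and a plain array of unsigned long stands in for a page-table page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_CPA_FREE_PAGETABLES 8

struct sim_cpa {
	unsigned int flags;
};

/*
 * Returns true when the table may be released: the caller asked for
 * freeing and every entry is empty.  The real code would free the
 * page at that point; this sketch only reports the decision.
 */
static inline bool sim_try_to_free_table(const struct sim_cpa *cpa,
					 const unsigned long *table,
					 size_t nr_entries)
{
	size_t i;

	if (!(cpa->flags & SIM_CPA_FREE_PAGETABLES))
		return false;

	for (i = 0; i < nr_entries; i++)
		if (table[i] != 0)
			return false;

	return true;
}
```

Keeping the shadow tables populated appears to match the comment on shadow_table_allocation_lock in kaiser.c, which expects allocations to stop once the tables are fully built.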
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -340,40 +340,26 @@ static inline void _pgd_free(pgd_t *pgd)
 		kmem_cache_free(pgd_cache, pgd);
 }
 #else
-static inline pgd_t *_pgd_alloc(void)
-{
-#ifdef CONFIG_KAISER
-	// Instead of one PML4, we aquire two PML4s and, thus, an 8kb-aligned memory
-	// block. Therefore, we have to allocate at least 3 pages. However, the
-	// __get_free_pages returns us 4 pages. Hence, we store the base pointer at
-	// the beginning of the page of our 8kb-aligned memory block in order to
-	// correctly free it afterwars.
 
-	unsigned long pages = __get_free_pages(PGALLOC_GFP, get_order(4*PAGE_SIZE));
-
-	if(native_get_normal_pgd((pgd_t*) pages) == (pgd_t*) pages)
-	{
-		*((unsigned long*)(pages + 2 * PAGE_SIZE)) = pages;
-		return (pgd_t *) pages;
-	}
-	else
-	{
-		*((unsigned long*)(pages + 3 * PAGE_SIZE)) = pages;
-		return (pgd_t *) (pages + PAGE_SIZE);
-	}
+#ifdef CONFIG_KAISER
+/*
+ * Instead of one pmd, we aquire two pmds.  Being order-1, it is
+ * both 8k in size and 8k-aligned.  That lets us just flip bit 12
+ * in a pointer to swap between the two 4k halves.
+ */
+#define PGD_ALLOCATION_ORDER 1
 #else
-	return (pgd_t *)__get_free_page(PGALLOC_GFP);
+#define PGD_ALLOCATION_ORDER 0
 #endif
+
+static inline pgd_t *_pgd_alloc(void)
+{
+	return (pgd_t *)__get_free_pages(PGALLOC_GFP, PGD_ALLOCATION_ORDER);
 }
 
 static inline void _pgd_free(pgd_t *pgd)
 {
-#ifdef CONFIG_KAISER
-  unsigned long pages = *((unsigned long*) ((char*) pgd + 2 * PAGE_SIZE));
-	free_pages(pages, get_order(4*PAGE_SIZE));
-#else
-	free_page((unsigned long)pgd);
-#endif
+	free_pages((unsigned long)pgd, PGD_ALLOCATION_ORDER);
 }
 #endif /* CONFIG_X86_PAE */
 
--- /dev/null
+++ b/include/linux/kaiser.h
@@ -0,0 +1,26 @@
+#ifndef _INCLUDE_KAISER_H
+#define _INCLUDE_KAISER_H
+
+#ifdef CONFIG_KAISER
+#include <asm/kaiser.h>
+#else
+
+/*
+ * These stubs are used whenever CONFIG_KAISER is off, which
+ * includes architectures that support KAISER, but have it
+ * disabled.
+ */
+
+static inline void kaiser_init(void)
+{
+}
+static inline void kaiser_remove_mapping(unsigned long start, unsigned long size)
+{
+}
+static inline int kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags)
+{
+	return 0;
+}
+
+#endif /* !CONFIG_KAISER */
+#endif /* _INCLUDE_KAISER_H */
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -58,6 +58,7 @@
 #include <linux/tsacct_kern.h>
 #include <linux/cn_proc.h>
 #include <linux/freezer.h>
+#include <linux/kaiser.h>
 #include <linux/delayacct.h>
 #include <linux/taskstats_kern.h>
 #include <linux/random.h>
@@ -335,7 +336,6 @@ void set_task_stack_end_magic(struct tas
 	*stackend = STACK_END_MAGIC;	/* for overflow detection */
 }
 
-extern void kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags);
 static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 {
 	struct task_struct *tsk;
@@ -357,9 +357,10 @@ static struct task_struct *dup_task_stru
 		goto free_ti;
 
 	tsk->stack = ti;
-#ifdef CONFIG_KAISER
-	kaiser_add_mapping((unsigned long)tsk->stack, THREAD_SIZE, __PAGE_KERNEL);
-#endif
+
+	err= kaiser_add_mapping((unsigned long)tsk->stack, THREAD_SIZE, __PAGE_KERNEL);
+	if (err)
+		goto free_ti;
 #ifdef CONFIG_SECCOMP
 	/*
 	 * We must handle setting up seccomp filters once we're under
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -32,12 +32,17 @@ config SECURITY
 	  If you are unsure how to answer this question, answer N.
 config KAISER
 	bool "Remove the kernel mapping in user mode"
+	default y
 	depends on X86_64
 	depends on !PARAVIRT
 	help
 	  This enforces a strict kernel and user space isolation in order to close
 	  hardware side channels on kernel address information.
 
+config KAISER_REAL_SWITCH
+	bool "KAISER: actually switch page tables"
+	default y
+
 config SECURITYFS
 	bool "Enable the securityfs filesystem"
 	help


* [PATCH 4.4 04/37] kaiser: do not set _PAGE_NX on pgd_none
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (2 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 03/37] kaiser: merged update Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 05/37] kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE Greg Kroah-Hartman
                   ` (42 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


native_pgd_clear() uses native_set_pgd(), so native_set_pgd() must
avoid setting the _PAGE_NX bit on an otherwise pgd_none() entry:
usually that just generated a warning on exit, but sometimes
more mysterious and damaging failures (our production machines
could not complete booting).

The original fix to this just avoided adding _PAGE_NX to
an empty entry; but eventually more problems surfaced with kexec,
and EFI mapping was expected to be a problem too.  So now instead
change native_set_pgd() to update shadow only if _PAGE_USER:

A few places (kernel/machine_kexec_64.c, platform/efi/efi_64.c for sure)
use set_pgd() to set up a temporary internal virtual address space, with
physical pages remapped at what Kaiser regards as userspace addresses:
Kaiser then assumes a shadow pgd follows, which it will try to corrupt.

This appears to be responsible for the recent kexec and kdump failures;
though it's unclear how those did not manifest as a problem before.
Ah, the shadow pgd will only be assumed to "follow" if the requested
pgd is on an even-numbered page: so I suppose it was going wrong 50%
of the time all along.

What we need is a flag to set_pgd(), to tell it we're dealing with
userspace.  Er, isn't that what the pgd's _PAGE_USER bit is saying?
Add a test for that.  But we cannot do the same for pgd_clear()
(which may be called to clear corrupted entries - set aside the
question of "corrupt in which pgd?" until later), so there just
rely on pgd_clear() not being called in the problematic cases -
with a WARN_ON_ONCE() which should fire half the time if it is.

But this is getting too big for an inline function: move it into
arch/x86/mm/kaiser.c (which then demands a boot/compressed mod);
and de-void and de-space native_get_shadow/normal_pgd() while here.
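
For readers following along, the decision described above can be sketched
in plain userspace C.  This is only an illustration under stated
assumptions, not the kernel code: every sketch_* name and SKETCH_*
constant is hypothetical, the bit values assume x86-64 (U/S at bit 2,
NX at bit 63), and the WARN_ON_ONCE() is reduced to a silent skip.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; the kernel's definitions live in arch headers. */
#define SKETCH_PAGE_SIZE  4096UL
#define SKETCH_PAGE_USER  0x004UL        /* assumed x86 U/S bit */
#define SKETCH_PAGE_NX    (1ULL << 63)   /* assumed x86-64 NX bit */

typedef struct { uint64_t pgd; } sketch_pgd_t;

/* The shadow pgd is the odd 4k page of an 8k-aligned pair: set bit 12. */
static sketch_pgd_t *sketch_shadow_pgd(sketch_pgd_t *pgdp)
{
	return (sketch_pgd_t *)((uintptr_t)pgdp | SKETCH_PAGE_SIZE);
}

/* Entries in the lower half of a pgd page cover userspace addresses. */
static bool sketch_is_userspace_pgd(sketch_pgd_t *pgdp)
{
	return ((uintptr_t)pgdp % SKETCH_PAGE_SIZE) < (SKETCH_PAGE_SIZE / 2);
}

/* Returns the value to store in the kernel pgd; may also write the shadow. */
static sketch_pgd_t sketch_set_shadow_pgd(sketch_pgd_t *pgdp, sketch_pgd_t pgd)
{
	if (pgd.pgd & SKETCH_PAGE_USER) {
		if (sketch_is_userspace_pgd(pgdp)) {
			/* Mirror the user entry, then poison the kernel copy. */
			sketch_shadow_pgd(pgdp)->pgd = pgd.pgd;
			pgd.pgd |= SKETCH_PAGE_NX;
		}
	} else if (!pgd.pgd) {
		/* Clearing: mirror only when pgdp sits on the even page. */
		if (!((uintptr_t)pgdp & SKETCH_PAGE_SIZE) &&
		    sketch_is_userspace_pgd(pgdp))
			sketch_shadow_pgd(pgdp)->pgd = 0;
	}
	return pgd;
}
```

An entry without _PAGE_USER (the kexec and EFI case) falls through both
branches, so the neighbouring page is never touched.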

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/boot/compressed/misc.h   |    1 
 arch/x86/include/asm/pgtable_64.h |   51 +++++++++-----------------------------
 arch/x86/mm/kaiser.c              |   42 +++++++++++++++++++++++++++++++
 3 files changed, 56 insertions(+), 38 deletions(-)

--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -9,6 +9,7 @@
  */
 #undef CONFIG_PARAVIRT
 #undef CONFIG_PARAVIRT_SPINLOCKS
+#undef CONFIG_KAISER
 #undef CONFIG_KASAN
 
 #include <linux/linkage.h>
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -107,61 +107,36 @@ static inline void native_pud_clear(pud_
 }
 
 #ifdef CONFIG_KAISER
-static inline pgd_t * native_get_shadow_pgd(pgd_t *pgdp)
+extern pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd);
+
+static inline pgd_t *native_get_shadow_pgd(pgd_t *pgdp)
 {
-	return (pgd_t *)(void*)((unsigned long)(void*)pgdp | (unsigned long)PAGE_SIZE);
+	return (pgd_t *)((unsigned long)pgdp | (unsigned long)PAGE_SIZE);
 }
 
-static inline pgd_t * native_get_normal_pgd(pgd_t *pgdp)
+static inline pgd_t *native_get_normal_pgd(pgd_t *pgdp)
 {
-	return (pgd_t *)(void*)((unsigned long)(void*)pgdp &  ~(unsigned long)PAGE_SIZE);
+	return (pgd_t *)((unsigned long)pgdp & ~(unsigned long)PAGE_SIZE);
 }
 #else
-static inline pgd_t * native_get_shadow_pgd(pgd_t *pgdp)
+static inline pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+	return pgd;
+}
+static inline pgd_t *native_get_shadow_pgd(pgd_t *pgdp)
 {
 	BUILD_BUG_ON(1);
 	return NULL;
 }
-static inline pgd_t * native_get_normal_pgd(pgd_t *pgdp)
+static inline pgd_t *native_get_normal_pgd(pgd_t *pgdp)
 {
 	return pgdp;
 }
 #endif /* CONFIG_KAISER */
 
-/*
- * Page table pages are page-aligned.  The lower half of the top
- * level is used for userspace and the top half for the kernel.
- * This returns true for user pages that need to get copied into
- * both the user and kernel copies of the page tables, and false
- * for kernel pages that should only be in the kernel copy.
- */
-static inline bool is_userspace_pgd(void *__ptr)
-{
-	unsigned long ptr = (unsigned long)__ptr;
-
-	return ((ptr % PAGE_SIZE) < (PAGE_SIZE / 2));
-}
-
 static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
 {
-#ifdef CONFIG_KAISER
-	pteval_t extra_kern_pgd_flags = 0;
-	/* Do we need to also populate the shadow pgd? */
-	if (is_userspace_pgd(pgdp)) {
-		native_get_shadow_pgd(pgdp)->pgd = pgd.pgd;
-		/*
-		 * Even if the entry is *mapping* userspace, ensure
-		 * that userspace can not use it.  This way, if we
-		 * get out to userspace running on the kernel CR3,
-		 * userspace will crash instead of running.
-		 */
-		extra_kern_pgd_flags = _PAGE_NX;
-	}
-	pgdp->pgd = pgd.pgd;
-	pgdp->pgd |= extra_kern_pgd_flags;
-#else /* CONFIG_KAISER */
-	*pgdp = pgd;
-#endif
+	*pgdp = kaiser_set_shadow_pgd(pgdp, pgd);
 }
 
 static inline void native_pgd_clear(pgd_t *pgd)
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -303,4 +303,46 @@ void kaiser_remove_mapping(unsigned long
 		unmap_pud_range_nofree(pgd, addr, end);
 	}
 }
+
+/*
+ * Page table pages are page-aligned.  The lower half of the top
+ * level is used for userspace and the top half for the kernel.
+ * This returns true for user pages that need to get copied into
+ * both the user and kernel copies of the page tables, and false
+ * for kernel pages that should only be in the kernel copy.
+ */
+static inline bool is_userspace_pgd(pgd_t *pgdp)
+{
+	return ((unsigned long)pgdp % PAGE_SIZE) < (PAGE_SIZE / 2);
+}
+
+pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+	/*
+	 * Do we need to also populate the shadow pgd?  Check _PAGE_USER to
+	 * skip cases like kexec and EFI which make temporary low mappings.
+	 */
+	if (pgd.pgd & _PAGE_USER) {
+		if (is_userspace_pgd(pgdp)) {
+			native_get_shadow_pgd(pgdp)->pgd = pgd.pgd;
+			/*
+			 * Even if the entry is *mapping* userspace, ensure
+			 * that userspace can not use it.  This way, if we
+			 * get out to userspace running on the kernel CR3,
+			 * userspace will crash instead of running.
+			 */
+			pgd.pgd |= _PAGE_NX;
+		}
+	} else if (!pgd.pgd) {
+		/*
+		 * pgd_clear() cannot check _PAGE_USER, and is even used to
+		 * clear corrupted pgd entries: so just rely on cases like
+		 * kexec and EFI never to be using pgd_clear().
+		 */
+		if (!WARN_ON_ONCE((unsigned long)pgdp & PAGE_SIZE) &&
+		    is_userspace_pgd(pgdp))
+			native_get_shadow_pgd(pgdp)->pgd = pgd.pgd;
+	}
+	return pgd;
+}
 #endif /* CONFIG_KAISER */


* [PATCH 4.4 05/37] kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (3 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 04/37] kaiser: do not set _PAGE_NX on pgd_none Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 06/37] kaiser: fix build and FIXME in alloc_ldt_struct() Greg Kroah-Hartman
                   ` (41 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


Kaiser only needs to map one page of the stack; and
kernel/fork.c did not build on powerpc (no __PAGE_KERNEL).
It's all cleaner if linux/kaiser.h provides kaiser_map_thread_stack()
and kaiser_unmap_thread_stack() wrappers around asm/kaiser.h's
kaiser_add_mapping() and kaiser_remove_mapping().  And use
linux/kaiser.h in init/main.c to avoid the #ifdefs there.
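
The address arithmetic behind "one page of the stack" can be sketched as
follows.  This is a standalone illustration, not the wrapper itself: the
sketch_* names are hypothetical, and a 16k THREAD_SIZE with 4k pages is
an assumption that depends on architecture and configuration.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE   4096UL
#define SKETCH_THREAD_SIZE (4 * SKETCH_PAGE_SIZE)  /* assumed: 16k stacks */

struct sketch_range {
	uintptr_t start;
	unsigned long size;
};

/*
 * Only the topmost page of the stack allocation is mapped: that is the
 * page on which the kernel is entered from user context.
 */
static struct sketch_range sketch_thread_stack_mapping(void *stack)
{
	struct sketch_range r = {
		.start = (uintptr_t)stack + SKETCH_THREAD_SIZE - SKETCH_PAGE_SIZE,
		.size  = SKETCH_PAGE_SIZE,
	};
	return r;
}
```

The same range is used for both mapping and unmapping, which is why the
two wrappers can share the expression.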

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/kaiser.h |   40 +++++++++++++++++++++++++++++++++-------
 init/main.c            |    6 +-----
 kernel/fork.c          |    7 ++-----
 3 files changed, 36 insertions(+), 17 deletions(-)

--- a/include/linux/kaiser.h
+++ b/include/linux/kaiser.h
@@ -1,26 +1,52 @@
-#ifndef _INCLUDE_KAISER_H
-#define _INCLUDE_KAISER_H
+#ifndef _LINUX_KAISER_H
+#define _LINUX_KAISER_H
 
 #ifdef CONFIG_KAISER
 #include <asm/kaiser.h>
+
+static inline int kaiser_map_thread_stack(void *stack)
+{
+	/*
+	 * Map that page of kernel stack on which we enter from user context.
+	 */
+	return kaiser_add_mapping((unsigned long)stack +
+			THREAD_SIZE - PAGE_SIZE, PAGE_SIZE, __PAGE_KERNEL);
+}
+
+static inline void kaiser_unmap_thread_stack(void *stack)
+{
+	/*
+	 * Note: may be called even when kaiser_map_thread_stack() failed.
+	 */
+	kaiser_remove_mapping((unsigned long)stack +
+			THREAD_SIZE - PAGE_SIZE, PAGE_SIZE);
+}
 #else
 
 /*
  * These stubs are used whenever CONFIG_KAISER is off, which
- * includes architectures that support KAISER, but have it
- * disabled.
+ * includes architectures that support KAISER, but have it disabled.
  */
 
 static inline void kaiser_init(void)
 {
 }
-static inline void kaiser_remove_mapping(unsigned long start, unsigned long size)
+static inline int kaiser_add_mapping(unsigned long addr,
+				     unsigned long size, unsigned long flags)
+{
+	return 0;
+}
+static inline void kaiser_remove_mapping(unsigned long start,
+					 unsigned long size)
 {
 }
-static inline int kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags)
+static inline int kaiser_map_thread_stack(void *stack)
 {
 	return 0;
 }
+static inline void kaiser_unmap_thread_stack(void *stack)
+{
+}
 
 #endif /* !CONFIG_KAISER */
-#endif /* _INCLUDE_KAISER_H */
+#endif /* _LINUX_KAISER_H */
--- a/init/main.c
+++ b/init/main.c
@@ -81,15 +81,13 @@
 #include <linux/integrity.h>
 #include <linux/proc_ns.h>
 #include <linux/io.h>
+#include <linux/kaiser.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
 #include <asm/setup.h>
 #include <asm/sections.h>
 #include <asm/cacheflush.h>
-#ifdef CONFIG_KAISER
-#include <asm/kaiser.h>
-#endif
 
 static int kernel_init(void *);
 
@@ -495,9 +493,7 @@ static void __init mm_init(void)
 	pgtable_init();
 	vmalloc_init();
 	ioremap_huge_init();
-#ifdef CONFIG_KAISER
 	kaiser_init();
-#endif
 }
 
 asmlinkage __visible void __init start_kernel(void)
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -168,12 +168,9 @@ static struct thread_info *alloc_thread_
 	return page ? page_address(page) : NULL;
 }
 
-extern void kaiser_remove_mapping(unsigned long start_addr, unsigned long size);
 static inline void free_thread_info(struct thread_info *ti)
 {
-#ifdef CONFIG_KAISER
-	kaiser_remove_mapping((unsigned long)ti, THREAD_SIZE);
-#endif
+	kaiser_unmap_thread_stack(ti);
 	free_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER);
 }
 # else
@@ -358,7 +355,7 @@ static struct task_struct *dup_task_stru
 
 	tsk->stack = ti;
 
-	err= kaiser_add_mapping((unsigned long)tsk->stack, THREAD_SIZE, __PAGE_KERNEL);
+	err = kaiser_map_thread_stack(tsk->stack);
 	if (err)
 		goto free_ti;
 #ifdef CONFIG_SECCOMP


* [PATCH 4.4 06/37] kaiser: fix build and FIXME in alloc_ldt_struct()
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (4 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 05/37] kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 07/37] kaiser: KAISER depends on SMP Greg Kroah-Hartman
                   ` (40 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


Include linux/kaiser.h instead of asm/kaiser.h to build ldt.c without
CONFIG_KAISER.  kaiser_add_mapping() does already return an error code,
so fix the FIXME.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/ldt.c |   10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -16,9 +16,9 @@
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 #include <linux/uaccess.h>
+#include <linux/kaiser.h>
 
 #include <asm/ldt.h>
-#include <asm/kaiser.h>
 #include <asm/desc.h>
 #include <asm/mmu_context.h>
 #include <asm/syscalls.h>
@@ -49,7 +49,7 @@ static struct ldt_struct *alloc_ldt_stru
 {
 	struct ldt_struct *new_ldt;
 	int alloc_size;
-	int ret = 0;
+	int ret;
 
 	if (size > LDT_ENTRIES)
 		return NULL;
@@ -77,10 +77,8 @@ static struct ldt_struct *alloc_ldt_stru
 		return NULL;
 	}
 
-	// FIXME: make kaiser_add_mapping() return an error code
-	// when it fails
-	kaiser_add_mapping((unsigned long)new_ldt->entries, alloc_size,
-			   __PAGE_KERNEL);
+	ret = kaiser_add_mapping((unsigned long)new_ldt->entries, alloc_size,
+				 __PAGE_KERNEL);
 	if (ret) {
 		__free_ldt_struct(new_ldt);
 		return NULL;


* [PATCH 4.4 07/37] kaiser: KAISER depends on SMP
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (5 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 06/37] kaiser: fix build and FIXME in alloc_ldt_struct() Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 08/37] kaiser: fix regs to do_nmi() ifndef CONFIG_KAISER Greg Kroah-Hartman
                   ` (39 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


It is absurd that KAISER should depend on SMP, but apparently nobody
has tried a UP build before: which breaks on implicit declaration of
function 'per_cpu_offset' in arch/x86/mm/kaiser.c.

Now, you would expect that to be trivially fixed up; but looking at
the System.map when that block is #ifdef'ed out of kaiser_init(),
I see that in a UP build __per_cpu_user_mapped_end is precisely at
__per_cpu_user_mapped_start, and the items carefully gathered into
that section for user-mapping on SMP are dispersed elsewhere on UP.

So, some other kind of section assignment will be needed on UP,
but implementing that is not a priority: just make KAISER depend
on SMP for now.

Also inserted a blank line before the option, tidied up the
brief Kconfig help message, and added an "If unsure, Y".

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 security/Kconfig |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

--- a/security/Kconfig
+++ b/security/Kconfig
@@ -30,14 +30,16 @@ config SECURITY
 	  model will be used.
 
 	  If you are unsure how to answer this question, answer N.
+
 config KAISER
 	bool "Remove the kernel mapping in user mode"
 	default y
-	depends on X86_64
-	depends on !PARAVIRT
+	depends on X86_64 && SMP && !PARAVIRT
 	help
-	  This enforces a strict kernel and user space isolation in order to close
-	  hardware side channels on kernel address information.
+	  This enforces a strict kernel and user space isolation, in order
+	  to close hardware side channels on kernel address information.
+
+	  If you are unsure how to answer this question, answer Y.
 
 config KAISER_REAL_SWITCH
 	bool "KAISER: actually switch page tables"


* [PATCH 4.4 08/37] kaiser: fix regs to do_nmi() ifndef CONFIG_KAISER
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (6 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 07/37] kaiser: KAISER depends on SMP Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 09/37] kaiser: fix perf crashes Greg Kroah-Hartman
                   ` (38 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


pjt has observed that nmi's second (nmi_from_kernel) call to do_nmi()
adjusted the %rdi regs arg, rightly when CONFIG_KAISER, but wrongly
when not CONFIG_KAISER.

Although the minimal change is to add an #ifdef CONFIG_KAISER around
the addq line, that looks cluttered, and I prefer how the first call
to do_nmi() handled it: prepare args in %rdi and %rsi before getting
into the CONFIG_KAISER block, since it does not touch them at all.

And while we're here, place the "#ifdef CONFIG_KAISER" that follows
each, to enclose the "Unconditionally restore CR3" comment: matching
how the "Unconditionally use kernel CR3" comment above is enclosed.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/entry/entry_64.S |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1297,12 +1297,13 @@ ENTRY(nmi)
 	movq	%rax, %cr3
 #endif
 	call	do_nmi
+
+#ifdef CONFIG_KAISER
 	/*
 	 * Unconditionally restore CR3.  I know we return to
 	 * kernel code that needs user CR3, but do we ever return
 	 * to "user mode" where we need the kernel CR3?
 	 */
-#ifdef CONFIG_KAISER
 	popq	%rax
 	mov	%rax, %cr3
 #endif
@@ -1526,6 +1527,8 @@ end_repeat_nmi:
 	SWAPGS
 	xorl	%ebx, %ebx
 1:
+	movq	%rsp, %rdi
+	movq	$-1, %rsi
 #ifdef CONFIG_KAISER
 	/* Unconditionally use kernel CR3 for do_nmi() */
 	/* %rax is saved above, so OK to clobber here */
@@ -1538,16 +1541,14 @@ end_repeat_nmi:
 #endif
 
 	/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
-	movq	%rsp, %rdi
-	addq	$8, %rdi /* point %rdi at ptregs, fixed up for CR3 */
-	movq	$-1, %rsi
 	call	do_nmi
+
+#ifdef CONFIG_KAISER
 	/*
 	 * Unconditionally restore CR3.  We might be returning to
 	 * kernel code that needs user CR3, like just just before
 	 * a sysret.
 	 */
-#ifdef CONFIG_KAISER
 	popq	%rax
 	mov	%rax, %cr3
 #endif


* [PATCH 4.4 09/37] kaiser: fix perf crashes
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (7 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 08/37] kaiser: fix regs to do_nmi() ifndef CONFIG_KAISER Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 10/37] kaiser: ENOMEM if kaiser_pagetable_walk() NULL Greg Kroah-Hartman
                   ` (37 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


Avoid perf crashes: place debug_store in the user-mapped per-cpu area
instead of allocating, and use page allocator plus kaiser_add_mapping()
to keep the BTS and PEBS buffers user-mapped (that is, present in the
user mapping, though visible only to kernel and hardware).  The PEBS
fixup buffer does not need this treatment.

The need for a user-mapped struct debug_store showed up before doing
any conscious perf testing: in a couple of kernel paging oopses on
Westmere, implicating the debug_store offset of the per-cpu area.
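
The allocate, map, and undo-on-failure pattern used for the BTS and PEBS
buffers can be sketched in userspace C.  This is an assumption-laden
illustration rather than the kernel implementation: aligned_alloc()
stands in for the page allocator, and sketch_add_mapping() and
sketch_remove_mapping() are hypothetical stubs for kaiser_add_mapping()
and kaiser_remove_mapping().

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL

static int sketch_fail_mapping;   /* knob: force the mapping step to fail */
static int sketch_live_mappings;  /* count of currently mapped buffers */

/* Hypothetical stub: returns 0 on success, negative on failure. */
static int sketch_add_mapping(void *addr, size_t size)
{
	(void)addr;
	(void)size;
	if (sketch_fail_mapping)
		return -1;
	sketch_live_mappings++;
	return 0;
}

/* Hypothetical stub: drops one mapping. */
static void sketch_remove_mapping(void *addr, size_t size)
{
	(void)addr;
	(void)size;
	sketch_live_mappings--;
}

/* Allocate whole zeroed pages, then map them; free again if mapping fails. */
static void *sketch_dsalloc(size_t size)
{
	size_t rounded = (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
	void *buf = aligned_alloc(SKETCH_PAGE_SIZE, rounded);

	if (!buf)
		return NULL;
	memset(buf, 0, rounded);
	if (sketch_add_mapping(buf, rounded) < 0) {
		free(buf);
		return NULL;
	}
	return buf;
}

/* Unmap before freeing; a NULL buffer is ignored. */
static void sketch_dsfree(void *buf, size_t size)
{
	if (!buf)
		return;
	sketch_remove_mapping(buf, size);
	free(buf);
}
```

Whole pages are used because the mapping is page-granular: a slab object
would share its page with unrelated kernel data.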

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/cpu/perf_event_intel_ds.c |   57 +++++++++++++++++++++++-------
 1 file changed, 45 insertions(+), 12 deletions(-)

--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -2,11 +2,15 @@
 #include <linux/types.h>
 #include <linux/slab.h>
 
+#include <asm/kaiser.h>
 #include <asm/perf_event.h>
 #include <asm/insn.h>
 
 #include "perf_event.h"
 
+static
+DEFINE_PER_CPU_SHARED_ALIGNED_USER_MAPPED(struct debug_store, cpu_debug_store);
+
 /* The size of a BTS record in bytes: */
 #define BTS_RECORD_SIZE		24
 
@@ -268,6 +272,39 @@ void fini_debug_store_on_cpu(int cpu)
 
 static DEFINE_PER_CPU(void *, insn_buffer);
 
+static void *dsalloc(size_t size, gfp_t flags, int node)
+{
+#ifdef CONFIG_KAISER
+	unsigned int order = get_order(size);
+	struct page *page;
+	unsigned long addr;
+
+	page = __alloc_pages_node(node, flags | __GFP_ZERO, order);
+	if (!page)
+		return NULL;
+	addr = (unsigned long)page_address(page);
+	if (kaiser_add_mapping(addr, size, __PAGE_KERNEL) < 0) {
+		__free_pages(page, order);
+		addr = 0;
+	}
+	return (void *)addr;
+#else
+	return kmalloc_node(size, flags | __GFP_ZERO, node);
+#endif
+}
+
+static void dsfree(const void *buffer, size_t size)
+{
+#ifdef CONFIG_KAISER
+	if (!buffer)
+		return;
+	kaiser_remove_mapping((unsigned long)buffer, size);
+	free_pages((unsigned long)buffer, get_order(size));
+#else
+	kfree(buffer);
+#endif
+}
+
 static int alloc_pebs_buffer(int cpu)
 {
 	struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
@@ -278,7 +315,7 @@ static int alloc_pebs_buffer(int cpu)
 	if (!x86_pmu.pebs)
 		return 0;
 
-	buffer = kzalloc_node(x86_pmu.pebs_buffer_size, GFP_KERNEL, node);
+	buffer = dsalloc(x86_pmu.pebs_buffer_size, GFP_KERNEL, node);
 	if (unlikely(!buffer))
 		return -ENOMEM;
 
@@ -289,7 +326,7 @@ static int alloc_pebs_buffer(int cpu)
 	if (x86_pmu.intel_cap.pebs_format < 2) {
 		ibuffer = kzalloc_node(PEBS_FIXUP_SIZE, GFP_KERNEL, node);
 		if (!ibuffer) {
-			kfree(buffer);
+			dsfree(buffer, x86_pmu.pebs_buffer_size);
 			return -ENOMEM;
 		}
 		per_cpu(insn_buffer, cpu) = ibuffer;
@@ -315,7 +352,8 @@ static void release_pebs_buffer(int cpu)
 	kfree(per_cpu(insn_buffer, cpu));
 	per_cpu(insn_buffer, cpu) = NULL;
 
-	kfree((void *)(unsigned long)ds->pebs_buffer_base);
+	dsfree((void *)(unsigned long)ds->pebs_buffer_base,
+			x86_pmu.pebs_buffer_size);
 	ds->pebs_buffer_base = 0;
 }
 
@@ -329,7 +367,7 @@ static int alloc_bts_buffer(int cpu)
 	if (!x86_pmu.bts)
 		return 0;
 
-	buffer = kzalloc_node(BTS_BUFFER_SIZE, GFP_KERNEL | __GFP_NOWARN, node);
+	buffer = dsalloc(BTS_BUFFER_SIZE, GFP_KERNEL | __GFP_NOWARN, node);
 	if (unlikely(!buffer)) {
 		WARN_ONCE(1, "%s: BTS buffer allocation failure\n", __func__);
 		return -ENOMEM;
@@ -355,19 +393,15 @@ static void release_bts_buffer(int cpu)
 	if (!ds || !x86_pmu.bts)
 		return;
 
-	kfree((void *)(unsigned long)ds->bts_buffer_base);
+	dsfree((void *)(unsigned long)ds->bts_buffer_base, BTS_BUFFER_SIZE);
 	ds->bts_buffer_base = 0;
 }
 
 static int alloc_ds_buffer(int cpu)
 {
-	int node = cpu_to_node(cpu);
-	struct debug_store *ds;
-
-	ds = kzalloc_node(sizeof(*ds), GFP_KERNEL, node);
-	if (unlikely(!ds))
-		return -ENOMEM;
+	struct debug_store *ds = per_cpu_ptr(&cpu_debug_store, cpu);
 
+	memset(ds, 0, sizeof(*ds));
 	per_cpu(cpu_hw_events, cpu).ds = ds;
 
 	return 0;
@@ -381,7 +415,6 @@ static void release_ds_buffer(int cpu)
 		return;
 
 	per_cpu(cpu_hw_events, cpu).ds = NULL;
-	kfree(ds);
 }
 
 void release_ds_buffers(void)


* [PATCH 4.4 10/37] kaiser: ENOMEM if kaiser_pagetable_walk() NULL
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (8 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 09/37] kaiser: fix perf crashes Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 11/37] kaiser: tidied up asm/kaiser.h somewhat Greg Kroah-Hartman
                   ` (36 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


kaiser_add_user_map() took no notice when kaiser_pagetable_walk() failed.
And avoid its might_sleep() when atomic (though atomic at present unused).
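
The error path being added can be sketched as a standalone loop.  This is
a simplified illustration, not the kernel function: the sketch_* names
are hypothetical, a pte is modelled as a bare unsigned long, and the walk
is passed in as a function pointer so that the failure case is easy to
exercise.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical stand-in for the page table walk: NULL means failure. */
typedef unsigned long *(*sketch_walk_fn)(unsigned long address);

static unsigned long sketch_ptes[8];

/* A walk that always succeeds, backed by a tiny static table. */
static unsigned long *sketch_walk_ok(unsigned long address)
{
	return &sketch_ptes[(address / SKETCH_PAGE_SIZE) % 8];
}

/* A walk that always fails, as if a page table page could not be allocated. */
static unsigned long *sketch_walk_fail(unsigned long address)
{
	(void)address;
	return NULL;
}

static int sketch_add_user_map(unsigned long start, unsigned long size,
			       unsigned long flags, sketch_walk_fn walk)
{
	unsigned long address = start & ~(SKETCH_PAGE_SIZE - 1);
	unsigned long end = (start + size + SKETCH_PAGE_SIZE - 1) &
			    ~(SKETCH_PAGE_SIZE - 1);

	for (; address < end; address += SKETCH_PAGE_SIZE) {
		unsigned long *pte = walk(address);

		if (!pte)
			return -ENOMEM;	/* previously the NULL was dereferenced */
		if (*pte == 0)
			*pte = flags | address;
	}
	return 0;
}
```

With the check in place, callers that already test the return value see
-ENOMEM instead of the loop dereferencing a NULL pte.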

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/mm/kaiser.c |   10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -99,11 +99,11 @@ static pte_t *kaiser_pagetable_walk(unsi
 	pgd_t *pgd = native_get_shadow_pgd(pgd_offset_k(address));
 	gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
 
-	might_sleep();
 	if (is_atomic) {
 		gfp &= ~GFP_KERNEL;
 		gfp |= __GFP_HIGH | __GFP_ATOMIC;
-	}
+	} else
+		might_sleep();
 
 	if (pgd_none(*pgd)) {
 		WARN_ONCE(1, "All shadow pgds should have been populated");
@@ -160,13 +160,17 @@ int kaiser_add_user_map(const void *__st
 	unsigned long end_addr = PAGE_ALIGN(start_addr + size);
 	unsigned long target_address;
 
-	for (;address < end_addr; address += PAGE_SIZE) {
+	for (; address < end_addr; address += PAGE_SIZE) {
 		target_address = get_pa_from_mapping(address);
 		if (target_address == -1) {
 			ret = -EIO;
 			break;
 		}
 		pte = kaiser_pagetable_walk(address, false);
+		if (!pte) {
+			ret = -ENOMEM;
+			break;
+		}
 		if (pte_none(*pte)) {
 			set_pte(pte, __pte(flags | target_address));
 		} else {


* [PATCH 4.4 11/37] kaiser: tidied up asm/kaiser.h somewhat
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (9 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 10/37] kaiser: ENOMEM if kaiser_pagetable_walk() NULL Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 12/37] kaiser: tidied up kaiser_add/remove_mapping slightly Greg Kroah-Hartman
                   ` (35 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


Mainly deleting a surfeit of blank lines, and reflowing header comment.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/kaiser.h |   32 +++++++++++++-------------------
 1 file changed, 13 insertions(+), 19 deletions(-)

--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -1,15 +1,17 @@
 #ifndef _ASM_X86_KAISER_H
 #define _ASM_X86_KAISER_H
-
-/* This file includes the definitions for the KAISER feature.
- * KAISER is a counter measure against x86_64 side channel attacks on the kernel virtual memory.
- * It has a shodow-pgd for every process. the shadow-pgd has a minimalistic kernel-set mapped,
- * but includes the whole user memory. Within a kernel context switch, or when an interrupt is handled,
- * the pgd is switched to the normal one. When the system switches to user mode, the shadow pgd is enabled.
- * By this, the virtual memory chaches are freed, and the user may not attack the whole kernel memory.
+/*
+ * This file includes the definitions for the KAISER feature.
+ * KAISER is a counter measure against x86_64 side channel attacks on
+ * the kernel virtual memory.  It has a shadow pgd for every process: the
+ * shadow pgd has a minimalistic kernel-set mapped, but includes the whole
+ * user memory. Within a kernel context switch, or when an interrupt is handled,
+ * the pgd is switched to the normal one. When the system switches to user mode,
+ * the shadow pgd is enabled. By this, the virtual memory caches are freed,
+ * and the user may not attack the whole kernel memory.
  *
- * A minimalistic kernel mapping holds the parts needed to be mapped in user mode, as the entry/exit functions
- * of the user space, or the stacks.
+ * A minimalistic kernel mapping holds the parts needed to be mapped in user
+ * mode, such as the entry/exit functions of the user space, or the stacks.
  */
 #ifdef __ASSEMBLY__
 #ifdef CONFIG_KAISER
@@ -48,13 +50,10 @@ _SWITCH_TO_KERNEL_CR3 %rax
 movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
 .endm
 
-
 .macro SWITCH_USER_CR3_NO_STACK
-
 movq %rax, PER_CPU_VAR(unsafe_stack_register_backup)
 _SWITCH_TO_USER_CR3 %rax
 movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
-
 .endm
 
 #else /* CONFIG_KAISER */
@@ -72,7 +71,6 @@ movq PER_CPU_VAR(unsafe_stack_register_b
 
 #else /* __ASSEMBLY__ */
 
-
 #ifdef CONFIG_KAISER
 /*
  * Upon kernel/user mode switch, it may happen that the address
@@ -80,7 +78,6 @@ movq PER_CPU_VAR(unsafe_stack_register_b
  * stored.  To change the address space, another register is
  * needed.  A register therefore has to be stored/restored.
 */
-
 DECLARE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
 
 /**
@@ -95,7 +92,6 @@ DECLARE_PER_CPU_USER_MAPPED(unsigned lon
  */
 extern int kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags);
 
-
 /**
  *  kaiser_remove_mapping - unmap a virtual memory part of the shadow mapping
  *  @addr: the start address of the range
@@ -104,12 +100,12 @@ extern int kaiser_add_mapping(unsigned l
 extern void kaiser_remove_mapping(unsigned long start, unsigned long size);
 
 /**
- *  kaiser_initialize_mapping - Initalize the shadow mapping
+ *  kaiser_init - Initialize the shadow mapping
  *
  *  Most parts of the shadow mapping can be mapped upon boot
  *  time.  Only per-process things like the thread stacks
  *  or a new LDT have to be mapped at runtime.  These boot-
- *  time mappings are permanent and nevertunmapped.
+ *  time mappings are permanent and never unmapped.
  */
 extern void kaiser_init(void);
 
@@ -117,6 +113,4 @@ extern void kaiser_init(void);
 
 #endif /* __ASSEMBLY */
 
-
-
 #endif /* _ASM_X86_KAISER_H */


* [PATCH 4.4 12/37] kaiser: tidied up kaiser_add/remove_mapping slightly
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (10 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 11/37] kaiser: tidied up asm/kaiser.h somewhat Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 13/37] kaiser: kaiser_remove_mapping() move along the pgd Greg Kroah-Hartman
                   ` (34 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


Yes, unmap_pud_range_nofree()'s declaration ought to be in a
header file really, but I'm not sure we want to use it anyway:
so for now just declare it inside kaiser_remove_mapping().
And there doesn't seem to be such a thing as unmap_p4d_range(),
even in a 5-level paging tree.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/mm/kaiser.c |    9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -286,8 +286,7 @@ void __init kaiser_init(void)
 				  __PAGE_KERNEL);
 }
 
-extern void unmap_pud_range_nofree(pgd_t *pgd, unsigned long start, unsigned long end);
-// add a mapping to the shadow-mapping, and synchronize the mappings
+/* Add a mapping to the shadow mapping, and synchronize the mappings */
 int kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags)
 {
 	return kaiser_add_user_map((const void *)addr, size, flags);
@@ -295,15 +294,13 @@ int kaiser_add_mapping(unsigned long add
 
 void kaiser_remove_mapping(unsigned long start, unsigned long size)
 {
+	extern void unmap_pud_range_nofree(pgd_t *pgd,
+				unsigned long start, unsigned long end);
 	unsigned long end = start + size;
 	unsigned long addr;
 
 	for (addr = start; addr < end; addr += PGDIR_SIZE) {
 		pgd_t *pgd = native_get_shadow_pgd(pgd_offset_k(addr));
-		/*
-		 * unmap_p4d_range() handles > P4D_SIZE unmaps,
-		 * so no need to trim 'end'.
-		 */
 		unmap_pud_range_nofree(pgd, addr, end);
 	}
 }


* [PATCH 4.4 13/37] kaiser: kaiser_remove_mapping() move along the pgd
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (11 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 12/37] kaiser: tidied up kaiser_add/remove_mapping slightly Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 14/37] kaiser: cleanups while trying for gold link Greg Kroah-Hartman
                   ` (33 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


When removing the bogus comment from kaiser_remove_mapping(),
I really ought to have checked the extent of its bogosity: as
Neel points out, there is nothing to stop unmap_pud_range_nofree()
from continuing beyond the end of a pud (and starting in the wrong
position on the next).

Fix kaiser_remove_mapping() to constrain the extent and advance pgd
pointer correctly: use pgd_addr_end() macro as used throughout base
mm (but don't assume page-rounded start and size in this case).

But this bug was very unlikely to trigger in this backport: since
any buddy allocation is contained within a single pud extent, and
we are not using vmapped stacks (and are only mapping one page of
stack anyway): the only way to hit this bug here would be when
freeing a large modified ldt.
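
Editorial illustration, not part of the patch: a minimal userspace
sketch of the bounded stepping described above. It assumes 4-level
x86_64 paging (PGDIR_SHIFT of 39) and mirrors the arithmetic of the
kernel's pgd_addr_end() macro; struct chunk, pgd_addr_end_sketch()
and split_by_pgd() are hypothetical names introduced only for this
example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4-level x86_64 paging, one pgd entry covers 512 GiB. */
#define PGDIR_SHIFT 39
#define PGDIR_SIZE  (1ULL << PGDIR_SHIFT)
#define PGDIR_MASK  (~(PGDIR_SIZE - 1))

/* Hypothetical helper type: one [start, end) piece of the range. */
struct chunk {
	uint64_t start;
	uint64_t end;
};

/* Next pgd boundary after addr, capped at end. */
static uint64_t pgd_addr_end_sketch(uint64_t addr, uint64_t end)
{
	uint64_t boundary = (addr + PGDIR_SIZE) & PGDIR_MASK;

	/* The "- 1" keeps the comparison valid if a value wraps to 0. */
	return (boundary - 1 < end - 1) ? boundary : end;
}

/*
 * Split [start, start + size) so that no piece crosses a pgd boundary.
 * Returns the number of pieces; only the first max are stored in out.
 */
static size_t split_by_pgd(uint64_t start, uint64_t size,
			   struct chunk *out, size_t max)
{
	uint64_t end = start + size;
	uint64_t addr, next;
	size_t n = 0;

	for (addr = start; addr < end; addr = next) {
		next = pgd_addr_end_sketch(addr, end);
		if (n < max) {
			out[n].start = addr;
			out[n].end = next;
		}
		n++;
	}
	return n;
}
```

A range that straddles a pgd boundary comes back as two pieces, one
per pgd entry, whereas stepping by PGDIR_SIZE from an unaligned start
would have handed the whole range to the first entry. Neither start
nor size needs to be page-rounded for this to hold.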

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/mm/kaiser.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -297,11 +297,13 @@ void kaiser_remove_mapping(unsigned long
 	extern void unmap_pud_range_nofree(pgd_t *pgd,
 				unsigned long start, unsigned long end);
 	unsigned long end = start + size;
-	unsigned long addr;
+	unsigned long addr, next;
+	pgd_t *pgd;
 
-	for (addr = start; addr < end; addr += PGDIR_SIZE) {
-		pgd_t *pgd = native_get_shadow_pgd(pgd_offset_k(addr));
-		unmap_pud_range_nofree(pgd, addr, end);
+	pgd = native_get_shadow_pgd(pgd_offset_k(start));
+	for (addr = start; addr < end; pgd++, addr = next) {
+		next = pgd_addr_end(addr, end);
+		unmap_pud_range_nofree(pgd, addr, next);
 	}
 }
 


* [PATCH 4.4 14/37] kaiser: cleanups while trying for gold link
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (12 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 13/37] kaiser: kaiser_remove_mapping() move along the pgd Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 15/37] kaiser: name that 0x1000 KAISER_SHADOW_PGD_OFFSET Greg Kroah-Hartman
                   ` (32 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


While trying to get our gold link to work, four cleanups:
matched the gdt_page declaration to its definition;
in fiddling unsuccessfully with PERCPU_INPUT(), lined up backslashes;
lined up the backslashes according to convention in percpu-defs.h;
deleted the unused irq_stack_pointer addition to irq_stack_union.

Sad to report that aligning backslashes does not appear to help gold
align to 8192: but while these did not help, they are worth keeping.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/desc.h       |    2 +-
 arch/x86/include/asm/processor.h  |    5 -----
 include/asm-generic/vmlinux.lds.h |   18 ++++++++----------
 include/linux/percpu-defs.h       |   24 ++++++++++++------------
 4 files changed, 21 insertions(+), 28 deletions(-)

--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -43,7 +43,7 @@ struct gdt_page {
 	struct desc_struct gdt[GDT_ENTRIES];
 } __attribute__((aligned(PAGE_SIZE)));
 
-DECLARE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page);
+DECLARE_PER_CPU_PAGE_ALIGNED_USER_MAPPED(struct gdt_page, gdt_page);
 
 static inline struct desc_struct *get_cpu_gdt_table(unsigned int cpu)
 {
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -332,11 +332,6 @@ union irq_stack_union {
 		char gs_base[40];
 		unsigned long stack_canary;
 	};
-
-	struct {
-		char irq_stack_pointer[64];
-		char unused[IRQ_STACK_SIZE - 64];
-	};
 };
 
 DECLARE_PER_CPU_FIRST(union irq_stack_union, irq_stack_union) __visible;
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -725,16 +725,14 @@
  */
 #define PERCPU_INPUT(cacheline)						\
 	VMLINUX_SYMBOL(__per_cpu_start) = .;				\
-	\
-	VMLINUX_SYMBOL(__per_cpu_user_mapped_start) = .;        \
-	*(.data..percpu..first)           \
-	. = ALIGN(cacheline);           \
-	*(.data..percpu..user_mapped)            \
-	*(.data..percpu..user_mapped..shared_aligned)        \
-	. = ALIGN(PAGE_SIZE);           \
-	*(.data..percpu..user_mapped..page_aligned)          \
-	VMLINUX_SYMBOL(__per_cpu_user_mapped_end) = .;        \
-	\
+	VMLINUX_SYMBOL(__per_cpu_user_mapped_start) = .;		\
+	*(.data..percpu..first)						\
+	. = ALIGN(cacheline);						\
+	*(.data..percpu..user_mapped)					\
+	*(.data..percpu..user_mapped..shared_aligned)			\
+	. = ALIGN(PAGE_SIZE);						\
+	*(.data..percpu..user_mapped..page_aligned)			\
+	VMLINUX_SYMBOL(__per_cpu_user_mapped_end) = .;			\
 	. = ALIGN(PAGE_SIZE);						\
 	*(.data..percpu..page_aligned)					\
 	. = ALIGN(cacheline);						\
--- a/include/linux/percpu-defs.h
+++ b/include/linux/percpu-defs.h
@@ -121,10 +121,10 @@
 #define DEFINE_PER_CPU(type, name)					\
 	DEFINE_PER_CPU_SECTION(type, name, "")
 
-#define DECLARE_PER_CPU_USER_MAPPED(type, name)         \
+#define DECLARE_PER_CPU_USER_MAPPED(type, name)				\
 	DECLARE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION)
 
-#define DEFINE_PER_CPU_USER_MAPPED(type, name)          \
+#define DEFINE_PER_CPU_USER_MAPPED(type, name)				\
 	DEFINE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION)
 
 /*
@@ -156,11 +156,11 @@
 	DEFINE_PER_CPU_SECTION(type, name, PER_CPU_SHARED_ALIGNED_SECTION) \
 	____cacheline_aligned_in_smp
 
-#define DECLARE_PER_CPU_SHARED_ALIGNED_USER_MAPPED(type, name)			\
+#define DECLARE_PER_CPU_SHARED_ALIGNED_USER_MAPPED(type, name)		\
 	DECLARE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION PER_CPU_SHARED_ALIGNED_SECTION) \
 	____cacheline_aligned_in_smp
 
-#define DEFINE_PER_CPU_SHARED_ALIGNED_USER_MAPPED(type, name)			\
+#define DEFINE_PER_CPU_SHARED_ALIGNED_USER_MAPPED(type, name)		\
 	DEFINE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION PER_CPU_SHARED_ALIGNED_SECTION) \
 	____cacheline_aligned_in_smp
 
@@ -185,18 +185,18 @@
 /*
  * Declaration/definition used for per-CPU variables that must be page aligned and need to be mapped in user mode.
  */
-#define DECLARE_PER_CPU_PAGE_ALIGNED_USER_MAPPED(type, name)      \
-  DECLARE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION"..page_aligned")   \
-  __aligned(PAGE_SIZE)
-
-#define DEFINE_PER_CPU_PAGE_ALIGNED_USER_MAPPED(type, name)       \
-  DEFINE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION"..page_aligned")    \
-  __aligned(PAGE_SIZE)
+#define DECLARE_PER_CPU_PAGE_ALIGNED_USER_MAPPED(type, name)		\
+	DECLARE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION"..page_aligned") \
+	__aligned(PAGE_SIZE)
+
+#define DEFINE_PER_CPU_PAGE_ALIGNED_USER_MAPPED(type, name)		\
+	DEFINE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION"..page_aligned") \
+	__aligned(PAGE_SIZE)
 
 /*
  * Declaration/definition used for per-CPU variables that must be read mostly.
  */
-#define DECLARE_PER_CPU_READ_MOSTLY(type, name)			\
+#define DECLARE_PER_CPU_READ_MOSTLY(type, name)				\
 	DECLARE_PER_CPU_SECTION(type, name, "..read_mostly")
 
 #define DEFINE_PER_CPU_READ_MOSTLY(type, name)				\


* [PATCH 4.4 15/37] kaiser: name that 0x1000 KAISER_SHADOW_PGD_OFFSET
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (13 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 14/37] kaiser: cleanups while trying for gold link Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 16/37] kaiser: delete KAISER_REAL_SWITCH option Greg Kroah-Hartman
                   ` (31 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


There's a 0x1000 in various places, which looks better with a name.
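
Editorial illustration, not part of the patch: a small C sketch of
what the named constant does in the CR3 switch macros. It assumes
CONFIG_KAISER_REAL_SWITCH is set and, as the name suggests, that the
shadow pgd sits 0x1000 bytes above the kernel pgd in an aligned
two-page allocation. to_kernel_cr3() and to_user_cr3() are
hypothetical stand-ins for the assembler macros.

```c
#include <stdint.h>

#define KAISER_SHADOW_PGD_OFFSET 0x1000ULL

/* Mirrors "andq $(~KAISER_SHADOW_PGD_OFFSET), reg": select the kernel pgd. */
static uint64_t to_kernel_cr3(uint64_t cr3)
{
	return cr3 & ~KAISER_SHADOW_PGD_OFFSET;
}

/* Mirrors "orq $(KAISER_SHADOW_PGD_OFFSET), reg": select the shadow pgd. */
static uint64_t to_user_cr3(uint64_t cr3)
{
	return cr3 | KAISER_SHADOW_PGD_OFFSET;
}
```

Both operations are idempotent, so switching to the kernel pgd while
already on it leaves CR3 pointing at the same table.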

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/entry/entry_64.S     |    4 ++--
 arch/x86/include/asm/kaiser.h |    7 +++++--
 2 files changed, 7 insertions(+), 4 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1292,7 +1292,7 @@ ENTRY(nmi)
 	movq	%cr3, %rax
 	pushq	%rax
 #ifdef CONFIG_KAISER_REAL_SWITCH
-	andq	$(~0x1000), %rax
+	andq	$(~KAISER_SHADOW_PGD_OFFSET), %rax
 #endif
 	movq	%rax, %cr3
 #endif
@@ -1535,7 +1535,7 @@ end_repeat_nmi:
 	movq	%cr3, %rax
 	pushq	%rax
 #ifdef CONFIG_KAISER_REAL_SWITCH
-	andq	$(~0x1000), %rax
+	andq	$(~KAISER_SHADOW_PGD_OFFSET), %rax
 #endif
 	movq	%rax, %cr3
 #endif
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -13,13 +13,16 @@
  * A minimalistic kernel mapping holds the parts needed to be mapped in user
  * mode, such as the entry/exit functions of the user space, or the stacks.
  */
+
+#define KAISER_SHADOW_PGD_OFFSET 0x1000
+
 #ifdef __ASSEMBLY__
 #ifdef CONFIG_KAISER
 
 .macro _SWITCH_TO_KERNEL_CR3 reg
 movq %cr3, \reg
 #ifdef CONFIG_KAISER_REAL_SWITCH
-andq $(~0x1000), \reg
+andq $(~KAISER_SHADOW_PGD_OFFSET), \reg
 #endif
 movq \reg, %cr3
 .endm
@@ -27,7 +30,7 @@ movq \reg, %cr3
 .macro _SWITCH_TO_USER_CR3 reg
 movq %cr3, \reg
 #ifdef CONFIG_KAISER_REAL_SWITCH
-orq $(0x1000), \reg
+orq $(KAISER_SHADOW_PGD_OFFSET), \reg
 #endif
 movq \reg, %cr3
 .endm


* [PATCH 4.4 16/37] kaiser: delete KAISER_REAL_SWITCH option
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (14 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 15/37] kaiser: name that 0x1000 KAISER_SHADOW_PGD_OFFSET Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 17/37] kaiser: vmstat show NR_KAISERTABLE as nr_overhead Greg Kroah-Hartman
                   ` (30 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


We fail to see what CONFIG_KAISER_REAL_SWITCH is for: it seems to be
left over from early development, and now just obscures tricky parts
of the code.  Delete it before adding PCIDs, or nokaiser boot option.

(Or if there is some good reason to keep the option, then it needs
a help text - and a "depends on KAISER", so that all those without
KAISER are not asked the question.)

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/entry/entry_64.S     |    4 ----
 arch/x86/include/asm/kaiser.h |    4 ----
 security/Kconfig              |    4 ----
 3 files changed, 12 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1291,9 +1291,7 @@ ENTRY(nmi)
 	/* %rax is saved above, so OK to clobber here */
 	movq	%cr3, %rax
 	pushq	%rax
-#ifdef CONFIG_KAISER_REAL_SWITCH
 	andq	$(~KAISER_SHADOW_PGD_OFFSET), %rax
-#endif
 	movq	%rax, %cr3
 #endif
 	call	do_nmi
@@ -1534,9 +1532,7 @@ end_repeat_nmi:
 	/* %rax is saved above, so OK to clobber here */
 	movq	%cr3, %rax
 	pushq	%rax
-#ifdef CONFIG_KAISER_REAL_SWITCH
 	andq	$(~KAISER_SHADOW_PGD_OFFSET), %rax
-#endif
 	movq	%rax, %cr3
 #endif
 
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -21,17 +21,13 @@
 
 .macro _SWITCH_TO_KERNEL_CR3 reg
 movq %cr3, \reg
-#ifdef CONFIG_KAISER_REAL_SWITCH
 andq $(~KAISER_SHADOW_PGD_OFFSET), \reg
-#endif
 movq \reg, %cr3
 .endm
 
 .macro _SWITCH_TO_USER_CR3 reg
 movq %cr3, \reg
-#ifdef CONFIG_KAISER_REAL_SWITCH
 orq $(KAISER_SHADOW_PGD_OFFSET), \reg
-#endif
 movq \reg, %cr3
 .endm
 
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -41,10 +41,6 @@ config KAISER
 
 	  If you are unsure how to answer this question, answer Y.
 
-config KAISER_REAL_SWITCH
-	bool "KAISER: actually switch page tables"
-	default y
-
 config SECURITYFS
 	bool "Enable the securityfs filesystem"
 	help


* [PATCH 4.4 17/37] kaiser: vmstat show NR_KAISERTABLE as nr_overhead
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (15 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 16/37] kaiser: delete KAISER_REAL_SWITCH option Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 18/37] kaiser: enhanced by kernel and user PCIDs Greg Kroah-Hartman
                   ` (29 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


The kaiser update made an interesting choice, never to free any shadow
page tables.  Contention on global spinlock was worrying, particularly
with it held across page table scans when freeing.  Something had to be
done: I was going to add refcounting; but simply never to free them is
an appealing choice, minimizing contention without complicating the code
(the more a page table is found already, the less the spinlock is used).

But leaking pages in this way is also a worry: can we get away with it?
At the very least, we need a count to show how bad it actually gets:
in principle, one might end up wasting about 1/256 of memory that way
(1/512 for when direct-mapped pages have to be user-mapped, plus 1/512
for when they are user-mapped from the vmalloc area on another occasion;
but we don't have vmalloc'ed stacks, so only large ldts are vmalloc'ed).

Add per-cpu stat NR_KAISERTABLE: including 256 at startup for the
shared pgd entries, and 1 for each intermediate page table added
thereafter for user-mapping - but leave out the 1 per mm, for its
shadow pgd, because that distracts from the monotonic increase.
Shown in /proc/vmstat as nr_overhead (0 if kaiser not enabled).

In practice, it doesn't look so bad so far: more like 1/12000 after
nine hours of gtests below; and movable pageblock segregation should
tend to cluster the kaiser tables into a subset of the address space
(if not, they will be bad for compaction too).  But production may
tell a different story: keep an eye on this number, and bring back
lighter freeing if it gets out of control (maybe a shrinker).
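
Editorial illustration, not part of the patch: a userspace sketch for
keeping an eye on the number. It assumes the usual "name value" line
format of /proc/vmstat and 512 entries per 4 KiB page table page;
vmstat_lookup() and worst_case_overhead_pages() are hypothetical
helpers, and the caller is expected to have read the file into a
NUL-terminated buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Entries in one 4 KiB x86_64 page table page (8 bytes each). */
#define PTRS_PER_TABLE 512ULL

/*
 * Find "name value" in vmstat-style text.
 * Returns 0 and stores the value on success, -1 if name is absent.
 */
static int vmstat_lookup(const char *text, const char *name,
			 unsigned long long *value)
{
	size_t len = strlen(name);
	const char *line = text;

	while (line && *line) {
		/* Require a space after the name to avoid prefix matches. */
		if (strncmp(line, name, len) == 0 && line[len] == ' ') {
			*value = strtoull(line + len + 1, NULL, 10);
			return 0;
		}
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Worst case from the text above: 1/512 + 1/512 = 1/256 of all pages. */
static unsigned long long worst_case_overhead_pages(unsigned long long total_pages)
{
	return 2 * (total_pages / PTRS_PER_TABLE);
}
```

Comparing the nr_overhead value against the worst-case bound for the
machine's page count gives a rough idea of how far a workload is from
the theoretical 1/256.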

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/mm/kaiser.c   |   16 +++++++++++-----
 include/linux/mmzone.h |    3 ++-
 mm/vmstat.c            |    1 +
 3 files changed, 14 insertions(+), 6 deletions(-)

--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -122,9 +122,11 @@ static pte_t *kaiser_pagetable_walk(unsi
 		if (!new_pmd_page)
 			return NULL;
 		spin_lock(&shadow_table_allocation_lock);
-		if (pud_none(*pud))
+		if (pud_none(*pud)) {
 			set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
-		else
+			__inc_zone_page_state(virt_to_page((void *)
+						new_pmd_page), NR_KAISERTABLE);
+		} else
 			free_page(new_pmd_page);
 		spin_unlock(&shadow_table_allocation_lock);
 	}
@@ -140,9 +142,11 @@ static pte_t *kaiser_pagetable_walk(unsi
 		if (!new_pte_page)
 			return NULL;
 		spin_lock(&shadow_table_allocation_lock);
-		if (pmd_none(*pmd))
+		if (pmd_none(*pmd)) {
 			set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
-		else
+			__inc_zone_page_state(virt_to_page((void *)
+						new_pte_page), NR_KAISERTABLE);
+		} else
 			free_page(new_pte_page);
 		spin_unlock(&shadow_table_allocation_lock);
 	}
@@ -206,11 +210,13 @@ static void __init kaiser_init_all_pgds(
 	pgd = native_get_shadow_pgd(pgd_offset_k((unsigned long )0));
 	for (i = PTRS_PER_PGD / 2; i < PTRS_PER_PGD; i++) {
 		pgd_t new_pgd;
-		pud_t *pud = pud_alloc_one(&init_mm, PAGE_OFFSET + i * PGDIR_SIZE);
+		pud_t *pud = pud_alloc_one(&init_mm,
+					   PAGE_OFFSET + i * PGDIR_SIZE);
 		if (!pud) {
 			WARN_ON(1);
 			break;
 		}
+		inc_zone_page_state(virt_to_page(pud), NR_KAISERTABLE);
 		new_pgd = __pgd(_KERNPG_TABLE |__pa(pud));
 		/*
 		 * Make sure not to stomp on some other pgd entry.
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -131,8 +131,9 @@ enum zone_stat_item {
 	NR_SLAB_RECLAIMABLE,
 	NR_SLAB_UNRECLAIMABLE,
 	NR_PAGETABLE,		/* used for pagetables */
-	NR_KERNEL_STACK,
 	/* Second 128 byte cacheline */
+	NR_KERNEL_STACK,
+	NR_KAISERTABLE,
 	NR_UNSTABLE_NFS,	/* NFS unstable pages */
 	NR_BOUNCE,
 	NR_VMSCAN_WRITE,
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -736,6 +736,7 @@ const char * const vmstat_text[] = {
 	"nr_slab_unreclaimable",
 	"nr_page_table_pages",
 	"nr_kernel_stack",
+	"nr_overhead",
 	"nr_unstable",
 	"nr_bounce",
 	"nr_vmscan_write",


* [PATCH 4.4 18/37] kaiser: enhanced by kernel and user PCIDs
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (16 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 17/37] kaiser: vmstat show NR_KAISERTABLE as nr_overhead Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 19/37] kaiser: load_new_mm_cr3() let SWITCH_USER_CR3 flush user Greg Kroah-Hartman
                   ` (28 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dave Hansen <dave.hansen@linux.intel.com>


Merged performance improvements to Kaiser, using distinct kernel
and user Process Context Identifiers to minimize the TLB flushing.
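
Editorial illustration, not part of the patch: a C sketch of how the
reworked _SWITCH_TO_KERNEL_CR3 and _SWITCH_TO_USER_CR3 macros compose
a CR3 value. The constants are copied from the diff below; the
kern_var and user_var parameters are stand-ins for the in-memory
X86_CR3_PCID_KERN_VAR and X86_CR3_PCID_USER_VAR words, which are 0
when PCIDs are not in use. The function names are hypothetical.

```c
#include <stdint.h>

#define KAISER_SHADOW_PGD_OFFSET 0x1000ULL
#define X86_CR3_PCID_ASID_MASK   ((1ULL << 12) - 1)	/* low 12 bits of CR3 */
#define X86_CR3_PCID_NOFLUSH     (1ULL << 63)		/* preserve old PCID */
#define X86_CR3_PCID_ASID_KERN   0x4ULL
#define X86_CR3_PCID_ASID_USER   0x6ULL

/* Clear the ASID and the "user" pgd bit, then add the kernel PCID word. */
static uint64_t switch_to_kernel_cr3(uint64_t cr3, uint64_t kern_var)
{
	cr3 &= ~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET);
	return cr3 | kern_var;
}

/* Same masking, then add the user PCID word and select the shadow pgd. */
static uint64_t switch_to_user_cr3(uint64_t cr3, uint64_t user_var)
{
	cr3 &= ~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET);
	return cr3 | user_var | KAISER_SHADOW_PGD_OFFSET;
}
```

With both words left at 0 the functions reduce to the plain and/or of
KAISER_SHADOW_PGD_OFFSET used before this patch, which is why systems
without PCID support are unaffected.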

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/entry/entry_64.S                   |   10 ++++-
 arch/x86/entry/entry_64_compat.S            |    1 
 arch/x86/include/asm/cpufeature.h           |    1 
 arch/x86/include/asm/kaiser.h               |   15 ++++++-
 arch/x86/include/asm/pgtable_types.h        |   26 +++++++++++++
 arch/x86/include/asm/tlbflush.h             |   54 +++++++++++++++++++++++-----
 arch/x86/include/uapi/asm/processor-flags.h |    3 +
 arch/x86/kernel/cpu/common.c                |   34 +++++++++++++++++
 arch/x86/kvm/x86.c                          |    3 +
 arch/x86/mm/kaiser.c                        |    7 +++
 arch/x86/mm/tlb.c                           |   46 ++++++++++++++++++++++-
 11 files changed, 182 insertions(+), 18 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1291,7 +1291,10 @@ ENTRY(nmi)
 	/* %rax is saved above, so OK to clobber here */
 	movq	%cr3, %rax
 	pushq	%rax
-	andq	$(~KAISER_SHADOW_PGD_OFFSET), %rax
+	/* mask off "user" bit of pgd address and 12 PCID bits: */
+	andq	$(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
+	/* Add back kernel PCID and "no flush" bit */
+	orq	X86_CR3_PCID_KERN_VAR, %rax
 	movq	%rax, %cr3
 #endif
 	call	do_nmi
@@ -1532,7 +1535,10 @@ end_repeat_nmi:
 	/* %rax is saved above, so OK to clobber here */
 	movq	%cr3, %rax
 	pushq	%rax
-	andq	$(~KAISER_SHADOW_PGD_OFFSET), %rax
+	/* mask off "user" bit of pgd address and 12 PCID bits: */
+	andq	$(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
+	/* Add back kernel PCID and "no flush" bit */
+	orq	X86_CR3_PCID_KERN_VAR, %rax
 	movq	%rax, %cr3
 #endif
 
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -13,6 +13,7 @@
 #include <asm/irqflags.h>
 #include <asm/asm.h>
 #include <asm/smap.h>
+#include <asm/pgtable_types.h>
 #include <asm/kaiser.h>
 #include <linux/linkage.h>
 #include <linux/err.h>
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -187,6 +187,7 @@
 #define X86_FEATURE_ARAT	( 7*32+ 1) /* Always Running APIC Timer */
 #define X86_FEATURE_CPB		( 7*32+ 2) /* AMD Core Performance Boost */
 #define X86_FEATURE_EPB		( 7*32+ 3) /* IA32_ENERGY_PERF_BIAS support */
+#define X86_FEATURE_INVPCID_SINGLE ( 7*32+ 4) /* Effectively INVPCID && CR4.PCIDE=1 */
 #define X86_FEATURE_PLN		( 7*32+ 5) /* Intel Power Limit Notification */
 #define X86_FEATURE_PTS		( 7*32+ 6) /* Intel Package Thermal Status */
 #define X86_FEATURE_DTHERM	( 7*32+ 7) /* Digital Thermal Sensor */
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -1,5 +1,8 @@
 #ifndef _ASM_X86_KAISER_H
 #define _ASM_X86_KAISER_H
+
+#include <uapi/asm/processor-flags.h> /* For PCID constants */
+
 /*
  * This file includes the definitions for the KAISER feature.
  * KAISER is a counter measure against x86_64 side channel attacks on
@@ -21,13 +24,21 @@
 
 .macro _SWITCH_TO_KERNEL_CR3 reg
 movq %cr3, \reg
-andq $(~KAISER_SHADOW_PGD_OFFSET), \reg
+andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), \reg
+orq  X86_CR3_PCID_KERN_VAR, \reg
 movq \reg, %cr3
 .endm
 
 .macro _SWITCH_TO_USER_CR3 reg
 movq %cr3, \reg
-orq $(KAISER_SHADOW_PGD_OFFSET), \reg
+andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), \reg
+/*
+ * This can obviously be one instruction by putting the
+ * KAISER_SHADOW_PGD_OFFSET bit in the X86_CR3_PCID_USER_VAR.
+ * But, just leave it now for simplicity.
+ */
+orq  X86_CR3_PCID_USER_VAR, \reg
+orq  $(KAISER_SHADOW_PGD_OFFSET), \reg
 movq \reg, %cr3
 .endm
 
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -106,6 +106,32 @@
 			 _PAGE_SOFT_DIRTY)
 #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE)
 
+/* The ASID is the lower 12 bits of CR3 */
+#define X86_CR3_PCID_ASID_MASK  (_AC((1<<12)-1,UL))
+
+/* Mask for all the PCID-related bits in CR3: */
+#define X86_CR3_PCID_MASK       (X86_CR3_PCID_NOFLUSH | X86_CR3_PCID_ASID_MASK)
+#if defined(CONFIG_KAISER) && defined(CONFIG_X86_64)
+#define X86_CR3_PCID_ASID_KERN  (_AC(0x4,UL))
+#define X86_CR3_PCID_ASID_USER  (_AC(0x6,UL))
+
+#define X86_CR3_PCID_KERN_FLUSH		(X86_CR3_PCID_ASID_KERN)
+#define X86_CR3_PCID_USER_FLUSH		(X86_CR3_PCID_ASID_USER)
+#define X86_CR3_PCID_KERN_NOFLUSH	(X86_CR3_PCID_NOFLUSH | X86_CR3_PCID_ASID_KERN)
+#define X86_CR3_PCID_USER_NOFLUSH	(X86_CR3_PCID_NOFLUSH | X86_CR3_PCID_ASID_USER)
+#else
+#define X86_CR3_PCID_ASID_KERN  (_AC(0x0,UL))
+#define X86_CR3_PCID_ASID_USER  (_AC(0x0,UL))
+/*
+ * PCIDs are unsupported on 32-bit and none of these bits can be
+ * set in CR3:
+ */
+#define X86_CR3_PCID_KERN_FLUSH		(0)
+#define X86_CR3_PCID_USER_FLUSH		(0)
+#define X86_CR3_PCID_KERN_NOFLUSH	(0)
+#define X86_CR3_PCID_USER_NOFLUSH	(0)
+#endif
+
 /*
  * The cache modes defined here are used to translate between pure SW usage
  * and the HW defined cache mode bits and/or PAT entries.
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -12,7 +12,6 @@ static inline void __invpcid(unsigned lo
 			     unsigned long type)
 {
 	struct { u64 d[2]; } desc = { { pcid, addr } };
-
 	/*
 	 * The memory clobber is because the whole point is to invalidate
 	 * stale TLB entries and, especially if we're flushing global
@@ -133,14 +132,25 @@ static inline void cr4_set_bits_and_upda
 
 static inline void __native_flush_tlb(void)
 {
+	if (!cpu_feature_enabled(X86_FEATURE_INVPCID)) {
+	 	/*
+		 * If current->mm == NULL then we borrow a mm which may change during a
+		 * task switch and therefore we must not be preempted while we write CR3
+		 * back:
+		 */
+		preempt_disable();
+		native_write_cr3(native_read_cr3());
+		preempt_enable();
+		return;
+	}
 	/*
-	 * If current->mm == NULL then we borrow a mm which may change during a
-	 * task switch and therefore we must not be preempted while we write CR3
-	 * back:
-	 */
-	preempt_disable();
-	native_write_cr3(native_read_cr3());
-	preempt_enable();
+	 * We are no longer using globals with KAISER, so a
+	 * "nonglobals" flush would work too. But, this is more
+	 * conservative.
+	 *
+	 * Note, this works with CR4.PCIDE=0 or 1.
+	 */
+	invpcid_flush_all();
 }
 
 static inline void __native_flush_tlb_global_irq_disabled(void)
@@ -162,6 +172,8 @@ static inline void __native_flush_tlb_gl
 		/*
 		 * Using INVPCID is considerably faster than a pair of writes
 		 * to CR4 sandwiched inside an IRQ flag save/restore.
+		 *
+	 	 * Note, this works with CR4.PCIDE=0 or 1.
 		 */
 		invpcid_flush_all();
 		return;
@@ -181,7 +193,31 @@ static inline void __native_flush_tlb_gl
 
 static inline void __native_flush_tlb_single(unsigned long addr)
 {
-	asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
+	/*
+	 * SIMICS #GP's if you run INVPCID with type 2/3
+	 * and X86_CR4_PCIDE clear.  Shame!
+	 *
+	 * The ASIDs used below are hard-coded.  But, we must not
+	 * call invpcid(type=1/2) before CR4.PCIDE=1.  Just call
+	 * invpcid in the case we are called early.
+	 */
+	if (!this_cpu_has(X86_FEATURE_INVPCID_SINGLE)) {
+		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
+		return;
+	}
+	/* Flush the address out of both PCIDs. */
+	/*
+	 * An optimization here might be to determine addresses
+	 * that are only kernel-mapped and only flush the kernel
+	 * ASID.  But, userspace flushes are probably much more
+	 * important performance-wise.
+	 *
+	 * Make sure to do only a single invpcid when KAISER is
+	 * disabled and we have only a single ASID.
+	 */
+	if (X86_CR3_PCID_ASID_KERN != X86_CR3_PCID_ASID_USER)
+		invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
+	invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
 }
 
 static inline void __flush_tlb_all(void)
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -77,7 +77,8 @@
 #define X86_CR3_PWT		_BITUL(X86_CR3_PWT_BIT)
 #define X86_CR3_PCD_BIT		4 /* Page Cache Disable */
 #define X86_CR3_PCD		_BITUL(X86_CR3_PCD_BIT)
-#define X86_CR3_PCID_MASK	_AC(0x00000fff,UL) /* PCID Mask */
+#define X86_CR3_PCID_NOFLUSH_BIT 63 /* Preserve old PCID */
+#define X86_CR3_PCID_NOFLUSH    _BITULL(X86_CR3_PCID_NOFLUSH_BIT)
 
 /*
  * Intel CPU features in CR4
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -321,11 +321,45 @@ static __always_inline void setup_smap(s
 	}
 }
 
+/*
+ * These can have bit 63 set, so we can not just use a plain "or"
+ * instruction to get their value or'd into CR3.  It would take
+ * another register.  So, we use a memory reference to these
+ * instead.
+ *
+ * This is also handy because systems that do not support
+ * PCIDs just end up or'ing a 0 into their CR3, which does
+ * no harm.
+ */
+__aligned(PAGE_SIZE) unsigned long X86_CR3_PCID_KERN_VAR = 0;
+__aligned(PAGE_SIZE) unsigned long X86_CR3_PCID_USER_VAR = 0;
+
 static void setup_pcid(struct cpuinfo_x86 *c)
 {
 	if (cpu_has(c, X86_FEATURE_PCID)) {
 		if (cpu_has(c, X86_FEATURE_PGE)) {
 			cr4_set_bits(X86_CR4_PCIDE);
+			/*
+			 * These variables are used by the entry/exit
+			 * code to change PCIDs.
+			 */
+#ifdef CONFIG_KAISER
+			X86_CR3_PCID_KERN_VAR = X86_CR3_PCID_KERN_NOFLUSH;
+			X86_CR3_PCID_USER_VAR = X86_CR3_PCID_USER_NOFLUSH;
+#endif
+			/*
+			 * INVPCID has two "groups" of types:
+			 * 1/2: Invalidate an individual address
+			 * 3/4: Invalidate all contexts
+			 *
+			 * 1/2 take a PCID, but 3/4 do not.  So, 3/4
+			 * ignore the PCID argument in the descriptor.
+			 * But, we have to be careful not to call 1/2
+			 * with an actual non-zero PCID in them before
+			 * we do the above cr4_set_bits().
+			 */
+			if (cpu_has(c, X86_FEATURE_INVPCID))
+				set_cpu_cap(c, X86_FEATURE_INVPCID_SINGLE);
 		} else {
 			/*
 			 * flush_tlb_all(), as currently implemented, won't
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -759,7 +759,8 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, u
 			return 1;
 
 		/* PCID can not be enabled when cr3[11:0]!=000H or EFER.LMA=0 */
-		if ((kvm_read_cr3(vcpu) & X86_CR3_PCID_MASK) || !is_long_mode(vcpu))
+		if ((kvm_read_cr3(vcpu) & X86_CR3_PCID_ASID_MASK) ||
+		    !is_long_mode(vcpu))
 			return 1;
 	}
 
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -240,6 +240,8 @@ static void __init kaiser_init_all_pgds(
 } while (0)
 
 extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
+extern unsigned long X86_CR3_PCID_KERN_VAR;
+extern unsigned long X86_CR3_PCID_USER_VAR;
 /*
  * If anything in here fails, we will likely die on one of the
  * first kernel->user transitions and init will die.  But, we
@@ -290,6 +292,11 @@ void __init kaiser_init(void)
 	kaiser_add_user_map_early(&debug_idt_table,
 				  sizeof(gate_desc) * NR_VECTORS,
 				  __PAGE_KERNEL);
+
+	kaiser_add_user_map_early(&X86_CR3_PCID_KERN_VAR, PAGE_SIZE,
+				  __PAGE_KERNEL);
+	kaiser_add_user_map_early(&X86_CR3_PCID_USER_VAR, PAGE_SIZE,
+				  __PAGE_KERNEL);
 }
 
 /* Add a mapping to the shadow mapping, and synchronize the mappings */
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -34,6 +34,46 @@ struct flush_tlb_info {
 	unsigned long flush_end;
 };
 
+static void load_new_mm_cr3(pgd_t *pgdir)
+{
+	unsigned long new_mm_cr3 = __pa(pgdir);
+
+	/*
+	 * KAISER, plus PCIDs needs some extra work here.  But,
+	 * if either of features is not present, we need no
+	 * PCIDs here and just do a normal, full TLB flush with
+	 * the write_cr3()
+	 */
+	if (!IS_ENABLED(CONFIG_KAISER) ||
+	    !cpu_feature_enabled(X86_FEATURE_PCID))
+		goto out_set_cr3;
+	/*
+	 * We reuse the same PCID for different tasks, so we must
+	 * flush all the entires for the PCID out when we change
+	 * tasks.
+	 */
+	new_mm_cr3 = X86_CR3_PCID_KERN_FLUSH | __pa(pgdir);
+
+	/*
+	 * The flush from load_cr3() may leave old TLB entries
+	 * for userspace in place.  We must flush that context
+	 * separately.  We can theoretically delay doing this
+	 * until we actually load up the userspace CR3, but
+	 * that's a bit tricky.  We have to have the "need to
+	 * flush userspace PCID" bit per-cpu and check it in the
+	 * exit-to-userspace paths.
+	 */
+	invpcid_flush_single_context(X86_CR3_PCID_ASID_USER);
+
+out_set_cr3:
+	/*
+	 * Caution: many callers of this function expect
+	 * that load_cr3() is serializing and orders TLB
+	 * fills with respect to the mm_cpumask writes.
+	 */
+	write_cr3(new_mm_cr3);
+}
+
 /*
  * We cannot call mmdrop() because we are in interrupt context,
  * instead update mm->cpu_vm_mask.
@@ -45,7 +85,7 @@ void leave_mm(int cpu)
 		BUG();
 	if (cpumask_test_cpu(cpu, mm_cpumask(active_mm))) {
 		cpumask_clear_cpu(cpu, mm_cpumask(active_mm));
-		load_cr3(swapper_pg_dir);
+		load_new_mm_cr3(swapper_pg_dir);
 		/*
 		 * This gets called in the idle path where RCU
 		 * functions differently.  Tracing normally
@@ -105,7 +145,7 @@ void switch_mm_irqs_off(struct mm_struct
 		 * ordering guarantee we need.
 		 *
 		 */
-		load_cr3(next->pgd);
+		load_new_mm_cr3(next->pgd);
 
 		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 
@@ -152,7 +192,7 @@ void switch_mm_irqs_off(struct mm_struct
 			 * As above, load_cr3() is serializing and orders TLB
 			 * fills with respect to the mm_cpumask write.
 			 */
-			load_cr3(next->pgd);
+			load_new_mm_cr3(next->pgd);
 			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 			load_mm_cr4(next);
 			load_mm_ldt(next);

* [PATCH 4.4 19/37] kaiser: load_new_mm_cr3() let SWITCH_USER_CR3 flush user
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (17 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 18/37] kaiser: enhanced by kernel and user PCIDs Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 20/37] kaiser: PCID 0 for kernel and 128 for user Greg Kroah-Hartman
                   ` (27 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


We have many machines (Westmere, Sandybridge, Ivybridge) supporting
PCID but not INVPCID: on these load_new_mm_cr3() simply crashed.

Flushing user context inside load_new_mm_cr3() without the use of
invpcid is difficult: momentarily switch from kernel to user context
and back to do so?  I'm not sure whether that can be safely done at
all, and would risk polluting user context with kernel internals,
and kernel context with stale user externals.

Instead, follow the hint in the comment that was there: change
X86_CR3_PCID_USER_VAR to be a per-cpu variable, then load_new_mm_cr3()
can leave a note in it, for SWITCH_USER_CR3 on return to userspace to
flush user context TLB, instead of default X86_CR3_PCID_USER_NOFLUSH.
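
As a rough illustration of that hand-off, here is a minimal user-space
model in C, not kernel code.  It assumes a single CPU, a user PCID of 6
and a KAISER_SHADOW_PGD_OFFSET of 0x1000; note_user_flush() and
switch_to_user_cr3() are hypothetical names standing in for
kaiser_flush_tlb_on_return_to_user() and the _SWITCH_TO_USER_CR3 macro.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants: assumptions of this model, not kernel headers. */
#define CR3_NOFLUSH        (1ULL << 63)
#define CR3_ASID_MASK      0xfffULL
#define SHADOW_PGD_OFFSET  0x1000ULL
#define ASID_USER          0x6ULL

/* Stand-in for the per-cpu X86_CR3_PCID_USER_VAR (one CPU only here). */
static uint64_t user_var = CR3_NOFLUSH | ASID_USER | SHADOW_PGD_OFFSET;

/* Leave a note: store the user value without the NOFLUSH bit. */
static void note_user_flush(void)
{
	user_var = ASID_USER | SHADOW_PGD_OFFSET;
}

/*
 * Return the value that would be written to CR3 on exit to userspace.
 * If that value requests a flush, re-arm NOFLUSH for the next exit.
 */
static uint64_t switch_to_user_cr3(uint64_t kernel_cr3)
{
	uint64_t reg = kernel_cr3 & ~(CR3_ASID_MASK | SHADOW_PGD_OFFSET);

	/* The variable may only hold NOFLUSH, PCID and pgd-offset bits. */
	assert((user_var & ~(CR3_NOFLUSH | CR3_ASID_MASK | SHADOW_PGD_OFFSET)) == 0);

	reg |= user_var;
	if (!(reg & CR3_NOFLUSH)) {
		/* Models "movb $(0x80)" into byte 7: set bit 63 again. */
		user_var = (user_var & ~(0xffULL << 56)) | (0x80ULL << 56);
	}
	return reg;
}
```

In this model only the first exit after note_user_flush() yields a CR3
value with bit 63 clear; every later exit is NOFLUSH again until the
next note is left.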

Which works well enough that there's no need to do it this way only
when invpcid is unsupported: it's a good alternative to invpcid here.
But there's a couple of inlines in asm/tlbflush.h that need to do the
same trick, so it's best to localize all this per-cpu business in
mm/kaiser.c: moving that part of the initialization from setup_pcid()
to kaiser_setup_pcid(); with kaiser_flush_tlb_on_return_to_user() the
function for noting an X86_CR3_PCID_USER_FLUSH.  And let's keep a
KAISER_SHADOW_PGD_OFFSET in there, to avoid the extra OR on exit.

I did try to make the feature tests in asm/tlbflush.h more consistent
with each other: there seem to be far too many ways of performing such
tests, and I don't have a good grasp of their differences.  At first
I converted them all to be static_cpu_has(): but that proved to be a
mistake, as the comment in __native_flush_tlb_single() hints; so then
I reversed and made them all this_cpu_has().  Probably all gratuitous
change, but that's the way it's working at present.

I am slightly bothered by the way non-per-cpu X86_CR3_PCID_KERN_VAR
gets re-initialized by each cpu (before and after these changes):
no problem when (as usual) all cpus on a machine have the same
features, but in principle incorrect.  However, my experiment
to per-cpu-ify that one did not end well...

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/kaiser.h   |   18 +++++++-----
 arch/x86/include/asm/tlbflush.h |   58 +++++++++++++++++++++++++++-------------
 arch/x86/kernel/cpu/common.c    |   22 ---------------
 arch/x86/mm/kaiser.c            |   50 ++++++++++++++++++++++++++++++----
 arch/x86/mm/tlb.c               |   46 ++++++++++++-------------------
 5 files changed, 114 insertions(+), 80 deletions(-)

--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -32,13 +32,12 @@ movq \reg, %cr3
 .macro _SWITCH_TO_USER_CR3 reg
 movq %cr3, \reg
 andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), \reg
-/*
- * This can obviously be one instruction by putting the
- * KAISER_SHADOW_PGD_OFFSET bit in the X86_CR3_PCID_USER_VAR.
- * But, just leave it now for simplicity.
- */
-orq  X86_CR3_PCID_USER_VAR, \reg
-orq  $(KAISER_SHADOW_PGD_OFFSET), \reg
+orq  PER_CPU_VAR(X86_CR3_PCID_USER_VAR), \reg
+js   9f
+// FLUSH this time, reset to NOFLUSH for next time
+// But if nopcid?  Consider using 0x80 for user pcid?
+movb $(0x80), PER_CPU_VAR(X86_CR3_PCID_USER_VAR+7)
+9:
 movq \reg, %cr3
 .endm
 
@@ -90,6 +89,11 @@ movq PER_CPU_VAR(unsafe_stack_register_b
 */
 DECLARE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
 
+extern unsigned long X86_CR3_PCID_KERN_VAR;
+DECLARE_PER_CPU(unsigned long, X86_CR3_PCID_USER_VAR);
+
+extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
+
 /**
  *  kaiser_add_mapping - map a virtual memory part to the shadow (user) mapping
  *  @addr: the start address of the range
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -12,6 +12,7 @@ static inline void __invpcid(unsigned lo
 			     unsigned long type)
 {
 	struct { u64 d[2]; } desc = { { pcid, addr } };
+
 	/*
 	 * The memory clobber is because the whole point is to invalidate
 	 * stale TLB entries and, especially if we're flushing global
@@ -130,27 +131,42 @@ static inline void cr4_set_bits_and_upda
 	cr4_set_bits(mask);
 }
 
+/*
+ * Declare a couple of kaiser interfaces here for convenience,
+ * to avoid the need for asm/kaiser.h in unexpected places.
+ */
+#ifdef CONFIG_KAISER
+extern void kaiser_setup_pcid(void);
+extern void kaiser_flush_tlb_on_return_to_user(void);
+#else
+static inline void kaiser_setup_pcid(void)
+{
+}
+static inline void kaiser_flush_tlb_on_return_to_user(void)
+{
+}
+#endif
+
 static inline void __native_flush_tlb(void)
 {
-	if (!cpu_feature_enabled(X86_FEATURE_INVPCID)) {
-	 	/*
-		 * If current->mm == NULL then we borrow a mm which may change during a
-		 * task switch and therefore we must not be preempted while we write CR3
-		 * back:
+	if (this_cpu_has(X86_FEATURE_INVPCID)) {
+		/*
+		 * Note, this works with CR4.PCIDE=0 or 1.
 		 */
-		preempt_disable();
-		native_write_cr3(native_read_cr3());
-		preempt_enable();
+		invpcid_flush_all_nonglobals();
 		return;
 	}
+
 	/*
-	 * We are no longer using globals with KAISER, so a
-	 * "nonglobals" flush would work too. But, this is more
-	 * conservative.
-	 *
-	 * Note, this works with CR4.PCIDE=0 or 1.
+	 * If current->mm == NULL then we borrow a mm which may change during a
+	 * task switch and therefore we must not be preempted while we write CR3
+	 * back:
 	 */
-	invpcid_flush_all();
+	preempt_disable();
+	if (this_cpu_has(X86_FEATURE_PCID))
+		kaiser_flush_tlb_on_return_to_user();
+	native_write_cr3(native_read_cr3());
+	preempt_enable();
 }
 
 static inline void __native_flush_tlb_global_irq_disabled(void)
@@ -166,9 +182,13 @@ static inline void __native_flush_tlb_gl
 
 static inline void __native_flush_tlb_global(void)
 {
+#ifdef CONFIG_KAISER
+	/* Globals are not used at all */
+	__native_flush_tlb();
+#else
 	unsigned long flags;
 
-	if (static_cpu_has(X86_FEATURE_INVPCID)) {
+	if (this_cpu_has(X86_FEATURE_INVPCID)) {
 		/*
 		 * Using INVPCID is considerably faster than a pair of writes
 		 * to CR4 sandwiched inside an IRQ flag save/restore.
@@ -185,10 +205,9 @@ static inline void __native_flush_tlb_gl
 	 * be called from deep inside debugging code.)
 	 */
 	raw_local_irq_save(flags);
-
 	__native_flush_tlb_global_irq_disabled();
-
 	raw_local_irq_restore(flags);
+#endif
 }
 
 static inline void __native_flush_tlb_single(unsigned long addr)
@@ -199,9 +218,12 @@ static inline void __native_flush_tlb_si
 	 *
 	 * The ASIDs used below are hard-coded.  But, we must not
 	 * call invpcid(type=1/2) before CR4.PCIDE=1.  Just call
-	 * invpcid in the case we are called early.
+	 * invlpg in the case we are called early.
 	 */
+
 	if (!this_cpu_has(X86_FEATURE_INVPCID_SINGLE)) {
+		if (this_cpu_has(X86_FEATURE_PCID))
+			kaiser_flush_tlb_on_return_to_user();
 		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
 		return;
 	}
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -321,33 +321,12 @@ static __always_inline void setup_smap(s
 	}
 }
 
-/*
- * These can have bit 63 set, so we can not just use a plain "or"
- * instruction to get their value or'd into CR3.  It would take
- * another register.  So, we use a memory reference to these
- * instead.
- *
- * This is also handy because systems that do not support
- * PCIDs just end up or'ing a 0 into their CR3, which does
- * no harm.
- */
-__aligned(PAGE_SIZE) unsigned long X86_CR3_PCID_KERN_VAR = 0;
-__aligned(PAGE_SIZE) unsigned long X86_CR3_PCID_USER_VAR = 0;
-
 static void setup_pcid(struct cpuinfo_x86 *c)
 {
 	if (cpu_has(c, X86_FEATURE_PCID)) {
 		if (cpu_has(c, X86_FEATURE_PGE)) {
 			cr4_set_bits(X86_CR4_PCIDE);
 			/*
-			 * These variables are used by the entry/exit
-			 * code to change PCIDs.
-			 */
-#ifdef CONFIG_KAISER
-			X86_CR3_PCID_KERN_VAR = X86_CR3_PCID_KERN_NOFLUSH;
-			X86_CR3_PCID_USER_VAR = X86_CR3_PCID_USER_NOFLUSH;
-#endif
-			/*
 			 * INVPCID has two "groups" of types:
 			 * 1/2: Invalidate an individual address
 			 * 3/4: Invalidate all contexts
@@ -372,6 +351,7 @@ static void setup_pcid(struct cpuinfo_x8
 			clear_cpu_cap(c, X86_FEATURE_PCID);
 		}
 	}
+	kaiser_setup_pcid();
 }
 
 /*
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -12,12 +12,26 @@
 #include <linux/ftrace.h>
 
 #include <asm/kaiser.h>
+#include <asm/tlbflush.h>	/* to verify its kaiser declarations */
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 #include <asm/desc.h>
+
 #ifdef CONFIG_KAISER
+__visible
+DEFINE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
+
+/*
+ * These can have bit 63 set, so we can not just use a plain "or"
+ * instruction to get their value or'd into CR3.  It would take
+ * another register.  So, we use a memory reference to these instead.
+ *
+ * This is also handy because systems that do not support PCIDs
+ * just end up or'ing a 0 into their CR3, which does no harm.
+ */
+__aligned(PAGE_SIZE) unsigned long X86_CR3_PCID_KERN_VAR;
+DEFINE_PER_CPU(unsigned long, X86_CR3_PCID_USER_VAR);
 
-__visible DEFINE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
 /*
  * At runtime, the only things we map are some things for CPU
  * hotplug, and stacks for new processes.  No two CPUs will ever
@@ -239,9 +253,6 @@ static void __init kaiser_init_all_pgds(
 	WARN_ON(__ret);							\
 } while (0)
 
-extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
-extern unsigned long X86_CR3_PCID_KERN_VAR;
-extern unsigned long X86_CR3_PCID_USER_VAR;
 /*
  * If anything in here fails, we will likely die on one of the
  * first kernel->user transitions and init will die.  But, we
@@ -295,8 +306,6 @@ void __init kaiser_init(void)
 
 	kaiser_add_user_map_early(&X86_CR3_PCID_KERN_VAR, PAGE_SIZE,
 				  __PAGE_KERNEL);
-	kaiser_add_user_map_early(&X86_CR3_PCID_USER_VAR, PAGE_SIZE,
-				  __PAGE_KERNEL);
 }
 
 /* Add a mapping to the shadow mapping, and synchronize the mappings */
@@ -361,4 +370,33 @@ pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp,
 	}
 	return pgd;
 }
+
+void kaiser_setup_pcid(void)
+{
+	unsigned long kern_cr3 = 0;
+	unsigned long user_cr3 = KAISER_SHADOW_PGD_OFFSET;
+
+	if (this_cpu_has(X86_FEATURE_PCID)) {
+		kern_cr3 |= X86_CR3_PCID_KERN_NOFLUSH;
+		user_cr3 |= X86_CR3_PCID_USER_NOFLUSH;
+	}
+	/*
+	 * These variables are used by the entry/exit
+	 * code to change PCID and pgd and TLB flushing.
+	 */
+	X86_CR3_PCID_KERN_VAR = kern_cr3;
+	this_cpu_write(X86_CR3_PCID_USER_VAR, user_cr3);
+}
+
+/*
+ * Make a note that this cpu will need to flush USER tlb on return to user.
+ * Caller checks whether this_cpu_has(X86_FEATURE_PCID) before calling:
+ * if cpu does not, then the NOFLUSH bit will never have been set.
+ */
+void kaiser_flush_tlb_on_return_to_user(void)
+{
+	this_cpu_write(X86_CR3_PCID_USER_VAR,
+			X86_CR3_PCID_USER_FLUSH | KAISER_SHADOW_PGD_OFFSET);
+}
+EXPORT_SYMBOL(kaiser_flush_tlb_on_return_to_user);
 #endif /* CONFIG_KAISER */
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -6,13 +6,14 @@
 #include <linux/interrupt.h>
 #include <linux/module.h>
 #include <linux/cpu.h>
+#include <linux/debugfs.h>
 
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
 #include <asm/cache.h>
 #include <asm/apic.h>
 #include <asm/uv/uv.h>
-#include <linux/debugfs.h>
+#include <asm/kaiser.h>
 
 /*
  *	TLB flushing, formerly SMP-only
@@ -38,34 +39,23 @@ static void load_new_mm_cr3(pgd_t *pgdir
 {
 	unsigned long new_mm_cr3 = __pa(pgdir);
 
-	/*
-	 * KAISER, plus PCIDs needs some extra work here.  But,
-	 * if either of features is not present, we need no
-	 * PCIDs here and just do a normal, full TLB flush with
-	 * the write_cr3()
-	 */
-	if (!IS_ENABLED(CONFIG_KAISER) ||
-	    !cpu_feature_enabled(X86_FEATURE_PCID))
-		goto out_set_cr3;
-	/*
-	 * We reuse the same PCID for different tasks, so we must
-	 * flush all the entires for the PCID out when we change
-	 * tasks.
-	 */
-	new_mm_cr3 = X86_CR3_PCID_KERN_FLUSH | __pa(pgdir);
-
-	/*
-	 * The flush from load_cr3() may leave old TLB entries
-	 * for userspace in place.  We must flush that context
-	 * separately.  We can theoretically delay doing this
-	 * until we actually load up the userspace CR3, but
-	 * that's a bit tricky.  We have to have the "need to
-	 * flush userspace PCID" bit per-cpu and check it in the
-	 * exit-to-userspace paths.
-	 */
-	invpcid_flush_single_context(X86_CR3_PCID_ASID_USER);
+#ifdef CONFIG_KAISER
+	if (this_cpu_has(X86_FEATURE_PCID)) {
+		/*
+		 * We reuse the same PCID for different tasks, so we must
+		 * flush all the entries for the PCID out when we change tasks.
+		 * Flush KERN below, flush USER when returning to userspace in
+		 * kaiser's SWITCH_USER_CR3 (_SWITCH_TO_USER_CR3) macro.
+		 *
+		 * invpcid_flush_single_context(X86_CR3_PCID_ASID_USER) could
+		 * do it here, but can only be used if X86_FEATURE_INVPCID is
+		 * available - and many machines support pcid without invpcid.
+		 */
+		new_mm_cr3 |= X86_CR3_PCID_KERN_FLUSH;
+		kaiser_flush_tlb_on_return_to_user();
+	}
+#endif /* CONFIG_KAISER */
 
-out_set_cr3:
 	/*
 	 * Caution: many callers of this function expect
 	 * that load_cr3() is serializing and orders TLB

* [PATCH 4.4 20/37] kaiser: PCID 0 for kernel and 128 for user
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (18 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 19/37] kaiser: load_new_mm_cr3() let SWITCH_USER_CR3 flush user Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 21/37] kaiser: x86_cr3_pcid_noflush and x86_cr3_pcid_user Greg Kroah-Hartman
                   ` (26 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


Why was 4 chosen for kernel PCID and 6 for user PCID?
No good reason in a backport where PCIDs are only used for Kaiser.

If we continue with those, then we shall need to add Andy Lutomirski's
4.13 commit 6c690ee1039b ("x86/mm: Split read_cr3() into read_cr3_pa()
and __read_cr3()"), which deals with the problem of read_cr3() callers
finding stray bits in the cr3 that they expected to be page-aligned;
and for hibernation, his 4.14 commit f34902c5c6c0 ("x86/hibernate/64:
Mask off CR3's PCID bits in the saved CR3").

But if 0 is used for kernel PCID, then there's no need to add in those
commits - whenever the kernel looks, it sees 0 in the lower bits; and
0 for kernel seems an obvious choice.

And I naughtily propose 128 for user PCID.  Because there's a place
in _SWITCH_TO_USER_CR3 where it takes note of the need for TLB FLUSH,
but needs to reset that to NOFLUSH for the next occasion.  Currently
it does so with a "movb $(0x80)" into the high byte of the per-cpu
quadword, but that will cause a machine without PCID support to crash.
Now, if %al just happened to have 0x80 in it at that point, on a
machine with PCID support, but 0 on a machine without PCID support...

(That will go badly wrong once the pgd can be at a physical address
above 2^56, but even with 5-level paging, physical goes up to 2^52.)

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/kaiser.h        |   19 ++++++++++++-------
 arch/x86/include/asm/pgtable_types.h |    7 ++++---
 arch/x86/mm/tlb.c                    |    3 +++
 3 files changed, 19 insertions(+), 10 deletions(-)

--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -29,14 +29,19 @@ orq  X86_CR3_PCID_KERN_VAR, \reg
 movq \reg, %cr3
 .endm
 
-.macro _SWITCH_TO_USER_CR3 reg
+.macro _SWITCH_TO_USER_CR3 reg regb
+/*
+ * regb must be the low byte portion of reg: because we have arranged
+ * for the low byte of the user PCID to serve as the high byte of NOFLUSH
+ * (0x80 for each when PCID is enabled, or 0x00 when PCID and NOFLUSH are
+ * not enabled): so that the one register can update both memory and cr3.
+ */
 movq %cr3, \reg
 andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), \reg
 orq  PER_CPU_VAR(X86_CR3_PCID_USER_VAR), \reg
 js   9f
-// FLUSH this time, reset to NOFLUSH for next time
-// But if nopcid?  Consider using 0x80 for user pcid?
-movb $(0x80), PER_CPU_VAR(X86_CR3_PCID_USER_VAR+7)
+/* FLUSH this time, reset to NOFLUSH for next time (if PCID enabled) */
+movb \regb, PER_CPU_VAR(X86_CR3_PCID_USER_VAR+7)
 9:
 movq \reg, %cr3
 .endm
@@ -49,7 +54,7 @@ popq %rax
 
 .macro SWITCH_USER_CR3
 pushq %rax
-_SWITCH_TO_USER_CR3 %rax
+_SWITCH_TO_USER_CR3 %rax %al
 popq %rax
 .endm
 
@@ -61,7 +66,7 @@ movq PER_CPU_VAR(unsafe_stack_register_b
 
 .macro SWITCH_USER_CR3_NO_STACK
 movq %rax, PER_CPU_VAR(unsafe_stack_register_backup)
-_SWITCH_TO_USER_CR3 %rax
+_SWITCH_TO_USER_CR3 %rax %al
 movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
 .endm
 
@@ -69,7 +74,7 @@ movq PER_CPU_VAR(unsafe_stack_register_b
 
 .macro SWITCH_KERNEL_CR3 reg
 .endm
-.macro SWITCH_USER_CR3 reg
+.macro SWITCH_USER_CR3 reg regb
 .endm
 .macro SWITCH_USER_CR3_NO_STACK
 .endm
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -111,16 +111,17 @@
 
 /* Mask for all the PCID-related bits in CR3: */
 #define X86_CR3_PCID_MASK       (X86_CR3_PCID_NOFLUSH | X86_CR3_PCID_ASID_MASK)
+#define X86_CR3_PCID_ASID_KERN  (_AC(0x0,UL))
+
 #if defined(CONFIG_KAISER) && defined(CONFIG_X86_64)
-#define X86_CR3_PCID_ASID_KERN  (_AC(0x4,UL))
-#define X86_CR3_PCID_ASID_USER  (_AC(0x6,UL))
+/* Let X86_CR3_PCID_ASID_USER be usable for the X86_CR3_PCID_NOFLUSH bit */
+#define X86_CR3_PCID_ASID_USER	(_AC(0x80,UL))
 
 #define X86_CR3_PCID_KERN_FLUSH		(X86_CR3_PCID_ASID_KERN)
 #define X86_CR3_PCID_USER_FLUSH		(X86_CR3_PCID_ASID_USER)
 #define X86_CR3_PCID_KERN_NOFLUSH	(X86_CR3_PCID_NOFLUSH | X86_CR3_PCID_ASID_KERN)
 #define X86_CR3_PCID_USER_NOFLUSH	(X86_CR3_PCID_NOFLUSH | X86_CR3_PCID_ASID_USER)
 #else
-#define X86_CR3_PCID_ASID_KERN  (_AC(0x0,UL))
 #define X86_CR3_PCID_ASID_USER  (_AC(0x0,UL))
 /*
  * PCIDs are unsupported on 32-bit and none of these bits can be
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -50,6 +50,9 @@ static void load_new_mm_cr3(pgd_t *pgdir
 		 * invpcid_flush_single_context(X86_CR3_PCID_ASID_USER) could
 		 * do it here, but can only be used if X86_FEATURE_INVPCID is
 		 * available - and many machines support pcid without invpcid.
+		 *
+		 * The line below is a no-op: X86_CR3_PCID_KERN_FLUSH is now 0;
+		 * but keep that line in there in case something changes.
 		 */
 		new_mm_cr3 |= X86_CR3_PCID_KERN_FLUSH;
 		kaiser_flush_tlb_on_return_to_user();

* [PATCH 4.4 21/37] kaiser: x86_cr3_pcid_noflush and x86_cr3_pcid_user
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (19 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 20/37] kaiser: PCID 0 for kernel and 128 for user Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 22/37] kaiser: paranoid_entry pass cr3 need to paranoid_exit Greg Kroah-Hartman
                   ` (25 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


Mostly this commit is just unshouting X86_CR3_PCID_KERN_VAR and
X86_CR3_PCID_USER_VAR: we usually name variables in lower-case.

But why does x86_cr3_pcid_noflush need to be __aligned(PAGE_SIZE)?
Ah, it's a leftover from when kaiser_add_user_map() once complained
about mapping the same page twice.  Make it __read_mostly instead.
(I'm a little uneasy about all the unrelated data which shares its
page getting user-mapped too, but that was so before, and not a big
deal: though we call it user-mapped, it's not mapped with _PAGE_USER.)

And there is a little change around the two calls to do_nmi().
Previously they set the NOFLUSH bit (if PCID supported) when
forcing to kernel context before do_nmi(); now they also have the
NOFLUSH bit set (if PCID supported) when restoring context after:
nothing done in do_nmi() should require a TLB to be flushed here.
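
A small user-space model of the new ordering, as a sketch only:
nmi_enter() and struct nmi_cr3 are hypothetical names, the 0x1000
shadow pgd offset is assumed, and "noflush" stands for the value of
x86_cr3_pcid_noflush (bit 63 when PCID is enabled, otherwise 0).

```c
#include <stdint.h>

#define CR3_NOFLUSH        (1ULL << 63)
#define CR3_ASID_MASK      0xfffULL
#define SHADOW_PGD_OFFSET  0x1000ULL  /* assumed value */

struct nmi_cr3 {
	uint64_t saved;   /* value pushed, restored after do_nmi() */
	uint64_t kernel;  /* value loaded into CR3 for do_nmi() */
};

static struct nmi_cr3 nmi_enter(uint64_t cr3, uint64_t noflush)
{
	struct nmi_cr3 r;

	cr3 |= noflush;   /* orq x86_cr3_pcid_noflush, %rax */
	r.saved = cr3;    /* pushq %rax: NOFLUSH travels with the saved value */
	/* andq: drop the "user" pgd bit and the 12 PCID bits, keep bit 63 */
	r.kernel = cr3 & ~(CR3_ASID_MASK | SHADOW_PGD_OFFSET);
	return r;
}
```

Because the OR happens before the push, both the value used during
do_nmi() and the value restored afterwards carry the NOFLUSH bit when
PCID is enabled.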

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/entry/entry_64.S     |    8 ++++----
 arch/x86/include/asm/kaiser.h |   11 +++++------
 arch/x86/mm/kaiser.c          |   13 +++++++------
 3 files changed, 16 insertions(+), 16 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1290,11 +1290,11 @@ ENTRY(nmi)
 	/* Unconditionally use kernel CR3 for do_nmi() */
 	/* %rax is saved above, so OK to clobber here */
 	movq	%cr3, %rax
+	/* If PCID enabled, NOFLUSH now and NOFLUSH on return */
+	orq	x86_cr3_pcid_noflush, %rax
 	pushq	%rax
 	/* mask off "user" bit of pgd address and 12 PCID bits: */
 	andq	$(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
-	/* Add back kernel PCID and "no flush" bit */
-	orq	X86_CR3_PCID_KERN_VAR, %rax
 	movq	%rax, %cr3
 #endif
 	call	do_nmi
@@ -1534,11 +1534,11 @@ end_repeat_nmi:
 	/* Unconditionally use kernel CR3 for do_nmi() */
 	/* %rax is saved above, so OK to clobber here */
 	movq	%cr3, %rax
+	/* If PCID enabled, NOFLUSH now and NOFLUSH on return */
+	orq	x86_cr3_pcid_noflush, %rax
 	pushq	%rax
 	/* mask off "user" bit of pgd address and 12 PCID bits: */
 	andq	$(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
-	/* Add back kernel PCID and "no flush" bit */
-	orq	X86_CR3_PCID_KERN_VAR, %rax
 	movq	%rax, %cr3
 #endif
 
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -25,7 +25,7 @@
 .macro _SWITCH_TO_KERNEL_CR3 reg
 movq %cr3, \reg
 andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), \reg
-orq  X86_CR3_PCID_KERN_VAR, \reg
+orq  x86_cr3_pcid_noflush, \reg
 movq \reg, %cr3
 .endm
 
@@ -37,11 +37,10 @@ movq \reg, %cr3
  * not enabled): so that the one register can update both memory and cr3.
  */
 movq %cr3, \reg
-andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), \reg
-orq  PER_CPU_VAR(X86_CR3_PCID_USER_VAR), \reg
+orq  PER_CPU_VAR(x86_cr3_pcid_user), \reg
 js   9f
 /* FLUSH this time, reset to NOFLUSH for next time (if PCID enabled) */
-movb \regb, PER_CPU_VAR(X86_CR3_PCID_USER_VAR+7)
+movb \regb, PER_CPU_VAR(x86_cr3_pcid_user+7)
 9:
 movq \reg, %cr3
 .endm
@@ -94,8 +93,8 @@ movq PER_CPU_VAR(unsafe_stack_register_b
 */
 DECLARE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
 
-extern unsigned long X86_CR3_PCID_KERN_VAR;
-DECLARE_PER_CPU(unsigned long, X86_CR3_PCID_USER_VAR);
+extern unsigned long x86_cr3_pcid_noflush;
+DECLARE_PER_CPU(unsigned long, x86_cr3_pcid_user);
 
 extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
 
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -29,8 +29,8 @@ DEFINE_PER_CPU_USER_MAPPED(unsigned long
  * This is also handy because systems that do not support PCIDs
  * just end up or'ing a 0 into their CR3, which does no harm.
  */
-__aligned(PAGE_SIZE) unsigned long X86_CR3_PCID_KERN_VAR;
-DEFINE_PER_CPU(unsigned long, X86_CR3_PCID_USER_VAR);
+unsigned long x86_cr3_pcid_noflush __read_mostly;
+DEFINE_PER_CPU(unsigned long, x86_cr3_pcid_user);
 
 /*
  * At runtime, the only things we map are some things for CPU
@@ -304,7 +304,8 @@ void __init kaiser_init(void)
 				  sizeof(gate_desc) * NR_VECTORS,
 				  __PAGE_KERNEL);
 
-	kaiser_add_user_map_early(&X86_CR3_PCID_KERN_VAR, PAGE_SIZE,
+	kaiser_add_user_map_early(&x86_cr3_pcid_noflush,
+				  sizeof(x86_cr3_pcid_noflush),
 				  __PAGE_KERNEL);
 }
 
@@ -384,8 +385,8 @@ void kaiser_setup_pcid(void)
 	 * These variables are used by the entry/exit
 	 * code to change PCID and pgd and TLB flushing.
 	 */
-	X86_CR3_PCID_KERN_VAR = kern_cr3;
-	this_cpu_write(X86_CR3_PCID_USER_VAR, user_cr3);
+	x86_cr3_pcid_noflush = kern_cr3;
+	this_cpu_write(x86_cr3_pcid_user, user_cr3);
 }
 
 /*
@@ -395,7 +396,7 @@ void kaiser_setup_pcid(void)
  */
 void kaiser_flush_tlb_on_return_to_user(void)
 {
-	this_cpu_write(X86_CR3_PCID_USER_VAR,
+	this_cpu_write(x86_cr3_pcid_user,
 			X86_CR3_PCID_USER_FLUSH | KAISER_SHADOW_PGD_OFFSET);
 }
 EXPORT_SYMBOL(kaiser_flush_tlb_on_return_to_user);


* [PATCH 4.4 22/37] kaiser: paranoid_entry pass cr3 need to paranoid_exit
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (20 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 21/37] kaiser: x86_cr3_pcid_noflush and x86_cr3_pcid_user Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 23/37] kaiser: _pgd_alloc() without __GFP_REPEAT to avoid stalls Greg Kroah-Hartman
                   ` (24 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


Neel Natu points out that paranoid_entry() was wrong to assume that
an entry that did not need swapgs would not need SWITCH_KERNEL_CR3:
paranoid_entry (used for debug breakpoint, int3, double fault or MCE;
though I think it's only the MCE case that is cause for concern here)
can break in at an awkward time, between cr3 switch and swapgs, but
its handling always needs kernel gs and kernel cr3.

Easy to fix in itself, but paranoid_entry() also needs to convey to
paranoid_exit() (and my reading of macro idtentry says paranoid_entry
and paranoid_exit are always paired) how to restore the prior state.
The swapgs state is already conveyed by %ebx (0 or 1), so extend that
also to convey when SWITCH_USER_CR3 will be needed (2 or 3).

(Yes, I'd much prefer that 0 meant no swapgs, whereas it's the other
way round: and a convention shared with error_entry() and error_exit(),
which I don't want to touch.  Perhaps I should have inverted the bit
for switch cr3 too, but did not.)
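
For readers who want the %ebx convention spelled out, here is a minimal
C sketch of the two bits. The names EBX_NO_SWAPGS, EBX_SWITCH_CR3 and
the helper functions are hypothetical and exist only for illustration;
the patch itself just tests the literal values $1 and $2 in assembly.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical names, for illustration only: the patch tests the
 * literal values $1 and $2 against %ebx in assembly.
 */
#define EBX_NO_SWAPGS	0x1u	/* bit 0 set: swapgs NOT needed on exit */
#define EBX_SWITCH_CR3	0x2u	/* bit 1 set: SWITCH_USER_CR3 needed on exit */

/* Bit 0 keeps the inverted sense shared with error_entry()/error_exit(). */
static bool exit_needs_swapgs(unsigned int ebx)
{
	return (ebx & EBX_NO_SWAPGS) == 0;
}

/* Bit 1 is not inverted: set means the user cr3 must be restored. */
static bool exit_needs_user_cr3(unsigned int ebx)
{
	return (ebx & EBX_SWITCH_CR3) != 0;
}

/* Walk the four values listed in the paranoid_entry comment. */
static void check_ebx_encoding(void)
{
	assert(exit_needs_swapgs(0) && !exit_needs_user_cr3(0));
	assert(!exit_needs_swapgs(1) && !exit_needs_user_cr3(1));
	assert(exit_needs_swapgs(2) && exit_needs_user_cr3(2));
	assert(!exit_needs_swapgs(3) && exit_needs_user_cr3(3));
}
```

Calling check_ebx_encoding() passes silently when the helpers agree
with the four documented cases.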

paranoid_exit() would be straightforward, except for TRACE_IRQS: it
did TRACE_IRQS_IRETQ when doing swapgs, but TRACE_IRQS_IRETQ_DEBUG
when not: which is it supposed to use when SWITCH_USER_CR3 is split
apart from that?  As best as I can determine, commit 5963e317b1e9
("ftrace/x86: Do not change stacks in DEBUG when calling lockdep")
missed the swapgs case, and should have used TRACE_IRQS_IRETQ_DEBUG
there too (the discrepancy has nothing to do with the liberal use
of _NO_STACK and _UNSAFE_STACK hereabouts: TRACE_IRQS_OFF_DEBUG has
just been used in all cases); discrepancy lovingly preserved across
several paranoid_exit() cleanups, but I'm now removing it.

Neel further indicates that to use SWITCH_USER_CR3_NO_STACK there in
paranoid_exit() is now not only unnecessary but unsafe: might corrupt
syscall entry's unsafe_stack_register_backup of %rax.  Just use
SWITCH_USER_CR3: and delete SWITCH_USER_CR3_NO_STACK altogether,
before we make the mistake of using it again.

hughd adds: this commit fixes an issue in the Kaiser-without-PCIDs
part of the series, and ought to be moved earlier, if you decided
to make a release of Kaiser-without-PCIDs.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/entry/entry_64.S     |   46 ++++++++++++++++++++++++++++++++----------
 arch/x86/include/asm/kaiser.h |    8 -------
 2 files changed, 36 insertions(+), 18 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1025,7 +1025,11 @@ idtentry machine_check					has_error_cod
 /*
  * Save all registers in pt_regs, and switch gs if needed.
  * Use slow, but surefire "are we in kernel?" check.
- * Return: ebx=0: need swapgs on exit, ebx=1: otherwise
+ *
+ * Return: ebx=0: needs swapgs but not SWITCH_USER_CR3 in paranoid_exit
+ *         ebx=1: needs neither swapgs nor SWITCH_USER_CR3 in paranoid_exit
+ *         ebx=2: needs both swapgs and SWITCH_USER_CR3 in paranoid_exit
+ *         ebx=3: needs SWITCH_USER_CR3 but not swapgs in paranoid_exit
  */
 ENTRY(paranoid_entry)
 	cld
@@ -1037,9 +1041,26 @@ ENTRY(paranoid_entry)
 	testl	%edx, %edx
 	js	1f				/* negative -> in kernel */
 	SWAPGS
-	SWITCH_KERNEL_CR3
 	xorl	%ebx, %ebx
-1:	ret
+1:
+#ifdef CONFIG_KAISER
+	/*
+	 * We might have come in between a swapgs and a SWITCH_KERNEL_CR3
+	 * on entry, or between a SWITCH_USER_CR3 and a swapgs on exit.
+	 * Do a conditional SWITCH_KERNEL_CR3: this could safely be done
+	 * unconditionally, but we need to find out whether the reverse
+	 * should be done on return (conveyed to paranoid_exit in %ebx).
+	 */
+	movq	%cr3, %rax
+	testl	$KAISER_SHADOW_PGD_OFFSET, %eax
+	jz	2f
+	orl	$2, %ebx
+	andq	$(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
+	orq	x86_cr3_pcid_noflush, %rax
+	movq	%rax, %cr3
+2:
+#endif
+	ret
 END(paranoid_entry)
 
 /*
@@ -1052,20 +1073,25 @@ END(paranoid_entry)
  * be complicated.  Fortunately, we there's no good reason
  * to try to handle preemption here.
  *
- * On entry, ebx is "no swapgs" flag (1: don't need swapgs, 0: need it)
+ * On entry: ebx=0: needs swapgs but not SWITCH_USER_CR3
+ *           ebx=1: needs neither swapgs nor SWITCH_USER_CR3
+ *           ebx=2: needs both swapgs and SWITCH_USER_CR3
+ *           ebx=3: needs SWITCH_USER_CR3 but not swapgs
  */
 ENTRY(paranoid_exit)
 	DISABLE_INTERRUPTS(CLBR_NONE)
 	TRACE_IRQS_OFF_DEBUG
-	testl	%ebx, %ebx			/* swapgs needed? */
+	TRACE_IRQS_IRETQ_DEBUG
+#ifdef CONFIG_KAISER
+	testl	$2, %ebx			/* SWITCH_USER_CR3 needed? */
+	jz	paranoid_exit_no_switch
+	SWITCH_USER_CR3
+paranoid_exit_no_switch:
+#endif
+	testl	$1, %ebx			/* swapgs needed? */
 	jnz	paranoid_exit_no_swapgs
-	TRACE_IRQS_IRETQ
-	SWITCH_USER_CR3_NO_STACK
 	SWAPGS_UNSAFE_STACK
-	jmp	paranoid_exit_restore
 paranoid_exit_no_swapgs:
-	TRACE_IRQS_IRETQ_DEBUG
-paranoid_exit_restore:
 	RESTORE_EXTRA_REGS
 	RESTORE_C_REGS
 	REMOVE_PT_GPREGS_FROM_STACK 8
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -63,20 +63,12 @@ _SWITCH_TO_KERNEL_CR3 %rax
 movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
 .endm
 
-.macro SWITCH_USER_CR3_NO_STACK
-movq %rax, PER_CPU_VAR(unsafe_stack_register_backup)
-_SWITCH_TO_USER_CR3 %rax %al
-movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
-.endm
-
 #else /* CONFIG_KAISER */
 
 .macro SWITCH_KERNEL_CR3 reg
 .endm
 .macro SWITCH_USER_CR3 reg regb
 .endm
-.macro SWITCH_USER_CR3_NO_STACK
-.endm
 .macro SWITCH_KERNEL_CR3_NO_STACK
 .endm
 


* [PATCH 4.4 23/37] kaiser: _pgd_alloc() without __GFP_REPEAT to avoid stalls
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (21 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 22/37] kaiser: paranoid_entry pass cr3 need to paranoid_exit Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 24/37] kaiser: fix unlikely error in alloc_ldt_struct() Greg Kroah-Hartman
                   ` (23 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


Synthetic filesystem mempressure testing has shown softlockups, with
hour-long page allocation stalls, and pgd_alloc() trying for order:1
with __GFP_REPEAT in one of the backtraces each time.

That's _pgd_alloc() going for a Kaiser double-pgd, using the __GFP_REPEAT
common to all page table allocations, but actually having no effect on
order:0 (see should_alloc_oom() and should_continue_reclaim() in this
tree, but beware that ports to another tree might behave differently).

Order:1 stack allocation has been working satisfactorily without
__GFP_REPEAT forever, and page table allocation only asks __GFP_REPEAT
for awkward occasions in a long-running process: it's not appropriate
at fork or exec time, and seems to be doing much more harm than good:
getting those contiguous pages under very heavy mempressure can be
hard (though even without it, Kaiser does generate more mempressure).

Mask out that __GFP_REPEAT inside _pgd_alloc().  Why not take it out
of the PGALLOC_GFP altogether, as v4.7 commit a3a9a59d2067 ("x86: get
rid of superfluous __GFP_REPEAT") did?  Because I think that might
make a difference to our page table memcg charging, which I'd prefer
not to interfere with at this time.
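
A side note on the macro: masking a flag out of an unparenthesised
flag macro does not do what it appears to, because & binds tighter
than |. That is presumably why the diff below also wraps PGALLOC_GFP
in parentheses. A minimal sketch, using made-up flag values (F_KERNEL
and friends are hypothetical, not the real __GFP_* bits):

```c
#include <assert.h>

/* Hypothetical flag values; the real __GFP_* bits differ. */
#define F_KERNEL	0x01u
#define F_NOTRACK	0x02u
#define F_REPEAT	0x04u
#define F_ZERO		0x08u

/* Old form: no parentheses around the expansion. */
#define ALLOC_FLAGS_BARE	F_KERNEL | F_NOTRACK | F_REPEAT | F_ZERO
/* New form, as in the patch: parenthesised. */
#define ALLOC_FLAGS_PAREN	(F_KERNEL | F_NOTRACK | F_REPEAT | F_ZERO)

static unsigned int mask_repeat_bare(void)
{
	/* Parses as F_KERNEL | F_NOTRACK | F_REPEAT | (F_ZERO & ~F_REPEAT) */
	return ALLOC_FLAGS_BARE & ~F_REPEAT;
}

static unsigned int mask_repeat_paren(void)
{
	/* The whole flag set is masked, so F_REPEAT is really cleared. */
	return ALLOC_FLAGS_PAREN & ~F_REPEAT;
}

static void check_masking(void)
{
	assert(mask_repeat_bare() & F_REPEAT);		/* still set */
	assert(!(mask_repeat_paren() & F_REPEAT));	/* cleared */
}
```

A compiler may warn about the bare form when warnings are enabled,
which is a fair hint in itself.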

hughd adds: __alloc_pages_slowpath() in the 4.4.89-stable tree handles
__GFP_REPEAT a little differently than in prod kernel or 3.18.72-stable,
so it may not always be exactly a no-op on order:0 pages, as said above;
but I think still appropriate to omit it from Kaiser or non-Kaiser pgd.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/mm/pgtable.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -6,7 +6,7 @@
 #include <asm/fixmap.h>
 #include <asm/mtrr.h>
 
-#define PGALLOC_GFP GFP_KERNEL | __GFP_NOTRACK | __GFP_REPEAT | __GFP_ZERO
+#define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_REPEAT | __GFP_ZERO)
 
 #ifdef CONFIG_HIGHPTE
 #define PGALLOC_USER_GFP __GFP_HIGHMEM
@@ -354,7 +354,9 @@ static inline void _pgd_free(pgd_t *pgd)
 
 static inline pgd_t *_pgd_alloc(void)
 {
-	return (pgd_t *)__get_free_pages(PGALLOC_GFP, PGD_ALLOCATION_ORDER);
+	/* No __GFP_REPEAT: to avoid page allocation stalls in order-1 case */
+	return (pgd_t *)__get_free_pages(PGALLOC_GFP & ~__GFP_REPEAT,
+					 PGD_ALLOCATION_ORDER);
 }
 
 static inline void _pgd_free(pgd_t *pgd)


* [PATCH 4.4 24/37] kaiser: fix unlikely error in alloc_ldt_struct()
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (22 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 23/37] kaiser: _pgd_alloc() without __GFP_REPEAT to avoid stalls Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 25/37] kaiser: add "nokaiser" boot option, using ALTERNATIVE Greg Kroah-Hartman
                   ` (22 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


An error from kaiser_add_mapping() here is not at all likely, but
Eric Biggers rightly points out that __free_ldt_struct() relies on
new_ldt->size being initialized: move that up.
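
The general shape of the fix, as a userspace sketch: any field the
free routine reads has to be set before the first error path that can
call it. struct table, table_alloc(), table_free() and the map_fails
parameter are hypothetical stand-ins, not the ldt.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ldt_struct. */
struct table {
	int *entries;
	size_t size;		/* read by table_free() */
};

/* Records what table_free() saw, purely for the demonstration. */
static size_t last_freed_size;

static void table_free(struct table *t)
{
	last_freed_size = t->size;	/* relies on size being initialized */
	free(t->entries);
	free(t);
}

/* map_fails simulates an error return from the mapping step. */
static struct table *table_alloc(size_t size, int map_fails)
{
	struct table *t = malloc(sizeof(*t));

	if (!t)
		return NULL;
	t->entries = calloc(size, sizeof(*t->entries));
	if (!t->entries) {
		free(t);
		return NULL;
	}

	t->size = size;		/* set before any path that calls table_free() */
	if (map_fails) {
		table_free(t);
		return NULL;
	}
	return t;
}

static void check_error_path(void)
{
	assert(table_alloc(16, 1) == NULL);
	assert(last_freed_size == 16);	/* free routine saw a valid size */
}
```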

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/ldt.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -79,11 +79,11 @@ static struct ldt_struct *alloc_ldt_stru
 
 	ret = kaiser_add_mapping((unsigned long)new_ldt->entries, alloc_size,
 				 __PAGE_KERNEL);
+	new_ldt->size = size;
 	if (ret) {
 		__free_ldt_struct(new_ldt);
 		return NULL;
 	}
-	new_ldt->size = size;
 	return new_ldt;
 }
 


* [PATCH 4.4 25/37] kaiser: add "nokaiser" boot option, using ALTERNATIVE
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (23 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 24/37] kaiser: fix unlikely error in alloc_ldt_struct() Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 26/37] x86/kaiser: Rename and simplify X86_FEATURE_KAISER handling Greg Kroah-Hartman
                   ` (21 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


Added "nokaiser" boot option: an early param like "noinvpcid".
Most places now check int kaiser_enabled (#defined 0 when not
CONFIG_KAISER) instead of #ifdef CONFIG_KAISER; but entry_64.S
and entry_64_compat.S are using the ALTERNATIVE technique, which
patches in the preferred instructions at runtime.  That technique
is tied to x86 cpu features, so X86_FEATURE_KAISER is fabricated.

Prior to "nokaiser", Kaiser #defined _PAGE_GLOBAL 0: revert that,
but be careful with both _PAGE_GLOBAL and CR4.PGE: setting them when
nokaiser like when !CONFIG_KAISER, but not setting either when kaiser -
neither matters on its own, but it's hard to be sure that _PAGE_GLOBAL
won't get set in some obscure corner, or that nothing adds PGE into CR4.
By omitting _PAGE_GLOBAL from __supported_pte_mask when kaiser_enabled,
all page table setup which uses pfn_pte() masks it out of the ptes.
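
To illustrate the masking idea in isolation: if the global bit is
never added to the supported mask, every pte built through the masking
helper loses it, whatever protection bits the caller passed in. This
is only a sketch; supported_pte_mask, probe_mask() and make_pte() are
hypothetical analogues of __supported_pte_mask, probe_page_size_mask()
and the pte constructors, and the bit positions are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, for illustration. */
#define PTE_PRESENT	(UINT64_C(1) << 0)
#define PTE_RW		(UINT64_C(1) << 1)
#define PTE_GLOBAL	(UINT64_C(1) << 8)

static uint64_t supported_pte_mask = PTE_PRESENT | PTE_RW;

/* Only advertise the global bit when isolation is switched off. */
static void probe_mask(int has_pge, int isolation_enabled)
{
	supported_pte_mask = PTE_PRESENT | PTE_RW;
	if (has_pge && !isolation_enabled)
		supported_pte_mask |= PTE_GLOBAL;
}

/* Build a pte, dropping any protection bit missing from the mask. */
static uint64_t make_pte(uint64_t pfn, uint64_t prot)
{
	return (pfn << 12) | (prot & supported_pte_mask);
}

static void check_global_masking(void)
{
	probe_mask(1, 1);	/* isolation on: global bit filtered out */
	assert(!(make_pte(5, PTE_PRESENT | PTE_GLOBAL) & PTE_GLOBAL));
	probe_mask(1, 0);	/* isolation off: global bit passes through */
	assert(make_pte(5, PTE_PRESENT | PTE_GLOBAL) & PTE_GLOBAL);
}
```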

It's slightly shameful that the same declaration versus definition of
kaiser_enabled appears in not one, not two, but in three header files
(asm/kaiser.h, asm/pgtable.h, asm/tlbflush.h).  I felt safer that way,
than with #including any of those in any of the others; and did not
feel it worth an asm/kaiser_enabled.h - kernel/cpu/common.c includes
them all, so we shall hear about it if they get out of synch.
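
The declaration-versus-definition pattern, reduced to a standalone
sketch: with the config option on, the name is a real variable that a
boot option can clear; with it off, the name is the constant 0 and the
compiler can discard the guarded code. FEATURE_CONFIG, feature_enabled
and add_mapping() are hypothetical names standing in for CONFIG_KAISER,
kaiser_enabled and kaiser_add_mapping().

```c
#include <assert.h>

/* Remove this define to build the compiled-out variant. */
#define FEATURE_CONFIG 1

#ifdef FEATURE_CONFIG
int feature_enabled = 1;	/* runtime flag; a boot option may clear it */
#else
#define feature_enabled 0	/* constant: guarded code can be dropped */
#endif

static int mappings_added;

/* Callers need no #ifdef: the test on feature_enabled covers both builds. */
static int add_mapping(void)
{
	if (!feature_enabled)
		return 0;	/* nothing to do, and not an error */
	mappings_added++;
	return 0;
}

static void check_runtime_toggle(void)
{
#ifdef FEATURE_CONFIG
	add_mapping();
	assert(mappings_added == 1);
	feature_enabled = 0;	/* what a "no..." boot option would do */
	add_mapping();
	assert(mappings_added == 1);
	feature_enabled = 1;
#else
	add_mapping();
	assert(mappings_added == 0);
#endif
}
```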

Cleanups while in the area: removed the silly #ifdef CONFIG_KAISER
from kaiser.c; removed the unused native_get_normal_pgd(); removed
the spurious reg clutter from SWITCH_*_CR3 macro stubs; corrected some
comments.  But more interestingly, set CR4.PSE in secondary_startup_64:
the manual is clear that it does not matter whether it's 0 or 1 when
4-level-pts are enabled, but I was distracted to find cr4 different on
BSP and auxiliaries - BSP alone was adding PSE, in probe_page_size_mask().

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/kernel-parameters.txt  |    2 +
 arch/x86/entry/entry_64.S            |   15 +++++++------
 arch/x86/include/asm/cpufeature.h    |    3 ++
 arch/x86/include/asm/kaiser.h        |   27 +++++++++++++++++-------
 arch/x86/include/asm/pgtable.h       |   20 ++++++++++++-----
 arch/x86/include/asm/pgtable_64.h    |   13 +++--------
 arch/x86/include/asm/pgtable_types.h |    4 ---
 arch/x86/include/asm/tlbflush.h      |   39 ++++++++++++++++++++++-------------
 arch/x86/kernel/cpu/common.c         |   28 ++++++++++++++++++++++++-
 arch/x86/kernel/espfix_64.c          |    3 +-
 arch/x86/kernel/head_64.S            |    4 +--
 arch/x86/mm/init.c                   |    2 -
 arch/x86/mm/init_64.c                |   10 ++++++++
 arch/x86/mm/kaiser.c                 |   26 +++++++++++++++++++----
 arch/x86/mm/pgtable.c                |    8 +------
 arch/x86/mm/tlb.c                    |    4 ---
 16 files changed, 143 insertions(+), 65 deletions(-)

--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2523,6 +2523,8 @@ bytes respectively. Such letter suffixes
 
 	nojitter	[IA-64] Disables jitter checking for ITC timers.
 
+	nokaiser	[X86-64] Disable KAISER isolation of kernel from user.
+
 	no-kvmclock	[X86,KVM] Disable paravirtualized KVM clock driver
 
 	no-kvmapf	[X86,KVM] Disable paravirtualized asynchronous page
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1051,7 +1051,7 @@ ENTRY(paranoid_entry)
 	 * unconditionally, but we need to find out whether the reverse
 	 * should be done on return (conveyed to paranoid_exit in %ebx).
 	 */
-	movq	%cr3, %rax
+	ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
 	testl	$KAISER_SHADOW_PGD_OFFSET, %eax
 	jz	2f
 	orl	$2, %ebx
@@ -1083,6 +1083,7 @@ ENTRY(paranoid_exit)
 	TRACE_IRQS_OFF_DEBUG
 	TRACE_IRQS_IRETQ_DEBUG
 #ifdef CONFIG_KAISER
+	/* No ALTERNATIVE for X86_FEATURE_KAISER: paranoid_entry sets %ebx */
 	testl	$2, %ebx			/* SWITCH_USER_CR3 needed? */
 	jz	paranoid_exit_no_switch
 	SWITCH_USER_CR3
@@ -1315,13 +1316,14 @@ ENTRY(nmi)
 #ifdef CONFIG_KAISER
 	/* Unconditionally use kernel CR3 for do_nmi() */
 	/* %rax is saved above, so OK to clobber here */
-	movq	%cr3, %rax
+	ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
 	/* If PCID enabled, NOFLUSH now and NOFLUSH on return */
 	orq	x86_cr3_pcid_noflush, %rax
 	pushq	%rax
 	/* mask off "user" bit of pgd address and 12 PCID bits: */
 	andq	$(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
 	movq	%rax, %cr3
+2:
 #endif
 	call	do_nmi
 
@@ -1331,8 +1333,7 @@ ENTRY(nmi)
 	 * kernel code that needs user CR3, but do we ever return
 	 * to "user mode" where we need the kernel CR3?
 	 */
-	popq	%rax
-	mov	%rax, %cr3
+	ALTERNATIVE "", "popq %rax; movq %rax, %cr3", X86_FEATURE_KAISER
 #endif
 
 	/*
@@ -1559,13 +1560,14 @@ end_repeat_nmi:
 #ifdef CONFIG_KAISER
 	/* Unconditionally use kernel CR3 for do_nmi() */
 	/* %rax is saved above, so OK to clobber here */
-	movq	%cr3, %rax
+	ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
 	/* If PCID enabled, NOFLUSH now and NOFLUSH on return */
 	orq	x86_cr3_pcid_noflush, %rax
 	pushq	%rax
 	/* mask off "user" bit of pgd address and 12 PCID bits: */
 	andq	$(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
 	movq	%rax, %cr3
+2:
 #endif
 
 	/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
@@ -1577,8 +1579,7 @@ end_repeat_nmi:
 	 * kernel code that needs user CR3, like just just before
 	 * a sysret.
 	 */
-	popq	%rax
-	mov	%rax, %cr3
+	ALTERNATIVE "", "popq %rax; movq %rax, %cr3", X86_FEATURE_KAISER
 #endif
 
 	testl	%ebx, %ebx			/* swapgs needed? */
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -200,6 +200,9 @@
 #define X86_FEATURE_HWP_PKG_REQ ( 7*32+14) /* Intel HWP_PKG_REQ */
 #define X86_FEATURE_INTEL_PT	( 7*32+15) /* Intel Processor Trace */
 
+/* Because the ALTERNATIVE scheme is for members of the X86_FEATURE club... */
+#define X86_FEATURE_KAISER	( 7*32+31) /* CONFIG_KAISER w/o nokaiser */
+
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW  ( 8*32+ 0) /* Intel TPR Shadow */
 #define X86_FEATURE_VNMI        ( 8*32+ 1) /* Intel Virtual NMI */
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -46,28 +46,33 @@ movq \reg, %cr3
 .endm
 
 .macro SWITCH_KERNEL_CR3
-pushq %rax
+ALTERNATIVE "jmp 8f", "pushq %rax", X86_FEATURE_KAISER
 _SWITCH_TO_KERNEL_CR3 %rax
 popq %rax
+8:
 .endm
 
 .macro SWITCH_USER_CR3
-pushq %rax
+ALTERNATIVE "jmp 8f", "pushq %rax", X86_FEATURE_KAISER
 _SWITCH_TO_USER_CR3 %rax %al
 popq %rax
+8:
 .endm
 
 .macro SWITCH_KERNEL_CR3_NO_STACK
-movq %rax, PER_CPU_VAR(unsafe_stack_register_backup)
+ALTERNATIVE "jmp 8f", \
+	__stringify(movq %rax, PER_CPU_VAR(unsafe_stack_register_backup)), \
+	X86_FEATURE_KAISER
 _SWITCH_TO_KERNEL_CR3 %rax
 movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
+8:
 .endm
 
 #else /* CONFIG_KAISER */
 
-.macro SWITCH_KERNEL_CR3 reg
+.macro SWITCH_KERNEL_CR3
 .endm
-.macro SWITCH_USER_CR3 reg regb
+.macro SWITCH_USER_CR3
 .endm
 .macro SWITCH_KERNEL_CR3_NO_STACK
 .endm
@@ -90,6 +95,16 @@ DECLARE_PER_CPU(unsigned long, x86_cr3_p
 
 extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
 
+extern int kaiser_enabled;
+#else
+#define kaiser_enabled	0
+#endif /* CONFIG_KAISER */
+
+/*
+ * Kaiser function prototypes are needed even when CONFIG_KAISER is not set,
+ * so as to build with tests on kaiser_enabled instead of #ifdefs.
+ */
+
 /**
  *  kaiser_add_mapping - map a virtual memory part to the shadow (user) mapping
  *  @addr: the start address of the range
@@ -119,8 +134,6 @@ extern void kaiser_remove_mapping(unsign
  */
 extern void kaiser_init(void);
 
-#endif /* CONFIG_KAISER */
-
 #endif /* __ASSEMBLY */
 
 #endif /* _ASM_X86_KAISER_H */
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -18,6 +18,12 @@
 #ifndef __ASSEMBLY__
 #include <asm/x86_init.h>
 
+#ifdef CONFIG_KAISER
+extern int kaiser_enabled;
+#else
+#define kaiser_enabled 0
+#endif
+
 void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
 void ptdump_walk_pgd_level_checkwx(void);
 
@@ -660,7 +666,7 @@ static inline int pgd_bad(pgd_t pgd)
 	 * page table by accident; it will fault on the first
 	 * instruction it tries to run.  See native_set_pgd().
 	 */
-	if (IS_ENABLED(CONFIG_KAISER))
+	if (kaiser_enabled)
 		ignore_flags |= _PAGE_NX;
 
 	return (pgd_flags(pgd) & ~ignore_flags) != _KERNPG_TABLE;
@@ -865,12 +871,14 @@ static inline void pmdp_set_wrprotect(st
  */
 static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
 {
-       memcpy(dst, src, count * sizeof(pgd_t));
+	memcpy(dst, src, count * sizeof(pgd_t));
 #ifdef CONFIG_KAISER
-	/* Clone the shadow pgd part as well */
-	memcpy(native_get_shadow_pgd(dst),
-	       native_get_shadow_pgd(src),
-	       count * sizeof(pgd_t));
+	if (kaiser_enabled) {
+		/* Clone the shadow pgd part as well */
+		memcpy(native_get_shadow_pgd(dst),
+			native_get_shadow_pgd(src),
+			count * sizeof(pgd_t));
+	}
 #endif
 }
 
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -111,13 +111,12 @@ extern pgd_t kaiser_set_shadow_pgd(pgd_t
 
 static inline pgd_t *native_get_shadow_pgd(pgd_t *pgdp)
 {
+#ifdef CONFIG_DEBUG_VM
+	/* linux/mmdebug.h may not have been included at this point */
+	BUG_ON(!kaiser_enabled);
+#endif
 	return (pgd_t *)((unsigned long)pgdp | (unsigned long)PAGE_SIZE);
 }
-
-static inline pgd_t *native_get_normal_pgd(pgd_t *pgdp)
-{
-	return (pgd_t *)((unsigned long)pgdp & ~(unsigned long)PAGE_SIZE);
-}
 #else
 static inline pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd)
 {
@@ -128,10 +127,6 @@ static inline pgd_t *native_get_shadow_p
 	BUILD_BUG_ON(1);
 	return NULL;
 }
-static inline pgd_t *native_get_normal_pgd(pgd_t *pgdp)
-{
-	return pgdp;
-}
 #endif /* CONFIG_KAISER */
 
 static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -39,11 +39,7 @@
 #define _PAGE_ACCESSED	(_AT(pteval_t, 1) << _PAGE_BIT_ACCESSED)
 #define _PAGE_DIRTY	(_AT(pteval_t, 1) << _PAGE_BIT_DIRTY)
 #define _PAGE_PSE	(_AT(pteval_t, 1) << _PAGE_BIT_PSE)
-#ifdef CONFIG_KAISER
-#define _PAGE_GLOBAL	(_AT(pteval_t, 0))
-#else
 #define _PAGE_GLOBAL	(_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL)
-#endif
 #define _PAGE_SOFTW1	(_AT(pteval_t, 1) << _PAGE_BIT_SOFTW1)
 #define _PAGE_SOFTW2	(_AT(pteval_t, 1) << _PAGE_BIT_SOFTW2)
 #define _PAGE_PAT	(_AT(pteval_t, 1) << _PAGE_BIT_PAT)
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -136,9 +136,11 @@ static inline void cr4_set_bits_and_upda
  * to avoid the need for asm/kaiser.h in unexpected places.
  */
 #ifdef CONFIG_KAISER
+extern int kaiser_enabled;
 extern void kaiser_setup_pcid(void);
 extern void kaiser_flush_tlb_on_return_to_user(void);
 #else
+#define kaiser_enabled 0
 static inline void kaiser_setup_pcid(void)
 {
 }
@@ -163,7 +165,7 @@ static inline void __native_flush_tlb(vo
 	 * back:
 	 */
 	preempt_disable();
-	if (this_cpu_has(X86_FEATURE_PCID))
+	if (kaiser_enabled && this_cpu_has(X86_FEATURE_PCID))
 		kaiser_flush_tlb_on_return_to_user();
 	native_write_cr3(native_read_cr3());
 	preempt_enable();
@@ -174,20 +176,30 @@ static inline void __native_flush_tlb_gl
 	unsigned long cr4;
 
 	cr4 = this_cpu_read(cpu_tlbstate.cr4);
-	/* clear PGE */
-	native_write_cr4(cr4 & ~X86_CR4_PGE);
-	/* write old PGE again and flush TLBs */
-	native_write_cr4(cr4);
+	if (cr4 & X86_CR4_PGE) {
+		/* clear PGE and flush TLB of all entries */
+		native_write_cr4(cr4 & ~X86_CR4_PGE);
+		/* restore PGE as it was before */
+		native_write_cr4(cr4);
+	} else {
+		/*
+		 * x86_64 microcode update comes this way when CR4.PGE is not
+		 * enabled, and it's safer for all callers to allow this case.
+		 */
+		native_write_cr3(native_read_cr3());
+	}
 }
 
 static inline void __native_flush_tlb_global(void)
 {
-#ifdef CONFIG_KAISER
-	/* Globals are not used at all */
-	__native_flush_tlb();
-#else
 	unsigned long flags;
 
+	if (kaiser_enabled) {
+		/* Globals are not used at all */
+		__native_flush_tlb();
+		return;
+	}
+
 	if (this_cpu_has(X86_FEATURE_INVPCID)) {
 		/*
 		 * Using INVPCID is considerably faster than a pair of writes
@@ -207,7 +219,6 @@ static inline void __native_flush_tlb_gl
 	raw_local_irq_save(flags);
 	__native_flush_tlb_global_irq_disabled();
 	raw_local_irq_restore(flags);
-#endif
 }
 
 static inline void __native_flush_tlb_single(unsigned long addr)
@@ -222,7 +233,7 @@ static inline void __native_flush_tlb_si
 	 */
 
 	if (!this_cpu_has(X86_FEATURE_INVPCID_SINGLE)) {
-		if (this_cpu_has(X86_FEATURE_PCID))
+		if (kaiser_enabled && this_cpu_has(X86_FEATURE_PCID))
 			kaiser_flush_tlb_on_return_to_user();
 		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
 		return;
@@ -237,9 +248,9 @@ static inline void __native_flush_tlb_si
 	 * Make sure to do only a single invpcid when KAISER is
 	 * disabled and we have only a single ASID.
 	 */
-	if (X86_CR3_PCID_ASID_KERN != X86_CR3_PCID_ASID_USER)
-		invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
-	invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
+	if (kaiser_enabled)
+		invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
+	invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
 }
 
 static inline void __flush_tlb_all(void)
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -178,6 +178,20 @@ static int __init x86_pcid_setup(char *s
 	return 1;
 }
 __setup("nopcid", x86_pcid_setup);
+
+static int __init x86_nokaiser_setup(char *s)
+{
+	/* nokaiser doesn't accept parameters */
+	if (s)
+		return -EINVAL;
+#ifdef CONFIG_KAISER
+	kaiser_enabled = 0;
+	setup_clear_cpu_cap(X86_FEATURE_KAISER);
+	pr_info("nokaiser: KAISER feature disabled\n");
+#endif
+	return 0;
+}
+early_param("nokaiser", x86_nokaiser_setup);
 #endif
 
 static int __init x86_noinvpcid_setup(char *s)
@@ -324,7 +338,7 @@ static __always_inline void setup_smap(s
 static void setup_pcid(struct cpuinfo_x86 *c)
 {
 	if (cpu_has(c, X86_FEATURE_PCID)) {
-		if (cpu_has(c, X86_FEATURE_PGE)) {
+		if (cpu_has(c, X86_FEATURE_PGE) || kaiser_enabled) {
 			cr4_set_bits(X86_CR4_PCIDE);
 			/*
 			 * INVPCID has two "groups" of types:
@@ -747,6 +761,10 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
 		c->x86_power = cpuid_edx(0x80000007);
 
 	init_scattered_cpuid_features(c);
+#ifdef CONFIG_KAISER
+	if (kaiser_enabled)
+		set_cpu_cap(c, X86_FEATURE_KAISER);
+#endif
 }
 
 static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c)
@@ -1406,6 +1424,14 @@ void cpu_init(void)
 	 * try to read it.
 	 */
 	cr4_init_shadow();
+	if (!kaiser_enabled) {
+		/*
+		 * secondary_startup_64() deferred setting PGE in cr4:
+		 * probe_page_size_mask() sets it on the boot cpu,
+		 * but it needs to be set on each secondary cpu.
+		 */
+		cr4_set_bits(X86_CR4_PGE);
+	}
 
 	/*
 	 * Load microcode on this cpu if a valid microcode is available.
--- a/arch/x86/kernel/espfix_64.c
+++ b/arch/x86/kernel/espfix_64.c
@@ -132,9 +132,10 @@ void __init init_espfix_bsp(void)
 	 * area to ensure it is mapped into the shadow user page
 	 * tables.
 	 */
-	if (IS_ENABLED(CONFIG_KAISER))
+	if (kaiser_enabled) {
 		set_pgd(native_get_shadow_pgd(pgd_p),
 			__pgd(_KERNPG_TABLE | __pa((pud_t *)espfix_pud_page)));
+	}
 
 	/* Randomize the locations */
 	init_espfix_random();
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -183,8 +183,8 @@ ENTRY(secondary_startup_64)
 	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
 1:
 
-	/* Enable PAE mode and PGE */
-	movl	$(X86_CR4_PAE | X86_CR4_PGE), %ecx
+	/* Enable PAE and PSE, but defer PGE until kaiser_enabled is decided */
+	movl	$(X86_CR4_PAE | X86_CR4_PSE), %ecx
 	movq	%rcx, %cr4
 
 	/* Setup early boot stage 4 level pagetables. */
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -165,7 +165,7 @@ static void __init probe_page_size_mask(
 		cr4_set_bits_and_update_boot(X86_CR4_PSE);
 
 	/* Enable PGE if available */
-	if (cpu_has_pge) {
+	if (cpu_has_pge && !kaiser_enabled) {
 		cr4_set_bits_and_update_boot(X86_CR4_PGE);
 		__supported_pte_mask |= _PAGE_GLOBAL;
 	} else
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -395,6 +395,16 @@ void __init cleanup_highmap(void)
 			continue;
 		if (vaddr < (unsigned long) _text || vaddr > end)
 			set_pmd(pmd, __pmd(0));
+		else if (kaiser_enabled) {
+			/*
+			 * level2_kernel_pgt is initialized with _PAGE_GLOBAL:
+			 * clear that now.  This is not important, so long as
+			 * CR4.PGE remains clear, but it removes an anomaly.
+			 * Physical mapping setup below avoids _PAGE_GLOBAL
+			 * by use of massage_pgprot() inside pfn_pte() etc.
+			 */
+			set_pmd(pmd, pmd_clear_flags(*pmd, _PAGE_GLOBAL));
+		}
 	}
 }
 
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -17,7 +17,9 @@
 #include <asm/pgalloc.h>
 #include <asm/desc.h>
 
-#ifdef CONFIG_KAISER
+int kaiser_enabled __read_mostly = 1;
+EXPORT_SYMBOL(kaiser_enabled);	/* for inlined TLB flush functions */
+
 __visible
 DEFINE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
 
@@ -168,8 +170,8 @@ static pte_t *kaiser_pagetable_walk(unsi
 	return pte_offset_kernel(pmd, address);
 }
 
-int kaiser_add_user_map(const void *__start_addr, unsigned long size,
-			unsigned long flags)
+static int kaiser_add_user_map(const void *__start_addr, unsigned long size,
+			       unsigned long flags)
 {
 	int ret = 0;
 	pte_t *pte;
@@ -178,6 +180,15 @@ int kaiser_add_user_map(const void *__st
 	unsigned long end_addr = PAGE_ALIGN(start_addr + size);
 	unsigned long target_address;
 
+	/*
+	 * It is convenient for callers to pass in __PAGE_KERNEL etc,
+	 * and there is no actual harm from setting _PAGE_GLOBAL, so
+	 * long as CR4.PGE is not set.  But it is nonetheless troubling
+	 * to see Kaiser itself setting _PAGE_GLOBAL (now that "nokaiser"
+	 * requires that not to be #defined to 0): so mask it off here.
+	 */
+	flags &= ~_PAGE_GLOBAL;
+
 	for (; address < end_addr; address += PAGE_SIZE) {
 		target_address = get_pa_from_mapping(address);
 		if (target_address == -1) {
@@ -264,6 +275,8 @@ void __init kaiser_init(void)
 {
 	int cpu;
 
+	if (!kaiser_enabled)
+		return;
 	kaiser_init_all_pgds();
 
 	for_each_possible_cpu(cpu) {
@@ -312,6 +325,8 @@ void __init kaiser_init(void)
 /* Add a mapping to the shadow mapping, and synchronize the mappings */
 int kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags)
 {
+	if (!kaiser_enabled)
+		return 0;
 	return kaiser_add_user_map((const void *)addr, size, flags);
 }
 
@@ -323,6 +338,8 @@ void kaiser_remove_mapping(unsigned long
 	unsigned long addr, next;
 	pgd_t *pgd;
 
+	if (!kaiser_enabled)
+		return;
 	pgd = native_get_shadow_pgd(pgd_offset_k(start));
 	for (addr = start; addr < end; pgd++, addr = next) {
 		next = pgd_addr_end(addr, end);
@@ -344,6 +361,8 @@ static inline bool is_userspace_pgd(pgd_
 
 pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd)
 {
+	if (!kaiser_enabled)
+		return pgd;
 	/*
 	 * Do we need to also populate the shadow pgd?  Check _PAGE_USER to
 	 * skip cases like kexec and EFI which make temporary low mappings.
@@ -400,4 +419,3 @@ void kaiser_flush_tlb_on_return_to_user(
 			X86_CR3_PCID_USER_FLUSH | KAISER_SHADOW_PGD_OFFSET);
 }
 EXPORT_SYMBOL(kaiser_flush_tlb_on_return_to_user);
-#endif /* CONFIG_KAISER */
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -341,16 +341,12 @@ static inline void _pgd_free(pgd_t *pgd)
 }
 #else
 
-#ifdef CONFIG_KAISER
 /*
- * Instead of one pmd, we aquire two pmds.  Being order-1, it is
+ * Instead of one pgd, Kaiser acquires two pgds.  Being order-1, it is
  * both 8k in size and 8k-aligned.  That lets us just flip bit 12
  * in a pointer to swap between the two 4k halves.
  */
-#define PGD_ALLOCATION_ORDER 1
-#else
-#define PGD_ALLOCATION_ORDER 0
-#endif
+#define PGD_ALLOCATION_ORDER	kaiser_enabled
 
 static inline pgd_t *_pgd_alloc(void)
 {
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -39,8 +39,7 @@ static void load_new_mm_cr3(pgd_t *pgdir
 {
 	unsigned long new_mm_cr3 = __pa(pgdir);
 
-#ifdef CONFIG_KAISER
-	if (this_cpu_has(X86_FEATURE_PCID)) {
+	if (kaiser_enabled && this_cpu_has(X86_FEATURE_PCID)) {
 		/*
 		 * We reuse the same PCID for different tasks, so we must
 		 * flush all the entries for the PCID out when we change tasks.
@@ -57,7 +56,6 @@ static void load_new_mm_cr3(pgd_t *pgdir
 		new_mm_cr3 |= X86_CR3_PCID_KERN_FLUSH;
 		kaiser_flush_tlb_on_return_to_user();
 	}
-#endif /* CONFIG_KAISER */
 
 	/*
 	 * Caution: many callers of this function expect

* [PATCH 4.4 26/37] x86/kaiser: Rename and simplify X86_FEATURE_KAISER handling
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (24 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 25/37] kaiser: add "nokaiser" boot option, using ALTERNATIVE Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 27/37] x86/kaiser: Check boottime cmdline params Greg Kroah-Hartman
                   ` (20 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Borislav Petkov

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Borislav Petkov <bp@suse.de>


Concentrate it in arch/x86/mm/kaiser.c and use the upstream string "nopti".

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/kernel-parameters.txt |    2 +-
 arch/x86/kernel/cpu/common.c        |   18 ------------------
 arch/x86/mm/kaiser.c                |   20 +++++++++++++++++++-
 3 files changed, 20 insertions(+), 20 deletions(-)

--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2523,7 +2523,7 @@ bytes respectively. Such letter suffixes
 
 	nojitter	[IA-64] Disables jitter checking for ITC timers.
 
-	nokaiser	[X86-64] Disable KAISER isolation of kernel from user.
+	nopti		[X86-64] Disable KAISER isolation of kernel from user.
 
 	no-kvmclock	[X86,KVM] Disable paravirtualized KVM clock driver
 
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -178,20 +178,6 @@ static int __init x86_pcid_setup(char *s
 	return 1;
 }
 __setup("nopcid", x86_pcid_setup);
-
-static int __init x86_nokaiser_setup(char *s)
-{
-	/* nokaiser doesn't accept parameters */
-	if (s)
-		return -EINVAL;
-#ifdef CONFIG_KAISER
-	kaiser_enabled = 0;
-	setup_clear_cpu_cap(X86_FEATURE_KAISER);
-	pr_info("nokaiser: KAISER feature disabled\n");
-#endif
-	return 0;
-}
-early_param("nokaiser", x86_nokaiser_setup);
 #endif
 
 static int __init x86_noinvpcid_setup(char *s)
@@ -761,10 +747,6 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
 		c->x86_power = cpuid_edx(0x80000007);
 
 	init_scattered_cpuid_features(c);
-#ifdef CONFIG_KAISER
-	if (kaiser_enabled)
-		set_cpu_cap(c, X86_FEATURE_KAISER);
-#endif
 }
 
 static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c)
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -275,8 +275,13 @@ void __init kaiser_init(void)
 {
 	int cpu;
 
-	if (!kaiser_enabled)
+	if (!kaiser_enabled) {
+		setup_clear_cpu_cap(X86_FEATURE_KAISER);
 		return;
+	}
+
+	setup_force_cpu_cap(X86_FEATURE_KAISER);
+
 	kaiser_init_all_pgds();
 
 	for_each_possible_cpu(cpu) {
@@ -419,3 +424,16 @@ void kaiser_flush_tlb_on_return_to_user(
 			X86_CR3_PCID_USER_FLUSH | KAISER_SHADOW_PGD_OFFSET);
 }
 EXPORT_SYMBOL(kaiser_flush_tlb_on_return_to_user);
+
+static int __init x86_nokaiser_setup(char *s)
+{
+	/* nopti doesn't accept parameters */
+	if (s)
+		return -EINVAL;
+
+	kaiser_enabled = 0;
+	pr_info("Kernel/User page tables isolation: disabled\n");
+
+	return 0;
+}
+early_param("nopti", x86_nokaiser_setup);

* [PATCH 4.4 27/37] x86/kaiser: Check boottime cmdline params
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (25 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 26/37] x86/kaiser: Rename and simplify X86_FEATURE_KAISER handling Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 28/37] kaiser: use ALTERNATIVE instead of x86_cr3_pcid_noflush Greg Kroah-Hartman
                   ` (19 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Borislav Petkov

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Borislav Petkov <bp@suse.de>


AMD (and possibly other vendors) are not affected by the leak
KAISER is protecting against.

Keep the "nopti" for traditional reasons and add pti=<on|off|auto>
like upstream.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/kernel-parameters.txt |    6 +++
 arch/x86/mm/kaiser.c                |   59 +++++++++++++++++++++++++-----------
 2 files changed, 47 insertions(+), 18 deletions(-)
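
Note for readers: the precedence implemented by kaiser_check_boottime_disable()
below can be modelled in userspace roughly as follows. This is only a sketch
under stated assumptions: find_option(), find_flag() and pti_enabled() are
hypothetical names, the two finders are simplified stand-ins for the kernel's
cmdline_find_option() and cmdline_find_option_bool() (they only split on
spaces), and the vendor check is reduced to a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for cmdline_find_option(): find a space-delimited
 * "opt=value" token and copy the (possibly truncated) value into buf.
 * Returns the length of the value, or -1 if the option is absent.
 */
static int find_option(const char *cmdline, const char *opt,
		       char *buf, size_t bufsize)
{
	size_t optlen = strlen(opt);
	const char *p = cmdline;

	assert(bufsize > 0);
	while (*p) {
		while (*p == ' ')
			p++;
		const char *tok = p;
		while (*p && *p != ' ')
			p++;
		size_t toklen = (size_t)(p - tok);

		if (toklen > optlen && !strncmp(tok, opt, optlen) &&
		    tok[optlen] == '=') {
			size_t vallen = toklen - optlen - 1;
			size_t n = vallen < bufsize - 1 ? vallen : bufsize - 1;

			memcpy(buf, tok + optlen + 1, n);
			buf[n] = '\0';
			return (int)vallen;
		}
	}
	return -1;
}

/* Hypothetical stand-in for cmdline_find_option_bool(): exact token match. */
static bool find_flag(const char *cmdline, const char *opt)
{
	size_t optlen = strlen(opt);
	const char *p = cmdline;

	while (*p) {
		while (*p == ' ')
			p++;
		const char *tok = p;
		while (*p && *p != ' ')
			p++;
		if ((size_t)(p - tok) == optlen && !strncmp(tok, opt, optlen))
			return true;
	}
	return false;
}

/* Mirrors the goto structure of the patch: on / off / auto, then nopti. */
static bool pti_enabled(const char *cmdline, bool cpu_is_amd)
{
	char arg[5];

	if (find_option(cmdline, "pti", arg, sizeof(arg)) > 0) {
		if (!strncmp(arg, "on", 2))
			return true;		/* goto enable */
		if (!strncmp(arg, "off", 3))
			return false;		/* goto disable */
		if (!strncmp(arg, "auto", 4))
			return !cpu_is_amd;	/* goto skip */
	}
	if (find_flag(cmdline, "nopti"))
		return false;			/* goto disable */

	return !cpu_is_amd;			/* skip: vendor default */
}
```

In this model, as in the patch, an explicit "pti=on" wins even on AMD, and
"pti=auto" skips the "nopti" check entirely, so "pti=auto nopti" on a
non-AMD CPU leaves isolation enabled.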

--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3056,6 +3056,12 @@ bytes respectively. Such letter suffixes
 	pt.		[PARIDE]
 			See Documentation/blockdev/paride.txt.
 
+	pti=		[X86_64]
+			Control KAISER user/kernel address space isolation:
+			on - enable
+			off - disable
+			auto - default setting
+
 	pty.legacy_count=
 			[KNL] Number of legacy pty's. Overwrites compiled-in
 			default number.
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -16,6 +16,7 @@
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 #include <asm/desc.h>
+#include <asm/cmdline.h>
 
 int kaiser_enabled __read_mostly = 1;
 EXPORT_SYMBOL(kaiser_enabled);	/* for inlined TLB flush functions */
@@ -264,6 +265,43 @@ static void __init kaiser_init_all_pgds(
 	WARN_ON(__ret);							\
 } while (0)
 
+void __init kaiser_check_boottime_disable(void)
+{
+	bool enable = true;
+	char arg[5];
+	int ret;
+
+	ret = cmdline_find_option(boot_command_line, "pti", arg, sizeof(arg));
+	if (ret > 0) {
+		if (!strncmp(arg, "on", 2))
+			goto enable;
+
+		if (!strncmp(arg, "off", 3))
+			goto disable;
+
+		if (!strncmp(arg, "auto", 4))
+			goto skip;
+	}
+
+	if (cmdline_find_option_bool(boot_command_line, "nopti"))
+		goto disable;
+
+skip:
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
+		goto disable;
+
+enable:
+	if (enable)
+		setup_force_cpu_cap(X86_FEATURE_KAISER);
+
+	return;
+
+disable:
+	pr_info("Kernel/User page tables isolation: disabled\n");
+	kaiser_enabled = 0;
+	setup_clear_cpu_cap(X86_FEATURE_KAISER);
+}
+
 /*
  * If anything in here fails, we will likely die on one of the
  * first kernel->user transitions and init will die.  But, we
@@ -275,12 +313,10 @@ void __init kaiser_init(void)
 {
 	int cpu;
 
-	if (!kaiser_enabled) {
-		setup_clear_cpu_cap(X86_FEATURE_KAISER);
-		return;
-	}
+	kaiser_check_boottime_disable();
 
-	setup_force_cpu_cap(X86_FEATURE_KAISER);
+	if (!kaiser_enabled)
+		return;
 
 	kaiser_init_all_pgds();
 
@@ -424,16 +460,3 @@ void kaiser_flush_tlb_on_return_to_user(
 			X86_CR3_PCID_USER_FLUSH | KAISER_SHADOW_PGD_OFFSET);
 }
 EXPORT_SYMBOL(kaiser_flush_tlb_on_return_to_user);
-
-static int __init x86_nokaiser_setup(char *s)
-{
-	/* nopti doesn't accept parameters */
-	if (s)
-		return -EINVAL;
-
-	kaiser_enabled = 0;
-	pr_info("Kernel/User page tables isolation: disabled\n");
-
-	return 0;
-}
-early_param("nopti", x86_nokaiser_setup);

* [PATCH 4.4 28/37] kaiser: use ALTERNATIVE instead of x86_cr3_pcid_noflush
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (26 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 27/37] x86/kaiser: Check boottime cmdline params Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 29/37] kaiser: drop is_atomic arg to kaiser_pagetable_walk() Greg Kroah-Hartman
                   ` (18 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


Now that we're playing the ALTERNATIVE game, use that more efficient
method instead of user-mapping an extra page and reading an extra
cacheline each time for x86_cr3_pcid_noflush.

Neel has found that __stringify(bts $X86_CR3_PCID_NOFLUSH_BIT, %rax)
is a working substitute for the "bts $63, %rax" in these ALTERNATIVEs;
but the one line with $63 in looks clearer, so let's stick with that.

Worried about what happens with an ALTERNATIVE between the jump and
jump label in another ALTERNATIVE?  I was, but have checked the
combinations in SWITCH_KERNEL_CR3_NO_STACK at entry_SYSCALL_64,
and it does a good job.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/entry/entry_64.S     |    7 ++++---
 arch/x86/include/asm/kaiser.h |    6 +++---
 arch/x86/mm/kaiser.c          |   11 +----------
 3 files changed, 8 insertions(+), 16 deletions(-)
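
Note for readers: the effect of the patched _SWITCH_TO_KERNEL_CR3 sequence can
be sketched in plain C as below. This is an illustration, not kernel code: the
constant values are assumptions inferred from the comments in this series
("12 PCID bits", "flip bit 12", "bts $63"), the names are shortened on
purpose, and the boolean has_pcid stands in for what ALTERNATIVE patches in
at boot when X86_FEATURE_PCID is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; the real definitions are not in this patch. */
#define CR3_PCID_ASID_MASK	UINT64_C(0xfff)		/* "12 PCID bits" */
#define SHADOW_PGD_OFFSET	UINT64_C(0x1000)	/* bit 12 selects the user half */
#define CR3_NOFLUSH_BIT		63			/* the $63 in "bts $63, %rax" */

static uint64_t switch_to_kernel_cr3(uint64_t cr3, bool has_pcid)
{
	/* andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), reg */
	cr3 &= ~(CR3_PCID_ASID_MASK | SHADOW_PGD_OFFSET);

	/* ALTERNATIVE "", "bts $63, reg", X86_FEATURE_PCID */
	if (has_pcid)
		cr3 |= UINT64_C(1) << CR3_NOFLUSH_BIT;

	return cr3;
}
```

The difference from the old code is where the decision lives: the removed
"orq x86_cr3_pcid_noflush" read a variable from a user-mapped page on every
entry, whereas the ALTERNATIVE resolves the PCID question once, at patch time.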

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1056,7 +1056,8 @@ ENTRY(paranoid_entry)
 	jz	2f
 	orl	$2, %ebx
 	andq	$(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
-	orq	x86_cr3_pcid_noflush, %rax
+	/* If PCID enabled, set X86_CR3_PCID_NOFLUSH_BIT */
+	ALTERNATIVE "", "bts $63, %rax", X86_FEATURE_PCID
 	movq	%rax, %cr3
 2:
 #endif
@@ -1318,7 +1319,7 @@ ENTRY(nmi)
 	/* %rax is saved above, so OK to clobber here */
 	ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
 	/* If PCID enabled, NOFLUSH now and NOFLUSH on return */
-	orq	x86_cr3_pcid_noflush, %rax
+	ALTERNATIVE "", "bts $63, %rax", X86_FEATURE_PCID
 	pushq	%rax
 	/* mask off "user" bit of pgd address and 12 PCID bits: */
 	andq	$(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
@@ -1562,7 +1563,7 @@ end_repeat_nmi:
 	/* %rax is saved above, so OK to clobber here */
 	ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
 	/* If PCID enabled, NOFLUSH now and NOFLUSH on return */
-	orq	x86_cr3_pcid_noflush, %rax
+	ALTERNATIVE "", "bts $63, %rax", X86_FEATURE_PCID
 	pushq	%rax
 	/* mask off "user" bit of pgd address and 12 PCID bits: */
 	andq	$(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -25,7 +25,8 @@
 .macro _SWITCH_TO_KERNEL_CR3 reg
 movq %cr3, \reg
 andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), \reg
-orq  x86_cr3_pcid_noflush, \reg
+/* If PCID enabled, set X86_CR3_PCID_NOFLUSH_BIT */
+ALTERNATIVE "", "bts $63, \reg", X86_FEATURE_PCID
 movq \reg, %cr3
 .endm
 
@@ -39,7 +40,7 @@ movq \reg, %cr3
 movq %cr3, \reg
 orq  PER_CPU_VAR(x86_cr3_pcid_user), \reg
 js   9f
-/* FLUSH this time, reset to NOFLUSH for next time (if PCID enabled) */
+/* If PCID enabled, FLUSH this time, reset to NOFLUSH for next time */
 movb \regb, PER_CPU_VAR(x86_cr3_pcid_user+7)
 9:
 movq \reg, %cr3
@@ -90,7 +91,6 @@ movq PER_CPU_VAR(unsafe_stack_register_b
 */
 DECLARE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
 
-extern unsigned long x86_cr3_pcid_noflush;
 DECLARE_PER_CPU(unsigned long, x86_cr3_pcid_user);
 
 extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -32,7 +32,6 @@ DEFINE_PER_CPU_USER_MAPPED(unsigned long
  * This is also handy because systems that do not support PCIDs
  * just end up or'ing a 0 into their CR3, which does no harm.
  */
-unsigned long x86_cr3_pcid_noflush __read_mostly;
 DEFINE_PER_CPU(unsigned long, x86_cr3_pcid_user);
 
 /*
@@ -357,10 +356,6 @@ void __init kaiser_init(void)
 	kaiser_add_user_map_early(&debug_idt_table,
 				  sizeof(gate_desc) * NR_VECTORS,
 				  __PAGE_KERNEL);
-
-	kaiser_add_user_map_early(&x86_cr3_pcid_noflush,
-				  sizeof(x86_cr3_pcid_noflush),
-				  __PAGE_KERNEL);
 }
 
 /* Add a mapping to the shadow mapping, and synchronize the mappings */
@@ -434,18 +429,14 @@ pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp,
 
 void kaiser_setup_pcid(void)
 {
-	unsigned long kern_cr3 = 0;
 	unsigned long user_cr3 = KAISER_SHADOW_PGD_OFFSET;
 
-	if (this_cpu_has(X86_FEATURE_PCID)) {
-		kern_cr3 |= X86_CR3_PCID_KERN_NOFLUSH;
+	if (this_cpu_has(X86_FEATURE_PCID))
 		user_cr3 |= X86_CR3_PCID_USER_NOFLUSH;
-	}
 	/*
 	 * These variables are used by the entry/exit
 	 * code to change PCID and pgd and TLB flushing.
 	 */
-	x86_cr3_pcid_noflush = kern_cr3;
 	this_cpu_write(x86_cr3_pcid_user, user_cr3);
 }
 

* [PATCH 4.4 29/37] kaiser: drop is_atomic arg to kaiser_pagetable_walk()
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (27 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 28/37] kaiser: use ALTERNATIVE instead of x86_cr3_pcid_noflush Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 30/37] kaiser: asm/tlbflush.h handle noPGE at lower level Greg Kroah-Hartman
                   ` (17 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


I have not observed a might_sleep() warning from setup_fixmap_gdt()'s
use of kaiser_add_mapping() in our tree (why not?), but like upstream
we have not provided a way for that to pass is_atomic true down to
kaiser_pagetable_walk(), and at startup it's far from a likely source
of trouble: so just delete the walk's is_atomic arg and might_sleep().

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/mm/kaiser.c |   10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -108,19 +108,13 @@ static inline unsigned long get_pa_from_
  *
  * Returns a pointer to a PTE on success, or NULL on failure.
  */
-static pte_t *kaiser_pagetable_walk(unsigned long address, bool is_atomic)
+static pte_t *kaiser_pagetable_walk(unsigned long address)
 {
 	pmd_t *pmd;
 	pud_t *pud;
 	pgd_t *pgd = native_get_shadow_pgd(pgd_offset_k(address));
 	gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
 
-	if (is_atomic) {
-		gfp &= ~GFP_KERNEL;
-		gfp |= __GFP_HIGH | __GFP_ATOMIC;
-	} else
-		might_sleep();
-
 	if (pgd_none(*pgd)) {
 		WARN_ONCE(1, "All shadow pgds should have been populated");
 		return NULL;
@@ -195,7 +189,7 @@ static int kaiser_add_user_map(const voi
 			ret = -EIO;
 			break;
 		}
-		pte = kaiser_pagetable_walk(address, false);
+		pte = kaiser_pagetable_walk(address);
 		if (!pte) {
 			ret = -ENOMEM;
 			break;

* [PATCH 4.4 30/37] kaiser: asm/tlbflush.h handle noPGE at lower level
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (28 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 29/37] kaiser: drop is_atomic arg to kaiser_pagetable_walk() Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 31/37] kaiser: kaiser_flush_tlb_on_return_to_user() check PCID Greg Kroah-Hartman
                   ` (16 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


I found asm/tlbflush.h too twisty, and think it safer not to bypass
__native_flush_tlb_global_irq_disabled() in the kaiser_enabled case,
but instead to let it handle kaiser_enabled along with cr3: it can just
use __native_flush_tlb() for that; there is no harm in re-disabling
preemption.

(This is not the same change as Kirill and Dave have suggested for
upstream, flipping PGE in cr4: that's neat, but needs a cpu_has_pge
check; cr3 is enough for kaiser, and thought to be cheaper than cr4.)

Also delete the X86_FEATURE_INVPCID invpcid_flush_all_nonglobals()
preference from __native_flush_tlb(): unlike the invpcid_flush_all()
preference in __native_flush_tlb_global(), it's not seen in upstream
4.14, and was recently reported to be surprisingly slow.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/tlbflush.h |   27 +++------------------------
 1 file changed, 3 insertions(+), 24 deletions(-)
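
Note for readers: after this patch the choice of flush mechanism can be read
as a small decision tree. The sketch below models it in userspace C; the enum,
struct and function names are hypothetical, the functions return which
mechanism would be used instead of touching any control register, and it
assumes the PGE branch is taken when CR4.PGE is currently set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum flush_method {
	FLUSH_INVPCID_ALL,	/* invpcid_flush_all() */
	FLUSH_CR4_PGE_TOGGLE,	/* clear and restore CR4.PGE */
	FLUSH_CR3_RELOAD,	/* rewrite CR3, noting the user PCID flush */
};

struct cpu_state {
	bool has_invpcid;	/* X86_FEATURE_INVPCID */
	bool cr4_pge;		/* CR4.PGE currently set */
};

/* __native_flush_tlb(): always the CR3 path, INVPCID preference removed. */
static enum flush_method model_flush_tlb(const struct cpu_state *c)
{
	assert(c != NULL);
	return FLUSH_CR3_RELOAD;
}

/* __native_flush_tlb_global_irq_disabled(): falls back to the CR3 path. */
static enum flush_method model_flush_global_irq_disabled(const struct cpu_state *c)
{
	assert(c != NULL);
	if (c->cr4_pge)
		return FLUSH_CR4_PGE_TOGGLE;
	return model_flush_tlb(c);
}

/* __native_flush_tlb_global(): no kaiser_enabled special case any more. */
static enum flush_method model_flush_global(const struct cpu_state *c)
{
	assert(c != NULL);
	if (c->has_invpcid)
		return FLUSH_INVPCID_ALL;
	return model_flush_global_irq_disabled(c);
}

/* __flush_tlb_all(): unconditionally the global flush, no cpu_has_pge test. */
static enum flush_method model_flush_all(const struct cpu_state *c)
{
	return model_flush_global(c);
}
```

With PGE clear and no INVPCID, every entry point in this model ends up on the
CR3 reload path, which is the case the changelog describes as "cr3 is enough
for kaiser".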

--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -151,14 +151,6 @@ static inline void kaiser_flush_tlb_on_r
 
 static inline void __native_flush_tlb(void)
 {
-	if (this_cpu_has(X86_FEATURE_INVPCID)) {
-		/*
-		 * Note, this works with CR4.PCIDE=0 or 1.
-		 */
-		invpcid_flush_all_nonglobals();
-		return;
-	}
-
 	/*
 	 * If current->mm == NULL then we borrow a mm which may change during a
 	 * task switch and therefore we must not be preempted while we write CR3
@@ -182,11 +174,8 @@ static inline void __native_flush_tlb_gl
 		/* restore PGE as it was before */
 		native_write_cr4(cr4);
 	} else {
-		/*
-		 * x86_64 microcode update comes this way when CR4.PGE is not
-		 * enabled, and it's safer for all callers to allow this case.
-		 */
-		native_write_cr3(native_read_cr3());
+		/* do it with cr3, letting kaiser flush user PCID */
+		__native_flush_tlb();
 	}
 }
 
@@ -194,12 +183,6 @@ static inline void __native_flush_tlb_gl
 {
 	unsigned long flags;
 
-	if (kaiser_enabled) {
-		/* Globals are not used at all */
-		__native_flush_tlb();
-		return;
-	}
-
 	if (this_cpu_has(X86_FEATURE_INVPCID)) {
 		/*
 		 * Using INVPCID is considerably faster than a pair of writes
@@ -255,11 +238,7 @@ static inline void __native_flush_tlb_si
 
 static inline void __flush_tlb_all(void)
 {
-	if (cpu_has_pge)
-		__flush_tlb_global();
-	else
-		__flush_tlb();
-
+	__flush_tlb_global();
 	/*
 	 * Note: if we somehow had PCID but not PGE, then this wouldn't work --
 	 * we'd end up flushing kernel translations for the current ASID but

* [PATCH 4.4 31/37] kaiser: kaiser_flush_tlb_on_return_to_user() check PCID
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (29 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 30/37] kaiser: asm/tlbflush.h handle noPGE at lower level Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11   ` Greg Kroah-Hartman
                   ` (15 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>


Let kaiser_flush_tlb_on_return_to_user() do the X86_FEATURE_PCID
check, instead of each caller doing it inline first: nobody needs
to optimize for the noPCID case, it's clearer this way, and better
suits later changes.  Replace those no-op X86_CR3_PCID_KERN_FLUSH lines
by a BUILD_BUG_ON() in load_new_mm_cr3(), in case something changes.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/tlbflush.h |    4 ++--
 arch/x86/mm/kaiser.c            |    6 +++---
 arch/x86/mm/tlb.c               |    8 ++++----
 3 files changed, 9 insertions(+), 9 deletions(-)
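
Note for readers: a userspace model of the per-cpu x86_cr3_pcid_user
bookkeeping may make the FLUSH/NOFLUSH handshake easier to follow. Everything
here is a sketch under assumptions: the struct and function names are
hypothetical, USER_PCID (0x80) and the 0x1000 shadow offset are illustrative
values not shown in this patch, and the exit path's "js 9f; movb" byte trick
is simplified to a plain bit test.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR3_NOFLUSH		(UINT64_C(1) << 63)
#define SHADOW_PGD_OFFSET	UINT64_C(0x1000)	/* assumed: bit 12 */
#define USER_PCID		UINT64_C(0x80)		/* assumed, illustrative */
#define PCID_KERN_FLUSH		UINT64_C(0)		/* "now 0" per the old comment */

/* Userspace analogue of BUILD_BUG_ON(X86_CR3_PCID_KERN_FLUSH). */
_Static_assert(PCID_KERN_FLUSH == 0, "kernel FLUSH value must stay a no-op");

struct cpu_model {
	bool has_pcid;			/* X86_FEATURE_PCID */
	uint64_t cr3_pcid_user;		/* models per-cpu x86_cr3_pcid_user */
};

/* Models kaiser_setup_pcid(): start out in the NOFLUSH state if PCID exists. */
static void model_setup_pcid(struct cpu_model *c)
{
	c->cr3_pcid_user = SHADOW_PGD_OFFSET;
	if (c->has_pcid)
		c->cr3_pcid_user |= USER_PCID | CR3_NOFLUSH;
}

/* Models kaiser_flush_tlb_on_return_to_user(): the PCID check now lives here. */
static void model_flush_on_return_to_user(struct cpu_model *c)
{
	if (c->has_pcid)
		c->cr3_pcid_user = USER_PCID | SHADOW_PGD_OFFSET; /* bit 63 clear */
}

/* Models the exit path: FLUSH this time, reset to NOFLUSH for next time. */
static uint64_t model_switch_to_user_cr3(struct cpu_model *c, uint64_t kernel_cr3)
{
	uint64_t cr3 = kernel_cr3 | c->cr3_pcid_user;

	if (c->has_pcid && !(cr3 & CR3_NOFLUSH))
		c->cr3_pcid_user |= CR3_NOFLUSH;
	return cr3;
}
```

On a CPU without PCID the request function is a no-op in this model, which is
why callers no longer need their own this_cpu_has(X86_FEATURE_PCID) test.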

--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -157,7 +157,7 @@ static inline void __native_flush_tlb(vo
 	 * back:
 	 */
 	preempt_disable();
-	if (kaiser_enabled && this_cpu_has(X86_FEATURE_PCID))
+	if (kaiser_enabled)
 		kaiser_flush_tlb_on_return_to_user();
 	native_write_cr3(native_read_cr3());
 	preempt_enable();
@@ -216,7 +216,7 @@ static inline void __native_flush_tlb_si
 	 */
 
 	if (!this_cpu_has(X86_FEATURE_INVPCID_SINGLE)) {
-		if (kaiser_enabled && this_cpu_has(X86_FEATURE_PCID))
+		if (kaiser_enabled)
 			kaiser_flush_tlb_on_return_to_user();
 		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
 		return;
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -436,12 +436,12 @@ void kaiser_setup_pcid(void)
 
 /*
  * Make a note that this cpu will need to flush USER tlb on return to user.
- * Caller checks whether this_cpu_has(X86_FEATURE_PCID) before calling:
- * if cpu does not, then the NOFLUSH bit will never have been set.
+ * If cpu does not have PCID, then the NOFLUSH bit will never have been set.
  */
 void kaiser_flush_tlb_on_return_to_user(void)
 {
-	this_cpu_write(x86_cr3_pcid_user,
+	if (this_cpu_has(X86_FEATURE_PCID))
+		this_cpu_write(x86_cr3_pcid_user,
 			X86_CR3_PCID_USER_FLUSH | KAISER_SHADOW_PGD_OFFSET);
 }
 EXPORT_SYMBOL(kaiser_flush_tlb_on_return_to_user);
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -39,7 +39,7 @@ static void load_new_mm_cr3(pgd_t *pgdir
 {
 	unsigned long new_mm_cr3 = __pa(pgdir);
 
-	if (kaiser_enabled && this_cpu_has(X86_FEATURE_PCID)) {
+	if (kaiser_enabled) {
 		/*
 		 * We reuse the same PCID for different tasks, so we must
 		 * flush all the entries for the PCID out when we change tasks.
@@ -50,10 +50,10 @@ static void load_new_mm_cr3(pgd_t *pgdir
 		 * do it here, but can only be used if X86_FEATURE_INVPCID is
 		 * available - and many machines support pcid without invpcid.
 		 *
-		 * The line below is a no-op: X86_CR3_PCID_KERN_FLUSH is now 0;
-		 * but keep that line in there in case something changes.
+		 * If X86_CR3_PCID_KERN_FLUSH actually added something, then it
+		 * would be needed in the write_cr3() below - if PCIDs enabled.
 		 */
-		new_mm_cr3 |= X86_CR3_PCID_KERN_FLUSH;
+		BUILD_BUG_ON(X86_CR3_PCID_KERN_FLUSH);
 		kaiser_flush_tlb_on_return_to_user();
 	}
 

* [PATCH 4.4 32/37] x86/paravirt: Dont patch flush_tlb_single
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
@ 2018-01-03 20:11   ` Greg Kroah-Hartman
  2018-01-03 20:11   ` [kernel-hardening] " Greg Kroah-Hartman
                     ` (45 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Gleixner, Josh Poimboeuf,
	Juergen Gross, Peter Zijlstra, Andy Lutomirski, Boris Ostrovsky,
	Borislav Petkov, Borislav Petkov, Brian Gerst, Dave Hansen,
	Dave Hansen, David Laight, Denys Vlasenko, Eduardo Valentin,
	H. Peter Anvin, Linus Torvalds, Rik van Riel, Will Deacon,
	aliguori, daniel.gruss, hughd, keescook, linux-mm,
	michael.schwarz, moritz.lipp, richard.fellner, Ingo Molnar,
	Borislav Petkov

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>


commit a035795499ca1c2bd1928808d1a156eda1420383 upstream

native_flush_tlb_single() will be changed with the upcoming
PAGE_TABLE_ISOLATION feature. This requires more code in
there than just INVLPG.

Remove the paravirt patching for it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Cc: michael.schwarz@iaik.tugraz.at
Cc: moritz.lipp@iaik.tugraz.at
Cc: richard.fellner@student.tugraz.at
Link: https://lkml.kernel.org/r/20171204150606.828111617@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/paravirt_patch_64.c |    2 --
 1 file changed, 2 deletions(-)

--- a/arch/x86/kernel/paravirt_patch_64.c
+++ b/arch/x86/kernel/paravirt_patch_64.c
@@ -9,7 +9,6 @@ DEF_NATIVE(pv_irq_ops, save_fl, "pushfq;
 DEF_NATIVE(pv_mmu_ops, read_cr2, "movq %cr2, %rax");
 DEF_NATIVE(pv_mmu_ops, read_cr3, "movq %cr3, %rax");
 DEF_NATIVE(pv_mmu_ops, write_cr3, "movq %rdi, %cr3");
-DEF_NATIVE(pv_mmu_ops, flush_tlb_single, "invlpg (%rdi)");
 DEF_NATIVE(pv_cpu_ops, clts, "clts");
 DEF_NATIVE(pv_cpu_ops, wbinvd, "wbinvd");
 
@@ -62,7 +61,6 @@ unsigned native_patch(u8 type, u16 clobb
 		PATCH_SITE(pv_mmu_ops, read_cr3);
 		PATCH_SITE(pv_mmu_ops, write_cr3);
 		PATCH_SITE(pv_cpu_ops, clts);
-		PATCH_SITE(pv_mmu_ops, flush_tlb_single);
 		PATCH_SITE(pv_cpu_ops, wbinvd);
 #if defined(CONFIG_PARAVIRT_SPINLOCKS) && defined(CONFIG_QUEUED_SPINLOCKS)
 		case PARAVIRT_PATCH(pv_lock_ops.queued_spin_unlock):

* [PATCH 4.4 33/37] x86/kaiser: Reenable PARAVIRT
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (31 preceding siblings ...)
  2018-01-03 20:11   ` Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 34/37] kaiser: disabled on Xen PV Greg Kroah-Hartman
                   ` (13 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Borislav Petkov

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Borislav Petkov <bp@suse.de>


Now that the required bits have been addressed, reenable
PARAVIRT.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 security/Kconfig |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/security/Kconfig
+++ b/security/Kconfig
@@ -34,7 +34,7 @@ config SECURITY
 config KAISER
 	bool "Remove the kernel mapping in user mode"
 	default y
-	depends on X86_64 && SMP && !PARAVIRT
+	depends on X86_64 && SMP
 	help
 	  This enforces a strict kernel and user space isolation, in order
 	  to close hardware side channels on kernel address information.

* [PATCH 4.4 34/37] kaiser: disabled on Xen PV
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (32 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 33/37] x86/kaiser: Reenable PARAVIRT Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 35/37] x86/kaiser: Move feature detection up Greg Kroah-Hartman
                   ` (12 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jiri Kosina

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jiri Kosina <jkosina@suse.cz>


KAISER cannot be used with paravirtualized MMUs, where reading and writing
CR3 is itself paravirtualized: the CR3 switch to and from the user space PGD
would require mapping the whole XEN_PV machinery into both PGDs.

More importantly, enabling KAISER on Xen PV doesn't make too much sense, as PV
guests use distinct %cr3 values for kernel and user already.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/mm/kaiser.c |    5 +++++
 1 file changed, 5 insertions(+)

--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -264,6 +264,9 @@ void __init kaiser_check_boottime_disabl
 	char arg[5];
 	int ret;
 
+	if (boot_cpu_has(X86_FEATURE_XENPV))
+		goto silent_disable;
+
 	ret = cmdline_find_option(boot_command_line, "pti", arg, sizeof(arg));
 	if (ret > 0) {
 		if (!strncmp(arg, "on", 2))
@@ -291,6 +294,8 @@ enable:
 
 disable:
 	pr_info("Kernel/User page tables isolation: disabled\n");
+
+silent_disable:
 	kaiser_enabled = 0;
 	setup_clear_cpu_cap(X86_FEATURE_KAISER);
 }
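
Annotation (not part of the patch): the fall-through label arrangement above
can be modelled in userspace as a minimal sketch. Everything here is
hypothetical scaffolding: `is_xen_pv` stands in for
boot_cpu_has(X86_FEATURE_XENPV), `pti_arg` stands in for the value returned by
cmdline_find_option(), and the "off" branch is an assumption, since the hunk
only shows the "on" comparison. The point is that `silent_disable` sits below
the pr_info() call, so Xen PV guests skip the log line but still clear the
flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct pti_decision {
	bool enabled;	/* models kaiser_enabled after the check */
	bool logged;	/* true if the "disabled" message would be printed */
};

/*
 * Hypothetical userspace model of the control flow only; it is not the
 * kernel function and ignores every other condition the real code checks.
 */
struct pti_decision model_check_boottime_disable(bool is_xen_pv,
						 const char *pti_arg)
{
	struct pti_decision d = { .enabled = true, .logged = false };

	if (is_xen_pv)
		goto silent_disable;	/* skip the log message entirely */

	/* Assumed handling of an explicit "off" request. */
	if (pti_arg && !strncmp(pti_arg, "off", 3))
		goto disable;

	return d;

disable:
	d.logged = true;		/* models the pr_info() call */
	/* fall through */
silent_disable:
	d.enabled = false;
	return d;
}
```

In this model a Xen PV boot yields enabled == false with logged == false,
while an explicit "off" yields enabled == false with logged == true.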

^ permalink raw reply	[flat|nested] 156+ messages in thread

* [PATCH 4.4 35/37] x86/kaiser: Move feature detection up
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (33 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 34/37] kaiser: disabled on Xen PV Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 36/37] KPTI: Rename to PAGE_TABLE_ISOLATION Greg Kroah-Hartman
                   ` (11 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Borislav Petkov

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Borislav Petkov <bp@suse.de>


... before the first use of kaiser_enabled as otherwise funky
things happen:

  about to get started...
  (XEN) d0v0 Unhandled page fault fault/trap [#14, ec=0000]
  (XEN) Pagetable walk from ffff88022a449090:
  (XEN)  L4[0x110] = 0000000229e0e067 0000000000001e0e
  (XEN)  L3[0x008] = 0000000000000000 ffffffffffffffff
  (XEN) domain_crash_sync called from entry.S: fault at ffff82d08033fd08
  entry.o#create_bounce_frame+0x135/0x14d
  (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
  (XEN) ----[ Xen-4.9.1_02-3.21  x86_64  debug=n   Not tainted ]----
  (XEN) CPU:    0
  (XEN) RIP:    e033:[<ffffffff81007460>]
  (XEN) RFLAGS: 0000000000000286   EM: 1   CONTEXT: pv guest (d0v0)

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/kaiser.h |    2 ++
 arch/x86/kernel/setup.c       |    7 +++++++
 arch/x86/mm/kaiser.c          |    2 --
 3 files changed, 9 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -96,8 +96,10 @@ DECLARE_PER_CPU(unsigned long, x86_cr3_p
 extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
 
 extern int kaiser_enabled;
+extern void __init kaiser_check_boottime_disable(void);
 #else
 #define kaiser_enabled	0
+static inline void __init kaiser_check_boottime_disable(void) {}
 #endif /* CONFIG_KAISER */
 
 /*
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -112,6 +112,7 @@
 #include <asm/alternative.h>
 #include <asm/prom.h>
 #include <asm/microcode.h>
+#include <asm/kaiser.h>
 
 /*
  * max_low_pfn_mapped: highest direct mapped pfn under 4GB
@@ -1016,6 +1017,12 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	init_hypervisor_platform();
 
+	/*
+	 * This needs to happen right after XENPV is set on xen and
+	 * kaiser_enabled is checked below in cleanup_highmap().
+	 */
+	kaiser_check_boottime_disable();
+
 	x86_init.resources.probe_roms();
 
 	/* after parse_early_param, so could debug it */
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -311,8 +311,6 @@ void __init kaiser_init(void)
 {
 	int cpu;
 
-	kaiser_check_boottime_disable();
-
 	if (!kaiser_enabled)
 		return;
 
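
Annotation (not part of the patch): the ordering problem described in the
changelog can be illustrated with a small userspace sketch. All names below
are hypothetical stand-ins: `isolation_enabled` models kaiser_enabled,
`check_boottime_disable()` models kaiser_check_boottime_disable(), and
`early_consumer()` models any early reader of the flag such as
cleanup_highmap(). It only shows why detection has to run before the first
reader; it does not reproduce the Xen crash.

```c
#include <stdbool.h>

int isolation_enabled = 1;	/* models kaiser_enabled, on by default */

/* Models the boot-time check: clears the flag on a PV platform. */
void check_boottime_disable(bool is_xen_pv)
{
	if (is_xen_pv)
		isolation_enabled = 0;
}

/* Models an early consumer: reports the flag value it observed. */
int early_consumer(void)
{
	return isolation_enabled;
}

/* Returns what the early consumer saw for a given call order. */
int boot(bool is_xen_pv, bool detect_first)
{
	int seen;

	isolation_enabled = 1;		/* reset to the compiled-in default */
	if (detect_first) {
		check_boottime_disable(is_xen_pv);
		seen = early_consumer();
	} else {
		seen = early_consumer();	/* reads a stale default */
		check_boottime_disable(is_xen_pv);
	}
	return seen;
}
```

With detection first, a PV boot makes the consumer see 0; with the old order
it still sees the default of 1 and acts as if isolation were active.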

^ permalink raw reply	[flat|nested] 156+ messages in thread

* [PATCH 4.4 36/37] KPTI: Rename to PAGE_TABLE_ISOLATION
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (34 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 35/37] x86/kaiser: Move feature detection up Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 20:11 ` [PATCH 4.4 37/37] KPTI: Report when enabled Greg Kroah-Hartman
                   ` (10 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Kees Cook

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Kees Cook <keescook@chromium.org>

This renames CONFIG_KAISER to CONFIG_PAGE_TABLE_ISOLATION.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/boot/compressed/misc.h           |    2 +-
 arch/x86/entry/entry_64.S                 |   12 ++++++------
 arch/x86/include/asm/cpufeature.h         |    2 +-
 arch/x86/include/asm/kaiser.h             |   12 ++++++------
 arch/x86/include/asm/pgtable.h            |    4 ++--
 arch/x86/include/asm/pgtable_64.h         |    4 ++--
 arch/x86/include/asm/pgtable_types.h      |    2 +-
 arch/x86/include/asm/tlbflush.h           |    2 +-
 arch/x86/kernel/cpu/perf_event_intel_ds.c |    4 ++--
 arch/x86/kernel/head_64.S                 |    2 +-
 arch/x86/mm/Makefile                      |    2 +-
 include/linux/kaiser.h                    |    6 +++---
 include/linux/percpu-defs.h               |    2 +-
 security/Kconfig                          |    2 +-
 14 files changed, 29 insertions(+), 29 deletions(-)

--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -9,7 +9,7 @@
  */
 #undef CONFIG_PARAVIRT
 #undef CONFIG_PARAVIRT_SPINLOCKS
-#undef CONFIG_KAISER
+#undef CONFIG_PAGE_TABLE_ISOLATION
 #undef CONFIG_KASAN
 
 #include <linux/linkage.h>
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1043,7 +1043,7 @@ ENTRY(paranoid_entry)
 	SWAPGS
 	xorl	%ebx, %ebx
 1:
-#ifdef CONFIG_KAISER
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
 	/*
 	 * We might have come in between a swapgs and a SWITCH_KERNEL_CR3
 	 * on entry, or between a SWITCH_USER_CR3 and a swapgs on exit.
@@ -1083,7 +1083,7 @@ ENTRY(paranoid_exit)
 	DISABLE_INTERRUPTS(CLBR_NONE)
 	TRACE_IRQS_OFF_DEBUG
 	TRACE_IRQS_IRETQ_DEBUG
-#ifdef CONFIG_KAISER
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
 	/* No ALTERNATIVE for X86_FEATURE_KAISER: paranoid_entry sets %ebx */
 	testl	$2, %ebx			/* SWITCH_USER_CR3 needed? */
 	jz	paranoid_exit_no_switch
@@ -1314,7 +1314,7 @@ ENTRY(nmi)
 
 	movq	%rsp, %rdi
 	movq	$-1, %rsi
-#ifdef CONFIG_KAISER
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
 	/* Unconditionally use kernel CR3 for do_nmi() */
 	/* %rax is saved above, so OK to clobber here */
 	ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
@@ -1328,7 +1328,7 @@ ENTRY(nmi)
 #endif
 	call	do_nmi
 
-#ifdef CONFIG_KAISER
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
 	/*
 	 * Unconditionally restore CR3.  I know we return to
 	 * kernel code that needs user CR3, but do we ever return
@@ -1558,7 +1558,7 @@ end_repeat_nmi:
 1:
 	movq	%rsp, %rdi
 	movq	$-1, %rsi
-#ifdef CONFIG_KAISER
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
 	/* Unconditionally use kernel CR3 for do_nmi() */
 	/* %rax is saved above, so OK to clobber here */
 	ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
@@ -1574,7 +1574,7 @@ end_repeat_nmi:
 	/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
 	call	do_nmi
 
-#ifdef CONFIG_KAISER
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
 	/*
 	 * Unconditionally restore CR3.  We might be returning to
 	 * kernel code that needs user CR3, like just just before
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -201,7 +201,7 @@
 #define X86_FEATURE_INTEL_PT	( 7*32+15) /* Intel Processor Trace */
 
 /* Because the ALTERNATIVE scheme is for members of the X86_FEATURE club... */
-#define X86_FEATURE_KAISER	( 7*32+31) /* CONFIG_KAISER w/o nokaiser */
+#define X86_FEATURE_KAISER	( 7*32+31) /* CONFIG_PAGE_TABLE_ISOLATION w/o nokaiser */
 
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW  ( 8*32+ 0) /* Intel TPR Shadow */
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -20,7 +20,7 @@
 #define KAISER_SHADOW_PGD_OFFSET 0x1000
 
 #ifdef __ASSEMBLY__
-#ifdef CONFIG_KAISER
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
 
 .macro _SWITCH_TO_KERNEL_CR3 reg
 movq %cr3, \reg
@@ -69,7 +69,7 @@ movq PER_CPU_VAR(unsafe_stack_register_b
 8:
 .endm
 
-#else /* CONFIG_KAISER */
+#else /* CONFIG_PAGE_TABLE_ISOLATION */
 
 .macro SWITCH_KERNEL_CR3
 .endm
@@ -78,11 +78,11 @@ movq PER_CPU_VAR(unsafe_stack_register_b
 .macro SWITCH_KERNEL_CR3_NO_STACK
 .endm
 
-#endif /* CONFIG_KAISER */
+#endif /* CONFIG_PAGE_TABLE_ISOLATION */
 
 #else /* __ASSEMBLY__ */
 
-#ifdef CONFIG_KAISER
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
 /*
  * Upon kernel/user mode switch, it may happen that the address
  * space has to be switched before the registers have been
@@ -100,10 +100,10 @@ extern void __init kaiser_check_boottime
 #else
 #define kaiser_enabled	0
 static inline void __init kaiser_check_boottime_disable(void) {}
-#endif /* CONFIG_KAISER */
+#endif /* CONFIG_PAGE_TABLE_ISOLATION */
 
 /*
- * Kaiser function prototypes are needed even when CONFIG_KAISER is not set,
+ * Kaiser function prototypes are needed even when CONFIG_PAGE_TABLE_ISOLATION is not set,
  * so as to build with tests on kaiser_enabled instead of #ifdefs.
  */
 
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -18,7 +18,7 @@
 #ifndef __ASSEMBLY__
 #include <asm/x86_init.h>
 
-#ifdef CONFIG_KAISER
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
 extern int kaiser_enabled;
 #else
 #define kaiser_enabled 0
@@ -872,7 +872,7 @@ static inline void pmdp_set_wrprotect(st
 static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
 {
 	memcpy(dst, src, count * sizeof(pgd_t));
-#ifdef CONFIG_KAISER
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
 	if (kaiser_enabled) {
 		/* Clone the shadow pgd part as well */
 		memcpy(native_get_shadow_pgd(dst),
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -106,7 +106,7 @@ static inline void native_pud_clear(pud_
 	native_set_pud(pud, native_make_pud(0));
 }
 
-#ifdef CONFIG_KAISER
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
 extern pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd);
 
 static inline pgd_t *native_get_shadow_pgd(pgd_t *pgdp)
@@ -127,7 +127,7 @@ static inline pgd_t *native_get_shadow_p
 	BUILD_BUG_ON(1);
 	return NULL;
 }
-#endif /* CONFIG_KAISER */
+#endif /* CONFIG_PAGE_TABLE_ISOLATION */
 
 static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
 {
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -109,7 +109,7 @@
 #define X86_CR3_PCID_MASK       (X86_CR3_PCID_NOFLUSH | X86_CR3_PCID_ASID_MASK)
 #define X86_CR3_PCID_ASID_KERN  (_AC(0x0,UL))
 
-#if defined(CONFIG_KAISER) && defined(CONFIG_X86_64)
+#if defined(CONFIG_PAGE_TABLE_ISOLATION) && defined(CONFIG_X86_64)
 /* Let X86_CR3_PCID_ASID_USER be usable for the X86_CR3_PCID_NOFLUSH bit */
 #define X86_CR3_PCID_ASID_USER	(_AC(0x80,UL))
 
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -135,7 +135,7 @@ static inline void cr4_set_bits_and_upda
  * Declare a couple of kaiser interfaces here for convenience,
  * to avoid the need for asm/kaiser.h in unexpected places.
  */
-#ifdef CONFIG_KAISER
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
 extern int kaiser_enabled;
 extern void kaiser_setup_pcid(void);
 extern void kaiser_flush_tlb_on_return_to_user(void);
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -274,7 +274,7 @@ static DEFINE_PER_CPU(void *, insn_buffe
 
 static void *dsalloc(size_t size, gfp_t flags, int node)
 {
-#ifdef CONFIG_KAISER
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
 	unsigned int order = get_order(size);
 	struct page *page;
 	unsigned long addr;
@@ -295,7 +295,7 @@ static void *dsalloc(size_t size, gfp_t
 
 static void dsfree(const void *buffer, size_t size)
 {
-#ifdef CONFIG_KAISER
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
 	if (!buffer)
 		return;
 	kaiser_remove_mapping((unsigned long)buffer, size);
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -441,7 +441,7 @@ early_idt_ripmsg:
 	.balign	PAGE_SIZE; \
 GLOBAL(name)
 
-#ifdef CONFIG_KAISER
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
 /*
  * Each PGD needs to be 8k long and 8k aligned.  We do not
  * ever go out to userspace with these, so we do not
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -32,4 +32,4 @@ obj-$(CONFIG_ACPI_NUMA)		+= srat.o
 obj-$(CONFIG_NUMA_EMU)		+= numa_emulation.o
 
 obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
-obj-$(CONFIG_KAISER)		+= kaiser.o
+obj-$(CONFIG_PAGE_TABLE_ISOLATION)		+= kaiser.o
--- a/include/linux/kaiser.h
+++ b/include/linux/kaiser.h
@@ -1,7 +1,7 @@
 #ifndef _LINUX_KAISER_H
 #define _LINUX_KAISER_H
 
-#ifdef CONFIG_KAISER
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
 #include <asm/kaiser.h>
 
 static inline int kaiser_map_thread_stack(void *stack)
@@ -24,7 +24,7 @@ static inline void kaiser_unmap_thread_s
 #else
 
 /*
- * These stubs are used whenever CONFIG_KAISER is off, which
+ * These stubs are used whenever CONFIG_PAGE_TABLE_ISOLATION is off, which
  * includes architectures that support KAISER, but have it disabled.
  */
 
@@ -48,5 +48,5 @@ static inline void kaiser_unmap_thread_s
 {
 }
 
-#endif /* !CONFIG_KAISER */
+#endif /* !CONFIG_PAGE_TABLE_ISOLATION */
 #endif /* _LINUX_KAISER_H */
--- a/include/linux/percpu-defs.h
+++ b/include/linux/percpu-defs.h
@@ -35,7 +35,7 @@
 
 #endif
 
-#ifdef CONFIG_KAISER
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
 #define USER_MAPPED_SECTION "..user_mapped"
 #else
 #define USER_MAPPED_SECTION ""
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -31,7 +31,7 @@ config SECURITY
 
 	  If you are unsure how to answer this question, answer N.
 
-config KAISER
+config PAGE_TABLE_ISOLATION
 	bool "Remove the kernel mapping in user mode"
 	default y
 	depends on X86_64 && SMP

^ permalink raw reply	[flat|nested] 156+ messages in thread

* [PATCH 4.4 37/37] KPTI: Report when enabled
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (35 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 36/37] KPTI: Rename to PAGE_TABLE_ISOLATION Greg Kroah-Hartman
@ 2018-01-03 20:11 ` Greg Kroah-Hartman
  2018-01-03 22:08 ` [PATCH 4.4 00/37] 4.4.110-stable review Nathan Chancellor
                   ` (9 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-03 20:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Kees Cook

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Kees Cook <keescook@chromium.org>

Make sure dmesg reports when KPTI is enabled.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/mm/kaiser.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -11,6 +11,9 @@
 #include <linux/uaccess.h>
 #include <linux/ftrace.h>
 
+#undef pr_fmt
+#define pr_fmt(fmt)     "Kernel/User page tables isolation: " fmt
+
 #include <asm/kaiser.h>
 #include <asm/tlbflush.h>	/* to verify its kaiser declarations */
 #include <asm/pgtable.h>
@@ -293,7 +296,7 @@ enable:
 	return;
 
 disable:
-	pr_info("Kernel/User page tables isolation: disabled\n");
+	pr_info("disabled\n");
 
 silent_disable:
 	kaiser_enabled = 0;
@@ -353,6 +356,8 @@ void __init kaiser_init(void)
 	kaiser_add_user_map_early(&debug_idt_table,
 				  sizeof(gate_desc) * NR_VECTORS,
 				  __PAGE_KERNEL);
+
+	pr_info("enabled\n");
 }
 
 /* Add a mapping to the shadow mapping, and synchronize the mappings */
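
Annotation (not part of the patch): the pr_fmt() change relies on adjacent
string literals being concatenated at compile time, so every message in the
file picks up the same prefix. A minimal userspace sketch follows;
`model_pr_info()` is a hypothetical stand-in that formats into a caller
supplied buffer instead of the kernel log, and it only handles messages
without arguments, which is all this patch uses.

```c
#include <stdio.h>

/* Per-file prefix; must be defined before the logging macro is used. */
#undef pr_fmt
#define pr_fmt(fmt)	"Kernel/User page tables isolation: " fmt

/*
 * Hypothetical stand-in for pr_info(): the literal passed as msg is pasted
 * after the prefix by the compiler, then written into buf.
 */
#define model_pr_info(buf, size, msg) \
	snprintf((buf), (size), pr_fmt(msg))
```

Calling model_pr_info(buf, sizeof(buf), "enabled\n") leaves the prefixed
string in buf, which matches the shortened pr_info("disabled\n") call site in
the hunk above.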

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (36 preceding siblings ...)
  2018-01-03 20:11 ` [PATCH 4.4 37/37] KPTI: Report when enabled Greg Kroah-Hartman
@ 2018-01-03 22:08 ` Nathan Chancellor
  2018-01-04  8:10   ` Greg Kroah-Hartman
  2018-01-04  6:50 ` Naresh Kamboju
                   ` (8 subsequent siblings)
  46 siblings, 1 reply; 156+ messages in thread
From: Nathan Chancellor @ 2018-01-03 22:08 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, linux, shuahkh, patches,
	ben.hutchings, lkft-triage, stable

On Wed, Jan 03, 2018 at 09:11:06PM +0100, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.110 release.
> There are 37 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Fri Jan  5 19:50:38 UTC 2018.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
> 	kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.110-rc1.gz
> or in the git tree and branch at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 
> -------------
> Pseudo-Shortlog of commits:
> 
> Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>     Linux 4.4.110-rc1
> 
> Kees Cook <keescook@chromium.org>
>     KPTI: Report when enabled
> 
> Kees Cook <keescook@chromium.org>
>     KPTI: Rename to PAGE_TABLE_ISOLATION
> 
> Borislav Petkov <bp@suse.de>
>     x86/kaiser: Move feature detection up
> 
> Jiri Kosina <jkosina@suse.cz>
>     kaiser: disabled on Xen PV
> 
> Borislav Petkov <bp@suse.de>
>     x86/kaiser: Reenable PARAVIRT
> 
> Thomas Gleixner <tglx@linutronix.de>
>     x86/paravirt: Dont patch flush_tlb_single
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: kaiser_flush_tlb_on_return_to_user() check PCID
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: asm/tlbflush.h handle noPGE at lower level
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: drop is_atomic arg to kaiser_pagetable_walk()
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: use ALTERNATIVE instead of x86_cr3_pcid_noflush
> 
> Borislav Petkov <bp@suse.de>
>     x86/kaiser: Check boottime cmdline params
> 
> Borislav Petkov <bp@suse.de>
>     x86/kaiser: Rename and simplify X86_FEATURE_KAISER handling
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: add "nokaiser" boot option, using ALTERNATIVE
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: fix unlikely error in alloc_ldt_struct()
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: _pgd_alloc() without __GFP_REPEAT to avoid stalls
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: paranoid_entry pass cr3 need to paranoid_exit
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: x86_cr3_pcid_noflush and x86_cr3_pcid_user
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: PCID 0 for kernel and 128 for user
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: load_new_mm_cr3() let SWITCH_USER_CR3 flush user
> 
> Dave Hansen <dave.hansen@linux.intel.com>
>     kaiser: enhanced by kernel and user PCIDs
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: vmstat show NR_KAISERTABLE as nr_overhead
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: delete KAISER_REAL_SWITCH option
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: name that 0x1000 KAISER_SHADOW_PGD_OFFSET
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: cleanups while trying for gold link
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: kaiser_remove_mapping() move along the pgd
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: tidied up kaiser_add/remove_mapping slightly
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: tidied up asm/kaiser.h somewhat
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: ENOMEM if kaiser_pagetable_walk() NULL
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: fix perf crashes
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: fix regs to do_nmi() ifndef CONFIG_KAISER
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: KAISER depends on SMP
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: fix build and FIXME in alloc_ldt_struct()
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE
> 
> Hugh Dickins <hughd@google.com>
>     kaiser: do not set _PAGE_NX on pgd_none
> 
> Dave Hansen <dave.hansen@linux.intel.com>
>     kaiser: merged update
> 
> Richard Fellner <richard.fellner@student.tugraz.at>
>     KAISER: Kernel Address Isolation
> 
> Tom Lendacky <thomas.lendacky@amd.com>
>     x86/boot: Add early cmdline parsing for options with arguments
> 
> 
> -------------
> 
> Diffstat:
> 
>  Documentation/kernel-parameters.txt         |   8 +
>  Makefile                                    |   4 +-
>  arch/x86/boot/compressed/misc.h             |   1 +
>  arch/x86/entry/entry_64.S                   | 164 ++++++++--
>  arch/x86/entry/entry_64_compat.S            |   7 +
>  arch/x86/include/asm/cmdline.h              |   2 +
>  arch/x86/include/asm/cpufeature.h           |   4 +
>  arch/x86/include/asm/desc.h                 |   2 +-
>  arch/x86/include/asm/hw_irq.h               |   2 +-
>  arch/x86/include/asm/kaiser.h               | 141 +++++++++
>  arch/x86/include/asm/pgtable.h              |  28 +-
>  arch/x86/include/asm/pgtable_64.h           |  25 +-
>  arch/x86/include/asm/pgtable_types.h        |  29 +-
>  arch/x86/include/asm/processor.h            |   2 +-
>  arch/x86/include/asm/tlbflush.h             |  74 ++++-
>  arch/x86/include/uapi/asm/processor-flags.h |   3 +-
>  arch/x86/kernel/cpu/common.c                |  28 +-
>  arch/x86/kernel/cpu/perf_event_intel_ds.c   |  57 +++-
>  arch/x86/kernel/espfix_64.c                 |  10 +
>  arch/x86/kernel/head_64.S                   |  35 ++-
>  arch/x86/kernel/irqinit.c                   |   2 +-
>  arch/x86/kernel/ldt.c                       |  25 +-
>  arch/x86/kernel/paravirt_patch_64.c         |   2 -
>  arch/x86/kernel/process.c                   |   2 +-
>  arch/x86/kernel/setup.c                     |   7 +
>  arch/x86/kernel/tracepoint.c                |   2 +
>  arch/x86/kvm/x86.c                          |   3 +-
>  arch/x86/lib/cmdline.c                      | 105 +++++++
>  arch/x86/mm/Makefile                        |   1 +
>  arch/x86/mm/init.c                          |   2 +-
>  arch/x86/mm/init_64.c                       |  10 +
>  arch/x86/mm/kaiser.c                        | 455 ++++++++++++++++++++++++++++
>  arch/x86/mm/pageattr.c                      |  63 +++-
>  arch/x86/mm/pgtable.c                       |  16 +-
>  arch/x86/mm/tlb.c                           |  39 ++-
>  include/asm-generic/vmlinux.lds.h           |   7 +
>  include/linux/kaiser.h                      |  52 ++++
>  include/linux/mmzone.h                      |   3 +-
>  include/linux/percpu-defs.h                 |  32 +-
>  init/main.c                                 |   2 +
>  kernel/fork.c                               |   6 +
>  mm/vmstat.c                                 |   1 +
>  security/Kconfig                            |  10 +
>  43 files changed, 1375 insertions(+), 98 deletions(-)
> 
>

Not that my feedback will matter much on this release since Pixel 2 XL
is an arm64 device but merged, compiled, and flashed successfully.

The changes to kernel/fork.c had to be slightly adjusted for Google's
tree due to their addition of mainline commit b235beea9e99 ("Clarify
naming of thread info/stack allocators").

No noticeable issues in general use or dmesg.

Thanks!
Nathan

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (37 preceding siblings ...)
  2018-01-03 22:08 ` [PATCH 4.4 00/37] 4.4.110-stable review Nathan Chancellor
@ 2018-01-04  6:50 ` Naresh Kamboju
  2018-01-04  9:27 ` kernelci.org bot
                   ` (7 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Naresh Kamboju @ 2018-01-04  6:50 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, linux-stable

On 4 January 2018 at 01:41, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> This is the start of the stable review cycle for the 4.4.110 release.
> There are 37 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Jan  5 19:50:38 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
>         kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.110-rc1.gz
> or in the git tree and branch at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Results from Linaro’s test farm.
No regressions on arm64, arm and x86_64.

NOTE:
The single ltp-syscalls failure on x15 was retested for 20 iterations on
each of two devices, and the test completed successfully all 40 times,
which indicates an intermittent timing failure.
The bug has been reported for internal investigation and tracking:
LKFT:stable-rc 4.4.110-rc1: x15: LTP poll02 FAIL: poll() slept for too
long (intermittent)
https://bugs.linaro.org/show_bug.cgi?id=3566

Summary
------------------------------------------------------------------------

kernel: 4.4.110-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.4.y
git commit: 99abd6cdd65e984d89c8565508a7a96ea0fce179
git describe: v4.4.109-38-g99abd6cdd65e
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.109-38-g99abd6cdd65e

No Regressions (compared to build v4.4.109)
------------------------------------------------------------------------


Boards, architectures and test suites:
-------------------------------------

juno-r2 - arm64
* boot - pass: 20,
* kselftest - pass: 32, skip: 29
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 28, skip: 36
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 60,
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 22,
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 14,
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 984, skip: 124
* ltp-timers-tests - pass: 12,

x15 - arm
* boot - pass: 20,
* kselftest - pass: 31, skip: 29
* libhugetlbfs - pass: 87, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64,
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 60,
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 20, skip: 2
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 13, skip: 1
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - fail: 1, pass: 1034, skip: 67
* ltp-timers-tests - pass: 12,

x86_64
* boot - pass: 20,
* kselftest - pass: 44, skip: 32
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64,
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 61, skip: 1
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 22,
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 9, skip: 1
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1013, skip: 117
* ltp-timers-tests - pass: 12,

HiKey board test results:

Summary
------------------------------------------------------------------------

kernel: 4.4.110-rc1
git repo: https://git.linaro.org/lkft/arm64-stable-rc.git
git tag: 4.4.110-rc1-hikey-20180103-95
git commit: 0769c4b4aafd63e5d73b6d67f6fe93abcff67cdc
git describe: 4.4.110-rc1-hikey-20180103-95
Test details: https://qa-reports.linaro.org/lkft/linaro-hikey-stable-rc-4.4-oe/build/4.4.110-rc1-hikey-20180103-95


No regressions (compared to build 4.4.110-rc1-hikey-20180103-94)

Boards, architectures and test suites:
-------------------------------------

hi6220-hikey - arm64
* boot - pass: 20,
* kselftest - pass: 30, skip: 31
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 28, skip: 36
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 60,
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 21, skip: 1
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 14,
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 980, skip: 124
* ltp-timers-tests - pass: 12,

Documentation - https://collaborate.linaro.org/display/LKFT/Email+Reports
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-03 22:08 ` [PATCH 4.4 00/37] 4.4.110-stable review Nathan Chancellor
@ 2018-01-04  8:10   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-04  8:10 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: linux-kernel, torvalds, akpm, linux, shuahkh, patches,
	ben.hutchings, lkft-triage, stable

On Wed, Jan 03, 2018 at 03:08:09PM -0700, Nathan Chancellor wrote:
> Not that my feedback will matter much on this release since Pixel 2 XL
> is an arm64 device but merged, compiled, and flashed successfully.

Hey, it's good to know I didn't break anything :)

> The changes to kernel/fork.c had to be slightly adjusted for Google's
> tree due to their addition of mainline commit b235beea9e99 ("Clarify
> naming of thread info/stack allocators").

Ah, good to know, I'll watch out for that when I do the android-common
tree merges.

Thanks again for testing and letting me know.

greg k-h

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (38 preceding siblings ...)
  2018-01-04  6:50 ` Naresh Kamboju
@ 2018-01-04  9:27 ` kernelci.org bot
  2018-01-05  0:06   ` Kevin Hilman
  2018-01-04 16:38 ` Pavel Tatashin
                   ` (6 subsequent siblings)
  46 siblings, 1 reply; 156+ messages in thread
From: kernelci.org bot @ 2018-01-04  9:27 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: Greg Kroah-Hartman, torvalds, akpm, linux, shuahkh, patches,
	ben.hutchings, lkft-triage, stable

stable-rc/linux-4.4.y boot: 100 boots: 4 failed, 93 passed with 1 offline, 2 conflicts (v4.4.109-38-g99abd6cdd65e)

Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.4.y/kernel/v4.4.109-38-g99abd6cdd65e/
Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.4.y/kernel/v4.4.109-38-g99abd6cdd65e/
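
The counts in the summary line above can be cross-checked mechanically. The
following is only a sketch: the regular expression and the function name
parse_boot_summary are assumptions made for illustration, not part of the
kernelci.org tooling, and the bot's line format is inferred from this one
report.

```python
import re

# Summary line copied from the report above.
summary = (
    "stable-rc/linux-4.4.y boot: 100 boots: 4 failed, 93 passed "
    "with 1 offline, 2 conflicts (v4.4.109-38-g99abd6cdd65e)"
)

# Hypothetical pattern, inferred from this single report.
PATTERN = re.compile(
    r"(?P<total>\d+) boots: (?P<failed>\d+) failed, (?P<passed>\d+) passed"
    r" with (?P<offline>\d+) offline, (?P<conflicts>\d+) conflicts"
)


def parse_boot_summary(line):
    """Return the five counters of a boot summary line as integers."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("unrecognised summary line")
    return {name: int(value) for name, value in match.groupdict().items()}


counts = parse_boot_summary(summary)
# Every boot should fall into exactly one of the four outcome buckets.
accounted = (
    counts["failed"] + counts["passed"] + counts["offline"] + counts["conflicts"]
)
print(counts["total"], accounted)
```

For this report the four buckets add up to the stated total, so no boot is
unaccounted for.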

Tree: stable-rc
Branch: linux-4.4.y
Git Describe: v4.4.109-38-g99abd6cdd65e
Git Commit: 99abd6cdd65e984d89c8565508a7a96ea0fce179
Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Tested: 53 unique boards, 19 SoC families, 16 builds out of 178
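
The "Git Describe" string encodes the nearest tag, the number of commits on
top of it, and an abbreviated commit id. A minimal sketch of splitting it
apart, assuming the usual <tag>-<count>-g<abbrev> layout; the helper name
parse_describe is hypothetical:

```python
import re

# Values copied from the report above.
describe = "v4.4.109-38-g99abd6cdd65e"
commit = "99abd6cdd65e984d89c8565508a7a96ea0fce179"


def parse_describe(text):
    """Split '<tag>-<count>-g<abbrev>' into (tag, count, abbrev)."""
    match = re.fullmatch(
        r"(?P<tag>.+)-(?P<count>\d+)-g(?P<abbrev>[0-9a-f]+)", text
    )
    if match is None:
        raise ValueError("not in <tag>-<count>-g<abbrev> form")
    return match["tag"], int(match["count"]), match["abbrev"]


tag, count, abbrev = parse_describe(describe)
# The abbreviated id should be a prefix of the full commit id.
matches_commit = commit.startswith(abbrev)
print(tag, count, abbrev, matches_commit)
```

A count of 38 would be consistent with the 37 patches of the series plus the
"Linux 4.4.110-rc1" commit listed in the shortlog, although that reading is
an inference and is not stated by the report itself.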

Boot Regressions Detected:

arm:

    exynos_defconfig:
        exynos5422-odroidxu3:
            lab-collabora: failing since 58 days (last pass: v4.4.95-21-g32458fcb7bd6 - first fail: v4.4.96-41-g336421367b9c)

    multi_v7_defconfig:
        armada-xp-linksys-mamba:
            lab-free-electrons: new failure (last pass: v4.4.109-36-g8b381424010c)
        tegra124-nyan-big:
            lab-collabora: failing since 1 day (last pass: v4.4.109 - first fail: v4.4.109-36-g8b381424010c)

    tegra_defconfig:
        tegra124-nyan-big:
            lab-collabora: failing since 1 day (last pass: v4.4.108-65-g57856049c0f8 - first fail: v4.4.109)

Boot Failures Detected:

arm:

    multi_v7_defconfig
        armada-xp-linksys-mamba: 1 failed lab
        tegra124-nyan-big: 1 failed lab

    exynos_defconfig
        exynos5422-odroidxu3_rootfs:nfs: 1 failed lab

    tegra_defconfig
        tegra124-nyan-big: 1 failed lab

Offline Platforms:

arm:

    davinci_all_defconfig:
        dm365evm,legacy: 1 offline lab

Conflicting Boot Failures Detected: (These likely are not failures as other labs are reporting PASS. Needs review.)

arm:

    multi_v7_defconfig:
        exynos5422-odroidxu3:
            lab-baylibre-seattle: PASS
            lab-collabora: FAIL

    exynos_defconfig:
        exynos5422-odroidxu3:
            lab-baylibre-seattle: PASS
            lab-collabora: FAIL

---
For more info write to <info@kernelci.org>

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (39 preceding siblings ...)
  2018-01-04  9:27 ` kernelci.org bot
@ 2018-01-04 16:38 ` Pavel Tatashin
  2018-01-04 16:53   ` Greg Kroah-Hartman
  2018-01-04 20:11   ` Linus Torvalds
  2018-01-04 17:03 ` Guenter Roeck
                   ` (5 subsequent siblings)
  46 siblings, 2 replies; 156+ messages in thread
From: Pavel Tatashin @ 2018-01-04 16:38 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, Andrew Morton, linux, shuahkh, patches,
	ben.hutchings, lkft-triage, stable

I am getting the following panic when trying to boot 4.4.110rc1 on
Intel(R) Xeon(R) CPU E5-2630:

[    5.923489] BUG: unable to handle kernel NULL pointer dereference at 000000000000000d
[    5.932259] IP: [<ffffffff810e70d2>] dyntick_save_progress_counter+0x12/0x50
[    5.940142] PGD 0
[    5.942400] Oops: 0002 [#1] SMP
[    5.946023] Modules linked in:
[    5.949448] CPU: 5 PID: 8 Comm: rcu_sched Not tainted 4.4.110-rc1_pt_linux-4.4.110rc1 #1
[    5.958484] Hardware name: Oracle Corporation ORACLE SERVER X6-2/ASM,MOTHERBOARD,1U, BIOS 38050100 08/30/2016
[    5.969552] task: ffff881ff2f1ab00 ti: ffff881ff2f24000 task.ti: ffff881ff2f24000
[    5.977905] RIP: 0010:[<ffffffff810e70d2>]  [<ffffffff810e70d2>] dyntick_save_progress_counter+0x12/0x50
[    5.988505] RSP: 0000:ffff881ff2f27dc0  EFLAGS: 00010046
[    5.994434] RAX: 0000000000000001 RBX: ffffffff81b02140 RCX: ffff883fec768000
[    6.002403] RDX: 0000000000000000 RSI: ffff881ff2f27e5f RDI: ffff88407e958140
[    6.010368] RBP: ffff881ff2f27dc0 R08: ffff881ff2f27e78 R09: 000000016110f359
[    6.018333] R10: 0000000000000b10 R11: 0000000000000000 R12: ffffffff81b02140
[    6.026297] R13: 00000000ffffffdf R14: 0000000000000021 R15: 0000000200000000
[    6.034262] FS:  0000000000000000(0000) GS:ffff881fff940000(0000) knlGS:0000000000000000
[    6.043293] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.049707] CR2: 000000000000000d CR3: 0000000001aa6000 CR4: 0000000000360670
[    6.057672] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    6.065638] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    6.073603] Stack:
[    6.075847]  ffff881ff2f27e18 ffffffff810e8fac 0000000000000202 ffff881ff2f27e60
[    6.084158]  ffff881ff2f27e5f ffffffff810e70c0 ffffffff81b02140 ffffffff81b127a0
[    6.092465]  0000000000000001 0000000000000000 0000000000000003 ffff881ff2f27eb8
[    6.100768] Call Trace:
[    6.103501]  [<ffffffff810e8fac>] force_qs_rnp+0xdc/0x150
[    6.109527]  [<ffffffff810e70c0>] ? rcu_start_gp+0x70/0x70
[    6.115654]  [<ffffffff810ea118>] rcu_gp_kthread+0x468/0x9b0
[    6.121976]  [<ffffffff810c9190>] ? prepare_to_wait_event+0xf0/0xf0
[    6.128973]  [<ffffffff810e9cb0>] ? rcu_process_callbacks+0x5f0/0x5f0
[    6.136167]  [<ffffffff810a4a25>] kthread+0xe5/0x100
[    6.141710]  [<ffffffff810a4940>] ? kthread_park+0x60/0x60
[    6.147840]  [<ffffffff81714e8f>] ret_from_fork+0x3f/0x70
[    6.153868]  [<ffffffff810a4940>] ? kthread_park+0x60/0x60
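
The "Oops: 0002" field in the log above is the x86 page-fault error code,
printed in hexadecimal. As a rough aid to reading it, the sketch below
decodes the architectural bits; the function name decode_page_fault and the
dictionary keys are illustrative choices, not kernel identifiers.

```python
# x86 page-fault error code bits, per the architecture definition.
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access, 1: write access
PF_USER = 1 << 2   # 0: kernel-mode access, 1: user-mode access
PF_RSVD = 1 << 3   # reserved bit set in a page-table entry
PF_INSTR = 1 << 4  # fault occurred on an instruction fetch


def decode_page_fault(code):
    """Return the meaning of each page-fault error code bit."""
    return {
        "present": bool(code & PF_PROT),
        "write": bool(code & PF_WRITE),
        "user_mode": bool(code & PF_USER),
        "reserved_bit": bool(code & PF_RSVD),
        "instruction_fetch": bool(code & PF_INSTR),
    }


# Values copied from the log: "Oops: 0002" and "CR2: 000000000000000d".
oops = decode_page_fault(int("0002", 16))
fault_address = int("000000000000000d", 16)
print(oops, fault_address)
```

Read this way, the fault is a kernel-mode write to a not-present page at
address 0xd, which fits the "NULL pointer dereference" wording: a store
through a small offset from a NULL base.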

I tried to bisect the problem, but when I try to boot only with:
"KAISER: Kernel Address Isolation" machine hangs during boot and
reboots without any panic message.

4.4.109 boots fine
4.9.75rc1 also boots fine.

Thank you,
Pavel

-- 
Life Is BeAuTifuL :)

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 16:38 ` Pavel Tatashin
@ 2018-01-04 16:53   ` Greg Kroah-Hartman
  2018-01-04 17:01     ` Guenter Roeck
                       ` (2 more replies)
  2018-01-04 20:11   ` Linus Torvalds
  1 sibling, 3 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-04 16:53 UTC (permalink / raw)
  To: Pavel Tatashin, Jiri Kosina, Hugh Dickins, Dave Hansen
  Cc: linux-kernel, torvalds, Andrew Morton, linux, shuahkh, patches,
	ben.hutchings, lkft-triage, stable

On Thu, Jan 04, 2018 at 11:38:25AM -0500, Pavel Tatashin wrote:
> I am getting the following panic when trying to boot 4.4.110rc1 on
> Intel(R) Xeon(R) CPU E5-2630:
> 
> [    5.923489] BUG: unable to handle kernel NULL pointer dereference
> at 000000000000000d
> [    5.932259] IP: [<ffffffff810e70d2>] dyntick_save_progress_counter+0x12/0x50
> [    5.940142] PGD 0
> [    5.942400] Oops: 0002 [#1] SMP
> [    5.946023] Modules linked in:
> [    5.949448] CPU: 5 PID: 8 Comm: rcu_sched Not tainted
> 4.4.110-rc1_pt_linux-4.4.110rc1 #1
> [    5.958484] Hardware name: Oracle Corporation ORACLE SERVER
> X6-2/ASM,MOTHERBOARD,1U, BIOS 38050100 08/30/2016
> [    5.969552] task: ffff881ff2f1ab00 ti: ffff881ff2f24000 task.ti:
> ffff881ff2f24000
> [    5.977905] RIP: 0010:[<ffffffff810e70d2>]  [<ffffffff810e70d2>]
> dyntick_save_progress_counter+0x12/0x50
> [    5.988505] RSP: 0000:ffff881ff2f27dc0  EFLAGS: 00010046
> [    5.994434] RAX: 0000000000000001 RBX: ffffffff81b02140 RCX: ffff883fec768000
> [    6.002403] RDX: 0000000000000000 RSI: ffff881ff2f27e5f RDI: ffff88407e958140
> [    6.010368] RBP: ffff881ff2f27dc0 R08: ffff881ff2f27e78 R09: 000000016110f359
> [    6.018333] R10: 0000000000000b10 R11: 0000000000000000 R12: ffffffff81b02140
> [    6.026297] R13: 00000000ffffffdf R14: 0000000000000021 R15: 0000000200000000
> [    6.034262] FS:  0000000000000000(0000) GS:ffff881fff940000(0000)
> knlGS:0000000000000000
> [    6.043293] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    6.049707] CR2: 000000000000000d CR3: 0000000001aa6000 CR4: 0000000000360670
> [    6.057672] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    6.065638] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    6.073603] Stack:
> [    6.075847]  ffff881ff2f27e18 ffffffff810e8fac 0000000000000202
> ffff881ff2f27e60
> [    6.084158]  ffff881ff2f27e5f ffffffff810e70c0 ffffffff81b02140
> ffffffff81b127a0
> [    6.092465]  0000000000000001 0000000000000000 0000000000000003
> ffff881ff2f27eb8
> [    6.100768] Call Trace:
> [    6.103501]  [<ffffffff810e8fac>] force_qs_rnp+0xdc/0x150
> [    6.109527]  [<ffffffff810e70c0>] ? rcu_start_gp+0x70/0x70
> [    6.115654]  [<ffffffff810ea118>] rcu_gp_kthread+0x468/0x9b0
> [    6.121976]  [<ffffffff810c9190>] ? prepare_to_wait_event+0xf0/0xf0
> [    6.128973]  [<ffffffff810e9cb0>] ? rcu_process_callbacks+0x5f0/0x5f0
> [    6.136167]  [<ffffffff810a4a25>] kthread+0xe5/0x100
> [    6.141710]  [<ffffffff810a4940>] ? kthread_park+0x60/0x60
> [    6.147840]  [<ffffffff81714e8f>] ret_from_fork+0x3f/0x70
> [    6.153868]  [<ffffffff810a4940>] ? kthread_park+0x60/0x60
> 
> I tried to bisect the problem, but when I try to boot only with:
> "KAISER: Kernel Address Isolation" machine hangs during boot and
> reboots without any panic message.
> 
> 4.4.109 boots fine
> 4.9.75rc1 also boots fine.

Hm, so I'm guessing 4.15-rc6 also works?

Odd that 4.9.75-rc1 fails.

Adding Jiri and Hugh and Dave here to see if they have seen this
before...

thanks,

greg k-h

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 16:53   ` Greg Kroah-Hartman
@ 2018-01-04 17:01     ` Guenter Roeck
  2018-01-04 17:09       ` Greg Kroah-Hartman
  2018-01-04 17:02     ` Pavel Tatashin
  2018-01-04 17:03     ` Willy Tarreau
  2 siblings, 1 reply; 156+ messages in thread
From: Guenter Roeck @ 2018-01-04 17:01 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Pavel Tatashin, Jiri Kosina, Hugh Dickins, Dave Hansen,
	linux-kernel, torvalds, Andrew Morton, shuahkh, patches,
	ben.hutchings, lkft-triage, stable

On Thu, Jan 04, 2018 at 05:53:06PM +0100, Greg Kroah-Hartman wrote:
> On Thu, Jan 04, 2018 at 11:38:25AM -0500, Pavel Tatashin wrote:
> > I am getting the following panic when trying to boot 4.4.110rc1 on
> > Intel(R) Xeon(R) CPU E5-2630:
> > 
> > [    5.923489] BUG: unable to handle kernel NULL pointer dereference
> > at 000000000000000d
> > [    5.932259] IP: [<ffffffff810e70d2>] dyntick_save_progress_counter+0x12/0x50
> > [    5.940142] PGD 0
> > [    5.942400] Oops: 0002 [#1] SMP
> > [    5.946023] Modules linked in:
> > [    5.949448] CPU: 5 PID: 8 Comm: rcu_sched Not tainted
> > 4.4.110-rc1_pt_linux-4.4.110rc1 #1
> > [    5.958484] Hardware name: Oracle Corporation ORACLE SERVER
> > X6-2/ASM,MOTHERBOARD,1U, BIOS 38050100 08/30/2016
> > [    5.969552] task: ffff881ff2f1ab00 ti: ffff881ff2f24000 task.ti:
> > ffff881ff2f24000
> > [    5.977905] RIP: 0010:[<ffffffff810e70d2>]  [<ffffffff810e70d2>]
> > dyntick_save_progress_counter+0x12/0x50
> > [    5.988505] RSP: 0000:ffff881ff2f27dc0  EFLAGS: 00010046
> > [    5.994434] RAX: 0000000000000001 RBX: ffffffff81b02140 RCX: ffff883fec768000
> > [    6.002403] RDX: 0000000000000000 RSI: ffff881ff2f27e5f RDI: ffff88407e958140
> > [    6.010368] RBP: ffff881ff2f27dc0 R08: ffff881ff2f27e78 R09: 000000016110f359
> > [    6.018333] R10: 0000000000000b10 R11: 0000000000000000 R12: ffffffff81b02140
> > [    6.026297] R13: 00000000ffffffdf R14: 0000000000000021 R15: 0000000200000000
> > [    6.034262] FS:  0000000000000000(0000) GS:ffff881fff940000(0000)
> > knlGS:0000000000000000
> > [    6.043293] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    6.049707] CR2: 000000000000000d CR3: 0000000001aa6000 CR4: 0000000000360670
> > [    6.057672] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [    6.065638] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [    6.073603] Stack:
> > [    6.075847]  ffff881ff2f27e18 ffffffff810e8fac 0000000000000202
> > ffff881ff2f27e60
> > [    6.084158]  ffff881ff2f27e5f ffffffff810e70c0 ffffffff81b02140
> > ffffffff81b127a0
> > [    6.092465]  0000000000000001 0000000000000000 0000000000000003
> > ffff881ff2f27eb8
> > [    6.100768] Call Trace:
> > [    6.103501]  [<ffffffff810e8fac>] force_qs_rnp+0xdc/0x150
> > [    6.109527]  [<ffffffff810e70c0>] ? rcu_start_gp+0x70/0x70
> > [    6.115654]  [<ffffffff810ea118>] rcu_gp_kthread+0x468/0x9b0
> > [    6.121976]  [<ffffffff810c9190>] ? prepare_to_wait_event+0xf0/0xf0
> > [    6.128973]  [<ffffffff810e9cb0>] ? rcu_process_callbacks+0x5f0/0x5f0
> > [    6.136167]  [<ffffffff810a4a25>] kthread+0xe5/0x100
> > [    6.141710]  [<ffffffff810a4940>] ? kthread_park+0x60/0x60
> > [    6.147840]  [<ffffffff81714e8f>] ret_from_fork+0x3f/0x70
> > [    6.153868]  [<ffffffff810a4940>] ? kthread_park+0x60/0x60
> > 
> > I tried to bisect the problem, but when I try to boot only with:
> > "KAISER: Kernel Address Isolation" machine hangs during boot and
> > reboots without any panic message.
> > 
> > 4.4.109 boots fine
> > 4.9.75rc1 also boots fine.
> 
> Hm, so I'm guessing 4.15-rc6 also works?
> 
> Odd that 4.9.75-rc1 fails.
> 

I thought the above says that it boots fine ?

Guenter

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 16:53   ` Greg Kroah-Hartman
  2018-01-04 17:01     ` Guenter Roeck
@ 2018-01-04 17:02     ` Pavel Tatashin
  2018-01-04 17:03     ` Willy Tarreau
  2 siblings, 0 replies; 156+ messages in thread
From: Pavel Tatashin @ 2018-01-04 17:02 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jiri Kosina, Hugh Dickins, Dave Hansen, linux-kernel, torvalds,
	Andrew Morton, linux, shuahkh, patches, ben.hutchings,
	lkft-triage, stable

[-- Attachment #1: Type: text/plain, Size: 197 bytes --]

> Hm, so I'm guessing 4.15-rc6 also works?

I have not tested 4.15.

> Odd that 4.9.75-rc1 fails.

4.9.75-rc1 does NOT fail, it boots fine.

config for 4.4.110rc1 panic is attached.

Thank you,
Pasha

[-- Attachment #2: config_linux-4.4.110rc1.gz --]
[-- Type: application/x-gzip, Size: 39060 bytes --]

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 16:53   ` Greg Kroah-Hartman
  2018-01-04 17:01     ` Guenter Roeck
  2018-01-04 17:02     ` Pavel Tatashin
@ 2018-01-04 17:03     ` Willy Tarreau
  2018-01-04 17:11       ` Greg Kroah-Hartman
  2 siblings, 1 reply; 156+ messages in thread
From: Willy Tarreau @ 2018-01-04 17:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Pavel Tatashin, Jiri Kosina, Hugh Dickins, Dave Hansen,
	linux-kernel, torvalds, Andrew Morton, linux, shuahkh, patches,
	ben.hutchings, lkft-triage, stable

On Thu, Jan 04, 2018 at 05:53:06PM +0100, Greg Kroah-Hartman wrote:
> On Thu, Jan 04, 2018 at 11:38:25AM -0500, Pavel Tatashin wrote:
> > I am getting the following panic when trying to boot 4.4.110rc1 on
> > Intel(R) Xeon(R) CPU E5-2630:
> > 
> > [    5.923489] BUG: unable to handle kernel NULL pointer dereference
> > at 000000000000000d
> > [    5.932259] IP: [<ffffffff810e70d2>] dyntick_save_progress_counter+0x12/0x50
> > [    5.940142] PGD 0
> > [    5.942400] Oops: 0002 [#1] SMP
> > [    5.946023] Modules linked in:
> > [    5.949448] CPU: 5 PID: 8 Comm: rcu_sched Not tainted
> > 4.4.110-rc1_pt_linux-4.4.110rc1 #1
> > [    5.958484] Hardware name: Oracle Corporation ORACLE SERVER
> > X6-2/ASM,MOTHERBOARD,1U, BIOS 38050100 08/30/2016
> > [    5.969552] task: ffff881ff2f1ab00 ti: ffff881ff2f24000 task.ti:
> > ffff881ff2f24000
> > [    5.977905] RIP: 0010:[<ffffffff810e70d2>]  [<ffffffff810e70d2>]
> > dyntick_save_progress_counter+0x12/0x50
> > [    5.988505] RSP: 0000:ffff881ff2f27dc0  EFLAGS: 00010046
> > [    5.994434] RAX: 0000000000000001 RBX: ffffffff81b02140 RCX: ffff883fec768000
> > [    6.002403] RDX: 0000000000000000 RSI: ffff881ff2f27e5f RDI: ffff88407e958140
> > [    6.010368] RBP: ffff881ff2f27dc0 R08: ffff881ff2f27e78 R09: 000000016110f359
> > [    6.018333] R10: 0000000000000b10 R11: 0000000000000000 R12: ffffffff81b02140
> > [    6.026297] R13: 00000000ffffffdf R14: 0000000000000021 R15: 0000000200000000
> > [    6.034262] FS:  0000000000000000(0000) GS:ffff881fff940000(0000)
> > knlGS:0000000000000000
> > [    6.043293] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    6.049707] CR2: 000000000000000d CR3: 0000000001aa6000 CR4: 0000000000360670
> > [    6.057672] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [    6.065638] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [    6.073603] Stack:
> > [    6.075847]  ffff881ff2f27e18 ffffffff810e8fac 0000000000000202
> > ffff881ff2f27e60
> > [    6.084158]  ffff881ff2f27e5f ffffffff810e70c0 ffffffff81b02140
> > ffffffff81b127a0
> > [    6.092465]  0000000000000001 0000000000000000 0000000000000003
> > ffff881ff2f27eb8
> > [    6.100768] Call Trace:
> > [    6.103501]  [<ffffffff810e8fac>] force_qs_rnp+0xdc/0x150
> > [    6.109527]  [<ffffffff810e70c0>] ? rcu_start_gp+0x70/0x70
> > [    6.115654]  [<ffffffff810ea118>] rcu_gp_kthread+0x468/0x9b0
> > [    6.121976]  [<ffffffff810c9190>] ? prepare_to_wait_event+0xf0/0xf0
> > [    6.128973]  [<ffffffff810e9cb0>] ? rcu_process_callbacks+0x5f0/0x5f0
> > [    6.136167]  [<ffffffff810a4a25>] kthread+0xe5/0x100
> > [    6.141710]  [<ffffffff810a4940>] ? kthread_park+0x60/0x60
> > [    6.147840]  [<ffffffff81714e8f>] ret_from_fork+0x3f/0x70
> > [    6.153868]  [<ffffffff810a4940>] ? kthread_park+0x60/0x60
> > 
> > I tried to bisect the problem, but when I try to boot only with:
> > "KAISER: Kernel Address Isolation" machine hangs during boot and
> > reboots without any panic message.
> > 
> > 4.4.109 boots fine
> > 4.9.75rc1 also boots fine.
> 
> Hm, so I'm guessing 4.15-rc6 also works?
> 
> Odd that 4.9.75-rc1 fails.

s/4.9.75/4.4.110/ I suppose.

Can't this be because more patches are required in 4.4 to support this
patch set ? Or maybe a manual fix for a conflict that went wrong ? Just
trying to guess.

Willy

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (40 preceding siblings ...)
  2018-01-04 16:38 ` Pavel Tatashin
@ 2018-01-04 17:03 ` Guenter Roeck
  2018-01-04 19:38 ` Thomas Voegtle
                   ` (4 subsequent siblings)
  46 siblings, 0 replies; 156+ messages in thread
From: Guenter Roeck @ 2018-01-04 17:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, shuahkh, patches, ben.hutchings,
	lkft-triage, stable

On Wed, Jan 03, 2018 at 09:11:06PM +0100, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.110 release.
> There are 37 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Fri Jan  5 19:50:38 UTC 2018.
> Anything received after that time might be too late.
> 

For v4.4.109-38-g99abd6c:

Build results:
	total: 145 pass: 145 fail: 0
Qemu test results:
	total: 118 pass: 118 fail: 0

Details are available at http://kerneltests.org/builders.

Guenter

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 17:01     ` Guenter Roeck
@ 2018-01-04 17:09       ` Greg Kroah-Hartman
  0 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-04 17:09 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Pavel Tatashin, Jiri Kosina, Hugh Dickins, Dave Hansen,
	linux-kernel, torvalds, Andrew Morton, shuahkh, patches,
	ben.hutchings, lkft-triage, stable

On Thu, Jan 04, 2018 at 09:01:06AM -0800, Guenter Roeck wrote:
> On Thu, Jan 04, 2018 at 05:53:06PM +0100, Greg Kroah-Hartman wrote:
> > On Thu, Jan 04, 2018 at 11:38:25AM -0500, Pavel Tatashin wrote:
> > > I am getting the following panic when trying to boot 4.4.110rc1 on
> > > Intel(R) Xeon(R) CPU E5-2630:
> > > 
> > > [    5.923489] BUG: unable to handle kernel NULL pointer dereference
> > > at 000000000000000d
> > > [    5.932259] IP: [<ffffffff810e70d2>] dyntick_save_progress_counter+0x12/0x50
> > > [    5.940142] PGD 0
> > > [    5.942400] Oops: 0002 [#1] SMP
> > > [    5.946023] Modules linked in:
> > > [    5.949448] CPU: 5 PID: 8 Comm: rcu_sched Not tainted
> > > 4.4.110-rc1_pt_linux-4.4.110rc1 #1
> > > [    5.958484] Hardware name: Oracle Corporation ORACLE SERVER
> > > X6-2/ASM,MOTHERBOARD,1U, BIOS 38050100 08/30/2016
> > > [    5.969552] task: ffff881ff2f1ab00 ti: ffff881ff2f24000 task.ti:
> > > ffff881ff2f24000
> > > [    5.977905] RIP: 0010:[<ffffffff810e70d2>]  [<ffffffff810e70d2>]
> > > dyntick_save_progress_counter+0x12/0x50
> > > [    5.988505] RSP: 0000:ffff881ff2f27dc0  EFLAGS: 00010046
> > > [    5.994434] RAX: 0000000000000001 RBX: ffffffff81b02140 RCX: ffff883fec768000
> > > [    6.002403] RDX: 0000000000000000 RSI: ffff881ff2f27e5f RDI: ffff88407e958140
> > > [    6.010368] RBP: ffff881ff2f27dc0 R08: ffff881ff2f27e78 R09: 000000016110f359
> > > [    6.018333] R10: 0000000000000b10 R11: 0000000000000000 R12: ffffffff81b02140
> > > [    6.026297] R13: 00000000ffffffdf R14: 0000000000000021 R15: 0000000200000000
> > > [    6.034262] FS:  0000000000000000(0000) GS:ffff881fff940000(0000)
> > > knlGS:0000000000000000
> > > [    6.043293] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [    6.049707] CR2: 000000000000000d CR3: 0000000001aa6000 CR4: 0000000000360670
> > > [    6.057672] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [    6.065638] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [    6.073603] Stack:
> > > [    6.075847]  ffff881ff2f27e18 ffffffff810e8fac 0000000000000202
> > > ffff881ff2f27e60
> > > [    6.084158]  ffff881ff2f27e5f ffffffff810e70c0 ffffffff81b02140
> > > ffffffff81b127a0
> > > [    6.092465]  0000000000000001 0000000000000000 0000000000000003
> > > ffff881ff2f27eb8
> > > [    6.100768] Call Trace:
> > > [    6.103501]  [<ffffffff810e8fac>] force_qs_rnp+0xdc/0x150
> > > [    6.109527]  [<ffffffff810e70c0>] ? rcu_start_gp+0x70/0x70
> > > [    6.115654]  [<ffffffff810ea118>] rcu_gp_kthread+0x468/0x9b0
> > > [    6.121976]  [<ffffffff810c9190>] ? prepare_to_wait_event+0xf0/0xf0
> > > [    6.128973]  [<ffffffff810e9cb0>] ? rcu_process_callbacks+0x5f0/0x5f0
> > > [    6.136167]  [<ffffffff810a4a25>] kthread+0xe5/0x100
> > > [    6.141710]  [<ffffffff810a4940>] ? kthread_park+0x60/0x60
> > > [    6.147840]  [<ffffffff81714e8f>] ret_from_fork+0x3f/0x70
> > > [    6.153868]  [<ffffffff810a4940>] ? kthread_park+0x60/0x60
> > > 
> > > I tried to bisect the problem, but when I try to boot only with:
> > > "KAISER: Kernel Address Isolation" machine hangs during boot and
> > > reboots without any panic message.
> > > 
> > > 4.4.109 boots fine
> > > 4.9.75rc1 also boots fine.
> > 
> > Hm, so I'm guessing 4.15-rc6 also works?
> > 
> > Odd that 4.9.75-rc1 fails.

Sorry, it's been a long few days, I meant "odd that the 4.9 -rc works
and the 4.4 one fails".

{sigh}

I think I need to ignore email for a while...

greg k-h

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 17:03     ` Willy Tarreau
@ 2018-01-04 17:11       ` Greg Kroah-Hartman
  2018-01-04 17:13         ` Willy Tarreau
  2018-01-04 17:14         ` Greg Kroah-Hartman
  0 siblings, 2 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-04 17:11 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Pavel Tatashin, Jiri Kosina, Hugh Dickins, Dave Hansen,
	linux-kernel, torvalds, Andrew Morton, linux, shuahkh, patches,
	ben.hutchings, lkft-triage, stable

On Thu, Jan 04, 2018 at 06:03:15PM +0100, Willy Tarreau wrote:
> On Thu, Jan 04, 2018 at 05:53:06PM +0100, Greg Kroah-Hartman wrote:
> > On Thu, Jan 04, 2018 at 11:38:25AM -0500, Pavel Tatashin wrote:
> > > I am getting the following panic when trying to boot 4.4.110rc1 on
> > > Intel(R) Xeon(R) CPU E5-2630:
> > > 
> > > [    5.923489] BUG: unable to handle kernel NULL pointer dereference
> > > at 000000000000000d
> > > [    5.932259] IP: [<ffffffff810e70d2>] dyntick_save_progress_counter+0x12/0x50
> > > [    5.940142] PGD 0
> > > [    5.942400] Oops: 0002 [#1] SMP
> > > [    5.946023] Modules linked in:
> > > [    5.949448] CPU: 5 PID: 8 Comm: rcu_sched Not tainted
> > > 4.4.110-rc1_pt_linux-4.4.110rc1 #1
> > > [    5.958484] Hardware name: Oracle Corporation ORACLE SERVER
> > > X6-2/ASM,MOTHERBOARD,1U, BIOS 38050100 08/30/2016
> > > [    5.969552] task: ffff881ff2f1ab00 ti: ffff881ff2f24000 task.ti:
> > > ffff881ff2f24000
> > > [    5.977905] RIP: 0010:[<ffffffff810e70d2>]  [<ffffffff810e70d2>]
> > > dyntick_save_progress_counter+0x12/0x50
> > > [    5.988505] RSP: 0000:ffff881ff2f27dc0  EFLAGS: 00010046
> > > [    5.994434] RAX: 0000000000000001 RBX: ffffffff81b02140 RCX: ffff883fec768000
> > > [    6.002403] RDX: 0000000000000000 RSI: ffff881ff2f27e5f RDI: ffff88407e958140
> > > [    6.010368] RBP: ffff881ff2f27dc0 R08: ffff881ff2f27e78 R09: 000000016110f359
> > > [    6.018333] R10: 0000000000000b10 R11: 0000000000000000 R12: ffffffff81b02140
> > > [    6.026297] R13: 00000000ffffffdf R14: 0000000000000021 R15: 0000000200000000
> > > [    6.034262] FS:  0000000000000000(0000) GS:ffff881fff940000(0000)
> > > knlGS:0000000000000000
> > > [    6.043293] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [    6.049707] CR2: 000000000000000d CR3: 0000000001aa6000 CR4: 0000000000360670
> > > [    6.057672] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [    6.065638] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [    6.073603] Stack:
> > > [    6.075847]  ffff881ff2f27e18 ffffffff810e8fac 0000000000000202
> > > ffff881ff2f27e60
> > > [    6.084158]  ffff881ff2f27e5f ffffffff810e70c0 ffffffff81b02140
> > > ffffffff81b127a0
> > > [    6.092465]  0000000000000001 0000000000000000 0000000000000003
> > > ffff881ff2f27eb8
> > > [    6.100768] Call Trace:
> > > [    6.103501]  [<ffffffff810e8fac>] force_qs_rnp+0xdc/0x150
> > > [    6.109527]  [<ffffffff810e70c0>] ? rcu_start_gp+0x70/0x70
> > > [    6.115654]  [<ffffffff810ea118>] rcu_gp_kthread+0x468/0x9b0
> > > [    6.121976]  [<ffffffff810c9190>] ? prepare_to_wait_event+0xf0/0xf0
> > > [    6.128973]  [<ffffffff810e9cb0>] ? rcu_process_callbacks+0x5f0/0x5f0
> > > [    6.136167]  [<ffffffff810a4a25>] kthread+0xe5/0x100
> > > [    6.141710]  [<ffffffff810a4940>] ? kthread_park+0x60/0x60
> > > [    6.147840]  [<ffffffff81714e8f>] ret_from_fork+0x3f/0x70
> > > [    6.153868]  [<ffffffff810a4940>] ? kthread_park+0x60/0x60
> > > 
> > > I tried to bisect the problem, but when I try to boot only with:
> > > "KAISER: Kernel Address Isolation" machine hangs during boot and
> > > reboots without any panic message.
> > > 
> > > 4.4.109 boots fine
> > > 4.9.75rc1 also boots fine.
> > 
> > Hm, so I'm guessing 4.15-rc6 also works?
> > 
> > Odd that 4.9.75-rc1 fails.
> 
> s/4.9.75/4.4.110/ I suppose.

Yes, mistake on my side.

> Can't this be because more patches are required in 4.4 to support this
> patch set ? Or maybe a manual fix for a conflict that went wrong ? Just
> trying to guess.

Odd thing is, the 4.9 series started from the 4.4 code for most of the
patches, so I would expect that one to fail...

greg k-h

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 17:11       ` Greg Kroah-Hartman
@ 2018-01-04 17:13         ` Willy Tarreau
  2018-01-04 17:14         ` Greg Kroah-Hartman
  1 sibling, 0 replies; 156+ messages in thread
From: Willy Tarreau @ 2018-01-04 17:13 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Pavel Tatashin, Jiri Kosina, Hugh Dickins, Dave Hansen,
	linux-kernel, torvalds, Andrew Morton, linux, shuahkh, patches,
	ben.hutchings, lkft-triage, stable

On Thu, Jan 04, 2018 at 06:11:02PM +0100, Greg Kroah-Hartman wrote:
> > Can't this be because more patches are required in 4.4 to support this
> > patch set ? Or maybe a manual fix for a conflict that went wrong ? Just
> > trying to guess.
> 
> Odd thing is, the 4.9 series started from the 4.4 code for most of the
> patches, so I would expect that one to fail...

I see. Then maybe a missing patch somewhere in 4.4 compared to 4.9 :-/
I have no idea what to look for however.

Willy

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 17:11       ` Greg Kroah-Hartman
  2018-01-04 17:13         ` Willy Tarreau
@ 2018-01-04 17:14         ` Greg Kroah-Hartman
  2018-01-04 17:16           ` Greg Kroah-Hartman
  1 sibling, 1 reply; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-04 17:14 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Pavel Tatashin, Jiri Kosina, Hugh Dickins, Dave Hansen,
	linux-kernel, torvalds, Andrew Morton, linux, shuahkh, patches,
	ben.hutchings, lkft-triage, stable

On Thu, Jan 04, 2018 at 06:11:02PM +0100, Greg Kroah-Hartman wrote:
> On Thu, Jan 04, 2018 at 06:03:15PM +0100, Willy Tarreau wrote:
> > On Thu, Jan 04, 2018 at 05:53:06PM +0100, Greg Kroah-Hartman wrote:
> > > On Thu, Jan 04, 2018 at 11:38:25AM -0500, Pavel Tatashin wrote:
> > > > I am getting the following panic when trying to boot 4.4.110rc1 on
> > > > Intel(R) Xeon(R) CPU E5-2630:
> > > > 
> > > > [    5.923489] BUG: unable to handle kernel NULL pointer dereference
> > > > at 000000000000000d
> > > > [    5.932259] IP: [<ffffffff810e70d2>] dyntick_save_progress_counter+0x12/0x50
> > > > [    5.940142] PGD 0
> > > > [    5.942400] Oops: 0002 [#1] SMP
> > > > [    5.946023] Modules linked in:
> > > > [    5.949448] CPU: 5 PID: 8 Comm: rcu_sched Not tainted
> > > > 4.4.110-rc1_pt_linux-4.4.110rc1 #1
> > > > [    5.958484] Hardware name: Oracle Corporation ORACLE SERVER
> > > > X6-2/ASM,MOTHERBOARD,1U, BIOS 38050100 08/30/2016
> > > > [    5.969552] task: ffff881ff2f1ab00 ti: ffff881ff2f24000 task.ti:
> > > > ffff881ff2f24000
> > > > [    5.977905] RIP: 0010:[<ffffffff810e70d2>]  [<ffffffff810e70d2>]
> > > > dyntick_save_progress_counter+0x12/0x50
> > > > [    5.988505] RSP: 0000:ffff881ff2f27dc0  EFLAGS: 00010046
> > > > [    5.994434] RAX: 0000000000000001 RBX: ffffffff81b02140 RCX: ffff883fec768000
> > > > [    6.002403] RDX: 0000000000000000 RSI: ffff881ff2f27e5f RDI: ffff88407e958140
> > > > [    6.010368] RBP: ffff881ff2f27dc0 R08: ffff881ff2f27e78 R09: 000000016110f359
> > > > [    6.018333] R10: 0000000000000b10 R11: 0000000000000000 R12: ffffffff81b02140
> > > > [    6.026297] R13: 00000000ffffffdf R14: 0000000000000021 R15: 0000000200000000
> > > > [    6.034262] FS:  0000000000000000(0000) GS:ffff881fff940000(0000)
> > > > knlGS:0000000000000000
> > > > [    6.043293] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [    6.049707] CR2: 000000000000000d CR3: 0000000001aa6000 CR4: 0000000000360670
> > > > [    6.057672] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > [    6.065638] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > [    6.073603] Stack:
> > > > [    6.075847]  ffff881ff2f27e18 ffffffff810e8fac 0000000000000202
> > > > ffff881ff2f27e60
> > > > [    6.084158]  ffff881ff2f27e5f ffffffff810e70c0 ffffffff81b02140
> > > > ffffffff81b127a0
> > > > [    6.092465]  0000000000000001 0000000000000000 0000000000000003
> > > > ffff881ff2f27eb8
> > > > [    6.100768] Call Trace:
> > > > [    6.103501]  [<ffffffff810e8fac>] force_qs_rnp+0xdc/0x150
> > > > [    6.109527]  [<ffffffff810e70c0>] ? rcu_start_gp+0x70/0x70
> > > > [    6.115654]  [<ffffffff810ea118>] rcu_gp_kthread+0x468/0x9b0
> > > > [    6.121976]  [<ffffffff810c9190>] ? prepare_to_wait_event+0xf0/0xf0
> > > > [    6.128973]  [<ffffffff810e9cb0>] ? rcu_process_callbacks+0x5f0/0x5f0
> > > > [    6.136167]  [<ffffffff810a4a25>] kthread+0xe5/0x100
> > > > [    6.141710]  [<ffffffff810a4940>] ? kthread_park+0x60/0x60
> > > > [    6.147840]  [<ffffffff81714e8f>] ret_from_fork+0x3f/0x70
> > > > [    6.153868]  [<ffffffff810a4940>] ? kthread_park+0x60/0x60
> > > > 
> > > > I tried to bisect the problem, but when I try to boot only with:
> > > > "KAISER: Kernel Address Isolation" machine hangs during boot and
> > > > reboots without any panic message.
> > > > 
> > > > 4.4.109 boots fine
> > > > 4.9.75rc1 also boots fine.
> > > 
> > > Hm, so I'm guessing 4.15-rc6 also works?
> > > 
> > > Odd that 4.9.75-rc1 fails.
> > 
> > s/4.9.75/4.4.110/ I suppose.
> 
> Yes, mistake on my side.
> 
> > Can't this be because more patches are required in 4.4 to support this
> > patch set ? Or maybe a manual fix for a conflict that went wrong ? Just
> > trying to guess.
> 
> Odd thing is, the 4.9 series started from the 4.4 code for most of the
> patches, so I would expect that one to fail...

Also, the 4.4 patches were supposed to have been better tested, I need
to go dig and see what I messed up here...

greg k-h

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 17:14         ` Greg Kroah-Hartman
@ 2018-01-04 17:16           ` Greg Kroah-Hartman
  2018-01-04 17:56             ` Guenter Roeck
  0 siblings, 1 reply; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-04 17:16 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Pavel Tatashin, Jiri Kosina, Hugh Dickins, Dave Hansen,
	linux-kernel, torvalds, Andrew Morton, linux, shuahkh, patches,
	ben.hutchings, lkft-triage, stable

On Thu, Jan 04, 2018 at 06:14:15PM +0100, Greg Kroah-Hartman wrote:
> On Thu, Jan 04, 2018 at 06:11:02PM +0100, Greg Kroah-Hartman wrote:
> > On Thu, Jan 04, 2018 at 06:03:15PM +0100, Willy Tarreau wrote:
> > > On Thu, Jan 04, 2018 at 05:53:06PM +0100, Greg Kroah-Hartman wrote:
> > > > On Thu, Jan 04, 2018 at 11:38:25AM -0500, Pavel Tatashin wrote:
> > > > > I am getting the following panic when trying to boot 4.4.110rc1 on
> > > > > Intel(R) Xeon(R) CPU E5-2630:
> > > > > 
> > > > > [    5.923489] BUG: unable to handle kernel NULL pointer dereference
> > > > > at 000000000000000d
> > > > > [    5.932259] IP: [<ffffffff810e70d2>] dyntick_save_progress_counter+0x12/0x50
> > > > > [    5.940142] PGD 0
> > > > > [    5.942400] Oops: 0002 [#1] SMP
> > > > > [    5.946023] Modules linked in:
> > > > > [    5.949448] CPU: 5 PID: 8 Comm: rcu_sched Not tainted
> > > > > 4.4.110-rc1_pt_linux-4.4.110rc1 #1
> > > > > [    5.958484] Hardware name: Oracle Corporation ORACLE SERVER
> > > > > X6-2/ASM,MOTHERBOARD,1U, BIOS 38050100 08/30/2016
> > > > > [    5.969552] task: ffff881ff2f1ab00 ti: ffff881ff2f24000 task.ti:
> > > > > ffff881ff2f24000
> > > > > [    5.977905] RIP: 0010:[<ffffffff810e70d2>]  [<ffffffff810e70d2>]
> > > > > dyntick_save_progress_counter+0x12/0x50
> > > > > [    5.988505] RSP: 0000:ffff881ff2f27dc0  EFLAGS: 00010046
> > > > > [    5.994434] RAX: 0000000000000001 RBX: ffffffff81b02140 RCX: ffff883fec768000
> > > > > [    6.002403] RDX: 0000000000000000 RSI: ffff881ff2f27e5f RDI: ffff88407e958140
> > > > > [    6.010368] RBP: ffff881ff2f27dc0 R08: ffff881ff2f27e78 R09: 000000016110f359
> > > > > [    6.018333] R10: 0000000000000b10 R11: 0000000000000000 R12: ffffffff81b02140
> > > > > [    6.026297] R13: 00000000ffffffdf R14: 0000000000000021 R15: 0000000200000000
> > > > > [    6.034262] FS:  0000000000000000(0000) GS:ffff881fff940000(0000)
> > > > > knlGS:0000000000000000
> > > > > [    6.043293] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > [    6.049707] CR2: 000000000000000d CR3: 0000000001aa6000 CR4: 0000000000360670
> > > > > [    6.057672] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > [    6.065638] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > [    6.073603] Stack:
> > > > > [    6.075847]  ffff881ff2f27e18 ffffffff810e8fac 0000000000000202
> > > > > ffff881ff2f27e60
> > > > > [    6.084158]  ffff881ff2f27e5f ffffffff810e70c0 ffffffff81b02140
> > > > > ffffffff81b127a0
> > > > > [    6.092465]  0000000000000001 0000000000000000 0000000000000003
> > > > > ffff881ff2f27eb8
> > > > > [    6.100768] Call Trace:
> > > > > [    6.103501]  [<ffffffff810e8fac>] force_qs_rnp+0xdc/0x150
> > > > > [    6.109527]  [<ffffffff810e70c0>] ? rcu_start_gp+0x70/0x70
> > > > > [    6.115654]  [<ffffffff810ea118>] rcu_gp_kthread+0x468/0x9b0
> > > > > [    6.121976]  [<ffffffff810c9190>] ? prepare_to_wait_event+0xf0/0xf0
> > > > > [    6.128973]  [<ffffffff810e9cb0>] ? rcu_process_callbacks+0x5f0/0x5f0
> > > > > [    6.136167]  [<ffffffff810a4a25>] kthread+0xe5/0x100
> > > > > [    6.141710]  [<ffffffff810a4940>] ? kthread_park+0x60/0x60
> > > > > [    6.147840]  [<ffffffff81714e8f>] ret_from_fork+0x3f/0x70
> > > > > [    6.153868]  [<ffffffff810a4940>] ? kthread_park+0x60/0x60
> > > > > 
> > > > > I tried to bisect the problem, but when I try to boot only with:
> > > > > "KAISER: Kernel Address Isolation" machine hangs during boot and
> > > > > reboots without any panic message.
> > > > > 
> > > > > 4.4.109 boots fine
> > > > > 4.9.75rc1 also boots fine.
> > > > 
> > > > Hm, so I'm guessing 4.15-rc6 also works?
> > > > 
> > > > Odd that 4.9.75-rc1 fails.
> > > 
> > > s/4.9.75/4.4.110/ I suppose.
> > 
> > Yes, mistake on my side.
> > 
> > > Can't this be because more patches are required in 4.4 to support this
> > > patch set ? Or maybe a manual fix for a conflict that went wrong ? Just
> > > trying to guess.
> > 
> > Odd thing is, the 4.9 series started from the 4.4 code for most of the
> > patches, so I would expect that one to fail...
> 
> Also, the 4.4 patches were supposed to have been better tested, I need
> to go dig and see what I messed up here...

Nope, it matches up with what is in SLES12 exactly, I must be missing
something else here as a prerequisite...

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 17:16           ` Greg Kroah-Hartman
@ 2018-01-04 17:56             ` Guenter Roeck
  2018-01-05 15:00               ` Greg Kroah-Hartman
  0 siblings, 1 reply; 156+ messages in thread
From: Guenter Roeck @ 2018-01-04 17:56 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Willy Tarreau, Pavel Tatashin, Jiri Kosina, Hugh Dickins,
	Dave Hansen, linux-kernel, torvalds, Andrew Morton, shuahkh,
	patches, ben.hutchings, lkft-triage, stable

On Thu, Jan 04, 2018 at 06:16:04PM +0100, Greg Kroah-Hartman wrote:
> On Thu, Jan 04, 2018 at 06:14:15PM +0100, Greg Kroah-Hartman wrote:
> > On Thu, Jan 04, 2018 at 06:11:02PM +0100, Greg Kroah-Hartman wrote:
> > > On Thu, Jan 04, 2018 at 06:03:15PM +0100, Willy Tarreau wrote:
> > > > On Thu, Jan 04, 2018 at 05:53:06PM +0100, Greg Kroah-Hartman wrote:
> > > > > On Thu, Jan 04, 2018 at 11:38:25AM -0500, Pavel Tatashin wrote:
> > > > > > I am getting the following panic when trying to boot 4.4.110rc1 on
> > > > > > Intel(R) Xeon(R) CPU E5-2630:
> > > > > > 
> > > > > > [    5.923489] BUG: unable to handle kernel NULL pointer dereference
> > > > > > at 000000000000000d
> > > > > > [    5.932259] IP: [<ffffffff810e70d2>] dyntick_save_progress_counter+0x12/0x50
> > > > > > [    5.940142] PGD 0
> > > > > > [    5.942400] Oops: 0002 [#1] SMP
> > > > > > [    5.946023] Modules linked in:
> > > > > > [    5.949448] CPU: 5 PID: 8 Comm: rcu_sched Not tainted
> > > > > > 4.4.110-rc1_pt_linux-4.4.110rc1 #1
> > > > > > [    5.958484] Hardware name: Oracle Corporation ORACLE SERVER
> > > > > > X6-2/ASM,MOTHERBOARD,1U, BIOS 38050100 08/30/2016
> > > > > > [    5.969552] task: ffff881ff2f1ab00 ti: ffff881ff2f24000 task.ti:
> > > > > > ffff881ff2f24000
> > > > > > [    5.977905] RIP: 0010:[<ffffffff810e70d2>]  [<ffffffff810e70d2>]
> > > > > > dyntick_save_progress_counter+0x12/0x50
> > > > > > [    5.988505] RSP: 0000:ffff881ff2f27dc0  EFLAGS: 00010046
> > > > > > [    5.994434] RAX: 0000000000000001 RBX: ffffffff81b02140 RCX: ffff883fec768000
> > > > > > [    6.002403] RDX: 0000000000000000 RSI: ffff881ff2f27e5f RDI: ffff88407e958140
> > > > > > [    6.010368] RBP: ffff881ff2f27dc0 R08: ffff881ff2f27e78 R09: 000000016110f359
> > > > > > [    6.018333] R10: 0000000000000b10 R11: 0000000000000000 R12: ffffffff81b02140
> > > > > > [    6.026297] R13: 00000000ffffffdf R14: 0000000000000021 R15: 0000000200000000
> > > > > > [    6.034262] FS:  0000000000000000(0000) GS:ffff881fff940000(0000)
> > > > > > knlGS:0000000000000000
> > > > > > [    6.043293] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > [    6.049707] CR2: 000000000000000d CR3: 0000000001aa6000 CR4: 0000000000360670
> > > > > > [    6.057672] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > [    6.065638] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > [    6.073603] Stack:
> > > > > > [    6.075847]  ffff881ff2f27e18 ffffffff810e8fac 0000000000000202
> > > > > > ffff881ff2f27e60
> > > > > > [    6.084158]  ffff881ff2f27e5f ffffffff810e70c0 ffffffff81b02140
> > > > > > ffffffff81b127a0
> > > > > > [    6.092465]  0000000000000001 0000000000000000 0000000000000003
> > > > > > ffff881ff2f27eb8
> > > > > > [    6.100768] Call Trace:
> > > > > > [    6.103501]  [<ffffffff810e8fac>] force_qs_rnp+0xdc/0x150
> > > > > > [    6.109527]  [<ffffffff810e70c0>] ? rcu_start_gp+0x70/0x70
> > > > > > [    6.115654]  [<ffffffff810ea118>] rcu_gp_kthread+0x468/0x9b0
> > > > > > [    6.121976]  [<ffffffff810c9190>] ? prepare_to_wait_event+0xf0/0xf0
> > > > > > [    6.128973]  [<ffffffff810e9cb0>] ? rcu_process_callbacks+0x5f0/0x5f0
> > > > > > [    6.136167]  [<ffffffff810a4a25>] kthread+0xe5/0x100
> > > > > > [    6.141710]  [<ffffffff810a4940>] ? kthread_park+0x60/0x60
> > > > > > [    6.147840]  [<ffffffff81714e8f>] ret_from_fork+0x3f/0x70
> > > > > > [    6.153868]  [<ffffffff810a4940>] ? kthread_park+0x60/0x60
> > > > > > 
> > > > > > I tried to bisect the problem, but when I try to boot only with:
> > > > > > "KAISER: Kernel Address Isolation" machine hangs during boot and
> > > > > > reboots without any panic message.
> > > > > > 
> > > > > > 4.4.109 boots fine
> > > > > > 4.9.75rc1 also boots fine.
> > > > > 
> > > > > Hm, so I'm guessing 4.15-rc6 also works?
> > > > > 
> > > > > Odd that 4.9.75-rc1 fails.
> > > > 
> > > > s/4.9.75/4.4.110/ I suppose.
> > > 
> > > Yes, mistake on my side.
> > > 
> > > > Can't this be because more patches are required in 4.4 to support this
> > > > patch set ? Or maybe a manual fix for a conflict that went wrong ? Just
> > > > trying to guess.
> > > 
> > > Odd thing is, the 4.9 series started from the 4.4 code for most of the
> > > patches, so I would expect that one to fail...
> > 
> > Also, the 4.4 patches were supposed to have been better tested, I need
> > to go dig and see what I messed up here...
> 
> Nope, it matches up with what is in SLES12 exactly, I must be missing
> something else here as a prerequisite...

FWIW, v4.4.110-rc1 boots fine when merged into chromeos-4.4, on i7-7Y75.

Guenter

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (41 preceding siblings ...)
  2018-01-04 17:03 ` Guenter Roeck
@ 2018-01-04 19:38 ` Thomas Voegtle
  2018-01-04 19:50   ` Greg Kroah-Hartman
                     ` (2 more replies)
  2018-01-04 22:00 ` Shuah Khan
                   ` (3 subsequent siblings)
  46 siblings, 3 replies; 156+ messages in thread
From: Thomas Voegtle @ 2018-01-04 19:38 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, linux, shuahkh, patches,
	ben.hutchings, lkft-triage, stable


When I start 4.4.110-rc1 on a virtual machine (qemu) init throws a
segfault and the kernel panics (attempted to kill init).
The VM host is a Haswell system.

The same kernel binary boots fine on a (other) Haswell system.

I tried:

4.4.110-rc1     broken
4.4.109         ok
4.9.75-rc1      ok

All systems are OpenSuSE 42.3 64bit.

qemu is started only with:
qemu-system-x86_64 -m 2048 -enable-kvm  -drive
file=tvsuse,format=raw,if=none,id=virtdisk0 -device
virtio-blk-pci,scsi=off,drive=virtdisk0

Am I the only one who sees this? Has anyone booted that kernel on qemu?

Confused,

   Thomas

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 19:38 ` Thomas Voegtle
@ 2018-01-04 19:50   ` Greg Kroah-Hartman
  2018-01-04 20:16     ` Thomas Voegtle
  2018-01-04 20:10   ` Guenter Roeck
  2018-01-05 14:58   ` Greg Kroah-Hartman
  2 siblings, 1 reply; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-04 19:50 UTC (permalink / raw)
  To: Thomas Voegtle
  Cc: linux-kernel, torvalds, akpm, linux, shuahkh, patches,
	ben.hutchings, lkft-triage, stable

On Thu, Jan 04, 2018 at 08:38:23PM +0100, Thomas Voegtle wrote:
> 
> When I start 4.4.110-rc1 on a virtual machine (qemu) init throws a
> segfault and the kernel panics (attempted to kill init).
> The VM host is a Haswell system.
> 
> The same kernel binary boots fine on a (other) Haswell system.
> 
> I tried:
> 
> 4.4.110-rc1     broken
> 4.4.109         ok
> 4.9.75-rc1      ok

Does 4.15-rc6 also work ok?

> All systems are OpenSuSE 42.3 64bit.
> 
> qemu is started only with:
> qemu-system-x86_64 -m 2048 -enable-kvm  -drive
> file=tvsuse,format=raw,if=none,id=virtdisk0 -device
> virtio-blk-pci,scsi=off,drive=virtdisk0
> 
> Am I the only one who sees this? Has anyone booted that kernel on qemu?

Any chance we can see the panic?

There's another error report of this same type of thing on this thread,
did you see that?

thanks,

greg k-h

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 19:38 ` Thomas Voegtle
  2018-01-04 19:50   ` Greg Kroah-Hartman
@ 2018-01-04 20:10   ` Guenter Roeck
  2018-01-05 14:58   ` Greg Kroah-Hartman
  2 siblings, 0 replies; 156+ messages in thread
From: Guenter Roeck @ 2018-01-04 20:10 UTC (permalink / raw)
  To: Thomas Voegtle
  Cc: Greg Kroah-Hartman, linux-kernel, torvalds, akpm, shuahkh,
	patches, ben.hutchings, lkft-triage, stable

On Thu, Jan 04, 2018 at 08:38:23PM +0100, Thomas Voegtle wrote:
> 
> When I start 4.4.110-rc1 on a virtual machine (qemu) init throws a
> segfault and the kernel panics (attempted to kill init).
> The VM host is a Haswell system.
> 
> The same kernel binary boots fine on a (other) Haswell system.
> 
> I tried:
> 
> 4.4.110-rc1     broken
> 4.4.109         ok
> 4.9.75-rc1      ok
> 
> All systems are OpenSuSE 42.3 64bit.
> 
> qemu is started only with:
> qemu-system-x86_64 -m 2048 -enable-kvm  -drive
> file=tvsuse,format=raw,if=none,id=virtdisk0 -device
> virtio-blk-pci,scsi=off,drive=virtdisk0
> 
> Am I the only one who sees this? Has anyone booted that kernel on qemu?
> 
I did, but not on Haswell, and not with the same root file system.
It boots fine in qemu on E5-2690 v3.

Do you have a traceback ?

Guenter

> Confused,
> 
>   Thomas
> 

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 16:38 ` Pavel Tatashin
  2018-01-04 16:53   ` Greg Kroah-Hartman
@ 2018-01-04 20:11   ` Linus Torvalds
  1 sibling, 0 replies; 156+ messages in thread
From: Linus Torvalds @ 2018-01-04 20:11 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Greg Kroah-Hartman, Linux Kernel Mailing List, Andrew Morton,
	Guenter Roeck, Shuah Khan, patches, Ben Hutchings, lkft-triage,
	stable

On Thu, Jan 4, 2018 at 8:38 AM, Pavel Tatashin <soleen@gmail.com> wrote:
> I am getting the following panic when trying to boot 4.4.110rc1 on
> Intel(R) Xeon(R) CPU E5-2630:
>
> [    5.923489] BUG: unable to handle kernel NULL pointer dereference at 000000000000000d
> [    5.932259] IP: [<ffffffff810e70d2>] dyntick_save_progress_counter+0x12/0x50

Hmm. You don't have the "Code:" line in this oops anywhere, do you?

> [    5.977905] RIP: dyntick_save_progress_counter+0x12/0x50
> [    5.988505] RSP: 0000:ffff881ff2f27dc0  EFLAGS: 00010046
> [    5.994434] RAX: 0000000000000001 RBX: ffffffff81b02140 RCX: ffff883fec768000
> [    6.002403] RDX: 0000000000000000 RSI: ffff881ff2f27e5f RDI: ffff88407e958140
> [    6.010368] RBP: ffff881ff2f27dc0 R08: ffff881ff2f27e78 R09: 000000016110f359
> [    6.018333] R10: 0000000000000b10 R11: 0000000000000000 R12: ffffffff81b02140
> [    6.026297] R13: 00000000ffffffdf R14: 0000000000000021 R15: 0000000200000000
> [    6.034262] FS:  0000000000000000(0000) GS:ffff881fff940000(0000) knlGS:0000000000000000
> [    6.043293] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    6.049707] CR2: 000000000000000d CR3: 0000000001aa6000 CR4: 0000000000360670
> [    6.057672] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    6.065638] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    6.073603] Stack:
> [    6.075847]  ffff881ff2f27e18 ffffffff810e8fac 0000000000000202 ffff881ff2f27e60
> [    6.084158]  ffff881ff2f27e5f ffffffff810e70c0 ffffffff81b02140 ffffffff81b127a0
> [    6.092465]  0000000000000001 0000000000000000 0000000000000003 ffff881ff2f27eb8
> [    6.100768] Call Trace:
> [    6.103501]  [<ffffffff810e8fac>] force_qs_rnp+0xdc/0x150

The oops looks like it *might* be this:

        lock xadd %edx,0xc(%rax)

which is from the

        int snap = atomic_add_return(0, &rdtp->dynticks);

in rcu_dynticks_snap() because %rax is 1 and that would give you the
invalid page fault and the right faulting address.

But that would be complete rcu data structure corruption (that rdtp
pointer comes from

        per_cpu_ptr(rsp->rda, cpu)

in force_qs_rnp(), afaik).

The PTI patches obviously change percpu stuff, but this looks like an
odd place for that to manifest.
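
The address arithmetic above can be checked with a small sketch (Python,
illustrative only). It assumes, as guessed above, that the faulting
instruction is the `lock xadd` with a displacement of 0xc from %rax; the
helper name is hypothetical, and the register values are copied from the
oops.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def effective_address(base: int, displacement: int) -> int:
    """Address formed by a `disp(%reg)` memory operand, wrapped to 64 bits."""
    return (base + displacement) & MASK64


RAX = 0x1           # RAX from the register dump in the oops
DISPLACEMENT = 0xC  # assumed, from `lock xadd %edx,0xc(%rax)`
CR2 = 0xD           # faulting address reported by the oops (CR2)

fault = effective_address(RAX, DISPLACEMENT)
print(hex(fault), fault == CR2)
```

If the guess about the instruction is right, the computed address matches
CR2, which is why a bogus rdtp pointer of 1 fits the report.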

                 Linus

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 19:50   ` Greg Kroah-Hartman
@ 2018-01-04 20:16     ` Thomas Voegtle
  2018-01-04 20:29       ` Linus Torvalds
  0 siblings, 1 reply; 156+ messages in thread
From: Thomas Voegtle @ 2018-01-04 20:16 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Thomas Voegtle, linux-kernel, torvalds, akpm, linux, shuahkh,
	patches, ben.hutchings, lkft-triage, stable

[-- Attachment #1: Type: text/plain, Size: 941 bytes --]

On Thu, 4 Jan 2018, Greg Kroah-Hartman wrote:

> On Thu, Jan 04, 2018 at 08:38:23PM +0100, Thomas Voegtle wrote:
>>
>> When I start 4.4.110-rc1 on a virtual machine (qemu) init throws a
>> segfault and the kernel panics (attempted to kill init).
>> The VM host is a Haswell system.
>>
>> The same kernel binary boots fine on a (other) Haswell system.
>>
>> I tried:
>>
>> 4.4.110-rc1     broken
>> 4.4.109         ok
>> 4.9.75-rc1      ok
>
> Does 4.15-rc6 also work ok?

Yes. Slightly different kernel config, but it boots.

>> All systems are OpenSuSE 42.3 64bit.
>>
>> qemu is started only with:
>> qemu-system-x86_64 -m 2048 -enable-kvm  -drive
>> file=tvsuse,format=raw,if=none,id=virtdisk0 -device
>> virtio-blk-pci,scsi=off,drive=virtdisk0
>>
>> Am I the only one who sees this? Has anyone booted that kernel on qemu?
>
> Any chance we can see the panic?

Attached a screenshot.
Is that useful? Are there some debug options I can add?

[-- Attachment #2: Type: image/png, Size: 44846 bytes --]

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 20:16     ` Thomas Voegtle
@ 2018-01-04 20:29       ` Linus Torvalds
  2018-01-04 20:43         ` Andy Lutomirski
  0 siblings, 1 reply; 156+ messages in thread
From: Linus Torvalds @ 2018-01-04 20:29 UTC (permalink / raw)
  To: Thomas Voegtle, Andy Lutomirski
  Cc: Greg Kroah-Hartman, Linux Kernel Mailing List, Andrew Morton,
	Guenter Roeck, Shuah Khan, patches, Ben Hutchings, lkft-triage,
	stable

On Thu, Jan 4, 2018 at 12:16 PM, Thomas Voegtle <tv@lio96.de> wrote:
>
> Attached a screenshot.
> Is that useful? Are there some debug options I can add?

Not much of an oops, because the SIGSEGV happens in user space. The
only reason you get any kernel stack printout at all is because 'init'
dying will make the kernel print that out.

The segfault address for init looks like the fixmap area to me (first
byte in the last page of the fixmap?). "Error 5" means that it's a
user-space read that got a protection fault. So it's not an LDT or GDT
update or anything like that, it's a normal access from user space (or
a qemu emulation bug, but that sounds unlikely).
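
For reference, a minimal sketch (Python, illustrative only) of how the
"Error 5" value decodes. The bit meanings are the architectural x86
page-fault error code bits; the constant and function names are
hypothetical and chosen only for this example.

```python
# x86 page-fault error code bits (architectural definition).
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access, 1: write access
PF_USER = 1 << 2   # 0: fault in kernel mode, 1: fault in user mode
PF_RSVD = 1 << 3   # reserved bit set in a page-table entry
PF_INSTR = 1 << 4  # fault was an instruction fetch


def decode_pf_error(code: int) -> dict:
    """Split a page-fault error code into its individual meanings."""
    return {
        "cause": "protection violation" if code & PF_PROT else "page not present",
        "access": "write" if code & PF_WRITE else "read",
        "mode": "user" if code & PF_USER else "kernel",
        "reserved_bit": bool(code & PF_RSVD),
        "instruction_fetch": bool(code & PF_INSTR),
    }


print(decode_pf_error(5))
```

Error code 5 sets only the protection and user bits, so this is a
user-mode read of a page that is mapped but not accessible from user
space.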

Is that the vsyscall page?

Adding Luto to the participants. I think he noticed one of the
vsyscall patches missing earlier in the 4.9 series. Maybe the 4.4
series had something similar..

              Linus

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 20:29       ` Linus Torvalds
@ 2018-01-04 20:43         ` Andy Lutomirski
  2018-01-04 20:57           ` Hugh Dickins
  2018-01-05  5:33           ` Andy Lutomirski
  0 siblings, 2 replies; 156+ messages in thread
From: Andy Lutomirski @ 2018-01-04 20:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Voegtle, Greg Kroah-Hartman, Linux Kernel Mailing List,
	Andrew Morton, Guenter Roeck, Shuah Khan, patches, Ben Hutchings,
	lkft-triage, stable


> On Jan 4, 2018, at 12:29 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
>> On Thu, Jan 4, 2018 at 12:16 PM, Thomas Voegtle <tv@lio96.de> wrote:
>> 
>> Attached a screenshot.
>> Is that useful? Are there some debug options I can add?
> 
> Not much of an oops, because the SIGSEGV happens in user space. The
> only reason you get any kernel stack printout at all is because 'init'
> dying will make the kernel print that out.
> 
> The segfault address for init looks like the fixmap area to me (first
> byte in the last page of the fixmap?). "Error 5" means that it's a
> user-space read that got a protection fault. So it's not an LDT or GDT
> update or anything like that, it's a normal access from user space (or
> a qemu emulation bug, but that sounds unlikely).
> 
> Is that the vsyscall page?
> 
> Adding Luto to the participants. I think he noticed one of the
> vsyscall patches missing earlier in the 4.9 series. Maybe the 4.4
> series had something similar..
> 

That's almost certainly it.

I'll try to find some time today or tomorrow to add a proper selftest.

>              Linus

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 20:43         ` Andy Lutomirski
@ 2018-01-04 20:57           ` Hugh Dickins
  2018-01-04 21:16             ` Andy Lutomirski
  2018-01-04 21:23             ` Pavel Tatashin
  2018-01-05  5:33           ` Andy Lutomirski
  1 sibling, 2 replies; 156+ messages in thread
From: Hugh Dickins @ 2018-01-04 20:57 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Thomas Voegtle, Greg Kroah-Hartman,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable

On Thu, Jan 4, 2018 at 12:43 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
>> On Jan 4, 2018, at 12:29 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>>> On Thu, Jan 4, 2018 at 12:16 PM, Thomas Voegtle <tv@lio96.de> wrote:
>>>
>>> Attached a screenshot.
>>> Is that useful? Are there some debug options I can add?
>>
>> Not much of an oops, because the SIGSEGV happens in user space. The
>> only reason you get any kernel stack printout at all is because 'init'
>> dying will make the kernel print that out.
>>
>> The segfault address for init looks like the fixmap area to me (first
>> byte in the last page of the fixmap?). "Error 5" means that it's a
>> user-space read that got a protection fault. So it's not an LDT or GDT
>> update or anything like that, it's a normal access from user space (or
>> a qemu emulation bug, but that sounds unlikely).
>>
>> Is that the vsyscall page?
>>
>> Adding Luto to the participants. I think he noticed one of the
>> vsyscall patches missing earlier in the 4.9 series. Maybe the 4.4
>> series had something similar..
>>
>
> That's almost certainly it.

I'm hopeless on the FIXMAP arithmetic, but I'm pretty sure that
ffffffffff5ff000 is either VSYSCALL page or PVCLOCK page (I think it
was VVAR page when init segfaulted on it in my 3.2).

I'll forward Borislav's suggested 4.4 VSYSCALL patch from the kaiser
backports ml to Thomas, to see if that sorts his crash (forwarding in
the hope that gmail doesn't mess up the patch).

Seems odd that 4.4 should be broken but 4.9 not broken here, I'd
expect them to be equally known broken with respect to VSYSCALL; but
perhaps it's a matter of userspace trying different fallbacks
according to what kernel supports, and only hitting this on 4.4.

Hugh

>
> I'll try to find some time today or tomorrow to add a proper selftest.
>
>>              Linus

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 20:57           ` Hugh Dickins
@ 2018-01-04 21:16             ` Andy Lutomirski
  2018-01-04 21:23             ` Pavel Tatashin
  1 sibling, 0 replies; 156+ messages in thread
From: Andy Lutomirski @ 2018-01-04 21:16 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Thomas Voegtle, Greg Kroah-Hartman,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable



> On Jan 4, 2018, at 12:57 PM, Hugh Dickins <hughd@google.com> wrote:
> 
>> On Thu, Jan 4, 2018 at 12:43 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> 
>>>> On Jan 4, 2018, at 12:29 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>>> 
>>>> On Thu, Jan 4, 2018 at 12:16 PM, Thomas Voegtle <tv@lio96.de> wrote:
>>>> 
>>>> Attached a screenshot.
>>>> Is that useful? Are there some debug options I can add?
>>> 
>>> Not much of an oops, because the SIGSEGV happens in user space. The
>>> only reason you get any kernel stack printout at all is because 'init'
>>> dying will make the kernel print that out.
>>> 
>>> The segfault address for init looks like the fixmap area to me (first
>>> byte in the last page of the fixmap?). "Error 5" means that it's a
>>> user-space read that got a protection fault. So it's not a LDT or GDT
>>> update or anything like that, it's a normal access from user space (or
>>> a qemu emulation bug, but that sounds unlikely).
>>> 
>>> Is that the vsyscall page?
>>> 
>>> Adding Luto to the participants. I think he noticed one of the
>>> vsyscall patches missing earlier in the 4.9 series. Maybe the 4.4
>>> series had something similar..
>>> 
>> 
>> That's almost certainly it.
> 
> I'm hopeless on the FIXMAP arithmetic, but I'm pretty sure that
> ffffffffff5ff000 is either VSYSCALL page or PVCLOCK page (I think it
> was VVAR page when init segfaulted on it in my 3.2).

Nah, that's one page below VSYSCALL.  Vvar is 0x7fff...
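
The "one page below" arithmetic can be checked with a small sketch. It assumes the x86-64 vsyscall page sits at 0xffffffffff600000 (the value of VSYSCALL_ADDR, -10UL << 20) and 4 KiB pages; the helper name pages_from_vsyscall() is made up for illustration and is not a kernel function.

```c
#include <stdint.h>

/* Assumed fixed address of the x86-64 vsyscall page (-10UL << 20). */
#define VSYSCALL_ADDR_SKETCH 0xffffffffff600000ULL
#define PAGE_SIZE_SKETCH     4096ULL

/*
 * Hypothetical helper: signed distance, in pages, between the page that
 * contains 'addr' and the vsyscall page. 0 means "inside the vsyscall
 * page", -1 means "the page directly below it".
 */
static int64_t pages_from_vsyscall(uint64_t addr)
{
    uint64_t page = addr & ~(PAGE_SIZE_SKETCH - 1);

    return (int64_t)(page - VSYSCALL_ADDR_SKETCH) / (int64_t)PAGE_SIZE_SKETCH;
}
```

Fed the address Hugh quotes (ffffffffff5ff000), this gives -1, i.e. the page directly below VSYSCALL rather than the vsyscall page itself.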

I don't have the actual screenshot, I think.

> 
> I'll forward Borislav's suggested 4.4 VSYSCALL patch from the kaiser
> backports ml to Thomas, to see if that sorts his crash (forwarding in
> the hope that gmail doesn't mess up the patch).
> 
> Seems odd that 4.4 should be broken but 4.9 not broken here, I'd
> expect them to be equally known broken with respect to VSYSCALL; but
> perhaps it's a matter of userspace trying different fallbacks
> according to what kernel supports, and only hitting this on 4.4.

I don't think any current userspace is that dumb.  But Go was still using vsyscall fairly recently.

I may be able to look for real tonight.

> 
> Hugh
> 
>> 
>> I'll try to find some time today or tomorrow to add a proper selftest.
>> 
>>>             Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 20:57           ` Hugh Dickins
  2018-01-04 21:16             ` Andy Lutomirski
@ 2018-01-04 21:23             ` Pavel Tatashin
  2018-01-04 21:37               ` Hugh Dickins
  1 sibling, 1 reply; 156+ messages in thread
From: Pavel Tatashin @ 2018-01-04 21:23 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andy Lutomirski, Linus Torvalds, Thomas Voegtle,
	Greg Kroah-Hartman, Linux Kernel Mailing List, Andrew Morton,
	Guenter Roeck, Shuah Khan, patches, Ben Hutchings, lkft-triage,
	stable

I tried cherry picking
 435086b36f62 x86/vsyscall/64: Explicitly set _PAGE_USER in the
pagetable hierarchy

on top of 4.4.110-rc1, (needed to resolve a small 5level table to
4level page table conflict). Unfortunately, this does not solve the
panic/hanging problem I reported. For some reason I do not see the
panic message anymore. Machine hangs here:

[    5.023052] zswap: loaded using pool lzo/zbud
[    5.023063] page_owner is disabled
[    5.026492] Key type trusted registered
[    5.029325] Key type encrypted registered
[    5.029330] ima: No TPM chip found, activating TPM-bypass!
[    5.029365] evm: HMAC attrs: 0x1
[    5.034696] rtc_cmos 00:00: setting system clock to 2018-01-04
21:20:34 UTC (1515100834)
[    5.216862] Freeing unused kernel memory: 1856K
<hang>

And reboots after about half a minute.

Thank you,
Pavel

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 21:23             ` Pavel Tatashin
@ 2018-01-04 21:37               ` Hugh Dickins
  2018-01-04 21:48                 ` Pavel Tatashin
  0 siblings, 1 reply; 156+ messages in thread
From: Hugh Dickins @ 2018-01-04 21:37 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Andy Lutomirski, Linus Torvalds, Thomas Voegtle,
	Greg Kroah-Hartman, Linux Kernel Mailing List, Andrew Morton,
	Guenter Roeck, Shuah Khan, patches, Ben Hutchings, lkft-triage,
	stable

On Thu, Jan 4, 2018 at 1:23 PM, Pavel Tatashin
<pasha.tatashin@oracle.com> wrote:
> I tried cherry picking
>  435086b36f62 x86/vsyscall/64: Explicitly set _PAGE_USER in the
> pagetable hierarchy
>
> on top of 4.4.110-rc1, (needed to resolve a small 5level table to
> 4level page table conflict). Unfortunately, this does not solve the
> panic/hanging problem I reported. For some reason I do not see the
> panic message anymore. Machine hangs here:
>
> [    5.023052] zswap: loaded using pool lzo/zbud
> [    5.023063] page_owner is disabled
> [    5.026492] Key type trusted registered
> [    5.029325] Key type encrypted registered
> [    5.029330] ima: No TPM chip found, activating TPM-bypass!
> [    5.029365] evm: HMAC attrs: 0x1
> [    5.034696] rtc_cmos 00:00: setting system clock to 2018-01-04
> 21:20:34 UTC (1515100834)
> [    5.216862] Freeing unused kernel memory: 1856K
> <hang>
>
> And reboots after about half a minute.

Thanks for trying, but yes, I wouldn't expect a straight cherry-pick
of that to work in the context of 4.4.110: it needs to be
cherry-picked "in principle".  Which Borislav has done, and I'll
forward you his (not yet reviewed) patch too, but frankly I've much
less hope that it will help your crash than Thomas's.

So please revert that cherry-pick; and if Borislav's patch doesn't
help, if you can send us a "Code:" line from the crash, that may still
give us more to go on.

As Linus remarked earlier, "The PTI patches obviously change percpu
stuff, but this looks like an odd place for that to manifest".
Exactly: segfault and panic when starting init is a "normal" symptom
when we get something wrong with Kaiser/PTI, but a kthread crashing in
dyntick_save_progress_counter is something new to me.

Hugh

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 21:37               ` Hugh Dickins
@ 2018-01-04 21:48                 ` Pavel Tatashin
  2018-01-04 22:33                   ` Linus Torvalds
  2018-01-05 14:59                   ` Greg Kroah-Hartman
  0 siblings, 2 replies; 156+ messages in thread
From: Pavel Tatashin @ 2018-01-04 21:48 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andy Lutomirski, Linus Torvalds, Thomas Voegtle,
	Greg Kroah-Hartman, Linux Kernel Mailing List, Andrew Morton,
	Guenter Roeck, Shuah Khan, patches, Ben Hutchings, lkft-triage,
	stable

[-- Attachment #1: Type: text/plain, Size: 2152 bytes --]

[    6.159992] Code: 89 83 78 06 01 00 b8 01 00 00 00 5b 41 5c 41 5d
5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 31 d2 48 8b 87 c8 00 00
00 48 89 e5 <f0> 0f c1 50 0c 89 97 d0 00 00 00 83 e2 01 b8 01 00 00 00
74 1d

Also, attached is the full console output.

Thank you,
Pavel

On Thu, Jan 4, 2018 at 4:37 PM, Hugh Dickins <hughd@google.com> wrote:
> On Thu, Jan 4, 2018 at 1:23 PM, Pavel Tatashin
> <pasha.tatashin@oracle.com> wrote:
>> I tried cherry picking
>>  435086b36f62 x86/vsyscall/64: Explicitly set _PAGE_USER in the
>> pagetable hierarchy
>>
>> on top of 4.4.110-rc1, (needed to resolve a small 5level table to
>> 4level page table conflict). Unfortunately, this does not solve the
>> panic/hanging problem I reported. For some reason I do not see the
>> panic message anymore. Machine hangs here:
>>
>> [    5.023052] zswap: loaded using pool lzo/zbud
>> [    5.023063] page_owner is disabled
>> [    5.026492] Key type trusted registered
>> [    5.029325] Key type encrypted registered
>> [    5.029330] ima: No TPM chip found, activating TPM-bypass!
>> [    5.029365] evm: HMAC attrs: 0x1
>> [    5.034696] rtc_cmos 00:00: setting system clock to 2018-01-04
>> 21:20:34 UTC (1515100834)
>> [    5.216862] Freeing unused kernel memory: 1856K
>> <hang>
>>
>> And reboots after about half a minute.
>
> Thanks for trying, but yes, I wouldn't expect a straight cherry-pick
> of that to work in the context of 4.4.110: it needs to be
> cherry-picked "in principle".  Which Borislav has done, and I'll
> forward you his (not yet reviewed) patch too, but frankly I've much
> less hope that it will help your crash than Thomas's.
>
> So please revert that cherry-pick; and if Borislav's patch doesn't
> help, if you can send us a "Code:" line from the crash, that may still
> give us more to go on.
>
> As Linus remarked earlier, "The PTI patches obviously change percpu
> stuff, but this looks like an odd place for that to manifest".
> Exactly: segfault and panic when starting init is a "normal" symptom
> when we get something wrong with Kaiser/PTI, but a kthread crashing in
> dyntick_save_progress_counter is something new to me.
>
> Hugh

[-- Attachment #2: console_panic.output.gz --]
[-- Type: application/x-gzip, Size: 12445 bytes --]

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (42 preceding siblings ...)
  2018-01-04 19:38 ` Thomas Voegtle
@ 2018-01-04 22:00 ` Shuah Khan
  2018-01-05  7:55   ` Greg Kroah-Hartman
  2018-01-04 23:45 ` Guenter Roeck
                   ` (2 subsequent siblings)
  46 siblings, 1 reply; 156+ messages in thread
From: Shuah Khan @ 2018-01-04 22:00 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: torvalds, akpm, linux, patches, ben.hutchings, lkft-triage,
	stable, Shuah Khan

On 01/03/2018 01:11 PM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.110 release.
> There are 37 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Fri Jan  5 19:50:38 UTC 2018.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
> 	kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.110-rc1.gz
> or in the git tree and branch at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 

Based on the email threads, I expected to see issues, however,
compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 21:48                 ` Pavel Tatashin
@ 2018-01-04 22:33                   ` Linus Torvalds
  2018-01-05 14:59                   ` Greg Kroah-Hartman
  1 sibling, 0 replies; 156+ messages in thread
From: Linus Torvalds @ 2018-01-04 22:33 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Hugh Dickins, Andy Lutomirski, Thomas Voegtle,
	Greg Kroah-Hartman, Linux Kernel Mailing List, Andrew Morton,
	Guenter Roeck, Shuah Khan, patches, Ben Hutchings, lkft-triage,
	stable

On Thu, Jan 4, 2018 at 1:48 PM, Pavel Tatashin
<pasha.tatashin@oracle.com> wrote:
> [    6.159992] Code: 89 83 78 06 01 00 b8 01 00 00 00 5b 41 5c 41 5d
> 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 31 d2 48 8b 87 c8 00 00
> 00 48 89 e5 <f0> 0f c1 50 0c 89 97 d0 00 00 00 83 e2 01 b8 01 00 00 00
> 74 1d

Yeah, it's the "lock xadd" as suspected:

   0:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
   5:   55                      push   %rbp
   6:   31 d2                   xor    %edx,%edx
   8:   48 8b 87 c8 00 00 00    mov    0xc8(%rdi),%rax
   f:   48 89 e5                mov    %rsp,%rbp
  12:*  f0 0f c1 50 0c          lock xadd %edx,0xc(%rax)     <-- trapping instruction
  17:   89 97 d0 00 00 00       mov    %edx,0xd0(%rdi)
  1d:   83 e2 01                and    $0x1,%edx
  20:   b8 01 00 00 00          mov    $0x1,%eax
  25:   74 1d                   je     0x44

(that first "nop" is a 5-byte nop that is used for the function
tracing placeholder).

And %rax contains garbage (the value "1", rather than a valid kernel pointer).
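
Read back into C, the disassembly above looks roughly like the sketch below. This is a reconstruction from the instruction bytes only, not the kernel source: the struct names and padding are hypothetical, chosen so that on x86-64 the fields land at the 0xc8, 0xd0 and 0xc offsets seen in the listing, and C11 atomic_fetch_add() with an addend of 0 stands in for whatever atomic helper the kernel actually uses.

```c
#include <stdatomic.h>

/* Hypothetical layout: counter at offset 0xc, as in "0xc(%rax)". */
struct dynticks_sketch {
    char pad[0xc];
    atomic_int dynticks;
};

/* Hypothetical layout: pointer at 0xc8(%rdi), snapshot at 0xd0(%rdi). */
struct rcu_data_sketch {
    char pad[0xc8];
    struct dynticks_sketch *dynticks;
    int dynticks_snap;
};

static int save_progress_counter_sketch(struct rcu_data_sketch *rdp)
{
    /*
     * "xor %edx,%edx; lock xadd %edx,0xc(%rax)": atomically add 0 and
     * read the counter. If rdp->dynticks is not a valid pointer, this
     * is the access that faults.
     */
    rdp->dynticks_snap = atomic_fetch_add(&rdp->dynticks->dynticks, 0);

    /* "and $0x1,%edx": report 1 when the snapshot is even. */
    return (rdp->dynticks_snap & 0x1) == 0;
}
```

The point of the sketch is that the only memory the trapping instruction touches is reached through the pointer loaded from 0xc8(%rdi), so a value of 1 in %rax means that pointer field itself held garbage.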

Sadly, I have no idea about how that garbage came about.

                Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (43 preceding siblings ...)
  2018-01-04 22:00 ` Shuah Khan
@ 2018-01-04 23:45 ` Guenter Roeck
  2018-01-04 23:58   ` Linus Torvalds
                     ` (2 more replies)
  2018-01-05 17:20 ` Alice Ferrazzi
  2018-01-05 17:56 ` Guenter Roeck
  46 siblings, 3 replies; 156+ messages in thread
From: Guenter Roeck @ 2018-01-04 23:45 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, shuahkh, patches, ben.hutchings,
	lkft-triage, stable, Tao Wu

On Wed, Jan 03, 2018 at 09:11:06PM +0100, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.110 release.
> There are 37 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Fri Jan  5 19:50:38 UTC 2018.
> Anything received after that time might be too late.
> 

This is also reported to crash if loaded under qemu + haxm under windows.
See https://www.spinics.net/lists/kernel/msg2689835.html for details.
Here is a boot log (the log is from chromeos-4.4, but Tao Wu says that
the same log is also seen with vanilla v4.4.110-rc1).

[    0.712750] Freeing unused kernel memory: 552K
[    0.721821] init: Corrupted page table at address 57b029b332e0
[    0.722761] PGD 80000000bb238067 PUD bc36a067 PMD bc369067 PTE 45d2067
[    0.722761] Bad pagetable: 000b [#1] PREEMPT SMP 
[    0.722761] Modules linked in:
[    0.722761] CPU: 1 PID: 1 Comm: init Not tainted 4.4.96 #31
[    0.722761] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
[    0.722761] task: ffff8800bc290000 ti: ffff8800bc28c000 task.ti: ffff8800bc28c000
[    0.722761] RIP: 0010:[<ffffffff83f4129e>]  [<ffffffff83f4129e>] __clear_user+0x42/0x67
[    0.722761] RSP: 0000:ffff8800bc28fcf8  EFLAGS: 00010202
[    0.722761] RAX: 0000000000000000 RBX: 00000000000001a4 RCX: 00000000000001a4
[    0.722761] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000057b029b332e0
[    0.722761] RBP: ffff8800bc28fd08 R08: ffff8800bc290000 R09: ffff8800bb2f4000
[    0.722761] R10: ffff8800bc290000 R11: ffff8800bb2f4000 R12: 000057b029b332e0
[    0.722761] R13: 0000000000000000 R14: 000057b029b33340 R15: ffff8800bb1e2a00
[    0.722761] FS:  0000000000000000(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000
[    0.722761] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    0.722761] CR2: 000057b029b332e0 CR3: 00000000bb2f8000 CR4: 00000000000006e0
[    0.722761] Stack:
[    0.722761]  000057b029b332e0 ffff8800bb95fa80 ffff8800bc28fd18 ffffffff83f4120c
[    0.722761]  ffff8800bc28fe18 ffffffff83e9e7a1 ffff8800bc28fd68 0000000000000000
[    0.722761]  ffff8800bc290000 ffff8800bc290000 ffff8800bc290000 ffff8800bc290000
[    0.722761] Call Trace:
[    0.722761]  [<ffffffff83f4120c>] clear_user+0x2e/0x30
[    0.722761]  [<ffffffff83e9e7a1>] load_elf_binary+0xa7f/0x18f7
[    0.722761]  [<ffffffff83de2088>] search_binary_handler+0x86/0x19c
[    0.722761]  [<ffffffff83de389e>] do_execveat_common.isra.26+0x909/0xf98
[    0.722761]  [<ffffffff844febe0>] ? rest_init+0x87/0x87
[    0.722761]  [<ffffffff83de40be>] do_execve+0x23/0x25
[    0.722761]  [<ffffffff83c002e3>] run_init_process+0x2b/0x2d
[    0.722761]  [<ffffffff844fec4d>] kernel_init+0x6d/0xda
[    0.722761]  [<ffffffff84505b2f>] ret_from_fork+0x3f/0x70
[    0.722761]  [<ffffffff844febe0>] ? rest_init+0x87/0x87
[    0.722761] Code: 86 84 be 12 00 00 00 e8 87 0d e8 ff 66 66 90 48 89 d8 48 c1
eb 03 4c 89 e7 83 e0 07 48 89 d9 be 08 00 00 00 31 d2 48 85 c9 74 0a <48> 89 17
48 01 f7 ff c9 75 f6 48 89 c1 85 c9 74 09 88 17 48 ff 
[    0.722761] RIP  [<ffffffff83f4129e>] __clear_user+0x42/0x67
[    0.722761]  RSP <ffff8800bc28fcf8>
[    0.722761] ---[ end trace def703879b4ff090 ]---
[    0.722761] BUG: sleeping function called from invalid context at /mnt/host/source/src/third_party/kernel/v4.4/kernel/locking/rwsem.c:21
[    0.722761] in_atomic(): 0, irqs_disabled(): 1, pid: 1, name: init
[    0.722761] CPU: 1 PID: 1 Comm: init Tainted: G      D         4.4.96 #31
[    0.722761] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
[    0.722761]  0000000000000086 dcb5d76098c89836 ffff8800bc28fa30 ffffffff83f34004
[    0.722761]  ffffffff84839dc2 0000000000000015 ffff8800bc28fa40 ffffffff83d57dc9
[    0.722761]  ffff8800bc28fa68 ffffffff83d57e6a ffffffff84a53640 0000000000000000
[    0.722761] Call Trace:
[    0.722761]  [<ffffffff83f34004>] dump_stack+0x4d/0x63
[    0.722761]  [<ffffffff83d57dc9>] ___might_sleep+0x13a/0x13c
[    0.722761]  [<ffffffff83d57e6a>] __might_sleep+0x9f/0xa6
[    0.722761]  [<ffffffff84502788>] down_read+0x20/0x31
[    0.722761]  [<ffffffff83cc5d9b>] __blocking_notifier_call_chain+0x35/0x63
[    0.722761]  [<ffffffff83cc5ddd>] blocking_notifier_call_chain+0x14/0x16
[    0.800374] usb 1-1: new full-speed USB device number 2 using uhci_hcd
[    0.722761]  [<ffffffff83cefe97>] profile_task_exit+0x1a/0x1c
[    0.802309]  [<ffffffff83cac84e>] do_exit+0x39/0xe7f
[    0.802309]  [<ffffffff83ce5938>] ? vprintk_default+0x1d/0x1f
[    0.802309]  [<ffffffff83d7bb95>] ? printk+0x57/0x73
[    0.802309]  [<ffffffff83c46e25>] oops_end+0x80/0x85
[    0.802309]  [<ffffffff83c7b747>] pgtable_bad+0x8a/0x95
[    0.802309]  [<ffffffff83ca7f4a>] __do_page_fault+0x8c/0x352
[    0.802309]  [<ffffffff83eefba5>] ? file_has_perm+0xc4/0xe5
[    0.802309]  [<ffffffff83ca821c>] do_page_fault+0xc/0xe
[    0.802309]  [<ffffffff84507682>] page_fault+0x22/0x30
[    0.802309]  [<ffffffff83f4129e>] ? __clear_user+0x42/0x67
[    0.802309]  [<ffffffff83f4127f>] ? __clear_user+0x23/0x67
[    0.802309]  [<ffffffff83f4120c>] clear_user+0x2e/0x30
[    0.802309]  [<ffffffff83e9e7a1>] load_elf_binary+0xa7f/0x18f7
[    0.802309]  [<ffffffff83de2088>] search_binary_handler+0x86/0x19c
[    0.802309]  [<ffffffff83de389e>] do_execveat_common.isra.26+0x909/0xf98
[    0.802309]  [<ffffffff844febe0>] ? rest_init+0x87/0x87
[    0.802309]  [<ffffffff83de40be>] do_execve+0x23/0x25
[    0.802309]  [<ffffffff83c002e3>] run_init_process+0x2b/0x2d
[    0.802309]  [<ffffffff844fec4d>] kernel_init+0x6d/0xda
[    0.802309]  [<ffffffff84505b2f>] ret_from_fork+0x3f/0x70
[    0.802309]  [<ffffffff844febe0>] ? rest_init+0x87/0x87
[    0.830559] Kernel panic - not syncing: Attempted to kill init!  exitcode=0x00000009
[    0.830559] 
[    0.831305] Kernel Offset: 0x2c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    0.831305] ---[ end Kernel panic - not syncing: Attempted to kill init!  exitcode=0x00000009

The crash part of this problem may be solved with the following patch
(thanks to Hugh for the hint). There is still another problem, though -
with this patch applied, the qemu session aborts with "VCPU Shutdown
request", whatever that means.

Guenter

---
From: Guenter Roeck <groeck@chromium.org>
Date: Thu, 4 Jan 2018 13:41:55 -0800
Subject: [PATCH 2/2] WIP: kaiser: Set _PAGE_NX only if supported

Change-Id: Ie6ab566c1d725b24c4b3aa80a47c3ff3a5feddb9
Signed-off-by: Guenter Roeck <groeck@chromium.org>
---
 arch/x86/mm/kaiser.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/kaiser.c b/arch/x86/mm/kaiser.c
index 7d2f7eb6857f..e4706273d4a1 100644
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -421,7 +421,8 @@ pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd)
 			 * get out to userspace running on the kernel CR3,
 			 * userspace will crash instead of running.
 			 */
-			pgd.pgd |= _PAGE_NX;
+			if (__supported_pte_mask & _PAGE_NX)
+				pgd.pgd |= _PAGE_NX;
 		}
 	} else if (!pgd.pgd) {
 		/*
-- 
2.16.0.rc0.223.g4a4ac83678-goog

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 23:45 ` Guenter Roeck
@ 2018-01-04 23:58   ` Linus Torvalds
  2018-01-05  4:37     ` Mike Galbraith
  2018-01-05 13:41   ` Greg Kroah-Hartman
  2 siblings, 0 replies; 156+ messages in thread
From: Linus Torvalds @ 2018-01-04 23:58 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Greg Kroah-Hartman, Linux Kernel Mailing List, Andrew Morton,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable, Tao Wu

On Thu, Jan 4, 2018 at 3:45 PM, Guenter Roeck <linux@roeck-us.net> wrote:
>
> [    0.721821] init: Corrupted page table at address 57b029b332e0
> [    0.722761] PGD 80000000bb238067 PUD bc36a067 PMD bc369067 PTE 45d2067
> [    0.722761] Bad pagetable: 000b [#1] PREEMPT SMP

Ok, it's unhappy because the RSVD bit is set in the error code.

And yeah, that seems to be due to NX in the pgd (nothing else is
certainly set), with presumably a virtual machine that doesn't support
it.

So I suspect your patch is indeed the right thing.
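
For reference, a minimal sketch of how the two fault codes seen in this thread decode, assuming the architectural x86 page-fault error-code layout (bit 0 protection, bit 1 write, bit 2 user, bit 3 reserved bit, bit 4 instruction fetch) and NX as bit 63 of a page-table entry. The function names are made up for illustration; shadow_pgd_nx_sketch() only mirrors the shape of the check in the patch and is not the kernel function.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* x86 page-fault error-code bits. */
#define PF_PROT  (1u << 0) /* 1: protection violation, 0: page not present */
#define PF_WRITE (1u << 1) /* 1: write access, 0: read access */
#define PF_USER  (1u << 2) /* 1: fault taken in user mode */
#define PF_RSVD  (1u << 3) /* 1: reserved bit set in a paging entry */
#define PF_INSTR (1u << 4) /* 1: instruction fetch */

/* NX (execute-disable) is bit 63 of a page-table entry. */
#define PAGE_NX_SKETCH (1ULL << 63)

/* Hypothetical helper: render an error code as text into buf. */
static const char *describe_fault(unsigned int code, char *buf, size_t len)
{
    snprintf(buf, len, "%s %s, %s%s",
             (code & PF_USER) ? "user" : "kernel",
             (code & PF_INSTR) ? "instruction fetch" :
             (code & PF_WRITE) ? "write" : "read",
             (code & PF_PROT) ? "protection violation" : "page not present",
             (code & PF_RSVD) ? ", reserved bit set" : "");
    return buf;
}

/*
 * Hypothetical stand-in for the patched logic: only set NX in the entry
 * when the supported-bits mask says the CPU (or hypervisor) honours it.
 */
static uint64_t shadow_pgd_nx_sketch(uint64_t pgd, uint64_t supported_mask)
{
    if (supported_mask & PAGE_NX_SKETCH)
        pgd |= PAGE_NX_SKETCH;
    return pgd;
}
```

With these definitions, "error 5" decodes as a user read hitting a protection violation, and "000b" as a kernel write with the reserved bit flagged; the PGD value 80000000bb238067 in the log has bit 63 set, which becomes a reserved bit when NX is not enabled.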

> The crash part of this problem may be solved with the following patch
> (thanks to Hugh for the hint). There is still another problem, though -
> with this patch applied, the qemu session aborts with "VCPU Shutdown
> request", whatever that means.

Presumably that is a triple fault.

That causes a reboot traditionally, and in a virtual environment that
would be approximated with a VCPU shutdown.

                Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04  9:27 ` kernelci.org bot
@ 2018-01-05  0:06   ` Kevin Hilman
  2018-01-08 15:06     ` Guillaume Tucker
  0 siblings, 1 reply; 156+ messages in thread
From: Kevin Hilman @ 2018-01-05  0:06 UTC (permalink / raw)
  To: kernelci.org bot
  Cc: Greg Kroah-Hartman, linux-kernel, torvalds, akpm, linux, shuahkh,
	patches, ben.hutchings, lkft-triage, stable, Guillaume Tucker,
	kernelci

kernelci.org bot <bot@kernelci.org> writes:

> stable-rc/linux-4.4.y boot: 100 boots: 4 failed, 93 passed with 1 offline, 2 conflicts (v4.4.109-38-g99abd6cdd65e)
>
> Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.4.y/kernel/v4.4.109-38-g99abd6cdd65e/
> Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.4.y/kernel/v4.4.109-38-g99abd6cdd65e/
>
> Tree: stable-rc
> Branch: linux-4.4.y
> Git Describe: v4.4.109-38-g99abd6cdd65e
> Git Commit: 99abd6cdd65e984d89c8565508a7a96ea0fce179
> Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> Tested: 53 unique boards, 19 SoC families, 16 builds out of 178

TL;DR;  All is well.

> Boot Regressions Detected:
>
> arm:
>
>     exynos_defconfig:
>         exynos5422-odroidxu3:
>             lab-collabora: failing since 58 days (last pass: v4.4.95-21-g32458fcb7bd6 - first fail: v4.4.96-41-g336421367b9c)

Long-standing issue in lab-collabora (passing in other labs).  Guillaume?

>     multi_v7_defconfig:
>         armada-xp-linksys-mamba:
>             lab-free-electrons: new failure (last pass: v4.4.109-36-g8b381424010c)

Not a kernel issue, bootROM fails to start bootloader.  I pinged lab
owners (Free Electrons).

>         tegra124-nyan-big:
>             lab-collabora: failing since 1 day (last pass: v4.4.109 - first fail: v4.4.109-36-g8b381424010c)
>
>     tegra_defconfig:
>         tegra124-nyan-big:
>             lab-collabora: failing since 1 day (last pass: v4.4.108-65-g57856049c0f8 - first fail: v4.4.109)

This one is booting fine, but the command to power-off the board is
timing out, resulting in a failure report.

Kevin

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 23:45 ` Guenter Roeck
@ 2018-01-05  4:37     ` Mike Galbraith
  2018-01-05  4:37     ` Mike Galbraith
  2018-01-05 13:41   ` Greg Kroah-Hartman
  2 siblings, 0 replies; 156+ messages in thread
From: Mike Galbraith @ 2018-01-05  4:37 UTC (permalink / raw)
  To: Guenter Roeck, Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, shuahkh, patches, ben.hutchings,
	lkft-triage, stable, Tao Wu

On Thu, 2018-01-04 at 15:45 -0800, Guenter Roeck wrote:
> 
> The crash part of this problem may be solved with the following patch
> (thanks to Hugh for the hint). There is still another problem, though -
> with this patch applied, the qemu session aborts with "VCPU Shutdown
> request", whatever that means.

The crash part is not fixed by your patch here, w/wo I get this, and it
is PTI, as virgin 109 boots/works with identical everything else.  My
shiny new PTI equipped enterprise 4.4 RT kernels also boot/work fine,
which seems a bit odd.. and not particularly comforting.

[    1.244354] Freeing unused kernel memory: 1192K
[    1.245278] Write protecting the kernel read-only data: 10240k
[    1.247626] Freeing unused kernel memory: 1152K
[    1.251318] Freeing unused kernel memory: 1476K
[    1.253393] init[1]: segfault at ffffffffff5ff100 ip 00007fffb7ffac6e sp 00007fffb7fa07d8 error 5
[    1.254629] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    1.254629] 
[    1.256202] CPU: 4 PID: 1 Comm: init Not tainted 4.4.110-rc1-smp #4
[    1.257169] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[    1.258563]  0000000000000000 ffffffff8125a9c0 ffffffff817de7c8 ffff880197e83cf0
[    1.260850]  ffffffff8112bb2d ffffffff00000010 ffff880197e83d00 ffff880197e83ca0
[    1.263091]  ffffffff81c3cf30 000000000000000b ffff880197e90010 0000000000000000
[    1.264580] Call Trace:
[    1.265617]  [<ffffffff8125a9c0>] ? dump_stack+0x5c/0x7c
[    1.266671]  [<ffffffff8112bb2d>] ? panic+0xc8/0x20f
[    1.267799]  [<ffffffff81060af0>] ? do_exit+0xa50/0xa50
[    1.268971]  [<ffffffff810618e9>] ? do_group_exit+0x39/0xa0
[    1.270281]  [<ffffffff8106c8a0>] ? get_signal+0x1d0/0x600
[    1.271347]  [<ffffffff810041e3>] ? do_signal+0x23/0x5b0
[    1.272259]  [<ffffffff8106ade9>] ? __send_signal+0x179/0x460
[    1.273235]  [<ffffffff8104b88f>] ? force_sig_info_fault+0x5f/0x70
[    1.274258]  [<ffffffff8104bf6c>] ? __bad_area_nosemaphore+0x1cc/0x200
[    1.275268]  [<ffffffff8105a052>] ? exit_to_usermode_loop+0x54/0x95
[    1.276262]  [<ffffffff81001961>] ? prepare_exit_to_usermode+0x31/0x40
[    1.277266]  [<ffffffff814d9dbe>] ? retint_user+0x8/0x2c
[    1.278274] Dumping ftrace buffer:
[    1.279011]    (ftrace buffer empty)
[    1.279728] Kernel Offset: disabled
[    1.280432] Rebooting in 60 seconds..

virsh # exit
 
> 
> Guenter
> 
> ---
> From: Guenter Roeck <groeck@chromium.org>
> Date: Thu, 4 Jan 2018 13:41:55 -0800
> Subject: [PATCH 2/2] WIP: kaiser: Set _PAGE_NX only if supported
> 
> Change-Id: Ie6ab566c1d725b24c4b3aa80a47c3ff3a5feddb9
> Signed-off-by: Guenter Roeck <groeck@chromium.org>
> ---
>  arch/x86/mm/kaiser.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/mm/kaiser.c b/arch/x86/mm/kaiser.c
> index 7d2f7eb6857f..e4706273d4a1 100644
> --- a/arch/x86/mm/kaiser.c
> +++ b/arch/x86/mm/kaiser.c
> @@ -421,7 +421,8 @@ pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd)
>  			 * get out to userspace running on the kernel CR3,
>  			 * userspace will crash instead of running.
>  			 */
> -			pgd.pgd |= _PAGE_NX;
> +			if (__supported_pte_mask & _PAGE_NX)
> +				pgd.pgd |= _PAGE_NX;
>  		}
>  	} else if (!pgd.pgd) {
>  		/*

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 20:43         ` Andy Lutomirski
  2018-01-04 20:57           ` Hugh Dickins
@ 2018-01-05  5:33           ` Andy Lutomirski
  2018-01-05 10:12             ` Kees Cook
  1 sibling, 1 reply; 156+ messages in thread
From: Andy Lutomirski @ 2018-01-05  5:33 UTC (permalink / raw)
  To: Linus Torvalds, Pavel Tatashin, Kees Cook
  Cc: Thomas Voegtle, Greg Kroah-Hartman, Linux Kernel Mailing List,
	Andrew Morton, Guenter Roeck, Shuah Khan, patches, Ben Hutchings,
	lkft-triage, stable

On Thu, Jan 4, 2018 at 12:43 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
>> On Jan 4, 2018, at 12:29 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>>> On Thu, Jan 4, 2018 at 12:16 PM, Thomas Voegtle <tv@lio96.de> wrote:
>>>
>>> Attached a screenshot.
>>> Is that useful? Are there some debug options I can add?
>>
>> Not much of an oops, because the SIGSEGV happens in user space. The
>> only reason you get any kernel stack printout at all is because 'init'
>> dying will make the kernel print that out.
>>
>> The segfault address for init looks like the fixmap area to me (first
>> byte in the last page of the fixmap?). "Error 5" means that it's a
>> user-space read that got a protection fault. So it's not an LDT or GDT
>> update or anything like that, it's a normal access from user space (or
>> a qemu emulation bug, but that sounds unlikely).
>>
>> Is that the vsyscall page?
>>
>> Adding Luto to the participants. I think he noticed one of the
>> vsyscall patches missing earlier in the 4.9 series. Maybe the 4.4
>> series had something similar..
>>
>
> That's almost certainly it.
>
> I'll try to find some time today or tomorrow to add a proper selftest.
>

Give this a shot:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pti&id=17c5ebeb2e00879b0af1a9c32bf37ecdd9b9b31b

Boot with each of vsyscall=none, vsyscall=native, and vsyscall=emulate
and run both the 32-bit and 64-bit variants of that test.  All six
combinations should pass.  But I bet they don't on 4.4.
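
For reference, the "Error 5" reading quoted above follows from the x86 page-fault error code bits. Below is a minimal user-space sketch of that decoding; the helper name pf_describe() and the PF_* macro names are chosen for illustration here and are not taken from the patches in this thread, though the bit positions are the architectural ones.

```c
#include <stdio.h>
#include <stddef.h>

/* Architectural x86 page-fault error code bits (names are illustrative). */
#define PF_PROT  (1u << 0)  /* 0: page not present, 1: protection violation */
#define PF_WRITE (1u << 1)  /* 0: read access,      1: write access */
#define PF_USER  (1u << 2)  /* 0: kernel mode,      1: user mode */
#define PF_RSVD  (1u << 3)  /* 1: reserved bit set in a paging entry */
#define PF_INSTR (1u << 4)  /* 1: fault was an instruction fetch */

/* Hypothetical helper: render an error code as a short description. */
const char *pf_describe(unsigned int code, char *buf, size_t len)
{
    const char *who  = (code & PF_USER) ? "user" : "kernel";
    const char *what = (code & PF_INSTR) ? "instruction fetch"
                     : (code & PF_WRITE) ? "write" : "read";
    const char *why  = (code & PF_PROT) ? "protection violation"
                                        : "page not present";
    const char *rsvd = (code & PF_RSVD) ? ", reserved bit set" : "";

    snprintf(buf, len, "%s-mode %s, %s%s", who, what, why, rsvd);
    return buf;
}
```

With this sketch, error code 5 decodes as a user-mode read that hit a protection violation, which matches the description quoted above.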

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 22:00 ` Shuah Khan
@ 2018-01-05  7:55   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-05  7:55 UTC (permalink / raw)
  To: Shuah Khan
  Cc: linux-kernel, torvalds, akpm, linux, patches, ben.hutchings,
	lkft-triage, stable

On Thu, Jan 04, 2018 at 03:00:29PM -0700, Shuah Khan wrote:
> On 01/03/2018 01:11 PM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.4.110 release.
> > There are 37 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Fri Jan  5 19:50:38 UTC 2018.
> > Anything received after that time might be too late.
> > 
> > The whole patch series can be found in one patch at:
> > 	kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.110-rc1.gz
> > or in the git tree and branch at:
> >   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
> > and the diffstat can be found below.
> > 
> > thanks,
> > 
> > greg k-h
> > 
> 
> Based on the email threads, I expected to see issues, however,
> compiled and booted on my test system. No dmesg regressions.

Hey, you got lucky :)

Thanks for testing all of these and letting me know.

greg k-h

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05  5:33           ` Andy Lutomirski
@ 2018-01-05 10:12             ` Kees Cook
  2018-01-05 12:14               ` Greg Kroah-Hartman
  2018-01-05 13:08               ` Greg Kroah-Hartman
  0 siblings, 2 replies; 156+ messages in thread
From: Kees Cook @ 2018-01-05 10:12 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Pavel Tatashin, Thomas Voegtle,
	Greg Kroah-Hartman, Linux Kernel Mailing List, Andrew Morton,
	Guenter Roeck, Shuah Khan, patches, Ben Hutchings, lkft-triage,
	stable

On Thu, Jan 4, 2018 at 9:33 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Thu, Jan 4, 2018 at 12:43 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>>> On Jan 4, 2018, at 12:29 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>>
>>>> On Thu, Jan 4, 2018 at 12:16 PM, Thomas Voegtle <tv@lio96.de> wrote:
>>>>
>>>> Attached a screenshot.
>>>> Is that useful? Are there some debug options I can add?
>>>
>>> Not much of an oops, because the SIGSEGV happens in user space. The
>>> only reason you get any kernel stack printout at all is because 'init'
>>> dying will make the kernel print that out.
>>>
>>> The segfault address for init looks like the fixmap area to me (first
>>> byte in the last page of the fixmap?). "Error 5" means that it's a
>>> user-space read that got a protection fault. So it's not an LDT or GDT
>>> update or anything like that, it's a normal access from user space (or
>>> a qemu emulation bug, but that sounds unlikely).
>>>
>>> Is that the vsyscall page?
>>>
>>> Adding Luto to the participants. I think he noticed one of the
>>> vsyscall patches missing earlier in the 4.9 series. Maybe the 4.4
>>> series had something similar..
>>>
>>
>> That's almost certainly it.
>>
>> I'll try to find some time today or tomorrow to add a proper selftest.
>>
>
> Give this a shot:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pti&id=17c5ebeb2e00879b0af1a9c32bf37ecdd9b9b31b
>
> Boot with each of vsyscall=none, vsyscall=native, and vsyscall=emulate
> and run both the 32-bit and 64-bit variants of that test.  All six
> combinations should pass.  But I bet they don't on 4.4.

With my 4.4.110-rc1 under QEMU -cpu=host (Xeon E5-2690 v3)

vsyscall=emulate:

# ./test_vsyscall_64
...
[RUN]   Checking read access to the vsyscall page
[FAIL]  We don't have read access, but we should

vsyscall=native:

# ./test_vsyscall_64
...
[RUN]   Checking read access to the vsyscall page
[FAIL]  We don't have read access, but we should

Everything else passes.

Note that test_vsyscall_32 warns:

# ./test_vsyscall_32
Warning: failed to find getcpu in vDSO
...

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 10:12             ` Kees Cook
@ 2018-01-05 12:14               ` Greg Kroah-Hartman
  2018-01-05 13:08               ` Greg Kroah-Hartman
  1 sibling, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-05 12:14 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, Linus Torvalds, Pavel Tatashin, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable

On Fri, Jan 05, 2018 at 02:12:33AM -0800, Kees Cook wrote:
> Note that test_vsyscall_32 warns:
> 
> # ./test_vsyscall_32
> Warning: failed to find getcpu in vDSO
> ...

It gives me that warning with 4.15-rc6 as well.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05  4:37     ` Mike Galbraith
@ 2018-01-05 12:17       ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-05 12:17 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Guenter Roeck, linux-kernel, torvalds, akpm, shuahkh, patches,
	ben.hutchings, lkft-triage, stable, Tao Wu

On Fri, Jan 05, 2018 at 05:37:54AM +0100, Mike Galbraith wrote:
> On Thu, 2018-01-04 at 15:45 -0800, Guenter Roeck wrote:
> > 
> > The crash part of this problem may be solved with the following patch
> > (thanks to Hugh for the hint). There is still another problem, though -
> > with this patch applied, the qemu session aborts with "VCPU Shutdown
> > request", whatever that means.
> 
> The crash part is not fixed by your patch here, w/wo I get this, and it
> is PTI, as virgin 109 boots/works with identical everything else.  My
> shiny new PTI equipped enterprise 4.4 RT kernels also boot/work fine,
> which seems a bit odd.. and not particularly comforting.

Might I ask _what_ enterprise 4.4 kernels you are trying here?  This
should be the identical set to what is in the SLES12 tree, which worries
me a lot...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 12:17       ` Greg Kroah-Hartman
@ 2018-01-05 13:03         ` Mike Galbraith
  -1 siblings, 0 replies; 156+ messages in thread
From: Mike Galbraith @ 2018-01-05 13:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Guenter Roeck, linux-kernel, torvalds, akpm, shuahkh, patches,
	ben.hutchings, lkft-triage, stable, Tao Wu

On Fri, 2018-01-05 at 13:17 +0100, Greg Kroah-Hartman wrote:
> On Fri, Jan 05, 2018 at 05:37:54AM +0100, Mike Galbraith wrote:
> > On Thu, 2018-01-04 at 15:45 -0800, Guenter Roeck wrote:
> > > 
> > > The crash part of this problem may be solved with the following patch
> > > (thanks to Hugh for the hint). There is still another problem, though -
> > > with this patch applied, the qemu session aborts with "VCPU Shutdown
> > > request", whatever that means.
> > 
> > The crash part is not fixed by your patch here, w/wo I get this, and it
> > is PTI, as virgin 109 boots/works with identical everything else.  My
> > shiny new PTI equipped enterprise 4.4 RT kernels also boot/work fine,
> > which seems a bit odd.. and not particularly comforting.
> 
> Might I ask _what_ enterprise 4.4 kernels you are trying here?  This
> should be the identical set to what is in the SLES12 tree, which worries
> me a lot...

SLE12-SP[23]-RT, currently 4.4.104 based. Parent trees boot fine in the
vm too.  I thought perhaps it was a config difference, but seems not,
4.4.110-rc1 built with as close to enterprise config as you can get
blows up the same as my light config.

	-Mike

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 10:12             ` Kees Cook
  2018-01-05 12:14               ` Greg Kroah-Hartman
@ 2018-01-05 13:08               ` Greg Kroah-Hartman
  1 sibling, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-05 13:08 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, Linus Torvalds, Pavel Tatashin, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable

On Fri, Jan 05, 2018 at 02:12:33AM -0800, Kees Cook wrote:
> On Thu, Jan 4, 2018 at 9:33 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> > On Thu, Jan 4, 2018 at 12:43 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> >>
> >>> On Jan 4, 2018, at 12:29 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >>>
> >>>> On Thu, Jan 4, 2018 at 12:16 PM, Thomas Voegtle <tv@lio96.de> wrote:
> >>>>
> >>>> Attached a screenshot.
> >>>> Is that useful? Are there some debug options I can add?
> >>>
> >>> Not much of an oops, because the SIGSEGV happens in user space. The
> >>> only reason you get any kernel stack printout at all is because 'init'
> >>> dying will make the kernel print that out.
> >>>
> >>> The segfault address for init looks like the fixmap area to me (first
> >>> byte in the last page of the fixmap?). "Error 5" means that it's a
> >>> user-space read that got a protection fault. So it's not an LDT or GDT
> >>> update or anything like that, it's a normal access from user space (or
> >>> a qemu emulation bug, but that sounds unlikely).
> >>>
> >>> Is that the vsyscall page?
> >>>
> >>> Adding Luto to the participants. I think he noticed one of the
> >>> vsyscall patches missing earlier in the 4.9 series. Maybe the 4.4
> >>> series had something similar..
> >>>
> >>
> >> That's almost certainly it.
> >>
> >> I'll try to find some time today or tomorrow to add a proper selftest.
> >>
> >
> > Give this a shot:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pti&id=17c5ebeb2e00879b0af1a9c32bf37ecdd9b9b31b
> >
> > Boot with each of vsyscall=none, vsyscall=native, and vsyscall=emulate
> > and run both the 32-bit and 64-bit variants of that test.  All six
> > combinations should pass.  But I bet they don't on 4.4.
> 
> With my 4.4.110-rc1 under QEMU -cpu=host (Xeon E5-2690 v3)
> 
> vsyscall=emulate:
> 
> # ./test_vsyscall_64
> ...
> [RUN]   Checking read access to the vsyscall page
> [FAIL]  We don't have read access, but we should
> 
> vsyscall=native:
> 
> # ./test_vsyscall_64
> ...
> [RUN]   Checking read access to the vsyscall page
> [FAIL]  We don't have read access, but we should
> 
> Everything else passes.

I get this same error with the latest 4.9-rc tree as well, but it works
just fine on 4.15-rc6.

I'll look at the proposed patches now for this...

thanks so much for the test tool.

greg k-h

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 13:03         ` Mike Galbraith
@ 2018-01-05 13:34           ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-05 13:34 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Guenter Roeck, linux-kernel, torvalds, akpm, shuahkh, patches,
	ben.hutchings, lkft-triage, stable, Tao Wu

On Fri, Jan 05, 2018 at 02:03:17PM +0100, Mike Galbraith wrote:
> On Fri, 2018-01-05 at 13:17 +0100, Greg Kroah-Hartman wrote:
> > On Fri, Jan 05, 2018 at 05:37:54AM +0100, Mike Galbraith wrote:
> > > On Thu, 2018-01-04 at 15:45 -0800, Guenter Roeck wrote:
> > > > 
> > > > The crash part of this problem may be solved with the following patch
> > > > (thanks to Hugh for the hint). There is still another problem, though -
> > > > with this patch applied, the qemu session aborts with "VCPU Shutdown
> > > > request", whatever that means.
> > > 
> > > The crash part is not fixed by your patch here, w/wo I get this, and it
> > > is PTI, as virgin 109 boots/works with identical everything else.  My
> > > shiny new PTI equipped enterprise 4.4 RT kernels also boot/work fine,
> > > which seems a bit odd.. and not particularly comforting.
> > 
> > Might I ask _what_ enterprise 4.4 kernels you are trying here?  This
> > should be the identical set to what is in the SLES12 tree, which worries
> > me a lot...
> 
> SLE12-SP[23]-RT, currently 4.4.104 based. Parent trees boot fine in the
> vm too.  I thought perhaps it was a config difference, but seems not,
> 4.4.110-rc1 built with as close to enterprise config as you can get
> blows up the same as my light config.

Ok, we found two patches that were missing in 4.4-stable that were in
the SLES12 tree (thanks to Jamie Iles), now I only have 19k more to sift
through :)

I should probably do an "interim" release so that people can sync up to
a common place for testing more easily; dealing with patch sets and
random emails citing different git ids is not easy for anyone.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 23:45 ` Guenter Roeck
  2018-01-04 23:58   ` Linus Torvalds
  2018-01-05  4:37     ` Mike Galbraith
@ 2018-01-05 13:41   ` Greg Kroah-Hartman
  2018-01-05 17:51     ` Guenter Roeck
  2 siblings, 1 reply; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-05 13:41 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: linux-kernel, torvalds, akpm, shuahkh, patches, ben.hutchings,
	lkft-triage, stable, Tao Wu

On Thu, Jan 04, 2018 at 03:45:55PM -0800, Guenter Roeck wrote:
> On Wed, Jan 03, 2018 at 09:11:06PM +0100, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.4.110 release.
> > There are 37 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Fri Jan  5 19:50:38 UTC 2018.
> > Anything received after that time might be too late.
> > 
> 
> This is also reported to crash if loaded under qemu + haxm under windows.
> See https://www.spinics.net/lists/kernel/msg2689835.html for details.
> Here is a boot log (the log is from chromeos-4.4, but Tao Wu says that
> the same log is also seen with vanilla v4.4.110-rc1).
> 
> [    0.712750] Freeing unused kernel memory: 552K
> [    0.721821] init: Corrupted page table at address 57b029b332e0
> [    0.722761] PGD 80000000bb238067 PUD bc36a067 PMD bc369067 PTE 45d2067
> [    0.722761] Bad pagetable: 000b [#1] PREEMPT SMP 
> [    0.722761] Modules linked in:
> [    0.722761] CPU: 1 PID: 1 Comm: init Not tainted 4.4.96 #31
> [    0.722761] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
> [    0.722761] task: ffff8800bc290000 ti: ffff8800bc28c000 task.ti: ffff8800bc28c000
> [    0.722761] RIP: 0010:[<ffffffff83f4129e>]  [<ffffffff83f4129e>] __clear_user+0x42/0x67
> [    0.722761] RSP: 0000:ffff8800bc28fcf8  EFLAGS: 00010202
> [    0.722761] RAX: 0000000000000000 RBX: 00000000000001a4 RCX: 00000000000001a4
> [    0.722761] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000057b029b332e0
> [    0.722761] RBP: ffff8800bc28fd08 R08: ffff8800bc290000 R09: ffff8800bb2f4000
> [    0.722761] R10: ffff8800bc290000 R11: ffff8800bb2f4000 R12: 000057b029b332e0
> [    0.722761] R13: 0000000000000000 R14: 000057b029b33340 R15: ffff8800bb1e2a00
> [    0.722761] FS:  0000000000000000(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000
> [    0.722761] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [    0.722761] CR2: 000057b029b332e0 CR3: 00000000bb2f8000 CR4: 00000000000006e0
> [    0.722761] Stack:
> [    0.722761]  000057b029b332e0 ffff8800bb95fa80 ffff8800bc28fd18 ffffffff83f4120c
> [    0.722761]  ffff8800bc28fe18 ffffffff83e9e7a1 ffff8800bc28fd68 0000000000000000
> [    0.722761]  ffff8800bc290000 ffff8800bc290000 ffff8800bc290000 ffff8800bc290000
> [    0.722761] Call Trace:
> [    0.722761]  [<ffffffff83f4120c>] clear_user+0x2e/0x30
> [    0.722761]  [<ffffffff83e9e7a1>] load_elf_binary+0xa7f/0x18f7
> [    0.722761]  [<ffffffff83de2088>] search_binary_handler+0x86/0x19c
> [    0.722761]  [<ffffffff83de389e>] do_execveat_common.isra.26+0x909/0xf98
> [    0.722761]  [<ffffffff844febe0>] ? rest_init+0x87/0x87
> [    0.722761]  [<ffffffff83de40be>] do_execve+0x23/0x25
> [    0.722761]  [<ffffffff83c002e3>] run_init_process+0x2b/0x2d
> [    0.722761]  [<ffffffff844fec4d>] kernel_init+0x6d/0xda
> [    0.722761]  [<ffffffff84505b2f>] ret_from_fork+0x3f/0x70
> [    0.722761]  [<ffffffff844febe0>] ? rest_init+0x87/0x87
> [    0.722761] Code: 86 84 be 12 00 00 00 e8 87 0d e8 ff 66 66 90 48 89 d8 48 c1
> eb 03 4c 89 e7 83 e0 07 48 89 d9 be 08 00 00 00 31 d2 48 85 c9 74 0a <48> 89 17
> 48 01 f7 ff c9 75 f6 48 89 c1 85 c9 74 09 88 17 48 ff 
> [    0.722761] RIP  [<ffffffff83f4129e>] __clear_user+0x42/0x67
> [    0.722761]  RSP <ffff8800bc28fcf8>
> [    0.722761] ---[ end trace def703879b4ff090 ]---
> [    0.722761] BUG: sleeping function called from invalid context at /mnt/host/source/src/third_party/kernel/v4.4/kernel/locking/rwsem.c:21
> [    0.722761] in_atomic(): 0, irqs_disabled(): 1, pid: 1, name: init
> [    0.722761] CPU: 1 PID: 1 Comm: init Tainted: G      D         4.4.96 #31
> [    0.722761] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
> [    0.722761]  0000000000000086 dcb5d76098c89836 ffff8800bc28fa30 ffffffff83f34004
> [    0.722761]  ffffffff84839dc2 0000000000000015 ffff8800bc28fa40 ffffffff83d57dc9
> [    0.722761]  ffff8800bc28fa68 ffffffff83d57e6a ffffffff84a53640 0000000000000000
> [    0.722761] Call Trace:
> [    0.722761]  [<ffffffff83f34004>] dump_stack+0x4d/0x63
> [    0.722761]  [<ffffffff83d57dc9>] ___might_sleep+0x13a/0x13c
> [    0.722761]  [<ffffffff83d57e6a>] __might_sleep+0x9f/0xa6
> [    0.722761]  [<ffffffff84502788>] down_read+0x20/0x31
> [    0.722761]  [<ffffffff83cc5d9b>] __blocking_notifier_call_chain+0x35/0x63
> [    0.722761]  [<ffffffff83cc5ddd>] blocking_notifier_call_chain+0x14/0x16
> [    0.800374] usb 1-1: new full-speed USB device number 2 using uhci_hcd
> [    0.722761]  [<ffffffff83cefe97>] profile_task_exit+0x1a/0x1c
> [    0.802309]  [<ffffffff83cac84e>] do_exit+0x39/0xe7f
> [    0.802309]  [<ffffffff83ce5938>] ? vprintk_default+0x1d/0x1f
> [    0.802309]  [<ffffffff83d7bb95>] ? printk+0x57/0x73
> [    0.802309]  [<ffffffff83c46e25>] oops_end+0x80/0x85
> [    0.802309]  [<ffffffff83c7b747>] pgtable_bad+0x8a/0x95
> [    0.802309]  [<ffffffff83ca7f4a>] __do_page_fault+0x8c/0x352
> [    0.802309]  [<ffffffff83eefba5>] ? file_has_perm+0xc4/0xe5
> [    0.802309]  [<ffffffff83ca821c>] do_page_fault+0xc/0xe
> [    0.802309]  [<ffffffff84507682>] page_fault+0x22/0x30
> [    0.802309]  [<ffffffff83f4129e>] ? __clear_user+0x42/0x67
> [    0.802309]  [<ffffffff83f4127f>] ? __clear_user+0x23/0x67
> [    0.802309]  [<ffffffff83f4120c>] clear_user+0x2e/0x30
> [    0.802309]  [<ffffffff83e9e7a1>] load_elf_binary+0xa7f/0x18f7
> [    0.802309]  [<ffffffff83de2088>] search_binary_handler+0x86/0x19c
> [    0.802309]  [<ffffffff83de389e>] do_execveat_common.isra.26+0x909/0xf98
> [    0.802309]  [<ffffffff844febe0>] ? rest_init+0x87/0x87
> [    0.802309]  [<ffffffff83de40be>] do_execve+0x23/0x25
> [    0.802309]  [<ffffffff83c002e3>] run_init_process+0x2b/0x2d
> [    0.802309]  [<ffffffff844fec4d>] kernel_init+0x6d/0xda
> [    0.802309]  [<ffffffff84505b2f>] ret_from_fork+0x3f/0x70
> [    0.802309]  [<ffffffff844febe0>] ? rest_init+0x87/0x87
> [    0.830559] Kernel panic - not syncing: Attempted to kill init!  exitcode=0x00000009
> [    0.830559] 
> [    0.831305] Kernel Offset: 0x2c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [    0.831305] ---[ end Kernel panic - not syncing: Attempted to kill init!  exitcode=0x00000009
> 
> The crash part of this problem may be solved with the following patch
> (thanks to Hugh for the hint). There is still another problem, though -
> with this patch applied, the qemu session aborts with "VCPU Shutdown
> request", whatever that means.
> 
> Guenter
> 
> ---
> From: Guenter Roeck <groeck@chromium.org>
> Date: Thu, 4 Jan 2018 13:41:55 -0800
> Subject: [PATCH 2/2] WIP: kaiser: Set _PAGE_NX only if supported
> 
> Change-Id: Ie6ab566c1d725b24c4b3aa80a47c3ff3a5feddb9
> Signed-off-by: Guenter Roeck <groeck@chromium.org>
> ---
>  arch/x86/mm/kaiser.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/mm/kaiser.c b/arch/x86/mm/kaiser.c
> index 7d2f7eb6857f..e4706273d4a1 100644
> --- a/arch/x86/mm/kaiser.c
> +++ b/arch/x86/mm/kaiser.c
> @@ -421,7 +421,8 @@ pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd)
>  			 * get out to userspace running on the kernel CR3,
>  			 * userspace will crash instead of running.
>  			 */
> -			pgd.pgd |= _PAGE_NX;
> +			if (__supported_pte_mask & _PAGE_NX)
> +				pgd.pgd |= _PAGE_NX;
>  		}
>  	} else if (!pgd.pgd) {
>  		/*

Very good catch, this almost mirrors what is in 4.14 in this area.  I'll
go queue this up for the 4.9 and 4.4 stable trees.

thanks,

greg k-h
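
As a rough illustration of why the guard in the quoted patch matters: in the oops above, the PGD entry 80000000bb238067 has bit 63 set and the fault is reported as "Bad pagetable: 000b", where bit 3 of the error code indicates a reserved bit in a paging entry. On a CPU (or emulated CPU) without NX enabled, bit 63 is reserved. The sketch below models the guarded assignment in user space; shadow_pgd_apply_nx() and PAGE_NX_BIT are illustrative names, not kernel symbols.

```c
#include <stdint.h>

/* Bit 63: the position of _PAGE_NX in an x86-64 paging entry. */
#define PAGE_NX_BIT (UINT64_C(1) << 63)

/*
 * Hypothetical stand-in for the patched logic: only set NX when the
 * supported mask (modelled on __supported_pte_mask) allows it.
 */
uint64_t shadow_pgd_apply_nx(uint64_t pgd, uint64_t supported_mask)
{
    if (supported_mask & PAGE_NX_BIT)
        pgd |= PAGE_NX_BIT;
    return pgd;
}
```

With a supported mask that lacks the NX bit, the entry is returned unchanged, so no reserved bit ends up in the shadow PGD.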

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 13:34           ` Greg Kroah-Hartman
  (?)
@ 2018-01-05 14:03           ` Mike Galbraith
  2018-01-05 23:28             ` Hugh Dickins
  -1 siblings, 1 reply; 156+ messages in thread
From: Mike Galbraith @ 2018-01-05 14:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Guenter Roeck, linux-kernel, torvalds, akpm, shuahkh, patches,
	ben.hutchings, lkft-triage, stable, Tao Wu

On Fri, 2018-01-05 at 14:34 +0100, Greg Kroah-Hartman wrote:
> 
> Ok, we found two patches that were missing in 4.4-stable that were in
> the SLES12 tree (thanks to Jamie Iles), now I only have 19k more to sift
> through :)

As you know, in enterprise, uname -r means you might find something
this old in your kernel if you look hard enough :)

	-Mike

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 19:38 ` Thomas Voegtle
  2018-01-04 19:50   ` Greg Kroah-Hartman
  2018-01-04 20:10   ` Guenter Roeck
@ 2018-01-05 14:58   ` Greg Kroah-Hartman
  2018-01-05 15:25     ` Thomas Voegtle
  2 siblings, 1 reply; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-05 14:58 UTC (permalink / raw)
  To: Thomas Voegtle
  Cc: linux-kernel, torvalds, akpm, linux, shuahkh, patches,
	ben.hutchings, lkft-triage, stable

On Thu, Jan 04, 2018 at 08:38:23PM +0100, Thomas Voegtle wrote:
> 
> When I start 4.4.110-rc1 on a virtual machine (qemu) init throws a
> segfault and the kernel panics (attempted to kill init).
> The VM host is a Haswell system.
> 
> The same kernel binary boots fine on a (other) Haswell system.
> 
> I tried:
> 
> 4.4.110-rc1     broken
> 4.4.109         ok
> 4.9.75-rc1      ok
> 
> All systems are OpenSuSE 42.3 64bit.
> 
> qemu is started only with:
> qemu-system-x86_64 -m 2048 -enable-kvm  -drive
> file=tvsuse,format=raw,if=none,id=virtdisk0 -device
> virtio-blk-pci,scsi=off,drive=virtdisk0
> 
> Am I the only one who sees this? Has anyone booted that kernel on qemu?

I've now released 4.4.110, which had 4 more patches on top of what
4.4.110-rc1 had in it, that should hopefully resolve these issues.

Can you test that and let me know if you still have problems?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 21:48                 ` Pavel Tatashin
  2018-01-04 22:33                   ` Linus Torvalds
@ 2018-01-05 14:59                   ` Greg Kroah-Hartman
  2018-01-05 15:32                     ` Pavel Tatashin
  1 sibling, 1 reply; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-05 14:59 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Hugh Dickins, Andy Lutomirski, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable

On Thu, Jan 04, 2018 at 04:48:48PM -0500, Pavel Tatashin wrote:
> [    6.159992] Code: 89 83 78 06 01 00 b8 01 00 00 00 5b 41 5c 41 5d
> 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 31 d2 48 8b 87 c8 00 00
> 00 48 89 e5 <f0> 0f c1 50 0c 89 97 d0 00 00 00 83 e2 01 b8 01 00 00 00
> 74 1d
> 
> Also, attached is the full console output.

Ick, like the others, I have no idea what happened here.

But, can you test 4.4.110 now?  It has 4 more patches on top of what
you were testing with here for 4.4.110-rc1, that hopefully should
resolve this type of issue.

And if not, it would be good for us to know :)

thanks so much for testing,

greg k-h

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-04 17:56             ` Guenter Roeck
@ 2018-01-05 15:00               ` Greg Kroah-Hartman
  2018-01-05 18:12                 ` Guenter Roeck
  0 siblings, 1 reply; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-05 15:00 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Willy Tarreau, Pavel Tatashin, Jiri Kosina, Hugh Dickins,
	Dave Hansen, linux-kernel, torvalds, Andrew Morton, shuahkh,
	patches, ben.hutchings, lkft-triage, stable

On Thu, Jan 04, 2018 at 09:56:47AM -0800, Guenter Roeck wrote:
> On Thu, Jan 04, 2018 at 06:16:04PM +0100, Greg Kroah-Hartman wrote:
> > On Thu, Jan 04, 2018 at 06:14:15PM +0100, Greg Kroah-Hartman wrote:
> > > On Thu, Jan 04, 2018 at 06:11:02PM +0100, Greg Kroah-Hartman wrote:
> > > > On Thu, Jan 04, 2018 at 06:03:15PM +0100, Willy Tarreau wrote:
> > > > > On Thu, Jan 04, 2018 at 05:53:06PM +0100, Greg Kroah-Hartman wrote:
> > > > > > On Thu, Jan 04, 2018 at 11:38:25AM -0500, Pavel Tatashin wrote:
> > > > > > > I am getting the following panic when trying to boot 4.4.110rc1 on
> > > > > > > Intel(R) Xeon(R) CPU E5-2630:
> > > > > > > 
> > > > > > > [    5.923489] BUG: unable to handle kernel NULL pointer dereference
> > > > > > > at 000000000000000d
> > > > > > > [    5.932259] IP: [<ffffffff810e70d2>] dyntick_save_progress_counter+0x12/0x50
> > > > > > > [    5.940142] PGD 0
> > > > > > > [    5.942400] Oops: 0002 [#1] SMP
> > > > > > > [    5.946023] Modules linked in:
> > > > > > > [    5.949448] CPU: 5 PID: 8 Comm: rcu_sched Not tainted
> > > > > > > 4.4.110-rc1_pt_linux-4.4.110rc1 #1
> > > > > > > [    5.958484] Hardware name: Oracle Corporation ORACLE SERVER
> > > > > > > X6-2/ASM,MOTHERBOARD,1U, BIOS 38050100 08/30/2016
> > > > > > > [    5.969552] task: ffff881ff2f1ab00 ti: ffff881ff2f24000 task.ti:
> > > > > > > ffff881ff2f24000
> > > > > > > [    5.977905] RIP: 0010:[<ffffffff810e70d2>]  [<ffffffff810e70d2>]
> > > > > > > dyntick_save_progress_counter+0x12/0x50
> > > > > > > [    5.988505] RSP: 0000:ffff881ff2f27dc0  EFLAGS: 00010046
> > > > > > > [    5.994434] RAX: 0000000000000001 RBX: ffffffff81b02140 RCX: ffff883fec768000
> > > > > > > [    6.002403] RDX: 0000000000000000 RSI: ffff881ff2f27e5f RDI: ffff88407e958140
> > > > > > > [    6.010368] RBP: ffff881ff2f27dc0 R08: ffff881ff2f27e78 R09: 000000016110f359
> > > > > > > [    6.018333] R10: 0000000000000b10 R11: 0000000000000000 R12: ffffffff81b02140
> > > > > > > [    6.026297] R13: 00000000ffffffdf R14: 0000000000000021 R15: 0000000200000000
> > > > > > > [    6.034262] FS:  0000000000000000(0000) GS:ffff881fff940000(0000)
> > > > > > > knlGS:0000000000000000
> > > > > > > [    6.043293] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > [    6.049707] CR2: 000000000000000d CR3: 0000000001aa6000 CR4: 0000000000360670
> > > > > > > [    6.057672] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > > [    6.065638] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > > [    6.073603] Stack:
> > > > > > > [    6.075847]  ffff881ff2f27e18 ffffffff810e8fac 0000000000000202
> > > > > > > ffff881ff2f27e60
> > > > > > > [    6.084158]  ffff881ff2f27e5f ffffffff810e70c0 ffffffff81b02140
> > > > > > > ffffffff81b127a0
> > > > > > > [    6.092465]  0000000000000001 0000000000000000 0000000000000003
> > > > > > > ffff881ff2f27eb8
> > > > > > > [    6.100768] Call Trace:
> > > > > > > [    6.103501]  [<ffffffff810e8fac>] force_qs_rnp+0xdc/0x150
> > > > > > > [    6.109527]  [<ffffffff810e70c0>] ? rcu_start_gp+0x70/0x70
> > > > > > > [    6.115654]  [<ffffffff810ea118>] rcu_gp_kthread+0x468/0x9b0
> > > > > > > [    6.121976]  [<ffffffff810c9190>] ? prepare_to_wait_event+0xf0/0xf0
> > > > > > > [    6.128973]  [<ffffffff810e9cb0>] ? rcu_process_callbacks+0x5f0/0x5f0
> > > > > > > [    6.136167]  [<ffffffff810a4a25>] kthread+0xe5/0x100
> > > > > > > [    6.141710]  [<ffffffff810a4940>] ? kthread_park+0x60/0x60
> > > > > > > [    6.147840]  [<ffffffff81714e8f>] ret_from_fork+0x3f/0x70
> > > > > > > [    6.153868]  [<ffffffff810a4940>] ? kthread_park+0x60/0x60
> > > > > > > 
> > > > > > > I tried to bisect the problem, but when I try to boot only with:
> > > > > > > "KAISER: Kernel Address Isolation" machine hangs during boot and
> > > > > > > reboots without any panic message.
> > > > > > > 
> > > > > > > 4.4.109 boots fine
> > > > > > > 4.9.75rc1 also boots fine.
> > > > > > 
> > > > > > Hm, so I'm guessing 4.15-rc6 also works?
> > > > > > 
> > > > > > Odd that 4.9.75-rc1 fails.
> > > > > 
> > > > > s/4.9.75/4.4.110/ I suppose.
> > > > 
> > > > Yes, mistake on my side.
> > > > 
> > > > > Can't this be because more patches are required in 4.4 to support this
> > > > > patch set ? Or maybe a manual fix for a conflict that went wrong ? Just
> > > > > trying to guess.
> > > > 
> > > > Odd thing is, the 4.9 series started from the 4.4 code for most of the
> > > > patches, so I would expect that one to fail...
> > > 
> > > Also, the 4.4 patches were supposed to have been better tested, I need
> > > to go dig and see what I messed up here...
> > 
> > Nope, it matches up with what is in SLES12 exactly, I must be missing
> > something else here as a prerequisite...
> 
> FWIW, v4.4.110-rc1 boots fine when merged into chromeos-4.4, on i7-7Y75.

That's good to know, hopefully 4.4.110-final also still works for you :)
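
As a side note on the trace quoted above: the "Oops: 0002" value is the
x86 page-fault error code, and decoding it at least tells us what kind
of access faulted.  A minimal sketch, assuming the usual bit layout
(bit 0 = protection violation vs. page not present, bit 1 = write,
bit 2 = user mode, bit 3 = reserved bit, bit 4 = instruction fetch);
the helper name decode_oops_code is made up for illustration:

```python
# Assumed x86 page-fault error code layout: (mask, meaning if set, meaning if clear).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(code_str):
    """Decode the hex value printed after 'Oops:' into readable properties."""
    code = int(code_str, 16)
    properties = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            properties.append(if_set)
        elif if_clear is not None:
            properties.append(if_clear)
    return properties


print(decode_oops_code("0002"))
# prints ['page not present', 'write access', 'kernel mode']
```

For 0002 that is a kernel-mode write to a page that is not present,
which is consistent with the low CR2 address (000000000000000d) being
reported as a NULL pointer dereference.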

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 14:58   ` Greg Kroah-Hartman
@ 2018-01-05 15:25     ` Thomas Voegtle
  2018-01-05 15:48       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 156+ messages in thread
From: Thomas Voegtle @ 2018-01-05 15:25 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, linux, shuahkh, patches,
	ben.hutchings, lkft-triage, stable

On Fri, 5 Jan 2018, Greg Kroah-Hartman wrote:

> On Thu, Jan 04, 2018 at 08:38:23PM +0100, Thomas Voegtle wrote:
>>
>> When I start 4.4.110-rc1 on a virtual machine (qemu) init throws a
>> segfault and the kernel panics (attempted to kill init).
>> The VM host is a Haswell system.
>>
>> The same kernel binary boots fine on another Haswell system.
>>
>> I tried:
>>
>> 4.4.110-rc1     broken
>> 4.4.109         ok
>> 4.9.75-rc1      ok
>>
>> All systems are OpenSuSE 42.3 64bit.
>>
>> qemu is started only with:
>> qemu-system-x86_64 -m 2048 -enable-kvm  -drive
>> file=tvsuse,format=raw,if=none,id=virtdisk0 -device
>> virtio-blk-pci,scsi=off,drive=virtdisk0
>>
>> Am I the only one who sees this? Has anyone booted that kernel on qemu?
>
> I've now released 4.4.110, which had 4 more patches on top of what
> 4.4.110-rc1 had in it, that should hopefully resolve these issues.
>
> Can you test that and let me know if you still have problems?

It's fixed. I can boot 4.4.110 on qemu without problems so far.

./test_vsyscall_64 still fails though, like Kees wrote about 4.4.110-rc1
https://lkml.org/lkml/2018/1/5/123

That's another issue?


  Thank you very much.

       Thomas

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 14:59                   ` Greg Kroah-Hartman
@ 2018-01-05 15:32                     ` Pavel Tatashin
  2018-01-05 15:51                       ` Greg Kroah-Hartman
  2018-01-05 16:57                       ` Andy Lutomirski
  0 siblings, 2 replies; 156+ messages in thread
From: Pavel Tatashin @ 2018-01-05 15:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Hugh Dickins, Andy Lutomirski, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable

Hi Greg,

Just tested on my machine:
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 4.4.110_pt_linux-v4.4.110 (ptatashi@ca-ostest441) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Fri Jan 5 07:22:34 PST 2018
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.4.110_pt_linux-v4.4.110 root=UUID=fe908085-0117-442b-a57c-ce651cc95b38 ro crashkernel=auto console=ttyS0,115200 LANG=en_US.UTF-8
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers'
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
<cut>
[    3.457106] hub 1-0:1.0: USB hub found
[    3.461298] hub 1-0:1.0: 2 ports detected
[    3.466173] ehci-pci 0000:00:1d.0: EHCI Host Controller
[    3.472111] ehci-pci 0000:00:1d.0: new USB bus registered, assigned bus number 2
[    3.480381] ehci-pci 0000:00:1d.0: debug port 2
[    3.489571] ehci-pci 0000:00:1d.0: irq 18, io mem 0xc7101000
[    3.501393] ehci-pci 0000:00:1d.0: USB 2.0 started, EHCI 1.00
[    3.507855] usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
[    3.515436] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    3.523500] usb usb2: Product: EHCI Host Controller
[    3.528947] usb usb2: Manufacturer: Linux 4.4.110_pt_linux-v4.4.110 ehci_hcd
[    3.536816] usb usb2: SerialNumber: 0000:00:1d.0
[    3.542107] hub 2-0:1.0: USB hub found
[    3.546301] hub 2-0:1.0: 2 ports detected
[    3.550942] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[    3.557854] ohci-pci: OHCI PCI platform driver
[    3.562844] uhci_hcd: USB Universal Host Controller Interface driver
[    3.570032] usbcore: registered new interface driver usbserial
[    3.576550] usbcore: registered new interface driver usbserial_generic
[    3.583844] usbserial: USB Serial support registered for generic
[    3.590570] i8042: PNP: No PS/2 controller found. Probing ports directly.
[    3.995383] tsc: Refined TSC clocksource calibration: 2195.099 MHz
[    4.002289] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1fa41d170d9, max_idle_ns: 440795288527 ns
[    4.046414] usb 2-1: new high-speed USB device number 2 using ehci-pci
[    4.174758] usb 2-1: New USB device found, idVendor=8087, idProduct=8002
[    4.182245] usb 2-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    4.190382] hub 2-1:1.0: USB hub found
[    4.194609] hub 2-1:1.0: 8 ports detected
[    4.637363] i8042: No controller found
[    4.641646] mousedev: PS/2 mouse device common for all mice
[    4.648117] rtc_cmos 00:00: RTC can wake from S4
[    4.653447] rtc_cmos 00:00: rtc core: registered rtc_cmos as rtc0
[    4.660272] rtc_cmos 00:00: alarms up to one month, y3k, 114 bytes nvram, hpet irqs
[    4.669050] Intel P-state driver initializing.
[    4.676630] EFI Variables Facility v0.08 2004-May-17
<hangs here>
Reboots after about 30 seconds.

Boots fine with nopti option.

Thank you,
Pavel

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 15:25     ` Thomas Voegtle
@ 2018-01-05 15:48       ` Greg Kroah-Hartman
  0 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-05 15:48 UTC (permalink / raw)
  To: Thomas Voegtle
  Cc: linux-kernel, torvalds, akpm, linux, shuahkh, patches,
	ben.hutchings, lkft-triage, stable

On Fri, Jan 05, 2018 at 04:25:33PM +0100, Thomas Voegtle wrote:
> On Fri, 5 Jan 2018, Greg Kroah-Hartman wrote:
> 
> > On Thu, Jan 04, 2018 at 08:38:23PM +0100, Thomas Voegtle wrote:
> > > 
> > > When I start 4.4.110-rc1 on a virtual machine (qemu) init throws a
> > > segfault and the kernel panics (attempted to kill init).
> > > The VM host is a Haswell system.
> > > 
> > > The same kernel binary boots fine on another Haswell system.
> > > 
> > > I tried:
> > > 
> > > 4.4.110-rc1     broken
> > > 4.4.109         ok
> > > 4.9.75-rc1      ok
> > > 
> > > All systems are OpenSuSE 42.3 64bit.
> > > 
> > > qemu is started only with:
> > > qemu-system-x86_64 -m 2048 -enable-kvm  -drive
> > > file=tvsuse,format=raw,if=none,id=virtdisk0 -device
> > > virtio-blk-pci,scsi=off,drive=virtdisk0
> > > 
> > > Am I the only one who sees this? Has anyone booted that kernel on qemu?
> > 
> > I've now released 4.4.110, which had 4 more patches on top of what
> > 4.4.110-rc1 had in it, that should hopefully resolve these issues.
> > 
> > Can you test that and let me know if you still have problems?
> 
> It's fixed. I can boot 4.4.110 on qemu without problems so far.

Yeah!!!

> ./test_vsyscall_64 still fails though, like Kees wrote about 4.4.110-rc1
> https://lkml.org/lkml/2018/1/5/123
> 
> That's another issue?

Yes it is, that's next up to get resolved.

thanks for testing and letting me know,

greg k-h

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 15:32                     ` Pavel Tatashin
@ 2018-01-05 15:51                       ` Greg Kroah-Hartman
  2018-01-05 15:57                         ` Willy Tarreau
  2018-01-05 16:26                         ` Pavel Tatashin
  2018-01-05 16:57                       ` Andy Lutomirski
  1 sibling, 2 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-05 15:51 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Hugh Dickins, Andy Lutomirski, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable

On Fri, Jan 05, 2018 at 10:32:49AM -0500, Pavel Tatashin wrote:
> Hi Greg,
> 
> Just tested on my machine:
> [    0.000000] Initializing cgroup subsys cpuset
> [    0.000000] Initializing cgroup subsys cpu
> [    0.000000] Initializing cgroup subsys cpuacct
> [    0.000000] Linux version 4.4.110_pt_linux-v4.4.110 (ptatashi@ca-ostest441) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Fri Jan 5 07:22:34 PST 2018
> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.4.110_pt_linux-v4.4.110 root=UUID=fe908085-0117-442b-a57c-ce651cc95b38 ro crashkernel=auto console=ttyS0,115200 LANG=en_US.UTF-8
> [    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
> [    0.000000] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point registers'
> [    0.000000] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers'
> [    0.000000] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers'
> [    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
> <cut>
> [    3.457106] hub 1-0:1.0: USB hub found
> [    3.461298] hub 1-0:1.0: 2 ports detected
> [    3.466173] ehci-pci 0000:00:1d.0: EHCI Host Controller
> [    3.472111] ehci-pci 0000:00:1d.0: new USB bus registered, assigned bus number 2
> [    3.480381] ehci-pci 0000:00:1d.0: debug port 2
> [    3.489571] ehci-pci 0000:00:1d.0: irq 18, io mem 0xc7101000
> [    3.501393] ehci-pci 0000:00:1d.0: USB 2.0 started, EHCI 1.00
> [    3.507855] usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
> [    3.515436] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
> [    3.523500] usb usb2: Product: EHCI Host Controller
> [    3.528947] usb usb2: Manufacturer: Linux 4.4.110_pt_linux-v4.4.110 ehci_hcd
> [    3.536816] usb usb2: SerialNumber: 0000:00:1d.0
> [    3.542107] hub 2-0:1.0: USB hub found
> [    3.546301] hub 2-0:1.0: 2 ports detected
> [    3.550942] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
> [    3.557854] ohci-pci: OHCI PCI platform driver
> [    3.562844] uhci_hcd: USB Universal Host Controller Interface driver
> [    3.570032] usbcore: registered new interface driver usbserial
> [    3.576550] usbcore: registered new interface driver usbserial_generic
> [    3.583844] usbserial: USB Serial support registered for generic
> [    3.590570] i8042: PNP: No PS/2 controller found. Probing ports directly.
> [    3.995383] tsc: Refined TSC clocksource calibration: 2195.099 MHz
> [    4.002289] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1fa41d170d9, max_idle_ns: 440795288527 ns
> [    4.046414] usb 2-1: new high-speed USB device number 2 using ehci-pci
> [    4.174758] usb 2-1: New USB device found, idVendor=8087, idProduct=8002
> [    4.182245] usb 2-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
> [    4.190382] hub 2-1:1.0: USB hub found
> [    4.194609] hub 2-1:1.0: 8 ports detected
> [    4.637363] i8042: No controller found
> [    4.641646] mousedev: PS/2 mouse device common for all mice
> [    4.648117] rtc_cmos 00:00: RTC can wake from S4
> [    4.653447] rtc_cmos 00:00: rtc core: registered rtc_cmos as rtc0
> [    4.660272] rtc_cmos 00:00: alarms up to one month, y3k, 114 bytes nvram, hpet irqs
> [    4.669050] Intel P-state driver initializing.
> [    4.676630] EFI Variables Facility v0.08 2004-May-17
> <hangs here>
> Reboots after about 30 seconds.
> 
> Boots fine with nopti option.

Crap.

And 4.9.75 works for you just fine?  Same with 4.15-rc6?

I'm wondering if this is some crazy gcc thing, given the ancient age of
what you are using (gcc 4.8.5).  I haven't used 4.x in many many years,
is this what comes with RHEL6?  What is the "base" distro you are
building this on, and anything special about the hardware being used
here?

Or is this a virtual machine?  I've been seeing too many different
crashes lately to keep them all straight, sorry...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 15:51                       ` Greg Kroah-Hartman
@ 2018-01-05 15:57                         ` Willy Tarreau
  2018-01-05 18:01                           ` Greg Kroah-Hartman
  2018-01-05 16:26                         ` Pavel Tatashin
  1 sibling, 1 reply; 156+ messages in thread
From: Willy Tarreau @ 2018-01-05 15:57 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Pavel Tatashin, Hugh Dickins, Andy Lutomirski, Linus Torvalds,
	Thomas Voegtle, Linux Kernel Mailing List, Andrew Morton,
	Guenter Roeck, Shuah Khan, patches, Ben Hutchings, lkft-triage,
	stable

On Fri, Jan 05, 2018 at 04:51:32PM +0100, Greg Kroah-Hartman wrote:
> On Fri, Jan 05, 2018 at 10:32:49AM -0500, Pavel Tatashin wrote:
(...)
> > Reboots after about 30 seconds.
> > 
> > Boots fine with nopti option.
> 
> Crap.
> 
> And 4.9.75 works for you just fine?  Same with 4.15-rc6?
> 
> I'm wondering if this is some crazy gcc thing, given the ancient age of
> what you are using (gcc 4.8.5).  I haven't used 4.x in many many years,
> is this what comes with RHEL6?  What is the "base" distro you are
> building this on, and anything special about the hardware being used
> here?

I don't think so, I'm personally building with 4.7.4 and am not seeing
this with 4.4.110.

Willy

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 15:51                       ` Greg Kroah-Hartman
  2018-01-05 15:57                         ` Willy Tarreau
@ 2018-01-05 16:26                         ` Pavel Tatashin
  1 sibling, 0 replies; 156+ messages in thread
From: Pavel Tatashin @ 2018-01-05 16:26 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Hugh Dickins, Andy Lutomirski, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable

>
> Crap.
>
> And 4.9.75 works for you just fine?  Same with 4.15-rc6?

4.15-rc6  -> Rebooted twice no issues.
4.9.75 -> Rebooted twice no issues
4.4.110 -> hangs/reboots on every single reboot.

>
> I'm wondering if this is some crazy gcc thing, given the ancient age of
> what you are using (gcc 4.8.5).  I haven't used 4.x in many many years,
> is this what comes with RHEL6?  What is the "base" distro you are
> building this on, and anything special about the hardware being used
> here?

Oracle Linux 7.3

[root@ca-ostest441 ~]# cat /etc/oracle-release
Oracle Linux Server release 7.3
[root@ca-ostest441 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.3 (Maipo)

> Or is this a virtual machine?  I've been seeing too many different
> crashes lately to keep them all straight, sorry...

This is a physical machine. No special devices attached:
http://www.oracle.com/us/products/servers/x6-2datasheet-2900789.pdf

[root@ca-ostest441 ~]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                40
On-line CPU(s) list:   0-39
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
Stepping:              1
CPU MHz:               2394.586
BogoMIPS:              4390.22
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0-9,20-29
NUMA node1 CPU(s):     10-19,30-39

[root@ca-ostest441 ~]# free -h
              total        used        free      shared  buff/cache   available
Mem:           251G        2.7G        241G        9.1M        7.8G        247G
Swap:          4.0G          0B        4.0G

>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 15:32                     ` Pavel Tatashin
  2018-01-05 15:51                       ` Greg Kroah-Hartman
@ 2018-01-05 16:57                       ` Andy Lutomirski
  2018-01-05 17:14                         ` Pavel Tatashin
  1 sibling, 1 reply; 156+ messages in thread
From: Andy Lutomirski @ 2018-01-05 16:57 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Greg Kroah-Hartman, Hugh Dickins, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable



> On Jan 5, 2018, at 7:32 AM, Pavel Tatashin <pasha.tatashin@oracle.com> wrote:
> 
> Hi Greg,
> 
> Just tested on my machine:
> [    0.000000] Initializing cgroup subsys cpuset 
> [    0.000000] Initializing cgroup subsys cpu 
> [    0.000000] Initializing cgroup subsys cpuacct 
> [    0.000000] Linux version 4.4.110_pt_linux-v4.4.110 (ptatashi@ca-ostest441) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Fri Jan 5 07:22:34 PST 2018
> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.4.110_pt_linux-v4.4.110 root=UUID=fe908085-0117-442b-a57c-ce651cc95b38 ro crashkernel=auto console=ttyS0,115200 LANG=en_US.UTF-8
> [    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256 
> [    0.000000] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point registers'
> [    0.000000] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers' 
> [    0.000000] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers' 
> [    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format. 
> <cut>
> [    3.457106] hub 1-0:1.0: USB hub found 
> [    3.461298] hub 1-0:1.0: 2 ports detected 
> [    3.466173] ehci-pci 0000:00:1d.0: EHCI Host Controller 
> [    3.472111] ehci-pci 0000:00:1d.0: new USB bus registered, assigned bus number 2
> [    3.480381] ehci-pci 0000:00:1d.0: debug port 2 
> [    3.489571] ehci-pci 0000:00:1d.0: irq 18, io mem 0xc7101000 
> [    3.501393] ehci-pci 0000:00:1d.0: USB 2.0 started, EHCI 1.00 
> [    3.507855] usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
> [    3.515436] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
> [    3.523500] usb usb2: Product: EHCI Host Controller 
> [    3.528947] usb usb2: Manufacturer: Linux 4.4.110_pt_linux-v4.4.110 ehci_hcd
> [    3.536816] usb usb2: SerialNumber: 0000:00:1d.0 
> [    3.542107] hub 2-0:1.0: USB hub found 
> [    3.546301] hub 2-0:1.0: 2 ports detected 
> [    3.550942] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver 
> [    3.557854] ohci-pci: OHCI PCI platform driver 
> [    3.562844] uhci_hcd: USB Universal Host Controller Interface driver 
> [    3.570032] usbcore: registered new interface driver usbserial 
> [    3.576550] usbcore: registered new interface driver usbserial_generic
> [    3.583844] usbserial: USB Serial support registered for generic 
> [    3.590570] i8042: PNP: No PS/2 controller found. Probing ports directly.
> [    3.995383] tsc: Refined TSC clocksource calibration: 2195.099 MHz 
> [    4.002289] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1fa41d170d9, max_idle_ns: 440795288527 ns 
> [    4.046414] usb 2-1: new high-speed USB device number 2 using ehci-pci
> [    4.174758] usb 2-1: New USB device found, idVendor=8087, idProduct=8002
> [    4.182245] usb 2-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
> [    4.190382] hub 2-1:1.0: USB hub found 
> [    4.194609] hub 2-1:1.0: 8 ports detected 
> [    4.637363] i8042: No controller found 
> [    4.641646] mousedev: PS/2 mouse device common for all mice 
> [    4.648117] rtc_cmos 00:00: RTC can wake from S4 
> [    4.653447] rtc_cmos 00:00: rtc core: registered rtc_cmos as rtc0 
> [    4.660272] rtc_cmos 00:00: alarms up to one month, y3k, 114 bytes nvram, hpet irqs
> [    4.669050] Intel P-state driver initializing. 
> [    4.676630] EFI Variables Facility v0.08 2004-May-17
> <hangs here>
> Reboots after about 30 seconds.
> 

This looks like the KVM RSM issue.  When you manage to run a buggy configuration (KVM + OVMF with secure boot support in the host, PCID (PTI or otherwise) and SMP in the guest), the first EFI call after AP bringup dies.

The actual failure is nasty.  When one CPU calls into EFI, all the other CPUs die -- they enter SMM and they don't come back out correctly.  I think the best the guest could do is to try to generate a useful printk if this happens.

Update your host.

> Boots fine with nopti option.
> 
> Thank you,
> Pavel

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 16:57                       ` Andy Lutomirski
@ 2018-01-05 17:14                         ` Pavel Tatashin
  2018-01-05 17:43                           ` Andy Lutomirski
  0 siblings, 1 reply; 156+ messages in thread
From: Pavel Tatashin @ 2018-01-05 17:14 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Greg Kroah-Hartman, Hugh Dickins, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable

Hi Andy,

This is bare metal, not a VM; see my other email in this thread about
the machine on which I am testing. Sometimes the hang happens a little
later:

[    5.088948] microcode: CPU36 sig=0x406f1, pf=0x1, revision=0xb00001d
[    5.096076] microcode: CPU37 sig=0x406f1, pf=0x1, revision=0xb00001d
[    5.103206] microcode: CPU38 sig=0x406f1, pf=0x1, revision=0xb00001d
[    5.110326] microcode: CPU39 sig=0x406f1, pf=0x1, revision=0xb00001d
[    5.117467] microcode: Microcode Update Driver: v2.01
<tigran@aivazian.fsnet.co.uk>, Peter Oruba
[    5.127476] registered taskstats version 1
[    5.132058] Loading compiled-in X.509 certificates
[    5.138206] Loaded X.509 cert 'Build time autogenerated kernel key:
26871d9e2c53359981a91797284d4f630796d8cf'
[    5.149337] zswap: loaded using pool lzo/zbud
[    5.154215] page_owner is disabled
[    5.161468] Key type trusted registered
[    5.169226] Key type encrypted registered
[    5.173719] ima: No TPM chip found, activating TPM-bypass!
[    5.179918] evm: HMAC attrs: 0x1
[    5.184958] rtc_cmos 00:00: setting system clock to 2018-01-05
15:40:45 UTC (1515166845)
[    5.196099] Freeing unused kernel memory: 1856K
<hang / reboot here>

Thank you,
Pavel

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (44 preceding siblings ...)
  2018-01-04 23:45 ` Guenter Roeck
@ 2018-01-05 17:20 ` Alice Ferrazzi
  2018-01-05 18:01   ` Greg Kroah-Hartman
  2018-01-05 17:56 ` Guenter Roeck
  46 siblings, 1 reply; 156+ messages in thread
From: Alice Ferrazzi @ 2018-01-05 17:20 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, Guenter Roeck, shuahkh, patches,
	Ben Hutchings, lkft-triage, stable

On Thu, Jan 4, 2018 at 5:11 AM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> This is the start of the stable review cycle for the 4.4.110 release.
> There are 37 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Jan  5 19:50:38 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
>         kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.110-rc1.gz
> or in the git tree and branch at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
> and the diffstat can be found below.
>

This patchset merges correctly with the Gentoo patches and GCC version 6.4.0.
The kernel boots up correctly.
Logs: http://kernel1.amd64.dev.gentoo.org:8010/#/builders/5/builds/44

-- 
Thanks,
Alice Ferrazzi

Gentoo Kernel Project Leader
Gentoo Foundation Board Member
Mail: Alice Ferrazzi <alicef@gentoo.org>
PGP: 2E4E 0856 461C 0585 1336 F496 5621 A6B2 8638 781A

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 17:14                         ` Pavel Tatashin
@ 2018-01-05 17:43                           ` Andy Lutomirski
  2018-01-05 17:48                             ` Pavel Tatashin
  0 siblings, 1 reply; 156+ messages in thread
From: Andy Lutomirski @ 2018-01-05 17:43 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Greg Kroah-Hartman, Hugh Dickins, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable


> On Jan 5, 2018, at 9:14 AM, Pavel Tatashin <pasha.tatashin@oracle.com> wrote:
> 
> Hi Andy,
> 
> This is bare metal, not a VM; see my other email in this thread about
> the machine on which I am testing. Sometimes the hang happens a little
> later:
> 
> [    5.088948] microcode: CPU36 sig=0x406f1, pf=0x1, revision=0xb00001d
> [    5.096076] microcode: CPU37 sig=0x406f1, pf=0x1, revision=0xb00001d
> [    5.103206] microcode: CPU38 sig=0x406f1, pf=0x1, revision=0xb00001d
> [    5.110326] microcode: CPU39 sig=0x406f1, pf=0x1, revision=0xb00001d
> [    5.117467] microcode: Microcode Update Driver: v2.01
> <tigran@aivazian.fsnet.co.uk>, Peter Oruba
> [    5.127476] registered taskstats version 1
> [    5.132058] Loading compiled-in X.509 certificates
> [    5.138206] Loaded X.509 cert 'Build time autogenerated kernel key:
> 26871d9e2c53359981a91797284d4f630796d8cf'
> [    5.149337] zswap: loaded using pool lzo/zbud
> [    5.154215] page_owner is disabled
> [    5.161468] Key type trusted registered
> [    5.169226] Key type encrypted registered
> [    5.173719] ima: No TPM chip found, activating TPM-bypass!
> [    5.179918] evm: HMAC attrs: 0x1
> [    5.184958] rtc_cmos 00:00: setting system clock to 2018-01-05
> 15:40:45 UTC (1515166845)
> [    5.196099] Freeing unused kernel memory: 1856K
> <hang / reboot here>

Gah, too many emails.

Someone probably just needs to look at the EFI thunk code.  Does it boot if you disable EFI support in the kernel (noefi boot option, I think, or maybe just compile it out)?



> 
> Thank you,
> Pavel

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 17:43                           ` Andy Lutomirski
@ 2018-01-05 17:48                             ` Pavel Tatashin
  2018-01-05 17:52                               ` Greg Kroah-Hartman
  0 siblings, 1 reply; 156+ messages in thread
From: Pavel Tatashin @ 2018-01-05 17:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Greg Kroah-Hartman, Hugh Dickins, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable

Boots successfully with "noefi" kernel parameter :)

On Fri, Jan 5, 2018 at 12:43 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
>> On Jan 5, 2018, at 9:14 AM, Pavel Tatashin <pasha.tatashin@oracle.com> wrote:
>>
>> Hi Andy,
>>
>> This is bare metal, not a VM; read my other email in this thread about
>> the machine on which I am testing. Sometimes the hang happens a little
>> later:
>>
>> [    5.088948] microcode: CPU36 sig=0x406f1, pf=0x1, revision=0xb00001d
>> [    5.096076] microcode: CPU37 sig=0x406f1, pf=0x1, revision=0xb00001d
>> [    5.103206] microcode: CPU38 sig=0x406f1, pf=0x1, revision=0xb00001d
>> [    5.110326] microcode: CPU39 sig=0x406f1, pf=0x1, revision=0xb00001d
>> [    5.117467] microcode: Microcode Update Driver: v2.01
>> <tigran@aivazian.fsnet.co.uk>, Peter Oruba
>> [    5.127476] registered taskstats version 1
>> [    5.132058] Loading compiled-in X.509 certificates
>> [    5.138206] Loaded X.509 cert 'Build time autogenerated kernel key:
>> 26871d9e2c53359981a91797284d4f630796d8cf'
>> [    5.149337] zswap: loaded using pool lzo/zbud
>> [    5.154215] page_owner is disabled
>> [    5.161468] Key type trusted registered
>> [    5.169226] Key type encrypted registered
>> [    5.173719] ima: No TPM chip found, activating TPM-bypass!
>> [    5.179918] evm: HMAC attrs: 0x1
>> [    5.184958] rtc_cmos 00:00: setting system clock to 2018-01-05
>> 15:40:45 UTC (1515166845)
>> [    5.196099] Freeing unused kernel memory: 1856K
>> <hang / reboot here>
>
> Gah, too many emails.
>
> Someone probably just needs to look at the EFI thunk code.  Does it boot if you disable EFI support in the kernel (noefi boot option, I think, or maybe just compile it out)?
>
>
>
>>
>> Thank you,
>> Pavel

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 13:41   ` Greg Kroah-Hartman
@ 2018-01-05 17:51     ` Guenter Roeck
  0 siblings, 0 replies; 156+ messages in thread
From: Guenter Roeck @ 2018-01-05 17:51 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, shuahkh, patches, ben.hutchings,
	lkft-triage, stable, Tao Wu

On Fri, Jan 05, 2018 at 02:41:04PM +0100, Greg Kroah-Hartman wrote:
> On Thu, Jan 04, 2018 at 03:45:55PM -0800, Guenter Roeck wrote:
> > On Wed, Jan 03, 2018 at 09:11:06PM +0100, Greg Kroah-Hartman wrote:
> > > This is the start of the stable review cycle for the 4.4.110 release.
> > > There are 37 patches in this series, all will be posted as a response
> > > to this one.  If anyone has any issues with these being applied, please
> > > let me know.
> > > 
> > > Responses should be made by Fri Jan  5 19:50:38 UTC 2018.
> > > Anything received after that time might be too late.
> > > 
> > 
> > This is also reported to crash if loaded under qemu + haxm under windows.
[ ... ]
> > The crash part of this problem may be solved with the following patch
> > (thanks to Hugh for the hint). There is still another problem, though -
> > with this patch applied, the qemu session aborts with "VCPU Shutdown
> > request", whatever that means.
> > 

v4.4.110 still suffers from "VCPU Shutdown request" with qemu+haxm.
Unfortunately I don't have any other information about the problem
at this time.

Guenter

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 17:48                             ` Pavel Tatashin
@ 2018-01-05 17:52                               ` Greg Kroah-Hartman
  2018-01-05 18:15                                 ` Andy Lutomirski
  0 siblings, 1 reply; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-05 17:52 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Andy Lutomirski, Hugh Dickins, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable

On Fri, Jan 05, 2018 at 12:48:54PM -0500, Pavel Tatashin wrote:
> Boots successfully with "noefi" kernel parameter :)

Thanks, that will help me narrow it down.  I'll dig through more patches
when I get home tonight...

greg k-h

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
                   ` (45 preceding siblings ...)
  2018-01-05 17:20 ` Alice Ferrazzi
@ 2018-01-05 17:56 ` Guenter Roeck
  2018-01-05 20:54   ` Greg Kroah-Hartman
  46 siblings, 1 reply; 156+ messages in thread
From: Guenter Roeck @ 2018-01-05 17:56 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, shuahkh, patches, ben.hutchings,
	lkft-triage, stable

On Wed, Jan 03, 2018 at 09:11:06PM +0100, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.110 release.
> There are 37 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Fri Jan  5 19:50:38 UTC 2018.
> Anything received after that time might be too late.
> 

Update: v4.4.110 final nosmp builds fail as follows:

------------
Error log:
arch/x86/entry/vdso/vma.c: In function ‘map_vdso’:
arch/x86/entry/vdso/vma.c:173:9: error:
	implicit declaration of function ‘pvclock_pvti_cpu0_va’

Guenter

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 17:20 ` Alice Ferrazzi
@ 2018-01-05 18:01   ` Greg Kroah-Hartman
  2018-01-09 19:49     ` Serge E. Hallyn
  0 siblings, 1 reply; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-05 18:01 UTC (permalink / raw)
  To: Alice Ferrazzi
  Cc: linux-kernel, torvalds, akpm, Guenter Roeck, shuahkh, patches,
	Ben Hutchings, lkft-triage, stable

On Sat, Jan 06, 2018 at 02:20:16AM +0900, Alice Ferrazzi wrote:
> On Thu, Jan 4, 2018 at 5:11 AM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > This is the start of the stable review cycle for the 4.4.110 release.
> > There are 37 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Fri Jan  5 19:50:38 UTC 2018.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> >         kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.110-rc1.gz
> > or in the git tree and branch at:
> >   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
> > and the diffstat can be found below.
> >
> 
> This patchset merges correctly with Gentoo patches and GCC version 6.4.0.
> The kernel boots up correctly.
> Logs: http://kernel1.amd64.dev.gentoo.org:8010/#/builders/5/builds/44

Great, but Gentoo really should be moving to 4.9 and 4.14 here, I
hope no one running Gentoo is relying on 4.4 :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 15:57                         ` Willy Tarreau
@ 2018-01-05 18:01                           ` Greg Kroah-Hartman
  0 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-05 18:01 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Pavel Tatashin, Hugh Dickins, Andy Lutomirski, Linus Torvalds,
	Thomas Voegtle, Linux Kernel Mailing List, Andrew Morton,
	Guenter Roeck, Shuah Khan, patches, Ben Hutchings, lkft-triage,
	stable

On Fri, Jan 05, 2018 at 04:57:15PM +0100, Willy Tarreau wrote:
> On Fri, Jan 05, 2018 at 04:51:32PM +0100, Greg Kroah-Hartman wrote:
> > On Fri, Jan 05, 2018 at 10:32:49AM -0500, Pavel Tatashin wrote:
> (...)
> > > Reboots after about 30 seconds.
> > > 
> > > Boots fine with nopti option.
> > 
> > Crap.
> > 
> > And 4.9.75 works for you just fine?  Same with 4.15-rc6?
> > 
> > I'm wondering if this is some crazy gcc thing, given the ancient age of
> > what you are using (gcc 4.8.5).  I haven't used 4.x in many many years,
> > is this what comes with RHEL6?  What is the "base" distro you are
> > building this on, and anything special about the hardware being used
> > here?
> 
> I don't think so, I'm personally building with 4.7.4 and am not seeing
> this with 4.4.110.

Ok, looks like an efi issue...

greg k-h

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 15:00               ` Greg Kroah-Hartman
@ 2018-01-05 18:12                 ` Guenter Roeck
  2018-01-05 20:53                   ` Greg Kroah-Hartman
  0 siblings, 1 reply; 156+ messages in thread
From: Guenter Roeck @ 2018-01-05 18:12 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Willy Tarreau, Pavel Tatashin, Jiri Kosina, Hugh Dickins,
	Dave Hansen, linux-kernel, torvalds, Andrew Morton, shuahkh,
	patches, ben.hutchings, lkft-triage, stable

On Fri, Jan 05, 2018 at 04:00:55PM +0100, Greg Kroah-Hartman wrote:
> On Thu, Jan 04, 2018 at 09:56:47AM -0800, Guenter Roeck wrote:
> > 
> > FWIW, v4.4.110-rc1 boots fine when merged into chromeos-4.4, on i7-7Y75.
> 
> That's good to know, hopefully 4.4.110-final also still works for you :)

It seems to be working. One patch to add for v4.4.111: 

063fb3e56f6d ("x86/kasan: Write protect kasan zero shadow")

It is needed to be able to run KASAN enabled images in KVM.

Guenter

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 17:52                               ` Greg Kroah-Hartman
@ 2018-01-05 18:15                                 ` Andy Lutomirski
  2018-01-05 18:21                                   ` Pavel Tatashin
  2018-01-05 20:48                                   ` Greg Kroah-Hartman
  0 siblings, 2 replies; 156+ messages in thread
From: Andy Lutomirski @ 2018-01-05 18:15 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Pavel Tatashin, Hugh Dickins, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable

On Fri, Jan 5, 2018 at 9:52 AM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Fri, Jan 05, 2018 at 12:48:54PM -0500, Pavel Tatashin wrote:
>> Boots successfully with "noefi" kernel parameter :)
>
> Thanks, that will help me narrow it down.  I'll dig through more patches
> when I get home tonight...

I wish you luck.  The 4.4 series is "KAISER", not "KPTI", and the
relevant code is spread all over the place and is generally garbage.
See, for example, the turd called kaiser_set_shadow_pgd().  I would
not be terribly surprised if that particular turd is biting here.

An alternative theory is that something is screwy in the EFI code.  I
don't see anything directly wrong, but it's certainly a bit sketchy.
The newer kernels carefully avoid using PCID 0 for real work to avoid
corruption due to EFI and similar things.  The "KAISER" code has no
such mitigation.  Fortunately, it seems to use PCID=0 for kernel and
PCID=nonzero for user, so the obvious problem isn't present, but
something could still be wrong.
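
A minimal sketch of the PCID point above, assuming the architectural CR3
layout (PCID in bits 11:0 when CR4.PCIDE is set, bit 63 of the written value
meaning "do not flush"). The helper names and the two example PCID values are
hypothetical and are not taken from the 4.4 KAISER code:

```c
#include <assert.h>
#include <stdint.h>

#define CR3_PCID_MASK 0xFFFull       /* CR3 bits 11:0 carry the PCID when CR4.PCIDE=1 */
#define CR3_NOFLUSH   (1ull << 63)   /* bit 63 of the value written: keep cached translations */

/* Hypothetical PCID assignment, for illustration only. */
#define EXAMPLE_PCID_KERNEL 0x000u
#define EXAMPLE_PCID_USER   0x080u

/* Compose a CR3 value from a 4 KiB aligned page-table root and a PCID. */
static uint64_t make_cr3(uint64_t pgd_phys, unsigned int pcid, int noflush)
{
	assert((pgd_phys & CR3_PCID_MASK) == 0);  /* low 12 bits must be free for the PCID */
	assert(pcid <= CR3_PCID_MASK);
	return pgd_phys | pcid | (noflush ? CR3_NOFLUSH : 0);
}

/* Extract the PCID field from a CR3 value. */
static unsigned int cr3_pcid(uint64_t cr3)
{
	return (unsigned int)(cr3 & CR3_PCID_MASK);
}
```

With this layout, code that reloads CR3 without knowing about PCIDs ends up
running under PCID 0, which is why it matters which address space is assigned
that value.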

Pavel, can you send your /proc/cpuinfo on a noefi boot?  (Just the
first CPU worth is fine.)

FWIW, I said before that I have very little desire to help debug
"KAISER".  I stand by that.

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 18:15                                 ` Andy Lutomirski
@ 2018-01-05 18:21                                   ` Pavel Tatashin
  2018-01-05 19:14                                     ` Pavel Tatashin
  2018-01-05 20:48                                   ` Greg Kroah-Hartman
  1 sibling, 1 reply; 156+ messages in thread
From: Pavel Tatashin @ 2018-01-05 18:21 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Greg Kroah-Hartman, Hugh Dickins, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable

> Pavel, can you send your /proc/cpuinfo on a noefi boot?  (Just the
> first CPU worth is fine.)

With noefi option:

[root@ca-ostest441 ~]# more /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 79
model name      : Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
stepping        : 1
microcode       : 0xb00001d
cpu MHz         : 1971.406
cache size      : 25600 KB
physical id     : 0
siblings        : 20
core id         : 0
cpu cores       : 10
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 20
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb invpcid_single pln pts dtherm intel_pt kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc
bugs            :
bogomips        : 4390.08
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:
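
The flags list above includes pcid, invpcid, invpcid_single and kaiser. A
minimal sketch of checking such a list for one flag, assuming the flags are
available as a single space-separated string; has_cpu_flag() is a hypothetical
helper name. A whole-word match is needed because "invpcid" is a prefix of
"invpcid_single" and "pcid" is a suffix of "invpcid":

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if `flag` occurs as a whole space-separated word in `flags`. */
static bool has_cpu_flag(const char *flags, const char *flag)
{
	size_t n;
	const char *p;

	assert(flags != NULL && flag != NULL);
	n = strlen(flag);
	if (n == 0)
		return false;

	p = flags;
	while ((p = strstr(p, flag)) != NULL) {
		bool starts_word = (p == flags) || (p[-1] == ' ');
		bool ends_word = (p[n] == '\0') || (p[n] == ' ');

		if (starts_word && ends_word)
			return true;
		p++;  /* keep searching after a partial match */
	}
	return false;
}
```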

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 18:21                                   ` Pavel Tatashin
@ 2018-01-05 19:14                                     ` Pavel Tatashin
  2018-01-05 19:18                                       ` Pavel Tatashin
  0 siblings, 1 reply; 156+ messages in thread
From: Pavel Tatashin @ 2018-01-05 19:14 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Greg Kroah-Hartman, Hugh Dickins, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable

I hoped this patch would fix the EFI issue:
https://lkml.org/lkml/2018/1/5/534

But unfortunately it does not. I got a partial panic message this time:

[    4.737578] usb 1-1: new high-speed USB device number 2 using ehci-pci
[    4.846712] BUG: unable to handle kernel paging request at 0000000000017e10
[    4.854509] IP: [<ffffffff810ce77e>] native_queued_spin_lock_slowpath+0xfe/0x170
[    4.862780] PGD 0
[    4.865034] Oops: 0002 [#1] SMP
[    4.868657] Modules linked in:
[    4.872075] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.110_pt_linux-v4.4.110 #3
[    4.880526] Hardware name: Oracle Corporation ORACLE SERVER X6-2/ASM,MOTHERBOARD,1U, BIOS 38050100 08/30/2016
[    4.891596] task: ffffffff81aab500 ti: ffffffff81a98000 task.ti: ffffffff81a98000
[    4.899950] RIP: 0010:[<ffffffff810ce77e>]  [<ffffffff810ce77e>] native_queued_spin_lock_slowpath+0xfe/0x170
[    4.910936] RSP: 0000:ffff881fff803c88  EFLAGS: 00010002
[    4.916865] RAX: 000000000000206b RBX: ffff88407e611900 RCX: ffff881fff817e00
[    4.924831] RDX: 0000000000017e10 RSI: 0000000000040000 RDI: ffff88407e611a58
[    4.932797] RBP: ffff881fff803c88 R08: 0000000000000101 R09: 0000000000000000
[    4.940764] R10: 000000005c96d000 R11: ffff88005c96d0c0 R12: ffff881ff25e52c8
[    4.948730] R13: ffff88407e6d1900 R14: ffff881fff8118c0 R15: ffff88407e6118c0
[    4.956696] FS:  0000000000000000(0000) GS:ffff881fff800000(0000) knlGS:0000000000000000
[    4.965727] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.972140] CR2: 0000000000017e10 CR3: 0000000001aa2000 CR4: 00000000003606
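
For reference, the "Oops: 0002" value in the log above is the x86 page-fault
error code. A minimal sketch of decoding it, using the architectural bit
meanings; the struct and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Decoded x86 page-fault error code (low five bits). */
struct pf_error {
	bool protection;    /* bit 0: 1 = protection violation, 0 = page not present */
	bool write;         /* bit 1: 1 = write access, 0 = read access */
	bool user;          /* bit 2: 1 = fault taken in user mode */
	bool reserved_bit;  /* bit 3: 1 = reserved bit set in a paging entry */
	bool instr_fetch;   /* bit 4: 1 = fault on an instruction fetch */
};

static struct pf_error decode_pf_error(unsigned long code)
{
	struct pf_error e = {
		.protection   = (code & 0x01ul) != 0,
		.write        = (code & 0x02ul) != 0,
		.user         = (code & 0x04ul) != 0,
		.reserved_bit = (code & 0x08ul) != 0,
		.instr_fetch  = (code & 0x10ul) != 0,
	};

	assert((code & ~0x1ful) == 0);  /* this sketch only handles the low five bits */
	return e;
}
```

For 0x0002 this decodes to a kernel-mode write to a page that is not present,
which is consistent with the "PGD 0" line and the low fault address in CR2.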

On Fri, Jan 5, 2018 at 1:21 PM, Pavel Tatashin
<pasha.tatashin@oracle.com> wrote:
>> Pavel, can you send your /proc/cpuinfo on a noefi boot?  (Just the
>> first CPU worth is fine.)
>
> With noefi option:
>
> [root@ca-ostest441 ~]# more /proc/cpuinfo
> processor       : 0
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 79
> model name      : Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
> stepping        : 1
> microcode       : 0xb00001d
> cpu MHz         : 1971.406
> cache size      : 25600 KB
> physical id     : 0
> siblings        : 20
> core id         : 0
> cpu cores       : 10
> apicid          : 0
> initial apicid  : 0
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 20
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb invpcid_single pln pts dtherm intel_pt kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc
> bugs            :
> bogomips        : 4390.08
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 46 bits physical, 48 bits virtual
> power management:

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 19:14                                     ` Pavel Tatashin
@ 2018-01-05 19:18                                       ` Pavel Tatashin
  2018-01-05 20:45                                         ` Greg Kroah-Hartman
  0 siblings, 1 reply; 156+ messages in thread
From: Pavel Tatashin @ 2018-01-05 19:18 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Greg Kroah-Hartman, Hugh Dickins, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable

Actually, it helps: before, 4.4.110 never booted on my machine; now I
was able to boot on a second try.

On Fri, Jan 5, 2018 at 2:14 PM, Pavel Tatashin
<pasha.tatashin@oracle.com> wrote:
> I hoped this patch would fix the EFI issue:
> https://lkml.org/lkml/2018/1/5/534
>
> But unfortunately it does not. I got a partial panic message this time:
>
> [    4.737578] usb 1-1: new high-speed USB device number 2 using ehci-pci
> [    4.846712] BUG: unable to handle kernel paging request at 0000000000017e10
> [    4.854509] IP: [<ffffffff810ce77e>]
> native_queued_spin_lock_slowpath+0xfe/0x170
> [    4.862780] PGD 0
> [    4.865034] Oops: 0002 [#1] SMP
> [    4.868657] Modules linked in:
> [    4.872075] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> 4.4.110_pt_linux-v4.4.110 #3
> [    4.880526] Hardware name: Oracle Corporation ORACLE SERVER
> X6-2/ASM,MOTHERBOARD,1U, BIOS 38050100 08/30/2016
> [    4.891596] task: ffffffff81aab500 ti: ffffffff81a98000 task.ti:
> ffffffff81a98000
> [    4.899950] RIP: 0010:[<ffffffff810ce77e>]  [<ffffffff810ce77e>]
> native_queued_spin_lock_slowpath+0xfe/0x170
> [    4.910936] RSP: 0000:ffff881fff803c88  EFLAGS: 00010002
> [    4.916865] RAX: 000000000000206b RBX: ffff88407e611900 RCX: ffff881fff817e00
> [    4.924831] RDX: 0000000000017e10 RSI: 0000000000040000 RDI: ffff88407e611a58
> [    4.932797] RBP: ffff881fff803c88 R08: 0000000000000101 R09: 0000000000000000
> [    4.940764] R10: 000000005c96d000 R11: ffff88005c96d0c0 R12: ffff881ff25e52c8
> [    4.948730] R13: ffff88407e6d1900 R14: ffff881fff8118c0 R15: ffff88407e6118c0
> [    4.956696] FS:  0000000000000000(0000) GS:ffff881fff800000(0000)
> knlGS:0000000000000000
> [    4.965727] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    4.972140] CR2: 0000000000017e10 CR3: 0000000001aa2000 CR4: 00000000003606
>
> On Fri, Jan 5, 2018 at 1:21 PM, Pavel Tatashin
> <pasha.tatashin@oracle.com> wrote:
>>> Pavel, can you send your /proc/cpuinfo on a noefi boot?  (Just the
>>> first CPU worth is fine.)
>>
>> With noefi option:
>>
>> [root@ca-ostest441 ~]# more /proc/cpuinfo
>> processor       : 0
>> vendor_id       : GenuineIntel
>> cpu family      : 6
>> model           : 79
>> model name      : Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
>> stepping        : 1
>> microcode       : 0xb00001d
>> cpu MHz         : 1971.406
>> cache size      : 25600 KB
>> physical id     : 0
>> siblings        : 20
>> core id         : 0
>> cpu cores       : 10
>> apicid          : 0
>> initial apicid  : 0
>> fpu             : yes
>> fpu_exception   : yes
>> cpuid level     : 20
>> wp              : yes
>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb invpcid_single pln pts dtherm intel_pt kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc
>> bugs            :
>> bogomips        : 4390.08
>> clflush size    : 64
>> cache_alignment : 64
>> address sizes   : 46 bits physical, 48 bits virtual
>> power management:

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 19:18                                       ` Pavel Tatashin
@ 2018-01-05 20:45                                         ` Greg Kroah-Hartman
  2018-01-05 21:03                                           ` Pavel Tatashin
  0 siblings, 1 reply; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-05 20:45 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Andy Lutomirski, Hugh Dickins, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable

On Fri, Jan 05, 2018 at 02:18:32PM -0500, Pavel Tatashin wrote:
> Actually, it helps: before, 4.4.110 never booted on my machine; now I
> was able to boot on a second try.

Wait, what?  This has never booted on 4.4.x before?  Did 4.4.108 work?
109?  Are you sure this hardware even works?  :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 18:15                                 ` Andy Lutomirski
  2018-01-05 18:21                                   ` Pavel Tatashin
@ 2018-01-05 20:48                                   ` Greg Kroah-Hartman
  1 sibling, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-05 20:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Pavel Tatashin, Hugh Dickins, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable

On Fri, Jan 05, 2018 at 10:15:00AM -0800, Andy Lutomirski wrote:
> On Fri, Jan 5, 2018 at 9:52 AM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Fri, Jan 05, 2018 at 12:48:54PM -0500, Pavel Tatashin wrote:
> >> Boots successfully with "noefi" kernel parameter :)
> >
> > Thanks, that will help me narrow it down.  I'll dig through more patches
> > when I get home tonight...
> 
> I wish you luck.  The 4.4 series is "KAISER", not "KPTI", and the
> relevant code is spread all over the place and is generally garbage.
> See, for example, the turd called kaiser_set_shadow_pgd().  I would
> not be terribly surprised if that particular turd is biting here.
> 
> An alternative theory is that something is screwy in the EFI code.  I
> don't see anything directly wrong, but it's certainly a bit sketchy.
> The newer kernels carefully avoid using PCID 0 for real work to avoid
> corruption due to EFI and similar things.  The "KAISER" code has no
> such mitigation.  Fortunately, it seems to use PCID=0 for kernel and
> PCID=nonzero for user, so the obvious problem isn't present, but
> something could still be wrong.
> 
> Pavel, can you send your /proc/cpuinfo on a noefi boot?  (Just the
> first CPU worth is fine.)
> 
> FWIW, I said before that I have very little desire to help debug
> "KAISER".  I stand by that.

I totally understand, and do not expect your help at all.

Worst case, I point people at 4.14 and tell them to upgrade, I'm not
going to waste a ton of time on this for the same exact reasons you list
here.

And yeah, kaiser_set_shadow_pgd() is horrid, I've already gotten sucked
into it for long enough...

greg k-h

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 18:12                 ` Guenter Roeck
@ 2018-01-05 20:53                   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-05 20:53 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Willy Tarreau, Pavel Tatashin, Jiri Kosina, Hugh Dickins,
	Dave Hansen, linux-kernel, torvalds, Andrew Morton, shuahkh,
	patches, ben.hutchings, lkft-triage, stable

On Fri, Jan 05, 2018 at 10:12:38AM -0800, Guenter Roeck wrote:
> On Fri, Jan 05, 2018 at 04:00:55PM +0100, Greg Kroah-Hartman wrote:
> > On Thu, Jan 04, 2018 at 09:56:47AM -0800, Guenter Roeck wrote:
> > > 
> > > FWIW, v4.4.110-rc1 boots fine when merged into chromeos-4.4, on i7-7Y75.
> > 
> > That's good to know, hopefully 4.4.110-final also still works for you :)
> 
> It seems to be working. One patch to add for v4.4.111: 
> 
> 063fb3e56f6d ("x86/kasan: Write protect kasan zero shadow")
> 
> It is needed to be able to run KASAN enabled images in KVM.

Ugh, thanks for that; it looks like SLES is also missing that one.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 17:56 ` Guenter Roeck
@ 2018-01-05 20:54   ` Greg Kroah-Hartman
  2018-01-05 21:21     ` Guenter Roeck
  2018-01-06  1:35     ` Guenter Roeck
  0 siblings, 2 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-05 20:54 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: linux-kernel, torvalds, akpm, shuahkh, patches, ben.hutchings,
	lkft-triage, stable

On Fri, Jan 05, 2018 at 09:56:16AM -0800, Guenter Roeck wrote:
> On Wed, Jan 03, 2018 at 09:11:06PM +0100, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.4.110 release.
> > There are 37 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Fri Jan  5 19:50:38 UTC 2018.
> > Anything received after that time might be too late.
> > 
> 
> Update: v4.4.110 final nosmp builds fail as follows:
> 
> ------------
> Error log:
> arch/x86/entry/vdso/vma.c: In function ‘map_vdso’:
> arch/x86/entry/vdso/vma.c:173:9: error:
> 	implicit declaration of function ‘pvclock_pvti_cpu0_va’

x86-64 or i386?
That should be a CONFIG_PARAVIRT_CLOCK issue, not a smp build issue,
have a .config I can try?
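
One common way an implicit-declaration error like this arises is a caller that
is always built while the declaration sits behind a config option. A minimal
user-space sketch of the usual header-stub pattern follows; every name in it
is hypothetical, and it is not the actual 4.4 code or a proposed fix:

```c
#include <assert.h>
#include <stddef.h>

struct example_time_info {
	int unused;
};

/* Uncomment to model a build with the (hypothetical) option enabled. */
/* #define EXAMPLE_CONFIG_CLOCK 1 */

#ifdef EXAMPLE_CONFIG_CLOCK
/* Real implementation lives in another file when the option is on. */
struct example_time_info *example_cpu0_va(void);
#else
/* Stub keeps unconditional callers compiling when the option is off. */
static inline struct example_time_info *example_cpu0_va(void)
{
	return NULL;
}
#endif

/* Caller that is built regardless of the option. Returns 1 if a page was found. */
static int example_map_clock_page(void)
{
	struct example_time_info *ti = example_cpu0_va();

	return ti != NULL;
}
```

Without the #else branch, the caller would hit exactly this kind of
implicit-declaration error whenever the option is disabled.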

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 20:45                                         ` Greg Kroah-Hartman
@ 2018-01-05 21:03                                           ` Pavel Tatashin
  2018-01-05 23:15                                             ` Hugh Dickins
  2018-01-07 10:45                                             ` Greg Kroah-Hartman
  0 siblings, 2 replies; 156+ messages in thread
From: Pavel Tatashin @ 2018-01-05 21:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andy Lutomirski, Hugh Dickins, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable

The hardware works :) I meant that before the patch linked in 
https://lkml.org/lkml/2018/1/5/534, I was never able to boot 4.4.110. 
But with that patch applied, I was able to boot it at least once, but it 
could be accidental. The hang/panic does not happen at the same time on 
every boot.

Pasha

On 01/05/2018 03:45 PM, Greg Kroah-Hartman wrote:
> On Fri, Jan 05, 2018 at 02:18:32PM -0500, Pavel Tatashin wrote:
>> Actually, it helps: before, 4.4.110 never booted on my machine; now I
>> was able to boot on a second try.
> 
> Wait, what?  This has never booted on 4.4.x before?  Did 4.4.108 work?
> 109?  Are you sure this hardware even works?  :)
> 
> thanks,
> 
> greg k-h
> 

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 20:54   ` Greg Kroah-Hartman
@ 2018-01-05 21:21     ` Guenter Roeck
  2018-01-06  1:35     ` Guenter Roeck
  1 sibling, 0 replies; 156+ messages in thread
From: Guenter Roeck @ 2018-01-05 21:21 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, shuahkh, patches, ben.hutchings,
	lkft-triage, stable

On Fri, Jan 05, 2018 at 09:54:45PM +0100, Greg Kroah-Hartman wrote:
> On Fri, Jan 05, 2018 at 09:56:16AM -0800, Guenter Roeck wrote:
> > On Wed, Jan 03, 2018 at 09:11:06PM +0100, Greg Kroah-Hartman wrote:
> > > This is the start of the stable review cycle for the 4.4.110 release.
> > > There are 37 patches in this series, all will be posted as a response
> > > to this one.  If anyone has any issues with these being applied, please
> > > let me know.
> > > 
> > > Responses should be made by Fri Jan  5 19:50:38 UTC 2018.
> > > Anything received after that time might be too late.
> > > 
> > 
> > Update: v4.4.110 final nosmp builds fail as follows:
> > 
> > ------------
> > Error log:
> > arch/x86/entry/vdso/vma.c: In function ‘map_vdso’:
> > arch/x86/entry/vdso/vma.c:173:9: error:
> > 	implicit declaration of function ‘pvclock_pvti_cpu0_va’
> 
> x86-64 or i386?

x86-64

> That should be a CONFIG_PARAVIRT_CLOCK issue, not a smp build issue,
> have a .config I can try?
> 
https://github.com/groeck/linux-build-test/blob/master/rootfs/x86_64/qemu_x86_64_pc_nosmp_defconfig

However,
https://github.com/groeck/linux-build-test/blob/master/rootfs/x86_64/qemu_x86_64_pc_defconfig
does build, and the only differences are:

30a31
> CONFIG_SMP=y
32a34,35
> CONFIG_NR_CPUS=24
> CONFIG_SCHED_SMT=y
44d46
< CONFIG_ACPI_CONTAINER=y

Both configurations have CONFIG_PARAVIRT_CLOCK disabled.

Guenter

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 21:03                                           ` Pavel Tatashin
@ 2018-01-05 23:15                                             ` Hugh Dickins
  2018-01-06  1:16                                               ` Pavel Tatashin
  2018-01-07 10:45                                             ` Greg Kroah-Hartman
  1 sibling, 1 reply; 156+ messages in thread
From: Hugh Dickins @ 2018-01-05 23:15 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Greg Kroah-Hartman, Andy Lutomirski, Linus Torvalds,
	Thomas Voegtle, Linux Kernel Mailing List, Andrew Morton,
	Guenter Roeck, Shuah Khan, patches, Ben Hutchings, lkft-triage,
	stable

On Fri, Jan 5, 2018 at 1:03 PM, Pavel Tatashin
<pasha.tatashin@oracle.com> wrote:
> The hardware works :) I meant that before the patch linked in
> https://lkml.org/lkml/2018/1/5/534, I was never able to boot 4.4.110. But
> with that patch applied, I was able to boot it at least once, but it could
> be accidental. The hang/panic does not happen at the same time on every
> boot.

I get the feeling that it was accidental: it seems to me that you have
a memory corruption problem, that gets shifted around by the different
patches (or "noefi" or "nopti").

Because yesterday your boots were able to get way beyond the "EFI
Variables Facility" message, and I can't imagine why the EFI issue
would not have been equally debilitating on yesterday's 110-rc, if it
were in play.

I did intend to ask you to send your System.map, for us to scan
through: maybe some variable is marked __init and should not be, then
the "Freeing unused kernel memory" frees it for random reuse.

But today you didn't get anywhere near the "Freeing unused kernel
memory", so that can't be it - or do you sometimes get that far today?

You mention that the hang/panic does not happen at the same time on
every boot: I think all I can ask is for you to keep supplying us with
different examples (console messages) of where it occurs, in the hope
that one of them will point us in the right direction.

And it even seems possible that this has nothing to do with the
4.4.110 changes - that 4.4.109 plus some other random patches would
unleash similar corruption. Though on balance that does seem unlikely.

Hugh

>
> Pasha
>
>
> On 01/05/2018 03:45 PM, Greg Kroah-Hartman wrote:
>>
>> On Fri, Jan 05, 2018 at 02:18:32PM -0500, Pavel Tatashin wrote:
>>>
>>> Actually it helps: before, 4.4.110 never booted on my machine; now I
>>> was able to boot on a second try.
>>
>>
>> Wait, what?  This has never booted on 4.4.x before?  Did 4.4.108 work?
>> 109?  Are you sure this hardware even works?  :)
>>
>> thanks,
>>
>> greg k-h
>>
>

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 14:03           ` Mike Galbraith
@ 2018-01-05 23:28             ` Hugh Dickins
  2018-01-06  2:58               ` Mike Galbraith
  0 siblings, 1 reply; 156+ messages in thread
From: Hugh Dickins @ 2018-01-05 23:28 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Greg Kroah-Hartman, Guenter Roeck, linux-kernel, Linus Torvalds,
	Andrew Morton, Shuah Khan, patches, Ben Hutchings, lkft-triage,
	stable, Tao Wu

On Fri, Jan 5, 2018 at 6:03 AM, Mike Galbraith <efault@gmx.de> wrote:
> On Fri, 2018-01-05 at 14:34 +0100, Greg Kroah-Hartman wrote:
>>
>> Ok, we found two patches that were missing in 4.4-stable that were in
>> the SLES12 tree (thanks to Jamie Iles), now I only have 19k more to sift
>> through :)
>
> As you know, in enterprise, uname -r means you might find something
> this old in your kernel if you look hard enough :)

Mike, I think there's a good chance that Greg's 4.4.110 final will fix
your "segfault at ffffffffff5ff100" crashes: please give it a try when
you can, and let us know - thanks.

Hugh

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 23:15                                             ` Hugh Dickins
@ 2018-01-06  1:16                                               ` Pavel Tatashin
  0 siblings, 0 replies; 156+ messages in thread
From: Pavel Tatashin @ 2018-01-06  1:16 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Greg Kroah-Hartman, Andy Lutomirski, Linus Torvalds,
	Thomas Voegtle, Linux Kernel Mailing List, Andrew Morton,
	Guenter Roeck, Shuah Khan, patches, Ben Hutchings, lkft-triage,
	stable

Hi Hugh,

Thank you very much for your very thoughtful input.

I am quite positive this problem is a PTI regression, because I see
exactly the same problem with kernel 4.1, to which I back-ported all the
necessary PTI patches from 4.4.110. I will provide this thread with more
information as I collect it. I will also try to root-cause the problem.

The bug behaves like memory corruption, but with both the 4.1 and 4.4
kernels the problem goes away when I boot with the noefi parameter. So
EFI + PTI is the culprit for this memory corruption.

Thank you,
Pavel

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 20:54   ` Greg Kroah-Hartman
  2018-01-05 21:21     ` Guenter Roeck
@ 2018-01-06  1:35     ` Guenter Roeck
  1 sibling, 0 replies; 156+ messages in thread
From: Guenter Roeck @ 2018-01-06  1:35 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, shuahkh, patches, ben.hutchings,
	lkft-triage, stable

On 01/05/2018 12:54 PM, Greg Kroah-Hartman wrote:
> On Fri, Jan 05, 2018 at 09:56:16AM -0800, Guenter Roeck wrote:
>> On Wed, Jan 03, 2018 at 09:11:06PM +0100, Greg Kroah-Hartman wrote:
>>> This is the start of the stable review cycle for the 4.4.110 release.
>>> There are 37 patches in this series, all will be posted as a response
>>> to this one.  If anyone has any issues with these being applied, please
>>> let me know.
>>>
>>> Responses should be made by Fri Jan  5 19:50:38 UTC 2018.
>>> Anything received after that time might be too late.
>>>
>>
>> Update: v4.4.110 final nosmp builds fail as follows:
>>
>> ------------
>> Error log:
>> arch/x86/entry/vdso/vma.c: In function ‘map_vdso’:
>> arch/x86/entry/vdso/vma.c:173:9: error:
>> 	implicit declaration of function ‘pvclock_pvti_cpu0_va’
> 
> x86-64 or i386?
> That should be a CONFIG_PARAVIRT_CLOCK issue, not a smp build issue,
> have a .config I can try?
> 

Here is an easier way to reproduce the problem: make allnoconfig ; make

Guenter

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 23:28             ` Hugh Dickins
@ 2018-01-06  2:58               ` Mike Galbraith
  0 siblings, 0 replies; 156+ messages in thread
From: Mike Galbraith @ 2018-01-06  2:58 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Greg Kroah-Hartman, Guenter Roeck, linux-kernel, Linus Torvalds,
	Andrew Morton, Shuah Khan, patches, Ben Hutchings, lkft-triage,
	stable, Tao Wu

On Fri, 2018-01-05 at 15:28 -0800, Hugh Dickins wrote:
> On Fri, Jan 5, 2018 at 6:03 AM, Mike Galbraith <efault@gmx.de> wrote:
> > On Fri, 2018-01-05 at 14:34 +0100, Greg Kroah-Hartman wrote:
> >>
> >> Ok, we found two patches that were missing in 4.4-stable that were in
> >> the SLES12 tree (thanks to Jamie Iles), now I only have 19k more to sift
> >> through :)
> >
> > As you know, in enterprise, uname -r means you might find something
> > this old in your kernel if you look hard enough :)
> 
> Mike, I think there's a good chance that Greg's 4.4.110 final will fix
> your "segfault at ffffffffff5ff100" crashes: please give it a try when
> you can, and let us know - thanks.

Already done, and yes, it did.

	-Mike

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 21:03                                           ` Pavel Tatashin
  2018-01-05 23:15                                             ` Hugh Dickins
@ 2018-01-07 10:45                                             ` Greg Kroah-Hartman
  2018-01-07 14:17                                               ` Pavel Tatashin
  1 sibling, 1 reply; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-07 10:45 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Andy Lutomirski, Hugh Dickins, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable

On Fri, Jan 05, 2018 at 04:03:54PM -0500, Pavel Tatashin wrote:
> The hardware works :) I meant that before the patch linked in
> https://lkml.org/lkml/2018/1/5/534, I was never able to boot 4.4.110. But
> with that patch applied, I was able to boot it at least once, but it could
> be accidental. The hang/panic does not happen at the same time on every
> boot.

Any chance you can grab the latest SLES 12 kernel and run it with pti
and efi enabled to see if that works properly for you or not?  I trust
SUSE's testing of their kernel, and odds are I'm just missing one of
their many other patches they have in their tree for other issues that
they have seen in the past.

If you want, I can just send you the full patch that they run on top of
the latest 4.4 stable tree, so you don't have to dig it out of their git
repo if you can't find the binary image.

thanks,

greg k-h

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-07 10:45                                             ` Greg Kroah-Hartman
@ 2018-01-07 14:17                                               ` Pavel Tatashin
  2018-01-07 15:06                                                 ` Pavel Tatashin
  0 siblings, 1 reply; 156+ messages in thread
From: Pavel Tatashin @ 2018-01-07 14:17 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andy Lutomirski, Hugh Dickins, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable

Hi Greg,

I cloned and built suse12, and it does not have issues with EFI + PTI
(kaiser) on my machine.

BTW, I have also reproduced this problem on another machine with the
same configuration, so it is not specific to one box. Also, as I
mentioned earlier, I am seeing the same issue with 4.1 + kaiser patches
taken from 4.4.110.

Thank you,
Pavel

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-07 14:17                                               ` Pavel Tatashin
@ 2018-01-07 15:06                                                 ` Pavel Tatashin
  2018-01-08  7:46                                                   ` Greg Kroah-Hartman
  0 siblings, 1 reply; 156+ messages in thread
From: Pavel Tatashin @ 2018-01-07 15:06 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andy Lutomirski, Hugh Dickins, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable

Hi Greg,

I reverted suse12 back to:
13dae54cb229d078635f159dd8afe16ae683980b
x86/kaiser: Move feature detection up (bsc#1068032).

And I still do not see the problem. So whatever fixes the issue comes
before kaiser.

Pavel

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-07 15:06                                                 ` Pavel Tatashin
@ 2018-01-08  7:46                                                   ` Greg Kroah-Hartman
  2018-01-08 20:38                                                     ` Pavel Tatashin
  0 siblings, 1 reply; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-08  7:46 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Andy Lutomirski, Hugh Dickins, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable

On Sun, Jan 07, 2018 at 10:06:59AM -0500, Pavel Tatashin wrote:
> Hi Greg,
> 
> I reverted suse12 back to:
> 13dae54cb229d078635f159dd8afe16ae683980b
> x86/kaiser: Move feature detection up (bsc#1068032).
> 
> And, still do not see the problem. So, whatever fixes the issue comes
> before kaiser.

Ok, thanks for the hint.

As I can't duplicate this here at all, any specifics as to what
hardware/processor type this is?

I can punt and say just "use 4.9 on this hardware if you have it",
right?  :)

I'll try to dig through the sles kernel some more, but given it is 20000
patches, and I can't actually test the problem myself, it's not exactly
easy going...

greg k-h

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05  0:06   ` Kevin Hilman
@ 2018-01-08 15:06     ` Guillaume Tucker
  0 siblings, 0 replies; 156+ messages in thread
From: Guillaume Tucker @ 2018-01-08 15:06 UTC (permalink / raw)
  To: Kevin Hilman, kernelci.org bot
  Cc: Greg Kroah-Hartman, linux-kernel, torvalds, akpm, linux, shuahkh,
	patches, ben.hutchings, lkft-triage, stable, kernelci

On 05/01/18 00:06, Kevin Hilman wrote:
> kernelci.org bot <bot@kernelci.org> writes:
> 
>> stable-rc/linux-4.4.y boot: 100 boots: 4 failed, 93 passed with 1 offline, 2 conflicts (v4.4.109-38-g99abd6cdd65e)
>>
>> Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.4.y/kernel/v4.4.109-38-g99abd6cdd65e/
>> Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.4.y/kernel/v4.4.109-38-g99abd6cdd65e/
>>
>> Tree: stable-rc
>> Branch: linux-4.4.y
>> Git Describe: v4.4.109-38-g99abd6cdd65e
>> Git Commit: 99abd6cdd65e984d89c8565508a7a96ea0fce179
>> Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
>> Tested: 53 unique boards, 19 SoC families, 16 builds out of 178
> 
> TL;DR;  All is well.
> 
>> Boot Regressions Detected:
>>
>> arm:
>>
>>      exynos_defconfig:
>>          exynos5422-odroidxu3:
>>              lab-collabora: failing since 58 days (last pass: v4.4.95-21-g32458fcb7bd6 - first fail: v4.4.96-41-g336421367b9c)
> 
> Long standing issue in lab-collabora (passing in other labs)  Guillaume?

This should be fixed now, with a tweak to the device config to
enable relocating the ramdisk and dtb:

     https://review.linaro.org/#/c/23238/

>>      multi_v7_defconfig:
>>          armada-xp-linksys-mamba:
>>              lab-free-electrons: new failure (last pass: v4.4.109-36-g8b381424010c)
> 
> Not a kernel issue, bootROM fails to start bootloader.  I pinged lab
> owners (Free Electrons)
> 
>>          tegra124-nyan-big:
>>              lab-collabora: failing since 1 day (last pass: v4.4.109 - first fail: v4.4.109-36-g8b381424010c)
>>
>>      tegra_defconfig:
>>          tegra124-nyan-big:
>>              lab-collabora: failing since 1 day (last pass: v4.4.108-65-g57856049c0f8 - first fail: v4.4.109)
> 
> This one is booting fine, but the command to power-off the board is
> timing out, resulting in a failure report.

Indeed, this was due to a crash of the lavapdu daemon - it's back
on track now.

(On a side note, the tegra124-nyan-big is still failing to boot
in mainline due to a genuine kernel driver issue.)

Guillaume

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-08  7:46                                                   ` Greg Kroah-Hartman
@ 2018-01-08 20:38                                                     ` Pavel Tatashin
  2018-01-08 21:24                                                       ` Pavel Tatashin
  0 siblings, 1 reply; 156+ messages in thread
From: Pavel Tatashin @ 2018-01-08 20:38 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andy Lutomirski, Hugh Dickins, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable

Hi Greg,



On Mon, Jan 8, 2018 at 2:46 AM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Sun, Jan 07, 2018 at 10:06:59AM -0500, Pavel Tatashin wrote:
>> Hi Greg,
>>
>> I reverted suse12 back to:
>> 13dae54cb229d078635f159dd8afe16ae683980b
>> x86/kaiser: Move feature detection up (bsc#1068032).
>>
>> And, still do not see the problem. So, whatever fixes the issue comes
>> before kaiser.
>
> Ok, thanks for the hint.
>
> As I can't duplicate this here at all, any specifics as to what
> hardware/procesor type this is?
>

BIOS:
Version 2.17.1249. Copyright (C) 2016 American Megatrends, Inc.
BIOS Date: 08/30/2016 10:35:36 Ver: 38050100

ca-ostest442:linux-stable$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                40
On-line CPU(s) list:   0-39
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
Stepping:              1
CPU MHz:               1738.601
BogoMIPS:              4396.18
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0-9,20-29
NUMA node1 CPU(s):     10-19,30-39

Note: if I boot with nr_cpus=1, the hang never happens; with nr_cpus=4
it happens, but seldom; and with all 40 CPUs it happens on almost every
reboot.

As Hugh Dickins suggested, I am going to show panic outputs as I get
them. Here is one more panic (note that the output is incomplete because
the machine reboots):

[    6.276456] EFI Variables Facility v0.08 2004-May-17
[    6.384665] BUG: unable to handle kernel paging request at ffff901fff5a6000
[    6.392461] IP: [<ffffffff8106bb08>] vmalloc_fault+0x1f8/0x340
[    6.398987] PGD 0
[    6.401242] Oops: 0000 [#1] SMP
[    6.404866] Modules linked in:
[    6.408287] CPU: 10 PID: 0 Comm: swapper/10 Not tainted 4.4.110_pt_stable #2
[    6.416156] Hardware name: Oracle Corporation ORACLE SERVER X6-2/ASM,MOTHERBOARD,1U, BIOS 38050100 08/30/2016
[    6.427226] task: ffff883ff1e28000 ti: ffff883ff1e24000 task.ti: ffff883ff1e24000
[    6.435580] RIP: 0010:[<ffffffff8106bb08>]  [<ffffffff8106bb08>] vmalloc_fault+0x1f8/0x340
[    6.444819] RSP: 0000:ffff883ff1e27cc0  EFLAGS: 00010086
[    6.450749] RAX: ffff881fff5a6058 RBX: 00003ffffffff000 RCX: 0000081fff5a6000
[    6.458714] RDX: ffff880000000000 RSI: ffff901fff5a6000 RDI: 0000000000000000
[    6.466681] RBP: ffff883ff1e27cf0 R08: 0000000000000018 R09: 000000000002d2de
[    6.474647] R10: 0000000000032ef3 R11: 0000000000002e04 R12: ffffc900000000f0
[    6.482615] R13: ffff880000000000 R14: ffff901fff5a6000 R15: ffff881fff5a6000
[    6.490574] FS:  0000000000000000(0000) GS:ffff88407e600000(0000) knlGS:0000000000000000
[    6.499607] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.506022] CR2: ffff901fff5a6000 CR3: 0000000001aa2000 CR4: 0000000000360670
[    6.513989] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    6.521956] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    6.529923] Stack:
[    6.532169]  ffff881fff5a6000[    6.532405] ------------[ cut here ]------------
[    6.532414] WARNING: CPU: 22 PID: 162
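
One arithmetic observation on the register dump above, offered as a hint
and not as a diagnosis: the faulting address in CR2 equals RDX + RCX, and
it is exactly 1 << 43 above the address in R15. The sketch below only
checks that arithmetic. Treating ffff880000000000 as the direct-map base
is an assumption about this 4.4 x86-64 layout, and nothing here
identifies which code produced the stray bit.

```python
# Values copied from the register dump above.
CR2 = 0xFFFF901FFF5A6000  # faulting address (also in RSI and R14)
RDX = 0xFFFF880000000000  # assumed: direct-map base on this kernel
RCX = 0x0000081FFF5A6000
R15 = 0xFFFF881FFF5A6000

# The faulting address is exactly RDX + RCX.
assert RDX + RCX == CR2

# CR2 lies 1 << 43 above R15; as a bit pattern, bits 43 and 44 differ
# because the addition carries out of bit 43.
delta = CR2 - R15
print(hex(delta), delta == 1 << 43)
print(hex(CR2 ^ R15))
```

If R15 holds the address that was meant to be read, then RCX carries one
extra high bit compared with a plain physical offset. That is consistent
with the memory-corruption theory discussed in this thread, but it does
not prove it.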

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-08 20:38                                                     ` Pavel Tatashin
@ 2018-01-08 21:24                                                       ` Pavel Tatashin
  2018-01-11 18:36                                                         ` Pavel Tatashin
  0 siblings, 1 reply; 156+ messages in thread
From: Pavel Tatashin @ 2018-01-08 21:24 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andy Lutomirski, Hugh Dickins, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable

Here is one more:

[    6.284763] EFI Variables Facility v0.08 2004-May-17
[    6.555990] ------------[ cut here ]------------
[    6.561145] kernel BUG at /scratch/ptatashi/linux-stable/mm/slub.c:3627!
[    6.568625] invalid opcode: 0000 [#1] SMP
[    6.573219] Modules linked in:
[    6.576639] CPU: 1 PID: 364 Comm: kworker/1:1 Not tainted 4.4.110_pt_stable #3
[    6.584692] Hardware name: Oracle Corporation ORACLE SERVER X6-2/ASM,MOTHERBOARD,1U, BIOS 38050100 08/30/2016
[    6.595766] Workqueue: events clocksource_watchdog_work
[    6.601611] task: ffff881fecd82b00 ti: ffff881fecda4000 task.ti: ffff881fecda4000
[    6.609963] RIP: 0010:[<ffffffff811e704a>]  [<ffffffff811e704a>] kfree+0x14a/0x150
[    6.618419] RSP: 0000:ffff881fecda7d40  EFLAGS: 00010246
[    6.624348] RAX: ffffffff8106c280 RBX: ffff883ff114bfc0 RCX: 00000000ffffffd8
[    6.632314] RDX: 000077ff80000000 RSI: 0000000000000246 RDI: ffff883ff114bfc0
[    6.640280] RBP: ffff881fecda7d58 R08: 0000000000000000 R09: ffff881fff917300
[    6.648244] R10: 0000000000000000 R11: ffffea00ffc452c0 R12: ffff883fec2f4080
[    6.656208] R13: ffffffff810a5bee R14: 00000000ffffffff R15: 0000000000000000
[    6.664175] FS:  0000000000000000(0000) GS:ffff881fff840000(0000) knlGS:0000000000000000
[    6.673208] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.679623] CR2: 0000000000000000 CR3: 0000000001aa2000 CR4: 0000000000360670
[    6.687587] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    6.695553] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    6.703516] Stack:
[    6.705759]  ffff883ff114bfc0 ffff883fec2f4080 ffffffff819a26e8 ffff881fecda7e00
[    6.714061]  ffffffff810a5bee ffff881f00000020 ffff881fecda7e10 ffff881fecda7da8
[    6.722363]  ffffffff00000000 ffff881f00000000 ffff881fecda7d90 ffff881fecda7d90
[    6.730666] Call Trace:
[    6.733400]  [<ffffffff810a5bee>] kthread_create_on_node+0x14e/0x1a0
[    6.740495]  [<ffffffff810f9dd5>] clocksource_watchdog_work+0x25/0x40
[    6.747679]  [<ffffffff8109ef6f>] process_one_work+0x14f/0x400
[    6.754181]  [<ffffffff8109fbc4>] worker_thread+0x114/0x480
[    6.760402]  [<ffffffff8109fab0>] ? rescuer_thread+0x310/0x310
[    6.766913]  [<ffffffff810a56b5>] kthread+0xe5/0x100
[    6.772456]  [<ffffffff810a55d0>] ? kthread_park+0x60/0x60
[    6.778580]  [<ffffffff8170fa0f>] ret_from_fork+0x3f/0x70
[    6.784608]  [<ffffffff810a55d0>] ? kthread_park+0x60/0x60
[    6.790721] Code: 8b 03 31 f6 f6 c4 40 74 04 41 8b 73 6c 4c 89 df e8 1c a8 fa ff e9 73 ff ff ff 4c 8d 58 ff e9 20 ff ff ff 49 8b 43 20 a8 01 75 d4 <0f> 0b 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55
[    6.812429] RIP  [<ffffffff811e704a>] kfree+0x14a/0x150
[    6.818273]  RSP <ffff881fecda7d40>
[    6.822177] ---[ end trace 4ce44d21c6d68eed ]---

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-05 18:01   ` Greg Kroah-Hartman
@ 2018-01-09 19:49     ` Serge E. Hallyn
  2018-01-10  8:48       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 156+ messages in thread
From: Serge E. Hallyn @ 2018-01-09 19:49 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Alice Ferrazzi, linux-kernel, torvalds, akpm, Guenter Roeck,
	shuahkh, patches, Ben Hutchings, lkft-triage, stable

Quoting Greg Kroah-Hartman (gregkh@linuxfoundation.org):
> On Sat, Jan 06, 2018 at 02:20:16AM +0900, Alice Ferrazzi wrote:
> > On Thu, Jan 4, 2018 at 5:11 AM, Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > > This is the start of the stable review cycle for the 4.4.110 release.
> > > There are 37 patches in this series, all will be posted as a response
> > > to this one.  If anyone has any issues with these being applied, please
> > > let me know.
> > >
> > > Responses should be made by Fri Jan  5 19:50:38 UTC 2018.
> > > Anything received after that time might be too late.
> > >
> > > The whole patch series can be found in one patch at:
> > >         kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.110-rc1.gz
> > > or in the git tree and branch at:
> > >   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
> > > and the diffstat can be found below.
> > >
> > 
> > > This patchset merges correctly with Gentoo patches and GCC version 6.4.0.
> > > The kernel boots up correctly.
> > Logs: http://kernel1.amd64.dev.gentoo.org:8010/#/builders/5/builds/44
> 
> Great, but Gentoo really should be moving to 4.9 and 4.14 here, I
> hope no one running Gentoo is relying on 4.4 :)

Wait what?

According to https://www.kernel.org/category/releases.html
4.4 should be the best bet for longest support, right?  Does
that page need to be updated?  If 4.4 is not going to be
supported, is there anything else with a possible 5-6 years
of support?

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-09 19:49     ` Serge E. Hallyn
@ 2018-01-10  8:48       ` Greg Kroah-Hartman
  2018-01-10 16:45         ` Serge E. Hallyn
  0 siblings, 1 reply; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-10  8:48 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Alice Ferrazzi, linux-kernel, torvalds, akpm, Guenter Roeck,
	shuahkh, patches, Ben Hutchings, lkft-triage, stable

On Tue, Jan 09, 2018 at 01:49:48PM -0600, Serge E. Hallyn wrote:
> Quoting Greg Kroah-Hartman (gregkh@linuxfoundation.org):
> > On Sat, Jan 06, 2018 at 02:20:16AM +0900, Alice Ferrazzi wrote:
> > > On Thu, Jan 4, 2018 at 5:11 AM, Greg Kroah-Hartman
> > > <gregkh@linuxfoundation.org> wrote:
> > > > This is the start of the stable review cycle for the 4.4.110 release.
> > > > There are 37 patches in this series, all will be posted as a response
> > > > to this one.  If anyone has any issues with these being applied, please
> > > > let me know.
> > > >
> > > > Responses should be made by Fri Jan  5 19:50:38 UTC 2018.
> > > > Anything received after that time might be too late.
> > > >
> > > > The whole patch series can be found in one patch at:
> > > >         kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.110-rc1.gz
> > > > or in the git tree and branch at:
> > > >   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
> > > > and the diffstat can be found below.
> > > >
> > > 
> > > This patchset merges correctly with Gentoo patches and GCC version 6.4.0
> > > The kernel boot up correctly.
> > > Logs: http://kernel1.amd64.dev.gentoo.org:8010/#/builders/5/builds/44
> > 
> > Great, but Gentoo really should be moving to 4.9 and 4.14 here, I
> > hope no one running Gentoo is relying on 4.4 :)
> 
> Wait what?
> 
> According to https://www.kernel.org/category/releases.html
> 4.4 should be the best bet for longest support, right?  Does
> that page need to be updated?  If 4.4 is not going to be
> supported, is there anything else with a possible 5-6 years
> of support?

4.4 is going to be supported, yes, but really, for a desktop/server
system, why would you ever want to stick with it for anything longer
than a year?  No new hardware support is added, and no new features that
you would want are in there.

The LTS kernels are for the crazy embedded people that don't change
their hardware systems, and have the insane huge number of out-of-tree
patches.  No one else should be using those kernels, they should always
be using newer ones, as there are always more issues fixed in newer
kernels than older ones.

So again, I hope no one running Gentoo, which is a rolling, constantly
updated distro, is using the old and crusty 4.4 kernel release.  To do
so is to defeat the purpose of relying on Gentoo in the first place...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-10  8:48       ` Greg Kroah-Hartman
@ 2018-01-10 16:45         ` Serge E. Hallyn
  0 siblings, 0 replies; 156+ messages in thread
From: Serge E. Hallyn @ 2018-01-10 16:45 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Serge E. Hallyn, Alice Ferrazzi, linux-kernel, torvalds, akpm,
	Guenter Roeck, shuahkh, patches, Ben Hutchings, lkft-triage,
	stable

Quoting Greg Kroah-Hartman (gregkh@linuxfoundation.org):
> On Tue, Jan 09, 2018 at 01:49:48PM -0600, Serge E. Hallyn wrote:
> > Quoting Greg Kroah-Hartman (gregkh@linuxfoundation.org):
> > > On Sat, Jan 06, 2018 at 02:20:16AM +0900, Alice Ferrazzi wrote:
> > > > On Thu, Jan 4, 2018 at 5:11 AM, Greg Kroah-Hartman
> > > > <gregkh@linuxfoundation.org> wrote:
> > > > > This is the start of the stable review cycle for the 4.4.110 release.
> > > > > There are 37 patches in this series, all will be posted as a response
> > > > > to this one.  If anyone has any issues with these being applied, please
> > > > > let me know.
> > > > >
> > > > > Responses should be made by Fri Jan  5 19:50:38 UTC 2018.
> > > > > Anything received after that time might be too late.
> > > > >
> > > > > The whole patch series can be found in one patch at:
> > > > >         kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.110-rc1.gz
> > > > > or in the git tree and branch at:
> > > > >   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
> > > > > and the diffstat can be found below.
> > > > >
> > > > 
> > > > This patchset merges correctly with Gentoo patches and GCC version 6.4.0
> > > > The kernel boot up correctly.
> > > > Logs: http://kernel1.amd64.dev.gentoo.org:8010/#/builders/5/builds/44
> > > 
> > > Great, but Gentoo really should be moving to 4.9 and 4.14 here, I
> > > hope no one running Gentoo is relying on 4.4 :)
> > 
> > Wait what?
> > 
> > According to https://www.kernel.org/category/releases.html
> > 4.4 should be the best bet for longest support, right?  Does
> > that page need to be updated?  If 4.4 is not going to be
> > supported, is there anything else with a possible 5-6 years
> > of support?
> 
> 4.4 is going to be supported, yes, but really, for a desktop/server
> system, why would you ever want to stick with it for anything longer
> than a year?  No new hardware support is added, and no new features that
> you would want are in there.
> 
> The LTS kernels are for the crazy embedded people that don't change
> their hardware systems, and have the insane huge number of out-of-tree
> patches.  No one else should be using those kernels, they should always
> be using newer ones, as there are always more issues fixed in newer
> kernels than older ones.
> 
> So again, I hope no one running Gentoo, which is a rolling, constantly
> updated distro, is using the old and crusty 4.4 kernel release.  To do
> so is to defeat the purpose of relying on Gentoo in the first place...

Ah, I see, yeah that makes sense :)

thanks,
-serge

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-08 21:24                                                       ` Pavel Tatashin
@ 2018-01-11 18:36                                                         ` Pavel Tatashin
  2018-01-11 18:40                                                           ` Pavel Tatashin
  2018-01-11 20:10                                                           ` Greg Kroah-Hartman
  0 siblings, 2 replies; 156+ messages in thread
From: Pavel Tatashin @ 2018-01-11 18:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andy Lutomirski, Hugh Dickins, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable,
	Steve Sistare

[-- Attachment #1: Type: text/plain, Size: 419 bytes --]

I have root-caused the memory corruption panics/hangs that I've been
experiencing during boot with the latest 4.4.110 kernel. The problem,
as Andy Lutomirski suspected, is in the interaction between PTI
and EFI. It may affect any system that has an EFI BIOS.  I have not
verified whether it affects any kernel other than 4.4.110.

Attached is the fix for this issue with explanations that Steve
Sistare and I developed.

[-- Attachment #2: 0001-x86-pti-efi-broken-conversion-from-efi-to-kernel-pag.patch --]
[-- Type: text/x-patch, Size: 4201 bytes --]

From 1189f3568a90ddd40e1418b9687def5d89153ee3 Mon Sep 17 00:00:00 2001
From: Pavel Tatashin <pasha.tatashin@oracle.com>
Date: Thu, 11 Jan 2018 06:50:25 -0800
Subject: [PATCH] x86/pti/efi: broken conversion from efi to kernel page table

In entry_64.S we have code like this:

    /* Unconditionally use kernel CR3 for do_nmi() */
    /* %rax is saved above, so OK to clobber here */
    ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
    /* If PCID enabled, NOFLUSH now and NOFLUSH on return */
    ALTERNATIVE "", "bts $63, %rax", X86_FEATURE_PCID
    pushq   %rax
    /* mask off "user" bit of pgd address and 12 PCID bits: */
    andq    $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
    movq    %rax, %cr3
2:

    /* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
    call    do_nmi

With this instruction:
    andq    $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax

We unconditionally switch from whatever our CR3 was to the kernel page table.
But in arch/x86/platform/efi/efi_64.c we temporarily set a different page
table, one that does not have the kernel page table at a 0x1000 offset from it.

Look in efi_thunk() and efi_thunk_set_virtual_address_map().

So, while CR3 points to the other page table, we get an NMI interrupt,
and clear 0x1000 from CR3, resulting in a bogus CR3 if the 0x1000 bit was
set.

The efi page table comes from arch/x86/realmode/rm/trampoline_64.S:

    .bss
    .balign PAGE_SIZE
    GLOBAL(trampoline_pgd) .space PAGE_SIZE

Notice: the alignment is PAGE_SIZE, so after masking off
KAISER_SHADOW_PGD_OFFSET, which is equal to PAGE_SIZE, we can get a
different page table address.

But even if we fix the alignment here, the trampoline binary is later copied
into dynamically allocated memory in reserve_real_mode(), so we need to
fix that place as well.

Fixes: 8a43ddfb93a0 ("KAISER: Kernel Address Isolation")

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
---
 arch/x86/include/asm/kaiser.h        | 8 ++++++++
 arch/x86/realmode/init.c             | 4 +++-
 arch/x86/realmode/rm/trampoline_64.S | 3 ++-
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kaiser.h b/arch/x86/include/asm/kaiser.h
index 802bbbdfe143..e087bd7a8d29 100644
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -19,6 +19,12 @@
 
 #define KAISER_SHADOW_PGD_OFFSET 0x1000
 
+/*
+ *  A page table address must have this alignment to stay the same when
+ *  KAISER_SHADOW_PGD_OFFSET mask is applied
+ */
+#define KAISER_KERNEL_PGD_ALIGNMENT (KAISER_SHADOW_PGD_OFFSET << 1)
+
 #ifdef __ASSEMBLY__
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
 
@@ -71,6 +77,8 @@ movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
 
 #else /* CONFIG_PAGE_TABLE_ISOLATION */
 
+#define KAISER_KERNEL_PGD_ALIGNMENT PAGE_SIZE
+
 .macro SWITCH_KERNEL_CR3
 .endm
 .macro SWITCH_USER_CR3
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 0b7a63d98440..cfecb7d6c6a8 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -1,5 +1,6 @@
 #include <linux/io.h>
 #include <linux/memblock.h>
+#include <linux/kaiser.h>
 
 #include <asm/cacheflush.h>
 #include <asm/pgtable.h>
@@ -15,7 +16,8 @@ void __init reserve_real_mode(void)
 	size_t size = PAGE_ALIGN(real_mode_blob_end - real_mode_blob);
 
 	/* Has to be under 1M so we can execute real-mode AP code. */
-	mem = memblock_find_in_range(0, 1<<20, size, PAGE_SIZE);
+	mem = memblock_find_in_range(0, 1 << 20, size,
+				     KAISER_KERNEL_PGD_ALIGNMENT);
 	if (!mem)
 		panic("Cannot allocate trampoline\n");
 
diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/trampoline_64.S
index dac7b20d2f9d..781cca63f795 100644
--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -30,6 +30,7 @@
 #include <asm/msr.h>
 #include <asm/segment.h>
 #include <asm/processor-flags.h>
+#include <asm/kaiser.h>
 #include "realmode.h"
 
 	.text
@@ -139,7 +140,7 @@ tr_gdt:
 tr_gdt_end:
 
 	.bss
-	.balign	PAGE_SIZE
+	.balign	KAISER_KERNEL_PGD_ALIGNMENT
 GLOBAL(trampoline_pgd)		.space	PAGE_SIZE
 
 	.balign	8
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-11 18:36                                                         ` Pavel Tatashin
@ 2018-01-11 18:40                                                           ` Pavel Tatashin
  2018-01-11 19:09                                                             ` Linus Torvalds
  2018-01-11 20:10                                                           ` Greg Kroah-Hartman
  1 sibling, 1 reply; 156+ messages in thread
From: Pavel Tatashin @ 2018-01-11 18:40 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andy Lutomirski, Hugh Dickins, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable,
	Steve Sistare

If it is better to resubmit this patch via git send-email, please let me know.

Thank you,
Pavel

On Thu, Jan 11, 2018 at 1:36 PM, Pavel Tatashin
<pasha.tatashin@oracle.com> wrote:
> I have root caused the memory corruption panics/hangs that I've been
> experiencing during boot with the latest 4.4.110 kernel. The problem
> as was suspected by Andy Lutomirski is with interaction between PTI
> and EFI. It may affect any system that has EFI bios.  I have not
> verified if it can affect any other kernel beside 4.4.110
>
> Attached is the fix for this issue with explanations that Steve
> Sistare and I developed.

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-11 18:40                                                           ` Pavel Tatashin
@ 2018-01-11 19:09                                                             ` Linus Torvalds
  2018-01-11 20:37                                                               ` Thomas Gleixner
  0 siblings, 1 reply; 156+ messages in thread
From: Linus Torvalds @ 2018-01-11 19:09 UTC (permalink / raw)
  To: Pavel Tatashin, Thomas Gleixner
  Cc: Greg Kroah-Hartman, Andy Lutomirski, Hugh Dickins,
	Thomas Voegtle, Linux Kernel Mailing List, Andrew Morton,
	Guenter Roeck, Shuah Khan, patches, Ben Hutchings, lkft-triage,
	stable, Steve Sistare

[ Patch to make sure the EFI trampoline_pgd is properly aligned and
has the double pgd that KPTI requires ]

On Thu, Jan 11, 2018 at 10:40 AM, Pavel Tatashin
<pasha.tatashin@oracle.com> wrote:
> If it is better to resubmit this patch via git send-email, please let me know.

It would be better, because that way the patch can be more easily
quoted and discussed.

That said, I do not see why this isn't an issue upstream too.

As far as I can tell, it's not just 4.4.110. Our current entry code
does that ADJUST_KERNEL_CR3 dance too, which clears the
PTI_SWITCH_MASK bit from cr3.

And that realmode trampoline pgd seems all to be just aligned to PAGE_SIZE.

Now, in the modern world, we generate new page tables for EFI, but we
still have that EFI_OLD_MEMMAP code that disables that. And afaik,
EFI_OLD_MEMMAP has the exact same problem that your patch fixes in 4.4
(where it's always on).

So I think this patch should go into the development kernel too.

Or maybe it already is, and I just haven't gotten it yet.

Or - even more likely - I'm missing something entirely, and even
EFI_OLD_MEMMAP solved this some other way upstream.

Adding Thomas Gleixner explicitly to the participants so that he can
tell me I'm a moron and point me to the right thing.

               Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-11 18:36                                                         ` Pavel Tatashin
  2018-01-11 18:40                                                           ` Pavel Tatashin
@ 2018-01-11 20:10                                                           ` Greg Kroah-Hartman
  2018-01-11 20:17                                                             ` Linus Torvalds
  2018-01-11 20:18                                                             ` Pavel Tatashin
  1 sibling, 2 replies; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-11 20:10 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Andy Lutomirski, Hugh Dickins, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable,
	Steve Sistare

On Thu, Jan 11, 2018 at 01:36:50PM -0500, Pavel Tatashin wrote:
> I have root caused the memory corruption panics/hangs that I've been
> experiencing during boot with the latest 4.4.110 kernel. The problem
> as was suspected by Andy Lutomirski is with interaction between PTI
> and EFI. It may affect any system that has EFI bios.  I have not
> verified if it can affect any other kernel beside 4.4.110
> 
> Attached is the fix for this issue with explanations that Steve
> Sistare and I developed.

Nice, but why does this not show up in 4.9 and 4.14 and Linus's tree as
well on this hardware?  Nor on the SLES12 SP3 kernel?

What is different there that 4.4 requires?  That worries me more than
your fix (which looks good to me, fwiw.)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-11 20:10                                                           ` Greg Kroah-Hartman
@ 2018-01-11 20:17                                                             ` Linus Torvalds
  2018-01-11 20:18                                                             ` Pavel Tatashin
  1 sibling, 0 replies; 156+ messages in thread
From: Linus Torvalds @ 2018-01-11 20:17 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Pavel Tatashin, Andy Lutomirski, Hugh Dickins, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable,
	Steve Sistare

On Thu, Jan 11, 2018 at 12:10 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> Nice, but why does this not show up in 4.9 and 4.14 and Linus's tree as
> well on this hardware?  Nor on the SLES12 SP3 kernel?
>
> What is different there that 4.4 requires?  That worries me more than
> your fix (which looks good to me, fwiw.)

I really think it's simply that since v4.6, we've had commit
67a9108ed431 ("x86/efi: Build our own page table structures"), so no
normal EFI use actually uses the old legacy mapping unless you passed
in "efi=old_map" on the kernel command line.

So the bug is there in all versions, it's just that it's normally only
noticeable in 4.4.

But I might be missing some other difference, so take that with a pinch of salt.

             Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-11 20:10                                                           ` Greg Kroah-Hartman
  2018-01-11 20:17                                                             ` Linus Torvalds
@ 2018-01-11 20:18                                                             ` Pavel Tatashin
  1 sibling, 0 replies; 156+ messages in thread
From: Pavel Tatashin @ 2018-01-11 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andy Lutomirski, Hugh Dickins, Linus Torvalds, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable,
	Steve Sistare



On 01/11/2018 03:10 PM, Greg Kroah-Hartman wrote:
> On Thu, Jan 11, 2018 at 01:36:50PM -0500, Pavel Tatashin wrote:
>> I have root caused the memory corruption panics/hangs that I've been
>> experiencing during boot with the latest 4.4.110 kernel. The problem
>> as was suspected by Andy Lutomirski is with interaction between PTI
>> and EFI. It may affect any system that has EFI bios.  I have not
>> verified if it can affect any other kernel beside 4.4.110
>>
>> Attached is the fix for this issue with explanations that Steve
>> Sistare and I developed.
> 
> Nice, but why does this not show up in 4.9 and 4.14 and Linus's tree as
> well on this hardware?  Nor on the SLES12 SP3 kernel?
> 
> What is different there that 4.4 requires?  That worries me more than
> your fix (which looks good to me, fwiw.)

Hi Greg,

I have not studied other kernel versions; EFI has changed
substantially since 4.4. But even on 4.4.110, several things
have to happen for this bug to show up:

1. During boot, memblock must allocate an address that is not
2 * PAGE_SIZE aligned.
2. An NMI must arrive exactly while EFI has replaced the page table.

While I was debugging this problem, I tried enabling kasan and vm_debug,
adding more printfs, etc., but every little change would cause the problem
to disappear or to appear less frequently.
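
To illustrate the first condition, here is a minimal user-space sketch, not
kernel code. The constant KAISER_SHADOW_PGD_OFFSET and its value 0x1000 come
from the patch; the helper names nmi_masked_cr3() and survives_nmi_mask() are
made up for this example, and the PCID bits are ignored for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value taken from arch/x86/include/asm/kaiser.h in the patch. */
#define KAISER_SHADOW_PGD_OFFSET 0x1000ULL

/*
 * Hypothetical helper: models only the part of the NMI entry code that
 * clears the shadow-pgd bit from CR3. PCID handling is left out.
 */
uint64_t nmi_masked_cr3(uint64_t cr3)
{
	return cr3 & ~KAISER_SHADOW_PGD_OFFSET;
}

/* Hypothetical helper: true if the pgd address is unchanged by the mask. */
bool survives_nmi_mask(uint64_t pgd_phys)
{
	return nmi_masked_cr3(pgd_phys) == pgd_phys;
}
```

Under these assumptions an address that is aligned to 2 * PAGE_SIZE, such as
0x98000, passes through unchanged, while 0x99000 (aligned to PAGE_SIZE only)
becomes 0x98000, which is the neighbouring page rather than the pgd.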

Thank you,
Pavel

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-11 19:09                                                             ` Linus Torvalds
@ 2018-01-11 20:37                                                               ` Thomas Gleixner
  2018-01-11 20:46                                                                 ` Linus Torvalds
  0 siblings, 1 reply; 156+ messages in thread
From: Thomas Gleixner @ 2018-01-11 20:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Pavel Tatashin, Greg Kroah-Hartman, Andy Lutomirski,
	Hugh Dickins, Thomas Voegtle, Linux Kernel Mailing List,
	Andrew Morton, Guenter Roeck, Shuah Khan, patches, Ben Hutchings,
	lkft-triage, stable, Steve Sistare, Matt Fleming,
	Borislav Petkov

On Thu, 11 Jan 2018, Linus Torvalds wrote:

> [ Patch to make sure the EFI trampoline_pgd is properly aligned and
> has the double pgd that KPTI requires ]
> 
> On Thu, Jan 11, 2018 at 10:40 AM, Pavel Tatashin
> <pasha.tatashin@oracle.com> wrote:
> > If it is better to resubmit this patch via git send-email, please let me know.
> 
> It would be better, because that way the patch can be more easily
> quoted and discussed.
> 
> That said, I do not see why this isn't an issue upstream too.
> 
> As far as I can tell, it's not just 4.4.110. Our current entry code
> does that ADJUST_KERNEL_CR3 dance too, which clears the
> PTI_SWITCH_MASK bit from cr3.
> 
> And that realmode trampoline pgd seems all to be just aligned to PAGE_SIZE.

Right, but see below.

> Now, in the modern world, we generate new page tables for EFI, but we
> still have that EFI_OLD_MEMMAP code that disables that. And afaik,
> EFI_OLD_MEMMAP has the exact same problem that your patch fixes in 4.4
> (where it's always on).
> 
> So I think this patch should go into the development kernel too.
> 
> Or maybe it already is, and I just haven't gotten it yet.

It's not. There is an efi oldmap fix pending, but that's a different story.

> Or - even more likely - I'm missing something entirely, and even
> EFI_OLD_MEMMAP solved this some other way upstream.

67a9108ed431 ("x86/efi: Build our own page table structures")

got rid of EFI depending on real_mode_header->trampoline_pgd

So I don't see how upstream needs the fix as the trampoline_pgd seems only
to be used when coming out of the boot loader.

Adding Matt. He stepped back from EFI, but he might still know.

> Adding Thomas Gleixner explicitly to the participants so that he can
> tell me I'm a moron and point me to the right thing.

Your wish is my command, but I need to stare some more before doing so.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-11 20:37                                                               ` Thomas Gleixner
@ 2018-01-11 20:46                                                                 ` Linus Torvalds
  2018-01-11 21:32                                                                   ` Thomas Gleixner
  2018-01-11 21:35                                                                   ` Steven Sistare
  0 siblings, 2 replies; 156+ messages in thread
From: Linus Torvalds @ 2018-01-11 20:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Pavel Tatashin, Greg Kroah-Hartman, Andy Lutomirski,
	Hugh Dickins, Thomas Voegtle, Linux Kernel Mailing List,
	Andrew Morton, Guenter Roeck, Shuah Khan, patches, Ben Hutchings,
	lkft-triage, stable, Steve Sistare, Matt Fleming,
	Borislav Petkov

On Thu, Jan 11, 2018 at 12:37 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> 67a9108ed431 ("x86/efi: Build our own page table structures")
>
> got rid of EFI depending on real_mode_header->trampoline_pgd

So I think it only got rid of it by default - the codepath is still
there, the allocation is still there, it's just that it's not actually
used unless somebody does that "efi=old_map" thing.

Looking around, there's at least one quirk for the SGI UV1 system that
enables EFI_OLD_MEMMAP automatically. There might be others that I
missed, but I think that's it.

So it *can* trigger without "efi=old_map", but not on any normal machines.

And as Pavel points out, even when the bug is active, it's pretty hard
to actually trigger.

But yeah, there may be other EFI patches that I didn't notice that
changed things in other ways too.

               Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-11 20:46                                                                 ` Linus Torvalds
@ 2018-01-11 21:32                                                                   ` Thomas Gleixner
  2018-01-11 22:30                                                                     ` Thomas Gleixner
  2018-01-11 21:35                                                                   ` Steven Sistare
  1 sibling, 1 reply; 156+ messages in thread
From: Thomas Gleixner @ 2018-01-11 21:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Pavel Tatashin, Greg Kroah-Hartman, Andy Lutomirski,
	Hugh Dickins, Thomas Voegtle, Linux Kernel Mailing List,
	Andrew Morton, Guenter Roeck, Shuah Khan, patches, Ben Hutchings,
	lkft-triage, stable, Steve Sistare, Matt Fleming,
	Borislav Petkov

On Thu, 11 Jan 2018, Linus Torvalds wrote:

> On Thu, Jan 11, 2018 at 12:37 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > 67a9108ed431 ("x86/efi: Build our own page table structures")
> >
> > got rid of EFI depending on real_mode_header->trampoline_pgd
> 
> So I think it only got rid of by default - the codepath is still
> there, the allocation is still there, it's just that it's not actually
> used unless somebody does that "efi=old_mmap" thing.

Yes, the trampoline_pgd is still around, but I can't figure out how it
would be used after boot. Confused, digging more.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-11 20:46                                                                 ` Linus Torvalds
  2018-01-11 21:32                                                                   ` Thomas Gleixner
@ 2018-01-11 21:35                                                                   ` Steven Sistare
  2018-01-11 21:44                                                                     ` Thomas Gleixner
  1 sibling, 1 reply; 156+ messages in thread
From: Steven Sistare @ 2018-01-11 21:35 UTC (permalink / raw)
  To: Linus Torvalds, Thomas Gleixner
  Cc: Pavel Tatashin, Greg Kroah-Hartman, Andy Lutomirski,
	Hugh Dickins, Thomas Voegtle, Linux Kernel Mailing List,
	Andrew Morton, Guenter Roeck, Shuah Khan, patches, Ben Hutchings,
	lkft-triage, stable, Matt Fleming, Borislav Petkov

On 1/11/2018 3:46 PM, Linus Torvalds wrote:
> On Thu, Jan 11, 2018 at 12:37 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> 67a9108ed431 ("x86/efi: Build our own page table structures")
>>
>> got rid of EFI depending on real_mode_header->trampoline_pgd
> 
> So I think it only got rid of by default - the codepath is still
> there, the allocation is still there, it's just that it's not actually
> used unless somebody does that "efi=old_mmap" thing.
> 
> Looking around, there's at least one quirk for the SGI UV1 system that
> enables EFI_OLD_MMAP automatically. There might be others that I
> missed, but I think that's it.
> 
> So it *can* trigger without "efi=old_mmap", but not on any normal machines.
> 
> And as Pavel points out, even when the bug is active, it's pretty hard
> to actually trigger.
> 
> But yeah, there may be other EFI patches that I didn't notice that
> changed things in other ways too.
> 
>                Linus

The bug is not present in the latest upstream kernel because the efi_pgd is
correctly aligned:

  arch/x86/platform/efi/efi_64.c
    int __init efi_alloc_page_tables(void)
      efi_pgd = (pgd_t *)__get_free_pages(gfp_mask, PGD_ALLOCATION_ORDER);

  arch/x86/include/asm/pgalloc.h
    +#ifdef CONFIG_PAGE_TABLE_ISOLATION
    +#define PGD_ALLOCATION_ORDER 1
    +#else
    +#define PGD_ALLOCATION_ORDER 0
    +#endif

Pavel's patch fixes kernels prior to
  67a9108ed431 ("x86/efi: Build our own page table structures")

where the efi pgd allocation looks like:

  arch/x86/realmode/init.c
    void __init reserve_real_mode(void)
      mem = memblock_find_in_range(0, 1<<20, size, PAGE_SIZE);
      base = __va(mem);
      real_mode_header = (struct real_mode_header *) base;

    void __init setup_real_mode(void)
      trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);

Kernel versions between 67a9108ed431 and the latest also have the bug and
need a similar fix:

  arch/x86/platform/efi/efi_64.c

    int __init efi_alloc_page_tables(void)
      efi_pgd = (pgd_t *)__get_free_page(gfp_mask);

    int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
      pgd = efi_pgd;
      efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd);

All of the code paths above are taken when *not* EFI_OLD_MEMMAP.
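
The alignment argument can be sketched in user space as follows. This is an
illustration only, under the assumptions that PAGE_SIZE is 4096, that an
order-N page allocation is naturally aligned to PAGE_SIZE << N, and that the
bit cleared from CR3 is the 0x1000 bit named KAISER_SHADOW_PGD_OFFSET in the
4.4 patch. The helper names alloc_alignment() and alignment_is_mask_safe() are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL                  /* assumed x86 base page size */
#define KAISER_SHADOW_PGD_OFFSET 0x1000ULL /* value from the 4.4 patch */

/* Hypothetical helper: natural alignment of an order-N page allocation. */
uint64_t alloc_alignment(unsigned int order)
{
	return PAGE_SIZE << order;
}

/*
 * Hypothetical helper: true if every address with the given alignment has
 * the shadow-pgd bit clear, so the CR3 mask cannot change it.
 */
bool alignment_is_mask_safe(uint64_t alignment)
{
	return (alignment & (alignment - 1)) == 0 && /* power of two */
	       alignment > KAISER_SHADOW_PGD_OFFSET;
}
```

With these definitions an order-0 allocation (4096-byte alignment) is not
safe, while an order-1 allocation (8192-byte alignment) is, which matches the
PGD_ALLOCATION_ORDER 1 case shown above.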

- Steve

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-11 21:35                                                                   ` Steven Sistare
@ 2018-01-11 21:44                                                                     ` Thomas Gleixner
  2018-01-11 21:49                                                                       ` Linus Torvalds
  0 siblings, 1 reply; 156+ messages in thread
From: Thomas Gleixner @ 2018-01-11 21:44 UTC (permalink / raw)
  To: Steven Sistare
  Cc: Linus Torvalds, Pavel Tatashin, Greg Kroah-Hartman,
	Andy Lutomirski, Hugh Dickins, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable,
	Matt Fleming, Borislav Petkov

On Thu, 11 Jan 2018, Steven Sistare wrote:
> On 1/11/2018 3:46 PM, Linus Torvalds wrote:
> > On Thu, Jan 11, 2018 at 12:37 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >>
> >> 67a9108ed431 ("x86/efi: Build our own page table structures")
> >>
> >> got rid of EFI depending on real_mode_header->trampoline_pgd
> > 
> > So I think it only got rid of by default - the codepath is still
> > there, the allocation is still there, it's just that it's not actually
> > used unless somebody does that "efi=old_mmap" thing.
> > 
> > Looking around, there's at least one quirk for the SGI UV1 system that
> > enables EFI_OLD_MMAP automatically. There might be others that I
> > missed, but I think that's it.
> > 
> > So it *can* trigger without "efi=old_mmap", but not on any normal machines.
> > 
> > And as Pavel points out, even when the bug is active, it's pretty hard
> > to actually trigger.
> > 
> > But yeah, there may be other EFI patches that I didn't notice that
> > changed things in other ways too.
> > 
> >                Linus
> 
> The bug is not present in the latest upstream kernel because the efi_pgd is
> correctly aligned:
> 
>   arch/x86/platform/efi/efi_64.c
>     int __init efi_alloc_page_tables(void)
>       efi_pgd = (pgd_t *)__get_free_pages(gfp_mask, PGD_ALLOCATION_ORDER);

Yes, I came to exactly the same conclusion, but I didn't want to call Linus
a moron before I had triple-checked that trampoline_pgd is still there but
only ever used to get out of the realmode swamp at boot.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-11 21:44                                                                     ` Thomas Gleixner
@ 2018-01-11 21:49                                                                       ` Linus Torvalds
  0 siblings, 0 replies; 156+ messages in thread
From: Linus Torvalds @ 2018-01-11 21:49 UTC (permalink / raw)
  To: Steven Sistare
  Cc: Thomas Gleixner, Pavel Tatashin, Greg Kroah-Hartman,
	Andy Lutomirski, Hugh Dickins, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable,
	Matt Fleming, Borislav Petkov

[-- Attachment #1: Type: text/plain, Size: 547 bytes --]

On Jan 11, 2018 13:35, "Steven Sistare" <steven.sistare@oracle.com> wrote:


All of the code paths above are taken when *not* EFI_OLD_MMAP.


But it is exactly the EFI_OLD_MEMMAP case I worry about.

Nobody should hopefully use it, but as mentioned, at least the SGI UV1 case
enables it automatically.

And who knows how many users ended up adding it manually due to the
problems we had for a while with EFI page tables (due to non-linear
addresses when laying out the EFI data, but also due to bad EFI memory
information from the BIOS).

     Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-11 21:32                                                                   ` Thomas Gleixner
@ 2018-01-11 22:30                                                                     ` Thomas Gleixner
  2018-01-11 22:42                                                                       ` Steven Sistare
  2018-01-11 23:03                                                                       ` Thomas Gleixner
  0 siblings, 2 replies; 156+ messages in thread
From: Thomas Gleixner @ 2018-01-11 22:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Pavel Tatashin, Greg Kroah-Hartman, Andy Lutomirski,
	Hugh Dickins, Thomas Voegtle, Linux Kernel Mailing List,
	Andrew Morton, Guenter Roeck, Shuah Khan, patches, Ben Hutchings,
	lkft-triage, stable, Steve Sistare, Matt Fleming,
	Borislav Petkov

On Thu, 11 Jan 2018, Thomas Gleixner wrote:
> On Thu, 11 Jan 2018, Linus Torvalds wrote:
> 
> > On Thu, Jan 11, 2018 at 12:37 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > >
> > > 67a9108ed431 ("x86/efi: Build our own page table structures")
> > >
> > > got rid of EFI depending on real_mode_header->trampoline_pgd
> > 
> > So I think it only got rid of by default - the codepath is still
> > there, the allocation is still there, it's just that it's not actually
> > > used unless somebody does that "efi=old_map" thing.
> 
> Yes, the trampoline_pgd is still around, but I can't figure out how it
> would be used after boot. Confused, digging more.

So coming back to the same commit. From the changelog:

    This is caused by mapping EFI regions with RWX permissions.
    There isn't much we can do to restrict the permissions for these
    regions due to the way the firmware toolchains mix code and
    data, but we can at least isolate these mappings so that they do
    not appear in the regular kernel page tables.
    
    In commit d2f7cbe7b26a ("x86/efi: Runtime services virtual
    mapping") we started using 'trampoline_pgd' to map the EFI
    regions because there was an existing identity mapping there
    which we use during the SetVirtualAddressMap() call and for
    broken firmware that accesses those addresses.

So this very commit gets rid of the (ab)use of trampoline_pgd and allocates
efi_pgd, which we made use the proper size.

trampoline_pgd is since then only used to get into long mode in
realmode/rm/trampoline_64.S and for reboot in machine_real_restart().

The runtime services stuff does not use it in kernel versions >= 4.6

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-11 22:30                                                                     ` Thomas Gleixner
@ 2018-01-11 22:42                                                                       ` Steven Sistare
  2018-01-11 22:47                                                                         ` Thomas Gleixner
  2018-01-11 22:59                                                                         ` Linus Torvalds
  2018-01-11 23:03                                                                       ` Thomas Gleixner
  1 sibling, 2 replies; 156+ messages in thread
From: Steven Sistare @ 2018-01-11 22:42 UTC (permalink / raw)
  To: Thomas Gleixner, Linus Torvalds
  Cc: Pavel Tatashin, Greg Kroah-Hartman, Andy Lutomirski,
	Hugh Dickins, Thomas Voegtle, Linux Kernel Mailing List,
	Andrew Morton, Guenter Roeck, Shuah Khan, patches, Ben Hutchings,
	lkft-triage, stable, Matt Fleming, Borislav Petkov

On 1/11/2018 5:30 PM, Thomas Gleixner wrote:
> On Thu, 11 Jan 2018, Thomas Gleixner wrote:
>> On Thu, 11 Jan 2018, Linus Torvalds wrote:
>>
>>> On Thu, Jan 11, 2018 at 12:37 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>>>>
>>>> 67a9108ed431 ("x86/efi: Build our own page table structures")
>>>>
>>>> got rid of EFI depending on real_mode_header->trampoline_pgd
>>>
>>> So I think it only got rid of by default - the codepath is still
>>> there, the allocation is still there, it's just that it's not actually
>>> used unless somebody does that "efi=old_map" thing.
>>
>> Yes, the trampoline_pgd is still around, but I can't figure out how it
>> would be used after boot. Confused, digging more.
> 
> So coming back to the same commit. From the changelog:
> 
>     This is caused by mapping EFI regions with RWX permissions.
>     There isn't much we can do to restrict the permissions for these
>     regions due to the way the firmware toolchains mix code and
>     data, but we can at least isolate these mappings so that they do
>     not appear in the regular kernel page tables.
>     
>     In commit d2f7cbe7b26a ("x86/efi: Runtime services virtual
>     mapping") we started using 'trampoline_pgd' to map the EFI
>     regions because there was an existing identity mapping there
>     which we use during the SetVirtualAddressMap() call and for
>     broken firmware that accesses those addresses.
> 
> So this very commit gets rid of the (ab)use of trampoline_pgd and allocates
> efi_pgd, which we made use the proper size.
> 
> trampoline_pgd is since then only used to get into long mode in
> realmode/rm/trampoline_64.S and for reboot in machine_real_restart().
> 
> The runtime services stuff does not use it in kernel versions >= 4.6
> 
> Thanks,
> 
> 	tglx

Yes, and addressing Linus' concern about EFI_OLD_MEMMAP, those paths are 
independent of it.  When EFI_OLD_MEMMAP is enabled, the efi pgd is not
used, and the bug will not bite.

- Steve

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-11 22:42                                                                       ` Steven Sistare
@ 2018-01-11 22:47                                                                         ` Thomas Gleixner
  2018-01-12  1:15                                                                           ` Guenter Roeck
  2018-01-11 22:59                                                                         ` Linus Torvalds
  1 sibling, 1 reply; 156+ messages in thread
From: Thomas Gleixner @ 2018-01-11 22:47 UTC (permalink / raw)
  To: Steven Sistare
  Cc: Linus Torvalds, Pavel Tatashin, Greg Kroah-Hartman,
	Andy Lutomirski, Hugh Dickins, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable,
	Matt Fleming, Borislav Petkov

On Thu, 11 Jan 2018, Steven Sistare wrote:
> On 1/11/2018 5:30 PM, Thomas Gleixner wrote:
> > On Thu, 11 Jan 2018, Thomas Gleixner wrote:
> >> On Thu, 11 Jan 2018, Linus Torvalds wrote:
> >>
> >>> On Thu, Jan 11, 2018 at 12:37 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >>>>
> >>>> 67a9108ed431 ("x86/efi: Build our own page table structures")
> >>>>
> >>>> got rid of EFI depending on real_mode_header->trampoline_pgd
> >>>
> >>> So I think it only got rid of by default - the codepath is still
> >>> there, the allocation is still there, it's just that it's not actually
> >>> used unless somebody does that "efi=old_map" thing.
> >>
> >> Yes, the trampoline_pgd is still around, but I can't figure out how it
> >> would be used after boot. Confused, digging more.
> > 
> > So coming back to the same commit. From the changelog:
> > 
> >     This is caused by mapping EFI regions with RWX permissions.
> >     There isn't much we can do to restrict the permissions for these
> >     regions due to the way the firmware toolchains mix code and
> >     data, but we can at least isolate these mappings so that they do
> >     not appear in the regular kernel page tables.
> >     
> >     In commit d2f7cbe7b26a ("x86/efi: Runtime services virtual
> >     mapping") we started using 'trampoline_pgd' to map the EFI
> >     regions because there was an existing identity mapping there
> >     which we use during the SetVirtualAddressMap() call and for
> >     broken firmware that accesses those addresses.
> > 
> > So this very commit gets rid of the (ab)use of trampoline_pgd and allocates
> > efi_pgd, which we made use the proper size.
> > 
> > trampoline_pgd is since then only used to get into long mode in
> > realmode/rm/trampoline_64.S and for reboot in machine_real_restart().
> > 
> > The runtime services stuff does not use it in kernel versions >= 4.6
> > 
> > Thanks,
> > 
> > 	tglx
> 
> Yes, and addressing Linus' concern about EFI_OLD_MEMMAP, those paths are 
> independent of it.  When EFI_OLD_MEMMAP is enabled, the efi pgd is not
> used, and the bug will not bite.

We have a fix queued in tip/x86/pti which addresses a missing NX clear, but
that's a different story.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-11 22:42                                                                       ` Steven Sistare
  2018-01-11 22:47                                                                         ` Thomas Gleixner
@ 2018-01-11 22:59                                                                         ` Linus Torvalds
  1 sibling, 0 replies; 156+ messages in thread
From: Linus Torvalds @ 2018-01-11 22:59 UTC (permalink / raw)
  To: Steven Sistare
  Cc: Thomas Gleixner, Pavel Tatashin, Greg Kroah-Hartman,
	Andy Lutomirski, Hugh Dickins, Thomas Voegtle,
	Linux Kernel Mailing List, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable,
	Matt Fleming, Borislav Petkov

On Thu, Jan 11, 2018 at 2:42 PM, Steven Sistare
<steven.sistare@oracle.com> wrote:
>
> Yes, and addressing Linus' concern about EFI_OLD_MEMMAP, those paths are
> independent of it.  When EFI_OLD_MEMMAP is enabled, the efi pgd is not
> used, and the bug will not bite.

Ok, good. Thanks for checking.

               Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-11 22:30                                                                     ` Thomas Gleixner
  2018-01-11 22:42                                                                       ` Steven Sistare
@ 2018-01-11 23:03                                                                       ` Thomas Gleixner
  2018-01-12  7:19                                                                         ` Greg Kroah-Hartman
  1 sibling, 1 reply; 156+ messages in thread
From: Thomas Gleixner @ 2018-01-11 23:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Pavel Tatashin, Greg Kroah-Hartman, Andy Lutomirski,
	Hugh Dickins, Thomas Voegtle, Linux Kernel Mailing List,
	Andrew Morton, Guenter Roeck, Shuah Khan, patches, Ben Hutchings,
	lkft-triage, stable, Steve Sistare, Matt Fleming,
	Borislav Petkov

On Thu, 11 Jan 2018, Thomas Gleixner wrote:
> On Thu, 11 Jan 2018, Thomas Gleixner wrote:
> > On Thu, 11 Jan 2018, Linus Torvalds wrote:
> > 
> > > On Thu, Jan 11, 2018 at 12:37 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > > >
> > > > 67a9108ed431 ("x86/efi: Build our own page table structures")
> > > >
> > > > got rid of EFI depending on real_mode_header->trampoline_pgd
> > > 
> > > So I think it only got rid of by default - the codepath is still
> > > there, the allocation is still there, it's just that it's not actually
> > > > used unless somebody does that "efi=old_map" thing.
> > 
> > Yes, the trampoline_pgd is still around, but I can't figure out how it
> > would be used after boot. Confused, digging more.
> 
> So coming back to the same commit. From the changelog:
> 
>     This is caused by mapping EFI regions with RWX permissions.
>     There isn't much we can do to restrict the permissions for these
>     regions due to the way the firmware toolchains mix code and
>     data, but we can at least isolate these mappings so that they do
>     not appear in the regular kernel page tables.
>     
>     In commit d2f7cbe7b26a ("x86/efi: Runtime services virtual
>     mapping") we started using 'trampoline_pgd' to map the EFI
>     regions because there was an existing identity mapping there
>     which we use during the SetVirtualAddressMap() call and for
>     broken firmware that accesses those addresses.
> 
> So this very commit gets rid of the (ab)use of trampoline_pgd and allocates
> efi_pgd, which we made use the proper size.
> 
> trampoline_pgd is since then only used to get into long mode in
> realmode/rm/trampoline_64.S and for reboot in machine_real_restart().
> 
> The runtime services stuff does not use it in kernel versions >= 4.6

But there is one very well hidden user for it after boot:

    It's used for booting secondary CPUs from real mode

So the secondaries use the trampoline pgd for the transition to long mode
and then jump to secondary_startup_64, where CR3 is set to the real kernel
page tables.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-11 22:47                                                                         ` Thomas Gleixner
@ 2018-01-12  1:15                                                                           ` Guenter Roeck
  0 siblings, 0 replies; 156+ messages in thread
From: Guenter Roeck @ 2018-01-12  1:15 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Steven Sistare, Linus Torvalds, Pavel Tatashin,
	Greg Kroah-Hartman, Andy Lutomirski, Hugh Dickins,
	Thomas Voegtle, Linux Kernel Mailing List, Andrew Morton,
	Shuah Khan, patches, Ben Hutchings, lkft-triage, stable,
	Matt Fleming, Borislav Petkov

On Thu, Jan 11, 2018 at 11:47:23PM +0100, Thomas Gleixner wrote:
> On Thu, 11 Jan 2018, Steven Sistare wrote:
> > On 1/11/2018 5:30 PM, Thomas Gleixner wrote:
> > > On Thu, 11 Jan 2018, Thomas Gleixner wrote:
> > >> On Thu, 11 Jan 2018, Linus Torvalds wrote:
> > >>
> > >>> On Thu, Jan 11, 2018 at 12:37 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > >>>>
> > >>>> 67a9108ed431 ("x86/efi: Build our own page table structures")
> > >>>>
> > >>>> got rid of EFI depending on real_mode_header->trampoline_pgd
> > >>>
> > >>> So I think it only got rid of by default - the codepath is still
> > >>> there, the allocation is still there, it's just that it's not actually
> > >>> used unless somebody does that "efi=old_map" thing.
> > >>
> > >> Yes, the trampoline_pgd is still around, but I can't figure out how it
> > >> would be used after boot. Confused, digging more.
> > > 
> > > So coming back to the same commit. From the changelog:
> > > 
> > >     This is caused by mapping EFI regions with RWX permissions.
> > >     There isn't much we can do to restrict the permissions for these
> > >     regions due to the way the firmware toolchains mix code and
> > >     data, but we can at least isolate these mappings so that they do
> > >     not appear in the regular kernel page tables.
> > >     
> > >     In commit d2f7cbe7b26a ("x86/efi: Runtime services virtual
> > >     mapping") we started using 'trampoline_pgd' to map the EFI
> > >     regions because there was an existing identity mapping there
> > >     which we use during the SetVirtualAddressMap() call and for
> > >     broken firmware that accesses those addresses.
> > > 
> > > So this very commit gets rid of the (ab)use of trampoline_pgd and allocates
> > > efi_pgd, which we made use the proper size.
> > > 
> > > trampoline_pgd is since then only used to get into long mode in
> > > realmode/rm/trampoline_64.S and for reboot in machine_real_restart().
> > > 
> > > The runtime services stuff does not use it in kernel versions >= 4.6
> > > 
> > > Thanks,
> > > 
> > > 	tglx
> > 
> > Yes, and addressing Linus' concern about EFI_OLD_MEMMAP, those paths are 
> > independent of it.  When EFI_OLD_MEMMAP is enabled, the efi pgd is not
> > used, and the bug will not bite.
> 
> We have a fix queued in tip/x86/pti which addresses a missing NX clear, but
> that's a different story.
> 
Since you are talking about NX, I see this in last night's -next:

kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
BUG: unable to handle kernel paging request at fffffe0000007000
IP: 0xfffffe0000006e9d
PGD ffd6067 P4D ffd6067 PUD ffd5067 PMD ff73067 PTE 800000000fc09063
Oops: 0011 [#1] PREEMPT SMP PTI
Modules linked in:
CPU: 0 PID: 1 Comm: init Tainted: G        W 4.15.0-rc7-next-20180111-yocto-standard #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
RIP: 0010:0xfffffe0000006e9d
RSP: 0018:ffffaee28000ffd0 EFLAGS: 00000006
RAX: 000000000000000c RBX: 0000000000400040 RCX: 00007f2c4186ad6a
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffb6a00000
RBP: 0000000000000008 R08: 000000000000037f R09: 0000000000000064
R10: 00000000078bfbfd R11: 0000000000000246 R12: 00007f2c41856a60
R13: 0000000000000000 R14: 0000000000402368 R15: 0000000000001000
FS:  0000000000000000(0000) GS:ffff95fecfc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffe0000007000 CR3: 000000000d88a000 CR4: 00000000003406f0
Call Trace:
Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <90> 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
RIP: 0xfffffe0000006e9d RSP: ffffaee28000ffd0
CR2: fffffe0000007000
---[ end trace a82b8742114c1785 ]---

Is this the issue you are talking about, or is the fix triggering
the crash?

Guenter
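
For readers decoding the report above by hand, here is a minimal C sketch.
It relies only on the architectural x86 bit layouts: bit 0 of the page-fault
error code means the page was present, bit 4 means the fault was an
instruction fetch, and bit 63 of a page table entry is NX. The macro and
function names are illustrative and are not the kernel's own identifiers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* x86 page-fault error code bits (illustrative names). */
#define PF_PROT   (1u << 0)   /* fault on a present page */
#define PF_WRITE  (1u << 1)   /* access was a write */
#define PF_USER   (1u << 2)   /* access came from user mode */
#define PF_RSVD   (1u << 3)   /* reserved bit set in a paging entry */
#define PF_INSTR  (1u << 4)   /* access was an instruction fetch */

/* x86-64 page table entry bits used here (illustrative names). */
#define PTE_PRESENT (1ULL << 0)
#define PTE_RW      (1ULL << 1)
#define PTE_NX      (1ULL << 63)

/* True when the fault is an instruction fetch from a present NX page. */
int is_nx_exec_fault(unsigned int error_code, uint64_t pte)
{
    return (error_code & PF_PROT) && (error_code & PF_INSTR) &&
           (pte & PTE_PRESENT) && (pte & PTE_NX);
}

/* Print a one-line summary of the error code and the PTE flags. */
void describe_fault(unsigned int error_code, uint64_t pte)
{
    printf("%s %s fault from %s mode, pte %s%s%s\n",
           (error_code & PF_PROT) ? "protection" : "not-present",
           (error_code & PF_INSTR) ? "instruction-fetch" :
           (error_code & PF_WRITE) ? "write" : "read",
           (error_code & PF_USER) ? "user" : "kernel",
           (pte & PTE_PRESENT) ? "present" : "absent",
           (pte & PTE_RW) ? " writable" : "",
           (pte & PTE_NX) ? " NX" : "");
}
```

Applied to the values in the report, error code 0x11 and
PTE 0x800000000fc09063, is_nx_exec_fault() returns nonzero, which matches the
"kernel tried to execute NX-protected page" message.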

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-11 23:03                                                                       ` Thomas Gleixner
@ 2018-01-12  7:19                                                                         ` Greg Kroah-Hartman
  2018-01-12  8:03                                                                           ` Thomas Gleixner
  0 siblings, 1 reply; 156+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-12  7:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Pavel Tatashin, Andy Lutomirski, Hugh Dickins,
	Thomas Voegtle, Linux Kernel Mailing List, Andrew Morton,
	Guenter Roeck, Shuah Khan, patches, Ben Hutchings, lkft-triage,
	stable, Steve Sistare, Matt Fleming, Borislav Petkov

On Fri, Jan 12, 2018 at 12:03:10AM +0100, Thomas Gleixner wrote:
> On Thu, 11 Jan 2018, Thomas Gleixner wrote:
> > On Thu, 11 Jan 2018, Thomas Gleixner wrote:
> > > On Thu, 11 Jan 2018, Linus Torvalds wrote:
> > > 
> > > > On Thu, Jan 11, 2018 at 12:37 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > > > >
> > > > > 67a9108ed431 ("x86/efi: Build our own page table structures")
> > > > >
> > > > > got rid of EFI depending on real_mode_header->trampoline_pgd
> > > > 
> > > > So I think it only got rid of by default - the codepath is still
> > > > there, the allocation is still there, it's just that it's not actually
> > > > > used unless somebody does that "efi=old_map" thing.
> > > 
> > > Yes, the trampoline_pgd is still around, but I can't figure out how it
> > > would be used after boot. Confused, digging more.
> > 
> > So coming back to the same commit. From the changelog:
> > 
> >     This is caused by mapping EFI regions with RWX permissions.
> >     There isn't much we can do to restrict the permissions for these
> >     regions due to the way the firmware toolchains mix code and
> >     data, but we can at least isolate these mappings so that they do
> >     not appear in the regular kernel page tables.
> >     
> >     In commit d2f7cbe7b26a ("x86/efi: Runtime services virtual
> >     mapping") we started using 'trampoline_pgd' to map the EFI
> >     regions because there was an existing identity mapping there
> >     which we use during the SetVirtualAddressMap() call and for
> >     broken firmware that accesses those addresses.
> > 
> > So this very commit gets rid of the (ab)use of trampoline_pgd and allocates
> > efi_pgd, which we made use the proper size.
> > 
> > trampoline_pgd is since then only used to get into long mode in
> > realmode/rm/trampoline_64.S and for reboot in machine_real_restart().
> > 
> > The runtime services stuff does not use it in kernel versions >= 4.6
> 
> But there is one very well hidden user for it after boot:
> 
>     It's used for booting secondary CPUs from real mode
> 
> So the secondaries use the trampoline pgd for the transition to long mode
> and then jump to secondary_startup_64, where CR3 is set to the real kernel
> page tables.

Ok, so the summary is that this patch is only needed for the 4.4 and 4.9
kernels, and _NOT_ for Linus's tree and 4.14, right?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH 4.4 00/37] 4.4.110-stable review
  2018-01-12  7:19                                                                         ` Greg Kroah-Hartman
@ 2018-01-12  8:03                                                                           ` Thomas Gleixner
  0 siblings, 0 replies; 156+ messages in thread
From: Thomas Gleixner @ 2018-01-12  8:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linus Torvalds, Pavel Tatashin, Andy Lutomirski, Hugh Dickins,
	Thomas Voegtle, Linux Kernel Mailing List, Andrew Morton,
	Guenter Roeck, Shuah Khan, patches, Ben Hutchings, lkft-triage,
	stable, Steve Sistare, Matt Fleming, Borislav Petkov

On Fri, 12 Jan 2018, Greg Kroah-Hartman wrote:
> On Fri, Jan 12, 2018 at 12:03:10AM +0100, Thomas Gleixner wrote:
> > So the secondaries use the trampoline pgd for the transition to long mode
> > and then jump to secondary_startup_64, where CR3 is set to the real kernel
> > page tables.
> 
> Ok, so the summary is that this patch is only needed for the 4.4 and 4.9
> kernels, and _NOT_ for Linus's tree and 4.14, right?

Correct.

^ permalink raw reply	[flat|nested] 156+ messages in thread

Thread overview: 156+ messages
2018-01-03 20:11 [PATCH 4.4 00/37] 4.4.110-stable review Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 01/37] x86/boot: Add early cmdline parsing for options with arguments Greg Kroah-Hartman
2018-01-03 20:11   ` Greg Kroah-Hartman
2018-01-03 20:11   ` Greg Kroah-Hartman
2018-01-03 20:11   ` Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 02/37] KAISER: Kernel Address Isolation Greg Kroah-Hartman
2018-01-03 20:11   ` [kernel-hardening] " Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 03/37] kaiser: merged update Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 04/37] kaiser: do not set _PAGE_NX on pgd_none Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 05/37] kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 06/37] kaiser: fix build and FIXME in alloc_ldt_struct() Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 07/37] kaiser: KAISER depends on SMP Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 08/37] kaiser: fix regs to do_nmi() ifndef CONFIG_KAISER Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 09/37] kaiser: fix perf crashes Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 10/37] kaiser: ENOMEM if kaiser_pagetable_walk() NULL Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 11/37] kaiser: tidied up asm/kaiser.h somewhat Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 12/37] kaiser: tidied up kaiser_add/remove_mapping slightly Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 13/37] kaiser: kaiser_remove_mapping() move along the pgd Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 14/37] kaiser: cleanups while trying for gold link Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 15/37] kaiser: name that 0x1000 KAISER_SHADOW_PGD_OFFSET Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 16/37] kaiser: delete KAISER_REAL_SWITCH option Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 17/37] kaiser: vmstat show NR_KAISERTABLE as nr_overhead Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 18/37] kaiser: enhanced by kernel and user PCIDs Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 19/37] kaiser: load_new_mm_cr3() let SWITCH_USER_CR3 flush user Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 20/37] kaiser: PCID 0 for kernel and 128 for user Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 21/37] kaiser: x86_cr3_pcid_noflush and x86_cr3_pcid_user Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 22/37] kaiser: paranoid_entry pass cr3 need to paranoid_exit Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 23/37] kaiser: _pgd_alloc() without __GFP_REPEAT to avoid stalls Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 24/37] kaiser: fix unlikely error in alloc_ldt_struct() Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 25/37] kaiser: add "nokaiser" boot option, using ALTERNATIVE Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 26/37] x86/kaiser: Rename and simplify X86_FEATURE_KAISER handling Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 27/37] x86/kaiser: Check boottime cmdline params Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 28/37] kaiser: use ALTERNATIVE instead of x86_cr3_pcid_noflush Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 29/37] kaiser: drop is_atomic arg to kaiser_pagetable_walk() Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 30/37] kaiser: asm/tlbflush.h handle noPGE at lower level Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 31/37] kaiser: kaiser_flush_tlb_on_return_to_user() check PCID Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 32/37] x86/paravirt: Dont patch flush_tlb_single Greg Kroah-Hartman
2018-01-03 20:11   ` Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 33/37] x86/kaiser: Reenable PARAVIRT Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 34/37] kaiser: disabled on Xen PV Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 35/37] x86/kaiser: Move feature detection up Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 36/37] KPTI: Rename to PAGE_TABLE_ISOLATION Greg Kroah-Hartman
2018-01-03 20:11 ` [PATCH 4.4 37/37] KPTI: Report when enabled Greg Kroah-Hartman
2018-01-03 22:08 ` [PATCH 4.4 00/37] 4.4.110-stable review Nathan Chancellor
2018-01-04  8:10   ` Greg Kroah-Hartman
2018-01-04  6:50 ` Naresh Kamboju
2018-01-04  9:27 ` kernelci.org bot
2018-01-05  0:06   ` Kevin Hilman
2018-01-08 15:06     ` Guillaume Tucker
2018-01-04 16:38 ` Pavel Tatashin
2018-01-04 16:53   ` Greg Kroah-Hartman
2018-01-04 17:01     ` Guenter Roeck
2018-01-04 17:09       ` Greg Kroah-Hartman
2018-01-04 17:02     ` Pavel Tatashin
2018-01-04 17:03     ` Willy Tarreau
2018-01-04 17:11       ` Greg Kroah-Hartman
2018-01-04 17:13         ` Willy Tarreau
2018-01-04 17:14         ` Greg Kroah-Hartman
2018-01-04 17:16           ` Greg Kroah-Hartman
2018-01-04 17:56             ` Guenter Roeck
2018-01-05 15:00               ` Greg Kroah-Hartman
2018-01-05 18:12                 ` Guenter Roeck
2018-01-05 20:53                   ` Greg Kroah-Hartman
2018-01-04 20:11   ` Linus Torvalds
2018-01-04 17:03 ` Guenter Roeck
2018-01-04 19:38 ` Thomas Voegtle
2018-01-04 19:50   ` Greg Kroah-Hartman
2018-01-04 20:16     ` Thomas Voegtle
2018-01-04 20:29       ` Linus Torvalds
2018-01-04 20:43         ` Andy Lutomirski
2018-01-04 20:57           ` Hugh Dickins
2018-01-04 21:16             ` Andy Lutomirski
2018-01-04 21:23             ` Pavel Tatashin
2018-01-04 21:37               ` Hugh Dickins
2018-01-04 21:48                 ` Pavel Tatashin
2018-01-04 22:33                   ` Linus Torvalds
2018-01-05 14:59                   ` Greg Kroah-Hartman
2018-01-05 15:32                     ` Pavel Tatashin
2018-01-05 15:51                       ` Greg Kroah-Hartman
2018-01-05 15:57                         ` Willy Tarreau
2018-01-05 18:01                           ` Greg Kroah-Hartman
2018-01-05 16:26                         ` Pavel Tatashin
2018-01-05 16:57                       ` Andy Lutomirski
2018-01-05 17:14                         ` Pavel Tatashin
2018-01-05 17:43                           ` Andy Lutomirski
2018-01-05 17:48                             ` Pavel Tatashin
2018-01-05 17:52                               ` Greg Kroah-Hartman
2018-01-05 18:15                                 ` Andy Lutomirski
2018-01-05 18:21                                   ` Pavel Tatashin
2018-01-05 19:14                                     ` Pavel Tatashin
2018-01-05 19:18                                       ` Pavel Tatashin
2018-01-05 20:45                                         ` Greg Kroah-Hartman
2018-01-05 21:03                                           ` Pavel Tatashin
2018-01-05 23:15                                             ` Hugh Dickins
2018-01-06  1:16                                               ` Pavel Tatashin
2018-01-07 10:45                                             ` Greg Kroah-Hartman
2018-01-07 14:17                                               ` Pavel Tatashin
2018-01-07 15:06                                                 ` Pavel Tatashin
2018-01-08  7:46                                                   ` Greg Kroah-Hartman
2018-01-08 20:38                                                     ` Pavel Tatashin
2018-01-08 21:24                                                       ` Pavel Tatashin
2018-01-11 18:36                                                         ` Pavel Tatashin
2018-01-11 18:40                                                           ` Pavel Tatashin
2018-01-11 19:09                                                             ` Linus Torvalds
2018-01-11 20:37                                                               ` Thomas Gleixner
2018-01-11 20:46                                                                 ` Linus Torvalds
2018-01-11 21:32                                                                   ` Thomas Gleixner
2018-01-11 22:30                                                                     ` Thomas Gleixner
2018-01-11 22:42                                                                       ` Steven Sistare
2018-01-11 22:47                                                                         ` Thomas Gleixner
2018-01-12  1:15                                                                           ` Guenter Roeck
2018-01-11 22:59                                                                         ` Linus Torvalds
2018-01-11 23:03                                                                       ` Thomas Gleixner
2018-01-12  7:19                                                                         ` Greg Kroah-Hartman
2018-01-12  8:03                                                                           ` Thomas Gleixner
2018-01-11 21:35                                                                   ` Steven Sistare
2018-01-11 21:44                                                                     ` Thomas Gleixner
2018-01-11 21:49                                                                       ` Linus Torvalds
2018-01-11 20:10                                                           ` Greg Kroah-Hartman
2018-01-11 20:17                                                             ` Linus Torvalds
2018-01-11 20:18                                                             ` Pavel Tatashin
2018-01-05 20:48                                   ` Greg Kroah-Hartman
2018-01-05  5:33           ` Andy Lutomirski
2018-01-05 10:12             ` Kees Cook
2018-01-05 12:14               ` Greg Kroah-Hartman
2018-01-05 13:08               ` Greg Kroah-Hartman
2018-01-04 20:10   ` Guenter Roeck
2018-01-05 14:58   ` Greg Kroah-Hartman
2018-01-05 15:25     ` Thomas Voegtle
2018-01-05 15:48       ` Greg Kroah-Hartman
2018-01-04 22:00 ` Shuah Khan
2018-01-05  7:55   ` Greg Kroah-Hartman
2018-01-04 23:45 ` Guenter Roeck
2018-01-04 23:58   ` Linus Torvalds
2018-01-05  4:37   ` Mike Galbraith
2018-01-05  4:37     ` Mike Galbraith
2018-01-05 12:17     ` Greg Kroah-Hartman
2018-01-05 12:17       ` Greg Kroah-Hartman
2018-01-05 13:03       ` Mike Galbraith
2018-01-05 13:03         ` Mike Galbraith
2018-01-05 13:34         ` Greg Kroah-Hartman
2018-01-05 13:34           ` Greg Kroah-Hartman
2018-01-05 14:03           ` Mike Galbraith
2018-01-05 23:28             ` Hugh Dickins
2018-01-06  2:58               ` Mike Galbraith
2018-01-05 13:41   ` Greg Kroah-Hartman
2018-01-05 17:51     ` Guenter Roeck
2018-01-05 17:20 ` Alice Ferrazzi
2018-01-05 18:01   ` Greg Kroah-Hartman
2018-01-09 19:49     ` Serge E. Hallyn
2018-01-10  8:48       ` Greg Kroah-Hartman
2018-01-10 16:45         ` Serge E. Hallyn
2018-01-05 17:56 ` Guenter Roeck
2018-01-05 20:54   ` Greg Kroah-Hartman
2018-01-05 21:21     ` Guenter Roeck
2018-01-06  1:35     ` Guenter Roeck
