From: Tony Luck <tony.luck@intel.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andy Lutomirski <luto@kernel.org>,
	Dan Williams <dan.j.williams@intel.com>, <elliott@hpe.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-nvdimm@ml01.01.org, x86@kernel.org
Subject: [PATCH v7 0/3] Machine check recovery when kernel accesses poison
Date: Mon, 4 Jan 2016 16:05:51 -0800
Message-ID: <cover.1451952351.git.tony.luck@intel.com>

This series is initially targeted at the folks doing filesystems
on top of NVDIMMs. They really want to be able to return -EIO
when there is a h/w error (just like spinning rust and SSDs do).

I plan to use the same infrastructure to write a machine check aware
"copy_from_user()" that will SIGBUS the calling application when a
syscall touches poison in user space (just like we do when the application
touches the poison itself).

Changes V6-V7:
Boris:	Why add/subtract 0x20000000? Added better comment provided by Andy
Boris:	Churn. Part2 changes things only introduced in part1.
	Merged parts 1&2 into one patch.
Ingo:	Missing my sign off on part1. Added.

Changes V5-V6:
Andy:	Provoked massive re-write by providing what is now part1 of this
	patch series. This frees up two bits in the exception table
	fixup field that can be used to tag exception table entries
	as different "classes". This means we don't need my separate
	exception table for machine checks. Also avoids duplicating
	fixup actions for #PF and #MC cases that were in version 5.
Andy:	Use C99 array initializers to tie the various class fixup
	functions back to the definitions of each class. Also give the
	functions meaningful names (not fixup_class0() etc.).
Boris:	Cleaned up my lousy assembly code removing many spurious 'l'
	modifiers on instructions.
Boris:	Provided some helper functions for the machine check severity
	calculation that make the code more readable.
Boris:	Have __mcsafe_copy() return a structure with the 'remaining bytes'
	in a separate field from the fault indicator. Boris had suggested
	Linux -EFAULT/-EINVAL ... but I thought it made more sense to return
	the exception number (X86_TRAP_MC, etc.)  This finally kills off
	BIT(63) which has been controversial throughout all the early versions
	of this patch series.
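
To make the "classes" idea in the V5-V6 notes concrete, here is a minimal
user-space sketch, not the code from the patch. It assumes a 32-bit fixup
word whose top two bits carry the class and whose low 30 bits carry a signed
relative offset, and it uses a C99 designated-initializer array to tie each
class to its handler. The bit layout, the class names, and every function
name below are hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: bits 31-30 = class, bits 29-0 = signed offset. */
#define CLASS_SHIFT 30
#define OFFSET_MASK 0x3fffffffu
#define SIGN_BIT    0x20000000u   /* sign bit of the 30-bit offset */

enum extable_class {              /* illustrative names only */
	CLASS_DEFAULT = 0,
	CLASS_FAULT   = 1,
	CLASS_EX      = 2,
	CLASS_UNUSED  = 3,
};

/* Pack a signed offset and a class into one 32-bit word. */
static uint32_t encode_fixup(int32_t offset, enum extable_class cls)
{
	return ((uint32_t)offset & OFFSET_MASK) | ((uint32_t)cls << CLASS_SHIFT);
}

static enum extable_class fixup_class(uint32_t fixup)
{
	return (enum extable_class)(fixup >> CLASS_SHIFT);
}

/* Recover the signed offset: flip the sign bit, then subtract it again. */
static int32_t fixup_offset(uint32_t fixup)
{
	uint32_t off = fixup & OFFSET_MASK;

	return (int32_t)(off ^ SIGN_BIT) - (int32_t)SIGN_BIT;
}

typedef const char *(*fixup_fn)(int32_t offset);

/* Stand-in handlers: each just reports which class handled the entry. */
static const char *fixup_default(int32_t offset) { (void)offset; return "default"; }
static const char *fixup_fault(int32_t offset)   { (void)offset; return "fault"; }
static const char *fixup_ex(int32_t offset)      { (void)offset; return "ex"; }

/* Designated initializers keep each handler next to its class name. */
static const fixup_fn handlers[4] = {
	[CLASS_DEFAULT] = fixup_default,
	[CLASS_FAULT]   = fixup_fault,
	[CLASS_EX]      = fixup_ex,
	/* CLASS_UNUSED is left NULL on purpose. */
};

/* Dispatch on the class bits; returns NULL if the class has no handler. */
static const char *run_fixup(uint32_t fixup)
{
	fixup_fn fn = handlers[fixup_class(fixup)];

	return fn ? fn(fixup_offset(fixup)) : NULL;
}
```

With this layout, run_fixup(encode_fixup(-4, CLASS_FAULT)) dispatches to
fixup_fault() with the offset decoded back to -4. The point of the
designated initializers is that reordering or adding enum values cannot
silently pair a class with the wrong handler.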

Changes V4-V5:
Tony:	Extended __mcsafe_copy() to have fixup entries for both machine
	check and page fault.
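
As a rough illustration of how a caller might consume a copy routine that
reports the exception number and the remaining byte count in separate
fields, here is a user-space sketch. It does not call the real
__mcsafe_copy(). The struct layout, sim_mcsafe_copy() (which merely pretends
that the source is poisoned at a given offset), and read_block() are all
hypothetical. Only the vector numbers are real: 14 is the x86 page-fault
vector and 18 is the machine-check vector.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TRAP_PF 14   /* x86 page-fault vector */
#define TRAP_MC 18   /* x86 machine-check vector */

/* Hypothetical result type: fault indicator and byte count kept apart. */
struct copy_result {
	unsigned int trapnr;   /* 0 on success, else the exception number */
	size_t remain;         /* bytes that were not copied */
};

/*
 * User-space stand-in for a machine-check-safe copy. It copies up to
 * poison_off and then reports a simulated machine check. Pass
 * poison_off >= len to simulate a clean copy.
 */
static struct copy_result sim_mcsafe_copy(void *dst, const void *src,
					  size_t len, size_t poison_off)
{
	struct copy_result r = { 0, 0 };

	if (poison_off < len) {
		memcpy(dst, src, poison_off);
		r.trapnr = TRAP_MC;
		r.remain = len - poison_off;
	} else {
		memcpy(dst, src, len);
	}
	return r;
}

/* Caller side: map the reported exception to an errno-style result. */
static int read_block(void *dst, const void *src, size_t len,
		      size_t poison_off)
{
	struct copy_result r = sim_mcsafe_copy(dst, src, len, poison_off);

	if (r.trapnr == TRAP_MC)
		return -EIO;      /* h/w error, as a filesystem would report */
	if (r.trapnr == TRAP_PF)
		return -EFAULT;   /* bad address rather than bad memory */
	return 0;
}
```

A filesystem-style caller only needs the trapnr field to choose between
-EIO and -EFAULT. The remain field is there for callers that want to know
how much of the buffer holds valid data.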

Changes V3-V4:
Andy:   Simplify fixup_mcexception() by dropping used-once local variable
Andy:   "Reviewed-by" tag added to part1
Boris:  Moved new functions to memcpy_64.S and declaration to asm/string_64.h
Boris:  Changed name s/mcsafe_memcpy/__mcsafe_copy/ to make it clear that this
        is an internal function and that return value doesn't follow memcpy() semantics.
Boris:  "Reviewed-by" tag added to parts 1&2

Changes V2-V3:

Andy:   Don't hack "regs->ax = BIT(63) | addr;" in the machine check
        handler.  Now have better fixup code that computes the number
        of remaining bytes (just like page-fault fixup).
Andy:   #define for BIT(63). Done, plus couple of extra macros using it.
Boris:  Don't clutter up generic code (like mm/extable.c) with this.
        I moved everything under arch/x86 (the asm-generic change is
        a more generic #define).
Boris:  Dependencies for CONFIG_MCE_KERNEL_RECOVERY are too generic.
        I made it a real menu item with default "n". Dan Williams
        will use "select MCE_KERNEL_RECOVERY" from his persistent
        filesystem code.
Boris:  Simplify conditionals in mce.c by moving tolerant/kill_it
        checks earlier, with a skip to end if they aren't set.
Boris:  Miscellaneous grammar/punctuation. Fixed.
Boris:  Don't leak spurious __start_mcextable symbols into kernels
        that didn't configure MCE_KERNEL_RECOVERY. Done.
Tony:   New code doesn't belong in user_copy_64.S/uaccess*.h. Moved
        to new .S/.h files
Elliott:Caching behavior non-optimal. Could use movntdqa or vmovntdqa
        on source addresses. I didn't fix this yet. Think
        of the current mcsafe_memcpy() as the first of several functions.
        This one is useful for small copies (meta-data) where the overhead
        of saving SSE/AVX state isn't justified.

Changes V1->V2:

0-day:  Reported build errors and warnings on 32-bit systems. Fixed
0-day:  Reported bloat to tinyconfig. Fixed
Boris:  Suggestions to use extra macros to reduce code duplication in _ASM_*EXTABLE. Done
Boris:  Re-write "tolerant==3" check to reduce indentation level. See below.
Andy:   Check IP is valid before searching kernel exception tables. Done.
Andy:   Explain use of BIT(63) on return value from mcsafe_memcpy(). Done (added decode macros).
Andy:   Untangle mess of code in tail of do_machine_check() to make it
        clear what is going on (e.g. that we only enter the ist_begin_non_atomic()
        if we were called from user code, not from kernel!). Done.

Tony Luck (3):
  x86: Add classes to exception tables
  x86, mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception
    table entries
  x86, mce: Add __mcsafe_copy()

 arch/x86/Kconfig                          |  10 +++
 arch/x86/include/asm/asm.h                | 102 ++++++++++++++++------
 arch/x86/include/asm/string_64.h          |  10 +++
 arch/x86/include/asm/uaccess.h            |  17 +++-
 arch/x86/kernel/cpu/mcheck/mce-severity.c |  32 ++++++-
 arch/x86/kernel/cpu/mcheck/mce.c          |  71 ++++++++--------
 arch/x86/kernel/kprobes/core.c            |   2 +-
 arch/x86/kernel/traps.c                   |   6 +-
 arch/x86/kernel/x8664_ksyms_64.c          |   4 +
 arch/x86/lib/memcpy_64.S                  | 136 ++++++++++++++++++++++++++++++
 arch/x86/mm/extable.c                     |  66 ++++++++++-----
 arch/x86/mm/fault.c                       |   2 +-
 12 files changed, 369 insertions(+), 89 deletions(-)

-- 
2.1.4

