* [RFC PATCH 0/2] kpatch: dynamic kernel patching @ 2014-05-01 15:52 Josh Poimboeuf 2014-05-01 15:52 ` [RFC PATCH 1/2] kpatch: add TAINT_KPATCH flag Josh Poimboeuf ` (5 more replies) 0 siblings, 6 replies; 60+ messages in thread From: Josh Poimboeuf @ 2014-05-01 15:52 UTC (permalink / raw) To: Josh Poimboeuf, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby Cc: linux-kernel Hi, Since Jiri posted the kGraft patches [1], I wanted to share an alternative live patching solution called kpatch, which is something we've been working on at Red Hat for quite a while. The kernel piece of it ("kpatch core module") is completely self-contained in a GPL module. It compiles and works without needing to change any kernel code, and in fact we already have it working fine with Fedora 20 [2] without any distro kernel patches needed. We'd definitely like to see it (or some combination of it and kGraft) merged into Linux. This patch set is for the core module, which provides the kernel infrastructure for kpatch. It has a kpatch_register() interface which allows kernel modules ("patch modules") to replace old functions with new functions which are loaded with the modules. There are also some user space tools [2] which aren't included in this patch set, which magically generate binary patch modules from source diffs, and manage the loading and unloading of these modules. I didn't include them here because I think we should agree on what the kernel parts should look like before trying to discuss the user space tools (and whether they should be in-tree). kpatch vs kGraft ---------------- I think the biggest difference between kpatch and kGraft is how they ensure that the patch is applied atomically and safely. kpatch checks the backtraces of all tasks in stop_machine() to ensure that no instances of the old function are running when the new function is applied. I think the biggest downside of this approach is that stop_machine() has to idle all other CPUs during the patching process, so it inserts a small amount of latency (a few ms on an idle system). Instead, kGraft uses per-task consistency: each task either sees the old version or the new version of the function. This gives a consistent view with respect to functions, but _not_ data, because the old and new functions are allowed to run simultaneously and share data. This could be dangerous if a patch changes how a function uses a data structure. The new function could make a data change that the old function wasn't expecting. With kpatch, that's not an issue because all the functions are patched at the same time. So kpatch is safer with respect to data interactions. Other advantages of the kpatch stop_machine() approach: - IMO, the kpatch code is much simpler than kGraft. The implementation is very straightforward and is completely self-contained. It requires zero changes to the kernel. (However a new TAINT_KPATCH flag would be a good idea, and we do anticipate some minor changes to kprobes and ftrace for better compatibility.) - The use of stop_machine() will enable an important not-yet-implemented feature to call a user-supplied callback function at loading time which can be used to atomically update data structures when applying a patch. I don't see how such a feature would be possible with the kGraft approach. - kpatch applies patches immediately without having to send signals to sleeping processes, and without having to hope that those processes handle the signal appropriately. - kpatch's patching behavior is more deterministic because stop_machine() ensures that all tasks are sleeping and interrupts are disabled when the patching occurs. - kpatch already supports other cool features like: - removing patches and rolling back to the original functions - atomically replacing existing patches - incremental patching - loading multiple patch modules TODO ---- Here are the only outstanding issues: - A new FTRACE_OPS_FL_PERMANENT flag is needed to tell ftrace to never disable the handler. Otherwise a patch could be temporarily or permanently removed in certain situations. - A few kprobes compatibility issues: - Patching of a kprobed function doesn't take effect until the kprobe is removed. - kretprobes removes the probed function's calling function's IP from the stack, which could lead to a false negative in the kpatch backtrace safety check. [1] http://thread.gmane.org/gmane.linux.kernel/1694304 [2] https://github.com/dynup/kpatch Josh Poimboeuf (2): kpatch: add TAINT_KPATCH flag kpatch: add kpatch core module Documentation/kpatch.txt | 193 +++++++++++++ Documentation/oops-tracing.txt | 3 + Documentation/sysctl/kernel.txt | 1 + MAINTAINERS | 9 + arch/Kconfig | 14 + include/linux/kernel.h | 1 + include/linux/kpatch.h | 61 ++++ kernel/Makefile | 1 + kernel/kpatch/Makefile | 1 + kernel/kpatch/kpatch.c | 615 ++++++++++++++++++++++++++++++++++++++++ kernel/panic.c | 2 + 11 files changed, 901 insertions(+) create mode 100644 Documentation/kpatch.txt create mode 100644 include/linux/kpatch.h create mode 100644 kernel/kpatch/Makefile create mode 100644 kernel/kpatch/kpatch.c -- 1.9.0 ^ permalink raw reply [flat|nested] 60+ messages in thread
* [RFC PATCH 1/2] kpatch: add TAINT_KPATCH flag 2014-05-01 15:52 [RFC PATCH 0/2] kpatch: dynamic kernel patching Josh Poimboeuf @ 2014-05-01 15:52 ` Josh Poimboeuf 2014-05-01 15:52 ` [RFC PATCH 2/2] kpatch: add kpatch core module Josh Poimboeuf ` (4 subsequent siblings) 5 siblings, 0 replies; 60+ messages in thread From: Josh Poimboeuf @ 2014-05-01 15:52 UTC (permalink / raw) To: Josh Poimboeuf, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby Cc: linux-kernel Add a TAINT_KPATCH flag to be set whenever a kpatch patch module successfully replaces a function. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Seth Jennings <sjenning@redhat.com> --- Documentation/oops-tracing.txt | 3 +++ Documentation/sysctl/kernel.txt | 1 + include/linux/kernel.h | 1 + kernel/panic.c | 2 ++ 4 files changed, 7 insertions(+) diff --git a/Documentation/oops-tracing.txt b/Documentation/oops-tracing.txt index e315599..2ed7ed1 100644 --- a/Documentation/oops-tracing.txt +++ b/Documentation/oops-tracing.txt @@ -268,6 +268,9 @@ characters, each representing a particular tainted value. 14: 'E' if an unsigned module has been loaded in a kernel supporting module signature. + 15: 'H' if a kpatch hot patch module has replaced any functions in the + running kernel. + The primary reason for the 'Tainted: ' string is to tell kernel debuggers if this is a clean kernel or if anything unusual has occurred. Tainting is permanent: even if an offending module is diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 9886c3d..7d31b1e 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -788,6 +788,7 @@ can be ORed together: 4096 - An out-of-tree module has been loaded. 8192 - An unsigned module has been loaded in a kernel supporting module signature. +16384 - A kpatch module has replaced a function in the running kernel. ============================================================== diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 4c52907..cdfbe73 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -470,6 +470,7 @@ extern enum system_states { #define TAINT_FIRMWARE_WORKAROUND 11 #define TAINT_OOT_MODULE 12 #define TAINT_UNSIGNED_MODULE 13 +#define TAINT_KPATCH 14 extern const char hex_asc[]; #define hex_asc_lo(x) hex_asc[((x) & 0x0f)] diff --git a/kernel/panic.c b/kernel/panic.c index d02fa9f..2c04a88 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -212,6 +212,7 @@ static const struct tnt tnts[] = { { TAINT_FIRMWARE_WORKAROUND, 'I', ' ' }, { TAINT_OOT_MODULE, 'O', ' ' }, { TAINT_UNSIGNED_MODULE, 'E', ' ' }, + { TAINT_KPATCH, 'H', ' ' }, }; /** @@ -231,6 +232,7 @@ static const struct tnt tnts[] = { * 'I' - Working around severe firmware bug. * 'O' - Out-of-tree module has been loaded. * 'E' - Unsigned module has been loaded. + * 'H' - kpatch replaced a function. * * The string is overwritten by the next call to print_tainted(). */ -- 1.9.0 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [RFC PATCH 2/2] kpatch: add kpatch core module 2014-05-01 15:52 [RFC PATCH 0/2] kpatch: dynamic kernel patching Josh Poimboeuf 2014-05-01 15:52 ` [RFC PATCH 1/2] kpatch: add TAINT_KPATCH flag Josh Poimboeuf @ 2014-05-01 15:52 ` Josh Poimboeuf 2014-05-01 20:45 ` [RFC PATCH 0/2] kpatch: dynamic kernel patching Andi Kleen ` (3 subsequent siblings) 5 siblings, 0 replies; 60+ messages in thread From: Josh Poimboeuf @ 2014-05-01 15:52 UTC (permalink / raw) To: Josh Poimboeuf, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby Cc: linux-kernel Add the kpatch core module. It's a self-contained module with a kernel patching infrastructure that enables patching a running kernel without rebooting or restarting any processes. Kernel modules ("patch modules") can call kpatch_register() to replace new functions with old ones. Before applying a patch, kpatch checks the stacks of all tasks in stop_machine() to ensure that the patch is applied atomically, to prevent any weird effects from function interface changes or data semantic changes that might occur between old and new versions of the functions running simultaneously. If any of the to-be-patched functions are on the stack, it fails with -EBUSY. ftrace is used to do the code modification. For each function to be patched, kpatch registers an ftrace ops handler. When called, the handler modifies regs->ip on the stack and returns back to ftrace, which restores the RIP and "returns" to the beginning of the new function. Other features: * Safe unpatching * Atomic repatching (replacing one patch module with another) * Support for multiple patch modules * kpatch_[register|unregister] are properly synchronized with kpatch_ftrace_handler() when it runs in NMI context (thanks to Masami for helping with this) Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> --- Documentation/kpatch.txt | 193 +++++++++++++++ MAINTAINERS | 9 + arch/Kconfig | 14 ++ include/linux/kpatch.h | 61 +++++ kernel/Makefile | 1 + kernel/kpatch/Makefile | 1 + kernel/kpatch/kpatch.c | 615 +++++++++++++++++++++++++++++++++++++++++++++++ 7 files changed, 894 insertions(+) create mode 100644 Documentation/kpatch.txt create mode 100644 include/linux/kpatch.h create mode 100644 kernel/kpatch/Makefile create mode 100644 kernel/kpatch/kpatch.c diff --git a/Documentation/kpatch.txt b/Documentation/kpatch.txt new file mode 100644 index 0000000..697c054 --- /dev/null +++ b/Documentation/kpatch.txt @@ -0,0 +1,193 @@ +kpatch: dynamic kernel patching +=============================== + +kpatch is a Linux dynamic kernel patching infrastructure which allows you to +patch a running kernel without rebooting or restarting any processes. It +enables sysadmins to apply critical security patches to the kernel immediately, +without having to wait for long-running tasks to complete, users to log off, or +for scheduled reboot windows. It gives more control over uptime without +sacrificing security or stability. + +How it works +------------ + +kpatch works at a function granularity: old functions are replaced with new +ones. It has four main components: + +- **kpatch-build**: a collection of tools which convert a source diff patch to + a patch module. They work by compiling the kernel both with and without + the source patch, comparing the binaries, and generating a patch module + which includes new binary versions of the functions to be replaced. + +- **patch module**: a kernel module (.ko file) which includes the + replacement functions and metadata about the original functions. + +- **kpatch core module**: the kernel infrastructure which provides an interface + for the patch modules to register new functions for replacement. It uses the + kernel ftrace subsystem to hook into the original function's mcount call + instruction, so that a call to the original function is redirected to the + replacement function. + +- **kpatch utility:** a command-line tool which allows a user to manage a + collection of patch modules. One or more patch modules may be + configured to load at boot time, so that a system can remain patched + even after a reboot into the same version of the kernel. + + +How to use it +------------- + +Currently, only the core module is in the kernel tree. The supporting +kpatch-build and kpatch utility tools can be found at: + + https://github.com/dynup/kpatch + +You can also find directions there for how to create binary patch modules and +load them into your kernel. + + +Limitations +----------- + +- Patches which modify kernel modules are not supported (yet). Only + functions in the vmlinux file (listed in System.map) can be patched. + +- Patches to functions which are always on the stack of at least one + process in the system are not supported. Examples: schedule(), + sys_poll(), sys_select(), sys_read(), sys_nanosleep(). Attempting to + apply such a patch will cause the insmod of the patch module to return + an error. + +- Patches which modify init functions (annotated with `__init`) are not + supported. kpatch-build will return an error if the patch attempts + to do so. + +- Patches which modify statically allocated data are not supported. + kpatch-build will detect that and return an error. (In the future + we will add a facility to support it. It will probably require the + user to write code which runs at patch module loading time which manually + updates the data.) + +- Patches which change the way a function interacts with dynamically + allocated data might be safe, or might not. It isn't possible for + kpatch-build to verify the safety of this kind of patch. It's up to + the user to understand what the patch does, whether the new functions + interact with dynamically allocated data in a different way than the + old functions did, and whether it would be safe to atomically apply + such a patch to a running kernel. + + +Frequently Asked Questions +-------------------------- + +**Q. Isn't this just a virus/rootkit injection framework?** + +kpatch uses kernel modules to replace code. It requires the `CAP_SYS_MODULE` +capability. If you already have that capability, then you already have the +ability to arbitrarily modify the kernel, with or without kpatch. + +**Q. How can I detect if somebody has patched the kernel?** + +When a patch module is loaded, the TAINT_KPATCH flag is set. To test for the +taint flag, run: + + [ `cat /proc/sys/kernel/tainted` -ge 16384 ] && echo tainted + +If TAINT_KPATCH is set, you'll see "tainted". + +**Q. Will it destabilize my system?** + +No, as long as the patch is chosen carefully. See the Limitations section +above. + +**Q. Why does kpatch use ftrace to jump to the replacement function instead of +adding the jump directly?** + +ftrace owns the first "call mcount" instruction of every kernel function. In +order to keep compatibility with ftrace, we go through ftrace rather than +updating the instruction directly. + +**Q Is kpatch compatible with \<insert kernel debugging subsystem here\>?** + +We aim to be good kernel citizens and maintain compatibility. A hot patch +replacement function is no different than a function loaded by any other kernel +module. Each replacement function has its own symbol name and kallsyms entry, +so it looks like a normal function to the kernel. + +- **oops stack traces**: Yes. If the replacement function is involved in an + oops, the stack trace will show the function and kernel module name of the + replacement function, just like any other kernel module function. The oops + message will also show the taint flag (currently `TAINT_USER`). +- **kdump/crash**: Yes. Replacement functions are normal functions, so crash + will have no issues. [TODO: create patch module debuginfo symbols and crash + warning message] +- **ftrace**: Yes, see previous question. +- **systemtap/kprobes**: Some incompatibilities exist. + - If you setup a kprobe module at the beginning of a function before loading + a kpatch module, and they both affect the same function, kprobes "wins" + until the kprobe has been unregistered. This is tracked in issue + [#47](https://github.com/dynup/kpatch/issues/47). + - Setting a kretprobe before loading a kpatch module could be unsafe. See + issue [#67](https://github.com/dynup/kpatch/issues/67). +- **perf**: TODO: try it out + +**Q. Why not use something like kexec instead?** + +If you want to avoid a hardware reboot, but are ok with restarting processes, +kexec is a good alternative. + +**Q. If an application can't handle a reboot, it's designed wrong.** + +That's a good poi... [system reboots] + +**Q. What changes are needed in other upstream projects?** + +We hope to make the following changes to other projects: + +- kernel: + - ftrace improvements to close any windows that would allow a patch to + be inadvertently disabled + - hot patch taint flag + - possibly the kpatch core module itself + +- crash: + - make it glaringly obvious that you're debugging a patched kernel + - point it to where the patch modules and corresponding debug symbols + live on the file system + +**Q: Is it possible to register a function that gets called atomically with +`stop_machine` when the patch module loads and unloads?** + +We do have plans to implement something like that. + +**Q. What kernels are supported?** + +kpatch needs gcc >= 4.6 and Linux >= 3.7 for use of the -mfentry flag. + +**Q. Is it possible to remove a patch?** + +Yes. Just run `kpatch unload` which will disable and unload the patch module +and restore the function to its original state. + +**Q. Can you apply multiple patches?** + +Yes. Also, a single function can even be patched multiple times if needed. + +**Q. Why did kpatch-build detect a changed function that wasn't touched by the +source patch?** + +There could be a variety of reasons for this, such as: + +- The patch changed an inline function. +- The compiler decided to inline a changed function, resulting in the outer + function getting recompiled. This is common in the case where the inner + function is static and is only called once. +- The function uses a WARN() or WARN_ON() macro. These macros embed the source + code line number (`__LINE__`) into an instruction. If a function was changed + higher up in the file, it will affect the line numbers for all subsequent + WARN calls in the file, resulting in recompilation of their functions. If + this happens to you, you can usually just ignore it, as patching a few extra + functions isn't typically a problem. If it becomes a problem for whatever + reason, you can change the source patch to redefine the WARN macro for the + affected files, such that it hard codes the old line number instead of using + `__LINE__`, for example. diff --git a/MAINTAINERS b/MAINTAINERS index ea44a57..711dc3b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5200,6 +5200,15 @@ F: include/linux/kmemleak.h F: mm/kmemleak.c F: mm/kmemleak-test.c +KPATCH +M: Josh Poimboeuf <jpoimboe@redhat.com> +M: Seth Jennings <sjenning@redhat.com> +W: https://github.com/dynup/kpatch +S: Maintained +F: Documentation/kpatch.txt +F: include/linux/kpatch.h +F: kernel/kpatch/* + KPROBES M: Ananth N Mavinakayanahalli <ananth@in.ibm.com> M: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> diff --git a/arch/Kconfig b/arch/Kconfig index 97ff872..8693aae 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -472,6 +472,20 @@ config HAVE_IRQ_EXIT_ON_IRQ_STACK This spares a stack switch and improves cache usage on softirq processing. +config KPATCH + tristate "kpatch dynamic kernel updating support" + default n + depends on MODULES + depends on FTRACE + depends on HAVE_FENTRY + depends on SYSFS + help + kpatch is a dynamic kernel patching infrastructure which allows you + to dynamically update the code of a running kernel without rebooting + or restarting any processes. + + If in doubt, say "N". + # # ABI hall of shame # diff --git a/include/linux/kpatch.h b/include/linux/kpatch.h new file mode 100644 index 0000000..121e383 --- /dev/null +++ b/include/linux/kpatch.h @@ -0,0 +1,61 @@ +/* + * kpatch.h + * + * Copyright (C) 2014 Seth Jennings <sjenning@redhat.com> + * Copyright (C) 2013-2014 Josh Poimboeuf <jpoimboe@redhat.com> + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <https://www.gnu.org/licenses/>. + * + * Contains the API for the core kpatch module used by the patch modules + */ + +#ifndef _KPATCH_H_ +#define _KPATCH_H_ + +#include <linux/types.h> +#include <linux/module.h> + +enum kpatch_op { + KPATCH_OP_NONE, + KPATCH_OP_PATCH, + KPATCH_OP_UNPATCH, +}; + +struct kpatch_func { + /* public */ + unsigned long new_addr; + unsigned long new_size; + unsigned long old_addr; + unsigned long old_size; + + /* private */ + struct hlist_node node; + struct kpatch_module *kpmod; + enum kpatch_op op; +}; + +struct kpatch_module { + struct module *mod; + struct kpatch_func *funcs; + int num_funcs; + + bool enabled; +}; + +extern struct kobject *kpatch_patches_kobj; + +extern int kpatch_register(struct kpatch_module *kpmod, bool replace); +extern int kpatch_unregister(struct kpatch_module *kpmod); + +#endif /* _KPATCH_H_ */ diff --git a/kernel/Makefile b/kernel/Makefile index f2a8b62..2e56269 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -87,6 +87,7 @@ obj-$(CONFIG_RING_BUFFER) += trace/ obj-$(CONFIG_TRACEPOINTS) += trace/ obj-$(CONFIG_IRQ_WORK) += irq_work.o obj-$(CONFIG_CPU_PM) += cpu_pm.o +obj-$(CONFIG_KPATCH) += kpatch/ obj-$(CONFIG_PERF_EVENTS) += events/ diff --git a/kernel/kpatch/Makefile b/kernel/kpatch/Makefile new file mode 100644 index 0000000..800241a --- /dev/null +++ b/kernel/kpatch/Makefile @@ -0,0 +1 @@ +obj-$(CONFIG_KPATCH) := kpatch.o diff --git a/kernel/kpatch/kpatch.c b/kernel/kpatch/kpatch.c new file mode 100644 index 0000000..390e926 --- /dev/null +++ b/kernel/kpatch/kpatch.c @@ -0,0 +1,615 @@ +/* + * Copyright (C) 2014 Seth Jennings <sjenning@redhat.com> + * Copyright (C) 2013-2014 Josh Poimboeuf <jpoimboe@redhat.com> + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +/* Contains the code for the core kpatch module. Each patch module registers + * with this module to redirect old functions to new functions. + * + * Each patch module can contain one or more new functions. This information + * is contained in the .patches section of the patch module. For each function + * patched by the module we must: + * - Call stop_machine + * - Ensure that no execution thread is currently in the old function (or has + * it in the call stack) + * - Add the new function address to the kpatch_funcs table + * + * After that, each call to the old function calls into kpatch_ftrace_handler() + * which finds the new function in the kpatch_funcs table and updates the + * return instruction pointer so that ftrace will return to the new function. + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/kpatch.h> +#include <linux/module.h> +#include <linux/slab.h> +#include <linux/stop_machine.h> +#include <linux/ftrace.h> +#include <linux/hashtable.h> +#include <linux/preempt_mask.h> +#include <asm/stacktrace.h> +#include <asm/cacheflush.h> + +#define KPATCH_HASH_BITS 8 +static DEFINE_HASHTABLE(kpatch_func_hash, KPATCH_HASH_BITS); + +static DEFINE_SEMAPHORE(kpatch_mutex); + +static int kpatch_num_registered; + +static struct kobject *kpatch_root_kobj; +struct kobject *kpatch_patches_kobj; +EXPORT_SYMBOL_GPL(kpatch_patches_kobj); + +struct kpatch_backtrace_args { + struct kpatch_module *kpmod; + int ret; +}; + +/* + * The kpatch core module has a state machine which allows for proper + * synchronization with kpatch_ftrace_handler() when it runs in NMI context. + * + * +-----------------------------------------------------+ + * | | + * | + + * v +---> KPATCH_STATE_SUCCESS + * KPATCH_STATE_IDLE +---> KPATCH_STATE_UPDATING | + * ^ +---> KPATCH_STATE_FAILURE + * | + + * | | + * +-----------------------------------------------------+ + * + * KPATCH_STATE_IDLE: No updates are pending. The func hash is valid, and the + * reader doesn't need to check func->op. + * + * KPATCH_STATE_UPDATING: An update is in progress. The reader must call + * kpatch_state_finish(KPATCH_STATE_FAILURE) before accessing the func hash. + * + * KPATCH_STATE_FAILURE: An update failed, and the func hash might be + * inconsistent (pending patched funcs might not have been removed yet). If + * func->op is KPATCH_OP_PATCH, then rollback to the previous version of the + * func. + * + * KPATCH_STATE_SUCCESS: An update succeeded, but the func hash might be + * inconsistent (pending unpatched funcs might not have been removed yet). If + * func->op is KPATCH_OP_UNPATCH, then rollback to the previous version of the + * func. + */ +enum { + KPATCH_STATE_IDLE, + KPATCH_STATE_UPDATING, + KPATCH_STATE_SUCCESS, + KPATCH_STATE_FAILURE, +}; +static atomic_t kpatch_state; + + +static inline void kpatch_state_idle(void) +{ + int state = atomic_read(&kpatch_state); + WARN_ON(state != KPATCH_STATE_SUCCESS && state != KPATCH_STATE_FAILURE); + atomic_set(&kpatch_state, KPATCH_STATE_IDLE); +} + +static inline void kpatch_state_updating(void) +{ + WARN_ON(atomic_read(&kpatch_state) != KPATCH_STATE_IDLE); + atomic_set(&kpatch_state, KPATCH_STATE_UPDATING); +} + +/* If state is updating, change it to success or failure and return new state */ +static inline int kpatch_state_finish(int state) +{ + int result; + WARN_ON(state != KPATCH_STATE_SUCCESS && state != KPATCH_STATE_FAILURE); + result = atomic_cmpxchg(&kpatch_state, KPATCH_STATE_UPDATING, state); + return result == KPATCH_STATE_UPDATING ? state : result; +} + +static struct kpatch_func *kpatch_get_func(unsigned long ip) +{ + struct kpatch_func *f; + + /* Here, we have to use rcu safe hlist because of NMI concurrency */ + hash_for_each_possible_rcu(kpatch_func_hash, f, node, ip) + if (f->old_addr == ip) + return f; + return NULL; +} + +static struct kpatch_func *kpatch_get_prev_func(struct kpatch_func *f, + unsigned long ip) +{ + hlist_for_each_entry_continue_rcu(f, node) + if (f->old_addr == ip) + return f; + return NULL; +} + +static inline int kpatch_compare_addresses(unsigned long stack_addr, + unsigned long func_addr, + unsigned long func_size) +{ + if (stack_addr >= func_addr && stack_addr < func_addr + func_size) { + /* TODO: use kallsyms to print symbol name */ + pr_err("activeness safety check failed for function at address 0x%lx\n", + stack_addr); + return -EBUSY; + } + return 0; +} + +static void kpatch_backtrace_address_verify(void *data, unsigned long address, + int reliable) +{ + struct kpatch_backtrace_args *args = data; + struct kpatch_module *kpmod = args->kpmod; + struct kpatch_func *func; + int i; + + if (args->ret) + return; + + /* check kpmod funcs */ + for (i = 0; i < kpmod->num_funcs; i++) { + unsigned long func_addr, func_size; + struct kpatch_func *active_func; + + func = &kpmod->funcs[i]; + active_func = kpatch_get_func(func->old_addr); + if (!active_func) { + /* patching an unpatched func */ + func_addr = func->old_addr; + func_size = func->old_size; + } else { + /* repatching or unpatching */ + func_addr = active_func->new_addr; + func_size = active_func->new_size; + } + + args->ret = kpatch_compare_addresses(address, func_addr, + func_size); + if (args->ret) + return; + } + + /* in the replace case, need to check the func hash as well */ + hash_for_each_rcu(kpatch_func_hash, i, func, node) { + if (func->op == KPATCH_OP_UNPATCH) { + args->ret = kpatch_compare_addresses(address, + func->new_addr, + func->new_size); + if (args->ret) + return; + } + } +} + +static int kpatch_backtrace_stack(void *data, char *name) +{ + return 0; +} + +static const struct stacktrace_ops kpatch_backtrace_ops = { + .address = kpatch_backtrace_address_verify, + .stack = kpatch_backtrace_stack, + .walk_stack = print_context_stack_bp, +}; + +/* + * Verify activeness safety, i.e. that none of the to-be-patched functions are + * on the stack of any task. + * + * This function is called from stop_machine() context. + */ +static int kpatch_verify_activeness_safety(struct kpatch_module *kpmod) +{ + struct task_struct *g, *t; + int ret = 0; + + struct kpatch_backtrace_args args = { + .kpmod = kpmod, + .ret = 0 + }; + + /* Check the stacks of all tasks. */ + do_each_thread(g, t) { + dump_trace(t, NULL, NULL, 0, &kpatch_backtrace_ops, &args); + if (args.ret) { + ret = args.ret; + goto out; + } + } while_each_thread(g, t); + +out: + return ret; +} + +/* Called from stop_machine */ +static int kpatch_apply_patch(void *data) +{ + struct kpatch_module *kpmod = data; + struct kpatch_func *funcs = kpmod->funcs; + int num_funcs = kpmod->num_funcs; + int i, ret; + + ret = kpatch_verify_activeness_safety(kpmod); + if (ret) { + kpatch_state_finish(KPATCH_STATE_FAILURE); + return ret; + } + + /* tentatively add the new funcs to the global func hash */ + for (i = 0; i < num_funcs; i++) + hash_add_rcu(kpatch_func_hash, &funcs[i].node, + funcs[i].old_addr); + + /* memory barrier between func hash add and state change */ + smp_wmb(); + + /* + * Check if any inconsistent NMI has happened while updating. If not, + * move to success state. + */ + ret = kpatch_state_finish(KPATCH_STATE_SUCCESS); + if (ret == KPATCH_STATE_FAILURE) { + pr_err("NMI activeness safety check failed\n"); + + /* Failed, we have to rollback patching process */ + for (i = 0; i < num_funcs; i++) + hash_del_rcu(&funcs[i].node); + + return -EBUSY; + } + + return 0; +} + +/* Called from stop_machine */ +static int kpatch_remove_patch(void *data) +{ + struct kpatch_module *kpmod = data; + struct kpatch_func *funcs = kpmod->funcs; + int num_funcs = kpmod->num_funcs; + int ret, i; + + ret = kpatch_verify_activeness_safety(kpmod); + if (ret) { + kpatch_state_finish(KPATCH_STATE_FAILURE); + return ret; + } + + /* Check if any inconsistent NMI has happened while updating */ + ret = kpatch_state_finish(KPATCH_STATE_SUCCESS); + if (ret == KPATCH_STATE_FAILURE) + return -EBUSY; + + /* Succeeded, remove all updating funcs from hash table */ + for (i = 0; i < num_funcs; i++) + hash_del_rcu(&funcs[i].node); + + return 0; +} + +/* + * This is where the magic happens. Update regs->ip to tell ftrace to return + * to the new function. + * + * If there are multiple patch modules that have registered to patch the same + * function, the last one to register wins, as it'll be first in the hash + * bucket. + */ +static void notrace kpatch_ftrace_handler(unsigned long ip, unsigned long parent_ip, + struct ftrace_ops *fops, + struct pt_regs *regs) +{ + struct kpatch_func *func; + int state; + + preempt_disable_notrace(); + + if (likely(!in_nmi())) + func = kpatch_get_func(ip); + else { + /* Checking for NMI inconsistency */ + state = kpatch_state_finish(KPATCH_STATE_FAILURE); + + /* no memory reordering between state and func hash read */ + smp_rmb(); + + func = kpatch_get_func(ip); + + if (likely(state == KPATCH_STATE_IDLE)) + goto done; + + if (state == KPATCH_STATE_SUCCESS) { + /* + * Patching succeeded. If the function was being + * unpatched, roll back to the previous version. + */ + if (func && func->op == KPATCH_OP_UNPATCH) + func = kpatch_get_prev_func(func, ip); + } else { + /* + * Patching failed. If the function was being patched, + * roll back to the previous version. + */ + if (func && func->op == KPATCH_OP_PATCH) + func = kpatch_get_prev_func(func, ip); + } + } +done: + if (func) + regs->ip = func->new_addr; + + preempt_enable_notrace(); +} + +static struct ftrace_ops kpatch_ftrace_ops __read_mostly = { + .func = kpatch_ftrace_handler, + .flags = FTRACE_OPS_FL_SAVE_REGS, +}; + +/* Remove kpatch_funcs from ftrace filter */ +static void kpatch_remove_funcs_from_filter(struct kpatch_func *funcs, + int num_funcs) +{ + int i, ret = 0; + + for (i = 0; i < num_funcs; i++) { + struct kpatch_func *func = &funcs[i]; + + /* + * If any other modules have also patched this function, don't + * remove its ftrace handler. + */ + if (kpatch_get_func(func->old_addr)) + continue; + + /* Remove the ftrace handler for this function. */ + ret = ftrace_set_filter_ip(&kpatch_ftrace_ops, func->old_addr, + 1, 0); + + WARN(ret, "can't remove ftrace filter at address 0x%lx (rc=%d)", + func->old_addr, ret); + } +} + +int kpatch_register(struct kpatch_module *kpmod, bool replace) +{ + int ret, i; + struct kpatch_func *funcs = kpmod->funcs; + struct kpatch_func *func; + int num_funcs = kpmod->num_funcs; + + if (!kpmod->mod || !funcs || !num_funcs) + return -EINVAL; + + kpmod->enabled = false; + + down(&kpatch_mutex); + + if (!try_module_get(kpmod->mod)) { + ret = -ENODEV; + goto err_up; + } + + for (i = 0; i < num_funcs; i++) { + func = &funcs[i]; + + func->op = KPATCH_OP_PATCH; + func->kpmod = kpmod; + + /* + * If any other modules have also patched this function, it + * already has an ftrace handler. + */ + if (kpatch_get_func(func->old_addr)) + continue; + + /* Add an ftrace handler for this function. */ + ret = ftrace_set_filter_ip(&kpatch_ftrace_ops, func->old_addr, + 0, 0); + if (ret) { + pr_err("can't set ftrace filter at address 0x%lx\n", + func->old_addr); + num_funcs = i; + goto err_rollback; + } + } + + /* Register the ftrace trampoline if it hasn't been done already. */ + if (!kpatch_num_registered) { + ret = register_ftrace_function(&kpatch_ftrace_ops); + if (ret) { + pr_err("can't register ftrace handler\n"); + goto err_rollback; + } + } + kpatch_num_registered++; + + if (replace) + hash_for_each_rcu(kpatch_func_hash, i, func, node) + func->op = KPATCH_OP_UNPATCH; + + /* memory barrier between func hash and state write */ + smp_wmb(); + + kpatch_state_updating(); + + /* + * Idle the CPUs, verify activeness safety, and atomically make the new + * functions visible to the trampoline. + */ + ret = stop_machine(kpatch_apply_patch, kpmod, NULL); + + /* + * For the replace case, remove any obsolete funcs from the hash and + * the ftrace filter, and disable the owning patch module so that it + * can be removed. + */ + if (!ret && replace) + hash_for_each_rcu(kpatch_func_hash, i, func, node) { + if (func->op != KPATCH_OP_UNPATCH) + continue; + hash_del_rcu(&func->node); + kpatch_remove_funcs_from_filter(func, 1); + if (func->kpmod->enabled) { + kpatch_num_registered--; + func->kpmod->enabled = false; + pr_notice("unloaded patch module \"%s\"\n", + func->kpmod->mod->name); + module_put(func->kpmod->mod); + } + } + + /* memory barrier between func hash and state write */ + smp_wmb(); + + /* NMI handlers can return to normal now */ + kpatch_state_idle(); + + /* + * Wait for all existing NMI handlers to complete so that they don't + * see any changes to funcs or funcs->op that might occur after this + * point. + * + * Any NMI handlers starting after this point will see the IDLE state. + */ + synchronize_rcu(); + + if (ret) + goto err_unregister; + + for (i = 0; i < num_funcs; i++) + funcs[i].op = KPATCH_OP_NONE; + + pr_notice_once("tainting kernel with TAINT_KPATCH\n"); + add_taint(TAINT_KPATCH, LOCKDEP_STILL_OK); + + pr_notice("loaded patch module \"%s\"\n", kpmod->mod->name); + + kpmod->enabled = true; + + up(&kpatch_mutex); + return 0; + +err_unregister: + if (replace) + hash_for_each_rcu(kpatch_func_hash, i, func, node) + func->op = KPATCH_OP_NONE; + if (kpatch_num_registered == 1) { + int ret2 = unregister_ftrace_function(&kpatch_ftrace_ops); + if (ret2) { + pr_err("ftrace unregister failed (%d)\n", ret2); + goto err_rollback; + } + } + kpatch_num_registered--; +err_rollback: + kpatch_remove_funcs_from_filter(funcs, num_funcs); + module_put(kpmod->mod); +err_up: + up(&kpatch_mutex); + return ret; +} +EXPORT_SYMBOL(kpatch_register); + +int kpatch_unregister(struct kpatch_module *kpmod) +{ + struct kpatch_func *funcs = kpmod->funcs; + int num_funcs = kpmod->num_funcs; + int i, ret; + + WARN_ON(!kpmod->enabled); + + down(&kpatch_mutex); + + for (i = 0; i < num_funcs; i++) + funcs[i].op = KPATCH_OP_UNPATCH; + + /* memory barrier between func hash and state write */ + smp_wmb(); + + kpatch_state_updating(); + + ret = stop_machine(kpatch_remove_patch, kpmod, NULL); + + /* NMI handlers can return to normal now */ + kpatch_state_idle(); + + /* + * Wait for all existing NMI handlers to complete so that they don't + * see any changes to funcs or funcs->op that might occur after this + * point. + * + * Any NMI handlers starting after this point will see the IDLE state. + */ + synchronize_rcu(); + + if (ret) { + for (i = 0; i < num_funcs; i++) + funcs[i].op = KPATCH_OP_NONE; + goto out; + } + + if (kpatch_num_registered == 1) { + ret = unregister_ftrace_function(&kpatch_ftrace_ops); + if (ret) + WARN(1, "can't unregister ftrace handler"); + else + kpatch_num_registered--; + } + + kpatch_remove_funcs_from_filter(funcs, num_funcs); + + pr_notice("unloaded patch module \"%s\"\n", kpmod->mod->name); + + kpmod->enabled = false; + module_put(kpmod->mod); + +out: + up(&kpatch_mutex); + return ret; +} +EXPORT_SYMBOL(kpatch_unregister); + +static int kpatch_init(void) +{ + kpatch_root_kobj = kobject_create_and_add("kpatch", kernel_kobj); + if (!kpatch_root_kobj) + return -ENOMEM; + + kpatch_patches_kobj = kobject_create_and_add("patches", + kpatch_root_kobj); + if (!kpatch_patches_kobj) + return -ENOMEM; + + return 0; +} + +static void kpatch_exit(void) +{ + WARN_ON(kpatch_num_registered != 0); + kobject_put(kpatch_patches_kobj); + kobject_put(kpatch_root_kobj); +} + +module_init(kpatch_init); +module_exit(kpatch_exit); +MODULE_LICENSE("GPL"); -- 1.9.0 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-01 15:52 [RFC PATCH 0/2] kpatch: dynamic kernel patching Josh Poimboeuf 2014-05-01 15:52 ` [RFC PATCH 1/2] kpatch: add TAINT_KPATCH flag Josh Poimboeuf 2014-05-01 15:52 ` [RFC PATCH 2/2] kpatch: add kpatch core module Josh Poimboeuf @ 2014-05-01 20:45 ` Andi Kleen 2014-05-01 21:01 ` Josh Poimboeuf 2014-05-02 8:37 ` Jiri Kosina ` (2 subsequent siblings) 5 siblings, 1 reply; 60+ messages in thread From: Andi Kleen @ 2014-05-01 20:45 UTC (permalink / raw) To: Josh Poimboeuf Cc: Seth Jennings, Masami Hiramatsu, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby, linux-kernel Josh Poimboeuf <jpoimboe@redhat.com> writes: > > kpatch checks the backtraces of all tasks in stop_machine() to ensure > that no instances of the old function are running when the new function > is applied. How does that work for tail calls? call foo foo: ... jmp bar bar: ... code executing ... When you backtrace you will see foo, but you are running in bar. Note that tail calls can be indirect, so they cannot be resolved statically. CONFIG_DEBUG_INFO usually disables tail calls, but not supporting it would seem like a large limitation, as the cost can be high. It wouldn't surprise me if there are some similar special cases that can even happen with them disabled. In theory you could read LBRs, but even that may miss some extreme cases. -Andi -- ak@linux.intel.com -- Speaking for myself only ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-01 20:45 ` [RFC PATCH 0/2] kpatch: dynamic kernel patching Andi Kleen @ 2014-05-01 21:01 ` Josh Poimboeuf 2014-05-01 21:06 ` Andi Kleen 0 siblings, 1 reply; 60+ messages in thread From: Josh Poimboeuf @ 2014-05-01 21:01 UTC (permalink / raw) To: Andi Kleen Cc: Seth Jennings, Masami Hiramatsu, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby, linux-kernel On Thu, May 01, 2014 at 01:45:33PM -0700, Andi Kleen wrote: > Josh Poimboeuf <jpoimboe@redhat.com> writes: > > > > kpatch checks the backtraces of all tasks in stop_machine() to ensure > > that no instances of the old function are running when the new function > > is applied. > > How does that work for tail calls? > > call foo > foo: > ... > jmp bar > > bar: > ... code executing ... > > When you backtrace you will see foo, but you are running in bar. > Note that tail calls can be indirect, so they cannot be resolved > statically. > > CONFIG_DEBUG_INFO usually disables tail calls, but not supporting > it would seem like a large limitation, as the cost can be high. > > It wouldn't surprise me if there are some similar special cases that > can even happen with them disabled. > > In theory you could read LBRs, but even that may miss some extreme > cases. When bar returns, would it skip foo and go straight back to foo's caller? If so, then it should be safe to patch foo after it jumps to bar. -- Josh ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-01 21:01 ` Josh Poimboeuf @ 2014-05-01 21:06 ` Andi Kleen 2014-05-01 21:27 ` Josh Poimboeuf 2014-05-02 1:30 ` Masami Hiramatsu 0 siblings, 2 replies; 60+ messages in thread From: Andi Kleen @ 2014-05-01 21:06 UTC (permalink / raw) To: Josh Poimboeuf Cc: Andi Kleen, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby, linux-kernel > When bar returns, would it skip foo and go straight back to foo's > caller? If so, then it should be safe to patch foo after it jumps to > bar. foo is no problem, you see it in the backtrace. But you don't see bar. -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-01 21:06 ` Andi Kleen @ 2014-05-01 21:27 ` Josh Poimboeuf 2014-05-01 21:39 ` Josh Poimboeuf 2014-05-02 1:30 ` Masami Hiramatsu 1 sibling, 1 reply; 60+ messages in thread From: Josh Poimboeuf @ 2014-05-01 21:27 UTC (permalink / raw) To: Andi Kleen Cc: Seth Jennings, Masami Hiramatsu, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby, linux-kernel On Thu, May 01, 2014 at 11:06:01PM +0200, Andi Kleen wrote: > > When bar returns, would it skip foo and go straight back to foo's > > caller? If so, then it should be safe to patch foo after it jumps to > > bar. > > foo is no problem, you see it in the backtrace. > But you don't see bar. Sorry, I missed your point the first time. Good question. stop_machine schedules a high priority thread on each CPU, which means every other task will be waiting in a schedule() call (assuming a non-preemptible kernel). In my local kernel, a quick grep of the disassembly doesn't show any jumps to schedule: $ egrep 'j.*<.*>' vmlinux.asm |grep -v '\+' |grep schedule ffffffff816b89b5: e9 e2 fe ff ff jmpq ffffffff816b889c <retint_with_reschedule> ffffffff816b8cec: 75 1e jne ffffffff816b8d0c <paranoid_schedule> But yes, that would be a problem if any tail call jumps to schedule() ever showed up. We may need to detect that case in our patch generation tooling and fail to create the patch module binary if the patch affects a function which does this. -- Josh ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-01 21:27 ` Josh Poimboeuf @ 2014-05-01 21:39 ` Josh Poimboeuf 0 siblings, 0 replies; 60+ messages in thread From: Josh Poimboeuf @ 2014-05-01 21:39 UTC (permalink / raw) To: Andi Kleen Cc: Seth Jennings, Masami Hiramatsu, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby, linux-kernel On Thu, May 01, 2014 at 04:27:46PM -0500, Josh Poimboeuf wrote: > On Thu, May 01, 2014 at 11:06:01PM +0200, Andi Kleen wrote: > > > When bar returns, would it skip foo and go straight back to foo's > > > caller? If so, then it should be safe to patch foo after it jumps to > > > bar. > > > > foo is no problem, you see it in the backtrace. > > But you don't see bar. > > Sorry, I missed your point the first time. Good question. > > stop_machine schedules a high priority thread on each CPU, which means > every other task will be waiting in a schedule() call (assuming a > non-preemptible kernel). In my local kernel, a quick grep of the > disassembly doesn't show any jumps to schedule: > > $ egrep 'j.*<.*>' vmlinux.asm |grep -v '\+' |grep schedule > ffffffff816b89b5: e9 e2 fe ff ff jmpq ffffffff816b889c <retint_with_reschedule> > ffffffff816b8cec: 75 1e jne ffffffff816b8d0c <paranoid_schedule> > > But yes, that would be a problem if any tail call jumps to schedule() > ever showed up. We may need to detect that case in our patch generation > tooling and fail to create the patch module binary if the patch affects > a function which does this. Thinking more about this... Even if it jumps to schedule(), I think there's no problem, since the function is basically done, and we already know not to patch schedule(). -- Josh ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-01 21:06 ` Andi Kleen 2014-05-01 21:27 ` Josh Poimboeuf @ 2014-05-02 1:30 ` Masami Hiramatsu 1 sibling, 0 replies; 60+ messages in thread From: Masami Hiramatsu @ 2014-05-02 1:30 UTC (permalink / raw) To: Andi Kleen Cc: Josh Poimboeuf, Seth Jennings, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby, linux-kernel (2014/05/02 6:06), Andi Kleen wrote: >> When bar returns, would it skip foo and go straight back to foo's >> caller? If so, then it should be safe to patch foo after it jumps to >> bar. > > foo is no problem, you see it in the backtrace. > But you don't see bar. No, there is no "foo" in backtrace. As Josh said, stop_machine() schedules highest priority tasks on each running thread, as below. buz: ... call foo -> push buz on the stack foo: ... jmp bar bar: ... call schedule -> push bar on the stack Thus, we'll not see foo on the stack. However, since the tail call doesn't return to foo, we can treat it already finished. So, there is no consistency problem on it. :) Thank you, -- Masami HIRAMATSU Software Platform Research Dept. Linux Technology Research Center Hitachi, Ltd., Yokohama Research Laboratory E-mail: masami.hiramatsu.pt@hitachi.com ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-01 15:52 [RFC PATCH 0/2] kpatch: dynamic kernel patching Josh Poimboeuf ` (2 preceding siblings ...) 2014-05-01 20:45 ` [RFC PATCH 0/2] kpatch: dynamic kernel patching Andi Kleen @ 2014-05-02 8:37 ` Jiri Kosina 2014-05-02 13:29 ` Josh Poimboeuf 2014-05-02 13:10 ` Jiri Kosina 2014-05-05 8:55 ` Ingo Molnar 5 siblings, 1 reply; 60+ messages in thread From: Jiri Kosina @ 2014-05-02 8:37 UTC (permalink / raw) To: Josh Poimboeuf Cc: Seth Jennings, Masami Hiramatsu, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby, linux-kernel On Thu, 1 May 2014, Josh Poimboeuf wrote: > Since Jiri posted the kGraft patches [1], I wanted to share an > alternative live patching solution called kpatch, which is something > we've been working on at Red Hat for quite a while. Hi Josh, Seth, thanks a lot for following up to our RFC with your submission, I am pretty sure this will help to energize the discussion and will provoke ideas for further improvements. [ ... snip ... ] > kpatch vs kGraft > ---------------- > > I think the biggest difference between kpatch and kGraft is how they > ensure that the patch is applied atomically and safely. > > kpatch checks the backtraces of all tasks in stop_machine() to ensure > that no instances of the old function are running when the new function > is applied. I think the biggest downside of this approach is that > stop_machine() has to idle all other CPUs during the patching process, > so it inserts a small amount of latency (a few ms on an idle system). Completely agreed with your comparative analysis, thanks for a nice summary. Additional thing that I believe is important to add here: with the "stop-machine / check all tasks" aproach, there might be situations where you'll always fail to patch the system; if there is a long-time sleeper in the patched callchain, such a single sleeper is enough to make the patching of the whole system impossible. With the lazy/gradual aproach implemented in kGraft, the whole system is gradually moving towards "fully patched" state and once all the sleepers blocking the process wake up, it ultimately converges to the fully patched state. Thanks, -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-02 8:37 ` Jiri Kosina @ 2014-05-02 13:29 ` Josh Poimboeuf 0 siblings, 0 replies; 60+ messages in thread From: Josh Poimboeuf @ 2014-05-02 13:29 UTC (permalink / raw) To: Jiri Kosina Cc: Seth Jennings, Masami Hiramatsu, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby, linux-kernel On Fri, May 02, 2014 at 10:37:53AM +0200, Jiri Kosina wrote: > On Thu, 1 May 2014, Josh Poimboeuf wrote: > > > Since Jiri posted the kGraft patches [1], I wanted to share an > > alternative live patching solution called kpatch, which is something > > we've been working on at Red Hat for quite a while. > > Hi Josh, Seth, > > thanks a lot for following up to our RFC with your submission, I am pretty > sure this will help to energize the discussion and will provoke ideas for > further improvements. Hi Jiri! Thanks for your comments. I'm pretty sure we have the same basic goals and we definitely want to find something that works for everybody. > > [ ... snip ... ] > > kpatch vs kGraft > > ---------------- > > > > I think the biggest difference between kpatch and kGraft is how they > > ensure that the patch is applied atomically and safely. > > > > kpatch checks the backtraces of all tasks in stop_machine() to ensure > > that no instances of the old function are running when the new function > > is applied. I think the biggest downside of this approach is that > > stop_machine() has to idle all other CPUs during the patching process, > > so it inserts a small amount of latency (a few ms on an idle system). > > Completely agreed with your comparative analysis, thanks for a nice > summary. > > Additional thing that I believe is important to add here: with the > "stop-machine / check all tasks" aproach, there might be situations where > you'll always fail to patch the system; if there is a long-time sleeper in > the patched callchain, such a single sleeper is enough to make the > patching of the whole system impossible. Yeah, this is true with our current implementation. With stop_machine(), functions which are always on the stack of at least one process in the system can't be patched. In practice this means functions like schedule(), poll(), select() and nanosleep(). Here are all the top-level sleeping functions on my system: $ for i in /proc/*/wchan; do cat $i; echo; done | sort |uniq |egrep -v '^0$' devtmpfsd do_sigtimedwait do_wait ep_poll fsnotify_mark_destroy futex_wait_queue_me hrtimer_nanosleep hub_thread inotify_read irq_thread kauditd_thread khugepaged kjournald2 ksm_scan_thread kswapd kthreadd n_tty_read pipe_wait poll_schedule_timeout rcu_gp_kthread rescuer_thread rfcomm_run scsi_error_handler smpboot_thread_fn unix_stream_recvmsg worker_thread In addition to these functions, their call chain path down to schedule() also can't be patched. But it works out to be a tiny ratio of unpatchable-to-patchable. I think this is a very minor limitation, but if it turned out to be a problem, we could do something like installing an ftrace handler for the blocking function, waking the sleeping task, and looping in the ftrace handler until the patching process is complete. > > With the lazy/gradual aproach implemented in kGraft, the whole system is > gradually moving towards "fully patched" state and once all the sleepers > blocking the process wake up, it ultimately converges to the fully patched > state. > > Thanks, > > -- > Jiri Kosina > SUSE Labs > -- Josh ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-01 15:52 [RFC PATCH 0/2] kpatch: dynamic kernel patching Josh Poimboeuf ` (3 preceding siblings ...) 2014-05-02 8:37 ` Jiri Kosina @ 2014-05-02 13:10 ` Jiri Kosina 2014-05-02 13:37 ` Josh Poimboeuf 2014-05-05 23:34 ` David Lang 2014-05-05 8:55 ` Ingo Molnar 5 siblings, 2 replies; 60+ messages in thread From: Jiri Kosina @ 2014-05-02 13:10 UTC (permalink / raw) To: Josh Poimboeuf Cc: Seth Jennings, Masami Hiramatsu, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby, linux-kernel On Thu, 1 May 2014, Josh Poimboeuf wrote: > kpatch vs kGraft > ---------------- > > I think the biggest difference between kpatch and kGraft is how they > ensure that the patch is applied atomically and safely. > > kpatch checks the backtraces of all tasks in stop_machine() to ensure > that no instances of the old function are running when the new function > is applied. I think the biggest downside of this approach is that > stop_machine() has to idle all other CPUs during the patching process, > so it inserts a small amount of latency (a few ms on an idle system). > > Instead, kGraft uses per-task consistency: each task either sees the old > version or the new version of the function. This gives a consistent > view with respect to functions, but _not_ data, because the old and new > functions are allowed to run simultaneously and share data. This could > be dangerous if a patch changes how a function uses a data structure. > The new function could make a data change that the old function wasn't > expecting. Please correct me if I am wrong, but with kPatch, you are also unable to do a "flip and forget" switch between functions that expect different format of in-memory data without performing a non-trivial all-memory lookup to find structures in question and perfoming corresponding transformations. What we can do with kGraft si to perform the patching in two steps (1) redirect to a temporary band-aid function that can handle both semantics of the data (persumably in highly sub-optimal way) (2) patching in (1) succeeds completely (kGraft claims victory), start a new round of patching with redirect to the final function which expects only the new semantics This basically implies that both aproaches need "human inspection" in this respect anyway. -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-02 13:10 ` Jiri Kosina @ 2014-05-02 13:37 ` Josh Poimboeuf 2014-05-05 23:34 ` David Lang 1 sibling, 0 replies; 60+ messages in thread From: Josh Poimboeuf @ 2014-05-02 13:37 UTC (permalink / raw) To: Jiri Kosina Cc: Seth Jennings, Masami Hiramatsu, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby, linux-kernel On Fri, May 02, 2014 at 03:10:58PM +0200, Jiri Kosina wrote: > On Thu, 1 May 2014, Josh Poimboeuf wrote: > > > kpatch vs kGraft > > ---------------- > > > > I think the biggest difference between kpatch and kGraft is how they > > ensure that the patch is applied atomically and safely. > > > > kpatch checks the backtraces of all tasks in stop_machine() to ensure > > that no instances of the old function are running when the new function > > is applied. I think the biggest downside of this approach is that > > stop_machine() has to idle all other CPUs during the patching process, > > so it inserts a small amount of latency (a few ms on an idle system). > > > > Instead, kGraft uses per-task consistency: each task either sees the old > > version or the new version of the function. This gives a consistent > > view with respect to functions, but _not_ data, because the old and new > > functions are allowed to run simultaneously and share data. This could > > be dangerous if a patch changes how a function uses a data structure. > > The new function could make a data change that the old function wasn't > > expecting. > > Please correct me if I am wrong, but with kPatch, you are also unable to > do a "flip and forget" switch between functions that expect different > format of in-memory data without performing a non-trivial all-memory > lookup to find structures in question and perfoming corresponding > transformations. > > What we can do with kGraft si to perform the patching in two steps > > (1) redirect to a temporary band-aid function that can handle both > semantics of the data (persumably in highly sub-optimal way) > (2) patching in (1) succeeds completely (kGraft claims victory), start a > new round of patching with redirect to the final function which > expects only the new semantics > > This basically implies that both aproaches need "human inspection" in this > respect anyway. Ah, interesting. The intermediate function which knows how to handle both versions of the data could get pretty tricky though. With kpatch we'd just need to allow the user to install a callback function which runs in stop_machine() and manually updates all the data structures. I think we'll be implementing something like this soon. -- Josh ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-02 13:10 ` Jiri Kosina 2014-05-02 13:37 ` Josh Poimboeuf @ 2014-05-05 23:34 ` David Lang 2014-05-05 23:52 ` Jiri Kosina 1 sibling, 1 reply; 60+ messages in thread From: David Lang @ 2014-05-05 23:34 UTC (permalink / raw) To: Jiri Kosina Cc: Josh Poimboeuf, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby, linux-kernel On Fri, 2 May 2014, Jiri Kosina wrote: > On Thu, 1 May 2014, Josh Poimboeuf wrote: > >> kpatch vs kGraft >> ---------------- >> >> I think the biggest difference between kpatch and kGraft is how they >> ensure that the patch is applied atomically and safely. >> >> kpatch checks the backtraces of all tasks in stop_machine() to ensure >> that no instances of the old function are running when the new function >> is applied. I think the biggest downside of this approach is that >> stop_machine() has to idle all other CPUs during the patching process, >> so it inserts a small amount of latency (a few ms on an idle system). >> >> Instead, kGraft uses per-task consistency: each task either sees the old >> version or the new version of the function. This gives a consistent >> view with respect to functions, but _not_ data, because the old and new >> functions are allowed to run simultaneously and share data. This could >> be dangerous if a patch changes how a function uses a data structure. >> The new function could make a data change that the old function wasn't >> expecting. > > Please correct me if I am wrong, but with kPatch, you are also unable to > do a "flip and forget" switch between functions that expect different > format of in-memory data without performing a non-trivial all-memory > lookup to find structures in question and perfoming corresponding > transformations. > > What we can do with kGraft si to perform the patching in two steps > > (1) redirect to a temporary band-aid function that can handle both > semantics of the data (persumably in highly sub-optimal way) > (2) patching in (1) succeeds completely (kGraft claims victory), start a > new round of patching with redirect to the final function which > expects only the new semantics > > This basically implies that both aproaches need "human inspection" in this > respect anyway. how would you know that all instances of the datastructure in memory have been touched? just because all tasks have run and are outside the function in question doesn't tell you data structures have been converted. You have no way of knowing when (or if) the next call to the modified function will take place on any potential in-memory structure. David Lang ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-05 23:34 ` David Lang @ 2014-05-05 23:52 ` Jiri Kosina 2014-05-06 1:59 ` David Lang 2014-05-06 7:32 ` Ingo Molnar 0 siblings, 2 replies; 60+ messages in thread From: Jiri Kosina @ 2014-05-05 23:52 UTC (permalink / raw) To: David Lang Cc: Josh Poimboeuf, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby, linux-kernel On Mon, 5 May 2014, David Lang wrote: > how would you know that all instances of the datastructure in memory > have= been touched? just because all tasks have run and are outside the > function in question doesn't tell you data structures have been > converted. You have n= o way of knowing when (or if) the next call to > the modified function will take place on any potential in-memory > structure. The problem you are trying to avoid here is functions expecting to read "v2" format of the data from memory, while there are still tasks that are unpredictably writing "v1" format of the data to the memory. There are several ways to attack this problem: - stop the whole system, convert all the existing data structures to new format (which might potentially be non-trivial, mostly because you have to *know* where all the data structures have been allocated), apply patch, resume operation [ksplice, probably kpatch in future] - restrict the data format to be backwards compatible [to be done manually during patch creation, currently what kGraft needs to do in such case] - have a proxy code which can read both "v1" and "v2" formats, and writes back in the same format it has seen the data structure on input - once all the *code* has been converted, it still has to understand "v1" and "v2", but it can now start writing out "v2" format only [possible with kGraft, not implemented in automated fashion] Ideas are of course more than welcome. Thanks, -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-05 23:52 ` Jiri Kosina @ 2014-05-06 1:59 ` David Lang 2014-05-06 12:17 ` Josh Poimboeuf 2014-05-06 7:32 ` Ingo Molnar 1 sibling, 1 reply; 60+ messages in thread From: David Lang @ 2014-05-06 1:59 UTC (permalink / raw) To: Jiri Kosina Cc: Josh Poimboeuf, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby, linux-kernel On Tue, 6 May 2014, Jiri Kosina wrote: > On Mon, 5 May 2014, David Lang wrote: > >> how would you know that all instances of the datastructure in memory >> have= been touched? just because all tasks have run and are outside the >> function in question doesn't tell you data structures have been >> converted. You have n= o way of knowing when (or if) the next call to >> the modified function will take place on any potential in-memory >> structure. > > The problem you are trying to avoid here is functions expecting to read > "v2" format of the data from memory, while there are still tasks that are > unpredictably writing "v1" format of the data to the memory. > > There are several ways to attack this problem: > > - stop the whole system, convert all the existing data structures to new > format (which might potentially be non-trivial, mostly because you > have to *know* where all the data structures have been allocated), apply > patch, resume operation [ksplice, probably kpatch in future] > - restrict the data format to be backwards compatible [to be done > manually during patch creation, currently what kGraft needs to do in > such case] > - have a proxy code which can read both "v1" and "v2" formats, and writes > back in the same format it has seen the data structure on input doesn't this have the same problem of finding all the data? > - once all the *code* has been converted, it still has to understand "v1" > and "v2", but it can now start writing out "v2" format only [possible > with kGraft, not implemented in automated fashion] which is a varient of the second one in that all the data needs to be tagged with a version so that it can be converted. I don't see that 'stop the world' ends up being much better for this. it does avoid any possibility of v1 code reading v2 data, but it doesn't help in avoiding v2 code reading v1 data some time (potentially much) later. David Lang ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-06 1:59 ` David Lang @ 2014-05-06 12:17 ` Josh Poimboeuf 0 siblings, 0 replies; 60+ messages in thread From: Josh Poimboeuf @ 2014-05-06 12:17 UTC (permalink / raw) To: David Lang Cc: Jiri Kosina, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby, linux-kernel On Mon, May 05, 2014 at 06:59:29PM -0700, David Lang wrote: > On Tue, 6 May 2014, Jiri Kosina wrote: > > >On Mon, 5 May 2014, David Lang wrote: > > > >>how would you know that all instances of the datastructure in memory > >>have= been touched? just because all tasks have run and are outside the > >>function in question doesn't tell you data structures have been > >>converted. You have n= o way of knowing when (or if) the next call to > >>the modified function will take place on any potential in-memory > >>structure. > > > >The problem you are trying to avoid here is functions expecting to read > >"v2" format of the data from memory, while there are still tasks that are > >unpredictably writing "v1" format of the data to the memory. > > > >There are several ways to attack this problem: > > > >- stop the whole system, convert all the existing data structures to new > > format (which might potentially be non-trivial, mostly because you > > have to *know* where all the data structures have been allocated), apply > > patch, resume operation [ksplice, probably kpatch in future] > >- restrict the data format to be backwards compatible [to be done > > manually during patch creation, currently what kGraft needs to do in > > such case] > >- have a proxy code which can read both "v1" and "v2" formats, and writes > > back in the same format it has seen the data structure on input > > doesn't this have the same problem of finding all the data? > > >- once all the *code* has been converted, it still has to understand "v1" > > and "v2", but it can now start writing out "v2" format only [possible > > with kGraft, not implemented in automated fashion] > > which is a varient of the second one in that all the data needs to be tagged > with a version so that it can be converted. > > I don't see that 'stop the world' ends up being much better for this. > > it does avoid any possibility of v1 code reading v2 data, but it doesn't > help in avoiding v2 code reading v1 data some time (potentially much) later. In general we would avoid this type of situation, where data semantics change. If it's absolutely necessary, then it's up to the user to write a hook function that will find all the data structures and convert them at patch loading time. If they can't write a function which does that sufficiently and safely, then live patching isn't a good option. -- Josh ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-05 23:52 ` Jiri Kosina 2014-05-06 1:59 ` David Lang @ 2014-05-06 7:32 ` Ingo Molnar 2014-05-06 8:03 ` Jiri Kosina 2014-05-06 12:23 ` Josh Poimboeuf 1 sibling, 2 replies; 60+ messages in thread From: Ingo Molnar @ 2014-05-06 7:32 UTC (permalink / raw) To: Jiri Kosina Cc: David Lang, Josh Poimboeuf, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby, linux-kernel * Jiri Kosina <jkosina@suse.cz> wrote: > On Mon, 5 May 2014, David Lang wrote: > > > how would you know that all instances of the datastructure in memory > > have= been touched? just because all tasks have run and are outside the > > function in question doesn't tell you data structures have been > > converted. You have n= o way of knowing when (or if) the next call to > > the modified function will take place on any potential in-memory > > structure. > > The problem you are trying to avoid here is functions expecting to read > "v2" format of the data from memory, while there are still tasks that are > unpredictably writing "v1" format of the data to the memory. > > There are several ways to attack this problem: > > - stop the whole system, convert all the existing data structures to new > format (which might potentially be non-trivial, mostly because you > have to *know* where all the data structures have been allocated), apply > patch, resume operation [ksplice, probably kpatch in future] > - restrict the data format to be backwards compatible [to be done > manually during patch creation, currently what kGraft needs to do in > such case] > - have a proxy code which can read both "v1" and "v2" formats, and writes > back in the same format it has seen the data structure on input > - once all the *code* has been converted, it still has to understand "v1" > and "v2", but it can now start writing out "v2" format only [possible > with kGraft, not implemented in automated fashion] > > Ideas are of course more than welcome. So what I'm curious about, what is the actual 'in the field' distro experience, about the type of live-patches that get pushed with urgency? My guess would be that the overwhelming majority of live-patches don't change data structures - and hence the right initial model would be to ensure (via tooling, and via review) that 'v1' and 'v2' data is exactly the same. The most abstract way to 'live patch' a kernel is to do a checkpoint save on all application state, to reboot the kernel, boot into the patched kernel and then restore all application state seemlessly. i.e. the CR effort that has been going on for years, to save/restore all state that is exposed through a kernel ABI. That allows a complete kernel to be switched out, without affecting applications. Thanks, Ingo ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-06 7:32 ` Ingo Molnar @ 2014-05-06 8:03 ` Jiri Kosina 2014-05-06 12:23 ` Josh Poimboeuf 1 sibling, 0 replies; 60+ messages in thread From: Jiri Kosina @ 2014-05-06 8:03 UTC (permalink / raw) To: Ingo Molnar Cc: David Lang, Josh Poimboeuf, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby, linux-kernel On Tue, 6 May 2014, Ingo Molnar wrote: > So what I'm curious about, what is the actual 'in the field' distro > experience, about the type of live-patches that get pushed with > urgency? This is of course a very good question. We've done some very light preparatory analysis and went through patches which would make most sense to be shipped as hot/live patches without enough time for proper downtime scheduling (i.e. CVE severity high enough (local root), etc). Most of the time, these turn out to be a one-or-few liners, mostly adding extra check, fixing bounds, etc. There were just one or two in a few years history where some extra care would be needed. Of course, I guess that most valuable input regarding this could be coming from kSplice guys, as they have the biggest in-field experience with this, and are shipping a lot of non-trivial patches this way, not just the "super-critical" ones. But I am not really sure whether we can expect any input from them, unfortunately. > My guess would be that the overwhelming majority of live-patches don't > change data structures - and hence the right initial model would be to > ensure (via tooling, and via review) that 'v1' and 'v2' data is exactly > the same. I fully agree ... that's why we are not pro-actively dealing with that in kGraft, for the sake of keeping it stupid at simple (at least in the early stages). > The most abstract way to 'live patch' a kernel is to do a checkpoint > save on all application state, to reboot the kernel, boot into the > patched kernel and then restore all application state seemlessly. In some sense yes, but it seems to still be rather costly operation (especially checkpointing on a HUGE machine might take some time, if I understand the technology correctly) compared to just flipping a few bytes (while maintaining overall correctness, of course) in the kernel memory. Basically, CR might be the best solution for very complex and large kernel updates. What I am currently most concerned about though, are scenarios where datacenter owners need to urgently apply a local root one-liner fix with as little hassle as possible. -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-06 7:32 ` Ingo Molnar 2014-05-06 8:03 ` Jiri Kosina @ 2014-05-06 12:23 ` Josh Poimboeuf 2014-05-07 12:19 ` Ingo Molnar 1 sibling, 1 reply; 60+ messages in thread From: Josh Poimboeuf @ 2014-05-06 12:23 UTC (permalink / raw) To: Ingo Molnar Cc: Jiri Kosina, David Lang, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby, linux-kernel On Tue, May 06, 2014 at 09:32:28AM +0200, Ingo Molnar wrote: > > * Jiri Kosina <jkosina@suse.cz> wrote: > > > On Mon, 5 May 2014, David Lang wrote: > > > > > how would you know that all instances of the datastructure in memory > > > have= been touched? just because all tasks have run and are outside the > > > function in question doesn't tell you data structures have been > > > converted. You have n= o way of knowing when (or if) the next call to > > > the modified function will take place on any potential in-memory > > > structure. > > > > The problem you are trying to avoid here is functions expecting to read > > "v2" format of the data from memory, while there are still tasks that are > > unpredictably writing "v1" format of the data to the memory. > > > > There are several ways to attack this problem: > > > > - stop the whole system, convert all the existing data structures to new > > format (which might potentially be non-trivial, mostly because you > > have to *know* where all the data structures have been allocated), apply > > patch, resume operation [ksplice, probably kpatch in future] > > - restrict the data format to be backwards compatible [to be done > > manually during patch creation, currently what kGraft needs to do in > > such case] > > - have a proxy code which can read both "v1" and "v2" formats, and writes > > back in the same format it has seen the data structure on input > > - once all the *code* has been converted, it still has to understand "v1" > > and "v2", but it can now start writing out "v2" format only [possible > > with kGraft, not implemented in automated fashion] > > > > Ideas are of course more than welcome. > > So what I'm curious about, what is the actual 'in the field' distro > experience, about the type of live-patches that get pushed with > urgency? > > My guess would be that the overwhelming majority of live-patches don't > change data structures - and hence the right initial model would be to > ensure (via tooling, and via review) that 'v1' and 'v2' data is > exactly the same. Yes, in general we want to avoid data changes. In practice, we expect most patches to be small, localized security fixes, so it shouldn't be an issue in most cases. Currently the kpatch tooling detects any compile-time changes to static data and refuses to build the patch module in that case. But there's no way to programmatically detect changes to dynamic data. Which is why the user always has to be very careful when selecting a patch. -- Josh ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-06 12:23 ` Josh Poimboeuf @ 2014-05-07 12:19 ` Ingo Molnar 2014-05-09 1:46 ` David Lang 0 siblings, 1 reply; 60+ messages in thread From: Ingo Molnar @ 2014-05-07 12:19 UTC (permalink / raw) To: Josh Poimboeuf Cc: Jiri Kosina, David Lang, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby, linux-kernel * Josh Poimboeuf <jpoimboe@redhat.com> wrote: > On Tue, May 06, 2014 at 09:32:28AM +0200, Ingo Molnar wrote: > > > > * Jiri Kosina <jkosina@suse.cz> wrote: > > > > > On Mon, 5 May 2014, David Lang wrote: > > > > > > > how would you know that all instances of the datastructure in memory > > > > have= been touched? just because all tasks have run and are outside the > > > > function in question doesn't tell you data structures have been > > > > converted. You have n= o way of knowing when (or if) the next call to > > > > the modified function will take place on any potential in-memory > > > > structure. > > > > > > The problem you are trying to avoid here is functions expecting to read > > > "v2" format of the data from memory, while there are still tasks that are > > > unpredictably writing "v1" format of the data to the memory. > > > > > > There are several ways to attack this problem: > > > > > > - stop the whole system, convert all the existing data structures to new > > > format (which might potentially be non-trivial, mostly because you > > > have to *know* where all the data structures have been allocated), apply > > > patch, resume operation [ksplice, probably kpatch in future] > > > - restrict the data format to be backwards compatible [to be done > > > manually during patch creation, currently what kGraft needs to do in > > > such case] > > > - have a proxy code which can read both "v1" and "v2" formats, and writes > > > back in the same format it has seen the data structure on input > > > - once all the *code* has been converted, it still has to understand "v1" > > > and "v2", but it can now start writing out "v2" format only [possible > > > with kGraft, not implemented in automated fashion] > > > > > > Ideas are of course more than welcome. > > > > So what I'm curious about, what is the actual 'in the field' distro > > experience, about the type of live-patches that get pushed with > > urgency? > > > > My guess would be that the overwhelming majority of live-patches don't > > change data structures - and hence the right initial model would be to > > ensure (via tooling, and via review) that 'v1' and 'v2' data is > > exactly the same. > > Yes, in general we want to avoid data changes. In practice, we expect > most patches to be small, localized security fixes, so it shouldn't be > an issue in most cases. > > Currently the kpatch tooling detects any compile-time changes to > static data and refuses to build the patch module in that case. > > But there's no way to programmatically detect changes to dynamic > data. Which is why the user always has to be very careful when > selecting a patch. And since this is about the system kernel it's dead easy to mess up a new kernel function and make the system unbootable - so it's not like 'be careful' isn't something implied already. Thanks, Ingo ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-07 12:19 ` Ingo Molnar @ 2014-05-09 1:46 ` David Lang 2014-05-09 2:45 ` Steven Rostedt 2014-05-09 4:07 ` Masami Hiramatsu 0 siblings, 2 replies; 60+ messages in thread From: David Lang @ 2014-05-09 1:46 UTC (permalink / raw) To: Ingo Molnar Cc: Josh Poimboeuf, Jiri Kosina, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby, linux-kernel On Wed, 7 May 2014, Ingo Molnar wrote: > * Josh Poimboeuf <jpoimboe@redhat.com> wrote: > >> On Tue, May 06, 2014 at 09:32:28AM +0200, Ingo Molnar wrote: >>> >>> * Jiri Kosina <jkosina@suse.cz> wrote: >>> >>>> On Mon, 5 May 2014, David Lang wrote: >>>> >>>>> how would you know that all instances of the datastructure in memory >>>>> have= been touched? just because all tasks have run and are outside the >>>>> function in question doesn't tell you data structures have been >>>>> converted. You have n= o way of knowing when (or if) the next call to >>>>> the modified function will take place on any potential in-memory >>>>> structure. >>>> >>>> The problem you are trying to avoid here is functions expecting to read >>>> "v2" format of the data from memory, while there are still tasks that are >>>> unpredictably writing "v1" format of the data to the memory. >>>> >>>> There are several ways to attack this problem: >>>> >>>> - stop the whole system, convert all the existing data structures to new >>>> format (which might potentially be non-trivial, mostly because you >>>> have to *know* where all the data structures have been allocated), apply >>>> patch, resume operation [ksplice, probably kpatch in future] >>>> - restrict the data format to be backwards compatible [to be done >>>> manually during patch creation, currently what kGraft needs to do in >>>> such case] >>>> - have a proxy code which can read both "v1" and "v2" formats, and writes >>>> back in the same format it has seen the data structure on input >>>> - once all the *code* has been converted, it still has to understand "v1" >>>> and "v2", but it can now start writing out "v2" format only [possible >>>> with kGraft, not implemented in automated fashion] >>>> >>>> Ideas are of course more than welcome. >>> >>> So what I'm curious about, what is the actual 'in the field' distro >>> experience, about the type of live-patches that get pushed with >>> urgency? >>> >>> My guess would be that the overwhelming majority of live-patches don't >>> change data structures - and hence the right initial model would be to >>> ensure (via tooling, and via review) that 'v1' and 'v2' data is >>> exactly the same. >> >> Yes, in general we want to avoid data changes. In practice, we expect >> most patches to be small, localized security fixes, so it shouldn't be >> an issue in most cases. >> >> Currently the kpatch tooling detects any compile-time changes to >> static data and refuses to build the patch module in that case. >> >> But there's no way to programmatically detect changes to dynamic >> data. Which is why the user always has to be very careful when >> selecting a patch. > > And since this is about the system kernel it's dead easy to mess up a > new kernel function and make the system unbootable - so it's not like > 'be careful' isn't something implied already. It's possible to have two versions of code that each work independently, but that you can't switch between easily on the fly. If the new code assumes a lock is held that the old code didn't take, then when you switch, you are eventually going to hit a case where the new code trys to release a lock it doesn't hold. detecting all possible cases progromatically seems close to impossible. but this means that there are two categories of patches 1. patches that are safe to put in a kernel that you are going to boot from 2. patches that are able to be applied on the fly and the tool isn't going to be able to tell you which category the patch is in. It can identify some of the items that make it unlikely or impossible for the patch to belong to #2, but don't rely on the tool catching all of them David Lang ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-09 1:46 ` David Lang @ 2014-05-09 2:45 ` Steven Rostedt 2014-05-09 4:07 ` Masami Hiramatsu 1 sibling, 0 replies; 60+ messages in thread From: Steven Rostedt @ 2014-05-09 2:45 UTC (permalink / raw) To: David Lang Cc: Ingo Molnar, Josh Poimboeuf, Jiri Kosina, Seth Jennings, Masami Hiramatsu, Frederic Weisbecker, Ingo Molnar, Jiri Slaby, linux-kernel On Thu, 2014-05-08 at 18:46 -0700, David Lang wrote: > On Wed, 7 May 2014, Ingo Molnar wrote: > It's possible to have two versions of code that each work independently, but > that you can't switch between easily on the fly. > > If the new code assumes a lock is held that the old code didn't take, then when > you switch, you are eventually going to hit a case where the new code trys to > release a lock it doesn't hold. But we are replacing full functions here. Never half way through. Thus, there should never be "old code didn't take", because we replace the entire function. If the new code expects a lock to be held where the old code did not, there's nothing changing anything that leads up to this new code. The lock still won't be taken. A patch that expects that wont work on a reboot either. > > detecting all possible cases progromatically seems close to impossible. > > but this means that there are two categories of patches > > 1. patches that are safe to put in a kernel that you are going to boot from > > 2. patches that are able to be applied on the fly > > and the tool isn't going to be able to tell you which category the patch is in. > It can identify some of the items that make it unlikely or impossible for the > patch to belong to #2, but don't rely on the tool catching all of them I agree that live patching is going to be a bit trickier than patches that are applied for reboot. And I also agree that no tool could determine if the patch can be applied live even if it could be applied safely for reboot. To do so would probably be on the difficulty scale of the halting problem. -- Steve ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-09 1:46 ` David Lang 2014-05-09 2:45 ` Steven Rostedt @ 2014-05-09 4:07 ` Masami Hiramatsu 1 sibling, 0 replies; 60+ messages in thread From: Masami Hiramatsu @ 2014-05-09 4:07 UTC (permalink / raw) To: David Lang Cc: Ingo Molnar, Josh Poimboeuf, Jiri Kosina, Seth Jennings, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby, linux-kernel (2014/05/09 10:46), David Lang wrote: > On Wed, 7 May 2014, Ingo Molnar wrote: > >> * Josh Poimboeuf <jpoimboe@redhat.com> wrote: >> >>> On Tue, May 06, 2014 at 09:32:28AM +0200, Ingo Molnar wrote: >>>> >>>> * Jiri Kosina <jkosina@suse.cz> wrote: >>>> >>>>> On Mon, 5 May 2014, David Lang wrote: >>>>> >>>>>> how would you know that all instances of the datastructure in memory >>>>>> have= been touched? just because all tasks have run and are outside the >>>>>> function in question doesn't tell you data structures have been >>>>>> converted. You have n= o way of knowing when (or if) the next call to >>>>>> the modified function will take place on any potential in-memory >>>>>> structure. >>>>> >>>>> The problem you are trying to avoid here is functions expecting to read >>>>> "v2" format of the data from memory, while there are still tasks that are >>>>> unpredictably writing "v1" format of the data to the memory. >>>>> >>>>> There are several ways to attack this problem: >>>>> >>>>> - stop the whole system, convert all the existing data structures to new >>>>> format (which might potentially be non-trivial, mostly because you >>>>> have to *know* where all the data structures have been allocated), apply >>>>> patch, resume operation [ksplice, probably kpatch in future] >>>>> - restrict the data format to be backwards compatible [to be done >>>>> manually during patch creation, currently what kGraft needs to do in >>>>> such case] >>>>> - have a proxy code which can read both "v1" and "v2" formats, and writes >>>>> back in the same format it has seen the data structure on input >>>>> - once all the *code* has been converted, it still has to understand "v1" >>>>> and "v2", but it can now start writing out "v2" format only [possible >>>>> with kGraft, not implemented in automated fashion] >>>>> >>>>> Ideas are of course more than welcome. >>>> >>>> So what I'm curious about, what is the actual 'in the field' distro >>>> experience, about the type of live-patches that get pushed with >>>> urgency? >>>> >>>> My guess would be that the overwhelming majority of live-patches don't >>>> change data structures - and hence the right initial model would be to >>>> ensure (via tooling, and via review) that 'v1' and 'v2' data is >>>> exactly the same. >>> >>> Yes, in general we want to avoid data changes. In practice, we expect >>> most patches to be small, localized security fixes, so it shouldn't be >>> an issue in most cases. >>> >>> Currently the kpatch tooling detects any compile-time changes to >>> static data and refuses to build the patch module in that case. >>> >>> But there's no way to programmatically detect changes to dynamic >>> data. Which is why the user always has to be very careful when >>> selecting a patch. >> >> And since this is about the system kernel it's dead easy to mess up a >> new kernel function and make the system unbootable - so it's not like >> 'be careful' isn't something implied already. > > It's possible to have two versions of code that each work independently, but > that you can't switch between easily on the fly. > > If the new code assumes a lock is held that the old code didn't take, then when > you switch, you are eventually going to hit a case where the new code trys to > release a lock it doesn't hold. > > detecting all possible cases progromatically seems close to impossible. Agreed. Perhaps, spinlock or locks which have small critical section are usually able to make safe, because a lock caller also does unlock. But mutex etc. usually have different locker/unlocker function. In that case, we'll need to check running all functions which is in the critical region. > but this means that there are two categories of patches > > 1. patches that are safe to put in a kernel that you are going to boot from > > 2. patches that are able to be applied on the fly > > and the tool isn't going to be able to tell you which category the patch is in. > It can identify some of the items that make it unlikely or impossible for the > patch to belong to #2, but don't rely on the tool catching all of them Yeah, I think we'd better start with heuristic decision. Most of the cases, it could be applied. Thank you, -- Masami HIRAMATSU Software Platform Research Dept. Linux Technology Research Center Hitachi, Ltd., Yokohama Research Laboratory E-mail: masami.hiramatsu.pt@hitachi.com ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-01 15:52 [RFC PATCH 0/2] kpatch: dynamic kernel patching Josh Poimboeuf ` (4 preceding siblings ...) 2014-05-02 13:10 ` Jiri Kosina @ 2014-05-05 8:55 ` Ingo Molnar 2014-05-05 13:26 ` Josh Poimboeuf 5 siblings, 1 reply; 60+ messages in thread From: Ingo Molnar @ 2014-05-05 8:55 UTC (permalink / raw) To: Josh Poimboeuf Cc: Seth Jennings, Masami Hiramatsu, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner * Josh Poimboeuf <jpoimboe@redhat.com> wrote: > [...] > > kpatch checks the backtraces of all tasks in stop_machine() to > ensure that no instances of the old function are running when the > new function is applied. I think the biggest downside of this > approach is that stop_machine() has to idle all other CPUs during > the patching process, so it inserts a small amount of latency (a few > ms on an idle system). When live patching the kernel, how about achieving an even 'cleaner' state for all tasks in the system: to freeze all tasks, as the suspend and hibernation code (and kexec) does, via freeze_processes()? That means no tasks in the system have any real kernel execution state, and there's also no problem with long-sleeping tasks, as freeze_processes() is supposed to be fast as well. I.e. go for the most conservative live patching state first, and relax it only once the initial model is upstream and is working robustly. Thanks, Ingo ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-05 8:55 ` Ingo Molnar @ 2014-05-05 13:26 ` Josh Poimboeuf 2014-05-05 14:10 ` Frederic Weisbecker 0 siblings, 1 reply; 60+ messages in thread From: Josh Poimboeuf @ 2014-05-05 13:26 UTC (permalink / raw) To: Ingo Molnar Cc: Seth Jennings, Masami Hiramatsu, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner On Mon, May 05, 2014 at 10:55:37AM +0200, Ingo Molnar wrote: > > * Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > > [...] > > > > kpatch checks the backtraces of all tasks in stop_machine() to > > ensure that no instances of the old function are running when the > > new function is applied. I think the biggest downside of this > > approach is that stop_machine() has to idle all other CPUs during > > the patching process, so it inserts a small amount of latency (a few > > ms on an idle system). > > When live patching the kernel, how about achieving an even 'cleaner' > state for all tasks in the system: to freeze all tasks, as the suspend > and hibernation code (and kexec) does, via freeze_processes()? > > That means no tasks in the system have any real kernel execution > state, and there's also no problem with long-sleeping tasks, as > freeze_processes() is supposed to be fast as well. > > I.e. go for the most conservative live patching state first, and relax > it only once the initial model is upstream and is working robustly. I had considered doing this before, but the problem I found is that many kernel threads are unfreezable. So we wouldn't be able to check whether its safe to replace any functions in use by those kernel threads. -- Josh ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-05 13:26 ` Josh Poimboeuf @ 2014-05-05 14:10 ` Frederic Weisbecker 2014-05-05 18:43 ` Ingo Molnar 0 siblings, 1 reply; 60+ messages in thread From: Frederic Weisbecker @ 2014-05-05 14:10 UTC (permalink / raw) To: Josh Poimboeuf Cc: Ingo Molnar, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner On Mon, May 05, 2014 at 08:26:38AM -0500, Josh Poimboeuf wrote: > On Mon, May 05, 2014 at 10:55:37AM +0200, Ingo Molnar wrote: > > > > * Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > > > > [...] > > > > > > kpatch checks the backtraces of all tasks in stop_machine() to > > > ensure that no instances of the old function are running when the > > > new function is applied. I think the biggest downside of this > > > approach is that stop_machine() has to idle all other CPUs during > > > the patching process, so it inserts a small amount of latency (a few > > > ms on an idle system). > > > > When live patching the kernel, how about achieving an even 'cleaner' > > state for all tasks in the system: to freeze all tasks, as the suspend > > and hibernation code (and kexec) does, via freeze_processes()? > > > > That means no tasks in the system have any real kernel execution > > state, and there's also no problem with long-sleeping tasks, as > > freeze_processes() is supposed to be fast as well. > > > > I.e. go for the most conservative live patching state first, and relax > > it only once the initial model is upstream and is working robustly. > > I had considered doing this before, but the problem I found is that many > kernel threads are unfreezable. So we wouldn't be able to check whether > its safe to replace any functions in use by those kernel threads. OTOH many kernel threads are parkable. Which achieves kind of similar desired behaviour: the kernel threads then aren't running. And in fact we could implement freezing on top of park for kthreads. But unfortunately there are still quite some of them which don't support parking. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-05 14:10 ` Frederic Weisbecker @ 2014-05-05 18:43 ` Ingo Molnar 2014-05-05 21:49 ` Frederic Weisbecker 2014-05-06 11:45 ` Masami Hiramatsu 0 siblings, 2 replies; 60+ messages in thread From: Ingo Molnar @ 2014-05-05 18:43 UTC (permalink / raw) To: Frederic Weisbecker Cc: Josh Poimboeuf, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner * Frederic Weisbecker <fweisbec@gmail.com> wrote: > On Mon, May 05, 2014 at 08:26:38AM -0500, Josh Poimboeuf wrote: > > On Mon, May 05, 2014 at 10:55:37AM +0200, Ingo Molnar wrote: > > > > > > * Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > > > > > > [...] > > > > > > > > kpatch checks the backtraces of all tasks in stop_machine() to > > > > ensure that no instances of the old function are running when the > > > > new function is applied. I think the biggest downside of this > > > > approach is that stop_machine() has to idle all other CPUs during > > > > the patching process, so it inserts a small amount of latency (a few > > > > ms on an idle system). > > > > > > When live patching the kernel, how about achieving an even 'cleaner' > > > state for all tasks in the system: to freeze all tasks, as the suspend > > > and hibernation code (and kexec) does, via freeze_processes()? > > > > > > That means no tasks in the system have any real kernel execution > > > state, and there's also no problem with long-sleeping tasks, as > > > freeze_processes() is supposed to be fast as well. > > > > > > I.e. go for the most conservative live patching state first, and relax > > > it only once the initial model is upstream and is working robustly. > > > > I had considered doing this before, but the problem I found is > > that many kernel threads are unfreezable. So we wouldn't be able > > to check whether its safe to replace any functions in use by those > > kernel threads. > > OTOH many kernel threads are parkable. Which achieves kind of > similar desired behaviour: the kernel threads then aren't running. > > And in fact we could implement freezing on top of park for kthreads. > > But unfortunately there are still quite some of them which don't > support parking. Well, if distros are moving towards live patching (and they are!), then it looks rather necessary to me that something scary as flipping out live kernel instructions with substantially different code should be as safe as possible, and only then fast. If a kernel refuses to patch with certain threads running, that will drive those kernel threads being fixed and such. It's a deterministic, recoverable, reportable bug situation, so fixing it should be fast. We learned these robustness lessons the hard way with kprobes and ftrace dynamic code patching... which are utterly simple compared to live kernel patching! Thanks, Ingo ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-05 18:43 ` Ingo Molnar @ 2014-05-05 21:49 ` Frederic Weisbecker 2014-05-06 12:12 ` Josh Poimboeuf 2014-05-06 11:45 ` Masami Hiramatsu 1 sibling, 1 reply; 60+ messages in thread From: Frederic Weisbecker @ 2014-05-05 21:49 UTC (permalink / raw) To: Ingo Molnar Cc: Josh Poimboeuf, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner On Mon, May 05, 2014 at 08:43:04PM +0200, Ingo Molnar wrote: > > * Frederic Weisbecker <fweisbec@gmail.com> wrote: > > > On Mon, May 05, 2014 at 08:26:38AM -0500, Josh Poimboeuf wrote: > > > On Mon, May 05, 2014 at 10:55:37AM +0200, Ingo Molnar wrote: > > > > > > > > * Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > > > > > > > > [...] > > > > > > > > > > kpatch checks the backtraces of all tasks in stop_machine() to > > > > > ensure that no instances of the old function are running when the > > > > > new function is applied. I think the biggest downside of this > > > > > approach is that stop_machine() has to idle all other CPUs during > > > > > the patching process, so it inserts a small amount of latency (a few > > > > > ms on an idle system). > > > > > > > > When live patching the kernel, how about achieving an even 'cleaner' > > > > state for all tasks in the system: to freeze all tasks, as the suspend > > > > and hibernation code (and kexec) does, via freeze_processes()? > > > > > > > > That means no tasks in the system have any real kernel execution > > > > state, and there's also no problem with long-sleeping tasks, as > > > > freeze_processes() is supposed to be fast as well. > > > > > > > > I.e. go for the most conservative live patching state first, and relax > > > > it only once the initial model is upstream and is working robustly. > > > > > > I had considered doing this before, but the problem I found is > > > that many kernel threads are unfreezable. So we wouldn't be able > > > to check whether its safe to replace any functions in use by those > > > kernel threads. > > > > OTOH many kernel threads are parkable. Which achieves kind of > > similar desired behaviour: the kernel threads then aren't running. > > > > And in fact we could implement freezing on top of park for kthreads. > > > > But unfortunately there are still quite some of them which don't > > support parking. > > Well, if distros are moving towards live patching (and they are!), > then it looks rather necessary to me that something scary as flipping > out live kernel instructions with substantially different code should > be as safe as possible, and only then fast. Sure I won't argue that. > > If a kernel refuses to patch with certain threads running, that will > drive those kernel threads being fixed and such. It's a deterministic, > recoverable, reportable bug situation, so fixing it should be fast. > > We learned these robustness lessons the hard way with kprobes and > ftrace dynamic code patching... which are utterly simple compared to > live kernel patching! Yeah, agreed. More rationale behind: we want to put the kthreads into semantic sleeps, not just random sleeping point. This way we lower the chances to execute new code messing up living state that is expecting old code after random preemption or sleeping points. But by semantic sleeps I mean more than just explicit calls to schedule() as opposed to preemption points. It also implies shutting down as well the living states handled by the kthread such that some sort of re-initialization of the state is also needed when the kthread gets back to run. And that's exactly what good implementations of kthread park provide. Consider kernel/watchdog.c as an example: when we park the lockup detector kthread, it disables the perf event and the hrtimer before it goes to actually park and sleep. When the kthread is later unparked, the kthread restarts the hrtimer and the perf event. If we live patch code that has obscure relations with perf or hrtimer here, we lower a lot the chances for a crash when the watchdog kthread is parked. So I'm in favour of parking all possible kthreads before live patching. Freezing alone doesn't provide the same state shutdown than parking. Now since parking looks more widely implemented than kthread freezing, we could even think about implementing kthread freezing using parking as backend. > > Thanks, > > Ingo ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-05 21:49 ` Frederic Weisbecker @ 2014-05-06 12:12 ` Josh Poimboeuf 2014-05-06 12:33 ` Steven Rostedt 2014-05-06 14:05 ` Frederic Weisbecker 0 siblings, 2 replies; 60+ messages in thread From: Josh Poimboeuf @ 2014-05-06 12:12 UTC (permalink / raw) To: Frederic Weisbecker Cc: Ingo Molnar, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner On Mon, May 05, 2014 at 11:49:23PM +0200, Frederic Weisbecker wrote: > On Mon, May 05, 2014 at 08:43:04PM +0200, Ingo Molnar wrote: > > If a kernel refuses to patch with certain threads running, that will > > drive those kernel threads being fixed and such. It's a deterministic, > > recoverable, reportable bug situation, so fixing it should be fast. > > > > We learned these robustness lessons the hard way with kprobes and > > ftrace dynamic code patching... which are utterly simple compared to > > live kernel patching! > > Yeah, agreed. More rationale behind: we want to put the kthreads into > semantic sleeps, not just random sleeping point. This way we lower the > chances to execute new code messing up living state that is expecting old > code after random preemption or sleeping points. > > But by semantic sleeps I mean more than just explicit calls to schedule() > as opposed to preemption points. > It also implies shutting down as well the living states handled by the kthread > such that some sort of re-initialization of the state is also needed when > the kthread gets back to run. > > And that's exactly what good implementations of kthread park provide. > > Consider kernel/watchdog.c as an example: when we park the lockup > detector kthread, it disables the perf event and the hrtimer before it goes > to actually park and sleep. When the kthread is later unparked, the kthread > restarts the hrtimer and the perf event. > > If we live patch code that has obscure relations with perf or hrtimer here, > we lower a lot the chances for a crash when the watchdog kthread is parked. > > So I'm in favour of parking all possible kthreads before live patching. Freezing > alone doesn't provide the same state shutdown than parking. > > Now since parking looks more widely implemented than kthread freezing, we could > even think about implementing kthread freezing using parking as backend. The vast majority of kernel threads on my system don't seem to know anything about parking or freezing. I see one kthread function which calls kthread_should_park(), which is smpboot_thread_fn(), used for ksoftirqd/*, migration/* and watchdog/*. But there are many other kthread functions which seem to be parking ignorant, including: cpu_idle_loop kthreadd rcu_gp_kthread worker_thread rescuer_thread devtmpfsd hub_thread kswapd ksm_scan_thread khugepaged fsnotify_mark_destroy scsi_error_handler kauditd_thread kjournald2 irq_thread rfcomm_run Maybe we could modify all these thread functions (and probably more) to be park and/or freezer capable. But really it wouldn't make much of a difference IMO. It would only protect careless users from a tiny percentage of all possible havoc that a careless user could create. Live patching is a very sensitive and risky operation, and from a kernel standpoint we should make it as safe as we reasonably can. But we can't do much about careless users. Ultimately the risk is in the hands of the user and their choice of patches. They need to absolutely understand all the implications of patching a particular function. If the patch changes the way a function interacts with some external data, then they're starting to tempt fate and they need to be extra careful. This care needs to be taken for *all* kernel functions, not just for the few that are called from kernel threads. Also, the top level kernel thread functions (like those listed above) will never be patchable anyway, because we never patch an in-use function (these functions are always in the threads' backtraces). This further diminishes the benefit of parking/freezing kernel threads. -- Josh ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-06 12:12 ` Josh Poimboeuf @ 2014-05-06 12:33 ` Steven Rostedt 2014-05-06 22:49 ` Masami Hiramatsu 2014-05-06 14:05 ` Frederic Weisbecker 1 sibling, 1 reply; 60+ messages in thread From: Steven Rostedt @ 2014-05-06 12:33 UTC (permalink / raw) To: Josh Poimboeuf Cc: Frederic Weisbecker, Ingo Molnar, Seth Jennings, Masami Hiramatsu, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner On Tue, 6 May 2014 07:12:11 -0500 Josh Poimboeuf <jpoimboe@redhat.com> wrote: > Live patching is a very sensitive and risky operation, and from a kernel > standpoint we should make it as safe as we reasonably can. But we can't > do much about careless users. Ultimately the risk is in the hands of > the user and their choice of patches. They need to absolutely > understand all the implications of patching a particular function. If > the patch changes the way a function interacts with some external data, > then they're starting to tempt fate and they need to be extra careful. > This care needs to be taken for *all* kernel functions, not just for the > few that are called from kernel threads. Ideally the kpatch tools should be able to somewhat prevent users from doing damage. Or at least make them type a sentence that says: I know what I'm doing and will not blame anyone but myself if this kills the system along with all my puppies and kittens. I'm guessing that kpatch needs to be marketed that a distro or "hired help" will be creating the patch and the admin only needs to "trust" the one that gave them the kpatch module to load. All the testing/checking that the module works will be done by kernel developers and not by any "users". -- Steve ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-06 12:33 ` Steven Rostedt @ 2014-05-06 22:49 ` Masami Hiramatsu 0 siblings, 0 replies; 60+ messages in thread From: Masami Hiramatsu @ 2014-05-06 22:49 UTC (permalink / raw) To: Steven Rostedt Cc: Josh Poimboeuf, Frederic Weisbecker, Ingo Molnar, Seth Jennings, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner (2014/05/06 21:33), Steven Rostedt wrote: > On Tue, 6 May 2014 07:12:11 -0500 > Josh Poimboeuf <jpoimboe@redhat.com> wrote: > >> Live patching is a very sensitive and risky operation, and from a kernel >> standpoint we should make it as safe as we reasonably can. But we can't >> do much about careless users. Ultimately the risk is in the hands of >> the user and their choice of patches. They need to absolutely >> understand all the implications of patching a particular function. If >> the patch changes the way a function interacts with some external data, >> then they're starting to tempt fate and they need to be extra careful. >> This care needs to be taken for *all* kernel functions, not just for the >> few that are called from kernel threads. > > Ideally the kpatch tools should be able to somewhat prevent users from > doing damage. Or at least make them type a sentence that says: > > I know what I'm doing and will not blame anyone but myself if this > kills the system along with all my puppies and kittens. > > I'm guessing that kpatch needs to be marketed that a distro or "hired > help" will be creating the patch and the admin only needs to "trust" > the one that gave them the kpatch module to load. All the > testing/checking that the module works will be done by kernel > developers and not by any "users". I strongly agreed. Desktop/Cloud users may not care about it, most of users just update kernel and reboot. Somewhat "enterprise" users may also like to do "rolling update" their cluster systems. Only limited mission critical non-stop systems require it. And since they usually don't want to loose distro support, all binary patches will be provided and tested by distro kernel developers, not by users itself. :) Thank you, -- Masami HIRAMATSU Software Platform Research Dept. Linux Technology Research Center Hitachi, Ltd., Yokohama Research Laboratory E-mail: masami.hiramatsu.pt@hitachi.com ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-06 12:12 ` Josh Poimboeuf 2014-05-06 12:33 ` Steven Rostedt @ 2014-05-06 14:05 ` Frederic Weisbecker 2014-05-06 14:50 ` Josh Poimboeuf 1 sibling, 1 reply; 60+ messages in thread From: Frederic Weisbecker @ 2014-05-06 14:05 UTC (permalink / raw) To: Josh Poimboeuf Cc: Ingo Molnar, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner On Tue, May 06, 2014 at 07:12:11AM -0500, Josh Poimboeuf wrote: > On Mon, May 05, 2014 at 11:49:23PM +0200, Frederic Weisbecker wrote: > > On Mon, May 05, 2014 at 08:43:04PM +0200, Ingo Molnar wrote: > > > If a kernel refuses to patch with certain threads running, that will > > > drive those kernel threads being fixed and such. It's a deterministic, > > > recoverable, reportable bug situation, so fixing it should be fast. > > > > > > We learned these robustness lessons the hard way with kprobes and > > > ftrace dynamic code patching... which are utterly simple compared to > > > live kernel patching! > > > > Yeah, agreed. More rationale behind: we want to put the kthreads into > > semantic sleeps, not just random sleeping point. This way we lower the > > chances to execute new code messing up living state that is expecting old > > code after random preemption or sleeping points. > > > > But by semantic sleeps I mean more than just explicit calls to schedule() > > as opposed to preemption points. > > It also implies shutting down as well the living states handled by the kthread > > such that some sort of re-initialization of the state is also needed when > > the kthread gets back to run. > > > > And that's exactly what good implementations of kthread park provide. > > > > Consider kernel/watchdog.c as an example: when we park the lockup > > detector kthread, it disables the perf event and the hrtimer before it goes > > to actually park and sleep. When the kthread is later unparked, the kthread > > restarts the hrtimer and the perf event. > > > > If we live patch code that has obscure relations with perf or hrtimer here, > > we lower a lot the chances for a crash when the watchdog kthread is parked. > > > > So I'm in favour of parking all possible kthreads before live patching. Freezing > > alone doesn't provide the same state shutdown than parking. > > > > Now since parking looks more widely implemented than kthread freezing, we could > > even think about implementing kthread freezing using parking as backend. > > The vast majority of kernel threads on my system don't seem to know > anything about parking or freezing. I see one kthread function which > calls kthread_should_park(), which is smpboot_thread_fn(), used for > ksoftirqd/*, migration/* and watchdog/*. But there are many other > kthread functions which seem to be parking ignorant, including: > > cpu_idle_loop > kthreadd > rcu_gp_kthread > worker_thread > rescuer_thread > devtmpfsd > hub_thread > kswapd > ksm_scan_thread > khugepaged > fsnotify_mark_destroy > scsi_error_handler > kauditd_thread > kjournald2 > irq_thread > rfcomm_run Yeah I now realize that only very few of them can park. Only infiniband, rcu, stop_machine and the watchdog... But that's still a good direction to take if we want to make the kernel step by step more robust against live patching. > > Maybe we could modify all these thread functions (and probably more) to > be park and/or freezer capable. But really it wouldn't make much of a > difference IMO. It would only protect careless users from a tiny > percentage of all possible havoc that a careless user could create. > > Live patching is a very sensitive and risky operation, and from a kernel > standpoint we should make it as safe as we reasonably can. But we can't > do much about careless users. Ultimately the risk is in the hands of > the user and their choice of patches. They need to absolutely > understand all the implications of patching a particular function. Kernel developers will be a tiny minority of users of live patching. Very few sysadmins know about kernel internals. You can't really ask them to judge if a patch is reasonably live patchable or not. It's possible to appreciate a patch wrt. its size, or the fact that it came through a stable tree, so it's at worst mildly invasive. The only thing that could make live patching safe is that a community eventually builds around it and carefully inspect patches in a stable release then provide a selection of safely live patchable pieces. Other than that, expect people to do crazy things. > If the patch changes the way a function interacts with some external data, > then they're starting to tempt fate and they need to be extra careful. > This care needs to be taken for *all* kernel functions, not just for the > few that are called from kernel threads. It's actually very hard to tell if a given function isn't called by any kernel thread. > > Also, the top level kernel thread functions (like those listed above) > will never be patchable anyway, because we never patch an in-use > function (these functions are always in the threads' backtraces). This > further diminishes the benefit of parking/freezing kernel threads. You're right to do some basic prevention. It's the bare minimum. But keep in mind it's a tiny protection if you consider the top level function is only a small part of what is called by a kernel thread. And there is still a risk that the patched part is incompatible with previously executed code. For example when a patched function depends on a patched initialization but we executed the old version of the initialization already and we are going to execute the new patched function. It's actually a common pattern. And careful implementation of kthreads parking can help. But this is something that we can do step by step. Ah this reminds me when we chased kprobes dangerous spots and we tried to declare __kprobes the functions which were too dangerous to hot patch. We eventually gave up because it was impossible to fix everything. And that was only for kprobes! So you can never tell if a given patch will impact a given kthread. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-06 14:05 ` Frederic Weisbecker @ 2014-05-06 14:50 ` Josh Poimboeuf 2014-05-07 12:24 ` Ingo Molnar 0 siblings, 1 reply; 60+ messages in thread From: Josh Poimboeuf @ 2014-05-06 14:50 UTC (permalink / raw) To: Frederic Weisbecker Cc: Ingo Molnar, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner On Tue, May 06, 2014 at 04:05:21PM +0200, Frederic Weisbecker wrote: > On Tue, May 06, 2014 at 07:12:11AM -0500, Josh Poimboeuf wrote: > > On Mon, May 05, 2014 at 11:49:23PM +0200, Frederic Weisbecker wrote: > > > On Mon, May 05, 2014 at 08:43:04PM +0200, Ingo Molnar wrote: > > > > If a kernel refuses to patch with certain threads running, that will > > > > drive those kernel threads being fixed and such. It's a deterministic, > > > > recoverable, reportable bug situation, so fixing it should be fast. > > > > > > > > We learned these robustness lessons the hard way with kprobes and > > > > ftrace dynamic code patching... which are utterly simple compared to > > > > live kernel patching! > > > > > > Yeah, agreed. More rationale behind: we want to put the kthreads into > > > semantic sleeps, not just random sleeping point. This way we lower the > > > chances to execute new code messing up living state that is expecting old > > > code after random preemption or sleeping points. > > > > > > But by semantic sleeps I mean more than just explicit calls to schedule() > > > as opposed to preemption points. > > > It also implies shutting down as well the living states handled by the kthread > > > such that some sort of re-initialization of the state is also needed when > > > the kthread gets back to run. > > > > > > And that's exactly what good implementations of kthread park provide. > > > > > > Consider kernel/watchdog.c as an example: when we park the lockup > > > detector kthread, it disables the perf event and the hrtimer before it goes > > > to actually park and sleep. When the kthread is later unparked, the kthread > > > restarts the hrtimer and the perf event. > > > > > > If we live patch code that has obscure relations with perf or hrtimer here, > > > we lower a lot the chances for a crash when the watchdog kthread is parked. > > > > > > So I'm in favour of parking all possible kthreads before live patching. Freezing > > > alone doesn't provide the same state shutdown than parking. > > > > > > Now since parking looks more widely implemented than kthread freezing, we could > > > even think about implementing kthread freezing using parking as backend. > > > > The vast majority of kernel threads on my system don't seem to know > > anything about parking or freezing. I see one kthread function which > > calls kthread_should_park(), which is smpboot_thread_fn(), used for > > ksoftirqd/*, migration/* and watchdog/*. But there are many other > > kthread functions which seem to be parking ignorant, including: > > > > cpu_idle_loop > > kthreadd > > rcu_gp_kthread > > worker_thread > > rescuer_thread > > devtmpfsd > > hub_thread > > kswapd > > ksm_scan_thread > > khugepaged > > fsnotify_mark_destroy > > scsi_error_handler > > kauditd_thread > > kjournald2 > > irq_thread > > rfcomm_run > > Yeah I now realize that only very few of them can park. Only infiniband, rcu, > stop_machine and the watchdog... > > But that's still a good direction to take if we want to make the kernel > step by step more robust against live patching. > > > > > Maybe we could modify all these thread functions (and probably more) to > > be park and/or freezer capable. But really it wouldn't make much of a > > difference IMO. It would only protect careless users from a tiny > > percentage of all possible havoc that a careless user could create. > > > > Live patching is a very sensitive and risky operation, and from a kernel > > standpoint we should make it as safe as we reasonably can. But we can't > > do much about careless users. Ultimately the risk is in the hands of > > the user and their choice of patches. They need to absolutely > > understand all the implications of patching a particular function. > > Kernel developers will be a tiny minority of users of live patching. > > Very few sysadmins know about kernel internals. You can't really ask them > to judge if a patch is reasonably live patchable or not. > > It's possible to appreciate a patch wrt. its size, or the fact that it came > through a stable tree, so it's at worst mildly invasive. > > The only thing that could make live patching safe is that a community > eventually builds around it and carefully inspect patches in a stable release > then provide a selection of safely live patchable pieces. I agree, and this is exactly the approach we're taking. It's not possible for the tools or the kernel to determine whether a patch is safe. We don't expect a sysadmin to make that determination either. We have to rely on kernel experts to analyze the patches, test them, and then deliver them (or the binary patch modules) to the sysadmin. > Other than that, expect people to do crazy things. Yes. This is why we taint the kernel with TAINT_KPATCH. > > If the patch changes the way a function interacts with some external data, > > then they're starting to tempt fate and they need to be extra careful. > > This care needs to be taken for *all* kernel functions, not just for the > > few that are called from kernel threads. > > It's actually very hard to tell if a given function isn't called by any > kernel thread. I agree. The point I was trying to make is that the user needs to be very careful with analyzing the patch, regardless of whether the function is called by a kernel thread. > > Also, the top level kernel thread functions (like those listed above) > > will never be patchable anyway, because we never patch an in-use > > function (these functions are always in the threads' backtraces). This > > further diminishes the benefit of parking/freezing kernel threads. > > You're right to do some basic prevention. It's the bare minimum. > > But keep in mind it's a tiny protection if you consider the top level > function is only a small part of what is called by a kernel thread. And > there is still a risk that the patched part is incompatible with previously > executed code. For example when a patched function depends on a patched > initialization but we executed the old version of the initialization already > and we are going to execute the new patched function. It's actually a common > pattern. And careful implementation of kthreads parking can help. But this is > something that we can do step by step. We prevent many situations like this by disallowing the user from patching __init functions when building the patch module binary. > Ah this reminds me when we chased kprobes dangerous spots and we tried to > declare __kprobes the functions which were too dangerous to hot patch. > > We eventually gave up because it was impossible to fix everything. And that > was only for kprobes! > > So you can never tell if a given patch will impact a given kthread. If the user (or the person creating the patch for them) doesn't understand all impacts of the patch, they have no business patching their kernel with it. -- Josh ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-06 14:50 ` Josh Poimboeuf @ 2014-05-07 12:24 ` Ingo Molnar 2014-05-07 15:41 ` Josh Poimboeuf 0 siblings, 1 reply; 60+ messages in thread From: Ingo Molnar @ 2014-05-07 12:24 UTC (permalink / raw) To: Josh Poimboeuf Cc: Frederic Weisbecker, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner * Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > Ah this reminds me when we chased kprobes dangerous spots and we > > tried to declare __kprobes the functions which were too dangerous > > to hot patch. > > > > We eventually gave up because it was impossible to fix everything. > > And that was only for kprobes! > > > > So you can never tell if a given patch will impact a given > > kthread. > > If the user (or the person creating the patch for them) doesn't > understand all impacts of the patch, they have no business patching > their kernel with it. I think what is being somewhat lost is this discussion is the distinction between: 1) is the patch safe 2) is the _live patching_ safe It's really two different things. We should absolutely strive for live patching to be safe under all circumstances, as long as the patch being fed to it is safe in itself when building a new kernel the old fashioned way. I.e. it's natural that a kernel can be messed up via a patch, but this subsystem should absolutely make sure that it will safely reject totally fine patches that are unsafe to live patch. Thanks, Ingo ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-07 12:24 ` Ingo Molnar @ 2014-05-07 15:41 ` Josh Poimboeuf 2014-05-07 15:57 ` Ingo Molnar 0 siblings, 1 reply; 60+ messages in thread From: Josh Poimboeuf @ 2014-05-07 15:41 UTC (permalink / raw) To: Ingo Molnar Cc: Frederic Weisbecker, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner On Wed, May 07, 2014 at 02:24:44PM +0200, Ingo Molnar wrote: > > * Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > > > Ah this reminds me when we chased kprobes dangerous spots and we > > > tried to declare __kprobes the functions which were too dangerous > > > to hot patch. > > > > > > We eventually gave up because it was impossible to fix everything. > > > And that was only for kprobes! > > > > > > So you can never tell if a given patch will impact a given > > > kthread. > > > > If the user (or the person creating the patch for them) doesn't > > understand all impacts of the patch, they have no business patching > > their kernel with it. > > I think what is being somewhat lost is this discussion is the > distinction between: > > 1) is the patch safe > 2) is the _live patching_ safe > > It's really two different things. We should absolutely strive for live > patching to be safe under all circumstances, as long as the patch > being fed to it is safe in itself when building a new kernel the old > fashioned way. > > I.e. it's natural that a kernel can be messed up via a patch, but this > subsystem should absolutely make sure that it will safely reject > totally fine patches that are unsafe to live patch. Thanks, that's a very succinct way to put it. They are indeed two different things, but at the same time they're interrelated: determining whether a patch is safe requires making assumptions about how it will be applied. Here's how kpatch draws the lines: 1) Is the patch safe? Determined by the user (and partially enforced by the kpatch-build tools). The user can assume that the old function(s) will not be in use by any task at the time of replacement, so there's no risk of unexpected interactions between the old and the new. There is no guarantee that all tasks will be frozen. 2) Is the live patching safe? Determined by the kernel. The kernel only needs to ensure that the old function(s) are not in use. We do this with stop_machine() and backtrace checks of all tasks. It's a very simple contract between user and kernel. I think the proposal was that we change this contract such that the user can additionally assume that all tasks will be frozen. I could be missing something but I don't see a real benefit from it. -- Josh ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-07 15:41 ` Josh Poimboeuf @ 2014-05-07 15:57 ` Ingo Molnar 2014-05-07 16:43 ` Josh Poimboeuf 2014-05-07 22:56 ` David Lang 0 siblings, 2 replies; 60+ messages in thread From: Ingo Molnar @ 2014-05-07 15:57 UTC (permalink / raw) To: Josh Poimboeuf Cc: Frederic Weisbecker, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner * Josh Poimboeuf <jpoimboe@redhat.com> wrote: > On Wed, May 07, 2014 at 02:24:44PM +0200, Ingo Molnar wrote: > > > > * Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > > > > > Ah this reminds me when we chased kprobes dangerous spots and we > > > > tried to declare __kprobes the functions which were too dangerous > > > > to hot patch. > > > > > > > > We eventually gave up because it was impossible to fix everything. > > > > And that was only for kprobes! > > > > > > > > So you can never tell if a given patch will impact a given > > > > kthread. > > > > > > If the user (or the person creating the patch for them) doesn't > > > understand all impacts of the patch, they have no business patching > > > their kernel with it. > > > > I think what is being somewhat lost is this discussion is the > > distinction between: > > > > 1) is the patch safe > > 2) is the _live patching_ safe > > > > It's really two different things. We should absolutely strive for live > > patching to be safe under all circumstances, as long as the patch > > being fed to it is safe in itself when building a new kernel the old > > fashioned way. > > > > I.e. it's natural that a kernel can be messed up via a patch, but this > > subsystem should absolutely make sure that it will safely reject > > totally fine patches that are unsafe to live patch. > > Thanks, that's a very succinct way to put it. They are indeed two > different things, but at the same time they're interrelated: determining > whether a patch is safe requires making assumptions about how it will be > applied. No! A patch to the kernel source is 'safe' if it results in a correctly patched kernel source. Full stop! Live patching does not enter into this question, ever. The correctness of a patch to the source does not depend on 'live patching' considerations in any way, shape or form. Any mechanism that tries to blur these lines is broken by design. My claim is that if a patch is correct/safe in the old fashioned way, then a fundamental principle is that a live patching subsystem must either safely apply, or safely reject the live patching attempt, independently from any user input. It's similar to how kprobes (or ftrace) will safely reject or perform a live patching of the kernel. So for example, there's this recent upstream kernel fix: 3ca9e5d36afb agp: info leak in agpioc_info_wrap() which fixes an information leak. The 'patch' is Git commit 3ca9e5d36afb (i.e. it patches a very specific incoming kernel source tree that results in a specific outgoing source tree), and we know it's safe and correct. Any live patching subsystem must make sure that if this patch is live-patched, that this attempt is either rejected safely or performed safely. "We think/hope it won't blow up in most cases and we automated some checks halfways" or "the user must know what he is doing" is really not something that I think is a good concept for something as fragile as live patching. Thanks, Ingo ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-07 15:57 ` Ingo Molnar @ 2014-05-07 16:43 ` Josh Poimboeuf 2014-05-07 22:56 ` David Lang 1 sibling, 0 replies; 60+ messages in thread From: Josh Poimboeuf @ 2014-05-07 16:43 UTC (permalink / raw) To: Ingo Molnar Cc: Frederic Weisbecker, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner On Wed, May 07, 2014 at 05:57:54PM +0200, Ingo Molnar wrote: > Live patching does not enter into this question, ever. The correctness > of a patch to the source does not depend on 'live patching' > considerations in any way, shape or form. > > Any mechanism that tries to blur these lines is broken by design. > > My claim is that if a patch is correct/safe in the old fashioned way, > then a fundamental principle is that a live patching subsystem must > either safely apply, or safely reject the live patching attempt, > independently from any user input. That's a valiant goal, but it's not going to happen unless you want to rewrite Linux in Haskell. It's just not possible for a program to prove that a patch is safe to apply to a running kernel. There are way too many subtle interactions with dynamically allocated data between functions. I think the only way to achieve that is with CRIU, but it still requires a kexec or a reboot, so you lose all kernel state and it's much more disruptive. > "We think/hope it won't blow up in most cases and we automated some > checks halfways" or "the user must know what he is doing" is really > not something that I think is a good concept for something as fragile > as live patching. This is a distro tool, not a general purpose one. If distros are careful with their patch selection, it won't blow up. It's a valuable way for distros to help out sysadmins who need a hot security fix but can't reboot immediately. -- Josh ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-07 15:57 ` Ingo Molnar 2014-05-07 16:43 ` Josh Poimboeuf @ 2014-05-07 22:56 ` David Lang 2014-05-08 6:12 ` Ingo Molnar 1 sibling, 1 reply; 60+ messages in thread From: David Lang @ 2014-05-07 22:56 UTC (permalink / raw) To: Ingo Molnar Cc: Josh Poimboeuf, Frederic Weisbecker, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner On Wed, 7 May 2014, Ingo Molnar wrote: > * Josh Poimboeuf <jpoimboe@redhat.com> wrote: > >> On Wed, May 07, 2014 at 02:24:44PM +0200, Ingo Molnar wrote: >>> >>> * Josh Poimboeuf <jpoimboe@redhat.com> wrote: >>> >>>>> Ah this reminds me when we chased kprobes dangerous spots and we >>>>> tried to declare __kprobes the functions which were too dangerous >>>>> to hot patch. >>>>> >>>>> We eventually gave up because it was impossible to fix everything. >>>>> And that was only for kprobes! >>>>> >>>>> So you can never tell if a given patch will impact a given >>>>> kthread. >>>> >>>> If the user (or the person creating the patch for them) doesn't >>>> understand all impacts of the patch, they have no business patching >>>> their kernel with it. >>> >>> I think what is being somewhat lost is this discussion is the >>> distinction between: >>> >>> 1) is the patch safe >>> 2) is the _live patching_ safe >>> >>> It's really two different things. We should absolutely strive for live >>> patching to be safe under all circumstances, as long as the patch >>> being fed to it is safe in itself when building a new kernel the old >>> fashioned way. >>> >>> I.e. it's natural that a kernel can be messed up via a patch, but this >>> subsystem should absolutely make sure that it will safely reject >>> totally fine patches that are unsafe to live patch. >> >> Thanks, that's a very succinct way to put it. They are indeed two >> different things, but at the same time they're interrelated: determining >> whether a patch is safe requires making assumptions about how it will be >> applied. > > No! > > A patch to the kernel source is 'safe' if it results in a correctly > patched kernel source. Full stop! > > Live patching does not enter into this question, ever. The correctness > of a patch to the source does not depend on 'live patching' > considerations in any way, shape or form. > > Any mechanism that tries to blur these lines is broken by design. > > My claim is that if a patch is correct/safe in the old fashioned way, > then a fundamental principle is that a live patching subsystem must > either safely apply, or safely reject the live patching attempt, > independently from any user input. > > It's similar to how kprobes (or ftrace) will safely reject or perform > a live patching of the kernel. > > So for example, there's this recent upstream kernel fix: > > 3ca9e5d36afb agp: info leak in agpioc_info_wrap() > > which fixes an information leak. The 'patch' is Git commit > 3ca9e5d36afb (i.e. it patches a very specific incoming kernel source > tree that results in a specific outgoing source tree), and we know > it's safe and correct. > > Any live patching subsystem must make sure that if this patch is > live-patched, that this attempt is either rejected safely or performed > safely. > > "We think/hope it won't blow up in most cases and we automated some > checks halfways" or "the user must know what he is doing" is really > not something that I think is a good concept for something as fragile > as live patching. In that case you will have to reject any kernel patch that changes any memory structure, because it's impossible as a general rule to say that changing memory structures is going to be safe (or even possible) to change. that includes any access to memory that moves around a lock David Lang ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-07 22:56 ` David Lang @ 2014-05-08 6:12 ` Ingo Molnar 2014-05-08 6:50 ` David Lang 0 siblings, 1 reply; 60+ messages in thread From: Ingo Molnar @ 2014-05-08 6:12 UTC (permalink / raw) To: David Lang Cc: Josh Poimboeuf, Frederic Weisbecker, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner * David Lang <david@lang.hm> wrote: > On Wed, 7 May 2014, Ingo Molnar wrote: > > >* Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > > >>On Wed, May 07, 2014 at 02:24:44PM +0200, Ingo Molnar wrote: > >>> > >>>* Josh Poimboeuf <jpoimboe@redhat.com> wrote: > >>> > >>>>>Ah this reminds me when we chased kprobes dangerous spots and we > >>>>>tried to declare __kprobes the functions which were too dangerous > >>>>>to hot patch. > >>>>> > >>>>>We eventually gave up because it was impossible to fix everything. > >>>>>And that was only for kprobes! > >>>>> > >>>>>So you can never tell if a given patch will impact a given > >>>>>kthread. > >>>> > >>>>If the user (or the person creating the patch for them) doesn't > >>>>understand all impacts of the patch, they have no business patching > >>>>their kernel with it. > >>> > >>>I think what is being somewhat lost is this discussion is the > >>>distinction between: > >>> > >>> 1) is the patch safe > >>> 2) is the _live patching_ safe > >>> > >>>It's really two different things. We should absolutely strive for live > >>>patching to be safe under all circumstances, as long as the patch > >>>being fed to it is safe in itself when building a new kernel the old > >>>fashioned way. > >>> > >>>I.e. it's natural that a kernel can be messed up via a patch, but this > >>>subsystem should absolutely make sure that it will safely reject > >>>totally fine patches that are unsafe to live patch. > >> > >>Thanks, that's a very succinct way to put it. They are indeed two > >>different things, but at the same time they're interrelated: determining > >>whether a patch is safe requires making assumptions about how it will be > >>applied. > > > >No! > > > >A patch to the kernel source is 'safe' if it results in a correctly > >patched kernel source. Full stop! > > > >Live patching does not enter into this question, ever. The correctness > >of a patch to the source does not depend on 'live patching' > >considerations in any way, shape or form. > > > >Any mechanism that tries to blur these lines is broken by design. > > > >My claim is that if a patch is correct/safe in the old fashioned way, > >then a fundamental principle is that a live patching subsystem must > >either safely apply, or safely reject the live patching attempt, > >independently from any user input. > > > >It's similar to how kprobes (or ftrace) will safely reject or perform > >a live patching of the kernel. > > > >So for example, there's this recent upstream kernel fix: > > > > 3ca9e5d36afb agp: info leak in agpioc_info_wrap() > > > >which fixes an information leak. The 'patch' is Git commit > >3ca9e5d36afb (i.e. it patches a very specific incoming kernel source > >tree that results in a specific outgoing source tree), and we know > >it's safe and correct. > > > >Any live patching subsystem must make sure that if this patch is > >live-patched, that this attempt is either rejected safely or performed > >safely. > > > >"We think/hope it won't blow up in most cases and we automated some > >checks halfways" or "the user must know what he is doing" is really > >not something that I think is a good concept for something as fragile > >as live patching. > > In that case you will have to reject any kernel patch that changes > any memory structure, because it's impossible as a general rule to > say that changing memory structures is going to be safe (or even > possible) to change. > > that includes any access to memory that moves around a lock Initially restricting it to such patches would be a good beginning - most of the security fixes are just failed checks, i.e. they don't typically even change any external (not on stack) memory structure, right? Thanks, Ingo ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-08 6:12 ` Ingo Molnar @ 2014-05-08 6:50 ` David Lang 2014-05-08 7:08 ` Ingo Molnar 0 siblings, 1 reply; 60+ messages in thread From: David Lang @ 2014-05-08 6:50 UTC (permalink / raw) To: Ingo Molnar Cc: Josh Poimboeuf, Frederic Weisbecker, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner On Thu, 8 May 2014, Ingo Molnar wrote: >>> >>> No! >>> >>> A patch to the kernel source is 'safe' if it results in a correctly >>> patched kernel source. Full stop! >>> >>> Live patching does not enter into this question, ever. The correctness >>> of a patch to the source does not depend on 'live patching' >>> considerations in any way, shape or form. >>> >>> Any mechanism that tries to blur these lines is broken by design. >>> >>> My claim is that if a patch is correct/safe in the old fashioned way, >>> then a fundamental principle is that a live patching subsystem must >>> either safely apply, or safely reject the live patching attempt, >>> independently from any user input. >>> >>> It's similar to how kprobes (or ftrace) will safely reject or perform >>> a live patching of the kernel. >>> >>> So for example, there's this recent upstream kernel fix: >>> >>> 3ca9e5d36afb agp: info leak in agpioc_info_wrap() >>> >>> which fixes an information leak. The 'patch' is Git commit >>> 3ca9e5d36afb (i.e. it patches a very specific incoming kernel source >>> tree that results in a specific outgoing source tree), and we know >>> it's safe and correct. >>> >>> Any live patching subsystem must make sure that if this patch is >>> live-patched, that this attempt is either rejected safely or performed >>> safely. >>> >>> "We think/hope it won't blow up in most cases and we automated some >>> checks halfways" or "the user must know what he is doing" is really >>> not something that I think is a good concept for something as fragile >>> as live patching. >> >> In that case you will have to reject any kernel patch that changes >> any memory structure, because it's impossible as a general rule to >> say that changing memory structures is going to be safe (or even >> possible) to change. >> >> that includes any access to memory that moves around a lock > > Initially restricting it to such patches would be a good beginning - > most of the security fixes are just failed checks, i.e. they don't > typically even change any external (not on stack) memory structure, > right? in terms of hit-patching kernels you are correct. but that's a far cry from what it sounded like you were demanding (that it must handle any kernel patch) David Lang ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-08 6:50 ` David Lang @ 2014-05-08 7:08 ` Ingo Molnar 2014-05-08 7:29 ` Masami Hiramatsu ` (2 more replies) 0 siblings, 3 replies; 60+ messages in thread From: Ingo Molnar @ 2014-05-08 7:08 UTC (permalink / raw) To: David Lang Cc: Josh Poimboeuf, Frederic Weisbecker, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner * David Lang <david@lang.hm> wrote: > On Thu, 8 May 2014, Ingo Molnar wrote: > > >>> > >>>No! > >>> > >>>A patch to the kernel source is 'safe' if it results in a correctly > >>>patched kernel source. Full stop! > >>> > >>>Live patching does not enter into this question, ever. The correctness > >>>of a patch to the source does not depend on 'live patching' > >>>considerations in any way, shape or form. > >>> > >>>Any mechanism that tries to blur these lines is broken by design. > >>> > >>>My claim is that if a patch is correct/safe in the old fashioned way, > >>>then a fundamental principle is that a live patching subsystem must > >>>either safely apply, or safely reject the live patching attempt, > >>>independently from any user input. > >>> > >>>It's similar to how kprobes (or ftrace) will safely reject or perform > >>>a live patching of the kernel. > >>> > >>>So for example, there's this recent upstream kernel fix: > >>> > >>>3ca9e5d36afb agp: info leak in agpioc_info_wrap() > >>> > >>>which fixes an information leak. The 'patch' is Git commit > >>>3ca9e5d36afb (i.e. it patches a very specific incoming kernel source > >>>tree that results in a specific outgoing source tree), and we know > >>>it's safe and correct. > >>> > >>>Any live patching subsystem must make sure that if this patch is > >>>live-patched, that this attempt is either rejected safely or performed > >>>safely. > >>> > >>>"We think/hope it won't blow up in most cases and we automated some > >>>checks halfways" or "the user must know what he is doing" is really > >>>not something that I think is a good concept for something as fragile > >>>as live patching. > >> > >>In that case you will have to reject any kernel patch that changes > >>any memory structure, because it's impossible as a general rule to > >>say that changing memory structures is going to be safe (or even > >>possible) to change. > >> > >>that includes any access to memory that moves around a lock > > > >Initially restricting it to such patches would be a good beginning - > >most of the security fixes are just failed checks, i.e. they don't > >typically even change any external (not on stack) memory structure, > >right? > > in terms of hit-patching kernels you are correct. > > but that's a far cry from what it sounded like you were demanding > (that it must handle any kernel patch) No, I was not demanding that at all, my suggestion was: > My claim is that if a patch is correct/safe in the old fashioned > way, then a fundamental principle is that a live patching > subsystem must either safely apply, or safely reject the live > patching attempt, independently from any user input. Note the 'if'. It could start simple and stupid, and only allow cases where we know the patch must be trivially safe (because it does not do much in terms of disturbing globally visible state). That needs some tooling help, but apparently tooling help is in place already. And then we can complicate it from there - but have a reasonably robust starting point that we _know_ works (as long as the implementation is correct). Thanks, Ingo ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-08 7:08 ` Ingo Molnar @ 2014-05-08 7:29 ` Masami Hiramatsu 2014-05-08 12:48 ` Josh Poimboeuf 2014-06-14 20:31 ` Pavel Machek 2 siblings, 0 replies; 60+ messages in thread From: Masami Hiramatsu @ 2014-05-08 7:29 UTC (permalink / raw) To: Ingo Molnar Cc: David Lang, Josh Poimboeuf, Frederic Weisbecker, Seth Jennings, Steven Rostedt, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner (2014/05/08 16:08), Ingo Molnar wrote: > > * David Lang <david@lang.hm> wrote: > >> On Thu, 8 May 2014, Ingo Molnar wrote: >> >>>>> >>>>> No! >>>>> >>>>> A patch to the kernel source is 'safe' if it results in a correctly >>>>> patched kernel source. Full stop! >>>>> >>>>> Live patching does not enter into this question, ever. The correctness >>>>> of a patch to the source does not depend on 'live patching' >>>>> considerations in any way, shape or form. >>>>> >>>>> Any mechanism that tries to blur these lines is broken by design. >>>>> >>>>> My claim is that if a patch is correct/safe in the old fashioned way, >>>>> then a fundamental principle is that a live patching subsystem must >>>>> either safely apply, or safely reject the live patching attempt, >>>>> independently from any user input. >>>>> >>>>> It's similar to how kprobes (or ftrace) will safely reject or perform >>>>> a live patching of the kernel. >>>>> >>>>> So for example, there's this recent upstream kernel fix: >>>>> >>>>> 3ca9e5d36afb agp: info leak in agpioc_info_wrap() >>>>> >>>>> which fixes an information leak. The 'patch' is Git commit >>>>> 3ca9e5d36afb (i.e. it patches a very specific incoming kernel source >>>>> tree that results in a specific outgoing source tree), and we know >>>>> it's safe and correct. >>>>> >>>>> Any live patching subsystem must make sure that if this patch is >>>>> live-patched, that this attempt is either rejected safely or performed >>>>> safely. >>>>> >>>>> "We think/hope it won't blow up in most cases and we automated some >>>>> checks halfways" or "the user must know what he is doing" is really >>>>> not something that I think is a good concept for something as fragile >>>>> as live patching. >>>> >>>> In that case you will have to reject any kernel patch that changes >>>> any memory structure, because it's impossible as a general rule to >>>> say that changing memory structures is going to be safe (or even >>>> possible) to change. >>>> >>>> that includes any access to memory that moves around a lock >>> >>> Initially restricting it to such patches would be a good beginning - >>> most of the security fixes are just failed checks, i.e. they don't >>> typically even change any external (not on stack) memory structure, >>> right? >> >> in terms of hit-patching kernels you are correct. >> >> but that's a far cry from what it sounded like you were demanding >> (that it must handle any kernel patch) > > No, I was not demanding that at all, my suggestion was: > > > My claim is that if a patch is correct/safe in the old fashioned > > way, then a fundamental principle is that a live patching > > subsystem must either safely apply, or safely reject the live > > patching attempt, independently from any user input. What would you mean "safely apply" and "safely reject"? Current kpatch checks the target functions are in use and reject safely if it is in use. > Note the 'if'. It could start simple and stupid, and only allow cases > where we know the patch must be trivially safe (because it does not do > much in terms of disturbing globally visible state). That needs some > tooling help, but apparently tooling help is in place already. Hm, it is possible to check the binary patch itself doesn't refer the global memory, but it's hard to find it derefers some address of global memory. Also, there can be a case that the function caller passes the address of a global memory, and the binary patch just calls another function with that address. This can change the global state (but correct way). > And then we can complicate it from there - but have a reasonably > robust starting point that we _know_ works (as long as the > implementation is correct). I agreed if you mean the starting with a simple "tool" and complicate it. I doubt kernel module can check the patch logic.... Thank you, -- Masami HIRAMATSU Software Platform Research Dept. Linux Technology Research Center Hitachi, Ltd., Yokohama Research Laboratory E-mail: masami.hiramatsu.pt@hitachi.com ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-08 7:08 ` Ingo Molnar 2014-05-08 7:29 ` Masami Hiramatsu @ 2014-05-08 12:48 ` Josh Poimboeuf 2014-05-09 6:21 ` Masami Hiramatsu 2014-06-14 20:31 ` Pavel Machek 2 siblings, 1 reply; 60+ messages in thread From: Josh Poimboeuf @ 2014-05-08 12:48 UTC (permalink / raw) To: Ingo Molnar Cc: David Lang, Frederic Weisbecker, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner On Thu, May 08, 2014 at 09:08:15AM +0200, Ingo Molnar wrote: > > * David Lang <david@lang.hm> wrote: > > > On Thu, 8 May 2014, Ingo Molnar wrote: > > > > >>> > > >>>No! > > >>> > > >>>A patch to the kernel source is 'safe' if it results in a correctly > > >>>patched kernel source. Full stop! > > >>> > > >>>Live patching does not enter into this question, ever. The correctness > > >>>of a patch to the source does not depend on 'live patching' > > >>>considerations in any way, shape or form. > > >>> > > >>>Any mechanism that tries to blur these lines is broken by design. > > >>> > > >>>My claim is that if a patch is correct/safe in the old fashioned way, > > >>>then a fundamental principle is that a live patching subsystem must > > >>>either safely apply, or safely reject the live patching attempt, > > >>>independently from any user input. > > >>> > > >>>It's similar to how kprobes (or ftrace) will safely reject or perform > > >>>a live patching of the kernel. > > >>> > > >>>So for example, there's this recent upstream kernel fix: > > >>> > > >>>3ca9e5d36afb agp: info leak in agpioc_info_wrap() > > >>> > > >>>which fixes an information leak. The 'patch' is Git commit > > >>>3ca9e5d36afb (i.e. it patches a very specific incoming kernel source > > >>>tree that results in a specific outgoing source tree), and we know > > >>>it's safe and correct. > > >>> > > >>>Any live patching subsystem must make sure that if this patch is > > >>>live-patched, that this attempt is either rejected safely or performed > > >>>safely. > > >>> > > >>>"We think/hope it won't blow up in most cases and we automated some > > >>>checks halfways" or "the user must know what he is doing" is really > > >>>not something that I think is a good concept for something as fragile > > >>>as live patching. > > >> > > >>In that case you will have to reject any kernel patch that changes > > >>any memory structure, because it's impossible as a general rule to > > >>say that changing memory structures is going to be safe (or even > > >>possible) to change. > > >> > > >>that includes any access to memory that moves around a lock > > > > > >Initially restricting it to such patches would be a good beginning - > > >most of the security fixes are just failed checks, i.e. they don't > > >typically even change any external (not on stack) memory structure, > > >right? > > > > in terms of hit-patching kernels you are correct. > > > > but that's a far cry from what it sounded like you were demanding > > (that it must handle any kernel patch) > > No, I was not demanding that at all, my suggestion was: > > > My claim is that if a patch is correct/safe in the old fashioned > > way, then a fundamental principle is that a live patching > > subsystem must either safely apply, or safely reject the live > > patching attempt, independently from any user input. > > Note the 'if'. It could start simple and stupid, and only allow cases > where we know the patch must be trivially safe (because it does not do > much in terms of disturbing globally visible state). That needs some > tooling help, but apparently tooling help is in place already. > > And then we can complicate it from there - but have a reasonably > robust starting point that we _know_ works (as long as the > implementation is correct). I really wonder if detecting a "trivially safe" patch is even possible. Where do you draw the line with the following patches? - add a call to another function which modifies global data - add an early return or a goto which changes the way the function modifies (or no longer modifies) global data - touch a local stack variable which results in global data being modified later in the function - return a different value which causes the function's caller to modify data -- Josh ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-08 12:48 ` Josh Poimboeuf @ 2014-05-09 6:21 ` Masami Hiramatsu 0 siblings, 0 replies; 60+ messages in thread From: Masami Hiramatsu @ 2014-05-09 6:21 UTC (permalink / raw) To: Josh Poimboeuf Cc: Ingo Molnar, David Lang, Frederic Weisbecker, Seth Jennings, Steven Rostedt, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner (2014/05/08 21:48), Josh Poimboeuf wrote: >> No, I was not demanding that at all, my suggestion was: >> >> > My claim is that if a patch is correct/safe in the old fashioned >> > way, then a fundamental principle is that a live patching >> > subsystem must either safely apply, or safely reject the live >> > patching attempt, independently from any user input. >> >> Note the 'if'. It could start simple and stupid, and only allow cases >> where we know the patch must be trivially safe (because it does not do >> much in terms of disturbing globally visible state). That needs some >> tooling help, but apparently tooling help is in place already. >> >> And then we can complicate it from there - but have a reasonably >> robust starting point that we _know_ works (as long as the >> implementation is correct). > > I really wonder if detecting a "trivially safe" patch is even possible. > > Where do you draw the line with the following patches? > > - add a call to another function which modifies global data This depends on what global data and how. For example, the global data is used only from the replaced functions, it's a kind of local data. And also, the global data modification is as designed (e.g. acquiring/ releasing a spinlock), that is also safe. I think, the bad case is modifying shared global data to new state which unexpected by other data holders. > - add an early return or a goto which changes the way the function > modifies (or no longer modifies) global data Ditto, if it is unexpected at other parts, that will be unacceptable. > - touch a local stack variable which results in global data being > modified later in the function > > - return a different value which causes the function's caller to modify > data I think if the local variable or return value change is correctly handled by the caller (as expected), that is good too. Thank you, -- Masami HIRAMATSU Software Platform Research Dept. Linux Technology Research Center Hitachi, Ltd., Yokohama Research Laboratory E-mail: masami.hiramatsu.pt@hitachi.com ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-08 7:08 ` Ingo Molnar 2014-05-08 7:29 ` Masami Hiramatsu 2014-05-08 12:48 ` Josh Poimboeuf @ 2014-06-14 20:31 ` Pavel Machek 2014-06-15 6:57 ` Ingo Molnar 2 siblings, 1 reply; 60+ messages in thread From: Pavel Machek @ 2014-06-14 20:31 UTC (permalink / raw) To: Ingo Molnar Cc: David Lang, Josh Poimboeuf, Frederic Weisbecker, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner Hi! > > in terms of hit-patching kernels you are correct. > > > > but that's a far cry from what it sounded like you were demanding > > (that it must handle any kernel patch) > > No, I was not demanding that at all, my suggestion was: > > > My claim is that if a patch is correct/safe in the old fashioned > > way, then a fundamental principle is that a live patching > > subsystem must either safely apply, or safely reject the live > > patching attempt, independently from any user input. > > Note the 'if'. It could start simple and stupid, and only allow cases > where we know the patch must be trivially safe (because it does not do > much in terms of disturbing globally visible state). That needs some > tooling help, but apparently tooling help is in place already. Actually, even if patch is very trivial, it will be difficult to determine if it is safe. Consider adding error check: int do_something(void) { #if 0 if (1) return -1; #endif return 0; } void main(void) { if (do_something()) printf("error happened\n"); } Imagine changing that #if 0 to #if 1. But gcc at -O3 already optimized out the error message. So... do we compile whole second kernel and compare the binaries? I think I seen remark "don't try to do binary compares" somewhere... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-06-14 20:31 ` Pavel Machek @ 2014-06-15 6:57 ` Ingo Molnar 0 siblings, 0 replies; 60+ messages in thread From: Ingo Molnar @ 2014-06-15 6:57 UTC (permalink / raw) To: Pavel Machek Cc: David Lang, Josh Poimboeuf, Frederic Weisbecker, Seth Jennings, Masami Hiramatsu, Steven Rostedt, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner * Pavel Machek <pavel@ucw.cz> wrote: > Hi! > > > > in terms of hit-patching kernels you are correct. > > > > > > but that's a far cry from what it sounded like you were demanding > > > (that it must handle any kernel patch) > > > > No, I was not demanding that at all, my suggestion was: > > > > > My claim is that if a patch is correct/safe in the old fashioned > > > way, then a fundamental principle is that a live patching > > > subsystem must either safely apply, or safely reject the live > > > patching attempt, independently from any user input. > > > > Note the 'if'. It could start simple and stupid, and only allow > > cases where we know the patch must be trivially safe (because it > > does not do much in terms of disturbing globally visible state). > > That needs some tooling help, but apparently tooling help is in > > place already. > > Actually, even if patch is very trivial, it will be difficult to > determine if it is safe. Consider adding error check: > > int > do_something(void) > { > #if 0 > if (1) > return -1; > #endif > return 0; > } > > void > main(void) > { > if (do_something()) > printf("error happened\n"); > } > > Imagine changing that #if 0 to #if 1. But gcc at -O3 already > optimized out the error message. So... do we compile whole second > kernel and compare the binaries? I think I seen remark "don't try to > do binary compares" somewhere... Fair enough. Thanks, Ingo ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-05 18:43 ` Ingo Molnar 2014-05-05 21:49 ` Frederic Weisbecker @ 2014-05-06 11:45 ` Masami Hiramatsu 2014-05-06 12:26 ` Steven Rostedt 1 sibling, 1 reply; 60+ messages in thread From: Masami Hiramatsu @ 2014-05-06 11:45 UTC (permalink / raw) To: Ingo Molnar Cc: Frederic Weisbecker, Josh Poimboeuf, Seth Jennings, Steven Rostedt, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner (2014/05/06 3:43), Ingo Molnar wrote: > > * Frederic Weisbecker <fweisbec@gmail.com> wrote: > >> On Mon, May 05, 2014 at 08:26:38AM -0500, Josh Poimboeuf wrote: >>> On Mon, May 05, 2014 at 10:55:37AM +0200, Ingo Molnar wrote: >>>> >>>> * Josh Poimboeuf <jpoimboe@redhat.com> wrote: >>>> >>>>> [...] >>>>> >>>>> kpatch checks the backtraces of all tasks in stop_machine() to >>>>> ensure that no instances of the old function are running when the >>>>> new function is applied. I think the biggest downside of this >>>>> approach is that stop_machine() has to idle all other CPUs during >>>>> the patching process, so it inserts a small amount of latency (a few >>>>> ms on an idle system). >>>> >>>> When live patching the kernel, how about achieving an even 'cleaner' >>>> state for all tasks in the system: to freeze all tasks, as the suspend >>>> and hibernation code (and kexec) does, via freeze_processes()? >>>> >>>> That means no tasks in the system have any real kernel execution >>>> state, and there's also no problem with long-sleeping tasks, as >>>> freeze_processes() is supposed to be fast as well. >>>> >>>> I.e. go for the most conservative live patching state first, and relax >>>> it only once the initial model is upstream and is working robustly. >>> >>> I had considered doing this before, but the problem I found is >>> that many kernel threads are unfreezable. So we wouldn't be able >>> to check whether its safe to replace any functions in use by those >>> kernel threads. >> >> OTOH many kernel threads are parkable. Which achieves kind of >> similar desired behaviour: the kernel threads then aren't running. >> >> And in fact we could implement freezing on top of park for kthreads. >> >> But unfortunately there are still quite some of them which don't >> support parking. > > Well, if distros are moving towards live patching (and they are!), > then it looks rather necessary to me that something scary as flipping > out live kernel instructions with substantially different code should > be as safe as possible, and only then fast. Agreed. At this point, I think we'd better take a safer way to live patch. However, I also think if users can accept such freezing wait-time, it means they can also accept kexec based "checkpoint-restart" patching. So, I think the final goal of the kpatch will be live patching without stopping the machine. I'm discussing the issue on github #138, but that is off-topic. :) > If a kernel refuses to patch with certain threads running, that will > drive those kernel threads being fixed and such. It's a deterministic, > recoverable, reportable bug situation, so fixing it should be fast. That's nice to fix that. As Frederic said, we can make all kthreads park-able. > We learned these robustness lessons the hard way with kprobes and > ftrace dynamic code patching... which are utterly simple compared to > live kernel patching! Yeah, thanks for your help :) Thank you, -- Masami HIRAMATSU Software Platform Research Dept. Linux Technology Research Center Hitachi, Ltd., Yokohama Research Laboratory E-mail: masami.hiramatsu.pt@hitachi.com ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-06 11:45 ` Masami Hiramatsu @ 2014-05-06 12:26 ` Steven Rostedt 2014-05-06 22:33 ` Masami Hiramatsu 2014-05-16 16:27 ` Jiri Kosina 0 siblings, 2 replies; 60+ messages in thread From: Steven Rostedt @ 2014-05-06 12:26 UTC (permalink / raw) To: Masami Hiramatsu Cc: Ingo Molnar, Frederic Weisbecker, Josh Poimboeuf, Seth Jennings, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner On Tue, 06 May 2014 20:45:50 +0900 Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote: > However, I also think if users can accept such freezing wait-time, > it means they can also accept kexec based "checkpoint-restart" patching. > So, I think the final goal of the kpatch will be live patching without > stopping the machine. I'm discussing the issue on github #138, but that is > off-topic. :) > I agree with Ingo too. Being conservative at first is the right approach here. We should start out with a stop_machine making sure that everything is sane before we continue. Sure, that's not much different than a kexec, but lets take things one step at a time. ftrace did the stop_machine (and still does for some archs), and slowly moved to a more efficient method. kpatch/kgraft should follow suit. -- Steve ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-06 12:26 ` Steven Rostedt @ 2014-05-06 22:33 ` Masami Hiramatsu 2014-05-16 16:27 ` Jiri Kosina 1 sibling, 0 replies; 60+ messages in thread From: Masami Hiramatsu @ 2014-05-06 22:33 UTC (permalink / raw) To: Steven Rostedt Cc: Ingo Molnar, Frederic Weisbecker, Josh Poimboeuf, Seth Jennings, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner (2014/05/06 21:26), Steven Rostedt wrote: > On Tue, 06 May 2014 20:45:50 +0900 > Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote: > > >> However, I also think if users can accept such freezing wait-time, >> it means they can also accept kexec based "checkpoint-restart" patching. >> So, I think the final goal of the kpatch will be live patching without >> stopping the machine. I'm discussing the issue on github #138, but that is >> off-topic. :) >> > > I agree with Ingo too. Being conservative at first is the right > approach here. We should start out with a stop_machine making sure that > everything is sane before we continue. Sure, that's not much different > than a kexec, but lets take things one step at a time. Agreed. that is a correct way to move things forward. Anyway, my stop_machine-less approach still has many implementation issues. It should be solved in upstream, not out-of-tree. So, this topic is off-topic at this stage. :) We need to focus on how to merge live-patch into upstream kernel. > ftrace did the stop_machine (and still does for some archs), and slowly > moved to a more efficient method. kpatch/kgraft should follow suit. Sure, that's a best story of how things should be evolved on the kernel :) Thank you, -- Masami HIRAMATSU Software Platform Research Dept. Linux Technology Research Center Hitachi, Ltd., Yokohama Research Laboratory E-mail: masami.hiramatsu.pt@hitachi.com ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-06 12:26 ` Steven Rostedt 2014-05-06 22:33 ` Masami Hiramatsu @ 2014-05-16 16:27 ` Jiri Kosina 2014-05-16 17:14 ` Josh Poimboeuf ` (2 more replies) 1 sibling, 3 replies; 60+ messages in thread From: Jiri Kosina @ 2014-05-16 16:27 UTC (permalink / raw) To: Steven Rostedt Cc: Masami Hiramatsu, Ingo Molnar, Frederic Weisbecker, Josh Poimboeuf, Seth Jennings, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner On Tue, 6 May 2014, Steven Rostedt wrote: > > However, I also think if users can accept such freezing wait-time, > > it means they can also accept kexec based "checkpoint-restart" patching. > > So, I think the final goal of the kpatch will be live patching without > > stopping the machine. I'm discussing the issue on github #138, but that is > > off-topic. :) > > I agree with Ingo too. Being conservative at first is the right > approach here. We should start out with a stop_machine making sure that > everything is sane before we continue. Sure, that's not much different > than a kexec, but lets take things one step at a time. > > ftrace did the stop_machine (and still does for some archs), and slowly > moved to a more efficient method. kpatch/kgraft should follow suit. I don't really agree here. I actually believe that "lazy" switching kgraft is doing provides a little bit more in the sense of consistency than stop_machine()-based aproach. Consider this scenario: void foo() { for (i=0; i<10000; i++) { bar(i); something_else(i); } } Let's say you want to live-patch bar(). With stop_machine()-based aproach, you can easily end-up with old bar() and new bar() being called in two consecutive iterations before the loop is even exited, right? (especially on preemptible kernel, or if something_else() goes to sleep). With lazy-switching implemented in kgraft, this can never happen. So I'd like to ask for a little bit more explanation why you think the stop_machine()-based patching provides more sanity/consistency assurance than the lazy switching we're doing. Thanks a lot, -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-16 16:27 ` Jiri Kosina @ 2014-05-16 17:14 ` Josh Poimboeuf 2014-05-20 9:37 ` Jiri Kosina 2014-05-16 18:09 ` Masami Hiramatsu 2014-05-16 18:55 ` Steven Rostedt 2 siblings, 1 reply; 60+ messages in thread From: Josh Poimboeuf @ 2014-05-16 17:14 UTC (permalink / raw) To: Jiri Kosina Cc: Steven Rostedt, Masami Hiramatsu, Ingo Molnar, Frederic Weisbecker, Seth Jennings, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner On Fri, May 16, 2014 at 06:27:27PM +0200, Jiri Kosina wrote: > On Tue, 6 May 2014, Steven Rostedt wrote: > > > > However, I also think if users can accept such freezing wait-time, > > > it means they can also accept kexec based "checkpoint-restart" patching. > > > So, I think the final goal of the kpatch will be live patching without > > > stopping the machine. I'm discussing the issue on github #138, but that is > > > off-topic. :) > > > > I agree with Ingo too. Being conservative at first is the right > > approach here. We should start out with a stop_machine making sure that > > everything is sane before we continue. Sure, that's not much different > > than a kexec, but lets take things one step at a time. > > > > ftrace did the stop_machine (and still does for some archs), and slowly > > moved to a more efficient method. kpatch/kgraft should follow suit. > > I don't really agree here. > > I actually believe that "lazy" switching kgraft is doing provides a little > bit more in the sense of consistency than stop_machine()-based aproach. > > Consider this scenario: > > void foo() > { > for (i=0; i<10000; i++) { > bar(i); > something_else(i); > } > } > > Let's say you want to live-patch bar(). With stop_machine()-based aproach, > you can easily end-up with old bar() and new bar() being called in two > consecutive iterations before the loop is even exited, right? (especially > on preemptible kernel, or if something_else() goes to sleep). Can you clarify why this would be a problem? Is it because the new bar() changed some data semantics which confused foo() or something_else()? > > With lazy-switching implemented in kgraft, this can never happen. > > So I'd like to ask for a little bit more explanation why you think the > stop_machine()-based patching provides more sanity/consistency assurance > than the lazy switching we're doing. > > Thanks a lot, > > -- > Jiri Kosina > SUSE Labs -- Josh ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-16 17:14 ` Josh Poimboeuf @ 2014-05-20 9:37 ` Jiri Kosina 2014-05-20 12:59 ` Josh Poimboeuf 0 siblings, 1 reply; 60+ messages in thread From: Jiri Kosina @ 2014-05-20 9:37 UTC (permalink / raw) To: Josh Poimboeuf Cc: Steven Rostedt, Masami Hiramatsu, Ingo Molnar, Frederic Weisbecker, Seth Jennings, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner On Fri, 16 May 2014, Josh Poimboeuf wrote: > > Consider this scenario: > > > > void foo() > > { > > for (i=0; i<10000; i++) { > > bar(i); > > something_else(i); > > } > > } > > > > Let's say you want to live-patch bar(). With stop_machine()-based aproach, > > you can easily end-up with old bar() and new bar() being called in two > > consecutive iterations before the loop is even exited, right? (especially > > on preemptible kernel, or if something_else() goes to sleep). > > Can you clarify why this would be a problem? Is it because the new > bar() changed some data semantics which confused foo() or > something_else()? I guess the example I used wasn't really completely illustrative, sorry for that. But I guess this has been answered later in the thread already; the thing is that you don't have a complete callgraph available (at least I believe you don't ...?), so you don't really know where your patched function will be called from, and thus you can't change function arguments or return value semantics; with lazy aproach, you can do that. Thanks, -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-20 9:37 ` Jiri Kosina @ 2014-05-20 12:59 ` Josh Poimboeuf 0 siblings, 0 replies; 60+ messages in thread From: Josh Poimboeuf @ 2014-05-20 12:59 UTC (permalink / raw) To: Jiri Kosina Cc: Steven Rostedt, Masami Hiramatsu, Ingo Molnar, Frederic Weisbecker, Seth Jennings, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner On Tue, May 20, 2014 at 11:37:16AM +0200, Jiri Kosina wrote: > On Fri, 16 May 2014, Josh Poimboeuf wrote: > > > > Consider this scenario: > > > > > > void foo() > > > { > > > for (i=0; i<10000; i++) { > > > bar(i); > > > something_else(i); > > > } > > > } > > > > > > Let's say you want to live-patch bar(). With stop_machine()-based aproach, > > > you can easily end-up with old bar() and new bar() being called in two > > > consecutive iterations before the loop is even exited, right? (especially > > > on preemptible kernel, or if something_else() goes to sleep). > > > > Can you clarify why this would be a problem? Is it because the new > > bar() changed some data semantics which confused foo() or > > something_else()? > > I guess the example I used wasn't really completely illustrative, sorry > for that. But I guess this has been answered later in the thread already; > the thing is that you don't have a complete callgraph available (at least > I believe you don't ...?), so you don't really know where your patched > function will be called from, and thus you can't change function arguments > or return value semantics; with lazy aproach, you can do that. If the patch changes return semantics, it would also need to change all the affected callers to handle the new semantics. But I think that's a general statement for all patches, not just for the live patching case. Otherwise it's just a bad patch in general. For changed function arguments, the compiler will catch that and recompile all the callers, so the kpatch tooling would detect that the callers changed and patch them too. But to be fair, here's an example where kGraft does better: foo = bar(); something_else(); baz(foo); Let's say that bar()'s return semantics change, and baz() is also patched to handle the new semantics. If the stop_machine occurs in something_else(), then the new baz() will be called with the old "version" of foo. If the user were smart enough to be aware of this potential scenario, it could be solved with a kGraft-esque approach for data semantic changes, where the patched baz() would need to handle both versions of foo. This is another example of why changing return or data semantics is tricky and should be avoided. BTW, this is also an example of why the stop_machine and lazy approaches aren't interchangeable. The patch creator has to know which method will be used to apply the patch in order to be able to determine whether a patch is "safe". -- Josh ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-16 16:27 ` Jiri Kosina 2014-05-16 17:14 ` Josh Poimboeuf @ 2014-05-16 18:09 ` Masami Hiramatsu 2014-05-17 22:46 ` Vojtech Pavlik 2014-05-16 18:55 ` Steven Rostedt 2 siblings, 1 reply; 60+ messages in thread From: Masami Hiramatsu @ 2014-05-16 18:09 UTC (permalink / raw) To: Jiri Kosina Cc: Steven Rostedt, Ingo Molnar, Frederic Weisbecker, Josh Poimboeuf, Seth Jennings, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner (2014/05/17 1:27), Jiri Kosina wrote: > On Tue, 6 May 2014, Steven Rostedt wrote: > >>> However, I also think if users can accept such freezing wait-time, >>> it means they can also accept kexec based "checkpoint-restart" patching. >>> So, I think the final goal of the kpatch will be live patching without >>> stopping the machine. I'm discussing the issue on github #138, but that is >>> off-topic. :) >> >> I agree with Ingo too. Being conservative at first is the right >> approach here. We should start out with a stop_machine making sure that >> everything is sane before we continue. Sure, that's not much different >> than a kexec, but lets take things one step at a time. >> >> ftrace did the stop_machine (and still does for some archs), and slowly >> moved to a more efficient method. kpatch/kgraft should follow suit. > > I don't really agree here. > > I actually believe that "lazy" switching kgraft is doing provides a little > bit more in the sense of consistency than stop_machine()-based aproach. > > Consider this scenario: > > void foo() > { > for (i=0; i<10000; i++) { > bar(i); > something_else(i); > } > } In this case, I'd recommend you to add foo() to replacing target as dummy. Then, kpatch can ensure foo() is actually not running. :) > Let's say you want to live-patch bar(). With stop_machine()-based aproach, > you can easily end-up with old bar() and new bar() being called in two > consecutive iterations before the loop is even exited, right? (especially > on preemptible kernel, or if something_else() goes to sleep). > > With lazy-switching implemented in kgraft, this can never happen. And I guess similar thing may happen with kgraft. If old function and new function share a non-auto variable and they modify it different way, the result will be unexpected by the mutual interference. Thank you, > > So I'd like to ask for a little bit more explanation why you think the > stop_machine()-based patching provides more sanity/consistency assurance > than the lazy switching we're doing. > > Thanks a lot, > -- Masami HIRAMATSU Software Platform Research Dept. Linux Technology Research Center Hitachi, Ltd., Yokohama Research Laboratory E-mail: masami.hiramatsu.pt@hitachi.com ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-16 18:09 ` Masami Hiramatsu @ 2014-05-17 22:46 ` Vojtech Pavlik 0 siblings, 0 replies; 60+ messages in thread From: Vojtech Pavlik @ 2014-05-17 22:46 UTC (permalink / raw) To: Masami Hiramatsu Cc: Jiri Kosina, Steven Rostedt, Ingo Molnar, Frederic Weisbecker, Josh Poimboeuf, Seth Jennings, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner On Sat, May 17, 2014 at 03:09:57AM +0900, Masami Hiramatsu wrote: > (2014/05/17 1:27), Jiri Kosina wrote: > > On Tue, 6 May 2014, Steven Rostedt wrote: > > > >>> However, I also think if users can accept such freezing wait-time, > >>> it means they can also accept kexec based "checkpoint-restart" patching. > >>> So, I think the final goal of the kpatch will be live patching without > >>> stopping the machine. I'm discussing the issue on github #138, but that is > >>> off-topic. :) > >> > >> I agree with Ingo too. Being conservative at first is the right > >> approach here. We should start out with a stop_machine making sure that > >> everything is sane before we continue. Sure, that's not much different > >> than a kexec, but lets take things one step at a time. > >> > >> ftrace did the stop_machine (and still does for some archs), and slowly > >> moved to a more efficient method. kpatch/kgraft should follow suit. > > > > I don't really agree here. > > > > I actually believe that "lazy" switching kgraft is doing provides a little > > bit more in the sense of consistency than stop_machine()-based aproach. > > > > Consider this scenario: > > > > void foo() > > { > > for (i=0; i<10000; i++) { > > bar(i); > > something_else(i); > > } > > } > > In this case, I'd recommend you to add foo() to replacing target > as dummy. Then, kpatch can ensure foo() is actually not running. :) The problem is: Where do you stop? Adding the whole list of functions that transitively may ever call bar() can be pretty much the whole kernel. And given that you can be calling via pointer, you can't often even tell who is calling bar(). So yes, a developer could manually include foo() in the to be patched list, so that it is checked that foo() isn't being executed while patching. But that would have to be done entirely manually after thoroughly understanding the implications of the patch. > > Let's say you want to live-patch bar(). With stop_machine()-based aproach, > > you can easily end-up with old bar() and new bar() being called in two > > consecutive iterations before the loop is even exited, right? (especially > > on preemptible kernel, or if something_else() goes to sleep). > > > > With lazy-switching implemented in kgraft, this can never happen. > > And I guess similar thing may happen with kgraft. If old function and > new function share a non-auto variable and they modify it different way, > the result will be unexpected by the mutual interference. By non-auto I assume you mean globally accessible variable or data structures. Yes, with kGraft the new function has to be backward compatible with pre-patch global data layout and semantics. The parameters, return value and their semantics are free to change, though, and kGraft guarantees consistency there. -- Vojtech Pavlik SUSE Labs ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-16 16:27 ` Jiri Kosina 2014-05-16 17:14 ` Josh Poimboeuf 2014-05-16 18:09 ` Masami Hiramatsu @ 2014-05-16 18:55 ` Steven Rostedt 2014-05-16 22:32 ` Jiri Kosina 2 siblings, 1 reply; 60+ messages in thread From: Steven Rostedt @ 2014-05-16 18:55 UTC (permalink / raw) To: Jiri Kosina Cc: Masami Hiramatsu, Ingo Molnar, Frederic Weisbecker, Josh Poimboeuf, Seth Jennings, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner On Fri, 16 May 2014 18:27:27 +0200 (CEST) Jiri Kosina <jkosina@suse.cz> wrote: > > ftrace did the stop_machine (and still does for some archs), and slowly > > moved to a more efficient method. kpatch/kgraft should follow suit. > > I don't really agree here. > > I actually believe that "lazy" switching kgraft is doing provides a little > bit more in the sense of consistency than stop_machine()-based aproach. > > Consider this scenario: > > void foo() > { > for (i=0; i<10000; i++) { > bar(i); > something_else(i); > } > } > > Let's say you want to live-patch bar(). With stop_machine()-based aproach, > you can easily end-up with old bar() and new bar() being called in two > consecutive iterations before the loop is even exited, right? (especially > on preemptible kernel, or if something_else() goes to sleep). And bar() should still do the same result. Otherwise you would think that foo should change too. > > With lazy-switching implemented in kgraft, this can never happen. > > So I'd like to ask for a little bit more explanation why you think the > stop_machine()-based patching provides more sanity/consistency assurance > than the lazy switching we're doing. Here's what I'm more concerned with. With "lazy" switching you can have two tasks running two different versions of bar(). What happens if the locking of data within bar changes? Say the data was protected incorrectly with mutex(X) and you now need to protect it with mutex(Y). With stop machine, you can make sure everyone is out of bar() and all tasks will use the same mutex to access the data. But with a lazy approach, one task can be protecting the data with mutex(X) and the other with mutex(Y) causing both tasks to be accessing the data at the same time. *That* is what I'm more concerned about. I believe there are more issues with running the two different versions of the same function at the same time than there are with iterations of two different versions of the call. One would expect that the results should stay the same and if not, then the callers would need to be changed too. -- Steve ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-16 18:55 ` Steven Rostedt @ 2014-05-16 22:32 ` Jiri Kosina 2014-05-17 0:27 ` Steven Rostedt 0 siblings, 1 reply; 60+ messages in thread From: Jiri Kosina @ 2014-05-16 22:32 UTC (permalink / raw) To: Steven Rostedt Cc: Masami Hiramatsu, Ingo Molnar, Frederic Weisbecker, Josh Poimboeuf, Seth Jennings, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner On Fri, 16 May 2014, Steven Rostedt wrote: > > With lazy-switching implemented in kgraft, this can never happen. > > > > So I'd like to ask for a little bit more explanation why you think the > > stop_machine()-based patching provides more sanity/consistency assurance > > than the lazy switching we're doing. > > Here's what I'm more concerned with. With "lazy" switching you can have > two tasks running two different versions of bar(). What happens if the > locking of data within bar changes? Say the data was protected > incorrectly with mutex(X) and you now need to protect it with mutex(Y). > > With stop machine, you can make sure everyone is out of bar() and all > tasks will use the same mutex to access the data. But with a lazy > approach, one task can be protecting the data with mutex(X) and the > other with mutex(Y) causing both tasks to be accessing the data at the > same time. > > *That* is what I'm more concerned about. That's true, and we come back to what has been said at the very beginning for both aproaches -- you can't really get away without manual human inspection of the patches that are being applied. The case you have outlined is indeed problematic for the "lazy switching" aproach, and can be worked around (interim function, which takes both mutexes in well defined order, for example). You can construct a broken locking scenario for stop_machine() aproach as well -- consider a case when you are exchaing a function which changes the locking order of two locks/mutexes. How do you deal with the rest of the code where the locks are being acquired, but not through the functions you've exchanged? So again -- there is no disagreement, I believe, about the fact that "live patches" can't be reliably auto-generated, and human inspection will always be necessary. Given the intended use-case (serious CVEs mostly, handled by distro vendors), this is fine. -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-16 22:32 ` Jiri Kosina @ 2014-05-17 0:27 ` Steven Rostedt 2014-05-17 7:10 ` Jiri Kosina 0 siblings, 1 reply; 60+ messages in thread From: Steven Rostedt @ 2014-05-17 0:27 UTC (permalink / raw) To: Jiri Kosina Cc: Masami Hiramatsu, Ingo Molnar, Frederic Weisbecker, Josh Poimboeuf, Seth Jennings, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner On Sat, 17 May 2014 00:32:10 +0200 (CEST) Jiri Kosina <jkosina@suse.cz> wrote: > That's true, and we come back to what has been said at the very beginning > for both aproaches -- you can't really get away without manual human > inspection of the patches that are being applied. > > The case you have outlined is indeed problematic for the "lazy switching" > aproach, and can be worked around (interim function, which takes both > mutexes in well defined order, for example). > > You can construct a broken locking scenario for stop_machine() aproach as > well -- consider a case when you are exchaing a function which changes the > locking order of two locks/mutexes. How do you deal with the rest of the > code where the locks are being acquired, but not through the functions > you've exchanged? I'm a bit confused by this. If you change locking order and there's other functions that acquire it in reverse order that's not in the patch, that sounds like you just introduced a new bug. > > So again -- there is no disagreement, I believe, about the fact that "live > patches" can't be reliably auto-generated, and human inspection will > always be necessary. Given the intended use-case (serious CVEs mostly, > handled by distro vendors), this is fine. > Right, I absolutely agree that the real use case is to fix off by one errors and buffer overflows. Anything that is more complex really needs a reboot (at a minimum, a kexec reboot). I know our customers would love to see this upgrading entire kernels, but that's rather unrealistic. Why I still favor the stop_machine approach, is because the method of patching is a bit simpler that way. A "lazy" approach will be more complex and more likely to be buggy. The thing I'm arguing here is not the end result being a problem, but the implementation of the patching itself causing bugs. I rather have a "lazy" approach, but like ftrace and its breakpoint method, the stop_machine approach is the simpler way to make sure the patching works before we try to optimize it. -- Steve ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching 2014-05-17 0:27 ` Steven Rostedt @ 2014-05-17 7:10 ` Jiri Kosina 0 siblings, 0 replies; 60+ messages in thread From: Jiri Kosina @ 2014-05-17 7:10 UTC (permalink / raw) To: Steven Rostedt Cc: Masami Hiramatsu, Ingo Molnar, Frederic Weisbecker, Josh Poimboeuf, Seth Jennings, Ingo Molnar, Jiri Slaby, linux-kernel, Peter Zijlstra, Andrew Morton, Linus Torvalds, Thomas Gleixner On Fri, 16 May 2014, Steven Rostedt wrote: > Why I still favor the stop_machine approach, is because the method of > patching is a bit simpler that way. A "lazy" approach will be more > complex and more likely to be buggy. The thing I'm arguing here is not > the end result being a problem, but the implementation of the patching > itself causing bugs. Well, what can I say to this. 21 files changed, 594 insertions(+), 10 deletions(-) that's a complete implementation, including comments and some documentation. Yes, it still has TODOs (such as patching modules as they are modprobed, we're working on multi-arch support, etc), but it's more or less complete working x86 skeleton. > I rather have a "lazy" approach, I'm glad to hear that, thanks :) > but like ftrace and its breakpoint method, the stop_machine approach is > the simpler way to make sure the patching works before we try to > optimize it. I am still not convinced that it's more complex. It's actually lazy both in the way it performs patching and in implementation -- we basically set a flag, flip the switch, and let the universe converge to a new state by itself. It's basically hard to argue about level of bugginess when no actual bugs are being pointed out :) (well, yes, the kthreads stuff needs to be taken care of, but both kgraft and kpatch have similar issues there). Thanks, -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 60+ messages in thread
end of thread, other threads:[~2014-06-15 6:57 UTC | newest] Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-05-01 15:52 [RFC PATCH 0/2] kpatch: dynamic kernel patching Josh Poimboeuf 2014-05-01 15:52 ` [RFC PATCH 1/2] kpatch: add TAINT_KPATCH flag Josh Poimboeuf 2014-05-01 15:52 ` [RFC PATCH 2/2] kpatch: add kpatch core module Josh Poimboeuf 2014-05-01 20:45 ` [RFC PATCH 0/2] kpatch: dynamic kernel patching Andi Kleen 2014-05-01 21:01 ` Josh Poimboeuf 2014-05-01 21:06 ` Andi Kleen 2014-05-01 21:27 ` Josh Poimboeuf 2014-05-01 21:39 ` Josh Poimboeuf 2014-05-02 1:30 ` Masami Hiramatsu 2014-05-02 8:37 ` Jiri Kosina 2014-05-02 13:29 ` Josh Poimboeuf 2014-05-02 13:10 ` Jiri Kosina 2014-05-02 13:37 ` Josh Poimboeuf 2014-05-05 23:34 ` David Lang 2014-05-05 23:52 ` Jiri Kosina 2014-05-06 1:59 ` David Lang 2014-05-06 12:17 ` Josh Poimboeuf 2014-05-06 7:32 ` Ingo Molnar 2014-05-06 8:03 ` Jiri Kosina 2014-05-06 12:23 ` Josh Poimboeuf 2014-05-07 12:19 ` Ingo Molnar 2014-05-09 1:46 ` David Lang 2014-05-09 2:45 ` Steven Rostedt 2014-05-09 4:07 ` Masami Hiramatsu 2014-05-05 8:55 ` Ingo Molnar 2014-05-05 13:26 ` Josh Poimboeuf 2014-05-05 14:10 ` Frederic Weisbecker 2014-05-05 18:43 ` Ingo Molnar 2014-05-05 21:49 ` Frederic Weisbecker 2014-05-06 12:12 ` Josh Poimboeuf 2014-05-06 12:33 ` Steven Rostedt 2014-05-06 22:49 ` Masami Hiramatsu 2014-05-06 14:05 ` Frederic Weisbecker 2014-05-06 14:50 ` Josh Poimboeuf 2014-05-07 12:24 ` Ingo Molnar 2014-05-07 15:41 ` Josh Poimboeuf 2014-05-07 15:57 ` Ingo Molnar 2014-05-07 16:43 ` Josh Poimboeuf 2014-05-07 22:56 ` David Lang 2014-05-08 6:12 ` Ingo Molnar 2014-05-08 6:50 ` David Lang 2014-05-08 7:08 ` Ingo Molnar 2014-05-08 7:29 ` Masami Hiramatsu 2014-05-08 12:48 ` Josh Poimboeuf 2014-05-09 6:21 ` Masami Hiramatsu 2014-06-14 20:31 ` Pavel Machek 2014-06-15 6:57 ` Ingo Molnar 2014-05-06 11:45 ` Masami Hiramatsu 2014-05-06 12:26 ` Steven Rostedt 2014-05-06 22:33 ` Masami Hiramatsu 2014-05-16 16:27 ` Jiri Kosina 2014-05-16 17:14 ` Josh Poimboeuf 2014-05-20 9:37 ` Jiri Kosina 2014-05-20 12:59 ` Josh Poimboeuf 2014-05-16 18:09 ` Masami Hiramatsu 2014-05-17 22:46 ` Vojtech Pavlik 2014-05-16 18:55 ` Steven Rostedt 2014-05-16 22:32 ` Jiri Kosina 2014-05-17 0:27 ` Steven Rostedt 2014-05-17 7:10 ` Jiri Kosina
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.