From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750989AbdK3Vxh (ORCPT ); Thu, 30 Nov 2017 16:53:37 -0500 Received: from mx2.suse.de ([195.135.220.15]:55687 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750747AbdK3Vxf (ORCPT ); Thu, 30 Nov 2017 16:53:35 -0500 Date: Thu, 30 Nov 2017 22:53:32 +0100 (CET) From: Jiri Kosina To: Miroslav Benes cc: jpoimboe@redhat.com, jeyu@kernel.org, pmladek@suse.com, lpechacek@suse.cz, pavel@ucw.cz, live-patching@vger.kernel.org, linux-kernel@vger.kernel.org, Oleg Nesterov , Michael Ellerman , Thomas Gleixner , Ingo Molnar , "H. Peter Anvin" , Andy Lutomirski , linuxppc-dev@lists.ozlabs.org, x86@kernel.org Subject: Re: [PATCH v4 1/2] livepatch: send a fake signal to all blocking tasks In-Reply-To: <20171115135014.20594-2-mbenes@suse.cz> Message-ID: References: <20171115135014.20594-1-mbenes@suse.cz> <20171115135014.20594-2-mbenes@suse.cz> User-Agent: Alpine 2.21 (LSU 202 2017-01-01) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 15 Nov 2017, Miroslav Benes wrote: > Live patching consistency model is of LEAVE_PATCHED_SET and > SWITCH_THREAD. This means that all tasks in the system have to be marked > one by one as safe to call a new patched function. Safe means when a > task is not (sleeping) in a set of patched functions. That is, no > patched function is on the task's stack. Another clearly safe place is > the boundary between kernel and userspace. The patching waits for all > tasks to get outside of the patched set or to cross the boundary. The > transition is completed afterwards. > > The problem is that a task can block the transition for quite a long > time, if not forever. It could sleep in a set of patched functions, for > example. Luckily we can force the task to leave the set by sending it a > fake signal, that is a signal with no data in signal pending structures > (no handler, no sign of proper signal delivered). Suspend/freezer use > this to freeze the tasks as well. The task gets TIF_SIGPENDING set and > is woken up (if it has been sleeping in the kernel before) or kicked by > rescheduling IPI (if it was running on other CPU). This causes the task > to go to kernel/userspace boundary where the signal would be handled and > the task would be marked as safe in terms of live patching. > > There are tasks which are not affected by this technique though. The > fake signal is not sent to kthreads. They should be handled differently. > They can be woken up so they leave the patched set and their > TIF_PATCH_PENDING can be cleared thanks to stack checking. > > For the sake of completeness, if the task is in TASK_RUNNING state but > not currently running on some CPU it doesn't get the IPI, but it would > eventually handle the signal anyway. Second, if the task runs in the > kernel (in TASK_RUNNING state) it gets the IPI, but the signal is not > handled on return from the interrupt. It would be handled on return to > the userspace in the future when the fake signal is sent again. Stack > checking deals with these cases in a better way. > > If the task was sleeping in a syscall it would be woken by our fake > signal, it would check if TIF_SIGPENDING is set (by calling > signal_pending() predicate) and return ERESTART* or EINTR. Syscalls with > ERESTART* return values are restarted in case of the fake signal (see > do_signal()). EINTR is propagated back to the userspace program. This > could disturb the program, but... > > * each process dealing with signals should react accordingly to EINTR > return values. > * syscalls returning EINTR happen to be quite common situation in the > system even if no fake signal is sent. > * freezer sends the fake signal and does not deal with EINTR anyhow. > Thus EINTR values are returned when the system is resumed. > > The very safe marking is done in architectures' "entry" on syscall and > interrupt/exception exit paths, and in a stack checking functions of > livepatch. TIF_PATCH_PENDING is cleared and the next > recalc_sigpending() drops TIF_SIGPENDING. In connection with this, also > call klp_update_patch_state() before do_signal(), so that > recalc_sigpending() in dequeue_signal() can clear TIF_PATCH_PENDING > immediately and thus prevent a double call of do_signal(). > > Note that the fake signal is not sent to stopped/traced tasks. Such task > prevents the patching to finish till it continues again (is not traced > anymore). > > Last, sending the fake signal is not automatic. It is done only when > admin requests it by writing 1 to signal sysfs attribute in livepatch > sysfs directory. > > Signed-off-by: Miroslav Benes > Cc: Oleg Nesterov > Cc: Michael Ellerman > Cc: Thomas Gleixner > Cc: Ingo Molnar > Cc: "H. Peter Anvin" > Cc: Andy Lutomirski > Cc: linuxppc-dev@lists.ozlabs.org > Cc: x86@kernel.org > --- > Documentation/ABI/testing/sysfs-kernel-livepatch | 12 +++++++ > Documentation/livepatch/livepatch.txt | 11 +++++-- > arch/powerpc/kernel/signal.c | 6 ++-- > arch/x86/entry/common.c | 6 ++-- > kernel/livepatch/core.c | 30 +++++++++++++++++ > kernel/livepatch/transition.c | 41 ++++++++++++++++++++++++ > kernel/livepatch/transition.h | 1 + > kernel/signal.c | 4 ++- I'd like to be queuing this patchset for the next merge window, so if there are any objections for the out-of-kernel/livepatch/* changes, please speak up now. Thanks. -- Jiri Kosina SUSE Labs