* [PATCH] Documentation/livepatch: remove the limitation for schedule() patching
@ 2017-01-06 14:00 Miroslav Benes
  2017-01-06 15:01 ` Petr Mladek
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Miroslav Benes @ 2017-01-06 14:00 UTC (permalink / raw)
  To: jpoimboe, jeyu, jikos
  Cc: pmladek, corbet, live-patching, linux-doc, linux-kernel, Miroslav Benes

The Limitations section of the documentation says that anything inlined
into __schedule() cannot be livepatched. This was true until the 4.9
kernel. Thanks to commit 0100301bfdf5 ("sched/x86: Rewrite the
switch_to() code") from Brian Gerst, there is now a __switch_to_asm()
function (implemented in assembly) that is called properly from
context_switch(). RIP is thus saved on the stack, and a task returns to
the proper version of __schedule() and related functions.

Of course, __switch_to_asm() itself is not patchable for the reason
described in the section. But it has no __fentry__ call, and I cannot
imagine a reason to patch it anyway.

Therefore, remove the paragraphs from the section.

Signed-off-by: Miroslav Benes <mbenes@suse.cz>
---
FWIW, I also tested this on top of the consistency model patch set to be
sure. I patched the schedule() function, which calls __schedule()
(__schedule() itself cannot be patched directly due to its notrace
attribute). It works well except...

1. the patching process does not finish, because many tasks sleep in
schedule. STOP/CONT signal does not help. I'll investigate.

2. reversion of the process does not work as expected. The kernel
crashes after the removal of the module. A task very likely slept in
schedule and was not migrated properly. It might be because of the races
in klp_reverse_transition() described by Petr, or might be somewhere
else. I'll look into it.

 Documentation/livepatch/livepatch.txt | 19 -------------------
 1 file changed, 19 deletions(-)

diff --git a/Documentation/livepatch/livepatch.txt b/Documentation/livepatch/livepatch.txt
index f5967316deb9..7f04e13ec53d 100644
--- a/Documentation/livepatch/livepatch.txt
+++ b/Documentation/livepatch/livepatch.txt
@@ -329,25 +329,6 @@ See Documentation/ABI/testing/sysfs-kernel-livepatch for more details.
     by "notrace".
 
 
-  + Anything inlined into __schedule() can not be patched.
-
-    The switch_to macro is inlined into __schedule(). It switches the
-    context between two processes in the middle of the macro. It does
-    not save RIP in x86_64 version (contrary to 32-bit version). Instead,
-    the currently used __schedule()/switch_to() handles both processes.
-
-    Now, let's have two different tasks. One calls the original
-    __schedule(), its registers are stored in a defined order and it
-    goes to sleep in the switch_to macro and some other task is restored
-    using the original __schedule(). Then there is the second task which
-    calls patched__schedule(), it goes to sleep there and the first task
-    is picked by the patched__schedule(). Its RSP is restored and now
-    the registers should be restored as well. But the order is different
-    in the new patched__schedule(), so...
-
-    There is work in progress to remove this limitation.
-
-
   + Livepatch modules can not be removed.
 
     The current implementation just redirects the functions at the very
-- 
2.11.0

* Re: [PATCH] Documentation/livepatch: remove the limitation for schedule() patching
  2017-01-06 14:00 [PATCH] Documentation/livepatch: remove the limitation for schedule() patching Miroslav Benes
@ 2017-01-06 15:01 ` Petr Mladek
  2017-01-06 15:10   ` Miroslav Benes
  2017-01-06 19:13 ` Josh Poimboeuf
  2017-01-11  1:33 ` Jiri Kosina
  2 siblings, 1 reply; 8+ messages in thread
From: Petr Mladek @ 2017-01-06 15:01 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: jpoimboe, jeyu, jikos, corbet, live-patching, linux-doc, linux-kernel

On Fri 2017-01-06 15:00:45, Miroslav Benes wrote:
> The Limitations section of the documentation says that anything inlined
> into __schedule() cannot be livepatched. This was true until the 4.9
> kernel. Thanks to commit 0100301bfdf5 ("sched/x86: Rewrite the
> switch_to() code") from Brian Gerst, there is now a __switch_to_asm()
> function (implemented in assembly) that is called properly from
> context_switch(). RIP is thus saved on the stack, and a task returns to
> the proper version of __schedule() and related functions.
> 
> Of course, __switch_to_asm() itself is not patchable for the reason
> described in the section. But it has no __fentry__ call, and I cannot
> imagine a reason to patch it anyway.
> 
> Therefore, remove the paragraphs from the section.
> 
> Signed-off-by: Miroslav Benes <mbenes@suse.cz>

It is great to get a feature for free ;-)

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

---
> FWIW, I also tested this on top of the consistency model patch set to be
> sure. I patched the schedule() function, which calls __schedule()
> (__schedule() itself cannot be patched directly due to its notrace
> attribute). It works well except...
> 
> 1. the patching process does not finish, because many tasks sleep in
> schedule. STOP/CONT signal does not help. I'll investigate.

Are these userspace processes or kthreads? Kthreads would cause
problems because they do not handle signals.


> 2. reversion of the process does not work as expected. The kernel
> crashes after the removal of the module. A task very likely slept in
> schedule and was not migrated properly. It might be because of the races
> in klp_reverse_transition() described by Petr, or might be somewhere
> else. I'll look into it.

I hope that I will be able to do another dive into the consistency
model patchset the following week.

Best Regards,
Petr

* Re: [PATCH] Documentation/livepatch: remove the limitation for schedule() patching
  2017-01-06 15:01 ` Petr Mladek
@ 2017-01-06 15:10   ` Miroslav Benes
  0 siblings, 0 replies; 8+ messages in thread
From: Miroslav Benes @ 2017-01-06 15:10 UTC (permalink / raw)
  To: Petr Mladek
  Cc: jpoimboe, jeyu, jikos, corbet, live-patching, linux-doc, linux-kernel

On Fri, 6 Jan 2017, Petr Mladek wrote:

> On Fri 2017-01-06 15:00:45, Miroslav Benes wrote:
> > The Limitations section of the documentation says that anything inlined
> > into __schedule() cannot be livepatched. This was true until the 4.9
> > kernel. Thanks to commit 0100301bfdf5 ("sched/x86: Rewrite the
> > switch_to() code") from Brian Gerst, there is now a __switch_to_asm()
> > function (implemented in assembly) that is called properly from
> > context_switch(). RIP is thus saved on the stack, and a task returns to
> > the proper version of __schedule() and related functions.
> > 
> > Of course, __switch_to_asm() itself is not patchable for the reason
> > described in the section. But it has no __fentry__ call, and I cannot
> > imagine a reason to patch it anyway.
> > 
> > Therefore, remove the paragraphs from the section.
> > 
> > Signed-off-by: Miroslav Benes <mbenes@suse.cz>
> 
> It is great to get a feature for free ;-)
> 
> Reviewed-by: Petr Mladek <pmladek@suse.com>
> 
> Best Regards,
> Petr
> 
> ---
> > FWIW, I also tested this on top of the consistency model patch set to be
> > sure. I patched the schedule() function, which calls __schedule()
> > (__schedule() itself cannot be patched directly due to its notrace
> > attribute). It works well except...
> > 
> > 1. the patching process does not finish, because many tasks sleep in
> > schedule. STOP/CONT signal does not help. I'll investigate.
> 
> Are these userspace processes or kthreads? Kthreads would cause
> problems because they do not handle signals.

Userspace processes, but I take it back. Stupid typo in my script. It 
works as expected. Kthreads sleeping in schedule() are of course there and 
a signal does not help.

Miroslav

* Re: [PATCH] Documentation/livepatch: remove the limitation for schedule() patching
  2017-01-06 14:00 [PATCH] Documentation/livepatch: remove the limitation for schedule() patching Miroslav Benes
  2017-01-06 15:01 ` Petr Mladek
@ 2017-01-06 19:13 ` Josh Poimboeuf
  2017-01-09 12:50   ` Miroslav Benes
  2017-01-11  1:33 ` Jiri Kosina
  2 siblings, 1 reply; 8+ messages in thread
From: Josh Poimboeuf @ 2017-01-06 19:13 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: jeyu, jikos, pmladek, corbet, live-patching, linux-doc, linux-kernel

On Fri, Jan 06, 2017 at 03:00:45PM +0100, Miroslav Benes wrote:
> The Limitations section of the documentation says that anything inlined
> into __schedule() cannot be livepatched. This was true until the 4.9
> kernel. Thanks to commit 0100301bfdf5 ("sched/x86: Rewrite the
> switch_to() code") from Brian Gerst, there is now a __switch_to_asm()
> function (implemented in assembly) that is called properly from
> context_switch(). RIP is thus saved on the stack, and a task returns to
> the proper version of __schedule() and related functions.
> 
> Of course, __switch_to_asm() itself is not patchable for the reason
> described in the section. But it has no __fentry__ call, and I cannot
> imagine a reason to patch it anyway.
> 
> Therefore, remove the paragraphs from the section.
> 
> Signed-off-by: Miroslav Benes <mbenes@suse.cz>

Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>

> ---
> FWIW, I also tested this on top of the consistency model patch set to be
> sure. I patched the schedule() function, which calls __schedule()
> (__schedule() itself cannot be patched directly due to its notrace
> attribute). It works well except...
> 
> 1. the patching process does not finish, because many tasks sleep in
> schedule. STOP/CONT signal does not help. I'll investigate.
> 
> 2. reversion of the process does not work as expected. The kernel
> crashes after the removal of the module. A task very likely slept in
> schedule and was not migrated properly. It might be because of the races
> in klp_reverse_transition() described by Petr, or might be somewhere
> else. I'll look into it.

Hm, will be interesting to see the cause of this...

-- 
Josh

* Re: [PATCH] Documentation/livepatch: remove the limitation for schedule() patching
  2017-01-06 19:13 ` Josh Poimboeuf
@ 2017-01-09 12:50   ` Miroslav Benes
  2017-01-09 14:54     ` Josh Poimboeuf
  0 siblings, 1 reply; 8+ messages in thread
From: Miroslav Benes @ 2017-01-09 12:50 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: jeyu, jikos, pmladek, corbet, live-patching, linux-doc, linux-kernel

On Fri, 6 Jan 2017, Josh Poimboeuf wrote:

> On Fri, Jan 06, 2017 at 03:00:45PM +0100, Miroslav Benes wrote:
> > 
> > 2. reversion of the process does not work as expected. The kernel
> > crashes after the removal of the module. A task very likely slept in
> > schedule and was not migrated properly. It might be because of the races
> > in klp_reverse_transition() described by Petr, or might be somewhere
> > else. I'll look into it.
> 
> Hm, will be interesting to see the cause of this...

The absence of the patched schedule() on the stack was the cause. 
klp_try_switch_task() thus did not see it and happily migrated the task. 

The reason is funny. One cannot patch __schedule() (which is the function
of interest) because of the notrace attribute, so all of its callers need
to be patched instead. I tried to make my life easier and patched only
schedule(). GCC then inlined the new __schedule() into the new schedule().
When I added the noinline attribute to the new __schedule(), everything
was fine (because suddenly the new schedule() was on the stack as expected).
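
A minimal user-space sketch of the noinline workaround described above.
It is not kernel or livepatch code: new_schedule() and new___schedule()
are hypothetical stand-ins, and the attribute spelling assumes GCC or
Clang. Marking the helper noinline asks the compiler to keep it as a
separate function reached by a real call, instead of folding it into its
caller.

```c
/* Hypothetical stand-in for the replacement __schedule(); not kernel code. */
__attribute__((noinline)) int new___schedule(int preempt)
{
	/* Placeholder body: just report which path was requested. */
	return preempt ? 1 : 0;
}

/* Hypothetical stand-in for the replacement schedule(), the patched entry point. */
int new_schedule(void)
{
	/* Remains a real call because the callee is marked noinline. */
	return new___schedule(0);
}
```

Calling new_schedule() goes through an out-of-line call to
new___schedule(), so both functions keep their own stack frames.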

There is still one thing which I don't understand: why is __schedule()
(patched or the original) not on the stack? The actual "sleep"
should happen in __switch_to_asm(), which is a real function now. And there
is a call to __switch_to_asm() in __schedule(). __schedule() thus should be
on the stack, shouldn't it? What am I missing? __switch_to_asm() pushes %rbp
on the stack...

Miroslav

* Re: [PATCH] Documentation/livepatch: remove the limitation for schedule() patching
  2017-01-09 12:50   ` Miroslav Benes
@ 2017-01-09 14:54     ` Josh Poimboeuf
  2017-01-10 10:32       ` Miroslav Benes
  0 siblings, 1 reply; 8+ messages in thread
From: Josh Poimboeuf @ 2017-01-09 14:54 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: jeyu, jikos, pmladek, corbet, live-patching, linux-doc, linux-kernel

On Mon, Jan 09, 2017 at 01:50:19PM +0100, Miroslav Benes wrote:
> There is still one thing which I don't understand: why is __schedule()
> (patched or the original) not on the stack? The actual "sleep"
> should happen in __switch_to_asm(), which is a real function now. And there
> is a call to __switch_to_asm() in __schedule(). __schedule() thus should be
> on the stack, shouldn't it? What am I missing? __switch_to_asm() pushes %rbp
> on the stack...

Ah, this is an unwinder bug.  get_frame_pointer() needs to be fixed so
that for an inactive task it returns a pointer to inactive_task_frame.bp
rather than the value of inactive_task_frame.bp itself.  Will fix it.
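
A minimal user-space sketch of the difference, under simplifying
assumptions: struct frame_rec, unwind(), start_buggy() and start_fixed()
are hypothetical names, and every frame is modeled as a saved bp followed
directly by a return address. In this model, starting from the value of
the saved bp skips the innermost record, whose return address is the one
pointing into __schedule(); starting from a pointer to the saved bp
keeps it.

```c
#include <stddef.h>

/* One frame-pointer record: saved caller bp, then a return address. */
struct frame_rec {
	struct frame_rec *bp;
	unsigned long ret_addr;
};

/* Walk the chain from 'start', storing return addresses; returns the count. */
size_t unwind(const struct frame_rec *start, unsigned long *out, size_t max)
{
	size_t n = 0;

	while (start && n < max) {
		out[n++] = start->ret_addr;
		start = start->bp;
	}
	return n;
}

/* Buggy start: the *value* of the saved bp, which skips the innermost record. */
const struct frame_rec *start_buggy(const struct frame_rec *inactive)
{
	return inactive->bp;
}

/* Fixed start: a *pointer to* the saved bp, so the innermost record is walked. */
const struct frame_rec *start_fixed(const struct frame_rec *inactive)
{
	return inactive;
}
```

With a three-record chain, the walk from start_fixed() yields three
return addresses, while the walk from start_buggy() yields only the
outer two.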

-- 
Josh

* Re: [PATCH] Documentation/livepatch: remove the limitation for schedule() patching
  2017-01-09 14:54     ` Josh Poimboeuf
@ 2017-01-10 10:32       ` Miroslav Benes
  0 siblings, 0 replies; 8+ messages in thread
From: Miroslav Benes @ 2017-01-10 10:32 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: jeyu, jikos, pmladek, corbet, live-patching, linux-doc, linux-kernel

On Mon, 9 Jan 2017, Josh Poimboeuf wrote:

> On Mon, Jan 09, 2017 at 01:50:19PM +0100, Miroslav Benes wrote:
> > There is still one thing which I don't understand: why is __schedule()
> > (patched or the original) not on the stack? The actual "sleep"
> > should happen in __switch_to_asm(), which is a real function now. And there
> > is a call to __switch_to_asm() in __schedule(). __schedule() thus should be
> > on the stack, shouldn't it? What am I missing? __switch_to_asm() pushes %rbp
> > on the stack...
> 
> Ah, this is an unwinder bug.  get_frame_pointer() needs to be fixed so
> that for an inactive task it returns a pointer to inactive_task_frame.bp
> rather than the value of inactive_task_frame.bp itself.  Will fix it.

And it works with the fix. Thanks.

Miroslav

* Re: [PATCH] Documentation/livepatch: remove the limitation for schedule() patching
  2017-01-06 14:00 [PATCH] Documentation/livepatch: remove the limitation for schedule() patching Miroslav Benes
  2017-01-06 15:01 ` Petr Mladek
  2017-01-06 19:13 ` Josh Poimboeuf
@ 2017-01-11  1:33 ` Jiri Kosina
  2 siblings, 0 replies; 8+ messages in thread
From: Jiri Kosina @ 2017-01-11  1:33 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: jpoimboe, jeyu, pmladek, corbet, live-patching, linux-doc, linux-kernel

On Fri, 6 Jan 2017, Miroslav Benes wrote:

> The Limitations section of the documentation says that anything inlined
> into __schedule() cannot be livepatched. This was true until the 4.9
> kernel. Thanks to commit 0100301bfdf5 ("sched/x86: Rewrite the
> switch_to() code") from Brian Gerst, there is now a __switch_to_asm()
> function (implemented in assembly) that is called properly from
> context_switch(). RIP is thus saved on the stack, and a task returns to
> the proper version of __schedule() and related functions.
> 
> Of course, __switch_to_asm() itself is not patchable for the reason
> described in the section. But it has no __fentry__ call, and I cannot
> imagine a reason to patch it anyway.
> 
> Therefore, remove the paragraphs from the section.
> 
> Signed-off-by: Miroslav Benes <mbenes@suse.cz>
> ---
> FWIW, I also tested this on top of the consistency model patch set to be
> sure. I patched the schedule() function, which calls __schedule()
> (__schedule() itself cannot be patched directly due to its notrace
> attribute). It works well except...
> 
> 1. the patching process does not finish, because many tasks sleep in
> schedule. STOP/CONT signal does not help. I'll investigate.
> 
> 2. reversion of the process does not work as expected. The kernel
> crashes after the removal of the module. A task very likely slept in
> schedule and was not migrated properly. It might be because of the races
> in klp_reverse_transition() described by Petr, or might be somewhere
> else. I'll look into it.

Applied, thanks.

-- 
Jiri Kosina
SUSE Labs
