All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/21] kGraft
@ 2014-06-25 11:05 Jiri Slaby
  2014-06-25 12:54 ` One Thousand Gnomes
  2014-07-02 12:04 ` kGraft to -next [was: 00/21 kGraft] Jiri Slaby
  0 siblings, 2 replies; 19+ messages in thread
From: Jiri Slaby @ 2014-06-25 11:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: tj, rostedt, mingo, akpm, andi, paulmck, pavel, jirislaby,
	Vojtech Pavlik, Michael Matz, Jiri Kosina, Jiri Slaby

Hi,

this is a repost of the second round of RFC on kGraft, the linux
kernel online patching developed at SUSE. This repost only widened the
target audience for broader review, no code change happened.

Please speak up now (or be silent till the next merge window). That
is, if there are no objections, we plan pushing the tree into -next
and asking Linus in the next merge window for comments.

The patches are posted as a reply to this email and can be also
obtained as a whole tree from:
https://git.kernel.org/cgit/linux/kernel/git/jirislaby/kgraft.git/log/?h=kgraft

Jiri Kosina (6):
  kgr: initial code
  kgr: x86: refuse to build without fentry support
  kgr: add procfs interface for per-process 'kgr_in_progress'
  kgr: make a per-process 'in progress' flag a single bit
  kgr: expose global 'in_progress' state through procfs
  kgr: x86: optimize handling of CPU-bound tasks

Jiri Slaby (14):
  ftrace: Add function to find fentry of function
  ftrace: Make ftrace_is_dead available globally
  kgr: add testing kgraft patch
  kgr: update Kconfig documentation
  kgr: add Documentation
  kgr: trigger the first check earlier
  kgr: sched.h, introduce kgr_task_safe helper
  kgr: mark task_safe in some kthreads
  kgr: kthreads support
  kgr: handle irqs
  kgr: add MAINTAINERS entry
  kgr: add support for missing functions
  kgr: exercise non-present function
  kgr: fix race of stub and patching

Libor Pechacek (1):
  kgr: rephrase the "kGraft failed" message

 Documentation/kgraft.txt           |  44 ++++
 MAINTAINERS                        |   9 +
 arch/x86/Kconfig                   |   2 +
 arch/x86/include/asm/kgraft.h      |  61 ++++++
 arch/x86/include/asm/thread_info.h |   6 +-
 arch/x86/kernel/entry_64.S         |   9 +
 drivers/base/devtmpfs.c            |   1 +
 drivers/scsi/scsi_error.c          |   2 +
 drivers/usb/core/hub.c             |   4 +-
 fs/jbd2/journal.c                  |   2 +
 fs/notify/mark.c                   |   5 +-
 fs/proc/base.c                     |  11 +
 include/linux/freezer.h            |   2 +
 include/linux/ftrace.h             |   4 +
 include/linux/kgraft.h             |  90 ++++++++
 include/linux/sched.h              |   9 +
 kernel/Kconfig.kgraft              |  10 +
 kernel/Makefile                    |   1 +
 kernel/hung_task.c                 |   5 +-
 kernel/kgraft.c                    | 430 +++++++++++++++++++++++++++++++++++++
 kernel/kthread.c                   |   3 +
 kernel/rcu/tree.c                  |   6 +-
 kernel/rcu/tree_plugin.h           |  10 +-
 kernel/smpboot.c                   |   2 +
 kernel/trace/ftrace.c              |  30 +++
 kernel/trace/trace.h               |   2 -
 kernel/workqueue.c                 |   3 +
 mm/huge_memory.c                   |   1 +
 net/bluetooth/rfcomm/core.c        |   2 +
 samples/Kconfig                    |   8 +
 samples/Makefile                   |   3 +-
 samples/kgraft/Makefile            |   1 +
 samples/kgraft/kgraft_patcher.c    |  99 +++++++++
 33 files changed, 864 insertions(+), 13 deletions(-)
 create mode 100644 Documentation/kgraft.txt
 create mode 100644 arch/x86/include/asm/kgraft.h
 create mode 100644 include/linux/kgraft.h
 create mode 100644 kernel/Kconfig.kgraft
 create mode 100644 kernel/kgraft.c
 create mode 100644 samples/kgraft/Makefile
 create mode 100644 samples/kgraft/kgraft_patcher.c

-- 
2.0.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 00/21] kGraft
  2014-06-25 11:05 [PATCH 00/21] kGraft Jiri Slaby
@ 2014-06-25 12:54 ` One Thousand Gnomes
  2014-06-25 15:54   ` Jiri Kosina
  2014-07-02 12:04 ` kGraft to -next [was: 00/21 kGraft] Jiri Slaby
  1 sibling, 1 reply; 19+ messages in thread
From: One Thousand Gnomes @ 2014-06-25 12:54 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: linux-kernel, tj, rostedt, mingo, akpm, andi, paulmck, pavel,
	jirislaby, Vojtech Pavlik, Michael Matz, Jiri Kosina

On Wed, 25 Jun 2014 13:05:29 +0200
Jiri Slaby <jslaby@suse.cz> wrote:

> Hi,
> 
> this is a repost of the second round of RFC on kGraft, the linux
> kernel online patching developed at SUSE. This repost only widened the
> target audience for broader review, no code change happened.

-		wait_event_freezable(khubd_wait,
+		wait_event_freezable(khubd_wait,
  ({ kgr_task_safe(current);

The changes are somewhat ugly with all the kgraft crap leaking into plces
like jbd and freezer and usb. That says to me its not well isolated and
there must be a better way of hiding that kgr stuff so we don't have to
kepe putting more code into all the kthreads, and inevitably missing them
or have people getting it wrong.

You also wake up all the kthreads when some of them might not expect to
be woken and may not check for the case of a bogus wake.

You add instructions to various hotpaths (despite the CONFIG comment).
Have you done timing analysis of their impact when turned on ?

Can you explain a bit more about why the path you've chosen was used as
opposed to using the power management freeze/resume path. Would that not
be a lot less intrusive ?

Don't get me wrong - I'd like a good working ksplice/graft !

Alan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 00/21] kGraft
  2014-06-25 12:54 ` One Thousand Gnomes
@ 2014-06-25 15:54   ` Jiri Kosina
  2014-06-26  5:50     ` Vojtech Pavlik
  0 siblings, 1 reply; 19+ messages in thread
From: Jiri Kosina @ 2014-06-25 15:54 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Jiri Slaby, linux-kernel, tj, rostedt, mingo, akpm, andi,
	paulmck, pavel, jirislaby, Vojtech Pavlik, Michael Matz

On Wed, 25 Jun 2014, One Thousand Gnomes wrote:

> > this is a repost of the second round of RFC on kGraft, the linux
> > kernel online patching developed at SUSE. This repost only widened the
> > target audience for broader review, no code change happened.
> 
> -		wait_event_freezable(khubd_wait,
> +		wait_event_freezable(khubd_wait,
>   ({ kgr_task_safe(current);
> 
> The changes are somewhat ugly with all the kgraft crap leaking into plces
> like jbd and freezer and usb. That says to me its not well isolated and
> there must be a better way of hiding that kgr stuff so we don't have to
> kepe putting more code into all the kthreads, and inevitably missing them
> or have people getting it wrong.

Tejun commented on this very issue during the first RFC submission a 
couple weeks ago. Part of his proposal was actually to take this as a good 
pretext to review the existing kernel threads, as the idea is that a big 
portion of those could easily be converted to workqueues.

It of course has its own implications, such as not being able to 
prioritize that easily, but there is probably a lot of low-hanging fruits 
where driver authors and their grandmas have been creating kthreads where 
workqueue would suffice.

But it's a very long term goal, and it's probably impractical to make 
kGraft dependent on it.

> You add instructions to various hotpaths (despite the CONFIG comment).

Kernel entry is hidden behind _TIF_WORK_SYSCALL_ENTRY and 
_TIF_ALLWORK_MASK, so it doesn't add any overhead if kGraft is not 
enabled. What other hot paths do you refer to?

> Don't get me wrong - I'd like a good working ksplice/graft !

Thanks Alan,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 00/21] kGraft
  2014-06-25 15:54   ` Jiri Kosina
@ 2014-06-26  5:50     ` Vojtech Pavlik
  0 siblings, 0 replies; 19+ messages in thread
From: Vojtech Pavlik @ 2014-06-26  5:50 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Jiri Slaby, linux-kernel, tj, rostedt, mingo, akpm, andi,
	paulmck, pavel, jirislaby, Michael Matz, Jiri Kosina

On Wed, Jun 25, 2014 at 05:54:50PM +0200, Jiri Kosina wrote:
> > -		wait_event_freezable(khubd_wait,
> > +		wait_event_freezable(khubd_wait,
> >   ({ kgr_task_safe(current);
> > 
> > The changes are somewhat ugly with all the kgraft crap leaking into plces
> > like jbd and freezer and usb. That says to me its not well isolated and
> > there must be a better way of hiding that kgr stuff so we don't have to
> > kepe putting more code into all the kthreads, and inevitably missing them
> > or have people getting it wrong.
> 
> Tejun commented on this very issue during the first RFC submission a 
> couple weeks ago. Part of his proposal was actually to take this as a good 
> pretext to review the existing kernel threads, as the idea is that a big 
> portion of those could easily be converted to workqueues.

In the past few years there was a significant proliferation of kernel
threads just for cases where something needs to be done from a process
context now and then.

This is underlined by the fact that on an average installation there are
more kernel threads than real userspace processes.

Converting these bits to workqueues would then also take care of their
interaction with freezer, kthread_stop. The main loop code wouldn't have
to be replicated over and over with slight variations. kgr_taks_safe
would then plug into that rather elegantly.

In the absence of this rework, kGraft hijacks the freezer and
kthread_stop to pinpoint the 'end' of the main loop of most kernel
threads and annotates the rest that doesn't handle freezing or stopping
with kgr_task_safe directly.


> It of course has its own implications, such as not being able to 
> prioritize that easily, but there is probably a lot of low-hanging fruits 
> where driver authors and their grandmas have been creating kthreads where 
> workqueue would suffice.

Indeed, on the other hand, there are enough workqueues today and on
realtime systems the need for prioritizing them exists already.

> But it's a very long term goal, and it's probably impractical to make 
> kGraft dependent on it.

Should the kernel thread changes turn to be a big issue, we could do the
initial submission without them and depend on the kthread->workqueue
rework. Once we add support for multiple patches in progress, the fact
that we don't support kernel threads would not be as painful.
 
> > You add instructions to various hotpaths (despite the CONFIG
> > comment).
> 
> Kernel entry is hidden behind _TIF_WORK_SYSCALL_ENTRY and
> _TIF_ALLWORK_MASK, so it doesn't add any overhead if kGraft is not
> enabled. What other hot paths do you refer to?

As Jiří says, I don't see any hot path where kGraft would add
instructions: Kernel exit and entry are handled outside the hot path
in the _TIF_WORK_SYSCALL_ENTRY and _TIF_ALLWORK_MASK handlers. [patch 15/21]

It adds instructions into the main loops of kernel threads, but those
can hardly be called hot paths as they mostly sleep for long times.

-- 
Vojtech Pavlik
Director SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* kGraft to -next [was: 00/21 kGraft]
  2014-06-25 11:05 [PATCH 00/21] kGraft Jiri Slaby
  2014-06-25 12:54 ` One Thousand Gnomes
@ 2014-07-02 12:04 ` Jiri Slaby
  2014-07-02 12:30   ` Tejun Heo
  2014-07-03  0:26   ` Stephen Rothwell
  1 sibling, 2 replies; 19+ messages in thread
From: Jiri Slaby @ 2014-07-02 12:04 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: linux-kernel, tj, rostedt, mingo, akpm, andi, paulmck, pavel,
	jirislaby, Vojtech Pavlik, Michael Matz, Jiri Kosina

On 06/25/2014 01:05 PM, Jiri Slaby wrote:
> Hi,
> 
> this is a repost of the second round of RFC on kGraft, the linux
> kernel online patching developed at SUSE. This repost only widened the
> target audience for broader review, no code change happened.
> 
> Please speak up now (or be silent till the next merge window). That
> is, if there are no objections, we plan pushing the tree into -next
> and asking Linus in the next merge window for comments.
> 
> The patches are posted as a reply to this email and can be also
> obtained as a whole tree from:
> https://git.kernel.org/cgit/linux/kernel/git/jirislaby/kgraft.git/log/?h=kgraft

Stephen,

may I ask you to add the kGraft tree to -next?

git://git.kernel.org/pub/scm/linux/kernel/git/jirislaby/kgraft.git#kgraft

Thank you.

 Documentation/kgraft.txt           |  44 ++++
 MAINTAINERS                        |   9 +
 arch/x86/Kconfig                   |   2 +
 arch/x86/include/asm/kgraft.h      |  61 +++++
 arch/x86/include/asm/thread_info.h |   6 +-
 arch/x86/kernel/entry_64.S         |   9 +
 drivers/base/devtmpfs.c            |   1 +
 drivers/scsi/scsi_error.c          |   2 +
 drivers/usb/core/hub.c             |   4 +-
 fs/jbd2/journal.c                  |   2 +
 fs/notify/mark.c                   |   5 +-
 fs/proc/base.c                     |  11 +
 include/linux/freezer.h            |   2 +
 include/linux/ftrace.h             |   4 +
 include/linux/kgraft.h             | 130 ++++++++++
 include/linux/sched.h              |   9 +
 kernel/Kconfig.kgraft              |  12 +
 kernel/Makefile                    |   1 +
 kernel/hung_task.c                 |   5 +-
 kernel/kgraft.c                    | 477 ++++++++++++++++++++++++++++++
 kernel/kgraft_files.c              | 150 ++++++++++++
 kernel/kthread.c                   |   3 +
 kernel/rcu/tree.c                  |   6 +-
 kernel/rcu/tree_plugin.h           |  10 +-
 kernel/smpboot.c                   |   2 +
 kernel/trace/ftrace.c              |  30 +++
 kernel/trace/trace.h               |   2 -
 kernel/workqueue.c                 |   3 +
 mm/huge_memory.c                   |   6 +-
 net/bluetooth/rfcomm/core.c        |   2 +
 samples/Kconfig                    |   8 +
 samples/Makefile                   |   3 +-
 samples/kgraft/Makefile            |   1 +
 samples/kgraft/kgraft_patcher.c    |  97 ++++++++
 34 files changed, 1104 insertions(+), 15 deletions(-)
 create mode 100644 Documentation/kgraft.txt
 create mode 100644 arch/x86/include/asm/kgraft.h
 create mode 100644 include/linux/kgraft.h
 create mode 100644 kernel/Kconfig.kgraft
 create mode 100644 kernel/kgraft.c
 create mode 100644 kernel/kgraft_files.c
 create mode 100644 samples/kgraft/Makefile
 create mode 100644 samples/kgraft/kgraft_patcher.c

-- 
js
suse labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kGraft to -next [was: 00/21 kGraft]
  2014-07-02 12:04 ` kGraft to -next [was: 00/21 kGraft] Jiri Slaby
@ 2014-07-02 12:30   ` Tejun Heo
  2014-07-02 12:47     ` One Thousand Gnomes
                       ` (2 more replies)
  2014-07-03  0:26   ` Stephen Rothwell
  1 sibling, 3 replies; 19+ messages in thread
From: Tejun Heo @ 2014-07-02 12:30 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Stephen Rothwell, linux-kernel, rostedt, mingo, akpm, andi,
	paulmck, pavel, jirislaby, Vojtech Pavlik, Michael Matz,
	Jiri Kosina

Hello,

On Wed, Jul 02, 2014 at 02:04:38PM +0200, Jiri Slaby wrote:
> On 06/25/2014 01:05 PM, Jiri Slaby wrote:
...
> > https://git.kernel.org/cgit/linux/kernel/git/jirislaby/kgraft.git/log/?h=kgraft
> 
> Stephen,
> 
> may I ask you to add the kGraft tree to -next?
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/jirislaby/kgraft.git#kgraft

Do we have consensus on the approach?  I personally really don't like
the fact that it's adding another aspect to kthread management which
is difficult to get right and nearly impossible to verify
automatically.

IIUC, there are three similar solutions.  What are the pros and cons
of each?  Can we combine the different approaches?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kGraft to -next [was: 00/21 kGraft]
  2014-07-02 12:30   ` Tejun Heo
@ 2014-07-02 12:47     ` One Thousand Gnomes
  2014-07-02 13:01       ` Jiri Kosina
  2014-07-02 12:47     ` Steven Rostedt
  2014-07-02 16:28     ` Josh Poimboeuf
  2 siblings, 1 reply; 19+ messages in thread
From: One Thousand Gnomes @ 2014-07-02 12:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jiri Slaby, Stephen Rothwell, linux-kernel, rostedt, mingo, akpm,
	andi, paulmck, pavel, jirislaby, Vojtech Pavlik, Michael Matz,
	Jiri Kosina

On Wed, 2 Jul 2014 08:30:02 -0400
Tejun Heo <tj@kernel.org> wrote:

> Hello,
> 
> On Wed, Jul 02, 2014 at 02:04:38PM +0200, Jiri Slaby wrote:
> > On 06/25/2014 01:05 PM, Jiri Slaby wrote:
> ...
> > > https://git.kernel.org/cgit/linux/kernel/git/jirislaby/kgraft.git/log/?h=kgraft
> > 
> > Stephen,
> > 
> > may I ask you to add the kGraft tree to -next?
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/jirislaby/kgraft.git#kgraft
> 
> Do we have consensus on the approach?  I personally really don't like
> the fact that it's adding another aspect to kthread management which
> is difficult to get right and nearly impossible to verify
> automatically.

Nor me. I don't see why it can't use the kernel freeze functionality ?

Alan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kGraft to -next [was: 00/21 kGraft]
  2014-07-02 12:30   ` Tejun Heo
  2014-07-02 12:47     ` One Thousand Gnomes
@ 2014-07-02 12:47     ` Steven Rostedt
  2014-07-02 16:28     ` Josh Poimboeuf
  2 siblings, 0 replies; 19+ messages in thread
From: Steven Rostedt @ 2014-07-02 12:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jiri Slaby, Stephen Rothwell, linux-kernel, mingo, akpm, andi,
	paulmck, pavel, jirislaby, Vojtech Pavlik, Michael Matz,
	Jiri Kosina

On Wed, 2 Jul 2014 08:30:02 -0400
Tejun Heo <tj@kernel.org> wrote:

> Hello,
> 
> On Wed, Jul 02, 2014 at 02:04:38PM +0200, Jiri Slaby wrote:
> > On 06/25/2014 01:05 PM, Jiri Slaby wrote:
> ...
> > > https://git.kernel.org/cgit/linux/kernel/git/jirislaby/kgraft.git/log/?h=kgraft
> > 
> > Stephen,
> > 
> > may I ask you to add the kGraft tree to -next?
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/jirislaby/kgraft.git#kgraft
> 
> Do we have consensus on the approach?  I personally really don't like
> the fact that it's adding another aspect to kthread management which
> is difficult to get right and nearly impossible to verify
> automatically.
> 
> IIUC, there are three similar solutions.  What are the pros and cons
> of each?  Can we combine the different approaches?

Has this been agreed on to be accepted yet? I don't believe so.

linux-next is for code that will be going into the next release of the
kernel. Not for developmental code or code that is still being
discussed.

-- Steve


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kGraft to -next [was: 00/21 kGraft]
  2014-07-02 12:47     ` One Thousand Gnomes
@ 2014-07-02 13:01       ` Jiri Kosina
  2014-07-02 13:30         ` Tejun Heo
  0 siblings, 1 reply; 19+ messages in thread
From: Jiri Kosina @ 2014-07-02 13:01 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Tejun Heo, Jiri Slaby, Stephen Rothwell, linux-kernel, rostedt,
	mingo, akpm, andi, paulmck, pavel, jirislaby, Vojtech Pavlik,
	Michael Matz

On Wed, 2 Jul 2014, One Thousand Gnomes wrote:

> > > > https://git.kernel.org/cgit/linux/kernel/git/jirislaby/kgraft.git/log/?h=kgraft
> > > 
> > > Stephen,
> > > 
> > > may I ask you to add the kGraft tree to -next?
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/jirislaby/kgraft.git#kgraft
> > 
> > Do we have consensus on the approach?  I personally really don't like
> > the fact that it's adding another aspect to kthread management which
> > is difficult to get right and nearly impossible to verify
> > automatically.
> 
> Nor me. I don't see why it can't use the kernel freeze functionality ?

Well, we are, see the patch no. 9. It all boils down basically to

[ ... ]
diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index 7fd81b8c4897..e08c3bef251b 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -61,6 +61,8 @@ static inline bool try_to_freeze_unsafe(void)
 
 static inline bool try_to_freeze(void)
 {
+       kgr_task_safe(current);
+
        if (!(current->flags & PF_NOFREEZE))
                debug_check_no_locks_held();
        return try_to_freeze_unsafe();
[ ... ]


Then there are non-freezable kernel threads. We are currently handling 
those explicitly, but I agree it can be argued that they might not require 
any special handling at all, given they claimed themselves explicitly 
non-freezable.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: kGraft to -next [was: 00/21 kGraft]
  2014-07-02 13:01       ` Jiri Kosina
@ 2014-07-02 13:30         ` Tejun Heo
  2014-07-05 20:04           ` Jiri Kosina
  0 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2014-07-02 13:30 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: One Thousand Gnomes, Jiri Slaby, Stephen Rothwell, linux-kernel,
	rostedt, mingo, akpm, andi, paulmck, pavel, jirislaby,
	Vojtech Pavlik, Michael Matz

Hello,

On Wed, Jul 02, 2014 at 03:01:16PM +0200, Jiri Kosina wrote:
>  static inline bool try_to_freeze(void)
>  {
> +       kgr_task_safe(current);
> +
>         if (!(current->flags & PF_NOFREEZE))
>                 debug_check_no_locks_held();
>         return try_to_freeze_unsafe();

Heh, I'm totally confused now.  Why is this correct?  What guarantees
that context is not carried across try_to_freeze()?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kGraft to -next [was: 00/21 kGraft]
  2014-07-02 12:30   ` Tejun Heo
  2014-07-02 12:47     ` One Thousand Gnomes
  2014-07-02 12:47     ` Steven Rostedt
@ 2014-07-02 16:28     ` Josh Poimboeuf
  2 siblings, 0 replies; 19+ messages in thread
From: Josh Poimboeuf @ 2014-07-02 16:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jiri Slaby, Stephen Rothwell, linux-kernel, rostedt, mingo, akpm,
	andi, paulmck, pavel, jirislaby, Vojtech Pavlik, Michael Matz,
	Jiri Kosina, Seth Jennings, One Thousand Gnomes

On Wed, Jul 02, 2014 at 08:30:02AM -0400, Tejun Heo wrote:
> Hello,
> 
> On Wed, Jul 02, 2014 at 02:04:38PM +0200, Jiri Slaby wrote:
> > On 06/25/2014 01:05 PM, Jiri Slaby wrote:
> ...
> > > https://git.kernel.org/cgit/linux/kernel/git/jirislaby/kgraft.git/log/?h=kgraft
> > 
> > Stephen,
> > 
> > may I ask you to add the kGraft tree to -next?
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/jirislaby/kgraft.git#kgraft
> 
> Do we have consensus on the approach?  I personally really don't like
> the fact that it's adding another aspect to kthread management which
> is difficult to get right and nearly impossible to verify
> automatically.
> 
> IIUC, there are three similar solutions.  What are the pros and cons
> of each?  Can we combine the different approaches?

Please don't forget about kpatch.  The most recent version was posted
here:

  https://lkml.org/lkml/2014/5/1/273

We've made a ton of improvements since then, so I'll probably post a new
patch set soon.

kpatch advantages:

* 100% self-contained in its own module [1]

* Doesn't rely on changing all the kthreads

* Patch is applied atomically using stop_machine(), so it's safer with
  respect to data semantic changes

* Patching atomically also makes it much easier to analyze a patch to
  determine whether it's safe for live patching

* Already supports many advanced features which kGraft is lacking:
  - patched functions can access non-exported symbols, e.g. static
    variables
  - safe unpatching
  - module patching (and deferred module patching)
  - atomic patch replacement
  - supports atomic load/unload user hook functions
  - proper duplicate symbol handling
  - address verification sanity checks
  - sophisticated user space tools for analyzing and converting source
    patches to binary patch modules
  - ability to properly deal with many special sections (__bug_table,
    .data..percpu, etc)

kpatch disadvantages:

* Can't yet patch functions which are always active (schedule(),
  sys_poll(), etc).  We're currently working on ways to overcome this
  limitation.  One way is to allow the user to skip the backtrace check
  for those patches which don't change data semantics (which, for
  security fixes, should be most patches).  We also have some other
  ideas brewing...

* stop_machine() latency.  We've found that stop_machine() is still
  pretty fast.  IIRC, we measured ~1ms on an idle system and ~40ms on a
  heavily loaded 16 CPU system.

* Currently we don't freeze kernel threads.  Instead we just put them to
  sleep.  We _could_ freeze them, but I think it needs more discussion.
  It's definitely not a cure-all because you still have to worry about
  user threads.

  With our current approach, when analyzing whether patches are safe to
  apply live, we assume that all kernel and user threads will be asleep.
  We make no assumptions that kernel threads will be frozen.  In general
  we avoid changing data and data semantics as much as possible, so it
  shouldn't matter in most cases.  Personally I haven't yet run into a
  case where freezing kernel threads would have made a patch become
  "safe".


[1] https://github.com/dynup/kpatch/blob/master/kmod/core/core.c

-- 
Josh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kGraft to -next [was: 00/21 kGraft]
  2014-07-02 12:04 ` kGraft to -next [was: 00/21 kGraft] Jiri Slaby
  2014-07-02 12:30   ` Tejun Heo
@ 2014-07-03  0:26   ` Stephen Rothwell
  1 sibling, 0 replies; 19+ messages in thread
From: Stephen Rothwell @ 2014-07-03  0:26 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: linux-kernel, tj, rostedt, mingo, akpm, andi, paulmck, pavel,
	jirislaby, Vojtech Pavlik, Michael Matz, Jiri Kosina,
	One Thousand Gnomes, Josh Poimboeuf

[-- Attachment #1: Type: text/plain, Size: 355 bytes --]

Hi Jiri,

On Wed, 02 Jul 2014 14:04:38 +0200 Jiri Slaby <jslaby@suse.cz> wrote:
>
> may I ask you to add the kGraft tree to -next?
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/jirislaby/kgraft.git#kgraft

Given the ongoing discussion, I think this needs to wait ...

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kGraft to -next [was: 00/21 kGraft]
  2014-07-02 13:30         ` Tejun Heo
@ 2014-07-05 20:04           ` Jiri Kosina
  2014-07-05 20:36             ` Tejun Heo
  0 siblings, 1 reply; 19+ messages in thread
From: Jiri Kosina @ 2014-07-05 20:04 UTC (permalink / raw)
  To: Tejun Heo
  Cc: One Thousand Gnomes, Jiri Slaby, Stephen Rothwell, linux-kernel,
	rostedt, mingo, Andrew Morton, andi, paulmck, Pavel Machek,
	jirislaby, Vojtech Pavlik, Michael Matz

On Wed, 2 Jul 2014, Tejun Heo wrote:

> >  static inline bool try_to_freeze(void)
> >  {
> > +       kgr_task_safe(current);
> > +
> >         if (!(current->flags & PF_NOFREEZE))
> >                 debug_check_no_locks_held();
> >         return try_to_freeze_unsafe();
> 
> Heh, I'm totally confused now.  Why is this correct?  What guarantees
> that context is not carried across try_to_freeze()?

I think we need to take a step back now, and ask ourselves a question 
"What is the actual goal here?".

What we need is to have a defined point in execution where we can draw a 
line between "old" and "new" universes. For processess that are crossing 
the userspace/kernelspace boundary, the obvious choice, that covers most 
of the use-cases, has been made. There are still scenarios where this 
aproach can't be just-blindly-applied(TM) for various reasons (changing 
lock order might cause deadlocks, there are cases where state is lingering 
between two user <-> kernel transitions, etc). So we'll need to provide 
guidelines for kGraft patch writers anyway.

The same holds for the kernel threads -- until all (or most of) the 
kthreads are converted to workqueues, the obivous choice, that should 
cover most of the use-cases, has been made.

But manual/human inspection is absolutely unavoidably necessary in any 
case.

Please keep in mind that this is designed for fixes that need immediate 
response (getting bounds checking right, adding an extra check, adding a 
missing lock, etc -- please see my previous mail on this topic in the old 
thread). It's absolutely by design not intended for implementing whole new 
features or exchanging the whole kernel on the fly; there are other 
solutions for that (such as the criu-based thing). As such, we tend to 
interfere with the rest of the kernel as little as possible, but it 
inadverently brings drawbacks in the form of putting burden of more work 
to the actual kGraft patch writers. I don't see that as a bad thing.

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kGraft to -next [was: 00/21 kGraft]
  2014-07-05 20:04           ` Jiri Kosina
@ 2014-07-05 20:36             ` Tejun Heo
  2014-07-05 20:49               ` Jiri Kosina
  0 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2014-07-05 20:36 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: One Thousand Gnomes, Jiri Slaby, Stephen Rothwell, linux-kernel,
	rostedt, mingo, Andrew Morton, andi, paulmck, Pavel Machek,
	jirislaby, Vojtech Pavlik, Michael Matz

Hello,

On Sat, Jul 05, 2014 at 10:04:52PM +0200, Jiri Kosina wrote:
> What we need is to have a defined point in execution where we can draw a 
> line between "old" and "new" universes. For processess that are crossing 
> the userspace/kernelspace boundary, the obvious choice, that covers most 
> of the use-cases, has been made. There are still scenarios where this 
> aproach can't be just-blindly-applied(TM) for various reasons (changing 
> lock order might cause deadlocks, there are cases where state is lingering 

Sure, if something breaks work due to semantic differences, there
isn't anything we can do.

> between two user <-> kernel transitions, etc). So we'll need to provide 
> guidelines for kGraft patch writers anyway.
> 
> The same holds for the kernel threads -- until all (or most of) the 
> kthreads are converted to workqueues, the obivous choice, that should 
> cover most of the use-cases, has been made.

But, this is different.  This is the mechanism itself being flaky and
buggy.  This can make things fail not because the patch itself is
semantically wrong but because the mechanism stopped the kernel at the
wrong place.  The mechanism is simply broken and it might as well
declare that patching whatever kthreads are running isn't supported.

> But manual/human inspection is absolutely unavoidably necessary in any 
> case.

This is extremely tricky to inspect and likely to just work in most,
but not all, cases when it goes wrong just out of sheer luck.

> Please keep in mind that this is designed for fixes that need immediate 
> response (getting bounds checking right, adding an extra check, adding a 
> missing lock, etc -- please see my previous mail on this topic in the old 
> thread). It's absolutely by design not intended for implementing whole new 
> features or exchanging the whole kernel on the fly; there are other 
> solutions for that (such as the criu-based thing). As such, we tend to 
> interfere with the rest of the kernel as little as possible, but it 
> inadverently brings drawbacks in the form of putting burden of more work 
> to the actual kGraft patch writers. I don't see that as a bad thing.

I'm not saying this must be able to do everything but it shouldn't be
voodoo either.  Freezer points *can* coincide with what kgraft
requires but it may not too.  Why claim that freezing points are safe
as stopping points when they aren't?  That doesn't make any sense to
me.  If kthread can't be safely supported now, that's fine but then
make it clear that functions which may be executed by kthreads aren't
supported.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kGraft to -next [was: 00/21 kGraft]
  2014-07-05 20:36             ` Tejun Heo
@ 2014-07-05 20:49               ` Jiri Kosina
  2014-07-05 21:04                 ` Tejun Heo
  2014-07-29 14:05                 ` Jiri Kosina
  0 siblings, 2 replies; 19+ messages in thread
From: Jiri Kosina @ 2014-07-05 20:49 UTC (permalink / raw)
  To: Tejun Heo, One Thousand Gnomes
  Cc: Jiri Slaby, Stephen Rothwell, linux-kernel, rostedt, mingo,
	Andrew Morton, andi, paulmck, Pavel Machek, jirislaby,
	Vojtech Pavlik, Michael Matz

On Sat, 5 Jul 2014, Tejun Heo wrote:

> > What we need is to have a defined point in execution where we can draw a 
> > line between "old" and "new" universes. For processess that are crossing 
> > the userspace/kernelspace boundary, the obvious choice, that covers most 
> > of the use-cases, has been made. There are still scenarios where this 
> > aproach can't be just-blindly-applied(TM) for various reasons (changing 
> > lock order might cause deadlocks, there are cases where state is lingering 
> 
> Sure, if something breaks work due to semantic differences, there
> isn't anything we can do.
[ ... ]
> > The same holds for the kernel threads -- until all (or most of) the 
> > kthreads are converted to workqueues, the obivous choice, that should 
> > cover most of the use-cases, has been made.
> 
> But, this is different.  

Quite frankly, I have to say that I don't personally see *that* big 
difference. In both cases we are making assumptions about semantics, and 
are trying to get "as close as possible" to the point in time where the 
set of assumptions can be made as minimal as possible.

With userspace thread, this is kernel/userspace boundary. With kthread, 
this is starting of new iteration of the main loop / try_to_freeze().

We were not able to come up with anything better. Alan, you were 
stating "I don't see why it can't use the kernel freeze functionality?". 
What exactly was your suggestion, if not this, please?

> This is the mechanism itself being flaky and buggy.  This can make 
> things fail not because the patch itself is semantically wrong but 
> because the mechanism stopped the kernel at the wrong place.  

Well, to be precise, we are not "stopping" the kernel at all.

> If kthread can't be safely supported now, that's fine but then make it 
> clear that functions which may be executed by kthreads aren't supported.

So one of the aproaches implied by this would be first merging a light 
version of kGraft which doesn't support kthreads, and working towards a 
solution for kthreads as well (which might be tangential to kGraft; if 
most of the kthreads get converted to workqueues, it's a win-win 
situation anyway); would such kGraft-light get your Ack then? :)

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kGraft to -next [was: 00/21 kGraft]
  2014-07-05 20:49               ` Jiri Kosina
@ 2014-07-05 21:04                 ` Tejun Heo
  2014-07-05 21:06                   ` Jiri Kosina
  2014-07-29 14:05                 ` Jiri Kosina
  1 sibling, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2014-07-05 21:04 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: One Thousand Gnomes, Jiri Slaby, Stephen Rothwell, linux-kernel,
	rostedt, mingo, Andrew Morton, andi, paulmck, Pavel Machek,
	jirislaby, Vojtech Pavlik, Michael Matz

Hello,

On Sat, Jul 05, 2014 at 10:49:18PM +0200, Jiri Kosina wrote:
> Quite frankly, I have to say that I don't personally see *that* big 
> difference. In both cases we are making assumptions about semantics, and 
> are trying to get "as close as possible" to the point in time where the 
> set of assumptions can be made as minimal as possible.
>
> With userspace thread, this is kernel/userspace boundary. With kthread, 
> this is starting of new iteration of the main loop / try_to_freeze().

This is really different.  With kernel/userspace boundary, if the
patch is correct, the mechanism, sans bugs, should be able to
guarantee that the patched result is correct.  With kthreads, it
can't.  The boundary between the old and new worlds might or might not
be correct.  How can they be the same?

The fact that they may coincide often can be useful as a guideline or
whatever but I'm completely against just mushing it together when it
isn't correct.  This kind of things quickly lead to ambiguous
situations where people are not sure about the specific semantics or
guarantees of the construct and implement weird voodoo code followed
by voodoo fixes.  We already had a full round of that with the kernel
freezer itself, where people thought that the freezer magically makes
PM work properly for a subsystem.  Let's please not do that again.

> > This is the mechanism itself being flaky and buggy.  This can make 
> > things fail not because the patch itself is semantically wrong but 
> > because the mechanism stopped the kernel at the wrong place.  
> 
> Well, to be precise, we are not "stopping" the kernel at all.

Sure, whatever the term is, this is the boundary that the mechanism
considers it safe to swap the code, right?  User/kernel boundary
should be able to serve that purpose.  Freezing points can't.

> > If kthread can't be safely supported now, that's fine but then make it 
> > clear that functions which may be executed by kthreads aren't supported.
> 
> So one of the aproaches implied by this would be first merging a light 
> version of kGraft which doesn't support kthreads, and working towards a 
> solution for kthreads as well (which might be tangential to kGraft; if 
> most of the kthreads get converted to workqueues, it's a win-win 
> situation anyway); would such kGraft-light get your Ack then? :)

Yes, I think that converting things over to workqueue or
kthread_worker is generally a good idea but I'm not sure I'm in the
position to ack or nack kgraft as a whole.  I am not too sure about
the capability itself (neither positive or negative) and it'd take
quite a bit of energy to scrutinize and compare all the alternatives.
It'd be awesome if people who are working on the features can talk to
each other and see whether things can be combined.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kGraft to -next [was: 00/21 kGraft]
  2014-07-05 21:04                 ` Tejun Heo
@ 2014-07-05 21:06                   ` Jiri Kosina
  2014-07-05 21:08                     ` Tejun Heo
  0 siblings, 1 reply; 19+ messages in thread
From: Jiri Kosina @ 2014-07-05 21:06 UTC (permalink / raw)
  To: Tejun Heo
  Cc: One Thousand Gnomes, Jiri Slaby, Stephen Rothwell, linux-kernel,
	rostedt, mingo, Andrew Morton, andi, paulmck, Pavel Machek,
	jirislaby, Vojtech Pavlik, Michael Matz

On Sat, 5 Jul 2014, Tejun Heo wrote:

> It'd be awesome if people who are working on the features can talk to
> each other and see whether things can be combined.

Oh, I absolutely agree; trust me, we are trying to get as much discussion 
going on as possible :) I proposed it as a Kernel Summit topic, and 
hopefully work will be done at the mini-summit proposed by Steven at 
Plumbers as well.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kGraft to -next [was: 00/21 kGraft]
  2014-07-05 21:06                   ` Jiri Kosina
@ 2014-07-05 21:08                     ` Tejun Heo
  0 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2014-07-05 21:08 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: One Thousand Gnomes, Jiri Slaby, Stephen Rothwell, linux-kernel,
	rostedt, mingo, Andrew Morton, andi, paulmck, Pavel Machek,
	jirislaby, Vojtech Pavlik, Michael Matz

On Sat, Jul 05, 2014 at 11:06:28PM +0200, Jiri Kosina wrote:
> On Sat, 5 Jul 2014, Tejun Heo wrote:
> 
> > It'd be awesome if people who are working on the features can talk to
> > each other and see whether things can be combined.
> 
> Oh, I absolutely agree; trust me, we are trying to get as much discussion 
> going on as possible :) I proposed it as a Kernel Summit topic, and 
> hopefully work will be done at the mini-summit proposed by Steven at 
> Plumbers as well.

Ah, that's awesome to hear. :)

-- 
tejun

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: kGraft to -next [was: 00/21 kGraft]
  2014-07-05 20:49               ` Jiri Kosina
  2014-07-05 21:04                 ` Tejun Heo
@ 2014-07-29 14:05                 ` Jiri Kosina
  1 sibling, 0 replies; 19+ messages in thread
From: Jiri Kosina @ 2014-07-29 14:05 UTC (permalink / raw)
  To: Tejun Heo, One Thousand Gnomes
  Cc: Jiri Slaby, Stephen Rothwell, linux-kernel, rostedt, mingo,
	Andrew Morton, andi, paulmck, Pavel Machek, jirislaby,
	Vojtech Pavlik, Michael Matz

On Sat, 5 Jul 2014, Jiri Kosina wrote:

> > > The same holds for the kernel threads -- until all (or most of) the 
> > > kthreads are converted to workqueues, the obivous choice, that should 
> > > cover most of the use-cases, has been made.
> > 
> > But, this is different.  
> 
> Quite frankly, I have to say that I don't personally see *that* big 
> difference. In both cases we are making assumptions about semantics, and 
> are trying to get "as close as possible" to the point in time where the 
> set of assumptions can be made as minimal as possible.
> 
> With userspace thread, this is kernel/userspace boundary. With kthread, 
> this is starting of new iteration of the main loop / try_to_freeze().
> 
> We were not able to come up with anything better. Alan, you were 
> stating "I don't see why it can't use the kernel freeze functionality?". 
> What exactly was your suggestion, if not this, please?

Seems like there was no further activity on this wile I was on vacation, 
so let me resurrect this.

Alan, either we are already doing what you suggested, or I misunderstood 
your proposal above. Could you please elaborate?

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2014-07-29 14:05 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-25 11:05 [PATCH 00/21] kGraft Jiri Slaby
2014-06-25 12:54 ` One Thousand Gnomes
2014-06-25 15:54   ` Jiri Kosina
2014-06-26  5:50     ` Vojtech Pavlik
2014-07-02 12:04 ` kGraft to -next [was: 00/21 kGraft] Jiri Slaby
2014-07-02 12:30   ` Tejun Heo
2014-07-02 12:47     ` One Thousand Gnomes
2014-07-02 13:01       ` Jiri Kosina
2014-07-02 13:30         ` Tejun Heo
2014-07-05 20:04           ` Jiri Kosina
2014-07-05 20:36             ` Tejun Heo
2014-07-05 20:49               ` Jiri Kosina
2014-07-05 21:04                 ` Tejun Heo
2014-07-05 21:06                   ` Jiri Kosina
2014-07-05 21:08                     ` Tejun Heo
2014-07-29 14:05                 ` Jiri Kosina
2014-07-02 12:47     ` Steven Rostedt
2014-07-02 16:28     ` Josh Poimboeuf
2014-07-03  0:26   ` Stephen Rothwell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.