All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] sched: allow preempt notifiers to self-unregister.
@ 2011-12-16 16:15 Pierre Habouzit
  2011-12-16 17:09 ` Peter Zijlstra
  2011-12-18  9:10 ` Avi Kivity
  0 siblings, 2 replies; 7+ messages in thread
From: Pierre Habouzit @ 2011-12-16 16:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Pierre Habouzit, Avi Kivity, Ingo Molnar, Peter Zijlstra

This is very annoying when unlike kvm (the sole in-tree user I know of)
the process is a plain user process the kernel doesn't really handle and
that it wants to unregister the callbacks just at its death.

The code that one would like to write is:

    static my_module_sched_out(struct preempt_notifier *notifier)
    {
        struct my_module_type *mmi =
            container_of(notifier, struct my_module_type, notifier);

        if (unlikely(current->state & TASK_DEAD)) {
            preempt_notifier_unregister(notifier);
            kfree(mmi);
            return;
        }

        /* do our real work */
    }

Sadly this isn't allowed because the htlist walk isn't "safe" against
removals.

Signed-off-by: Pierre Habouzit <pierre.habouzit@intersec.com>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 kernel/sched.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

  As a background, this need is because I have some kind of module code
  that uses this facility to evaluate how many of a group of threads are
  concurrently running (to regulate a pool of threads).

  Hence I install those callbacks for the thread registering themselves
  and want to keep them until the thread dies. Sadly I have no way to
  unregister those callbacks right now, but for horrible hacks (involving
  private delayed queues processed regularly walked to kfree() the
  structures referencing pids that are dead, urgh).

diff --git a/kernel/sched.c b/kernel/sched.c
index d6b149c..2653169 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3106,7 +3106,7 @@ static void fire_sched_in_preempt_notifiers(struct task_struct *curr)
 	struct preempt_notifier *notifier;
 	struct hlist_node *node;
 
-	hlist_for_each_entry(notifier, node, &curr->preempt_notifiers, link)
+	hlist_for_each_entry_safe(notifier, node, &curr->preempt_notifiers, link)
 		notifier->ops->sched_in(notifier, raw_smp_processor_id());
 }
 
@@ -3117,7 +3117,7 @@ fire_sched_out_preempt_notifiers(struct task_struct *curr,
 	struct preempt_notifier *notifier;
 	struct hlist_node *node;
 
-	hlist_for_each_entry(notifier, node, &curr->preempt_notifiers, link)
+	hlist_for_each_entry_safe(notifier, node, &curr->preempt_notifiers, link)
 		notifier->ops->sched_out(notifier, next);
 }
 
-- 
1.7.8.rc3.348.gb51a37


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] sched: allow preempt notifiers to self-unregister.
  2011-12-16 16:15 [PATCH] sched: allow preempt notifiers to self-unregister Pierre Habouzit
@ 2011-12-16 17:09 ` Peter Zijlstra
  2011-12-16 17:25   ` Pierre Habouzit
  2011-12-18  9:10 ` Avi Kivity
  1 sibling, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2011-12-16 17:09 UTC (permalink / raw)
  To: Pierre Habouzit; +Cc: linux-kernel, Avi Kivity, Ingo Molnar

On Fri, 2011-12-16 at 17:15 +0100, Pierre Habouzit wrote:

>   As a background, this need is because I have some kind of module code
>   that uses this facility to evaluate how many of a group of threads are
>   concurrently running (to regulate a pool of threads).

Typically such stuff is only merged along with whomever uses it.

>   Hence I install those callbacks for the thread registering themselves
>   and want to keep them until the thread dies. Sadly I have no way to
>   unregister those callbacks right now, but for horrible hacks (involving
>   private delayed queues processed regularly walked to kfree() the
>   structures referencing pids that are dead, urgh).

kfree_rcu() is the 'normal' way to cheat your way out of this.

> diff --git a/kernel/sched.c b/kernel/sched.c
> index d6b149c..2653169 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -3106,7 +3106,7 @@ static void fire_sched_in_preempt_notifiers(struct task_struct *curr)
>  	struct preempt_notifier *notifier;
>  	struct hlist_node *node;
>  
> -	hlist_for_each_entry(notifier, node, &curr->preempt_notifiers, link)
> +	hlist_for_each_entry_safe(notifier, node, &curr->preempt_notifiers, link)
>  		notifier->ops->sched_in(notifier, raw_smp_processor_id());
>  }
>  
> @@ -3117,7 +3117,7 @@ fire_sched_out_preempt_notifiers(struct task_struct *curr,
>  	struct preempt_notifier *notifier;
>  	struct hlist_node *node;
>  
> -	hlist_for_each_entry(notifier, node, &curr->preempt_notifiers, link)
> +	hlist_for_each_entry_safe(notifier, node, &curr->preempt_notifiers, link)
>  		notifier->ops->sched_out(notifier, next);
>  }

This adds to scheduler hot paths, it needs very good justification.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] sched: allow preempt notifiers to self-unregister.
  2011-12-16 17:09 ` Peter Zijlstra
@ 2011-12-16 17:25   ` Pierre Habouzit
  2011-12-16 17:33     ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Pierre Habouzit @ 2011-12-16 17:25 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Avi Kivity, Ingo Molnar

On Fri, Dec 16, 2011 at 06:09:45PM +0100, Peter Zijlstra wrote:
> On Fri, 2011-12-16 at 17:15 +0100, Pierre Habouzit wrote:
> 
> >   As a background, this need is because I have some kind of module code
> >   that uses this facility to evaluate how many of a group of threads are
> >   concurrently running (to regulate a pool of threads).
> 
> Typically such stuff is only merged along with whomever uses it.

Well for now I'm just toying with it, it's nowhere nead to be ready to
be even shown without me having to bury my head from shame ;)

> >   Hence I install those callbacks for the thread registering themselves
> >   and want to keep them until the thread dies. Sadly I have no way to
> >   unregister those callbacks right now, but for horrible hacks (involving
> >   private delayed queues processed regularly walked to kfree() the
> >   structures referencing pids that are dead, urgh).
> 
> kfree_rcu() is the 'normal' way to cheat your way out of this.

Hmm, if when I'm scheduled "out" with the TASK_DEAD bit set, am I sure
the _in/_out callback will never ever be called again?

It experimentally seems that the answer is yes, but I'm not familiar
enough with the scheduler to be a 100% sure. If yes then kfree_rcu is
just fine indeed and I don't need the patch, at all.

If it's not "sure" then I assume I can probably use call_rcu() but that
looks like a total overkill for something that can be fully avoided with
my patch, which incidentally, doesn't slow the typical sched path (there
should be no callbacks and the _safe iterator exits as fast as the non
safe iterator).
-- 
Intersec <http://www.intersec.com>
Pierre Habouzit <pierre.habouzit@intersec.com> | Chief Software Architect
Tél : +33 (0)1 5570 3346
Mob : +33 (0)6 1636 8131
Fax : +33 (0)1 5570 3332
37 Rue Pierre Lhomme
92400 Courbevoie

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] sched: allow preempt notifiers to self-unregister.
  2011-12-16 17:25   ` Pierre Habouzit
@ 2011-12-16 17:33     ` Peter Zijlstra
  2011-12-16 17:42       ` Pierre Habouzit
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2011-12-16 17:33 UTC (permalink / raw)
  To: Pierre Habouzit; +Cc: linux-kernel, Avi Kivity, Ingo Molnar

On Fri, 2011-12-16 at 18:25 +0100, Pierre Habouzit wrote:
> On Fri, Dec 16, 2011 at 06:09:45PM +0100, Peter Zijlstra wrote:
> > On Fri, 2011-12-16 at 17:15 +0100, Pierre Habouzit wrote:
> > 
> > >   As a background, this need is because I have some kind of module code
> > >   that uses this facility to evaluate how many of a group of threads are
> > >   concurrently running (to regulate a pool of threads).
> > 
> > Typically such stuff is only merged along with whomever uses it.
> 
> Well for now I'm just toying with it, it's nowhere nead to be ready to
> be even shown without me having to bury my head from shame ;)

:-) no worries just keep something like this along until you're ready..

> > >   Hence I install those callbacks for the thread registering themselves
> > >   and want to keep them until the thread dies. Sadly I have no way to
> > >   unregister those callbacks right now, but for horrible hacks (involving
> > >   private delayed queues processed regularly walked to kfree() the
> > >   structures referencing pids that are dead, urgh).
> > 
> > kfree_rcu() is the 'normal' way to cheat your way out of this.
> 
> Hmm, if when I'm scheduled "out" with the TASK_DEAD bit set, am I sure
> the _in/_out callback will never ever be called again?

Yep.

> It experimentally seems that the answer is yes, but I'm not familiar
> enough with the scheduler to be a 100% sure. If yes then kfree_rcu is
> just fine indeed and I don't need the patch, at all.
> 
> If it's not "sure" then I assume I can probably use call_rcu() but that

kfree_rcu() is a convenient macro wrapped around call_rcu().

> looks like a total overkill for something that can be fully avoided with
> my patch, which incidentally, doesn't slow the typical sched path (there
> should be no callbacks and the _safe iterator exits as fast as the non
> safe iterator).

Ah, you're right, I thought it frobbed the extra variable too, but
looking at it it only does that when there's anything on the list.




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] sched: allow preempt notifiers to self-unregister.
  2011-12-16 17:33     ` Peter Zijlstra
@ 2011-12-16 17:42       ` Pierre Habouzit
  0 siblings, 0 replies; 7+ messages in thread
From: Pierre Habouzit @ 2011-12-16 17:42 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Avi Kivity, Ingo Molnar

On Fri, Dec 16, 2011 at 06:33:07PM +0100, Peter Zijlstra wrote:
> On Fri, 2011-12-16 at 18:25 +0100, Pierre Habouzit wrote:
> > On Fri, Dec 16, 2011 at 06:09:45PM +0100, Peter Zijlstra wrote:
> > > On Fri, 2011-12-16 at 17:15 +0100, Pierre Habouzit wrote:
> > > >   Hence I install those callbacks for the thread registering themselves
> > > >   and want to keep them until the thread dies. Sadly I have no way to
> > > >   unregister those callbacks right now, but for horrible hacks (involving
> > > >   private delayed queues processed regularly walked to kfree() the
> > > >   structures referencing pids that are dead, urgh).
> > > 
> > > kfree_rcu() is the 'normal' way to cheat your way out of this.
> > 
> > Hmm, if when I'm scheduled "out" with the TASK_DEAD bit set, am I sure
> > the _in/_out callback will never ever be called again?
> 
> Yep.

Good then kfree_rcu (or call_rcu if I need more at some point) is good
enough, which is nice because it will allow me to run on "any" kernel
whether my patch is ever applied or not.

> > It experimentally seems that the answer is yes, but I'm not familiar
> > enough with the scheduler to be a 100% sure. If yes then kfree_rcu is
> > just fine indeed and I don't need the patch, at all.
> > 
> > If it's not "sure" then I assume I can probably use call_rcu() but that
> 
> kfree_rcu() is a convenient macro wrapped around call_rcu().

Yes I've seen, what I meant was a shortcut for "if kfree_rcu isn't
allowed because I may kfree() callbacks still registered and that _in or
_out events could be still fired then I'll write something that safely
unregisters callbacks from a longer call_rcu" :)

I reckon this "shortcut" wasn't obvious for the reader. But I won't need
that since you answered "yes" to my previous question.

> > looks like a total overkill for something that can be fully avoided with
> > my patch, which incidentally, doesn't slow the typical sched path (there
> > should be no callbacks and the _safe iterator exits as fast as the non
> > safe iterator).
> 
> Ah, you're right, I thought it frobbed the extra variable too, but
> looking at it it only does that when there's anything on the list.

Yeah that's the reason why I submitted the patch in the first place
since it doesn't change the performance for the usual case. But well,
given your answers, I don't really care whether it's applied or not
anymore, I still find it cumbersome that people couldn't unregister from
a callback, that's really something that I expected to work :) It may be
worth a comment in preempt.h to save some experimenters a kernel panic
or two :P

Thank you for your answers.

(for the story but well, I understand you couldn't care less, it sadly
caused me a kernel panic and 2 subsequent ones because btrfs kind of
didn't like the panics. Okay, I've got the message, I've been a lazy boy
to develop kernel code for almost the first time directly in my running
linux instead of a qemu :P)
-- 
Intersec <http://www.intersec.com>
Pierre Habouzit <pierre.habouzit@intersec.com> | Chief Software Architect
Tél : +33 (0)1 5570 3346
Mob : +33 (0)6 1636 8131
Fax : +33 (0)1 5570 3332
37 Rue Pierre Lhomme
92400 Courbevoie

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] sched: allow preempt notifiers to self-unregister.
  2011-12-16 16:15 [PATCH] sched: allow preempt notifiers to self-unregister Pierre Habouzit
  2011-12-16 17:09 ` Peter Zijlstra
@ 2011-12-18  9:10 ` Avi Kivity
  2011-12-19 10:26   ` Pierre Habouzit
  1 sibling, 1 reply; 7+ messages in thread
From: Avi Kivity @ 2011-12-18  9:10 UTC (permalink / raw)
  To: Pierre Habouzit; +Cc: linux-kernel, Ingo Molnar, Peter Zijlstra

On 12/16/2011 06:15 PM, Pierre Habouzit wrote:
>   As a background, this need is because I have some kind of module code
>   that uses this facility to evaluate how many of a group of threads are
>   concurrently running (to regulate a pool of threads).
>

That's what's cmwq is supposed to be doing (and that too should be using
preempt notifiers, IMO).


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] sched: allow preempt notifiers to self-unregister.
  2011-12-18  9:10 ` Avi Kivity
@ 2011-12-19 10:26   ` Pierre Habouzit
  0 siblings, 0 replies; 7+ messages in thread
From: Pierre Habouzit @ 2011-12-19 10:26 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linux-kernel, Ingo Molnar, Peter Zijlstra

On Sun, Dec 18, 2011 at 11:10:09AM +0200, Avi Kivity wrote:
> On 12/16/2011 06:15 PM, Pierre Habouzit wrote:
> >   As a background, this need is because I have some kind of module
> >   code that uses this facility to evaluate how many of a group of
> >   threads are concurrently running (to regulate a pool of threads).
> >
>
> That's what's cmwq is supposed to be doing (and that too should be
> using preempt notifiers, IMO).

It doesn't (use preempt notifiers) because it calls schedule directly
and can have its own processing before and after schedule() on his own.

Indeed, my goal is to have something like kernel/workqueues.c, kind of,
but for *userland*. The goal is to support things like Apple
pthread_workqueue_* stuff[1], and similar concepts of load-regulated
userland thread pools.


 [1] http://people.freebsd.org/~sson/thrworkq/pthread_workqueue.3.txt
-- 
Intersec <http://www.intersec.com>
Pierre Habouzit <pierre.habouzit@intersec.com> | Chief Software Architect
Tél : +33 (0)1 5570 3346
Mob : +33 (0)6 1636 8131
Fax : +33 (0)1 5570 3332
37 Rue Pierre Lhomme
92400 Courbevoie

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-12-19 10:26 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-16 16:15 [PATCH] sched: allow preempt notifiers to self-unregister Pierre Habouzit
2011-12-16 17:09 ` Peter Zijlstra
2011-12-16 17:25   ` Pierre Habouzit
2011-12-16 17:33     ` Peter Zijlstra
2011-12-16 17:42       ` Pierre Habouzit
2011-12-18  9:10 ` Avi Kivity
2011-12-19 10:26   ` Pierre Habouzit

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.