linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH 3/4] cleanup __exit_signal()
@ 2006-02-25  1:48 Suzanne Wood
  2006-02-25 19:26 ` Oleg Nesterov
  0 siblings, 1 reply; 6+ messages in thread
From: Suzanne Wood @ 2006-02-25  1:48 UTC (permalink / raw)
  To: oleg; +Cc: linux-kernel, paulmck

The extent of the RCU read-side critical section is determined 
by the placement of the corresponding rcu_read_lock() and 
rcu_read_unlock().  Your recent [PATCH] convert sighand_cache 
to use SLAB_DESTROY_BY_RCU uncovered a comment that prompts 
this request for clarification.  (The initial motivation was
seeing the introduction of an rcu_assign_pointer() and
looking for the corresponding rcu_dereference().)

Jul 13 2004 [PATCH] rmaplock 2/6 SLAB_DESTROY_BY_RCU (and 
consistent with slab.c in linux-2.6.16-rc3), struct slab_rcu 
is described:
 * struct slab_rcu
 *
 * slab_destroy on a SLAB_DESTROY_BY_RCU cache uses this structure to
 * arrange for kmem_freepages to be called via RCU.  This is useful if
 * we need to approach a kernel structure obliquely, from its address
 * obtained without the usual locking.  We can lock the structure to
 * stabilize it and check it's still at the given address, only if we
 * can be sure that the memory has not been meanwhile reused for some
 * other kind of object (which our subsystem's lock might corrupt).
 *
 * rcu_read_lock before reading the address, then rcu_read_unlock after
 * taking the spinlock within the structure expected at that address.

Does this mean that the rcu_read_lock() can safely occur just 
after the spin_lock(&sighand->siglock)?  Since I don't find an 
example that follows this interpretation of the comment, what 
is the intention?  Or, if so, what is the particular context?  
It looks like all kernel occurrences of rcu_dereference() 
with sighand arguments have, within the function definition, 
rcu_read_lock/unlock() pairs enclosing the spin lock and unlock 
pairs, except the one in group_send_sig_info(), which has a 
comment requiring rcu_read_lock or tasklist_lock.
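The pattern described above -- rcu_read_lock() taken before the pointer is
fetched, rcu_read_unlock() dropped only after the per-object spinlock has been
dealt with -- can be sketched as a userspace model.  This is not kernel code:
the rcu_* stubs are no-ops here (in the kernel they delimit a grace period),
and a pthread mutex stands in for ->siglock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* No-op stand-ins for the kernel RCU primitives. */
static void rcu_read_lock(void)   { }
static void rcu_read_unlock(void) { }
#define rcu_dereference(p) (p)    /* model: no memory barriers needed */

struct sighand_model {
	pthread_mutex_t siglock;  /* stands in for spinlock_t siglock */
	int grabbed;
};

static struct sighand_model the_sighand = { PTHREAD_MUTEX_INITIALIZER, 0 };
static struct sighand_model *task_sighand = &the_sighand;

/* Read the address under RCU, then stabilize the object with its own
 * lock -- the ordering the slab_rcu comment is describing. */
static int lock_task_sighand_model(void)
{
	struct sighand_model *sighand;

	rcu_read_lock();
	sighand = rcu_dereference(task_sighand); /* address read under RCU */
	if (sighand == NULL) {
		rcu_read_unlock();
		return 0;
	}
	pthread_mutex_lock(&sighand->siglock);   /* pin the object */
	sighand->grabbed = 1;
	pthread_mutex_unlock(&sighand->siglock);
	rcu_read_unlock();                       /* leave read-side section last */
	return 1;
}
```

The point of the ordering is that the object cannot be fully freed between the
rcu_dereference() and the spin_lock(), because the reader is still inside the
read-side critical section.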

An example is included in your patch to move __exit_signal().  
The RCU read-side critical section there seems to be in place to 
provide persistence of the task_struct.  __exit_sighand() calls
sighand_free(sighand) -- proposed to be renamed cleanup_sighand(tsk), 
which calls kmem_cache_free(sighand_cachep, sighand) -- before 
spin_unlock(&sighand->siglock) is called in __exit_signal().

Thank you for any suggestions.

> Subject:    [PATCH 1/3] move __exit_signal() to kernel/exit.c
> From:       Oleg Nesterov
> Date:       2006-02-22 22:32:54
> 
> __exit_signal() is private to release_task() now.
> I think it is better to make it static in kernel/exit.c
> and export flush_sigqueue() instead - this function is
> much more simple and straightforward.
> 
> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
> 
> --- 2.6.16-rc3/include/linux/sched.h~1_MOVE	2006-02-20 21:00:09.000000000 +0300
> +++ 2.6.16-rc3/include/linux/sched.h	2006-02-23 00:23:40.000000000 +0300
> @@ -1143,7 +1143,6 @@ extern void exit_thread(void);
>  extern void exit_files(struct task_struct *);
>  extern void __cleanup_signal(struct signal_struct *);
>  extern void cleanup_sighand(struct task_struct *);
> -extern void __exit_signal(struct task_struct *);
>  extern void exit_itimers(struct signal_struct *);
>  
>  extern NORET_TYPE void do_group_exit(int);
> --- 2.6.16-rc3/include/linux/signal.h~1_MOVE	2006-01-19 18:13:07.000000000 +0300
> +++ 2.6.16-rc3/include/linux/signal.h	2006-02-23 00:36:27.000000000 +0300
> @@ -249,6 +249,8 @@ static inline void init_sigpending(struc
>  	INIT_LIST_HEAD(&sig->list);
>  }
>  
> +extern void flush_sigqueue(struct sigpending *queue);
> +
>  /* Test if 'sig' is valid signal. Use this instead of testing _NSIG directly */
>  static inline int valid_signal(unsigned long sig)
>  {
> --- 2.6.16-rc3/kernel/exit.c~1_MOVE	2006-02-17 00:05:25.000000000 +0300
> +++ 2.6.16-rc3/kernel/exit.c	2006-02-23 00:32:46.000000000 +0300
> @@ -29,6 +29,7 @@
>  #include <linux/cpuset.h>
>  #include <linux/syscalls.h>
>  #include <linux/signal.h>
> +#include <linux/posix-timers.h>
>  #include <linux/cn_proc.h>
>  #include <linux/mutex.h>
>  
> @@ -60,6 +61,68 @@ static void __unhash_process(struct task
>  	remove_parent(p);
>  }
>  
> +/*
> + * This function expects the tasklist_lock write-locked.
> + */
> +static void __exit_signal(struct task_struct *tsk)
> +{
> +	struct signal_struct *sig = tsk->signal;
> +	struct sighand_struct *sighand;
> +
> +	BUG_ON(!sig);
> +	BUG_ON(!atomic_read(&sig->count));
> +
> +	rcu_read_lock();
> +	sighand = rcu_dereference(tsk->sighand);
> +	spin_lock(&sighand->siglock);
> +
> +	posix_cpu_timers_exit(tsk);
> +	if (atomic_dec_and_test(&sig->count))
> +		posix_cpu_timers_exit_group(tsk);
> +	else {
> +		/*
> +		 * If there is any task waiting for the group exit
> +		 * then notify it:
> +		 */
> +		if (sig->group_exit_task && atomic_read(&sig->count) == sig->notify_count) {
> +			wake_up_process(sig->group_exit_task);
> +			sig->group_exit_task = NULL;
> +		}
> +		if (tsk == sig->curr_target)
> +			sig->curr_target = next_thread(tsk);
> +		/*
> +		 * Accumulate here the counters for all threads but the
> +		 * group leader as they die, so they can be added into
> +		 * the process-wide totals when those are taken.
> +		 * The group leader stays around as a zombie as long
> +		 * as there are other threads.  When it gets reaped,
> +		 * the exit.c code will add its counts into these totals.
> +		 * We won't ever get here for the group leader, since it
> +		 * will have been the last reference on the signal_struct.
> +		 */
> +		sig->utime = cputime_add(sig->utime, tsk->utime);
> +		sig->stime = cputime_add(sig->stime, tsk->stime);
> +		sig->min_flt += tsk->min_flt;
> +		sig->maj_flt += tsk->maj_flt;
> +		sig->nvcsw += tsk->nvcsw;
> +		sig->nivcsw += tsk->nivcsw;
> +		sig->sched_time += tsk->sched_time;
> +		sig = NULL; /* Marker for below. */
> +	}
> +
> +	tsk->signal = NULL;
> +	cleanup_sighand(tsk);
> +	spin_unlock(&sighand->siglock);
> +	rcu_read_unlock();
> +
> +	clear_tsk_thread_flag(tsk,TIF_SIGPENDING);
> +	flush_sigqueue(&tsk->pending);
> +	if (sig) {
> +		flush_sigqueue(&sig->shared_pending);
> +		__cleanup_signal(sig);
> +	}
> +}



* Re: [PATCH 3/4] cleanup __exit_signal()
  2006-02-25  1:48 [PATCH 3/4] cleanup __exit_signal() Suzanne Wood
@ 2006-02-25 19:26 ` Oleg Nesterov
  0 siblings, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2006-02-25 19:26 UTC (permalink / raw)
  To: Suzanne Wood; +Cc: linux-kernel, paulmck

Suzanne Wood wrote:
>
> The extent of the RCU read-side critical section is determined
> by the placement of the corresponding rcu_read_lock() and
> rcu_read_unlock().  Your recent [PATCH] convert sighand_cache
> to use SLAB_DESTROY_BY_RCU uncovered a comment that prompts
> this request for clarification.  (The initial motivation was
> seeing the introduction of an rcu_assign_pointer() and
> looking for the corresponding rcu_dereference().)
>
> Jul 13 2004 [PATCH] rmaplock 2/6 SLAB_DESTROY_BY_RCU (and
> consistent with slab.c in linux-2.6.16-rc3), struct slab_rcu
> is described:
>  * struct slab_rcu
>  *
>  * slab_destroy on a SLAB_DESTROY_BY_RCU cache uses this structure to
>  * arrange for kmem_freepages to be called via RCU.  This is useful if
>  * we need to approach a kernel structure obliquely, from its address
>  * obtained without the usual locking.  We can lock the structure to
>  * stabilize it and check it's still at the given address, only if we
>  * can be sure that the memory has not been meanwhile reused for some
>  * other kind of object (which our subsystem's lock might corrupt).
>  *
>  * rcu_read_lock before reading the address, then rcu_read_unlock after
>  * taking the spinlock within the structure expected at that address.
>
> Does this mean that the rcu_read_lock() can safely occur just
> after the spin_lock(&sighand->siglock)?  Since I don't find an
> example that follows this interpretation of the comment, what
> is the intention?  Or, if so, what is the particular context?
> It looks like all kernel occurrences of rcu_dereference()
> with sighand arguments have, within the function definition,
> rcu_read_lock/unlock() pairs enclosing the spin lock and unlock
> pairs, except the one in group_send_sig_info(), which has a
> comment requiring rcu_read_lock or tasklist_lock.

Sorry, I can't understand this question. __exit_signal() does
rcu_read_lock() (btw, this is not strictly necessary here since
tasklist_lock is held; it is more for documentation) _before_ it
takes sighand->siglock.

> An example is included in your patch to move __exit_signal().
> The RCU read-side critical section there seems to be in place to
> provide persistence of the task_struct.  __exit_sighand() calls
> sighand_free(sighand) -- proposed to be renamed cleanup_sighand(tsk),
> which calls kmem_cache_free(sighand_cachep, sighand) -- before
> spin_unlock(&sighand->siglock) is called in __exit_signal().

This is a very valid question.

Yes, spin_unlock(&sighand->siglock) after kmem_cache_free() means
we are probably writing to memory which has already been reused on
another CPU.

However, SLAB_DESTROY_BY_RCU guarantees that this memory has not
been released to the system (while we are under rcu_read_lock()), so
this memory is still a valid sighand_struct, and it is ok to release
sighand->siglock. That is why we initialize ->siglock (unlike
->count) in sighand_ctor(), but not in copy_sighand().

In other words, after kmem_cache_free() we own nothing in sighand,
except this ->siglock.

So we are safe even if another CPU tries to lock this sighand
right now (currently this is not possible: copy_process or
de_thread must first take tasklist_lock); it will simply be
blocked until we release it.
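The type-stability argument above can be modeled in userspace.  This is a toy
model, not the slab allocator: "freeing" an object puts it back on a one-slot
freelist instead of returning memory to the system, so the memory always holds
an object of the expected type and its once-initialized lock (the sighand_ctor
analogue) stays operable even after the free.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct obj {
	pthread_mutex_t lock; /* initialized once, like ->siglock in the ctor */
	int payload;          /* reinitialized per allocation, like ->count */
};

/* One-slot "cache": memory is never handed back to the system. */
static struct obj slab_slot = { PTHREAD_MUTEX_INITIALIZER, 0 };
static int slot_free = 1;

static struct obj *cache_alloc(void)
{
	if (!slot_free)
		return NULL;
	slot_free = 0;
	slab_slot.payload = 1;  /* the ctor does NOT run again on reuse */
	return &slab_slot;
}

static void cache_free(struct obj *o)
{
	(void)o;
	slot_free = 1;          /* object reusable, memory retained */
}

/* The sequence from __exit_signal(): free while still holding the lock,
 * then unlock.  The unlock writes to "freed" memory, but that memory is
 * still a struct obj with a valid lock, so the write is safe. */
static int free_then_unlock(void)
{
	struct obj *o = cache_alloc();
	pthread_mutex_lock(&o->lock);
	cache_free(o);                  /* kmem_cache_free() analogue */
	pthread_mutex_unlock(&o->lock); /* valid: type-stable memory */
	return 1;
}
```

In the real kernel the guarantee only holds while readers are inside
rcu_read_lock(); once a grace period elapses with no readers, the pages may
go back to the system, which the model above does not attempt to capture.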

This patch was written when __exit_signal() had two sighand_free() calls.
Now we can change this:

	void cleanup_sighand(struct sighand_struct *sighand)
	{
		if (atomic_dec_and_test(&sighand->count))
			kmem_cache_free(sighand_cachep, sighand);
	}

	void __exit_signal(tsk)
	{
		...

		tsk->signal = NULL;
		tsk->sighand = NULL;	// we must do it before unlocking ->siglock
		spin_unlock(&sighand->siglock);
		rcu_read_unlock();

		cleanup_sighand(sighand)
		...
	}
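The refcounted teardown sketched above can be modeled as follows.  This is an
illustrative single-threaded model: the decrement-and-test helper stands in
for the kernel's atomic_dec_and_test(), and setting a flag stands in for
kmem_cache_free(sighand_cachep, sighand).

```c
#include <assert.h>

struct sighand_model {
	int count;  /* stands in for atomic_t count */
	int freed;
};

/* Model of atomic_dec_and_test(): true when the count reaches zero.
 * Not actually atomic -- this model is single-threaded. */
static int dec_and_test_model(int *v)
{
	return --(*v) == 0;
}

/* cleanup_sighand() as proposed: free only on the last reference. */
static void cleanup_sighand_model(struct sighand_model *sighand)
{
	if (dec_and_test_model(&sighand->count))
		sighand->freed = 1; /* kmem_cache_free() analogue */
}
```

With two holders, the first cleanup_sighand_model() call only drops the count;
the second, final call performs the free -- which is what lets __exit_signal()
move the free out from under ->siglock.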

Oleg.


* Re: [PATCH 3/4] cleanup __exit_signal()
  2006-02-24 18:13   ` Oleg Nesterov
@ 2006-02-25  0:20     ` Paul E. McKenney
  0 siblings, 0 replies; 6+ messages in thread
From: Paul E. McKenney @ 2006-02-25  0:20 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, linux-kernel, Ingo Molnar, Eric W. Biederman,
	Roland McGrath

On Fri, Feb 24, 2006 at 09:13:22PM +0300, Oleg Nesterov wrote:
> "Paul E. McKenney" wrote:
> > 
> > On Mon, Feb 20, 2006 at 07:04:03PM +0300, Oleg Nesterov wrote:
> > > This patch factors out duplicated code under 'if' branches.
> > > Also, BUG_ON() conversions and whitespace cleanups.
> > 
> > Passed steamroller.  Looks sane to me.
> 
> Oh, thanks!
> 
> I forgot to say it, but I had run steamroller tests too before I
> sent "some tasklist_lock removals" series.

Glad to hear it!

> Do you know any other test which may be useful too?

Matt Wilcox mentioned that a full build of gdb ran some tests that do
a good job of exercising signals.  I have not yet tried this myself
(but am giving it a shot).

Also, my guess is that you ran steamroller on x86 (how many CPUs?).
I ran on ppc64.

						Thanx, Paul


* Re: [PATCH 3/4] cleanup __exit_signal()
  2006-02-24 18:01 ` Paul E. McKenney
@ 2006-02-24 18:13   ` Oleg Nesterov
  2006-02-25  0:20     ` Paul E. McKenney
  0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2006-02-24 18:13 UTC (permalink / raw)
  To: paulmck
  Cc: Andrew Morton, linux-kernel, Ingo Molnar, Eric W. Biederman,
	Roland McGrath

"Paul E. McKenney" wrote:
> 
> On Mon, Feb 20, 2006 at 07:04:03PM +0300, Oleg Nesterov wrote:
> > This patch factors out duplicated code under 'if' branches.
> > Also, BUG_ON() conversions and whitespace cleanups.
> 
> Passed steamroller.  Looks sane to me.

Oh, thanks!

I forgot to say it, but I had run steamroller tests too before I
sent "some tasklist_lock removals" series.

Do you know any other test which may be useful too?

Oleg.


* Re: [PATCH 3/4] cleanup __exit_signal()
  2006-02-20 16:04 Oleg Nesterov
@ 2006-02-24 18:01 ` Paul E. McKenney
  2006-02-24 18:13   ` Oleg Nesterov
  0 siblings, 1 reply; 6+ messages in thread
From: Paul E. McKenney @ 2006-02-24 18:01 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, linux-kernel, Ingo Molnar, Eric W. Biederman,
	Roland McGrath

On Mon, Feb 20, 2006 at 07:04:03PM +0300, Oleg Nesterov wrote:
> This patch factors out duplicated code under 'if' branches.
> Also, BUG_ON() conversions and whitespace cleanups.

Passed steamroller.  Looks sane to me.

						Thanx, Paul

Acked-by: <paulmck@us.ibm.com>
> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
> 
> --- 2.6.16-rc3/kernel/signal.c~3_ESIG	2006-02-20 02:02:03.000000000 +0300
> +++ 2.6.16-rc3/kernel/signal.c	2006-02-20 20:55:50.000000000 +0300
> @@ -341,24 +341,20 @@ void __exit_sighand(struct task_struct *
>   */
>  void __exit_signal(struct task_struct *tsk)
>  {
> -	struct signal_struct * sig = tsk->signal;
> -	struct sighand_struct * sighand;
> +	struct signal_struct *sig = tsk->signal;
> +	struct sighand_struct *sighand;
> +
> +	BUG_ON(!sig);
> +	BUG_ON(!atomic_read(&sig->count));
>  
> -	if (!sig)
> -		BUG();
> -	if (!atomic_read(&sig->count))
> -		BUG();
>  	rcu_read_lock();
>  	sighand = rcu_dereference(tsk->sighand);
>  	spin_lock(&sighand->siglock);
> +
>  	posix_cpu_timers_exit(tsk);
> -	if (atomic_dec_and_test(&sig->count)) {
> +	if (atomic_dec_and_test(&sig->count))
>  		posix_cpu_timers_exit_group(tsk);
> -		tsk->signal = NULL;
> -		__exit_sighand(tsk);
> -		spin_unlock(&sighand->siglock);
> -		flush_sigqueue(&sig->shared_pending);
> -	} else {
> +	else {
>  		/*
>  		 * If there is any task waiting for the group exit
>  		 * then notify it:
> @@ -369,7 +365,6 @@ void __exit_signal(struct task_struct *t
>  		}
>  		if (tsk == sig->curr_target)
>  			sig->curr_target = next_thread(tsk);
> -		tsk->signal = NULL;
>  		/*
>  		 * Accumulate here the counters for all threads but the
>  		 * group leader as they die, so they can be added into
> @@ -387,14 +382,18 @@ void __exit_signal(struct task_struct *t
>  		sig->nvcsw += tsk->nvcsw;
>  		sig->nivcsw += tsk->nivcsw;
>  		sig->sched_time += tsk->sched_time;
> -		__exit_sighand(tsk);
> -		spin_unlock(&sighand->siglock);
> -		sig = NULL;	/* Marker for below.  */
> +		sig = NULL; /* Marker for below. */
>  	}
> +
> +	tsk->signal = NULL;
> +	__exit_sighand(tsk);
> +	spin_unlock(&sighand->siglock);
>  	rcu_read_unlock();
> +
>  	clear_tsk_thread_flag(tsk,TIF_SIGPENDING);
>  	flush_sigqueue(&tsk->pending);
>  	if (sig) {
> +		flush_sigqueue(&sig->shared_pending);
>  		__cleanup_signal(sig);
>  	}
>  }


* [PATCH 3/4] cleanup __exit_signal()
@ 2006-02-20 16:04 Oleg Nesterov
  2006-02-24 18:01 ` Paul E. McKenney
  0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2006-02-20 16:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Ingo Molnar, Paul E. McKenney, Eric W. Biederman,
	Roland McGrath

This patch factors out duplicated code under 'if' branches.
Also, BUG_ON() conversions and whitespace cleanups.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>

--- 2.6.16-rc3/kernel/signal.c~3_ESIG	2006-02-20 02:02:03.000000000 +0300
+++ 2.6.16-rc3/kernel/signal.c	2006-02-20 20:55:50.000000000 +0300
@@ -341,24 +341,20 @@ void __exit_sighand(struct task_struct *
  */
 void __exit_signal(struct task_struct *tsk)
 {
-	struct signal_struct * sig = tsk->signal;
-	struct sighand_struct * sighand;
+	struct signal_struct *sig = tsk->signal;
+	struct sighand_struct *sighand;
+
+	BUG_ON(!sig);
+	BUG_ON(!atomic_read(&sig->count));
 
-	if (!sig)
-		BUG();
-	if (!atomic_read(&sig->count))
-		BUG();
 	rcu_read_lock();
 	sighand = rcu_dereference(tsk->sighand);
 	spin_lock(&sighand->siglock);
+
 	posix_cpu_timers_exit(tsk);
-	if (atomic_dec_and_test(&sig->count)) {
+	if (atomic_dec_and_test(&sig->count))
 		posix_cpu_timers_exit_group(tsk);
-		tsk->signal = NULL;
-		__exit_sighand(tsk);
-		spin_unlock(&sighand->siglock);
-		flush_sigqueue(&sig->shared_pending);
-	} else {
+	else {
 		/*
 		 * If there is any task waiting for the group exit
 		 * then notify it:
@@ -369,7 +365,6 @@ void __exit_signal(struct task_struct *t
 		}
 		if (tsk == sig->curr_target)
 			sig->curr_target = next_thread(tsk);
-		tsk->signal = NULL;
 		/*
 		 * Accumulate here the counters for all threads but the
 		 * group leader as they die, so they can be added into
@@ -387,14 +382,18 @@ void __exit_signal(struct task_struct *t
 		sig->nvcsw += tsk->nvcsw;
 		sig->nivcsw += tsk->nivcsw;
 		sig->sched_time += tsk->sched_time;
-		__exit_sighand(tsk);
-		spin_unlock(&sighand->siglock);
-		sig = NULL;	/* Marker for below.  */
+		sig = NULL; /* Marker for below. */
 	}
+
+	tsk->signal = NULL;
+	__exit_sighand(tsk);
+	spin_unlock(&sighand->siglock);
 	rcu_read_unlock();
+
 	clear_tsk_thread_flag(tsk,TIF_SIGPENDING);
 	flush_sigqueue(&tsk->pending);
 	if (sig) {
+		flush_sigqueue(&sig->shared_pending);
 		__cleanup_signal(sig);
 	}
 }

