All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Don Zickus <dzickus@redhat.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Ingo Molnar <mingo@elte.hu>,
	Jerome Marchand <jmarchan@redhat.com>,
	Mandeep Singh Baines <msb@google.com>,
	Roland McGrath <roland@redhat.com>,
	linux-kernel@vger.kernel.org, stable@kernel.org,
	"Eric W. Biederman" <ebiederm@xmission.com>
Subject: Re: while_each_thread() under rcu_read_lock() is broken?
Date: Mon, 21 Jun 2010 13:51:28 -0700	[thread overview]
Message-ID: <20100621205128.GI2354@linux.vnet.ibm.com> (raw)
In-Reply-To: <20100621170919.GA13826@redhat.com>

On Mon, Jun 21, 2010 at 07:09:19PM +0200, Oleg Nesterov wrote:
> On 06/18, Paul E. McKenney wrote:
> >
> > On Fri, Jun 18, 2010 at 09:34:03PM +0200, Oleg Nesterov wrote:
> > >
> > > 	#define XXX(t)	({
> > > 		struct task_struct *__prev = t;
> > > 		t = next_thread(t);
> > > 		t != g && t != __prev;
> > > 	})
> > >
> > > 	#define while_each_thread(g, t) \
> > > 		while (XXX(t))
> >
> > Isn't the above vulnerable to a pthread_create() immediately following
> > the offending exec()?  Especially if the task doing the traversal is
> > preempted?
> 
> Yes, thanks!
> 
> > here are some techniques that might (or might not) help:
> 
> To simplify, let's consider the concrete example,

Sounds very good!

> 	rcu_read_lock();
> 
> 	g = t = returns_the_rcu_safe_task_struct_ptr();

This returns a pointer to the task struct of the current thread?
Or might this return a pointer some other thread's task struct?

> 	do {
> 		printk("%d\n", t->pid);
> 	} while_each_thread(g, t);
> 
> 	rcu_read_unlock();
> 
> Whatever we do, without tasklist/siglock this can obviously race
> with fork/exit/exec. It is OK to miss a thread, or print the pid
> of the already exited/released task.
> 
> But it should not loop forever (the subject), and it should not
> print the same pid twice (ignoring pid reuse, of course).
> 
> And, afaics, there are no problems with rcu magic per se, next_thread()
> always returns the task_struct we can safely dereference. The only
> problem is that while_each_thread() assumes that sooner or later
> next_thread() must reach the starting point, g.
> 
> (zap_threads() is different, it must not miss a thread with ->mm
>  we are going to dump, but it holds mmap_sem).

Indeed, the tough part is figuring out when you are done given that things
can come and go at will.  Some additional tricks, in no particular order:

1.	Always start at the group leader.  Of course, the group leader
	is probably permitted to leave any time it wants to, so this
	is not sufficient in and of itself.

2.	Maintain a separate task structure that flags the head of the
	list.  This separate structure is freed one RCU grace period
	following the disappearance of the current group leader.  This
	should be quite robust, but "holy overhead, Batman!!!"  (Apologies
	for the American pop culture reference, but nothing else seemed
	appropriate.)

3.	Like #2, but reduce the storage overhead by using a structure
	containing only the pointers and perhaps a few additional fields.
	Set one of the bottom bits of one of the pointers to flag the fact
	that this is a special structure, allowing the reader to take
	note and do any additional checks required.

	Of course, it is possible for a reader to be diverted onto a
	new thread group, so it is then also possible that it needs
	to return back to the previous thread group.  The smaller structure
	corresponding to the smaller structure might contain the
	pointers required for it to make this transition.

	Please note that a given thread group will have several of
	these structures if several thread-group depart in quick
	succession.  The smaller structure would need to contain
	enough information to allow the reader to thread back to
	whatever group-leader task it started from.  Which will likely
	make managing the lifetimes of these smaller structures
	"interesting"...  Though this could be eased by maintaining
	forward links as well as backwards links.  Then when the
	grace period expired, the corresponding backwards links could
	be NULLed -- and a second grace period would likely be needed.

	It is tempting to leave these special structs out of any
	thread groups that have not yet had a leader depart, but this
	is probably vulnerable to looping.  (It might be possible to
	avoid this, but additional checks are required.)

There are other approaches, but it is probably time for you to teach me
more about the requirements you face.

> > o	Check ACCESS_ONCE(p->group_leader == g),
> 
> 	but this assumeas that while_each_thread(g, t) always
> 	uses g == group_leader. zap_other_threads(), for example,
> 	starts with g == current and not necessarily the leader.
> 
> > if false, restart
> > 	the traversal.
> 
> I am not sure we should restart (don't pid the same pid twice),
> probably we should break the loop.
> 
> So, I am thinking about the first attempt
> 
> 	#define while_each_thread(g, t) \
> 		while ((t = next_thread(t)) != g && pid_alive(g))
> 
> again. But this means while_each_thread() can miss more threads
> than it currently can under the same conditions. Correct, but
> not good.

OK, so we really want to complete the scan through the thread group, or
at least avoid introducing additional tendency to not complete the scan.
Good to know.  ;-)

> And,
> 
> > 	This of course might
> > 	require careful attention of the order of updates to ->group_leader
> > 	and the list pointers.
> 
> Yes. At least the code above needs barriers, and I am not sure
> it can be really correct without more complications.

Agreed!

> > o	Maintain an ->oldnext field that tracks the old pointer to
> > 	the next task for one RCU grace period after a de_thread()
> > 	operation.  When the grace period expires (presumably via
> > 	call_rcu()), the ->oldnext field is set to NULL.
> 
> Well, another field in task_struct...

Yeah, would be good to avoid this.  Not sure it can be avoided, though.

> > o	Do the de_thread() incrementally.  So if the list is tasks A,
> > 	B, and C, in that order, and if we are de-thread()ing B,
> > 	then make A's pointer refer to C,
> 
> This breaks while_each_thread() under tasklist/siglock. It must
> see all unhashed tasks.

Could de_thread() hold those locks in order to avoid that breakage?

> Oh. I'll try to think more, but I also hope someone else can suggest
> a simple solution.

;-)

> > > Please tell me I am wrong!
> >
> > It would take a braver man than me to say that Oleg Nesterov is wrong!
> 
> Hmm. Given that you proved I was wrong in the first lines of your
> email, I do not know what to think ;)

Ah, but I didn't prove you wrong.  Instead, I simply asked if you
were wrong.  You supplied the proof.  ;-)

							Thanx, Paul

  parent reply	other threads:[~2010-06-21 20:51 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-18 19:02 [PATCH] fix the racy check_hung_uninterruptible_tasks()->rcu_lock_break() logic Oleg Nesterov
2010-06-18 19:34 ` while_each_thread() under rcu_read_lock() is broken? Oleg Nesterov
2010-06-18 21:08   ` Roland McGrath
2010-06-18 22:37     ` Oleg Nesterov
2010-06-18 22:33   ` Paul E. McKenney
2010-06-21 17:09     ` Oleg Nesterov
2010-06-21 17:44       ` Oleg Nesterov
2010-06-21 18:00         ` Oleg Nesterov
2010-06-21 19:02         ` Roland McGrath
2010-06-21 20:06           ` Oleg Nesterov
2010-06-21 21:19             ` Eric W. Biederman
2010-06-22 14:34               ` Oleg Nesterov
2010-07-08 23:59             ` Roland McGrath
2010-07-09  0:41               ` Paul E. McKenney
2010-07-09  1:01                 ` Roland McGrath
2010-07-09 16:18                   ` Paul E. McKenney
2010-06-21 20:51       ` Paul E. McKenney [this message]
2010-06-21 21:22         ` Eric W. Biederman
2010-06-21 21:38           ` Paul E. McKenney
2010-06-22 21:23         ` Oleg Nesterov
2010-06-22 22:12           ` Paul E. McKenney
2010-06-23 15:24             ` Oleg Nesterov
2010-06-24 18:07               ` Paul E. McKenney
2010-06-24 18:50                 ` Chris Friesen
2010-06-24 22:00                   ` Oleg Nesterov
2010-06-25  0:08                     ` Eric W. Biederman
2010-06-25  3:42                       ` Paul E. McKenney
2010-06-25 10:08                       ` Oleg Nesterov
2010-07-09  0:52                       ` Roland McGrath
2010-06-24 21:14                 ` Roland McGrath
2010-06-25  3:37                   ` Paul E. McKenney
2010-07-09  0:41                     ` Roland McGrath
2010-06-24 21:57                 ` Oleg Nesterov
2010-06-25  3:41                   ` Paul E. McKenney
2010-06-25  9:55                     ` Oleg Nesterov
2010-06-28 23:43                       ` Paul E. McKenney
2010-06-29 13:05                         ` Oleg Nesterov
2010-06-29 15:34                           ` Paul E. McKenney
2010-06-29 17:54                             ` Oleg Nesterov
2010-06-19  5:00   ` Mandeep Baines
2010-06-19  5:35     ` Frederic Weisbecker
2010-06-19 15:44       ` Mandeep Baines
2010-06-19 19:19     ` Oleg Nesterov
2010-06-18 20:11 ` [PATCH] fix the racy check_hung_uninterruptible_tasks()->rcu_lock_break() logic Frederic Weisbecker
2010-06-18 20:38 ` Mandeep Singh Baines

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100621205128.GI2354@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=dzickus@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=fweisbec@gmail.com \
    --cc=jmarchan@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=msb@google.com \
    --cc=oleg@redhat.com \
    --cc=roland@redhat.com \
    --cc=stable@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.