All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Darren Hart <darren@dvhart.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>
Cc: Jerome Marchand <jmarchan@redhat.com>,
	Larry Woodman <lwoodman@redhat.com>,
	Mateusz Guzik <mguzik@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: [PATCH 0/1] futex: check PF_KTHREAD rather than !p->mm to filter out kthreads
Date: Mon, 2 Feb 2015 15:05:15 +0100	[thread overview]
Message-ID: <20150202140515.GA26398@redhat.com> (raw)

Thomas, et al, could you please review?

The change looks trivial, but I simply can not understand this logic,
please help.



First of all, why exactly do we need this mm/PF_KTHREAD check added by
f0d71b3dcb8332f7971 ? Of course, it is simply wrong to declare a random
kernel thread to be the owner as the changelog says. But why kthread is
worse than a random user-space task, say, /sbin/init?

IIUC, the fact that we can abuse ->pi_state_list is not that bad, no matter
if this (k)thread will exit or not. AFAICS, the only problem is that we can
boost the prio of this thread. Or I missed another problem?

I am asking because we need to backport some fixes, and I am trying to
convince myself that I actually understand what I am trying to do ;)




And another question. Lets forget about this ->mm check. I simply can not
understand this

	ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN

logic in attach_to_pi_owner(). First of all, why do we need to retry if
PF_EXITING is set but PF_EXITPIDONE is not? Why we can not simply ignore
PF_EXITING and rely on exit_pi_state_list() if PF_EXITPIDONE is not set?

I must have missed something but this looks buggy, I do not see any
preemption point in this "retry" loop. Suppose that max_cpus=1 and rt_task()
preempts the non-rt PF_EXITING owner. Looks like futex_lock_pi() can spin
forever in this case? (OK, ignoring RT throttling).

IOW. Could you explain why the patch below is wrong correctness-wise?
Lets ignore the fact it is obviously ugly and suboptimal...

Thanks,

Oleg.

--- x/kernel/exit.c
+++ x/kernel/exit.c
@@ -684,13 +684,7 @@ void do_exit(long code)
 	if (unlikely(tsk->flags & PF_EXITING)) {
 		pr_alert("Fixing recursive fault but reboot is needed!\n");
 		/*
-		 * We can do this unlocked here. The futex code uses
-		 * this flag just to verify whether the pi state
-		 * cleanup has been done or not. In the worst case it
-		 * loops once more. We pretend that the cleanup was
-		 * done as there is no way to return. Either the
-		 * OWNER_DIED bit is set by now or we push the blocked
-		 * task into the wait for ever nirwana as well.
+		 * XXX: ADD COMMENT
 		 */
 		tsk->flags |= PF_EXITPIDONE;
 		set_current_state(TASK_UNINTERRUPTIBLE);
@@ -699,8 +693,7 @@ void do_exit(long code)
 
 	exit_signals(tsk);  /* sets PF_EXITING */
 	/*
-	 * tsk->flags are checked in the futex code to protect against
-	 * an exiting task cleaning up the robust pi futexes.
+	 * XXX: anobody else needs this ?
 	 */
 	smp_mb();
 	raw_spin_unlock_wait(&tsk->pi_lock);
@@ -779,12 +772,6 @@ void do_exit(long code)
 	 * Make sure we are holding no locks:
 	 */
 	debug_check_no_locks_held();
-	/*
-	 * We can do this unlocked here. The futex code uses this flag
-	 * just to verify whether the pi state cleanup has been done
-	 * or not. In the worst case it loops once more.
-	 */
-	tsk->flags |= PF_EXITPIDONE;
 
 	if (tsk->io_context)
 		exit_io_context(tsk);
--- x/kernel/fork.c
+++ x/kernel/fork.c
@@ -803,8 +803,7 @@ void mm_release(struct task_struct *tsk,
 		tsk->compat_robust_list = NULL;
 	}
 #endif
-	if (unlikely(!list_empty(&tsk->pi_state_list)))
-		exit_pi_state_list(tsk);
+	exit_pi_state_list(tsk);
 #endif
 
 	uprobe_free_utask(tsk);
--- x/kernel/futex.c
+++ x/kernel/futex.c
@@ -722,6 +722,7 @@ void exit_pi_state_list(struct task_stru
 	 * versus waiters unqueueing themselves:
 	 */
 	raw_spin_lock_irq(&curr->pi_lock);
+	curr->flags |= PF_EXITPIDONE;
 	while (!list_empty(head)) {
 
 		next = head->next;
@@ -912,17 +913,15 @@ static int attach_to_pi_owner(u32 uval, 
 	 * p->pi_lock:
 	 */
 	raw_spin_lock_irq(&p->pi_lock);
-	if (unlikely(p->flags & PF_EXITING)) {
+	if (unlikely(p->flags & PF_EXITPIDONE)) {
 		/*
 		 * The task is on the way out. When PF_EXITPIDONE is
 		 * set, we know that the task has finished the
 		 * cleanup:
 		 */
-		int ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
-
 		raw_spin_unlock_irq(&p->pi_lock);
 		put_task_struct(p);
-		return ret;
+		return -ESRCH;
 	}
 
 	/*


             reply	other threads:[~2015-02-02 14:06 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-02 14:05 Oleg Nesterov [this message]
2015-02-02 14:05 ` [PATCH 1/1] futex: check PF_KTHREAD rather than !p->mm to filter out kthreads Oleg Nesterov
2015-02-04 10:48   ` Peter Zijlstra
2015-02-14 18:01   ` Davidlohr Bueso
2015-02-14 20:57     ` Oleg Nesterov
2015-02-14 21:15       ` Davidlohr Bueso
2015-02-14 21:54         ` Oleg Nesterov
2015-02-18 17:11   ` [tip:locking/core] locking/futex: Check " tip-bot for Oleg Nesterov
2015-02-02 15:11 ` [PATCH 0/1] futex: check " Peter Zijlstra
2015-02-02 15:13   ` Peter Zijlstra
2015-02-02 15:14     ` Peter Zijlstra
2015-02-02 16:20   ` Oleg Nesterov
2015-02-03 20:09   ` Oleg Nesterov
2015-02-04 11:12     ` Peter Zijlstra
2015-02-04 20:25       ` Oleg Nesterov
2015-02-05 16:27         ` Peter Zijlstra
2015-02-05 18:10           ` Oleg Nesterov
2015-02-06 10:46             ` Peter Zijlstra
2015-02-06 17:04               ` Oleg Nesterov
2015-02-09 20:38                 ` Darren Hart
2015-02-10 11:14                   ` Oleg Nesterov
2015-02-16 20:13 ` [PATCH 0/1] futex: don't spin waiting for PF_EXITING -> PF_EXITPIDONE transition Oleg Nesterov
2015-02-16 20:13   ` [PATCH 1/1] " Oleg Nesterov
2015-02-27  9:52     ` Peter Zijlstra
2015-02-27 11:54       ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150202140515.GA26398@redhat.com \
    --to=oleg@redhat.com \
    --cc=darren@dvhart.com \
    --cc=jmarchan@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lwoodman@redhat.com \
    --cc=mguzik@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.