From: Peter Zijlstra <peterz@infradead.org>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Darren Hart <darren@dvhart.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Jerome Marchand <jmarchan@redhat.com>,
	Larry Woodman <lwoodman@redhat.com>,
	Mateusz Guzik <mguzik@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/1] futex: check PF_KTHREAD rather than !p->mm to filter out kthreads
Date: Mon, 2 Feb 2015 16:11:59 +0100
Message-ID: <20150202151159.GE26304@twins.programming.kicks-ass.net>
In-Reply-To: <20150202140515.GA26398@redhat.com>

On Mon, Feb 02, 2015 at 03:05:15PM +0100, Oleg Nesterov wrote:

> First of all, why exactly do we need this mm/PF_KTHREAD check added by
> f0d71b3dcb8332f7971 ? Of course, it is simply wrong to declare a random
> kernel thread to be the owner as the changelog says. But why kthread is
> worse than a random user-space task, say, /sbin/init?

As the changelog says, we _should_ equally disallow other userspace
tasks that do not share the futex value with us; it's just that at the
time we could not come up with a sensible (and cheap) way of testing for
this.

> IIUC, the fact that we can abuse ->pi_state_list is not that bad, no matter
> if this (k)thread will exit or not. AFAICS, the only problem is that we can
> boost the prio of this thread. Or I missed another problem?

No, that's it.

> I am asking because we need to backport some fixes, and I am trying to
> convince myself that I actually understand what I am trying to do ;)



> And another question. Lets forget about this ->mm check. I simply can not
> understand this
> 
> 	ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN
> 
> logic in attach_to_pi_owner(). First of all, why do we need to retry if
> PF_EXITING is set but PF_EXITPIDONE is not? Why we can not simply ignore
> PF_EXITING and rely on exit_pi_state_list() if PF_EXITPIDONE is not set?
> 
> I must have missed something but this looks buggy, I do not see any
> preemption point in this "retry" loop. Suppose that max_cpus=1 and rt_task()
> preempts the non-rt PF_EXITING owner. Looks like futex_lock_pi() can spin
> forever in this case? (OK, ignoring RT throttling).

This is not something I've ever looked at before; 778e9a9c3e71
("pi-futex: fix exit races and locking problems") seems to suggest it's
possible to get onto tsk->pi_state_list after exit_pi_state_list().

So while the call flows below show preemption points, those don't
actually help against RT tasks; a FIFO-99 task will always be more
eligible to run than most others.
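
To make the starvation argument concrete, here is a toy model (not
kernel code, and separate from the call flows further down) of
strict-priority selection on a single CPU. The names toy_task,
toy_pick_next() and toy_owner_progress() are made up for illustration.
The only assumptions are that the runnable task with the highest
priority always wins, and that a spinning waiter stays runnable across
its cond_resched().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; a larger prio is more eligible. */
struct toy_task {
	int prio;
	bool runnable;
	int slices_run;
};

/* Strict priority selection: the highest-prio runnable task wins. */
static size_t toy_pick_next(const struct toy_task *tasks, size_t n)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++) {
		if (!tasks[i].runnable)
			continue;
		if (!tasks[best].runnable || tasks[i].prio > tasks[best].prio)
			best = i;
	}
	return best;
}

/*
 * tasks[0] models the waiter looping on -EAGAIN, tasks[1] the exiting
 * owner that still has to reach PF_EXITPIDONE. Neither ever blocks, so
 * each scheduling decision is a plain priority comparison.
 * Returns how many slices the owner got out of 'slices' decisions.
 */
int toy_owner_progress(int waiter_prio, int owner_prio, int slices)
{
	struct toy_task tasks[2] = {
		{ .prio = waiter_prio, .runnable = true, .slices_run = 0 },
		{ .prio = owner_prio,  .runnable = true, .slices_run = 0 },
	};

	for (int i = 0; i < slices; i++) {
		/* The waiter's cond_resched() lands here and it is picked again. */
		size_t next = toy_pick_next(tasks, 2);

		tasks[next].slices_run++;
	}
	return tasks[1].slices_run;
}
```

With a priority-99 waiter and a lower-priority owner,
toy_owner_progress(99, 1, 1000) reports that the owner never ran, which
is the case where the retry loop cannot make progress.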

So yes, I do like your proposal of putting PF_EXITPIDONE under the
->pi_lock section that handles exit_pi_state_list().
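
A minimal userspace sketch of that proposal, assuming a pthread mutex
as a stand-in for ->pi_lock and a plain counter for ->pi_state_list.
All model_* and MODEL_* names are hypothetical, the flag values are not
the kernel's, and the real exit_pi_state_list() has to juggle ->pi_lock
against the hash bucket lock, which this ignores. The point is only
that once PF_EXITPIDONE is set and tested under the same lock, the
attach side has no intermediate state left to retry on.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Model flag bits; deliberately not the kernel's values. */
#define MODEL_PF_EXITING	0x1u
#define MODEL_PF_EXITPIDONE	0x2u

struct model_task {
	pthread_mutex_t pi_lock;	/* stand-in for tsk->pi_lock */
	unsigned int flags;
	int nr_pi_states;		/* stand-in for tsk->pi_state_list */
};

void model_task_init(struct model_task *t)
{
	pthread_mutex_init(&t->pi_lock, NULL);
	t->flags = 0;
	t->nr_pi_states = 0;
}

/*
 * Attach side: MODEL_PF_EXITING alone is ignored. Either the owner has
 * not yet drained its list (we queue and let it clean up), or it has,
 * and we return -ESRCH. There is no -EAGAIN case.
 */
int model_attach_to_pi_owner(struct model_task *t)
{
	int ret = 0;

	pthread_mutex_lock(&t->pi_lock);
	if (t->flags & MODEL_PF_EXITPIDONE)
		ret = -ESRCH;
	else
		t->nr_pi_states++;
	pthread_mutex_unlock(&t->pi_lock);
	return ret;
}

/*
 * Exit side: set the DONE flag and drain the list in one critical
 * section, so nothing can be queued after the drain.
 * Returns the number of states that were released.
 */
int model_exit_pi_state_list(struct model_task *t)
{
	int released;

	pthread_mutex_lock(&t->pi_lock);
	t->flags |= MODEL_PF_EXITPIDONE;
	released = t->nr_pi_states;
	t->nr_pi_states = 0;
	pthread_mutex_unlock(&t->pi_lock);
	return released;
}
```

In this model an attach that races with exit either lands before the
drain and is released by it, or lands after and sees -ESRCH.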

I further think we can remove the smp_mb(); raw_spin_unlock_wait() from
do_exit() -- this would offset the new unconditional ->pi_lock
acquisition in exit_pi_state_list(). The comment there suggests robust
futexes are involved, but I cannot find any user other than the PI state
muck testing ->flags.

As for the recursive fault: I think the safer option is to set
PF_EXITPIDONE and not register more PI states, as opposed to allowing
more and more states to be added. Yes, we'll leak whatever is currently
there, but there is no point in allowing it to get worse.


do_exit()
{
	exit_signals(tsk); /* sets PF_EXITING */

	smp_mb();
	raw_spin_unlock_wait(&tsk->pi_lock);

	exit_mm() {
		mm_release() {
			exit_pi_state_list();
		}
	}

	tsk->flags |= PF_EXITPIDONE;
}

vs

futex_lock_pi()
{
retry:
	...

	ret = futex_lock_pi_atomic() {
		attach_to_pi_owner() {
			raw_spin_lock(&tsk->pi_lock);
			if (PF_EXITING) {
				ret = PF_EXITPIDONE ? -ESRCH : -EAGAIN;
				raw_spin_unlock(&tsk->pi_lock);
				return ret;
			}
		}
	}
	if (ret) {
		switch(ret) {
		...

		case -EAGAIN:
			...
			cond_resched();
			goto retry;
		}
	}
}

vs

futex_requeue()
{
retry:
	...

	ret = futex_proxy_trylock_atomic() {
		ret = futex_lock_pi_atomic() {
			attach_to_pi_owner() {
				raw_spin_lock(&tsk->pi_lock);
				if (PF_EXITING) {
					ret = PF_EXITPIDONE ? -ESRCH : -EAGAIN;
					raw_spin_unlock(&tsk->pi_lock);
					return ret;
				}
			}
		}
	}

	if (ret > 0) {
		ret = lookup_pi_state() {
			attach_to_pi_owner() {
				raw_spin_lock(&tsk->pi_lock);
				if (PF_EXITING) {
					ret = PF_EXITPIDONE ? -ESRCH : -EAGAIN;
					raw_spin_unlock(&tsk->pi_lock);
					return ret;
				}
			}
		}
	}

	...
	switch(ret) {
		...
	case -EAGAIN:
		...
		cond_resched();
		goto retry;
	}
}

vs



