All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Russell King - ARM Linux admin <linux@armlinux.org.uk>,
	Chris Metcalf <cmetcalf@ezchip.com>,
	Christoph Lameter <cl@linux.com>, Kirill Tkhai <tkhai@yandex.ru>,
	Mike Galbraith <efault@gmx.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	Davidlohr Bueso <dave@stgolabs.net>,
	mpe@ellerman.id.au
Subject: Re: [PATCH 2/3] task: RCU protect tasks on the runqueue
Date: Wed, 25 Sep 2019 09:36:43 +0200	[thread overview]
Message-ID: <20190925073643.GA4536@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <87woehek20.fsf@x220.int.ebiederm.org>

On Mon, Sep 09, 2019 at 07:22:15AM -0500, Eric W. Biederman wrote:

> Let me see if I can explain my confusion in terms of task_numa_compare.
> 
> The function task_numa_comare now does:
> 
> 	rcu_read_lock();
> 	cur = rcu_dereference(dst_rq->curr);
> 
> Then it proceeds to examine a few fields of cur.
> 
> 	cur->flags;
>         cur->cpus_ptr;
>         cur->numa_group;
>         cur->numa_faults[];
>         cur->total_numa_faults;
>         cur->se.cfs_rq;
> 
> My concern here is the ordering between setting rq->curr and and setting
> those other fields.  If we read values of those fields that were not
> current when we set cur than I think task_numa_compare would give an
> incorrect answer.

That code is explicitly without serialization. One memory barrier isn't
going to make any difference what so ever.

Specifically, those numbers will change _after_ we make it current, not
before.

> From what I can see pick_next_task_fair does not mess with any of
> the fields that task_numa_compare uses.  So the existing memory barrier
> should be more than sufficient for that case.

There should not be, but even if there is, load-balancing in general
(very much including the NUMA balancing) is completely unserialized and
prone to reading stale data at all times.

This is on purpose; it minimizes the interference of load-balancing, and
if we make a 'wrong' choice, the next pass can fix it up.

> So I think I just need to write up a patch 4/3 that replaces the my
> "rcu_assign_pointer(rq->curr, next)" with "RCU_INIT_POINTER(rq->curr,
> next)".  And includes a great big comment to that effect.
> 
> 
> 
> Now maybe I am wrong.  Maybe we can afford to see data that was changed
> before the task became rq->curr in task_numa_compare.  But I don't think
> so.  I really think it is that full memory barrier just a bit earlier
> in __schedule that is keeping us safe.
> 
> Am I wrong?

Not in so far that we now both seem to agree on using
RCU_INIT_POINTER(); but we're still disagreeing on why.

My argument is that since we can already obtain any task_struct pointer
through find_task_by_vpid() and friends, any access to its individual
members must already have serialzation rules (or explicitly none, as for
load-balancing).

I'm viewing this as just another variant of find_task_by_vpid(), except
we index by CPU instead of PID. There can be changes to task_struct
between to invocations of find_task_by_vpid(), any user will have to
somehow deal with that. This is no different.

Take for instance p->cpus_ptr, it has very specific serialzation rules
(which that NUMA balancer thing blatantly ignores), as do a lot of other
task_struct fields.

The memory barrier in question isn't going to help with any of that.

Specifically, if we care about the consistency of the numbers changed by
pick_next_task(), we must acquire rq->lock (of course, once we do that,
we can immediately use rq->curr without further RCU use).

  reply	other threads:[~2019-09-25  7:37 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-30 14:08 [BUG] Use of probe_kernel_address() in task_rcu_dereference() without checking return value Russell King - ARM Linux admin
2019-08-30 15:24 ` Oleg Nesterov
2019-08-30 15:30 ` Linus Torvalds
2019-08-30 15:40   ` Russell King - ARM Linux admin
2019-08-30 15:43     ` Linus Torvalds
2019-08-30 15:41   ` Linus Torvalds
2019-08-30 16:09     ` Oleg Nesterov
2019-08-30 16:21       ` Linus Torvalds
2019-08-30 16:44         ` Oleg Nesterov
2019-08-30 16:58           ` Linus Torvalds
2019-08-30 19:36         ` Eric W. Biederman
2019-09-02 13:40           ` Oleg Nesterov
2019-09-02 13:53             ` Peter Zijlstra
2019-09-02 14:44               ` Oleg Nesterov
2019-09-02 16:20                 ` Peter Zijlstra
2019-09-02 17:04             ` Eric W. Biederman
2019-09-02 17:34               ` Linus Torvalds
2019-09-03  4:50                 ` [PATCH 0/3] task: Making tasks on the runqueue rcu protected Eric W. Biederman
2019-09-03  4:51                   ` [PATCH 1/3] task: Add a count of task rcu users Eric W. Biederman
2019-09-04 14:36                     ` Oleg Nesterov
2019-09-04 14:44                     ` Frederic Weisbecker
2019-09-04 15:32                       ` Oleg Nesterov
2019-09-04 16:33                         ` Frederic Weisbecker
2019-09-04 18:20                           ` Linus Torvalds
2019-09-05 14:59                             ` Frederic Weisbecker
2019-09-03  4:52                   ` [PATCH 2/3] task: RCU protect tasks on the runqueue Eric W. Biederman
2019-09-03  7:41                     ` Peter Zijlstra
2019-09-03  7:47                       ` Peter Zijlstra
2019-09-03 16:44                         ` Eric W. Biederman
2019-09-03 17:08                           ` Linus Torvalds
2019-09-03 18:13                             ` Eric W. Biederman
2019-09-03 19:18                               ` Linus Torvalds
2019-09-03 20:06                                 ` Peter Zijlstra
2019-09-03 21:32                                   ` Paul E. McKenney
2019-09-05 20:02                                     ` Eric W. Biederman
2019-09-05 20:55                                       ` Paul E. McKenney
2019-09-06  7:07                                       ` Peter Zijlstra
2019-09-09 12:22                                         ` Eric W. Biederman
2019-09-25  7:36                                           ` Peter Zijlstra [this message]
2019-09-27  8:10                                   ` [tip: sched/urgent] tasks, sched/core: RCUify the assignment of rq->curr tip-bot2 for Eric W. Biederman
2019-09-03 19:42                               ` [PATCH 2/3] task: RCU protect tasks on the runqueue Peter Zijlstra
2019-09-14 12:31                           ` [PATCH v2 1/4] task: Add a count of task rcu users Eric W. Biederman
2019-09-14 12:31                           ` [PATCH v2 2/4] task: Ensure tasks are available for a grace period after leaving the runqueue Eric W. Biederman
2019-09-14 12:32                           ` [PATCH v2 3/4] task: With a grace period after finish_task_switch, remove unnecessary code Eric W. Biederman
2019-09-04 14:22                     ` [PATCH 2/3] task: RCU protect tasks on the runqueue Frederic Weisbecker
2019-09-03  4:52                   ` [PATCH 3/3] task: Clean house now that tasks on the runqueue are rcu protected Eric W. Biederman
2019-09-03  9:45                     ` kbuild test robot
2019-09-03 13:06                     ` Oleg Nesterov
2019-09-03 13:58                   ` [PATCH 0/3] task: Making tasks on the runqueue " Oleg Nesterov
2019-09-03 15:44                   ` Linus Torvalds
2019-09-03 19:46                     ` Peter Zijlstra
     [not found]                   ` <87muf7f4bf.fsf_-_@x220.int.ebiederm.org>
2019-09-14 12:33                     ` [PATCH v2 1/4] task: Add a count of task rcu users Eric W. Biederman
2019-09-15 13:54                       ` Paul E. McKenney
2019-09-27  8:10                       ` [tip: sched/urgent] tasks: Add a count of task RCU users tip-bot2 for Eric W. Biederman
2019-09-14 12:33                     ` [PATCH v2 2/4] task: Ensure tasks are available for a grace period after leaving the runqueue Eric W. Biederman
2019-09-15 14:07                       ` Paul E. McKenney
2019-09-15 14:09                         ` Paul E. McKenney
2019-09-27  8:10                       ` [tip: sched/urgent] tasks, sched/core: " tip-bot2 for Eric W. Biederman
2019-09-14 12:34                     ` [PATCH v2 3/4] task: With a grace period after finish_task_switch, remove unnecessary code Eric W. Biederman
2019-09-15 14:32                       ` Paul E. McKenney
2019-09-15 17:07                         ` Linus Torvalds
2019-09-15 18:47                           ` Paul E. McKenney
2019-09-27  8:10                       ` [tip: sched/urgent] tasks, sched/core: With a grace period after finish_task_switch(), " tip-bot2 for Eric W. Biederman
2019-09-14 12:35                     ` [PATCH v2 4/4] task: RCUify the assignment of rq->curr Eric W. Biederman
2019-09-15 14:41                       ` Paul E. McKenney
2019-09-15 17:59                         ` Eric W. Biederman
2019-09-15 18:25                           ` Eric W. Biederman
2019-09-15 18:48                             ` Paul E. McKenney
2019-09-20 23:02                       ` Frederic Weisbecker
2019-09-26  1:49                         ` Eric W. Biederman
2019-09-26 12:42                           ` Frederic Weisbecker
2019-09-14 17:43                     ` [PATCH v2 0/4] task: Making tasks on the runqueue rcu protected Linus Torvalds
2019-09-17 17:38                       ` Eric W. Biederman
2019-09-25  7:51                         ` Peter Zijlstra
2019-09-26  1:11                           ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190925073643.GA4536@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=cl@linux.com \
    --cc=cmetcalf@ezchip.com \
    --cc=dave@stgolabs.net \
    --cc=ebiederm@xmission.com \
    --cc=efault@gmx.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=mingo@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=oleg@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tkhai@yandex.ru \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.