All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Peter Oskolkov <posk@posk.io>
Cc: Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Andy Lutomirski <luto@kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-api@vger.kernel.org, Paul Turner <pjt@google.com>,
	Ben Segall <bsegall@google.com>, Peter Oskolkov <posk@google.com>,
	Andrei Vagin <avagin@google.com>, Jann Horn <jannh@google.com>,
	Thierry Delisle <tdelisle@uwaterloo.ca>
Subject: Re: [PATCH v0.9.1 3/6] sched/umcg: implement UMCG syscalls
Date: Mon, 29 Nov 2021 17:41:05 +0100	[thread overview]
Message-ID: <YaUCoe07Wl9Stlch@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <CAFTs51XnN+N74i1XHvRUAUWd04-Fs9uV6ouXo=CQSQs8MaEM5A@mail.gmail.com>

On Sun, Nov 28, 2021 at 04:29:11PM -0800, Peter Oskolkov wrote:

> wait_wake_only is not needed if you have both next_tid and server_tid,
> as your patch has. In my version of the patch, next_tid is the same as
> server_tid, so the flag is needed to indicate to the kernel that
> next_tid is the wakee, not the server.

Ah, okay.

> re: (idle_)server_tid_ptr: it seems that you assume that blocked
> workers keep their servers, while in my patch they "lose them" once
> they block, and so there should be a global idle server pointer to
> wake the server in my scheme (if there is an idle one). The main
> difference is that in my approach a server has only a single, running,
> worker assigned to it, while in your approach it can have a number of
> blocked/idle workers to take care of as well.

Correct; I've been thinking in analogues of the way we schedule CPUs.
Each CPU has a ready/run queue along with the current task.
fundamentally the RUNNABLE tasks need to go somewhere when all servers
are busy. So at that point the previous server is as good a place as
any.

Now, I sympathise with a blocked task not having a relation; I often
argue this same, since we have wakeup balancing etc. And I've not really
thought about how to best do wakeup-balancing, also see below.

> The main difference between our approaches, as I see it: in my
> approach if a worker is running, its server is sleeping, period. If we
> have N servers, and N running workers, there are no servers to wake
> when a previously blocked worker finishes its blocking op. In your
> approach, it seems that N servers have each a bunch of workers
> pointing at them, and a single worker running. If a previously blocked
> worker wakes up, it wakes the server it was assigned to previously,

Right; it does that. It can check the ::state of it's current task,
possibly set TF_PREEMPT or just go back to sleep.

> and so now we have more than N physical tasks/threads running: N
> workers and the woken server. This is not ideal: if the process is
> affined to only N CPUs, that means a worker will be preempted to let
> the woken server run, which is somewhat against the goal of letting
> the workers run more or less uninterrupted. This is not deal breaking,
> but maybe something to keep in mind.

I suppose it's easy enough to make this behaviour configurable though;
simply enqueue and not wake.... Hmm.. how would this worker know if the
server was 'busy' or not? The whole 'current' thing is a user-space
construct. I suppose that's what your pointer was for? Puts an actual
idle server in there, if there is one. Let me ponder that a bit.

However, do note this whole scheme fundamentally has some of that, the
moment the syscall unblocks until sys_exit is 'unmanaged' runtime for
all tasks, they can consume however much time the syscall needs there.

Also, timeout on sys_umcg_wait() gets you the exact same situation (or
worse, multiple running workers).

> Another big concern I have is that you removed UMCG_TF_LOCKED. I

OOh yes, I forgot to mention that. I couldn't figure out what it was
supposed to do.

> definitely needed it to guard workers during "sched work" in the
> userspace in my approach. I'm not sure if the flag is absolutely
> needed with your approach, but most likely it is - the kernel-side
> scheduler does lock tasks and runqueues and disables interrupts and
> migrations and other things so that the scheduling logic is not
> hijacked by concurrent stuff. Why do you assume that the userspace
> scheduling code does not need similar protections?

I've not yet come across a case where this is needed. Migration for
instance is possible when RUNNABLE, simply write ::server_tid before
::state. Userspace just needs to make sure who actually owns the task,
but it can do that outside of this state.

But like I said; I've not yet done the userspace part (and I lost most
of today trying to install a new machine), so perhaps I'll run into it
soon enough.



  reply	other threads:[~2021-11-29 20:07 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-22 21:13 [PATCH v0.9.1 0/6] sched,mm,x86/uaccess: implement User Managed Concurrency Groups Peter Oskolkov
2021-11-22 21:13 ` [PATCH v0.9.1 1/6] sched/umcg: add WF_CURRENT_CPU and externise ttwu Peter Oskolkov
2021-11-22 21:13 ` [PATCH v0.9.1 2/6] mm, x86/uaccess: add userspace atomic helpers Peter Oskolkov
2021-11-24 14:31   ` Peter Zijlstra
2021-11-22 21:13 ` [PATCH v0.9.1 3/6] sched/umcg: implement UMCG syscalls Peter Oskolkov
2021-11-24 18:36   ` kernel test robot
2021-11-24 18:36     ` kernel test robot
2021-11-24 20:08   ` Peter Zijlstra
2021-11-24 21:32     ` Peter Zijlstra
2021-11-25 17:28     ` Peter Oskolkov
2021-11-26 17:09       ` Peter Zijlstra
2021-11-26 21:08         ` Thomas Gleixner
2021-11-26 21:59           ` Peter Zijlstra
2021-11-26 22:07             ` Peter Zijlstra
2021-11-27  0:45             ` Thomas Gleixner
2021-11-29 15:05               ` Peter Zijlstra
2021-11-26 22:16         ` Peter Zijlstra
2021-11-27  1:16           ` Thomas Gleixner
2021-11-29 15:07             ` Peter Zijlstra
2021-11-29  0:29         ` Peter Oskolkov
2021-11-29 16:41           ` Peter Zijlstra [this message]
2021-11-29 17:34             ` Peter Oskolkov
2021-11-29 21:08               ` Peter Zijlstra
2021-11-29 21:29                 ` Peter Zijlstra
2021-11-29 23:38                 ` Peter Oskolkov
2021-12-06 11:32                   ` Peter Zijlstra
2021-12-06 12:04                     ` Peter Zijlstra
2021-12-13 13:55                     ` Peter Zijlstra
2021-12-06 11:47               ` Peter Zijlstra
2022-01-19 17:26                 ` Peter Oskolkov
2022-01-20 11:07                   ` Peter Zijlstra
2021-11-24 21:19   ` Peter Zijlstra
2021-11-26 21:11     ` Thomas Gleixner
2021-11-26 21:52       ` Peter Zijlstra
2021-11-29 22:07         ` Thomas Gleixner
2021-11-29 22:22           ` Peter Zijlstra
2021-11-24 21:41   ` Peter Zijlstra
2021-11-24 21:58   ` Peter Zijlstra
2021-11-24 22:18   ` Peter Zijlstra
2021-11-22 21:13 ` [PATCH v0.9.1 4/6] sched/umcg, lib/umcg: implement libumcg Peter Oskolkov
2021-11-22 21:13 ` [PATCH v0.9.1 5/6] sched/umcg: add Documentation/userspace-api/umcg.txt Peter Oskolkov
2021-11-22 21:13 ` [PATCH v0.9.1 6/6] sched/umcg, lib/umcg: add tools/lib/umcg/libumcg.txt Peter Oskolkov
2021-11-24 14:06 ` [PATCH v0.9.1 0/6] sched,mm,x86/uaccess: implement User Managed Concurrency Groups Peter Zijlstra
2021-11-24 16:28   ` Peter Oskolkov
2021-11-24 17:20     ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YaUCoe07Wl9Stlch@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=avagin@google.com \
    --cc=bsegall@google.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=jannh@google.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=pjt@google.com \
    --cc=posk@google.com \
    --cc=posk@posk.io \
    --cc=tdelisle@uwaterloo.ca \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.