linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Peter Oskolkov <posk@posk.io>
Cc: Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Andy Lutomirski <luto@kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-api@vger.kernel.org, Paul Turner <pjt@google.com>,
	Ben Segall <bsegall@google.com>, Peter Oskolkov <posk@google.com>,
	Andrei Vagin <avagin@google.com>, Jann Horn <jannh@google.com>,
	Thierry Delisle <tdelisle@uwaterloo.ca>
Subject: Re: [PATCH v0.9.1 3/6] sched/umcg: implement UMCG syscalls
Date: Mon, 29 Nov 2021 22:08:41 +0100	[thread overview]
Message-ID: <20211129210841.GO721624@worktop.programming.kicks-ass.net> (raw)
In-Reply-To: <CAFTs51UzR=m6+vcjTCNOGwGu3ZwB5GMrg+cSQy2ecvCWxhZvEQ@mail.gmail.com>

On Mon, Nov 29, 2021 at 09:34:49AM -0800, Peter Oskolkov wrote:
> On Mon, Nov 29, 2021 at 8:41 AM Peter Zijlstra <peterz@infradead.org> wrote:

> > However, do note this whole scheme fundamentally has some of that, the
> > moment the syscall unblocks until sys_exit is 'unmanaged' runtime for
> > all tasks, they can consume however much time the syscall needs there.
> >
> > Also, timeout on sys_umcg_wait() gets you the exact same situation (or
> > worse, multiple running workers).
> 
> It should not. Timed out workers should be added to the runnable list
> and not become running unless a server chooses so. So sys_umcg_wait()
> with a timeout should behave similarly to a normal sleep, in that the
> server is woken upon the worker blocking, and upon the worker wakeup
> the worker is added to the woken workers list and waits for a server
> to run it. The only difference is that in a sleep the worker becomes
> BLOCKED, while in sys_umcg_wait() the worker is RUNNABLE the whole
> time.

OK, that's somewhat subtle and I hadn't gotten that either.

Currently it return -ETIMEDOUT in RUNNING state for both server and
worker callers.

Let me go fix that then.

> > > Another big concern I have is that you removed UMCG_TF_LOCKED. I
> >
> > OOh yes, I forgot to mention that. I couldn't figure out what it was
> > supposed to do.
> >
> > > definitely needed it to guard workers during "sched work" in the
> > > userspace in my approach. I'm not sure if the flag is absolutely
> > > needed with your approach, but most likely it is - the kernel-side
> > > scheduler does lock tasks and runqueues and disables interrupts and
> > > migrations and other things so that the scheduling logic is not
> > > hijacked by concurrent stuff. Why do you assume that the userspace
> > > scheduling code does not need similar protections?
> >
> > I've not yet come across a case where this is needed. Migration for
> > instance is possible when RUNNABLE, simply write ::server_tid before
> > ::state. Userspace just needs to make sure who actually owns the task,
> > but it can do that outside of this state.
> >
> > But like I said; I've not yet done the userspace part (and I lost most
> > of today trying to install a new machine), so perhaps I'll run into it
> > soon enough.
> 
> The most obvious scenario where I needed locking is when worker A
> wants to context switch into worker B, while another worker C wants to
> context switch into worker A, and worker A pagefaults. This involves:
> 
> worker A context: worker A context switches into worker B:
> 
> - worker B::server_tid = worker A::server_tid
> - worker A::server_tid = none
> - worker A::state = runnable
> - worker B::state = running
> - worker A::next_tid = worker B
> - worker A calls sys_umcg_wait()
> 
> worker B context: before the above completes, worker C wants to
> context switch into worker A, with similar steps.
> 
> "interrupt context": in the middle of the mess above, worker A pagefaults
> 
> Too many moving parts. UMCG_TF_LOCKED helped me make this mess
> manageable. Maybe without pagefaults clever ordering of the operations
> listed above could make things work, but pagefaults mess things badly,
> so some kind of "preempt_disable()" for the userspace scheduling code
> was needed, and UMCG_TF_LOCKED was the solution I had.

I'm not sure I'm following. For this to be true A and C must be running
on a different server right?

So we have something like:

	S0 running A			S1 running B

Therefore:

	S0::state == RUNNABLE		S1::state == RUNNABLE
	A::server_tid == S0.tid		B::server_tid == S1.tid
	A::state == RUNNING		B::state == RUNNING

Now, you want A to switch to C, therefore C had better be with S0, eg we
have:

	C::server_tid == S0.tid
	C::state == RUNNABLE

So then A does:

	A::next_tid = C.tid;
	sys_umcg_wait();

Which will:

	pin(A);
	pin(S0);

	cmpxchg(A::state, RUNNING, RUNNABLE);

	next_tid = A::next_tid; // C

	enqueue(S0::runnable, A);

At which point B steals S0's runnable queue, and tries to make A go.

					runnable = xchg(S0::runnable_list_ptr, NULL); // == A
					A::server_tid = S1.tid;
					B::next_tid = A.tid;
					sys_umcg_wait();

	wake(C)
	  cmpxchg(C::state, RUNNABLE, RUNNING); <-- *fault*


Something like that, right?

What currently happens is that S0 goes back to S0 and S1 ends up in A.
That is, if, for any reason we fail to wake next_tid, we'll wake
server_tid.

So then S0 wakes up and gets to re-evaluate life. If it has another
worker it can go run that, otherwise it can try and steal a worker
somewhere or just idle out.

Now arguably, the only reason A->C can fault is because C is garbage, at
which point your program is malformed and it doesn't matter what
happens one way or the other.

  reply	other threads:[~2021-11-29 23:06 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-22 21:13 [PATCH v0.9.1 0/6] sched,mm,x86/uaccess: implement User Managed Concurrency Groups Peter Oskolkov
2021-11-22 21:13 ` [PATCH v0.9.1 1/6] sched/umcg: add WF_CURRENT_CPU and externise ttwu Peter Oskolkov
2021-11-22 21:13 ` [PATCH v0.9.1 2/6] mm, x86/uaccess: add userspace atomic helpers Peter Oskolkov
2021-11-24 14:31   ` Peter Zijlstra
2021-11-22 21:13 ` [PATCH v0.9.1 3/6] sched/umcg: implement UMCG syscalls Peter Oskolkov
2021-11-24 18:36   ` kernel test robot
2021-11-24 20:08   ` Peter Zijlstra
2021-11-24 21:32     ` Peter Zijlstra
2021-11-25 17:28     ` Peter Oskolkov
2021-11-26 17:09       ` Peter Zijlstra
2021-11-26 21:08         ` Thomas Gleixner
2021-11-26 21:59           ` Peter Zijlstra
2021-11-26 22:07             ` Peter Zijlstra
2021-11-27  0:45             ` Thomas Gleixner
2021-11-29 15:05               ` Peter Zijlstra
2021-11-26 22:16         ` Peter Zijlstra
2021-11-27  1:16           ` Thomas Gleixner
2021-11-29 15:07             ` Peter Zijlstra
2021-11-29  0:29         ` Peter Oskolkov
2021-11-29 16:41           ` Peter Zijlstra
2021-11-29 17:34             ` Peter Oskolkov
2021-11-29 21:08               ` Peter Zijlstra [this message]
2021-11-29 21:29                 ` Peter Zijlstra
2021-11-29 23:38                 ` Peter Oskolkov
2021-12-06 11:32                   ` Peter Zijlstra
2021-12-06 12:04                     ` Peter Zijlstra
2021-12-13 13:55                     ` Peter Zijlstra
2021-12-06 11:47               ` Peter Zijlstra
2022-01-19 17:26                 ` Peter Oskolkov
2022-01-20 11:07                   ` Peter Zijlstra
2021-11-24 21:19   ` Peter Zijlstra
2021-11-26 21:11     ` Thomas Gleixner
2021-11-26 21:52       ` Peter Zijlstra
2021-11-29 22:07         ` Thomas Gleixner
2021-11-29 22:22           ` Peter Zijlstra
2021-11-24 21:41   ` Peter Zijlstra
2021-11-24 21:58   ` Peter Zijlstra
2021-11-24 22:18   ` Peter Zijlstra
2021-11-22 21:13 ` [PATCH v0.9.1 4/6] sched/umcg, lib/umcg: implement libumcg Peter Oskolkov
2021-11-22 21:13 ` [PATCH v0.9.1 5/6] sched/umcg: add Documentation/userspace-api/umcg.txt Peter Oskolkov
2021-11-22 21:13 ` [PATCH v0.9.1 6/6] sched/umcg, lib/umcg: add tools/lib/umcg/libumcg.txt Peter Oskolkov
2021-11-24 14:06 ` [PATCH v0.9.1 0/6] sched,mm,x86/uaccess: implement User Managed Concurrency Groups Peter Zijlstra
2021-11-24 16:28   ` Peter Oskolkov
2021-11-24 17:20     ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211129210841.GO721624@worktop.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=avagin@google.com \
    --cc=bsegall@google.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=jannh@google.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=pjt@google.com \
    --cc=posk@google.com \
    --cc=posk@posk.io \
    --cc=tdelisle@uwaterloo.ca \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).