LKML Archive on lore.kernel.org
 help / color / Atom feed
From: peterz@infradead.org
To: Peter Oskolkov <posk@posk.io>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Ingo Molnar <mingo@kernel.org>,
	Darren Hart <dvhart@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Peter Oskolkov <posk@google.com>,
	Andrei Vagin <avagin@google.com>, Paul Turner <pjt@google.com>,
	Ben Segall <bsegall@google.com>, Aaron Lu <aaron.lwe@gmail.com>,
	Waiman Long <longman@redhat.com>
Subject: Re: [PATCH for 5.9 1/3] futex: introduce FUTEX_SWAP operation
Date: Mon, 27 Jul 2020 11:51:10 +0200
Message-ID: <20200727095110.GG119549@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <CAFTs51UJhC9TmXkzz8VbDNmkSEyZE29=dRdUi65TDpSYqoK5vw@mail.gmail.com>

On Thu, Jul 23, 2020 at 05:25:05PM -0700, Peter Oskolkov wrote:
> On Thu, Jul 23, 2020 at 4:28 AM Peter Zijlstra <peterz@infradead.org> wrote:

> > What worries me is how FUTEX_SWAP would interact with the future
> > FUTEX_LOCK / FUTEX_UNLOCK. When we implement pthread_mutex with those,
> > there's very few WAIT/WAKE left.
> 
> [+cc Waiman Long]
> 
> I've looked through the latest FUTEX_LOCK patchset I could find (
> https://lore.kernel.org/patchwork/cover/772643/ and related), and it seems
> that FUTEX_SWAP and FUTEX_LOCK/FUTEX_UNLOCK patchsets
> address the same issue (slow wakeups) but for different use cases:
> 
> FUTEX_LOCK/FUTEX_UNLOCK uses spinning and lock stealing to
> improve futex wake/wait performance in high contention situations;

The spinning is best for light contention.

> FUTEX_SWAP is designed to be used for fast context switching with
> _no_ contention by design: the waker that is going to sleep, and the wakee
> are using different futexes; the userspace will have a futex per thread/task,
> and when needed the thread/task will either simply sleep on its futex,
> or context switch (=FUTEX_SWAP) into a different thread/task.

Right, but how can you tell what the next thing after UNLOCK is going to
be.. that's going to be tricky.

> I can also imagine that instead of combining WAIT/WAKE for
> fast context switching, a variant of FUTEX_SWAP can use LOCK/UNLOCK
> operations in the future, when these are available; but again, I fully
> expect that
> a single "FUTEX_LOCK the current task on futex A, FUTEX_UNLOCK futex B,
> context switch into the wakee" futex op will be much faster than doing
> the same thing
> in two syscalls, as FUTEX_LOCK/FUTEX_UNLOCK does not seem to be concerned
> with fast waking of a sleeping task, but more with minimizing sleeping
> in the first place.

Correct; however the reason I like LOCK/UNLOCK is that it exposes the
blocking relations to the kernel -- and that ties into yet another
unfinished patch-set :-/

  https://lkml.kernel.org/r/20181009092434.26221-1-juri.lelli@redhat.com

> What will be faster: FUTEX_SWAP that does
>    FUTEX_WAKE (futex A) + FUTEX_WAIT (current, futex B),
> or FUTEX_SWAP that does
>    FUTEX_UNLOCK (futex A) + FUTEX_LOCK (current, futex B)?

Well, perhaps both argue against having SWAP but instead having compound
futex ops. Something I think we're already starting to see with the new
futex API patches posted here:

  https://lkml.kernel.org/r/20200709175921.211387-1-andrealmeid@collabora.com

sys_futex_waitv() is effectively a whole bunch of WAIT ops combined.

> As wake+wait will always put the waker to sleep, it means that
> there will be a true context switch on the same CPU on the fast path;
> on the other hand, unlock+lock will potentially evade sleeping,
> so the wakee will often run on a different CPU (with the waker
> spinning instead of sleeping?), thus not benefitting from cache locality
> that fast context switching on the same CPU is meant to use...
> 
> I'll add some of the considerations above to the expanded cover letter
> (or a commit message).

It's Monday morning, so perhaps I'm making a mess of things, but
something like the below, where our thread t2 issues synchronous work to
t1:


	t1		t2
			LOCK A
	LOCK B
1:	LOCK A

			...


			UNLOCK A
			LOCK B
	...

	UNLOCK B
	UNLOCK A
	LOCK B
	GOTO 1
			UNLOCK B
			LOCK A
			...

Is an abuse of mutexes, that is, it implements completions using a
(fair) mutex. A guards the work-queue, B is the 'completion'.

Then, if you teach it that a compound UNLOCK-A + LOCK-B, where
the lock owner of B is on the wait list of A is a 'SWAP', should get you
the desired semantics, I think.

You can do SWAP and you get to have an exposed blocking relation.

Is this exactly what we want, I don't know. Because I'm not entirely
sure what problem we're solving. Why would you be setting things up like
that in the first place. IIRC you're using this to implement coroutines
in golang, and I'm not sure I have a firm enough grasp of all that to
make a cogent suggestion one way or the other.

> > Also, why would we commit to an ABI without ever having seen the rest?
> 
> I'm not completely sure what you mean here. We do not envision any
> expansion/changes to the ABI proposed here,

Well, you do not, but how can we verify this without having a complete
picture. Also, IIRC, there's a bunch of scheduler patches that goes on
top to implement the fast switch.

Also, how does this compare to some of the glorious hacks found in
GNU Pth? You can implement M:N threading using those as well.


  parent reply index

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-22 23:45 [PATCH for 5.9 0/3] FUTEX_SWAP (tip/locking/core) Peter Oskolkov
2020-07-22 23:45 ` [PATCH for 5.9 1/3] futex: introduce FUTEX_SWAP operation Peter Oskolkov
2020-07-23 11:27   ` Peter Zijlstra
2020-07-24  0:25     ` Peter Oskolkov
2020-07-24  3:00       ` Waiman Long
2020-07-24  3:22         ` Peter Oskolkov
2020-07-27  9:51       ` peterz [this message]
2020-07-28  0:01         ` Peter Oskolkov
2020-07-22 23:45 ` [PATCH for 5.9 2/3] futex/sched: add wake_up_process_prefer_current_cpu, use in FUTEX_SWAP Peter Oskolkov
2020-07-22 23:45 ` [PATCH for 5.9 3/3] selftests/futex: add futex_swap selftest Peter Oskolkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200727095110.GG119549@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=aaron.lwe@gmail.com \
    --cc=avagin@google.com \
    --cc=bsegall@google.com \
    --cc=dvhart@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=mingo@kernel.org \
    --cc=mingo@redhat.com \
    --cc=pjt@google.com \
    --cc=posk@google.com \
    --cc=posk@posk.io \
    --cc=tglx@linutronix.de \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git