From: email@example.com To: Peter Oskolkov <firstname.lastname@example.org> Cc: Linux Kernel Mailing List <email@example.com>, Thomas Gleixner <firstname.lastname@example.org>, Ingo Molnar <email@example.com>, Ingo Molnar <firstname.lastname@example.org>, Darren Hart <email@example.com>, Vincent Guittot <firstname.lastname@example.org>, Peter Oskolkov <email@example.com>, Andrei Vagin <firstname.lastname@example.org>, Paul Turner <email@example.com>, Ben Segall <firstname.lastname@example.org>, Aaron Lu <email@example.com>, Waiman Long <firstname.lastname@example.org> Subject: Re: [PATCH for 5.9 1/3] futex: introduce FUTEX_SWAP operation Date: Mon, 27 Jul 2020 11:51:10 +0200 Message-ID: <20200727095110.GG119549@hirez.programming.kicks-ass.net> (raw) In-Reply-To: <CAFTs51UJhC9TmXkzz8VbDNmkSEyZE29=dRdUi65TDpSYqoK5vw@mail.gmail.com> On Thu, Jul 23, 2020 at 05:25:05PM -0700, Peter Oskolkov wrote: > On Thu, Jul 23, 2020 at 4:28 AM Peter Zijlstra <email@example.com> wrote: > > What worries me is how FUTEX_SWAP would interact with the future > > FUTEX_LOCK / FUTEX_UNLOCK. When we implement pthread_mutex with those, > > there's very few WAIT/WAKE left. > > [+cc Waiman Long] > > I've looked through the latest FUTEX_LOCK patchset I could find ( > https://lore.kernel.org/patchwork/cover/772643/ and related), and it seems > that FUTEX_SWAP and FUTEX_LOCK/FUTEX_UNLOCK patchsets > address the same issue (slow wakeups) but for different use cases: > > FUTEX_LOCK/FUTEX_UNLOCK uses spinning and lock stealing to > improve futex wake/wait performance in high contention situations; The spinning is best for light contention. > FUTEX_SWAP is designed to be used for fast context switching with > _no_ contention by design: the waker that is going to sleep, and the wakee > are using different futexes; the userspace will have a futex per thread/task, > and when needed the thread/task will either simply sleep on its futex, > or context switch (=FUTEX_SWAP) into a different thread/task. Right, but how can you tell what the next thing after UNLOCK is going to be.. that's going to be tricky. > I can also imagine that instead of combining WAIT/WAKE for > fast context switching, a variant of FUTEX_SWAP can use LOCK/UNLOCK > operations in the future, when these are available; but again, I fully > expect that > a single "FUTEX_LOCK the current task on futex A, FUTEX_UNLOCK futex B, > context switch into the wakee" futex op will be much faster than doing > the same thing > in two syscalls, as FUTEX_LOCK/FUTEX_UNLOCK does not seem to be concerned > with fast waking of a sleeping task, but more with minimizing sleeping > in the first place. Correct; however the reason I like LOCK/UNLOCK is that it exposes the blocking relations to the kernel -- and that ties into yet another unfinished patch-set :-/ https://firstname.lastname@example.org > What will be faster: FUTEX_SWAP that does > FUTEX_WAKE (futex A) + FUTEX_WAIT (current, futex B), > or FUTEX_SWAP that does > FUTEX_UNLOCK (futex A) + FUTEX_LOCK (current, futex B)? Well, perhaps both argue against having SWAP but instead having compound futex ops. Something I think we're already starting to see with the new futex API patches posted here: https://email@example.com sys_futex_waitv() is effectively a whole bunch of WAIT ops combined. > As wake+wait will always put the waker to sleep, it means that > there will be a true context switch on the same CPU on the fast path; > on the other hand, unlock+lock will potentially evade sleeping, > so the wakee will often run on a different CPU (with the waker > spinning instead of sleeping?), thus not benefitting from cache locality > that fast context switching on the same CPU is meant to use... > > I'll add some of the considerations above to the expanded cover letter > (or a commit message). It's Monday morning, so perhaps I'm making a mess of things, but something like the below, where our thread t2 issues synchronous work to t1: t1 t2 LOCK A LOCK B 1: LOCK A ... UNLOCK A LOCK B ... UNLOCK B UNLOCK A LOCK B GOTO 1 UNLOCK B LOCK A ... Is an abuse of mutexes, that is, it implements completions using a (fair) mutex. A guards the work-queue, B is the 'completion'. Then, if you teach it that a compound UNLOCK-A + LOCK-B, where the lock owner of B is on the wait list of A is a 'SWAP', should get you the desired semantics, I think. You can do SWAP and you get to have an exposed blocking relation. Is this exactly what we want, I don't know. Because I'm not entirely sure what problem we're solving. Why would you be setting things up like that in the first place. IIRC you're using this to implement coroutines in golang, and I'm not sure I have a firm enough grasp of all that to make a cogent suggestion one way or the other. > > Also, why would we commit to an ABI without ever having seen the rest? > > I'm not completely sure what you mean here. We do not envision any > expansion/changes to the ABI proposed here, Well, you do not, but how can we verify this without having a complete picture. Also, IIRC, there's a bunch of scheduler patches that goes on top to implement the fast switch. Also, how does this compare to some of the glorious hacks found in GNU Pth? You can implement M:N threading using those as well.
next prev parent reply index Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-07-22 23:45 [PATCH for 5.9 0/3] FUTEX_SWAP (tip/locking/core) Peter Oskolkov 2020-07-22 23:45 ` [PATCH for 5.9 1/3] futex: introduce FUTEX_SWAP operation Peter Oskolkov 2020-07-23 11:27 ` Peter Zijlstra 2020-07-24 0:25 ` Peter Oskolkov 2020-07-24 3:00 ` Waiman Long 2020-07-24 3:22 ` Peter Oskolkov 2020-07-27 9:51 ` peterz [this message] 2020-07-28 0:01 ` Peter Oskolkov 2020-07-22 23:45 ` [PATCH for 5.9 2/3] futex/sched: add wake_up_process_prefer_current_cpu, use in FUTEX_SWAP Peter Oskolkov 2020-07-22 23:45 ` [PATCH for 5.9 3/3] selftests/futex: add futex_swap selftest Peter Oskolkov
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20200727095110.GG119549@hirez.programming.kicks-ass.net \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
LKML Archive on lore.kernel.org Archives are clonable: git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \ firstname.lastname@example.org public-inbox-index lkml Example config snippet for mirrors Newsgroup available over NNTP: nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel AGPL code for this site: git clone https://public-inbox.org/public-inbox.git