From: Peter Oskolkov <firstname.lastname@example.org>
To: Peter Zijlstra <email@example.com>
Cc: Linux Kernel Mailing List <firstname.lastname@example.org>,
    Thomas Gleixner <email@example.com>,
    Ingo Molnar <firstname.lastname@example.org>,
    Ingo Molnar <email@example.com>,
    Darren Hart <firstname.lastname@example.org>,
    Vincent Guittot <email@example.com>,
    Peter Oskolkov <firstname.lastname@example.org>,
    Andrei Vagin <email@example.com>,
    Paul Turner <firstname.lastname@example.org>,
    Ben Segall <email@example.com>,
    Aaron Lu <firstname.lastname@example.org>,
    Waiman Long <email@example.com>
Subject: Re: [PATCH for 5.9 1/3] futex: introduce FUTEX_SWAP operation
Date: Thu, 23 Jul 2020 17:25:05 -0700
Message-ID: <CAFTs51UJhC9TmXkzz8VbDNmkSEyZE29=dRdUi65TDpSYqoK5vw@mail.gmail.com>
In-Reply-To: <20200723112757.GN5523@worktop.programming.kicks-ass.net>

On Thu, Jul 23, 2020 at 4:28 AM Peter Zijlstra <firstname.lastname@example.org> wrote:

Thanks a lot for your comments, Peter! My answers below.

>
> On Wed, Jul 22, 2020 at 04:45:36PM -0700, Peter Oskolkov wrote:
> > This patchset is the first step to open-source this work. As explained
> > in the linked pdf and video, SwitchTo API has three core operations: wait,
> > resume, and swap (=switch). So this patchset adds a FUTEX_SWAP operation
> > that, in addition to FUTEX_WAIT and FUTEX_WAKE, will provide a foundation
> > on top of which user-space threading libraries can be built.
>
> The PDF and video can go pound sand; you get to fully explain things
> here.

Will do. Should I expand the cover letter or the commit message? (I'll probably
split the first patch into two in the latter case).

>
> What worries me is how FUTEX_SWAP would interact with the future
> FUTEX_LOCK / FUTEX_UNLOCK. When we implement pthread_mutex with those,
> there's very few WAIT/WAKE left.

[+cc Waiman Long]

I've looked through the latest FUTEX_LOCK patchset I could find
( https://lore.kernel.org/patchwork/cover/772643/ and related),
and it seems that FUTEX_SWAP and FUTEX_LOCK/FUTEX_UNLOCK patchsets
address the same issue (slow wakeups) but for different use cases:

FUTEX_LOCK/FUTEX_UNLOCK uses spinning and lock stealing to
improve futex wake/wait performance in high contention situations;

FUTEX_SWAP is designed to be used for fast context switching with
_no_ contention by design: the waker that is going to sleep, and the wakee
are using different futexes; the userspace will have a futex per
thread/task, and when needed the thread/task will either simply sleep on
its futex, or context switch (=FUTEX_SWAP) into a different thread/task.

I can also imagine that instead of combining WAIT/WAKE for
fast context switching, a variant of FUTEX_SWAP can use LOCK/UNLOCK
operations in the future, when these are available; but again, I fully
expect that a single "FUTEX_LOCK the current task on futex A, FUTEX_UNLOCK
futex B, context switch into the wakee" futex op will be much faster than
doing the same thing in two syscalls, as FUTEX_LOCK/FUTEX_UNLOCK does not
seem to be concerned with fast waking of a sleeping task, but more with
minimizing sleeping in the first place.

What will be faster: FUTEX_SWAP that does
  FUTEX_WAKE (futex A) + FUTEX_WAIT (current, futex B),
or FUTEX_SWAP that does
  FUTEX_UNLOCK (futex A) + FUTEX_LOCK (current, futex B)?

As wake+wait will always put the waker to sleep, it means that
there will be a true context switch on the same CPU on the fast path;
on the other hand, unlock+lock will potentially evade sleeping,
so the wakee will often run on a different CPU (with the waker
spinning instead of sleeping?), thus not benefitting from
cache locality that fast context switching on the same CPU is meant
to use...

I'll add some of the considerations above to the expanded cover letter
(or a commit message).
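
For illustration only, here is a minimal userspace sketch of the two-syscall
sequence that a combined swap op is meant to replace: wake the wakee on its
futex, then sleep on your own. It uses only the existing FUTEX_WAKE and
FUTEX_WAIT operations. The helper names (futex_call, task_resume, task_wait,
task_swap_emulated) and the "non-zero word means a pending wakeup" convention
are assumptions made for this example and are not taken from the patchset.
The __atomic builtins assume GCC or Clang.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no futex() wrapper, so go through syscall(2). */
static long futex_call(uint32_t *uaddr, int op, uint32_t val)
{
	return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/*
 * Hypothetical convention: each task owns one futex word.
 * 0 means "no wakeup pending", non-zero means "wakeup pending".
 */

/* Post a wakeup to the task owning @f; returns the number of waiters woken. */
long task_resume(uint32_t *f)
{
	__atomic_store_n(f, 1, __ATOMIC_RELEASE);
	return futex_call(f, FUTEX_WAKE_PRIVATE, 1);
}

/* Sleep until a wakeup is pending on @f, then consume it. */
void task_wait(uint32_t *f)
{
	/* The exchange atomically takes the pending wakeup, if there is one. */
	while (__atomic_exchange_n(f, 0, __ATOMIC_ACQUIRE) == 0) {
		/*
		 * Blocks only while *f is still 0. The call may also return
		 * early (value changed, signal, spurious wakeup), hence the loop.
		 */
		futex_call(f, FUTEX_WAIT_PRIVATE, 0);
	}
}

/* Emulated "swap": two separate syscalls on the slow path. */
void task_swap_emulated(uint32_t *self, uint32_t *next)
{
	assert(self != next);	/* waker and wakee use different futexes */
	task_resume(next);	/* syscall 1: FUTEX_WAKE on the wakee's futex */
	task_wait(self);	/* syscall 2: FUTEX_WAIT on our own futex */
}
```

A thread that wants to hand the CPU to another thread would call
task_swap_emulated(&my_futex, &their_futex). Between the two syscalls the
waker is still runnable, so the scheduler is free to place the wakee on a
different CPU, which is the gap a single combined op is intended to close.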

>
> Also, why would we commit to an ABI without ever having seen the rest?

I'm not completely sure what you mean here. We do not envision any
expansion/changes to the ABI proposed here, only further performance
improvements. On these, we currently think that marking the wakee
as the preferred next task to run on the current CPU (by storing
"struct task_struct *preferred_next_task" either in a per-CPU pointer,
or in the current task_struct) and then having schedule() determine
whether to follow the hint or ignore it would be the simplest way to
speed up the context switch.

>
> On another note: wake_up_process_prefer_current_cpu() is a horrific
> function name :/ That's half to a third of the line limit.

I fully agree. I considered wake_up_on_current_cpu() first, but this name
does not convey that the wakeup itself is a "strong wish" while
"current cpu" is only a weak one... Do you have any suggestions? Maybe

wake_up_on_cpu(struct task_struct *next, int cpu_hint)?

But this seems too broad in scope, as we are interested here in only
migrating the task to the current CPU...

Thanks again for your comments!
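
A toy userspace model of the "preferred next task" hint mentioned above may
make the idea more concrete. This is not kernel code: struct toy_task,
struct toy_rq, toy_set_hint() and toy_pick_next() are hypothetical stand-ins,
and the rule used to accept or ignore the hint (task is runnable and allowed
on this CPU) is an assumption, since the criteria schedule() would apply are
not specified here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a task; NOT the kernel's task_struct. */
struct toy_task {
	int id;
	bool runnable;
	unsigned long allowed_cpus;	/* bitmask of CPUs the task may use */
};

/* Toy per-CPU run queue carrying a one-shot hint. */
struct toy_rq {
	int cpu;
	struct toy_task *preferred_next;	/* hint left by the waker */
	struct toy_task **queue;		/* default pick order */
	size_t nr;
};

static bool toy_can_run(const struct toy_rq *rq, const struct toy_task *t)
{
	return t->runnable && (t->allowed_cpus & (1UL << rq->cpu));
}

/* Waker side: record the wakee as the preferred next task on this CPU. */
void toy_set_hint(struct toy_rq *rq, struct toy_task *wakee)
{
	rq->preferred_next = wakee;
}

/* Scheduler side: follow the hint if it is usable, otherwise ignore it. */
struct toy_task *toy_pick_next(struct toy_rq *rq)
{
	struct toy_task *hint;

	assert(rq != NULL);
	hint = rq->preferred_next;
	rq->preferred_next = NULL;	/* the hint applies to one pick only */

	if (hint && toy_can_run(rq, hint))
		return hint;

	/* Fallback: first eligible task in queue order. */
	for (size_t i = 0; i < rq->nr; i++)
		if (toy_can_run(rq, rq->queue[i]))
			return rq->queue[i];
	return NULL;
}
```

The hint is cleared on every pick so that a stale pointer cannot influence a
later scheduling decision; whether a real implementation would keep the hint
per-CPU or in the current task is left open, as in the text above.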