Kernel-hardening archive on lore.kernel.org
 help / color / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jann Horn <jannh@google.com>,
	Alan Stern <stern@rowland.harvard.edu>,
	 Andrea Parri <parri.andrea@gmail.com>,
	Will Deacon <will@kernel.org>,
	 Peter Zijlstra <peterz@infradead.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	 Nicholas Piggin <npiggin@gmail.com>,
	David Howells <dhowells@redhat.com>,
	 Jade Alglave <j.alglave@ucl.ac.uk>,
	Luc Maranget <luc.maranget@inria.fr>,
	 "Paul E. McKenney" <paulmck@kernel.org>,
	Akira Yokosawa <akiyks@gmail.com>,
	 Daniel Lustig <dlustig@nvidia.com>,
	Adam Zabrocki <pi3@pi3.com.pl>,
	 kernel list <linux-kernel@vger.kernel.org>,
	 Kernel Hardening <kernel-hardening@lists.openwall.com>,
	Oleg Nesterov <oleg@redhat.com>,
	 Andy Lutomirski <luto@amacapital.net>,
	Bernd Edlinger <bernd.edlinger@hotmail.de>,
	 Kees Cook <keescook@chromium.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	 stable <stable@vger.kernel.org>, Marco Elver <elver@google.com>,
	 Dmitry Vyukov <dvyukov@google.com>,
	kasan-dev <kasan-dev@googlegroups.com>
Subject: Re: [PATCH] signal: Extend exec_id to 64bits
Date: Thu, 2 Apr 2020 11:06:02 -0700
Message-ID: <CAHk-=wiOS4Fi2tsXQrvLOiW69g4HiJYsqL6RPeTd14b4+2-Ykg@mail.gmail.com> (raw)
In-Reply-To: <877dyym3r0.fsf@x220.int.ebiederm.org>

On Thu, Apr 2, 2020 at 6:14 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
> > tasklist_lock is aboue the hottest lock there is in all of the kernel.
>
> Do you know code paths you see tasklist_lock being hot?

It's generally not bad enough to show up on single-socket machines.

But the problem with tasklist_lock is that it's one of our remaining
completely global locks. So it scales like sh*t in some circumstances.

On single-socket machines, most of the truly nasty hot paths aren't a
huge problem, because they tend to be mostly readers. So you get the
cacheline bounce, but you don't (usually) get much busy looping. The
cacheline bounce is "almost free" on a single socket.

But because it's one of those completely global locks, on big
multi-socket machines people have reported it as a problem forever.
Even just readers can cause problems (because of the cacheline
bouncing even when you just do the reader increment), but you also end
up having more issues with writers scaling badly.

Don't get me wrong - you can get bad scaling on other locks too, even
when they aren't really global - we had that with just the reference
counter increment for the user signal accounting, after all. Neither
of the reference counts were actually global, but they were just
effectively single counters under that particular load (ie the count
was per-user, but the load ran as a single user).

The reason tasklist_lock probably doesn't come up very much is that
it's _always_ been expensive. It has also caused some fundamental
issues (I think it's the main reason we have that rule that
reader-writer locks are unfair to readers, because we have readers
from interrupt context too, but can't afford to make normal readers
disable interrupts).

A lot of the tasklist lock readers end up looping quite a bit inside
the lock (looping over threads etc), which is why it can then be a big
deal when the rare reader shows up.

We've improved a _lot_ of those loops. That has definitely helped for
the common cases. But we've never been able to really fix the lock
itself.

                 Linus

  reply index

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-24 21:50 Curiosity around 'exec_id' and some problems associated with it Adam Zabrocki
2020-03-29 22:43 ` Kees Cook
2020-03-30  8:34   ` Oleg Nesterov
2020-03-31  4:29   ` Adam Zabrocki
2020-04-01 20:47   ` [PATCH] signal: Extend exec_id to 64bits Eric W. Biederman
2020-04-01 20:55     ` Linus Torvalds
2020-04-01 21:03       ` Eric W. Biederman
2020-04-01 23:37     ` Jann Horn
2020-04-01 23:51       ` Linus Torvalds
2020-04-01 23:55         ` Linus Torvalds
2020-04-02  1:35           ` Jann Horn
2020-04-02  2:05             ` Linus Torvalds
2020-04-02 13:11               ` Eric W. Biederman
2020-04-02 18:06                 ` Linus Torvalds [this message]
2020-04-02  4:46     ` Jann Horn
2020-04-02 14:14       ` Eric W. Biederman
2020-04-03  2:11       ` Adam Zabrocki
2020-04-02  7:19     ` Kees Cook
2020-04-02  7:22     ` Bernd Edlinger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wiOS4Fi2tsXQrvLOiW69g4HiJYsqL6RPeTd14b4+2-Ykg@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=akiyks@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bernd.edlinger@hotmail.de \
    --cc=boqun.feng@gmail.com \
    --cc=dhowells@redhat.com \
    --cc=dlustig@nvidia.com \
    --cc=dvyukov@google.com \
    --cc=ebiederm@xmission.com \
    --cc=elver@google.com \
    --cc=j.alglave@ucl.ac.uk \
    --cc=jannh@google.com \
    --cc=kasan-dev@googlegroups.com \
    --cc=keescook@chromium.org \
    --cc=kernel-hardening@lists.openwall.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luc.maranget@inria.fr \
    --cc=luto@amacapital.net \
    --cc=npiggin@gmail.com \
    --cc=oleg@redhat.com \
    --cc=parri.andrea@gmail.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pi3@pi3.com.pl \
    --cc=stable@vger.kernel.org \
    --cc=stern@rowland.harvard.edu \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Kernel-hardening archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kernel-hardening/0 kernel-hardening/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kernel-hardening kernel-hardening/ https://lore.kernel.org/kernel-hardening \
		kernel-hardening@lists.openwall.com
	public-inbox-index kernel-hardening

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/com.openwall.lists.kernel-hardening


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git