linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	"Huang, Ying" <ying.huang@intel.com>,
	Philip Li <philip.li@intel.com>,
	Andi Kleen <andi.kleen@intel.com>, Jiri Olsa <jolsa@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Sasha Levin <sashal@kernel.org>, Feng Tang <feng.tang@intel.com>
Subject: [PATCH 4.19 18/48] signal: avoid double atomic counter increments for user accounting
Date: Thu, 19 Mar 2020 14:04:00 +0100	[thread overview]
Message-ID: <20200319123908.898223901@linuxfoundation.org> (raw)
In-Reply-To: <20200319123902.941451241@linuxfoundation.org>

From: Linus Torvalds <torvalds@linux-foundation.org>

[ Upstream commit fda31c50292a5062332fa0343c084bd9f46604d9 ]

When queueing a signal, we increment both the users count of pending
signals (for RLIMIT_SIGPENDING tracking) and we increment the refcount
of the user struct itself (because we keep a reference to the user in
the signal structure in order to correctly account for it when freeing).

That turns out to be fairly expensive, because both of them are atomic
updates, and particularly under extreme signal handling pressure on big
machines, you can get a lot of cache contention on the user struct.
That can then cause horrid cacheline ping-pong when you do these
multiple accesses.

So change the reference counting to only pin the user for the _first_
pending signal, and to unpin it when the last pending signal is
dequeued.  That means that when a user sees a lot of concurrent signal
queuing - which is the only situation when this matters - the only
atomic access needed is generally the 'sigpending' count update.

This was noticed because of a particularly odd timing artifact on a
dual-socket 96C/192T Cascade Lake platform: when you get into bad
contention, on that machine for some reason seems to be much worse when
the contention happens in the upper 32-byte half of the cacheline.

As a result, the kernel test robot will-it-scale 'signal1' benchmark had
an odd performance regression simply due to random alignment of the
'struct user_struct' (and pointed to a completely unrelated and
apparently nonsensical commit for the regression).

Avoiding the double increments (and decrements on the dequeueing side,
of course) makes for much less contention and hugely improved
performance on that will-it-scale microbenchmark.

Quoting Feng Tang:

 "It makes a big difference, that the performance score is tripled! bump
  from original 17000 to 54000. Also the gap between 5.0-rc6 and
  5.0-rc6+Jiri's patch is reduced to around 2%"

[ The "2% gap" is the odd cacheline placement difference on that
  platform: under the extreme contention case, the effect of which half
  of the cacheline was hot was 5%, so with the reduced contention the
  odd timing artifact is reduced too ]

It does help in the non-contended case too, but is not nearly as
noticeable.

Reported-and-tested-by: Feng Tang <feng.tang@intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Philip Li <philip.li@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/signal.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 08911bb6fe9ab..c42eaf39b5729 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -407,27 +407,32 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t flags, int override_rlimi
 {
 	struct sigqueue *q = NULL;
 	struct user_struct *user;
+	int sigpending;
 
 	/*
 	 * Protect access to @t credentials. This can go away when all
 	 * callers hold rcu read lock.
+	 *
+	 * NOTE! A pending signal will hold on to the user refcount,
+	 * and we get/put the refcount only when the sigpending count
+	 * changes from/to zero.
 	 */
 	rcu_read_lock();
-	user = get_uid(__task_cred(t)->user);
-	atomic_inc(&user->sigpending);
+	user = __task_cred(t)->user;
+	sigpending = atomic_inc_return(&user->sigpending);
+	if (sigpending == 1)
+		get_uid(user);
 	rcu_read_unlock();
 
-	if (override_rlimit ||
-	    atomic_read(&user->sigpending) <=
-			task_rlimit(t, RLIMIT_SIGPENDING)) {
+	if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
 		q = kmem_cache_alloc(sigqueue_cachep, flags);
 	} else {
 		print_dropped_signal(sig);
 	}
 
 	if (unlikely(q == NULL)) {
-		atomic_dec(&user->sigpending);
-		free_uid(user);
+		if (atomic_dec_and_test(&user->sigpending))
+			free_uid(user);
 	} else {
 		INIT_LIST_HEAD(&q->list);
 		q->flags = 0;
@@ -441,8 +446,8 @@ static void __sigqueue_free(struct sigqueue *q)
 {
 	if (q->flags & SIGQUEUE_PREALLOC)
 		return;
-	atomic_dec(&q->user->sigpending);
-	free_uid(q->user);
+	if (atomic_dec_and_test(&q->user->sigpending))
+		free_uid(q->user);
 	kmem_cache_free(sigqueue_cachep, q);
 }
 
-- 
2.20.1




  parent reply	other threads:[~2020-03-19 13:43 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-19 13:03 [PATCH 4.19 00/48] 4.19.112-rc1 review Greg Kroah-Hartman
2020-03-19 13:03 ` [PATCH 4.19 01/48] perf/amd/uncore: Replace manual sampling check with CAP_NO_INTERRUPT flag Greg Kroah-Hartman
2020-03-19 13:03 ` [PATCH 4.19 02/48] mmc: core: Default to generic_cmd6_time as timeout in __mmc_switch() Greg Kroah-Hartman
2020-03-19 13:03 ` [PATCH 4.19 03/48] mmc: sdhci-omap: Add platform specific reset callback Greg Kroah-Hartman
2020-03-19 13:03 ` [PATCH 4.19 04/48] mmc: sdhci-omap: Workaround errata regarding SDR104/HS200 tuning failures (i929) Greg Kroah-Hartman
2020-03-19 13:03 ` [PATCH 4.19 05/48] mmc: host: Fix Kconfig warnings on keystone_defconfig Greg Kroah-Hartman
2020-03-19 13:03 ` [PATCH 4.19 06/48] mmc: sdhci-omap: Fix busy detection by enabling MMC_CAP_NEED_RSP_BUSY Greg Kroah-Hartman
2020-03-19 13:03 ` [PATCH 4.19 07/48] mmc: core: Respect MMC_CAP_NEED_RSP_BUSY for eMMC sleep command Greg Kroah-Hartman
2020-03-19 13:03 ` [PATCH 4.19 08/48] mmc: core: Respect MMC_CAP_NEED_RSP_BUSY for erase/trim/discard Greg Kroah-Hartman
2020-03-19 13:03 ` [PATCH 4.19 09/48] mmc: core: Allow host controllers to require R1B for CMD6 Greg Kroah-Hartman
2020-03-19 13:03 ` [PATCH 4.19 10/48] ACPI: watchdog: Allow disabling WDAT at boot Greg Kroah-Hartman
2020-03-19 13:03 ` [PATCH 4.19 11/48] HID: apple: Add support for recent firmware on Magic Keyboards Greg Kroah-Hartman
2020-03-19 13:03 ` [PATCH 4.19 12/48] HID: i2c-hid: add Trekstor Surfbook E11B to descriptor override Greg Kroah-Hartman
2020-03-19 13:03 ` [PATCH 4.19 13/48] cfg80211: check reg_rule for NULL in handle_channel_custom() Greg Kroah-Hartman
2020-03-19 13:03 ` [PATCH 4.19 14/48] scsi: libfc: free response frame from GPN_ID Greg Kroah-Hartman
2020-03-19 13:03 ` [PATCH 4.19 15/48] net: usb: qmi_wwan: restore mtu min/max values after raw_ip switch Greg Kroah-Hartman
2020-03-19 13:03 ` [PATCH 4.19 16/48] net: ks8851-ml: Fix IRQ handling and locking Greg Kroah-Hartman
2020-03-19 13:03 ` [PATCH 4.19 17/48] mac80211: rx: avoid RCU list traversal under mutex Greg Kroah-Hartman
2020-03-19 13:04 ` Greg Kroah-Hartman [this message]
2020-03-19 13:04 ` [PATCH 4.19 19/48] slip: not call free_netdev before rtnl_unlock in slip_open Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 20/48] hinic: fix a irq affinity bug Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 21/48] hinic: fix a bug of setting hw_ioctxt Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 22/48] net: rmnet: fix NULL pointer dereference in rmnet_newlink() Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 23/48] net: rmnet: fix NULL pointer dereference in rmnet_changelink() Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 24/48] net: rmnet: fix suspicious RCU usage Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 25/48] net: rmnet: remove rcu_read_lock in rmnet_force_unassociate_device() Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 26/48] net: rmnet: do not allow to change mux id if mux id is duplicated Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 27/48] net: rmnet: use upper/lower device infrastructure Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 28/48] net: rmnet: fix bridge mode bugs Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 29/48] net: rmnet: fix packet forwarding in rmnet bridge mode Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 30/48] sfc: fix timestamp reconstruction at 16-bit rollover points Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 31/48] jbd2: fix data races at struct journal_head Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 32/48] wimax: i2400: fix memory leak Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 33/48] wimax: i2400: Fix memory leak in i2400m_op_rfkill_sw_toggle Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 34/48] mmc: sdhci-omap: Dont finish_mrq() on a command error during tuning Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 35/48] mmc: sdhci-omap: Fix Tuning procedure for temperatures < -20C Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 36/48] driver core: Remove the link if there is no driver with AUTO flag Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 37/48] driver core: Fix adding device links to probing suppliers Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 38/48] driver core: Make driver core own stateful device links Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 39/48] driver core: Add device link flag DL_FLAG_AUTOPROBE_CONSUMER Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 40/48] driver core: Remove device link creation limitation Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 41/48] driver core: Fix creation of device links with PM-runtime flags Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 42/48] net: qrtr: fix len of skb_put_padto in qrtr_node_enqueue Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 43/48] ARM: 8957/1: VDSO: Match ARMv8 timer in cntvct_functional() Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 44/48] ARM: 8958/1: rename missed uaccess .fixup section Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 45/48] mm: slub: add missing TID bump in kmem_cache_alloc_bulk() Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 46/48] HID: google: add moonball USB id Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 47/48] efi: Fix debugobjects warning on efi_rts_work Greg Kroah-Hartman
2020-03-19 13:04 ` [PATCH 4.19 48/48] ipv4: ensure rcu_read_lock() in cipso_v4_error() Greg Kroah-Hartman
2020-03-19 19:42 ` [PATCH 4.19 00/48] 4.19.112-rc1 review Naresh Kamboju
2020-03-19 20:00   ` Ben Hutchings
2020-03-20  8:03     ` Greg Kroah-Hartman
2020-03-20  8:11       ` Greg Kroah-Hartman
2020-03-20 16:41         ` Naresh Kamboju
2020-03-21  7:09           ` Greg Kroah-Hartman
2020-03-20 17:58         ` Guenter Roeck
2020-03-21  7:08           ` Greg Kroah-Hartman
2020-03-22  1:25       ` Sasha Levin
2020-03-19 23:36 ` Guenter Roeck
2020-03-20 21:01 ` Chris Paterson
2020-03-21  7:13   ` Greg Kroah-Hartman
2020-03-21  0:40 ` shuah

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200319123908.898223901@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=andi.kleen@intel.com \
    --cc=ebiederm@xmission.com \
    --cc=feng.tang@intel.com \
    --cc=jolsa@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=philip.li@intel.com \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).