stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Jan Kara <jack@suse.cz>,
	Mel Gorman <mgorman@techsingularity.net>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: [PATCH 6.1 43/71] rtmutex: Add acquire semantics for rtmutex lock acquisition slow path
Date: Mon,  2 Jan 2023 12:22:08 +0100	[thread overview]
Message-ID: <20230102110553.299694452@linuxfoundation.org> (raw)
In-Reply-To: <20230102110551.509937186@linuxfoundation.org>

From: Mel Gorman <mgorman@techsingularity.net>

commit 1c0908d8e441631f5b8ba433523cf39339ee2ba0 upstream.

Jan Kara reported the following bug triggering on 6.0.5-rt14 running dbench
on XFS on arm64.

 kernel BUG at fs/inode.c:625!
 Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP
 CPU: 11 PID: 6611 Comm: dbench Tainted: G            E   6.0.0-rt14-rt+ #1
 pc : clear_inode+0xa0/0xc0
 lr : clear_inode+0x38/0xc0
 Call trace:
  clear_inode+0xa0/0xc0
  evict+0x160/0x180
  iput+0x154/0x240
  do_unlinkat+0x184/0x300
  __arm64_sys_unlinkat+0x48/0xc0
  el0_svc_common.constprop.4+0xe4/0x2c0
  do_el0_svc+0xac/0x100
  el0_svc+0x78/0x200
  el0t_64_sync_handler+0x9c/0xc0
  el0t_64_sync+0x19c/0x1a0

It also affects 6.1-rc7-rt5 and affects a preempt-rt fork of 5.14 so this
is likely a bug that existed forever and only became visible when ARM
support was added to preempt-rt. The same problem does not occur on x86-64
and he also reported that converting sb->s_inode_wblist_lock to
raw_spinlock_t makes the problem disappear indicating that the RT spinlock
variant is the problem.

Which in turn means that RT mutexes on ARM64 and any other weakly ordered
architecture are affected by this independent of RT.

Will Deacon observed:

  "I'd be more inclined to be suspicious of the slowpath tbh, as we need to
   make sure that we have acquire semantics on all paths where the lock can
   be taken. Looking at the rtmutex code, this really isn't obvious to me
   -- for example, try_to_take_rt_mutex() appears to be able to return via
   the 'takeit' label without acquire semantics and it looks like we might
   be relying on the caller's subsequent _unlock_ of the wait_lock for
   ordering, but that will give us release semantics which aren't correct."

Sebastian Andrzej Siewior prototyped a fix that does work based on that
comment but it was a little bit overkill and added some fences that should
not be necessary.

The lock owner is updated with an IRQ-safe raw spinlock held, but the
spin_unlock does not provide acquire semantics which are needed when
acquiring a mutex.

Adds the necessary acquire semantics for lock owner updates in the slow path
acquisition and the waiter bit logic.

It successfully completed 10 iterations of the dbench workload while the
vanilla kernel fails on the first iteration.

[ bigeasy@linutronix.de: Initial prototype fix ]

Fixes: 700318d1d7b38 ("locking/rtmutex: Use acquire/release semantics")
Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221202100223.6mevpbl7i6x5udfd@techsingularity.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/locking/rtmutex.c     |   55 +++++++++++++++++++++++++++++++++++--------
 kernel/locking/rtmutex_api.c |    6 ++--
 2 files changed, 49 insertions(+), 12 deletions(-)

--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -89,15 +89,31 @@ static inline int __ww_mutex_check_kill(
  * set this bit before looking at the lock.
  */
 
-static __always_inline void
-rt_mutex_set_owner(struct rt_mutex_base *lock, struct task_struct *owner)
+static __always_inline struct task_struct *
+rt_mutex_owner_encode(struct rt_mutex_base *lock, struct task_struct *owner)
 {
 	unsigned long val = (unsigned long)owner;
 
 	if (rt_mutex_has_waiters(lock))
 		val |= RT_MUTEX_HAS_WAITERS;
 
-	WRITE_ONCE(lock->owner, (struct task_struct *)val);
+	return (struct task_struct *)val;
+}
+
+static __always_inline void
+rt_mutex_set_owner(struct rt_mutex_base *lock, struct task_struct *owner)
+{
+	/*
+	 * lock->wait_lock is held but explicit acquire semantics are needed
+	 * for a new lock owner so WRITE_ONCE is insufficient.
+	 */
+	xchg_acquire(&lock->owner, rt_mutex_owner_encode(lock, owner));
+}
+
+static __always_inline void rt_mutex_clear_owner(struct rt_mutex_base *lock)
+{
+	/* lock->wait_lock is held so the unlock provides release semantics. */
+	WRITE_ONCE(lock->owner, rt_mutex_owner_encode(lock, NULL));
 }
 
 static __always_inline void clear_rt_mutex_waiters(struct rt_mutex_base *lock)
@@ -106,7 +122,8 @@ static __always_inline void clear_rt_mut
 			((unsigned long)lock->owner & ~RT_MUTEX_HAS_WAITERS);
 }
 
-static __always_inline void fixup_rt_mutex_waiters(struct rt_mutex_base *lock)
+static __always_inline void
+fixup_rt_mutex_waiters(struct rt_mutex_base *lock, bool acquire_lock)
 {
 	unsigned long owner, *p = (unsigned long *) &lock->owner;
 
@@ -172,8 +189,21 @@ static __always_inline void fixup_rt_mut
 	 * still set.
 	 */
 	owner = READ_ONCE(*p);
-	if (owner & RT_MUTEX_HAS_WAITERS)
-		WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
+	if (owner & RT_MUTEX_HAS_WAITERS) {
+		/*
+		 * See rt_mutex_set_owner() and rt_mutex_clear_owner() on
+		 * why xchg_acquire() is used for updating owner for
+		 * locking and WRITE_ONCE() for unlocking.
+		 *
+		 * WRITE_ONCE() would work for the acquire case too, but
+		 * in case that the lock acquisition failed it might
+		 * force other lockers into the slow path unnecessarily.
+		 */
+		if (acquire_lock)
+			xchg_acquire(p, owner & ~RT_MUTEX_HAS_WAITERS);
+		else
+			WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
+	}
 }
 
 /*
@@ -208,6 +238,13 @@ static __always_inline void mark_rt_mute
 		owner = *p;
 	} while (cmpxchg_relaxed(p, owner,
 				 owner | RT_MUTEX_HAS_WAITERS) != owner);
+
+	/*
+	 * The cmpxchg loop above is relaxed to avoid back-to-back ACQUIRE
+	 * operations in the event of contention. Ensure the successful
+	 * cmpxchg is visible.
+	 */
+	smp_mb__after_atomic();
 }
 
 /*
@@ -1243,7 +1280,7 @@ static int __sched __rt_mutex_slowtryloc
 	 * try_to_take_rt_mutex() sets the lock waiters bit
 	 * unconditionally. Clean this up.
 	 */
-	fixup_rt_mutex_waiters(lock);
+	fixup_rt_mutex_waiters(lock, true);
 
 	return ret;
 }
@@ -1604,7 +1641,7 @@ static int __sched __rt_mutex_slowlock(s
 	 * try_to_take_rt_mutex() sets the waiter bit
 	 * unconditionally. We might have to fix that up.
 	 */
-	fixup_rt_mutex_waiters(lock);
+	fixup_rt_mutex_waiters(lock, true);
 
 	trace_contention_end(lock, ret);
 
@@ -1719,7 +1756,7 @@ static void __sched rtlock_slowlock_lock
 	 * try_to_take_rt_mutex() sets the waiter bit unconditionally.
 	 * We might have to fix that up:
 	 */
-	fixup_rt_mutex_waiters(lock);
+	fixup_rt_mutex_waiters(lock, true);
 	debug_rt_mutex_free_waiter(&waiter);
 
 	trace_contention_end(lock, 0);
--- a/kernel/locking/rtmutex_api.c
+++ b/kernel/locking/rtmutex_api.c
@@ -267,7 +267,7 @@ void __sched rt_mutex_init_proxy_locked(
 void __sched rt_mutex_proxy_unlock(struct rt_mutex_base *lock)
 {
 	debug_rt_mutex_proxy_unlock(lock);
-	rt_mutex_set_owner(lock, NULL);
+	rt_mutex_clear_owner(lock);
 }
 
 /**
@@ -382,7 +382,7 @@ int __sched rt_mutex_wait_proxy_lock(str
 	 * try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
 	 * have to fix that up.
 	 */
-	fixup_rt_mutex_waiters(lock);
+	fixup_rt_mutex_waiters(lock, true);
 	raw_spin_unlock_irq(&lock->wait_lock);
 
 	return ret;
@@ -438,7 +438,7 @@ bool __sched rt_mutex_cleanup_proxy_lock
 	 * try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
 	 * have to fix that up.
 	 */
-	fixup_rt_mutex_waiters(lock);
+	fixup_rt_mutex_waiters(lock, false);
 
 	raw_spin_unlock_irq(&lock->wait_lock);
 



  parent reply	other threads:[~2023-01-02 11:27 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-02 11:21 [PATCH 6.1 00/71] 6.1.3-rc1 review Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 01/71] eventpoll: add EPOLL_URING_WAKE poll wakeup flag Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 02/71] eventfd: provide a eventfd_signal_mask() helper Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 03/71] io_uring: pass in EPOLL_URING_WAKE for eventfd signaling and wakeups Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 04/71] nvme-pci: fix doorbell buffer value endianness Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 05/71] nvme-pci: fix mempool alloc size Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 06/71] nvme-pci: fix page size checks Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 07/71] ACPI: resource: do IRQ override on XMG Core 15 Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 08/71] ACPI: resource: do IRQ override on Lenovo 14ALC7 Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 09/71] ACPI: resource: Add Asus ExpertBook B2502 to Asus quirks Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 10/71] ACPI: video: Fix Apple GMUX backlight detection Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 11/71] block, bfq: fix uaf for bfqq in bfq_exit_icq_bfqq Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 12/71] ata: ahci: Fix PCS quirk application for suspend Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 13/71] nvme: fix the NVME_CMD_EFFECTS_CSE_MASK definition Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 14/71] nvmet: dont defer passthrough commands with trivial effects to the workqueue Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 15/71] fs/ntfs3: Validate BOOT record_size Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 16/71] fs/ntfs3: Add overflow check for attribute size Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 17/71] fs/ntfs3: Validate data run offset Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 18/71] fs/ntfs3: Add null pointer check to attr_load_runs_vcn Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 19/71] fs/ntfs3: Fix memory leak on ntfs_fill_super() error path Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 20/71] fs/ntfs3: Add null pointer check for inode operations Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 21/71] fs/ntfs3: Validate attribute name offset Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 22/71] fs/ntfs3: Validate buffer length while parsing index Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 23/71] fs/ntfs3: Validate resident attribute name Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 24/71] fs/ntfs3: Fix slab-out-of-bounds read in run_unpack Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 25/71] soundwire: dmi-quirks: add quirk variant for LAPBC710 NUC15 Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 26/71] phy: sun4i-usb: Introduce port2 SIDDQ quirk Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 27/71] phy: sun4i-usb: Add support for the H616 USB PHY Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 28/71] fs/ntfs3: Validate index root when initialize NTFS security Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 29/71] fs/ntfs3: Use __GFP_NOWARN allocation at wnd_init() Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 30/71] fs/ntfs3: Use __GFP_NOWARN allocation at ntfs_fill_super() Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 31/71] fs/ntfs3: Delete duplicate condition in ntfs_read_mft() Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 32/71] fs/ntfs3: Fix slab-out-of-bounds in r_page Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 33/71] objtool: Fix SEGFAULT Greg Kroah-Hartman
2023-01-02 11:21 ` [PATCH 6.1 34/71] iommu/mediatek: Fix crash on isr after kexec() Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 35/71] powerpc/rtas: avoid device tree lookups in rtas_os_term() Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 36/71] powerpc/rtas: avoid scheduling " Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 37/71] rtc: msc313: Fix function prototype mismatch in msc313_rtc_probe() Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 38/71] NFSD: fix use-after-free in __nfs42_ssc_open() Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 39/71] kprobes: kretprobe events missing on 2-core KVM guest Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 40/71] HID: multitouch: fix Asus ExpertBook P2 P2451FA trackpoint Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 41/71] HID: plantronics: Additional PIDs for double volume key presses quirk Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 42/71] futex: Fix futex_waitv() hrtimer debug object leak on kcalloc error Greg Kroah-Hartman
2023-01-02 11:22 ` Greg Kroah-Hartman [this message]
2023-01-02 11:22 ` [PATCH 6.1 44/71] mm, mremap: fix mremap() expanding vma with addr inside vma Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 45/71] mm/mempolicy: fix memory leak in set_mempolicy_home_node system call Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 46/71] kmsan: export kmsan_handle_urb Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 47/71] kmsan: include linux/vmalloc.h Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 48/71] pstore: Properly assign mem_type property Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 49/71] pstore/zone: Use GFP_ATOMIC to allocate zone buffer Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 50/71] hfsplus: fix bug causing custom uid and gid being unable to be assigned with mount Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 51/71] ACPI: x86: s2idle: Force AMD GUID/_REV 2 on HP Elitebook 865 Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 52/71] ACPI: x86: s2idle: Stop using AMD specific codepath for Rembrandt+ Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 53/71] binfmt: Fix error return code in load_elf_fdpic_binary() Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 54/71] ovl: Use ovl mounters fsuid and fsgid in ovl_link() Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 55/71] ovl: update ->f_iocb_flags when ovl_change_flags() modifies ->f_flags Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 56/71] ALSA: line6: correct midi status byte when receiving data from podxt Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 57/71] ALSA: line6: fix stack overflow in line6_midi_transmit Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 58/71] ALSA: hda/hdmi: Static PCM mapping again with AMD HDMI codecs Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 59/71] pnode: terminate at peers of source Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 60/71] mfd: mt6360: Add bounds checking in Regmap read/write call-backs Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 61/71] md: fix a crash in mempool_free Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 62/71] mm, compaction: fix fast_isolate_around() to stay within boundaries Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 63/71] f2fs: should put a page when checking the summary info Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 64/71] f2fs: allow to read node block after shutdown Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 65/71] block: Do not reread partition table on exclusively open device Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 66/71] mmc: vub300: fix warning - do not call blocking ops when !TASK_RUNNING Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 67/71] tpm: acpi: Call acpi_put_table() to fix memory leak Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 68/71] tpm: tpm_crb: Add the missed " Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 69/71] tpm: tpm_tis: " Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 70/71] SUNRPC: Dont leak netobj memory when gss_read_proxy_verf() fails Greg Kroah-Hartman
2023-01-02 11:22 ` [PATCH 6.1 71/71] kcsan: Instrument memcpy/memset/memmove with newer Clang Greg Kroah-Hartman
2023-01-02 23:14 ` [PATCH 6.1 00/71] 6.1.3-rc1 review Rudi Heitbaum
2023-01-03  0:25 ` Shuah Khan
2023-01-03  1:13 ` Guenter Roeck
2023-01-03  7:24 ` Fenil Jain
2023-01-03  8:40 ` Naresh Kamboju
2023-01-03  8:45 ` Naresh Kamboju
2023-01-03  8:59 ` Ron Economos
2023-01-03 10:34 ` Sudip Mukherjee (Codethink)
2023-01-03 12:08 ` Bagas Sanjaya
2023-01-03 13:22 ` Allen Pais
2023-01-03 19:33 ` Florian Fainelli
2023-01-04  1:39 ` Justin Forbes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230102110553.299694452@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=jack@suse.cz \
    --cc=mgorman@techsingularity.net \
    --cc=patches@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).