linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] mmiowb: Rename mmiowb_spin_{lock, unlock}() to mmiowb_in_{lock, unlock}()
@ 2024-03-01 13:05 Huacai Chen
  2024-03-01 13:05 ` [PATCH 2/2] mmiowb: Hook up mmiowb helpers to mutexes as well as spinlocks Huacai Chen
  0 siblings, 1 reply; 4+ messages in thread
From: Huacai Chen @ 2024-03-01 13:05 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Arnd Bergmann
  Cc: Huacai Chen, Waiman Long, Boqun Feng, Guo Ren, Rui Wang,
	WANG Xuerui, Jiaxun Yang, linux-arch, linux-kernel, Huacai Chen

We are extending mmiowb tracking system from spinlock to mutex, so
rename mmiowb_spin_{lock, unlock}() to mmiowb_in_{lock, unlock}() to
reflect the fact. No functional changes.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 include/asm-generic/mmiowb.h    | 8 ++++----
 include/linux/spinlock.h        | 6 +++---
 kernel/locking/spinlock_debug.c | 6 +++---
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/asm-generic/mmiowb.h b/include/asm-generic/mmiowb.h
index 5698fca3bf56..eb2335f9f35e 100644
--- a/include/asm-generic/mmiowb.h
+++ b/include/asm-generic/mmiowb.h
@@ -40,13 +40,13 @@ static inline void mmiowb_set_pending(void)
 		ms->mmiowb_pending = ms->nesting_count;
 }
 
-static inline void mmiowb_spin_lock(void)
+static inline void mmiowb_in_lock(void)
 {
 	struct mmiowb_state *ms = __mmiowb_state();
 	ms->nesting_count++;
 }
 
-static inline void mmiowb_spin_unlock(void)
+static inline void mmiowb_in_unlock(void)
 {
 	struct mmiowb_state *ms = __mmiowb_state();
 
@@ -59,7 +59,7 @@ static inline void mmiowb_spin_unlock(void)
 }
 #else
 #define mmiowb_set_pending()		do { } while (0)
-#define mmiowb_spin_lock()		do { } while (0)
-#define mmiowb_spin_unlock()		do { } while (0)
+#define mmiowb_in_lock()		do { } while (0)
+#define mmiowb_in_unlock()		do { } while (0)
 #endif	/* CONFIG_MMIOWB */
 #endif	/* __ASM_GENERIC_MMIOWB_H */
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 3fcd20de6ca8..60eda70cddd0 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -185,7 +185,7 @@ static inline void do_raw_spin_lock(raw_spinlock_t *lock) __acquires(lock)
 {
 	__acquire(lock);
 	arch_spin_lock(&lock->raw_lock);
-	mmiowb_spin_lock();
+	mmiowb_in_lock();
 }
 
 static inline int do_raw_spin_trylock(raw_spinlock_t *lock)
@@ -193,14 +193,14 @@ static inline int do_raw_spin_trylock(raw_spinlock_t *lock)
 	int ret = arch_spin_trylock(&(lock)->raw_lock);
 
 	if (ret)
-		mmiowb_spin_lock();
+		mmiowb_in_lock();
 
 	return ret;
 }
 
 static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
 {
-	mmiowb_spin_unlock();
+	mmiowb_in_unlock();
 	arch_spin_unlock(&lock->raw_lock);
 	__release(lock);
 }
diff --git a/kernel/locking/spinlock_debug.c b/kernel/locking/spinlock_debug.c
index 87b03d2e41db..632a88322433 100644
--- a/kernel/locking/spinlock_debug.c
+++ b/kernel/locking/spinlock_debug.c
@@ -114,7 +114,7 @@ void do_raw_spin_lock(raw_spinlock_t *lock)
 {
 	debug_spin_lock_before(lock);
 	arch_spin_lock(&lock->raw_lock);
-	mmiowb_spin_lock();
+	mmiowb_in_lock();
 	debug_spin_lock_after(lock);
 }
 
@@ -123,7 +123,7 @@ int do_raw_spin_trylock(raw_spinlock_t *lock)
 	int ret = arch_spin_trylock(&lock->raw_lock);
 
 	if (ret) {
-		mmiowb_spin_lock();
+		mmiowb_in_lock();
 		debug_spin_lock_after(lock);
 	}
 #ifndef CONFIG_SMP
@@ -137,7 +137,7 @@ int do_raw_spin_trylock(raw_spinlock_t *lock)
 
 void do_raw_spin_unlock(raw_spinlock_t *lock)
 {
-	mmiowb_spin_unlock();
+	mmiowb_in_unlock();
 	debug_spin_unlock(lock);
 	arch_spin_unlock(&lock->raw_lock);
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/2] mmiowb: Hook up mmiowb helpers to mutexes as well as spinlocks
  2024-03-01 13:05 [PATCH 1/2] mmiowb: Rename mmiowb_spin_{lock, unlock}() to mmiowb_in_{lock, unlock}() Huacai Chen
@ 2024-03-01 13:05 ` Huacai Chen
  2024-04-12 13:32   ` Will Deacon
  0 siblings, 1 reply; 4+ messages in thread
From: Huacai Chen @ 2024-03-01 13:05 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Arnd Bergmann
  Cc: Huacai Chen, Waiman Long, Boqun Feng, Guo Ren, Rui Wang,
	WANG Xuerui, Jiaxun Yang, linux-arch, linux-kernel, Huacai Chen

Commit fb24ea52f78e0d595852e ("drivers: Remove explicit invocations of
mmiowb()") remove all mmiowb() in drivers, but it says:

"NOTE: mmiowb() has only ever guaranteed ordering in conjunction with
spin_unlock(). However, pairing each mmiowb() removal in this patch with
the corresponding call to spin_unlock() is not at all trivial, so there
is a small chance that this change may regress any drivers incorrectly
relying on mmiowb() to order MMIO writes between CPUs using lock-free
synchronisation."

The mmio in radeon_ring_commit() is protected by a mutex rather than a
spinlock, but in the mutex fastpath it behaves similar to spinlock. We
can add mmiowb() calls in the radeon driver but the maintainer says he
doesn't like such a workaround, and radeon is not the only example of
mutex protected mmio.

So we extend the mmiowb tracking system from spinlock to mutex, hook up
mmiowb helpers to mutexes as well as spinlocks.

Without this, we get such an error when run 'glxgears' on weak ordering
architectures such as LoongArch:

radeon 0000:04:00.0: ring 0 stalled for more than 10324msec
radeon 0000:04:00.0: ring 3 stalled for more than 10240msec
radeon 0000:04:00.0: GPU lockup (current fence id 0x000000000001f412 last fence id 0x000000000001f414 on ring 3)
radeon 0000:04:00.0: GPU lockup (current fence id 0x000000000000f940 last fence id 0x000000000000f941 on ring 0)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)

Link: https://lore.kernel.org/dri-devel/29df7e26-d7a8-4f67-b988-44353c4270ac@amd.com/T/#t
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 kernel/locking/mutex.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index cbae8c0b89ab..f51d09aec643 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -127,8 +127,10 @@ static inline struct task_struct *__mutex_trylock_common(struct mutex *lock, boo
 		}
 
 		if (atomic_long_try_cmpxchg_acquire(&lock->owner, &owner, task | flags)) {
-			if (task == curr)
+			if (task == curr) {
+				mmiowb_in_lock();
 				return NULL;
+			}
 			break;
 		}
 	}
@@ -168,8 +170,10 @@ static __always_inline bool __mutex_trylock_fast(struct mutex *lock)
 	unsigned long curr = (unsigned long)current;
 	unsigned long zero = 0UL;
 
-	if (atomic_long_try_cmpxchg_acquire(&lock->owner, &zero, curr))
+	if (atomic_long_try_cmpxchg_acquire(&lock->owner, &zero, curr)) {
+		mmiowb_in_lock();
 		return true;
+	}
 
 	return false;
 }
@@ -178,6 +182,7 @@ static __always_inline bool __mutex_unlock_fast(struct mutex *lock)
 {
 	unsigned long curr = (unsigned long)current;
 
+	mmiowb_in_unlock();
 	return atomic_long_try_cmpxchg_release(&lock->owner, &curr, 0UL);
 }
 #endif
@@ -918,6 +923,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
 	 * Except when HANDOFF, in that case we must not clear the owner field,
 	 * but instead set it to the top waiter.
 	 */
+	mmiowb_in_unlock();
 	owner = atomic_long_read(&lock->owner);
 	for (;;) {
 		MUTEX_WARN_ON(__owner_task(owner) != current);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/2] mmiowb: Hook up mmiowb helpers to mutexes as well as spinlocks
  2024-03-01 13:05 ` [PATCH 2/2] mmiowb: Hook up mmiowb helpers to mutexes as well as spinlocks Huacai Chen
@ 2024-04-12 13:32   ` Will Deacon
  2024-04-12 14:35     ` Huacai Chen
  0 siblings, 1 reply; 4+ messages in thread
From: Will Deacon @ 2024-04-12 13:32 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Peter Zijlstra, Ingo Molnar, Arnd Bergmann, Huacai Chen,
	Waiman Long, Boqun Feng, Guo Ren, Rui Wang, WANG Xuerui,
	Jiaxun Yang, linux-arch, linux-kernel, torvalds, mpe

On Fri, Mar 01, 2024 at 09:05:32PM +0800, Huacai Chen wrote:
> Commit fb24ea52f78e0d595852e ("drivers: Remove explicit invocations of
> mmiowb()") remove all mmiowb() in drivers, but it says:
> 
> "NOTE: mmiowb() has only ever guaranteed ordering in conjunction with
> spin_unlock(). However, pairing each mmiowb() removal in this patch with
> the corresponding call to spin_unlock() is not at all trivial, so there
> is a small chance that this change may regress any drivers incorrectly
> relying on mmiowb() to order MMIO writes between CPUs using lock-free
> synchronisation."
> 
> The mmio in radeon_ring_commit() is protected by a mutex rather than a
> spinlock, but in the mutex fastpath it behaves similar to spinlock. We
> can add mmiowb() calls in the radeon driver but the maintainer says he
> doesn't like such a workaround, and radeon is not the only example of
> mutex protected mmio.

Oh no! Ostrich programming is real!

https://lore.kernel.org/lkml/CAHk-=wgbnn7x+i72NqnvXotbxjsk2Ag56Q5YP0OSvhY9sUk7QA@mail.gmail.com/

> So we extend the mmiowb tracking system from spinlock to mutex, hook up
> mmiowb helpers to mutexes as well as spinlocks.
> 
> Without this, we get such an error when run 'glxgears' on weak ordering
> architectures such as LoongArch:
> 
> radeon 0000:04:00.0: ring 0 stalled for more than 10324msec
> radeon 0000:04:00.0: ring 3 stalled for more than 10240msec
> radeon 0000:04:00.0: GPU lockup (current fence id 0x000000000001f412 last fence id 0x000000000001f414 on ring 3)
> radeon 0000:04:00.0: GPU lockup (current fence id 0x000000000000f940 last fence id 0x000000000000f941 on ring 0)
> radeon 0000:04:00.0: scheduling IB failed (-35).
> [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
> radeon 0000:04:00.0: scheduling IB failed (-35).
> [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
> radeon 0000:04:00.0: scheduling IB failed (-35).
> [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
> radeon 0000:04:00.0: scheduling IB failed (-35).
> [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
> radeon 0000:04:00.0: scheduling IB failed (-35).
> [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
> radeon 0000:04:00.0: scheduling IB failed (-35).
> [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
> radeon 0000:04:00.0: scheduling IB failed (-35).
> [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
> 
> Link: https://lore.kernel.org/dri-devel/29df7e26-d7a8-4f67-b988-44353c4270ac@amd.com/T/#t

Hmm. It's hard to tell from that thread whether you're really falling
afoul of the problems that mmiowb() tries to solve or whether Loongson
simply doesn't have enough ordering in its MMIO accessors.

The code you proposed has:

	mmiowb(); /* Make sure wptr is up-to-date for hw */

but mmiowb() is really about ensuring order with MMIO accesses from a
different CPU that subsequently takes the lock.

So please can you explain a bit more about the failing case? I'm worried
that mmiowb() may be the wrong tool for the job here.

Thanks,

Will

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/2] mmiowb: Hook up mmiowb helpers to mutexes as well as spinlocks
  2024-04-12 13:32   ` Will Deacon
@ 2024-04-12 14:35     ` Huacai Chen
  0 siblings, 0 replies; 4+ messages in thread
From: Huacai Chen @ 2024-04-12 14:35 UTC (permalink / raw)
  To: Will Deacon
  Cc: Huacai Chen, Peter Zijlstra, Ingo Molnar, Arnd Bergmann,
	Waiman Long, Boqun Feng, Guo Ren, Rui Wang, WANG Xuerui,
	Jiaxun Yang, linux-arch, linux-kernel, torvalds, mpe

Hi, Will,

On Fri, Apr 12, 2024 at 9:32 PM Will Deacon <will@kernel.org> wrote:
>
> On Fri, Mar 01, 2024 at 09:05:32PM +0800, Huacai Chen wrote:
> > Commit fb24ea52f78e0d595852e ("drivers: Remove explicit invocations of
> > mmiowb()") remove all mmiowb() in drivers, but it says:
> >
> > "NOTE: mmiowb() has only ever guaranteed ordering in conjunction with
> > spin_unlock(). However, pairing each mmiowb() removal in this patch with
> > the corresponding call to spin_unlock() is not at all trivial, so there
> > is a small chance that this change may regress any drivers incorrectly
> > relying on mmiowb() to order MMIO writes between CPUs using lock-free
> > synchronisation."
> >
> > The mmio in radeon_ring_commit() is protected by a mutex rather than a
> > spinlock, but in the mutex fastpath it behaves similar to spinlock. We
> > can add mmiowb() calls in the radeon driver but the maintainer says he
> > doesn't like such a workaround, and radeon is not the only example of
> > mutex protected mmio.
>
> Oh no! Ostrich programming is real!
>
> https://lore.kernel.org/lkml/CAHk-=wgbnn7x+i72NqnvXotbxjsk2Ag56Q5YP0OSvhY9sUk7QA@mail.gmail.com/
Yes, you are probably right, so we solved it in the architectural code
like this finally.

https://lore.kernel.org/loongarch/20240305143958.1752241-1-chenhuacai@loongson.cn/T/#u

Huacai

>
> > So we extend the mmiowb tracking system from spinlock to mutex, hook up
> > mmiowb helpers to mutexes as well as spinlocks.
> >
> > Without this, we get such an error when run 'glxgears' on weak ordering
> > architectures such as LoongArch:
> >
> > radeon 0000:04:00.0: ring 0 stalled for more than 10324msec
> > radeon 0000:04:00.0: ring 3 stalled for more than 10240msec
> > radeon 0000:04:00.0: GPU lockup (current fence id 0x000000000001f412 last fence id 0x000000000001f414 on ring 3)
> > radeon 0000:04:00.0: GPU lockup (current fence id 0x000000000000f940 last fence id 0x000000000000f941 on ring 0)
> > radeon 0000:04:00.0: scheduling IB failed (-35).
> > [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
> > radeon 0000:04:00.0: scheduling IB failed (-35).
> > [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
> > radeon 0000:04:00.0: scheduling IB failed (-35).
> > [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
> > radeon 0000:04:00.0: scheduling IB failed (-35).
> > [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
> > radeon 0000:04:00.0: scheduling IB failed (-35).
> > [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
> > radeon 0000:04:00.0: scheduling IB failed (-35).
> > [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
> > radeon 0000:04:00.0: scheduling IB failed (-35).
> > [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
> >
> > Link: https://lore.kernel.org/dri-devel/29df7e26-d7a8-4f67-b988-44353c4270ac@amd.com/T/#t
>
> Hmm. It's hard to tell from that thread whether you're really falling
> afoul of the problems that mmiowb() tries to solve or whether Loongson
> simply doesn't have enough ordering in its MMIO accessors.
>
> The code you proposed has:
>
>         mmiowb(); /* Make sure wptr is up-to-date for hw */
>
> but mmiowb() is really about ensuring order with MMIO accesses from a
> different CPU that subsequently takes the lock.
>
> So please can you explain a bit more about the failing case? I'm worried
> that mmiowb() may be the wrong tool for the job here.
>
> Thanks,
>
> Will

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-04-12 14:35 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-01 13:05 [PATCH 1/2] mmiowb: Rename mmiowb_spin_{lock, unlock}() to mmiowb_in_{lock, unlock}() Huacai Chen
2024-03-01 13:05 ` [PATCH 2/2] mmiowb: Hook up mmiowb helpers to mutexes as well as spinlocks Huacai Chen
2024-04-12 13:32   ` Will Deacon
2024-04-12 14:35     ` Huacai Chen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).