linux-kernel.vger.kernel.org archive mirror
* [patch 0/4] fs: Substitute bit-spinlocks for PREEMPT_RT and debugging
@ 2019-07-30 11:24 Thomas Gleixner
  2019-07-30 11:24 ` [patch 1/4] locking/lockdep: Add Kconfig option for bit spinlocks Thomas Gleixner
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Thomas Gleixner @ 2019-07-30 11:24 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright,
	Theodore Tso, Matthew Wilcox, Alexander Viro, linux-fsdevel,
	linux-ext4, Jan Kara

Bit spinlocks are problematic if PREEMPT_RT is enabled. They disable
preemption, which is undesired for latency reasons, and they break when
regular spinlocks are taken within the bit_spinlock locked region, because
regular spinlocks are converted to 'sleeping spinlocks' on RT.
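
For reference, this is roughly what the generic bit_spin_lock() fast path
in include/linux/bit_spinlock.h does (simplified sketch, sparse annotations
trimmed), which is where the preemption disable comes from:

static inline void bit_spin_lock(int bitnum, unsigned long *addr)
{
	/* Preemption stays disabled for the whole locked region */
	preempt_disable();
#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
	while (unlikely(test_and_set_bit_lock(bitnum, addr))) {
		preempt_enable();
		do {
			cpu_relax();
		} while (test_bit(bitnum, addr));
		preempt_disable();
	}
#endif
}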

Bit spinlocks are also not covered by lock debugging, e.g. lockdep. With
the spinlock substitution in place, they can be exposed via a new config
switch: CONFIG_DEBUG_BIT_SPINLOCKS.

This series handles only buffer head and jbd2, but does not touch the
hlist_bl bit-spinlock usage.

For most usage sites of hlist_bl the protected sections are either really
small and do not contain any RT incompatible code or can be trivially fixed
up. In those cases converting the list_bl locking to spinlocks looks like
overkill, but it still might make sense to provide it for debugging.
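
For context, the hlist_bl locking helpers in include/linux/list_bl.h are
(roughly) just thin wrappers around bit 0 of the head pointer:

static inline void hlist_bl_lock(struct hlist_bl_head *b)
{
	bit_spin_lock(0, (unsigned long *)b);
}

static inline void hlist_bl_unlock(struct hlist_bl_head *b)
{
	__bit_spin_unlock(0, (unsigned long *)b);
}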

The only exception to this is the code in drivers/md/dm-snap.c, which has
interesting nested locking in place (via kmem_cache, cgroups, block ...)
that completely escapes lockdep.

I pondered making the hlist_bl locking conditional on a per-source-file
define, but that does not work due to include hell.

Thoughts?

Thanks,

	tglx




* [patch 1/4] locking/lockdep: Add Kconfig option for bit spinlocks
  2019-07-30 11:24 [patch 0/4] fs: Substitute bit-spinlocks for PREEMPT_RT and debugging Thomas Gleixner
@ 2019-07-30 11:24 ` Thomas Gleixner
  2019-07-30 11:24 ` [patch 2/4] fs/buffer: Move BH_Uptodate_Lock locking into wrapper functions Thomas Gleixner
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2019-07-30 11:24 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright,
	Theodore Tso, Matthew Wilcox, Alexander Viro, linux-fsdevel,
	linux-ext4, Jan Kara

Some usage sites of bit spinlocks have a substitution with regular
spinlocks which depends on CONFIG_PREEMPT_RT. This substitution can
also be used to expose these locks to the regular lock debugging
infrastructure, e.g. lockdep.

As this increases the size of the affected data structures significantly,
the substitution is guarded by a separate Kconfig switch.

Note that only the bit spinlocks which have a substitution implemented
will be covered by this. All other bit spinlocks evade lock debugging as
before.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 lib/Kconfig.debug |   10 ++++++++++
 1 file changed, 10 insertions(+)

--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1201,6 +1201,16 @@ config DEBUG_RWSEMS
 	  This debugging feature allows mismatched rw semaphore locks
 	  and unlocks to be detected and reported.
 
+config DEBUG_BIT_SPINLOCKS
+	bool "Bit spinlock debugging"
+	depends on DEBUG_SPINLOCK
+	help
+	  This debugging feature substitutes bit spinlocks in some use
+	  cases, e.g. buffer head, zram, with regular spinlocks so
+	  these locks are exposed to lock debugging features.
+
+	  Not all bit spinlocks are covered by this.
+
 config DEBUG_LOCK_ALLOC
 	bool "Lock debugging: detect incorrect freeing of live locks"
 	depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT




* [patch 2/4] fs/buffer: Move BH_Uptodate_Lock locking into wrapper functions
  2019-07-30 11:24 [patch 0/4] fs: Substitute bit-spinlocks for PREEMPT_RT and debugging Thomas Gleixner
  2019-07-30 11:24 ` [patch 1/4] locking/lockdep: Add Kconfig option for bit spinlocks Thomas Gleixner
@ 2019-07-30 11:24 ` Thomas Gleixner
  2019-07-31 14:46   ` Jan Kara
  2019-07-30 11:24 ` [patch 3/4] fs/buffer: Substitute BH_Uptodate_Lock for RT and bit spinlock debugging Thomas Gleixner
  2019-07-30 11:24 ` [patch 4/4] fs: jbd/jbd2: Substitute BH locks for RT and lock debugging Thomas Gleixner
  3 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2019-07-30 11:24 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright,
	Theodore Tso, Matthew Wilcox, Alexander Viro, linux-fsdevel,
	linux-ext4, Jan Kara

Bit spinlocks are problematic if PREEMPT_RT is enabled, because they
disable preemption, which is undesired for latency reasons and breaks when
regular spinlocks are taken within the bit_spinlock locked region because
regular spinlocks are converted to 'sleeping spinlocks' on RT. So RT
replaces the bit spinlocks with regular spinlocks to avoid this problem.

To avoid ifdeffery at the source level, wrap all BH_Uptodate_Lock bitlock
operations with inline functions, so the spinlock substitution can be done
at one place.

Using regular spinlocks can also be enabled for lock debugging purposes so
the lock operations become visible to lockdep.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
---
 fs/buffer.c                 |   20 ++++++--------------
 fs/ext4/page-io.c           |    6 ++----
 fs/ntfs/aops.c              |   10 +++-------
 include/linux/buffer_head.h |   16 ++++++++++++++++
 4 files changed, 27 insertions(+), 25 deletions(-)

--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -275,8 +275,7 @@ static void end_buffer_async_read(struct
 	 * decide that the page is now completely done.
 	 */
 	first = page_buffers(page);
-	local_irq_save(flags);
-	bit_spin_lock(BH_Uptodate_Lock, &first->b_state);
+	flags = bh_uptodate_lock_irqsave(first);
 	clear_buffer_async_read(bh);
 	unlock_buffer(bh);
 	tmp = bh;
@@ -289,8 +288,7 @@ static void end_buffer_async_read(struct
 		}
 		tmp = tmp->b_this_page;
 	} while (tmp != bh);
-	bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
-	local_irq_restore(flags);
+	bh_uptodate_unlock_irqrestore(first, flags);
 
 	/*
 	 * If none of the buffers had errors and they are all
@@ -302,9 +300,7 @@ static void end_buffer_async_read(struct
 	return;
 
 still_busy:
-	bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
-	local_irq_restore(flags);
-	return;
+	bh_uptodate_unlock_irqrestore(first, flags);
 }
 
 /*
@@ -331,8 +327,7 @@ void end_buffer_async_write(struct buffe
 	}
 
 	first = page_buffers(page);
-	local_irq_save(flags);
-	bit_spin_lock(BH_Uptodate_Lock, &first->b_state);
+	flags = bh_uptodate_lock_irqsave(first);
 
 	clear_buffer_async_write(bh);
 	unlock_buffer(bh);
@@ -344,15 +339,12 @@ void end_buffer_async_write(struct buffe
 		}
 		tmp = tmp->b_this_page;
 	}
-	bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
-	local_irq_restore(flags);
+	bh_uptodate_unlock_irqrestore(first, flags);
 	end_page_writeback(page);
 	return;
 
 still_busy:
-	bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
-	local_irq_restore(flags);
-	return;
+	bh_uptodate_unlock_irqrestore(first, flags);
 }
 EXPORT_SYMBOL(end_buffer_async_write);
 
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -90,8 +90,7 @@ static void ext4_finish_bio(struct bio *
 		 * We check all buffers in the page under BH_Uptodate_Lock
 		 * to avoid races with other end io clearing async_write flags
 		 */
-		local_irq_save(flags);
-		bit_spin_lock(BH_Uptodate_Lock, &head->b_state);
+		flags = bh_uptodate_lock_irqsave(head);
 		do {
 			if (bh_offset(bh) < bio_start ||
 			    bh_offset(bh) + bh->b_size > bio_end) {
@@ -103,8 +102,7 @@ static void ext4_finish_bio(struct bio *
 			if (bio->bi_status)
 				buffer_io_error(bh);
 		} while ((bh = bh->b_this_page) != head);
-		bit_spin_unlock(BH_Uptodate_Lock, &head->b_state);
-		local_irq_restore(flags);
+		bh_uptodate_unlock_irqrestore(head, flags);
 		if (!under_io) {
 			fscrypt_free_bounce_page(bounce_page);
 			end_page_writeback(page);
--- a/fs/ntfs/aops.c
+++ b/fs/ntfs/aops.c
@@ -92,8 +92,7 @@ static void ntfs_end_buffer_async_read(s
 				"0x%llx.", (unsigned long long)bh->b_blocknr);
 	}
 	first = page_buffers(page);
-	local_irq_save(flags);
-	bit_spin_lock(BH_Uptodate_Lock, &first->b_state);
+	flags = bh_uptodate_lock_irqsave(first);
 	clear_buffer_async_read(bh);
 	unlock_buffer(bh);
 	tmp = bh;
@@ -108,8 +107,7 @@ static void ntfs_end_buffer_async_read(s
 		}
 		tmp = tmp->b_this_page;
 	} while (tmp != bh);
-	bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
-	local_irq_restore(flags);
+	bh_uptodate_unlock_irqrestore(first, flags);
 	/*
 	 * If none of the buffers had errors then we can set the page uptodate,
 	 * but we first have to perform the post read mst fixups, if the
@@ -142,9 +140,7 @@ static void ntfs_end_buffer_async_read(s
 	unlock_page(page);
 	return;
 still_busy:
-	bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
-	local_irq_restore(flags);
-	return;
+	bh_uptodate_unlock_irqrestore(first, flags);
 }
 
 /**
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -78,6 +78,22 @@ struct buffer_head {
 	atomic_t b_count;		/* users using this buffer_head */
 };
 
+static inline unsigned long bh_uptodate_lock_irqsave(struct buffer_head *bh)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	bit_spin_lock(BH_Uptodate_Lock, &bh->b_state);
+	return flags;
+}
+
+static inline void
+bh_uptodate_unlock_irqrestore(struct buffer_head *bh, unsigned long flags)
+{
+	bit_spin_unlock(BH_Uptodate_Lock, &bh->b_state);
+	local_irq_restore(flags);
+}
+
 /*
  * macro tricks to expand the set_buffer_foo(), clear_buffer_foo()
  * and buffer_foo() functions.




* [patch 3/4] fs/buffer: Substitute BH_Uptodate_Lock for RT and bit spinlock debugging
  2019-07-30 11:24 [patch 0/4] fs: Substitute bit-spinlocks for PREEMPT_RT and debugging Thomas Gleixner
  2019-07-30 11:24 ` [patch 1/4] locking/lockdep: Add Kconfig option for bit spinlocks Thomas Gleixner
  2019-07-30 11:24 ` [patch 2/4] fs/buffer: Move BH_Uptodate_Lock locking into wrapper functions Thomas Gleixner
@ 2019-07-30 11:24 ` Thomas Gleixner
  2019-07-31 14:47   ` Jan Kara
  2019-07-30 11:24 ` [patch 4/4] fs: jbd/jbd2: Substitute BH locks for RT and lock debugging Thomas Gleixner
  3 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2019-07-30 11:24 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright,
	Theodore Tso, Alexander Viro, Matthew Wilcox, linux-fsdevel,
	linux-ext4, Jan Kara

Bit spinlocks are problematic if PREEMPT_RT is enabled. They disable
preemption, which is undesired for latency reasons and breaks when regular
spinlocks are taken within the bit_spinlock locked region because regular
spinlocks are converted to 'sleeping spinlocks' on RT.

Substitute the BH_Uptodate_Lock bit spinlock with a regular spinlock for
PREEMPT_RT enabled kernels.

Bit spinlocks are also not covered by lock debugging, e.g. lockdep. With
the spinlock substitution in place, they can be exposed via
CONFIG_DEBUG_BIT_SPINLOCKS.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: linux-fsdevel@vger.kernel.org
---
 fs/buffer.c                 |    1 +
 include/linux/buffer_head.h |   31 +++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3360,6 +3360,7 @@ struct buffer_head *alloc_buffer_head(gf
 	struct buffer_head *ret = kmem_cache_zalloc(bh_cachep, gfp_flags);
 	if (ret) {
 		INIT_LIST_HEAD(&ret->b_assoc_buffers);
+		buffer_head_init_locks(ret);
 		preempt_disable();
 		__this_cpu_inc(bh_accounting.nr);
 		recalc_bh_state();
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -76,8 +76,35 @@ struct buffer_head {
 	struct address_space *b_assoc_map;	/* mapping this buffer is
 						   associated with */
 	atomic_t b_count;		/* users using this buffer_head */
+
+#if defined(CONFIG_PREEMPT_RT) || defined(CONFIG_DEBUG_BIT_SPINLOCKS)
+	spinlock_t b_uptodate_lock;
+#endif
 };
 
+#if defined(CONFIG_PREEMPT_RT) || defined(CONFIG_DEBUG_BIT_SPINLOCKS)
+
+static inline unsigned long bh_uptodate_lock_irqsave(struct buffer_head *bh)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&bh->b_uptodate_lock, flags);
+	return flags;
+}
+
+static inline void
+bh_uptodate_unlock_irqrestore(struct buffer_head *bh, unsigned long flags)
+{
+	spin_unlock_irqrestore(&bh->b_uptodate_lock, flags);
+}
+
+static inline void buffer_head_init_locks(struct buffer_head *bh)
+{
+	spin_lock_init(&bh->b_uptodate_lock);
+}
+
+#else /* PREEMPT_RT || DEBUG_BIT_SPINLOCKS */
+
 static inline unsigned long bh_uptodate_lock_irqsave(struct buffer_head *bh)
 {
 	unsigned long flags;
@@ -94,6 +121,10 @@ bh_uptodate_unlock_irqrestore(struct buf
 	local_irq_restore(flags);
 }
 
+static inline void buffer_head_init_locks(struct buffer_head *bh) { }
+
+#endif /* !PREEMPT_RT && !DEBUG_BIT_SPINLOCKS */
+
 /*
  * macro tricks to expand the set_buffer_foo(), clear_buffer_foo()
  * and buffer_foo() functions.




* [patch 4/4] fs: jbd/jbd2: Substitute BH locks for RT and lock debugging
  2019-07-30 11:24 [patch 0/4] fs: Substitute bit-spinlocks for PREEMPT_RT and debugging Thomas Gleixner
                   ` (2 preceding siblings ...)
  2019-07-30 11:24 ` [patch 3/4] fs/buffer: Substitute BH_Uptodate_Lock for RT and bit spinlock debugging Thomas Gleixner
@ 2019-07-30 11:24 ` Thomas Gleixner
  2019-07-31 15:48   ` Jan Kara
  3 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2019-07-30 11:24 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright,
	linux-ext4, Theodore Tso, Matthew Wilcox, Jan Kara,
	Alexander Viro, linux-fsdevel

Bit spinlocks are problematic if PREEMPT_RT is enabled. They disable
preemption, which is undesired for latency reasons and breaks when regular
spinlocks are taken within the bit_spinlock locked region because regular
spinlocks are converted to 'sleeping spinlocks' on RT.

Substitute the BH_State and BH_JournalHead bit spinlocks with regular
spinlocks for PREEMPT_RT enabled kernels.

Bit spinlocks are also not covered by lock debugging, e.g. lockdep. With
the spinlock substitution in place, they can be exposed via
CONFIG_DEBUG_BIT_SPINLOCKS.

Originally-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-ext4@vger.kernel.org
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jan Kara <jack@suse.com>
---
 include/linux/buffer_head.h |    8 ++++++++
 include/linux/jbd2.h        |   36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -79,6 +79,10 @@ struct buffer_head {
 
 #if defined(CONFIG_PREEMPT_RT) || defined(CONFIG_DEBUG_BIT_SPINLOCKS)
 	spinlock_t b_uptodate_lock;
+# if IS_ENABLED(CONFIG_JBD2)
+	spinlock_t b_state_lock;
+	spinlock_t b_journal_head_lock;
+# endif
 #endif
 };
 
@@ -101,6 +105,10 @@ bh_uptodate_unlock_irqrestore(struct buf
 static inline void buffer_head_init_locks(struct buffer_head *bh)
 {
 	spin_lock_init(&bh->b_uptodate_lock);
+#if IS_ENABLED(CONFIG_JBD2)
+	spin_lock_init(&bh->b_state_lock);
+	spin_lock_init(&bh->b_journal_head_lock);
+#endif
 }
 
 #else /* PREEMPT_RT || DEBUG_BIT_SPINLOCKS */
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -342,6 +342,40 @@ static inline struct journal_head *bh2jh
 	return bh->b_private;
 }
 
+#if defined(CONFIG_PREEMPT_RT) || defined(CONFIG_DEBUG_BIT_SPINLOCKS)
+
+static inline void jbd_lock_bh_state(struct buffer_head *bh)
+{
+	spin_lock(&bh->b_state_lock);
+}
+
+static inline int jbd_trylock_bh_state(struct buffer_head *bh)
+{
+	return spin_trylock(&bh->b_state_lock);
+}
+
+static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
+{
+	return spin_is_locked(&bh->b_state_lock);
+}
+
+static inline void jbd_unlock_bh_state(struct buffer_head *bh)
+{
+	spin_unlock(&bh->b_state_lock);
+}
+
+static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
+{
+	spin_lock(&bh->b_journal_head_lock);
+}
+
+static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
+{
+	spin_unlock(&bh->b_journal_head_lock);
+}
+
+#else /* PREEMPT_RT || DEBUG_BIT_SPINLOCKS */
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
 	bit_spin_lock(BH_State, &bh->b_state);
@@ -372,6 +406,8 @@ static inline void jbd_unlock_bh_journal
 	bit_spin_unlock(BH_JournalHead, &bh->b_state);
 }
 
+#endif /* !PREEMPT_RT && !DEBUG_BIT_SPINLOCKS */
+
 #define J_ASSERT(assert)	BUG_ON(!(assert))
 
 #define J_ASSERT_BH(bh, expr)	J_ASSERT(expr)




* Re: [patch 2/4] fs/buffer: Move BH_Uptodate_Lock locking into wrapper functions
  2019-07-30 11:24 ` [patch 2/4] fs/buffer: Move BH_Uptodate_Lock locking into wrapper functions Thomas Gleixner
@ 2019-07-31 14:46   ` Jan Kara
  2019-07-31 22:27     ` Thomas Gleixner
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kara @ 2019-07-31 14:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Steven Rostedt, Peter Zijlstra, Matthew Wilcox,
	Ingo Molnar, Anna-Maria Gleixner, Sebastian Siewior,
	Theodore Tso, Julia Cartwright, Jan Kara, linux-ext4,
	linux-fsdevel, Alexander Viro

On Tue 30-07-19 13:24:54, Thomas Gleixner wrote:
> Bit spinlocks are problematic if PREEMPT_RT is enabled, because they
> disable preemption, which is undesired for latency reasons and breaks when
> regular spinlocks are taken within the bit_spinlock locked region because
> regular spinlocks are converted to 'sleeping spinlocks' on RT. So RT
> replaces the bit spinlocks with regular spinlocks to avoid this problem.
> 
> To avoid ifdeffery at the source level, wrap all BH_Uptodate_Lock bitlock
> operations with inline functions, so the spinlock substitution can be done
> at one place.
> 
> Using regular spinlocks can also be enabled for lock debugging purposes so
> the lock operations become visible to lockdep.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: "Theodore Ts'o" <tytso@mit.edu>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: linux-fsdevel@vger.kernel.org

Looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

BTW, it should be possible to get rid of BH_Uptodate_Lock altogether by using
bio chaining (which was non-existent when this bh code was written) to make
sure the IO completion function gets called only once all bios used to fill in
/ write out the page are done. It would also be more efficient. But I guess
that's an interesting cleanup project for some other time...
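
Roughly what I have in mind (completely untested sketch just to illustrate
the idea; page_read_end_io() is a made-up completion handler, error handling
is omitted, and only the trivial 'read every buffer' case is shown):

	struct buffer_head *head = page_buffers(page), *bh = head;
	struct bio *parent = bio_alloc(GFP_NOIO, 0);

	/* Runs exactly once, after the last chained bio has completed */
	parent->bi_end_io = page_read_end_io;
	parent->bi_private = page;

	do {
		struct bio *bio = bio_alloc(GFP_NOIO, 1);

		bio_chain(bio, parent);
		bio_set_dev(bio, bh->b_bdev);
		bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
		bio->bi_opf = REQ_OP_READ;
		bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
		submit_bio(bio);
	} while ((bh = bh->b_this_page) != head);

	/* Drop the parent's initial reference; bi_end_io fires once all children are done */
	bio_endio(parent);

That way the completion side never has to walk the other buffers under
BH_Uptodate_Lock to figure out whether it was the last one.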

								Honza

> ---
>  fs/buffer.c                 |   20 ++++++--------------
>  fs/ext4/page-io.c           |    6 ++----
>  fs/ntfs/aops.c              |   10 +++-------
>  include/linux/buffer_head.h |   16 ++++++++++++++++
>  4 files changed, 27 insertions(+), 25 deletions(-)
> 
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -275,8 +275,7 @@ static void end_buffer_async_read(struct
>  	 * decide that the page is now completely done.
>  	 */
>  	first = page_buffers(page);
> -	local_irq_save(flags);
> -	bit_spin_lock(BH_Uptodate_Lock, &first->b_state);
> +	flags = bh_uptodate_lock_irqsave(first);
>  	clear_buffer_async_read(bh);
>  	unlock_buffer(bh);
>  	tmp = bh;
> @@ -289,8 +288,7 @@ static void end_buffer_async_read(struct
>  		}
>  		tmp = tmp->b_this_page;
>  	} while (tmp != bh);
> -	bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
> -	local_irq_restore(flags);
> +	bh_uptodate_unlock_irqrestore(first, flags);
>  
>  	/*
>  	 * If none of the buffers had errors and they are all
> @@ -302,9 +300,7 @@ static void end_buffer_async_read(struct
>  	return;
>  
>  still_busy:
> -	bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
> -	local_irq_restore(flags);
> -	return;
> +	bh_uptodate_unlock_irqrestore(first, flags);
>  }
>  
>  /*
> @@ -331,8 +327,7 @@ void end_buffer_async_write(struct buffe
>  	}
>  
>  	first = page_buffers(page);
> -	local_irq_save(flags);
> -	bit_spin_lock(BH_Uptodate_Lock, &first->b_state);
> +	flags = bh_uptodate_lock_irqsave(first);
>  
>  	clear_buffer_async_write(bh);
>  	unlock_buffer(bh);
> @@ -344,15 +339,12 @@ void end_buffer_async_write(struct buffe
>  		}
>  		tmp = tmp->b_this_page;
>  	}
> -	bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
> -	local_irq_restore(flags);
> +	bh_uptodate_unlock_irqrestore(first, flags);
>  	end_page_writeback(page);
>  	return;
>  
>  still_busy:
> -	bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
> -	local_irq_restore(flags);
> -	return;
> +	bh_uptodate_unlock_irqrestore(first, flags);
>  }
>  EXPORT_SYMBOL(end_buffer_async_write);
>  
> --- a/fs/ext4/page-io.c
> +++ b/fs/ext4/page-io.c
> @@ -90,8 +90,7 @@ static void ext4_finish_bio(struct bio *
>  		 * We check all buffers in the page under BH_Uptodate_Lock
>  		 * to avoid races with other end io clearing async_write flags
>  		 */
> -		local_irq_save(flags);
> -		bit_spin_lock(BH_Uptodate_Lock, &head->b_state);
> +		flags = bh_uptodate_lock_irqsave(head);
>  		do {
>  			if (bh_offset(bh) < bio_start ||
>  			    bh_offset(bh) + bh->b_size > bio_end) {
> @@ -103,8 +102,7 @@ static void ext4_finish_bio(struct bio *
>  			if (bio->bi_status)
>  				buffer_io_error(bh);
>  		} while ((bh = bh->b_this_page) != head);
> -		bit_spin_unlock(BH_Uptodate_Lock, &head->b_state);
> -		local_irq_restore(flags);
> +		bh_uptodate_unlock_irqrestore(head, flags);
>  		if (!under_io) {
>  			fscrypt_free_bounce_page(bounce_page);
>  			end_page_writeback(page);
> --- a/fs/ntfs/aops.c
> +++ b/fs/ntfs/aops.c
> @@ -92,8 +92,7 @@ static void ntfs_end_buffer_async_read(s
>  				"0x%llx.", (unsigned long long)bh->b_blocknr);
>  	}
>  	first = page_buffers(page);
> -	local_irq_save(flags);
> -	bit_spin_lock(BH_Uptodate_Lock, &first->b_state);
> +	flags = bh_uptodate_lock_irqsave(first);
>  	clear_buffer_async_read(bh);
>  	unlock_buffer(bh);
>  	tmp = bh;
> @@ -108,8 +107,7 @@ static void ntfs_end_buffer_async_read(s
>  		}
>  		tmp = tmp->b_this_page;
>  	} while (tmp != bh);
> -	bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
> -	local_irq_restore(flags);
> +	bh_uptodate_unlock_irqrestore(first, flags);
>  	/*
>  	 * If none of the buffers had errors then we can set the page uptodate,
>  	 * but we first have to perform the post read mst fixups, if the
> @@ -142,9 +140,7 @@ static void ntfs_end_buffer_async_read(s
>  	unlock_page(page);
>  	return;
>  still_busy:
> -	bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
> -	local_irq_restore(flags);
> -	return;
> +	bh_uptodate_unlock_irqrestore(first, flags);
>  }
>  
>  /**
> --- a/include/linux/buffer_head.h
> +++ b/include/linux/buffer_head.h
> @@ -78,6 +78,22 @@ struct buffer_head {
>  	atomic_t b_count;		/* users using this buffer_head */
>  };
>  
> +static inline unsigned long bh_uptodate_lock_irqsave(struct buffer_head *bh)
> +{
> +	unsigned long flags;
> +
> +	local_irq_save(flags);
> +	bit_spin_lock(BH_Uptodate_Lock, &bh->b_state);
> +	return flags;
> +}
> +
> +static inline void
> +bh_uptodate_unlock_irqrestore(struct buffer_head *bh, unsigned long flags)
> +{
> +	bit_spin_unlock(BH_Uptodate_Lock, &bh->b_state);
> +	local_irq_restore(flags);
> +}
> +
>  /*
>   * macro tricks to expand the set_buffer_foo(), clear_buffer_foo()
>   * and buffer_foo() functions.
> 
> 
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [patch 3/4] fs/buffer: Substitute BH_Uptodate_Lock for RT and bit spinlock debugging
  2019-07-30 11:24 ` [patch 3/4] fs/buffer: Substitute BH_Uptodate_Lock for RT and bit spinlock debugging Thomas Gleixner
@ 2019-07-31 14:47   ` Jan Kara
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Kara @ 2019-07-31 14:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Steven Rostedt, Peter Zijlstra, Matthew Wilcox,
	Ingo Molnar, Anna-Maria Gleixner, Sebastian Siewior,
	Theodore Tso, Julia Cartwright, Jan Kara, linux-ext4,
	linux-fsdevel, Alexander Viro

On Tue 30-07-19 13:24:55, Thomas Gleixner wrote:
> Bit spinlocks are problematic if PREEMPT_RT is enabled. They disable
> preemption, which is undesired for latency reasons and breaks when regular
> spinlocks are taken within the bit_spinlock locked region because regular
> spinlocks are converted to 'sleeping spinlocks' on RT.
> 
> Substitute the BH_Uptodate_Lock bit spinlock with a regular spinlock for
> PREEMPT_RT enabled kernels.
> 
> Bit spinlocks are also not covered by lock debugging, e.g. lockdep. With
> the spinlock substitution in place, they can be exposed via
> CONFIG_DEBUG_BIT_SPINLOCKS.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: "Theodore Ts'o" <tytso@mit.edu>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: linux-fsdevel@vger.kernel.org

Looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/buffer.c                 |    1 +
>  include/linux/buffer_head.h |   31 +++++++++++++++++++++++++++++++
>  2 files changed, 32 insertions(+)
> 
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -3360,6 +3360,7 @@ struct buffer_head *alloc_buffer_head(gf
>  	struct buffer_head *ret = kmem_cache_zalloc(bh_cachep, gfp_flags);
>  	if (ret) {
>  		INIT_LIST_HEAD(&ret->b_assoc_buffers);
> +		buffer_head_init_locks(ret);
>  		preempt_disable();
>  		__this_cpu_inc(bh_accounting.nr);
>  		recalc_bh_state();
> --- a/include/linux/buffer_head.h
> +++ b/include/linux/buffer_head.h
> @@ -76,8 +76,35 @@ struct buffer_head {
>  	struct address_space *b_assoc_map;	/* mapping this buffer is
>  						   associated with */
>  	atomic_t b_count;		/* users using this buffer_head */
> +
> +#if defined(CONFIG_PREEMPT_RT) || defined(CONFIG_DEBUG_BIT_SPINLOCKS)
> +	spinlock_t b_uptodate_lock;
> +#endif
>  };
>  
> +#if defined(CONFIG_PREEMPT_RT) || defined(CONFIG_DEBUG_BIT_SPINLOCKS)
> +
> +static inline unsigned long bh_uptodate_lock_irqsave(struct buffer_head *bh)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&bh->b_uptodate_lock, flags);
> +	return flags;
> +}
> +
> +static inline void
> +bh_uptodate_unlock_irqrestore(struct buffer_head *bh, unsigned long flags)
> +{
> +	spin_unlock_irqrestore(&bh->b_uptodate_lock, flags);
> +}
> +
> +static inline void buffer_head_init_locks(struct buffer_head *bh)
> +{
> +	spin_lock_init(&bh->b_uptodate_lock);
> +}
> +
> +#else /* PREEMPT_RT || DEBUG_BIT_SPINLOCKS */
> +
>  static inline unsigned long bh_uptodate_lock_irqsave(struct buffer_head *bh)
>  {
>  	unsigned long flags;
> @@ -94,6 +121,10 @@ bh_uptodate_unlock_irqrestore(struct buf
>  	local_irq_restore(flags);
>  }
>  
> +static inline void buffer_head_init_locks(struct buffer_head *bh) { }
> +
> +#endif /* !PREEMPT_RT && !DEBUG_BIT_SPINLOCKS */
> +
>  /*
>   * macro tricks to expand the set_buffer_foo(), clear_buffer_foo()
>   * and buffer_foo() functions.
> 
> 
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [patch 4/4] fs: jbd/jbd2: Substitute BH locks for RT and lock debugging
  2019-07-30 11:24 ` [patch 4/4] fs: jbd/jbd2: Substitute BH locks for RT and lock debugging Thomas Gleixner
@ 2019-07-31 15:48   ` Jan Kara
  2019-07-31 19:40     ` Thomas Gleixner
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kara @ 2019-07-31 15:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Steven Rostedt, Peter Zijlstra, Matthew Wilcox,
	Ingo Molnar, Anna-Maria Gleixner, Sebastian Siewior,
	Theodore Tso, Julia Cartwright, Jan Kara, linux-ext4,
	linux-fsdevel, Alexander Viro

On Tue 30-07-19 13:24:56, Thomas Gleixner wrote:
> Bit spinlocks are problematic if PREEMPT_RT is enabled. They disable
> preemption, which is undesired for latency reasons and breaks when regular
> spinlocks are taken within the bit_spinlock locked region because regular
> spinlocks are converted to 'sleeping spinlocks' on RT.
> 
> Substitute the BH_State and BH_JournalHead bit spinlocks with regular
> spinlocks for PREEMPT_RT enabled kernels.

Is there a real need to substitute the BH_JournalHead bit spinlock?  The
critical sections are pretty tiny, all located within fs/jbd2/journal.c.
Maybe only the one around __journal_remove_journal_head() would need a bit
of refactoring so that journal_free_journal_head() doesn't get called
under the bit-spinlock.
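
E.g. jbd2_journal_put_journal_head() could do something along these lines
(untested sketch; it assumes __journal_remove_journal_head() is changed to
no longer free the jh itself):

void jbd2_journal_put_journal_head(struct journal_head *jh)
{
	struct buffer_head *bh = jh2bh(jh);

	jbd_lock_bh_journal_head(bh);
	J_ASSERT_JH(jh, jh->b_jcount > 0);
	--jh->b_jcount;
	if (!jh->b_jcount) {
		__journal_remove_journal_head(bh);
		jbd_unlock_bh_journal_head(bh);
		/* Free outside of the locked region */
		journal_free_journal_head(jh);
		__brelse(bh);
	} else {
		jbd_unlock_bh_journal_head(bh);
	}
}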

BH_State lock is definitely worth it. In fact, if you placed the spinlock
inside struct journal_head (which is the structure whose members are in
fact protected by it), I'd even be fine with just using the spinlock always
instead of the bit spinlock. journal_head is pretty big anyway (and there's
even a 4-byte hole in it for 64-bit archs) and these structures are pretty
rare (only for actively changed metadata buffers).
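
I.e. roughly (sketch only, the field name and placement are just
illustrative):

struct journal_head {
	struct buffer_head *b_bh;	/* back pointer to the buffer */
	spinlock_t b_state_lock;	/* would replace the BH_State bit spinlock */
	int b_jcount;			/* reference count */
	unsigned b_jlist;		/* journaling list for this buffer */
	/* ... rest of the struct unchanged ... */
};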

								Honza

> 
> Bit spinlocks are also not covered by lock debugging, e.g. lockdep. With
> the spinlock substitution in place, they can be exposed via
> CONFIG_DEBUG_BIT_SPINLOCKS.
> 
> Originally-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: linux-ext4@vger.kernel.org
> Cc: "Theodore Ts'o" <tytso@mit.edu>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Jan Kara <jack@suse.com>
> --
>  include/linux/buffer_head.h |    8 ++++++++
>  include/linux/jbd2.h        |   36 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 44 insertions(+)
> 
> --- a/include/linux/buffer_head.h
> +++ b/include/linux/buffer_head.h
> @@ -79,6 +79,10 @@ struct buffer_head {
>  
>  #if defined(CONFIG_PREEMPT_RT) || defined(CONFIG_DEBUG_BIT_SPINLOCKS)
>  	spinlock_t b_uptodate_lock;
> +# if IS_ENABLED(CONFIG_JBD2)
> +	spinlock_t b_state_lock;
> +	spinlock_t b_journal_head_lock;
> +# endif
>  #endif
>  };
>  
> @@ -101,6 +105,10 @@ bh_uptodate_unlock_irqrestore(struct buf
>  static inline void buffer_head_init_locks(struct buffer_head *bh)
>  {
>  	spin_lock_init(&bh->b_uptodate_lock);
> +#if IS_ENABLED(CONFIG_JBD2)
> +	spin_lock_init(&bh->b_state_lock);
> +	spin_lock_init(&bh->b_journal_head_lock);
> +#endif
>  }
>  
>  #else /* PREEMPT_RT || DEBUG_BIT_SPINLOCKS */
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -342,6 +342,40 @@ static inline struct journal_head *bh2jh
>  	return bh->b_private;
>  }
>  
> +#if defined(CONFIG_PREEMPT_RT) || defined(CONFIG_DEBUG_BIT_SPINLOCKS)
> +
> +static inline void jbd_lock_bh_state(struct buffer_head *bh)
> +{
> +	spin_lock(&bh->b_state_lock);
> +}
> +
> +static inline int jbd_trylock_bh_state(struct buffer_head *bh)
> +{
> +	return spin_trylock(&bh->b_state_lock);
> +}
> +
> +static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
> +{
> +	return spin_is_locked(&bh->b_state_lock);
> +}
> +
> +static inline void jbd_unlock_bh_state(struct buffer_head *bh)
> +{
> +	spin_unlock(&bh->b_state_lock);
> +}
> +
> +static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
> +{
> +	spin_lock(&bh->b_journal_head_lock);
> +}
> +
> +static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
> +{
> +	spin_unlock(&bh->b_journal_head_lock);
> +}
> +
> +#else /* PREEMPT_RT || DEBUG_BIT_SPINLOCKS */
> +
>  static inline void jbd_lock_bh_state(struct buffer_head *bh)
>  {
>  	bit_spin_lock(BH_State, &bh->b_state);
> @@ -372,6 +406,8 @@ static inline void jbd_unlock_bh_journal
>  	bit_spin_unlock(BH_JournalHead, &bh->b_state);
>  }
>  
> +#endif /* !PREEMPT_RT && !DEBUG_BIT_SPINLOCKS */
> +
>  #define J_ASSERT(assert)	BUG_ON(!(assert))
>  
>  #define J_ASSERT_BH(bh, expr)	J_ASSERT(expr)
> 
> 
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [patch 4/4] fs: jbd/jbd2: Substitute BH locks for RT and lock debugging
  2019-07-31 15:48   ` Jan Kara
@ 2019-07-31 19:40     ` Thomas Gleixner
  2019-08-01  8:44       ` Jan Kara
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2019-07-31 19:40 UTC (permalink / raw)
  To: Jan Kara
  Cc: LKML, Steven Rostedt, Peter Zijlstra, Matthew Wilcox,
	Ingo Molnar, Anna-Maria Gleixner, Sebastian Siewior,
	Theodore Tso, Julia Cartwright, Jan Kara, linux-ext4,
	linux-fsdevel, Alexander Viro

On Wed, 31 Jul 2019, Jan Kara wrote:
> On Tue 30-07-19 13:24:56, Thomas Gleixner wrote:
> > Bit spinlocks are problematic if PREEMPT_RT is enabled. They disable
> > preemption, which is undesired for latency reasons and breaks when regular
> > spinlocks are taken within the bit_spinlock locked region because regular
> > spinlocks are converted to 'sleeping spinlocks' on RT.
> > 
> > Substitute the BH_State and BH_JournalHead bit spinlocks with regular
> > spinlocks for PREEMPT_RT enabled kernels.
> 
> Is there a real need to substitute the BH_JournalHead bit spinlock?  The
> critical sections are pretty tiny, all located within fs/jbd2/journal.c.
> Maybe only the one around __journal_remove_journal_head() would need a bit
> of refactoring so that journal_free_journal_head() doesn't get called
> under the bit-spinlock.

Makes sense.

> BH_State lock is definitely worth it. In fact, if you placed the spinlock
> inside struct journal_head (which is the structure whose members are in
> fact protected by it), I'd even be fine with just using the spinlock always
> instead of the bit spinlock. journal_head is pretty big anyway (and there's
> even a 4-byte hole in it for 64-bit archs) and these structures are pretty
> rare (only for actively changed metadata buffers).

Just need to figure out what to do with the ASSERT_JH(state_is_locked) case for
UP. Perhaps just return true for UP && !DEBUG_SPINLOCK?
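
I.e. something like this untested sketch:

static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
{
#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
	return spin_is_locked(&bh->b_state_lock);
#else
	/* spin_is_locked() is always false on UP without spinlock debugging */
	return 1;
#endif
}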

Thanks,

	tglx


* Re: [patch 2/4] fs/buffer: Move BH_Uptodate_Lock locking into wrapper functions
  2019-07-31 14:46   ` Jan Kara
@ 2019-07-31 22:27     ` Thomas Gleixner
  0 siblings, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2019-07-31 22:27 UTC (permalink / raw)
  To: Jan Kara
  Cc: LKML, Steven Rostedt, Peter Zijlstra, Matthew Wilcox,
	Ingo Molnar, Anna-Maria Gleixner, Sebastian Siewior,
	Theodore Tso, Julia Cartwright, Jan Kara, linux-ext4,
	linux-fsdevel, Alexander Viro

On Wed, 31 Jul 2019, Jan Kara wrote:
> On Tue 30-07-19 13:24:54, Thomas Gleixner wrote:
> > Bit spinlocks are problematic if PREEMPT_RT is enabled, because they
> > disable preemption, which is undesired for latency reasons and breaks when
> > regular spinlocks are taken within the bit_spinlock locked region because
> > regular spinlocks are converted to 'sleeping spinlocks' on RT. So RT
> > replaces the bit spinlocks with regular spinlocks to avoid this problem.
> > 
> > To avoid ifdeffery at the source level, wrap all BH_Uptodate_Lock bitlock
> > operations with inline functions, so the spinlock substitution can be done
> > at one place.
> > 
> > Using regular spinlocks can also be enabled for lock debugging purposes so
> > the lock operations become visible to lockdep.
> > 
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > Cc: "Theodore Ts'o" <tytso@mit.edu>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> > Cc: linux-fsdevel@vger.kernel.org
> 
> Looks good to me. You can add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> 
> BTW, it should be possible to get rid of BH_Uptodate_Lock altogether by using
> bio chaining (which was non-existent when this bh code was written) to make
> sure the IO completion function gets called only once all bios used to fill in
> / write out the page are done. It would also be more efficient. But I guess
> that's an interesting cleanup project for some other time...

While 'possible cleanup' is something which triggers a certain nerve, that
particular project certainly goes beyond my basic understanding of that
whole fs/block conglomerate. I'd rather leave that to people who actually
have a clue. :)

Thanks,

	tglx




* Re: [patch 4/4] fs: jbd/jbd2: Substitute BH locks for RT and lock debugging
  2019-07-31 19:40     ` Thomas Gleixner
@ 2019-08-01  8:44       ` Jan Kara
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Kara @ 2019-08-01  8:44 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jan Kara, LKML, Steven Rostedt, Peter Zijlstra, Matthew Wilcox,
	Ingo Molnar, Anna-Maria Gleixner, Sebastian Siewior,
	Theodore Tso, Julia Cartwright, Jan Kara, linux-ext4,
	linux-fsdevel, Alexander Viro

On Wed 31-07-19 21:40:42, Thomas Gleixner wrote:
> On Wed, 31 Jul 2019, Jan Kara wrote:
> > BH_State lock is definitely worth it. In fact, if you placed the spinlock
> > inside struct journal_head (which is the structure whose members are in
> > fact protected by it), I'd even be fine with just using the spinlock always
> > instead of the bit spinlock. journal_head is pretty big anyway (and there's
> > even a 4-byte hole in it for 64-bit archs) and these structures are pretty
> > rare (only for actively changed metadata buffers).
> 
> Just need to figure out what to do with the ASSERT_JH(state_is_locked) case for
> UP. Perhaps just return true for UP && !DEBUG_SPINLOCK?

Yes, that makes sense.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


Thread overview: 11+ messages
2019-07-30 11:24 [patch 0/4] fs: Substitute bit-spinlocks for PREEMPT_RT and debugging Thomas Gleixner
2019-07-30 11:24 ` [patch 1/4] locking/lockdep: Add Kconfig option for bit spinlocks Thomas Gleixner
2019-07-30 11:24 ` [patch 2/4] fs/buffer: Move BH_Uptodate_Lock locking into wrapper functions Thomas Gleixner
2019-07-31 14:46   ` Jan Kara
2019-07-31 22:27     ` Thomas Gleixner
2019-07-30 11:24 ` [patch 3/4] fs/buffer: Substitute BH_Uptodate_Lock for RT and bit spinlock debugging Thomas Gleixner
2019-07-31 14:47   ` Jan Kara
2019-07-30 11:24 ` [patch 4/4] fs: jbd/jbd2: Substitute BH locks for RT and lock debugging Thomas Gleixner
2019-07-31 15:48   ` Jan Kara
2019-07-31 19:40     ` Thomas Gleixner
2019-08-01  8:44       ` Jan Kara
