All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/1] squashfs: Disable "percpu multiple decompressor" on RT
@ 2018-05-02 13:12 Alexander Stein
  2018-05-02 14:37 ` Julia Cartwright
  0 siblings, 1 reply; 7+ messages in thread
From: Alexander Stein @ 2018-05-02 13:12 UTC (permalink / raw)
  To: linux-rt-users; +Cc: Alexander Stein

This decompressor can introduce a huge latency when a to be executed
process have to read and decompress directly from mass storage.
Using a QSPI flash and squashfs, starting htop causes a latency of ~8000µs
to a running cyclictest. The "multiple decompressor" is fine though.
The cause is that squashfs_decompress() calls get_cpu_ptr(). If this is
done on all CPUs no task will be executed until the decompression has
finished.

Signed-off-by: Alexander Stein <alexander.stein@systec-electronic.com>
---
 fs/squashfs/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/squashfs/Kconfig b/fs/squashfs/Kconfig
index 1adb3346b9d6..246c5966173d 100644
--- a/fs/squashfs/Kconfig
+++ b/fs/squashfs/Kconfig
@@ -86,6 +86,7 @@ config SQUASHFS_DECOMP_MULTI
 
 config SQUASHFS_DECOMP_MULTI_PERCPU
 	bool "Use percpu multiple decompressors for parallel I/O"
+	depends on !PREEMPT_RT_BASE
 	help
 	  By default Squashfs uses a single decompressor but it gives
 	  poor performance on parallel I/O workloads when using multiple CPU
-- 
2.16.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] squashfs: Disable "percpu multiple decompressor" on RT
  2018-05-02 13:12 [PATCH 1/1] squashfs: Disable "percpu multiple decompressor" on RT Alexander Stein
@ 2018-05-02 14:37 ` Julia Cartwright
  2018-05-03  6:36   ` Alexander Stein
  0 siblings, 1 reply; 7+ messages in thread
From: Julia Cartwright @ 2018-05-02 14:37 UTC (permalink / raw)
  To: Alexander Stein; +Cc: linux-rt-users, bigeasy, tglx

On Wed, May 02, 2018 at 03:12:33PM +0200, Alexander Stein wrote:
> This decompressor can introduce a huge latency when a to be executed
> process have to read and decompress directly from mass storage.
> Using a QSPI flash and squashfs, starting htop causes a latency of ~8000µs
> to a running cyclictest. The "multiple decompressor" is fine though.
> The cause is that squashfs_decompress() calls get_cpu_ptr(). If this is
> done on all CPUs no task will be executed until the decompression has
> finished.

Hello Alexander-

Thanks for the patch.

[..]
> +++ b/fs/squashfs/Kconfig
> @@ -86,6 +86,7 @@ config SQUASHFS_DECOMP_MULTI
>  
>  config SQUASHFS_DECOMP_MULTI_PERCPU
>  	bool "Use percpu multiple decompressors for parallel I/O"
> +	depends on !PREEMPT_RT_BASE

Hmm, I think we'd like to get out of the business of disabling Kconfig
options unless we are absolutely not given any other choice.

Looking at the codepaths involved in this squashfs decompressor, it
seems like this is a perfect candidate for the usage of local locks.
Can you give the following patch a try instead?

Thanks!
   Julia

-- 8< --
Subject: [PATCH] squashfs: make use of local lock in multi_cpu decompressor

Currently, the squashfs multi_cpu decompressor makes use of
get_cpu_ptr()/put_cpu_ptr(), which unconditionally disable preemption
during decompression.

Because the workload is distributed across CPUs, all CPUs can observe a
very high wakeup latency, which has been seen to be as much as 8000us.

Convert this decompressor to make use of a local lock, which will allow
execution of the decompressor with preemption-enabled, but also ensure
concurrent accesses to the percpu compressor data on the local CPU will
be serialized.

Reported-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Julia Cartwright <julia@ni.com>
---
 fs/squashfs/decompressor_multi_percpu.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/squashfs/decompressor_multi_percpu.c b/fs/squashfs/decompressor_multi_percpu.c
index 23a9c28ad8ea..661ced620fd1 100644
--- a/fs/squashfs/decompressor_multi_percpu.c
+++ b/fs/squashfs/decompressor_multi_percpu.c
@@ -6,6 +6,7 @@
  * the COPYING file in the top-level directory.
  */
 
+#include <linux/locallock.h>
 #include <linux/types.h>
 #include <linux/slab.h>
 #include <linux/percpu.h>
@@ -25,6 +26,8 @@ struct squashfs_stream {
 	void		*stream;
 };
 
+static DEFINE_LOCAL_IRQ_LOCK(stream_lock);
+
 void *squashfs_decompressor_create(struct squashfs_sb_info *msblk,
 						void *comp_opts)
 {
@@ -79,10 +82,15 @@ int squashfs_decompress(struct squashfs_sb_info *msblk, struct buffer_head **bh,
 {
 	struct squashfs_stream __percpu *percpu =
 			(struct squashfs_stream __percpu *) msblk->stream;
-	struct squashfs_stream *stream = get_cpu_ptr(percpu);
-	int res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
-		offset, length, output);
-	put_cpu_ptr(stream);
+	struct squashfs_stream *stream;
+	int res;
+
+	stream = get_locked_var(stream_lock, percpu);
+
+	res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
+			offset, length, output);
+
+	put_locked_var(stream_lock, stream);
 
 	if (res < 0)
 		ERROR("%s decompression failed, data probably corrupt\n",
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] squashfs: Disable "percpu multiple decompressor" on RT
  2018-05-02 14:37 ` Julia Cartwright
@ 2018-05-03  6:36   ` Alexander Stein
  2018-05-03 15:48     ` Julia Cartwright
  0 siblings, 1 reply; 7+ messages in thread
From: Alexander Stein @ 2018-05-03  6:36 UTC (permalink / raw)
  To: Julia Cartwright; +Cc: linux-rt-users, bigeasy, tglx

Hello Julia,

On Wednesday, May 2, 2018, 4:37:29 PM CEST Julia Cartwright wrote:
> [..]
> > +++ b/fs/squashfs/Kconfig
> > @@ -86,6 +86,7 @@ config SQUASHFS_DECOMP_MULTI
> >  
> >  config SQUASHFS_DECOMP_MULTI_PERCPU
> >  	bool "Use percpu multiple decompressors for parallel I/O"
> > +	depends on !PREEMPT_RT_BASE
> 
> Hmm, I think we'd like to get out of the business of disabling Kconfig
> options unless we are absolutely not given any other choice.
> 
> Looking at the codepaths involved in this squashfs decompressor, it
> seems like this is a perfect candidate for the usage of local locks.
> Can you give the following patch a try instead?

Sure. See below!

> --- a/fs/squashfs/decompressor_multi_percpu.c
> +++ b/fs/squashfs/decompressor_multi_percpu.c
> @@ -6,6 +6,7 @@
>   * the COPYING file in the top-level directory.
>   */
>  
> +#include <linux/locallock.h>
>  #include <linux/types.h>
>  #include <linux/slab.h>
>  #include <linux/percpu.h>

<linux/locallock.h> needs to be added after <linux/slab.h>.

> [...]
> @@ -79,10 +82,15 @@ int squashfs_decompress(struct squashfs_sb_info *msblk, struct buffer_head **bh,
>  {
>  	struct squashfs_stream __percpu *percpu =
>  			(struct squashfs_stream __percpu *) msblk->stream;
> -	struct squashfs_stream *stream = get_cpu_ptr(percpu);
> -	int res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
> -		offset, length, output);
> -	put_cpu_ptr(stream);
> +	struct squashfs_stream *stream;
> +	int res;
> +
> +	stream = get_locked_var(stream_lock, percpu);
> +
> +	res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
> +			offset, length, output);
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This doesn't work. I get a "Unable to handle kernel paging request at virtual address bcf67c1c". addr2line says its this line.

Best regards,
Alexander




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] squashfs: Disable "percpu multiple decompressor" on RT
  2018-05-03  6:36   ` Alexander Stein
@ 2018-05-03 15:48     ` Julia Cartwright
  2018-05-07  6:09       ` Alexander Stein
  0 siblings, 1 reply; 7+ messages in thread
From: Julia Cartwright @ 2018-05-03 15:48 UTC (permalink / raw)
  To: Alexander Stein; +Cc: linux-rt-users, bigeasy, tglx

Hey Alexander-

On Thu, May 03, 2018 at 08:36:03AM +0200, Alexander Stein wrote:
> On Wednesday, May 2, 2018, 4:37:29 PM CEST Julia Cartwright wrote:
> > [..]
> > > +++ b/fs/squashfs/Kconfig
> > > @@ -86,6 +86,7 @@ config SQUASHFS_DECOMP_MULTI
> > >  
> > >  config SQUASHFS_DECOMP_MULTI_PERCPU
> > >  	bool "Use percpu multiple decompressors for parallel I/O"
> > > +	depends on !PREEMPT_RT_BASE
> > 
> > Hmm, I think we'd like to get out of the business of disabling Kconfig
> > options unless we are absolutely not given any other choice.
> > 
> > Looking at the codepaths involved in this squashfs decompressor, it
> > seems like this is a perfect candidate for the usage of local locks.
> > Can you give the following patch a try instead?
> 
> Sure. See below!

Thanks for giving it a test.

> > --- a/fs/squashfs/decompressor_multi_percpu.c
> > +++ b/fs/squashfs/decompressor_multi_percpu.c
> > @@ -6,6 +6,7 @@
> >   * the COPYING file in the top-level directory.
> >   */
> >  
> > +#include <linux/locallock.h>
> >  #include <linux/types.h>
> >  #include <linux/slab.h>
> >  #include <linux/percpu.h>
>
> <linux/locallock.h> needs to be added after <linux/slab.h>.

That seems broken.

> > [...]
> > @@ -79,10 +82,15 @@ int squashfs_decompress(struct squashfs_sb_info *msblk, struct buffer_head **bh,
> >  {
> >  	struct squashfs_stream __percpu *percpu =
> >  			(struct squashfs_stream __percpu *) msblk->stream;
> > -	struct squashfs_stream *stream = get_cpu_ptr(percpu);
> > -	int res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
> > -		offset, length, output);
> > -	put_cpu_ptr(stream);
> > +	struct squashfs_stream *stream;
> > +	int res;
> > +
> > +	stream = get_locked_var(stream_lock, percpu);
> > +
> > +	res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
> > +			offset, length, output);
>           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> This doesn't work. I get a "Unable to handle kernel paging request at virtual address bcf67c1c". addr2line says its this line.

Hrmph.  I was lost in the percpu arithmetic.  Should have built with
sparse, it clearly tells me I screwed up.  Are you able to give this a
try?  It's sparse-clean now :)

   Julia

diff --git a/fs/squashfs/decompressor_multi_percpu.c b/fs/squashfs/decompressor_multi_percpu.c
index 23a9c28ad8ea..6a73c4fa88e7 100644
--- a/fs/squashfs/decompressor_multi_percpu.c
+++ b/fs/squashfs/decompressor_multi_percpu.c
@@ -10,6 +10,7 @@
 #include <linux/slab.h>
 #include <linux/percpu.h>
 #include <linux/buffer_head.h>
+#include <linux/locallock.h>
 
 #include "squashfs_fs.h"
 #include "squashfs_fs_sb.h"
@@ -25,6 +26,8 @@ struct squashfs_stream {
 	void		*stream;
 };
 
+static DEFINE_LOCAL_IRQ_LOCK(stream_lock);
+
 void *squashfs_decompressor_create(struct squashfs_sb_info *msblk,
 						void *comp_opts)
 {
@@ -79,10 +82,15 @@ int squashfs_decompress(struct squashfs_sb_info *msblk, struct buffer_head **bh,
 {
 	struct squashfs_stream __percpu *percpu =
 			(struct squashfs_stream __percpu *) msblk->stream;
-	struct squashfs_stream *stream = get_cpu_ptr(percpu);
-	int res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
-		offset, length, output);
-	put_cpu_ptr(stream);
+	struct squashfs_stream *stream;
+	int res;
+
+	stream = get_locked_ptr(stream_lock, percpu);
+
+	res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
+			offset, length, output);
+
+	put_locked_ptr(stream_lock, stream);
 
 	if (res < 0)
 		ERROR("%s decompression failed, data probably corrupt\n",
diff --git a/include/linux/locallock.h b/include/linux/locallock.h
index d658c2552601..c3ab5183a6a1 100644
--- a/include/linux/locallock.h
+++ b/include/linux/locallock.h
@@ -222,6 +222,14 @@ static inline int __local_unlock_irqrestore(struct local_irq_lock *lv,
 
 #define put_locked_var(lvar, var)	local_unlock(lvar);
 
+#define get_locked_ptr(lvar, var)					\
+	({								\
+		local_lock(lvar);					\
+		this_cpu_ptr(var);					\
+	})
+
+#define put_locked_ptr(lvar, var)	local_unlock(lvar);
+
 #define local_lock_cpu(lvar)						\
 	({								\
 		local_lock(lvar);					\

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] squashfs: Disable "percpu multiple decompressor" on RT
  2018-05-03 15:48     ` Julia Cartwright
@ 2018-05-07  6:09       ` Alexander Stein
  2018-05-07 13:58         ` [PATCH RT 1/2] locallock: provide {get,put}_locked_ptr() variants Julia Cartwright
  0 siblings, 1 reply; 7+ messages in thread
From: Alexander Stein @ 2018-05-07  6:09 UTC (permalink / raw)
  To: Julia Cartwright; +Cc: linux-rt-users, bigeasy, tglx

Hello Julia,

On Thursday, May 3, 2018, 5:48:34 PM CEST Julia Cartwright wrote:
> [...]
> Hrmph.  I was lost in the percpu arithmetic.  Should have built with
> sparse, it clearly tells me I screwed up.  Are you able to give this a
> try?  It's sparse-clean now :)

That does the trick. The max latency stays at 80µs in my test case. Thanks.
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>

Best regards,
Alexander

> diff --git a/fs/squashfs/decompressor_multi_percpu.c b/fs/squashfs/decompressor_multi_percpu.c
> index 23a9c28ad8ea..6a73c4fa88e7 100644
> --- a/fs/squashfs/decompressor_multi_percpu.c
> +++ b/fs/squashfs/decompressor_multi_percpu.c
> @@ -10,6 +10,7 @@
>  #include <linux/slab.h>
>  #include <linux/percpu.h>
>  #include <linux/buffer_head.h>
> +#include <linux/locallock.h>
>  
>  #include "squashfs_fs.h"
>  #include "squashfs_fs_sb.h"
> @@ -25,6 +26,8 @@ struct squashfs_stream {
>  	void		*stream;
>  };
>  
> +static DEFINE_LOCAL_IRQ_LOCK(stream_lock);
> +
>  void *squashfs_decompressor_create(struct squashfs_sb_info *msblk,
>  						void *comp_opts)
>  {
> @@ -79,10 +82,15 @@ int squashfs_decompress(struct squashfs_sb_info *msblk, struct buffer_head **bh,
>  {
>  	struct squashfs_stream __percpu *percpu =
>  			(struct squashfs_stream __percpu *) msblk->stream;
> -	struct squashfs_stream *stream = get_cpu_ptr(percpu);
> -	int res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
> -		offset, length, output);
> -	put_cpu_ptr(stream);
> +	struct squashfs_stream *stream;
> +	int res;
> +
> +	stream = get_locked_ptr(stream_lock, percpu);
> +
> +	res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
> +			offset, length, output);
> +
> +	put_locked_ptr(stream_lock, stream);
>  
>  	if (res < 0)
>  		ERROR("%s decompression failed, data probably corrupt\n",
> diff --git a/include/linux/locallock.h b/include/linux/locallock.h
> index d658c2552601..c3ab5183a6a1 100644
> --- a/include/linux/locallock.h
> +++ b/include/linux/locallock.h
> @@ -222,6 +222,14 @@ static inline int __local_unlock_irqrestore(struct local_irq_lock *lv,
>  
>  #define put_locked_var(lvar, var)	local_unlock(lvar);
>  
> +#define get_locked_ptr(lvar, var)					\
> +	({								\
> +		local_lock(lvar);					\
> +		this_cpu_ptr(var);					\
> +	})
> +
> +#define put_locked_ptr(lvar, var)	local_unlock(lvar);
> +
>  #define local_lock_cpu(lvar)						\
>  	({								\
>  		local_lock(lvar);					\
> 
> 




^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH RT 1/2] locallock: provide {get,put}_locked_ptr() variants
  2018-05-07  6:09       ` Alexander Stein
@ 2018-05-07 13:58         ` Julia Cartwright
  2018-05-07 13:58           ` [PATCH RT 2/2] squashfs: make use of local lock in multi_cpu decompressor Julia Cartwright
  0 siblings, 1 reply; 7+ messages in thread
From: Julia Cartwright @ 2018-05-07 13:58 UTC (permalink / raw)
  To: bigeasy, tglx
  Cc: linux-rt-users, linux-kernel, Phillip Lougher, Alexander Stein

Provide a set of locallocked accessors for pointers to per-CPU data;
this is useful for dynamically-allocated per-CPU regions, for example.

These are symmetric with the {get,put}_cpu_ptr() per-CPU accessor
variants.

Signed-off-by: Julia Cartwright <julia@ni.com>
---
 include/linux/locallock.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/locallock.h b/include/linux/locallock.h
index d658c2552601..921eab83cd34 100644
--- a/include/linux/locallock.h
+++ b/include/linux/locallock.h
@@ -222,6 +222,14 @@ static inline int __local_unlock_irqrestore(struct local_irq_lock *lv,
 
 #define put_locked_var(lvar, var)	local_unlock(lvar);
 
+#define get_locked_ptr(lvar, var)					\
+	({								\
+		local_lock(lvar);					\
+		this_cpu_ptr(var);					\
+	})
+
+#define put_locked_ptr(lvar, var)	local_unlock(lvar);
+
 #define local_lock_cpu(lvar)						\
 	({								\
 		local_lock(lvar);					\
@@ -262,6 +270,8 @@ static inline void local_irq_lock_init(int lvar) { }
 
 #define get_locked_var(lvar, var)		get_cpu_var(var)
 #define put_locked_var(lvar, var)		put_cpu_var(var)
+#define get_locked_ptr(lvar, var)		get_cpu_ptr(var)
+#define put_locked_ptr(lvar, var)		put_cpu_ptr(var)
 
 #define local_lock_cpu(lvar)			get_cpu()
 #define local_unlock_cpu(lvar)			put_cpu()
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH RT 2/2] squashfs: make use of local lock in multi_cpu decompressor
  2018-05-07 13:58         ` [PATCH RT 1/2] locallock: provide {get,put}_locked_ptr() variants Julia Cartwright
@ 2018-05-07 13:58           ` Julia Cartwright
  0 siblings, 0 replies; 7+ messages in thread
From: Julia Cartwright @ 2018-05-07 13:58 UTC (permalink / raw)
  To: bigeasy, tglx
  Cc: linux-rt-users, linux-kernel, Phillip Lougher, Alexander Stein

Currently, the squashfs multi_cpu decompressor makes use of
get_cpu_ptr()/put_cpu_ptr(), which unconditionally disable preemption
during decompression.

Because the workload is distributed across CPUs, all CPUs can observe a
very high wakeup latency, which has been seen to be as much as 8000us.

Convert this decompressor to make use of a local lock, which will allow
execution of the decompressor with preemption-enabled, but also ensure
concurrent accesses to the percpu compressor data on the local CPU will
be serialized.

Reported-by: Alexander Stein <alexander.stein@systec-electronic.com>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Julia Cartwright <julia@ni.com>
---
 fs/squashfs/decompressor_multi_percpu.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/squashfs/decompressor_multi_percpu.c b/fs/squashfs/decompressor_multi_percpu.c
index 23a9c28ad8ea..6a73c4fa88e7 100644
--- a/fs/squashfs/decompressor_multi_percpu.c
+++ b/fs/squashfs/decompressor_multi_percpu.c
@@ -10,6 +10,7 @@
 #include <linux/slab.h>
 #include <linux/percpu.h>
 #include <linux/buffer_head.h>
+#include <linux/locallock.h>
 
 #include "squashfs_fs.h"
 #include "squashfs_fs_sb.h"
@@ -25,6 +26,8 @@ struct squashfs_stream {
 	void		*stream;
 };
 
+static DEFINE_LOCAL_IRQ_LOCK(stream_lock);
+
 void *squashfs_decompressor_create(struct squashfs_sb_info *msblk,
 						void *comp_opts)
 {
@@ -79,10 +82,15 @@ int squashfs_decompress(struct squashfs_sb_info *msblk, struct buffer_head **bh,
 {
 	struct squashfs_stream __percpu *percpu =
 			(struct squashfs_stream __percpu *) msblk->stream;
-	struct squashfs_stream *stream = get_cpu_ptr(percpu);
-	int res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
-		offset, length, output);
-	put_cpu_ptr(stream);
+	struct squashfs_stream *stream;
+	int res;
+
+	stream = get_locked_ptr(stream_lock, percpu);
+
+	res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
+			offset, length, output);
+
+	put_locked_ptr(stream_lock, stream);
 
 	if (res < 0)
 		ERROR("%s decompression failed, data probably corrupt\n",
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-05-07 13:57 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-02 13:12 [PATCH 1/1] squashfs: Disable "percpu multiple decompressor" on RT Alexander Stein
2018-05-02 14:37 ` Julia Cartwright
2018-05-03  6:36   ` Alexander Stein
2018-05-03 15:48     ` Julia Cartwright
2018-05-07  6:09       ` Alexander Stein
2018-05-07 13:58         ` [PATCH RT 1/2] locallock: provide {get,put}_locked_ptr() variants Julia Cartwright
2018-05-07 13:58           ` [PATCH RT 2/2] squashfs: make use of local lock in multi_cpu decompressor Julia Cartwright

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.