* [PATCH] zsmalloc: use workqueue to destroy pool in zpool callback
@ 2016-03-29 22:02 Yu Zhao
  2016-03-30  0:54 ` Sergey Senozhatsky
       [not found] ` <20160329235950.GA19927@bbox>
  0 siblings, 2 replies; 38+ messages in thread
From: Yu Zhao @ 2016-03-29 22:02 UTC (permalink / raw)
  To: Minchan Kim, Nitin Gupta; +Cc: Sergey Senozhatsky, linux-mm, Yu Zhao

zs_destroy_pool() might sleep so it shouldn't be used in zpool
destroy callback which can be invoked in softirq context when
zsmalloc is configured to work with zswap.

  BUG: scheduling while atomic: swapper/6/0/0x00000100
  ...
  Call Trace:
   <IRQ>  [<ffffffffaf09e31e>] dump_stack+0x4d/0x63
   [<ffffffffaf09aae2>] __schedule_bug+0x46/0x54
   [<ffffffffaea00704>] __schedule+0x334/0x3d3
   [<ffffffffaea00897>] schedule+0x37/0x80
   [<ffffffffaea00a5e>] schedule_preempt_disabled+0xe/0x10
   [<ffffffffaeafced5>] mutex_optimistic_spin+0x185/0x1c0
   [<ffffffffaea0215b>] __mutex_lock_slowpath+0x2b/0x100
   [<ffffffffaebf90ce>] ? __drain_alien_cache+0x9e/0xf0
   [<ffffffffaea0224b>] mutex_lock+0x1b/0x2f
   [<ffffffffaebca4f0>] kmem_cache_destroy+0x50/0x130
   [<ffffffffaec10405>] zs_destroy_pool+0x85/0xe0
   [<ffffffffaec1046e>] zs_zpool_destroy+0xe/0x10
   [<ffffffffaec101a4>] zpool_destroy_pool+0x54/0x70
   [<ffffffffaebedac2>] __zswap_pool_release+0x62/0x90
   [<ffffffffaeb1037e>] rcu_process_callbacks+0x22e/0x640
   [<ffffffffaeb15a3e>] ? run_timer_softirq+0x3e/0x280
   [<ffffffffaeabe13b>] __do_softirq+0xcb/0x250
   [<ffffffffaeabe4dc>] irq_exit+0x9c/0xb0
   [<ffffffffaea03e7a>] smp_apic_timer_interrupt+0x6a/0x80
   [<ffffffffaf0a394f>] apic_timer_interrupt+0x7f/0x90
   <EOI>  [<ffffffffaef609a2>] ? cpuidle_enter_state+0xb2/0x200
   [<ffffffffaef60985>] ? cpuidle_enter_state+0x95/0x200
   [<ffffffffaef60b27>] cpuidle_enter+0x17/0x20
   [<ffffffffaeaf79c5>] cpu_startup_entry+0x235/0x320
   [<ffffffffaea9a9db>] start_secondary+0xeb/0x100

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/zsmalloc.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index e72efb1..fca5366 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -262,6 +262,9 @@ struct zs_pool {
 #ifdef CONFIG_ZSMALLOC_STAT
 	struct dentry *stat_dentry;
 #endif
+#ifdef CONFIG_ZPOOL
+	struct work_struct zpool_destroy_work;
+#endif
 };
 
 /*
@@ -327,9 +330,17 @@ static void *zs_zpool_create(const char *name, gfp_t gfp,
 	return zs_create_pool(name, gfp);
 }
 
+static void zs_zpool_destroy_work(struct work_struct *work)
+{
+	zs_destroy_pool(container_of(work, struct zs_pool, zpool_destroy_work));
+}
+
 static void zs_zpool_destroy(void *pool)
 {
-	zs_destroy_pool(pool);
+	struct zs_pool *zs_pool = pool;
+
+	INIT_WORK(&zs_pool->zpool_destroy_work, zs_zpool_destroy_work);
+	schedule_work(&zs_pool->zpool_destroy_work);
 }
 
 static int zs_zpool_malloc(void *pool, size_t size, gfp_t gfp,
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH] zsmalloc: use workqueue to destroy pool in zpool callback
  2016-03-29 22:02 [PATCH] zsmalloc: use workqueue to destroy pool in zpool callback Yu Zhao
@ 2016-03-30  0:54 ` Sergey Senozhatsky
       [not found] ` <20160329235950.GA19927@bbox>
  1 sibling, 0 replies; 38+ messages in thread
From: Sergey Senozhatsky @ 2016-03-30  0:54 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Minchan Kim, Nitin Gupta, Sergey Senozhatsky, Sergey Senozhatsky,
	Andrew Morton, linux-mm

Hello,

On (03/29/16 15:02), Yu Zhao wrote:
> zs_destroy_pool() might sleep so it shouldn't be used in zpool
> destroy callback which can be invoked in softirq context when
> zsmalloc is configured to work with zswap.

hm, interesting...

if we use zsmalloc with pool_stat enabled (pool-specific nodes in
debugfs), then doing a quick hot_remove->hot_add of the zram1 device

	remove zram1  (destroy pool -> schedule_work)
	add zram1 (zs_pool_stat_create pool1)

can result in an error on the last step, right? because we have
a race window between work->zs_pool_stat_destroy() and zs_pool_stat_create().
correct? should we move zs_destroy_pool() out of the work callback function
and do it in zs_zpool_destroy()?

	-ss

>   BUG: scheduling while atomic: swapper/6/0/0x00000100
>   ...
>   Call Trace:
>    <IRQ>  [<ffffffffaf09e31e>] dump_stack+0x4d/0x63
>    [<ffffffffaf09aae2>] __schedule_bug+0x46/0x54
>    [<ffffffffaea00704>] __schedule+0x334/0x3d3
>    [<ffffffffaea00897>] schedule+0x37/0x80
>    [<ffffffffaea00a5e>] schedule_preempt_disabled+0xe/0x10
>    [<ffffffffaeafced5>] mutex_optimistic_spin+0x185/0x1c0
>    [<ffffffffaea0215b>] __mutex_lock_slowpath+0x2b/0x100
>    [<ffffffffaebf90ce>] ? __drain_alien_cache+0x9e/0xf0
>    [<ffffffffaea0224b>] mutex_lock+0x1b/0x2f
>    [<ffffffffaebca4f0>] kmem_cache_destroy+0x50/0x130
>    [<ffffffffaec10405>] zs_destroy_pool+0x85/0xe0
>    [<ffffffffaec1046e>] zs_zpool_destroy+0xe/0x10
>    [<ffffffffaec101a4>] zpool_destroy_pool+0x54/0x70
>    [<ffffffffaebedac2>] __zswap_pool_release+0x62/0x90
>    [<ffffffffaeb1037e>] rcu_process_callbacks+0x22e/0x640
>    [<ffffffffaeb15a3e>] ? run_timer_softirq+0x3e/0x280
>    [<ffffffffaeabe13b>] __do_softirq+0xcb/0x250
>    [<ffffffffaeabe4dc>] irq_exit+0x9c/0xb0
>    [<ffffffffaea03e7a>] smp_apic_timer_interrupt+0x6a/0x80
>    [<ffffffffaf0a394f>] apic_timer_interrupt+0x7f/0x90
>    <EOI>  [<ffffffffaef609a2>] ? cpuidle_enter_state+0xb2/0x200
>    [<ffffffffaef60985>] ? cpuidle_enter_state+0x95/0x200
>    [<ffffffffaef60b27>] cpuidle_enter+0x17/0x20
>    [<ffffffffaeaf79c5>] cpu_startup_entry+0x235/0x320
>    [<ffffffffaea9a9db>] start_secondary+0xeb/0x100
> 
> Signed-off-by: Yu Zhao <yuzhao@google.com>
> ---
>  mm/zsmalloc.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index e72efb1..fca5366 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -262,6 +262,9 @@ struct zs_pool {
>  #ifdef CONFIG_ZSMALLOC_STAT
>  	struct dentry *stat_dentry;
>  #endif
> +#ifdef CONFIG_ZPOOL
> +	struct work_struct zpool_destroy_work;
> +#endif
>  };
>  
>  /*
> @@ -327,9 +330,17 @@ static void *zs_zpool_create(const char *name, gfp_t gfp,
>  	return zs_create_pool(name, gfp);
>  }
>  
> +static void zs_zpool_destroy_work(struct work_struct *work)
> +{
> +	zs_destroy_pool(container_of(work, struct zs_pool, zpool_destroy_work));
> +}
> +
>  static void zs_zpool_destroy(void *pool)
>  {
> -	zs_destroy_pool(pool);
> +	struct zs_pool *zs_pool = pool;
> +
> +	INIT_WORK(&zs_pool->zpool_destroy_work, zs_zpool_destroy_work);
> +	schedule_work(&zs_pool->zpool_destroy_work);
>  }
>  
>  static int zs_zpool_malloc(void *pool, size_t size, gfp_t gfp,
> -- 
> 2.8.0.rc3.226.g39d4020
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread
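
The race Sergey asks about above boils down to a debugfs directory name
collision: zsmalloc creates a per-pool stats directory named after the pool,
and with the destroy deferred to a workqueue the old directory can still be
around when a new pool with the same name is created.  The sketch below shows
the assumed shape of that path; it is an illustration only, not the actual
mm/zsmalloc.c code (zs_stat_root, the helper name and the error handling are
assumptions), and it only matters when CONFIG_ZSMALLOC_STAT is enabled.

	#include <linux/debugfs.h>
	#include <linux/errno.h>

	/* parent directory, assumed to be created once at module init */
	static struct dentry *zs_stat_root;

	/* illustrative helper: create the per-pool stats directory */
	static int stat_dir_create_sketch(struct dentry **stat_dentry, const char *name)
	{
		struct dentry *entry;

		/*
		 * If the deferred destroy of an earlier pool with the same
		 * name has not run yet, that directory still exists, this
		 * call fails, and pool creation fails with it.
		 */
		entry = debugfs_create_dir(name, zs_stat_root);
		if (!entry)
			return -ENOMEM;

		*stat_dentry = entry;
		return 0;
	}

The window is exactly the time between schedule_work() in the destroy path
and the moment the work item actually runs.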

* Re: [PATCH] zsmalloc: use workqueue to destroy pool in zpool callback
       [not found] ` <20160329235950.GA19927@bbox>
@ 2016-03-31  8:46     ` Sergey Senozhatsky
  0 siblings, 0 replies; 38+ messages in thread
From: Sergey Senozhatsky @ 2016-03-31  8:46 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Minchan Kim, Nitin Gupta, Sergey Senozhatsky, linux-mm,
	Dan Streetman, Seth Jennings, Sergey Senozhatsky, Andrew Morton,
	linux-kernel

On (03/30/16 08:59), Minchan Kim wrote:
> On Tue, Mar 29, 2016 at 03:02:57PM -0700, Yu Zhao wrote:
> > zs_destroy_pool() might sleep so it shouldn't be used in zpool
> > destroy callback which can be invoked in softirq context when
> > zsmalloc is configured to work with zswap.
> 
> I think it's a limitation of zswap design, not zsmalloc.
> Could you handle it in zswap?

agree. hm, looking at this backtrace

>   [<ffffffffaea0224b>] mutex_lock+0x1b/0x2f
>   [<ffffffffaebca4f0>] kmem_cache_destroy+0x50/0x130
>   [<ffffffffaec10405>] zs_destroy_pool+0x85/0xe0
>   [<ffffffffaec1046e>] zs_zpool_destroy+0xe/0x10
>   [<ffffffffaec101a4>] zpool_destroy_pool+0x54/0x70
>   [<ffffffffaebedac2>] __zswap_pool_release+0x62/0x90
>   [<ffffffffaeb1037e>] rcu_process_callbacks+0x22e/0x640
>   [<ffffffffaeb15a3e>] ? run_timer_softirq+0x3e/0x280
>   [<ffffffffaeabe13b>] __do_softirq+0xcb/0x250
>   [<ffffffffaeabe4dc>] irq_exit+0x9c/0xb0
>   [<ffffffffaea03e7a>] smp_apic_timer_interrupt+0x6a/0x80
>   [<ffffffffaf0a394f>] apic_timer_interrupt+0x7f/0x90

it also can hit the following path

	rcu_process_callbacks()
		__zswap_pool_release()
			zswap_pool_destroy()
				zswap_cpu_comp_destroy()
					cpu_notifier_register_begin()
						mutex_lock(&cpu_add_remove_lock);  <<<

can't it?

	-ss

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] zsmalloc: use workqueue to destroy pool in zpool callback
  2016-03-31  8:46     ` Sergey Senozhatsky
@ 2016-03-31 21:46       ` Yu Zhao
  -1 siblings, 0 replies; 38+ messages in thread
From: Yu Zhao @ 2016-03-31 21:46 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Minchan Kim, Nitin Gupta, linux-mm, Dan Streetman, Seth Jennings,
	Sergey Senozhatsky, Andrew Morton, linux-kernel

On Thu, Mar 31, 2016 at 05:46:39PM +0900, Sergey Senozhatsky wrote:
> On (03/30/16 08:59), Minchan Kim wrote:
> > On Tue, Mar 29, 2016 at 03:02:57PM -0700, Yu Zhao wrote:
> > > zs_destroy_pool() might sleep so it shouldn't be used in zpool
> > > destroy callback which can be invoked in softirq context when
> > > zsmalloc is configured to work with zswap.
> > 
> > I think it's a limitation of zswap design, not zsmalloc.
> > Could you handle it in zswap?
> 
> agree. hm, looking at this backtrace
> 
> >   [<ffffffffaea0224b>] mutex_lock+0x1b/0x2f
> >   [<ffffffffaebca4f0>] kmem_cache_destroy+0x50/0x130
> >   [<ffffffffaec10405>] zs_destroy_pool+0x85/0xe0
> >   [<ffffffffaec1046e>] zs_zpool_destroy+0xe/0x10
> >   [<ffffffffaec101a4>] zpool_destroy_pool+0x54/0x70
> >   [<ffffffffaebedac2>] __zswap_pool_release+0x62/0x90
> >   [<ffffffffaeb1037e>] rcu_process_callbacks+0x22e/0x640
> >   [<ffffffffaeb15a3e>] ? run_timer_softirq+0x3e/0x280
> >   [<ffffffffaeabe13b>] __do_softirq+0xcb/0x250
> >   [<ffffffffaeabe4dc>] irq_exit+0x9c/0xb0
> >   [<ffffffffaea03e7a>] smp_apic_timer_interrupt+0x6a/0x80
> >   [<ffffffffaf0a394f>] apic_timer_interrupt+0x7f/0x90
> 
> it also can hit the following path
> 
> 	rcu_process_callbacks()
> 		__zswap_pool_release()
> 			zswap_pool_destroy()
> 				zswap_cpu_comp_destroy()
> 					cpu_notifier_register_begin()
> 						mutex_lock(&cpu_add_remove_lock);  <<<
> 
> can't it?
> 
> 	-ss

Thanks, Sergey. Now I'm convinced the problem should be fixed in
zswap. Since the rcu callback is already executed asynchronously,
using a workqueue to defer the callback further doesn't seem
to cause any additional race conditions, at least.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] zsmalloc: use workqueue to destroy pool in zpool callback
  2016-03-31 21:46       ` Yu Zhao
@ 2016-03-31 22:05         ` Dan Streetman
  -1 siblings, 0 replies; 38+ messages in thread
From: Dan Streetman @ 2016-03-31 22:05 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Sergey Senozhatsky, Minchan Kim, Nitin Gupta, Linux-MM,
	Seth Jennings, Sergey Senozhatsky, Andrew Morton, linux-kernel

On Thu, Mar 31, 2016 at 5:46 PM, Yu Zhao <yuzhao@google.com> wrote:
> On Thu, Mar 31, 2016 at 05:46:39PM +0900, Sergey Senozhatsky wrote:
>> On (03/30/16 08:59), Minchan Kim wrote:
>> > On Tue, Mar 29, 2016 at 03:02:57PM -0700, Yu Zhao wrote:
>> > > zs_destroy_pool() might sleep so it shouldn't be used in zpool
>> > > destroy callback which can be invoked in softirq context when
>> > > zsmalloc is configured to work with zswap.
>> >
>> > I think it's a limitation of zswap design, not zsmalloc.
>> > Could you handle it in zswap?
>>
>> agree. hm, looking at this backtrace
>>
>> >   [<ffffffffaea0224b>] mutex_lock+0x1b/0x2f
>> >   [<ffffffffaebca4f0>] kmem_cache_destroy+0x50/0x130
>> >   [<ffffffffaec10405>] zs_destroy_pool+0x85/0xe0
>> >   [<ffffffffaec1046e>] zs_zpool_destroy+0xe/0x10
>> >   [<ffffffffaec101a4>] zpool_destroy_pool+0x54/0x70
>> >   [<ffffffffaebedac2>] __zswap_pool_release+0x62/0x90
>> >   [<ffffffffaeb1037e>] rcu_process_callbacks+0x22e/0x640
>> >   [<ffffffffaeb15a3e>] ? run_timer_softirq+0x3e/0x280
>> >   [<ffffffffaeabe13b>] __do_softirq+0xcb/0x250
>> >   [<ffffffffaeabe4dc>] irq_exit+0x9c/0xb0
>> >   [<ffffffffaea03e7a>] smp_apic_timer_interrupt+0x6a/0x80
>> >   [<ffffffffaf0a394f>] apic_timer_interrupt+0x7f/0x90
>>
>> it also can hit the following path
>>
>>       rcu_process_callbacks()
>>               __zswap_pool_release()
>>                       zswap_pool_destroy()
>>                               zswap_cpu_comp_destroy()
>>                                       cpu_notifier_register_begin()
>>                                               mutex_lock(&cpu_add_remove_lock);  <<<
>>
>> can't it?
>>
>>       -ss
>
> Thanks, Sergey. Now I'm convinced the problem should be fixed in
> zswap. Since the rcu callback is already executed asynchronously,
> using a workqueue to defer the callback further doesn't seem
> to cause any additional race conditions, at least.

certainly seems appropriate to fix it in zswap; I'll work on a patch
unless Seth or anyone else is already working on it.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH] mm/zpool: use workqueue for zpool_destroy
  2016-03-31 22:05         ` Dan Streetman
@ 2016-04-25 21:20           ` Dan Streetman
  -1 siblings, 0 replies; 38+ messages in thread
From: Dan Streetman @ 2016-04-25 21:20 UTC (permalink / raw)
  To: Yu Zhao, Andrew Morton, Seth Jennings
  Cc: Sergey Senozhatsky, Minchan Kim, Nitin Gupta, Linux-MM,
	Sergey Senozhatsky, linux-kernel, Dan Streetman, Dan Streetman

Add a work_struct to struct zpool, and change zpool_destroy_pool to
defer calling the pool implementation destroy.

The zsmalloc pool destroy function, which is one of the zpool
implementations, may sleep during destruction of the pool.  However
zswap, which uses zpool, may call zpool_destroy_pool from atomic
context.  So we need to defer the call to the zpool implementation
to destroy the pool.

This is essentially the same as Yu Zhao's proposed patch to zsmalloc,
but moved to zpool.

Reported-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Dan Streetman <dan.streetman@canonical.com>
---
 mm/zpool.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/mm/zpool.c b/mm/zpool.c
index fd3ff71..ea12069 100644
--- a/mm/zpool.c
+++ b/mm/zpool.c
@@ -23,6 +23,7 @@ struct zpool {
 	const struct zpool_ops *ops;
 
 	struct list_head list;
+	struct work_struct work;
 };
 
 static LIST_HEAD(drivers_head);
@@ -197,6 +198,15 @@ struct zpool *zpool_create_pool(const char *type, const char *name, gfp_t gfp,
 	return zpool;
 }
 
+static void zpool_destroy_pool_work(struct work_struct *work)
+{
+	struct zpool *zpool = container_of(work, struct zpool, work);
+
+	zpool->driver->destroy(zpool->pool);
+	zpool_put_driver(zpool->driver);
+	kfree(zpool);
+}
+
 /**
  * zpool_destroy_pool() - Destroy a zpool
  * @pool	The zpool to destroy.
@@ -204,7 +214,8 @@ struct zpool *zpool_create_pool(const char *type, const char *name, gfp_t gfp,
  * Implementations must guarantee this to be thread-safe,
  * however only when destroying different pools.  The same
  * pool should only be destroyed once, and should not be used
- * after it is destroyed.
+ * after it is destroyed.  This defers calling the implementation
+ * to a workqueue, so the implementation may sleep.
  *
  * This destroys an existing zpool.  The zpool should not be in use.
  */
@@ -215,9 +226,8 @@ void zpool_destroy_pool(struct zpool *zpool)
 	spin_lock(&pools_lock);
 	list_del(&zpool->list);
 	spin_unlock(&pools_lock);
-	zpool->driver->destroy(zpool->pool);
-	zpool_put_driver(zpool->driver);
-	kfree(zpool);
+	INIT_WORK(&zpool->work, zpool_destroy_pool_work);
+	schedule_work(&zpool->work);
 }
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH] mm/zpool: use workqueue for zpool_destroy
  2016-04-25 21:20           ` Dan Streetman
@ 2016-04-25 21:46             ` Andrew Morton
  -1 siblings, 0 replies; 38+ messages in thread
From: Andrew Morton @ 2016-04-25 21:46 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Yu Zhao, Seth Jennings, Sergey Senozhatsky, Minchan Kim,
	Nitin Gupta, Linux-MM, Sergey Senozhatsky, linux-kernel,
	Dan Streetman

On Mon, 25 Apr 2016 17:20:10 -0400 Dan Streetman <ddstreet@ieee.org> wrote:

> Add a work_struct to struct zpool, and change zpool_destroy_pool to
> defer calling the pool implementation destroy.
> 
> The zsmalloc pool destroy function, which is one of the zpool
> implementations, may sleep during destruction of the pool.  However
> zswap, which uses zpool, may call zpool_destroy_pool from atomic
> context.  So we need to defer the call to the zpool implementation
> to destroy the pool.
> 
> This is essentially the same as Yu Zhao's proposed patch to zsmalloc,
> but moved to zpool.

OK, but the refrain remains the same: what are the runtime effects of
the change?  Are real people in real worlds seeing scary kernel
warnings?  Deadlocks?

This info is needed so that I and others can decide which kernel
version(s) should be patched.

Thanks.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] mm/zpool: use workqueue for zpool_destroy
  2016-04-25 21:20           ` Dan Streetman
@ 2016-04-25 22:18             ` Yu Zhao
  -1 siblings, 0 replies; 38+ messages in thread
From: Yu Zhao @ 2016-04-25 22:18 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Andrew Morton, Seth Jennings, Sergey Senozhatsky, Minchan Kim,
	Nitin Gupta, Linux-MM, Sergey Senozhatsky, linux-kernel,
	Dan Streetman

On Mon, Apr 25, 2016 at 05:20:10PM -0400, Dan Streetman wrote:
> Add a work_struct to struct zpool, and change zpool_destroy_pool to
> defer calling the pool implementation destroy.
> 
> The zsmalloc pool destroy function, which is one of the zpool
> implementations, may sleep during destruction of the pool.  However
> zswap, which uses zpool, may call zpool_destroy_pool from atomic
> context.  So we need to defer the call to the zpool implementation
> to destroy the pool.
> 
> This is essentially the same as Yu Zhao's proposed patch to zsmalloc,
> but moved to zpool.

Thanks, Dan. Sergey also mentioned another call path that triggers the
same problem (BUG: scheduling while atomic):
  rcu_process_callbacks()
          __zswap_pool_release()
                  zswap_pool_destroy()
                          zswap_cpu_comp_destroy()
                                  cpu_notifier_register_begin()
                                          mutex_lock(&cpu_add_remove_lock);
So I was thinking zswap_pool_destroy() might be done in workqueue in zswap.c.
This way we fix both call paths.

Or you have another patch to fix the second call path?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] mm/zpool: use workqueue for zpool_destroy
  2016-04-25 22:18             ` Yu Zhao
@ 2016-04-26  0:59               ` Sergey Senozhatsky
  -1 siblings, 0 replies; 38+ messages in thread
From: Sergey Senozhatsky @ 2016-04-26  0:59 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Dan Streetman, Andrew Morton, Seth Jennings, Sergey Senozhatsky,
	Minchan Kim, Nitin Gupta, Linux-MM, Sergey Senozhatsky,
	linux-kernel, Dan Streetman

On (04/25/16 15:18), Yu Zhao wrote:
> On Mon, Apr 25, 2016 at 05:20:10PM -0400, Dan Streetman wrote:
> > Add a work_struct to struct zpool, and change zpool_destroy_pool to
> > defer calling the pool implementation destroy.
> > 
> > The zsmalloc pool destroy function, which is one of the zpool
> > implementations, may sleep during destruction of the pool.  However
> > zswap, which uses zpool, may call zpool_destroy_pool from atomic
> > context.  So we need to defer the call to the zpool implementation
> > to destroy the pool.
> > 
> > This is essentially the same as Yu Zhao's proposed patch to zsmalloc,
> > but moved to zpool.
> 
> Thanks, Dan. Sergey also mentioned another call path that triggers the
> same problem (BUG: scheduling while atomic):
>   rcu_process_callbacks()
>           __zswap_pool_release()
>                   zswap_pool_destroy()
>                           zswap_cpu_comp_destroy()
>                                   cpu_notifier_register_begin()
>                                           mutex_lock(&cpu_add_remove_lock);
> So I was thinking zswap_pool_destroy() might be done in workqueue in zswap.c.
> This way we fix both call paths.

right, I'm not sure the patch addressed all of the issues.

	-ss

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] mm/zpool: use workqueue for zpool_destroy
  2016-04-25 22:18             ` Yu Zhao
@ 2016-04-26 11:07               ` Dan Streetman
  -1 siblings, 0 replies; 38+ messages in thread
From: Dan Streetman @ 2016-04-26 11:07 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Andrew Morton, Seth Jennings, Sergey Senozhatsky, Minchan Kim,
	Nitin Gupta, Linux-MM, Sergey Senozhatsky, linux-kernel,
	Dan Streetman

On Mon, Apr 25, 2016 at 6:18 PM, Yu Zhao <yuzhao@google.com> wrote:
> On Mon, Apr 25, 2016 at 05:20:10PM -0400, Dan Streetman wrote:
>> Add a work_struct to struct zpool, and change zpool_destroy_pool to
>> defer calling the pool implementation destroy.
>>
>> The zsmalloc pool destroy function, which is one of the zpool
>> implementations, may sleep during destruction of the pool.  However
>> zswap, which uses zpool, may call zpool_destroy_pool from atomic
>> context.  So we need to defer the call to the zpool implementation
>> to destroy the pool.
>>
>> This is essentially the same as Yu Zhao's proposed patch to zsmalloc,
>> but moved to zpool.
>
> Thanks, Dan. Sergey also mentioned another call path that triggers the
> same problem (BUG: scheduling while atomic):
>   rcu_process_callbacks()
>           __zswap_pool_release()
>                   zswap_pool_destroy()
>                           zswap_cpu_comp_destroy()
>                                   cpu_notifier_register_begin()
>                                           mutex_lock(&cpu_add_remove_lock);
> So I was thinking zswap_pool_destroy() might be done in workqueue in zswap.c.
> This way we fix both call paths.

Yes, you're right, I took so long to get around to this I forgot the details :-)

I'll send a new patch to zswap.

>
> Or you have another patch to fix the second call path?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH] mm/zswap: use workqueue to destroy pool
  2016-04-25 21:20           ` Dan Streetman
@ 2016-04-26 21:08             ` Dan Streetman
  -1 siblings, 0 replies; 38+ messages in thread
From: Dan Streetman @ 2016-04-26 21:08 UTC (permalink / raw)
  To: Yu Zhao, Andrew Morton, Seth Jennings
  Cc: Sergey Senozhatsky, Minchan Kim, Nitin Gupta, Linux-MM,
	Sergey Senozhatsky, linux-kernel, Dan Streetman, Dan Streetman

Add a work_struct to struct zswap_pool, and change __zswap_pool_empty
to use the workqueue instead of using call_rcu().

When zswap destroys a pool no longer in use, it uses call_rcu() to
perform the destruction/freeing.  Since that executes in softirq
context, it must not sleep.  However, actually destroying the pool
involves freeing the per-cpu compressors (which requires locking the
cpu_add_remove_lock mutex) and freeing the zpool, for which the
implementation may sleep (e.g. zsmalloc calls kmem_cache_destroy,
which locks the slab_mutex).  So if either mutex is currently taken,
or any other part of the compressor or zpool implementation sleeps, it
will result in a BUG().

It's not easy to reproduce this when changing zswap's params normally.
In testing with a loaded system, this does not fail:

$ cd /sys/module/zswap/parameters
$ echo lz4 > compressor ; echo zsmalloc > zpool

nor does this:

$ while true ; do
> echo lzo > compressor ; echo zbud > zpool
> sleep 1
> echo lz4 > compressor ; echo zsmalloc > zpool
> sleep 1
> done

although it's still possible either of those might fail, depending on
whether anything else besides zswap has locked the mutexes.

However, changing a parameter with no delay immediately causes the
schedule while atomic BUG:

$ while true ; do
> echo lzo > compressor ; echo lz4 > compressor
> done

This is essentially the same as Yu Zhao's proposed patch to zsmalloc,
but moved to zswap, to cover compressor and zpool freeing.

Fixes: f1c54846ee45 ("zswap: dynamic pool creation")
Reported-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Dan Streetman <dan.streetman@canonical.com>
---
 mm/zswap.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 91dad80..f207da7 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -117,7 +117,7 @@ struct zswap_pool {
 	struct crypto_comp * __percpu *tfm;
 	struct kref kref;
 	struct list_head list;
-	struct rcu_head rcu_head;
+	struct work_struct work;
 	struct notifier_block notifier;
 	char tfm_name[CRYPTO_MAX_ALG_NAME];
 };
@@ -652,9 +652,11 @@ static int __must_check zswap_pool_get(struct zswap_pool *pool)
 	return kref_get_unless_zero(&pool->kref);
 }
 
-static void __zswap_pool_release(struct rcu_head *head)
+static void __zswap_pool_release(struct work_struct *work)
 {
-	struct zswap_pool *pool = container_of(head, typeof(*pool), rcu_head);
+	struct zswap_pool *pool = container_of(work, typeof(*pool), work);
+
+	synchronize_rcu();
 
 	/* nobody should have been able to get a kref... */
 	WARN_ON(kref_get_unless_zero(&pool->kref));
@@ -674,7 +676,9 @@ static void __zswap_pool_empty(struct kref *kref)
 	WARN_ON(pool == zswap_pool_current());
 
 	list_del_rcu(&pool->list);
-	call_rcu(&pool->rcu_head, __zswap_pool_release);
+
+	INIT_WORK(&pool->work, __zswap_pool_release);
+	schedule_work(&pool->work);
 
 	spin_unlock(&zswap_pools_lock);
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH] mm/zswap: use workqueue to destroy pool
  2016-04-26 21:08             ` Dan Streetman
@ 2016-04-27  0:58               ` Sergey Senozhatsky
  -1 siblings, 0 replies; 38+ messages in thread
From: Sergey Senozhatsky @ 2016-04-27  0:58 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Yu Zhao, Andrew Morton, Seth Jennings, Sergey Senozhatsky,
	Minchan Kim, Nitin Gupta, Linux-MM, Sergey Senozhatsky,
	linux-kernel, Dan Streetman

Hello,

On (04/26/16 17:08), Dan Streetman wrote:
[..]
> -static void __zswap_pool_release(struct rcu_head *head)
> +static void __zswap_pool_release(struct work_struct *work)
>  {
> -	struct zswap_pool *pool = container_of(head, typeof(*pool), rcu_head);
> +	struct zswap_pool *pool = container_of(work, typeof(*pool), work);
> +
> +	synchronize_rcu();
>  
>  	/* nobody should have been able to get a kref... */
>  	WARN_ON(kref_get_unless_zero(&pool->kref));
> @@ -674,7 +676,9 @@ static void __zswap_pool_empty(struct kref *kref)
>  	WARN_ON(pool == zswap_pool_current());
>  
>  	list_del_rcu(&pool->list);
> -	call_rcu(&pool->rcu_head, __zswap_pool_release);
> +
> +	INIT_WORK(&pool->work, __zswap_pool_release);
> +	schedule_work(&pool->work);

so in general the patch looks good to me.

it's either I didn't have enough coffee yet (which is true) or
_IN THEORY_ it creates a tiny race condition, which is hard (and
unlikely) to hit, but still. and the problem is
CONFIG_ZSMALLOC_STAT.

zsmalloc stats are exported via debugfs, which gets set up during
pool creation in zs_pool_stat_create() -> debugfs_create_dir() zsmalloc<ID>.

so, once again, in theory, since zswap has the same <ID>, the debugfs
dir will have the same name for different pools, so a series of zpool
changes via the user space knob

	zsmalloc > zpool
	zbud > zpool
	zsmalloc > zpool

can result in

release zsmalloc0	 switch to zbud		switch to zsmalloc
__zswap_pool_release()
	schedule_work()
				...
						zs_create_pool()
							zs_pool_stat_create()
							<<  zsmalloc0 still exists >>

	work is finally scheduled
		zs_destroy_pool()
			zs_pool_stat_destroy()

	-ss

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] mm/zswap: use workqueue to destroy pool
  2016-04-27  0:58               ` Sergey Senozhatsky
@ 2016-04-27 17:19                 ` Dan Streetman
  -1 siblings, 0 replies; 38+ messages in thread
From: Dan Streetman @ 2016-04-27 17:19 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Yu Zhao, Andrew Morton, Seth Jennings, Minchan Kim, Nitin Gupta,
	Linux-MM, Sergey Senozhatsky, linux-kernel, Dan Streetman

On Tue, Apr 26, 2016 at 8:58 PM, Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com> wrote:
> Hello,
>
> On (04/26/16 17:08), Dan Streetman wrote:
> [..]
>> -static void __zswap_pool_release(struct rcu_head *head)
>> +static void __zswap_pool_release(struct work_struct *work)
>>  {
>> -     struct zswap_pool *pool = container_of(head, typeof(*pool), rcu_head);
>> +     struct zswap_pool *pool = container_of(work, typeof(*pool), work);
>> +
>> +     synchronize_rcu();
>>
>>       /* nobody should have been able to get a kref... */
>>       WARN_ON(kref_get_unless_zero(&pool->kref));
>> @@ -674,7 +676,9 @@ static void __zswap_pool_empty(struct kref *kref)
>>       WARN_ON(pool == zswap_pool_current());
>>
>>       list_del_rcu(&pool->list);
>> -     call_rcu(&pool->rcu_head, __zswap_pool_release);
>> +
>> +     INIT_WORK(&pool->work, __zswap_pool_release);
>> +     schedule_work(&pool->work);
>
> so in general the patch looks good to me.
>
> it's either I didn't have enough coffee yet (which is true) or
> _IN THEORY_ it creates a tiny race condition, which is hard (and
> unlikely) to hit, but still. and the problem is
> CONFIG_ZSMALLOC_STAT.

Aha, thanks, I hadn't tested with that param enabled.  However, the
patch doesn't create the race condition; it existed already.

>
> zsmalloc stats are exported via debugfs, which gets set up during
> pool creation in zs_pool_stat_create() -> debugfs_create_dir() zsmalloc<ID>.
>
> so, once again, in theory, since zswap has the same <ID>, the debugfs
> dir will have the same name for different pools, so a series of zpool
> changes via the user space knob
>
>         zsmalloc > zpool
>         zbud > zpool
>         zsmalloc > zpool
>
> can result in
>
> release zsmalloc0        switch to zbud         switch to zsmalloc
> __zswap_pool_release()
>         schedule_work()
>                                 ...
>                                                 zs_create_pool()
>                                                         zs_pool_stat_create()
>                                                         <<  zsmalloc0 still exists >>
>
>         work is finally scheduled
>                 zs_destroy_pool()
>                         zs_pool_stat_destroy()

zsmalloc uses the pool 'name' provided, without any checking, and in
this case it will always be 'zswap'.  So this is easy to reproduce:

1. make sure kernel is compiled with CONFIG_ZSMALLOC_STAT=y
2. enable zswap, change zpool to zsmalloc
3. put some pages into zswap
4. try to change the compressor -> failure

It fails because the new zswap pool creates a new zpool using
zsmalloc, but it can't create the zsmalloc pool because there is
already one named 'zswap' so the stat dir can't be created.

So...either zswap needs to provide a unique 'name' to each of its
zpools, or zsmalloc needs to modify its provided pool name in some way
(add a unique suffix maybe).  Or both.

It seems like zsmalloc should do the checking/modification - or, at
the very least, it should have consistent behavior regardless of the
CONFIG_ZSMALLOC_STAT setting.  However, it's easy to change zswap to
provide a unique name for each zpool creation, and zsmalloc's primary
user (zram) guarantees to provide a unique name for each pool created.
So updating zswap is probably best.


>
>         -ss

^ permalink raw reply	[flat|nested] 38+ messages in thread
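
One way zswap could hand each zpool a unique name, along the lines Dan
suggests above, is to bake a per-pool counter into the name string.  The
sketch below is an illustration only (zswap_pool_seq, the "zswap%u" format
and the helper are assumptions, not code from a patch in this thread):

	#include <linux/atomic.h>
	#include <linux/kernel.h>

	/* monotonically increasing id so no two zswap zpools share a name */
	static atomic_t zswap_pool_seq = ATOMIC_INIT(0);

	/* build something like "zswap1", "zswap2", ... for zpool_create_pool() */
	static void zswap_unique_pool_name(char *buf, size_t len)
	{
		snprintf(buf, len, "zswap%u",
			 (unsigned int)atomic_inc_return(&zswap_pool_seq));
	}

With a unique name per zpool, the debugfs directory zsmalloc creates for its
stats can no longer collide with one belonging to a pool whose deferred
destruction is still pending, regardless of the CONFIG_ZSMALLOC_STAT setting.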

* Re: [PATCH] mm/zswap: use workqueue to destroy pool
  2016-04-27 17:19                 ` Dan Streetman
@ 2016-04-28  1:40                   ` Sergey Senozhatsky
  -1 siblings, 0 replies; 38+ messages in thread
From: Sergey Senozhatsky @ 2016-04-28  1:40 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Sergey Senozhatsky, Yu Zhao, Andrew Morton, Seth Jennings,
	Minchan Kim, Nitin Gupta, Linux-MM, Sergey Senozhatsky,
	linux-kernel, Dan Streetman

Hello Dan,

On (04/27/16 13:19), Dan Streetman wrote:
[..]
> > so in general the patch look good to me.
> >
> > it's either I didn't have enough coffee yet (which is true) or
> > _IN THEORY_ it creates a tiny race condition; which is hard (and
> > unlikely) to hit, but still. and the problem being is
> > CONFIG_ZSMALLOC_STAT.
> 
> Aha, thanks, I hadn't tested with that param enabled.  However, the
> patch doesn't create the race condition, that existed already.

well, agreed. it's not a zsmalloc race condition as such, but rather a
consequence of how the zsmalloc pool is managed (deferred destruction
either via rcu or scheduled work).

> It fails because the new zswap pool creates a new zpool using
> zsmalloc, but it can't create the zsmalloc pool because there is
> already one named 'zswap' so the stat dir can't be created.
> 
> So...either zswap needs to provide a unique 'name' to each of its
> zpools, or zsmalloc needs to modify its provided pool name in some way
> (add a unique suffix maybe).  Or both.
> 
> It seems like zsmalloc should do the checking/modification - or, at
> the very least, it should have consistent behavior regardless of the
> CONFIG_ZSMALLOC_STAT setting.

yes, zram guarantees that there won't be any name collisions. and the
way it works for zram, zram<ID> corresponds to zsmalloc<ID>.


the bigger issue here (and I was thinking at some point of fixing it,
but then I grepped to see how many API users are in there, and I gave
up) is that it seems we have no way to check if the dir exists in debugfs.

we call this function

struct dentry *debugfs_create_dir(const char *name, struct dentry *parent)
{
	struct dentry *dentry = start_creating(name, parent);
	struct inode *inode;

	if (IS_ERR(dentry))
		return NULL;

	inode = debugfs_get_inode(dentry->d_sb);
	if (unlikely(!inode))
		return failed_creating(dentry);

	inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
	inode->i_op = &simple_dir_inode_operations;
	inode->i_fop = &simple_dir_operations;

	/* directory inodes start off with i_nlink == 2 (for "." entry) */
	inc_nlink(inode);
	d_instantiate(dentry, inode);
	inc_nlink(d_inode(dentry->d_parent));
	fsnotify_mkdir(d_inode(dentry->d_parent), dentry);
	return end_creating(dentry);
}

and debugfs _does know_ that the directory already exists: ERR_PTR(-EEXIST)
is what start_creating()->lookup_one_len() returns

static struct dentry *start_creating(const char *name, struct dentry *parent)
{
	struct dentry *dentry;
	int error;

	pr_debug("debugfs: creating file '%s'\n",name);

	if (IS_ERR(parent))
		return parent;

	error = simple_pin_fs(&debug_fs_type, &debugfs_mount,
			      &debugfs_mount_count);
	if (error)
		return ERR_PTR(error);

	/* If the parent is not specified, we create it in the root.
	 * We need the root dentry to do this, which is in the super
	 * block. A pointer to that is in the struct vfsmount that we
	 * have around.
	 */
	if (!parent)
		parent = debugfs_mount->mnt_root;

	inode_lock(d_inode(parent));
	dentry = lookup_one_len(name, parent, strlen(name));
	if (!IS_ERR(dentry) && d_really_is_positive(dentry)) {
		dput(dentry);
		dentry = ERR_PTR(-EEXIST);
	}

	if (IS_ERR(dentry)) {
		inode_unlock(d_inode(parent));
		simple_release_fs(&debugfs_mount, &debugfs_mount_count);
	}

	return dentry;
}

but debugfs_create_dir(), instead of propagating this error, swallows it
and simply returns NULL, so we can't tell the difference between -EEXIST, OOM,
or anything else. so doing this check in zsmalloc is not so easy.
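
on the zsmalloc side that leaves zs_pool_stat_create() with only a
binary check, roughly like this (from memory, names and the warning
text are approximate):

	entry = debugfs_create_dir(name, zs_stat_root);
	if (!entry) {
		/* -EEXIST? -ENOMEM? debugfs won't tell us */
		pr_warn("debugfs dir <%s> creation failed\n", name);
		return -ENOMEM;
	}
	pool->stat_dentry = entry;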

/* well, I may be wrong here */

> However, it's easy to change zswap to provide a unique name for each
> zpool creation, and zsmalloc's primary user (zram) guarantees to
> provide a unique name for each pool created. So updating zswap is
> probably best.

if you can do it in zswap, then please do.

	-ss

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] mm/zswap: use workqueue to destroy pool
  2016-04-28  1:40                   ` Sergey Senozhatsky
@ 2016-04-28  4:09                     ` Sergey Senozhatsky
  -1 siblings, 0 replies; 38+ messages in thread
From: Sergey Senozhatsky @ 2016-04-28  4:09 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Dan Streetman, Yu Zhao, Andrew Morton, Seth Jennings,
	Minchan Kim, Nitin Gupta, Linux-MM, Sergey Senozhatsky,
	linux-kernel, Dan Streetman

On (04/28/16 10:40), Sergey Senozhatsky wrote:
[..]
> the bigger issue here (and I was thinking at some point of fixing it,
> but then I grepped to see how many API users are in there, and I gave
> up) is that it seems we have no way to check if the dir exists in debugfs.

well, unless we want to do something like below. but I don't think Greg
will buy it, and the basic rule is to be careful in the driver code
and avoid any collisions.

---

 fs/debugfs/inode.c      | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/debugfs.h |  7 +++++++
 2 files changed, 55 insertions(+)

diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index 8580831..76cf851 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -709,6 +709,54 @@ exit:
 EXPORT_SYMBOL_GPL(debugfs_rename);
 
 /**
+ * debugfs_entry_exists - lookup file/directory name
+ *
+ * @name: a pointer to a string containing the name of the file/directory
+ *        to lookup.
+ * @parent: a pointer to the parent dentry.  This should be a directory
+ *          dentry if set. If this parameter is NULL, then the root of the
+ *          debugfs filesystem will be used.
+ *
+ * This function looks up a file/directory name in debugfs. If the
+ * name corresponds to a positive dentry, the function will return %0.
+ *
+ * If debugfs is not enabled in the kernel, the value -%ENODEV will be
+ * returned.
+ */
+int debugfs_entry_exists(const char *name, struct dentry *parent)
+{
+	struct dentry *dentry;
+	int error;
+
+	if (IS_ERR(parent))
+		return PTR_ERR(parent);
+
+	error = simple_pin_fs(&debug_fs_type, &debugfs_mount,
+			      &debugfs_mount_count);
+	if (error)
+		return error;
+
+	if (!parent)
+		parent = debugfs_mount->mnt_root;
+
+	error = -EINVAL;
+	inode_lock(d_inode(parent));
+	dentry = lookup_one_len(name, parent, strlen(name));
+	if (IS_ERR(dentry)) {
+		error = PTR_ERR(dentry);
+	} else {
+		if (d_really_is_positive(dentry))
+			error = 0;
+		dput(dentry);
+	}
+
+	inode_unlock(d_inode(parent));
+	simple_release_fs(&debugfs_mount, &debugfs_mount_count);
+	return error;
+}
+EXPORT_SYMBOL_GPL(debugfs_entry_exists);
+
+/**
  * debugfs_initialized - Tells whether debugfs has been registered
  */
 bool debugfs_initialized(void)
diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h
index 981e53a..5b6321e 100644
--- a/include/linux/debugfs.h
+++ b/include/linux/debugfs.h
@@ -124,6 +124,8 @@ ssize_t debugfs_read_file_bool(struct file *file, char __user *user_buf,
 ssize_t debugfs_write_file_bool(struct file *file, const char __user *user_buf,
 				size_t count, loff_t *ppos);
 
+int debugfs_entry_exists(const char *name, struct dentry *parent);
+
 #else
 
 #include <linux/err.h>
@@ -312,6 +314,11 @@ static inline ssize_t debugfs_write_file_bool(struct file *file,
 	return -ENODEV;
 }
 
+static inline int debugfs_entry_exists(const char *name, struct dentry *parent)
+{
+	return -ENODEV;
+}
+
 #endif
 
 #endif
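
with something like that a driver could at least detect the collision
up front, e.g. (untested sketch, zs_stat_root here stands for zsmalloc's
debugfs root dentry):

	if (debugfs_entry_exists(name, zs_stat_root) == 0) {
		/* 0 means the name is already taken */
		pr_err("pool stat dir <%s> already exists\n", name);
		return -EEXIST;
	}
	entry = debugfs_create_dir(name, zs_stat_root);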

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH] mm/zswap: use workqueue to destroy pool
  2016-04-28  1:40                   ` Sergey Senozhatsky
@ 2016-04-28  8:21                     ` Dan Streetman
  -1 siblings, 0 replies; 38+ messages in thread
From: Dan Streetman @ 2016-04-28  8:21 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Yu Zhao, Andrew Morton, Seth Jennings, Minchan Kim, Nitin Gupta,
	Linux-MM, Sergey Senozhatsky, linux-kernel, Dan Streetman

On Wed, Apr 27, 2016 at 9:40 PM, Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com> wrote:
> Hello Dan,
>
> On (04/27/16 13:19), Dan Streetman wrote:
> [..]
>> > so in general the patch look good to me.
>> >
>> > it's either I didn't have enough coffee yet (which is true) or
>> > _IN THEORY_ it creates a tiny race condition; which is hard (and
>> > unlikely) to hit, but still. and the problem being is
>> > CONFIG_ZSMALLOC_STAT.
>>
>> Aha, thanks, I hadn't tested with that param enabled.  However, the
>> patch doesn't create the race condition, that existed already.
>
> well, agree. it's not like zsmalloc race condition, but the way zsmalloc
> is managed (deferred destruction either via rcu or scheduled work).
>
>> It fails because the new zswap pool creates a new zpool using
>> zsmalloc, but it can't create the zsmalloc pool because there is
>> already one named 'zswap' so the stat dir can't be created.
>>
>> So...either zswap needs to provide a unique 'name' to each of its
>> zpools, or zsmalloc needs to modify its provided pool name in some way
>> (add a unique suffix maybe).  Or both.
>>
>> It seems like zsmalloc should do the checking/modification - or, at
>> the very least, it should have consistent behavior regardless of the
>> CONFIG_ZSMALLOC_STAT setting.
>
> yes, zram guarantees that there won't be any name collisions. and the
> way it's working for zram, zram<ID> corresponds to zsmalloc<ID>.
>
>
> the bigger issue here (and I was thinking at some point of fixing it,
> but then I grepped to see how many API users are in there, and I gave
> up) is that it seems we have no way to check if the dir exists in debugfs.
>
> we call this function
>
> struct dentry *debugfs_create_dir(const char *name, struct dentry *parent)
> {
>         struct dentry *dentry = start_creating(name, parent);
>         struct inode *inode;
>
>         if (IS_ERR(dentry))
>                 return NULL;
>
>         inode = debugfs_get_inode(dentry->d_sb);
>         if (unlikely(!inode))
>                 return failed_creating(dentry);
>
>         inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
>         inode->i_op = &simple_dir_inode_operations;
>         inode->i_fop = &simple_dir_operations;
>
>         /* directory inodes start off with i_nlink == 2 (for "." entry) */
>         inc_nlink(inode);
>         d_instantiate(dentry, inode);
>         inc_nlink(d_inode(dentry->d_parent));
>         fsnotify_mkdir(d_inode(dentry->d_parent), dentry);
>         return end_creating(dentry);
> }
>
> and debugfs _does know_ that the directory ERR_PTR(-EEXIST), that's what
> start_creating()->lookup_one_len() return
>
> static struct dentry *start_creating(const char *name, struct dentry *parent)
> {
>         struct dentry *dentry;
>         int error;
>
>         pr_debug("debugfs: creating file '%s'\n",name);
>
>         if (IS_ERR(parent))
>                 return parent;
>
>         error = simple_pin_fs(&debug_fs_type, &debugfs_mount,
>                               &debugfs_mount_count);
>         if (error)
>                 return ERR_PTR(error);
>
>         /* If the parent is not specified, we create it in the root.
>          * We need the root dentry to do this, which is in the super
>          * block. A pointer to that is in the struct vfsmount that we
>          * have around.
>          */
>         if (!parent)
>                 parent = debugfs_mount->mnt_root;
>
>         inode_lock(d_inode(parent));
>         dentry = lookup_one_len(name, parent, strlen(name));
>         if (!IS_ERR(dentry) && d_really_is_positive(dentry)) {
>                 dput(dentry);
>                 dentry = ERR_PTR(-EEXIST);
>         }
>
>         if (IS_ERR(dentry)) {
>                 inode_unlock(d_inode(parent));
>                 simple_release_fs(&debugfs_mount, &debugfs_mount_count);
>         }
>
>         return dentry;
> }
>
> but debugfs_create_dir() instead of propagating this error, it swallows it
> and simply return NULL, so we can't tell the difference between -EEXIST, OOM,
> or anything else. so doing this check in zsmalloc() is not so easy.

yeah, Greg intentionally made the debugfs api opaque, so there's only
a binary created/failed indication.

While I agree zswap should provide unique names, I also think zsmalloc
should not abort if its debugfs content fails to be created - the
intention of debugfs is not to be a critical part of drivers, but only
to provide debug information.  I'll send a separate patch to zsmalloc
that allows zsmalloc pool creation to continue even if the debugfs
dir/file fails to be created.
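
The idea is roughly the following (sketch only, not the actual patch;
names are approximate):

	entry = debugfs_create_dir(name, zs_stat_root);
	if (!entry) {
		/* stats are only debug info: warn and keep going */
		pr_warn("debugfs dir <%s> creation failed\n", name);
		pool->stat_dentry = NULL;
		return 0;
	}
	pool->stat_dentry = entry;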


>
> /* well, I may be wrong here */
>
>> However, it's easy to change zswap to provide a unique name for each
>> zpool creation, and zsmalloc's primary user (zram) guarantees to
>> provide a unique name for each pool created. So updating zswap is
>> probably best.
>
> if you can do it in zswap, then please do.
>
>         -ss

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH] mm/zswap: provide unique zpool name
  2016-04-27 17:19                 ` Dan Streetman
@ 2016-04-28  9:13                   ` Dan Streetman
  -1 siblings, 0 replies; 38+ messages in thread
From: Dan Streetman @ 2016-04-28  9:13 UTC (permalink / raw)
  To: Sergey Senozhatsky, Andrew Morton, Seth Jennings
  Cc: Yu Zhao, Minchan Kim, Nitin Gupta, Linux-MM, Sergey Senozhatsky,
	linux-kernel, Dan Streetman, Dan Streetman

Instead of using "zswap" as the name for all zpools created, add
an atomic counter and use "zswap%x" with the counter number for each
zpool created, to provide a unique name for each new zpool.

As zsmalloc, one of the zpool implementations, requires/expects a
unique name for each pool created, zswap should provide a unique name.
The zsmalloc pool creation does not fail if a new pool with a
conflicting name is created, unless CONFIG_ZSMALLOC_STAT is enabled;
in that case, zsmalloc pool creation fails with -ENOMEM.  Then zswap
will be unable to change its compressor parameter if its zpool is
zsmalloc; it also will be unable to change its zpool parameter back
to zsmalloc, if it has any existing old zpool using zsmalloc with
page(s) in it.  Attempts to change the parameters will result in
failure to create the zpool.  This changes zswap to provide a
unique name for each zpool creation.

Fixes: f1c54846ee45 ("zswap: dynamic pool creation")
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Dan Streetman <dan.streetman@canonical.com>
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 mm/zswap.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index f207da7..275b22c 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -170,6 +170,8 @@ static struct zswap_tree *zswap_trees[MAX_SWAPFILES];
 static LIST_HEAD(zswap_pools);
 /* protects zswap_pools list modification */
 static DEFINE_SPINLOCK(zswap_pools_lock);
+/* pool counter to provide unique names to zpool */
+static atomic_t zswap_pools_count = ATOMIC_INIT(0);
 
 /* used by param callback function */
 static bool zswap_init_started;
@@ -565,6 +567,7 @@ static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor)
 static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
 {
 	struct zswap_pool *pool;
+	char name[38]; /* 'zswap' + 32 char (max) num + \0 */
 	gfp_t gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM;
 
 	pool = kzalloc(sizeof(*pool), GFP_KERNEL);
@@ -573,7 +576,10 @@ static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
 		return NULL;
 	}
 
-	pool->zpool = zpool_create_pool(type, "zswap", gfp, &zswap_zpool_ops);
+	/* unique name for each pool specifically required by zsmalloc */
+	snprintf(name, 38, "zswap%x", atomic_inc_return(&zswap_pools_count));
+
+	pool->zpool = zpool_create_pool(type, name, gfp, &zswap_zpool_ops);
 	if (!pool->zpool) {
 		pr_err("%s zpool not available\n", type);
 		goto error;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH] mm/zswap: provide unique zpool name
  2016-04-28  9:13                   ` Dan Streetman
@ 2016-04-28 22:16                     ` Andrew Morton
  -1 siblings, 0 replies; 38+ messages in thread
From: Andrew Morton @ 2016-04-28 22:16 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Sergey Senozhatsky, Seth Jennings, Yu Zhao, Minchan Kim,
	Nitin Gupta, Linux-MM, Sergey Senozhatsky, linux-kernel,
	Dan Streetman

On Thu, 28 Apr 2016 05:13:23 -0400 Dan Streetman <ddstreet@ieee.org> wrote:

> Instead of using "zswap" as the name for all zpools created, add
> an atomic counter and use "zswap%x" with the counter number for each
> zpool created, to provide a unique name for each new zpool.
> 
> As zsmalloc, one of the zpool implementations, requires/expects a
> unique name for each pool created, zswap should provide a unique name.
> The zsmalloc pool creation does not fail if a new pool with a
> conflicting name is created, unless CONFIG_ZSMALLOC_STAT is enabled;
> in that case, zsmalloc pool creation fails with -ENOMEM.  Then zswap
> will be unable to change its compressor parameter if its zpool is
> zsmalloc; it also will be unable to change its zpool parameter back
> to zsmalloc, if it has any existing old zpool using zsmalloc with
> page(s) in it.  Attempts to change the parameters will result in
> failure to create the zpool.  This changes zswap to provide a
> unique name for each zpool creation.
> 
> Fixes: f1c54846ee45 ("zswap: dynamic pool creation")

September 2015.  I added a cc:stable to this one.

> Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Cc: Dan Streetman <dan.streetman@canonical.com>
> Signed-off-by: Dan Streetman <ddstreet@ieee.org>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] mm/zswap: use workqueue to destroy pool
  2016-04-26 21:08             ` Dan Streetman
@ 2016-04-29  0:25               ` Sergey Senozhatsky
  -1 siblings, 0 replies; 38+ messages in thread
From: Sergey Senozhatsky @ 2016-04-29  0:25 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Yu Zhao, Andrew Morton, Seth Jennings, Sergey Senozhatsky,
	Minchan Kim, Nitin Gupta, Linux-MM, Sergey Senozhatsky,
	linux-kernel, Dan Streetman

On (04/26/16 17:08), Dan Streetman wrote:
> Add a work_struct to struct zswap_pool, and change __zswap_pool_empty
> to use the workqueue instead of using call_rcu().
> 
> When zswap destroys a pool no longer in use, it uses call_rcu() to
> perform the destruction/freeing.  Since that executes in softirq
> context, it must not sleep.  However, actually destroying the pool
> involves freeing the per-cpu compressors (which requires locking the
> cpu_add_remove_lock mutex) and freeing the zpool, for which the
> implementation may sleep (e.g. zsmalloc calls kmem_cache_destroy,
> which locks the slab_mutex).  So if either mutex is currently taken,
> or any other part of the compressor or zpool implementation sleeps, it
> will result in a BUG().
> 
> It's not easy to reproduce this when changing zswap's params normally.
> In testing with a loaded system, this does not fail:
> 
> $ cd /sys/module/zswap/parameters
> $ echo lz4 > compressor ; echo zsmalloc > zpool
> 
> nor does this:
> 
> $ while true ; do
> > echo lzo > compressor ; echo zbud > zpool
> > sleep 1
> > echo lz4 > compressor ; echo zsmalloc > zpool
> > sleep 1
> > done
> 
> although it's still possible either of those might fail, depending on
> whether anything else besides zswap has locked the mutexes.
> 
> However, changing a parameter with no delay immediately causes the
> schedule while atomic BUG:
> 
> $ while true ; do
> > echo lzo > compressor ; echo lz4 > compressor
> > done
> 
> This is essentially the same as Yu Zhao's proposed patch to zsmalloc,
> but moved to zswap, to cover compressor and zpool freeing.
> 
> Fixes: f1c54846ee45 ("zswap: dynamic pool creation")
> Reported-by: Yu Zhao <yuzhao@google.com>
> Signed-off-by: Dan Streetman <ddstreet@ieee.org>
> Cc: Dan Streetman <dan.streetman@canonical.com>

Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

	-ss

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] mm/zswap: provide unique zpool name
  2016-04-28  9:13                   ` Dan Streetman
@ 2016-04-29  0:25                     ` Sergey Senozhatsky
  -1 siblings, 0 replies; 38+ messages in thread
From: Sergey Senozhatsky @ 2016-04-29  0:25 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Sergey Senozhatsky, Andrew Morton, Seth Jennings, Yu Zhao,
	Minchan Kim, Nitin Gupta, Linux-MM, Sergey Senozhatsky,
	linux-kernel, Dan Streetman

On (04/28/16 05:13), Dan Streetman wrote:
> Instead of using "zswap" as the name for all zpools created, add
> an atomic counter and use "zswap%x" with the counter number for each
> zpool created, to provide a unique name for each new zpool.
> 
> As zsmalloc, one of the zpool implementations, requires/expects a
> unique name for each pool created, zswap should provide a unique name.
> The zsmalloc pool creation does not fail if a new pool with a
> conflicting name is created, unless CONFIG_ZSMALLOC_STAT is enabled;
> in that case, zsmalloc pool creation fails with -ENOMEM.  Then zswap
> will be unable to change its compressor parameter if its zpool is
> zsmalloc; it also will be unable to change its zpool parameter back
> to zsmalloc, if it has any existing old zpool using zsmalloc with
> page(s) in it.  Attempts to change the parameters will result in
> failure to create the zpool.  This changes zswap to provide a
> unique name for each zpool creation.
> 
> Fixes: f1c54846ee45 ("zswap: dynamic pool creation")
> Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Cc: Dan Streetman <dan.streetman@canonical.com>
> Signed-off-by: Dan Streetman <ddstreet@ieee.org>

Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

	-ss

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2016-04-29  0:24 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-29 22:02 [PATCH] zsmalloc: use workqueue to destroy pool in zpool callback Yu Zhao
2016-03-30  0:54 ` Sergey Senozhatsky
     [not found] ` <20160329235950.GA19927@bbox>
2016-03-31  8:46   ` Sergey Senozhatsky
2016-03-31  8:46     ` Sergey Senozhatsky
2016-03-31 21:46     ` Yu Zhao
2016-03-31 21:46       ` Yu Zhao
2016-03-31 22:05       ` Dan Streetman
2016-03-31 22:05         ` Dan Streetman
2016-04-25 21:20         ` [PATCH] mm/zpool: use workqueue for zpool_destroy Dan Streetman
2016-04-25 21:20           ` Dan Streetman
2016-04-25 21:46           ` Andrew Morton
2016-04-25 21:46             ` Andrew Morton
2016-04-25 22:18           ` Yu Zhao
2016-04-25 22:18             ` Yu Zhao
2016-04-26  0:59             ` Sergey Senozhatsky
2016-04-26  0:59               ` Sergey Senozhatsky
2016-04-26 11:07             ` Dan Streetman
2016-04-26 11:07               ` Dan Streetman
2016-04-26 21:08           ` [PATCH] mm/zswap: use workqueue to destroy pool Dan Streetman
2016-04-26 21:08             ` Dan Streetman
2016-04-27  0:58             ` Sergey Senozhatsky
2016-04-27  0:58               ` Sergey Senozhatsky
2016-04-27 17:19               ` Dan Streetman
2016-04-27 17:19                 ` Dan Streetman
2016-04-28  1:40                 ` Sergey Senozhatsky
2016-04-28  1:40                   ` Sergey Senozhatsky
2016-04-28  4:09                   ` Sergey Senozhatsky
2016-04-28  4:09                     ` Sergey Senozhatsky
2016-04-28  8:21                   ` Dan Streetman
2016-04-28  8:21                     ` Dan Streetman
2016-04-28  9:13                 ` [PATCH] mm/zswap: provide unique zpool name Dan Streetman
2016-04-28  9:13                   ` Dan Streetman
2016-04-28 22:16                   ` Andrew Morton
2016-04-28 22:16                     ` Andrew Morton
2016-04-29  0:25                   ` Sergey Senozhatsky
2016-04-29  0:25                     ` Sergey Senozhatsky
2016-04-29  0:25             ` [PATCH] mm/zswap: use workqueue to destroy pool Sergey Senozhatsky
2016-04-29  0:25               ` Sergey Senozhatsky
