linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH 1/2] zram: free meta table in zram_meta_free
       [not found] <1422432945-6764-1-git-send-email-minchan@kernel.org>
@ 2015-01-28 14:19 ` Sergey Senozhatsky
  2015-01-28 23:17   ` Minchan Kim
       [not found] ` <1422432945-6764-2-git-send-email-minchan@kernel.org>
  1 sibling, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2015-01-28 14:19 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Nitin Gupta,
	Jerome Marchand, Sergey Senozhatsky, Ganesh Mahendran,
	sergey.senozhatsky.work

On (01/28/15 17:15), Minchan Kim wrote:
> zram_meta_alloc() and zram_meta_free() are a pair.
> In zram_meta_alloc(), meta table is allocated. So it it better to free
> it in zram_meta_free().
> 
> Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  drivers/block/zram/zram_drv.c | 30 ++++++++++++++----------------
>  drivers/block/zram/zram_drv.h |  1 +
>  2 files changed, 15 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 9250b3f54a8f..a598ada817f0 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -309,6 +309,18 @@ static inline int valid_io_request(struct zram *zram,
>  
>  static void zram_meta_free(struct zram_meta *meta)
>  {
> +	size_t index;


I don't like how we bloat structs w/o any need.
zram keeps ->disksize, so let's use `zram->disksize >> PAGE_SHIFT'
instead of introducing ->num_pages.

	-ss

> +	/* Free all pages that are still in this zram device */
> +	for (index = 0; index < meta->num_pages; index++) {
> +		unsigned long handle = meta->table[index].handle;
> +
> +		if (!handle)
> +			continue;
> +
> +		zs_free(meta->mem_pool, handle);
> +	}
> +
>  	zs_destroy_pool(meta->mem_pool);
>  	vfree(meta->table);
>  	kfree(meta);
> @@ -316,15 +328,14 @@ static void zram_meta_free(struct zram_meta *meta)
>  
>  static struct zram_meta *zram_meta_alloc(int device_id, u64 disksize)
>  {
> -	size_t num_pages;
>  	char pool_name[8];
>  	struct zram_meta *meta = kmalloc(sizeof(*meta), GFP_KERNEL);
>  
>  	if (!meta)
>  		return NULL;
>  
> -	num_pages = disksize >> PAGE_SHIFT;
> -	meta->table = vzalloc(num_pages * sizeof(*meta->table));
> +	meta->num_pages = disksize >> PAGE_SHIFT;
> +	meta->table = vzalloc(meta->num_pages * sizeof(*meta->table));
>  	if (!meta->table) {
>  		pr_err("Error allocating zram address table\n");
>  		goto out_error;
> @@ -706,9 +717,6 @@ static void zram_bio_discard(struct zram *zram, u32 index,
>  
>  static void zram_reset_device(struct zram *zram, bool reset_capacity)
>  {
> -	size_t index;
> -	struct zram_meta *meta;
> -
>  	down_write(&zram->init_lock);
>  
>  	zram->limit_pages = 0;
> @@ -718,16 +726,6 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>  		return;
>  	}
>  
> -	meta = zram->meta;
> -	/* Free all pages that are still in this zram device */
> -	for (index = 0; index < zram->disksize >> PAGE_SHIFT; index++) {
> -		unsigned long handle = meta->table[index].handle;
> -		if (!handle)
> -			continue;
> -
> -		zs_free(meta->mem_pool, handle);
> -	}
> -
>  	zcomp_destroy(zram->comp);
>  	zram->max_comp_streams = 1;
>  
> diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
> index b05a816b09ac..e492f6bf11f1 100644
> --- a/drivers/block/zram/zram_drv.h
> +++ b/drivers/block/zram/zram_drv.h
> @@ -96,6 +96,7 @@ struct zram_stats {
>  struct zram_meta {
>  	struct zram_table_entry *table;
>  	struct zs_pool *mem_pool;
> +	size_t num_pages;
>  };
>  
>  struct zram {
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
       [not found] ` <1422432945-6764-2-git-send-email-minchan@kernel.org>
@ 2015-01-28 14:56   ` Sergey Senozhatsky
  2015-01-28 15:04     ` Sergey Senozhatsky
  2015-01-28 23:33     ` Minchan Kim
  2015-01-29 13:48   ` Ganesh Mahendran
  1 sibling, 2 replies; 44+ messages in thread
From: Sergey Senozhatsky @ 2015-01-28 14:56 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Nitin Gupta,
	Jerome Marchand, Sergey Senozhatsky, Ganesh Mahendran

On (01/28/15 17:15), Minchan Kim wrote:
> Admin could reset zram during I/O operation going on so we have
> used zram->init_lock as read-side lock in I/O path to prevent
> sudden zram meta freeing.
> 
> However, the init_lock is really troublesome.
> We can't do call zram_meta_alloc under init_lock due to lockdep splat
> because zram_rw_page is one of the function under reclaim path and
> hold it as read_lock while other places in process context hold it
> as write_lock. So, we have used allocation out of the lock to avoid
> lockdep warn but it's not good for readability and fainally, I met
> another lockdep splat between init_lock and cpu_hotpulug from
> kmem_cache_destroy during wokring zsmalloc compaction. :(
> 
> Yes, the ideal is to remove horrible init_lock of zram in rw path.
> This patch removes it in rw path and instead, put init_done bool
> variable to check initialization done with smp_[wmb|rmb] and
> srcu_[un]read_lock to prevent sudden zram meta freeing
> during I/O operation.
> 
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  drivers/block/zram/zram_drv.c | 85 +++++++++++++++++++++++++++++++------------
>  drivers/block/zram/zram_drv.h |  5 +++
>  2 files changed, 66 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index a598ada817f0..b33add453027 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -32,6 +32,7 @@
>  #include <linux/string.h>
>  #include <linux/vmalloc.h>
>  #include <linux/err.h>
> +#include <linux/srcu.h>
>  
>  #include "zram_drv.h"
>  
> @@ -53,9 +54,31 @@ static ssize_t name##_show(struct device *d,		\
>  }									\
>  static DEVICE_ATTR_RO(name);
>  
> -static inline int init_done(struct zram *zram)
> +static inline bool init_done(struct zram *zram)
>  {
> -	return zram->meta != NULL;
> +	/*
> +	 * init_done can be used without holding zram->init_lock in
> +	 * read/write handler(ie, zram_make_request) but we should make sure
> +	 * that zram->init_done should set up after meta initialization is
> +	 * done. Look at setup_init_done.
> +	 */
> +	bool ret = zram->init_done;

I don't like re-introduced ->init_done.
another idea... how about using `zram->disksize == 0' instead of
`->init_done' (previously `->meta != NULL')? should do the trick.


and I'm not sure I get this rmb...

	-ss

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-01-28 14:56   ` [PATCH v1 2/2] zram: remove init_lock in zram_make_request Sergey Senozhatsky
@ 2015-01-28 15:04     ` Sergey Senozhatsky
  2015-01-28 23:33     ` Minchan Kim
  1 sibling, 0 replies; 44+ messages in thread
From: Sergey Senozhatsky @ 2015-01-28 15:04 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel, linux-mm,
	Nitin Gupta, Jerome Marchand, Ganesh Mahendran,
	sergey.senozhatsky.work

On (01/28/15 23:56), Sergey Senozhatsky wrote:
> > -static inline int init_done(struct zram *zram)
> > +static inline bool init_done(struct zram *zram)
> >  {
> > -	return zram->meta != NULL;
> > +	/*
> > +	 * init_done can be used without holding zram->init_lock in
> > +	 * read/write handler(ie, zram_make_request) but we should make sure
> > +	 * that zram->init_done should set up after meta initialization is
> > +	 * done. Look at setup_init_done.
> > +	 */
> > +	bool ret = zram->init_done;
> 
> I don't like re-introduced ->init_done.
> another idea... how about using `zram->disksize == 0' instead of
> `->init_done' (previously `->meta != NULL')? should do the trick.
> 

a typo, I meant `->disksize != 0'.

	-ss

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/2] zram: free meta table in zram_meta_free
  2015-01-28 14:19 ` [PATCH 1/2] zram: free meta table in zram_meta_free Sergey Senozhatsky
@ 2015-01-28 23:17   ` Minchan Kim
  2015-01-29  1:49     ` Ganesh Mahendran
  0 siblings, 1 reply; 44+ messages in thread
From: Minchan Kim @ 2015-01-28 23:17 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, linux-kernel, linux-mm, Nitin Gupta,
	Jerome Marchand, Ganesh Mahendran, sergey.senozhatsky.work

On Wed, Jan 28, 2015 at 11:19:17PM +0900, Sergey Senozhatsky wrote:
> On (01/28/15 17:15), Minchan Kim wrote:
> > zram_meta_alloc() and zram_meta_free() are a pair.
> > In zram_meta_alloc(), meta table is allocated. So it it better to free
> > it in zram_meta_free().
> > 
> > Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> >  drivers/block/zram/zram_drv.c | 30 ++++++++++++++----------------
> >  drivers/block/zram/zram_drv.h |  1 +
> >  2 files changed, 15 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> > index 9250b3f54a8f..a598ada817f0 100644
> > --- a/drivers/block/zram/zram_drv.c
> > +++ b/drivers/block/zram/zram_drv.c
> > @@ -309,6 +309,18 @@ static inline int valid_io_request(struct zram *zram,
> >  
> >  static void zram_meta_free(struct zram_meta *meta)
> >  {
> > +	size_t index;
> 
> 
> I don't like how we bloat structs w/o any need.
> zram keeps ->disksize, so let's use `zram->disksize >> PAGE_SHIFT'
> instead of introducing ->num_pages.

Right. I overlooked it. I just want to send my patch[2/2] and I thought
it would be better ganesh's patch to merge first although it's orthogonal.

Ganesh, I hope you resend this patch with Sergey's suggestion.
If you are busy, please tell me. I will do it instead of you.

> 
> 	-ss
> 
> > +	/* Free all pages that are still in this zram device */
> > +	for (index = 0; index < meta->num_pages; index++) {
> > +		unsigned long handle = meta->table[index].handle;
> > +
> > +		if (!handle)
> > +			continue;
> > +
> > +		zs_free(meta->mem_pool, handle);
> > +	}
> > +
> >  	zs_destroy_pool(meta->mem_pool);
> >  	vfree(meta->table);
> >  	kfree(meta);
> > @@ -316,15 +328,14 @@ static void zram_meta_free(struct zram_meta *meta)
> >  
> >  static struct zram_meta *zram_meta_alloc(int device_id, u64 disksize)
> >  {
> > -	size_t num_pages;
> >  	char pool_name[8];
> >  	struct zram_meta *meta = kmalloc(sizeof(*meta), GFP_KERNEL);
> >  
> >  	if (!meta)
> >  		return NULL;
> >  
> > -	num_pages = disksize >> PAGE_SHIFT;
> > -	meta->table = vzalloc(num_pages * sizeof(*meta->table));
> > +	meta->num_pages = disksize >> PAGE_SHIFT;
> > +	meta->table = vzalloc(meta->num_pages * sizeof(*meta->table));
> >  	if (!meta->table) {
> >  		pr_err("Error allocating zram address table\n");
> >  		goto out_error;
> > @@ -706,9 +717,6 @@ static void zram_bio_discard(struct zram *zram, u32 index,
> >  
> >  static void zram_reset_device(struct zram *zram, bool reset_capacity)
> >  {
> > -	size_t index;
> > -	struct zram_meta *meta;
> > -
> >  	down_write(&zram->init_lock);
> >  
> >  	zram->limit_pages = 0;
> > @@ -718,16 +726,6 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
> >  		return;
> >  	}
> >  
> > -	meta = zram->meta;
> > -	/* Free all pages that are still in this zram device */
> > -	for (index = 0; index < zram->disksize >> PAGE_SHIFT; index++) {
> > -		unsigned long handle = meta->table[index].handle;
> > -		if (!handle)
> > -			continue;
> > -
> > -		zs_free(meta->mem_pool, handle);
> > -	}
> > -
> >  	zcomp_destroy(zram->comp);
> >  	zram->max_comp_streams = 1;
> >  
> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
> > index b05a816b09ac..e492f6bf11f1 100644
> > --- a/drivers/block/zram/zram_drv.h
> > +++ b/drivers/block/zram/zram_drv.h
> > @@ -96,6 +96,7 @@ struct zram_stats {
> >  struct zram_meta {
> >  	struct zram_table_entry *table;
> >  	struct zs_pool *mem_pool;
> > +	size_t num_pages;
> >  };
> >  
> >  struct zram {
> > -- 
> > 1.9.1
> > 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-01-28 14:56   ` [PATCH v1 2/2] zram: remove init_lock in zram_make_request Sergey Senozhatsky
  2015-01-28 15:04     ` Sergey Senozhatsky
@ 2015-01-28 23:33     ` Minchan Kim
       [not found]       ` <CAHqPoqKZFDSjO1pL+ixYe_m_L0nGNcu04qSNp-jd1fUixKtHnw@mail.gmail.com>
  1 sibling, 1 reply; 44+ messages in thread
From: Minchan Kim @ 2015-01-28 23:33 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, linux-kernel, linux-mm, Nitin Gupta,
	Jerome Marchand, Ganesh Mahendran

On Wed, Jan 28, 2015 at 11:56:51PM +0900, Sergey Senozhatsky wrote:
> On (01/28/15 17:15), Minchan Kim wrote:
> > Admin could reset zram during I/O operation going on so we have
> > used zram->init_lock as read-side lock in I/O path to prevent
> > sudden zram meta freeing.
> > 
> > However, the init_lock is really troublesome.
> > We can't do call zram_meta_alloc under init_lock due to lockdep splat
> > because zram_rw_page is one of the function under reclaim path and
> > hold it as read_lock while other places in process context hold it
> > as write_lock. So, we have used allocation out of the lock to avoid
> > lockdep warn but it's not good for readability and fainally, I met
> > another lockdep splat between init_lock and cpu_hotpulug from
> > kmem_cache_destroy during wokring zsmalloc compaction. :(
> > 
> > Yes, the ideal is to remove horrible init_lock of zram in rw path.
> > This patch removes it in rw path and instead, put init_done bool
> > variable to check initialization done with smp_[wmb|rmb] and
> > srcu_[un]read_lock to prevent sudden zram meta freeing
> > during I/O operation.
> > 
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> >  drivers/block/zram/zram_drv.c | 85 +++++++++++++++++++++++++++++++------------
> >  drivers/block/zram/zram_drv.h |  5 +++
> >  2 files changed, 66 insertions(+), 24 deletions(-)
> > 
> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> > index a598ada817f0..b33add453027 100644
> > --- a/drivers/block/zram/zram_drv.c
> > +++ b/drivers/block/zram/zram_drv.c
> > @@ -32,6 +32,7 @@
> >  #include <linux/string.h>
> >  #include <linux/vmalloc.h>
> >  #include <linux/err.h>
> > +#include <linux/srcu.h>
> >  
> >  #include "zram_drv.h"
> >  
> > @@ -53,9 +54,31 @@ static ssize_t name##_show(struct device *d,		\
> >  }									\
> >  static DEVICE_ATTR_RO(name);
> >  
> > -static inline int init_done(struct zram *zram)
> > +static inline bool init_done(struct zram *zram)
> >  {
> > -	return zram->meta != NULL;
> > +	/*
> > +	 * init_done can be used without holding zram->init_lock in
> > +	 * read/write handler(ie, zram_make_request) but we should make sure
> > +	 * that zram->init_done should set up after meta initialization is
> > +	 * done. Look at setup_init_done.
> > +	 */
> > +	bool ret = zram->init_done;
> 
> I don't like re-introduced ->init_done.
> another idea... how about using `zram->disksize == 0' instead of
> `->init_done' (previously `->meta != NULL')? should do the trick.

It could be.

> 
> 
> and I'm not sure I get this rmb...

What makes you not sure?
I think it's clear and common pattern for smp_[wmb|rmb]. :)


> 
> 	-ss

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/2] zram: free meta table in zram_meta_free
  2015-01-28 23:17   ` Minchan Kim
@ 2015-01-29  1:49     ` Ganesh Mahendran
  0 siblings, 0 replies; 44+ messages in thread
From: Ganesh Mahendran @ 2015-01-29  1:49 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel, Linux-MM,
	Nitin Gupta, Jerome Marchand, sergey.senozhatsky.work

Hello, Minchan

2015-01-29 7:17 GMT+08:00 Minchan Kim <minchan@kernel.org>:
> On Wed, Jan 28, 2015 at 11:19:17PM +0900, Sergey Senozhatsky wrote:
>> On (01/28/15 17:15), Minchan Kim wrote:
>> > zram_meta_alloc() and zram_meta_free() are a pair.
>> > In zram_meta_alloc(), meta table is allocated. So it it better to free
>> > it in zram_meta_free().
>> >
>> > Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
>> > Signed-off-by: Minchan Kim <minchan@kernel.org>
>> > ---
>> >  drivers/block/zram/zram_drv.c | 30 ++++++++++++++----------------
>> >  drivers/block/zram/zram_drv.h |  1 +
>> >  2 files changed, 15 insertions(+), 16 deletions(-)
>> >
>> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>> > index 9250b3f54a8f..a598ada817f0 100644
>> > --- a/drivers/block/zram/zram_drv.c
>> > +++ b/drivers/block/zram/zram_drv.c
>> > @@ -309,6 +309,18 @@ static inline int valid_io_request(struct zram *zram,
>> >
>> >  static void zram_meta_free(struct zram_meta *meta)
>> >  {
>> > +   size_t index;
>>
>>
>> I don't like how we bloat structs w/o any need.
>> zram keeps ->disksize, so let's use `zram->disksize >> PAGE_SHIFT'
>> instead of introducing ->num_pages.
>
> Right. I overlooked it. I just want to send my patch[2/2] and I thought
> it would be better ganesh's patch to merge first although it's orthogonal.
>
> Ganesh, I hope you resend this patch with Sergey's suggestion.
> If you are busy, please tell me. I will do it instead of you.

OK, I will do it today.
Thanks

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
       [not found]       ` <CAHqPoqKZFDSjO1pL+ixYe_m_L0nGNcu04qSNp-jd1fUixKtHnw@mail.gmail.com>
@ 2015-01-29  2:01         ` Minchan Kim
  2015-01-29  2:22           ` Sergey Senozhatsky
  0 siblings, 1 reply; 44+ messages in thread
From: Minchan Kim @ 2015-01-29  2:01 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, linux-kernel, Linux-MM, Nitin Gupta,
	Jerome Marchand, Ganesh Mahendran, sergey.senozhatsky.work

On Thu, Jan 29, 2015 at 10:57:38AM +0900, Sergey Senozhatsky wrote:
> On Thu, Jan 29, 2015 at 8:33 AM, Minchan Kim <minchan@kernel.org> wrote:
> 
> > On Wed, Jan 28, 2015 at 11:56:51PM +0900, Sergey Senozhatsky wrote:
> > > I don't like re-introduced ->init_done.
> > > another idea... how about using `zram->disksize == 0' instead of
> > > `->init_done' (previously `->meta != NULL')? should do the trick.
> >
> > It could be.
> >
> >
> care to change it?

Will try!

> 
> 
> 
> > >
> > >
> > > and I'm not sure I get this rmb...
> >
> > What makes you not sure?
> > I think it's clear and common pattern for smp_[wmb|rmb]. :)
> >
> 
> 
> well... what that "if (ret)" gives? it's almost always true, because the
> device is initialized during read/write operations (in 99.99% of cases).

If it was your concern, I'm happy to remove the check.(ie, actually,
I realized that after I push the button to send). Thanks!

> 
> -ss

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-01-29  2:01         ` Minchan Kim
@ 2015-01-29  2:22           ` Sergey Senozhatsky
  2015-01-29  5:28             ` Minchan Kim
  0 siblings, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2015-01-29  2:22 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel, Linux-MM,
	Nitin Gupta, Jerome Marchand, Ganesh Mahendran,
	sergey.senozhatsky.work

On (01/29/15 11:01), Minchan Kim wrote:
> On Thu, Jan 29, 2015 at 10:57:38AM +0900, Sergey Senozhatsky wrote:
> > On Thu, Jan 29, 2015 at 8:33 AM, Minchan Kim <minchan@kernel.org> wrote:
> > > On Wed, Jan 28, 2015 at 11:56:51PM +0900, Sergey Senozhatsky wrote:
> > > > I don't like re-introduced ->init_done.
> > > > another idea... how about using `zram->disksize == 0' instead of
> > > > `->init_done' (previously `->meta != NULL')? should do the trick.
> > >
> > > It could be.
> > >
> > >
> > care to change it?
> 
> Will try!
> 
> If it was your concern, I'm happy to remove the check.(ie, actually,
> I realized that after I push the button to send). Thanks!
> 

Thanks a lot, Minchan.

and, guys, sorry for previous html email (I'm sure I toggled the "plain
text" mode in gmail web-interface, but somehow it has different meaning
in gmail world).


I'm still concerned about performance numbers that I see on my x86_64.
it's not always, but mostly slower. I'll give it another try (disable
lockdep, etc.), but if we lose 10% on average then, sorry, I'm not so
positive about srcu change and will tend to vote for your initial commit
that simply moved meta free() out of init_lock and left locking as is
(lockdep warning would have been helpful there, because otherwise it
just looked like we change code w/o any reason).

what do you thunk?

	-ss

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-01-29  2:22           ` Sergey Senozhatsky
@ 2015-01-29  5:28             ` Minchan Kim
  2015-01-29  6:06               ` Sergey Senozhatsky
  0 siblings, 1 reply; 44+ messages in thread
From: Minchan Kim @ 2015-01-29  5:28 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel, Linux-MM,
	Nitin Gupta, Jerome Marchand, Ganesh Mahendran

On Thu, Jan 29, 2015 at 11:22:41AM +0900, Sergey Senozhatsky wrote:
> On (01/29/15 11:01), Minchan Kim wrote:
> > On Thu, Jan 29, 2015 at 10:57:38AM +0900, Sergey Senozhatsky wrote:
> > > On Thu, Jan 29, 2015 at 8:33 AM, Minchan Kim <minchan@kernel.org> wrote:
> > > > On Wed, Jan 28, 2015 at 11:56:51PM +0900, Sergey Senozhatsky wrote:
> > > > > I don't like re-introduced ->init_done.
> > > > > another idea... how about using `zram->disksize == 0' instead of
> > > > > `->init_done' (previously `->meta != NULL')? should do the trick.
> > > >
> > > > It could be.
> > > >
> > > >
> > > care to change it?
> > 
> > Will try!
> > 
> > If it was your concern, I'm happy to remove the check.(ie, actually,
> > I realized that after I push the button to send). Thanks!
> > 
> 
> Thanks a lot, Minchan.
> 
> and, guys, sorry for previous html email (I'm sure I toggled the "plain
> text" mode in gmail web-interface, but somehow it has different meaning
> in gmail world).
> 
> 
> I'm still concerned about performance numbers that I see on my x86_64.
> it's not always, but mostly slower. I'll give it another try (disable
> lockdep, etc.), but if we lose 10% on average then, sorry, I'm not so
> positive about srcu change and will tend to vote for your initial commit
> that simply moved meta free() out of init_lock and left locking as is
> (lockdep warning would have been helpful there, because otherwise it
> just looked like we change code w/o any reason).
> 
> what do you thunk?

Surely I agreee with you. If it suffers from 10% performance regression,
it's absolutely no go.

However, I believe it should be no loss because that's one of the reason
from RCU birth which should be really win in read-side lock path compared
to other locking.

Please test it with dd or something for block-based test for removing
noise from FS. I also will test it to confirm that with real machine.

Thanks for the review!



> 
> 	-ss

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-01-29  5:28             ` Minchan Kim
@ 2015-01-29  6:06               ` Sergey Senozhatsky
  2015-01-29  6:35                 ` Minchan Kim
  0 siblings, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2015-01-29  6:06 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	linux-kernel, Linux-MM, Nitin Gupta, Jerome Marchand,
	Ganesh Mahendran

On (01/29/15 14:28), Minchan Kim wrote:
> > I'm still concerned about performance numbers that I see on my x86_64.
> > it's not always, but mostly slower. I'll give it another try (disable
> > lockdep, etc.), but if we lose 10% on average then, sorry, I'm not so
> > positive about srcu change and will tend to vote for your initial commit
> > that simply moved meta free() out of init_lock and left locking as is
> > (lockdep warning would have been helpful there, because otherwise it
> > just looked like we change code w/o any reason).
> > 
> > what do you thunk?
> 
> Surely I agreee with you. If it suffers from 10% performance regression,
> it's absolutely no go.
> 
> However, I believe it should be no loss because that's one of the reason
> from RCU birth which should be really win in read-side lock path compared
> to other locking.
> 
> Please test it with dd or something for block-based test for removing
> noise from FS. I also will test it to confirm that with real machine.
> 

do you test with a single dd thread/process?  just dd if=foo of=bar -c... or
you start N `dd &' processes?

for a single writer there should be no difference, no doubt. I'm more
interested in multi-writer/multi-reader/mixed use cases.

the options that I use are: iozone -t 3 -R -r 16K -s 60M -I +Z
and -I is:
	-I  Use VxFS VX_DIRECT, O_DIRECT,or O_DIRECTIO for all file operations

with O_DIRECT I don't think there is a lot of noise, but I'll try to use
different benchmarks a bit later.


	-ss

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-01-29  6:06               ` Sergey Senozhatsky
@ 2015-01-29  6:35                 ` Minchan Kim
  2015-01-29  7:08                   ` Sergey Senozhatsky
  2015-01-30  0:20                   ` Sergey Senozhatsky
  0 siblings, 2 replies; 44+ messages in thread
From: Minchan Kim @ 2015-01-29  6:35 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel, Linux-MM,
	Nitin Gupta, Jerome Marchand, Ganesh Mahendran

On Thu, Jan 29, 2015 at 03:06:04PM +0900, Sergey Senozhatsky wrote:
> On (01/29/15 14:28), Minchan Kim wrote:
> > > I'm still concerned about performance numbers that I see on my x86_64.
> > > it's not always, but mostly slower. I'll give it another try (disable
> > > lockdep, etc.), but if we lose 10% on average then, sorry, I'm not so
> > > positive about srcu change and will tend to vote for your initial commit
> > > that simply moved meta free() out of init_lock and left locking as is
> > > (lockdep warning would have been helpful there, because otherwise it
> > > just looked like we change code w/o any reason).
> > > 
> > > what do you thunk?
> > 
> > Surely I agreee with you. If it suffers from 10% performance regression,
> > it's absolutely no go.
> > 
> > However, I believe it should be no loss because that's one of the reason
> > from RCU birth which should be really win in read-side lock path compared
> > to other locking.
> > 
> > Please test it with dd or something for block-based test for removing
> > noise from FS. I also will test it to confirm that with real machine.
> > 
> 
> do you test with a single dd thread/process?  just dd if=foo of=bar -c... or
> you start N `dd &' processes?

I tested it with multiple dd processes.

> 
> for a single writer there should be no difference, no doubt. I'm more
> interested in multi-writer/multi-reader/mixed use cases.
> 
> the options that I use are: iozone -t 3 -R -r 16K -s 60M -I +Z
> and -I is:
> 	-I  Use VxFS VX_DIRECT, O_DIRECT,or O_DIRECTIO for all file operations
> 
> with O_DIRECT I don't think there is a lot of noise, but I'll try to use
> different benchmarks a bit later.
> 

As you told, the data was not stable.
Anyway, when I read down_read implementation, it's one atomic instruction.
Hmm, it seems te be better for srcu_read_lock which does more things.
But I guessed most of overhead are from [de]compression, memcpy, clear_page
That's why I guessed we don't have measurable difference from that.
What's the data pattern if you use iozone? I guess it's really simple
pattern compressor can do fast. I used /dev/sda for dd write so
more realistic data. Anyway, if we has 10% regression even if the data
is simple, I never want to merge it.
I will test it carefully and if it turns out lots regression,
surely, I will not go with this and send the original patch again.

Thanks.

> 
> 	-ss

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-01-29  6:35                 ` Minchan Kim
@ 2015-01-29  7:08                   ` Sergey Senozhatsky
  2015-01-30 14:41                     ` Minchan Kim
  2015-01-30  0:20                   ` Sergey Senozhatsky
  1 sibling, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2015-01-29  7:08 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	linux-kernel, Linux-MM, Nitin Gupta, Jerome Marchand,
	Ganesh Mahendran

On (01/29/15 15:35), Minchan Kim wrote:
>
> As you told, the data was not stable.
>
yes. fread test was always slower, and the rest was mostly slower.


> Anyway, when I read down_read implementation, it's one atomic instruction.
> Hmm, it seems te be better for srcu_read_lock which does more things.
>
srcu looks havier, agree.

> But I guessed most of overhead are from [de]compression, memcpy, clear_page
> That's why I guessed we don't have measurable difference from that.
> What's the data pattern if you use iozone?

by "data pattern" you mean usage scenario? well, I usually use zram for
`make -jX', where X=[4..N]. so N concurrent read-write ops scenario.

	-ss

> I guess it's really simple pattern compressor can do fast. I used /dev/sda
> for dd write so more realistic data. Anyway, if we has 10% regression even if
> the data is simple, I never want to merge it.
> I will test it carefully and if it turns out lots regression,
> surely, I will not go with this and send the original patch again.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
       [not found] ` <1422432945-6764-2-git-send-email-minchan@kernel.org>
  2015-01-28 14:56   ` [PATCH v1 2/2] zram: remove init_lock in zram_make_request Sergey Senozhatsky
@ 2015-01-29 13:48   ` Ganesh Mahendran
  2015-01-29 15:12     ` Sergey Senozhatsky
  1 sibling, 1 reply; 44+ messages in thread
From: Ganesh Mahendran @ 2015-01-29 13:48 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, Linux-MM, Nitin Gupta,
	Jerome Marchand, Sergey Senozhatsky

Hello, Minchan:

2015-01-28 16:15 GMT+08:00 Minchan Kim <minchan@kernel.org>:
> Admin could reset zram during I/O operation going on so we have
> used zram->init_lock as read-side lock in I/O path to prevent
> sudden zram meta freeing.

When I/O operation is running, that means the /dev/zram0 is
mounted or swaped on. Then the device could not be reset by
below code:

    /* Do not reset an active device! */
    if (bdev->bd_holders) {
        ret = -EBUSY;
        goto out;
    }

So the zram->init_lock in I/O path is to check whether the device
has been initialized(echo xxx > /sys/block/zram/disk_size).

Is that right?

>
> However, the init_lock is really troublesome.
> We can't do call zram_meta_alloc under init_lock due to lockdep splat

I do not know much about the lockdep.

enabled the CONFIG_LOCKDEP, But from the test, there is no warning
printed. Could you please tell me how this happened?

Thanks.

> because zram_rw_page is one of the function under reclaim path and
> hold it as read_lock while other places in process context hold it
> as write_lock. So, we have used allocation out of the lock to avoid
> lockdep warn but it's not good for readability and fainally, I met
> another lockdep splat between init_lock and cpu_hotpulug from
> kmem_cache_destroy during wokring zsmalloc compaction. :(
>
> Yes, the ideal is to remove horrible init_lock of zram in rw path.
> This patch removes it in rw path and instead, put init_done bool
> variable to check initialization done with smp_[wmb|rmb] and
> srcu_[un]read_lock to prevent sudden zram meta freeing
> during I/O operation.
>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  drivers/block/zram/zram_drv.c | 85 +++++++++++++++++++++++++++++++------------
>  drivers/block/zram/zram_drv.h |  5 +++
>  2 files changed, 66 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index a598ada817f0..b33add453027 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -32,6 +32,7 @@
>  #include <linux/string.h>
>  #include <linux/vmalloc.h>
>  #include <linux/err.h>
> +#include <linux/srcu.h>
>
>  #include "zram_drv.h"
>
> @@ -53,9 +54,31 @@ static ssize_t name##_show(struct device *d,         \
>  }                                                                      \
>  static DEVICE_ATTR_RO(name);
>
> -static inline int init_done(struct zram *zram)
> +static inline bool init_done(struct zram *zram)
>  {
> -       return zram->meta != NULL;
> +       /*
> +        * init_done can be used without holding zram->init_lock in
> +        * read/write handler(ie, zram_make_request) but we should make sure
> +        * that zram->init_done should set up after meta initialization is
> +        * done. Look at setup_init_done.
> +        */
> +       bool ret = zram->init_done;
> +
> +       if (ret)
> +               smp_rmb(); /* pair with setup_init_done */
> +
> +       return ret;
> +}
> +
> +static inline void setup_init_done(struct zram *zram, bool flag)
> +{
> +       /*
> +        * Store operation of struct zram fields should complete
> +        * before zram->init_done set up because zram_bvec_rw
> +        * doesn't hold an zram->init_lock.
> +        */
> +       smp_wmb();
> +       zram->init_done = flag;
>  }
>
>  static inline struct zram *dev_to_zram(struct device *dev)
> @@ -326,6 +349,10 @@ static void zram_meta_free(struct zram_meta *meta)
>         kfree(meta);
>  }
>
> +static void rcu_zram_do_nothing(struct rcu_head *unused)
> +{
> +}
> +
>  static struct zram_meta *zram_meta_alloc(int device_id, u64 disksize)
>  {
>         char pool_name[8];
> @@ -726,11 +753,8 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>                 return;
>         }
>
> -       zcomp_destroy(zram->comp);
>         zram->max_comp_streams = 1;
>
> -       zram_meta_free(zram->meta);
> -       zram->meta = NULL;
>         /* Reset stats */
>         memset(&zram->stats, 0, sizeof(zram->stats));
>
> @@ -738,8 +762,12 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>         if (reset_capacity)
>                 set_capacity(zram->disk, 0);

Should this be after synchronize_srcu()?

>
> +       setup_init_done(zram, false);
> +       call_srcu(&zram->srcu, &zram->rcu, rcu_zram_do_nothing);
> +       synchronize_srcu(&zram->srcu);
> +       zram_meta_free(zram->meta);
> +       zcomp_destroy(zram->comp);
>         up_write(&zram->init_lock);
> -
>         /*
>          * Revalidate disk out of the init_lock to avoid lockdep splat.
>          * It's okay because disk's capacity is protected by init_lock
> @@ -762,10 +790,19 @@ static ssize_t disksize_store(struct device *dev,
>         if (!disksize)
>                 return -EINVAL;
>
> +       down_write(&zram->init_lock);
> +       if (init_done(zram)) {
> +               pr_info("Cannot change disksize for initialized device\n");
> +               up_write(&zram->init_lock);
> +               return -EBUSY;
> +       }
> +
>         disksize = PAGE_ALIGN(disksize);
>         meta = zram_meta_alloc(zram->disk->first_minor, disksize);
> -       if (!meta)
> +       if (!meta) {
> +               up_write(&zram->init_lock);
>                 return -ENOMEM;
> +       }
>
>         comp = zcomp_create(zram->compressor, zram->max_comp_streams);
>         if (IS_ERR(comp)) {
> @@ -775,17 +812,11 @@ static ssize_t disksize_store(struct device *dev,
>                 goto out_free_meta;
>         }
>
> -       down_write(&zram->init_lock);
> -       if (init_done(zram)) {
> -               pr_info("Cannot change disksize for initialized device\n");
> -               err = -EBUSY;
> -               goto out_destroy_comp;
> -       }
> -
>         zram->meta = meta;
>         zram->comp = comp;
>         zram->disksize = disksize;
>         set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
> +       setup_init_done(zram, true);
>         up_write(&zram->init_lock);
>
>         /*
> @@ -797,10 +828,8 @@ static ssize_t disksize_store(struct device *dev,
>
>         return len;
>
> -out_destroy_comp:
> -       up_write(&zram->init_lock);
> -       zcomp_destroy(comp);
>  out_free_meta:
> +       up_write(&zram->init_lock);
>         zram_meta_free(meta);
>         return err;
>  }
> @@ -905,9 +934,10 @@ out:
>   */
>  static void zram_make_request(struct request_queue *queue, struct bio *bio)
>  {
> +       int idx;
>         struct zram *zram = queue->queuedata;
>
> -       down_read(&zram->init_lock);
> +       idx = srcu_read_lock(&zram->srcu);
>         if (unlikely(!init_done(zram)))
>                 goto error;
>
> @@ -918,12 +948,12 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
>         }
>
>         __zram_make_request(zram, bio);
> -       up_read(&zram->init_lock);
> +       srcu_read_unlock(&zram->srcu, idx);
>
>         return;
>
>  error:
> -       up_read(&zram->init_lock);
> +       srcu_read_unlock(&zram->srcu, idx);
>         bio_io_error(bio);
>  }
>
> @@ -945,18 +975,20 @@ static void zram_slot_free_notify(struct block_device *bdev,
>  static int zram_rw_page(struct block_device *bdev, sector_t sector,
>                        struct page *page, int rw)
>  {
> -       int offset, err;
> +       int offset, err, idx;
>         u32 index;
>         struct zram *zram;
>         struct bio_vec bv;
>
>         zram = bdev->bd_disk->private_data;
> +       idx = srcu_read_lock(&zram->srcu);
> +
>         if (!valid_io_request(zram, sector, PAGE_SIZE)) {
>                 atomic64_inc(&zram->stats.invalid_io);
> +               srcu_read_unlock(&zram->srcu, idx);
>                 return -EINVAL;
>         }
>
> -       down_read(&zram->init_lock);
>         if (unlikely(!init_done(zram))) {
>                 err = -EIO;
>                 goto out_unlock;
> @@ -971,7 +1003,7 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
>
>         err = zram_bvec_rw(zram, &bv, index, offset, rw);
>  out_unlock:
> -       up_read(&zram->init_lock);
> +       srcu_read_unlock(&zram->srcu, idx);
>         /*
>          * If I/O fails, just return error(ie, non-zero) without
>          * calling page_endio.
> @@ -1041,6 +1073,11 @@ static int create_device(struct zram *zram, int device_id)
>
>         init_rwsem(&zram->init_lock);
>
> +       if (init_srcu_struct(&zram->srcu)) {
> +               pr_err("Error initialize srcu for device %d\n", device_id);
> +               goto out;
> +       }
> +
>         zram->queue = blk_alloc_queue(GFP_KERNEL);
>         if (!zram->queue) {
>                 pr_err("Error allocating disk queue for device %d\n",
> @@ -1125,8 +1162,8 @@ static void destroy_device(struct zram *zram)
>
>         del_gendisk(zram->disk);
>         put_disk(zram->disk);
> -
>         blk_cleanup_queue(zram->queue);
> +       cleanup_srcu_struct(&zram->srcu);
>  }
>
>  static int __init zram_init(void)
> diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
> index e492f6bf11f1..2042c310aea8 100644
> --- a/drivers/block/zram/zram_drv.h
> +++ b/drivers/block/zram/zram_drv.h
> @@ -105,8 +105,13 @@ struct zram {
>         struct gendisk *disk;
>         struct zcomp *comp;
>
> +       struct srcu_struct srcu;
> +       struct rcu_head rcu;
> +
>         /* Prevent concurrent execution of device init, reset and R/W request */
>         struct rw_semaphore init_lock;
> +       bool init_done;
> +
>         /*
>          * This is the limit on amount of *uncompressed* worth of data
>          * we can store in a disk.
> --
> 1.9.1
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-01-29 13:48   ` Ganesh Mahendran
@ 2015-01-29 15:12     ` Sergey Senozhatsky
  2015-01-30  7:52       ` Ganesh Mahendran
  0 siblings, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2015-01-29 15:12 UTC (permalink / raw)
  To: Ganesh Mahendran
  Cc: Minchan Kim, Andrew Morton, linux-kernel, Linux-MM, Nitin Gupta,
	Jerome Marchand, Sergey Senozhatsky, sergey.senozhatsky.work

On (01/29/15 21:48), Ganesh Mahendran wrote:
> > Admin could reset zram during I/O operation going on so we have
> > used zram->init_lock as read-side lock in I/O path to prevent
> > sudden zram meta freeing.
> 
> When I/O operation is running, that means the /dev/zram0 is
> mounted or swaped on. Then the device could not be reset by
> below code:
> 
>     /* Do not reset an active device! */
>     if (bdev->bd_holders) {
>         ret = -EBUSY;
>         goto out;
>     }
> 
> So the zram->init_lock in I/O path is to check whether the device
> has been initialized(echo xxx > /sys/block/zram/disk_size).
> 

for mounted device (w/fs), we see initial (well, it goes up and down
many times while we create device, but this is not interesting here)
->bd_holders increment in:
  vfs_kern_mount -> mount_bdev -> blkdev_get_by_path -> blkdev_get

and it goes to zero in:
  cleanup_mnt -> deactivate_super -> kill_block_super -> blkdev_put


after umount we still have init device. so, *theoretically*, we
can see something like

	CPU0				CPU1
umount
reset_store			
bdev->bd_holders == 0			mount
...					zram_make_request()
zram_reset_device()

w/o zram->init_lock in both zram_reset_device() and zram_make_request()
one of CPUs will be a bit sad.

	-ss

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-01-29  6:35                 ` Minchan Kim
  2015-01-29  7:08                   ` Sergey Senozhatsky
@ 2015-01-30  0:20                   ` Sergey Senozhatsky
  1 sibling, 0 replies; 44+ messages in thread
From: Sergey Senozhatsky @ 2015-01-30  0:20 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	linux-kernel, Linux-MM, Nitin Gupta, Jerome Marchand,
	Ganesh Mahendran

[-- Attachment #1: Type: text/plain, Size: 41304 bytes --]

On (01/29/15 15:35), Minchan Kim wrote:
> 
> I tested it with multiple dd processes.
> 

Hello,

fio test (http://manpages.ubuntu.com/manpages/natty/man1/fio.1.html) configs are attached,
so we can run fio on different h/w (works against zram with fs mounted at /mnt/)

	for i in ./test-fio-*; do fio ./$i; rm bg*; done

in short

randread
BASE   READ: io=1600.0MB, aggrb=3747.8MB/s, minb=959250KB/s, maxb=982.82MB/s, mint=407msec, maxt=427msec
SRCU   READ: io=1600.0MB, aggrb=3782.6MB/s, minb=968321KB/s, maxb=977.11MB/s, mint=409msec, maxt=423msec

random read/write
BASE   READ: io=820304KB, aggrb=1296.3MB/s, minb=331838KB/s, maxb=333456KB/s, mint=615msec, maxt=618msec
SRCU   READ: io=820304KB, aggrb=1261.6MB/s, minb=322954KB/s, maxb=335639KB/s, mint=611msec, maxt=635msec
BASE  WRITE: io=818096KB, aggrb=1292.8MB/s, minb=330944KB/s, maxb=332559KB/s, mint=615msec, maxt=618msec
SRCU  WRITE: io=818096KB, aggrb=1258.2MB/s, minb=322085KB/s, maxb=334736KB/s, mint=611msec, maxt=635msec

random write
BASE  WRITE: io=1600.0MB, aggrb=692184KB/s, minb=173046KB/s, maxb=174669KB/s, mint=2345msec, maxt=2367msec
SRCU  WRITE: io=1600.0MB, aggrb=672577KB/s, minb=168144KB/s, maxb=174149KB/s, mint=2352msec, maxt=2436msec


detailed per-process metrics:

************* BASE *************

$ for i in ./test-fio-*; do fio ./$i; rm bg*; done
bgreader: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
...
fio-2.2.5
Starting 4 processes
bgreader: Laying out IO file(s) (1 file(s) / 400MB)
bgreader: Laying out IO file(s) (1 file(s) / 400MB)
bgreader: Laying out IO file(s) (1 file(s) / 400MB)
bgreader: Laying out IO file(s) (1 file(s) / 400MB)

bgreader: (groupid=0, jobs=1): err= 0: pid=10792: Thu Jan 29 19:43:30 2015
  read : io=409600KB, bw=959251KB/s, iops=239812, runt=   427msec
    slat (usec): min=2, max=28, avg= 2.43, stdev= 0.62
    clat (usec): min=1, max=151, avg=122.97, stdev= 4.99
     lat (usec): min=4, max=174, avg=125.52, stdev= 5.08
    clat percentiles (usec):
     |  1.00th=[  106],  5.00th=[  116], 10.00th=[  121], 20.00th=[  122],
     | 30.00th=[  122], 40.00th=[  123], 50.00th=[  123], 60.00th=[  124],
     | 70.00th=[  124], 80.00th=[  124], 90.00th=[  125], 95.00th=[  127],
     | 99.00th=[  143], 99.50th=[  143], 99.90th=[  145], 99.95th=[  145],
     | 99.99th=[  147]
    lat (usec) : 2=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
    lat (usec) : 250=99.97%
  cpu          : usr=39.11%, sys=55.04%, ctx=82, majf=0, minf=38
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=102400/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32
bgreader: (groupid=0, jobs=1): err= 0: pid=10793: Thu Jan 29 19:43:30 2015
  read : io=409600KB, bw=982254KB/s, iops=245563, runt=   417msec
    slat (usec): min=2, max=60, avg= 2.42, stdev= 0.56
    clat (usec): min=1, max=187, avg=123.17, stdev= 4.02
     lat (usec): min=3, max=189, avg=125.71, stdev= 4.09
    clat percentiles (usec):
     |  1.00th=[  105],  5.00th=[  121], 10.00th=[  121], 20.00th=[  122],
     | 30.00th=[  122], 40.00th=[  123], 50.00th=[  123], 60.00th=[  124],
     | 70.00th=[  124], 80.00th=[  125], 90.00th=[  126], 95.00th=[  127],
     | 99.00th=[  133], 99.50th=[  135], 99.90th=[  141], 99.95th=[  175],
     | 99.99th=[  187]
    lat (usec) : 2=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
    lat (usec) : 250=99.97%
  cpu          : usr=40.53%, sys=55.88%, ctx=43, majf=0, minf=39
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=102400/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32
bgreader: (groupid=0, jobs=1): err= 0: pid=10794: Thu Jan 29 19:43:30 2015
  read : io=409600KB, bw=982.82MB/s, iops=251597, runt=   407msec
    slat (usec): min=2, max=50, avg= 2.43, stdev= 0.58
    clat (usec): min=2, max=174, avg=123.07, stdev= 3.97
     lat (usec): min=5, max=197, avg=125.62, stdev= 4.02
    clat percentiles (usec):
     |  1.00th=[  106],  5.00th=[  120], 10.00th=[  121], 20.00th=[  122],
     | 30.00th=[  122], 40.00th=[  123], 50.00th=[  123], 60.00th=[  123],
     | 70.00th=[  124], 80.00th=[  125], 90.00th=[  126], 95.00th=[  129],
     | 99.00th=[  135], 99.50th=[  137], 99.90th=[  145], 99.95th=[  153],
     | 99.99th=[  169]
    lat (usec) : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
    lat (usec) : 250=99.98%
  cpu          : usr=41.03%, sys=57.74%, ctx=42, majf=0, minf=37
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=102400/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32
bgreader: (groupid=0, jobs=1): err= 0: pid=10795: Thu Jan 29 19:43:30 2015
  read : io=409600KB, bw=980.41MB/s, iops=250980, runt=   408msec
    slat (usec): min=2, max=357, avg= 2.45, stdev= 1.32
    clat (usec): min=1, max=498, avg=124.23, stdev=12.09
     lat (usec): min=4, max=501, avg=126.80, stdev=12.33
    clat percentiles (usec):
     |  1.00th=[  108],  5.00th=[  121], 10.00th=[  122], 20.00th=[  122],
     | 30.00th=[  123], 40.00th=[  123], 50.00th=[  124], 60.00th=[  124],
     | 70.00th=[  124], 80.00th=[  125], 90.00th=[  126], 95.00th=[  127],
     | 99.00th=[  139], 99.50th=[  141], 99.90th=[  274], 99.95th=[  350],
     | 99.99th=[  498]
    lat (usec) : 2=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
    lat (usec) : 250=99.61%, 500=0.37%
  cpu          : usr=41.18%, sys=58.09%, ctx=86, majf=0, minf=38
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=102400/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: io=1600.0MB, aggrb=3747.8MB/s, minb=959250KB/s, maxb=982.82MB/s, mint=407msec, maxt=427msec

Disk stats (read/write):
  zram0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
bgupdater: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
...
fio-2.2.5
Starting 4 processes
bgupdater: Laying out IO file(s) (1 file(s) / 400MB)
bgupdater: Laying out IO file(s) (1 file(s) / 400MB)
bgupdater: Laying out IO file(s) (1 file(s) / 400MB)
bgupdater: Laying out IO file(s) (1 file(s) / 400MB)

bgupdater: (groupid=0, jobs=1): err= 0: pid=10803: Thu Jan 29 19:43:33 2015
  read : io=205076KB, bw=332916KB/s, iops=83228, runt=   616msec
    slat (usec): min=2, max=38, avg= 2.53, stdev= 0.71
    clat (usec): min=18, max=912, avg=187.23, stdev=18.71
     lat (usec): min=22, max=915, avg=189.88, stdev=18.74
    clat percentiles (usec):
     |  1.00th=[  159],  5.00th=[  167], 10.00th=[  171], 20.00th=[  177],
     | 30.00th=[  181], 40.00th=[  183], 50.00th=[  187], 60.00th=[  191],
     | 70.00th=[  193], 80.00th=[  197], 90.00th=[  203], 95.00th=[  209],
     | 99.00th=[  223], 99.50th=[  233], 99.90th=[  258], 99.95th=[  290],
     | 99.99th=[  908]
    bw (KB  /s): min=332504, max=332504, per=25.05%, avg=332504.00, stdev= 0.00
  write: io=204524KB, bw=332019KB/s, iops=83004, runt=   616msec
    slat (usec): min=4, max=731, avg= 6.21, stdev= 3.42
    clat (usec): min=1, max=910, avg=187.36, stdev=17.80
     lat (usec): min=8, max=918, avg=193.71, stdev=18.16
    clat percentiles (usec):
     |  1.00th=[  159],  5.00th=[  167], 10.00th=[  171], 20.00th=[  177],
     | 30.00th=[  181], 40.00th=[  185], 50.00th=[  187], 60.00th=[  191],
     | 70.00th=[  193], 80.00th=[  197], 90.00th=[  203], 95.00th=[  209],
     | 99.00th=[  223], 99.50th=[  231], 99.90th=[  258], 99.95th=[  282],
     | 99.99th=[  908]
    bw (KB  /s): min=332544, max=332544, per=25.12%, avg=332544.00, stdev= 0.00
    lat (usec) : 2=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=99.80%
    lat (usec) : 500=0.15%, 1000=0.03%
  cpu          : usr=30.36%, sys=68.99%, ctx=185, majf=0, minf=7
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=51269/w=51131/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32
bgupdater: (groupid=0, jobs=1): err= 0: pid=10804: Thu Jan 29 19:43:33 2015
  read : io=205076KB, bw=331838KB/s, iops=82959, runt=   618msec
    slat (usec): min=2, max=99, avg= 2.54, stdev= 0.88
    clat (usec): min=13, max=4676, avg=187.64, stdev=78.01
     lat (usec): min=16, max=4678, avg=190.29, stdev=78.01
    clat percentiles (usec):
     |  1.00th=[  159],  5.00th=[  167], 10.00th=[  171], 20.00th=[  177],
     | 30.00th=[  181], 40.00th=[  183], 50.00th=[  187], 60.00th=[  189],
     | 70.00th=[  193], 80.00th=[  197], 90.00th=[  201], 95.00th=[  207],
     | 99.00th=[  221], 99.50th=[  239], 99.90th=[  342], 99.95th=[  462],
     | 99.99th=[ 4704]
    bw (KB  /s): min=330248, max=330248, per=24.88%, avg=330248.00, stdev= 0.00
  write: io=204524KB, bw=330945KB/s, iops=82736, runt=   618msec
    slat (usec): min=4, max=4458, avg= 6.24, stdev=19.77
    clat (usec): min=2, max=4673, avg=187.90, stdev=80.44
     lat (usec): min=6, max=4680, avg=194.28, stdev=82.87
    clat percentiles (usec):
     |  1.00th=[  159],  5.00th=[  167], 10.00th=[  171], 20.00th=[  177],
     | 30.00th=[  181], 40.00th=[  183], 50.00th=[  187], 60.00th=[  189],
     | 70.00th=[  193], 80.00th=[  197], 90.00th=[  201], 95.00th=[  207],
     | 99.00th=[  219], 99.50th=[  233], 99.90th=[  342], 99.95th=[  462],
     | 99.99th=[ 4640]
    bw (KB  /s): min=330328, max=330328, per=24.95%, avg=330328.00, stdev= 0.00
    lat (usec) : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
    lat (usec) : 250=99.67%, 500=0.28%
    lat (msec) : 10=0.03%
  cpu          : usr=30.10%, sys=68.45%, ctx=67, majf=0, minf=10
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=51269/w=51131/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32
bgupdater: (groupid=0, jobs=1): err= 0: pid=10805: Thu Jan 29 19:43:33 2015
  read : io=205076KB, bw=332916KB/s, iops=83228, runt=   616msec
    slat (usec): min=2, max=4810, avg= 2.63, stdev=21.50
    clat (usec): min=14, max=5018, avg=187.75, stdev=99.05
     lat (usec): min=16, max=5020, avg=190.49, stdev=101.38
    clat percentiles (usec):
     |  1.00th=[  159],  5.00th=[  167], 10.00th=[  171], 20.00th=[  175],
     | 30.00th=[  179], 40.00th=[  183], 50.00th=[  185], 60.00th=[  189],
     | 70.00th=[  191], 80.00th=[  195], 90.00th=[  201], 95.00th=[  207],
     | 99.00th=[  219], 99.50th=[  229], 99.90th=[  262], 99.95th=[  956],
     | 99.99th=[ 5024]
    bw (KB  /s): min=331056, max=331056, per=24.94%, avg=331056.00, stdev= 0.00
  write: io=204524KB, bw=332019KB/s, iops=83004, runt=   616msec
    slat (usec): min=4, max=49, avg= 6.11, stdev= 1.01
    clat (usec): min=2, max=5018, avg=186.97, stdev=70.14
     lat (usec): min=6, max=5024, avg=193.22, stdev=70.15
    clat percentiles (usec):
     |  1.00th=[  159],  5.00th=[  167], 10.00th=[  171], 20.00th=[  177],
     | 30.00th=[  179], 40.00th=[  183], 50.00th=[  187], 60.00th=[  189],
     | 70.00th=[  193], 80.00th=[  195], 90.00th=[  201], 95.00th=[  207],
     | 99.00th=[  217], 99.50th=[  227], 99.90th=[  255], 99.95th=[  948],
     | 99.99th=[ 5024]
    bw (KB  /s): min=331320, max=331320, per=25.03%, avg=331320.00, stdev= 0.00
    lat (usec) : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
    lat (usec) : 250=99.83%, 500=0.09%, 1000=0.03%
    lat (msec) : 10=0.03%
  cpu          : usr=30.52%, sys=68.34%, ctx=70, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=51269/w=51131/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32
bgupdater: (groupid=0, jobs=1): err= 0: pid=10806: Thu Jan 29 19:43:33 2015
  read : io=205076KB, bw=333457KB/s, iops=83364, runt=   615msec
    slat (usec): min=2, max=40, avg= 2.53, stdev= 0.69
    clat (usec): min=18, max=287, avg=186.73, stdev=12.94
     lat (usec): min=21, max=289, avg=189.38, stdev=12.98
    clat percentiles (usec):
     |  1.00th=[  159],  5.00th=[  167], 10.00th=[  171], 20.00th=[  177],
     | 30.00th=[  181], 40.00th=[  183], 50.00th=[  187], 60.00th=[  189],
     | 70.00th=[  193], 80.00th=[  197], 90.00th=[  203], 95.00th=[  207],
     | 99.00th=[  221], 99.50th=[  229], 99.90th=[  258], 99.95th=[  270],
     | 99.99th=[  286]
    bw (KB  /s): min=332680, max=332680, per=25.06%, avg=332680.00, stdev= 0.00
  write: io=204524KB, bw=332559KB/s, iops=83139, runt=   615msec
    slat (usec): min=4, max=28, avg= 6.17, stdev= 1.02
    clat (usec): min=2, max=285, avg=186.95, stdev=12.80
     lat (usec): min=8, max=294, avg=193.26, stdev=12.89
    clat percentiles (usec):
     |  1.00th=[  159],  5.00th=[  167], 10.00th=[  171], 20.00th=[  177],
     | 30.00th=[  181], 40.00th=[  185], 50.00th=[  187], 60.00th=[  191],
     | 70.00th=[  193], 80.00th=[  197], 90.00th=[  203], 95.00th=[  209],
     | 99.00th=[  221], 99.50th=[  227], 99.90th=[  251], 99.95th=[  262],
     | 99.99th=[  274]
    bw (KB  /s): min=332672, max=332672, per=25.13%, avg=332672.00, stdev= 0.00
    lat (usec) : 4=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=99.85%
    lat (usec) : 500=0.14%
  cpu          : usr=30.73%, sys=69.11%, ctx=63, majf=0, minf=7
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=51269/w=51131/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: io=820304KB, aggrb=1296.3MB/s, minb=331838KB/s, maxb=333456KB/s, mint=615msec, maxt=618msec
  WRITE: io=818096KB, aggrb=1292.8MB/s, minb=330944KB/s, maxb=332559KB/s, mint=615msec, maxt=618msec

Disk stats (read/write):
  zram0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
bgwriter: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
...
fio-2.2.5
Starting 4 processes
bgwriter: Laying out IO file(s) (1 file(s) / 400MB)
bgwriter: Laying out IO file(s) (1 file(s) / 400MB)
bgwriter: Laying out IO file(s) (1 file(s) / 400MB)
bgwriter: Laying out IO file(s) (1 file(s) / 400MB)
Jobs: 4 (f=4)
bgwriter: (groupid=0, jobs=1): err= 0: pid=10814: Thu Jan 29 19:43:35 2015
  write: io=409600KB, bw=174150KB/s, iops=43537, runt=  2352msec
    slat (usec): min=11, max=3866, avg=20.72, stdev=14.58
    clat (usec): min=2, max=5074, avg=712.72, stdev=92.15
     lat (usec): min=20, max=5094, avg=733.63, stdev=93.77
    clat percentiles (usec):
     |  1.00th=[  612],  5.00th=[  676], 10.00th=[  684], 20.00th=[  700],
     | 30.00th=[  700], 40.00th=[  708], 50.00th=[  708], 60.00th=[  716],
     | 70.00th=[  716], 80.00th=[  724], 90.00th=[  732], 95.00th=[  740],
     | 99.00th=[  820], 99.50th=[ 1004], 99.90th=[ 1640], 99.95th=[ 2128],
     | 99.99th=[ 5088]
    bw (KB  /s): min=170904, max=174808, per=24.97%, avg=172816.00, stdev=1847.92
    lat (usec) : 4=0.01%, 50=0.01%, 100=0.01%, 250=0.01%, 500=0.24%
    lat (usec) : 750=96.28%, 1000=2.94%
    lat (msec) : 2=0.46%, 4=0.03%, 10=0.03%
  cpu          : usr=10.67%, sys=88.52%, ctx=267, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=102400/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32
bgwriter: (groupid=0, jobs=1): err= 0: pid=10815: Thu Jan 29 19:43:35 2015
  write: io=409600KB, bw=173046KB/s, iops=43261, runt=  2367msec
    slat (usec): min=12, max=1459, avg=20.87, stdev=10.16
    clat (usec): min=2, max=2804, avg=717.25, stdev=69.33
     lat (usec): min=22, max=2825, avg=738.31, stdev=70.84
    clat percentiles (usec):
     |  1.00th=[  628],  5.00th=[  676], 10.00th=[  692], 20.00th=[  700],
     | 30.00th=[  708], 40.00th=[  708], 50.00th=[  716], 60.00th=[  716],
     | 70.00th=[  724], 80.00th=[  732], 90.00th=[  740], 95.00th=[  748],
     | 99.00th=[  812], 99.50th=[ 1080], 99.90th=[ 1800], 99.95th=[ 1992],
     | 99.99th=[ 2640]
    bw (KB  /s): min=171160, max=172768, per=24.82%, avg=171786.00, stdev=761.48
    lat (usec) : 4=0.01%, 50=0.01%, 100=0.01%, 250=0.01%, 500=0.11%
    lat (usec) : 750=95.99%, 1000=3.25%
    lat (msec) : 2=0.60%, 4=0.04%
  cpu          : usr=10.65%, sys=88.64%, ctx=502, majf=0, minf=9
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=102400/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32
bgwriter: (groupid=0, jobs=1): err= 0: pid=10816: Thu Jan 29 19:43:35 2015
  write: io=409600KB, bw=174224KB/s, iops=43555, runt=  2351msec
    slat (usec): min=12, max=280, avg=20.67, stdev= 2.58
    clat (usec): min=2, max=1116, avg=712.42, stdev=31.10
     lat (usec): min=20, max=1145, avg=733.30, stdev=31.89
    clat percentiles (usec):
     |  1.00th=[  628],  5.00th=[  676], 10.00th=[  684], 20.00th=[  700],
     | 30.00th=[  708], 40.00th=[  708], 50.00th=[  716], 60.00th=[  716],
     | 70.00th=[  724], 80.00th=[  724], 90.00th=[  740], 95.00th=[  748],
     | 99.00th=[  772], 99.50th=[  788], 99.90th=[ 1020], 99.95th=[ 1048],
     | 99.99th=[ 1080]
    bw (KB  /s): min=171632, max=174888, per=25.00%, avg=173026.00, stdev=1359.10
    lat (usec) : 4=0.01%, 50=0.01%, 100=0.01%, 250=0.01%, 500=0.11%
    lat (usec) : 750=94.50%, 1000=5.19%
    lat (msec) : 2=0.19%
  cpu          : usr=10.85%, sys=88.77%, ctx=472, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=102400/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32
bgwriter: (groupid=0, jobs=1): err= 0: pid=10817: Thu Jan 29 19:43:35 2015
  write: io=409600KB, bw=174670KB/s, iops=43667, runt=  2345msec
    slat (usec): min=11, max=6296, avg=20.66, stdev=24.99
    clat (usec): min=2, max=5509, avg=708.98, stdev=90.54
     lat (usec): min=22, max=7099, avg=729.83, stdev=94.22
    clat percentiles (usec):
     |  1.00th=[  604],  5.00th=[  676], 10.00th=[  684], 20.00th=[  700],
     | 30.00th=[  700], 40.00th=[  708], 50.00th=[  708], 60.00th=[  716],
     | 70.00th=[  716], 80.00th=[  724], 90.00th=[  732], 95.00th=[  740],
     | 99.00th=[  756], 99.50th=[  772], 99.90th=[  940], 99.95th=[ 1912],
     | 99.99th=[ 5536]
    bw (KB  /s): min=170792, max=176064, per=25.05%, avg=173360.00, stdev=2157.10
    lat (usec) : 4=0.01%, 50=0.01%, 100=0.01%, 250=0.01%, 500=0.32%
    lat (usec) : 750=97.68%, 1000=1.92%
    lat (msec) : 2=0.03%, 10=0.03%
  cpu          : usr=10.79%, sys=88.53%, ctx=252, majf=0, minf=6
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=102400/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: io=1600.0MB, aggrb=692184KB/s, minb=173046KB/s, maxb=174669KB/s, mint=2345msec, maxt=2367msec

Disk stats (read/write):
  zram0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%



==========================================================================================================



************* SRCU *************

$ for i in ./test-fio-*; do fio ./$i; rm bg*; done
bgreader: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
...
fio-2.2.5
Starting 4 processes
bgreader: Laying out IO file(s) (1 file(s) / 400MB)
bgreader: Laying out IO file(s) (1 file(s) / 400MB)
bgreader: Laying out IO file(s) (1 file(s) / 400MB)
bgreader: Laying out IO file(s) (1 file(s) / 400MB)

bgreader: (groupid=0, jobs=1): err= 0: pid=11578: Thu Jan 29 19:45:01 2015
  read : io=409600KB, bw=986988KB/s, iops=246746, runt=   415msec
    slat (usec): min=2, max=708, avg= 2.48, stdev= 2.29
    clat (usec): min=2, max=839, avg=124.69, stdev=16.01
     lat (usec): min=4, max=842, avg=127.29, stdev=16.29
    clat percentiles (usec):
     |  1.00th=[  121],  5.00th=[  122], 10.00th=[  122], 20.00th=[  122],
     | 30.00th=[  123], 40.00th=[  123], 50.00th=[  123], 60.00th=[  123],
     | 70.00th=[  124], 80.00th=[  124], 90.00th=[  125], 95.00th=[  131],
     | 99.00th=[  139], 99.50th=[  223], 99.90th=[  243], 99.95th=[  282],
     | 99.99th=[  836]
    lat (usec) : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
    lat (usec) : 250=99.90%, 500=0.05%, 1000=0.03%
  cpu          : usr=40.48%, sys=57.11%, ctx=46, majf=0, minf=39
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=102400/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32
bgreader: (groupid=0, jobs=1): err= 0: pid=11579: Thu Jan 29 19:45:01 2015
  read : io=409600KB, bw=968322KB/s, iops=242080, runt=   423msec
    slat (usec): min=2, max=88, avg= 2.45, stdev= 0.67
    clat (usec): min=1, max=213, avg=123.41, stdev= 4.23
     lat (usec): min=3, max=216, avg=125.97, stdev= 4.30
    clat percentiles (usec):
     |  1.00th=[  108],  5.00th=[  121], 10.00th=[  122], 20.00th=[  122],
     | 30.00th=[  123], 40.00th=[  123], 50.00th=[  123], 60.00th=[  124],
     | 70.00th=[  124], 80.00th=[  124], 90.00th=[  125], 95.00th=[  126],
     | 99.00th=[  139], 99.50th=[  139], 99.90th=[  145], 99.95th=[  195],
     | 99.99th=[  213]
    lat (usec) : 2=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
    lat (usec) : 250=99.97%
  cpu          : usr=39.24%, sys=55.79%, ctx=85, majf=0, minf=39
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=102400/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32
bgreader: (groupid=0, jobs=1): err= 0: pid=11580: Thu Jan 29 19:45:01 2015
  read : io=409600KB, bw=984615KB/s, iops=246153, runt=   416msec
    slat (usec): min=2, max=51, avg= 2.42, stdev= 0.63
    clat (usec): min=2, max=171, avg=123.01, stdev= 3.46
     lat (usec): min=4, max=174, avg=125.55, stdev= 3.50
    clat percentiles (usec):
     |  1.00th=[  120],  5.00th=[  121], 10.00th=[  121], 20.00th=[  122],
     | 30.00th=[  122], 40.00th=[  122], 50.00th=[  122], 60.00th=[  123],
     | 70.00th=[  123], 80.00th=[  123], 90.00th=[  124], 95.00th=[  126],
     | 99.00th=[  141], 99.50th=[  143], 99.90th=[  145], 99.95th=[  155],
     | 99.99th=[  171]
    lat (usec) : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
    lat (usec) : 250=99.97%
  cpu          : usr=40.38%, sys=56.25%, ctx=82, majf=0, minf=37
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=102400/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32
bgreader: (groupid=0, jobs=1): err= 0: pid=11581: Thu Jan 29 19:45:01 2015
  read : io=409600KB, bw=977.11MB/s, iops=250366, runt=   409msec
    slat (usec): min=2, max=236, avg= 2.46, stdev= 0.98
    clat (usec): min=1, max=370, avg=124.55, stdev=11.16
     lat (usec): min=4, max=372, avg=127.12, stdev=11.37
    clat percentiles (usec):
     |  1.00th=[  107],  5.00th=[  121], 10.00th=[  122], 20.00th=[  122],
     | 30.00th=[  123], 40.00th=[  123], 50.00th=[  124], 60.00th=[  124],
     | 70.00th=[  124], 80.00th=[  125], 90.00th=[  125], 95.00th=[  126],
     | 99.00th=[  175], 99.50th=[  223], 99.90th=[  225], 99.95th=[  278],
     | 99.99th=[  370]
    lat (usec) : 2=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
    lat (usec) : 250=99.92%, 500=0.06%
  cpu          : usr=41.56%, sys=57.95%, ctx=48, majf=0, minf=38
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=102400/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: io=1600.0MB, aggrb=3782.6MB/s, minb=968321KB/s, maxb=977.11MB/s, mint=409msec, maxt=423msec

Disk stats (read/write):
  zram0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
bgupdater: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
...
fio-2.2.5
Starting 4 processes
bgupdater: Laying out IO file(s) (1 file(s) / 400MB)
bgupdater: Laying out IO file(s) (1 file(s) / 400MB)
bgupdater: Laying out IO file(s) (1 file(s) / 400MB)
bgupdater: Laying out IO file(s) (1 file(s) / 400MB)

bgupdater: (groupid=0, jobs=1): err= 0: pid=11592: Thu Jan 29 19:45:04 2015
  read : io=205076KB, bw=334545KB/s, iops=83636, runt=   613msec
    slat (usec): min=2, max=73, avg= 2.56, stdev= 0.73
    clat (usec): min=19, max=310, avg=186.14, stdev=13.49
     lat (usec): min=22, max=312, avg=188.81, stdev=13.55
    clat percentiles (usec):
     |  1.00th=[  159],  5.00th=[  167], 10.00th=[  171], 20.00th=[  177],
     | 30.00th=[  179], 40.00th=[  183], 50.00th=[  187], 60.00th=[  189],
     | 70.00th=[  193], 80.00th=[  197], 90.00th=[  201], 95.00th=[  207],
     | 99.00th=[  229], 99.50th=[  251], 99.90th=[  266], 99.95th=[  274],
     | 99.99th=[  298]
    bw (KB  /s): min=333496, max=333496, per=25.82%, avg=333496.00, stdev= 0.00
  write: io=204524KB, bw=333644KB/s, iops=83411, runt=   613msec
    slat (usec): min=4, max=30, avg= 6.10, stdev= 1.01
    clat (usec): min=2, max=307, avg=186.32, stdev=13.35
     lat (usec): min=8, max=313, avg=192.56, stdev=13.47
    clat percentiles (usec):
     |  1.00th=[  159],  5.00th=[  167], 10.00th=[  171], 20.00th=[  177],
     | 30.00th=[  181], 40.00th=[  183], 50.00th=[  187], 60.00th=[  189],
     | 70.00th=[  193], 80.00th=[  197], 90.00th=[  201], 95.00th=[  207],
     | 99.00th=[  227], 99.50th=[  249], 99.90th=[  266], 99.95th=[  274],
     | 99.99th=[  298]
    bw (KB  /s): min=333368, max=333368, per=25.88%, avg=333368.00, stdev= 0.00
    lat (usec) : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
    lat (usec) : 250=99.47%, 500=0.51%
  cpu          : usr=30.51%, sys=69.00%, ctx=68, majf=0, minf=7
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=51269/w=51131/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32
bgupdater: (groupid=0, jobs=1): err= 0: pid=11593: Thu Jan 29 19:45:04 2015
  read : io=205076KB, bw=334545KB/s, iops=83636, runt=   613msec
    slat (usec): min=2, max=40, avg= 2.53, stdev= 0.65
    clat (usec): min=53, max=3694, avg=186.31, stdev=65.12
     lat (usec): min=55, max=3697, avg=188.96, stdev=65.13
    clat percentiles (usec):
     |  1.00th=[  159],  5.00th=[  167], 10.00th=[  171], 20.00th=[  175],
     | 30.00th=[  179], 40.00th=[  181], 50.00th=[  185], 60.00th=[  187],
     | 70.00th=[  191], 80.00th=[  195], 90.00th=[  201], 95.00th=[  207],
     | 99.00th=[  231], 99.50th=[  249], 99.90th=[  266], 99.95th=[  342],
     | 99.99th=[ 3696]
    bw (KB  /s): min=335528, max=335528, per=25.97%, avg=335528.00, stdev= 0.00
  write: io=204524KB, bw=333644KB/s, iops=83411, runt=   613msec
    slat (usec): min=4, max=160, avg= 6.08, stdev= 1.22
    clat (usec): min=2, max=3693, avg=186.31, stdev=59.38
     lat (usec): min=44, max=3699, avg=192.54, stdev=59.41
    clat percentiles (usec):
     |  1.00th=[  159],  5.00th=[  167], 10.00th=[  171], 20.00th=[  175],
     | 30.00th=[  179], 40.00th=[  183], 50.00th=[  185], 60.00th=[  189],
     | 70.00th=[  191], 80.00th=[  195], 90.00th=[  201], 95.00th=[  205],
     | 99.00th=[  225], 99.50th=[  247], 99.90th=[  262], 99.95th=[  346],
     | 99.99th=[ 3696]
    bw (KB  /s): min=335376, max=335376, per=26.03%, avg=335376.00, stdev= 0.00
    lat (usec) : 4=0.01%, 50=0.01%, 100=0.01%, 250=99.55%, 500=0.41%
    lat (msec) : 4=0.03%
  cpu          : usr=30.34%, sys=68.68%, ctx=64, majf=0, minf=9
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=51269/w=51131/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32
bgupdater: (groupid=0, jobs=1): err= 0: pid=11594: Thu Jan 29 19:45:04 2015
  read : io=205076KB, bw=322954KB/s, iops=80738, runt=   635msec
    slat (usec): min=2, max=9087, avg= 2.75, stdev=41.31
    clat (usec): min=13, max=9287, avg=193.88, stdev=237.09
     lat (usec): min=15, max=9293, avg=196.75, stdev=240.69
    clat percentiles (usec):
     |  1.00th=[  143],  5.00th=[  163], 10.00th=[  169], 20.00th=[  175],
     | 30.00th=[  179], 40.00th=[  181], 50.00th=[  185], 60.00th=[  189],
     | 70.00th=[  191], 80.00th=[  195], 90.00th=[  201], 95.00th=[  209],
     | 99.00th=[  255], 99.50th=[  350], 99.90th=[ 3536], 99.95th=[ 8256],
     | 99.99th=[ 9280]
    bw (KB  /s): min=324248, max=324248, per=25.10%, avg=324248.00, stdev= 0.00
  write: io=204524KB, bw=322085KB/s, iops=80521, runt=   635msec
    slat (usec): min=3, max=3360, avg= 6.20, stdev=15.91
    clat (usec): min=1, max=9286, avg=192.34, stdev=210.71
     lat (usec): min=6, max=9293, avg=198.68, stdev=211.36
    clat percentiles (usec):
     |  1.00th=[  145],  5.00th=[  163], 10.00th=[  169], 20.00th=[  175],
     | 30.00th=[  179], 40.00th=[  183], 50.00th=[  185], 60.00th=[  189],
     | 70.00th=[  191], 80.00th=[  195], 90.00th=[  203], 95.00th=[  209],
     | 99.00th=[  253], 99.50th=[  346], 99.90th=[ 1032], 99.95th=[ 8256],
     | 99.99th=[ 9280]
    bw (KB  /s): min=324656, max=324656, per=25.20%, avg=324656.00, stdev= 0.00
    lat (usec) : 2=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
    lat (usec) : 250=98.84%, 500=0.93%, 750=0.05%
    lat (msec) : 2=0.03%, 4=0.06%, 10=0.06%
  cpu          : usr=29.29%, sys=66.14%, ctx=91, majf=0, minf=7
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=51269/w=51131/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32
bgupdater: (groupid=0, jobs=1): err= 0: pid=11595: Thu Jan 29 19:45:04 2015
  read : io=205076KB, bw=335640KB/s, iops=83909, runt=   611msec
    slat (usec): min=2, max=58, avg= 2.56, stdev= 0.82
    clat (usec): min=17, max=470, avg=185.51, stdev=15.09
     lat (usec): min=20, max=472, avg=188.18, stdev=15.16
    clat percentiles (usec):
     |  1.00th=[  157],  5.00th=[  167], 10.00th=[  171], 20.00th=[  175],
     | 30.00th=[  179], 40.00th=[  181], 50.00th=[  185], 60.00th=[  187],
     | 70.00th=[  191], 80.00th=[  195], 90.00th=[  201], 95.00th=[  209],
     | 99.00th=[  233], 99.50th=[  251], 99.90th=[  270], 99.95th=[  330],
     | 99.99th=[  462]
    bw (KB  /s): min=334544, max=334544, per=25.90%, avg=334544.00, stdev= 0.00
  write: io=204524KB, bw=334736KB/s, iops=83684, runt=   611msec
    slat (usec): min=4, max=90, avg= 6.07, stdev= 1.19
    clat (usec): min=2, max=468, avg=185.70, stdev=14.68
     lat (usec): min=8, max=474, avg=191.91, stdev=14.84
    clat percentiles (usec):
     |  1.00th=[  159],  5.00th=[  167], 10.00th=[  171], 20.00th=[  175],
     | 30.00th=[  179], 40.00th=[  183], 50.00th=[  185], 60.00th=[  189],
     | 70.00th=[  191], 80.00th=[  195], 90.00th=[  203], 95.00th=[  209],
     | 99.00th=[  231], 99.50th=[  249], 99.90th=[  266], 99.95th=[  274],
     | 99.99th=[  462]
    bw (KB  /s): min=334440, max=334440, per=25.96%, avg=334440.00, stdev= 0.00
    lat (usec) : 4=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=99.48%
    lat (usec) : 500=0.51%
  cpu          : usr=30.44%, sys=68.58%, ctx=188, majf=0, minf=7
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=51269/w=51131/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: io=820304KB, aggrb=1261.6MB/s, minb=322954KB/s, maxb=335639KB/s, mint=611msec, maxt=635msec
  WRITE: io=818096KB, aggrb=1258.2MB/s, minb=322085KB/s, maxb=334736KB/s, mint=611msec, maxt=635msec

Disk stats (read/write):
  zram0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
bgwriter: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
...
fio-2.2.5
Starting 4 processes
bgwriter: Laying out IO file(s) (1 file(s) / 400MB)
bgwriter: Laying out IO file(s) (1 file(s) / 400MB)
bgwriter: Laying out IO file(s) (1 file(s) / 400MB)
bgwriter: Laying out IO file(s) (1 file(s) / 400MB)
Jobs: 4 (f=4)
bgwriter: (groupid=0, jobs=1): err= 0: pid=11605: Thu Jan 29 19:45:06 2015
  write: io=409600KB, bw=168144KB/s, iops=42036, runt=  2436msec
    slat (usec): min=10, max=5615, avg=21.50, stdev=37.53
    clat (usec): min=2, max=9808, avg=738.51, stdev=364.25
     lat (usec): min=40, max=9827, avg=760.22, stdev=371.10
    clat percentiles (usec):
     |  1.00th=[  612],  5.00th=[  668], 10.00th=[  684], 20.00th=[  692],
     | 30.00th=[  700], 40.00th=[  700], 50.00th=[  708], 60.00th=[  708],
     | 70.00th=[  716], 80.00th=[  724], 90.00th=[  732], 95.00th=[  756],
     | 99.00th=[ 1576], 99.50th=[ 3056], 99.90th=[ 6816], 99.95th=[ 9408],
     | 99.99th=[ 9792]
    bw (KB  /s): min=157408, max=176360, per=24.65%, avg=165770.00, stdev=8401.50
    lat (usec) : 4=0.01%, 50=0.01%, 100=0.01%, 250=0.01%, 500=0.20%
    lat (usec) : 750=94.24%, 1000=4.14%
    lat (msec) : 2=0.63%, 4=0.45%, 10=0.33%
  cpu          : usr=10.26%, sys=85.22%, ctx=982, majf=0, minf=7
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=102400/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32
bgwriter: (groupid=0, jobs=1): err= 0: pid=11606: Thu Jan 29 19:45:06 2015
  write: io=409600KB, bw=173928KB/s, iops=43481, runt=  2355msec
    slat (usec): min=12, max=1173, avg=20.76, stdev= 7.22
    clat (usec): min=2, max=2247, avg=713.64, stdev=49.05
     lat (usec): min=22, max=2270, avg=734.58, stdev=50.26
    clat percentiles (usec):
     |  1.00th=[  620],  5.00th=[  668], 10.00th=[  684], 20.00th=[  692],
     | 30.00th=[  700], 40.00th=[  708], 50.00th=[  708], 60.00th=[  716],
     | 70.00th=[  716], 80.00th=[  724], 90.00th=[  740], 95.00th=[  756],
     | 99.00th=[  868], 99.50th=[  924], 99.90th=[ 1368], 99.95th=[ 1608],
     | 99.99th=[ 2192]
    bw (KB  /s): min=170448, max=175504, per=25.63%, avg=172406.00, stdev=2166.56
    lat (usec) : 4=0.01%, 50=0.01%, 100=0.01%, 250=0.01%, 500=0.13%
    lat (usec) : 750=92.77%, 1000=6.92%
    lat (msec) : 2=0.17%, 4=0.01%
  cpu          : usr=10.66%, sys=88.66%, ctx=528, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=102400/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32
bgwriter: (groupid=0, jobs=1): err= 0: pid=11607: Thu Jan 29 19:45:06 2015
  write: io=409600KB, bw=172900KB/s, iops=43224, runt=  2369msec
    slat (usec): min=9, max=3458, avg=20.88, stdev=19.13
    clat (usec): min=2, max=4190, avg=718.03, stdev=114.69
     lat (usec): min=22, max=4210, avg=739.09, stdev=116.47
    clat percentiles (usec):
     |  1.00th=[  620],  5.00th=[  668], 10.00th=[  684], 20.00th=[  692],
     | 30.00th=[  700], 40.00th=[  700], 50.00th=[  708], 60.00th=[  708],
     | 70.00th=[  716], 80.00th=[  724], 90.00th=[  732], 95.00th=[  756],
     | 99.00th=[ 1064], 99.50th=[ 1080], 99.90th=[ 2024], 99.95th=[ 3952],
     | 99.99th=[ 4192]
    bw (KB  /s): min=168600, max=173752, per=25.49%, avg=171458.00, stdev=2129.26
    lat (usec) : 4=0.01%, 50=0.01%, 100=0.01%, 250=0.01%, 500=0.14%
    lat (usec) : 750=94.29%, 1000=2.69%
    lat (msec) : 2=2.74%, 4=0.09%, 10=0.03%
  cpu          : usr=10.55%, sys=87.38%, ctx=373, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=102400/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32
bgwriter: (groupid=0, jobs=1): err= 0: pid=11608: Thu Jan 29 19:45:06 2015
  write: io=409600KB, bw=174150KB/s, iops=43537, runt=  2352msec
    slat (usec): min=9, max=5199, avg=20.79, stdev=26.32
    clat (usec): min=2, max=5915, avg=712.94, stdev=149.06
     lat (usec): min=20, max=5936, avg=733.91, stdev=151.50
    clat percentiles (usec):
     |  1.00th=[  628],  5.00th=[  668], 10.00th=[  684], 20.00th=[  692],
     | 30.00th=[  700], 40.00th=[  708], 50.00th=[  708], 60.00th=[  716],
     | 70.00th=[  716], 80.00th=[  724], 90.00th=[  732], 95.00th=[  740],
     | 99.00th=[  780], 99.50th=[  836], 99.90th=[ 2512], 99.95th=[ 5536],
     | 99.99th=[ 5920]
    bw (KB  /s): min=170872, max=174024, per=25.68%, avg=172714.00, stdev=1386.45
    lat (usec) : 4=0.01%, 50=0.01%, 100=0.01%, 250=0.01%, 500=0.10%
    lat (usec) : 750=97.10%, 1000=2.55%
    lat (msec) : 2=0.08%, 4=0.07%, 10=0.09%
  cpu          : usr=10.50%, sys=88.56%, ctx=260, majf=0, minf=6
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=102400/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: io=1600.0MB, aggrb=672577KB/s, minb=168144KB/s, maxb=174149KB/s, mint=2352msec, maxt=2436msec

Disk stats (read/write):
  zram0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%



	-ss

[-- Attachment #2: test-fio-randread --]
[-- Type: text/plain, Size: 126 bytes --]

[global]
size=400m
directory=/mnt/
ioengine=libaio
iodepth=32
direct=1
invalidate=1
numjobs=4
loops=4

[bgreader]
rw=randread

[-- Attachment #3: test-fio-randrw --]
[-- Type: text/plain, Size: 125 bytes --]

[global]
size=400m
directory=/mnt/
ioengine=libaio
iodepth=32
direct=1
invalidate=1
numjobs=4
loops=5

[bgupdater]
rw=randrw

[-- Attachment #4: test-fio-randwrite --]
[-- Type: text/plain, Size: 127 bytes --]

[global]
size=400m
directory=/mnt/
ioengine=libaio
iodepth=32
direct=1
invalidate=1
numjobs=4
loops=5

[bgwriter]
rw=randwrite

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-01-29 15:12     ` Sergey Senozhatsky
@ 2015-01-30  7:52       ` Ganesh Mahendran
  2015-01-30  8:08         ` Sergey Senozhatsky
  0 siblings, 1 reply; 44+ messages in thread
From: Ganesh Mahendran @ 2015-01-30  7:52 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Minchan Kim, Andrew Morton, linux-kernel, Linux-MM, Nitin Gupta,
	Jerome Marchand, sergey.senozhatsky.work

Hello Sergey

2015-01-29 23:12 GMT+08:00 Sergey Senozhatsky <sergey.senozhatsky@gmail.com>:
> On (01/29/15 21:48), Ganesh Mahendran wrote:
>> > Admin could reset zram during I/O operation going on so we have
>> > used zram->init_lock as read-side lock in I/O path to prevent
>> > sudden zram meta freeing.
>>
>> When I/O operation is running, that means the /dev/zram0 is
>> mounted or swaped on. Then the device could not be reset by
>> below code:
>>
>>     /* Do not reset an active device! */
>>     if (bdev->bd_holders) {
>>         ret = -EBUSY;
>>         goto out;
>>     }
>>
>> So the zram->init_lock in I/O path is to check whether the device
>> has been initialized(echo xxx > /sys/block/zram/disk_size).
>>

Thanks for your explanation.

>
> for mounted device (w/fs), we see initial (well, it goes up and down

What does "w/" mean?

> many times while we create device, but this is not interesting here)
> ->bd_holders increment in:
>   vfs_kern_mount -> mount_bdev -> blkdev_get_by_path -> blkdev_get
>
> and it goes to zero in:
>   cleanup_mnt -> deactivate_super -> kill_block_super -> blkdev_put
>
>
> after umount we still have init device. so, *theoretically*, we
> can see something like
>
>         CPU0                            CPU1
> umount
> reset_store
> bdev->bd_holders == 0                   mount
> ...                                     zram_make_request()
> zram_reset_device()

In this example, the data stored in zram will be corrupted.
Since CPU0 will free meta while CPU1 is using.
right?


>
> w/o zram->init_lock in both zram_reset_device() and zram_make_request()
> one of CPUs will be a bit sad.
what does "w/o" mean?

Thanks

>
>         -ss

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-01-30  7:52       ` Ganesh Mahendran
@ 2015-01-30  8:08         ` Sergey Senozhatsky
  2015-01-31  8:50           ` Ganesh Mahendran
  0 siblings, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2015-01-30  8:08 UTC (permalink / raw)
  To: Ganesh Mahendran
  Cc: Sergey Senozhatsky, Minchan Kim, Andrew Morton, linux-kernel,
	Linux-MM, Nitin Gupta, Jerome Marchand, sergey.senozhatsky.work

On (01/30/15 15:52), Ganesh Mahendran wrote:
> >> When I/O operation is running, that means the /dev/zram0 is
> >> mounted or swaped on. Then the device could not be reset by
> >> below code:
> >>
> >>     /* Do not reset an active device! */
> >>     if (bdev->bd_holders) {
> >>         ret = -EBUSY;
> >>         goto out;
> >>     }
> >>
> >> So the zram->init_lock in I/O path is to check whether the device
> >> has been initialized(echo xxx > /sys/block/zram/disk_size).
> >>
> 
> Thanks for your explanation.
> 
> >
> > for mounted device (w/fs), we see initial (well, it goes up and down
> 
> What does "w/" mean?

'with fs'

> > many times while we create device, but this is not interesting here)
> > ->bd_holders increment in:
> >   vfs_kern_mount -> mount_bdev -> blkdev_get_by_path -> blkdev_get
> >
> > and it goes to zero in:
> >   cleanup_mnt -> deactivate_super -> kill_block_super -> blkdev_put
> >
> >
> > after umount we still have init device. so, *theoretically*, we
> > can see something like
> >
> >         CPU0                            CPU1
> > umount
> > reset_store
> > bdev->bd_holders == 0                   mount
> > ...                                     zram_make_request()
> > zram_reset_device()
> 
> In this example, the data stored in zram will be corrupted.
> Since CPU0 will free meta while CPU1 is using.
> right?
> 

with out ->init_lock protection in this case we have 'free' vs. 'use' race.

> 
> >
> > w/o zram->init_lock in both zram_reset_device() and zram_make_request()
> > one of CPUs will be a bit sad.
> what does "w/o" mean?

'with out'


	-ss

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-01-29  7:08                   ` Sergey Senozhatsky
@ 2015-01-30 14:41                     ` Minchan Kim
  2015-01-31 11:31                       ` Sergey Senozhatsky
  2015-02-01 14:50                       ` Sergey Senozhatsky
  0 siblings, 2 replies; 44+ messages in thread
From: Minchan Kim @ 2015-01-30 14:41 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel, Linux-MM,
	Nitin Gupta, Jerome Marchand, Ganesh Mahendran

Hello, Sergey

On Thu, Jan 29, 2015 at 04:08:35PM +0900, Sergey Senozhatsky wrote:
> On (01/29/15 15:35), Minchan Kim wrote:
> >
> > As you told, the data was not stable.
> >
> yes. fread test was always slower, and the rest was mostly slower.
> 
> 
> > Anyway, when I read down_read implementation, it's one atomic instruction.
> > Hmm, it seems te be better for srcu_read_lock which does more things.
> >
> srcu looks havier, agree.

ffffffff8172c350 <down_read>:
ffffffff8172c350:       e8 7b 3f 00 00          callq  ffffffff817302d0 <__fentry__>
ffffffff8172c355:       55                      push   %rbp
ffffffff8172c356:       48 89 e5                mov    %rsp,%rbp
ffffffff8172c359:       53                      push   %rbx
ffffffff8172c35a:       48 89 fb                mov    %rdi,%rbx
ffffffff8172c35d:       48 83 ec 08             sub    $0x8,%rsp
ffffffff8172c361:       e8 9a e0 ff ff          callq  ffffffff8172a400 <_cond_resched>
ffffffff8172c366:       48 89 d8                mov    %rbx,%rax
ffffffff8172c369:       f0 48 ff 00             lock incq (%rax)
ffffffff8172c36d:       79 05                   jns    ffffffff8172c374 <down_read+0x24>
ffffffff8172c36f:       e8 5c e7 c4 ff          callq  ffffffff8137aad0 <call_rwsem_down_read_failed>
ffffffff8172c374:       48 83 c4 08             add    $0x8,%rsp
ffffffff8172c378:       5b                      pop    %rbx
ffffffff8172c379:       5d                      pop    %rbp
ffffffff8172c37a:       c3                      retq   


ffffffff810eeec0 <__srcu_read_lock>:
ffffffff810eeec0:       e8 0b 14 64 00          callq  ffffffff817302d0 <__fentry__>
ffffffff810eeec5:       48 8b 07                mov    (%rdi),%rax
ffffffff810eeec8:       55                      push   %rbp
ffffffff810eeec9:       48 89 e5                mov    %rsp,%rbp
ffffffff810eeecc:       83 e0 01                and    $0x1,%eax
ffffffff810eeecf:       48 63 d0                movslq %eax,%rdx
ffffffff810eeed2:       48 8b 4f 08             mov    0x8(%rdi),%rcx
ffffffff810eeed6:       65 48 ff 04 d1          incq   %gs:(%rcx,%rdx,8)
ffffffff810eeedb:       0f ae f0                mfence 
ffffffff810eeede:       48 83 c2 02             add    $0x2,%rdx
ffffffff810eeee2:       48 8b 4f 08             mov    0x8(%rdi),%rcx
ffffffff810eeee6:       65 48 ff 04 d1          incq   %gs:(%rcx,%rdx,8)
ffffffff810eeeeb:       5d                      pop    %rbp
ffffffff810eeeec:       c3                      retq   

Yes, __srcu_read_lock is a little bit heavier but the number of instruction
are not too much difference to make difference 10%. A culprit is
__cond_resched but I don't think, either because our test was CPU intensive
soS I don't think schedule latency affects total bandwidth.

More cuprit is your data pattern.
It seems you didn't use scramble_buffers=0, zero_buffers in fio so that
fio fills random data pattern so zram bandwidth could be different by
compression/decompression ratio.

I did test your fio script adding above options with my 4 CPU real machine
(NOTE, ubuntu fio is old so that it doesn't work well above two options
so I should update fio recently which solves it perfectly)

Another thing about fio is it seems loops option works with write test
with overwrite=1 options while read test doesn't work so that I should
use perf stat -r options to verify stdev.

In addition, I passed first test to remove noise as creating files
and increased testsize as 1G from 400m

1) randread

= vanilla =

 Performance counter stats for 'fio test-fio-randread.txt' (10 runs):

       4713.879241      task-clock (msec)         #    3.160 CPUs utilized            ( +-  0.62% )
             1,131      context-switches          #    0.240 K/sec                    ( +-  2.83% )
                23      cpu-migrations            #    0.005 K/sec                    ( +-  4.40% )
            15,767      page-faults               #    0.003 M/sec                    ( +-  0.03% )
    15,134,497,088      cycles                    #    3.211 GHz                      ( +-  0.15% ) [83.36%]
    10,763,665,604      stalled-cycles-frontend   #   71.12% frontend cycles idle     ( +-  0.22% ) [83.34%]
     6,896,294,076      stalled-cycles-backend    #   45.57% backend  cycles idle     ( +-  0.29% ) [66.67%]
     9,898,608,791      instructions              #    0.65  insns per cycle        
                                                  #    1.09  stalled cycles per insn  ( +-  0.07% ) [83.33%]
     1,852,167,485      branches                  #  392.918 M/sec                    ( +-  0.07% ) [83.34%]
        14,864,143      branch-misses             #    0.80% of all branches          ( +-  0.16% ) [83.34%]

       1.491813361 seconds time elapsed                                          ( +-  0.62% )

= srcu =

 Performance counter stats for 'fio test-fio-randread.txt' (10 runs):

       4752.790715      task-clock (msec)         #    3.166 CPUs utilized            ( +-  0.48% )
             1,179      context-switches          #    0.248 K/sec                    ( +-  1.56% )
                26      cpu-migrations            #    0.005 K/sec                    ( +-  3.91% )
            15,764      page-faults               #    0.003 M/sec                    ( +-  0.02% )
    15,263,869,915      cycles                    #    3.212 GHz                      ( +-  0.25% ) [83.32%]
    10,935,658,177      stalled-cycles-frontend   #   71.64% frontend cycles idle     ( +-  0.38% ) [83.33%]
     7,067,290,320      stalled-cycles-backend    #   46.30% backend  cycles idle     ( +-  0.46% ) [66.64%]
     9,896,513,423      instructions              #    0.65  insns per cycle        
                                                  #    1.11  stalled cycles per insn  ( +-  0.07% ) [83.33%]
     1,847,612,285      branches                  #  388.743 M/sec                    ( +-  0.07% ) [83.38%]
        14,814,815      branch-misses             #    0.80% of all branches          ( +-  0.24% ) [83.37%]

       1.501284082 seconds time elapsed                                          ( +-  0.50% )

srcu is worse as 0.63% but the difference is really marginal.

2) randwrite

= vanilla =

 Performance counter stats for 'fio test-fio-randwrite.txt' (10 runs):

       6283.823490      task-clock (msec)         #    3.332 CPUs utilized            ( +-  0.44% )
             1,536      context-switches          #    0.245 K/sec                    ( +-  2.10% )
                25      cpu-migrations            #    0.004 K/sec                    ( +-  3.79% )
            15,914      page-faults               #    0.003 M/sec                    ( +-  0.02% )
    20,408,942,915      cycles                    #    3.248 GHz                      ( +-  0.40% ) [83.34%]
    14,398,424,739      stalled-cycles-frontend   #   70.55% frontend cycles idle     ( +-  0.62% ) [83.36%]
     9,513,822,555      stalled-cycles-backend    #   46.62% backend  cycles idle     ( +-  0.62% ) [66.65%]
    13,507,376,783      instructions              #    0.66  insns per cycle        
                                                  #    1.07  stalled cycles per insn  ( +-  0.05% ) [83.36%]
     3,155,423,934      branches                  #  502.150 M/sec                    ( +-  0.05% ) [83.34%]
        18,381,090      branch-misses             #    0.58% of all branches          ( +-  0.16% ) [83.34%]

       1.885926070 seconds time elapsed                                          ( +-  0.61% )

= srcu =

 Performance counter stats for 'fio test-fio-randwrite.txt' (10 runs):

       6152.997119      task-clock (msec)         #    3.304 CPUs utilized            ( +-  0.29% )
             1,422      context-switches          #    0.231 K/sec                    ( +-  3.45% )
                28      cpu-migrations            #    0.004 K/sec                    ( +-  7.47% )
            15,921      page-faults               #    0.003 M/sec                    ( +-  0.02% )
    19,862,315,430      cycles                    #    3.228 GHz                      ( +-  0.09% ) [83.33%]
    13,872,541,761      stalled-cycles-frontend   #   69.84% frontend cycles idle     ( +-  0.12% ) [83.34%]
     9,074,883,552      stalled-cycles-backend    #   45.69% backend  cycles idle     ( +-  0.19% ) [66.71%]
    13,494,854,651      instructions              #    0.68  insns per cycle        
                                                  #    1.03  stalled cycles per insn  ( +-  0.03% ) [83.37%]
     3,148,938,955      branches                  #  511.773 M/sec                    ( +-  0.04% ) [83.33%]
        17,701,249      branch-misses             #    0.56% of all branches          ( +-  0.23% ) [83.34%]

       1.862543230 seconds time elapsed                                          ( +-  0.35% )

srcu is better as 1.24% is better.

3) randrw

= vanilla =

 Performance counter stats for 'fio test-fio-randrw.txt' (10 runs):

       5609.976477      task-clock (msec)         #    3.249 CPUs utilized            ( +-  0.34% )
             1,407      context-switches          #    0.251 K/sec                    ( +-  0.96% )
                25      cpu-migrations            #    0.004 K/sec                    ( +-  5.37% )
            15,906      page-faults               #    0.003 M/sec                    ( +-  0.05% )
    18,090,560,346      cycles                    #    3.225 GHz                      ( +-  0.35% ) [83.36%]
    12,885,393,954      stalled-cycles-frontend   #   71.23% frontend cycles idle     ( +-  0.53% ) [83.33%]
     8,570,185,547      stalled-cycles-backend    #   47.37% backend  cycles idle     ( +-  0.59% ) [66.67%]
    11,771,620,352      instructions              #    0.65  insns per cycle        
                                                  #    1.09  stalled cycles per insn  ( +-  0.05% ) [83.35%]
     2,508,014,871      branches                  #  447.063 M/sec                    ( +-  0.05% ) [83.34%]
        18,585,638      branch-misses             #    0.74% of all branches          ( +-  0.23% ) [83.35%]

       1.726691239 seconds time elapsed                                          ( +-  0.40% )

= srcu =

       5475.312828      task-clock (msec)         #    3.246 CPUs utilized            ( +-  0.59% )
             1,399      context-switches          #    0.255 K/sec                    ( +-  1.46% )
                24      cpu-migrations            #    0.004 K/sec                    ( +-  6.27% )
            15,916      page-faults               #    0.003 M/sec                    ( +-  0.04% )
    17,583,197,041      cycles                    #    3.211 GHz                      ( +-  0.11% ) [83.33%]
    12,352,657,985      stalled-cycles-frontend   #   70.25% frontend cycles idle     ( +-  0.16% ) [83.33%]
     8,173,164,212      stalled-cycles-backend    #   46.48% backend  cycles idle     ( +-  0.19% ) [66.70%]
    11,780,176,340      instructions              #    0.67  insns per cycle        
                                                  #    1.05  stalled cycles per insn  ( +-  0.05% ) [83.36%]
     2,506,722,383      branches                  #  457.823 M/sec                    ( +-  0.06% ) [83.35%]
        18,436,877      branch-misses             #    0.74% of all branches          ( +-  0.18% ) [83.32%]

       1.686877512 seconds time elapsed                                          ( +-  0.43% )

srcu is better as 2.3%

Srcu is better than down_read but I don't believe either because when I
did perf record, [up|down]_read and srcu_read_[lock|unlock] is really
minor (about 0.5%) so that I think it's really marginal.
(for example, if we removes srcu_read_[un]lock totally, we just enhance
about 1%) So, I don't think it's worth.

Okay, if you concerns on the data still, how about this?
Even, it would be smaller instructions than [up|down]_read so I guess
it could remove your performance concern. But I don't believe
it could make significant difference, either.
Hope it addresses your concern.

Thanks.

>From e3f0965e692a3d085bb4ff25a774291e3f269550 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Fri, 30 Jan 2015 09:57:37 +0900
Subject: [PATCH] zram: remove init_lock in zram_make_request

Admin could reset zram during I/O operation going on so we have
used zram->init_lock as read-side lock in I/O path to prevent
sudden zram meta freeing.

However, the init_lock is really troublesome.
We can't do call zram_meta_alloc under init_lock due to lockdep splat
because zram_rw_page is one of the function under reclaim path and
hold it as read_lock while other places in process context hold it
as write_lock. So, we have used allocation out of the lock to avoid
lockdep warn but it's not good for readability and fainally, I met
another lockdep splat between init_lock and cpu_hotplug from
kmem_cache_destroy during working zsmalloc compaction. :(

Yes, the ideal is to remove horrible init_lock of zram in rw path.
This patch removes it in rw path and instead, add atomic refcount
for meta lifetime management and completion to free meta in process
context. It's important to free meta in process context because
some of resource destruction needs mutex lock, which could be held
if we releases the resource in reclaim context so it's deadlock,
again.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 72 +++++++++++++++++++++++++++++++------------
 drivers/block/zram/zram_drv.h |  2 ++
 2 files changed, 54 insertions(+), 20 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index aa5a4c54f057..9c69c35eace9 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -55,7 +55,7 @@ static DEVICE_ATTR_RO(name);
 
 static inline int init_done(struct zram *zram)
 {
-	return zram->meta != NULL;
+	return zram->disksize != 0;
 }
 
 static inline struct zram *dev_to_zram(struct device *dev)
@@ -350,6 +350,8 @@ static struct zram_meta *zram_meta_alloc(int device_id, u64 disksize)
 		goto out_error;
 	}
 
+	init_completion(&meta->complete);
+	atomic_set(&meta->refcount, 1);
 	return meta;
 
 out_error:
@@ -358,6 +360,23 @@ out_error:
 	return NULL;
 }
 
+static inline bool zram_meta_get(struct zram_meta *meta)
+{
+	if (!atomic_inc_not_zero(&meta->refcount))
+		return false;
+	return true;
+}
+
+/*
+ * We want to free zram_meta in process context to avoid
+ * deadlock between reclaim path and any other locks
+ */
+static inline void zram_meta_put(struct zram_meta *meta)
+{
+	if (atomic_dec_and_test(&meta->refcount))
+		complete(&meta->complete);
+}
+
 static void update_position(u32 *index, int *offset, struct bio_vec *bvec)
 {
 	if (*offset + bvec->bv_len >= PAGE_SIZE)
@@ -719,6 +738,9 @@ static void zram_bio_discard(struct zram *zram, u32 index,
 
 static void zram_reset_device(struct zram *zram, bool reset_capacity)
 {
+	struct zram_meta *meta;
+	u64 disksize;
+
 	down_write(&zram->init_lock);
 
 	zram->limit_pages = 0;
@@ -728,14 +750,20 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
 		return;
 	}
 
+	meta = zram->meta;
+
 	zcomp_destroy(zram->comp);
 	zram->max_comp_streams = 1;
-	zram_meta_free(zram->meta, zram->disksize);
-	zram->meta = NULL;
+	disksize = zram->disksize;
+	zram_meta_put(meta);
+	/* Read/write handler will not handle further I/O operation. */
+	zram->disksize = 0;
+	wait_for_completion(&meta->complete);
+	/* I/O operation under all of CPU are done so let's free */
+	zram_meta_free(zram->meta, disksize);
 	/* Reset stats */
 	memset(&zram->stats, 0, sizeof(zram->stats));
 
-	zram->disksize = 0;
 	if (reset_capacity)
 		set_capacity(zram->disk, 0);
 
@@ -908,23 +936,25 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
 {
 	struct zram *zram = queue->queuedata;
 
-	down_read(&zram->init_lock);
-	if (unlikely(!init_done(zram)))
+	if (unlikely(!zram_meta_get(zram->meta)))
 		goto error;
 
+	if (unlikely(!init_done(zram)))
+		goto put_meta;
+
 	if (!valid_io_request(zram, bio->bi_iter.bi_sector,
 					bio->bi_iter.bi_size)) {
 		atomic64_inc(&zram->stats.invalid_io);
-		goto error;
+		goto put_meta;
 	}
 
 	__zram_make_request(zram, bio);
-	up_read(&zram->init_lock);
+	zram_meta_put(zram->meta);
 
 	return;
-
+put_meta:
+	zram_meta_put(zram->meta);
 error:
-	up_read(&zram->init_lock);
 	bio_io_error(bio);
 }
 
@@ -946,21 +976,22 @@ static void zram_slot_free_notify(struct block_device *bdev,
 static int zram_rw_page(struct block_device *bdev, sector_t sector,
 		       struct page *page, int rw)
 {
-	int offset, err;
+	int offset, err = -EIO;
 	u32 index;
 	struct zram *zram;
 	struct bio_vec bv;
 
 	zram = bdev->bd_disk->private_data;
+	if (unlikely(!zram_meta_get(zram->meta)))
+		goto out;
+
+	if (unlikely(!init_done(zram)))
+		goto put_meta;
+
 	if (!valid_io_request(zram, sector, PAGE_SIZE)) {
 		atomic64_inc(&zram->stats.invalid_io);
-		return -EINVAL;
-	}
-
-	down_read(&zram->init_lock);
-	if (unlikely(!init_done(zram))) {
-		err = -EIO;
-		goto out_unlock;
+		err = -EINVAL;
+		goto put_meta;
 	}
 
 	index = sector >> SECTORS_PER_PAGE_SHIFT;
@@ -971,8 +1002,9 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
 	bv.bv_offset = 0;
 
 	err = zram_bvec_rw(zram, &bv, index, offset, rw);
-out_unlock:
-	up_read(&zram->init_lock);
+put_meta:
+	zram_meta_put(zram->meta);
+out:
 	/*
 	 * If I/O fails, just return error(ie, non-zero) without
 	 * calling page_endio.
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index b05a816b09ac..07e55ff84a9c 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -96,6 +96,8 @@ struct zram_stats {
 struct zram_meta {
 	struct zram_table_entry *table;
 	struct zs_pool *mem_pool;
+	atomic_t refcount;
+	struct completion complete; /* notify IO under all of cpu are done */
 };
 
 struct zram {
-- 
1.9.1


> 
> > But I guessed most of overhead are from [de]compression, memcpy, clear_page
> > That's why I guessed we don't have measurable difference from that.
> > What's the data pattern if you use iozone?
> 
> by "data pattern" you mean usage scenario? well, I usually use zram for
> `make -jX', where X=[4..N]. so N concurrent read-write ops scenario.

What I meant is what data fills I/O buffer, which is really important
to evaluate zram because the compression/decompression speeds relys on it.

> 
> 	-ss
> 
> > I guess it's really simple pattern compressor can do fast. I used /dev/sda
> > for dd write so more realistic data. Anyway, if we has 10% regression even if
> > the data is simple, I never want to merge it.
> > I will test it carefully and if it turns out lots regression,
> > surely, I will not go with this and send the original patch again.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-01-30  8:08         ` Sergey Senozhatsky
@ 2015-01-31  8:50           ` Ganesh Mahendran
  2015-01-31 11:07             ` Sergey Senozhatsky
  0 siblings, 1 reply; 44+ messages in thread
From: Ganesh Mahendran @ 2015-01-31  8:50 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Minchan Kim, Andrew Morton, linux-kernel,
	Linux-MM, Nitin Gupta, Jerome Marchand

2015-01-30 16:08 GMT+08:00 Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com>:
> On (01/30/15 15:52), Ganesh Mahendran wrote:
>> >> When I/O operation is running, that means the /dev/zram0 is
>> >> mounted or swaped on. Then the device could not be reset by
>> >> below code:
>> >>
>> >>     /* Do not reset an active device! */
>> >>     if (bdev->bd_holders) {
>> >>         ret = -EBUSY;
>> >>         goto out;
>> >>     }
>> >>
>> >> So the zram->init_lock in I/O path is to check whether the device
>> >> has been initialized(echo xxx > /sys/block/zram/disk_size).
>> >>
>>
>> Thanks for your explanation.
>>
>> >
>> > for mounted device (w/fs), we see initial (well, it goes up and down
>>
>> What does "w/" mean?
>
> 'with fs'
>
>> > many times while we create device, but this is not interesting here)
>> > ->bd_holders increment in:
>> >   vfs_kern_mount -> mount_bdev -> blkdev_get_by_path -> blkdev_get
>> >
>> > and it goes to zero in:
>> >   cleanup_mnt -> deactivate_super -> kill_block_super -> blkdev_put
>> >
>> >
>> > after umount we still have init device. so, *theoretically*, we
>> > can see something like
>> >
>> >         CPU0                            CPU1
>> > umount
>> > reset_store
>> > bdev->bd_holders == 0                   mount
>> > ...                                     zram_make_request()
>> > zram_reset_device()
>>
>> In this example, the data stored in zram will be corrupted.
>> Since CPU0 will free meta while CPU1 is using.
>> right?
>>
>
> with out ->init_lock protection in this case we have 'free' vs. 'use' race.

Maybe I did not explain clearly. I send a patch about this issue:

https://patchwork.kernel.org/patch/5754041/

Thanks

>
>>
>> >
>> > w/o zram->init_lock in both zram_reset_device() and zram_make_request()
>> > one of CPUs will be a bit sad.
>> what does "w/o" mean?
>
> 'with out'
>
>
>         -ss

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-01-31  8:50           ` Ganesh Mahendran
@ 2015-01-31 11:07             ` Sergey Senozhatsky
  2015-01-31 12:59               ` Ganesh Mahendran
  0 siblings, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2015-01-31 11:07 UTC (permalink / raw)
  To: Ganesh Mahendran
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Minchan Kim,
	Andrew Morton, linux-kernel, Linux-MM, Nitin Gupta,
	Jerome Marchand

On (01/31/15 16:50), Ganesh Mahendran wrote:
> >> > after umount we still have init device. so, *theoretically*, we
> >> > can see something like
> >> >
> >> >         CPU0                            CPU1
> >> > umount
> >> > reset_store
> >> > bdev->bd_holders == 0                   mount
> >> > ...                                     zram_make_request()
> >> > zram_reset_device()
[..]


> 
> Maybe I did not explain clearly. I send a patch about this issue:
> 
> https://patchwork.kernel.org/patch/5754041/


excuse me? explain to me clearly what? my finding and my analysis?


this is the second time in a week that you hijack someone's work
and you don't even bother to give any credit to people.


Minchan moved zram_meta_free(meta) out of init_lock here
https://lkml.org/lkml/2015/1/21/29

I proposed to also move zs_free() of meta->handles here
https://lkml.org/lkml/2015/1/21/384


... so what happened then -- you jumped in and sent a patch.
https://lkml.org/lkml/2015/1/24/50


Minchan sent you a hint https://lkml.org/lkml/2015/1/26/471

>   but it seems the patch is based on my recent work "zram: free meta out of init_lock".



 "the patch is based on my work"!



now, for the last few days we were discussing init_lock and I first
expressed my concerns and spoke about 'free' vs. 'use' problem
here (but still didn't have enough spare to submit, besides we are in
the middle of reset/init/write rework)

https://lkml.org/lkml/2015/1/27/1029

>
>bdev->bd_holders protects from resetting device which has read/write
>operation ongoing on the onther CPU.
>
>I need to refresh on how ->bd_holders actually incremented/decremented.
>can the following race condition take a place?
>
>        CPU0                                    CPU1
>reset_store()
>bdev->bd_holders == false
>                                        zram_make_request
>                                                -rm- down_read(&zram->init_lock);
>                                        init_done(zram) == true
>zram_reset_device()                     valid_io_request()
>                                        __zram_make_request
>down_write(&zram->init_lock);           zram_bvec_rw
>[..]
>set_capacity(zram->disk, 0);
>zram->init_done = false;
>kick_all_cpus_sync();                   zram_bvec_write or zram_bvec_read()
>zram_meta_free(zram->meta);
>zcomp_destroy(zram->comp);              zcomp_compress() or zcomp_decompress()
>


and later here https://lkml.org/lkml/2015/1/29/645

>
>after umount we still have init device. so, *theoretically*, we
>can see something like
>
>
>        CPU0                            CPU1
>umount
>reset_store
>bdev->bd_holders == 0                   mount
>...                                     zram_make_request()
>zram_reset_device()
>



so what happened next? your patch happened next.
with quite familiar problem description

>
>      CPU0                    CPU1
> t1:  bdput
> t2:                          mount /dev/zram0 /mnt
> t3:  zram_reset_device
>

and now you say that I don't understant something in "your analysis"?



stop doing this. this is not how it works.


	-ss


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-01-30 14:41                     ` Minchan Kim
@ 2015-01-31 11:31                       ` Sergey Senozhatsky
  2015-02-01 14:50                       ` Sergey Senozhatsky
  1 sibling, 0 replies; 44+ messages in thread
From: Sergey Senozhatsky @ 2015-01-31 11:31 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	linux-kernel, Linux-MM, Nitin Gupta, Jerome Marchand,
	Ganesh Mahendran

Hello Minchan,
excellent analysis!

On (01/30/15 23:41), Minchan Kim wrote:
> Yes, __srcu_read_lock is a little bit heavier but the number of instruction
> are not too much difference to make difference 10%. A culprit is
> __cond_resched but I don't think, either because our test was CPU intensive
> soS I don't think schedule latency affects total bandwidth.
> 
> More cuprit is your data pattern.
> It seems you didn't use scramble_buffers=0, zero_buffers in fio so that
> fio fills random data pattern so zram bandwidth could be different by
> compression/decompression ratio.

Completely agree.
Shame on me. gotten so used to iozone (iozone uses same data pattern 0xA5,
this is +Z option what for), so I didn't even think about data pattern
in fio. sorry.

> 1) randread
> srcu is worse as 0.63% but the difference is really marginal.
> 
> 2) randwrite
> srcu is better as 1.24% is better.
> 
> 3) randrw
> srcu is better as 2.3%

hm, interesting. I'll re-check.

> Okay, if you concerns on the data still, how about this?

I'm not so upset to lose 0.6234187%. my concerns were about iozone's
10% different (which looks a bit worse).


I'll review your patch. Thanks for your effort.


> > 
> > by "data pattern" you mean usage scenario? well, I usually use zram for
> > `make -jX', where X=[4..N]. so N concurrent read-write ops scenario.
> 
> What I meant is what data fills I/O buffer, which is really important
> to evaluate zram because the compression/decompression speeds relys on it.
> 

I see. I never test it with `make' anyway, only iozone +Z.

	-ss

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-01-31 11:07             ` Sergey Senozhatsky
@ 2015-01-31 12:59               ` Ganesh Mahendran
  0 siblings, 0 replies; 44+ messages in thread
From: Ganesh Mahendran @ 2015-01-31 12:59 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Minchan Kim, Andrew Morton, linux-kernel,
	Linux-MM, Nitin Gupta, Jerome Marchand

Hello, Sergey

2015-01-31 19:07 GMT+08:00 Sergey Senozhatsky <sergey.senozhatsky@gmail.com>:
> On (01/31/15 16:50), Ganesh Mahendran wrote:
>> >> > after umount we still have init device. so, *theoretically*, we
>> >> > can see something like
>> >> >
>> >> >         CPU0                            CPU1
>> >> > umount
>> >> > reset_store
>> >> > bdev->bd_holders == 0                   mount
>> >> > ...                                     zram_make_request()
>> >> > zram_reset_device()
> [..]
>
>
>>
>> Maybe I did not explain clearly. I send a patch about this issue:
>>
>> https://patchwork.kernel.org/patch/5754041/
>
>
> excuse me? explain to me clearly what? my finding and my analysis?

Sorry, I missed this mail
https://lkml.org/lkml/2015/1/27/1029

That's why I ask questions in this
https://lkml.org/lkml/2015/1/29/580
after Minchan's description.

>
>
> this is the second time in a week that you hijack someone's work
> and you don't even bother to give any credit to people.
>
>
> Minchan moved zram_meta_free(meta) out of init_lock here
> https://lkml.org/lkml/2015/1/21/29
>
> I proposed to also move zs_free() of meta->handles here
> https://lkml.org/lkml/2015/1/21/384

I thought you wanted move the code block after
       up_write(&zram->init_lock);

And I found the code block can be even encapsulated in
zram_meta_free().

That's why I sent:
https://lkml.org/lkml/2015/1/24/50

>
>
> ... so what happened then -- you jumped in and sent a patch.
> https://lkml.org/lkml/2015/1/24/50
>
>
> Minchan sent you a hint https://lkml.org/lkml/2015/1/26/471
>
>>   but it seems the patch is based on my recent work "zram: free meta out of init_lock".
>
>
>
>  "the patch is based on my work"!
>
>
>
> now, for the last few days we were discussing init_lock and I first
> expressed my concerns and spoke about 'free' vs. 'use' problem
> here (but still didn't have enough spare to submit, besides we are in
> the middle of reset/init/write rework)
>
> https://lkml.org/lkml/2015/1/27/1029
>
>>
>>bdev->bd_holders protects from resetting device which has read/write
>>operation ongoing on the onther CPU.
>>
>>I need to refresh on how ->bd_holders actually incremented/decremented.
>>can the following race condition take a place?
>>
>>        CPU0                                    CPU1
>>reset_store()
>>bdev->bd_holders == false
>>                                        zram_make_request
>>                                                -rm- down_read(&zram->init_lock);
>>                                        init_done(zram) == true
>>zram_reset_device()                     valid_io_request()
>>                                        __zram_make_request
>>down_write(&zram->init_lock);           zram_bvec_rw
>>[..]
>>set_capacity(zram->disk, 0);
>>zram->init_done = false;
>>kick_all_cpus_sync();                   zram_bvec_write or zram_bvec_read()
>>zram_meta_free(zram->meta);
>>zcomp_destroy(zram->comp);              zcomp_compress() or zcomp_decompress()

Sorry, I did not check this mail.

>>
>
>
> and later here https://lkml.org/lkml/2015/1/29/645
>
>>
>>after umount we still have init device. so, *theoretically*, we
>>can see something like
>>
>>
>>        CPU0                            CPU1
>>umount
>>reset_store
>>bdev->bd_holders == 0                   mount
>>...                                     zram_make_request()
>>zram_reset_device()
>>
>
>
>
> so what happened next? your patch happened next.
> with quite familiar problem description
>
>>
>>      CPU0                    CPU1
>> t1:  bdput
>> t2:                          mount /dev/zram0 /mnt
>> t3:  zram_reset_device
>>
>
> and now you say that I don't understant something in "your analysis"?
>
>
>
> stop doing this. this is not how it works.
>
>
>         -ss
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-01-30 14:41                     ` Minchan Kim
  2015-01-31 11:31                       ` Sergey Senozhatsky
@ 2015-02-01 14:50                       ` Sergey Senozhatsky
  2015-02-01 15:04                         ` Sergey Senozhatsky
  2015-02-02  1:30                         ` Minchan Kim
  1 sibling, 2 replies; 44+ messages in thread
From: Sergey Senozhatsky @ 2015-02-01 14:50 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	linux-kernel, Linux-MM, Nitin Gupta, Jerome Marchand,
	Ganesh Mahendran

Hello Minchan,

the idea looks good and this is something I was trying to do, except
that I used kref.

some review nitpicks are below. I also posted modified version of your
patch so that will save some time.


>  static inline int init_done(struct zram *zram)
>  {
> -	return zram->meta != NULL;
> +	return zram->disksize != 0;

we don't set ->disksize to 0 when create device. and I think
it's better to use refcount here, but set it to 0 during device creation.
(see the patch below)

> +static inline bool zram_meta_get(struct zram_meta *meta)
> +{
> +	if (!atomic_inc_not_zero(&meta->refcount))
> +		return false;
> +	return true;
> +}

I've changed it to likely case first: `if recount return true'

>  static void zram_reset_device(struct zram *zram, bool reset_capacity)
>  {
> +	struct zram_meta *meta;
> +	u64 disksize;

not needed. (see the patch below).

> +
>  	down_write(&zram->init_lock);
>  
>  	zram->limit_pages = 0;
> @@ -728,14 +750,20 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>  		return;
>  	}
>  
> +	meta = zram->meta;
> +
>  	zcomp_destroy(zram->comp);

we can't destoy zcomp before we see IO completion.

>  	zram->max_comp_streams = 1;

we better keep original comp_streams number before we see IO completion.
we don't know how many RW ops we have, so completion can happen earlier.

> -	zram_meta_free(zram->meta, zram->disksize);
> -	zram->meta = NULL;
> +	disksize = zram->disksize;
> +	zram_meta_put(meta);
> +	/* Read/write handler will not handle further I/O operation. */
> +	zram->disksize = 0;

I keep it on its current position. (see below)

> +	wait_for_completion(&meta->complete);
> +	/* I/O operation under all of CPU are done so let's free */
> +	zram_meta_free(zram->meta, disksize);
>  	/* Reset stats */
>  	memset(&zram->stats, 0, sizeof(zram->stats));
>  
> -	zram->disksize = 0;
>  	if (reset_capacity)
>  		set_capacity(zram->disk, 0);
>  
> @@ -908,23 +936,25 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
>  {
>  	struct zram *zram = queue->queuedata;
>  
> -	down_read(&zram->init_lock);
> -	if (unlikely(!init_done(zram)))
> +	if (unlikely(!zram_meta_get(zram->meta)))
>  		goto error;
>  
> +	if (unlikely(!init_done(zram)))
> +		goto put_meta;
> +

here and later:
we can't take zram_meta_get() first and then check for init_done(zram),
because ->meta can be NULL, so it fill be ->NULL->refcount.

let's keep ->completion and ->refcount in zram and rename zram_meta_[get|put]
to zram_[get|put].




please review a bit modified version of your patch.

/* the patch also reogranizes a bit order of struct zram members, to move
member that we use more often together and to avoid paddings. nothing
critical here. */


next action items are:
-- we actually can now switch from init_lock in some _show() fucntion to
zram_get()/zram_put()
-- address that theoretical and very unlikely in real live race condition
of umount-reset vs. mount-rw.


no concerns about performance of this version -- we probably will not get
any faster than that.


thanks a lot for your effort!

---

 drivers/block/zram/zram_drv.c | 82 ++++++++++++++++++++++++++++++-------------
 drivers/block/zram/zram_drv.h | 17 ++++-----
 2 files changed, 66 insertions(+), 33 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index aa5a4c5..6916790 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -44,7 +44,7 @@ static const char *default_compressor = "lzo";
 static unsigned int num_devices = 1;
 
 #define ZRAM_ATTR_RO(name)						\
-static ssize_t name##_show(struct device *d,		\
+static ssize_t name##_show(struct device *d,				\
 				struct device_attribute *attr, char *b)	\
 {									\
 	struct zram *zram = dev_to_zram(d);				\
@@ -55,7 +55,7 @@ static DEVICE_ATTR_RO(name);
 
 static inline int init_done(struct zram *zram)
 {
-	return zram->meta != NULL;
+	return atomic_read(&zram->refcount);
 }
 
 static inline struct zram *dev_to_zram(struct device *dev)
@@ -358,6 +358,23 @@ out_error:
 	return NULL;
 }
 
+static inline bool zram_get(struct zram *zram)
+{
+	if (atomic_inc_not_zero(&zram->refcount))
+		return true;
+	return false;
+}
+
+/*
+ * We want to free zram_meta in process context to avoid
+ * deadlock between reclaim path and any other locks
+ */
+static inline void zram_put(struct zram *zram)
+{
+	if (atomic_dec_and_test(&zram->refcount))
+		complete(&zram->io_done);
+}
+
 static void update_position(u32 *index, int *offset, struct bio_vec *bvec)
 {
 	if (*offset + bvec->bv_len >= PAGE_SIZE)
@@ -719,6 +736,9 @@ static void zram_bio_discard(struct zram *zram, u32 index,
 
 static void zram_reset_device(struct zram *zram, bool reset_capacity)
 {
+	struct zram_meta *meta;
+	struct zcomp *comp;
+
 	down_write(&zram->init_lock);
 
 	zram->limit_pages = 0;
@@ -728,14 +748,21 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
 		return;
 	}
 
-	zcomp_destroy(zram->comp);
-	zram->max_comp_streams = 1;
-	zram_meta_free(zram->meta, zram->disksize);
-	zram->meta = NULL;
+	meta = zram->meta;
+	comp = zram->comp;
+	/* ->refcount will go down to 0 eventually */
+	zram_put(zram);
+
+	wait_for_completion(&zram->io_done);
+	/* I/O operation under all of CPU are done so let's free */
+	zram_meta_free(meta, disksize);
+	zcomp_destroy(comp);
+
 	/* Reset stats */
 	memset(&zram->stats, 0, sizeof(zram->stats));
-
 	zram->disksize = 0;
+	zram->max_comp_streams = 1;
+
 	if (reset_capacity)
 		set_capacity(zram->disk, 0);
 
@@ -783,6 +810,8 @@ static ssize_t disksize_store(struct device *dev,
 		goto out_destroy_comp;
 	}
 
+	init_completion(&zram->io_done);
+	atomic_set(&zram->refcount, 1);
 	zram->meta = meta;
 	zram->comp = comp;
 	zram->disksize = disksize;
@@ -795,7 +824,6 @@ static ssize_t disksize_store(struct device *dev,
 	 * so that revalidate_disk always sees up-to-date capacity.
 	 */
 	revalidate_disk(zram->disk);
-
 	return len;
 
 out_destroy_comp:
@@ -908,23 +936,24 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
 {
 	struct zram *zram = queue->queuedata;
 
-	down_read(&zram->init_lock);
-	if (unlikely(!init_done(zram)))
+	if (unlikely(!zram_get(zram)))
 		goto error;
 
+	if (unlikely(!init_done(zram)))
+		goto put_zram;
+
 	if (!valid_io_request(zram, bio->bi_iter.bi_sector,
 					bio->bi_iter.bi_size)) {
 		atomic64_inc(&zram->stats.invalid_io);
-		goto error;
+		goto put_zram;
 	}
 
 	__zram_make_request(zram, bio);
-	up_read(&zram->init_lock);
-
+	zram_put(zram);
 	return;
-
+put_zram:
+	zram_put(zram);
 error:
-	up_read(&zram->init_lock);
 	bio_io_error(bio);
 }
 
@@ -946,21 +975,22 @@ static void zram_slot_free_notify(struct block_device *bdev,
 static int zram_rw_page(struct block_device *bdev, sector_t sector,
 		       struct page *page, int rw)
 {
-	int offset, err;
+	int offset, err = -EIO;
 	u32 index;
 	struct zram *zram;
 	struct bio_vec bv;
 
 	zram = bdev->bd_disk->private_data;
+	if (unlikely(!zram_get(zram)))
+		goto out;
+
+	if (unlikely(!init_done(zram)))
+		goto put_zram;
+
 	if (!valid_io_request(zram, sector, PAGE_SIZE)) {
 		atomic64_inc(&zram->stats.invalid_io);
-		return -EINVAL;
-	}
-
-	down_read(&zram->init_lock);
-	if (unlikely(!init_done(zram))) {
-		err = -EIO;
-		goto out_unlock;
+		err = -EINVAL;
+		goto put_zram;
 	}
 
 	index = sector >> SECTORS_PER_PAGE_SHIFT;
@@ -971,8 +1001,9 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
 	bv.bv_offset = 0;
 
 	err = zram_bvec_rw(zram, &bv, index, offset, rw);
-out_unlock:
-	up_read(&zram->init_lock);
+put_zram:
+	zram_put(zram);
+out:
 	/*
 	 * If I/O fails, just return error(ie, non-zero) without
 	 * calling page_endio.
@@ -1041,6 +1072,7 @@ static int create_device(struct zram *zram, int device_id)
 	int ret = -ENOMEM;
 
 	init_rwsem(&zram->init_lock);
+	atomic_set(&zram->refcount, 0);
 
 	zram->queue = blk_alloc_queue(GFP_KERNEL);
 	if (!zram->queue) {
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index b05a816..7138c82 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -100,24 +100,25 @@ struct zram_meta {
 
 struct zram {
 	struct zram_meta *meta;
+	struct zcomp *comp;
 	struct request_queue *queue;
 	struct gendisk *disk;
-	struct zcomp *comp;
-
 	/* Prevent concurrent execution of device init, reset and R/W request */
 	struct rw_semaphore init_lock;
 	/*
-	 * This is the limit on amount of *uncompressed* worth of data
-	 * we can store in a disk.
+	 * the number of pages zram can consume for storing compressed data
 	 */
-	u64 disksize;	/* bytes */
+	unsigned long limit_pages;
+	atomic_t refcount;
 	int max_comp_streams;
+
 	struct zram_stats stats;
+	struct completion io_done; /* notify IO under all of cpu are done */
 	/*
-	 * the number of pages zram can consume for storing compressed data
+	 * This is the limit on amount of *uncompressed* worth of data
+	 * we can store in a disk.
 	 */
-	unsigned long limit_pages;
-
+	u64 disksize;	/* bytes */
 	char compressor[10];
 };
 #endif


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-02-01 14:50                       ` Sergey Senozhatsky
@ 2015-02-01 15:04                         ` Sergey Senozhatsky
  2015-02-02  1:43                           ` Minchan Kim
  2015-02-02  1:30                         ` Minchan Kim
  1 sibling, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2015-02-01 15:04 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	linux-kernel, Linux-MM, Nitin Gupta, Jerome Marchand,
	Ganesh Mahendran

Sorry, guys! my bad. the patch I sent to you is incomplete. pelase review
this one.

On (02/01/15 23:50), Sergey Senozhatsky wrote:
> +	zram_meta_free(meta, disksize);
> 
that was my in-mail last second stupid modification and I didn't compile
tested it. I hit send button and then realized that your patch didn't move
->meta and ->comp free out of ->init_lock.

So this patch does it.

thanks.

---

 drivers/block/zram/zram_drv.c | 84 +++++++++++++++++++++++++++++--------------
 drivers/block/zram/zram_drv.h | 17 ++++-----
 2 files changed, 67 insertions(+), 34 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index aa5a4c5..c0b612d 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -44,7 +44,7 @@ static const char *default_compressor = "lzo";
 static unsigned int num_devices = 1;
 
 #define ZRAM_ATTR_RO(name)						\
-static ssize_t name##_show(struct device *d,		\
+static ssize_t name##_show(struct device *d,				\
 				struct device_attribute *attr, char *b)	\
 {									\
 	struct zram *zram = dev_to_zram(d);				\
@@ -55,7 +55,7 @@ static DEVICE_ATTR_RO(name);
 
 static inline int init_done(struct zram *zram)
 {
-	return zram->meta != NULL;
+	return atomic_read(&zram->refcount);
 }
 
 static inline struct zram *dev_to_zram(struct device *dev)
@@ -358,6 +358,23 @@ out_error:
 	return NULL;
 }
 
+static inline bool zram_get(struct zram *zram)
+{
+	if (atomic_inc_not_zero(&zram->refcount))
+		return true;
+	return false;
+}
+
+/*
+ * We want to free zram_meta in process context to avoid
+ * deadlock between reclaim path and any other locks
+ */
+static inline void zram_put(struct zram *zram)
+{
+	if (atomic_dec_and_test(&zram->refcount))
+		complete(&zram->io_done);
+}
+
 static void update_position(u32 *index, int *offset, struct bio_vec *bvec)
 {
 	if (*offset + bvec->bv_len >= PAGE_SIZE)
@@ -719,6 +736,10 @@ static void zram_bio_discard(struct zram *zram, u32 index,
 
 static void zram_reset_device(struct zram *zram, bool reset_capacity)
 {
+	struct zram_meta *meta;
+	struct zcomp *comp;
+	u64 disksize;
+
 	down_write(&zram->init_lock);
 
 	zram->limit_pages = 0;
@@ -728,19 +749,25 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
 		return;
 	}
 
-	zcomp_destroy(zram->comp);
-	zram->max_comp_streams = 1;
-	zram_meta_free(zram->meta, zram->disksize);
-	zram->meta = NULL;
+	meta = zram->meta;
+	comp = zram->comp;
+	disksize = zram->disksize;
+	/* ->refcount will go down to 0 eventually */
+	zram_put(zram);
+	wait_for_completion(&zram->io_done);
+
 	/* Reset stats */
 	memset(&zram->stats, 0, sizeof(zram->stats));
-
 	zram->disksize = 0;
+	zram->max_comp_streams = 1;
+
 	if (reset_capacity)
 		set_capacity(zram->disk, 0);
 
 	up_write(&zram->init_lock);
-
+	/* I/O operation under all of CPU are done so let's free */
+	zram_meta_free(meta, disksize);
+	zcomp_destroy(comp);
 	/*
 	 * Revalidate disk out of the init_lock to avoid lockdep splat.
 	 * It's okay because disk's capacity is protected by init_lock
@@ -783,6 +810,8 @@ static ssize_t disksize_store(struct device *dev,
 		goto out_destroy_comp;
 	}
 
+	init_completion(&zram->io_done);
+	atomic_set(&zram->refcount, 1);
 	zram->meta = meta;
 	zram->comp = comp;
 	zram->disksize = disksize;
@@ -795,7 +824,6 @@ static ssize_t disksize_store(struct device *dev,
 	 * so that revalidate_disk always sees up-to-date capacity.
 	 */
 	revalidate_disk(zram->disk);
-
 	return len;
 
 out_destroy_comp:
@@ -908,23 +936,24 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
 {
 	struct zram *zram = queue->queuedata;
 
-	down_read(&zram->init_lock);
-	if (unlikely(!init_done(zram)))
+	if (unlikely(!zram_get(zram)))
 		goto error;
 
+	if (unlikely(!init_done(zram)))
+		goto put_zram;
+
 	if (!valid_io_request(zram, bio->bi_iter.bi_sector,
 					bio->bi_iter.bi_size)) {
 		atomic64_inc(&zram->stats.invalid_io);
-		goto error;
+		goto put_zram;
 	}
 
 	__zram_make_request(zram, bio);
-	up_read(&zram->init_lock);
-
+	zram_put(zram);
 	return;
-
+put_zram:
+	zram_put(zram);
 error:
-	up_read(&zram->init_lock);
 	bio_io_error(bio);
 }
 
@@ -946,21 +975,22 @@ static void zram_slot_free_notify(struct block_device *bdev,
 static int zram_rw_page(struct block_device *bdev, sector_t sector,
 		       struct page *page, int rw)
 {
-	int offset, err;
+	int offset, err = -EIO;
 	u32 index;
 	struct zram *zram;
 	struct bio_vec bv;
 
 	zram = bdev->bd_disk->private_data;
+	if (unlikely(!zram_get(zram)))
+		goto out;
+
+	if (unlikely(!init_done(zram)))
+		goto put_zram;
+
 	if (!valid_io_request(zram, sector, PAGE_SIZE)) {
 		atomic64_inc(&zram->stats.invalid_io);
-		return -EINVAL;
-	}
-
-	down_read(&zram->init_lock);
-	if (unlikely(!init_done(zram))) {
-		err = -EIO;
-		goto out_unlock;
+		err = -EINVAL;
+		goto put_zram;
 	}
 
 	index = sector >> SECTORS_PER_PAGE_SHIFT;
@@ -971,8 +1001,9 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
 	bv.bv_offset = 0;
 
 	err = zram_bvec_rw(zram, &bv, index, offset, rw);
-out_unlock:
-	up_read(&zram->init_lock);
+put_zram:
+	zram_put(zram);
+out:
 	/*
 	 * If I/O fails, just return error(ie, non-zero) without
 	 * calling page_endio.
@@ -1041,6 +1072,7 @@ static int create_device(struct zram *zram, int device_id)
 	int ret = -ENOMEM;
 
 	init_rwsem(&zram->init_lock);
+	atomic_set(&zram->refcount, 0);
 
 	zram->queue = blk_alloc_queue(GFP_KERNEL);
 	if (!zram->queue) {
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index b05a816..7138c82 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -100,24 +100,25 @@ struct zram_meta {
 
 struct zram {
 	struct zram_meta *meta;
+	struct zcomp *comp;
 	struct request_queue *queue;
 	struct gendisk *disk;
-	struct zcomp *comp;
-
 	/* Prevent concurrent execution of device init, reset and R/W request */
 	struct rw_semaphore init_lock;
 	/*
-	 * This is the limit on amount of *uncompressed* worth of data
-	 * we can store in a disk.
+	 * the number of pages zram can consume for storing compressed data
 	 */
-	u64 disksize;	/* bytes */
+	unsigned long limit_pages;
+	atomic_t refcount;
 	int max_comp_streams;
+
 	struct zram_stats stats;
+	struct completion io_done; /* notify IO under all of cpu are done */
 	/*
-	 * the number of pages zram can consume for storing compressed data
+	 * This is the limit on amount of *uncompressed* worth of data
+	 * we can store in a disk.
 	 */
-	unsigned long limit_pages;
-
+	u64 disksize;	/* bytes */
 	char compressor[10];
 };
 #endif


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-02-01 14:50                       ` Sergey Senozhatsky
  2015-02-01 15:04                         ` Sergey Senozhatsky
@ 2015-02-02  1:30                         ` Minchan Kim
  2015-02-02  1:48                           ` Sergey Senozhatsky
  1 sibling, 1 reply; 44+ messages in thread
From: Minchan Kim @ 2015-02-02  1:30 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel, Linux-MM,
	Nitin Gupta, Jerome Marchand, Ganesh Mahendran

Hey Sergey,

On Sun, Feb 01, 2015 at 11:50:36PM +0900, Sergey Senozhatsky wrote:
> Hello Minchan,
> 
> the idea looks good and this is something I was trying to do, except
> that I used kref.
> 
> some review nitpicks are below. I also posted modified version of your
> patch so that will save some time.

Thanks a lot!

> 
> 
> >  static inline int init_done(struct zram *zram)
> >  {
> > -	return zram->meta != NULL;
> > +	return zram->disksize != 0;
> 
> we don't set ->disksize to 0 when create device. and I think
> it's better to use refcount here, but set it to 0 during device creation.
> (see the patch below)

There was a reason I didn't use refcount there.
I should have written down it.

We need something to prevent further I/O handling on other CPUs.
Otherwise, it's livelock. For example, new 'A' I/O rw path on CPU 1
can see non-zero refcount if another CPU is going on rw.
Then, another new 'B' I/O rw path on CPU 2 can see non-zero refcount
if A I/O is going on. Then, another new 'C' I/O rw path on CPU 3 can
see non-zero refcount if B I/O is going on. Finally, 'A' IO is done
on CPU 1 and next I/O 'D' on CPU 1 can see non-zero refcount because
'C' on CPU 3 is going on. Infinite loop.

> 
> > +static inline bool zram_meta_get(struct zram_meta *meta)
> > +{
> > +	if (!atomic_inc_not_zero(&meta->refcount))
> > +		return false;
> > +	return true;
> > +}
> 
> I've changed it to likely case first: `if recount return true'

Thanks!

> 
> >  static void zram_reset_device(struct zram *zram, bool reset_capacity)
> >  {
> > +	struct zram_meta *meta;
> > +	u64 disksize;
> 
> not needed. (see the patch below).
> 
> > +
> >  	down_write(&zram->init_lock);
> >  
> >  	zram->limit_pages = 0;
> > @@ -728,14 +750,20 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
> >  		return;
> >  	}
> >  
> > +	meta = zram->meta;
> > +
> >  	zcomp_destroy(zram->comp);
> 
> we can't destoy zcomp before we see IO completion.

True.

> 
> >  	zram->max_comp_streams = 1;
> 
> we better keep original comp_streams number before we see IO completion.
> we don't know how many RW ops we have, so completion can happen earlier.

Yeb.

> 
> > -	zram_meta_free(zram->meta, zram->disksize);
> > -	zram->meta = NULL;
> > +	disksize = zram->disksize;
> > +	zram_meta_put(meta);
> > +	/* Read/write handler will not handle further I/O operation. */
> > +	zram->disksize = 0;
> 
> I keep it on its current position. (see below)
> 
> > +	wait_for_completion(&meta->complete);
> > +	/* I/O operation under all of CPU are done so let's free */
> > +	zram_meta_free(zram->meta, disksize);
> >  	/* Reset stats */
> >  	memset(&zram->stats, 0, sizeof(zram->stats));
> >  
> > -	zram->disksize = 0;
> >  	if (reset_capacity)
> >  		set_capacity(zram->disk, 0);
> >  
> > @@ -908,23 +936,25 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
> >  {
> >  	struct zram *zram = queue->queuedata;
> >  
> > -	down_read(&zram->init_lock);
> > -	if (unlikely(!init_done(zram)))
> > +	if (unlikely(!zram_meta_get(zram->meta)))
> >  		goto error;
> >  
> > +	if (unlikely(!init_done(zram)))
> > +		goto put_meta;
> > +
> 
> here and later:
> we can't take zram_meta_get() first and then check for init_done(zram),
> because ->meta can be NULL, so it fill be ->NULL->refcount.

True.
Actually, it was totally RFC I forgot adding the tag in the night but I can't
escape from my shame with the escuse. Thanks!

> 
> let's keep ->completion and ->refcount in zram and rename zram_meta_[get|put]
> to zram_[get|put].

Good idea but still want to name it as zram_meta_get/put because zram_get naming
might confuse struct zram's refcount rather than zram_meta. :)

> 
> 
> 
> 
> please review a bit modified version of your patch.
> 
> /* the patch also reogranizes a bit order of struct zram members, to move
> member that we use more often together and to avoid paddings. nothing
> critical here. */
> 
> 
> next action items are:
> -- we actually can now switch from init_lock in some _show() fucntion to
> zram_get()/zram_put()
> -- address that theoretical and very unlikely in real live race condition
> of umount-reset vs. mount-rw.
> 
> 
> no concerns about performance of this version -- we probably will not get
> any faster than that.
> 
> 
> thanks a lot for your effort!
> 
> ---
> 
>  drivers/block/zram/zram_drv.c | 82 ++++++++++++++++++++++++++++++-------------
>  drivers/block/zram/zram_drv.h | 17 ++++-----
>  2 files changed, 66 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index aa5a4c5..6916790 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -44,7 +44,7 @@ static const char *default_compressor = "lzo";
>  static unsigned int num_devices = 1;
>  
>  #define ZRAM_ATTR_RO(name)						\
> -static ssize_t name##_show(struct device *d,		\
> +static ssize_t name##_show(struct device *d,				\
>  				struct device_attribute *attr, char *b)	\
>  {									\
>  	struct zram *zram = dev_to_zram(d);				\
> @@ -55,7 +55,7 @@ static DEVICE_ATTR_RO(name);
>  
>  static inline int init_done(struct zram *zram)
>  {
> -	return zram->meta != NULL;
> +	return atomic_read(&zram->refcount);
>  }
>  
>  static inline struct zram *dev_to_zram(struct device *dev)
> @@ -358,6 +358,23 @@ out_error:
>  	return NULL;
>  }
>  
> +static inline bool zram_get(struct zram *zram)
> +{
> +	if (atomic_inc_not_zero(&zram->refcount))
> +		return true;
> +	return false;
> +}
> +
> +/*
> + * We want to free zram_meta in process context to avoid
> + * deadlock between reclaim path and any other locks
> + */
> +static inline void zram_put(struct zram *zram)
> +{
> +	if (atomic_dec_and_test(&zram->refcount))
> +		complete(&zram->io_done);
> +}
> +
>  static void update_position(u32 *index, int *offset, struct bio_vec *bvec)
>  {
>  	if (*offset + bvec->bv_len >= PAGE_SIZE)
> @@ -719,6 +736,9 @@ static void zram_bio_discard(struct zram *zram, u32 index,
>  
>  static void zram_reset_device(struct zram *zram, bool reset_capacity)
>  {
> +	struct zram_meta *meta;
> +	struct zcomp *comp;
> +
>  	down_write(&zram->init_lock);
>  
>  	zram->limit_pages = 0;
> @@ -728,14 +748,21 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>  		return;
>  	}
>  
> -	zcomp_destroy(zram->comp);
> -	zram->max_comp_streams = 1;
> -	zram_meta_free(zram->meta, zram->disksize);
> -	zram->meta = NULL;
> +	meta = zram->meta;
> +	comp = zram->comp;
> +	/* ->refcount will go down to 0 eventually */
> +	zram_put(zram);
> +
> +	wait_for_completion(&zram->io_done);
> +	/* I/O operation under all of CPU are done so let's free */
> +	zram_meta_free(meta, disksize);
> +	zcomp_destroy(comp);
> +
>  	/* Reset stats */
>  	memset(&zram->stats, 0, sizeof(zram->stats));
> -
>  	zram->disksize = 0;
> +	zram->max_comp_streams = 1;
> +
>  	if (reset_capacity)
>  		set_capacity(zram->disk, 0);
>  
> @@ -783,6 +810,8 @@ static ssize_t disksize_store(struct device *dev,
>  		goto out_destroy_comp;
>  	}
>  
> +	init_completion(&zram->io_done);
> +	atomic_set(&zram->refcount, 1);
>  	zram->meta = meta;
>  	zram->comp = comp;
>  	zram->disksize = disksize;
> @@ -795,7 +824,6 @@ static ssize_t disksize_store(struct device *dev,
>  	 * so that revalidate_disk always sees up-to-date capacity.
>  	 */
>  	revalidate_disk(zram->disk);
> -
>  	return len;
>  
>  out_destroy_comp:
> @@ -908,23 +936,24 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
>  {
>  	struct zram *zram = queue->queuedata;
>  
> -	down_read(&zram->init_lock);
> -	if (unlikely(!init_done(zram)))
> +	if (unlikely(!zram_get(zram)))
>  		goto error;
>  
> +	if (unlikely(!init_done(zram)))
> +		goto put_zram;
> +
>  	if (!valid_io_request(zram, bio->bi_iter.bi_sector,
>  					bio->bi_iter.bi_size)) {
>  		atomic64_inc(&zram->stats.invalid_io);
> -		goto error;
> +		goto put_zram;
>  	}
>  
>  	__zram_make_request(zram, bio);
> -	up_read(&zram->init_lock);
> -
> +	zram_put(zram);
>  	return;
> -
> +put_zram:
> +	zram_put(zram);
>  error:
> -	up_read(&zram->init_lock);
>  	bio_io_error(bio);
>  }
>  
> @@ -946,21 +975,22 @@ static void zram_slot_free_notify(struct block_device *bdev,
>  static int zram_rw_page(struct block_device *bdev, sector_t sector,
>  		       struct page *page, int rw)
>  {
> -	int offset, err;
> +	int offset, err = -EIO;
>  	u32 index;
>  	struct zram *zram;
>  	struct bio_vec bv;
>  
>  	zram = bdev->bd_disk->private_data;
> +	if (unlikely(!zram_get(zram)))
> +		goto out;
> +
> +	if (unlikely(!init_done(zram)))
> +		goto put_zram;
> +
>  	if (!valid_io_request(zram, sector, PAGE_SIZE)) {
>  		atomic64_inc(&zram->stats.invalid_io);
> -		return -EINVAL;
> -	}
> -
> -	down_read(&zram->init_lock);
> -	if (unlikely(!init_done(zram))) {
> -		err = -EIO;
> -		goto out_unlock;
> +		err = -EINVAL;
> +		goto put_zram;
>  	}
>  
>  	index = sector >> SECTORS_PER_PAGE_SHIFT;
> @@ -971,8 +1001,9 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
>  	bv.bv_offset = 0;
>  
>  	err = zram_bvec_rw(zram, &bv, index, offset, rw);
> -out_unlock:
> -	up_read(&zram->init_lock);
> +put_zram:
> +	zram_put(zram);
> +out:
>  	/*
>  	 * If I/O fails, just return error(ie, non-zero) without
>  	 * calling page_endio.
> @@ -1041,6 +1072,7 @@ static int create_device(struct zram *zram, int device_id)
>  	int ret = -ENOMEM;
>  
>  	init_rwsem(&zram->init_lock);
> +	atomic_set(&zram->refcount, 0);
>  
>  	zram->queue = blk_alloc_queue(GFP_KERNEL);
>  	if (!zram->queue) {
> diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
> index b05a816..7138c82 100644
> --- a/drivers/block/zram/zram_drv.h
> +++ b/drivers/block/zram/zram_drv.h
> @@ -100,24 +100,25 @@ struct zram_meta {
>  
>  struct zram {
>  	struct zram_meta *meta;
> +	struct zcomp *comp;
>  	struct request_queue *queue;
>  	struct gendisk *disk;
> -	struct zcomp *comp;
> -
>  	/* Prevent concurrent execution of device init, reset and R/W request */
>  	struct rw_semaphore init_lock;
>  	/*
> -	 * This is the limit on amount of *uncompressed* worth of data
> -	 * we can store in a disk.
> +	 * the number of pages zram can consume for storing compressed data
>  	 */
> -	u64 disksize;	/* bytes */
> +	unsigned long limit_pages;
> +	atomic_t refcount;
>  	int max_comp_streams;
> +
>  	struct zram_stats stats;
> +	struct completion io_done; /* notify IO under all of cpu are done */
>  	/*
> -	 * the number of pages zram can consume for storing compressed data
> +	 * This is the limit on amount of *uncompressed* worth of data
> +	 * we can store in a disk.
>  	 */
> -	unsigned long limit_pages;
> -
> +	u64 disksize;	/* bytes */
>  	char compressor[10];
>  };
>  #endif
> 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-02-01 15:04                         ` Sergey Senozhatsky
@ 2015-02-02  1:43                           ` Minchan Kim
  2015-02-02  1:59                             ` Sergey Senozhatsky
  0 siblings, 1 reply; 44+ messages in thread
From: Minchan Kim @ 2015-02-02  1:43 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel, Linux-MM,
	Nitin Gupta, Jerome Marchand, Ganesh Mahendran

On Mon, Feb 02, 2015 at 12:04:16AM +0900, Sergey Senozhatsky wrote:
> Sorry, guys! my bad. the patch I sent to you is incomplete. pelase review
> this one.
> 
> On (02/01/15 23:50), Sergey Senozhatsky wrote:
> > +	zram_meta_free(meta, disksize);
> > 
> that was my in-mail last second stupid modification and I didn't compile
> tested it. I hit send button and then realized that your patch didn't move
> ->meta and ->comp free out of ->init_lock.
> 
> So this patch does it.
> 
> thanks.
> 
> ---
> 
>  drivers/block/zram/zram_drv.c | 84 +++++++++++++++++++++++++++++--------------
>  drivers/block/zram/zram_drv.h | 17 ++++-----
>  2 files changed, 67 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index aa5a4c5..c0b612d 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -44,7 +44,7 @@ static const char *default_compressor = "lzo";
>  static unsigned int num_devices = 1;
>  
>  #define ZRAM_ATTR_RO(name)						\
> -static ssize_t name##_show(struct device *d,		\
> +static ssize_t name##_show(struct device *d,				\
>  				struct device_attribute *attr, char *b)	\
>  {									\
>  	struct zram *zram = dev_to_zram(d);				\
> @@ -55,7 +55,7 @@ static DEVICE_ATTR_RO(name);
>  
>  static inline int init_done(struct zram *zram)
>  {
> -	return zram->meta != NULL;
> +	return atomic_read(&zram->refcount);

As I said previous mail, it could make livelock so I want to use disksize
in here to prevent further I/O handling.

>  }
>  
>  static inline struct zram *dev_to_zram(struct device *dev)
> @@ -358,6 +358,23 @@ out_error:
>  	return NULL;
>  }
>  
> +static inline bool zram_get(struct zram *zram)
> +{
> +	if (atomic_inc_not_zero(&zram->refcount))
> +		return true;
> +	return false;
> +}
> +
> +/*
> + * We want to free zram_meta in process context to avoid
> + * deadlock between reclaim path and any other locks
> + */
> +static inline void zram_put(struct zram *zram)
> +{
> +	if (atomic_dec_and_test(&zram->refcount))
> +		complete(&zram->io_done);
> +}

Although I suggested this complete, it might be rather overkill(pz,
understand me it was work in midnight. :))
Instead, we could use just atomic_dec in here and
use wait_event(event, atomic_read(&zram->refcount) == 0) in reset.

Otherwise, looks good to me. I will cook up based on this version
and test/send if you don't have any strong objection until that.

Thanks for the review, Sergey!

> +
>  static void update_position(u32 *index, int *offset, struct bio_vec *bvec)
>  {
>  	if (*offset + bvec->bv_len >= PAGE_SIZE)
> @@ -719,6 +736,10 @@ static void zram_bio_discard(struct zram *zram, u32 index,
>  
>  static void zram_reset_device(struct zram *zram, bool reset_capacity)
>  {
> +	struct zram_meta *meta;
> +	struct zcomp *comp;
> +	u64 disksize;
> +
>  	down_write(&zram->init_lock);
>  
>  	zram->limit_pages = 0;
> @@ -728,19 +749,25 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>  		return;
>  	}
>  
> -	zcomp_destroy(zram->comp);
> -	zram->max_comp_streams = 1;
> -	zram_meta_free(zram->meta, zram->disksize);
> -	zram->meta = NULL;
> +	meta = zram->meta;
> +	comp = zram->comp;
> +	disksize = zram->disksize;
> +	/* ->refcount will go down to 0 eventually */
> +	zram_put(zram);
> +	wait_for_completion(&zram->io_done);
> +
>  	/* Reset stats */
>  	memset(&zram->stats, 0, sizeof(zram->stats));
> -
>  	zram->disksize = 0;
> +	zram->max_comp_streams = 1;
> +
>  	if (reset_capacity)
>  		set_capacity(zram->disk, 0);
>  
>  	up_write(&zram->init_lock);
> -
> +	/* I/O operation under all of CPU are done so let's free */
> +	zram_meta_free(meta, disksize);
> +	zcomp_destroy(comp);
>  	/*
>  	 * Revalidate disk out of the init_lock to avoid lockdep splat.
>  	 * It's okay because disk's capacity is protected by init_lock
> @@ -783,6 +810,8 @@ static ssize_t disksize_store(struct device *dev,
>  		goto out_destroy_comp;
>  	}
>  
> +	init_completion(&zram->io_done);
> +	atomic_set(&zram->refcount, 1);
>  	zram->meta = meta;
>  	zram->comp = comp;
>  	zram->disksize = disksize;
> @@ -795,7 +824,6 @@ static ssize_t disksize_store(struct device *dev,
>  	 * so that revalidate_disk always sees up-to-date capacity.
>  	 */
>  	revalidate_disk(zram->disk);
> -
>  	return len;
>  
>  out_destroy_comp:
> @@ -908,23 +936,24 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
>  {
>  	struct zram *zram = queue->queuedata;
>  
> -	down_read(&zram->init_lock);
> -	if (unlikely(!init_done(zram)))
> +	if (unlikely(!zram_get(zram)))
>  		goto error;
>  
> +	if (unlikely(!init_done(zram)))
> +		goto put_zram;
> +
>  	if (!valid_io_request(zram, bio->bi_iter.bi_sector,
>  					bio->bi_iter.bi_size)) {
>  		atomic64_inc(&zram->stats.invalid_io);
> -		goto error;
> +		goto put_zram;
>  	}
>  
>  	__zram_make_request(zram, bio);
> -	up_read(&zram->init_lock);
> -
> +	zram_put(zram);
>  	return;
> -
> +put_zram:
> +	zram_put(zram);
>  error:
> -	up_read(&zram->init_lock);
>  	bio_io_error(bio);
>  }
>  
> @@ -946,21 +975,22 @@ static void zram_slot_free_notify(struct block_device *bdev,
>  static int zram_rw_page(struct block_device *bdev, sector_t sector,
>  		       struct page *page, int rw)
>  {
> -	int offset, err;
> +	int offset, err = -EIO;
>  	u32 index;
>  	struct zram *zram;
>  	struct bio_vec bv;
>  
>  	zram = bdev->bd_disk->private_data;
> +	if (unlikely(!zram_get(zram)))
> +		goto out;
> +
> +	if (unlikely(!init_done(zram)))
> +		goto put_zram;
> +
>  	if (!valid_io_request(zram, sector, PAGE_SIZE)) {
>  		atomic64_inc(&zram->stats.invalid_io);
> -		return -EINVAL;
> -	}
> -
> -	down_read(&zram->init_lock);
> -	if (unlikely(!init_done(zram))) {
> -		err = -EIO;
> -		goto out_unlock;
> +		err = -EINVAL;
> +		goto put_zram;
>  	}
>  
>  	index = sector >> SECTORS_PER_PAGE_SHIFT;
> @@ -971,8 +1001,9 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
>  	bv.bv_offset = 0;
>  
>  	err = zram_bvec_rw(zram, &bv, index, offset, rw);
> -out_unlock:
> -	up_read(&zram->init_lock);
> +put_zram:
> +	zram_put(zram);
> +out:
>  	/*
>  	 * If I/O fails, just return error(ie, non-zero) without
>  	 * calling page_endio.
> @@ -1041,6 +1072,7 @@ static int create_device(struct zram *zram, int device_id)
>  	int ret = -ENOMEM;
>  
>  	init_rwsem(&zram->init_lock);
> +	atomic_set(&zram->refcount, 0);
>  
>  	zram->queue = blk_alloc_queue(GFP_KERNEL);
>  	if (!zram->queue) {
> diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
> index b05a816..7138c82 100644
> --- a/drivers/block/zram/zram_drv.h
> +++ b/drivers/block/zram/zram_drv.h
> @@ -100,24 +100,25 @@ struct zram_meta {
>  
>  struct zram {
>  	struct zram_meta *meta;
> +	struct zcomp *comp;
>  	struct request_queue *queue;
>  	struct gendisk *disk;
> -	struct zcomp *comp;
> -
>  	/* Prevent concurrent execution of device init, reset and R/W request */
>  	struct rw_semaphore init_lock;
>  	/*
> -	 * This is the limit on amount of *uncompressed* worth of data
> -	 * we can store in a disk.
> +	 * the number of pages zram can consume for storing compressed data
>  	 */
> -	u64 disksize;	/* bytes */
> +	unsigned long limit_pages;
> +	atomic_t refcount;
>  	int max_comp_streams;
> +
>  	struct zram_stats stats;
> +	struct completion io_done; /* notify IO under all of cpu are done */
>  	/*
> -	 * the number of pages zram can consume for storing compressed data
> +	 * This is the limit on amount of *uncompressed* worth of data
> +	 * we can store in a disk.
>  	 */
> -	unsigned long limit_pages;
> -
> +	u64 disksize;	/* bytes */
>  	char compressor[10];
>  };
>  #endif
> 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-02-02  1:30                         ` Minchan Kim
@ 2015-02-02  1:48                           ` Sergey Senozhatsky
  2015-02-02  2:44                             ` Minchan Kim
  0 siblings, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2015-02-02  1:48 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	linux-kernel, Linux-MM, Nitin Gupta, Jerome Marchand,
	Ganesh Mahendran

Hello Minchan,

On (02/02/15 10:30), Minchan Kim wrote:
> > >  static inline int init_done(struct zram *zram)
> > >  {
> > > -	return zram->meta != NULL;
> > > +	return zram->disksize != 0;
> > 
> > we don't set ->disksize to 0 when create device. and I think
> > it's better to use refcount here, but set it to 0 during device creation.
> > (see the patch below)
> 
> There was a reason I didn't use refcount there.
> I should have written down it.
> 
> We need something to prevent further I/O handling on other CPUs.
> Otherwise, it's livelock. For example, new 'A' I/O rw path on CPU 1
> can see non-zero refcount if another CPU is going on rw.
> Then, another new 'B' I/O rw path on CPU 2 can see non-zero refcount
> if A I/O is going on. Then, another new 'C' I/O rw path on CPU 3 can
> see non-zero refcount if B I/O is going on. Finally, 'A' IO is done
> on CPU 1 and next I/O 'D' on CPU 1 can see non-zero refcount because
> 'C' on CPU 3 is going on. Infinite loop.

sure, I did think about this. and I actually didn't find any reason not
to use ->refcount there. if user wants to reset the device, he first
should umount it to make bdev->bd_holders check happy. and that's where
IOs will be failed. so it makes sense to switch to ->refcount there, IMHO.


> > here and later:
> > we can't take zram_meta_get() first and then check for init_done(zram),
> > because ->meta can be NULL, so it fill be ->NULL->refcount.
> 
> True.
> Actually, it was totally RFC I forgot adding the tag in the night but I can't
> escape from my shame with the escuse. Thanks!

no problem at all. you were throwing solutions all week long.

> 
> > 
> > let's keep ->completion and ->refcount in zram and rename zram_meta_[get|put]
> > to zram_[get|put].
> 
> Good idea but still want to name it as zram_meta_get/put because zram_get naming
> might confuse struct zram's refcount rather than zram_meta. :)

no objections. but I assume we agreed to keep ->io_done completion
and ->refcount in zram.

	-ss

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-02-02  1:43                           ` Minchan Kim
@ 2015-02-02  1:59                             ` Sergey Senozhatsky
  2015-02-02  2:45                               ` Minchan Kim
  0 siblings, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2015-02-02  1:59 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	linux-kernel, Linux-MM, Nitin Gupta, Jerome Marchand,
	Ganesh Mahendran

On (02/02/15 10:43), Minchan Kim wrote:
> >  static inline int init_done(struct zram *zram)
> >  {
> > -	return zram->meta != NULL;
> > +	return atomic_read(&zram->refcount);
> 
> As I said previous mail, it could make livelock so I want to use disksize
> in here to prevent further I/O handling.

just as I said in my previous email -- is this live lock really possible?
we need to umount device to continue with reset. and umount will kill IOs out
of our way.

the other reset caller is  __exit zram_exit(). but once again, I don't
expect this function being executed on mounted device and module being
in use.


> > +static inline void zram_put(struct zram *zram)
> > +{
> > +	if (atomic_dec_and_test(&zram->refcount))
> > +		complete(&zram->io_done);
> > +}
> 
> Although I suggested this complete, it might be rather overkill(pz,
> understand me it was work in midnight. :))
> Instead, we could use just atomic_dec in here and
> use wait_event(event, atomic_read(&zram->refcount) == 0) in reset.
> 

yes, I think it can do the trick.

	-ss

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-02-02  1:48                           ` Sergey Senozhatsky
@ 2015-02-02  2:44                             ` Minchan Kim
  2015-02-02  4:01                               ` Sergey Senozhatsky
  0 siblings, 1 reply; 44+ messages in thread
From: Minchan Kim @ 2015-02-02  2:44 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel, Linux-MM,
	Nitin Gupta, Jerome Marchand, Ganesh Mahendran

On Mon, Feb 02, 2015 at 10:48:00AM +0900, Sergey Senozhatsky wrote:
> Hello Minchan,
> 
> On (02/02/15 10:30), Minchan Kim wrote:
> > > >  static inline int init_done(struct zram *zram)
> > > >  {
> > > > -	return zram->meta != NULL;
> > > > +	return zram->disksize != 0;
> > > 
> > > we don't set ->disksize to 0 when create device. and I think
> > > it's better to use refcount here, but set it to 0 during device creation.
> > > (see the patch below)
> > 
> > There was a reason I didn't use refcount there.
> > I should have written down it.
> > 
> > We need something to prevent further I/O handling on other CPUs.
> > Otherwise, it's livelock. For example, new 'A' I/O rw path on CPU 1
> > can see non-zero refcount if another CPU is going on rw.
> > Then, another new 'B' I/O rw path on CPU 2 can see non-zero refcount
> > if A I/O is going on. Then, another new 'C' I/O rw path on CPU 3 can
> > see non-zero refcount if B I/O is going on. Finally, 'A' IO is done
> > on CPU 1 and next I/O 'D' on CPU 1 can see non-zero refcount because
> > 'C' on CPU 3 is going on. Infinite loop.
> 
> sure, I did think about this. and I actually didn't find any reason not
> to use ->refcount there. if user wants to reset the device, he first
> should umount it to make bdev->bd_holders check happy. and that's where
> IOs will be failed. so it makes sense to switch to ->refcount there, IMHO.

If we use zram as block device itself(not a fs or swap) and open the
block device as !FMODE_EXCL, bd_holders will be void.

Another topic: As I didn't see enough fs/block_dev.c bd_holders in zram
would be mess. I guess we need to study hotplug of device and implement
it for zram reset rather than strange own konb. It should go TODO. :(

> 
> 
> > > here and later:
> > > we can't take zram_meta_get() first and then check for init_done(zram),
> > > because ->meta can be NULL, so it fill be ->NULL->refcount.
> > 
> > True.
> > Actually, it was totally RFC I forgot adding the tag in the night but I can't
> > escape from my shame with the escuse. Thanks!
> 
> no problem at all. you were throwing solutions all week long.
> 
> > 
> > > 
> > > let's keep ->completion and ->refcount in zram and rename zram_meta_[get|put]
> > > to zram_[get|put].
> > 
> > Good idea but still want to name it as zram_meta_get/put because zram_get naming
> > might confuse struct zram's refcount rather than zram_meta. :)
> 
> no objections. but I assume we agreed to keep ->io_done completion
> and ->refcount in zram.
> 
> 	-ss

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-02-02  1:59                             ` Sergey Senozhatsky
@ 2015-02-02  2:45                               ` Minchan Kim
  2015-02-02  3:47                                 ` Sergey Senozhatsky
  0 siblings, 1 reply; 44+ messages in thread
From: Minchan Kim @ 2015-02-02  2:45 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel, Linux-MM,
	Nitin Gupta, Jerome Marchand, Ganesh Mahendran

On Mon, Feb 02, 2015 at 10:59:40AM +0900, Sergey Senozhatsky wrote:
> On (02/02/15 10:43), Minchan Kim wrote:
> > >  static inline int init_done(struct zram *zram)
> > >  {
> > > -	return zram->meta != NULL;
> > > +	return atomic_read(&zram->refcount);
> > 
> > As I said previous mail, it could make livelock so I want to use disksize
> > in here to prevent further I/O handling.
> 
> just as I said in my previous email -- is this live lock really possible?
> we need to umount device to continue with reset. and umount will kill IOs out
> of our way.
> 
> the other reset caller is  __exit zram_exit(). but once again, I don't
> expect this function being executed on mounted device and module being
> in use.
> 
> 
> > > +static inline void zram_put(struct zram *zram)
> > > +{
> > > +	if (atomic_dec_and_test(&zram->refcount))
> > > +		complete(&zram->io_done);
> > > +}
> > 
> > Although I suggested this complete, it might be rather overkill(pz,
> > understand me it was work in midnight. :))
> > Instead, we could use just atomic_dec in here and
> > use wait_event(event, atomic_read(&zram->refcount) == 0) in reset.
> > 
> 
> yes, I think it can do the trick.

Hey, it's not a trick. It suits for the our goal well. Completion
was too much, I think.

> 
> 	-ss

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-02-02  2:45                               ` Minchan Kim
@ 2015-02-02  3:47                                 ` Sergey Senozhatsky
  0 siblings, 0 replies; 44+ messages in thread
From: Sergey Senozhatsky @ 2015-02-02  3:47 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	linux-kernel, Linux-MM, Nitin Gupta, Jerome Marchand,
	Ganesh Mahendran

On (02/02/15 11:45), Minchan Kim wrote:
> On Mon, Feb 02, 2015 at 10:59:40AM +0900, Sergey Senozhatsky wrote:
> > On (02/02/15 10:43), Minchan Kim wrote:
> > > >  static inline int init_done(struct zram *zram)
> > > >  {
> > > > -	return zram->meta != NULL;
> > > > +	return atomic_read(&zram->refcount);
> > > 
> > > As I said previous mail, it could make livelock so I want to use disksize
> > > in here to prevent further I/O handling.
> > 
> > just as I said in my previous email -- is this live lock really possible?
> > we need to umount device to continue with reset. and umount will kill IOs out
> > of our way.
> > 
> > the other reset caller is  __exit zram_exit(). but once again, I don't
> > expect this function being executed on mounted device and module being
> > in use.
> > 
> > 
> > > > +static inline void zram_put(struct zram *zram)
> > > > +{
> > > > +	if (atomic_dec_and_test(&zram->refcount))
> > > > +		complete(&zram->io_done);
> > > > +}
> > > 
> > > Although I suggested this complete, it might be rather overkill(pz,
> > > understand me it was work in midnight. :))
> > > Instead, we could use just atomic_dec in here and
> > > use wait_event(event, atomic_read(&zram->refcount) == 0) in reset.
> > > 
> > 
> > yes, I think it can do the trick.
> 
> Hey, it's not a trick. It suits for the our goal well. Completion
> was too much, I think.
> 

sure :)

	-ss

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-02-02  2:44                             ` Minchan Kim
@ 2015-02-02  4:01                               ` Sergey Senozhatsky
  2015-02-02  4:28                                 ` Minchan Kim
  0 siblings, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2015-02-02  4:01 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	linux-kernel, Linux-MM, Nitin Gupta, Jerome Marchand,
	Ganesh Mahendran

On (02/02/15 11:44), Minchan Kim wrote:
> > sure, I did think about this. and I actually didn't find any reason not
> > to use ->refcount there. if user wants to reset the device, he first
> > should umount it to make bdev->bd_holders check happy. and that's where
> > IOs will be failed. so it makes sense to switch to ->refcount there, IMHO.
> 
> If we use zram as block device itself(not a fs or swap) and open the
> block device as !FMODE_EXCL, bd_holders will be void.
> 

hm.
I don't mind to use ->disksize there, but personally I'd maybe prefer
to use ->refcount, which just looks less hacky. zram's most common use
cases are coming from ram swap device or ram device with fs. so it looks
a bit like we care about some corner case here.

just my opinion, no objections against ->disksize != 0.

I need to check fs/block_dev. can we switch away from ->bd_holders?

> Another topic: As I didn't see enough fs/block_dev.c bd_holders in zram
> would be mess. I guess we need to study hotplug of device and implement
> it for zram reset rather than strange own konb. It should go TODO. :(

ok, need to investigate this later.
let's land current activities first.

	-ss

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-02-02  4:01                               ` Sergey Senozhatsky
@ 2015-02-02  4:28                                 ` Minchan Kim
  2015-02-02  5:09                                   ` Sergey Senozhatsky
  2015-02-02  5:10                                   ` Minchan Kim
  0 siblings, 2 replies; 44+ messages in thread
From: Minchan Kim @ 2015-02-02  4:28 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel, Linux-MM,
	Nitin Gupta, Jerome Marchand, Ganesh Mahendran

On Mon, Feb 02, 2015 at 01:01:24PM +0900, Sergey Senozhatsky wrote:
> On (02/02/15 11:44), Minchan Kim wrote:
> > > sure, I did think about this. and I actually didn't find any reason not
> > > to use ->refcount there. if user wants to reset the device, he first
> > > should umount it to make bdev->bd_holders check happy. and that's where
> > > IOs will be failed. so it makes sense to switch to ->refcount there, IMHO.
> > 
> > If we use zram as block device itself(not a fs or swap) and open the
> > block device as !FMODE_EXCL, bd_holders will be void.
> > 
> 
> hm.
> I don't mind to use ->disksize there, but personally I'd maybe prefer
> to use ->refcount, which just looks less hacky. zram's most common use
> cases are coming from ram swap device or ram device with fs. so it looks
> a bit like we care about some corner case here.

Maybe, but I always test zram with dd so it's not a corner case for me. :)

> 
> just my opinion, no objections against ->disksize != 0.

Thanks. It's a draft for v2. Please review.

BTW, you pointed out race between bdev_open/close and reset and
it's cleary bug although it's rare in real practice.
So, I want to fix it earlier than this patch and mark it as -stable
if we can fix it easily like Ganesh's work.
If it gets landing, we could make this patch rebased on it.

>From 699502b4e0c84b3d7b33f8754cf1c0109b16c012 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Mon, 2 Feb 2015 10:36:28 +0900
Subject: [PATCH v2] zram: remove init_lock in zram_make_request

Admin could reset zram during I/O operation going on so we have
used zram->init_lock as read-side lock in I/O path to prevent
sudden zram meta freeing.

However, the init_lock is really troublesome.
We can't do call zram_meta_alloc under init_lock due to lockdep splat
because zram_rw_page is one of the function under reclaim path and
hold it as read_lock while other places in process context hold it
as write_lock. So, we have used allocation out of the lock to avoid
lockdep warn but it's not good for readability and fainally, I met
another lockdep splat between init_lock and cpu_hotplug from
kmem_cache_destroy during working zsmalloc compaction. :(

Yes, the ideal is to remove horrible init_lock of zram in rw path.
This patch removes it in rw path and instead, add atomic refcount
for meta lifetime management and completion to free meta in process
context. It's important to free meta in process context because
some of resource destruction needs mutex lock, which could be held
if we releases the resource in reclaim context so it's deadlock,
again.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 85 ++++++++++++++++++++++++++++++-------------
 drivers/block/zram/zram_drv.h | 20 +++++-----
 2 files changed, 71 insertions(+), 34 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index aa5a4c5..c6d505c 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -55,7 +55,7 @@ static DEVICE_ATTR_RO(name);
 
 static inline int init_done(struct zram *zram)
 {
-	return zram->meta != NULL;
+	return zram->disksize != 0;
 }
 
 static inline struct zram *dev_to_zram(struct device *dev)
@@ -358,6 +358,18 @@ out_error:
 	return NULL;
 }
 
+static inline bool zram_meta_get(struct zram *zram)
+{
+	if (atomic_inc_not_zero(&zram->refcount))
+		return true;
+	return false;
+}
+
+static inline void zram_meta_put(struct zram *zram)
+{
+	atomic_dec(&zram->refcount);
+}
+
 static void update_position(u32 *index, int *offset, struct bio_vec *bvec)
 {
 	if (*offset + bvec->bv_len >= PAGE_SIZE)
@@ -719,6 +731,10 @@ static void zram_bio_discard(struct zram *zram, u32 index,
 
 static void zram_reset_device(struct zram *zram, bool reset_capacity)
 {
+	struct zram_meta *meta;
+	struct zcomp *comp;
+	u64 disksize;
+
 	down_write(&zram->init_lock);
 
 	zram->limit_pages = 0;
@@ -728,19 +744,32 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
 		return;
 	}
 
-	zcomp_destroy(zram->comp);
-	zram->max_comp_streams = 1;
-	zram_meta_free(zram->meta, zram->disksize);
-	zram->meta = NULL;
+	meta = zram->meta;
+	comp = zram->comp;
+	disksize = zram->disksize;
+	zram->disksize = 0;
+	/*
+	 * ->refcount will go down to 0 eventually and rw handler cannot
+	 * handle further I/O by init_done checking.
+	 */
+	zram_meta_put(zram);
+	/*
+	 * We want to free zram_meta in process context to avoid
+	 * deadlock between reclaim path and any other locks
+	 */
+	wait_event(zram->io_done, atomic_read(&zram->refcount) == 0);
+
 	/* Reset stats */
 	memset(&zram->stats, 0, sizeof(zram->stats));
+	zram->max_comp_streams = 1;
 
-	zram->disksize = 0;
 	if (reset_capacity)
 		set_capacity(zram->disk, 0);
 
 	up_write(&zram->init_lock);
-
+	/* I/O operation under all of CPU are done so let's free */
+	zram_meta_free(meta, disksize);
+	zcomp_destroy(comp);
 	/*
 	 * Revalidate disk out of the init_lock to avoid lockdep splat.
 	 * It's okay because disk's capacity is protected by init_lock
@@ -783,6 +812,8 @@ static ssize_t disksize_store(struct device *dev,
 		goto out_destroy_comp;
 	}
 
+	init_waitqueue_head(&zram->io_done);
+	zram_meta_get(zram);
 	zram->meta = meta;
 	zram->comp = comp;
 	zram->disksize = disksize;
@@ -838,8 +869,8 @@ static ssize_t reset_store(struct device *dev,
 	/* Make sure all pending I/O is finished */
 	fsync_bdev(bdev);
 	bdput(bdev);
-
 	zram_reset_device(zram, true);
+
 	return len;
 
 out:
@@ -908,23 +939,24 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
 {
 	struct zram *zram = queue->queuedata;
 
-	down_read(&zram->init_lock);
-	if (unlikely(!init_done(zram)))
+	if (unlikely(!zram_meta_get(zram)))
 		goto error;
 
+	if (unlikely(!init_done(zram)))
+		goto put_zram;
+
 	if (!valid_io_request(zram, bio->bi_iter.bi_sector,
 					bio->bi_iter.bi_size)) {
 		atomic64_inc(&zram->stats.invalid_io);
-		goto error;
+		goto put_zram;
 	}
 
 	__zram_make_request(zram, bio);
-	up_read(&zram->init_lock);
-
+	zram_meta_put(zram);
 	return;
-
+put_zram:
+	zram_meta_put(zram);
 error:
-	up_read(&zram->init_lock);
 	bio_io_error(bio);
 }
 
@@ -946,21 +978,22 @@ static void zram_slot_free_notify(struct block_device *bdev,
 static int zram_rw_page(struct block_device *bdev, sector_t sector,
 		       struct page *page, int rw)
 {
-	int offset, err;
+	int offset, err = -EIO;
 	u32 index;
 	struct zram *zram;
 	struct bio_vec bv;
 
 	zram = bdev->bd_disk->private_data;
+	if (unlikely(!zram_meta_get(zram)))
+		goto out;
+
+	if (unlikely(!init_done(zram)))
+		goto put_zram;
+
 	if (!valid_io_request(zram, sector, PAGE_SIZE)) {
 		atomic64_inc(&zram->stats.invalid_io);
-		return -EINVAL;
-	}
-
-	down_read(&zram->init_lock);
-	if (unlikely(!init_done(zram))) {
-		err = -EIO;
-		goto out_unlock;
+		err = -EINVAL;
+		goto put_zram;
 	}
 
 	index = sector >> SECTORS_PER_PAGE_SHIFT;
@@ -971,8 +1004,9 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
 	bv.bv_offset = 0;
 
 	err = zram_bvec_rw(zram, &bv, index, offset, rw);
-out_unlock:
-	up_read(&zram->init_lock);
+put_zram:
+	zram_meta_put(zram);
+out:
 	/*
 	 * If I/O fails, just return error(ie, non-zero) without
 	 * calling page_endio.
@@ -1041,6 +1075,7 @@ static int create_device(struct zram *zram, int device_id)
 	int ret = -ENOMEM;
 
 	init_rwsem(&zram->init_lock);
+	atomic_set(&zram->refcount, 0);
 
 	zram->queue = blk_alloc_queue(GFP_KERNEL);
 	if (!zram->queue) {
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index b05a816..6085335 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -100,24 +100,26 @@ struct zram_meta {
 
 struct zram {
 	struct zram_meta *meta;
+	struct zcomp *comp;
 	struct request_queue *queue;
 	struct gendisk *disk;
-	struct zcomp *comp;
-
-	/* Prevent concurrent execution of device init, reset and R/W request */
+	/* Prevent concurrent execution of device init */
 	struct rw_semaphore init_lock;
 	/*
-	 * This is the limit on amount of *uncompressed* worth of data
-	 * we can store in a disk.
+	 * the number of pages zram can consume for storing compressed data
 	 */
-	u64 disksize;	/* bytes */
+	unsigned long limit_pages;
+	atomic_t refcount; /* refcount for zram_meta */
 	int max_comp_streams;
+
 	struct zram_stats stats;
+	/* wait all IO under all of cpu are done */
+	wait_queue_head_t io_done;
 	/*
-	 * the number of pages zram can consume for storing compressed data
+	 * This is the limit on amount of *uncompressed* worth of data
+	 * we can store in a disk.
 	 */
-	unsigned long limit_pages;
-
+	u64 disksize;	/* bytes */
 	char compressor[10];
 };
 #endif
-- 
1.9.3



> 
> I need to check fs/block_dev. can we switch away from ->bd_holders?
> 
> > Another topic: As I didn't see enough fs/block_dev.c bd_holders in zram
> > would be mess. I guess we need to study hotplug of device and implement
> > it for zram reset rather than strange own konb. It should go TODO. :(
> 
> ok, need to investigate this later.
> let's land current activities first.
> 
> 	-ss

-- 
Kind regards,
Minchan Kim

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-02-02  4:28                                 ` Minchan Kim
@ 2015-02-02  5:09                                   ` Sergey Senozhatsky
  2015-02-02  5:18                                     ` Minchan Kim
  2015-02-02  5:10                                   ` Minchan Kim
  1 sibling, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2015-02-02  5:09 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	linux-kernel, Linux-MM, Nitin Gupta, Jerome Marchand,
	Ganesh Mahendran


the patch mosly looks good, except for one place:

On (02/02/15 13:28), Minchan Kim wrote:
> @@ -783,6 +812,8 @@ static ssize_t disksize_store(struct device *dev,
>  		goto out_destroy_comp;
>  	}
>  
> +	init_waitqueue_head(&zram->io_done);
> +	zram_meta_get(zram);

it was
+       init_completion(&zram->io_done);
+       atomic_set(&zram->refcount, 1);

I think we need to replace zram_meta_get(zram) with atomic_set(&zram->refcount, 1).

->refcount is 0 by default and atomic_inc_not_zero(&zram->refcount) will not
increment it here, nor anywhere else.


>  	zram->meta = meta;
>  	zram->comp = comp;
>  	zram->disksize = disksize;
> @@ -838,8 +869,8 @@ static ssize_t reset_store(struct device *dev,
>  	/* Make sure all pending I/O is finished */
>  	fsync_bdev(bdev);
>  	bdput(bdev);
> -

[..]

> @@ -1041,6 +1075,7 @@ static int create_device(struct zram *zram, int device_id)
>  	int ret = -ENOMEM;
>  
>  	init_rwsem(&zram->init_lock);
> +	atomic_set(&zram->refcount, 0);

sorry, I forgot that zram is kzalloc()-ated. so we can drop

	atomic_set(&zram->refcount, 0)

>  	zram->queue = blk_alloc_queue(GFP_KERNEL);
>  	if (!zram->queue) {


	-ss

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-02-02  4:28                                 ` Minchan Kim
  2015-02-02  5:09                                   ` Sergey Senozhatsky
@ 2015-02-02  5:10                                   ` Minchan Kim
  1 sibling, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2015-02-02  5:10 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel, Linux-MM,
	Nitin Gupta, Jerome Marchand, Ganesh Mahendran

On Mon, Feb 02, 2015 at 01:28:47PM +0900, Minchan Kim wrote:
> On Mon, Feb 02, 2015 at 01:01:24PM +0900, Sergey Senozhatsky wrote:
> > On (02/02/15 11:44), Minchan Kim wrote:
> > > > sure, I did think about this. and I actually didn't find any reason not
> > > > to use ->refcount there. if user wants to reset the device, he first
> > > > should umount it to make bdev->bd_holders check happy. and that's where
> > > > IOs will be failed. so it makes sense to switch to ->refcount there, IMHO.
> > > 
> > > If we use zram as block device itself(not a fs or swap) and open the
> > > block device as !FMODE_EXCL, bd_holders will be void.
> > > 
> > 
> > hm.
> > I don't mind to use ->disksize there, but personally I'd maybe prefer
> > to use ->refcount, which just looks less hacky. zram's most common use
> > cases are coming from ram swap device or ram device with fs. so it looks
> > a bit like we care about some corner case here.
> 
> Maybe, but I always test zram with dd so it's not a corner case for me. :)
> 
> > 
> > just my opinion, no objections against ->disksize != 0.
> 
> Thanks. It's a draft for v2. Please review.
> 
> BTW, you pointed out race between bdev_open/close and reset and
> it's cleary bug although it's rare in real practice.
> So, I want to fix it earlier than this patch and mark it as -stable
> if we can fix it easily like Ganesh's work.
> If it gets landing, we could make this patch rebased on it.
> 
> From 699502b4e0c84b3d7b33f8754cf1c0109b16c012 Mon Sep 17 00:00:00 2001
> From: Minchan Kim <minchan@kernel.org>
> Date: Mon, 2 Feb 2015 10:36:28 +0900
> Subject: [PATCH v2] zram: remove init_lock in zram_make_request
> 
> Admin could reset zram during I/O operation going on so we have
> used zram->init_lock as read-side lock in I/O path to prevent
> sudden zram meta freeing.
> 
> However, the init_lock is really troublesome.
> We can't do call zram_meta_alloc under init_lock due to lockdep splat
> because zram_rw_page is one of the function under reclaim path and
> hold it as read_lock while other places in process context hold it
> as write_lock. So, we have used allocation out of the lock to avoid
> lockdep warn but it's not good for readability and fainally, I met
> another lockdep splat between init_lock and cpu_hotplug from
> kmem_cache_destroy during working zsmalloc compaction. :(
> 
> Yes, the ideal is to remove horrible init_lock of zram in rw path.
> This patch removes it in rw path and instead, add atomic refcount
> for meta lifetime management and completion to free meta in process
> context. It's important to free meta in process context because
> some of resource destruction needs mutex lock, which could be held
> if we releases the resource in reclaim context so it's deadlock,
> again.
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  drivers/block/zram/zram_drv.c | 85 ++++++++++++++++++++++++++++++-------------
>  drivers/block/zram/zram_drv.h | 20 +++++-----
>  2 files changed, 71 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index aa5a4c5..c6d505c 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -55,7 +55,7 @@ static DEVICE_ATTR_RO(name);
>  
>  static inline int init_done(struct zram *zram)
>  {
> -	return zram->meta != NULL;
> +	return zram->disksize != 0;
>  }
>  
>  static inline struct zram *dev_to_zram(struct device *dev)
> @@ -358,6 +358,18 @@ out_error:
>  	return NULL;
>  }
>  
> +static inline bool zram_meta_get(struct zram *zram)
> +{
> +	if (atomic_inc_not_zero(&zram->refcount))
> +		return true;
> +	return false;
> +}
> +
> +static inline void zram_meta_put(struct zram *zram)
> +{
> +	atomic_dec(&zram->refcount);
> +}
> +
>  static void update_position(u32 *index, int *offset, struct bio_vec *bvec)
>  {
>  	if (*offset + bvec->bv_len >= PAGE_SIZE)
> @@ -719,6 +731,10 @@ static void zram_bio_discard(struct zram *zram, u32 index,
>  
>  static void zram_reset_device(struct zram *zram, bool reset_capacity)
>  {
> +	struct zram_meta *meta;
> +	struct zcomp *comp;
> +	u64 disksize;
> +
>  	down_write(&zram->init_lock);
>  
>  	zram->limit_pages = 0;
> @@ -728,19 +744,32 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>  		return;
>  	}
>  
> -	zcomp_destroy(zram->comp);
> -	zram->max_comp_streams = 1;
> -	zram_meta_free(zram->meta, zram->disksize);
> -	zram->meta = NULL;
> +	meta = zram->meta;
> +	comp = zram->comp;
> +	disksize = zram->disksize;
> +	zram->disksize = 0;
> +	/*
> +	 * ->refcount will go down to 0 eventually and rw handler cannot
> +	 * handle further I/O by init_done checking.
> +	 */
> +	zram_meta_put(zram);
> +	/*
> +	 * We want to free zram_meta in process context to avoid
> +	 * deadlock between reclaim path and any other locks
> +	 */
> +	wait_event(zram->io_done, atomic_read(&zram->refcount) == 0);
> +
>  	/* Reset stats */
>  	memset(&zram->stats, 0, sizeof(zram->stats));
> +	zram->max_comp_streams = 1;
>  
> -	zram->disksize = 0;
>  	if (reset_capacity)
>  		set_capacity(zram->disk, 0);
>  
>  	up_write(&zram->init_lock);
> -
> +	/* I/O operation under all of CPU are done so let's free */
> +	zram_meta_free(meta, disksize);
> +	zcomp_destroy(comp);
>  	/*
>  	 * Revalidate disk out of the init_lock to avoid lockdep splat.
>  	 * It's okay because disk's capacity is protected by init_lock
> @@ -783,6 +812,8 @@ static ssize_t disksize_store(struct device *dev,
>  		goto out_destroy_comp;
>  	}
>  
> +	init_waitqueue_head(&zram->io_done);
> +	zram_meta_get(zram);
        
Argh, It should be

        atomic_set(&zram->refcount, 1);

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-02-02  5:09                                   ` Sergey Senozhatsky
@ 2015-02-02  5:18                                     ` Minchan Kim
  2015-02-02  5:28                                       ` Sergey Senozhatsky
  0 siblings, 1 reply; 44+ messages in thread
From: Minchan Kim @ 2015-02-02  5:18 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel, Linux-MM,
	Nitin Gupta, Jerome Marchand, Ganesh Mahendran

On Mon, Feb 02, 2015 at 02:09:12PM +0900, Sergey Senozhatsky wrote:
> 
> the patch mosly looks good, except for one place:
> 
> On (02/02/15 13:28), Minchan Kim wrote:
> > @@ -783,6 +812,8 @@ static ssize_t disksize_store(struct device *dev,
> >  		goto out_destroy_comp;
> >  	}
> >  
> > +	init_waitqueue_head(&zram->io_done);
> > +	zram_meta_get(zram);
> 
> it was
> +       init_completion(&zram->io_done);
> +       atomic_set(&zram->refcount, 1);
> 
> I think we need to replace zram_meta_get(zram) with atomic_set(&zram->refcount, 1).
> 
> ->refcount is 0 by default and atomic_inc_not_zero(&zram->refcount) will not
> increment it here, nor anywhere else.
> 
> 
> >  	zram->meta = meta;
> >  	zram->comp = comp;
> >  	zram->disksize = disksize;
> > @@ -838,8 +869,8 @@ static ssize_t reset_store(struct device *dev,
> >  	/* Make sure all pending I/O is finished */
> >  	fsync_bdev(bdev);
> >  	bdput(bdev);
> > -
> 
> [..]
> 
> > @@ -1041,6 +1075,7 @@ static int create_device(struct zram *zram, int device_id)
> >  	int ret = -ENOMEM;
> >  
> >  	init_rwsem(&zram->init_lock);
> > +	atomic_set(&zram->refcount, 0);
> 
> sorry, I forgot that zram is kzalloc()-ated. so we can drop
> 
> 	atomic_set(&zram->refcount, 0)
> 

Everything are fixed. Ready to send a patch.
But before sending, hope we fix umount race issue first.

Thanks a lot, Sergey!

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-02-02  5:18                                     ` Minchan Kim
@ 2015-02-02  5:28                                       ` Sergey Senozhatsky
  0 siblings, 0 replies; 44+ messages in thread
From: Sergey Senozhatsky @ 2015-02-02  5:28 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	linux-kernel, Linux-MM, Nitin Gupta, Jerome Marchand,
	Ganesh Mahendran

On (02/02/15 14:18), Minchan Kim wrote:
> Everything are fixed. Ready to send a patch.
> But before sending, hope we fix umount race issue first.
> 

Thanks a lot, Minchan!


OK, surely I don't mind to fix the umount race first, I just didn't want
to interrupt you in the middle of locking rework. the race is hardly
possible, but still.


I didn't review Ganesh's patch yet, I'll try to find some time to take a
look later this day.

I'm also planning to send a small `struct zram' cleanup patch by the end
of this day.


	-ss

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-02-03  3:02         ` Minchan Kim
@ 2015-02-03  3:56           ` Sergey Senozhatsky
  0 siblings, 0 replies; 44+ messages in thread
From: Sergey Senozhatsky @ 2015-02-03  3:56 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	linux-kernel, Linux-MM, Nitin Gupta, Jerome Marchand,
	Ganesh Mahendran

On (02/03/15 12:02), Minchan Kim wrote:
> On Tue, Feb 03, 2015 at 10:54:33AM +0900, Sergey Senozhatsky wrote:
> > On (02/02/15 16:06), Sergey Senozhatsky wrote:
> > > So, guys, how about doing it differently, in less lines of code,
> > > hopefully. Don't move reset_store()'s work to zram_reset_device().
> > > Instead, move
> > > 
> > > 	set_capacity(zram->disk, 0);
> > > 	revalidate_disk(zram->disk);
> > > 
> > > out from zram_reset_device() to reset_store(). this two function are
> > > executed only when called from reset_store() anyway. this also will let
> > > us drop `bool reset capacity' param from zram_reset_device().
> > > 
> > > 
> > > so we will do in reset_store()
> > > 
> > > 	mutex_lock(bdev->bd_mutex);
> > > 
> > > 	fsync_bdev(bdev);
> > > 	zram_reset_device(zram);
> > > 	set_capacity(zram->disk, 0);
> > > 
> > > 	mutex_unlock(&bdev->bd_mutex);
> > > 
> > > 	revalidate_disk(zram->disk);
> > > 	bdput(bdev);
> > > 
> > > 
> > > 
> > > and change zram_reset_device(zram, false) call to simply zram_reset_device(zram)
> > > in __exit zram_exit(void).
> > > 
> > 
> > Hello,
> > 
> > Minchan, Ganesh, I sent a patch last night, with the above solution.
> > looks ok to you?
> 
> Just I sent a feedback.
> 

thanks.
yeah, !FMODE_EXCL mode.

how do you want to handle it -- you want to send a separate patch or
you want me to send incremental one-liner and ask Andrew to squash them?

	-ss

> > 
> > Minchan, I think I'll send my small struct zram clean-up patch after
> > your init_lock patch. what's your opinion?
> 
> Good for me.
> 
> Thanks.
> -- 
> Kind regards,
> Minchan Kim
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-02-03  1:54       ` Sergey Senozhatsky
@ 2015-02-03  3:02         ` Minchan Kim
  2015-02-03  3:56           ` Sergey Senozhatsky
  0 siblings, 1 reply; 44+ messages in thread
From: Minchan Kim @ 2015-02-03  3:02 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel, Linux-MM,
	Nitin Gupta, Jerome Marchand, Ganesh Mahendran

On Tue, Feb 03, 2015 at 10:54:33AM +0900, Sergey Senozhatsky wrote:
> On (02/02/15 16:06), Sergey Senozhatsky wrote:
> > So, guys, how about doing it differently, in less lines of code,
> > hopefully. Don't move reset_store()'s work to zram_reset_device().
> > Instead, move
> > 
> > 	set_capacity(zram->disk, 0);
> > 	revalidate_disk(zram->disk);
> > 
> > out from zram_reset_device() to reset_store(). this two function are
> > executed only when called from reset_store() anyway. this also will let
> > us drop `bool reset capacity' param from zram_reset_device().
> > 
> > 
> > so we will do in reset_store()
> > 
> > 	mutex_lock(bdev->bd_mutex);
> > 
> > 	fsync_bdev(bdev);
> > 	zram_reset_device(zram);
> > 	set_capacity(zram->disk, 0);
> > 
> > 	mutex_unlock(&bdev->bd_mutex);
> > 
> > 	revalidate_disk(zram->disk);
> > 	bdput(bdev);
> > 
> > 
> > 
> > and change zram_reset_device(zram, false) call to simply zram_reset_device(zram)
> > in __exit zram_exit(void).
> > 
> 
> Hello,
> 
> Minchan, Ganesh, I sent a patch last night, with the above solution.
> looks ok to you?

Just I sent a feedback.

> 
> Minchan, I think I'll send my small struct zram clean-up patch after
> your init_lock patch. what's your opinion?

Good for me.

Thanks.
-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-02-02  7:06     ` Sergey Senozhatsky
@ 2015-02-03  1:54       ` Sergey Senozhatsky
  2015-02-03  3:02         ` Minchan Kim
  0 siblings, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2015-02-03  1:54 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	linux-kernel, Linux-MM, Nitin Gupta, Jerome Marchand,
	Ganesh Mahendran

On (02/02/15 16:06), Sergey Senozhatsky wrote:
> So, guys, how about doing it differently, in less lines of code,
> hopefully. Don't move reset_store()'s work to zram_reset_device().
> Instead, move
> 
> 	set_capacity(zram->disk, 0);
> 	revalidate_disk(zram->disk);
> 
> out from zram_reset_device() to reset_store(). this two function are
> executed only when called from reset_store() anyway. this also will let
> us drop `bool reset capacity' param from zram_reset_device().
> 
> 
> so we will do in reset_store()
> 
> 	mutex_lock(bdev->bd_mutex);
> 
> 	fsync_bdev(bdev);
> 	zram_reset_device(zram);
> 	set_capacity(zram->disk, 0);
> 
> 	mutex_unlock(&bdev->bd_mutex);
> 
> 	revalidate_disk(zram->disk);
> 	bdput(bdev);
> 
> 
> 
> and change zram_reset_device(zram, false) call to simply zram_reset_device(zram)
> in __exit zram_exit(void).
> 

Hello,

Minchan, Ganesh, I sent a patch last night, with the above solution.
looks ok to you?

Minchan, I think I'll send my small struct zram clean-up patch after
your init_lock patch. what's your opinion?

	-ss

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-02-02  6:18   ` Minchan Kim
@ 2015-02-02  7:06     ` Sergey Senozhatsky
  2015-02-03  1:54       ` Sergey Senozhatsky
  0 siblings, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2015-02-02  7:06 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	linux-kernel, Linux-MM, Nitin Gupta, Jerome Marchand,
	Ganesh Mahendran

On (02/02/15 15:18), Minchan Kim wrote:
> > a quick idea:
> > can we additionally move all bd flush and put work after zram_reset_device(zram, true)
> > and, perhaps, replace ->bd_holders with something else?
> > 
> > zram_reset_device() will not return until we have active IOs, pending IOs will be
> > invalidated by ->disksize != 0.
> 
> Sorry, I don't get it. Could you describe what you are concerning about active I/O?
> My concern is just race bd_holder/bd_openers and bd_holders of zram check.
> I don't think any simple solution without bd_mutex.
> If we can close the race, anything could be a solution.
> If we close the race, we should return -EBUSY if anyone is opening the zram device
> so bd_openers check would be better than bd_holders.
> 

yeah, sorry. nevermind.


So, guys, how about doing it differently, in less lines of code,
hopefully. Don't move reset_store()'s work to zram_reset_device().
Instead, move

	set_capacity(zram->disk, 0);
	revalidate_disk(zram->disk);

out from zram_reset_device() to reset_store(). this two function are
executed only when called from reset_store() anyway. this also will let
us drop `bool reset capacity' param from zram_reset_device().


so we will do in reset_store()

	mutex_lock(bdev->bd_mutex);

	fsync_bdev(bdev);
	zram_reset_device(zram);
	set_capacity(zram->disk, 0);

	mutex_unlock(&bdev->bd_mutex);

	revalidate_disk(zram->disk);
	bdput(bdev);



and change zram_reset_device(zram, false) call to simply zram_reset_device(zram)
in __exit zram_exit(void).

	-ss

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-02-02  5:59 ` Sergey Senozhatsky
@ 2015-02-02  6:18   ` Minchan Kim
  2015-02-02  7:06     ` Sergey Senozhatsky
  0 siblings, 1 reply; 44+ messages in thread
From: Minchan Kim @ 2015-02-02  6:18 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel, Linux-MM,
	Nitin Gupta, Jerome Marchand, Ganesh Mahendran

On Mon, Feb 02, 2015 at 02:59:23PM +0900, Sergey Senozhatsky wrote:
> On (02/02/15 12:41), Minchan Kim wrote:
> > > If we use zram as block device itself(not a fs or swap) and open the
> > > block device as !FMODE_EXCL, bd_holders will be void.
> > > 
> > > Another topic: As I didn't see enough fs/block_dev.c bd_holders in zram
> > > would be mess. I guess we need to study hotplug of device and implement
> > > it for zram reset rather than strange own konb. It should go TODO. :(
> > 
> > Actually, I thought bd_mutex use from custom driver was terrible idea
> > so we should walk around with device hotplug but as I look through
> > another drivers, they have used the lock for a long time.
> > Maybe it's okay to use it in zram?
> > If so, Ganesh's patch is no problem to me although I didn't' review it in detail.
> > One thing I want to point out is that it would be better to change bd_holders
> > with bd_openers to filter out because dd test opens block device as !EXCL
> > so bd_holders will be void.
> > 
> > What do you think about it?
> > 
> 
> a quick idea:
> can we additionally move all bd flush and put work after zram_reset_device(zram, true)
> and, perhaps, replace ->bd_holders with something else?
> 
> zram_reset_device() will not return until we have active IOs, pending IOs will be
> invalidated by ->disksize != 0.

Sorry, I don't get it. Could you describe what you are concerning about active I/O?
My concern is just race bd_holder/bd_openers and bd_holders of zram check.
I don't think any simple solution without bd_mutex.
If we can close the race, anything could be a solution.
If we close the race, we should return -EBUSY if anyone is opening the zram device
so bd_openers check would be better than bd_holders.

> 
> 	-ss

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
  2015-02-02  3:41 Minchan Kim
@ 2015-02-02  5:59 ` Sergey Senozhatsky
  2015-02-02  6:18   ` Minchan Kim
  0 siblings, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2015-02-02  5:59 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	linux-kernel, Linux-MM, Nitin Gupta, Jerome Marchand,
	Ganesh Mahendran

On (02/02/15 12:41), Minchan Kim wrote:
> > If we use zram as block device itself(not a fs or swap) and open the
> > block device as !FMODE_EXCL, bd_holders will be void.
> > 
> > Another topic: As I didn't see enough fs/block_dev.c bd_holders in zram
> > would be mess. I guess we need to study hotplug of device and implement
> > it for zram reset rather than strange own konb. It should go TODO. :(
> 
> Actually, I thought bd_mutex use from custom driver was terrible idea
> so we should walk around with device hotplug but as I look through
> another drivers, they have used the lock for a long time.
> Maybe it's okay to use it in zram?
> If so, Ganesh's patch is no problem to me although I didn't' review it in detail.
> One thing I want to point out is that it would be better to change bd_holders
> with bd_openers to filter out because dd test opens block device as !EXCL
> so bd_holders will be void.
> 
> What do you think about it?
> 

a quick idea:
can we additionally move all bd flush and put work after zram_reset_device(zram, true)
and, perhaps, replace ->bd_holders with something else?

zram_reset_device() will not return until we have active IOs, pending IOs will be
invalidated by ->disksize != 0.

	-ss

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/2] zram: remove init_lock in zram_make_request
@ 2015-02-02  3:41 Minchan Kim
  2015-02-02  5:59 ` Sergey Senozhatsky
  0 siblings, 1 reply; 44+ messages in thread
From: Minchan Kim @ 2015-02-02  3:41 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel, Linux-MM,
	Nitin Gupta, Jerome Marchand, Ganesh Mahendran

Separate another issue from my patch.

On Mon, Feb 02, 2015 at 11:44:06AM +0900, Minchan Kim wrote:
> On Mon, Feb 02, 2015 at 10:48:00AM +0900, Sergey Senozhatsky wrote:
> > Hello Minchan,
> > 
> > On (02/02/15 10:30), Minchan Kim wrote:
> > > > >  static inline int init_done(struct zram *zram)
> > > > >  {
> > > > > -	return zram->meta != NULL;
> > > > > +	return zram->disksize != 0;
> > > > 
> > > > we don't set ->disksize to 0 when create device. and I think
> > > > it's better to use refcount here, but set it to 0 during device creation.
> > > > (see the patch below)
> > > 
> > > There was a reason I didn't use refcount there.
> > > I should have written down it.
> > > 
> > > We need something to prevent further I/O handling on other CPUs.
> > > Otherwise, it's livelock. For example, new 'A' I/O rw path on CPU 1
> > > can see non-zero refcount if another CPU is going on rw.
> > > Then, another new 'B' I/O rw path on CPU 2 can see non-zero refcount
> > > if A I/O is going on. Then, another new 'C' I/O rw path on CPU 3 can
> > > see non-zero refcount if B I/O is going on. Finally, 'A' IO is done
> > > on CPU 1 and next I/O 'D' on CPU 1 can see non-zero refcount because
> > > 'C' on CPU 3 is going on. Infinite loop.
> > 
> > sure, I did think about this. and I actually didn't find any reason not
> > to use ->refcount there. if user wants to reset the device, he first
> > should umount it to make bdev->bd_holders check happy. and that's where
> > IOs will be failed. so it makes sense to switch to ->refcount there, IMHO.
> 
> If we use zram as block device itself(not a fs or swap) and open the
> block device as !FMODE_EXCL, bd_holders will be void.
> 
> Another topic: As I didn't see enough fs/block_dev.c bd_holders in zram
> would be mess. I guess we need to study hotplug of device and implement
> it for zram reset rather than strange own konb. It should go TODO. :(

Actually, I thought bd_mutex use from custom driver was terrible idea
so we should walk around with device hotplug but as I look through
another drivers, they have used the lock for a long time.
Maybe it's okay to use it in zram?
If so, Ganesh's patch is no problem to me although I didn't' review it in detail.
One thing I want to point out is that it would be better to change bd_holders
with bd_openers to filter out because dd test opens block device as !EXCL
so bd_holders will be void.

What do you think about it?

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2015-02-03  3:56 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1422432945-6764-1-git-send-email-minchan@kernel.org>
2015-01-28 14:19 ` [PATCH 1/2] zram: free meta table in zram_meta_free Sergey Senozhatsky
2015-01-28 23:17   ` Minchan Kim
2015-01-29  1:49     ` Ganesh Mahendran
     [not found] ` <1422432945-6764-2-git-send-email-minchan@kernel.org>
2015-01-28 14:56   ` [PATCH v1 2/2] zram: remove init_lock in zram_make_request Sergey Senozhatsky
2015-01-28 15:04     ` Sergey Senozhatsky
2015-01-28 23:33     ` Minchan Kim
     [not found]       ` <CAHqPoqKZFDSjO1pL+ixYe_m_L0nGNcu04qSNp-jd1fUixKtHnw@mail.gmail.com>
2015-01-29  2:01         ` Minchan Kim
2015-01-29  2:22           ` Sergey Senozhatsky
2015-01-29  5:28             ` Minchan Kim
2015-01-29  6:06               ` Sergey Senozhatsky
2015-01-29  6:35                 ` Minchan Kim
2015-01-29  7:08                   ` Sergey Senozhatsky
2015-01-30 14:41                     ` Minchan Kim
2015-01-31 11:31                       ` Sergey Senozhatsky
2015-02-01 14:50                       ` Sergey Senozhatsky
2015-02-01 15:04                         ` Sergey Senozhatsky
2015-02-02  1:43                           ` Minchan Kim
2015-02-02  1:59                             ` Sergey Senozhatsky
2015-02-02  2:45                               ` Minchan Kim
2015-02-02  3:47                                 ` Sergey Senozhatsky
2015-02-02  1:30                         ` Minchan Kim
2015-02-02  1:48                           ` Sergey Senozhatsky
2015-02-02  2:44                             ` Minchan Kim
2015-02-02  4:01                               ` Sergey Senozhatsky
2015-02-02  4:28                                 ` Minchan Kim
2015-02-02  5:09                                   ` Sergey Senozhatsky
2015-02-02  5:18                                     ` Minchan Kim
2015-02-02  5:28                                       ` Sergey Senozhatsky
2015-02-02  5:10                                   ` Minchan Kim
2015-01-30  0:20                   ` Sergey Senozhatsky
2015-01-29 13:48   ` Ganesh Mahendran
2015-01-29 15:12     ` Sergey Senozhatsky
2015-01-30  7:52       ` Ganesh Mahendran
2015-01-30  8:08         ` Sergey Senozhatsky
2015-01-31  8:50           ` Ganesh Mahendran
2015-01-31 11:07             ` Sergey Senozhatsky
2015-01-31 12:59               ` Ganesh Mahendran
2015-02-02  3:41 Minchan Kim
2015-02-02  5:59 ` Sergey Senozhatsky
2015-02-02  6:18   ` Minchan Kim
2015-02-02  7:06     ` Sergey Senozhatsky
2015-02-03  1:54       ` Sergey Senozhatsky
2015-02-03  3:02         ` Minchan Kim
2015-02-03  3:56           ` Sergey Senozhatsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).