linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm/zswap: add writethrough option
@ 2013-12-19 13:23 Dan Streetman
  2014-01-02 15:38 ` Dan Streetman
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Dan Streetman @ 2013-12-19 13:23 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Dan Streetman, Linux-MM, linux-kernel, Bob Liu, Minchan Kim,
	Weijie Yang, Shirish Pargaonkar, Mel Gorman, Andrew Morton

Currently, zswap is writeback cache; stored pages are not sent
to swap disk, and when zswap wants to evict old pages it must
first write them back to swap cache/disk manually.  This avoids
swap out disk I/O up front, but only moves that disk I/O to
the writeback case (for pages that are evicted), and adds the
overhead of having to uncompress the evicted pages and the
need for an additional free page (to store the uncompressed page).

This optionally changes zswap to writethrough cache by enabling
frontswap_writethrough() before registering, so that any
successful page store will also be written to swap disk.  The
default remains writeback.  To enable writethrough, the param
zswap.writethrough=1 must be used at boot.

Whether writeback or writethrough will provide better performance
depends on many factors including disk I/O speed/throughput,
CPU speed(s), system load, etc.  In most cases it is likely
that writeback has better performance than writethrough before
zswap is full, but after zswap fills up writethrough has
better performance than writeback.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>

---

Based on specjbb testing on my laptop, the results for both writeback
and writethrough are better than not using zswap at all, but writeback
does seem to be better than writethrough while zswap isn't full.  Once
it fills up, performance for writethrough is essentially close to not
using zswap, while writeback seems to be worse than not using zswap.
However, I think more testing on a wider span of systems and conditions
is needed.  Additionally, I'm not sure that specjbb is measuring true
performance under fully loaded cpu conditions, so additional cpu load
might need to be added or specjbb parameters modified (I took the
values from the 4 "warehouses" test run).

In any case though, I think having writethrough as an option is still
useful.  More changes could be made, such as changing from writeback
to writethrough based on the zswap % full.  And the patch doesn't
change default behavior - writethrough must be specifically enabled.

The %-ized numbers I got from specjbb on average, using the default
20% max_pool_percent and varying the amount of heap used as shown:

ram | no zswap | writeback | writethrough
75     93.08     100         96.90
87     96.58     95.58       96.72
100    92.29     89.73       86.75
112    63.80     38.66       19.66
125    4.79      29.90       15.75
137    4.99      4.50        4.75
150    4.28      4.62        5.01
162    5.20      2.94        4.66
175    5.71      2.11        4.84



 mm/zswap.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 64 insertions(+), 4 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index e55bab9..2f919db 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -61,6 +61,8 @@ static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
 static u64 zswap_pool_limit_hit;
 /* Pages written back when pool limit was reached */
 static u64 zswap_written_back_pages;
+/* Pages evicted when pool limit was reached */
+static u64 zswap_evicted_pages;
 /* Store failed due to a reclaim failure after pool limit was reached */
 static u64 zswap_reject_reclaim_fail;
 /* Compressed page was too big for the allocator to (optimally) store */
@@ -89,6 +91,10 @@ static unsigned int zswap_max_pool_percent = 20;
 module_param_named(max_pool_percent,
 			zswap_max_pool_percent, uint, 0644);
 
+/* Writeback/writethrough mode (fixed at boot for now) */
+static bool zswap_writethrough;
+module_param_named(writethrough, zswap_writethrough, bool, 0444);
+
 /*********************************
 * compression functions
 **********************************/
@@ -629,6 +635,48 @@ end:
 }
 
 /*********************************
+* evict code
+**********************************/
+
+/*
+ * This evicts pages that have already been written through to swap.
+ */
+static int zswap_evict_entry(struct zbud_pool *pool, unsigned long handle)
+{
+	struct zswap_header *zhdr;
+	swp_entry_t swpentry;
+	struct zswap_tree *tree;
+	pgoff_t offset;
+	struct zswap_entry *entry;
+
+	/* extract swpentry from data */
+	zhdr = zbud_map(pool, handle);
+	swpentry = zhdr->swpentry; /* here */
+	zbud_unmap(pool, handle);
+	tree = zswap_trees[swp_type(swpentry)];
+	offset = swp_offset(swpentry);
+	BUG_ON(pool != tree->pool);
+
+	/* find and ref zswap entry */
+	spin_lock(&tree->lock);
+	entry = zswap_rb_search(&tree->rbroot, offset);
+	if (!entry) {
+		/* entry was invalidated */
+		spin_unlock(&tree->lock);
+		return 0;
+	}
+
+	zswap_evicted_pages++;
+
+	zswap_rb_erase(&tree->rbroot, entry);
+	zswap_entry_put(tree, entry);
+
+	spin_unlock(&tree->lock);
+
+	return 0;
+}
+
+/*********************************
 * frontswap hooks
 **********************************/
 /* attempts to compress and store an single page */
@@ -744,7 +792,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
 	spin_lock(&tree->lock);
 	entry = zswap_entry_find_get(&tree->rbroot, offset);
 	if (!entry) {
-		/* entry was written back */
+		/* entry was written back or evicted */
 		spin_unlock(&tree->lock);
 		return -1;
 	}
@@ -778,7 +826,7 @@ static void zswap_frontswap_invalidate_page(unsigned type, pgoff_t offset)
 	spin_lock(&tree->lock);
 	entry = zswap_rb_search(&tree->rbroot, offset);
 	if (!entry) {
-		/* entry was written back */
+		/* entry was written back or evicted */
 		spin_unlock(&tree->lock);
 		return;
 	}
@@ -813,18 +861,26 @@ static void zswap_frontswap_invalidate_area(unsigned type)
 	zswap_trees[type] = NULL;
 }
 
-static struct zbud_ops zswap_zbud_ops = {
+static struct zbud_ops zswap_zbud_writeback_ops = {
 	.evict = zswap_writeback_entry
 };
+static struct zbud_ops zswap_zbud_writethrough_ops = {
+	.evict = zswap_evict_entry
+};
 
 static void zswap_frontswap_init(unsigned type)
 {
 	struct zswap_tree *tree;
+	struct zbud_ops *ops;
 
 	tree = kzalloc(sizeof(struct zswap_tree), GFP_KERNEL);
 	if (!tree)
 		goto err;
-	tree->pool = zbud_create_pool(GFP_KERNEL, &zswap_zbud_ops);
+	if (zswap_writethrough)
+		ops = &zswap_zbud_writethrough_ops;
+	else
+		ops = &zswap_zbud_writeback_ops;
+	tree->pool = zbud_create_pool(GFP_KERNEL, ops);
 	if (!tree->pool)
 		goto freetree;
 	tree->rbroot = RB_ROOT;
@@ -875,6 +931,8 @@ static int __init zswap_debugfs_init(void)
 			zswap_debugfs_root, &zswap_reject_compress_poor);
 	debugfs_create_u64("written_back_pages", S_IRUGO,
 			zswap_debugfs_root, &zswap_written_back_pages);
+	debugfs_create_u64("evicted_pages", S_IRUGO,
+			zswap_debugfs_root, &zswap_evicted_pages);
 	debugfs_create_u64("duplicate_entry", S_IRUGO,
 			zswap_debugfs_root, &zswap_duplicate_entry);
 	debugfs_create_u64("pool_pages", S_IRUGO,
@@ -919,6 +977,8 @@ static int __init init_zswap(void)
 		pr_err("per-cpu initialization failed\n");
 		goto pcpufail;
 	}
+	if (zswap_writethrough)
+		frontswap_writethrough(true);
 	frontswap_register_ops(&zswap_frontswap_ops);
 	if (zswap_debugfs_init())
 		pr_warn("debugfs initialization failed\n");
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/zswap: add writethrough option
  2013-12-19 13:23 [PATCH] mm/zswap: add writethrough option Dan Streetman
@ 2014-01-02 15:38 ` Dan Streetman
  2014-01-03  2:21   ` Weijie Yang
  2014-01-03 15:11 ` Seth Jennings
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Dan Streetman @ 2014-01-02 15:38 UTC (permalink / raw)
  To: Seth Jennings
  Cc: Dan Streetman, Linux-MM, linux-kernel, Bob Liu, Minchan Kim,
	Weijie Yang, Shirish Pargaonkar, Mel Gorman, Andrew Morton

Happy new year!

Seth, just checking if you have had a chance yet to think about this one.



On Thu, Dec 19, 2013 at 8:23 AM, Dan Streetman <ddstreet@ieee.org> wrote:
> Currently, zswap is writeback cache; stored pages are not sent
> to swap disk, and when zswap wants to evict old pages it must
> first write them back to swap cache/disk manually.  This avoids
> swap out disk I/O up front, but only moves that disk I/O to
> the writeback case (for pages that are evicted), and adds the
> overhead of having to uncompress the evicted pages and the
> need for an additional free page (to store the uncompressed page).
>
> This optionally changes zswap to writethrough cache by enabling
> frontswap_writethrough() before registering, so that any
> successful page store will also be written to swap disk.  The
> default remains writeback.  To enable writethrough, the param
> zswap.writethrough=1 must be used at boot.
>
> Whether writeback or writethrough will provide better performance
> depends on many factors including disk I/O speed/throughput,
> CPU speed(s), system load, etc.  In most cases it is likely
> that writeback has better performance than writethrough before
> zswap is full, but after zswap fills up writethrough has
> better performance than writeback.
>
> Signed-off-by: Dan Streetman <ddstreet@ieee.org>
>
> ---
>
> Based on specjbb testing on my laptop, the results for both writeback
> and writethrough are better than not using zswap at all, but writeback
> does seem to be better than writethrough while zswap isn't full.  Once
> it fills up, performance for writethrough is essentially close to not
> using zswap, while writeback seems to be worse than not using zswap.
> However, I think more testing on a wider span of systems and conditions
> is needed.  Additionally, I'm not sure that specjbb is measuring true
> performance under fully loaded cpu conditions, so additional cpu load
> might need to be added or specjbb parameters modified (I took the
> values from the 4 "warehouses" test run).
>
> In any case though, I think having writethrough as an option is still
> useful.  More changes could be made, such as changing from writeback
> to writethrough based on the zswap % full.  And the patch doesn't
> change default behavior - writethrough must be specifically enabled.
>
> The %-ized numbers I got from specjbb on average, using the default
> 20% max_pool_percent and varying the amount of heap used as shown:
>
> ram | no zswap | writeback | writethrough
> 75     93.08     100         96.90
> 87     96.58     95.58       96.72
> 100    92.29     89.73       86.75
> 112    63.80     38.66       19.66
> 125    4.79      29.90       15.75
> 137    4.99      4.50        4.75
> 150    4.28      4.62        5.01
> 162    5.20      2.94        4.66
> 175    5.71      2.11        4.84
>
>
>
>  mm/zswap.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 64 insertions(+), 4 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index e55bab9..2f919db 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -61,6 +61,8 @@ static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
>  static u64 zswap_pool_limit_hit;
>  /* Pages written back when pool limit was reached */
>  static u64 zswap_written_back_pages;
> +/* Pages evicted when pool limit was reached */
> +static u64 zswap_evicted_pages;
>  /* Store failed due to a reclaim failure after pool limit was reached */
>  static u64 zswap_reject_reclaim_fail;
>  /* Compressed page was too big for the allocator to (optimally) store */
> @@ -89,6 +91,10 @@ static unsigned int zswap_max_pool_percent = 20;
>  module_param_named(max_pool_percent,
>                         zswap_max_pool_percent, uint, 0644);
>
> +/* Writeback/writethrough mode (fixed at boot for now) */
> +static bool zswap_writethrough;
> +module_param_named(writethrough, zswap_writethrough, bool, 0444);
> +
>  /*********************************
>  * compression functions
>  **********************************/
> @@ -629,6 +635,48 @@ end:
>  }
>
>  /*********************************
> +* evict code
> +**********************************/
> +
> +/*
> + * This evicts pages that have already been written through to swap.
> + */
> +static int zswap_evict_entry(struct zbud_pool *pool, unsigned long handle)
> +{
> +       struct zswap_header *zhdr;
> +       swp_entry_t swpentry;
> +       struct zswap_tree *tree;
> +       pgoff_t offset;
> +       struct zswap_entry *entry;
> +
> +       /* extract swpentry from data */
> +       zhdr = zbud_map(pool, handle);
> +       swpentry = zhdr->swpentry; /* here */
> +       zbud_unmap(pool, handle);
> +       tree = zswap_trees[swp_type(swpentry)];
> +       offset = swp_offset(swpentry);
> +       BUG_ON(pool != tree->pool);
> +
> +       /* find and ref zswap entry */
> +       spin_lock(&tree->lock);
> +       entry = zswap_rb_search(&tree->rbroot, offset);
> +       if (!entry) {
> +               /* entry was invalidated */
> +               spin_unlock(&tree->lock);
> +               return 0;
> +       }
> +
> +       zswap_evicted_pages++;
> +
> +       zswap_rb_erase(&tree->rbroot, entry);
> +       zswap_entry_put(tree, entry);
> +
> +       spin_unlock(&tree->lock);
> +
> +       return 0;
> +}
> +
> +/*********************************
>  * frontswap hooks
>  **********************************/
>  /* attempts to compress and store an single page */
> @@ -744,7 +792,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
>         spin_lock(&tree->lock);
>         entry = zswap_entry_find_get(&tree->rbroot, offset);
>         if (!entry) {
> -               /* entry was written back */
> +               /* entry was written back or evicted */
>                 spin_unlock(&tree->lock);
>                 return -1;
>         }
> @@ -778,7 +826,7 @@ static void zswap_frontswap_invalidate_page(unsigned type, pgoff_t offset)
>         spin_lock(&tree->lock);
>         entry = zswap_rb_search(&tree->rbroot, offset);
>         if (!entry) {
> -               /* entry was written back */
> +               /* entry was written back or evicted */
>                 spin_unlock(&tree->lock);
>                 return;
>         }
> @@ -813,18 +861,26 @@ static void zswap_frontswap_invalidate_area(unsigned type)
>         zswap_trees[type] = NULL;
>  }
>
> -static struct zbud_ops zswap_zbud_ops = {
> +static struct zbud_ops zswap_zbud_writeback_ops = {
>         .evict = zswap_writeback_entry
>  };
> +static struct zbud_ops zswap_zbud_writethrough_ops = {
> +       .evict = zswap_evict_entry
> +};
>
>  static void zswap_frontswap_init(unsigned type)
>  {
>         struct zswap_tree *tree;
> +       struct zbud_ops *ops;
>
>         tree = kzalloc(sizeof(struct zswap_tree), GFP_KERNEL);
>         if (!tree)
>                 goto err;
> -       tree->pool = zbud_create_pool(GFP_KERNEL, &zswap_zbud_ops);
> +       if (zswap_writethrough)
> +               ops = &zswap_zbud_writethrough_ops;
> +       else
> +               ops = &zswap_zbud_writeback_ops;
> +       tree->pool = zbud_create_pool(GFP_KERNEL, ops);
>         if (!tree->pool)
>                 goto freetree;
>         tree->rbroot = RB_ROOT;
> @@ -875,6 +931,8 @@ static int __init zswap_debugfs_init(void)
>                         zswap_debugfs_root, &zswap_reject_compress_poor);
>         debugfs_create_u64("written_back_pages", S_IRUGO,
>                         zswap_debugfs_root, &zswap_written_back_pages);
> +       debugfs_create_u64("evicted_pages", S_IRUGO,
> +                       zswap_debugfs_root, &zswap_evicted_pages);
>         debugfs_create_u64("duplicate_entry", S_IRUGO,
>                         zswap_debugfs_root, &zswap_duplicate_entry);
>         debugfs_create_u64("pool_pages", S_IRUGO,
> @@ -919,6 +977,8 @@ static int __init init_zswap(void)
>                 pr_err("per-cpu initialization failed\n");
>                 goto pcpufail;
>         }
> +       if (zswap_writethrough)
> +               frontswap_writethrough(true);
>         frontswap_register_ops(&zswap_frontswap_ops);
>         if (zswap_debugfs_init())
>                 pr_warn("debugfs initialization failed\n");
> --
> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH] mm/zswap: add writethrough option
  2014-01-02 15:38 ` Dan Streetman
@ 2014-01-03  2:21   ` Weijie Yang
  0 siblings, 0 replies; 22+ messages in thread
From: Weijie Yang @ 2014-01-03  2:21 UTC (permalink / raw)
  To: 'Dan Streetman', 'Seth Jennings'
  Cc: 'Linux-MM', 'linux-kernel', 'Bob Liu',
	'Minchan Kim', 'Shirish Pargaonkar',
	'Mel Gorman', 'Andrew Morton'

On Thu, Jan 2, 2014 at 11:38 PM, Dan Streetman <ddstreet@ieee.org> wrote:
> Happy new year!
>
> Seth, just checking if you have had a chance yet to think about this one.
>
>
> On Thu, Dec 19, 2013 at 8:23 AM, Dan Streetman <ddstreet@ieee.org> wrote:
> > Currently, zswap is writeback cache; stored pages are not sent
> > to swap disk, and when zswap wants to evict old pages it must
> > first write them back to swap cache/disk manually.  This avoids
> > swap out disk I/O up front, but only moves that disk I/O to
> > the writeback case (for pages that are evicted), and adds the
> > overhead of having to uncompress the evicted pages and the
> > need for an additional free page (to store the uncompressed page).
> >
> > This optionally changes zswap to writethrough cache by enabling
> > frontswap_writethrough() before registering, so that any
> > successful page store will also be written to swap disk.  The
> > default remains writeback.  To enable writethrough, the param
> > zswap.writethrough=1 must be used at boot.
> >
> > Whether writeback or writethrough will provide better performance
> > depends on many factors including disk I/O speed/throughput,
> > CPU speed(s), system load, etc.  In most cases it is likely
> > that writeback has better performance than writethrough before
> > zswap is full, but after zswap fills up writethrough has
> > better performance than writeback.
> >
> > Signed-off-by: Dan Streetman <ddstreet@ieee.org>

Thanks for your work.
Although I won't try writethrough mode in embedded device, I hope it may be
helpful to others.
I reviewed this patch, and it is good to me.

Reviewed-by: Weijie Yang <weijie.yang@samsung.com>

> > ---
> >
> > Based on specjbb testing on my laptop, the results for both writeback
> > and writethrough are better than not using zswap at all, but writeback
> > does seem to be better than writethrough while zswap isn't full.  Once
> > it fills up, performance for writethrough is essentially close to not
> > using zswap, while writeback seems to be worse than not using zswap.
> > However, I think more testing on a wider span of systems and conditions
> > is needed.  Additionally, I'm not sure that specjbb is measuring true
> > performance under fully loaded cpu conditions, so additional cpu load
> > might need to be added or specjbb parameters modified (I took the
> > values from the 4 "warehouses" test run).
> >
> > In any case though, I think having writethrough as an option is still
> > useful.  More changes could be made, such as changing from writeback
> > to writethrough based on the zswap % full.  And the patch doesn't
> > change default behavior - writethrough must be specifically enabled.
> >
> > The %-ized numbers I got from specjbb on average, using the default
> > 20% max_pool_percent and varying the amount of heap used as shown:
> >
> > ram | no zswap | writeback | writethrough
> > 75     93.08     100         96.90
> > 87     96.58     95.58       96.72
> > 100    92.29     89.73       86.75
> > 112    63.80     38.66       19.66
> > 125    4.79      29.90       15.75
> > 137    4.99      4.50        4.75
> > 150    4.28      4.62        5.01
> > 162    5.20      2.94        4.66
> > 175    5.71      2.11        4.84
> >
> >
> >
> >  mm/zswap.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 64 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index e55bab9..2f919db 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -61,6 +61,8 @@ static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
> >  static u64 zswap_pool_limit_hit;
> >  /* Pages written back when pool limit was reached */
> >  static u64 zswap_written_back_pages;
> > +/* Pages evicted when pool limit was reached */
> > +static u64 zswap_evicted_pages;
> >  /* Store failed due to a reclaim failure after pool limit was reached */
> >  static u64 zswap_reject_reclaim_fail;
> >  /* Compressed page was too big for the allocator to (optimally) store */
> > @@ -89,6 +91,10 @@ static unsigned int zswap_max_pool_percent = 20;
> >  module_param_named(max_pool_percent,
> >                         zswap_max_pool_percent, uint, 0644);
> >
> > +/* Writeback/writethrough mode (fixed at boot for now) */
> > +static bool zswap_writethrough;
> > +module_param_named(writethrough, zswap_writethrough, bool, 0444);
> > +
> >  /*********************************
> >  * compression functions
> >  **********************************/
> > @@ -629,6 +635,48 @@ end:
> >  }
> >
> >  /*********************************
> > +* evict code
> > +**********************************/
> > +
> > +/*
> > + * This evicts pages that have already been written through to swap.
> > + */
> > +static int zswap_evict_entry(struct zbud_pool *pool, unsigned long handle)
> > +{
> > +       struct zswap_header *zhdr;
> > +       swp_entry_t swpentry;
> > +       struct zswap_tree *tree;
> > +       pgoff_t offset;
> > +       struct zswap_entry *entry;
> > +
> > +       /* extract swpentry from data */
> > +       zhdr = zbud_map(pool, handle);
> > +       swpentry = zhdr->swpentry; /* here */
> > +       zbud_unmap(pool, handle);
> > +       tree = zswap_trees[swp_type(swpentry)];
> > +       offset = swp_offset(swpentry);
> > +       BUG_ON(pool != tree->pool);
> > +
> > +       /* find and ref zswap entry */
> > +       spin_lock(&tree->lock);
> > +       entry = zswap_rb_search(&tree->rbroot, offset);
> > +       if (!entry) {
> > +               /* entry was invalidated */
> > +               spin_unlock(&tree->lock);
> > +               return 0;
> > +       }
> > +
> > +       zswap_evicted_pages++;
> > +
> > +       zswap_rb_erase(&tree->rbroot, entry);
> > +       zswap_entry_put(tree, entry);
> > +
> > +       spin_unlock(&tree->lock);
> > +
> > +       return 0;
> > +}
> > +
> > +/*********************************
> >  * frontswap hooks
> >  **********************************/
> >  /* attempts to compress and store an single page */
> > @@ -744,7 +792,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
> >         spin_lock(&tree->lock);
> >         entry = zswap_entry_find_get(&tree->rbroot, offset);
> >         if (!entry) {
> > -               /* entry was written back */
> > +               /* entry was written back or evicted */
> >                 spin_unlock(&tree->lock);
> >                 return -1;
> >         }
> > @@ -778,7 +826,7 @@ static void zswap_frontswap_invalidate_page(unsigned type, pgoff_t offset)
> >         spin_lock(&tree->lock);
> >         entry = zswap_rb_search(&tree->rbroot, offset);
> >         if (!entry) {
> > -               /* entry was written back */
> > +               /* entry was written back or evicted */
> >                 spin_unlock(&tree->lock);
> >                 return;
> >         }
> > @@ -813,18 +861,26 @@ static void zswap_frontswap_invalidate_area(unsigned type)
> >         zswap_trees[type] = NULL;
> >  }
> >
> > -static struct zbud_ops zswap_zbud_ops = {
> > +static struct zbud_ops zswap_zbud_writeback_ops = {
> >         .evict = zswap_writeback_entry
> >  };
> > +static struct zbud_ops zswap_zbud_writethrough_ops = {
> > +       .evict = zswap_evict_entry
> > +};
> >
> >  static void zswap_frontswap_init(unsigned type)
> >  {
> >         struct zswap_tree *tree;
> > +       struct zbud_ops *ops;
> >
> >         tree = kzalloc(sizeof(struct zswap_tree), GFP_KERNEL);
> >         if (!tree)
> >                 goto err;
> > -       tree->pool = zbud_create_pool(GFP_KERNEL, &zswap_zbud_ops);
> > +       if (zswap_writethrough)
> > +               ops = &zswap_zbud_writethrough_ops;
> > +       else
> > +               ops = &zswap_zbud_writeback_ops;
> > +       tree->pool = zbud_create_pool(GFP_KERNEL, ops);
> >         if (!tree->pool)
> >                 goto freetree;
> >         tree->rbroot = RB_ROOT;
> > @@ -875,6 +931,8 @@ static int __init zswap_debugfs_init(void)
> >                         zswap_debugfs_root, &zswap_reject_compress_poor);
> >         debugfs_create_u64("written_back_pages", S_IRUGO,
> >                         zswap_debugfs_root, &zswap_written_back_pages);
> > +       debugfs_create_u64("evicted_pages", S_IRUGO,
> > +                       zswap_debugfs_root, &zswap_evicted_pages);
> >         debugfs_create_u64("duplicate_entry", S_IRUGO,
> >                         zswap_debugfs_root, &zswap_duplicate_entry);
> >         debugfs_create_u64("pool_pages", S_IRUGO,
> > @@ -919,6 +977,8 @@ static int __init init_zswap(void)
> >                 pr_err("per-cpu initialization failed\n");
> >                 goto pcpufail;
> >         }
> > +       if (zswap_writethrough)
> > +               frontswap_writethrough(true);
> >         frontswap_register_ops(&zswap_frontswap_ops);
> >         if (zswap_debugfs_init())
> >                 pr_warn("debugfs initialization failed\n");
> > --
> > 1.8.3.1
> >


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/zswap: add writethrough option
  2013-12-19 13:23 [PATCH] mm/zswap: add writethrough option Dan Streetman
  2014-01-02 15:38 ` Dan Streetman
@ 2014-01-03 15:11 ` Seth Jennings
  2014-01-13 17:03   ` Dan Streetman
  2014-01-14  0:11 ` Minchan Kim
  2014-01-27 14:01 ` [PATCH v2] " Dan Streetman
  3 siblings, 1 reply; 22+ messages in thread
From: Seth Jennings @ 2014-01-03 15:11 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Linux-MM, linux-kernel, Bob Liu, Minchan Kim, Weijie Yang,
	Shirish Pargaonkar, Mel Gorman, Andrew Morton

On Thu, Dec 19, 2013 at 08:23:27AM -0500, Dan Streetman wrote:
> Currently, zswap is writeback cache; stored pages are not sent
> to swap disk, and when zswap wants to evict old pages it must
> first write them back to swap cache/disk manually.  This avoids
> swap out disk I/O up front, but only moves that disk I/O to
> the writeback case (for pages that are evicted), and adds the
> overhead of having to uncompress the evicted pages and the
> need for an additional free page (to store the uncompressed page).
> 
> This optionally changes zswap to writethrough cache by enabling
> frontswap_writethrough() before registering, so that any
> successful page store will also be written to swap disk.  The
> default remains writeback.  To enable writethrough, the param
> zswap.writethrough=1 must be used at boot.
> 
> Whether writeback or writethrough will provide better performance
> depends on many factors including disk I/O speed/throughput,
> CPU speed(s), system load, etc.  In most cases it is likely
> that writeback has better performance than writethrough before
> zswap is full, but after zswap fills up writethrough has
> better performance than writeback.
> 
> Signed-off-by: Dan Streetman <ddstreet@ieee.org>

Hey Dan, sorry for the delay on this.  Vacation and busyness.

This looks like a good option for those that don't mind having
the write overhead to ensure that things don't really bog down
if the compress pool overflows, while maintaining the read fault
speedup by decompressing from the pool.

Acked-by: Seth Jennings <sjennings@variantweb.net>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/zswap: add writethrough option
  2014-01-03 15:11 ` Seth Jennings
@ 2014-01-13 17:03   ` Dan Streetman
  0 siblings, 0 replies; 22+ messages in thread
From: Dan Streetman @ 2014-01-13 17:03 UTC (permalink / raw)
  To: Andrew Morton, Seth Jennings
  Cc: Linux-MM, linux-kernel, Bob Liu, Minchan Kim, Weijie Yang,
	Shirish Pargaonkar, Mel Gorman

Ping to see if this patch can get picked up.

On Fri, Jan 3, 2014 at 10:11 AM, Seth Jennings <sjennings@variantweb.net> wrote:
> On Thu, Dec 19, 2013 at 08:23:27AM -0500, Dan Streetman wrote:
>> Currently, zswap is writeback cache; stored pages are not sent
>> to swap disk, and when zswap wants to evict old pages it must
>> first write them back to swap cache/disk manually.  This avoids
>> swap out disk I/O up front, but only moves that disk I/O to
>> the writeback case (for pages that are evicted), and adds the
>> overhead of having to uncompress the evicted pages and the
>> need for an additional free page (to store the uncompressed page).
>>
>> This optionally changes zswap to writethrough cache by enabling
>> frontswap_writethrough() before registering, so that any
>> successful page store will also be written to swap disk.  The
>> default remains writeback.  To enable writethrough, the param
>> zswap.writethrough=1 must be used at boot.
>>
>> Whether writeback or writethrough will provide better performance
>> depends on many factors including disk I/O speed/throughput,
>> CPU speed(s), system load, etc.  In most cases it is likely
>> that writeback has better performance than writethrough before
>> zswap is full, but after zswap fills up writethrough has
>> better performance than writeback.
>>
>> Signed-off-by: Dan Streetman <ddstreet@ieee.org>
>
> Hey Dan, sorry for the delay on this.  Vacation and busyness.
>
> This looks like a good option for those that don't mind having
> the write overhead to ensure that things don't really bog down
> if the compress pool overflows, while maintaining the read fault
> speedup by decompressing from the pool.
>
> Acked-by: Seth Jennings <sjennings@variantweb.net>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/zswap: add writethrough option
  2013-12-19 13:23 [PATCH] mm/zswap: add writethrough option Dan Streetman
  2014-01-02 15:38 ` Dan Streetman
  2014-01-03 15:11 ` Seth Jennings
@ 2014-01-14  0:11 ` Minchan Kim
  2014-01-14 15:10   ` Dan Streetman
  2014-01-27 14:01 ` [PATCH v2] " Dan Streetman
  3 siblings, 1 reply; 22+ messages in thread
From: Minchan Kim @ 2014-01-14  0:11 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Seth Jennings, Linux-MM, linux-kernel, Bob Liu, Weijie Yang,
	Shirish Pargaonkar, Mel Gorman, Andrew Morton

Hello Dan,

Sorry for the late response and I didn't look at the code yet
because I am not convinced. :( 

On Thu, Dec 19, 2013 at 08:23:27AM -0500, Dan Streetman wrote:
> Currently, zswap is writeback cache; stored pages are not sent
> to swap disk, and when zswap wants to evict old pages it must
> first write them back to swap cache/disk manually.  This avoids
> swap out disk I/O up front, but only moves that disk I/O to
> the writeback case (for pages that are evicted), and adds the
> overhead of having to uncompress the evicted pages and the
> need for an additional free page (to store the uncompressed page).
> 
> This optionally changes zswap to writethrough cache by enabling
> frontswap_writethrough() before registering, so that any
> successful page store will also be written to swap disk.  The
> default remains writeback.  To enable writethrough, the param
> zswap.writethrough=1 must be used at boot.
> 
> Whether writeback or writethrough will provide better performance
> depends on many factors including disk I/O speed/throughput,
> CPU speed(s), system load, etc.  In most cases it is likely
> that writeback has better performance than writethrough before
> zswap is full, but after zswap fills up writethrough has
> better performance than writeback.

So you claims we should use writeback default but writethrough
after memory limit is full?
But it would break LRU ordering and I think better idea is to
handle it more generic way rather than chaning entire policy.

Now, zswap evict out just *a* page rather than a bunch of pages
so it stucks every store if many swap write happens continuously.
It's not efficient so how about adding kswapd's threshold concept
like min/low/high? So, it could evict early before reaching zswap
memory pool and stop it reaches high watermark.
I guess it could be better than now.

Other point: As I read device-mapper/cache.txt, cache operating mode
already supports writethrough. It means zram zRAM can support
writeback/writethough with dm-cache.
Have you tried it? Is there any problem?

Acutally, I really don't know how much benefit we have that in-memory
swap overcomming to the real storage but if you want, zRAM with dm-cache
is another option rather than invent new wheel by "just having is better".

Thanks.

> 
> Signed-off-by: Dan Streetman <ddstreet@ieee.org>
> 
> ---
> 
> Based on specjbb testing on my laptop, the results for both writeback
> and writethrough are better than not using zswap at all, but writeback
> does seem to be better than writethrough while zswap isn't full.  Once
> it fills up, performance for writethrough is essentially close to not
> using zswap, while writeback seems to be worse than not using zswap.
> However, I think more testing on a wider span of systems and conditions
> is needed.  Additionally, I'm not sure that specjbb is measuring true
> performance under fully loaded cpu conditions, so additional cpu load
> might need to be added or specjbb parameters modified (I took the
> values from the 4 "warehouses" test run).
> 
> In any case though, I think having writethrough as an option is still
> useful.  More changes could be made, such as changing from writeback
> to writethrough based on the zswap % full.  And the patch doesn't
> change default behavior - writethrough must be specifically enabled.
> 
> The %-ized numbers I got from specjbb on average, using the default
> 20% max_pool_percent and varying the amount of heap used as shown:
> 
> ram | no zswap | writeback | writethrough
> 75     93.08     100         96.90
> 87     96.58     95.58       96.72
> 100    92.29     89.73       86.75
> 112    63.80     38.66       19.66
> 125    4.79      29.90       15.75
> 137    4.99      4.50        4.75
> 150    4.28      4.62        5.01
> 162    5.20      2.94        4.66
> 175    5.71      2.11        4.84
> 
> 
> 
>  mm/zswap.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 64 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/zswap.c b/mm/zswap.c
> index e55bab9..2f919db 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -61,6 +61,8 @@ static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
>  static u64 zswap_pool_limit_hit;
>  /* Pages written back when pool limit was reached */
>  static u64 zswap_written_back_pages;
> +/* Pages evicted when pool limit was reached */
> +static u64 zswap_evicted_pages;
>  /* Store failed due to a reclaim failure after pool limit was reached */
>  static u64 zswap_reject_reclaim_fail;
>  /* Compressed page was too big for the allocator to (optimally) store */
> @@ -89,6 +91,10 @@ static unsigned int zswap_max_pool_percent = 20;
>  module_param_named(max_pool_percent,
>  			zswap_max_pool_percent, uint, 0644);
>  
> +/* Writeback/writethrough mode (fixed at boot for now) */
> +static bool zswap_writethrough;
> +module_param_named(writethrough, zswap_writethrough, bool, 0444);
> +
>  /*********************************
>  * compression functions
>  **********************************/
> @@ -629,6 +635,48 @@ end:
>  }
>  
>  /*********************************
> +* evict code
> +**********************************/
> +
> +/*
> + * This evicts pages that have already been written through to swap.
> + */
> +static int zswap_evict_entry(struct zbud_pool *pool, unsigned long handle)
> +{
> +	struct zswap_header *zhdr;
> +	swp_entry_t swpentry;
> +	struct zswap_tree *tree;
> +	pgoff_t offset;
> +	struct zswap_entry *entry;
> +
> +	/* extract swpentry from data */
> +	zhdr = zbud_map(pool, handle);
> +	swpentry = zhdr->swpentry; /* here */
> +	zbud_unmap(pool, handle);
> +	tree = zswap_trees[swp_type(swpentry)];
> +	offset = swp_offset(swpentry);
> +	BUG_ON(pool != tree->pool);
> +
> +	/* find and ref zswap entry */
> +	spin_lock(&tree->lock);
> +	entry = zswap_rb_search(&tree->rbroot, offset);
> +	if (!entry) {
> +		/* entry was invalidated */
> +		spin_unlock(&tree->lock);
> +		return 0;
> +	}
> +
> +	zswap_evicted_pages++;
> +
> +	zswap_rb_erase(&tree->rbroot, entry);
> +	zswap_entry_put(tree, entry);
> +
> +	spin_unlock(&tree->lock);
> +
> +	return 0;
> +}
> +
> +/*********************************
>  * frontswap hooks
>  **********************************/
>  /* attempts to compress and store an single page */
> @@ -744,7 +792,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
>  	spin_lock(&tree->lock);
>  	entry = zswap_entry_find_get(&tree->rbroot, offset);
>  	if (!entry) {
> -		/* entry was written back */
> +		/* entry was written back or evicted */
>  		spin_unlock(&tree->lock);
>  		return -1;
>  	}
> @@ -778,7 +826,7 @@ static void zswap_frontswap_invalidate_page(unsigned type, pgoff_t offset)
>  	spin_lock(&tree->lock);
>  	entry = zswap_rb_search(&tree->rbroot, offset);
>  	if (!entry) {
> -		/* entry was written back */
> +		/* entry was written back or evicted */
>  		spin_unlock(&tree->lock);
>  		return;
>  	}
> @@ -813,18 +861,26 @@ static void zswap_frontswap_invalidate_area(unsigned type)
>  	zswap_trees[type] = NULL;
>  }
>  
> -static struct zbud_ops zswap_zbud_ops = {
> +static struct zbud_ops zswap_zbud_writeback_ops = {
>  	.evict = zswap_writeback_entry
>  };
> +static struct zbud_ops zswap_zbud_writethrough_ops = {
> +	.evict = zswap_evict_entry
> +};
>  
>  static void zswap_frontswap_init(unsigned type)
>  {
>  	struct zswap_tree *tree;
> +	struct zbud_ops *ops;
>  
>  	tree = kzalloc(sizeof(struct zswap_tree), GFP_KERNEL);
>  	if (!tree)
>  		goto err;
> -	tree->pool = zbud_create_pool(GFP_KERNEL, &zswap_zbud_ops);
> +	if (zswap_writethrough)
> +		ops = &zswap_zbud_writethrough_ops;
> +	else
> +		ops = &zswap_zbud_writeback_ops;
> +	tree->pool = zbud_create_pool(GFP_KERNEL, ops);
>  	if (!tree->pool)
>  		goto freetree;
>  	tree->rbroot = RB_ROOT;
> @@ -875,6 +931,8 @@ static int __init zswap_debugfs_init(void)
>  			zswap_debugfs_root, &zswap_reject_compress_poor);
>  	debugfs_create_u64("written_back_pages", S_IRUGO,
>  			zswap_debugfs_root, &zswap_written_back_pages);
> +	debugfs_create_u64("evicted_pages", S_IRUGO,
> +			zswap_debugfs_root, &zswap_evicted_pages);
>  	debugfs_create_u64("duplicate_entry", S_IRUGO,
>  			zswap_debugfs_root, &zswap_duplicate_entry);
>  	debugfs_create_u64("pool_pages", S_IRUGO,
> @@ -919,6 +977,8 @@ static int __init init_zswap(void)
>  		pr_err("per-cpu initialization failed\n");
>  		goto pcpufail;
>  	}
> +	if (zswap_writethrough)
> +		frontswap_writethrough(true);
>  	frontswap_register_ops(&zswap_frontswap_ops);
>  	if (zswap_debugfs_init())
>  		pr_warn("debugfs initialization failed\n");
> -- 
> 1.8.3.1
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/zswap: add writethrough option
  2014-01-14  0:11 ` Minchan Kim
@ 2014-01-14 15:10   ` Dan Streetman
  2014-01-15  5:42     ` Minchan Kim
  0 siblings, 1 reply; 22+ messages in thread
From: Dan Streetman @ 2014-01-14 15:10 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Seth Jennings, Linux-MM, linux-kernel, Bob Liu, Weijie Yang,
	Shirish Pargaonkar, Mel Gorman, Andrew Morton

On Mon, Jan 13, 2014 at 7:11 PM, Minchan Kim <minchan@kernel.org> wrote:
> Hello Dan,
>
> Sorry for the late response and I didn't look at the code yet
> because I am not convinced. :(
>
> On Thu, Dec 19, 2013 at 08:23:27AM -0500, Dan Streetman wrote:
>> Currently, zswap is writeback cache; stored pages are not sent
>> to swap disk, and when zswap wants to evict old pages it must
>> first write them back to swap cache/disk manually.  This avoids
>> swap out disk I/O up front, but only moves that disk I/O to
>> the writeback case (for pages that are evicted), and adds the
>> overhead of having to uncompress the evicted pages and the
>> need for an additional free page (to store the uncompressed page).
>>
>> This optionally changes zswap to writethrough cache by enabling
>> frontswap_writethrough() before registering, so that any
>> successful page store will also be written to swap disk.  The
>> default remains writeback.  To enable writethrough, the param
>> zswap.writethrough=1 must be used at boot.
>>
>> Whether writeback or writethrough will provide better performance
>> depends on many factors including disk I/O speed/throughput,
>> CPU speed(s), system load, etc.  In most cases it is likely
>> that writeback has better performance than writethrough before
>> zswap is full, but after zswap fills up writethrough has
>> better performance than writeback.
>
> So you claims we should use writeback default but writethrough
> after memory limit is full?
> But it would break LRU ordering and I think better idea is to
> handle it more generic way rather than chaning entire policy.

This patch only adds the option of using writethrough.  That's all.

> Now, zswap evict out just *a* page rather than a bunch of pages
> so it stucks every store if many swap write happens continuously.
> It's not efficient so how about adding kswapd's threshold concept
> like min/low/high? So, it could evict early before reaching zswap
> memory pool and stop it reaches high watermark.
> I guess it could be better than now.

Well, I don't think that's related to this patch, but certainly a good idea to
investigate.

>
> Other point: As I read device-mapper/cache.txt, cache operating mode
> already supports writethrough. It means zram zRAM can support
> writeback/writethough with dm-cache.
> Have you tried it? Is there any problem?

zswap isn't a block device though, so that doesn't apply (unless I'm
missing something).

>
> Acutally, I really don't know how much benefit we have that in-memory
> swap overcomming to the real storage but if you want, zRAM with dm-cache
> is another option rather than invent new wheel by "just having is better".

I'm not sure if this patch is related to the zswap vs. zram discussions.  This
only adds the option of using writethrough to zswap.  It's a first
step to possibly
making zswap work more efficiently using writeback and/or writethrough
depending on
the system and conditions.

>
> Thanks.
>
>>
>> Signed-off-by: Dan Streetman <ddstreet@ieee.org>
>>
>> ---
>>
>> Based on specjbb testing on my laptop, the results for both writeback
>> and writethrough are better than not using zswap at all, but writeback
>> does seem to be better than writethrough while zswap isn't full.  Once
>> it fills up, performance for writethrough is essentially close to not
>> using zswap, while writeback seems to be worse than not using zswap.
>> However, I think more testing on a wider span of systems and conditions
>> is needed.  Additionally, I'm not sure that specjbb is measuring true
>> performance under fully loaded cpu conditions, so additional cpu load
>> might need to be added or specjbb parameters modified (I took the
>> values from the 4 "warehouses" test run).
>>
>> In any case though, I think having writethrough as an option is still
>> useful.  More changes could be made, such as changing from writeback
>> to writethrough based on the zswap % full.  And the patch doesn't
>> change default behavior - writethrough must be specifically enabled.
>>
>> The %-ized numbers I got from specjbb on average, using the default
>> 20% max_pool_percent and varying the amount of heap used as shown:
>>
>> ram | no zswap | writeback | writethrough
>> 75     93.08     100         96.90
>> 87     96.58     95.58       96.72
>> 100    92.29     89.73       86.75
>> 112    63.80     38.66       19.66
>> 125    4.79      29.90       15.75
>> 137    4.99      4.50        4.75
>> 150    4.28      4.62        5.01
>> 162    5.20      2.94        4.66
>> 175    5.71      2.11        4.84
>>
>>
>>
>>  mm/zswap.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
>>  1 file changed, 64 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/zswap.c b/mm/zswap.c
>> index e55bab9..2f919db 100644
>> --- a/mm/zswap.c
>> +++ b/mm/zswap.c
>> @@ -61,6 +61,8 @@ static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
>>  static u64 zswap_pool_limit_hit;
>>  /* Pages written back when pool limit was reached */
>>  static u64 zswap_written_back_pages;
>> +/* Pages evicted when pool limit was reached */
>> +static u64 zswap_evicted_pages;
>>  /* Store failed due to a reclaim failure after pool limit was reached */
>>  static u64 zswap_reject_reclaim_fail;
>>  /* Compressed page was too big for the allocator to (optimally) store */
>> @@ -89,6 +91,10 @@ static unsigned int zswap_max_pool_percent = 20;
>>  module_param_named(max_pool_percent,
>>                       zswap_max_pool_percent, uint, 0644);
>>
>> +/* Writeback/writethrough mode (fixed at boot for now) */
>> +static bool zswap_writethrough;
>> +module_param_named(writethrough, zswap_writethrough, bool, 0444);
>> +
>>  /*********************************
>>  * compression functions
>>  **********************************/
>> @@ -629,6 +635,48 @@ end:
>>  }
>>
>>  /*********************************
>> +* evict code
>> +**********************************/
>> +
>> +/*
>> + * This evicts pages that have already been written through to swap.
>> + */
>> +static int zswap_evict_entry(struct zbud_pool *pool, unsigned long handle)
>> +{
>> +     struct zswap_header *zhdr;
>> +     swp_entry_t swpentry;
>> +     struct zswap_tree *tree;
>> +     pgoff_t offset;
>> +     struct zswap_entry *entry;
>> +
>> +     /* extract swpentry from data */
>> +     zhdr = zbud_map(pool, handle);
>> +     swpentry = zhdr->swpentry; /* here */
>> +     zbud_unmap(pool, handle);
>> +     tree = zswap_trees[swp_type(swpentry)];
>> +     offset = swp_offset(swpentry);
>> +     BUG_ON(pool != tree->pool);
>> +
>> +     /* find and ref zswap entry */
>> +     spin_lock(&tree->lock);
>> +     entry = zswap_rb_search(&tree->rbroot, offset);
>> +     if (!entry) {
>> +             /* entry was invalidated */
>> +             spin_unlock(&tree->lock);
>> +             return 0;
>> +     }
>> +
>> +     zswap_evicted_pages++;
>> +
>> +     zswap_rb_erase(&tree->rbroot, entry);
>> +     zswap_entry_put(tree, entry);
>> +
>> +     spin_unlock(&tree->lock);
>> +
>> +     return 0;
>> +}
>> +
>> +/*********************************
>>  * frontswap hooks
>>  **********************************/
>>  /* attempts to compress and store an single page */
>> @@ -744,7 +792,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
>>       spin_lock(&tree->lock);
>>       entry = zswap_entry_find_get(&tree->rbroot, offset);
>>       if (!entry) {
>> -             /* entry was written back */
>> +             /* entry was written back or evicted */
>>               spin_unlock(&tree->lock);
>>               return -1;
>>       }
>> @@ -778,7 +826,7 @@ static void zswap_frontswap_invalidate_page(unsigned type, pgoff_t offset)
>>       spin_lock(&tree->lock);
>>       entry = zswap_rb_search(&tree->rbroot, offset);
>>       if (!entry) {
>> -             /* entry was written back */
>> +             /* entry was written back or evicted */
>>               spin_unlock(&tree->lock);
>>               return;
>>       }
>> @@ -813,18 +861,26 @@ static void zswap_frontswap_invalidate_area(unsigned type)
>>       zswap_trees[type] = NULL;
>>  }
>>
>> -static struct zbud_ops zswap_zbud_ops = {
>> +static struct zbud_ops zswap_zbud_writeback_ops = {
>>       .evict = zswap_writeback_entry
>>  };
>> +static struct zbud_ops zswap_zbud_writethrough_ops = {
>> +     .evict = zswap_evict_entry
>> +};
>>
>>  static void zswap_frontswap_init(unsigned type)
>>  {
>>       struct zswap_tree *tree;
>> +     struct zbud_ops *ops;
>>
>>       tree = kzalloc(sizeof(struct zswap_tree), GFP_KERNEL);
>>       if (!tree)
>>               goto err;
>> -     tree->pool = zbud_create_pool(GFP_KERNEL, &zswap_zbud_ops);
>> +     if (zswap_writethrough)
>> +             ops = &zswap_zbud_writethrough_ops;
>> +     else
>> +             ops = &zswap_zbud_writeback_ops;
>> +     tree->pool = zbud_create_pool(GFP_KERNEL, ops);
>>       if (!tree->pool)
>>               goto freetree;
>>       tree->rbroot = RB_ROOT;
>> @@ -875,6 +931,8 @@ static int __init zswap_debugfs_init(void)
>>                       zswap_debugfs_root, &zswap_reject_compress_poor);
>>       debugfs_create_u64("written_back_pages", S_IRUGO,
>>                       zswap_debugfs_root, &zswap_written_back_pages);
>> +     debugfs_create_u64("evicted_pages", S_IRUGO,
>> +                     zswap_debugfs_root, &zswap_evicted_pages);
>>       debugfs_create_u64("duplicate_entry", S_IRUGO,
>>                       zswap_debugfs_root, &zswap_duplicate_entry);
>>       debugfs_create_u64("pool_pages", S_IRUGO,
>> @@ -919,6 +977,8 @@ static int __init init_zswap(void)
>>               pr_err("per-cpu initialization failed\n");
>>               goto pcpufail;
>>       }
>> +     if (zswap_writethrough)
>> +             frontswap_writethrough(true);
>>       frontswap_register_ops(&zswap_frontswap_ops);
>>       if (zswap_debugfs_init())
>>               pr_warn("debugfs initialization failed\n");
>> --
>> 1.8.3.1
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
> --
> Kind regards,
> Minchan Kim

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/zswap: add writethrough option
  2014-01-14 15:10   ` Dan Streetman
@ 2014-01-15  5:42     ` Minchan Kim
  2014-01-17  5:41       ` Dan Streetman
  0 siblings, 1 reply; 22+ messages in thread
From: Minchan Kim @ 2014-01-15  5:42 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Seth Jennings, Linux-MM, linux-kernel, Bob Liu, Weijie Yang,
	Shirish Pargaonkar, Mel Gorman, Andrew Morton

Hello,

On Tue, Jan 14, 2014 at 10:10:44AM -0500, Dan Streetman wrote:
> On Mon, Jan 13, 2014 at 7:11 PM, Minchan Kim <minchan@kernel.org> wrote:
> > Hello Dan,
> >
> > Sorry for the late response and I didn't look at the code yet
> > because I am not convinced. :(
> >
> > On Thu, Dec 19, 2013 at 08:23:27AM -0500, Dan Streetman wrote:
> >> Currently, zswap is writeback cache; stored pages are not sent
> >> to swap disk, and when zswap wants to evict old pages it must
> >> first write them back to swap cache/disk manually.  This avoids
> >> swap out disk I/O up front, but only moves that disk I/O to
> >> the writeback case (for pages that are evicted), and adds the
> >> overhead of having to uncompress the evicted pages and the
> >> need for an additional free page (to store the uncompressed page).
> >>
> >> This optionally changes zswap to writethrough cache by enabling
> >> frontswap_writethrough() before registering, so that any
> >> successful page store will also be written to swap disk.  The
> >> default remains writeback.  To enable writethrough, the param
> >> zswap.writethrough=1 must be used at boot.
> >>
> >> Whether writeback or writethrough will provide better performance
> >> depends on many factors including disk I/O speed/throughput,
> >> CPU speed(s), system load, etc.  In most cases it is likely
> >> that writeback has better performance than writethrough before
> >> zswap is full, but after zswap fills up writethrough has
> >> better performance than writeback.
> >
> > So you claims we should use writeback default but writethrough
> > after memory limit is full?
> > But it would break LRU ordering and I think better idea is to
> > handle it more generic way rather than chaning entire policy.
> 
> This patch only adds the option of using writethrough.  That's all.

The point is that please explain that what's the your problem now
and prove that adding new option for solve the problem is best.
Just "Optionally, having is better" is not good approach to merge/maintain.

> 
> > Now, zswap evict out just *a* page rather than a bunch of pages
> > so it stucks every store if many swap write happens continuously.
> > It's not efficient so how about adding kswapd's threshold concept
> > like min/low/high? So, it could evict early before reaching zswap
> > memory pool and stop it reaches high watermark.
> > I guess it could be better than now.
> 
> Well, I don't think that's related to this patch, but certainly a good idea to
> investigate.

Why I suggested it that I feel from your description that wb is just
slower than wt since zswap memory is pool.

> 
> >
> > Other point: As I read device-mapper/cache.txt, cache operating mode
> > already supports writethrough. It means zram zRAM can support
> > writeback/writethough with dm-cache.
> > Have you tried it? Is there any problem?
> 
> zswap isn't a block device though, so that doesn't apply (unless I'm
> missing something).

zram is block device so freely you can make it to swap block device
and binding it with dm-cache will make what you want.
The whole point is we could do what you want without adding new
so I hope you prove what's the problem in existing solution so that
we could judge and try to solve the pain point with more ideal
approach.

> 
> >
> > Acutally, I really don't know how much benefit we have that in-memory
> > swap overcomming to the real storage but if you want, zRAM with dm-cache
> > is another option rather than invent new wheel by "just having is better".
> 
> I'm not sure if this patch is related to the zswap vs. zram discussions.  This
> only adds the option of using writethrough to zswap.  It's a first
> step to possibly
> making zswap work more efficiently using writeback and/or writethrough
> depending on
> the system and conditions.

The patch size is small. Okay I don't want to be a party-pooper
but at least, I should say my thought for Andrew to help judging.

> 
> >
> > Thanks.
> >
> >>
> >> Signed-off-by: Dan Streetman <ddstreet@ieee.org>
> >>
> >> ---
> >>
> >> Based on specjbb testing on my laptop, the results for both writeback
> >> and writethrough are better than not using zswap at all, but writeback
> >> does seem to be better than writethrough while zswap isn't full.  Once
> >> it fills up, performance for writethrough is essentially close to not
> >> using zswap, while writeback seems to be worse than not using zswap.
> >> However, I think more testing on a wider span of systems and conditions
> >> is needed.  Additionally, I'm not sure that specjbb is measuring true
> >> performance under fully loaded cpu conditions, so additional cpu load
> >> might need to be added or specjbb parameters modified (I took the
> >> values from the 4 "warehouses" test run).
> >>
> >> In any case though, I think having writethrough as an option is still
> >> useful.  More changes could be made, such as changing from writeback
> >> to writethrough based on the zswap % full.  And the patch doesn't
> >> change default behavior - writethrough must be specifically enabled.
> >>
> >> The %-ized numbers I got from specjbb on average, using the default
> >> 20% max_pool_percent and varying the amount of heap used as shown:
> >>
> >> ram | no zswap | writeback | writethrough
> >> 75     93.08     100         96.90
> >> 87     96.58     95.58       96.72
> >> 100    92.29     89.73       86.75
> >> 112    63.80     38.66       19.66
> >> 125    4.79      29.90       15.75
> >> 137    4.99      4.50        4.75
> >> 150    4.28      4.62        5.01
> >> 162    5.20      2.94        4.66
> >> 175    5.71      2.11        4.84
> >>
> >>
> >>
> >>  mm/zswap.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
> >>  1 file changed, 64 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/mm/zswap.c b/mm/zswap.c
> >> index e55bab9..2f919db 100644
> >> --- a/mm/zswap.c
> >> +++ b/mm/zswap.c
> >> @@ -61,6 +61,8 @@ static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
> >>  static u64 zswap_pool_limit_hit;
> >>  /* Pages written back when pool limit was reached */
> >>  static u64 zswap_written_back_pages;
> >> +/* Pages evicted when pool limit was reached */
> >> +static u64 zswap_evicted_pages;
> >>  /* Store failed due to a reclaim failure after pool limit was reached */
> >>  static u64 zswap_reject_reclaim_fail;
> >>  /* Compressed page was too big for the allocator to (optimally) store */
> >> @@ -89,6 +91,10 @@ static unsigned int zswap_max_pool_percent = 20;
> >>  module_param_named(max_pool_percent,
> >>                       zswap_max_pool_percent, uint, 0644);
> >>
> >> +/* Writeback/writethrough mode (fixed at boot for now) */
> >> +static bool zswap_writethrough;
> >> +module_param_named(writethrough, zswap_writethrough, bool, 0444);
> >> +
> >>  /*********************************
> >>  * compression functions
> >>  **********************************/
> >> @@ -629,6 +635,48 @@ end:
> >>  }
> >>
> >>  /*********************************
> >> +* evict code
> >> +**********************************/
> >> +
> >> +/*
> >> + * This evicts pages that have already been written through to swap.
> >> + */
> >> +static int zswap_evict_entry(struct zbud_pool *pool, unsigned long handle)
> >> +{
> >> +     struct zswap_header *zhdr;
> >> +     swp_entry_t swpentry;
> >> +     struct zswap_tree *tree;
> >> +     pgoff_t offset;
> >> +     struct zswap_entry *entry;
> >> +
> >> +     /* extract swpentry from data */
> >> +     zhdr = zbud_map(pool, handle);
> >> +     swpentry = zhdr->swpentry; /* here */
> >> +     zbud_unmap(pool, handle);
> >> +     tree = zswap_trees[swp_type(swpentry)];
> >> +     offset = swp_offset(swpentry);
> >> +     BUG_ON(pool != tree->pool);
> >> +
> >> +     /* find and ref zswap entry */
> >> +     spin_lock(&tree->lock);
> >> +     entry = zswap_rb_search(&tree->rbroot, offset);
> >> +     if (!entry) {
> >> +             /* entry was invalidated */
> >> +             spin_unlock(&tree->lock);
> >> +             return 0;
> >> +     }
> >> +
> >> +     zswap_evicted_pages++;
> >> +
> >> +     zswap_rb_erase(&tree->rbroot, entry);
> >> +     zswap_entry_put(tree, entry);
> >> +
> >> +     spin_unlock(&tree->lock);
> >> +
> >> +     return 0;
> >> +}
> >> +
> >> +/*********************************
> >>  * frontswap hooks
> >>  **********************************/
> >>  /* attempts to compress and store an single page */
> >> @@ -744,7 +792,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
> >>       spin_lock(&tree->lock);
> >>       entry = zswap_entry_find_get(&tree->rbroot, offset);
> >>       if (!entry) {
> >> -             /* entry was written back */
> >> +             /* entry was written back or evicted */
> >>               spin_unlock(&tree->lock);
> >>               return -1;
> >>       }
> >> @@ -778,7 +826,7 @@ static void zswap_frontswap_invalidate_page(unsigned type, pgoff_t offset)
> >>       spin_lock(&tree->lock);
> >>       entry = zswap_rb_search(&tree->rbroot, offset);
> >>       if (!entry) {
> >> -             /* entry was written back */
> >> +             /* entry was written back or evicted */
> >>               spin_unlock(&tree->lock);
> >>               return;
> >>       }
> >> @@ -813,18 +861,26 @@ static void zswap_frontswap_invalidate_area(unsigned type)
> >>       zswap_trees[type] = NULL;
> >>  }
> >>
> >> -static struct zbud_ops zswap_zbud_ops = {
> >> +static struct zbud_ops zswap_zbud_writeback_ops = {
> >>       .evict = zswap_writeback_entry
> >>  };
> >> +static struct zbud_ops zswap_zbud_writethrough_ops = {
> >> +     .evict = zswap_evict_entry
> >> +};
> >>
> >>  static void zswap_frontswap_init(unsigned type)
> >>  {
> >>       struct zswap_tree *tree;
> >> +     struct zbud_ops *ops;
> >>
> >>       tree = kzalloc(sizeof(struct zswap_tree), GFP_KERNEL);
> >>       if (!tree)
> >>               goto err;
> >> -     tree->pool = zbud_create_pool(GFP_KERNEL, &zswap_zbud_ops);
> >> +     if (zswap_writethrough)
> >> +             ops = &zswap_zbud_writethrough_ops;
> >> +     else
> >> +             ops = &zswap_zbud_writeback_ops;
> >> +     tree->pool = zbud_create_pool(GFP_KERNEL, ops);
> >>       if (!tree->pool)
> >>               goto freetree;
> >>       tree->rbroot = RB_ROOT;
> >> @@ -875,6 +931,8 @@ static int __init zswap_debugfs_init(void)
> >>                       zswap_debugfs_root, &zswap_reject_compress_poor);
> >>       debugfs_create_u64("written_back_pages", S_IRUGO,
> >>                       zswap_debugfs_root, &zswap_written_back_pages);
> >> +     debugfs_create_u64("evicted_pages", S_IRUGO,
> >> +                     zswap_debugfs_root, &zswap_evicted_pages);
> >>       debugfs_create_u64("duplicate_entry", S_IRUGO,
> >>                       zswap_debugfs_root, &zswap_duplicate_entry);
> >>       debugfs_create_u64("pool_pages", S_IRUGO,
> >> @@ -919,6 +977,8 @@ static int __init init_zswap(void)
> >>               pr_err("per-cpu initialization failed\n");
> >>               goto pcpufail;
> >>       }
> >> +     if (zswap_writethrough)
> >> +             frontswap_writethrough(true);
> >>       frontswap_register_ops(&zswap_frontswap_ops);
> >>       if (zswap_debugfs_init())
> >>               pr_warn("debugfs initialization failed\n");
> >> --
> >> 1.8.3.1
> >>
> >> --
> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >> the body to majordomo@kvack.org.  For more info on Linux MM,
> >> see: http://www.linux-mm.org/ .
> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> >
> > --
> > Kind regards,
> > Minchan Kim
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/zswap: add writethrough option
  2014-01-15  5:42     ` Minchan Kim
@ 2014-01-17  5:41       ` Dan Streetman
  2014-01-22 14:19         ` Dan Streetman
  0 siblings, 1 reply; 22+ messages in thread
From: Dan Streetman @ 2014-01-17  5:41 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Seth Jennings, Linux-MM, linux-kernel, Bob Liu, Weijie Yang,
	Shirish Pargaonkar, Mel Gorman, Andrew Morton

On Wed, Jan 15, 2014 at 12:42 AM, Minchan Kim <minchan@kernel.org> wrote:
> Hello,
>
> On Tue, Jan 14, 2014 at 10:10:44AM -0500, Dan Streetman wrote:
>> On Mon, Jan 13, 2014 at 7:11 PM, Minchan Kim <minchan@kernel.org> wrote:
>> > Hello Dan,
>> >
>> > Sorry for the late response and I didn't look at the code yet
>> > because I am not convinced. :(
>> >
>> > On Thu, Dec 19, 2013 at 08:23:27AM -0500, Dan Streetman wrote:
>> >> Currently, zswap is writeback cache; stored pages are not sent
>> >> to swap disk, and when zswap wants to evict old pages it must
>> >> first write them back to swap cache/disk manually.  This avoids
>> >> swap out disk I/O up front, but only moves that disk I/O to
>> >> the writeback case (for pages that are evicted), and adds the
>> >> overhead of having to uncompress the evicted pages and the
>> >> need for an additional free page (to store the uncompressed page).
>> >>
>> >> This optionally changes zswap to writethrough cache by enabling
>> >> frontswap_writethrough() before registering, so that any
>> >> successful page store will also be written to swap disk.  The
>> >> default remains writeback.  To enable writethrough, the param
>> >> zswap.writethrough=1 must be used at boot.
>> >>
>> >> Whether writeback or writethrough will provide better performance
>> >> depends on many factors including disk I/O speed/throughput,
>> >> CPU speed(s), system load, etc.  In most cases it is likely
>> >> that writeback has better performance than writethrough before
>> >> zswap is full, but after zswap fills up writethrough has
>> >> better performance than writeback.
>> >
>> > So you claims we should use writeback default but writethrough
>> > after memory limit is full?
>> > But it would break LRU ordering and I think better idea is to
>> > handle it more generic way rather than chaning entire policy.
>>
>> This patch only adds the option of using writethrough.  That's all.
>
> The point is that please explain that what's the your problem now
> and prove that adding new option for solve the problem is best.
> Just "Optionally, having is better" is not good approach to merge/maintain.

You may have missed the earlier emails discussing all that, so to
recap it appears that writeback is (usually) faster before zswap is
full, while writethrough appears (usually) slightly faster after zswap
has filled up.  It's highly dependent on the actual system details
(cpu speed, IO speed, load, etc) though.

>
>>
>> > Now, zswap evict out just *a* page rather than a bunch of pages
>> > so it stucks every store if many swap write happens continuously.
>> > It's not efficient so how about adding kswapd's threshold concept
>> > like min/low/high? So, it could evict early before reaching zswap
>> > memory pool and stop it reaches high watermark.
>> > I guess it could be better than now.
>>
>> Well, I don't think that's related to this patch, but certainly a good idea to
>> investigate.
>
> Why I suggested it that I feel from your description that wb is just
> slower than wt since zswap memory is pool.

evicting pages early doesn't avoid the overhead of having to
decompress the pages nor does it avoid having to write them to disk,
so I don't think it has a direct relation to this patch to add the
writethrough option.

>
>>
>> >
>> > Other point: As I read device-mapper/cache.txt, cache operating mode
>> > already supports writethrough. It means zram zRAM can support
>> > writeback/writethough with dm-cache.
>> > Have you tried it? Is there any problem?
>>
>> zswap isn't a block device though, so that doesn't apply (unless I'm
>> missing something).
>
> zram is block device so freely you can make it to swap block device
> and binding it with dm-cache will make what you want.
> The whole point is we could do what you want without adding new
> so I hope you prove what's the problem in existing solution so that
> we could judge and try to solve the pain point with more ideal
> approach.

Sorry, it seems like you're saying "you can drop zswap and start using
zram, so this patch isn't needed", which really doesn't actually
address this patch I don't think.  zswap vs. zram isn't an argument
I'm trying to get into right now.

>
>>
>> >
>> > Acutally, I really don't know how much benefit we have that in-memory
>> > swap overcomming to the real storage but if you want, zRAM with dm-cache
>> > is another option rather than invent new wheel by "just having is better".
>>
>> I'm not sure if this patch is related to the zswap vs. zram discussions.  This
>> only adds the option of using writethrough to zswap.  It's a first
>> step to possibly
>> making zswap work more efficiently using writeback and/or writethrough
>> depending on
>> the system and conditions.
>
> The patch size is small. Okay I don't want to be a party-pooper
> but at least, I should say my thought for Andrew to help judging.

Sure, I'm glad to have your suggestions.

>
>>
>> >
>> > Thanks.
>> >
>> >>
>> >> Signed-off-by: Dan Streetman <ddstreet@ieee.org>
>> >>
>> >> ---
>> >>
>> >> Based on specjbb testing on my laptop, the results for both writeback
>> >> and writethrough are better than not using zswap at all, but writeback
>> >> does seem to be better than writethrough while zswap isn't full.  Once
>> >> it fills up, performance for writethrough is essentially close to not
>> >> using zswap, while writeback seems to be worse than not using zswap.
>> >> However, I think more testing on a wider span of systems and conditions
>> >> is needed.  Additionally, I'm not sure that specjbb is measuring true
>> >> performance under fully loaded cpu conditions, so additional cpu load
>> >> might need to be added or specjbb parameters modified (I took the
>> >> values from the 4 "warehouses" test run).
>> >>
>> >> In any case though, I think having writethrough as an option is still
>> >> useful.  More changes could be made, such as changing from writeback
>> >> to writethrough based on the zswap % full.  And the patch doesn't
>> >> change default behavior - writethrough must be specifically enabled.
>> >>
>> >> The %-ized numbers I got from specjbb on average, using the default
>> >> 20% max_pool_percent and varying the amount of heap used as shown:
>> >>
>> >> ram | no zswap | writeback | writethrough
>> >> 75     93.08     100         96.90
>> >> 87     96.58     95.58       96.72
>> >> 100    92.29     89.73       86.75
>> >> 112    63.80     38.66       19.66
>> >> 125    4.79      29.90       15.75
>> >> 137    4.99      4.50        4.75
>> >> 150    4.28      4.62        5.01
>> >> 162    5.20      2.94        4.66
>> >> 175    5.71      2.11        4.84
>> >>
>> >>
>> >>
>> >>  mm/zswap.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
>> >>  1 file changed, 64 insertions(+), 4 deletions(-)
>> >>
>> >> diff --git a/mm/zswap.c b/mm/zswap.c
>> >> index e55bab9..2f919db 100644
>> >> --- a/mm/zswap.c
>> >> +++ b/mm/zswap.c
>> >> @@ -61,6 +61,8 @@ static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
>> >>  static u64 zswap_pool_limit_hit;
>> >>  /* Pages written back when pool limit was reached */
>> >>  static u64 zswap_written_back_pages;
>> >> +/* Pages evicted when pool limit was reached */
>> >> +static u64 zswap_evicted_pages;
>> >>  /* Store failed due to a reclaim failure after pool limit was reached */
>> >>  static u64 zswap_reject_reclaim_fail;
>> >>  /* Compressed page was too big for the allocator to (optimally) store */
>> >> @@ -89,6 +91,10 @@ static unsigned int zswap_max_pool_percent = 20;
>> >>  module_param_named(max_pool_percent,
>> >>                       zswap_max_pool_percent, uint, 0644);
>> >>
>> >> +/* Writeback/writethrough mode (fixed at boot for now) */
>> >> +static bool zswap_writethrough;
>> >> +module_param_named(writethrough, zswap_writethrough, bool, 0444);
>> >> +
>> >>  /*********************************
>> >>  * compression functions
>> >>  **********************************/
>> >> @@ -629,6 +635,48 @@ end:
>> >>  }
>> >>
>> >>  /*********************************
>> >> +* evict code
>> >> +**********************************/
>> >> +
>> >> +/*
>> >> + * This evicts pages that have already been written through to swap.
>> >> + */
>> >> +static int zswap_evict_entry(struct zbud_pool *pool, unsigned long handle)
>> >> +{
>> >> +     struct zswap_header *zhdr;
>> >> +     swp_entry_t swpentry;
>> >> +     struct zswap_tree *tree;
>> >> +     pgoff_t offset;
>> >> +     struct zswap_entry *entry;
>> >> +
>> >> +     /* extract swpentry from data */
>> >> +     zhdr = zbud_map(pool, handle);
>> >> +     swpentry = zhdr->swpentry; /* here */
>> >> +     zbud_unmap(pool, handle);
>> >> +     tree = zswap_trees[swp_type(swpentry)];
>> >> +     offset = swp_offset(swpentry);
>> >> +     BUG_ON(pool != tree->pool);
>> >> +
>> >> +     /* find and ref zswap entry */
>> >> +     spin_lock(&tree->lock);
>> >> +     entry = zswap_rb_search(&tree->rbroot, offset);
>> >> +     if (!entry) {
>> >> +             /* entry was invalidated */
>> >> +             spin_unlock(&tree->lock);
>> >> +             return 0;
>> >> +     }
>> >> +
>> >> +     zswap_evicted_pages++;
>> >> +
>> >> +     zswap_rb_erase(&tree->rbroot, entry);
>> >> +     zswap_entry_put(tree, entry);
>> >> +
>> >> +     spin_unlock(&tree->lock);
>> >> +
>> >> +     return 0;
>> >> +}
>> >> +
>> >> +/*********************************
>> >>  * frontswap hooks
>> >>  **********************************/
>> >>  /* attempts to compress and store an single page */
>> >> @@ -744,7 +792,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
>> >>       spin_lock(&tree->lock);
>> >>       entry = zswap_entry_find_get(&tree->rbroot, offset);
>> >>       if (!entry) {
>> >> -             /* entry was written back */
>> >> +             /* entry was written back or evicted */
>> >>               spin_unlock(&tree->lock);
>> >>               return -1;
>> >>       }
>> >> @@ -778,7 +826,7 @@ static void zswap_frontswap_invalidate_page(unsigned type, pgoff_t offset)
>> >>       spin_lock(&tree->lock);
>> >>       entry = zswap_rb_search(&tree->rbroot, offset);
>> >>       if (!entry) {
>> >> -             /* entry was written back */
>> >> +             /* entry was written back or evicted */
>> >>               spin_unlock(&tree->lock);
>> >>               return;
>> >>       }
>> >> @@ -813,18 +861,26 @@ static void zswap_frontswap_invalidate_area(unsigned type)
>> >>       zswap_trees[type] = NULL;
>> >>  }
>> >>
>> >> -static struct zbud_ops zswap_zbud_ops = {
>> >> +static struct zbud_ops zswap_zbud_writeback_ops = {
>> >>       .evict = zswap_writeback_entry
>> >>  };
>> >> +static struct zbud_ops zswap_zbud_writethrough_ops = {
>> >> +     .evict = zswap_evict_entry
>> >> +};
>> >>
>> >>  static void zswap_frontswap_init(unsigned type)
>> >>  {
>> >>       struct zswap_tree *tree;
>> >> +     struct zbud_ops *ops;
>> >>
>> >>       tree = kzalloc(sizeof(struct zswap_tree), GFP_KERNEL);
>> >>       if (!tree)
>> >>               goto err;
>> >> -     tree->pool = zbud_create_pool(GFP_KERNEL, &zswap_zbud_ops);
>> >> +     if (zswap_writethrough)
>> >> +             ops = &zswap_zbud_writethrough_ops;
>> >> +     else
>> >> +             ops = &zswap_zbud_writeback_ops;
>> >> +     tree->pool = zbud_create_pool(GFP_KERNEL, ops);
>> >>       if (!tree->pool)
>> >>               goto freetree;
>> >>       tree->rbroot = RB_ROOT;
>> >> @@ -875,6 +931,8 @@ static int __init zswap_debugfs_init(void)
>> >>                       zswap_debugfs_root, &zswap_reject_compress_poor);
>> >>       debugfs_create_u64("written_back_pages", S_IRUGO,
>> >>                       zswap_debugfs_root, &zswap_written_back_pages);
>> >> +     debugfs_create_u64("evicted_pages", S_IRUGO,
>> >> +                     zswap_debugfs_root, &zswap_evicted_pages);
>> >>       debugfs_create_u64("duplicate_entry", S_IRUGO,
>> >>                       zswap_debugfs_root, &zswap_duplicate_entry);
>> >>       debugfs_create_u64("pool_pages", S_IRUGO,
>> >> @@ -919,6 +977,8 @@ static int __init init_zswap(void)
>> >>               pr_err("per-cpu initialization failed\n");
>> >>               goto pcpufail;
>> >>       }
>> >> +     if (zswap_writethrough)
>> >> +             frontswap_writethrough(true);
>> >>       frontswap_register_ops(&zswap_frontswap_ops);
>> >>       if (zswap_debugfs_init())
>> >>               pr_warn("debugfs initialization failed\n");
>> >> --
>> >> 1.8.3.1
>> >>
>> >> --
>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> >> the body to majordomo@kvack.org.  For more info on Linux MM,
>> >> see: http://www.linux-mm.org/ .
>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>> >
>> > --
>> > Kind regards,
>> > Minchan Kim
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
> --
> Kind regards,
> Minchan Kim

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/zswap: add writethrough option
  2014-01-17  5:41       ` Dan Streetman
@ 2014-01-22 14:19         ` Dan Streetman
  2014-01-22 20:33           ` Andrew Morton
  0 siblings, 1 reply; 22+ messages in thread
From: Dan Streetman @ 2014-01-22 14:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Seth Jennings, Minchan Kim, Linux-MM, linux-kernel, Bob Liu,
	Weijie Yang, Shirish Pargaonkar, Mel Gorman

On Fri, Jan 17, 2014 at 12:41 AM, Dan Streetman <ddstreet@ieee.org> wrote:
> On Wed, Jan 15, 2014 at 12:42 AM, Minchan Kim <minchan@kernel.org> wrote:
>> Hello,
>>
>> On Tue, Jan 14, 2014 at 10:10:44AM -0500, Dan Streetman wrote:
>>> On Mon, Jan 13, 2014 at 7:11 PM, Minchan Kim <minchan@kernel.org> wrote:
>>> > Hello Dan,
>>> >
>>> > Sorry for the late response and I didn't look at the code yet
>>> > because I am not convinced. :(
>>> >
>>> > On Thu, Dec 19, 2013 at 08:23:27AM -0500, Dan Streetman wrote:
>>> >> Currently, zswap is writeback cache; stored pages are not sent
>>> >> to swap disk, and when zswap wants to evict old pages it must
>>> >> first write them back to swap cache/disk manually.  This avoids
>>> >> swap out disk I/O up front, but only moves that disk I/O to
>>> >> the writeback case (for pages that are evicted), and adds the
>>> >> overhead of having to uncompress the evicted pages and the
>>> >> need for an additional free page (to store the uncompressed page).
>>> >>
>>> >> This optionally changes zswap to writethrough cache by enabling
>>> >> frontswap_writethrough() before registering, so that any
>>> >> successful page store will also be written to swap disk.  The
>>> >> default remains writeback.  To enable writethrough, the param
>>> >> zswap.writethrough=1 must be used at boot.
>>> >>
>>> >> Whether writeback or writethrough will provide better performance
>>> >> depends on many factors including disk I/O speed/throughput,
>>> >> CPU speed(s), system load, etc.  In most cases it is likely
>>> >> that writeback has better performance than writethrough before
>>> >> zswap is full, but after zswap fills up writethrough has
>>> >> better performance than writeback.
>>> >
>>> > So you claims we should use writeback default but writethrough
>>> > after memory limit is full?
>>> > But it would break LRU ordering and I think better idea is to
>>> > handle it more generic way rather than chaning entire policy.
>>>
>>> This patch only adds the option of using writethrough.  That's all.
>>
>> The point is that please explain that what's the your problem now
>> and prove that adding new option for solve the problem is best.
>> Just "Optionally, having is better" is not good approach to merge/maintain.
>
> You may have missed the earlier emails discussing all that, so to
> recap it appears that writeback is (usually) faster before zswap is
> full, while writethrough appears (usually) slightly faster after zswap
> has filled up.  It's highly dependent on the actual system details
> (cpu speed, IO speed, load, etc) though.
>
>>
>>>
>>> > Now, zswap evict out just *a* page rather than a bunch of pages
>>> > so it stucks every store if many swap write happens continuously.
>>> > It's not efficient so how about adding kswapd's threshold concept
>>> > like min/low/high? So, it could evict early before reaching zswap
>>> > memory pool and stop it reaches high watermark.
>>> > I guess it could be better than now.
>>>
>>> Well, I don't think that's related to this patch, but certainly a good idea to
>>> investigate.
>>
>> Why I suggested it that I feel from your description that wb is just
>> slower than wt since zswap memory is pool.
>
> evicting pages early doesn't avoid the overhead of having to
> decompress the pages nor does it avoid having to write them to disk,
> so I don't think it has a direct relation to this patch to add the
> writethrough option.
>
>>
>>>
>>> >
>>> > Other point: As I read device-mapper/cache.txt, cache operating mode
>>> > already supports writethrough. It means zram zRAM can support
>>> > writeback/writethough with dm-cache.
>>> > Have you tried it? Is there any problem?
>>>
>>> zswap isn't a block device though, so that doesn't apply (unless I'm
>>> missing something).
>>
>> zram is block device so freely you can make it to swap block device
>> and binding it with dm-cache will make what you want.
>> The whole point is we could do what you want without adding new
>> so I hope you prove what's the problem in existing solution so that
>> we could judge and try to solve the pain point with more ideal
>> approach.
>
> Sorry, it seems like you're saying "you can drop zswap and start using
> zram, so this patch isn't needed", which really doesn't actually
> address this patch I don't think.  zswap vs. zram isn't an argument
> I'm trying to get into right now.
>
>>
>>>
>>> >
>>> > Acutally, I really don't know how much benefit we have that in-memory
>>> > swap overcomming to the real storage but if you want, zRAM with dm-cache
>>> > is another option rather than invent new wheel by "just having is better".
>>>
>>> I'm not sure if this patch is related to the zswap vs. zram discussions.  This
>>> only adds the option of using writethrough to zswap.  It's a first
>>> step to possibly
>>> making zswap work more efficiently using writeback and/or writethrough
>>> depending on
>>> the system and conditions.
>>
>> The patch size is small. Okay I don't want to be a party-pooper
>> but at least, I should say my thought for Andrew to help judging.
>
> Sure, I'm glad to have your suggestions.

To give this a bump - Andrew do you have any concerns about this
patch?  Or can you pick this up?

>
>>
>>>
>>> >
>>> > Thanks.
>>> >
>>> >>
>>> >> Signed-off-by: Dan Streetman <ddstreet@ieee.org>
>>> >>
>>> >> ---
>>> >>
>>> >> Based on specjbb testing on my laptop, the results for both writeback
>>> >> and writethrough are better than not using zswap at all, but writeback
>>> >> does seem to be better than writethrough while zswap isn't full.  Once
>>> >> it fills up, performance for writethrough is essentially close to not
>>> >> using zswap, while writeback seems to be worse than not using zswap.
>>> >> However, I think more testing on a wider span of systems and conditions
>>> >> is needed.  Additionally, I'm not sure that specjbb is measuring true
>>> >> performance under fully loaded cpu conditions, so additional cpu load
>>> >> might need to be added or specjbb parameters modified (I took the
>>> >> values from the 4 "warehouses" test run).
>>> >>
>>> >> In any case though, I think having writethrough as an option is still
>>> >> useful.  More changes could be made, such as changing from writeback
>>> >> to writethrough based on the zswap % full.  And the patch doesn't
>>> >> change default behavior - writethrough must be specifically enabled.
>>> >>
>>> >> The %-ized numbers I got from specjbb on average, using the default
>>> >> 20% max_pool_percent and varying the amount of heap used as shown:
>>> >>
>>> >> ram | no zswap | writeback | writethrough
>>> >> 75     93.08     100         96.90
>>> >> 87     96.58     95.58       96.72
>>> >> 100    92.29     89.73       86.75
>>> >> 112    63.80     38.66       19.66
>>> >> 125    4.79      29.90       15.75
>>> >> 137    4.99      4.50        4.75
>>> >> 150    4.28      4.62        5.01
>>> >> 162    5.20      2.94        4.66
>>> >> 175    5.71      2.11        4.84
>>> >>
>>> >>
>>> >>
>>> >>  mm/zswap.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
>>> >>  1 file changed, 64 insertions(+), 4 deletions(-)
>>> >>
>>> >> diff --git a/mm/zswap.c b/mm/zswap.c
>>> >> index e55bab9..2f919db 100644
>>> >> --- a/mm/zswap.c
>>> >> +++ b/mm/zswap.c
>>> >> @@ -61,6 +61,8 @@ static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
>>> >>  static u64 zswap_pool_limit_hit;
>>> >>  /* Pages written back when pool limit was reached */
>>> >>  static u64 zswap_written_back_pages;
>>> >> +/* Pages evicted when pool limit was reached */
>>> >> +static u64 zswap_evicted_pages;
>>> >>  /* Store failed due to a reclaim failure after pool limit was reached */
>>> >>  static u64 zswap_reject_reclaim_fail;
>>> >>  /* Compressed page was too big for the allocator to (optimally) store */
>>> >> @@ -89,6 +91,10 @@ static unsigned int zswap_max_pool_percent = 20;
>>> >>  module_param_named(max_pool_percent,
>>> >>                       zswap_max_pool_percent, uint, 0644);
>>> >>
>>> >> +/* Writeback/writethrough mode (fixed at boot for now) */
>>> >> +static bool zswap_writethrough;
>>> >> +module_param_named(writethrough, zswap_writethrough, bool, 0444);
>>> >> +
>>> >>  /*********************************
>>> >>  * compression functions
>>> >>  **********************************/
>>> >> @@ -629,6 +635,48 @@ end:
>>> >>  }
>>> >>
>>> >>  /*********************************
>>> >> +* evict code
>>> >> +**********************************/
>>> >> +
>>> >> +/*
>>> >> + * This evicts pages that have already been written through to swap.
>>> >> + */
>>> >> +static int zswap_evict_entry(struct zbud_pool *pool, unsigned long handle)
>>> >> +{
>>> >> +     struct zswap_header *zhdr;
>>> >> +     swp_entry_t swpentry;
>>> >> +     struct zswap_tree *tree;
>>> >> +     pgoff_t offset;
>>> >> +     struct zswap_entry *entry;
>>> >> +
>>> >> +     /* extract swpentry from data */
>>> >> +     zhdr = zbud_map(pool, handle);
>>> >> +     swpentry = zhdr->swpentry; /* here */
>>> >> +     zbud_unmap(pool, handle);
>>> >> +     tree = zswap_trees[swp_type(swpentry)];
>>> >> +     offset = swp_offset(swpentry);
>>> >> +     BUG_ON(pool != tree->pool);
>>> >> +
>>> >> +     /* find and ref zswap entry */
>>> >> +     spin_lock(&tree->lock);
>>> >> +     entry = zswap_rb_search(&tree->rbroot, offset);
>>> >> +     if (!entry) {
>>> >> +             /* entry was invalidated */
>>> >> +             spin_unlock(&tree->lock);
>>> >> +             return 0;
>>> >> +     }
>>> >> +
>>> >> +     zswap_evicted_pages++;
>>> >> +
>>> >> +     zswap_rb_erase(&tree->rbroot, entry);
>>> >> +     zswap_entry_put(tree, entry);
>>> >> +
>>> >> +     spin_unlock(&tree->lock);
>>> >> +
>>> >> +     return 0;
>>> >> +}
>>> >> +
>>> >> +/*********************************
>>> >>  * frontswap hooks
>>> >>  **********************************/
>>> >>  /* attempts to compress and store an single page */
>>> >> @@ -744,7 +792,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
>>> >>       spin_lock(&tree->lock);
>>> >>       entry = zswap_entry_find_get(&tree->rbroot, offset);
>>> >>       if (!entry) {
>>> >> -             /* entry was written back */
>>> >> +             /* entry was written back or evicted */
>>> >>               spin_unlock(&tree->lock);
>>> >>               return -1;
>>> >>       }
>>> >> @@ -778,7 +826,7 @@ static void zswap_frontswap_invalidate_page(unsigned type, pgoff_t offset)
>>> >>       spin_lock(&tree->lock);
>>> >>       entry = zswap_rb_search(&tree->rbroot, offset);
>>> >>       if (!entry) {
>>> >> -             /* entry was written back */
>>> >> +             /* entry was written back or evicted */
>>> >>               spin_unlock(&tree->lock);
>>> >>               return;
>>> >>       }
>>> >> @@ -813,18 +861,26 @@ static void zswap_frontswap_invalidate_area(unsigned type)
>>> >>       zswap_trees[type] = NULL;
>>> >>  }
>>> >>
>>> >> -static struct zbud_ops zswap_zbud_ops = {
>>> >> +static struct zbud_ops zswap_zbud_writeback_ops = {
>>> >>       .evict = zswap_writeback_entry
>>> >>  };
>>> >> +static struct zbud_ops zswap_zbud_writethrough_ops = {
>>> >> +     .evict = zswap_evict_entry
>>> >> +};
>>> >>
>>> >>  static void zswap_frontswap_init(unsigned type)
>>> >>  {
>>> >>       struct zswap_tree *tree;
>>> >> +     struct zbud_ops *ops;
>>> >>
>>> >>       tree = kzalloc(sizeof(struct zswap_tree), GFP_KERNEL);
>>> >>       if (!tree)
>>> >>               goto err;
>>> >> -     tree->pool = zbud_create_pool(GFP_KERNEL, &zswap_zbud_ops);
>>> >> +     if (zswap_writethrough)
>>> >> +             ops = &zswap_zbud_writethrough_ops;
>>> >> +     else
>>> >> +             ops = &zswap_zbud_writeback_ops;
>>> >> +     tree->pool = zbud_create_pool(GFP_KERNEL, ops);
>>> >>       if (!tree->pool)
>>> >>               goto freetree;
>>> >>       tree->rbroot = RB_ROOT;
>>> >> @@ -875,6 +931,8 @@ static int __init zswap_debugfs_init(void)
>>> >>                       zswap_debugfs_root, &zswap_reject_compress_poor);
>>> >>       debugfs_create_u64("written_back_pages", S_IRUGO,
>>> >>                       zswap_debugfs_root, &zswap_written_back_pages);
>>> >> +     debugfs_create_u64("evicted_pages", S_IRUGO,
>>> >> +                     zswap_debugfs_root, &zswap_evicted_pages);
>>> >>       debugfs_create_u64("duplicate_entry", S_IRUGO,
>>> >>                       zswap_debugfs_root, &zswap_duplicate_entry);
>>> >>       debugfs_create_u64("pool_pages", S_IRUGO,
>>> >> @@ -919,6 +977,8 @@ static int __init init_zswap(void)
>>> >>               pr_err("per-cpu initialization failed\n");
>>> >>               goto pcpufail;
>>> >>       }
>>> >> +     if (zswap_writethrough)
>>> >> +             frontswap_writethrough(true);
>>> >>       frontswap_register_ops(&zswap_frontswap_ops);
>>> >>       if (zswap_debugfs_init())
>>> >>               pr_warn("debugfs initialization failed\n");
>>> >> --
>>> >> 1.8.3.1
>>> >>
>>> >> --
>>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>> >> the body to majordomo@kvack.org.  For more info on Linux MM,
>>> >> see: http://www.linux-mm.org/ .
>>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>> >
>>> > --
>>> > Kind regards,
>>> > Minchan Kim
>>>
>>> --
>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>>> see: http://www.linux-mm.org/ .
>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>
>> --
>> Kind regards,
>> Minchan Kim

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/zswap: add writethrough option
  2014-01-22 14:19         ` Dan Streetman
@ 2014-01-22 20:33           ` Andrew Morton
  2014-01-23  0:18             ` Minchan Kim
  2014-01-23 20:43             ` Dan Streetman
  0 siblings, 2 replies; 22+ messages in thread
From: Andrew Morton @ 2014-01-22 20:33 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Seth Jennings, Minchan Kim, Linux-MM, linux-kernel, Bob Liu,
	Weijie Yang, Shirish Pargaonkar, Mel Gorman

On Wed, 22 Jan 2014 09:19:58 -0500 Dan Streetman <ddstreet@ieee.org> wrote:

> >>> > Acutally, I really don't know how much benefit we have that in-memory
> >>> > swap overcomming to the real storage but if you want, zRAM with dm-cache
> >>> > is another option rather than invent new wheel by "just having is better".
> >>>
> >>> I'm not sure if this patch is related to the zswap vs. zram discussions.  This
> >>> only adds the option of using writethrough to zswap.  It's a first
> >>> step to possibly
> >>> making zswap work more efficiently using writeback and/or writethrough
> >>> depending on
> >>> the system and conditions.
> >>
> >> The patch size is small. Okay I don't want to be a party-pooper
> >> but at least, I should say my thought for Andrew to help judging.
> >
> > Sure, I'm glad to have your suggestions.
> 
> To give this a bump - Andrew do you have any concerns about this
> patch?  Or can you pick this up?

I don't pay much attention to new features during the merge window,
preferring to shove them into a folder to look at later.  Often they
have bitrotted by the time -rc1 comes around.

I'm not sure that this review discussion has played out yet - is
Minchan happy?

Please update the changelog so that it reflects the questions Minchan
asked (any reviewer question should be regarded as an inadequacy in
either the code commenting or the changelog - people shouldn't need to
ask the programmer why he did something!) and resend for -rc1?


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/zswap: add writethrough option
  2014-01-22 20:33           ` Andrew Morton
@ 2014-01-23  0:18             ` Minchan Kim
  2014-01-23  1:08               ` Bob Liu
                                 ` (2 more replies)
  2014-01-23 20:43             ` Dan Streetman
  1 sibling, 3 replies; 22+ messages in thread
From: Minchan Kim @ 2014-01-23  0:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dan Streetman, Seth Jennings, Linux-MM, linux-kernel, Bob Liu,
	Weijie Yang, Shirish Pargaonkar, Mel Gorman

Hello all,

On Wed, Jan 22, 2014 at 12:33:58PM -0800, Andrew Morton wrote:
> On Wed, 22 Jan 2014 09:19:58 -0500 Dan Streetman <ddstreet@ieee.org> wrote:
> 
> > >>> > Acutally, I really don't know how much benefit we have that in-memory
> > >>> > swap overcomming to the real storage but if you want, zRAM with dm-cache
> > >>> > is another option rather than invent new wheel by "just having is better".
> > >>>
> > >>> I'm not sure if this patch is related to the zswap vs. zram discussions.  This
> > >>> only adds the option of using writethrough to zswap.  It's a first
> > >>> step to possibly
> > >>> making zswap work more efficiently using writeback and/or writethrough
> > >>> depending on
> > >>> the system and conditions.
> > >>
> > >> The patch size is small. Okay I don't want to be a party-pooper
> > >> but at least, I should say my thought for Andrew to help judging.
> > >
> > > Sure, I'm glad to have your suggestions.
> > 
> > To give this a bump - Andrew do you have any concerns about this
> > patch?  Or can you pick this up?
> 
> I don't pay much attention to new features during the merge window,
> preferring to shove them into a folder to look at later.  Often they
> have bitrotted by the time -rc1 comes around.
> 
> I'm not sure that this review discussion has played out yet - is
> Minchan happy?

>From the beginning, zswap is for reducing swap I/O but if workingset
overflows, it should write back rather than OOM with expecting a small
number of writeback would make the system happy because the high memory
pressure is temporal so soon most of workload would be hit in zswap
without further writeback.

If memory pressure continues and writeback steadily, it means zswap's
benefit would be mitigated, even worse by addding comp/decomp overhead.
In that case, it would be better to disable zswap, even.

Dan said writethrough supporting is first step to make zswap smart
but anybody didn't say further words to step into the smart and
what's the *real* workload want it and what's the *real* number from
that because dm-cache/zram might be a good fit.
(I don't intend to argue zram VS zswap. If the concern is solved by
existing solution, why should we invent new function and
have maintenace cost?) so it's very hard for me to judge that we should
accept and maintain it.

We need blueprint for the future and make an agreement on the
direction before merging this patch.

But code size is not much and Seth already gave an his Ack so I don't
want to hurt Dan any more(Sorry for Dan) and wasting my time so pass
the decision to others(ex, Seth and Bob).
If they insist on, I don't object any more.

Sorry for bothering Dan.

Thanks.

> 
> Please update the changelog so that it reflects the questions Minchan
> asked (any reviewer question should be regarded as an inadequacy in
> either the code commenting or the changelog - people shouldn't need to
> ask the programmer why he did something!) and resend for -rc1?
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/zswap: add writethrough option
  2014-01-23  0:18             ` Minchan Kim
@ 2014-01-23  1:08               ` Bob Liu
  2014-01-23 12:46               ` Austin S Hemmelgarn
  2014-01-23 19:18               ` Dan Streetman
  2 siblings, 0 replies; 22+ messages in thread
From: Bob Liu @ 2014-01-23  1:08 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Dan Streetman, Seth Jennings, Linux-MM,
	linux-kernel, Weijie Yang, Shirish Pargaonkar, Mel Gorman


On 01/23/2014 08:18 AM, Minchan Kim wrote:
> Hello all,
> 
> On Wed, Jan 22, 2014 at 12:33:58PM -0800, Andrew Morton wrote:
>> On Wed, 22 Jan 2014 09:19:58 -0500 Dan Streetman <ddstreet@ieee.org> wrote:
>>
>>>>>>> Acutally, I really don't know how much benefit we have that in-memory
>>>>>>> swap overcomming to the real storage but if you want, zRAM with dm-cache
>>>>>>> is another option rather than invent new wheel by "just having is better".
>>>>>>
>>>>>> I'm not sure if this patch is related to the zswap vs. zram discussions.  This
>>>>>> only adds the option of using writethrough to zswap.  It's a first
>>>>>> step to possibly
>>>>>> making zswap work more efficiently using writeback and/or writethrough
>>>>>> depending on
>>>>>> the system and conditions.
>>>>>
>>>>> The patch size is small. Okay I don't want to be a party-pooper
>>>>> but at least, I should say my thought for Andrew to help judging.
>>>>
>>>> Sure, I'm glad to have your suggestions.
>>>
>>> To give this a bump - Andrew do you have any concerns about this
>>> patch?  Or can you pick this up?
>>
>> I don't pay much attention to new features during the merge window,
>> preferring to shove them into a folder to look at later.  Often they
>> have bitrotted by the time -rc1 comes around.
>>
>> I'm not sure that this review discussion has played out yet - is
>> Minchan happy?
> 
> From the beginning, zswap is for reducing swap I/O but if workingset
> overflows, it should write back rather than OOM with expecting a small
> number of writeback would make the system happy because the high memory
> pressure is temporal so soon most of workload would be hit in zswap
> without further writeback.
> 
> If memory pressure continues and writeback steadily, it means zswap's
> benefit would be mitigated, even worse by addding comp/decomp overhead.
> In that case, it would be better to disable zswap, even.
> 
> Dan said writethrough supporting is first step to make zswap smart
> but anybody didn't say further words to step into the smart and
> what's the *real* workload want it and what's the *real* number from
> that because dm-cache/zram might be a good fit.
> (I don't intend to argue zram VS zswap. If the concern is solved by
> existing solution, why should we invent new function and
> have maintenace cost?) so it's very hard for me to judge that we should
> accept and maintain it.
> 

Speak of dm-cache, there are also bcache, flashcache and bcache.

> We need blueprint for the future and make an agreement on the
> direction before merging this patch.
> 
> But code size is not much and Seth already gave an his Ack so I don't
> want to hurt Dan any more(Sorry for Dan) and wasting my time so pass
> the decision to others(ex, Seth and Bob).

Since zswap is a cache layer and write-back and write-through are two
common options for any cache. I'm fine with adding this write-through
option.

Thanks,
-Bob


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/zswap: add writethrough option
  2014-01-23  0:18             ` Minchan Kim
  2014-01-23  1:08               ` Bob Liu
@ 2014-01-23 12:46               ` Austin S Hemmelgarn
  2014-01-23 19:18               ` Dan Streetman
  2 siblings, 0 replies; 22+ messages in thread
From: Austin S Hemmelgarn @ 2014-01-23 12:46 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: Dan Streetman, Seth Jennings, Linux-MM, linux-kernel, Bob Liu,
	Weijie Yang, Shirish Pargaonkar, Mel Gorman

On 2014-01-22 19:18, Minchan Kim wrote:
> 
> From the beginning, zswap is for reducing swap I/O but if workingset
> overflows, it should write back rather than OOM with expecting a small
> number of writeback would make the system happy because the high memory
> pressure is temporal so soon most of workload would be hit in zswap
> without further writeback.
> 
But write-through would still reduce I/O to swap, the only difference is
that it just reduces the need to read from swap, not the need for all
I/O.  This can still be significant because it would mean (assuming that
zswap uses a LRU algorithm for deciding what to drop) that static pages
that get accessed frequently and get swapped out often would just get
written to swap once, and only be read from zswap.

Also, I disagree with the implication that memory pressure is
short-lived, I have a desktop with 16G of RAM, and I regularly work with
data-sets that are at-least that size (mostly hi-res images and
hi-quality video/audio).
> If memory pressure continues and writeback steadily, it means zswap's
> benefit would be mitigated, even worse by addding comp/decomp overhead.
> In that case, it would be better to disable zswap, even.
> 
> Dan said writethrough supporting is first step to make zswap smart
> but anybody didn't say further words to step into the smart and
> what's the *real* workload want it and what's the *real* number from
> that because dm-cache/zram might be a good fit.
> (I don't intend to argue zram VS zswap. If the concern is solved by
> existing solution, why should we invent new function and
> have maintenace cost?) so it's very hard for me to judge that we should
> accept and maintain it.
bcache isn't stable enough that I would even trust using it for /tmp,
let alone using it for swap (i consistently get kernel OOPSes when it's
compiled in or loaded as a module, even when I'm not using it), and
dm-cache is a pain to setup (especially if it needs to happen every time
the system boots).  Part of the big advantage of zswap is that it is
(relatively) stable, and all it takes to set it up is turning on a pair
of kconfig options.
> We need blueprint for the future and make an agreement on the
> direction before merging this patch.
> 
> But code size is not much and Seth already gave an his Ack so I don't
> want to hurt Dan any more(Sorry for Dan) and wasting my time so pass
> the decision to others(ex, Seth and Bob).
> If they insist on, I don't object any more.
> 
> Sorry for bothering Dan.
> 
> Thanks.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/zswap: add writethrough option
  2014-01-23  0:18             ` Minchan Kim
  2014-01-23  1:08               ` Bob Liu
  2014-01-23 12:46               ` Austin S Hemmelgarn
@ 2014-01-23 19:18               ` Dan Streetman
  2 siblings, 0 replies; 22+ messages in thread
From: Dan Streetman @ 2014-01-23 19:18 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Seth Jennings, Linux-MM, linux-kernel, Bob Liu,
	Weijie Yang, Shirish Pargaonkar, Mel Gorman

On Wed, Jan 22, 2014 at 7:18 PM, Minchan Kim <minchan@kernel.org> wrote:
> Hello all,
>
> On Wed, Jan 22, 2014 at 12:33:58PM -0800, Andrew Morton wrote:
>> On Wed, 22 Jan 2014 09:19:58 -0500 Dan Streetman <ddstreet@ieee.org> wrote:
>>
>> > >>> > Acutally, I really don't know how much benefit we have that in-memory
>> > >>> > swap overcomming to the real storage but if you want, zRAM with dm-cache
>> > >>> > is another option rather than invent new wheel by "just having is better".
>> > >>>
>> > >>> I'm not sure if this patch is related to the zswap vs. zram discussions.  This
>> > >>> only adds the option of using writethrough to zswap.  It's a first
>> > >>> step to possibly
>> > >>> making zswap work more efficiently using writeback and/or writethrough
>> > >>> depending on
>> > >>> the system and conditions.
>> > >>
>> > >> The patch size is small. Okay I don't want to be a party-pooper
>> > >> but at least, I should say my thought for Andrew to help judging.
>> > >
>> > > Sure, I'm glad to have your suggestions.
>> >
>> > To give this a bump - Andrew do you have any concerns about this
>> > patch?  Or can you pick this up?
>>
>> I don't pay much attention to new features during the merge window,
>> preferring to shove them into a folder to look at later.  Often they
>> have bitrotted by the time -rc1 comes around.
>>
>> I'm not sure that this review discussion has played out yet - is
>> Minchan happy?
>
> From the beginning, zswap is for reducing swap I/O but if workingset
> overflows, it should write back rather than OOM with expecting a small
> number of writeback would make the system happy because the high memory
> pressure is temporal so soon most of workload would be hit in zswap
> without further writeback.
>
> If memory pressure continues and writeback steadily, it means zswap's
> benefit would be mitigated, even worse by addding comp/decomp overhead.
> In that case, it would be better to disable zswap, even.
>
> Dan said writethrough supporting is first step to make zswap smart
> but anybody didn't say further words to step into the smart and
> what's the *real* workload want it and what's the *real* number from
> that because dm-cache/zram might be a good fit.
> (I don't intend to argue zram VS zswap. If the concern is solved by
> existing solution, why should we invent new function and
> have maintenace cost?) so it's very hard for me to judge that we should
> accept and maintain it.
>
> We need blueprint for the future and make an agreement on the
> direction before merging this patch.

Well, I believe there are some cases where writeback will be better
and other cases where writethrough is better.  If we wait to add
writethrough until I or someone shows they have a specific use case
that performs better consistently with writethrough instead of
writeback, then of course I can try to find it, but I'm just one guy,
who has (relatively) very limited access to systems to test on.
Whereas if writethrough is in the kernel, anyone who wants to can test
to see if their workload performs better with writethrough.

Additionally, I think that it's possible to improve performance by
dynamically choosing wback or wthru - for example proactively wthru
swapping while there is not much IO and no immediate need for free
pages - the extra IO of wthru shouldn't really matter, but will help
if the swapped out page is eventually dropped from the zswap pool to
make room for more pages; while wback swapping makes more sense where
there's an immediate need for free pages and/or a high IO load.  Now
that zswap can significantly reduce the time to restore a swapped out
page, proactive swapping makes (IMHO) much more sense.

So I guess what I'm not clear on is, what level of blueprint do you
want to see?  I think that any more work will require a certain amount
of prototyping and testing, as well as a lot of feedback and
discussion...and to me, it makes more sense to add now this relatively
simple patch that, by default, changes nothing, so that more detailed
discussions and patches can follow without having to include this.

>
> But code size is not much and Seth already gave an his Ack so I don't
> want to hurt Dan any more(Sorry for Dan) and wasting my time so pass
> the decision to others(ex, Seth and Bob).
> If they insist on, I don't object any more.

Well I certainly don't want to insist on anything, I'd like to address
your concern about it.  If you would like to get deeper into
discussions about future possibilities before adding this patch, then
that's fine with me, as I don't think anyone is actually waiting to
use the patch by itself.  But I also don't want to waste your time, as
you said, with premature discussions before I've had a chance to do
any more investigation/prototyping work for whatever might follow onto
this.

Since you've said twice now you don't object as long as Seth, Bob, et.
al. are ok with the patch, I will assume you're still ok with adding
it also, but if you do want to discuss the future work now please do
let me know.

>
> Sorry for bothering Dan.

No bother at all.  Thanks for pushing me to clarify my thoughts!

>
> Thanks.
>
>>
>> Please update the changelog so that it reflects the questions Minchan
>> asked (any reviewer question should be regarded as an inadequacy in
>> either the code commenting or the changelog - people shouldn't need to
>> ask the programmer why he did something!) and resend for -rc1?
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
> --
> Kind regards,
> Minchan Kim

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm/zswap: add writethrough option
  2014-01-22 20:33           ` Andrew Morton
  2014-01-23  0:18             ` Minchan Kim
@ 2014-01-23 20:43             ` Dan Streetman
  1 sibling, 0 replies; 22+ messages in thread
From: Dan Streetman @ 2014-01-23 20:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Seth Jennings, Minchan Kim, Linux-MM, linux-kernel, Bob Liu,
	Weijie Yang, Shirish Pargaonkar, Mel Gorman

On Wed, Jan 22, 2014 at 3:33 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Wed, 22 Jan 2014 09:19:58 -0500 Dan Streetman <ddstreet@ieee.org> wrote:
>
>> >>> > Acutally, I really don't know how much benefit we have that in-memory
>> >>> > swap overcomming to the real storage but if you want, zRAM with dm-cache
>> >>> > is another option rather than invent new wheel by "just having is better".
>> >>>
>> >>> I'm not sure if this patch is related to the zswap vs. zram discussions.  This
>> >>> only adds the option of using writethrough to zswap.  It's a first
>> >>> step to possibly
>> >>> making zswap work more efficiently using writeback and/or writethrough
>> >>> depending on
>> >>> the system and conditions.
>> >>
>> >> The patch size is small. Okay I don't want to be a party-pooper
>> >> but at least, I should say my thought for Andrew to help judging.
>> >
>> > Sure, I'm glad to have your suggestions.
>>
>> To give this a bump - Andrew do you have any concerns about this
>> patch?  Or can you pick this up?
>
> I don't pay much attention to new features during the merge window,
> preferring to shove them into a folder to look at later.  Often they
> have bitrotted by the time -rc1 comes around.
>
> I'm not sure that this review discussion has played out yet - is
> Minchan happy?

I think so, or at least ok enough to not block it, but please correct
me if I am wrong, Minchan.

>
> Please update the changelog so that it reflects the questions Minchan
> asked (any reviewer question should be regarded as an inadequacy in
> either the code commenting or the changelog - people shouldn't need to
> ask the programmer why he did something!) and resend for -rc1?

OK I'll update and resend.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2] mm/zswap: add writethrough option
  2013-12-19 13:23 [PATCH] mm/zswap: add writethrough option Dan Streetman
                   ` (2 preceding siblings ...)
  2014-01-14  0:11 ` Minchan Kim
@ 2014-01-27 14:01 ` Dan Streetman
  2014-02-03 23:08   ` Andrew Morton
  3 siblings, 1 reply; 22+ messages in thread
From: Dan Streetman @ 2014-01-27 14:01 UTC (permalink / raw)
  To: Seth Jennings, Andrew Morton
  Cc: Dan Streetman, Linux-MM, linux-kernel, Bob Liu, Minchan Kim,
	Weijie Yang, Shirish Pargaonkar, Mel Gorman

Currently, zswap is writeback cache; stored pages are not sent
to swap disk, and when zswap wants to evict old pages it must
first write them back to swap cache/disk manually.  This avoids
swap out disk I/O up front, but only moves that disk I/O to
the writeback case (for pages that are evicted), and adds the
overhead of having to uncompress the evicted pages and the
need for an additional free page (to store the uncompressed page).

This optionally changes zswap to writethrough cache by enabling
frontswap_writethrough() before registering, so that any
successful page store will also be written to swap disk.  The
default remains writeback.  To enable writethrough, the param
zswap.writethrough=1 must be used at boot.

Whether writeback or writethrough will provide better performance
depends on many factors including disk I/O speed/throughput,
CPU speed(s), system load, etc.  In most cases it is likely
that writeback has better performance than writethrough before
zswap is full, but after zswap fills up writethrough has
better performance than writeback.

The reason to add this option now is, first to allow any zswap
user to be able to test using writethrough to determine if they
get better performance than using writeback, and second to allow
future updates to zswap, such as the possibility of dynamically
switching between writeback and writethrough.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>

---

Changes in v2:
 - update changelog with reasoning to include patch now,
   in response to Minchan's concerns

Based on specjbb testing on my laptop, the results for both writeback
and writethrough are better than not using zswap at all, but writeback
does seem to be better than writethrough while zswap isn't full.  Once
it fills up, performance for writethrough is essentially close to not
using zswap, while writeback seems to be worse than not using zswap.
However, I think more testing on a wider span of systems and conditions
is needed.  Additionally, I'm not sure that specjbb is measuring true
performance under fully loaded cpu conditions, so additional cpu load
might need to be added or specjbb parameters modified (I took the
values from the 4 "warehouses" test run).

In any case though, I think having writethrough as an option is still
useful.  More changes could be made, such as changing from writeback
to writethrough based on the zswap % full.  And the patch doesn't
change default behavior - writethrough must be specifically enabled.

The %-ized numbers I got from specjbb on average, using the default
20% max_pool_percent and varying the amount of heap used as shown:

ram | no zswap | writeback | writethrough
75     93.08     100         96.90
87     96.58     95.58       96.72
100    92.29     89.73       86.75
112    63.80     38.66       19.66
125    4.79      29.90       15.75
137    4.99      4.50        4.75
150    4.28      4.62        5.01
162    5.20      2.94        4.66
175    5.71      2.11        4.84



 mm/zswap.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 64 insertions(+), 4 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index e55bab9..2f919db 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -61,6 +61,8 @@ static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
 static u64 zswap_pool_limit_hit;
 /* Pages written back when pool limit was reached */
 static u64 zswap_written_back_pages;
+/* Pages evicted when pool limit was reached */
+static u64 zswap_evicted_pages;
 /* Store failed due to a reclaim failure after pool limit was reached */
 static u64 zswap_reject_reclaim_fail;
 /* Compressed page was too big for the allocator to (optimally) store */
@@ -89,6 +91,10 @@ static unsigned int zswap_max_pool_percent = 20;
 module_param_named(max_pool_percent,
 			zswap_max_pool_percent, uint, 0644);
 
+/* Writeback/writethrough mode (fixed at boot for now) */
+static bool zswap_writethrough;
+module_param_named(writethrough, zswap_writethrough, bool, 0444);
+
 /*********************************
 * compression functions
 **********************************/
@@ -629,6 +635,48 @@ end:
 }
 
 /*********************************
+* evict code
+**********************************/
+
+/*
+ * This evicts pages that have already been written through to swap.
+ */
+static int zswap_evict_entry(struct zbud_pool *pool, unsigned long handle)
+{
+	struct zswap_header *zhdr;
+	swp_entry_t swpentry;
+	struct zswap_tree *tree;
+	pgoff_t offset;
+	struct zswap_entry *entry;
+
+	/* extract swpentry from data */
+	zhdr = zbud_map(pool, handle);
+	swpentry = zhdr->swpentry; /* here */
+	zbud_unmap(pool, handle);
+	tree = zswap_trees[swp_type(swpentry)];
+	offset = swp_offset(swpentry);
+	BUG_ON(pool != tree->pool);
+
+	/* find and ref zswap entry */
+	spin_lock(&tree->lock);
+	entry = zswap_rb_search(&tree->rbroot, offset);
+	if (!entry) {
+		/* entry was invalidated */
+		spin_unlock(&tree->lock);
+		return 0;
+	}
+
+	zswap_evicted_pages++;
+
+	zswap_rb_erase(&tree->rbroot, entry);
+	zswap_entry_put(tree, entry);
+
+	spin_unlock(&tree->lock);
+
+	return 0;
+}
+
+/*********************************
 * frontswap hooks
 **********************************/
 /* attempts to compress and store an single page */
@@ -744,7 +792,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
 	spin_lock(&tree->lock);
 	entry = zswap_entry_find_get(&tree->rbroot, offset);
 	if (!entry) {
-		/* entry was written back */
+		/* entry was written back or evicted */
 		spin_unlock(&tree->lock);
 		return -1;
 	}
@@ -778,7 +826,7 @@ static void zswap_frontswap_invalidate_page(unsigned type, pgoff_t offset)
 	spin_lock(&tree->lock);
 	entry = zswap_rb_search(&tree->rbroot, offset);
 	if (!entry) {
-		/* entry was written back */
+		/* entry was written back or evicted */
 		spin_unlock(&tree->lock);
 		return;
 	}
@@ -813,18 +861,26 @@ static void zswap_frontswap_invalidate_area(unsigned type)
 	zswap_trees[type] = NULL;
 }
 
-static struct zbud_ops zswap_zbud_ops = {
+static struct zbud_ops zswap_zbud_writeback_ops = {
 	.evict = zswap_writeback_entry
 };
+static struct zbud_ops zswap_zbud_writethrough_ops = {
+	.evict = zswap_evict_entry
+};
 
 static void zswap_frontswap_init(unsigned type)
 {
 	struct zswap_tree *tree;
+	struct zbud_ops *ops;
 
 	tree = kzalloc(sizeof(struct zswap_tree), GFP_KERNEL);
 	if (!tree)
 		goto err;
-	tree->pool = zbud_create_pool(GFP_KERNEL, &zswap_zbud_ops);
+	if (zswap_writethrough)
+		ops = &zswap_zbud_writethrough_ops;
+	else
+		ops = &zswap_zbud_writeback_ops;
+	tree->pool = zbud_create_pool(GFP_KERNEL, ops);
 	if (!tree->pool)
 		goto freetree;
 	tree->rbroot = RB_ROOT;
@@ -875,6 +931,8 @@ static int __init zswap_debugfs_init(void)
 			zswap_debugfs_root, &zswap_reject_compress_poor);
 	debugfs_create_u64("written_back_pages", S_IRUGO,
 			zswap_debugfs_root, &zswap_written_back_pages);
+	debugfs_create_u64("evicted_pages", S_IRUGO,
+			zswap_debugfs_root, &zswap_evicted_pages);
 	debugfs_create_u64("duplicate_entry", S_IRUGO,
 			zswap_debugfs_root, &zswap_duplicate_entry);
 	debugfs_create_u64("pool_pages", S_IRUGO,
@@ -919,6 +977,8 @@ static int __init init_zswap(void)
 		pr_err("per-cpu initialization failed\n");
 		goto pcpufail;
 	}
+	if (zswap_writethrough)
+		frontswap_writethrough(true);
 	frontswap_register_ops(&zswap_frontswap_ops);
 	if (zswap_debugfs_init())
 		pr_warn("debugfs initialization failed\n");
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] mm/zswap: add writethrough option
  2014-01-27 14:01 ` [PATCH v2] " Dan Streetman
@ 2014-02-03 23:08   ` Andrew Morton
  2014-02-04  2:47     ` Minchan Kim
  2014-02-10 19:05     ` Dan Streetman
  0 siblings, 2 replies; 22+ messages in thread
From: Andrew Morton @ 2014-02-03 23:08 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Seth Jennings, Linux-MM, linux-kernel, Bob Liu, Minchan Kim,
	Weijie Yang, Shirish Pargaonkar, Mel Gorman

On Mon, 27 Jan 2014 09:01:19 -0500 Dan Streetman <ddstreet@ieee.org> wrote:

> Currently, zswap is writeback cache; stored pages are not sent
> to swap disk, and when zswap wants to evict old pages it must
> first write them back to swap cache/disk manually.  This avoids
> swap out disk I/O up front, but only moves that disk I/O to
> the writeback case (for pages that are evicted), and adds the
> overhead of having to uncompress the evicted pages and the
> need for an additional free page (to store the uncompressed page).
> 
> This optionally changes zswap to writethrough cache by enabling
> frontswap_writethrough() before registering, so that any
> successful page store will also be written to swap disk.  The
> default remains writeback.  To enable writethrough, the param
> zswap.writethrough=1 must be used at boot.
> 
> Whether writeback or writethrough will provide better performance
> depends on many factors including disk I/O speed/throughput,
> CPU speed(s), system load, etc.  In most cases it is likely
> that writeback has better performance than writethrough before
> zswap is full, but after zswap fills up writethrough has
> better performance than writeback.
> 
> The reason to add this option now is, first to allow any zswap
> user to be able to test using writethrough to determine if they
> get better performance than using writeback, and second to allow
> future updates to zswap, such as the possibility of dynamically
> switching between writeback and writethrough.
> 
> ...
>
> Based on specjbb testing on my laptop, the results for both writeback
> and writethrough are better than not using zswap at all, but writeback
> does seem to be better than writethrough while zswap isn't full.  Once
> it fills up, performance for writethrough is essentially close to not
> using zswap, while writeback seems to be worse than not using zswap.
> However, I think more testing on a wider span of systems and conditions
> is needed.  Additionally, I'm not sure that specjbb is measuring true
> performance under fully loaded cpu conditions, so additional cpu load
> might need to be added or specjbb parameters modified (I took the
> values from the 4 "warehouses" test run).
> 
> In any case though, I think having writethrough as an option is still
> useful.  More changes could be made, such as changing from writeback
> to writethrough based on the zswap % full.  And the patch doesn't
> change default behavior - writethrough must be specifically enabled.
> 
> The %-ized numbers I got from specjbb on average, using the default
> 20% max_pool_percent and varying the amount of heap used as shown:
> 
> ram | no zswap | writeback | writethrough
> 75     93.08     100         96.90
> 87     96.58     95.58       96.72
> 100    92.29     89.73       86.75
> 112    63.80     38.66       19.66
> 125    4.79      29.90       15.75
> 137    4.99      4.50        4.75
> 150    4.28      4.62        5.01
> 162    5.20      2.94        4.66
> 175    5.71      2.11        4.84

Changelog is very useful, thanks for taking the time.

It does sound like the feature is of marginal benefit.  Is "zswap
filled up" an interesting or useful case to optimize?

otoh the addition is pretty simple and we can later withdraw the whole
thing without breaking anyone's systems.

What do people think?



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] mm/zswap: add writethrough option
  2014-02-03 23:08   ` Andrew Morton
@ 2014-02-04  2:47     ` Minchan Kim
  2014-02-10 19:05     ` Dan Streetman
  1 sibling, 0 replies; 22+ messages in thread
From: Minchan Kim @ 2014-02-04  2:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dan Streetman, Seth Jennings, Linux-MM, linux-kernel, Bob Liu,
	Weijie Yang, Shirish Pargaonkar, Mel Gorman

Hello Andrew,

On Mon, Feb 03, 2014 at 03:08:35PM -0800, Andrew Morton wrote:
> On Mon, 27 Jan 2014 09:01:19 -0500 Dan Streetman <ddstreet@ieee.org> wrote:
> 
> > Currently, zswap is writeback cache; stored pages are not sent
> > to swap disk, and when zswap wants to evict old pages it must
> > first write them back to swap cache/disk manually.  This avoids
> > swap out disk I/O up front, but only moves that disk I/O to
> > the writeback case (for pages that are evicted), and adds the
> > overhead of having to uncompress the evicted pages and the
> > need for an additional free page (to store the uncompressed page).
> > 
> > This optionally changes zswap to writethrough cache by enabling
> > frontswap_writethrough() before registering, so that any
> > successful page store will also be written to swap disk.  The
> > default remains writeback.  To enable writethrough, the param
> > zswap.writethrough=1 must be used at boot.
> > 
> > Whether writeback or writethrough will provide better performance
> > depends on many factors including disk I/O speed/throughput,
> > CPU speed(s), system load, etc.  In most cases it is likely
> > that writeback has better performance than writethrough before
> > zswap is full, but after zswap fills up writethrough has
> > better performance than writeback.
> > 
> > The reason to add this option now is, first to allow any zswap
> > user to be able to test using writethrough to determine if they
> > get better performance than using writeback, and second to allow
> > future updates to zswap, such as the possibility of dynamically
> > switching between writeback and writethrough.
> > 
> > ...
> >
> > Based on specjbb testing on my laptop, the results for both writeback
> > and writethrough are better than not using zswap at all, but writeback
> > does seem to be better than writethrough while zswap isn't full.  Once
> > it fills up, performance for writethrough is essentially close to not
> > using zswap, while writeback seems to be worse than not using zswap.
> > However, I think more testing on a wider span of systems and conditions
> > is needed.  Additionally, I'm not sure that specjbb is measuring true
> > performance under fully loaded cpu conditions, so additional cpu load
> > might need to be added or specjbb parameters modified (I took the
> > values from the 4 "warehouses" test run).
> > 
> > In any case though, I think having writethrough as an option is still
> > useful.  More changes could be made, such as changing from writeback
> > to writethrough based on the zswap % full.  And the patch doesn't
> > change default behavior - writethrough must be specifically enabled.
> > 
> > The %-ized numbers I got from specjbb on average, using the default
> > 20% max_pool_percent and varying the amount of heap used as shown:
> > 
> > ram | no zswap | writeback | writethrough
> > 75     93.08     100         96.90
> > 87     96.58     95.58       96.72
> > 100    92.29     89.73       86.75
> > 112    63.80     38.66       19.66
> > 125    4.79      29.90       15.75
> > 137    4.99      4.50        4.75
> > 150    4.28      4.62        5.01
> > 162    5.20      2.94        4.66
> > 175    5.71      2.11        4.84
> 
> Changelog is very useful, thanks for taking the time.
> 
> It does sound like the feature is of marginal benefit.  Is "zswap
> filled up" an interesting or useful case to optimize?
> 
> otoh the addition is pretty simple and we can later withdraw the whole
> thing without breaking anyone's systems.
> 
> What do people think?

IMHO, Using overcommiting memory and swap, it's really thing
we shold optimize once we decided to use writeback of zswap.

But I don't think writethrough isn't ideal solution for
that case where zswap is full. Sometime, just dynamic disabling
of zswap might be better due to reducing unnecessary
comp/decomp overhead.

Dan said that it's good to have because someuser might find
right example we didn't find in future. Although I'm not a
huge fan of such justification for merging the patch(I tempted
my patches several time with such claim), I don't object it
(Actually, I have an idea to make zswap's writethough useful but
it isn't related to this topic) any more if we could withdraw
easily if it turns out a obstacle for future enhace.

Thanks.
-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] mm/zswap: add writethrough option
  2014-02-03 23:08   ` Andrew Morton
  2014-02-04  2:47     ` Minchan Kim
@ 2014-02-10 19:05     ` Dan Streetman
  2014-02-10 23:06       ` Andrew Morton
  1 sibling, 1 reply; 22+ messages in thread
From: Dan Streetman @ 2014-02-10 19:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Seth Jennings, Linux-MM, linux-kernel, Bob Liu, Minchan Kim,
	Weijie Yang, Shirish Pargaonkar, Mel Gorman

On Mon, Feb 3, 2014 at 6:08 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Mon, 27 Jan 2014 09:01:19 -0500 Dan Streetman <ddstreet@ieee.org> wrote:
>
>> Currently, zswap is writeback cache; stored pages are not sent
>> to swap disk, and when zswap wants to evict old pages it must
>> first write them back to swap cache/disk manually.  This avoids
>> swap out disk I/O up front, but only moves that disk I/O to
>> the writeback case (for pages that are evicted), and adds the
>> overhead of having to uncompress the evicted pages and the
>> need for an additional free page (to store the uncompressed page).
>>
>> This optionally changes zswap to writethrough cache by enabling
>> frontswap_writethrough() before registering, so that any
>> successful page store will also be written to swap disk.  The
>> default remains writeback.  To enable writethrough, the param
>> zswap.writethrough=1 must be used at boot.
>>
>> Whether writeback or writethrough will provide better performance
>> depends on many factors including disk I/O speed/throughput,
>> CPU speed(s), system load, etc.  In most cases it is likely
>> that writeback has better performance than writethrough before
>> zswap is full, but after zswap fills up writethrough has
>> better performance than writeback.
>>
>> The reason to add this option now is, first to allow any zswap
>> user to be able to test using writethrough to determine if they
>> get better performance than using writeback, and second to allow
>> future updates to zswap, such as the possibility of dynamically
>> switching between writeback and writethrough.
>>
>> ...
>>
>> Based on specjbb testing on my laptop, the results for both writeback
>> and writethrough are better than not using zswap at all, but writeback
>> does seem to be better than writethrough while zswap isn't full.  Once
>> it fills up, performance for writethrough is essentially close to not
>> using zswap, while writeback seems to be worse than not using zswap.
>> However, I think more testing on a wider span of systems and conditions
>> is needed.  Additionally, I'm not sure that specjbb is measuring true
>> performance under fully loaded cpu conditions, so additional cpu load
>> might need to be added or specjbb parameters modified (I took the
>> values from the 4 "warehouses" test run).
>>
>> In any case though, I think having writethrough as an option is still
>> useful.  More changes could be made, such as changing from writeback
>> to writethrough based on the zswap % full.  And the patch doesn't
>> change default behavior - writethrough must be specifically enabled.
>>
>> The %-ized numbers I got from specjbb on average, using the default
>> 20% max_pool_percent and varying the amount of heap used as shown:
>>
>> ram | no zswap | writeback | writethrough
>> 75     93.08     100         96.90
>> 87     96.58     95.58       96.72
>> 100    92.29     89.73       86.75
>> 112    63.80     38.66       19.66
>> 125    4.79      29.90       15.75
>> 137    4.99      4.50        4.75
>> 150    4.28      4.62        5.01
>> 162    5.20      2.94        4.66
>> 175    5.71      2.11        4.84
>
> Changelog is very useful, thanks for taking the time.
>
> It does sound like the feature is of marginal benefit.  Is "zswap
> filled up" an interesting or useful case to optimize?
>
> otoh the addition is pretty simple and we can later withdraw the whole
> thing without breaking anyone's systems.

ping...

you still thinking about this or is it a reject for now?

>
> What do people think?
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] mm/zswap: add writethrough option
  2014-02-10 19:05     ` Dan Streetman
@ 2014-02-10 23:06       ` Andrew Morton
  2014-02-11 22:49         ` Dan Streetman
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2014-02-10 23:06 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Seth Jennings, Linux-MM, linux-kernel, Bob Liu, Minchan Kim,
	Weijie Yang, Shirish Pargaonkar, Mel Gorman

On Mon, 10 Feb 2014 14:05:14 -0500 Dan Streetman <ddstreet@ieee.org> wrote:

> >
> > It does sound like the feature is of marginal benefit.  Is "zswap
> > filled up" an interesting or useful case to optimize?
> >
> > otoh the addition is pretty simple and we can later withdraw the whole
> > thing without breaking anyone's systems.
> 
> ping...
> 
> you still thinking about this or is it a reject for now?

I'm not seeing a compelling case for merging it and Minchan sounded
rather unconvinced.  Perhaps we should park it until/unless a more
solid need is found?


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] mm/zswap: add writethrough option
  2014-02-10 23:06       ` Andrew Morton
@ 2014-02-11 22:49         ` Dan Streetman
  0 siblings, 0 replies; 22+ messages in thread
From: Dan Streetman @ 2014-02-11 22:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Seth Jennings, Linux-MM, linux-kernel, Bob Liu, Minchan Kim,
	Weijie Yang, Shirish Pargaonkar, Mel Gorman

On Mon, Feb 10, 2014 at 6:06 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Mon, 10 Feb 2014 14:05:14 -0500 Dan Streetman <ddstreet@ieee.org> wrote:
>
>> >
>> > It does sound like the feature is of marginal benefit.  Is "zswap
>> > filled up" an interesting or useful case to optimize?
>> >
>> > otoh the addition is pretty simple and we can later withdraw the whole
>> > thing without breaking anyone's systems.
>>
>> ping...
>>
>> you still thinking about this or is it a reject for now?
>
> I'm not seeing a compelling case for merging it and Minchan sounded
> rather unconvinced.  Perhaps we should park it until/unless a more
> solid need is found?


Sounds good.  I'll bring it back up if I find some solid need for it.  Thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2014-02-11 22:49 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-19 13:23 [PATCH] mm/zswap: add writethrough option Dan Streetman
2014-01-02 15:38 ` Dan Streetman
2014-01-03  2:21   ` Weijie Yang
2014-01-03 15:11 ` Seth Jennings
2014-01-13 17:03   ` Dan Streetman
2014-01-14  0:11 ` Minchan Kim
2014-01-14 15:10   ` Dan Streetman
2014-01-15  5:42     ` Minchan Kim
2014-01-17  5:41       ` Dan Streetman
2014-01-22 14:19         ` Dan Streetman
2014-01-22 20:33           ` Andrew Morton
2014-01-23  0:18             ` Minchan Kim
2014-01-23  1:08               ` Bob Liu
2014-01-23 12:46               ` Austin S Hemmelgarn
2014-01-23 19:18               ` Dan Streetman
2014-01-23 20:43             ` Dan Streetman
2014-01-27 14:01 ` [PATCH v2] " Dan Streetman
2014-02-03 23:08   ` Andrew Morton
2014-02-04  2:47     ` Minchan Kim
2014-02-10 19:05     ` Dan Streetman
2014-02-10 23:06       ` Andrew Morton
2014-02-11 22:49         ` Dan Streetman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).