linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] zsmalloc: small compaction improvements
@ 2015-07-11  9:45 Sergey Senozhatsky
  2015-07-11  9:45 ` [PATCH 1/3] zsmalloc: factor out zs_pages_to_compact() Sergey Senozhatsky
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Sergey Senozhatsky @ 2015-07-11  9:45 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Sergey Senozhatsky,
	Sergey Senozhatsky

Hello,

First two patches introduce new zsmalloc zs_pages_to_compact()
symbol and change zram's `compact' sysfs attribute to be
read-write:
-- write triggers compaction, no changes
-- read returns the number of pages that compaction can
   potentially free

This lets user space to make a bit better decisions and to
avoid unneeded (which will not result in any significant
memory savings) compaction calls:

Example:

      if [ `cat /sys/block/zram<id>/compact` -gt 10 ]; then
          echo 1 > /sys/block/zram<id>/compact;
      fi

Up until now user space could not tell whether compaction
will result in any gain.

The third patch removes class locking around zs_can_compact()
in zs_pages_to_compact(), the motivation and details are
provided in the commit message.

Sergey Senozhatsky (3):
  zsmalloc: factor out zs_pages_to_compact()
  zram: make compact a read-write sysfs node
  zsmalloc: do not take class lock in zs_pages_to_compact()

 Documentation/ABI/testing/sysfs-block-zram |  7 +++---
 Documentation/blockdev/zram.txt            |  4 +++-
 drivers/block/zram/zram_drv.c              | 16 ++++++++++++-
 include/linux/zsmalloc.h                   |  1 +
 mm/zsmalloc.c                              | 37 +++++++++++++++++-------------
 5 files changed, 44 insertions(+), 21 deletions(-)

-- 
2.4.5


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] zsmalloc: factor out zs_pages_to_compact()
  2015-07-11  9:45 [PATCH 0/3] zsmalloc: small compaction improvements Sergey Senozhatsky
@ 2015-07-11  9:45 ` Sergey Senozhatsky
  2015-07-11  9:45 ` [PATCH 2/3] zram: make compact a read-write sysfs node Sergey Senozhatsky
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Sergey Senozhatsky @ 2015-07-11  9:45 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Sergey Senozhatsky,
	Sergey Senozhatsky

Factor out the code that calculates how many pages compaction
can free into zs_pages_to_compact() function and export it
as zsmalloc API symbol. We still use it in zs_shrinker_count(),
just like we did before, and at the same time we now let zram
know this number (and provide it to user space) so user space
can make better assumptions about manual compaction effectiveness.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 include/linux/zsmalloc.h |  1 +
 mm/zsmalloc.c            | 39 +++++++++++++++++++++++----------------
 2 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index 6398dfa..8f4de78 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -53,6 +53,7 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle);
 
 unsigned long zs_get_total_pages(struct zs_pool *pool);
 unsigned long zs_compact(struct zs_pool *pool);
+unsigned long zs_pages_to_compact(struct zs_pool *pool);
 
 void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats);
 #endif
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index c10885c..b10a228 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1798,6 +1798,28 @@ void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats)
 }
 EXPORT_SYMBOL_GPL(zs_pool_stats);
 
+unsigned long zs_pages_to_compact(struct zs_pool *pool)
+{
+	unsigned long pages_to_free = 0;
+	int i;
+	struct size_class *class;
+
+	for (i = zs_size_classes - 1; i >= 0; i--) {
+		class = pool->size_class[i];
+		if (!class)
+			continue;
+		if (class->index != i)
+			continue;
+
+		spin_lock(&class->lock);
+		pages_to_free += zs_can_compact(class);
+		spin_unlock(&class->lock);
+	}
+
+	return pages_to_free;
+}
+EXPORT_SYMBOL_GPL(zs_pages_to_compact);
+
 static unsigned long zs_shrinker_scan(struct shrinker *shrinker,
 		struct shrink_control *sc)
 {
@@ -1819,28 +1841,13 @@ static unsigned long zs_shrinker_scan(struct shrinker *shrinker,
 static unsigned long zs_shrinker_count(struct shrinker *shrinker,
 		struct shrink_control *sc)
 {
-	int i;
-	struct size_class *class;
-	unsigned long pages_to_free = 0;
 	struct zs_pool *pool = container_of(shrinker, struct zs_pool,
 			shrinker);
 
 	if (!pool->shrinker_enabled)
 		return 0;
 
-	for (i = zs_size_classes - 1; i >= 0; i--) {
-		class = pool->size_class[i];
-		if (!class)
-			continue;
-		if (class->index != i)
-			continue;
-
-		spin_lock(&class->lock);
-		pages_to_free += zs_can_compact(class);
-		spin_unlock(&class->lock);
-	}
-
-	return pages_to_free;
+	return zs_pages_to_compact(pool);
 }
 
 static void zs_unregister_shrinker(struct zs_pool *pool)
-- 
2.4.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/3] zram: make compact a read-write sysfs node
  2015-07-11  9:45 [PATCH 0/3] zsmalloc: small compaction improvements Sergey Senozhatsky
  2015-07-11  9:45 ` [PATCH 1/3] zsmalloc: factor out zs_pages_to_compact() Sergey Senozhatsky
@ 2015-07-11  9:45 ` Sergey Senozhatsky
  2015-07-11  9:45 ` [PATCH 3/3] zsmalloc: do not take class lock in zs_pages_to_compact() Sergey Senozhatsky
  2015-07-13 23:36 ` [PATCH 0/3] zsmalloc: small compaction improvements Minchan Kim
  3 siblings, 0 replies; 15+ messages in thread
From: Sergey Senozhatsky @ 2015-07-11  9:45 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Sergey Senozhatsky,
	Sergey Senozhatsky

Change zram's `compact' sysfs node to be a read-write attribute.
Write triggers zsmalloc compaction, just as before, read returns
the number of pages that zsmalloc can potentially compact.

User space now has a chance to estimate possible compaction memory
savings and avoid unnecessary compactions.

Example:

  if [ `cat /sys/block/zram<id>/compact` -gt 10 ]; then
      echo 1 > /sys/block/zram<id>/compact;
  fi

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 Documentation/ABI/testing/sysfs-block-zram |  7 ++++---
 Documentation/blockdev/zram.txt            |  4 +++-
 drivers/block/zram/zram_drv.c              | 16 +++++++++++++++-
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
index 2e69e83..0093998 100644
--- a/Documentation/ABI/testing/sysfs-block-zram
+++ b/Documentation/ABI/testing/sysfs-block-zram
@@ -146,9 +146,10 @@ What:		/sys/block/zram<id>/compact
 Date:		August 2015
 Contact:	Minchan Kim <minchan@kernel.org>
 Description:
-		The compact file is write-only and trigger compaction for
-		allocator zrm uses. The allocator moves some objects so that
-		it could free fragment space.
+		The compact file is read/write. Write triggers underlying
+		allocator's memory compaction, which may result in memory
+		savings. Read returns the number of pages that compaction
+		can potentially (but not guaranteed to) free.
 
 What:		/sys/block/zram<id>/io_stat
 Date:		August 2015
diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 62435bb..1854f62 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -146,7 +146,9 @@ mem_limit         RW    the maximum amount of memory ZRAM can use to store
                         the compressed data
 pages_compacted   RO    the number of pages freed during compaction
                         (available only via zram<id>/mm_stat node)
-compact           WO    trigger memory compaction
+compact           RW    write triggers memory compaction, read shows how many
+                        pages can potentially (but not necessarily will) be
+                        compacted
 
 WARNING
 =======
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index f5ef9e0..def9b8a 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -404,6 +404,20 @@ static ssize_t compact_store(struct device *dev,
 	return len;
 }
 
+static ssize_t compact_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct zram *zram = dev_to_zram(dev);
+	unsigned long num_pages = 0;
+
+	down_read(&zram->init_lock);
+	if (init_done(zram))
+		num_pages = zs_pages_to_compact(zram->meta->mem_pool);
+	up_read(&zram->init_lock);
+
+	return scnprintf(buf, PAGE_SIZE, "%lu\n", num_pages);
+}
+
 static ssize_t io_stat_show(struct device *dev,
 		struct device_attribute *attr, char *buf)
 {
@@ -1145,7 +1159,7 @@ static const struct block_device_operations zram_devops = {
 	.owner = THIS_MODULE
 };
 
-static DEVICE_ATTR_WO(compact);
+static DEVICE_ATTR_RW(compact);
 static DEVICE_ATTR_RW(disksize);
 static DEVICE_ATTR_RO(initstate);
 static DEVICE_ATTR_WO(reset);
-- 
2.4.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/3] zsmalloc: do not take class lock in zs_pages_to_compact()
  2015-07-11  9:45 [PATCH 0/3] zsmalloc: small compaction improvements Sergey Senozhatsky
  2015-07-11  9:45 ` [PATCH 1/3] zsmalloc: factor out zs_pages_to_compact() Sergey Senozhatsky
  2015-07-11  9:45 ` [PATCH 2/3] zram: make compact a read-write sysfs node Sergey Senozhatsky
@ 2015-07-11  9:45 ` Sergey Senozhatsky
  2015-07-15  4:07   ` Sergey Senozhatsky
  2015-07-13 23:36 ` [PATCH 0/3] zsmalloc: small compaction improvements Minchan Kim
  3 siblings, 1 reply; 15+ messages in thread
From: Sergey Senozhatsky @ 2015-07-11  9:45 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Sergey Senozhatsky,
	Sergey Senozhatsky

We can avoid taking class ->lock around zs_can_compact() in
zs_pages_to_compact(), because the number that we return back
is outdated in general case, by design. We have different
source that are able to change class's state right after we
return from zs_can_compact() -- ongoing IO operations, manually
triggered compaction or automatic compaction, or all three
simultaneously.

We re-do this calculations during compaction on a per class basis
anyway.

zs_unregister_shrinker() will not return until we have an active
shrinker, so classes won't unexpectedly disappear while
zs_pages_to_compact(), invoked by zs_shrinker_count(), iterates
them.

When called from zram, we are protected by zram's ->init_lock,
so, again, classes will be there until zs_pages_to_compact()
iterates them.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 mm/zsmalloc.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index b10a228..824c182 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1811,9 +1811,7 @@ unsigned long zs_pages_to_compact(struct zs_pool *pool)
 		if (class->index != i)
 			continue;
 
-		spin_lock(&class->lock);
 		pages_to_free += zs_can_compact(class);
-		spin_unlock(&class->lock);
 	}
 
 	return pages_to_free;
-- 
2.4.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] zsmalloc: small compaction improvements
  2015-07-11  9:45 [PATCH 0/3] zsmalloc: small compaction improvements Sergey Senozhatsky
                   ` (2 preceding siblings ...)
  2015-07-11  9:45 ` [PATCH 3/3] zsmalloc: do not take class lock in zs_pages_to_compact() Sergey Senozhatsky
@ 2015-07-13 23:36 ` Minchan Kim
  2015-07-14  0:31   ` Sergey Senozhatsky
  3 siblings, 1 reply; 15+ messages in thread
From: Minchan Kim @ 2015-07-13 23:36 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, linux-mm, linux-kernel, Sergey Senozhatsky

Hello Sergey,

On Sat, Jul 11, 2015 at 06:45:29PM +0900, Sergey Senozhatsky wrote:
> Hello,
> 
> First two patches introduce new zsmalloc zs_pages_to_compact()
> symbol and change zram's `compact' sysfs attribute to be
> read-write:
> -- write triggers compaction, no changes
> -- read returns the number of pages that compaction can
>    potentially free
> 
> This lets user space to make a bit better decisions and to
> avoid unneeded (which will not result in any significant
> memory savings) compaction calls:
> 
> Example:
> 
>       if [ `cat /sys/block/zram<id>/compact` -gt 10 ]; then
>           echo 1 > /sys/block/zram<id>/compact;
>       fi
> 
> Up until now user space could not tell whether compaction
> will result in any gain.

First of all, thanks for the looking this.

Question:

What is motivation?
IOW, did you see big overhead by user-triggered compaction? so,
do you want to throttle it by userspace?

> 
> The third patch removes class locking around zs_can_compact()
> in zs_pages_to_compact(), the motivation and details are
> provided in the commit message.
> 
> Sergey Senozhatsky (3):
>   zsmalloc: factor out zs_pages_to_compact()
>   zram: make compact a read-write sysfs node
>   zsmalloc: do not take class lock in zs_pages_to_compact()
> 
>  Documentation/ABI/testing/sysfs-block-zram |  7 +++---
>  Documentation/blockdev/zram.txt            |  4 +++-
>  drivers/block/zram/zram_drv.c              | 16 ++++++++++++-
>  include/linux/zsmalloc.h                   |  1 +
>  mm/zsmalloc.c                              | 37 +++++++++++++++++-------------
>  5 files changed, 44 insertions(+), 21 deletions(-)
> 
> -- 
> 2.4.5
> 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] zsmalloc: small compaction improvements
  2015-07-13 23:36 ` [PATCH 0/3] zsmalloc: small compaction improvements Minchan Kim
@ 2015-07-14  0:31   ` Sergey Senozhatsky
  2015-07-14  0:55     ` Minchan Kim
  0 siblings, 1 reply; 15+ messages in thread
From: Sergey Senozhatsky @ 2015-07-14  0:31 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, linux-mm, linux-kernel,
	Sergey Senozhatsky

Hello Minchan,

On (07/14/15 08:36), Minchan Kim wrote:
[..]
> >       if [ `cat /sys/block/zram<id>/compact` -gt 10 ]; then
> >           echo 1 > /sys/block/zram<id>/compact;
> >       fi
> > 
> > Up until now user space could not tell whether compaction
> > will result in any gain.
> 
> First of all, thanks for the looking this.
> 
> Question:
> 
> What is motivation?
> IOW, did you see big overhead by user-triggered compaction? so,
> do you want to throttle it by userspace?

It depends on 'big overhead' definition, of course. We don't care
that much when compaction is issued by the shrinker, because things
are getting bad and we can sacrifice performance. But user triggered
compaction on a I/O pressured device can needlessly slow things down,
especially now, when we drain ALMOST_FULL classes.

/sys/block/zram<id>/compact is a black box. We provide it, we don't
throttle it in the kernel, and user space is absolutely clueless when
it invokes compaction. From some remote (or alternative) point of
view compaction can be seen as "zsmalloc's cache flush" (unused objects
make write path quicker - no zspage allocation needed) and it won't
hurt to give user space some numbers so it can decide if the whole
thing is worth it (that decision is, once again, I/O pattern and
setup specific -- some users may be interested in compaction only
if it will reduce zsmalloc's memory consumption by, say, 15%).

	-ss

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] zsmalloc: small compaction improvements
  2015-07-14  0:31   ` Sergey Senozhatsky
@ 2015-07-14  0:55     ` Minchan Kim
  2015-07-14 12:29       ` Sergey Senozhatsky
  0 siblings, 1 reply; 15+ messages in thread
From: Minchan Kim @ 2015-07-14  0:55 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, linux-mm, linux-kernel

On Tue, Jul 14, 2015 at 09:31:32AM +0900, Sergey Senozhatsky wrote:
> Hello Minchan,
> 
> On (07/14/15 08:36), Minchan Kim wrote:
> [..]
> > >       if [ `cat /sys/block/zram<id>/compact` -gt 10 ]; then
> > >           echo 1 > /sys/block/zram<id>/compact;
> > >       fi
> > > 
> > > Up until now user space could not tell whether compaction
> > > will result in any gain.
> > 
> > First of all, thanks for the looking this.
> > 
> > Question:
> > 
> > What is motivation?
> > IOW, did you see big overhead by user-triggered compaction? so,
> > do you want to throttle it by userspace?
> 
> It depends on 'big overhead' definition, of course. We don't care
> that much when compaction is issued by the shrinker, because things
> are getting bad and we can sacrifice performance. But user triggered
> compaction on a I/O pressured device can needlessly slow things down,
> especially now, when we drain ALMOST_FULL classes.

You mean performance overhead by additional alloc_pages?
If so, you mean ALMOST_EMPTY|ALMOST_FULL, not only ALMOST_FULL?

So, it's performance enhance patch?
Please give the some number to justify patchset.

> 
> /sys/block/zram<id>/compact is a black box. We provide it, we don't
> throttle it in the kernel, and user space is absolutely clueless when
> it invokes compaction. From some remote (or alternative) point of

But we have zs_can_compact so it can effectively skip the class if it
is not proper class.

> view compaction can be seen as "zsmalloc's cache flush" (unused objects
> make write path quicker - no zspage allocation needed) and it won't
> hurt to give user space some numbers so it can decide if the whole
> thing is worth it (that decision is, once again, I/O pattern and
> setup specific -- some users may be interested in compaction only
> if it will reduce zsmalloc's memory consumption by, say, 15%).

Again, your claim is performace so I need number.
If it's really horrible, I guess below interface makes user handy
without peeking nr_can_compact ad doing compact.

        /* Tell zram to compact if fragment ration is higher 15% */
        echo 15% > /sys/block/zram0/compact
        or
        echo 15% > /sys/block/zram/compact_condition

Anyway, we need a number before starting discussion.

Thanks.
> 
> 	-ss

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] zsmalloc: small compaction improvements
  2015-07-14  0:55     ` Minchan Kim
@ 2015-07-14 12:29       ` Sergey Senozhatsky
  2015-07-14 16:52         ` Minchan Kim
  0 siblings, 1 reply; 15+ messages in thread
From: Sergey Senozhatsky @ 2015-07-14 12:29 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton, linux-mm,
	linux-kernel

On (07/14/15 09:55), Minchan Kim wrote:
> > It depends on 'big overhead' definition, of course. We don't care
> > that much when compaction is issued by the shrinker, because things
> > are getting bad and we can sacrifice performance. But user triggered
> > compaction on a I/O pressured device can needlessly slow things down,
> > especially now, when we drain ALMOST_FULL classes.
> 
> You mean performance overhead by additional alloc_pages?

not only performance, but yes, performance mostly.

> If so, you mean ALMOST_EMPTY|ALMOST_FULL, not only ALMOST_FULL?

of course, I meant your recent patch here. should have been 'we _ALSO_
drain ALMOST_FULL classes'

> 
> So, it's performance enhance patch?
> Please give the some number to justify patchset.

alrighty... again...

> > 
> > /sys/block/zram<id>/compact is a black box. We provide it, we don't
> > throttle it in the kernel, and user space is absolutely clueless when
> > it invokes compaction. From some remote (or alternative) point of
> 
> But we have zs_can_compact so it can effectively skip the class if it
> is not proper class.

user triggered compaction can compact too much.
in its current state triggering a compaction from user space is like
playing a lottery or a russian roulette.

a simple script

for i in {1..1000}; do
        echo -n 'compact... ';
        cat /sys/block/zram0/compact;
        echo 1 > /sys/block/zram0/compact;
        sleep 1;
done

(and this is not so crazy. love it or not, but this is the only way
how user space can use compaction at the moment).

the output
...
compact... 0
compact... 0
compact... 0
compact... 0
compact... 0
compact... 0
compact... 409
compact... 3550
compact... 0
compact... 0
compact... 0
compact... 2129
compact... 765
compact... 0
compact... 0
compact... 0
compact... 784
compact... 0
compact... 0
compact... 0
compact... 0
...

(f.e., we compacted 3550 pages on device being under I/O pressure.
that's quite a lot, don't you think so?).

first	-- no enforced compaction
second	-- with enforced compaction

./iozone -t 8 -R -r 4K -s 200M -I +Z

                        w/o               w/compaction
"  Initial write "    549240.49             538710.62
"        Rewrite "   2447973.19            2442312.38
"           Read "   5533620.69            5611562.00
"        Re-read "   5689199.81            4916373.62
"   Reverse Read "   4094576.16            5280551.56
"    Stride read "   5382067.75            5395350.00
"    Random read "   5384945.56            5298079.62
" Mixed workload "   3986770.06            3918897.78
"   Random write "   2290869.12            2201346.50
"         Pwrite "    502619.36             493527.64
"          Pread "   2675312.28            2732118.19
"         Fwrite "   4198686.06            3373855.09
"          Fread "  18054401.25           17895797.00


> > view compaction can be seen as "zsmalloc's cache flush" (unused objects
> > make write path quicker - no zspage allocation needed) and it won't
> > hurt to give user space some numbers so it can decide if the whole
> > thing is worth it (that decision is, once again, I/O pattern and
> > setup specific -- some users may be interested in compaction only
> > if it will reduce zsmalloc's memory consumption by, say, 15%).
> 
> Again, your claim is performace so I need number.
> If it's really horrible, I guess below interface makes user handy
> without peeking nr_can_compact ad doing compact.
> 
>         /* Tell zram to compact if fragment ration is higher 15% */
>         echo 15% > /sys/block/zram0/compact
>         or
>         echo 15% > /sys/block/zram/compact_condition

no, this is the least of the things we need to do -- enforced and
pre-defined policy engine in zram/zsmalloc 'that will work for all'.
user space has almost all the numbers to do it, the only missing bit
of the puzzle is `nr_can_compact' number. it's up to user space then
to decide how it wishes things to be done. for example:
"don't compact if compaction will flush 35% of zsmalloc pages on a
I/O pressured device" or something else.

	-ss

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] zsmalloc: small compaction improvements
  2015-07-14 12:29       ` Sergey Senozhatsky
@ 2015-07-14 16:52         ` Minchan Kim
  2015-07-15  0:21           ` Sergey Senozhatsky
  0 siblings, 1 reply; 15+ messages in thread
From: Minchan Kim @ 2015-07-14 16:52 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, linux-mm, linux-kernel

Hi Sergey,

On Tue, Jul 14, 2015 at 09:29:32PM +0900, Sergey Senozhatsky wrote:
> On (07/14/15 09:55), Minchan Kim wrote:
> > > It depends on 'big overhead' definition, of course. We don't care
> > > that much when compaction is issued by the shrinker, because things
> > > are getting bad and we can sacrifice performance. But user triggered
> > > compaction on a I/O pressured device can needlessly slow things down,
> > > especially now, when we drain ALMOST_FULL classes.
> > 
> > You mean performance overhead by additional alloc_pages?
> 
> not only performance, but yes, performance mostly.
> 
> > If so, you mean ALMOST_EMPTY|ALMOST_FULL, not only ALMOST_FULL?
> 
> of course, I meant your recent patch here. should have been 'we _ALSO_
> drain ALMOST_FULL classes'
> 
> > 
> > So, it's performance enhance patch?
> > Please give the some number to justify patchset.
> 
> alrighty... again...
> 
> > > 
> > > /sys/block/zram<id>/compact is a black box. We provide it, we don't
> > > throttle it in the kernel, and user space is absolutely clueless when
> > > it invokes compaction. From some remote (or alternative) point of
> > 
> > But we have zs_can_compact so it can effectively skip the class if it
> > is not proper class.
> 
> user triggered compaction can compact too much.
> in its current state triggering a compaction from user space is like
> playing a lottery or a russian roulette.

We were on different page.

I thought the motivation from this patchset is to prevent compaction
overhead by frequent user-driven compaction request because user
don't know how they can get free pages by compaction so they should
ask compact frequently with blind.
That's why I mentioned zs_can_compact which can skip compaction
effectively.

But your montivation is too much drain so loses performance.

First of all, user-triggered compaction should be highly avoided
because we introduced auto-compaction now so it should be enough.
If auto-compaction has trobule with someone's requirement,
we should fix/enhance it rather than enhancing manual compact.

For example, if there is some problems in compaction of VM, we don't
enhance manual compact(ie, /proc/sys/vm/compact_memory). Instead,
we enhance dynamic compaction.
If there is some problems in reclaim algorithm, we don't enhance
/proc/sys/vm/drop_caches but fixes alrogithm itself.
Because it's really hard for user to decide when/how it triggers
and even every system doesn't have such userspace program.

If user want to compact by manual, we should catch up the voice
from them and try to find why they are not enough with auto-compaction
and fix it rather than relying on user's decision with manual compaction.

Nonetheless, if user decied to trigger manual compaction, he should
have a cost(ie, performance lose). It's natural for all caches that
performance and cache size is proportional.

If you have a concern about too much drain, we should fix auto-compaction
which can reclaim pages as many as VM request, rather than every zspages
of all classes.

Slab shrinker already passes a hint how many objects subsystem should
reclaim so we could stop reclaim at some point, not draining all classes.
But the problem in this approach is we need balance reclaiming for all
classes with round-robin to prevent too much draining from a class,
I think it is not hard but I really suspect we need it in this point.

> 
> a simple script
> 
> for i in {1..1000}; do
>         echo -n 'compact... ';
>         cat /sys/block/zram0/compact;
>         echo 1 > /sys/block/zram0/compact;
>         sleep 1;
> done
> 
> (and this is not so crazy. love it or not, but this is the only way
> how user space can use compaction at the moment).
> 
> the output
> ...
> compact... 0
> compact... 0
> compact... 0
> compact... 0
> compact... 0
> compact... 0
> compact... 409
> compact... 3550
> compact... 0
> compact... 0
> compact... 0
> compact... 2129
> compact... 765
> compact... 0
> compact... 0
> compact... 0
> compact... 784
> compact... 0
> compact... 0
> compact... 0
> compact... 0
> ...
> 
> (f.e., we compacted 3550 pages on device being under I/O pressure.
> that's quite a lot, don't you think so?).
> 
> first	-- no enforced compaction
> second	-- with enforced compaction
> 
> ./iozone -t 8 -R -r 4K -s 200M -I +Z
> 
>                         w/o               w/compaction
> "  Initial write "    549240.49             538710.62
> "        Rewrite "   2447973.19            2442312.38
> "           Read "   5533620.69            5611562.00
> "        Re-read "   5689199.81            4916373.62
> "   Reverse Read "   4094576.16            5280551.56
> "    Stride read "   5382067.75            5395350.00
> "    Random read "   5384945.56            5298079.62
> " Mixed workload "   3986770.06            3918897.78
> "   Random write "   2290869.12            2201346.50
> "         Pwrite "    502619.36             493527.64
> "          Pread "   2675312.28            2732118.19
> "         Fwrite "   4198686.06            3373855.09
> "          Fread "  18054401.25           17895797.00

It seems to be not significant if we consider it's benchmark to
focus to reveal performance and even sometime w/compaction is better.

Although it might be apparenlty slow, it is expected.
Because run-time compaction overhead and small cache size
by compaction will absolutely affect performance so user should
avoid such manual compaction and relies on cache eviction
to VM reclaim. It's true for all caches in current linux kernel.

> 
> 
> > > view compaction can be seen as "zsmalloc's cache flush" (unused objects
> > > make write path quicker - no zspage allocation needed) and it won't
> > > hurt to give user space some numbers so it can decide if the whole
> > > thing is worth it (that decision is, once again, I/O pattern and
> > > setup specific -- some users may be interested in compaction only
> > > if it will reduce zsmalloc's memory consumption by, say, 15%).
> > 
> > Again, your claim is performace so I need number.
> > If it's really horrible, I guess below interface makes user handy
> > without peeking nr_can_compact ad doing compact.
> > 
> >         /* Tell zram to compact if fragment ration is higher 15% */
> >         echo 15% > /sys/block/zram0/compact
> >         or
> >         echo 15% > /sys/block/zram/compact_condition
> 
> no, this is the least of the things we need to do -- enforced and
> pre-defined policy engine in zram/zsmalloc 'that will work for all'.
> user space has almost all the numbers to do it, the only missing bit
> of the puzzle is `nr_can_compact' number. it's up to user space then
> to decide how it wishes things to be done. for example:

I don't think so. It seems you claims kernel should provide basic stat
and user should implement the policy based on it. Normally it might be
true but if kernel can do it better, it's better to handle it in kernel
space.

In case of your interface, every such system should have a userspace manager
to control it. IOW, if there is no such program in the system, the feature
is void. But with my suggestion, we could make default value(ex, 30 or 15)
and user can change it anytime. At least, it will work without any userspace
intervention.

Other flaw of your interface is we cannot provide right vaule from
reading /sys/block/zram0/compact, which is really racy so user's
decision is really fragile to work.
It's bad for kernel to provide such vague value to the user for his
policy decision.

NACK.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] zsmalloc: small compaction improvements
  2015-07-14 16:52         ` Minchan Kim
@ 2015-07-15  0:21           ` Sergey Senozhatsky
  2015-07-15  0:24             ` Minchan Kim
  0 siblings, 1 reply; 15+ messages in thread
From: Sergey Senozhatsky @ 2015-07-15  0:21 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton, linux-mm,
	linux-kernel

On (07/15/15 01:52), Minchan Kim wrote:
> > alrighty... again...
> > 
> > > > 
> > > > /sys/block/zram<id>/compact is a black box. We provide it, we don't
> > > > throttle it in the kernel, and user space is absolutely clueless when
> > > > it invokes compaction. From some remote (or alternative) point of
> > > 
> > > But we have zs_can_compact so it can effectively skip the class if it
> > > is not proper class.
> > 
> > user triggered compaction can compact too much.
> > in its current state triggering a compaction from user space is like
> > playing a lottery or a russian roulette.
> 
> We were on different page.

> I thought the motivation from this patchset is to prevent compaction
> overhead by frequent user-driven compaction request because user
> don't know how they can get free pages by compaction so they should
> ask compact frequently with blind.

this is exactly the motivation for this patchset. seriously.

whatever.

	-ss

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] zsmalloc: small compaction improvements
  2015-07-15  0:21           ` Sergey Senozhatsky
@ 2015-07-15  0:24             ` Minchan Kim
  2015-07-15 11:16               ` Sergey Senozhatsky
  0 siblings, 1 reply; 15+ messages in thread
From: Minchan Kim @ 2015-07-15  0:24 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, linux-mm, linux-kernel

On Wed, Jul 15, 2015 at 09:21:06AM +0900, Sergey Senozhatsky wrote:
> On (07/15/15 01:52), Minchan Kim wrote:
> > > alrighty... again...
> > > 
> > > > > 
> > > > > /sys/block/zram<id>/compact is a black box. We provide it, we don't
> > > > > throttle it in the kernel, and user space is absolutely clueless when
> > > > > it invokes compaction. From some remote (or alternative) point of
> > > > 
> > > > But we have zs_can_compact so it can effectively skip the class if it
> > > > is not proper class.
> > > 
> > > user triggered compaction can compact too much.
> > > in its current state triggering a compaction from user space is like
> > > playing a lottery or a russian roulette.
> > 
> > We were on different page.
> 
> > I thought the motivation from this patchset is to prevent compaction
> > overhead by frequent user-driven compaction request because user
> > don't know how they can get free pages by compaction so they should
> > ask compact frequently with blind.
> 
> this is exactly the motivation for this patchset. seriously.

User should rely on the auto-compaction.

> 
> whatever.
> 
> 	-ss

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] zsmalloc: do not take class lock in zs_pages_to_compact()
  2015-07-11  9:45 ` [PATCH 3/3] zsmalloc: do not take class lock in zs_pages_to_compact() Sergey Senozhatsky
@ 2015-07-15  4:07   ` Sergey Senozhatsky
  2015-07-15 23:38     ` Minchan Kim
  0 siblings, 1 reply; 15+ messages in thread
From: Sergey Senozhatsky @ 2015-07-15  4:07 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Sergey Senozhatsky,
	Sergey Senozhatsky

On (07/11/15 18:45), Sergey Senozhatsky wrote:
[..]
> We re-do this calculations during compaction on a per class basis
> anyway.
> 
> zs_unregister_shrinker() will not return until we have an active
> shrinker, so classes won't unexpectedly disappear while
> zs_pages_to_compact(), invoked by zs_shrinker_count(), iterates
> them.
> 
> When called from zram, we are protected by zram's ->init_lock,
> so, again, classes will be there until zs_pages_to_compact()
> iterates them.
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> ---
>  mm/zsmalloc.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index b10a228..824c182 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1811,9 +1811,7 @@ unsigned long zs_pages_to_compact(struct zs_pool *pool)
>  		if (class->index != i)
>  			continue;
>  
> -		spin_lock(&class->lock);
>  		pages_to_free += zs_can_compact(class);
> -		spin_unlock(&class->lock);
>  	}
>  
>  	return pages_to_free;

This patch still makes sense. Agree?

	-ss

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] zsmalloc: small compaction improvements
  2015-07-15  0:24             ` Minchan Kim
@ 2015-07-15 11:16               ` Sergey Senozhatsky
  0 siblings, 0 replies; 15+ messages in thread
From: Sergey Senozhatsky @ 2015-07-15 11:16 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton, linux-mm,
	linux-kernel

On (07/15/15 09:24), Minchan Kim wrote:
> On Wed, Jul 15, 2015 at 09:21:06AM +0900, Sergey Senozhatsky wrote:
> > On (07/15/15 01:52), Minchan Kim wrote:
> > > > alrighty... again...
> > > > 
> > > > > > 
> > > > > > /sys/block/zram<id>/compact is a black box. We provide it, we don't
> > > > > > throttle it in the kernel, and user space is absolutely clueless when
> > > > > > it invokes compaction. From some remote (or alternative) point of
> > > > > 
> > > > > But we have zs_can_compact so it can effectively skip the class if it
> > > > > is not proper class.
> > > > 
> > > > user triggered compaction can compact too much.
> > > > in its current state triggering a compaction from user space is like
> > > > playing a lottery or a russian roulette.
> > > 
> > > We were on different page.
> > 
> > > I thought the motivation from this patchset is to prevent compaction
> > > overhead by frequent user-driven compaction request because user
> > > don't know how they can get free pages by compaction so they should
> > > ask compact frequently with blind.
> > 
> > this is exactly the motivation for this patchset. seriously.
> 
> User should rely on the auto-compaction.

yep, which will be available in 5-6 months... right behind the corner.

	-ss

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] zsmalloc: do not take class lock in zs_pages_to_compact()
  2015-07-15  4:07   ` Sergey Senozhatsky
@ 2015-07-15 23:38     ` Minchan Kim
  2015-07-15 23:59       ` Sergey Senozhatsky
  0 siblings, 1 reply; 15+ messages in thread
From: Minchan Kim @ 2015-07-15 23:38 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, linux-mm, linux-kernel, Sergey Senozhatsky

Hi Sergey,

On Wed, Jul 15, 2015 at 01:07:03PM +0900, Sergey Senozhatsky wrote:
> On (07/11/15 18:45), Sergey Senozhatsky wrote:
> [..]
> > We re-do this calculations during compaction on a per class basis
> > anyway.
> > 
> > zs_unregister_shrinker() will not return until we have an active
> > shrinker, so classes won't unexpectedly disappear while
> > zs_pages_to_compact(), invoked by zs_shrinker_count(), iterates
> > them.
> > 
> > When called from zram, we are protected by zram's ->init_lock,
> > so, again, classes will be there until zs_pages_to_compact()
> > iterates them.
> > 
> > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > ---
> >  mm/zsmalloc.c | 2 --
> >  1 file changed, 2 deletions(-)
> > 
> > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> > index b10a228..824c182 100644
> > --- a/mm/zsmalloc.c
> > +++ b/mm/zsmalloc.c
> > @@ -1811,9 +1811,7 @@ unsigned long zs_pages_to_compact(struct zs_pool *pool)
> >  		if (class->index != i)
> >  			continue;
> >  
> > -		spin_lock(&class->lock);
> >  		pages_to_free += zs_can_compact(class);
> > -		spin_unlock(&class->lock);
> >  	}
> >  
> >  	return pages_to_free;
> 
> This patch still makes sense. Agree?

There is already race window between shrink_count and shrink_slab so
it would be okay if we return stale stat with removing the lock if
the difference is not huge.

Even, now we don't obey nr_to_scan of shrinker in zs_shrinker_scan
so the such accuracy would be pointless.

Please resend the patch and correct zs_can_compact's comment.

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] zsmalloc: do not take class lock in zs_pages_to_compact()
  2015-07-15 23:38     ` Minchan Kim
@ 2015-07-15 23:59       ` Sergey Senozhatsky
  0 siblings, 0 replies; 15+ messages in thread
From: Sergey Senozhatsky @ 2015-07-15 23:59 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, linux-mm, linux-kernel,
	Sergey Senozhatsky

Hi,

On (07/16/15 08:38), Minchan Kim wrote:
> > > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> > > index b10a228..824c182 100644
> > > --- a/mm/zsmalloc.c
> > > +++ b/mm/zsmalloc.c
> > > @@ -1811,9 +1811,7 @@ unsigned long zs_pages_to_compact(struct zs_pool *pool)
> > >  		if (class->index != i)
> > >  			continue;
> > >  
> > > -		spin_lock(&class->lock);
> > >  		pages_to_free += zs_can_compact(class);
> > > -		spin_unlock(&class->lock);
> > >  	}
> > >  
> > >  	return pages_to_free;
> > 
> > This patch still makes sense. Agree?
> 
> There is already race window between shrink_count and shrink_slab so
> it would be okay if we return stale stat with removing the lock if
> the difference is not huge.
> 
> Even, now we don't obey nr_to_scan of shrinker in zs_shrinker_scan
> so the such accuracy would be pointless.

Yeah, automatic shrinker may work concurrently with the user triggered
one, so it may be hard (time consuming) to release the exact amount of
pages that we returned from _count(). We can look at `sc->nr_to_reclaim'
to avoid releasing more pages than shrinker wants us to release, but
I'd probably prefer to keep the existing behaviour if we were called by
the shrinker.

OK, will resend later today.

	-ss

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-07-15 23:59 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-11  9:45 [PATCH 0/3] zsmalloc: small compaction improvements Sergey Senozhatsky
2015-07-11  9:45 ` [PATCH 1/3] zsmalloc: factor out zs_pages_to_compact() Sergey Senozhatsky
2015-07-11  9:45 ` [PATCH 2/3] zram: make compact a read-write sysfs node Sergey Senozhatsky
2015-07-11  9:45 ` [PATCH 3/3] zsmalloc: do not take class lock in zs_pages_to_compact() Sergey Senozhatsky
2015-07-15  4:07   ` Sergey Senozhatsky
2015-07-15 23:38     ` Minchan Kim
2015-07-15 23:59       ` Sergey Senozhatsky
2015-07-13 23:36 ` [PATCH 0/3] zsmalloc: small compaction improvements Minchan Kim
2015-07-14  0:31   ` Sergey Senozhatsky
2015-07-14  0:55     ` Minchan Kim
2015-07-14 12:29       ` Sergey Senozhatsky
2015-07-14 16:52         ` Minchan Kim
2015-07-15  0:21           ` Sergey Senozhatsky
2015-07-15  0:24             ` Minchan Kim
2015-07-15 11:16               ` Sergey Senozhatsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).