* [PATCH] zram: export the number of available comp streams
@ 2016-01-26 12:03 Sergey Senozhatsky
2016-01-26 21:13 ` Andrew Morton
2016-01-29 7:28 ` Minchan Kim
0 siblings, 2 replies; 11+ messages in thread
From: Sergey Senozhatsky @ 2016-01-26 12:03 UTC (permalink / raw)
To: Andrew Morton
Cc: Minchan Kim, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky
I've been asked several very simple questions:
a) How can I ensure that zram uses (or used) several compression
streams?
b) What is the current number of comp streams (how much memory
does zram *actually* use for compression streams, if there are
more than one stream)?
zram, indeed, does not provide any info and does not answer
these questions. Reading from `max_comp_streams' only lets one
estimate the theoretical memory consumption of compression streams,
assuming that zram will allocate max_comp_streams of them. However,
it's possible that the real number of compression streams will never
reach that max value, for various reasons, e.g. max_comp_streams is
set too high, etc.
The patch adds `avail_streams' column to the /sys/block/zram<id>/mm_stat
device file. For a single compression stream backend it's always 1,
for a multi stream backend - it shows the actual ->avail_strm value.
The number of allocated compression streams answers several
questions:
a) the current `level of concurrency' that the device has
experienced
b) the amount of memory used by compression streams (by multiplying
the `avail_streams' column value, the ->buffer size and the algorithm's
specific scratch buffer size; the last two are easy to find out,
unlike `avail_streams').
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
Documentation/blockdev/zram.txt | 9 +++++++++
drivers/block/zram/zcomp.c | 24 ++++++++++++++++++++++++
drivers/block/zram/zcomp.h | 2 ++
drivers/block/zram/zram_drv.c | 7 +++++--
4 files changed, 40 insertions(+), 2 deletions(-)
diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 5bda503..95bad79 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -227,6 +227,15 @@ line of text and contains the following stats separated by whitespace:
mem_used_max
zero_pages
num_migrated
+ avail_streams
+
+`avail_streams' column shows the current number of available compression
+streams, which is not necessarily equal to the number of max compression
+streams. The number of max compression streams can be set too high and be
+unreachable (depending on the load and the usage pattern, of course).
+`avail_streams' let to find out the real 'level of concurrency' that
+a particular zram device saw and to calculate the real memory consumption
+by allocated compression streams, not the theoretical maximum value.
9) Deactivate:
swapoff /dev/zram0
diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index 3ef42e5..83ee2a4 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -183,6 +183,18 @@ static bool zcomp_strm_multi_set_max_streams(struct zcomp *comp, int num_strm)
return true;
}
+static int zcomp_strm_multi_num_avail_streams(struct zcomp *comp)
+{
+ int avail;
+ struct zcomp_strm_multi *zs = comp->stream;
+
+ spin_lock(&zs->strm_lock);
+ avail = zs->avail_strm;
+ spin_unlock(&zs->strm_lock);
+
+ return avail;
+}
+
static void zcomp_strm_multi_destroy(struct zcomp *comp)
{
struct zcomp_strm_multi *zs = comp->stream;
@@ -206,6 +218,7 @@ static int zcomp_strm_multi_create(struct zcomp *comp, int max_strm)
comp->strm_find = zcomp_strm_multi_find;
comp->strm_release = zcomp_strm_multi_release;
comp->set_max_streams = zcomp_strm_multi_set_max_streams;
+ comp->num_avail_streams = zcomp_strm_multi_num_avail_streams;
zs = kmalloc(sizeof(struct zcomp_strm_multi), GFP_KERNEL);
if (!zs)
return -ENOMEM;
@@ -246,6 +259,11 @@ static bool zcomp_strm_single_set_max_streams(struct zcomp *comp, int num_strm)
return false;
}
+static int zcomp_strm_single_num_avail_streams(struct zcomp *comp)
+{
+ return 1;
+}
+
static void zcomp_strm_single_destroy(struct zcomp *comp)
{
struct zcomp_strm_single *zs = comp->stream;
@@ -261,6 +279,7 @@ static int zcomp_strm_single_create(struct zcomp *comp)
comp->strm_find = zcomp_strm_single_find;
comp->strm_release = zcomp_strm_single_release;
comp->set_max_streams = zcomp_strm_single_set_max_streams;
+ comp->num_avail_streams = zcomp_strm_single_num_avail_streams;
zs = kmalloc(sizeof(struct zcomp_strm_single), GFP_KERNEL);
if (!zs)
return -ENOMEM;
@@ -304,6 +323,11 @@ bool zcomp_set_max_streams(struct zcomp *comp, int num_strm)
return comp->set_max_streams(comp, num_strm);
}
+int zcomp_num_avail_streams(struct zcomp *comp)
+{
+ return comp->num_avail_streams(comp);
+}
+
struct zcomp_strm *zcomp_strm_find(struct zcomp *comp)
{
return comp->strm_find(comp);
diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
index b7d2a4b..a69a3ca 100644
--- a/drivers/block/zram/zcomp.h
+++ b/drivers/block/zram/zcomp.h
@@ -47,6 +47,7 @@ struct zcomp {
struct zcomp_strm *(*strm_find)(struct zcomp *comp);
void (*strm_release)(struct zcomp *comp, struct zcomp_strm *zstrm);
bool (*set_max_streams)(struct zcomp *comp, int num_strm);
+ int (*num_avail_streams)(struct zcomp *comp);
void (*destroy)(struct zcomp *comp);
};
@@ -66,4 +67,5 @@ int zcomp_decompress(struct zcomp *comp, const unsigned char *src,
size_t src_len, unsigned char *dst);
bool zcomp_set_max_streams(struct zcomp *comp, int num_strm);
+int zcomp_num_avail_streams(struct zcomp *comp);
#endif /* _ZCOMP_H_ */
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 370c2f7..46055db 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -429,6 +429,7 @@ static ssize_t mm_stat_show(struct device *dev,
struct zs_pool_stats pool_stats;
u64 orig_size, mem_used = 0;
long max_used;
+ int avail_streams = 0;
ssize_t ret;
memset(&pool_stats, 0x00, sizeof(struct zs_pool_stats));
@@ -437,20 +438,22 @@ static ssize_t mm_stat_show(struct device *dev,
if (init_done(zram)) {
mem_used = zs_get_total_pages(zram->meta->mem_pool);
zs_pool_stats(zram->meta->mem_pool, &pool_stats);
+ avail_streams = zcomp_num_avail_streams(zram->comp);
}
orig_size = atomic64_read(&zram->stats.pages_stored);
max_used = atomic_long_read(&zram->stats.max_used_pages);
ret = scnprintf(buf, PAGE_SIZE,
- "%8llu %8llu %8llu %8lu %8ld %8llu %8lu\n",
+ "%8llu %8llu %8llu %8lu %8ld %8llu %8lu %8d\n",
orig_size << PAGE_SHIFT,
(u64)atomic64_read(&zram->stats.compr_data_size),
mem_used << PAGE_SHIFT,
zram->limit_pages << PAGE_SHIFT,
max_used << PAGE_SHIFT,
(u64)atomic64_read(&zram->stats.zero_pages),
- pool_stats.pages_compacted);
+ pool_stats.pages_compacted,
+ avail_streams);
up_read(&zram->init_lock);
return ret;
--
2.7.0.75.g3ee1e0f
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] zram: export the number of available comp streams
2016-01-26 12:03 [PATCH] zram: export the number of available comp streams Sergey Senozhatsky
@ 2016-01-26 21:13 ` Andrew Morton
2016-01-27 0:34 ` Sergey Senozhatsky
2016-01-29 7:28 ` Minchan Kim
1 sibling, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2016-01-26 21:13 UTC (permalink / raw)
To: Sergey Senozhatsky; +Cc: Minchan Kim, linux-kernel, Sergey Senozhatsky
On Tue, 26 Jan 2016 21:03:59 +0900 Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
> I've been asked several very simple questions:
> a) How can I ensure that zram uses (or used) several compression
> streams?
> b) What is the current number of comp streams (how much memory
> does zram *actually* use for compression streams, if there are
> more than one stream)?
>
> zram, indeed, does not provide any info and does not answer
> these questions. Reading from `max_comp_streams' let to estimate
> only theoretical comp streams memory consumption, which assumes
> that zram will allocate max_comp_streams. However, it's possible
> that the real number of compression streams will never reach that
> max value, due to various reasons, e.g. max_comp_streams is too
> high, etc.
>
> The patch adds `avail_streams' column to the /sys/block/zram<id>/mm_stat
> device file. For a single compression stream backend it's always 1,
> for a multi stream backend - it shows the actual ->avail_strm value.
>
> The number of allocated compression streams answers several
> questions:
> a) the current `level of concurrency' that the device has
> experienced
> b) the amount of memory used by compression streams (by multiplying
> the `avail_streams' column value, ->buffer size and algorithm's
> specific scratch buffer size; the last are easy to find out,
> unlike `avail_streams').
>
> --- a/Documentation/blockdev/zram.txt
> +++ b/Documentation/blockdev/zram.txt
> @@ -227,6 +227,15 @@ line of text and contains the following stats separated by whitespace:
> mem_used_max
> zero_pages
> num_migrated
> + avail_streams
> +
> +`avail_streams' column shows the current number of available compression
> +streams, which is not necessarily equal to the number of max compression
> +streams. The number of max compression streams can be set too high and be
> +unreachable (depending on the load and the usage pattern, of course).
> +`avail_streams' let to find out the real 'level of concurrency' that
> +a particular zram device saw and to calculate the real memory consumption
> +by allocated compression streams, not the theoretical maximum value.
>
"number of max compression streams" doesn't make a lot of sense. It
should be "max number of compression streams", yes"
--- a/Documentation/blockdev/zram.txt~zram-export-the-number-of-available-comp-streams-fix
+++ a/Documentation/blockdev/zram.txt
@@ -230,12 +230,13 @@ line of text and contains the following
avail_streams
`avail_streams' column shows the current number of available compression
-streams, which is not necessarily equal to the number of max compression
-streams. The number of max compression streams can be set too high and be
-unreachable (depending on the load and the usage pattern, of course).
-`avail_streams' let to find out the real 'level of concurrency' that
-a particular zram device saw and to calculate the real memory consumption
-by allocated compression streams, not the theoretical maximum value.
+streams, which is not necessarily equal to the max number of compression
+streams. The max number of compression streams can be set too high and
+can be unreachable (depending on the load and the usage pattern, of
+course). `avail_streams' permits finding out the real 'level of
+concurrency' that a particular zram device saw and to calculate the real
+memory consumption by allocated compression streams, not the theoretical
+maximum value.
9) Deactivate:
swapoff /dev/zram0
> ...
>
> --- a/drivers/block/zram/zcomp.c
> +++ b/drivers/block/zram/zcomp.c
> @@ -183,6 +183,18 @@ static bool zcomp_strm_multi_set_max_streams(struct zcomp *comp, int num_strm)
> return true;
> }
>
> +static int zcomp_strm_multi_num_avail_streams(struct zcomp *comp)
> +{
> + int avail;
> + struct zcomp_strm_multi *zs = comp->stream;
> +
> + spin_lock(&zs->strm_lock);
> + avail = zs->avail_strm;
> + spin_unlock(&zs->strm_lock);
> +
> + return avail;
> +}
The spin_lock() doesn't do anything very useful here - we're simply
reading an `int' and it could be omitted. I guess it's OK for
documentary reasons (and perhaps for the memory barrier).
>
> ...
>
* Re: [PATCH] zram: export the number of available comp streams
2016-01-26 21:13 ` Andrew Morton
@ 2016-01-27 0:34 ` Sergey Senozhatsky
0 siblings, 0 replies; 11+ messages in thread
From: Sergey Senozhatsky @ 2016-01-27 0:34 UTC (permalink / raw)
To: Andrew Morton
Cc: Sergey Senozhatsky, Minchan Kim, linux-kernel, Sergey Senozhatsky
On (01/26/16 13:13), Andrew Morton wrote:
[..]
> > +`avail_streams' column shows the current number of available compression
> > +streams, which is not necessarily equal to the number of max compression
> > +streams. The number of max compression streams can be set too high and be
> > +unreachable (depending on the load and the usage pattern, of course).
> > +`avail_streams' let to find out the real 'level of concurrency' that
> > +a particular zram device saw and to calculate the real memory consumption
> > +by allocated compression streams, not the theoretical maximum value.
> >
>
> "number of max compression streams" doesn't make a lot of sense. It
> should be "max number of compression streams", yes"
Thank you! much better this way.
[..]
> > +static int zcomp_strm_multi_num_avail_streams(struct zcomp *comp)
> > +{
> > + int avail;
> > + struct zcomp_strm_multi *zs = comp->stream;
> > +
> > + spin_lock(&zs->strm_lock);
> > + avail = zs->avail_strm;
> > + spin_unlock(&zs->strm_lock);
> > +
> > + return avail;
> > +}
>
> The spin_lock() doesn't do anything very useful here - we're simply
> reading an `int' and it could be omitted. I guess it's OK for
> documentary reasons (and perhaps for the memory barrier).
yes, agree, that was exactly my thinking. it's fine to have it here
for the barrier, but at the same time it can be scratched and replaced
with a "yes, this is racy. don't send a patch. `avail_streams' is not
so important" comment.
let's hear from Minchan, if he hates it then I'll just send a v2.
-ss
* Re: [PATCH] zram: export the number of available comp streams
2016-01-26 12:03 [PATCH] zram: export the number of available comp streams Sergey Senozhatsky
2016-01-26 21:13 ` Andrew Morton
@ 2016-01-29 7:28 ` Minchan Kim
2016-02-01 1:02 ` Sergey Senozhatsky
1 sibling, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2016-01-29 7:28 UTC (permalink / raw)
To: Sergey Senozhatsky; +Cc: Andrew Morton, linux-kernel, Sergey Senozhatsky
Hello Sergey,
Sorry for the late response. These days I'm really busy with personal
stuff.
Today, I have thought about this patch and have several questions.
Let's start with simple questions.
On Tue, Jan 26, 2016 at 09:03:59PM +0900, Sergey Senozhatsky wrote:
> I've been asked several very simple questions:
> a) How can I ensure that zram uses (or used) several compression
> streams?
Why does he want to ensure several compression streams?
As you know well, zram handles it dynamically.
If zram cannot allocate more streams, it means the system is
heavily fragmented or under memory pressure at that time, so there
is no point in adding more streams, I think.
Could you elaborate on why he wants to know it and what
he expects from that?
> b) What is the current number of comp streams (how much memory
> does zram *actually* use for compression streams, if there are
> more than one stream)?
Hmm, in the kernel there are lots of subsystems whose
exact memory usage we cannot know. Why does the user want
to know the exact memory usage of zram? What is his concern?
Sorry in advance for the slow response.
>
> zram, indeed, does not provide any info and does not answer
> these questions. Reading from `max_comp_streams' let to estimate
> only theoretical comp streams memory consumption, which assumes
> that zram will allocate max_comp_streams. However, it's possible
> that the real number of compression streams will never reach that
> max value, due to various reasons, e.g. max_comp_streams is too
> high, etc.
>
> The patch adds `avail_streams' column to the /sys/block/zram<id>/mm_stat
> device file. For a single compression stream backend it's always 1,
> for a multi stream backend - it shows the actual ->avail_strm value.
>
> The number of allocated compression streams answers several
> questions:
> a) the current `level of concurrency' that the device has
> experienced
> b) the amount of memory used by compression streams (by multiplying
> the `avail_streams' column value, ->buffer size and algorithm's
> specific scratch buffer size; the last are easy to find out,
> unlike `avail_streams').
>
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
* Re: [PATCH] zram: export the number of available comp streams
2016-01-29 7:28 ` Minchan Kim
@ 2016-02-01 1:02 ` Sergey Senozhatsky
2016-03-18 0:32 ` Minchan Kim
0 siblings, 1 reply; 11+ messages in thread
From: Sergey Senozhatsky @ 2016-02-01 1:02 UTC (permalink / raw)
To: Minchan Kim
Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel, Sergey Senozhatsky
Hello Minchan,
On (01/29/16 16:28), Minchan Kim wrote:
> Hello Sergey,
>
> Sorry to late response. Thesedays, I'm really busy with personal
> stuff.
sure, no worries :)
> On Tue, Jan 26, 2016 at 09:03:59PM +0900, Sergey Senozhatsky wrote:
> > I've been asked several very simple questions:
> > a) How can I ensure that zram uses (or used) several compression
> > streams?
>
> Why does he want to ensure several compression streams?
> As you know well, zram handle it dynamically.
>
> If zram cannot allocate more streams, it means the system is
> heavily fragmented or memory pressure at that time so there
> is no worth to add more stream, I think.
>
> Could you elaborate it more why he want to know it and what
> he expect from that?
good questions. I believe mostly it's about fine-tuning on a
per-device basis, which is getting especially tricky when zram
devices are used as a sort of in-memory tmp storage for various
applications (black boxen).
> > b) What is the current number of comp streams (how much memory
> > does zram *actually* use for compression streams, if there are
> > more than one stream)?
>
> Hmm, in the kernel, there are lots of example subsystem
> we cannot know exact memory usage. Why does the user want
> to know exact memory usage of zram? What is his concern?
certainly true. probably some of those sub-systems/drivers have some
sort of LRU, or shrinker callbacks, to release unneeded memory back.
zram only allocates streams, and it's basically hard to tell how many:
up to max_comp_streams, which can be larger than the number of cpus
on the system; because we keep preemption enabled (I didn't realize
that until I played with the patch) around
zcomp_strm_find()/zcomp_strm_release():
zram_bvec_write()
{
	...
	zstrm = zcomp_strm_find(zram->comp);
		>> can preempt
	user_mem = kmap_atomic(page);
		>> now atomic
	zcomp_compress()
	...
	kunmap_atomic()
		>> can preempt
	zcomp_strm_release()
	...
}
so how many streams can I have on my old 4-cpu x86_64 box?
10?
yes.
# cat /sys/block/zram0/mm_stat
630484992 9288707 13103104 0 13103104 16240 0 10
16?
yes.
# cat /sys/block/zram0/mm_stat
1893117952 25296718 31354880 0 31354880 15342 0 16
21?
yes.
# cat /sys/block/zram0/mm_stat
1893167104 28499936 46616576 0 46616576 15330 0 21
do I need 21? maybe not. do I need 18? if 18 streams are needed only 10%
of the time (I can figure that out by repeatedly doing cat zramX/mm_stat),
then I can set max_comp_streams to make 90% of applications happy, e.g.
set max_comp_streams to 10, and save some memory.
10 echo X > /sys/block/zramX/max_comp_streams
20 do tests
30 cat /sys/block/zramX/mm_stat
40 update X (increase/decrease) if needed, otherwise break
50 goto 10
> In advance, sorry for slow response.
no prob, my response wasn't super fast either :)
-ss
* Re: [PATCH] zram: export the number of available comp streams
2016-02-01 1:02 ` Sergey Senozhatsky
@ 2016-03-18 0:32 ` Minchan Kim
2016-03-18 1:09 ` Sergey Senozhatsky
0 siblings, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2016-03-18 0:32 UTC (permalink / raw)
To: Sergey Senozhatsky; +Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel
Hi Sergey,
I forgot this patch until now. Sorry about that.
On Mon, Feb 01, 2016 at 10:02:48AM +0900, Sergey Senozhatsky wrote:
> Hello Minchan,
>
> On (01/29/16 16:28), Minchan Kim wrote:
> > Hello Sergey,
> >
> > Sorry to late response. Thesedays, I'm really busy with personal
> > stuff.
>
> sure, no worries :)
>
> > On Tue, Jan 26, 2016 at 09:03:59PM +0900, Sergey Senozhatsky wrote:
> > > I've been asked several very simple questions:
> > > a) How can I ensure that zram uses (or used) several compression
> > > streams?
> >
> > Why does he want to ensure several compression streams?
> > As you know well, zram handle it dynamically.
> >
> > If zram cannot allocate more streams, it means the system is
> > heavily fragmented or memory pressure at that time so there
> > is no worth to add more stream, I think.
> >
> > Could you elaborate it more why he want to know it and what
> > he expect from that?
>
> good questions. I believe mostly it's about fine-tuning on a
> per-device basis, which is getting especially tricky when zram
> devices are used as a sort of in-memory tmp storage for various
> applications (black boxen).
>
> > > b) What is the current number of comp streams (how much memory
> > > does zram *actually* use for compression streams, if there are
> > > more than one stream)?
> >
> > Hmm, in the kernel, there are lots of example subsystem
> > we cannot know exact memory usage. Why does the user want
> > to know exact memory usage of zram? What is his concern?
>
> certainly true. probably some of those sub-systems/drivers have some
> sort of LRU, or shrinker callbacks, to release unneeded memory back.
> zram only allocates streams, and it basically hard to tell how many:
> up to max_comp_streams, which can be larger than the number of cpus
> on the system; because we keep preemption enabled (I didn't realize
> that until I played with the patch) around
> zcomp_strm_find()/zcomp_strm_release():
>
> zram_bvec_write()
> {
> ...
> zstrm = zcomp_strm_find(zram->comp);
> >> can preempt
> user_mem = kmap_atomic(page);
> >> now atomic
> zcomp_compress()
> ...
> kunmap_atomic()
> >> can preempt
> zcomp_strm_release()
> ...
> }
>
> so how many streams I can have on my old 4-cpus x86_64 box?
>
> 10?
> yes.
>
> # cat /sys/block/zram0/mm_stat
> 630484992 9288707 13103104 0 13103104 16240 0 10
>
> 16?
> yes.
>
> # cat /sys/block/zram0/mm_stat
> 1893117952 25296718 31354880 0 31354880 15342 0 16
>
> 21?
> yes.
>
> # cat /sys/block/zram0/mm_stat
> 1893167104 28499936 46616576 0 46616576 15330 0 21
>
> do I need 21? may be no. do I nede 18? if 18 streams are needed only 10%
> of the time (I can figure it out by doing repetitive cat zramX/mm_stat),
> then I can set max_comp_streams to make 90% of applications happy, e.g.
> max_comp_streams to 10, and save some memory.
>
Okay. Let's go back to the zcomp design days. As you remember, the reason
we separated the single and multi stream code was performance, caused by
the locking scheme (i.e., mutex_lock in the single stream model was really
faster than the sleep/wakeup model in multi stream).
If we could have overcome that problem back then, we would have made the
multi stream code the default.
How about using *per-cpu* streams?
I remember you wanted to create the max number of comp streams statically;
I didn't want that at the time, but I have changed my mind.
Let's allocate comp streams statically but remove the max_comp_streams
knob. Instead, by default, zram allocates the number of streams according
to the number of online CPUs.
So I think we can solve the locking scheme issue of the single stream
model, guarantee the parallelism level, as well as enhance performance
with no locking.
The downside of the approach is the unnecessary memory space reserved
even though zram might be used only 1% of the running system's time.
But we should give that up for the other benefits (i.e., simpler code,
removing the max_comp_streams knob, no need for this stat of yours,
guaranteed parallelism level, guaranteed consumed memory space).
What do you think about it?
* Re: [PATCH] zram: export the number of available comp streams
2016-03-18 0:32 ` Minchan Kim
@ 2016-03-18 1:09 ` Sergey Senozhatsky
2016-03-18 1:25 ` Minchan Kim
0 siblings, 1 reply; 11+ messages in thread
From: Sergey Senozhatsky @ 2016-03-18 1:09 UTC (permalink / raw)
To: Minchan Kim
Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton, linux-kernel
Hello Minchan,
On (03/18/16 09:32), Minchan Kim wrote:
[..]
> > do I need 21? may be no. do I nede 18? if 18 streams are needed only 10%
> > of the time (I can figure it out by doing repetitive cat zramX/mm_stat),
> > then I can set max_comp_streams to make 90% of applications happy, e.g.
> > max_comp_streams to 10, and save some memory.
> >
>
> Okay. Let's go back to zcomp design decade. As you remember, the reason
> we separated single and multi stream code was performance caused by
> locking scheme(ie, mutex_lock in single stream model was really fast
> than sleep/wakeup model in multi stream).
> If we could overcome that problem back then, we should have gone to
> multi stream code default.
yes, IIRC I wanted to limit the number of streams by the number of
online CPUs (or was it 2*num_online_cpus()?), and thus change the
number of streams dynamically (because CPUs can go on and off line);
and create at least num_online_cpus() streams during device
initialization.
the reason for a single-stream zram IIRC was setups in which
zram is used as a swap device. streams require some memory, after
all. and then we discovered that mutex spin-on-owner boosts single
stream zram significantly.
> How about using *per-cpu* streams?
OK. instead of list of idle streams use per-cpu pointer and process
CPU_FOO notifications. that can work, sounds good to me.
> I remember you wanted to create max number of comp streams statically
> although I didn't want at that time but I change my decision.
>
> Let's allocate comp stream statically but remove max_comp_streams
> knob. Instead, by default, zram alloctes number of streams according
> to the number of online CPU.
OK. removing `max_comp_streams' will take another 2 years. That's
a major change, we can leave it for longer, just make it a nop.
> So I think we can solve locking scheme issue in single stream
> , guarantee parallel level as well as enhancing performance with
> no locking.
>
> Downside with the approach is that unnecessary memory space reserve
> although zram might be used 1% of running system time. But we
> should give it up for other benefits
aha, ok.
> (ie, simple code, removing
> max_comp_streams knob, no need to this your stat, guarantee parallel
> level, guarantee consumed memory space).
I'll take a look and prepare some numbers (most likely next week).
> What do you think about it?
so should I ask Andrew to drop this patch?
-ss
* Re: [PATCH] zram: export the number of available comp streams
2016-03-18 1:09 ` Sergey Senozhatsky
@ 2016-03-18 1:25 ` Minchan Kim
2016-03-21 7:51 ` Sergey Senozhatsky
0 siblings, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2016-03-18 1:25 UTC (permalink / raw)
To: Sergey Senozhatsky; +Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel
On Fri, Mar 18, 2016 at 10:09:37AM +0900, Sergey Senozhatsky wrote:
> Hello Minchan,
>
> On (03/18/16 09:32), Minchan Kim wrote:
> [..]
> > > do I need 21? may be no. do I nede 18? if 18 streams are needed only 10%
> > > of the time (I can figure it out by doing repetitive cat zramX/mm_stat),
> > > then I can set max_comp_streams to make 90% of applications happy, e.g.
> > > max_comp_streams to 10, and save some memory.
> > >
> >
> > Okay. Let's go back to zcomp design decade. As you remember, the reason
> > we separated single and multi stream code was performance caused by
> > locking scheme(ie, mutex_lock in single stream model was really fast
> > than sleep/wakeup model in multi stream).
> > If we could overcome that problem back then, we should have gone to
> > multi stream code default.
>
> yes, IIRC I wanted to limit the number of streams by the number of
> online CPUs (or was it 2*num_online_cpus()?), and thus change the
> number of streams dynamically (because CPUs can go on and off line);
> and create at least num_online_cpus() streams during device
> initialization.
>
> the reason for a single-stream zram IIRC were setups in which
> zram is used as a swap device. streams require some memory, after
> all. and then we discovered that mutex spin on owner boosts single
> stream zram significantly.
>
> > How about using *per-cpu* streams?
>
> OK. instead of list of idle streams use per-cpu pointer and process
> CPU_FOO notifications. that can work, sounds good to me.
>
>
> > I remember you wanted to create max number of comp streams statically
> > although I didn't want at that time but I change my decision.
> >
> > Let's allocate comp stream statically but remove max_comp_streams
> > knob. Instead, by default, zram alloctes number of streams according
> > to the number of online CPU.
>
> OK. removing `max_comp_streams' will take another 2 years. That's
> a major change, we can leave it for longer, just make it nop.
>
> > So I think we can solve locking scheme issue in single stream
> > , guarantee parallel level as well as enhancing performance with
> > no locking.
> >
> > Downside with the approach is that unnecessary memory space reserve
> > although zram might be used 1% of running system time. But we
> > should give it up for other benefits
>
> aha, ok.
>
> > (ie, simple code, removing
> > max_comp_streams knob, no need to this your stat, guarantee parallel
> > level, guarantee consumed memory space).
>
> I'll take a look and prepare some numbers (most likely next week).
Sounds great to me!
>
>
> > What do you think about it?
>
> so should I ask Andrew to drop this patch?
Yeb.
Thanks!
>
> -ss
* Re: [PATCH] zram: export the number of available comp streams
2016-03-18 1:25 ` Minchan Kim
@ 2016-03-21 7:51 ` Sergey Senozhatsky
2016-03-22 0:39 ` Minchan Kim
0 siblings, 1 reply; 11+ messages in thread
From: Sergey Senozhatsky @ 2016-03-21 7:51 UTC (permalink / raw)
To: Minchan Kim
Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton, linux-kernel
Hello Minchan,
On (03/18/16 10:25), Minchan Kim wrote:
[..]
> > aha, ok.
> >
> > > (ie, simple code, removing
> > > max_comp_streams knob, no need to this your stat, guarantee parallel
> > > level, guarantee consumed memory space).
> >
> > I'll take a look and prepare some numbers (most likely next week).
>
> Sounds great to me!
so I schematically have this thing now. streams are per-cpu and contain
a scratch buffer and work mem.
zram_bvec_write()
{
	*get_cpu_ptr(comp->stream);
	zcomp_compress();
	zs_malloc()
	put_cpu_ptr(comp->stream);
}
this, however, makes zsmalloc unhappy. the pool has GFP_NOIO | __GFP_HIGHMEM
gfp, and GFP_NOIO is ___GFP_DIRECT_RECLAIM|___GFP_KSWAPD_RECLAIM. this
__GFP_DIRECT_RECLAIM is in conflict with per-cpu streams, because
per-cpu streams require disabled preemption (up until we copy the stream
buffer to a zspage). so what options do we have here... from the top of
my head (w/o a lot of thinking)...
-- remove __GFP_DIRECT_RECLAIM from the pool gfp mask, which is a bit risky...
   IOW, make the pool gfp '___GFP_KSWAPD_RECLAIM | __GFP_HIGHMEM'
-- kmalloc/kfree a temp buffer for every RW op, which is ugly... because
   it sort of voids the whole purpose of per-cpu streams.
-- ... hm.
-ss
* Re: [PATCH] zram: export the number of available comp streams
2016-03-21 7:51 ` Sergey Senozhatsky
@ 2016-03-22 0:39 ` Minchan Kim
2016-03-23 8:01 ` Sergey Senozhatsky
0 siblings, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2016-03-22 0:39 UTC (permalink / raw)
To: Sergey Senozhatsky; +Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel
Hello Sergey,
On Mon, Mar 21, 2016 at 04:51:28PM +0900, Sergey Senozhatsky wrote:
> Hello Minchan,
>
> On (03/18/16 10:25), Minchan Kim wrote:
> [..]
> > > aha, ok.
> > >
> > > > (ie, simple code, removing
> > > > max_comp_streams knob, no need to this your stat, guarantee parallel
> > > > level, guarantee consumed memory space).
> > >
> > > I'll take a look and prepare some numbers (most likely next week).
> >
> > Sounds great to me!
>
> so I have schematically this thing now. streams are per-cpu and contain
> scratch buffer and work mem.
>
> zram_bvec_write()
> {
> *get_cpu_ptr(comp->stream);
> zcomp_compress();
> zs_malloc()
> put_cpu_ptr(comp->stream);
> }
>
> this, however, makes zsmalloc unhappy. the pool has GFP_NOIO | __GFP_HIGHMEM
> gfp, and GFP_NOIO is ___GFP_DIRECT_RECLAIM|___GFP_KSWAPD_RECLAIM. this
> __GFP_DIRECT_RECLAIM is in conflict with per-cpu streams, because
> per-cpu streams require disabled preemption (up until we copy the stream
> buffer to a zspage). so what options do we have here... off the top of
> my head (w/o a lot of thinking)...
Indeed.
> -- remove __GFP_DIRECT_RECLAIM from the pool gfp mask, which is a bit risky...
> IOW, make pool gfp '___GFP_KSWAPD_RECLAIM | __GFP_HIGHMEM'
Yeb. It would be okay for zram-swap but not zram-blk.
> -- kmalloc/kfree a temp buffer for every RW op, which is ugly... because
> it sort of defeats the whole purpose of per-cpu streams.
How about this?
zram_bvec_write()
{
retry:
*get_cpu_ptr(comp->stream);
zcomp_compress();
handle = zs_malloc((gfp & ~__GFP_DIRECT_RECLAIM) | __GFP_NOWARN);
if (!handle) {
put_cpu_ptr(comp->stream);
handle = zs_malloc(gfp);
goto retry;
}
put_cpu_ptr(comp->stream);
}
If the per-cpu model is really a performance win, it is worth trying.
* Re: [PATCH] zram: export the number of available comp streams
2016-03-22 0:39 ` Minchan Kim
@ 2016-03-23 8:01 ` Sergey Senozhatsky
0 siblings, 0 replies; 11+ messages in thread
From: Sergey Senozhatsky @ 2016-03-23 8:01 UTC (permalink / raw)
To: Minchan Kim
Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton, linux-kernel
( was "[PATCH] zram: export the number of available comp streams"
forked from http://marc.info/?l=linux-kernel&m=145860707516861 )
Hello Minchan,
forked into a separate thread.
On (03/22/16 09:39), Minchan Kim wrote:
> > zram_bvec_write()
> > {
> > *get_cpu_ptr(comp->stream);
> > zcomp_compress();
> > zs_malloc()
> > put_cpu_ptr(comp->stream);
> > }
> >
> > this, however, makes zsmalloc unhappy. the pool has GFP_NOIO | __GFP_HIGHMEM
> > gfp, and GFP_NOIO is ___GFP_DIRECT_RECLAIM|___GFP_KSWAPD_RECLAIM. this
> > __GFP_DIRECT_RECLAIM is in conflict with per-cpu streams, because
> > per-cpu streams require disabled preemption (up until we copy the stream
> > buffer to a zspage). so what options do we have here... off the top of
> > my head (w/o a lot of thinking)...
>
> Indeed.
...
> How about this?
>
> zram_bvec_write()
> {
> retry:
> *get_cpu_ptr(comp->stream);
> zcomp_compress();
> > handle = zs_malloc((gfp & ~__GFP_DIRECT_RECLAIM) | __GFP_NOWARN);
> if (!handle) {
> put_cpu_ptr(comp->stream);
> handle = zs_malloc(gfp);
> goto retry;
> }
> put_cpu_ptr(comp->stream);
> }
interesting. the retry jump should go higher: we have "user_mem = kmap_atomic(page)",
which we unmap right after compression, because a) we don't need the
uncompressed memory anymore and b) zs_malloc() can sleep, so we can't keep an
atomic mapping around. the nasty thing here is is_partial_io(): we need to re-do
if (is_partial_io(bvec))
	memcpy(uncmem + offset, user_mem + bvec->bv_offset,
			bvec->bv_len);
once again in the worst case.
so zs_malloc((gfp & ~__GFP_DIRECT_RECLAIM) | __GFP_NOWARN) so far can cause a
double memcpy() and a double compression. just to outline this.
the test: I executed a number of iozone tests, re-creating the zram
device on each iteration (3GB, LZO, EXT4; the box has 4 x86_64 CPUs).
DEVICE_SZ=3G
$FREE_SPACE is 10% of $DEVICE_SZ
time ./iozone -t $i -R -r $((8*$i))K -s $((($DEVICE_SZ/$i - $FREE_SPACE)/(1024*1024)))M -I +Z
columns:
TEST MAX_STREAMS 4 MAX_STREAMS 8 PER_CPU STREAMS
====================================================================
Test #1 iozone -t 1 -R -r 8K -s 2764M -I +Z
Initial write 853492.31* 835868.50 839789.56
Rewrite 1642073.88 1657255.75 1693011.50*
Read 3384044.00* 3218727.25 3269109.50
Re-read 3389794.50* 3243187.00 3267422.25
Reverse Read 3209805.75* 3082040.00 3107957.25
Stride read 3100144.50* 2972280.25 2923155.25
Random read 2992249.75* 2874605.00 2854824.25
Mixed workload 2992274.75* 2878212.25 2883840.00
Random write 1471800.00 1452346.50 1515678.75*
Pwrite 802083.00 801627.31 820251.69*
Pread 3443495.00* 3308659.25 3302089.00
Fwrite 1880446.88 1838607.50 1909490.00*
Fread 3479614.75 3091634.75 6442964.50*
= real 1m4.170s 1m4.513s 1m4.123s
= user 0m0.559s 0m0.518s 0m0.511s
= sys 0m18.766s 0m19.264s 0m18.641s
Test #2 iozone -t 2 -R -r 16K -s 1228M -I +Z
Initial write 2102532.12 2051809.19 2419072.50*
Rewrite 2217024.25 2250930.00 3681559.00*
Read 7716933.25 7898759.00 8345507.75*
Re-read 7748487.75 7765282.25 8342367.50*
Reverse Read 7415254.25 7552637.25 7822691.75*
Stride read 7041909.50 7091049.25 7401273.00*
Random read 6205044.25 6738888.50 7232104.25*
Mixed workload 4582990.00 5271651.50 5361002.88*
Random write 2591893.62 2513729.88 3660774.38*
Pwrite 1873876.75 1909758.69 2087238.81*
Pread 4669850.00 4651121.56 4919588.44*
Fwrite 1937947.25 1940628.06 2034251.25*
Fread 9930319.00 9970078.00* 9831422.50
= real 0m53.844s 0m53.607s 0m52.528s
= user 0m0.273s 0m0.289s 0m0.280s
= sys 0m16.595s 0m16.478s 0m14.072s
Test #3 iozone -t 3 -R -r 24K -s 716M -I +Z
Initial write 3036567.50 2998918.25 3683853.00*
Rewrite 3402447.88 3415685.88 5054705.38*
Read 11767413.00* 11133789.50 11246497.25
Re-read 11797680.50* 11092592.00 11277382.00
Reverse Read 10828320.00* 10157665.50 10749055.00
Stride read 10532039.50* 9943521.75 10464700.25
Random read 10380365.75* 9807859.25 10234127.00
Mixed workload 8772132.50* 8415083.50 8457108.50
Random write 3364875.00 3310042.00 5059136.38*
Pwrite 2677290.25 2651309.50 3198166.25*
Pread 5221799.56* 4963050.69 4987293.78
Fwrite 2026887.56 2047679.00 2124199.62*
Fread 11310381.25 11413531.50 11444208.75*
= real 0m50.209s 0m50.782s 0m49.750s
= user 0m0.195s 0m0.205s 0m0.215s
= sys 0m14.873s 0m15.159s 0m12.911s
Test #4 iozone -t 4 -R -r 32K -s 460M -I +Z
Initial write 3841474.94 3859279.81 5309988.88*
Rewrite 3905526.25 3917309.62 6814800.62*
Read 16233054.50 14843560.25 16352283.75*
Re-read 16335506.50 15529152.25 16352570.00*
Reverse Read 15316394.50* 14225482.50 15004897.50
Stride read 14799380.25* 14064034.25 14355184.25
Random read 14683771.00 14206928.50 14814913.00*
Mixed workload 9058851.50 9180650.75 10815917.50*
Random write 3990585.94 4004757.00 6722088.50*
Pwrite 3318836.12 3468977.69 4244747.69*
Pread 5894538.16* 5588046.38 5847345.62
Fwrite 2227353.75 2186688.62 2386974.88*
Fread 12046094.00 12240004.75* 12073956.75
= real 0m48.561s 0m48.839s 0m48.142s
= user 0m0.155s 0m0.170s 0m0.133s
= sys 0m13.650s 0m13.684s 0m10.790s
Test #5 iozone -t 5 -R -r 40K -s 307M -I +Z
Initial write 4034878.94 4026610.69 5775746.12*
Rewrite 3898600.44 3901114.16 6923764.19*
Read 14947360.88 16698824.25* 10155333.62
Re-read 15844580.75* 15344057.00 9869874.38
Reverse Read 7459156.95 9023317.86* 7648295.03
Stride read 10823891.81 9615553.81 11231183.72*
Random read 10391702.56* 9740935.75 10048038.28
Mixed workload 8261830.94 10175925.00* 7535763.75
Random write 3951423.31 3960984.62 6671441.38*
Pwrite 4119023.12 4097204.56 5975659.12*
Pread 6072076.73* 4338668.50 6020808.34
Fwrite 2417235.47 2337875.88 2665450.62*
Fread 13393630.25 13648332.00* 13395391.00
= real 0m47.756s 0m47.939s 0m47.483s
= user 0m0.128s 0m0.128s 0m0.119s
= sys 0m10.361s 0m10.392s 0m8.717s
Test #6 iozone -t 6 -R -r 48K -s 204M -I +Z
Initial write 4134932.97 4137171.88 5983193.31*
Rewrite 3928131.31 3950764.00 7124248.00*
Read 10965005.75* 10152236.50 9856572.88
Re-read 9386946.00 10776231.38 14303174.12*
Reverse Read 6035244.89 7456152.38* 5999446.38
Stride read 8041000.75 7995307.75 10182936.75*
Random read 8565099.09 10487707.58* 8694877.25
Mixed workload 5301593.06 7332589.09* 6802251.06
Random write 4046482.56 3986854.94 6723824.56*
Pwrite 4188226.41 4214513.34 6245278.44*
Pread 3452596.86 3708694.69* 3486420.41
Fwrite 2829500.22 3030742.72 3033792.28*
Fread 13331387.75 13490416.50 14940410.25*
= real 0m47.150s 0m47.050s 0m47.044s
= user 0m0.106s 0m0.100s 0m0.094s
= sys 0m9.238s 0m8.804s 0m6.930s
Test #7 iozone -t 7 -R -r 56K -s 131M -I +Z
Initial write 4169480.84 4116331.03 5946801.38*
Rewrite 3993155.97 3986195.00 6928142.44*
Read 18901600.25* 10088918.69 6699592.78
Re-read 8738544.69 14881309.62* 13960026.06
Reverse Read 5008919.08 7923949.95* 5495212.41
Stride read 7029436.75 8747574.91* 6477087.25
Random read 6994738.56* 5448687.81 6585235.53
Mixed workload 5178632.44 5258914.92 5587421.81*
Random write 4008977.78 3928116.88 6816453.12*
Pwrite 4342852.09 4154319.09 6124520.06*
Pread 3880318.99 2978587.56 4493903.14*
Fwrite 5557990.03 2923556.59 6126649.94*
Fread 14451722.00 15281179.62* 14675436.50
= real 0m46.321s 0m46.458s 0m45.791s
= user 0m0.093s 0m0.089s 0m0.095s
= sys 0m6.961s 0m6.600s 0m5.499s
Test #8 iozone -t 8 -R -r 64K -s 76M -I +Z
Initial write 4354783.88 4392731.31 6337397.50*
Rewrite 4070162.69 3974051.50 7587279.81*
Read 10095324.56 17945227.88* 8359665.56
Re-read 12316555.88 20468303.75* 7949999.34
Reverse Read 4924659.84 8542573.33* 6388858.72
Stride read 10895715.69 14828968.38* 6107484.81
Random read 6838537.34 14352104.25* 5389174.97
Mixed workload 5805646.75 8391745.53* 6052748.25
Random write 4148973.38 3890847.38 7247214.19*
Pwrite 4309372.41 4423800.34 6863604.69*
Pread 4875766.02* 4042375.33 3692948.91
Fwrite 6102404.31 6021884.41 6634112.09*
Fread 15485971.12* 14900780.62 13981842.50
= real 0m45.618s 0m45.753s 0m45.619s
= user 0m0.071s 0m0.080s 0m0.060s
= sys 0m4.702s 0m4.430s 0m3.555s
Test #9 iozone -t 9 -R -r 72K -s 34M -I +Z
Initial write 4202354.67 4208936.34 6300798.88*
Rewrite 4046855.38 4294137.50 7623323.69*
Read 10926571.88 13304801.81* 10895587.19
Re-read 17725984.94* 7964431.25 12394078.50
Reverse Read 5843121.72 5851846.66* 4075657.20
Stride read 9688998.59 10306234.70* 5566376.62
Random read 7656689.97 8660602.06* 5437182.36
Mixed workload 6229215.62 11205238.73* 5575719.75
Random write 4094822.22 4517401.86 6601624.94*
Pwrite 4274497.50 4263936.64 6844453.11*
Pread 6525075.62* 6043725.62 5745003.28
Fwrite 5958798.56 8430354.78* 7636085.00
Fread 18636725.12* 17268959.12 16618803.62
= real 0m44.945s 0m44.816s 0m45.194s
= user 0m0.062s 0m0.060s 0m0.060s
= sys 0m2.187s 0m2.223s 0m1.888s
Test #10 iozone -t 10 -R -r 80K -s 0M -I +Z
Initial write 3213973.56 2731512.62 4416466.25*
Rewrite 3066956.44* 2693819.50 332671.94
Read 7769523.25* 2681473.75 462840.44
Re-read 5244861.75 5473037.00* 382183.03
Reverse Read 7479397.25* 4869597.75 374714.06
Stride read 5403282.50* 5385083.75 382473.44
Random read 5131997.25 5176799.75* 380593.56
Mixed workload 3998043.25 4219049.00* 1645850.45
Random write 3452832.88 3290861.69 3588531.75*
Pwrite 3757435.81 2711756.47 4561807.88*
Pread 2743595.25* 2635835.00 412947.98
Fwrite 16076549.00 16741977.25* 14797209.38
Fread 23581812.62* 21664184.25 5064296.97
= real 0m44.490s 0m44.444s 0m44.609s
= user 0m0.054s 0m0.049s 0m0.055s
= sys 0m0.037s 0m0.046s 0m0.148s
so when the number of active tasks becomes larger than the number
of online CPUs, iozone reports data that is a bit hard to understand. I
can assume that since we now keep preemption disabled longer
in the write path, a concurrent operation (READ or WRITE) cannot preempt
current anymore... slightly suspicious.
the other hard-to-understand thing is why the READ-only tests have
such huge jitter. READ-only tests don't depend on streams, they
don't even use them; we supply compressed data directly to the
decompression api.
may be better to retire iozone and never use it again.
"118 insertions(+), 238 deletions(-)": the patches remove a big
pile of code.
-ss
end of thread, other threads:[~2016-03-23 7:59 UTC | newest]
Thread overview: 11+ messages
2016-01-26 12:03 [PATCH] zram: export the number of available comp streams Sergey Senozhatsky
2016-01-26 21:13 ` Andrew Morton
2016-01-27 0:34 ` Sergey Senozhatsky
2016-01-29 7:28 ` Minchan Kim
2016-02-01 1:02 ` Sergey Senozhatsky
2016-03-18 0:32 ` Minchan Kim
2016-03-18 1:09 ` Sergey Senozhatsky
2016-03-18 1:25 ` Minchan Kim
2016-03-21 7:51 ` Sergey Senozhatsky
2016-03-22 0:39 ` Minchan Kim
2016-03-23 8:01 ` Sergey Senozhatsky