zram: add zstd to the supported algorithms list

Message ID 20170824014936.4738-1-sergey.senozhatsky@gmail.com
State New, archived

Commit Message

Sergey Senozhatsky Aug. 24, 2017, 1:49 a.m. UTC
Add ZSTD to the list of supported compression algorithms.

Official benchmarks [1]:

Compressor name         Ratio   Compression     Decompress.
zstd 1.1.3 -1           2.877   430 MB/s        1110 MB/s
zlib 1.2.8 -1           2.743   110 MB/s        400 MB/s
brotli 0.5.2 -0         2.708   400 MB/s        430 MB/s
quicklz 1.5.0 -1        2.238   550 MB/s        710 MB/s
lzo1x 2.09 -1           2.108   650 MB/s        830 MB/s
lz4 1.7.5               2.101   720 MB/s        3600 MB/s
snappy 1.1.3            2.091   500 MB/s        1650 MB/s
lzf 3.6 -1              2.077   400 MB/s        860 MB/s

[1] https://github.com/facebook/zstd

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 drivers/block/zram/zcomp.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Minchan Kim Aug. 24, 2017, 4:30 a.m. UTC | #1
Hello Sergey,

On Thu, Aug 24, 2017 at 10:49:36AM +0900, Sergey Senozhatsky wrote:
> Add ZSTD to the list of supported compression algorithms.
> 
> Official benchmarks [1]:

First of all, thanks for the work!

I want to ask one thing.

Could you add some benchmark results (e.g., compression ratio and
speed) compared to inflate, lzo, and lz4?

I want to see how well it does for small data, given that ours is 4K.
Thanks!

> 
> Compressor name         Ratio   Compression     Decompress.
> zstd 1.1.3 -1           2.877   430 MB/s        1110 MB/s
> zlib 1.2.8 -1           2.743   110 MB/s        400 MB/s
> brotli 0.5.2 -0         2.708   400 MB/s        430 MB/s
> quicklz 1.5.0 -1        2.238   550 MB/s        710 MB/s
> lzo1x 2.09 -1           2.108   650 MB/s        830 MB/s
> lz4 1.7.5               2.101   720 MB/s        3600 MB/s
> snappy 1.1.3            2.091   500 MB/s        1650 MB/s
> lzf 3.6 -1              2.077   400 MB/s        860 MB/s
> 
> [1] https://github.com/facebook/zstd
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> ---
>  drivers/block/zram/zcomp.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
> index 5b8992beffec..cc66daec7bbc 100644
> --- a/drivers/block/zram/zcomp.c
> +++ b/drivers/block/zram/zcomp.c
> @@ -31,6 +31,9 @@ static const char * const backends[] = {
>  #endif
>  #if IS_ENABLED(CONFIG_CRYPTO_842)
>  	"842",
> +#endif
> +#if IS_ENABLED(CONFIG_CRYPTO_ZSTD)
> +	"zstd",
>  #endif
>  	NULL
>  };
> -- 
> 2.14.1
>
Sergey Senozhatsky Aug. 24, 2017, 2:04 p.m. UTC | #2
Hi,

On (08/24/17 13:30), Minchan Kim wrote:
> Hello Sergey,
> 
> On Thu, Aug 24, 2017 at 10:49:36AM +0900, Sergey Senozhatsky wrote:
> > Add ZSTD to the list of supported compression algorithms.
> > 
> > Official benchmarks [1]:
> 
> First of all, thanks for the work!
> 
> I want to ask one thing.
> 
> Could you add some benchmark(e.g.,) result(comp ratio and speed)
> compared to (inflate, lzo, lz4)?
> 
> I want to see how much it's good for small data that ours is 4K.


so on my synthetic fio test (with a static buffer):


                     LZO         DEFLATE         ZSTD

#jobs1                         	                	                
WRITE:              (2180MB/s)	 (77.2MB/s)    	 (1429MB/s)    
WRITE:              (1617MB/s)	 (77.7MB/s)    	 (1202MB/s)    
READ:                (426MB/s)	 (595MB/s)     	 (1181MB/s)    
READ:                (422MB/s)	 (572MB/s)     	 (1020MB/s)    
READ:                (318MB/s)	 (67.8MB/s)    	 (563MB/s)     
WRITE:               (318MB/s)	 (67.9MB/s)    	 (564MB/s)     
READ:                (336MB/s)	 (68.3MB/s)    	 (583MB/s)     
WRITE:               (335MB/s)	 (68.2MB/s)    	 (582MB/s)     
#jobs2                         	                	                
WRITE:              (3441MB/s)	 (152MB/s)     	 (2141MB/s)    
WRITE:              (2507MB/s)	 (147MB/s)     	 (1888MB/s)    
READ:                (801MB/s)	 (1146MB/s)    	 (1890MB/s)    
READ:                (767MB/s)	 (1096MB/s)    	 (2073MB/s)    
READ:                (621MB/s)	 (126MB/s)     	 (1009MB/s)    
WRITE:               (621MB/s)	 (126MB/s)     	 (1009MB/s)    
READ:                (656MB/s)	 (125MB/s)     	 (1075MB/s)    
WRITE:               (657MB/s)	 (126MB/s)     	 (1077MB/s)    
#jobs3                         	                	                
WRITE:              (4772MB/s)	 (225MB/s)     	 (3394MB/s)    
WRITE:              (3905MB/s)	 (211MB/s)     	 (2939MB/s)    
READ:               (1216MB/s)	 (1608MB/s)    	 (3218MB/s)    
READ:               (1159MB/s)	 (1431MB/s)    	 (2981MB/s)    
READ:                (906MB/s)	 (156MB/s)     	 (1457MB/s)    
WRITE:               (907MB/s)	 (156MB/s)     	 (1458MB/s)    
READ:                (953MB/s)	 (158MB/s)     	 (1595MB/s)    
WRITE:               (952MB/s)	 (157MB/s)     	 (1593MB/s)    
#jobs4                         	                	                
WRITE:              (6036MB/s)	 (265MB/s)     	 (4469MB/s)    
WRITE:              (5059MB/s)	 (263MB/s)     	 (3951MB/s)    
READ:               (1618MB/s)	 (2066MB/s)    	 (4276MB/s)    
READ:               (1573MB/s)	 (1942MB/s)    	 (3830MB/s)    
READ:               (1202MB/s)	 (227MB/s)     	 (1971MB/s)    
WRITE:              (1200MB/s)	 (227MB/s)     	 (1968MB/s)    
READ:               (1265MB/s)	 (226MB/s)     	 (2116MB/s)    
WRITE:              (1264MB/s)	 (226MB/s)     	 (2114MB/s)    
#jobs5                         	                	                
WRITE:              (5339MB/s)	 (233MB/s)     	 (3781MB/s)    
WRITE:              (4298MB/s)	 (234MB/s)     	 (3276MB/s)    
READ:               (1626MB/s)	 (2048MB/s)    	 (4081MB/s)    
READ:               (1567MB/s)	 (1929MB/s)    	 (3758MB/s)    
READ:               (1174MB/s)	 (205MB/s)     	 (1747MB/s)    
WRITE:              (1173MB/s)	 (204MB/s)     	 (1746MB/s)    
READ:               (1214MB/s)	 (208MB/s)     	 (1890MB/s)    
WRITE:              (1215MB/s)	 (208MB/s)     	 (1892MB/s)    
#jobs6                         	                	                
WRITE:              (5666MB/s)	 (270MB/s)     	 (4338MB/s)    
WRITE:              (4828MB/s)	 (267MB/s)     	 (3772MB/s)    
READ:               (1803MB/s)	 (2058MB/s)    	 (4946MB/s)    
READ:               (1805MB/s)	 (2156MB/s)    	 (4711MB/s)    
READ:               (1334MB/s)	 (235MB/s)     	 (2135MB/s)    
WRITE:              (1335MB/s)	 (235MB/s)     	 (2137MB/s)    
READ:               (1364MB/s)	 (236MB/s)     	 (2268MB/s)    
WRITE:              (1365MB/s)	 (237MB/s)     	 (2270MB/s)    
#jobs7                         	                	                
WRITE:              (5474MB/s)	 (270MB/s)     	 (4300MB/s)    
WRITE:              (4666MB/s)	 (266MB/s)     	 (3817MB/s)    
READ:               (2022MB/s)	 (2319MB/s)    	 (5472MB/s)    
READ:               (1924MB/s)	 (2260MB/s)    	 (5031MB/s)    
READ:               (1369MB/s)	 (242MB/s)     	 (2153MB/s)    
WRITE:              (1370MB/s)	 (242MB/s)     	 (2155MB/s)    
READ:               (1499MB/s)	 (246MB/s)     	 (2310MB/s)    
WRITE:              (1497MB/s)	 (246MB/s)     	 (2307MB/s)    
#jobs8                         	                	                
WRITE:              (5558MB/s)	 (273MB/s)     	 (4439MB/s)    
WRITE:              (4763MB/s)	 (271MB/s)     	 (3918MB/s)    
READ:               (2201MB/s)	 (2599MB/s)    	 (6062MB/s)    
READ:               (2105MB/s)	 (2463MB/s)    	 (5413MB/s)    
READ:               (1490MB/s)	 (252MB/s)     	 (2238MB/s)    
WRITE:              (1488MB/s)	 (252MB/s)     	 (2236MB/s)    
READ:               (1566MB/s)	 (254MB/s)     	 (2434MB/s)    
WRITE:              (1568MB/s)	 (254MB/s)     	 (2437MB/s)    
#jobs9                         	                	                
WRITE:              (5120MB/s)	 (264MB/s)     	 (4035MB/s)    
WRITE:              (4531MB/s)	 (267MB/s)     	 (3740MB/s)    
READ:               (1940MB/s)	 (2258MB/s)    	 (4986MB/s)    
READ:               (2024MB/s)	 (2387MB/s)    	 (4871MB/s)    
READ:               (1343MB/s)	 (246MB/s)     	 (2038MB/s)    
WRITE:              (1342MB/s)	 (246MB/s)     	 (2037MB/s)    
READ:               (1553MB/s)	 (238MB/s)     	 (2243MB/s)    
WRITE:              (1552MB/s)	 (238MB/s)     	 (2242MB/s)    
#jobs10                        	                	                
WRITE:              (5345MB/s)	 (271MB/s)     	 (3988MB/s)    
WRITE:              (4750MB/s)	 (254MB/s)     	 (3668MB/s)    
READ:               (1876MB/s)	 (2363MB/s)    	 (5150MB/s)    
READ:               (1990MB/s)	 (2256MB/s)    	 (5080MB/s)    
READ:               (1355MB/s)	 (250MB/s)     	 (2019MB/s)    
WRITE:              (1356MB/s)	 (251MB/s)     	 (2020MB/s)    
READ:               (1490MB/s)	 (252MB/s)     	 (2202MB/s)    
WRITE:              (1488MB/s)	 (252MB/s)     	 (2199MB/s)    

jobs1                              perfstat         	                          	                          
instructions                 52,065,555,710 (    0.79)	  855,731,114,587 (    2.64)	   54,280,709,944 (    1.40)
branches                     14,020,427,116 ( 725.847)	  101,733,449,582 (1074.521)	   11,170,591,067 ( 992.869)
branch-misses                    22,626,174 (   0.16%)	      274,197,885 (   0.27%)	       25,915,805 (   0.23%)
jobs2                              perfstat         	                          	                          
instructions                103,633,110,402 (    0.75)	1,710,822,100,914 (    2.59)	  107,879,874,104 (    1.28)
branches                     27,931,237,282 ( 679.203)	  203,298,267,479 (1037.326)	   22,185,350,842 ( 884.427)
branch-misses                    46,103,811 (   0.17%)	      533,747,204 (   0.26%)	       49,682,483 (   0.22%)
jobs3                              perfstat         	                          	                          
instructions                154,857,283,657 (    0.76)	2,565,748,974,197 (    2.57)	  161,515,435,813 (    1.31)
branches                     41,759,490,355 ( 670.529)	  304,905,605,277 ( 978.765)	   33,215,805,907 ( 888.003)
branch-misses                    74,263,293 (   0.18%)	      759,746,240 (   0.25%)	       76,841,196 (   0.23%)
jobs4                              perfstat         	                          	                          
instructions                206,215,849,076 (    0.75)	3,420,169,460,897 (    2.60)	  215,003,061,664 (    1.31)
branches                     55,632,141,739 ( 666.501)	  406,394,977,433 ( 927.241)	   44,214,322,251 ( 883.532)
branch-misses                   102,287,788 (   0.18%)	    1,098,617,314 (   0.27%)	      103,891,040 (   0.23%)
jobs5                              perfstat         	                          	                          
instructions                258,711,315,588 (    0.67)	4,275,657,533,244 (    2.23)	  269,332,235,685 (    1.08)
branches                     69,802,821,166 ( 588.823)	  507,996,211,252 ( 797.036)	   55,450,846,129 ( 735.095)
branch-misses                   129,217,214 (   0.19%)	    1,243,284,991 (   0.24%)	      173,512,278 (   0.31%)
jobs6                              perfstat         	                          	                          
instructions                312,796,166,008 (    0.61)	5,133,896,344,660 (    2.02)	  323,658,769,588 (    1.04)
branches                     84,372,488,583 ( 520.541)	  610,310,494,402 ( 697.642)	   66,683,292,992 ( 693.939)
branch-misses                   159,438,978 (   0.19%)	    1,396,368,563 (   0.23%)	      174,406,934 (   0.26%)
jobs7                              perfstat         	                          	                          
instructions                363,211,372,930 (    0.56)	5,988,205,600,879 (    1.75)	  377,824,674,156 (    0.93)
branches                     98,057,013,765 ( 463.117)	  711,841,255,974 ( 598.762)	   77,879,009,954 ( 600.443)
branch-misses                   199,513,153 (   0.20%)	    1,507,651,077 (   0.21%)	      248,203,369 (   0.32%)
jobs8                              perfstat         	                          	                          
instructions                413,960,354,615 (    0.52)	6,842,918,558,378 (    1.45)	  431,938,486,581 (    0.83)
branches                    111,812,574,884 ( 414.224)	  813,299,084,518 ( 491.173)	   89,062,699,827 ( 517.795)
branch-misses                   233,584,845 (   0.21%)	    1,531,593,921 (   0.19%)	      286,818,489 (   0.32%)
jobs9                              perfstat         	                          	                          
instructions                465,976,220,300 (    0.53)	7,698,467,237,372 (    1.47)	  486,352,600,321 (    0.84)
branches                    125,931,456,162 ( 424.063)	  915,207,005,715 ( 498.192)	  100,370,404,090 ( 517.439)
branch-misses                   256,992,445 (   0.20%)	    1,782,809,816 (   0.19%)	      345,239,380 (   0.34%)
jobs10                             perfstat         	                          	                          
instructions                517,406,372,715 (    0.53)	8,553,527,312,900 (    1.48)	  540,732,653,094 (    0.84)
branches                    139,839,780,676 ( 427.732)	1,016,737,699,389 ( 503.172)	  111,696,557,638 ( 516.750)
branch-misses                   259,595,561 (   0.19%)	    1,952,570,279 (   0.19%)	      357,818,661 (   0.32%)


seconds elapsed        20.630411534	96.084546565	12.743373571
seconds elapsed        22.292627625	100.984155001	14.407413560
seconds elapsed        22.396016966	110.344880848	14.032201392
seconds elapsed        22.517330949	113.351459170	14.243074935
seconds elapsed        28.548305104	156.515193765	19.159286861
seconds elapsed        30.453538116	164.559937678	19.362492717
seconds elapsed        33.467108086	188.486827481	21.492612173
seconds elapsed        35.617727591	209.602677783	23.256422492
seconds elapsed        42.584239509	243.959902566	28.458540338
seconds elapsed        47.683632526	269.635248851	31.542404137


Overall, ZSTD has slower WRITEs but much faster READs (perhaps the static
compression buffer helps ZSTD a lot), which results in faster test runs.

now, memory consumption (zram mm_stat file)

zram-LZO-mm_stat
mm_stat (jobs1): 2147483648 23068672 33558528        0 33558528        0        0
mm_stat (jobs2): 2147483648 23068672 33558528        0 33558528        0        0
mm_stat (jobs3): 2147483648 23068672 33558528        0 33562624        0        0
mm_stat (jobs4): 2147483648 23068672 33558528        0 33558528        0        0
mm_stat (jobs5): 2147483648 23068672 33558528        0 33558528        0        0
mm_stat (jobs6): 2147483648 23068672 33558528        0 33562624        0        0
mm_stat (jobs7): 2147483648 23068672 33558528        0 33566720        0        0
mm_stat (jobs8): 2147483648 23068672 33558528        0 33558528        0        0
mm_stat (jobs9): 2147483648 23068672 33558528        0 33558528        0        0
mm_stat (jobs10): 2147483648 23068672 33558528        0 33562624        0        0

zram-DEFLATE-mm_stat
mm_stat (jobs1): 2147483648 16252928 25178112        0 25178112        0        0
mm_stat (jobs2): 2147483648 16252928 25178112        0 25178112        0        0
mm_stat (jobs3): 2147483648 16252928 25178112        0 25178112        0        0
mm_stat (jobs4): 2147483648 16252928 25178112        0 25178112        0        0
mm_stat (jobs5): 2147483648 16252928 25178112        0 25178112        0        0
mm_stat (jobs6): 2147483648 16252928 25178112        0 25178112        0        0
mm_stat (jobs7): 2147483648 16252928 25178112        0 25190400        0        0
mm_stat (jobs8): 2147483648 16252928 25178112        0 25190400        0        0
mm_stat (jobs9): 2147483648 16252928 25178112        0 25178112        0        0
mm_stat (jobs10): 2147483648 16252928 25178112        0 25178112        0        0

zram-ZSTD-mm_stat
mm_stat (jobs1): 2147483648 11010048 16781312        0 16781312        0        0
mm_stat (jobs2): 2147483648 11010048 16781312        0 16781312        0        0
mm_stat (jobs3): 2147483648 11010048 16781312        0 16785408        0        0
mm_stat (jobs4): 2147483648 11010048 16781312        0 16781312        0        0
mm_stat (jobs5): 2147483648 11010048 16781312        0 16781312        0        0
mm_stat (jobs6): 2147483648 11010048 16781312        0 16781312        0        0
mm_stat (jobs7): 2147483648 11010048 16781312        0 16781312        0        0
mm_stat (jobs8): 2147483648 11010048 16781312        0 16781312        0        0
mm_stat (jobs9): 2147483648 11010048 16781312        0 16785408        0        0
mm_stat (jobs10): 2147483648 11010048 16781312        0 16781312        0        0


	-ss
Joonsoo Kim Aug. 25, 2017, 12:49 a.m. UTC | #3
On Thu, Aug 24, 2017 at 09:33:54PM +0000, Nick Terrell wrote:
> On Thu, Aug 24, 2017 at 10:49:36AM +0900, Sergey Senozhatsky wrote:
> > Add ZSTD to the list of supported compression algorithms.
> > 
> > Official benchmarks [1]:
> 
> Awesome! Let me know if you need anything from me.
> 
Hello, Nick.

Awesome work!!!

Let me ask a question.
Zram compresses and decompresses small data (a page), and your github
site says that using a predefined dictionary would be helpful in this
situation. However, it seems that the compression crypto API for zstd
doesn't use ZSTD_compress_usingDict(). Is there any plan to support
it?

Thanks.
Sergey Senozhatsky Aug. 25, 2017, 12:51 a.m. UTC | #4
On (08/24/17 21:33), Nick Terrell wrote:
> On Thu, Aug 24, 2017 at 10:49:36AM +0900, Sergey Senozhatsky wrote:
> > Add ZSTD to the list of supported compression algorithms.
> > 
> > Official benchmarks [1]:
> 
> Awesome! Let me know if you need anything from me.

thanks!

	-ss
Nick Terrell Aug. 25, 2017, 1:35 a.m. UTC | #5
On 8/24/17, 5:49 PM, "Joonsoo Kim" <iamjoonsoo.kim@lge.com> wrote:
> On Thu, Aug 24, 2017 at 09:33:54PM +0000, Nick Terrell wrote:
> > On Thu, Aug 24, 2017 at 10:49:36AM +0900, Sergey Senozhatsky wrote:
> > > Add ZSTD to the list of supported compression algorithms.
> > > 
> > > Official benchmarks [1]:
> > 
> > Awesome! Let me know if you need anything from me.
> > 
> Hello, Nick.
> 
> Awesome work!!!
> 
> Let me ask a question.
> Zram compress and decompress a small data (a page) and your github
> site says that using predefined dictionary would be helpful in this
> situation. However, it seems that compression crypto API for zstd
> doesn't use ZSTD_compress_usingDict(). Is there any plan to support
> it?

I think using dictionaries in zram could be very interesting. We
could, for example, take a random sample of the RAM and use that as
the dictionary for compression, e.g. take 32 512 B samples from RAM
and build a 16 KB dictionary (sizes may vary).

I'm not sure how you would pass a dictionary into the crypto compression
API, but I'm sure we can make something work if dictionary compression
proves to be beneficial enough.

What data have you, or anyone, used for benchmarking compression ratio and 
speed for RAM? Since it is such a specialized application, the standard
compression benchmarks aren't very applicable.

> Thanks.
Sergey Senozhatsky Aug. 25, 2017, 1:53 a.m. UTC | #6
On (08/25/17 01:35), Nick Terrell wrote:
> On 8/24/17, 5:49 PM, "Joonsoo Kim" <iamjoonsoo.kim@lge.com> wrote:
> > On Thu, Aug 24, 2017 at 09:33:54PM +0000, Nick Terrell wrote:
> > > On Thu, Aug 24, 2017 at 10:49:36AM +0900, Sergey Senozhatsky wrote:
> > > > Add ZSTD to the list of supported compression algorithms.
> > > > 
> > > > Official benchmarks [1]:
> > > 
> > > Awesome! Let me know if you need anything from me.
> > > 
> > Hello, Nick.
> > 
> > Awesome work!!!
> > 
> > Let me ask a question.
> > Zram compress and decompress a small data (a page) and your github
> > site says that using predefined dictionary would be helpful in this
> > situation. However, it seems that compression crypto API for zstd
> > doesn't use ZSTD_compress_usingDict(). Is there any plan to support
> > it?

excellent question, Joonsoo.

> I think using dictionaries in zram could be very interesting. We could for
> example, take a random sample of the RAM and use that as the dictionary
> for compression. E.g. take 32 512B samples from RAM and build a 16 KB
> dictionary (sizes may vary).
> 
> I'm not sure how you would pass a dictionary into the crypto compression
> API, but I'm sure we can make something work if dictionary compression
> proves to be beneficial enough.

a dictionary pointer can be kept in `struct zstd_ctx'.


> What data have you, or anyone, used for benchmarking compression ratio and 
> speed for RAM? Since it is such a specialized application, the standard
> compression benchmarks aren't very applicable.


yeah, I thought that zstd uses dicts unconditionally.

I used my own simple minded test script:
	https://github.com/sergey-senozhatsky/zram-perf-test

it basically invokes fio with a 'static compression buffer', because
we want exactly the same data to be compressed when comparing
algorithms... I guess I need to improve it, somehow.

	-ss
Joonsoo Kim Aug. 25, 2017, 2:02 a.m. UTC | #7
On Fri, Aug 25, 2017 at 01:35:35AM +0000, Nick Terrell wrote:
> On 8/24/17, 5:49 PM, "Joonsoo Kim" <iamjoonsoo.kim@lge.com> wrote:
> > On Thu, Aug 24, 2017 at 09:33:54PM +0000, Nick Terrell wrote:
> > > On Thu, Aug 24, 2017 at 10:49:36AM +0900, Sergey Senozhatsky wrote:
> > > > Add ZSTD to the list of supported compression algorithms.
> > > > 
> > > > Official benchmarks [1]:
> > > 
> > > Awesome! Let me know if you need anything from me.
> > > 
> > Hello, Nick.
> > 
> > Awesome work!!!
> > 
> > Let me ask a question.
> > Zram compress and decompress a small data (a page) and your github
> > site says that using predefined dictionary would be helpful in this
> > situation. However, it seems that compression crypto API for zstd
> > doesn't use ZSTD_compress_usingDict(). Is there any plan to support
> > it?
> 
> I think using dictionaries in zram could be very interesting. We could for

Yes!

> example, take a random sample of the RAM and use that as the dictionary
> for compression. E.g. take 32 512B samples from RAM and build a 16 KB
> dictionary (sizes may vary).
> 
> I'm not sure how you would pass a dictionary into the crypto compression
> API, but I'm sure we can make something work if dictionary compression
> proves to be beneficial enough.

Okay. I'm also not sure how we can pass a dictionary into the crypto
compression API, which is why I asked whether you have a plan to
implement it. :)

> 
> What data have you, or anyone, used for benchmarking compression ratio and 
> speed for RAM? Since it is such a specialized application, the standard
> compression benchmarks aren't very applicable.

Three months ago, I worked on a zram deduplication feature. It wasn't
merged due to some problems; however, it showed that zram's data has
many similar (some of them even perfectly the same) parts in some
cases. I think this similarity between the data and a predefined
dictionary would lead to a good result.

lkml.kernel.org/r/<1494556204-25796-3-git-send-email-iamjoonsoo.kim@lge.com>

Thanks.
Nick Terrell Aug. 25, 2017, 2:09 a.m. UTC | #8
On 8/24/17, 6:53 PM, "Sergey Senozhatsky" <sergey.senozhatsky.work@gmail.com> wrote:
> On (08/25/17 01:35), Nick Terrell wrote:
> > On 8/24/17, 5:49 PM, "Joonsoo Kim" <iamjoonsoo.kim@lge.com> wrote:
> > > On Thu, Aug 24, 2017 at 09:33:54PM +0000, Nick Terrell wrote:
> > > > On Thu, Aug 24, 2017 at 10:49:36AM +0900, Sergey Senozhatsky wrote:
> > > > > Add ZSTD to the list of supported compression algorithms.
> > > > > 
> > > > > Official benchmarks [1]:
> > > > 
> > > > Awesome! Let me know if you need anything from me.
> > > > 
> > > Hello, Nick.
> > > 
> > > Awesome work!!!
> > > 
> > > Let me ask a question.
> > > Zram compress and decompress a small data (a page) and your github
> > > site says that using predefined dictionary would be helpful in this
> > > situation. However, it seems that compression crypto API for zstd
> > > doesn't use ZSTD_compress_usingDict(). Is there any plan to support
> > > it?
> 
> excellent question, Joonsoo.
> 
> > I think using dictionaries in zram could be very interesting. We could for
> > example, take a random sample of the RAM and use that as the dictionary
> > for compression. E.g. take 32 512B samples from RAM and build a 16 KB
> > dictionary (sizes may vary).
> > 
> > I'm not sure how you would pass a dictionary into the crypto compression
> > API, but I'm sure we can make something work if dictionary compression
> > proves to be beneficial enough.
> 
> a dictionaty pointer can be in `struct zstd_ctx'.

Sorry I'm not super familiar with the crypto API. How would the user
initialize the dictionary in the `struct zstd_ctx'? I guess you could
either expose the struct in some header, or provide a function that
initializes it, but that isn't generic. Does the crypto library already
have a standard way to provide extra information?

> > What data have you, or anyone, used for benchmarking compression ratio and 
> > speed for RAM? Since it is such a specialized application, the standard
> > compression benchmarks aren't very applicable.
> 
> 
> yeah, I thought that zstd uses dicts unconditionally.
> 
> I used my own simple minded test script:
> 	https://github.com/sergey-senozhatsky/zram-perf-test
> 
> it basically invokes fio with a 'static compression buffer', because
> we want to have the exactly same data to be compressed when I compare
> algorithms... I guess I need to improve it, somehow.

I'll check it out, thanks!

> 
> 	-ss
>
Sergey Senozhatsky Aug. 25, 2017, 2:21 a.m. UTC | #9
On (08/25/17 02:09), Nick Terrell wrote:
[..]
> > > I think using dictionaries in zram could be very interesting. We could for
> > > example, take a random sample of the RAM and use that as the dictionary
> > > for compression. E.g. take 32 512B samples from RAM and build a 16 KB
> > > dictionary (sizes may vary).
> > > 
> > > I'm not sure how you would pass a dictionary into the crypto compression
> > > API, but I'm sure we can make something work if dictionary compression
> > > proves to be beneficial enough.
> > 
> > a dictionaty pointer can be in `struct zstd_ctx'.
> 
> Sorry I'm not super familiar with the crypto API. How would the user
> initialize the dictionary in the `struct zstd_ctx'? I guess you could
> either expose the struct in some header, or provide a function that
> initializes it, but that isn't generic. Does the crypto library already
> have a standard way to provide extra information?

not really familiar either... I was thinking about having "zstd" and
"zstd_dict" crypto_alg structs - one would be !dict, the other one
would allocate a dict and pass it to the zstd compress/decompress
callbacks. The "zstd" version would invoke zstd_params(), passing
zeros as the compress and dict sizes to ZSTD_getParams(), while
"zstd_dict" would invoke, let's say, zstd_params_dict(), passing
PAGE_SIZE-s. hm... (0, PAGE_SIZE)? to ZSTD_getParams(). just a rough
idea...

	-ss
Nick Terrell Aug. 25, 2017, 2:46 a.m. UTC | #10
On 8/24/17, 7:21 PM, "Sergey Senozhatsky" <sergey.senozhatsky.work@gmail.com> wrote:
> not really familiar either... I was thinking about having "zstd" and
> "zstd_dict" crypto_alg structs - one would be !dict, the other one would
> allocate dict and pass it to compress/decompress zstd callbacks. "zstd"
> vecrsion would invoke zstd_params() passing zeros as compress and dict
> sizes to ZSTD_getParams(), while "zstd_dict" would invoke, lets say,
> zstd_params_dict() passing PAGE_SIZE-s. hm... (0, PAGE_SIZE)?  to
> ZSTD_getParams(). just a rough idea...

The way zstd dictionaries work is the user provides some data which gets
"prepended" to the data that is about to be compressed, without actually
writing it to output. That way zstd can find matches in the dictionary and
represent them for "free". That means the user has to pass the same data to
both the compressor and decompressor.

We could build a dictionary, say every 20 minutes, by sampling 512 B chunks
of the RAM and constructing a 16 KB dictionary. Then recompress all the
compressed RAM with the new dictionary. This is just a simple example of a
dictionary construction algorithm. You could imagine grouping pages by
application, and building a dictionary per application, since those pages
would likely be more similar.

Regarding the crypto API, I think it would be possible to experiment by
creating functions like
`zstd_comp_add_dictionary(void *ctx, void *data, size_t size)'
and `zstd_decomp_add_dictionary(void *ctx, void *data, size_t size)'
in the crypto zstd implementation and declare them in `zcomp.c'. If the
experiments prove that using zstd dictionaries (or LZ4 dictionaries) is
worthwhile, then we can figure out how we can make it work for real.
Minchan Kim Aug. 25, 2017, 4:50 a.m. UTC | #11
Hi Sergey,

On Thu, Aug 24, 2017 at 11:04:40PM +0900, Sergey Senozhatsky wrote:
> Hi,
> 
> On (08/24/17 13:30), Minchan Kim wrote:
> > Hello Sergey,
> > 
> > On Thu, Aug 24, 2017 at 10:49:36AM +0900, Sergey Senozhatsky wrote:
> > > Add ZSTD to the list of supported compression algorithms.
> > > 
> > > Official benchmarks [1]:
> > 
> > First of all, thanks for the work!
> > 
> > I want to ask one thing.
> > 
> > Could you add some benchmark(e.g.,) result(comp ratio and speed)
> > compared to (inflate, lzo, lz4)?
> > 
> > I want to see how much it's good for small data that ours is 4K.
> 
> 
> so on my syntetic fio test (with a static buffer):
> 
> 
> [... full fio benchmark tables snipped; identical to the results quoted earlier in the thread ...]
> READ:               (2105MB/s)	 (2463MB/s)    	 (5413MB/s)    
> READ:               (1490MB/s)	 (252MB/s)     	 (2238MB/s)    
> WRITE:              (1488MB/s)	 (252MB/s)     	 (2236MB/s)    
> READ:               (1566MB/s)	 (254MB/s)     	 (2434MB/s)    
> WRITE:              (1568MB/s)	 (254MB/s)     	 (2437MB/s)    
> #jobs9                         	                	                
> WRITE:              (5120MB/s)	 (264MB/s)     	 (4035MB/s)    
> WRITE:              (4531MB/s)	 (267MB/s)     	 (3740MB/s)    
> READ:               (1940MB/s)	 (2258MB/s)    	 (4986MB/s)    
> READ:               (2024MB/s)	 (2387MB/s)    	 (4871MB/s)    
> READ:               (1343MB/s)	 (246MB/s)     	 (2038MB/s)    
> WRITE:              (1342MB/s)	 (246MB/s)     	 (2037MB/s)    
> READ:               (1553MB/s)	 (238MB/s)     	 (2243MB/s)    
> WRITE:              (1552MB/s)	 (238MB/s)     	 (2242MB/s)    
> #jobs10                        	                	                
> WRITE:              (5345MB/s)	 (271MB/s)     	 (3988MB/s)    
> WRITE:              (4750MB/s)	 (254MB/s)     	 (3668MB/s)    
> READ:               (1876MB/s)	 (2363MB/s)    	 (5150MB/s)    
> READ:               (1990MB/s)	 (2256MB/s)    	 (5080MB/s)    
> READ:               (1355MB/s)	 (250MB/s)     	 (2019MB/s)    
> WRITE:              (1356MB/s)	 (251MB/s)     	 (2020MB/s)    
> READ:               (1490MB/s)	 (252MB/s)     	 (2202MB/s)    
> WRITE:              (1488MB/s)	 (252MB/s)     	 (2199MB/s)    
> 
> jobs1                              perfstat         	                          	                          
> instructions                 52,065,555,710 (    0.79)	  855,731,114,587 (    2.64)	   54,280,709,944 (    1.40)
> branches                     14,020,427,116 ( 725.847)	  101,733,449,582 (1074.521)	   11,170,591,067 ( 992.869)
> branch-misses                    22,626,174 (   0.16%)	      274,197,885 (   0.27%)	       25,915,805 (   0.23%)
> jobs2                              perfstat         	                          	                          
> instructions                103,633,110,402 (    0.75)	1,710,822,100,914 (    2.59)	  107,879,874,104 (    1.28)
> branches                     27,931,237,282 ( 679.203)	  203,298,267,479 (1037.326)	   22,185,350,842 ( 884.427)
> branch-misses                    46,103,811 (   0.17%)	      533,747,204 (   0.26%)	       49,682,483 (   0.22%)
> jobs3                              perfstat         	                          	                          
> instructions                154,857,283,657 (    0.76)	2,565,748,974,197 (    2.57)	  161,515,435,813 (    1.31)
> branches                     41,759,490,355 ( 670.529)	  304,905,605,277 ( 978.765)	   33,215,805,907 ( 888.003)
> branch-misses                    74,263,293 (   0.18%)	      759,746,240 (   0.25%)	       76,841,196 (   0.23%)
> jobs4                              perfstat         	                          	                          
> instructions                206,215,849,076 (    0.75)	3,420,169,460,897 (    2.60)	  215,003,061,664 (    1.31)
> branches                     55,632,141,739 ( 666.501)	  406,394,977,433 ( 927.241)	   44,214,322,251 ( 883.532)
> branch-misses                   102,287,788 (   0.18%)	    1,098,617,314 (   0.27%)	      103,891,040 (   0.23%)
> jobs5                              perfstat         	                          	                          
> instructions                258,711,315,588 (    0.67)	4,275,657,533,244 (    2.23)	  269,332,235,685 (    1.08)
> branches                     69,802,821,166 ( 588.823)	  507,996,211,252 ( 797.036)	   55,450,846,129 ( 735.095)
> branch-misses                   129,217,214 (   0.19%)	    1,243,284,991 (   0.24%)	      173,512,278 (   0.31%)
> jobs6                              perfstat         	                          	                          
> instructions                312,796,166,008 (    0.61)	5,133,896,344,660 (    2.02)	  323,658,769,588 (    1.04)
> branches                     84,372,488,583 ( 520.541)	  610,310,494,402 ( 697.642)	   66,683,292,992 ( 693.939)
> branch-misses                   159,438,978 (   0.19%)	    1,396,368,563 (   0.23%)	      174,406,934 (   0.26%)
> jobs7                              perfstat         	                          	                          
> instructions                363,211,372,930 (    0.56)	5,988,205,600,879 (    1.75)	  377,824,674,156 (    0.93)
> branches                     98,057,013,765 ( 463.117)	  711,841,255,974 ( 598.762)	   77,879,009,954 ( 600.443)
> branch-misses                   199,513,153 (   0.20%)	    1,507,651,077 (   0.21%)	      248,203,369 (   0.32%)
> jobs8                              perfstat         	                          	                          
> instructions                413,960,354,615 (    0.52)	6,842,918,558,378 (    1.45)	  431,938,486,581 (    0.83)
> branches                    111,812,574,884 ( 414.224)	  813,299,084,518 ( 491.173)	   89,062,699,827 ( 517.795)
> branch-misses                   233,584,845 (   0.21%)	    1,531,593,921 (   0.19%)	      286,818,489 (   0.32%)
> jobs9                              perfstat         	                          	                          
> instructions                465,976,220,300 (    0.53)	7,698,467,237,372 (    1.47)	  486,352,600,321 (    0.84)
> branches                    125,931,456,162 ( 424.063)	  915,207,005,715 ( 498.192)	  100,370,404,090 ( 517.439)
> branch-misses                   256,992,445 (   0.20%)	    1,782,809,816 (   0.19%)	      345,239,380 (   0.34%)
> jobs10                             perfstat         	                          	                          
> instructions                517,406,372,715 (    0.53)	8,553,527,312,900 (    1.48)	  540,732,653,094 (    0.84)
> branches                    139,839,780,676 ( 427.732)	1,016,737,699,389 ( 503.172)	  111,696,557,638 ( 516.750)
> branch-misses                   259,595,561 (   0.19%)	    1,952,570,279 (   0.19%)	      357,818,661 (   0.32%)
> 
> 
> seconds elapsed        20.630411534	96.084546565	12.743373571
> seconds elapsed        22.292627625	100.984155001	14.407413560
> seconds elapsed        22.396016966	110.344880848	14.032201392
> seconds elapsed        22.517330949	113.351459170	14.243074935
> seconds elapsed        28.548305104	156.515193765	19.159286861
> seconds elapsed        30.453538116	164.559937678	19.362492717
> seconds elapsed        33.467108086	188.486827481	21.492612173
> seconds elapsed        35.617727591	209.602677783	23.256422492
> seconds elapsed        42.584239509	243.959902566	28.458540338
> seconds elapsed        47.683632526	269.635248851	31.542404137
> 
> 
> overall, ZSTD has slower WRITE but much faster READ (perhaps a static
> compression buffer helps ZSTD a lot), which leads to faster test times.
> 
> now, memory consumption (zram mm_stat file)
> 
> zram-LZO-mm_stat
> mm_stat (jobs1): 2147483648 23068672 33558528        0 33558528        0        0
> mm_stat (jobs2): 2147483648 23068672 33558528        0 33558528        0        0
> mm_stat (jobs3): 2147483648 23068672 33558528        0 33562624        0        0
> mm_stat (jobs4): 2147483648 23068672 33558528        0 33558528        0        0
> mm_stat (jobs5): 2147483648 23068672 33558528        0 33558528        0        0
> mm_stat (jobs6): 2147483648 23068672 33558528        0 33562624        0        0
> mm_stat (jobs7): 2147483648 23068672 33558528        0 33566720        0        0
> mm_stat (jobs8): 2147483648 23068672 33558528        0 33558528        0        0
> mm_stat (jobs9): 2147483648 23068672 33558528        0 33558528        0        0
> mm_stat (jobs10): 2147483648 23068672 33558528        0 33562624        0        0
> 
> zram-DEFLATE-mm_stat
> mm_stat (jobs1): 2147483648 16252928 25178112        0 25178112        0        0
> mm_stat (jobs2): 2147483648 16252928 25178112        0 25178112        0        0
> mm_stat (jobs3): 2147483648 16252928 25178112        0 25178112        0        0
> mm_stat (jobs4): 2147483648 16252928 25178112        0 25178112        0        0
> mm_stat (jobs5): 2147483648 16252928 25178112        0 25178112        0        0
> mm_stat (jobs6): 2147483648 16252928 25178112        0 25178112        0        0
> mm_stat (jobs7): 2147483648 16252928 25178112        0 25190400        0        0
> mm_stat (jobs8): 2147483648 16252928 25178112        0 25190400        0        0
> mm_stat (jobs9): 2147483648 16252928 25178112        0 25178112        0        0
> mm_stat (jobs10): 2147483648 16252928 25178112        0 25178112        0        0
> 
> zram-ZSTD-mm_stat
> mm_stat (jobs1): 2147483648 11010048 16781312        0 16781312        0        0
> mm_stat (jobs2): 2147483648 11010048 16781312        0 16781312        0        0
> mm_stat (jobs3): 2147483648 11010048 16781312        0 16785408        0        0
> mm_stat (jobs4): 2147483648 11010048 16781312        0 16781312        0        0
> mm_stat (jobs5): 2147483648 11010048 16781312        0 16781312        0        0
> mm_stat (jobs6): 2147483648 11010048 16781312        0 16781312        0        0
> mm_stat (jobs7): 2147483648 11010048 16781312        0 16781312        0        0
> mm_stat (jobs8): 2147483648 11010048 16781312        0 16781312        0        0
> mm_stat (jobs9): 2147483648 11010048 16781312        0 16785408        0        0
> mm_stat (jobs10): 2147483648 11010048 16781312        0 16781312        0        0
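
From the mm_stat dumps above (per the zram documentation, the first two fields are orig_data_size and compr_data_size), the effective compression ratios on this fio data set can be computed directly; a quick sketch:

```python
# Effective compression ratio from the jobs1 mm_stat dumps above:
# field 1 = orig_data_size, field 2 = compr_data_size.
stats = {
    "lzo":     "2147483648 23068672 33558528 0 33558528 0 0",
    "deflate": "2147483648 16252928 25178112 0 25178112 0 0",
    "zstd":    "2147483648 11010048 16781312 0 16781312 0 0",
}

for name, line in stats.items():
    orig, compr = map(int, line.split()[:2])
    print(f"{name:8s} ratio = {orig / compr:.1f}")
```

This prints ratios of roughly 93 (lzo), 132 (deflate), and 195 (zstd) — the fio test buffers are highly compressible, so the absolute numbers are far above the ~2–2.9 ratios from the official corpus benchmarks, but the relative ordering matches them.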

Thanks for the testing.
Could you resend the patch with this test result with my acked-by?

Acked-by: Minchan Kim <minchan@kernel.org>

Off-topic:

In all ways, zstd beats deflate. Nick, right?

With zstd available, I doubt we should show "deflate" to the user. Too
many options just confuse users.
Deflate has been there to represent high compression ratio at slower
speed. However, zstd is unconditionally better than deflate,
so how about replacing deflate with zstd?

Sergey, what do you think about it?
Sergey Senozhatsky Aug. 25, 2017, 5:06 a.m. UTC | #12
On (08/25/17 13:50), Minchan Kim wrote:
[..]
> Thanks for the testing.
> Could you resend the patch with this test result with my acked-by?

ok.

> Acked-by: Minchan Kim <minchan@kernel.org>

thanks.

> Off-topic:
> 
> In all ways, zstd beats deflate. Nick, right?
> 
> With zstd available, I doubt we should show "deflate" to the user. Too
> many options just confuse users.
> Deflate has been there to represent high compression ratio at slower
> speed. However, zstd is unconditionally better than deflate,
> so how about replacing deflate with zstd?
> 
> Sergey, what do you think about it?

hmmm... frankly, I don't think it would confuse anyone. the code is
there - compiled - anyway, and the module is visible in /proc/crypto
etc. if we will make it unavailable in zram then this can be confusing,
probably... if anyone ever pays any attention at all. my guess is that
people look what's in /sys/block/zram0/comp_algorithm just once, then
they set up a create-zram script/systemd unit file/etc. and forget
about it.

crypto API is just a wrapper and it calls lib/comp_foo to do the actual
work, and that lib/comp_foo *maybe* can be implemented as a H/W (not
S/W) compressing backend. well, who knows.

	-ss
Minchan Kim Aug. 25, 2017, 5:19 a.m. UTC | #13
Hi Nick,

On Fri, Aug 25, 2017 at 01:35:35AM +0000, Nick Terrell wrote:
> On 8/24/17, 5:49 PM, "Joonsoo Kim" <iamjoonsoo.kim@lge.com> wrote:
> > On Thu, Aug 24, 2017 at 09:33:54PM +0000, Nick Terrell wrote:
> > > On Thu, Aug 24, 2017 at 10:49:36AM +0900, Sergey Senozhatsky wrote:
> > > > Add ZSTD to the list of supported compression algorithms.
> > > > 
> > > > Official benchmarks [1]:
> > > 
> > > Awesome! Let me know if you need anything from me.
> > > 
> > Hello, Nick.
> > 
> > Awesome work!!!
> > 
> > Let me ask a question.
> > Zram compresses and decompresses small data (a page), and your github
> > site says that using a predefined dictionary would be helpful in this
> > situation. However, it seems that the compression crypto API for zstd
> > doesn't use ZSTD_compress_usingDict(). Is there any plan to support
> > it?
> 
> I think using dictionaries in zram could be very interesting. We could for
> example, take a random sample of the RAM and use that as the dictionary
> for compression. E.g. take 32 512B samples from RAM and build a 16 KB
> dictionary (sizes may vary).

For a static option, could we create the dictionary from data in zram
and dump the dictionary into a file? Then, rebuilding zram or the kernel
would include the dictionary in the images.

For it, we would need some knob like

        cat /sys/block/zram/zstd_dict > dict.data

        CONFIG_ZSTD_DICT_DIR=
        CONFIG_ZSTD_DICT_FILE= 

For a dynamic option, could we build the dictionary from data
in zram dynamically? So, upcoming pages would use the newly
created dictionary, but old compressed pages would keep using their own dictionary.

I'm not sure it's possible. Anyway, if a predefined dict can help
comp ratio a lot on 4K data, I'd really love the feature and will support
having it. ;)

> 
> I'm not sure how you would pass a dictionary into the crypto compression
> API, but I'm sure we can make something work if dictionary compression
> proves to be beneficial enough.

Yes, it would be better to integrate the feature into crypto, but please don't tie it to
the crypto API. If it's hard to support with the current crypto API in a short time,
I really want to support it with zcomp_zstd.c.

Please look at old zcomp model.
http://elixir.free-electrons.com/linux/v4.7/source/drivers/block/zram/zcomp_lz4.c

> 
> What data have you, or anyone, used for benchmarking compression ratio and 
> speed for RAM? Since it is such a specialized application, the standard
> compression benchmarks aren't very applicable.

I have used my image dumped from a desktop swap device.
Of course, it doesn't cover all the cases in the world, but it would be better
than an IO benchmark buffer, IMHO. :)

Thanks.
Sergey Senozhatsky Aug. 25, 2017, 5:26 a.m. UTC | #14
On (08/25/17 02:46), Nick Terrell wrote:
> On 8/24/17, 7:21 PM, "Sergey Senozhatsky" <sergey.senozhatsky.work@gmail.com> wrote:
> > not really familiar either... I was thinking about having "zstd" and
> > "zstd_dict" crypto_alg structs - one would be !dict, the other one would
> > allocate dict and pass it to compress/decompress zstd callbacks. "zstd"
> > version would invoke zstd_params() passing zeros as compress and dict
> > sizes to ZSTD_getParams(), while "zstd_dict" would invoke, lets say,
> > zstd_params_dict() passing PAGE_SIZE-s. hm... (0, PAGE_SIZE)?  to
> > ZSTD_getParams(). just a rough idea...
> 
> The way zstd dictionaries work is the user provides some data which gets
> "prepended" to the data that is about to be compressed, without actually
> writing it to output. That way zstd can find matches in the dictionary and
> represent them for "free". That means the user has to pass the same data to
> both the compressor and decompressor.
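
The "prepended" dictionary mechanics Nick describes can be demonstrated in userspace even without zstd bindings — zlib's zdict parameter in Python's standard library behaves the same way. A stand-in sketch (in-kernel zstd would go through ZSTD_compress_usingDict() instead):

```python
import zlib

# Stand-in for a dictionary sampled from RAM, as proposed above.
sample = bytes(range(256)) * 4          # 1 KiB of low-redundancy data
dictionary = sample

page = sample[128:128 + 512]            # a "page" resembling sampled RAM

plain = zlib.compress(page, 6)          # no dictionary

comp = zlib.compressobj(level=6, zdict=dictionary)
with_dict = comp.compress(page) + comp.flush()

# The decompressor must be handed the exact same dictionary, which is
# why zram would have to track it per compressed page.
decomp = zlib.decompressobj(zdict=dictionary)
assert decomp.decompress(with_dict) + decomp.flush() == page

print(f"no dict: {len(plain)} B, with dict: {len(with_dict)} B")
```

The with-dictionary stream comes out much smaller because the match finder can reference the dictionary for "free", exactly the effect described above; zstd's dictionary API adds trained dictionaries and precomputed entropy tables on top of this mechanism.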

ah... I thought zstd would construct the dictionary for us based on the
data it compresses; and we just need to provide the buffer.

> We could build a dictionary, say every 20 minutes, by sampling 512 B chunks
> of the RAM and constructing a 16 KB dictionary. Then recompress all the
> compressed RAM with the new dictionary. This is just a simple example of a
> dictionary construction algorithm. You could imagine grouping pages by
> application, and building a dictionary per application, since those pages
> would likely be more similar.
> 
> Regarding the crypto API, I think it would be possible to experiment by
> creating functions like
> `zstd_comp_add_dictionary(void *ctx, void *data, size_t size)'
> and `zstd_decomp_add_dictionary(void *ctx, void *data, size_t size)'
> in the crypto zstd implementation and declare them in `zcomp.c'. If the
> experiments prove that using zstd dictionaries (or LZ4 dictionaries) is
> worthwhile, then we can figure out how we can make it work for real.

	-ss
Sergey Senozhatsky Aug. 25, 2017, 5:27 a.m. UTC | #15
On (08/25/17 14:06), Sergey Senozhatsky wrote:
> > Off-topic:
> > 
> > In all ways, zstd beats deflate. Nick, right?
> > 
> > With zstd available, I doubt we should show "deflate" to the user. Too
> > many options just confuse users.
> > Deflate has been there to represent high compression ratio at slower
> > speed. However, zstd is unconditionally better than deflate,
> > so how about replacing deflate with zstd?
> > 
> > Sergey, what do you think about it?
> 
> hmmm... frankly, I don't think it would confuse anyone. the code is
> there - compiled - anyway, and the module is visible in /proc/crypto
> etc. if we will make it unavailable in zram then this can be confusing,
> probably... if anyone ever pays any attention at all. my guess is that
> people look what's in /sys/block/zram0/comp_algorithm just once, then
> they set up a create-zram script/systemd unit file/etc. and forget
> about it.
> 
> crypto API is just a wrapper and it calls lib/comp_foo to do the actual
> work, and that lib/comp_foo *maybe* can be implemented as a H/W (not
> S/W) compressing backend. well, who knows.

but I can remove deflate, if you insist. sure.

	-ss
Minchan Kim Aug. 25, 2017, 5:36 a.m. UTC | #16
On Fri, Aug 25, 2017 at 02:06:44PM +0900, Sergey Senozhatsky wrote:

< snip >

> > Off-topic:
> > 
> > In all ways, zstd beats deflate. Nick, right?
> > 
> > With zstd available, I doubt we should show "deflate" to the user. Too
> > many options just confuse users.
> > Deflate has been there to represent high compression ratio at slower
> > speed. However, zstd is unconditionally better than deflate,
> > so how about replacing deflate with zstd?
> > 
> > Sergey, what do you think about it?
> 
> hmmm... frankly, I don't think it would confuse anyone. the code is
> there - compiled - anyway, and the module is visible in /proc/crypto
> etc. if we will make it unavailable in zram then this can be confusing,
> probably... if anyone ever pays any attention at all. my guess is that
> people look what's in /sys/block/zram0/comp_algorithm just once, then
> they set up a create-zram script/systemd unit file/etc. and forget
> about it.

Although we wouldn't show "deflate", zram would still support it, so is it
okay for the user to have forgotten about it? And zram still doesn't show
all of the compression algorithms in the system.

If I misunderstood your point, please, clarify.

Again, my point is that I want to show a limited set of representative
compression algorithms (high speed/low comp, low speed/high comp, mid/mid)
via /sys/block/zram0/comp_algorithm rather than adding a new entry whenever
a new algorithm is added. If others want some other algorithm for some
reason, they can still use it. We don't prevent that.
Sergey Senozhatsky Aug. 25, 2017, 7:45 a.m. UTC | #17
On (08/25/17 14:36), Minchan Kim wrote:
[..]
> > hmmm... frankly, I don't think it would confuse anyone. the code is
> > there - compiled - anyway, and the module is visible in /proc/crypto
> > etc. if we will make it unavailable in zram then this can be confusing,
> > probably... if anyone ever pays any attention at all. my guess is that
> > people look what's in /sys/block/zram0/comp_algorithm just once, then
> > they set up a create-zram script/systemd unit file/etc. and forget
> > about it.
> 
> Although we don't show "deflate", zram still supports

right. I forgot about it :) [... and I have authored that code]

> Again, my point is that I want to show a limited set of representative compression
> algorithms (high speed/low comp, low speed/high comp, mid/mid) via
> /sys/block/zram0/comp_algorithm rather than adding a new entry whenever a
> new algorithm is added.

ok, will send out a patch set.



that may lead to a bigger/more general question:

- if zstd is so much better, then do we need deflate/inflate at all in
  the kernel? maybe zstd can replace it?

what do you think, Nick?

	-ss
Adam Borowski Aug. 25, 2017, 8:08 a.m. UTC | #18
On Fri, Aug 25, 2017 at 04:45:33PM +0900, Sergey Senozhatsky wrote:
> that may lead to a bigger/more general question:
> 
> - if zstd is so much better, then do we need deflate/inflate at all in
>   the kernel? maybe zstd can replace it?

zram and vmlinuz/modules are about the only cases where it can be replaced.
Everything else is long-lived data (filesystems) or comes from the outside
(network protocols).  There are also some cases where we have some control
over the producer (initramfs-tools), but even there you'd need many years
of deprecation.


Meow!
Nick Terrell Aug. 25, 2017, 6:55 p.m. UTC | #19
On 8/24/17, 9:51 PM, "Minchan Kim" <minchan@kernel.org> wrote:
>Off-topic:
>
>In all ways, zstd beats deflate. Nick, right?

Yeah. Since all LZ compression algorithms are heuristics, there are some
rare cases where zlib beats zstd in compression ratio by a bit, but never
speed. In all cases where we've deployed zstd, we've seen aggregate
compression ratio improvements over zlib, and expect the same here.

>With zstd available, I doubt we should show "deflate" to the user. Too
>many options just confuse users.
>Deflate has been there to represent high compression ratio at slower
>speed. However, zstd is unconditionally better than deflate,
>so how about replacing deflate with zstd?
>
>Sergey, what do you think about it?
Nick Terrell Aug. 25, 2017, 7:31 p.m. UTC | #20
On 8/24/17, 10:19 PM, "Minchan Kim" <minchan@kernel.org> wrote:
> On Fri, Aug 25, 2017 at 01:35:35AM +0000, Nick Terrell wrote:
[..]
> > I think using dictionaries in zram could be very interesting. We could for
> > example, take a random sample of the RAM and use that as the dictionary
> > for compression. E.g. take 32 512B samples from RAM and build a 16 KB
> > dictionary (sizes may vary).
> 
> For a static option, could we create the dictionary from data in zram
> and dump the dictionary into a file? Then, rebuilding zram or the kernel
> would include the dictionary in the images.
> 
> For it, we would need some knob like
> 
>         cat /sys/block/zram/zstd_dict > dict.data
> 
>         CONFIG_ZSTD_DICT_DIR=
>         CONFIG_ZSTD_DICT_FILE= 

My guess is that a static dictionary won't cut it, since different
workloads will have drastically different RAM contents, so we won't be able
to construct a single dictionary that works for them all. I'd love to be
proven wrong though.

> For a dynamic option, could we build the dictionary from data
> in zram dynamically? So, upcoming pages would use the newly
> created dictionary, but old compressed pages would keep using their own dictionary.

Yeah, that's totally possible on the compression side, we would just need to
save which pages were compressed with which dictionary somewhere.

> I'm not sure it's possible. Anyway, if a predefined dict can help
> comp ratio a lot on 4K data, I'd really love the feature and will support
> having it. ;)
> 
> > 
> > I'm not sure how you would pass a dictionary into the crypto compression
> > API, but I'm sure we can make something work if dictionary compression
> > proves to be beneficial enough.
> 
> Yes, it would be better to integrate the feature into crypto, but please don't tie it to
> the crypto API. If it's hard to support with the current crypto API in a short time,
> I really want to support it with zcomp_zstd.c.
> 
> Please look at old zcomp model.
> http://elixir.free-electrons.com/linux/v4.7/source/drivers/block/zram/zcomp_lz4.c

Thanks for the link, we could definitely make zcomp work with dictionaries.

> > What data have you, or anyone, used for benchmarking compression ratio and 
> > speed for RAM? Since it is such a specialized application, the standard
> > compression benchmarks aren't very applicable.
> 
> I have used my image dumped from a desktop swap device.
> Of course, it doesn't cover all the cases in the world, but it would be better
> than an IO benchmark buffer, IMHO. :)

Since adding dictionary support won't be quite as easy as adding zstd
support, I think the first step is building a set of benchmarks that
represent some common real world scenarios. We can easily test different
dictionary construction algorithms in userspace, and determine if the work
will pay off for some workloads. I'll collect some RAM samples from my
device and run some preliminary tests.
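
The userspace experiment could start as simply as the sampling idea from earlier in the thread (32 × 512 B samples → 16 KB dictionary). A naive sketch — note that real zstd dictionary training (the `zstd --train` CLI, or ZDICT_trainFromBuffer() in libzstd) does considerably more than concatenating samples:

```python
import os
import random

random.seed(0)  # reproducible offsets for this sketch

PAGE = 4096
SAMPLE = 512
NR_SAMPLES = 32

# Stand-in for a RAM snapshot; a real test would use dumped RAM/swap pages.
ram = os.urandom(256 * PAGE)

def build_dictionary(mem: bytes, samples: int = NR_SAMPLES,
                     chunk: int = SAMPLE) -> bytes:
    """Concatenate random fixed-size samples of memory into a dictionary."""
    offsets = [random.randrange(0, len(mem) - chunk) for _ in range(samples)]
    return b"".join(mem[o:o + chunk] for o in offsets)

dictionary = build_dictionary(ram)
print(len(dictionary))  # 16384 bytes, the 16 KB size mentioned above
```

Such a dictionary could then be fed to the per-page compression experiment (via the zdict-style API of whichever codec is being tested) to see whether sampling actually pays off for a given workload.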
Minchan Kim Aug. 28, 2017, 6:52 a.m. UTC | #21
Hi Nick,

On Fri, Aug 25, 2017 at 07:31:14PM +0000, Nick Terrell wrote:
> On 8/24/17, 10:19 PM, "Minchan Kim" <minchan@kernel.org> wrote:
> > On Fri, Aug 25, 2017 at 01:35:35AM +0000, Nick Terrell wrote:
> [..]
> > > I think using dictionaries in zram could be very interesting. We could for
> > > example, take a random sample of the RAM and use that as the dictionary
> > > for compression. E.g. take 32 512B samples from RAM and build a 16 KB
> > > dictionary (sizes may vary).
> > 
> > For a static option, could we create the dictionary from data in zram
> > and dump the dictionary into a file? Then, rebuilding zram or the kernel
> > would include the dictionary in the images.
> > 
> > For it, we would need some knob like
> > 
> >         cat /sys/block/zram/zstd_dict > dict.data
> > 
> >         CONFIG_ZSTD_DICT_DIR=
> >         CONFIG_ZSTD_DICT_FILE= 
> 
> My guess is that a static dictionary won't cut it, since different
> workloads will have drastically different RAM contents, so we won't be able
> to construct a single dictionary that works for them all. I'd love to be
> proven wrong though.

zram is popular for system swap in the embedded world. On mobile phones
there would be different workloads, as you said, but other scenarios
like refrigerators and TVs will have very specific workloads,
so it would be great to have.

> 
> > For a dynamic option, could we build the dictionary from data
> > in zram dynamically? So, upcoming pages would use the newly
> > created dictionary, but old compressed pages would keep using their own dictionary.
> 
> Yeah thats totally possible on the compression side, we would just need to
> save which pages were compressed with which dictionary somewhere.

Great. We have zram->table for per-object metadata and zspage for per-page
units, so I expect it wouldn't be hard to implement.

> 
> > I'm not sure it's possible. Anyway, if a predefined dict can help
> > comp ratio a lot on 4K data, I'd really love the feature and will support
> > having it. ;)
> > 
> > > 
> > > I'm not sure how you would pass a dictionary into the crypto compression
> > > API, but I'm sure we can make something work if dictionary compression
> > > proves to be beneficial enough.
> > 
> > Yes, it would be better to integrate the feature into crypto, but please don't tie it to
> > the crypto API. If it's hard to support with the current crypto API in a short time,
> > I really want to support it with zcomp_zstd.c.
> > 
> > Please look at old zcomp model.
> > http://elixir.free-electrons.com/linux/v4.7/source/drivers/block/zram/zcomp_lz4.c
> 
> Thanks for the link, we could definitely make zcomp work with dictionaries.
> 
> > > What data have you, or anyone, used for benchmarking compression ratio and 
> > > speed for RAM? Since it is such a specialized application, the standard
> > > compression benchmarks aren't very applicable.
> > 
> > I have used my image dumped from a desktop swap device.
> > Of course, it doesn't cover all the cases in the world, but it would be better
> > than an IO benchmark buffer, IMHO. :)
> 
> Since adding dictionary support won't be quite as easy as adding zstd
> support, I think the first step is building a set of benchmarks that
> represent some common real world scenarios. We can easily test different
> dictionary construction algorithms in userspace, and determine if the work
> will pay off for some workloads. I'll collect some RAM samples from my
> device and run some preliminary tests.

Sweet. I am looking forward to seeing your result.
Thanks!

Patch
diff mbox series

diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index 5b8992beffec..cc66daec7bbc 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -31,6 +31,9 @@  static const char * const backends[] = {
 #endif
 #if IS_ENABLED(CONFIG_CRYPTO_842)
 	"842",
+#endif
+#if IS_ENABLED(CONFIG_CRYPTO_ZSTD)
+	"zstd",
 #endif
 	NULL
 };
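
With the patch applied and CONFIG_CRYPTO_ZSTD enabled, the new backend is selected through the existing comp_algorithm knob; a typical root-shell sequence (device name and size are illustrative):

```shell
# comp_algorithm must be set before disksize: the algorithm cannot
# be changed once the device has been initialized.
modprobe zram
echo zstd > /sys/block/zram0/comp_algorithm
echo 2G   > /sys/block/zram0/disksize

# The active algorithm is shown in brackets.
cat /sys/block/zram0/comp_algorithm

mkswap /dev/zram0
swapon /dev/zram0
```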