linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] zstd: reduce stack usage
       [not found] <CGME20190603090227epcas5p348327061a3facbb9dfcf662bf2bc196e@epcas5p3.samsung.com>
@ 2019-06-03  9:02 ` Maninder Singh
       [not found]   ` <CGME20190603090232epcas5p1630d0584e8a1aa9495edc819605664fc@epcas5p1.samsung.com>
                     ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Maninder Singh @ 2019-06-03  9:02 UTC (permalink / raw)
  To: akpm, herbert, davem, keescook, gustavo
  Cc: joe, linux-crypto, linux-kernel, a.sahrawat, pankaj.m, v.narang,
	Maninder Singh

This patch set reduces stack usage for zstd code, because target like ARM has
limited 8KB kernel stack, which is getting overflowed due to hight stack usage
of zstd code with call flow like:

....
....
(FSE_compress_usingCTable) from (HUF_compressWeights_wksp+0x140/0x200)
(HUF_compressWeights_wksp) from (HUF_writeCTable_wksp+0xdc/0x1c8)
(HUF_writeCTable_wksp) from (HUF_compress4X_repeat+0x214/0x450)
(HUF_compress4X_repeat) from  (ZSTD_compressBlock_internal+0x228/0x135c)
(ZSTD_compressBlock_internal) from  (ZSTD_compressContinue_internal+0x1f8/0x3c8)
(ZSTD_compressContinue_internal) from  (ZSTD_compressCCtx+0xc0/0x1cc)
(ZSTD_compressCCtx) from  (zstd_compress+0x90/0xa8)
(zstd_compress) from  (crypto_compress+0x2c/0x34) 
(crypto_compress) from  (zcomp_compress+0x3c/0x44)
(zcomp_compress) from  (zram_bvec_rw+0x2f8/0xa7c)
(zram_bvec_rw) from [<c03e1e58>] (zram_rw_page+0x104/0x170)
(zram_rw_page) from [<c01d3f94>] (bdev_write_page+0x80/0xb4)
(bdev_write_page) from [<c017dc9c>] (__swap_writepage+0x160/0x29c)
(__swap_writepage) from [<c017de14>] (swap_writepage+0x3c/0x58)
(swap_writepage) from [<c014f88c>] (shrink_page_list+0x788/0xae0)
(shrink_page_list) from [<c01502e0>] (shrink_inactive_list+0x210/0x4a8)
(shrink_inactive_list) from [<c0150dd4>] (shrink_zone+0x53c/0x7c0)
(shrink_zone) from [<c0151354>] (try_to_free_pages+0x2fc/0x7cc)
(try_to_free_pages) from [<c0144c60>] (__alloc_pages_nodemask+0x534/0x91c)
(__alloc_pages_nodemask) from [<c01393ac>] (pagecache_get_page+0xe0/0x1d8)
....
....

Maninder Singh, Vaneet Narang (4):
  zstd: pass pointer rathen than structure to functions
  zstd: use U16 data type for rankPos
  zstd: move params structure to global variable to reduce stack
    usage
  zstd: change structure variable from int to char

 crypto/zstd.c           | 11 ++++---
 include/linux/zstd.h    | 16 +++++-----
 lib/zstd/compress.c     | 85 +++++++++++++++++++++++++------------------------
 lib/zstd/decompress.c   |  2 +-
 lib/zstd/huf_compress.c |  4 +--
 5 files changed, 62 insertions(+), 56 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/4] zstd: pass pointer rathen than structure to functions
       [not found]   ` <CGME20190603090232epcas5p1630d0584e8a1aa9495edc819605664fc@epcas5p1.samsung.com>
@ 2019-06-03  9:02     ` Maninder Singh
  2019-06-03 21:41       ` Andrew Morton
                         ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Maninder Singh @ 2019-06-03  9:02 UTC (permalink / raw)
  To: akpm, herbert, davem, keescook, gustavo
  Cc: joe, linux-crypto, linux-kernel, a.sahrawat, pankaj.m, v.narang,
	Maninder Singh

currently params structure is passed in all functions, which increases
stack usage in all the function and lead to stack overflow on target like
ARM with kernel stack size of 8 KB so better to pass pointer.

Checked for ARM:

                                Original               Patched
Call FLow Size:                  1264                   1040
....
(HUF_sort)                      -> 296
(HUF_buildCTable_wksp)          -> 144
(HUF_compress4X_repeat)         -> 88
(ZSTD_compressBlock_internal)   -> 200
(ZSTD_compressContinue_internal)-> 136                  -> 88
(ZSTD_compressCCtx)             -> 192                  -> 64
(zstd_compress)                 -> 144                  -> 96
(crypto_compress)               -> 32
(zcomp_compress)                -> 32
....

Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Vaneet Narang <v.narang@samsung.com>

Fixing, Line 211: Using & instead of && makes this somewhat difficult to read.
It's hard to believe this is a performance optimization.
Signed-off-by: Joe Perches <joe@perches.com>
---
https://lkml.org/lkml/2019/5/10/539 <joe@perches.com>

 crypto/zstd.c        |  2 +-
 include/linux/zstd.h | 10 +++----
 lib/zstd/compress.c  | 85 +++++++++++++++++++++++++++-------------------------
 3 files changed, 50 insertions(+), 47 deletions(-)

diff --git a/crypto/zstd.c b/crypto/zstd.c
index 2c04055..4e9ff22 100644
--- a/crypto/zstd.c
+++ b/crypto/zstd.c
@@ -162,7 +162,7 @@ static int __zstd_compress(const u8 *src, unsigned int slen,
 	struct zstd_ctx *zctx = ctx;
 	const ZSTD_parameters params = zstd_params();
 
-	out_len = ZSTD_compressCCtx(zctx->cctx, dst, *dlen, src, slen, params);
+	out_len = ZSTD_compressCCtx(zctx->cctx, dst, *dlen, src, slen, &params);
 	if (ZSTD_isError(out_len))
 		return -EINVAL;
 	*dlen = out_len;
diff --git a/include/linux/zstd.h b/include/linux/zstd.h
index 249575e..5103efa 100644
--- a/include/linux/zstd.h
+++ b/include/linux/zstd.h
@@ -254,7 +254,7 @@ ZSTD_CCtx *ZSTD_initCCtx(void *workspace, size_t workspaceSize);
  *               ZSTD_isError().
  */
 size_t ZSTD_compressCCtx(ZSTD_CCtx *ctx, void *dst, size_t dstCapacity,
-	const void *src, size_t srcSize, ZSTD_parameters params);
+	const void *src, size_t srcSize, const ZSTD_parameters *params);
 
 /**
  * ZSTD_DCtxWorkspaceBound() - amount of memory needed to initialize a ZSTD_DCtx
@@ -324,7 +324,7 @@ size_t ZSTD_decompressDCtx(ZSTD_DCtx *ctx, void *dst, size_t dstCapacity,
  */
 size_t ZSTD_compress_usingDict(ZSTD_CCtx *ctx, void *dst, size_t dstCapacity,
 	const void *src, size_t srcSize, const void *dict, size_t dictSize,
-	ZSTD_parameters params);
+	const ZSTD_parameters *params);
 
 /**
  * ZSTD_decompress_usingDict() - decompress src into dst using a dictionary
@@ -381,7 +381,7 @@ typedef struct ZSTD_CDict_s ZSTD_CDict;
  * Return:         The digested dictionary emplaced into workspace.
  */
 ZSTD_CDict *ZSTD_initCDict(const void *dictBuffer, size_t dictSize,
-	ZSTD_parameters params, void *workspace, size_t workspaceSize);
+	const ZSTD_parameters *params, void *workspace, size_t workspaceSize);
 
 /**
  * ZSTD_compress_usingCDict() - compress src into dst using a ZSTD_CDict
@@ -552,7 +552,7 @@ typedef struct ZSTD_CStream_s ZSTD_CStream;
  *
  * Return:          The zstd streaming compression context.
  */
-ZSTD_CStream *ZSTD_initCStream(ZSTD_parameters params,
+ZSTD_CStream *ZSTD_initCStream(const ZSTD_parameters *params,
 	unsigned long long pledgedSrcSize, void *workspace,
 	size_t workspaceSize);
 
@@ -1006,7 +1006,7 @@ size_t ZSTD_compressBegin(ZSTD_CCtx *cctx, int compressionLevel);
 size_t ZSTD_compressBegin_usingDict(ZSTD_CCtx *cctx, const void *dict,
 	size_t dictSize, int compressionLevel);
 size_t ZSTD_compressBegin_advanced(ZSTD_CCtx *cctx, const void *dict,
-	size_t dictSize, ZSTD_parameters params,
+	size_t dictSize, const ZSTD_parameters *params,
 	unsigned long long pledgedSrcSize);
 size_t ZSTD_copyCCtx(ZSTD_CCtx *cctx, const ZSTD_CCtx *preparedCCtx,
 	unsigned long long pledgedSrcSize);
diff --git a/lib/zstd/compress.c b/lib/zstd/compress.c
index 5e0b670..306e31b 100644
--- a/lib/zstd/compress.c
+++ b/lib/zstd/compress.c
@@ -206,18 +206,21 @@ ZSTD_compressionParameters ZSTD_adjustCParams(ZSTD_compressionParameters cPar, u
 	return cPar;
 }
 
-static U32 ZSTD_equivalentParams(ZSTD_parameters param1, ZSTD_parameters param2)
+static U32 ZSTD_equivalentParams(const ZSTD_parameters *param1, const ZSTD_parameters *param2)
 {
-	return (param1.cParams.hashLog == param2.cParams.hashLog) & (param1.cParams.chainLog == param2.cParams.chainLog) &
-	       (param1.cParams.strategy == param2.cParams.strategy) & ((param1.cParams.searchLength == 3) == (param2.cParams.searchLength == 3));
+	return  (param1->cParams.hashLog == param2->cParams.hashLog) &&
+		(param1->cParams.chainLog == param2->cParams.chainLog) &&
+		(param1->cParams.strategy == param2->cParams.strategy) &&
+		(param1->cParams.searchLength == 3) &&
+		(param1->cParams.searchLength == param2->cParams.searchLength);
 }
 
 /*! ZSTD_continueCCtx() :
 	reuse CCtx without reset (note : requires no dictionary) */
-static size_t ZSTD_continueCCtx(ZSTD_CCtx *cctx, ZSTD_parameters params, U64 frameContentSize)
+static size_t ZSTD_continueCCtx(ZSTD_CCtx *cctx, const ZSTD_parameters *params, U64 frameContentSize)
 {
 	U32 const end = (U32)(cctx->nextSrc - cctx->base);
-	cctx->params = params;
+	cctx->params = *params;
 	cctx->frameContentSize = frameContentSize;
 	cctx->lowLimit = end;
 	cctx->dictLimit = end;
@@ -239,23 +242,23 @@ typedef enum { ZSTDcrp_continue, ZSTDcrp_noMemset, ZSTDcrp_fullReset } ZSTD_comp
 
 /*! ZSTD_resetCCtx_advanced() :
 	note : `params` must be validated */
-static size_t ZSTD_resetCCtx_advanced(ZSTD_CCtx *zc, ZSTD_parameters params, U64 frameContentSize, ZSTD_compResetPolicy_e const crp)
+static size_t ZSTD_resetCCtx_advanced(ZSTD_CCtx *zc, const ZSTD_parameters *params, U64 frameContentSize, ZSTD_compResetPolicy_e const crp)
 {
 	if (crp == ZSTDcrp_continue)
-		if (ZSTD_equivalentParams(params, zc->params)) {
+		if (ZSTD_equivalentParams(params, &zc->params)) {
 			zc->flagStaticTables = 0;
 			zc->flagStaticHufTable = HUF_repeat_none;
 			return ZSTD_continueCCtx(zc, params, frameContentSize);
 		}
 
 	{
-		size_t const blockSize = MIN(ZSTD_BLOCKSIZE_ABSOLUTEMAX, (size_t)1 << params.cParams.windowLog);
-		U32 const divider = (params.cParams.searchLength == 3) ? 3 : 4;
+		size_t const blockSize = MIN(ZSTD_BLOCKSIZE_ABSOLUTEMAX, (size_t)1 << params->cParams.windowLog);
+		U32 const divider = (params->cParams.searchLength == 3) ? 3 : 4;
 		size_t const maxNbSeq = blockSize / divider;
 		size_t const tokenSpace = blockSize + 11 * maxNbSeq;
-		size_t const chainSize = (params.cParams.strategy == ZSTD_fast) ? 0 : (1 << params.cParams.chainLog);
-		size_t const hSize = ((size_t)1) << params.cParams.hashLog;
-		U32 const hashLog3 = (params.cParams.searchLength > 3) ? 0 : MIN(ZSTD_HASHLOG3_MAX, params.cParams.windowLog);
+		size_t const chainSize = (params->cParams.strategy == ZSTD_fast) ? 0 : (1 << params->cParams.chainLog);
+		size_t const hSize = ((size_t)1) << params->cParams.hashLog;
+		U32 const hashLog3 = (params->cParams.searchLength > 3) ? 0 : MIN(ZSTD_HASHLOG3_MAX, params->cParams.windowLog);
 		size_t const h3Size = ((size_t)1) << hashLog3;
 		size_t const tableSpace = (chainSize + hSize + h3Size) * sizeof(U32);
 		void *ptr;
@@ -265,7 +268,7 @@ static size_t ZSTD_resetCCtx_advanced(ZSTD_CCtx *zc, ZSTD_parameters params, U64
 			size_t const optSpace = ((MaxML + 1) + (MaxLL + 1) + (MaxOff + 1) + (1 << Litbits)) * sizeof(U32) +
 						(ZSTD_OPT_NUM + 1) * (sizeof(ZSTD_match_t) + sizeof(ZSTD_optimal_t));
 			size_t const neededSpace = tableSpace + (256 * sizeof(U32)) /* huffTable */ + tokenSpace +
-						   (((params.cParams.strategy == ZSTD_btopt) || (params.cParams.strategy == ZSTD_btopt2)) ? optSpace : 0);
+						   (((params->cParams.strategy == ZSTD_btopt) || (params->cParams.strategy == ZSTD_btopt2)) ? optSpace : 0);
 			if (zc->workSpaceSize < neededSpace) {
 				ZSTD_free(zc->workSpace, zc->customMem);
 				zc->workSpace = ZSTD_malloc(neededSpace, zc->customMem);
@@ -294,7 +297,7 @@ static size_t ZSTD_resetCCtx_advanced(ZSTD_CCtx *zc, ZSTD_parameters params, U64
 		zc->dictBase = NULL;
 		zc->dictLimit = 0;
 		zc->lowLimit = 0;
-		zc->params = params;
+		zc->params = *params;
 		zc->blockSize = blockSize;
 		zc->frameContentSize = frameContentSize;
 		{
@@ -303,7 +306,7 @@ static size_t ZSTD_resetCCtx_advanced(ZSTD_CCtx *zc, ZSTD_parameters params, U64
 				zc->rep[i] = repStartValue[i];
 		}
 
-		if ((params.cParams.strategy == ZSTD_btopt) || (params.cParams.strategy == ZSTD_btopt2)) {
+		if ((params->cParams.strategy == ZSTD_btopt) || (params->cParams.strategy == ZSTD_btopt2)) {
 			zc->seqStore.litFreq = (U32 *)ptr;
 			zc->seqStore.litLengthFreq = zc->seqStore.litFreq + (1 << Litbits);
 			zc->seqStore.matchLengthFreq = zc->seqStore.litLengthFreq + (MaxLL + 1);
@@ -354,7 +357,7 @@ size_t ZSTD_copyCCtx(ZSTD_CCtx *dstCCtx, const ZSTD_CCtx *srcCCtx, unsigned long
 	{
 		ZSTD_parameters params = srcCCtx->params;
 		params.fParams.contentSizeFlag = (pledgedSrcSize > 0);
-		ZSTD_resetCCtx_advanced(dstCCtx, params, pledgedSrcSize, ZSTDcrp_noMemset);
+		ZSTD_resetCCtx_advanced(dstCCtx, &params, pledgedSrcSize, ZSTDcrp_noMemset);
 	}
 
 	/* copy tables */
@@ -2428,16 +2431,16 @@ static size_t ZSTD_compress_generic(ZSTD_CCtx *cctx, void *dst, size_t dstCapaci
 	return op - ostart;
 }
 
-static size_t ZSTD_writeFrameHeader(void *dst, size_t dstCapacity, ZSTD_parameters params, U64 pledgedSrcSize, U32 dictID)
+static size_t ZSTD_writeFrameHeader(void *dst, size_t dstCapacity, ZSTD_parameters *params, U64 pledgedSrcSize, U32 dictID)
 {
 	BYTE *const op = (BYTE *)dst;
 	U32 const dictIDSizeCode = (dictID > 0) + (dictID >= 256) + (dictID >= 65536); /* 0-3 */
-	U32 const checksumFlag = params.fParams.checksumFlag > 0;
-	U32 const windowSize = 1U << params.cParams.windowLog;
-	U32 const singleSegment = params.fParams.contentSizeFlag && (windowSize >= pledgedSrcSize);
-	BYTE const windowLogByte = (BYTE)((params.cParams.windowLog - ZSTD_WINDOWLOG_ABSOLUTEMIN) << 3);
+	U32 const checksumFlag = params->fParams.checksumFlag > 0;
+	U32 const windowSize = 1U << params->cParams.windowLog;
+	U32 const singleSegment = params->fParams.contentSizeFlag && (windowSize >= pledgedSrcSize);
+	BYTE const windowLogByte = (BYTE)((params->cParams.windowLog - ZSTD_WINDOWLOG_ABSOLUTEMIN) << 3);
 	U32 const fcsCode =
-	    params.fParams.contentSizeFlag ? (pledgedSrcSize >= 256) + (pledgedSrcSize >= 65536 + 256) + (pledgedSrcSize >= 0xFFFFFFFFU) : 0; /* 0-3 */
+	    params->fParams.contentSizeFlag ? (pledgedSrcSize >= 256) + (pledgedSrcSize >= 65536 + 256) + (pledgedSrcSize >= 0xFFFFFFFFU) : 0; /* 0-3 */
 	BYTE const frameHeaderDecriptionByte = (BYTE)(dictIDSizeCode + (checksumFlag << 2) + (singleSegment << 5) + (fcsCode << 6));
 	size_t pos;
 
@@ -2496,7 +2499,7 @@ static size_t ZSTD_compressContinue_internal(ZSTD_CCtx *cctx, void *dst, size_t
 		return ERROR(stage_wrong); /* missing init (ZSTD_compressBegin) */
 
 	if (frame && (cctx->stage == ZSTDcs_init)) {
-		fhSize = ZSTD_writeFrameHeader(dst, dstCapacity, cctx->params, cctx->frameContentSize, cctx->dictID);
+		fhSize = ZSTD_writeFrameHeader(dst, dstCapacity, &cctx->params, cctx->frameContentSize, cctx->dictID);
 		if (ZSTD_isError(fhSize))
 			return fhSize;
 		dstCapacity -= fhSize;
@@ -2735,7 +2738,7 @@ static size_t ZSTD_compress_insertDictionary(ZSTD_CCtx *cctx, const void *dict,
 
 /*! ZSTD_compressBegin_internal() :
 *   @return : 0, or an error code */
-static size_t ZSTD_compressBegin_internal(ZSTD_CCtx *cctx, const void *dict, size_t dictSize, ZSTD_parameters params, U64 pledgedSrcSize)
+static size_t ZSTD_compressBegin_internal(ZSTD_CCtx *cctx, const void *dict, size_t dictSize, const ZSTD_parameters *params, U64 pledgedSrcSize)
 {
 	ZSTD_compResetPolicy_e const crp = dictSize ? ZSTDcrp_fullReset : ZSTDcrp_continue;
 	CHECK_F(ZSTD_resetCCtx_advanced(cctx, params, pledgedSrcSize, crp));
@@ -2744,17 +2747,17 @@ static size_t ZSTD_compressBegin_internal(ZSTD_CCtx *cctx, const void *dict, siz
 
 /*! ZSTD_compressBegin_advanced() :
 *   @return : 0, or an error code */
-size_t ZSTD_compressBegin_advanced(ZSTD_CCtx *cctx, const void *dict, size_t dictSize, ZSTD_parameters params, unsigned long long pledgedSrcSize)
+size_t ZSTD_compressBegin_advanced(ZSTD_CCtx *cctx, const void *dict, size_t dictSize, const ZSTD_parameters *params, unsigned long long pledgedSrcSize)
 {
 	/* compression parameters verification and optimization */
-	CHECK_F(ZSTD_checkCParams(params.cParams));
+	CHECK_F(ZSTD_checkCParams(params->cParams));
 	return ZSTD_compressBegin_internal(cctx, dict, dictSize, params, pledgedSrcSize);
 }
 
 size_t ZSTD_compressBegin_usingDict(ZSTD_CCtx *cctx, const void *dict, size_t dictSize, int compressionLevel)
 {
 	ZSTD_parameters const params = ZSTD_getParams(compressionLevel, 0, dictSize);
-	return ZSTD_compressBegin_internal(cctx, dict, dictSize, params, 0);
+	return ZSTD_compressBegin_internal(cctx, dict, dictSize, &params, 0);
 }
 
 size_t ZSTD_compressBegin(ZSTD_CCtx *cctx, int compressionLevel) { return ZSTD_compressBegin_usingDict(cctx, NULL, 0, compressionLevel); }
@@ -2773,7 +2776,7 @@ static size_t ZSTD_writeEpilogue(ZSTD_CCtx *cctx, void *dst, size_t dstCapacity)
 
 	/* special case : empty frame */
 	if (cctx->stage == ZSTDcs_init) {
-		fhSize = ZSTD_writeFrameHeader(dst, dstCapacity, cctx->params, 0, 0);
+		fhSize = ZSTD_writeFrameHeader(dst, dstCapacity, &cctx->params, 0, 0);
 		if (ZSTD_isError(fhSize))
 			return fhSize;
 		dstCapacity -= fhSize;
@@ -2816,19 +2819,19 @@ size_t ZSTD_compressEnd(ZSTD_CCtx *cctx, void *dst, size_t dstCapacity, const vo
 }
 
 static size_t ZSTD_compress_internal(ZSTD_CCtx *cctx, void *dst, size_t dstCapacity, const void *src, size_t srcSize, const void *dict, size_t dictSize,
-				     ZSTD_parameters params)
+				     const ZSTD_parameters *params)
 {
 	CHECK_F(ZSTD_compressBegin_internal(cctx, dict, dictSize, params, srcSize));
 	return ZSTD_compressEnd(cctx, dst, dstCapacity, src, srcSize);
 }
 
 size_t ZSTD_compress_usingDict(ZSTD_CCtx *ctx, void *dst, size_t dstCapacity, const void *src, size_t srcSize, const void *dict, size_t dictSize,
-			       ZSTD_parameters params)
+			       const ZSTD_parameters *params)
 {
 	return ZSTD_compress_internal(ctx, dst, dstCapacity, src, srcSize, dict, dictSize, params);
 }
 
-size_t ZSTD_compressCCtx(ZSTD_CCtx *ctx, void *dst, size_t dstCapacity, const void *src, size_t srcSize, ZSTD_parameters params)
+size_t ZSTD_compressCCtx(ZSTD_CCtx *ctx, void *dst, size_t dstCapacity, const void *src, size_t srcSize, const ZSTD_parameters *params)
 {
 	return ZSTD_compress_internal(ctx, dst, dstCapacity, src, srcSize, NULL, 0, params);
 }
@@ -2844,7 +2847,7 @@ struct ZSTD_CDict_s {
 
 size_t ZSTD_CDictWorkspaceBound(ZSTD_compressionParameters cParams) { return ZSTD_CCtxWorkspaceBound(cParams) + ZSTD_ALIGN(sizeof(ZSTD_CDict)); }
 
-static ZSTD_CDict *ZSTD_createCDict_advanced(const void *dictBuffer, size_t dictSize, unsigned byReference, ZSTD_parameters params, ZSTD_customMem customMem)
+static ZSTD_CDict *ZSTD_createCDict_advanced(const void *dictBuffer, size_t dictSize, unsigned byReference, const ZSTD_parameters *params, ZSTD_customMem customMem)
 {
 	if (!customMem.customAlloc || !customMem.customFree)
 		return NULL;
@@ -2890,7 +2893,7 @@ static ZSTD_CDict *ZSTD_createCDict_advanced(const void *dictBuffer, size_t dict
 	}
 }
 
-ZSTD_CDict *ZSTD_initCDict(const void *dict, size_t dictSize, ZSTD_parameters params, void *workspace, size_t workspaceSize)
+ZSTD_CDict *ZSTD_initCDict(const void *dict, size_t dictSize, const ZSTD_parameters *params, void *workspace, size_t workspaceSize)
 {
 	ZSTD_customMem const stackMem = ZSTD_initStack(workspace, workspaceSize);
 	return ZSTD_createCDict_advanced(dict, dictSize, 1, params, stackMem);
@@ -2918,7 +2921,7 @@ size_t ZSTD_compressBegin_usingCDict(ZSTD_CCtx *cctx, const ZSTD_CDict *cdict, u
 	else {
 		ZSTD_parameters params = cdict->refContext->params;
 		params.fParams.contentSizeFlag = (pledgedSrcSize > 0);
-		CHECK_F(ZSTD_compressBegin_advanced(cctx, NULL, 0, params, pledgedSrcSize));
+		CHECK_F(ZSTD_compressBegin_advanced(cctx, NULL, 0, &params, pledgedSrcSize));
 	}
 	return 0;
 }
@@ -3031,7 +3034,7 @@ static size_t ZSTD_resetCStream_internal(ZSTD_CStream *zcs, unsigned long long p
 	if (zcs->cdict)
 		CHECK_F(ZSTD_compressBegin_usingCDict(zcs->cctx, zcs->cdict, pledgedSrcSize))
 	else
-		CHECK_F(ZSTD_compressBegin_advanced(zcs->cctx, NULL, 0, zcs->params, pledgedSrcSize));
+		CHECK_F(ZSTD_compressBegin_advanced(zcs->cctx, NULL, 0, &zcs->params, pledgedSrcSize));
 
 	zcs->inToCompress = 0;
 	zcs->inBuffPos = 0;
@@ -3052,11 +3055,11 @@ size_t ZSTD_resetCStream(ZSTD_CStream *zcs, unsigned long long pledgedSrcSize)
 	return ZSTD_resetCStream_internal(zcs, pledgedSrcSize);
 }
 
-static size_t ZSTD_initCStream_advanced(ZSTD_CStream *zcs, const void *dict, size_t dictSize, ZSTD_parameters params, unsigned long long pledgedSrcSize)
+static size_t ZSTD_initCStream_advanced(ZSTD_CStream *zcs, const void *dict, size_t dictSize, const ZSTD_parameters *params, unsigned long long pledgedSrcSize)
 {
 	/* allocate buffers */
 	{
-		size_t const neededInBuffSize = (size_t)1 << params.cParams.windowLog;
+		size_t const neededInBuffSize = (size_t)1 << params->cParams.windowLog;
 		if (zcs->inBuffSize < neededInBuffSize) {
 			zcs->inBuffSize = neededInBuffSize;
 			ZSTD_free(zcs->inBuff, zcs->customMem);
@@ -3083,13 +3086,13 @@ static size_t ZSTD_initCStream_advanced(ZSTD_CStream *zcs, const void *dict, siz
 	} else
 		zcs->cdict = NULL;
 
-	zcs->checksum = params.fParams.checksumFlag > 0;
-	zcs->params = params;
+	zcs->checksum = params->fParams.checksumFlag > 0;
+	zcs->params = *params;
 
 	return ZSTD_resetCStream_internal(zcs, pledgedSrcSize);
 }
 
-ZSTD_CStream *ZSTD_initCStream(ZSTD_parameters params, unsigned long long pledgedSrcSize, void *workspace, size_t workspaceSize)
+ZSTD_CStream *ZSTD_initCStream(const ZSTD_parameters *params, unsigned long long pledgedSrcSize, void *workspace, size_t workspaceSize)
 {
 	ZSTD_customMem const stackMem = ZSTD_initStack(workspace, workspaceSize);
 	ZSTD_CStream *const zcs = ZSTD_createCStream_advanced(stackMem);
@@ -3105,7 +3108,7 @@ ZSTD_CStream *ZSTD_initCStream(ZSTD_parameters params, unsigned long long pledge
 ZSTD_CStream *ZSTD_initCStream_usingCDict(const ZSTD_CDict *cdict, unsigned long long pledgedSrcSize, void *workspace, size_t workspaceSize)
 {
 	ZSTD_parameters const params = ZSTD_getParamsFromCDict(cdict);
-	ZSTD_CStream *const zcs = ZSTD_initCStream(params, pledgedSrcSize, workspace, workspaceSize);
+	ZSTD_CStream *const zcs = ZSTD_initCStream(&params, pledgedSrcSize, workspace, workspaceSize);
 	if (zcs) {
 		zcs->cdict = cdict;
 		if (ZSTD_isError(ZSTD_resetCStream_internal(zcs, pledgedSrcSize))) {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/4] zstd: use U16 data type for rankPos
       [not found]   ` <CGME20190603090236epcas5p1bf0733024f7fb52f8129157b11c8f882@epcas5p1.samsung.com>
@ 2019-06-03  9:02     ` Maninder Singh
  0 siblings, 0 replies; 16+ messages in thread
From: Maninder Singh @ 2019-06-03  9:02 UTC (permalink / raw)
  To: akpm, herbert, davem, keescook, gustavo
  Cc: joe, linux-crypto, linux-kernel, a.sahrawat, pankaj.m, v.narang,
	Maninder Singh

rankPos structure variables value can not be more than 512.
So it can easily be declared as U16 rather than U32.

It will reduce stack usage of HUF_sort from 256 bytes to 128 bytes

original:
e92ddbf0        push    {r4, r5, r6, r7, r8, r9, fp, ip, lr, pc}
e24cb004        sub     fp, ip, #4
e24ddc01        sub     sp, sp, #256    ; 0x100

changed:
e92ddbf0        push    {r4, r5, r6, r7, r8, r9, fp, ip, lr, pc}
e24cb004        sub     fp, ip, #4
e24dd080        sub     sp, sp, #128    ; 0x80


Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Vaneet Narang <v.narang@samsung.com>
---
 lib/zstd/huf_compress.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/zstd/huf_compress.c b/lib/zstd/huf_compress.c
index e727812..2203124 100644
--- a/lib/zstd/huf_compress.c
+++ b/lib/zstd/huf_compress.c
@@ -382,8 +382,8 @@ static U32 HUF_setMaxHeight(nodeElt *huffNode, U32 lastNonNull, U32 maxNbBits)
 }
 
 typedef struct {
-	U32 base;
-	U32 curr;
+	U16 base;
+	U16 curr;
 } rankPos;
 
 static void HUF_sort(nodeElt *huffNode, const U32 *count, U32 maxSymbolValue)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/4] zstd: move params structure to global variable to reduce  stack usage
       [not found]   ` <CGME20190603090240epcas5p17d0881686df3fa3042d0b2d659e925b3@epcas5p1.samsung.com>
@ 2019-06-03  9:02     ` Maninder Singh
  2019-06-03 21:47       ` Andrew Morton
  0 siblings, 1 reply; 16+ messages in thread
From: Maninder Singh @ 2019-06-03  9:02 UTC (permalink / raw)
  To: akpm, herbert, davem, keescook, gustavo
  Cc: joe, linux-crypto, linux-kernel, a.sahrawat, pankaj.m, v.narang,
	Maninder Singh

As params structure remains same for lifetime, just initialise it
at init time and make it global variable.

Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Vaneet Narang <v.narang@samsung.com>
---
 crypto/zstd.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/crypto/zstd.c b/crypto/zstd.c
index 4e9ff22..80a9e5c 100644
--- a/crypto/zstd.c
+++ b/crypto/zstd.c
@@ -32,6 +32,8 @@ struct zstd_ctx {
 	void *dwksp;
 };
 
+static ZSTD_parameters params;
+
 static ZSTD_parameters zstd_params(void)
 {
 	return ZSTD_getParams(ZSTD_DEF_LEVEL, 0, 0);
@@ -40,8 +42,10 @@ static ZSTD_parameters zstd_params(void)
 static int zstd_comp_init(struct zstd_ctx *ctx)
 {
 	int ret = 0;
-	const ZSTD_parameters params = zstd_params();
-	const size_t wksp_size = ZSTD_CCtxWorkspaceBound(params.cParams);
+	size_t wksp_size;
+
+	params = zstd_params();
+	wksp_size = ZSTD_CCtxWorkspaceBound(params.cParams);
 
 	ctx->cwksp = vzalloc(wksp_size);
 	if (!ctx->cwksp) {
@@ -160,7 +164,6 @@ static int __zstd_compress(const u8 *src, unsigned int slen,
 {
 	size_t out_len;
 	struct zstd_ctx *zctx = ctx;
-	const ZSTD_parameters params = zstd_params();
 
 	out_len = ZSTD_compressCCtx(zctx->cctx, dst, *dlen, src, slen, &params);
 	if (ZSTD_isError(out_len))
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/4] zstd: change structure variable from int to char
       [not found]   ` <CGME20190603090245epcas5p4a6cdfdb7ef72bfd36472f43bb4e1e0f1@epcas5p4.samsung.com>
@ 2019-06-03  9:02     ` Maninder Singh
  0 siblings, 0 replies; 16+ messages in thread
From: Maninder Singh @ 2019-06-03  9:02 UTC (permalink / raw)
  To: akpm, herbert, davem, keescook, gustavo
  Cc: joe, linux-crypto, linux-kernel, a.sahrawat, pankaj.m, v.narang,
	Maninder Singh

Elements of ZSTD_frameParameters structure are used as flag, so
just declare them as char. ZSTD_frameParameters structure is used by
ZSTD_parameters.

Before:
======
sizeof(ZSTD_parameters)
$1 = 40

After:
=====
sizeof(ZSTD_parameters)
$1 = 32

Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Vaneet Narang <v.narang@samsung.com>
---
 include/linux/zstd.h  | 6 +++---
 lib/zstd/compress.c   | 4 ++--
 lib/zstd/decompress.c | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/zstd.h b/include/linux/zstd.h
index 5103efa..a1e3483 100644
--- a/include/linux/zstd.h
+++ b/include/linux/zstd.h
@@ -164,9 +164,9 @@ typedef struct {
  * The default value is all fields set to 0.
  */
 typedef struct {
-	unsigned int contentSizeFlag;
-	unsigned int checksumFlag;
-	unsigned int noDictIDFlag;
+	unsigned char contentSizeFlag;
+	unsigned char checksumFlag;
+	unsigned char noDictIDFlag;
 } ZSTD_frameParameters;
 
 /**
diff --git a/lib/zstd/compress.c b/lib/zstd/compress.c
index 306e31b..7e0cea2 100644
--- a/lib/zstd/compress.c
+++ b/lib/zstd/compress.c
@@ -2435,9 +2435,9 @@ static size_t ZSTD_writeFrameHeader(void *dst, size_t dstCapacity, ZSTD_paramete
 {
 	BYTE *const op = (BYTE *)dst;
 	U32 const dictIDSizeCode = (dictID > 0) + (dictID >= 256) + (dictID >= 65536); /* 0-3 */
-	U32 const checksumFlag = params->fParams.checksumFlag > 0;
+	BYTE const checksumFlag = params->fParams.checksumFlag > 0;
 	U32 const windowSize = 1U << params->cParams.windowLog;
-	U32 const singleSegment = params->fParams.contentSizeFlag && (windowSize >= pledgedSrcSize);
+	BYTE const singleSegment = params->fParams.contentSizeFlag && (windowSize >= pledgedSrcSize);
 	BYTE const windowLogByte = (BYTE)((params->cParams.windowLog - ZSTD_WINDOWLOG_ABSOLUTEMIN) << 3);
 	U32 const fcsCode =
 	    params->fParams.contentSizeFlag ? (pledgedSrcSize >= 256) + (pledgedSrcSize >= 65536 + 256) + (pledgedSrcSize >= 0xFFFFFFFFU) : 0; /* 0-3 */
diff --git a/lib/zstd/decompress.c b/lib/zstd/decompress.c
index 269ee9a..7828264 100644
--- a/lib/zstd/decompress.c
+++ b/lib/zstd/decompress.c
@@ -233,7 +233,7 @@ size_t ZSTD_getFrameParams(ZSTD_frameParams *fparamsPtr, const void *src, size_t
 		BYTE const fhdByte = ip[4];
 		size_t pos = 5;
 		U32 const dictIDSizeCode = fhdByte & 3;
-		U32 const checksumFlag = (fhdByte >> 2) & 1;
+		BYTE const checksumFlag = (fhdByte >> 2) & 1;
 		U32 const singleSegment = (fhdByte >> 5) & 1;
 		U32 const fcsID = fhdByte >> 6;
 		U32 const windowSizeMax = 1U << ZSTD_WINDOWLOG_MAX;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] zstd: pass pointer rathen than structure to functions
  2019-06-03  9:02     ` [PATCH 1/4] zstd: pass pointer rathen than structure to functions Maninder Singh
@ 2019-06-03 21:41       ` Andrew Morton
  2019-06-04 22:43       ` Andrew Morton
       [not found]       ` <CGME20190603090232epcas5p1630d0584e8a1aa9495edc819605664fc@epcms5p1>
  2 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2019-06-03 21:41 UTC (permalink / raw)
  To: Maninder Singh
  Cc: herbert, davem, keescook, gustavo, joe, linux-crypto,
	linux-kernel, a.sahrawat, pankaj.m, v.narang

On Mon,  3 Jun 2019 14:32:03 +0530 Maninder Singh <maninder1.s@samsung.com> wrote:

> currently params structure is passed in all functions, which increases
> stack usage in all the function and lead to stack overflow on target like
> ARM with kernel stack size of 8 KB so better to pass pointer.
> 
> Checked for ARM:
> 
>                                 Original               Patched
> Call FLow Size:                  1264                   1040
> ....
> (HUF_sort)                      -> 296
> (HUF_buildCTable_wksp)          -> 144
> (HUF_compress4X_repeat)         -> 88
> (ZSTD_compressBlock_internal)   -> 200
> (ZSTD_compressContinue_internal)-> 136                  -> 88
> (ZSTD_compressCCtx)             -> 192                  -> 64
> (zstd_compress)                 -> 144                  -> 96
> (crypto_compress)               -> 32
> (zcomp_compress)                -> 32
> ....
> 
> ...
>
> --- a/crypto/zstd.c
> +++ b/crypto/zstd.c
> @@ -162,7 +162,7 @@ static int __zstd_compress(const u8 *src, unsigned int slen,
>  	struct zstd_ctx *zctx = ctx;
>  	const ZSTD_parameters params = zstd_params();
>  
> -	out_len = ZSTD_compressCCtx(zctx->cctx, dst, *dlen, src, slen, params);
> +	out_len = ZSTD_compressCCtx(zctx->cctx, dst, *dlen, src, slen, &params);
>  	if (ZSTD_isError(out_len))
>  		return -EINVAL;
>  	*dlen = out_len;
> diff --git a/include/linux/zstd.h b/include/linux/zstd.h
> index 249575e..5103efa 100644
> --- a/include/linux/zstd.h
> +++ b/include/linux/zstd.h
> @@ -254,7 +254,7 @@ ZSTD_CCtx *ZSTD_initCCtx(void *workspace, size_t workspaceSize);
>   *               ZSTD_isError().
>   */
>  size_t ZSTD_compressCCtx(ZSTD_CCtx *ctx, void *dst, size_t dstCapacity,
> -	const void *src, size_t srcSize, ZSTD_parameters params);
> +	const void *src, size_t srcSize, const ZSTD_parameters *params);

Making the params poniter const made review a lot easier ;)

But struct ZSTD_CCtx_s.params is still a copied structure.  Could we
make it `const ZSTD_parameters *params'?  Probably not, due to lifetime
issues?



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] zstd: move params structure to global variable to reduce  stack usage
  2019-06-03  9:02     ` [PATCH 3/4] zstd: move params structure to global variable to reduce stack usage Maninder Singh
@ 2019-06-03 21:47       ` Andrew Morton
  0 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2019-06-03 21:47 UTC (permalink / raw)
  To: Maninder Singh
  Cc: herbert, davem, keescook, gustavo, joe, linux-crypto,
	linux-kernel, a.sahrawat, pankaj.m, v.narang

On Mon,  3 Jun 2019 14:32:05 +0530 Maninder Singh <maninder1.s@samsung.com> wrote:

>
> Subject: [PATCH 3/4] zstd: move params structure to global variable to reduce  stack usage

If this affected lib/zstd/ I'd be alarmed.  But it's a client of
lib/zstd that is choosing to have a single kernel-wide copy.  I'll
rewrite this patch title to "crypto/zstd.c: ...".



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4] zstd: reduce stack usage
  2019-06-03  9:02 ` [PATCH 0/4] zstd: reduce stack usage Maninder Singh
                     ` (3 preceding siblings ...)
       [not found]   ` <CGME20190603090245epcas5p4a6cdfdb7ef72bfd36472f43bb4e1e0f1@epcas5p4.samsung.com>
@ 2019-06-03 21:49   ` Andrew Morton
       [not found]   ` <CGME20190603090227epcas5p348327061a3facbb9dfcf662bf2bc196e@epcms5p3>
  5 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2019-06-03 21:49 UTC (permalink / raw)
  To: Maninder Singh
  Cc: herbert, davem, keescook, gustavo, joe, linux-crypto,
	linux-kernel, a.sahrawat, pankaj.m, v.narang

On Mon,  3 Jun 2019 14:32:02 +0530 Maninder Singh <maninder1.s@samsung.com> wrote:

> This patch set reduces stack usage for zstd code, because target like ARM has
> limited 8KB kernel stack, which is getting overflowed due to hight stack usage
> of zstd code with call flow like:

That's rather bad behaviour.  I assume the patchset actually fixes this?

I think I'll schedule the patchset for 5.2-rcX so that zstd is actually
usable on arm in 5.2.  Does that sound OK?


^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE:(2) [PATCH 0/4] zstd: reduce stack usage
       [not found]   ` <CGME20190603090227epcas5p348327061a3facbb9dfcf662bf2bc196e@epcms5p3>
@ 2019-06-04 12:06     ` Vaneet Narang
  0 siblings, 0 replies; 16+ messages in thread
From: Vaneet Narang @ 2019-06-04 12:06 UTC (permalink / raw)
  To: Andrew Morton, Maninder Singh
  Cc: herbert, davem, keescook, gustavo, joe, linux-crypto,
	linux-kernel, AMIT SAHRAWAT, PANKAJ MISHRA, Vaneet Narang

Hi Andrew,

>> This patch set reduces stack usage for zstd code, because target like ARM has
>> limited 8KB kernel stack, which is getting overflowed due to hight stack usage
>> of zstd code with call flow like:
 
>That's rather bad behaviour.  I assume the patchset actually fixes this?

Yes, patchset tries to reduce around 300 bytes of stack usage of zstd compression path. 
We faced high stack usage issue on switching compression algo from LZO/LZ4 to zstd algo.
zstd compression uses around 1200 bytes of stack which is huge as compared 
to LZO/LZ4 which uses negligible stack (< 200 bytes).

 
>I think I'll schedule the patchset for 5.2-rcX so that zstd is actually
>usable on arm in 5.2.  Does that sound OK? 
OK

Regards,
Vaneet Narang


^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE:(2) [PATCH 1/4] zstd: pass pointer rathen than structure to functions
       [not found]       ` <CGME20190603090232epcas5p1630d0584e8a1aa9495edc819605664fc@epcms5p1>
@ 2019-06-04 13:19         ` Vaneet Narang
  2019-06-06 14:10         ` Vaneet Narang
  1 sibling, 0 replies; 16+ messages in thread
From: Vaneet Narang @ 2019-06-04 13:19 UTC (permalink / raw)
  To: Andrew Morton, Maninder Singh
  Cc: herbert, davem, keescook, gustavo, joe, linux-crypto,
	linux-kernel, AMIT SAHRAWAT, PANKAJ MISHRA, Vaneet Narang

Hi Andrew,
 
>But struct ZSTD_CCtx_s.params is still a copied structure.  Could we
>make it `const ZSTD_parameters *params'?  Probably not, due to lifetime
>issues?

ZSTD maintains its own ctxt. Yes we can avoid storing pointers of other modules
because of lifetime issues. We should keep ZSTD_CCtx_s.params as structure instead of pointer.


Thanks & Regards,
Vaneet Narang 

 
 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] zstd: pass pointer rathen than structure to functions
  2019-06-03  9:02     ` [PATCH 1/4] zstd: pass pointer rathen than structure to functions Maninder Singh
  2019-06-03 21:41       ` Andrew Morton
@ 2019-06-04 22:43       ` Andrew Morton
  2019-06-05 11:57         ` David Sterba
       [not found]       ` <CGME20190603090232epcas5p1630d0584e8a1aa9495edc819605664fc@epcms5p1>
  2 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2019-06-04 22:43 UTC (permalink / raw)
  To: Maninder Singh
  Cc: herbert, davem, keescook, gustavo, joe, linux-crypto,
	linux-kernel, a.sahrawat, pankaj.m, v.narang, Chris Mason,
	Josef Bacik, David Sterba, linux-btrfs

On Mon,  3 Jun 2019 14:32:03 +0530 Maninder Singh <maninder1.s@samsung.com> wrote:

> currently params structure is passed in all functions, which increases
> stack usage in all the function and lead to stack overflow on target like
> ARM with kernel stack size of 8 KB so better to pass pointer.
> 
> Checked for ARM:
> 
>                                 Original               Patched
> Call FLow Size:                  1264                   1040
> ....
> (HUF_sort)                      -> 296
> (HUF_buildCTable_wksp)          -> 144
> (HUF_compress4X_repeat)         -> 88
> (ZSTD_compressBlock_internal)   -> 200
> (ZSTD_compressContinue_internal)-> 136                  -> 88
> (ZSTD_compressCCtx)             -> 192                  -> 64
> (zstd_compress)                 -> 144                  -> 96
> (crypto_compress)               -> 32
> (zcomp_compress)                -> 32
> ....
> 
> Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
> Signed-off-by: Vaneet Narang <v.narang@samsung.com>
> 

You missed btrfs.  This needs review, please - particularly the
kernel-wide static ZSTD_parameters in zstd_get_btrfs_parameters().

The base patch is here:

http://lkml.kernel.org/r/1559552526-4317-2-git-send-email-maninder1.s@samsung.com  

--- a/fs/btrfs/zstd.c~zstd-pass-pointer-rathen-than-structure-to-functions-fix
+++ a/fs/btrfs/zstd.c
@@ -27,15 +27,17 @@
 /* 307s to avoid pathologically clashing with transaction commit */
 #define ZSTD_BTRFS_RECLAIM_JIFFIES (307 * HZ)
 
-static ZSTD_parameters zstd_get_btrfs_parameters(unsigned int level,
+static ZSTD_parameters *zstd_get_btrfs_parameters(unsigned int level,
 						 size_t src_len)
 {
-	ZSTD_parameters params = ZSTD_getParams(level, src_len, 0);
+	static ZSTD_parameters params;
+
+	params = ZSTD_getParams(level, src_len, 0);
 
 	if (params.cParams.windowLog > ZSTD_BTRFS_MAX_WINDOWLOG)
 		params.cParams.windowLog = ZSTD_BTRFS_MAX_WINDOWLOG;
 	WARN_ON(src_len > ZSTD_BTRFS_MAX_INPUT);
-	return params;
+	return &params;
 }
 
 struct workspace {
@@ -155,11 +157,12 @@ static void zstd_calc_ws_mem_sizes(void)
 	unsigned int level;
 
 	for (level = 1; level <= ZSTD_BTRFS_MAX_LEVEL; level++) {
-		ZSTD_parameters params =
-			zstd_get_btrfs_parameters(level, ZSTD_BTRFS_MAX_INPUT);
-		size_t level_size =
-			max_t(size_t,
-			      ZSTD_CStreamWorkspaceBound(params.cParams),
+		ZSTD_parameters *params;
+		size_t level_size;
+
+		params = zstd_get_btrfs_parameters(level, ZSTD_BTRFS_MAX_INPUT);
+		level_size = max_t(size_t,
+			      ZSTD_CStreamWorkspaceBound(params->cParams),
 			      ZSTD_DStreamWorkspaceBound(ZSTD_BTRFS_MAX_INPUT));
 
 		max_size = max_t(size_t, max_size, level_size);
@@ -385,8 +388,9 @@ static int zstd_compress_pages(struct li
 	unsigned long len = *total_out;
 	const unsigned long nr_dest_pages = *out_pages;
 	unsigned long max_out = nr_dest_pages * PAGE_SIZE;
-	ZSTD_parameters params = zstd_get_btrfs_parameters(workspace->req_level,
-							   len);
+	ZSTD_parameters *params;
+
+	params = zstd_get_btrfs_parameters(workspace->req_level, len);
 
 	*out_pages = 0;
 	*total_out = 0;
_


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] zstd: pass pointer rathen than structure to functions
  2019-06-04 22:43       ` Andrew Morton
@ 2019-06-05 11:57         ` David Sterba
  2019-06-05 12:32           ` David Sterba
  0 siblings, 1 reply; 16+ messages in thread
From: David Sterba @ 2019-06-05 11:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Maninder Singh, herbert, davem, keescook, gustavo, joe,
	linux-crypto, linux-kernel, a.sahrawat, pankaj.m, v.narang,
	Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Tue, Jun 04, 2019 at 03:43:26PM -0700, Andrew Morton wrote:
> On Mon,  3 Jun 2019 14:32:03 +0530 Maninder Singh <maninder1.s@samsung.com> wrote:
> 
> > currently params structure is passed in all functions, which increases
> > stack usage in all the function and lead to stack overflow on target like
> > ARM with kernel stack size of 8 KB so better to pass pointer.
> > 
> > Checked for ARM:
> > 
> >                                 Original               Patched
> > Call FLow Size:                  1264                   1040
> > ....
> > (HUF_sort)                      -> 296
> > (HUF_buildCTable_wksp)          -> 144
> > (HUF_compress4X_repeat)         -> 88
> > (ZSTD_compressBlock_internal)   -> 200
> > (ZSTD_compressContinue_internal)-> 136                  -> 88
> > (ZSTD_compressCCtx)             -> 192                  -> 64
> > (zstd_compress)                 -> 144                  -> 96
> > (crypto_compress)               -> 32
> > (zcomp_compress)                -> 32
> > ....
> > 
> > Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
> > Signed-off-by: Vaneet Narang <v.narang@samsung.com>
> > 
> 
> You missed btrfs.  This needs review, please - particularly the
> kernel-wide static ZSTD_parameters in zstd_get_btrfs_parameters().

> 
> The base patch is here:
> 
> http://lkml.kernel.org/r/1559552526-4317-2-git-send-email-maninder1.s@samsung.com  
> 
> --- a/fs/btrfs/zstd.c~zstd-pass-pointer-rathen-than-structure-to-functions-fix
> +++ a/fs/btrfs/zstd.c
> @@ -27,15 +27,17 @@
>  /* 307s to avoid pathologically clashing with transaction commit */
>  #define ZSTD_BTRFS_RECLAIM_JIFFIES (307 * HZ)
>  
> -static ZSTD_parameters zstd_get_btrfs_parameters(unsigned int level,
> +static ZSTD_parameters *zstd_get_btrfs_parameters(unsigned int level,
>  						 size_t src_len)
>  {
> -	ZSTD_parameters params = ZSTD_getParams(level, src_len, 0);
> +	static ZSTD_parameters params;

> +
> +	params = ZSTD_getParams(level, src_len, 0);

No thats' broken, the params can't be static as it depends on level and
src_len. What happens if there are several requests in parallel with
eg. different levels?

Would be really great if the mailinglist is CCed when the code is
changed in a non-trivial way.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] zstd: pass pointer rathen than structure to functions
  2019-06-05 11:57         ` David Sterba
@ 2019-06-05 12:32           ` David Sterba
  2019-06-05 21:32             ` Andrew Morton
  0 siblings, 1 reply; 16+ messages in thread
From: David Sterba @ 2019-06-05 12:32 UTC (permalink / raw)
  To: Andrew Morton, Maninder Singh, herbert, davem, keescook, gustavo,
	joe, linux-crypto, linux-kernel, a.sahrawat, pankaj.m, v.narang,
	Chris Mason, Josef Bacik, David Sterba, linux-btrfs, terrelln

On Wed, Jun 05, 2019 at 01:57:03PM +0200, David Sterba wrote:
> On Tue, Jun 04, 2019 at 03:43:26PM -0700, Andrew Morton wrote:
> > On Mon,  3 Jun 2019 14:32:03 +0530 Maninder Singh <maninder1.s@samsung.com> wrote:
> > 
> > > currently params structure is passed in all functions, which increases
> > > stack usage in all the function and lead to stack overflow on target like
> > > ARM with kernel stack size of 8 KB so better to pass pointer.
> > > 
> > > Checked for ARM:
> > > 
> > >                                 Original               Patched
> > > Call FLow Size:                  1264                   1040
> > > ....
> > > (HUF_sort)                      -> 296
> > > (HUF_buildCTable_wksp)          -> 144
> > > (HUF_compress4X_repeat)         -> 88
> > > (ZSTD_compressBlock_internal)   -> 200
> > > (ZSTD_compressContinue_internal)-> 136                  -> 88
> > > (ZSTD_compressCCtx)             -> 192                  -> 64
> > > (zstd_compress)                 -> 144                  -> 96
> > > (crypto_compress)               -> 32
> > > (zcomp_compress)                -> 32
> > > ....
> > > 
> > > Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
> > > Signed-off-by: Vaneet Narang <v.narang@samsung.com>
> > > 
> > 
> > You missed btrfs.  This needs review, please - particularly the
> > kernel-wide static ZSTD_parameters in zstd_get_btrfs_parameters().
> 
> > 
> > The base patch is here:
> > 
> > http://lkml.kernel.org/r/1559552526-4317-2-git-send-email-maninder1.s@samsung.com  
> > 
> > --- a/fs/btrfs/zstd.c~zstd-pass-pointer-rathen-than-structure-to-functions-fix
> > +++ a/fs/btrfs/zstd.c
> > @@ -27,15 +27,17 @@
> >  /* 307s to avoid pathologically clashing with transaction commit */
> >  #define ZSTD_BTRFS_RECLAIM_JIFFIES (307 * HZ)
> >  
> > -static ZSTD_parameters zstd_get_btrfs_parameters(unsigned int level,
> > +static ZSTD_parameters *zstd_get_btrfs_parameters(unsigned int level,
> >  						 size_t src_len)
> >  {
> > -	ZSTD_parameters params = ZSTD_getParams(level, src_len, 0);
> > +	static ZSTD_parameters params;
> 
> > +
> > +	params = ZSTD_getParams(level, src_len, 0);
> 
> No thats' broken, the params can't be static as it depends on level and
> src_len. What happens if there are several requests in parallel with
> eg. different levels?
> 
> Would be really great if the mailinglist is CCed when the code is
> changed in a non-trivial way.

So this does not compile fs/btrfs/zstd.o which Andrew probably found
too, otherwise btrfs is the only in-tree user of the function outside of
lib/ and crypto/.

I think that Nick Terrell should have been CCed too, as he ported zstd
to linux.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] zstd: pass pointer rathen than structure to functions
  2019-06-05 12:32           ` David Sterba
@ 2019-06-05 21:32             ` Andrew Morton
  0 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2019-06-05 21:32 UTC (permalink / raw)
  To: dsterba
  Cc: Maninder Singh, herbert, davem, keescook, gustavo, joe,
	linux-crypto, linux-kernel, a.sahrawat, pankaj.m, v.narang,
	Chris Mason, Josef Bacik, David Sterba, linux-btrfs, terrelln

On Wed, 5 Jun 2019 14:32:53 +0200 David Sterba <dsterba@suse.cz> wrote:

> > >  
> > > -static ZSTD_parameters zstd_get_btrfs_parameters(unsigned int level,
> > > +static ZSTD_parameters *zstd_get_btrfs_parameters(unsigned int level,
> > >  						 size_t src_len)
> > >  {
> > > -	ZSTD_parameters params = ZSTD_getParams(level, src_len, 0);
> > > +	static ZSTD_parameters params;
> > 
> > > +
> > > +	params = ZSTD_getParams(level, src_len, 0);
> > 
> > No thats' broken, the params can't be static as it depends on level and
> > src_len. What happens if there are several requests in parallel with
> > eg. different levels?

I wondered.  I'll drop the patch series as some more serious thinking
is needed.

> > Would be really great if the mailinglist is CCed when the code is
> > changed in a non-trivial way.

Well we didn't actually change btrfs until I discovered that Maninder
had missed it.

> So this does not compile fs/btrfs/zstd.o which Andrew probably found
> too, otherwise btrfs is the only in-tree user of the function outside of
> lib/ and crypto/.

Worked for me - I might have sent the wrong version.

> I think that Nick Terrell should have been CCed too, as he ported zstd
> to linux.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE:(2) [PATCH 1/4] zstd: pass pointer rathen than structure to functions
       [not found]       ` <CGME20190603090232epcas5p1630d0584e8a1aa9495edc819605664fc@epcms5p1>
  2019-06-04 13:19         ` Vaneet Narang
@ 2019-06-06 14:10         ` Vaneet Narang
  2019-06-06 20:14           ` (2) " Nick Terrell
  1 sibling, 1 reply; 16+ messages in thread
From: Vaneet Narang @ 2019-06-06 14:10 UTC (permalink / raw)
  To: Andrew Morton, dsterba
  Cc: Maninder Singh, herbert, davem, keescook, gustavo, joe,
	linux-crypto, linux-kernel, AMIT SAHRAWAT, PANKAJ MISHRA,
	Vaneet Narang, Chris Mason, Josef Bacik, David Sterba,
	linux-btrfs, terrelln

Hi Andrew / David,

 
>> > > -        ZSTD_parameters params = ZSTD_getParams(level, src_len, 0);
>> > > +        static ZSTD_parameters params;
>> > 
>> > > +
>> > > +        params = ZSTD_getParams(level, src_len, 0);
>> > 
>> > No thats' broken, the params can't be static as it depends on level and
>> > src_len. What happens if there are several requests in parallel with
>> > eg. different levels?

There is no need to make static for btrfs. We can keep it as a stack variable.
This patch set  focussed on reducing stack usage of zstd compression when triggered
through zram. ZRAM internally uses crypto and currently crpto uses fixed level and also
not dependent upon source length.

crypto/zstd.c:  
static ZSTD_parameters zstd_params(void)
{
        return ZSTD_getParams(ZSTD_DEF_LEVEL, 0, 0);
}


Actually high stack usage problem with zstd compression patch gets exploited more incase of 
shrink path which gets triggered randomly from any call flow in case of low memory and adds overhead
of more than 2000 byte of stack and results in stack overflow.

Stack usage of alloc_page in case of low memory

   72   HUF_compressWeights_wksp+0x140/0x200  
   64   HUF_writeCTable_wksp+0xdc/0x1c8      
   88   HUF_compress4X_repeat+0x214/0x450     
  208   ZSTD_compressBlock_internal+0x224/0x137c
  136   ZSTD_compressContinue_internal+0x210/0x3b0
  192   ZSTD_compressCCtx+0x6c/0x144
  144   zstd_compress+0x40/0x58
   32   crypto_compress+0x2c/0x34
   32   zcomp_compress+0x3c/0x44
   80   zram_bvec_rw+0x2f8/0xa7c
   64   zram_rw_page+0x104/0x170
   48   bdev_write_page+0x80/0xb4
  112   __swap_writepage+0x160/0x29c
   24   swap_writepage+0x3c/0x58
  160   shrink_page_list+0x788/0xae0
  128   shrink_inactive_list+0x210/0x4a8
  184   shrink_zone+0x53c/0x7c0
  160   try_to_free_pages+0x2fc/0x7cc
   80   __alloc_pages_nodemask+0x534/0x91c

Thanks & Regards,
Vaneet Narang 
 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: (2) [PATCH 1/4] zstd: pass pointer rathen than structure to functions
  2019-06-06 14:10         ` Vaneet Narang
@ 2019-06-06 20:14           ` Nick Terrell
  0 siblings, 0 replies; 16+ messages in thread
From: Nick Terrell @ 2019-06-06 20:14 UTC (permalink / raw)
  To: Vaneet Narang
  Cc: Andrew Morton, dsterba, Maninder Singh, herbert, davem, keescook,
	gustavo, joe, linux-crypto, linux-kernel, AMIT SAHRAWAT,
	PANKAJ MISHRA, Chris Mason, Josef Bacik, David Sterba,
	linux-btrfs



> On Jun 6, 2019, at 7:10 AM, Vaneet Narang <v.narang@samsung.com> wrote:
> 
> Hi Andrew / David,
> 
>  
>>>  > > -        ZSTD_parameters params = ZSTD_getParams(level, src_len, 0);
>>>  > > +        static ZSTD_parameters params;
>>>  > 
>>>  > > +
>>>  > > +        params = ZSTD_getParams(level, src_len, 0);
>>>  > 
>>>  > No thats' broken, the params can't be static as it depends on level and
>>>  > src_len. What happens if there are several requests in parallel with
>>>  > eg. different levels?
> 
> There is no need to make static for btrfs. We can keep it as a stack variable.
> This patch set  focussed on reducing stack usage of zstd compression when triggered
> through zram. ZRAM internally uses crypto and currently crpto uses fixed level and also
> not dependent upon source length.

Can we measure the performance of these patches on btrfs and/or zram? See the benchmarks
I ran on my original patch to btrfs for reference https://lore.kernel.org/patchwork/patch/802866/.

I don't expect a speed difference, but I think it is worth measuring.

> crypto/zstd.c:  
> static ZSTD_parameters zstd_params(void)
> {
>        return ZSTD_getParams(ZSTD_DEF_LEVEL, 0, 0);
> }
> 
> 
> Actually high stack usage problem with zstd compression patch gets exploited more incase of 
> shrink path which gets triggered randomly from any call flow in case of low memory and adds overhead
> of more than 2000 byte of stack and results in stack overflow.
> 
> Stack usage of alloc_page in case of low memory
> 
>   72   HUF_compressWeights_wksp+0x140/0x200  
>   64   HUF_writeCTable_wksp+0xdc/0x1c8      
>   88   HUF_compress4X_repeat+0x214/0x450     
>  208   ZSTD_compressBlock_internal+0x224/0x137c
>  136   ZSTD_compressContinue_internal+0x210/0x3b0
>  192   ZSTD_compressCCtx+0x6c/0x144
>  144   zstd_compress+0x40/0x58
>   32   crypto_compress+0x2c/0x34
>   32   zcomp_compress+0x3c/0x44
>   80   zram_bvec_rw+0x2f8/0xa7c
>   64   zram_rw_page+0x104/0x170
>   48   bdev_write_page+0x80/0xb4
>  112   __swap_writepage+0x160/0x29c
>   24   swap_writepage+0x3c/0x58
>  160   shrink_page_list+0x788/0xae0
>  128   shrink_inactive_list+0x210/0x4a8
>  184   shrink_zone+0x53c/0x7c0
>  160   try_to_free_pages+0x2fc/0x7cc
>   80   __alloc_pages_nodemask+0x534/0x91c
> 
> Thanks & Regards,
> Vaneet Narang 
>  


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-06-06 20:15 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CGME20190603090227epcas5p348327061a3facbb9dfcf662bf2bc196e@epcas5p3.samsung.com>
2019-06-03  9:02 ` [PATCH 0/4] zstd: reduce stack usage Maninder Singh
     [not found]   ` <CGME20190603090232epcas5p1630d0584e8a1aa9495edc819605664fc@epcas5p1.samsung.com>
2019-06-03  9:02     ` [PATCH 1/4] zstd: pass pointer rathen than structure to functions Maninder Singh
2019-06-03 21:41       ` Andrew Morton
2019-06-04 22:43       ` Andrew Morton
2019-06-05 11:57         ` David Sterba
2019-06-05 12:32           ` David Sterba
2019-06-05 21:32             ` Andrew Morton
     [not found]       ` <CGME20190603090232epcas5p1630d0584e8a1aa9495edc819605664fc@epcms5p1>
2019-06-04 13:19         ` Vaneet Narang
2019-06-06 14:10         ` Vaneet Narang
2019-06-06 20:14           ` (2) " Nick Terrell
     [not found]   ` <CGME20190603090236epcas5p1bf0733024f7fb52f8129157b11c8f882@epcas5p1.samsung.com>
2019-06-03  9:02     ` [PATCH 2/4] zstd: use U16 data type for rankPos Maninder Singh
     [not found]   ` <CGME20190603090240epcas5p17d0881686df3fa3042d0b2d659e925b3@epcas5p1.samsung.com>
2019-06-03  9:02     ` [PATCH 3/4] zstd: move params structure to global variable to reduce stack usage Maninder Singh
2019-06-03 21:47       ` Andrew Morton
     [not found]   ` <CGME20190603090245epcas5p4a6cdfdb7ef72bfd36472f43bb4e1e0f1@epcas5p4.samsung.com>
2019-06-03  9:02     ` [PATCH 4/4] zstd: change structure variable from int to char Maninder Singh
2019-06-03 21:49   ` [PATCH 0/4] zstd: reduce stack usage Andrew Morton
     [not found]   ` <CGME20190603090227epcas5p348327061a3facbb9dfcf662bf2bc196e@epcms5p3>
2019-06-04 12:06     ` Vaneet Narang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).