linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/6] crypto: benchmark - add the crypto benchmark
@ 2022-09-19 12:05 Yang Shen
  2022-09-19 12:05 ` [RFC PATCH 1/6] moduleparams: Add hexulong type parameter Yang Shen
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Yang Shen @ 2022-09-19 12:05 UTC
  To: herbert, davem; +Cc: linux-kernel, linux-crypto, gregkh

Add the crypto benchmark, a tool that helps users quickly measure the
performance of an algorithm registered with the crypto subsystem.

The tool uses a single API to unify the test flow across different
algorithms, while each algorithm can perform its own private operations
in the callbacks. Users therefore see one unified set of configuration
parameters rather than a separate set for each algorithm.

This tool lets users test algorithm performance in specific scenarios.
At present, the following parameters are user-configurable: block size,
block number, thread number, bound NUMA nodes and number of requests per
tfm. These parameters help users approximate their business scenarios.

In this RFC version, only the compression benchmark is supported.
I verified it on a Kunpeng920.

The first test case uses the zlib-deflate software algorithm, with the
CPU running at 2.6 GHz. It shows the influence of these parameters.

The configuration is as follows:
run set: algorithm zlib-deflate, algtype CRYPTO_COMPRESS, inputsize 1024,
loop 1, numamask 0x0, optype 0, reqnum 1, threadnum 1, time 1.
The result is:
Crypto benchmark result:
        throughput      pps             time
        150 MB/s        150 kPP/s       1000 ms

Then change the block size:
run set: algorithm zlib-deflate, algtype CRYPTO_COMPRESS, inputsize 8192,
loop 1, numamask 0x0, optype 0, reqnum 1, threadnum 1, time 1.
Crypto benchmark result:
        throughput      pps             time
        473 MB/s        59 kPP/s        1005 ms

run set: algorithm zlib-deflate, algtype CRYPTO_COMPRESS, inputsize 65536,
loop 1, numamask 0x0, optype 0, reqnum 1, threadnum 1, time 1.
Crypto benchmark result:
        throughput      pps             time
        421 MB/s        6 kPP/s         1038 ms

These tests show that, on this server, both throughput and pps are
influenced by the block size: throughput reaches a peak value, while pps
decreases as the block size increases.
Because this is a software algorithm, the results increase linearly with
the thread number as long as it is less than the number of CPUs, and the
other parameters have little influence on performance.
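
The two result columns are linked by the units that crypto_bm_show_perf()
in patch 2 uses (1 MB = 1024 * 1024 bytes, 1 kPP = 1024 requests), so
MB/s equals kPP/s multiplied by the block size in KiB. A minimal C sketch
of that conversion (the helper name is hypothetical). Because the printed
pps is truncated to an integer, reconstructing throughput from it is only
approximate, and noticeably low for large blocks (6 kPP/s x 64 KiB gives
384 MB/s against the reported 421 MB/s).

```c
#include <assert.h>

/*
 * Hypothetical helper: convert a pps figure in kPP/s (1 kPP = 1024
 * requests) into MB/s (1 MB = 1024 * 1024 bytes) for a block size in bytes.
 */
static unsigned long long kpps_to_mbps(unsigned long long kpps,
				       unsigned long long blocksize)
{
	assert(blocksize > 0);
	/* kpps * 1024 req/s * blocksize bytes / (1024 * 1024 bytes per MB) */
	return kpps * blocksize / 1024;
}
```

With the figures above, kpps_to_mbps(150, 1024) gives 150 and
kpps_to_mbps(59, 8192) gives 472 (reported: 473). Throughput can keep
rising with the block size while pps falls, as long as pps falls more
slowly than the block size grows.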

The second test case uses the hardware zlib-deflate implementation. The
parameters tested above have the same effect on hardware. Here I test the
'reqnum' parameter. The software algorithm is registered as synchronous,
so 'reqnum' has no effect on software performance.

run set: algorithm zlib-deflate, algtype CRYPTO_COMPRESS, inputsize 8192,
loop 1, numamask 0x0, optype 0, reqnum 1, threadnum 1, time 1.
Crypto benchmark result:
        throughput      pps             time
        367 MB/s        46 kPP/s        941 ms

run set: algorithm zlib-deflate, algtype CRYPTO_COMPRESS, inputsize 8192,
loop 1, numamask 0x0, optype 0, reqnum 10, threadnum 1, time 1.
Crypto benchmark result:
        throughput      pps             time
        3507 MB/s       438 kPP/s       1003 ms

run set: algorithm zlib-deflate, algtype CRYPTO_COMPRESS, inputsize 8192,
loop 1, numamask 0x0, optype 0, reqnum 100, threadnum 1, time 1.
Crypto benchmark result:
        throughput      pps             time
        6318 MB/s       790 kPP/s       1093 ms

This shows that, for asynchronous algorithms, the number of requests per
tfm also affects throughput and pps, up to a peak value.

With this tool, we can quickly evaluate different platforms and obtain
reference data for configuring business scenarios.

Yang Shen (6):
  moduleparams: Add hexulong type parameter
  crypto: benchmark - add a crypto benchmark tool
  crytpo: benchmark - support compression/decompresssion
  crypto: benchmark - add help information
  crypto: benchmark - add API documentation
  MAINTAINERS: add crypto benchmark MAINTAINER

 Documentation/crypto/benchmark.rst | 104 +++++
 MAINTAINERS                        |   7 +
 crypto/Kconfig                     |   2 +
 crypto/Makefile                    |   5 +
 crypto/benchmark/Kconfig           |  11 +
 crypto/benchmark/Makefile          |   3 +
 crypto/benchmark/benchmark.c       | 599 +++++++++++++++++++++++++++++
 crypto/benchmark/benchmark.h       |  76 ++++
 crypto/benchmark/bm_comp.c         | 435 +++++++++++++++++++++
 crypto/benchmark/bm_comp.h         |  19 +
 include/linux/moduleparam.h        |   7 +-
 kernel/params.c                    |   1 +
 12 files changed, 1268 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/crypto/benchmark.rst
 create mode 100644 crypto/benchmark/Kconfig
 create mode 100644 crypto/benchmark/Makefile
 create mode 100644 crypto/benchmark/benchmark.c
 create mode 100644 crypto/benchmark/benchmark.h
 create mode 100644 crypto/benchmark/bm_comp.c
 create mode 100644 crypto/benchmark/bm_comp.h

--
2.24.0


* [RFC PATCH 1/6] moduleparams: Add hexulong type parameter
  2022-09-19 12:05 [RFC PATCH 0/6] crypto: benchmark - add the crypto benchmark Yang Shen
@ 2022-09-19 12:05 ` Yang Shen
  2022-09-19 12:05 ` [RFC PATCH 2/6] crypto: benchmark - add a crypto benchmark tool Yang Shen
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Yang Shen @ 2022-09-19 12:05 UTC
  To: herbert, davem; +Cc: linux-kernel, linux-crypto, gregkh

Because the bitmap.h helpers take an unsigned long pointer for bitmap
variables, adding a 'hexulong' parameter type is more convenient.
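
To illustrate what the new type accepts and prints, here is a user-space
approximation of the set/get pair generated by
STANDARD_PARAM_DEF(hexulong, ...). It assumes the set side behaves like
kstrtoul() with base 0 and the get side prints "%#016lx" followed by a
newline; strtoul() and snprintf() stand in for the kernel helpers, and the
function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for param_set_hexulong(): base 0 parsing ("0x"
 * prefix for hex, leading "0" for octal, else decimal), allowing one
 * trailing newline as kstrtoul() does.
 */
static int hexulong_set(const char *val, unsigned long *res)
{
	unsigned long v;
	char *end;

	/* kstrtoul() rejects a minus sign and leading whitespace; strtoul() does not. */
	if (*val == '-' || isspace((unsigned char)*val))
		return -EINVAL;
	errno = 0;
	v = strtoul(val, &end, 0);
	if (end == val)
		return -EINVAL;
	if (errno == ERANGE)
		return -ERANGE;	/* kstrtoul() also reports overflow as -ERANGE */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;
	*res = v;
	return 0;
}

/* Hypothetical stand-in for param_get_hexulong(): "%#016lx" plus newline. */
static int hexulong_get(char *buf, size_t len, unsigned long v)
{
	return snprintf(buf, len, "%#016lx\n", v);
}
```

Writing "0x3" to a numamask declared with this type therefore selects
nodes 0 and 1, and reading it back gives "0x00000000000003" (the "0x"
counts toward the 16-character width).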

Signed-off-by: Yang Shen <shenyang39@huawei.com>
---
 include/linux/moduleparam.h | 7 ++++++-
 kernel/params.c             | 1 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h
index 962cd41a2cb5..9e0828fa3946 100644
--- a/include/linux/moduleparam.h
+++ b/include/linux/moduleparam.h
@@ -118,7 +118,7 @@ struct kparam_array
  * you can create your own by defining those variables.
  *
  * Standard types are:
- *	byte, hexint, short, ushort, int, uint, long, ulong
+ *	byte, hexint, hexulong, short, ushort, int, uint, long, ulong
  *	charp: a character pointer
  *	bool: a bool, values 0/1, y/n, Y/N.
  *	invbool: the above, only sense-reversed (N = true).
@@ -455,6 +455,11 @@ extern int param_set_hexint(const char *val, const struct kernel_param *kp);
 extern int param_get_hexint(char *buffer, const struct kernel_param *kp);
 #define param_check_hexint(name, p) param_check_uint(name, p)
 
+extern const struct kernel_param_ops param_ops_hexulong;
+extern int param_set_hexulong(const char *val, const struct kernel_param *kp);
+extern int param_get_hexulong(char *buffer, const struct kernel_param *kp);
+#define param_check_hexulong(name, p) param_check_ulong(name, p)
+
 extern const struct kernel_param_ops param_ops_charp;
 extern int param_set_charp(const char *val, const struct kernel_param *kp);
 extern int param_get_charp(char *buffer, const struct kernel_param *kp);
diff --git a/kernel/params.c b/kernel/params.c
index 5b92310425c5..f367f0c1f228 100644
--- a/kernel/params.c
+++ b/kernel/params.c
@@ -242,6 +242,7 @@ STANDARD_PARAM_DEF(long,	long,			"%li",		kstrtol);
 STANDARD_PARAM_DEF(ulong,	unsigned long,		"%lu",		kstrtoul);
 STANDARD_PARAM_DEF(ullong,	unsigned long long,	"%llu",		kstrtoull);
 STANDARD_PARAM_DEF(hexint,	unsigned int,		"%#08x", 	kstrtouint);
+STANDARD_PARAM_DEF(hexulong,	unsigned long,		"%#016lx",	kstrtoul);
 
 int param_set_uint_minmax(const char *val, const struct kernel_param *kp,
 		unsigned int min, unsigned int max)
-- 
2.24.0



* [RFC PATCH 2/6] crypto: benchmark - add a crypto benchmark tool
  2022-09-19 12:05 [RFC PATCH 0/6] crypto: benchmark - add the crypto benchmark Yang Shen
  2022-09-19 12:05 ` [RFC PATCH 1/6] moduleparams: Add hexulong type parameter Yang Shen
@ 2022-09-19 12:05 ` Yang Shen
  2022-09-20  7:31   ` Greg KH
  2022-09-19 12:05 ` [RFC PATCH 3/6] crytpo: benchmark - support compression/decompresssion Yang Shen
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Yang Shen @ 2022-09-19 12:05 UTC
  To: herbert, davem; +Cc: linux-kernel, linux-crypto, gregkh

Provide a crypto benchmark that helps developers quickly measure the
performance of an algorithm registered with the crypto subsystem.

Because crypto algorithms have many different parameters, the tool
cannot support every test scenario. To keep the tool simple and easy to
use while still covering as many scenarios as possible, the benchmark
follows the crypto subsystem's approach and provides a unified struct
'crypto_bm_ops' (named 'crypto_bm_alg_ops' in the code). Each algorithm
class registers its own callbacks to interpret the user's input. In
crypto, an algorithm class contains multiple algorithms, but all of them
use the same API. Therefore, in the benchmark, all algorithms of a class
share the same 'ops', and a specific algorithm is selected by name.
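
The class-based dispatch can be sketched in user space as follows. The
enum mirrors crypto_bm_alg after patch 3, but the struct, the callback and
bm_dispatch() are hypothetical simplifications of crypto_bm_alg_ops: the
class is chosen by 'algtype', bounds-checked against the MAX value as
crypto_bm_check_params() does, and the concrete algorithm is passed
through by name.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Mirrors enum crypto_bm_alg: one entry per algorithm class. */
enum bm_alg { BM_COMP, BM_ALG_MAX };

/* Hypothetical, reduced version of crypto_bm_alg_ops. */
struct bm_alg_ops {
	const char *alg;			/* class name */
	int (*perf)(const char *algorithm);	/* one callback is enough here */
};

/* Stand-in for a class callback; a real one would allocate a tfm by name. */
static int bm_perf_comp(const char *algorithm)
{
	return strcmp(algorithm, "zlib-deflate") == 0 ? 0 : -ENOENT;
}

static const struct bm_alg_ops bm_ops[BM_ALG_MAX] = {
	[BM_COMP] = { .alg = "CRYPTO_COMPRESS", .perf = bm_perf_comp },
};

/* 'algtype' selects the ops; the algorithm name is passed through unchanged. */
static int bm_dispatch(unsigned int algtype, const char *algorithm)
{
	if (algtype >= BM_ALG_MAX)
		return -EINVAL;
	return bm_ops[algtype].perf(algorithm);
}
```

Sizing the table by BM_ALG_MAX with designated initializers is one way to
keep the index and the enum in step without a trailing sentinel entry.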

First, consider the performance calculation model. In the crypto
subsystem model, well-designed code using the crypto API creates a
NUMA-node-local 'crypto_tfm' in advance and allocates a number of
'crypto_req' objects suited to its workload. During actual processing,
each thread submits work through its 'crypto_req' objects and waits for
completion.

Therefore, the benchmark creates the 'crypto_tfm' and 'crypto_req'
objects first, and only then measures the time taken by all requests to
calculate performance, so the result reflects the pure algorithm
performance. When an algorithm class implements its own 'ops', it must
take care over what work is done in each callback: the request data set
should be prepared before 'ops.perf' is called. To avoid artificially high
results caused by unrealistically high cache and TLB hit rates, the data
set should contain more entries than there are 'crypto_req' objects.
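
crypto_bm_show_perf() in this patch turns the request count and elapsed
time into MB/s and kPP/s using two integer constants in place of the exact
unit conversions. This user-space sketch reproduces that arithmetic (the
function and macro names are hypothetical) to show where 953 and 976562
come from: both are 10^9 divided by the unit size, rounded down, so the
reported figures can be slightly below the exact values.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC	1000000000ULL
/* 10^9 / 2^20 = 953.67..., truncated to 953 */
#define MB_FACTOR	(NS_PER_SEC >> 20)
/* 10^9 / 1024 = 976562.5, truncated to 976562 */
#define KPP_FACTOR	(NS_PER_SEC / 1024)

/* MB/s = reqsum * inputsize / 2^20 / (time_ns / 10^9) */
static uint64_t bm_throughput_mbps(uint64_t reqsum, uint64_t inputsize,
				   uint64_t time_ns)
{
	assert(time_ns > 0);
	/* Multiply first to keep precision; overflows past ~1.9e16 bytes. */
	return reqsum * inputsize * MB_FACTOR / time_ns;
}

/* kPP/s = reqsum / 1024 / (time_ns / 10^9) */
static uint64_t bm_pps_kpps(uint64_t reqsum, uint64_t time_ns)
{
	assert(time_ns > 0);
	return reqsum * KPP_FACTOR / time_ns;
}
```

For example, 1000 requests of 1 MiB in exactly one second is 1000 MB/s,
but bm_throughput_mbps(1000, 1048576, 1000000000) returns 999 because of
the truncated constant.
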
The 'crypto_bm_ops' provides the following API:
 - init & uninit
 Initialization and cleanup. The algorithm class can apply private
 settings here.
 - create_tfm & release_tfm
 Create and release the 'crypto_tfm'. Each algorithm class has its own
 tfm type in crypto, but all of them contain a member named tfm, so 'tfm'
 is used here to denote the algorithm handle. The benchmark provides the
 tfm array.
 - create_req & release_req
 Create and release the 'crypto_req' objects. The callback should create
 a group of 'reqnum' 'crypto_req' objects in struct 'crypto_bm_base'. It
 is also recommended to prepare the request data here; as noted above,
 the data set should be larger than the number of 'crypto_req' objects.
 - perf
 Send the requests. The registrant should use the 'loop' parameter to
 send requests repeatedly and update the counters in struct
 'crypto_bm_thread_data'.
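
The order in which the benchmark core drives these callbacks, including
the unwinding on failure, follows the goto ladder in crypto_bm_test(). A
hypothetical user-space trace of that ordering, with each callback reduced
to a recorded name and failures injected through 'fail_at':

```c
#include <assert.h>
#include <string.h>

/* Hypothetical step identifiers used only to inject failures. */
enum bm_step { STEP_INIT, STEP_TFM, STEP_REQ, STEP_PERF, STEP_NONE };

static char trace[128];	/* space-separated record of the calls made */

static void log_step(const char *name)
{
	/* 128 bytes covers the full sequence (63 chars plus NUL). */
	strcat(trace, name);
	strcat(trace, " ");
}

/* Stand-in for a callback: records its name and fails if it is 'fail_at'. */
static int do_step(enum bm_step step, enum bm_step fail_at, const char *name)
{
	log_step(name);
	return step == fail_at ? -1 : 0;
}

static int bm_lifecycle(enum bm_step fail_at)
{
	int ret;

	trace[0] = '\0';
	ret = do_step(STEP_INIT, fail_at, "init");
	if (ret)
		return ret;	/* nothing to undo yet */
	ret = do_step(STEP_TFM, fail_at, "create_tfm");
	if (ret)
		goto out_uninit;
	ret = do_step(STEP_REQ, fail_at, "create_req");
	if (ret)
		goto out_release_tfm;
	/* A perf failure still releases everything that was created. */
	ret = do_step(STEP_PERF, fail_at, "perf");
	log_step("release_req");
out_release_tfm:
	log_step("release_tfm");
out_uninit:
	log_step("uninit");
	return ret;
}
```

A successful run records "init create_tfm create_req perf release_req
release_tfm uninit"; a create_req failure skips perf and release_req but
still releases the tfm and calls uninit.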

Next, consider the parameters that users can configure. Generally
speaking, the following parameters affect algorithm performance: tfm
number, request number, block size and NUMA node. Other parameters affect
the stability of the results: test time and the number of requests sent.
In summary, the benchmark has the following parameters:
 - algorithm
 The name of the algorithm to test, as shown in /proc/crypto.
 - algtype
 The algorithm class to test. The available classes can be listed by
 echoing 'algtype' to /sys/module/crypto_benchmark/parameters/help.
 - inputsize
 The input length, which can greatly affect performance, such as the data
 size for compression or the key length for encryption.
 - loop
 The number of request-sending loops, in units of 1000. Used to smooth
 out performance fluctuations caused by the environment.
 - numamask
 The mask of NUMA nodes to bind to. Used when allocating memory, creating
 threads and creating 'crypto_tfm' objects.
 - optype
 The operation type to test. The operation types available for the
 selected 'algtype' can be read with
 cat /sys/module/crypto_benchmark/parameters/help.
 - reqnum
 The number of requests per tfm. Used to test asynchronous API
 performance.
 - threadnum
 The number of test threads. To simplify the model, one 'crypto_tfm' is
 created per thread.
 - time
 The test duration in seconds. Used to stop the test threads.
 - run
 Start or stop the test.
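
How 'threadnum' and 'numamask' interact is only implied above. In
crypto_bm_create_tfm() the threads are split evenly across the nodes set
in 'numamask', and the remainder goes to the last set node. A user-space
sketch of that assignment (the function name is hypothetical; -1 stands in
for NUMA_NO_NODE):

```c
#include <assert.h>
#include <limits.h>

#define BM_NO_NODE	(-1)	/* stand-in for NUMA_NO_NODE */
#define BM_MASK_BITS	((int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Fill node_of[0..threadnum-1] with the node each thread is bound to.
 * Returns the number of nodes set in numamask (0 means "not bound").
 */
static int bm_assign_nodes(unsigned long numamask, unsigned int threadnum,
			   int *node_of)
{
	unsigned int nodes = 0, count = 0, per, rest, i;
	int bit;

	for (bit = 0; bit < BM_MASK_BITS; bit++)
		if (numamask & (1UL << bit))
			nodes++;

	if (nodes == 0) {
		for (i = 0; i < threadnum; i++)
			node_of[i] = BM_NO_NODE;
		return 0;
	}

	per = threadnum / nodes;
	rest = threadnum % nodes;
	for (bit = 0; bit < BM_MASK_BITS; bit++) {
		unsigned int start, end;

		if (!(numamask & (1UL << bit)))
			continue;
		start = count * per;
		end = (count + 1) * per;
		/* The last selected node also takes the remainder. */
		end += (++count == nodes) ? rest : 0;
		for (i = start; i < end; i++)
			node_of[i] = bit;
	}
	return (int)nodes;
}
```

With numamask 0x3 and threadnum 5, threads 0-1 go to node 0 and threads
2-4 to node 1. One consequence worth checking: with fewer threads than
selected nodes, the even share is zero, so every thread lands on the
highest set node.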

Users can configure the parameters under
/sys/module/crypto_benchmark/parameters/, then echo 1 to 'run' to start
the test. To stop the test early, echo 0 to 'run'.
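
The same sequence can be driven from a C program instead of a shell. This
is a user-space sketch with hypothetical helper names; the directory is a
parameter so the logic can be tried against a scratch directory, and 'run'
is written last because every other parameter is rejected with -EBUSY
while a test is running. The values mirror the cover letter's hardware
run.

```c
#define _POSIX_C_SOURCE 200809L	/* mkdtemp(), for trying this on a scratch dir */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BM_PARAM_DIR	"/sys/module/crypto_benchmark/parameters"

/* Write one value to <dir>/<name>; returns 0 on success, -1 on error. */
static int bm_write_param(const char *dir, const char *name, const char *value)
{
	char path[256];
	FILE *f;
	int ret = 0;

	if (snprintf(path, sizeof(path), "%s/%s", dir, name) >= (int)sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(value, f) == EOF)
		ret = -1;
	/* stdio buffers the write, so a rejected store (e.g. -EBUSY) shows up here */
	if (fclose(f) == EOF)
		ret = -1;
	return ret;
}

/* Configure a zlib-deflate compression run and start it. */
static int bm_configure_and_run(const char *dir)
{
	static const char *const params[][2] = {
		{ "algorithm", "zlib-deflate" },
		{ "algtype", "0" },	/* compression class, per patch 3 */
		{ "inputsize", "8192" },
		{ "reqnum", "10" },
		{ "threadnum", "1" },
		{ "time", "1" },
		{ "run", "1" },		/* must be last */
	};
	size_t i;

	for (i = 0; i < sizeof(params) / sizeof(params[0]); i++)
		if (bm_write_param(dir, params[i][0], params[i][1]))
			return -1;
	return 0;
}
```

On a system with the module loaded, bm_configure_and_run(BM_PARAM_DIR),
run as root, would start the test; writing "0" to 'run' stops it.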

Signed-off-by: Yang Shen <shenyang39@huawei.com>
---
 crypto/Kconfig               |   2 +
 crypto/Makefile              |   5 +
 crypto/benchmark/Kconfig     |  11 +
 crypto/benchmark/Makefile    |   3 +
 crypto/benchmark/benchmark.c | 509 +++++++++++++++++++++++++++++++++++
 crypto/benchmark/benchmark.h |  76 ++++++
 6 files changed, 606 insertions(+)
 create mode 100644 crypto/benchmark/Kconfig
 create mode 100644 crypto/benchmark/Makefile
 create mode 100644 crypto/benchmark/benchmark.c
 create mode 100644 crypto/benchmark/benchmark.h

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 40423a14f86f..a0f618f349fc 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1438,4 +1438,6 @@ source "drivers/crypto/Kconfig"
 source "crypto/asymmetric_keys/Kconfig"
 source "certs/Kconfig"

+source "crypto/benchmark/Kconfig"
+
 endif	# if CRYPTO
diff --git a/crypto/Makefile b/crypto/Makefile
index a6f94e04e1da..67edf4e1337c 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -212,3 +212,8 @@ obj-$(CONFIG_CRYPTO_SIMD) += crypto_simd.o
 # Key derivation function
 #
 obj-$(CONFIG_CRYPTO_KDF800108_CTR) += kdf_sp800108.o
+
+#
+# crypto benchmark
+#
+obj-y += benchmark/
diff --git a/crypto/benchmark/Kconfig b/crypto/benchmark/Kconfig
new file mode 100644
index 000000000000..abee14ba8e40
--- /dev/null
+++ b/crypto/benchmark/Kconfig
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0
+
+config CRYPTO_BENCHMARK
+	bool "Testing performance of crypto algorithms"
+	depends on CRYPTO
+	help
+	  This option supports testing the performance of the crypto
+	  async API. Select this if you want to measure the performance
+	  of crypto algorithms conveniently.
+	  Before using it, check whether the algorithm class is
+	  supported.
diff --git a/crypto/benchmark/Makefile b/crypto/benchmark/Makefile
new file mode 100644
index 000000000000..5244178e14c4
--- /dev/null
+++ b/crypto/benchmark/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_CRYPTO_BENCHMARK) += crypto_benchmark.o
+crypto_benchmark-objs += benchmark.o
diff --git a/crypto/benchmark/benchmark.c b/crypto/benchmark/benchmark.c
new file mode 100644
index 000000000000..9a833b277d87
--- /dev/null
+++ b/crypto/benchmark/benchmark.c
@@ -0,0 +1,509 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2022 HiSilicon Limited.
+ */
+#include <linux/crypto.h>
+#include <linux/jiffies.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/kthread.h>
+#include <linux/string.h>
+#include <linux/wait.h>
+
+#include "benchmark.h"
+
+enum crypto_bm_status {
+	CRYPTO_BM_STOP,
+	CRYPTO_BM_RUN,
+};
+
+enum crypto_bm_alg {
+	CRYPTO_BM_ALG_MAX,
+};
+
+struct crypto_bm_alg_ops {
+	const char *alg;
+	int (*init)(struct crypto_bm_base *base);
+	void (*uninit)(struct crypto_bm_base *base);
+	int (*create_tfm)(struct crypto_bm_base *base, u32 idx);
+	void (*release_tfm)(struct crypto_bm_base *base, u32 idx);
+	int (*create_req)(struct crypto_bm_base *base, u32 idx);
+	void (*release_req)(struct crypto_bm_base *base, u32 idx);
+	int (*perf)(struct crypto_bm_thread_data *data);
+};
+
+struct {
+	wait_queue_head_t wq;
+	atomic_t count;
+} crypto_bm_wq = { 0 };
+
+#define CRYPTO_BM_THREAD_MAX	1024U
+
+#define algorithm_desc		"Testing algorithm name"
+#define algtype_desc		"Testing algorithm type, according to enum crypto_bm_alg"
+#define inputsize_desc		"Testing input size"
+#define loop_desc		"Testing loop times, the unit is kilo, 0/1(default, 1 ktimes), 2(2 ktimes) ..."
+#define numamask_desc		"Testing bind numamask, 0(default, not bind), 1(bind to node 0), 3(bind to node0 and node1) ..."
+#define optype_desc		"Testing algorithm operation type 0 && 1: 0(default, compress/encipher), 1(decompress/decipher)"
+#define reqnum_desc		"Testing request number for per tfm, 0/1 (default 1 request), 2(2 requests) ..."
+#define threadnum_desc		"Testing thread number, one 'crypto_tfm' per thread. 0/1 (default 1 thread), 2(2 threads) ..."
+#define time_desc		"Testing time, the unit is second, 0/1 (default 1 s), 2(2 s) ..."
+#define run_desc		"Start/stop all the tests based on the configuration, 0(default, not run, stop), or run"
+
+static atomic_t benchmark_status;
+
+static struct crypto_bm_attrs benchmark_attrs = { 0 };
+
+static struct crypto_bm_base benchmark_base = {
+	.attrs = &benchmark_attrs,
+};
+
+static struct crypto_bm_thread_data thread_data[CRYPTO_BM_THREAD_MAX] = { 0 };
+
+static struct task_struct *crypto_bm_perf[CRYPTO_BM_THREAD_MAX] = { NULL };
+static struct task_struct *test_thread;
+
+static struct crypto_bm_alg_ops benchmark_ops[] = {
+	{
+		/* sentinel */
+	}
+};
+
+static int crypto_bm_algorithm_param_set(const char *val, const struct kernel_param *kp)
+{
+	char *s = strstrip((char *)val);
+
+	if (atomic_read(&benchmark_status))
+		return -EBUSY;
+
+	if (!crypto_has_alg(s, 0, 0)) {
+		pr_err("failed to find the algorithm %s\n", s);
+		return -EINVAL;
+	}
+
+	return param_set_charp(s, kp);
+}
+
+static const struct kernel_param_ops alg_ops = {
+	.set = crypto_bm_algorithm_param_set,
+	.get = param_get_charp,
+};
+
+module_param_cb(algorithm, &alg_ops, &benchmark_attrs.algorithm, 0644);
+MODULE_PARM_DESC(algorithm, algorithm_desc);
+
+static int crypto_bm_numamask_param_set(const char *val, const struct kernel_param *kp)
+{
+	if (atomic_read(&benchmark_status))
+		return -EBUSY;
+
+	return param_set_hexulong(val, kp);
+}
+
+static const struct kernel_param_ops numamask_ops = {
+	.set = crypto_bm_numamask_param_set,
+	.get = param_get_hexulong,
+};
+
+module_param_cb(numamask, &numamask_ops, &benchmark_attrs.numamask, 0644);
+MODULE_PARM_DESC(numamask, numamask_desc);
+
+#define MODULE_PARAMETER_DEF(xxx) \
+static int xxx##_set(const char *val, const struct kernel_param *kp) \
+{ \
+	u32 n; \
+	int ret; \
+	if (atomic_read(&benchmark_status)) \
+		return -EBUSY; \
+	ret = kstrtou32(val, 10, &n); \
+	if (ret != 0) \
+		return -EINVAL; \
+	return param_set_uint(val, kp); \
+} \
+static const struct kernel_param_ops xxx##_ops = { \
+	.set = xxx##_set, \
+	.get = param_get_uint \
+}; \
+module_param_cb(xxx, &xxx##_ops, &benchmark_attrs.xxx, 0644); \
+MODULE_PARM_DESC(xxx, xxx##_desc)
+
+MODULE_PARAMETER_DEF(algtype);
+MODULE_PARAMETER_DEF(inputsize);
+MODULE_PARAMETER_DEF(loop);
+MODULE_PARAMETER_DEF(optype);
+MODULE_PARAMETER_DEF(reqnum);
+MODULE_PARAMETER_DEF(threadnum);
+MODULE_PARAMETER_DEF(time);
+
+static int crypto_bm_check_params(struct crypto_bm_attrs *attrs)
+{
+	if (attrs->algorithm == NULL) {
+		pr_err("algorithm is NULL\n");
+		return -EINVAL;
+	}
+
+	if (attrs->algtype >= CRYPTO_BM_ALG_MAX) {
+		pr_err("algorithm type %d is invalid\n", attrs->algtype);
+		return -EINVAL;
+	}
+
+	if (attrs->inputsize == 0) {
+		pr_err("input size is 0\n");
+		return -EINVAL;
+	}
+
+	if (attrs->threadnum >= CRYPTO_BM_THREAD_MAX) {
+		pr_err("thread number is bigger than %u\n", CRYPTO_BM_THREAD_MAX);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static void crypto_bm_set_default_params(struct crypto_bm_attrs *attrs)
+{
+	attrs->loop = (attrs->loop == 0) ? 1 : attrs->loop;
+	attrs->reqnum = (attrs->reqnum == 0) ? 1 : attrs->reqnum;
+	attrs->threadnum = (attrs->threadnum == 0) ? 1 : attrs->threadnum;
+	attrs->time = (attrs->time == 0) ? 1 : attrs->time;
+}
+
+static int crypto_bm_init_alg(struct crypto_bm_base *base)
+{
+	u32 idx = base->attrs->algtype;
+
+	return benchmark_ops[idx].init(base);
+}
+
+static void crypto_bm_uninit_alg(struct crypto_bm_base *base)
+{
+	u32 idx = base->attrs->algtype;
+
+	benchmark_ops[idx].uninit(base);
+}
+
+static int crypto_bm_create_tfm(struct crypto_bm_base *base)
+{
+	struct crypto_bm_attrs *attrs = base->attrs;
+	int i, ret, nodes, sbit, count = 0;
+	u32 threadnum = attrs->threadnum;
+	u32 threadpernode, threadrest;
+	u32 idx = attrs->algtype;
+
+	base->gthread = kcalloc(threadnum, sizeof(*base->gthread), GFP_KERNEL);
+	if (!base->gthread)
+		return -ENOMEM;
+
+	nodes = bitmap_weight(&attrs->numamask, MAX_NUMNODES);
+
+	if (nodes == 0) {
+		for (i = 0; i < threadnum; i++) {
+			base->gthread[i].id = i;
+			base->gthread[i].node = NUMA_NO_NODE;
+			ret = benchmark_ops[idx].create_tfm(base, i);
+			if (ret)
+				goto out_free_tfm;
+		}
+	} else {
+		threadpernode = threadnum / nodes;
+		threadrest = threadnum % nodes;
+		for_each_set_bit(sbit, (unsigned long *)&attrs->numamask, MAX_NUMNODES) {
+			int start = count * threadpernode;
+			int end = (count + 1) * threadpernode;
+
+			end += (++count == nodes) ? threadrest : 0;
+			for (i = start; i < end; i++) {
+				base->gthread[i].id = i;
+				base->gthread[i].node = sbit;
+				ret = benchmark_ops[idx].create_tfm(base, i);
+				if (ret)
+					goto out_free_tfm;
+			}
+		}
+	}
+
+	return 0;
+
+out_free_tfm:
+	for (i--; i >= 0; i--)
+		benchmark_ops[idx].release_tfm(base, i);
+
+	kfree(base->gthread);
+
+	return ret;
+}
+
+static void crypto_bm_release_tfm(struct crypto_bm_base *base)
+{
+	u32 threadnum = base->attrs->threadnum;
+	u32 idx = base->attrs->algtype;
+	int i;
+
+	for (i = 0; i < threadnum; i++)
+		benchmark_ops[idx].release_tfm(base, i);
+
+	kfree(base->gthread);
+}
+
+static int crypto_bm_create_req(struct crypto_bm_base *base)
+{
+	u32 threadnum = base->attrs->threadnum;
+	u32 idx = base->attrs->algtype;
+	int i, ret;
+
+	for (i = 0; i < threadnum; i++) {
+		ret = benchmark_ops[idx].create_req(base, i);
+		if (ret)
+			goto out_release_req;
+	}
+
+	return 0;
+
+out_release_req:
+	for (i--; i >= 0 ; i--)
+		benchmark_ops[idx].release_req(base, i);
+
+	return ret;
+}
+
+static void crypto_bm_release_req(struct crypto_bm_base *base)
+{
+	u32 threadnum = base->attrs->threadnum;
+	u32 idx = base->attrs->algtype;
+	int i;
+
+	for (i = 0; i < threadnum; i++)
+		benchmark_ops[idx].release_req(base, i);
+}
+
+static int crypto_bm_test_perf(void *data)
+{
+	struct crypto_bm_thread_data *tdata = data;
+	struct crypto_bm_base *base = tdata->base;
+	struct crypto_bm_attrs *attrs = base->attrs;
+	unsigned long endtime = jiffies + attrs->time * HZ;
+	u32 idx = attrs->algtype;
+	int ret = 0;
+
+	do {
+		if (kthread_should_stop())
+			break;
+
+		if (time_after(jiffies, endtime))
+			break;
+
+		ret = benchmark_ops[idx].perf(tdata);
+		if (ret)
+			break;
+	} while (1);
+
+	crypto_bm_perf[tdata->threadid] = NULL;
+	atomic_dec(&crypto_bm_wq.count);
+	wake_up(&crypto_bm_wq.wq);
+
+	return ret;
+}
+
+static void crypto_bm_show_perf(u64 time)
+{
+	u32 threadnum = benchmark_attrs.threadnum;
+	u32 inputsize = benchmark_attrs.inputsize;
+	u64 throughput, pps, reqsum = 0;
+	int i;
+
+	for (i = 0; i < threadnum; i++)
+		reqsum += atomic_read(&thread_data[i].count.recv_req);
+
+	/*
+	 *               reqsum * inputsize (bytes) / (1024 * 1024)
+	 * throughput = -------------------------------------------- (MB/s)
+	 *                          time (ns) / 1000000000
+	 */
+	throughput = reqsum * inputsize * 953 / (time);
+
+	/*
+	 *          reqsum / 1024
+	 * pps = -------------------
+	 *        time / 1000000000
+	 */
+	pps = reqsum * 976562 / (time);
+
+	pr_err("Crypto benchmark result:\n"
+	       "\t throughput \t pps \t\t time\n"
+	       "\t %llu MB/s \t %llu kPP/s \t %llu ms\n",
+	       throughput, pps, time / 1000000);
+}
+
+static int crypto_bm_test(void *data)
+{
+	struct crypto_bm_base *base = data;
+	u32 threadnum = base->attrs->threadnum;
+	struct timespec64 begin, end;
+	int i, ret, node;
+
+	init_waitqueue_head(&crypto_bm_wq.wq);
+	atomic_set(&crypto_bm_wq.count, threadnum);
+
+	memset(crypto_bm_perf, 0, sizeof(*crypto_bm_perf) * threadnum);
+
+	ret = crypto_bm_init_alg(base);
+	if (ret)
+		goto out_set_stop;
+
+	ret = crypto_bm_create_tfm(base);
+	if (ret)
+		goto out_uninit;
+
+	ret = crypto_bm_create_req(base);
+	if (ret)
+		goto out_free_tfm;
+
+
+	for (i = 0; i < threadnum; i++) {
+		node = base->gthread[i].node;
+		thread_data[i].threadid = i;
+		thread_data[i].base = base;
+		memset(&thread_data[i].count, 0, sizeof(thread_data[i].count));
+		crypto_bm_perf[i] = kthread_create_on_node(crypto_bm_test_perf, &thread_data[i],
+							   node, "crypto_bm_perf-%d", i);
+		if (IS_ERR(crypto_bm_perf[i])) {
+			ret = PTR_ERR(crypto_bm_perf[i]);
+			crypto_bm_perf[i] = NULL;
+			pr_err("failed to create %dth performance thread, ret = %d\n", i, ret);
+			goto out_stop_thread;
+		}
+		kthread_bind_mask(crypto_bm_perf[i], cpumask_of_node(node));
+	}
+	i = 0;
+
+	ktime_get_real_ts64(&begin);
+	for (i = 0; i < threadnum; i++)
+		wake_up_process(crypto_bm_perf[i]);
+	wait_event_interruptible(crypto_bm_wq.wq, atomic_read(&crypto_bm_wq.count) == 0);
+	ktime_get_real_ts64(&end);
+
+	crypto_bm_show_perf(timespec64_to_ns(&end) - timespec64_to_ns(&begin));
+
+out_stop_thread:
+	for (i--; i >= 0; i--) {
+		if (!crypto_bm_perf[i])
+			continue;
+		kthread_stop(crypto_bm_perf[i]);
+		crypto_bm_perf[i] = NULL;
+	}
+
+	crypto_bm_release_req(base);
+
+out_free_tfm:
+	crypto_bm_release_tfm(base);
+
+out_uninit:
+	crypto_bm_uninit_alg(base);
+
+out_set_stop:
+	atomic_set(&benchmark_status, CRYPTO_BM_STOP);
+	test_thread = NULL;
+
+	return ret;
+}
+
+static int crypto_bm_start_test(struct crypto_bm_base *base)
+{
+	int ret = 0;
+
+	if (atomic_cmpxchg(&benchmark_status, CRYPTO_BM_STOP, CRYPTO_BM_RUN)) {
+		pr_err("Crypto benchmark is busy now, please try later!\n");
+		return -EBUSY;
+	}
+
+	test_thread = kthread_run(crypto_bm_test, base, "crypto_bm_test");
+	if (IS_ERR(test_thread))
+		ret = PTR_ERR(test_thread);
+
+	return ret;
+}
+
+static void crypto_bm_stop_test(void)
+{
+	u32 threadnum = benchmark_attrs.threadnum;
+	int i, ret;
+
+	if (!atomic_read(&benchmark_status))
+		return;
+
+	for (i = 0; i < threadnum; i++) {
+		if (!crypto_bm_perf[i])
+			continue;
+		ret = kthread_stop(crypto_bm_perf[i]);
+		if (ret)
+			pr_err("failed to stop %dth performance thread, ret = %d\n", i, ret);
+		crypto_bm_perf[i] = NULL;
+	}
+
+	if (test_thread) {
+		ret = kthread_stop(test_thread);
+		if (ret)
+			pr_err("failed to stop test thread, ret = %d\n", ret);
+	}
+
+	atomic_set(&benchmark_status, CRYPTO_BM_STOP);
+}
+
+static int run_set(const char *val, const struct kernel_param *kp)
+{
+	int ret;
+	u32 n;
+
+	ret = kstrtou32(val, 10, &n);
+	if (ret != 0)
+		return -EINVAL;
+
+	if (n == 0) {
+		crypto_bm_stop_test();
+	} else {
+		ret = crypto_bm_check_params(&benchmark_attrs);
+		if (ret)
+			return ret;
+
+		crypto_bm_set_default_params(&benchmark_attrs);
+
+		ret = crypto_bm_start_test(&benchmark_base);
+		if (ret) {
+			pr_err("failed to start test, ret = %d\n", ret);
+			return ret;
+		}
+		pr_info("run set: algorithm %s, algtype %s, inputsize %d, loop %d, numamask 0x%lx, optype %d, reqnum %d, threadnum %d, time %d.\n",
+			benchmark_attrs.algorithm, benchmark_ops[benchmark_attrs.algtype].alg,
+			benchmark_attrs.inputsize, benchmark_attrs.loop, benchmark_attrs.numamask,
+			benchmark_attrs.optype, benchmark_attrs.reqnum, benchmark_attrs.threadnum,
+			benchmark_attrs.time);
+	}
+
+	return param_set_uint(val, kp);
+}
+
+static const struct kernel_param_ops run_ops = {
+	.set = run_set,
+	.get = param_get_uint,
+};
+
+static u32 run;
+module_param_cb(run, &run_ops, &run, 0644);
+MODULE_PARM_DESC(run, run_desc);
+
+static int __init crypto_bm_init(void)
+{
+	atomic_set(&benchmark_status, CRYPTO_BM_STOP);
+
+	return 0;
+}
+
+static void __exit crypto_bm_exit(void)
+{
+}
+
+module_init(crypto_bm_init);
+module_exit(crypto_bm_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Driver for testing performance of crypto algorithms");
diff --git a/crypto/benchmark/benchmark.h b/crypto/benchmark/benchmark.h
new file mode 100644
index 000000000000..84cb49af81ba
--- /dev/null
+++ b/crypto/benchmark/benchmark.h
@@ -0,0 +1,76 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2022 HiSilicon Limited.
+ */
+#ifndef CRYPTO_BM_H
+#define CRYPTO_BM_H
+
+#include <linux/crypto.h>
+#include <linux/errno.h>
+#include <linux/find.h>
+#include <linux/gfp.h>
+#include <linux/nodemask.h>
+#include <linux/printk.h>
+#include <linux/slab.h>
+
+/**
+ * struct crypto_bm_attrs - crypto benchmark attributes configured by users.
+ *
+ * @algorithm:	The algorithm name registered in crypto.
+ * @algtype:	The algorithm class list in enum crypto_bm_alg. Used to
+ *		choose the crypto_bm_ops.
+ * @inputsize:	The testing length that can greatly impact performance.
+ *		Such as data size for compress or key length for encryption.
+ * @loop:	The request sending loop times. The value is 1000 times
+ *		of user's setting.
+ * @numamask:	The mask of testing bind numa nodes.
+ * @optype:	The algorithm test operation. Defined by the algorithm itself.
+ * @reqnum:	The crypto request number of a tfm.
+ * @threadnum:	The test thread number. And it is equal to tfm number.
+ * @time:	The testing time.
+ */
+struct crypto_bm_attrs {
+	char *algorithm;
+	u32 algtype;
+	u32 inputsize;
+	u32 loop;
+	unsigned long numamask;
+	u32 optype;
+	u32 reqnum;
+	u32 threadnum;
+	u32 time;
+};
+
+/**
+ * struct crypto_bm_base - crypto benchmark test objects.
+ *
+ * @attrs:	The test configuration.
+ * @gthread:	An array storing resources related to the test threads.
+ */
+struct crypto_bm_base {
+	struct crypto_bm_attrs *attrs;
+	struct {
+		u32 id;
+		int node;
+		void *tfm;
+		void **req;
+	} *gthread;
+};
+
+/**
+ * struct crypto_bm_thread_data - crypto benchmark test thread common information.
+ *
+ * @threadid:	The test thread number.
+ * @count:	Count the thread test request numbers.
+ * @base:	crypto benchmark test objects.
+ */
+struct crypto_bm_thread_data {
+	int threadid;
+	struct {
+		atomic_t send_req;
+		atomic_t recv_req;
+	} count;
+	struct crypto_bm_base *base;
+} ____cacheline_aligned;
+
+#endif
--
2.24.0


* [RFC PATCH 3/6] crytpo: benchmark - support compression/decompresssion
  2022-09-19 12:05 [RFC PATCH 0/6] crypto: benchmark - add the crypto benchmark Yang Shen
  2022-09-19 12:05 ` [RFC PATCH 1/6] moduleparams: Add hexulong type parameter Yang Shen
  2022-09-19 12:05 ` [RFC PATCH 2/6] crypto: benchmark - add a crypto benchmark tool Yang Shen
@ 2022-09-19 12:05 ` Yang Shen
  2022-09-19 12:05 ` [RFC PATCH 4/6] crypto: benchmark - add help information Yang Shen
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Yang Shen @ 2022-09-19 12:05 UTC
  To: herbert, davem; +Cc: linux-kernel, linux-crypto, gregkh

Register the compression algorithms with the crypto benchmark. Users can
echo 0 to 'algtype' to select compression/decompression.

Because of how compression works, the tool cannot control the compressed
data length, so it cannot be set via 'inputsize'. For this algorithm
class, 'inputsize' therefore specifies the original (uncompressed) data
size, also for decompression.

To avoid artificially high results caused by unrealistically high cache
and TLB hit rates, the data set contains up to four times as many entries
as there are 'crypto_req' objects.
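
One way a perf callback could cycle through such a data set is sketched
below, assuming a full set of reqnum * DATAPERREQ buffers. The assignment
scheme and function name are hypothetical; only DATAPERREQ = 4 comes from
bm_comp.c. Request 'req' uses buffer (round mod 4) * reqnum + req, so each
request rotates through four distinct buffers and the whole set is touched
once every four rounds.

```c
#include <assert.h>

#define DATAPERREQ	4	/* data buffers per request, as in bm_comp.c */

/*
 * Hypothetical: index of the input buffer that request 'req' uses on its
 * 'round'-th submission, for a data set of reqnum * DATAPERREQ buffers.
 */
static unsigned int bm_pick_buf(unsigned int reqnum, unsigned int req,
				unsigned long round)
{
	assert(req < reqnum);
	return (unsigned int)(round % DATAPERREQ) * reqnum + req;
}
```

With reqnum = 2, request 0 uses buffers 0, 2, 4, 6 and then wraps back to
0, while request 1 uses 1, 3, 5, 7.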

Signed-off-by: Yang Shen <shenyang39@huawei.com>
---
 crypto/benchmark/Makefile    |   2 +-
 crypto/benchmark/benchmark.c |  11 +
 crypto/benchmark/bm_comp.c   | 425 +++++++++++++++++++++++++++++++++++
 crypto/benchmark/bm_comp.h   |  18 ++
 4 files changed, 455 insertions(+), 1 deletion(-)
 create mode 100644 crypto/benchmark/bm_comp.c
 create mode 100644 crypto/benchmark/bm_comp.h

diff --git a/crypto/benchmark/Makefile b/crypto/benchmark/Makefile
index 5244178e14c4..f638535442ba 100644
--- a/crypto/benchmark/Makefile
+++ b/crypto/benchmark/Makefile
@@ -1,3 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_CRYPTO_BENCHMARK) += crypto_benchmark.o
-crypto_benchmark-objs += benchmark.o
+crypto_benchmark-objs += benchmark.o bm_comp.o
diff --git a/crypto/benchmark/benchmark.c b/crypto/benchmark/benchmark.c
index 9a833b277d87..b5dcf5829b22 100644
--- a/crypto/benchmark/benchmark.c
+++ b/crypto/benchmark/benchmark.c
@@ -11,6 +11,7 @@
 #include <linux/wait.h>

 #include "benchmark.h"
+#include "bm_comp.h"

 enum crypto_bm_status {
 	CRYPTO_BM_STOP,
@@ -18,6 +19,7 @@ enum crypto_bm_status {
 };

 enum crypto_bm_alg {
+	CRYPTO_BM_COMP,
 	CRYPTO_BM_ALG_MAX,
 };

@@ -65,6 +67,15 @@ static struct task_struct *test_thread;

 static struct crypto_bm_alg_ops benchmark_ops[] = {
 	{
+		.alg		= "CRYPTO_COMPRESS",
+		.init		= crypto_bm_init_comp,
+		.uninit		= crypto_bm_uninit_comp,
+		.create_tfm	= crypto_bm_create_tfm_comp,
+		.release_tfm	= crypto_bm_release_tfm_comp,
+		.create_req	= crypto_bm_create_req_comp,
+		.release_req	= crypto_bm_release_req_comp,
+		.perf		= crypto_bm_perf_comp,
+	}, {
 		/* sentinel */
 	}
 };
diff --git a/crypto/benchmark/bm_comp.c b/crypto/benchmark/bm_comp.c
new file mode 100644
index 000000000000..2772a8e86e2e
--- /dev/null
+++ b/crypto/benchmark/bm_comp.c
@@ -0,0 +1,425 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2022 HiSilicon Limited.
+ */
+#include <linux/scatterlist.h>
+#include <crypto/acompress.h>
+
+#include "benchmark.h"
+#include "bm_comp.h"
+
+#define COMP_BUF_SIZE		1024
+#define REQ_NUM			1024
+#define DATAPERREQ		4
+
+enum crypto_bm_comp_optype {
+	CRYPTO_BM_COMPRESS,
+	CRYPTO_BM_DECOMPRESS,
+	CRYPTO_BM_OPS_MAX,
+};
+
+struct crypto_bm_comp_buffer {
+	void *input;
+	void *output;
+	struct scatterlist *src;
+	struct scatterlist *dst;
+};
+
+struct crypto_bm_comp_cb_data {
+	atomic_t is_used;
+	struct crypto_bm_thread_data *tdata;
+};
+
+struct crypto_bm_comp_data {
+	u32 input_size;
+	u32 output_size;
+	u32 last_used;
+	struct crypto_bm_comp_buffer *buffers;
+	struct crypto_bm_comp_cb_data *cb_datas;
+};
+
+struct crypto_bm_comp_testvec {
+	int inlen;
+	int outlen;
+	char input[COMP_BUF_SIZE];
+	char output[COMP_BUF_SIZE];
+};
+
+struct crypto_bm_comp_test_func {
+	int (*testfun)(struct acomp_req *req);
+};
+
+static int dataperreq;
+
+static int totalreq;
+
+static struct crypto_bm_comp_data *data_array;
+
+static const struct crypto_bm_comp_testvec comp_compress_tv = {
+	.inlen	= 70,
+	.input	= "Join us now and share the software "
+		"Join us now and share the software ",
+};
+
+static const struct crypto_bm_comp_test_func testfunc[] = {
+	{
+		.testfun = crypto_acomp_compress,
+	}, {
+		.testfun = crypto_acomp_decompress,
+	}, {
+		/* sentinel */
+	}
+};
+
+static void crypto_bm_comp_cb(struct crypto_async_request *base, int err);
+
+int crypto_bm_init_comp(struct crypto_bm_base *base)
+{
+	struct crypto_bm_attrs *attrs = base->attrs;
+
+	if (attrs->optype >= CRYPTO_BM_OPS_MAX) {
+		pr_err("Optype should be 0 for compression or 1 for decompression!\n");
+		return -EINVAL;
+	}
+
+	if (attrs->reqnum * DATAPERREQ >= REQ_NUM)
+		totalreq = attrs->reqnum * DATAPERREQ;
+	else
+		totalreq = REQ_NUM;
+
+	dataperreq = totalreq / attrs->reqnum;
+
+	data_array = kcalloc(attrs->threadnum, sizeof(*data_array), GFP_KERNEL);
+	if (!data_array)
+		return -ENOMEM;
+
+	return 0;
+}
+
+void crypto_bm_uninit_comp(struct crypto_bm_base *base)
+{
+	kfree(data_array);
+}
+
+int crypto_bm_create_tfm_comp(struct crypto_bm_base *base, u32 idx)
+{
+	char *alg = base->attrs->algorithm;
+	int node = base->gthread[idx].node;
+	int ret = 0;
+
+	base->gthread[idx].tfm = crypto_alloc_acomp_node(alg, 0, 0, node);
+	if (IS_ERR(base->gthread[idx].tfm)) {
+		ret = PTR_ERR(base->gthread[idx].tfm);
+		pr_err("failed to alloc %dth acomp, ret = %d\n", idx, ret);
+	}
+
+	return ret;
+}
+
+void crypto_bm_release_tfm_comp(struct crypto_bm_base *base, u32 idx)
+{
+	crypto_free_acomp(base->gthread[idx].tfm);
+}
+
+static void crypto_bm_comp_copy_data_compress(u32 idx)
+{
+	struct crypto_bm_comp_data *data = &data_array[idx];
+	u32 block, inlen, inputsize = data->input_size;
+	void *buffer;
+	int i, j;
+
+	block = DIV_ROUND_UP(inputsize, comp_compress_tv.inlen);
+	for (i = 0; i < totalreq; i++) {
+		inlen = inputsize;
+		buffer = data->buffers[i].input;
+		for (j = 0; j < block; j++) {
+			memcpy(buffer, comp_compress_tv.input,
+			       j == block - 1 ? inlen : comp_compress_tv.inlen);
+			buffer += comp_compress_tv.inlen;
+			inlen -= comp_compress_tv.inlen;
+		}
+	}
+}
+
+static int crypto_bm_comp_copy_data_decompress(struct crypto_bm_base *base, u32 idx)
+{
+	struct crypto_bm_comp_data *data = &data_array[idx];
+	struct crypto_acomp *acomp = base->gthread[idx].tfm;
+	struct crypto_wait wait;
+	struct acomp_req *req;
+	u32 block, inlen;
+	void *buffer;
+	int i, ret;
+
+	req = acomp_request_alloc(acomp);
+	if (!req)
+		return -ENOMEM;
+
+	inlen = data->input_size;
+	block = DIV_ROUND_UP(inlen, comp_compress_tv.inlen);
+	buffer = data->buffers[0].input;
+	for (i = 0; i < block; i++) {
+		memcpy(buffer, comp_compress_tv.input,
+		       i == block - 1 ? inlen : comp_compress_tv.inlen);
+		buffer += comp_compress_tv.inlen;
+		inlen -= comp_compress_tv.inlen;
+	}
+
+	/*
+	 * For decompression, the tool needs to prepare compressed data according
+	 * to crypto_bm_attrs.inputsize. Since it is hard to make the compressed
+	 * data length equal to 'inputsize', the original (uncompressed) data
+	 * length is made equal to 'inputsize' instead.
+	 */
+	crypto_init_wait(&wait);
+	acomp_request_set_callback(req, 0, crypto_req_done, &wait);
+	acomp_request_set_params(req, data->buffers[0].src, data->buffers[0].dst,
+				 data->input_size, data->output_size);
+
+	ret = crypto_wait_req(crypto_acomp_compress(req), &wait);
+	if (ret) {
+		pr_err("failed to prepare decompression data.\n");
+		goto out_free_req;
+	}
+
+	for (i = 0; i < totalreq; i++)
+		memcpy(data->buffers[i].input, data->buffers[0].output, req->dlen);
+
+out_free_req:
+	acomp_request_free(req);
+
+	return ret;
+}
+
+static int crypto_bm_comp_init_data(struct crypto_bm_base *base, u32 idx)
+{
+	struct crypto_bm_comp_data *data = &data_array[idx];
+	int i, ret, node = base->gthread[idx].node;
+	struct crypto_bm_comp_buffer *buffer;
+	u32 reqnum = base->attrs->reqnum;
+	u32 optype = base->attrs->optype;
+
+	data->input_size = base->attrs->inputsize;
+	data->output_size = base->attrs->inputsize;
+
+	data->buffers = kcalloc_node(totalreq, sizeof(*data->buffers), GFP_KERNEL, node);
+	if (!data->buffers)
+		return -ENOMEM;
+
+	data->cb_datas = kcalloc_node(reqnum, sizeof(*data->cb_datas), GFP_KERNEL, node);
+	if (!data->cb_datas) {
+		ret = -ENOMEM;
+		goto out_free_buffers;
+	}
+
+	for (i = 0; i < totalreq; i++) {
+		buffer = &data->buffers[i];
+		buffer->src = kcalloc_node(1, sizeof(struct scatterlist), GFP_KERNEL, node);
+		if (!buffer->src) {
+			ret = -ENOMEM;
+			goto out_free_src;
+		}
+
+		buffer->dst = kcalloc_node(1, sizeof(struct scatterlist), GFP_KERNEL, node);
+		if (!buffer->dst) {
+			ret = -ENOMEM;
+			goto out_free_dst;
+		}
+
+		buffer->input = kcalloc_node(1, data->input_size, GFP_KERNEL, node);
+		if (!buffer->input) {
+			ret = -ENOMEM;
+			goto out_free_input;
+		}
+
+		buffer->output = kcalloc_node(1, data->output_size, GFP_KERNEL, node);
+		if (!buffer->output) {
+			ret = -ENOMEM;
+			goto out_free_output;
+		}
+
+		sg_init_one(buffer->src, buffer->input, data->input_size);
+		sg_init_one(buffer->dst, buffer->output, data->output_size);
+	}
+
+	if (optype == CRYPTO_BM_COMPRESS) {
+		crypto_bm_comp_copy_data_compress(idx);
+	} else {
+		ret = crypto_bm_comp_copy_data_decompress(base, idx);
+		if (ret) {
+			/* All totalreq buffers are allocated here: free them all. */
+			goto out_free_src;
+		}
+	}
+
+	return 0;
+
+out_free_output:
+	kfree(buffer->input);
+
+out_free_input:
+	kfree(buffer->dst);
+
+out_free_dst:
+	kfree(buffer->src);
+
+out_free_src:
+	for (i--; i >= 0; i--) {
+		buffer = &data->buffers[i];
+		kfree(buffer->src);
+		kfree(buffer->dst);
+		kfree(buffer->input);
+		kfree(buffer->output);
+	}
+
+	kfree(data->cb_datas);
+
+out_free_buffers:
+	kfree(data->buffers);
+
+	return ret;
+}
+
+static void crypto_bm_comp_uninit_data(struct crypto_bm_base *base, u32 idx)
+{
+	struct crypto_bm_comp_data *data = &data_array[idx];
+	struct crypto_bm_comp_buffer *buffer;
+	int i;
+
+	for (i = 0; i < totalreq; i++) {
+		buffer = &data->buffers[i];
+		kfree(buffer->src);
+		kfree(buffer->dst);
+		kfree(buffer->input);
+		kfree(buffer->output);
+	}
+
+	kfree(data->cb_datas);
+	kfree(data->buffers);
+}
+
+static int crypto_bm_comp_alloc_req(struct crypto_bm_base *base, u32 idx)
+{
+	struct crypto_bm_comp_data *data = &data_array[idx];
+	int node = base->gthread[idx].node;
+	u32 reqnum = base->attrs->reqnum;
+	struct acomp_req *req;
+	int i;
+
+	base->gthread[idx].req = kcalloc_node(reqnum, sizeof(struct acomp_req *), GFP_KERNEL, node);
+	if (!base->gthread[idx].req)
+		return -ENOMEM;
+
+	for (i = 0; i < reqnum; i++) {
+		req = acomp_request_alloc(base->gthread[idx].tfm);
+		if (!req) {
+			pr_err("failed to allocate acomp request\n");
+			goto out_free_req;
+		}
+
+		acomp_request_set_callback(req, 0, crypto_bm_comp_cb, &data->cb_datas[i]);
+		base->gthread[idx].req[i] = req;
+	}
+
+	return 0;
+
+out_free_req:
+	for (i--; i >= 0; i--)
+		acomp_request_free(base->gthread[idx].req[i]);
+
+	kfree(base->gthread[idx].req);
+
+	return -ENOMEM;
+}
+
+static void crypto_bm_comp_free_req(struct crypto_bm_base *base, u32 idx)
+{
+	u32 reqnum = base->attrs->reqnum;
+	int i;
+
+	for (i = 0; i < reqnum; i++)
+		acomp_request_free(base->gthread[idx].req[i]);
+}
+
+int crypto_bm_create_req_comp(struct crypto_bm_base *base, u32 idx)
+{
+	int ret;
+
+	ret = crypto_bm_comp_init_data(base, idx);
+	if (ret)
+		return ret;
+
+	ret = crypto_bm_comp_alloc_req(base, idx);
+	if (ret)
+		goto out_free_buf;
+
+	return 0;
+
+out_free_buf:
+	crypto_bm_comp_uninit_data(base, idx);
+
+	return ret;
+}
+
+void crypto_bm_release_req_comp(struct crypto_bm_base *base, u32 idx)
+{
+	crypto_bm_comp_free_req(base, idx);
+	crypto_bm_comp_uninit_data(base, idx);
+}
+
+static void crypto_bm_comp_cb(struct crypto_async_request *base, int err)
+{
+	struct crypto_bm_comp_cb_data *data = base->data;
+
+	atomic_inc(&data->tdata->count.recv_req);
+	atomic_set(&data->is_used, 0);
+}
+
+int crypto_bm_perf_comp(struct crypto_bm_thread_data *data)
+{
+	struct crypto_bm_base *base = data->base;
+	int i, j, ret, last_used, send_req = 0;
+	u32 loop = base->attrs->loop * 1000;
+	u32 reqnum = base->attrs->reqnum;
+	u32 threadid = data->threadid;
+	struct crypto_bm_comp_data *comp_data = &data_array[threadid];
+	struct crypto_bm_comp_buffer *buffer;
+	struct acomp_req *req;
+
+	for (i = 0; i < reqnum; i++)
+		comp_data->cb_datas[i].tdata = data;
+
+	for (i = 0; i < loop; i++) {
+		for (j = 0; j < reqnum; j++) {
+			if (atomic_read(&comp_data->cb_datas[j].is_used))
+				continue;
+			req = base->gthread[threadid].req[j];
+			last_used = comp_data->last_used;
+			buffer = &comp_data->buffers[last_used + j * dataperreq];
+			acomp_request_set_params(req, buffer->src, buffer->dst,
+						 comp_data->input_size, comp_data->output_size);
+			atomic_set(&comp_data->cb_datas[j].is_used, 1);
+			ret = testfunc[base->attrs->optype].testfun(req);
+			if (!ret) {
+				atomic_inc(&data->count.recv_req);
+				atomic_set(&comp_data->cb_datas[j].is_used, 0);
+			}
+			if (unlikely(ret && ret != -EINPROGRESS && ret != -EBUSY)) {
+				pr_err("failed to send acomp request, ret %d\n", ret);
+				atomic_set(&comp_data->cb_datas[j].is_used, 0);
+				break;
+			}
+			ret = 0;
+			comp_data->last_used = (last_used + 1) % dataperreq;
+			send_req++;
+		}
+	}
+
+	atomic_add(send_req, &data->count.send_req);
+	send_req = atomic_read(&data->count.send_req);
+
+	while (atomic_read(&data->count.recv_req) != send_req)
+		cpu_relax();
+
+	return ret;
+}
diff --git a/crypto/benchmark/bm_comp.h b/crypto/benchmark/bm_comp.h
new file mode 100644
index 000000000000..78b45f8b22a6
--- /dev/null
+++ b/crypto/benchmark/bm_comp.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2022 HiSilicon Limited.
+ */
+#ifndef CRYPTO_BM_COMP_H
+#define CRYPTO_BM_COMP_H
+
+#include <linux/types.h>
+
+int crypto_bm_init_comp(struct crypto_bm_base *base);
+void crypto_bm_uninit_comp(struct crypto_bm_base *base);
+int crypto_bm_create_tfm_comp(struct crypto_bm_base *base, u32 idx);
+void crypto_bm_release_tfm_comp(struct crypto_bm_base *base, u32 idx);
+int crypto_bm_create_req_comp(struct crypto_bm_base *base, u32 idx);
+void crypto_bm_release_req_comp(struct crypto_bm_base *base, u32 idx);
+int crypto_bm_perf_comp(struct crypto_bm_thread_data *data);
+
+#endif
--
2.24.0


* [RFC PATCH 4/6] crypto: benchmark - add help information
  2022-09-19 12:05 [RFC PATCH 0/6] crypto: benchmark - add the crypto benchmark Yang Shen
                   ` (2 preceding siblings ...)
  2022-09-19 12:05 ` [RFC PATCH 3/6] crypto: benchmark - support compression/decompression Yang Shen
@ 2022-09-19 12:05 ` Yang Shen
  2022-09-19 12:05 ` [RFC PATCH 5/6] crypto: benchmark - add API documentation Yang Shen
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Yang Shen @ 2022-09-19 12:05 UTC (permalink / raw)
  To: herbert, davem; +Cc: linux-kernel, linux-crypto, gregkh

Add a new module parameter 'help' to explain the benchmark module
parameters. Since each algorithm class has its own notes, also add a
'help' callback to show the class-specific details.
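For illustration, a user-space sketch of the lookup that 'crypto_bm_help_param_set()' performs on a written name. The two-entry table and its texts are placeholders, and 'bm_find_help()' is a hypothetical name; the point is why the input must be stripped before comparing.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct bm_mp_info {
	const char *mp;
	const char *help_info;
};

/* Illustrative two-entry table; the texts are placeholders. */
static const struct bm_mp_info bm_help[] = {
	{ "algorithm", "Name of the algorithm to test (see /proc/crypto)." },
	{ "loop", "Number of send loops." },
};

/*
 * 'echo loop > help' delivers "loop\n", so leading and trailing whitespace
 * is ignored before the exact-match comparison (the kernel handler uses
 * strstrip() for this). Returns NULL for an unknown parameter name.
 */
static const char *bm_find_help(const char *val)
{
	size_t len, i;

	assert(val != NULL);
	while (isspace((unsigned char)*val))
		val++;
	len = strlen(val);
	while (len > 0 && isspace((unsigned char)val[len - 1]))
		len--;

	for (i = 0; i < sizeof(bm_help) / sizeof(bm_help[0]); i++) {
		if (strlen(bm_help[i].mp) == len &&
		    strncmp(val, bm_help[i].mp, len) == 0)
			return bm_help[i].help_info;
	}
	return NULL;
}
```

Note that the handler in this patch returns 0 for unknown names, so a typo is silently ignored; returning -EINVAL from the .set callback instead would make the write itself fail.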

Signed-off-by: Yang Shen <shenyang39@huawei.com>
---
 crypto/benchmark/benchmark.c | 79 ++++++++++++++++++++++++++++++++++++
 crypto/benchmark/bm_comp.c   | 10 +++++
 crypto/benchmark/bm_comp.h   |  1 +
 3 files changed, 90 insertions(+)

diff --git a/crypto/benchmark/benchmark.c b/crypto/benchmark/benchmark.c
index b5dcf5829b22..a3ccd8955eaa 100644
--- a/crypto/benchmark/benchmark.c
+++ b/crypto/benchmark/benchmark.c
@@ -32,6 +32,12 @@ struct crypto_bm_alg_ops {
 	int (*create_req)(struct crypto_bm_base *base, u32 idx);
 	void (*release_req)(struct crypto_bm_base *base, u32 idx);
 	int (*perf)(struct crypto_bm_thread_data *data);
+	void (*help)(void);
+};
+
+struct crypto_bm_mp_info {
+	const char *mp;
+	const char *help_info;
 };
 
 struct {
@@ -51,6 +57,9 @@ struct {
 #define threadnum_desc		"Testing thread number, one 'crypto_tfm' per thread. 0/1 (default 1 thread), 2(2 threads) ..."
 #define time_desc		"Testing time, the unit is second, 0/1 (default 1 s), 2(2 s) ..."
 #define run_desc		"Start/stop all the tests based on the configuration, 0(default, not run, stop), or run"
+#define help_desc		"Help information. Echo a module parameter name to get its " \
+				"description. Cat 'help' directly to get the notes for the "\
+				"current 'algtype'."
 
 static atomic_t benchmark_status;
 
@@ -75,11 +84,47 @@ static struct crypto_bm_alg_ops benchmark_ops[] = {
 		.create_req	= crypto_bm_create_req_comp,
 		.release_req	= crypto_bm_release_req_comp,
 		.perf		= crypto_bm_perf_comp,
+		.help		= crypto_bm_help_comp,
 	}, {
 		/* sentinel */
 	}
 };
 
+static struct crypto_bm_mp_info modules_help[] = {
+	{
+		.mp		= "algorithm",
+		.help_info	= "Please input the name of an algorithm supported by crypto.\n"
+				  "Algorithm names can be found in /proc/crypto.",
+	}, {
+		.mp		= "algtype",
+		.help_info	= "Please input a valid value to choose algorithm class.\n"
+				  "0: CRYPTO_BM_COMP",
+	}, {
+		.mp		= "inputsize",
+		.help_info	= "Please input a valid value as testing input size.",
+	}, {
+		.mp		= "loop",
+		.help_info	= "Please input the number of send loops.",
+	}, {
+		.mp		= "numamask",
+		.help_info	= "Please input a bitmap of the NUMA nodes to test on.",
+	}, {
+		.mp		= "optype",
+		.help_info	= "Please input a valid value for the testing operation.\n"
+				  "Cat 'help' to list the optypes supported by the current 'algtype'."
+	}, {
+		.mp		= "reqnum",
+		.help_info	= "Please input a valid value for the number of requests per thread.",
+	}, {
+		.mp		= "threadnum",
+		.help_info	= "Please input a valid value for the number of testing threads.\n"
+				  "Each thread creates its own crypto_tfm.",
+	}, {
+		.mp		= "time",
+		.help_info	= "Please input a valid value for testing time.",
+	}
+};
+
 static int crypto_bm_algorithm_param_set(const char *val, const struct kernel_param *kp)
 {
 	char *s = strstrip((char *)val);
@@ -103,6 +148,40 @@ static const struct kernel_param_ops alg_ops = {
 module_param_cb(algorithm, &alg_ops, &benchmark_attrs.algorithm, 0644);
 MODULE_PARM_DESC(algorithm, algorithm_desc);
 
+static int crypto_bm_help_param_set(const char *val, const struct kernel_param *kp)
+{
+	int size = ARRAY_SIZE(modules_help);
+	char *s = strstrip((char *)val);
+	int i;
+
+	for (i = 0; i < size; i++) {
+		if (!strcmp(s, modules_help[i].mp))
+			pr_info("%s\n", modules_help[i].help_info);
+	}
+
+	return 0;
+}
+
+static int crypto_bm_help_param_get(char *val, const struct kernel_param *kp)
+{
+	u32 idx = benchmark_attrs.algtype;
+
+	if (idx >= CRYPTO_BM_ALG_MAX)
+		return -EINVAL;
+
+	benchmark_ops[idx].help();
+
+	return 0;
+}
+
+static const struct kernel_param_ops help_ops = {
+	.set = crypto_bm_help_param_set,
+	.get = crypto_bm_help_param_get,
+};
+
+module_param_cb(help, &help_ops, NULL, 0644);
+MODULE_PARM_DESC(help, help_desc);
+
 static int crypto_bm_numamask_param_set(const char *val, const struct kernel_param *kp)
 {
 	if (atomic_read(&benchmark_status))
diff --git a/crypto/benchmark/bm_comp.c b/crypto/benchmark/bm_comp.c
index 2772a8e86e2e..62192a55b2ab 100644
--- a/crypto/benchmark/bm_comp.c
+++ b/crypto/benchmark/bm_comp.c
@@ -423,3 +423,13 @@ int crypto_bm_perf_comp(struct crypto_bm_thread_data *data)
 
 	return ret;
 }
+
+void crypto_bm_help_comp(void)
+{
+	pr_info("Welcome to the crypto benchmark for compression algorithms!\n"
+		"Some module parameters have class-specific requirements:\n"
+		"optype: 0 for compression, 1 for decompression\n"
+		"inputsize: for compression, the inputsize is src_len,\n"
+		"           for decompression, the inputsize is dst_len, and the src_len depends on the data compression ratio.\n"
+		);
+}
diff --git a/crypto/benchmark/bm_comp.h b/crypto/benchmark/bm_comp.h
index 78b45f8b22a6..aedafde2c3ad 100644
--- a/crypto/benchmark/bm_comp.h
+++ b/crypto/benchmark/bm_comp.h
@@ -14,5 +14,6 @@ void crypto_bm_release_tfm_comp(struct crypto_bm_base *base, u32 idx);
 int crypto_bm_create_req_comp(struct crypto_bm_base *base, u32 idx);
 void crypto_bm_release_req_comp(struct crypto_bm_base *base, u32 idx);
 int crypto_bm_perf_comp(struct crypto_bm_thread_data *data);
+void crypto_bm_help_comp(void);
 
 #endif
-- 
2.24.0



* [RFC PATCH 5/6] crypto: benchmark - add API documentation
  2022-09-19 12:05 [RFC PATCH 0/6] crypto: benchmark - add the crypto benchmark Yang Shen
                   ` (3 preceding siblings ...)
  2022-09-19 12:05 ` [RFC PATCH 4/6] crypto: benchmark - add help information Yang Shen
@ 2022-09-19 12:05 ` Yang Shen
  2022-09-19 12:05 ` [RFC PATCH 6/6] MAINTAINERS: add crypto benchmark MAINTAINER Yang Shen
  2022-09-20  8:28 ` [RFC PATCH 0/6] crypto: benchmark - add the crypto benchmark Herbert Xu
  6 siblings, 0 replies; 14+ messages in thread
From: Yang Shen @ 2022-09-19 12:05 UTC (permalink / raw)
  To: herbert, davem; +Cc: linux-kernel, linux-crypto, gregkh

Provide a crypto benchmark to help developers quickly measure the
performance of an algorithm registered with crypto.

To simulate more scenarios, the tool provides the following parameters
under '/sys/module/crypto_benchmark/parameters/': algorithm, algtype,
inputsize, loop, numamask, optype, reqnum, threadnum and time.
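The 'threadnum' description in the documentation added by this patch divides threads equally among the selected NUMA nodes and puts the remainder on the last node. A small user-space model of that rule, assuming bit n of 'numamask' selects node n (the benchmark.c code that does this is not part of this excerpt, and how a mask of 0 is handled is not covered), might look like:

```c
#include <assert.h>

/* Count the nodes selected in numamask, assuming bit n selects node n. */
static unsigned int bm_nodes_in_mask(unsigned long mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)	/* clear the lowest set bit */
		n++;
	return n;
}

/*
 * Threads placed on the k-th selected node (0-based, in bit order):
 * an equal share each, with the remainder added to the last node.
 */
static unsigned int bm_threads_on_node(unsigned int threadnum,
				       unsigned int nnodes, unsigned int k)
{
	unsigned int share;

	assert(nnodes > 0 && k < nnodes);
	share = threadnum / nnodes;
	return k == nnodes - 1 ? share + threadnum % nnodes : share;
}
```

With numamask=0x5 (nodes 0 and 2) and threadnum=7, this model places 3 threads on node 0 and 4 on node 2.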

To hide the differences between algorithms, the tool drives each crypto
request through the following interface: init, uninit, create_tfm,
release_tfm, create_req, release_req, perf and help.
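To show how such a callback table might be driven, here is a user-space sketch with a simplified, hypothetical 'struct bm_alg_ops' (the real 'struct crypto_bm_alg_ops' lives in benchmark.c, and its 'perf' takes a 'struct crypto_bm_thread_data *'). For brevity it runs everything for one thread in a single function; in the series, 'init'/'uninit' appear to run once per test while the tfm and request callbacks run per thread. The tracing stubs only make the call order visible.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct bm_base;	/* opaque in this sketch */

/* Simplified mirror of the callback table (hypothetical names). */
struct bm_alg_ops {
	const char *alg;
	int (*init)(struct bm_base *base);
	void (*uninit)(struct bm_base *base);
	int (*create_tfm)(struct bm_base *base, unsigned int idx);
	void (*release_tfm)(struct bm_base *base, unsigned int idx);
	int (*create_req)(struct bm_base *base, unsigned int idx);
	void (*release_req)(struct bm_base *base, unsigned int idx);
	int (*perf)(unsigned int idx);
};

/*
 * Set-up callbacks run before perf(), so only perf() would sit inside the
 * timed region; teardown runs in reverse order and the error path releases
 * only what was successfully created.
 */
static int bm_run_one(const struct bm_alg_ops *ops, struct bm_base *base,
		      unsigned int idx)
{
	int ret;

	assert(ops && ops->perf);
	ret = ops->init(base);
	if (ret)
		return ret;
	ret = ops->create_tfm(base, idx);
	if (ret)
		goto out_uninit;
	ret = ops->create_req(base, idx);
	if (ret)
		goto out_release_tfm;

	ret = ops->perf(idx);	/* start/stop timestamps would wrap this call */

	ops->release_req(base, idx);
out_release_tfm:
	ops->release_tfm(base, idx);
out_uninit:
	ops->uninit(base);
	return ret;
}

/* Tracing stubs: each callback appends one letter to bm_trace. */
static char bm_trace[64];
static int bm_fail_req;	/* non-zero makes create_req fail with -ENOMEM */

static void trace(const char *s)
{
	strncat(bm_trace, s, sizeof(bm_trace) - strlen(bm_trace) - 1);
}

static int t_init(struct bm_base *b) { (void)b; trace("I"); return 0; }
static void t_uninit(struct bm_base *b) { (void)b; trace("i"); }
static int t_create_tfm(struct bm_base *b, unsigned int i) { (void)b; (void)i; trace("T"); return 0; }
static void t_release_tfm(struct bm_base *b, unsigned int i) { (void)b; (void)i; trace("t"); }
static int t_create_req(struct bm_base *b, unsigned int i)
{
	(void)b; (void)i;
	trace("R");
	return bm_fail_req ? -ENOMEM : 0;
}
static void t_release_req(struct bm_base *b, unsigned int i) { (void)b; (void)i; trace("r"); }
static int t_perf(unsigned int i) { (void)i; trace("P"); return 0; }

static const struct bm_alg_ops trace_ops = {
	.alg = "trace",
	.init = t_init, .uninit = t_uninit,
	.create_tfm = t_create_tfm, .release_tfm = t_release_tfm,
	.create_req = t_create_req, .release_req = t_release_req,
	.perf = t_perf,
};
```

A successful run records "ITRPrti"; if create_req fails, only the tfm and init work are undone ("ITRti").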

Signed-off-by: Yang Shen <shenyang39@huawei.com>
---
 Documentation/crypto/benchmark.rst | 104 +++++++++++++++++++++++++++++
 1 file changed, 104 insertions(+)
 create mode 100644 Documentation/crypto/benchmark.rst

diff --git a/Documentation/crypto/benchmark.rst b/Documentation/crypto/benchmark.rst
new file mode 100644
index 000000000000..e9b13e81bce3
--- /dev/null
+++ b/Documentation/crypto/benchmark.rst
@@ -0,0 +1,104 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Crypto Benchmark
+================
+
+Overview
+--------
+The crypto benchmark is a crypto algorithm performance tool.
+
+Designed Scheme
+---------------
+
+1. Parameters
+
+The crypto benchmark is used to test the algorithms registered in the crypto
+subsystem. Users can set module parameters to simulate different scenarios.
+Balancing test coverage against ease of use, the benchmark tool provides
+the following module parameters:
+
+algorithm
+  Used to create a 'crypto_tfm'. Valid algorithm names can be found in
+  /proc/crypto.
+
+algtype
+  Selects the algorithm class, i.e. which set of operations is used. The
+  available classes can be listed by echoing 'algtype' to
+  /sys/module/crypto_benchmark/parameters/help.
+
+inputsize
+  The testing input size. The output size is set according to the
+  algorithm.
+
+loop
+  The number of times to try sending a request on each 'crypto_req',
+  which smooths out performance fluctuations caused by the environment.
+  In synchronous mode the number of sends equals the loop count; in
+  asynchronous mode it is often smaller.
+
+numamask
+  The NUMA nodes the test is bound to. The input is parsed as a
+  bitmap.
+
+optype
+  Selects the algorithm operation. The operation types available for the
+  current 'algtype' can be listed by reading
+  /sys/module/crypto_benchmark/parameters/help. For example, crypto comp
+  offers compression and decompression.
+
+reqnum
+  The number of requests per crypto tfm. In asynchronous mode, one thread
+  may use several 'crypto_req' to improve performance. One request per
+  thread corresponds to a synchronous model.
+
+threadnum
+  The number of testing threads. To keep the model simple, one 'crypto_tfm'
+  is created per thread. Threads are divided equally among the specified
+  NUMA nodes; threads that cannot be divided equally are created on the
+  last node.
+
+time
+  The testing time, used to stop the test threads. If the time has not yet
+  elapsed, a thread sends another group of 'loop' requests.
+
+run
+  Triggers the test. Echo 0 to stop all test threads, or any other value
+  to start the test.
+
+help
+  Guides users through the test interface. Echo a module parameter name to
+  'help' to get its detailed description. Reading 'help' shows the notes
+  specific to the current 'algtype'.
+
+2. Register
+
+Crypto algorithms differ too much to share one code path. Therefore, the
+crypto benchmark only performs the common work, and all algorithm-specific
+parts are implemented in the callbacks of each algorithm class. A typical
+crypto task can be divided into three parts: allocating the tfm, allocating
+requests, and sending requests.
+
+A new algorithm class that wants to register with the crypto benchmark
+should implement the following callbacks:
+
+init & uninit
+  Initialization and cleanup. The algorithm class can do private setup here.
+
+create_tfm & release_tfm
+  Allocate and free the 'crypto_tfm'. Each algorithm class has its own tfm
+  type, but all of them contain a member named tfm, so tfm is used as the
+  algorithm handle. The benchmark provides the tfm array.
+
+create_req & release_req
+  Allocate and free the 'crypto_req'. The registrant should create a group
+  of 'reqnum' 'crypto_req' in struct 'crypto_bm_base' and is also advised
+  to prepare the request data here. To simulate realistic cache and TLB hit
+  rates, using a large data set is recommended.
+
+perf
+  Sends the requests. The registrant should use the 'loop' parameter to
+  send requests repeatedly and update the counters in struct
+  'crypto_bm_thread_data'.
+
+help
+  Prints the algorithm-specific meaning of the module parameters.
-- 
2.24.0



* [RFC PATCH 6/6] MAINTAINERS: add crypto benchmark MAINTAINER
  2022-09-19 12:05 [RFC PATCH 0/6] crypto: benchmark - add the crypto benchmark Yang Shen
                   ` (4 preceding siblings ...)
  2022-09-19 12:05 ` [RFC PATCH 5/6] crypto: benchmark - add API documentation Yang Shen
@ 2022-09-19 12:05 ` Yang Shen
  2022-09-20  8:28 ` [RFC PATCH 0/6] crypto: benchmark - add the crypto benchmark Herbert Xu
  6 siblings, 0 replies; 14+ messages in thread
From: Yang Shen @ 2022-09-19 12:05 UTC (permalink / raw)
  To: herbert, davem; +Cc: linux-kernel, linux-crypto, gregkh

Add the maintainer information for the crypto benchmark.

Signed-off-by: Yang Shen <shenyang39@huawei.com>
---
 MAINTAINERS | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 164f67e59e5f..89beaebfab23 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5445,6 +5445,13 @@ F:	include/crypto/
 F:	include/linux/crypto*
 F:	lib/crypto/
 
+CRYPTO BENCHMARK TOOL
+M:	Yang Shen <shenyang39@huawei.com>
+L:	linux-crypto@vger.kernel.org
+S:	Maintained
+F:	Documentation/crypto/benchmark.rst
+F:	crypto/benchmark/
+
 CRYPTOGRAPHIC RANDOM NUMBER GENERATOR
 M:	Neil Horman <nhorman@tuxdriver.com>
 L:	linux-crypto@vger.kernel.org
-- 
2.24.0



* Re: [RFC PATCH 2/6] crypto: benchmark - add a crypto benchmark tool
  2022-09-19 12:05 ` [RFC PATCH 2/6] crypto: benchmark - add a crypto benchmark tool Yang Shen
@ 2022-09-20  7:31   ` Greg KH
  2022-09-21  8:20     ` Yang Shen
  0 siblings, 1 reply; 14+ messages in thread
From: Greg KH @ 2022-09-20  7:31 UTC (permalink / raw)
  To: Yang Shen; +Cc: herbert, davem, linux-kernel, linux-crypto

On Mon, Sep 19, 2022 at 08:05:33PM +0800, Yang Shen wrote:
> Provide a crypto benchmark to help the developer quickly get the
> performance of a algorithm registered in crypto.
> 
> Due to the crypto algorithms have multifarious parameters, the tool
> cannot support all test scenes. In order to provide users with simple
> and easy-to-use tools and support as many test scenarios as possible,
> benchmark refers to the crypto method to provide a unified struct
> 'crypto_bm_ops'. And the algorithm registers its own callbacks to parse
> the user's input. In crypto, a algorithm class has multiple algorithms,
> but all of them uses the same API. So in the benchmark, a algorithm
> class uses the same 'ops' and distinguish specific algorithm by name.
> 
> First, consider the performance calculation model. Considering the
> crypto subsystem model, a reasonable process code based on crypto api
> should create a numa node based 'crypto_tfm' in advance and apply for
> a certain amount of 'crypto_req' according to their own business.
> In the real business processing stage, the thread send tasks based on
> 'crypto_req' and wait for completion.
> 
> Therefore, the benchmark will create 'crypto_tfm' and 'crypto_req' at
> first, and then count all requests time to calculate performance.
> So the result is the pure algorithm performance. When each algorithm
> class implements its own 'ops', it needs to pay attention to the content
> completed in the callback. Before the 'ops.perf', the tool had better
> prepare the request data set. And in order to avoid the false high
> performance of the algorithm caused by the false cache and TLB hit rate,
> the size of data set should be larger than 'crypto_req' number.
> The 'crypto_bm_ops' has following api:
>  - init & uninit
>  The initialize related functions. Algorithm can do some private setting.
>  - create_tfm & release_tfm
>  The 'crypto_tfm' related functions. Algorithm has different tfm name in
>  crypto. But they both has a member named tfm, so use tfm to stand for
>  algorithm handle. The benchmark has provides the tfm array.
>  - create_req & release_req
>  The 'crypto_req' related functions. The callbacks should create a 'reqnum'
>  'crypto_req' group in struct 'crypto_bm_base'. And the also suggest
>  prepare the request data in this function. In order to avoid the false
>  high performance of the algorithm caused by the false cache and TLB hit
>  rate, the size of data set should be larger than 'crypto_req' number.
>  - perf
>  The request sending functions. The registrant should use parameter 'loop'
>  to send requests repeatly. And update the count in struct
>  'crypto_bm_thread_data'.
> 
> Then consider the parameters that user can configure. Generally speaking,
> the following parameters will affect the performance of the algorithm:
> tfm number, request number, block size, numa node. And some parameters
> will affect the stability of performance: testing time and requests sent
> number. To sum up, the benchmark has following parameters:
>  - algorithm
>  The testing algorithm name. Showed in /proc/crypto.
>  - algtype
>  The testing algorithm class. Can get the algorithm class by echo 'algtype'
>  to /sys/module/crypto_benchmark/parameters/help.
>  - inputsize
>  The testing length that can greatly impact performance. Such as data size
>  for compress or key length for encryption.
>  - loop
>  The testing loop times. Avoid performance fluctuations caused by
>  environment.
>  - numamask
>  The testing bind numamask. Used for allocate memory, create threads and
>  create 'crypto_tfm'.
>  - optype
>  The testing algorithm operation type. Can get the algorithm available
>  operation types by cat /sys/module/crypto_benchmark/parameters/help
>  with specified 'algtype'.
>  - reqnum
>  The testing request number for per tfm. Used for test asynchrony api
>  performance.
>  - threadnum
>  The testing thread number. To simplify model, create a 'crypto_tfm' per
>  thread.
>  - time
>  The testing time. Used for stop the test thread.
>  - run
>  Start or stop the test.
> 
> Users can configure parameters under
> /sys/modules/crypto_benchmark/parameters/.

Please don't use module parameters for stuff like this, use configfs
which was designed for this type of interactions.

thanks,

greg k-h


* Re: [RFC PATCH 0/6] crypto: benchmark - add the crypto benchmark
  2022-09-19 12:05 [RFC PATCH 0/6] crypto: benchmark - add the crypto benchmark Yang Shen
                   ` (5 preceding siblings ...)
  2022-09-19 12:05 ` [RFC PATCH 6/6] MAINTAINERS: add crypto benchmark MAINTAINER Yang Shen
@ 2022-09-20  8:28 ` Herbert Xu
  2022-09-21  8:19   ` Yang Shen
  6 siblings, 1 reply; 14+ messages in thread
From: Herbert Xu @ 2022-09-20  8:28 UTC (permalink / raw)
  To: Yang Shen; +Cc: davem, linux-kernel, linux-crypto, gregkh

On Mon, Sep 19, 2022 at 08:05:31PM +0800, Yang Shen wrote:
> Add crypto benchmark - A tool to help the users quickly get the
> performance of a algorithm registered in crypto.

Please explain how this relates to the existing speed testing
functionality in tcrypt.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [RFC PATCH 0/6] crypto: benchmark - add the crypto benchmark
  2022-09-20  8:28 ` [RFC PATCH 0/6] crypto: benchmark - add the crypto benchmark Herbert Xu
@ 2022-09-21  8:19   ` Yang Shen
  2022-09-30  4:51     ` Herbert Xu
  0 siblings, 1 reply; 14+ messages in thread
From: Yang Shen @ 2022-09-21  8:19 UTC (permalink / raw)
  To: Herbert Xu; +Cc: davem, linux-kernel, linux-crypto, gregkh



On 2022/9/20 16:28, Herbert Xu wrote:
> On Mon, Sep 19, 2022 at 08:05:31PM +0800, Yang Shen wrote:
>> Add crypto benchmark - A tool to help the users quickly get the
>> performance of an algorithm registered in crypto.
> Please explain how this relates to the existing speed testing
> functionality in tcrypt.
>
> Thanks,

In fact, my purpose is to get a crypto benchmark tool that is
customizable and easy to call. For example, I test the hardware
performance at every rc1 to check whether modifications to the common
modules affect it. For that I need to test multiple threads, multiple
NUMA nodes, multiple requests per tfm and so on. These test cases
simulate some service scenarios, and with them I can find out whether a
patch applied to a common module has an impact on us.
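Multi-NUMA runs like these are what the patch's 'numamask' parameter is meant to cover. A minimal sketch of one way test threads could be spread over the nodes selected in such a mask, assuming round-robin placement and assuming that a zero mask means "no binding"; `bm_thread_node()` is a hypothetical helper, not code from the patch.

```c
/*
 * Hypothetical helper: pick the NUMA node for thread 'idx' by cycling
 * through the nodes whose bits are set in 'numamask'.
 * Returns -1 when the mask is empty (assumed to mean "no binding").
 */
static int bm_thread_node(unsigned long numamask, unsigned int idx)
{
	unsigned int nodes = 0, want, bit;

	/* Count how many nodes are selected. */
	for (bit = 0; bit < 8 * sizeof(numamask); bit++)
		if (numamask & (1UL << bit))
			nodes++;

	if (nodes == 0)
		return -1;

	/* Round-robin: thread idx gets the (idx % nodes)-th selected node. */
	want = idx % nodes;
	for (bit = 0; bit < 8 * sizeof(numamask); bit++) {
		if (!(numamask & (1UL << bit)))
			continue;
		if (want == 0)
			return (int)bit;
		want--;
	}
	return -1; /* not reached */
}
```

With numamask = 0x5 (nodes 0 and 2), threads 0, 1 and 2 land on nodes 0, 2 and 0.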

I know tcrypt.ko has speed test cases, but its test cases are fixed.
If I understand correctly, the design model of tcrypt.ko is to test the
algorithms under predetermined conditions. It provides standardized
testing to ensure that the implementation of an algorithm meets the
requirements. This is a reasonable developer test tool, but it is not
flexible enough for testers and users.

There are two main reasons for this:
1> For testers, performance is not only related to the algorithm and its
configuration. Many other settings can have an obvious effect on
performance, and tcrypt.ko does not provide them. Of course, this could
be fixed by adding them as module parameters.
2> For users, a friendly tool is one they can use directly, without
reading the source code to learn how to use it. With tcrypt.ko, users
need to look up the 'mode' number of the case they want to test, if such
a case exists.

So the original intention of this tool is to let users test more complex
scenarios and see the parameter usage directly.

If I have misunderstood anything about tcrypt.ko, please correct me, and
I'll try to use tcrypt.ko to meet my requirements.

Thanks,

Yang


* Re: [RFC PATCH 2/6] crypto: benchmark - add a crypto benchmark tool
  2022-09-20  7:31   ` Greg KH
@ 2022-09-21  8:20     ` Yang Shen
  0 siblings, 0 replies; 14+ messages in thread
From: Yang Shen @ 2022-09-21  8:20 UTC
  To: Greg KH; +Cc: herbert, davem, linux-kernel, linux-crypto



On 2022/9/20 15:31, Greg KH wrote:
> On Mon, Sep 19, 2022 at 08:05:33PM +0800, Yang Shen wrote:
>> Provide a crypto benchmark to help developers quickly get the
>> performance of an algorithm registered in crypto.
>>
>> Because crypto algorithms have many different parameters, the tool
>> cannot support every test scenario. In order to provide users with a
>> simple, easy-to-use tool and support as many test scenarios as possible,
>> the benchmark follows the crypto approach and provides a unified struct
>> 'crypto_bm_ops'. Each algorithm registers its own callbacks to parse
>> the user's input. In crypto, an algorithm class has multiple algorithms,
>> but all of them use the same API. So in the benchmark, an algorithm
>> class uses one 'ops' and distinguishes specific algorithms by name.
>>
>> First, consider the performance calculation model. Given the crypto
>> subsystem model, reasonable business code based on the crypto API
>> should create a NUMA-node-based 'crypto_tfm' in advance and allocate
>> a certain number of 'crypto_req's according to its own workload.
>> In the real processing stage, the thread sends tasks based on
>> 'crypto_req' and waits for completion.
>>
>> Therefore, the benchmark creates the 'crypto_tfm's and 'crypto_req's
>> first, and then counts the time of all requests to calculate the
>> performance, so the result is the pure algorithm performance. When each
>> algorithm class implements its own 'ops', it needs to pay attention to
>> what is done in each callback. Before 'ops.perf' is called, the request
>> data set should already be prepared. And in order to avoid falsely high
>> performance caused by artificially high cache and TLB hit rates, the
>> data set should be larger than the number of 'crypto_req's.
>> The 'crypto_bm_ops' has the following API:
>>   - init & uninit
>>   Initialization functions. The algorithm can do some private setup here.
>>   - create_tfm & release_tfm
>>   The 'crypto_tfm' related functions. Each algorithm class has a different
>>   tfm type name in crypto, but all of them have a member named tfm, so
>>   'tfm' is used to stand for the algorithm handle. The benchmark provides
>>   the tfm array.
>>   - create_req & release_req
>>   The 'crypto_req' related functions. The callbacks should create a group
>>   of 'reqnum' 'crypto_req's in struct 'crypto_bm_base'. It is also
>>   suggested to prepare the request data in this function. In order to
>>   avoid falsely high performance caused by artificially high cache and
>>   TLB hit rates, the data set should be larger than the number of
>>   'crypto_req's.
>>   - perf
>>   The request-sending function. The registrant should use the parameter
>>   'loop' to send requests repeatedly, and update the count in struct
>>   'crypto_bm_thread_data'.
>>
>> Then consider the parameters that users can configure. Generally
>> speaking, the following parameters affect the performance of the
>> algorithm: tfm number, request number, block size and NUMA node. Some
>> other parameters affect the stability of the performance: testing time
>> and the number of requests sent. To sum up, the benchmark has the
>> following parameters:
>>   - algorithm
>>   The name of the algorithm under test, as shown in /proc/crypto.
>>   - algtype
>>   The class of the algorithm under test. The available classes can be
>>   listed by echoing 'algtype' to /sys/module/crypto_benchmark/parameters/help.
>>   - inputsize
>>   The test length, which can greatly impact performance, such as the data
>>   size for compression or the key length for encryption.
>>   - loop
>>   The number of test loops. Used to avoid performance fluctuations caused
>>   by the environment.
>>   - numamask
>>   The NUMA mask to bind the test to. Used to allocate memory, create
>>   threads and create the 'crypto_tfm's.
>>   - optype
>>   The operation type of the algorithm under test. The available operation
>>   types can be listed by reading /sys/module/crypto_benchmark/parameters/help
>>   once 'algtype' has been specified.
>>   - reqnum
>>   The number of requests per tfm. Used to test the performance of the
>>   asynchronous API.
>>   - threadnum
>>   The number of test threads. To simplify the model, one 'crypto_tfm' is
>>   created per thread.
>>   - time
>>   The test duration, used to stop the test threads.
>>   - run
>>   Start or stop the test.
>>
>> Users can configure parameters under
>> /sys/module/crypto_benchmark/parameters/.
> Please don't use module parameters for stuff like this, use configfs
> which was designed for this type of interactions.
>
> thanks,
>
> greg k-h
Got it!
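The 'crypto_bm_ops' lifecycle quoted above (create the tfms and requests first, time only 'perf', then release everything) can be modelled in userspace roughly as follows. This is a sketch under assumptions: `struct bm_ops`, `bm_run()` and the mock callbacks are hypothetical, the real callbacks operate on kernel-side structures such as 'crypto_bm_base' rather than a `void *`, and the reverse-order error unwinding is a design choice of this sketch, not something the patch text specifies.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical userspace model of the callback table described in the patch. */
struct bm_ops {
	int  (*init)(void *ctx);
	void (*uninit)(void *ctx);
	int  (*create_tfm)(void *ctx);
	void (*release_tfm)(void *ctx);
	int  (*create_req)(void *ctx);
	void (*release_req)(void *ctx);
	int  (*perf)(void *ctx, unsigned int loop);
};

/*
 * Set everything up before timing starts, call perf, then tear down in
 * reverse order. A failure unwinds only the steps that already succeeded.
 */
static int bm_run(const struct bm_ops *ops, void *ctx, unsigned int loop)
{
	int ret;

	ret = ops->init(ctx);
	if (ret)
		return ret;
	ret = ops->create_tfm(ctx);
	if (ret)
		goto out_uninit;
	ret = ops->create_req(ctx);
	if (ret)
		goto out_tfm;

	ret = ops->perf(ctx, loop);	/* only this step would be timed */

	ops->release_req(ctx);
out_tfm:
	ops->release_tfm(ctx);
out_uninit:
	ops->uninit(ctx);
	return ret;
}

/* Mock callbacks that record the call order as single letters. */
static char trace[128];
static int fail_req;	/* set to 1 to simulate a create_req failure */

static void log_step(const char *s)
{
	strncat(trace, s, sizeof(trace) - strlen(trace) - 1);
}

static int m_init(void *c)         { (void)c; log_step("I"); return 0; }
static void m_uninit(void *c)      { (void)c; log_step("i"); }
static int m_create_tfm(void *c)   { (void)c; log_step("T"); return 0; }
static void m_release_tfm(void *c) { (void)c; log_step("t"); }
static int m_create_req(void *c)
{
	(void)c;
	log_step("R");
	return fail_req ? -ENOMEM : 0;
}
static void m_release_req(void *c) { (void)c; log_step("r"); }

/* ctx is used as a request counter; 'loop' requests are "sent". */
static int m_perf(void *c, unsigned int loop)
{
	unsigned int *count = c;

	*count += loop;
	log_step("P");
	return 0;
}

static const struct bm_ops mock_ops = {
	.init = m_init, .uninit = m_uninit,
	.create_tfm = m_create_tfm, .release_tfm = m_release_tfm,
	.create_req = m_create_req, .release_req = m_release_req,
	.perf = m_perf,
};
```

A successful `bm_run(&mock_ops, &count, 3)` records "ITRPrti"; if create_req fails, only the tfm and init steps are undone ("ITRti") and perf is never called.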

Thanks,

Yang
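The advice quoted above, that the data set be larger than the number of 'crypto_req's, can be illustrated with a round-robin buffer pool. A minimal userspace sketch, assuming requests walk the pool in submission order; `bm_buf_index()` and `bm_distinct_bufs()` are hypothetical helpers, not from the patch.

```c
#include <stddef.h>

/*
 * Index of the input buffer that request slot 'slot' uses on iteration
 * 'iter' when 'reqnum' requests cycle through a pool of 'pool' buffers.
 * With pool == reqnum every slot reuses the same buffer on every
 * iteration (cache- and TLB-hot); with pool > reqnum the data changes.
 */
static size_t bm_buf_index(size_t iter, size_t slot, size_t reqnum, size_t pool)
{
	return (iter * reqnum + slot) % pool;
}

/* Count the distinct buffers slot 0 touches over 'iters' iterations. */
static unsigned int bm_distinct_bufs(size_t iters, size_t reqnum, size_t pool)
{
	unsigned char seen[64] = {0};
	unsigned int n = 0;
	size_t k, idx;

	if (pool == 0 || pool > sizeof(seen))
		return 0;	/* this sketch only tracks up to 64 buffers */

	for (k = 0; k < iters; k++) {
		idx = bm_buf_index(k, 0, reqnum, pool);
		if (!seen[idx]) {
			seen[idx] = 1;
			n++;
		}
	}
	return n;
}
```

With reqnum = 4, a pool of 4 buffers gives slot 0 the same buffer on every iteration, while a pool of 10 gives it five different buffers over ten iterations.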



* Re: [RFC PATCH 0/6] crypto: benchmark - add the crypto benchmark
  2022-09-21  8:19   ` Yang Shen
@ 2022-09-30  4:51     ` Herbert Xu
  2022-10-14  1:43       ` Yang Shen
  0 siblings, 1 reply; 14+ messages in thread
From: Herbert Xu @ 2022-09-30  4:51 UTC
  To: Yang Shen; +Cc: davem, linux-kernel, linux-crypto, gregkh

On Wed, Sep 21, 2022 at 04:19:18PM +0800, Yang Shen wrote:
>
> I know tcrypt.ko has speed test cases, but its test cases are fixed.
> If I understand correctly, the design model of tcrypt.ko is to test the
> algorithms under predetermined conditions. It provides standardized
> testing to ensure that the implementation of an algorithm meets the
> requirements. This is a reasonable developer test tool, but it is not
> flexible enough for testers and users.

How about improving tcrypt then? We're not going to have two things
in the kernel that do the same thing unless you provide a clear path
of eliminating one of them.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [RFC PATCH 0/6] crypto: benchmark - add the crypto benchmark
  2022-09-30  4:51     ` Herbert Xu
@ 2022-10-14  1:43       ` Yang Shen
  2022-10-14  8:25         ` Herbert Xu
  0 siblings, 1 reply; 14+ messages in thread
From: Yang Shen @ 2022-10-14  1:43 UTC
  To: Herbert Xu; +Cc: davem, linux-kernel, linux-crypto, gregkh



On 2022/9/30 12:51, Herbert Xu wrote:
> On Wed, Sep 21, 2022 at 04:19:18PM +0800, Yang Shen wrote:
>> I know tcrypt.ko has speed test cases, but its test cases are fixed.
>> If I understand correctly, the design model of tcrypt.ko is to test the
>> algorithms under predetermined conditions. It provides standardized
>> testing to ensure that the implementation of an algorithm meets the
>> requirements. This is a reasonable developer test tool, but it is not
>> flexible enough for testers and users.
> How about improving tcrypt then? We're not going to have two things
> in the kernel that do the same thing unless you provide a clear path
> of eliminating one of them.
>
> Cheers,

Got it. I'll try to support this in tcrypt.

Thanks,

Yang



* Re: [RFC PATCH 0/6] crypto: benchmark - add the crypto benchmark
  2022-10-14  1:43       ` Yang Shen
@ 2022-10-14  8:25         ` Herbert Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Herbert Xu @ 2022-10-14  8:25 UTC
  To: Yang Shen; +Cc: davem, linux-kernel, linux-crypto, gregkh

On Fri, Oct 14, 2022 at 09:43:40AM +0800, Yang Shen wrote:
>
> Got it. I'll try to support this in tcrypt.

Before you get too far into this, please note that I have no
preference as to whether you go with tcrypt or your new benchmark
code.

My only requirement is that we pick one mechanism.

But obviously others might have a preference, so you should try
to produce RFCs as early as possible.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


end of thread

Thread overview: 14+ messages
2022-09-19 12:05 [RFC PATCH 0/6] crypto: benchmark - add the crypto benchmark Yang Shen
2022-09-19 12:05 ` [RFC PATCH 1/6] moduleparams: Add hexulong type parameter Yang Shen
2022-09-19 12:05 ` [RFC PATCH 2/6] crypto: benchmark - add a crypto benchmark tool Yang Shen
2022-09-20  7:31   ` Greg KH
2022-09-21  8:20     ` Yang Shen
2022-09-19 12:05 ` [RFC PATCH 3/6] crytpo: benchmark - support compression/decompresssion Yang Shen
2022-09-19 12:05 ` [RFC PATCH 4/6] crypto: benchmark - add help information Yang Shen
2022-09-19 12:05 ` [RFC PATCH 5/6] crypto: benchmark - add API documentation Yang Shen
2022-09-19 12:05 ` [RFC PATCH 6/6] MAINTAINERS: add crypto benchmark MAINTAINER Yang Shen
2022-09-20  8:28 ` [RFC PATCH 0/6] crypto: benchmark - add the crypto benchmark Herbert Xu
2022-09-21  8:19   ` Yang Shen
2022-09-30  4:51     ` Herbert Xu
2022-10-14  1:43       ` Yang Shen
2022-10-14  8:25         ` Herbert Xu
