iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] dma-mapping: provide a benchmark for streaming DMA mapping
@ 2020-11-01 10:15 Barry Song
  2020-11-01 10:15 ` [PATCH v2 1/2] dma-mapping: add benchmark support for streaming DMA APIs Barry Song
  2020-11-01 10:15 ` [PATCH v2 2/2] selftests/dma: add test application for DMA_MAP_BENCHMARK Barry Song
  0 siblings, 2 replies; 5+ messages in thread
From: Barry Song @ 2020-11-01 10:15 UTC (permalink / raw)
  To: iommu, hch, robin.murphy, m.szyprowski; +Cc: xuwei5, linuxarm, linux-kselftest

Nowadays, there are increasing requirements to benchmark the performance
of dma_map and dma_unmap particually while the device is attached to an
IOMMU.

This patchset provides the benchmark infrastruture for streaming DMA
mapping. The architecture of the code is pretty much similar with GUP
benchmark:
* mm/gup_benchmark.c provides kernel interface;
* tools/testing/selftests/vm/gup_benchmark.c provides user program to
call the interface provided by mm/gup_benchmark.c.

In our case, kernel/dma/map_benchmark.c is like mm/gup_benchmark.c;
tools/testing/selftests/dma/dma_map_benchmark.c is like tools/testing/
selftests/vm/gup_benchmark.c

A major difference with GUP benchmark is DMA_MAP benchmark needs to run
on a device. Considering one board with below devices and IOMMUs
device A  ------- IOMMU 1
device B  ------- IOMMU 2
device C  ------- non-IOMMU

Different devices might attach to different IOMMU or non-IOMMU. To make
benchmark run, we can either
* create a virtual device and hack the kernel code to attach the virtual
device to IOMMU1, IOMMU2 or non-IOMMU.
* use the existing driver_override mechinism, unbind device A,B, OR c from
their original driver and bind A to dma_map_benchmark platform driver or
pci driver for benchmarking.

In this patchset, I prefer to use the driver_override and avoid the ugly
hack in kernel. We can dynamically switch device behind different IOMMUs
to get the performance of IOMMU or non-IOMMU.

-v2:
  * add PCI support; v1 supported platform devices only
  * replace ssleep by msleep_interruptible() to permit users to exit
    benchmark before it is completed
  * many changes according to Robin's suggestions, thanks! Robin
    - add standard deviation output to reflect the worst case
    - check users' parameters strictly like the number of threads
    - make cache dirty before dma_map
    - fix unpaired dma_map_page and dma_unmap_single;
    - remove redundant "long long" before ktime_to_ns();
    - use devm_add_action();
    - wakeup all threads together after they are ready

Barry Song (2):
  dma-mapping: add benchmark support for streaming DMA APIs
  selftests/dma: add test application for DMA_MAP_BENCHMARK

 MAINTAINERS                                   |   6 +
 kernel/dma/Kconfig                            |   8 +
 kernel/dma/Makefile                           |   1 +
 kernel/dma/map_benchmark.c                    | 295 ++++++++++++++++++
 tools/testing/selftests/dma/Makefile          |   6 +
 tools/testing/selftests/dma/config            |   1 +
 .../testing/selftests/dma/dma_map_benchmark.c |  87 ++++++
 7 files changed, 404 insertions(+)
 create mode 100644 kernel/dma/map_benchmark.c
 create mode 100644 tools/testing/selftests/dma/Makefile
 create mode 100644 tools/testing/selftests/dma/config
 create mode 100644 tools/testing/selftests/dma/dma_map_benchmark.c

-- 
2.25.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v2 1/2] dma-mapping: add benchmark support for streaming DMA APIs
  2020-11-01 10:15 [PATCH v2 0/2] dma-mapping: provide a benchmark for streaming DMA mapping Barry Song
@ 2020-11-01 10:15 ` Barry Song
  2020-11-01 18:57   ` kernel test robot
  2020-11-01 19:22   ` kernel test robot
  2020-11-01 10:15 ` [PATCH v2 2/2] selftests/dma: add test application for DMA_MAP_BENCHMARK Barry Song
  1 sibling, 2 replies; 5+ messages in thread
From: Barry Song @ 2020-11-01 10:15 UTC (permalink / raw)
  To: iommu, hch, robin.murphy, m.szyprowski
  Cc: linux-kselftest, Shuah Khan, linuxarm, xuwei5, Will Deacon

Nowadays, there are increasing requirements to benchmark the performance
of dma_map and dma_unmap particually while the device is attached to an
IOMMU.

This patch enables the support. Users can run specified number of threads
to do dma_map_page and dma_unmap_page on a specific NUMA node with the
specified duration. Then dma_map_benchmark will calculate the average
latency for map and unmap.

A difficulity for this benchmark is that dma_map/unmap APIs must run on
a particular device. Each device might have different backend of IOMMU or
non-IOMMU.

So we use the driver_override to bind dma_map_benchmark to a particual
device by:
For platform devices:
echo dma_map_benchmark > /sys/bus/platform/devices/xxx/driver_override
echo xxx > /sys/bus/platform/drivers/xxx/unbind
echo xxx > /sys/bus/platform/drivers/dma_map_benchmark/bind

For PCI devices:
echo dma_map_benchmark > /sys/bus/pci/devices/0000:00:01.0/driver_override
echo 0000:00:01.0 > /sys/bus/pci/drivers/xxx/unbind
echo 0000:00:01.0 > /sys/bus/pci/drivers/dma_map_benchmark/bind

Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
---
-v2:
  * add PCI support; v1 supported platform devices only
  * replace ssleep by msleep_interruptible() to permit users to exit
    benchmark before it is completed
  * many changes according to Robin's suggestions, thanks! Robin
    - add standard deviation output to reflect the worst case
    - check users' parameters strictly like the number of threads
    - make cache dirty before dma_map
    - fix unpaired dma_map_page and dma_unmap_single;
    - remove redundant "long long" before ktime_to_ns();
    - use devm_add_action();
    - wakeup all threads together after they are ready

 kernel/dma/Kconfig         |   8 +
 kernel/dma/Makefile        |   1 +
 kernel/dma/map_benchmark.c | 295 +++++++++++++++++++++++++++++++++++++
 3 files changed, 304 insertions(+)
 create mode 100644 kernel/dma/map_benchmark.c

diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index c99de4a21458..949c53da5991 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -225,3 +225,11 @@ config DMA_API_DEBUG_SG
 	  is technically out-of-spec.
 
 	  If unsure, say N.
+
+config DMA_MAP_BENCHMARK
+	bool "Enable benchmarking of streaming DMA mapping"
+	help
+	  Provides /sys/kernel/debug/dma_map_benchmark that helps with testing
+	  performance of dma_(un)map_page.
+
+	  See tools/testing/selftests/dma/dma_map_benchmark.c
diff --git a/kernel/dma/Makefile b/kernel/dma/Makefile
index dc755ab68aab..7aa6b26b1348 100644
--- a/kernel/dma/Makefile
+++ b/kernel/dma/Makefile
@@ -10,3 +10,4 @@ obj-$(CONFIG_DMA_API_DEBUG)		+= debug.o
 obj-$(CONFIG_SWIOTLB)			+= swiotlb.o
 obj-$(CONFIG_DMA_COHERENT_POOL)		+= pool.o
 obj-$(CONFIG_DMA_REMAP)			+= remap.o
+obj-$(CONFIG_DMA_MAP_BENCHMARK)		+= map_benchmark.o
diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c
new file mode 100644
index 000000000000..ac397758087b
--- /dev/null
+++ b/kernel/dma/map_benchmark.c
@@ -0,0 +1,295 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Hisilicon Limited.
+ */
+
+#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
+
+#include <linux/debugfs.h>
+#include <linux/delay.h>
+#include <linux/device.h>
+#include <linux/dma-mapping.h>
+#include <linux/kernel.h>
+#include <linux/kthread.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+#include <linux/timekeeping.h>
+
+#define DMA_MAP_BENCHMARK	_IOWR('d', 1, struct map_benchmark)
+#define DMA_MAP_MAX_THREADS	1024
+#define DMA_MAP_MAX_SECONDS	300
+
+struct map_benchmark {
+	__u64 avg_map_100ns; /* average map latency in 100ns */
+	__u64 map_stddev; /* standard deviation of map latency */
+	__u64 avg_unmap_100ns; /* as above */
+	__u64 unmap_stddev;
+	__u32 threads; /* how many threads will do map/unmap in parallel */
+	__u32 seconds; /* how long the test will last */
+	int node; /* which numa node this benchmark will run on */
+	__u64 expansion[10];	/* For future use */
+};
+
+struct map_benchmark_data {
+	struct map_benchmark bparam;
+	struct device *dev;
+	struct dentry  *debugfs;
+	atomic64_t sum_map_100ns;
+	atomic64_t sum_unmap_100ns;
+	atomic64_t sum_square_map;
+	atomic64_t sum_square_unmap;
+	atomic64_t loops;
+};
+
+static int map_benchmark_thread(void *data)
+{
+	void *buf;
+	dma_addr_t dma_addr;
+	struct map_benchmark_data *map = data;
+	int ret = 0;
+
+	buf = (void *)__get_free_page(GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	while (!kthread_should_stop())  {
+		__u64 map_100ns, unmap_100ns, map_square, unmap_square;
+		ktime_t map_stime, map_etime, unmap_stime, unmap_etime;
+
+		/*
+		 * for a non-coherent device, if we don't stain them in the cache,
+		 * this will give an underestimate of the real-world overhead of
+		 * BIDIRECTIONAL or TO_DEVICE mappings
+		 * 66 means evertything goes well! 66 is lucky.
+		 */
+		memset(buf, 0x66, PAGE_SIZE);
+
+		map_stime = ktime_get();
+		dma_addr = dma_map_single(map->dev, buf, PAGE_SIZE, DMA_BIDIRECTIONAL);
+		if (unlikely(dma_mapping_error(map->dev, dma_addr))) {
+			pr_err("dma_map_single failed on %s\n", dev_name(map->dev));
+			ret = -ENOMEM;
+			goto out;
+		}
+		map_etime = ktime_get();
+
+		unmap_stime = ktime_get();
+		dma_unmap_single(map->dev, dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
+		unmap_etime = ktime_get();
+
+		/* calculate sum and sum of squares */
+		map_100ns = ktime_to_ns(ktime_sub(map_etime, map_stime)) / 100;
+		unmap_100ns = ktime_to_ns(ktime_sub(unmap_etime, unmap_stime)) / 100;
+		map_square = map_100ns * map_100ns;
+		unmap_square = unmap_100ns * unmap_100ns;
+
+		atomic64_add(map_100ns, &map->sum_map_100ns);
+		atomic64_add(unmap_100ns, &map->sum_unmap_100ns);
+		atomic64_add(map_square, &map->sum_square_map);
+		atomic64_add(unmap_square, &map->sum_square_unmap);
+		atomic64_inc(&map->loops);
+	}
+
+out:
+	free_page((unsigned long)buf);
+	return ret;
+}
+
+static int do_map_benchmark(struct map_benchmark_data *map)
+{
+	struct task_struct **tsk;
+	int threads = map->bparam.threads;
+	int node = map->bparam.node;
+	const cpumask_t *cpu_mask = cpumask_of_node(node);
+	__u64 loops;
+	int ret = 0;
+	int i;
+
+	tsk = kmalloc_array(threads, sizeof(tsk), GFP_KERNEL);
+	if (!tsk)
+		return -ENOMEM;
+
+	get_device(map->dev);
+
+	for (i = 0; i < threads; i++) {
+		tsk[i] = kthread_create_on_node(map_benchmark_thread, map,
+				map->bparam.node, "dma-map-benchmark/%d", i);
+		if (IS_ERR(tsk[i])) {
+			pr_err("create dma_map thread failed\n");
+			ret = PTR_ERR(tsk[i]);
+			goto out;
+		}
+
+		if (node != NUMA_NO_NODE && node_online(node))
+			kthread_bind_mask(tsk[i], cpu_mask);
+	}
+
+	/* clear the old value in the previous benchmark */
+	atomic64_set(&map->sum_map_100ns, 0);
+	atomic64_set(&map->sum_unmap_100ns, 0);
+	atomic64_set(&map->sum_square_map, 0);
+	atomic64_set(&map->sum_square_unmap, 0);
+	atomic64_set(&map->loops, 0);
+
+	for (i = 0; i < threads; i++)
+		wake_up_process(tsk[i]);
+
+	msleep_interruptible(map->bparam.seconds * 1000);
+
+	/* wait for the completion of benchmark threads */
+	for (i = 0; i < threads; i++) {
+		ret = kthread_stop(tsk[i]);
+		if (ret)
+			goto out;
+	}
+
+	loops = atomic64_read(&map->loops);
+	if (likely(loops > 0)) {
+		__u64 map_variance, unmap_variance;
+
+		/* average latency */
+		map->bparam.avg_map_100ns = atomic64_read(&map->sum_map_100ns) / loops;
+		map->bparam.avg_unmap_100ns = atomic64_read(&map->sum_unmap_100ns) / loops;
+
+		/* standard deviation of latency */
+		map_variance = atomic64_read(&map->sum_square_map) / loops -
+			map->bparam.avg_map_100ns * map->bparam.avg_map_100ns;
+		unmap_variance = atomic64_read(&map->sum_square_unmap) / loops -
+			map->bparam.avg_unmap_100ns * map->bparam.avg_unmap_100ns;
+		map->bparam.map_stddev = int_sqrt64(map_variance);
+		map->bparam.unmap_stddev = int_sqrt64(unmap_variance);
+	}
+
+out:
+	put_device(map->dev);
+	kfree(tsk);
+	return ret;
+}
+
+static long map_benchmark_ioctl(struct file *filep, unsigned int cmd,
+		unsigned long arg)
+{
+	struct map_benchmark_data *map = filep->private_data;
+	int ret;
+
+	if (copy_from_user(&map->bparam, (void __user *)arg, sizeof(map->bparam)))
+		return -EFAULT;
+
+	switch (cmd) {
+	case DMA_MAP_BENCHMARK:
+		if (map->bparam.threads == 0 || map->bparam.threads > DMA_MAP_MAX_THREADS) {
+			pr_err("invalid thread number\n");
+			return -EINVAL;
+		}
+		if (map->bparam.seconds == 0 || map->bparam.seconds > DMA_MAP_MAX_SECONDS) {
+			pr_err("invalid duration seconds\n");
+			return -EINVAL;
+		}
+
+		ret = do_map_benchmark(map);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (copy_to_user((void __user *)arg, &map->bparam, sizeof(map->bparam)))
+		return -EFAULT;
+
+	return ret;
+}
+
+static const struct file_operations map_benchmark_fops = {
+	.open = simple_open,
+	.unlocked_ioctl = map_benchmark_ioctl,
+};
+
+static void map_benchmark_remove_debugfs(void *data)
+{
+	struct map_benchmark_data *map = (struct map_benchmark_data *)data;
+
+	debugfs_remove(map->debugfs);
+}
+
+static int __map_benchmark_probe(struct device *dev)
+{
+	struct dentry *entry;
+	struct map_benchmark_data *map;
+	int ret;
+
+	map = devm_kzalloc(dev, sizeof(*map), GFP_KERNEL);
+	if (!map)
+		return -ENOMEM;
+	map->dev = dev;
+
+	ret = devm_add_action(dev, map_benchmark_remove_debugfs, map);
+	if (ret) {
+		pr_err("Can't add debugfs remove action\n");
+		return ret;
+	}
+
+	/*
+	 * we only permit a device bound with this driver, 2nd probe
+	 * will fail
+	 */
+	entry = debugfs_create_file("dma_map_benchmark", 0600, NULL, map,
+			&map_benchmark_fops);
+	if (IS_ERR(entry))
+		return PTR_ERR(entry);
+	map->debugfs = entry;
+
+	return 0;
+}
+
+static int map_benchmark_platform_probe(struct platform_device *pdev)
+{
+	return __map_benchmark_probe(&pdev->dev);
+}
+
+static struct platform_driver map_benchmark_platform_driver = {
+	.driver		= {
+		.name	= "dma_map_benchmark",
+	},
+	.probe = map_benchmark_platform_probe,
+};
+
+static int map_benchmark_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+	return __map_benchmark_probe(&pdev->dev);
+}
+
+static struct pci_driver map_benchmark_pci_driver = {
+	.name	= "dma_map_benchmark",
+	.probe	= map_benchmark_pci_probe,
+};
+
+static int __init map_benchmark_init(void)
+{
+	int ret;
+
+	ret = pci_register_driver(&map_benchmark_pci_driver);
+	if (ret)
+		return ret;
+
+	ret = platform_driver_register(&map_benchmark_platform_driver);
+	if (ret) {
+		pci_unregister_driver(&map_benchmark_pci_driver);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void __exit map_benchmark_cleanup(void)
+{
+	platform_driver_unregister(&map_benchmark_platform_driver);
+	pci_unregister_driver(&map_benchmark_pci_driver);
+}
+
+module_init(map_benchmark_init);
+module_exit(map_benchmark_cleanup);
+
+MODULE_AUTHOR("Barry Song <song.bao.hua@hisilicon.com>");
+MODULE_DESCRIPTION("dma_map benchmark driver");
+MODULE_LICENSE("GPL");
-- 
2.25.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2 2/2] selftests/dma: add test application for DMA_MAP_BENCHMARK
  2020-11-01 10:15 [PATCH v2 0/2] dma-mapping: provide a benchmark for streaming DMA mapping Barry Song
  2020-11-01 10:15 ` [PATCH v2 1/2] dma-mapping: add benchmark support for streaming DMA APIs Barry Song
@ 2020-11-01 10:15 ` Barry Song
  1 sibling, 0 replies; 5+ messages in thread
From: Barry Song @ 2020-11-01 10:15 UTC (permalink / raw)
  To: iommu, hch, robin.murphy, m.szyprowski
  Cc: linux-kselftest, Shuah Khan, linuxarm, xuwei5, Will Deacon

This patch provides the test application for DMA_MAP_BENCHMARK.

Before running the test application, we need to bind a device to dma_map_
benchmark driver. For example, unbind "xxx" from its original driver and
bind to dma_map_benchmark:

echo dma_map_benchmark > /sys/bus/platform/devices/xxx/driver_override
echo xxx > /sys/bus/platform/drivers/xxx/unbind
echo xxx > /sys/bus/platform/drivers/dma_map_benchmark/bind

Another example for PCI devices:
echo dma_map_benchmark > /sys/bus/pci/devices/0000:00:01.0/driver_override
echo 0000:00:01.0 > /sys/bus/pci/drivers/xxx/unbind
echo 0000:00:01.0 > /sys/bus/pci/drivers/dma_map_benchmark/bind

The below command will run 16 threads on numa node 0 for 10 seconds on
the device bound to dma_map_benchmark platform_driver or pci_driver:
./dma_map_benchmark -t 16 -s 10 -n 0
dma mapping benchmark: threads:16 seconds:10
average map latency(us):1.1 standard deviation:1.9
average unmap latency(us):0.5 standard deviation:0.8

Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
---
 -v2:
 * check parameters like threads, seconds strictly
 * print standard deviation for latencies

 MAINTAINERS                                   |  6 ++
 tools/testing/selftests/dma/Makefile          |  6 ++
 tools/testing/selftests/dma/config            |  1 +
 .../testing/selftests/dma/dma_map_benchmark.c | 87 +++++++++++++++++++
 4 files changed, 100 insertions(+)
 create mode 100644 tools/testing/selftests/dma/Makefile
 create mode 100644 tools/testing/selftests/dma/config
 create mode 100644 tools/testing/selftests/dma/dma_map_benchmark.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 608fc8484c02..a1e38d5e14f6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5247,6 +5247,12 @@ F:	include/linux/dma-mapping.h
 F:	include/linux/dma-map-ops.h
 F:	kernel/dma/
 
+DMA MAPPING BENCHMARK
+M:	Barry Song <song.bao.hua@hisilicon.com>
+L:	iommu@lists.linux-foundation.org
+F:	kernel/dma/map_benchmark.c
+F:	tools/testing/selftests/dma/
+
 DMA-BUF HEAPS FRAMEWORK
 M:	Sumit Semwal <sumit.semwal@linaro.org>
 R:	Benjamin Gaignard <benjamin.gaignard@linaro.org>
diff --git a/tools/testing/selftests/dma/Makefile b/tools/testing/selftests/dma/Makefile
new file mode 100644
index 000000000000..aa8e8b5b3864
--- /dev/null
+++ b/tools/testing/selftests/dma/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+CFLAGS += -I../../../../usr/include/
+
+TEST_GEN_PROGS := dma_map_benchmark
+
+include ../lib.mk
diff --git a/tools/testing/selftests/dma/config b/tools/testing/selftests/dma/config
new file mode 100644
index 000000000000..6102ee3c43cd
--- /dev/null
+++ b/tools/testing/selftests/dma/config
@@ -0,0 +1 @@
+CONFIG_DMA_MAP_BENCHMARK=y
diff --git a/tools/testing/selftests/dma/dma_map_benchmark.c b/tools/testing/selftests/dma/dma_map_benchmark.c
new file mode 100644
index 000000000000..4778df0c458f
--- /dev/null
+++ b/tools/testing/selftests/dma/dma_map_benchmark.c
@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Hisilicon Limited.
+ */
+
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <linux/types.h>
+
+#define DMA_MAP_BENCHMARK	_IOWR('d', 1, struct map_benchmark)
+#define DMA_MAP_MAX_THREADS	1024
+#define DMA_MAP_MAX_SECONDS     300
+
+struct map_benchmark {
+	__u64 avg_map_100ns; /* average map latency in 100ns */
+	__u64 map_stddev; /* standard deviation of map latency */
+	__u64 avg_unmap_100ns; /* as above */
+	__u64 unmap_stddev;
+	__u32 threads; /* how many threads will do map/unmap in parallel */
+	__u32 seconds; /* how long the test will last */
+	int node; /* which numa node this benchmark will run on */
+	__u64 expansion[10];	/* For future use */
+};
+
+int main(int argc, char **argv)
+{
+	struct map_benchmark map;
+	int fd, opt;
+	/* default single thread, run 20 seconds on NUMA_NO_NODE */
+	int threads = 1, seconds = 20, node = -1;
+	int cmd = DMA_MAP_BENCHMARK;
+	char *p;
+
+	while ((opt = getopt(argc, argv, "t:s:n:")) != -1) {
+		switch (opt) {
+		case 't':
+			threads = atoi(optarg);
+			break;
+		case 's':
+			seconds = atoi(optarg);
+			break;
+		case 'n':
+			node = atoi(optarg);
+			break;
+		default:
+			return -1;
+		}
+	}
+
+	if (threads <= 0 || threads > DMA_MAP_MAX_THREADS) {
+		fprintf(stderr, "invalid number of threads, must be in 1-%d\n",
+			DMA_MAP_MAX_THREADS);
+		exit(1);
+	}
+
+	if (seconds <= 0 || seconds > DMA_MAP_MAX_SECONDS) {
+		fprintf(stderr, "invalid number of seconds, must be in 1-%d\n",
+			DMA_MAP_MAX_SECONDS);
+		exit(1);
+	}
+
+	fd = open("/sys/kernel/debug/dma_map_benchmark", O_RDWR);
+	if (fd == -1) {
+		perror("open");
+		exit(1);
+	}
+
+	map.seconds = seconds;
+	map.threads = threads;
+	map.node = node;
+	if (ioctl(fd, cmd, &map)) {
+		perror("ioctl");
+		exit(1);
+	}
+
+	printf("dma mapping benchmark: threads:%d seconds:%d\n", threads, seconds);
+	printf("average map latency(us):%.1f standard deviation:%.1f\n",
+			map.avg_map_100ns/10.0, map.map_stddev/10.0);
+	printf("average unmap latency(us):%.1f standard deviation:%.1f\n",
+			map.avg_unmap_100ns/10.0, map.unmap_stddev/10.0);
+
+	return 0;
+}
-- 
2.25.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 1/2] dma-mapping: add benchmark support for streaming DMA APIs
  2020-11-01 10:15 ` [PATCH v2 1/2] dma-mapping: add benchmark support for streaming DMA APIs Barry Song
@ 2020-11-01 18:57   ` kernel test robot
  2020-11-01 19:22   ` kernel test robot
  1 sibling, 0 replies; 5+ messages in thread
From: kernel test robot @ 2020-11-01 18:57 UTC (permalink / raw)
  To: Barry Song, iommu, hch, robin.murphy, m.szyprowski
  Cc: kbuild-all, linux-kselftest, linuxarm, xuwei5, Will Deacon

[-- Attachment #1: Type: text/plain, Size: 3586 bytes --]

Hi Barry,

I love your patch! Yet something to improve:

[auto build test ERROR on kselftest/next]
[also build test ERROR on linus/master v5.10-rc1 next-20201030]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Barry-Song/dma-mapping-provide-a-benchmark-for-streaming-DMA-mapping/20201101-182009
base:   https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git next
config: mips-allyesconfig (attached as .config)
compiler: mips-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/b9abda38be7f32b9420c27b6c24eff2e69defa87
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Barry-Song/dma-mapping-provide-a-benchmark-for-streaming-DMA-mapping/20201101-182009
        git checkout b9abda38be7f32b9420c27b6c24eff2e69defa87
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=mips 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   arch/mips/kernel/head.o: in function `dtb_found':
   (.ref.text+0xe0): relocation truncated to fit: R_MIPS_26 against `start_kernel'
   init/main.o: in function `set_reset_devices':
   main.c:(.init.text+0x20): relocation truncated to fit: R_MIPS_26 against `_mcount'
   main.c:(.init.text+0x30): relocation truncated to fit: R_MIPS_26 against `__sanitizer_cov_trace_pc'
   init/main.o: in function `debug_kernel':
   main.c:(.init.text+0x9c): relocation truncated to fit: R_MIPS_26 against `_mcount'
   main.c:(.init.text+0xac): relocation truncated to fit: R_MIPS_26 against `__sanitizer_cov_trace_pc'
   init/main.o: in function `quiet_kernel':
   main.c:(.init.text+0x118): relocation truncated to fit: R_MIPS_26 against `_mcount'
   main.c:(.init.text+0x128): relocation truncated to fit: R_MIPS_26 against `__sanitizer_cov_trace_pc'
   init/main.o: in function `init_setup':
   main.c:(.init.text+0x1a4): relocation truncated to fit: R_MIPS_26 against `_mcount'
   main.c:(.init.text+0x1c8): relocation truncated to fit: R_MIPS_26 against `__sanitizer_cov_trace_pc'
   main.c:(.init.text+0x1e8): relocation truncated to fit: R_MIPS_26 against `__sanitizer_cov_trace_pc'
   main.c:(.init.text+0x1fc): additional relocation overflows omitted from the output
   mips-linux-ld: kernel/dma/map_benchmark.o: in function `map_benchmark_thread':
>> map_benchmark.c:(.text.map_benchmark_thread+0x1f4): undefined reference to `__divdi3'
>> mips-linux-ld: map_benchmark.c:(.text.map_benchmark_thread+0x218): undefined reference to `__divdi3'
   mips-linux-ld: kernel/dma/map_benchmark.o: in function `do_map_benchmark':
>> map_benchmark.c:(.text.do_map_benchmark+0x260): undefined reference to `__udivdi3'
>> mips-linux-ld: map_benchmark.c:(.text.do_map_benchmark+0x284): undefined reference to `__udivdi3'
   mips-linux-ld: map_benchmark.c:(.text.do_map_benchmark+0x2b4): undefined reference to `__udivdi3'
   mips-linux-ld: map_benchmark.c:(.text.do_map_benchmark+0x300): undefined reference to `__udivdi3'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 69039 bytes --]

[-- Attachment #3: Type: text/plain, Size: 156 bytes --]

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 1/2] dma-mapping: add benchmark support for streaming DMA APIs
  2020-11-01 10:15 ` [PATCH v2 1/2] dma-mapping: add benchmark support for streaming DMA APIs Barry Song
  2020-11-01 18:57   ` kernel test robot
@ 2020-11-01 19:22   ` kernel test robot
  1 sibling, 0 replies; 5+ messages in thread
From: kernel test robot @ 2020-11-01 19:22 UTC (permalink / raw)
  To: Barry Song, iommu, hch, robin.murphy, m.szyprowski
  Cc: kbuild-all, linux-kselftest, linuxarm, xuwei5, Will Deacon

[-- Attachment #1: Type: text/plain, Size: 2280 bytes --]

Hi Barry,

I love your patch! Yet something to improve:

[auto build test ERROR on kselftest/next]
[also build test ERROR on linus/master v5.10-rc1]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Barry-Song/dma-mapping-provide-a-benchmark-for-streaming-DMA-mapping/20201101-182009
base:   https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git next
config: h8300-allyesconfig (attached as .config)
compiler: h8300-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/b9abda38be7f32b9420c27b6c24eff2e69defa87
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Barry-Song/dma-mapping-provide-a-benchmark-for-streaming-DMA-mapping/20201101-182009
        git checkout b9abda38be7f32b9420c27b6c24eff2e69defa87
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=h8300 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   h8300-linux-ld: kernel/dma/map_benchmark.o: in function `.L28':
   map_benchmark.c:(.text+0x283): undefined reference to `__udivdi3'
>> h8300-linux-ld: map_benchmark.c:(.text+0x2c1): undefined reference to `__udivdi3'
   h8300-linux-ld: map_benchmark.c:(.text+0x327): undefined reference to `__udivdi3'
   h8300-linux-ld: kernel/dma/map_benchmark.o: in function `.L26':
   map_benchmark.c:(.text+0x3d7): undefined reference to `__udivdi3'
   h8300-linux-ld: kernel/dma/map_benchmark.o: in function `.L44':
   map_benchmark.c:(.text+0x799): undefined reference to `__divdi3'
   h8300-linux-ld: kernel/dma/map_benchmark.o: in function `.L45':
   map_benchmark.c:(.text+0x7f5): undefined reference to `__divdi3'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 57194 bytes --]

[-- Attachment #3: Type: text/plain, Size: 156 bytes --]

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-11-01 19:22 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-01 10:15 [PATCH v2 0/2] dma-mapping: provide a benchmark for streaming DMA mapping Barry Song
2020-11-01 10:15 ` [PATCH v2 1/2] dma-mapping: add benchmark support for streaming DMA APIs Barry Song
2020-11-01 18:57   ` kernel test robot
2020-11-01 19:22   ` kernel test robot
2020-11-01 10:15 ` [PATCH v2 2/2] selftests/dma: add test application for DMA_MAP_BENCHMARK Barry Song

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).