linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v3 0/3] add benchmark selftest and optimization for ptr_ring
@ 2021-07-01 12:26 Yunsheng Lin
  2021-07-01 12:26 ` [PATCH net-next v3 1/3] selftests/ptr_ring: add benchmark application " Yunsheng Lin
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Yunsheng Lin @ 2021-07-01 12:26 UTC (permalink / raw)
  To: davem, kuba, jasowang, mst
  Cc: brouer, paulmck, peterz, will, shuah, linux-kernel, netdev,
	linux-kselftest, linuxarm

Patch 1: add a selftest app to benchmark the performance
         of ptr_ring.
Patch 2: move r->queue[] clearing after r->consumer_head
         updating.
Patch 3: add barrier to ensure the visiblity of r->queue[].

V3: add patch 3 and address most of Michael's comment.
V2: add patch 1 and add performance data for patch 2.

---
Performance raw data using "perf stat -r" cmd, comparison
is also done in patch 2/3.
ptr_ring_test_org: patch 1
ptr_ring_test_opt1: patch 1 + patch 2
ptr_ring_test_opt2: patch 1 + patch 2 + patch 3


x86_64(as there is other workload in the x86_64 system, so
run 1000 times to get more accurate result):

Performance counter stats for './ptr_ring_test_org -s 1000 -m 1 -N 100000000' (1000 runs):

          5,291.83 msec task-clock                #    1.994 CPUs utilized            ( +-  0.41% )
               690      context-switches          #    0.130 K/sec                    ( +-  3.65% )
                 8      cpu-migrations            #    0.002 K/sec                    ( +-  5.70% )
               291      page-faults               #    0.055 K/sec                    ( +-  0.05% )
    12,660,040,758      cycles                    #    2.392 GHz                      ( +-  0.41% )
    24,202,160,722      instructions              #    1.91  insn per cycle           ( +-  0.06% )
     3,559,123,597      branches                  #  672.569 M/sec                    ( +-  0.07% )
         8,009,010      branch-misses             #    0.23% of all branches          ( +-  0.11% )

            2.6538 +- 0.0109 seconds time elapsed  ( +-  0.41% )

 Performance counter stats for './ptr_ring_test_opt1 -s 1000 -m 1 -N 100000000' (1000 runs):

          5,064.95 msec task-clock                #    1.992 CPUs utilized            ( +-  0.55% )
               668      context-switches          #    0.132 K/sec                    ( +-  4.20% )
                 9      cpu-migrations            #    0.002 K/sec                    ( +-  4.45% )
               291      page-faults               #    0.057 K/sec                    ( +-  0.06% )
    12,117,262,182      cycles                    #    2.392 GHz                      ( +-  0.55% )
    22,586,035,716      instructions              #    1.86  insn per cycle           ( +-  0.08% )
     3,404,652,345      branches                  #  672.199 M/sec                    ( +-  0.10% )
         7,864,190      branch-misses             #    0.23% of all branches          ( +-  0.16% )

            2.5422 +- 0.0142 seconds time elapsed  ( +-  0.56% )

 Performance counter stats for './ptr_ring_test_opt2 -s 1000 -m 1 -N 100000000' (1000 runs):

          5,105.33 msec task-clock                #    1.995 CPUs utilized            ( +-  0.47% )
               589      context-switches          #    0.115 K/sec                    ( +-  4.24% )
                11      cpu-migrations            #    0.002 K/sec                    ( +-  4.24% )
               292      page-faults               #    0.057 K/sec                    ( +-  0.04% )
    12,214,160,307      cycles                    #    2.392 GHz                      ( +-  0.47% )
    22,756,292,370      instructions              #    1.86  insn per cycle           ( +-  0.10% )
     3,429,218,233      branches                  #  671.694 M/sec                    ( +-  0.12% )
         7,921,984      branch-misses             #    0.23% of all branches          ( +-  0.15% )

            2.5587 +- 0.0122 seconds time elapsed  ( +-  0.47% )


-------------------------------------------------------------------------------------------------
arm64(using taskset to avoid the numa effects):

Performance counter stats for 'taskset -c 0-1 ./ptr_ring_test_org -s 1000 -m 1 -N 100000000' (100 runs):

           4172.83 msec task-clock                #    1.999 CPUs utilized            ( +-  0.01% )
                54      context-switches          #    0.013 K/sec                    ( +-  0.29% )
                 1      cpu-migrations            #    0.000 K/sec
               115      page-faults               #    0.028 K/sec                    ( +-  0.16% )
       10848085945      cycles                    #    2.600 GHz                      ( +-  0.01% )
       25808501369      instructions              #    2.38  insn per cycle           ( +-  0.00% )
   <not supported>      branches
          11190266      branch-misses                                                 ( +-  0.02% )

          2.087205 +- 0.000130 seconds time elapsed  ( +-  0.01% )


 Performance counter stats for 'taskset -c 0-1 ./ptr_ring_test_opt1 -s 1000 -m 1 -N 100000000' (100 runs):

           3774.91 msec task-clock                #    1.999 CPUs utilized            ( +-  0.03% )
                50      context-switches          #    0.013 K/sec                    ( +-  0.36% )
                 1      cpu-migrations            #    0.000 K/sec
               114      page-faults               #    0.030 K/sec                    ( +-  0.15% )
        9813658996      cycles                    #    2.600 GHz                      ( +-  0.03% )
       23920189000      instructions              #    2.44  insn per cycle           ( +-  0.01% )
   <not supported>      branches
          10018927      branch-misses                                                 ( +-  0.04% )

          1.888224 +- 0.000541 seconds time elapsed  ( +-  0.03% )

 Performance counter stats for 'taskset -c 0-1 ./ptr_ring_test_opt2 -s 1000 -m 1 -N 100000000' (100 runs):

           3785.79 msec task-clock                #    1.999 CPUs utilized            ( +-  0.03% )
                49      context-switches          #    0.013 K/sec                    ( +-  0.32% )
                 1      cpu-migrations            #    0.000 K/sec
               114      page-faults               #    0.030 K/sec                    ( +-  0.15% )
        9842067534      cycles                    #    2.600 GHz                      ( +-  0.03% )
       24074397270      instructions              #    2.45  insn per cycle           ( +-  0.01% )
   <not supported>      branches
          10091918      branch-misses                                                 ( +-  0.04% )

          1.893673 +- 0.000508 seconds time elapsed  ( +-  0.03% )

Yunsheng Lin (3):
  selftests/ptr_ring: add benchmark application for ptr_ring
  ptr_ring: move r->queue[] clearing after r->consumer_head updating
  ptr_ring: add barrier to ensure the visiblity of r->queue[]

 MAINTAINERS                                      |   5 +
 include/linux/ptr_ring.h                         |  52 ++++--
 tools/testing/selftests/ptr_ring/Makefile        |   6 +
 tools/testing/selftests/ptr_ring/ptr_ring_test.c | 224 +++++++++++++++++++++++
 tools/testing/selftests/ptr_ring/ptr_ring_test.h | 130 +++++++++++++
 5 files changed, 399 insertions(+), 18 deletions(-)
 create mode 100644 tools/testing/selftests/ptr_ring/Makefile
 create mode 100644 tools/testing/selftests/ptr_ring/ptr_ring_test.c
 create mode 100644 tools/testing/selftests/ptr_ring/ptr_ring_test.h

-- 
2.7.4


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next v3 1/3] selftests/ptr_ring: add benchmark application for ptr_ring
  2021-07-01 12:26 [PATCH net-next v3 0/3] add benchmark selftest and optimization for ptr_ring Yunsheng Lin
@ 2021-07-01 12:26 ` Yunsheng Lin
  2021-07-02  6:43   ` Jason Wang
  2021-07-01 12:26 ` [PATCH net-next v3 2/3] ptr_ring: move r->queue[] clearing after r->consumer_head updating Yunsheng Lin
  2021-07-01 12:26 ` [PATCH net-next v3 3/3] ptr_ring: add barrier to ensure the visiblity of r->queue[] Yunsheng Lin
  2 siblings, 1 reply; 15+ messages in thread
From: Yunsheng Lin @ 2021-07-01 12:26 UTC (permalink / raw)
  To: davem, kuba, jasowang, mst
  Cc: brouer, paulmck, peterz, will, shuah, linux-kernel, netdev,
	linux-kselftest, linuxarm

Currently ptr_ring selftest is embedded within the virtio
selftest, which involves some specific virtio operation,
such as notifying and kicking.

As ptr_ring has been used by various subsystems, it deserves
it's owner selftest in order to benchmark different usecase
of ptr_ring, such as page pool and pfifo_fast qdisc.

So add a simple application to benchmark ptr_ring performance.
Currently two test mode is supported:
Mode 0: Both producing and consuming is done in a single thread,
        it is called simple test mode in the test app.
Mode 1: Producing and consuming is done in different thread
        concurrently, also known as SPSC(single-producer/
        single-consumer) test.

The multi-producer/single-consumer test for pfifo_fast case is
not added yet, which can be added if using CAS atomic operation
to enable lockless multi-producer is proved to be better than
using r->producer_lock.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
V3: Remove timestamp sampling, use standard C library as much
    as possible.
---
 MAINTAINERS                                      |   5 +
 tools/testing/selftests/ptr_ring/Makefile        |   6 +
 tools/testing/selftests/ptr_ring/ptr_ring_test.c | 224 +++++++++++++++++++++++
 tools/testing/selftests/ptr_ring/ptr_ring_test.h | 130 +++++++++++++
 4 files changed, 365 insertions(+)
 create mode 100644 tools/testing/selftests/ptr_ring/Makefile
 create mode 100644 tools/testing/selftests/ptr_ring/ptr_ring_test.c
 create mode 100644 tools/testing/selftests/ptr_ring/ptr_ring_test.h

diff --git a/MAINTAINERS b/MAINTAINERS
index cc375fd..1227022 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14847,6 +14847,11 @@ F:	drivers/net/phy/dp83640*
 F:	drivers/ptp/*
 F:	include/linux/ptp_cl*
 
+PTR RING BENCHMARK
+M:	Yunsheng Lin <linyunsheng@huawei.com>
+L:	netdev@vger.kernel.org
+F:	tools/testing/selftests/ptr_ring/
+
 PTRACE SUPPORT
 M:	Oleg Nesterov <oleg@redhat.com>
 S:	Maintained
diff --git a/tools/testing/selftests/ptr_ring/Makefile b/tools/testing/selftests/ptr_ring/Makefile
new file mode 100644
index 0000000..346dea9
--- /dev/null
+++ b/tools/testing/selftests/ptr_ring/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+LDLIBS = -lpthread
+
+TEST_GEN_PROGS := ptr_ring_test
+
+include ../lib.mk
diff --git a/tools/testing/selftests/ptr_ring/ptr_ring_test.c b/tools/testing/selftests/ptr_ring/ptr_ring_test.c
new file mode 100644
index 0000000..4a5312f
--- /dev/null
+++ b/tools/testing/selftests/ptr_ring/ptr_ring_test.c
@@ -0,0 +1,224 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2021 HiSilicon Limited.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <errno.h>
+#include <malloc.h>
+#include <stdbool.h>
+
+#include "ptr_ring_test.h"
+#include "../../../../include/linux/ptr_ring.h"
+
+#define MIN_RING_SIZE	2
+#define MAX_RING_SIZE	10000000
+
+static struct ptr_ring ring ____cacheline_aligned_in_smp;
+
+struct worker_info {
+	pthread_t tid;
+	int test_count;
+	bool error;
+};
+
+static void *produce_worker(void *arg)
+{
+	struct worker_info *info = arg;
+	unsigned long i = 0;
+	int ret;
+
+	while (++i <= info->test_count) {
+		while (__ptr_ring_full(&ring))
+			cpu_relax();
+
+		ret = __ptr_ring_produce(&ring, (void *)i);
+		if (ret) {
+			fprintf(stderr, "produce failed: %d\n", ret);
+			info->error = true;
+			return NULL;
+		}
+	}
+
+	info->error = false;
+
+	return NULL;
+}
+
+static void *consume_worker(void *arg)
+{
+	struct worker_info *info = arg;
+	unsigned long i = 0;
+	int *ptr;
+
+	while (++i <= info->test_count) {
+		while (__ptr_ring_empty(&ring))
+			cpu_relax();
+
+		ptr = __ptr_ring_consume(&ring);
+		if ((unsigned long)ptr != i) {
+			fprintf(stderr, "consumer failed, ptr: %lu, i: %lu\n",
+				(unsigned long)ptr, i);
+			info->error = true;
+			return NULL;
+		}
+	}
+
+	if (!__ptr_ring_empty(&ring)) {
+		fprintf(stderr, "ring should be empty, test failed\n");
+		info->error = true;
+		return NULL;
+	}
+
+	info->error = false;
+	return NULL;
+}
+
+/* test case for single producer single consumer */
+static void spsc_test(int size, int count)
+{
+	struct worker_info producer, consumer;
+	pthread_attr_t attr;
+	void *res;
+	int ret;
+
+	ret = ptr_ring_init(&ring, size, 0);
+	if (ret) {
+		fprintf(stderr, "init failed: %d\n", ret);
+		return;
+	}
+
+	producer.test_count = count;
+	consumer.test_count = count;
+
+	ret = pthread_attr_init(&attr);
+	if (ret) {
+		fprintf(stderr, "pthread attr init failed: %d\n", ret);
+		goto out;
+	}
+
+	ret = pthread_create(&producer.tid, &attr,
+			     produce_worker, &producer);
+	if (ret) {
+		fprintf(stderr, "create producer thread failed: %d\n", ret);
+		goto out;
+	}
+
+	ret = pthread_create(&consumer.tid, &attr,
+			     consume_worker, &consumer);
+	if (ret) {
+		fprintf(stderr, "create consumer thread failed: %d\n", ret);
+		goto out;
+	}
+
+	ret = pthread_join(producer.tid, &res);
+	if (ret) {
+		fprintf(stderr, "join producer thread failed: %d\n", ret);
+		goto out;
+	}
+
+	ret = pthread_join(consumer.tid, &res);
+	if (ret) {
+		fprintf(stderr, "join consumer thread failed: %d\n", ret);
+		goto out;
+	}
+
+	if (producer.error || consumer.error) {
+		fprintf(stderr, "spsc test failed\n");
+		goto out;
+	}
+
+	printf("ptr_ring(size:%d) perf spsc test produced/comsumed %d items, finished\n",
+	       size, count);
+out:
+	ptr_ring_cleanup(&ring, NULL);
+}
+
+static void simple_test(int size, int count)
+{
+	struct timeval start, end;
+	int i = 0;
+	int *ptr;
+	int ret;
+
+	ret = ptr_ring_init(&ring, size, 0);
+	if (ret) {
+		fprintf(stderr, "init failed: %d\n", ret);
+		return;
+	}
+
+	while (++i <= count) {
+		ret = __ptr_ring_produce(&ring, &count);
+		if (ret) {
+			fprintf(stderr, "produce failed: %d\n", ret);
+			goto out;
+		}
+
+		ptr = __ptr_ring_consume(&ring);
+		if (ptr != &count)  {
+			fprintf(stderr, "consume failed: %p\n", ptr);
+			goto out;
+		}
+	}
+
+	printf("ptr_ring(size:%d) perf simple test produced/consumed %d items, finished\n",
+	       size, count);
+
+out:
+	ptr_ring_cleanup(&ring, NULL);
+}
+
+int main(int argc, char *argv[])
+{
+	int count = 1000000;
+	int size = 1000;
+	int mode = 0;
+	int opt;
+
+	while ((opt = getopt(argc, argv, "N:s:m:h")) != -1) {
+		switch (opt) {
+		case 'N':
+			count = atoi(optarg);
+			break;
+		case 's':
+			size = atoi(optarg);
+			break;
+		case 'm':
+			mode = atoi(optarg);
+			break;
+		case 'h':
+			printf("usage: ptr_ring_test [-N COUNT] [-s RING_SIZE] [-m TEST_MODE]\n");
+			return 0;
+		default:
+			return -1;
+		}
+	}
+
+	if (count <= 0) {
+		fprintf(stderr, "invalid test count, must be > 0\n");
+		return -1;
+	}
+
+	if (size < MIN_RING_SIZE || size > MAX_RING_SIZE) {
+		fprintf(stderr, "invalid ring size, must be in %d-%d\n",
+			MIN_RING_SIZE, MAX_RING_SIZE);
+		return -1;
+	}
+
+	switch (mode) {
+	case 0:
+		simple_test(size, count);
+		break;
+	case 1:
+		spsc_test(size, count);
+		break;
+	default:
+		fprintf(stderr, "invalid test mode\n");
+		return -1;
+	}
+
+	return 0;
+}
diff --git a/tools/testing/selftests/ptr_ring/ptr_ring_test.h b/tools/testing/selftests/ptr_ring/ptr_ring_test.h
new file mode 100644
index 0000000..32bfefb
--- /dev/null
+++ b/tools/testing/selftests/ptr_ring/ptr_ring_test.h
@@ -0,0 +1,130 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef _TEST_PTR_RING_TEST_H
+#define _TEST_PTR_RING_TEST_H
+
+#include <assert.h>
+#include <stdatomic.h>
+#include <pthread.h>
+
+/* Assuming the cache line size is 64 for most cpu,
+ * change it accordingly if the running cpu has different
+ * cache line size in order to get more accurate result.
+ */
+#define SMP_CACHE_BYTES	64
+
+#define cpu_relax()	sched_yield()
+#define smp_release()	atomic_thread_fence(memory_order_release)
+#define smp_acquire()	atomic_thread_fence(memory_order_acquire)
+#define smp_wmb()	smp_release()
+#define smp_store_release(p, v)	\
+		atomic_store_explicit(p, v, memory_order_release)
+
+#define READ_ONCE(x)		(*(volatile typeof(x) *)&(x))
+#define WRITE_ONCE(x, val)	((*(volatile typeof(x) *)&(x)) = (val))
+#define cache_line_size		SMP_CACHE_BYTES
+#define unlikely(x)		(__builtin_expect(!!(x), 0))
+#define likely(x)		(__builtin_expect(!!(x), 1))
+#define ALIGN(x, a)		(((x) + (a) - 1) / (a) * (a))
+#define SIZE_MAX		(~(size_t)0)
+#define KMALLOC_MAX_SIZE	SIZE_MAX
+#define spinlock_t		pthread_spinlock_t
+#define gfp_t			int
+#define __GFP_ZERO		0x1
+
+#define ____cacheline_aligned_in_smp \
+		__attribute__((aligned(SMP_CACHE_BYTES)))
+
+static inline void *kmalloc(unsigned int size, gfp_t gfp)
+{
+	void *p;
+
+	p = memalign(64, size);
+	if (!p)
+		return p;
+
+	if (gfp & __GFP_ZERO)
+		memset(p, 0, size);
+
+	return p;
+}
+
+static inline void *kzalloc(unsigned int size, gfp_t flags)
+{
+	return kmalloc(size, flags | __GFP_ZERO);
+}
+
+static inline void *kmalloc_array(size_t n, size_t size, gfp_t flags)
+{
+	if (size != 0 && n > SIZE_MAX / size)
+		return NULL;
+	return kmalloc(n * size, flags);
+}
+
+static inline void *kcalloc(size_t n, size_t size, gfp_t flags)
+{
+	return kmalloc_array(n, size, flags | __GFP_ZERO);
+}
+
+static inline void kfree(void *p)
+{
+	free(p);
+}
+
+#define kvmalloc_array		kmalloc_array
+#define kvfree			kfree
+
+static inline void spin_lock_init(spinlock_t *lock)
+{
+	int r = pthread_spin_init(lock, 0);
+
+	assert(!r);
+}
+
+static inline void spin_lock(spinlock_t *lock)
+{
+	int ret = pthread_spin_lock(lock);
+
+	assert(!ret);
+}
+
+static inline void spin_unlock(spinlock_t *lock)
+{
+	int ret = pthread_spin_unlock(lock);
+
+	assert(!ret);
+}
+
+static inline void spin_lock_bh(spinlock_t *lock)
+{
+	spin_lock(lock);
+}
+
+static inline void spin_unlock_bh(spinlock_t *lock)
+{
+	spin_unlock(lock);
+}
+
+static inline void spin_lock_irq(spinlock_t *lock)
+{
+	spin_lock(lock);
+}
+
+static inline void spin_unlock_irq(spinlock_t *lock)
+{
+	spin_unlock(lock);
+}
+
+static inline void spin_lock_irqsave(spinlock_t *lock,
+				     unsigned long f)
+{
+	spin_lock(lock);
+}
+
+static inline void spin_unlock_irqrestore(spinlock_t *lock,
+					  unsigned long f)
+{
+	spin_unlock(lock);
+}
+
+#endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next v3 2/3] ptr_ring: move r->queue[] clearing after r->consumer_head updating
  2021-07-01 12:26 [PATCH net-next v3 0/3] add benchmark selftest and optimization for ptr_ring Yunsheng Lin
  2021-07-01 12:26 ` [PATCH net-next v3 1/3] selftests/ptr_ring: add benchmark application " Yunsheng Lin
@ 2021-07-01 12:26 ` Yunsheng Lin
  2021-07-02  6:45   ` Jason Wang
  2021-07-01 12:26 ` [PATCH net-next v3 3/3] ptr_ring: add barrier to ensure the visiblity of r->queue[] Yunsheng Lin
  2 siblings, 1 reply; 15+ messages in thread
From: Yunsheng Lin @ 2021-07-01 12:26 UTC (permalink / raw)
  To: davem, kuba, jasowang, mst
  Cc: brouer, paulmck, peterz, will, shuah, linux-kernel, netdev,
	linux-kselftest, linuxarm

Currently r->queue[] clearing is done before r->consumer_head
updating, which makes the __ptr_ring_empty() returning false
positive result(the ring is non-empty, but __ptr_ring_empty()
suggest that it is empty) if the checking is done after the
r->queue clearing and before the consumer_head moving forward.

Move the r->queue[] clearing after consumer_head moving forward
to avoid the above case.

As a side effect of above change, a consumer_head checking is
avoided for the likely case, and it has noticeable performance
improvement when it is tested using the ptr_ring_test selftest
added in the previous patch.

Tested using the "perf stat -r 1000 ./ptr_ring_test -s 1000 -m 1
-N 100000000", comparing the elapsed time:

 arch     unpatched           patched       improvement
arm64    2.087205 sec       1.888224 sec      +9.5%
 X86      2.6538 sec         2.5422 sec       +4.2%

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
V3: adjust the title and comment log according to disscusion in
    V2, and update performance data using "perf stat -r".
V2: Add performance data.
---
 include/linux/ptr_ring.h | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index 808f9d3..db9c282 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -261,8 +261,7 @@ static inline void __ptr_ring_discard_one(struct ptr_ring *r)
 	/* Note: we must keep consumer_head valid at all times for __ptr_ring_empty
 	 * to work correctly.
 	 */
-	int consumer_head = r->consumer_head;
-	int head = consumer_head++;
+	int consumer_head = r->consumer_head + 1;
 
 	/* Once we have processed enough entries invalidate them in
 	 * the ring all at once so producer can reuse their space in the ring.
@@ -271,19 +270,27 @@ static inline void __ptr_ring_discard_one(struct ptr_ring *r)
 	 */
 	if (unlikely(consumer_head - r->consumer_tail >= r->batch ||
 		     consumer_head >= r->size)) {
+		int tail = r->consumer_tail;
+
+		if (unlikely(consumer_head >= r->size)) {
+			r->consumer_tail = 0;
+			WRITE_ONCE(r->consumer_head, 0);
+		} else {
+			r->consumer_tail = consumer_head;
+			WRITE_ONCE(r->consumer_head, consumer_head);
+		}
+
 		/* Zero out entries in the reverse order: this way we touch the
 		 * cache line that producer might currently be reading the last;
 		 * producer won't make progress and touch other cache lines
 		 * besides the first one until we write out all entries.
 		 */
-		while (likely(head >= r->consumer_tail))
-			r->queue[head--] = NULL;
-		r->consumer_tail = consumer_head;
-	}
-	if (unlikely(consumer_head >= r->size)) {
-		consumer_head = 0;
-		r->consumer_tail = 0;
+		while (likely(--consumer_head >= tail))
+			r->queue[consumer_head] = NULL;
+
+		return;
 	}
+
 	/* matching READ_ONCE in __ptr_ring_empty for lockless tests */
 	WRITE_ONCE(r->consumer_head, consumer_head);
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next v3 3/3] ptr_ring: add barrier to ensure the visiblity of r->queue[]
  2021-07-01 12:26 [PATCH net-next v3 0/3] add benchmark selftest and optimization for ptr_ring Yunsheng Lin
  2021-07-01 12:26 ` [PATCH net-next v3 1/3] selftests/ptr_ring: add benchmark application " Yunsheng Lin
  2021-07-01 12:26 ` [PATCH net-next v3 2/3] ptr_ring: move r->queue[] clearing after r->consumer_head updating Yunsheng Lin
@ 2021-07-01 12:26 ` Yunsheng Lin
  2 siblings, 0 replies; 15+ messages in thread
From: Yunsheng Lin @ 2021-07-01 12:26 UTC (permalink / raw)
  To: davem, kuba, jasowang, mst
  Cc: brouer, paulmck, peterz, will, shuah, linux-kernel, netdev,
	linux-kselftest, linuxarm

After r->consumer_head is updated in __ptr_ring_discard_one(),
r->queue[r->consumer_head] is already cleared in the previous
round of __ptr_ring_discard_one(). But there is no guarantee
other thread will see the r->queue[r->consumer_head] being
NULL because there is no explicit barrier between r->queue[]
clearing and r->consumer_head updating.

So add two explicit barrier to make sure r->queue[] cleared in
__ptr_ring_discard_one() to be visible to other cpu, mainly to
make sure the cpu calling the __ptr_ring_empty() will see the
correct r->queue[r->consumer_head].

Hopefully the previous and this patch have ensured the correct
visibility of r->queue[], so update the comment accordingly
about __ptr_ring_empty().

Tested using the "perf stat -r 1000 ./ptr_ring_test -s 1000 -m 1
-N 100000000", comparing the elapsed time:

 arch     unpatched           patched       improvement
arm64    1.888224 sec      1.893673 sec      -0.2%
 X86      2.5422  sec       2.5587 sec       -0.6%

Reported-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 include/linux/ptr_ring.h | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index db9c282..d78aab8 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -178,15 +178,11 @@ static inline void *__ptr_ring_peek(struct ptr_ring *r)
  *
  * NB: This is only safe to call if ring is never resized.
  *
- * However, if some other CPU consumes ring entries at the same time, the value
- * returned is not guaranteed to be correct.
- *
- * In this case - to avoid incorrectly detecting the ring
- * as empty - the CPU consuming the ring entries is responsible
- * for either consuming all ring entries until the ring is empty,
- * or synchronizing with some other CPU and causing it to
- * re-test __ptr_ring_empty and/or consume the ring enteries
- * after the synchronization point.
+ * caller might need to use the smp_rmb() to pair with smp_wmb()
+ * or smp_store_release() in __ptr_ring_discard_one() and smp_wmb()
+ * in __ptr_ring_produce() to ensure correct ordering between
+ * __ptr_ring_empty() checking and subsequent operation after
+ * __ptr_ring_empty() checking.
  *
  * Note: callers invoking this in a loop must use a compiler barrier,
  * for example cpu_relax().
@@ -274,7 +270,12 @@ static inline void __ptr_ring_discard_one(struct ptr_ring *r)
 
 		if (unlikely(consumer_head >= r->size)) {
 			r->consumer_tail = 0;
-			WRITE_ONCE(r->consumer_head, 0);
+
+			/* Make sure r->queue[0] ~ r->queue[r->consumer_tail]
+			 * cleared in previous __ptr_ring_discard_one() is
+			 * visible to other cpu.
+			 */
+			smp_store_release(&r->consumer_head, 0);
 		} else {
 			r->consumer_tail = consumer_head;
 			WRITE_ONCE(r->consumer_head, consumer_head);
@@ -288,6 +289,14 @@ static inline void __ptr_ring_discard_one(struct ptr_ring *r)
 		while (likely(--consumer_head >= tail))
 			r->queue[consumer_head] = NULL;
 
+		if (unlikely(!r->consumer_head)) {
+			/* Make sure r->queue[r->consumer_tail] ~
+			 * r->queue[r->size - 1] cleared above is visible to
+			 * other cpu.
+			 */
+			smp_wmb();
+		}
+
 		return;
 	}
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v3 1/3] selftests/ptr_ring: add benchmark application for ptr_ring
  2021-07-01 12:26 ` [PATCH net-next v3 1/3] selftests/ptr_ring: add benchmark application " Yunsheng Lin
@ 2021-07-02  6:43   ` Jason Wang
  2021-07-02  8:17     ` Yunsheng Lin
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Wang @ 2021-07-02  6:43 UTC (permalink / raw)
  To: Yunsheng Lin, davem, kuba, mst
  Cc: brouer, paulmck, peterz, will, shuah, linux-kernel, netdev,
	linux-kselftest, linuxarm


在 2021/7/1 下午8:26, Yunsheng Lin 写道:
> Currently ptr_ring selftest is embedded within the virtio
> selftest, which involves some specific virtio operation,
> such as notifying and kicking.
>
> As ptr_ring has been used by various subsystems, it deserves
> it's owner selftest in order to benchmark different usecase
> of ptr_ring, such as page pool and pfifo_fast qdisc.
>
> So add a simple application to benchmark ptr_ring performance.
> Currently two test mode is supported:
> Mode 0: Both producing and consuming is done in a single thread,
>          it is called simple test mode in the test app.
> Mode 1: Producing and consuming is done in different thread
>          concurrently, also known as SPSC(single-producer/
>          single-consumer) test.
>
> The multi-producer/single-consumer test for pfifo_fast case is
> not added yet, which can be added if using CAS atomic operation
> to enable lockless multi-producer is proved to be better than
> using r->producer_lock.
>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> ---
> V3: Remove timestamp sampling, use standard C library as much
>      as possible.
> ---
>   MAINTAINERS                                      |   5 +
>   tools/testing/selftests/ptr_ring/Makefile        |   6 +
>   tools/testing/selftests/ptr_ring/ptr_ring_test.c | 224 +++++++++++++++++++++++
>   tools/testing/selftests/ptr_ring/ptr_ring_test.h | 130 +++++++++++++
>   4 files changed, 365 insertions(+)
>   create mode 100644 tools/testing/selftests/ptr_ring/Makefile
>   create mode 100644 tools/testing/selftests/ptr_ring/ptr_ring_test.c
>   create mode 100644 tools/testing/selftests/ptr_ring/ptr_ring_test.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index cc375fd..1227022 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14847,6 +14847,11 @@ F:	drivers/net/phy/dp83640*
>   F:	drivers/ptp/*
>   F:	include/linux/ptp_cl*
>   
> +PTR RING BENCHMARK
> +M:	Yunsheng Lin <linyunsheng@huawei.com>
> +L:	netdev@vger.kernel.org
> +F:	tools/testing/selftests/ptr_ring/
> +
>   PTRACE SUPPORT
>   M:	Oleg Nesterov <oleg@redhat.com>
>   S:	Maintained
> diff --git a/tools/testing/selftests/ptr_ring/Makefile b/tools/testing/selftests/ptr_ring/Makefile
> new file mode 100644
> index 0000000..346dea9
> --- /dev/null
> +++ b/tools/testing/selftests/ptr_ring/Makefile
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +LDLIBS = -lpthread
> +
> +TEST_GEN_PROGS := ptr_ring_test
> +
> +include ../lib.mk
> diff --git a/tools/testing/selftests/ptr_ring/ptr_ring_test.c b/tools/testing/selftests/ptr_ring/ptr_ring_test.c
> new file mode 100644
> index 0000000..4a5312f
> --- /dev/null
> +++ b/tools/testing/selftests/ptr_ring/ptr_ring_test.c
> @@ -0,0 +1,224 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (C) 2021 HiSilicon Limited.
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <unistd.h>
> +#include <string.h>
> +#include <errno.h>
> +#include <malloc.h>
> +#include <stdbool.h>
> +
> +#include "ptr_ring_test.h"
> +#include "../../../../include/linux/ptr_ring.h"
> +
> +#define MIN_RING_SIZE	2
> +#define MAX_RING_SIZE	10000000
> +
> +static struct ptr_ring ring ____cacheline_aligned_in_smp;
> +
> +struct worker_info {
> +	pthread_t tid;
> +	int test_count;
> +	bool error;
> +};
> +
> +static void *produce_worker(void *arg)
> +{
> +	struct worker_info *info = arg;
> +	unsigned long i = 0;
> +	int ret;
> +
> +	while (++i <= info->test_count) {
> +		while (__ptr_ring_full(&ring))
> +			cpu_relax();
> +
> +		ret = __ptr_ring_produce(&ring, (void *)i);
> +		if (ret) {
> +			fprintf(stderr, "produce failed: %d\n", ret);
> +			info->error = true;
> +			return NULL;
> +		}
> +	}
> +
> +	info->error = false;
> +
> +	return NULL;
> +}
> +
> +static void *consume_worker(void *arg)
> +{
> +	struct worker_info *info = arg;
> +	unsigned long i = 0;
> +	int *ptr;
> +
> +	while (++i <= info->test_count) {
> +		while (__ptr_ring_empty(&ring))
> +			cpu_relax();


Any reason for not simply use __ptr_ring_consume() here?


> +
> +		ptr = __ptr_ring_consume(&ring);
> +		if ((unsigned long)ptr != i) {
> +			fprintf(stderr, "consumer failed, ptr: %lu, i: %lu\n",
> +				(unsigned long)ptr, i);
> +			info->error = true;
> +			return NULL;
> +		}
> +	}
> +
> +	if (!__ptr_ring_empty(&ring)) {
> +		fprintf(stderr, "ring should be empty, test failed\n");
> +		info->error = true;
> +		return NULL;
> +	}
> +
> +	info->error = false;
> +	return NULL;
> +}
> +
> +/* test case for single producer single consumer */
> +static void spsc_test(int size, int count)
> +{
> +	struct worker_info producer, consumer;
> +	pthread_attr_t attr;
> +	void *res;
> +	int ret;
> +
> +	ret = ptr_ring_init(&ring, size, 0);
> +	if (ret) {
> +		fprintf(stderr, "init failed: %d\n", ret);
> +		return;
> +	}
> +
> +	producer.test_count = count;
> +	consumer.test_count = count;
> +
> +	ret = pthread_attr_init(&attr);
> +	if (ret) {
> +		fprintf(stderr, "pthread attr init failed: %d\n", ret);
> +		goto out;
> +	}
> +
> +	ret = pthread_create(&producer.tid, &attr,
> +			     produce_worker, &producer);
> +	if (ret) {
> +		fprintf(stderr, "create producer thread failed: %d\n", ret);
> +		goto out;
> +	}
> +
> +	ret = pthread_create(&consumer.tid, &attr,
> +			     consume_worker, &consumer);
> +	if (ret) {
> +		fprintf(stderr, "create consumer thread failed: %d\n", ret);
> +		goto out;
> +	}
> +
> +	ret = pthread_join(producer.tid, &res);
> +	if (ret) {
> +		fprintf(stderr, "join producer thread failed: %d\n", ret);
> +		goto out;
> +	}
> +
> +	ret = pthread_join(consumer.tid, &res);
> +	if (ret) {
> +		fprintf(stderr, "join consumer thread failed: %d\n", ret);
> +		goto out;
> +	}
> +
> +	if (producer.error || consumer.error) {
> +		fprintf(stderr, "spsc test failed\n");
> +		goto out;
> +	}
> +
> +	printf("ptr_ring(size:%d) perf spsc test produced/comsumed %d items, finished\n",
> +	       size, count);
> +out:
> +	ptr_ring_cleanup(&ring, NULL);
> +}
> +
> +static void simple_test(int size, int count)
> +{
> +	struct timeval start, end;
> +	int i = 0;
> +	int *ptr;
> +	int ret;
> +
> +	ret = ptr_ring_init(&ring, size, 0);
> +	if (ret) {
> +		fprintf(stderr, "init failed: %d\n", ret);
> +		return;
> +	}
> +
> +	while (++i <= count) {
> +		ret = __ptr_ring_produce(&ring, &count);
> +		if (ret) {
> +			fprintf(stderr, "produce failed: %d\n", ret);
> +			goto out;
> +		}
> +
> +		ptr = __ptr_ring_consume(&ring);
> +		if (ptr != &count)  {
> +			fprintf(stderr, "consume failed: %p\n", ptr);
> +			goto out;
> +		}
> +	}
> +
> +	printf("ptr_ring(size:%d) perf simple test produced/consumed %d items, finished\n",
> +	       size, count);
> +
> +out:
> +	ptr_ring_cleanup(&ring, NULL);
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	int count = 1000000;
> +	int size = 1000;
> +	int mode = 0;
> +	int opt;
> +
> +	while ((opt = getopt(argc, argv, "N:s:m:h")) != -1) {
> +		switch (opt) {
> +		case 'N':
> +			count = atoi(optarg);
> +			break;
> +		case 's':
> +			size = atoi(optarg);
> +			break;
> +		case 'm':
> +			mode = atoi(optarg);
> +			break;
> +		case 'h':
> +			printf("usage: ptr_ring_test [-N COUNT] [-s RING_SIZE] [-m TEST_MODE]\n");
> +			return 0;
> +		default:
> +			return -1;
> +		}
> +	}
> +
> +	if (count <= 0) {
> +		fprintf(stderr, "invalid test count, must be > 0\n");
> +		return -1;
> +	}
> +
> +	if (size < MIN_RING_SIZE || size > MAX_RING_SIZE) {
> +		fprintf(stderr, "invalid ring size, must be in %d-%d\n",
> +			MIN_RING_SIZE, MAX_RING_SIZE);
> +		return -1;
> +	}
> +
> +	switch (mode) {
> +	case 0:
> +		simple_test(size, count);
> +		break;
> +	case 1:
> +		spsc_test(size, count);
> +		break;
> +	default:
> +		fprintf(stderr, "invalid test mode\n");
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> diff --git a/tools/testing/selftests/ptr_ring/ptr_ring_test.h b/tools/testing/selftests/ptr_ring/ptr_ring_test.h
> new file mode 100644
> index 0000000..32bfefb
> --- /dev/null
> +++ b/tools/testing/selftests/ptr_ring/ptr_ring_test.h


Let's reuse ptr_ring.c in tools/virtio/ringtest. Nothing virt specific 
there.

Thanks


> @@ -0,0 +1,130 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +
> +#ifndef _TEST_PTR_RING_TEST_H
> +#define _TEST_PTR_RING_TEST_H
> +
> +#include <assert.h>
> +#include <stdatomic.h>
> +#include <pthread.h>
> +
> +/* Assuming the cache line size is 64 for most cpu,
> + * change it accordingly if the running cpu has different
> + * cache line size in order to get more accurate result.
> + */
> +#define SMP_CACHE_BYTES	64
> +
> +#define cpu_relax()	sched_yield()
> +#define smp_release()	atomic_thread_fence(memory_order_release)
> +#define smp_acquire()	atomic_thread_fence(memory_order_acquire)
> +#define smp_wmb()	smp_release()
> +#define smp_store_release(p, v)	\
> +		atomic_store_explicit(p, v, memory_order_release)
> +
> +#define READ_ONCE(x)		(*(volatile typeof(x) *)&(x))
> +#define WRITE_ONCE(x, val)	((*(volatile typeof(x) *)&(x)) = (val))
> +#define cache_line_size		SMP_CACHE_BYTES
> +#define unlikely(x)		(__builtin_expect(!!(x), 0))
> +#define likely(x)		(__builtin_expect(!!(x), 1))
> +#define ALIGN(x, a)		(((x) + (a) - 1) / (a) * (a))
> +#define SIZE_MAX		(~(size_t)0)
> +#define KMALLOC_MAX_SIZE	SIZE_MAX
> +#define spinlock_t		pthread_spinlock_t
> +#define gfp_t			int
> +#define __GFP_ZERO		0x1
> +
> +#define ____cacheline_aligned_in_smp \
> +		__attribute__((aligned(SMP_CACHE_BYTES)))
> +
> +static inline void *kmalloc(unsigned int size, gfp_t gfp)
> +{
> +	void *p;
> +
> +	p = memalign(64, size);
> +	if (!p)
> +		return p;
> +
> +	if (gfp & __GFP_ZERO)
> +		memset(p, 0, size);
> +
> +	return p;
> +}
> +
> +static inline void *kzalloc(unsigned int size, gfp_t flags)
> +{
> +	return kmalloc(size, flags | __GFP_ZERO);
> +}
> +
> +static inline void *kmalloc_array(size_t n, size_t size, gfp_t flags)
> +{
> +	if (size != 0 && n > SIZE_MAX / size)
> +		return NULL;
> +	return kmalloc(n * size, flags);
> +}
> +
> +static inline void *kcalloc(size_t n, size_t size, gfp_t flags)
> +{
> +	return kmalloc_array(n, size, flags | __GFP_ZERO);
> +}
> +
> +static inline void kfree(void *p)
> +{
> +	free(p);
> +}
> +
> +#define kvmalloc_array		kmalloc_array
> +#define kvfree			kfree
> +
> +static inline void spin_lock_init(spinlock_t *lock)
> +{
> +	int r = pthread_spin_init(lock, 0);
> +
> +	assert(!r);
> +}
> +
> +static inline void spin_lock(spinlock_t *lock)
> +{
> +	int ret = pthread_spin_lock(lock);
> +
> +	assert(!ret);
> +}
> +
> +static inline void spin_unlock(spinlock_t *lock)
> +{
> +	int ret = pthread_spin_unlock(lock);
> +
> +	assert(!ret);
> +}
> +
> +static inline void spin_lock_bh(spinlock_t *lock)
> +{
> +	spin_lock(lock);
> +}
> +
> +static inline void spin_unlock_bh(spinlock_t *lock)
> +{
> +	spin_unlock(lock);
> +}
> +
> +static inline void spin_lock_irq(spinlock_t *lock)
> +{
> +	spin_lock(lock);
> +}
> +
> +static inline void spin_unlock_irq(spinlock_t *lock)
> +{
> +	spin_unlock(lock);
> +}
> +
> +static inline void spin_lock_irqsave(spinlock_t *lock,
> +				     unsigned long f)
> +{
> +	spin_lock(lock);
> +}
> +
> +static inline void spin_unlock_irqrestore(spinlock_t *lock,
> +					  unsigned long f)
> +{
> +	spin_unlock(lock);
> +}
> +
> +#endif


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v3 2/3] ptr_ring: move r->queue[] clearing after r->consumer_head updating
  2021-07-01 12:26 ` [PATCH net-next v3 2/3] ptr_ring: move r->queue[] clearing after r->consumer_head updating Yunsheng Lin
@ 2021-07-02  6:45   ` Jason Wang
  2021-07-02  8:40     ` [Linuxarm] " Yunsheng Lin
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Wang @ 2021-07-02  6:45 UTC (permalink / raw)
  To: Yunsheng Lin, davem, kuba, mst
  Cc: brouer, paulmck, peterz, will, shuah, linux-kernel, netdev,
	linux-kselftest, linuxarm


在 2021/7/1 下午8:26, Yunsheng Lin 写道:
> Currently r->queue[] clearing is done before r->consumer_head
> updating, which makes the __ptr_ring_empty() returning false
> positive result(the ring is non-empty, but __ptr_ring_empty()
> suggest that it is empty) if the checking is done after the
> r->queue clearing and before the consumer_head moving forward.
>
> Move the r->queue[] clearing after consumer_head moving forward
> to avoid the above case.
>
> As a side effect of above change, a consumer_head checking is
> avoided for the likely case, and it has noticeable performance
> improvement when it is tested using the ptr_ring_test selftest
> added in the previous patch.
>
> Tested using the "perf stat -r 1000 ./ptr_ring_test -s 1000 -m 1
> -N 100000000", comparing the elapsed time:
>
>   arch     unpatched           patched       improvement
> arm64    2.087205 sec       1.888224 sec      +9.5%
>   X86      2.6538 sec         2.5422 sec       +4.2%


I think we need the number of real workloads here.

Thanks


>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> ---
> V3: adjust the title and comment log according to disscusion in
>      V2, and update performance data using "perf stat -r".
> V2: Add performance data.
> ---
>   include/linux/ptr_ring.h | 25 ++++++++++++++++---------
>   1 file changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> index 808f9d3..db9c282 100644
> --- a/include/linux/ptr_ring.h
> +++ b/include/linux/ptr_ring.h
> @@ -261,8 +261,7 @@ static inline void __ptr_ring_discard_one(struct ptr_ring *r)
>   	/* Note: we must keep consumer_head valid at all times for __ptr_ring_empty
>   	 * to work correctly.
>   	 */
> -	int consumer_head = r->consumer_head;
> -	int head = consumer_head++;
> +	int consumer_head = r->consumer_head + 1;
>   
>   	/* Once we have processed enough entries invalidate them in
>   	 * the ring all at once so producer can reuse their space in the ring.
> @@ -271,19 +270,27 @@ static inline void __ptr_ring_discard_one(struct ptr_ring *r)
>   	 */
>   	if (unlikely(consumer_head - r->consumer_tail >= r->batch ||
>   		     consumer_head >= r->size)) {
> +		int tail = r->consumer_tail;
> +
> +		if (unlikely(consumer_head >= r->size)) {
> +			r->consumer_tail = 0;
> +			WRITE_ONCE(r->consumer_head, 0);
> +		} else {
> +			r->consumer_tail = consumer_head;
> +			WRITE_ONCE(r->consumer_head, consumer_head);
> +		}
> +
>   		/* Zero out entries in the reverse order: this way we touch the
>   		 * cache line that producer might currently be reading the last;
>   		 * producer won't make progress and touch other cache lines
>   		 * besides the first one until we write out all entries.
>   		 */
> -		while (likely(head >= r->consumer_tail))
> -			r->queue[head--] = NULL;
> -		r->consumer_tail = consumer_head;
> -	}
> -	if (unlikely(consumer_head >= r->size)) {
> -		consumer_head = 0;
> -		r->consumer_tail = 0;
> +		while (likely(--consumer_head >= tail))
> +			r->queue[consumer_head] = NULL;
> +
> +		return;
>   	}
> +
>   	/* matching READ_ONCE in __ptr_ring_empty for lockless tests */
>   	WRITE_ONCE(r->consumer_head, consumer_head);
>   }


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v3 1/3] selftests/ptr_ring: add benchmark application for ptr_ring
  2021-07-02  6:43   ` Jason Wang
@ 2021-07-02  8:17     ` Yunsheng Lin
  2021-07-02  8:30       ` Michael S. Tsirkin
  0 siblings, 1 reply; 15+ messages in thread
From: Yunsheng Lin @ 2021-07-02  8:17 UTC (permalink / raw)
  To: Jason Wang, davem, kuba, mst
  Cc: brouer, paulmck, peterz, will, shuah, linux-kernel, netdev,
	linux-kselftest, linuxarm

On 2021/7/2 14:43, Jason Wang wrote:
> 
> 在 2021/7/1 下午8:26, Yunsheng Lin 写道:
>> Currently ptr_ring selftest is embedded within the virtio
>> selftest, which involves some specific virtio operation,
>> such as notifying and kicking.
>>
>> As ptr_ring has been used by various subsystems, it deserves
>> it's owner selftest in order to benchmark different usecase
>> of ptr_ring, such as page pool and pfifo_fast qdisc.
>>
>> So add a simple application to benchmark ptr_ring performance.
>> Currently two test mode is supported:
>> Mode 0: Both producing and consuming is done in a single thread,
>>          it is called simple test mode in the test app.
>> Mode 1: Producing and consuming is done in different thread
>>          concurrently, also known as SPSC(single-producer/
>>          single-consumer) test.
>>
>> The multi-producer/single-consumer test for pfifo_fast case is
>> not added yet, which can be added if using CAS atomic operation
>> to enable lockless multi-producer is proved to be better than
>> using r->producer_lock.
>>
>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>> ---
>> V3: Remove timestamp sampling, use standard C library as much
>>      as possible.

[...]

>> +static void *produce_worker(void *arg)
>> +{
>> +    struct worker_info *info = arg;
>> +    unsigned long i = 0;
>> +    int ret;
>> +
>> +    while (++i <= info->test_count) {
>> +        while (__ptr_ring_full(&ring))
>> +            cpu_relax();
>> +
>> +        ret = __ptr_ring_produce(&ring, (void *)i);
>> +        if (ret) {
>> +            fprintf(stderr, "produce failed: %d\n", ret);
>> +            info->error = true;
>> +            return NULL;
>> +        }
>> +    }
>> +
>> +    info->error = false;
>> +
>> +    return NULL;
>> +}
>> +
>> +static void *consume_worker(void *arg)
>> +{
>> +    struct worker_info *info = arg;
>> +    unsigned long i = 0;
>> +    int *ptr;
>> +
>> +    while (++i <= info->test_count) {
>> +        while (__ptr_ring_empty(&ring))
>> +            cpu_relax();
> 
> 
> Any reason for not simply use __ptr_ring_consume() here?

No particular reason, just to make sure the ring is
non-empty before doing the enqueuing, we could check
if the __ptr_ring_consume() return NULL to decide
the if the ring is empty. Using __ptr_ring_consume()
here enable testing the correctness and performance of
__ptr_ring_consume() too.

> 
> 
>> +
>> +        ptr = __ptr_ring_consume(&ring);
>> +        if ((unsigned long)ptr != i) {
>> +            fprintf(stderr, "consumer failed, ptr: %lu, i: %lu\n",
>> +                (unsigned long)ptr, i);
>> +            info->error = true;
>> +            return NULL;
>> +        }
>> +    }
>> +
>> +    if (!__ptr_ring_empty(&ring)) {
>> +        fprintf(stderr, "ring should be empty, test failed\n");
>> +        info->error = true;
>> +        return NULL;
>> +    }
>> +
>> +    info->error = false;
>> +    return NULL;
>> +}
>> +

[...]

>> +
>> +    return 0;
>> +}
>> diff --git a/tools/testing/selftests/ptr_ring/ptr_ring_test.h b/tools/testing/selftests/ptr_ring/ptr_ring_test.h
>> new file mode 100644
>> index 0000000..32bfefb
>> --- /dev/null
>> +++ b/tools/testing/selftests/ptr_ring/ptr_ring_test.h
> 
> 
> Let's reuse ptr_ring.c in tools/virtio/ringtest. Nothing virt specific there.

It *does* have some virtio specific at the end of ptr_ring.c.
It can be argued that the ptr_ring.c in tools/virtio/ringtest
could be refactored to remove the function related to virtio.

But as mentioned in the previous disscusion [1], the tools/virtio/
seems to have compile error in the latest kernel, it does not seems
right to reuse that. And most of testcase in tools/virtio/ seems
better be in tools/virtio/ringtest instead,so until the testcase
in tools/virtio/ is compile-error-free and moved to tools/testing/
selftests/, it seems better not to reuse it for now.

1. https://patchwork.kernel.org/project/netdevbpf/patch/1624591136-6647-2-git-send-email-linyunsheng@huawei.com/#24278945

> 
> Thanks
> 

[...]

> 
> .
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v3 1/3] selftests/ptr_ring: add benchmark application for ptr_ring
  2021-07-02  8:17     ` Yunsheng Lin
@ 2021-07-02  8:30       ` Michael S. Tsirkin
  2021-07-02  8:46         ` Yunsheng Lin
  0 siblings, 1 reply; 15+ messages in thread
From: Michael S. Tsirkin @ 2021-07-02  8:30 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: Jason Wang, davem, kuba, brouer, paulmck, peterz, will, shuah,
	linux-kernel, netdev, linux-kselftest, linuxarm

On Fri, Jul 02, 2021 at 04:17:17PM +0800, Yunsheng Lin wrote:
> > Let's reuse ptr_ring.c in tools/virtio/ringtest. Nothing virt specific there.
> 
> It *does* have some virtio specific at the end of ptr_ring.c.
> It can be argued that the ptr_ring.c in tools/virtio/ringtest
> could be refactored to remove the function related to virtio.
> 
> But as mentioned in the previous disscusion [1], the tools/virtio/
> seems to have compile error in the latest kernel, it does not seems
> right to reuse that.
> And most of testcase in tools/virtio/ seems
> better be in tools/virtio/ringtest instead,so until the testcase
> in tools/virtio/ is compile-error-free and moved to tools/testing/
> selftests/, it seems better not to reuse it for now.


That's a great reason to reuse - so tools/virtio/ stays working.
Please just fix that.

-- 
MST


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Linuxarm] Re: [PATCH net-next v3 2/3] ptr_ring: move r->queue[] clearing after r->consumer_head updating
  2021-07-02  6:45   ` Jason Wang
@ 2021-07-02  8:40     ` Yunsheng Lin
  0 siblings, 0 replies; 15+ messages in thread
From: Yunsheng Lin @ 2021-07-02  8:40 UTC (permalink / raw)
  To: Jason Wang, davem, kuba, mst
  Cc: brouer, paulmck, peterz, will, shuah, linux-kernel, netdev,
	linux-kselftest, linuxarm

On 2021/7/2 14:45, Jason Wang wrote:
> 
> 在 2021/7/1 下午8:26, Yunsheng Lin 写道:
>> Currently r->queue[] clearing is done before r->consumer_head
>> updating, which makes the __ptr_ring_empty() returning false
>> positive result(the ring is non-empty, but __ptr_ring_empty()
>> suggest that it is empty) if the checking is done after the
>> r->queue clearing and before the consumer_head moving forward.
>>
>> Move the r->queue[] clearing after consumer_head moving forward
>> to avoid the above case.
>>
>> As a side effect of above change, a consumer_head checking is
>> avoided for the likely case, and it has noticeable performance
>> improvement when it is tested using the ptr_ring_test selftest
>> added in the previous patch.
>>
>> Tested using the "perf stat -r 1000 ./ptr_ring_test -s 1000 -m 1
>> -N 100000000", comparing the elapsed time:
>>
>>   arch     unpatched           patched       improvement
>> arm64    2.087205 sec       1.888224 sec      +9.5%
>>   X86      2.6538 sec         2.5422 sec       +4.2%
> 
> 
> I think we need the number of real workloads here.

As it is a low optimization, and overhead of enqueuing
and dequeuing is small for any real workloads, so the
performance improvement could be buried in deviation.
And that is why the ptr_ring_test is added, the about
10% improvement for arm64 seems big, but note that it
is tested using the taskset to avoid the numa effects
for arm64.

Anyway, here is the performance data for pktgen in
queue_xmit mode + dummy netdev with pfifo_fast(which
uses ptr_ring too), which is not obvious to the above
data:

 threads    unpatched        unpatched        delta
    1       3.21Mpps         3.23Mpps         +0.6%
    2       5.56Mpps         3.59Mpps         +0.5%
    4       5.58Mpps         5.61Mpps         +0.5%
    8       2.76Mpps         2.75Mpps         -0.3%
   16       2.23Mpps         2.22Mpps         -0.4%

> 
> Thanks
> 
> 
>>
>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v3 1/3] selftests/ptr_ring: add benchmark application for ptr_ring
  2021-07-02  8:30       ` Michael S. Tsirkin
@ 2021-07-02  8:46         ` Yunsheng Lin
  2021-07-02  9:04           ` Jason Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Yunsheng Lin @ 2021-07-02  8:46 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, davem, kuba, brouer, paulmck, peterz, will, shuah,
	linux-kernel, netdev, linux-kselftest, linuxarm

On 2021/7/2 16:30, Michael S. Tsirkin wrote:
> On Fri, Jul 02, 2021 at 04:17:17PM +0800, Yunsheng Lin wrote:
>>> Let's reuse ptr_ring.c in tools/virtio/ringtest. Nothing virt specific there.
>>
>> It *does* have some virtio specific at the end of ptr_ring.c.
>> It can be argued that the ptr_ring.c in tools/virtio/ringtest
>> could be refactored to remove the function related to virtio.
>>
>> But as mentioned in the previous disscusion [1], the tools/virtio/
>> seems to have compile error in the latest kernel, it does not seems
>> right to reuse that.
>> And most of testcase in tools/virtio/ seems
>> better be in tools/virtio/ringtest instead,so until the testcase
>> in tools/virtio/ is compile-error-free and moved to tools/testing/
>> selftests/, it seems better not to reuse it for now.
> 
> 
> That's a great reason to reuse - so tools/virtio/ stays working.
> Please just fix that.

I understand that you guys like to see a working testcase of virtio.
I would love to do that if I have the time and knowledge of virtio,
But I do not think I have the time and I am familiar enough with
virtio to fix that now.


> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v3 1/3] selftests/ptr_ring: add benchmark application for ptr_ring
  2021-07-02  8:46         ` Yunsheng Lin
@ 2021-07-02  9:04           ` Jason Wang
  2021-07-02  9:54             ` Yunsheng Lin
  2021-07-02 14:16             ` Michael S. Tsirkin
  0 siblings, 2 replies; 15+ messages in thread
From: Jason Wang @ 2021-07-02  9:04 UTC (permalink / raw)
  To: Yunsheng Lin, Michael S. Tsirkin
  Cc: davem, kuba, brouer, paulmck, peterz, will, shuah, linux-kernel,
	netdev, linux-kselftest, linuxarm


在 2021/7/2 下午4:46, Yunsheng Lin 写道:
> On 2021/7/2 16:30, Michael S. Tsirkin wrote:
>> On Fri, Jul 02, 2021 at 04:17:17PM +0800, Yunsheng Lin wrote:
>>>> Let's reuse ptr_ring.c in tools/virtio/ringtest. Nothing virt specific there.
>>> It *does* have some virtio specific at the end of ptr_ring.c.


They are just wrappers to make ptr ring works like a virtio ring. We can 
split them out into another file if necessary.


>>> It can be argued that the ptr_ring.c in tools/virtio/ringtest
>>> could be refactored to remove the function related to virtio.
>>>
>>> But as mentioned in the previous disscusion [1], the tools/virtio/
>>> seems to have compile error in the latest kernel, it does not seems
>>> right to reuse that.
>>> And most of testcase in tools/virtio/ seems
>>> better be in tools/virtio/ringtest instead,so until the testcase
>>> in tools/virtio/ is compile-error-free and moved to tools/testing/
>>> selftests/, it seems better not to reuse it for now.
>>
>> That's a great reason to reuse - so tools/virtio/ stays working.
>> Please just fix that.


+1


> I understand that you guys like to see a working testcase of virtio.
> I would love to do that if I have the time and knowledge of virtio,
> But I do not think I have the time and I am familiar enough with
> virtio to fix that now.


So ringtest is used for bench-marking the ring performance for different 
format. Virtio is only one of the supported ring format, ptr ring is 
another. Wrappers were used to reuse the same test logic.

Though you may see host/guest in the test, it's in fact done via two 
processes.

We need figure out:

1) why the current ringtest.c does not fit for your requirement (it has 
SPSC test)
2) why can't we tweak the ptr_ring.c to be used by both ring_test and 
your benchmark

If neither of the above work, we can invent new ptr_ring infrastructure 
under tests/

Thanks


>
>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v3 1/3] selftests/ptr_ring: add benchmark application for ptr_ring
  2021-07-02  9:04           ` Jason Wang
@ 2021-07-02  9:54             ` Yunsheng Lin
  2021-07-02 14:18               ` Michael S. Tsirkin
  2021-07-02 14:16             ` Michael S. Tsirkin
  1 sibling, 1 reply; 15+ messages in thread
From: Yunsheng Lin @ 2021-07-02  9:54 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: davem, kuba, brouer, paulmck, peterz, will, shuah, linux-kernel,
	netdev, linux-kselftest, linuxarm

On 2021/7/2 17:04, Jason Wang wrote:
> 

[...]

> 
> 
>> I understand that you guys like to see a working testcase of virtio.
>> I would love to do that if I have the time and knowledge of virtio,
>> But I do not think I have the time and I am familiar enough with
>> virtio to fix that now.
> 
> 
> So ringtest is used for bench-marking the ring performance for different format. Virtio is only one of the supported ring format, ptr ring is another. Wrappers were used to reuse the same test logic.
> 
> Though you may see host/guest in the test, it's in fact done via two processes.
> 
> We need figure out:
> 
> 1) why the current ringtest.c does not fit for your requirement (it has SPSC test)

There is MPSC case used by pfifo_fast, it make more sense to use a separate selftest
for ptr_ring as ptr_ring has been used by various subsystems.


> 2) why can't we tweak the ptr_ring.c to be used by both ring_test and your benchmark

Actually that is what I do in this patch, move the specific part related to ptr_ring
to ptr_ring_test.h. When the virtio testing is refactored to work, it can reuse the
abstract layer in ptr_ring_test.h too.

> 
> If neither of the above work, we can invent new ptr_ring infrastructure under tests/
> 
> Thanks
> 
> 
>>
>>
> 
> .
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v3 1/3] selftests/ptr_ring: add benchmark application for ptr_ring
  2021-07-02  9:04           ` Jason Wang
  2021-07-02  9:54             ` Yunsheng Lin
@ 2021-07-02 14:16             ` Michael S. Tsirkin
  1 sibling, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2021-07-02 14:16 UTC (permalink / raw)
  To: Jason Wang
  Cc: Yunsheng Lin, davem, kuba, brouer, paulmck, peterz, will, shuah,
	linux-kernel, netdev, linux-kselftest, linuxarm

On Fri, Jul 02, 2021 at 05:04:44PM +0800, Jason Wang wrote:
> 
> 在 2021/7/2 下午4:46, Yunsheng Lin 写道:
> > On 2021/7/2 16:30, Michael S. Tsirkin wrote:
> > > On Fri, Jul 02, 2021 at 04:17:17PM +0800, Yunsheng Lin wrote:
> > > > > Let's reuse ptr_ring.c in tools/virtio/ringtest. Nothing virt specific there.
> > > > It *does* have some virtio specific at the end of ptr_ring.c.
> 
> 
> They are just wrappers to make ptr ring works like a virtio ring. We can
> split them out into another file if necessary.
> 
> 
> > > > It can be argued that the ptr_ring.c in tools/virtio/ringtest
> > > > could be refactored to remove the function related to virtio.
> > > > 
> > > > But as mentioned in the previous disscusion [1], the tools/virtio/
> > > > seems to have compile error in the latest kernel, it does not seems
> > > > right to reuse that.
> > > > And most of testcase in tools/virtio/ seems
> > > > better be in tools/virtio/ringtest instead,so until the testcase
> > > > in tools/virtio/ is compile-error-free and moved to tools/testing/
> > > > selftests/, it seems better not to reuse it for now.
> > > 
> > > That's a great reason to reuse - so tools/virtio/ stays working.
> > > Please just fix that.
> 
> 
> +1
> 
> 
> > I understand that you guys like to see a working testcase of virtio.
> > I would love to do that if I have the time and knowledge of virtio,
> > But I do not think I have the time and I am familiar enough with
> > virtio to fix that now.
> 
> 
> So ringtest is used for bench-marking the ring performance for different
> format. Virtio is only one of the supported ring format, ptr ring is
> another. Wrappers were used to reuse the same test logic.
> 
> Though you may see host/guest in the test, it's in fact done via two
> processes.
> 
> We need figure out:
> 
> 1) why the current ringtest.c does not fit for your requirement (it has SPSC
> test)
> 2) why can't we tweak the ptr_ring.c to be used by both ring_test and your
> benchmark
> 
> If neither of the above work, we can invent new ptr_ring infrastructure
> under tests/
> 
> Thanks

For me 1) is not a question.

All the available/used terminology is not an ideal fit for ptr ring.
With virtio buffers are always owned by driver (producer) so producer
has a way to find out if a buffer has been consumed.  With ptr ring
there's no way for producer to know a buffer has been consumed.
The test hacks around that but it is very reasonable
not to want to rely on that.

However 2) is very much a question. We can split ptr_ring
to the preamble and virtio related hacks.
So all the portability infrastructure for building
kernel code from userspace, command line parsing,
run-on-all.sh to figure out affinity effects,
all that can and should IMHO be reused and not copy-pasted.

> 
> > 
> > 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v3 1/3] selftests/ptr_ring: add benchmark application for ptr_ring
  2021-07-02  9:54             ` Yunsheng Lin
@ 2021-07-02 14:18               ` Michael S. Tsirkin
  2021-07-05  1:43                 ` Yunsheng Lin
  0 siblings, 1 reply; 15+ messages in thread
From: Michael S. Tsirkin @ 2021-07-02 14:18 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: Jason Wang, davem, kuba, brouer, paulmck, peterz, will, shuah,
	linux-kernel, netdev, linux-kselftest, linuxarm

On Fri, Jul 02, 2021 at 05:54:42PM +0800, Yunsheng Lin wrote:
> On 2021/7/2 17:04, Jason Wang wrote:
> > 
> 
> [...]
> 
> > 
> > 
> >> I understand that you guys like to see a working testcase of virtio.
> >> I would love to do that if I have the time and knowledge of virtio,
> >> But I do not think I have the time and I am familiar enough with
> >> virtio to fix that now.
> > 
> > 
> > So ringtest is used for bench-marking the ring performance for different format. Virtio is only one of the supported ring format, ptr ring is another. Wrappers were used to reuse the same test logic.
> > 
> > Though you may see host/guest in the test, it's in fact done via two processes.
> > 
> > We need figure out:
> > 
> > 1) why the current ringtest.c does not fit for your requirement (it has SPSC test)
> 
> There is MPSC case used by pfifo_fast, it make more sense to use a separate selftest
> for ptr_ring as ptr_ring has been used by various subsystems.
> 
> 
> > 2) why can't we tweak the ptr_ring.c to be used by both ring_test and your benchmark
> 
> Actually that is what I do in this patch, move the specific part related to ptr_ring
> to ptr_ring_test.h. When the virtio testing is refactored to work, it can reuse the
> abstract layer in ptr_ring_test.h too.

Sounds good. But that refactoring will be up to you as a contributor.

> > 
> > If neither of the above work, we can invent new ptr_ring infrastructure under tests/
> > 
> > Thanks
> > 
> > 
> >>
> >>
> > 
> > .
> > 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Re: [PATCH net-next v3 1/3] selftests/ptr_ring: add benchmark application for ptr_ring
  2021-07-02 14:18               ` Michael S. Tsirkin
@ 2021-07-05  1:43                 ` Yunsheng Lin
  0 siblings, 0 replies; 15+ messages in thread
From: Yunsheng Lin @ 2021-07-05  1:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, davem, kuba, brouer, paulmck, peterz, will, shuah,
	linux-kernel, netdev, linux-kselftest, linuxarm

On 2021/7/2 22:18, Michael S. Tsirkin wrote:
> On Fri, Jul 02, 2021 at 05:54:42PM +0800, Yunsheng Lin wrote:
>> On 2021/7/2 17:04, Jason Wang wrote:
>>>
>>
>> [...]
>>
>>>
>>>
>>>> I understand that you guys like to see a working testcase of virtio.
>>>> I would love to do that if I have the time and knowledge of virtio,
>>>> But I do not think I have the time and I am familiar enough with
>>>> virtio to fix that now.
>>>
>>>
>>> So ringtest is used for bench-marking the ring performance for different format. Virtio is only one of the supported ring format, ptr ring is another. Wrappers were used to reuse the same test logic.
>>>
>>> Though you may see host/guest in the test, it's in fact done via two processes.
>>>
>>> We need figure out:
>>>
>>> 1) why the current ringtest.c does not fit for your requirement (it has SPSC test)
>>
>> There is MPSC case used by pfifo_fast, it make more sense to use a separate selftest
>> for ptr_ring as ptr_ring has been used by various subsystems.
>>
>>
>>> 2) why can't we tweak the ptr_ring.c to be used by both ring_test and your benchmark
>>
>> Actually that is what I do in this patch, move the specific part related to ptr_ring
>> to ptr_ring_test.h. When the virtio testing is refactored to work, it can reuse the
>> abstract layer in ptr_ring_test.h too.
> 
> Sounds good. But that refactoring will be up to you as a contributor.

It seems that tools/include/* have a lot of portability infrastructure for building
kernel code from userspace, will try to refactor the ptr_ring.h to use the portability
infrastructure in tools/include/* when building ptr_ring.h from userspace.

>

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-07-05  1:43 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-01 12:26 [PATCH net-next v3 0/3] add benchmark selftest and optimization for ptr_ring Yunsheng Lin
2021-07-01 12:26 ` [PATCH net-next v3 1/3] selftests/ptr_ring: add benchmark application " Yunsheng Lin
2021-07-02  6:43   ` Jason Wang
2021-07-02  8:17     ` Yunsheng Lin
2021-07-02  8:30       ` Michael S. Tsirkin
2021-07-02  8:46         ` Yunsheng Lin
2021-07-02  9:04           ` Jason Wang
2021-07-02  9:54             ` Yunsheng Lin
2021-07-02 14:18               ` Michael S. Tsirkin
2021-07-05  1:43                 ` Yunsheng Lin
2021-07-02 14:16             ` Michael S. Tsirkin
2021-07-01 12:26 ` [PATCH net-next v3 2/3] ptr_ring: move r->queue[] clearing after r->consumer_head updating Yunsheng Lin
2021-07-02  6:45   ` Jason Wang
2021-07-02  8:40     ` [Linuxarm] " Yunsheng Lin
2021-07-01 12:26 ` [PATCH net-next v3 3/3] ptr_ring: add barrier to ensure the visiblity of r->queue[] Yunsheng Lin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).