[RFC PATCH v2 0/1] Pipe busy wait

* [RFC PATCH v2 0/1] Pipe busy wait
@ 2018-09-25 23:32 subhra mazumdar
  2018-09-25 23:32 ` [RFC PATCH v2 1/1] pipe: busy wait for pipe subhra mazumdar
  0 siblings, 1 reply; 5+ messages in thread
From: subhra mazumdar @ 2018-09-25 23:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, tglx, dhaval.giani, steven.sistare

This patch introduces busy waiting for pipes similar to network sockets.
When pipe is full or empty a thread busy waits for some microseconds before
sleeping. This avoids the sleep and wakeup overhead and improves
performance in case wakeup happens very fast. It uses a new field in
pipe_inode_info to decide how much to spin. As different workloads on
different systems can have different optimum spin time, it is configurable
via a tunable that can be set via /proc. The default value is 0 which
indicates no spin.

Following are the hackbench process using pipe and Unixbench performance 
numbers with baseline and suitable spin time.

Hackbench on 2 socket, 36 core and 72 threads Intel x86 machine
(lower is better). It also shows the usr+sys time as shown by time:
groups  baseline  usr+sys   patch(spin=10us)  usr+sys
1       0.6742    17.0212   0.6842 (-1.48%)   20.2765
2       0.7794    35.1954   0.7116 (8.7%)     38.1318
4       0.9744    55.6513   0.8247 (15.36%)   50.301
8       2.0382    129.519   1.572 (22.87%)    103.722
16      5.5712    427.096   2.7989 (49.76%)   194.162
24      7.9314    566.327   3.962 (50.05%)    278.368

Unixbench on 2 socket, 36 core and 72 threads Intel x86 machine (higher is
better). It shows pipe-based context switching test improving by 107.7% for
1 copy and by 51.3% for 72 parallel copies. This is expected as context
switch overhead is avoided altogether by busy wait.

72 CPUs in system; running 1 parallel copy of tests
System Benchmarks Index Values          baseline	patch(spin=10us)
Dhrystone 2 using register variables    2837.5		2842.0
Double-Precision Whetstone              753.3		753.5
Execl Throughput                        1056.8		1079.2
File Copy 1024 bufsize 2000 maxblocks   1794.0		1805.4
File Copy 256 bufsize 500 maxblocks     1111.4		1117.6
File Copy 4096 bufsize 8000 maxblocks   4136.7		4091.7
Pipe Throughput                         752.9		753.3
Pipe-based Context Switching            372.2		772.9
Process Creation                        840.1		847.6
Shell Scripts (1 concurrent)            1771.0		1747.5
Shell Scripts (8 concurrent)            7316.6		7308.3
System Call Overhead                    578.0		578.0
                                        ========	========
System Benchmarks Index Score           1337.8		1424.0

72 CPUs in system; running 72 parallel copies of tests
System Benchmarks Index Values          baseline 	patch(spin=10us)
Dhrystone 2 using register variables    112849.7	112429.8
Double-Precision Whetstone              44971.1		44929.7
Execl Throughput                        11257.6		11258.7
File Copy 1024 bufsize 2000 maxblocks   1514.6		1471.3
File Copy 256 bufsize 500 maxblocks     919.8		917.4
File Copy 4096 bufsize 8000 maxblocks   3543.8		3355.5
Pipe Throughput                         34530.6		34492.9
Pipe-based Context Switching            11298.2		17089.9
Process Creation                        9422.5		9408.1
Shell Scripts (1 concurrent)            38764.9		38610.4
Shell Scripts (8 concurrent)            32570.5		32458.8
System Call Overhead                    2607.4		2672.7
                                        ========	========
System Benchmarks Index Score           11077.2		11393.5

Changes from v1->v2:
-Removed preemption disable in the busy wait loop
-Changed busy spin condition to TASK_RUNNING state of the current thread
-Added usr+sys time for hackbench runs as shown by time command

subhra mazumdar (1):
  pipe: busy wait for pipe

 fs/pipe.c                 | 12 ++++++++++++
 include/linux/pipe_fs_i.h |  2 ++
 kernel/sysctl.c           |  7 +++++++
 3 files changed, 21 insertions(+)

-- 
2.9.3

^ permalink raw reply	[flat|nested] 5+ messages in thread