linux-kernel.vger.kernel.org archive mirror
* [PATCH] sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal
@ 2020-06-16 16:48 peter.puhov
  2020-06-17 10:50 ` Valentin Schneider
  2020-07-01  9:19 ` [sched/fair] 0b9730e694: vm-scalability.throughput 7.7% improvement kernel test robot
  0 siblings, 2 replies; 7+ messages in thread
From: peter.puhov @ 2020-06-16 16:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: peter.puhov, robert.foley, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman

From: Peter Puhov <peter.puhov@linaro.org>

In the slow path, when selecting the idlest group, if both groups have type
group_has_spare, only the idle_cpus count is compared.
As a result, if multiple tasks are created in a tight loop
and go back to sleep immediately
(while waiting for all tasks to be created),
they may be scheduled on the same core, because that CPU is back to idle
when the next fork happens.

For example:
sudo perf record -e sched:sched_wakeup_new -- \
                                  sysbench threads --threads=4 run
...
    total number of events:              61582
...
sudo perf script
sysbench 129378 [006] 74586.633466: sched:sched_wakeup_new: 
                            sysbench:129380 [120] success=1 CPU:007
sysbench 129378 [006] 74586.634718: sched:sched_wakeup_new: 
                            sysbench:129381 [120] success=1 CPU:007
sysbench 129378 [006] 74586.635957: sched:sched_wakeup_new: 
                            sysbench:129382 [120] success=1 CPU:007
sysbench 129378 [006] 74586.637183: sched:sched_wakeup_new: 
                            sysbench:129383 [120] success=1 CPU:007

This may have a negative impact on performance for workloads that
frequently create multiple threads.

This patch uses group_util to select the idlest group when both groups
have an equal number of idle_cpus, so that newly created tasks are
better distributed. It is possible to use nr_running instead of group_util,
but the result is less predictable.

With this patch:
sudo perf record -e sched:sched_wakeup_new -- \
                                    sysbench threads --threads=4 run
...
    total number of events:              74401
...
sudo perf script
sysbench 129455 [006] 75232.853257: sched:sched_wakeup_new: 
                            sysbench:129457 [120] success=1 CPU:008
sysbench 129455 [006] 75232.854489: sched:sched_wakeup_new: 
                            sysbench:129458 [120] success=1 CPU:009
sysbench 129455 [006] 75232.855732: sched:sched_wakeup_new: 
                            sysbench:129459 [120] success=1 CPU:010
sysbench 129455 [006] 75232.856980: sched:sched_wakeup_new: 
                            sysbench:129460 [120] success=1 CPU:011

We tested this patch with the following benchmarks:
  perf bench -f simple sched pipe -l 4000000
  perf bench -f simple sched messaging -l 30000
  perf bench -f simple  mem memset -s 3GB -l 15 -f default
  perf bench -f simple futex wake -s -t 640 -w 1
  sysbench cpu --threads=8 --cpu-max-prime=10000 run
  sysbench memory --memory-access-mode=rnd --threads=8 run
  sysbench threads --threads=8 run
  sysbench mutex --mutex-num=1 --threads=8 run
  hackbench --loops 20000
  hackbench --pipe --threads --loops 20000
  hackbench --pipe --threads --loops 20000 --datasize 4096

and found some performance improvements in:
  sysbench threads
  sysbench mutex
  perf bench futex wake
and no regressions in the others.

master: 'commit b3a9e3b9622a ("Linux 5.8-rc1")' 
$> sysbench threads --threads=16 run
	sysbench 1.0.11 (using system LuaJIT 2.1.0-beta3)
	Running the test with following options:
	Number of threads: 16
	Initializing random number generator from current time
	Initializing worker threads...
	Threads started!
	General statistics:
		total time:                          10.0079s
		total number of events:              45526 << higher is better
	Latency (ms):
			min:                                  0.36
			avg:                                  3.52
			max:                                 54.22
			95th percentile:                     23.10
			sum:                             160044.33
	Threads fairness:
		events (avg/stddev):           2845.3750/94.18
		execution time (avg/stddev):   10.0028/0.00

With patch:
$> sysbench threads --threads=16 run
	sysbench 1.0.11 (using system LuaJIT 2.1.0-beta3)
	Running the test with following options:
	Number of threads: 16
	Initializing random number generator from current time
	Initializing worker threads...
	Threads started!
	General statistics:
		total time:                          10.0053s
		total number of events:              56567  << higher is better
	Latency (ms):
			min:                                  0.36
			avg:                                  2.83
			max:                                 27.65
			95th percentile:                     18.95
			sum:                             160003.83

	Threads fairness:
		events (avg/stddev):           3535.4375/147.38
		execution time (avg/stddev):   10.0002/0.00

master: 'commit b3a9e3b9622a ("Linux 5.8-rc1")' 
$> sysbench mutex --mutex-num=1 --threads=32 run
	sysbench 1.0.11 (using system LuaJIT 2.1.0-beta3)
	Running the test with following options:
	Number of threads: 32
	Initializing random number generator from current time
	Initializing worker threads...
	Threads started!
	General statistics:
		total time:                          1.0415s << lower is better
		total number of events:              32
	Latency (ms):
			min:                                940.57
			avg:                                959.24
			max:                               1041.05
			95th percentile:                    960.30
			sum:                              30695.84
	Threads fairness:
		events (avg/stddev):           1.0000/0.00
		execution time (avg/stddev):   0.9592/0.02

With patch:
$> sysbench mutex --mutex-num=1 --threads=32 run
	sysbench 1.0.11 (using system LuaJIT 2.1.0-beta3)
	Running the test with following options:
	Number of threads: 32
	Initializing random number generator from current time
	Initializing worker threads...
	Threads started!
	General statistics:
		total time:                          0.9209s  << lower is better
		total number of events:              32
	Latency (ms):
			min:                                867.37
			avg:                                892.09
			max:                                920.70
			95th percentile:                    909.80
			sum:                              28546.84
	Threads fairness:
		events (avg/stddev):           1.0000/0.00
		execution time (avg/stddev):   0.8921/0.01

master: 'commit b3a9e3b9622a ("Linux 5.8-rc1")'
$> perf bench futex wake -s -t 128 -w 1
	# Running 'futex/wake' benchmark:
	Run summary [PID 2414]: blocking on 128 threads 
			(at [private] futex 0xaaaab663a154), waking up 1 at a time.
	Wokeup 128 of 128 threads in 0.2852 ms (+-1.86%) << lower is better

With patch:
$> perf bench futex wake -s -t 128 -w 1
	# Running 'futex/wake' benchmark:
	Run summary [PID 5057]: blocking on 128 threads 
			(at [private] futex 0xaaaace461154), waking up 1 at a time.
	Wokeup 128 of 128 threads in 0.2705 ms (+-1.84%) << lower is better

Signed-off-by: Peter Puhov <peter.puhov@linaro.org>
---
 kernel/sched/fair.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 02f323b85b6d..abcbdf80ee75 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8662,8 +8662,14 @@ static bool update_pick_idlest(struct sched_group *idlest,
 
 	case group_has_spare:
 		/* Select group with most idle CPUs */
-		if (idlest_sgs->idle_cpus >= sgs->idle_cpus)
+		if (idlest_sgs->idle_cpus > sgs->idle_cpus)
 			return false;
+
+		/* Select group with lowest group_util */
+		if (idlest_sgs->idle_cpus == sgs->idle_cpus &&
+			idlest_sgs->group_util <= sgs->group_util)
+			return false;
+
 		break;
 	}
 
-- 
2.20.1



end of thread, other threads:[~2020-07-02 13:45 UTC | newest]

Thread overview: 7+ messages
2020-06-16 16:48 [PATCH] sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal peter.puhov
2020-06-17 10:50 ` Valentin Schneider
2020-06-17 14:52   ` Peter Puhov
2020-07-02  9:27     ` Dietmar Eggemann
2020-07-02 13:20       ` Mel Gorman
2020-07-02 13:45         ` Vincent Guittot
2020-07-01  9:19 ` [sched/fair] 0b9730e694: vm-scalability.throughput 7.7% improvement kernel test robot
