linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH 00/10] powerpc/mm/slice: improve slice speed and stack use
@ 2018-03-06 13:24 Nicholas Piggin
  2018-03-06 13:24 ` [PATCH 01/10] selftests/powerpc: add process creation benchmark Nicholas Piggin
                   ` (9 more replies)
  0 siblings, 10 replies; 31+ messages in thread
From: Nicholas Piggin @ 2018-03-06 13:24 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy

Since this was last posted, it's been ported on top of Christophe's
8xx slice implementation that is merged in powerpc next, and I've also
taken into account some feedback and bug reports from Aneesh and
Christophe -- thanks.

A few significant changes. First is a refactoring of
slice_set_user_psize, which makes it more obvious how the slice state
is initialised, and which I think makes it easier to reason about
using dynamic high slice size limits.

Second is a significant change to how the slice masks are kept. No
longer are they bolted on the side and hit with a big recalculation
call that redoes everything whenever something changes. Now they are
just maintained as part of slice conversion.

This now passes vm selftests including the 128TB boundary case tests.
I also added a process microbenchmark and redid benchmarks and stack
measurements.

Overall on POWER8, this series increases the vfork+exec+exit
microbenchmark rate by 15.6%, and the mmap+munmap rate by 81%. Slice
code/data size is reduced by 1kB, and the maximum stack overhead
through a slice_get_unmapped_area call goes from 992 to 448 bytes. The
cost is 288 bytes added to the mm_context_t of each mm for the slice
masks on Book3S.

Thanks,
Nick

Nicholas Piggin (10):
  selftests/powerpc: add process creation benchmark
  powerpc/mm/slice: Simplify and optimise slice context initialisation
  powerpc/mm/slice: tidy lpsizes and hpsizes update loops
  powerpc/mm/slice: pass pointers to struct slice_mask where possible
  powerpc/mm/slice: implement a slice mask cache
  powerpc/mm/slice: implement slice_check_range_fits
  powerpc/mm/slice: Switch to 3-operand slice bitops helpers
  powerpc/mm/slice: Use const pointers to cached slice masks where
    possible
  powerpc/mm/slice: use the dynamic high slice size to limit bitmap
    operations
  powerpc/mm/slice: remove radix calls to the slice code

 arch/powerpc/include/asm/book3s/64/mmu.h           |  18 +
 arch/powerpc/include/asm/hugetlb.h                 |   9 +-
 arch/powerpc/include/asm/mmu-8xx.h                 |  14 +
 arch/powerpc/include/asm/slice.h                   |   8 +-
 arch/powerpc/mm/hugetlbpage.c                      |   5 +-
 arch/powerpc/mm/mmu_context_book3s64.c             |   9 +-
 arch/powerpc/mm/mmu_context_nohash.c               |   5 +-
 arch/powerpc/mm/slice.c                            | 458 +++++++++++----------
 .../selftests/powerpc/benchmarks/.gitignore        |   2 +
 .../testing/selftests/powerpc/benchmarks/Makefile  |   8 +-
 .../selftests/powerpc/benchmarks/exec_target.c     |   5 +
 tools/testing/selftests/powerpc/benchmarks/fork.c  | 339 +++++++++++++++
 12 files changed, 632 insertions(+), 248 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/benchmarks/exec_target.c
 create mode 100644 tools/testing/selftests/powerpc/benchmarks/fork.c

-- 
2.16.1

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 01/10] selftests/powerpc: add process creation benchmark
  2018-03-06 13:24 [PATCH 00/10] powerpc/mm/slice: improve slice speed and stack use Nicholas Piggin
@ 2018-03-06 13:24 ` Nicholas Piggin
  2018-03-19 22:23   ` [01/10] " Michael Ellerman
  2018-03-20 10:15   ` Michael Ellerman
  2018-03-06 13:24 ` [PATCH 02/10] powerpc/mm/slice: Simplify and optimise slice context initialisation Nicholas Piggin
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 31+ messages in thread
From: Nicholas Piggin @ 2018-03-06 13:24 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
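The benchmark's behaviour is driven by the options in fork.c below; as
an illustrative invocation, running

	./fork --vfork --exec 0

from the benchmarks directory pins the test to CPU 0 and prints the
number of vfork+exec+exit iterations completed each second, for the
default 30 second timeout.
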
 .../selftests/powerpc/benchmarks/.gitignore        |   2 +
 .../testing/selftests/powerpc/benchmarks/Makefile  |   8 +-
 .../selftests/powerpc/benchmarks/exec_target.c     |   5 +
 tools/testing/selftests/powerpc/benchmarks/fork.c  | 339 +++++++++++++++++++++
 4 files changed, 353 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/benchmarks/exec_target.c
 create mode 100644 tools/testing/selftests/powerpc/benchmarks/fork.c

diff --git a/tools/testing/selftests/powerpc/benchmarks/.gitignore b/tools/testing/selftests/powerpc/benchmarks/.gitignore
index 04dc1e6ef2ce..9161679b1e1a 100644
--- a/tools/testing/selftests/powerpc/benchmarks/.gitignore
+++ b/tools/testing/selftests/powerpc/benchmarks/.gitignore
@@ -1,5 +1,7 @@
 gettimeofday
 context_switch
+fork
+exec_target
 mmap_bench
 futex_bench
 null_syscall
diff --git a/tools/testing/selftests/powerpc/benchmarks/Makefile b/tools/testing/selftests/powerpc/benchmarks/Makefile
index a35058e3766c..61189a0b8285 100644
--- a/tools/testing/selftests/powerpc/benchmarks/Makefile
+++ b/tools/testing/selftests/powerpc/benchmarks/Makefile
@@ -1,5 +1,9 @@
 # SPDX-License-Identifier: GPL-2.0
-TEST_GEN_PROGS := gettimeofday context_switch mmap_bench futex_bench null_syscall
+TEST_GEN_PROGS := gettimeofday context_switch fork mmap_bench futex_bench null_syscall
+TEST_GEN_FILES := exec_target
+
+$(OUTPUT)/exec_target: exec_target.c
+	$(CC) -O2 -static -nostartfiles -o $@ exec_target.c
 
 CFLAGS += -O2
 
@@ -10,3 +14,5 @@ $(TEST_GEN_PROGS): ../harness.c
 $(OUTPUT)/context_switch: ../utils.c
 $(OUTPUT)/context_switch: CFLAGS += -maltivec -mvsx -mabi=altivec
 $(OUTPUT)/context_switch: LDLIBS += -lpthread
+
+$(OUTPUT)/fork: LDLIBS += -lpthread
diff --git a/tools/testing/selftests/powerpc/benchmarks/exec_target.c b/tools/testing/selftests/powerpc/benchmarks/exec_target.c
new file mode 100644
index 000000000000..5e2a6e917c1a
--- /dev/null
+++ b/tools/testing/selftests/powerpc/benchmarks/exec_target.c
@@ -0,0 +1,5 @@
+void _exit(int);
+void _start(void)
+{
+	_exit(0);
+}
diff --git a/tools/testing/selftests/powerpc/benchmarks/fork.c b/tools/testing/selftests/powerpc/benchmarks/fork.c
new file mode 100644
index 000000000000..c68a7c360fd2
--- /dev/null
+++ b/tools/testing/selftests/powerpc/benchmarks/fork.c
@@ -0,0 +1,339 @@
+/*
+ * Process creation (fork/vfork/exec) microbenchmark.
+ *
+ * Copyright (C) 2018 Anton Blanchard <anton@au.ibm.com>, IBM
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <sched.h>
+#include <string.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <getopt.h>
+#include <signal.h>
+#include <assert.h>
+#include <pthread.h>
+#include <limits.h>
+#include <sys/time.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <sys/shm.h>
+#include <sys/wait.h>
+#include <linux/futex.h>
+
+static unsigned int timeout = 30;
+
+static void set_cpu(int cpu)
+{
+	cpu_set_t cpuset;
+
+	if (cpu == -1)
+		return;
+
+	CPU_ZERO(&cpuset);
+	CPU_SET(cpu, &cpuset);
+
+	if (sched_setaffinity(0, sizeof(cpuset), &cpuset)) {
+		perror("sched_setaffinity");
+		exit(1);
+	}
+}
+
+static void start_process_on(void *(*fn)(void *), void *arg, int cpu)
+{
+	int pid;
+
+	pid = fork();
+	if (pid == -1) {
+		perror("fork");
+		exit(1);
+	}
+
+	if (pid)
+		return;
+
+	set_cpu(cpu);
+
+	fn(arg);
+
+	exit(0);
+}
+
+static int cpu;
+static int do_fork = 0;
+static int do_vfork = 0;
+static int do_exec = 0;
+static char *exec_file;
+static int exec_target = 0;
+static unsigned long iterations;
+static unsigned long iterations_prev;
+
+static void run_exec(void)
+{
+#if 0
+	char *const argv[] = { exec_file, "--exec-target", NULL };
+
+	if (execve(exec_file, argv, NULL) == -1) {
+		perror("execve");
+		exit(1);
+	}
+#else
+	char *const argv[] = { "./exec_target", NULL };
+
+	if (execve("./exec_target", argv, NULL) == -1) {
+		perror("execve");
+		exit(1);
+	}
+#endif
+}
+
+static void bench_fork(void)
+{
+	while (1) {
+		pid_t pid = fork();
+		if (pid == -1) {
+			perror("fork");
+			exit(1);
+		}
+		if (pid == 0) {
+			if (do_exec)
+				run_exec();
+			_exit(0);
+		}
+		pid = waitpid(pid, NULL, 0);
+		if (pid	== -1) {
+			perror("waitpid");
+			exit(1);
+		}
+		iterations++;
+	}
+}
+
+static void bench_vfork(void)
+{
+	while (1) {
+		pid_t pid = vfork();
+		if (pid == -1) {
+			perror("fork");
+			exit(1);
+		}
+		if (pid == 0) {
+			if (do_exec)
+				run_exec();
+			_exit(0);
+		}
+		pid = waitpid(pid, NULL, 0);
+		if (pid	== -1) {
+			perror("waitpid");
+			exit(1);
+		}
+		iterations++;
+	}
+}
+
+
+static void *null_fn(void *arg)
+{
+	pthread_exit(NULL);
+}
+
+static void bench_thread(void)
+{
+	pthread_t tid;
+	cpu_set_t cpuset;
+	pthread_attr_t attr;
+	int rc;
+
+	rc = pthread_attr_init(&attr);
+	if (rc) {
+		errno = rc;
+		perror("pthread_attr_init");
+		exit(1);
+	}
+
+	if (cpu != -1) {
+		CPU_ZERO(&cpuset);
+		CPU_SET(cpu, &cpuset);
+
+		rc = pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpuset);
+		if (rc)	{
+			errno = rc;
+			perror("pthread_attr_setaffinity_np");
+			exit(1);
+		}
+	}
+
+	while (1) {
+		rc = pthread_create(&tid, &attr, null_fn, NULL);
+		if (rc) {
+			errno = rc;
+			perror("pthread_create");
+			exit(1);
+		}
+		rc = pthread_join(tid, NULL);
+		if (rc) {
+			errno = rc;
+			perror("pthread_join");
+			exit(1);
+		}
+		iterations++;
+	}
+}
+
+
+static void sigalrm_handler(int junk)
+{
+	unsigned long i = iterations;
+
+	printf("%ld\n", i - iterations_prev);
+	iterations_prev = i;
+
+	if (--timeout == 0)
+		kill(0, SIGUSR1);
+
+	alarm(1);
+}
+
+static void sigusr1_handler(int junk)
+{
+	exit(0);
+}
+
+static void *bench_proc(void *arg)
+{
+	signal(SIGALRM, sigalrm_handler);
+	alarm(1);
+
+	if (do_fork)
+		bench_fork();
+	else if (do_vfork)
+		bench_vfork();
+	else
+		bench_thread();
+
+	return NULL;
+}
+
+static struct option options[] = {
+	{ "fork", no_argument, &do_fork, 1 },
+	{ "vfork", no_argument, &do_vfork, 1 },
+	{ "exec", no_argument, &do_exec, 1 },
+	{ "timeout", required_argument, 0, 's' },
+	{ "exec-target", no_argument, &exec_target, 1 },
+	{ 0, },
+};
+
+static void usage(void)
+{
+	fprintf(stderr, "Usage: fork <options> CPU\n\n");
+	fprintf(stderr, "\t\t--fork\tUse fork() (default threads)\n");
+	fprintf(stderr, "\t\t--vfork\tUse vfork() (default threads)\n");
+	fprintf(stderr, "\t\t--exec\tAlso exec() (default no exec)\n");
+	fprintf(stderr, "\t\t--timeout=X\tDuration in seconds to run (default 30)\n");
+	fprintf(stderr, "\t\t--exec-target\tInternal option for exec workload\n");
+}
+
+int main(int argc, char *argv[])
+{
+	signed char c;
+
+	while (1) {
+		int option_index = 0;
+
+		c = getopt_long(argc, argv, "", options, &option_index);
+
+		if (c == -1)
+			break;
+
+		switch (c) {
+		case 0:
+			if (options[option_index].flag != 0)
+				break;
+
+			usage();
+			exit(1);
+			break;
+
+		case 's':
+			timeout = atoi(optarg);
+			break;
+
+		default:
+			usage();
+			exit(1);
+		}
+	}
+
+	if (do_fork && do_vfork) {
+		usage();
+		exit(1);
+	}
+	if (do_exec && !do_fork && !do_vfork) {
+		usage();
+		exit(1);
+	}
+
+	if (do_exec) {
+		char *dirname = strdup(argv[0]);
+		int i;
+		i = strlen(dirname) - 1;
+		while (i) {
+			if (dirname[i] == '/') {
+				dirname[i] = '\0';
+				if (chdir(dirname) == -1) {
+					perror("chdir");
+					exit(1);
+				}
+				break;
+			}
+			i--;
+		}
+	}
+
+	if (exec_target) {
+		exit(0);
+	}
+
+	if (((argc - optind) != 1)) {
+		cpu = -1;
+	} else {
+		cpu = atoi(argv[optind++]);
+	}
+
+	if (do_exec)
+		exec_file = argv[0];
+
+	set_cpu(cpu);
+
+	printf("Using ");
+	if (do_fork)
+		printf("fork");
+	else if (do_vfork)
+		printf("vfork");
+	else
+		printf("clone");
+
+	if (do_exec)
+		printf(" + exec");
+
+	printf(" on cpu %d\n", cpu);
+
+	/* Create a new process group so we can signal everyone for exit */
+	setpgid(getpid(), getpid());
+
+	signal(SIGUSR1, sigusr1_handler);
+
+	start_process_on(bench_proc, NULL, cpu);
+
+	while (1)
+		sleep(3600);
+
+	return 0;
+}
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 02/10] powerpc/mm/slice: Simplify and optimise slice context initialisation
  2018-03-06 13:24 [PATCH 00/10] powerpc/mm/slice: improve slice speed and stack use Nicholas Piggin
  2018-03-06 13:24 ` [PATCH 01/10] selftests/powerpc: add process creation benchmark Nicholas Piggin
@ 2018-03-06 13:24 ` Nicholas Piggin
  2018-03-06 14:32   ` Nicholas Piggin
  2018-03-06 13:25 ` [PATCH 03/10] powerpc/mm/slice: tidy lpsizes and hpsizes update loops Nicholas Piggin
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Nicholas Piggin @ 2018-03-06 13:24 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy

The slice state of an mm gets zeroed then initialised upon exec.
This is now the only caller of slice_set_user_psize, so that function
can be removed and replaced with a faster, simplified approach that
requires no locking and no checking of existing state.

This speeds up vfork+exec+exit performance on POWER8 by 3%.
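
The simplification works because each slice's page size index is a
4-bit nibble, two per byte of the psize arrays, so setting every slice
to the default size collapses to a memset of the packed byte value.
A stand-alone sketch of the idea (the helper name is only for
illustration; SLICE_NUM_LOW/SLICE_NUM_HIGH are the existing constants):

/* Fill all low and high slices with the same page size index. */
static void fill_slice_psizes(unsigned char *lpsizes, unsigned char *hpsizes,
			      unsigned int psize)
{
	unsigned char packed = (psize << 4) | psize;	/* two entries per byte */

	memset(lpsizes, packed, SLICE_NUM_LOW >> 1);
	memset(hpsizes, packed, SLICE_NUM_HIGH >> 1);
}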

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/slice.h       |  8 ++--
 arch/powerpc/mm/mmu_context_book3s64.c |  9 +----
 arch/powerpc/mm/mmu_context_nohash.c   |  5 +--
 arch/powerpc/mm/slice.c                | 69 +++++++---------------------------
 4 files changed, 19 insertions(+), 72 deletions(-)

diff --git a/arch/powerpc/include/asm/slice.h b/arch/powerpc/include/asm/slice.h
index 172711fadb1c..e40406cf5628 100644
--- a/arch/powerpc/include/asm/slice.h
+++ b/arch/powerpc/include/asm/slice.h
@@ -28,15 +28,13 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 
 unsigned int get_slice_psize(struct mm_struct *mm, unsigned long addr);
 
-void slice_set_user_psize(struct mm_struct *mm, unsigned int psize);
 void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
 			   unsigned long len, unsigned int psize);
-#endif /* __ASSEMBLY__ */
 
-#else /* CONFIG_PPC_MM_SLICES */
+void slice_init_new_context_exec(struct mm_struct *mm);
+
+#endif /* __ASSEMBLY__ */
 
-#define slice_set_range_psize(mm, start, len, psize)	\
-	slice_set_user_psize((mm), (psize))
 #endif /* CONFIG_PPC_MM_SLICES */
 
 #endif /* _ASM_POWERPC_SLICE_H */
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index 929d9ef7083f..80acad52b006 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -93,13 +93,6 @@ static int hash__init_new_context(struct mm_struct *mm)
 	if (index < 0)
 		return index;
 
-	/*
-	 * In the case of exec, use the default limit,
-	 * otherwise inherit it from the mm we are duplicating.
-	 */
-	if (!mm->context.slb_addr_limit)
-		mm->context.slb_addr_limit = DEFAULT_MAP_WINDOW_USER64;
-
 	/*
 	 * The old code would re-promote on fork, we don't do that when using
 	 * slices as it could cause problem promoting slices that have been
@@ -115,7 +108,7 @@ static int hash__init_new_context(struct mm_struct *mm)
 	 * check against 0 is OK.
 	 */
 	if (mm->context.id == 0)
-		slice_set_user_psize(mm, mmu_virtual_psize);
+		slice_init_new_context_exec(mm);
 
 	subpage_prot_init_new_context(mm);
 
diff --git a/arch/powerpc/mm/mmu_context_nohash.c b/arch/powerpc/mm/mmu_context_nohash.c
index d98f7e5c141b..be8f5c9d4d08 100644
--- a/arch/powerpc/mm/mmu_context_nohash.c
+++ b/arch/powerpc/mm/mmu_context_nohash.c
@@ -332,9 +332,6 @@ int init_new_context(struct task_struct *t, struct mm_struct *mm)
 	pr_hard("initing context for mm @%p\n", mm);
 
 #ifdef	CONFIG_PPC_MM_SLICES
-	if (!mm->context.slb_addr_limit)
-		mm->context.slb_addr_limit = DEFAULT_MAP_WINDOW;
-
 	/*
 	 * We have MMU_NO_CONTEXT set to be ~0. Hence check
 	 * explicitly against context.id == 0. This ensures that we properly
@@ -343,7 +340,7 @@ int init_new_context(struct task_struct *t, struct mm_struct *mm)
 	 * will have id != 0).
 	 */
 	if (mm->context.id == 0)
-		slice_set_user_psize(mm, mmu_virtual_psize);
+		slice_init_new_context_exec(mm);
 #endif
 	mm->context.id = MMU_NO_CONTEXT;
 	mm->context.active = 0;
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 5e9e1e57d580..af4351b15d01 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -671,70 +671,29 @@ unsigned int get_slice_psize(struct mm_struct *mm, unsigned long addr)
 }
 EXPORT_SYMBOL_GPL(get_slice_psize);
 
-/*
- * This is called by hash_page when it needs to do a lazy conversion of
- * an address space from real 64K pages to combo 4K pages (typically
- * when hitting a non cacheable mapping on a processor or hypervisor
- * that won't allow them for 64K pages).
- *
- * This is also called in init_new_context() to change back the user
- * psize from whatever the parent context had it set to
- * N.B. This may be called before mm->context.id has been set.
- *
- * This function will only change the content of the {low,high)_slice_psize
- * masks, it will not flush SLBs as this shall be handled lazily by the
- * caller.
- */
-void slice_set_user_psize(struct mm_struct *mm, unsigned int psize)
+void slice_init_new_context_exec(struct mm_struct *mm)
 {
-	int index, mask_index;
 	unsigned char *hpsizes, *lpsizes;
-	unsigned long flags;
-	unsigned int old_psize;
-	int i;
-
-	slice_dbg("slice_set_user_psize(mm=%p, psize=%d)\n", mm, psize);
-
-	VM_BUG_ON(radix_enabled());
-	spin_lock_irqsave(&slice_convert_lock, flags);
+	unsigned int psize = mmu_virtual_psize;
 
-	old_psize = mm->context.user_psize;
-	slice_dbg(" old_psize=%d\n", old_psize);
-	if (old_psize == psize)
-		goto bail;
+	slice_dbg("slice_init_new_context_exec(mm=%p)\n", mm);
 
+	/*
+	 * In the case of exec, use the default limit. In the
+	 * case of fork it is just inherited from the mm being
+	 * duplicated.
+	 */
+	mm->context.slb_addr_limit = DEFAULT_MAP_WINDOW_USER64;
 	mm->context.user_psize = psize;
-	wmb();
 
+	/*
+	 * Set all slice psizes to the default.
+	 */
 	lpsizes = mm->context.low_slices_psize;
-	for (i = 0; i < SLICE_NUM_LOW; i++) {
-		mask_index = i & 0x1;
-		index = i >> 1;
-		if (((lpsizes[index] >> (mask_index * 4)) & 0xf) == old_psize)
-			lpsizes[index] = (lpsizes[index] &
-					  ~(0xf << (mask_index * 4))) |
-				(((unsigned long)psize) << (mask_index * 4));
-	}
+	memset(lpsizes, (psize << 4) | psize, SLICE_NUM_LOW >> 1);
 
 	hpsizes = mm->context.high_slices_psize;
-	for (i = 0; i < SLICE_NUM_HIGH; i++) {
-		mask_index = i & 0x1;
-		index = i >> 1;
-		if (((hpsizes[index] >> (mask_index * 4)) & 0xf) == old_psize)
-			hpsizes[index] = (hpsizes[index] &
-					  ~(0xf << (mask_index * 4))) |
-				(((unsigned long)psize) << (mask_index * 4));
-	}
-
-
-
-
-	slice_dbg(" lsps=%lx, hsps=%lx\n",
-		  (unsigned long)mm->context.low_slices_psize,
-		  (unsigned long)mm->context.high_slices_psize);
-
- bail:
-	spin_unlock_irqrestore(&slice_convert_lock, flags);
+	memset(hpsizes, (psize << 4) | psize, SLICE_NUM_HIGH >> 1);
 }
 
 void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 03/10] powerpc/mm/slice: tidy lpsizes and hpsizes update loops
  2018-03-06 13:24 [PATCH 00/10] powerpc/mm/slice: improve slice speed and stack use Nicholas Piggin
  2018-03-06 13:24 ` [PATCH 01/10] selftests/powerpc: add process creation benchmark Nicholas Piggin
  2018-03-06 13:24 ` [PATCH 02/10] powerpc/mm/slice: Simplify and optimise slice context initialisation Nicholas Piggin
@ 2018-03-06 13:25 ` Nicholas Piggin
  2018-03-06 13:25 ` [PATCH 04/10] powerpc/mm/slice: pass pointers to struct slice_mask where possible Nicholas Piggin
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Nicholas Piggin @ 2018-03-06 13:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy

Make these loops look the same, and change their form so the
important part is not wrapped over so many lines.
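
For anyone following the index arithmetic: the page size of slice i is
a 4-bit nibble stored in byte i >> 1 of the array, with i & 0x1
selecting the low or high nibble. A stand-alone sketch of the update
the loops below open-code (the helper name is made up for illustration):

/* Set the 4-bit page size index of slice i; two entries per byte. */
static void set_slice_psize(unsigned char *psizes, unsigned long i,
			    unsigned int psize)
{
	int index = i >> 1;		/* byte holding slice i */
	int mask_index = i & 0x1;	/* low or high nibble */

	psizes[index] = (psizes[index] & ~(0xf << (mask_index * 4))) |
			(psize << (mask_index * 4));
}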

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/mm/slice.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index af4351b15d01..9625ceb35685 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -232,22 +232,24 @@ static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psiz
 	spin_lock_irqsave(&slice_convert_lock, flags);
 
 	lpsizes = mm->context.low_slices_psize;
-	for (i = 0; i < SLICE_NUM_LOW; i++)
-		if (mask.low_slices & (1u << i)) {
-			mask_index = i & 0x1;
-			index = i >> 1;
-			lpsizes[index] = (lpsizes[index] &
-					  ~(0xf << (mask_index * 4))) |
+	for (i = 0; i < SLICE_NUM_LOW; i++) {
+		if (!(mask.low_slices & (1u << i)))
+			continue;
+
+		mask_index = i & 0x1;
+		index = i >> 1;
+		lpsizes[index] = (lpsizes[index] & ~(0xf << (mask_index * 4))) |
 				(((unsigned long)psize) << (mask_index * 4));
-		}
+	}
 
 	hpsizes = mm->context.high_slices_psize;
 	for (i = 0; i < GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit); i++) {
+		if (!test_bit(i, mask.high_slices))
+			continue;
+
 		mask_index = i & 0x1;
 		index = i >> 1;
-		if (test_bit(i, mask.high_slices))
-			hpsizes[index] = (hpsizes[index] &
-					  ~(0xf << (mask_index * 4))) |
+		hpsizes[index] = (hpsizes[index] & ~(0xf << (mask_index * 4))) |
 				(((unsigned long)psize) << (mask_index * 4));
 	}
 
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 04/10] powerpc/mm/slice: pass pointers to struct slice_mask where possible
  2018-03-06 13:24 [PATCH 00/10] powerpc/mm/slice: improve slice speed and stack use Nicholas Piggin
                   ` (2 preceding siblings ...)
  2018-03-06 13:25 ` [PATCH 03/10] powerpc/mm/slice: tidy lpsizes and hpsizes update loops Nicholas Piggin
@ 2018-03-06 13:25 ` Nicholas Piggin
  2018-03-06 13:43   ` Christophe LEROY
  2018-03-06 13:25 ` [PATCH 05/10] powerpc/mm/slice: implement a slice mask cache Nicholas Piggin
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Nicholas Piggin @ 2018-03-06 13:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy

Pass around const pointers to struct slice_mask where possible, rather
than copies of slice_mask, to reduce stack and call overhead.

checkstack.pl gives, before:
0x00000d1c slice_get_unmapped_area [slice.o]:		592
0x00001864 is_hugepage_only_range [slice.o]:		448
0x00000754 slice_find_area_topdown [slice.o]:		400
0x00000484 slice_find_area_bottomup.isra.1 [slice.o]:	272
0x000017b4 slice_set_range_psize [slice.o]:		224
0x00000a4c slice_find_area [slice.o]:			128
0x00000160 slice_check_fit [slice.o]:			112

after:
0x00000ad0 slice_get_unmapped_area [slice.o]:		448
0x00001464 is_hugepage_only_range [slice.o]:		288
0x000006c0 slice_find_area [slice.o]:			144
0x0000016c slice_check_fit [slice.o]:			128
0x00000528 slice_find_area_bottomup.isra.2 [slice.o]:	128
0x000013e4 slice_set_range_psize [slice.o]:		128
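
(These are scripts/checkstack.pl figures; an invocation along the lines
of "objdump -d arch/powerpc/mm/slice.o | perl scripts/checkstack.pl
ppc64" should reproduce them.)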

This increases vfork+exec+exit performance by 1.5%.

Reduces time to mmap+munmap a 64kB page by 17%.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/mm/slice.c | 87 ++++++++++++++++++++++++++-----------------------
 1 file changed, 47 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 9625ceb35685..233c42d593dc 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -50,19 +50,21 @@ struct slice_mask {
 #ifdef DEBUG
 int _slice_debug = 1;
 
-static void slice_print_mask(const char *label, struct slice_mask mask)
+static void slice_print_mask(const char *label, const struct slice_mask *mask)
 {
 	if (!_slice_debug)
 		return;
-	pr_devel("%s low_slice: %*pbl\n", label, (int)SLICE_NUM_LOW, &mask.low_slices);
-	pr_devel("%s high_slice: %*pbl\n", label, (int)SLICE_NUM_HIGH, mask.high_slices);
+	pr_devel("%s low_slice: %*pbl\n", label,
+			(int)SLICE_NUM_LOW, &mask->low_slices);
+	pr_devel("%s high_slice: %*pbl\n", label,
+			(int)SLICE_NUM_HIGH, mask->high_slices);
 }
 
 #define slice_dbg(fmt...) do { if (_slice_debug) pr_devel(fmt); } while (0)
 
 #else
 
-static void slice_print_mask(const char *label, struct slice_mask mask) {}
+static void slice_print_mask(const char *label, const struct slice_mask *mask) {}
 #define slice_dbg(fmt...)
 
 #endif
@@ -147,7 +149,8 @@ static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret,
 			__set_bit(i, ret->high_slices);
 }
 
-static void slice_mask_for_size(struct mm_struct *mm, int psize, struct slice_mask *ret,
+static void slice_mask_for_size(struct mm_struct *mm, int psize,
+				struct slice_mask *ret,
 				unsigned long high_limit)
 {
 	unsigned char *hpsizes, *lpsizes;
@@ -179,7 +182,8 @@ static void slice_mask_for_size(struct mm_struct *mm, int psize, struct slice_ma
 }
 
 static int slice_check_fit(struct mm_struct *mm,
-			   struct slice_mask mask, struct slice_mask available)
+			   const struct slice_mask *mask,
+			   const struct slice_mask *available)
 {
 	DECLARE_BITMAP(result, SLICE_NUM_HIGH);
 	/*
@@ -189,14 +193,14 @@ static int slice_check_fit(struct mm_struct *mm,
 	unsigned long slice_count = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
 
 	if (!SLICE_NUM_HIGH)
-		return (mask.low_slices & available.low_slices) ==
-		       mask.low_slices;
+		return (mask->low_slices & available->low_slices) ==
+		       mask->low_slices;
 
-	bitmap_and(result, mask.high_slices,
-		   available.high_slices, slice_count);
+	bitmap_and(result, mask->high_slices,
+		   available->high_slices, slice_count);
 
-	return (mask.low_slices & available.low_slices) == mask.low_slices &&
-		bitmap_equal(result, mask.high_slices, slice_count);
+	return (mask->low_slices & available->low_slices) == mask->low_slices &&
+		bitmap_equal(result, mask->high_slices, slice_count);
 }
 
 static void slice_flush_segments(void *parm)
@@ -216,7 +220,8 @@ static void slice_flush_segments(void *parm)
 #endif
 }
 
-static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psize)
+static void slice_convert(struct mm_struct *mm,
+				const struct slice_mask *mask, int psize)
 {
 	int index, mask_index;
 	/* Write the new slice psize bits */
@@ -233,7 +238,7 @@ static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psiz
 
 	lpsizes = mm->context.low_slices_psize;
 	for (i = 0; i < SLICE_NUM_LOW; i++) {
-		if (!(mask.low_slices & (1u << i)))
+		if (!(mask->low_slices & (1u << i)))
 			continue;
 
 		mask_index = i & 0x1;
@@ -244,7 +249,7 @@ static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psiz
 
 	hpsizes = mm->context.high_slices_psize;
 	for (i = 0; i < GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit); i++) {
-		if (!test_bit(i, mask.high_slices))
+		if (!test_bit(i, mask->high_slices))
 			continue;
 
 		mask_index = i & 0x1;
@@ -270,26 +275,25 @@ static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psiz
  * 'available' slice_mark.
  */
 static bool slice_scan_available(unsigned long addr,
-				 struct slice_mask available,
-				 int end,
-				 unsigned long *boundary_addr)
+				 const struct slice_mask *available,
+				 int end, unsigned long *boundary_addr)
 {
 	unsigned long slice;
 	if (addr < SLICE_LOW_TOP) {
 		slice = GET_LOW_SLICE_INDEX(addr);
 		*boundary_addr = (slice + end) << SLICE_LOW_SHIFT;
-		return !!(available.low_slices & (1u << slice));
+		return !!(available->low_slices & (1u << slice));
 	} else {
 		slice = GET_HIGH_SLICE_INDEX(addr);
 		*boundary_addr = (slice + end) ?
 			((slice + end) << SLICE_HIGH_SHIFT) : SLICE_LOW_TOP;
-		return !!test_bit(slice, available.high_slices);
+		return !!test_bit(slice, available->high_slices);
 	}
 }
 
 static unsigned long slice_find_area_bottomup(struct mm_struct *mm,
 					      unsigned long len,
-					      struct slice_mask available,
+					      const struct slice_mask *available,
 					      int psize, unsigned long high_limit)
 {
 	int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
@@ -335,7 +339,7 @@ static unsigned long slice_find_area_bottomup(struct mm_struct *mm,
 
 static unsigned long slice_find_area_topdown(struct mm_struct *mm,
 					     unsigned long len,
-					     struct slice_mask available,
+					     const struct slice_mask *available,
 					     int psize, unsigned long high_limit)
 {
 	int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
@@ -393,7 +397,7 @@ static unsigned long slice_find_area_topdown(struct mm_struct *mm,
 
 
 static unsigned long slice_find_area(struct mm_struct *mm, unsigned long len,
-				     struct slice_mask mask, int psize,
+				     const struct slice_mask *mask, int psize,
 				     int topdown, unsigned long high_limit)
 {
 	if (topdown)
@@ -402,7 +406,8 @@ static unsigned long slice_find_area(struct mm_struct *mm, unsigned long len,
 		return slice_find_area_bottomup(mm, len, mask, psize, high_limit);
 }
 
-static inline void slice_or_mask(struct slice_mask *dst, struct slice_mask *src)
+static inline void slice_or_mask(struct slice_mask *dst,
+					const struct slice_mask *src)
 {
 	dst->low_slices |= src->low_slices;
 	if (!SLICE_NUM_HIGH)
@@ -411,7 +416,8 @@ static inline void slice_or_mask(struct slice_mask *dst, struct slice_mask *src)
 		  SLICE_NUM_HIGH);
 }
 
-static inline void slice_andnot_mask(struct slice_mask *dst, struct slice_mask *src)
+static inline void slice_andnot_mask(struct slice_mask *dst,
+					const struct slice_mask *src)
 {
 	dst->low_slices &= ~src->low_slices;
 
@@ -501,7 +507,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	 * already
 	 */
 	slice_mask_for_size(mm, psize, &good_mask, high_limit);
-	slice_print_mask(" good_mask", good_mask);
+	slice_print_mask(" good_mask", &good_mask);
 
 	/*
 	 * Here "good" means slices that are already the right page size,
@@ -535,12 +541,12 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	if (addr != 0 || fixed) {
 		/* Build a mask for the requested range */
 		slice_range_to_mask(addr, len, &mask);
-		slice_print_mask(" mask", mask);
+		slice_print_mask(" mask", &mask);
 
 		/* Check if we fit in the good mask. If we do, we just return,
 		 * nothing else to do
 		 */
-		if (slice_check_fit(mm, mask, good_mask)) {
+		if (slice_check_fit(mm, &mask, &good_mask)) {
 			slice_dbg(" fits good !\n");
 			return addr;
 		}
@@ -548,7 +554,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 		/* Now let's see if we can find something in the existing
 		 * slices for that size
 		 */
-		newaddr = slice_find_area(mm, len, good_mask,
+		newaddr = slice_find_area(mm, len, &good_mask,
 					  psize, topdown, high_limit);
 		if (newaddr != -ENOMEM) {
 			/* Found within the good mask, we don't have to setup,
@@ -564,9 +570,10 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	 */
 	slice_mask_for_free(mm, &potential_mask, high_limit);
 	slice_or_mask(&potential_mask, &good_mask);
-	slice_print_mask(" potential", potential_mask);
+	slice_print_mask(" potential", &potential_mask);
 
-	if ((addr != 0 || fixed) && slice_check_fit(mm, mask, potential_mask)) {
+	if ((addr != 0 || fixed) &&
+			slice_check_fit(mm, &mask, &potential_mask)) {
 		slice_dbg(" fits potential !\n");
 		goto convert;
 	}
@@ -581,7 +588,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	 * anywhere in the good area.
 	 */
 	if (addr) {
-		addr = slice_find_area(mm, len, good_mask,
+		addr = slice_find_area(mm, len, &good_mask,
 				       psize, topdown, high_limit);
 		if (addr != -ENOMEM) {
 			slice_dbg(" found area at 0x%lx\n", addr);
@@ -592,14 +599,14 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	/* Now let's see if we can find something in the existing slices
 	 * for that size plus free slices
 	 */
-	addr = slice_find_area(mm, len, potential_mask,
+	addr = slice_find_area(mm, len, &potential_mask,
 			       psize, topdown, high_limit);
 
 #ifdef CONFIG_PPC_64K_PAGES
 	if (addr == -ENOMEM && psize == MMU_PAGE_64K) {
 		/* retry the search with 4k-page slices included */
 		slice_or_mask(&potential_mask, &compat_mask);
-		addr = slice_find_area(mm, len, potential_mask,
+		addr = slice_find_area(mm, len, &potential_mask,
 				       psize, topdown, high_limit);
 	}
 #endif
@@ -609,7 +616,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 
 	slice_range_to_mask(addr, len, &mask);
 	slice_dbg(" found potential area at 0x%lx\n", addr);
-	slice_print_mask(" mask", mask);
+	slice_print_mask(" mask", &mask);
 
  convert:
 	slice_andnot_mask(&mask, &good_mask);
@@ -617,7 +624,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	if (mask.low_slices ||
 	    (SLICE_NUM_HIGH &&
 	     !bitmap_empty(mask.high_slices, SLICE_NUM_HIGH))) {
-		slice_convert(mm, mask, psize);
+		slice_convert(mm, &mask, psize);
 		if (psize > MMU_PAGE_BASE)
 			on_each_cpu(slice_flush_segments, mm, 1);
 	}
@@ -706,7 +713,7 @@ void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
 	VM_BUG_ON(radix_enabled());
 
 	slice_range_to_mask(start, len, &mask);
-	slice_convert(mm, mask, psize);
+	slice_convert(mm, &mask, psize);
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
@@ -753,9 +760,9 @@ int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
 #if 0 /* too verbose */
 	slice_dbg("is_hugepage_only_range(mm=%p, addr=%lx, len=%lx)\n",
 		 mm, addr, len);
-	slice_print_mask(" mask", mask);
-	slice_print_mask(" available", available);
+	slice_print_mask(" mask", &mask);
+	slice_print_mask(" available", &available);
 #endif
-	return !slice_check_fit(mm, mask, available);
+	return !slice_check_fit(mm, &mask, &available);
 }
 #endif
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 05/10] powerpc/mm/slice: implement a slice mask cache
  2018-03-06 13:24 [PATCH 00/10] powerpc/mm/slice: improve slice speed and stack use Nicholas Piggin
                   ` (3 preceding siblings ...)
  2018-03-06 13:25 ` [PATCH 04/10] powerpc/mm/slice: pass pointers to struct slice_mask where possible Nicholas Piggin
@ 2018-03-06 13:25 ` Nicholas Piggin
  2018-03-06 13:49   ` Christophe LEROY
  2018-03-06 13:25 ` [PATCH 06/10] powerpc/mm/slice: implement slice_check_range_fits Nicholas Piggin
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Nicholas Piggin @ 2018-03-06 13:25 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy,
	Benjamin Herrenschmidt, Anton Blanchard

Calculating the slice mask can become a significant overhead for
get_unmapped_area. This patch adds a struct slice_mask for
each page size in the mm_context, and keeps these in sync with
the slices psize arrays and slb_addr_limit.

On Book3S/64 this adds 288 bytes to the mm_context_t for the
slice mask caches.
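
That cost is simply the four Book3S masks added below (4K, 64K, 16M and
16G, assuming 64K pages and hugetlb are configured): with SLICE_NUM_HIGH
at 512, each struct slice_mask is an 8 byte low_slices word plus a
64 byte high_slices bitmap, i.e. 72 bytes, and 4 x 72 = 288 bytes.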

On POWER8, this increases vfork+exec+exit performance by 9.9%
and reduces time to mmap+munmap a 64kB page by 28%.

Reduces time to mmap+munmap by about 10% on 8xx.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Anton Blanchard <anton@samba.org>
---
 arch/powerpc/include/asm/book3s/64/mmu.h |  18 +++++
 arch/powerpc/include/asm/mmu-8xx.h       |  14 ++++
 arch/powerpc/mm/slice.c                  | 118 ++++++++++++++++++++-----------
 3 files changed, 107 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index bef6e39ed63a..777778579305 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -80,6 +80,16 @@ struct spinlock;
 /* Maximum possible number of NPUs in a system. */
 #define NV_MAX_NPUS 8
 
+/*
+ * One bit per slice. We have lower slices which cover 256MB segments
+ * upto 4G range. That gets us 16 low slices. For the rest we track slices
+ * in 1TB size.
+ */
+struct slice_mask {
+	u64 low_slices;
+	DECLARE_BITMAP(high_slices, SLICE_NUM_HIGH);
+};
+
 typedef struct {
 	mm_context_id_t id;
 	u16 user_psize;		/* page size index */
@@ -95,6 +105,14 @@ typedef struct {
 	unsigned char low_slices_psize[BITS_PER_LONG / BITS_PER_BYTE];
 	unsigned char high_slices_psize[SLICE_ARRAY_SIZE];
 	unsigned long slb_addr_limit;
+# ifdef CONFIG_PPC_64K_PAGES
+	struct slice_mask mask_64k;
+# endif
+	struct slice_mask mask_4k;
+# ifdef CONFIG_HUGETLB_PAGE
+	struct slice_mask mask_16m;
+	struct slice_mask mask_16g;
+# endif
 #else
 	u16 sllp;		/* SLB page size encoding */
 #endif
diff --git a/arch/powerpc/include/asm/mmu-8xx.h b/arch/powerpc/include/asm/mmu-8xx.h
index d3d7e79140c6..4c3b14703b3e 100644
--- a/arch/powerpc/include/asm/mmu-8xx.h
+++ b/arch/powerpc/include/asm/mmu-8xx.h
@@ -192,6 +192,11 @@
 #endif
 
 #ifndef __ASSEMBLY__
+struct slice_mask {
+	u64 low_slices;
+	DECLARE_BITMAP(high_slices, 0);
+};
+
 typedef struct {
 	unsigned int id;
 	unsigned int active;
@@ -201,6 +206,15 @@ typedef struct {
 	unsigned char low_slices_psize[SLICE_ARRAY_SIZE];
 	unsigned char high_slices_psize[0];
 	unsigned long slb_addr_limit;
+# ifdef CONFIG_PPC_16K_PAGES
+	struct slice_mask mask_16k;
+# else
+	struct slice_mask mask_4k;
+# endif
+# ifdef CONFIG_HUGETLB_PAGE
+	struct slice_mask mask_512k;
+	struct slice_mask mask_8m;
+# endif
 #endif
 } mm_context_t;
 
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 233c42d593dc..2115efe5e869 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -37,15 +37,6 @@
 #include <asm/hugetlb.h>
 
 static DEFINE_SPINLOCK(slice_convert_lock);
-/*
- * One bit per slice. We have lower slices which cover 256MB segments
- * upto 4G range. That gets us 16 low slices. For the rest we track slices
- * in 1TB size.
- */
-struct slice_mask {
-	u64 low_slices;
-	DECLARE_BITMAP(high_slices, SLICE_NUM_HIGH);
-};
 
 #ifdef DEBUG
 int _slice_debug = 1;
@@ -149,37 +140,44 @@ static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret,
 			__set_bit(i, ret->high_slices);
 }
 
-static void slice_mask_for_size(struct mm_struct *mm, int psize,
-				struct slice_mask *ret,
-				unsigned long high_limit)
+#ifdef CONFIG_PPC_BOOK3S_64
+static struct slice_mask *slice_mask_for_size(struct mm_struct *mm, int psize)
 {
-	unsigned char *hpsizes, *lpsizes;
-	int index, mask_index;
-	unsigned long i;
-
-	ret->low_slices = 0;
-	if (SLICE_NUM_HIGH)
-		bitmap_zero(ret->high_slices, SLICE_NUM_HIGH);
-
-	lpsizes = mm->context.low_slices_psize;
-	for (i = 0; i < SLICE_NUM_LOW; i++) {
-		mask_index = i & 0x1;
-		index = i >> 1;
-		if (((lpsizes[index] >> (mask_index * 4)) & 0xf) == psize)
-			ret->low_slices |= 1u << i;
-	}
-
-	if (high_limit <= SLICE_LOW_TOP)
-		return;
-
-	hpsizes = mm->context.high_slices_psize;
-	for (i = 0; i < GET_HIGH_SLICE_INDEX(high_limit); i++) {
-		mask_index = i & 0x1;
-		index = i >> 1;
-		if (((hpsizes[index] >> (mask_index * 4)) & 0xf) == psize)
-			__set_bit(i, ret->high_slices);
-	}
+#ifdef CONFIG_PPC_64K_PAGES
+	if (psize == MMU_PAGE_64K)
+		return &mm->context.mask_64k;
+#endif
+	if (psize == MMU_PAGE_4K)
+		return &mm->context.mask_4k;
+#ifdef CONFIG_HUGETLB_PAGE
+	if (psize == MMU_PAGE_16M)
+		return &mm->context.mask_16m;
+	if (psize == MMU_PAGE_16G)
+		return &mm->context.mask_16g;
+#endif
+	BUG();
 }
+#elif defined(CONFIG_PPC_8xx)
+static struct slice_mask *slice_mask_for_size(struct mm_struct *mm, int psize)
+{
+#ifdef CONFIG_PPC_16K_PAGES
+	if (psize == MMU_PAGE_16K)
+		return &mm->context.mask_16k;
+#else
+	if (psize == MMU_PAGE_4K)
+		return &mm->context.mask_4k;
+#endif
+#ifdef CONFIG_HUGETLB_PAGE
+	if (psize == MMU_PAGE_512K)
+		return &mm->context.mask_512k;
+	if (psize == MMU_PAGE_8M)
+		return &mm->context.mask_8m;
+#endif
+	BUG();
+}
+#else
+#error "Must define the slice masks for page sizes supported by the platform"
+#endif
 
 static int slice_check_fit(struct mm_struct *mm,
 			   const struct slice_mask *mask,
@@ -226,11 +224,15 @@ static void slice_convert(struct mm_struct *mm,
 	int index, mask_index;
 	/* Write the new slice psize bits */
 	unsigned char *hpsizes, *lpsizes;
+	struct slice_mask *psize_mask, *old_mask;
 	unsigned long i, flags;
+	int old_psize;
 
 	slice_dbg("slice_convert(mm=%p, psize=%d)\n", mm, psize);
 	slice_print_mask(" mask", mask);
 
+	psize_mask = slice_mask_for_size(mm, psize);
+
 	/* We need to use a spinlock here to protect against
 	 * concurrent 64k -> 4k demotion ...
 	 */
@@ -243,6 +245,14 @@ static void slice_convert(struct mm_struct *mm,
 
 		mask_index = i & 0x1;
 		index = i >> 1;
+
+		/* Update the slice_mask */
+		old_psize = (lpsizes[index] >> (mask_index * 4)) & 0xf;
+		old_mask = slice_mask_for_size(mm, old_psize);
+		old_mask->low_slices &= ~(1u << i);
+		psize_mask->low_slices |= 1u << i;
+
+		/* Update the sizes array */
 		lpsizes[index] = (lpsizes[index] & ~(0xf << (mask_index * 4))) |
 				(((unsigned long)psize) << (mask_index * 4));
 	}
@@ -254,6 +264,14 @@ static void slice_convert(struct mm_struct *mm,
 
 		mask_index = i & 0x1;
 		index = i >> 1;
+
+		/* Update the slice_mask */
+		old_psize = (hpsizes[index] >> (mask_index * 4)) & 0xf;
+		old_mask = slice_mask_for_size(mm, old_psize);
+		__clear_bit(i, old_mask->high_slices);
+		__set_bit(i, psize_mask->high_slices);
+
+		/* Update the sizes array */
 		hpsizes[index] = (hpsizes[index] & ~(0xf << (mask_index * 4))) |
 				(((unsigned long)psize) << (mask_index * 4));
 	}
@@ -464,7 +482,13 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	}
 
 	if (high_limit > mm->context.slb_addr_limit) {
+		/*
+		 * Increasing the slb_addr_limit does not require
+		 * slice mask cache to be recalculated because it should
+		 * be already initialised beyond the old address limit.
+		 */
 		mm->context.slb_addr_limit = high_limit;
+
 		on_each_cpu(slice_flush_segments, mm, 1);
 	}
 
@@ -506,7 +530,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	/* First make up a "good" mask of slices that have the right size
 	 * already
 	 */
-	slice_mask_for_size(mm, psize, &good_mask, high_limit);
+	good_mask = *slice_mask_for_size(mm, psize);
 	slice_print_mask(" good_mask", &good_mask);
 
 	/*
@@ -531,7 +555,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 #ifdef CONFIG_PPC_64K_PAGES
 	/* If we support combo pages, we can allow 64k pages in 4k slices */
 	if (psize == MMU_PAGE_64K) {
-		slice_mask_for_size(mm, MMU_PAGE_4K, &compat_mask, high_limit);
+		compat_mask = *slice_mask_for_size(mm, MMU_PAGE_4K);
 		if (fixed)
 			slice_or_mask(&good_mask, &compat_mask);
 	}
@@ -683,6 +707,7 @@ EXPORT_SYMBOL_GPL(get_slice_psize);
 void slice_init_new_context_exec(struct mm_struct *mm)
 {
 	unsigned char *hpsizes, *lpsizes;
+	struct slice_mask *mask;
 	unsigned int psize = mmu_virtual_psize;
 
 	slice_dbg("slice_init_new_context_exec(mm=%p)\n", mm);
@@ -703,6 +728,14 @@ void slice_init_new_context_exec(struct mm_struct *mm)
 
 	hpsizes = mm->context.high_slices_psize;
 	memset(hpsizes, (psize << 4) | psize, SLICE_NUM_HIGH >> 1);
+
+	/*
+	 * Slice mask cache starts zeroed, fill the default size cache.
+	 */
+	mask = slice_mask_for_size(mm, psize);
+	mask->low_slices = ~0UL;
+	if (SLICE_NUM_HIGH)
+		bitmap_fill(mask->high_slices, SLICE_NUM_HIGH);
 }
 
 void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
@@ -741,18 +774,17 @@ int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
 {
 	struct slice_mask mask, available;
 	unsigned int psize = mm->context.user_psize;
-	unsigned long high_limit = mm->context.slb_addr_limit;
 
 	if (radix_enabled())
 		return 0;
 
 	slice_range_to_mask(addr, len, &mask);
-	slice_mask_for_size(mm, psize, &available, high_limit);
+	available = *slice_mask_for_size(mm, psize);
 #ifdef CONFIG_PPC_64K_PAGES
 	/* We need to account for 4k slices too */
 	if (psize == MMU_PAGE_64K) {
 		struct slice_mask compat_mask;
-		slice_mask_for_size(mm, MMU_PAGE_4K, &compat_mask, high_limit);
+		compat_mask = *slice_mask_for_size(mm, MMU_PAGE_4K);
 		slice_or_mask(&available, &compat_mask);
 	}
 #endif
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 06/10] powerpc/mm/slice: implement slice_check_range_fits
  2018-03-06 13:24 [PATCH 00/10] powerpc/mm/slice: improve slice speed and stack use Nicholas Piggin
                   ` (4 preceding siblings ...)
  2018-03-06 13:25 ` [PATCH 05/10] powerpc/mm/slice: implement a slice mask cache Nicholas Piggin
@ 2018-03-06 13:25 ` Nicholas Piggin
  2018-03-06 14:41   ` Christophe LEROY
  2018-03-06 13:25 ` [PATCH 07/10] powerpc/mm/slice: Switch to 3-operand slice bitops helpers Nicholas Piggin
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Nicholas Piggin @ 2018-03-06 13:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy

Rather than build slice masks from a range then use that to check for
fit in a candidate mask, implement slice_check_range_fits that checks
if a range fits in a mask directly.

This allows several structures to be removed from the stack. We also
don't expect a huge range in most of these cases, so building and
comparing a full mask is more expensive than testing just the one or
two bits covered by the range.
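
As a worked example (illustrative addresses, with the usual 256MB low
slices): a request for addr 0x30000000 and len 0x30000000 ends at
0x5fffffff, i.e. low slices 3 to 5, so the check builds
low_slices = (1 << (5 + 1)) - (1 << 3) = 0x38 and tests those three
bits against the available mask with a single AND, rather than
constructing and comparing a whole slice_mask.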

On POWER8, this increases vfork+exec+exit performance by 0.3%
and reduces time to mmap+munmap a 64kB page by 5%.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/mm/slice.c | 63 ++++++++++++++++++++++++++-----------------------
 1 file changed, 34 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 2115efe5e869..3841fca75006 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -179,26 +179,35 @@ static struct slice_mask *slice_mask_for_size(struct mm_struct *mm, int psize)
 #error "Must define the slice masks for page sizes supported by the platform"
 #endif
 
-static int slice_check_fit(struct mm_struct *mm,
-			   const struct slice_mask *mask,
-			   const struct slice_mask *available)
+static bool slice_check_range_fits(struct mm_struct *mm,
+			   const struct slice_mask *available,
+			   unsigned long start, unsigned long len)
 {
-	DECLARE_BITMAP(result, SLICE_NUM_HIGH);
-	/*
-	 * Make sure we just do bit compare only to the max
-	 * addr limit and not the full bit map size.
-	 */
-	unsigned long slice_count = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
+	unsigned long end = start + len - 1;
+	u64 low_slices = 0;
 
-	if (!SLICE_NUM_HIGH)
-		return (mask->low_slices & available->low_slices) ==
-		       mask->low_slices;
+	if (start < SLICE_LOW_TOP) {
+		unsigned long mend = min(end, (SLICE_LOW_TOP - 1));
+
+		low_slices = (1u << (GET_LOW_SLICE_INDEX(mend) + 1))
+				- (1u << GET_LOW_SLICE_INDEX(start));
+	}
+	if ((low_slices & available->low_slices) != low_slices)
+		return false;
+
+	if (SLICE_NUM_HIGH && ((start + len) > SLICE_LOW_TOP)) {
+		unsigned long start_index = GET_HIGH_SLICE_INDEX(start);
+		unsigned long align_end = ALIGN(end, (1UL << SLICE_HIGH_SHIFT));
+		unsigned long count = GET_HIGH_SLICE_INDEX(align_end) - start_index;
+		unsigned long i;
 
-	bitmap_and(result, mask->high_slices,
-		   available->high_slices, slice_count);
+		for (i = start_index; i < start_index + count; i++) {
+			if (!test_bit(i, available->high_slices))
+				return false;
+		}
+	}
 
-	return (mask->low_slices & available->low_slices) == mask->low_slices &&
-		bitmap_equal(result, mask->high_slices, slice_count);
+	return true;
 }
 
 static void slice_flush_segments(void *parm)
@@ -562,15 +571,11 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 #endif
 
 	/* First check hint if it's valid or if we have MAP_FIXED */
-	if (addr != 0 || fixed) {
-		/* Build a mask for the requested range */
-		slice_range_to_mask(addr, len, &mask);
-		slice_print_mask(" mask", &mask);
-
+	if (addr || fixed) {
 		/* Check if we fit in the good mask. If we do, we just return,
 		 * nothing else to do
 		 */
-		if (slice_check_fit(mm, &mask, &good_mask)) {
+		if (slice_check_range_fits(mm, &good_mask, addr, len)) {
 			slice_dbg(" fits good !\n");
 			return addr;
 		}
@@ -596,10 +601,11 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	slice_or_mask(&potential_mask, &good_mask);
 	slice_print_mask(" potential", &potential_mask);
 
-	if ((addr != 0 || fixed) &&
-			slice_check_fit(mm, &mask, &potential_mask)) {
-		slice_dbg(" fits potential !\n");
-		goto convert;
+	if (addr || fixed) {
+		if (slice_check_range_fits(mm, &potential_mask, addr, len)) {
+			slice_dbg(" fits potential !\n");
+			goto convert;
+		}
 	}
 
 	/* If we have MAP_FIXED and failed the above steps, then error out */
@@ -772,13 +778,12 @@ void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
 int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
 			   unsigned long len)
 {
-	struct slice_mask mask, available;
+	struct slice_mask available;
 	unsigned int psize = mm->context.user_psize;
 
 	if (radix_enabled())
 		return 0;
 
-	slice_range_to_mask(addr, len, &mask);
 	available = *slice_mask_for_size(mm, psize);
 #ifdef CONFIG_PPC_64K_PAGES
 	/* We need to account for 4k slices too */
@@ -795,6 +800,6 @@ int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
 	slice_print_mask(" mask", &mask);
 	slice_print_mask(" available", &available);
 #endif
-	return !slice_check_fit(mm, &mask, &available);
+	return !slice_check_range_fits(mm, &available, addr, len);
 }
 #endif
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 07/10] powerpc/mm/slice: Switch to 3-operand slice bitops helpers
  2018-03-06 13:24 [PATCH 00/10] powerpc/mm/slice: improve slice speed and stack use Nicholas Piggin
                   ` (5 preceding siblings ...)
  2018-03-06 13:25 ` [PATCH 06/10] powerpc/mm/slice: implement slice_check_range_fits Nicholas Piggin
@ 2018-03-06 13:25 ` Nicholas Piggin
  2018-03-06 14:44   ` Christophe LEROY
  2018-03-06 13:25 ` [PATCH 08/10] powerpc/mm/slice: Use const pointers to cached slice masks where possible Nicholas Piggin
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Nicholas Piggin @ 2018-03-06 13:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy

This converts the slice_mask bit operation helpers to be the usual
3-operand kind, which is clearer to work with.
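
It also lets a caller combine two source masks straight into a
destination without first copying one of them into place. For example,
a later patch in the series wants (sketch of the call shape):

	/* good_mask = *maskp | *compat_maskp, no intermediate copy */
	slice_or_mask(&good_mask, maskp, compat_maskp);

which the 2-operand form could only express by copying *maskp into
good_mask first and then OR-ing in *compat_maskp.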

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/mm/slice.c | 38 +++++++++++++++++++++++---------------
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 3841fca75006..46daa1d1794f 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -433,25 +433,33 @@ static unsigned long slice_find_area(struct mm_struct *mm, unsigned long len,
 		return slice_find_area_bottomup(mm, len, mask, psize, high_limit);
 }
 
-static inline void slice_or_mask(struct slice_mask *dst,
+static inline void slice_copy_mask(struct slice_mask *dst,
 					const struct slice_mask *src)
 {
-	dst->low_slices |= src->low_slices;
+	dst->low_slices = src->low_slices;
 	if (!SLICE_NUM_HIGH)
 		return;
-	bitmap_or(dst->high_slices, dst->high_slices, src->high_slices,
-		  SLICE_NUM_HIGH);
+	bitmap_copy(dst->high_slices, src->high_slices, SLICE_NUM_HIGH);
 }
 
-static inline void slice_andnot_mask(struct slice_mask *dst,
-					const struct slice_mask *src)
+static inline void slice_or_mask(struct slice_mask *dst,
+					const struct slice_mask *src1,
+					const struct slice_mask *src2)
 {
-	dst->low_slices &= ~src->low_slices;
+	dst->low_slices = src1->low_slices | src2->low_slices;
+	if (!SLICE_NUM_HIGH)
+		return;
+	bitmap_or(dst->high_slices, src1->high_slices, src2->high_slices, SLICE_NUM_HIGH);
+}
 
+static inline void slice_andnot_mask(struct slice_mask *dst,
+					const struct slice_mask *src1,
+					const struct slice_mask *src2)
+{
+	dst->low_slices = src1->low_slices & ~src2->low_slices;
 	if (!SLICE_NUM_HIGH)
 		return;
-	bitmap_andnot(dst->high_slices, dst->high_slices, src->high_slices,
-		      SLICE_NUM_HIGH);
+	bitmap_andnot(dst->high_slices, src1->high_slices, src2->high_slices, SLICE_NUM_HIGH);
 }
 
 #ifdef CONFIG_PPC_64K_PAGES
@@ -566,7 +574,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	if (psize == MMU_PAGE_64K) {
 		compat_mask = *slice_mask_for_size(mm, MMU_PAGE_4K);
 		if (fixed)
-			slice_or_mask(&good_mask, &compat_mask);
+			slice_or_mask(&good_mask, &good_mask, &compat_mask);
 	}
 #endif
 
@@ -598,7 +606,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	 * empty and thus can be converted
 	 */
 	slice_mask_for_free(mm, &potential_mask, high_limit);
-	slice_or_mask(&potential_mask, &good_mask);
+	slice_or_mask(&potential_mask, &potential_mask, &good_mask);
 	slice_print_mask(" potential", &potential_mask);
 
 	if (addr || fixed) {
@@ -635,7 +643,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 #ifdef CONFIG_PPC_64K_PAGES
 	if (addr == -ENOMEM && psize == MMU_PAGE_64K) {
 		/* retry the search with 4k-page slices included */
-		slice_or_mask(&potential_mask, &compat_mask);
+		slice_or_mask(&potential_mask, &potential_mask, &compat_mask);
 		addr = slice_find_area(mm, len, &potential_mask,
 				       psize, topdown, high_limit);
 	}
@@ -649,8 +657,8 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	slice_print_mask(" mask", &mask);
 
  convert:
-	slice_andnot_mask(&mask, &good_mask);
-	slice_andnot_mask(&mask, &compat_mask);
+	slice_andnot_mask(&mask, &mask, &good_mask);
+	slice_andnot_mask(&mask, &mask, &compat_mask);
 	if (mask.low_slices ||
 	    (SLICE_NUM_HIGH &&
 	     !bitmap_empty(mask.high_slices, SLICE_NUM_HIGH))) {
@@ -790,7 +798,7 @@ int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
 	if (psize == MMU_PAGE_64K) {
 		struct slice_mask compat_mask;
 		compat_mask = *slice_mask_for_size(mm, MMU_PAGE_4K);
-		slice_or_mask(&available, &compat_mask);
+		slice_or_mask(&available, &available, &compat_mask);
 	}
 #endif
 
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 08/10] powerpc/mm/slice: Use const pointers to cached slice masks where possible
  2018-03-06 13:24 [PATCH 00/10] powerpc/mm/slice: improve slice speed and stack use Nicholas Piggin
                   ` (6 preceding siblings ...)
  2018-03-06 13:25 ` [PATCH 07/10] powerpc/mm/slice: Switch to 3-operand slice bitops helpers Nicholas Piggin
@ 2018-03-06 13:25 ` Nicholas Piggin
  2018-03-06 14:55   ` Christophe LEROY
  2018-03-06 13:25 ` [PATCH 09/10] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations Nicholas Piggin
  2018-03-06 13:25 ` [PATCH 10/10] powerpc/mm/slice: remove radix calls to the slice code Nicholas Piggin
  9 siblings, 1 reply; 31+ messages in thread
From: Nicholas Piggin @ 2018-03-06 13:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy

The slice_mask cache was a basic conversion which copied the slice
mask into caller's structures, because that's how the original code
worked. In most cases the pointer can be used directly instead, saving
a copy and an on-stack structure.
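
Concretely, read-only users that previously did

	good_mask = *slice_mask_for_size(mm, psize);	/* struct copy */

now keep only a pointer:

	maskp = slice_mask_for_size(mm, psize);

so the on-stack slice_mask, and the copy of its high_slices bitmap,
disappear wherever the mask is not modified.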

On POWER8, this increases vfork+exec+exit performance by 0.3%
and reduces time to mmap+munmap a 64kB page by 2%.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/mm/slice.c | 77 +++++++++++++++++++++----------------------------
 1 file changed, 33 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 46daa1d1794f..086c31b8b982 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -472,10 +472,10 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 				      unsigned long flags, unsigned int psize,
 				      int topdown)
 {
-	struct slice_mask mask;
 	struct slice_mask good_mask;
 	struct slice_mask potential_mask;
-	struct slice_mask compat_mask;
+	const struct slice_mask *maskp;
+	const struct slice_mask *compat_maskp = NULL;
 	int fixed = (flags & MAP_FIXED);
 	int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
 	unsigned long page_size = 1UL << pshift;
@@ -509,22 +509,6 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 		on_each_cpu(slice_flush_segments, mm, 1);
 	}
 
-	/*
-	 * init different masks
-	 */
-	mask.low_slices = 0;
-
-	/* silence stupid warning */;
-	potential_mask.low_slices = 0;
-
-	compat_mask.low_slices = 0;
-
-	if (SLICE_NUM_HIGH) {
-		bitmap_zero(mask.high_slices, SLICE_NUM_HIGH);
-		bitmap_zero(potential_mask.high_slices, SLICE_NUM_HIGH);
-		bitmap_zero(compat_mask.high_slices, SLICE_NUM_HIGH);
-	}
-
 	/* Sanity checks */
 	BUG_ON(mm->task_size == 0);
 	BUG_ON(mm->context.slb_addr_limit == 0);
@@ -547,8 +531,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	/* First make up a "good" mask of slices that have the right size
 	 * already
 	 */
-	good_mask = *slice_mask_for_size(mm, psize);
-	slice_print_mask(" good_mask", &good_mask);
+	maskp = slice_mask_for_size(mm, psize);
 
 	/*
 	 * Here "good" means slices that are already the right page size,
@@ -572,11 +555,19 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 #ifdef CONFIG_PPC_64K_PAGES
 	/* If we support combo pages, we can allow 64k pages in 4k slices */
 	if (psize == MMU_PAGE_64K) {
-		compat_mask = *slice_mask_for_size(mm, MMU_PAGE_4K);
+		compat_maskp = slice_mask_for_size(mm, MMU_PAGE_4K);
 		if (fixed)
-			slice_or_mask(&good_mask, &good_mask, &compat_mask);
-	}
+			slice_or_mask(&good_mask, maskp, compat_maskp);
+		else
+			slice_copy_mask(&good_mask, maskp);
+	} else
 #endif
+	{
+		slice_copy_mask(&good_mask, maskp);
+	}
+	slice_print_mask(" good_mask", &good_mask);
+	if (compat_maskp)
+		slice_print_mask(" compat_mask", compat_maskp);
 
 	/* First check hint if it's valid or if we have MAP_FIXED */
 	if (addr || fixed) {
@@ -643,7 +634,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 #ifdef CONFIG_PPC_64K_PAGES
 	if (addr == -ENOMEM && psize == MMU_PAGE_64K) {
 		/* retry the search with 4k-page slices included */
-		slice_or_mask(&potential_mask, &potential_mask, &compat_mask);
+		slice_or_mask(&potential_mask, &potential_mask, compat_maskp);
 		addr = slice_find_area(mm, len, &potential_mask,
 				       psize, topdown, high_limit);
 	}
@@ -652,17 +643,18 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	if (addr == -ENOMEM)
 		return -ENOMEM;
 
-	slice_range_to_mask(addr, len, &mask);
+	slice_range_to_mask(addr, len, &potential_mask);
 	slice_dbg(" found potential area at 0x%lx\n", addr);
-	slice_print_mask(" mask", &mask);
+	slice_print_mask(" mask", &potential_mask);
 
  convert:
-	slice_andnot_mask(&mask, &mask, &good_mask);
-	slice_andnot_mask(&mask, &mask, &compat_mask);
-	if (mask.low_slices ||
-	    (SLICE_NUM_HIGH &&
-	     !bitmap_empty(mask.high_slices, SLICE_NUM_HIGH))) {
-		slice_convert(mm, &mask, psize);
+	slice_andnot_mask(&potential_mask, &potential_mask, &good_mask);
+	if (compat_maskp && !fixed)
+		slice_andnot_mask(&potential_mask, &potential_mask, compat_maskp);
+	if (potential_mask.low_slices ||
+		(SLICE_NUM_HIGH &&
+		 !bitmap_empty(potential_mask.high_slices, SLICE_NUM_HIGH))) {
+		slice_convert(mm, &potential_mask, psize);
 		if (psize > MMU_PAGE_BASE)
 			on_each_cpu(slice_flush_segments, mm, 1);
 	}
@@ -786,28 +778,25 @@ void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
 int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
 			   unsigned long len)
 {
-	struct slice_mask available;
+	const struct slice_mask *maskp;
 	unsigned int psize = mm->context.user_psize;
 
 	if (radix_enabled())
 		return 0;
 
-	available = *slice_mask_for_size(mm, psize);
+	maskp = slice_mask_for_size(mm, psize);
 #ifdef CONFIG_PPC_64K_PAGES
 	/* We need to account for 4k slices too */
 	if (psize == MMU_PAGE_64K) {
-		struct slice_mask compat_mask;
-		compat_mask = *slice_mask_for_size(mm, MMU_PAGE_4K);
-		slice_or_mask(&available, &available, &compat_mask);
+		const struct slice_mask *compat_maskp;
+		struct slice_mask available;
+
+		compat_maskp = slice_mask_for_size(mm, MMU_PAGE_4K);
+		slice_or_mask(&available, maskp, compat_maskp);
+		return !slice_check_range_fits(mm, &available, addr, len);
 	}
 #endif
 
-#if 0 /* too verbose */
-	slice_dbg("is_hugepage_only_range(mm=%p, addr=%lx, len=%lx)\n",
-		 mm, addr, len);
-	slice_print_mask(" mask", &mask);
-	slice_print_mask(" available", &available);
-#endif
-	return !slice_check_range_fits(mm, &available, addr, len);
+	return !slice_check_range_fits(mm, maskp, addr, len);
 }
 #endif
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 09/10] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations
  2018-03-06 13:24 [PATCH 00/10] powerpc/mm/slice: improve slice speed and stack use Nicholas Piggin
                   ` (7 preceding siblings ...)
  2018-03-06 13:25 ` [PATCH 08/10] powerpc/mm/slice: Use const pointers to cached slice masks where possible Nicholas Piggin
@ 2018-03-06 13:25 ` Nicholas Piggin
  2018-03-06 15:02   ` Christophe LEROY
  2018-03-06 13:25 ` [PATCH 10/10] powerpc/mm/slice: remove radix calls to the slice code Nicholas Piggin
  9 siblings, 1 reply; 31+ messages in thread
From: Nicholas Piggin @ 2018-03-06 13:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy

The number of high slices a process might use now depends on its
address space size and on what allocation address it has requested.

This patch uses that limit throughout the call chains where possible,
rather than using the fixed SLICE_NUM_HIGH for bitmap operations.
This saves some cost for processes that don't use very large address
spaces.
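
As a rough sketch of the idea (not the actual hunks, which follow), the
limit is computed once from the address limit and threaded through the
bitmap helpers in place of the fixed SLICE_NUM_HIGH:

	unsigned long high_slices = GET_HIGH_SLICE_INDEX(high_limit);
	...
	bitmap_or(dst->high_slices, src1->high_slices, src2->high_slices,
			high_slices);	/* was SLICE_NUM_HIGH */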

Performance numbers aren't changed significantly; this may change
with larger address spaces or different mmap access patterns that
require more slice mask building.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/mm/slice.c | 75 +++++++++++++++++++++++++++++--------------------
 1 file changed, 45 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 086c31b8b982..507d17e2cfcd 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -61,14 +61,12 @@ static void slice_print_mask(const char *label, const struct slice_mask *mask) {
 #endif
 
 static void slice_range_to_mask(unsigned long start, unsigned long len,
-				struct slice_mask *ret)
+				struct slice_mask *ret,
+				unsigned long high_slices)
 {
 	unsigned long end = start + len - 1;
 
 	ret->low_slices = 0;
-	if (SLICE_NUM_HIGH)
-		bitmap_zero(ret->high_slices, SLICE_NUM_HIGH);
-
 	if (start < SLICE_LOW_TOP) {
 		unsigned long mend = min(end,
 					 (unsigned long)(SLICE_LOW_TOP - 1));
@@ -77,6 +75,10 @@ static void slice_range_to_mask(unsigned long start, unsigned long len,
 			- (1u << GET_LOW_SLICE_INDEX(start));
 	}
 
+	if (!SLICE_NUM_HIGH)
+		return;
+
+	bitmap_zero(ret->high_slices, high_slices);
 	if ((start + len) > SLICE_LOW_TOP) {
 		unsigned long start_index = GET_HIGH_SLICE_INDEX(start);
 		unsigned long align_end = ALIGN(end, (1UL << SLICE_HIGH_SHIFT));
@@ -120,22 +122,20 @@ static int slice_high_has_vma(struct mm_struct *mm, unsigned long slice)
 }
 
 static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret,
-				unsigned long high_limit)
+				unsigned long high_slices)
 {
 	unsigned long i;
 
 	ret->low_slices = 0;
-	if (SLICE_NUM_HIGH)
-		bitmap_zero(ret->high_slices, SLICE_NUM_HIGH);
-
 	for (i = 0; i < SLICE_NUM_LOW; i++)
 		if (!slice_low_has_vma(mm, i))
 			ret->low_slices |= 1u << i;
 
-	if (high_limit <= SLICE_LOW_TOP)
+	if (!SLICE_NUM_HIGH || !high_slices)
 		return;
 
-	for (i = 0; i < GET_HIGH_SLICE_INDEX(high_limit); i++)
+	bitmap_zero(ret->high_slices, high_slices);
+	for (i = 0; i < high_slices; i++)
 		if (!slice_high_has_vma(mm, i))
 			__set_bit(i, ret->high_slices);
 }
@@ -232,6 +232,7 @@ static void slice_convert(struct mm_struct *mm,
 {
 	int index, mask_index;
 	/* Write the new slice psize bits */
+	unsigned long high_slices;
 	unsigned char *hpsizes, *lpsizes;
 	struct slice_mask *psize_mask, *old_mask;
 	unsigned long i, flags;
@@ -267,7 +268,8 @@ static void slice_convert(struct mm_struct *mm,
 	}
 
 	hpsizes = mm->context.high_slices_psize;
-	for (i = 0; i < GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit); i++) {
+	high_slices = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
+	for (i = 0; SLICE_NUM_HIGH && i < high_slices; i++) {
 		if (!test_bit(i, mask->high_slices))
 			continue;
 
@@ -434,32 +436,37 @@ static unsigned long slice_find_area(struct mm_struct *mm, unsigned long len,
 }
 
 static inline void slice_copy_mask(struct slice_mask *dst,
-					const struct slice_mask *src)
+					const struct slice_mask *src,
+					unsigned long high_slices)
 {
 	dst->low_slices = src->low_slices;
 	if (!SLICE_NUM_HIGH)
 		return;
-	bitmap_copy(dst->high_slices, src->high_slices, SLICE_NUM_HIGH);
+	bitmap_copy(dst->high_slices, src->high_slices, high_slices);
 }
 
 static inline void slice_or_mask(struct slice_mask *dst,
 					const struct slice_mask *src1,
-					const struct slice_mask *src2)
+					const struct slice_mask *src2,
+					unsigned long high_slices)
 {
 	dst->low_slices = src1->low_slices | src2->low_slices;
 	if (!SLICE_NUM_HIGH)
 		return;
-	bitmap_or(dst->high_slices, src1->high_slices, src2->high_slices, SLICE_NUM_HIGH);
+	bitmap_or(dst->high_slices, src1->high_slices, src2->high_slices,
+			high_slices);
 }
 
 static inline void slice_andnot_mask(struct slice_mask *dst,
 					const struct slice_mask *src1,
-					const struct slice_mask *src2)
+					const struct slice_mask *src2,
+					unsigned long high_slices)
 {
 	dst->low_slices = src1->low_slices & ~src2->low_slices;
 	if (!SLICE_NUM_HIGH)
 		return;
-	bitmap_andnot(dst->high_slices, src1->high_slices, src2->high_slices, SLICE_NUM_HIGH);
+	bitmap_andnot(dst->high_slices, src1->high_slices, src2->high_slices,
+			high_slices);
 }
 
 #ifdef CONFIG_PPC_64K_PAGES
@@ -482,6 +489,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	struct mm_struct *mm = current->mm;
 	unsigned long newaddr;
 	unsigned long high_limit;
+	unsigned long high_slices;
 
 	high_limit = DEFAULT_MAP_WINDOW;
 	if (addr >= high_limit || (fixed && (addr + len > high_limit)))
@@ -498,6 +506,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 			return -ENOMEM;
 	}
 
+	high_slices = GET_HIGH_SLICE_INDEX(high_limit);
 	if (high_limit > mm->context.slb_addr_limit) {
 		/*
 		 * Increasing the slb_addr_limit does not require
@@ -557,13 +566,13 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	if (psize == MMU_PAGE_64K) {
 		compat_maskp = slice_mask_for_size(mm, MMU_PAGE_4K);
 		if (fixed)
-			slice_or_mask(&good_mask, maskp, compat_maskp);
+			slice_or_mask(&good_mask, maskp, compat_maskp, high_slices);
 		else
-			slice_copy_mask(&good_mask, maskp);
+			slice_copy_mask(&good_mask, maskp, high_slices);
 	} else
 #endif
 	{
-		slice_copy_mask(&good_mask, maskp);
+		slice_copy_mask(&good_mask, maskp, high_slices);
 	}
 	slice_print_mask(" good_mask", &good_mask);
 	if (compat_maskp)
@@ -596,8 +605,8 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	 * We don't fit in the good mask, check what other slices are
 	 * empty and thus can be converted
 	 */
-	slice_mask_for_free(mm, &potential_mask, high_limit);
-	slice_or_mask(&potential_mask, &potential_mask, &good_mask);
+	slice_mask_for_free(mm, &potential_mask, high_slices);
+	slice_or_mask(&potential_mask, &potential_mask, &good_mask, high_slices);
 	slice_print_mask(" potential", &potential_mask);
 
 	if (addr || fixed) {
@@ -634,7 +643,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 #ifdef CONFIG_PPC_64K_PAGES
 	if (addr == -ENOMEM && psize == MMU_PAGE_64K) {
 		/* retry the search with 4k-page slices included */
-		slice_or_mask(&potential_mask, &potential_mask, compat_maskp);
+		slice_or_mask(&potential_mask, &potential_mask, compat_maskp, high_slices);
 		addr = slice_find_area(mm, len, &potential_mask,
 				       psize, topdown, high_limit);
 	}
@@ -643,17 +652,17 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	if (addr == -ENOMEM)
 		return -ENOMEM;
 
-	slice_range_to_mask(addr, len, &potential_mask);
+	slice_range_to_mask(addr, len, &potential_mask, high_slices);
 	slice_dbg(" found potential area at 0x%lx\n", addr);
 	slice_print_mask(" mask", &potential_mask);
 
  convert:
-	slice_andnot_mask(&potential_mask, &potential_mask, &good_mask);
+	slice_andnot_mask(&potential_mask, &potential_mask, &good_mask, high_slices);
 	if (compat_maskp && !fixed)
-		slice_andnot_mask(&potential_mask, &potential_mask, compat_maskp);
+		slice_andnot_mask(&potential_mask, &potential_mask, compat_maskp, high_slices);
 	if (potential_mask.low_slices ||
 		(SLICE_NUM_HIGH &&
-		 !bitmap_empty(potential_mask.high_slices, SLICE_NUM_HIGH))) {
+		 !bitmap_empty(potential_mask.high_slices, high_slices))) {
 		slice_convert(mm, &potential_mask, psize);
 		if (psize > MMU_PAGE_BASE)
 			on_each_cpu(slice_flush_segments, mm, 1);
@@ -727,7 +736,9 @@ void slice_init_new_context_exec(struct mm_struct *mm)
 	mm->context.user_psize = psize;
 
 	/*
-	 * Set all slice psizes to the default.
+	 * Set all slice psizes to the default. High slices could
+	 * be initialised up to slb_addr_limit if we ensure to
+	 * initialise the rest of them as slb_addr_limit is expanded.
 	 */
 	lpsizes = mm->context.low_slices_psize;
 	memset(lpsizes, (psize << 4) | psize, SLICE_NUM_LOW >> 1);
@@ -748,10 +759,12 @@ void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
 			   unsigned long len, unsigned int psize)
 {
 	struct slice_mask mask;
+	unsigned long high_slices;
 
 	VM_BUG_ON(radix_enabled());
 
-	slice_range_to_mask(start, len, &mask);
+	high_slices = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
+	slice_range_to_mask(start, len, &mask, high_slices);
 	slice_convert(mm, &mask, psize);
 }
 
@@ -790,9 +803,11 @@ int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
 	if (psize == MMU_PAGE_64K) {
 		const struct slice_mask *compat_maskp;
 		struct slice_mask available;
+		unsigned long high_slices;
 
 		compat_maskp = slice_mask_for_size(mm, MMU_PAGE_4K);
-		slice_or_mask(&available, maskp, compat_maskp);
+		high_slices = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
+		slice_or_mask(&available, maskp, compat_maskp, high_slices);
 		return !slice_check_range_fits(mm, &available, addr, len);
 	}
 #endif
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 10/10] powerpc/mm/slice: remove radix calls to the slice code
  2018-03-06 13:24 [PATCH 00/10] powerpc/mm/slice: improve slice speed and stack use Nicholas Piggin
                   ` (8 preceding siblings ...)
  2018-03-06 13:25 ` [PATCH 09/10] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations Nicholas Piggin
@ 2018-03-06 13:25 ` Nicholas Piggin
  2018-03-06 15:12   ` Christophe LEROY
  9 siblings, 1 reply; 31+ messages in thread
From: Nicholas Piggin @ 2018-03-06 13:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy

This is a tidy-up which removes calls into the slice code when the
radix MMU is in use.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/hugetlb.h |  9 ++++++---
 arch/powerpc/mm/hugetlbpage.c      |  5 +++--
 arch/powerpc/mm/slice.c            | 17 ++++-------------
 3 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index 1a4847f67ea8..59885d444695 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -90,16 +90,19 @@ pte_t *huge_pte_offset_and_shift(struct mm_struct *mm,
 void flush_dcache_icache_hugepage(struct page *page);
 
 #if defined(CONFIG_PPC_MM_SLICES)
-int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
+int slice_is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
 			   unsigned long len);
-#else
+#endif
 static inline int is_hugepage_only_range(struct mm_struct *mm,
 					 unsigned long addr,
 					 unsigned long len)
 {
+#if defined(CONFIG_PPC_MM_SLICES)
+	if (!radix_enabled())
+		return slice_is_hugepage_only_range(mm, addr, len);
+#endif
 	return 0;
 }
-#endif
 
 void book3e_hugetlb_preload(struct vm_area_struct *vma, unsigned long ea,
 			    pte_t pte);
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 590be3fa0ce2..b29d40889d1c 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -565,10 +565,11 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 unsigned long vma_mmu_pagesize(struct vm_area_struct *vma)
 {
 #ifdef CONFIG_PPC_MM_SLICES
-	unsigned int psize = get_slice_psize(vma->vm_mm, vma->vm_start);
 	/* With radix we don't use slice, so derive it from vma*/
-	if (!radix_enabled())
+	if (!radix_enabled()) {
+		unsigned int psize = get_slice_psize(vma->vm_mm, vma->vm_start);
 		return 1UL << mmu_psize_to_shift(psize);
+	}
 #endif
 	if (!is_vm_hugetlb_page(vma))
 		return PAGE_SIZE;
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 507d17e2cfcd..15a857772617 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -697,16 +697,8 @@ unsigned int get_slice_psize(struct mm_struct *mm, unsigned long addr)
 	unsigned char *psizes;
 	int index, mask_index;
 
-	/*
-	 * Radix doesn't use slice, but can get enabled along with MMU_SLICE
-	 */
-	if (radix_enabled()) {
-#ifdef CONFIG_PPC_64K_PAGES
-		return MMU_PAGE_64K;
-#else
-		return MMU_PAGE_4K;
-#endif
-	}
+	VM_BUG_ON(radix_enabled());
+
 	if (addr < SLICE_LOW_TOP) {
 		psizes = mm->context.low_slices_psize;
 		index = GET_LOW_SLICE_INDEX(addr);
@@ -788,14 +780,13 @@ void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
  * for now as we only use slices with hugetlbfs enabled. This should
  * be fixed as the generic code gets fixed.
  */
-int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
+int slice_is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
 			   unsigned long len)
 {
 	const struct slice_mask *maskp;
 	unsigned int psize = mm->context.user_psize;
 
-	if (radix_enabled())
-		return 0;
+	VM_BUG_ON(radix_enabled());
 
 	maskp = slice_mask_for_size(mm, psize);
 #ifdef CONFIG_PPC_64K_PAGES
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 04/10] powerpc/mm/slice: pass pointers to struct slice_mask where possible
  2018-03-06 13:25 ` [PATCH 04/10] powerpc/mm/slice: pass pointers to struct slice_mask where possible Nicholas Piggin
@ 2018-03-06 13:43   ` Christophe LEROY
  2018-03-06 13:59     ` Nicholas Piggin
  0 siblings, 1 reply; 31+ messages in thread
From: Christophe LEROY @ 2018-03-06 13:43 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Aneesh Kumar K . V



On 06/03/2018 at 14:25, Nicholas Piggin wrote:
> Pass around const pointers to struct slice_mask where possible, rather
> than copies of slice_mask, to reduce stack and call overhead.
> 
> checkstack.pl gives, before:
> 0x00000d1c slice_get_unmapped_area [slice.o]:		592
> 0x00001864 is_hugepage_only_range [slice.o]:		448
> 0x00000754 slice_find_area_topdown [slice.o]:		400
> 0x00000484 slice_find_area_bottomup.isra.1 [slice.o]:	272
> 0x000017b4 slice_set_range_psize [slice.o]:		224
> 0x00000a4c slice_find_area [slice.o]:			128
> 0x00000160 slice_check_fit [slice.o]:			112
> 
> after:
> 0x00000ad0 slice_get_unmapped_area [slice.o]:		448
> 0x00001464 is_hugepage_only_range [slice.o]:		288
> 0x000006c0 slice_find_area [slice.o]:			144
> 0x0000016c slice_check_fit [slice.o]:			128
> 0x00000528 slice_find_area_bottomup.isra.2 [slice.o]:	128
> 0x000013e4 slice_set_range_psize [slice.o]:		128
> 
> This increases vfork+exec+exit performance by 1.5%.
> 
> Reduces time to mmap+munmap a 64kB page by 17%.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/mm/slice.c | 87 ++++++++++++++++++++++++++-----------------------
>   1 file changed, 47 insertions(+), 40 deletions(-)
> 
> diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
> index 9625ceb35685..233c42d593dc 100644
> --- a/arch/powerpc/mm/slice.c
> +++ b/arch/powerpc/mm/slice.c
> @@ -50,19 +50,21 @@ struct slice_mask {
>   #ifdef DEBUG
>   int _slice_debug = 1;
>   
> -static void slice_print_mask(const char *label, struct slice_mask mask)
> +static void slice_print_mask(const char *label, const struct slice_mask *mask)
>   {
>   	if (!_slice_debug)
>   		return;
> -	pr_devel("%s low_slice: %*pbl\n", label, (int)SLICE_NUM_LOW, &mask.low_slices);
> -	pr_devel("%s high_slice: %*pbl\n", label, (int)SLICE_NUM_HIGH, mask.high_slices);
> +	pr_devel("%s low_slice: %*pbl\n", label,
> +			(int)SLICE_NUM_LOW, &mask->low_slices);
> +	pr_devel("%s high_slice: %*pbl\n", label,
> +			(int)SLICE_NUM_HIGH, mask->high_slices);
>   }
>   
>   #define slice_dbg(fmt...) do { if (_slice_debug) pr_devel(fmt); } while (0)
>   
>   #else
>   
> -static void slice_print_mask(const char *label, struct slice_mask mask) {}
> +static void slice_print_mask(const char *label, const struct slice_mask *mask) {}
>   #define slice_dbg(fmt...)
>   
>   #endif
> @@ -147,7 +149,8 @@ static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret,
>   			__set_bit(i, ret->high_slices);
>   }
>   
> -static void slice_mask_for_size(struct mm_struct *mm, int psize, struct slice_mask *ret,
> +static void slice_mask_for_size(struct mm_struct *mm, int psize,
> +				struct slice_mask *ret,

That seems to be cleanup only; is it worth including it in this patch?

Christophe

>   				unsigned long high_limit)
>   {
>   	unsigned char *hpsizes, *lpsizes;
> @@ -179,7 +182,8 @@ static void slice_mask_for_size(struct mm_struct *mm, int psize, struct slice_ma
>   }
>   
>   static int slice_check_fit(struct mm_struct *mm,
> -			   struct slice_mask mask, struct slice_mask available)
> +			   const struct slice_mask *mask,
> +			   const struct slice_mask *available)
>   {
>   	DECLARE_BITMAP(result, SLICE_NUM_HIGH);
>   	/*
> @@ -189,14 +193,14 @@ static int slice_check_fit(struct mm_struct *mm,
>   	unsigned long slice_count = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
>   
>   	if (!SLICE_NUM_HIGH)
> -		return (mask.low_slices & available.low_slices) ==
> -		       mask.low_slices;
> +		return (mask->low_slices & available->low_slices) ==
> +		       mask->low_slices;
>   
> -	bitmap_and(result, mask.high_slices,
> -		   available.high_slices, slice_count);
> +	bitmap_and(result, mask->high_slices,
> +		   available->high_slices, slice_count);
>   
> -	return (mask.low_slices & available.low_slices) == mask.low_slices &&
> -		bitmap_equal(result, mask.high_slices, slice_count);
> +	return (mask->low_slices & available->low_slices) == mask->low_slices &&
> +		bitmap_equal(result, mask->high_slices, slice_count);
>   }
>   
>   static void slice_flush_segments(void *parm)
> @@ -216,7 +220,8 @@ static void slice_flush_segments(void *parm)
>   #endif
>   }
>   
> -static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psize)
> +static void slice_convert(struct mm_struct *mm,
> +				const struct slice_mask *mask, int psize)
>   {
>   	int index, mask_index;
>   	/* Write the new slice psize bits */
> @@ -233,7 +238,7 @@ static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psiz
>   
>   	lpsizes = mm->context.low_slices_psize;
>   	for (i = 0; i < SLICE_NUM_LOW; i++) {
> -		if (!(mask.low_slices & (1u << i)))
> +		if (!(mask->low_slices & (1u << i)))
>   			continue;
>   
>   		mask_index = i & 0x1;
> @@ -244,7 +249,7 @@ static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psiz
>   
>   	hpsizes = mm->context.high_slices_psize;
>   	for (i = 0; i < GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit); i++) {
> -		if (!test_bit(i, mask.high_slices))
> +		if (!test_bit(i, mask->high_slices))
>   			continue;
>   
>   		mask_index = i & 0x1;
> @@ -270,26 +275,25 @@ static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psiz
>    * 'available' slice_mark.
>    */
>   static bool slice_scan_available(unsigned long addr,
> -				 struct slice_mask available,
> -				 int end,
> -				 unsigned long *boundary_addr)
> +				 const struct slice_mask *available,
> +				 int end, unsigned long *boundary_addr)
>   {
>   	unsigned long slice;
>   	if (addr < SLICE_LOW_TOP) {
>   		slice = GET_LOW_SLICE_INDEX(addr);
>   		*boundary_addr = (slice + end) << SLICE_LOW_SHIFT;
> -		return !!(available.low_slices & (1u << slice));
> +		return !!(available->low_slices & (1u << slice));
>   	} else {
>   		slice = GET_HIGH_SLICE_INDEX(addr);
>   		*boundary_addr = (slice + end) ?
>   			((slice + end) << SLICE_HIGH_SHIFT) : SLICE_LOW_TOP;
> -		return !!test_bit(slice, available.high_slices);
> +		return !!test_bit(slice, available->high_slices);
>   	}
>   }
>   
>   static unsigned long slice_find_area_bottomup(struct mm_struct *mm,
>   					      unsigned long len,
> -					      struct slice_mask available,
> +					      const struct slice_mask *available,
>   					      int psize, unsigned long high_limit)
>   {
>   	int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
> @@ -335,7 +339,7 @@ static unsigned long slice_find_area_bottomup(struct mm_struct *mm,
>   
>   static unsigned long slice_find_area_topdown(struct mm_struct *mm,
>   					     unsigned long len,
> -					     struct slice_mask available,
> +					     const struct slice_mask *available,
>   					     int psize, unsigned long high_limit)
>   {
>   	int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
> @@ -393,7 +397,7 @@ static unsigned long slice_find_area_topdown(struct mm_struct *mm,
>   
>   
>   static unsigned long slice_find_area(struct mm_struct *mm, unsigned long len,
> -				     struct slice_mask mask, int psize,
> +				     const struct slice_mask *mask, int psize,
>   				     int topdown, unsigned long high_limit)
>   {
>   	if (topdown)
> @@ -402,7 +406,8 @@ static unsigned long slice_find_area(struct mm_struct *mm, unsigned long len,
>   		return slice_find_area_bottomup(mm, len, mask, psize, high_limit);
>   }
>   
> -static inline void slice_or_mask(struct slice_mask *dst, struct slice_mask *src)
> +static inline void slice_or_mask(struct slice_mask *dst,
> +					const struct slice_mask *src)
>   {
>   	dst->low_slices |= src->low_slices;
>   	if (!SLICE_NUM_HIGH)
> @@ -411,7 +416,8 @@ static inline void slice_or_mask(struct slice_mask *dst, struct slice_mask *src)
>   		  SLICE_NUM_HIGH);
>   }
>   
> -static inline void slice_andnot_mask(struct slice_mask *dst, struct slice_mask *src)
> +static inline void slice_andnot_mask(struct slice_mask *dst,
> +					const struct slice_mask *src)
>   {
>   	dst->low_slices &= ~src->low_slices;
>   
> @@ -501,7 +507,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   	 * already
>   	 */
>   	slice_mask_for_size(mm, psize, &good_mask, high_limit);
> -	slice_print_mask(" good_mask", good_mask);
> +	slice_print_mask(" good_mask", &good_mask);
>   
>   	/*
>   	 * Here "good" means slices that are already the right page size,
> @@ -535,12 +541,12 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   	if (addr != 0 || fixed) {
>   		/* Build a mask for the requested range */
>   		slice_range_to_mask(addr, len, &mask);
> -		slice_print_mask(" mask", mask);
> +		slice_print_mask(" mask", &mask);
>   
>   		/* Check if we fit in the good mask. If we do, we just return,
>   		 * nothing else to do
>   		 */
> -		if (slice_check_fit(mm, mask, good_mask)) {
> +		if (slice_check_fit(mm, &mask, &good_mask)) {
>   			slice_dbg(" fits good !\n");
>   			return addr;
>   		}
> @@ -548,7 +554,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   		/* Now let's see if we can find something in the existing
>   		 * slices for that size
>   		 */
> -		newaddr = slice_find_area(mm, len, good_mask,
> +		newaddr = slice_find_area(mm, len, &good_mask,
>   					  psize, topdown, high_limit);
>   		if (newaddr != -ENOMEM) {
>   			/* Found within the good mask, we don't have to setup,
> @@ -564,9 +570,10 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   	 */
>   	slice_mask_for_free(mm, &potential_mask, high_limit);
>   	slice_or_mask(&potential_mask, &good_mask);
> -	slice_print_mask(" potential", potential_mask);
> +	slice_print_mask(" potential", &potential_mask);
>   
> -	if ((addr != 0 || fixed) && slice_check_fit(mm, mask, potential_mask)) {
> +	if ((addr != 0 || fixed) &&
> +			slice_check_fit(mm, &mask, &potential_mask)) {
>   		slice_dbg(" fits potential !\n");
>   		goto convert;
>   	}
> @@ -581,7 +588,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   	 * anywhere in the good area.
>   	 */
>   	if (addr) {
> -		addr = slice_find_area(mm, len, good_mask,
> +		addr = slice_find_area(mm, len, &good_mask,
>   				       psize, topdown, high_limit);
>   		if (addr != -ENOMEM) {
>   			slice_dbg(" found area at 0x%lx\n", addr);
> @@ -592,14 +599,14 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   	/* Now let's see if we can find something in the existing slices
>   	 * for that size plus free slices
>   	 */
> -	addr = slice_find_area(mm, len, potential_mask,
> +	addr = slice_find_area(mm, len, &potential_mask,
>   			       psize, topdown, high_limit);
>   
>   #ifdef CONFIG_PPC_64K_PAGES
>   	if (addr == -ENOMEM && psize == MMU_PAGE_64K) {
>   		/* retry the search with 4k-page slices included */
>   		slice_or_mask(&potential_mask, &compat_mask);
> -		addr = slice_find_area(mm, len, potential_mask,
> +		addr = slice_find_area(mm, len, &potential_mask,
>   				       psize, topdown, high_limit);
>   	}
>   #endif
> @@ -609,7 +616,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   
>   	slice_range_to_mask(addr, len, &mask);
>   	slice_dbg(" found potential area at 0x%lx\n", addr);
> -	slice_print_mask(" mask", mask);
> +	slice_print_mask(" mask", &mask);
>   
>    convert:
>   	slice_andnot_mask(&mask, &good_mask);
> @@ -617,7 +624,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   	if (mask.low_slices ||
>   	    (SLICE_NUM_HIGH &&
>   	     !bitmap_empty(mask.high_slices, SLICE_NUM_HIGH))) {
> -		slice_convert(mm, mask, psize);
> +		slice_convert(mm, &mask, psize);
>   		if (psize > MMU_PAGE_BASE)
>   			on_each_cpu(slice_flush_segments, mm, 1);
>   	}
> @@ -706,7 +713,7 @@ void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
>   	VM_BUG_ON(radix_enabled());
>   
>   	slice_range_to_mask(start, len, &mask);
> -	slice_convert(mm, mask, psize);
> +	slice_convert(mm, &mask, psize);
>   }
>   
>   #ifdef CONFIG_HUGETLB_PAGE
> @@ -753,9 +760,9 @@ int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
>   #if 0 /* too verbose */
>   	slice_dbg("is_hugepage_only_range(mm=%p, addr=%lx, len=%lx)\n",
>   		 mm, addr, len);
> -	slice_print_mask(" mask", mask);
> -	slice_print_mask(" available", available);
> +	slice_print_mask(" mask", &mask);
> +	slice_print_mask(" available", &available);
>   #endif
> -	return !slice_check_fit(mm, mask, available);
> +	return !slice_check_fit(mm, &mask, &available);
>   }
>   #endif
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 05/10] powerpc/mm/slice: implement a slice mask cache
  2018-03-06 13:25 ` [PATCH 05/10] powerpc/mm/slice: implement a slice mask cache Nicholas Piggin
@ 2018-03-06 13:49   ` Christophe LEROY
  2018-03-06 14:01     ` Nicholas Piggin
  0 siblings, 1 reply; 31+ messages in thread
From: Christophe LEROY @ 2018-03-06 13:49 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev
  Cc: Aneesh Kumar K . V, Benjamin Herrenschmidt, Anton Blanchard



On 06/03/2018 at 14:25, Nicholas Piggin wrote:
> Calculating the slice mask can become a significant overhead for
> get_unmapped_area. This patch adds a struct slice_mask for
> each page size in the mm_context, and keeps these in synch with
> the slices psize arrays and slb_addr_limit.
> 
> On Book3S/64 this adds 288 bytes to the mm_context_t for the
> slice mask caches.
> 
> On POWER8, this increases vfork+exec+exit performance by 9.9%
> and reduces time to mmap+munmap a 64kB page by 28%.
> 
> Reduces time to mmap+munmap by about 10% on 8xx.
> 
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Anton Blanchard <anton@samba.org>
> ---
>   arch/powerpc/include/asm/book3s/64/mmu.h |  18 +++++
>   arch/powerpc/include/asm/mmu-8xx.h       |  14 ++++
>   arch/powerpc/mm/slice.c                  | 118 ++++++++++++++++++++-----------
>   3 files changed, 107 insertions(+), 43 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
> index bef6e39ed63a..777778579305 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
> @@ -80,6 +80,16 @@ struct spinlock;
>   /* Maximum possible number of NPUs in a system. */
>   #define NV_MAX_NPUS 8
>   
> +/*
> + * One bit per slice. We have lower slices which cover 256MB segments
> + * upto 4G range. That gets us 16 low slices. For the rest we track slices
> + * in 1TB size.
> + */
> +struct slice_mask {
> +	u64 low_slices;
> +	DECLARE_BITMAP(high_slices, SLICE_NUM_HIGH);
> +};
> +
>   typedef struct {
>   	mm_context_id_t id;
>   	u16 user_psize;		/* page size index */
> @@ -95,6 +105,14 @@ typedef struct {
>   	unsigned char low_slices_psize[BITS_PER_LONG / BITS_PER_BYTE];
>   	unsigned char high_slices_psize[SLICE_ARRAY_SIZE];
>   	unsigned long slb_addr_limit;
> +# ifdef CONFIG_PPC_64K_PAGES
> +	struct slice_mask mask_64k;
> +# endif
> +	struct slice_mask mask_4k;
> +# ifdef CONFIG_HUGETLB_PAGE
> +	struct slice_mask mask_16m;
> +	struct slice_mask mask_16g;
> +# endif
>   #else
>   	u16 sllp;		/* SLB page size encoding */
>   #endif
> diff --git a/arch/powerpc/include/asm/mmu-8xx.h b/arch/powerpc/include/asm/mmu-8xx.h
> index d3d7e79140c6..4c3b14703b3e 100644
> --- a/arch/powerpc/include/asm/mmu-8xx.h
> +++ b/arch/powerpc/include/asm/mmu-8xx.h
> @@ -192,6 +192,11 @@
>   #endif
>   
>   #ifndef __ASSEMBLY__
> +struct slice_mask {
> +	u64 low_slices;
> +	DECLARE_BITMAP(high_slices, 0);
> +};
> +
>   typedef struct {
>   	unsigned int id;
>   	unsigned int active;
> @@ -201,6 +206,15 @@ typedef struct {
>   	unsigned char low_slices_psize[SLICE_ARRAY_SIZE];
>   	unsigned char high_slices_psize[0];
>   	unsigned long slb_addr_limit;
> +# ifdef CONFIG_PPC_16K_PAGES
> +	struct slice_mask mask_16k;
> +# else
> +	struct slice_mask mask_4k;
> +# endif

Could we just call it mask_base or something like that, regardless of the
standard page size?

> +# ifdef CONFIG_HUGETLB_PAGE
> +	struct slice_mask mask_512k;
> +	struct slice_mask mask_8m;
> +# endif
>   #endif
>   } mm_context_t;
>   
> diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
> index 233c42d593dc..2115efe5e869 100644
> --- a/arch/powerpc/mm/slice.c
> +++ b/arch/powerpc/mm/slice.c
> @@ -37,15 +37,6 @@
>   #include <asm/hugetlb.h>
>   
>   static DEFINE_SPINLOCK(slice_convert_lock);
> -/*
> - * One bit per slice. We have lower slices which cover 256MB segments
> - * upto 4G range. That gets us 16 low slices. For the rest we track slices
> - * in 1TB size.
> - */
> -struct slice_mask {
> -	u64 low_slices;
> -	DECLARE_BITMAP(high_slices, SLICE_NUM_HIGH);
> -};
>   
>   #ifdef DEBUG
>   int _slice_debug = 1;
> @@ -149,37 +140,44 @@ static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret,
>   			__set_bit(i, ret->high_slices);
>   }
>   
> -static void slice_mask_for_size(struct mm_struct *mm, int psize,
> -				struct slice_mask *ret,
> -				unsigned long high_limit)
> +#ifdef CONFIG_PPC_BOOK3S_64
> +static struct slice_mask *slice_mask_for_size(struct mm_struct *mm, int psize)
>   {
> -	unsigned char *hpsizes, *lpsizes;
> -	int index, mask_index;
> -	unsigned long i;
> -
> -	ret->low_slices = 0;
> -	if (SLICE_NUM_HIGH)
> -		bitmap_zero(ret->high_slices, SLICE_NUM_HIGH);
> -
> -	lpsizes = mm->context.low_slices_psize;
> -	for (i = 0; i < SLICE_NUM_LOW; i++) {
> -		mask_index = i & 0x1;
> -		index = i >> 1;
> -		if (((lpsizes[index] >> (mask_index * 4)) & 0xf) == psize)
> -			ret->low_slices |= 1u << i;
> -	}
> -
> -	if (high_limit <= SLICE_LOW_TOP)
> -		return;
> -
> -	hpsizes = mm->context.high_slices_psize;
> -	for (i = 0; i < GET_HIGH_SLICE_INDEX(high_limit); i++) {
> -		mask_index = i & 0x1;
> -		index = i >> 1;
> -		if (((hpsizes[index] >> (mask_index * 4)) & 0xf) == psize)
> -			__set_bit(i, ret->high_slices);
> -	}
> +#ifdef CONFIG_PPC_64K_PAGES
> +	if (psize == MMU_PAGE_64K)
> +		return &mm->context.mask_64k;
> +#endif
> +	if (psize == MMU_PAGE_4K)
> +		return &mm->context.mask_4k;
> +#ifdef CONFIG_HUGETLB_PAGE
> +	if (psize == MMU_PAGE_16M)
> +		return &mm->context.mask_16m;
> +	if (psize == MMU_PAGE_16G)
> +		return &mm->context.mask_16g;
> +#endif
> +	BUG();
>   }
> +#elif defined(CONFIG_PPC_8xx)
> +static struct slice_mask *slice_mask_for_size(struct mm_struct *mm, int psize)
> +{
> +#ifdef CONFIG_PPC_16K_PAGES
> +	if (psize == MMU_PAGE_16K)
> +		return &mm->context.mask_16k;
> +#else
> +	if (psize == MMU_PAGE_4K)
> +		return &mm->context.mask_4k;
> +#endif

What about the following instead:
+	if (psize == mmu_virtual_psize)
+		return &mm->context.mask_base;

Christophe

> +#ifdef CONFIG_HUGETLB_PAGE
> +	if (psize == MMU_PAGE_512K)
> +		return &mm->context.mask_512k;
> +	if (psize == MMU_PAGE_8M)
> +		return &mm->context.mask_8m;
> +#endif
> +	BUG();
> +}
> +#else
> +#error "Must define the slice masks for page sizes supported by the platform"
> +#endif
>   
>   static int slice_check_fit(struct mm_struct *mm,
>   			   const struct slice_mask *mask,
> @@ -226,11 +224,15 @@ static void slice_convert(struct mm_struct *mm,
>   	int index, mask_index;
>   	/* Write the new slice psize bits */
>   	unsigned char *hpsizes, *lpsizes;
> +	struct slice_mask *psize_mask, *old_mask;
>   	unsigned long i, flags;
> +	int old_psize;
>   
>   	slice_dbg("slice_convert(mm=%p, psize=%d)\n", mm, psize);
>   	slice_print_mask(" mask", mask);
>   
> +	psize_mask = slice_mask_for_size(mm, psize);
> +
>   	/* We need to use a spinlock here to protect against
>   	 * concurrent 64k -> 4k demotion ...
>   	 */
> @@ -243,6 +245,14 @@ static void slice_convert(struct mm_struct *mm,
>   
>   		mask_index = i & 0x1;
>   		index = i >> 1;
> +
> +		/* Update the slice_mask */
> +		old_psize = (lpsizes[index] >> (mask_index * 4)) & 0xf;
> +		old_mask = slice_mask_for_size(mm, old_psize);
> +		old_mask->low_slices &= ~(1u << i);
> +		psize_mask->low_slices |= 1u << i;
> +
> +		/* Update the sizes array */
>   		lpsizes[index] = (lpsizes[index] & ~(0xf << (mask_index * 4))) |
>   				(((unsigned long)psize) << (mask_index * 4));
>   	}
> @@ -254,6 +264,14 @@ static void slice_convert(struct mm_struct *mm,
>   
>   		mask_index = i & 0x1;
>   		index = i >> 1;
> +
> +		/* Update the slice_mask */
> +		old_psize = (hpsizes[index] >> (mask_index * 4)) & 0xf;
> +		old_mask = slice_mask_for_size(mm, old_psize);
> +		__clear_bit(i, old_mask->high_slices);
> +		__set_bit(i, psize_mask->high_slices);
> +
> +		/* Update the sizes array */
>   		hpsizes[index] = (hpsizes[index] & ~(0xf << (mask_index * 4))) |
>   				(((unsigned long)psize) << (mask_index * 4));
>   	}
> @@ -464,7 +482,13 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   	}
>   
>   	if (high_limit > mm->context.slb_addr_limit) {
> +		/*
> +		 * Increasing the slb_addr_limit does not require
> +		 * slice mask cache to be recalculated because it should
> +		 * be already initialised beyond the old address limit.
> +		 */
>   		mm->context.slb_addr_limit = high_limit;
> +
>   		on_each_cpu(slice_flush_segments, mm, 1);
>   	}
>   
> @@ -506,7 +530,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   	/* First make up a "good" mask of slices that have the right size
>   	 * already
>   	 */
> -	slice_mask_for_size(mm, psize, &good_mask, high_limit);
> +	good_mask = *slice_mask_for_size(mm, psize);
>   	slice_print_mask(" good_mask", &good_mask);
>   
>   	/*
> @@ -531,7 +555,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   #ifdef CONFIG_PPC_64K_PAGES
>   	/* If we support combo pages, we can allow 64k pages in 4k slices */
>   	if (psize == MMU_PAGE_64K) {
> -		slice_mask_for_size(mm, MMU_PAGE_4K, &compat_mask, high_limit);
> +		compat_mask = *slice_mask_for_size(mm, MMU_PAGE_4K);
>   		if (fixed)
>   			slice_or_mask(&good_mask, &compat_mask);
>   	}
> @@ -683,6 +707,7 @@ EXPORT_SYMBOL_GPL(get_slice_psize);
>   void slice_init_new_context_exec(struct mm_struct *mm)
>   {
>   	unsigned char *hpsizes, *lpsizes;
> +	struct slice_mask *mask;
>   	unsigned int psize = mmu_virtual_psize;
>   
>   	slice_dbg("slice_init_new_context_exec(mm=%p)\n", mm);
> @@ -703,6 +728,14 @@ void slice_init_new_context_exec(struct mm_struct *mm)
>   
>   	hpsizes = mm->context.high_slices_psize;
>   	memset(hpsizes, (psize << 4) | psize, SLICE_NUM_HIGH >> 1);
> +
> +	/*
> +	 * Slice mask cache starts zeroed, fill the default size cache.
> +	 */
> +	mask = slice_mask_for_size(mm, psize);
> +	mask->low_slices = ~0UL;
> +	if (SLICE_NUM_HIGH)
> +		bitmap_fill(mask->high_slices, SLICE_NUM_HIGH);
>   }
>   
>   void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
> @@ -741,18 +774,17 @@ int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
>   {
>   	struct slice_mask mask, available;
>   	unsigned int psize = mm->context.user_psize;
> -	unsigned long high_limit = mm->context.slb_addr_limit;
>   
>   	if (radix_enabled())
>   		return 0;
>   
>   	slice_range_to_mask(addr, len, &mask);
> -	slice_mask_for_size(mm, psize, &available, high_limit);
> +	available = *slice_mask_for_size(mm, psize);
>   #ifdef CONFIG_PPC_64K_PAGES
>   	/* We need to account for 4k slices too */
>   	if (psize == MMU_PAGE_64K) {
>   		struct slice_mask compat_mask;
> -		slice_mask_for_size(mm, MMU_PAGE_4K, &compat_mask, high_limit);
> +		compat_mask = *slice_mask_for_size(mm, MMU_PAGE_4K);
>   		slice_or_mask(&available, &compat_mask);
>   	}
>   #endif
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 04/10] powerpc/mm/slice: pass pointers to struct slice_mask where possible
  2018-03-06 13:43   ` Christophe LEROY
@ 2018-03-06 13:59     ` Nicholas Piggin
  0 siblings, 0 replies; 31+ messages in thread
From: Nicholas Piggin @ 2018-03-06 13:59 UTC (permalink / raw)
  To: Christophe LEROY; +Cc: linuxppc-dev, Aneesh Kumar K . V

On Tue, 6 Mar 2018 14:43:40 +0100
Christophe LEROY <christophe.leroy@c-s.fr> wrote:

> On 06/03/2018 at 14:25, Nicholas Piggin wrote:

> > @@ -147,7 +149,8 @@ static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret,
> >   			__set_bit(i, ret->high_slices);
> >   }
> >   
> > -static void slice_mask_for_size(struct mm_struct *mm, int psize, struct slice_mask *ret,
> > +static void slice_mask_for_size(struct mm_struct *mm, int psize,
> > +				struct slice_mask *ret,  
> 
> That seems to be cleanup only, is it worth including that in this patch ?

Yeah that can go.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 05/10] powerpc/mm/slice: implement a slice mask cache
  2018-03-06 13:49   ` Christophe LEROY
@ 2018-03-06 14:01     ` Nicholas Piggin
  0 siblings, 0 replies; 31+ messages in thread
From: Nicholas Piggin @ 2018-03-06 14:01 UTC (permalink / raw)
  To: Christophe LEROY
  Cc: linuxppc-dev, Aneesh Kumar K . V, Benjamin Herrenschmidt,
	Anton Blanchard

On Tue, 6 Mar 2018 14:49:57 +0100
Christophe LEROY <christophe.leroy@c-s.fr> wrote:

> On 06/03/2018 at 14:25, Nicholas Piggin wrote:

> > @@ -201,6 +206,15 @@ typedef struct {
> >   	unsigned char low_slices_psize[SLICE_ARRAY_SIZE];
> >   	unsigned char high_slices_psize[0];
> >   	unsigned long slb_addr_limit;
> > +# ifdef CONFIG_PPC_16K_PAGES
> > +	struct slice_mask mask_16k;
> > +# else
> > +	struct slice_mask mask_4k;
> > +# endif  
> 
> Could we just call it mask_base or something like that regardless of the 
> standard page size ?

[...]

> > +#elif defined(CONFIG_PPC_8xx)
> > +static struct slice_mask *slice_mask_for_size(struct mm_struct *mm, int psize)
> > +{
> > +#ifdef CONFIG_PPC_16K_PAGES
> > +	if (psize == MMU_PAGE_16K)
> > +		return &mm->context.mask_16k;
> > +#else
> > +	if (psize == MMU_PAGE_4K)
> > +		return &mm->context.mask_4k;
> > +#endif  
> 
> What about the following instead:
> > +	if (psize == mmu_virtual_psize)
> +		return &mm->context.mask_base;

Sure, if you prefer. It should generate the same code, right?

Thanks,
Nick

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 02/10] powerpc/mm/slice: Simplify and optimise slice context initialisation
  2018-03-06 13:24 ` [PATCH 02/10] powerpc/mm/slice: Simplify and optimise slice context initialisation Nicholas Piggin
@ 2018-03-06 14:32   ` Nicholas Piggin
  0 siblings, 0 replies; 31+ messages in thread
From: Nicholas Piggin @ 2018-03-06 14:32 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Aneesh Kumar K . V, Christophe Leroy

On Tue,  6 Mar 2018 23:24:59 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:

> diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
> index 929d9ef7083f..80acad52b006 100644
> --- a/arch/powerpc/mm/mmu_context_book3s64.c
> +++ b/arch/powerpc/mm/mmu_context_book3s64.c
> @@ -93,13 +93,6 @@ static int hash__init_new_context(struct mm_struct *mm)
>  	if (index < 0)
>  		return index;
>  
> -	/*
> -	 * In the case of exec, use the default limit,
> -	 * otherwise inherit it from the mm we are duplicating.
> -	 */
> -	if (!mm->context.slb_addr_limit)
> -		mm->context.slb_addr_limit = DEFAULT_MAP_WINDOW_USER64;

[...]

> diff --git a/arch/powerpc/mm/mmu_context_nohash.c b/arch/powerpc/mm/mmu_context_nohash.c
> index d98f7e5c141b..be8f5c9d4d08 100644
> --- a/arch/powerpc/mm/mmu_context_nohash.c
> +++ b/arch/powerpc/mm/mmu_context_nohash.c
> @@ -332,9 +332,6 @@ int init_new_context(struct task_struct *t, struct mm_struct *mm)
>  	pr_hard("initing context for mm @%p\n", mm);
>  
>  #ifdef	CONFIG_PPC_MM_SLICES
> -	if (!mm->context.slb_addr_limit)
> -		mm->context.slb_addr_limit = DEFAULT_MAP_WINDOW;
> -

[...]

> diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
> index 5e9e1e57d580..af4351b15d01 100644
> --- a/arch/powerpc/mm/slice.c
> +++ b/arch/powerpc/mm/slice.c
> @@ -671,70 +671,29 @@ unsigned int get_slice_psize(struct mm_struct *mm, unsigned long addr)
>  }
>  EXPORT_SYMBOL_GPL(get_slice_psize);
>  
> -/*
> - * This is called by hash_page when it needs to do a lazy conversion of
> - * an address space from real 64K pages to combo 4K pages (typically
> - * when hitting a non cacheable mapping on a processor or hypervisor
> - * that won't allow them for 64K pages).
> - *
> - * This is also called in init_new_context() to change back the user
> - * psize from whatever the parent context had it set to
> - * N.B. This may be called before mm->context.id has been set.
> - *
> - * This function will only change the content of the {low,high)_slice_psize
> - * masks, it will not flush SLBs as this shall be handled lazily by the
> - * caller.
> - */
> -void slice_set_user_psize(struct mm_struct *mm, unsigned int psize)
> +void slice_init_new_context_exec(struct mm_struct *mm)
>  {
> -	int index, mask_index;
>  	unsigned char *hpsizes, *lpsizes;
> -	unsigned long flags;
> -	unsigned int old_psize;
> -	int i;
> -
> -	slice_dbg("slice_set_user_psize(mm=%p, psize=%d)\n", mm, psize);
> -
> -	VM_BUG_ON(radix_enabled());
> -	spin_lock_irqsave(&slice_convert_lock, flags);
> +	unsigned int psize = mmu_virtual_psize;
>  
> -	old_psize = mm->context.user_psize;
> -	slice_dbg(" old_psize=%d\n", old_psize);
> -	if (old_psize == psize)
> -		goto bail;
> +	slice_dbg("slice_init_new_context_exec(mm=%p)\n", mm);
>  
> +	/*
> +	 * In the case of exec, use the default limit. In the
> +	 * case of fork it is just inherited from the mm being
> +	 * duplicated.
> +	 */
> +	mm->context.slb_addr_limit = DEFAULT_MAP_WINDOW_USER64;

Okay, this looks wrong for 8xx, I've just noticed. But... isn't the
existing code wrong for 32-bit tasks on BOOK3S_64? It should be
DEFAULT_MAP_WINDOW here, I think?

Thanks,
Nick

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 06/10] powerpc/mm/slice: implement slice_check_range_fits
  2018-03-06 13:25 ` [PATCH 06/10] powerpc/mm/slice: implement slice_check_range_fits Nicholas Piggin
@ 2018-03-06 14:41   ` Christophe LEROY
  2018-03-06 23:12     ` Nicholas Piggin
  0 siblings, 1 reply; 31+ messages in thread
From: Christophe LEROY @ 2018-03-06 14:41 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Aneesh Kumar K . V



On 06/03/2018 at 14:25, Nicholas Piggin wrote:
> Rather than build slice masks from a range then use that to check for
> fit in a candidate mask, implement slice_check_range_fits that checks
> if a range fits in a mask directly.
> 
> This allows several structures to be removed from stacks, and also we
> don't expect a huge range in a lot of these cases, so building and
> comparing a full mask is going to be more expensive than testing just
> one or two bits of the range.
> 
> On POWER8, this increases vfork+exec+exit performance by 0.3%
> and reduces time to mmap+munmap a 64kB page by 5%.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/mm/slice.c | 63 ++++++++++++++++++++++++++-----------------------
>   1 file changed, 34 insertions(+), 29 deletions(-)
> 
> diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
> index 2115efe5e869..3841fca75006 100644
> --- a/arch/powerpc/mm/slice.c
> +++ b/arch/powerpc/mm/slice.c
> @@ -179,26 +179,35 @@ static struct slice_mask *slice_mask_for_size(struct mm_struct *mm, int psize)
>   #error "Must define the slice masks for page sizes supported by the platform"
>   #endif
>   
> -static int slice_check_fit(struct mm_struct *mm,
> -			   const struct slice_mask *mask,
> -			   const struct slice_mask *available)
> +static bool slice_check_range_fits(struct mm_struct *mm,
> +			   const struct slice_mask *available,
> +			   unsigned long start, unsigned long len)
>   {
> -	DECLARE_BITMAP(result, SLICE_NUM_HIGH);
> -	/*
> -	 * Make sure we just do bit compare only to the max
> -	 * addr limit and not the full bit map size.
> -	 */
> -	unsigned long slice_count = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
> +	unsigned long end = start + len - 1;
> +	u64 low_slices = 0;
>   
> -	if (!SLICE_NUM_HIGH)
> -		return (mask->low_slices & available->low_slices) ==
> -		       mask->low_slices;
> +	if (start < SLICE_LOW_TOP) {
> +		unsigned long mend = min(end, (SLICE_LOW_TOP - 1));

See slice_range_to_mask()

You'll have an issue here with PPC32: you have to cast (SLICE_LOW_TOP - 1)
to unsigned long, because SLICE_LOW_TOP is unsigned long long on PPC32.
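Something like the form slice_range_to_mask() already uses would do
(untested, just to illustrate):

	unsigned long mend = min(end, (unsigned long)(SLICE_LOW_TOP - 1));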

> +
> +		low_slices = (1u << (GET_LOW_SLICE_INDEX(mend) + 1))
> +				- (1u << GET_LOW_SLICE_INDEX(start));
> +	}
> +	if ((low_slices & available->low_slices) != low_slices)
> +		return false;
> +
> +	if (SLICE_NUM_HIGH && ((start + len) > SLICE_LOW_TOP)) {
> +		unsigned long start_index = GET_HIGH_SLICE_INDEX(start);
> +		unsigned long align_end = ALIGN(end, (1UL << SLICE_HIGH_SHIFT));
> +		unsigned long count = GET_HIGH_SLICE_INDEX(align_end) - start_index;
> +		unsigned long i;
>   
> -	bitmap_and(result, mask->high_slices,
> -		   available->high_slices, slice_count);
> +		for (i = start_index; i < start_index + count; i++) {
> +			if (!test_bit(i, available->high_slices))
> +				return false;
> +		}

What about using bitmap_find_next_zero_area()?
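Something along these lines, perhaps (untested sketch of that suggestion;
it returns a value past the end of the searched range when no zero bit is
found in it):

	if (bitmap_find_next_zero_area(available->high_slices,
				       start_index + count, start_index,
				       1, 0) < start_index + count)
		return false;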

> +	}
>   
> -	return (mask->low_slices & available->low_slices) == mask->low_slices &&
> -		bitmap_equal(result, mask->high_slices, slice_count);
> +	return true;
>   }
>   
>   static void slice_flush_segments(void *parm)
> @@ -562,15 +571,11 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   #endif
>   
>   	/* First check hint if it's valid or if we have MAP_FIXED */
> -	if (addr != 0 || fixed) {
> -		/* Build a mask for the requested range */
> -		slice_range_to_mask(addr, len, &mask);
> -		slice_print_mask(" mask", &mask);
> -
> +	if (addr || fixed) {

It is cleanup; should it really be part of this patch?

>   		/* Check if we fit in the good mask. If we do, we just return,
>   		 * nothing else to do
>   		 */
> -		if (slice_check_fit(mm, &mask, &good_mask)) {
> +		if (slice_check_range_fits(mm, &good_mask, addr, len)) {
>   			slice_dbg(" fits good !\n");
>   			return addr;
>   		}
> @@ -596,10 +601,11 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   	slice_or_mask(&potential_mask, &good_mask);
>   	slice_print_mask(" potential", &potential_mask);
>   
> -	if ((addr != 0 || fixed) &&
> -			slice_check_fit(mm, &mask, &potential_mask)) {
> -		slice_dbg(" fits potential !\n");
> -		goto convert;
> +	if (addr || fixed) {
> +		if (slice_check_range_fits(mm, &potential_mask, addr, len)) {
> +			slice_dbg(" fits potential !\n");
> +			goto convert;
> +		}

Why not keep the original structure and just replace slice_check_fit()
with slice_check_range_fits()?

I believe cleanups should not be mixed with real feature changes. If
needed, you should have a cleanup patch at the front of the series.

Christophe

>   	}
>   
>   	/* If we have MAP_FIXED and failed the above steps, then error out */
> @@ -772,13 +778,12 @@ void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
>   int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
>   			   unsigned long len)
>   {
> -	struct slice_mask mask, available;
> +	struct slice_mask available;
>   	unsigned int psize = mm->context.user_psize;
>   
>   	if (radix_enabled())
>   		return 0;
>   
> -	slice_range_to_mask(addr, len, &mask);
>   	available = *slice_mask_for_size(mm, psize);
>   #ifdef CONFIG_PPC_64K_PAGES
>   	/* We need to account for 4k slices too */
> @@ -795,6 +800,6 @@ int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
>   	slice_print_mask(" mask", &mask);
>   	slice_print_mask(" available", &available);
>   #endif
> -	return !slice_check_fit(mm, &mask, &available);
> +	return !slice_check_range_fits(mm, &available, addr, len);
>   }
>   #endif
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 07/10] powerpc/mm/slice: Switch to 3-operand slice bitops helpers
  2018-03-06 13:25 ` [PATCH 07/10] powerpc/mm/slice: Switch to 3-operand slice bitops helpers Nicholas Piggin
@ 2018-03-06 14:44   ` Christophe LEROY
  2018-03-06 23:19     ` Nicholas Piggin
  0 siblings, 1 reply; 31+ messages in thread
From: Christophe LEROY @ 2018-03-06 14:44 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Aneesh Kumar K . V



On 06/03/2018 at 14:25, Nicholas Piggin wrote:
> This converts the slice_mask bit operation helpers to be the usual
> 3-operand kind, which is clearer to work with.

What's the real benefit of doing that?

If it helps for a subsequent patch, say so.
Otherwise, I really can't see the point.

> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/mm/slice.c | 38 +++++++++++++++++++++++---------------
>   1 file changed, 23 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
> index 3841fca75006..46daa1d1794f 100644
> --- a/arch/powerpc/mm/slice.c
> +++ b/arch/powerpc/mm/slice.c
> @@ -433,25 +433,33 @@ static unsigned long slice_find_area(struct mm_struct *mm, unsigned long len,
>   		return slice_find_area_bottomup(mm, len, mask, psize, high_limit);
>   }
>   
> -static inline void slice_or_mask(struct slice_mask *dst,
> +static inline void slice_copy_mask(struct slice_mask *dst,
>   					const struct slice_mask *src)

This new function is not used; the compiler will probably not be happy.

Christophe

>   {
> -	dst->low_slices |= src->low_slices;
> +	dst->low_slices = src->low_slices;
>   	if (!SLICE_NUM_HIGH)
>   		return;
> -	bitmap_or(dst->high_slices, dst->high_slices, src->high_slices,
> -		  SLICE_NUM_HIGH);
> +	bitmap_copy(dst->high_slices, src->high_slices, SLICE_NUM_HIGH);
>   }
>   
> -static inline void slice_andnot_mask(struct slice_mask *dst,
> -					const struct slice_mask *src)
> +static inline void slice_or_mask(struct slice_mask *dst,
> +					const struct slice_mask *src1,
> +					const struct slice_mask *src2)
>   {
> -	dst->low_slices &= ~src->low_slices;
> +	dst->low_slices = src1->low_slices | src2->low_slices;
> +	if (!SLICE_NUM_HIGH)
> +		return;
> +	bitmap_or(dst->high_slices, src1->high_slices, src2->high_slices, SLICE_NUM_HIGH);
> +}
>   
> +static inline void slice_andnot_mask(struct slice_mask *dst,
> +					const struct slice_mask *src1,
> +					const struct slice_mask *src2)
> +{
> +	dst->low_slices = src1->low_slices & ~src2->low_slices;
>   	if (!SLICE_NUM_HIGH)
>   		return;
> -	bitmap_andnot(dst->high_slices, dst->high_slices, src->high_slices,
> -		      SLICE_NUM_HIGH);
> +	bitmap_andnot(dst->high_slices, src1->high_slices, src2->high_slices, SLICE_NUM_HIGH);
>   }
>   
>   #ifdef CONFIG_PPC_64K_PAGES
> @@ -566,7 +574,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   	if (psize == MMU_PAGE_64K) {
>   		compat_mask = *slice_mask_for_size(mm, MMU_PAGE_4K);
>   		if (fixed)
> -			slice_or_mask(&good_mask, &compat_mask);
> +			slice_or_mask(&good_mask, &good_mask, &compat_mask);
>   	}
>   #endif
>   
> @@ -598,7 +606,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   	 * empty and thus can be converted
>   	 */
>   	slice_mask_for_free(mm, &potential_mask, high_limit);
> -	slice_or_mask(&potential_mask, &good_mask);
> +	slice_or_mask(&potential_mask, &potential_mask, &good_mask);
>   	slice_print_mask(" potential", &potential_mask);
>   
>   	if (addr || fixed) {
> @@ -635,7 +643,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   #ifdef CONFIG_PPC_64K_PAGES
>   	if (addr == -ENOMEM && psize == MMU_PAGE_64K) {
>   		/* retry the search with 4k-page slices included */
> -		slice_or_mask(&potential_mask, &compat_mask);
> +		slice_or_mask(&potential_mask, &potential_mask, &compat_mask);
>   		addr = slice_find_area(mm, len, &potential_mask,
>   				       psize, topdown, high_limit);
>   	}
> @@ -649,8 +657,8 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   	slice_print_mask(" mask", &mask);
>   
>    convert:
> -	slice_andnot_mask(&mask, &good_mask);
> -	slice_andnot_mask(&mask, &compat_mask);
> +	slice_andnot_mask(&mask, &mask, &good_mask);
> +	slice_andnot_mask(&mask, &mask, &compat_mask);
>   	if (mask.low_slices ||
>   	    (SLICE_NUM_HIGH &&
>   	     !bitmap_empty(mask.high_slices, SLICE_NUM_HIGH))) {
> @@ -790,7 +798,7 @@ int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
>   	if (psize == MMU_PAGE_64K) {
>   		struct slice_mask compat_mask;
>   		compat_mask = *slice_mask_for_size(mm, MMU_PAGE_4K);
> -		slice_or_mask(&available, &compat_mask);
> +		slice_or_mask(&available, &available, &compat_mask);
>   	}
>   #endif
>   
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 08/10] powerpc/mm/slice: Use const pointers to cached slice masks where possible
  2018-03-06 13:25 ` [PATCH 08/10] powerpc/mm/slice: Use const pointers to cached slice masks where possible Nicholas Piggin
@ 2018-03-06 14:55   ` Christophe LEROY
  2018-03-06 23:33     ` Nicholas Piggin
  0 siblings, 1 reply; 31+ messages in thread
From: Christophe LEROY @ 2018-03-06 14:55 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Aneesh Kumar K . V



Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :
> The slice_mask cache was a basic conversion which copied the slice
> mask into caller's structures, because that's how the original code
> worked. In most cases the pointer can be used directly instead, saving
> a copy and an on-stack structure.
> 
> On POWER8, this increases vfork+exec+exit performance by 0.3%
> and reduces time to mmap+munmap a 64kB page by 2%.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/mm/slice.c | 77 +++++++++++++++++++++----------------------------
>   1 file changed, 33 insertions(+), 44 deletions(-)
> 
> diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
> index 46daa1d1794f..086c31b8b982 100644
> --- a/arch/powerpc/mm/slice.c
> +++ b/arch/powerpc/mm/slice.c
> @@ -472,10 +472,10 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   				      unsigned long flags, unsigned int psize,
>   				      int topdown)
>   {
> -	struct slice_mask mask;
>   	struct slice_mask good_mask;
>   	struct slice_mask potential_mask;
> -	struct slice_mask compat_mask;
> +	const struct slice_mask *maskp;
> +	const struct slice_mask *compat_maskp = NULL;
>   	int fixed = (flags & MAP_FIXED);
>   	int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
>   	unsigned long page_size = 1UL << pshift;
> @@ -509,22 +509,6 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   		on_each_cpu(slice_flush_segments, mm, 1);
>   	}
>   
> -	/*
> -	 * init different masks
> -	 */
> -	mask.low_slices = 0;
> -
> -	/* silence stupid warning */;
> -	potential_mask.low_slices = 0;
> -
> -	compat_mask.low_slices = 0;
> -
> -	if (SLICE_NUM_HIGH) {
> -		bitmap_zero(mask.high_slices, SLICE_NUM_HIGH);
> -		bitmap_zero(potential_mask.high_slices, SLICE_NUM_HIGH);
> -		bitmap_zero(compat_mask.high_slices, SLICE_NUM_HIGH);
> -	}
> -
>   	/* Sanity checks */
>   	BUG_ON(mm->task_size == 0);
>   	BUG_ON(mm->context.slb_addr_limit == 0);
> @@ -547,8 +531,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   	/* First make up a "good" mask of slices that have the right size
>   	 * already
>   	 */
> -	good_mask = *slice_mask_for_size(mm, psize);
> -	slice_print_mask(" good_mask", &good_mask);
> +	maskp = slice_mask_for_size(mm, psize);
>   
>   	/*
>   	 * Here "good" means slices that are already the right page size,
> @@ -572,11 +555,19 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   #ifdef CONFIG_PPC_64K_PAGES
>   	/* If we support combo pages, we can allow 64k pages in 4k slices */
>   	if (psize == MMU_PAGE_64K) {
> -		compat_mask = *slice_mask_for_size(mm, MMU_PAGE_4K);
> +		compat_maskp = slice_mask_for_size(mm, MMU_PAGE_4K);
>   		if (fixed)
> -			slice_or_mask(&good_mask, &good_mask, &compat_mask);
> -	}
> +			slice_or_mask(&good_mask, maskp, compat_maskp);
> +		else
> +			slice_copy_mask(&good_mask, maskp);
> +	} else
>   #endif
> +	{
> +		slice_copy_mask(&good_mask, maskp);
> +	}

You could get something nicer by removing that #ifdef and doing instead:

	if (IS_ENABLED(CONFIG_PPC_64K_PAGES) && psize == MMU_PAGE_64K) {
		...
	} else {
		slice_copy_mask(&good_mask, maskp);
	}

> +	slice_print_mask(" good_mask", &good_mask);
> +	if (compat_maskp)
> +		slice_print_mask(" compat_mask", compat_maskp);
>   
>   	/* First check hint if it's valid or if we have MAP_FIXED */
>   	if (addr || fixed) {
> @@ -643,7 +634,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   #ifdef CONFIG_PPC_64K_PAGES
>   	if (addr == -ENOMEM && psize == MMU_PAGE_64K) {
>   		/* retry the search with 4k-page slices included */
> -		slice_or_mask(&potential_mask, &potential_mask, &compat_mask);
> +		slice_or_mask(&potential_mask, &potential_mask, compat_maskp);
>   		addr = slice_find_area(mm, len, &potential_mask,
>   				       psize, topdown, high_limit);
>   	}
> @@ -652,17 +643,18 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   	if (addr == -ENOMEM)
>   		return -ENOMEM;
>   
> -	slice_range_to_mask(addr, len, &mask);
> +	slice_range_to_mask(addr, len, &potential_mask);
>   	slice_dbg(" found potential area at 0x%lx\n", addr);
> -	slice_print_mask(" mask", &mask);
> +	slice_print_mask(" mask", &potential_mask);
>   
>    convert:
> -	slice_andnot_mask(&mask, &mask, &good_mask);
> -	slice_andnot_mask(&mask, &mask, &compat_mask);
> -	if (mask.low_slices ||
> -	    (SLICE_NUM_HIGH &&
> -	     !bitmap_empty(mask.high_slices, SLICE_NUM_HIGH))) {
> -		slice_convert(mm, &mask, psize);
> +	slice_andnot_mask(&potential_mask, &potential_mask, &good_mask);
> +	if (compat_maskp && !fixed)
> +		slice_andnot_mask(&potential_mask, &potential_mask, compat_maskp);
> +	if (potential_mask.low_slices ||
> +		(SLICE_NUM_HIGH &&
> +		 !bitmap_empty(potential_mask.high_slices, SLICE_NUM_HIGH))) {
> +		slice_convert(mm, &potential_mask, psize);
>   		if (psize > MMU_PAGE_BASE)
>   			on_each_cpu(slice_flush_segments, mm, 1);
>   	}
> @@ -786,28 +778,25 @@ void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
>   int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
>   			   unsigned long len)
>   {
> -	struct slice_mask available;
> +	const struct slice_mask *maskp;
>   	unsigned int psize = mm->context.user_psize;
>   
>   	if (radix_enabled())
>   		return 0;
>   
> -	available = *slice_mask_for_size(mm, psize);
> +	maskp = slice_mask_for_size(mm, psize);
>   #ifdef CONFIG_PPC_64K_PAGES
>   	/* We need to account for 4k slices too */
>   	if (psize == MMU_PAGE_64K) {
> -		struct slice_mask compat_mask;
> -		compat_mask = *slice_mask_for_size(mm, MMU_PAGE_4K);
> -		slice_or_mask(&available, &available, &compat_mask);
> +		const struct slice_mask *compat_maskp;
> +		struct slice_mask available;
> +
> +		compat_maskp = slice_mask_for_size(mm, MMU_PAGE_4K);
> +		slice_or_mask(&available, maskp, compat_maskp);
> +		return !slice_check_range_fits(mm, &available, addr, len);
>   	}
>   #endif
>   
> -#if 0 /* too verbose */
> -	slice_dbg("is_hugepage_only_range(mm=%p, addr=%lx, len=%lx)\n",
> -		 mm, addr, len);
> -	slice_print_mask(" mask", &mask);
> -	slice_print_mask(" available", &available);
> -#endif

That's cleanup, should be in a previous patch.

Christophe

> -	return !slice_check_range_fits(mm, &available, addr, len);
> +	return !slice_check_range_fits(mm, maskp, addr, len);
>   }
>   #endif
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 09/10] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations
  2018-03-06 13:25 ` [PATCH 09/10] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations Nicholas Piggin
@ 2018-03-06 15:02   ` Christophe LEROY
  2018-03-06 23:32     ` Nicholas Piggin
  0 siblings, 1 reply; 31+ messages in thread
From: Christophe LEROY @ 2018-03-06 15:02 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Aneesh Kumar K . V



Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :
> The number of high slices a process might use now depends on its
> address space size, and what allocation address it has requested.
> 
> This patch uses that limit throughout call chains where possible,
> rather than use the fixed SLICE_NUM_HIGH for bitmap operations.
> This saves some cost for processes that don't use very large address
> spaces.
> 
> Performance numbers aren't changed significantly, this may change
> with larger address spaces or different mmap access patterns that
> require more slice mask building.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/mm/slice.c | 75 +++++++++++++++++++++++++++++--------------------
>   1 file changed, 45 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
> index 086c31b8b982..507d17e2cfcd 100644
> --- a/arch/powerpc/mm/slice.c
> +++ b/arch/powerpc/mm/slice.c
> @@ -61,14 +61,12 @@ static void slice_print_mask(const char *label, const struct slice_mask *mask) {
>   #endif
>   
>   static void slice_range_to_mask(unsigned long start, unsigned long len,
> -				struct slice_mask *ret)
> +				struct slice_mask *ret,
> +				unsigned long high_slices)
>   {
>   	unsigned long end = start + len - 1;
>   
>   	ret->low_slices = 0;
> -	if (SLICE_NUM_HIGH)
> -		bitmap_zero(ret->high_slices, SLICE_NUM_HIGH);
> -
>   	if (start < SLICE_LOW_TOP) {
>   		unsigned long mend = min(end,
>   					 (unsigned long)(SLICE_LOW_TOP - 1));
> @@ -77,6 +75,10 @@ static void slice_range_to_mask(unsigned long start, unsigned long len,
>   			- (1u << GET_LOW_SLICE_INDEX(start));
>   	}
>   
> +	if (!SLICE_NUM_HIGH)
> +		return;
> +
> +	bitmap_zero(ret->high_slices, high_slices);

In include/linux/bitmap.h, it is said:

  * Note that nbits should be always a compile time evaluable constant.
  * Otherwise many inlines will generate horrible code.

Not sure that's true, but it is written ...


>   	if ((start + len) > SLICE_LOW_TOP) {
>   		unsigned long start_index = GET_HIGH_SLICE_INDEX(start);
>   		unsigned long align_end = ALIGN(end, (1UL << SLICE_HIGH_SHIFT));
> @@ -120,22 +122,20 @@ static int slice_high_has_vma(struct mm_struct *mm, unsigned long slice)
>   }
>   
>   static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret,
> -				unsigned long high_limit)
> +				unsigned long high_slices)
>   {
>   	unsigned long i;
>   
>   	ret->low_slices = 0;
> -	if (SLICE_NUM_HIGH)
> -		bitmap_zero(ret->high_slices, SLICE_NUM_HIGH);
> -
>   	for (i = 0; i < SLICE_NUM_LOW; i++)
>   		if (!slice_low_has_vma(mm, i))
>   			ret->low_slices |= 1u << i;
>   
> -	if (high_limit <= SLICE_LOW_TOP)
> +	if (!SLICE_NUM_HIGH || !high_slices)
>   		return;
>   
> -	for (i = 0; i < GET_HIGH_SLICE_INDEX(high_limit); i++)
> +	bitmap_zero(ret->high_slices, high_slices);
> +	for (i = 0; i < high_slices; i++)
>   		if (!slice_high_has_vma(mm, i))
>   			__set_bit(i, ret->high_slices);
>   }
> @@ -232,6 +232,7 @@ static void slice_convert(struct mm_struct *mm,
>   {
>   	int index, mask_index;
>   	/* Write the new slice psize bits */
> +	unsigned long high_slices;
>   	unsigned char *hpsizes, *lpsizes;
>   	struct slice_mask *psize_mask, *old_mask;
>   	unsigned long i, flags;
> @@ -267,7 +268,8 @@ static void slice_convert(struct mm_struct *mm,
>   	}
>   
>   	hpsizes = mm->context.high_slices_psize;
> -	for (i = 0; i < GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit); i++) {
> +	high_slices = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
> +	for (i = 0; SLICE_NUM_HIGH && i < high_slices; i++) {
>   		if (!test_bit(i, mask->high_slices))
>   			continue;
>   
> @@ -434,32 +436,37 @@ static unsigned long slice_find_area(struct mm_struct *mm, unsigned long len,
>   }
>   
>   static inline void slice_copy_mask(struct slice_mask *dst,
> -					const struct slice_mask *src)
> +					const struct slice_mask *src,
> +					unsigned long high_slices)
>   {
>   	dst->low_slices = src->low_slices;
>   	if (!SLICE_NUM_HIGH)
>   		return;
> -	bitmap_copy(dst->high_slices, src->high_slices, SLICE_NUM_HIGH);
> +	bitmap_copy(dst->high_slices, src->high_slices, high_slices);
>   }
>   
>   static inline void slice_or_mask(struct slice_mask *dst,
>   					const struct slice_mask *src1,
> -					const struct slice_mask *src2)
> +					const struct slice_mask *src2,
> +					unsigned long high_slices)
>   {
>   	dst->low_slices = src1->low_slices | src2->low_slices;
>   	if (!SLICE_NUM_HIGH)
>   		return;
> -	bitmap_or(dst->high_slices, src1->high_slices, src2->high_slices, SLICE_NUM_HIGH);
> +	bitmap_or(dst->high_slices, src1->high_slices, src2->high_slices,
> +			high_slices);

Why a new line here? This line is shorter than before.
Or was that forgotten in a previous patch?

>   }
>   
>   static inline void slice_andnot_mask(struct slice_mask *dst,
>   					const struct slice_mask *src1,
> -					const struct slice_mask *src2)
> +					const struct slice_mask *src2,
> +					unsigned long high_slices)
>   {
>   	dst->low_slices = src1->low_slices & ~src2->low_slices;
>   	if (!SLICE_NUM_HIGH)
>   		return;
> -	bitmap_andnot(dst->high_slices, src1->high_slices, src2->high_slices, SLICE_NUM_HIGH);
> +	bitmap_andnot(dst->high_slices, src1->high_slices, src2->high_slices,
> +			high_slices);

Same comment.

>   }
>   
>   #ifdef CONFIG_PPC_64K_PAGES
> @@ -482,6 +489,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   	struct mm_struct *mm = current->mm;
>   	unsigned long newaddr;
>   	unsigned long high_limit;
> +	unsigned long high_slices;
>   
>   	high_limit = DEFAULT_MAP_WINDOW;
>   	if (addr >= high_limit || (fixed && (addr + len > high_limit)))
> @@ -498,6 +506,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   			return -ENOMEM;
>   	}
>   
> +	high_slices = GET_HIGH_SLICE_INDEX(high_limit);
>   	if (high_limit > mm->context.slb_addr_limit) {
>   		/*
>   		 * Increasing the slb_addr_limit does not require
> @@ -557,13 +566,13 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   	if (psize == MMU_PAGE_64K) {
>   		compat_maskp = slice_mask_for_size(mm, MMU_PAGE_4K);
>   		if (fixed)
> -			slice_or_mask(&good_mask, maskp, compat_maskp);
> +			slice_or_mask(&good_mask, maskp, compat_maskp, high_slices);
>   		else
> -			slice_copy_mask(&good_mask, maskp);
> +			slice_copy_mask(&good_mask, maskp, high_slices);
>   	} else
>   #endif
>   	{
> -		slice_copy_mask(&good_mask, maskp);
> +		slice_copy_mask(&good_mask, maskp, high_slices);
>   	}
>   	slice_print_mask(" good_mask", &good_mask);
>   	if (compat_maskp)
> @@ -596,8 +605,8 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   	 * We don't fit in the good mask, check what other slices are
>   	 * empty and thus can be converted
>   	 */
> -	slice_mask_for_free(mm, &potential_mask, high_limit);
> -	slice_or_mask(&potential_mask, &potential_mask, &good_mask);
> +	slice_mask_for_free(mm, &potential_mask, high_slices);
> +	slice_or_mask(&potential_mask, &potential_mask, &good_mask, high_slices);
>   	slice_print_mask(" potential", &potential_mask);
>   
>   	if (addr || fixed) {
> @@ -634,7 +643,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   #ifdef CONFIG_PPC_64K_PAGES
>   	if (addr == -ENOMEM && psize == MMU_PAGE_64K) {
>   		/* retry the search with 4k-page slices included */
> -		slice_or_mask(&potential_mask, &potential_mask, compat_maskp);
> +		slice_or_mask(&potential_mask, &potential_mask, compat_maskp, high_slices);
>   		addr = slice_find_area(mm, len, &potential_mask,
>   				       psize, topdown, high_limit);
>   	}
> @@ -643,17 +652,17 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>   	if (addr == -ENOMEM)
>   		return -ENOMEM;
>   
> -	slice_range_to_mask(addr, len, &potential_mask);
> +	slice_range_to_mask(addr, len, &potential_mask, high_slices);
>   	slice_dbg(" found potential area at 0x%lx\n", addr);
>   	slice_print_mask(" mask", &potential_mask);
>   
>    convert:
> -	slice_andnot_mask(&potential_mask, &potential_mask, &good_mask);
> +	slice_andnot_mask(&potential_mask, &potential_mask, &good_mask, high_slices);
>   	if (compat_maskp && !fixed)
> -		slice_andnot_mask(&potential_mask, &potential_mask, compat_maskp);
> +		slice_andnot_mask(&potential_mask, &potential_mask, compat_maskp, high_slices);
>   	if (potential_mask.low_slices ||
>   		(SLICE_NUM_HIGH &&
> -		 !bitmap_empty(potential_mask.high_slices, SLICE_NUM_HIGH))) {
> +		 !bitmap_empty(potential_mask.high_slices, high_slices))) {

Are we sure high_slices is not zero here when SLICE_NUM_HIGH is not zero?

Christophe

>   		slice_convert(mm, &potential_mask, psize);
>   		if (psize > MMU_PAGE_BASE)
>   			on_each_cpu(slice_flush_segments, mm, 1);
> @@ -727,7 +736,9 @@ void slice_init_new_context_exec(struct mm_struct *mm)
>   	mm->context.user_psize = psize;
>   
>   	/*
> -	 * Set all slice psizes to the default.
> +	 * Set all slice psizes to the default. High slices could
> +	 * be initialised up to slb_addr_limit if we ensure to
> +	 * initialise the rest of them as slb_addr_limit is expanded.
>   	 */
>   	lpsizes = mm->context.low_slices_psize;
>   	memset(lpsizes, (psize << 4) | psize, SLICE_NUM_LOW >> 1);
> @@ -748,10 +759,12 @@ void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
>   			   unsigned long len, unsigned int psize)
>   {
>   	struct slice_mask mask;
> +	unsigned long high_slices;
>   
>   	VM_BUG_ON(radix_enabled());
>   
> -	slice_range_to_mask(start, len, &mask);
> +	high_slices = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
> +	slice_range_to_mask(start, len, &mask, high_slices);
>   	slice_convert(mm, &mask, psize);
>   }
>   
> @@ -790,9 +803,11 @@ int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
>   	if (psize == MMU_PAGE_64K) {
>   		const struct slice_mask *compat_maskp;
>   		struct slice_mask available;
> +		unsigned long high_slices;
>   
>   		compat_maskp = slice_mask_for_size(mm, MMU_PAGE_4K);
> -		slice_or_mask(&available, maskp, compat_maskp);
> +		high_slices = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
> +		slice_or_mask(&available, maskp, compat_maskp, high_slices);
>   		return !slice_check_range_fits(mm, &available, addr, len);
>   	}
>   #endif
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 10/10] powerpc/mm/slice: remove radix calls to the slice code
  2018-03-06 13:25 ` [PATCH 10/10] powerpc/mm/slice: remove radix calls to the slice code Nicholas Piggin
@ 2018-03-06 15:12   ` Christophe LEROY
  2018-03-06 23:35     ` Nicholas Piggin
  0 siblings, 1 reply; 31+ messages in thread
From: Christophe LEROY @ 2018-03-06 15:12 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Aneesh Kumar K . V



Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :
> This is a tidy up which removes radix MMU calls into the slice
> code.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/include/asm/hugetlb.h |  9 ++++++---
>   arch/powerpc/mm/hugetlbpage.c      |  5 +++--
>   arch/powerpc/mm/slice.c            | 17 ++++-------------
>   3 files changed, 13 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
> index 1a4847f67ea8..59885d444695 100644
> --- a/arch/powerpc/include/asm/hugetlb.h
> +++ b/arch/powerpc/include/asm/hugetlb.h
> @@ -90,16 +90,19 @@ pte_t *huge_pte_offset_and_shift(struct mm_struct *mm,
>   void flush_dcache_icache_hugepage(struct page *page);
>   
>   #if defined(CONFIG_PPC_MM_SLICES)
> -int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
> +int slice_is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
>   			   unsigned long len);
> -#else
> +#endif
>   static inline int is_hugepage_only_range(struct mm_struct *mm,
>   					 unsigned long addr,
>   					 unsigned long len)
>   {
> +#if defined(CONFIG_PPC_MM_SLICES)
> +	if (!radix_enabled())
> +		return slice_is_hugepage_only_range(mm, addr, len);
> +#endif
>   	return 0;

Might be easier to understand as

	if (!IS_ENABLED(CONFIG_PPC_MM_SLICES) || radix_enabled())
		return 0;
	return slice_is_hugepage_only_range(mm, addr, len);


>   }
> -#endif
>   
>   void book3e_hugetlb_preload(struct vm_area_struct *vma, unsigned long ea,
>   			    pte_t pte);
> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
> index 590be3fa0ce2..b29d40889d1c 100644
> --- a/arch/powerpc/mm/hugetlbpage.c
> +++ b/arch/powerpc/mm/hugetlbpage.c
> @@ -565,10 +565,11 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>   unsigned long vma_mmu_pagesize(struct vm_area_struct *vma)
>   {
>   #ifdef CONFIG_PPC_MM_SLICES
> -	unsigned int psize = get_slice_psize(vma->vm_mm, vma->vm_start);
>   	/* With radix we don't use slice, so derive it from vma*/
> -	if (!radix_enabled())
> +	if (!radix_enabled()) {
> +		unsigned int psize = get_slice_psize(vma->vm_mm, vma->vm_start);

Insert a blank line here.

Christophe

>   		return 1UL << mmu_psize_to_shift(psize);
> +	}
>   #endif
>   	if (!is_vm_hugetlb_page(vma))
>   		return PAGE_SIZE;
> diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
> index 507d17e2cfcd..15a857772617 100644
> --- a/arch/powerpc/mm/slice.c
> +++ b/arch/powerpc/mm/slice.c
> @@ -697,16 +697,8 @@ unsigned int get_slice_psize(struct mm_struct *mm, unsigned long addr)
>   	unsigned char *psizes;
>   	int index, mask_index;
>   
> -	/*
> -	 * Radix doesn't use slice, but can get enabled along with MMU_SLICE
> -	 */
> -	if (radix_enabled()) {
> -#ifdef CONFIG_PPC_64K_PAGES
> -		return MMU_PAGE_64K;
> -#else
> -		return MMU_PAGE_4K;
> -#endif
> -	}
> +	VM_BUG_ON(radix_enabled());
> +
>   	if (addr < SLICE_LOW_TOP) {
>   		psizes = mm->context.low_slices_psize;
>   		index = GET_LOW_SLICE_INDEX(addr);
> @@ -788,14 +780,13 @@ void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
>    * for now as we only use slices with hugetlbfs enabled. This should
>    * be fixed as the generic code gets fixed.
>    */
> -int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
> +int slice_is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
>   			   unsigned long len)
>   {
>   	const struct slice_mask *maskp;
>   	unsigned int psize = mm->context.user_psize;
>   
> -	if (radix_enabled())
> -		return 0;
> +	VM_BUG_ON(radix_enabled());
>   
>   	maskp = slice_mask_for_size(mm, psize);
>   #ifdef CONFIG_PPC_64K_PAGES
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 06/10] powerpc/mm/slice: implement slice_check_range_fits
  2018-03-06 14:41   ` Christophe LEROY
@ 2018-03-06 23:12     ` Nicholas Piggin
  2018-03-07  6:12       ` Christophe LEROY
  0 siblings, 1 reply; 31+ messages in thread
From: Nicholas Piggin @ 2018-03-06 23:12 UTC (permalink / raw)
  To: Christophe LEROY; +Cc: linuxppc-dev, Aneesh Kumar K . V

On Tue, 6 Mar 2018 15:41:00 +0100
Christophe LEROY <christophe.leroy@c-s.fr> wrote:

> Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :


> > +static bool slice_check_range_fits(struct mm_struct *mm,
> > +			   const struct slice_mask *available,
> > +			   unsigned long start, unsigned long len)
> >   {
> > -	DECLARE_BITMAP(result, SLICE_NUM_HIGH);
> > -	/*
> > -	 * Make sure we just do bit compare only to the max
> > -	 * addr limit and not the full bit map size.
> > -	 */
> > -	unsigned long slice_count = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
> > +	unsigned long end = start + len - 1;
> > +	u64 low_slices = 0;
> >   
> > -	if (!SLICE_NUM_HIGH)
> > -		return (mask->low_slices & available->low_slices) ==
> > -		       mask->low_slices;
> > +	if (start < SLICE_LOW_TOP) {
> > +		unsigned long mend = min(end, (SLICE_LOW_TOP - 1));  
> 
> See slice_range_to_mask()
> 
> You'll have an issue here with PPC32, you have to cast (SLICE_LOW_TOP - 
> 1) to unsigned long because SLICE_LOW_TOP is unsigned long long on PPC32

Okay thanks. Forgot to cross-compile it on 8xx, so I'll do that next
time.

> > +
> > +		low_slices = (1u << (GET_LOW_SLICE_INDEX(mend) + 1))
> > +				- (1u << GET_LOW_SLICE_INDEX(start));
> > +	}
> > +	if ((low_slices & available->low_slices) != low_slices)
> > +		return false;
> > +
> > +	if (SLICE_NUM_HIGH && ((start + len) > SLICE_LOW_TOP)) {
> > +		unsigned long start_index = GET_HIGH_SLICE_INDEX(start);
> > +		unsigned long align_end = ALIGN(end, (1UL << SLICE_HIGH_SHIFT));
> > +		unsigned long count = GET_HIGH_SLICE_INDEX(align_end) - start_index;
> > +		unsigned long i;
> >   
> > -	bitmap_and(result, mask->high_slices,
> > -		   available->high_slices, slice_count);
> > +		for (i = start_index; i < start_index + count; i++) {
> > +			if (!test_bit(i, available->high_slices))
> > +				return false;
> > +		}  
> 
> What about using bitmap_find_next_zero_area()

I'll look at it. Perhaps in another patch, because existing
loops are not using bitmap range operations either. A series
to convert those is a good idea.
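
Just to sketch the idea (untested, and using find_next_zero_bit()
rather than bitmap_find_next_zero_area(), since here we only need to
know whether any bit in the range is clear), the high-slice loop could
become something like:

	if (find_next_zero_bit(available->high_slices,
			       start_index + count,
			       start_index) < start_index + count)
		return false;

i.e. fail as soon as one slice in the requested range is not marked
available.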

> > @@ -562,15 +571,11 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
> >   #endif
> >   
> >   	/* First check hint if it's valid or if we have MAP_FIXED */
> > -	if (addr != 0 || fixed) {
> > -		/* Build a mask for the requested range */
> > -		slice_range_to_mask(addr, len, &mask);
> > -		slice_print_mask(" mask", &mask);
> > -
> > +	if (addr || fixed) {  
> 
> It is cleanup, should it really be part of this patch ?



> > @@ -596,10 +601,11 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
> >   	slice_or_mask(&potential_mask, &good_mask);
> >   	slice_print_mask(" potential", &potential_mask);
> >   
> > -	if ((addr != 0 || fixed) &&
> > -			slice_check_fit(mm, &mask, &potential_mask)) {
> > -		slice_dbg(" fits potential !\n");
> > -		goto convert;
> > +	if (addr || fixed) {
> > +		if (slice_check_range_fits(mm, &potential_mask, addr, len)) {
> > +			slice_dbg(" fits potential !\n");
> > +			goto convert;
> > +		}  
> 
> Why not keep the original structure and just replace slice_check_fit()
> by slice_check_range_fits() ?
> 
> I believe cleanups should not be mixed with real feature changes. If 
> needed, you should have a cleanup patch at the front of the series.

For code that is already changing, I think minor cleanups are okay if
they're very simple. Maybe this is getting to the point of needing
another patch. You've made valid points for a lot of other unnecessary
cleanups though, so I'll fix all of those.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 07/10] powerpc/mm/slice: Switch to 3-operand slice bitops helpers
  2018-03-06 14:44   ` Christophe LEROY
@ 2018-03-06 23:19     ` Nicholas Piggin
  0 siblings, 0 replies; 31+ messages in thread
From: Nicholas Piggin @ 2018-03-06 23:19 UTC (permalink / raw)
  To: Christophe LEROY; +Cc: linuxppc-dev, Aneesh Kumar K . V

On Tue, 6 Mar 2018 15:44:46 +0100
Christophe LEROY <christophe.leroy@c-s.fr> wrote:

> Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :
> > This converts the slice_mask bit operation helpers to be the usual
> > 3-operand kind, which is clearer to work with.  
> 
> What's the real benefit of doing that ?

One or two places where we want to combine 2 const input bitmaps
and can avoid the extra copy in the next patch.

E.g., slice_or_mask(&good_mask, maskp, compat_maskp);
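
For comparison, with the 2-operand helpers the same thing needs a copy
first, roughly:

	good_mask = *maskp;
	slice_or_mask(&good_mask, compat_maskp);

so the 3-operand form drops the struct copy and lets both inputs stay
const.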

> 
> If it helps a subsequent patch, say so.

Fair point, I'll add to the changelog.

> > @@ -433,25 +433,33 @@ static unsigned long slice_find_area(struct mm_struct *mm, unsigned long len,
> >   		return slice_find_area_bottomup(mm, len, mask, psize, high_limit);
> >   }
> >   
> > -static inline void slice_or_mask(struct slice_mask *dst,
> > +static inline void slice_copy_mask(struct slice_mask *dst,
> >   					const struct slice_mask *src)  
> 
> This new function is not used; the compiler will probably not be happy.

I think it doesn't get so grumpy with inlines (otherwise we'd have a lot
of problems with headers). Usually I think the new function should go into
the patch where it's first used, but this being a self-contained helper,
I thought it fit better to add it here. Maybe I'm wrong; I can move it.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 09/10] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations
  2018-03-06 15:02   ` Christophe LEROY
@ 2018-03-06 23:32     ` Nicholas Piggin
  0 siblings, 0 replies; 31+ messages in thread
From: Nicholas Piggin @ 2018-03-06 23:32 UTC (permalink / raw)
  To: Christophe LEROY; +Cc: linuxppc-dev, Aneesh Kumar K . V

On Tue, 6 Mar 2018 16:02:20 +0100
Christophe LEROY <christophe.leroy@c-s.fr> wrote:

> Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :
> > The number of high slices a process might use now depends on its
> > address space size, and what allocation address it has requested.
> > 
> > This patch uses that limit throughout call chains where possible,
> > rather than use the fixed SLICE_NUM_HIGH for bitmap operations.
> > This saves some cost for processes that don't use very large address
> > spaces.
> > 
> > Performance numbers aren't changed significantly, this may change
> > with larger address spaces or different mmap access patterns that
> > require more slice mask building.
> > 
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> >   arch/powerpc/mm/slice.c | 75 +++++++++++++++++++++++++++++--------------------
> >   1 file changed, 45 insertions(+), 30 deletions(-)
> > 
> > diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
> > index 086c31b8b982..507d17e2cfcd 100644
> > --- a/arch/powerpc/mm/slice.c
> > +++ b/arch/powerpc/mm/slice.c
> > @@ -61,14 +61,12 @@ static void slice_print_mask(const char *label, const struct slice_mask *mask) {
> >   #endif
> >   
> >   static void slice_range_to_mask(unsigned long start, unsigned long len,
> > -				struct slice_mask *ret)
> > +				struct slice_mask *ret,
> > +				unsigned long high_slices)
> >   {
> >   	unsigned long end = start + len - 1;
> >   
> >   	ret->low_slices = 0;
> > -	if (SLICE_NUM_HIGH)
> > -		bitmap_zero(ret->high_slices, SLICE_NUM_HIGH);
> > -
> >   	if (start < SLICE_LOW_TOP) {
> >   		unsigned long mend = min(end,
> >   					 (unsigned long)(SLICE_LOW_TOP - 1));
> > @@ -77,6 +75,10 @@ static void slice_range_to_mask(unsigned long start, unsigned long len,
> >   			- (1u << GET_LOW_SLICE_INDEX(start));
> >   	}
> >   
> > +	if (!SLICE_NUM_HIGH)
> > +		return;
> > +
> > +	bitmap_zero(ret->high_slices, high_slices);  
> 
> In include/linux/bitmap.h, it is said:
> 
>   * Note that nbits should be always a compile time evaluable constant.
>   * Otherwise many inlines will generate horrible code.
> 
> Not sure that's true, but it is written ...

Good question, I'll check that.
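
My understanding (from memory of include/linux/bitmap.h, so take this
as a sketch rather than gospel) is that the advice is about the
small_const_nbits() shortcut, e.g. bitmap_zero() is roughly:

	static inline void bitmap_zero(unsigned long *dst, unsigned int nbits)
	{
		if (small_const_nbits(nbits))
			*dst = 0UL;
		else
			memset(dst, 0, BITS_TO_LONGS(nbits) * sizeof(unsigned long));
	}

so a non-constant nbits just means always taking the memset / out of
line path, which should be slightly less optimal rather than "horrible".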

> >   static inline void slice_or_mask(struct slice_mask *dst,
> >   					const struct slice_mask *src1,
> > -					const struct slice_mask *src2)
> > +					const struct slice_mask *src2,
> > +					unsigned long high_slices)
> >   {
> >   	dst->low_slices = src1->low_slices | src2->low_slices;
> >   	if (!SLICE_NUM_HIGH)
> >   		return;
> > -	bitmap_or(dst->high_slices, src1->high_slices, src2->high_slices, SLICE_NUM_HIGH);
> > +	bitmap_or(dst->high_slices, src1->high_slices, src2->high_slices,
> > +			high_slices);  
> 
> Why a new line here? This line is shorter than before.
> Or was that forgotten in a previous patch?

Yeah it was previously a longer line. I will fix those.

> > @@ -643,17 +652,17 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
> >   	if (addr == -ENOMEM)
> >   		return -ENOMEM;
> >   
> > -	slice_range_to_mask(addr, len, &potential_mask);
> > +	slice_range_to_mask(addr, len, &potential_mask, high_slices);
> >   	slice_dbg(" found potential area at 0x%lx\n", addr);
> >   	slice_print_mask(" mask", &potential_mask);
> >   
> >    convert:
> > -	slice_andnot_mask(&potential_mask, &potential_mask, &good_mask);
> > +	slice_andnot_mask(&potential_mask, &potential_mask, &good_mask, high_slices);
> >   	if (compat_maskp && !fixed)
> > -		slice_andnot_mask(&potential_mask, &potential_mask, compat_maskp);
> > +		slice_andnot_mask(&potential_mask, &potential_mask, compat_maskp, high_slices);
> >   	if (potential_mask.low_slices ||
> >   		(SLICE_NUM_HIGH &&
> > -		 !bitmap_empty(potential_mask.high_slices, SLICE_NUM_HIGH))) {
> > +		 !bitmap_empty(potential_mask.high_slices, high_slices))) {  
> 
> Are we sure high_slices is not zero here when SLICE_NUM_HIGH is not zero?

On 64s it should be non-zero for 64-bit processes, but perhaps not for
32-bit ones. I have to look into that, so another good catch.
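
(Back-of-the-envelope, assuming SLICE_HIGH_SHIFT is 40 on Book3S: a
32-bit task has slb_addr_limit around 4GB, and 4GB >> 40 == 0, so
GET_HIGH_SLICE_INDEX() would give high_slices == 0 there, which is
exactly the case to worry about.)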

Perhaps I will leave this patch off the series for now because I didn't
measure much difference. Aneesh wants to expand address space even more,
so I might revisit after his patches go in, to see if the optimisation
becomes worthwhile.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 08/10] powerpc/mm/slice: Use const pointers to cached slice masks where possible
  2018-03-06 14:55   ` Christophe LEROY
@ 2018-03-06 23:33     ` Nicholas Piggin
  0 siblings, 0 replies; 31+ messages in thread
From: Nicholas Piggin @ 2018-03-06 23:33 UTC (permalink / raw)
  To: Christophe LEROY; +Cc: linuxppc-dev, Aneesh Kumar K . V

On Tue, 6 Mar 2018 15:55:04 +0100
Christophe LEROY <christophe.leroy@c-s.fr> wrote:

> Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :

> > @@ -572,11 +555,19 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
> >   #ifdef CONFIG_PPC_64K_PAGES
> >   	/* If we support combo pages, we can allow 64k pages in 4k slices */
> >   	if (psize == MMU_PAGE_64K) {
> > -		compat_mask = *slice_mask_for_size(mm, MMU_PAGE_4K);
> > +		compat_maskp = slice_mask_for_size(mm, MMU_PAGE_4K);
> >   		if (fixed)
> > -			slice_or_mask(&good_mask, &good_mask, &compat_mask);
> > -	}
> > +			slice_or_mask(&good_mask, maskp, compat_maskp);
> > +		else
> > +			slice_copy_mask(&good_mask, maskp);
> > +	} else
> >   #endif
> > +	{
> > +		slice_copy_mask(&good_mask, maskp);
> > +	}  
> 
> You could get something nicer by removing that #ifdef and doing instead:
> 
> 	if (IS_ENABLED(CONFIG_PPC_64K_PAGES) && psize == MMU_PAGE_64K) {
> 		...
> 	} else {
> 		slice_copy_mask(&good_mask, maskp);
> 	}

Yeah that's nicer.

> >   
> > -#if 0 /* too verbose */
> > -	slice_dbg("is_hugepage_only_range(mm=%p, addr=%lx, len=%lx)\n",
> > -		 mm, addr, len);
> > -	slice_print_mask(" mask", &mask);
> > -	slice_print_mask(" available", &available);
> > -#endif  
> 
> That's cleanup, should be in a previous patch.

Okay.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 10/10] powerpc/mm/slice: remove radix calls to the slice code
  2018-03-06 15:12   ` Christophe LEROY
@ 2018-03-06 23:35     ` Nicholas Piggin
  0 siblings, 0 replies; 31+ messages in thread
From: Nicholas Piggin @ 2018-03-06 23:35 UTC (permalink / raw)
  To: Christophe LEROY; +Cc: linuxppc-dev, Aneesh Kumar K . V

On Tue, 6 Mar 2018 16:12:34 +0100
Christophe LEROY <christophe.leroy@c-s.fr> wrote:

> Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :
> > This is a tidy up which removes radix MMU calls into the slice
> > code.
> > 
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> >   arch/powerpc/include/asm/hugetlb.h |  9 ++++++---
> >   arch/powerpc/mm/hugetlbpage.c      |  5 +++--
> >   arch/powerpc/mm/slice.c            | 17 ++++-------------
> >   3 files changed, 13 insertions(+), 18 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
> > index 1a4847f67ea8..59885d444695 100644
> > --- a/arch/powerpc/include/asm/hugetlb.h
> > +++ b/arch/powerpc/include/asm/hugetlb.h
> > @@ -90,16 +90,19 @@ pte_t *huge_pte_offset_and_shift(struct mm_struct *mm,
> >   void flush_dcache_icache_hugepage(struct page *page);
> >   
> >   #if defined(CONFIG_PPC_MM_SLICES)
> > -int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
> > +int slice_is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
> >   			   unsigned long len);
> > -#else
> > +#endif
> >   static inline int is_hugepage_only_range(struct mm_struct *mm,
> >   					 unsigned long addr,
> >   					 unsigned long len)
> >   {
> > +#if defined(CONFIG_PPC_MM_SLICES)
> > +	if (!radix_enabled())
> > +		return slice_is_hugepage_only_range(mm, addr, len);
> > +#endif
> >   	return 0;  
> 
> Might be easier to understand as
> 
> 	if (!IS_ENABLED(CONFIG_PPC_MM_SLICES) || radix_enabled())
> 		return 0;
> 	return slice_is_hugepage_only_range(mm, addr, len);

Yep.


> >   unsigned long vma_mmu_pagesize(struct vm_area_struct *vma)
> >   {
> >   #ifdef CONFIG_PPC_MM_SLICES
> > -	unsigned int psize = get_slice_psize(vma->vm_mm, vma->vm_start);
> >   	/* With radix we don't use slice, so derive it from vma*/
> > -	if (!radix_enabled())
> > +	if (!radix_enabled()) {
> > +		unsigned int psize = get_slice_psize(vma->vm_mm, vma->vm_start);  
> 
> Insert a blank line here.

Okay.

Thanks for the review, it's really appreciated.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 06/10] powerpc/mm/slice: implement slice_check_range_fits
  2018-03-06 23:12     ` Nicholas Piggin
@ 2018-03-07  6:12       ` Christophe LEROY
  2018-03-07  7:16         ` Nicholas Piggin
  0 siblings, 1 reply; 31+ messages in thread
From: Christophe LEROY @ 2018-03-07  6:12 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, Aneesh Kumar K . V



Le 07/03/2018 à 00:12, Nicholas Piggin a écrit :
> On Tue, 6 Mar 2018 15:41:00 +0100
> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
> 
>> Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :
> 
> 
>>> +static bool slice_check_range_fits(struct mm_struct *mm,
>>> +			   const struct slice_mask *available,
>>> +			   unsigned long start, unsigned long len)
>>>    {
>>> -	DECLARE_BITMAP(result, SLICE_NUM_HIGH);
>>> -	/*
>>> -	 * Make sure we just do bit compare only to the max
>>> -	 * addr limit and not the full bit map size.
>>> -	 */
>>> -	unsigned long slice_count = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
>>> +	unsigned long end = start + len - 1;
>>> +	u64 low_slices = 0;
>>>    
>>> -	if (!SLICE_NUM_HIGH)
>>> -		return (mask->low_slices & available->low_slices) ==
>>> -		       mask->low_slices;
>>> +	if (start < SLICE_LOW_TOP) {
>>> +		unsigned long mend = min(end, (SLICE_LOW_TOP - 1));
>>
>> See slice_range_to_mask()
>>
>> You'll have an issue here with PPC32, you have to cast (SLICE_LOW_TOP -
>> 1) to unsigned long because SLICE_LOW_TOP is unsigned long long on PPC32
> 
> Okay thanks. Forgot to cross-compile it on 8xx, so I'll do that next
> time.
> 
>>> +
>>> +		low_slices = (1u << (GET_LOW_SLICE_INDEX(mend) + 1))
>>> +				- (1u << GET_LOW_SLICE_INDEX(start));
>>> +	}
>>> +	if ((low_slices & available->low_slices) != low_slices)
>>> +		return false;
>>> +
>>> +	if (SLICE_NUM_HIGH && ((start + len) > SLICE_LOW_TOP)) {
>>> +		unsigned long start_index = GET_HIGH_SLICE_INDEX(start);
>>> +		unsigned long align_end = ALIGN(end, (1UL << SLICE_HIGH_SHIFT));
>>> +		unsigned long count = GET_HIGH_SLICE_INDEX(align_end) - start_index;
>>> +		unsigned long i;
>>>    
>>> -	bitmap_and(result, mask->high_slices,
>>> -		   available->high_slices, slice_count);
>>> +		for (i = start_index; i < start_index + count; i++) {
>>> +			if (!test_bit(i, available->high_slices))
>>> +				return false;
>>> +		}
>>
>> What about using bitmap_find_next_zero_area()
> 
> I'll look at it. Perhaps in another patch, because existing
> loops are not using bitmap range operations either. A series
> to convert those is a good idea.
> 
>>> @@ -562,15 +571,11 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>>>    #endif
>>>    
>>>    	/* First check hint if it's valid or if we have MAP_FIXED */
>>> -	if (addr != 0 || fixed) {
>>> -		/* Build a mask for the requested range */
>>> -		slice_range_to_mask(addr, len, &mask);
>>> -		slice_print_mask(" mask", &mask);
>>> -
>>> +	if (addr || fixed) {
>>
>> It is cleanup, should it really be part of this patch ?
> 
> 
> 
>>> @@ -596,10 +601,11 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>>>    	slice_or_mask(&potential_mask, &good_mask);
>>>    	slice_print_mask(" potential", &potential_mask);
>>>    
>>> -	if ((addr != 0 || fixed) &&
>>> -			slice_check_fit(mm, &mask, &potential_mask)) {
>>> -		slice_dbg(" fits potential !\n");
>>> -		goto convert;
>>> +	if (addr || fixed) {
>>> +		if (slice_check_range_fits(mm, &potential_mask, addr, len)) {
>>> +			slice_dbg(" fits potential !\n");
>>> +			goto convert;
>>> +		}
>>
>> Why not keep the original structure and just replace slice_check_fit()
>> by slice_check_range_fits() ?
>>
>> I believe cleanups should not be mixed with real feature changes. If
>> needed, you should have a cleanup patch at the front of the series.
> 
> For code that is already changing, I think minor cleanups are okay if
> they're very simple. Maybe this is getting to the point of needing
> another patch. You've made valid points for a lot of other unnecessary
> cleanups though, so I'll fix all of those.

Ok, that's not a big point, but I like it when patches really modify
only the lines they need to modify. Why do we need a double if ?

Why not just the following ? With proper alignment of the second line 
with the open parenthesis, it fits on one line

	if ((addr != 0 || fixed) &&
-			slice_check_fit(mm, &mask, &potential_mask)) {
+	    slice_check_range_fits(mm, &potential_mask, addr, len)) {
  		slice_dbg(" fits potential !\n");
  		goto convert;
	}


Christophe

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 06/10] powerpc/mm/slice: implement slice_check_range_fits
  2018-03-07  6:12       ` Christophe LEROY
@ 2018-03-07  7:16         ` Nicholas Piggin
  2018-03-07 13:38           ` Christophe LEROY
  0 siblings, 1 reply; 31+ messages in thread
From: Nicholas Piggin @ 2018-03-07  7:16 UTC (permalink / raw)
  To: Christophe LEROY; +Cc: linuxppc-dev, Aneesh Kumar K . V

On Wed, 7 Mar 2018 07:12:23 +0100
Christophe LEROY <christophe.leroy@c-s.fr> wrote:

> Le 07/03/2018 à 00:12, Nicholas Piggin a écrit :
> > On Tue, 6 Mar 2018 15:41:00 +0100
> > Christophe LEROY <christophe.leroy@c-s.fr> wrote:
> >   
> >> Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :  

> >>> @@ -596,10 +601,11 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
> >>>    	slice_or_mask(&potential_mask, &good_mask);
> >>>    	slice_print_mask(" potential", &potential_mask);
> >>>    
> >>> -	if ((addr != 0 || fixed) &&
> >>> -			slice_check_fit(mm, &mask, &potential_mask)) {
> >>> -		slice_dbg(" fits potential !\n");
> >>> -		goto convert;
> >>> +	if (addr || fixed) {
> >>> +		if (slice_check_range_fits(mm, &potential_mask, addr, len)) {
> >>> +			slice_dbg(" fits potential !\n");
> >>> +			goto convert;
> >>> +		}  
> >>
> >> Why not keep the original structure and just replace slice_check_fit()
> >> by slice_check_range_fits() ?
> >>
> >> I believe cleanups should not be mixed with real feature changes. If
> >> needed, you should have a cleanup patch at the front of the series.
> > 
> > For code that is already changing, I think minor cleanups are okay if
> > they're very simple. Maybe this is getting to the point of needing
> > another patch. You've made valid points for a lot of other unnecessary
> > cleanups though, so I'll fix all of those.  
> 
> Ok, that's not a big point, but I like it when patches really modify
> only the lines they need to modify.

Fair point, and in the end I agree they mostly should do that. But not
entirely, if you can make the code slightly better as you go (again, so
long as the change is obvious). I think having extra patches for
trivial cleanups is not that great either.

> Why do we need a double if ?
> 
> Why not just the following ? With proper alignment of the second line 
> with the open parenthesis, it fits on one line
> 
> 	if ((addr != 0 || fixed) &&
> -			slice_check_fit(mm, &mask, &potential_mask)) {
> +	    slice_check_range_fits(mm, &potential_mask, addr, len)) {
>   		slice_dbg(" fits potential !\n");
>   		goto convert;

For this case the main motivation was to make this test match the
form of the same test (with different mask) above here. Doing the
same thing with different coding styles annoys me.

I think I kept this one but fixed all your other suggestions in
the v2 series.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 06/10] powerpc/mm/slice: implement slice_check_range_fits
  2018-03-07  7:16         ` Nicholas Piggin
@ 2018-03-07 13:38           ` Christophe LEROY
  0 siblings, 0 replies; 31+ messages in thread
From: Christophe LEROY @ 2018-03-07 13:38 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, Aneesh Kumar K . V



Le 07/03/2018 à 08:16, Nicholas Piggin a écrit :
> On Wed, 7 Mar 2018 07:12:23 +0100
> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
> 
>> Le 07/03/2018 à 00:12, Nicholas Piggin a écrit :
>>> On Tue, 6 Mar 2018 15:41:00 +0100
>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
>>>    
>>>> Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :
> 
>>>>> @@ -596,10 +601,11 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
>>>>>     	slice_or_mask(&potential_mask, &good_mask);
>>>>>     	slice_print_mask(" potential", &potential_mask);
>>>>>     
>>>>> -	if ((addr != 0 || fixed) &&
>>>>> -			slice_check_fit(mm, &mask, &potential_mask)) {
>>>>> -		slice_dbg(" fits potential !\n");
>>>>> -		goto convert;
>>>>> +	if (addr || fixed) {
>>>>> +		if (slice_check_range_fits(mm, &potential_mask, addr, len)) {
>>>>> +			slice_dbg(" fits potential !\n");
>>>>> +			goto convert;
>>>>> +		}
>>>>
>>>> Why not keep the original structure and just replace slice_check_fit()
>>>> by slice_check_range_fits() ?
>>>>
>>>> I believe cleanups should not be mixed with real feature changes. If
>>>> needed, you should have a cleanup patch at the front of the series.
>>>
>>> For code that is already changing, I think minor cleanups are okay if
>>> they're very simple. Maybe this is getting to the point of needing
>>> another patch. You've made valid points for a lot of other unnecessary
>>> cleanups though, so I'll fix all of those.
>>
>> Ok, that's not a big point, but I like it when patches really modify
>> only the lines they need to modify.
> 
> Fair point, and in the end I agree they mostly should do that. But not
> entirely, if you can make the code slightly better as you go (again, so
> long as the change is obvious). I think having extra patches for
> trivial cleanups is not that great either.
> 
>> Why do we need a double if ?
>>
>> Why not just the following ? With proper alignment of the second line
>> with the open parenthesis, it fits on one line
>>
>> 	if ((addr != 0 || fixed) &&
>> -			slice_check_fit(mm, &mask, &potential_mask)) {
>> +	    slice_check_range_fits(mm, &potential_mask, addr, len)) {
>>    		slice_dbg(" fits potential !\n");
>>    		goto convert;
> 
> For this case the main motivation was to make this test match the
> form of the same test (with different mask) above here. Doing the
> same thing with different coding styles annoys me.

Yes good point.

Christophe

> 
> I think I kept this one but fixed all your other suggestions in
> the v2 series.
> 
> Thanks,
> Nick
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [01/10] selftests/powerpc: add process creation benchmark
  2018-03-06 13:24 ` [PATCH 01/10] selftests/powerpc: add process creation benchmark Nicholas Piggin
@ 2018-03-19 22:23   ` Michael Ellerman
  2018-03-20 10:15   ` Michael Ellerman
  1 sibling, 0 replies; 31+ messages in thread
From: Michael Ellerman @ 2018-03-19 22:23 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Aneesh Kumar K . V, Nicholas Piggin

On Tue, 2018-03-06 at 13:24:58 UTC, Nicholas Piggin wrote:
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/838fd21b1bde8ed16e64289a8c7467

cheers

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [01/10] selftests/powerpc: add process creation benchmark
  2018-03-06 13:24 ` [PATCH 01/10] selftests/powerpc: add process creation benchmark Nicholas Piggin
  2018-03-19 22:23   ` [01/10] " Michael Ellerman
@ 2018-03-20 10:15   ` Michael Ellerman
  1 sibling, 0 replies; 31+ messages in thread
From: Michael Ellerman @ 2018-03-20 10:15 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Aneesh Kumar K . V, Nicholas Piggin

On Tue, 2018-03-06 at 13:24:58 UTC, Nicholas Piggin wrote:
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/dd40c5b4c90d84d30cdb452c2d193d

cheers

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2018-03-20 10:15 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-06 13:24 [PATCH 00/10] powerpc/mm/slice: improve slice speed and stack use Nicholas Piggin
2018-03-06 13:24 ` [PATCH 01/10] selftests/powerpc: add process creation benchmark Nicholas Piggin
2018-03-19 22:23   ` [01/10] " Michael Ellerman
2018-03-20 10:15   ` Michael Ellerman
2018-03-06 13:24 ` [PATCH 02/10] powerpc/mm/slice: Simplify and optimise slice context initialisation Nicholas Piggin
2018-03-06 14:32   ` Nicholas Piggin
2018-03-06 13:25 ` [PATCH 03/10] powerpc/mm/slice: tidy lpsizes and hpsizes update loops Nicholas Piggin
2018-03-06 13:25 ` [PATCH 04/10] powerpc/mm/slice: pass pointers to struct slice_mask where possible Nicholas Piggin
2018-03-06 13:43   ` Christophe LEROY
2018-03-06 13:59     ` Nicholas Piggin
2018-03-06 13:25 ` [PATCH 05/10] powerpc/mm/slice: implement a slice mask cache Nicholas Piggin
2018-03-06 13:49   ` Christophe LEROY
2018-03-06 14:01     ` Nicholas Piggin
2018-03-06 13:25 ` [PATCH 06/10] powerpc/mm/slice: implement slice_check_range_fits Nicholas Piggin
2018-03-06 14:41   ` Christophe LEROY
2018-03-06 23:12     ` Nicholas Piggin
2018-03-07  6:12       ` Christophe LEROY
2018-03-07  7:16         ` Nicholas Piggin
2018-03-07 13:38           ` Christophe LEROY
2018-03-06 13:25 ` [PATCH 07/10] powerpc/mm/slice: Switch to 3-operand slice bitops helpers Nicholas Piggin
2018-03-06 14:44   ` Christophe LEROY
2018-03-06 23:19     ` Nicholas Piggin
2018-03-06 13:25 ` [PATCH 08/10] powerpc/mm/slice: Use const pointers to cached slice masks where possible Nicholas Piggin
2018-03-06 14:55   ` Christophe LEROY
2018-03-06 23:33     ` Nicholas Piggin
2018-03-06 13:25 ` [PATCH 09/10] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations Nicholas Piggin
2018-03-06 15:02   ` Christophe LEROY
2018-03-06 23:32     ` Nicholas Piggin
2018-03-06 13:25 ` [PATCH 10/10] powerpc/mm/slice: remove radix calls to the slice code Nicholas Piggin
2018-03-06 15:12   ` Christophe LEROY
2018-03-06 23:35     ` Nicholas Piggin
