linux-kernel.vger.kernel.org archive mirror
* [PATCH RESEND v1 0/4] powerpc/vdso: Add support for time namespaces
@ 2021-03-31 16:48 Christophe Leroy
  2021-03-31 16:48 ` [PATCH RESEND v1 1/4] lib/vdso: Mark do_hres_timens() and do_coarse_timens() __always_inline() Christophe Leroy
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Christophe Leroy @ 2021-03-31 16:48 UTC
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, dima, avagin, arnd, tglx,
	vincenzo.frascino, luto, linux-arch

[Sorry, resending with the complete destination list; I used the wrong script on the first delivery.]

This series adds support for time namespaces on powerpc.

All timens selftests pass.

Christophe Leroy (3):
  lib/vdso: Mark do_hres_timens() and do_coarse_timens()
    __always_inline()
  lib/vdso: Add vdso_data pointer as input to
    __arch_get_timens_vdso_data()
  powerpc/vdso: Add support for time namespaces

Dmitry Safonov (1):
  powerpc/vdso: Separate vvar vma from vdso

 .../include/asm/vdso/compat_gettimeofday.h    |   3 +-
 arch/arm64/include/asm/vdso/gettimeofday.h    |   2 +-
 arch/powerpc/Kconfig                          |   3 +-
 arch/powerpc/include/asm/mmu_context.h        |   2 +-
 arch/powerpc/include/asm/vdso/gettimeofday.h  |  10 ++
 arch/powerpc/include/asm/vdso_datapage.h      |   2 -
 arch/powerpc/kernel/vdso.c                    | 138 ++++++++++++++++--
 arch/powerpc/kernel/vdso32/vdso32.lds.S       |   2 +-
 arch/powerpc/kernel/vdso64/vdso64.lds.S       |   2 +-
 arch/s390/include/asm/vdso/gettimeofday.h     |   3 +-
 arch/x86/include/asm/vdso/gettimeofday.h      |   3 +-
 lib/vdso/gettimeofday.c                       |  31 ++--
 12 files changed, 162 insertions(+), 39 deletions(-)

-- 
2.25.0



* [PATCH RESEND v1 1/4] lib/vdso: Mark do_hres_timens() and do_coarse_timens() __always_inline()
  2021-03-31 16:48 [PATCH RESEND v1 0/4] powerpc/vdso: Add support for time namespaces Christophe Leroy
@ 2021-03-31 16:48 ` Christophe Leroy
  2021-04-12 12:47   ` Thomas Gleixner
  2021-04-12 12:54   ` Vincenzo Frascino
  2021-03-31 16:48 ` [PATCH RESEND v1 2/4] lib/vdso: Add vdso_data pointer as input to __arch_get_timens_vdso_data() Christophe Leroy
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Christophe Leroy @ 2021-03-31 16:48 UTC
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, dima, avagin, arnd, tglx,
	vincenzo.frascino, luto, linux-arch

In the same spirit as commit c966533f8c6c ("lib/vdso: Mark do_hres()
and do_coarse() as __always_inline"), mark do_hres_timens() and
do_coarse_timens() __always_inline.

The measurements below were taken on a non-timens process, i.e. on the fastest path.

On powerpc32, without the patch:

clock-gettime-monotonic-raw:    vdso: 1155 nsec/call
clock-gettime-monotonic-coarse:    vdso: 813 nsec/call
clock-gettime-monotonic:    vdso: 1076 nsec/call

With the patch:

clock-gettime-monotonic-raw:    vdso: 1100 nsec/call
clock-gettime-monotonic-coarse:    vdso: 667 nsec/call
clock-gettime-monotonic:    vdso: 1025 nsec/call

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 lib/vdso/gettimeofday.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
index 2919f1698140..c6f6dee08746 100644
--- a/lib/vdso/gettimeofday.c
+++ b/lib/vdso/gettimeofday.c
@@ -46,8 +46,8 @@ static inline bool vdso_cycles_ok(u64 cycles)
 #endif
 
 #ifdef CONFIG_TIME_NS
-static int do_hres_timens(const struct vdso_data *vdns, clockid_t clk,
-			  struct __kernel_timespec *ts)
+static __always_inline int do_hres_timens(const struct vdso_data *vdns, clockid_t clk,
+					  struct __kernel_timespec *ts)
 {
 	const struct vdso_data *vd = __arch_get_timens_vdso_data();
 	const struct timens_offset *offs = &vdns->offset[clk];
@@ -97,8 +97,8 @@ static __always_inline const struct vdso_data *__arch_get_timens_vdso_data(void)
 	return NULL;
 }
 
-static int do_hres_timens(const struct vdso_data *vdns, clockid_t clk,
-			  struct __kernel_timespec *ts)
+static __always_inline int do_hres_timens(const struct vdso_data *vdns, clockid_t clk,
+					  struct __kernel_timespec *ts)
 {
 	return -EINVAL;
 }
@@ -159,8 +159,8 @@ static __always_inline int do_hres(const struct vdso_data *vd, clockid_t clk,
 }
 
 #ifdef CONFIG_TIME_NS
-static int do_coarse_timens(const struct vdso_data *vdns, clockid_t clk,
-			    struct __kernel_timespec *ts)
+static __always_inline int do_coarse_timens(const struct vdso_data *vdns, clockid_t clk,
+					    struct __kernel_timespec *ts)
 {
 	const struct vdso_data *vd = __arch_get_timens_vdso_data();
 	const struct vdso_timestamp *vdso_ts = &vd->basetime[clk];
@@ -188,8 +188,8 @@ static int do_coarse_timens(const struct vdso_data *vdns, clockid_t clk,
 	return 0;
 }
 #else
-static int do_coarse_timens(const struct vdso_data *vdns, clockid_t clk,
-			    struct __kernel_timespec *ts)
+static __always_inline int do_coarse_timens(const struct vdso_data *vdns, clockid_t clk,
+					    struct __kernel_timespec *ts)
 {
 	return -1;
 }
-- 
2.25.0



* [PATCH RESEND v1 2/4] lib/vdso: Add vdso_data pointer as input to __arch_get_timens_vdso_data()
  2021-03-31 16:48 [PATCH RESEND v1 0/4] powerpc/vdso: Add support for time namespaces Christophe Leroy
  2021-03-31 16:48 ` [PATCH RESEND v1 1/4] lib/vdso: Mark do_hres_timens() and do_coarse_timens() __always_inline() Christophe Leroy
@ 2021-03-31 16:48 ` Christophe Leroy
  2021-04-05  5:00   ` Andrei Vagin
                     ` (2 more replies)
  2021-03-31 16:48 ` [PATCH RESEND v1 3/4] powerpc/vdso: Separate vvar vma from vdso Christophe Leroy
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 17+ messages in thread
From: Christophe Leroy @ 2021-03-31 16:48 UTC
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, dima, avagin, arnd, tglx,
	vincenzo.frascino, luto, linux-arch

For the same reason as commit e876f0b69dc9 ("lib/vdso: Allow
architectures to provide the vdso data pointer"), powerpc wants to
avoid calculating the data position relative to the code.

As timens_vdso_data is in the page following vdso_data, pass the
vdso_data pointer to __arch_get_timens_vdso_data() in order
to ease the calculation on powerpc in the following patches.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/arm64/include/asm/vdso/compat_gettimeofday.h |  3 ++-
 arch/arm64/include/asm/vdso/gettimeofday.h        |  2 +-
 arch/s390/include/asm/vdso/gettimeofday.h         |  3 ++-
 arch/x86/include/asm/vdso/gettimeofday.h          |  3 ++-
 lib/vdso/gettimeofday.c                           | 15 +++++++++------
 5 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/vdso/compat_gettimeofday.h b/arch/arm64/include/asm/vdso/compat_gettimeofday.h
index 7508b0ac1d21..ecb6fd4c3c64 100644
--- a/arch/arm64/include/asm/vdso/compat_gettimeofday.h
+++ b/arch/arm64/include/asm/vdso/compat_gettimeofday.h
@@ -155,7 +155,8 @@ static __always_inline const struct vdso_data *__arch_get_vdso_data(void)
 }
 
 #ifdef CONFIG_TIME_NS
-static __always_inline const struct vdso_data *__arch_get_timens_vdso_data(void)
+static __always_inline
+const struct vdso_data *__arch_get_timens_vdso_data(const struct vdso_data *vd)
 {
 	const struct vdso_data *ret;
 
diff --git a/arch/arm64/include/asm/vdso/gettimeofday.h b/arch/arm64/include/asm/vdso/gettimeofday.h
index 631ab1281633..de86230a9436 100644
--- a/arch/arm64/include/asm/vdso/gettimeofday.h
+++ b/arch/arm64/include/asm/vdso/gettimeofday.h
@@ -100,7 +100,7 @@ const struct vdso_data *__arch_get_vdso_data(void)
 
 #ifdef CONFIG_TIME_NS
 static __always_inline
-const struct vdso_data *__arch_get_timens_vdso_data(void)
+const struct vdso_data *__arch_get_timens_vdso_data(const struct vdso_data *vd)
 {
 	return _timens_data;
 }
diff --git a/arch/s390/include/asm/vdso/gettimeofday.h b/arch/s390/include/asm/vdso/gettimeofday.h
index ed89ef742530..383c53c3dddd 100644
--- a/arch/s390/include/asm/vdso/gettimeofday.h
+++ b/arch/s390/include/asm/vdso/gettimeofday.h
@@ -68,7 +68,8 @@ long clock_getres_fallback(clockid_t clkid, struct __kernel_timespec *ts)
 }
 
 #ifdef CONFIG_TIME_NS
-static __always_inline const struct vdso_data *__arch_get_timens_vdso_data(void)
+static __always_inline
+const struct vdso_data *__arch_get_timens_vdso_data(const struct vdso_data *vd)
 {
 	return _timens_data;
 }
diff --git a/arch/x86/include/asm/vdso/gettimeofday.h b/arch/x86/include/asm/vdso/gettimeofday.h
index df01d7349d79..1936f21ed8cd 100644
--- a/arch/x86/include/asm/vdso/gettimeofday.h
+++ b/arch/x86/include/asm/vdso/gettimeofday.h
@@ -58,7 +58,8 @@ extern struct ms_hyperv_tsc_page hvclock_page
 #endif
 
 #ifdef CONFIG_TIME_NS
-static __always_inline const struct vdso_data *__arch_get_timens_vdso_data(void)
+static __always_inline
+const struct vdso_data *__arch_get_timens_vdso_data(const struct vdso_data *vd)
 {
 	return __timens_vdso_data;
 }
diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
index c6f6dee08746..ce2f69552003 100644
--- a/lib/vdso/gettimeofday.c
+++ b/lib/vdso/gettimeofday.c
@@ -49,13 +49,15 @@ static inline bool vdso_cycles_ok(u64 cycles)
 static __always_inline int do_hres_timens(const struct vdso_data *vdns, clockid_t clk,
 					  struct __kernel_timespec *ts)
 {
-	const struct vdso_data *vd = __arch_get_timens_vdso_data();
+	const struct vdso_data *vd;
 	const struct timens_offset *offs = &vdns->offset[clk];
 	const struct vdso_timestamp *vdso_ts;
 	u64 cycles, last, ns;
 	u32 seq;
 	s64 sec;
 
+	vd = vdns - (clk == CLOCK_MONOTONIC_RAW ? CS_RAW : CS_HRES_COARSE);
+	vd = __arch_get_timens_vdso_data(vd);
 	if (clk != CLOCK_MONOTONIC_RAW)
 		vd = &vd[CS_HRES_COARSE];
 	else
@@ -92,7 +94,8 @@ static __always_inline int do_hres_timens(const struct vdso_data *vdns, clockid_
 	return 0;
 }
 #else
-static __always_inline const struct vdso_data *__arch_get_timens_vdso_data(void)
+static __always_inline
+const struct vdso_data *__arch_get_timens_vdso_data(const struct vdso_data *vd)
 {
 	return NULL;
 }
@@ -162,7 +165,7 @@ static __always_inline int do_hres(const struct vdso_data *vd, clockid_t clk,
 static __always_inline int do_coarse_timens(const struct vdso_data *vdns, clockid_t clk,
 					    struct __kernel_timespec *ts)
 {
-	const struct vdso_data *vd = __arch_get_timens_vdso_data();
+	const struct vdso_data *vd = __arch_get_timens_vdso_data(vdns);
 	const struct vdso_timestamp *vdso_ts = &vd->basetime[clk];
 	const struct timens_offset *offs = &vdns->offset[clk];
 	u64 nsec;
@@ -310,7 +313,7 @@ __cvdso_gettimeofday_data(const struct vdso_data *vd,
 	if (unlikely(tz != NULL)) {
 		if (IS_ENABLED(CONFIG_TIME_NS) &&
 		    vd->clock_mode == VDSO_CLOCKMODE_TIMENS)
-			vd = __arch_get_timens_vdso_data();
+			vd = __arch_get_timens_vdso_data(vd);
 
 		tz->tz_minuteswest = vd[CS_HRES_COARSE].tz_minuteswest;
 		tz->tz_dsttime = vd[CS_HRES_COARSE].tz_dsttime;
@@ -333,7 +336,7 @@ __cvdso_time_data(const struct vdso_data *vd, __kernel_old_time_t *time)
 
 	if (IS_ENABLED(CONFIG_TIME_NS) &&
 	    vd->clock_mode == VDSO_CLOCKMODE_TIMENS)
-		vd = __arch_get_timens_vdso_data();
+		vd = __arch_get_timens_vdso_data(vd);
 
 	t = READ_ONCE(vd[CS_HRES_COARSE].basetime[CLOCK_REALTIME].sec);
 
@@ -363,7 +366,7 @@ int __cvdso_clock_getres_common(const struct vdso_data *vd, clockid_t clock,
 
 	if (IS_ENABLED(CONFIG_TIME_NS) &&
 	    vd->clock_mode == VDSO_CLOCKMODE_TIMENS)
-		vd = __arch_get_timens_vdso_data();
+		vd = __arch_get_timens_vdso_data(vd);
 
 	/*
 	 * Convert the clockid to a bitmask and use it to check which
-- 
2.25.0



* [PATCH RESEND v1 3/4] powerpc/vdso: Separate vvar vma from vdso
  2021-03-31 16:48 [PATCH RESEND v1 0/4] powerpc/vdso: Add support for time namespaces Christophe Leroy
  2021-03-31 16:48 ` [PATCH RESEND v1 1/4] lib/vdso: Mark do_hres_timens() and do_coarse_timens() __always_inline() Christophe Leroy
  2021-03-31 16:48 ` [PATCH RESEND v1 2/4] lib/vdso: Add vdso_data pointer as input to __arch_get_timens_vdso_data() Christophe Leroy
@ 2021-03-31 16:48 ` Christophe Leroy
  2021-04-05  5:03   ` Andrei Vagin
  2021-04-12 12:58   ` Vincenzo Frascino
  2021-03-31 16:48 ` [PATCH RESEND v1 4/4] powerpc/vdso: Add support for time namespaces Christophe Leroy
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Christophe Leroy @ 2021-03-31 16:48 UTC
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, dima, avagin, arnd, tglx,
	vincenzo.frascino, luto, linux-arch

From: Dmitry Safonov <dima@arista.com>

Since commit 511157ab641e ("powerpc/vdso: Move vdso datapage up front")
the VVAR page is in front of the VDSO area. As a result it breaks CRIU
(Checkpoint Restore In Userspace) [1], which expects "[vdso]" in
/proc/../maps to point at the ELF/vdso image rather than at the VVAR data page.
Laurent made a patch to keep CRIU working (by reading the aux vector),
but I think it still makes sense to separate the two mappings into different
VMAs. It will also make ppc64 less "special" for userspace and, as
a side bonus, will make the VVAR page un-writable by a debugger (which
previously would COW the page, which can be unexpected).

I opportunistically Cc stable on it: I understand that such changes
usually aren't stable material, but it will allow us in CRIU to have
one workaround less, one that is needed just for one release (v5.11) on
one platform (ppc64) and which we otherwise have to maintain.
I wouldn't go as far as to say that commit 511157ab641e is an ABI
regression, as no other userspace got broken, but I'd really appreciate
it if this gets backported to v5.11 after v5.12 is released, so as not
to complicate the already non-simple CRIU-vdso code. Thanks!

Cc: Andrei Vagin <avagin@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: stable@vger.kernel.org # v5.11
[1]: https://github.com/checkpoint-restore/criu/issues/1417
Signed-off-by: Dmitry Safonov <dima@arista.com>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/mmu_context.h |  2 +-
 arch/powerpc/kernel/vdso.c             | 54 +++++++++++++++++++-------
 2 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 652ce85f9410..4bc45d3ed8b0 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -263,7 +263,7 @@ extern void arch_exit_mmap(struct mm_struct *mm);
 static inline void arch_unmap(struct mm_struct *mm,
 			      unsigned long start, unsigned long end)
 {
-	unsigned long vdso_base = (unsigned long)mm->context.vdso - PAGE_SIZE;
+	unsigned long vdso_base = (unsigned long)mm->context.vdso;
 
 	if (start <= vdso_base && vdso_base < end)
 		mm->context.vdso = NULL;
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index e839a906fdf2..b14907209822 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -55,10 +55,10 @@ static int vdso_mremap(const struct vm_special_mapping *sm, struct vm_area_struc
 {
 	unsigned long new_size = new_vma->vm_end - new_vma->vm_start;
 
-	if (new_size != text_size + PAGE_SIZE)
+	if (new_size != text_size)
 		return -EINVAL;
 
-	current->mm->context.vdso = (void __user *)new_vma->vm_start + PAGE_SIZE;
+	current->mm->context.vdso = (void __user *)new_vma->vm_start;
 
 	return 0;
 }
@@ -73,6 +73,10 @@ static int vdso64_mremap(const struct vm_special_mapping *sm, struct vm_area_str
 	return vdso_mremap(sm, new_vma, &vdso64_end - &vdso64_start);
 }
 
+static struct vm_special_mapping vvar_spec __ro_after_init = {
+	.name = "[vvar]",
+};
+
 static struct vm_special_mapping vdso32_spec __ro_after_init = {
 	.name = "[vdso]",
 	.mremap = vdso32_mremap,
@@ -89,11 +93,11 @@ static struct vm_special_mapping vdso64_spec __ro_after_init = {
  */
 static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 {
-	struct mm_struct *mm = current->mm;
+	unsigned long vdso_size, vdso_base, mappings_size;
 	struct vm_special_mapping *vdso_spec;
+	unsigned long vvar_size = PAGE_SIZE;
+	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
-	unsigned long vdso_size;
-	unsigned long vdso_base;
 
 	if (is_32bit_task()) {
 		vdso_spec = &vdso32_spec;
@@ -110,8 +114,8 @@ static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_int
 		vdso_base = 0;
 	}
 
-	/* Add a page to the vdso size for the data page */
-	vdso_size += PAGE_SIZE;
+	mappings_size = vdso_size + vvar_size;
+	mappings_size += (VDSO_ALIGNMENT - 1) & PAGE_MASK;
 
 	/*
 	 * pick a base address for the vDSO in process space. We try to put it
@@ -119,9 +123,7 @@ static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_int
 	 * and end up putting it elsewhere.
 	 * Add enough to the size so that the result can be aligned.
 	 */
-	vdso_base = get_unmapped_area(NULL, vdso_base,
-				      vdso_size + ((VDSO_ALIGNMENT - 1) & PAGE_MASK),
-				      0, 0);
+	vdso_base = get_unmapped_area(NULL, vdso_base, mappings_size, 0, 0);
 	if (IS_ERR_VALUE(vdso_base))
 		return vdso_base;
 
@@ -133,7 +135,13 @@ static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_int
 	 * install_special_mapping or the perf counter mmap tracking code
 	 * will fail to recognise it as a vDSO.
 	 */
-	mm->context.vdso = (void __user *)vdso_base + PAGE_SIZE;
+	mm->context.vdso = (void __user *)vdso_base + vvar_size;
+
+	vma = _install_special_mapping(mm, vdso_base, vvar_size,
+				       VM_READ | VM_MAYREAD | VM_IO |
+				       VM_DONTDUMP | VM_PFNMAP, &vvar_spec);
+	if (IS_ERR(vma))
+		return PTR_ERR(vma);
 
 	/*
 	 * our vma flags don't have VM_WRITE so by default, the process isn't
@@ -145,9 +153,12 @@ static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_int
 	 * It's fine to use that for setting breakpoints in the vDSO code
 	 * pages though.
 	 */
-	vma = _install_special_mapping(mm, vdso_base, vdso_size,
+	vma = _install_special_mapping(mm, vdso_base + vvar_size, vdso_size,
 				       VM_READ | VM_EXEC | VM_MAYREAD |
 				       VM_MAYWRITE | VM_MAYEXEC, vdso_spec);
+	if (IS_ERR(vma))
+		do_munmap(mm, vdso_base, vvar_size, NULL);
+
 	return PTR_ERR_OR_ZERO(vma);
 }
 
@@ -249,11 +260,22 @@ static struct page ** __init vdso_setup_pages(void *start, void *end)
 	if (!pagelist)
 		panic("%s: Cannot allocate page list for VDSO", __func__);
 
-	pagelist[0] = virt_to_page(vdso_data);
-
 	for (i = 0; i < pages; i++)
-		pagelist[i + 1] = virt_to_page(start + i * PAGE_SIZE);
+		pagelist[i] = virt_to_page(start + i * PAGE_SIZE);
+
+	return pagelist;
+}
+
+static struct page ** __init vvar_setup_pages(void)
+{
+	struct page **pagelist;
 
+	/* .pages is NULL-terminated */
+	pagelist = kcalloc(2, sizeof(struct page *), GFP_KERNEL);
+	if (!pagelist)
+		panic("%s: Cannot allocate page list for VVAR", __func__);
+
+	pagelist[0] = virt_to_page(vdso_data);
 	return pagelist;
 }
 
@@ -295,6 +317,8 @@ static int __init vdso_init(void)
 	if (IS_ENABLED(CONFIG_PPC64))
 		vdso64_spec.pages = vdso_setup_pages(&vdso64_start, &vdso64_end);
 
+	vvar_spec.pages = vvar_setup_pages();
+
 	smp_wmb();
 
 	return 0;
-- 
2.25.0



* [PATCH RESEND v1 4/4] powerpc/vdso: Add support for time namespaces
  2021-03-31 16:48 [PATCH RESEND v1 0/4] powerpc/vdso: Add support for time namespaces Christophe Leroy
                   ` (2 preceding siblings ...)
  2021-03-31 16:48 ` [PATCH RESEND v1 3/4] powerpc/vdso: Separate vvar vma from vdso Christophe Leroy
@ 2021-03-31 16:48 ` Christophe Leroy
  2021-04-05  4:50   ` Andrei Vagin
  2021-04-12 13:00   ` Vincenzo Frascino
  2021-04-12 12:49 ` [PATCH RESEND v1 0/4] " Thomas Gleixner
  2021-04-19  4:00 ` Michael Ellerman
  5 siblings, 2 replies; 17+ messages in thread
From: Christophe Leroy @ 2021-03-31 16:48 UTC
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, dima, avagin, arnd, tglx,
	vincenzo.frascino, luto, linux-arch

This patch adds the necessary glue to provide time namespaces.

Things are mainly copied from ARM64.

__arch_get_timens_vdso_data() calculates timens vdso data position
based on the vdso data position, knowing it is the next page in vvar.
This avoids having to redo the mflr/bcl/mflr/mtlr dance to locate
the page relative to running code position.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/Kconfig                         |   3 +-
 arch/powerpc/include/asm/vdso/gettimeofday.h |  10 ++
 arch/powerpc/include/asm/vdso_datapage.h     |   2 -
 arch/powerpc/kernel/vdso.c                   | 116 ++++++++++++++++---
 arch/powerpc/kernel/vdso32/vdso32.lds.S      |   2 +-
 arch/powerpc/kernel/vdso64/vdso64.lds.S      |   2 +-
 6 files changed, 114 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index c1344c05226c..71daff5f15d5 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -172,6 +172,7 @@ config PPC
 	select GENERIC_CPU_AUTOPROBE
 	select GENERIC_CPU_VULNERABILITIES	if PPC_BARRIER_NOSPEC
 	select GENERIC_EARLY_IOREMAP
+	select GENERIC_GETTIMEOFDAY
 	select GENERIC_IRQ_SHOW
 	select GENERIC_IRQ_SHOW_LEVEL
 	select GENERIC_PCI_IOMAP		if PCI
@@ -179,7 +180,7 @@ config PPC
 	select GENERIC_STRNCPY_FROM_USER
 	select GENERIC_STRNLEN_USER
 	select GENERIC_TIME_VSYSCALL
-	select GENERIC_GETTIMEOFDAY
+	select GENERIC_VDSO_TIME_NS
 	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_ARCH_HUGE_VMAP		if PPC_BOOK3S_64 && PPC_RADIX_MMU
 	select HAVE_ARCH_JUMP_LABEL
diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h b/arch/powerpc/include/asm/vdso/gettimeofday.h
index d453e725c79f..e448df1dd071 100644
--- a/arch/powerpc/include/asm/vdso/gettimeofday.h
+++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_POWERPC_VDSO_GETTIMEOFDAY_H
 #define _ASM_POWERPC_VDSO_GETTIMEOFDAY_H
 
+#include <asm/page.h>
+
 #ifdef __ASSEMBLY__
 
 #include <asm/ppc_asm.h>
@@ -153,6 +155,14 @@ static __always_inline u64 __arch_get_hw_counter(s32 clock_mode,
 
 const struct vdso_data *__arch_get_vdso_data(void);
 
+#ifdef CONFIG_TIME_NS
+static __always_inline
+const struct vdso_data *__arch_get_timens_vdso_data(const struct vdso_data *vd)
+{
+	return (void *)vd + PAGE_SIZE;
+}
+#endif
+
 static inline bool vdso_clocksource_ok(const struct vdso_data *vd)
 {
 	return true;
diff --git a/arch/powerpc/include/asm/vdso_datapage.h b/arch/powerpc/include/asm/vdso_datapage.h
index 3f958ecf2beb..a585c8e538ff 100644
--- a/arch/powerpc/include/asm/vdso_datapage.h
+++ b/arch/powerpc/include/asm/vdso_datapage.h
@@ -107,9 +107,7 @@ extern struct vdso_arch_data *vdso_data;
 	bcl	20, 31, .+4
 999:
 	mflr	\ptr
-#if CONFIG_PPC_PAGE_SHIFT > 14
 	addis	\ptr, \ptr, (_vdso_datapage - 999b)@ha
-#endif
 	addi	\ptr, \ptr, (_vdso_datapage - 999b)@l
 .endm
 
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index b14907209822..717f2c9a7573 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -18,6 +18,7 @@
 #include <linux/security.h>
 #include <linux/memblock.h>
 #include <linux/syscalls.h>
+#include <linux/time_namespace.h>
 #include <vdso/datapage.h>
 
 #include <asm/syscall.h>
@@ -50,6 +51,12 @@ static union {
 } vdso_data_store __page_aligned_data;
 struct vdso_arch_data *vdso_data = &vdso_data_store.data;
 
+enum vvar_pages {
+	VVAR_DATA_PAGE_OFFSET,
+	VVAR_TIMENS_PAGE_OFFSET,
+	VVAR_NR_PAGES,
+};
+
 static int vdso_mremap(const struct vm_special_mapping *sm, struct vm_area_struct *new_vma,
 		       unsigned long text_size)
 {
@@ -73,8 +80,12 @@ static int vdso64_mremap(const struct vm_special_mapping *sm, struct vm_area_str
 	return vdso_mremap(sm, new_vma, &vdso64_end - &vdso64_start);
 }
 
+static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
+			     struct vm_area_struct *vma, struct vm_fault *vmf);
+
 static struct vm_special_mapping vvar_spec __ro_after_init = {
 	.name = "[vvar]",
+	.fault = vvar_fault,
 };
 
 static struct vm_special_mapping vdso32_spec __ro_after_init = {
@@ -87,6 +98,94 @@ static struct vm_special_mapping vdso64_spec __ro_after_init = {
 	.mremap = vdso64_mremap,
 };
 
+#ifdef CONFIG_TIME_NS
+struct vdso_data *arch_get_vdso_data(void *vvar_page)
+{
+	return ((struct vdso_arch_data *)vvar_page)->data;
+}
+
+/*
+ * The vvar mapping contains data for a specific time namespace, so when a task
+ * changes namespace we must unmap its vvar data for the old namespace.
+ * Subsequent faults will map in data for the new namespace.
+ *
+ * For more details see timens_setup_vdso_data().
+ */
+int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
+{
+	struct mm_struct *mm = task->mm;
+	struct vm_area_struct *vma;
+
+	mmap_read_lock(mm);
+
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		unsigned long size = vma->vm_end - vma->vm_start;
+
+		if (vma_is_special_mapping(vma, &vvar_spec))
+			zap_page_range(vma, vma->vm_start, size);
+	}
+
+	mmap_read_unlock(mm);
+	return 0;
+}
+
+static struct page *find_timens_vvar_page(struct vm_area_struct *vma)
+{
+	if (likely(vma->vm_mm == current->mm))
+		return current->nsproxy->time_ns->vvar_page;
+
+	/*
+	 * VM_PFNMAP | VM_IO protect .fault() handler from being called
+	 * through interfaces like /proc/$pid/mem or
+	 * process_vm_{readv,writev}() as long as there's no .access()
+	 * in special_mapping_vmops.
+	 * For more details check_vma_flags() and __access_remote_vm()
+	 */
+	WARN(1, "vvar_page accessed remotely");
+
+	return NULL;
+}
+#else
+static struct page *find_timens_vvar_page(struct vm_area_struct *vma)
+{
+	return NULL;
+}
+#endif
+
+static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
+			     struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	struct page *timens_page = find_timens_vvar_page(vma);
+	unsigned long pfn;
+
+	switch (vmf->pgoff) {
+	case VVAR_DATA_PAGE_OFFSET:
+		if (timens_page)
+			pfn = page_to_pfn(timens_page);
+		else
+			pfn = virt_to_pfn(vdso_data);
+		break;
+#ifdef CONFIG_TIME_NS
+	case VVAR_TIMENS_PAGE_OFFSET:
+		/*
+		 * If a task belongs to a time namespace then a namespace
+		 * specific VVAR is mapped with the VVAR_DATA_PAGE_OFFSET and
+		 * the real VVAR page is mapped with the VVAR_TIMENS_PAGE_OFFSET
+		 * offset.
+		 * See also the comment near timens_setup_vdso_data().
+		 */
+		if (!timens_page)
+			return VM_FAULT_SIGBUS;
+		pfn = virt_to_pfn(vdso_data);
+		break;
+#endif /* CONFIG_TIME_NS */
+	default:
+		return VM_FAULT_SIGBUS;
+	}
+
+	return vmf_insert_pfn(vma, vmf->address, pfn);
+}
+
 /*
  * This is called from binfmt_elf, we create the special vma for the
  * vDSO and insert it into the mm struct tree
@@ -95,7 +194,7 @@ static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_int
 {
 	unsigned long vdso_size, vdso_base, mappings_size;
 	struct vm_special_mapping *vdso_spec;
-	unsigned long vvar_size = PAGE_SIZE;
+	unsigned long vvar_size = VVAR_NR_PAGES * PAGE_SIZE;
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 
@@ -266,19 +365,6 @@ static struct page ** __init vdso_setup_pages(void *start, void *end)
 	return pagelist;
 }
 
-static struct page ** __init vvar_setup_pages(void)
-{
-	struct page **pagelist;
-
-	/* .pages is NULL-terminated */
-	pagelist = kcalloc(2, sizeof(struct page *), GFP_KERNEL);
-	if (!pagelist)
-		panic("%s: Cannot allocate page list for VVAR", __func__);
-
-	pagelist[0] = virt_to_page(vdso_data);
-	return pagelist;
-}
-
 static int __init vdso_init(void)
 {
 #ifdef CONFIG_PPC64
@@ -317,8 +403,6 @@ static int __init vdso_init(void)
 	if (IS_ENABLED(CONFIG_PPC64))
 		vdso64_spec.pages = vdso_setup_pages(&vdso64_start, &vdso64_end);
 
-	vvar_spec.pages = vvar_setup_pages();
-
 	smp_wmb();
 
 	return 0;
diff --git a/arch/powerpc/kernel/vdso32/vdso32.lds.S b/arch/powerpc/kernel/vdso32/vdso32.lds.S
index a4b806b0d618..58e0099f70f4 100644
--- a/arch/powerpc/kernel/vdso32/vdso32.lds.S
+++ b/arch/powerpc/kernel/vdso32/vdso32.lds.S
@@ -17,7 +17,7 @@ ENTRY(_start)
 
 SECTIONS
 {
-	PROVIDE(_vdso_datapage = . - PAGE_SIZE);
+	PROVIDE(_vdso_datapage = . - 2 * PAGE_SIZE);
 	. = SIZEOF_HEADERS;
 
 	.hash          	: { *(.hash) }			:text
diff --git a/arch/powerpc/kernel/vdso64/vdso64.lds.S b/arch/powerpc/kernel/vdso64/vdso64.lds.S
index 2f3c359cacd3..0288cad428b0 100644
--- a/arch/powerpc/kernel/vdso64/vdso64.lds.S
+++ b/arch/powerpc/kernel/vdso64/vdso64.lds.S
@@ -17,7 +17,7 @@ ENTRY(_start)
 
 SECTIONS
 {
-	PROVIDE(_vdso_datapage = . - PAGE_SIZE);
+	PROVIDE(_vdso_datapage = . - 2 * PAGE_SIZE);
 	. = SIZEOF_HEADERS;
 
 	.hash		: { *(.hash) }			:text
-- 
2.25.0



* Re: [PATCH RESEND v1 4/4] powerpc/vdso: Add support for time namespaces
  2021-03-31 16:48 ` [PATCH RESEND v1 4/4] powerpc/vdso: Add support for time namespaces Christophe Leroy
@ 2021-04-05  4:50   ` Andrei Vagin
  2021-04-12 13:00   ` Vincenzo Frascino
  1 sibling, 0 replies; 17+ messages in thread
From: Andrei Vagin @ 2021-04-05  4:50 UTC
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev, dima, arnd, tglx, vincenzo.frascino,
	luto, linux-arch

On Wed, Mar 31, 2021 at 04:48:47PM +0000, Christophe Leroy wrote:
> This patch adds the necessary glue to provide time namespaces.
> 
> Things are mainly copied from ARM64.
> 
> __arch_get_timens_vdso_data() calculates timens vdso data position
> based on the vdso data position, knowing it is the next page in vvar.
> This avoids having to redo the mflr/bcl/mflr/mtlr dance to locate
> the page relative to running code position.
>

Acked-by: Andrei Vagin <avagin@gmail.com>
 
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>


* Re: [PATCH RESEND v1 2/4] lib/vdso: Add vdso_data pointer as input to __arch_get_timens_vdso_data()
  2021-03-31 16:48 ` [PATCH RESEND v1 2/4] lib/vdso: Add vdso_data pointer as input to __arch_get_timens_vdso_data() Christophe Leroy
@ 2021-04-05  5:00   ` Andrei Vagin
  2021-04-12 12:48   ` Thomas Gleixner
  2021-04-12 12:56   ` Vincenzo Frascino
  2 siblings, 0 replies; 17+ messages in thread
From: Andrei Vagin @ 2021-04-05  5:00 UTC
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev, dima, arnd, tglx, vincenzo.frascino,
	luto, linux-arch

On Wed, Mar 31, 2021 at 04:48:45PM +0000, Christophe Leroy wrote:
> For the same reason as commit e876f0b69dc9 ("lib/vdso: Allow
> architectures to provide the vdso data pointer"), powerpc wants to
> avoid calculating the data position relative to the code.
> 
> As timens_vdso_data is in the page following vdso_data, pass the
> vdso_data pointer to __arch_get_timens_vdso_data() in order
> to ease the calculation on powerpc in the following patches.
>

Acked-by: Andrei Vagin <avagin@gmail.com>
 
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>


* Re: [PATCH RESEND v1 3/4] powerpc/vdso: Separate vvar vma from vdso
  2021-03-31 16:48 ` [PATCH RESEND v1 3/4] powerpc/vdso: Separate vvar vma from vdso Christophe Leroy
@ 2021-04-05  5:03   ` Andrei Vagin
  2021-04-12 12:58   ` Vincenzo Frascino
  1 sibling, 0 replies; 17+ messages in thread
From: Andrei Vagin @ 2021-04-05  5:03 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev, dima, arnd, tglx, vincenzo.frascino,
	luto, linux-arch

On Wed, Mar 31, 2021 at 04:48:46PM +0000, Christophe Leroy wrote:
> From: Dmitry Safonov <dima@arista.com>
> 
> Since commit 511157ab641e ("powerpc/vdso: Move vdso datapage up front")
> the VVAR page is in front of the VDSO area. As a result it breaks CRIU
> (Checkpoint Restore In Userspace) [1], where CRIU expects that "[vdso]"
> from /proc/../maps points at ELF/vdso image, rather than at VVAR data page.
> Laurent made a patch to keep CRIU working (by reading aux vector).
> But I think it still makes sense to separate the two mappings into different
> VMAs. It will also make ppc64 less "special" for userspace and as
> a side-bonus will make VVAR page un-writable by debugger (which previously
> would COW page and can be unexpected).
> 
> I opportunistically Cc stable on it: I understand that usually such
> stuff isn't stable material, but that will allow us in CRIU to have
> one workaround less that is needed just for one release (v5.11) on
> one platform (ppc64), which we otherwise have to maintain.
> I wouldn't go as far as to say that the commit 511157ab641e is ABI
> regression as no other userspace got broken, but I'd really appreciate
> if it gets backported to v5.11 after v5.12 is released, so as not
> to complicate already non-simple CRIU-vdso code. Thanks!
> 
> Cc: Andrei Vagin <avagin@gmail.com>

Acked-by: Andrei Vagin <avagin@gmail.com>

> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: Laurent Dufour <ldufour@linux.ibm.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: stable@vger.kernel.org # v5.11
> [1]: https://github.com/checkpoint-restore/criu/issues/1417
> Signed-off-by: Dmitry Safonov <dima@arista.com>
> Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>


* Re: [PATCH RESEND v1 1/4] lib/vdso: Mark do_hres_timens() and do_coarse_timens() __always_inline()
  2021-03-31 16:48 ` [PATCH RESEND v1 1/4] lib/vdso: Mark do_hres_timens() and do_coarse_timens() __always_inline() Christophe Leroy
@ 2021-04-12 12:47   ` Thomas Gleixner
  2021-04-12 12:54   ` Vincenzo Frascino
  1 sibling, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2021-04-12 12:47 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, dima, avagin, arnd,
	vincenzo.frascino, luto, linux-arch

On Wed, Mar 31 2021 at 16:48, Christophe Leroy wrote:
> In the same spirit as commit c966533f8c6c ("lib/vdso: Mark do_hres()
> and do_coarse() as __always_inline"), mark do_hres_timens() and
> do_coarse_timens() __always_inline.
>
> The measurements below are for a non-timens process, i.e. on the fastest path.
>
> On powerpc32, without the patch:
>
> clock-gettime-monotonic-raw:    vdso: 1155 nsec/call
> clock-gettime-monotonic-coarse:    vdso: 813 nsec/call
> clock-gettime-monotonic:    vdso: 1076 nsec/call
>
> With the patch:
>
> clock-gettime-monotonic-raw:    vdso: 1100 nsec/call
> clock-gettime-monotonic-coarse:    vdso: 667 nsec/call
> clock-gettime-monotonic:    vdso: 1025 nsec/call
>
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>


* Re: [PATCH RESEND v1 2/4] lib/vdso: Add vdso_data pointer as input to __arch_get_timens_vdso_data()
  2021-03-31 16:48 ` [PATCH RESEND v1 2/4] lib/vdso: Add vdso_data pointer as input to __arch_get_timens_vdso_data() Christophe Leroy
  2021-04-05  5:00   ` Andrei Vagin
@ 2021-04-12 12:48   ` Thomas Gleixner
  2021-04-12 12:56   ` Vincenzo Frascino
  2 siblings, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2021-04-12 12:48 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, dima, avagin, arnd,
	vincenzo.frascino, luto, linux-arch

On Wed, Mar 31 2021 at 16:48, Christophe Leroy wrote:
> For the same reason as commit e876f0b69dc9 ("lib/vdso: Allow
> architectures to provide the vdso data pointer"), powerpc wants to
> avoid calculation of relative position to code.
>
> As timens_vdso_data is in the page following vdso_data, provide
> vdso_data pointer to __arch_get_timens_vdso_data() in order
> to ease the calculation on powerpc in following patches.
>
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>


* Re: [PATCH RESEND v1 0/4] powerpc/vdso: Add support for time namespaces
  2021-03-31 16:48 [PATCH RESEND v1 0/4] powerpc/vdso: Add support for time namespaces Christophe Leroy
                   ` (3 preceding siblings ...)
  2021-03-31 16:48 ` [PATCH RESEND v1 4/4] powerpc/vdso: Add support for time namespaces Christophe Leroy
@ 2021-04-12 12:49 ` Thomas Gleixner
  2021-04-13  6:31   ` Michael Ellerman
  2021-04-19  4:00 ` Michael Ellerman
  5 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2021-04-12 12:49 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, dima, avagin, arnd,
	vincenzo.frascino, luto, linux-arch

On Wed, Mar 31 2021 at 16:48, Christophe Leroy wrote:
> [Sorry, resending with complete destination list, I used the wrong script on the first delivery]
>
> This series adds support for time namespaces on powerpc.
>
> All timens selftests are successful.

If PPC people want to pick up the whole lot, no objections from my side.

Thanks,

        tglx


* Re: [PATCH RESEND v1 1/4] lib/vdso: Mark do_hres_timens() and do_coarse_timens() __always_inline()
  2021-03-31 16:48 ` [PATCH RESEND v1 1/4] lib/vdso: Mark do_hres_timens() and do_coarse_timens() __always_inline() Christophe Leroy
  2021-04-12 12:47   ` Thomas Gleixner
@ 2021-04-12 12:54   ` Vincenzo Frascino
  1 sibling, 0 replies; 17+ messages in thread
From: Vincenzo Frascino @ 2021-04-12 12:54 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, dima, avagin, arnd, tglx, luto, linux-arch



On 3/31/21 5:48 PM, Christophe Leroy wrote:
> In the same spirit as commit c966533f8c6c ("lib/vdso: Mark do_hres()
> and do_coarse() as __always_inline"), mark do_hres_timens() and
> do_coarse_timens() __always_inline.
> 
> The measurements below are for a non-timens process, i.e. on the fastest path.
> 
> On powerpc32, without the patch:
> 
> clock-gettime-monotonic-raw:    vdso: 1155 nsec/call
> clock-gettime-monotonic-coarse:    vdso: 813 nsec/call
> clock-gettime-monotonic:    vdso: 1076 nsec/call
> 
> With the patch:
> 
> clock-gettime-monotonic-raw:    vdso: 1100 nsec/call
> clock-gettime-monotonic-coarse:    vdso: 667 nsec/call
> clock-gettime-monotonic:    vdso: 1025 nsec/call
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>

Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>

> ---
>  lib/vdso/gettimeofday.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
> index 2919f1698140..c6f6dee08746 100644
> --- a/lib/vdso/gettimeofday.c
> +++ b/lib/vdso/gettimeofday.c
> @@ -46,8 +46,8 @@ static inline bool vdso_cycles_ok(u64 cycles)
>  #endif
>  
>  #ifdef CONFIG_TIME_NS
> -static int do_hres_timens(const struct vdso_data *vdns, clockid_t clk,
> -			  struct __kernel_timespec *ts)
> +static __always_inline int do_hres_timens(const struct vdso_data *vdns, clockid_t clk,
> +					  struct __kernel_timespec *ts)
>  {
>  	const struct vdso_data *vd = __arch_get_timens_vdso_data();
>  	const struct timens_offset *offs = &vdns->offset[clk];
> @@ -97,8 +97,8 @@ static __always_inline const struct vdso_data *__arch_get_timens_vdso_data(void)
>  	return NULL;
>  }
>  
> -static int do_hres_timens(const struct vdso_data *vdns, clockid_t clk,
> -			  struct __kernel_timespec *ts)
> +static __always_inline int do_hres_timens(const struct vdso_data *vdns, clockid_t clk,
> +					  struct __kernel_timespec *ts)
>  {
>  	return -EINVAL;
>  }
> @@ -159,8 +159,8 @@ static __always_inline int do_hres(const struct vdso_data *vd, clockid_t clk,
>  }
>  
>  #ifdef CONFIG_TIME_NS
> -static int do_coarse_timens(const struct vdso_data *vdns, clockid_t clk,
> -			    struct __kernel_timespec *ts)
> +static __always_inline int do_coarse_timens(const struct vdso_data *vdns, clockid_t clk,
> +					    struct __kernel_timespec *ts)
>  {
>  	const struct vdso_data *vd = __arch_get_timens_vdso_data();
>  	const struct vdso_timestamp *vdso_ts = &vd->basetime[clk];
> @@ -188,8 +188,8 @@ static int do_coarse_timens(const struct vdso_data *vdns, clockid_t clk,
>  	return 0;
>  }
>  #else
> -static int do_coarse_timens(const struct vdso_data *vdns, clockid_t clk,
> -			    struct __kernel_timespec *ts)
> +static __always_inline int do_coarse_timens(const struct vdso_data *vdns, clockid_t clk,
> +					    struct __kernel_timespec *ts)
>  {
>  	return -1;
>  }
> 

-- 
Regards,
Vincenzo
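
For readers following along, here is a rough userspace sketch of the pattern
the patch above applies: the rarely taken time-namespace helper is forced
inline into its caller, so the common path does not pay for an out-of-line
call. Everything prefixed sketch_ or SKETCH_ is a hypothetical stand-in, not
the kernel's code, and the attribute spelling assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's __always_inline (GCC/Clang attribute). */
#define sketch_always_inline inline __attribute__((always_inline))

#define SKETCH_NSEC_PER_SEC 1000000000LL

/* Hypothetical, simplified stand-in for struct __kernel_timespec. */
struct sketch_timespec {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Namespace path: add the per-namespace offset to the base time. */
static sketch_always_inline int
sketch_do_coarse_timens(const struct sketch_timespec *base,
			const struct sketch_timespec *offs,
			struct sketch_timespec *ts)
{
	int64_t sec = base->tv_sec + offs->tv_sec;
	int64_t nsec = base->tv_nsec + offs->tv_nsec;

	/* Carry whole seconds out of the nanosecond field. */
	while (nsec >= SKETCH_NSEC_PER_SEC) {
		nsec -= SKETCH_NSEC_PER_SEC;
		sec++;
	}
	ts->tv_sec = sec;
	ts->tv_nsec = nsec;
	return 0;
}

/* Fast path: with both helpers forced inline, this is one flat function. */
static sketch_always_inline int
sketch_do_coarse(const struct sketch_timespec *base,
		 const struct sketch_timespec *offs,
		 int in_timens, struct sketch_timespec *ts)
{
	if (in_timens)
		return sketch_do_coarse_timens(base, offs, ts);

	*ts = *base;
	return 0;
}

/* The only out-of-line symbol, playing the role of the vDSO entry point. */
int sketch_clock_gettime_coarse(const struct sketch_timespec *base,
				const struct sketch_timespec *offs,
				int in_timens, struct sketch_timespec *ts)
{
	return sketch_do_coarse(base, offs, in_timens, ts);
}
```

Whether forcing the inline helps on a given target depends on the compiler
and the surrounding code; the figures quoted in the commit message are for
powerpc32 only.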


* Re: [PATCH RESEND v1 2/4] lib/vdso: Add vdso_data pointer as input to __arch_get_timens_vdso_data()
  2021-03-31 16:48 ` [PATCH RESEND v1 2/4] lib/vdso: Add vdso_data pointer as input to __arch_get_timens_vdso_data() Christophe Leroy
  2021-04-05  5:00   ` Andrei Vagin
  2021-04-12 12:48   ` Thomas Gleixner
@ 2021-04-12 12:56   ` Vincenzo Frascino
  2 siblings, 0 replies; 17+ messages in thread
From: Vincenzo Frascino @ 2021-04-12 12:56 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, dima, avagin, arnd, tglx, luto, linux-arch



On 3/31/21 5:48 PM, Christophe Leroy wrote:
> For the same reason as commit e876f0b69dc9 ("lib/vdso: Allow
> architectures to provide the vdso data pointer"), powerpc wants to
> avoid calculation of relative position to code.
> 
> As timens_vdso_data is in the page following vdso_data, provide
> vdso_data pointer to __arch_get_timens_vdso_data() in order
> to ease the calculation on powerpc in following patches.
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>

Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>

> ---
>  arch/arm64/include/asm/vdso/compat_gettimeofday.h |  3 ++-
>  arch/arm64/include/asm/vdso/gettimeofday.h        |  2 +-
>  arch/s390/include/asm/vdso/gettimeofday.h         |  3 ++-
>  arch/x86/include/asm/vdso/gettimeofday.h          |  3 ++-
>  lib/vdso/gettimeofday.c                           | 15 +++++++++------
>  5 files changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/vdso/compat_gettimeofday.h b/arch/arm64/include/asm/vdso/compat_gettimeofday.h
> index 7508b0ac1d21..ecb6fd4c3c64 100644
> --- a/arch/arm64/include/asm/vdso/compat_gettimeofday.h
> +++ b/arch/arm64/include/asm/vdso/compat_gettimeofday.h
> @@ -155,7 +155,8 @@ static __always_inline const struct vdso_data *__arch_get_vdso_data(void)
>  }
>  
>  #ifdef CONFIG_TIME_NS
> -static __always_inline const struct vdso_data *__arch_get_timens_vdso_data(void)
> +static __always_inline
> +const struct vdso_data *__arch_get_timens_vdso_data(const struct vdso_data *vd)
>  {
>  	const struct vdso_data *ret;
>  
> diff --git a/arch/arm64/include/asm/vdso/gettimeofday.h b/arch/arm64/include/asm/vdso/gettimeofday.h
> index 631ab1281633..de86230a9436 100644
> --- a/arch/arm64/include/asm/vdso/gettimeofday.h
> +++ b/arch/arm64/include/asm/vdso/gettimeofday.h
> @@ -100,7 +100,7 @@ const struct vdso_data *__arch_get_vdso_data(void)
>  
>  #ifdef CONFIG_TIME_NS
>  static __always_inline
> -const struct vdso_data *__arch_get_timens_vdso_data(void)
> +const struct vdso_data *__arch_get_timens_vdso_data(const struct vdso_data *vd)
>  {
>  	return _timens_data;
>  }
> diff --git a/arch/s390/include/asm/vdso/gettimeofday.h b/arch/s390/include/asm/vdso/gettimeofday.h
> index ed89ef742530..383c53c3dddd 100644
> --- a/arch/s390/include/asm/vdso/gettimeofday.h
> +++ b/arch/s390/include/asm/vdso/gettimeofday.h
> @@ -68,7 +68,8 @@ long clock_getres_fallback(clockid_t clkid, struct __kernel_timespec *ts)
>  }
>  
>  #ifdef CONFIG_TIME_NS
> -static __always_inline const struct vdso_data *__arch_get_timens_vdso_data(void)
> +static __always_inline
> +const struct vdso_data *__arch_get_timens_vdso_data(const struct vdso_data *vd)
>  {
>  	return _timens_data;
>  }
> diff --git a/arch/x86/include/asm/vdso/gettimeofday.h b/arch/x86/include/asm/vdso/gettimeofday.h
> index df01d7349d79..1936f21ed8cd 100644
> --- a/arch/x86/include/asm/vdso/gettimeofday.h
> +++ b/arch/x86/include/asm/vdso/gettimeofday.h
> @@ -58,7 +58,8 @@ extern struct ms_hyperv_tsc_page hvclock_page
>  #endif
>  
>  #ifdef CONFIG_TIME_NS
> -static __always_inline const struct vdso_data *__arch_get_timens_vdso_data(void)
> +static __always_inline
> +const struct vdso_data *__arch_get_timens_vdso_data(const struct vdso_data *vd)
>  {
>  	return __timens_vdso_data;
>  }
> diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
> index c6f6dee08746..ce2f69552003 100644
> --- a/lib/vdso/gettimeofday.c
> +++ b/lib/vdso/gettimeofday.c
> @@ -49,13 +49,15 @@ static inline bool vdso_cycles_ok(u64 cycles)
>  static __always_inline int do_hres_timens(const struct vdso_data *vdns, clockid_t clk,
>  					  struct __kernel_timespec *ts)
>  {
> -	const struct vdso_data *vd = __arch_get_timens_vdso_data();
> +	const struct vdso_data *vd;
>  	const struct timens_offset *offs = &vdns->offset[clk];
>  	const struct vdso_timestamp *vdso_ts;
>  	u64 cycles, last, ns;
>  	u32 seq;
>  	s64 sec;
>  
> +	vd = vdns - (clk == CLOCK_MONOTONIC_RAW ? CS_RAW : CS_HRES_COARSE);
> +	vd = __arch_get_timens_vdso_data(vd);
>  	if (clk != CLOCK_MONOTONIC_RAW)
>  		vd = &vd[CS_HRES_COARSE];
>  	else
> @@ -92,7 +94,8 @@ static __always_inline int do_hres_timens(const struct vdso_data *vdns, clockid_
>  	return 0;
>  }
>  #else
> -static __always_inline const struct vdso_data *__arch_get_timens_vdso_data(void)
> +static __always_inline
> +const struct vdso_data *__arch_get_timens_vdso_data(const struct vdso_data *vd)
>  {
>  	return NULL;
>  }
> @@ -162,7 +165,7 @@ static __always_inline int do_hres(const struct vdso_data *vd, clockid_t clk,
>  static __always_inline int do_coarse_timens(const struct vdso_data *vdns, clockid_t clk,
>  					    struct __kernel_timespec *ts)
>  {
> -	const struct vdso_data *vd = __arch_get_timens_vdso_data();
> +	const struct vdso_data *vd = __arch_get_timens_vdso_data(vdns);
>  	const struct vdso_timestamp *vdso_ts = &vd->basetime[clk];
>  	const struct timens_offset *offs = &vdns->offset[clk];
>  	u64 nsec;
> @@ -310,7 +313,7 @@ __cvdso_gettimeofday_data(const struct vdso_data *vd,
>  	if (unlikely(tz != NULL)) {
>  		if (IS_ENABLED(CONFIG_TIME_NS) &&
>  		    vd->clock_mode == VDSO_CLOCKMODE_TIMENS)
> -			vd = __arch_get_timens_vdso_data();
> +			vd = __arch_get_timens_vdso_data(vd);
>  
>  		tz->tz_minuteswest = vd[CS_HRES_COARSE].tz_minuteswest;
>  		tz->tz_dsttime = vd[CS_HRES_COARSE].tz_dsttime;
> @@ -333,7 +336,7 @@ __cvdso_time_data(const struct vdso_data *vd, __kernel_old_time_t *time)
>  
>  	if (IS_ENABLED(CONFIG_TIME_NS) &&
>  	    vd->clock_mode == VDSO_CLOCKMODE_TIMENS)
> -		vd = __arch_get_timens_vdso_data();
> +		vd = __arch_get_timens_vdso_data(vd);
>  
>  	t = READ_ONCE(vd[CS_HRES_COARSE].basetime[CLOCK_REALTIME].sec);
>  
> @@ -363,7 +366,7 @@ int __cvdso_clock_getres_common(const struct vdso_data *vd, clockid_t clock,
>  
>  	if (IS_ENABLED(CONFIG_TIME_NS) &&
>  	    vd->clock_mode == VDSO_CLOCKMODE_TIMENS)
> -		vd = __arch_get_timens_vdso_data();
> +		vd = __arch_get_timens_vdso_data(vd);
>  
>  	/*
>  	 * Convert the clockid to a bitmask and use it to check which
> 

-- 
Regards,
Vincenzo
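
The do_hres_timens() hunk quoted above first steps the incoming pointer back
to element 0 of the vdso_data array and only then asks the architecture for
the other page. The following is a minimal userspace sketch of that pointer
arithmetic, assuming a 4096-byte page and a cut-down data structure. All
sketch_ and SKETCH_ names are hypothetical; the numeric clock and index
values are chosen to match the kernel's CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW,
CS_HRES_COARSE and CS_RAW.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4K pages */

enum { SKETCH_CS_HRES_COARSE = 0, SKETCH_CS_RAW = 1, SKETCH_CS_BASES = 2 };
enum { SKETCH_CLOCK_MONOTONIC = 1, SKETCH_CLOCK_MONOTONIC_RAW = 4 };

/* Heavily reduced stand-in for struct vdso_data. */
struct sketch_vdso_data {
	unsigned int seq;
	int clock_mode;
};

/* One data page: the array of clock bases sits at the start of the page. */
union sketch_page {
	struct sketch_vdso_data data[SKETCH_CS_BASES];
	unsigned char raw[SKETCH_PAGE_SIZE];
};

/* Same idea as the powerpc helper in patch 4/4: the other page is the next one. */
static const struct sketch_vdso_data *
sketch_arch_get_timens_vdso_data(const struct sketch_vdso_data *vd)
{
	return (const struct sketch_vdso_data *)
		((const char *)vd + SKETCH_PAGE_SIZE);
}

/* Mirrors the start of do_hres_timens() after patch 2/4. */
static const struct sketch_vdso_data *
sketch_real_data_for(const struct sketch_vdso_data *vdns, int clk)
{
	const struct sketch_vdso_data *vd;

	/* vdns points at one array element; step back to element 0. */
	vd = vdns - (clk == SKETCH_CLOCK_MONOTONIC_RAW ?
		     SKETCH_CS_RAW : SKETCH_CS_HRES_COARSE);
	/* Element 0 is at the start of the page, so this lands on page + 1. */
	vd = sketch_arch_get_timens_vdso_data(vd);

	/* Re-select the clock base inside the other page. */
	if (clk != SKETCH_CLOCK_MONOTONIC_RAW)
		return &vd[SKETCH_CS_HRES_COARSE];
	return &vd[SKETCH_CS_RAW];
}
```

Architectures that locate the page through a linker symbol (arm64, s390 and
x86 in the hunks above) simply ignore the new argument, so this arithmetic
only matters where the helper derives its result from the pointer it is given.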


* Re: [PATCH RESEND v1 3/4] powerpc/vdso: Separate vvar vma from vdso
  2021-03-31 16:48 ` [PATCH RESEND v1 3/4] powerpc/vdso: Separate vvar vma from vdso Christophe Leroy
  2021-04-05  5:03   ` Andrei Vagin
@ 2021-04-12 12:58   ` Vincenzo Frascino
  1 sibling, 0 replies; 17+ messages in thread
From: Vincenzo Frascino @ 2021-04-12 12:58 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, dima, avagin, arnd, tglx, luto, linux-arch



On 3/31/21 5:48 PM, Christophe Leroy wrote:
> From: Dmitry Safonov <dima@arista.com>
> 
> Since commit 511157ab641e ("powerpc/vdso: Move vdso datapage up front")
> the VVAR page is in front of the VDSO area. As a result it breaks CRIU
> (Checkpoint Restore In Userspace) [1], where CRIU expects that "[vdso]"
> from /proc/../maps points at ELF/vdso image, rather than at VVAR data page.
> Laurent made a patch to keep CRIU working (by reading aux vector).
> But I think it still makes sense to separate the two mappings into different
> VMAs. It will also make ppc64 less "special" for userspace and as
> a side-bonus will make VVAR page un-writable by debugger (which previously
> would COW page and can be unexpected).
> 
> I opportunistically Cc stable on it: I understand that usually such
> stuff isn't stable material, but that will allow us in CRIU to have
> one workaround less that is needed just for one release (v5.11) on
> one platform (ppc64), which we otherwise have to maintain.
> I wouldn't go as far as to say that the commit 511157ab641e is ABI
> regression as no other userspace got broken, but I'd really appreciate
> if it gets backported to v5.11 after v5.12 is released, so as not
> to complicate already non-simple CRIU-vdso code. Thanks!
> 
> Cc: Andrei Vagin <avagin@gmail.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: Laurent Dufour <ldufour@linux.ibm.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: stable@vger.kernel.org # v5.11
> [1]: https://github.com/checkpoint-restore/criu/issues/1417
> Signed-off-by: Dmitry Safonov <dima@arista.com>
> Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>

Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # vDSO parts.

> ---
>  arch/powerpc/include/asm/mmu_context.h |  2 +-
>  arch/powerpc/kernel/vdso.c             | 54 +++++++++++++++++++-------
>  2 files changed, 40 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 652ce85f9410..4bc45d3ed8b0 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -263,7 +263,7 @@ extern void arch_exit_mmap(struct mm_struct *mm);
>  static inline void arch_unmap(struct mm_struct *mm,
>  			      unsigned long start, unsigned long end)
>  {
> -	unsigned long vdso_base = (unsigned long)mm->context.vdso - PAGE_SIZE;
> +	unsigned long vdso_base = (unsigned long)mm->context.vdso;
>  
>  	if (start <= vdso_base && vdso_base < end)
>  		mm->context.vdso = NULL;
> diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
> index e839a906fdf2..b14907209822 100644
> --- a/arch/powerpc/kernel/vdso.c
> +++ b/arch/powerpc/kernel/vdso.c
> @@ -55,10 +55,10 @@ static int vdso_mremap(const struct vm_special_mapping *sm, struct vm_area_struc
>  {
>  	unsigned long new_size = new_vma->vm_end - new_vma->vm_start;
>  
> -	if (new_size != text_size + PAGE_SIZE)
> +	if (new_size != text_size)
>  		return -EINVAL;
>  
> -	current->mm->context.vdso = (void __user *)new_vma->vm_start + PAGE_SIZE;
> +	current->mm->context.vdso = (void __user *)new_vma->vm_start;
>  
>  	return 0;
>  }
> @@ -73,6 +73,10 @@ static int vdso64_mremap(const struct vm_special_mapping *sm, struct vm_area_str
>  	return vdso_mremap(sm, new_vma, &vdso64_end - &vdso64_start);
>  }
>  
> +static struct vm_special_mapping vvar_spec __ro_after_init = {
> +	.name = "[vvar]",
> +};
> +
>  static struct vm_special_mapping vdso32_spec __ro_after_init = {
>  	.name = "[vdso]",
>  	.mremap = vdso32_mremap,
> @@ -89,11 +93,11 @@ static struct vm_special_mapping vdso64_spec __ro_after_init = {
>   */
>  static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
>  {
> -	struct mm_struct *mm = current->mm;
> +	unsigned long vdso_size, vdso_base, mappings_size;
>  	struct vm_special_mapping *vdso_spec;
> +	unsigned long vvar_size = PAGE_SIZE;
> +	struct mm_struct *mm = current->mm;
>  	struct vm_area_struct *vma;
> -	unsigned long vdso_size;
> -	unsigned long vdso_base;
>  
>  	if (is_32bit_task()) {
>  		vdso_spec = &vdso32_spec;
> @@ -110,8 +114,8 @@ static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_int
>  		vdso_base = 0;
>  	}
>  
> -	/* Add a page to the vdso size for the data page */
> -	vdso_size += PAGE_SIZE;
> +	mappings_size = vdso_size + vvar_size;
> +	mappings_size += (VDSO_ALIGNMENT - 1) & PAGE_MASK;
>  
>  	/*
>  	 * pick a base address for the vDSO in process space. We try to put it
> @@ -119,9 +123,7 @@ static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_int
>  	 * and end up putting it elsewhere.
>  	 * Add enough to the size so that the result can be aligned.
>  	 */
> -	vdso_base = get_unmapped_area(NULL, vdso_base,
> -				      vdso_size + ((VDSO_ALIGNMENT - 1) & PAGE_MASK),
> -				      0, 0);
> +	vdso_base = get_unmapped_area(NULL, vdso_base, mappings_size, 0, 0);
>  	if (IS_ERR_VALUE(vdso_base))
>  		return vdso_base;
>  
> @@ -133,7 +135,13 @@ static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_int
>  	 * install_special_mapping or the perf counter mmap tracking code
>  	 * will fail to recognise it as a vDSO.
>  	 */
> -	mm->context.vdso = (void __user *)vdso_base + PAGE_SIZE;
> +	mm->context.vdso = (void __user *)vdso_base + vvar_size;
> +
> +	vma = _install_special_mapping(mm, vdso_base, vvar_size,
> +				       VM_READ | VM_MAYREAD | VM_IO |
> +				       VM_DONTDUMP | VM_PFNMAP, &vvar_spec);
> +	if (IS_ERR(vma))
> +		return PTR_ERR(vma);
>  
>  	/*
>  	 * our vma flags don't have VM_WRITE so by default, the process isn't
> @@ -145,9 +153,12 @@ static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_int
>  	 * It's fine to use that for setting breakpoints in the vDSO code
>  	 * pages though.
>  	 */
> -	vma = _install_special_mapping(mm, vdso_base, vdso_size,
> +	vma = _install_special_mapping(mm, vdso_base + vvar_size, vdso_size,
>  				       VM_READ | VM_EXEC | VM_MAYREAD |
>  				       VM_MAYWRITE | VM_MAYEXEC, vdso_spec);
> +	if (IS_ERR(vma))
> +		do_munmap(mm, vdso_base, vvar_size, NULL);
> +
>  	return PTR_ERR_OR_ZERO(vma);
>  }
>  
> @@ -249,11 +260,22 @@ static struct page ** __init vdso_setup_pages(void *start, void *end)
>  	if (!pagelist)
>  		panic("%s: Cannot allocate page list for VDSO", __func__);
>  
> -	pagelist[0] = virt_to_page(vdso_data);
> -
>  	for (i = 0; i < pages; i++)
> -		pagelist[i + 1] = virt_to_page(start + i * PAGE_SIZE);
> +		pagelist[i] = virt_to_page(start + i * PAGE_SIZE);
> +
> +	return pagelist;
> +}
> +
> +static struct page ** __init vvar_setup_pages(void)
> +{
> +	struct page **pagelist;
>  
> +	/* .pages is NULL-terminated */
> +	pagelist = kcalloc(2, sizeof(struct page *), GFP_KERNEL);
> +	if (!pagelist)
> +		panic("%s: Cannot allocate page list for VVAR", __func__);
> +
> +	pagelist[0] = virt_to_page(vdso_data);
>  	return pagelist;
>  }
>  
> @@ -295,6 +317,8 @@ static int __init vdso_init(void)
>  	if (IS_ENABLED(CONFIG_PPC64))
>  		vdso64_spec.pages = vdso_setup_pages(&vdso64_start, &vdso64_end);
>  
> +	vvar_spec.pages = vvar_setup_pages();
> +
>  	smp_wmb();
>  
>  	return 0;
> 

-- 
Regards,
Vincenzo
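
As a rough illustration of the address layout the patch above ends up with,
the sketch below models the two mappings as plain address ranges: "[vvar]"
starts at the chosen base and "[vdso]" follows it directly, so the stored
vDSO pointer is the start of the text mapping itself. It is a userspace
model only; it assumes 4096-byte pages, ignores the VDSO_ALIGNMENT padding,
and every sketch_ name is hypothetical.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4K pages */

/* Address ranges of the two special mappings, end addresses exclusive. */
struct sketch_layout {
	unsigned long vvar_start, vvar_end;
	unsigned long vdso_start, vdso_end;
};

/* "[vvar]" is installed at base, "[vdso]" at base + vvar_size. */
static struct sketch_layout
sketch_place(unsigned long base, unsigned long vdso_size,
	     unsigned long vvar_size)
{
	struct sketch_layout l;

	l.vvar_start = base;
	l.vvar_end = base + vvar_size;
	l.vdso_start = l.vvar_end;	/* what mm->context.vdso would hold */
	l.vdso_end = l.vdso_start + vdso_size;
	return l;
}

/* Mirrors the arch_unmap() test after the patch: no PAGE_SIZE adjustment. */
static int sketch_unmap_hits_vdso(const struct sketch_layout *l,
				  unsigned long start, unsigned long end)
{
	return start <= l->vdso_start && l->vdso_start < end;
}

/* Mirrors the vdso_mremap() size check: the VMA now holds text only. */
static int sketch_mremap_size_ok(unsigned long new_size,
				 unsigned long text_size)
{
	return new_size == text_size;
}
```

With this model, unmapping only the "[vvar]" range leaves the vDSO pointer
untouched, which matches the intent of dropping the PAGE_SIZE offsets in
arch_unmap() and vdso_mremap().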


* Re: [PATCH RESEND v1 4/4] powerpc/vdso: Add support for time namespaces
  2021-03-31 16:48 ` [PATCH RESEND v1 4/4] powerpc/vdso: Add support for time namespaces Christophe Leroy
  2021-04-05  4:50   ` Andrei Vagin
@ 2021-04-12 13:00   ` Vincenzo Frascino
  1 sibling, 0 replies; 17+ messages in thread
From: Vincenzo Frascino @ 2021-04-12 13:00 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, dima, avagin, arnd, tglx, luto, linux-arch



On 3/31/21 5:48 PM, Christophe Leroy wrote:
> This patch adds the necessary glue to provide time namespaces.
> 
> Things are mainly copied from ARM64.
> 
> __arch_get_timens_vdso_data() calculates timens vdso data position
> based on the vdso data position, knowing it is the next page in vvar.
> This avoids having to redo the mflr/bcl/mflr/mtlr dance to locate
> the page relative to running code position.
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>

Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # vDSO parts
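
The page-selection rule of the fault handler in the patch quoted below can
be summarised in a small userspace model. This is a sketch only: it returns
a label for the backing page instead of inserting a PFN, the has_timens_page
flag stands in for find_timens_vvar_page() returning a non-NULL page, and
all sketch_ and SKETCH_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Same ordering as enum vvar_pages in the patch. */
enum sketch_vvar_pages {
	SKETCH_VVAR_DATA_PAGE_OFFSET,
	SKETCH_VVAR_TIMENS_PAGE_OFFSET,
	SKETCH_VVAR_NR_PAGES,
};

/* What the model reports instead of a real page frame. */
enum sketch_backing {
	SKETCH_BACKING_SIGBUS,		/* fault is refused */
	SKETCH_BACKING_GLOBAL_DATA,	/* the kernel-wide vdso_data page */
	SKETCH_BACKING_TIMENS_DATA,	/* the namespace-specific page */
};

static enum sketch_backing
sketch_vvar_fault(unsigned long pgoff, bool has_timens_page)
{
	switch (pgoff) {
	case SKETCH_VVAR_DATA_PAGE_OFFSET:
		/* In a time namespace the first slot shows the namespace page. */
		return has_timens_page ? SKETCH_BACKING_TIMENS_DATA
				       : SKETCH_BACKING_GLOBAL_DATA;
	case SKETCH_VVAR_TIMENS_PAGE_OFFSET:
		/* The second slot only exists for tasks in a time namespace. */
		return has_timens_page ? SKETCH_BACKING_GLOBAL_DATA
				       : SKETCH_BACKING_SIGBUS;
	default:
		return SKETCH_BACKING_SIGBUS;
	}
}
```

The swap is what lets the vDSO code find the real data one page after the
namespace page with a plain pointer addition, without recomputing its
position relative to the running code.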

> ---
>  arch/powerpc/Kconfig                         |   3 +-
>  arch/powerpc/include/asm/vdso/gettimeofday.h |  10 ++
>  arch/powerpc/include/asm/vdso_datapage.h     |   2 -
>  arch/powerpc/kernel/vdso.c                   | 116 ++++++++++++++++---
>  arch/powerpc/kernel/vdso32/vdso32.lds.S      |   2 +-
>  arch/powerpc/kernel/vdso64/vdso64.lds.S      |   2 +-
>  6 files changed, 114 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index c1344c05226c..71daff5f15d5 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -172,6 +172,7 @@ config PPC
>  	select GENERIC_CPU_AUTOPROBE
>  	select GENERIC_CPU_VULNERABILITIES	if PPC_BARRIER_NOSPEC
>  	select GENERIC_EARLY_IOREMAP
> +	select GENERIC_GETTIMEOFDAY
>  	select GENERIC_IRQ_SHOW
>  	select GENERIC_IRQ_SHOW_LEVEL
>  	select GENERIC_PCI_IOMAP		if PCI
> @@ -179,7 +180,7 @@ config PPC
>  	select GENERIC_STRNCPY_FROM_USER
>  	select GENERIC_STRNLEN_USER
>  	select GENERIC_TIME_VSYSCALL
> -	select GENERIC_GETTIMEOFDAY
> +	select GENERIC_VDSO_TIME_NS
>  	select HAVE_ARCH_AUDITSYSCALL
>  	select HAVE_ARCH_HUGE_VMAP		if PPC_BOOK3S_64 && PPC_RADIX_MMU
>  	select HAVE_ARCH_JUMP_LABEL
> diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h b/arch/powerpc/include/asm/vdso/gettimeofday.h
> index d453e725c79f..e448df1dd071 100644
> --- a/arch/powerpc/include/asm/vdso/gettimeofday.h
> +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
> @@ -2,6 +2,8 @@
>  #ifndef _ASM_POWERPC_VDSO_GETTIMEOFDAY_H
>  #define _ASM_POWERPC_VDSO_GETTIMEOFDAY_H
>  
> +#include <asm/page.h>
> +
>  #ifdef __ASSEMBLY__
>  
>  #include <asm/ppc_asm.h>
> @@ -153,6 +155,14 @@ static __always_inline u64 __arch_get_hw_counter(s32 clock_mode,
>  
>  const struct vdso_data *__arch_get_vdso_data(void);
>  
> +#ifdef CONFIG_TIME_NS
> +static __always_inline
> +const struct vdso_data *__arch_get_timens_vdso_data(const struct vdso_data *vd)
> +{
> +	return (void *)vd + PAGE_SIZE;
> +}
> +#endif
> +
>  static inline bool vdso_clocksource_ok(const struct vdso_data *vd)
>  {
>  	return true;
> diff --git a/arch/powerpc/include/asm/vdso_datapage.h b/arch/powerpc/include/asm/vdso_datapage.h
> index 3f958ecf2beb..a585c8e538ff 100644
> --- a/arch/powerpc/include/asm/vdso_datapage.h
> +++ b/arch/powerpc/include/asm/vdso_datapage.h
> @@ -107,9 +107,7 @@ extern struct vdso_arch_data *vdso_data;
>  	bcl	20, 31, .+4
>  999:
>  	mflr	\ptr
> -#if CONFIG_PPC_PAGE_SHIFT > 14
>  	addis	\ptr, \ptr, (_vdso_datapage - 999b)@ha
> -#endif
>  	addi	\ptr, \ptr, (_vdso_datapage - 999b)@l
>  .endm
>  
> diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
> index b14907209822..717f2c9a7573 100644
> --- a/arch/powerpc/kernel/vdso.c
> +++ b/arch/powerpc/kernel/vdso.c
> @@ -18,6 +18,7 @@
>  #include <linux/security.h>
>  #include <linux/memblock.h>
>  #include <linux/syscalls.h>
> +#include <linux/time_namespace.h>
>  #include <vdso/datapage.h>
>  
>  #include <asm/syscall.h>
> @@ -50,6 +51,12 @@ static union {
>  } vdso_data_store __page_aligned_data;
>  struct vdso_arch_data *vdso_data = &vdso_data_store.data;
>  
> +enum vvar_pages {
> +	VVAR_DATA_PAGE_OFFSET,
> +	VVAR_TIMENS_PAGE_OFFSET,
> +	VVAR_NR_PAGES,
> +};
> +
>  static int vdso_mremap(const struct vm_special_mapping *sm, struct vm_area_struct *new_vma,
>  		       unsigned long text_size)
>  {
> @@ -73,8 +80,12 @@ static int vdso64_mremap(const struct vm_special_mapping *sm, struct vm_area_str
>  	return vdso_mremap(sm, new_vma, &vdso64_end - &vdso64_start);
>  }
>  
> +static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
> +			     struct vm_area_struct *vma, struct vm_fault *vmf);
> +
>  static struct vm_special_mapping vvar_spec __ro_after_init = {
>  	.name = "[vvar]",
> +	.fault = vvar_fault,
>  };
>  
>  static struct vm_special_mapping vdso32_spec __ro_after_init = {
> @@ -87,6 +98,94 @@ static struct vm_special_mapping vdso64_spec __ro_after_init = {
>  	.mremap = vdso64_mremap,
>  };
>  
> +#ifdef CONFIG_TIME_NS
> +struct vdso_data *arch_get_vdso_data(void *vvar_page)
> +{
> +	return ((struct vdso_arch_data *)vvar_page)->data;
> +}
> +
> +/*
> + * The vvar mapping contains data for a specific time namespace, so when a task
> + * changes namespace we must unmap its vvar data for the old namespace.
> + * Subsequent faults will map in data for the new namespace.
> + *
> + * For more details see timens_setup_vdso_data().
> + */
> +int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
> +{
> +	struct mm_struct *mm = task->mm;
> +	struct vm_area_struct *vma;
> +
> +	mmap_read_lock(mm);
> +
> +	for (vma = mm->mmap; vma; vma = vma->vm_next) {
> +		unsigned long size = vma->vm_end - vma->vm_start;
> +
> +		if (vma_is_special_mapping(vma, &vvar_spec))
> +			zap_page_range(vma, vma->vm_start, size);
> +	}
> +
> +	mmap_read_unlock(mm);
> +	return 0;
> +}
> +
> +static struct page *find_timens_vvar_page(struct vm_area_struct *vma)
> +{
> +	if (likely(vma->vm_mm == current->mm))
> +		return current->nsproxy->time_ns->vvar_page;
> +
> +	/*
> +	 * VM_PFNMAP | VM_IO protect .fault() handler from being called
> +	 * through interfaces like /proc/$pid/mem or
> +	 * process_vm_{readv,writev}() as long as there's no .access()
> +	 * in special_mapping_vmops.
> +	 * For more details check_vma_flags() and __access_remote_vm()
> +	 */
> +	WARN(1, "vvar_page accessed remotely");
> +
> +	return NULL;
> +}
> +#else
> +static struct page *find_timens_vvar_page(struct vm_area_struct *vma)
> +{
> +	return NULL;
> +}
> +#endif
> +
> +static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
> +			     struct vm_area_struct *vma, struct vm_fault *vmf)
> +{
> +	struct page *timens_page = find_timens_vvar_page(vma);
> +	unsigned long pfn;
> +
> +	switch (vmf->pgoff) {
> +	case VVAR_DATA_PAGE_OFFSET:
> +		if (timens_page)
> +			pfn = page_to_pfn(timens_page);
> +		else
> +			pfn = virt_to_pfn(vdso_data);
> +		break;
> +#ifdef CONFIG_TIME_NS
> +	case VVAR_TIMENS_PAGE_OFFSET:
> +		/*
> +		 * If a task belongs to a time namespace then a namespace
> +		 * specific VVAR is mapped with the VVAR_DATA_PAGE_OFFSET and
> +		 * the real VVAR page is mapped with the VVAR_TIMENS_PAGE_OFFSET
> +		 * offset.
> +		 * See also the comment near timens_setup_vdso_data().
> +		 */
> +		if (!timens_page)
> +			return VM_FAULT_SIGBUS;
> +		pfn = virt_to_pfn(vdso_data);
> +		break;
> +#endif /* CONFIG_TIME_NS */
> +	default:
> +		return VM_FAULT_SIGBUS;
> +	}
> +
> +	return vmf_insert_pfn(vma, vmf->address, pfn);
> +}
> +
>  /*
>   * This is called from binfmt_elf, we create the special vma for the
>   * vDSO and insert it into the mm struct tree
> @@ -95,7 +194,7 @@ static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_int
>  {
>  	unsigned long vdso_size, vdso_base, mappings_size;
>  	struct vm_special_mapping *vdso_spec;
> -	unsigned long vvar_size = PAGE_SIZE;
> +	unsigned long vvar_size = VVAR_NR_PAGES * PAGE_SIZE;
>  	struct mm_struct *mm = current->mm;
>  	struct vm_area_struct *vma;
>  
> @@ -266,19 +365,6 @@ static struct page ** __init vdso_setup_pages(void *start, void *end)
>  	return pagelist;
>  }
>  
> -static struct page ** __init vvar_setup_pages(void)
> -{
> -	struct page **pagelist;
> -
> -	/* .pages is NULL-terminated */
> -	pagelist = kcalloc(2, sizeof(struct page *), GFP_KERNEL);
> -	if (!pagelist)
> -		panic("%s: Cannot allocate page list for VVAR", __func__);
> -
> -	pagelist[0] = virt_to_page(vdso_data);
> -	return pagelist;
> -}
> -
>  static int __init vdso_init(void)
>  {
>  #ifdef CONFIG_PPC64
> @@ -317,8 +403,6 @@ static int __init vdso_init(void)
>  	if (IS_ENABLED(CONFIG_PPC64))
>  		vdso64_spec.pages = vdso_setup_pages(&vdso64_start, &vdso64_end);
>  
> -	vvar_spec.pages = vvar_setup_pages();
> -
>  	smp_wmb();
>  
>  	return 0;
> diff --git a/arch/powerpc/kernel/vdso32/vdso32.lds.S b/arch/powerpc/kernel/vdso32/vdso32.lds.S
> index a4b806b0d618..58e0099f70f4 100644
> --- a/arch/powerpc/kernel/vdso32/vdso32.lds.S
> +++ b/arch/powerpc/kernel/vdso32/vdso32.lds.S
> @@ -17,7 +17,7 @@ ENTRY(_start)
>  
>  SECTIONS
>  {
> -	PROVIDE(_vdso_datapage = . - PAGE_SIZE);
> +	PROVIDE(_vdso_datapage = . - 2 * PAGE_SIZE);
>  	. = SIZEOF_HEADERS;
>  
>  	.hash          	: { *(.hash) }			:text
> diff --git a/arch/powerpc/kernel/vdso64/vdso64.lds.S b/arch/powerpc/kernel/vdso64/vdso64.lds.S
> index 2f3c359cacd3..0288cad428b0 100644
> --- a/arch/powerpc/kernel/vdso64/vdso64.lds.S
> +++ b/arch/powerpc/kernel/vdso64/vdso64.lds.S
> @@ -17,7 +17,7 @@ ENTRY(_start)
>  
>  SECTIONS
>  {
> -	PROVIDE(_vdso_datapage = . - PAGE_SIZE);
> +	PROVIDE(_vdso_datapage = . - 2 * PAGE_SIZE);
>  	. = SIZEOF_HEADERS;
>  
>  	.hash		: { *(.hash) }			:text
> 
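
For readers following the quoted vvar_fault(), here is a minimal user-space
sketch of its page selection, assuming CONFIG_TIME_NS=y. It is not kernel
code. The enum values and the names vvar_backing() and enum backing are
hypothetical stand-ins introduced only for illustration; the bool in_timens
models "find_timens_vvar_page() returned a non-NULL page".

```c
#include <stdbool.h>

/* Assumed ordering of the page offsets; the real definitions live in the patch. */
enum vvar_pgoff {
	VVAR_DATA_PAGE_OFFSET,
	VVAR_TIMENS_PAGE_OFFSET,
	VVAR_NR_PAGES,
};

/* Hypothetical labels for what the fault handler would end up mapping. */
enum backing {
	BACKING_SIGBUS,      /* handler returns VM_FAULT_SIGBUS */
	BACKING_VDSO_DATA,   /* the real (host) vdso_data page */
	BACKING_TIMENS_PAGE, /* the namespace-specific vvar page */
};

/* Mirrors the switch in the quoted vvar_fault(), with CONFIG_TIME_NS enabled. */
static enum backing vvar_backing(unsigned long pgoff, bool in_timens)
{
	switch (pgoff) {
	case VVAR_DATA_PAGE_OFFSET:
		/* In a time namespace the namespace page takes the data slot. */
		return in_timens ? BACKING_TIMENS_PAGE : BACKING_VDSO_DATA;
	case VVAR_TIMENS_PAGE_OFFSET:
		/* The real data page is only reachable here from inside a namespace. */
		return in_timens ? BACKING_VDSO_DATA : BACKING_SIGBUS;
	default:
		return BACKING_SIGBUS;
	}
}
```

In other words, a task inside a time namespace sees the two pages swapped,
which is why vdso_join_timens() zaps the vvar mapping so that the next fault
picks up the new layout.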

-- 
Regards,
Vincenzo

* Re: [PATCH RESEND v1 0/4] powerpc/vdso: Add support for time namespaces
  2021-04-12 12:49 ` [PATCH RESEND v1 0/4] " Thomas Gleixner
@ 2021-04-13  6:31   ` Michael Ellerman
  0 siblings, 0 replies; 17+ messages in thread
From: Michael Ellerman @ 2021-04-13  6:31 UTC (permalink / raw)
  To: Thomas Gleixner, Christophe Leroy, Benjamin Herrenschmidt,
	Paul Mackerras
  Cc: linux-kernel, linuxppc-dev, dima, avagin, arnd,
	vincenzo.frascino, luto, linux-arch

Thomas Gleixner <tglx@linutronix.de> writes:
> On Wed, Mar 31 2021 at 16:48, Christophe Leroy wrote:
>> [Sorry, resending with complete destination list, I used the wrong script on the first delivery]
>>
>> This series adds support for time namespaces on powerpc.
>>
>> All timens selftests are successful.
>
> If PPC people want to pick up the whole lot, no objections from my side.

Thanks, will do.

cheers

* Re: [PATCH RESEND v1 0/4] powerpc/vdso: Add support for time namespaces
  2021-03-31 16:48 [PATCH RESEND v1 0/4] powerpc/vdso: Add support for time namespaces Christophe Leroy
                   ` (4 preceding siblings ...)
  2021-04-12 12:49 ` [PATCH RESEND v1 0/4] " Thomas Gleixner
@ 2021-04-19  4:00 ` Michael Ellerman
  5 siblings, 0 replies; 17+ messages in thread
From: Michael Ellerman @ 2021-04-19  4:00 UTC (permalink / raw)
  To: Paul Mackerras, Benjamin Herrenschmidt, Christophe Leroy,
	Michael Ellerman
  Cc: luto, tglx, linux-arch, linuxppc-dev, avagin, arnd, linux-kernel,
	vincenzo.frascino, dima

On Wed, 31 Mar 2021 16:48:43 +0000 (UTC), Christophe Leroy wrote:
> [Sorry, resending with complete destination list, I used the wrong script on the first delivery]
> 
> This series adds support for time namespaces on powerpc.
> 
> All timens selftests are successful.
> 
> Christophe Leroy (3):
>   lib/vdso: Mark do_hres_timens() and do_coarse_timens()
>     __always_inline()
>   lib/vdso: Add vdso_data pointer as input to
>     __arch_get_timens_vdso_data()
>   powerpc/vdso: Add support for time namespaces
> 
> [...]

Applied to powerpc/next.

[1/4] lib/vdso: Mark do_hres_timens() and do_coarse_timens() __always_inline()
      https://git.kernel.org/powerpc/c/58efe9f696cf908f40d6672aeca81cb2ad2bc762
[2/4] lib/vdso: Add vdso_data pointer as input to __arch_get_timens_vdso_data()
      https://git.kernel.org/powerpc/c/808094fcbf4196be0feb17afbbdc182ec95c8cec
[3/4] powerpc/vdso: Separate vvar vma from vdso
      https://git.kernel.org/powerpc/c/1c4bce6753857dc409a0197342d18764e7f4b741
[4/4] powerpc/vdso: Add support for time namespaces
      https://git.kernel.org/powerpc/c/74205b3fc2effde821b219d955c70e727dc43cc6

cheers

Thread overview: 17+ messages
2021-03-31 16:48 [PATCH RESEND v1 0/4] powerpc/vdso: Add support for time namespaces Christophe Leroy
2021-03-31 16:48 ` [PATCH RESEND v1 1/4] lib/vdso: Mark do_hres_timens() and do_coarse_timens() __always_inline() Christophe Leroy
2021-04-12 12:47   ` Thomas Gleixner
2021-04-12 12:54   ` Vincenzo Frascino
2021-03-31 16:48 ` [PATCH RESEND v1 2/4] lib/vdso: Add vdso_data pointer as input to __arch_get_timens_vdso_data() Christophe Leroy
2021-04-05  5:00   ` Andrei Vagin
2021-04-12 12:48   ` Thomas Gleixner
2021-04-12 12:56   ` Vincenzo Frascino
2021-03-31 16:48 ` [PATCH RESEND v1 3/4] powerpc/vdso: Separate vvar vma from vdso Christophe Leroy
2021-04-05  5:03   ` Andrei Vagin
2021-04-12 12:58   ` Vincenzo Frascino
2021-03-31 16:48 ` [PATCH RESEND v1 4/4] powerpc/vdso: Add support for time namespaces Christophe Leroy
2021-04-05  4:50   ` Andrei Vagin
2021-04-12 13:00   ` Vincenzo Frascino
2021-04-12 12:49 ` [PATCH RESEND v1 0/4] " Thomas Gleixner
2021-04-13  6:31   ` Michael Ellerman
2021-04-19  4:00 ` Michael Ellerman
