All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/6] arm64: add the time namespace support
@ 2020-06-24  8:33 ` Andrei Vagin
  0 siblings, 0 replies; 39+ messages in thread
From: Andrei Vagin @ 2020-06-24  8:33 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland
  Cc: linux-arm-kernel, linux-kernel, Vincenzo Frascino,
	Thomas Gleixner, Dmitry Safonov, Christian Brauner, Andrei Vagin

Allocate the time namespace page among VVAR pages and add the logic
to handle faults on VVAR properly.

If a task belongs to a time namespace then the VVAR page which contains
the system wide VDSO data is replaced with a namespace specific page
which has the same layout as the VVAR page. That page has vdso_data->seq
set to 1 to enforce the slow path and vdso_data->clock_mode set to
VCLOCK_TIMENS to enforce the time namespace handling path.

The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
update of the VDSO data is in progress, is not really affecting regular
tasks which are not part of a time namespace as the task is spin waiting
for the update to finish and vdso_data->seq to become even again.

If a time namespace task hits that code path, it invokes the corresponding
time getter function which retrieves the real VVAR page, reads host time
and then adds the offset for the requested clock which is stored in the
special VVAR page.

v2: Code cleanups suggested by Vincenzo.
v3: add a comment in __arch_get_timens_vdso_data.
v4: - fix an issue reported by the lkp robot.
    - vvar has the same size with/without CONFIG_TIME_NAMESPACE, but the
      timens page isn't allocated on !CONFIG_TIME_NAMESPACE. This
      simplifies criu/vdso migration between different kernel configs.
v5: - Code cleanups suggested by Mark Rutland.
    - In vdso_join_timens, mmap_write_lock is downgraded to
      mmap_read_lock. The VMA list isn't changed there, zap_page_range
      doesn't require mmap_write_lock.

Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dmitry Safonov <dima@arista.com>

v5 on github (if someone prefers `git pull` to `git am`):
https://github.com/avagin/linux-task-diag/tree/arm64/timens-v5

Andrei Vagin (6):
  arm64/vdso: use the fault callback to map vvar pages
  arm64/vdso: Zap vvar pages when switching to a time namespace
  arm64/vdso: Add time namespace page
  arm64/vdso: Handle faults on timens page
  arm64/vdso: Restrict splitting VVAR VMA
  arm64: enable time namespace support

--
2.24.1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v5 0/6] arm64: add the time namespace support
@ 2020-06-24  8:33 ` Andrei Vagin
  0 siblings, 0 replies; 39+ messages in thread
From: Andrei Vagin @ 2020-06-24  8:33 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland
  Cc: Dmitry Safonov, linux-kernel, Andrei Vagin, Thomas Gleixner,
	Vincenzo Frascino, Christian Brauner, linux-arm-kernel

Allocate the time namespace page among VVAR pages and add the logic
to handle faults on VVAR properly.

If a task belongs to a time namespace then the VVAR page which contains
the system wide VDSO data is replaced with a namespace specific page
which has the same layout as the VVAR page. That page has vdso_data->seq
set to 1 to enforce the slow path and vdso_data->clock_mode set to
VCLOCK_TIMENS to enforce the time namespace handling path.

The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
update of the VDSO data is in progress, is not really affecting regular
tasks which are not part of a time namespace as the task is spin waiting
for the update to finish and vdso_data->seq to become even again.

If a time namespace task hits that code path, it invokes the corresponding
time getter function which retrieves the real VVAR page, reads host time
and then adds the offset for the requested clock which is stored in the
special VVAR page.

v2: Code cleanups suggested by Vincenzo.
v3: add a comment in __arch_get_timens_vdso_data.
v4: - fix an issue reported by the lkp robot.
    - vvar has the same size with/without CONFIG_TIME_NAMESPACE, but the
      timens page isn't allocated on !CONFIG_TIME_NAMESPACE. This
      simplifies criu/vdso migration between different kernel configs.
v5: - Code cleanups suggested by Mark Rutland.
    - In vdso_join_timens, mmap_write_lock is downgraded to
      mmap_read_lock. The VMA list isn't changed there, zap_page_range
      doesn't require mmap_write_lock.

Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dmitry Safonov <dima@arista.com>

v5 on github (if someone prefers `git pull` to `git am`):
https://github.com/avagin/linux-task-diag/tree/arm64/timens-v5

Andrei Vagin (6):
  arm64/vdso: use the fault callback to map vvar pages
  arm64/vdso: Zap vvar pages when switching to a time namespace
  arm64/vdso: Add time namespace page
  arm64/vdso: Handle faults on timens page
  arm64/vdso: Restrict splitting VVAR VMA
  arm64: enable time namespace support

--
2.24.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 1/6] arm64/vdso: use the fault callback to map vvar pages
  2020-06-24  8:33 ` Andrei Vagin
@ 2020-06-24  8:33   ` Andrei Vagin
  -1 siblings, 0 replies; 39+ messages in thread
From: Andrei Vagin @ 2020-06-24  8:33 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland
  Cc: linux-arm-kernel, linux-kernel, Vincenzo Frascino,
	Thomas Gleixner, Dmitry Safonov, Christian Brauner, Andrei Vagin

Currently the vdso has no awareness of time namespaces, which may
apply distinct offsets to processes in different namespaces. To handle
this within the vdso, we'll need to expose a per-namespace data page.

As a preparatory step, this patch separates the vdso data page from
the code pages, and has it faulted in via its own fault callback.
Subsquent patches will extend this to support distinct pages per time
namespace.

The vvar vma has to be installed with the VM_PFNMAP flag to handle
faults via its vma fault callback.

Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
---
 arch/arm64/kernel/vdso.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index 4e016574bd91..7c4620451fa5 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -107,29 +107,32 @@ static int __vdso_init(enum vdso_abi abi)
 			vdso_info[abi].vdso_code_start) >>
 			PAGE_SHIFT;
 
-	/* Allocate the vDSO pagelist, plus a page for the data. */
-	vdso_pagelist = kcalloc(vdso_info[abi].vdso_pages + 1,
+	vdso_pagelist = kcalloc(vdso_info[abi].vdso_pages,
 				sizeof(struct page *),
 				GFP_KERNEL);
 	if (vdso_pagelist == NULL)
 		return -ENOMEM;
 
-	/* Grab the vDSO data page. */
-	vdso_pagelist[0] = phys_to_page(__pa_symbol(vdso_data));
-
-
 	/* Grab the vDSO code pages. */
 	pfn = sym_to_pfn(vdso_info[abi].vdso_code_start);
 
 	for (i = 0; i < vdso_info[abi].vdso_pages; i++)
-		vdso_pagelist[i + 1] = pfn_to_page(pfn + i);
+		vdso_pagelist[i] = pfn_to_page(pfn + i);
 
-	vdso_info[abi].dm->pages = &vdso_pagelist[0];
-	vdso_info[abi].cm->pages = &vdso_pagelist[1];
+	vdso_info[abi].cm->pages = vdso_pagelist;
 
 	return 0;
 }
 
+static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
+			     struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	if (vmf->pgoff == 0)
+		return vmf_insert_pfn(vma, vmf->address,
+				sym_to_pfn(vdso_data));
+	return VM_FAULT_SIGBUS;
+}
+
 static int __setup_additional_pages(enum vdso_abi abi,
 				    struct mm_struct *mm,
 				    struct linux_binprm *bprm,
@@ -150,7 +153,7 @@ static int __setup_additional_pages(enum vdso_abi abi,
 	}
 
 	ret = _install_special_mapping(mm, vdso_base, PAGE_SIZE,
-				       VM_READ|VM_MAYREAD,
+				       VM_READ|VM_MAYREAD|VM_PFNMAP,
 				       vdso_info[abi].dm);
 	if (IS_ERR(ret))
 		goto up_fail;
@@ -209,6 +212,7 @@ static struct vm_special_mapping aarch32_vdso_maps[] = {
 #ifdef CONFIG_COMPAT_VDSO
 	[AA32_MAP_VVAR] = {
 		.name = "[vvar]",
+		.fault = vvar_fault,
 	},
 	[AA32_MAP_VDSO] = {
 		.name = "[vdso]",
@@ -376,6 +380,7 @@ enum aarch64_map {
 static struct vm_special_mapping aarch64_vdso_maps[] __ro_after_init = {
 	[AA64_MAP_VVAR] = {
 		.name	= "[vvar]",
+		.fault = vvar_fault,
 	},
 	[AA64_MAP_VDSO] = {
 		.name	= "[vdso]",
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 1/6] arm64/vdso: use the fault callback to map vvar pages
@ 2020-06-24  8:33   ` Andrei Vagin
  0 siblings, 0 replies; 39+ messages in thread
From: Andrei Vagin @ 2020-06-24  8:33 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland
  Cc: Dmitry Safonov, linux-kernel, Andrei Vagin, Thomas Gleixner,
	Vincenzo Frascino, Christian Brauner, linux-arm-kernel

Currently the vdso has no awareness of time namespaces, which may
apply distinct offsets to processes in different namespaces. To handle
this within the vdso, we'll need to expose a per-namespace data page.

As a preparatory step, this patch separates the vdso data page from
the code pages, and has it faulted in via its own fault callback.
Subsquent patches will extend this to support distinct pages per time
namespace.

The vvar vma has to be installed with the VM_PFNMAP flag to handle
faults via its vma fault callback.

Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
---
 arch/arm64/kernel/vdso.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index 4e016574bd91..7c4620451fa5 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -107,29 +107,32 @@ static int __vdso_init(enum vdso_abi abi)
 			vdso_info[abi].vdso_code_start) >>
 			PAGE_SHIFT;
 
-	/* Allocate the vDSO pagelist, plus a page for the data. */
-	vdso_pagelist = kcalloc(vdso_info[abi].vdso_pages + 1,
+	vdso_pagelist = kcalloc(vdso_info[abi].vdso_pages,
 				sizeof(struct page *),
 				GFP_KERNEL);
 	if (vdso_pagelist == NULL)
 		return -ENOMEM;
 
-	/* Grab the vDSO data page. */
-	vdso_pagelist[0] = phys_to_page(__pa_symbol(vdso_data));
-
-
 	/* Grab the vDSO code pages. */
 	pfn = sym_to_pfn(vdso_info[abi].vdso_code_start);
 
 	for (i = 0; i < vdso_info[abi].vdso_pages; i++)
-		vdso_pagelist[i + 1] = pfn_to_page(pfn + i);
+		vdso_pagelist[i] = pfn_to_page(pfn + i);
 
-	vdso_info[abi].dm->pages = &vdso_pagelist[0];
-	vdso_info[abi].cm->pages = &vdso_pagelist[1];
+	vdso_info[abi].cm->pages = vdso_pagelist;
 
 	return 0;
 }
 
+static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
+			     struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	if (vmf->pgoff == 0)
+		return vmf_insert_pfn(vma, vmf->address,
+				sym_to_pfn(vdso_data));
+	return VM_FAULT_SIGBUS;
+}
+
 static int __setup_additional_pages(enum vdso_abi abi,
 				    struct mm_struct *mm,
 				    struct linux_binprm *bprm,
@@ -150,7 +153,7 @@ static int __setup_additional_pages(enum vdso_abi abi,
 	}
 
 	ret = _install_special_mapping(mm, vdso_base, PAGE_SIZE,
-				       VM_READ|VM_MAYREAD,
+				       VM_READ|VM_MAYREAD|VM_PFNMAP,
 				       vdso_info[abi].dm);
 	if (IS_ERR(ret))
 		goto up_fail;
@@ -209,6 +212,7 @@ static struct vm_special_mapping aarch32_vdso_maps[] = {
 #ifdef CONFIG_COMPAT_VDSO
 	[AA32_MAP_VVAR] = {
 		.name = "[vvar]",
+		.fault = vvar_fault,
 	},
 	[AA32_MAP_VDSO] = {
 		.name = "[vdso]",
@@ -376,6 +380,7 @@ enum aarch64_map {
 static struct vm_special_mapping aarch64_vdso_maps[] __ro_after_init = {
 	[AA64_MAP_VVAR] = {
 		.name	= "[vvar]",
+		.fault = vvar_fault,
 	},
 	[AA64_MAP_VDSO] = {
 		.name	= "[vdso]",
-- 
2.24.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 2/6] arm64/vdso: Zap vvar pages when switching to a time namespace
  2020-06-24  8:33 ` Andrei Vagin
@ 2020-06-24  8:33   ` Andrei Vagin
  -1 siblings, 0 replies; 39+ messages in thread
From: Andrei Vagin @ 2020-06-24  8:33 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland
  Cc: linux-arm-kernel, linux-kernel, Vincenzo Frascino,
	Thomas Gleixner, Dmitry Safonov, Christian Brauner, Andrei Vagin

The order of vvar pages depends on whether a task belongs to the root
time namespace or not. In the root time namespace, a task doesn't have a
per-namespace page. In a non-root namespace, the VVAR page which contains
the system-wide VDSO data is replaced with a namespace specific page
that contains clock offsets.

Whenever a task changes its namespace, the VVAR page tables are cleared
and then they will be re-faulted with a corresponding layout.

A task can switch its time namespace only if its ->mm isn't shared with
another task.

Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
---
 arch/arm64/kernel/vdso.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index 7c4620451fa5..bdf492a17dff 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -124,6 +124,37 @@ static int __vdso_init(enum vdso_abi abi)
 	return 0;
 }
 
+#ifdef CONFIG_TIME_NS
+/*
+ * The vvar mapping contains data for a specific time namespace, so when a task
+ * changes namespace we must unmap its vvar data for the old namespace.
+ * Subsequent faults will map in data for the new namespace.
+ *
+ * For more details see timens_setup_vdso_data().
+ */
+int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
+{
+	struct mm_struct *mm = task->mm;
+	struct vm_area_struct *vma;
+
+	mmap_read_lock(mm);
+
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		unsigned long size = vma->vm_end - vma->vm_start;
+
+		if (vma_is_special_mapping(vma, vdso_info[VDSO_ABI_AA64].dm))
+			zap_page_range(vma, vma->vm_start, size);
+#ifdef CONFIG_COMPAT_VDSO
+		if (vma_is_special_mapping(vma, vdso_info[VDSO_ABI_AA32].dm))
+			zap_page_range(vma, vma->vm_start, size);
+#endif
+	}
+
+	mmap_read_unlock(mm);
+	return 0;
+}
+#endif
+
 static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
 			     struct vm_area_struct *vma, struct vm_fault *vmf)
 {
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 2/6] arm64/vdso: Zap vvar pages when switching to a time namespace
@ 2020-06-24  8:33   ` Andrei Vagin
  0 siblings, 0 replies; 39+ messages in thread
From: Andrei Vagin @ 2020-06-24  8:33 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland
  Cc: Dmitry Safonov, linux-kernel, Andrei Vagin, Thomas Gleixner,
	Vincenzo Frascino, Christian Brauner, linux-arm-kernel

The order of vvar pages depends on whether a task belongs to the root
time namespace or not. In the root time namespace, a task doesn't have a
per-namespace page. In a non-root namespace, the VVAR page which contains
the system-wide VDSO data is replaced with a namespace specific page
that contains clock offsets.

Whenever a task changes its namespace, the VVAR page tables are cleared
and then they will be re-faulted with a corresponding layout.

A task can switch its time namespace only if its ->mm isn't shared with
another task.

Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
---
 arch/arm64/kernel/vdso.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index 7c4620451fa5..bdf492a17dff 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -124,6 +124,37 @@ static int __vdso_init(enum vdso_abi abi)
 	return 0;
 }
 
+#ifdef CONFIG_TIME_NS
+/*
+ * The vvar mapping contains data for a specific time namespace, so when a task
+ * changes namespace we must unmap its vvar data for the old namespace.
+ * Subsequent faults will map in data for the new namespace.
+ *
+ * For more details see timens_setup_vdso_data().
+ */
+int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
+{
+	struct mm_struct *mm = task->mm;
+	struct vm_area_struct *vma;
+
+	mmap_read_lock(mm);
+
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		unsigned long size = vma->vm_end - vma->vm_start;
+
+		if (vma_is_special_mapping(vma, vdso_info[VDSO_ABI_AA64].dm))
+			zap_page_range(vma, vma->vm_start, size);
+#ifdef CONFIG_COMPAT_VDSO
+		if (vma_is_special_mapping(vma, vdso_info[VDSO_ABI_AA32].dm))
+			zap_page_range(vma, vma->vm_start, size);
+#endif
+	}
+
+	mmap_read_unlock(mm);
+	return 0;
+}
+#endif
+
 static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
 			     struct vm_area_struct *vma, struct vm_fault *vmf)
 {
-- 
2.24.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 3/6] arm64/vdso: Add time namespace page
  2020-06-24  8:33 ` Andrei Vagin
@ 2020-06-24  8:33   ` Andrei Vagin
  -1 siblings, 0 replies; 39+ messages in thread
From: Andrei Vagin @ 2020-06-24  8:33 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland
  Cc: linux-arm-kernel, linux-kernel, Vincenzo Frascino,
	Thomas Gleixner, Dmitry Safonov, Christian Brauner, Andrei Vagin

Allocate the time namespace page among VVAR pages.  Provide
__arch_get_timens_vdso_data() helper for VDSO code to get the
code-relative position of VVARs on that special page.

If a task belongs to a time namespace then the VVAR page which contains
the system wide VDSO data is replaced with a namespace specific page
which has the same layout as the VVAR page. That page has vdso_data->seq
set to 1 to enforce the slow path and vdso_data->clock_mode set to
VCLOCK_TIMENS to enforce the time namespace handling path.

The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
update of the VDSO data is in progress, is not really affecting regular
tasks which are not part of a time namespace as the task is spin waiting
for the update to finish and vdso_data->seq to become even again.

If a time namespace task hits that code path, it invokes the corresponding
time getter function which retrieves the real VVAR page, reads host time
and then adds the offset for the requested clock which is stored in the
special VVAR page.

The time-namespace page isn't allocated on !CONFIG_TIME_NAMESPACE, but
vma is the same size, which simplifies criu/vdso migration between
different kernel configs.

Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
---
 arch/arm64/include/asm/vdso.h                 |  2 ++
 .../include/asm/vdso/compat_gettimeofday.h    | 12 ++++++++++++
 arch/arm64/include/asm/vdso/gettimeofday.h    |  8 ++++++++
 arch/arm64/kernel/vdso.c                      | 19 ++++++++++++++++---
 arch/arm64/kernel/vdso/vdso.lds.S             |  5 ++++-
 arch/arm64/kernel/vdso32/vdso.lds.S           |  5 ++++-
 include/vdso/datapage.h                       |  1 +
 7 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/vdso.h b/arch/arm64/include/asm/vdso.h
index 07468428fd29..f99dcb94b438 100644
--- a/arch/arm64/include/asm/vdso.h
+++ b/arch/arm64/include/asm/vdso.h
@@ -12,6 +12,8 @@
  */
 #define VDSO_LBASE	0x0
 
+#define __VVAR_PAGES    2
+
 #ifndef __ASSEMBLY__
 
 #include <generated/vdso-offsets.h>
diff --git a/arch/arm64/include/asm/vdso/compat_gettimeofday.h b/arch/arm64/include/asm/vdso/compat_gettimeofday.h
index b6907ae78e53..b7c549d46d18 100644
--- a/arch/arm64/include/asm/vdso/compat_gettimeofday.h
+++ b/arch/arm64/include/asm/vdso/compat_gettimeofday.h
@@ -152,6 +152,18 @@ static __always_inline const struct vdso_data *__arch_get_vdso_data(void)
 	return ret;
 }
 
+#ifdef CONFIG_TIME_NS
+static __always_inline const struct vdso_data *__arch_get_timens_vdso_data(void)
+{
+	const struct vdso_data *ret;
+
+	/* See __arch_get_vdso_data(). */
+	asm volatile("mov %0, %1" : "=r"(ret) : "r"(_timens_data));
+
+	return ret;
+}
+#endif
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_VDSO_GETTIMEOFDAY_H */
diff --git a/arch/arm64/include/asm/vdso/gettimeofday.h b/arch/arm64/include/asm/vdso/gettimeofday.h
index afba6ba332f8..cf39eae5eaaf 100644
--- a/arch/arm64/include/asm/vdso/gettimeofday.h
+++ b/arch/arm64/include/asm/vdso/gettimeofday.h
@@ -96,6 +96,14 @@ const struct vdso_data *__arch_get_vdso_data(void)
 	return _vdso_data;
 }
 
+#ifdef CONFIG_TIME_NS
+static __always_inline
+const struct vdso_data *__arch_get_timens_vdso_data(void)
+{
+	return _timens_data;
+}
+#endif
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_VDSO_GETTIMEOFDAY_H */
diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index bdf492a17dff..18854e6c1373 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -40,6 +40,12 @@ enum vdso_abi {
 #endif /* CONFIG_COMPAT_VDSO */
 };
 
+enum vvar_pages {
+	VVAR_DATA_PAGE_OFFSET,
+	VVAR_TIMENS_PAGE_OFFSET,
+	VVAR_NR_PAGES,
+};
+
 struct vdso_abi_info {
 	const char *name;
 	const char *vdso_code_start;
@@ -125,6 +131,11 @@ static int __vdso_init(enum vdso_abi abi)
 }
 
 #ifdef CONFIG_TIME_NS
+struct vdso_data *arch_get_vdso_data(void *vvar_page)
+{
+	return (struct vdso_data *)(vvar_page);
+}
+
 /*
  * The vvar mapping contains data for a specific time namespace, so when a task
  * changes namespace we must unmap its vvar data for the old namespace.
@@ -173,9 +184,11 @@ static int __setup_additional_pages(enum vdso_abi abi,
 	unsigned long gp_flags = 0;
 	void *ret;
 
+	BUILD_BUG_ON(VVAR_NR_PAGES != __VVAR_PAGES);
+
 	vdso_text_len = vdso_info[abi].vdso_pages << PAGE_SHIFT;
 	/* Be sure to map the data page */
-	vdso_mapping_len = vdso_text_len + PAGE_SIZE;
+	vdso_mapping_len = vdso_text_len + VVAR_NR_PAGES * PAGE_SIZE;
 
 	vdso_base = get_unmapped_area(NULL, 0, vdso_mapping_len, 0, 0);
 	if (IS_ERR_VALUE(vdso_base)) {
@@ -183,7 +196,7 @@ static int __setup_additional_pages(enum vdso_abi abi,
 		goto up_fail;
 	}
 
-	ret = _install_special_mapping(mm, vdso_base, PAGE_SIZE,
+	ret = _install_special_mapping(mm, vdso_base, VVAR_NR_PAGES * PAGE_SIZE,
 				       VM_READ|VM_MAYREAD|VM_PFNMAP,
 				       vdso_info[abi].dm);
 	if (IS_ERR(ret))
@@ -192,7 +205,7 @@ static int __setup_additional_pages(enum vdso_abi abi,
 	if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) && system_supports_bti())
 		gp_flags = VM_ARM64_BTI;
 
-	vdso_base += PAGE_SIZE;
+	vdso_base += VVAR_NR_PAGES * PAGE_SIZE;
 	mm->context.vdso = (void *)vdso_base;
 	ret = _install_special_mapping(mm, vdso_base, vdso_text_len,
 				       VM_READ|VM_EXEC|gp_flags|
diff --git a/arch/arm64/kernel/vdso/vdso.lds.S b/arch/arm64/kernel/vdso/vdso.lds.S
index 7ad2d3a0cd48..d808ad31e01f 100644
--- a/arch/arm64/kernel/vdso/vdso.lds.S
+++ b/arch/arm64/kernel/vdso/vdso.lds.S
@@ -17,7 +17,10 @@ OUTPUT_ARCH(aarch64)
 
 SECTIONS
 {
-	PROVIDE(_vdso_data = . - PAGE_SIZE);
+	PROVIDE(_vdso_data = . - __VVAR_PAGES * PAGE_SIZE);
+#ifdef CONFIG_TIME_NS
+	PROVIDE(_timens_data = _vdso_data + PAGE_SIZE);
+#endif
 	. = VDSO_LBASE + SIZEOF_HEADERS;
 
 	.hash		: { *(.hash) }			:text
diff --git a/arch/arm64/kernel/vdso32/vdso.lds.S b/arch/arm64/kernel/vdso32/vdso.lds.S
index a3944927eaeb..06cc60a9630f 100644
--- a/arch/arm64/kernel/vdso32/vdso.lds.S
+++ b/arch/arm64/kernel/vdso32/vdso.lds.S
@@ -17,7 +17,10 @@ OUTPUT_ARCH(arm)
 
 SECTIONS
 {
-	PROVIDE_HIDDEN(_vdso_data = . - PAGE_SIZE);
+	PROVIDE_HIDDEN(_vdso_data = . - __VVAR_PAGES * PAGE_SIZE);
+#ifdef CONFIG_TIME_NS
+	PROVIDE_HIDDEN(_timens_data = _vdso_data + PAGE_SIZE);
+#endif
 	. = VDSO_LBASE + SIZEOF_HEADERS;
 
 	.hash		: { *(.hash) }			:text
diff --git a/include/vdso/datapage.h b/include/vdso/datapage.h
index 7955c56d6b3c..ee810cae4e1e 100644
--- a/include/vdso/datapage.h
+++ b/include/vdso/datapage.h
@@ -109,6 +109,7 @@ struct vdso_data {
  * relocation, and this is what we need.
  */
 extern struct vdso_data _vdso_data[CS_BASES] __attribute__((visibility("hidden")));
+extern struct vdso_data _timens_data[CS_BASES] __attribute__((visibility("hidden")));
 
 /*
  * The generic vDSO implementation requires that gettimeofday.h
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 3/6] arm64/vdso: Add time namespace page
@ 2020-06-24  8:33   ` Andrei Vagin
  0 siblings, 0 replies; 39+ messages in thread
From: Andrei Vagin @ 2020-06-24  8:33 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland
  Cc: Dmitry Safonov, linux-kernel, Andrei Vagin, Thomas Gleixner,
	Vincenzo Frascino, Christian Brauner, linux-arm-kernel

Allocate the time namespace page among VVAR pages.  Provide
__arch_get_timens_vdso_data() helper for VDSO code to get the
code-relative position of VVARs on that special page.

If a task belongs to a time namespace then the VVAR page which contains
the system wide VDSO data is replaced with a namespace specific page
which has the same layout as the VVAR page. That page has vdso_data->seq
set to 1 to enforce the slow path and vdso_data->clock_mode set to
VCLOCK_TIMENS to enforce the time namespace handling path.

The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
update of the VDSO data is in progress, is not really affecting regular
tasks which are not part of a time namespace as the task is spin waiting
for the update to finish and vdso_data->seq to become even again.

If a time namespace task hits that code path, it invokes the corresponding
time getter function which retrieves the real VVAR page, reads host time
and then adds the offset for the requested clock which is stored in the
special VVAR page.

The time-namespace page isn't allocated on !CONFIG_TIME_NAMESPACE, but
vma is the same size, which simplifies criu/vdso migration between
different kernel configs.

Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
---
 arch/arm64/include/asm/vdso.h                 |  2 ++
 .../include/asm/vdso/compat_gettimeofday.h    | 12 ++++++++++++
 arch/arm64/include/asm/vdso/gettimeofday.h    |  8 ++++++++
 arch/arm64/kernel/vdso.c                      | 19 ++++++++++++++++---
 arch/arm64/kernel/vdso/vdso.lds.S             |  5 ++++-
 arch/arm64/kernel/vdso32/vdso.lds.S           |  5 ++++-
 include/vdso/datapage.h                       |  1 +
 7 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/vdso.h b/arch/arm64/include/asm/vdso.h
index 07468428fd29..f99dcb94b438 100644
--- a/arch/arm64/include/asm/vdso.h
+++ b/arch/arm64/include/asm/vdso.h
@@ -12,6 +12,8 @@
  */
 #define VDSO_LBASE	0x0
 
+#define __VVAR_PAGES    2
+
 #ifndef __ASSEMBLY__
 
 #include <generated/vdso-offsets.h>
diff --git a/arch/arm64/include/asm/vdso/compat_gettimeofday.h b/arch/arm64/include/asm/vdso/compat_gettimeofday.h
index b6907ae78e53..b7c549d46d18 100644
--- a/arch/arm64/include/asm/vdso/compat_gettimeofday.h
+++ b/arch/arm64/include/asm/vdso/compat_gettimeofday.h
@@ -152,6 +152,18 @@ static __always_inline const struct vdso_data *__arch_get_vdso_data(void)
 	return ret;
 }
 
+#ifdef CONFIG_TIME_NS
+static __always_inline const struct vdso_data *__arch_get_timens_vdso_data(void)
+{
+	const struct vdso_data *ret;
+
+	/* See __arch_get_vdso_data(). */
+	asm volatile("mov %0, %1" : "=r"(ret) : "r"(_timens_data));
+
+	return ret;
+}
+#endif
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_VDSO_GETTIMEOFDAY_H */
diff --git a/arch/arm64/include/asm/vdso/gettimeofday.h b/arch/arm64/include/asm/vdso/gettimeofday.h
index afba6ba332f8..cf39eae5eaaf 100644
--- a/arch/arm64/include/asm/vdso/gettimeofday.h
+++ b/arch/arm64/include/asm/vdso/gettimeofday.h
@@ -96,6 +96,14 @@ const struct vdso_data *__arch_get_vdso_data(void)
 	return _vdso_data;
 }
 
+#ifdef CONFIG_TIME_NS
+static __always_inline
+const struct vdso_data *__arch_get_timens_vdso_data(void)
+{
+	return _timens_data;
+}
+#endif
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_VDSO_GETTIMEOFDAY_H */
diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index bdf492a17dff..18854e6c1373 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -40,6 +40,12 @@ enum vdso_abi {
 #endif /* CONFIG_COMPAT_VDSO */
 };
 
+enum vvar_pages {
+	VVAR_DATA_PAGE_OFFSET,
+	VVAR_TIMENS_PAGE_OFFSET,
+	VVAR_NR_PAGES,
+};
+
 struct vdso_abi_info {
 	const char *name;
 	const char *vdso_code_start;
@@ -125,6 +131,11 @@ static int __vdso_init(enum vdso_abi abi)
 }
 
 #ifdef CONFIG_TIME_NS
+struct vdso_data *arch_get_vdso_data(void *vvar_page)
+{
+	return (struct vdso_data *)(vvar_page);
+}
+
 /*
  * The vvar mapping contains data for a specific time namespace, so when a task
  * changes namespace we must unmap its vvar data for the old namespace.
@@ -173,9 +184,11 @@ static int __setup_additional_pages(enum vdso_abi abi,
 	unsigned long gp_flags = 0;
 	void *ret;
 
+	BUILD_BUG_ON(VVAR_NR_PAGES != __VVAR_PAGES);
+
 	vdso_text_len = vdso_info[abi].vdso_pages << PAGE_SHIFT;
 	/* Be sure to map the data page */
-	vdso_mapping_len = vdso_text_len + PAGE_SIZE;
+	vdso_mapping_len = vdso_text_len + VVAR_NR_PAGES * PAGE_SIZE;
 
 	vdso_base = get_unmapped_area(NULL, 0, vdso_mapping_len, 0, 0);
 	if (IS_ERR_VALUE(vdso_base)) {
@@ -183,7 +196,7 @@ static int __setup_additional_pages(enum vdso_abi abi,
 		goto up_fail;
 	}
 
-	ret = _install_special_mapping(mm, vdso_base, PAGE_SIZE,
+	ret = _install_special_mapping(mm, vdso_base, VVAR_NR_PAGES * PAGE_SIZE,
 				       VM_READ|VM_MAYREAD|VM_PFNMAP,
 				       vdso_info[abi].dm);
 	if (IS_ERR(ret))
@@ -192,7 +205,7 @@ static int __setup_additional_pages(enum vdso_abi abi,
 	if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) && system_supports_bti())
 		gp_flags = VM_ARM64_BTI;
 
-	vdso_base += PAGE_SIZE;
+	vdso_base += VVAR_NR_PAGES * PAGE_SIZE;
 	mm->context.vdso = (void *)vdso_base;
 	ret = _install_special_mapping(mm, vdso_base, vdso_text_len,
 				       VM_READ|VM_EXEC|gp_flags|
diff --git a/arch/arm64/kernel/vdso/vdso.lds.S b/arch/arm64/kernel/vdso/vdso.lds.S
index 7ad2d3a0cd48..d808ad31e01f 100644
--- a/arch/arm64/kernel/vdso/vdso.lds.S
+++ b/arch/arm64/kernel/vdso/vdso.lds.S
@@ -17,7 +17,10 @@ OUTPUT_ARCH(aarch64)
 
 SECTIONS
 {
-	PROVIDE(_vdso_data = . - PAGE_SIZE);
+	PROVIDE(_vdso_data = . - __VVAR_PAGES * PAGE_SIZE);
+#ifdef CONFIG_TIME_NS
+	PROVIDE(_timens_data = _vdso_data + PAGE_SIZE);
+#endif
 	. = VDSO_LBASE + SIZEOF_HEADERS;
 
 	.hash		: { *(.hash) }			:text
diff --git a/arch/arm64/kernel/vdso32/vdso.lds.S b/arch/arm64/kernel/vdso32/vdso.lds.S
index a3944927eaeb..06cc60a9630f 100644
--- a/arch/arm64/kernel/vdso32/vdso.lds.S
+++ b/arch/arm64/kernel/vdso32/vdso.lds.S
@@ -17,7 +17,10 @@ OUTPUT_ARCH(arm)
 
 SECTIONS
 {
-	PROVIDE_HIDDEN(_vdso_data = . - PAGE_SIZE);
+	PROVIDE_HIDDEN(_vdso_data = . - __VVAR_PAGES * PAGE_SIZE);
+#ifdef CONFIG_TIME_NS
+	PROVIDE_HIDDEN(_timens_data = _vdso_data + PAGE_SIZE);
+#endif
 	. = VDSO_LBASE + SIZEOF_HEADERS;
 
 	.hash		: { *(.hash) }			:text
diff --git a/include/vdso/datapage.h b/include/vdso/datapage.h
index 7955c56d6b3c..ee810cae4e1e 100644
--- a/include/vdso/datapage.h
+++ b/include/vdso/datapage.h
@@ -109,6 +109,7 @@ struct vdso_data {
  * relocation, and this is what we need.
  */
 extern struct vdso_data _vdso_data[CS_BASES] __attribute__((visibility("hidden")));
+extern struct vdso_data _timens_data[CS_BASES] __attribute__((visibility("hidden")));
 
 /*
  * The generic vDSO implementation requires that gettimeofday.h
-- 
2.24.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 4/6] arm64/vdso: Handle faults on timens page
  2020-06-24  8:33 ` Andrei Vagin
@ 2020-06-24  8:33   ` Andrei Vagin
  -1 siblings, 0 replies; 39+ messages in thread
From: Andrei Vagin @ 2020-06-24  8:33 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland
  Cc: linux-arm-kernel, linux-kernel, Vincenzo Frascino,
	Thomas Gleixner, Dmitry Safonov, Christian Brauner, Andrei Vagin

If a task belongs to a time namespace then the VVAR page which contains
the system wide VDSO data is replaced with a namespace specific page
which has the same layout as the VVAR page.

Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
---
 arch/arm64/kernel/vdso.c | 57 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 53 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index 18854e6c1373..be9ca8c46cff 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -18,6 +18,7 @@
 #include <linux/sched.h>
 #include <linux/signal.h>
 #include <linux/slab.h>
+#include <linux/time_namespace.h>
 #include <linux/timekeeper_internal.h>
 #include <linux/vmalloc.h>
 #include <vdso/datapage.h>
@@ -164,15 +165,63 @@ int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
 	mmap_read_unlock(mm);
 	return 0;
 }
+
+static struct page *find_timens_vvar_page(struct vm_area_struct *vma)
+{
+	if (likely(vma->vm_mm == current->mm))
+		return current->nsproxy->time_ns->vvar_page;
+
+	/*
+	 * VM_PFNMAP | VM_IO protect .fault() handler from being called
+	 * through interfaces like /proc/$pid/mem or
+	 * process_vm_{readv,writev}() as long as there's no .access()
+	 * in special_mapping_vmops().
+	 * For more details check_vma_flags() and __access_remote_vm()
+	 */
+
+	WARN(1, "vvar_page accessed remotely");
+
+	return NULL;
+}
+#else
+static inline struct page *find_timens_vvar_page(struct vm_area_struct *vma)
+{
+	return NULL;
+}
 #endif
 
 static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
 			     struct vm_area_struct *vma, struct vm_fault *vmf)
 {
-	if (vmf->pgoff == 0)
-		return vmf_insert_pfn(vma, vmf->address,
-				sym_to_pfn(vdso_data));
-	return VM_FAULT_SIGBUS;
+	struct page *timens_page = find_timens_vvar_page(vma);
+	unsigned long pfn;
+
+	switch (vmf->pgoff) {
+	case VVAR_DATA_PAGE_OFFSET:
+		if (timens_page)
+			pfn = page_to_pfn(timens_page);
+		else
+			pfn = sym_to_pfn(vdso_data);
+		break;
+#ifdef CONFIG_TIME_NS
+	case VVAR_TIMENS_PAGE_OFFSET:
+		/*
+		 * If a task belongs to a time namespace then a namespace
+		 * specific VVAR is mapped with the VVAR_DATA_PAGE_OFFSET and
+		 * the real VVAR page is mapped with the VVAR_TIMENS_PAGE_OFFSET
+		 * offset.
+		 * See also the comment near timens_setup_vdso_data().
+		 */
+		if (!timens_page)
+			return VM_FAULT_SIGBUS;
+		pfn = sym_to_pfn(vdso_data);
+		break;
+#endif /* CONFIG_TIME_NS */
+	default:
+		return VM_FAULT_SIGBUS;
+	}
+
+	return vmf_insert_pfn(vma, vmf->address, pfn);
 }
 
 static int __setup_additional_pages(enum vdso_abi abi,
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 4/6] arm64/vdso: Handle faults on timens page
@ 2020-06-24  8:33   ` Andrei Vagin
  0 siblings, 0 replies; 39+ messages in thread
From: Andrei Vagin @ 2020-06-24  8:33 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland
  Cc: Dmitry Safonov, linux-kernel, Andrei Vagin, Thomas Gleixner,
	Vincenzo Frascino, Christian Brauner, linux-arm-kernel

If a task belongs to a time namespace then the VVAR page which contains
the system wide VDSO data is replaced with a namespace specific page
which has the same layout as the VVAR page.

Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
---
 arch/arm64/kernel/vdso.c | 57 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 53 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index 18854e6c1373..be9ca8c46cff 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -18,6 +18,7 @@
 #include <linux/sched.h>
 #include <linux/signal.h>
 #include <linux/slab.h>
+#include <linux/time_namespace.h>
 #include <linux/timekeeper_internal.h>
 #include <linux/vmalloc.h>
 #include <vdso/datapage.h>
@@ -164,15 +165,63 @@ int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
 	mmap_read_unlock(mm);
 	return 0;
 }
+
+static struct page *find_timens_vvar_page(struct vm_area_struct *vma)
+{
+	if (likely(vma->vm_mm == current->mm))
+		return current->nsproxy->time_ns->vvar_page;
+
+	/*
+	 * VM_PFNMAP | VM_IO protect .fault() handler from being called
+	 * through interfaces like /proc/$pid/mem or
+	 * process_vm_{readv,writev}() as long as there's no .access()
+	 * in special_mapping_vmops().
+	 * For more details check_vma_flags() and __access_remote_vm()
+	 */
+
+	WARN(1, "vvar_page accessed remotely");
+
+	return NULL;
+}
+#else
+static inline struct page *find_timens_vvar_page(struct vm_area_struct *vma)
+{
+	return NULL;
+}
 #endif
 
 static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
 			     struct vm_area_struct *vma, struct vm_fault *vmf)
 {
-	if (vmf->pgoff == 0)
-		return vmf_insert_pfn(vma, vmf->address,
-				sym_to_pfn(vdso_data));
-	return VM_FAULT_SIGBUS;
+	struct page *timens_page = find_timens_vvar_page(vma);
+	unsigned long pfn;
+
+	switch (vmf->pgoff) {
+	case VVAR_DATA_PAGE_OFFSET:
+		if (timens_page)
+			pfn = page_to_pfn(timens_page);
+		else
+			pfn = sym_to_pfn(vdso_data);
+		break;
+#ifdef CONFIG_TIME_NS
+	case VVAR_TIMENS_PAGE_OFFSET:
+		/*
+		 * If a task belongs to a time namespace then a namespace
+		 * specific VVAR is mapped with the VVAR_DATA_PAGE_OFFSET and
+		 * the real VVAR page is mapped with the VVAR_TIMENS_PAGE_OFFSET
+		 * offset.
+		 * See also the comment near timens_setup_vdso_data().
+		 */
+		if (!timens_page)
+			return VM_FAULT_SIGBUS;
+		pfn = sym_to_pfn(vdso_data);
+		break;
+#endif /* CONFIG_TIME_NS */
+	default:
+		return VM_FAULT_SIGBUS;
+	}
+
+	return vmf_insert_pfn(vma, vmf->address, pfn);
 }
 
 static int __setup_additional_pages(enum vdso_abi abi,
-- 
2.24.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 5/6] arm64/vdso: Restrict splitting VVAR VMA
  2020-06-24  8:33 ` Andrei Vagin
@ 2020-06-24  8:33   ` Andrei Vagin
  -1 siblings, 0 replies; 39+ messages in thread
From: Andrei Vagin @ 2020-06-24  8:33 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland
  Cc: linux-arm-kernel, linux-kernel, Vincenzo Frascino,
	Thomas Gleixner, Dmitry Safonov, Christian Brauner, Andrei Vagin

Forbid splitting VVAR VMA resulting in a stricter ABI and reducing the
amount of corner-cases to consider while working further on VDSO time
namespace support.

As the offset from timens to VVAR page is computed compile-time, the pages
in VVAR should stay together and not being partically mremap()'ed.

Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
---
 arch/arm64/kernel/vdso.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index be9ca8c46cff..8be6bd6625db 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -224,6 +224,17 @@ static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
 	return vmf_insert_pfn(vma, vmf->address, pfn);
 }
 
+static int vvar_mremap(const struct vm_special_mapping *sm,
+		       struct vm_area_struct *new_vma)
+{
+	unsigned long new_size = new_vma->vm_end - new_vma->vm_start;
+
+	if (new_size != VVAR_NR_PAGES * PAGE_SIZE)
+		return -EINVAL;
+
+	return 0;
+}
+
 static int __setup_additional_pages(enum vdso_abi abi,
 				    struct mm_struct *mm,
 				    struct linux_binprm *bprm,
@@ -306,6 +317,7 @@ static struct vm_special_mapping aarch32_vdso_maps[] = {
 	[AA32_MAP_VVAR] = {
 		.name = "[vvar]",
 		.fault = vvar_fault,
+		.mremap = vvar_mremap,
 	},
 	[AA32_MAP_VDSO] = {
 		.name = "[vdso]",
@@ -474,6 +486,7 @@ static struct vm_special_mapping aarch64_vdso_maps[] __ro_after_init = {
 	[AA64_MAP_VVAR] = {
 		.name	= "[vvar]",
 		.fault = vvar_fault,
+		.mremap = vvar_mremap,
 	},
 	[AA64_MAP_VDSO] = {
 		.name	= "[vdso]",
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 5/6] arm64/vdso: Restrict splitting VVAR VMA
@ 2020-06-24  8:33   ` Andrei Vagin
  0 siblings, 0 replies; 39+ messages in thread
From: Andrei Vagin @ 2020-06-24  8:33 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland
  Cc: Dmitry Safonov, linux-kernel, Andrei Vagin, Thomas Gleixner,
	Vincenzo Frascino, Christian Brauner, linux-arm-kernel

Forbid splitting VVAR VMA resulting in a stricter ABI and reducing the
amount of corner-cases to consider while working further on VDSO time
namespace support.

As the offset from timens to VVAR page is computed compile-time, the pages
in VVAR should stay together and not being partically mremap()'ed.

Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
---
 arch/arm64/kernel/vdso.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index be9ca8c46cff..8be6bd6625db 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -224,6 +224,17 @@ static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
 	return vmf_insert_pfn(vma, vmf->address, pfn);
 }
 
+static int vvar_mremap(const struct vm_special_mapping *sm,
+		       struct vm_area_struct *new_vma)
+{
+	unsigned long new_size = new_vma->vm_end - new_vma->vm_start;
+
+	if (new_size != VVAR_NR_PAGES * PAGE_SIZE)
+		return -EINVAL;
+
+	return 0;
+}
+
 static int __setup_additional_pages(enum vdso_abi abi,
 				    struct mm_struct *mm,
 				    struct linux_binprm *bprm,
@@ -306,6 +317,7 @@ static struct vm_special_mapping aarch32_vdso_maps[] = {
 	[AA32_MAP_VVAR] = {
 		.name = "[vvar]",
 		.fault = vvar_fault,
+		.mremap = vvar_mremap,
 	},
 	[AA32_MAP_VDSO] = {
 		.name = "[vdso]",
@@ -474,6 +486,7 @@ static struct vm_special_mapping aarch64_vdso_maps[] __ro_after_init = {
 	[AA64_MAP_VVAR] = {
 		.name	= "[vvar]",
 		.fault = vvar_fault,
+		.mremap = vvar_mremap,
 	},
 	[AA64_MAP_VDSO] = {
 		.name	= "[vdso]",
-- 
2.24.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 6/6] arm64: enable time namespace support
  2020-06-24  8:33 ` Andrei Vagin
@ 2020-06-24  8:33   ` Andrei Vagin
  -1 siblings, 0 replies; 39+ messages in thread
From: Andrei Vagin @ 2020-06-24  8:33 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland
  Cc: linux-arm-kernel, linux-kernel, Vincenzo Frascino,
	Thomas Gleixner, Dmitry Safonov, Christian Brauner, Andrei Vagin

CONFIG_TIME_NS is dependes on GENERIC_VDSO_TIME_NS.

Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
---
 arch/arm64/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a4a094bedcb2..38d3180adf78 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -118,6 +118,7 @@ config ARM64
 	select GENERIC_STRNLEN_USER
 	select GENERIC_TIME_VSYSCALL
 	select GENERIC_GETTIMEOFDAY
+	select GENERIC_VDSO_TIME_NS
 	select HANDLE_DOMAIN_IRQ
 	select HARDIRQS_SW_RESEND
 	select HAVE_PCI
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 6/6] arm64: enable time namespace support
@ 2020-06-24  8:33   ` Andrei Vagin
  0 siblings, 0 replies; 39+ messages in thread
From: Andrei Vagin @ 2020-06-24  8:33 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland
  Cc: Dmitry Safonov, linux-kernel, Andrei Vagin, Thomas Gleixner,
	Vincenzo Frascino, Christian Brauner, linux-arm-kernel

CONFIG_TIME_NS is dependes on GENERIC_VDSO_TIME_NS.

Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
---
 arch/arm64/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a4a094bedcb2..38d3180adf78 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -118,6 +118,7 @@ config ARM64
 	select GENERIC_STRNLEN_USER
 	select GENERIC_TIME_VSYSCALL
 	select GENERIC_GETTIMEOFDAY
+	select GENERIC_VDSO_TIME_NS
 	select HANDLE_DOMAIN_IRQ
 	select HARDIRQS_SW_RESEND
 	select HAVE_PCI
-- 
2.24.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] arm64/vdso: Zap vvar pages when switching to a time namespace
  2020-06-24  8:33   ` Andrei Vagin
@ 2020-06-24 15:18     ` Christian Brauner
  -1 siblings, 0 replies; 39+ messages in thread
From: Christian Brauner @ 2020-06-24 15:18 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, linux-arm-kernel,
	linux-kernel, Vincenzo Frascino, Thomas Gleixner, Dmitry Safonov

On Wed, Jun 24, 2020 at 01:33:17AM -0700, Andrei Vagin wrote:
> The order of vvar pages depends on whether a task belongs to the root
> time namespace or not. In the root time namespace, a task doesn't have a
> per-namespace page. In a non-root namespace, the VVAR page which contains
> the system-wide VDSO data is replaced with a namespace specific page
> that contains clock offsets.
> 
> Whenever a task changes its namespace, the VVAR page tables are cleared
> and then they will be re-faulted with a corresponding layout.
> 
> A task can switch its time namespace only if its ->mm isn't shared with
> another task.
> 
> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> Reviewed-by: Dmitry Safonov <dima@arista.com>
> Signed-off-by: Andrei Vagin <avagin@gmail.com>
> ---
>  arch/arm64/kernel/vdso.c | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
> index 7c4620451fa5..bdf492a17dff 100644
> --- a/arch/arm64/kernel/vdso.c
> +++ b/arch/arm64/kernel/vdso.c
> @@ -124,6 +124,37 @@ static int __vdso_init(enum vdso_abi abi)
>  	return 0;
>  }
>  
> +#ifdef CONFIG_TIME_NS
> +/*
> + * The vvar mapping contains data for a specific time namespace, so when a task
> + * changes namespace we must unmap its vvar data for the old namespace.
> + * Subsequent faults will map in data for the new namespace.
> + *
> + * For more details see timens_setup_vdso_data().
> + */
> +int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
> +{
> +	struct mm_struct *mm = task->mm;
> +	struct vm_area_struct *vma;
> +
> +	mmap_read_lock(mm);

Perfect, thanks! I'll adapt my patches so that my change and this change
don't conflict and can go in together. Once they're landed we can simply
turn int vdso_join_timens() into void vdso_join_timens() everywhere.

Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>

Thanks!
Christian

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] arm64/vdso: Zap vvar pages when switching to a time namespace
@ 2020-06-24 15:18     ` Christian Brauner
  0 siblings, 0 replies; 39+ messages in thread
From: Christian Brauner @ 2020-06-24 15:18 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: Mark Rutland, Dmitry Safonov, Catalin Marinas, linux-kernel,
	Thomas Gleixner, Vincenzo Frascino, Will Deacon,
	linux-arm-kernel

On Wed, Jun 24, 2020 at 01:33:17AM -0700, Andrei Vagin wrote:
> The order of vvar pages depends on whether a task belongs to the root
> time namespace or not. In the root time namespace, a task doesn't have a
> per-namespace page. In a non-root namespace, the VVAR page which contains
> the system-wide VDSO data is replaced with a namespace specific page
> that contains clock offsets.
> 
> Whenever a task changes its namespace, the VVAR page tables are cleared
> and then they will be re-faulted with a corresponding layout.
> 
> A task can switch its time namespace only if its ->mm isn't shared with
> another task.
> 
> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> Reviewed-by: Dmitry Safonov <dima@arista.com>
> Signed-off-by: Andrei Vagin <avagin@gmail.com>
> ---
>  arch/arm64/kernel/vdso.c | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
> index 7c4620451fa5..bdf492a17dff 100644
> --- a/arch/arm64/kernel/vdso.c
> +++ b/arch/arm64/kernel/vdso.c
> @@ -124,6 +124,37 @@ static int __vdso_init(enum vdso_abi abi)
>  	return 0;
>  }
>  
> +#ifdef CONFIG_TIME_NS
> +/*
> + * The vvar mapping contains data for a specific time namespace, so when a task
> + * changes namespace we must unmap its vvar data for the old namespace.
> + * Subsequent faults will map in data for the new namespace.
> + *
> + * For more details see timens_setup_vdso_data().
> + */
> +int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
> +{
> +	struct mm_struct *mm = task->mm;
> +	struct vm_area_struct *vma;
> +
> +	mmap_read_lock(mm);

Perfect, thanks! I'll adapt my patches so that my change and this change
don't conflict and can go in together. Once they're landed we can simply
turn int vdso_join_timens() into void vdso_join_timens() everywhere.

Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>

Thanks!
Christian

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] arm64/vdso: Zap vvar pages when switching to a time namespace
  2020-06-24 15:18     ` Christian Brauner
@ 2020-06-25  8:25       ` Andrei Vagin
  -1 siblings, 0 replies; 39+ messages in thread
From: Andrei Vagin @ 2020-06-25  8:25 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, linux-arm-kernel,
	linux-kernel, Vincenzo Frascino, Thomas Gleixner, Dmitry Safonov

On Wed, Jun 24, 2020 at 05:18:01PM +0200, Christian Brauner wrote:
> On Wed, Jun 24, 2020 at 01:33:17AM -0700, Andrei Vagin wrote:
> > The order of vvar pages depends on whether a task belongs to the root
> > time namespace or not. In the root time namespace, a task doesn't have a
> > per-namespace page. In a non-root namespace, the VVAR page which contains
> > the system-wide VDSO data is replaced with a namespace specific page
> > that contains clock offsets.
> > 
> > Whenever a task changes its namespace, the VVAR page tables are cleared
> > and then they will be re-faulted with a corresponding layout.
> > 
> > A task can switch its time namespace only if its ->mm isn't shared with
> > another task.
> > 
> > Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > Reviewed-by: Dmitry Safonov <dima@arista.com>
> > Signed-off-by: Andrei Vagin <avagin@gmail.com>
> > ---
> >  arch/arm64/kernel/vdso.c | 31 +++++++++++++++++++++++++++++++
> >  1 file changed, 31 insertions(+)
> > 
> > diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
> > index 7c4620451fa5..bdf492a17dff 100644
> > --- a/arch/arm64/kernel/vdso.c
> > +++ b/arch/arm64/kernel/vdso.c
> > @@ -124,6 +124,37 @@ static int __vdso_init(enum vdso_abi abi)
> >  	return 0;
> >  }
> >  
> > +#ifdef CONFIG_TIME_NS
> > +/*
> > + * The vvar mapping contains data for a specific time namespace, so when a task
> > + * changes namespace we must unmap its vvar data for the old namespace.
> > + * Subsequent faults will map in data for the new namespace.
> > + *
> > + * For more details see timens_setup_vdso_data().
> > + */
> > +int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
> > +{
> > +	struct mm_struct *mm = task->mm;
> > +	struct vm_area_struct *vma;
> > +
> > +	mmap_read_lock(mm);
> 
> Perfect, thanks! I'll adapt my patches so that my change and this change
> don't conflict and can go in together. Once they're landed we can simply
> turn int vdso_join_timens() into void vdso_join_timens() everywhere.

Yep. Let's do it this way. Thanks!

> 
> Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
> 
> Thanks!
> Christian

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/6] arm64/vdso: Zap vvar pages when switching to a time namespace
@ 2020-06-25  8:25       ` Andrei Vagin
  0 siblings, 0 replies; 39+ messages in thread
From: Andrei Vagin @ 2020-06-25  8:25 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Mark Rutland, Dmitry Safonov, Catalin Marinas, linux-kernel,
	Thomas Gleixner, Vincenzo Frascino, Will Deacon,
	linux-arm-kernel

On Wed, Jun 24, 2020 at 05:18:01PM +0200, Christian Brauner wrote:
> On Wed, Jun 24, 2020 at 01:33:17AM -0700, Andrei Vagin wrote:
> > The order of vvar pages depends on whether a task belongs to the root
> > time namespace or not. In the root time namespace, a task doesn't have a
> > per-namespace page. In a non-root namespace, the VVAR page which contains
> > the system-wide VDSO data is replaced with a namespace specific page
> > that contains clock offsets.
> > 
> > Whenever a task changes its namespace, the VVAR page tables are cleared
> > and then they will be re-faulted with a corresponding layout.
> > 
> > A task can switch its time namespace only if its ->mm isn't shared with
> > another task.
> > 
> > Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > Reviewed-by: Dmitry Safonov <dima@arista.com>
> > Signed-off-by: Andrei Vagin <avagin@gmail.com>
> > ---
> >  arch/arm64/kernel/vdso.c | 31 +++++++++++++++++++++++++++++++
> >  1 file changed, 31 insertions(+)
> > 
> > diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
> > index 7c4620451fa5..bdf492a17dff 100644
> > --- a/arch/arm64/kernel/vdso.c
> > +++ b/arch/arm64/kernel/vdso.c
> > @@ -124,6 +124,37 @@ static int __vdso_init(enum vdso_abi abi)
> >  	return 0;
> >  }
> >  
> > +#ifdef CONFIG_TIME_NS
> > +/*
> > + * The vvar mapping contains data for a specific time namespace, so when a task
> > + * changes namespace we must unmap its vvar data for the old namespace.
> > + * Subsequent faults will map in data for the new namespace.
> > + *
> > + * For more details see timens_setup_vdso_data().
> > + */
> > +int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
> > +{
> > +	struct mm_struct *mm = task->mm;
> > +	struct vm_area_struct *vma;
> > +
> > +	mmap_read_lock(mm);
> 
> Perfect, thanks! I'll adapt my patches so that my change and this change
> don't conflict and can go in together. Once they're landed we can simply
> turn int vdso_join_timens() into void vdso_join_timens() everywhere.

Yep. Let's do it this way. Thanks!

> 
> Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
> 
> Thanks!
> Christian

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 0/6] arm64: add the time namespace support
  2020-06-24  8:33 ` Andrei Vagin
@ 2020-07-05  6:40   ` Andrei Vagin
  -1 siblings, 0 replies; 39+ messages in thread
From: Andrei Vagin @ 2020-07-05  6:40 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, linux-kernel, Vincenzo Frascino, Mark Rutland,
	Thomas Gleixner, Dmitry Safonov, Christian Brauner

On Wed, Jun 24, 2020 at 01:33:15AM -0700, Andrei Vagin wrote:
> Allocate the time namespace page among VVAR pages and add the logic
> to handle faults on VVAR properly.
> 
> If a task belongs to a time namespace then the VVAR page which contains
> the system wide VDSO data is replaced with a namespace specific page
> which has the same layout as the VVAR page. That page has vdso_data->seq
> set to 1 to enforce the slow path and vdso_data->clock_mode set to
> VCLOCK_TIMENS to enforce the time namespace handling path.
> 
> The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
> update of the VDSO data is in progress, is not really affecting regular
> tasks which are not part of a time namespace as the task is spin waiting
> for the update to finish and vdso_data->seq to become even again.
> 
> If a time namespace task hits that code path, it invokes the corresponding
> time getter function which retrieves the real VVAR page, reads host time
> and then adds the offset for the requested clock which is stored in the
> special VVAR page.
> 

> v2: Code cleanups suggested by Vincenzo.
> v3: add a comment in __arch_get_timens_vdso_data.
> v4: - fix an issue reported by the lkp robot.
>     - vvar has the same size with/without CONFIG_TIME_NAMESPACE, but the
>       timens page isn't allocated on !CONFIG_TIME_NAMESPACE. This
>       simplifies criu/vdso migration between different kernel configs.
> v5: - Code cleanups suggested by Mark Rutland.
>     - In vdso_join_timens, mmap_write_lock is downgraded to
>       mmap_read_lock. The VMA list isn't changed there, zap_page_range
>       doesn't require mmap_write_lock.
> 
> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> Reviewed-by: Dmitry Safonov <dima@arista.com>

Hello Will and Catalin,

Have you had a chance to look at this patch set? I think it is ready to be
merged. Let me know if you have any questions.

Thanks,
Andrei


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 0/6] arm64: add the time namespace support
@ 2020-07-05  6:40   ` Andrei Vagin
  0 siblings, 0 replies; 39+ messages in thread
From: Andrei Vagin @ 2020-07-05  6:40 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Mark Rutland, Dmitry Safonov, linux-kernel, Thomas Gleixner,
	Vincenzo Frascino, Christian Brauner, linux-arm-kernel

On Wed, Jun 24, 2020 at 01:33:15AM -0700, Andrei Vagin wrote:
> Allocate the time namespace page among VVAR pages and add the logic
> to handle faults on VVAR properly.
> 
> If a task belongs to a time namespace then the VVAR page which contains
> the system wide VDSO data is replaced with a namespace specific page
> which has the same layout as the VVAR page. That page has vdso_data->seq
> set to 1 to enforce the slow path and vdso_data->clock_mode set to
> VCLOCK_TIMENS to enforce the time namespace handling path.
> 
> The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
> update of the VDSO data is in progress, is not really affecting regular
> tasks which are not part of a time namespace as the task is spin waiting
> for the update to finish and vdso_data->seq to become even again.
> 
> If a time namespace task hits that code path, it invokes the corresponding
> time getter function which retrieves the real VVAR page, reads host time
> and then adds the offset for the requested clock which is stored in the
> special VVAR page.
> 

> v2: Code cleanups suggested by Vincenzo.
> v3: add a comment in __arch_get_timens_vdso_data.
> v4: - fix an issue reported by the lkp robot.
>     - vvar has the same size with/without CONFIG_TIME_NAMESPACE, but the
>       timens page isn't allocated on !CONFIG_TIME_NAMESPACE. This
>       simplifies criu/vdso migration between different kernel configs.
> v5: - Code cleanups suggested by Mark Rutland.
>     - In vdso_join_timens, mmap_write_lock is downgraded to
>       mmap_read_lock. The VMA list isn't changed there, zap_page_range
>       doesn't require mmap_write_lock.
> 
> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> Reviewed-by: Dmitry Safonov <dima@arista.com>

Hello Will and Catalin,

Have you had a chance to look at this patch set? I think it is ready to be
merged. Let me know if you have any questions.

Thanks,
Andrei


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 0/6] arm64: add the time namespace support
  2020-07-05  6:40   ` Andrei Vagin
@ 2020-07-14  1:57     ` Andrei Vagin
  -1 siblings, 0 replies; 39+ messages in thread
From: Andrei Vagin @ 2020-07-14  1:57 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, linux-kernel, Vincenzo Frascino, Mark Rutland,
	Thomas Gleixner, Dmitry Safonov, Christian Brauner

On Sat, Jul 04, 2020 at 11:40:55PM -0700, Andrei Vagin wrote:
> On Wed, Jun 24, 2020 at 01:33:15AM -0700, Andrei Vagin wrote:
> > Allocate the time namespace page among VVAR pages and add the logic
> > to handle faults on VVAR properly.
> > 
> > If a task belongs to a time namespace then the VVAR page which contains
> > the system wide VDSO data is replaced with a namespace specific page
> > which has the same layout as the VVAR page. That page has vdso_data->seq
> > set to 1 to enforce the slow path and vdso_data->clock_mode set to
> > VCLOCK_TIMENS to enforce the time namespace handling path.
> > 
> > The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
> > update of the VDSO data is in progress, is not really affecting regular
> > tasks which are not part of a time namespace as the task is spin waiting
> > for the update to finish and vdso_data->seq to become even again.
> > 
> > If a time namespace task hits that code path, it invokes the corresponding
> > time getter function which retrieves the real VVAR page, reads host time
> > and then adds the offset for the requested clock which is stored in the
> > special VVAR page.
> > 
> 
> > v2: Code cleanups suggested by Vincenzo.
> > v3: add a comment in __arch_get_timens_vdso_data.
> > v4: - fix an issue reported by the lkp robot.
> >     - vvar has the same size with/without CONFIG_TIME_NAMESPACE, but the
> >       timens page isn't allocated on !CONFIG_TIME_NAMESPACE. This
> >       simplifies criu/vdso migration between different kernel configs.
> > v5: - Code cleanups suggested by Mark Rutland.
> >     - In vdso_join_timens, mmap_write_lock is downgraded to
> >       mmap_read_lock. The VMA list isn't changed there, zap_page_range
> >       doesn't require mmap_write_lock.
> > 
> > Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > Reviewed-by: Dmitry Safonov <dima@arista.com>
> 
> Hello Will and Catalin,
> 
> Have you had a chance to look at this patch set? I think it is ready to be
> merged. Let me know if you have any questions.

*friendly ping*

If I am doing something wrong, let me know.

> 
> Thanks,
> Andrei
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 0/6] arm64: add the time namespace support
@ 2020-07-14  1:57     ` Andrei Vagin
  0 siblings, 0 replies; 39+ messages in thread
From: Andrei Vagin @ 2020-07-14  1:57 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Mark Rutland, Dmitry Safonov, linux-kernel, Thomas Gleixner,
	Vincenzo Frascino, Christian Brauner, linux-arm-kernel

On Sat, Jul 04, 2020 at 11:40:55PM -0700, Andrei Vagin wrote:
> On Wed, Jun 24, 2020 at 01:33:15AM -0700, Andrei Vagin wrote:
> > Allocate the time namespace page among VVAR pages and add the logic
> > to handle faults on VVAR properly.
> > 
> > If a task belongs to a time namespace then the VVAR page which contains
> > the system wide VDSO data is replaced with a namespace specific page
> > which has the same layout as the VVAR page. That page has vdso_data->seq
> > set to 1 to enforce the slow path and vdso_data->clock_mode set to
> > VCLOCK_TIMENS to enforce the time namespace handling path.
> > 
> > The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
> > update of the VDSO data is in progress, is not really affecting regular
> > tasks which are not part of a time namespace as the task is spin waiting
> > for the update to finish and vdso_data->seq to become even again.
> > 
> > If a time namespace task hits that code path, it invokes the corresponding
> > time getter function which retrieves the real VVAR page, reads host time
> > and then adds the offset for the requested clock which is stored in the
> > special VVAR page.
> > 
> 
> > v2: Code cleanups suggested by Vincenzo.
> > v3: add a comment in __arch_get_timens_vdso_data.
> > v4: - fix an issue reported by the lkp robot.
> >     - vvar has the same size with/without CONFIG_TIME_NAMESPACE, but the
> >       timens page isn't allocated on !CONFIG_TIME_NAMESPACE. This
> >       simplifies criu/vdso migration between different kernel configs.
> > v5: - Code cleanups suggested by Mark Rutland.
> >     - In vdso_join_timens, mmap_write_lock is downgraded to
> >       mmap_read_lock. The VMA list isn't changed there, zap_page_range
> >       doesn't require mmap_write_lock.
> > 
> > Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > Reviewed-by: Dmitry Safonov <dima@arista.com>
> 
> Hello Will and Catalin,
> 
> Have you had a chance to look at this patch set? I think it is ready to be
> merged. Let me know if you have any questions.

*friendly ping*

If I am doing something wrong, let me know.

> 
> Thanks,
> Andrei
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 0/6] arm64: add the time namespace support
  2020-07-14  1:57     ` Andrei Vagin
@ 2020-07-22 18:15       ` Catalin Marinas
  -1 siblings, 0 replies; 39+ messages in thread
From: Catalin Marinas @ 2020-07-22 18:15 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: Will Deacon, linux-arm-kernel, linux-kernel, Vincenzo Frascino,
	Mark Rutland, Thomas Gleixner, Dmitry Safonov, Christian Brauner

On Mon, Jul 13, 2020 at 06:57:43PM -0700, Andrei Vagin wrote:
> On Sat, Jul 04, 2020 at 11:40:55PM -0700, Andrei Vagin wrote:
> > On Wed, Jun 24, 2020 at 01:33:15AM -0700, Andrei Vagin wrote:
> > > Allocate the time namespace page among VVAR pages and add the logic
> > > to handle faults on VVAR properly.
> > > 
> > > If a task belongs to a time namespace then the VVAR page which contains
> > > the system wide VDSO data is replaced with a namespace specific page
> > > which has the same layout as the VVAR page. That page has vdso_data->seq
> > > set to 1 to enforce the slow path and vdso_data->clock_mode set to
> > > VCLOCK_TIMENS to enforce the time namespace handling path.
> > > 
> > > The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
> > > update of the VDSO data is in progress, is not really affecting regular
> > > tasks which are not part of a time namespace as the task is spin waiting
> > > for the update to finish and vdso_data->seq to become even again.
> > > 
> > > If a time namespace task hits that code path, it invokes the corresponding
> > > time getter function which retrieves the real VVAR page, reads host time
> > > and then adds the offset for the requested clock which is stored in the
> > > special VVAR page.
> > > 
> > 
> > > v2: Code cleanups suggested by Vincenzo.
> > > v3: add a comment in __arch_get_timens_vdso_data.
> > > v4: - fix an issue reported by the lkp robot.
> > >     - vvar has the same size with/without CONFIG_TIME_NAMESPACE, but the
> > >       timens page isn't allocated on !CONFIG_TIME_NAMESPACE. This
> > >       simplifies criu/vdso migration between different kernel configs.
> > > v5: - Code cleanups suggested by Mark Rutland.
> > >     - In vdso_join_timens, mmap_write_lock is downgraded to
> > >       mmap_read_lock. The VMA list isn't changed there, zap_page_range
> > >       doesn't require mmap_write_lock.
> > > 
> > > Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > > Reviewed-by: Dmitry Safonov <dima@arista.com>
> > 
> > Hello Will and Catalin,
> > 
> > Have you had a chance to look at this patch set? I think it is ready to be
> > merged. Let me know if you have any questions.
> 
> *friendly ping*
> 
> If I am doing something wrong, let me know.

Not really, just haven't got around to looking into it. Mark Rutland
raised a concern (in private) about the safety of multithreaded apps
but I think you already replied that timens_install() checks for this
already [1].

Maybe a similar atomicity issue to the one raised by Mark but for
single-threaded processes: the thread is executing vdso code, gets
interrupted and a signal handler invokes setns(). Would resuming the
execution in the vdso code on sigreturn cause any issues?

[1] https://lore.kernel.org/linux-arm-kernel/d16b5cd1-bdb1-5667-fbda-c622cc795389@arista.com/

-- 
Catalin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 0/6] arm64: add the time namespace support
@ 2020-07-22 18:15       ` Catalin Marinas
  0 siblings, 0 replies; 39+ messages in thread
From: Catalin Marinas @ 2020-07-22 18:15 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: Mark Rutland, Dmitry Safonov, linux-kernel, Thomas Gleixner,
	Vincenzo Frascino, Will Deacon, Christian Brauner,
	linux-arm-kernel

On Mon, Jul 13, 2020 at 06:57:43PM -0700, Andrei Vagin wrote:
> On Sat, Jul 04, 2020 at 11:40:55PM -0700, Andrei Vagin wrote:
> > On Wed, Jun 24, 2020 at 01:33:15AM -0700, Andrei Vagin wrote:
> > > Allocate the time namespace page among VVAR pages and add the logic
> > > to handle faults on VVAR properly.
> > > 
> > > If a task belongs to a time namespace then the VVAR page which contains
> > > the system wide VDSO data is replaced with a namespace specific page
> > > which has the same layout as the VVAR page. That page has vdso_data->seq
> > > set to 1 to enforce the slow path and vdso_data->clock_mode set to
> > > VCLOCK_TIMENS to enforce the time namespace handling path.
> > > 
> > > The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
> > > update of the VDSO data is in progress, is not really affecting regular
> > > tasks which are not part of a time namespace as the task is spin waiting
> > > for the update to finish and vdso_data->seq to become even again.
> > > 
> > > If a time namespace task hits that code path, it invokes the corresponding
> > > time getter function which retrieves the real VVAR page, reads host time
> > > and then adds the offset for the requested clock which is stored in the
> > > special VVAR page.
> > > 
> > 
> > > v2: Code cleanups suggested by Vincenzo.
> > > v3: add a comment in __arch_get_timens_vdso_data.
> > > v4: - fix an issue reported by the lkp robot.
> > >     - vvar has the same size with/without CONFIG_TIME_NAMESPACE, but the
> > >       timens page isn't allocated on !CONFIG_TIME_NAMESPACE. This
> > >       simplifies criu/vdso migration between different kernel configs.
> > > v5: - Code cleanups suggested by Mark Rutland.
> > >     - In vdso_join_timens, mmap_write_lock is downgraded to
> > >       mmap_read_lock. The VMA list isn't changed there, zap_page_range
> > >       doesn't require mmap_write_lock.
> > > 
> > > Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > > Reviewed-by: Dmitry Safonov <dima@arista.com>
> > 
> > Hello Will and Catalin,
> > 
> > Have you had a chance to look at this patch set? I think it is ready to be
> > merged. Let me know if you have any questions.
> 
> *friendly ping*
> 
> If I am doing something wrong, let me know.

Not really, just haven't got around to looking into it. Mark Rutland
raised a concern (in private) about the safety of multithreaded apps
but I think you already replied that timens_install() checks for this
already [1].

Maybe a similar atomicity issue to the one raised by Mark but for
single-threaded processes: the thread is executing vdso code, gets
interrupted and a signal handler invokes setns(). Would resuming the
execution in the vdso code on sigreturn cause any issues?

[1] https://lore.kernel.org/linux-arm-kernel/d16b5cd1-bdb1-5667-fbda-c622cc795389@arista.com/

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 0/6] arm64: add the time namespace support
  2020-07-22 18:15       ` Catalin Marinas
@ 2020-07-23 17:41         ` Andrei Vagin
  -1 siblings, 0 replies; 39+ messages in thread
From: Andrei Vagin @ 2020-07-23 17:41 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Will Deacon, linux-arm-kernel, linux-kernel, Vincenzo Frascino,
	Mark Rutland, Thomas Gleixner, Dmitry Safonov, Christian Brauner

On Wed, Jul 22, 2020 at 07:15:06PM +0100, Catalin Marinas wrote:
> On Mon, Jul 13, 2020 at 06:57:43PM -0700, Andrei Vagin wrote:
> > On Sat, Jul 04, 2020 at 11:40:55PM -0700, Andrei Vagin wrote:
> > > On Wed, Jun 24, 2020 at 01:33:15AM -0700, Andrei Vagin wrote:
> > > > Allocate the time namespace page among VVAR pages and add the logic
> > > > to handle faults on VVAR properly.
> > > > 
> > > > If a task belongs to a time namespace then the VVAR page which contains
> > > > the system wide VDSO data is replaced with a namespace specific page
> > > > which has the same layout as the VVAR page. That page has vdso_data->seq
> > > > set to 1 to enforce the slow path and vdso_data->clock_mode set to
> > > > VCLOCK_TIMENS to enforce the time namespace handling path.
> > > > 
> > > > The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
> > > > update of the VDSO data is in progress, is not really affecting regular
> > > > tasks which are not part of a time namespace as the task is spin waiting
> > > > for the update to finish and vdso_data->seq to become even again.
> > > > 
> > > > If a time namespace task hits that code path, it invokes the corresponding
> > > > time getter function which retrieves the real VVAR page, reads host time
> > > > and then adds the offset for the requested clock which is stored in the
> > > > special VVAR page.
> > > > 
> > > 
> > > > v2: Code cleanups suggested by Vincenzo.
> > > > v3: add a comment in __arch_get_timens_vdso_data.
> > > > v4: - fix an issue reported by the lkp robot.
> > > >     - vvar has the same size with/without CONFIG_TIME_NAMESPACE, but the
> > > >       timens page isn't allocated on !CONFIG_TIME_NAMESPACE. This
> > > >       simplifies criu/vdso migration between different kernel configs.
> > > > v5: - Code cleanups suggested by Mark Rutland.
> > > >     - In vdso_join_timens, mmap_write_lock is downgraded to
> > > >       mmap_read_lock. The VMA list isn't changed there, zap_page_range
> > > >       doesn't require mmap_write_lock.
> > > > 
> > > > Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > > > Reviewed-by: Dmitry Safonov <dima@arista.com>
> > > 
> > > Hello Will and Catalin,
> > > 
> > > Have you had a chance to look at this patch set? I think it is ready to be
> > > merged. Let me know if you have any questions.
> > 
> > *friendly ping*
> > 
> > If I am doing something wrong, let me know.
> 
> Not really, just haven't got around to looking into it. Mark Rutland
> raised a concern (in private) about the safety of multithreaded apps
> but I think you already replied that timens_install() checks for this
> already [1].
> 
> Maybe a similar atomicity issue to the one raised by Mark but for
> single-threaded processes: the thread is executing vdso code, gets
> interrupted and a signal handler invokes setns(). Would resuming the
> execution in the vdso code on sigreturn cause any issues?

It will not cause any issues in the kernel. In the userspace,
clock_gettime() can return a clock value with an inconsistent offset, if
a process switches between two non-root namespaces. And it can triggers
SIGSEGV if it switches from a non-root to the root time namespace,
because a time namespace isn't mapped in the root time namespace.

I don't think that we need to handle this case in the kernel. Users
must understand what they are doing and have to write code so that avoid
these sort of situations. In general, I would say that in most cases it
is a bad idea to call setns from a signal handler.

Thanks,
Andrei

> 
> [1] https://lore.kernel.org/linux-arm-kernel/d16b5cd1-bdb1-5667-fbda-c622cc795389@arista.com/
> 
> -- 
> Catalin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 0/6] arm64: add the time namespace support
@ 2020-07-23 17:41         ` Andrei Vagin
  0 siblings, 0 replies; 39+ messages in thread
From: Andrei Vagin @ 2020-07-23 17:41 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Mark Rutland, Dmitry Safonov, linux-kernel, Thomas Gleixner,
	Vincenzo Frascino, Will Deacon, Christian Brauner,
	linux-arm-kernel

On Wed, Jul 22, 2020 at 07:15:06PM +0100, Catalin Marinas wrote:
> On Mon, Jul 13, 2020 at 06:57:43PM -0700, Andrei Vagin wrote:
> > On Sat, Jul 04, 2020 at 11:40:55PM -0700, Andrei Vagin wrote:
> > > On Wed, Jun 24, 2020 at 01:33:15AM -0700, Andrei Vagin wrote:
> > > > Allocate the time namespace page among VVAR pages and add the logic
> > > > to handle faults on VVAR properly.
> > > > 
> > > > If a task belongs to a time namespace then the VVAR page which contains
> > > > the system wide VDSO data is replaced with a namespace specific page
> > > > which has the same layout as the VVAR page. That page has vdso_data->seq
> > > > set to 1 to enforce the slow path and vdso_data->clock_mode set to
> > > > VCLOCK_TIMENS to enforce the time namespace handling path.
> > > > 
> > > > The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
> > > > update of the VDSO data is in progress, is not really affecting regular
> > > > tasks which are not part of a time namespace as the task is spin waiting
> > > > for the update to finish and vdso_data->seq to become even again.
> > > > 
> > > > If a time namespace task hits that code path, it invokes the corresponding
> > > > time getter function which retrieves the real VVAR page, reads host time
> > > > and then adds the offset for the requested clock which is stored in the
> > > > special VVAR page.
> > > > 
> > > 
> > > > v2: Code cleanups suggested by Vincenzo.
> > > > v3: add a comment in __arch_get_timens_vdso_data.
> > > > v4: - fix an issue reported by the lkp robot.
> > > >     - vvar has the same size with/without CONFIG_TIME_NAMESPACE, but the
> > > >       timens page isn't allocated on !CONFIG_TIME_NAMESPACE. This
> > > >       simplifies criu/vdso migration between different kernel configs.
> > > > v5: - Code cleanups suggested by Mark Rutland.
> > > >     - In vdso_join_timens, mmap_write_lock is downgraded to
> > > >       mmap_read_lock. The VMA list isn't changed there, zap_page_range
> > > >       doesn't require mmap_write_lock.
> > > > 
> > > > Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > > > Reviewed-by: Dmitry Safonov <dima@arista.com>
> > > 
> > > Hello Will and Catalin,
> > > 
> > > Have you had a chance to look at this patch set? I think it is ready to be
> > > merged. Let me know if you have any questions.
> > 
> > *friendly ping*
> > 
> > If I am doing something wrong, let me know.
> 
> Not really, just haven't got around to looking into it. Mark Rutland
> raised a concern (in private) about the safety of multithreaded apps
> but I think you already replied that timens_install() checks for this
> already [1].
> 
> Maybe a similar atomicity issue to the one raised by Mark but for
> single-threaded processes: the thread is executing vdso code, gets
> interrupted and a signal handler invokes setns(). Would resuming the
> execution in the vdso code on sigreturn cause any issues?

It will not cause any issues in the kernel. In the userspace,
clock_gettime() can return a clock value with an inconsistent offset, if
a process switches between two non-root namespaces. And it can triggers
SIGSEGV if it switches from a non-root to the root time namespace,
because a time namespace isn't mapped in the root time namespace.

I don't think that we need to handle this case in the kernel. Users
must understand what they are doing and have to write code so that avoid
these sort of situations. In general, I would say that in most cases it
is a bad idea to call setns from a signal handler.

Thanks,
Andrei

> 
> [1] https://lore.kernel.org/linux-arm-kernel/d16b5cd1-bdb1-5667-fbda-c622cc795389@arista.com/
> 
> -- 
> Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 0/6] arm64: add the time namespace support
  2020-07-23 17:41         ` Andrei Vagin
@ 2020-07-23 20:25           ` Thomas Gleixner
  -1 siblings, 0 replies; 39+ messages in thread
From: Thomas Gleixner @ 2020-07-23 20:25 UTC (permalink / raw)
  To: Andrei Vagin, Catalin Marinas
  Cc: Will Deacon, linux-arm-kernel, linux-kernel, Vincenzo Frascino,
	Mark Rutland, Dmitry Safonov, Christian Brauner

Andrei Vagin <avagin@gmail.com> writes:
> On Wed, Jul 22, 2020 at 07:15:06PM +0100, Catalin Marinas wrote:
>
> I don't think that we need to handle this case in the kernel. Users
> must understand what they are doing and have to write code so that avoid
> these sort of situations. In general, I would say that in most cases it
> is a bad idea to call setns from a signal handler.

This should not be supported in the first place and just let the
offender die right there.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 0/6] arm64: add the time namespace support
@ 2020-07-23 20:25           ` Thomas Gleixner
  0 siblings, 0 replies; 39+ messages in thread
From: Thomas Gleixner @ 2020-07-23 20:25 UTC (permalink / raw)
  To: Andrei Vagin, Catalin Marinas
  Cc: Mark Rutland, Dmitry Safonov, linux-kernel, Christian Brauner,
	Vincenzo Frascino, Will Deacon, linux-arm-kernel

Andrei Vagin <avagin@gmail.com> writes:
> On Wed, Jul 22, 2020 at 07:15:06PM +0100, Catalin Marinas wrote:
>
> I don't think that we need to handle this case in the kernel. Users
> must understand what they are doing and have to write code so that avoid
> these sort of situations. In general, I would say that in most cases it
> is a bad idea to call setns from a signal handler.

This should not be supported in the first place and just let the
offender die right there.

Thanks,

        tglx

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 0/6] arm64: add the time namespace support
  2020-07-23 20:25           ` Thomas Gleixner
@ 2020-07-24 11:01             ` Catalin Marinas
  -1 siblings, 0 replies; 39+ messages in thread
From: Catalin Marinas @ 2020-07-24 11:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andrei Vagin, Will Deacon, linux-arm-kernel, linux-kernel,
	Vincenzo Frascino, Mark Rutland, Dmitry Safonov,
	Christian Brauner

On Thu, Jul 23, 2020 at 10:25:35PM +0200, Thomas Gleixner wrote:
> Andrei Vagin <avagin@gmail.com> writes:
> > On Wed, Jul 22, 2020 at 07:15:06PM +0100, Catalin Marinas wrote:
> >
> > I don't think that we need to handle this case in the kernel. Users
> > must understand what they are doing and have to write code so that avoid
> > these sort of situations. In general, I would say that in most cases it
> > is a bad idea to call setns from a signal handler.
> 
> This should not be supported in the first place and just let the
> offender die right there.

It would have been nice if we caught the offender easily but since
signal handling doesn't have to be paired with sigreturn(), the kernel
can't tell whether setns() is called in the wrong context. I guess we
just have to live with this (maybe document the restriction in
time_namespaces(7) or setns(2)).

-- 
Catalin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 0/6] arm64: add the time namespace support
@ 2020-07-24 11:01             ` Catalin Marinas
  0 siblings, 0 replies; 39+ messages in thread
From: Catalin Marinas @ 2020-07-24 11:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mark Rutland, Dmitry Safonov, linux-kernel, Andrei Vagin,
	Christian Brauner, Vincenzo Frascino, Will Deacon,
	linux-arm-kernel

On Thu, Jul 23, 2020 at 10:25:35PM +0200, Thomas Gleixner wrote:
> Andrei Vagin <avagin@gmail.com> writes:
> > On Wed, Jul 22, 2020 at 07:15:06PM +0100, Catalin Marinas wrote:
> >
> > I don't think that we need to handle this case in the kernel. Users
> > must understand what they are doing and have to write code so that avoid
> > these sort of situations. In general, I would say that in most cases it
> > is a bad idea to call setns from a signal handler.
> 
> This should not be supported in the first place and just let the
> offender die right there.

It would have been nice if we caught the offender easily but since
signal handling doesn't have to be paired with sigreturn(), the kernel
can't tell whether setns() is called in the wrong context. I guess we
just have to live with this (maybe document the restriction in
time_namespaces(7) or setns(2)).

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 0/6] arm64: add the time namespace support
  2020-07-24 11:01             ` Catalin Marinas
@ 2020-07-24 13:22               ` Thomas Gleixner
  -1 siblings, 0 replies; 39+ messages in thread
From: Thomas Gleixner @ 2020-07-24 13:22 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Andrei Vagin, Will Deacon, linux-arm-kernel, linux-kernel,
	Vincenzo Frascino, Mark Rutland, Dmitry Safonov,
	Christian Brauner

Catalin Marinas <catalin.marinas@arm.com> writes:
> On Thu, Jul 23, 2020 at 10:25:35PM +0200, Thomas Gleixner wrote:
>> Andrei Vagin <avagin@gmail.com> writes:
>> > On Wed, Jul 22, 2020 at 07:15:06PM +0100, Catalin Marinas wrote:
>> >
>> > I don't think that we need to handle this case in the kernel. Users
>> > must understand what they are doing and have to write code so that avoid
>> > these sort of situations. In general, I would say that in most cases it
>> > is a bad idea to call setns from a signal handler.
>> 
>> This should not be supported in the first place and just let the
>> offender die right there.
>
> It would have been nice if we caught the offender easily but since
> signal handling doesn't have to be paired with sigreturn(), the kernel
> can't tell whether setns() is called in the wrong context. I guess we
> just have to live with this (maybe document the restriction in
> time_namespaces(7) or setns(2)).

Yes, I know that it's more or less impossible. The 'should' was just
wishful thinking :)

But yes, proper documentation is required.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 0/6] arm64: add the time namespace support
@ 2020-07-24 13:22               ` Thomas Gleixner
  0 siblings, 0 replies; 39+ messages in thread
From: Thomas Gleixner @ 2020-07-24 13:22 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Mark Rutland, Dmitry Safonov, linux-kernel, Andrei Vagin,
	Christian Brauner, Vincenzo Frascino, Will Deacon,
	linux-arm-kernel

Catalin Marinas <catalin.marinas@arm.com> writes:
> On Thu, Jul 23, 2020 at 10:25:35PM +0200, Thomas Gleixner wrote:
>> Andrei Vagin <avagin@gmail.com> writes:
>> > On Wed, Jul 22, 2020 at 07:15:06PM +0100, Catalin Marinas wrote:
>> >
>> > I don't think that we need to handle this case in the kernel. Users
>> > must understand what they are doing and have to write code so that avoid
>> > these sort of situations. In general, I would say that in most cases it
>> > is a bad idea to call setns from a signal handler.
>> 
>> This should not be supported in the first place and just let the
>> offender die right there.
>
> It would have been nice if we caught the offender easily but since
> signal handling doesn't have to be paired with sigreturn(), the kernel
> can't tell whether setns() is called in the wrong context. I guess we
> just have to live with this (maybe document the restriction in
> time_namespaces(7) or setns(2)).

Yes, I know that it's more or less impossible. The 'should' was just
wishful thinking :)

But yes, proper documentation is required.

Thanks,

        tglx

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 0/6] arm64: add the time namespace support
  2020-07-23 17:41         ` Andrei Vagin
@ 2020-07-24 13:30           ` Christian Brauner
  -1 siblings, 0 replies; 39+ messages in thread
From: Christian Brauner @ 2020-07-24 13:30 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Vincenzo Frascino, Mark Rutland, Thomas Gleixner, Dmitry Safonov

On Thu, Jul 23, 2020 at 10:41:40AM -0700, Andrei Vagin wrote:
> On Wed, Jul 22, 2020 at 07:15:06PM +0100, Catalin Marinas wrote:
> > On Mon, Jul 13, 2020 at 06:57:43PM -0700, Andrei Vagin wrote:
> > > On Sat, Jul 04, 2020 at 11:40:55PM -0700, Andrei Vagin wrote:
> > > > On Wed, Jun 24, 2020 at 01:33:15AM -0700, Andrei Vagin wrote:
> > > > > Allocate the time namespace page among VVAR pages and add the logic
> > > > > to handle faults on VVAR properly.
> > > > > 
> > > > > If a task belongs to a time namespace then the VVAR page which contains
> > > > > the system wide VDSO data is replaced with a namespace specific page
> > > > > which has the same layout as the VVAR page. That page has vdso_data->seq
> > > > > set to 1 to enforce the slow path and vdso_data->clock_mode set to
> > > > > VCLOCK_TIMENS to enforce the time namespace handling path.
> > > > > 
> > > > > The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
> > > > > update of the VDSO data is in progress, is not really affecting regular
> > > > > tasks which are not part of a time namespace as the task is spin waiting
> > > > > for the update to finish and vdso_data->seq to become even again.
> > > > > 
> > > > > If a time namespace task hits that code path, it invokes the corresponding
> > > > > time getter function which retrieves the real VVAR page, reads host time
> > > > > and then adds the offset for the requested clock which is stored in the
> > > > > special VVAR page.
> > > > > 
> > > > 
> > > > > v2: Code cleanups suggested by Vincenzo.
> > > > > v3: add a comment in __arch_get_timens_vdso_data.
> > > > > v4: - fix an issue reported by the lkp robot.
> > > > >     - vvar has the same size with/without CONFIG_TIME_NAMESPACE, but the
> > > > >       timens page isn't allocated on !CONFIG_TIME_NAMESPACE. This
> > > > >       simplifies criu/vdso migration between different kernel configs.
> > > > > v5: - Code cleanups suggested by Mark Rutland.
> > > > >     - In vdso_join_timens, mmap_write_lock is downgraded to
> > > > >       mmap_read_lock. The VMA list isn't changed there, zap_page_range
> > > > >       doesn't require mmap_write_lock.
> > > > > 
> > > > > Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > > > > Reviewed-by: Dmitry Safonov <dima@arista.com>
> > > > 
> > > > Hello Will and Catalin,
> > > > 
> > > > Have you had a chance to look at this patch set? I think it is ready to be
> > > > merged. Let me know if you have any questions.
> > > 
> > > *friendly ping*
> > > 
> > > If I am doing something wrong, let me know.
> > 
> > Not really, just haven't got around to looking into it. Mark Rutland
> > raised a concern (in private) about the safety of multithreaded apps
> > but I think you already replied that timens_install() checks for this
> > already [1].
> > 
> > Maybe a similar atomicity issue to the one raised by Mark but for
> > single-threaded processes: the thread is executing vdso code, gets
> > interrupted and a signal handler invokes setns(). Would resuming the
> > execution in the vdso code on sigreturn cause any issues?
> 
> It will not cause any issues in the kernel. In the userspace,
> clock_gettime() can return a clock value with an inconsistent offset, if
> a process switches between two non-root namespaces. And it can triggers
> SIGSEGV if it switches from a non-root to the root time namespace,
> because a time namespace isn't mapped in the root time namespace.
> 
> I don't think that we need to handle this case in the kernel. Users
> must understand what they are doing and have to write code so that avoid
> these sort of situations. In general, I would say that in most cases it
> is a bad idea to call setns from a signal handler.

I would argue that calling any function not in the list of
man 7 signal-safety
without checking the kernel implementation is "you get to keep the
pieces territory". There's a whole range of syscalls that are not safe
in signal handlers and we don't have any special precautions for them so
I'm not sure we'd need one for setns(). But maybe I'm missing the bigger
picture here.

Thanks!
Christian

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 0/6] arm64: add the time namespace support
@ 2020-07-24 13:30           ` Christian Brauner
  0 siblings, 0 replies; 39+ messages in thread
From: Christian Brauner @ 2020-07-24 13:30 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: Mark Rutland, Dmitry Safonov, Catalin Marinas, linux-kernel,
	Thomas Gleixner, Vincenzo Frascino, Will Deacon,
	linux-arm-kernel

On Thu, Jul 23, 2020 at 10:41:40AM -0700, Andrei Vagin wrote:
> On Wed, Jul 22, 2020 at 07:15:06PM +0100, Catalin Marinas wrote:
> > On Mon, Jul 13, 2020 at 06:57:43PM -0700, Andrei Vagin wrote:
> > > On Sat, Jul 04, 2020 at 11:40:55PM -0700, Andrei Vagin wrote:
> > > > On Wed, Jun 24, 2020 at 01:33:15AM -0700, Andrei Vagin wrote:
> > > > > Allocate the time namespace page among VVAR pages and add the logic
> > > > > to handle faults on VVAR properly.
> > > > > 
> > > > > If a task belongs to a time namespace then the VVAR page which contains
> > > > > the system wide VDSO data is replaced with a namespace specific page
> > > > > which has the same layout as the VVAR page. That page has vdso_data->seq
> > > > > set to 1 to enforce the slow path and vdso_data->clock_mode set to
> > > > > VCLOCK_TIMENS to enforce the time namespace handling path.
> > > > > 
> > > > > The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
> > > > > update of the VDSO data is in progress, is not really affecting regular
> > > > > tasks which are not part of a time namespace as the task is spin waiting
> > > > > for the update to finish and vdso_data->seq to become even again.
> > > > > 
> > > > > If a time namespace task hits that code path, it invokes the corresponding
> > > > > time getter function which retrieves the real VVAR page, reads host time
> > > > > and then adds the offset for the requested clock which is stored in the
> > > > > special VVAR page.
> > > > > 
> > > > 
> > > > > v2: Code cleanups suggested by Vincenzo.
> > > > > v3: add a comment in __arch_get_timens_vdso_data.
> > > > > v4: - fix an issue reported by the lkp robot.
> > > > >     - vvar has the same size with/without CONFIG_TIME_NAMESPACE, but the
> > > > >       timens page isn't allocated on !CONFIG_TIME_NAMESPACE. This
> > > > >       simplifies criu/vdso migration between different kernel configs.
> > > > > v5: - Code cleanups suggested by Mark Rutland.
> > > > >     - In vdso_join_timens, mmap_write_lock is downgraded to
> > > > >       mmap_read_lock. The VMA list isn't changed there, zap_page_range
> > > > >       doesn't require mmap_write_lock.
> > > > > 
> > > > > Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > > > > Reviewed-by: Dmitry Safonov <dima@arista.com>
> > > > 
> > > > Hello Will and Catalin,
> > > > 
> > > > Have you had a chance to look at this patch set? I think it is ready to be
> > > > merged. Let me know if you have any questions.
> > > 
> > > *friendly ping*
> > > 
> > > If I am doing something wrong, let me know.
> > 
> > Not really, just haven't got around to looking into it. Mark Rutland
> > raised a concern (in private) about the safety of multithreaded apps
> > but I think you already replied that timens_install() checks for this
> > already [1].
> > 
> > Maybe a similar atomicity issue to the one raised by Mark but for
> > single-threaded processes: the thread is executing vdso code, gets
> > interrupted and a signal handler invokes setns(). Would resuming the
> > execution in the vdso code on sigreturn cause any issues?
> 
> It will not cause any issues in the kernel. In the userspace,
> clock_gettime() can return a clock value with an inconsistent offset, if
> a process switches between two non-root namespaces. And it can triggers
> SIGSEGV if it switches from a non-root to the root time namespace,
> because a time namespace isn't mapped in the root time namespace.
> 
> I don't think that we need to handle this case in the kernel. Users
> must understand what they are doing and have to write code so that avoid
> these sort of situations. In general, I would say that in most cases it
> is a bad idea to call setns from a signal handler.

I would argue that calling any function not in the list of
man 7 signal-safety
without checking the kernel implementation is "you get to keep the
pieces territory". There's a whole range of syscalls that are not safe
in signal handlers and we don't have any special precautions for them so
I'm not sure we'd need one for setns(). But maybe I'm missing the bigger
picture here.

Thanks!
Christian

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 0/6] arm64: add the time namespace support
  2020-07-24 13:30           ` Christian Brauner
@ 2020-07-24 14:40             ` Catalin Marinas
  -1 siblings, 0 replies; 39+ messages in thread
From: Catalin Marinas @ 2020-07-24 14:40 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andrei Vagin, Will Deacon, linux-arm-kernel, linux-kernel,
	Vincenzo Frascino, Mark Rutland, Thomas Gleixner, Dmitry Safonov

On Fri, Jul 24, 2020 at 03:30:39PM +0200, Christian Brauner wrote:
> On Thu, Jul 23, 2020 at 10:41:40AM -0700, Andrei Vagin wrote:
> > On Wed, Jul 22, 2020 at 07:15:06PM +0100, Catalin Marinas wrote:
> > > On Mon, Jul 13, 2020 at 06:57:43PM -0700, Andrei Vagin wrote:
> > > > On Sat, Jul 04, 2020 at 11:40:55PM -0700, Andrei Vagin wrote:
> > > > > On Wed, Jun 24, 2020 at 01:33:15AM -0700, Andrei Vagin wrote:
> > > > > > Allocate the time namespace page among VVAR pages and add the logic
> > > > > > to handle faults on VVAR properly.
> > > > > > 
> > > > > > If a task belongs to a time namespace then the VVAR page which contains
> > > > > > the system wide VDSO data is replaced with a namespace specific page
> > > > > > which has the same layout as the VVAR page. That page has vdso_data->seq
> > > > > > set to 1 to enforce the slow path and vdso_data->clock_mode set to
> > > > > > VCLOCK_TIMENS to enforce the time namespace handling path.
> > > > > > 
> > > > > > The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
> > > > > > update of the VDSO data is in progress, is not really affecting regular
> > > > > > tasks which are not part of a time namespace as the task is spin waiting
> > > > > > for the update to finish and vdso_data->seq to become even again.
> > > > > > 
> > > > > > If a time namespace task hits that code path, it invokes the corresponding
> > > > > > time getter function which retrieves the real VVAR page, reads host time
> > > > > > and then adds the offset for the requested clock which is stored in the
> > > > > > special VVAR page.
> > > > > > 
> > > > > 
> > > > > > v2: Code cleanups suggested by Vincenzo.
> > > > > > v3: add a comment in __arch_get_timens_vdso_data.
> > > > > > v4: - fix an issue reported by the lkp robot.
> > > > > >     - vvar has the same size with/without CONFIG_TIME_NAMESPACE, but the
> > > > > >       timens page isn't allocated on !CONFIG_TIME_NAMESPACE. This
> > > > > >       simplifies criu/vdso migration between different kernel configs.
> > > > > > v5: - Code cleanups suggested by Mark Rutland.
> > > > > >     - In vdso_join_timens, mmap_write_lock is downgraded to
> > > > > >       mmap_read_lock. The VMA list isn't changed there, zap_page_range
> > > > > >       doesn't require mmap_write_lock.
> > > > > > 
> > > > > > Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > > > > > Reviewed-by: Dmitry Safonov <dima@arista.com>
> > > > > 
> > > > > Hello Will and Catalin,
> > > > > 
> > > > > Have you had a chance to look at this patch set? I think it is ready to be
> > > > > merged. Let me know if you have any questions.
> > > > 
> > > > *friendly ping*
> > > > 
> > > > If I am doing something wrong, let me know.
> > > 
> > > Not really, just haven't got around to looking into it. Mark Rutland
> > > raised a concern (in private) about the safety of multithreaded apps
> > > but I think you already replied that timens_install() checks for this
> > > already [1].
> > > 
> > > Maybe a similar atomicity issue to the one raised by Mark but for
> > > single-threaded processes: the thread is executing vdso code, gets
> > > interrupted and a signal handler invokes setns(). Would resuming the
> > > execution in the vdso code on sigreturn cause any issues?
> > 
> > It will not cause any issues in the kernel. In the userspace,
> > clock_gettime() can return a clock value with an inconsistent offset, if
> > a process switches between two non-root namespaces. And it can triggers
> > SIGSEGV if it switches from a non-root to the root time namespace,
> > because a time namespace isn't mapped in the root time namespace.
> > 
> > I don't think that we need to handle this case in the kernel. Users
> > must understand what they are doing and have to write code so that avoid
> > these sort of situations. In general, I would say that in most cases it
> > is a bad idea to call setns from a signal handler.
> 
> I would argue that calling any function not in the list of
> man 7 signal-safety
> without checking the kernel implementation is "you get to keep the
> pieces territory". There's a whole range of syscalls that are not safe
> in signal handlers and we don't have any special precautions for them so
> I'm not sure we'd need one for setns(). But maybe I'm missing the bigger
> picture here.

Good point (I don't read man pages very often ;)). Thanks for
clarifying.

-- 
Catalin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 0/6] arm64: add the time namespace support
@ 2020-07-24 14:40             ` Catalin Marinas
  0 siblings, 0 replies; 39+ messages in thread
From: Catalin Marinas @ 2020-07-24 14:40 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Mark Rutland, Dmitry Safonov, linux-kernel, Andrei Vagin,
	Thomas Gleixner, Vincenzo Frascino, Will Deacon,
	linux-arm-kernel

On Fri, Jul 24, 2020 at 03:30:39PM +0200, Christian Brauner wrote:
> On Thu, Jul 23, 2020 at 10:41:40AM -0700, Andrei Vagin wrote:
> > On Wed, Jul 22, 2020 at 07:15:06PM +0100, Catalin Marinas wrote:
> > > On Mon, Jul 13, 2020 at 06:57:43PM -0700, Andrei Vagin wrote:
> > > > On Sat, Jul 04, 2020 at 11:40:55PM -0700, Andrei Vagin wrote:
> > > > > On Wed, Jun 24, 2020 at 01:33:15AM -0700, Andrei Vagin wrote:
> > > > > > Allocate the time namespace page among VVAR pages and add the logic
> > > > > > to handle faults on VVAR properly.
> > > > > > 
> > > > > > If a task belongs to a time namespace then the VVAR page which contains
> > > > > > the system wide VDSO data is replaced with a namespace specific page
> > > > > > which has the same layout as the VVAR page. That page has vdso_data->seq
> > > > > > set to 1 to enforce the slow path and vdso_data->clock_mode set to
> > > > > > VCLOCK_TIMENS to enforce the time namespace handling path.
> > > > > > 
> > > > > > The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
> > > > > > update of the VDSO data is in progress, is not really affecting regular
> > > > > > tasks which are not part of a time namespace as the task is spin waiting
> > > > > > for the update to finish and vdso_data->seq to become even again.
> > > > > > 
> > > > > > If a time namespace task hits that code path, it invokes the corresponding
> > > > > > time getter function which retrieves the real VVAR page, reads host time
> > > > > > and then adds the offset for the requested clock which is stored in the
> > > > > > special VVAR page.
> > > > > > 
> > > > > 
> > > > > > v2: Code cleanups suggested by Vincenzo.
> > > > > > v3: add a comment in __arch_get_timens_vdso_data.
> > > > > > v4: - fix an issue reported by the lkp robot.
> > > > > >     - vvar has the same size with/without CONFIG_TIME_NAMESPACE, but the
> > > > > >       timens page isn't allocated on !CONFIG_TIME_NAMESPACE. This
> > > > > >       simplifies criu/vdso migration between different kernel configs.
> > > > > > v5: - Code cleanups suggested by Mark Rutland.
> > > > > >     - In vdso_join_timens, mmap_write_lock is downgraded to
> > > > > >       mmap_read_lock. The VMA list isn't changed there, zap_page_range
> > > > > >       doesn't require mmap_write_lock.
> > > > > > 
> > > > > > Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > > > > > Reviewed-by: Dmitry Safonov <dima@arista.com>
> > > > > 
> > > > > Hello Will and Catalin,
> > > > > 
> > > > > Have you had a chance to look at this patch set? I think it is ready to be
> > > > > merged. Let me know if you have any questions.
> > > > 
> > > > *friendly ping*
> > > > 
> > > > If I am doing something wrong, let me know.
> > > 
> > > Not really, just haven't got around to looking into it. Mark Rutland
> > > raised a concern (in private) about the safety of multithreaded apps
> > > but I think you already replied that timens_install() checks for this
> > > already [1].
> > > 
> > > Maybe a similar atomicity issue to the one raised by Mark but for
> > > single-threaded processes: the thread is executing vdso code, gets
> > > interrupted and a signal handler invokes setns(). Would resuming the
> > > execution in the vdso code on sigreturn cause any issues?
> > 
> > It will not cause any issues in the kernel. In the userspace,
> > clock_gettime() can return a clock value with an inconsistent offset, if
> > a process switches between two non-root namespaces. And it can triggers
> > SIGSEGV if it switches from a non-root to the root time namespace,
> > because a time namespace isn't mapped in the root time namespace.
> > 
> > I don't think that we need to handle this case in the kernel. Users
> > must understand what they are doing and have to write code so that avoid
> > these sort of situations. In general, I would say that in most cases it
> > is a bad idea to call setns from a signal handler.
> 
> I would argue that calling any function not in the list of
> man 7 signal-safety
> without checking the kernel implementation is "you get to keep the
> pieces territory". There's a whole range of syscalls that are not safe
> in signal handlers and we don't have any special precautions for them so
> I'm not sure we'd need one for setns(). But maybe I'm missing the bigger
> picture here.

Good point (I don't read man pages very often ;)). Thanks for
clarifying.

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/6] arm64/vdso: use the fault callback to map vvar pages
  2020-06-24  8:33   ` Andrei Vagin
  (?)
@ 2020-07-24 17:26   ` Catalin Marinas
  2020-07-27  6:45       ` Andrei Vagin
  -1 siblings, 1 reply; 39+ messages in thread
From: Catalin Marinas @ 2020-07-24 17:26 UTC (permalink / raw)
  To: Andrei Vagin, Will Deacon, Mark Rutland
  Cc: Thomas Gleixner, Christian Brauner, Vincenzo Frascino,
	linux-arm-kernel, linux-kernel, Dmitry Safonov

On Wed, 24 Jun 2020 01:33:16 -0700, Andrei Vagin wrote:
> Currently the vdso has no awareness of time namespaces, which may
> apply distinct offsets to processes in different namespaces. To handle
> this within the vdso, we'll need to expose a per-namespace data page.
> 
> As a preparatory step, this patch separates the vdso data page from
> the code pages, and has it faulted in via its own fault callback.
> Subsquent patches will extend this to support distinct pages per time
> namespace.
> 
> [...]

Applied to arm64 (for-next/timens), provisionally.

One potential issue I did not check is the compat vDSO. The arm32 port
does not support timens currently. IIUC, with these patches and
COMPAT_VDSO enabled, it will allow timens for compat processes. Normally
I'd like the arm32 support first before updating compat but I don't
think there would be any interface incompatibility here.

However, does this still work for arm32 processes if COMPAT_VDSO is
disabled in the arm64 kernel?

[1/6] arm64/vdso: use the fault callback to map vvar pages
      https://git.kernel.org/arm64/c/d53b5c013e1e
[2/6] arm64/vdso: Zap vvar pages when switching to a time namespace
      https://git.kernel.org/arm64/c/1b6867d2916b
[3/6] arm64/vdso: Add time namespace page
      https://git.kernel.org/arm64/c/3503d56cc723
[4/6] arm64/vdso: Handle faults on timens page
      https://git.kernel.org/arm64/c/ee3cda8e4606
[5/6] arm64/vdso: Restrict splitting VVAR VMA
      https://git.kernel.org/arm64/c/bcf996434240
[6/6] arm64: enable time namespace support
      https://git.kernel.org/arm64/c/9614cc576d76

Thanks!

-- 
Catalin


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/6] arm64/vdso: use the fault callback to map vvar pages
  2020-07-24 17:26   ` Catalin Marinas
@ 2020-07-27  6:45       ` Andrei Vagin
  0 siblings, 0 replies; 39+ messages in thread
From: Andrei Vagin @ 2020-07-27  6:45 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Will Deacon, Mark Rutland, Thomas Gleixner, Christian Brauner,
	Vincenzo Frascino, linux-arm-kernel, linux-kernel,
	Dmitry Safonov

On Fri, Jul 24, 2020 at 06:26:23PM +0100, Catalin Marinas wrote:
> On Wed, 24 Jun 2020 01:33:16 -0700, Andrei Vagin wrote:
> > Currently the vdso has no awareness of time namespaces, which may
> > apply distinct offsets to processes in different namespaces. To handle
> > this within the vdso, we'll need to expose a per-namespace data page.
> > 
> > As a preparatory step, this patch separates the vdso data page from
> > the code pages, and has it faulted in via its own fault callback.
> > Subsquent patches will extend this to support distinct pages per time
> > namespace.
> > 
> > [...]
> 
> Applied to arm64 (for-next/timens), provisionally.
> 
> One potential issue I did not check is the compat vDSO. The arm32 port
> does not support timens currently. IIUC, with these patches and
> COMPAT_VDSO enabled, it will allow timens for compat processes. Normally
> I'd like the arm32 support first before updating compat but I don't
> think there would be any interface incompatibility here.
> 
> However, does this still work for arm32 processes if COMPAT_VDSO is
> disabled in the arm64 kernel?

Yes, it does. I checked that the timens test passes with and without
COMPAT_VDSO:

[avagin@laptop linux]$ git describe HEAD
v5.8-rc3-6-g9614cc576d76

alpine:/tip/tools/testing/selftests/timens# readelf  -h ./timens
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           ARM
  Version:                           0x1
  Entry point address:               0x711
  Start of program headers:          52 (bytes into file)
  Start of section headers:          15444 (bytes into file)
  Flags:                             0x5000400, Version5 EABI,
hard-float ABI
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         7
  Size of section headers:           40 (bytes)
  Number of section headers:         32
  Section header string table index: 31

alpine:/tip/tools/testing/selftests/timens# uname -a
Linux arm64-alpine 5.8.0-rc3+ #100 SMP Sun Jul 26 23:21:07 PDT 2020
aarch64 Linux


[avagin@laptop linux]$ cat  .config | grep VDSO
CONFIG_COMPAT_VDSO=y
CONFIG_THUMB2_COMPAT_VDSO=y
CONFIG_HAVE_GENERIC_VDSO=y
CONFIG_GENERIC_COMPAT_VDSO=y
CONFIG_GENERIC_VDSO_TIME_NS=y


alpine:/tip/tools/testing/selftests/timens# ./timens
1..10
ok 1 Passed for CLOCK_BOOTTIME (syscall)
ok 2 Passed for CLOCK_BOOTTIME (vdso)
not ok 3 # SKIP CLOCK_BOOTTIME_ALARM isn't supported
not ok 4 # SKIP CLOCK_BOOTTIME_ALARM isn't supported
ok 5 Passed for CLOCK_MONOTONIC (syscall)
ok 6 Passed for CLOCK_MONOTONIC (vdso)
ok 7 Passed for CLOCK_MONOTONIC_COARSE (syscall)
ok 8 Passed for CLOCK_MONOTONIC_COARSE (vdso)
ok 9 Passed for CLOCK_MONOTONIC_RAW (syscall)
ok 10 Passed for CLOCK_MONOTONIC_RAW (vdso)
# Pass 8 Fail 0 Xfail 0 Xpass 0 Skip 2 Error 0


[avagin@laptop linux]$ cat  .config | grep VDSO
# CONFIG_COMPAT_VDSO is not set
CONFIG_HAVE_GENERIC_VDSO=y
CONFIG_GENERIC_VDSO_TIME_NS=y

alpine:/tip/tools/testing/selftests/timens# ./timens
1..10
ok 1 Passed for CLOCK_BOOTTIME (syscall)
ok 2 Passed for CLOCK_BOOTTIME (vdso)
not ok 3 # SKIP CLOCK_BOOTTIME_ALARM isn't supported
not ok 4 # SKIP CLOCK_BOOTTIME_ALARM isn't supported
ok 5 Passed for CLOCK_MONOTONIC (syscall)
ok 6 Passed for CLOCK_MONOTONIC (vdso)
ok 7 Passed for CLOCK_MONOTONIC_COARSE (syscall)
ok 8 Passed for CLOCK_MONOTONIC_COARSE (vdso)
ok 9 Passed for CLOCK_MONOTONIC_RAW (syscall)
ok 10 Passed for CLOCK_MONOTONIC_RAW (vdso)
# Pass 8 Fail 0 Xfail 0 Xpass 0 Skip 2 Error 0

Thanks,
Andrei

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/6] arm64/vdso: use the fault callback to map vvar pages
@ 2020-07-27  6:45       ` Andrei Vagin
  0 siblings, 0 replies; 39+ messages in thread
From: Andrei Vagin @ 2020-07-27  6:45 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Mark Rutland, Dmitry Safonov, linux-kernel, Thomas Gleixner,
	Vincenzo Frascino, Will Deacon, Christian Brauner,
	linux-arm-kernel

On Fri, Jul 24, 2020 at 06:26:23PM +0100, Catalin Marinas wrote:
> On Wed, 24 Jun 2020 01:33:16 -0700, Andrei Vagin wrote:
> > Currently the vdso has no awareness of time namespaces, which may
> > apply distinct offsets to processes in different namespaces. To handle
> > this within the vdso, we'll need to expose a per-namespace data page.
> > 
> > As a preparatory step, this patch separates the vdso data page from
> > the code pages, and has it faulted in via its own fault callback.
> > Subsquent patches will extend this to support distinct pages per time
> > namespace.
> > 
> > [...]
> 
> Applied to arm64 (for-next/timens), provisionally.
> 
> One potential issue I did not check is the compat vDSO. The arm32 port
> does not support timens currently. IIUC, with these patches and
> COMPAT_VDSO enabled, it will allow timens for compat processes. Normally
> I'd like the arm32 support first before updating compat but I don't
> think there would be any interface incompatibility here.
> 
> However, does this still work for arm32 processes if COMPAT_VDSO is
> disabled in the arm64 kernel?

Yes, it does. I checked that the timens test passes with and without
COMPAT_VDSO:

[avagin@laptop linux]$ git describe HEAD
v5.8-rc3-6-g9614cc576d76

alpine:/tip/tools/testing/selftests/timens# readelf  -h ./timens
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           ARM
  Version:                           0x1
  Entry point address:               0x711
  Start of program headers:          52 (bytes into file)
  Start of section headers:          15444 (bytes into file)
  Flags:                             0x5000400, Version5 EABI,
hard-float ABI
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         7
  Size of section headers:           40 (bytes)
  Number of section headers:         32
  Section header string table index: 31

alpine:/tip/tools/testing/selftests/timens# uname -a
Linux arm64-alpine 5.8.0-rc3+ #100 SMP Sun Jul 26 23:21:07 PDT 2020
aarch64 Linux


[avagin@laptop linux]$ cat  .config | grep VDSO
CONFIG_COMPAT_VDSO=y
CONFIG_THUMB2_COMPAT_VDSO=y
CONFIG_HAVE_GENERIC_VDSO=y
CONFIG_GENERIC_COMPAT_VDSO=y
CONFIG_GENERIC_VDSO_TIME_NS=y


alpine:/tip/tools/testing/selftests/timens# ./timens
1..10
ok 1 Passed for CLOCK_BOOTTIME (syscall)
ok 2 Passed for CLOCK_BOOTTIME (vdso)
not ok 3 # SKIP CLOCK_BOOTTIME_ALARM isn't supported
not ok 4 # SKIP CLOCK_BOOTTIME_ALARM isn't supported
ok 5 Passed for CLOCK_MONOTONIC (syscall)
ok 6 Passed for CLOCK_MONOTONIC (vdso)
ok 7 Passed for CLOCK_MONOTONIC_COARSE (syscall)
ok 8 Passed for CLOCK_MONOTONIC_COARSE (vdso)
ok 9 Passed for CLOCK_MONOTONIC_RAW (syscall)
ok 10 Passed for CLOCK_MONOTONIC_RAW (vdso)
# Pass 8 Fail 0 Xfail 0 Xpass 0 Skip 2 Error 0


[avagin@laptop linux]$ cat  .config | grep VDSO
# CONFIG_COMPAT_VDSO is not set
CONFIG_HAVE_GENERIC_VDSO=y
CONFIG_GENERIC_VDSO_TIME_NS=y

alpine:/tip/tools/testing/selftests/timens# ./timens
1..10
ok 1 Passed for CLOCK_BOOTTIME (syscall)
ok 2 Passed for CLOCK_BOOTTIME (vdso)
not ok 3 # SKIP CLOCK_BOOTTIME_ALARM isn't supported
not ok 4 # SKIP CLOCK_BOOTTIME_ALARM isn't supported
ok 5 Passed for CLOCK_MONOTONIC (syscall)
ok 6 Passed for CLOCK_MONOTONIC (vdso)
ok 7 Passed for CLOCK_MONOTONIC_COARSE (syscall)
ok 8 Passed for CLOCK_MONOTONIC_COARSE (vdso)
ok 9 Passed for CLOCK_MONOTONIC_RAW (syscall)
ok 10 Passed for CLOCK_MONOTONIC_RAW (vdso)
# Pass 8 Fail 0 Xfail 0 Xpass 0 Skip 2 Error 0

Thanks,
Andrei

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2020-07-27  6:47 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-24  8:33 [PATCH v5 0/6] arm64: add the time namespace support Andrei Vagin
2020-06-24  8:33 ` Andrei Vagin
2020-06-24  8:33 ` [PATCH 1/6] arm64/vdso: use the fault callback to map vvar pages Andrei Vagin
2020-06-24  8:33   ` Andrei Vagin
2020-07-24 17:26   ` Catalin Marinas
2020-07-27  6:45     ` Andrei Vagin
2020-07-27  6:45       ` Andrei Vagin
2020-06-24  8:33 ` [PATCH 2/6] arm64/vdso: Zap vvar pages when switching to a time namespace Andrei Vagin
2020-06-24  8:33   ` Andrei Vagin
2020-06-24 15:18   ` Christian Brauner
2020-06-24 15:18     ` Christian Brauner
2020-06-25  8:25     ` Andrei Vagin
2020-06-25  8:25       ` Andrei Vagin
2020-06-24  8:33 ` [PATCH 3/6] arm64/vdso: Add time namespace page Andrei Vagin
2020-06-24  8:33   ` Andrei Vagin
2020-06-24  8:33 ` [PATCH 4/6] arm64/vdso: Handle faults on timens page Andrei Vagin
2020-06-24  8:33   ` Andrei Vagin
2020-06-24  8:33 ` [PATCH 5/6] arm64/vdso: Restrict splitting VVAR VMA Andrei Vagin
2020-06-24  8:33   ` Andrei Vagin
2020-06-24  8:33 ` [PATCH 6/6] arm64: enable time namespace support Andrei Vagin
2020-06-24  8:33   ` Andrei Vagin
2020-07-05  6:40 ` [PATCH v5 0/6] arm64: add the " Andrei Vagin
2020-07-05  6:40   ` Andrei Vagin
2020-07-14  1:57   ` Andrei Vagin
2020-07-14  1:57     ` Andrei Vagin
2020-07-22 18:15     ` Catalin Marinas
2020-07-22 18:15       ` Catalin Marinas
2020-07-23 17:41       ` Andrei Vagin
2020-07-23 17:41         ` Andrei Vagin
2020-07-23 20:25         ` Thomas Gleixner
2020-07-23 20:25           ` Thomas Gleixner
2020-07-24 11:01           ` Catalin Marinas
2020-07-24 11:01             ` Catalin Marinas
2020-07-24 13:22             ` Thomas Gleixner
2020-07-24 13:22               ` Thomas Gleixner
2020-07-24 13:30         ` Christian Brauner
2020-07-24 13:30           ` Christian Brauner
2020-07-24 14:40           ` Catalin Marinas
2020-07-24 14:40             ` Catalin Marinas

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.