* [PATCH v3 0/3] Avoid live-lock in btrfs fault-in+uaccess loop
From: Catalin Marinas @ 2022-04-06 18:09 UTC (permalink / raw)
  To: Linus Torvalds, Andreas Gruenbacher, Josef Bacik
  Cc: Al Viro, Andrew Morton, Chris Mason, David Sterba, Will Deacon,
	linux-fsdevel, linux-arm-kernel, linux-btrfs, linux-kernel

Hi,

I finally got around to reviving this series. However, I simplified it
from v2 and focussed on solving the btrfs search_ioctl() problem only:

https://lore.kernel.org/r/20211201193750.2097885-1-catalin.marinas@arm.com

The btrfs search_ioctl() function can potentially live-lock on arm64
with MTE enabled due to a fault_in_writeable() + copy_to_user_nofault()
unbounded loop. The uaccess can fault in the middle of a page (MTE tag
check fault) even if a prior fault_in_writeable() successfully wrote to
the beginning of that page. The btrfs code always restarts the fault-in +
copy loop from the beginning of the user buffer, hence the live-lock.
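
For illustration, the problematic pattern looks roughly like this (a
simplified sketch of the search_ioctl() loop with error handling and
the copy_to_sk() internals elided, not the verbatim fs/btrfs/ioctl.c
code):

	while (1) {
		ret = -EFAULT;
		/*
		 * Page-granularity probe: succeeds even if an MTE tag
		 * check fault lurks in the middle of the page.
		 */
		if (fault_in_writeable(ubuf + sk_offset, *buf_size - sk_offset))
			break;

		ret = btrfs_search_forward(root, &key, path, sk->min_transid);
		/* error handling elided */

		/*
		 * copy_to_sk() uses copy_to_user_nofault(); on a mid-page
		 * tag check fault it rewinds sk_offset to the previous
		 * btrfs_ioctl_search_header boundary, so the next iteration
		 * probes and copies exactly the same range again: live-lock.
		 */
		ret = copy_to_sk(path, &key, sk, buf_size, ubuf,
				 &sk_offset, &num_found);
		btrfs_release_path(path);
	}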

The series introduces fault_in_subpage_writeable() together with the
arm64 probing counterpart and the btrfs fix.

I don't think anything other than btrfs search_ioctl() can live-lock
with the current kernel. Buffered file I/O can already cope with the
existing fault_in_*() + copy_*_user() loops (the uaccess makes
progress). Direct I/O either goes via GUP + kernel mapping access (and
memcpy() can't fault) or, if the user buffer is not PAGE_SIZE aligned,
it may fall back to buffered I/O. So we really only care about
fault_in_writeable(), hence this simplified series. If at some point we
need to address other places, we can expand the sub-page probing to the
other fault_in_*() functions.

Thanks.

Catalin Marinas (3):
  mm: Add fault_in_subpage_writeable() to probe at sub-page granularity
  arm64: Add support for user sub-page fault probing
  btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page
    faults

 arch/Kconfig                     |  7 +++++++
 arch/arm64/Kconfig               |  1 +
 arch/arm64/include/asm/mte.h     |  1 +
 arch/arm64/include/asm/uaccess.h | 15 +++++++++++++++
 arch/arm64/kernel/mte.c          | 30 ++++++++++++++++++++++++++++++
 fs/btrfs/ioctl.c                 |  7 ++++++-
 include/linux/pagemap.h          |  1 +
 include/linux/uaccess.h          | 22 ++++++++++++++++++++++
 mm/gup.c                         | 29 +++++++++++++++++++++++++++++
 9 files changed, 112 insertions(+), 1 deletion(-)


* [PATCH v3 1/3] mm: Add fault_in_subpage_writeable() to probe at sub-page granularity
From: Catalin Marinas @ 2022-04-06 18:09 UTC (permalink / raw)
  To: Linus Torvalds, Andreas Gruenbacher, Josef Bacik
  Cc: Al Viro, Andrew Morton, Chris Mason, David Sterba, Will Deacon,
	linux-fsdevel, linux-arm-kernel, linux-btrfs, linux-kernel

On hardware with features like arm64 MTE or SPARC ADI, an access fault
can be triggered at sub-page granularity. Depending on how the
fault_in_writeable() function is used, the caller can get into a
live-lock by continuously retrying the fault-in on an address different
from the one where the uaccess failed.

In the majority of cases progress is ensured by the following
conditions:

1. copy_to_user_nofault() guarantees at least one byte access if the
   user address is not faulting.

2. The fault_in_writeable() loop is resumed from the first address that
   could not be accessed by copy_to_user_nofault().

If the loop iteration is instead restarted from an earlier (initial)
point, the loop repeats under the same conditions and live-locks.

Introduce an arch-specific probe_subpage_writeable() and call it from
the newly added fault_in_subpage_writeable() function. The arch code
with sub-page faults will have to implement the specific probing
functionality.

Note that no other fault_in_subpage_*() functions are added since they
have no callers currently susceptible to a live-lock.
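
For illustration only (not code from this series; the btrfs patch later
in the thread is the real user), a hypothetical caller that has to
restart its copy from the beginning of the buffer, rather than resuming
from the faulting address, could use the new helper roughly as in the
sketch below. The sub-page probe reports a buffer that can never be
fully written as -EFAULT up front instead of retrying it forever:

	/* hypothetical caller that cannot resume the copy mid-buffer */
	for (;;) {
		/* reports sub-page (e.g. MTE tag check) faults as well */
		if (fault_in_subpage_writeable(ubuf, size))
			return -EFAULT;

		if (!copy_to_user_nofault(ubuf, kbuf, size))
			break;		/* whole buffer copied */
		/*
		 * The copy can still fail if the mapping or tags changed
		 * after the probe; retrying is fine since the next probe
		 * observes the new state.
		 */
	}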

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 arch/Kconfig            |  7 +++++++
 include/linux/pagemap.h |  1 +
 include/linux/uaccess.h | 22 ++++++++++++++++++++++
 mm/gup.c                | 29 +++++++++++++++++++++++++++++
 4 files changed, 59 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index 29b0167c088b..b34032279926 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -24,6 +24,13 @@ config KEXEC_ELF
 config HAVE_IMA_KEXEC
 	bool
 
+config ARCH_HAS_SUBPAGE_FAULTS
+	bool
+	help
+	  Select if the architecture can check permissions at sub-page
+	  granularity (e.g. arm64 MTE). The probe_user_*() functions
+	  must be implemented.
+
 config HOTPLUG_SMT
 	bool
 
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 993994cd943a..6165283bdb6f 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1046,6 +1046,7 @@ void folio_add_wait_queue(struct folio *folio, wait_queue_entry_t *waiter);
  * Fault in userspace address range.
  */
 size_t fault_in_writeable(char __user *uaddr, size_t size);
+size_t fault_in_subpage_writeable(char __user *uaddr, size_t size);
 size_t fault_in_safe_writeable(const char __user *uaddr, size_t size);
 size_t fault_in_readable(const char __user *uaddr, size_t size);
 
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 546179418ffa..8bbb2dabac19 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -231,6 +231,28 @@ static inline bool pagefault_disabled(void)
  */
 #define faulthandler_disabled() (pagefault_disabled() || in_atomic())
 
+#ifndef CONFIG_ARCH_HAS_SUBPAGE_FAULTS
+
+/**
+ * probe_subpage_writeable: probe the user range for write faults at sub-page
+ *			    granularity (e.g. arm64 MTE)
+ * @uaddr: start of address range
+ * @size: size of address range
+ *
+ * Returns 0 on success, the number of bytes not probed on fault.
+ *
+ * It is expected that the caller checked for the write permission of each
+ * page in the range either by put_user() or GUP. The architecture port can
+ * implement a more efficient get_user() probing if the same sub-page faults
+ * are triggered by either a read or a write.
+ */
+static inline size_t probe_subpage_writeable(void __user *uaddr, size_t size)
+{
+	return 0;
+}
+
+#endif /* CONFIG_ARCH_HAS_SUBPAGE_FAULTS */
+
 #ifndef ARCH_HAS_NOCACHE_UACCESS
 
 static inline __must_check unsigned long
diff --git a/mm/gup.c b/mm/gup.c
index f598a037eb04..501bc150792c 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1648,6 +1648,35 @@ size_t fault_in_writeable(char __user *uaddr, size_t size)
 }
 EXPORT_SYMBOL(fault_in_writeable);
 
+/**
+ * fault_in_subpage_writeable - fault in an address range for writing
+ * @uaddr: start of address range
+ * @size: size of address range
+ *
+ * Fault in a user address range for writing while checking for permissions at
+ * sub-page granularity (e.g. arm64 MTE). This function should be used when
+ * the caller cannot guarantee forward progress of a copy_to_user() loop.
+ *
+ * Returns the number of bytes not faulted in (like copy_to_user() and
+ * copy_from_user()).
+ */
+size_t fault_in_subpage_writeable(char __user *uaddr, size_t size)
+{
+	size_t faulted_in;
+
+	/*
+	 * Attempt faulting in at page granularity first for page table
+	 * permission checking. The arch-specific probe_subpage_writeable()
+	 * functions may not check for this.
+	 */
+	faulted_in = size - fault_in_writeable(uaddr, size);
+	if (faulted_in)
+		faulted_in -= probe_subpage_writeable(uaddr, faulted_in);
+
+	return size - faulted_in;
+}
+EXPORT_SYMBOL(fault_in_subpage_writeable);
+
 /*
  * fault_in_safe_writeable - fault in an address range for writing
  * @uaddr: start of address range

* [PATCH v3 2/3] arm64: Add support for user sub-page fault probing
From: Catalin Marinas @ 2022-04-06 18:09 UTC (permalink / raw)
  To: Linus Torvalds, Andreas Gruenbacher, Josef Bacik
  Cc: Al Viro, Andrew Morton, Chris Mason, David Sterba, Will Deacon,
	linux-fsdevel, linux-arm-kernel, linux-btrfs, linux-kernel

With MTE, even if the pte allows an access, a mismatched tag somewhere
within a page can still cause a fault. Select ARCH_HAS_SUBPAGE_FAULTS if
MTE is enabled and implement the probe_subpage_writeable() function.
Note that get_user() is sufficient for the writeable MTE check since the
same tag mismatch fault would be triggered by a read. The caller of
probe_subpage_writeable() will need to check the pte permissions
(put_user, GUP).
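
As a worked illustration of the probing granularity (MTE_GRANULE_SIZE
is 16 bytes; the addresses below are hypothetical): for uaddr == 0x1008
and size == 0x28, mte_probe_user_range() first reads the byte at 0x1008
to cover the initial unaligned granule, PTR_ALIGN() then rounds up to
0x1010 and the loop reads one byte per granule at 0x1010 and 0x1020
(end == 0x1030). A tag check fault on the 0x1020 granule returns
0x1030 - 0x1020 = 16 bytes not probed; a clean run returns 0.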

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/Kconfig               |  1 +
 arch/arm64/include/asm/mte.h     |  1 +
 arch/arm64/include/asm/uaccess.h | 15 +++++++++++++++
 arch/arm64/kernel/mte.c          | 30 ++++++++++++++++++++++++++++++
 4 files changed, 47 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 57c4c995965f..290b88238103 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1871,6 +1871,7 @@ config ARM64_MTE
 	depends on AS_HAS_LSE_ATOMICS
 	# Required for tag checking in the uaccess routines
 	depends on ARM64_PAN
+	select ARCH_HAS_SUBPAGE_FAULTS
 	select ARCH_USES_HIGH_VMA_FLAGS
 	help
 	  Memory Tagging (part of the ARMv8.5 Extensions) provides
diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index adcb937342f1..aa523591a44e 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -47,6 +47,7 @@ long set_mte_ctrl(struct task_struct *task, unsigned long arg);
 long get_mte_ctrl(struct task_struct *task);
 int mte_ptrace_copy_tags(struct task_struct *child, long request,
 			 unsigned long addr, unsigned long data);
+size_t mte_probe_user_range(const char __user *uaddr, size_t size);
 
 #else /* CONFIG_ARM64_MTE */
 
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index e8dce0cc5eaa..6677aa7e9993 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -460,4 +460,19 @@ static inline int __copy_from_user_flushcache(void *dst, const void __user *src,
 }
 #endif
 
+#ifdef CONFIG_ARCH_HAS_SUBPAGE_FAULTS
+
+/*
+ * Return 0 on success, the number of bytes not probed otherwise.
+ */
+static inline size_t probe_subpage_writeable(const void __user *uaddr,
+					     size_t size)
+{
+	if (!system_supports_mte())
+		return 0;
+	return mte_probe_user_range(uaddr, size);
+}
+
+#endif /* CONFIG_ARCH_HAS_SUBPAGE_FAULTS */
+
 #endif /* __ASM_UACCESS_H */
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index 78b3e0f8e997..35697a09926f 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -15,6 +15,7 @@
 #include <linux/swapops.h>
 #include <linux/thread_info.h>
 #include <linux/types.h>
+#include <linux/uaccess.h>
 #include <linux/uio.h>
 
 #include <asm/barrier.h>
@@ -543,3 +544,32 @@ static int register_mte_tcf_preferred_sysctl(void)
 	return 0;
 }
 subsys_initcall(register_mte_tcf_preferred_sysctl);
+
+/*
+ * Return 0 on success, the number of bytes not probed otherwise.
+ */
+size_t mte_probe_user_range(const char __user *uaddr, size_t size)
+{
+	const char __user *end = uaddr + size;
+	int err = 0;
+	char val;
+
+	__raw_get_user(val, uaddr, err);
+	if (err)
+		return size;
+
+	uaddr = PTR_ALIGN(uaddr, MTE_GRANULE_SIZE);
+	while (uaddr < end) {
+		/*
+		 * A read is sufficient for mte, the caller should have probed
+		 * for the pte write permission if required.
+		 */
+		__raw_get_user(val, uaddr, err);
+		if (err)
+			return end - uaddr;
+		uaddr += MTE_GRANULE_SIZE;
+	}
+	(void)val;
+
+	return 0;
+}

* [PATCH v3 3/3] btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults
From: Catalin Marinas @ 2022-04-06 18:09 UTC (permalink / raw)
  To: Linus Torvalds, Andreas Gruenbacher, Josef Bacik
  Cc: Al Viro, Andrew Morton, Chris Mason, David Sterba, Will Deacon,
	linux-fsdevel, linux-arm-kernel, linux-btrfs, linux-kernel

Commit a48b73eca4ce ("btrfs: fix potential deadlock in the search
ioctl") addressed a lockdep warning by pre-faulting the user pages and
attempting the copy_to_user_nofault() in an infinite loop. On
architectures like arm64 with MTE, an access may fault within a page at
a location different from what fault_in_writeable() probed. Since the
sk_offset is rewound to the previous struct btrfs_ioctl_search_header
boundary, there is no guaranteed forward progress and search_ioctl() may
live-lock.

Use fault_in_subpage_writeable() instead of fault_in_writeable() to
ensure the permission is checked at the right granularity (smaller than
PAGE_SIZE).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: a48b73eca4ce ("btrfs: fix potential deadlock in the search ioctl")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ioctl.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 238cee5b5254..d49e8254f823 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2556,8 +2556,13 @@ static noinline int search_ioctl(struct inode *inode,
 	key.offset = sk->min_offset;
 
 	while (1) {
+		size_t len = *buf_size - sk_offset;
 		ret = -EFAULT;
-		if (fault_in_writeable(ubuf + sk_offset, *buf_size - sk_offset))
+		/*
+		 * Ensure that the whole user buffer is faulted in at sub-page
+		 * granularity, otherwise the loop may live-lock.
+		 */
+		if (fault_in_subpage_writeable(ubuf + sk_offset, len))
 			break;
 
 		ret = btrfs_search_forward(root, &key, path, sk->min_transid);

* Re: [PATCH v3 3/3] btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults
From: Catalin Marinas @ 2022-04-06 20:40 UTC (permalink / raw)
  To: Linus Torvalds, Andreas Gruenbacher, Josef Bacik
  Cc: Al Viro, Andrew Morton, Chris Mason, David Sterba, Will Deacon,
	linux-fsdevel, linux-arm-kernel, linux-btrfs, linux-kernel

On Wed, Apr 06, 2022 at 07:09:22PM +0100, Catalin Marinas wrote:
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 238cee5b5254..d49e8254f823 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2556,8 +2556,13 @@ static noinline int search_ioctl(struct inode *inode,
>  	key.offset = sk->min_offset;
>  
>  	while (1) {
> +		size_t len = *buf_size - sk_offset;
>  		ret = -EFAULT;
> -		if (fault_in_writeable(ubuf + sk_offset, *buf_size - sk_offset))
> +		/*
> +		 * Ensure that the whole user buffer is faulted in at sub-page
> +		 * granularity, otherwise the loop may live-lock.
> +		 */
> +		if (fault_in_subpage_writeable(ubuf + sk_offset, len))
>  			break;

This doesn't need a new 'len' variable. It's a left-over from v2, where
fault_in_writeable() took both a size and a min_size argument and 'len'
was passed for both.

-- 
Catalin

* Re: [PATCH v3 3/3] btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults
From: David Sterba @ 2022-04-07 11:05 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Linus Torvalds, Andreas Gruenbacher, Josef Bacik, Al Viro,
	Andrew Morton, Chris Mason, David Sterba, Will Deacon,
	linux-fsdevel, linux-arm-kernel, linux-btrfs, linux-kernel

On Wed, Apr 06, 2022 at 07:09:22PM +0100, Catalin Marinas wrote:
> Commit a48b73eca4ce ("btrfs: fix potential deadlock in the search
> ioctl") addressed a lockdep warning by pre-faulting the user pages and
> attempting the copy_to_user_nofault() in an infinite loop. On
> architectures like arm64 with MTE, an access may fault within a page at
> a location different from what fault_in_writeable() probed. Since the
> sk_offset is rewound to the previous struct btrfs_ioctl_search_header
> boundary, there is no guaranteed forward progress and search_ioctl() may
> live-lock.
> 
> Use fault_in_subpage_writeable() instead of fault_in_writeable() to
> ensure the permission is checked at the right granularity (smaller than
> PAGE_SIZE).
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Fixes: a48b73eca4ce ("btrfs: fix potential deadlock in the search ioctl")
> Reported-by: Al Viro <viro@zeniv.linux.org.uk>

Acked-by: David Sterba <dsterba@suse.com>
