linux-mm.kvack.org archive mirror
* [PATCH 0/1] Fix for VM_FLUSH_RESET_PERMS on sparc
@ 2019-05-17 21:01 Rick Edgecombe
  2019-05-17 21:01 ` [PATCH 1/1] vmalloc: Fix issues with flush flag Rick Edgecombe
  0 siblings, 1 reply; 4+ messages in thread
From: Rick Edgecombe @ 2019-05-17 21:01 UTC (permalink / raw)
  To: peterz, linux-mm, sparclinux, netdev, bpf
  Cc: dave.hansen, namit, Rick Edgecombe

Meelis Roos reported issues with the new VM_FLUSH_RESET_PERMS flag on the sparc
architecture. When freeing many BPF JITs simultaneously, the vfree flush
operations can become stuck waiting as they each try to vm_unmap_aliases().

It also came up that this flag is not needed on architectures like sparc,
where normal kernel memory is already executable. This patch still fixes the
usage of the flag on sparc, in case the root cause is also an issue on other
architectures. Separately, we can disable usage of VM_FLUSH_RESET_PERMS on
these architectures if desired.

Rick Edgecombe (1):
  vmalloc: Fix issues with flush flag

 mm/vmalloc.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

-- 
2.17.1



* [PATCH 1/1] vmalloc: Fix issues with flush flag
  2019-05-17 21:01 [PATCH 0/1] Fix for VM_FLUSH_RESET_PERMS on sparc Rick Edgecombe
@ 2019-05-17 21:01 ` Rick Edgecombe
  2019-05-20  3:54   ` Edgecombe, Rick P
  2019-05-20 19:13   ` Edgecombe, Rick P
  0 siblings, 2 replies; 4+ messages in thread
From: Rick Edgecombe @ 2019-05-17 21:01 UTC (permalink / raw)
  To: peterz, linux-mm, sparclinux, netdev, bpf
  Cc: dave.hansen, namit, Rick Edgecombe, Meelis Roos, David S. Miller,
	Borislav Petkov, Andy Lutomirski, Ingo Molnar

Meelis Roos reported issues with the new VM_FLUSH_RESET_PERMS flag on the
sparc architecture.

When freeing many BPF JITs at once, the free operations can become stuck
waiting for locks as they each try to vm_unmap_aliases(). Calls to this
function happen frequently on some archs, but in vmalloc itself the lazy
purge operation happens more rarely, and only in extreme cases could
multiple purges be happening at once. Since this is cross-platform code, we
shouldn't call it here, where it could run concurrently in a burst, and
should instead just flush the TLB. Also, add a little logic to skip calls to
page_address() when possible to further speed this up, since they may take
locks on some archs.

Lastly, it appears that the calculation of the address range to flush
was broken at some point, so fix that as well.

Fixes: 868b104d7379 ("mm/vmalloc: Add flag for freeing of special permsissions")
Reported-by: Meelis Roos <mroos@linux.ee>
Cc: Meelis Roos <mroos@linux.ee>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Nadav Amit <namit@vmware.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 mm/vmalloc.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 67bbb8d2a0a8..5daa7ec8950f 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1531,9 +1531,10 @@ static inline void set_area_direct_map(const struct vm_struct *area,
 /* Handle removing and resetting vm mappings related to the vm_struct. */
 static void vm_remove_mappings(struct vm_struct *area, int deallocate_pages)
 {
+	const bool has_set_direct = IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP);
+	const bool flush_reset = area->flags & VM_FLUSH_RESET_PERMS;
 	unsigned long addr = (unsigned long)area->addr;
-	unsigned long start = ULONG_MAX, end = 0;
-	int flush_reset = area->flags & VM_FLUSH_RESET_PERMS;
+	unsigned long start = addr, end = addr + get_vm_area_size(area);
 	int i;
 
 	/*
@@ -1542,7 +1543,7 @@ static void vm_remove_mappings(struct vm_struct *area, int deallocate_pages)
 	 * This is concerned with resetting the direct map any an vm alias with
 	 * execute permissions, without leaving a RW+X window.
 	 */
-	if (flush_reset && !IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) {
+	if (flush_reset && !has_set_direct) {
 		set_memory_nx(addr, area->nr_pages);
 		set_memory_rw(addr, area->nr_pages);
 	}
@@ -1555,22 +1556,24 @@ static void vm_remove_mappings(struct vm_struct *area, int deallocate_pages)
 
 	/*
 	 * If not deallocating pages, just do the flush of the VM area and
-	 * return.
+	 * return. If the arch doesn't have set_direct_map_(), also skip the
+	 * below work.
 	 */
-	if (!deallocate_pages) {
-		vm_unmap_aliases();
+	if (!deallocate_pages || !has_set_direct) {
+		flush_tlb_kernel_range(addr, get_vm_area_size(area));
 		return;
 	}
 
 	/*
 	 * If execution gets here, flush the vm mapping and reset the direct
 	 * map. Find the start and end range of the direct mappings to make sure
-	 * the vm_unmap_aliases() flush includes the direct map.
+	 * the flush_tlb_kernel_range() includes the direct map.
 	 */
 	for (i = 0; i < area->nr_pages; i++) {
-		if (page_address(area->pages[i])) {
+		addr = (unsigned long)page_address(area->pages[i]);
+		if (addr) {
 			start = min(addr, start);
-			end = max(addr, end);
+			end = max(addr + PAGE_SIZE, end);
 		}
 	}
 
@@ -1580,7 +1583,7 @@ static void vm_remove_mappings(struct vm_struct *area, int deallocate_pages)
 	 * reset the direct map permissions to the default.
 	 */
 	set_area_direct_map(area, set_direct_map_invalid_noflush);
-	_vm_unmap_aliases(start, end, 1);
+	flush_tlb_kernel_range(start, end);
 	set_area_direct_map(area, set_direct_map_default_noflush);
 }
 
-- 
2.17.1



* Re: [PATCH 1/1] vmalloc: Fix issues with flush flag
  2019-05-17 21:01 ` [PATCH 1/1] vmalloc: Fix issues with flush flag Rick Edgecombe
@ 2019-05-20  3:54   ` Edgecombe, Rick P
  2019-05-20 19:13   ` Edgecombe, Rick P
  1 sibling, 0 replies; 4+ messages in thread
From: Edgecombe, Rick P @ 2019-05-20  3:54 UTC (permalink / raw)
  To: netdev, peterz, linux-mm, davem, sparclinux, bpf
  Cc: bp, mroos, luto, namit, Hansen, Dave, mingo

Hi,

After investigating this more, I am not positive why this fixes the
issue on sparc. I will continue to investigate as best I can, but would
like to request help from some sparc experts on evaluating my line of
thinking. I think the changes in this patch are still very worthwhile
generally though.


Besides fixing the sparc issue:

1. The fixes for the calculation of the direct map address range are
important on x86 in case a RO direct map alias ever gets loaded into
the TLB. This shouldn't normally happen, but it could cause the
permissions to not get reset on the direct map alias, and then the page
would return from the page allocator to some other component as RO and
cause a crash. The calculation was mostly broken while implementing a
style suggestion late in development. As best I can tell, it shouldn't
have any effect on sparc.

2. Simply flushing the TLB instead of the whole vm_unmap_aliases()
operation makes the frees faster and pushes the heavy work to happen on
allocation where it would be more expected. vm_unmap_aliases() takes some
locks including a long hold of vmap_purge_lock, which will make all
other VM_FLUSH_RESET_PERMS vfrees wait while the purge operation
happens.
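
The range fix in item 1 can be seen in a small user-space model. This is
only a sketch of the loop from the diff, not kernel code: struct range,
range_before(), range_after(), the PAGE_SIZE value and the plain array
standing in for page_address() results are all assumptions made for
illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL /* assumed page size for this model */

struct range {
	unsigned long start;
	unsigned long end;
};

/*
 * Model of the loop removed by the patch: the direct map address is only
 * tested for non-zero, and the vm address is what feeds min()/max().
 */
static struct range range_before(unsigned long vm_addr,
				 const unsigned long *direct, size_t n)
{
	struct range r = { ULONG_MAX, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		if (direct[i]) {
			r.start = vm_addr < r.start ? vm_addr : r.start;
			r.end = vm_addr > r.end ? vm_addr : r.end;
		}
	}
	return r;
}

/*
 * Model of the loop added by the patch: start from the vm area itself,
 * then widen the range by each direct map alias, one page per alias.
 */
static struct range range_after(unsigned long vm_addr, unsigned long vm_size,
				const unsigned long *direct, size_t n)
{
	struct range r = { vm_addr, vm_addr + vm_size };
	size_t i;

	assert(vm_size % PAGE_SIZE == 0); /* the model expects whole pages */

	for (i = 0; i < n; i++) {
		unsigned long addr = direct[i];

		if (addr) {
			r.start = addr < r.start ? addr : r.start;
			r.end = addr + PAGE_SIZE > r.end ? addr + PAGE_SIZE : r.end;
		}
	}
	return r;
}
```

For a two-page area at 0x900000 whose pages have direct map aliases at
0x100000 and 0x300000, range_before() returns start == end == 0x900000,
so the flush range is empty and misses the aliases. range_after() returns
0x100000 to 0x902000, which covers both the vm area and the aliases.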


The issue observed on an UltraSparc III system was a hang on boot. The
only significant difference I can find in how sparc works in this area
is that there is actually a special optimization in the TLB flush for
handling vmalloc lazy purge operations.

Some firmware mappings live between the modules and vmalloc ranges, and
flushing their translations can cause "hard hangs and crashes"[1].
Additionally in the mix, "sparc64 kernel learns about
openfirmware's dynamic mappings in this region early in the boot, and
then services TLB misses in this area".[1] The firmware protection
logic seems to be in place, but a later change in the low-level asm
does a "flush all" if the range is big enough on this
cpu [2]. With the change this patch addresses, the purge
operations would be happening much earlier than before, with the first
special permissioned vfree, instead of after the machine has been
running for some time and the vmalloc spaces had become fragmented.

So my best theory is that the history of vmalloc lazy purges causing
hangs on sparc has come into play here somehow, triggered by the fact
that we were doing the purges much earlier. If it was something like
this, the fact that after this patch we only flush the small allocation
itself on sparc would be the reason why it fixes it.

Admittedly, there are some missing pieces in the theory. If any sparc
architecture experts can tell me whether this sounds reasonable at all,
I would really appreciate it.

Thanks,

Rick

[1] https://patchwork.ozlabs.org/patch/376523/
[2] https://patchwork.ozlabs.org/patch/687780/ 



* Re: [PATCH 1/1] vmalloc: Fix issues with flush flag
  2019-05-17 21:01 ` [PATCH 1/1] vmalloc: Fix issues with flush flag Rick Edgecombe
  2019-05-20  3:54   ` Edgecombe, Rick P
@ 2019-05-20 19:13   ` Edgecombe, Rick P
  1 sibling, 0 replies; 4+ messages in thread
From: Edgecombe, Rick P @ 2019-05-20 19:13 UTC (permalink / raw)
  To: netdev, peterz, linux-mm, sparclinux, bpf
  Cc: bp, davem, mroos, luto, namit, Hansen, Dave, mingo

On Fri, 2019-05-17 at 14:01 -0700, Rick Edgecombe wrote:
> Meelis Roos reported issues with the new VM_FLUSH_RESET_PERMS flag on the
> sparc architecture.
> 
Argh, this patch is not correct in the flush range for non-x86. I'll
send a revision.
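
The reply does not say what is wrong with the range. One candidate,
stated here as an assumption rather than something the thread confirms:
flush_tlb_kernel_range() takes a start and an end address, while the
early-return path in the patch passes get_vm_area_size(area), a size, as
the second argument. The user-space sketch below models only that
arithmetic; pages_in_range() and PAGE_SIZE are hypothetical names, not
kernel interfaces.

```c
#include <assert.h>

#define PAGE_SIZE 4096UL /* assumed page size for this model */

/* Number of pages covered by the half-open range [start, end); 0 if empty. */
static unsigned long pages_in_range(unsigned long start, unsigned long end)
{
	assert(start % PAGE_SIZE == 0); /* the model expects an aligned start */

	if (end <= start)
		return 0;
	return (end - start + PAGE_SIZE - 1) / PAGE_SIZE;
}
```

For a two-page area at 0x900000, pages_in_range(0x900000, 2 * PAGE_SIZE)
is 0 because the value passed as the end lies below the start, whereas
pages_in_range(0x900000, 0x900000 + 2 * PAGE_SIZE) is 2.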


