* [PATCH] mm: Always flush VMA ranges affected by zap_page_range
@ 2017-07-24  9:45 Mel Gorman
  2017-07-25  3:49 ` kbuild test robot
  0 siblings, 1 reply; 2+ messages in thread
From: Mel Gorman @ 2017-07-24  9:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Nadav Amit, Andy Lutomirski, linux-mm, linux-kernel

Nadav Amit reported that zap_page_range() only specifies that the caller
must protect the VMA list, but does not specify whether mmap_sem must be
held for read or write, and existing callers use either. madvise() holds
mmap_sem for read, meaning that a parallel zap operation can unmap PTEs
which are then skipped by madvise(), which may then return with stale TLB
entries still present. While the API could be extended to make the
requirement explicit, it would be a difficult API to use. Instead, this
patch causes zap_page_range() to always consider flushing the full
affected range. For small ranges or sparsely populated mappings, this may
result in one additional spurious TLB flush. For larger ranges, it is
possible that the TLB has already been flushed and the overhead is
negligible. Either way, this approach is safer overall and avoids stale
entries being present when madvise() returns.
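
The race, schematically (two threads in one process, both calling
madvise(MADV_DONTNEED) with mmap_sem held for read; the interleaving is
an illustration of the report, not a trace):

    CPU 0 (large madvise)               CPU 1 (small madvise)
    zap_page_range()
      clears the PTE for page 0,
      defers the TLB flush
                                        zap_page_range()
                                          sees pte_none, gathers nothing
                                          tlb_finish_mmu() flushes nothing
                                          returns; TLB entry still stale
      frees the page, flushes the TLB

A load on CPU 1 through the stale entry can then see the old data even
after its madvise() call has returned.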

This can be illustrated with the following, slightly modified, program
provided by Nadav Amit (the includes and the PAGE_SIZE/N_PAGES definitions
are filled in below so the fragment builds stand-alone; those values are
assumptions). With the patch applied, it exits with code 0, indicating
that a stale TLB entry did not leak to userspace.

---8<---

#include <stdio.h>
#include <string.h>
#include <pthread.h>
#include <sys/mman.h>

/*
 * Assumed values so the fragment compiles stand-alone; the original
 * program defined these elsewhere.
 */
#ifndef PAGE_SIZE
#define PAGE_SIZE	4096
#endif
#define N_PAGES		65536

volatile int sync_step = 0;
volatile char *p;

static inline unsigned long rdtsc(void)
{
	unsigned long hi, lo;

	__asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
	return lo | (hi << 32);
}

static inline void wait_rdtsc(unsigned long cycles)
{
	unsigned long tsc = rdtsc();

	while (rdtsc() - tsc < cycles);
}

void *big_madvise_thread(void *ign)
{
	sync_step = 1;
	while (sync_step != 2);
	madvise((void*)p, PAGE_SIZE * N_PAGES, MADV_DONTNEED);
	return NULL;
}

int main(void)
{
	pthread_t aux_thread;

	p = mmap(0, PAGE_SIZE * N_PAGES, PROT_READ|PROT_WRITE,
		 MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

	memset((void*)p, 8, PAGE_SIZE * N_PAGES);

	pthread_create(&aux_thread, NULL, big_madvise_thread, NULL);
	while (sync_step != 1);

	*p = 8;		// Cache in TLB
	sync_step = 2;
	wait_rdtsc(100000);
	madvise((void*)p, PAGE_SIZE, MADV_DONTNEED);
	printf("data: %d (%s)\n", *p, (*p == 8 ? "stale, broken" : "cleared, fine"));
	return *p == 8 ? -1 : 0;
}
---8<---
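
With the definitions above filled in, the program can be built and run
along these lines (file name illustrative; x86 is assumed because of the
rdtsc usage):

        gcc -O2 -pthread stale-tlb-test.c -o stale-tlb-test
        ./stale-tlb-test; echo $?

An exit status of 0 ("cleared, fine") means the stale entry was flushed.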

Reported-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/memory.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index bb11c474857e..b93b51f56ee4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1483,8 +1483,20 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
 	tlb_gather_mmu(&tlb, mm, start, end);
 	update_hiwater_rss(mm);
 	mmu_notifier_invalidate_range_start(mm, start, end);
-	for ( ; vma && vma->vm_start < end; vma = vma->vm_next)
+	for ( ; vma && vma->vm_start < end; vma = vma->vm_next) {
 		unmap_single_vma(&tlb, vma, start, end, NULL);
+
+		/*
+		 * zap_page_range does not specify whether mmap_sem should be
+		 * held for read or write. That allows parallel zap_page_range
+		 * operations to unmap a PTE and defer a flush meaning that
+		 * this call observes pte_none and fails to flush the TLB.
+		 * Rather than adding a complex API, ensure that no stale
+		 * TLB entries exist when this call returns.
+		 */
+		__tlb_adjust_range(&tlb, start, end);
+	}
+
 	mmu_notifier_invalidate_range_end(mm, start, end);
 	tlb_finish_mmu(&tlb, start, end);
 }
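
For context on why widening the gather range is sufficient: on
architectures that use the generic mmu_gather, __tlb_adjust_range() simply
extends the range that tlb_finish_mmu() will flush. A sketch of the
generic helper (based on include/asm-generic/tlb.h around v4.12; the exact
signature varies between releases):

	/* Sketch of the generic helper, for illustration only */
	static inline void __tlb_adjust_range(struct mmu_gather *tlb,
					      unsigned long address,
					      unsigned int range_size)
	{
		tlb->start = min(tlb->start, address);
		tlb->end = max(tlb->end, address + range_size);
	}

With tlb->start/tlb->end forced to cover the zapped range,
tlb_finish_mmu() flushes it even when unmap_single_vma() saw only
pte_none entries and gathered nothing. Two caveats: if the three-argument
form above is what the target tree provides, the patch passes 'end' where
a size is expected, which over-extends the flush range (safe, but broader
than strictly needed); and architectures that implement their own
mmu_gather do not provide this helper at all, which is what the build
robot reports below.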

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>


* Re: [PATCH] mm: Always flush VMA ranges affected by zap_page_range
  2017-07-24  9:45 [PATCH] mm: Always flush VMA ranges affected by zap_page_range Mel Gorman
@ 2017-07-25  3:49 ` kbuild test robot
  0 siblings, 0 replies; 2+ messages in thread
From: kbuild test robot @ 2017-07-25  3:49 UTC (permalink / raw)
  To: Mel Gorman
  Cc: kbuild-all, Andrew Morton, Nadav Amit, Andy Lutomirski, linux-mm,
	linux-kernel

Hi Mel,

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.13-rc2 next-20170724]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Mel-Gorman/mm-Always-flush-VMA-ranges-affected-by-zap_page_range/20170725-102436
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: ia64-allyesconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 6.2.0
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   mm/memory.c: In function 'zap_page_range':
>> mm/memory.c:1497:3: error: implicit declaration of function '__tlb_adjust_range' [-Werror=implicit-function-declaration]
      __tlb_adjust_range(&tlb, start, end);
      ^~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/__tlb_adjust_range +1497 mm/memory.c

  1466	
  1467	/**
  1468	 * zap_page_range - remove user pages in a given range
  1469	 * @vma: vm_area_struct holding the applicable pages
  1470	 * @start: starting address of pages to zap
  1471	 * @size: number of bytes to zap
  1472	 *
  1473	 * Caller must protect the VMA list
  1474	 */
  1475	void zap_page_range(struct vm_area_struct *vma, unsigned long start,
  1476			unsigned long size)
  1477	{
  1478		struct mm_struct *mm = vma->vm_mm;
  1479		struct mmu_gather tlb;
  1480		unsigned long end = start + size;
  1481	
  1482		lru_add_drain();
  1483		tlb_gather_mmu(&tlb, mm, start, end);
  1484		update_hiwater_rss(mm);
  1485		mmu_notifier_invalidate_range_start(mm, start, end);
  1486		for ( ; vma && vma->vm_start < end; vma = vma->vm_next) {
  1487			unmap_single_vma(&tlb, vma, start, end, NULL);
  1488	
  1489			/*
  1490			 * zap_page_range does not specify whether mmap_sem should be
  1491			 * held for read or write. That allows parallel zap_page_range
  1492			 * operations to unmap a PTE and defer a flush meaning that
  1493			 * this call observes pte_none and fails to flush the TLB.
  1494			 * Rather than adding a complex API, ensure that no stale
  1495			 * TLB entries exist when this call returns.
  1496			 */
> 1497			__tlb_adjust_range(&tlb, start, end);
  1498		}
  1499	
  1500		mmu_notifier_invalidate_range_end(mm, start, end);
  1501		tlb_finish_mmu(&tlb, start, end);
  1502	}
  1503	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

