* [RFC, PATCH 0/8] remap_file_pages() decommission
@ 2014-05-06 14:37 Kirill A. Shutemov
  2014-05-06 14:37 ` [PATCH 1/8] mm: replace remap_file_pages() syscall with emulation Kirill A. Shutemov
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: Kirill A. Shutemov @ 2014-05-06 14:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, peterz, mingo, Kirill A. Shutemov

Hi Andrew,

This patchset replaces the syscall with emulation that creates a new VMA on
each remap, and removes the code that supported non-linear mappings.

Nonlinear mappings are a pain to support, and there seem to be no legitimate
use cases nowadays, since 64-bit systems are widely available.
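
For context, a minimal userspace sketch of the pattern the syscall was
designed for (illustrative only: the file path is made up and error handling
is omitted). With this series the call keeps working, but each remap is
emulated with a separate VMA:

#define _GNU_SOURCE
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	size_t win = 16UL << 20;			/* 16MB window */
	int fd = open("/path/to/huge-file", O_RDWR);	/* made-up path */
	char *p = mmap(NULL, win, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	/*
	 * Back the first page of the window with file page 1000 instead of
	 * file page 0.  prot must be 0 and pgoff is in pages.  After this
	 * series the kernel emulates this with a new VMA for the remapped
	 * page rather than rewriting page tables in place.
	 */
	remap_file_pages(p, 4096, 0, 1000, 0);

	munmap(p, win);
	close(fd);
	return 0;
}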

It's not yet ready to apply; it is just to give a rough idea of what we can
get if we deprecate remap_file_pages().

I still need to split the patches properly and write correct commit messages,
and there is more code to remove.

Comments?

Kirill A. Shutemov (8):
  mm: replace remap_file_pages() syscall with emulation
  mm: kill vm_operations_struct->remap_pages
  mm: kill zap_details->nonlinear_vma
  mm, rmap: kill rmap_walk_control->file_nonlinear()
  mm, rmap: kill vma->shared.nonlinear
  mm, rmap: kill mapping->i_mmap_nonlinear
  mm: kill VM_NONLINEAR and FAULT_FLAG_NONLINEAR
  mm, x86: kill pte_to_pgoff(), pgoff_to_pte() and pte_file*()

 Documentation/cachetlb.txt            |   4 +-
 arch/x86/include/asm/pgtable-2level.h |  39 -----
 arch/x86/include/asm/pgtable-3level.h |   4 -
 arch/x86/include/asm/pgtable.h        |  20 ---
 arch/x86/include/asm/pgtable_64.h     |   4 -
 arch/x86/include/asm/pgtable_types.h  |   3 +-
 drivers/gpu/drm/drm_vma_manager.c     |   3 +-
 fs/9p/vfs_file.c                      |   2 -
 fs/btrfs/file.c                       |   1 -
 fs/ceph/addr.c                        |   1 -
 fs/cifs/file.c                        |   1 -
 fs/ext4/file.c                        |   1 -
 fs/f2fs/file.c                        |   1 -
 fs/fuse/file.c                        |   1 -
 fs/gfs2/file.c                        |   1 -
 fs/inode.c                            |   1 -
 fs/nfs/file.c                         |   1 -
 fs/nilfs2/file.c                      |   1 -
 fs/ocfs2/mmap.c                       |   1 -
 fs/proc/task_mmu.c                    |  10 --
 fs/ubifs/file.c                       |   1 -
 fs/xfs/xfs_file.c                     |   1 -
 include/linux/fs.h                    |   6 +-
 include/linux/mm.h                    |  12 --
 include/linux/mm_types.h              |  12 +-
 include/linux/rmap.h                  |   2 -
 include/linux/swapops.h               |   4 +-
 kernel/fork.c                         |   8 +-
 mm/Makefile                           |   2 +-
 mm/filemap.c                          |   1 -
 mm/filemap_xip.c                      |   1 -
 mm/fremap.c                           | 282 ----------------------------------
 mm/interval_tree.c                    |  34 ++--
 mm/ksm.c                              |   2 +-
 mm/madvise.c                          |  13 +-
 mm/memcontrol.c                       |   7 +-
 mm/memory.c                           | 201 +++++++-----------------
 mm/migrate.c                          |  32 ----
 mm/mincore.c                          |   5 +-
 mm/mmap.c                             |  89 +++++++++--
 mm/mprotect.c                         |   2 +-
 mm/mremap.c                           |   2 -
 mm/nommu.c                            |   8 -
 mm/rmap.c                             | 222 +-------------------------
 mm/shmem.c                            |   1 -
 mm/swap.c                             |   1 -
 46 files changed, 168 insertions(+), 883 deletions(-)
 delete mode 100644 mm/fremap.c

-- 
2.0.0.rc0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

* [PATCH 1/8] mm: replace remap_file_pages() syscall with emulation
  2014-05-06 14:37 [RFC, PATCH 0/8] remap_file_pages() decommission Kirill A. Shutemov
@ 2014-05-06 14:37 ` Kirill A. Shutemov
  2014-10-08  6:50   ` Vineet Gupta
  2014-05-06 14:37 ` [PATCH 2/8] mm: kill vm_operations_struct->remap_pages Kirill A. Shutemov
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Kirill A. Shutemov @ 2014-05-06 14:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, peterz, mingo, Kirill A. Shutemov

remap_file_pages(2) was invented to make it possible to efficiently map parts
of a huge file into a limited 32-bit virtual address space, such as in
database workloads.

Nonlinear mappings are a pain to support, and there seem to be no legitimate
use cases nowadays, since 64-bit systems are widely available.

Let's drop it and get rid of all this special-cased code.

This patch replaces the syscall with emulation that creates a new VMA on
each remap.
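
For illustration only (not kernel code; the helper name is made up): in
userspace terms each emulated call amounts to re-mmap()ing the requested
range MAP_SHARED | MAP_FIXED at the new file offset, which is what produces
the extra VMAs.

#include <sys/mman.h>
#include <unistd.h>

/*
 * Rough userspace equivalent of one emulated remap: the range is simply
 * re-mapped MAP_SHARED | MAP_FIXED at the requested file offset, so each
 * remapped window ends up as its own VMA.  (In the kernel, prot is derived
 * from the existing VMA's flags and MAP_LOCKED/MAP_POPULATE are preserved.)
 */
static void *emulated_remap(void *start, size_t size, int prot,
			    int fd, long pgoff)
{
	return mmap(start, size, prot, MAP_SHARED | MAP_FIXED,
		    fd, pgoff * sysconf(_SC_PAGESIZE));
}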

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/fs.h |   8 +-
 mm/Makefile        |   2 +-
 mm/fremap.c        | 282 -----------------------------------------------------
 mm/mmap.c          |  68 +++++++++++++
 mm/nommu.c         |   8 --
 5 files changed, 75 insertions(+), 293 deletions(-)
 delete mode 100644 mm/fremap.c

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 878031227c57..b7cda7d95ea0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2401,8 +2401,12 @@ extern int sb_min_blocksize(struct super_block *, int);
 
 extern int generic_file_mmap(struct file *, struct vm_area_struct *);
 extern int generic_file_readonly_mmap(struct file *, struct vm_area_struct *);
-extern int generic_file_remap_pages(struct vm_area_struct *, unsigned long addr,
-		unsigned long size, pgoff_t pgoff);
+static inline int generic_file_remap_pages(struct vm_area_struct *vma,
+		unsigned long addr, unsigned long size, pgoff_t pgoff)
+{
+	BUG();
+	return 0;
+}
 int generic_write_checks(struct file *file, loff_t *pos, size_t *count, int isblk);
 extern ssize_t generic_file_aio_read(struct kiocb *, const struct iovec *, unsigned long, loff_t);
 extern ssize_t __generic_file_aio_write(struct kiocb *, const struct iovec *, unsigned long);
diff --git a/mm/Makefile b/mm/Makefile
index b484452dac57..27e3e30be39b 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -3,7 +3,7 @@
 #
 
 mmu-y			:= nommu.o
-mmu-$(CONFIG_MMU)	:= fremap.o highmem.o madvise.o memory.o mincore.o \
+mmu-$(CONFIG_MMU)	:= highmem.o madvise.o memory.o mincore.o \
 			   mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \
 			   vmalloc.o pagewalk.o pgtable-generic.o
 
diff --git a/mm/fremap.c b/mm/fremap.c
deleted file mode 100644
index 34feba60a17e..000000000000
--- a/mm/fremap.c
+++ /dev/null
@@ -1,282 +0,0 @@
-/*
- *   linux/mm/fremap.c
- * 
- * Explicit pagetable population and nonlinear (random) mappings support.
- *
- * started by Ingo Molnar, Copyright (C) 2002, 2003
- */
-#include <linux/export.h>
-#include <linux/backing-dev.h>
-#include <linux/mm.h>
-#include <linux/swap.h>
-#include <linux/file.h>
-#include <linux/mman.h>
-#include <linux/pagemap.h>
-#include <linux/swapops.h>
-#include <linux/rmap.h>
-#include <linux/syscalls.h>
-#include <linux/mmu_notifier.h>
-
-#include <asm/mmu_context.h>
-#include <asm/cacheflush.h>
-#include <asm/tlbflush.h>
-
-#include "internal.h"
-
-static int mm_counter(struct page *page)
-{
-	return PageAnon(page) ? MM_ANONPAGES : MM_FILEPAGES;
-}
-
-static void zap_pte(struct mm_struct *mm, struct vm_area_struct *vma,
-			unsigned long addr, pte_t *ptep)
-{
-	pte_t pte = *ptep;
-	struct page *page;
-	swp_entry_t entry;
-
-	if (pte_present(pte)) {
-		flush_cache_page(vma, addr, pte_pfn(pte));
-		pte = ptep_clear_flush(vma, addr, ptep);
-		page = vm_normal_page(vma, addr, pte);
-		if (page) {
-			if (pte_dirty(pte))
-				set_page_dirty(page);
-			update_hiwater_rss(mm);
-			dec_mm_counter(mm, mm_counter(page));
-			page_remove_rmap(page);
-			page_cache_release(page);
-		}
-	} else {	/* zap_pte() is not called when pte_none() */
-		if (!pte_file(pte)) {
-			update_hiwater_rss(mm);
-			entry = pte_to_swp_entry(pte);
-			if (non_swap_entry(entry)) {
-				if (is_migration_entry(entry)) {
-					page = migration_entry_to_page(entry);
-					dec_mm_counter(mm, mm_counter(page));
-				}
-			} else {
-				free_swap_and_cache(entry);
-				dec_mm_counter(mm, MM_SWAPENTS);
-			}
-		}
-		pte_clear_not_present_full(mm, addr, ptep, 0);
-	}
-}
-
-/*
- * Install a file pte to a given virtual memory address, release any
- * previously existing mapping.
- */
-static int install_file_pte(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long addr, unsigned long pgoff, pgprot_t prot)
-{
-	int err = -ENOMEM;
-	pte_t *pte, ptfile;
-	spinlock_t *ptl;
-
-	pte = get_locked_pte(mm, addr, &ptl);
-	if (!pte)
-		goto out;
-
-	ptfile = pgoff_to_pte(pgoff);
-
-	if (!pte_none(*pte)) {
-		if (pte_present(*pte) && pte_soft_dirty(*pte))
-			pte_file_mksoft_dirty(ptfile);
-		zap_pte(mm, vma, addr, pte);
-	}
-
-	set_pte_at(mm, addr, pte, ptfile);
-	/*
-	 * We don't need to run update_mmu_cache() here because the "file pte"
-	 * being installed by install_file_pte() is not a real pte - it's a
-	 * non-present entry (like a swap entry), noting what file offset should
-	 * be mapped there when there's a fault (in a non-linear vma where
-	 * that's not obvious).
-	 */
-	pte_unmap_unlock(pte, ptl);
-	err = 0;
-out:
-	return err;
-}
-
-int generic_file_remap_pages(struct vm_area_struct *vma, unsigned long addr,
-			     unsigned long size, pgoff_t pgoff)
-{
-	struct mm_struct *mm = vma->vm_mm;
-	int err;
-
-	do {
-		err = install_file_pte(mm, vma, addr, pgoff, vma->vm_page_prot);
-		if (err)
-			return err;
-
-		size -= PAGE_SIZE;
-		addr += PAGE_SIZE;
-		pgoff++;
-	} while (size);
-
-	return 0;
-}
-EXPORT_SYMBOL(generic_file_remap_pages);
-
-/**
- * sys_remap_file_pages - remap arbitrary pages of an existing VM_SHARED vma
- * @start: start of the remapped virtual memory range
- * @size: size of the remapped virtual memory range
- * @prot: new protection bits of the range (see NOTE)
- * @pgoff: to-be-mapped page of the backing store file
- * @flags: 0 or MAP_NONBLOCKED - the later will cause no IO.
- *
- * sys_remap_file_pages remaps arbitrary pages of an existing VM_SHARED vma
- * (shared backing store file).
- *
- * This syscall works purely via pagetables, so it's the most efficient
- * way to map the same (large) file into a given virtual window. Unlike
- * mmap()/mremap() it does not create any new vmas. The new mappings are
- * also safe across swapout.
- *
- * NOTE: the @prot parameter right now is ignored (but must be zero),
- * and the vma's default protection is used. Arbitrary protections
- * might be implemented in the future.
- */
-SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
-		unsigned long, prot, unsigned long, pgoff, unsigned long, flags)
-{
-	struct mm_struct *mm = current->mm;
-	struct address_space *mapping;
-	struct vm_area_struct *vma;
-	int err = -EINVAL;
-	int has_write_lock = 0;
-	vm_flags_t vm_flags = 0;
-
-	if (prot)
-		return err;
-	/*
-	 * Sanitize the syscall parameters:
-	 */
-	start = start & PAGE_MASK;
-	size = size & PAGE_MASK;
-
-	/* Does the address range wrap, or is the span zero-sized? */
-	if (start + size <= start)
-		return err;
-
-	/* Does pgoff wrap? */
-	if (pgoff + (size >> PAGE_SHIFT) < pgoff)
-		return err;
-
-	/* Can we represent this offset inside this architecture's pte's? */
-#if PTE_FILE_MAX_BITS < BITS_PER_LONG
-	if (pgoff + (size >> PAGE_SHIFT) >= (1UL << PTE_FILE_MAX_BITS))
-		return err;
-#endif
-
-	/* We need down_write() to change vma->vm_flags. */
-	down_read(&mm->mmap_sem);
- retry:
-	vma = find_vma(mm, start);
-
-	/*
-	 * Make sure the vma is shared, that it supports prefaulting,
-	 * and that the remapped range is valid and fully within
-	 * the single existing vma.
-	 */
-	if (!vma || !(vma->vm_flags & VM_SHARED))
-		goto out;
-
-	if (!vma->vm_ops || !vma->vm_ops->remap_pages)
-		goto out;
-
-	if (start < vma->vm_start || start + size > vma->vm_end)
-		goto out;
-
-	/* Must set VM_NONLINEAR before any pages are populated. */
-	if (!(vma->vm_flags & VM_NONLINEAR)) {
-		/*
-		 * vm_private_data is used as a swapout cursor
-		 * in a VM_NONLINEAR vma.
-		 */
-		if (vma->vm_private_data)
-			goto out;
-
-		/* Don't need a nonlinear mapping, exit success */
-		if (pgoff == linear_page_index(vma, start)) {
-			err = 0;
-			goto out;
-		}
-
-		if (!has_write_lock) {
-get_write_lock:
-			up_read(&mm->mmap_sem);
-			down_write(&mm->mmap_sem);
-			has_write_lock = 1;
-			goto retry;
-		}
-		mapping = vma->vm_file->f_mapping;
-		/*
-		 * page_mkclean doesn't work on nonlinear vmas, so if
-		 * dirty pages need to be accounted, emulate with linear
-		 * vmas.
-		 */
-		if (mapping_cap_account_dirty(mapping)) {
-			unsigned long addr;
-			struct file *file = get_file(vma->vm_file);
-			/* mmap_region may free vma; grab the info now */
-			vm_flags = vma->vm_flags;
-
-			addr = mmap_region(file, start, size, vm_flags, pgoff);
-			fput(file);
-			if (IS_ERR_VALUE(addr)) {
-				err = addr;
-			} else {
-				BUG_ON(addr != start);
-				err = 0;
-			}
-			goto out_freed;
-		}
-		mutex_lock(&mapping->i_mmap_mutex);
-		flush_dcache_mmap_lock(mapping);
-		vma->vm_flags |= VM_NONLINEAR;
-		vma_interval_tree_remove(vma, &mapping->i_mmap);
-		vma_nonlinear_insert(vma, &mapping->i_mmap_nonlinear);
-		flush_dcache_mmap_unlock(mapping);
-		mutex_unlock(&mapping->i_mmap_mutex);
-	}
-
-	if (vma->vm_flags & VM_LOCKED) {
-		/*
-		 * drop PG_Mlocked flag for over-mapped range
-		 */
-		if (!has_write_lock)
-			goto get_write_lock;
-		vm_flags = vma->vm_flags;
-		munlock_vma_pages_range(vma, start, start + size);
-		vma->vm_flags = vm_flags;
-	}
-
-	mmu_notifier_invalidate_range_start(mm, start, start + size);
-	err = vma->vm_ops->remap_pages(vma, start, size, pgoff);
-	mmu_notifier_invalidate_range_end(mm, start, start + size);
-
-	/*
-	 * We can't clear VM_NONLINEAR because we'd have to do
-	 * it after ->populate completes, and that would prevent
-	 * downgrading the lock.  (Locks can't be upgraded).
-	 */
-
-out:
-	if (vma)
-		vm_flags = vma->vm_flags;
-out_freed:
-	if (likely(!has_write_lock))
-		up_read(&mm->mmap_sem);
-	else
-		up_write(&mm->mmap_sem);
-	if (!err && ((vm_flags & VM_LOCKED) || !(flags & MAP_NONBLOCK)))
-		mm_populate(start, size);
-
-	return err;
-}
diff --git a/mm/mmap.c b/mm/mmap.c
index b1202cf81f4b..4106fc833f56 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2579,6 +2579,74 @@ SYSCALL_DEFINE2(munmap, unsigned long, addr, size_t, len)
 	return vm_munmap(addr, len);
 }
 
+
+/*
+ * Emulation of deprecated remap_file_pages() syscall.
+ */
+SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
+		unsigned long, prot, unsigned long, pgoff, unsigned long, flags)
+{
+
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma;
+	unsigned long populate;
+	int ret = -EINVAL;
+
+	printk_once(KERN_WARNING "%s (%d) calls remap_file_pages(2) which is "
+			"deprecated and no longer supported by kernel in "
+			"an efficient way.\n"
+			"Note that emulated remap_file_pages(2) can "
+			"potentially create a lot of mappings. "
+			"Consider increasing vm.max_map_count.\n",
+			current->comm, current->pid);
+
+	if (prot)
+		return ret;
+	start = start & PAGE_MASK;
+	size = size & PAGE_MASK;
+
+	if (start + size <= start)
+		return ret;
+
+	/* Does pgoff wrap? */
+	if (pgoff + (size >> PAGE_SHIFT) < pgoff)
+		return ret;
+
+	down_write(&mm->mmap_sem);
+	vma = find_vma(mm, start);
+
+	if (!vma || !(vma->vm_flags & VM_SHARED))
+		goto out;
+
+	if (start < vma->vm_start || start + size > vma->vm_end)
+		goto out;
+
+	if (pgoff == linear_page_index(vma, start)) {
+		ret = 0;
+		goto out;
+	}
+
+	prot |= vma->vm_flags & VM_READ ? PROT_READ : 0;
+	prot |= vma->vm_flags & VM_WRITE ? PROT_WRITE : 0;
+	prot |= vma->vm_flags & VM_EXEC ? PROT_EXEC : 0;
+
+	flags &= MAP_POPULATE;
+	flags |= MAP_SHARED | MAP_FIXED;
+	if (vma->vm_flags & VM_LOCKED) {
+		flags |= MAP_LOCKED;
+		/* drop PG_Mlocked flag for over-mapped range */
+		munlock_vma_pages_range(vma, start, start + size);
+	}
+
+	ret = do_mmap_pgoff(vma->vm_file, start, size,
+			prot, flags, pgoff, &populate);
+	if (populate)
+		mm_populate(ret, populate);
+out:
+	up_write(&mm->mmap_sem);
+	return ret;
+}
+
 static inline void verify_mm_writelocked(struct mm_struct *mm)
 {
 #ifdef CONFIG_DEBUG_VM
diff --git a/mm/nommu.c b/mm/nommu.c
index 85f8d6698d48..d20b7fea2852 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1996,14 +1996,6 @@ void filemap_map_pages(struct vm_area_struct *vma, struct vm_fault *vmf)
 }
 EXPORT_SYMBOL(filemap_map_pages);
 
-int generic_file_remap_pages(struct vm_area_struct *vma, unsigned long addr,
-			     unsigned long size, pgoff_t pgoff)
-{
-	BUG();
-	return 0;
-}
-EXPORT_SYMBOL(generic_file_remap_pages);
-
 static int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
 		unsigned long addr, void *buf, int len, int write)
 {
-- 
2.0.0.rc0


* [PATCH 2/8] mm: kill vm_operations_struct->remap_pages
  2014-05-06 14:37 [RFC, PATCH 0/8] remap_file_pages() decommission Kirill A. Shutemov
  2014-05-06 14:37 ` [PATCH 1/8] mm: replace remap_file_pages() syscall with emulation Kirill A. Shutemov
@ 2014-05-06 14:37 ` Kirill A. Shutemov
  2014-05-19 15:03   ` Christoph Hellwig
  2014-05-06 14:37 ` [PATCH 3/8] mm: kill zap_details->nonlinear_vma Kirill A. Shutemov
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Kirill A. Shutemov @ 2014-05-06 14:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, peterz, mingo, Kirill A. Shutemov

There are no users anymore.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 fs/9p/vfs_file.c   | 2 --
 fs/btrfs/file.c    | 1 -
 fs/ceph/addr.c     | 1 -
 fs/cifs/file.c     | 1 -
 fs/ext4/file.c     | 1 -
 fs/f2fs/file.c     | 1 -
 fs/fuse/file.c     | 1 -
 fs/gfs2/file.c     | 1 -
 fs/nfs/file.c      | 1 -
 fs/nilfs2/file.c   | 1 -
 fs/ocfs2/mmap.c    | 1 -
 fs/ubifs/file.c    | 1 -
 fs/xfs/xfs_file.c  | 1 -
 include/linux/fs.h | 6 ------
 include/linux/mm.h | 3 ---
 mm/filemap.c       | 1 -
 mm/filemap_xip.c   | 1 -
 mm/shmem.c         | 1 -
 18 files changed, 26 deletions(-)

diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index d8223209d4b1..4cc2eed453a7 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -834,7 +834,6 @@ static const struct vm_operations_struct v9fs_file_vm_ops = {
 	.fault = filemap_fault,
 	.map_pages = filemap_map_pages,
 	.page_mkwrite = v9fs_vm_page_mkwrite,
-	.remap_pages = generic_file_remap_pages,
 };
 
 static const struct vm_operations_struct v9fs_mmap_file_vm_ops = {
@@ -842,7 +841,6 @@ static const struct vm_operations_struct v9fs_mmap_file_vm_ops = {
 	.fault = filemap_fault,
 	.map_pages = filemap_map_pages,
 	.page_mkwrite = v9fs_vm_page_mkwrite,
-	.remap_pages = generic_file_remap_pages,
 };
 
 
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index ae6af072b635..1238b42e53d6 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2024,7 +2024,6 @@ static const struct vm_operations_struct btrfs_file_vm_ops = {
 	.fault		= filemap_fault,
 	.map_pages	= filemap_map_pages,
 	.page_mkwrite	= btrfs_page_mkwrite,
-	.remap_pages	= generic_file_remap_pages,
 };
 
 static int btrfs_file_mmap(struct file	*filp, struct vm_area_struct *vma)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index b53278c9fd97..ac6146d48647 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1330,7 +1330,6 @@ out:
 static struct vm_operations_struct ceph_vmops = {
 	.fault		= ceph_filemap_fault,
 	.page_mkwrite	= ceph_page_mkwrite,
-	.remap_pages	= generic_file_remap_pages,
 };
 
 int ceph_mmap(struct file *file, struct vm_area_struct *vma)
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 5ed03e0b8b40..b153aaa5da11 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3101,7 +3101,6 @@ static struct vm_operations_struct cifs_file_vm_ops = {
 	.fault = filemap_fault,
 	.map_pages = filemap_map_pages,
 	.page_mkwrite = cifs_page_mkwrite,
-	.remap_pages = generic_file_remap_pages,
 };
 
 int cifs_file_strict_mmap(struct file *file, struct vm_area_struct *vma)
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 063fc1538355..fee8a59b3e64 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -202,7 +202,6 @@ static const struct vm_operations_struct ext4_file_vm_ops = {
 	.fault		= filemap_fault,
 	.map_pages	= filemap_map_pages,
 	.page_mkwrite   = ext4_page_mkwrite,
-	.remap_pages	= generic_file_remap_pages,
 };
 
 static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 60e7d5448a1d..c5908e5b621c 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -86,7 +86,6 @@ static const struct vm_operations_struct f2fs_file_vm_ops = {
 	.fault		= filemap_fault,
 	.map_pages	= filemap_map_pages,
 	.page_mkwrite	= f2fs_vm_page_mkwrite,
-	.remap_pages	= generic_file_remap_pages,
 };
 
 static int get_parent_ino(struct inode *inode, nid_t *pino)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 13f8bdec5110..a743711ef28b 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2116,7 +2116,6 @@ static const struct vm_operations_struct fuse_file_vm_ops = {
 	.fault		= filemap_fault,
 	.map_pages	= filemap_map_pages,
 	.page_mkwrite	= fuse_page_mkwrite,
-	.remap_pages	= generic_file_remap_pages,
 };
 
 static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 80d67253623c..2fec9c78fc6b 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -496,7 +496,6 @@ static const struct vm_operations_struct gfs2_vm_ops = {
 	.fault = filemap_fault,
 	.map_pages = filemap_map_pages,
 	.page_mkwrite = gfs2_page_mkwrite,
-	.remap_pages = generic_file_remap_pages,
 };
 
 /**
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 284ca901fe16..18f50bb4d887 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -619,7 +619,6 @@ static const struct vm_operations_struct nfs_file_vm_ops = {
 	.fault = filemap_fault,
 	.map_pages = filemap_map_pages,
 	.page_mkwrite = nfs_vm_page_mkwrite,
-	.remap_pages = generic_file_remap_pages,
 };
 
 static int nfs_need_sync_write(struct file *filp, struct inode *inode)
diff --git a/fs/nilfs2/file.c b/fs/nilfs2/file.c
index f3a82fbcae02..e0c458b8a168 100644
--- a/fs/nilfs2/file.c
+++ b/fs/nilfs2/file.c
@@ -136,7 +136,6 @@ static const struct vm_operations_struct nilfs_file_vm_ops = {
 	.fault		= filemap_fault,
 	.map_pages	= filemap_map_pages,
 	.page_mkwrite	= nilfs_page_mkwrite,
-	.remap_pages	= generic_file_remap_pages,
 };
 
 static int nilfs_file_mmap(struct file *file, struct vm_area_struct *vma)
diff --git a/fs/ocfs2/mmap.c b/fs/ocfs2/mmap.c
index 10d66c75cecb..9581d190f6e1 100644
--- a/fs/ocfs2/mmap.c
+++ b/fs/ocfs2/mmap.c
@@ -173,7 +173,6 @@ out:
 static const struct vm_operations_struct ocfs2_file_vm_ops = {
 	.fault		= ocfs2_fault,
 	.page_mkwrite	= ocfs2_page_mkwrite,
-	.remap_pages	= generic_file_remap_pages,
 };
 
 int ocfs2_mmap(struct file *file, struct vm_area_struct *vma)
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 4f34dbae823d..63c550489b1c 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1540,7 +1540,6 @@ static const struct vm_operations_struct ubifs_file_vm_ops = {
 	.fault        = filemap_fault,
 	.map_pages = filemap_map_pages,
 	.page_mkwrite = ubifs_vm_page_mkwrite,
-	.remap_pages = generic_file_remap_pages,
 };
 
 static int ubifs_file_mmap(struct file *file, struct vm_area_struct *vma)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 951a2321ee01..b315608c5b57 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1494,5 +1494,4 @@ static const struct vm_operations_struct xfs_file_vm_ops = {
 	.fault		= filemap_fault,
 	.map_pages	= filemap_map_pages,
 	.page_mkwrite	= xfs_vm_page_mkwrite,
-	.remap_pages	= generic_file_remap_pages,
 };
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b7cda7d95ea0..14abfc355726 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2401,12 +2401,6 @@ extern int sb_min_blocksize(struct super_block *, int);
 
 extern int generic_file_mmap(struct file *, struct vm_area_struct *);
 extern int generic_file_readonly_mmap(struct file *, struct vm_area_struct *);
-static inline int generic_file_remap_pages(struct vm_area_struct *vma,
-		unsigned long addr, unsigned long size, pgoff_t pgoff)
-{
-	BUG();
-	return 0;
-}
 int generic_write_checks(struct file *file, loff_t *pos, size_t *count, int isblk);
 extern ssize_t generic_file_aio_read(struct kiocb *, const struct iovec *, unsigned long, loff_t);
 extern ssize_t __generic_file_aio_write(struct kiocb *, const struct iovec *, unsigned long);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bf9811e1321a..3e7b88ff15d6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -264,9 +264,6 @@ struct vm_operations_struct {
 	int (*migrate)(struct vm_area_struct *vma, const nodemask_t *from,
 		const nodemask_t *to, unsigned long flags);
 #endif
-	/* called by sys_remap_file_pages() to populate non-linear mapping */
-	int (*remap_pages)(struct vm_area_struct *vma, unsigned long addr,
-			   unsigned long size, pgoff_t pgoff);
 };
 
 struct mmu_gather;
diff --git a/mm/filemap.c b/mm/filemap.c
index 5020b280a771..3fda0bfee2d2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2106,7 +2106,6 @@ const struct vm_operations_struct generic_file_vm_ops = {
 	.fault		= filemap_fault,
 	.map_pages	= filemap_map_pages,
 	.page_mkwrite	= filemap_page_mkwrite,
-	.remap_pages	= generic_file_remap_pages,
 };
 
 /* This is used for a general mmap of a disk file */
diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c
index d8d9fe3f685c..e609c215c5d2 100644
--- a/mm/filemap_xip.c
+++ b/mm/filemap_xip.c
@@ -306,7 +306,6 @@ out:
 static const struct vm_operations_struct xip_file_vm_ops = {
 	.fault	= xip_file_fault,
 	.page_mkwrite	= filemap_page_mkwrite,
-	.remap_pages = generic_file_remap_pages,
 };
 
 int xip_file_mmap(struct file * file, struct vm_area_struct * vma)
diff --git a/mm/shmem.c b/mm/shmem.c
index 9f70e02111c6..335c8ea62ded 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2705,7 +2705,6 @@ static const struct vm_operations_struct shmem_vm_ops = {
 	.set_policy     = shmem_set_policy,
 	.get_policy     = shmem_get_policy,
 #endif
-	.remap_pages	= generic_file_remap_pages,
 };
 
 static struct dentry *shmem_mount(struct file_system_type *fs_type,
-- 
2.0.0.rc0


* [PATCH 3/8] mm: kill zap_details->nonlinear_vma
  2014-05-06 14:37 [RFC, PATCH 0/8] remap_file_pages() decommission Kirill A. Shutemov
  2014-05-06 14:37 ` [PATCH 1/8] mm: replace remap_file_pages() syscall with emulation Kirill A. Shutemov
  2014-05-06 14:37 ` [PATCH 2/8] mm: kill vm_operations_struct->remap_pages Kirill A. Shutemov
@ 2014-05-06 14:37 ` Kirill A. Shutemov
  2014-05-06 14:37 ` [PATCH 4/8] mm, rmap: kill rmap_walk_control->file_nonlinear() Kirill A. Shutemov
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Kirill A. Shutemov @ 2014-05-06 14:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, peterz, mingo, Kirill A. Shutemov

Nobody creates nonlinear VMAs. No need for special code to zap them.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm.h |  1 -
 mm/madvise.c       |  9 +--------
 mm/memory.c        | 47 ++++-------------------------------------------
 3 files changed, 5 insertions(+), 52 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3e7b88ff15d6..156ca8025cec 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1066,7 +1066,6 @@ extern void user_shm_unlock(size_t, struct user_struct *);
  * Parameter block passed down to zap_pte_range in exceptional cases.
  */
 struct zap_details {
-	struct vm_area_struct *nonlinear_vma;	/* Check page->index if set */
 	struct address_space *check_mapping;	/* Check page->mapping if set */
 	pgoff_t	first_index;			/* Lowest page->index to unmap */
 	pgoff_t last_index;			/* Highest page->index to unmap */
diff --git a/mm/madvise.c b/mm/madvise.c
index 539eeb96b323..1932a1f0feda 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -278,14 +278,7 @@ static long madvise_dontneed(struct vm_area_struct *vma,
 	if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
 		return -EINVAL;
 
-	if (unlikely(vma->vm_flags & VM_NONLINEAR)) {
-		struct zap_details details = {
-			.nonlinear_vma = vma,
-			.last_index = ULONG_MAX,
-		};
-		zap_page_range(vma, start, end - start, &details);
-	} else
-		zap_page_range(vma, start, end - start, NULL);
+	zap_page_range(vma, start, end - start, NULL);
 	return 0;
 }
 
diff --git a/mm/memory.c b/mm/memory.c
index 037b812a9531..cc741a7ce71e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1111,28 +1111,12 @@ again:
 				if (details->check_mapping &&
 				    details->check_mapping != page->mapping)
 					continue;
-				/*
-				 * Each page->index must be checked when
-				 * invalidating or truncating nonlinear.
-				 */
-				if (details->nonlinear_vma &&
-				    (page->index < details->first_index ||
-				     page->index > details->last_index))
-					continue;
 			}
 			ptent = ptep_get_and_clear_full(mm, addr, pte,
 							tlb->fullmm);
 			tlb_remove_tlb_entry(tlb, pte, addr);
 			if (unlikely(!page))
 				continue;
-			if (unlikely(details) && details->nonlinear_vma
-			    && linear_page_index(details->nonlinear_vma,
-						addr) != page->index) {
-				pte_t ptfile = pgoff_to_pte(page->index);
-				if (pte_soft_dirty(ptent))
-					pte_file_mksoft_dirty(ptfile);
-				set_pte_at(mm, addr, pte, ptfile);
-			}
 			if (PageAnon(page))
 				rss[MM_ANONPAGES]--;
 			else {
@@ -1154,10 +1138,7 @@ again:
 			}
 			continue;
 		}
-		/*
-		 * If details->check_mapping, we leave swap entries;
-		 * if details->nonlinear_vma, we leave file entries.
-		 */
+		/* If details->check_mapping, we leave swap entries */
 		if (unlikely(details))
 			continue;
 		if (pte_file(ptent)) {
@@ -1292,7 +1273,7 @@ static void unmap_page_range(struct mmu_gather *tlb,
 	pgd_t *pgd;
 	unsigned long next;
 
-	if (details && !details->check_mapping && !details->nonlinear_vma)
+	if (details && !details->check_mapping)
 		details = NULL;
 
 	BUG_ON(addr >= end);
@@ -1388,7 +1369,7 @@ void unmap_vmas(struct mmu_gather *tlb,
  * @vma: vm_area_struct holding the applicable pages
  * @start: starting address of pages to zap
  * @size: number of bytes to zap
- * @details: details of nonlinear truncation or shared cache invalidation
+ * @details: details of shared cache invalidation
  *
  * Caller must protect the VMA list
  */
@@ -1414,7 +1395,7 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
  * @vma: vm_area_struct holding the applicable pages
  * @address: starting address of pages to zap
  * @size: number of bytes to zap
- * @details: details of nonlinear truncation or shared cache invalidation
+ * @details: details of shared cache invalidation
  *
  * The range must fit into one VMA.
  */
@@ -2977,23 +2958,6 @@ static inline void unmap_mapping_range_tree(struct rb_root *root,
 	}
 }
 
-static inline void unmap_mapping_range_list(struct list_head *head,
-					    struct zap_details *details)
-{
-	struct vm_area_struct *vma;
-
-	/*
-	 * In nonlinear VMAs there is no correspondence between virtual address
-	 * offset and file offset.  So we must perform an exhaustive search
-	 * across *all* the pages in each nonlinear VMA, not just the pages
-	 * whose virtual address lies outside the file truncation point.
-	 */
-	list_for_each_entry(vma, head, shared.nonlinear) {
-		details->nonlinear_vma = vma;
-		unmap_mapping_range_vma(vma, vma->vm_start, vma->vm_end, details);
-	}
-}
-
 /**
  * unmap_mapping_range - unmap the portion of all mmaps in the specified address_space corresponding to the specified page range in the underlying file.
  * @mapping: the address space containing mmaps to be unmapped.
@@ -3024,7 +2988,6 @@ void unmap_mapping_range(struct address_space *mapping,
 	}
 
 	details.check_mapping = even_cows? NULL: mapping;
-	details.nonlinear_vma = NULL;
 	details.first_index = hba;
 	details.last_index = hba + hlen - 1;
 	if (details.last_index < details.first_index)
@@ -3034,8 +2997,6 @@ void unmap_mapping_range(struct address_space *mapping,
 	mutex_lock(&mapping->i_mmap_mutex);
 	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap)))
 		unmap_mapping_range_tree(&mapping->i_mmap, &details);
-	if (unlikely(!list_empty(&mapping->i_mmap_nonlinear)))
-		unmap_mapping_range_list(&mapping->i_mmap_nonlinear, &details);
 	mutex_unlock(&mapping->i_mmap_mutex);
 }
 EXPORT_SYMBOL(unmap_mapping_range);
-- 
2.0.0.rc0


* [PATCH 4/8] mm, rmap: kill rmap_walk_control->file_nonlinear()
  2014-05-06 14:37 [RFC, PATCH 0/8] remap_file_pages() decommission Kirill A. Shutemov
                   ` (2 preceding siblings ...)
  2014-05-06 14:37 ` [PATCH 3/8] mm: kill zap_details->nonlinear_vma Kirill A. Shutemov
@ 2014-05-06 14:37 ` Kirill A. Shutemov
  2014-05-06 14:37 ` [PATCH 5/8] mm, rmap: kill vma->shared.nonlinear Kirill A. Shutemov
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Kirill A. Shutemov @ 2014-05-06 14:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, peterz, mingo, Kirill A. Shutemov

Nobody creates nonlinear VMAs. No need to support them.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/rmap.h |   2 -
 mm/migrate.c         |  32 --------
 mm/rmap.c            | 216 ---------------------------------------------------
 3 files changed, 250 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 9be55c7617da..812774417407 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -237,7 +237,6 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma);
  * arg: passed to rmap_one() and invalid_vma()
  * rmap_one: executed on each vma where page is mapped
  * done: for checking traversing termination condition
- * file_nonlinear: for handling file nonlinear mapping
  * anon_lock: for getting anon_lock by optimized way rather than default
  * invalid_vma: for skipping uninterested vma
  */
@@ -246,7 +245,6 @@ struct rmap_walk_control {
 	int (*rmap_one)(struct page *page, struct vm_area_struct *vma,
 					unsigned long addr, void *arg);
 	int (*done)(struct page *page);
-	int (*file_nonlinear)(struct page *, struct address_space *, void *arg);
 	struct anon_vma *(*anon_lock)(struct page *page);
 	bool (*invalid_vma)(struct vm_area_struct *vma, void *arg);
 };
diff --git a/mm/migrate.c b/mm/migrate.c
index bed48809e5d0..b494fdb9a636 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -178,37 +178,6 @@ out:
 }
 
 /*
- * Congratulations to trinity for discovering this bug.
- * mm/fremap.c's remap_file_pages() accepts any range within a single vma to
- * convert that vma to VM_NONLINEAR; and generic_file_remap_pages() will then
- * replace the specified range by file ptes throughout (maybe populated after).
- * If page migration finds a page within that range, while it's still located
- * by vma_interval_tree rather than lost to i_mmap_nonlinear list, no problem:
- * zap_pte() clears the temporary migration entry before mmap_sem is dropped.
- * But if the migrating page is in a part of the vma outside the range to be
- * remapped, then it will not be cleared, and remove_migration_ptes() needs to
- * deal with it.  Fortunately, this part of the vma is of course still linear,
- * so we just need to use linear location on the nonlinear list.
- */
-static int remove_linear_migration_ptes_from_nonlinear(struct page *page,
-		struct address_space *mapping, void *arg)
-{
-	struct vm_area_struct *vma;
-	/* hugetlbfs does not support remap_pages, so no huge pgoff worries */
-	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
-	unsigned long addr;
-
-	list_for_each_entry(vma,
-		&mapping->i_mmap_nonlinear, shared.nonlinear) {
-
-		addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
-		if (addr >= vma->vm_start && addr < vma->vm_end)
-			remove_migration_pte(page, vma, addr, arg);
-	}
-	return SWAP_AGAIN;
-}
-
-/*
  * Get rid of all migration entries and replace them by
  * references to the indicated page.
  */
@@ -217,7 +186,6 @@ static void remove_migration_ptes(struct page *old, struct page *new)
 	struct rmap_walk_control rwc = {
 		.rmap_one = remove_migration_pte,
 		.arg = old,
-		.file_nonlinear = remove_linear_migration_ptes_from_nonlinear,
 	};
 
 	rmap_walk(new, &rwc);
diff --git a/mm/rmap.c b/mm/rmap.c
index d8e1a7e7fbe8..e031d4ad0a4b 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1252,207 +1252,6 @@ out_mlock:
 	return ret;
 }
 
-/*
- * objrmap doesn't work for nonlinear VMAs because the assumption that
- * offset-into-file correlates with offset-into-virtual-addresses does not hold.
- * Consequently, given a particular page and its ->index, we cannot locate the
- * ptes which are mapping that page without an exhaustive linear search.
- *
- * So what this code does is a mini "virtual scan" of each nonlinear VMA which
- * maps the file to which the target page belongs.  The ->vm_private_data field
- * holds the current cursor into that scan.  Successive searches will circulate
- * around the vma's virtual address space.
- *
- * So as more replacement pressure is applied to the pages in a nonlinear VMA,
- * more scanning pressure is placed against them as well.   Eventually pages
- * will become fully unmapped and are eligible for eviction.
- *
- * For very sparsely populated VMAs this is a little inefficient - chances are
- * there there won't be many ptes located within the scan cluster.  In this case
- * maybe we could scan further - to the end of the pte page, perhaps.
- *
- * Mlocked pages:  check VM_LOCKED under mmap_sem held for read, if we can
- * acquire it without blocking.  If vma locked, mlock the pages in the cluster,
- * rather than unmapping them.  If we encounter the "check_page" that vmscan is
- * trying to unmap, return SWAP_MLOCK, else default SWAP_AGAIN.
- */
-#define CLUSTER_SIZE	min(32*PAGE_SIZE, PMD_SIZE)
-#define CLUSTER_MASK	(~(CLUSTER_SIZE - 1))
-
-static int try_to_unmap_cluster(unsigned long cursor, unsigned int *mapcount,
-		struct vm_area_struct *vma, struct page *check_page)
-{
-	struct mm_struct *mm = vma->vm_mm;
-	pmd_t *pmd;
-	pte_t *pte;
-	pte_t pteval;
-	spinlock_t *ptl;
-	struct page *page;
-	unsigned long address;
-	unsigned long mmun_start;	/* For mmu_notifiers */
-	unsigned long mmun_end;		/* For mmu_notifiers */
-	unsigned long end;
-	int ret = SWAP_AGAIN;
-	int locked_vma = 0;
-
-	address = (vma->vm_start + cursor) & CLUSTER_MASK;
-	end = address + CLUSTER_SIZE;
-	if (address < vma->vm_start)
-		address = vma->vm_start;
-	if (end > vma->vm_end)
-		end = vma->vm_end;
-
-	pmd = mm_find_pmd(mm, address);
-	if (!pmd)
-		return ret;
-
-	mmun_start = address;
-	mmun_end   = end;
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
-
-	/*
-	 * If we can acquire the mmap_sem for read, and vma is VM_LOCKED,
-	 * keep the sem while scanning the cluster for mlocking pages.
-	 */
-	if (down_read_trylock(&vma->vm_mm->mmap_sem)) {
-		locked_vma = (vma->vm_flags & VM_LOCKED);
-		if (!locked_vma)
-			up_read(&vma->vm_mm->mmap_sem); /* don't need it */
-	}
-
-	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
-
-	/* Update high watermark before we lower rss */
-	update_hiwater_rss(mm);
-
-	for (; address < end; pte++, address += PAGE_SIZE) {
-		if (!pte_present(*pte))
-			continue;
-		page = vm_normal_page(vma, address, *pte);
-		BUG_ON(!page || PageAnon(page));
-
-		if (locked_vma) {
-			if (page == check_page) {
-				/* we know we have check_page locked */
-				mlock_vma_page(page);
-				ret = SWAP_MLOCK;
-			} else if (trylock_page(page)) {
-				/*
-				 * If we can lock the page, perform mlock.
-				 * Otherwise leave the page alone, it will be
-				 * eventually encountered again later.
-				 */
-				mlock_vma_page(page);
-				unlock_page(page);
-			}
-			continue;	/* don't unmap */
-		}
-
-		if (ptep_clear_flush_young_notify(vma, address, pte))
-			continue;
-
-		/* Nuke the page table entry. */
-		flush_cache_page(vma, address, pte_pfn(*pte));
-		pteval = ptep_clear_flush(vma, address, pte);
-
-		/* If nonlinear, store the file page offset in the pte. */
-		if (page->index != linear_page_index(vma, address)) {
-			pte_t ptfile = pgoff_to_pte(page->index);
-			if (pte_soft_dirty(pteval))
-				pte_file_mksoft_dirty(ptfile);
-			set_pte_at(mm, address, pte, ptfile);
-		}
-
-		/* Move the dirty bit to the physical page now the pte is gone. */
-		if (pte_dirty(pteval))
-			set_page_dirty(page);
-
-		page_remove_rmap(page);
-		page_cache_release(page);
-		dec_mm_counter(mm, MM_FILEPAGES);
-		(*mapcount)--;
-	}
-	pte_unmap_unlock(pte - 1, ptl);
-	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
-	if (locked_vma)
-		up_read(&vma->vm_mm->mmap_sem);
-	return ret;
-}
-
-static int try_to_unmap_nonlinear(struct page *page,
-		struct address_space *mapping, void *arg)
-{
-	struct vm_area_struct *vma;
-	int ret = SWAP_AGAIN;
-	unsigned long cursor;
-	unsigned long max_nl_cursor = 0;
-	unsigned long max_nl_size = 0;
-	unsigned int mapcount;
-
-	list_for_each_entry(vma,
-		&mapping->i_mmap_nonlinear, shared.nonlinear) {
-
-		cursor = (unsigned long) vma->vm_private_data;
-		if (cursor > max_nl_cursor)
-			max_nl_cursor = cursor;
-		cursor = vma->vm_end - vma->vm_start;
-		if (cursor > max_nl_size)
-			max_nl_size = cursor;
-	}
-
-	if (max_nl_size == 0) {	/* all nonlinears locked or reserved ? */
-		return SWAP_FAIL;
-	}
-
-	/*
-	 * We don't try to search for this page in the nonlinear vmas,
-	 * and page_referenced wouldn't have found it anyway.  Instead
-	 * just walk the nonlinear vmas trying to age and unmap some.
-	 * The mapcount of the page we came in with is irrelevant,
-	 * but even so use it as a guide to how hard we should try?
-	 */
-	mapcount = page_mapcount(page);
-	if (!mapcount)
-		return ret;
-
-	cond_resched();
-
-	max_nl_size = (max_nl_size + CLUSTER_SIZE - 1) & CLUSTER_MASK;
-	if (max_nl_cursor == 0)
-		max_nl_cursor = CLUSTER_SIZE;
-
-	do {
-		list_for_each_entry(vma,
-			&mapping->i_mmap_nonlinear, shared.nonlinear) {
-
-			cursor = (unsigned long) vma->vm_private_data;
-			while (cursor < max_nl_cursor &&
-				cursor < vma->vm_end - vma->vm_start) {
-				if (try_to_unmap_cluster(cursor, &mapcount,
-						vma, page) == SWAP_MLOCK)
-					ret = SWAP_MLOCK;
-				cursor += CLUSTER_SIZE;
-				vma->vm_private_data = (void *) cursor;
-				if ((int)mapcount <= 0)
-					return ret;
-			}
-			vma->vm_private_data = (void *) max_nl_cursor;
-		}
-		cond_resched();
-		max_nl_cursor += CLUSTER_SIZE;
-	} while (max_nl_cursor <= max_nl_size);
-
-	/*
-	 * Don't loop forever (perhaps all the remaining pages are
-	 * in locked vmas).  Reset cursor on all unreserved nonlinear
-	 * vmas, now forgetting on which ones it had fallen behind.
-	 */
-	list_for_each_entry(vma, &mapping->i_mmap_nonlinear, shared.nonlinear)
-		vma->vm_private_data = NULL;
-
-	return ret;
-}
-
 bool is_vma_temporary_stack(struct vm_area_struct *vma)
 {
 	int maybe_stack = vma->vm_flags & (VM_GROWSDOWN | VM_GROWSUP);
@@ -1498,7 +1297,6 @@ int try_to_unmap(struct page *page, enum ttu_flags flags)
 		.rmap_one = try_to_unmap_one,
 		.arg = (void *)flags,
 		.done = page_not_mapped,
-		.file_nonlinear = try_to_unmap_nonlinear,
 		.anon_lock = page_lock_anon_vma_read,
 	};
 
@@ -1544,12 +1342,6 @@ int try_to_munlock(struct page *page)
 		.rmap_one = try_to_unmap_one,
 		.arg = (void *)TTU_MUNLOCK,
 		.done = page_not_mapped,
-		/*
-		 * We don't bother to try to find the munlocked page in
-		 * nonlinears. It's costly. Instead, later, page reclaim logic
-		 * may call try_to_unmap() and recover PG_mlocked lazily.
-		 */
-		.file_nonlinear = NULL,
 		.anon_lock = page_lock_anon_vma_read,
 
 	};
@@ -1678,14 +1470,6 @@ static int rmap_walk_file(struct page *page, struct rmap_walk_control *rwc)
 			goto done;
 	}
 
-	if (!rwc->file_nonlinear)
-		goto done;
-
-	if (list_empty(&mapping->i_mmap_nonlinear))
-		goto done;
-
-	ret = rwc->file_nonlinear(page, mapping, rwc->arg);
-
 done:
 	mutex_unlock(&mapping->i_mmap_mutex);
 	return ret;
-- 
2.0.0.rc0


* [PATCH 5/8] mm, rmap: kill vma->shared.nonlinear
  2014-05-06 14:37 [RFC, PATCH 0/8] remap_file_pages() decommission Kirill A. Shutemov
                   ` (3 preceding siblings ...)
  2014-05-06 14:37 ` [PATCH 4/8] mm, rmap: kill rmap_walk_control->file_nonlinear() Kirill A. Shutemov
@ 2014-05-06 14:37 ` Kirill A. Shutemov
  2014-05-06 14:37 ` [PATCH 6/8] mm, rmap: kill mapping->i_mmap_nonlinear Kirill A. Shutemov
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Kirill A. Shutemov @ 2014-05-06 14:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, peterz, mingo, Kirill A. Shutemov

Nobody creates nonlinear VMAs. No need to support them.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm.h       |  6 ------
 include/linux/mm_types.h | 12 +++---------
 kernel/fork.c            |  8 ++------
 mm/interval_tree.c       | 34 +++++++++++++++++-----------------
 mm/mmap.c                | 10 ++--------
 5 files changed, 24 insertions(+), 46 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 156ca8025cec..d8dc4cd58704 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1721,12 +1721,6 @@ struct vm_area_struct *vma_interval_tree_iter_next(struct vm_area_struct *node,
 	for (vma = vma_interval_tree_iter_first(root, start, last);	\
 	     vma; vma = vma_interval_tree_iter_next(vma, start, last))
 
-static inline void vma_nonlinear_insert(struct vm_area_struct *vma,
-					struct list_head *list)
-{
-	list_add_tail(&vma->shared.nonlinear, list);
-}
-
 void anon_vma_interval_tree_insert(struct anon_vma_chain *node,
 				   struct rb_root *root);
 void anon_vma_interval_tree_remove(struct anon_vma_chain *node,
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 8967e20cbe57..4a586fa37720 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -272,16 +272,10 @@ struct vm_area_struct {
 
 	/*
 	 * For areas with an address space and backing store,
-	 * linkage into the address_space->i_mmap interval tree, or
-	 * linkage of vma in the address_space->i_mmap_nonlinear list.
+	 * linkage into the address_space->i_mmap interval tree.
 	 */
-	union {
-		struct {
-			struct rb_node rb;
-			unsigned long rb_subtree_last;
-		} linear;
-		struct list_head nonlinear;
-	} shared;
+	struct rb_node shared_rb;
+	unsigned long shared_rb_subtree_last;
 
 	/*
 	 * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
diff --git a/kernel/fork.c b/kernel/fork.c
index 54a8d26f612f..b4fc21e42688 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -424,12 +424,8 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
 				mapping->i_mmap_writable++;
 			flush_dcache_mmap_lock(mapping);
 			/* insert tmp into the share list, just after mpnt */
-			if (unlikely(tmp->vm_flags & VM_NONLINEAR))
-				vma_nonlinear_insert(tmp,
-						&mapping->i_mmap_nonlinear);
-			else
-				vma_interval_tree_insert_after(tmp, mpnt,
-							&mapping->i_mmap);
+			vma_interval_tree_insert_after(tmp, mpnt,
+					&mapping->i_mmap);
 			flush_dcache_mmap_unlock(mapping);
 			mutex_unlock(&mapping->i_mmap_mutex);
 		}
diff --git a/mm/interval_tree.c b/mm/interval_tree.c
index 4a5822a586e6..06d75a9ba00c 100644
--- a/mm/interval_tree.c
+++ b/mm/interval_tree.c
@@ -21,8 +21,8 @@ static inline unsigned long vma_last_pgoff(struct vm_area_struct *v)
 	return v->vm_pgoff + ((v->vm_end - v->vm_start) >> PAGE_SHIFT) - 1;
 }
 
-INTERVAL_TREE_DEFINE(struct vm_area_struct, shared.linear.rb,
-		     unsigned long, shared.linear.rb_subtree_last,
+INTERVAL_TREE_DEFINE(struct vm_area_struct, shared_rb,
+		     unsigned long, shared_rb_subtree_last,
 		     vma_start_pgoff, vma_last_pgoff,, vma_interval_tree)
 
 /* Insert node immediately after prev in the interval tree */
@@ -36,26 +36,26 @@ void vma_interval_tree_insert_after(struct vm_area_struct *node,
 
 	VM_BUG_ON(vma_start_pgoff(node) != vma_start_pgoff(prev));
 
-	if (!prev->shared.linear.rb.rb_right) {
+	if (!prev->shared_rb.rb_right) {
 		parent = prev;
-		link = &prev->shared.linear.rb.rb_right;
+		link = &prev->shared_rb.rb_right;
 	} else {
-		parent = rb_entry(prev->shared.linear.rb.rb_right,
-				  struct vm_area_struct, shared.linear.rb);
-		if (parent->shared.linear.rb_subtree_last < last)
-			parent->shared.linear.rb_subtree_last = last;
-		while (parent->shared.linear.rb.rb_left) {
-			parent = rb_entry(parent->shared.linear.rb.rb_left,
-				struct vm_area_struct, shared.linear.rb);
-			if (parent->shared.linear.rb_subtree_last < last)
-				parent->shared.linear.rb_subtree_last = last;
+		parent = rb_entry(prev->shared_rb.rb_right,
+				  struct vm_area_struct, shared_rb);
+		if (parent->shared_rb_subtree_last < last)
+			parent->shared_rb_subtree_last = last;
+		while (parent->shared_rb.rb_left) {
+			parent = rb_entry(parent->shared_rb.rb_left,
+				struct vm_area_struct, shared_rb);
+			if (parent->shared_rb_subtree_last < last)
+				parent->shared_rb_subtree_last = last;
 		}
-		link = &parent->shared.linear.rb.rb_left;
+		link = &parent->shared_rb.rb_left;
 	}
 
-	node->shared.linear.rb_subtree_last = last;
-	rb_link_node(&node->shared.linear.rb, &parent->shared.linear.rb, link);
-	rb_insert_augmented(&node->shared.linear.rb, root,
+	node->shared_rb_subtree_last = last;
+	rb_link_node(&node->shared_rb, &parent->shared_rb, link);
+	rb_insert_augmented(&node->shared_rb, root,
 			    &vma_interval_tree_augment);
 }
 
diff --git a/mm/mmap.c b/mm/mmap.c
index 4106fc833f56..8be242f07439 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -216,10 +216,7 @@ static void __remove_shared_vm_struct(struct vm_area_struct *vma,
 		mapping->i_mmap_writable--;
 
 	flush_dcache_mmap_lock(mapping);
-	if (unlikely(vma->vm_flags & VM_NONLINEAR))
-		list_del_init(&vma->shared.nonlinear);
-	else
-		vma_interval_tree_remove(vma, &mapping->i_mmap);
+	vma_interval_tree_remove(vma, &mapping->i_mmap);
 	flush_dcache_mmap_unlock(mapping);
 }
 
@@ -617,10 +614,7 @@ static void __vma_link_file(struct vm_area_struct *vma)
 			mapping->i_mmap_writable++;
 
 		flush_dcache_mmap_lock(mapping);
-		if (unlikely(vma->vm_flags & VM_NONLINEAR))
-			vma_nonlinear_insert(vma, &mapping->i_mmap_nonlinear);
-		else
-			vma_interval_tree_insert(vma, &mapping->i_mmap);
+		vma_interval_tree_insert(vma, &mapping->i_mmap);
 		flush_dcache_mmap_unlock(mapping);
 	}
 }
-- 
2.0.0.rc0


* [PATCH 6/8] mm, rmap: kill mapping->i_mmap_nonlinear
  2014-05-06 14:37 [RFC, PATCH 0/8] remap_file_pages() decommission Kirill A. Shutemov
                   ` (4 preceding siblings ...)
  2014-05-06 14:37 ` [PATCH 5/8] mm, rmap: kill vma->shared.nonlinear Kirill A. Shutemov
@ 2014-05-06 14:37 ` Kirill A. Shutemov
  2014-05-06 14:37 ` [PATCH 7/8] mm: kill VM_NONLINEAR and FAULT_FLAG_NONLINEAR Kirill A. Shutemov
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Kirill A. Shutemov @ 2014-05-06 14:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, peterz, mingo, Kirill A. Shutemov

Nobody creates nonlinear VMAs. No need to support them.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 Documentation/cachetlb.txt | 4 ++--
 fs/inode.c                 | 1 -
 include/linux/fs.h         | 4 +---
 mm/swap.c                  | 1 -
 4 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/Documentation/cachetlb.txt b/Documentation/cachetlb.txt
index d79b008e4a32..72405093f707 100644
--- a/Documentation/cachetlb.txt
+++ b/Documentation/cachetlb.txt
@@ -317,8 +317,8 @@ maps this page at its virtual address.
 	about doing this.
 
 	The idea is, first at flush_dcache_page() time, if
-	page->mapping->i_mmap is an empty tree and ->i_mmap_nonlinear
-	an empty list, just mark the architecture private page flag bit.
+	page->mapping->i_mmap is an empty tree just mark the architecture
+	private page flag bit.
 	Later, in update_mmu_cache(), a check is made of this flag bit,
 	and if set the flush is done and the flag bit is cleared.
 
diff --git a/fs/inode.c b/fs/inode.c
index f96d2a6f88cc..0cb8652b3719 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -352,7 +352,6 @@ void address_space_init_once(struct address_space *mapping)
 	INIT_LIST_HEAD(&mapping->private_list);
 	spin_lock_init(&mapping->private_lock);
 	mapping->i_mmap = RB_ROOT;
-	INIT_LIST_HEAD(&mapping->i_mmap_nonlinear);
 }
 EXPORT_SYMBOL(address_space_init_once);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 14abfc355726..f95bd31ff424 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -386,7 +386,6 @@ struct address_space {
 	spinlock_t		tree_lock;	/* and lock protecting it */
 	unsigned int		i_mmap_writable;/* count VM_SHARED mappings */
 	struct rb_root		i_mmap;		/* tree of private and shared mappings */
-	struct list_head	i_mmap_nonlinear;/*list VM_NONLINEAR mappings */
 	struct mutex		i_mmap_mutex;	/* protect tree, count, list */
 	/* Protected by tree_lock together with the radix tree */
 	unsigned long		nrpages;	/* number of total pages */
@@ -458,8 +457,7 @@ int mapping_tagged(struct address_space *mapping, int tag);
  */
 static inline int mapping_mapped(struct address_space *mapping)
 {
-	return	!RB_EMPTY_ROOT(&mapping->i_mmap) ||
-		!list_empty(&mapping->i_mmap_nonlinear);
+	return	!RB_EMPTY_ROOT(&mapping->i_mmap);
 }
 
 /*
diff --git a/mm/swap.c b/mm/swap.c
index 9ce43ba4498b..6adef8e3ccf7 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -1046,7 +1046,6 @@ void __init swap_setup(void)
 		panic("Failed to init swap bdi");
 	for (i = 0; i < MAX_SWAPFILES; i++) {
 		spin_lock_init(&swapper_spaces[i].tree_lock);
-		INIT_LIST_HEAD(&swapper_spaces[i].i_mmap_nonlinear);
 	}
 #endif
 
-- 
2.0.0.rc0


* [PATCH 7/8] mm: kill VM_NONLINEAR and FAULT_FLAG_NONLINEAR
  2014-05-06 14:37 [RFC, PATCH 0/8] remap_file_pages() decommission Kirill A. Shutemov
                   ` (5 preceding siblings ...)
  2014-05-06 14:37 ` [PATCH 6/8] mm, rmap: kill mapping->i_mmap_nonlinear Kirill A. Shutemov
@ 2014-05-06 14:37 ` Kirill A. Shutemov
  2014-05-06 14:37 ` [PATCH 8/8] mm, x86: kill pte_to_pgoff(), pgoff_to_pte() and pte_file*() Kirill A. Shutemov
  2014-05-06 21:35 ` [RFC, PATCH 0/8] remap_file_pages() decommission Andrew Morton
  8 siblings, 0 replies; 19+ messages in thread
From: Kirill A. Shutemov @ 2014-05-06 14:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, peterz, mingo, Kirill A. Shutemov

Nobody creates nonlinear VMAs. No need to support them.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 drivers/gpu/drm/drm_vma_manager.c |  3 +--
 fs/proc/task_mmu.c                |  5 -----
 include/linux/mm.h                |  2 --
 mm/ksm.c                          |  2 +-
 mm/madvise.c                      |  2 +-
 mm/memory.c                       | 40 +++++++--------------------------------
 mm/mmap.c                         | 11 ++++-------
 mm/rmap.c                         |  5 ++---
 8 files changed, 16 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/drm/drm_vma_manager.c b/drivers/gpu/drm/drm_vma_manager.c
index 63b471205072..68c1f32fb086 100644
--- a/drivers/gpu/drm/drm_vma_manager.c
+++ b/drivers/gpu/drm/drm_vma_manager.c
@@ -50,8 +50,7 @@
  *
  * You must not use multiple offset managers on a single address_space.
  * Otherwise, mm-core will be unable to tear down memory mappings as the VM will
- * no longer be linear. Please use VM_NONLINEAR in that case and implement your
- * own offset managers.
+ * no longer be linear.
  *
  * This offset manager works on page-based addresses. That is, every argument
  * and return code (with the exception of drm_vma_node_offset_addr()) is given
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 442177b1119a..1a2d7d3bea28 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -552,7 +552,6 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 		[ilog2(VM_ACCOUNT)]	= "ac",
 		[ilog2(VM_NORESERVE)]	= "nr",
 		[ilog2(VM_HUGETLB)]	= "ht",
-		[ilog2(VM_NONLINEAR)]	= "nl",
 		[ilog2(VM_ARCH_1)]	= "ar",
 		[ilog2(VM_DONTDUMP)]	= "dd",
 #ifdef CONFIG_MEM_SOFT_DIRTY
@@ -626,10 +625,6 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
 		   (vma->vm_flags & VM_LOCKED) ?
 			(unsigned long)(mss.pss >> (10 + PSS_SHIFT)) : 0);
 
-	if (vma->vm_flags & VM_NONLINEAR)
-		seq_printf(m, "Nonlinear:      %8lu kB\n",
-				mss.nonlinear >> 10);
-
 	show_smap_vma_flags(m, vma);
 
 	if (m->count < m->size)  /* vma is copied successfully */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d8dc4cd58704..2c9f3288a14a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -125,7 +125,6 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_ACCOUNT	0x00100000	/* Is a VM accounted object */
 #define VM_NORESERVE	0x00200000	/* should the VM suppress accounting */
 #define VM_HUGETLB	0x00400000	/* Huge TLB Page VM */
-#define VM_NONLINEAR	0x00800000	/* Is non-linear (remap_file_pages) */
 #define VM_ARCH_1	0x01000000	/* Architecture-specific flag */
 #define VM_DONTDUMP	0x04000000	/* Do not include in the core dump */
 
@@ -187,7 +186,6 @@ extern unsigned int kobjsize(const void *objp);
 extern pgprot_t protection_map[16];
 
 #define FAULT_FLAG_WRITE	0x01	/* Fault was a write access */
-#define FAULT_FLAG_NONLINEAR	0x02	/* Fault was via a nonlinear mapping */
 #define FAULT_FLAG_MKWRITE	0x04	/* Fault was mkwrite of existing pte */
 #define FAULT_FLAG_ALLOW_RETRY	0x08	/* Retry fault if blocking */
 #define FAULT_FLAG_RETRY_NOWAIT	0x10	/* Don't drop mmap_sem and wait when retrying */
diff --git a/mm/ksm.c b/mm/ksm.c
index 68710e80994a..48ddff33810b 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1749,7 +1749,7 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
 		 */
 		if (*vm_flags & (VM_MERGEABLE | VM_SHARED  | VM_MAYSHARE   |
 				 VM_PFNMAP    | VM_IO      | VM_DONTEXPAND |
-				 VM_HUGETLB | VM_NONLINEAR | VM_MIXEDMAP))
+				 VM_HUGETLB | VM_MIXEDMAP))
 			return 0;		/* just ignore the advice */
 
 #ifdef VM_SAO
diff --git a/mm/madvise.c b/mm/madvise.c
index 1932a1f0feda..cfb458c78e09 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -299,7 +299,7 @@ static long madvise_remove(struct vm_area_struct *vma,
 
 	*prev = NULL;	/* tell sys_madvise we drop mmap_sem */
 
-	if (vma->vm_flags & (VM_LOCKED|VM_NONLINEAR|VM_HUGETLB))
+	if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB))
 		return -EINVAL;
 
 	f = vma->vm_file;
diff --git a/mm/memory.c b/mm/memory.c
index cc741a7ce71e..a4f4ed739a60 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1024,8 +1024,7 @@ int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	 * readonly mappings. The tradeoff is that copy_page_range is more
 	 * efficient than faulting.
 	 */
-	if (!(vma->vm_flags & (VM_HUGETLB | VM_NONLINEAR |
-			       VM_PFNMAP | VM_MIXEDMAP))) {
+	if (!(vma->vm_flags & (VM_HUGETLB | VM_PFNMAP | VM_MIXEDMAP))) {
 		if (!vma->anon_vma)
 			return 0;
 	}
@@ -1142,8 +1141,7 @@ again:
 		if (unlikely(details))
 			continue;
 		if (pte_file(ptent)) {
-			if (unlikely(!(vma->vm_flags & VM_NONLINEAR)))
-				print_bad_pte(vma, addr, ptent, NULL);
+			print_bad_pte(vma, addr, ptent, NULL);
 		} else {
 			swp_entry_t entry = pte_to_swp_entry(ptent);
 
@@ -3623,42 +3621,18 @@ static int do_linear_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	return do_shared_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
 }
 
-/*
- * Fault of a previously existing named mapping. Repopulate the pte
- * from the encoded file_pte if possible. This enables swappable
- * nonlinear vmas.
- *
- * We enter with non-exclusive mmap_sem (to exclude vma changes,
- * but allow concurrent faults), and pte mapped but not yet locked.
- * We return with mmap_sem still held, but pte unmapped and unlocked.
- */
 static int do_nonlinear_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		unsigned long address, pte_t *page_table, pmd_t *pmd,
 		unsigned int flags, pte_t orig_pte)
 {
-	pgoff_t pgoff;
-
-	flags |= FAULT_FLAG_NONLINEAR;
-
 	if (!pte_unmap_same(mm, pmd, page_table, orig_pte))
 		return 0;
 
-	if (unlikely(!(vma->vm_flags & VM_NONLINEAR))) {
-		/*
-		 * Page table corrupted: show pte and kill process.
-		 */
-		print_bad_pte(vma, address, orig_pte, NULL);
-		return VM_FAULT_SIGBUS;
-	}
-
-	pgoff = pte_to_pgoff(orig_pte);
-	if (!(flags & FAULT_FLAG_WRITE))
-		return do_read_fault(mm, vma, address, pmd, pgoff, flags,
-				orig_pte);
-	if (!(vma->vm_flags & VM_SHARED))
-		return do_cow_fault(mm, vma, address, pmd, pgoff, flags,
-				orig_pte);
-	return do_shared_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
+	/*
+	 * Page table corrupted: show pte and kill process.
+	 */
+	print_bad_pte(vma, address, orig_pte, NULL);
+	return VM_FAULT_SIGBUS;
 }
 
 static int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
diff --git a/mm/mmap.c b/mm/mmap.c
index 8be242f07439..dcac7eaa76b8 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -747,14 +747,11 @@ again:			remove_next = 1 + (end > next->vm_end);
 
 	if (file) {
 		mapping = file->f_mapping;
-		if (!(vma->vm_flags & VM_NONLINEAR)) {
-			root = &mapping->i_mmap;
-			uprobe_munmap(vma, vma->vm_start, vma->vm_end);
+		root = &mapping->i_mmap;
+		uprobe_munmap(vma, vma->vm_start, vma->vm_end);
 
-			if (adjust_next)
-				uprobe_munmap(next, next->vm_start,
-							next->vm_end);
-		}
+		if (adjust_next)
+			uprobe_munmap(next, next->vm_start, next->vm_end);
 
 		mutex_lock(&mapping->i_mmap_mutex);
 		if (insert) {
diff --git a/mm/rmap.c b/mm/rmap.c
index e031d4ad0a4b..c9d964d0a7c4 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -550,9 +550,8 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
 		if (!vma->anon_vma || !page__anon_vma ||
 		    vma->anon_vma->root != page__anon_vma->root)
 			return -EFAULT;
-	} else if (page->mapping && !(vma->vm_flags & VM_NONLINEAR)) {
-		if (!vma->vm_file ||
-		    vma->vm_file->f_mapping != page->mapping)
+	} else if (page->mapping) {
+		if (!vma->vm_file || vma->vm_file->f_mapping != page->mapping)
 			return -EFAULT;
 	} else
 		return -EFAULT;
-- 
2.0.0.rc0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 8/8] mm, x86: kill pte_to_pgoff(), pgoff_to_pte() and pte_file*()
  2014-05-06 14:37 [RFC, PATCH 0/8] remap_file_pages() decommission Kirill A. Shutemov
                   ` (6 preceding siblings ...)
  2014-05-06 14:37 ` [PATCH 7/8] mm: kill VM_NONLINEAR and FAULT_FLAG_NONLINEAR Kirill A. Shutemov
@ 2014-05-06 14:37 ` Kirill A. Shutemov
  2014-05-06 21:35 ` [RFC, PATCH 0/8] remap_file_pages() decommission Andrew Morton
  8 siblings, 0 replies; 19+ messages in thread
From: Kirill A. Shutemov @ 2014-05-06 14:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, peterz, mingo, Kirill A. Shutemov

Kill helpers which are no longer needed. x86 only, for now.

It also frees the _PAGE_FILE flag in the pte.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/pgtable-2level.h |  39 -----------
 arch/x86/include/asm/pgtable-3level.h |   4 --
 arch/x86/include/asm/pgtable.h        |  20 ------
 arch/x86/include/asm/pgtable_64.h     |   4 --
 arch/x86/include/asm/pgtable_types.h  |   3 +-
 fs/proc/task_mmu.c                    |   5 --
 include/linux/swapops.h               |   4 +-
 mm/madvise.c                          |   2 +-
 mm/memcontrol.c                       |   7 +-
 mm/memory.c                           | 126 ++++++++++++++--------------------
 mm/mincore.c                          |   5 +-
 mm/mprotect.c                         |   2 +-
 mm/mremap.c                           |   2 -
 mm/rmap.c                             |   1 -
 14 files changed, 57 insertions(+), 167 deletions(-)

diff --git a/arch/x86/include/asm/pgtable-2level.h b/arch/x86/include/asm/pgtable-2level.h
index 0d193e234647..6731f8c09016 100644
--- a/arch/x86/include/asm/pgtable-2level.h
+++ b/arch/x86/include/asm/pgtable-2level.h
@@ -86,27 +86,6 @@ static inline unsigned long pte_bitop(unsigned long value, unsigned int rightshi
 #define PTE_FILE_LSHIFT3	(PTE_FILE_BITS1 + PTE_FILE_BITS2)
 #define PTE_FILE_LSHIFT4	(PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3)
 
-static __always_inline pgoff_t pte_to_pgoff(pte_t pte)
-{
-	return (pgoff_t)
-		(pte_bitop(pte.pte_low, PTE_FILE_SHIFT1, PTE_FILE_MASK1,  0)		    +
-		 pte_bitop(pte.pte_low, PTE_FILE_SHIFT2, PTE_FILE_MASK2,  PTE_FILE_LSHIFT2) +
-		 pte_bitop(pte.pte_low, PTE_FILE_SHIFT3, PTE_FILE_MASK3,  PTE_FILE_LSHIFT3) +
-		 pte_bitop(pte.pte_low, PTE_FILE_SHIFT4,           -1UL,  PTE_FILE_LSHIFT4));
-}
-
-static __always_inline pte_t pgoff_to_pte(pgoff_t off)
-{
-	return (pte_t){
-		.pte_low =
-			pte_bitop(off,                0, PTE_FILE_MASK1,  PTE_FILE_SHIFT1) +
-			pte_bitop(off, PTE_FILE_LSHIFT2, PTE_FILE_MASK2,  PTE_FILE_SHIFT2) +
-			pte_bitop(off, PTE_FILE_LSHIFT3, PTE_FILE_MASK3,  PTE_FILE_SHIFT3) +
-			pte_bitop(off, PTE_FILE_LSHIFT4,           -1UL,  PTE_FILE_SHIFT4) +
-			_PAGE_FILE,
-	};
-}
-
 #else /* CONFIG_MEM_SOFT_DIRTY */
 
 /*
@@ -131,24 +110,6 @@ static __always_inline pte_t pgoff_to_pte(pgoff_t off)
 #define PTE_FILE_LSHIFT2	(PTE_FILE_BITS1)
 #define PTE_FILE_LSHIFT3	(PTE_FILE_BITS1 + PTE_FILE_BITS2)
 
-static __always_inline pgoff_t pte_to_pgoff(pte_t pte)
-{
-	return (pgoff_t)
-		(pte_bitop(pte.pte_low, PTE_FILE_SHIFT1, PTE_FILE_MASK1,  0)		    +
-		 pte_bitop(pte.pte_low, PTE_FILE_SHIFT2, PTE_FILE_MASK2,  PTE_FILE_LSHIFT2) +
-		 pte_bitop(pte.pte_low, PTE_FILE_SHIFT3,           -1UL,  PTE_FILE_LSHIFT3));
-}
-
-static __always_inline pte_t pgoff_to_pte(pgoff_t off)
-{
-	return (pte_t){
-		.pte_low =
-			pte_bitop(off,                0, PTE_FILE_MASK1,  PTE_FILE_SHIFT1) +
-			pte_bitop(off, PTE_FILE_LSHIFT2, PTE_FILE_MASK2,  PTE_FILE_SHIFT2) +
-			pte_bitop(off, PTE_FILE_LSHIFT3,           -1UL,  PTE_FILE_SHIFT3) +
-			_PAGE_FILE,
-	};
-}
 
 #endif /* CONFIG_MEM_SOFT_DIRTY */
 
diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
index 81bb91b49a88..9315864c8d45 100644
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -183,10 +183,6 @@ static inline pmd_t native_pmdp_get_and_clear(pmd_t *pmdp)
  * For soft-dirty tracking 11 bit is taken from
  * the low part of pte as well.
  */
-#define pte_to_pgoff(pte) ((pte).pte_high)
-#define pgoff_to_pte(off)						\
-	((pte_t) { { .pte_low = _PAGE_FILE, .pte_high = (off) } })
-#define PTE_FILE_MAX_BITS       32
 
 /* Encode and de-code a swap entry */
 #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > 5)
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index b459ddf27d64..c2c7633e3e3e 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -109,11 +109,6 @@ static inline int pte_write(pte_t pte)
 	return pte_flags(pte) & _PAGE_RW;
 }
 
-static inline int pte_file(pte_t pte)
-{
-	return pte_flags(pte) & _PAGE_FILE;
-}
-
 static inline int pte_huge(pte_t pte)
 {
 	return pte_flags(pte) & _PAGE_PSE;
@@ -316,21 +311,6 @@ static inline pmd_t pmd_mksoft_dirty(pmd_t pmd)
 	return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY);
 }
 
-static inline pte_t pte_file_clear_soft_dirty(pte_t pte)
-{
-	return pte_clear_flags(pte, _PAGE_SOFT_DIRTY);
-}
-
-static inline pte_t pte_file_mksoft_dirty(pte_t pte)
-{
-	return pte_set_flags(pte, _PAGE_SOFT_DIRTY);
-}
-
-static inline int pte_file_soft_dirty(pte_t pte)
-{
-	return pte_flags(pte) & _PAGE_SOFT_DIRTY;
-}
-
 /*
  * Mask out unsupported bits in a present pgprot.  Non-present pgprots
  * can use those bits for other purposes, so leave them be.
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index e22c1dbf7feb..7f83a69825e1 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -131,10 +131,6 @@ static inline int pgd_large(pgd_t pgd) { return 0; }
 /* PUD - Level3 access */
 
 /* PMD  - Level 2 access */
-#define pte_to_pgoff(pte) ((pte_val((pte)) & PHYSICAL_PAGE_MASK) >> PAGE_SHIFT)
-#define pgoff_to_pte(off) ((pte_t) { .pte = ((off) << PAGE_SHIFT) |	\
-					    _PAGE_FILE })
-#define PTE_FILE_MAX_BITS __PHYSICAL_MASK_SHIFT
 
 /* PTE - Level 1 access. */
 
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index eb3d44945133..0ffe96ef7c87 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -91,14 +91,13 @@
 #define _PAGE_NX	(_AT(pteval_t, 0))
 #endif
 
-#define _PAGE_FILE	(_AT(pteval_t, 1) << _PAGE_BIT_FILE)
 #define _PAGE_PROTNONE	(_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
 
 /*
  * _PAGE_NUMA indicates that this page will trigger a numa hinting
  * minor page fault to gather numa placement statistics (see
  * pte_numa()). The bit picked (8) is within the range between
- * _PAGE_FILE (6) and _PAGE_PROTNONE (8) bits. Therefore, it doesn't
+ * bit 6 and bit 8 (_PAGE_PROTNONE). Therefore, it doesn't
  * require changes to the swp entry format because that bit is always
  * zero when the pte is not present.
  *
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 1a2d7d3bea28..031bf1defffe 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -457,9 +457,6 @@ static void smaps_pte_entry(pte_t ptent, unsigned long addr,
 			mss->swap += ptent_size;
 		else if (is_migration_entry(swpent))
 			page = migration_entry_to_page(swpent);
-	} else if (pte_file(ptent)) {
-		if (pte_to_pgoff(ptent) != pgoff)
-			mss->nonlinear += ptent_size;
 	}
 
 	if (!page)
@@ -728,8 +725,6 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma,
 		ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
 	} else if (is_swap_pte(ptent)) {
 		ptent = pte_swp_clear_soft_dirty(ptent);
-	} else if (pte_file(ptent)) {
-		ptent = pte_file_clear_soft_dirty(ptent);
 	}
 
 	if (vma->vm_flags & VM_SOFTDIRTY)
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index c0f75261a728..cefd26aa1aae 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -54,7 +54,7 @@ static inline pgoff_t swp_offset(swp_entry_t entry)
 /* check whether a pte points to a swap entry */
 static inline int is_swap_pte(pte_t pte)
 {
-	return !pte_none(pte) && !pte_present(pte) && !pte_file(pte);
+	return !pte_none(pte) && !pte_present(pte);
 }
 #endif
 
@@ -66,7 +66,6 @@ static inline swp_entry_t pte_to_swp_entry(pte_t pte)
 {
 	swp_entry_t arch_entry;
 
-	BUG_ON(pte_file(pte));
 	if (pte_swp_soft_dirty(pte))
 		pte = pte_swp_clear_soft_dirty(pte);
 	arch_entry = __pte_to_swp_entry(pte);
@@ -82,7 +81,6 @@ static inline pte_t swp_entry_to_pte(swp_entry_t entry)
 	swp_entry_t arch_entry;
 
 	arch_entry = __swp_entry(swp_type(entry), swp_offset(entry));
-	BUG_ON(pte_file(__swp_entry_to_pte(arch_entry)));
 	return __swp_entry_to_pte(arch_entry);
 }
 
diff --git a/mm/madvise.c b/mm/madvise.c
index cfb458c78e09..58cc98df29b1 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -155,7 +155,7 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
 		pte = *(orig_pte + ((index - start) / PAGE_SIZE));
 		pte_unmap_unlock(orig_pte, ptl);
 
-		if (pte_present(pte) || pte_none(pte) || pte_file(pte))
+		if (pte_present(pte) || pte_none(pte))
 			continue;
 		entry = pte_to_swp_entry(pte);
 		if (unlikely(non_swap_entry(entry)))
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 29501f040568..f8979fa16a5e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6680,10 +6680,7 @@ static struct page *mc_handle_file_pte(struct vm_area_struct *vma,
 		return NULL;
 
 	mapping = vma->vm_file->f_mapping;
-	if (pte_none(ptent))
-		pgoff = linear_page_index(vma, addr);
-	else /* pte_file(ptent) is true */
-		pgoff = pte_to_pgoff(ptent);
+	pgoff = linear_page_index(vma, addr);
 
 	/* page is moved even if it's not RSS of this task(page-faulted). */
 	page = find_get_page(mapping, pgoff);
@@ -6712,7 +6709,7 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
 		page = mc_handle_present_pte(vma, addr, ptent);
 	else if (is_swap_pte(ptent))
 		page = mc_handle_swap_pte(vma, addr, ptent, &ent);
-	else if (pte_none(ptent) || pte_file(ptent))
+	else if (pte_none(ptent))
 		page = mc_handle_file_pte(vma, addr, ptent, &ent);
 
 	if (!page && !ent.val)
diff --git a/mm/memory.c b/mm/memory.c
index a4f4ed739a60..4e1b6f626f23 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -815,42 +815,40 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 
 	/* pte contains position in swap or file, so copy. */
 	if (unlikely(!pte_present(pte))) {
-		if (!pte_file(pte)) {
-			swp_entry_t entry = pte_to_swp_entry(pte);
-
-			if (swap_duplicate(entry) < 0)
-				return entry.val;
-
-			/* make sure dst_mm is on swapoff's mmlist. */
-			if (unlikely(list_empty(&dst_mm->mmlist))) {
-				spin_lock(&mmlist_lock);
-				if (list_empty(&dst_mm->mmlist))
-					list_add(&dst_mm->mmlist,
-						 &src_mm->mmlist);
-				spin_unlock(&mmlist_lock);
-			}
-			if (likely(!non_swap_entry(entry)))
-				rss[MM_SWAPENTS]++;
-			else if (is_migration_entry(entry)) {
-				page = migration_entry_to_page(entry);
-
-				if (PageAnon(page))
-					rss[MM_ANONPAGES]++;
-				else
-					rss[MM_FILEPAGES]++;
-
-				if (is_write_migration_entry(entry) &&
-				    is_cow_mapping(vm_flags)) {
-					/*
-					 * COW mappings require pages in both
-					 * parent and child to be set to read.
-					 */
-					make_migration_entry_read(&entry);
-					pte = swp_entry_to_pte(entry);
-					if (pte_swp_soft_dirty(*src_pte))
-						pte = pte_swp_mksoft_dirty(pte);
-					set_pte_at(src_mm, addr, src_pte, pte);
-				}
+		swp_entry_t entry = pte_to_swp_entry(pte);
+
+		if (swap_duplicate(entry) < 0)
+			return entry.val;
+
+		/* make sure dst_mm is on swapoff's mmlist. */
+		if (unlikely(list_empty(&dst_mm->mmlist))) {
+			spin_lock(&mmlist_lock);
+			if (list_empty(&dst_mm->mmlist))
+				list_add(&dst_mm->mmlist,
+						&src_mm->mmlist);
+			spin_unlock(&mmlist_lock);
+		}
+		if (likely(!non_swap_entry(entry)))
+			rss[MM_SWAPENTS]++;
+		else if (is_migration_entry(entry)) {
+			page = migration_entry_to_page(entry);
+
+			if (PageAnon(page))
+				rss[MM_ANONPAGES]++;
+			else
+				rss[MM_FILEPAGES]++;
+
+			if (is_write_migration_entry(entry) &&
+					is_cow_mapping(vm_flags)) {
+				/*
+				 * COW mappings require pages in both
+				 * parent and child to be set to read.
+				 */
+				make_migration_entry_read(&entry);
+				pte = swp_entry_to_pte(entry);
+				if (pte_swp_soft_dirty(*src_pte))
+					pte = pte_swp_mksoft_dirty(pte);
+				set_pte_at(src_mm, addr, src_pte, pte);
 			}
 		}
 		goto out_set_pte;
@@ -1085,6 +1083,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 	spinlock_t *ptl;
 	pte_t *start_pte;
 	pte_t *pte;
+	swp_entry_t entry;
 
 again:
 	init_rss_vec(rss);
@@ -1140,26 +1139,21 @@ again:
 		/* If details->check_mapping, we leave swap entries */
 		if (unlikely(details))
 			continue;
-		if (pte_file(ptent)) {
-			print_bad_pte(vma, addr, ptent, NULL);
-		} else {
-			swp_entry_t entry = pte_to_swp_entry(ptent);
-
-			if (!non_swap_entry(entry))
-				rss[MM_SWAPENTS]--;
-			else if (is_migration_entry(entry)) {
-				struct page *page;
+		entry = pte_to_swp_entry(ptent);
+		if (!non_swap_entry(entry))
+			rss[MM_SWAPENTS]--;
+		else if (is_migration_entry(entry)) {
+			struct page *page;
 
-				page = migration_entry_to_page(entry);
+			page = migration_entry_to_page(entry);
 
-				if (PageAnon(page))
-					rss[MM_ANONPAGES]--;
-				else
-					rss[MM_FILEPAGES]--;
-			}
-			if (unlikely(!free_swap_and_cache(entry)))
-				print_bad_pte(vma, addr, ptent, NULL);
+			if (PageAnon(page))
+				rss[MM_ANONPAGES]--;
+			else
+				rss[MM_FILEPAGES]--;
 		}
+		if (unlikely(!free_swap_and_cache(entry)))
+			print_bad_pte(vma, addr, ptent, NULL);
 		pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 
@@ -1545,7 +1539,7 @@ split_fallthrough:
 		 */
 		if (likely(!(flags & FOLL_MIGRATION)))
 			goto no_page;
-		if (pte_none(pte) || pte_file(pte))
+		if (pte_none(pte))
 			goto no_page;
 		entry = pte_to_swp_entry(pte);
 		if (!is_migration_entry(entry))
@@ -2561,7 +2555,7 @@ EXPORT_SYMBOL_GPL(apply_to_page_range);
  * handle_pte_fault chooses page fault handler according to an entry
  * which was read non-atomically.  Before making any commitment, on
  * those architectures or configurations (e.g. i386 with PAE) which
- * might give a mix of unmatched parts, do_swap_page and do_nonlinear_fault
+ * might give a mix of unmatched parts, do_swap_page
  * must check under lock before unmapping the pte and proceeding
  * (but do_wp_page is only called after already making such a check;
  * and do_anonymous_page can safely check later on).
@@ -3346,8 +3340,6 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
 	entry = mk_pte(page, vma->vm_page_prot);
 	if (write)
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
-	else if (pte_file(*pte) && pte_file_soft_dirty(*pte))
-		pte_mksoft_dirty(entry);
 	if (anon) {
 		inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
 		page_add_new_anon_rmap(page, vma, address);
@@ -3621,20 +3613,6 @@ static int do_linear_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	return do_shared_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
 }
 
-static int do_nonlinear_fault(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pte_t *page_table, pmd_t *pmd,
-		unsigned int flags, pte_t orig_pte)
-{
-	if (!pte_unmap_same(mm, pmd, page_table, orig_pte))
-		return 0;
-
-	/*
-	 * Page table corrupted: show pte and kill process.
-	 */
-	print_bad_pte(vma, address, orig_pte, NULL);
-	return VM_FAULT_SIGBUS;
-}
-
 static int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
 				unsigned long addr, int page_nid,
 				int *flags)
@@ -3756,11 +3734,7 @@ static int handle_pte_fault(struct mm_struct *mm,
 			return do_anonymous_page(mm, vma, address,
 						 pte, pmd, flags);
 		}
-		if (pte_file(entry))
-			return do_nonlinear_fault(mm, vma, address,
-					pte, pmd, flags, entry);
-		return do_swap_page(mm, vma, address,
-					pte, pmd, flags, entry);
+		return do_swap_page(mm, vma, address, pte, pmd, flags, entry);
 	}
 
 	if (pte_numa(entry))
diff --git a/mm/mincore.c b/mm/mincore.c
index 725c80961048..1e7aa8aa54f1 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -131,10 +131,7 @@ static void mincore_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			mincore_unmapped_range(vma, addr, next, vec);
 		else if (pte_present(pte))
 			*vec = 1;
-		else if (pte_file(pte)) {
-			pgoff = pte_to_pgoff(pte);
-			*vec = mincore_page(vma->vm_file->f_mapping, pgoff);
-		} else { /* pte is a swap entry */
+		else { /* pte is a swap entry */
 			swp_entry_t entry = pte_to_swp_entry(pte);
 
 			if (is_migration_entry(entry)) {
diff --git a/mm/mprotect.c b/mm/mprotect.c
index c43d557941f8..c61918c60b81 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -110,7 +110,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			}
 			if (updated)
 				pages++;
-		} else if (IS_ENABLED(CONFIG_MIGRATION) && !pte_file(oldpte)) {
+		} else if (IS_ENABLED(CONFIG_MIGRATION)) {
 			swp_entry_t entry = pte_to_swp_entry(oldpte);
 
 			if (is_write_migration_entry(entry)) {
diff --git a/mm/mremap.c b/mm/mremap.c
index 05f1180e9f21..6d49f62a4863 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -81,8 +81,6 @@ static pte_t move_soft_dirty_pte(pte_t pte)
 		pte = pte_mksoft_dirty(pte);
 	else if (is_swap_pte(pte))
 		pte = pte_swp_mksoft_dirty(pte);
-	else if (pte_file(pte))
-		pte = pte_file_mksoft_dirty(pte);
 #endif
 	return pte;
 }
diff --git a/mm/rmap.c b/mm/rmap.c
index c9d964d0a7c4..9ea40f12f6e9 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1209,7 +1209,6 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		if (pte_soft_dirty(pteval))
 			swp_pte = pte_swp_mksoft_dirty(swp_pte);
 		set_pte_at(mm, address, pte, swp_pte);
-		BUG_ON(pte_file(*pte));
 	} else if (IS_ENABLED(CONFIG_MIGRATION) &&
 		   (TTU_ACTION(flags) == TTU_MIGRATION)) {
 		/* Establish migration entry for a file page */
-- 
2.0.0.rc0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC, PATCH 0/8] remap_file_pages() decommission
  2014-05-06 14:37 [RFC, PATCH 0/8] remap_file_pages() decommission Kirill A. Shutemov
                   ` (7 preceding siblings ...)
  2014-05-06 14:37 ` [PATCH 8/8] mm, x86: kill pte_to_pgoff(), pgoff_to_pte() and pte_file*() Kirill A. Shutemov
@ 2014-05-06 21:35 ` Andrew Morton
  2014-05-06 21:51   ` Linus Torvalds
  8 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2014-05-06 21:35 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: linux-mm, peterz, mingo, Linus Torvalds

On Tue,  6 May 2014 17:37:24 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> This patchset replaces the syscall with emulation which creates new VMA on
> each remap and remove code to support non-linear mappings.
> 
> Nonlinear mappings are pain to support and it seems there's no legitimate
> use-cases nowadays since 64-bit systems are widely available.
> 
> It's not yet ready to apply. Just to give rough idea of what can we get if
> we'll deprecated remap_file_pages().
> 
> I need to split patches properly and write correct commit messages. And there's
> still code to remove.

hah.  That's bold.  It would be great if we can get away with this.

Do we have any feeling for who will be impacted by this and how badly?

I wonder if we can give people a bit more warning - put a printk() in
there immediately, backport it into -stable, wait N months then make
the change.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC, PATCH 0/8] remap_file_pages() decommission
  2014-05-06 21:35 ` [RFC, PATCH 0/8] remap_file_pages() decommission Andrew Morton
@ 2014-05-06 21:51   ` Linus Torvalds
  2014-05-06 23:03     ` Kirill A. Shutemov
  0 siblings, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2014-05-06 21:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Kirill A. Shutemov, linux-mm, Peter Zijlstra, Ingo Molnar

On Tue, May 6, 2014 at 2:35 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Tue,  6 May 2014 17:37:24 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
>
>> This patchset replaces the syscall with emulation which creates new VMA on
>> each remap and remove code to support non-linear mappings.
>>
>> Nonlinear mappings are pain to support and it seems there's no legitimate
>> use-cases nowadays since 64-bit systems are widely available.
>>
>> It's not yet ready to apply. Just to give rough idea of what can we get if
>> we'll deprecated remap_file_pages().
>>
>> I need to split patches properly and write correct commit messages. And there's
>> still code to remove.
>
> hah.  That's bold.  It would be great if we can get away with this.
>
> Do we have any feeling for who will be impacted by this and how badly?

I *would* love to get rid of the nonlinear mappings, but I really have
zero visibility into who ended up using it. I assume it's an "Oracle on
32-bit x86" kind of thing.

I think this is more of a distro question. Plus perhaps an early patch
to just add a warning first so that we can see who it triggers for?

           Linus


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC, PATCH 0/8] remap_file_pages() decommission
  2014-05-06 21:51   ` Linus Torvalds
@ 2014-05-06 23:03     ` Kirill A. Shutemov
  2014-05-06 23:28       ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Kirill A. Shutemov @ 2014-05-06 23:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Kirill A. Shutemov, linux-mm, Peter Zijlstra, Ingo Molnar

On Tue, May 06, 2014 at 02:51:24PM -0700, Linus Torvalds wrote:
> On Tue, May 6, 2014 at 2:35 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> > On Tue,  6 May 2014 17:37:24 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> >
> >> This patchset replaces the syscall with emulation which creates new VMA on
> >> each remap and remove code to support non-linear mappings.
> >>
> >> Nonlinear mappings are pain to support and it seems there's no legitimate
> >> use-cases nowadays since 64-bit systems are widely available.
> >>
> >> It's not yet ready to apply. Just to give rough idea of what can we get if
> >> we'll deprecated remap_file_pages().
> >>
> >> I need to split patches properly and write correct commit messages. And there's
> >> still code to remove.
> >
> > hah.  That's bold.  It would be great if we can get away with this.
> >
> > Do we have any feeling for who will be impacted by this and how badly?
> 
> I *would* love to get rid of the nonlinear mappings, but I really have
> zero visibility into who ended up using it. I assume it's a "Oracle on
> 32-bit x86" kind of thing.

There are funny PyPy people who want to use remap_file_pages() in new code to
build software transactional memory[1]. It sounds just crazy to me.

[1] https://lwn.net/Articles/587923/

> I think this is more of a distro question. Plus perhaps an early patch
> to just add a warning first so that we can see who it triggers for?

Something like this?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC, PATCH 0/8] remap_file_pages() decommission
  2014-05-06 23:03     ` Kirill A. Shutemov
@ 2014-05-06 23:28       ` Andrew Morton
  2014-05-07  9:12         ` Peter Zijlstra
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2014-05-06 23:28 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Linus Torvalds, Kirill A. Shutemov, linux-mm, Peter Zijlstra,
	Ingo Molnar

On Wed, 7 May 2014 02:03:23 +0300 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> remap_file_pages(2) was invented to be able efficiently map parts of
> huge file into limited 32-bit virtual address space such as in database
> workloads.
> 
> Nonlinear mappings are pain to support and it seems there's no
> legitimate use-cases nowadays since 64-bit systems are widely available.
> 
> Let's deprecate remap_file_pages() syscall in hope to get rid of code
> one day.

Before we do this we should ensure that your proposed replacement is viable
and desirable.  If we later decide not to proceed with it, this patch will
sow confusion.

> --- a/mm/fremap.c
> +++ b/mm/fremap.c
> @@ -152,6 +152,9 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
>  	int has_write_lock = 0;
>  	vm_flags_t vm_flags = 0;
>  
> +	printk_once(KERN_WARNING "%s (%d) uses depricated "

pr_warn_once(), "deprecated".

> +			"remap_file_pages(2) syscall.\n",
> +			current->comm, current->pid);
>  	if (prot)
>  		return err;

Can we provide more info than this?  Why is it deprecated, what do we
plan to do with it, what are people's options, etc?  Add "See
Documentation/remap_file_pages.txt", perhaps.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC, PATCH 0/8] remap_file_pages() decommission
  2014-05-06 23:28       ` Andrew Morton
@ 2014-05-07  9:12         ` Peter Zijlstra
  2014-05-07 16:46           ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2014-05-07  9:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Linus Torvalds, Kirill A. Shutemov, linux-mm,
	Ingo Molnar


On Tue, May 06, 2014 at 04:28:56PM -0700, Andrew Morton wrote:
> On Wed, 7 May 2014 02:03:23 +0300 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> 
> > remap_file_pages(2) was invented to be able efficiently map parts of
> > huge file into limited 32-bit virtual address space such as in database
> > workloads.
> > 
> > Nonlinear mappings are pain to support and it seems there's no
> > legitimate use-cases nowadays since 64-bit systems are widely available.
> > 
> > Let's deprecate remap_file_pages() syscall in hope to get rid of code
> > one day.
> 
> Before we do this we should ensure that your proposed replacement is viable
> and desirable.  If we later decide not to proceed with it, this patch will
> sow confusion.

Chicken meet Egg ?

How are we supposed to test if it's viable if we have no known users? The
printk() might maybe (hopefully) get us some reaction in, say, a year's
time, much longer if we're really unlucky.

That said, we could make the syscall return -ENOSYS unless a sysctl was
touched. The printk() would indeed have to mention said sysctl and a
place to find information about why we're doing this.

But by creating more pain (people have to actually set the sysctl, and
we'll have to universally agree to inflict pain on distro people that
set it by default -- say, starve them from beer at the next conf.) we're
more likely to get an answer sooner.
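
A minimal sketch of that sysctl gate, assuming a hypothetical
vm.legacy_remap_file_pages knob backed by a sysctl_legacy_remap_file_pages
variable (the names, the warning text and the placement are illustrative and
not from the posted series; the existing syscall body is elided):

/* hypothetical knob; default off so legacy callers see -ENOSYS and complain */
int sysctl_legacy_remap_file_pages __read_mostly;

/* sketch of the entry that would be added to vm_table[] in kernel/sysctl.c */
static struct ctl_table remap_file_pages_sysctl[] = {
	{
		.procname	= "legacy_remap_file_pages",
		.data		= &sysctl_legacy_remap_file_pages,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec,
	},
	{ }
};

SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
		unsigned long, prot, unsigned long, pgoff, unsigned long, flags)
{
	if (!sysctl_legacy_remap_file_pages) {
		pr_warn_once("%s (%d) uses deprecated remap_file_pages(2); "
			     "set vm.legacy_remap_file_pages=1 to re-enable, "
			     "see Documentation/remap_file_pages.txt\n",
			     current->comm, current->pid);
		return -ENOSYS;
	}
	/* ... existing remap_file_pages() implementation ... */
}

With the default at 0, anyone still depending on the syscall shows up in dmesg
and has an obvious, temporary way back, which should surface real users much
faster than a warning alone.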




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC, PATCH 0/8] remap_file_pages() decommission
  2014-05-07  9:12         ` Peter Zijlstra
@ 2014-05-07 16:46           ` Andrew Morton
  0 siblings, 0 replies; 19+ messages in thread
From: Andrew Morton @ 2014-05-07 16:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill A. Shutemov, Linus Torvalds, Kirill A. Shutemov, linux-mm,
	Ingo Molnar

On Wed, 7 May 2014 11:12:58 +0200 Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, May 06, 2014 at 04:28:56PM -0700, Andrew Morton wrote:
> > On Wed, 7 May 2014 02:03:23 +0300 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> > 
> > > remap_file_pages(2) was invented to be able efficiently map parts of
> > > huge file into limited 32-bit virtual address space such as in database
> > > workloads.
> > > 
> > > Nonlinear mappings are pain to support and it seems there's no
> > > legitimate use-cases nowadays since 64-bit systems are widely available.
> > > 
> > > Let's deprecate remap_file_pages() syscall in hope to get rid of code
> > > one day.
> > 
> > Before we do this we should ensure that your proposed replacement is viable
> > and desirable.  If we later decide not to proceed with it, this patch will
> > sow confusion.
> 
> Chicken meet Egg ?
> 
> How are we supposed to test if it's viable if we have no known users?

Same way we always do - finish the code, developer test, review, give
it a spin in linux-next, etc.  Do some microbenchmarking to get an
understanding of the impact on people who are using r_f_p for real. 
The current patchset looks rather alphaish.

> The
> printk() might maybe (hopefully) get us some reaction in say a years
> time, much longer if we're really unlucky.
> 
> That said, we could make the syscall return -ENOSYS unless a sysctl was
> touched. The printk() would indeed have to mention said sysctl and a
> place to find information about why we're doing this..
> 
> But by creating more pain (people have to actually set the sysctl, and
> we'll have to universally agree to inflict pain on distro people that
> set it by default -- say, starve them from beer at the next conf.) we're
> more likely to get an answer sooner.

Could be.  We should consult distro people, Oracle people...


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/8] mm: kill vm_operations_struct->remap_pages
  2014-05-06 14:37 ` [PATCH 2/8] mm: kill vm_operations_struct->remap_pages Kirill A. Shutemov
@ 2014-05-19 15:03   ` Christoph Hellwig
  2014-05-19 15:14     ` Kirill A. Shutemov
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2014-05-19 15:03 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: Andrew Morton, linux-mm, peterz, mingo

I think this should be split into two patches and go first in the
series:

 1) remove all instances but shmem and generic_file_vm_ops given that
    remap_file_pages already doesn't work on anything that has a backing
    store and all these are dead
 2) kill the method and make the syscall call generic_file_remap_pages
    directly as this is a core VM feature.

These two should go first because they make sense even if we can't
actually go with the emulation yet.
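
A rough sketch of step 2, assuming generic_file_remap_pages() keeps its
current (vma, addr, size, pgoff) shape; only the call-site change is shown and
the surrounding checks and locking in mm/fremap.c are elided, so this is not
taken from the posted series:

SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
		unsigned long, prot, unsigned long, pgoff, unsigned long, flags)
{
	/* ... argument validation, mmap_sem, VMA lookup as before ... */

	/*
	 * With the ->remap_pages method gone, nonlinear remapping is a
	 * core-VM operation: call the generic helper directly instead of
	 * going through vma->vm_ops->remap_pages(vma, start, size, pgoff).
	 */
	err = generic_file_remap_pages(vma, start, size, pgoff);

	/* ... populate/unlock as before ... */
	return err;
}

That split keeps the method removal independent of patch 1: it stands on its
own even if the emulation is not merged.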


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/8] mm: kill vm_operations_struct->remap_pages
  2014-05-19 15:03   ` Christoph Hellwig
@ 2014-05-19 15:14     ` Kirill A. Shutemov
  0 siblings, 0 replies; 19+ messages in thread
From: Kirill A. Shutemov @ 2014-05-19 15:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kirill A. Shutemov, Andrew Morton, linux-mm, peterz, mingo

Christoph Hellwig wrote:
> I think this should be split into two patches and go first in the
> series:
> 
>  1) remove all instances but shmem and generic_file_vm_ops given that
>     remap_file_pages already doesn't work on anything that has a backing
>     store and all these are dead
>  2) kill the method and make the syscall call generic_file_remap_pages
>     directly as this is a core VM feature.
> 
> These two should go first because they make sense even if we can't
> actually go with the emulation yet.

Makes sense. I'll prepare patches.

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/8] mm: replace remap_file_pages() syscall with emulation
  2014-05-06 14:37 ` [PATCH 1/8] mm: replace remap_file_pages() syscall with emulation Kirill A. Shutemov
@ 2014-10-08  6:50   ` Vineet Gupta
  2014-10-08 10:03     ` Kirill A. Shutemov
  0 siblings, 1 reply; 19+ messages in thread
From: Vineet Gupta @ 2014-10-08  6:50 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton; +Cc: linux-mm, peterz, mingo

Hi Kirill,

Due to broken PAGE_FILE on arc, I was giving this emulation patch a try, and it
seems it needs a minor fix. I know this is not slated for merge soon, but you
can add the fix nevertheless, along with my Tested-by:

Problem showed up with Ingo Korb's remap-demo.c test case from [1]

[1] https://lkml.org/lkml/2014/7/14/335

On Tuesday 06 May 2014 08:07 PM, Kirill A. Shutemov wrote:
> remap_file_pages(2) was invented to be able efficiently map parts of
> huge file into limited 32-bit virtual address space such as in database
> workloads.
> 
> Nonlinear mappings are pain to support and it seems there's no
> legitimate use-cases nowadays since 64-bit systems are widely available.
> 
> Let's drop it and get rid of all these special-cased code.
> 
> The patch replaces the syscall with emulation which creates new VMA on
> each remap.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---

....
> -}
> diff --git a/mm/mmap.c b/mm/mmap.c
> index b1202cf81f4b..4106fc833f56 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2579,6 +2579,74 @@ SYSCALL_DEFINE2(munmap, unsigned long, addr, size_t, len)
>  	return vm_munmap(addr, len);
>  }
>  
> +
> +/*
> + * Emulation of deprecated remap_file_pages() syscall.
> + */
> +SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
> +		unsigned long, prot, unsigned long, pgoff, unsigned long, flags)
> +{
> +
> +	struct mm_struct *mm = current->mm;
> +	struct vm_area_struct *vma;
> +	unsigned long populate;
> +	int ret = -EINVAL;
> +
> +	printk_once(KERN_WARNING "%s (%d) calls remap_file_pages(2) which is "
> +			"deprecated and no longer supported by kernel in "
> +			"an efficient way.\n"
> +			"Note that emulated remap_file_pages(2) can "
> +			"potentially create a lot of mappings. "
> +			"Consider increasing vm.max_map_count.\n",
> +			current->comm, current->pid);
> +
> +	if (prot)
> +		return ret;
> +	start = start & PAGE_MASK;
> +	size = size & PAGE_MASK;
> +
> +	if (start + size <= start)
> +		return ret;
> +
> +	/* Does pgoff wrap? */
> +	if (pgoff + (size >> PAGE_SHIFT) < pgoff)
> +		return ret;
> +
> +	down_write(&mm->mmap_sem);
> +	vma = find_vma(mm, start);
> +
> +	if (!vma || !(vma->vm_flags & VM_SHARED))
> +		goto out;
> +
> +	if (start < vma->vm_start || start + size > vma->vm_end)
> +		goto out;
> +
> +	if (pgoff == linear_page_index(vma, start)) {
> +		ret = 0;
> +		goto out;
> +	}
> +
> +	prot |= vma->vm_flags & VM_READ ? PROT_READ : 0;
> +	prot |= vma->vm_flags & VM_WRITE ? PROT_WRITE : 0;
> +	prot |= vma->vm_flags & VM_EXEC ? PROT_EXEC : 0;
> +
> +	flags &= MAP_POPULATE;
> +	flags |= MAP_SHARED | MAP_FIXED;
> +	if (vma->vm_flags & VM_LOCKED) {
> +		flags |= MAP_LOCKED;
> +		/* drop PG_Mlocked flag for over-mapped range */
> +		munlock_vma_pages_range(vma, start, start + size);
> +	}
> +
> +	ret = do_mmap_pgoff(vma->vm_file, start, size,
> +			prot, flags, pgoff, &populate);
> +	if (populate)
> +		mm_populate(ret, populate);
> +out:
> +	up_write(&mm->mmap_sem);

On success needs to return 0, not mapped addr.

	if (!IS_ERR_VALUE(ret))
		ret = 0;

> +	return ret;
> +}
> +

Thx,
-Vineet
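
For completeness, a minimal userspace check of the return-value semantics (a
hypothetical stand-in, not Ingo Korb's remap-demo.c): remap_file_pages(2) is
documented to return 0 on success, so the assert below trips on the unfixed
emulation, which returned the mapped address instead.

#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	char path[] = "/tmp/rfp-XXXXXX";
	int fd = mkstemp(path);
	char *win;
	int ret;

	if (fd < 0 || ftruncate(fd, 4 * pagesz) != 0) {
		perror("tmpfile");
		return 1;
	}

	win = mmap(NULL, 2 * pagesz, PROT_READ | PROT_WRITE,
		   MAP_SHARED, fd, 0);
	if (win == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* map file page 2 over the first page of the window */
	ret = remap_file_pages(win, pagesz, 0, 2, 0);
	printf("remap_file_pages() returned %d\n", ret);
	assert(ret == 0);	/* trips on the unfixed emulation */

	munmap(win, 2 * pagesz);
	close(fd);
	unlink(path);
	return 0;
}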


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/8] mm: replace remap_file_pages() syscall with emulation
  2014-10-08  6:50   ` Vineet Gupta
@ 2014-10-08 10:03     ` Kirill A. Shutemov
  0 siblings, 0 replies; 19+ messages in thread
From: Kirill A. Shutemov @ 2014-10-08 10:03 UTC (permalink / raw)
  To: Vineet Gupta; +Cc: Kirill A. Shutemov, Andrew Morton, linux-mm, peterz, mingo

On Wed, Oct 08, 2014 at 12:20:37PM +0530, Vineet Gupta wrote:
> Hi Kirill,
> 
> Due to broken PAGE_FILE on arc, I was giving this emulation patch a try and it
> seems we need a minor fix to this patch. I know this is not slated for merge soon,
> but u can add the fix nevertheless and my Tested-by:
> 
> Problem showed up with Ingo Korb's remap-demo.c test case from [1]
> 
> [1] https://lkml.org/lkml/2014/7/14/335
> 
> > +
> > +	ret = do_mmap_pgoff(vma->vm_file, start, size,
> > +			prot, flags, pgoff, &populate);
> > +	if (populate)
> > +		mm_populate(ret, populate);
> > +out:
> > +	up_write(&mm->mmap_sem);
> 
> On success needs to return 0, not mapped addr.
> 
> 	if (!IS_ERR_VALUE(ret))
> 		ret = 0;

This bug (and few more) has been fixed long ago in -mm tree.

Thanks for testing, anyway.

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2014-10-08 10:05 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-06 14:37 [RFC, PATCH 0/8] remap_file_pages() decommission Kirill A. Shutemov
2014-05-06 14:37 ` [PATCH 1/8] mm: replace remap_file_pages() syscall with emulation Kirill A. Shutemov
2014-10-08  6:50   ` Vineet Gupta
2014-10-08 10:03     ` Kirill A. Shutemov
2014-05-06 14:37 ` [PATCH 2/8] mm: kill vm_operations_struct->remap_pages Kirill A. Shutemov
2014-05-19 15:03   ` Christoph Hellwig
2014-05-19 15:14     ` Kirill A. Shutemov
2014-05-06 14:37 ` [PATCH 3/8] mm: kill zap_details->nonlinear_vma Kirill A. Shutemov
2014-05-06 14:37 ` [PATCH 4/8] mm, rmap: kill rmap_walk_control->file_nonlinear() Kirill A. Shutemov
2014-05-06 14:37 ` [PATCH 5/8] mm, rmap: kill vma->shared.nonlinear Kirill A. Shutemov
2014-05-06 14:37 ` [PATCH 6/8] mm, rmap: kill mapping->i_mmap_nonlinear Kirill A. Shutemov
2014-05-06 14:37 ` [PATCH 7/8] mm: kill VM_NONLINEAR and FAULT_FLAG_NONLINEAR Kirill A. Shutemov
2014-05-06 14:37 ` [PATCH 8/8] mm, x86: kill pte_to_pgoff(), pgoff_to_pte() and pte_file*() Kirill A. Shutemov
2014-05-06 21:35 ` [RFC, PATCH 0/8] remap_file_pages() decommission Andrew Morton
2014-05-06 21:51   ` Linus Torvalds
2014-05-06 23:03     ` Kirill A. Shutemov
2014-05-06 23:28       ` Andrew Morton
2014-05-07  9:12         ` Peter Zijlstra
2014-05-07 16:46           ` Andrew Morton
