All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] Volatile Ranges (v8?)
@ 2013-06-12  4:22 ` John Stultz
  0 siblings, 0 replies; 48+ messages in thread
From: John Stultz @ 2013-06-12  4:22 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	Minchan Kim, linux-mm

Hey everyone.

I know its been quite awhile. But Minchan and I have been doing a
fair amount of discussing offlist since lsf-mm, trying to come
to agreement on the semantics for the volatile ranges interface,
and after circling around each other's arguments for awhile (he'd
suggest and idea, I'd disagree, then I'd come around to agree just as
he would begin to disagree :) I think things have started to converge
pretty nicely, at least as far as the interface goes.

Some of the more interesting and challenging ideas we've explored
recently have been given up for now, mostly so we can get some core
agreed functionality moving upstream. We may still want to revisit
those ideas before the final push, but for now, we're focusing on
the parts we agree on that we think have a chance at eventually
being merged.

If you've read some of my earlier summaries, you'll likely find
this patchset much simplified:
* We only have one interface: vrange(address, len, mode, *purged),
  which is used in a method similar to madvise on both file or
  anonymous pages.
* We no longer have a concept of anon-only or private-volatility.
  Despite the potential performance gains that Minchan liked in
  avoiding the mmap_sem,the semantics were often confusing when using
  private volatility on non-anonymous pages.
* We no longer have behavior flags. Potential extensions can still be
  done via introducing new mode flags.

The patch set has also been heavily reworked and reordered to make
more iterative sense and hopefully to be easier to review.

Patches 1-5 are what we're wanting the most feedback on, since this
is the area dealing with the userland interface and the semantics of
how volatile ranges behave.

Patches 6-8 provide the back-end purging logic, which is likely
to change, and is provided only so folks can start playing around
with a functional patch series. It currently has some limitations,
like it doesn't purge anonymous pages on swap free systems.
Additionally, the newly integrated file page purging logic likely has
issues still to be resolved.

Overall, We still have the following TODOS with the patchset:
* Come to consensus on the best way to avoid inheriting mm_struct
  volatility when the underlying vmas change. (see patch 4 in this
  series)
* Ensure we zap underlying file page (ala  truncate_inode_pages_range)
  when we purge file pages - this make purging similar to file hole
  punching and ensures we don't find stale data later. (patch 7)
* Avoid lockdep warnings caused by allocations made while holding vroot
  lock triggering reclaim which could try to purge volatile ranges,
  grabbing the same vroot lock.  Minchan added a GFP_NO_VRANGE flag,
  but we've not hooked that up into the reclaim logic to avoid purging.
* Re-integrate Minchan's logic to purge anonymous pages on swapfree
  systems (dropped for this release to keep things simpler for review)


Any feedback and review would be greatly appreciated!

thanks!
-john


Volatile Ranges
============== 
Volatile ranges provide a way for userland applications to provide
hints to the kernel, about memory that is not immediately in use and
can be regenerated if needed.

After marking a range as volatile, if the kernel experiences memory
pressure, the kernel can then purge those pages, freeing up additional
space.  Userland can also tell the kernel it wants to use that memory
again, by marking the range non-volatile, after which the kernel will
not purge that memory.

If the kernel has already purged the memory when userland requests
it be made non-volatile, the kernel will return a warning value to
notify userland that the data was lost and must be regenerated.

If userland accesses memory marked volatile that has not been purged,
it will get the values it expects.

However, if userland touches volatile memory that has been purged, the
kernel will send it a SIGBUS.  This makes it possible for userland to
handle the SIGBUS, marking the memory as non-volatile and re-generating
it as needed before continuing.

In some ways, the kernel's purging of memory can be considered
as similar to a delayed MADV_DONTNEED or FALLOC_FL_PUNCH_HOLE
operation, which can be canceled. Thus similarly to MADV_DONTNEED
or FALLOC_FL_PUNCH_HOLE, operations done on file data that is mmaped
shared will be seen by other processes who have that file mapped. Thus
if an application marks shared  mmaped file data as volatile, that
volatility state is also shared across other tasks. This allows tasks
to coordinate for one task to mark  shared file data as volatile, and a
second task to be able to unmark it if necessary. If the kernel purges
volatile file data that was marked by one task, all tasks sharing
that data will see the data as purged, and will have to mark it as
non-volatile before accessing it or will have to handle the SIGBUS.

All volatility on files is cleared when the last fd handle is closed.


Interface:
The vrange syscall is defined as follows:

int vrange(unsigned long address, size_t length, int mode, int* purged)

address:	Starting address in the process where memory will be
	 	marked. This must be page aligned

length:		Length of the range to be marked. This must be in page
		size units.

mode:
 VRANGE_VOLATILE:	Marks the specified range as volatile, and
			able to be purged.
 VRANGE_NONVOLATILE:	Marks the specified range as non-volatile. If
			any data in that range was volatile and has
			been purged, 1 will be returned in the purged
			pointer.

purged:		Pointer to an integer that will be set to 1 if any data
		in the range being marked non-volatile has been purged
		and is lost. If it is zero, then no data in the
		specified range has been lost.

Return values:
		Returns the number of bytes marked or unmarked. Similar
		to write(), it  may return fewer bytes then specified
		if it ran into a problem.

		If an error (negative value) is returned,no changes
		were made.

Errors:
	EINVAL:
		* address is not page-aligned, or is invalid.
		* length is not a multiple of the page size.
		* length is negative.
	ENOMEM:
		* Not enough memory.
	EFAULT:
		* Purge pointer is invalid.




Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Android Kernel Team <kernel-team@android.com>
Cc: Robert Love <rlove@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mike Hommey <mh@glandium.org>
Cc: Taras Glek <tglek@mozilla.com>
Cc: Dhaval Giani <dgiani@mozilla.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: linux-mm@kvack.org <linux-mm@kvack.org>


John Stultz (2):
  vrange: Add vrange support for file address_spaces
  vrange: Clear volatility on new mmaps

Minchan Kim (6):
  vrange: Add basic data structure and functions
  vrange: Add vrange support to mm_structs
  vrange: Add new vrange(2) system call
  vrange: Add GFP_NO_VRANGE allocation flag
  vrange: Add method to purge volatile ranges
  vrange: Send SIGBUS when user try to access purged page

 arch/x86/include/asm/pgtable_types.h   |   2 +
 arch/x86/syscalls/syscall_64.tbl       |   1 +
 fs/file_table.c                        |   5 +
 fs/inode.c                             |   2 +
 include/asm-generic/pgtable.h          |  11 +
 include/linux/fs.h                     |   2 +
 include/linux/gfp.h                    |   7 +-
 include/linux/mm_types.h               |   5 +
 include/linux/rmap.h                   |  12 +-
 include/linux/swap.h                   |   1 +
 include/linux/vrange.h                 |  60 +++
 include/linux/vrange_types.h           |  19 +
 include/uapi/asm-generic/mman-common.h |   3 +
 init/main.c                            |   2 +
 kernel/fork.c                          |   6 +
 lib/Makefile                           |   2 +-
 mm/Makefile                            |   2 +-
 mm/ksm.c                               |   2 +-
 mm/memory.c                            |  23 +-
 mm/mmap.c                              |   5 +
 mm/rmap.c                              |  30 +-
 mm/swapfile.c                          |  36 ++
 mm/vmscan.c                            |  16 +-
 mm/vrange.c                            | 731 +++++++++++++++++++++++++++++++++
 24 files changed, 963 insertions(+), 22 deletions(-)
 create mode 100644 include/linux/vrange.h
 create mode 100644 include/linux/vrange_types.h
 create mode 100644 mm/vrange.c

-- 
1.8.1.2


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 0/8] Volatile Ranges (v8?)
@ 2013-06-12  4:22 ` John Stultz
  0 siblings, 0 replies; 48+ messages in thread
From: John Stultz @ 2013-06-12  4:22 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	Minchan Kim, linux-mm

Hey everyone.

I know its been quite awhile. But Minchan and I have been doing a
fair amount of discussing offlist since lsf-mm, trying to come
to agreement on the semantics for the volatile ranges interface,
and after circling around each other's arguments for awhile (he'd
suggest and idea, I'd disagree, then I'd come around to agree just as
he would begin to disagree :) I think things have started to converge
pretty nicely, at least as far as the interface goes.

Some of the more interesting and challenging ideas we've explored
recently have been given up for now, mostly so we can get some core
agreed functionality moving upstream. We may still want to revisit
those ideas before the final push, but for now, we're focusing on
the parts we agree on that we think have a chance at eventually
being merged.

If you've read some of my earlier summaries, you'll likely find
this patchset much simplified:
* We only have one interface: vrange(address, len, mode, *purged),
  which is used in a method similar to madvise on both file or
  anonymous pages.
* We no longer have a concept of anon-only or private-volatility.
  Despite the potential performance gains that Minchan liked in
  avoiding the mmap_sem,the semantics were often confusing when using
  private volatility on non-anonymous pages.
* We no longer have behavior flags. Potential extensions can still be
  done via introducing new mode flags.

The patch set has also been heavily reworked and reordered to make
more iterative sense and hopefully to be easier to review.

Patches 1-5 are what we're wanting the most feedback on, since this
is the area dealing with the userland interface and the semantics of
how volatile ranges behave.

Patches 6-8 provide the back-end purging logic, which is likely
to change, and is provided only so folks can start playing around
with a functional patch series. It currently has some limitations,
like it doesn't purge anonymous pages on swap free systems.
Additionally, the newly integrated file page purging logic likely has
issues still to be resolved.

Overall, We still have the following TODOS with the patchset:
* Come to consensus on the best way to avoid inheriting mm_struct
  volatility when the underlying vmas change. (see patch 4 in this
  series)
* Ensure we zap underlying file page (ala  truncate_inode_pages_range)
  when we purge file pages - this make purging similar to file hole
  punching and ensures we don't find stale data later. (patch 7)
* Avoid lockdep warnings caused by allocations made while holding vroot
  lock triggering reclaim which could try to purge volatile ranges,
  grabbing the same vroot lock.  Minchan added a GFP_NO_VRANGE flag,
  but we've not hooked that up into the reclaim logic to avoid purging.
* Re-integrate Minchan's logic to purge anonymous pages on swapfree
  systems (dropped for this release to keep things simpler for review)


Any feedback and review would be greatly appreciated!

thanks!
-john


Volatile Ranges
============== 
Volatile ranges provide a way for userland applications to provide
hints to the kernel, about memory that is not immediately in use and
can be regenerated if needed.

After marking a range as volatile, if the kernel experiences memory
pressure, the kernel can then purge those pages, freeing up additional
space.  Userland can also tell the kernel it wants to use that memory
again, by marking the range non-volatile, after which the kernel will
not purge that memory.

If the kernel has already purged the memory when userland requests
it be made non-volatile, the kernel will return a warning value to
notify userland that the data was lost and must be regenerated.

If userland accesses memory marked volatile that has not been purged,
it will get the values it expects.

However, if userland touches volatile memory that has been purged, the
kernel will send it a SIGBUS.  This makes it possible for userland to
handle the SIGBUS, marking the memory as non-volatile and re-generating
it as needed before continuing.

In some ways, the kernel's purging of memory can be considered
as similar to a delayed MADV_DONTNEED or FALLOC_FL_PUNCH_HOLE
operation, which can be canceled. Thus similarly to MADV_DONTNEED
or FALLOC_FL_PUNCH_HOLE, operations done on file data that is mmaped
shared will be seen by other processes who have that file mapped. Thus
if an application marks shared  mmaped file data as volatile, that
volatility state is also shared across other tasks. This allows tasks
to coordinate for one task to mark  shared file data as volatile, and a
second task to be able to unmark it if necessary. If the kernel purges
volatile file data that was marked by one task, all tasks sharing
that data will see the data as purged, and will have to mark it as
non-volatile before accessing it or will have to handle the SIGBUS.

All volatility on files is cleared when the last fd handle is closed.


Interface:
The vrange syscall is defined as follows:

int vrange(unsigned long address, size_t length, int mode, int* purged)

address:	Starting address in the process where memory will be
	 	marked. This must be page aligned

length:		Length of the range to be marked. This must be in page
		size units.

mode:
 VRANGE_VOLATILE:	Marks the specified range as volatile, and
			able to be purged.
 VRANGE_NONVOLATILE:	Marks the specified range as non-volatile. If
			any data in that range was volatile and has
			been purged, 1 will be returned in the purged
			pointer.

purged:		Pointer to an integer that will be set to 1 if any data
		in the range being marked non-volatile has been purged
		and is lost. If it is zero, then no data in the
		specified range has been lost.

Return values:
		Returns the number of bytes marked or unmarked. Similar
		to write(), it  may return fewer bytes then specified
		if it ran into a problem.

		If an error (negative value) is returned,no changes
		were made.

Errors:
	EINVAL:
		* address is not page-aligned, or is invalid.
		* length is not a multiple of the page size.
		* length is negative.
	ENOMEM:
		* Not enough memory.
	EFAULT:
		* Purge pointer is invalid.




Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Android Kernel Team <kernel-team@android.com>
Cc: Robert Love <rlove@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mike Hommey <mh@glandium.org>
Cc: Taras Glek <tglek@mozilla.com>
Cc: Dhaval Giani <dgiani@mozilla.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: linux-mm@kvack.org <linux-mm@kvack.org>


John Stultz (2):
  vrange: Add vrange support for file address_spaces
  vrange: Clear volatility on new mmaps

Minchan Kim (6):
  vrange: Add basic data structure and functions
  vrange: Add vrange support to mm_structs
  vrange: Add new vrange(2) system call
  vrange: Add GFP_NO_VRANGE allocation flag
  vrange: Add method to purge volatile ranges
  vrange: Send SIGBUS when user try to access purged page

 arch/x86/include/asm/pgtable_types.h   |   2 +
 arch/x86/syscalls/syscall_64.tbl       |   1 +
 fs/file_table.c                        |   5 +
 fs/inode.c                             |   2 +
 include/asm-generic/pgtable.h          |  11 +
 include/linux/fs.h                     |   2 +
 include/linux/gfp.h                    |   7 +-
 include/linux/mm_types.h               |   5 +
 include/linux/rmap.h                   |  12 +-
 include/linux/swap.h                   |   1 +
 include/linux/vrange.h                 |  60 +++
 include/linux/vrange_types.h           |  19 +
 include/uapi/asm-generic/mman-common.h |   3 +
 init/main.c                            |   2 +
 kernel/fork.c                          |   6 +
 lib/Makefile                           |   2 +-
 mm/Makefile                            |   2 +-
 mm/ksm.c                               |   2 +-
 mm/memory.c                            |  23 +-
 mm/mmap.c                              |   5 +
 mm/rmap.c                              |  30 +-
 mm/swapfile.c                          |  36 ++
 mm/vmscan.c                            |  16 +-
 mm/vrange.c                            | 731 +++++++++++++++++++++++++++++++++
 24 files changed, 963 insertions(+), 22 deletions(-)
 create mode 100644 include/linux/vrange.h
 create mode 100644 include/linux/vrange_types.h
 create mode 100644 mm/vrange.c

-- 
1.8.1.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 1/8] vrange: Add basic data structure and functions
  2013-06-12  4:22 ` John Stultz
@ 2013-06-12  4:22   ` John Stultz
  -1 siblings, 0 replies; 48+ messages in thread
From: John Stultz @ 2013-06-12  4:22 UTC (permalink / raw)
  To: LKML
  Cc: Minchan Kim, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm, John Stultz

From: Minchan Kim <minchan@kernel.org>

This patch adds vrange data structure(interval tree) and
related functions.

The vrange uses generic interval tree as main data structure
because it handles address range so generic interval tree
fits well for the purpose.

The add_vrange/remove_vrange are core functions for system call
will be introduced next patch.

1. add_vrange inserts new address range into interval tree.
   If new address range crosses over existing volatile range,
   existing volatile range will be expanded to cover new range.
   Then, if existing volatile range has purged state, new range
   will have a purged state.
   It's not good and we need more fine-grained purged state handling
   in a vrange(TODO)

   If new address range is inside existing range, we ignore it

2. remove_vrange removes address range
   Then, return a purged state of the address ranges.

This patch copied some part from John Stultz's work but different semantic.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Android Kernel Team <kernel-team@android.com>
Cc: Robert Love <rlove@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mike Hommey <mh@glandium.org>
Cc: Taras Glek <tglek@mozilla.com>
Cc: Dhaval Giani <dgiani@mozilla.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: linux-mm@kvack.org <linux-mm@kvack.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
[jstultz: Heavy rework and cleanups to make this infrastructure more
easily reused for both file and anonymous pages]
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/vrange.h       |  44 +++++++++++
 include/linux/vrange_types.h |  19 +++++
 init/main.c                  |   2 +
 lib/Makefile                 |   2 +-
 mm/Makefile                  |   2 +-
 mm/vrange.c                  | 181 +++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 248 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/vrange.h
 create mode 100644 include/linux/vrange_types.h
 create mode 100644 mm/vrange.c

diff --git a/include/linux/vrange.h b/include/linux/vrange.h
new file mode 100644
index 0000000..2064cb0
--- /dev/null
+++ b/include/linux/vrange.h
@@ -0,0 +1,44 @@
+#ifndef _LINUX_VRANGE_H
+#define _LINUX_VRANGE_H
+
+#include <linux/vrange_types.h>
+#include <linux/mm.h>
+
+#define vrange_entry(ptr) \
+	container_of(ptr, struct vrange, node.rb)
+
+#ifdef CONFIG_MMU
+
+static inline void vrange_root_init(struct vrange_root *vroot, int type)
+{
+	vroot->type = type;
+	vroot->v_rb = RB_ROOT;
+	mutex_init(&vroot->v_lock);
+}
+
+static inline void vrange_lock(struct vrange_root *vroot)
+{
+	mutex_lock(&vroot->v_lock);
+}
+
+static inline void vrange_unlock(struct vrange_root *vroot)
+{
+	mutex_unlock(&vroot->v_lock);
+}
+
+static inline int vrange_type(struct vrange *vrange)
+{
+	return vrange->owner->type;
+}
+
+void vrange_init(void);
+extern void vrange_root_cleanup(struct vrange_root *vroot);
+
+#else
+
+static inline void vrange_init(void) {};
+static inline void vrange_root_init(struct vrange_root *vroot, int type) {};
+static inline void vrange_root_cleanup(struct vrange_root *vroot) {};
+
+#endif
+#endif /* _LINIUX_VRANGE_H */
diff --git a/include/linux/vrange_types.h b/include/linux/vrange_types.h
new file mode 100644
index 0000000..7f44c01
--- /dev/null
+++ b/include/linux/vrange_types.h
@@ -0,0 +1,19 @@
+#ifndef _LINUX_VRANGE_TYPES_H
+#define _LINUX_VRANGE_TYPES_H
+
+#include <linux/mutex.h>
+#include <linux/interval_tree.h>
+
+struct vrange_root {
+	struct rb_root v_rb;		/* vrange rb tree */
+	struct mutex v_lock;		/* Protect v_rb */
+	enum {VRANGE_MM, VRANGE_FILE} type; /* range root type */
+};
+
+struct vrange {
+	struct interval_tree_node node;
+	struct vrange_root *owner;
+	int purged;
+};
+#endif
+
diff --git a/init/main.c b/init/main.c
index 9484f4b..9cf08ba 100644
--- a/init/main.c
+++ b/init/main.c
@@ -74,6 +74,7 @@
 #include <linux/ptrace.h>
 #include <linux/blkdev.h>
 #include <linux/elevator.h>
+#include <linux/vrange.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -601,6 +602,7 @@ asmlinkage void __init start_kernel(void)
 	calibrate_delay();
 	pidmap_init();
 	anon_vma_init();
+	vrange_init();
 #ifdef CONFIG_X86
 	if (efi_enabled(EFI_RUNTIME_SERVICES))
 		efi_enter_virtual_mode();
diff --git a/lib/Makefile b/lib/Makefile
index c55a037..ccd15ff 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -13,7 +13,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 sha1.o md5.o irq_regs.o reciprocal_div.o argv_split.o \
 	 proportions.o flex_proportions.o prio_heap.o ratelimit.o show_mem.o \
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
-	 earlycpio.o
+	 earlycpio.o interval_tree.o
 
 obj-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o
 lib-$(CONFIG_MMU) += ioremap.o
diff --git a/mm/Makefile b/mm/Makefile
index 72c5acb..b67fcf5 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -5,7 +5,7 @@
 mmu-y			:= nommu.o
 mmu-$(CONFIG_MMU)	:= fremap.o highmem.o madvise.o memory.o mincore.o \
 			   mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \
-			   vmalloc.o pagewalk.o pgtable-generic.o
+			   vmalloc.o pagewalk.o pgtable-generic.o vrange.o
 
 ifdef CONFIG_CROSS_MEMORY_ATTACH
 mmu-$(CONFIG_MMU)	+= process_vm_access.o
diff --git a/mm/vrange.c b/mm/vrange.c
new file mode 100644
index 0000000..e3042e0
--- /dev/null
+++ b/mm/vrange.c
@@ -0,0 +1,181 @@
+/*
+ * mm/vrange.c
+ */
+
+#include <linux/vrange.h>
+#include <linux/slab.h>
+
+static struct kmem_cache *vrange_cachep;
+
+void __init vrange_init(void)
+{
+	vrange_cachep = KMEM_CACHE(vrange, SLAB_PANIC);
+}
+
+static struct vrange *__vrange_alloc(gfp_t flags)
+{
+	struct vrange *vrange = kmem_cache_alloc(vrange_cachep, flags);
+	if (!vrange)
+		return vrange;
+	vrange->owner = NULL;
+	return vrange;
+}
+
+static void __vrange_free(struct vrange *range)
+{
+	WARN_ON(range->owner);
+	kmem_cache_free(vrange_cachep, range);
+}
+
+static void __vrange_add(struct vrange *range, struct vrange_root *vroot)
+{
+	range->owner = vroot;
+	interval_tree_insert(&range->node, &vroot->v_rb);
+}
+
+static void __vrange_remove(struct vrange *range)
+{
+	interval_tree_remove(&range->node, &range->owner->v_rb);
+	range->owner = NULL;
+}
+
+static inline void __vrange_set(struct vrange *range,
+		unsigned long start_idx, unsigned long end_idx,
+		bool purged)
+{
+	range->node.start = start_idx;
+	range->node.last = end_idx;
+	range->purged = purged;
+}
+
+static inline void __vrange_resize(struct vrange *range,
+		unsigned long start_idx, unsigned long end_idx)
+{
+	struct vrange_root *vroot = range->owner;
+	bool purged = range->purged;
+
+	__vrange_remove(range);
+	__vrange_set(range, start_idx, end_idx, purged);
+	__vrange_add(range, vroot);
+}
+
+static int vrange_add(struct vrange_root *vroot,
+			unsigned long start_idx, unsigned long end_idx)
+{
+	struct vrange *new_range, *range;
+	struct interval_tree_node *node, *next;
+	int purged = 0;
+
+	new_range = __vrange_alloc(GFP_KERNEL);
+	if (!new_range)
+		return -ENOMEM;
+
+	vrange_lock(vroot);
+
+	node = interval_tree_iter_first(&vroot->v_rb, start_idx, end_idx);
+	while (node) {
+		next = interval_tree_iter_next(node, start_idx, end_idx);
+		range = container_of(node, struct vrange, node);
+		/* old range covers new range fully */
+		if (node->start <= start_idx && node->last >= end_idx) {
+			__vrange_free(new_range);
+			goto out;
+		}
+
+		start_idx = min_t(unsigned long, start_idx, node->start);
+		end_idx = max_t(unsigned long, end_idx, node->last);
+		purged |= range->purged;
+
+		__vrange_remove(range);
+		__vrange_free(range);
+
+		node = next;
+	}
+
+	__vrange_set(new_range, start_idx, end_idx, purged);
+	__vrange_add(new_range, vroot);
+out:
+	vrange_unlock(vroot);
+	return 0;
+}
+
+static int vrange_remove(struct vrange_root *vroot,
+				unsigned long start_idx, unsigned long end_idx,
+				int *purged)
+{
+	struct vrange *new_range, *range;
+	struct interval_tree_node *node, *next;
+	bool used_new = false;
+
+	if (!purged)
+		return -EINVAL;
+
+	*purged = 0;
+
+	new_range = __vrange_alloc(GFP_KERNEL);
+	if (!new_range)
+		return -ENOMEM;
+
+	vrange_lock(vroot);
+
+	node = interval_tree_iter_first(&vroot->v_rb, start_idx, end_idx);
+	while (node) {
+		next = interval_tree_iter_next(node, start_idx, end_idx);
+		range = container_of(node, struct vrange, node);
+
+		*purged |= range->purged;
+
+		if (start_idx <= node->start && end_idx >= node->last) {
+			/* argumented range covers the range fully */
+			__vrange_remove(range);
+			__vrange_free(range);
+		} else if (node->start >= start_idx) {
+			/*
+			 * Argumented range covers over the left of the
+			 * range
+			 */
+			__vrange_resize(range, end_idx + 1, node->last);
+		} else if (node->last <= end_idx) {
+			/*
+			 * Argumented range covers over the right of the
+			 * range
+			 */
+			__vrange_resize(range, node->start, start_idx - 1);
+		} else {
+			/*
+			 * Argumented range is middle of the range
+			 */
+			used_new = true;
+			__vrange_resize(range, node->start, start_idx - 1);
+			__vrange_set(new_range, end_idx + 1, node->last,
+					range->purged);
+			__vrange_add(new_range, vroot);
+			break;
+		}
+
+		node = next;
+	}
+	vrange_unlock(vroot);
+
+	if (!used_new)
+		__vrange_free(new_range);
+
+	return 0;
+}
+
+void vrange_root_cleanup(struct vrange_root *vroot)
+{
+	struct vrange *range;
+	struct rb_node *next;
+
+	vrange_lock(vroot);
+	next = rb_first(&vroot->v_rb);
+	while (next) {
+		range = vrange_entry(next);
+		next = rb_next(next);
+		__vrange_remove(range);
+		__vrange_free(range);
+	}
+	vrange_unlock(vroot);
+}
+
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 1/8] vrange: Add basic data structure and functions
@ 2013-06-12  4:22   ` John Stultz
  0 siblings, 0 replies; 48+ messages in thread
From: John Stultz @ 2013-06-12  4:22 UTC (permalink / raw)
  To: LKML
  Cc: Minchan Kim, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm, John Stultz

From: Minchan Kim <minchan@kernel.org>

This patch adds vrange data structure(interval tree) and
related functions.

The vrange uses generic interval tree as main data structure
because it handles address range so generic interval tree
fits well for the purpose.

The add_vrange/remove_vrange are core functions for system call
will be introduced next patch.

1. add_vrange inserts new address range into interval tree.
   If new address range crosses over existing volatile range,
   existing volatile range will be expanded to cover new range.
   Then, if existing volatile range has purged state, new range
   will have a purged state.
   It's not good and we need more fine-grained purged state handling
   in a vrange(TODO)

   If new address range is inside existing range, we ignore it

2. remove_vrange removes address range
   Then, return a purged state of the address ranges.

This patch copied some part from John Stultz's work but different semantic.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Android Kernel Team <kernel-team@android.com>
Cc: Robert Love <rlove@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mike Hommey <mh@glandium.org>
Cc: Taras Glek <tglek@mozilla.com>
Cc: Dhaval Giani <dgiani@mozilla.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: linux-mm@kvack.org <linux-mm@kvack.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
[jstultz: Heavy rework and cleanups to make this infrastructure more
easily reused for both file and anonymous pages]
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/vrange.h       |  44 +++++++++++
 include/linux/vrange_types.h |  19 +++++
 init/main.c                  |   2 +
 lib/Makefile                 |   2 +-
 mm/Makefile                  |   2 +-
 mm/vrange.c                  | 181 +++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 248 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/vrange.h
 create mode 100644 include/linux/vrange_types.h
 create mode 100644 mm/vrange.c

diff --git a/include/linux/vrange.h b/include/linux/vrange.h
new file mode 100644
index 0000000..2064cb0
--- /dev/null
+++ b/include/linux/vrange.h
@@ -0,0 +1,44 @@
+#ifndef _LINUX_VRANGE_H
+#define _LINUX_VRANGE_H
+
+#include <linux/vrange_types.h>
+#include <linux/mm.h>
+
+#define vrange_entry(ptr) \
+	container_of(ptr, struct vrange, node.rb)
+
+#ifdef CONFIG_MMU
+
+static inline void vrange_root_init(struct vrange_root *vroot, int type)
+{
+	vroot->type = type;
+	vroot->v_rb = RB_ROOT;
+	mutex_init(&vroot->v_lock);
+}
+
+static inline void vrange_lock(struct vrange_root *vroot)
+{
+	mutex_lock(&vroot->v_lock);
+}
+
+static inline void vrange_unlock(struct vrange_root *vroot)
+{
+	mutex_unlock(&vroot->v_lock);
+}
+
+static inline int vrange_type(struct vrange *vrange)
+{
+	return vrange->owner->type;
+}
+
+void vrange_init(void);
+extern void vrange_root_cleanup(struct vrange_root *vroot);
+
+#else
+
+static inline void vrange_init(void) {};
+static inline void vrange_root_init(struct vrange_root *vroot, int type) {};
+static inline void vrange_root_cleanup(struct vrange_root *vroot) {};
+
+#endif
+#endif /* _LINIUX_VRANGE_H */
diff --git a/include/linux/vrange_types.h b/include/linux/vrange_types.h
new file mode 100644
index 0000000..7f44c01
--- /dev/null
+++ b/include/linux/vrange_types.h
@@ -0,0 +1,19 @@
+#ifndef _LINUX_VRANGE_TYPES_H
+#define _LINUX_VRANGE_TYPES_H
+
+#include <linux/mutex.h>
+#include <linux/interval_tree.h>
+
+struct vrange_root {
+	struct rb_root v_rb;		/* vrange rb tree */
+	struct mutex v_lock;		/* Protect v_rb */
+	enum {VRANGE_MM, VRANGE_FILE} type; /* range root type */
+};
+
+struct vrange {
+	struct interval_tree_node node;
+	struct vrange_root *owner;
+	int purged;
+};
+#endif
+
diff --git a/init/main.c b/init/main.c
index 9484f4b..9cf08ba 100644
--- a/init/main.c
+++ b/init/main.c
@@ -74,6 +74,7 @@
 #include <linux/ptrace.h>
 #include <linux/blkdev.h>
 #include <linux/elevator.h>
+#include <linux/vrange.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -601,6 +602,7 @@ asmlinkage void __init start_kernel(void)
 	calibrate_delay();
 	pidmap_init();
 	anon_vma_init();
+	vrange_init();
 #ifdef CONFIG_X86
 	if (efi_enabled(EFI_RUNTIME_SERVICES))
 		efi_enter_virtual_mode();
diff --git a/lib/Makefile b/lib/Makefile
index c55a037..ccd15ff 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -13,7 +13,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 sha1.o md5.o irq_regs.o reciprocal_div.o argv_split.o \
 	 proportions.o flex_proportions.o prio_heap.o ratelimit.o show_mem.o \
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
-	 earlycpio.o
+	 earlycpio.o interval_tree.o
 
 obj-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o
 lib-$(CONFIG_MMU) += ioremap.o
diff --git a/mm/Makefile b/mm/Makefile
index 72c5acb..b67fcf5 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -5,7 +5,7 @@
 mmu-y			:= nommu.o
 mmu-$(CONFIG_MMU)	:= fremap.o highmem.o madvise.o memory.o mincore.o \
 			   mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \
-			   vmalloc.o pagewalk.o pgtable-generic.o
+			   vmalloc.o pagewalk.o pgtable-generic.o vrange.o
 
 ifdef CONFIG_CROSS_MEMORY_ATTACH
 mmu-$(CONFIG_MMU)	+= process_vm_access.o
diff --git a/mm/vrange.c b/mm/vrange.c
new file mode 100644
index 0000000..e3042e0
--- /dev/null
+++ b/mm/vrange.c
@@ -0,0 +1,181 @@
+/*
+ * mm/vrange.c
+ */
+
+#include <linux/vrange.h>
+#include <linux/slab.h>
+
+static struct kmem_cache *vrange_cachep;
+
+void __init vrange_init(void)
+{
+	vrange_cachep = KMEM_CACHE(vrange, SLAB_PANIC);
+}
+
+static struct vrange *__vrange_alloc(gfp_t flags)
+{
+	struct vrange *vrange = kmem_cache_alloc(vrange_cachep, flags);
+	if (!vrange)
+		return vrange;
+	vrange->owner = NULL;
+	return vrange;
+}
+
+static void __vrange_free(struct vrange *range)
+{
+	WARN_ON(range->owner);
+	kmem_cache_free(vrange_cachep, range);
+}
+
+static void __vrange_add(struct vrange *range, struct vrange_root *vroot)
+{
+	range->owner = vroot;
+	interval_tree_insert(&range->node, &vroot->v_rb);
+}
+
+static void __vrange_remove(struct vrange *range)
+{
+	interval_tree_remove(&range->node, &range->owner->v_rb);
+	range->owner = NULL;
+}
+
+static inline void __vrange_set(struct vrange *range,
+		unsigned long start_idx, unsigned long end_idx,
+		bool purged)
+{
+	range->node.start = start_idx;
+	range->node.last = end_idx;
+	range->purged = purged;
+}
+
+static inline void __vrange_resize(struct vrange *range,
+		unsigned long start_idx, unsigned long end_idx)
+{
+	struct vrange_root *vroot = range->owner;
+	bool purged = range->purged;
+
+	__vrange_remove(range);
+	__vrange_set(range, start_idx, end_idx, purged);
+	__vrange_add(range, vroot);
+}
+
+static int vrange_add(struct vrange_root *vroot,
+			unsigned long start_idx, unsigned long end_idx)
+{
+	struct vrange *new_range, *range;
+	struct interval_tree_node *node, *next;
+	int purged = 0;
+
+	new_range = __vrange_alloc(GFP_KERNEL);
+	if (!new_range)
+		return -ENOMEM;
+
+	vrange_lock(vroot);
+
+	node = interval_tree_iter_first(&vroot->v_rb, start_idx, end_idx);
+	while (node) {
+		next = interval_tree_iter_next(node, start_idx, end_idx);
+		range = container_of(node, struct vrange, node);
+		/* old range covers new range fully */
+		if (node->start <= start_idx && node->last >= end_idx) {
+			__vrange_free(new_range);
+			goto out;
+		}
+
+		start_idx = min_t(unsigned long, start_idx, node->start);
+		end_idx = max_t(unsigned long, end_idx, node->last);
+		purged |= range->purged;
+
+		__vrange_remove(range);
+		__vrange_free(range);
+
+		node = next;
+	}
+
+	__vrange_set(new_range, start_idx, end_idx, purged);
+	__vrange_add(new_range, vroot);
+out:
+	vrange_unlock(vroot);
+	return 0;
+}
+
+static int vrange_remove(struct vrange_root *vroot,
+				unsigned long start_idx, unsigned long end_idx,
+				int *purged)
+{
+	struct vrange *new_range, *range;
+	struct interval_tree_node *node, *next;
+	bool used_new = false;
+
+	if (!purged)
+		return -EINVAL;
+
+	*purged = 0;
+
+	new_range = __vrange_alloc(GFP_KERNEL);
+	if (!new_range)
+		return -ENOMEM;
+
+	vrange_lock(vroot);
+
+	node = interval_tree_iter_first(&vroot->v_rb, start_idx, end_idx);
+	while (node) {
+		next = interval_tree_iter_next(node, start_idx, end_idx);
+		range = container_of(node, struct vrange, node);
+
+		*purged |= range->purged;
+
+		if (start_idx <= node->start && end_idx >= node->last) {
+			/* argumented range covers the range fully */
+			__vrange_remove(range);
+			__vrange_free(range);
+		} else if (node->start >= start_idx) {
+			/*
+			 * Argumented range covers over the left of the
+			 * range
+			 */
+			__vrange_resize(range, end_idx + 1, node->last);
+		} else if (node->last <= end_idx) {
+			/*
+			 * Argumented range covers over the right of the
+			 * range
+			 */
+			__vrange_resize(range, node->start, start_idx - 1);
+		} else {
+			/*
+			 * Argumented range is middle of the range
+			 */
+			used_new = true;
+			__vrange_resize(range, node->start, start_idx - 1);
+			__vrange_set(new_range, end_idx + 1, node->last,
+					range->purged);
+			__vrange_add(new_range, vroot);
+			break;
+		}
+
+		node = next;
+	}
+	vrange_unlock(vroot);
+
+	if (!used_new)
+		__vrange_free(new_range);
+
+	return 0;
+}
+
+void vrange_root_cleanup(struct vrange_root *vroot)
+{
+	struct vrange *range;
+	struct rb_node *next;
+
+	vrange_lock(vroot);
+	next = rb_first(&vroot->v_rb);
+	while (next) {
+		range = vrange_entry(next);
+		next = rb_next(next);
+		__vrange_remove(range);
+		__vrange_free(range);
+	}
+	vrange_unlock(vroot);
+}
+
-- 
1.8.1.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 2/8] vrange: Add vrange support for file address_spaces
  2013-06-12  4:22 ` John Stultz
@ 2013-06-12  4:22   ` John Stultz
  -1 siblings, 0 replies; 48+ messages in thread
From: John Stultz @ 2013-06-12  4:22 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	Minchan Kim, linux-mm

Modify address_space structures to be able to store vrange trees.

This includes logic to clear all volatile ranges when the last
file handle is closed.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Android Kernel Team <kernel-team@android.com>
Cc: Robert Love <rlove@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mike Hommey <mh@glandium.org>
Cc: Taras Glek <tglek@mozilla.com>
Cc: Dhaval Giani <dgiani@mozilla.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: linux-mm@kvack.org <linux-mm@kvack.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
[jstultz: Major refactoring of the code]
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 fs/file_table.c    | 5 +++++
 fs/inode.c         | 2 ++
 include/linux/fs.h | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/fs/file_table.c b/fs/file_table.c
index cd4d87a..94e2cd3 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -26,6 +26,7 @@
 #include <linux/hardirq.h>
 #include <linux/task_work.h>
 #include <linux/ima.h>
+#include <linux/vrange.h>
 
 #include <linux/atomic.h>
 
@@ -244,6 +245,10 @@ static void __fput(struct file *file)
 			file->f_op->fasync(-1, file, 0);
 	}
 	ima_file_free(file);
+
+	/* drop all vranges on last close */
+	vrange_root_cleanup(&inode->i_mapping->vroot);
+
 	if (file->f_op && file->f_op->release)
 		file->f_op->release(inode, file);
 	security_file_free(file);
diff --git a/fs/inode.c b/fs/inode.c
index 00d5fc3..bf32780 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -17,6 +17,7 @@
 #include <linux/prefetch.h>
 #include <linux/buffer_head.h> /* for inode_has_buffers */
 #include <linux/ratelimit.h>
+#include <linux/vrange.h>
 #include "internal.h"
 
 /*
@@ -350,6 +351,7 @@ void address_space_init_once(struct address_space *mapping)
 	spin_lock_init(&mapping->private_lock);
 	mapping->i_mmap = RB_ROOT;
 	INIT_LIST_HEAD(&mapping->i_mmap_nonlinear);
+	vrange_root_init(&mapping->vroot, VRANGE_FILE);
 }
 EXPORT_SYMBOL(address_space_init_once);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 43db02e..1cbed73 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -27,6 +27,7 @@
 #include <linux/lockdep.h>
 #include <linux/percpu-rwsem.h>
 #include <linux/blk_types.h>
+#include <linux/vrange_types.h>
 
 #include <asm/byteorder.h>
 #include <uapi/linux/fs.h>
@@ -411,6 +412,7 @@ struct address_space {
 	struct rb_root		i_mmap;		/* tree of private and shared mappings */
 	struct list_head	i_mmap_nonlinear;/*list VM_NONLINEAR mappings */
 	struct mutex		i_mmap_mutex;	/* protect tree, count, list */
+	struct vrange_root	vroot;
 	/* Protected by tree_lock together with the radix tree */
 	unsigned long		nrpages;	/* number of total pages */
 	pgoff_t			writeback_index;/* writeback starts here */
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 2/8] vrange: Add vrange support for file address_spaces
@ 2013-06-12  4:22   ` John Stultz
  0 siblings, 0 replies; 48+ messages in thread
From: John Stultz @ 2013-06-12  4:22 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	Minchan Kim, linux-mm

Modify address_space structures to be able to store vrange trees.

This includes logic to clear all volatile ranges when the last
file handle is closed.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Android Kernel Team <kernel-team@android.com>
Cc: Robert Love <rlove@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mike Hommey <mh@glandium.org>
Cc: Taras Glek <tglek@mozilla.com>
Cc: Dhaval Giani <dgiani@mozilla.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: linux-mm@kvack.org <linux-mm@kvack.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
[jstultz: Major refactoring of the code]
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 fs/file_table.c    | 5 +++++
 fs/inode.c         | 2 ++
 include/linux/fs.h | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/fs/file_table.c b/fs/file_table.c
index cd4d87a..94e2cd3 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -26,6 +26,7 @@
 #include <linux/hardirq.h>
 #include <linux/task_work.h>
 #include <linux/ima.h>
+#include <linux/vrange.h>
 
 #include <linux/atomic.h>
 
@@ -244,6 +245,10 @@ static void __fput(struct file *file)
 			file->f_op->fasync(-1, file, 0);
 	}
 	ima_file_free(file);
+
+	/* drop all vranges on last close */
+	vrange_root_cleanup(&inode->i_mapping->vroot);
+
 	if (file->f_op && file->f_op->release)
 		file->f_op->release(inode, file);
 	security_file_free(file);
diff --git a/fs/inode.c b/fs/inode.c
index 00d5fc3..bf32780 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -17,6 +17,7 @@
 #include <linux/prefetch.h>
 #include <linux/buffer_head.h> /* for inode_has_buffers */
 #include <linux/ratelimit.h>
+#include <linux/vrange.h>
 #include "internal.h"
 
 /*
@@ -350,6 +351,7 @@ void address_space_init_once(struct address_space *mapping)
 	spin_lock_init(&mapping->private_lock);
 	mapping->i_mmap = RB_ROOT;
 	INIT_LIST_HEAD(&mapping->i_mmap_nonlinear);
+	vrange_root_init(&mapping->vroot, VRANGE_FILE);
 }
 EXPORT_SYMBOL(address_space_init_once);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 43db02e..1cbed73 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -27,6 +27,7 @@
 #include <linux/lockdep.h>
 #include <linux/percpu-rwsem.h>
 #include <linux/blk_types.h>
+#include <linux/vrange_types.h>
 
 #include <asm/byteorder.h>
 #include <uapi/linux/fs.h>
@@ -411,6 +412,7 @@ struct address_space {
 	struct rb_root		i_mmap;		/* tree of private and shared mappings */
 	struct list_head	i_mmap_nonlinear;/*list VM_NONLINEAR mappings */
 	struct mutex		i_mmap_mutex;	/* protect tree, count, list */
+	struct vrange_root	vroot;
 	/* Protected by tree_lock together with the radix tree */
 	unsigned long		nrpages;	/* number of total pages */
 	pgoff_t			writeback_index;/* writeback starts here */
-- 
1.8.1.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 3/8] vrange: Add vrange support to mm_structs
  2013-06-12  4:22 ` John Stultz
@ 2013-06-12  4:22   ` John Stultz
  -1 siblings, 0 replies; 48+ messages in thread
From: John Stultz @ 2013-06-12  4:22 UTC (permalink / raw)
  To: LKML
  Cc: Minchan Kim, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm, John Stultz

From: Minchan Kim <minchan@kernel.org>

Allows for vranges to be managed against mm_structs.

Includes support for copying vrange trees on fork,
as well as clearing them on exec.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Android Kernel Team <kernel-team@android.com>
Cc: Robert Love <rlove@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mike Hommey <mh@glandium.org>
Cc: Taras Glek <tglek@mozilla.com>
Cc: Dhaval Giani <dgiani@mozilla.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: linux-mm@kvack.org <linux-mm@kvack.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
[jstultz: Heavy refactoring.]
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/mm_types.h |  5 +++++
 include/linux/vrange.h   |  7 ++++++-
 kernel/fork.c            |  6 ++++++
 mm/vrange.c              | 30 ++++++++++++++++++++++++++++++
 4 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index ace9a5f..2e02a6d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -13,6 +13,8 @@
 #include <linux/page-debug-flags.h>
 #include <linux/uprobes.h>
 #include <linux/page-flags-layout.h>
+#include <linux/mutex.h>
+#include <linux/vrange_types.h>
 #include <asm/page.h>
 #include <asm/mmu.h>
 
@@ -351,6 +353,9 @@ struct mm_struct {
 						 */
 
 
+#ifdef CONFIG_MMU
+	struct vrange_root vroot;
+#endif
 	unsigned long hiwater_rss;	/* High-watermark of RSS usage */
 	unsigned long hiwater_vm;	/* High-water virtual memory usage */
 
diff --git a/include/linux/vrange.h b/include/linux/vrange.h
index 2064cb0..13f4887 100644
--- a/include/linux/vrange.h
+++ b/include/linux/vrange.h
@@ -33,12 +33,17 @@ static inline int vrange_type(struct vrange *vrange)
 
 void vrange_init(void);
 extern void vrange_root_cleanup(struct vrange_root *vroot);
-
+extern int vrange_fork(struct mm_struct *new,
+					struct mm_struct *old);
 #else
 
 static inline void vrange_init(void) {};
 static inline void vrange_root_init(struct vrange_root *vroot, int type) {};
 static inline void vrange_root_cleanup(struct vrange_root *vroot) {};
+static inline int vrange_fork(struct mm_struct *new, struct mm_struct *old)
+{
+	return 0;
+}
 
 #endif
 #endif /* _LINIUX_VRANGE_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 987b28a..6d22625 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -71,6 +71,7 @@
 #include <linux/signalfd.h>
 #include <linux/uprobes.h>
 #include <linux/aio.h>
+#include <linux/vrange.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -379,6 +380,9 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
 	retval = khugepaged_fork(mm, oldmm);
 	if (retval)
 		goto out;
+	retval = vrange_fork(mm, oldmm);
+	if (retval)
+		goto out;
 
 	prev = NULL;
 	for (mpnt = oldmm->mmap; mpnt; mpnt = mpnt->vm_next) {
@@ -542,6 +546,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p)
 	spin_lock_init(&mm->page_table_lock);
 	mm->free_area_cache = TASK_UNMAPPED_BASE;
 	mm->cached_hole_size = ~0UL;
+	vrange_root_init(&mm->vroot, VRANGE_MM);
 	mm_init_aio(mm);
 	mm_init_owner(mm, p);
 
@@ -613,6 +618,7 @@ void mmput(struct mm_struct *mm)
 
 	if (atomic_dec_and_test(&mm->mm_users)) {
 		uprobe_clear_state(mm);
+		vrange_root_cleanup(&mm->vroot);
 		exit_aio(mm);
 		ksm_exit(mm);
 		khugepaged_exit(mm); /* must run before exit_mmap */
diff --git a/mm/vrange.c b/mm/vrange.c
index e3042e0..bbaa184 100644
--- a/mm/vrange.c
+++ b/mm/vrange.c
@@ -4,6 +4,7 @@
 
 #include <linux/vrange.h>
 #include <linux/slab.h>
+#include <linux/mman.h>
 
 static struct kmem_cache *vrange_cachep;
 
@@ -179,3 +180,32 @@ void vrange_root_cleanup(struct vrange_root *vroot)
 	vrange_unlock(vroot);
 }
 
+int vrange_fork(struct mm_struct *new_mm, struct mm_struct *old_mm)
+{
+	struct vrange_root *new, *old;
+	struct vrange *range, *new_range;
+	struct rb_node *next;
+
+	new = &new_mm->vroot;
+	old = &old_mm->vroot;
+
+	vrange_lock(old);
+	next = rb_first(&old->v_rb);
+	while (next) {
+		range = vrange_entry(next);
+		next = rb_next(next);
+
+		new_range = __vrange_alloc(GFP_KERNEL);
+		if (!new_range)
+			goto fail;
+		__vrange_set(new_range, range->node.start,
+					range->node.last, range->purged);
+		__vrange_add(new_range, new);
+
+	}
+	vrange_unlock(old);
+	return 0;
+fail:
+	vrange_root_cleanup(new);
+	return -ENOMEM;
+}
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 3/8] vrange: Add vrange support to mm_structs
@ 2013-06-12  4:22   ` John Stultz
  0 siblings, 0 replies; 48+ messages in thread
From: John Stultz @ 2013-06-12  4:22 UTC (permalink / raw)
  To: LKML
  Cc: Minchan Kim, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm, John Stultz

From: Minchan Kim <minchan@kernel.org>

Allows for vranges to be managed against mm_structs.

Includes support for copying vrange trees on fork,
as well as clearing them on exec.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Android Kernel Team <kernel-team@android.com>
Cc: Robert Love <rlove@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mike Hommey <mh@glandium.org>
Cc: Taras Glek <tglek@mozilla.com>
Cc: Dhaval Giani <dgiani@mozilla.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: linux-mm@kvack.org <linux-mm@kvack.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
[jstultz: Heavy refactoring.]
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/mm_types.h |  5 +++++
 include/linux/vrange.h   |  7 ++++++-
 kernel/fork.c            |  6 ++++++
 mm/vrange.c              | 30 ++++++++++++++++++++++++++++++
 4 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index ace9a5f..2e02a6d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -13,6 +13,8 @@
 #include <linux/page-debug-flags.h>
 #include <linux/uprobes.h>
 #include <linux/page-flags-layout.h>
+#include <linux/mutex.h>
+#include <linux/vrange_types.h>
 #include <asm/page.h>
 #include <asm/mmu.h>
 
@@ -351,6 +353,9 @@ struct mm_struct {
 						 */
 
 
+#ifdef CONFIG_MMU
+	struct vrange_root vroot;
+#endif
 	unsigned long hiwater_rss;	/* High-watermark of RSS usage */
 	unsigned long hiwater_vm;	/* High-water virtual memory usage */
 
diff --git a/include/linux/vrange.h b/include/linux/vrange.h
index 2064cb0..13f4887 100644
--- a/include/linux/vrange.h
+++ b/include/linux/vrange.h
@@ -33,12 +33,17 @@ static inline int vrange_type(struct vrange *vrange)
 
 void vrange_init(void);
 extern void vrange_root_cleanup(struct vrange_root *vroot);
-
+extern int vrange_fork(struct mm_struct *new,
+					struct mm_struct *old);
 #else
 
 static inline void vrange_init(void) {};
 static inline void vrange_root_init(struct vrange_root *vroot, int type) {};
 static inline void vrange_root_cleanup(struct vrange_root *vroot) {};
+static inline int vrange_fork(struct mm_struct *new, struct mm_struct *old)
+{
+	return 0;
+}
 
 #endif
 #endif /* _LINIUX_VRANGE_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 987b28a..6d22625 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -71,6 +71,7 @@
 #include <linux/signalfd.h>
 #include <linux/uprobes.h>
 #include <linux/aio.h>
+#include <linux/vrange.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -379,6 +380,9 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
 	retval = khugepaged_fork(mm, oldmm);
 	if (retval)
 		goto out;
+	retval = vrange_fork(mm, oldmm);
+	if (retval)
+		goto out;
 
 	prev = NULL;
 	for (mpnt = oldmm->mmap; mpnt; mpnt = mpnt->vm_next) {
@@ -542,6 +546,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p)
 	spin_lock_init(&mm->page_table_lock);
 	mm->free_area_cache = TASK_UNMAPPED_BASE;
 	mm->cached_hole_size = ~0UL;
+	vrange_root_init(&mm->vroot, VRANGE_MM);
 	mm_init_aio(mm);
 	mm_init_owner(mm, p);
 
@@ -613,6 +618,7 @@ void mmput(struct mm_struct *mm)
 
 	if (atomic_dec_and_test(&mm->mm_users)) {
 		uprobe_clear_state(mm);
+		vrange_root_cleanup(&mm->vroot);
 		exit_aio(mm);
 		ksm_exit(mm);
 		khugepaged_exit(mm); /* must run before exit_mmap */
diff --git a/mm/vrange.c b/mm/vrange.c
index e3042e0..bbaa184 100644
--- a/mm/vrange.c
+++ b/mm/vrange.c
@@ -4,6 +4,7 @@
 
 #include <linux/vrange.h>
 #include <linux/slab.h>
+#include <linux/mman.h>
 
 static struct kmem_cache *vrange_cachep;
 
@@ -179,3 +180,32 @@ void vrange_root_cleanup(struct vrange_root *vroot)
 	vrange_unlock(vroot);
 }
 
+int vrange_fork(struct mm_struct *new_mm, struct mm_struct *old_mm)
+{
+	struct vrange_root *new, *old;
+	struct vrange *range, *new_range;
+	struct rb_node *next;
+
+	new = &new_mm->vroot;
+	old = &old_mm->vroot;
+
+	vrange_lock(old);
+	next = rb_first(&old->v_rb);
+	while (next) {
+		range = vrange_entry(next);
+		next = rb_next(next);
+
+		new_range = __vrange_alloc(GFP_KERNEL);
+		if (!new_range)
+			goto fail;
+		__vrange_set(new_range, range->node.start,
+					range->node.last, range->purged);
+		__vrange_add(new_range, new);
+
+	}
+	vrange_unlock(old);
+	return 0;
+fail:
+	vrange_root_cleanup(new);
+	return -ENOMEM;
+}
-- 
1.8.1.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 4/8] vrange: Clear volatility on new mmaps
  2013-06-12  4:22 ` John Stultz
@ 2013-06-12  4:22   ` John Stultz
  -1 siblings, 0 replies; 48+ messages in thread
From: John Stultz @ 2013-06-12  4:22 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	Minchan Kim, linux-mm

At lsf-mm, the issue was brought up that there is a precedence with
interfaces like mlock, such that new mappings in a pre-existing range
do no inherit the mlock state.

This is mostly because mlock only modifies the existing vmas, and so
any new mmaps create new vmas, which won't be mlocked.

Since volatility is not stored in the vma (for good cause, specfically
as we'd have to have manage file volatility differently from anonymous
and we're likely to manage volatility on small chunks of memory, which
would cause lots of vma splitting and churn), this patch clears volatilty
on new mappings, to ensure that we don't inherit volatility if memory in
an existing volatile range is unmapped and then re-mapped with something
else.

Thus, this patch forces any volatility to be cleared on mmap.

XXX: We expect this patch to be not well loved by mm folks, and are open
to alternative methods here. Its more of a place holder to address
the issue from lsf-mm and hopefully will spur some further discussion.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Android Kernel Team <kernel-team@android.com>
Cc: Robert Love <rlove@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mike Hommey <mh@glandium.org>
Cc: Taras Glek <tglek@mozilla.com>
Cc: Dhaval Giani <dgiani@mozilla.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: linux-mm@kvack.org <linux-mm@kvack.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/vrange.h | 2 ++
 mm/mmap.c              | 5 +++++
 mm/vrange.c            | 8 ++++++++
 3 files changed, 15 insertions(+)

diff --git a/include/linux/vrange.h b/include/linux/vrange.h
index 13f4887..a97ac25 100644
--- a/include/linux/vrange.h
+++ b/include/linux/vrange.h
@@ -32,6 +32,8 @@ static inline int vrange_type(struct vrange *vrange)
 }
 
 void vrange_init(void);
+extern int vrange_clear(struct vrange_root *vroot,
+				unsigned long start, unsigned long end);
 extern void vrange_root_cleanup(struct vrange_root *vroot);
 extern int vrange_fork(struct mm_struct *new,
 					struct mm_struct *old);
diff --git a/mm/mmap.c b/mm/mmap.c
index f681e18..80d3676 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -36,6 +36,7 @@
 #include <linux/sched/sysctl.h>
 #include <linux/notifier.h>
 #include <linux/memory.h>
+#include <linux/vrange.h>
 
 #include <asm/uaccess.h>
 #include <asm/cacheflush.h>
@@ -1500,6 +1501,10 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	/* Clear old maps */
 	error = -ENOMEM;
 munmap_back:
+
+	/* zap any volatile ranges */
+	vrange_clear(&mm->vroot, addr, addr + len);
+
 	if (find_vma_links(mm, addr, addr + len, &prev, &rb_link, &rb_parent)) {
 		if (do_munmap(mm, addr, len))
 			return -ENOMEM;
diff --git a/mm/vrange.c b/mm/vrange.c
index bbaa184..5ca8853 100644
--- a/mm/vrange.c
+++ b/mm/vrange.c
@@ -164,6 +164,14 @@ static int vrange_remove(struct vrange_root *vroot,
 	return 0;
 }
 
+int vrange_clear(struct vrange_root *vroot,
+					unsigned long start, unsigned long end)
+{
+	int purged;
+
+	return vrange_remove(vroot, start, end-1, &purged);
+}
+
 void vrange_root_cleanup(struct vrange_root *vroot)
 {
 	struct vrange *range;
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 4/8] vrange: Clear volatility on new mmaps
@ 2013-06-12  4:22   ` John Stultz
  0 siblings, 0 replies; 48+ messages in thread
From: John Stultz @ 2013-06-12  4:22 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	Minchan Kim, linux-mm

At lsf-mm, the issue was brought up that there is a precedence with
interfaces like mlock, such that new mappings in a pre-existing range
do no inherit the mlock state.

This is mostly because mlock only modifies the existing vmas, and so
any new mmaps create new vmas, which won't be mlocked.

Since volatility is not stored in the vma (for good cause, specfically
as we'd have to have manage file volatility differently from anonymous
and we're likely to manage volatility on small chunks of memory, which
would cause lots of vma splitting and churn), this patch clears volatilty
on new mappings, to ensure that we don't inherit volatility if memory in
an existing volatile range is unmapped and then re-mapped with something
else.

Thus, this patch forces any volatility to be cleared on mmap.

XXX: We expect this patch to be not well loved by mm folks, and are open
to alternative methods here. Its more of a place holder to address
the issue from lsf-mm and hopefully will spur some further discussion.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Android Kernel Team <kernel-team@android.com>
Cc: Robert Love <rlove@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mike Hommey <mh@glandium.org>
Cc: Taras Glek <tglek@mozilla.com>
Cc: Dhaval Giani <dgiani@mozilla.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: linux-mm@kvack.org <linux-mm@kvack.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/vrange.h | 2 ++
 mm/mmap.c              | 5 +++++
 mm/vrange.c            | 8 ++++++++
 3 files changed, 15 insertions(+)

diff --git a/include/linux/vrange.h b/include/linux/vrange.h
index 13f4887..a97ac25 100644
--- a/include/linux/vrange.h
+++ b/include/linux/vrange.h
@@ -32,6 +32,8 @@ static inline int vrange_type(struct vrange *vrange)
 }
 
 void vrange_init(void);
+extern int vrange_clear(struct vrange_root *vroot,
+				unsigned long start, unsigned long end);
 extern void vrange_root_cleanup(struct vrange_root *vroot);
 extern int vrange_fork(struct mm_struct *new,
 					struct mm_struct *old);
diff --git a/mm/mmap.c b/mm/mmap.c
index f681e18..80d3676 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -36,6 +36,7 @@
 #include <linux/sched/sysctl.h>
 #include <linux/notifier.h>
 #include <linux/memory.h>
+#include <linux/vrange.h>
 
 #include <asm/uaccess.h>
 #include <asm/cacheflush.h>
@@ -1500,6 +1501,10 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	/* Clear old maps */
 	error = -ENOMEM;
 munmap_back:
+
+	/* zap any volatile ranges */
+	vrange_clear(&mm->vroot, addr, addr + len);
+
 	if (find_vma_links(mm, addr, addr + len, &prev, &rb_link, &rb_parent)) {
 		if (do_munmap(mm, addr, len))
 			return -ENOMEM;
diff --git a/mm/vrange.c b/mm/vrange.c
index bbaa184..5ca8853 100644
--- a/mm/vrange.c
+++ b/mm/vrange.c
@@ -164,6 +164,14 @@ static int vrange_remove(struct vrange_root *vroot,
 	return 0;
 }
 
+int vrange_clear(struct vrange_root *vroot,
+					unsigned long start, unsigned long end)
+{
+	int purged;
+
+	return vrange_remove(vroot, start, end-1, &purged);
+}
+
 void vrange_root_cleanup(struct vrange_root *vroot)
 {
 	struct vrange *range;
-- 
1.8.1.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 5/8] vrange: Add new vrange(2) system call
  2013-06-12  4:22 ` John Stultz
@ 2013-06-12  4:22   ` John Stultz
  -1 siblings, 0 replies; 48+ messages in thread
From: John Stultz @ 2013-06-12  4:22 UTC (permalink / raw)
  To: LKML
  Cc: Minchan Kim, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm, John Stultz

From: Minchan Kim <minchan@kernel.org>

This patch adds new system call sys_vrange.

NAME
	vrange - Mark or unmark range of memory as volatile

SYNOPSIS
	int vrange(unsigned_long start, size_t length, int mode,
			 int *purged);

DESCRIPTION
	Applications can use vrange(2) to advise the kernel how it should
	handle paging I/O in this VM area.  The idea is to help the kernel
	discard pages of vrange instead of reclaiming when memory pressure
	happens. It means kernel doesn't discard any pages of vrange if
	there is no memory pressure.

	mode:
	VRANGE_VOLATILE
		hint to kernel so VM can discard in vrange pages when
		memory pressure happens.
	VRANGE_NONVOLATILE
		hint to kernel so VM doesn't discard vrange pages
		any more.

	If user try to access purged memory without VRANGE_NOVOLATILE call,
	he can encounter SIGBUS if the page was discarded by kernel.

	purged: Pointer to an integer which will return 1 if
	mode == VRANGE_NONVOLATILE and any page in the affected range
	was purged. If purged returns zero during a mode ==
	VRANGE_NONVOLATILE call, it means all of the pages in the range
	are intact.

RETURN VALUE
	On success vrange returns the number of bytes marked or unmarked.
	Similar to write(), it may return fewer bytes then specified
	if it ran into a problem.

	If an error is returned, no changes were made.

ERRORS
	EINVAL This error can occur for the following reasons:
		* The value length is negative or not page size units.
		* addr is not page-aligned
		* mode not a valid value.

	ENOMEM Not enough memory

	EFAULT purged pointer is invalid

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Android Kernel Team <kernel-team@android.com>
Cc: Robert Love <rlove@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mike Hommey <mh@glandium.org>
Cc: Taras Glek <tglek@mozilla.com>
Cc: Dhaval Giani <dgiani@mozilla.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: linux-mm@kvack.org <linux-mm@kvack.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
[jstultz: Major rework of interface and commit message]
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 arch/x86/syscalls/syscall_64.tbl       |   1 +
 include/uapi/asm-generic/mman-common.h |   3 +
 mm/vrange.c                            | 147 +++++++++++++++++++++++++++++++++
 3 files changed, 151 insertions(+)

diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 38ae65d..dc332bd 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -320,6 +320,7 @@
 311	64	process_vm_writev	sys_process_vm_writev
 312	common	kcmp			sys_kcmp
 313	common	finit_module		sys_finit_module
+314	common	vrange			sys_vrange
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 4164529..9be120b 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -66,4 +66,7 @@
 #define MAP_HUGE_SHIFT	26
 #define MAP_HUGE_MASK	0x3f
 
+#define VRANGE_VOLATILE		0	/* unpin pages so VM can discard them */
+#define VRANGE_NONVOLATILE	1	/* pin pages so VM can't discard them */
+
 #endif /* __ASM_GENERIC_MMAN_COMMON_H */
diff --git a/mm/vrange.c b/mm/vrange.c
index 5ca8853..f3c2465 100644
--- a/mm/vrange.c
+++ b/mm/vrange.c
@@ -5,6 +5,7 @@
 #include <linux/vrange.h>
 #include <linux/slab.h>
 #include <linux/mman.h>
+#include <linux/syscalls.h>
 
 static struct kmem_cache *vrange_cachep;
 
@@ -217,3 +218,149 @@ fail:
 	vrange_root_cleanup(new);
 	return -ENOMEM;
 }
+
+static ssize_t do_vrange(struct mm_struct *mm, unsigned long start_idx,
+				unsigned long end_idx, int mode, int *purged)
+{
+	struct vm_area_struct *vma;
+	unsigned long orig_start = start_idx;
+	ssize_t count = 0, ret = 0;
+
+	down_read(&mm->mmap_sem);
+
+	vma = find_vma(mm, start_idx);
+	for (;;) {
+		struct vrange_root *vroot;
+		unsigned long tmp, vstart_idx, vend_idx;
+
+		if (!vma)
+			goto out;
+
+		/* make sure start is at the front of the current vma*/
+		if (start_idx < vma->vm_start) {
+			start_idx = vma->vm_start;
+			if (start_idx > end_idx)
+				goto out;
+		}
+
+		/* bound tmp to closer of vm_end & end */
+		tmp = vma->vm_end - 1;
+		if (end_idx < tmp)
+			tmp = end_idx;
+
+		if (vma->vm_file && (vma->vm_flags & VM_SHARED)) {
+			/* Convert to file relative offsets */
+			vroot = &vma->vm_file->f_mapping->vroot;
+			vstart_idx = vma->vm_pgoff + start_idx - vma->vm_start;
+			vend_idx = vma->vm_pgoff + tmp - vma->vm_start;
+		} else {
+			vroot = &mm->vroot;
+			vstart_idx = start_idx;
+			vend_idx = tmp;
+		}
+
+		/* mark or unmark */
+		if (mode == VRANGE_VOLATILE)
+			ret = vrange_add(vroot, vstart_idx, vend_idx);
+		else if (mode == VRANGE_NONVOLATILE)
+			ret = vrange_remove(vroot, vstart_idx, vend_idx,
+						purged);
+
+		if (ret)
+			goto out;
+
+		/* update count to distance covered so far*/
+		count = tmp - orig_start;
+
+		/* move start up to the end of the vma*/
+		start_idx = vma->vm_end;
+		if (start_idx > end_idx)
+			goto out;
+		/* move to the next vma */
+		vma = vma->vm_next;
+	}
+out:
+	up_read(&mm->mmap_sem);
+
+	/* report bytes successfully marked, even if we're exiting on error */
+	if (count)
+		return count;
+
+	return ret;
+}
+
+/*
+ * The vrange(2) system call.
+ *
+ * Applications can use vrange() to advise the kernel how it should
+ * handle paging I/O in this VM area.  The idea is to help the kernel
+ * discard pages of vrange instead of swapping out when memory pressure
+ * happens. The information provided is advisory only, and can be safely
+ * disregarded by the kernel if system has enough free memory.
+ *
+ * mode values:
+ *  VRANGE_VOLATILE - hint to kernel so VM can discard vrange pages when
+ *		memory pressure happens.
+ *  VRANGE_NONVOLATILE - Removes any volatile hints previous specified in that
+ *		range.
+ *
+ * purged ptr:
+ *  Returns 1 if any page in the range being marked nonvolatile has been purged.
+ *
+ * Return values:
+ *  On success vrange returns the number of bytes marked or unmarked.
+ *  Similar to write(), it may return fewer bytes then specified if
+ *  it ran into a problem.
+ *
+ *  If an error is returned, no changes were made.
+ *
+ * Errors:
+ *  -EINVAL - start  len < 0, start is not page-aligned, start is greater
+ *		than TASK_SIZE or "mode" is not a valid value.
+ *  -ENOMEM - Short of free memory in system for successful system call.
+ *  -EFAULT - Purged pointer is invalid.
+ *  -ENOSUP - Feature not yet supported.
+ */
+SYSCALL_DEFINE4(vrange, unsigned long, start,
+		size_t, len, int, mode, int __user *, purged)
+{
+	unsigned long end;
+	struct mm_struct *mm = current->mm;
+	ssize_t ret = -EINVAL;
+	int p = 0;
+
+	if (start & ~PAGE_MASK)
+		goto out;
+
+	len &= PAGE_MASK;
+	if (!len)
+		goto out;
+
+	end = start + len;
+	if (end < start)
+		goto out;
+
+	if (start >= TASK_SIZE)
+		goto out;
+
+	if (purged) {
+		/* Test pointer is valid before making any changes */
+		if (put_user(p, purged))
+			return -EFAULT;
+	}
+
+	ret = do_vrange(mm, start, end - 1, mode, &p);
+
+	if (purged) {
+		if (put_user(p, purged)) {
+			/*
+			 * This would be bad, since we've modified volatilty
+			 * and the change in purged state would be lost.
+			 */
+			BUG();
+		}
+	}
+
+out:
+	return ret;
+}
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 5/8] vrange: Add new vrange(2) system call
@ 2013-06-12  4:22   ` John Stultz
  0 siblings, 0 replies; 48+ messages in thread
From: John Stultz @ 2013-06-12  4:22 UTC (permalink / raw)
  To: LKML
  Cc: Minchan Kim, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm, John Stultz

From: Minchan Kim <minchan@kernel.org>

This patch adds new system call sys_vrange.

NAME
	vrange - Mark or unmark range of memory as volatile

SYNOPSIS
	int vrange(unsigned_long start, size_t length, int mode,
			 int *purged);

DESCRIPTION
	Applications can use vrange(2) to advise the kernel how it should
	handle paging I/O in this VM area.  The idea is to help the kernel
	discard pages of vrange instead of reclaiming when memory pressure
	happens. It means kernel doesn't discard any pages of vrange if
	there is no memory pressure.

	mode:
	VRANGE_VOLATILE
		hint to kernel so VM can discard in vrange pages when
		memory pressure happens.
	VRANGE_NONVOLATILE
		hint to kernel so VM doesn't discard vrange pages
		any more.

	If user try to access purged memory without VRANGE_NOVOLATILE call,
	he can encounter SIGBUS if the page was discarded by kernel.

	purged: Pointer to an integer which will return 1 if
	mode == VRANGE_NONVOLATILE and any page in the affected range
	was purged. If purged returns zero during a mode ==
	VRANGE_NONVOLATILE call, it means all of the pages in the range
	are intact.

RETURN VALUE
	On success vrange returns the number of bytes marked or unmarked.
	Similar to write(), it may return fewer bytes then specified
	if it ran into a problem.

	If an error is returned, no changes were made.

ERRORS
	EINVAL This error can occur for the following reasons:
		* The value length is negative or not page size units.
		* addr is not page-aligned
		* mode not a valid value.

	ENOMEM Not enough memory

	EFAULT purged pointer is invalid

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Android Kernel Team <kernel-team@android.com>
Cc: Robert Love <rlove@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mike Hommey <mh@glandium.org>
Cc: Taras Glek <tglek@mozilla.com>
Cc: Dhaval Giani <dgiani@mozilla.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: linux-mm@kvack.org <linux-mm@kvack.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
[jstultz: Major rework of interface and commit message]
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 arch/x86/syscalls/syscall_64.tbl       |   1 +
 include/uapi/asm-generic/mman-common.h |   3 +
 mm/vrange.c                            | 147 +++++++++++++++++++++++++++++++++
 3 files changed, 151 insertions(+)

diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 38ae65d..dc332bd 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -320,6 +320,7 @@
 311	64	process_vm_writev	sys_process_vm_writev
 312	common	kcmp			sys_kcmp
 313	common	finit_module		sys_finit_module
+314	common	vrange			sys_vrange
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 4164529..9be120b 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -66,4 +66,7 @@
 #define MAP_HUGE_SHIFT	26
 #define MAP_HUGE_MASK	0x3f
 
+#define VRANGE_VOLATILE		0	/* unpin pages so VM can discard them */
+#define VRANGE_NONVOLATILE	1	/* pin pages so VM can't discard them */
+
 #endif /* __ASM_GENERIC_MMAN_COMMON_H */
diff --git a/mm/vrange.c b/mm/vrange.c
index 5ca8853..f3c2465 100644
--- a/mm/vrange.c
+++ b/mm/vrange.c
@@ -5,6 +5,7 @@
 #include <linux/vrange.h>
 #include <linux/slab.h>
 #include <linux/mman.h>
+#include <linux/syscalls.h>
 
 static struct kmem_cache *vrange_cachep;
 
@@ -217,3 +218,149 @@ fail:
 	vrange_root_cleanup(new);
 	return -ENOMEM;
 }
+
+static ssize_t do_vrange(struct mm_struct *mm, unsigned long start_idx,
+				unsigned long end_idx, int mode, int *purged)
+{
+	struct vm_area_struct *vma;
+	unsigned long orig_start = start_idx;
+	ssize_t count = 0, ret = 0;
+
+	down_read(&mm->mmap_sem);
+
+	vma = find_vma(mm, start_idx);
+	for (;;) {
+		struct vrange_root *vroot;
+		unsigned long tmp, vstart_idx, vend_idx;
+
+		if (!vma)
+			goto out;
+
+		/* make sure start is at the front of the current vma*/
+		if (start_idx < vma->vm_start) {
+			start_idx = vma->vm_start;
+			if (start_idx > end_idx)
+				goto out;
+		}
+
+		/* bound tmp to closer of vm_end & end */
+		tmp = vma->vm_end - 1;
+		if (end_idx < tmp)
+			tmp = end_idx;
+
+		if (vma->vm_file && (vma->vm_flags & VM_SHARED)) {
+			/* Convert to file relative offsets */
+			vroot = &vma->vm_file->f_mapping->vroot;
+			vstart_idx = vma->vm_pgoff + start_idx - vma->vm_start;
+			vend_idx = vma->vm_pgoff + tmp - vma->vm_start;
+		} else {
+			vroot = &mm->vroot;
+			vstart_idx = start_idx;
+			vend_idx = tmp;
+		}
+
+		/* mark or unmark */
+		if (mode == VRANGE_VOLATILE)
+			ret = vrange_add(vroot, vstart_idx, vend_idx);
+		else if (mode == VRANGE_NONVOLATILE)
+			ret = vrange_remove(vroot, vstart_idx, vend_idx,
+						purged);
+
+		if (ret)
+			goto out;
+
+		/* update count to distance covered so far*/
+		count = tmp - orig_start;
+
+		/* move start up to the end of the vma*/
+		start_idx = vma->vm_end;
+		if (start_idx > end_idx)
+			goto out;
+		/* move to the next vma */
+		vma = vma->vm_next;
+	}
+out:
+	up_read(&mm->mmap_sem);
+
+	/* report bytes successfully marked, even if we're exiting on error */
+	if (count)
+		return count;
+
+	return ret;
+}
+
+/*
+ * The vrange(2) system call.
+ *
+ * Applications can use vrange() to advise the kernel how it should
+ * handle paging I/O in this VM area.  The idea is to help the kernel
+ * discard pages of vrange instead of swapping out when memory pressure
+ * happens. The information provided is advisory only, and can be safely
+ * disregarded by the kernel if system has enough free memory.
+ *
+ * mode values:
+ *  VRANGE_VOLATILE - hint to kernel so VM can discard vrange pages when
+ *		memory pressure happens.
+ *  VRANGE_NONVOLATILE - Removes any volatile hints previous specified in that
+ *		range.
+ *
+ * purged ptr:
+ *  Returns 1 if any page in the range being marked nonvolatile has been purged.
+ *
+ * Return values:
+ *  On success vrange returns the number of bytes marked or unmarked.
+ *  Similar to write(), it may return fewer bytes then specified if
+ *  it ran into a problem.
+ *
+ *  If an error is returned, no changes were made.
+ *
+ * Errors:
+ *  -EINVAL - start  len < 0, start is not page-aligned, start is greater
+ *		than TASK_SIZE or "mode" is not a valid value.
+ *  -ENOMEM - Short of free memory in system for successful system call.
+ *  -EFAULT - Purged pointer is invalid.
+ *  -ENOSUP - Feature not yet supported.
+ */
+SYSCALL_DEFINE4(vrange, unsigned long, start,
+		size_t, len, int, mode, int __user *, purged)
+{
+	unsigned long end;
+	struct mm_struct *mm = current->mm;
+	ssize_t ret = -EINVAL;
+	int p = 0;
+
+	if (start & ~PAGE_MASK)
+		goto out;
+
+	len &= PAGE_MASK;
+	if (!len)
+		goto out;
+
+	end = start + len;
+	if (end < start)
+		goto out;
+
+	if (start >= TASK_SIZE)
+		goto out;
+
+	if (purged) {
+		/* Test pointer is valid before making any changes */
+		if (put_user(p, purged))
+			return -EFAULT;
+	}
+
+	ret = do_vrange(mm, start, end - 1, mode, &p);
+
+	if (purged) {
+		if (put_user(p, purged)) {
+			/*
+			 * This would be bad, since we've modified volatilty
+			 * and the change in purged state would be lost.
+			 */
+			BUG();
+		}
+	}
+
+out:
+	return ret;
+}
-- 
1.8.1.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 6/8] vrange: Add GFP_NO_VRANGE allocation flag
  2013-06-12  4:22 ` John Stultz
@ 2013-06-12  4:22   ` John Stultz
  -1 siblings, 0 replies; 48+ messages in thread
From: John Stultz @ 2013-06-12  4:22 UTC (permalink / raw)
  To: LKML
  Cc: Minchan Kim, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm, John Stultz

From: Minchan Kim <minchan@kernel.org>

In cloning the vroot tree during a fork, we have to
allocate memory while hold the vroot lock. This is problematic,
as the memory allocation can trigger reclaim, which might require
grabbing a vroot lock in order to find purgable pages.

Thus this patch introduces GFP_NO_VRANGE which will allow
us to avoid having a allocation for vrange to trigger any
volatile range purging.

XXX: We're not yet using this flag in the later purge paths, so
we still get the lockdep warnings.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Android Kernel Team <kernel-team@android.com>
Cc: Robert Love <rlove@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mike Hommey <mh@glandium.org>
Cc: Taras Glek <tglek@mozilla.com>
Cc: Dhaval Giani <dgiani@mozilla.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: linux-mm@kvack.org <linux-mm@kvack.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
[jstultz: Split out from a different patch, created new commit message]
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/gfp.h | 7 +++++--
 mm/vrange.c         | 2 +-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 0f615eb..fa52199 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -35,6 +35,7 @@ struct vm_area_struct;
 #define ___GFP_NO_KSWAPD	0x400000u
 #define ___GFP_OTHER_NODE	0x800000u
 #define ___GFP_WRITE		0x1000000u
+#define ___GFP_NO_VRANGE	0x2000000u
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
@@ -70,6 +71,7 @@ struct vm_area_struct;
 #define __GFP_HIGH	((__force gfp_t)___GFP_HIGH)	/* Should access emergency pools? */
 #define __GFP_IO	((__force gfp_t)___GFP_IO)	/* Can start physical IO? */
 #define __GFP_FS	((__force gfp_t)___GFP_FS)	/* Can call down to low-level FS? */
+#define __GFP_NO_VRANGE ((__force gfp_t)___GFP_NO_VRANGE) /* Can't reclaim volatile pages */
 #define __GFP_COLD	((__force gfp_t)___GFP_COLD)	/* Cache-cold page required */
 #define __GFP_NOWARN	((__force gfp_t)___GFP_NOWARN)	/* Suppress page allocation failure warning */
 #define __GFP_REPEAT	((__force gfp_t)___GFP_REPEAT)	/* See above */
@@ -99,7 +101,7 @@ struct vm_area_struct;
  */
 #define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
 
-#define __GFP_BITS_SHIFT 25	/* Room for N __GFP_FOO bits */
+#define __GFP_BITS_SHIFT 26	/* Room for N __GFP_FOO bits */
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /* This equals 0, but use constants in case they ever change */
@@ -134,7 +136,8 @@ struct vm_area_struct;
 /* Control page allocator reclaim behavior */
 #define GFP_RECLAIM_MASK (__GFP_WAIT|__GFP_HIGH|__GFP_IO|__GFP_FS|\
 			__GFP_NOWARN|__GFP_REPEAT|__GFP_NOFAIL|\
-			__GFP_NORETRY|__GFP_MEMALLOC|__GFP_NOMEMALLOC)
+			__GFP_NORETRY|__GFP_MEMALLOC|__GFP_NOMEMALLOC|\
+			__GFP_NO_VRANGE)
 
 /* Control slab gfp mask during early boot */
 #define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_WAIT|__GFP_IO|__GFP_FS))
diff --git a/mm/vrange.c b/mm/vrange.c
index f3c2465..5278939 100644
--- a/mm/vrange.c
+++ b/mm/vrange.c
@@ -204,7 +204,7 @@ int vrange_fork(struct mm_struct *new_mm, struct mm_struct *old_mm)
 		range = vrange_entry(next);
 		next = rb_next(next);
 
-		new_range = __vrange_alloc(GFP_KERNEL);
+		new_range = __vrange_alloc(GFP_KERNEL|__GFP_NO_VRANGE);
 		if (!new_range)
 			goto fail;
 		__vrange_set(new_range, range->node.start,
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 6/8] vrange: Add GFP_NO_VRANGE allocation flag
@ 2013-06-12  4:22   ` John Stultz
  0 siblings, 0 replies; 48+ messages in thread
From: John Stultz @ 2013-06-12  4:22 UTC (permalink / raw)
  To: LKML
  Cc: Minchan Kim, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm, John Stultz

From: Minchan Kim <minchan@kernel.org>

In cloning the vroot tree during a fork, we have to
allocate memory while hold the vroot lock. This is problematic,
as the memory allocation can trigger reclaim, which might require
grabbing a vroot lock in order to find purgable pages.

Thus this patch introduces GFP_NO_VRANGE which will allow
us to avoid having a allocation for vrange to trigger any
volatile range purging.

XXX: We're not yet using this flag in the later purge paths, so
we still get the lockdep warnings.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Android Kernel Team <kernel-team@android.com>
Cc: Robert Love <rlove@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mike Hommey <mh@glandium.org>
Cc: Taras Glek <tglek@mozilla.com>
Cc: Dhaval Giani <dgiani@mozilla.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: linux-mm@kvack.org <linux-mm@kvack.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
[jstultz: Split out from a different patch, created new commit message]
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/gfp.h | 7 +++++--
 mm/vrange.c         | 2 +-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 0f615eb..fa52199 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -35,6 +35,7 @@ struct vm_area_struct;
 #define ___GFP_NO_KSWAPD	0x400000u
 #define ___GFP_OTHER_NODE	0x800000u
 #define ___GFP_WRITE		0x1000000u
+#define ___GFP_NO_VRANGE	0x2000000u
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
@@ -70,6 +71,7 @@ struct vm_area_struct;
 #define __GFP_HIGH	((__force gfp_t)___GFP_HIGH)	/* Should access emergency pools? */
 #define __GFP_IO	((__force gfp_t)___GFP_IO)	/* Can start physical IO? */
 #define __GFP_FS	((__force gfp_t)___GFP_FS)	/* Can call down to low-level FS? */
+#define __GFP_NO_VRANGE ((__force gfp_t)___GFP_NO_VRANGE) /* Can't reclaim volatile pages */
 #define __GFP_COLD	((__force gfp_t)___GFP_COLD)	/* Cache-cold page required */
 #define __GFP_NOWARN	((__force gfp_t)___GFP_NOWARN)	/* Suppress page allocation failure warning */
 #define __GFP_REPEAT	((__force gfp_t)___GFP_REPEAT)	/* See above */
@@ -99,7 +101,7 @@ struct vm_area_struct;
  */
 #define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
 
-#define __GFP_BITS_SHIFT 25	/* Room for N __GFP_FOO bits */
+#define __GFP_BITS_SHIFT 26	/* Room for N __GFP_FOO bits */
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /* This equals 0, but use constants in case they ever change */
@@ -134,7 +136,8 @@ struct vm_area_struct;
 /* Control page allocator reclaim behavior */
 #define GFP_RECLAIM_MASK (__GFP_WAIT|__GFP_HIGH|__GFP_IO|__GFP_FS|\
 			__GFP_NOWARN|__GFP_REPEAT|__GFP_NOFAIL|\
-			__GFP_NORETRY|__GFP_MEMALLOC|__GFP_NOMEMALLOC)
+			__GFP_NORETRY|__GFP_MEMALLOC|__GFP_NOMEMALLOC|\
+			__GFP_NO_VRANGE)
 
 /* Control slab gfp mask during early boot */
 #define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_WAIT|__GFP_IO|__GFP_FS))
diff --git a/mm/vrange.c b/mm/vrange.c
index f3c2465..5278939 100644
--- a/mm/vrange.c
+++ b/mm/vrange.c
@@ -204,7 +204,7 @@ int vrange_fork(struct mm_struct *new_mm, struct mm_struct *old_mm)
 		range = vrange_entry(next);
 		next = rb_next(next);
 
-		new_range = __vrange_alloc(GFP_KERNEL);
+		new_range = __vrange_alloc(GFP_KERNEL|__GFP_NO_VRANGE);
 		if (!new_range)
 			goto fail;
 		__vrange_set(new_range, range->node.start,
-- 
1.8.1.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 7/8] vrange: Add method to purge volatile ranges
  2013-06-12  4:22 ` John Stultz
@ 2013-06-12  4:22   ` John Stultz
  -1 siblings, 0 replies; 48+ messages in thread
From: John Stultz @ 2013-06-12  4:22 UTC (permalink / raw)
  To: LKML
  Cc: Minchan Kim, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm, John Stultz

From: Minchan Kim <minchan@kernel.org>

This patch adds discarding function to purge volatile ranges under
memory pressure. Logic is as following:

1. Memory pressure happens
2. VM start to reclaim pages
3. Check the page is in volatile range.
4. If so, zap the page from the process's page table.
   (By semantic vrange(2), we should mark it with another one to
    make page fault when you try to access the address. It will
    be introduced later patch)
5. If page is unmapped from all processes, discard it instead of swapping.

This patch does not address the case where there is no swap, which
keeps anonymous pages from being aged off the LRUs. Minchan has
additional patches that add support for purging anonymous pages

XXX: First pass at file purging. Seems to work, but is likely broken
and needs close review.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Android Kernel Team <kernel-team@android.com>
Cc: Robert Love <rlove@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mike Hommey <mh@glandium.org>
Cc: Taras Glek <tglek@mozilla.com>
Cc: Dhaval Giani <dgiani@mozilla.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: linux-mm@kvack.org <linux-mm@kvack.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
[jstultz: Reworked to add purging of file pages, commit log tweaks]
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/rmap.h   |  12 +-
 include/linux/swap.h   |   1 +
 include/linux/vrange.h |   7 ++
 mm/ksm.c               |   2 +-
 mm/rmap.c              |  30 +++--
 mm/swapfile.c          |  36 ++++++
 mm/vmscan.c            |  16 ++-
 mm/vrange.c            | 332 +++++++++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 420 insertions(+), 16 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 6dacb93..6432dfb 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -83,6 +83,8 @@ enum ttu_flags {
 };
 
 #ifdef CONFIG_MMU
+unsigned long vma_address(struct page *page, struct vm_area_struct *vma);
+
 static inline void get_anon_vma(struct anon_vma *anon_vma)
 {
 	atomic_inc(&anon_vma->refcount);
@@ -182,9 +184,11 @@ static inline void page_dup_rmap(struct page *page)
  * Called from mm/vmscan.c to handle paging out
  */
 int page_referenced(struct page *, int is_locked,
-			struct mem_cgroup *memcg, unsigned long *vm_flags);
+			struct mem_cgroup *memcg, unsigned long *vm_flags,
+			int *is_vrange);
 int page_referenced_one(struct page *, struct vm_area_struct *,
-	unsigned long address, unsigned int *mapcount, unsigned long *vm_flags);
+	unsigned long address, unsigned int *mapcount, unsigned long *vm_flags,
+	int *is_vrange);
 
 #define TTU_ACTION(x) ((x) & TTU_ACTION_MASK)
 
@@ -249,9 +253,11 @@ int rmap_walk(struct page *page, int (*rmap_one)(struct page *,
 
 static inline int page_referenced(struct page *page, int is_locked,
 				  struct mem_cgroup *memcg,
-				  unsigned long *vm_flags)
+				  unsigned long *vm_flags,
+				  int *is_vrange)
 {
 	*vm_flags = 0;
+	*is_vrange = 0;
 	return 0;
 }
 
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 1701ce4..5907936 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -383,6 +383,7 @@ extern int swap_duplicate(swp_entry_t);
 extern int swapcache_prepare(swp_entry_t);
 extern void swap_free(swp_entry_t);
 extern void swapcache_free(swp_entry_t, struct page *page);
+extern int __free_swap_and_cache(swp_entry_t);
 extern int free_swap_and_cache(swp_entry_t);
 extern int swap_type_of(dev_t, sector_t, struct block_device **);
 extern unsigned int count_swap_pages(int, int);
diff --git a/include/linux/vrange.h b/include/linux/vrange.h
index a97ac25..cbb609a 100644
--- a/include/linux/vrange.h
+++ b/include/linux/vrange.h
@@ -37,6 +37,10 @@ extern int vrange_clear(struct vrange_root *vroot,
 extern void vrange_root_cleanup(struct vrange_root *vroot);
 extern int vrange_fork(struct mm_struct *new,
 					struct mm_struct *old);
+int discard_vpage(struct page *page);
+bool vrange_address(struct mm_struct *mm, unsigned long start,
+			unsigned long end);
+
 #else
 
 static inline void vrange_init(void) {};
@@ -47,5 +51,8 @@ static inline int vrange_fork(struct mm_struct *new, struct mm_struct *old)
 	return 0;
 }
 
+static inline bool vrange_address(struct mm_struct *mm, unsigned long start,
+		unsigned long end) { return false; };
+static inline int discard_vpage(struct page *page) { return 0 };
 #endif
 #endif /* _LINIUX_VRANGE_H */
diff --git a/mm/ksm.c b/mm/ksm.c
index b6afe0c..debc20c 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1932,7 +1932,7 @@ again:
 				continue;
 
 			referenced += page_referenced_one(page, vma,
-				rmap_item->address, &mapcount, vm_flags);
+				rmap_item->address, &mapcount, vm_flags, NULL);
 			if (!search_new_forks || !mapcount)
 				break;
 		}
diff --git a/mm/rmap.c b/mm/rmap.c
index 6280da8..5522522 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -57,6 +57,8 @@
 #include <linux/migrate.h>
 #include <linux/hugetlb.h>
 #include <linux/backing-dev.h>
+#include <linux/vrange.h>
+#include <linux/rmap.h>
 
 #include <asm/tlbflush.h>
 
@@ -523,8 +525,7 @@ __vma_address(struct page *page, struct vm_area_struct *vma)
 	return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 }
 
-inline unsigned long
-vma_address(struct page *page, struct vm_area_struct *vma)
+unsigned long vma_address(struct page *page, struct vm_area_struct *vma)
 {
 	unsigned long address = __vma_address(page, vma);
 
@@ -662,7 +663,7 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
  */
 int page_referenced_one(struct page *page, struct vm_area_struct *vma,
 			unsigned long address, unsigned int *mapcount,
-			unsigned long *vm_flags)
+			unsigned long *vm_flags, int *is_vrange)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	int referenced = 0;
@@ -724,6 +725,9 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
 				referenced++;
 		}
 		pte_unmap_unlock(pte, ptl);
+		if (is_vrange &&
+			vrange_address(mm, address, address + PAGE_SIZE - 1))
+			*is_vrange = 1;
 	}
 
 	(*mapcount)--;
@@ -736,7 +740,8 @@ out:
 
 static int page_referenced_anon(struct page *page,
 				struct mem_cgroup *memcg,
-				unsigned long *vm_flags)
+				unsigned long *vm_flags,
+				int *is_vrange)
 {
 	unsigned int mapcount;
 	struct anon_vma *anon_vma;
@@ -761,7 +766,7 @@ static int page_referenced_anon(struct page *page,
 		if (memcg && !mm_match_cgroup(vma->vm_mm, memcg))
 			continue;
 		referenced += page_referenced_one(page, vma, address,
-						  &mapcount, vm_flags);
+					&mapcount, vm_flags, is_vrange);
 		if (!mapcount)
 			break;
 	}
@@ -785,7 +790,9 @@ static int page_referenced_anon(struct page *page,
  */
 static int page_referenced_file(struct page *page,
 				struct mem_cgroup *memcg,
-				unsigned long *vm_flags)
+				unsigned long *vm_flags,
+				int *is_vrange)
+
 {
 	unsigned int mapcount;
 	struct address_space *mapping = page->mapping;
@@ -826,7 +833,8 @@ static int page_referenced_file(struct page *page,
 		if (memcg && !mm_match_cgroup(vma->vm_mm, memcg))
 			continue;
 		referenced += page_referenced_one(page, vma, address,
-						  &mapcount, vm_flags);
+							&mapcount, vm_flags,
+							is_vrange);
 		if (!mapcount)
 			break;
 	}
@@ -841,6 +849,7 @@ static int page_referenced_file(struct page *page,
  * @is_locked: caller holds lock on the page
  * @memcg: target memory cgroup
  * @vm_flags: collect encountered vma->vm_flags who actually referenced the page
+ * @is_vrange: the page in vrange of some process
  *
  * Quick test_and_clear_referenced for all mappings to a page,
  * returns the number of ptes which referenced the page.
@@ -848,7 +857,8 @@ static int page_referenced_file(struct page *page,
 int page_referenced(struct page *page,
 		    int is_locked,
 		    struct mem_cgroup *memcg,
-		    unsigned long *vm_flags)
+		    unsigned long *vm_flags,
+		    int *is_vrange)
 {
 	int referenced = 0;
 	int we_locked = 0;
@@ -867,10 +877,10 @@ int page_referenced(struct page *page,
 								vm_flags);
 		else if (PageAnon(page))
 			referenced += page_referenced_anon(page, memcg,
-								vm_flags);
+							vm_flags, is_vrange);
 		else if (page->mapping)
 			referenced += page_referenced_file(page, memcg,
-								vm_flags);
+							vm_flags, is_vrange);
 		if (we_locked)
 			unlock_page(page);
 
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 6c340d9..d41c63f 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -734,6 +734,42 @@ int try_to_free_swap(struct page *page)
 }
 
 /*
+ * It's almost same with free_swap_and_cache except page is already
+ * locked.
+ */
+int __free_swap_and_cache(swp_entry_t entry)
+{
+	struct swap_info_struct *p;
+	struct page *page = NULL;
+
+	if (non_swap_entry(entry))
+		return 1;
+
+	p = swap_info_get(entry);
+	if (p) {
+		if (swap_entry_free(p, entry, 1) == SWAP_HAS_CACHE) {
+			page = find_get_page(swap_address_space(entry),
+						entry.val);
+		}
+		spin_unlock(&swap_lock);
+	}
+
+	if (page) {
+		/*
+		 * Not mapped elsewhere, or swap space full? Free it!
+		 * Also recheck PageSwapCache now page is locked (above).
+		 */
+		if (PageSwapCache(page) && !PageWriteback(page) &&
+				(!page_mapped(page) || vm_swap_full())) {
+			delete_from_swap_cache(page);
+			SetPageDirty(page);
+		}
+		page_cache_release(page);
+	}
+	return p != NULL;
+}
+
+/*
  * Free the swap entry like above, but also try to
  * free the page cache entry if it is the last user.
  */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fa6a853..c75e0ac 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -43,6 +43,7 @@
 #include <linux/sysctl.h>
 #include <linux/oom.h>
 #include <linux/prefetch.h>
+#include <linux/vrange.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -611,6 +612,7 @@ enum page_references {
 	PAGEREF_RECLAIM,
 	PAGEREF_RECLAIM_CLEAN,
 	PAGEREF_KEEP,
+	PAGEREF_DISCARD,
 	PAGEREF_ACTIVATE,
 };
 
@@ -619,9 +621,10 @@ static enum page_references page_check_references(struct page *page,
 {
 	int referenced_ptes, referenced_page;
 	unsigned long vm_flags;
+	int is_vrange = 0;
 
 	referenced_ptes = page_referenced(page, 1, sc->target_mem_cgroup,
-					  &vm_flags);
+					  &vm_flags, &is_vrange);
 	referenced_page = TestClearPageReferenced(page);
 
 	/*
@@ -631,6 +634,12 @@ static enum page_references page_check_references(struct page *page,
 	if (vm_flags & VM_LOCKED)
 		return PAGEREF_RECLAIM;
 
+	/*
+	 * Bail out if the page is in vrange and try to discard.
+	 */
+	if (is_vrange)
+		return PAGEREF_DISCARD;
+
 	if (referenced_ptes) {
 		if (PageSwapBacked(page))
 			return PAGEREF_ACTIVATE;
@@ -769,6 +778,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			goto activate_locked;
 		case PAGEREF_KEEP:
 			goto keep_locked;
+		case PAGEREF_DISCARD:
+			if (discard_vpage(page))
+				goto free_it;
 		case PAGEREF_RECLAIM:
 		case PAGEREF_RECLAIM_CLEAN:
 			; /* try to reclaim the page below */
@@ -1497,7 +1509,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 		}
 
 		if (page_referenced(page, 0, sc->target_mem_cgroup,
-				    &vm_flags)) {
+				    &vm_flags, NULL)) {
 			nr_rotated += hpage_nr_pages(page);
 			/*
 			 * Identify referenced, file-backed active pages and
diff --git a/mm/vrange.c b/mm/vrange.c
index 5278939..1c8c447 100644
--- a/mm/vrange.c
+++ b/mm/vrange.c
@@ -6,6 +6,13 @@
 #include <linux/slab.h>
 #include <linux/mman.h>
 #include <linux/syscalls.h>
+#include <linux/pagemap.h>
+#include <linux/rmap.h>
+#include <linux/hugetlb.h>
+#include "internal.h"
+#include <linux/swap.h>
+#include <linux/swapops.h>
+#include <linux/mmu_notifier.h>
 
 static struct kmem_cache *vrange_cachep;
 
@@ -364,3 +371,328 @@ SYSCALL_DEFINE4(vrange, unsigned long, start,
 out:
 	return ret;
 }
+
+
+static bool __vrange_address(struct vrange_root *vroot,
+			unsigned long start, unsigned long end)
+{
+	struct interval_tree_node *node;
+
+	node = interval_tree_iter_first(&vroot->v_rb, start, end);
+	return node ? true : false;
+}
+
+bool vrange_address(struct mm_struct *mm,
+			unsigned long start, unsigned long end)
+{
+	struct vrange_root *vroot;
+	unsigned long vstart_idx, vend_idx;
+	struct vm_area_struct *vma;
+	bool ret;
+
+	vma = find_vma(mm, start);
+	if (vma->vm_file && (vma->vm_flags & VM_SHARED)) {
+		vroot = &vma->vm_file->f_mapping->vroot;
+		vstart_idx = vma->vm_pgoff + start - vma->vm_start;
+		vend_idx = vma->vm_pgoff + end - vma->vm_start;
+	} else {
+		vroot = &mm->vroot;
+		vstart_idx = start;
+		vend_idx = end;
+	}
+
+	vrange_lock(vroot);
+	ret = __vrange_address(vroot, vstart_idx, vend_idx);
+	vrange_unlock(vroot);
+	return ret;
+}
+
+static pte_t *__vpage_check_address(struct page *page,
+		struct mm_struct *mm, unsigned long address, spinlock_t **ptlp)
+{
+	pmd_t *pmd;
+	pte_t *pte;
+	spinlock_t *ptl;
+	bool present;
+
+	/* TODO : look into tlbfs */
+	if (unlikely(PageHuge(page)))
+		return NULL;
+
+	pmd = mm_find_pmd(mm, address);
+	if (!pmd)
+		return NULL;
+	/*
+	 * TODO : Support THP
+	 */
+	if (pmd_trans_huge(*pmd))
+		return NULL;
+
+	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
+	if (pte_none(*pte))
+		goto out;
+
+	present = pte_present(*pte);
+	if (present && page_to_pfn(page) != pte_pfn(*pte))
+		goto out;
+	else if (present) {
+		*ptlp = ptl;
+		return pte;
+	} else {
+		swp_entry_t entry = { .val = page_private(page) };
+
+		VM_BUG_ON(non_swap_entry(entry));
+		if (entry.val != pte_to_swp_entry(*pte).val)
+			goto out;
+		*ptlp = ptl;
+		return pte;
+	}
+out:
+	pte_unmap_unlock(pte, ptl);
+	return NULL;
+}
+
+/*
+ * This functions checks @page is matched with pte's encoded one
+ * which could be a page or swap slot.
+ */
+static inline pte_t *vpage_check_address(struct page *page,
+		struct mm_struct *mm, unsigned long address,
+		spinlock_t **ptlp)
+{
+	pte_t *ptep;
+	__cond_lock(*ptlp, ptep = __vpage_check_address(page,
+				mm, address, ptlp));
+	return ptep;
+}
+
+static void __vrange_purge(struct vrange_root *vroot,
+		unsigned long start, unsigned long end)
+{
+	struct vrange *range;
+	struct interval_tree_node *node;
+
+	node = interval_tree_iter_first(&vroot->v_rb, start, end);
+	while (node) {
+		range = container_of(node, struct vrange, node);
+		range->purged = true;
+		node = interval_tree_iter_next(node, start, end);
+	}
+}
+
+int try_to_discard_one(struct vrange_root *vroot, struct page *page,
+			struct vm_area_struct *vma, unsigned long address)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pte_t *pte;
+	pte_t pteval;
+	spinlock_t *ptl;
+	int ret = 0;
+	bool present;
+
+	VM_BUG_ON(!PageLocked(page));
+
+	vrange_lock(vroot);
+	pte = vpage_check_address(page, mm, address, &ptl);
+	if (!pte)
+		goto out;
+
+	if (vma->vm_flags & VM_LOCKED) {
+		pte_unmap_unlock(pte, ptl);
+		goto out;
+	}
+
+	present = pte_present(*pte);
+	flush_cache_page(vma, address, page_to_pfn(page));
+	pteval = ptep_clear_flush(vma, address, pte);
+
+	update_hiwater_rss(mm);
+	if (PageAnon(page))
+		dec_mm_counter(mm, MM_ANONPAGES);
+	else
+		dec_mm_counter(mm, MM_FILEPAGES);
+
+	page_remove_rmap(page);
+	page_cache_release(page);
+	if (!present) {
+		swp_entry_t entry = pte_to_swp_entry(*pte);
+		dec_mm_counter(mm, MM_SWAPENTS);
+		if (unlikely(!__free_swap_and_cache(entry)))
+			BUG_ON(1);
+	}
+
+	pte_unmap_unlock(pte, ptl);
+	mmu_notifier_invalidate_page(mm, address);
+	ret = 1;
+
+	if (!PageAnon(page)) /* switch to file offset) */
+		address = vma->vm_pgoff + address - vma->vm_start;
+
+	__vrange_purge(vroot, address, address + PAGE_SIZE - 1);
+
+out:
+	vrange_unlock(vroot);
+	return ret;
+}
+
+static int try_to_discard_anon_vpage(struct page *page)
+{
+	struct anon_vma *anon_vma;
+	struct anon_vma_chain *avc;
+	pgoff_t pgoff;
+	struct vm_area_struct *vma;
+	struct mm_struct *mm;
+	struct vrange_root *vroot;
+
+	unsigned long address;
+	bool ret = 0;
+
+	anon_vma = page_lock_anon_vma_read(page);
+	if (!anon_vma)
+		return ret;
+
+	pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff, pgoff) {
+		pte_t *pte;
+		spinlock_t *ptl;
+
+		vma = avc->vma;
+		mm = vma->vm_mm;
+		vroot = &mm->vroot;
+		address = vma_address(page, vma);
+
+		vrange_lock(vroot);
+		/*
+		 * We can't use page_check_address because it doesn't check
+		 * swap entry of the page table. We need the check because
+		 * we have to make sure atomicity of shared vrange.
+		 * It means all vranges which are shared a page should be
+		 * purged if a page in a process is purged.
+		 */
+		pte = vpage_check_address(page, mm, address, &ptl);
+		if (!pte) {
+			vrange_unlock(vroot);
+			continue;
+		}
+
+		if (vma->vm_flags & VM_LOCKED) {
+			pte_unmap_unlock(pte, ptl);
+			vrange_unlock(vroot);
+			goto out;
+		}
+
+		pte_unmap_unlock(pte, ptl);
+		if (!__vrange_address(vroot, address,
+					address + PAGE_SIZE - 1)) {
+			vrange_unlock(vroot);
+			goto out;
+		}
+
+		vrange_unlock(vroot);
+	}
+
+	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff, pgoff) {
+		vma = avc->vma;
+		mm = vma->vm_mm;
+		vroot = &mm->vroot;
+		address = vma_address(page, vma);
+		if (!try_to_discard_one(vroot, page, vma, address))
+			goto out;
+	}
+
+	ret = 1;
+out:
+	page_unlock_anon_vma_read(anon_vma);
+	return ret;
+}
+
+
+
+static int try_to_discard_file_vpage(struct page *page)
+{
+	struct address_space *mapping = page->mapping;
+	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+	struct vm_area_struct *vma;
+	bool ret = 0;
+
+	mutex_lock(&mapping->i_mmap_mutex);
+	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
+		unsigned long address = vma_address(page, vma);
+		struct mm_struct *mm = vma->vm_mm;
+		struct vrange_root *vroot = &mapping->vroot;
+		pte_t *pte;
+		spinlock_t *ptl;
+		long vstart_idx;
+
+
+		vstart_idx = vma->vm_pgoff + address - vma->vm_start;
+
+		vrange_lock(vroot);
+		/*
+		 * We can't use page_check_address because it doesn't check
+		 * swap entry of the page table. We need the check because
+		 * we have to make sure atomicity of shared vrange.
+		 * It means all vranges which are shared a page should be
+		 * purged if a page in a process is purged.
+		 */
+		pte = vpage_check_address(page, mm, address, &ptl);
+		if (!pte) {
+			vrange_unlock(vroot);
+			continue;
+		}
+
+		if (vma->vm_flags & VM_LOCKED) {
+			pte_unmap_unlock(pte, ptl);
+			vrange_unlock(vroot);
+			goto out;
+		}
+
+		pte_unmap_unlock(pte, ptl);
+		if (!__vrange_address(vroot, vstart_idx,
+					vstart_idx + PAGE_SIZE - 1)) {
+			vrange_unlock(vroot);
+			goto out;
+		}
+
+		vrange_unlock(vroot);
+	}
+
+	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
+		unsigned long address = vma_address(page, vma);
+		struct vrange_root *vroot = &mapping->vroot;
+
+		if (!try_to_discard_one(vroot, page, vma, address))
+			goto out;
+	}
+
+	ret = 1;
+out:
+	mutex_unlock(&mapping->i_mmap_mutex);
+	return ret;
+}
+
+static int try_to_discard_vpage(struct page *page)
+{
+	if (PageAnon(page))
+		return try_to_discard_anon_vpage(page);
+	return try_to_discard_file_vpage(page);
+}
+
+int discard_vpage(struct page *page)
+{
+	VM_BUG_ON(!PageLocked(page));
+	VM_BUG_ON(PageLRU(page));
+
+	if (try_to_discard_vpage(page)) {
+		if (PageSwapCache(page))
+			try_to_free_swap(page);
+
+		if (page_freeze_refs(page, 1)) {
+			unlock_page(page);
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 7/8] vrange: Add method to purge volatile ranges
@ 2013-06-12  4:22   ` John Stultz
  0 siblings, 0 replies; 48+ messages in thread
From: John Stultz @ 2013-06-12  4:22 UTC (permalink / raw)
  To: LKML
  Cc: Minchan Kim, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm, John Stultz

From: Minchan Kim <minchan@kernel.org>

This patch adds discarding function to purge volatile ranges under
memory pressure. Logic is as following:

1. Memory pressure happens
2. VM start to reclaim pages
3. Check the page is in volatile range.
4. If so, zap the page from the process's page table.
   (By semantic vrange(2), we should mark it with another one to
    make page fault when you try to access the address. It will
    be introduced later patch)
5. If page is unmapped from all processes, discard it instead of swapping.

This patch does not address the case where there is no swap, which
keeps anonymous pages from being aged off the LRUs. Minchan has
additional patches that add support for purging anonymous pages

XXX: First pass at file purging. Seems to work, but is likely broken
and needs close review.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Android Kernel Team <kernel-team@android.com>
Cc: Robert Love <rlove@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mike Hommey <mh@glandium.org>
Cc: Taras Glek <tglek@mozilla.com>
Cc: Dhaval Giani <dgiani@mozilla.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: linux-mm@kvack.org <linux-mm@kvack.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
[jstultz: Reworked to add purging of file pages, commit log tweaks]
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/rmap.h   |  12 +-
 include/linux/swap.h   |   1 +
 include/linux/vrange.h |   7 ++
 mm/ksm.c               |   2 +-
 mm/rmap.c              |  30 +++--
 mm/swapfile.c          |  36 ++++++
 mm/vmscan.c            |  16 ++-
 mm/vrange.c            | 332 +++++++++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 420 insertions(+), 16 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 6dacb93..6432dfb 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -83,6 +83,8 @@ enum ttu_flags {
 };
 
 #ifdef CONFIG_MMU
+unsigned long vma_address(struct page *page, struct vm_area_struct *vma);
+
 static inline void get_anon_vma(struct anon_vma *anon_vma)
 {
 	atomic_inc(&anon_vma->refcount);
@@ -182,9 +184,11 @@ static inline void page_dup_rmap(struct page *page)
  * Called from mm/vmscan.c to handle paging out
  */
 int page_referenced(struct page *, int is_locked,
-			struct mem_cgroup *memcg, unsigned long *vm_flags);
+			struct mem_cgroup *memcg, unsigned long *vm_flags,
+			int *is_vrange);
 int page_referenced_one(struct page *, struct vm_area_struct *,
-	unsigned long address, unsigned int *mapcount, unsigned long *vm_flags);
+	unsigned long address, unsigned int *mapcount, unsigned long *vm_flags,
+	int *is_vrange);
 
 #define TTU_ACTION(x) ((x) & TTU_ACTION_MASK)
 
@@ -249,9 +253,11 @@ int rmap_walk(struct page *page, int (*rmap_one)(struct page *,
 
 static inline int page_referenced(struct page *page, int is_locked,
 				  struct mem_cgroup *memcg,
-				  unsigned long *vm_flags)
+				  unsigned long *vm_flags,
+				  int *is_vrange)
 {
 	*vm_flags = 0;
+	*is_vrange = 0;
 	return 0;
 }
 
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 1701ce4..5907936 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -383,6 +383,7 @@ extern int swap_duplicate(swp_entry_t);
 extern int swapcache_prepare(swp_entry_t);
 extern void swap_free(swp_entry_t);
 extern void swapcache_free(swp_entry_t, struct page *page);
+extern int __free_swap_and_cache(swp_entry_t);
 extern int free_swap_and_cache(swp_entry_t);
 extern int swap_type_of(dev_t, sector_t, struct block_device **);
 extern unsigned int count_swap_pages(int, int);
diff --git a/include/linux/vrange.h b/include/linux/vrange.h
index a97ac25..cbb609a 100644
--- a/include/linux/vrange.h
+++ b/include/linux/vrange.h
@@ -37,6 +37,10 @@ extern int vrange_clear(struct vrange_root *vroot,
 extern void vrange_root_cleanup(struct vrange_root *vroot);
 extern int vrange_fork(struct mm_struct *new,
 					struct mm_struct *old);
+int discard_vpage(struct page *page);
+bool vrange_address(struct mm_struct *mm, unsigned long start,
+			unsigned long end);
+
 #else
 
 static inline void vrange_init(void) {};
@@ -47,5 +51,8 @@ static inline int vrange_fork(struct mm_struct *new, struct mm_struct *old)
 	return 0;
 }
 
+static inline bool vrange_address(struct mm_struct *mm, unsigned long start,
+		unsigned long end) { return false; };
+static inline int discard_vpage(struct page *page) { return 0 };
 #endif
 #endif /* _LINIUX_VRANGE_H */
diff --git a/mm/ksm.c b/mm/ksm.c
index b6afe0c..debc20c 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1932,7 +1932,7 @@ again:
 				continue;
 
 			referenced += page_referenced_one(page, vma,
-				rmap_item->address, &mapcount, vm_flags);
+				rmap_item->address, &mapcount, vm_flags, NULL);
 			if (!search_new_forks || !mapcount)
 				break;
 		}
diff --git a/mm/rmap.c b/mm/rmap.c
index 6280da8..5522522 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -57,6 +57,8 @@
 #include <linux/migrate.h>
 #include <linux/hugetlb.h>
 #include <linux/backing-dev.h>
+#include <linux/vrange.h>
+#include <linux/rmap.h>
 
 #include <asm/tlbflush.h>
 
@@ -523,8 +525,7 @@ __vma_address(struct page *page, struct vm_area_struct *vma)
 	return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 }
 
-inline unsigned long
-vma_address(struct page *page, struct vm_area_struct *vma)
+unsigned long vma_address(struct page *page, struct vm_area_struct *vma)
 {
 	unsigned long address = __vma_address(page, vma);
 
@@ -662,7 +663,7 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
  */
 int page_referenced_one(struct page *page, struct vm_area_struct *vma,
 			unsigned long address, unsigned int *mapcount,
-			unsigned long *vm_flags)
+			unsigned long *vm_flags, int *is_vrange)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	int referenced = 0;
@@ -724,6 +725,9 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
 				referenced++;
 		}
 		pte_unmap_unlock(pte, ptl);
+		if (is_vrange &&
+			vrange_address(mm, address, address + PAGE_SIZE - 1))
+			*is_vrange = 1;
 	}
 
 	(*mapcount)--;
@@ -736,7 +740,8 @@ out:
 
 static int page_referenced_anon(struct page *page,
 				struct mem_cgroup *memcg,
-				unsigned long *vm_flags)
+				unsigned long *vm_flags,
+				int *is_vrange)
 {
 	unsigned int mapcount;
 	struct anon_vma *anon_vma;
@@ -761,7 +766,7 @@ static int page_referenced_anon(struct page *page,
 		if (memcg && !mm_match_cgroup(vma->vm_mm, memcg))
 			continue;
 		referenced += page_referenced_one(page, vma, address,
-						  &mapcount, vm_flags);
+					&mapcount, vm_flags, is_vrange);
 		if (!mapcount)
 			break;
 	}
@@ -785,7 +790,9 @@ static int page_referenced_anon(struct page *page,
  */
 static int page_referenced_file(struct page *page,
 				struct mem_cgroup *memcg,
-				unsigned long *vm_flags)
+				unsigned long *vm_flags,
+				int *is_vrange)
+
 {
 	unsigned int mapcount;
 	struct address_space *mapping = page->mapping;
@@ -826,7 +833,8 @@ static int page_referenced_file(struct page *page,
 		if (memcg && !mm_match_cgroup(vma->vm_mm, memcg))
 			continue;
 		referenced += page_referenced_one(page, vma, address,
-						  &mapcount, vm_flags);
+							&mapcount, vm_flags,
+							is_vrange);
 		if (!mapcount)
 			break;
 	}
@@ -841,6 +849,7 @@ static int page_referenced_file(struct page *page,
  * @is_locked: caller holds lock on the page
  * @memcg: target memory cgroup
  * @vm_flags: collect encountered vma->vm_flags who actually referenced the page
+ * @is_vrange: the page in vrange of some process
  *
  * Quick test_and_clear_referenced for all mappings to a page,
  * returns the number of ptes which referenced the page.
@@ -848,7 +857,8 @@ static int page_referenced_file(struct page *page,
 int page_referenced(struct page *page,
 		    int is_locked,
 		    struct mem_cgroup *memcg,
-		    unsigned long *vm_flags)
+		    unsigned long *vm_flags,
+		    int *is_vrange)
 {
 	int referenced = 0;
 	int we_locked = 0;
@@ -867,10 +877,10 @@ int page_referenced(struct page *page,
 								vm_flags);
 		else if (PageAnon(page))
 			referenced += page_referenced_anon(page, memcg,
-								vm_flags);
+							vm_flags, is_vrange);
 		else if (page->mapping)
 			referenced += page_referenced_file(page, memcg,
-								vm_flags);
+							vm_flags, is_vrange);
 		if (we_locked)
 			unlock_page(page);
 
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 6c340d9..d41c63f 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -734,6 +734,42 @@ int try_to_free_swap(struct page *page)
 }
 
 /*
+ * It's almost same with free_swap_and_cache except page is already
+ * locked.
+ */
+int __free_swap_and_cache(swp_entry_t entry)
+{
+	struct swap_info_struct *p;
+	struct page *page = NULL;
+
+	if (non_swap_entry(entry))
+		return 1;
+
+	p = swap_info_get(entry);
+	if (p) {
+		if (swap_entry_free(p, entry, 1) == SWAP_HAS_CACHE) {
+			page = find_get_page(swap_address_space(entry),
+						entry.val);
+		}
+		spin_unlock(&swap_lock);
+	}
+
+	if (page) {
+		/*
+		 * Not mapped elsewhere, or swap space full? Free it!
+		 * Also recheck PageSwapCache now page is locked (above).
+		 */
+		if (PageSwapCache(page) && !PageWriteback(page) &&
+				(!page_mapped(page) || vm_swap_full())) {
+			delete_from_swap_cache(page);
+			SetPageDirty(page);
+		}
+		page_cache_release(page);
+	}
+	return p != NULL;
+}
+
+/*
  * Free the swap entry like above, but also try to
  * free the page cache entry if it is the last user.
  */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fa6a853..c75e0ac 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -43,6 +43,7 @@
 #include <linux/sysctl.h>
 #include <linux/oom.h>
 #include <linux/prefetch.h>
+#include <linux/vrange.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -611,6 +612,7 @@ enum page_references {
 	PAGEREF_RECLAIM,
 	PAGEREF_RECLAIM_CLEAN,
 	PAGEREF_KEEP,
+	PAGEREF_DISCARD,
 	PAGEREF_ACTIVATE,
 };
 
@@ -619,9 +621,10 @@ static enum page_references page_check_references(struct page *page,
 {
 	int referenced_ptes, referenced_page;
 	unsigned long vm_flags;
+	int is_vrange = 0;
 
 	referenced_ptes = page_referenced(page, 1, sc->target_mem_cgroup,
-					  &vm_flags);
+					  &vm_flags, &is_vrange);
 	referenced_page = TestClearPageReferenced(page);
 
 	/*
@@ -631,6 +634,12 @@ static enum page_references page_check_references(struct page *page,
 	if (vm_flags & VM_LOCKED)
 		return PAGEREF_RECLAIM;
 
+	/*
+	 * Bail out if the page is in vrange and try to discard.
+	 */
+	if (is_vrange)
+		return PAGEREF_DISCARD;
+
 	if (referenced_ptes) {
 		if (PageSwapBacked(page))
 			return PAGEREF_ACTIVATE;
@@ -769,6 +778,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			goto activate_locked;
 		case PAGEREF_KEEP:
 			goto keep_locked;
+		case PAGEREF_DISCARD:
+			if (discard_vpage(page))
+				goto free_it;
 		case PAGEREF_RECLAIM:
 		case PAGEREF_RECLAIM_CLEAN:
 			; /* try to reclaim the page below */
@@ -1497,7 +1509,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 		}
 
 		if (page_referenced(page, 0, sc->target_mem_cgroup,
-				    &vm_flags)) {
+				    &vm_flags, NULL)) {
 			nr_rotated += hpage_nr_pages(page);
 			/*
 			 * Identify referenced, file-backed active pages and
diff --git a/mm/vrange.c b/mm/vrange.c
index 5278939..1c8c447 100644
--- a/mm/vrange.c
+++ b/mm/vrange.c
@@ -6,6 +6,13 @@
 #include <linux/slab.h>
 #include <linux/mman.h>
 #include <linux/syscalls.h>
+#include <linux/pagemap.h>
+#include <linux/rmap.h>
+#include <linux/hugetlb.h>
+#include "internal.h"
+#include <linux/swap.h>
+#include <linux/swapops.h>
+#include <linux/mmu_notifier.h>
 
 static struct kmem_cache *vrange_cachep;
 
@@ -364,3 +371,328 @@ SYSCALL_DEFINE4(vrange, unsigned long, start,
 out:
 	return ret;
 }
+
+
+static bool __vrange_address(struct vrange_root *vroot,
+			unsigned long start, unsigned long end)
+{
+	struct interval_tree_node *node;
+
+	node = interval_tree_iter_first(&vroot->v_rb, start, end);
+	return node ? true : false;
+}
+
+bool vrange_address(struct mm_struct *mm,
+			unsigned long start, unsigned long end)
+{
+	struct vrange_root *vroot;
+	unsigned long vstart_idx, vend_idx;
+	struct vm_area_struct *vma;
+	bool ret;
+
+	vma = find_vma(mm, start);
+	if (vma->vm_file && (vma->vm_flags & VM_SHARED)) {
+		vroot = &vma->vm_file->f_mapping->vroot;
+		vstart_idx = vma->vm_pgoff + start - vma->vm_start;
+		vend_idx = vma->vm_pgoff + end - vma->vm_start;
+	} else {
+		vroot = &mm->vroot;
+		vstart_idx = start;
+		vend_idx = end;
+	}
+
+	vrange_lock(vroot);
+	ret = __vrange_address(vroot, vstart_idx, vend_idx);
+	vrange_unlock(vroot);
+	return ret;
+}
+
+static pte_t *__vpage_check_address(struct page *page,
+		struct mm_struct *mm, unsigned long address, spinlock_t **ptlp)
+{
+	pmd_t *pmd;
+	pte_t *pte;
+	spinlock_t *ptl;
+	bool present;
+
+	/* TODO : look into tlbfs */
+	if (unlikely(PageHuge(page)))
+		return NULL;
+
+	pmd = mm_find_pmd(mm, address);
+	if (!pmd)
+		return NULL;
+	/*
+	 * TODO : Support THP
+	 */
+	if (pmd_trans_huge(*pmd))
+		return NULL;
+
+	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
+	if (pte_none(*pte))
+		goto out;
+
+	present = pte_present(*pte);
+	if (present && page_to_pfn(page) != pte_pfn(*pte))
+		goto out;
+	else if (present) {
+		*ptlp = ptl;
+		return pte;
+	} else {
+		swp_entry_t entry = { .val = page_private(page) };
+
+		VM_BUG_ON(non_swap_entry(entry));
+		if (entry.val != pte_to_swp_entry(*pte).val)
+			goto out;
+		*ptlp = ptl;
+		return pte;
+	}
+out:
+	pte_unmap_unlock(pte, ptl);
+	return NULL;
+}
+
+/*
+ * This functions checks @page is matched with pte's encoded one
+ * which could be a page or swap slot.
+ */
+static inline pte_t *vpage_check_address(struct page *page,
+		struct mm_struct *mm, unsigned long address,
+		spinlock_t **ptlp)
+{
+	pte_t *ptep;
+	__cond_lock(*ptlp, ptep = __vpage_check_address(page,
+				mm, address, ptlp));
+	return ptep;
+}
+
+static void __vrange_purge(struct vrange_root *vroot,
+		unsigned long start, unsigned long end)
+{
+	struct vrange *range;
+	struct interval_tree_node *node;
+
+	node = interval_tree_iter_first(&vroot->v_rb, start, end);
+	while (node) {
+		range = container_of(node, struct vrange, node);
+		range->purged = true;
+		node = interval_tree_iter_next(node, start, end);
+	}
+}
+
+int try_to_discard_one(struct vrange_root *vroot, struct page *page,
+			struct vm_area_struct *vma, unsigned long address)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pte_t *pte;
+	pte_t pteval;
+	spinlock_t *ptl;
+	int ret = 0;
+	bool present;
+
+	VM_BUG_ON(!PageLocked(page));
+
+	vrange_lock(vroot);
+	pte = vpage_check_address(page, mm, address, &ptl);
+	if (!pte)
+		goto out;
+
+	if (vma->vm_flags & VM_LOCKED) {
+		pte_unmap_unlock(pte, ptl);
+		goto out;
+	}
+
+	present = pte_present(*pte);
+	flush_cache_page(vma, address, page_to_pfn(page));
+	pteval = ptep_clear_flush(vma, address, pte);
+
+	update_hiwater_rss(mm);
+	if (PageAnon(page))
+		dec_mm_counter(mm, MM_ANONPAGES);
+	else
+		dec_mm_counter(mm, MM_FILEPAGES);
+
+	page_remove_rmap(page);
+	page_cache_release(page);
+	if (!present) {
+		swp_entry_t entry = pte_to_swp_entry(*pte);
+		dec_mm_counter(mm, MM_SWAPENTS);
+		if (unlikely(!__free_swap_and_cache(entry)))
+			BUG_ON(1);
+	}
+
+	pte_unmap_unlock(pte, ptl);
+	mmu_notifier_invalidate_page(mm, address);
+	ret = 1;
+
+	if (!PageAnon(page)) /* switch to file offset) */
+		address = vma->vm_pgoff + address - vma->vm_start;
+
+	__vrange_purge(vroot, address, address + PAGE_SIZE - 1);
+
+out:
+	vrange_unlock(vroot);
+	return ret;
+}
+
+static int try_to_discard_anon_vpage(struct page *page)
+{
+	struct anon_vma *anon_vma;
+	struct anon_vma_chain *avc;
+	pgoff_t pgoff;
+	struct vm_area_struct *vma;
+	struct mm_struct *mm;
+	struct vrange_root *vroot;
+
+	unsigned long address;
+	bool ret = 0;
+
+	anon_vma = page_lock_anon_vma_read(page);
+	if (!anon_vma)
+		return ret;
+
+	pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff, pgoff) {
+		pte_t *pte;
+		spinlock_t *ptl;
+
+		vma = avc->vma;
+		mm = vma->vm_mm;
+		vroot = &mm->vroot;
+		address = vma_address(page, vma);
+
+		vrange_lock(vroot);
+		/*
+		 * We can't use page_check_address because it doesn't check
+		 * swap entry of the page table. We need the check because
+		 * we have to make sure atomicity of shared vrange.
+		 * It means all vranges which are shared a page should be
+		 * purged if a page in a process is purged.
+		 */
+		pte = vpage_check_address(page, mm, address, &ptl);
+		if (!pte) {
+			vrange_unlock(vroot);
+			continue;
+		}
+
+		if (vma->vm_flags & VM_LOCKED) {
+			pte_unmap_unlock(pte, ptl);
+			vrange_unlock(vroot);
+			goto out;
+		}
+
+		pte_unmap_unlock(pte, ptl);
+		if (!__vrange_address(vroot, address,
+					address + PAGE_SIZE - 1)) {
+			vrange_unlock(vroot);
+			goto out;
+		}
+
+		vrange_unlock(vroot);
+	}
+
+	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff, pgoff) {
+		vma = avc->vma;
+		mm = vma->vm_mm;
+		vroot = &mm->vroot;
+		address = vma_address(page, vma);
+		if (!try_to_discard_one(vroot, page, vma, address))
+			goto out;
+	}
+
+	ret = 1;
+out:
+	page_unlock_anon_vma_read(anon_vma);
+	return ret;
+}
+
+
+
+static int try_to_discard_file_vpage(struct page *page)
+{
+	struct address_space *mapping = page->mapping;
+	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+	struct vm_area_struct *vma;
+	bool ret = 0;
+
+	mutex_lock(&mapping->i_mmap_mutex);
+	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
+		unsigned long address = vma_address(page, vma);
+		struct mm_struct *mm = vma->vm_mm;
+		struct vrange_root *vroot = &mapping->vroot;
+		pte_t *pte;
+		spinlock_t *ptl;
+		long vstart_idx;
+
+
+		vstart_idx = vma->vm_pgoff + address - vma->vm_start;
+
+		vrange_lock(vroot);
+		/*
+		 * We can't use page_check_address because it doesn't check
+		 * swap entry of the page table. We need the check because
+		 * we have to make sure atomicity of shared vrange.
+		 * It means all vranges which are shared a page should be
+		 * purged if a page in a process is purged.
+		 */
+		pte = vpage_check_address(page, mm, address, &ptl);
+		if (!pte) {
+			vrange_unlock(vroot);
+			continue;
+		}
+
+		if (vma->vm_flags & VM_LOCKED) {
+			pte_unmap_unlock(pte, ptl);
+			vrange_unlock(vroot);
+			goto out;
+		}
+
+		pte_unmap_unlock(pte, ptl);
+		if (!__vrange_address(vroot, vstart_idx,
+					vstart_idx + PAGE_SIZE - 1)) {
+			vrange_unlock(vroot);
+			goto out;
+		}
+
+		vrange_unlock(vroot);
+	}
+
+	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
+		unsigned long address = vma_address(page, vma);
+		struct vrange_root *vroot = &mapping->vroot;
+
+		if (!try_to_discard_one(vroot, page, vma, address))
+			goto out;
+	}
+
+	ret = 1;
+out:
+	mutex_unlock(&mapping->i_mmap_mutex);
+	return ret;
+}
+
+static int try_to_discard_vpage(struct page *page)
+{
+	if (PageAnon(page))
+		return try_to_discard_anon_vpage(page);
+	return try_to_discard_file_vpage(page);
+}
+
+int discard_vpage(struct page *page)
+{
+	VM_BUG_ON(!PageLocked(page));
+	VM_BUG_ON(PageLRU(page));
+
+	if (try_to_discard_vpage(page)) {
+		if (PageSwapCache(page))
+			try_to_free_swap(page);
+
+		if (page_freeze_refs(page, 1)) {
+			unlock_page(page);
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
-- 
1.8.1.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 8/8] vrange: Send SIGBUS when user try to access purged page
  2013-06-12  4:22 ` John Stultz
@ 2013-06-12  4:22   ` John Stultz
  -1 siblings, 0 replies; 48+ messages in thread
From: John Stultz @ 2013-06-12  4:22 UTC (permalink / raw)
  To: LKML
  Cc: Minchan Kim, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm, John Stultz

From: Minchan Kim <minchan@kernel.org>

By vrange(2) semantic, user should see SIGBUG if he try to access
purged page without vrange(...VRANGE_NOVOLATILE).

This patch implements it.

XXX: I reused PSE bit for quick prototype without enough considering
so need time to see what's empty bit and I am surely missing
many places to handle vrange pte bit. I should investigate all of
pte handling places, especially pte_none case.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Android Kernel Team <kernel-team@android.com>
Cc: Robert Love <rlove@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mike Hommey <mh@glandium.org>
Cc: Taras Glek <tglek@mozilla.com>
Cc: Dhaval Giani <dgiani@mozilla.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: linux-mm@kvack.org <linux-mm@kvack.org>

Signed-off-by: Minchan Kim <minchan@kernel.org>
[jstultz: Extended to work with file pages]
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 arch/x86/include/asm/pgtable_types.h |  2 ++
 include/asm-generic/pgtable.h        | 11 +++++++++++
 include/linux/vrange.h               |  2 ++
 mm/memory.c                          | 23 +++++++++++++++++++++--
 mm/vrange.c                          | 35 ++++++++++++++++++++++++++++++++++-
 5 files changed, 70 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index e642300..d7ea6a0 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -64,6 +64,8 @@
 #define _PAGE_FILE	(_AT(pteval_t, 1) << _PAGE_BIT_FILE)
 #define _PAGE_PROTNONE	(_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
 
+#define _PAGE_VRANGE	_PAGE_BIT_PSE
+
 /*
  * _PAGE_NUMA indicates that this page will trigger a numa hinting
  * minor page fault to gather numa placement statistics (see
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index a59ff51..91e8f6f 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -479,6 +479,17 @@ static inline unsigned long my_zero_pfn(unsigned long addr)
 
 #ifdef CONFIG_MMU
 
+static inline pte_t pte_mkvrange(pte_t pte)
+{
+	pte = pte_set_flags(pte, _PAGE_VRANGE);
+	return pte_clear_flags(pte, _PAGE_PRESENT);
+}
+
+static inline int pte_vrange(pte_t pte)
+{
+	return ((pte_flags(pte) | _PAGE_PRESENT) == _PAGE_VRANGE);
+}
+
 #ifndef CONFIG_TRANSPARENT_HUGEPAGE
 static inline int pmd_trans_huge(pmd_t pmd)
 {
diff --git a/include/linux/vrange.h b/include/linux/vrange.h
index cbb609a..75754d1 100644
--- a/include/linux/vrange.h
+++ b/include/linux/vrange.h
@@ -41,6 +41,8 @@ int discard_vpage(struct page *page);
 bool vrange_address(struct mm_struct *mm, unsigned long start,
 			unsigned long end);
 
+extern bool is_purged_vrange(struct mm_struct *mm, unsigned long address);
+
 #else
 
 static inline void vrange_init(void) {};
diff --git a/mm/memory.c b/mm/memory.c
index 61a262b..cc5c70b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -59,6 +59,7 @@
 #include <linux/gfp.h>
 #include <linux/migrate.h>
 #include <linux/string.h>
+#include <linux/vrange.h>
 
 #include <asm/io.h>
 #include <asm/pgalloc.h>
@@ -832,7 +833,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 
 	/* pte contains position in swap or file, so copy. */
 	if (unlikely(!pte_present(pte))) {
-		if (!pte_file(pte)) {
+		if (!pte_file(pte) && !pte_vrange(pte)) {
 			swp_entry_t entry = pte_to_swp_entry(pte);
 
 			if (swap_duplicate(entry) < 0)
@@ -1172,7 +1173,7 @@ again:
 		if (pte_file(ptent)) {
 			if (unlikely(!(vma->vm_flags & VM_NONLINEAR)))
 				print_bad_pte(vma, addr, ptent, NULL);
-		} else {
+		} else if (!pte_vrange(ptent)) {
 			swp_entry_t entry = pte_to_swp_entry(ptent);
 
 			if (!non_swap_entry(entry))
@@ -3707,9 +3708,27 @@ int handle_pte_fault(struct mm_struct *mm,
 					return do_linear_fault(mm, vma, address,
 						pte, pmd, flags, entry);
 			}
+anon:
 			return do_anonymous_page(mm, vma, address,
 						 pte, pmd, flags);
 		}
+
+		if (unlikely(pte_vrange(entry))) {
+			if (!is_purged_vrange(mm, address)) {
+				/* zap pte */
+				ptl = pte_lockptr(mm, pmd);
+				spin_lock(ptl);
+				if (unlikely(!pte_same(*pte, entry)))
+					goto unlock;
+				flush_cache_page(vma, address, pte_pfn(*pte));
+				ptep_clear_flush(vma, address, pte);
+				pte_unmap_unlock(pte, ptl);
+				goto anon;
+			}
+
+			return VM_FAULT_SIGBUS;
+		}
+
 		if (pte_file(entry))
 			return do_nonlinear_fault(mm, vma, address,
 					pte, pmd, flags, entry);
diff --git a/mm/vrange.c b/mm/vrange.c
index 1c8c447..fa965fb 100644
--- a/mm/vrange.c
+++ b/mm/vrange.c
@@ -504,7 +504,9 @@ int try_to_discard_one(struct vrange_root *vroot, struct page *page,
 
 	present = pte_present(*pte);
 	flush_cache_page(vma, address, page_to_pfn(page));
-	pteval = ptep_clear_flush(vma, address, pte);
+
+	ptep_clear_flush(vma, address, pte);
+	pteval = pte_mkvrange(*pte);
 
 	update_hiwater_rss(mm);
 	if (PageAnon(page))
@@ -521,6 +523,7 @@ int try_to_discard_one(struct vrange_root *vroot, struct page *page,
 			BUG_ON(1);
 	}
 
+	set_pte_at(mm, address, pte, pteval);
 	pte_unmap_unlock(pte, ptl);
 	mmu_notifier_invalidate_page(mm, address);
 	ret = 1;
@@ -696,3 +699,33 @@ int discard_vpage(struct page *page)
 	return 0;
 }
 
+bool is_purged_vrange(struct mm_struct *mm, unsigned long address)
+{
+	struct vrange_root *vroot;
+	struct interval_tree_node *node;
+	struct vrange *range;
+	unsigned long vstart_idx;
+	struct vm_area_struct *vma;
+	bool ret = false;
+
+	vma = find_vma(mm, address);
+	if (vma->vm_file && (vma->vm_flags & VM_SHARED)) {
+		vroot = &vma->vm_file->f_mapping->vroot;
+		vstart_idx = vma->vm_pgoff + address - vma->vm_start;
+	} else {
+		vroot = &mm->vroot;
+		vstart_idx = address;
+	}
+
+	vrange_lock(vroot);
+	node = interval_tree_iter_first(&vroot->v_rb, vstart_idx,
+						vstart_idx + PAGE_SIZE - 1);
+	if (node) {
+		range = container_of(node, struct vrange, node);
+		if (range->purged)
+			ret = true;
+	}
+	vrange_unlock(vroot);
+	return ret;
+}
+
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 8/8] vrange: Send SIGBUS when user try to access purged page
@ 2013-06-12  4:22   ` John Stultz
  0 siblings, 0 replies; 48+ messages in thread
From: John Stultz @ 2013-06-12  4:22 UTC (permalink / raw)
  To: LKML
  Cc: Minchan Kim, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm, John Stultz

From: Minchan Kim <minchan@kernel.org>

By vrange(2) semantic, user should see SIGBUG if he try to access
purged page without vrange(...VRANGE_NOVOLATILE).

This patch implements it.

XXX: I reused PSE bit for quick prototype without enough considering
so need time to see what's empty bit and I am surely missing
many places to handle vrange pte bit. I should investigate all of
pte handling places, especially pte_none case.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Android Kernel Team <kernel-team@android.com>
Cc: Robert Love <rlove@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mike Hommey <mh@glandium.org>
Cc: Taras Glek <tglek@mozilla.com>
Cc: Dhaval Giani <dgiani@mozilla.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: linux-mm@kvack.org <linux-mm@kvack.org>

Signed-off-by: Minchan Kim <minchan@kernel.org>
[jstultz: Extended to work with file pages]
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 arch/x86/include/asm/pgtable_types.h |  2 ++
 include/asm-generic/pgtable.h        | 11 +++++++++++
 include/linux/vrange.h               |  2 ++
 mm/memory.c                          | 23 +++++++++++++++++++++--
 mm/vrange.c                          | 35 ++++++++++++++++++++++++++++++++++-
 5 files changed, 70 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index e642300..d7ea6a0 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -64,6 +64,8 @@
 #define _PAGE_FILE	(_AT(pteval_t, 1) << _PAGE_BIT_FILE)
 #define _PAGE_PROTNONE	(_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
 
+#define _PAGE_VRANGE	_PAGE_BIT_PSE
+
 /*
  * _PAGE_NUMA indicates that this page will trigger a numa hinting
  * minor page fault to gather numa placement statistics (see
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index a59ff51..91e8f6f 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -479,6 +479,17 @@ static inline unsigned long my_zero_pfn(unsigned long addr)
 
 #ifdef CONFIG_MMU
 
+static inline pte_t pte_mkvrange(pte_t pte)
+{
+	pte = pte_set_flags(pte, _PAGE_VRANGE);
+	return pte_clear_flags(pte, _PAGE_PRESENT);
+}
+
+static inline int pte_vrange(pte_t pte)
+{
+	return ((pte_flags(pte) | _PAGE_PRESENT) == _PAGE_VRANGE);
+}
+
 #ifndef CONFIG_TRANSPARENT_HUGEPAGE
 static inline int pmd_trans_huge(pmd_t pmd)
 {
diff --git a/include/linux/vrange.h b/include/linux/vrange.h
index cbb609a..75754d1 100644
--- a/include/linux/vrange.h
+++ b/include/linux/vrange.h
@@ -41,6 +41,8 @@ int discard_vpage(struct page *page);
 bool vrange_address(struct mm_struct *mm, unsigned long start,
 			unsigned long end);
 
+extern bool is_purged_vrange(struct mm_struct *mm, unsigned long address);
+
 #else
 
 static inline void vrange_init(void) {};
diff --git a/mm/memory.c b/mm/memory.c
index 61a262b..cc5c70b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -59,6 +59,7 @@
 #include <linux/gfp.h>
 #include <linux/migrate.h>
 #include <linux/string.h>
+#include <linux/vrange.h>
 
 #include <asm/io.h>
 #include <asm/pgalloc.h>
@@ -832,7 +833,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 
 	/* pte contains position in swap or file, so copy. */
 	if (unlikely(!pte_present(pte))) {
-		if (!pte_file(pte)) {
+		if (!pte_file(pte) && !pte_vrange(pte)) {
 			swp_entry_t entry = pte_to_swp_entry(pte);
 
 			if (swap_duplicate(entry) < 0)
@@ -1172,7 +1173,7 @@ again:
 		if (pte_file(ptent)) {
 			if (unlikely(!(vma->vm_flags & VM_NONLINEAR)))
 				print_bad_pte(vma, addr, ptent, NULL);
-		} else {
+		} else if (!pte_vrange(ptent)) {
 			swp_entry_t entry = pte_to_swp_entry(ptent);
 
 			if (!non_swap_entry(entry))
@@ -3707,9 +3708,27 @@ int handle_pte_fault(struct mm_struct *mm,
 					return do_linear_fault(mm, vma, address,
 						pte, pmd, flags, entry);
 			}
+anon:
 			return do_anonymous_page(mm, vma, address,
 						 pte, pmd, flags);
 		}
+
+		if (unlikely(pte_vrange(entry))) {
+			if (!is_purged_vrange(mm, address)) {
+				/* zap pte */
+				ptl = pte_lockptr(mm, pmd);
+				spin_lock(ptl);
+				if (unlikely(!pte_same(*pte, entry)))
+					goto unlock;
+				flush_cache_page(vma, address, pte_pfn(*pte));
+				ptep_clear_flush(vma, address, pte);
+				pte_unmap_unlock(pte, ptl);
+				goto anon;
+			}
+
+			return VM_FAULT_SIGBUS;
+		}
+
 		if (pte_file(entry))
 			return do_nonlinear_fault(mm, vma, address,
 					pte, pmd, flags, entry);
diff --git a/mm/vrange.c b/mm/vrange.c
index 1c8c447..fa965fb 100644
--- a/mm/vrange.c
+++ b/mm/vrange.c
@@ -504,7 +504,9 @@ int try_to_discard_one(struct vrange_root *vroot, struct page *page,
 
 	present = pte_present(*pte);
 	flush_cache_page(vma, address, page_to_pfn(page));
-	pteval = ptep_clear_flush(vma, address, pte);
+
+	ptep_clear_flush(vma, address, pte);
+	pteval = pte_mkvrange(*pte);
 
 	update_hiwater_rss(mm);
 	if (PageAnon(page))
@@ -521,6 +523,7 @@ int try_to_discard_one(struct vrange_root *vroot, struct page *page,
 			BUG_ON(1);
 	}
 
+	set_pte_at(mm, address, pte, pteval);
 	pte_unmap_unlock(pte, ptl);
 	mmu_notifier_invalidate_page(mm, address);
 	ret = 1;
@@ -696,3 +699,33 @@ int discard_vpage(struct page *page)
 	return 0;
 }
 
+bool is_purged_vrange(struct mm_struct *mm, unsigned long address)
+{
+	struct vrange_root *vroot;
+	struct interval_tree_node *node;
+	struct vrange *range;
+	unsigned long vstart_idx;
+	struct vm_area_struct *vma;
+	bool ret = false;
+
+	vma = find_vma(mm, address);
+	if (vma->vm_file && (vma->vm_flags & VM_SHARED)) {
+		vroot = &vma->vm_file->f_mapping->vroot;
+		vstart_idx = vma->vm_pgoff + address - vma->vm_start;
+	} else {
+		vroot = &mm->vroot;
+		vstart_idx = address;
+	}
+
+	vrange_lock(vroot);
+	node = interval_tree_iter_first(&vroot->v_rb, vstart_idx,
+						vstart_idx + PAGE_SIZE - 1);
+	if (node) {
+		range = container_of(node, struct vrange, node);
+		if (range->purged)
+			ret = true;
+	}
+	vrange_unlock(vroot);
+	return ret;
+}
+
-- 
1.8.1.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/8] vrange: Add new vrange(2) system call
  2013-06-12  4:22   ` John Stultz
  (?)
@ 2013-06-12  6:48   ` NeilBrown
  2013-06-12 18:47       ` John Stultz
  -1 siblings, 1 reply; 48+ messages in thread
From: NeilBrown @ 2013-06-12  6:48 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Minchan Kim, Andrew Morton, Android Kernel Team,
	Robert Love, Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Andrea Righi, Andrea Arcangeli,
	Aneesh Kumar K.V, Mike Hommey, Taras Glek, Dhaval Giani,
	Jan Kara, KOSAKI Motohiro, Michel Lespinasse, linux-mm

[-- Attachment #1: Type: text/plain, Size: 2007 bytes --]

On Tue, 11 Jun 2013 21:22:48 -0700 John Stultz <john.stultz@linaro.org> wrote:

> From: Minchan Kim <minchan@kernel.org>
> 
> This patch adds new system call sys_vrange.
> 
> NAME
> 	vrange - Mark or unmark range of memory as volatile
> 
> SYNOPSIS
> 	int vrange(unsigned_long start, size_t length, int mode,
> 			 int *purged);
> 
...
> 
> 	purged: Pointer to an integer which will return 1 if
> 	mode == VRANGE_NONVOLATILE and any page in the affected range
> 	was purged. If purged returns zero during a mode ==
> 	VRANGE_NONVOLATILE call, it means all of the pages in the range
> 	are intact.

This seems a bit ambiguous.
It is clear that the pointed-to location will be set to '1' if any part of
the range was purged, but it is not clear what will happen if it wasn't
purged.
The mention of 'returns zero' seems to suggest that it might set the location
to '0' in that case, but that isn't obvious to me.  The code appear to always
set it - that should be explicit.

Also, should the location be a fixed number of bytes to reduce possible
issues with N-bit userspace on M-bit kernels?

May I suggest:

        purge:  If not NULL, a pointer to a 32bit location which will be set
        to 1 if mode == VRANGE_NONVOLATILE and any page in the affected range
        was purged, and will be set to 0 in all other cases (including
        if mode == VRANGE_VOLATILE).


I don't think any further explanation is needed.


> +	if (purged) {
> +		/* Test pointer is valid before making any changes */
> +		if (put_user(p, purged))
> +			return -EFAULT;
> +	}
> +
> +	ret = do_vrange(mm, start, end - 1, mode, &p);
> +
> +	if (purged) {
> +		if (put_user(p, purged)) {
> +			/*
> +			 * This would be bad, since we've modified volatilty
> +			 * and the change in purged state would be lost.
> +			 */
> +			BUG();
> +		}
> +	}

I agree that would be bad, but I don't think a BUG() is called for.  Maybe a
WARN, and certainly a "return -EFAULT;"

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/8] vrange: Add new vrange(2) system call
  2013-06-12  6:48   ` NeilBrown
@ 2013-06-12 18:47       ` John Stultz
  0 siblings, 0 replies; 48+ messages in thread
From: John Stultz @ 2013-06-12 18:47 UTC (permalink / raw)
  To: NeilBrown
  Cc: LKML, Minchan Kim, Andrew Morton, Android Kernel Team,
	Robert Love, Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Andrea Righi, Andrea Arcangeli,
	Aneesh Kumar K.V, Mike Hommey, Taras Glek, Dhaval Giani,
	Jan Kara, KOSAKI Motohiro, Michel Lespinasse, linux-mm

On 06/11/2013 11:48 PM, NeilBrown wrote:
> On Tue, 11 Jun 2013 21:22:48 -0700 John Stultz <john.stultz@linaro.org> wrote:
>
>> From: Minchan Kim <minchan@kernel.org>
>>
>> This patch adds new system call sys_vrange.
>>
>> NAME
>> 	vrange - Mark or unmark range of memory as volatile
>>
>> SYNOPSIS
>> 	int vrange(unsigned_long start, size_t length, int mode,
>> 			 int *purged);
>>
> ...
>> 	purged: Pointer to an integer which will return 1 if
>> 	mode == VRANGE_NONVOLATILE and any page in the affected range
>> 	was purged. If purged returns zero during a mode ==
>> 	VRANGE_NONVOLATILE call, it means all of the pages in the range
>> 	are intact.
> This seems a bit ambiguous.
> It is clear that the pointed-to location will be set to '1' if any part of
> the range was purged, but it is not clear what will happen if it wasn't
> purged.
> The mention of 'returns zero' seems to suggest that it might set the location
> to '0' in that case, but that isn't obvious to me.  The code appear to always
> set it - that should be explicit.
>
> Also, should the location be a fixed number of bytes to reduce possible
> issues with N-bit userspace on M-bit kernels?
>
> May I suggest:
>
>          purge:  If not NULL, a pointer to a 32bit location which will be set
>          to 1 if mode == VRANGE_NONVOLATILE and any page in the affected range
>          was purged, and will be set to 0 in all other cases (including
>          if mode == VRANGE_VOLATILE).
>
>
> I don't think any further explanation is needed.

I'll use this! Thanks for the suggestion!


>> +	if (purged) {
>> +		/* Test pointer is valid before making any changes */
>> +		if (put_user(p, purged))
>> +			return -EFAULT;
>> +	}
>> +
>> +	ret = do_vrange(mm, start, end - 1, mode, &p);
>> +
>> +	if (purged) {
>> +		if (put_user(p, purged)) {
>> +			/*
>> +			 * This would be bad, since we've modified volatilty
>> +			 * and the change in purged state would be lost.
>> +			 */
>> +			BUG();
>> +		}
>> +	}
> I agree that would be bad, but I don't think a BUG() is called for.  Maybe a
> WARN, and certainly a "return -EFAULT;"

Yea, this was a late change before I sent out the patches. In reviewing 
the documentation I realized we still could return an error and the 
purge data was lost. Thus I added the earlier test to make sure the 
pointer is valid before we take any action.

The BUG() was mostly for my own testing, and I'll change it in the 
future, although I want to sort out exactly in what cases the second 
put_user() could fail if the first succeeded.

Thanks as always for the great feedback!
-john






^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/8] vrange: Add new vrange(2) system call
@ 2013-06-12 18:47       ` John Stultz
  0 siblings, 0 replies; 48+ messages in thread
From: John Stultz @ 2013-06-12 18:47 UTC (permalink / raw)
  To: NeilBrown
  Cc: LKML, Minchan Kim, Andrew Morton, Android Kernel Team,
	Robert Love, Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Andrea Righi, Andrea Arcangeli,
	Aneesh Kumar K.V, Mike Hommey, Taras Glek, Dhaval Giani,
	Jan Kara, KOSAKI Motohiro, Michel Lespinasse, linux-mm

On 06/11/2013 11:48 PM, NeilBrown wrote:
> On Tue, 11 Jun 2013 21:22:48 -0700 John Stultz <john.stultz@linaro.org> wrote:
>
>> From: Minchan Kim <minchan@kernel.org>
>>
>> This patch adds new system call sys_vrange.
>>
>> NAME
>> 	vrange - Mark or unmark range of memory as volatile
>>
>> SYNOPSIS
>> 	int vrange(unsigned_long start, size_t length, int mode,
>> 			 int *purged);
>>
> ...
>> 	purged: Pointer to an integer which will return 1 if
>> 	mode == VRANGE_NONVOLATILE and any page in the affected range
>> 	was purged. If purged returns zero during a mode ==
>> 	VRANGE_NONVOLATILE call, it means all of the pages in the range
>> 	are intact.
> This seems a bit ambiguous.
> It is clear that the pointed-to location will be set to '1' if any part of
> the range was purged, but it is not clear what will happen if it wasn't
> purged.
> The mention of 'returns zero' seems to suggest that it might set the location
> to '0' in that case, but that isn't obvious to me.  The code appear to always
> set it - that should be explicit.
>
> Also, should the location be a fixed number of bytes to reduce possible
> issues with N-bit userspace on M-bit kernels?
>
> May I suggest:
>
>          purge:  If not NULL, a pointer to a 32bit location which will be set
>          to 1 if mode == VRANGE_NONVOLATILE and any page in the affected range
>          was purged, and will be set to 0 in all other cases (including
>          if mode == VRANGE_VOLATILE).
>
>
> I don't think any further explanation is needed.

I'll use this! Thanks for the suggestion!


>> +	if (purged) {
>> +		/* Test pointer is valid before making any changes */
>> +		if (put_user(p, purged))
>> +			return -EFAULT;
>> +	}
>> +
>> +	ret = do_vrange(mm, start, end - 1, mode, &p);
>> +
>> +	if (purged) {
>> +		if (put_user(p, purged)) {
>> +			/*
>> +			 * This would be bad, since we've modified volatilty
>> +			 * and the change in purged state would be lost.
>> +			 */
>> +			BUG();
>> +		}
>> +	}
> I agree that would be bad, but I don't think a BUG() is called for.  Maybe a
> WARN, and certainly a "return -EFAULT;"

Yea, this was a late change before I sent out the patches. In reviewing 
the documentation I realized we still could return an error and the 
purge data was lost. Thus I added the earlier test to make sure the 
pointer is valid before we take any action.

The BUG() was mostly for my own testing, and I'll change it in the 
future, although I want to sort out exactly in what cases the second 
put_user() could fail if the first succeeded.

Thanks as always for the great feedback!
-john





--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/8] vrange: Clear volatility on new mmaps
  2013-06-12  4:22   ` John Stultz
@ 2013-06-13  6:28     ` Minchan Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Minchan Kim @ 2013-06-13  6:28 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm

Hey John,

On Tue, Jun 11, 2013 at 09:22:47PM -0700, John Stultz wrote:
> At lsf-mm, the issue was brought up that there is a precedence with
> interfaces like mlock, such that new mappings in a pre-existing range
> do no inherit the mlock state.
> 
> This is mostly because mlock only modifies the existing vmas, and so
> any new mmaps create new vmas, which won't be mlocked.
> 
> Since volatility is not stored in the vma (for good cause, specfically
> as we'd have to have manage file volatility differently from anonymous
> and we're likely to manage volatility on small chunks of memory, which
> would cause lots of vma splitting and churn), this patch clears volatilty
> on new mappings, to ensure that we don't inherit volatility if memory in
> an existing volatile range is unmapped and then re-mapped with something
> else.
> 
> Thus, this patch forces any volatility to be cleared on mmap.

If we have lots of node on vroot but it doesn't include newly mmmaping
vma range, it's purely unnecessary cost and that's never what we want.

> 
> XXX: We expect this patch to be not well loved by mm folks, and are open
> to alternative methods here. Its more of a place holder to address
> the issue from lsf-mm and hopefully will spur some further discussion.

Another idea is we can add "bool is_vrange" in struct vm_area_struct.
It is protected by vrange_lock. The scenario is following as,

When do_vrange is called with VRANGE_VOLATILE, it iterates vmas
and mark the vma->is_vrange to true. So, we can avoid tree traversal
if the is_vrange is false when munmap is called and newly mmaped vma
doesn't need to clear any volatility.

And it would help the performance of purging path to find that a page
is volatile page or not(for now, it is traversing on vroot to find it
but we could do it easily via checking the vma->is_vrange).

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/8] vrange: Clear volatility on new mmaps
@ 2013-06-13  6:28     ` Minchan Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Minchan Kim @ 2013-06-13  6:28 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm

Hey John,

On Tue, Jun 11, 2013 at 09:22:47PM -0700, John Stultz wrote:
> At lsf-mm, the issue was brought up that there is a precedence with
> interfaces like mlock, such that new mappings in a pre-existing range
> do no inherit the mlock state.
> 
> This is mostly because mlock only modifies the existing vmas, and so
> any new mmaps create new vmas, which won't be mlocked.
> 
> Since volatility is not stored in the vma (for good cause, specfically
> as we'd have to have manage file volatility differently from anonymous
> and we're likely to manage volatility on small chunks of memory, which
> would cause lots of vma splitting and churn), this patch clears volatilty
> on new mappings, to ensure that we don't inherit volatility if memory in
> an existing volatile range is unmapped and then re-mapped with something
> else.
> 
> Thus, this patch forces any volatility to be cleared on mmap.

If we have lots of node on vroot but it doesn't include newly mmmaping
vma range, it's purely unnecessary cost and that's never what we want.

> 
> XXX: We expect this patch to be not well loved by mm folks, and are open
> to alternative methods here. Its more of a place holder to address
> the issue from lsf-mm and hopefully will spur some further discussion.

Another idea is we can add "bool is_vrange" in struct vm_area_struct.
It is protected by vrange_lock. The scenario is following as,

When do_vrange is called with VRANGE_VOLATILE, it iterates vmas
and mark the vma->is_vrange to true. So, we can avoid tree traversal
if the is_vrange is false when munmap is called and newly mmaped vma
doesn't need to clear any volatility.

And it would help the performance of purging path to find that a page
is volatile page or not(for now, it is traversing on vroot to find it
but we could do it easily via checking the vma->is_vrange).

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/8] vrange: Clear volatility on new mmaps
  2013-06-13  6:28     ` Minchan Kim
@ 2013-06-13 23:43       ` John Stultz
  -1 siblings, 0 replies; 48+ messages in thread
From: John Stultz @ 2013-06-13 23:43 UTC (permalink / raw)
  To: Minchan Kim
  Cc: LKML, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm

On 06/12/2013 11:28 PM, Minchan Kim wrote:
> Hey John,
>
> On Tue, Jun 11, 2013 at 09:22:47PM -0700, John Stultz wrote:
>> At lsf-mm, the issue was brought up that there is a precedence with
>> interfaces like mlock, such that new mappings in a pre-existing range
>> do no inherit the mlock state.
>>
>> This is mostly because mlock only modifies the existing vmas, and so
>> any new mmaps create new vmas, which won't be mlocked.
>>
>> Since volatility is not stored in the vma (for good cause, specfically
>> as we'd have to have manage file volatility differently from anonymous
>> and we're likely to manage volatility on small chunks of memory, which
>> would cause lots of vma splitting and churn), this patch clears volatilty
>> on new mappings, to ensure that we don't inherit volatility if memory in
>> an existing volatile range is unmapped and then re-mapped with something
>> else.
>>
>> Thus, this patch forces any volatility to be cleared on mmap.
> If we have lots of node on vroot but it doesn't include newly mmmaping
> vma range, it's purely unnecessary cost and that's never what we want.
>
>> XXX: We expect this patch to be not well loved by mm folks, and are open
>> to alternative methods here. Its more of a place holder to address
>> the issue from lsf-mm and hopefully will spur some further discussion.
> Another idea is we can add "bool is_vrange" in struct vm_area_struct.
> It is protected by vrange_lock. The scenario is following as,
>
> When do_vrange is called with VRANGE_VOLATILE, it iterates vmas
> and mark the vma->is_vrange to true. So, we can avoid tree traversal
> if the is_vrange is false when munmap is called and newly mmaped vma
> doesn't need to clear any volatility.

We could look further into this approach if folks think its the best way 
to go. Though it has the downside of having the split the vmas when 
we're dealing with a large number of smallish objects. Also we'd be 
increasing the vma_struct size for everyone, even if no one is using 
volatile ranges, which may be a bigger concern.

Also it means we'd be managing anonymous and file volatility with 
different structures (though that's not the end of the world).

thanks
-john


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/8] vrange: Clear volatility on new mmaps
@ 2013-06-13 23:43       ` John Stultz
  0 siblings, 0 replies; 48+ messages in thread
From: John Stultz @ 2013-06-13 23:43 UTC (permalink / raw)
  To: Minchan Kim
  Cc: LKML, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm

On 06/12/2013 11:28 PM, Minchan Kim wrote:
> Hey John,
>
> On Tue, Jun 11, 2013 at 09:22:47PM -0700, John Stultz wrote:
>> At lsf-mm, the issue was brought up that there is a precedence with
>> interfaces like mlock, such that new mappings in a pre-existing range
>> do no inherit the mlock state.
>>
>> This is mostly because mlock only modifies the existing vmas, and so
>> any new mmaps create new vmas, which won't be mlocked.
>>
>> Since volatility is not stored in the vma (for good cause, specfically
>> as we'd have to have manage file volatility differently from anonymous
>> and we're likely to manage volatility on small chunks of memory, which
>> would cause lots of vma splitting and churn), this patch clears volatilty
>> on new mappings, to ensure that we don't inherit volatility if memory in
>> an existing volatile range is unmapped and then re-mapped with something
>> else.
>>
>> Thus, this patch forces any volatility to be cleared on mmap.
> If we have lots of node on vroot but it doesn't include newly mmmaping
> vma range, it's purely unnecessary cost and that's never what we want.
>
>> XXX: We expect this patch to be not well loved by mm folks, and are open
>> to alternative methods here. Its more of a place holder to address
>> the issue from lsf-mm and hopefully will spur some further discussion.
> Another idea is we can add "bool is_vrange" in struct vm_area_struct.
> It is protected by vrange_lock. The scenario is following as,
>
> When do_vrange is called with VRANGE_VOLATILE, it iterates vmas
> and mark the vma->is_vrange to true. So, we can avoid tree traversal
> if the is_vrange is false when munmap is called and newly mmaped vma
> doesn't need to clear any volatility.

We could look further into this approach if folks think its the best way 
to go. Though it has the downside of having the split the vmas when 
we're dealing with a large number of smallish objects. Also we'd be 
increasing the vma_struct size for everyone, even if no one is using 
volatile ranges, which may be a bigger concern.

Also it means we'd be managing anonymous and file volatility with 
different structures (though that's not the end of the world).

thanks
-john

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/8] vrange: Clear volatility on new mmaps
  2013-06-13 23:43       ` John Stultz
@ 2013-06-14  0:21         ` Minchan Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Minchan Kim @ 2013-06-14  0:21 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm

Hello John,

On Thu, Jun 13, 2013 at 04:43:58PM -0700, John Stultz wrote:
> On 06/12/2013 11:28 PM, Minchan Kim wrote:
> >Hey John,
> >
> >On Tue, Jun 11, 2013 at 09:22:47PM -0700, John Stultz wrote:
> >>At lsf-mm, the issue was brought up that there is a precedence with
> >>interfaces like mlock, such that new mappings in a pre-existing range
> >>do no inherit the mlock state.
> >>
> >>This is mostly because mlock only modifies the existing vmas, and so
> >>any new mmaps create new vmas, which won't be mlocked.
> >>
> >>Since volatility is not stored in the vma (for good cause, specfically
> >>as we'd have to have manage file volatility differently from anonymous
> >>and we're likely to manage volatility on small chunks of memory, which
> >>would cause lots of vma splitting and churn), this patch clears volatilty
> >>on new mappings, to ensure that we don't inherit volatility if memory in
> >>an existing volatile range is unmapped and then re-mapped with something
> >>else.
> >>
> >>Thus, this patch forces any volatility to be cleared on mmap.
> >If we have lots of node on vroot but it doesn't include newly mmmaping
> >vma range, it's purely unnecessary cost and that's never what we want.
> >
> >>XXX: We expect this patch to be not well loved by mm folks, and are open
> >>to alternative methods here. Its more of a place holder to address
> >>the issue from lsf-mm and hopefully will spur some further discussion.
> >Another idea is we can add "bool is_vrange" in struct vm_area_struct.
> >It is protected by vrange_lock. The scenario is following as,
> >
> >When do_vrange is called with VRANGE_VOLATILE, it iterates vmas
> >and mark the vma->is_vrange to true. So, we can avoid tree traversal
> >if the is_vrange is false when munmap is called and newly mmaped vma
> >doesn't need to clear any volatility.
> 
> We could look further into this approach if folks think its the best
> way to go. Though it has the downside of having the split the vmas
> when we're dealing with a large number of smallish objects. Also

We don't need to split vma, which I don't really want.
I meant followig as

1)

0x100000                                        0x10000000
|                       VMA : isvrange = false  |


2) vrange(0x200000, 0x100000, VRANGE_VOLATILE)


0x100000                                        0x10000000
|                       VMA : isvrange = true   |


        vroot
       /
   node 1
    
2) vrange(0x400000, 0x100000, VRANGE_VOLATILE)

0x100000                                        0x10000000
|                       VMA : isvrange = true   |


        vroot
       /     \
   node 1  node 2


3) unmap(0x400000, 0x100000, VRANGE_NOVOLATILE)

sys_munmap:

if (vma->is_vrange) {
        vrange_clear(0x400000, 0x400000 + 0x100000 -1); 
        if (vma_vrange_all_clear(vma)
                vma->isvrange = false;
}

0x100000                                        0x10000000
|                       VMA : isvrange = true   |

        vroot
       /    
   node 1


3) unmap(0x200000, 0x100000, VRANGE_NOVOLATILE)

sys_munmap:

if (vma->is_vrange) {
        vrange_clear(0x200000, 0x200000 + 0x100000 -1); 
        if (vma_vrange_all_clear(vma)
                vma->isvrange = false;
}

0x100000                                        0x10000000
|                       VMA : isvrange = false  |

        vroot
       /    \


4) purging path

bool really_vrange_page(page *page)
{
        
        return __vrange_address(vroot, startofpage, endofpage);
}

shrink_page_list
        ..
        ..

        vma = rmap_from_page(page);
        if (vma->is_vrange) {
                /*
                 * vma's is_vrange could have false positive
                 * so that we should check it.
                 */
                if (really_vrange_page(page))
                        purge_page(page);
        }
        ..
        ..

So we can reduce unnecessary vroot traverse without vma splitting.

> we'd be increasing the vma_struct size for everyone, even if no one
> is using volatile ranges, which may be a bigger concern.


I think vma is not a sensitive about size and historically, we have
been added a variable easily. Of course, another ideas which don't
need to increase vma size are welcome but IMHO, it'a good compromise
between performance and memoryfootprint.

> 
> Also it means we'd be managing anonymous and file volatility with
> different structures (though that's not the end of the world).

volatility still is kept in vrange->purged.
Do I miss something?

> 
> thanks
> -john
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/8] vrange: Clear volatility on new mmaps
@ 2013-06-14  0:21         ` Minchan Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Minchan Kim @ 2013-06-14  0:21 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm

Hello John,

On Thu, Jun 13, 2013 at 04:43:58PM -0700, John Stultz wrote:
> On 06/12/2013 11:28 PM, Minchan Kim wrote:
> >Hey John,
> >
> >On Tue, Jun 11, 2013 at 09:22:47PM -0700, John Stultz wrote:
> >>At lsf-mm, the issue was brought up that there is a precedence with
> >>interfaces like mlock, such that new mappings in a pre-existing range
> >>do no inherit the mlock state.
> >>
> >>This is mostly because mlock only modifies the existing vmas, and so
> >>any new mmaps create new vmas, which won't be mlocked.
> >>
> >>Since volatility is not stored in the vma (for good cause, specfically
> >>as we'd have to have manage file volatility differently from anonymous
> >>and we're likely to manage volatility on small chunks of memory, which
> >>would cause lots of vma splitting and churn), this patch clears volatilty
> >>on new mappings, to ensure that we don't inherit volatility if memory in
> >>an existing volatile range is unmapped and then re-mapped with something
> >>else.
> >>
> >>Thus, this patch forces any volatility to be cleared on mmap.
> >If we have lots of node on vroot but it doesn't include newly mmmaping
> >vma range, it's purely unnecessary cost and that's never what we want.
> >
> >>XXX: We expect this patch to be not well loved by mm folks, and are open
> >>to alternative methods here. Its more of a place holder to address
> >>the issue from lsf-mm and hopefully will spur some further discussion.
> >Another idea is we can add "bool is_vrange" in struct vm_area_struct.
> >It is protected by vrange_lock. The scenario is following as,
> >
> >When do_vrange is called with VRANGE_VOLATILE, it iterates vmas
> >and mark the vma->is_vrange to true. So, we can avoid tree traversal
> >if the is_vrange is false when munmap is called and newly mmaped vma
> >doesn't need to clear any volatility.
> 
> We could look further into this approach if folks think its the best
> way to go. Though it has the downside of having the split the vmas
> when we're dealing with a large number of smallish objects. Also

We don't need to split vma, which I don't really want.
I meant followig as

1)

0x100000                                        0x10000000
|                       VMA : isvrange = false  |


2) vrange(0x200000, 0x100000, VRANGE_VOLATILE)


0x100000                                        0x10000000
|                       VMA : isvrange = true   |


        vroot
       /
   node 1
    
2) vrange(0x400000, 0x100000, VRANGE_VOLATILE)

0x100000                                        0x10000000
|                       VMA : isvrange = true   |


        vroot
       /     \
   node 1  node 2


3) unmap(0x400000, 0x100000, VRANGE_NOVOLATILE)

sys_munmap:

if (vma->is_vrange) {
        vrange_clear(0x400000, 0x400000 + 0x100000 -1); 
        if (vma_vrange_all_clear(vma)
                vma->isvrange = false;
}

0x100000                                        0x10000000
|                       VMA : isvrange = true   |

        vroot
       /    
   node 1


3) unmap(0x200000, 0x100000, VRANGE_NOVOLATILE)

sys_munmap:

if (vma->is_vrange) {
        vrange_clear(0x200000, 0x200000 + 0x100000 -1); 
        if (vma_vrange_all_clear(vma)
                vma->isvrange = false;
}

0x100000                                        0x10000000
|                       VMA : isvrange = false  |

        vroot
       /    \


4) purging path

bool really_vrange_page(page *page)
{
        
        return __vrange_address(vroot, startofpage, endofpage);
}

shrink_page_list
        ..
        ..

        vma = rmap_from_page(page);
        if (vma->is_vrange) {
                /*
                 * vma's is_vrange could have false positive
                 * so that we should check it.
                 */
                if (really_vrange_page(page))
                        purge_page(page);
        }
        ..
        ..

So we can reduce unnecessary vroot traverse without vma splitting.

> we'd be increasing the vma_struct size for everyone, even if no one
> is using volatile ranges, which may be a bigger concern.


I think vma is not a sensitive about size and historically, we have
been added a variable easily. Of course, another ideas which don't
need to increase vma size are welcome but IMHO, it'a good compromise
between performance and memoryfootprint.

> 
> Also it means we'd be managing anonymous and file volatility with
> different structures (though that's not the end of the world).

volatility still is kept in vrange->purged.
Do I miss something?

> 
> thanks
> -john
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/8] vrange: Add method to purge volatile ranges
  2013-06-12  4:22   ` John Stultz
@ 2013-06-17  7:13     ` Minchan Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Minchan Kim @ 2013-06-17  7:13 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm

Hello John,

I am rewriting purging path and found a bug from this patch.
I might forget it so I will send this comment for recording.

On Tue, Jun 11, 2013 at 09:22:50PM -0700, John Stultz wrote:
> From: Minchan Kim <minchan@kernel.org>
> 
> This patch adds discarding function to purge volatile ranges under
> memory pressure. Logic is as following:
> 
> 1. Memory pressure happens
> 2. VM start to reclaim pages
> 3. Check the page is in volatile range.
> 4. If so, zap the page from the process's page table.
>    (By semantic vrange(2), we should mark it with another one to
>     make page fault when you try to access the address. It will
>     be introduced later patch)
> 5. If page is unmapped from all processes, discard it instead of swapping.
> 
> This patch does not address the case where there is no swap, which
> keeps anonymous pages from being aged off the LRUs. Minchan has
> additional patches that add support for purging anonymous pages
> 
> XXX: First pass at file purging. Seems to work, but is likely broken
> and needs close review.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Android Kernel Team <kernel-team@android.com>
> Cc: Robert Love <rlove@google.com>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Dave Hansen <dave@linux.vnet.ibm.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Neil Brown <neilb@suse.de>
> Cc: Andrea Righi <andrea@betterlinux.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Mike Hommey <mh@glandium.org>
> Cc: Taras Glek <tglek@mozilla.com>
> Cc: Dhaval Giani <dgiani@mozilla.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
> Cc: Michel Lespinasse <walken@google.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: linux-mm@kvack.org <linux-mm@kvack.org>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> [jstultz: Reworked to add purging of file pages, commit log tweaks]
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  include/linux/rmap.h   |  12 +-
>  include/linux/swap.h   |   1 +
>  include/linux/vrange.h |   7 ++
>  mm/ksm.c               |   2 +-
>  mm/rmap.c              |  30 +++--
>  mm/swapfile.c          |  36 ++++++
>  mm/vmscan.c            |  16 ++-
>  mm/vrange.c            | 332 +++++++++++++++++++++++++++++++++++++++++++++++++
>  8 files changed, 420 insertions(+), 16 deletions(-)
> 
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 6dacb93..6432dfb 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -83,6 +83,8 @@ enum ttu_flags {
>  };
>  

< snip >

> @@ -662,7 +663,7 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
>   */
>  int page_referenced_one(struct page *page, struct vm_area_struct *vma,
>  			unsigned long address, unsigned int *mapcount,
> -			unsigned long *vm_flags)
> +			unsigned long *vm_flags, int *is_vrange)
>  {
>  	struct mm_struct *mm = vma->vm_mm;
>  	int referenced = 0;
> @@ -724,6 +725,9 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
>  				referenced++;
>  		}
>  		pte_unmap_unlock(pte, ptl);
> +		if (is_vrange &&
> +			vrange_address(mm, address, address + PAGE_SIZE - 1))
> +			*is_vrange = 1;

< snip >

> +static bool __vrange_address(struct vrange_root *vroot,
> +			unsigned long start, unsigned long end)
> +{
> +	struct interval_tree_node *node;
> +
> +	node = interval_tree_iter_first(&vroot->v_rb, start, end);
> +	return node ? true : false;
> +}
> +
> +bool vrange_address(struct mm_struct *mm,
> +			unsigned long start, unsigned long end)
> +{
> +	struct vrange_root *vroot;
> +	unsigned long vstart_idx, vend_idx;
> +	struct vm_area_struct *vma;
> +	bool ret;
> +
> +	vma = find_vma(mm, start);

It seems to be tweaked by you while you are refactoring with file-vrange
The problem of the code is that you couldn't use vma without holding
the lock of mmap_sem and you couldn't use the lock in purging path
because you couldn't know other tasks's state so it might be a dealock
if you try to hold a lock.

> +	if (vma->vm_file && (vma->vm_flags & VM_SHARED)) {
> +		vroot = &vma->vm_file->f_mapping->vroot;
> +		vstart_idx = vma->vm_pgoff + start - vma->vm_start;
> +		vend_idx = vma->vm_pgoff + end - vma->vm_start;
> +	} else {
> +		vroot = &mm->vroot;
> +		vstart_idx = start;
> +		vend_idx = end;
> +	}
> +
> +	vrange_lock(vroot);
> +	ret = __vrange_address(vroot, vstart_idx, vend_idx);
> +	vrange_unlock(vroot);
> +	return ret;
> +}
> +
-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/8] vrange: Add method to purge volatile ranges
@ 2013-06-17  7:13     ` Minchan Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Minchan Kim @ 2013-06-17  7:13 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm

Hello John,

I am rewriting purging path and found a bug from this patch.
I might forget it so I will send this comment for recording.

On Tue, Jun 11, 2013 at 09:22:50PM -0700, John Stultz wrote:
> From: Minchan Kim <minchan@kernel.org>
> 
> This patch adds discarding function to purge volatile ranges under
> memory pressure. Logic is as following:
> 
> 1. Memory pressure happens
> 2. VM start to reclaim pages
> 3. Check the page is in volatile range.
> 4. If so, zap the page from the process's page table.
>    (By semantic vrange(2), we should mark it with another one to
>     make page fault when you try to access the address. It will
>     be introduced later patch)
> 5. If page is unmapped from all processes, discard it instead of swapping.
> 
> This patch does not address the case where there is no swap, which
> keeps anonymous pages from being aged off the LRUs. Minchan has
> additional patches that add support for purging anonymous pages
> 
> XXX: First pass at file purging. Seems to work, but is likely broken
> and needs close review.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Android Kernel Team <kernel-team@android.com>
> Cc: Robert Love <rlove@google.com>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Dave Hansen <dave@linux.vnet.ibm.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Neil Brown <neilb@suse.de>
> Cc: Andrea Righi <andrea@betterlinux.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Mike Hommey <mh@glandium.org>
> Cc: Taras Glek <tglek@mozilla.com>
> Cc: Dhaval Giani <dgiani@mozilla.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
> Cc: Michel Lespinasse <walken@google.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: linux-mm@kvack.org <linux-mm@kvack.org>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> [jstultz: Reworked to add purging of file pages, commit log tweaks]
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  include/linux/rmap.h   |  12 +-
>  include/linux/swap.h   |   1 +
>  include/linux/vrange.h |   7 ++
>  mm/ksm.c               |   2 +-
>  mm/rmap.c              |  30 +++--
>  mm/swapfile.c          |  36 ++++++
>  mm/vmscan.c            |  16 ++-
>  mm/vrange.c            | 332 +++++++++++++++++++++++++++++++++++++++++++++++++
>  8 files changed, 420 insertions(+), 16 deletions(-)
> 
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 6dacb93..6432dfb 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -83,6 +83,8 @@ enum ttu_flags {
>  };
>  

< snip >

> @@ -662,7 +663,7 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
>   */
>  int page_referenced_one(struct page *page, struct vm_area_struct *vma,
>  			unsigned long address, unsigned int *mapcount,
> -			unsigned long *vm_flags)
> +			unsigned long *vm_flags, int *is_vrange)
>  {
>  	struct mm_struct *mm = vma->vm_mm;
>  	int referenced = 0;
> @@ -724,6 +725,9 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
>  				referenced++;
>  		}
>  		pte_unmap_unlock(pte, ptl);
> +		if (is_vrange &&
> +			vrange_address(mm, address, address + PAGE_SIZE - 1))
> +			*is_vrange = 1;

< snip >

> +static bool __vrange_address(struct vrange_root *vroot,
> +			unsigned long start, unsigned long end)
> +{
> +	struct interval_tree_node *node;
> +
> +	node = interval_tree_iter_first(&vroot->v_rb, start, end);
> +	return node ? true : false;
> +}
> +
> +bool vrange_address(struct mm_struct *mm,
> +			unsigned long start, unsigned long end)
> +{
> +	struct vrange_root *vroot;
> +	unsigned long vstart_idx, vend_idx;
> +	struct vm_area_struct *vma;
> +	bool ret;
> +
> +	vma = find_vma(mm, start);

It seems to be tweaked by you while you are refactoring with file-vrange
The problem of the code is that you couldn't use vma without holding
the lock of mmap_sem and you couldn't use the lock in purging path
because you couldn't know other tasks's state so it might be a dealock
if you try to hold a lock.

> +	if (vma->vm_file && (vma->vm_flags & VM_SHARED)) {
> +		vroot = &vma->vm_file->f_mapping->vroot;
> +		vstart_idx = vma->vm_pgoff + start - vma->vm_start;
> +		vend_idx = vma->vm_pgoff + end - vma->vm_start;
> +	} else {
> +		vroot = &mm->vroot;
> +		vstart_idx = start;
> +		vend_idx = end;
> +	}
> +
> +	vrange_lock(vroot);
> +	ret = __vrange_address(vroot, vstart_idx, vend_idx);
> +	vrange_unlock(vroot);
> +	return ret;
> +}
> +
-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/8] vrange: Add method to purge volatile ranges
  2013-06-17  7:13     ` Minchan Kim
@ 2013-06-17  7:24       ` Minchan Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Minchan Kim @ 2013-06-17  7:24 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm

On Mon, Jun 17, 2013 at 04:13:31PM +0900, Minchan Kim wrote:
> Hello John,
> 
> I am rewriting purging path and found a bug from this patch.
> I might forget it so I will send this comment for recording.
> 
> On Tue, Jun 11, 2013 at 09:22:50PM -0700, John Stultz wrote:
> > From: Minchan Kim <minchan@kernel.org>
> > 
> > This patch adds discarding function to purge volatile ranges under
> > memory pressure. Logic is as following:
> > 
> > 1. Memory pressure happens
> > 2. VM start to reclaim pages
> > 3. Check the page is in volatile range.
> > 4. If so, zap the page from the process's page table.
> >    (By semantic vrange(2), we should mark it with another one to
> >     make page fault when you try to access the address. It will
> >     be introduced later patch)
> > 5. If page is unmapped from all processes, discard it instead of swapping.
> > 
> > This patch does not address the case where there is no swap, which
> > keeps anonymous pages from being aged off the LRUs. Minchan has
> > additional patches that add support for purging anonymous pages
> > 
> > XXX: First pass at file purging. Seems to work, but is likely broken
> > and needs close review.
> > 
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Android Kernel Team <kernel-team@android.com>
> > Cc: Robert Love <rlove@google.com>
> > Cc: Mel Gorman <mel@csn.ul.ie>
> > Cc: Hugh Dickins <hughd@google.com>
> > Cc: Dave Hansen <dave@linux.vnet.ibm.com>
> > Cc: Rik van Riel <riel@redhat.com>
> > Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
> > Cc: Dave Chinner <david@fromorbit.com>
> > Cc: Neil Brown <neilb@suse.de>
> > Cc: Andrea Righi <andrea@betterlinux.com>
> > Cc: Andrea Arcangeli <aarcange@redhat.com>
> > Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> > Cc: Mike Hommey <mh@glandium.org>
> > Cc: Taras Glek <tglek@mozilla.com>
> > Cc: Dhaval Giani <dgiani@mozilla.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
> > Cc: Michel Lespinasse <walken@google.com>
> > Cc: Minchan Kim <minchan@kernel.org>
> > Cc: linux-mm@kvack.org <linux-mm@kvack.org>
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > [jstultz: Reworked to add purging of file pages, commit log tweaks]
> > Signed-off-by: John Stultz <john.stultz@linaro.org>
> > ---
> >  include/linux/rmap.h   |  12 +-
> >  include/linux/swap.h   |   1 +
> >  include/linux/vrange.h |   7 ++
> >  mm/ksm.c               |   2 +-
> >  mm/rmap.c              |  30 +++--
> >  mm/swapfile.c          |  36 ++++++
> >  mm/vmscan.c            |  16 ++-
> >  mm/vrange.c            | 332 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  8 files changed, 420 insertions(+), 16 deletions(-)
> > 
> > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> > index 6dacb93..6432dfb 100644
> > --- a/include/linux/rmap.h
> > +++ b/include/linux/rmap.h
> > @@ -83,6 +83,8 @@ enum ttu_flags {
> >  };
> >  
> 
> < snip >
> 
> > @@ -662,7 +663,7 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
> >   */
> >  int page_referenced_one(struct page *page, struct vm_area_struct *vma,
> >  			unsigned long address, unsigned int *mapcount,
> > -			unsigned long *vm_flags)
> > +			unsigned long *vm_flags, int *is_vrange)
> >  {
> >  	struct mm_struct *mm = vma->vm_mm;
> >  	int referenced = 0;
> > @@ -724,6 +725,9 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
> >  				referenced++;
> >  		}
> >  		pte_unmap_unlock(pte, ptl);
> > +		if (is_vrange &&
> > +			vrange_address(mm, address, address + PAGE_SIZE - 1))
> > +			*is_vrange = 1;
> 
> < snip >
> 
> > +static bool __vrange_address(struct vrange_root *vroot,
> > +			unsigned long start, unsigned long end)
> > +{
> > +	struct interval_tree_node *node;
> > +
> > +	node = interval_tree_iter_first(&vroot->v_rb, start, end);
> > +	return node ? true : false;
> > +}
> > +
> > +bool vrange_address(struct mm_struct *mm,
> > +			unsigned long start, unsigned long end)
> > +{
> > +	struct vrange_root *vroot;
> > +	unsigned long vstart_idx, vend_idx;
> > +	struct vm_area_struct *vma;
> > +	bool ret;
> > +
> > +	vma = find_vma(mm, start);
> 
> It seems to be tweaked by you while you are refactoring with file-vrange
> The problem of the code is that you couldn't use vma without holding
> the lock of mmap_sem and you couldn't use the lock in purging path
> because you couldn't know other tasks's state so it might be a dealock
> if you try to hold a lock.

It was sent by mistake with not completing. :(
Exactly speaking, the find_vma is a problem and we don't need it.
Couldn't we pass just vma, NOT mm?
Have you seen any problem?

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/8] vrange: Add method to purge volatile ranges
@ 2013-06-17  7:24       ` Minchan Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Minchan Kim @ 2013-06-17  7:24 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm

On Mon, Jun 17, 2013 at 04:13:31PM +0900, Minchan Kim wrote:
> Hello John,
> 
> I am rewriting purging path and found a bug from this patch.
> I might forget it so I will send this comment for recording.
> 
> On Tue, Jun 11, 2013 at 09:22:50PM -0700, John Stultz wrote:
> > From: Minchan Kim <minchan@kernel.org>
> > 
> > This patch adds discarding function to purge volatile ranges under
> > memory pressure. Logic is as following:
> > 
> > 1. Memory pressure happens
> > 2. VM start to reclaim pages
> > 3. Check the page is in volatile range.
> > 4. If so, zap the page from the process's page table.
> >    (By semantic vrange(2), we should mark it with another one to
> >     make page fault when you try to access the address. It will
> >     be introduced later patch)
> > 5. If page is unmapped from all processes, discard it instead of swapping.
> > 
> > This patch does not address the case where there is no swap, which
> > keeps anonymous pages from being aged off the LRUs. Minchan has
> > additional patches that add support for purging anonymous pages
> > 
> > XXX: First pass at file purging. Seems to work, but is likely broken
> > and needs close review.
> > 
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Android Kernel Team <kernel-team@android.com>
> > Cc: Robert Love <rlove@google.com>
> > Cc: Mel Gorman <mel@csn.ul.ie>
> > Cc: Hugh Dickins <hughd@google.com>
> > Cc: Dave Hansen <dave@linux.vnet.ibm.com>
> > Cc: Rik van Riel <riel@redhat.com>
> > Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
> > Cc: Dave Chinner <david@fromorbit.com>
> > Cc: Neil Brown <neilb@suse.de>
> > Cc: Andrea Righi <andrea@betterlinux.com>
> > Cc: Andrea Arcangeli <aarcange@redhat.com>
> > Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> > Cc: Mike Hommey <mh@glandium.org>
> > Cc: Taras Glek <tglek@mozilla.com>
> > Cc: Dhaval Giani <dgiani@mozilla.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
> > Cc: Michel Lespinasse <walken@google.com>
> > Cc: Minchan Kim <minchan@kernel.org>
> > Cc: linux-mm@kvack.org <linux-mm@kvack.org>
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > [jstultz: Reworked to add purging of file pages, commit log tweaks]
> > Signed-off-by: John Stultz <john.stultz@linaro.org>
> > ---
> >  include/linux/rmap.h   |  12 +-
> >  include/linux/swap.h   |   1 +
> >  include/linux/vrange.h |   7 ++
> >  mm/ksm.c               |   2 +-
> >  mm/rmap.c              |  30 +++--
> >  mm/swapfile.c          |  36 ++++++
> >  mm/vmscan.c            |  16 ++-
> >  mm/vrange.c            | 332 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  8 files changed, 420 insertions(+), 16 deletions(-)
> > 
> > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> > index 6dacb93..6432dfb 100644
> > --- a/include/linux/rmap.h
> > +++ b/include/linux/rmap.h
> > @@ -83,6 +83,8 @@ enum ttu_flags {
> >  };
> >  
> 
> < snip >
> 
> > @@ -662,7 +663,7 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
> >   */
> >  int page_referenced_one(struct page *page, struct vm_area_struct *vma,
> >  			unsigned long address, unsigned int *mapcount,
> > -			unsigned long *vm_flags)
> > +			unsigned long *vm_flags, int *is_vrange)
> >  {
> >  	struct mm_struct *mm = vma->vm_mm;
> >  	int referenced = 0;
> > @@ -724,6 +725,9 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
> >  				referenced++;
> >  		}
> >  		pte_unmap_unlock(pte, ptl);
> > +		if (is_vrange &&
> > +			vrange_address(mm, address, address + PAGE_SIZE - 1))
> > +			*is_vrange = 1;
> 
> < snip >
> 
> > +static bool __vrange_address(struct vrange_root *vroot,
> > +			unsigned long start, unsigned long end)
> > +{
> > +	struct interval_tree_node *node;
> > +
> > +	node = interval_tree_iter_first(&vroot->v_rb, start, end);
> > +	return node ? true : false;
> > +}
> > +
> > +bool vrange_address(struct mm_struct *mm,
> > +			unsigned long start, unsigned long end)
> > +{
> > +	struct vrange_root *vroot;
> > +	unsigned long vstart_idx, vend_idx;
> > +	struct vm_area_struct *vma;
> > +	bool ret;
> > +
> > +	vma = find_vma(mm, start);
> 
> It seems to be tweaked by you while you are refactoring with file-vrange
> The problem of the code is that you couldn't use vma without holding
> the lock of mmap_sem and you couldn't use the lock in purging path
> because you couldn't know other tasks's state so it might be a dealock
> if you try to hold a lock.

It was sent by mistake with not completing. :(
Exactly speaking, the find_vma is a problem and we don't need it.
Couldn't we pass just vma, NOT mm?
Have you seen any problem?

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/8] Volatile Ranges (v8?)
  2013-06-12  4:22 ` John Stultz
                   ` (8 preceding siblings ...)
  (?)
@ 2013-06-17 16:24 ` Dhaval Giani
  2013-06-18  4:11     ` Minchan Kim
  -1 siblings, 1 reply; 48+ messages in thread
From: Dhaval Giani @ 2013-06-17 16:24 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Jan Kara, KOSAKI Motohiro, Michel Lespinasse, Minchan Kim,
	linux-mm

[-- Attachment #1: Type: text/plain, Size: 10509 bytes --]

Hi John,

I have been giving your git tree a whirl, and in order to simulate a 
limited memory environment, I was using memory cgroups.

The program I was using to test is attached here. It is your test code, 
with some changes (changing the syscall interface, reducing the memory 
pressure to be generated).

I trapped it in a memory cgroup with 1MB memory.limit_in_bytes and hit this,

[  406.207612] ------------[ cut here ]------------
[  406.207621] kernel BUG at mm/vrange.c:523!
[  406.207626] invalid opcode: 0000 [#1] SMP
[  406.207631] Modules linked in:
[  406.207637] CPU: 0 PID: 1579 Comm: volatile-test Not tainted 
3.10.0-rc5+ #2
[  406.207650] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS 
VirtualBox 12/01/2006
[  406.207655] task: ffff880006fe0000 ti: ffff88001c8b0000 task.ti: 
ffff88001c8b0000
[  406.207659] RIP: 0010:[<ffffffff81155758>] [<ffffffff81155758>] 
try_to_discard_one+0x1f8/0x210
[  406.207667] RSP: 0000:ffff88001c8b1598  EFLAGS: 00010246
[  406.207671] RAX: 0000000000000000 RBX: 00007fde082c0000 RCX: 
ffff88001f199600
[  406.207675] RDX: 0000000000000006 RSI: 0000000000000007 RDI: 
0000000000000000
[  406.207679] RBP: ffff88001c8b15f8 R08: 0000000000000591 R09: 
0000000000000055
[  406.207683] R10: 0000000000000000 R11: 0000000000000000 R12: 
ffffea00002ae2c0
[  406.207687] R13: ffff88001ef9e540 R14: ffff88001ef9e5e0 R15: 
ffff88000b7cfda8
[  406.207692] FS:  00007fde08320740(0000) GS:ffff88001fc00000(0000) 
knlGS:0000000000000000
[  406.207696] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  406.207700] CR2: 00007fde082c0000 CR3: 000000001f131000 CR4: 
00000000000006f0
[  406.207707] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[  406.207711] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
0000000000000400
[  406.207715] Stack:
[  406.207719]  0000000000000006 ffff88001f199600 ffff88001ef9e5d8 
0000000081154f16
[  406.207724]  ffff880000000001 ffffea00007c6670 ffff88001c8b15f8 
ffffea00002ae2c0
[  406.207729]  ffff88001f1386c0 ffff88001ef9e5d8 ffff88000b7cfda8 
ffff880005110a10
[  406.207734] Call Trace:
[  406.207743]  [<ffffffff81155b32>] discard_vpage+0x3c2/0x410
[  406.207753]  [<ffffffff81150881>] ? page_referenced+0x241/0x2c0
[  406.207762]  [<ffffffff8112e627>] shrink_page_list+0x397/0x950
[  406.207770]  [<ffffffff8112f12f>] shrink_inactive_list+0x14f/0x400
[  406.207778]  [<ffffffff8112f959>] shrink_lruvec+0x229/0x4e0
[  406.207787]  [<ffffffff8107e597>] ? wake_up_process+0x27/0x50
[  406.207795]  [<ffffffff8112fc76>] shrink_zone+0x66/0x1a0
[  406.207803]  [<ffffffff81130130>] do_try_to_free_pages+0x110/0x5a0
[  406.207812]  [<ffffffff8113074f>] try_to_free_mem_cgroup_pages+0xbf/0x140
[  406.207821]  [<ffffffff81179f6e>] mem_cgroup_reclaim+0x4e/0xe0
[  406.207829]  [<ffffffff8117a4ef>] __mem_cgroup_try_charge+0x4ef/0xbb0
[  406.207837]  [<ffffffff8117b29d>] mem_cgroup_charge_common+0x6d/0xd0
[  406.207846]  [<ffffffff8117cbeb>] mem_cgroup_newpage_charge+0x3b/0x50
[  406.207854]  [<ffffffff81142170>] do_wp_page+0x150/0x720
[  406.207862]  [<ffffffff811448ed>] handle_pte_fault+0x98d/0xae0
[  406.207871]  [<ffffffff811452c4>] handle_mm_fault+0x264/0x5e0
[  406.207880]  [<ffffffff8161c5b1>] __do_page_fault+0x171/0x4e0
[  406.207888]  [<ffffffff8161c92e>] ? do_page_fault+0xe/0x10
[  406.207896]  [<ffffffff81619172>] ? page_fault+0x22/0x30
[  406.207905]  [<ffffffff8161c92e>] do_page_fault+0xe/0x10
[  406.207913]  [<ffffffff81619172>] page_fault+0x22/0x30
[  406.207917] Code: c1 e7 39 48 09 c7 f0 49 ff 8d e8 02 00 00 48 89 55 
a0 48 89 4d a8 e8 78 42 00 00 85 c0 48 8b 55 a0 48 8b 4d a8 0f 85 50 ff 
ff ff <0f> 0b 66 0f 1f 44 00 00 31 db e9 7a fe ff ff 0f 0b e8 c1 aa 4b
[  406.207937] RIP  [<ffffffff81155758>] try_to_discard_one+0x1f8/0x210
[  406.207941]  RSP <ffff88001c8b1598>
[  406.207946] ---[ end trace fe9729b910a78aff ]---
[  406.207951] ------------[ cut here ]------------
[  406.207957] WARNING: at kernel/exit.c:715 do_exit+0x55/0xa30()
[  406.207960] Modules linked in:
[  406.207965] CPU: 0 PID: 1579 Comm: volatile-test Tainted: G D      
3.10.0-rc5+ #2
[  406.207969] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS 
VirtualBox 12/01/2006
[  406.207973]  0000000000000009 ffff88001c8b1288 ffffffff81612a03 
ffff88001c8b12c8
[  406.207978]  ffffffff81049bb0 ffff88001c8b14e8 000000000000000b 
ffff88001c8b14e8
[  406.207983]  0000000000000246 0000000000000000 ffff880006fe0000 
ffff88001c8b12d8
[  406.207988] Call Trace:
[  406.207997]  [<ffffffff81612a03>] dump_stack+0x19/0x1b
[  406.208189]  [<ffffffff81049bb0>] warn_slowpath_common+0x70/0xa0
[  406.208207]  [<ffffffff81049bfa>] warn_slowpath_null+0x1a/0x20
[  406.208222]  [<ffffffff8104f2e5>] do_exit+0x55/0xa30
[  406.208238]  [<ffffffff8160e4e0>] ? printk+0x61/0x63
[  406.208253]  [<ffffffff81619c9b>] oops_end+0x9b/0xe0
[  406.208269]  [<ffffffff81005908>] die+0x58/0x90
[  406.208285]  [<ffffffff8161956b>] do_trap+0x6b/0x170
[  406.208298]  [<ffffffff8161c9b2>] ? 
__atomic_notifier_call_chain+0x12/0x20
[  406.208309]  [<ffffffff81002e75>] do_invalid_op+0x95/0xb0
[  406.208317]  [<ffffffff81155758>] ? try_to_discard_one+0x1f8/0x210
[  406.208328]  [<ffffffff812b882e>] ? blk_queue_bio+0x32e/0x3b0
[  406.208338]  [<ffffffff81622128>] invalid_op+0x18/0x20
[  406.208348]  [<ffffffff81155758>] ? try_to_discard_one+0x1f8/0x210
[  406.208360]  [<ffffffff81155748>] ? try_to_discard_one+0x1e8/0x210
[  406.208370]  [<ffffffff81155b32>] discard_vpage+0x3c2/0x410
[  406.208383]  [<ffffffff81150881>] ? page_referenced+0x241/0x2c0
[  406.208394]  [<ffffffff8112e627>] shrink_page_list+0x397/0x950
[  406.208405]  [<ffffffff8112f12f>] shrink_inactive_list+0x14f/0x400
[  406.208417]  [<ffffffff8112f959>] shrink_lruvec+0x229/0x4e0
[  406.208429]  [<ffffffff8107e597>] ? wake_up_process+0x27/0x50
[  406.208440]  [<ffffffff8112fc76>] shrink_zone+0x66/0x1a0
[  406.208452]  [<ffffffff81130130>] do_try_to_free_pages+0x110/0x5a0
[  406.208464]  [<ffffffff8113074f>] try_to_free_mem_cgroup_pages+0xbf/0x140
[  406.208476]  [<ffffffff81179f6e>] mem_cgroup_reclaim+0x4e/0xe0
[  406.208489]  [<ffffffff8117a4ef>] __mem_cgroup_try_charge+0x4ef/0xbb0
[  406.208501]  [<ffffffff8117b29d>] mem_cgroup_charge_common+0x6d/0xd0
[  406.208514]  [<ffffffff8117cbeb>] mem_cgroup_newpage_charge+0x3b/0x50
[  406.208533]  [<ffffffff81142170>] do_wp_page+0x150/0x720
[  406.208543]  [<ffffffff811448ed>] handle_pte_fault+0x98d/0xae0
[  406.208556]  [<ffffffff811452c4>] handle_mm_fault+0x264/0x5e0
[  406.208568]  [<ffffffff8161c5b1>] __do_page_fault+0x171/0x4e0
[  406.208579]  [<ffffffff8161c92e>] ? do_page_fault+0xe/0x10
[  406.208591]  [<ffffffff81619172>] ? page_fault+0x22/0x30
[  406.208604]  [<ffffffff8161c92e>] do_page_fault+0xe/0x10
[  406.208615]  [<ffffffff81619172>] page_fault+0x22/0x30
[  406.208621] ---[ end trace fe9729b910a78b00 ]---
[  406.208643] BUG: Bad page map in process volatile-test 
pte:800000000ab8b005 pmd:163b2067
[  406.208651] page:ffffea00002ae2c0 count:3 mapcount:-1 
mapping:ffff88001bc769c1 index:0x7fde082c0
[  406.208657] page flags: 
0x3ff00000090009(locked|uptodate|swapcache|swapbacked)
[  406.208666] pc:ffff88001e12b8b0 pc->flags:2 
pc->mem_cgroup:ffff88000329f000
[  406.208672] addr:00007fde082c0000 vm_flags:00100073 
anon_vma:ffff88001f137dc0 mapping:          (null) index:7fde082c0
[  406.208678] CPU: 0 PID: 1579 Comm: volatile-test Tainted: G D W    
3.10.0-rc5+ #2
[  406.208683] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS 
VirtualBox 12/01/2006
[  406.208688]  ffff880005110a10 ffff88001c8b10b8 ffffffff81612a03 
ffff88001c8b1108
[  406.208695]  ffffffff81140d54 800000000ab8b005 00000007fde082c0 
ffff88001c8b1108
[  406.208703]  00007fde08323000 00007fde082c0000 ffff8800163b2600 
ffffea00002ae2c0
[  406.208710] Call Trace:
[  406.208722]  [<ffffffff81612a03>] dump_stack+0x19/0x1b
[  406.208742]  [<ffffffff81140d54>] print_bad_pte+0x194/0x230
[  406.208754]  [<ffffffff81142e8b>] unmap_single_vma+0x74b/0x810
[  406.208765]  [<ffffffff81143759>] unmap_vmas+0x49/0x60
[  406.208777]  [<ffffffff8114c311>] exit_mmap+0xb1/0x150
[  406.208790]  [<ffffffff8116af53>] ? kmem_cache_free+0x1d3/0x1f0
[  406.208802]  [<ffffffff81046f7f>] mmput+0x8f/0xf0
[  406.208814]  [<ffffffff8104f507>] do_exit+0x277/0xa30
[  406.208826]  [<ffffffff8160e4e0>] ? printk+0x61/0x63
[  406.208836]  [<ffffffff81619c9b>] oops_end+0x9b/0xe0
[  406.208845]  [<ffffffff81005908>] die+0x58/0x90
[  406.208854]  [<ffffffff8161956b>] do_trap+0x6b/0x170
[  406.208863]  [<ffffffff8161c9b2>] ? 
__atomic_notifier_call_chain+0x12/0x20
[  406.208874]  [<ffffffff81002e75>] do_invalid_op+0x95/0xb0
[  406.208951]  [<ffffffff81155758>] ? try_to_discard_one+0x1f8/0x210
[  406.208964]  [<ffffffff812b882e>] ? blk_queue_bio+0x32e/0x3b0
[  406.208977]  [<ffffffff81622128>] invalid_op+0x18/0x20
[  406.208987]  [<ffffffff81155758>] ? try_to_discard_one+0x1f8/0x210
[  406.208996]  [<ffffffff81155748>] ? try_to_discard_one+0x1e8/0x210
[  406.209485]  [<ffffffff81155b32>] discard_vpage+0x3c2/0x410
[  406.209497]  [<ffffffff81150881>] ? page_referenced+0x241/0x2c0
[  406.209507]  [<ffffffff8112e627>] shrink_page_list+0x397/0x950
[  406.209532]  [<ffffffff8112f12f>] shrink_inactive_list+0x14f/0x400
[  406.209542]  [<ffffffff8112f959>] shrink_lruvec+0x229/0x4e0
[  406.209551]  [<ffffffff8107e597>] ? wake_up_process+0x27/0x50
[  406.209560]  [<ffffffff8112fc76>] shrink_zone+0x66/0x1a0
[  406.209569]  [<ffffffff81130130>] do_try_to_free_pages+0x110/0x5a0
[  406.209577]  [<ffffffff8113074f>] try_to_free_mem_cgroup_pages+0xbf/0x140
[  406.209586]  [<ffffffff81179f6e>] mem_cgroup_reclaim+0x4e/0xe0
[  406.209595]  [<ffffffff8117a4ef>] __mem_cgroup_try_charge+0x4ef/0xbb0
[  406.209605]  [<ffffffff8117b29d>] mem_cgroup_charge_common+0x6d/0xd0
[  406.209618]  [<ffffffff8117cbeb>] mem_cgroup_newpage_charge+0x3b/0x50
[  406.209629]  [<ffffffff81142170>] do_wp_page+0x150/0x720
[  406.209640]  [<ffffffff811448ed>] handle_pte_fault+0x98d/0xae0
[  406.209652]  [<ffffffff811452c4>] handle_mm_fault+0x264/0x5e0
[  406.209664]  [<ffffffff8161c5b1>] __do_page_fault+0x171/0x4e0
[  406.209758]  [<ffffffff8161c92e>] ? do_page_fault+0xe/0x10
[  406.209771]  [<ffffffff81619172>] ? page_fault+0x22/0x30
[  406.209781]  [<ffffffff8161c92e>] do_page_fault+0xe/0x10
[  406.209791]  [<ffffffff81619172>] page_fault+0x22/0x30

I can send you the full dmesg/config if you care about it. It took me 
3-4 attempts of running the code before I hit this bug. It is reproducible.

Thanks!
Dhaval

[-- Attachment #2: volatile-test.c --]
[-- Type: text/plain, Size: 2412 bytes --]


#define _GNU_SOURCE
#include <stdio.h>
#include <pthread.h>
#include <sched.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/syscall.h>

#define SYS_vrange 314

#define VRANGE_VOLATILE	0	/* unpin all pages so VM can discard them */
#define VRANGE_NOVOLATILE	1	/* pin all pages so VM can't discard them */

#define VRANGE_MODE_SHARED 0x1	/* discard all pages of the range */



#define VRANGE_MODE 0x1

static int vrange(unsigned long start, size_t length, int mode, int *purged)
{
	return syscall(SYS_vrange, start, length, mode, purged);
}


static int mvolatile(void *addr, size_t length)
{
	return vrange((long)addr, length, VRANGE_VOLATILE, 0);
}


static int mnovolatile(void *addr, size_t length, int* purged)
{
	return vrange((long)addr, length, VRANGE_NOVOLATILE, purged);
}


char* vaddr;
int is_anon = 0;
#define PAGE_SIZE (4*1024)
#define CHUNK (4*1024*4)
#define CHUNKNUM 26
#define FULLSIZE (CHUNK*CHUNKNUM + 2*PAGE_SIZE)

void generate_pressure(megs)
{
	pid_t child;
	int one_meg = 1024*1024;
	char *addr;
	int i, status;

	child = fork();


	if (!child) {
		if (is_anon) {
			/* make sure we write to all the vrange pages
			 *  in order to break the copy-on-write
	 		 */
			for(i=0; i < CHUNKNUM; i++)
				memset(vaddr + (i*CHUNK), '0', CHUNK);
		}

		for (i=0; i < megs; i++) {
			addr = malloc(one_meg);
			bzero(addr, one_meg);		
		}
		exit(0);
	}

	waitpid(child, &status, 0);
	return;
}

int main(int argc, char *argv[])
{
	int i, purged;
	char* file;
	int fd;
	int is_file = 0;
	if (argc > 1) {
		file = argv[1];
		fd = open(file, O_RDWR);
		vaddr = mmap(0, FULLSIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
		is_file = 1;
	} else {
		is_anon = 1;
		vaddr = malloc(FULLSIZE);
	}

	purged = 0;
	vaddr += PAGE_SIZE-1;
	vaddr -= (long)vaddr % PAGE_SIZE;

	for(i=0; i < CHUNKNUM; i++)
		memset(vaddr + (i*CHUNK), 'A'+i, CHUNK);


	for(i=0; i < CHUNKNUM; ) {
		mvolatile(vaddr + (i*CHUNK), CHUNK);
		i+=2;
	}

//	for(i=0; i < CHUNKNUM; i++)
//		printf("%c\n", vaddr[i*CHUNK]);

	generate_pressure(3);

//	for(i=0; i < CHUNKNUM; i++)
//		printf("%c\n", vaddr[i*CHUNK]);

	for(i=0; i < CHUNKNUM; ) {
		int ret;
		ret = mnovolatile(vaddr + (i*CHUNK), CHUNK, &purged);
		i+=2;
	}

	if (purged)
		printf("Data purged!\n");
	for(i=0; i < CHUNKNUM; i++)
		printf("%c\n", vaddr[i*CHUNK]);
	


	return 0;
}


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/8] Volatile Ranges (v8?)
  2013-06-17 16:24 ` [PATCH 0/8] Volatile Ranges (v8?) Dhaval Giani
@ 2013-06-18  4:11     ` Minchan Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Minchan Kim @ 2013-06-18  4:11 UTC (permalink / raw)
  To: Dhaval Giani
  Cc: John Stultz, LKML, Andrew Morton, Android Kernel Team,
	Robert Love, Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Jan Kara, KOSAKI Motohiro, Michel Lespinasse, linux-mm

Hello Dhaval,

On Mon, Jun 17, 2013 at 12:24:07PM -0400, Dhaval Giani wrote:
> Hi John,
> 
> I have been giving your git tree a whirl, and in order to simulate a
> limited memory environment, I was using memory cgroups.
> 
> The program I was using to test is attached here. It is your test
> code, with some changes (changing the syscall interface, reducing
> the memory pressure to be generated).
> 
> I trapped it in a memory cgroup with 1MB memory.limit_in_bytes and hit this,
> 
> [  406.207612] ------------[ cut here ]------------
> [  406.207621] kernel BUG at mm/vrange.c:523!
> [  406.207626] invalid opcode: 0000 [#1] SMP
> [  406.207631] Modules linked in:
> [  406.207637] CPU: 0 PID: 1579 Comm: volatile-test Not tainted

Thanks for the testing!
Does below patch fix your problem?

diff --git a/mm/swapfile.c b/mm/swapfile.c
index d41c63f..1f6c80e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -751,7 +751,7 @@ int __free_swap_and_cache(swp_entry_t entry)
 			page = find_get_page(swap_address_space(entry),
 						entry.val);
 		}
-		spin_unlock(&swap_lock);
+		spin_unlock(&p->lock);
 	}
 
 	if (page) {
diff --git a/mm/vrange.c b/mm/vrange.c
index fa965fb..dc32cfa 100644
--- a/mm/vrange.c
+++ b/mm/vrange.c
@@ -485,7 +485,7 @@ int try_to_discard_one(struct vrange_root *vroot, struct page *page,
 {
 	struct mm_struct *mm = vma->vm_mm;
 	pte_t *pte;
-	pte_t pteval;
+	pte_t pteval, pteswap;
 	spinlock_t *ptl;
 	int ret = 0;
 	bool present;
@@ -505,7 +505,7 @@ int try_to_discard_one(struct vrange_root *vroot, struct page *page,
 	present = pte_present(*pte);
 	flush_cache_page(vma, address, page_to_pfn(page));
 
-	ptep_clear_flush(vma, address, pte);
+	pteswap = ptep_clear_flush(vma, address, pte);
 	pteval = pte_mkvrange(*pte);
 
 	update_hiwater_rss(mm);
@@ -517,10 +517,11 @@ int try_to_discard_one(struct vrange_root *vroot, struct page *page,
 	page_remove_rmap(page);
 	page_cache_release(page);
 	if (!present) {
-		swp_entry_t entry = pte_to_swp_entry(*pte);
+		swp_entry_t entry = pte_to_swp_entry(pteswap);
 		dec_mm_counter(mm, MM_SWAPENTS);
-		if (unlikely(!__free_swap_and_cache(entry)))
+		if (unlikely(!__free_swap_and_cache(entry))) {
 			BUG_ON(1);
+		}
 	}
 
 	set_pte_at(mm, address, pte, pteval);
-- 
1.7.9.5

-- 
Kind regards,
Minchan Kim

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/8] Volatile Ranges (v8?)
@ 2013-06-18  4:11     ` Minchan Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Minchan Kim @ 2013-06-18  4:11 UTC (permalink / raw)
  To: Dhaval Giani
  Cc: John Stultz, LKML, Andrew Morton, Android Kernel Team,
	Robert Love, Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Jan Kara, KOSAKI Motohiro, Michel Lespinasse, linux-mm

Hello Dhaval,

On Mon, Jun 17, 2013 at 12:24:07PM -0400, Dhaval Giani wrote:
> Hi John,
> 
> I have been giving your git tree a whirl, and in order to simulate a
> limited memory environment, I was using memory cgroups.
> 
> The program I was using to test is attached here. It is your test
> code, with some changes (changing the syscall interface, reducing
> the memory pressure to be generated).
> 
> I trapped it in a memory cgroup with 1MB memory.limit_in_bytes and hit this,
> 
> [  406.207612] ------------[ cut here ]------------
> [  406.207621] kernel BUG at mm/vrange.c:523!
> [  406.207626] invalid opcode: 0000 [#1] SMP
> [  406.207631] Modules linked in:
> [  406.207637] CPU: 0 PID: 1579 Comm: volatile-test Not tainted

Thanks for the testing!
Does below patch fix your problem?

diff --git a/mm/swapfile.c b/mm/swapfile.c
index d41c63f..1f6c80e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -751,7 +751,7 @@ int __free_swap_and_cache(swp_entry_t entry)
 			page = find_get_page(swap_address_space(entry),
 						entry.val);
 		}
-		spin_unlock(&swap_lock);
+		spin_unlock(&p->lock);
 	}
 
 	if (page) {
diff --git a/mm/vrange.c b/mm/vrange.c
index fa965fb..dc32cfa 100644
--- a/mm/vrange.c
+++ b/mm/vrange.c
@@ -485,7 +485,7 @@ int try_to_discard_one(struct vrange_root *vroot, struct page *page,
 {
 	struct mm_struct *mm = vma->vm_mm;
 	pte_t *pte;
-	pte_t pteval;
+	pte_t pteval, pteswap;
 	spinlock_t *ptl;
 	int ret = 0;
 	bool present;
@@ -505,7 +505,7 @@ int try_to_discard_one(struct vrange_root *vroot, struct page *page,
 	present = pte_present(*pte);
 	flush_cache_page(vma, address, page_to_pfn(page));
 
-	ptep_clear_flush(vma, address, pte);
+	pteswap = ptep_clear_flush(vma, address, pte);
 	pteval = pte_mkvrange(*pte);
 
 	update_hiwater_rss(mm);
@@ -517,10 +517,11 @@ int try_to_discard_one(struct vrange_root *vroot, struct page *page,
 	page_remove_rmap(page);
 	page_cache_release(page);
 	if (!present) {
-		swp_entry_t entry = pte_to_swp_entry(*pte);
+		swp_entry_t entry = pte_to_swp_entry(pteswap);
 		dec_mm_counter(mm, MM_SWAPENTS);
-		if (unlikely(!__free_swap_and_cache(entry)))
+		if (unlikely(!__free_swap_and_cache(entry))) {
 			BUG_ON(1);
+		}
 	}
 
 	set_pte_at(mm, address, pte, pteval);
-- 
1.7.9.5

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/8] Volatile Ranges (v8?)
  2013-06-18  4:11     ` Minchan Kim
@ 2013-06-18 16:59       ` Dhaval Giani
  -1 siblings, 0 replies; 48+ messages in thread
From: Dhaval Giani @ 2013-06-18 16:59 UTC (permalink / raw)
  To: Minchan Kim
  Cc: John Stultz, LKML, Andrew Morton, Android Kernel Team,
	Robert Love, Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Jan Kara, KOSAKI Motohiro, Michel Lespinasse, linux-mm

On 2013-06-18 12:11 AM, Minchan Kim wrote:
> Hello Dhaval,
>
> On Mon, Jun 17, 2013 at 12:24:07PM -0400, Dhaval Giani wrote:
>> Hi John,
>>
>> I have been giving your git tree a whirl, and in order to simulate a
>> limited memory environment, I was using memory cgroups.
>>
>> The program I was using to test is attached here. It is your test
>> code, with some changes (changing the syscall interface, reducing
>> the memory pressure to be generated).
>>
>> I trapped it in a memory cgroup with 1MB memory.limit_in_bytes and hit this,
>>
>> [  406.207612] ------------[ cut here ]------------
>> [  406.207621] kernel BUG at mm/vrange.c:523!
>> [  406.207626] invalid opcode: 0000 [#1] SMP
>> [  406.207631] Modules linked in:
>> [  406.207637] CPU: 0 PID: 1579 Comm: volatile-test Not tainted
> Thanks for the testing!
> Does below patch fix your problem?

Yes it does! Thank you very much for the patch.

Thanks!
Dhaval

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/8] Volatile Ranges (v8?)
@ 2013-06-18 16:59       ` Dhaval Giani
  0 siblings, 0 replies; 48+ messages in thread
From: Dhaval Giani @ 2013-06-18 16:59 UTC (permalink / raw)
  To: Minchan Kim
  Cc: John Stultz, LKML, Andrew Morton, Android Kernel Team,
	Robert Love, Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Jan Kara, KOSAKI Motohiro, Michel Lespinasse, linux-mm

On 2013-06-18 12:11 AM, Minchan Kim wrote:
> Hello Dhaval,
>
> On Mon, Jun 17, 2013 at 12:24:07PM -0400, Dhaval Giani wrote:
>> Hi John,
>>
>> I have been giving your git tree a whirl, and in order to simulate a
>> limited memory environment, I was using memory cgroups.
>>
>> The program I was using to test is attached here. It is your test
>> code, with some changes (changing the syscall interface, reducing
>> the memory pressure to be generated).
>>
>> I trapped it in a memory cgroup with 1MB memory.limit_in_bytes and hit this,
>>
>> [  406.207612] ------------[ cut here ]------------
>> [  406.207621] kernel BUG at mm/vrange.c:523!
>> [  406.207626] invalid opcode: 0000 [#1] SMP
>> [  406.207631] Modules linked in:
>> [  406.207637] CPU: 0 PID: 1579 Comm: volatile-test Not tainted
> Thanks for the testing!
> Does below patch fix your problem?

Yes it does! Thank you very much for the patch.

Thanks!
Dhaval

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/8] vrange: Add method to purge volatile ranges
  2013-06-12  4:22   ` John Stultz
@ 2013-06-19  4:34     ` Minchan Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Minchan Kim @ 2013-06-19  4:34 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm

On Tue, Jun 11, 2013 at 09:22:50PM -0700, John Stultz wrote:
> From: Minchan Kim <minchan@kernel.org>
> 
> This patch adds discarding function to purge volatile ranges under
> memory pressure. Logic is as following:
> 
> 1. Memory pressure happens
> 2. VM start to reclaim pages
> 3. Check the page is in volatile range.
> 4. If so, zap the page from the process's page table.
>    (By semantic vrange(2), we should mark it with another one to
>     make page fault when you try to access the address. It will
>     be introduced later patch)
> 5. If page is unmapped from all processes, discard it instead of swapping.
> 
> This patch does not address the case where there is no swap, which
> keeps anonymous pages from being aged off the LRUs. Minchan has
> additional patches that add support for purging anonymous pages
> 
> XXX: First pass at file purging. Seems to work, but is likely broken
> and needs close review.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Android Kernel Team <kernel-team@android.com>
> Cc: Robert Love <rlove@google.com>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Dave Hansen <dave@linux.vnet.ibm.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Neil Brown <neilb@suse.de>
> Cc: Andrea Righi <andrea@betterlinux.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Mike Hommey <mh@glandium.org>
> Cc: Taras Glek <tglek@mozilla.com>
> Cc: Dhaval Giani <dgiani@mozilla.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
> Cc: Michel Lespinasse <walken@google.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: linux-mm@kvack.org <linux-mm@kvack.org>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> [jstultz: Reworked to add purging of file pages, commit log tweaks]
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  include/linux/rmap.h   |  12 +-
>  include/linux/swap.h   |   1 +
>  include/linux/vrange.h |   7 ++
>  mm/ksm.c               |   2 +-
>  mm/rmap.c              |  30 +++--
>  mm/swapfile.c          |  36 ++++++
>  mm/vmscan.c            |  16 ++-
>  mm/vrange.c            | 332 +++++++++++++++++++++++++++++++++++++++++++++++++
>  8 files changed, 420 insertions(+), 16 deletions(-)

This patch has some bugs so below patch should fix them and pass my
simple cases.

>From 13c458388a4784a785d93f285b0c54156c3b04aa Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Tue, 11 Jun 2013 21:22:50 -0700
Subject: [PATCH 1/2] vrange: Add method to purge volatile ranges

This patch adds discarding function to purge volatile ranges under
memory pressure. Logic is as following:

1. Memory pressure happens
2. VM start to reclaim pages
3. Check the page is in volatile range.
4. If so, zap the page from the process's page table.
   (By semantic vrange(2), we should mark it with another one to
    make page fault when you try to access the address. It will
    be introduced later patch)
5. If page is unmapped from all processes, discard it instead of swapping.

This patch does not address the case where there is no swap, which
keeps anonymous pages from being aged off the LRUs. Minchan has
additional patches that add support for purging anonymous pages

XXX: First pass at file purging. Seems to work, but is likely broken
and needs close review.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Android Kernel Team <kernel-team@android.com>
Cc: Robert Love <rlove@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mike Hommey <mh@glandium.org>
Cc: Taras Glek <tglek@mozilla.com>
Cc: Dhaval Giani <dgiani@mozilla.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: linux-mm@kvack.org <linux-mm@kvack.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
[jstultz: Reworked to add purging of file pages, commit log tweaks]
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/rmap.h   |   12 +-
 include/linux/swap.h   |    1 +
 include/linux/vrange.h |    7 +
 mm/ksm.c               |    2 +-
 mm/rmap.c              |   30 +++--
 mm/swapfile.c          |   36 ++++++
 mm/vmscan.c            |   16 ++-
 mm/vrange.c            |  332 ++++++++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 420 insertions(+), 16 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 6dacb93..6432dfb 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -83,6 +83,8 @@ enum ttu_flags {
 };
 
 #ifdef CONFIG_MMU
+unsigned long vma_address(struct page *page, struct vm_area_struct *vma);
+
 static inline void get_anon_vma(struct anon_vma *anon_vma)
 {
 	atomic_inc(&anon_vma->refcount);
@@ -182,9 +184,11 @@ static inline void page_dup_rmap(struct page *page)
  * Called from mm/vmscan.c to handle paging out
  */
 int page_referenced(struct page *, int is_locked,
-			struct mem_cgroup *memcg, unsigned long *vm_flags);
+			struct mem_cgroup *memcg, unsigned long *vm_flags,
+			int *is_vrange);
 int page_referenced_one(struct page *, struct vm_area_struct *,
-	unsigned long address, unsigned int *mapcount, unsigned long *vm_flags);
+	unsigned long address, unsigned int *mapcount, unsigned long *vm_flags,
+	int *is_vrange);
 
 #define TTU_ACTION(x) ((x) & TTU_ACTION_MASK)
 
@@ -249,9 +253,11 @@ int rmap_walk(struct page *page, int (*rmap_one)(struct page *,
 
 static inline int page_referenced(struct page *page, int is_locked,
 				  struct mem_cgroup *memcg,
-				  unsigned long *vm_flags)
+				  unsigned long *vm_flags,
+				  int *is_vrange)
 {
 	*vm_flags = 0;
+	*is_vrange = 0;
 	return 0;
 }
 
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 1701ce4..5907936 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -383,6 +383,7 @@ extern int swap_duplicate(swp_entry_t);
 extern int swapcache_prepare(swp_entry_t);
 extern void swap_free(swp_entry_t);
 extern void swapcache_free(swp_entry_t, struct page *page);
+extern int __free_swap_and_cache(swp_entry_t);
 extern int free_swap_and_cache(swp_entry_t);
 extern int swap_type_of(dev_t, sector_t, struct block_device **);
 extern unsigned int count_swap_pages(int, int);
diff --git a/include/linux/vrange.h b/include/linux/vrange.h
index a97ac25..cbb609a 100644
--- a/include/linux/vrange.h
+++ b/include/linux/vrange.h
@@ -37,6 +37,10 @@ extern int vrange_clear(struct vrange_root *vroot,
 extern void vrange_root_cleanup(struct vrange_root *vroot);
 extern int vrange_fork(struct mm_struct *new,
 					struct mm_struct *old);
+int discard_vpage(struct page *page);
+bool vrange_address(struct mm_struct *mm, unsigned long start,
+			unsigned long end);
+
 #else
 
 static inline void vrange_init(void) {};
@@ -47,5 +51,8 @@ static inline int vrange_fork(struct mm_struct *new, struct mm_struct *old)
 	return 0;
 }
 
+static inline bool vrange_address(struct mm_struct *mm, unsigned long start,
+		unsigned long end) { return false; };
+static inline int discard_vpage(struct page *page) { return 0 };
 #endif
 #endif /* _LINIUX_VRANGE_H */
diff --git a/mm/ksm.c b/mm/ksm.c
index b6afe0c..debc20c 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1932,7 +1932,7 @@ again:
 				continue;
 
 			referenced += page_referenced_one(page, vma,
-				rmap_item->address, &mapcount, vm_flags);
+				rmap_item->address, &mapcount, vm_flags, NULL);
 			if (!search_new_forks || !mapcount)
 				break;
 		}
diff --git a/mm/rmap.c b/mm/rmap.c
index 6280da8..5522522 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -57,6 +57,8 @@
 #include <linux/migrate.h>
 #include <linux/hugetlb.h>
 #include <linux/backing-dev.h>
+#include <linux/vrange.h>
+#include <linux/rmap.h>
 
 #include <asm/tlbflush.h>
 
@@ -523,8 +525,7 @@ __vma_address(struct page *page, struct vm_area_struct *vma)
 	return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 }
 
-inline unsigned long
-vma_address(struct page *page, struct vm_area_struct *vma)
+unsigned long vma_address(struct page *page, struct vm_area_struct *vma)
 {
 	unsigned long address = __vma_address(page, vma);
 
@@ -662,7 +663,7 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
  */
 int page_referenced_one(struct page *page, struct vm_area_struct *vma,
 			unsigned long address, unsigned int *mapcount,
-			unsigned long *vm_flags)
+			unsigned long *vm_flags, int *is_vrange)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	int referenced = 0;
@@ -724,6 +725,9 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
 				referenced++;
 		}
 		pte_unmap_unlock(pte, ptl);
+		if (is_vrange &&
+			vrange_address(mm, address, address + PAGE_SIZE - 1))
+			*is_vrange = 1;
 	}
 
 	(*mapcount)--;
@@ -736,7 +740,8 @@ out:
 
 static int page_referenced_anon(struct page *page,
 				struct mem_cgroup *memcg,
-				unsigned long *vm_flags)
+				unsigned long *vm_flags,
+				int *is_vrange)
 {
 	unsigned int mapcount;
 	struct anon_vma *anon_vma;
@@ -761,7 +766,7 @@ static int page_referenced_anon(struct page *page,
 		if (memcg && !mm_match_cgroup(vma->vm_mm, memcg))
 			continue;
 		referenced += page_referenced_one(page, vma, address,
-						  &mapcount, vm_flags);
+					&mapcount, vm_flags, is_vrange);
 		if (!mapcount)
 			break;
 	}
@@ -785,7 +790,9 @@ static int page_referenced_anon(struct page *page,
  */
 static int page_referenced_file(struct page *page,
 				struct mem_cgroup *memcg,
-				unsigned long *vm_flags)
+				unsigned long *vm_flags,
+				int *is_vrange)
+
 {
 	unsigned int mapcount;
 	struct address_space *mapping = page->mapping;
@@ -826,7 +833,8 @@ static int page_referenced_file(struct page *page,
 		if (memcg && !mm_match_cgroup(vma->vm_mm, memcg))
 			continue;
 		referenced += page_referenced_one(page, vma, address,
-						  &mapcount, vm_flags);
+							&mapcount, vm_flags,
+							is_vrange);
 		if (!mapcount)
 			break;
 	}
@@ -841,6 +849,7 @@ static int page_referenced_file(struct page *page,
  * @is_locked: caller holds lock on the page
  * @memcg: target memory cgroup
  * @vm_flags: collect encountered vma->vm_flags who actually referenced the page
+ * @is_vrange: the page in vrange of some process
  *
  * Quick test_and_clear_referenced for all mappings to a page,
  * returns the number of ptes which referenced the page.
@@ -848,7 +857,8 @@ static int page_referenced_file(struct page *page,
 int page_referenced(struct page *page,
 		    int is_locked,
 		    struct mem_cgroup *memcg,
-		    unsigned long *vm_flags)
+		    unsigned long *vm_flags,
+		    int *is_vrange)
 {
 	int referenced = 0;
 	int we_locked = 0;
@@ -867,10 +877,10 @@ int page_referenced(struct page *page,
 								vm_flags);
 		else if (PageAnon(page))
 			referenced += page_referenced_anon(page, memcg,
-								vm_flags);
+							vm_flags, is_vrange);
 		else if (page->mapping)
 			referenced += page_referenced_file(page, memcg,
-								vm_flags);
+							vm_flags, is_vrange);
 		if (we_locked)
 			unlock_page(page);
 
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 6c340d9..1f6c80e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -734,6 +734,42 @@ int try_to_free_swap(struct page *page)
 }
 
 /*
+ * It's almost same with free_swap_and_cache except page is already
+ * locked.
+ */
+int __free_swap_and_cache(swp_entry_t entry)
+{
+	struct swap_info_struct *p;
+	struct page *page = NULL;
+
+	if (non_swap_entry(entry))
+		return 1;
+
+	p = swap_info_get(entry);
+	if (p) {
+		if (swap_entry_free(p, entry, 1) == SWAP_HAS_CACHE) {
+			page = find_get_page(swap_address_space(entry),
+						entry.val);
+		}
+		spin_unlock(&p->lock);
+	}
+
+	if (page) {
+		/*
+		 * Not mapped elsewhere, or swap space full? Free it!
+		 * Also recheck PageSwapCache now page is locked (above).
+		 */
+		if (PageSwapCache(page) && !PageWriteback(page) &&
+				(!page_mapped(page) || vm_swap_full())) {
+			delete_from_swap_cache(page);
+			SetPageDirty(page);
+		}
+		page_cache_release(page);
+	}
+	return p != NULL;
+}
+
+/*
  * Free the swap entry like above, but also try to
  * free the page cache entry if it is the last user.
  */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fa6a853..c75e0ac 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -43,6 +43,7 @@
 #include <linux/sysctl.h>
 #include <linux/oom.h>
 #include <linux/prefetch.h>
+#include <linux/vrange.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -611,6 +612,7 @@ enum page_references {
 	PAGEREF_RECLAIM,
 	PAGEREF_RECLAIM_CLEAN,
 	PAGEREF_KEEP,
+	PAGEREF_DISCARD,
 	PAGEREF_ACTIVATE,
 };
 
@@ -619,9 +621,10 @@ static enum page_references page_check_references(struct page *page,
 {
 	int referenced_ptes, referenced_page;
 	unsigned long vm_flags;
+	int is_vrange = 0;
 
 	referenced_ptes = page_referenced(page, 1, sc->target_mem_cgroup,
-					  &vm_flags);
+					  &vm_flags, &is_vrange);
 	referenced_page = TestClearPageReferenced(page);
 
 	/*
@@ -631,6 +634,12 @@ static enum page_references page_check_references(struct page *page,
 	if (vm_flags & VM_LOCKED)
 		return PAGEREF_RECLAIM;
 
+	/*
+	 * Bail out if the page is in vrange and try to discard.
+	 */
+	if (is_vrange)
+		return PAGEREF_DISCARD;
+
 	if (referenced_ptes) {
 		if (PageSwapBacked(page))
 			return PAGEREF_ACTIVATE;
@@ -769,6 +778,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			goto activate_locked;
 		case PAGEREF_KEEP:
 			goto keep_locked;
+		case PAGEREF_DISCARD:
+			if (discard_vpage(page))
+				goto free_it;
 		case PAGEREF_RECLAIM:
 		case PAGEREF_RECLAIM_CLEAN:
 			; /* try to reclaim the page below */
@@ -1497,7 +1509,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 		}
 
 		if (page_referenced(page, 0, sc->target_mem_cgroup,
-				    &vm_flags)) {
+				    &vm_flags, NULL)) {
 			nr_rotated += hpage_nr_pages(page);
 			/*
 			 * Identify referenced, file-backed active pages and
diff --git a/mm/vrange.c b/mm/vrange.c
index 5278939..d57cb38 100644
--- a/mm/vrange.c
+++ b/mm/vrange.c
@@ -6,6 +6,13 @@
 #include <linux/slab.h>
 #include <linux/mman.h>
 #include <linux/syscalls.h>
+#include <linux/pagemap.h>
+#include <linux/rmap.h>
+#include <linux/hugetlb.h>
+#include "internal.h"
+#include <linux/swap.h>
+#include <linux/swapops.h>
+#include <linux/mmu_notifier.h>
 
 static struct kmem_cache *vrange_cachep;
 
@@ -364,3 +371,328 @@ SYSCALL_DEFINE4(vrange, unsigned long, start,
 out:
 	return ret;
 }
+
+
+static bool __vrange_address(struct vrange_root *vroot,
+			unsigned long start, unsigned long end)
+{
+	struct interval_tree_node *node;
+
+	node = interval_tree_iter_first(&vroot->v_rb, start, end);
+	return node ? true : false;
+}
+
+bool vrange_address(struct mm_struct *mm,
+			unsigned long start, unsigned long end)
+{
+	struct vrange_root *vroot;
+	unsigned long vstart_idx, vend_idx;
+	struct vm_area_struct *vma;
+	bool ret;
+
+	vma = find_vma(mm, start);
+	if (vma->vm_file && (vma->vm_flags & VM_SHARED)) {
+		vroot = &vma->vm_file->f_mapping->vroot;
+		vstart_idx = vma->vm_pgoff + start - vma->vm_start;
+		vend_idx = vma->vm_pgoff + end - vma->vm_start;
+	} else {
+		vroot = &mm->vroot;
+		vstart_idx = start;
+		vend_idx = end;
+	}
+
+	vrange_lock(vroot);
+	ret = __vrange_address(vroot, vstart_idx, vend_idx);
+	vrange_unlock(vroot);
+	return ret;
+}
+
+static pte_t *__vpage_check_address(struct page *page,
+		struct mm_struct *mm, unsigned long address, spinlock_t **ptlp)
+{
+	pmd_t *pmd;
+	pte_t *pte;
+	spinlock_t *ptl;
+	bool present;
+
+	/* TODO : look into tlbfs */
+	if (unlikely(PageHuge(page)))
+		return NULL;
+
+	pmd = mm_find_pmd(mm, address);
+	if (!pmd)
+		return NULL;
+	/*
+	 * TODO : Support THP
+	 */
+	if (pmd_trans_huge(*pmd))
+		return NULL;
+
+	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
+	if (pte_none(*pte))
+		goto out;
+
+	present = pte_present(*pte);
+	if (present && page_to_pfn(page) != pte_pfn(*pte))
+		goto out;
+	else if (present) {
+		*ptlp = ptl;
+		return pte;
+	} else {
+		swp_entry_t entry = { .val = page_private(page) };
+
+		VM_BUG_ON(non_swap_entry(entry));
+		if (entry.val != pte_to_swp_entry(*pte).val)
+			goto out;
+		*ptlp = ptl;
+		return pte;
+	}
+out:
+	pte_unmap_unlock(pte, ptl);
+	return NULL;
+}
+
+/*
+ * This functions checks @page is matched with pte's encoded one
+ * which could be a page or swap slot.
+ */
+static inline pte_t *vpage_check_address(struct page *page,
+		struct mm_struct *mm, unsigned long address,
+		spinlock_t **ptlp)
+{
+	pte_t *ptep;
+	__cond_lock(*ptlp, ptep = __vpage_check_address(page,
+				mm, address, ptlp));
+	return ptep;
+}
+
+static void __vrange_purge(struct vrange_root *vroot,
+		unsigned long start, unsigned long end)
+{
+	struct vrange *range;
+	struct interval_tree_node *node;
+
+	node = interval_tree_iter_first(&vroot->v_rb, start, end);
+	while (node) {
+		range = container_of(node, struct vrange, node);
+		range->purged = true;
+		node = interval_tree_iter_next(node, start, end);
+	}
+}
+
+int try_to_discard_one(struct vrange_root *vroot, struct page *page,
+			struct vm_area_struct *vma, unsigned long addr)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pte_t *pte;
+	pte_t pteval;
+	spinlock_t *ptl;
+	int ret = 0;
+	bool present;
+
+	VM_BUG_ON(!PageLocked(page));
+
+	vrange_lock(vroot);
+	pte = vpage_check_address(page, mm, addr, &ptl);
+	if (!pte)
+		goto out;
+
+	if (vma->vm_flags & VM_LOCKED) {
+		pte_unmap_unlock(pte, ptl);
+		goto out;
+	}
+
+	present = pte_present(*pte);
+	flush_cache_page(vma, address, page_to_pfn(page));
+	pteval = ptep_clear_flush(vma, addr, pte);
+
+	update_hiwater_rss(mm);
+	if (present) {
+		if (PageAnon(page))
+			dec_mm_counter(mm, MM_ANONPAGES);
+		else
+			dec_mm_counter(mm, MM_FILEPAGES);
+		page_remove_rmap(page);
+		page_cache_release(page);
+	} else {
+		swp_entry_t entry = pte_to_swp_entry(pteval);
+		dec_mm_counter(mm, MM_SWAPENTS);
+		if (unlikely(!__free_swap_and_cache(entry)))
+			BUG_ON(1);
+	}
+
+	pte_unmap_unlock(pte, ptl);
+	mmu_notifier_invalidate_page(mm, addr);
+	ret = 1;
+
+	if (!PageAnon(page)) /* switch to file offset) */
+		addr = vma->vm_pgoff + addr - vma->vm_start;
+
+	__vrange_purge(vroot, addr, addr + PAGE_SIZE - 1);
+
+out:
+	vrange_unlock(vroot);
+	return ret;
+}
+
+static int try_to_discard_anon_vpage(struct page *page)
+{
+	struct anon_vma *anon_vma;
+	struct anon_vma_chain *avc;
+	pgoff_t pgoff;
+	struct vm_area_struct *vma;
+	struct mm_struct *mm;
+	struct vrange_root *vroot;
+
+	unsigned long address;
+	bool ret = 0;
+
+	anon_vma = page_lock_anon_vma_read(page);
+	if (!anon_vma)
+		return ret;
+
+	pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff, pgoff) {
+		pte_t *pte;
+		spinlock_t *ptl;
+
+		vma = avc->vma;
+		mm = vma->vm_mm;
+		vroot = &mm->vroot;
+		address = vma_address(page, vma);
+
+		vrange_lock(vroot);
+		/*
+		 * We can't use page_check_address because it doesn't check
+		 * swap entry of the page table. We need the check because
+		 * we have to make sure atomicity of shared vrange.
+		 * It means all vranges which are shared a page should be
+		 * purged if a page in a process is purged.
+		 */
+		pte = vpage_check_address(page, mm, address, &ptl);
+		if (!pte) {
+			vrange_unlock(vroot);
+			continue;
+		}
+
+		if (vma->vm_flags & VM_LOCKED) {
+			pte_unmap_unlock(pte, ptl);
+			vrange_unlock(vroot);
+			goto out;
+		}
+
+		pte_unmap_unlock(pte, ptl);
+		if (!__vrange_address(vroot, address,
+					address + PAGE_SIZE - 1)) {
+			vrange_unlock(vroot);
+			goto out;
+		}
+
+		vrange_unlock(vroot);
+	}
+
+	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff, pgoff) {
+		vma = avc->vma;
+		mm = vma->vm_mm;
+		vroot = &mm->vroot;
+		address = vma_address(page, vma);
+		if (!try_to_discard_one(vroot, page, vma, address))
+			goto out;
+	}
+
+	ret = 1;
+out:
+	page_unlock_anon_vma_read(anon_vma);
+	return ret;
+}
+
+
+
+static int try_to_discard_file_vpage(struct page *page)
+{
+	struct address_space *mapping = page->mapping;
+	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+	struct vm_area_struct *vma;
+	bool ret = 0;
+
+	mutex_lock(&mapping->i_mmap_mutex);
+	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
+		unsigned long address = vma_address(page, vma);
+		struct mm_struct *mm = vma->vm_mm;
+		struct vrange_root *vroot = &mapping->vroot;
+		pte_t *pte;
+		spinlock_t *ptl;
+		long vstart_idx;
+
+
+		vstart_idx = vma->vm_pgoff + address - vma->vm_start;
+
+		vrange_lock(vroot);
+		/*
+		 * We can't use page_check_address because it doesn't check
+		 * swap entry of the page table. We need the check because
+		 * we have to make sure atomicity of shared vrange.
+		 * It means all vranges which are shared a page should be
+		 * purged if a page in a process is purged.
+		 */
+		pte = vpage_check_address(page, mm, address, &ptl);
+		if (!pte) {
+			vrange_unlock(vroot);
+			continue;
+		}
+
+		if (vma->vm_flags & VM_LOCKED) {
+			pte_unmap_unlock(pte, ptl);
+			vrange_unlock(vroot);
+			goto out;
+		}
+
+		pte_unmap_unlock(pte, ptl);
+		if (!__vrange_address(vroot, vstart_idx,
+					vstart_idx + PAGE_SIZE - 1)) {
+			vrange_unlock(vroot);
+			goto out;
+		}
+
+		vrange_unlock(vroot);
+	}
+
+	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
+		unsigned long address = vma_address(page, vma);
+		struct vrange_root *vroot = &mapping->vroot;
+
+		if (!try_to_discard_one(vroot, page, vma, address))
+			goto out;
+	}
+
+	ret = 1;
+out:
+	mutex_unlock(&mapping->i_mmap_mutex);
+	return ret;
+}
+
+static int try_to_discard_vpage(struct page *page)
+{
+	if (PageAnon(page))
+		return try_to_discard_anon_vpage(page);
+	return try_to_discard_file_vpage(page);
+}
+
+int discard_vpage(struct page *page)
+{
+	VM_BUG_ON(!PageLocked(page));
+	VM_BUG_ON(PageLRU(page));
+
+	if (try_to_discard_vpage(page)) {
+		if (PageSwapCache(page))
+			try_to_free_swap(page);
+
+		if (page_freeze_refs(page, 1)) {
+			unlock_page(page);
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
-- 
1.7.9.5

-- 
Kind regards,
Minchan Kim

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/8] vrange: Add method to purge volatile ranges
@ 2013-06-19  4:34     ` Minchan Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Minchan Kim @ 2013-06-19  4:34 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm

On Tue, Jun 11, 2013 at 09:22:50PM -0700, John Stultz wrote:
> From: Minchan Kim <minchan@kernel.org>
> 
> This patch adds discarding function to purge volatile ranges under
> memory pressure. Logic is as following:
> 
> 1. Memory pressure happens
> 2. VM start to reclaim pages
> 3. Check the page is in volatile range.
> 4. If so, zap the page from the process's page table.
>    (By semantic vrange(2), we should mark it with another one to
>     make page fault when you try to access the address. It will
>     be introduced later patch)
> 5. If page is unmapped from all processes, discard it instead of swapping.
> 
> This patch does not address the case where there is no swap, which
> keeps anonymous pages from being aged off the LRUs. Minchan has
> additional patches that add support for purging anonymous pages
> 
> XXX: First pass at file purging. Seems to work, but is likely broken
> and needs close review.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Android Kernel Team <kernel-team@android.com>
> Cc: Robert Love <rlove@google.com>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Dave Hansen <dave@linux.vnet.ibm.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Neil Brown <neilb@suse.de>
> Cc: Andrea Righi <andrea@betterlinux.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Mike Hommey <mh@glandium.org>
> Cc: Taras Glek <tglek@mozilla.com>
> Cc: Dhaval Giani <dgiani@mozilla.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
> Cc: Michel Lespinasse <walken@google.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: linux-mm@kvack.org <linux-mm@kvack.org>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> [jstultz: Reworked to add purging of file pages, commit log tweaks]
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  include/linux/rmap.h   |  12 +-
>  include/linux/swap.h   |   1 +
>  include/linux/vrange.h |   7 ++
>  mm/ksm.c               |   2 +-
>  mm/rmap.c              |  30 +++--
>  mm/swapfile.c          |  36 ++++++
>  mm/vmscan.c            |  16 ++-
>  mm/vrange.c            | 332 +++++++++++++++++++++++++++++++++++++++++++++++++
>  8 files changed, 420 insertions(+), 16 deletions(-)

This patch has some bugs so below patch should fix them and pass my
simple cases.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 8/8] vrange: Send SIGBUS when user try to access purged page
  2013-06-12  4:22   ` John Stultz
@ 2013-06-19  4:36     ` Minchan Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Minchan Kim @ 2013-06-19  4:36 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm

On Tue, Jun 11, 2013 at 09:22:51PM -0700, John Stultz wrote:
> From: Minchan Kim <minchan@kernel.org>
> 
> By vrange(2) semantic, user should see SIGBUG if he try to access
> purged page without vrange(...VRANGE_NOVOLATILE).
> 
> This patch implements it.
> 
> XXX: I reused PSE bit for quick prototype without enough considering
> so need time to see what's empty bit and I am surely missing
> many places to handle vrange pte bit. I should investigate all of
> pte handling places, especially pte_none case.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Android Kernel Team <kernel-team@android.com>
> Cc: Robert Love <rlove@google.com>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Dave Hansen <dave@linux.vnet.ibm.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Neil Brown <neilb@suse.de>
> Cc: Andrea Righi <andrea@betterlinux.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Mike Hommey <mh@glandium.org>
> Cc: Taras Glek <tglek@mozilla.com>
> Cc: Dhaval Giani <dgiani@mozilla.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
> Cc: Michel Lespinasse <walken@google.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: linux-mm@kvack.org <linux-mm@kvack.org>
> 
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> [jstultz: Extended to work with file pages]
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  arch/x86/include/asm/pgtable_types.h |  2 ++
>  include/asm-generic/pgtable.h        | 11 +++++++++++
>  include/linux/vrange.h               |  2 ++
>  mm/memory.c                          | 23 +++++++++++++++++++++--
>  mm/vrange.c                          | 35 ++++++++++++++++++++++++++++++++++-
>  5 files changed, 70 insertions(+), 3 deletions(-)
> 

This patch fixes the problem Dhaval reported.

>From e789359cf2ac706e1ebc925f14eb2d7187cd2267 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Tue, 11 Jun 2013 21:22:51 -0700
Subject: [PATCH 2/2] vrange: Send SIGBUS when user try to access purged page

By vrange(2) semantic, user should see SIGBUG if he try to access
purged page without vrange(...VRANGE_NOVOLATILE).

This patch implements it.

XXX: I reused PSE bit for quick prototype without enough considering
so need time to see what's empty bit and I am surely missing
many places to handle vrange pte bit. I should investigate all of
pte handling places, especially pte_none case.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Android Kernel Team <kernel-team@android.com>
Cc: Robert Love <rlove@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mike Hommey <mh@glandium.org>
Cc: Taras Glek <tglek@mozilla.com>
Cc: Dhaval Giani <dgiani@mozilla.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: linux-mm@kvack.org <linux-mm@kvack.org>

Signed-off-by: Minchan Kim <minchan@kernel.org>
[jstultz: Extended to work with file pages]
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 arch/x86/include/asm/pgtable_types.h |    2 ++
 include/asm-generic/pgtable.h        |   11 +++++++++++
 include/linux/vrange.h               |    2 ++
 mm/memory.c                          |   23 +++++++++++++++++++++--
 mm/vrange.c                          |   31 +++++++++++++++++++++++++++++++
 5 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index e642300..d7ea6a0 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -64,6 +64,8 @@
 #define _PAGE_FILE	(_AT(pteval_t, 1) << _PAGE_BIT_FILE)
 #define _PAGE_PROTNONE	(_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
 
+#define _PAGE_VRANGE	_PAGE_BIT_PSE
+
 /*
  * _PAGE_NUMA indicates that this page will trigger a numa hinting
  * minor page fault to gather numa placement statistics (see
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index a59ff51..91e8f6f 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -479,6 +479,17 @@ static inline unsigned long my_zero_pfn(unsigned long addr)
 
 #ifdef CONFIG_MMU
 
+static inline pte_t pte_mkvrange(pte_t pte)
+{
+	pte = pte_set_flags(pte, _PAGE_VRANGE);
+	return pte_clear_flags(pte, _PAGE_PRESENT);
+}
+
+static inline int pte_vrange(pte_t pte)
+{
+	return ((pte_flags(pte) | _PAGE_PRESENT) == _PAGE_VRANGE);
+}
+
 #ifndef CONFIG_TRANSPARENT_HUGEPAGE
 static inline int pmd_trans_huge(pmd_t pmd)
 {
diff --git a/include/linux/vrange.h b/include/linux/vrange.h
index cbb609a..75754d1 100644
--- a/include/linux/vrange.h
+++ b/include/linux/vrange.h
@@ -41,6 +41,8 @@ int discard_vpage(struct page *page);
 bool vrange_address(struct mm_struct *mm, unsigned long start,
 			unsigned long end);
 
+extern bool is_purged_vrange(struct mm_struct *mm, unsigned long address);
+
 #else
 
 static inline void vrange_init(void) {};
diff --git a/mm/memory.c b/mm/memory.c
index 61a262b..cc5c70b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -59,6 +59,7 @@
 #include <linux/gfp.h>
 #include <linux/migrate.h>
 #include <linux/string.h>
+#include <linux/vrange.h>
 
 #include <asm/io.h>
 #include <asm/pgalloc.h>
@@ -832,7 +833,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 
 	/* pte contains position in swap or file, so copy. */
 	if (unlikely(!pte_present(pte))) {
-		if (!pte_file(pte)) {
+		if (!pte_file(pte) && !pte_vrange(pte)) {
 			swp_entry_t entry = pte_to_swp_entry(pte);
 
 			if (swap_duplicate(entry) < 0)
@@ -1172,7 +1173,7 @@ again:
 		if (pte_file(ptent)) {
 			if (unlikely(!(vma->vm_flags & VM_NONLINEAR)))
 				print_bad_pte(vma, addr, ptent, NULL);
-		} else {
+		} else if (!pte_vrange(ptent)) {
 			swp_entry_t entry = pte_to_swp_entry(ptent);
 
 			if (!non_swap_entry(entry))
@@ -3707,9 +3708,27 @@ int handle_pte_fault(struct mm_struct *mm,
 					return do_linear_fault(mm, vma, address,
 						pte, pmd, flags, entry);
 			}
+anon:
 			return do_anonymous_page(mm, vma, address,
 						 pte, pmd, flags);
 		}
+
+		if (unlikely(pte_vrange(entry))) {
+			if (!is_purged_vrange(mm, address)) {
+				/* zap pte */
+				ptl = pte_lockptr(mm, pmd);
+				spin_lock(ptl);
+				if (unlikely(!pte_same(*pte, entry)))
+					goto unlock;
+				flush_cache_page(vma, address, pte_pfn(*pte));
+				ptep_clear_flush(vma, address, pte);
+				pte_unmap_unlock(pte, ptl);
+				goto anon;
+			}
+
+			return VM_FAULT_SIGBUS;
+		}
+
 		if (pte_file(entry))
 			return do_nonlinear_fault(mm, vma, address,
 					pte, pmd, flags, entry);
diff --git a/mm/vrange.c b/mm/vrange.c
index d57cb38..9cafb01 100644
--- a/mm/vrange.c
+++ b/mm/vrange.c
@@ -521,6 +521,7 @@ int try_to_discard_one(struct vrange_root *vroot, struct page *page,
 			BUG_ON(1);
 	}
 
+	set_pte_at(mm, addr, pte, pte_mkvrange(*pte));
 	pte_unmap_unlock(pte, ptl);
 	mmu_notifier_invalidate_page(mm, addr);
 	ret = 1;
@@ -696,3 +697,33 @@ int discard_vpage(struct page *page)
 	return 0;
 }
 
+bool is_purged_vrange(struct mm_struct *mm, unsigned long address)
+{
+	struct vrange_root *vroot;
+	struct interval_tree_node *node;
+	struct vrange *range;
+	unsigned long vstart_idx;
+	struct vm_area_struct *vma;
+	bool ret = false;
+
+	vma = find_vma(mm, address);
+	if (vma->vm_file && (vma->vm_flags & VM_SHARED)) {
+		vroot = &vma->vm_file->f_mapping->vroot;
+		vstart_idx = vma->vm_pgoff + address - vma->vm_start;
+	} else {
+		vroot = &mm->vroot;
+		vstart_idx = address;
+	}
+
+	vrange_lock(vroot);
+	node = interval_tree_iter_first(&vroot->v_rb, vstart_idx,
+						vstart_idx + PAGE_SIZE - 1);
+	if (node) {
+		range = container_of(node, struct vrange, node);
+		if (range->purged)
+			ret = true;
+	}
+	vrange_unlock(vroot);
+	return ret;
+}
+
-- 
1.7.9.5

-- 
Kind regards,
Minchan Kim

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 8/8] vrange: Send SIGBUS when user try to access purged page
@ 2013-06-19  4:36     ` Minchan Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Minchan Kim @ 2013-06-19  4:36 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Dhaval Giani, Jan Kara, KOSAKI Motohiro, Michel Lespinasse,
	linux-mm

On Tue, Jun 11, 2013 at 09:22:51PM -0700, John Stultz wrote:
> From: Minchan Kim <minchan@kernel.org>
> 
> By vrange(2) semantic, user should see SIGBUG if he try to access
> purged page without vrange(...VRANGE_NOVOLATILE).
> 
> This patch implements it.
> 
> XXX: I reused PSE bit for quick prototype without enough considering
> so need time to see what's empty bit and I am surely missing
> many places to handle vrange pte bit. I should investigate all of
> pte handling places, especially pte_none case.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Android Kernel Team <kernel-team@android.com>
> Cc: Robert Love <rlove@google.com>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Dave Hansen <dave@linux.vnet.ibm.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Neil Brown <neilb@suse.de>
> Cc: Andrea Righi <andrea@betterlinux.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Mike Hommey <mh@glandium.org>
> Cc: Taras Glek <tglek@mozilla.com>
> Cc: Dhaval Giani <dgiani@mozilla.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
> Cc: Michel Lespinasse <walken@google.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: linux-mm@kvack.org <linux-mm@kvack.org>
> 
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> [jstultz: Extended to work with file pages]
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  arch/x86/include/asm/pgtable_types.h |  2 ++
>  include/asm-generic/pgtable.h        | 11 +++++++++++
>  include/linux/vrange.h               |  2 ++
>  mm/memory.c                          | 23 +++++++++++++++++++++--
>  mm/vrange.c                          | 35 ++++++++++++++++++++++++++++++++++-
>  5 files changed, 70 insertions(+), 3 deletions(-)
> 

This patch fixes the problem Dhaval reported.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/8] Volatile Ranges (v8?)
  2013-06-18 16:59       ` Dhaval Giani
@ 2013-06-19  4:41         ` Minchan Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Minchan Kim @ 2013-06-19  4:41 UTC (permalink / raw)
  To: Dhaval Giani
  Cc: John Stultz, LKML, Andrew Morton, Android Kernel Team,
	Robert Love, Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Jan Kara, KOSAKI Motohiro, Michel Lespinasse, linux-mm

Hello Dhaval,

On Tue, Jun 18, 2013 at 12:59:02PM -0400, Dhaval Giani wrote:
> On 2013-06-18 12:11 AM, Minchan Kim wrote:
> >Hello Dhaval,
> >
> >On Mon, Jun 17, 2013 at 12:24:07PM -0400, Dhaval Giani wrote:
> >>Hi John,
> >>
> >>I have been giving your git tree a whirl, and in order to simulate a
> >>limited memory environment, I was using memory cgroups.
> >>
> >>The program I was using to test is attached here. It is your test
> >>code, with some changes (changing the syscall interface, reducing
> >>the memory pressure to be generated).
> >>
> >>I trapped it in a memory cgroup with 1MB memory.limit_in_bytes and hit this,
> >>
> >>[  406.207612] ------------[ cut here ]------------
> >>[  406.207621] kernel BUG at mm/vrange.c:523!
> >>[  406.207626] invalid opcode: 0000 [#1] SMP
> >>[  406.207631] Modules linked in:
> >>[  406.207637] CPU: 0 PID: 1579 Comm: volatile-test Not tainted
> >Thanks for the testing!
> >Does below patch fix your problem?
> 
> Yes it does! Thank you very much for the patch.

Thaks for the confirming.
While I tested it, I found several problems so I just sent fixes as reply
of each [7/8] and [8/8].
Could you test it?


FYI: John, Dhaval

I am working to clean purging mess up so maybe it would need not a few
change for purging part.

Thanks!

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/8] Volatile Ranges (v8?)
@ 2013-06-19  4:41         ` Minchan Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Minchan Kim @ 2013-06-19  4:41 UTC (permalink / raw)
  To: Dhaval Giani
  Cc: John Stultz, LKML, Andrew Morton, Android Kernel Team,
	Robert Love, Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Jan Kara, KOSAKI Motohiro, Michel Lespinasse, linux-mm

Hello Dhaval,

On Tue, Jun 18, 2013 at 12:59:02PM -0400, Dhaval Giani wrote:
> On 2013-06-18 12:11 AM, Minchan Kim wrote:
> >Hello Dhaval,
> >
> >On Mon, Jun 17, 2013 at 12:24:07PM -0400, Dhaval Giani wrote:
> >>Hi John,
> >>
> >>I have been giving your git tree a whirl, and in order to simulate a
> >>limited memory environment, I was using memory cgroups.
> >>
> >>The program I was using to test is attached here. It is your test
> >>code, with some changes (changing the syscall interface, reducing
> >>the memory pressure to be generated).
> >>
> >>I trapped it in a memory cgroup with 1MB memory.limit_in_bytes and hit this,
> >>
> >>[  406.207612] ------------[ cut here ]------------
> >>[  406.207621] kernel BUG at mm/vrange.c:523!
> >>[  406.207626] invalid opcode: 0000 [#1] SMP
> >>[  406.207631] Modules linked in:
> >>[  406.207637] CPU: 0 PID: 1579 Comm: volatile-test Not tainted
> >Thanks for the testing!
> >Does below patch fix your problem?
> 
> Yes it does! Thank you very much for the patch.

Thaks for the confirming.
While I tested it, I found several problems so I just sent fixes as reply
of each [7/8] and [8/8].
Could you test it?


FYI: John, Dhaval

I am working to clean purging mess up so maybe it would need not a few
change for purging part.

Thanks!

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/8] Volatile Ranges (v8?)
  2013-06-19  4:41         ` Minchan Kim
@ 2013-06-19 18:36           ` Dhaval Giani
  -1 siblings, 0 replies; 48+ messages in thread
From: Dhaval Giani @ 2013-06-19 18:36 UTC (permalink / raw)
  To: Minchan Kim
  Cc: John Stultz, LKML, Andrew Morton, Android Kernel Team,
	Robert Love, Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Jan Kara, KOSAKI Motohiro, Michel Lespinasse, linux-mm

On 2013-06-19 12:41 AM, Minchan Kim wrote:
> Hello Dhaval,
>
> On Tue, Jun 18, 2013 at 12:59:02PM -0400, Dhaval Giani wrote:
>> On 2013-06-18 12:11 AM, Minchan Kim wrote:
>>> Hello Dhaval,
>>>
>>> On Mon, Jun 17, 2013 at 12:24:07PM -0400, Dhaval Giani wrote:
>>>> Hi John,
>>>>
>>>> I have been giving your git tree a whirl, and in order to simulate a
>>>> limited memory environment, I was using memory cgroups.
>>>>
>>>> The program I was using to test is attached here. It is your test
>>>> code, with some changes (changing the syscall interface, reducing
>>>> the memory pressure to be generated).
>>>>
>>>> I trapped it in a memory cgroup with 1MB memory.limit_in_bytes and hit this,
>>>>
>>>> [  406.207612] ------------[ cut here ]------------
>>>> [  406.207621] kernel BUG at mm/vrange.c:523!
>>>> [  406.207626] invalid opcode: 0000 [#1] SMP
>>>> [  406.207631] Modules linked in:
>>>> [  406.207637] CPU: 0 PID: 1579 Comm: volatile-test Not tainted
>>> Thanks for the testing!
>>> Does below patch fix your problem?
>> Yes it does! Thank you very much for the patch.
> Thaks for the confirming.
> While I tested it, I found several problems so I just sent fixes as reply
> of each [7/8] and [8/8].
> Could you test it?

Great! These patches (seem to) fix another issue I noticed yesterday 
with signal handling. I have pushed out my code for testing this stuff 
at https://github.com/volatile-ranges-test/vranges-test . The code and 
the scripts are still unpolished (as in you don't get a pass or fail) 
but they seem to work just fine.

>
> FYI: John, Dhaval
>
> I am working to clean purging mess up so maybe it would need not a few
> change for purging part.

Great, I will also take a look at the code.

Dhaval

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/8] Volatile Ranges (v8?)
@ 2013-06-19 18:36           ` Dhaval Giani
  0 siblings, 0 replies; 48+ messages in thread
From: Dhaval Giani @ 2013-06-19 18:36 UTC (permalink / raw)
  To: Minchan Kim
  Cc: John Stultz, LKML, Andrew Morton, Android Kernel Team,
	Robert Love, Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Jan Kara, KOSAKI Motohiro, Michel Lespinasse, linux-mm

On 2013-06-19 12:41 AM, Minchan Kim wrote:
> Hello Dhaval,
>
> On Tue, Jun 18, 2013 at 12:59:02PM -0400, Dhaval Giani wrote:
>> On 2013-06-18 12:11 AM, Minchan Kim wrote:
>>> Hello Dhaval,
>>>
>>> On Mon, Jun 17, 2013 at 12:24:07PM -0400, Dhaval Giani wrote:
>>>> Hi John,
>>>>
>>>> I have been giving your git tree a whirl, and in order to simulate a
>>>> limited memory environment, I was using memory cgroups.
>>>>
>>>> The program I was using to test is attached here. It is your test
>>>> code, with some changes (changing the syscall interface, reducing
>>>> the memory pressure to be generated).
>>>>
>>>> I trapped it in a memory cgroup with 1MB memory.limit_in_bytes and hit this,
>>>>
>>>> [  406.207612] ------------[ cut here ]------------
>>>> [  406.207621] kernel BUG at mm/vrange.c:523!
>>>> [  406.207626] invalid opcode: 0000 [#1] SMP
>>>> [  406.207631] Modules linked in:
>>>> [  406.207637] CPU: 0 PID: 1579 Comm: volatile-test Not tainted
>>> Thanks for the testing!
>>> Does below patch fix your problem?
>> Yes it does! Thank you very much for the patch.
> Thaks for the confirming.
> While I tested it, I found several problems so I just sent fixes as reply
> of each [7/8] and [8/8].
> Could you test it?

Great! These patches (seem to) fix another issue I noticed yesterday 
with signal handling. I have pushed out my code for testing this stuff 
at https://github.com/volatile-ranges-test/vranges-test . The code and 
the scripts are still unpolished (as in you don't get a pass or fail) 
but they seem to work just fine.

>
> FYI: John, Dhaval
>
> I am working to clean purging mess up so maybe it would need not a few
> change for purging part.

Great, I will also take a look at the code.

Dhaval

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/8] vrange: Add new vrange(2) system call
  2013-06-12  4:22   ` John Stultz
@ 2013-06-20 21:05     ` Dhaval Giani
  -1 siblings, 0 replies; 48+ messages in thread
From: Dhaval Giani @ 2013-06-20 21:05 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Minchan Kim, Andrew Morton, Android Kernel Team,
	Robert Love, Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Jan Kara, KOSAKI Motohiro, Michel Lespinasse, linux-mm

On 2013-06-12 12:22 AM, John Stultz wrote:
> From: Minchan Kim <minchan@kernel.org>
>
> This patch adds new system call sys_vrange.
>
> NAME
> 	vrange - Mark or unmark range of memory as volatile
>
> SYNOPSIS
> 	int vrange(unsigned_long start, size_t length, int mode,
> 			 int *purged);
>
> DESCRIPTION
> 	Applications can use vrange(2) to advise the kernel how it should
> 	handle paging I/O in this VM area.  The idea is to help the kernel
> 	discard pages of vrange instead of reclaiming when memory pressure
> 	happens. It means kernel doesn't discard any pages of vrange if
> 	there is no memory pressure.
>
> 	mode:
> 	VRANGE_VOLATILE
> 		hint to kernel so VM can discard in vrange pages when
> 		memory pressure happens.
> 	VRANGE_NONVOLATILE
> 		hint to kernel so VM doesn't discard vrange pages
> 		any more.
>
> 	If user try to access purged memory without VRANGE_NOVOLATILE call,
> 	he can encounter SIGBUS if the page was discarded by kernel.

I wonder if it would be possible to provide additional information here, 
for example "purge range at a time" as opposed to "purge page at a 
time". There are some valid use cases for both approaches and it doesn't 
make sense to deny one use case.

Thanks!
Dhaval

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/8] vrange: Add new vrange(2) system call
@ 2013-06-20 21:05     ` Dhaval Giani
  0 siblings, 0 replies; 48+ messages in thread
From: Dhaval Giani @ 2013-06-20 21:05 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Minchan Kim, Andrew Morton, Android Kernel Team,
	Robert Love, Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Andrea Arcangeli, Aneesh Kumar K.V, Mike Hommey, Taras Glek,
	Jan Kara, KOSAKI Motohiro, Michel Lespinasse, linux-mm

On 2013-06-12 12:22 AM, John Stultz wrote:
> From: Minchan Kim <minchan@kernel.org>
>
> This patch adds new system call sys_vrange.
>
> NAME
> 	vrange - Mark or unmark range of memory as volatile
>
> SYNOPSIS
> 	int vrange(unsigned_long start, size_t length, int mode,
> 			 int *purged);
>
> DESCRIPTION
> 	Applications can use vrange(2) to advise the kernel how it should
> 	handle paging I/O in this VM area.  The idea is to help the kernel
> 	discard pages of vrange instead of reclaiming when memory pressure
> 	happens. It means kernel doesn't discard any pages of vrange if
> 	there is no memory pressure.
>
> 	mode:
> 	VRANGE_VOLATILE
> 		hint to kernel so VM can discard in vrange pages when
> 		memory pressure happens.
> 	VRANGE_NONVOLATILE
> 		hint to kernel so VM doesn't discard vrange pages
> 		any more.
>
> 	If user try to access purged memory without VRANGE_NOVOLATILE call,
> 	he can encounter SIGBUS if the page was discarded by kernel.

I wonder if it would be possible to provide additional information here, 
for example "purge range at a time" as opposed to "purge page at a 
time". There are some valid use cases for both approaches and it doesn't 
make sense to deny one use case.

Thanks!
Dhaval

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/8] vrange: Add method to purge volatile ranges
  2013-06-19  4:34     ` Minchan Kim
  (?)
@ 2013-10-01 14:00     ` Krzysztof Kozlowski
  2013-10-02  1:32       ` Minchan Kim
  -1 siblings, 1 reply; 48+ messages in thread
From: Krzysztof Kozlowski @ 2013-10-01 14:00 UTC (permalink / raw)
  To: linux-mm

Hi

On śro, 2013-06-19 at 13:34 +0900, Minchan Kim wrote:
> +int try_to_discard_one(struct vrange_root *vroot, struct page *page,
> +			struct vm_area_struct *vma, unsigned long addr)
> +{
> +	struct mm_struct *mm = vma->vm_mm;
> +	pte_t *pte;
> +	pte_t pteval;
> +	spinlock_t *ptl;
> +	int ret = 0;
> +	bool present;
> +
> +	VM_BUG_ON(!PageLocked(page));
> +
> +	vrange_lock(vroot);
> +	pte = vpage_check_address(page, mm, addr, &ptl);
> +	if (!pte)
> +		goto out;
> +
> +	if (vma->vm_flags & VM_LOCKED) {
> +		pte_unmap_unlock(pte, ptl);
> +		goto out;
> +	}
> +
> +	present = pte_present(*pte);
> +	flush_cache_page(vma, address, page_to_pfn(page));

Compilation error during porting to ARM:
s/address/addr


Best regards,
Krzysztof


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/8] vrange: Add method to purge volatile ranges
  2013-10-01 14:00     ` Krzysztof Kozlowski
@ 2013-10-02  1:32       ` Minchan Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Minchan Kim @ 2013-10-02  1:32 UTC (permalink / raw)
  To: Krzysztof Kozlowski; +Cc: linux-mm, John Stultz, Dhaval Giani

Hello, Krzysztof

Thanks for the fix!
Just FYI,
I and John found many bugs and changed lots of code and will send it
to upstream, maybe end of this week or next week.

Thanks!

On Tue, Oct 1, 2013 at 11:00 PM, Krzysztof Kozlowski
<k.kozlowski@samsung.com> wrote:
> Hi
>
> On śro, 2013-06-19 at 13:34 +0900, Minchan Kim wrote:
>> +int try_to_discard_one(struct vrange_root *vroot, struct page *page,
>> +                     struct vm_area_struct *vma, unsigned long addr)
>> +{
>> +     struct mm_struct *mm = vma->vm_mm;
>> +     pte_t *pte;
>> +     pte_t pteval;
>> +     spinlock_t *ptl;
>> +     int ret = 0;
>> +     bool present;
>> +
>> +     VM_BUG_ON(!PageLocked(page));
>> +
>> +     vrange_lock(vroot);
>> +     pte = vpage_check_address(page, mm, addr, &ptl);
>> +     if (!pte)
>> +             goto out;
>> +
>> +     if (vma->vm_flags & VM_LOCKED) {
>> +             pte_unmap_unlock(pte, ptl);
>> +             goto out;
>> +     }
>> +
>> +     present = pte_present(*pte);
>> +     flush_cache_page(vma, address, page_to_pfn(page));
>
> Compilation error during porting to ARM:
> s/address/addr
>
>
> Best regards,
> Krzysztof
>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a hrefmailto:"dont@kvack.org"> email@kvack.org </a>



-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2013-10-02  1:32 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-12  4:22 [PATCH 0/8] Volatile Ranges (v8?) John Stultz
2013-06-12  4:22 ` John Stultz
2013-06-12  4:22 ` [PATCH 1/8] vrange: Add basic data structure and functions John Stultz
2013-06-12  4:22   ` John Stultz
2013-06-12  4:22 ` [PATCH 2/8] vrange: Add vrange support for file address_spaces John Stultz
2013-06-12  4:22   ` John Stultz
2013-06-12  4:22 ` [PATCH 3/8] vrange: Add vrange support to mm_structs John Stultz
2013-06-12  4:22   ` John Stultz
2013-06-12  4:22 ` [PATCH 4/8] vrange: Clear volatility on new mmaps John Stultz
2013-06-12  4:22   ` John Stultz
2013-06-13  6:28   ` Minchan Kim
2013-06-13  6:28     ` Minchan Kim
2013-06-13 23:43     ` John Stultz
2013-06-13 23:43       ` John Stultz
2013-06-14  0:21       ` Minchan Kim
2013-06-14  0:21         ` Minchan Kim
2013-06-12  4:22 ` [PATCH 5/8] vrange: Add new vrange(2) system call John Stultz
2013-06-12  4:22   ` John Stultz
2013-06-12  6:48   ` NeilBrown
2013-06-12 18:47     ` John Stultz
2013-06-12 18:47       ` John Stultz
2013-06-20 21:05   ` Dhaval Giani
2013-06-20 21:05     ` Dhaval Giani
2013-06-12  4:22 ` [PATCH 6/8] vrange: Add GFP_NO_VRANGE allocation flag John Stultz
2013-06-12  4:22   ` John Stultz
2013-06-12  4:22 ` [PATCH 7/8] vrange: Add method to purge volatile ranges John Stultz
2013-06-12  4:22   ` John Stultz
2013-06-17  7:13   ` Minchan Kim
2013-06-17  7:13     ` Minchan Kim
2013-06-17  7:24     ` Minchan Kim
2013-06-17  7:24       ` Minchan Kim
2013-06-19  4:34   ` Minchan Kim
2013-06-19  4:34     ` Minchan Kim
2013-10-01 14:00     ` Krzysztof Kozlowski
2013-10-02  1:32       ` Minchan Kim
2013-06-12  4:22 ` [PATCH 8/8] vrange: Send SIGBUS when user try to access purged page John Stultz
2013-06-12  4:22   ` John Stultz
2013-06-19  4:36   ` Minchan Kim
2013-06-19  4:36     ` Minchan Kim
2013-06-17 16:24 ` [PATCH 0/8] Volatile Ranges (v8?) Dhaval Giani
2013-06-18  4:11   ` Minchan Kim
2013-06-18  4:11     ` Minchan Kim
2013-06-18 16:59     ` Dhaval Giani
2013-06-18 16:59       ` Dhaval Giani
2013-06-19  4:41       ` Minchan Kim
2013-06-19  4:41         ` Minchan Kim
2013-06-19 18:36         ` Dhaval Giani
2013-06-19 18:36           ` Dhaval Giani

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.