linux-kernel.vger.kernel.org archive mirror
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	virtualization@lists.osdl.org
Cc: akpm@osdl.org, nickpiggin@yahoo.com.au, hugh@veritas.com,
	zach@vmware.com, frankeh@watson.ibm.com,
	Martin Schwidefsky <schwidefsky@de.ibm.com>
Subject: [patch 3/6] Guest page hinting: mlocked pages.
Date: Wed, 12 Mar 2008 14:21:35 +0100
Message-ID: <20080312132703.689828503@de.ibm.com>
In-Reply-To: 20080312132132.520833247@de.ibm.com

[-- Attachment #1: 003-hva-mlock.diff --]
[-- Type: text/plain, Size: 5019 bytes --]

From: Martin Schwidefsky <schwidefsky@de.ibm.com>
From: Hubertus Franke <frankeh@watson.ibm.com>
From: Himanshu Raj

Add code to get mlock() working with guest page hinting. The problem
with mlock is that locked pages must not be removed from the page cache,
which means they have to stay stable. page_make_volatile needs a way to
check whether a page has been locked. To avoid traversing vma lists,
which would hurt performance a lot, a field is added to struct
address_space. This field is set in mlock_fixup if a vma gets mlocked.
The field is never cleared: once a file has had an mlocked vma, all
pages added to it in the future stay stable.
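
As a rough illustration of the sticky flag described above, here is a
small userspace model in C. It is only a sketch: the *_model structures
and model_* functions are hypothetical stand-ins, not the kernel's
struct address_space, struct page, mapping_set_mlocked or check_bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct address_space_model {
	unsigned int mlocked;	/* sticky: set once, never cleared */
};

struct page_model {
	struct address_space_model *mapping;	/* NULL: no mapping */
};

/* Models mapping_set_mlocked(): remember that a VM_LOCKED vma existed. */
static void model_set_mlocked(struct address_space_model *mapping)
{
	mapping->mlocked = 1;
}

/*
 * Models the final test in check_bits(): a page may only become
 * volatile if it has a mapping and that mapping was never mlocked.
 */
static bool model_may_become_volatile(const struct page_model *page)
{
	const struct address_space_model *mapping = page->mapping;

	return mapping != NULL && !mapping->mlocked;
}
```

The check is a single field read per page, which is the point of the
design: no vma list has to be walked to decide whether a page is locked.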

The pages of an mlocked area are made present in the Linux page table by
a call to make_pages_present, which calls get_user_pages and follow_page.
follow_page is called for each page in the mlocked vma; if the VM_LOCKED
bit is set in the vma flags, the page is made stable.
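
The decision follow_page takes can be sketched as a standalone
predicate. This is a simplified model under stated assumptions: the
MODEL_* constants are illustrative values, not the kernel's FOLL_GET and
VM_LOCKED, and model_follow_page_makes_stable is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values only; the kernel's real constants differ. */
#define MODEL_FOLL_GET	0x1u
#define MODEL_VM_LOCKED	0x2ul

/*
 * Models the condition this patch puts into follow_page(): the page is
 * made stable if the caller takes a reference or the vma is locked.
 */
static bool model_follow_page_makes_stable(unsigned int foll_flags,
					   unsigned long vm_flags)
{
	return (foll_flags & MODEL_FOLL_GET) || (vm_flags & MODEL_VM_LOCKED);
}
```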

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---

 include/linux/fs.h |   10 ++++++++++
 mm/memory.c        |    5 +++--
 mm/mlock.c         |    3 +++
 mm/page-states.c   |    5 ++++-
 mm/rmap.c          |   13 +++++++++++--
 5 files changed, 31 insertions(+), 5 deletions(-)

Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -513,6 +513,9 @@ struct address_space {
 	spinlock_t		private_lock;	/* for use by the address_space */
 	struct list_head	private_list;	/* ditto */
 	struct address_space	*assoc_mapping;	/* ditto */
+#ifdef CONFIG_PAGE_STATES
+	unsigned int		mlocked;	/* set if VM_LOCKED vmas present */
+#endif
 } __attribute__((aligned(sizeof(long))));
 	/*
 	 * On most architectures that alignment is already the case; but
@@ -520,6 +523,13 @@ struct address_space {
 	 * of struct page's "mapping" pointer be used for PAGE_MAPPING_ANON.
 	 */
 
+static inline void mapping_set_mlocked(struct address_space *mapping)
+{
+#ifdef CONFIG_PAGE_STATES
+	mapping->mlocked = 1;
+#endif
+}
+
 struct block_device {
 	dev_t			bd_dev;  /* not a kdev_t - it's a search key */
 	struct inode *		bd_inode;	/* will die */
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -987,9 +987,10 @@ struct page *follow_page(struct vm_area_
 	if (flags & FOLL_GET)
 		get_page(page);
 
-	if (flags & FOLL_GET) {
+	if ((flags & FOLL_GET) || (vma->vm_flags & VM_LOCKED)) {
 		/*
-		 * The page is made stable if a reference is acquired.
+		 * The page is made stable if a reference is acquired or
+		 * the vm area is locked.
 		 * If the caller does not get a reference it implies that
 		 * the caller can deal with page faults in case the page
 		 * is swapped out. In this case the caller can deal with
Index: linux-2.6/mm/mlock.c
===================================================================
--- linux-2.6.orig/mm/mlock.c
+++ linux-2.6/mm/mlock.c
@@ -12,6 +12,7 @@
 #include <linux/syscalls.h>
 #include <linux/sched.h>
 #include <linux/module.h>
+#include <linux/fs.h>
 
 int can_do_mlock(void)
 {
@@ -71,6 +72,8 @@ success:
 	 */
 	pages = (end - start) >> PAGE_SHIFT;
 	if (newflags & VM_LOCKED) {
+		if (vma->vm_file && vma->vm_file->f_mapping)
+			mapping_set_mlocked(vma->vm_file->f_mapping);
 		pages = -pages;
 		if (!(newflags & VM_IO))
 			ret = make_pages_present(start, end);
Index: linux-2.6/mm/page-states.c
===================================================================
--- linux-2.6.orig/mm/page-states.c
+++ linux-2.6/mm/page-states.c
@@ -29,6 +29,8 @@
  */
 static inline int check_bits(struct page *page)
 {
+	struct address_space *mapping;
+
 	/*
 	 * There are several conditions that prevent a page from becoming
 	 * volatile. The first check is for the page bits.
@@ -52,7 +54,8 @@ static inline int check_bits(struct page
 	 * it volatile. It will be freed soon. And if the mapping ever
 	 * had locked pages all pages of the mapping will stay stable.
 	 */
-	return page_mapping(page) != NULL;
+	mapping = page_mapping(page);
+	return mapping && !mapping->mlocked;
 }
 
 /*
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -722,8 +722,17 @@ static int try_to_unmap_one(struct page 
 	 */
 	if (!migration && ((vma->vm_flags & VM_LOCKED) ||
 			(ptep_clear_flush_young(vma, address, pte)))) {
-		ret = SWAP_FAIL;
-		goto out_unmap;
+		/*
+		 * Check for discarded pages. This can happen if there have
+		 * been discarded pages before a vma gets mlocked. The code
+		 * in make_pages_present will force all discarded pages out
+		 * and reload them. That happens after the VM_LOCKED bit
+		 * has been set.
+		 */
+		if (likely(!PageDiscarded(page))) {
+			ret = SWAP_FAIL;
+			goto out_unmap;
+		}
 	}
 
 	/* Nuke the page table entry. */

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Thread overview: 54+ messages
2008-03-12 13:21 [patch 0/6] Guest page hinting version 6 Martin Schwidefsky
2008-03-12 13:21 ` [patch 1/6] Guest page hinting: core + volatile page cache Martin Schwidefsky
2008-03-12 23:12   ` Rusty Russell
2008-03-13  9:24     ` Martin Schwidefsky
2008-03-12 13:21 ` [patch 2/6] Guest page hinting: volatile swap cache Martin Schwidefsky
2008-03-12 13:21 ` Martin Schwidefsky [this message]
2008-03-12 23:27   ` [patch 3/6] Guest page hinting: mlocked pages Rusty Russell
2008-03-13  9:13     ` Martin Schwidefsky
2008-03-12 13:21 ` [patch 4/6] Guest page hinting: writable page table entries Martin Schwidefsky
2008-03-12 23:35   ` Rusty Russell
2008-03-13  9:11     ` Martin Schwidefsky
2008-03-12 13:21 ` [patch 5/6] Guest page hinting: minor fault optimization Martin Schwidefsky
2008-03-12 13:21 ` [patch 6/6] Guest page hinting: s390 support Martin Schwidefsky
2008-03-12 16:19   ` Jeremy Fitzhardinge
2008-03-12 16:28     ` Martin Schwidefsky
2008-03-12 16:44       ` Jeremy Fitzhardinge
2008-03-12 16:59         ` Martin Schwidefsky
2008-03-12 17:48           ` Jeremy Fitzhardinge
2008-03-12 20:04             ` Anthony Liguori
2008-03-12 20:45               ` Jeremy Fitzhardinge
2008-03-12 20:56                 ` Anthony Liguori
2008-03-12 21:36                   ` Jeremy Fitzhardinge
2008-03-13  9:45                     ` Martin Schwidefsky
2008-03-13 16:07                       ` Jeremy Fitzhardinge
2008-03-13 16:17                         ` Jeremy Fitzhardinge
2008-03-13 16:55                           ` Martin Schwidefsky
2008-03-13 17:05                             ` Jeremy Fitzhardinge
2008-03-13 17:23                               ` Martin Schwidefsky
2008-03-13  9:42                   ` Martin Schwidefsky
2008-03-13  9:36                 ` Martin Schwidefsky
2008-03-13  9:32               ` Martin Schwidefsky
2008-03-12 22:41 ` [patch 0/6] Guest page hinting version 6 Rusty Russell
2008-03-13  9:47   ` Martin Schwidefsky
2008-03-13 16:57 ` Hugh Dickins
2008-03-13 17:14   ` Martin Schwidefsky
2008-03-13 17:45   ` Zachary Amsden
2008-03-13 19:45     ` Andrea Arcangeli
2008-03-13 21:41       ` Zachary Amsden
2008-03-13 18:41   ` Jeremy Fitzhardinge
2008-03-13 18:55     ` Hugh Dickins
2008-03-13 19:53       ` Zachary Amsden
2008-03-14 18:30         ` Jeremy Fitzhardinge
2008-03-14 21:32           ` Zachary Amsden
2008-03-14 21:37             ` Jeremy Fitzhardinge
2008-03-17  9:21             ` Martin Schwidefsky
2008-05-06 15:33   ` Martin Schwidefsky
2008-05-06 19:46     ` Rik van Riel
2008-05-07  3:49       ` Zachary Amsden
2008-05-07  7:00         ` Martin Schwidefsky
  -- strict thread matches above, loose matches on Subject: below --
2009-03-27 15:09 [patch 0/6] Guest page hinting version 7 Martin Schwidefsky
2009-03-27 15:09 ` [patch 3/6] Guest page hinting: mlocked pages Martin Schwidefsky
2009-04-01  2:52   ` Rik van Riel
2009-04-01  8:13     ` Martin Schwidefsky
2007-06-28 16:40 [patch 0/6] resend: guest page hinting version 5 Martin Schwidefsky
2007-06-28 16:40 ` [patch 3/6] Guest page hinting: mlocked pages Martin Schwidefsky
2007-05-11 13:58 [patch 0/6] [rfc] guest page hinting version 5 Martin Schwidefsky
2007-05-11 13:58 ` [patch 3/6] Guest page hinting: mlocked pages Martin Schwidefsky
