* [RFC PATCH 00/10] Swap-over-NFS without deadlocking v1
@ 2011-09-09 11:00 ` Mel Gorman
  0 siblings, 0 replies; 45+ messages in thread
From: Mel Gorman @ 2011-09-09 11:00 UTC (permalink / raw)
  To: Linux-MM
  Cc: Linux-Netdev, Linux-NFS, LKML, Andrew Morton, David Miller,
	Trond Myklebust, Neil Brown, Peter Zijlstra, Mel Gorman

This patch series is based on top of "Swap-over-NBD without deadlocking
v6" as it depends on the same reservation of PF_MEMALLOC reserves
logic.

When a user or administrator requires swap for their application,
they create a swap partition or file, format it with mkswap and
activate it with swapon. On diskless systems this is not an option,
so if swap is required then swapping over the network is considered.
The two likely scenarios are blade servers used as part of a
cluster, where the form factor or maintenance costs do not allow
the use of disks, and thin clients.

The Linux Terminal Server Project recommends the use of the Network
Block Device (NBD) for swap but this is not always an option.  There is
no guarantee that the network attached storage (NAS) device is running
Linux or supports NBD. However, it is likely that it supports NFS so
there are users that want support for swapping over NFS despite any
performance concern. Some distributions currently carry patches that
support swapping over NFS but it would be preferable to support it
in the mainline kernel.

Patch 1 avoids a stream-specific deadlock that potentially affects TCP.

Patch 2 is a small modification to SELinux to avoid using PF_MEMALLOC
	reserves.

Patch 3 adds four address_space_operations to allow a filesystem
	to optionally control a swapfile. The new handlers are
	expected to map requests for the swapspace operations to
	the underlying file mapping.

Patch 4 notes that patch 3 bolts filesystem-specific swapfile
	support onto the side and that the default handlers have
	different information available to them than the filesystem
	does. This patch refactors the code so that there are generic
	handlers for each of the new address_space operations.

Patch 5 adds some helpers for filesystems to handle swap cache pages.

Patch 6 updates NFS to use the helpers from patch 5 where necessary.

Patch 7 avoids setting PG_private on PG_swapcache pages within NFS.

Patch 8 implements the new swapfile-related address_space operations
	for NFS.

Patch 9 prevents page allocator recursions in NFS by using GFP_NOIO
	where appropriate.

Patch 10 fixes a NULL pointer dereference that occurs when using
	swap-over-NFS.

 Documentation/filesystems/Locking |   23 +++++
 Documentation/filesystems/vfs.txt |   21 +++++
 fs/nfs/Kconfig                    |    8 ++
 fs/nfs/file.c                     |   26 +++++-
 fs/nfs/inode.c                    |    6 ++
 fs/nfs/internal.h                 |    7 +-
 fs/nfs/pagelist.c                 |    8 +-
 fs/nfs/read.c                     |    6 +-
 fs/nfs/write.c                    |  163 +++++++++++++++++++++++++--------
 include/linux/fs.h                |   10 ++
 include/linux/mm.h                |   25 +++++
 include/linux/nfs_fs.h            |    2 +
 include/linux/pagemap.h           |    5 +
 include/linux/sunrpc/xprt.h       |    3 +
 include/linux/swap.h              |    7 ++
 include/net/sock.h                |    7 +-
 mm/page_io.c                      |  181 ++++++++++++++++++++++++++++++++-----
 mm/swap_state.c                   |    2 +-
 mm/swapfile.c                     |  138 ++++++++++------------------
 net/caif/caif_socket.c            |    2 +-
 net/core/sock.c                   |    2 +-
 net/ipv4/tcp_input.c              |   12 ++--
 net/sctp/ulpevent.c               |    2 +-
 net/sunrpc/Kconfig                |    5 +
 net/sunrpc/clnt.c                 |    2 +
 net/sunrpc/sched.c                |    7 +-
 net/sunrpc/xprtsock.c             |   57 ++++++++++++
 security/selinux/avc.c            |    2 +-
 28 files changed, 561 insertions(+), 178 deletions(-)

-- 
1.7.3.4


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 01/10] netvm: Prevent a stream-specific deadlock
  2011-09-09 11:00 ` Mel Gorman
@ 2011-09-09 11:00   ` Mel Gorman
  -1 siblings, 0 replies; 45+ messages in thread
From: Mel Gorman @ 2011-09-09 11:00 UTC (permalink / raw)
  To: Linux-MM
  Cc: Linux-Netdev, Linux-NFS, LKML, Andrew Morton, David Miller,
	Trond Myklebust, Neil Brown, Peter Zijlstra, Mel Gorman

It could happen that all !SOCK_MEMALLOC sockets have buffered so
much data that we're over the global rmem limit. This will prevent
SOCK_MEMALLOC buffers from receiving data, which will prevent userspace
from running, which is needed to reduce the buffered data.

Fix this by exempting the SOCK_MEMALLOC sockets from the rmem limit.
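The exemption can be modelled in plain userspace C. This is a hedged
sketch with hypothetical simplified types (`over_global_rmem_limit`,
`mem_schedule` and `rmem_schedule` are stand-ins invented here); the
real helpers are in the include/net/sock.h hunk below.

```c
#include <stdbool.h>

struct sk_buff { int truesize; bool pfmemalloc; };
struct sock { int sk_forward_alloc; bool over_global_rmem_limit; };

/* stands in for __sk_mem_schedule(): fails once the global
 * rmem limit has been exceeded */
static bool mem_schedule(const struct sock *sk, int size)
{
	(void)size;
	return !sk->over_global_rmem_limit;
}

/* after the patch: an skb allocated from the PF_MEMALLOC reserves
 * is admitted even when both the per-socket quota and the global
 * charge fail, so swap traffic keeps flowing */
static bool rmem_schedule(const struct sock *sk, const struct sk_buff *skb)
{
	return skb->truesize <= sk->sk_forward_alloc ||
	       mem_schedule(sk, skb->truesize) ||
	       skb->pfmemalloc;
}
```

With the global limit exceeded and no per-socket budget left, a normal
skb is refused but a pfmemalloc one is still admitted, which is the
exemption needed to break the deadlock.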

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/net/sock.h     |    7 ++++---
 net/caif/caif_socket.c |    2 +-
 net/core/sock.c        |    2 +-
 net/ipv4/tcp_input.c   |   12 ++++++------
 net/sctp/ulpevent.c    |    2 +-
 5 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 09813fc..dcf2a55 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -980,12 +980,13 @@ static inline int sk_wmem_schedule(struct sock *sk, int size)
 		__sk_mem_schedule(sk, size, SK_MEM_SEND);
 }
 
-static inline int sk_rmem_schedule(struct sock *sk, int size)
+static inline int sk_rmem_schedule(struct sock *sk, struct sk_buff *skb)
 {
 	if (!sk_has_account(sk))
 		return 1;
-	return size <= sk->sk_forward_alloc ||
-		__sk_mem_schedule(sk, size, SK_MEM_RECV);
+	return skb->truesize <= sk->sk_forward_alloc ||
+		__sk_mem_schedule(sk, skb->truesize, SK_MEM_RECV) ||
+		skb_pfmemalloc(skb);
 }
 
 static inline void sk_mem_reclaim(struct sock *sk)
diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index a986280..78ef332 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -168,7 +168,7 @@ static int caif_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	err = sk_filter(sk, skb);
 	if (err)
 		return err;
-	if (!sk_rmem_schedule(sk, skb->truesize) && rx_flow_is_on(cf_sk)) {
+	if (!sk_rmem_schedule(sk, skb) && rx_flow_is_on(cf_sk)) {
 		set_rx_flow_off(cf_sk);
 		if (net_ratelimit())
 			pr_debug("sending flow OFF due to rmem_schedule\n");
diff --git a/net/core/sock.c b/net/core/sock.c
index 0f28a9b..c27d9e5 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -344,7 +344,7 @@ int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	if (err)
 		return err;
 
-	if (!sk_rmem_schedule(sk, skb->truesize)) {
+	if (!sk_rmem_schedule(sk, skb)) {
 		atomic_inc(&sk->sk_drops);
 		return -ENOBUFS;
 	}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ea0d218..4c2ec53 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4356,19 +4356,19 @@ static void tcp_ofo_queue(struct sock *sk)
 static int tcp_prune_ofo_queue(struct sock *sk);
 static int tcp_prune_queue(struct sock *sk);
 
-static inline int tcp_try_rmem_schedule(struct sock *sk, unsigned int size)
+static inline int tcp_try_rmem_schedule(struct sock *sk, struct sk_buff *skb)
 {
 	if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf ||
-	    !sk_rmem_schedule(sk, size)) {
+	    !sk_rmem_schedule(sk, skb)) {
 
 		if (tcp_prune_queue(sk) < 0)
 			return -1;
 
-		if (!sk_rmem_schedule(sk, size)) {
+		if (!sk_rmem_schedule(sk, skb)) {
 			if (!tcp_prune_ofo_queue(sk))
 				return -1;
 
-			if (!sk_rmem_schedule(sk, size))
+			if (!sk_rmem_schedule(sk, skb))
 				return -1;
 		}
 	}
@@ -4421,7 +4421,7 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 		if (eaten <= 0) {
 queue_and_out:
 			if (eaten < 0 &&
-			    tcp_try_rmem_schedule(sk, skb->truesize))
+			    tcp_try_rmem_schedule(sk, skb))
 				goto drop;
 
 			skb_set_owner_r(skb, sk);
@@ -4492,7 +4492,7 @@ drop:
 
 	TCP_ECN_check_ce(tp, skb);
 
-	if (tcp_try_rmem_schedule(sk, skb->truesize))
+	if (tcp_try_rmem_schedule(sk, skb))
 		goto drop;
 
 	/* Disable header prediction. */
diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
index 8a84017..6c6ed2d 100644
--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -702,7 +702,7 @@ struct sctp_ulpevent *sctp_ulpevent_make_rcvmsg(struct sctp_association *asoc,
 	if (rx_count >= asoc->base.sk->sk_rcvbuf) {
 
 		if ((asoc->base.sk->sk_userlocks & SOCK_RCVBUF_LOCK) ||
-		    (!sk_rmem_schedule(asoc->base.sk, chunk->skb->truesize)))
+		    (!sk_rmem_schedule(asoc->base.sk, chunk->skb)))
 			goto fail;
 	}
 
-- 
1.7.3.4



* [PATCH 02/10] selinux: tag avc cache alloc as non-critical
  2011-09-09 11:00 ` Mel Gorman
@ 2011-09-09 11:00   ` Mel Gorman
  -1 siblings, 0 replies; 45+ messages in thread
From: Mel Gorman @ 2011-09-09 11:00 UTC (permalink / raw)
  To: Linux-MM
  Cc: Linux-Netdev, Linux-NFS, LKML, Andrew Morton, David Miller,
	Trond Myklebust, Neil Brown, Peter Zijlstra, Mel Gorman

Failing to allocate a cache entry will only harm performance, not
correctness.  Do not consume valuable reserve pages for something
like that.
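The policy being encoded can be sketched as a toy userspace model
(everything here is a hypothetical simplification invented for
illustration; the real decision is made by the page allocator when it
sees __GFP_NOMEMALLOC on the allocation):

```c
#include <stdbool.h>

#define MY_GFP_ATOMIC      0x1u
#define MY_GFP_NOMEMALLOC  0x2u  /* caller must not touch emergency reserves */

/* toy allocator state: the normal free pool is exhausted and only
 * the PF_MEMALLOC emergency reserves remain */
static bool normal_pool_has_pages = false;

static bool alloc_succeeds(unsigned int gfp_flags)
{
	if (normal_pool_has_pages)
		return true;
	/* only callers still allowed into the reserves succeed */
	return !(gfp_flags & MY_GFP_NOMEMALLOC);
}
```

Under this model, tagging the AVC cache allocation with
__GFP_NOMEMALLOC means it simply fails once only the reserves are
left, costing a cache miss rather than pages that in-flight
swap-over-network I/O may need.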

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 security/selinux/avc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index dca1c22..a68d200 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -280,7 +280,7 @@ static struct avc_node *avc_alloc_node(void)
 {
 	struct avc_node *node;
 
-	node = kmem_cache_zalloc(avc_node_cachep, GFP_ATOMIC);
+	node = kmem_cache_zalloc(avc_node_cachep, GFP_ATOMIC|__GFP_NOMEMALLOC);
 	if (!node)
 		goto out;
 
-- 
1.7.3.4



* [PATCH 03/10] mm: Add support for a filesystem to control swap files
  2011-09-09 11:00 ` Mel Gorman
@ 2011-09-09 11:00   ` Mel Gorman
  -1 siblings, 0 replies; 45+ messages in thread
From: Mel Gorman @ 2011-09-09 11:00 UTC (permalink / raw)
  To: Linux-MM
  Cc: Linux-Netdev, Linux-NFS, LKML, Andrew Morton, David Miller,
	Trond Myklebust, Neil Brown, Peter Zijlstra, Mel Gorman

Currently swapfiles are managed entirely by the core VM by using
->bmap to allocate space and write to the blocks directly. This
patch adds address_space_operations methods that allow a filesystem
to optionally control the swapfile.

  int swap_activate(struct file *);
  int swap_deactivate(struct file *);
  int swap_writepage(struct file *, struct page *, struct writeback_control *);
  int swap_readpage(struct file *, struct page *);

The ->swap_activate() method is used to communicate to the file
that the VM relies on it, and the address_space should take adequate
measures such as reserving space in the underlying device, reserving
memory for mempools etc. The ->swap_deactivate() method is called on
sys_swapoff() if ->swap_activate() returned success.

After a successful swapfile ->swap_activate, the swapfile
is marked SWP_FILE and swapper_space.a_ops will proxy to
sis->swap_file->f_mapping->a_ops, using ->swap_readpage and
->swap_writepage to read/write swapcache pages.

The primary user of this interface is expected to be NFS for supporting
swap-over-NFS which is why the existing readpage/writepage interface
is not used. For writing a swap page on NFS, the struct file * is
needed for a credential context that is not passed into writepage.
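The SWP_FILE dispatch this introduces can be modelled in userspace C.
This is a sketch with hypothetical simplified types (`aops`,
`model_swap_writepage` and the return codes are invented for the
model); the authoritative version is the mm/page_io.c hunk in the
diff.

```c
#define SWP_FILE (1 << 7)  /* set after a successful ->swap_activate */

struct page { int id; };
struct file;
struct aops { int (*swap_writepage)(struct file *, struct page *); };
struct file { const struct aops *a_ops; };
struct swap_info_struct { unsigned int flags; struct file *swap_file; };

enum { WROTE_VIA_FS = 1, WROTE_VIA_BIO = 2 };

/* filesystem-provided handler, e.g. what NFS would supply */
static int fs_swap_writepage(struct file *f, struct page *p)
{
	(void)f; (void)p;
	return WROTE_VIA_FS;
}

/* fallback: the VM constructs a bio against the blocks directly */
static int bio_swap_writepage(struct page *p)
{
	(void)p;
	return WROTE_VIA_BIO;
}

/* mirrors swap_writepage(): SWP_FILE proxies the write to the
 * file's address_space operations, otherwise block I/O is used */
static int model_swap_writepage(struct swap_info_struct *sis, struct page *p)
{
	if (sis->flags & SWP_FILE)
		return sis->swap_file->a_ops->swap_writepage(sis->swap_file, p);
	return bio_swap_writepage(p);
}
```

The struct file is threaded through the call so the filesystem handler
has it available, which is the credential-context point made above.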

[a.p.zijlstra@chello.nl: Original patch]
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 Documentation/filesystems/Locking |   23 +++++++++++++++++++++++
 Documentation/filesystems/vfs.txt |   21 +++++++++++++++++++++
 include/linux/fs.h                |    7 +++++++
 include/linux/swap.h              |    3 +++
 mm/page_io.c                      |   37 +++++++++++++++++++++++++++++++++++++
 mm/swap_state.c                   |    2 +-
 mm/swapfile.c                     |   30 ++++++++++++++++++++++++++++--
 7 files changed, 120 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 6533807..7f534f4 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -201,6 +201,10 @@ prototypes:
 	int (*launder_page)(struct page *);
 	int (*is_partially_uptodate)(struct page *, read_descriptor_t *, unsigned long);
 	int (*error_remove_page)(struct address_space *, struct page *);
+	int (*swap_activate)(struct file *);
+	int (*swap_deactivate)(struct file *);
+	int (*swap_out)(struct file *, struct page *, struct writeback_control *);
+	int (*swap_in)(struct file *, struct page *);
 
 locking rules:
 	All except set_page_dirty and freepage may block
@@ -224,6 +228,10 @@ migratepage:		yes (both)
 launder_page:		yes
 is_partially_uptodate:	yes
 error_remove_page:	yes
+swap_activate:		no
+swap_deactivate:	no
+swap_out		no			yes, unlocks
+swap_in			no			yes, unlocks
 
 	->write_begin(), ->write_end(), ->sync_page() and ->readpage()
 may be called from the request handler (/dev/loop).
@@ -325,6 +333,21 @@ cleaned, or an error value if not. Note that in order to prevent the page
 getting mapped back in and redirtied, it needs to be kept locked
 across the entire operation.
 
+	->swap_activate will be called with a non-zero argument on
+files backing (non block device backed) swapfiles. A return value
+of zero indicates success, in which case this file can be used for
+backing swapspace. The swapspace operations will be proxied to the
+address space operations.
+
+	->swap_deactivate() will be called in the sys_swapoff()
+path after ->swap_activate() returned success.
+
+       ->swap_writepage() is usable after swap_activate() returned
+success. This method is used to write a swap page.
+
+       ->swap_readpage() is usable after swap_activate() returned
+success, this method is used to read a swap page.
+
 ----------------------- file_lock_operations ------------------------------
 prototypes:
 	void (*fl_copy_lock)(struct file_lock *, struct file_lock *);
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 52d8fb8..8378eaa 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -581,6 +581,11 @@ struct address_space_operations {
 	int (*migratepage) (struct page *, struct page *);
 	int (*launder_page) (struct page *);
 	int (*error_remove_page) (struct mapping *mapping, struct page *page);
+	int (*swap_activate)(struct file *);
+	int (*swap_deactivate)(struct file *);
+	int (*swap_out)(struct file *, struct page *,
+			struct writeback_control *);
+	int (*swap_in)(struct file *, struct page *);
 };
 
   writepage: called by the VM to write a dirty page to backing store.
@@ -749,6 +754,22 @@ struct address_space_operations {
 	Setting this implies you deal with pages going away under you,
 	unless you have them locked or reference counts increased.
 
+  swap_activate: Called when swapon is used on a file to allocating
+	space if necessary and perform any other necessary
+	housekeeping. A return value of zero indicates success,
+	in which case this file can be used to back swapspace. The
+	swapspace operations will be proxied to this address space's
+	->swap_{out,in} methods.
+
+  swap_deactivate: Called during swapoff on files where swap_activate
+  	was successful.
+
+  swap_writepage: Called to write a swapcache page to a backing store,
+	similar to writepage.
+
+  swap_readpage: Called to read a swapcache page from a backing store,
+	similar to readpage.
+
 
 The File Object
 ===============
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c2bd68f..387b767 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -617,6 +617,13 @@ struct address_space_operations {
 	int (*is_partially_uptodate) (struct page *, read_descriptor_t *,
 					unsigned long);
 	int (*error_remove_page)(struct address_space *, struct page *);
+
+	/* swapfile support */
+	int (*swap_activate)(struct file *file);
+	int (*swap_deactivate)(struct file *file);
+	int (*swap_writepage)(struct file *file, struct page *page,
+			struct writeback_control *wbc);
+	int (*swap_readpage)(struct file *file, struct page *page);
 };
 
 extern const struct address_space_operations empty_aops;
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 14d6249..a044198 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -148,6 +148,7 @@ enum {
 	SWP_SOLIDSTATE	= (1 << 4),	/* blkdev seeks are cheap */
 	SWP_CONTINUED	= (1 << 5),	/* swap_map has count continuation */
 	SWP_BLKDEV	= (1 << 6),	/* its a block device */
+	SWP_FILE	= (1 << 7),	/* set after swap_activate success */
 					/* add others here before... */
 	SWP_SCANNING	= (1 << 8),	/* refcount in scan_swap_map */
 };
@@ -303,6 +304,7 @@ static inline int mem_cgroup_swappiness(struct mem_cgroup *mem)
 /* linux/mm/page_io.c */
 extern int swap_readpage(struct page *);
 extern int swap_writepage(struct page *page, struct writeback_control *wbc);
+extern int swap_set_page_dirty(struct page *page);
 extern void end_swap_bio_read(struct bio *bio, int err);
 
 /* linux/mm/swap_state.c */
@@ -339,6 +341,7 @@ extern int swap_type_of(dev_t, sector_t, struct block_device **);
 extern unsigned int count_swap_pages(int, int);
 extern sector_t map_swap_page(struct page *, struct block_device **);
 extern sector_t swapdev_block(int, pgoff_t);
+extern struct swap_info_struct *page_swap_info(struct page *);
 extern int reuse_swap_page(struct page *);
 extern int try_to_free_swap(struct page *);
 struct backing_dev_info;
diff --git a/mm/page_io.c b/mm/page_io.c
index dc76b4d..5ed5710 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -17,6 +17,7 @@
 #include <linux/swap.h>
 #include <linux/bio.h>
 #include <linux/swapops.h>
+#include <linux/buffer_head.h>
 #include <linux/writeback.h>
 #include <asm/pgtable.h>
 
@@ -93,11 +94,23 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
 {
 	struct bio *bio;
 	int ret = 0, rw = WRITE;
+	struct swap_info_struct *sis = page_swap_info(page);
 
 	if (try_to_free_swap(page)) {
 		unlock_page(page);
 		goto out;
 	}
+
+	if (sis->flags & SWP_FILE) {
+		struct file *swap_file = sis->swap_file;
+		struct address_space *mapping = swap_file->f_mapping;
+
+		ret = mapping->a_ops->swap_writepage(swap_file, page, wbc);
+		if (!ret)
+			count_vm_event(PSWPOUT);
+		return ret;
+	}
+
 	bio = get_swap_bio(GFP_NOIO, page, end_swap_bio_write);
 	if (bio == NULL) {
 		set_page_dirty(page);
@@ -119,9 +132,21 @@ int swap_readpage(struct page *page)
 {
 	struct bio *bio;
 	int ret = 0;
+	struct swap_info_struct *sis = page_swap_info(page);
 
 	VM_BUG_ON(!PageLocked(page));
 	VM_BUG_ON(PageUptodate(page));
+
+	if (sis->flags & SWP_FILE) {
+		struct file *swap_file = sis->swap_file;
+		struct address_space *mapping = swap_file->f_mapping;
+
+		ret = mapping->a_ops->swap_readpage(swap_file, page);
+		if (!ret)
+			count_vm_event(PSWPIN);
+		return ret;
+	}
+
 	bio = get_swap_bio(GFP_KERNEL, page, end_swap_bio_read);
 	if (bio == NULL) {
 		unlock_page(page);
@@ -133,3 +158,15 @@ int swap_readpage(struct page *page)
 out:
 	return ret;
 }
+
+int swap_set_page_dirty(struct page *page)
+{
+	struct swap_info_struct *sis = page_swap_info(page);
+
+	if (sis->flags & SWP_FILE) {

* [PATCH 03/10] mm: Add support for a filesystem to control swap files
@ 2011-09-09 11:00   ` Mel Gorman
  0 siblings, 0 replies; 45+ messages in thread
From: Mel Gorman @ 2011-09-09 11:00 UTC (permalink / raw)
  To: Linux-MM
  Cc: Linux-Netdev, Linux-NFS, LKML, Andrew Morton, David Miller,
	Trond Myklebust, Neil Brown, Peter Zijlstra, Mel Gorman

Currently swapfiles are managed entirely by the core VM, which uses
->bmap to map space and writes to the blocks directly. This patch
adds address_space_operations methods that allow a filesystem to
optionally control the swapfile.

  int swap_activate(struct file *);
  int swap_deactivate(struct file *);
  int swap_writepage(struct file *, struct page *, struct writeback_control *);
  int swap_readpage(struct file *, struct page *);

The ->swap_activate() method is used to communicate to the file
that the VM relies on it, and the address_space should take adequate
measures such as reserving space in the underlying device, reserving
memory for mempools etc. The ->swap_deactivate() method is called on
sys_swapoff() if ->swap_activate() returned success.

After a successful ->swap_activate(), the swapfile is marked
SWP_FILE and swapper_space.a_ops will proxy to
sis->swap_file->f_mapping->a_ops, using ->swap_readpage and
->swap_writepage to read/write swapcache pages.

The primary user of this interface is expected to be NFS for supporting
swap-over-NFS which is why the existing readpage/writepage interface
is not used. For writing a swap page on NFS, the struct file * is
needed for a credential context that is not passed into writepage.

[a.p.zijlstra@chello.nl: Original patch]
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 Documentation/filesystems/Locking |   23 +++++++++++++++++++++++
 Documentation/filesystems/vfs.txt |   21 +++++++++++++++++++++
 include/linux/fs.h                |    7 +++++++
 include/linux/swap.h              |    3 +++
 mm/page_io.c                      |   37 +++++++++++++++++++++++++++++++++++++
 mm/swap_state.c                   |    2 +-
 mm/swapfile.c                     |   30 ++++++++++++++++++++++++++++--
 7 files changed, 120 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 6533807..7f534f4 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -201,6 +201,10 @@ prototypes:
 	int (*launder_page)(struct page *);
 	int (*is_partially_uptodate)(struct page *, read_descriptor_t *, unsigned long);
 	int (*error_remove_page)(struct address_space *, struct page *);
+	int (*swap_activate)(struct file *);
+	int (*swap_deactivate)(struct file *);
+	int (*swap_writepage)(struct file *, struct page *, struct writeback_control *);
+	int (*swap_readpage)(struct file *, struct page *);
 
 locking rules:
 	All except set_page_dirty and freepage may block
@@ -224,6 +228,10 @@ migratepage:		yes (both)
 launder_page:		yes
 is_partially_uptodate:	yes
 error_remove_page:	yes
+swap_activate:		no
+swap_deactivate:	no
+swap_writepage:	no			yes, unlocks
+swap_readpage:		no			yes, unlocks
 
 	->write_begin(), ->write_end(), ->sync_page() and ->readpage()
 may be called from the request handler (/dev/loop).
@@ -325,6 +333,21 @@ cleaned, or an error value if not. Note that in order to prevent the page
 getting mapped back in and redirtied, it needs to be kept locked
 across the entire operation.
 
+	->swap_activate() will be called on files backing swapfiles
+that are not block-device backed. A return value of zero indicates
+success, in which case this file can be used for backing swapspace.
+The swapspace operations will be proxied to the address space
+operations.
+
+	->swap_deactivate() will be called in the sys_swapoff()
+path after ->swap_activate() returned success.
+
+	->swap_writepage() is usable after ->swap_activate() returned
+success. This method is used to write a swap page.
+
+	->swap_readpage() is usable after ->swap_activate() returned
+success. This method is used to read a swap page.
+
 ----------------------- file_lock_operations ------------------------------
 prototypes:
 	void (*fl_copy_lock)(struct file_lock *, struct file_lock *);
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 52d8fb8..8378eaa 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -581,6 +581,11 @@ struct address_space_operations {
 	int (*migratepage) (struct page *, struct page *);
 	int (*launder_page) (struct page *);
 	int (*error_remove_page) (struct mapping *mapping, struct page *page);
+	int (*swap_activate)(struct file *);
+	int (*swap_deactivate)(struct file *);
+	int (*swap_writepage)(struct file *, struct page *,
+			struct writeback_control *);
+	int (*swap_readpage)(struct file *, struct page *);
 };
 
   writepage: called by the VM to write a dirty page to backing store.
@@ -749,6 +754,22 @@ struct address_space_operations {
 	Setting this implies you deal with pages going away under you,
 	unless you have them locked or reference counts increased.
 
+  swap_activate: Called when swapon is used on a file, to allocate
+	space if necessary and perform any other necessary
+	housekeeping. A return value of zero indicates success,
+	in which case this file can be used to back swapspace. The
+	swapspace operations will be proxied to this address space's
+	->swap_{writepage,readpage} methods.
+
+  swap_deactivate: Called during swapoff on files where swap_activate
+	was successful.
+
+  swap_writepage: Called to write a swapcache page to a backing store,
+	similar to writepage.
+
+  swap_readpage: Called to read a swapcache page from a backing store,
+	similar to readpage.
+
 
 The File Object
 ===============
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c2bd68f..387b767 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -617,6 +617,13 @@ struct address_space_operations {
 	int (*is_partially_uptodate) (struct page *, read_descriptor_t *,
 					unsigned long);
 	int (*error_remove_page)(struct address_space *, struct page *);
+
+	/* swapfile support */
+	int (*swap_activate)(struct file *file);
+	int (*swap_deactivate)(struct file *file);
+	int (*swap_writepage)(struct file *file, struct page *page,
+			struct writeback_control *wbc);
+	int (*swap_readpage)(struct file *file, struct page *page);
 };
 
 extern const struct address_space_operations empty_aops;
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 14d6249..a044198 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -148,6 +148,7 @@ enum {
 	SWP_SOLIDSTATE	= (1 << 4),	/* blkdev seeks are cheap */
 	SWP_CONTINUED	= (1 << 5),	/* swap_map has count continuation */
 	SWP_BLKDEV	= (1 << 6),	/* its a block device */
+	SWP_FILE	= (1 << 7),	/* set after swap_activate success */
 					/* add others here before... */
 	SWP_SCANNING	= (1 << 8),	/* refcount in scan_swap_map */
 };
@@ -303,6 +304,7 @@ static inline int mem_cgroup_swappiness(struct mem_cgroup *mem)
 /* linux/mm/page_io.c */
 extern int swap_readpage(struct page *);
 extern int swap_writepage(struct page *page, struct writeback_control *wbc);
+extern int swap_set_page_dirty(struct page *page);
 extern void end_swap_bio_read(struct bio *bio, int err);
 
 /* linux/mm/swap_state.c */
@@ -339,6 +341,7 @@ extern int swap_type_of(dev_t, sector_t, struct block_device **);
 extern unsigned int count_swap_pages(int, int);
 extern sector_t map_swap_page(struct page *, struct block_device **);
 extern sector_t swapdev_block(int, pgoff_t);
+extern struct swap_info_struct *page_swap_info(struct page *);
 extern int reuse_swap_page(struct page *);
 extern int try_to_free_swap(struct page *);
 struct backing_dev_info;
diff --git a/mm/page_io.c b/mm/page_io.c
index dc76b4d..5ed5710 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -17,6 +17,7 @@
 #include <linux/swap.h>
 #include <linux/bio.h>
 #include <linux/swapops.h>
+#include <linux/buffer_head.h>
 #include <linux/writeback.h>
 #include <asm/pgtable.h>
 
@@ -93,11 +94,23 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
 {
 	struct bio *bio;
 	int ret = 0, rw = WRITE;
+	struct swap_info_struct *sis = page_swap_info(page);
 
 	if (try_to_free_swap(page)) {
 		unlock_page(page);
 		goto out;
 	}
+
+	if (sis->flags & SWP_FILE) {
+		struct file *swap_file = sis->swap_file;
+		struct address_space *mapping = swap_file->f_mapping;
+
+		ret = mapping->a_ops->swap_writepage(swap_file, page, wbc);
+		if (!ret)
+			count_vm_event(PSWPOUT);
+		return ret;
+	}
+
 	bio = get_swap_bio(GFP_NOIO, page, end_swap_bio_write);
 	if (bio == NULL) {
 		set_page_dirty(page);
@@ -119,9 +132,21 @@ int swap_readpage(struct page *page)
 {
 	struct bio *bio;
 	int ret = 0;
+	struct swap_info_struct *sis = page_swap_info(page);
 
 	VM_BUG_ON(!PageLocked(page));
 	VM_BUG_ON(PageUptodate(page));
+
+	if (sis->flags & SWP_FILE) {
+		struct file *swap_file = sis->swap_file;
+		struct address_space *mapping = swap_file->f_mapping;
+
+		ret = mapping->a_ops->swap_readpage(swap_file, page);
+		if (!ret)
+			count_vm_event(PSWPIN);
+		return ret;
+	}
+
 	bio = get_swap_bio(GFP_KERNEL, page, end_swap_bio_read);
 	if (bio == NULL) {
 		unlock_page(page);
@@ -133,3 +158,15 @@ int swap_readpage(struct page *page)
 out:
 	return ret;
 }
+
+int swap_set_page_dirty(struct page *page)
+{
+	struct swap_info_struct *sis = page_swap_info(page);
+
+	if (sis->flags & SWP_FILE) {
+		struct address_space *mapping = sis->swap_file->f_mapping;
+		return mapping->a_ops->set_page_dirty(page);
+	} else {
+		return __set_page_dirty_nobuffers(page);
+	}
+}
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 4668046..787ca54 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -28,7 +28,7 @@
  */
 static const struct address_space_operations swap_aops = {
 	.writepage	= swap_writepage,
-	.set_page_dirty	= __set_page_dirty_nobuffers,
+	.set_page_dirty	= swap_set_page_dirty,
 	.migratepage	= migrate_page,
 };
 
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 17bc224..f181884 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1343,6 +1343,14 @@ static void destroy_swap_extents(struct swap_info_struct *sis)
 		list_del(&se->list);
 		kfree(se);
 	}
+
+	if (sis->flags & SWP_FILE) {
+		struct file *swap_file = sis->swap_file;
+		struct address_space *mapping = swap_file->f_mapping;
+
+		sis->flags &= ~SWP_FILE;
+		mapping->a_ops->swap_deactivate(swap_file);
+	}
 }
 
 /*
@@ -1424,7 +1432,9 @@ add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
  */
 static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span)
 {
-	struct inode *inode;
+	struct file *swap_file = sis->swap_file;
+	struct address_space *mapping = swap_file->f_mapping;
+	struct inode *inode = mapping->host;
 	unsigned blocks_per_page;
 	unsigned long page_no;
 	unsigned blkbits;
@@ -1435,13 +1445,22 @@ static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span)
 	int nr_extents = 0;
 	int ret;
 
-	inode = sis->swap_file->f_mapping->host;
 	if (S_ISBLK(inode->i_mode)) {
 		ret = add_swap_extent(sis, 0, sis->max, 0);
 		*span = sis->pages;
 		goto out;
 	}
 
+	if (mapping->a_ops->swap_activate) {
+		ret = mapping->a_ops->swap_activate(swap_file);
+		if (!ret) {
+			sis->flags |= SWP_FILE;
+			ret = add_swap_extent(sis, 0, sis->max, 0);
+			*span = sis->pages;
+		}
+		goto out;
+	}
+
 	blkbits = inode->i_blkbits;
 	blocks_per_page = PAGE_SIZE >> blkbits;
 
@@ -2289,6 +2308,13 @@ int swapcache_prepare(swp_entry_t entry)
 	return __swap_duplicate(entry, SWAP_HAS_CACHE);
 }
 
+struct swap_info_struct *page_swap_info(struct page *page)
+{
+	swp_entry_t swap = { .val = page_private(page) };
+	BUG_ON(!PageSwapCache(page));
+	return swap_info[swp_type(swap)];
+}
+
 /*
  * swap_lock prevents swap_map being freed. Don't grab an extra
  * reference on the swaphandle, it doesn't matter if it becomes unused.
-- 
1.7.3.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 04/10] mm: swap: Implement generic handlers for swap-related address ops
  2011-09-09 11:00 ` Mel Gorman
@ 2011-09-09 11:00   ` Mel Gorman
  -1 siblings, 0 replies; 45+ messages in thread
From: Mel Gorman @ 2011-09-09 11:00 UTC (permalink / raw)
  To: Linux-MM
  Cc: Linux-Netdev, Linux-NFS, LKML, Andrew Morton, David Miller,
	Trond Myklebust, Neil Brown, Peter Zijlstra, Mel Gorman

With the introduction of swap_activate, swap_writepage and
swap_readpage, there are a number of SWP_FILE checks that call a_ops
and fall back to generic handlers. This patch clarifies things by
creating generic versions of these functions and passing in all the
information required to implement a generic handler, so the same
information is available to filesystems. This removes the need for
SWP_FILE and cleans up the flow slightly. There are no functional
changes.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/fs.h   |    7 ++-
 include/linux/swap.h |    6 ++-
 mm/page_io.c         |  184 +++++++++++++++++++++++++++++++++++++++-----------
 mm/swapfile.c        |  102 +++-------------------------
 4 files changed, 162 insertions(+), 137 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 387b767..dd93bb1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -581,6 +581,8 @@ typedef struct {
 typedef int (*read_actor_t)(read_descriptor_t *, struct page *,
 		unsigned long, unsigned long);
 
+struct swap_info_struct;
+
 struct address_space_operations {
 	int (*writepage)(struct page *page, struct writeback_control *wbc);
 	int (*readpage)(struct file *, struct page *);
@@ -619,8 +621,9 @@ struct address_space_operations {
 	int (*error_remove_page)(struct address_space *, struct page *);
 
 	/* swapfile support */
-	int (*swap_activate)(struct file *file);
-	int (*swap_deactivate)(struct file *file);
+	int (*swap_activate)(struct swap_info_struct *sis, struct file *file,
+				sector_t *span);
+	void (*swap_deactivate)(struct file *file);
 	int (*swap_writepage)(struct file *file, struct page *page,
 			struct writeback_control *wbc);
 	int (*swap_readpage)(struct file *file, struct page *page);
diff --git a/include/linux/swap.h b/include/linux/swap.h
index a044198..195ae15 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -148,7 +148,6 @@ enum {
 	SWP_SOLIDSTATE	= (1 << 4),	/* blkdev seeks are cheap */
 	SWP_CONTINUED	= (1 << 5),	/* swap_map has count continuation */
 	SWP_BLKDEV	= (1 << 6),	/* its a block device */
-	SWP_FILE	= (1 << 7),	/* set after swap_activate success */
 					/* add others here before... */
 	SWP_SCANNING	= (1 << 8),	/* refcount in scan_swap_map */
 };
@@ -307,6 +306,11 @@ extern int swap_writepage(struct page *page, struct writeback_control *wbc);
 extern int swap_set_page_dirty(struct page *page);
 extern void end_swap_bio_read(struct bio *bio, int err);
 
+int add_swap_extent(struct swap_info_struct *, unsigned long start_pfn,
+		unsigned long nr_pages, sector_t);
+int generic_swapfile_activate(struct swap_info_struct *, struct file *,
+		sector_t *);
+
 /* linux/mm/swap_state.c */
 extern struct address_space swapper_space;
 #define total_swapcache_pages  swapper_space.nrpages
diff --git a/mm/page_io.c b/mm/page_io.c
index 5ed5710..6ea49d3 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -86,87 +86,189 @@ void end_swap_bio_read(struct bio *bio, int err)
 	bio_put(bio);
 }
 
+int generic_swap_writepage(struct page *page, struct writeback_control *wbc)
+{
+	struct bio *bio;
+	int rw = WRITE;
+
+	bio = get_swap_bio(GFP_NOIO, page, end_swap_bio_write);
+	if (bio == NULL) {
+		set_page_dirty(page);
+		unlock_page(page);
+		return -ENOMEM;
+	}
+	if (wbc->sync_mode == WB_SYNC_ALL)
+		rw |= REQ_SYNC;
+	count_vm_event(PSWPOUT);
+	set_page_writeback(page);
+	unlock_page(page);
+	submit_bio(rw, bio);
+
+	return 0;
+}
+
+int generic_swap_readpage(struct page *page)
+{
+	struct bio *bio;
+	bio = get_swap_bio(GFP_KERNEL, page, end_swap_bio_read);
+	if (bio == NULL) {
+		unlock_page(page);
+		return -ENOMEM;
+	}
+	count_vm_event(PSWPIN);
+	submit_bio(READ, bio);
+
+	return 0;
+}
+
+int generic_swapfile_activate(struct swap_info_struct *sis,
+				struct file *swap_file,
+				sector_t *span)
+{
+	struct address_space *mapping = swap_file->f_mapping;
+	struct inode *inode = mapping->host;
+	unsigned blocks_per_page;
+	unsigned long page_no;
+	unsigned blkbits;
+	sector_t probe_block;
+	sector_t last_block;
+	sector_t lowest_block = -1;
+	sector_t highest_block = 0;
+	int nr_extents = 0;
+	int ret;
+
+	blkbits = inode->i_blkbits;
+	blocks_per_page = PAGE_SIZE >> blkbits;
+
+	/*
+	 * Map all the blocks into the extent list.  This code doesn't try
+	 * to be very smart.
+	 */
+	probe_block = 0;
+	page_no = 0;
+	last_block = i_size_read(inode) >> blkbits;
+	while ((probe_block + blocks_per_page) <= last_block &&
+			page_no < sis->max) {
+		unsigned block_in_page;
+		sector_t first_block;
+
+		first_block = bmap(inode, probe_block);
+		if (first_block == 0)
+			goto bad_bmap;
+
+		/*
+		 * It must be PAGE_SIZE aligned on-disk
+		 */
+		if (first_block & (blocks_per_page - 1)) {
+			probe_block++;
+			goto reprobe;
+		}
+
+		for (block_in_page = 1; block_in_page < blocks_per_page;
+					block_in_page++) {
+			sector_t block;
+
+			block = bmap(inode, probe_block + block_in_page);
+			if (block == 0)
+				goto bad_bmap;
+			if (block != first_block + block_in_page) {
+				/* Discontiguity */
+				probe_block++;
+				goto reprobe;
+			}
+		}
+
+		first_block >>= (PAGE_SHIFT - blkbits);
+		if (page_no) {	/* exclude the header page */
+			if (first_block < lowest_block)
+				lowest_block = first_block;
+			if (first_block > highest_block)
+				highest_block = first_block;
+		}
+
+		/*
+		 * We found a PAGE_SIZE-length, PAGE_SIZE-aligned run of blocks
+		 */
+		ret = add_swap_extent(sis, page_no, 1, first_block);
+		if (ret < 0)
+			goto out;
+		nr_extents += ret;
+		page_no++;
+		probe_block += blocks_per_page;
+reprobe:
+		continue;
+	}
+	ret = nr_extents;
+	*span = 1 + highest_block - lowest_block;
+	if (page_no == 0)
+		page_no = 1;	/* force Empty message */
+	sis->max = page_no;
+	sis->pages = page_no - 1;
+	sis->highest_bit = page_no - 1;
+out:
+	return ret;
+bad_bmap:
+	printk(KERN_ERR "swapon: swapfile has holes\n");
+	ret = -EINVAL;
+	goto out;
+}
 /*
  * We may have stale swap cache pages in memory: notice
  * them here and get rid of the unnecessary final write.
  */
 int swap_writepage(struct page *page, struct writeback_control *wbc)
 {
-	struct bio *bio;
-	int ret = 0, rw = WRITE;
+	int ret = 0;
 	struct swap_info_struct *sis = page_swap_info(page);
+	struct file *swap_file;
+	struct address_space *mapping;
 
 	if (try_to_free_swap(page)) {
 		unlock_page(page);
-		goto out;
+		return ret;
 	}
 
-	if (sis->flags & SWP_FILE) {
-		struct file *swap_file = sis->swap_file;
-		struct address_space *mapping = swap_file->f_mapping;
-
+	swap_file = sis->swap_file;
+	mapping = swap_file->f_mapping;
+	if (mapping->a_ops->swap_writepage) {
 		ret = mapping->a_ops->swap_writepage(swap_file, page, wbc);
 		if (!ret)
 			count_vm_event(PSWPOUT);
 		return ret;
 	}
 
-	bio = get_swap_bio(GFP_NOIO, page, end_swap_bio_write);
-	if (bio == NULL) {
-		set_page_dirty(page);
-		unlock_page(page);
-		ret = -ENOMEM;
-		goto out;
-	}
-	if (wbc->sync_mode == WB_SYNC_ALL)
-		rw |= REQ_SYNC;
-	count_vm_event(PSWPOUT);
-	set_page_writeback(page);
-	unlock_page(page);
-	submit_bio(rw, bio);
-out:
-	return ret;
+	return generic_swap_writepage(page, wbc);
 }
 
 int swap_readpage(struct page *page)
 {
-	struct bio *bio;
 	int ret = 0;
 	struct swap_info_struct *sis = page_swap_info(page);
+	struct file *swap_file;
+	struct address_space *mapping;
 
 	VM_BUG_ON(!PageLocked(page));
 	VM_BUG_ON(PageUptodate(page));
 
-	if (sis->flags & SWP_FILE) {
-		struct file *swap_file = sis->swap_file;
-		struct address_space *mapping = swap_file->f_mapping;
-
+	swap_file = sis->swap_file;
+	mapping = swap_file->f_mapping;
+	if (mapping->a_ops->swap_readpage) {
 		ret = mapping->a_ops->swap_readpage(swap_file, page);
 		if (!ret)
 			count_vm_event(PSWPIN);
 		return ret;
 	}
 
-	bio = get_swap_bio(GFP_KERNEL, page, end_swap_bio_read);
-	if (bio == NULL) {
-		unlock_page(page);
-		ret = -ENOMEM;
-		goto out;
-	}
-	count_vm_event(PSWPIN);
-	submit_bio(READ, bio);
-out:
-	return ret;
+	return generic_swap_readpage(page);
 }
 
 int swap_set_page_dirty(struct page *page)
 {
 	struct swap_info_struct *sis = page_swap_info(page);
+	struct address_space *mapping = sis->swap_file->f_mapping;
 
-	if (sis->flags & SWP_FILE) {
-		struct address_space *mapping = sis->swap_file->f_mapping;
+	if (mapping->a_ops->set_page_dirty)
 		return mapping->a_ops->set_page_dirty(page);
-	} else {
+	else
 		return __set_page_dirty_nobuffers(page);
-	}
 }
diff --git a/mm/swapfile.c b/mm/swapfile.c
index f181884..c49cb33 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1335,6 +1335,9 @@ sector_t map_swap_page(struct page *page, struct block_device **bdev)
  */
 static void destroy_swap_extents(struct swap_info_struct *sis)
 {
+	struct file *swap_file = sis->swap_file;
+	struct address_space *mapping = swap_file->f_mapping;
+
 	while (!list_empty(&sis->first_swap_extent.list)) {
 		struct swap_extent *se;
 
@@ -1344,13 +1347,8 @@ static void destroy_swap_extents(struct swap_info_struct *sis)
 		kfree(se);
 	}
 
-	if (sis->flags & SWP_FILE) {
-		struct file *swap_file = sis->swap_file;
-		struct address_space *mapping = swap_file->f_mapping;
-
-		sis->flags &= ~SWP_FILE;
+	if (mapping->a_ops->swap_deactivate)
 		mapping->a_ops->swap_deactivate(swap_file);
-	}
 }
 
 /*
@@ -1359,7 +1357,7 @@ static void destroy_swap_extents(struct swap_info_struct *sis)
  *
  * This function rather assumes that it is called in ascending page order.
  */
-static int
+int
 add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
 		unsigned long nr_pages, sector_t start_block)
 {
@@ -1435,106 +1433,24 @@ static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span)
 	struct file *swap_file = sis->swap_file;
 	struct address_space *mapping = swap_file->f_mapping;
 	struct inode *inode = mapping->host;
-	unsigned blocks_per_page;
-	unsigned long page_no;
-	unsigned blkbits;
-	sector_t probe_block;
-	sector_t last_block;
-	sector_t lowest_block = -1;
-	sector_t highest_block = 0;
-	int nr_extents = 0;
 	int ret;
 
 	if (S_ISBLK(inode->i_mode)) {
 		ret = add_swap_extent(sis, 0, sis->max, 0);
 		*span = sis->pages;
-		goto out;
+		return ret;
 	}
 
 	if (mapping->a_ops->swap_activate) {
-		ret = mapping->a_ops->swap_activate(swap_file);
+		ret = mapping->a_ops->swap_activate(sis, swap_file, span);
 		if (!ret) {
-			sis->flags |= SWP_FILE;
 			ret = add_swap_extent(sis, 0, sis->max, 0);
 			*span = sis->pages;
 		}
-		goto out;
+		return ret;
 	}
 
-	blkbits = inode->i_blkbits;
-	blocks_per_page = PAGE_SIZE >> blkbits;
-
-	/*
-	 * Map all the blocks into the extent list.  This code doesn't try
-	 * to be very smart.
-	 */
-	probe_block = 0;
-	page_no = 0;
-	last_block = i_size_read(inode) >> blkbits;
-	while ((probe_block + blocks_per_page) <= last_block &&
-			page_no < sis->max) {
-		unsigned block_in_page;
-		sector_t first_block;
-
-		first_block = bmap(inode, probe_block);
-		if (first_block == 0)
-			goto bad_bmap;
-
-		/*
-		 * It must be PAGE_SIZE aligned on-disk
-		 */
-		if (first_block & (blocks_per_page - 1)) {
-			probe_block++;
-			goto reprobe;
-		}
-
-		for (block_in_page = 1; block_in_page < blocks_per_page;
-					block_in_page++) {
-			sector_t block;
-
-			block = bmap(inode, probe_block + block_in_page);
-			if (block == 0)
-				goto bad_bmap;
-			if (block != first_block + block_in_page) {
-				/* Discontiguity */
-				probe_block++;
-				goto reprobe;
-			}
-		}
-
-		first_block >>= (PAGE_SHIFT - blkbits);
-		if (page_no) {	/* exclude the header page */
-			if (first_block < lowest_block)
-				lowest_block = first_block;
-			if (first_block > highest_block)
-				highest_block = first_block;
-		}
-
-		/*
-		 * We found a PAGE_SIZE-length, PAGE_SIZE-aligned run of blocks
-		 */
-		ret = add_swap_extent(sis, page_no, 1, first_block);
-		if (ret < 0)
-			goto out;
-		nr_extents += ret;
-		page_no++;
-		probe_block += blocks_per_page;
-reprobe:
-		continue;
-	}
-	ret = nr_extents;
-	*span = 1 + highest_block - lowest_block;
-	if (page_no == 0)
-		page_no = 1;	/* force Empty message */
-	sis->max = page_no;
-	sis->pages = page_no - 1;
-	sis->highest_bit = page_no - 1;
-out:
-	return ret;
-bad_bmap:
-	printk(KERN_ERR "swapon: swapfile has holes\n");
-	ret = -EINVAL;
-	goto out;
+	return generic_swapfile_activate(sis, swap_file, span);
 }
 
 static void enable_swap_info(struct swap_info_struct *p, int prio,
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

 {
 	struct swap_info_struct *sis = page_swap_info(page);
+	struct address_space *mapping = sis->swap_file->f_mapping;
 
-	if (sis->flags & SWP_FILE) {
-		struct address_space *mapping = sis->swap_file->f_mapping;
+	if (mapping->a_ops->set_page_dirty)
 		return mapping->a_ops->set_page_dirty(page);
-	} else {
+	else
 		return __set_page_dirty_nobuffers(page);
-	}
 }
diff --git a/mm/swapfile.c b/mm/swapfile.c
index f181884..c49cb33 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1335,6 +1335,9 @@ sector_t map_swap_page(struct page *page, struct block_device **bdev)
  */
 static void destroy_swap_extents(struct swap_info_struct *sis)
 {
+	struct file *swap_file = sis->swap_file;
+	struct address_space *mapping = swap_file->f_mapping;
+
 	while (!list_empty(&sis->first_swap_extent.list)) {
 		struct swap_extent *se;
 
@@ -1344,13 +1347,8 @@ static void destroy_swap_extents(struct swap_info_struct *sis)
 		kfree(se);
 	}
 
-	if (sis->flags & SWP_FILE) {
-		struct file *swap_file = sis->swap_file;
-		struct address_space *mapping = swap_file->f_mapping;
-
-		sis->flags &= ~SWP_FILE;
+	if (mapping->a_ops->swap_deactivate)
 		mapping->a_ops->swap_deactivate(swap_file);
-	}
 }
 
 /*
@@ -1359,7 +1357,7 @@ static void destroy_swap_extents(struct swap_info_struct *sis)
  *
  * This function rather assumes that it is called in ascending page order.
  */
-static int
+int
 add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
 		unsigned long nr_pages, sector_t start_block)
 {
@@ -1435,106 +1433,24 @@ static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span)
 	struct file *swap_file = sis->swap_file;
 	struct address_space *mapping = swap_file->f_mapping;
 	struct inode *inode = mapping->host;
-	unsigned blocks_per_page;
-	unsigned long page_no;
-	unsigned blkbits;
-	sector_t probe_block;
-	sector_t last_block;
-	sector_t lowest_block = -1;
-	sector_t highest_block = 0;
-	int nr_extents = 0;
 	int ret;
 
 	if (S_ISBLK(inode->i_mode)) {
 		ret = add_swap_extent(sis, 0, sis->max, 0);
 		*span = sis->pages;
-		goto out;
+		return ret;
 	}
 
 	if (mapping->a_ops->swap_activate) {
-		ret = mapping->a_ops->swap_activate(swap_file);
+		ret = mapping->a_ops->swap_activate(sis, swap_file, span);
 		if (!ret) {
-			sis->flags |= SWP_FILE;
 			ret = add_swap_extent(sis, 0, sis->max, 0);
 			*span = sis->pages;
 		}
-		goto out;
+		return ret;
 	}
 
-	blkbits = inode->i_blkbits;
-	blocks_per_page = PAGE_SIZE >> blkbits;
-
-	/*
-	 * Map all the blocks into the extent list.  This code doesn't try
-	 * to be very smart.
-	 */
-	probe_block = 0;
-	page_no = 0;
-	last_block = i_size_read(inode) >> blkbits;
-	while ((probe_block + blocks_per_page) <= last_block &&
-			page_no < sis->max) {
-		unsigned block_in_page;
-		sector_t first_block;
-
-		first_block = bmap(inode, probe_block);
-		if (first_block == 0)
-			goto bad_bmap;
-
-		/*
-		 * It must be PAGE_SIZE aligned on-disk
-		 */
-		if (first_block & (blocks_per_page - 1)) {
-			probe_block++;
-			goto reprobe;
-		}
-
-		for (block_in_page = 1; block_in_page < blocks_per_page;
-					block_in_page++) {
-			sector_t block;
-
-			block = bmap(inode, probe_block + block_in_page);
-			if (block == 0)
-				goto bad_bmap;
-			if (block != first_block + block_in_page) {
-				/* Discontiguity */
-				probe_block++;
-				goto reprobe;
-			}
-		}
-
-		first_block >>= (PAGE_SHIFT - blkbits);
-		if (page_no) {	/* exclude the header page */
-			if (first_block < lowest_block)
-				lowest_block = first_block;
-			if (first_block > highest_block)
-				highest_block = first_block;
-		}
-
-		/*
-		 * We found a PAGE_SIZE-length, PAGE_SIZE-aligned run of blocks
-		 */
-		ret = add_swap_extent(sis, page_no, 1, first_block);
-		if (ret < 0)
-			goto out;
-		nr_extents += ret;
-		page_no++;
-		probe_block += blocks_per_page;
-reprobe:
-		continue;
-	}
-	ret = nr_extents;
-	*span = 1 + highest_block - lowest_block;
-	if (page_no == 0)
-		page_no = 1;	/* force Empty message */
-	sis->max = page_no;
-	sis->pages = page_no - 1;
-	sis->highest_bit = page_no - 1;
-out:
-	return ret;
-bad_bmap:
-	printk(KERN_ERR "swapon: swapfile has holes\n");
-	ret = -EINVAL;
-	goto out;
+	return generic_swapfile_activate(sis, swap_file, span);
 }
 
 static void enable_swap_info(struct swap_info_struct *p, int prio,
-- 
1.7.3.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: dont@kvack.org

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 05/10] mm: Methods for teaching filesystems about PG_swapcache pages
  2011-09-09 11:00 ` Mel Gorman
@ 2011-09-09 11:00   ` Mel Gorman
  -1 siblings, 0 replies; 45+ messages in thread
From: Mel Gorman @ 2011-09-09 11:00 UTC (permalink / raw)
  To: Linux-MM
  Cc: Linux-Netdev, Linux-NFS, LKML, Andrew Morton, David Miller,
	Trond Myklebust, Neil Brown, Peter Zijlstra, Mel Gorman

In order to teach filesystems to handle swap cache pages, three new
page functions are introduced:

  pgoff_t page_file_index(struct page *);
  loff_t page_file_offset(struct page *);
  struct address_space *page_file_mapping(struct page *);

page_file_index() - gives the offset of this page in the file, in
PAGE_CACHE_SIZE units. Like page->index does for pagecache pages,
this function also gives the correct index for PG_swapcache pages.

page_file_offset() - uses page_file_index(), so that it gives the
expected result even for PG_swapcache pages.

page_file_mapping() - gives the mapping backing the actual page;
that is, for swap cache pages it gives swap_file->f_mapping.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/mm.h      |   25 +++++++++++++++++++++++++
 include/linux/pagemap.h |    5 +++++
 mm/swapfile.c           |   19 +++++++++++++++++++
 3 files changed, 49 insertions(+), 0 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7438071..45442a8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -787,6 +787,17 @@ static inline void *page_rmapping(struct page *page)
 	return (void *)((unsigned long)page->mapping & ~PAGE_MAPPING_FLAGS);
 }
 
+extern struct address_space *__page_file_mapping(struct page *);
+
+static inline
+struct address_space *page_file_mapping(struct page *page)
+{
+	if (unlikely(PageSwapCache(page)))
+		return __page_file_mapping(page);
+
+	return page->mapping;
+}
+
 static inline int PageAnon(struct page *page)
 {
 	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
@@ -803,6 +814,20 @@ static inline pgoff_t page_index(struct page *page)
 	return page->index;
 }
 
+extern pgoff_t __page_file_index(struct page *page);
+
+/*
+ * Return the file index of the page. Regular pagecache pages use ->index
+ * whereas swapcache pages use swp_offset(->private)
+ */
+static inline pgoff_t page_file_index(struct page *page)
+{
+	if (unlikely(PageSwapCache(page)))
+		return __page_file_index(page);
+
+	return page->index;
+}
+
 /*
  * The atomic page->_mapcount, like _count, starts from -1:
  * so that transitions both from it and to it can be tracked,
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index cfaaa69..d4d4bda 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -286,6 +286,11 @@ static inline loff_t page_offset(struct page *page)
 	return ((loff_t)page->index) << PAGE_CACHE_SHIFT;
 }
 
+static inline loff_t page_file_offset(struct page *page)
+{
+	return ((loff_t)page_file_index(page)) << PAGE_CACHE_SHIFT;
+}
+
 extern pgoff_t linear_hugepage_index(struct vm_area_struct *vma,
 				     unsigned long address);
 
diff --git a/mm/swapfile.c b/mm/swapfile.c
index c49cb33..806b994 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2232,6 +2232,25 @@ struct swap_info_struct *page_swap_info(struct page *page)
 }
 
 /*
+ * out-of-line __page_file_ methods to avoid include hell.
+ */
+
+struct address_space *__page_file_mapping(struct page *page)
+{
+	VM_BUG_ON(!PageSwapCache(page));
+	return page_swap_info(page)->swap_file->f_mapping;
+}
+EXPORT_SYMBOL_GPL(__page_file_mapping);
+
+pgoff_t __page_file_index(struct page *page)
+{
+	swp_entry_t swap = { .val = page_private(page) };
+	VM_BUG_ON(!PageSwapCache(page));
+	return swp_offset(swap);
+}
+EXPORT_SYMBOL_GPL(__page_file_index);
+
+/*
  * swap_lock prevents swap_map being freed. Don't grab an extra
  * reference on the swaphandle, it doesn't matter if it becomes unused.
  */
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread


* [PATCH 06/10] nfs: teach the NFS client how to treat PG_swapcache pages
  2011-09-09 11:00 ` Mel Gorman
@ 2011-09-09 11:00   ` Mel Gorman
  -1 siblings, 0 replies; 45+ messages in thread
From: Mel Gorman @ 2011-09-09 11:00 UTC (permalink / raw)
  To: Linux-MM
  Cc: Linux-Netdev, Linux-NFS, LKML, Andrew Morton, David Miller,
	Trond Myklebust, Neil Brown, Peter Zijlstra, Mel Gorman

Replace all relevant occurrences of page->index and page->mapping in
the NFS client with the new page_file_index() and page_file_mapping()
functions.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 fs/nfs/file.c     |    6 +++---
 fs/nfs/internal.h |    7 ++++---
 fs/nfs/pagelist.c |    6 +++---
 fs/nfs/read.c     |    6 +++---
 fs/nfs/write.c    |   46 +++++++++++++++++++++++++---------------------
 5 files changed, 38 insertions(+), 33 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 28b8c3f..38c7cf4 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -484,7 +484,7 @@ static void nfs_invalidate_page(struct page *page, unsigned long offset)
 	if (offset != 0)
 		return;
 	/* Cancel any unstarted writes on this page */
-	nfs_wb_page_cancel(page->mapping->host, page);
+	nfs_wb_page_cancel(page_file_mapping(page)->host, page);
 
 	nfs_fscache_invalidate_page(page, page->mapping->host);
 }
@@ -526,7 +526,7 @@ static int nfs_release_page(struct page *page, gfp_t gfp)
  */
 static int nfs_launder_page(struct page *page)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	struct nfs_inode *nfsi = NFS_I(inode);
 
 	dfprintk(PAGECACHE, "NFS: launder_page(%ld, %llu)\n",
@@ -575,7 +575,7 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	nfs_fscache_wait_on_page_write(NFS_I(dentry->d_inode), page);
 
 	lock_page(page);
-	mapping = page->mapping;
+	mapping = page_file_mapping(page);
 	if (mapping != dentry->d_inode->i_mapping)
 		goto out_unlock;
 
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index ab12913..1085e02 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -425,13 +425,14 @@ void nfs_super_set_maxbytes(struct super_block *sb, __u64 maxfilesize)
 static inline
 unsigned int nfs_page_length(struct page *page)
 {
-	loff_t i_size = i_size_read(page->mapping->host);
+	loff_t i_size = i_size_read(page_file_mapping(page)->host);
 
 	if (i_size > 0) {
+		pgoff_t page_index = page_file_index(page);
 		pgoff_t end_index = (i_size - 1) >> PAGE_CACHE_SHIFT;
-		if (page->index < end_index)
+		if (page_index < end_index)
 			return PAGE_CACHE_SIZE;
-		if (page->index == end_index)
+		if (page_index == end_index)
 			return ((i_size - 1) & ~PAGE_CACHE_MASK) + 1;
 	}
 	return 0;
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index b60970c..1fcc294 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -75,11 +75,11 @@ nfs_create_request(struct nfs_open_context *ctx, struct inode *inode,
 	 * update_nfs_request below if the region is not locked. */
 	req->wb_page    = page;
 	atomic_set(&req->wb_complete, 0);
-	req->wb_index	= page->index;
+	req->wb_index	= page_file_index(page);
 	page_cache_get(page);
 	BUG_ON(PagePrivate(page));
 	BUG_ON(!PageLocked(page));
-	BUG_ON(page->mapping->host != inode);
+	BUG_ON(page_file_mapping(page)->host != inode);
 	req->wb_offset  = offset;
 	req->wb_pgbase	= offset;
 	req->wb_bytes   = count;
@@ -429,7 +429,7 @@ void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
  * nfs_scan_list - Scan a list for matching requests
  * @nfsi: NFS inode
  * @dst: Destination list
- * @idx_start: lower bound of page->index to scan
+ * @idx_start: lower bound of page_file_index(page) to scan
  * @npages: idx_start + npages sets the upper bound to scan.
  * @tag: tag to scan for
  *
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 2171c04..3352782 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -568,11 +568,11 @@ static const struct rpc_call_ops nfs_read_full_ops = {
 int nfs_readpage(struct file *file, struct page *page)
 {
 	struct nfs_open_context *ctx;
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	int		error;
 
 	dprintk("NFS: nfs_readpage (%p %ld@%lu)\n",
-		page, PAGE_CACHE_SIZE, page->index);
+		page, PAGE_CACHE_SIZE, page_file_index(page));
 	nfs_inc_stats(inode, NFSIOS_VFSREADPAGE);
 	nfs_add_stats(inode, NFSIOS_READPAGES, 1);
 
@@ -626,7 +626,7 @@ static int
 readpage_async_filler(void *data, struct page *page)
 {
 	struct nfs_readdesc *desc = (struct nfs_readdesc *)data;
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	struct nfs_page *new;
 	unsigned int len;
 	int error;
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index b39b37f..ffd95d1 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -125,7 +125,7 @@ static struct nfs_page *nfs_page_find_request_locked(struct page *page)
 
 static struct nfs_page *nfs_page_find_request(struct page *page)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	struct nfs_page *req = NULL;
 
 	spin_lock(&inode->i_lock);
@@ -137,16 +137,16 @@ static struct nfs_page *nfs_page_find_request(struct page *page)
 /* Adjust the file length if we're writing beyond the end */
 static void nfs_grow_file(struct page *page, unsigned int offset, unsigned int count)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	loff_t end, i_size;
 	pgoff_t end_index;
 
 	spin_lock(&inode->i_lock);
 	i_size = i_size_read(inode);
 	end_index = (i_size - 1) >> PAGE_CACHE_SHIFT;
-	if (i_size > 0 && page->index < end_index)
+	if (i_size > 0 && page_file_index(page) < end_index)
 		goto out;
-	end = ((loff_t)page->index << PAGE_CACHE_SHIFT) + ((loff_t)offset+count);
+	end = page_file_offset(page) + ((loff_t)offset+count);
 	if (i_size >= end)
 		goto out;
 	i_size_write(inode, end);
@@ -159,7 +159,7 @@ out:
 static void nfs_set_pageerror(struct page *page)
 {
 	SetPageError(page);
-	nfs_zap_mapping(page->mapping->host, page->mapping);
+	nfs_zap_mapping(page_file_mapping(page)->host, page_file_mapping(page));
 }
 
 /* We can set the PG_uptodate flag if we see that a write request
@@ -200,7 +200,7 @@ static int nfs_set_page_writeback(struct page *page)
 	int ret = test_set_page_writeback(page);
 
 	if (!ret) {
-		struct inode *inode = page->mapping->host;
+		struct inode *inode = page_file_mapping(page)->host;
 		struct nfs_server *nfss = NFS_SERVER(inode);
 
 		page_cache_get(page);
@@ -215,7 +215,7 @@ static int nfs_set_page_writeback(struct page *page)
 
 static void nfs_end_page_writeback(struct page *page)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	struct nfs_server *nfss = NFS_SERVER(inode);
 
 	end_page_writeback(page);
@@ -226,7 +226,7 @@ static void nfs_end_page_writeback(struct page *page)
 
 static struct nfs_page *nfs_find_and_lock_request(struct page *page, bool nonblock)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	struct nfs_page *req;
 	int ret;
 
@@ -287,13 +287,13 @@ out:
 
 static int nfs_do_writepage(struct page *page, struct writeback_control *wbc, struct nfs_pageio_descriptor *pgio)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	int ret;
 
 	nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGE);
 	nfs_add_stats(inode, NFSIOS_WRITEPAGES, 1);
 
-	nfs_pageio_cond_complete(pgio, page->index);
+	nfs_pageio_cond_complete(pgio, page_file_index(page));
 	ret = nfs_page_async_flush(pgio, page, wbc->sync_mode == WB_SYNC_NONE);
 	if (ret == -EAGAIN) {
 		redirty_page_for_writepage(wbc, page);
@@ -310,7 +310,8 @@ static int nfs_writepage_locked(struct page *page, struct writeback_control *wbc
 	struct nfs_pageio_descriptor pgio;
 	int err;
 
-	nfs_pageio_init_write(&pgio, page->mapping->host, wb_priority(wbc));
+	nfs_pageio_init_write(&pgio, page_file_mapping(page)->host,
+			wb_priority(wbc));
 	err = nfs_do_writepage(page, wbc, &pgio);
 	nfs_pageio_complete(&pgio);
 	if (err < 0)
@@ -428,7 +429,8 @@ static void
 nfs_mark_request_dirty(struct nfs_page *req)
 {
 	__set_page_dirty_nobuffers(req->wb_page);
-	__mark_inode_dirty(req->wb_page->mapping->host, I_DIRTY_DATASYNC);
+	__mark_inode_dirty(page_file_mapping(req->wb_page)->host,
+							I_DIRTY_DATASYNC);
 }
 
 #if defined(CONFIG_NFS_V3) || defined(CONFIG_NFS_V4)
@@ -450,7 +452,8 @@ nfs_mark_request_commit(struct nfs_page *req, struct pnfs_layout_segment *lseg)
 	spin_unlock(&inode->i_lock);
 	pnfs_mark_request_commit(req, lseg);
 	inc_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
-	inc_bdi_stat(req->wb_page->mapping->backing_dev_info, BDI_RECLAIMABLE);
+	inc_bdi_stat(page_file_mapping(req->wb_page)->backing_dev_info,
+			BDI_RECLAIMABLE);
 	__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
 }
 
@@ -461,7 +464,8 @@ nfs_clear_request_commit(struct nfs_page *req)
 
 	if (test_and_clear_bit(PG_CLEAN, &(req)->wb_flags)) {
 		dec_zone_page_state(page, NR_UNSTABLE_NFS);
-		dec_bdi_stat(page->mapping->backing_dev_info, BDI_RECLAIMABLE);
+		dec_bdi_stat(page_file_mapping(page)->backing_dev_info,
+				BDI_RECLAIMABLE);
 		return 1;
 	}
 	return 0;
@@ -527,7 +531,7 @@ nfs_need_commit(struct nfs_inode *nfsi)
  * nfs_scan_commit - Scan an inode for commit requests
  * @inode: NFS inode to scan
  * @dst: destination list
- * @idx_start: lower bound of page->index to scan.
+ * @idx_start: lower bound of page_file_index(page) to scan.
  * @npages: idx_start + npages sets the upper bound to scan.
  *
  * Moves requests from the inode's 'commit' request list.
@@ -653,7 +657,7 @@ out_err:
 static struct nfs_page * nfs_setup_write_request(struct nfs_open_context* ctx,
 		struct page *page, unsigned int offset, unsigned int bytes)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	struct nfs_page	*req;
 	int error;
 
@@ -711,7 +715,7 @@ int nfs_flush_incompatible(struct file *file, struct page *page)
 		nfs_release_request(req);
 		if (!do_flush)
 			return 0;
-		status = nfs_wb_page(page->mapping->host, page);
+		status = nfs_wb_page(page_file_mapping(page)->host, page);
 	} while (status == 0);
 	return status;
 }
@@ -737,7 +741,7 @@ int nfs_updatepage(struct file *file, struct page *page,
 		unsigned int offset, unsigned int count)
 {
 	struct nfs_open_context *ctx = nfs_file_open_context(file);
-	struct inode	*inode = page->mapping->host;
+	struct inode	*inode = page_file_mapping(page)->host;
 	int		status = 0;
 
 	nfs_inc_stats(inode, NFSIOS_VFSUPDATEPAGE);
@@ -745,7 +749,7 @@ int nfs_updatepage(struct file *file, struct page *page,
 	dprintk("NFS:       nfs_updatepage(%s/%s %d@%lld)\n",
 		file->f_path.dentry->d_parent->d_name.name,
 		file->f_path.dentry->d_name.name, count,
-		(long long)(page_offset(page) + offset));
+		(long long)(page_file_offset(page) + offset));
 
 	/* If we're not using byte range locks, and we know the page
 	 * is up to date, it may be more efficient to extend the write
@@ -1104,7 +1108,7 @@ static void nfs_writeback_release_partial(void *calldata)
 	}
 
 	if (nfs_write_need_commit(data)) {
-		struct inode *inode = page->mapping->host;
+		struct inode *inode = page_file_mapping(page)->host;
 
 		spin_lock(&inode->i_lock);
 		if (test_bit(PG_NEED_RESCHED, &req->wb_flags)) {
@@ -1409,7 +1413,7 @@ void nfs_retry_commit(struct list_head *page_list,
 		nfs_list_remove_request(req);
 		nfs_mark_request_commit(req, lseg);
 		dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
-		dec_bdi_stat(req->wb_page->mapping->backing_dev_info,
+		dec_bdi_stat(page_file_mapping(req->wb_page)->backing_dev_info,
 			     BDI_RECLAIMABLE);
 		nfs_clear_page_tag_locked(req);
 	}
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	struct nfs_page *req;
 	int ret;
 
@@ -287,13 +287,13 @@ out:
 
 static int nfs_do_writepage(struct page *page, struct writeback_control *wbc, struct nfs_pageio_descriptor *pgio)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	int ret;
 
 	nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGE);
 	nfs_add_stats(inode, NFSIOS_WRITEPAGES, 1);
 
-	nfs_pageio_cond_complete(pgio, page->index);
+	nfs_pageio_cond_complete(pgio, page_file_index(page));
 	ret = nfs_page_async_flush(pgio, page, wbc->sync_mode == WB_SYNC_NONE);
 	if (ret == -EAGAIN) {
 		redirty_page_for_writepage(wbc, page);
@@ -310,7 +310,8 @@ static int nfs_writepage_locked(struct page *page, struct writeback_control *wbc
 	struct nfs_pageio_descriptor pgio;
 	int err;
 
-	nfs_pageio_init_write(&pgio, page->mapping->host, wb_priority(wbc));
+	nfs_pageio_init_write(&pgio, page_file_mapping(page)->host,
+			wb_priority(wbc));
 	err = nfs_do_writepage(page, wbc, &pgio);
 	nfs_pageio_complete(&pgio);
 	if (err < 0)
@@ -428,7 +429,8 @@ static void
 nfs_mark_request_dirty(struct nfs_page *req)
 {
 	__set_page_dirty_nobuffers(req->wb_page);
-	__mark_inode_dirty(req->wb_page->mapping->host, I_DIRTY_DATASYNC);
+	__mark_inode_dirty(page_file_mapping(req->wb_page)->host,
+							I_DIRTY_DATASYNC);
 }
 
 #if defined(CONFIG_NFS_V3) || defined(CONFIG_NFS_V4)
@@ -450,7 +452,8 @@ nfs_mark_request_commit(struct nfs_page *req, struct pnfs_layout_segment *lseg)
 	spin_unlock(&inode->i_lock);
 	pnfs_mark_request_commit(req, lseg);
 	inc_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
-	inc_bdi_stat(req->wb_page->mapping->backing_dev_info, BDI_RECLAIMABLE);
+	inc_bdi_stat(page_file_mapping(req->wb_page)->backing_dev_info,
+			BDI_RECLAIMABLE);
 	__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
 }
 
@@ -461,7 +464,8 @@ nfs_clear_request_commit(struct nfs_page *req)
 
 	if (test_and_clear_bit(PG_CLEAN, &(req)->wb_flags)) {
 		dec_zone_page_state(page, NR_UNSTABLE_NFS);
-		dec_bdi_stat(page->mapping->backing_dev_info, BDI_RECLAIMABLE);
+		dec_bdi_stat(page_file_mapping(page)->backing_dev_info,
+				BDI_RECLAIMABLE);
 		return 1;
 	}
 	return 0;
@@ -527,7 +531,7 @@ nfs_need_commit(struct nfs_inode *nfsi)
  * nfs_scan_commit - Scan an inode for commit requests
  * @inode: NFS inode to scan
  * @dst: destination list
- * @idx_start: lower bound of page->index to scan.
+ * @idx_start: lower bound of page_file_index(page) to scan.
  * @npages: idx_start + npages sets the upper bound to scan.
  *
  * Moves requests from the inode's 'commit' request list.
@@ -653,7 +657,7 @@ out_err:
 static struct nfs_page * nfs_setup_write_request(struct nfs_open_context* ctx,
 		struct page *page, unsigned int offset, unsigned int bytes)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = page_file_mapping(page)->host;
 	struct nfs_page	*req;
 	int error;
 
@@ -711,7 +715,7 @@ int nfs_flush_incompatible(struct file *file, struct page *page)
 		nfs_release_request(req);
 		if (!do_flush)
 			return 0;
-		status = nfs_wb_page(page->mapping->host, page);
+		status = nfs_wb_page(page_file_mapping(page)->host, page);
 	} while (status == 0);
 	return status;
 }
@@ -737,7 +741,7 @@ int nfs_updatepage(struct file *file, struct page *page,
 		unsigned int offset, unsigned int count)
 {
 	struct nfs_open_context *ctx = nfs_file_open_context(file);
-	struct inode	*inode = page->mapping->host;
+	struct inode	*inode = page_file_mapping(page)->host;
 	int		status = 0;
 
 	nfs_inc_stats(inode, NFSIOS_VFSUPDATEPAGE);
@@ -745,7 +749,7 @@ int nfs_updatepage(struct file *file, struct page *page,
 	dprintk("NFS:       nfs_updatepage(%s/%s %d@%lld)\n",
 		file->f_path.dentry->d_parent->d_name.name,
 		file->f_path.dentry->d_name.name, count,
-		(long long)(page_offset(page) + offset));
+		(long long)(page_file_offset(page) + offset));
 
 	/* If we're not using byte range locks, and we know the page
 	 * is up to date, it may be more efficient to extend the write
@@ -1104,7 +1108,7 @@ static void nfs_writeback_release_partial(void *calldata)
 	}
 
 	if (nfs_write_need_commit(data)) {
-		struct inode *inode = page->mapping->host;
+		struct inode *inode = page_file_mapping(page)->host;
 
 		spin_lock(&inode->i_lock);
 		if (test_bit(PG_NEED_RESCHED, &req->wb_flags)) {
@@ -1409,7 +1413,7 @@ void nfs_retry_commit(struct list_head *page_list,
 		nfs_list_remove_request(req);
 		nfs_mark_request_commit(req, lseg);
 		dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
-		dec_bdi_stat(req->wb_page->mapping->backing_dev_info,
+		dec_bdi_stat(page_file_mapping(req->wb_page)->backing_dev_info,
 			     BDI_RECLAIMABLE);
 		nfs_clear_page_tag_locked(req);
 	}
-- 
1.7.3.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 07/10] nfs: disable data cache revalidation for swapfiles
  2011-09-09 11:00 ` Mel Gorman
@ 2011-09-09 11:00   ` Mel Gorman
  -1 siblings, 0 replies; 45+ messages in thread
From: Mel Gorman @ 2011-09-09 11:00 UTC (permalink / raw)
  To: Linux-MM
  Cc: Linux-Netdev, Linux-NFS, LKML, Andrew Morton, David Miller,
	Trond Myklebust, Neil Brown, Peter Zijlstra, Mel Gorman

The VM does not like PG_private set on PG_swapcache pages. As suggested
by Trond in http://lkml.org/lkml/2006/8/25/348, this patch disables
NFS data cache revalidation on swap files, as it does not make
sense to have other clients change the file while it is being used as
swap. This avoids setting PG_private on swap pages, since there ought
to be no further races with invalidate_inode_pages2() to deal with.

Since we cannot set PG_private we cannot use page->private to store
the nfs_page, as that field is already in use on PG_swapcache pages.
Thus the nfs_page_find_request logic is augmented to look the request
up in the inode's radix tree instead.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 fs/nfs/inode.c |    6 ++++
 fs/nfs/write.c |   77 +++++++++++++++++++++++++++++++++++++++++++++-----------
 2 files changed, 68 insertions(+), 15 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index fe12037..fa25e2c 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -877,6 +877,12 @@ int nfs_revalidate_mapping(struct inode *inode, struct address_space *mapping)
 	struct nfs_inode *nfsi = NFS_I(inode);
 	int ret = 0;
 
+	/*
+	 * swapfiles are not supposed to be shared.
+	 */
+	if (IS_SWAPFILE(inode))
+		goto out;
+
 	if ((nfsi->cache_validity & NFS_INO_REVAL_PAGECACHE)
 			|| nfs_attribute_cache_expired(inode)
 			|| NFS_STALE(inode)) {
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ffd95d1..15e3b7a 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -111,25 +111,64 @@ static void nfs_context_set_write_error(struct nfs_open_context *ctx, int error)
 	set_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags);
 }
 
-static struct nfs_page *nfs_page_find_request_locked(struct page *page)
+static struct nfs_page *
+__nfs_page_find_request_locked(struct nfs_inode *nfsi, struct page *page,
+		int get)
 {
 	struct nfs_page *req = NULL;
 
-	if (PagePrivate(page)) {
+	if (PagePrivate(page))
 		req = (struct nfs_page *)page_private(page);
-		if (req != NULL)
-			kref_get(&req->wb_kref);
-	}
+	else if (unlikely(PageSwapCache(page)))
+		req = radix_tree_lookup(&nfsi->nfs_page_tree,
+				page_file_index(page));
+
+	if (get && req)
+		kref_get(&req->wb_kref);
+
 	return req;
 }
 
+static inline struct nfs_page *
+nfs_page_find_request_locked(struct nfs_inode *nfsi, struct page *page)
+{
+	return __nfs_page_find_request_locked(nfsi, page, 1);
+}
+
+static int __nfs_page_has_request(struct page *page)
+{
+	struct inode *inode = page_file_mapping(page)->host;
+	struct nfs_page *req = NULL;
+
+	spin_lock(&inode->i_lock);
+	req = __nfs_page_find_request_locked(NFS_I(inode), page, 0);
+	spin_unlock(&inode->i_lock);
+
+	/*
+	 * hole here plugged by the caller holding onto PG_locked
+	 */
+
+	return req != NULL;
+}
+
+static inline int nfs_page_has_request(struct page *page)
+{
+	if (PagePrivate(page))
+		return 1;
+
+	if (unlikely(PageSwapCache(page)))
+		return __nfs_page_has_request(page);
+
+	return 0;
+}
+
 static struct nfs_page *nfs_page_find_request(struct page *page)
 {
 	struct inode *inode = page_file_mapping(page)->host;
 	struct nfs_page *req = NULL;
 
 	spin_lock(&inode->i_lock);
-	req = nfs_page_find_request_locked(page);
+	req = nfs_page_find_request_locked(NFS_I(inode), page);
 	spin_unlock(&inode->i_lock);
 	return req;
 }
@@ -232,7 +271,7 @@ static struct nfs_page *nfs_find_and_lock_request(struct page *page, bool nonblo
 
 	spin_lock(&inode->i_lock);
 	for (;;) {
-		req = nfs_page_find_request_locked(page);
+		req = nfs_page_find_request_locked(NFS_I(inode), page);
 		if (req == NULL)
 			break;
 		if (nfs_set_page_tag_locked(req))
@@ -392,9 +431,15 @@ static int nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
 	BUG_ON(error);
 	if (!nfsi->npages && nfs_have_delegation(inode, FMODE_WRITE))
 		nfsi->change_attr++;
-	set_bit(PG_MAPPED, &req->wb_flags);
-	SetPagePrivate(req->wb_page);
-	set_page_private(req->wb_page, (unsigned long)req);
+	/*
+	 * Swap-space should not get truncated. Hence no need to plug the race
+	 * with invalidate/truncate.
+	 */
+	if (likely(!PageSwapCache(req->wb_page))) {
+		set_bit(PG_MAPPED, &req->wb_flags);
+		SetPagePrivate(req->wb_page);
+		set_page_private(req->wb_page, (unsigned long)req);
+	}
 	nfsi->npages++;
 	kref_get(&req->wb_kref);
 	radix_tree_tag_set(&nfsi->nfs_page_tree, req->wb_index,
@@ -416,9 +461,11 @@ static void nfs_inode_remove_request(struct nfs_page *req)
 	BUG_ON (!NFS_WBACK_BUSY(req));
 
 	spin_lock(&inode->i_lock);
-	set_page_private(req->wb_page, 0);
-	ClearPagePrivate(req->wb_page);
-	clear_bit(PG_MAPPED, &req->wb_flags);
+	if (likely(!PageSwapCache(req->wb_page))) {
+		set_page_private(req->wb_page, 0);
+		ClearPagePrivate(req->wb_page);
+		clear_bit(PG_MAPPED, &req->wb_flags);
+	}
 	radix_tree_delete(&nfsi->nfs_page_tree, req->wb_index);
 	nfsi->npages--;
 	spin_unlock(&inode->i_lock);
@@ -593,7 +640,7 @@ static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
 	spin_lock(&inode->i_lock);
 
 	for (;;) {
-		req = nfs_page_find_request_locked(page);
+		req = nfs_page_find_request_locked(NFS_I(inode), page);
 		if (req == NULL)
 			goto out_unlock;
 
@@ -1657,7 +1704,7 @@ int nfs_wb_page_cancel(struct inode *inode, struct page *page)
  */
 int nfs_wb_page(struct inode *inode, struct page *page)
 {
-	loff_t range_start = page_offset(page);
+	loff_t range_start = page_file_offset(page);
 	loff_t range_end = range_start + (loff_t)(PAGE_CACHE_SIZE - 1);
 	struct writeback_control wbc = {
 		.sync_mode = WB_SYNC_ALL,
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 08/10] nfs: enable swap on NFS
  2011-09-09 11:00 ` Mel Gorman
@ 2011-09-09 11:00   ` Mel Gorman
  -1 siblings, 0 replies; 45+ messages in thread
From: Mel Gorman @ 2011-09-09 11:00 UTC (permalink / raw)
  To: Linux-MM
  Cc: Linux-Netdev, Linux-NFS, LKML, Andrew Morton, David Miller,
	Trond Myklebust, Neil Brown, Peter Zijlstra, Mel Gorman

Implement all the new swapfile a_ops for NFS. This sets the NFS socket
to SOCK_MEMALLOC, runs socket reconnect under PF_MEMALLOC, and re-sets
SOCK_MEMALLOC on the new socket before engaging the protocol
->connect() method.

PF_MEMALLOC should allow the allocation of struct socket and related
objects and the early (re)setting of SOCK_MEMALLOC should allow us
to receive the packets required for the TCP connection buildup.

[dfeng@redhat.com: Fix handling of multiple swap files]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 fs/nfs/Kconfig              |    8 ++++++
 fs/nfs/file.c               |   20 +++++++++++++++
 fs/nfs/write.c              |   33 ++++++++++++++++++++++++-
 include/linux/nfs_fs.h      |    2 +
 include/linux/sunrpc/xprt.h |    3 ++
 net/sunrpc/Kconfig          |    5 ++++
 net/sunrpc/clnt.c           |    2 +
 net/sunrpc/sched.c          |    7 ++++-
 net/sunrpc/xprtsock.c       |   57 +++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 134 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index dbcd821..7c3b921 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -74,6 +74,14 @@ config NFS_V4
 
 	  If unsure, say Y.
 
+config NFS_SWAP
+	bool "Provide swap over NFS support"
+	default n
+	depends on NFS_FS
+	select SUNRPC_SWAP
+	help
+	  This option enables swapon to work on files located on NFS mounts.
+
 config NFS_V4_1
 	bool "NFS client support for NFSv4.1 (EXPERIMENTAL)"
 	depends on NFS_FS && NFS_V4 && EXPERIMENTAL
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 38c7cf4..2fdb1bd 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -536,6 +536,20 @@ static int nfs_launder_page(struct page *page)
 	return nfs_wb_page(inode, page);
 }
 
+#ifdef CONFIG_NFS_SWAP
+static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file,
+						sector_t *span)
+{
+	*span = sis->pages;
+	return xs_swapper(NFS_CLIENT(file->f_mapping->host)->cl_xprt, 1);
+}
+
+static void nfs_swap_deactivate(struct file *file)
+{
+	xs_swapper(NFS_CLIENT(file->f_mapping->host)->cl_xprt, 0);
+}
+#endif
+
 const struct address_space_operations nfs_file_aops = {
 	.readpage = nfs_readpage,
 	.readpages = nfs_readpages,
@@ -550,6 +564,12 @@ const struct address_space_operations nfs_file_aops = {
 	.migratepage = nfs_migrate_page,
 	.launder_page = nfs_launder_page,
 	.error_remove_page = generic_error_remove_page,
+#ifdef CONFIG_NFS_SWAP
+	.swap_activate = nfs_swap_activate,
+	.swap_deactivate = nfs_swap_deactivate,
+	.swap_writepage = nfs_swap_writepage,
+	.swap_readpage = nfs_readpage,
+#endif
 };
 
 /*
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 15e3b7a..475e1f2 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -369,6 +369,28 @@ int nfs_writepage(struct page *page, struct writeback_control *wbc)
 	return ret;
 }
 
+static int nfs_writepage_setup(struct nfs_open_context *ctx, struct page *page,
+		unsigned int offset, unsigned int count);
+
+int nfs_swap_writepage(struct file *file, struct page *page,
+		 struct writeback_control *wbc)
+{
+	struct nfs_open_context *ctx = nfs_file_open_context(file);
+	int status;
+
+	status = nfs_writepage_setup(ctx, page, 0, nfs_page_length(page));
+	if (status < 0) {
+		nfs_set_pageerror(page);
+		goto out;
+	}
+
+	status = nfs_writepage_locked(page, wbc);
+
+out:
+	unlock_page(page);
+	return status;
+}
+
 static int nfs_writepages_callback(struct page *page, struct writeback_control *wbc, void *data)
 {
 	int ret;
@@ -734,7 +756,16 @@ static int nfs_writepage_setup(struct nfs_open_context *ctx, struct page *page,
 	/* Update file length */
 	nfs_grow_file(page, offset, count);
 	nfs_mark_uptodate(page, req->wb_pgbase, req->wb_bytes);
-	nfs_mark_request_dirty(req);
+
+	/*
+	 * There is no need to mark swapfile requests as dirty like normal
+	 * writepage requests as page dirtying and cleaning is managed
+	 * from the mm. If a PageSwapCache page is marked dirty like this,
+	 * it will still be dirty after kswapd calls writepage and may
+	 * never be released
+	 */
+	if (!PageSwapCache(page))
+		nfs_mark_request_dirty(req);
 	nfs_clear_page_tag_locked(req);
 	return 0;
 }
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index eaac770..c7a1e01 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -513,6 +513,8 @@ extern int  nfs_writepages(struct address_space *, struct writeback_control *);
 extern int  nfs_flush_incompatible(struct file *file, struct page *page);
 extern int  nfs_updatepage(struct file *, struct page *, unsigned int, unsigned int);
 extern void nfs_writeback_done(struct rpc_task *, struct nfs_write_data *);
+extern int  nfs_swap_writepage(struct file *file, struct page *page,
+			 struct writeback_control *wbc);
 
 /*
  * Try to write back everything synchronously (but check the
diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 15518a1..bc2fd1e 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -174,6 +174,8 @@ struct rpc_xprt {
 	unsigned long		state;		/* transport state */
 	unsigned char		shutdown   : 1,	/* being shut down */
 				resvport   : 1; /* use a reserved port */
+	unsigned int		swapper; 	/* we're swapping over this
+						   transport */
 	unsigned int		bind_index;	/* bind function index */
 
 	/*
@@ -311,6 +313,7 @@ void			xprt_release_rqst_cong(struct rpc_task *task);
 void			xprt_disconnect_done(struct rpc_xprt *xprt);
 void			xprt_force_disconnect(struct rpc_xprt *xprt);
 void			xprt_conditional_disconnect(struct rpc_xprt *xprt, unsigned int cookie);
+int			xs_swapper(struct rpc_xprt *xprt, int enable);
 
 /*
  * Reserved bit positions in xprt->state
diff --git a/net/sunrpc/Kconfig b/net/sunrpc/Kconfig
index ffd243d..0e9d340 100644
--- a/net/sunrpc/Kconfig
+++ b/net/sunrpc/Kconfig
@@ -21,6 +21,11 @@ config SUNRPC_XPRT_RDMA
 
 	  If unsure, say N.
 
+config SUNRPC_SWAP
+	bool
+	depends on SUNRPC
+	select NETVM
+
 config RPCSEC_GSS_KRB5
 	tristate "Secure RPC: Kerberos V mechanism"
 	depends on SUNRPC && CRYPTO
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index c5347d2..63547e0 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -594,6 +594,8 @@ void rpc_task_set_client(struct rpc_task *task, struct rpc_clnt *clnt)
 		atomic_inc(&clnt->cl_count);
 		if (clnt->cl_softrtry)
 			task->tk_flags |= RPC_TASK_SOFT;
+		if (task->tk_client->cl_xprt->swapper)
+			task->tk_flags |= RPC_TASK_SWAPPER;
 		/* Add to the client's list of all tasks */
 		spin_lock(&clnt->cl_lock);
 		list_add_tail(&task->tk_task, &clnt->cl_tasks);
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index d12ffa5..e116ab2 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -748,7 +748,10 @@ static void rpc_async_schedule(struct work_struct *work)
 void *rpc_malloc(struct rpc_task *task, size_t size)
 {
 	struct rpc_buffer *buf;
-	gfp_t gfp = RPC_IS_SWAPPER(task) ? GFP_ATOMIC : GFP_NOWAIT;
+	gfp_t gfp = GFP_NOWAIT;
+
+	if (RPC_IS_SWAPPER(task))
+		gfp |= __GFP_MEMALLOC;
 
 	size += sizeof(struct rpc_buffer);
 	if (size <= RPC_BUFFER_MAXSIZE)
@@ -828,7 +831,7 @@ static void rpc_init_task(struct rpc_task *task, const struct rpc_task_setup *ta
 static struct rpc_task *
 rpc_alloc_task(void)
 {
-	return (struct rpc_task *)mempool_alloc(rpc_task_mempool, GFP_NOFS);
+	return (struct rpc_task *)mempool_alloc(rpc_task_mempool, GFP_NOIO);
 }
 
 /*
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index d7f97ef..6448abe 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1931,6 +1931,49 @@ out:
 	xprt_wake_pending_tasks(xprt, status);
 }
 
+#ifdef CONFIG_SUNRPC_SWAP
+static void xs_set_memalloc(struct rpc_xprt *xprt)
+{
+	struct sock_xprt *transport = container_of(xprt, struct sock_xprt,
+			xprt);
+
+	if (xprt->swapper)
+		sk_set_memalloc(transport->inet);
+}
+
+#define RPC_BUF_RESERVE_PAGES \
+	kmalloc_estimate_objs(sizeof(struct rpc_rqst), GFP_KERNEL, RPC_MAX_SLOT_TABLE)
+#define RPC_RESERVE_PAGES	(RPC_BUF_RESERVE_PAGES + TX_RESERVE_PAGES)
+
+/**
+ * xs_swapper - Tag this transport as being used for swap.
+ * @xprt: transport to tag
+ * @enable: enable/disable
+ *
+ */
+int xs_swapper(struct rpc_xprt *xprt, int enable)
+{
+	struct sock_xprt *transport = container_of(xprt, struct sock_xprt,
+			xprt);
+	int err = 0;
+
+	if (enable) {
+		xprt->swapper++;
+		xs_set_memalloc(xprt);
+	} else if (xprt->swapper) {
+		xprt->swapper--;
+		sk_clear_memalloc(transport->inet);
+	}
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(xs_swapper);
+#else
+static void xs_set_memalloc(struct rpc_xprt *xprt)
+{
+}
+#endif
+
 static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
 {
 	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
@@ -1955,6 +1998,8 @@ static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
 		transport->sock = sock;
 		transport->inet = sk;
 
+		xs_set_memalloc(xprt);
+
 		write_unlock_bh(&sk->sk_callback_lock);
 	}
 	xs_udp_do_set_buffer_size(xprt);
@@ -1966,11 +2011,15 @@ static void xs_udp_setup_socket(struct work_struct *work)
 		container_of(work, struct sock_xprt, connect_worker.work);
 	struct rpc_xprt *xprt = &transport->xprt;
 	struct socket *sock = transport->sock;
+	unsigned long pflags = current->flags;
 	int status = -EIO;
 
 	if (xprt->shutdown)
 		goto out;
 
+	if (xprt->swapper)
+		current->flags |= PF_MEMALLOC;
+
 	/* Start by resetting any existing state */
 	xs_reset_transport(transport);
 	sock = xs_create_sock(xprt, transport,
@@ -1989,6 +2038,7 @@ static void xs_udp_setup_socket(struct work_struct *work)
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	tsk_restore_flags(current, pflags, PF_MEMALLOC);
 }
 
 /*
@@ -2079,6 +2129,8 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
 	if (!xprt_bound(xprt))
 		goto out;
 
+	xs_set_memalloc(xprt);
+
 	/* Tell the socket layer to start connecting... */
 	xprt->stat.connect_count++;
 	xprt->stat.connect_start = jiffies;
@@ -2109,11 +2161,15 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 		container_of(work, struct sock_xprt, connect_worker.work);
 	struct socket *sock = transport->sock;
 	struct rpc_xprt *xprt = &transport->xprt;
+	unsigned long pflags = current->flags;
 	int status = -EIO;
 
 	if (xprt->shutdown)
 		goto out;
 
+	if (xprt->swapper)
+		current->flags |= PF_MEMALLOC;
+
 	if (!sock) {
 		clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
 		sock = xs_create_sock(xprt, transport,
@@ -2175,6 +2231,7 @@ out_eagain:
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	tsk_restore_flags(current, pflags, PF_MEMALLOC);
 }
 
 /**
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

+		nfs_set_pageerror(page);
+		goto out;
+	}
+
+	status = nfs_writepage_locked(page, wbc);
+
+out:
+	unlock_page(page);
+	return status;
+}
+
 static int nfs_writepages_callback(struct page *page, struct writeback_control *wbc, void *data)
 {
 	int ret;
@@ -734,7 +756,16 @@ static int nfs_writepage_setup(struct nfs_open_context *ctx, struct page *page,
 	/* Update file length */
 	nfs_grow_file(page, offset, count);
 	nfs_mark_uptodate(page, req->wb_pgbase, req->wb_bytes);
-	nfs_mark_request_dirty(req);
+
+	/*
+	 * There is no need to mark swapfile requests as dirty like normal
+	 * writepage requests as page dirtying and cleaning is managed
+	 * from the mm. If a PageSwapCache page is marked dirty like this,
+	 * it will still be dirty after kswapd calls writepage and may
+	 * never be released
+	 */
+	if (!PageSwapCache(page))
+		nfs_mark_request_dirty(req);
 	nfs_clear_page_tag_locked(req);
 	return 0;
 }
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index eaac770..c7a1e01 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -513,6 +513,8 @@ extern int  nfs_writepages(struct address_space *, struct writeback_control *);
 extern int  nfs_flush_incompatible(struct file *file, struct page *page);
 extern int  nfs_updatepage(struct file *, struct page *, unsigned int, unsigned int);
 extern void nfs_writeback_done(struct rpc_task *, struct nfs_write_data *);
+extern int  nfs_swap_writepage(struct file *file, struct page *page,
+			 struct writeback_control *wbc);
 
 /*
  * Try to write back everything synchronously (but check the
diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 15518a1..bc2fd1e 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -174,6 +174,8 @@ struct rpc_xprt {
 	unsigned long		state;		/* transport state */
 	unsigned char		shutdown   : 1,	/* being shut down */
 				resvport   : 1; /* use a reserved port */
+	unsigned int		swapper; 	/* we're swapping over this
+						   transport */
 	unsigned int		bind_index;	/* bind function index */
 
 	/*
@@ -311,6 +313,7 @@ void			xprt_release_rqst_cong(struct rpc_task *task);
 void			xprt_disconnect_done(struct rpc_xprt *xprt);
 void			xprt_force_disconnect(struct rpc_xprt *xprt);
 void			xprt_conditional_disconnect(struct rpc_xprt *xprt, unsigned int cookie);
+int			xs_swapper(struct rpc_xprt *xprt, int enable);
 
 /*
  * Reserved bit positions in xprt->state
diff --git a/net/sunrpc/Kconfig b/net/sunrpc/Kconfig
index ffd243d..0e9d340 100644
--- a/net/sunrpc/Kconfig
+++ b/net/sunrpc/Kconfig
@@ -21,6 +21,11 @@ config SUNRPC_XPRT_RDMA
 
 	  If unsure, say N.
 
+config SUNRPC_SWAP
+	bool
+	depends on SUNRPC
+	select NETVM
+
 config RPCSEC_GSS_KRB5
 	tristate "Secure RPC: Kerberos V mechanism"
 	depends on SUNRPC && CRYPTO
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index c5347d2..63547e0 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -594,6 +594,8 @@ void rpc_task_set_client(struct rpc_task *task, struct rpc_clnt *clnt)
 		atomic_inc(&clnt->cl_count);
 		if (clnt->cl_softrtry)
 			task->tk_flags |= RPC_TASK_SOFT;
+		if (task->tk_client->cl_xprt->swapper)
+			task->tk_flags |= RPC_TASK_SWAPPER;
 		/* Add to the client's list of all tasks */
 		spin_lock(&clnt->cl_lock);
 		list_add_tail(&task->tk_task, &clnt->cl_tasks);
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index d12ffa5..e116ab2 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -748,7 +748,10 @@ static void rpc_async_schedule(struct work_struct *work)
 void *rpc_malloc(struct rpc_task *task, size_t size)
 {
 	struct rpc_buffer *buf;
-	gfp_t gfp = RPC_IS_SWAPPER(task) ? GFP_ATOMIC : GFP_NOWAIT;
+	gfp_t gfp = GFP_NOWAIT;
+
+	if (RPC_IS_SWAPPER(task))
+		gfp |= __GFP_MEMALLOC;
 
 	size += sizeof(struct rpc_buffer);
 	if (size <= RPC_BUFFER_MAXSIZE)
@@ -828,7 +831,7 @@ static void rpc_init_task(struct rpc_task *task, const struct rpc_task_setup *ta
 static struct rpc_task *
 rpc_alloc_task(void)
 {
-	return (struct rpc_task *)mempool_alloc(rpc_task_mempool, GFP_NOFS);
+	return (struct rpc_task *)mempool_alloc(rpc_task_mempool, GFP_NOIO);
 }
 
 /*
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index d7f97ef..6448abe 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1931,6 +1931,49 @@ out:
 	xprt_wake_pending_tasks(xprt, status);
 }
 
+#ifdef CONFIG_SUNRPC_SWAP
+static void xs_set_memalloc(struct rpc_xprt *xprt)
+{
+	struct sock_xprt *transport = container_of(xprt, struct sock_xprt,
+			xprt);
+
+	if (xprt->swapper)
+		sk_set_memalloc(transport->inet);
+}
+
+#define RPC_BUF_RESERVE_PAGES \
+	kmalloc_estimate_objs(sizeof(struct rpc_rqst), GFP_KERNEL, RPC_MAX_SLOT_TABLE)
+#define RPC_RESERVE_PAGES	(RPC_BUF_RESERVE_PAGES + TX_RESERVE_PAGES)
+
+/**
+ * xs_swapper - Tag this transport as being used for swap.
+ * @xprt: transport to tag
+ * @enable: enable/disable
+ *
+ */
+int xs_swapper(struct rpc_xprt *xprt, int enable)
+{
+	struct sock_xprt *transport = container_of(xprt, struct sock_xprt,
+			xprt);
+	int err = 0;
+
+	if (enable) {
+		xprt->swapper++;
+		xs_set_memalloc(xprt);
+	} else if (xprt->swapper) {
+		xprt->swapper--;
+		sk_clear_memalloc(transport->inet);
+	}
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(xs_swapper);
+#else
+static void xs_set_memalloc(struct rpc_xprt *xprt)
+{
+}
+#endif
+
 static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
 {
 	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
@@ -1955,6 +1998,8 @@ static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
 		transport->sock = sock;
 		transport->inet = sk;
 
+		xs_set_memalloc(xprt);
+
 		write_unlock_bh(&sk->sk_callback_lock);
 	}
 	xs_udp_do_set_buffer_size(xprt);
@@ -1966,11 +2011,15 @@ static void xs_udp_setup_socket(struct work_struct *work)
 		container_of(work, struct sock_xprt, connect_worker.work);
 	struct rpc_xprt *xprt = &transport->xprt;
 	struct socket *sock = transport->sock;
+	unsigned long pflags = current->flags;
 	int status = -EIO;
 
 	if (xprt->shutdown)
 		goto out;
 
+	if (xprt->swapper)
+		current->flags |= PF_MEMALLOC;
+
 	/* Start by resetting any existing state */
 	xs_reset_transport(transport);
 	sock = xs_create_sock(xprt, transport,
@@ -1989,6 +2038,7 @@ static void xs_udp_setup_socket(struct work_struct *work)
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	tsk_restore_flags(current, pflags, PF_MEMALLOC);
 }
 
 /*
@@ -2079,6 +2129,8 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
 	if (!xprt_bound(xprt))
 		goto out;
 
+	xs_set_memalloc(xprt);
+
 	/* Tell the socket layer to start connecting... */
 	xprt->stat.connect_count++;
 	xprt->stat.connect_start = jiffies;
@@ -2109,11 +2161,15 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 		container_of(work, struct sock_xprt, connect_worker.work);
 	struct socket *sock = transport->sock;
 	struct rpc_xprt *xprt = &transport->xprt;
+	unsigned long pflags = current->flags;
 	int status = -EIO;
 
 	if (xprt->shutdown)
 		goto out;
 
+	if (xprt->swapper)
+		current->flags |= PF_MEMALLOC;
+
 	if (!sock) {
 		clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
 		sock = xs_create_sock(xprt, transport,
@@ -2175,6 +2231,7 @@ out_eagain:
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	tsk_restore_flags(current, pflags, PF_MEMALLOC);
 }
 
 /**
-- 
1.7.3.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 09/10] nfs: Prevent page allocator recursions with swap over NFS.
  2011-09-09 11:00 ` Mel Gorman
@ 2011-09-09 11:00   ` Mel Gorman
  -1 siblings, 0 replies; 45+ messages in thread
From: Mel Gorman @ 2011-09-09 11:00 UTC (permalink / raw)
  To: Linux-MM
  Cc: Linux-Netdev, Linux-NFS, LKML, Andrew Morton, David Miller,
	Trond Myklebust, Neil Brown, Peter Zijlstra, Mel Gorman

GFP_NOFS is _more_ permissive than GFP_NOIO in that it will initiate
IO, just not of any filesystem data.

The problem is that previously NOFS was correct because that avoids
recursion into the NFS code. With swap-over-NFS, it is no longer
correct as swap IO can lead to this recursion.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 fs/nfs/pagelist.c |    2 +-
 fs/nfs/write.c    |    7 ++++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 1fcc294..5eb527d 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -27,7 +27,7 @@ static struct kmem_cache *nfs_page_cachep;
 static inline struct nfs_page *
 nfs_page_alloc(void)
 {
-	struct nfs_page	*p = kmem_cache_zalloc(nfs_page_cachep, GFP_KERNEL);
+	struct nfs_page	*p = kmem_cache_zalloc(nfs_page_cachep, GFP_NOIO);
 	if (p)
 		INIT_LIST_HEAD(&p->wb_list);
 	return p;
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 475e1f2..78e4ce6 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -51,7 +51,7 @@ static mempool_t *nfs_commit_mempool;
 
 struct nfs_write_data *nfs_commitdata_alloc(void)
 {
-	struct nfs_write_data *p = mempool_alloc(nfs_commit_mempool, GFP_NOFS);
+	struct nfs_write_data *p = mempool_alloc(nfs_commit_mempool, GFP_NOIO);
 
 	if (p) {
 		memset(p, 0, sizeof(*p));
@@ -71,7 +71,7 @@ EXPORT_SYMBOL_GPL(nfs_commit_free);
 
 struct nfs_write_data *nfs_writedata_alloc(unsigned int pagecount)
 {
-	struct nfs_write_data *p = mempool_alloc(nfs_wdata_mempool, GFP_NOFS);
+	struct nfs_write_data *p = mempool_alloc(nfs_wdata_mempool, GFP_NOIO);
 
 	if (p) {
 		memset(p, 0, sizeof(*p));
@@ -80,7 +80,8 @@ struct nfs_write_data *nfs_writedata_alloc(unsigned int pagecount)
 		if (pagecount <= ARRAY_SIZE(p->page_array))
 			p->pagevec = p->page_array;
 		else {
-			p->pagevec = kcalloc(pagecount, sizeof(struct page *), GFP_NOFS);
+			p->pagevec = kcalloc(pagecount, sizeof(struct page *),
+					GFP_NOIO);
 			if (!p->pagevec) {
 				mempool_free(p, nfs_wdata_mempool);
 				p = NULL;
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 10/10] Avoid dereferencing bd_disk during swap_entry_free for network storage
  2011-09-09 11:00 ` Mel Gorman
@ 2011-09-09 11:00   ` Mel Gorman
  -1 siblings, 0 replies; 45+ messages in thread
From: Mel Gorman @ 2011-09-09 11:00 UTC (permalink / raw)
  To: Linux-MM
  Cc: Linux-Netdev, Linux-NFS, LKML, Andrew Morton, David Miller,
	Trond Myklebust, Neil Brown, Peter Zijlstra, Mel Gorman

Commit [b3a27d: swap: Add swap slot free callback to
block_device_operations] dereferences p->bdev->bd_disk but this is a
NULL dereference if using swap-over-NFS. This patch checks SWP_BLKDEV
on the swap_info_struct before dereferencing.

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/swapfile.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 806b994..8b85a88 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -547,7 +547,6 @@ static unsigned char swap_entry_free(struct swap_info_struct *p,
 
 	/* free if no reference */
 	if (!usage) {
-		struct gendisk *disk = p->bdev->bd_disk;
 		if (offset < p->lowest_bit)
 			p->lowest_bit = offset;
 		if (offset > p->highest_bit)
@@ -557,9 +556,11 @@ static unsigned char swap_entry_free(struct swap_info_struct *p,
 			swap_list.next = p->type;
 		nr_swap_pages++;
 		p->inuse_pages--;
-		if ((p->flags & SWP_BLKDEV) &&
-				disk->fops->swap_slot_free_notify)
-			disk->fops->swap_slot_free_notify(p->bdev, offset);
+		if (p->flags & SWP_BLKDEV) {
+			struct gendisk *disk = p->bdev->bd_disk;
+			if (disk->fops->swap_slot_free_notify)
+				disk->fops->swap_slot_free_notify(p->bdev, offset);
+		}
 	}
 
 	return usage;
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 03/10] mm: Add support for a filesystem to control swap files
@ 2011-09-09 13:00     ` Christoph Hellwig
  0 siblings, 0 replies; 45+ messages in thread
From: Christoph Hellwig @ 2011-09-09 13:00 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, Linux-Netdev, Linux-NFS, LKML, Andrew Morton,
	David Miller, Trond Myklebust, Neil Brown, Peter Zijlstra

On Fri, Sep 09, 2011 at 12:00:47PM +0100, Mel Gorman wrote:
> Currently swapfiles are managed entirely by the core VM by using
> ->bmap to allocate space and write to the blocks directly. This
> patch adds address_space_operations methods that allow a filesystem
> to optionally control the swapfile.
> 
>   int swap_activate(struct file *);
>   int swap_deactivate(struct file *);
>   int swap_writepage(struct file *, struct page *, struct writeback_control *);
>   int swap_readpage(struct file *, struct page *);

Just as the last two dozen times this came up:

NAK

The right fix is to add a filesystem method to support direct-I/O on
arbitrary kernel pages, instead of letting the wap abstraction leak into
the filesystem.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 10/10] Avoid dereferencing bd_disk during swap_entry_free for network storage
  2011-09-09 11:00   ` Mel Gorman
@ 2011-09-09 13:02     ` Christoph Hellwig
  -1 siblings, 0 replies; 45+ messages in thread
From: Christoph Hellwig @ 2011-09-09 13:02 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, Linux-Netdev, Linux-NFS, LKML, Andrew Morton,
	David Miller, Trond Myklebust, Neil Brown, Peter Zijlstra

On Fri, Sep 09, 2011 at 12:00:54PM +0100, Mel Gorman wrote:
> Commit [b3a27d: swap: Add swap slot free callback to
> block_device_operations] dereferences p->bdev->bd_disk but this is a
> NULL dereference if using swap-over-NFS. This patch checks SWP_BLKDEV
> on the swap_info_struct before dereferencing.

Please just remove the callback entirely.  It has no user outside the
staging tree and was added clearly against the rules for that staging
tree.

(and it's butt ugly)


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 03/10] mm: Add support for a filesystem to control swap files
  2011-09-09 13:00     ` Christoph Hellwig
@ 2011-09-09 13:15       ` Mel Gorman
  -1 siblings, 0 replies; 45+ messages in thread
From: Mel Gorman @ 2011-09-09 13:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linux-MM, Linux-Netdev, Linux-NFS, LKML, Andrew Morton,
	David Miller, Trond Myklebust, Neil Brown, Peter Zijlstra

On Fri, Sep 09, 2011 at 09:00:08AM -0400, Christoph Hellwig wrote:
> On Fri, Sep 09, 2011 at 12:00:47PM +0100, Mel Gorman wrote:
> > Currently swapfiles are managed entirely by the core VM by using
> > ->bmap to allocate space and write to the blocks directly. This
> > patch adds address_space_operations methods that allow a filesystem
> > to optionally control the swapfile.
> > 
> >   int swap_activate(struct file *);
> >   int swap_deactivate(struct file *);
> >   int swap_writepage(struct file *, struct page *, struct writeback_control *);
> >   int swap_readpage(struct file *, struct page *);
> 
> Just as the last two dozen times this came up:
> 
> NAK
> 
> The right fix is to add a filesystem method to support direct-I/O on
> arbitrary kernel pages, instead of letting the wap abstraction leak into
> the filesystem.

Ok.

I confess I haven't investigated this direction at
all yet.  Is it correct that your previous objection was
http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-10/msg00455.html
and the direct-IO patchset you were thinking of was
http://copilotco.com/mail-archives/linux-kernel.2009/msg87176.html ?

If so, are you suggesting that instead of swap_readpage and
swap_writepage I look into what is required for swap to use ->readpage
method and ->direct_IO aops?

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 03/10] mm: Add support for a filesystem to control swap files
  2011-09-09 13:15       ` Mel Gorman
@ 2011-09-09 13:36         ` Christoph Hellwig
  -1 siblings, 0 replies; 45+ messages in thread
From: Christoph Hellwig @ 2011-09-09 13:36 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Christoph Hellwig, Linux-MM, Linux-Netdev, Linux-NFS, LKML,
	Andrew Morton, David Miller, Trond Myklebust, Neil Brown,
	Peter Zijlstra

On Fri, Sep 09, 2011 at 02:15:50PM +0100, Mel Gorman wrote:
> 
> I confess I haven't investigated this direction at
> all yet.  Is it correct that your previous objection was
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-10/msg00455.html
> and the direct-IO patchset you were thinking of was
> http://copilotco.com/mail-archives/linux-kernel.2009/msg87176.html ?

Yes.

> If so, are you suggesting that instead of swap_readpage and
> swap_writepage I look into what is required for swap to use ->readpage
> method and ->direct_IO aops?

The equivalent of ->direct_IO should be used for both reads and writes.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 03/10] mm: Add support for a filesystem to control swap files
  2011-09-09 13:36         ` Christoph Hellwig
  (?)
@ 2011-09-12  9:04           ` Peter Zijlstra
  -1 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2011-09-12  9:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Mel Gorman, Linux-MM, Linux-Netdev, Linux-NFS, LKML,
	Andrew Morton, David Miller, Trond Myklebust, Neil Brown

On Fri, 2011-09-09 at 09:36 -0400, Christoph Hellwig wrote:
> The equivalent of ->direct_IO should be used for both reads and writes.

So the difference between DIO and swapIO is that swapIO needs the block
map pinned in memory.. So at the very least you'll need those
swap_{activate,deactivate} aops. The read/write-page thingies could
indeed be shared with DIO.



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 03/10] mm: Add support for a filesystem to control swap files
  2011-09-12  9:04           ` Peter Zijlstra
  (?)
@ 2011-09-12  9:34             ` Mel Gorman
  -1 siblings, 0 replies; 45+ messages in thread
From: Mel Gorman @ 2011-09-12  9:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Hellwig, Linux-MM, Linux-Netdev, Linux-NFS, LKML,
	Andrew Morton, David Miller, Trond Myklebust, Neil Brown

On Mon, Sep 12, 2011 at 11:04:45AM +0200, Peter Zijlstra wrote:
> On Fri, 2011-09-09 at 09:36 -0400, Christoph Hellwig wrote:
> > The equivalent of ->direct_IO should be used for both reads and writes.
> 
> So the difference between DIO and swapIO is that swapIO needs the block
> map pinned in memory.. So at the very least you'll need those
> swap_{activate,deactivate} aops. The read/write-page thingies could
> indeed be shared with DIO.
> 

I'm travelling at the moment so it'll be later in the week when I investigate
properly but I agree swap_[de|a]ctivate are still necessary. NFS does not
need to pin a block map but it's still necessary for calling xs_set_memalloc.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 03/10] mm: Add support for a filesystem to control swap files
  2011-09-12  9:34             ` Mel Gorman
  (?)
@ 2011-09-12  9:56               ` Peter Zijlstra
  -1 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2011-09-12  9:56 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Christoph Hellwig, Linux-MM, Linux-Netdev, Linux-NFS, LKML,
	Andrew Morton, David Miller, Trond Myklebust, Neil Brown

On Mon, 2011-09-12 at 10:34 +0100, Mel Gorman wrote:
> On Mon, Sep 12, 2011 at 11:04:45AM +0200, Peter Zijlstra wrote:
> > On Fri, 2011-09-09 at 09:36 -0400, Christoph Hellwig wrote:
> > > The equivalent of ->direct_IO should be used for both reads and writes.
> > 
> > So the difference between DIO and swapIO is that swapIO needs the block
> > map pinned in memory.. So at the very least you'll need those
> > swap_{activate,deactivate} aops. The read/write-page thingies could
> > indeed be shared with DIO.
> > 
> 
> I'm travelling at the moment so it'll be later in the week when I investigate
> properly but I agree swap_[de|a]ctivate are still necessary. NFS does not
> need to pin a block map but it's still necessary for calling xs_set_memalloc.

Right.. but I think the hope was that we could replace the current swap
bmap hackery with this and simplify the normal swap bits. But yeah,
networked filesystems don't really bother with block maps on the client
side ;-)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 03/10] mm: Add support for a filesystem to control swap files
  2011-09-12  9:56               ` Peter Zijlstra
@ 2011-09-12 11:56                 ` Mel Gorman
  -1 siblings, 0 replies; 45+ messages in thread
From: Mel Gorman @ 2011-09-12 11:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Hellwig, Linux-MM, Linux-Netdev, Linux-NFS, LKML,
	Andrew Morton, David Miller, Trond Myklebust, Neil Brown

On Mon, Sep 12, 2011 at 11:56:09AM +0200, Peter Zijlstra wrote:
> On Mon, 2011-09-12 at 10:34 +0100, Mel Gorman wrote:
> > On Mon, Sep 12, 2011 at 11:04:45AM +0200, Peter Zijlstra wrote:
> > > On Fri, 2011-09-09 at 09:36 -0400, Christoph Hellwig wrote:
> > > > The equivalent of ->direct_IO should be used for both reads and writes.
> > > 
> > > So the difference between DIO and swapIO is that swapIO needs the block
> > > map pinned in memory.. So at the very least you'll need those
> > > swap_{activate,deactivate} aops. The read/write-page thingies could
> > > indeed be shared with DIO.
> > > 
> > 
> > I'm travelling at the moment so it'll be later in the week when I investigate
> > properly but I agree swap_[de|a]ctivate are still necessary. NFS does not
> > need to pin a block map but it's still necessary for calling xs_set_memalloc.
> 
> Right.. but I think the hope was that we could replace the current swap
> bmap hackery with this and simplify the normal swap bits. But yeah,
> networked filesystems don't really bother with block maps on the client
> side ;-)

I took a look at what was involved with doing the block lookups in
ext4. It's what led to patch 4 of this series because it was necessary that
the filesystem get the same information as the generic handler. It got a
bit messy but looked like it would have worked if I kept at it. I stopped
because I didn't see a major advantage with swap_writepage() looking up
the block map instead of having looked it up in advance with bmap(), but
I could have missed something.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 03/10] mm: Add support for a filesystem to control swap files
  2011-09-12 11:56                 ` Mel Gorman
  (?)
@ 2011-09-12 12:06                   ` Peter Zijlstra
  -1 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2011-09-12 12:06 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Christoph Hellwig, Linux-MM, Linux-Netdev, Linux-NFS, LKML,
	Andrew Morton, David Miller, Trond Myklebust, Neil Brown

On Mon, 2011-09-12 at 12:56 +0100, Mel Gorman wrote:

> I took a look at what was involved with doing the block lookups in
> ext4. It's what led to patch 4 of this series because it was necessary that
> the filesystem get the same information as the generic handler. It got a
> bit messy but looked like it would have worked if I kept at it. I stopped
> because I didn't see a major advantage with swap_writepage() looking up
> the block map instead of having looked it up in advance with bmap() but
> I could have missed something.

IIRC the filesystem folks don't like the bmap thing and would like it to
go away.. could be they changed their minds again though, who knows ;-)



^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2011-09-12 12:07 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-09 11:00 [RFC PATCH 00/10] Swap-over-NFS without deadlocking v1 Mel Gorman
2011-09-09 11:00 ` Mel Gorman
2011-09-09 11:00 ` [PATCH 01/10] netvm: Prevent a stream-specific deadlock Mel Gorman
2011-09-09 11:00   ` Mel Gorman
2011-09-09 11:00 ` [PATCH 02/10] selinux: tag avc cache alloc as non-critical Mel Gorman
2011-09-09 11:00   ` Mel Gorman
2011-09-09 11:00 ` [PATCH 03/10] mm: Add support for a filesystem to control swap files Mel Gorman
2011-09-09 11:00   ` Mel Gorman
2011-09-09 13:00   ` Christoph Hellwig
2011-09-09 13:00     ` Christoph Hellwig
2011-09-09 13:00     ` Christoph Hellwig
2011-09-09 13:15     ` Mel Gorman
2011-09-09 13:15       ` Mel Gorman
2011-09-09 13:36       ` Christoph Hellwig
2011-09-09 13:36         ` Christoph Hellwig
2011-09-12  9:04         ` Peter Zijlstra
2011-09-12  9:04           ` Peter Zijlstra
2011-09-12  9:04           ` Peter Zijlstra
2011-09-12  9:34           ` Mel Gorman
2011-09-12  9:34             ` Mel Gorman
2011-09-12  9:34             ` Mel Gorman
2011-09-12  9:56             ` Peter Zijlstra
2011-09-12  9:56               ` Peter Zijlstra
2011-09-12  9:56               ` Peter Zijlstra
2011-09-12 11:56               ` Mel Gorman
2011-09-12 11:56                 ` Mel Gorman
2011-09-12 12:06                 ` Peter Zijlstra
2011-09-12 12:06                   ` Peter Zijlstra
2011-09-12 12:06                   ` Peter Zijlstra
2011-09-09 11:00 ` [PATCH 04/10] mm: swap: Implement generic handlers for swap-related address ops Mel Gorman
2011-09-09 11:00   ` Mel Gorman
2011-09-09 11:00 ` [PATCH 05/10] mm: Methods for teaching filesystems about PG_swapcache pages Mel Gorman
2011-09-09 11:00   ` Mel Gorman
2011-09-09 11:00 ` [PATCH 06/10] nfs: teach the NFS client how to treat " Mel Gorman
2011-09-09 11:00   ` Mel Gorman
2011-09-09 11:00 ` [PATCH 07/10] nfs: disable data cache revalidation for swapfiles Mel Gorman
2011-09-09 11:00   ` Mel Gorman
2011-09-09 11:00 ` [PATCH 08/10] nfs: enable swap on NFS Mel Gorman
2011-09-09 11:00   ` Mel Gorman
2011-09-09 11:00 ` [PATCH 09/10] nfs: Prevent page allocator recursions with swap over NFS Mel Gorman
2011-09-09 11:00   ` Mel Gorman
2011-09-09 11:00 ` [PATCH 10/10] Avoid dereferencing bd_disk during swap_entry_free for network storage Mel Gorman
2011-09-09 11:00   ` Mel Gorman
2011-09-09 13:02   ` Christoph Hellwig
2011-09-09 13:02     ` Christoph Hellwig
