linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5]  Remove possible deadlocks in nfs_release_page() - V3
@ 2014-09-24  1:28 NeilBrown
  2014-09-24  1:28 ` [PATCH 3/5] NFS: avoid deadlocks with loop-back mounted NFS filesystems NeilBrown
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: NeilBrown @ 2014-09-24  1:28 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: linux-nfs, linux-kernel, linux-mm, Ingo Molnar, linux-fsdevel,
	Andrew Morton, Jeff Layton, Peter Zijlstra

This set includes acked-by's from Andrew and Peter so it should be
OK for all five patches to go upstream through the NFS tree.

I split the congestion tracking patch out from the wait-for-PG_private
patch as they are conceptually separate.

This set continues to perform well in my tests and addresses all
issues that have been raised.

Thanks a lot,
NeilBrown


---

NeilBrown (5):
      SCHED: add some "wait..on_bit...timeout()" interfaces.
      MM: export page_wakeup functions
      NFS: avoid deadlocks with loop-back mounted NFS filesystems.
      NFS: avoid waiting at all in nfs_release_page when congested.
      NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page()


 fs/nfs/file.c                   |   29 +++++++++++++++++++----------
 fs/nfs/write.c                  |    7 +++++++
 include/linux/pagemap.h         |   12 ++++++++++--
 include/linux/wait.h            |    5 ++++-
 kernel/sched/wait.c             |   36 ++++++++++++++++++++++++++++++++++++
 mm/filemap.c                    |   21 +++++++++++++++------
 net/sunrpc/sched.c              |    2 --
 net/sunrpc/xprtrdma/transport.c |    2 --
 net/sunrpc/xprtsock.c           |   10 ----------
 9 files changed, 91 insertions(+), 33 deletions(-)

-- 
Signature


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/5] SCHED: add some "wait..on_bit...timeout()" interfaces.
  2014-09-24  1:28 [PATCH 0/5] Remove possible deadlocks in nfs_release_page() - V3 NeilBrown
                   ` (2 preceding siblings ...)
  2014-09-24  1:28 ` [PATCH 4/5] NFS: avoid waiting at all in nfs_release_page when congested NeilBrown
@ 2014-09-24  1:28 ` NeilBrown
  2014-09-24  7:04   ` Ingo Molnar
  2014-09-24  1:28 ` [PATCH 5/5] NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page() NeilBrown
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: NeilBrown @ 2014-09-24  1:28 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: linux-nfs, linux-kernel, linux-mm, Ingo Molnar, linux-fsdevel,
	Andrew Morton, Jeff Layton, Peter Zijlstra

In commit c1221321b7c25b53204447cff9949a6d5a7ddddc
   sched: Allow wait_on_bit_action() functions to support a timeout

I suggested that a "wait_on_bit_timeout()" interface would not meet my
need.  This isn't true - I was just over-engineering.

Including a 'private' field in wait_bit_key instead of a focused
"timeout" field was just premature generalization.  If some other
use is ever found, it can be generalized or added later.

So this patch renames "private" to "timeout" with a meaning "stop
waiting when "jiffies" reaches or passes "timeout",
and adds two of the many possible wait..bit..timeout() interfaces:

wait_on_page_bit_killable_timeout(), which is the one I want to use,
and out_of_line_wait_on_bit_timeout() which is a reasonably general
example.  Others can be added as needed.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 include/linux/pagemap.h |    2 ++
 include/linux/wait.h    |    5 ++++-
 kernel/sched/wait.c     |   36 ++++++++++++++++++++++++++++++++++++
 mm/filemap.c            |   13 +++++++++++++
 4 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 3df8c7db7a4e..87f9e4230d3a 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -502,6 +502,8 @@ static inline int lock_page_or_retry(struct page *page, struct mm_struct *mm,
 extern void wait_on_page_bit(struct page *page, int bit_nr);
 
 extern int wait_on_page_bit_killable(struct page *page, int bit_nr);
+extern int wait_on_page_bit_killable_timeout(struct page *page,
+					     int bit_nr, unsigned long timeout);
 
 static inline int wait_on_page_locked_killable(struct page *page)
 {
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 6fb1ba5f9b2f..80115bf88671 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -25,7 +25,7 @@ struct wait_bit_key {
 	void			*flags;
 	int			bit_nr;
 #define WAIT_ATOMIC_T_BIT_NR	-1
-	unsigned long		private;
+	unsigned long		timeout;
 };
 
 struct wait_bit_queue {
@@ -154,6 +154,7 @@ int __wait_on_bit_lock(wait_queue_head_t *, struct wait_bit_queue *, wait_bit_ac
 void wake_up_bit(void *, int);
 void wake_up_atomic_t(atomic_t *);
 int out_of_line_wait_on_bit(void *, int, wait_bit_action_f *, unsigned);
+int out_of_line_wait_on_bit_timeout(void *, int, wait_bit_action_f *, unsigned, unsigned long);
 int out_of_line_wait_on_bit_lock(void *, int, wait_bit_action_f *, unsigned);
 int out_of_line_wait_on_atomic_t(atomic_t *, int (*)(atomic_t *), unsigned);
 wait_queue_head_t *bit_waitqueue(void *, int);
@@ -859,6 +860,8 @@ int wake_bit_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
 
 extern int bit_wait(struct wait_bit_key *);
 extern int bit_wait_io(struct wait_bit_key *);
+extern int bit_wait_timeout(struct wait_bit_key *);
+extern int bit_wait_io_timeout(struct wait_bit_key *);
 
 /**
  * wait_on_bit - wait for a bit to be cleared
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 15cab1a4f84e..380678b3cba4 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -343,6 +343,18 @@ int __sched out_of_line_wait_on_bit(void *word, int bit,
 }
 EXPORT_SYMBOL(out_of_line_wait_on_bit);
 
+int __sched out_of_line_wait_on_bit_timeout(
+	void *word, int bit, wait_bit_action_f *action,
+	unsigned mode, unsigned long timeout)
+{
+	wait_queue_head_t *wq = bit_waitqueue(word, bit);
+	DEFINE_WAIT_BIT(wait, word, bit);
+
+	wait.key.timeout = jiffies + timeout;
+	return __wait_on_bit(wq, &wait, action, mode);
+}
+EXPORT_SYMBOL(out_of_line_wait_on_bit_timeout);
+
 int __sched
 __wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
 			wait_bit_action_f *action, unsigned mode)
@@ -520,3 +532,27 @@ __sched int bit_wait_io(struct wait_bit_key *word)
 	return 0;
 }
 EXPORT_SYMBOL(bit_wait_io);
+
+__sched int bit_wait_timeout(struct wait_bit_key *word)
+{
+	unsigned long now = ACCESS_ONCE(jiffies);
+	if (signal_pending_state(current->state, current))
+		return 1;
+	if (time_after_eq(now, word->timeout))
+		return -EAGAIN;
+	schedule_timeout(word->timeout - now);
+	return 0;
+}
+EXPORT_SYMBOL(bit_wait_timeout);
+
+__sched int bit_wait_io_timeout(struct wait_bit_key *word)
+{
+	unsigned long now = ACCESS_ONCE(jiffies);
+	if (signal_pending_state(current->state, current))
+		return 1;
+	if (time_after_eq(now, word->timeout))
+		return -EAGAIN;
+	io_schedule_timeout(word->timeout - now);
+	return 0;
+}
+EXPORT_SYMBOL(bit_wait_io_timeout);
diff --git a/mm/filemap.c b/mm/filemap.c
index 90effcdf948d..4a19c084bdb1 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -703,6 +703,19 @@ int wait_on_page_bit_killable(struct page *page, int bit_nr)
 			     bit_wait_io, TASK_KILLABLE);
 }
 
+int wait_on_page_bit_killable_timeout(struct page *page,
+				       int bit_nr, unsigned long timeout)
+{
+	DEFINE_WAIT_BIT(wait, &page->flags, bit_nr);
+
+	wait.key.timeout = jiffies + timeout;
+	if (!test_bit(bit_nr, &page->flags))
+		return 0;
+	return __wait_on_bit(page_waitqueue(page), &wait,
+			     bit_wait_io_timeout, TASK_KILLABLE);
+}
+EXPORT_SYMBOL(wait_on_page_bit_killable_timeout);
+
 /**
  * add_page_wait_queue - Add an arbitrary waiter to a page's wait queue
  * @page: Page defining the wait queue of interest



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/5] MM: export page_wakeup functions
  2014-09-24  1:28 [PATCH 0/5] Remove possible deadlocks in nfs_release_page() - V3 NeilBrown
  2014-09-24  1:28 ` [PATCH 3/5] NFS: avoid deadlocks with loop-back mounted NFS filesystems NeilBrown
@ 2014-09-24  1:28 ` NeilBrown
  2014-09-24  1:28 ` [PATCH 4/5] NFS: avoid waiting at all in nfs_release_page when congested NeilBrown
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: NeilBrown @ 2014-09-24  1:28 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: linux-nfs, linux-kernel, linux-mm, Ingo Molnar, linux-fsdevel,
	Andrew Morton, Jeff Layton, Peter Zijlstra

This will allow NFS to wait for PG_private to be cleared and,
particularly, to send a wake-up when it is.

Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/pagemap.h |   10 ++++++++--
 mm/filemap.c            |    8 ++------
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 87f9e4230d3a..2dca0cef3506 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -496,8 +496,8 @@ static inline int lock_page_or_retry(struct page *page, struct mm_struct *mm,
 }
 
 /*
- * This is exported only for wait_on_page_locked/wait_on_page_writeback.
- * Never use this directly!
+ * This is exported only for wait_on_page_locked/wait_on_page_writeback,
+ * and for filesystems which need to wait on PG_private.
  */
 extern void wait_on_page_bit(struct page *page, int bit_nr);
 
@@ -512,6 +512,12 @@ static inline int wait_on_page_locked_killable(struct page *page)
 	return 0;
 }
 
+extern wait_queue_head_t *page_waitqueue(struct page *page);
+static inline void wake_up_page(struct page *page, int bit)
+{
+	__wake_up_bit(page_waitqueue(page), &page->flags, bit);
+}
+
 /* 
  * Wait for a page to be unlocked.
  *
diff --git a/mm/filemap.c b/mm/filemap.c
index 4a19c084bdb1..c9ba09f2ad3c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -670,17 +670,13 @@ EXPORT_SYMBOL(__page_cache_alloc);
  * at a cost of "thundering herd" phenomena during rare hash
  * collisions.
  */
-static wait_queue_head_t *page_waitqueue(struct page *page)
+wait_queue_head_t *page_waitqueue(struct page *page)
 {
 	const struct zone *zone = page_zone(page);
 
 	return &zone->wait_table[hash_ptr(page, zone->wait_table_bits)];
 }
-
-static inline void wake_up_page(struct page *page, int bit)
-{
-	__wake_up_bit(page_waitqueue(page), &page->flags, bit);
-}
+EXPORT_SYMBOL(page_waitqueue);
 
 void wait_on_page_bit(struct page *page, int bit_nr)
 {



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/5] NFS: avoid deadlocks with loop-back mounted NFS filesystems.
  2014-09-24  1:28 [PATCH 0/5] Remove possible deadlocks in nfs_release_page() - V3 NeilBrown
@ 2014-09-24  1:28 ` NeilBrown
  2014-09-24  1:28 ` [PATCH 2/5] MM: export page_wakeup functions NeilBrown
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: NeilBrown @ 2014-09-24  1:28 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: linux-nfs, linux-kernel, linux-mm, Ingo Molnar, linux-fsdevel,
	Andrew Morton, Jeff Layton, Peter Zijlstra

Support for loop-back mounted NFS filesystems is useful when NFS is
used to access shared storage in a high-availability cluster.

If the node running the NFS server fails, some other node can mount the
filesystem and start providing NFS service.  If that node already had
the filesystem NFS mounted, it will now have it loop-back mounted.

nfsd can suffer a deadlock when allocating memory and entering direct
reclaim.
While direct reclaim does not write to the NFS filesystem it can send
and wait for a COMMIT through nfs_release_page().

This patch modifies nfs_release_page() to wait a limited time for the
commit to complete - one second.  If the commit doesn't complete
in this time, nfs_release_page() will fail.  This means it might now
fail in some cases where it wouldn't before.  These cases are only
when 'gfp' includes '__GFP_WAIT'.

nfs_release_page() is only called by try_to_release_page(), and that
can only be called on an NFS page with required 'gfp' flags from
 - page_cache_pipe_buf_steal() in splice.c
 - shrink_page_list() in vmscan.c
 - invalidate_inode_pages2_range() in truncate.c

The first two handle failure quite safely.  The last is only called
after ->launder_page() has been called, and that will have waited
for the commit to finish already.

So aborting if the commit takes longer than 1 second is perfectly safe.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfs/file.c  |   26 ++++++++++++++++----------
 fs/nfs/write.c |    2 ++
 2 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 524dd80d1898..ef5513322cf6 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -468,17 +468,23 @@ static int nfs_release_page(struct page *page, gfp_t gfp)
 
 	dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page);
 
-	/* Only do I/O if gfp is a superset of GFP_KERNEL, and we're not
-	 * doing this memory reclaim for a fs-related allocation.
+	/* Always try to initiate a 'commit' if relevant, but only
+	 * wait for it if __GFP_WAIT is set and the calling process is
+	 * allowed to block.  Even then, only wait 1 second.
+	 * Waiting indefinitely can cause deadlocks when the NFS
+	 * server is on this machine, and there is no particular need
+	 * to wait extensively here.  A short wait has the benefit
+	 * that someone else can worry about the freezer.
 	 */
-	if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL &&
-	    !(current->flags & PF_FSTRANS)) {
-		int how = FLUSH_SYNC;
-
-		/* Don't let kswapd deadlock waiting for OOM RPC calls */
-		if (current_is_kswapd())
-			how = 0;
-		nfs_commit_inode(mapping->host, how);
+	if (mapping) {
+		struct nfs_server *nfss = NFS_SERVER(mapping->host);
+		nfs_commit_inode(mapping->host, 0);
+		if ((gfp & __GFP_WAIT) &&
+		    !current_is_kswapd() &&
+		    !(current->flags & PF_FSTRANS)) {
+			wait_on_page_bit_killable_timeout(page, PG_private,
+							  HZ);
+		}
 	}
 	/* If PagePrivate() is set, then the page is not freeable */
 	if (PagePrivate(page))
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 175d5d073ccf..b5d83c7545d4 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -731,6 +731,8 @@ static void nfs_inode_remove_request(struct nfs_page *req)
 		if (likely(!PageSwapCache(head->wb_page))) {
 			set_page_private(head->wb_page, 0);
 			ClearPagePrivate(head->wb_page);
+			smp_mb__after_atomic();
+			wake_up_page(head->wb_page, PG_private);
 			clear_bit(PG_MAPPED, &head->wb_flags);
 		}
 		nfsi->npages--;



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/5] NFS: avoid waiting at all in nfs_release_page when congested.
  2014-09-24  1:28 [PATCH 0/5] Remove possible deadlocks in nfs_release_page() - V3 NeilBrown
  2014-09-24  1:28 ` [PATCH 3/5] NFS: avoid deadlocks with loop-back mounted NFS filesystems NeilBrown
  2014-09-24  1:28 ` [PATCH 2/5] MM: export page_wakeup functions NeilBrown
@ 2014-09-24  1:28 ` NeilBrown
  2014-09-24  1:28 ` [PATCH 1/5] SCHED: add some "wait..on_bit...timeout()" interfaces NeilBrown
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: NeilBrown @ 2014-09-24  1:28 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: linux-nfs, linux-kernel, linux-mm, Ingo Molnar, linux-fsdevel,
	Andrew Morton, Jeff Layton, Peter Zijlstra

If nfs_release_page() is called on a sequence of pages which are all
in the same file which is blocked on COMMIT, each page could
contribute a 1 second delay which could be come excessive.  I have
seen delays of as much as 208 seconds.

To keep the delay to one second, mark the bdi as write-congested
if the commit didn't finished.  Once it does finish, the
write-congested flag will be cleared by nfs_commit_release_pages().

With this, the longest total delay in try_to_free_pages that I have
seen is under 3 seconds.  With no waiting in nfs_release_page at all
I have seen delays of nearly 1.5 seconds.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfs/file.c  |    9 +++++++--
 fs/nfs/write.c |    5 +++++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index ef5513322cf6..1243a15438d0 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -470,7 +470,8 @@ static int nfs_release_page(struct page *page, gfp_t gfp)
 
 	/* Always try to initiate a 'commit' if relevant, but only
 	 * wait for it if __GFP_WAIT is set and the calling process is
-	 * allowed to block.  Even then, only wait 1 second.
+	 * allowed to block.  Even then, only wait 1 second and only
+	 * if the 'bdi' is not congested.
 	 * Waiting indefinitely can cause deadlocks when the NFS
 	 * server is on this machine, and there is no particular need
 	 * to wait extensively here.  A short wait has the benefit
@@ -481,9 +482,13 @@ static int nfs_release_page(struct page *page, gfp_t gfp)
 		nfs_commit_inode(mapping->host, 0);
 		if ((gfp & __GFP_WAIT) &&
 		    !current_is_kswapd() &&
-		    !(current->flags & PF_FSTRANS)) {
+		    !(current->flags & PF_FSTRANS) &&
+		    !bdi_write_congested(&nfss->backing_dev_info)) {
 			wait_on_page_bit_killable_timeout(page, PG_private,
 							  HZ);
+			if (PagePrivate(page))
+				set_bdi_congested(&nfss->backing_dev_info,
+						  BLK_RW_ASYNC);
 		}
 	}
 	/* If PagePrivate() is set, then the page is not freeable */
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index b5d83c7545d4..3066c7fcb565 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1638,6 +1638,7 @@ static void nfs_commit_release_pages(struct nfs_commit_data *data)
 	struct nfs_page	*req;
 	int status = data->task.tk_status;
 	struct nfs_commit_info cinfo;
+	struct nfs_server *nfss;
 
 	while (!list_empty(&data->pages)) {
 		req = nfs_list_entry(data->pages.next);
@@ -1671,6 +1672,10 @@ static void nfs_commit_release_pages(struct nfs_commit_data *data)
 	next:
 		nfs_unlock_and_release_request(req);
 	}
+	nfss = NFS_SERVER(data->inode);
+	if (atomic_long_read(&nfss->writeback) < NFS_CONGESTION_OFF_THRESH)
+		clear_bdi_congested(&nfss->backing_dev_info, BLK_RW_ASYNC);
+
 	nfs_init_cinfo(&cinfo, data->inode, data->dreq);
 	if (atomic_dec_and_test(&cinfo.mds->rpcs_out))
 		nfs_commit_clear_lock(NFS_I(data->inode));



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/5] NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page()
  2014-09-24  1:28 [PATCH 0/5] Remove possible deadlocks in nfs_release_page() - V3 NeilBrown
                   ` (3 preceding siblings ...)
  2014-09-24  1:28 ` [PATCH 1/5] SCHED: add some "wait..on_bit...timeout()" interfaces NeilBrown
@ 2014-09-24  1:28 ` NeilBrown
  2014-09-24  2:06 ` [PATCH 0/5] Remove possible deadlocks in nfs_release_page() - V3 Trond Myklebust
  2014-09-24 11:27 ` Jeff Layton
  6 siblings, 0 replies; 13+ messages in thread
From: NeilBrown @ 2014-09-24  1:28 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: linux-nfs, linux-kernel, linux-mm, Ingo Molnar, linux-fsdevel,
	Andrew Morton, Jeff Layton, Peter Zijlstra

Now that nfs_release_page() doesn't block indefinitely, other deadlock
avoidance mechanisms aren't needed.
 - it doesn't hurt for kswapd to block occasionally.  If it doesn't
   want to block it would clear __GFP_WAIT.  The current_is_kswapd()
   was only added to avoid deadlocks and we have a new approach for
   that.
 - memory allocation in the SUNRPC layer can very rarely try to
   ->releasepage() a page it is trying to handle.  The deadlock
   is removed as nfs_release_page() doesn't block indefinitely.

So we don't need to set PF_FSTRANS for sunrpc network operations any
more.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfs/file.c                   |   14 ++++++--------
 net/sunrpc/sched.c              |    2 --
 net/sunrpc/xprtrdma/transport.c |    2 --
 net/sunrpc/xprtsock.c           |   10 ----------
 4 files changed, 6 insertions(+), 22 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 1243a15438d0..de322d3f4a29 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -469,20 +469,18 @@ static int nfs_release_page(struct page *page, gfp_t gfp)
 	dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page);
 
 	/* Always try to initiate a 'commit' if relevant, but only
-	 * wait for it if __GFP_WAIT is set and the calling process is
-	 * allowed to block.  Even then, only wait 1 second and only
-	 * if the 'bdi' is not congested.
+	 * wait for it if __GFP_WAIT is set.  Even then, only wait 1
+	 * second and only if the 'bdi' is not congested.
 	 * Waiting indefinitely can cause deadlocks when the NFS
-	 * server is on this machine, and there is no particular need
-	 * to wait extensively here.  A short wait has the benefit
-	 * that someone else can worry about the freezer.
+	 * server is on this machine, when a new TCP connection is
+	 * needed and in other rare cases.  There is no particular
+	 * need to wait extensively here.  A short wait has the
+	 * benefit that someone else can worry about the freezer.
 	 */
 	if (mapping) {
 		struct nfs_server *nfss = NFS_SERVER(mapping->host);
 		nfs_commit_inode(mapping->host, 0);
 		if ((gfp & __GFP_WAIT) &&
-		    !current_is_kswapd() &&
-		    !(current->flags & PF_FSTRANS) &&
 		    !bdi_write_congested(&nfss->backing_dev_info)) {
 			wait_on_page_bit_killable_timeout(page, PG_private,
 							  HZ);
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 9358c79fd589..fe3441abdbe5 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -821,9 +821,7 @@ void rpc_execute(struct rpc_task *task)
 
 static void rpc_async_schedule(struct work_struct *work)
 {
-	current->flags |= PF_FSTRANS;
 	__rpc_execute(container_of(work, struct rpc_task, u.tk_work));
-	current->flags &= ~PF_FSTRANS;
 }
 
 /**
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 2faac4940563..6a4615dd0261 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -205,7 +205,6 @@ xprt_rdma_connect_worker(struct work_struct *work)
 	struct rpc_xprt *xprt = &r_xprt->xprt;
 	int rc = 0;
 
-	current->flags |= PF_FSTRANS;
 	xprt_clear_connected(xprt);
 
 	dprintk("RPC:       %s: %sconnect\n", __func__,
@@ -216,7 +215,6 @@ xprt_rdma_connect_worker(struct work_struct *work)
 
 	dprintk("RPC:       %s: exit\n", __func__);
 	xprt_clear_connecting(xprt);
-	current->flags &= ~PF_FSTRANS;
 }
 
 /*
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 43cd89eacfab..4707c0c8568b 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1927,8 +1927,6 @@ static int xs_local_setup_socket(struct sock_xprt *transport)
 	struct socket *sock;
 	int status = -EIO;
 
-	current->flags |= PF_FSTRANS;
-
 	clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
 	status = __sock_create(xprt->xprt_net, AF_LOCAL,
 					SOCK_STREAM, 0, &sock, 1);
@@ -1968,7 +1966,6 @@ static int xs_local_setup_socket(struct sock_xprt *transport)
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
-	current->flags &= ~PF_FSTRANS;
 	return status;
 }
 
@@ -2071,8 +2068,6 @@ static void xs_udp_setup_socket(struct work_struct *work)
 	struct socket *sock = transport->sock;
 	int status = -EIO;
 
-	current->flags |= PF_FSTRANS;
-
 	/* Start by resetting any existing state */
 	xs_reset_transport(transport);
 	sock = xs_create_sock(xprt, transport,
@@ -2092,7 +2087,6 @@ static void xs_udp_setup_socket(struct work_struct *work)
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
-	current->flags &= ~PF_FSTRANS;
 }
 
 /*
@@ -2229,8 +2223,6 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 	struct rpc_xprt *xprt = &transport->xprt;
 	int status = -EIO;
 
-	current->flags |= PF_FSTRANS;
-
 	if (!sock) {
 		clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
 		sock = xs_create_sock(xprt, transport,
@@ -2276,7 +2268,6 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 	case -EINPROGRESS:
 	case -EALREADY:
 		xprt_clear_connecting(xprt);
-		current->flags &= ~PF_FSTRANS;
 		return;
 	case -EINVAL:
 		/* Happens, for instance, if the user specified a link
@@ -2294,7 +2285,6 @@ out_eagain:
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
-	current->flags &= ~PF_FSTRANS;
 }
 
 /**



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/5] Remove possible deadlocks in nfs_release_page() - V3
  2014-09-24  1:28 [PATCH 0/5] Remove possible deadlocks in nfs_release_page() - V3 NeilBrown
                   ` (4 preceding siblings ...)
  2014-09-24  1:28 ` [PATCH 5/5] NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page() NeilBrown
@ 2014-09-24  2:06 ` Trond Myklebust
  2014-09-24 11:27 ` Jeff Layton
  6 siblings, 0 replies; 13+ messages in thread
From: Trond Myklebust @ 2014-09-24  2:06 UTC (permalink / raw)
  To: NeilBrown
  Cc: Linux NFS Mailing List, Linux Kernel mailing list, linux-mm,
	Ingo Molnar, Devel FS Linux, Andrew Morton, Jeff Layton,
	Peter Zijlstra

On Tue, Sep 23, 2014 at 9:28 PM, NeilBrown <neilb@suse.de> wrote:
> This set includes acked-by's from Andrew and Peter so it should be
> OK for all five patches to go upstream through the NFS tree.
>
> I split the congestion tracking patch out from the wait-for-PG_private
> patch as they are conceptually separate.
>
> This set continues to perform well in my tests and addresses all
> issues that have been raised.
>
> Thanks a lot,
> NeilBrown
>

Thanks Neil! I'll give them a final review tomorrow, and then queue
them up for the 3.18 merge window.

-- 
Trond Myklebust

Linux NFS client maintainer, PrimaryData

trond.myklebust@primarydata.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/5] SCHED: add some "wait..on_bit...timeout()" interfaces.
  2014-09-24  1:28 ` [PATCH 1/5] SCHED: add some "wait..on_bit...timeout()" interfaces NeilBrown
@ 2014-09-24  7:04   ` Ingo Molnar
  2014-09-25  3:23     ` NeilBrown
  2014-09-25  3:55     ` [PATCH 1/5 - resend] " NeilBrown
  0 siblings, 2 replies; 13+ messages in thread
From: Ingo Molnar @ 2014-09-24  7:04 UTC (permalink / raw)
  To: NeilBrown
  Cc: Trond Myklebust, linux-nfs, linux-kernel, linux-mm, Ingo Molnar,
	linux-fsdevel, Andrew Morton, Jeff Layton, Peter Zijlstra


* NeilBrown <neilb@suse.de> wrote:

> @@ -859,6 +860,8 @@ int wake_bit_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
>  
>  extern int bit_wait(struct wait_bit_key *);
>  extern int bit_wait_io(struct wait_bit_key *);
> +extern int bit_wait_timeout(struct wait_bit_key *);
> +extern int bit_wait_io_timeout(struct wait_bit_key *);
>  
>  /**
>   * wait_on_bit - wait for a bit to be cleared
> diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
> index 15cab1a4f84e..380678b3cba4 100644
> --- a/kernel/sched/wait.c
> +++ b/kernel/sched/wait.c
> @@ -343,6 +343,18 @@ int __sched out_of_line_wait_on_bit(void *word, int bit,
>  }
>  EXPORT_SYMBOL(out_of_line_wait_on_bit);
>  
> +int __sched out_of_line_wait_on_bit_timeout(
> +	void *word, int bit, wait_bit_action_f *action,
> +	unsigned mode, unsigned long timeout)
> +{
> +	wait_queue_head_t *wq = bit_waitqueue(word, bit);
> +	DEFINE_WAIT_BIT(wait, word, bit);
> +
> +	wait.key.timeout = jiffies + timeout;
> +	return __wait_on_bit(wq, &wait, action, mode);
> +}
> +EXPORT_SYMBOL(out_of_line_wait_on_bit_timeout);
> +
>  int __sched
>  __wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
>  			wait_bit_action_f *action, unsigned mode)
> @@ -520,3 +532,27 @@ __sched int bit_wait_io(struct wait_bit_key *word)
>  	return 0;
>  }
>  EXPORT_SYMBOL(bit_wait_io);
> +
> +__sched int bit_wait_timeout(struct wait_bit_key *word)
> +{
> +	unsigned long now = ACCESS_ONCE(jiffies);
> +	if (signal_pending_state(current->state, current))
> +		return 1;
> +	if (time_after_eq(now, word->timeout))
> +		return -EAGAIN;
> +	schedule_timeout(word->timeout - now);
> +	return 0;
> +}
> +EXPORT_SYMBOL(bit_wait_timeout);
> +
> +__sched int bit_wait_io_timeout(struct wait_bit_key *word)
> +{
> +	unsigned long now = ACCESS_ONCE(jiffies);
> +	if (signal_pending_state(current->state, current))
> +		return 1;
> +	if (time_after_eq(now, word->timeout))
> +		return -EAGAIN;
> +	io_schedule_timeout(word->timeout - now);
> +	return 0;
> +}
> +EXPORT_SYMBOL(bit_wait_io_timeout);

New scheduler APIs should be exported via EXPORT_SYMBOL_GPL().

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/5]  Remove possible deadlocks in nfs_release_page() - V3
  2014-09-24  1:28 [PATCH 0/5] Remove possible deadlocks in nfs_release_page() - V3 NeilBrown
                   ` (5 preceding siblings ...)
  2014-09-24  2:06 ` [PATCH 0/5] Remove possible deadlocks in nfs_release_page() - V3 Trond Myklebust
@ 2014-09-24 11:27 ` Jeff Layton
  6 siblings, 0 replies; 13+ messages in thread
From: Jeff Layton @ 2014-09-24 11:27 UTC (permalink / raw)
  To: NeilBrown
  Cc: Trond Myklebust, linux-nfs, linux-kernel, linux-mm, Ingo Molnar,
	linux-fsdevel, Andrew Morton, Jeff Layton, Peter Zijlstra

On Wed, 24 Sep 2014 11:28:32 +1000
NeilBrown <neilb@suse.de> wrote:

> This set includes acked-by's from Andrew and Peter so it should be
> OK for all five patches to go upstream through the NFS tree.
> 
> I split the congestion tracking patch out from the wait-for-PG_private
> patch as they are conceptually separate.
> 
> This set continues to perform well in my tests and addresses all
> issues that have been raised.
> 
> Thanks a lot,
> NeilBrown
> 
> 
> ---
> 
> NeilBrown (5):
>       SCHED: add some "wait..on_bit...timeout()" interfaces.
>       MM: export page_wakeup functions
>       NFS: avoid deadlocks with loop-back mounted NFS filesystems.
>       NFS: avoid waiting at all in nfs_release_page when congested.
>       NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page()
> 
> 
>  fs/nfs/file.c                   |   29 +++++++++++++++++++----------
>  fs/nfs/write.c                  |    7 +++++++
>  include/linux/pagemap.h         |   12 ++++++++++--
>  include/linux/wait.h            |    5 ++++-
>  kernel/sched/wait.c             |   36 ++++++++++++++++++++++++++++++++++++
>  mm/filemap.c                    |   21 +++++++++++++++------
>  net/sunrpc/sched.c              |    2 --
>  net/sunrpc/xprtrdma/transport.c |    2 --
>  net/sunrpc/xprtsock.c           |   10 ----------
>  9 files changed, 91 insertions(+), 33 deletions(-)
> 

Cool!

This looks like it'll address my earlier concern about setting the BDI
congested inappropriately. You can add this to the set if you like:

Acked-by: Jeff Layton <jlayton@primarydata.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/5] SCHED: add some "wait..on_bit...timeout()" interfaces.
  2014-09-24  7:04   ` Ingo Molnar
@ 2014-09-25  3:23     ` NeilBrown
  2014-09-25  3:28       ` Trond Myklebust
  2014-09-25  3:55     ` [PATCH 1/5 - resend] " NeilBrown
  1 sibling, 1 reply; 13+ messages in thread
From: NeilBrown @ 2014-09-25  3:23 UTC (permalink / raw)
  To: Ingo Molnar, Trond Myklebust
  Cc: linux-nfs, linux-kernel, linux-mm, Ingo Molnar, linux-fsdevel,
	Andrew Morton, Jeff Layton, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 2622 bytes --]

On Wed, 24 Sep 2014 09:04:18 +0200 Ingo Molnar <mingo@kernel.org> wrote:

> 
> * NeilBrown <neilb@suse.de> wrote:
> 
> > @@ -859,6 +860,8 @@ int wake_bit_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
> >  
> >  extern int bit_wait(struct wait_bit_key *);
> >  extern int bit_wait_io(struct wait_bit_key *);
> > +extern int bit_wait_timeout(struct wait_bit_key *);
> > +extern int bit_wait_io_timeout(struct wait_bit_key *);
> >  
> >  /**
> >   * wait_on_bit - wait for a bit to be cleared
> > diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
> > index 15cab1a4f84e..380678b3cba4 100644
> > --- a/kernel/sched/wait.c
> > +++ b/kernel/sched/wait.c
> > @@ -343,6 +343,18 @@ int __sched out_of_line_wait_on_bit(void *word, int bit,
> >  }
> >  EXPORT_SYMBOL(out_of_line_wait_on_bit);
> >  
> > +int __sched out_of_line_wait_on_bit_timeout(
> > +	void *word, int bit, wait_bit_action_f *action,
> > +	unsigned mode, unsigned long timeout)
> > +{
> > +	wait_queue_head_t *wq = bit_waitqueue(word, bit);
> > +	DEFINE_WAIT_BIT(wait, word, bit);
> > +
> > +	wait.key.timeout = jiffies + timeout;
> > +	return __wait_on_bit(wq, &wait, action, mode);
> > +}
> > +EXPORT_SYMBOL(out_of_line_wait_on_bit_timeout);
> > +
> >  int __sched
> >  __wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
> >  			wait_bit_action_f *action, unsigned mode)
> > @@ -520,3 +532,27 @@ __sched int bit_wait_io(struct wait_bit_key *word)
> >  	return 0;
> >  }
> >  EXPORT_SYMBOL(bit_wait_io);
> > +
> > +__sched int bit_wait_timeout(struct wait_bit_key *word)
> > +{
> > +	unsigned long now = ACCESS_ONCE(jiffies);
> > +	if (signal_pending_state(current->state, current))
> > +		return 1;
> > +	if (time_after_eq(now, word->timeout))
> > +		return -EAGAIN;
> > +	schedule_timeout(word->timeout - now);
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL(bit_wait_timeout);
> > +
> > +__sched int bit_wait_io_timeout(struct wait_bit_key *word)
> > +{
> > +	unsigned long now = ACCESS_ONCE(jiffies);
> > +	if (signal_pending_state(current->state, current))
> > +		return 1;
> > +	if (time_after_eq(now, word->timeout))
> > +		return -EAGAIN;
> > +	io_schedule_timeout(word->timeout - now);
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL(bit_wait_io_timeout);
> 
> New scheduler APIs should be exported via EXPORT_SYMBOL_GPL().
> 

Fine with me.

 Trond, can you just edit that into the patch you have, or do you want me to
 re-send?
 Also maybe added Jeff's 
    Acked-by: Jeff Layton <jlayton@primarydata.com>
 to the NFS bits.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/5] SCHED: add some "wait..on_bit...timeout()" interfaces.
  2014-09-25  3:23     ` NeilBrown
@ 2014-09-25  3:28       ` Trond Myklebust
  0 siblings, 0 replies; 13+ messages in thread
From: Trond Myklebust @ 2014-09-25  3:28 UTC (permalink / raw)
  To: NeilBrown
  Cc: Ingo Molnar, Linux NFS Mailing List, Linux Kernel mailing list,
	linux-mm, Ingo Molnar, Devel FS Linux, Andrew Morton,
	Jeff Layton, Peter Zijlstra

On Wed, Sep 24, 2014 at 11:23 PM, NeilBrown <neilb@suse.de> wrote:
> On Wed, 24 Sep 2014 09:04:18 +0200 Ingo Molnar <mingo@kernel.org> wrote:
>
>>
>> * NeilBrown <neilb@suse.de> wrote:
>>
>> > @@ -859,6 +860,8 @@ int wake_bit_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
>> >
>> >  extern int bit_wait(struct wait_bit_key *);
>> >  extern int bit_wait_io(struct wait_bit_key *);
>> > +extern int bit_wait_timeout(struct wait_bit_key *);
>> > +extern int bit_wait_io_timeout(struct wait_bit_key *);
>> >
>> >  /**
>> >   * wait_on_bit - wait for a bit to be cleared
>> > diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
>> > index 15cab1a4f84e..380678b3cba4 100644
>> > --- a/kernel/sched/wait.c
>> > +++ b/kernel/sched/wait.c
>> > @@ -343,6 +343,18 @@ int __sched out_of_line_wait_on_bit(void *word, int bit,
>> >  }
>> >  EXPORT_SYMBOL(out_of_line_wait_on_bit);
>> >
>> > +int __sched out_of_line_wait_on_bit_timeout(
>> > +   void *word, int bit, wait_bit_action_f *action,
>> > +   unsigned mode, unsigned long timeout)
>> > +{
>> > +   wait_queue_head_t *wq = bit_waitqueue(word, bit);
>> > +   DEFINE_WAIT_BIT(wait, word, bit);
>> > +
>> > +   wait.key.timeout = jiffies + timeout;
>> > +   return __wait_on_bit(wq, &wait, action, mode);
>> > +}
>> > +EXPORT_SYMBOL(out_of_line_wait_on_bit_timeout);
>> > +
>> >  int __sched
>> >  __wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
>> >                     wait_bit_action_f *action, unsigned mode)
>> > @@ -520,3 +532,27 @@ __sched int bit_wait_io(struct wait_bit_key *word)
>> >     return 0;
>> >  }
>> >  EXPORT_SYMBOL(bit_wait_io);
>> > +
>> > +__sched int bit_wait_timeout(struct wait_bit_key *word)
>> > +{
>> > +   unsigned long now = ACCESS_ONCE(jiffies);
>> > +   if (signal_pending_state(current->state, current))
>> > +           return 1;
>> > +   if (time_after_eq(now, word->timeout))
>> > +           return -EAGAIN;
>> > +   schedule_timeout(word->timeout - now);
>> > +   return 0;
>> > +}
>> > +EXPORT_SYMBOL(bit_wait_timeout);
>> > +
>> > +__sched int bit_wait_io_timeout(struct wait_bit_key *word)
>> > +{
>> > +   unsigned long now = ACCESS_ONCE(jiffies);
>> > +   if (signal_pending_state(current->state, current))
>> > +           return 1;
>> > +   if (time_after_eq(now, word->timeout))
>> > +           return -EAGAIN;
>> > +   io_schedule_timeout(word->timeout - now);
>> > +   return 0;
>> > +}
>> > +EXPORT_SYMBOL(bit_wait_io_timeout);
>>
>> New scheduler APIs should be exported via EXPORT_SYMBOL_GPL().
>>
>
> Fine with me.
>
>  Trond, can you just edit that into the patch you have, or do you want me to
>  re-send?
>  Also maybe added Jeff's
>     Acked-by: Jeff Layton <jlayton@primarydata.com>
>  to the NFS bits.
>

Can you please resend just this patch so that the final version goes
out to linux-mm, linux-kernel etc?
I can edit in the Acked-by Jeff to the NFS bits as I apply.

-- 
Trond Myklebust

Linux NFS client maintainer, PrimaryData

trond.myklebust@primarydata.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/5 - resend] SCHED: add some "wait..on_bit...timeout()" interfaces.
  2014-09-24  7:04   ` Ingo Molnar
  2014-09-25  3:23     ` NeilBrown
@ 2014-09-25  3:55     ` NeilBrown
  2014-09-25  5:01       ` Ingo Molnar
  1 sibling, 1 reply; 13+ messages in thread
From: NeilBrown @ 2014-09-25  3:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Trond Myklebust, linux-nfs, linux-kernel, linux-mm, Ingo Molnar,
	linux-fsdevel, Andrew Morton, Jeff Layton, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 5259 bytes --]


In commit c1221321b7c25b53204447cff9949a6d5a7ddddc
   sched: Allow wait_on_bit_action() functions to support a timeout

I suggested that a "wait_on_bit_timeout()" interface would not meet my
need.  This isn't true - I was just over-engineering.

Including a 'private' field in wait_bit_key instead of a focused
"timeout" field was just premature generalization.  If some other
use is ever found, it can be generalized or added later.

So this patch renames "private" to "timeout" with a meaning "stop
waiting when "jiffies" reaches or passes "timeout",
and adds two of the many possible wait..bit..timeout() interfaces:

wait_on_page_bit_killable_timeout(), which is the one I want to use,
and out_of_line_wait_on_bit_timeout() which is a reasonably general
example.  Others can be added as needed.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NeilBrown <neilb@suse.de>

---
This time with EXPORT_SYMBOL_GPL.

Thanks,
NeilBrown


diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 3df8c7db7a4e..87f9e4230d3a 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -502,6 +502,8 @@ static inline int lock_page_or_retry(struct page *page, struct mm_struct *mm,
 extern void wait_on_page_bit(struct page *page, int bit_nr);
 
 extern int wait_on_page_bit_killable(struct page *page, int bit_nr);
+extern int wait_on_page_bit_killable_timeout(struct page *page,
+					     int bit_nr, unsigned long timeout);
 
 static inline int wait_on_page_locked_killable(struct page *page)
 {
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 6fb1ba5f9b2f..80115bf88671 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -25,7 +25,7 @@ struct wait_bit_key {
 	void			*flags;
 	int			bit_nr;
 #define WAIT_ATOMIC_T_BIT_NR	-1
-	unsigned long		private;
+	unsigned long		timeout;
 };
 
 struct wait_bit_queue {
@@ -154,6 +154,7 @@ int __wait_on_bit_lock(wait_queue_head_t *, struct wait_bit_queue *, wait_bit_ac
 void wake_up_bit(void *, int);
 void wake_up_atomic_t(atomic_t *);
 int out_of_line_wait_on_bit(void *, int, wait_bit_action_f *, unsigned);
+int out_of_line_wait_on_bit_timeout(void *, int, wait_bit_action_f *, unsigned, unsigned long);
 int out_of_line_wait_on_bit_lock(void *, int, wait_bit_action_f *, unsigned);
 int out_of_line_wait_on_atomic_t(atomic_t *, int (*)(atomic_t *), unsigned);
 wait_queue_head_t *bit_waitqueue(void *, int);
@@ -859,6 +860,8 @@ int wake_bit_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
 
 extern int bit_wait(struct wait_bit_key *);
 extern int bit_wait_io(struct wait_bit_key *);
+extern int bit_wait_timeout(struct wait_bit_key *);
+extern int bit_wait_io_timeout(struct wait_bit_key *);
 
 /**
  * wait_on_bit - wait for a bit to be cleared
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 15cab1a4f84e..5a62915f47a8 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -343,6 +343,18 @@ int __sched out_of_line_wait_on_bit(void *word, int bit,
 }
 EXPORT_SYMBOL(out_of_line_wait_on_bit);
 
+int __sched out_of_line_wait_on_bit_timeout(
+	void *word, int bit, wait_bit_action_f *action,
+	unsigned mode, unsigned long timeout)
+{
+	wait_queue_head_t *wq = bit_waitqueue(word, bit);
+	DEFINE_WAIT_BIT(wait, word, bit);
+
+	wait.key.timeout = jiffies + timeout;
+	return __wait_on_bit(wq, &wait, action, mode);
+}
+EXPORT_SYMBOL_GPL(out_of_line_wait_on_bit_timeout);
+
 int __sched
 __wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
 			wait_bit_action_f *action, unsigned mode)
@@ -520,3 +532,27 @@ __sched int bit_wait_io(struct wait_bit_key *word)
 	return 0;
 }
 EXPORT_SYMBOL(bit_wait_io);
+
+__sched int bit_wait_timeout(struct wait_bit_key *word)
+{
+	unsigned long now = ACCESS_ONCE(jiffies);
+	if (signal_pending_state(current->state, current))
+		return 1;
+	if (time_after_eq(now, word->timeout))
+		return -EAGAIN;
+	schedule_timeout(word->timeout - now);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(bit_wait_timeout);
+
+__sched int bit_wait_io_timeout(struct wait_bit_key *word)
+{
+	unsigned long now = ACCESS_ONCE(jiffies);
+	if (signal_pending_state(current->state, current))
+		return 1;
+	if (time_after_eq(now, word->timeout))
+		return -EAGAIN;
+	io_schedule_timeout(word->timeout - now);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(bit_wait_io_timeout);
diff --git a/mm/filemap.c b/mm/filemap.c
index 90effcdf948d..cbe5a9013f70 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -703,6 +703,19 @@ int wait_on_page_bit_killable(struct page *page, int bit_nr)
 			     bit_wait_io, TASK_KILLABLE);
 }
 
+int wait_on_page_bit_killable_timeout(struct page *page,
+				       int bit_nr, unsigned long timeout)
+{
+	DEFINE_WAIT_BIT(wait, &page->flags, bit_nr);
+
+	wait.key.timeout = jiffies + timeout;
+	if (!test_bit(bit_nr, &page->flags))
+		return 0;
+	return __wait_on_bit(page_waitqueue(page), &wait,
+			     bit_wait_io_timeout, TASK_KILLABLE);
+}
+EXPORT_SYMBOL_GPL(wait_on_page_bit_killable_timeout);
+
 /**
  * add_page_wait_queue - Add an arbitrary waiter to a page's wait queue
  * @page: Page defining the wait queue of interest

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/5 - resend] SCHED: add some "wait..on_bit...timeout()" interfaces.
  2014-09-25  3:55     ` [PATCH 1/5 - resend] " NeilBrown
@ 2014-09-25  5:01       ` Ingo Molnar
  0 siblings, 0 replies; 13+ messages in thread
From: Ingo Molnar @ 2014-09-25  5:01 UTC (permalink / raw)
  To: NeilBrown
  Cc: Trond Myklebust, linux-nfs, linux-kernel, linux-mm, Ingo Molnar,
	linux-fsdevel, Andrew Morton, Jeff Layton, Peter Zijlstra


* NeilBrown <neilb@suse.de> wrote:

> 
> In commit c1221321b7c25b53204447cff9949a6d5a7ddddc
>    sched: Allow wait_on_bit_action() functions to support a timeout
> 
> I suggested that a "wait_on_bit_timeout()" interface would not meet my
> need.  This isn't true - I was just over-engineering.
> 
> Including a 'private' field in wait_bit_key instead of a focused
> "timeout" field was just premature generalization.  If some other
> use is ever found, it can be generalized or added later.
> 
> So this patch renames "private" to "timeout" with a meaning "stop
> waiting when "jiffies" reaches or passes "timeout",
> and adds two of the many possible wait..bit..timeout() interfaces:
> 
> wait_on_page_bit_killable_timeout(), which is the one I want to use,
> and out_of_line_wait_on_bit_timeout() which is a reasonably general
> example.  Others can be added as needed.
> 
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: NeilBrown <neilb@suse.de>
> 
> ---
> This time with EXPORT_SYMBOL_GPL.

Looks good to me, thanks!

Acked-by: Ingo Molnar <mingo@kernel.org>

	Ingo

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2014-09-25  5:01 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-24  1:28 [PATCH 0/5] Remove possible deadlocks in nfs_release_page() - V3 NeilBrown
2014-09-24  1:28 ` [PATCH 3/5] NFS: avoid deadlocks with loop-back mounted NFS filesystems NeilBrown
2014-09-24  1:28 ` [PATCH 2/5] MM: export page_wakeup functions NeilBrown
2014-09-24  1:28 ` [PATCH 4/5] NFS: avoid waiting at all in nfs_release_page when congested NeilBrown
2014-09-24  1:28 ` [PATCH 1/5] SCHED: add some "wait..on_bit...timeout()" interfaces NeilBrown
2014-09-24  7:04   ` Ingo Molnar
2014-09-25  3:23     ` NeilBrown
2014-09-25  3:28       ` Trond Myklebust
2014-09-25  3:55     ` [PATCH 1/5 - resend] " NeilBrown
2014-09-25  5:01       ` Ingo Molnar
2014-09-24  1:28 ` [PATCH 5/5] NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page() NeilBrown
2014-09-24  2:06 ` [PATCH 0/5] Remove possible deadlocks in nfs_release_page() - V3 Trond Myklebust
2014-09-24 11:27 ` Jeff Layton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).