* [PATCH] mm: Make truncate_inode_pages_range() killable
@ 2017-04-14 21:55 Bart Van Assche
  2017-04-14 23:45 ` Bart Van Assche
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Bart Van Assche @ 2017-04-14 21:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Bart Van Assche, Oleg Nesterov, Michal Hocko, Mel Gorman,
	Hugh Dickins, Mike Snitzer, Jan Kara, Hannes Reinecke, linux-mm

The default behavior of multipathd is to run kpartx against newly
discovered paths. When queue_if_no_path is in use and no paths are
left, these kpartx processes can become unkillable. Make
truncate_inode_pages_range() killable so that kpartx no longer hangs
sporadically as follows:

Call Trace:
 __schedule+0x3df/0xc10
 schedule+0x3d/0x90
 io_schedule+0x16/0x40
 __lock_page+0x111/0x140
 truncate_inode_pages_range+0x462/0x790
 truncate_inode_pages+0x15/0x20
 kill_bdev+0x35/0x40
 __blkdev_put+0x76/0x220
 blkdev_put+0x4e/0x170
 blkdev_close+0x25/0x30
 __fput+0xed/0x1f0
 ____fput+0xe/0x10
 task_work_run+0x85/0xc0
 do_exit+0x311/0xc70
 do_group_exit+0x50/0xd0
 get_signal+0x2c7/0x930
 do_signal+0x28/0x6b0
 exit_to_usermode_loop+0x62/0xa0
 do_syscall_64+0xda/0x140
 entry_SYSCALL64_slow_path+0x25/0x25

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Hannes Reinecke <hare@suse.com>
Cc: linux-mm@kvack.org
---
 mm/truncate.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index 6263affdef88..91abd16d74f8 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -20,6 +20,7 @@
 #include <linux/task_io_accounting_ops.h>
 #include <linux/buffer_head.h>	/* grr. try_to_release_page,
 				   do_invalidatepage */
+#include <linux/sched/signal.h>
 #include <linux/shmem_fs.h>
 #include <linux/cleancache.h>
 #include <linux/rmap.h>
@@ -366,7 +367,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 		return;
 
 	index = start;
-	for ( ; ; ) {
+	for ( ; !signal_pending_state(TASK_WAKEKILL, current); ) {
 		cond_resched();
 		if (!pagevec_lookup_entries(&pvec, mapping, index,
 			min(end - index, (pgoff_t)PAGEVEC_SIZE), indices)) {
@@ -400,7 +401,8 @@ void truncate_inode_pages_range(struct address_space *mapping,
 				continue;
 			}
 
-			lock_page(page);
+			if (lock_page_killable(page))
+				break;
 			WARN_ON(page_to_index(page) != index);
 			wait_on_page_writeback(page);
 			truncate_inode_page(mapping, page);
-- 
2.12.2
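
The control flow that the two mm/truncate.c hunks introduce can be sketched in userspace. This is a toy model, not kernel code: toy_task, toy_page, toy_lock_page_killable() and toy_truncate_range() are hypothetical names, and the model assumes that a wait on a stuck page only ever ends because a fatal signal arrives. It returns a count purely for illustration, whereas truncate_inode_pages_range() returns void.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not kernel types. */
struct toy_task {
	bool fatal_signal_pending;
};

struct toy_page {
	bool stuck;	/* lock holder never releases the page lock */
	bool truncated;
};

/*
 * Toy counterpart of lock_page_killable(): 0 when the lock is taken,
 * nonzero when the wait is interrupted by a fatal signal.
 * Model assumption: the only way out of a wait on a stuck page is a
 * fatal signal, so the model marks one as pending and bails out.
 */
static int toy_lock_page_killable(struct toy_task *task,
				  const struct toy_page *page)
{
	if (!page->stuck)
		return 0;
	task->fatal_signal_pending = true;
	return -1;
}

/*
 * Mirrors the shape of the patched loop: re-check for a fatal signal
 * before every iteration and break out if the page lock wait is
 * interrupted. Returns how many pages were truncated.
 */
static size_t toy_truncate_range(struct toy_task *task,
				 struct toy_page *pages, size_t n)
{
	size_t truncated = 0;

	assert(task != NULL && (pages != NULL || n == 0));

	for (size_t i = 0; i < n && !task->fatal_signal_pending; i++) {
		if (toy_lock_page_killable(task, &pages[i]))
			break;
		pages[i].truncated = true;
		truncated++;
	}
	return truncated;
}
```

With four pages of which the third is stuck, the model truncates two pages and leaves the remaining ones in place. If a fatal signal is already pending on entry, it truncates nothing at all.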

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [PATCH] mm: Make truncate_inode_pages_range() killable
  2017-04-14 21:55 [PATCH] mm: Make truncate_inode_pages_range() killable Bart Van Assche
@ 2017-04-14 23:45 ` Bart Van Assche
  2017-04-15  0:40 ` Hugh Dickins
  2017-04-18 14:42 ` Oleg Nesterov
  2 siblings, 0 replies; 7+ messages in thread
From: Bart Van Assche @ 2017-04-14 23:45 UTC (permalink / raw)
  To: akpm; +Cc: hughd, linux-mm, snitzer, oleg, hare, mhocko, mgorman, jack

On Fri, 2017-04-14 at 14:55 -0700, Bart Van Assche wrote:
> diff --git a/mm/truncate.c b/mm/truncate.c
> index 6263affdef88..91abd16d74f8 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -20,6 +20,7 @@
>  #include <linux/task_io_accounting_ops.h>
>  #include <linux/buffer_head.h>	/* grr. try_to_release_page,
>  				   do_invalidatepage */
> +#include <linux/sched/signal.h>
>  #include <linux/shmem_fs.h>
>  #include <linux/cleancache.h>
>  #include <linux/rmap.h>
> @@ -366,7 +367,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
>  		return;
>  
>  	index = start;
> -	for ( ; ; ) {
> +	for ( ; !signal_pending_state(TASK_WAKEKILL, current); ) {
>  		cond_resched();
>  		if (!pagevec_lookup_entries(&pvec, mapping, index,
>  			min(end - index, (pgoff_t)PAGEVEC_SIZE), indices)) {
> @@ -400,7 +401,8 @@ void truncate_inode_pages_range(struct address_space *mapping,
>  				continue;
>  			}
>  
> -			lock_page(page);
> +			if (lock_page_killable(page))
> +				break;
>  			WARN_ON(page_to_index(page) != index);
>  			wait_on_page_writeback(page);
>  			truncate_inode_page(mapping, page);

Sorry but a small part of this patch got left out accidentally:

diff --git a/kernel/signal.c b/kernel/signal.c
index 7e59ebc2c25e..a02b273a4a1c 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -869,10 +869,10 @@ static inline int wants_signal(int sig, struct task_struct *p)
 {
 	if (sigismember(&p->blocked, sig))
 		return 0;
-	if (p->flags & PF_EXITING)
-		return 0;
 	if (sig == SIGKILL)
 		return 1;
+	if (p->flags & PF_EXITING)
+		return 0;
 	if (task_is_stopped_or_traced(p))
 		return 0;
 	return task_curr(p) || !signal_pending(p);

Does anyone who is on the CC-list of this e-mail know whether this change
is acceptable? As far as I can see the most recent change to that function
was made through the following commit:

commit 188a1eafa03aaa5e5fe6f53e637e704cd2c31c7c
Author: Linus Torvalds <torvalds@g5.osdl.org>
Date:   Fri Sep 23 13:22:21 2005 -0700

    Make sure SIGKILL gets proper respect

Thanks,

Bart.
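
The effect of swapping the two checks in wants_signal() can be sketched in userspace. This is a simplified model of the hunk above, not the kernel function: struct toy_task and its boolean fields are hypothetical stand-ins for the task_struct state that the real code reads, and TOY_SIGKILL / TOY_SIGTERM are local constants that mirror the Linux signal numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SIGKILL 9	/* same number as SIGKILL on Linux */
#define TOY_SIGTERM 15	/* same number as SIGTERM on Linux */

/* Hypothetical flattened view of the task state that is consulted. */
struct toy_task {
	bool sig_blocked;	/* stands in for sigismember(&p->blocked, sig) */
	bool exiting;		/* stands in for p->flags & PF_EXITING */
	bool stopped_or_traced;	/* stands in for task_is_stopped_or_traced(p) */
	bool on_cpu;		/* stands in for task_curr(p) */
	bool signal_pending;	/* stands in for signal_pending(p) */
};

/* Order before the change: PF_EXITING is checked before SIGKILL. */
static int toy_wants_signal_old(int sig, const struct toy_task *p)
{
	assert(p != NULL);
	if (p->sig_blocked)
		return 0;
	if (p->exiting)
		return 0;
	if (sig == TOY_SIGKILL)
		return 1;
	if (p->stopped_or_traced)
		return 0;
	return p->on_cpu || !p->signal_pending;
}

/* Order after the change: SIGKILL is checked before PF_EXITING. */
static int toy_wants_signal_new(int sig, const struct toy_task *p)
{
	assert(p != NULL);
	if (p->sig_blocked)
		return 0;
	if (sig == TOY_SIGKILL)
		return 1;
	if (p->exiting)
		return 0;
	if (p->stopped_or_traced)
		return 0;
	return p->on_cpu || !p->signal_pending;
}
```

In this model the two orderings differ for exactly one case: SIGKILL aimed at a task that is already exiting, which the old order declines and the new order accepts. Any other signal sent to an exiting task is declined by both.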

* Re: [PATCH] mm: Make truncate_inode_pages_range() killable
  2017-04-14 21:55 [PATCH] mm: Make truncate_inode_pages_range() killable Bart Van Assche
  2017-04-14 23:45 ` Bart Van Assche
@ 2017-04-15  0:40 ` Hugh Dickins
  2017-04-15  0:59   ` Bart Van Assche
  2017-04-18 14:42 ` Oleg Nesterov
  2 siblings, 1 reply; 7+ messages in thread
From: Hugh Dickins @ 2017-04-15  0:40 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Andrew Morton, Oleg Nesterov, Michal Hocko, Mel Gorman,
	Hugh Dickins, Mike Snitzer, Jan Kara, Hannes Reinecke, linux-mm,
	linux-fsdevel

On Fri, 14 Apr 2017, Bart Van Assche wrote:

> The default behavior of multipathd is to run kpartx against newly
> discovered paths. When queue_if_no_path is in use and no paths are
> left, these kpartx processes can become unkillable. Make
> truncate_inode_pages_range() killable so that kpartx no longer hangs
> sporadically as follows:
> 
> Call Trace:
>  __schedule+0x3df/0xc10
>  schedule+0x3d/0x90
>  io_schedule+0x16/0x40
>  __lock_page+0x111/0x140
>  truncate_inode_pages_range+0x462/0x790
>  truncate_inode_pages+0x15/0x20
>  kill_bdev+0x35/0x40
>  __blkdev_put+0x76/0x220
>  blkdev_put+0x4e/0x170
>  blkdev_close+0x25/0x30
>  __fput+0xed/0x1f0
>  ____fput+0xe/0x10
>  task_work_run+0x85/0xc0
>  do_exit+0x311/0xc70
>  do_group_exit+0x50/0xd0
>  get_signal+0x2c7/0x930
>  do_signal+0x28/0x6b0
>  exit_to_usermode_loop+0x62/0xa0
>  do_syscall_64+0xda/0x140
>  entry_SYSCALL64_slow_path+0x25/0x25
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: linux-mm@kvack.org

Changing a fundamental function, silently not to do its essential job,
when something in the kernel has forgotten (or is slow to) unlock_page():
that seems very wrong to me in many ways.  But linux-fsdevel, Cc'ed, will
be a better forum to advise on how to solve the problem you're seeing.

Hugh

> ---
>  mm/truncate.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/truncate.c b/mm/truncate.c
> index 6263affdef88..91abd16d74f8 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -20,6 +20,7 @@
>  #include <linux/task_io_accounting_ops.h>
>  #include <linux/buffer_head.h>	/* grr. try_to_release_page,
>  				   do_invalidatepage */
> +#include <linux/sched/signal.h>
>  #include <linux/shmem_fs.h>
>  #include <linux/cleancache.h>
>  #include <linux/rmap.h>
> @@ -366,7 +367,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
>  		return;
>  
>  	index = start;
> -	for ( ; ; ) {
> +	for ( ; !signal_pending_state(TASK_WAKEKILL, current); ) {
>  		cond_resched();
>  		if (!pagevec_lookup_entries(&pvec, mapping, index,
>  			min(end - index, (pgoff_t)PAGEVEC_SIZE), indices)) {
> @@ -400,7 +401,8 @@ void truncate_inode_pages_range(struct address_space *mapping,
>  				continue;
>  			}
>  
> -			lock_page(page);
> +			if (lock_page_killable(page))
> +				break;
>  			WARN_ON(page_to_index(page) != index);
>  			wait_on_page_writeback(page);
>  			truncate_inode_page(mapping, page);
> -- 
> 2.12.2


* Re: [PATCH] mm: Make truncate_inode_pages_range() killable
  2017-04-15  0:40 ` Hugh Dickins
@ 2017-04-15  0:59   ` Bart Van Assche
  2017-04-18  8:15     ` Michal Hocko
  0 siblings, 1 reply; 7+ messages in thread
From: Bart Van Assche @ 2017-04-15  0:59 UTC (permalink / raw)
  To: hughd
  Cc: linux-mm, snitzer, oleg, akpm, hare, mhocko, linux-fsdevel,
	mgorman, jack

On Fri, 2017-04-14 at 17:40 -0700, Hugh Dickins wrote:
> Changing a fundamental function, silently not to do its essential job,
> when something in the kernel has forgotten (or is slow to) unlock_page():
> that seems very wrong to me in many ways.  But linux-fsdevel, Cc'ed, will
> be a better forum to advise on how to solve the problem you're seeing.

Hello Hugh,

It seems like you have misunderstood the purpose of the patch I posted. It's
neither a missing unlock_page() nor slow I/O that I want to address but a
genuine deadlock. In case you would not be familiar with the queue_if_no_path
multipath configuration option, the multipath.conf man page is available at
e.g. https://linux.die.net/man/5/multipath.conf.

Bart.


* Re: [PATCH] mm: Make truncate_inode_pages_range() killable
  2017-04-15  0:59   ` Bart Van Assche
@ 2017-04-18  8:15     ` Michal Hocko
  2017-04-18 22:09       ` Bart Van Assche
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2017-04-18  8:15 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: hughd, linux-mm, snitzer, oleg, akpm, hare, linux-fsdevel, mgorman, jack

On Sat 15-04-17 00:59:46, Bart Van Assche wrote:
> On Fri, 2017-04-14 at 17:40 -0700, Hugh Dickins wrote:
> > Changing a fundamental function, silently not to do its essential job,
> > when something in the kernel has forgotten (or is slow to) unlock_page():
> > that seems very wrong to me in many ways.  But linux-fsdevel, Cc'ed, will
> > be a better forum to advise on how to solve the problem you're seeing.
> 
> Hello Hugh,
> 
> It seems like you have misunderstood the purpose of the patch I posted. It's
> neither a missing unlock_page() nor slow I/O that I want to address but a
> genuine deadlock. In case you would not be familiar with the queue_if_no_path
> multipath configuration option, the multipath.conf man page is available at
> e.g. https://linux.die.net/man/5/multipath.conf.

So, who is holding the page lock and why can it not make forward
progress? Is the storage gone so that the ongoing IO will never
terminate? Btw. we have many other places which wait for the page lock
in a !killable way. Why are they any different from this case?
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm: Make truncate_inode_pages_range() killable
  2017-04-14 21:55 [PATCH] mm: Make truncate_inode_pages_range() killable Bart Van Assche
  2017-04-14 23:45 ` Bart Van Assche
  2017-04-15  0:40 ` Hugh Dickins
@ 2017-04-18 14:42 ` Oleg Nesterov
  2 siblings, 0 replies; 7+ messages in thread
From: Oleg Nesterov @ 2017-04-18 14:42 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Andrew Morton, Michal Hocko, Mel Gorman, Hugh Dickins,
	Mike Snitzer, Jan Kara, Hannes Reinecke, linux-mm

On 04/14, Bart Van Assche wrote:
>
> On Fri, 2017-04-14 at 14:55 -0700, Bart Van Assche wrote:
> > diff --git a/mm/truncate.c b/mm/truncate.c
> > index 6263affdef88..91abd16d74f8 100644
> > --- a/mm/truncate.c
> > +++ b/mm/truncate.c
> > @@ -20,6 +20,7 @@
> >  #include <linux/task_io_accounting_ops.h>
> >  #include <linux/buffer_head.h>	/* grr. try_to_release_page,
> >  				   do_invalidatepage */
> > +#include <linux/sched/signal.h>
> >  #include <linux/shmem_fs.h>
> >  #include <linux/cleancache.h>
> >  #include <linux/rmap.h>
> > @@ -366,7 +367,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
> >  		return;
> >
> >  	index = start;
> > -	for ( ; ; ) {
> > +	for ( ; !signal_pending_state(TASK_WAKEKILL, current); ) {

you could just use fatal_signal_pending(current)

> Sorry but a small part of this patch got left out accidentally:
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 7e59ebc2c25e..a02b273a4a1c 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -869,10 +869,10 @@ static inline int wants_signal(int sig, struct task_struct *p)
>  {
>  	if (sigismember(&p->blocked, sig))
>  		return 0;
> -	if (p->flags & PF_EXITING)
> -		return 0;
>  	if (sig == SIGKILL)
>  		return 1;
> +	if (p->flags & PF_EXITING)
> +		return 0;

Oh. This is a user-visible change. With this change, a private signal sent to
a zombie thread will kill the process. Perhaps this is even good, and in fact
I have thought about this change many times, but I am not sure.

And afaics it won't really help. If the exiting task is multithreaded then another
kill(SIGKILL) won't wake the other threads up; you will need tkill(tid_of_blocked_thread).

OTOH, please note that fatal_signal_pending(exiting_thread) can be true even if you
do not send another SIGKILL.

But the main problem is that the behaviour of a signal sent to a PF_EXITING task is not
defined; it is not clear to me what we actually want to do.

Oleg.
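
The suggestion to use fatal_signal_pending(current) in place of signal_pending_state(TASK_WAKEKILL, current) can be checked against a small userspace model. The two helpers below paraphrase how these kernel helpers are understood to have been defined in include/linux/sched/signal.h at the time; that reading is an assumption and should be verified against the tree in question. struct toy_task, the TOY_* state bits and their values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state bits; the values are illustrative only. */
#define TOY_TASK_INTERRUPTIBLE	0x1L
#define TOY_TASK_WAKEKILL	0x2L

struct toy_task {
	bool sigpending;	/* stands in for signal_pending(p) */
	bool sigkill_pending;	/* stands in for __fatal_signal_pending(p) */
};

/* Assumed shape of signal_pending_state(state, p). */
static int toy_signal_pending_state(long state, const struct toy_task *p)
{
	assert(p != NULL);
	if (!(state & (TOY_TASK_INTERRUPTIBLE | TOY_TASK_WAKEKILL)))
		return 0;
	if (!p->sigpending)
		return 0;
	return (state & TOY_TASK_INTERRUPTIBLE) || p->sigkill_pending;
}

/* Assumed shape of fatal_signal_pending(p). */
static int toy_fatal_signal_pending(const struct toy_task *p)
{
	assert(p != NULL);
	return p->sigpending && p->sigkill_pending;
}
```

Under these assumed definitions, passing only the WAKEKILL bit makes the first helper return the same result as the second for every combination of the two flags, so the shorter spelling would express the same loop condition.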


* Re: [PATCH] mm: Make truncate_inode_pages_range() killable
  2017-04-18  8:15     ` Michal Hocko
@ 2017-04-18 22:09       ` Bart Van Assche
  0 siblings, 0 replies; 7+ messages in thread
From: Bart Van Assche @ 2017-04-18 22:09 UTC (permalink / raw)
  To: mhocko
  Cc: hughd, linux-mm, snitzer, oleg, akpm, hare, linux-fsdevel, mgorman, jack

On Tue, 2017-04-18 at 10:15 +0200, Michal Hocko wrote:
> On Sat 15-04-17 00:59:46, Bart Van Assche wrote:
> > On Fri, 2017-04-14 at 17:40 -0700, Hugh Dickins wrote:
> > > Changing a fundamental function, silently not to do its essential job,
> > > when something in the kernel has forgotten (or is slow to) unlock_page():
> > > that seems very wrong to me in many ways.  But linux-fsdevel, Cc'ed, will
> > > be a better forum to advise on how to solve the problem you're seeing.
> > 
> > It seems like you have misunderstood the purpose of the patch I posted. It's
> > neither a missing unlock_page() nor slow I/O that I want to address but a
> > genuine deadlock. In case you would not be familiar with the queue_if_no_path
> > multipath configuration option, the multipath.conf man page is available at
> > e.g. https://linux.die.net/man/5/multipath.conf.
> 
> So, who is holding the page lock and why can it not make forward
> progress? Is the storage gone so that the ongoing IO will never
> terminate? Btw. we have many other places which wait for the page lock
> in a !killable way. Why are they any different from this case?

Hello Michal,

queue_if_no_path means that if no paths are available, the dm-mpath driver
does not complete an I/O request until a path becomes available. A standard
test for multipathed storage is to alternately remove and restore all paths.

If the reported lockup happens at the end of a test I can break the cycle by
running "dmsetup message ${mpath} 0 fail_if_no_path". That command causes
pending I/O requests to fail if no paths are available.

I think it is rather unintuitive that kill -9 does not work for a process that
uses a dm-mpath device for I/O as long as no paths are available.

The call stack I reported in the first e-mail in this thread is what I ran
into while running multipath tests. I'm not sure why I have not yet hit any
other code paths that perform an unkillable wait on a page lock.

Bart.
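
The queue_if_no_path behaviour described above can be sketched as a tiny decision function. This is a toy userspace model, not dm-mpath code: struct toy_mpath, enum toy_io_result and toy_submit_io() are hypothetical names, and the model reduces the fail_if_no_path message to clearing a single flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of submitting one I/O request. */
enum toy_io_result {
	TOY_IO_COMPLETED,	/* a path was available */
	TOY_IO_QUEUED,		/* held until a path becomes available */
	TOY_IO_FAILED		/* completed with an error */
};

/* Hypothetical view of a multipath device. */
struct toy_mpath {
	unsigned int nr_paths;		/* number of usable paths */
	bool queue_if_no_path;		/* cleared by "fail_if_no_path" in this model */
};

/*
 * With at least one path the request completes. Without paths it is
 * either held indefinitely (queue_if_no_path) or failed.
 */
static enum toy_io_result toy_submit_io(const struct toy_mpath *m)
{
	assert(m != NULL);
	if (m->nr_paths > 0)
		return TOY_IO_COMPLETED;
	return m->queue_if_no_path ? TOY_IO_QUEUED : TOY_IO_FAILED;
}
```

In the model, a request submitted while no paths exist stays queued for as long as queue_if_no_path is set, which is where a waiter on the corresponding page lock would remain blocked. Clearing the flag turns the same situation into an I/O failure.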