From: Neil Brown <neilb@suse.de>
To: Rik van Riel <riel@redhat.com>, Andrew Morton <akpm@linux-foundation.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Wu Fengguang <fengguang.wu@intel.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Deadlock possibly caused by too_many_isolated.
Date: Wed, 15 Sep 2010 09:11:18 +1000
Message-ID: <20100915091118.3dbdc961@notabene>

Hi,

I recently had a customer (running 2.6.32) report a deadlock during very
intensive IO with lots of processes.

Having looked at the stack traces, my guess as to the problem is this:

There are enough threads in direct reclaim that too_many_isolated() is
returning true, and so some threads are blocked in shrink_inactive_list.

Those threads that are not blocked there are attempting to do filesystem
writeout, but that is blocked because...

Some threads that are blocked there hold some IO lock (probably in the
filesystem) and are trying to allocate memory inside the block device
(md/raid1 to be precise), which allocates with GFP_NOIO and has a mempool
to fall back on.  As these threads don't have __GFP_IO set, they should
not really be blocked by other threads that are doing IO.  But it seems
they are.

So I'm wondering if the loop in shrink_inactive_list should abort if
__GFP_IO is not set ... and maybe if __GFP_FS is not set too???

Below is a patch that I'm asking the customer to test.  If anyone can
point out a flaw in my reasoning, suggest any other alternatives, provide
a better patch, or otherwise help me out here, I would greatly appreciate
it.

(I sent this email to the people mentioned in commit:

commit 35cd78156c499ef83f60605e4643d5a98fef14fd
Author: Rik van Riel <riel@redhat.com>
Date:   Mon Sep 21 17:01:38 2009 -0700

    vmscan: throttle direct reclaim when too many pages are isolated already

plus the obvious mailing lists.)

Thanks,
NeilBrown

Index: linux-2.6.32-SLE11-SP1/mm/vmscan.c
===================================================================
--- linux-2.6.32-SLE11-SP1.orig/mm/vmscan.c	2010-09-15 08:37:32.000000000 +1000
+++ linux-2.6.32-SLE11-SP1/mm/vmscan.c	2010-09-15 08:38:57.000000000 +1000
@@ -1106,6 +1106,11 @@ static unsigned long shrink_inactive_lis
 		/* We are about to die and free our memory. Return now. */
 		if (fatal_signal_pending(current))
 			return SWAP_CLUSTER_MAX;
+		if (!(sc->gfp_mask & __GFP_IO))
+			/* Not allowed to do IO, so mustn't wait
+			 * on processes that might try to
+			 */
+			return SWAP_CLUSTER_MAX;
 	}
 
 	/*
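For context, a rough sketch of the code the patch touches, paraphrased from
the commit cited above and from shrink_inactive_list() in mainline 2.6.32
(the SLE11-SP1 tree may differ in detail).  too_many_isolated() throttles
direct reclaimers once the number of isolated pages exceeds the number of
inactive pages on the relevant LRU:

	static int too_many_isolated(struct zone *zone, int file,
			struct scan_control *sc)
	{
		unsigned long inactive, isolated;

		/* kswapd is never throttled here */
		if (current_is_kswapd())
			return 0;

		/* memcg reclaim is not throttled here either */
		if (!scanning_global_lru(sc))
			return 0;

		if (file) {
			inactive = zone_page_state(zone, NR_INACTIVE_FILE);
			isolated = zone_page_state(zone, NR_ISOLATED_FILE);
		} else {
			inactive = zone_page_state(zone, NR_INACTIVE_ANON);
			isolated = zone_page_state(zone, NR_ISOLATED_ANON);
		}

		return isolated > inactive;
	}

and shrink_inactive_list() spins on it, which is where the GFP_NOIO
allocators appear to be getting stuck.  With the patch applied the loop
would read approximately:

	while (unlikely(too_many_isolated(zone, file, sc))) {
		congestion_wait(BLK_RW_ASYNC, HZ/10);

		/* We are about to die and free our memory. Return now. */
		if (fatal_signal_pending(current))
			return SWAP_CLUSTER_MAX;

		/* Proposed: not allowed to do IO, so mustn't wait
		 * on processes that might try to.
		 */
		if (!(sc->gfp_mask & __GFP_IO))
			return SWAP_CLUSTER_MAX;
	}

Returning SWAP_CLUSTER_MAX mirrors the fatal-signal case: the caller
behaves as if a full batch had been reclaimed and carries on, so a
GFP_NOIO allocation (e.g. md/raid1 falling back to its mempool) can make
progress instead of waiting behind reclaimers whose progress depends on
IO completing.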