From: Christoph Lameter <clameter@sgi.com>
To: Andrew Morton <akpm@osdl.org>
Cc: menage@google.com, linux-kernel@vger.kernel.org, nickpiggin@yahoo.com.au, linux-mm@kvack.org, ak@suse.de, pj@sgi.com, dgc@sgi.com
Subject: Re: [RFC 0/8] Cpuset aware writeback
Date: Tue, 16 Jan 2007 14:15:56 -0800 (PST)
Message-ID: <Pine.LNX.4.64.0701161407530.3545@schroedinger.engr.sgi.com>
In-Reply-To: <20070116135325.3441f62b.akpm@osdl.org>

On Tue, 16 Jan 2007, Andrew Morton wrote:

> > On Mon, 15 Jan 2007 21:47:43 -0800 (PST) Christoph Lameter <clameter@sgi.com> wrote:
> >
> > Currently cpusets are not able to do proper writeback since
> > dirty ratio calculations and writeback are all done for the system
> > as a whole.
>
> We _do_ do proper writeback. But it's less efficient than it might be, and
> there's an NFS problem.

Well yes, we write back during LRU scans when a potentially high
percentage of the memory in a cpuset is dirty.

> > This may result in a large percentage of a cpuset
> > to become dirty without writeout being triggered. Under NFS
> > this can lead to OOM conditions.
>
> OK, a big question: is this patchset a performance improvement or a
> correctness fix? Given the above, and the lack of benchmark results I'm
> assuming it's for correctness.

It is a correctness fix both for NFS OOM and for doing proper cpuset
writeout.

> - Why does NFS go oom? Because it allocates potentially-unbounded
>   numbers of requests in the writeback path?
>
>   It was able to go oom on non-numa machines before dirty-page-tracking
>   went in. So a general problem has now become specific to some NUMA
>   setups.

Right. The issue is that large portions of memory become dirty /
writeback since no writeback occurs because dirty limits are not checked
for a cpuset. Then NFS attempts writeout during LRU scans but is unable
to allocate memory.

> So an obvious, equivalent and vastly simpler "fix" would be to teach
> the NFS client to go off-cpuset when trying to allocate these requests.

Yes, we can fix these allocations by allowing processes to allocate from
other nodes. But then the container function of cpusets is no longer
there.

> (But is it really bad? What actual problems will it cause once NFS is fixed?)

NFS is okay as far as I can tell. Dirty throttling works fine in
non-cpuset environments because we throttle if 40% of memory becomes
dirty or under writeback.

> I don't understand why the proposed patches are cpuset-aware at all. This
> is a per-zone problem, and a per-zone fix would seem to be appropriate, and
> more general. For example, i386 machines can presumably get into trouble
> if all of ZONE_DMA or ZONE_NORMAL get dirty. A good implementation would
> address that problem as well. So I think it should all be per-zone?

No. A zone can be completely dirty as long as we are allowed to allocate
from other zones.

> Do we really need those per-inode cpumasks? When page reclaim encounters a
> dirty page on the zone LRU, we automatically know that page->mapping->host
> has at least one dirty page in this zone, yes? We could immediately ask

Yes, but when we enter reclaim most of the pages of a zone may already be
dirty/writeback, so we fail. Also, when we enter reclaim we may not have
the proper process / cpuset context. There is no use in throttling
kswapd. We need to throttle the process that is dirtying memory.

> But all of this is, I think, unneeded if NFS is fixed. It's hopefully a
> performance optimisation to permit writeout in a less seeky fashion.
> Unless there's some other problem with excessively dirty zones.

The patchset improves performance because the filesystem can do
sequential writeouts. So yes, in some ways this is a performance
improvement. But this is only because this patch makes dirty throttling
for cpusets work in the same way as for non-NUMA systems.
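The throttling gap discussed in this message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct toy_node`, `toy_over_dirty_limit()`, `TOY_MAX_NODES` and the plain bitmask argument are hypothetical names invented for illustration, not kernel interfaces and not code from the patchset. Only the 40% threshold is taken from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_NODES   64   /* hypothetical: one bit per node in the mask */
#define TOY_DIRTY_RATIO 40   /* percent; the threshold quoted in the message */

/* Hypothetical per-node accounting for this model only. */
struct toy_node {
    unsigned long pages;     /* total pages on the node */
    unsigned long dirty;     /* pages that are dirty or under writeback */
};

/*
 * Return true if the dirty + writeback pages on the nodes selected by
 * 'mask' exceed TOY_DIRTY_RATIO percent of the memory on those nodes.
 * A mask covering every node models the system-wide check; a mask
 * covering only a cpuset's nodes models a per-cpuset check.
 */
static bool toy_over_dirty_limit(const struct toy_node *nodes,
                                 size_t nr_nodes,
                                 unsigned long long mask)
{
    unsigned long pages = 0;
    unsigned long dirty = 0;

    assert(nr_nodes <= TOY_MAX_NODES);

    for (size_t i = 0; i < nr_nodes; i++) {
        if (mask & (1ULL << i)) {
            pages += nodes[i].pages;
            dirty += nodes[i].dirty;
        }
    }

    /* Integer form of dirty / pages > TOY_DIRTY_RATIO / 100. */
    return pages != 0 && dirty * 100 > pages * TOY_DIRTY_RATIO;
}
```

In this model, take eight nodes of 1000 pages each with 900 dirty pages on node 0 and none elsewhere. Checked against all nodes (mask `0xff`), the ratio is 900/8000, about 11%, so the function returns false and nothing is throttled. Checked against a cpuset containing only node 0 (mask `0x1`), the ratio is 90% and the function returns true. That difference is the case the message describes, where a cpuset fills with dirty pages while the system-wide limit is never reached.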