From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Christoph Lameter <clameter@sgi.com>
Cc: akpm@osdl.org, Paul Menage <menage@google.com>, linux-kernel@vger.kernel.org, Nick Piggin <nickpiggin@yahoo.com.au>, linux-mm@kvack.org, Andi Kleen <ak@suse.de>, Paul Jackson <pj@sgi.com>, Dave Chinner <dgc@sgi.com>
Subject: Re: [RFC 0/8] Cpuset aware writeback
Date: Tue, 16 Jan 2007 08:38:10 +0100
Message-ID: <1168933090.22935.30.camel@twins>
In-Reply-To: <20070116054743.15358.77287.sendpatchset@schroedinger.engr.sgi.com>

On Mon, 2007-01-15 at 21:47 -0800, Christoph Lameter wrote:
> Currently cpusets are not able to do proper writeback since
> dirty ratio calculations and writeback are all done for the system
> as a whole. This may result in a large percentage of a cpuset
> to become dirty without writeout being triggered. Under NFS
> this can lead to OOM conditions.
>
> Writeback will occur during the LRU scans. But such writeout
> is not effective since we write page by page and not in inode page
> order (regular writeback).
>
> In order to fix the problem we first of all introduce a method to
> establish a map of nodes that contain dirty pages for each
> inode mapping.
>
> Secondly we modify the dirty limit calculation to be based
> on the active cpuset.
>
> If we are in a cpuset then we select only inodes for writeback
> that have pages on the nodes of the cpuset.
>
> After we have the cpuset throttling in place we can then make
> further fixups:
>
> A. We can do inode based writeout from direct reclaim
> avoiding single page writes to the filesystem.
>
> B. We add a new counter NR_UNRECLAIMABLE that is subtracted
> from the available pages in a node. This allows us to
> accurately calculate the dirty ratio even if large portions
> of the node have been allocated for huge pages or for
> slab pages.

What about mlock'ed pages?

> There are a couple of points where some better ideas could be used:
>
> 1. The nodemask expands the inode structure significantly if the
> architecture allows a high number of nodes. This is only an issue
> for IA64. For that platform we expand the inode structure by 128 bytes
> (to support 1024 nodes). The last patch attempts to address the issue
> by using the knowledge about the maximum possible number of nodes
> determined on bootup to shrink the nodemask.

Not the prettiest indeed, no ideas though.

> 2. The calculation of the per cpuset limits can require looping
> over a number of nodes which may bring the performance of get_dirty_limits
> near pre 2.6.18 performance (before the introduction of the ZVC counters)
> (only for cpuset based limit calculation). There is no way of keeping these
> counters per cpuset since cpusets may overlap.

Well, you gain functionality, you lose some runtime; sad, but probably worth it.

Otherwise it all looks good.

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
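A minimal userspace sketch of the per-cpuset dirty limit calculation discussed above, under stated assumptions: every name here (the sketch_* types and functions), the 64-node bitmask and the three per-node counters are hypothetical simplifications, not code from the RFC patches. It walks the nodes of the cpuset summing per-node counters, subtracts unreclaimable pages (the proposed NR_UNRECLAIMABLE) from each node's size before applying the dirty ratio, and selects an inode for writeback only if its dirty-node map intersects the cpuset's nodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_NODES 64          /* hypothetical: bounded by the 64-bit mask below */

typedef uint64_t sketch_nodemask_t;  /* hypothetical: one bit per node */

/* Hypothetical per-node counters, loosely modeled on the per-zone VM counters. */
struct sketch_node_stats {
	unsigned long present;        /* pages on the node */
	unsigned long unreclaimable;  /* e.g. huge pages, slab: the proposed NR_UNRECLAIMABLE */
	unsigned long dirty;          /* dirty + under writeback pages */
};

struct sketch_dirty_limits {
	unsigned long dirtyable;  /* pages that could in principle hold dirty page cache */
	unsigned long dirty;      /* pages currently dirty on the cpuset's nodes */
	unsigned long limit;      /* dirty_ratio percent of dirtyable */
};

/*
 * Sum the counters over every node in 'mems'. Because cpusets may overlap,
 * nothing is cached per cpuset: each call walks the nodes, which is the
 * extra cost in get_dirty_limits that the RFC mentions.
 */
static struct sketch_dirty_limits
sketch_cpuset_dirty_limits(const struct sketch_node_stats *nodes, int nr_nodes,
			   sketch_nodemask_t mems, unsigned int dirty_ratio)
{
	struct sketch_dirty_limits l = { 0, 0, 0 };

	assert(nr_nodes <= SKETCH_MAX_NODES && dirty_ratio <= 100);
	for (int n = 0; n < nr_nodes; n++) {
		if (!(mems & ((sketch_nodemask_t)1 << n)))
			continue;
		/* Huge pages and slab cannot hold dirty page cache: drop them from the base. */
		if (nodes[n].present > nodes[n].unreclaimable)
			l.dirtyable += nodes[n].present - nodes[n].unreclaimable;
		l.dirty += nodes[n].dirty;
	}
	l.limit = l.dirtyable * dirty_ratio / 100;
	return l;
}

/* Writeback for a cpuset only considers inodes with dirty pages on its nodes. */
static bool sketch_inode_in_cpuset(sketch_nodemask_t inode_dirty_nodes,
				   sketch_nodemask_t mems)
{
	return (inode_dirty_nodes & mems) != 0;
}
```

With two large, mostly clean nodes and two small dirty ones, the system-wide calculation (all four nodes, dirty_ratio 10) sees 1000 dirty pages against a limit of 2050 and triggers nothing, while a cpuset restricted to the two small nodes has 900 dirty pages against a limit of 250, which is the situation the RFC describes.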