From: Andrew Morton <akpm@osdl.org>
To: Christoph Lameter <clameter@sgi.com>
Cc: menage@google.com, linux-kernel@vger.kernel.org,
	nickpiggin@yahoo.com.au, linux-mm@kvack.org, ak@suse.de,
	pj@sgi.com, dgc@sgi.com
Subject: Re: [RFC 0/8] Cpuset aware writeback
Date: Tue, 16 Jan 2007 23:00:34 -0800
Message-ID: <20070116230034.b8cb4263.akpm@osdl.org>
In-Reply-To: <Pine.LNX.4.64.0701162219180.5215@schroedinger.engr.sgi.com>

> On Tue, 16 Jan 2007 22:27:36 -0800 (PST) Christoph Lameter <clameter@sgi.com> wrote:
> On Tue, 16 Jan 2007, Andrew Morton wrote:
> 
> > > Yes this is the result of the hierarchical nature of cpusets which already
> > > causes issues with the scheduler. It is rather typical that cpusets are
> > > used to partition the memory and cpus. Overlapping cpusets seem to have
> > > mainly an administrative function. Paul?
> > 
> > The typical usage scenarios don't matter a lot: the examples I gave show
> > that the core problem remains unsolved.  People can still hit the bug.
> 
> I agree the overlap issue is a problem and I hope it can be addressed 
> somehow for the rare cases in which such nesting takes place.
> 
> One easy solution may be to check the dirty ratio before engaging in 
> reclaim. If the dirty ratio is sufficiently high then trigger writeout via 
> pdflush (we already wakeup pdflush while scanning and you already noted 
> that pdflush writeout is not occurring within the context of the current 
> cpuset) and pass over any dirty pages during LRU scans until some pages 
> have been cleaned up.
> 
> This means we allow allocation of additional kernel memory outside of the 
> cpuset while triggering writeout of inodes that have pages on the nodes 
> of the cpuset. The memory directly used by the application is still 
> limited. Just the temporary information needed for writeback is allocated 
> outside.

Gad.  None of that should be necessary.

> Well sounds somehow still like a hack. Any other ideas out there?

Do what blockdevs do: limit the number of in-flight requests (Peter's
recent patch seems to be doing that for us; perhaps only when PF_MEMALLOC
is in effect, to keep Trond happy) and implement a mempool for the NFS
request critical store.  Additionally:

- we might need to twiddle the NFS gfp_flags so it doesn't call the
  oom-killer on failure: just return NULL.

- consider going off-cpuset for critical allocations.  It's better than
  going oom.  A suitable implementation might be to ignore the caller's
  cpuset if PF_MEMALLOC.  Maybe put a WARN_ON_ONCE in there: we prefer that
  it not happen and we want to know when it does.
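
As a rough illustration of the suggestions above, here is a userspace C
sketch, not kernel code.  Everything in it is an assumption made for the
example: the names (req_pool, req_get, req_put, pick_node), the constants,
and the simulate_pressure test hook are hypothetical, and the "memalloc"
argument merely stands in for PF_MEMALLOC.  It shows three things: a cap on
in-flight requests, a small preallocated reserve that is used when the
normal allocation fails (the idea behind a mempool), and a node picker that
leaves the caller's cpuset only for critical allocations, warning once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_IN_FLIGHT 4 /* hypothetical cap on outstanding requests */
#define RESERVE_SIZE  2 /* hypothetical size of the emergency reserve */
#define NR_NODES      4 /* hypothetical number of memory nodes */

struct request {
    bool from_reserve;
};

struct req_pool {
    int in_flight;
    struct request reserve[RESERVE_SIZE];
    bool reserve_used[RESERVE_SIZE];
    bool simulate_pressure; /* test hook: make the normal allocation fail */
};

/* Returns NULL (rather than escalating) when the cap is reached or when
 * both the normal allocation and the reserve are exhausted. */
static struct request *req_get(struct req_pool *p)
{
    struct request *r = NULL;

    if (p->in_flight >= MAX_IN_FLIGHT)
        return NULL; /* caller has to wait for a completion */

    if (!p->simulate_pressure)
        r = malloc(sizeof(*r));
    if (r) {
        r->from_reserve = false;
    } else {
        /* Normal allocation failed: fall back to the reserve. */
        for (int i = 0; i < RESERVE_SIZE && !r; i++) {
            if (!p->reserve_used[i]) {
                p->reserve_used[i] = true;
                r = &p->reserve[i];
                r->from_reserve = true;
            }
        }
    }
    if (r)
        p->in_flight++;
    return r;
}

static void req_put(struct req_pool *p, struct request *r)
{
    assert(p->in_flight > 0);
    if (r->from_reserve)
        p->reserve_used[r - p->reserve] = false; /* slot is reusable */
    else
        free(r);
    p->in_flight--;
}

static int off_cpuset_count; /* how often we had to leave the cpuset */

/* Pick a node with free pages, or return -1.  "allowed" is a bitmask of
 * the nodes in the caller's cpuset; "memalloc" stands in for PF_MEMALLOC. */
static int pick_node(const int free_pages[NR_NODES], unsigned allowed,
                     bool memalloc)
{
    for (int n = 0; n < NR_NODES; n++)
        if ((allowed & (1u << n)) && free_pages[n] > 0)
            return n;

    if (!memalloc)
        return -1; /* ordinary callers stay inside their cpuset */

    for (int n = 0; n < NR_NODES; n++) {
        if (free_pages[n] > 0) {
            if (off_cpuset_count++ == 0) /* warn only the first time */
                fprintf(stderr, "warning: critical allocation left its cpuset\n");
            return n;
        }
    }
    return -1;
}
```

A caller that gets NULL from req_get() is expected to wait for an
outstanding request to complete and retry.  The reserve only guarantees
forward progress if requests taken from it are eventually returned with
req_put().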



btw, regarding the per-address_space node mask: I think we should free it
when the inode is clean (!mapping_tagged(PAGECACHE_TAG_DIRTY)).  Chances
are, the inode will be dirty for 30 seconds and in-core for hours.  We
might as well steal its nodemask storage and give it to the next file which
gets written to.  A suitable place to do all this is in
__mark_inode_dirty(I_DIRTY_PAGES), using inode_lock to protect
address_space.dirty_page_nodemask.
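
A minimal userspace C sketch of that lifetime rule follows.  It is not
kernel code: struct mapping is a hypothetical stand-in for struct
address_space, the helper names are made up for the example, a single
64-bit word stands in for a real nodemask, and the locking (inode_lock in
the text above) is left out entirely.  The mask is allocated when the first
page is dirtied and freed again as soon as the last dirty page is cleaned.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct address_space. */
struct mapping {
    uint64_t *dirty_page_nodemask; /* NULL while the file is clean */
    unsigned long nr_dirty;        /* number of dirty pages */
};

/* Record a newly dirtied page on "node".  Allocates the mask on the first
 * dirty page.  Returns 0 on success, -1 if the mask cannot be allocated.
 * In the kernel this would run under a lock; none is taken here. */
static int mapping_mark_dirty(struct mapping *m, unsigned node)
{
    assert(node < 64);
    if (!m->dirty_page_nodemask) {
        m->dirty_page_nodemask = calloc(1, sizeof(uint64_t));
        if (!m->dirty_page_nodemask)
            return -1;
    }
    *m->dirty_page_nodemask |= UINT64_C(1) << node;
    m->nr_dirty++;
    return 0;
}

/* Record that one dirty page was cleaned.  Individual node bits are not
 * cleared; the whole mask is released once no dirty pages remain. */
static void mapping_page_cleaned(struct mapping *m)
{
    assert(m->nr_dirty > 0);
    if (--m->nr_dirty == 0) {
        free(m->dirty_page_nodemask);
        m->dirty_page_nodemask = NULL;
    }
}

/* Nonzero if the mapping may have dirty pages on "node". */
static int mapping_node_dirty(const struct mapping *m, unsigned node)
{
    return m->dirty_page_nodemask &&
           ((*m->dirty_page_nodemask >> node) & 1);
}
```

With this shape, a file that is dirty for a short time and then sits clean
in core for hours holds the mask storage only during the dirty window.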
