linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Hansen <hansendc@us.ibm.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>,
	containers@lists.osdl.org, linux-kernel@vger.kernel.org,
	menage@google.com, Andrew Morton <akpm@linux-foundation.org>,
	xemul@sw.ru
Subject: controlling mmap()'d vs read/write() pages
Date: Tue, 20 Mar 2007 09:15:34 -0700	[thread overview]
Message-ID: <1174407335.26166.146.camel@localhost.localdomain> (raw)
In-Reply-To: <m1zm6ajvns.fsf@ebiederm.dsl.xmission.com>

On Sun, 2007-03-18 at 11:42 -0600, Eric W. Biederman wrote:
> Dave Hansen <hansendc@us.ibm.com> writes:
> > To me, a process sitting there doing constant reads of 10 pages has the
> > same overhead to the VM as a process sitting there with a 10 page file
> > mmaped, and reading that.
> 
> I can see temporarily accounting for pages in use for such a
> read/write and possibly during things such as read ahead.
> 
> However I doubt it is enough memory to be significant, and as
> such is probably a waste of time accounting for it.
> 
> A memory limit is not about accounting for memory pressure, so I think
> the reasoning for wanting to account for unmapped pages as a hard
> requirement is still suspect.  A memory limit is to prevent one container
> from hogging all of the memory in the system, and denying it to other
> containers.
> 
> The page cache by definition is a global resource that facilitates
> global kernel optimizations.  If we kill those optimizations we
> are on the wrong track.  By requiring limits there I think we are
> very likely to kill our very important global optimizations, and bring
> the performance of the entire system down.

Let's say you have an mmap'd file.  It has zero pages brought in right
now.  You do a write to it.  It is well within the kernel's rights to
let you write one word to an mmap'd file, then unmap it, write it to
disk, and free the page.

To me, mmap() is an interface, not a directive to tell the kernel to
keep things in memory.  The fact that two reads of a bytes from an
mmap()'d file tends to not go to disk or even cause a fault for the
second read is because the page is in the page cache.  The fact that two
consecutive read()s of the same disk page tend to not cause two trips to
the disk is because the page is in the page cache.

Anybody who wants to get data in and out of a file can choose to use
either of these interfaces.  A page being brought into the system for
either a read or touch of an mmap()'d area causes the same kind of
memory pressure.

So, I think we have a difference of opinion.  I think it's _all_ about
memory pressure, and you think it is _not_ about accounting for memory
pressure. :)  Perhaps we mean different things, but we appear to
disagree greatly on the surface.

Can we agree that there must be _some_ way to control the amounts of
unmapped page cache?  Whether that's related somehow to the same way we
control RSS or done somehow at the I/O level, there must be some way to
control it.  Agree?

> >> - Could you mention proper multi process RSS limits.
> >>   (I.e.  we count the number of pages each group of processes have mapped
> >>    and limit that).
> >>   It is the same basic idea as partial page ownership, but instead of
> >>   page ownership you just count how many pages each group is using and
> >>   strictly limit that.  There is no page owner ship or partial charges.
> >>   The overhead is just walking the rmap list at map and unmap time to
> >>   see if this is the first users in the container.  No additional kernel
> >>   data structures are needed.
> >
> > I've tried to capture this.  Let me know what else you think it
> > needs.
> 
> Requirements:
> - The current kernel global optimizations are preserved and useful.
> 
>   This does mean one container can affect another when the
>   optimizations go awry but on average it means much better
>   performance.  For many the global optimizations are what make
>   the in-kernel approach attractive over paravirtualization.
> 
> Very nice to have:
> - Limits should be on things user space have control of.
...
> - SMP Scalability.

> - Perfect precision.
...

I've tried to capture this:

http://linux-mm.org/SoftwareZones

> We need several more limits in this discussion to get a full picture,
> otherwise we may to try and build the all singing all dancing limit.
> - A limit on the number of anonymous pages.
>   (Pages that are or may be in the swap cache).
> - Filesystem per container quotas.  
>   (Only applicable in some contexts but you get the idea).
> - Inode, file descriptor, and similar limits.
> - I/O limits.

Definitely.  I think we've all agreed that memory is the hard one,
though.  If we can make progress on this one, we're set! :)

-- Dave


  parent reply	other threads:[~2007-03-20 16:15 UTC|newest]

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-03-06 14:42 [RFC][PATCH 0/7] Resource controllers based on process containers Pavel Emelianov
2007-03-06 14:49 ` [RFC][PATCH 1/7] Resource counters Pavel Emelianov
2007-03-07  4:03   ` Balbir Singh
2007-03-07  7:19     ` Pavel Emelianov
2007-03-09 16:37       ` Herbert Poetzl
2007-03-11  9:01         ` Pavel Emelianov
2007-03-11 19:00         ` Eric W. Biederman
2007-03-12  1:16           ` Herbert Poetzl
2007-03-13  9:09             ` Eric W. Biederman
2007-03-13  9:27               ` Pavel Emelianov
2007-03-13  9:49               ` [Devel] " Kirill Korotaev
2007-03-13 15:21               ` Herbert Poetzl
2007-03-13 15:41                 ` Pavel Emelianov
2007-03-13 16:07                   ` Srivatsa Vaddagiri
2007-03-14  7:12                     ` Pavel Emelianov
2007-03-15 16:51                       ` Eric W. Biederman
2007-03-13 16:32                   ` Herbert Poetzl
2007-03-06 14:55 ` [RFC][PATCH 2/7] RSS controller core Pavel Emelianov
2007-03-06 22:00   ` Andrew Morton
2007-03-09 16:48     ` Herbert Poetzl
2007-03-11  9:08       ` Pavel Emelianov
2007-03-11 14:32         ` Herbert Poetzl
2007-03-11 15:04           ` Pavel Emelianov
2007-03-12  0:41             ` Herbert Poetzl
2007-03-12  8:31               ` Pavel Emelianov
2007-03-12  9:55       ` Balbir Singh
2007-03-12 23:43         ` Herbert Poetzl
2007-03-13  1:57           ` Balbir Singh
2007-03-13  2:24             ` Srivatsa Vaddagiri
2007-03-13 16:06             ` Herbert Poetzl
2007-03-11 12:26     ` Kirill Korotaev
2007-03-11 12:51       ` Andrew Morton
2007-03-11 15:51         ` Balbir Singh
2007-03-11 19:34         ` Eric W. Biederman
2007-03-12  9:23           ` [Devel] " Kirill Korotaev
2007-03-13  9:26             ` Eric W. Biederman
2007-03-13 15:43               ` Kirill Korotaev
2007-03-12  1:00         ` Herbert Poetzl
2007-03-12  9:02           ` Pavel Emelianov
2007-03-12 21:11             ` Herbert Poetzl
2007-03-13  7:17               ` Pavel Emelianov
2007-03-13 15:05                 ` Herbert Poetzl
2007-03-13 15:32                   ` Pavel Emelianov
2007-03-13 15:10               ` Kirill Korotaev
2007-03-13 15:11                 ` Herbert Poetzl
2007-03-13 15:54                   ` Kirill Korotaev
2007-03-12 18:42           ` Dave Hansen
2007-03-12 22:41             ` Herbert Poetzl
2007-03-12 23:02               ` Dave Hansen
2007-03-18 16:58                 ` Eric W. Biederman
2007-03-13  6:04               ` Andrew Morton
2007-03-13 10:19                 ` [Devel] " Kirill Korotaev
2007-03-13 11:48                   ` Andrew Morton
2007-03-13 14:59                     ` Herbert Poetzl
2007-03-13 17:05                     ` Dave Hansen
2007-03-14 15:38                       ` Mel Gorman
2007-03-14 20:42                         ` Dave Hansen
2007-03-20 18:57                           ` Mel Gorman
2007-03-18 22:44                       ` [Devel] " Paul Menage
2007-03-19 17:41                         ` Eric W. Biederman
2007-03-13 17:26                 ` Dave Hansen
2007-03-13 19:09                   ` Alan Cox
2007-03-13 20:28                     ` Dave Hansen
2007-03-16  0:55                     ` Eric W. Biederman
2007-03-16 16:31                       ` Dave Hansen
2007-03-16 18:54                         ` Eric W. Biederman
2007-03-16 19:46                           ` Dave Hansen
2007-03-18 17:42                             ` Eric W. Biederman
2007-03-19 15:48                               ` Herbert Poetzl
2007-03-20 16:15                               ` Dave Hansen [this message]
2007-03-20 21:19                                 ` controlling mmap()'d vs read/write() pages Eric W. Biederman
2007-03-23  0:51                                   ` Herbert Poetzl
2007-03-23  5:57                                   ` Nick Piggin
2007-03-23 10:12                                     ` Eric W. Biederman
2007-03-23 10:47                                       ` Nick Piggin
2007-03-23 12:21                                         ` Eric W. Biederman
2007-03-28  7:33                                           ` Nick Piggin
2007-03-23 16:41                                       ` Dave Hansen
2007-03-23 18:16                                         ` Herbert Poetzl
2007-03-28  9:18                                           ` Balbir Singh
2007-03-14 16:47                   ` [RFC][PATCH 2/7] RSS controller core Mel Gorman
2007-03-07  5:37   ` Balbir Singh
2007-03-07  7:27     ` Pavel Emelianov
2007-03-06 14:58 ` [RFC][PATCH 3/7] Data structures changes for RSS accounting Pavel Emelianov
2007-03-11 19:13   ` Eric W. Biederman
2007-03-12 16:16     ` Kirill Korotaev
2007-03-12 16:48       ` Dave Hansen
2007-03-12 17:19         ` Pavel Emelianov
2007-03-12 17:27           ` Dave Hansen
2007-03-13  7:10             ` Pavel Emelianov
2007-03-12 17:21         ` Balbir Singh
2007-03-06 15:00 ` [RFC][PATCH 4/7] RSS accounting hooks over the code Pavel Emelianov
2007-03-11 19:14   ` Eric W. Biederman
2007-03-12 16:23     ` Kirill Korotaev
2007-03-12 16:50       ` Dave Hansen
2007-03-12 17:07         ` Kirill Korotaev
2007-03-12 17:33           ` Dave Hansen
2007-03-13  9:43             ` Eric W. Biederman
2007-03-12 23:54         ` Herbert Poetzl
2007-03-13  9:58           ` Eric W. Biederman
2007-03-13 10:25             ` Nick Piggin
2007-03-13 16:01               ` Eric W. Biederman
2007-03-14  3:51                 ` Nick Piggin
2007-03-14  6:42                   ` Balbir Singh
2007-03-14  6:57                     ` Nick Piggin
2007-03-14  7:48                       ` Balbir Singh
2007-03-14 13:25                         ` Vaidyanathan Srinivasan
2007-03-14 13:49                           ` Nick Piggin
2007-03-14 14:43                             ` Vaidyanathan Srinivasan
2007-03-14 16:16                             ` Kirill Korotaev
2007-03-15  5:01                               ` Nick Piggin
2007-03-15  5:44                                 ` Balbir Singh
2007-03-28 20:15               ` Ethan Solomita
2007-03-14 15:37   ` Cedric Le Goater
2007-03-14 15:45     ` Pavel Emelianov
2007-03-06 15:03 ` [RFC][PATCH 5/7] Per-container OOM killer and page reclamation Pavel Emelianov
2007-03-09 21:21   ` Balbir Singh
2007-03-11  8:41     ` Pavel Emelianov
2007-03-06 15:04 ` [RFC][PATCH 6/7] Account for the number of tasks within container Pavel Emelianov
2007-03-07  2:00   ` Paul Menage
2007-03-07  7:13     ` Pavel Emelianov
2007-03-08 13:49       ` Paul Menage
2007-03-11  8:36         ` Pavel Emelianov
2007-03-06 15:07 ` [RFC][PATCH 7/7] Account for the number of files opened " Pavel Emelianov
2007-03-07  2:02 ` [RFC][PATCH 0/7] Resource controllers based on process containers Paul Menage
2007-03-07  7:30   ` Pavel Emelianov
2007-03-07  6:52 ` Balbir Singh
2007-03-07  7:32   ` Pavel Emelianov
2007-03-07  9:43     ` Kirill Korotaev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1174407335.26166.146.camel@localhost.localdomain \
    --to=hansendc@us.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=containers@lists.osdl.org \
    --cc=ebiederm@xmission.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=menage@google.com \
    --cc=xemul@sw.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).