From: Pavel Emelianov <xemul@sw.ru>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@osdl.org>, Paul Menage <menage@google.com>,
	Srivatsa Vaddagiri <vatsa@in.ibm.com>,
	Balbir Singh <balbir@in.ibm.com>,
	devel@openvz.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Kirill Korotaev <dev@sw.ru>,
	Chandra Seetharaman <sekharan@us.ibm.com>,
	Cedric Le Goater <clg@fr.ibm.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Rohit Seth <rohitseth@google.com>,
	Linux Containers <containers@lists.osdl.org>
Subject: Re: [PATCH 0/8] RSS controller based on process containers (v2)
Date: Tue, 10 Apr 2007 12:30:55 +0400	[thread overview]
Message-ID: <461B4B3F.6090506@sw.ru> (raw)
In-Reply-To: <1176134091.22676.96.camel@lappy>

Peter Zijlstra wrote:
> *ugh* /me no like.
> 
> The basic premises seems to be that we can track page owners perfectly
> (although this patch set does not yet do so), through get/release

It looks like you did not examine the patches very carefully
before concluding this. These patches DO track page owners.

I know that a page may be shared among several containers and
thus have many owners, so ideally we would track all of them.
But that is exactly what we decided not to do half a year ago.

Page sharing accounting is already performed in OpenVZ
beancounters, and that functionality will be pushed to mainline
after this simple controller is merged.

> operations (on _mapcount).
> 
> This is simply not true for unmapped pagecache pages. Those receive no
> 'release' event; (the usage by find_get_page() could be seen as 'get').

These patches concern the mapped pagecache only. Unmapped pagecache
control is out of their scope, since we do not want a single
controller to track all the resources at once.
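To illustrate why only *mapped* pagecache is covered: the charge is
naturally driven by mapcount transitions, which unmapped pages never
go through. A user-space sketch of that idea (all names here are
illustrative stubs, not the patch's actual code):

```c
#include <assert.h>
#include <stddef.h>

struct rss_container_stub { unsigned long rss; /* charged pages */ };

/* Stand-in for struct page: only the fields the sketch needs. */
struct page_stub {
    int mapcount;
    struct rss_container_stub *owner;
};

/* Charge on the 0 -> 1 mapcount transition: the first mapper
 * becomes the owner, further mappings are free. */
static void page_add_rmap(struct page_stub *p, struct rss_container_stub *c)
{
    if (p->mapcount++ == 0) {
        p->owner = c;
        c->rss++;
    }
}

/* Uncharge on the 1 -> 0 transition: the last unmap releases
 * the page from its owner's accounting. */
static void page_remove_rmap(struct page_stub *p)
{
    if (--p->mapcount == 0) {
        p->owner->rss--;
        p->owner = NULL;
    }
}
```

An unmapped pagecache page never passes through either transition,
which is exactly why a different mechanism would be needed for it.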

> Also, you don't seem to balance the active/inactive scanning on a per
> container basis. This skews the per container working set logic.

This is not true. Balbir sent a patch against the first version of
this controller that added active/inactive balancing per container.
I have included that patch (a bit reworked) in this version and
noted the fact in the zeroth letter.

> Lastly, you don't call the slab shrinker for container reclaim; which
> would leave slab reclaim only for those few non process specific
> allocations, which would greatly skew the pagecache/slab balance.

Of course I do not call the slab shrinker! We do not have
kernel memory control yet, so we cannot shrink arbitrary kernel
objects just because some container has run out of its *user* memory.

Kernel memory control will come later. We decided to start with
simple RSS control. Please refer to the containers list archives
for more details.

> 
> 
> Let us call 
> 
> struct reclaim_struct {
> 	struct list_head active_list;
> 	struct list_head inactive_list;
> 	unsigned long nr_active;
> 	unsigned long nr_inactive;
> }
> 
> Lets recognise three distinct page categories:
>  - anonymous memory,
>  - mapped pagecache, and
>  - unmapped pagecache.

We cannot split user memory into separately limited parts. There
must be a single overall parameter that lets the administrator say
"run this container in a 64Mb sandbox". With anonymous and mapped
memory limited separately, the administrator will only be confused.
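That single overall parameter is what a resource counter provides: one
limit covering all user memory the container consumes, anonymous and
mapped alike. A minimal user-space sketch of the idea (field and
function names are illustrative, not the exact API of patch 1/8):

```c
#include <assert.h>

/* One counter, one limit: the whole "64Mb sandbox". */
struct res_counter {
    unsigned long usage; /* pages currently charged */
    unsigned long limit; /* e.g. 64Mb = 16384 pages of 4K */
};

/* Try to charge nr pages; refuse when the sandbox would overflow,
 * which is the point where reclaim or the OOM killer kicks in. */
static int res_counter_charge(struct res_counter *rc, unsigned long nr)
{
    if (rc->usage + nr > rc->limit)
        return -1;
    rc->usage += nr;
    return 0;
}

static void res_counter_uncharge(struct res_counter *rc, unsigned long nr)
{
    rc->usage = nr > rc->usage ? 0 : rc->usage - nr;
}
```

The administrator sets exactly one number; whether the pages behind it
are anonymous or mapped pagecache is invisible to the limit.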

> 
> 
> We then keep anonymous pages on a per container reclaim_struct, these
> pages are fully accounted to each container.

Hmm... We do have such a construction. struct rss_container has
two lists, and we shrink from them sequentially using the existing
scanner. Don't forget that this scanner has been evolving for many
years; writing a new one would be a waste of time.
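A rough user-space model of that construction (the kernel uses its own
struct list_head from list.h; the helper names below are illustrative,
not lifted from the patch set):

```c
#include <assert.h>

/* Minimal doubly-linked list, standing in for the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Per-container reclaim state: the same active/inactive pair the
 * global scanner already understands, so the existing shrink logic
 * can walk a container's lists instead of a zone's. */
struct rss_container {
    struct list_head active_list;
    struct list_head inactive_list;
    unsigned long nr_active;
    unsigned long nr_inactive;
};

static void container_add_page(struct rss_container *c,
                               struct list_head *page, int active)
{
    if (active) {
        list_add_tail(page, &c->active_list);
        c->nr_active++;
    } else {
        list_add_tail(page, &c->inactive_list);
        c->nr_inactive++;
    }
}
```

The point is that the shape matches what the zone scanner expects, so
no new scanner has to be written: only the lists it is handed change.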

> We keep mapped pagecache pages on per inode reclaim_structs, these files
> could be shared between containers and we could either just account all
> pages belonging to each file proportional to the number of containers
> involved, or do a more precise accounting.

What happens if one container fills RAM with mapped pages from
a single file? Who will be the "owner" of that page set? Whose
IO bandwidth will be spent pushing those pages to disk? What if
that container mlock()s the set? Who will be killed?

> We keep unmapped pagecache pages on a global reclaim_struct, these pages
> can, in general, not be pinned to a specific container; all we can do is
> keep a floating proportion relative to container 'get' events
> (find_get_page() and perhaps add_to_page_cache()).
>
> Reclaim will then have to fairly reclaim pages from all of these lists.
> If we schedule such that it appears that these lists are parallel
> instead of serial - that is a each tail is really a tail, not the head
> of another list - the current reclaim semantics are preserved.

Yet again: the current scanner is the result of many people's work.
It is a very tricky piece of code that is still evolving. Do you
propose to throw it out and write a new scanner?

> The slab shrinker should be called proportional to the containers size
> relative to the machine.

The slab shrinker must be called only when we know which kernel
objects are used by this particular container. Otherwise we break
the idea of isolation. Generally speaking, if some container runs
out of its resources, we should reclaim pages, shrink objects, kill
tasks, etc. from that container only.
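The isolation policy argued for here can be stated in a few lines: the
reclaim target is derived only from the offending container's own
overdraft, and neighbouring containers are never touched. A pure
illustration (not patch code):

```c
#include <assert.h>

/* How many pages must be reclaimed from THIS container so that a
 * pending charge of `want` pages fits under its limit. Other
 * containers' usage never enters the calculation. */
static unsigned long container_reclaim_target(unsigned long usage,
                                              unsigned long limit,
                                              unsigned long want)
{
    unsigned long needed = usage + want;
    return needed > limit ? needed - limit : 0;
}
```

Contrast this with shrinking slab proportionally to container size,
which charges well-behaved containers for another container's excess.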

> Global reclaim will have to call each container reclaim in proportional
> fashion.
> 
> The biggest problem with this approach is that there is no per zone
> reclaim left, which is relied upon by the allocator to provide free
> pages in a given physical address range. However there has been talk to
> create a proper range allocator independent of zones.
> 
> Just my 0.02 euro..
> 
> Peter

