From: Andrew Morton <akpm@linux-foundation.org>
To: Pavel Emelianov <xemul@openvz.org>
Cc: Paul Menage <menage@google.com>, Balbir Singh <balbir@in.ibm.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
devel@openvz.org, Kirill Korotaev <dev@openvz.org>
Subject: Re: [PATCH 8/8] Per-container pages reclamation
Date: Wed, 30 May 2007 14:47:37 -0700 [thread overview]
Message-ID: <20070530144737.56456aaf.akpm@linux-foundation.org> (raw)
In-Reply-To: <465D9B62.5050507@openvz.org>
On Wed, 30 May 2007 19:42:26 +0400
Pavel Emelianov <xemul@openvz.org> wrote:
> Implement try_to_free_pages_in_container() to free the
> pages in container that has run out of memory.
>
> The scan_control->isolate_pages() function is set to
> isolate_pages_in_container() that isolates the container
> pages only. The exported __isolate_lru_page() call
> makes things look simpler than in the previous version.
>
> Includes fix from Balbir Singh <balbir@in.ibm.com>
>
> }
>
> +void container_rss_move_lists(struct page *pg, bool active)
> +{
> + struct rss_container *rss;
> + struct page_container *pc;
> +
> + if (!page_mapped(pg))
> + return;
> +
> + pc = page_container(pg);
> + if (pc == NULL)
> + return;
> +
> + rss = pc->cnt;
> +
> + spin_lock(&rss->res.lock);
> + if (active)
> + list_move(&pc->list, &rss->active_list);
> + else
> + list_move(&pc->list, &rss->inactive_list);
> + spin_unlock(&rss->res.lock);
> +}
This is an interesting-looking function. Please document it?
I'm inferring that the rss container has an active and inactive list and
that this basically follows the same operation as the traditional per-zone
lists?
Would I be correct in guessing that pages which are on the
per-rss-container lists are also eligible for reclaim off the traditional
page LRUs? If so, how does that work? When a page gets freed off the
per-zone LRUs does it also get removed from the per-rss_container LRU? But
how can this be right? Pages can get taken off the LRU and freed at
interrupt time, and this code isn't interrupt-safe.
I note that this lock is not irq-safe, whereas the lru locks are irq-safe.
So we don't perform the rotate_reclaimable_page() operation within the RSS
container? I think we could do so. I wonder if this was considered.
A description of how all this code works would help a lot.
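To make the interrupt-safety point above concrete: since pages can be taken off the LRU and freed at interrupt time, this lock would have to be acquired with interrupts disabled, as the per-zone lru locks are. A hypothetical irq-safe variant of the quoted function (same structure and names as in the patch; this is a sketch, not the patch author's code) might look like:

```c
/* Sketch only: irq-safe variant of the quoted function.  Uses
 * spin_lock_irqsave() so the lock can also nest under contexts in
 * which pages are freed off the per-zone LRUs at interrupt time. */
void container_rss_move_lists(struct page *pg, bool active)
{
	struct rss_container *rss;
	struct page_container *pc;
	unsigned long flags;

	if (!page_mapped(pg))
		return;

	pc = page_container(pg);
	if (pc == NULL)
		return;

	rss = pc->cnt;

	spin_lock_irqsave(&rss->res.lock, flags);
	if (active)
		list_move(&pc->list, &rss->active_list);
	else
		list_move(&pc->list, &rss->inactive_list);
	spin_unlock_irqrestore(&rss->res.lock, flags);
}
```

Of course, every other taker of rss->res.lock would then need the same treatment, which is part of why a description of the intended locking model matters.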
> +static unsigned long isolate_container_pages(unsigned long nr_to_scan,
> + struct list_head *src, struct list_head *dst,
> + unsigned long *scanned, struct zone *zone, int mode)
> +{
> + unsigned long nr_taken = 0;
> + struct page *page;
> + struct page_container *pc;
> + unsigned long scan;
> + LIST_HEAD(pc_list);
> +
> + for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) {
> + pc = list_entry(src->prev, struct page_container, list);
> + page = pc->page;
> + if (page_zone(page) != zone)
> + continue;
That page_zone() check is interesting. What's going on here?
I'm suspecting that we have a problem here: if there are a lot of pages on
*src which are in the wrong zone, we can suffer reclaim distress leading to
OOM-killings, or excessive CPU consumption?
> + list_move(&pc->list, &pc_list);
> +
> + if (__isolate_lru_page(page, mode) == 0) {
> + list_move(&page->lru, dst);
> + nr_taken++;
> + }
> + }
> +
> + list_splice(&pc_list, src);
> +
> + *scanned = scan;
> + return nr_taken;
> +}
> +
> +unsigned long isolate_pages_in_container(unsigned long nr_to_scan,
> + struct list_head *dst, unsigned long *scanned,
> + int order, int mode, struct zone *zone,
> + struct rss_container *rss, int active)
> +{
> + unsigned long ret;
> +
> + spin_lock(&rss->res.lock);
> + if (active)
> + ret = isolate_container_pages(nr_to_scan, &rss->active_list,
> + dst, scanned, zone, mode);
> + else
> + ret = isolate_container_pages(nr_to_scan, &rss->inactive_list,
> + dst, scanned, zone, mode);
> + spin_unlock(&rss->res.lock);
> + return ret;
> +}
> +
> void container_rss_add(struct page_container *pc)
> {
> struct page *pg;
>
> ...
>
> +#ifdef CONFIG_RSS_CONTAINER
> +unsigned long try_to_free_pages_in_container(struct rss_container *cnt)
> +{
> + struct scan_control sc = {
> + .gfp_mask = GFP_KERNEL,
> + .may_writepage = 1,
> + .swap_cluster_max = 1,
> + .may_swap = 1,
> + .swappiness = vm_swappiness,
> + .order = 0, /* in this case we wanted one page only */
> + .cnt = cnt,
> + .isolate_pages = isolate_pages_in_container,
> + };
> + int node;
> + struct zone **zones;
> +
> + for_each_online_node(node) {
> +#ifdef CONFIG_HIGHMEM
> + zones = NODE_DATA(node)->node_zonelists[ZONE_HIGHMEM].zones;
> + if (do_try_to_free_pages(zones, sc.gfp_mask, &sc))
> + return 1;
> +#endif
> + zones = NODE_DATA(node)->node_zonelists[ZONE_NORMAL].zones;
> + if (do_try_to_free_pages(zones, sc.gfp_mask, &sc))
> + return 1;
> + }
Definitely need to handle ZONE_DMA32 and ZONE_DMA (some architectures put
all memory into ZONE_DMA, or they used to).
> + return 0;
> +}
> +#endif
> +
> unsigned long try_to_free_pages(struct zone **zones, int order, gfp_t gfp_mask)
> {
> struct scan_control sc = {