From: Michal Hocko <mhocko@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Hugh Dickins <hughd@google.com>, Tejun Heo <tj@kernel.org>,
Vladimir Davydov <vdavydov@parallels.com>,
linux-mm@kvack.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [patch 12/13] mm: memcontrol: rewrite charge API
Date: Mon, 14 Jul 2014 17:04:46 +0200 [thread overview]
Message-ID: <20140714150446.GD30713@dhcp22.suse.cz> (raw)
In-Reply-To: <1403124045-24361-13-git-send-email-hannes@cmpxchg.org>
Hi,
I've finally managed to untangle myself from internal stuff...
On Wed 18-06-14 16:40:44, Johannes Weiner wrote:
> The memcg charge API charges pages before they are rmapped - i.e. have
> an actual "type" - and so every callsite needs its own set of charge
> and uncharge functions to know what type is being operated on. Worse,
> uncharge has to happen from a context that is still type-specific,
> rather than at the end of the page's lifetime with exclusive access,
> and so requires a lot of synchronization.
>
> Rewrite the charge API to provide a generic set of try_charge(),
> commit_charge() and cancel_charge() transaction operations, much like
> what's currently done for swap-in:
>
> mem_cgroup_try_charge() attempts to reserve a charge, reclaiming
> pages from the memcg if necessary.
>
> mem_cgroup_commit_charge() commits the page to the charge once it
> has a valid page->mapping and PageAnon() reliably tells the type.
>
> mem_cgroup_cancel_charge() aborts the transaction.
>
> This reduces the charge API and enables subsequent patches to
> drastically simplify uncharging.
>
> As pages need to be committed after rmap is established but before
> they are added to the LRU, page_add_new_anon_rmap() must stop doing
> LRU additions again. Revive lru_cache_add_active_or_unevictable().
I think it would make more sense to revive
lru_cache_add_active_or_unevictable() in a separate patch for easier
review. Too late, though...
A few comments below.
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
The patch looks correct but the code is quite tricky so I hope I didn't
miss anything.
Acked-by: Michal Hocko <mhocko@suse.cz>
> ---
> Documentation/cgroups/memcg_test.txt | 32 +--
> include/linux/memcontrol.h | 53 ++---
> include/linux/swap.h | 3 +
> kernel/events/uprobes.c | 1 +
> mm/filemap.c | 9 +-
> mm/huge_memory.c | 57 +++--
> mm/memcontrol.c | 407 ++++++++++++++---------------------
> mm/memory.c | 41 ++--
> mm/rmap.c | 19 --
> mm/shmem.c | 24 ++-
> mm/swap.c | 34 +++
> mm/swapfile.c | 14 +-
> 12 files changed, 314 insertions(+), 380 deletions(-)
>
[...]
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index eb65d29516ca..1a9a096858e0 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -54,28 +54,11 @@ struct mem_cgroup_reclaim_cookie {
> };
>
> #ifdef CONFIG_MEMCG
> -/*
> - * All "charge" functions with gfp_mask should use GFP_KERNEL or
> - * (gfp_mask & GFP_RECLAIM_MASK). In current implementatin, memcg doesn't
> - * alloc memory but reclaims memory from all available zones. So, "where I want
> - * memory from" bits of gfp_mask has no meaning. So any bits of that field is
> - * available but adding a rule is better. charge functions' gfp_mask should
> - * be set to GFP_KERNEL or gfp_mask & GFP_RECLAIM_MASK for avoiding ambiguous
> - * codes.
> - * (Of course, if memcg does memory allocation in future, GFP_KERNEL is sane.)
> - */
I think we should slightly modify the comment but keep the primary idea.
What about the following?
/*
 * Although the memcg charge functions do not allocate any memory, they
 * still take a gfp mask to control the reclaim process (so only
 * gfp_mask & GFP_RECLAIM_MASK is relevant):
 * GFP_KERNEL should be used for the general charge path without any
 * constraints on the reclaim,
 * __GFP_WAIT should be cleared for atomic contexts,
 * __GFP_NORETRY should be set for charges which would rather fail than
 * spend too much time reclaiming, and
 * __GFP_NOFAIL should be set for charges which cannot fail.
 */
> -
> -extern int mem_cgroup_charge_anon(struct page *page, struct mm_struct *mm,
> - gfp_t gfp_mask);
> -/* for swap handling */
> -extern int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
> - struct page *page, gfp_t mask, struct mem_cgroup **memcgp);
> -extern void mem_cgroup_commit_charge_swapin(struct page *page,
> - struct mem_cgroup *memcg);
> -extern void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *memcg);
> -
> -extern int mem_cgroup_charge_file(struct page *page, struct mm_struct *mm,
> - gfp_t gfp_mask);
> +int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
> + gfp_t gfp_mask, struct mem_cgroup **memcgp);
> +void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
> + bool lrucare);
> +void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg);
>
> struct lruvec *mem_cgroup_zone_lruvec(struct zone *, struct mem_cgroup *);
> struct lruvec *mem_cgroup_page_lruvec(struct page *, struct zone *);
[...]
> @@ -948,6 +951,7 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
> struct page *page,
> unsigned long haddr)
> {
> + struct mem_cgroup *memcg;
> spinlock_t *ptl;
> pgtable_t pgtable;
> pmd_t _pmd;
> @@ -968,20 +972,21 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
> __GFP_OTHER_NODE,
> vma, address, page_to_nid(page));
> if (unlikely(!pages[i] ||
> - mem_cgroup_charge_anon(pages[i], mm,
> - GFP_KERNEL))) {
> + mem_cgroup_try_charge(pages[i], mm, GFP_KERNEL,
> + &memcg))) {
> if (pages[i])
> put_page(pages[i]);
> - mem_cgroup_uncharge_start();
> while (--i >= 0) {
> - mem_cgroup_uncharge_page(pages[i]);
> + memcg = (void *)page_private(pages[i]);
Hmm, OK, the memcg cannot go away even if the mm owner has left it:
the charge is already there and the page is not on the LRU, so
mem_cgroup_css_free() will wait until we either uncharge the page or
put it on the LRU.
> + set_page_private(pages[i], 0);
> + mem_cgroup_cancel_charge(pages[i], memcg);
> put_page(pages[i]);
> }
> - mem_cgroup_uncharge_end();
> kfree(pages);
> ret |= VM_FAULT_OOM;
> goto out;
> }
/*
 * Pages might end up charged to different memcgs because the
 * mm owner might move between cgroups while we are allocating
 * them. Abuse the ->private field to store the charged memcg
 * until we know whether to commit or cancel the charge.
 */
> + set_page_private(pages[i], (unsigned long)memcg);
> }
>
> for (i = 0; i < HPAGE_PMD_NR; i++) {
[...]
> +/**
> + * mem_cgroup_commit_charge - commit a page charge
> + * @page: page to charge
> + * @memcg: memcg to charge the page to
> + * @lrucare: page might be on LRU already
> + *
> + * Finalize a charge transaction started by mem_cgroup_try_charge(),
> + * after page->mapping has been set up. This must happen atomically
> + * as part of the page instantiation, i.e. under the page table lock
> + * for anonymous pages, under the page lock for page and swap cache.
> + *
> + * In addition, the page must not be on the LRU during the commit, to
> + * prevent racing with task migration. If it might be, use @lrucare.
> + *
> + * Use mem_cgroup_cancel_charge() to cancel the transaction instead.
> + */
> +void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
> + bool lrucare)
I think we should be explicit that this is only required for LRU pages;
kmem charges do not have to finalize the transaction.
[...]
--
Michal Hocko
SUSE Labs