All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Shakeel Butt <shakeelb@google.com>
Cc: "Matthew Wilcox" <willy@infradead.org>,
	"Vladimir Davydov" <vdavydov.dev@gmail.com>,
	"Michal Hocko" <mhocko@suse.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Ben Gardon" <bgardon@google.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	"Linux MM" <linux-mm@kvack.org>,
	kvm@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH] mm, kvm: account kvm_vcpu_mmap to kmemcg
Date: Fri, 29 Mar 2019 10:20:33 -0400	[thread overview]
Message-ID: <20190329142033.GB2474@cmpxchg.org> (raw)
In-Reply-To: <CALvZod5GiC1+HB3_Mm969Qbgj7s6-unbd141uP5pnMbsufS+mg@mail.gmail.com>

On Thu, Mar 28, 2019 at 08:59:45PM -0700, Shakeel Butt wrote:
> On Thu, Mar 28, 2019 at 7:36 PM Matthew Wilcox <willy@infradead.org> wrote:
> > I don't understand why we need a PageKmemcg anyway.  We already
> > have an entire pointer in struct page; can we not just check whether
> > page->mem_cgroup is NULL or not?
> 
> PageKmemcg is for kmem while page->mem_cgroup is used for anon, file
> and kmem memory. So, page->mem_cgroup can not be used for NULL check
> unless we unify them. Not sure how complicated would that be.

A page flag warrants research into this.

The only reason we have PageKmemcg() is because of the way we do
memory type accounting at uncharge time:

	if (!PageKmemcg(page)) {
		unsigned int nr_pages = 1;

		if (PageTransHuge(page)) {
			nr_pages <<= compound_order(page);
			ug->nr_huge += nr_pages;
		}
		if (PageAnon(page))
			ug->nr_anon += nr_pages;
		else {
			ug->nr_file += nr_pages;
			if (PageSwapBacked(page))
				ug->nr_shmem += nr_pages;
		}
		ug->pgpgout++;
	} else {
		ug->nr_kmem += 1 << compound_order(page);
		__ClearPageKmemcg(page);
	}

	[...]

	if (!mem_cgroup_is_root(ug->memcg)) {
		page_counter_uncharge(&ug->memcg->memory, nr_pages);
		if (do_memsw_account())
			page_counter_uncharge(&ug->memcg->memsw, nr_pages);
		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
			page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem);
		memcg_oom_recover(ug->memcg);
	}

	local_irq_save(flags);
	__mod_memcg_state(ug->memcg, MEMCG_RSS, -ug->nr_anon);
	__mod_memcg_state(ug->memcg, MEMCG_CACHE, -ug->nr_file);
	__mod_memcg_state(ug->memcg, MEMCG_RSS_HUGE, -ug->nr_huge);
	__mod_memcg_state(ug->memcg, NR_SHMEM, -ug->nr_shmem);
	__count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout);

But nothing says we have to have all these duplicate private counters,
or update them this late in the page's lifetime. The generic vmstat
counters in comparison are updated when 1) we know the page is going
away but 2) we still know the page type. We can do the same here.

We can either

a) Push up the MEMCG_RSS, MEMCG_CACHE etc. accounting sites to before
   the pages are uncharged, when the page type is still known, but
   page->mem_cgroup is exclusive, i.e. when they are deleted from page
   cache or when their last pte is going away. This would be very
   close to where the VM updates NR_ANON_MAPPED, NR_FILE_PAGES etc.

or

b) Tweak the existing NR_ANON_MAPPED, NR_FILE_PAGES, NR_ANON_THPS
   accounting sites to use the lruvec_page_state infra and get rid of
   the duplicate MEMCG_RSS, MEMCG_CACHE counters completely.

   These sites would need slight adjustments, as they are sometimes
   before commit_charge() set up page->mem_cgroup, but it doesn't look
   too complicated to fix that ordering.

The latter would be a great cleanup, and frankly one that is long
overdue. There is no good reason for all this duplication. We'd not
only get rid of the private counters and the duplicate accounting
sites, it would drastically simplify charging and uncharging, and it
would even obviate the need for a separate kmem (un)charge path.

[ The cgroup1 memcg->kmem thing is the odd-one-out, but I think this
  is complete legacy at this point and nobody is actively setting
  limits on that counter anyway. We can break out an explicit v1-only
  mem_cgroup_charge_legacy_kmem(), put it into the currently accounted
  callsites for compatibility, and not add any new ones. ]

  reply	other threads:[~2019-03-29 14:20 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-29  1:28 [RFC PATCH] mm, kvm: account kvm_vcpu_mmap to kmemcg Shakeel Butt
2019-03-29  1:28 ` Shakeel Butt
2019-03-29  2:35 ` Matthew Wilcox
2019-03-29  3:59   ` Shakeel Butt
2019-03-29 14:20     ` Johannes Weiner [this message]
2019-03-29  7:52 ` Michal Hocko
2019-03-29 16:00   ` Shakeel Butt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190329142033.GB2474@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=bgardon@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=pbonzini@redhat.com \
    --cc=rkrcmar@redhat.com \
    --cc=shakeelb@google.com \
    --cc=vdavydov.dev@gmail.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.