From: Mike Rapoport <rppt@kernel.org>
To: Roman Gushchin <guroan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Andy Lutomirski <luto@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
Borislav Petkov <bp@alien8.de>,
Catalin Marinas <catalin.marinas@arm.com>,
Christopher Lameter <cl@linux.com>,
Dan Williams <dan.j.williams@intel.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
David Hildenbrand <david@redhat.com>,
Elena Reshetova <elena.reshetova@intel.com>,
"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
James Bottomley <jejb@linux.ibm.com>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
Matthew Wilcox <willy@infradead.org>,
Mark Rutland <mark.rutland@arm.com>,
Mike Rapoport <rppt@linux.ibm.com>,
Michael Kerrisk <mtk.manpages@gmail.com>,
Palmer Dabbelt <palmer@dabbelt.com>,
Paul Walmsley <paul.walmsley@sifive.com>,
Peter Zijlstra <peterz@infradead.org>,
Rick Edgecombe <rick.p.edgecombe@intel.com>,
Shuah Khan <shuah@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Tycho Andersen <tycho@tycho.ws>, Will Deacon <will@kernel.org>,
linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org,
x86@kernel.org
Subject: Re: [PATCH v8 6/9] secretmem: add memcg accounting
Date: Sun, 15 Nov 2020 11:17:00 +0200
Message-ID: <20201115091700.GY4758@kernel.org>
In-Reply-To: <CALo0P13aq3GsONnZrksZNU9RtfhMsZXGWhK1n=xYJWQizCd4Zw@mail.gmail.com>
On Fri, Nov 13, 2020 at 03:42:25PM -0800, Roman Gushchin wrote:
> On Tue, 10 Nov 2020 at 07:16, Mike Rapoport <rppt@kernel.org> wrote:
> >
> > From: Mike Rapoport <rppt@linux.ibm.com>
> >
> > Account memory consumed by secretmem to memcg. The accounting is updated
> > when the memory is actually allocated and freed.
> >
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > ---
> > mm/filemap.c | 2 +-
> > mm/secretmem.c | 42 +++++++++++++++++++++++++++++++++++++++++-
> > 2 files changed, 42 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 249cf489f5df..11387a077373 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -844,7 +844,7 @@ static noinline int __add_to_page_cache_locked(struct page *page,
> > page->mapping = mapping;
> > page->index = offset;
> >
> > - if (!huge) {
> > + if (!huge && !page->memcg_data) {
> > error = mem_cgroup_charge(page, current->mm, gfp);
> > if (error)
> > goto error;
> > diff --git a/mm/secretmem.c b/mm/secretmem.c
> > index 1aa2b7cffe0d..1eb7667016fa 100644
> > --- a/mm/secretmem.c
> > +++ b/mm/secretmem.c
> > @@ -17,6 +17,7 @@
> > #include <linux/syscalls.h>
> > #include <linux/memblock.h>
> > #include <linux/pseudo_fs.h>
> > +#include <linux/memcontrol.h>
> > #include <linux/set_memory.h>
> > #include <linux/sched/signal.h>
> >
> > @@ -49,6 +50,38 @@ struct secretmem_ctx {
> >
> > static struct cma *secretmem_cma;
> >
>
> Hi Mike!
>
> > +static int secretmem_memcg_charge(struct page *page, gfp_t gfp, int order)
> > +{
> > + unsigned long nr_pages = (1 << order);
> > + int i, err;
> > +
> > + err = memcg_kmem_charge_page(page, gfp, order);
> > + if (err)
> > + return err;
> > +
> > + for (i = 1; i < nr_pages; i++) {
> > + struct page *p = page + i;
> > +
> > + p->memcg_data = page->memcg_data;
> > + }
>
> Hm, it looks very strange to me. Why do we need to copy memcg_data?
> What about css reference counting?

I need to copy memcg_data to mark each sub-page as already accounted, so
it won't be charged again when it is added to the page cache.

What happens here is that I allocate a large page and then use it as a
local cache for allocations in secretmem_fault(). I charge the large
page as kmem.

During secretmem_fault() a small sub-page from that large page goes into
the page cache, and there I skip its memcg accounting.

In the end, when the large page is freed, the memcg_data for all its
sub-pages is cleared and I uncharge memcg with the order of the large page.
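
To make the flow above concrete, here is a rough userspace sketch of the
same bookkeeping. It is only an illustration under simplifying
assumptions: struct toy_page, struct toy_memcg and the toy_*() helpers
are hypothetical stand-ins, not kernel APIs, and the sketch ignores css
reference counting and statistics entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup: just a page counter. */
struct toy_memcg {
	long charged_pages;
};

/* Hypothetical stand-in for struct page: only the ownership tag. */
struct toy_page {
	struct toy_memcg *memcg_data;
};

/* Charge the whole 2^order block once and tag every sub-page. */
static void toy_charge_block(struct toy_page *block, unsigned int order,
			     struct toy_memcg *cg)
{
	size_t nr_pages = (size_t)1 << order;
	size_t i;

	cg->charged_pages += (long)nr_pages;
	for (i = 0; i < nr_pages; i++)
		block[i].memcg_data = cg;
}

/* Page cache insertion: charge only pages that carry no tag yet. */
static void toy_add_to_page_cache(struct toy_page *page,
				  struct toy_memcg *cg)
{
	if (!page->memcg_data) {
		cg->charged_pages += 1;
		page->memcg_data = cg;
	}
}

/* Free path: clear every tag, then uncharge the block in one go. */
static void toy_uncharge_block(struct toy_page *block, unsigned int order)
{
	size_t nr_pages = (size_t)1 << order;
	struct toy_memcg *cg = block[0].memcg_data;
	size_t i;

	assert(cg != NULL);	/* the block must have been charged */
	for (i = 0; i < nr_pages; i++)
		block[i].memcg_data = NULL;
	cg->charged_pages -= (long)nr_pages;
}
```

Charging an order-3 block and then passing one of its sub-pages to
toy_add_to_page_cache() leaves the counter at 8, because the tag makes
the second charge a no-op; toy_uncharge_block() brings it back to 0.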

An alternative would be to uncharge a small page from kmem in
secretmem_fault() and make this page charged in add_to_page_cache(), but
that would complicate the release path as I would need to re-charge the
small page back to kmem at secretmem_freepage() and track all the
participating memcgs till the large page is freed.

> And what about statistics?

Hmm, those probably won't be accurate :-/

> I'm sorry for being late.
>
> Thank you!
>
> > +
> > + return 0;
> > +}
> > +
> > +static void secretmem_memcg_uncharge(struct page *page, int order)
> > +{
> > + unsigned long nr_pages = (1 << order);
> > + int i;
> > +
> > + for (i = 1; i < nr_pages; i++) {
> > + struct page *p = page + i;
> > +
> > + p->memcg_data = 0;
> > + }
> > +
> > + memcg_kmem_uncharge_page(page, PMD_PAGE_ORDER);
> > +}
> > +
> > static int secretmem_pool_increase(struct secretmem_ctx *ctx, gfp_t gfp)
> > {
> > unsigned long nr_pages = (1 << PMD_PAGE_ORDER);
> > @@ -61,10 +94,14 @@ static int secretmem_pool_increase(struct secretmem_ctx *ctx, gfp_t gfp)
> > if (!page)
> > return -ENOMEM;
> >
> > - err = set_direct_map_invalid_noflush(page, nr_pages);
> > + err = secretmem_memcg_charge(page, gfp, PMD_PAGE_ORDER);
> > if (err)
> > goto err_cma_release;
> >
> > + err = set_direct_map_invalid_noflush(page, nr_pages);
> > + if (err)
> > + goto err_memcg_uncharge;
> > +
> > addr = (unsigned long)page_address(page);
> > err = gen_pool_add(pool, addr, PMD_SIZE, NUMA_NO_NODE);
> > if (err)
> > @@ -81,6 +118,8 @@ static int secretmem_pool_increase(struct secretmem_ctx *ctx, gfp_t gfp)
> > * won't fail
> > */
> > set_direct_map_default_noflush(page, nr_pages);
> > +err_memcg_uncharge:
> > + secretmem_memcg_uncharge(page, PMD_PAGE_ORDER);
> > err_cma_release:
> > cma_release(secretmem_cma, page, nr_pages);
> > return err;
> > @@ -310,6 +349,7 @@ static void secretmem_cleanup_chunk(struct gen_pool *pool,
> > int i;
> >
> > set_direct_map_default_noflush(page, nr_pages);
> > + secretmem_memcg_uncharge(page, PMD_PAGE_ORDER);
> >
> > for (i = 0; i < nr_pages; i++)
> > clear_highpage(page + i);
> > --
> > 2.28.0
> >
> >
--
Sincerely yours,
Mike.