From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-0.6 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9F44AC2BB1D for ; Fri, 17 Apr 2020 03:57:24 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 4AF60217D8 for ; Fri, 17 Apr 2020 03:57:24 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="gF3nVCry" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4AF60217D8 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id ED9488E0003; Thu, 16 Apr 2020 23:57:23 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E628B8E0001; Thu, 16 Apr 2020 23:57:23 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D2A8B8E0003; Thu, 16 Apr 2020 23:57:23 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0083.hostedemail.com [216.40.44.83]) by kanga.kvack.org (Postfix) with ESMTP id B12628E0001 for ; Thu, 16 Apr 2020 23:57:23 -0400 (EDT) Received: from smtpin27.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 66A3E180AD817 for ; Fri, 17 Apr 2020 03:57:23 +0000 (UTC) X-FDA: 76715987166.27.air09_7aae6f723c114 X-HE-Tag: air09_7aae6f723c114 X-Filterd-Recvd-Size: 8961 Received: from mail-qt1-f193.google.com (mail-qt1-f193.google.com [209.85.160.193]) by imf11.hostedemail.com (Postfix) with ESMTP for ; Fri, 17 Apr 2020 03:57:22 +0000 (UTC) Received: by mail-qt1-f193.google.com with SMTP id 71so914360qtc.12 for ; Thu, 16 Apr 2020 20:57:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=yjtIAiLRvM4NhRSEzgKLtX4qOU01Sew7tR2ybuq7TBM=; b=gF3nVCryJgS7NA0Q7bcrhnJHEbsV4OAlqBMgdFTGO0P9jJdnkTTcVNoBnb3sDiZBdi rh9N5EnMMjm5zTWPHOfSNV+91sf97EG86LdM05hsOiPkgM2qXqtmmhvoFqmGeII17BZe hRbNuZa0I0X8LRE0jfY2jjm/PSLtfrxDDkKGa82VL2hz08Y3oe8Qad7D7e9L3DqtyZrX VrWyWiOcsR/Q1iXd9xj1as4t6qfBAn7gY/MPEfZO08Te+uQRPYUzvVaXoVVqsRcGtmM2 ccyiD8H9uEEbL97e8hU9uNJi0jCQ6ekWQadgDDszipjEnBqr6hwTgRWXS/k+E4ZIlQXI Ns8w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=yjtIAiLRvM4NhRSEzgKLtX4qOU01Sew7tR2ybuq7TBM=; b=qamW1Xj9qxa8hpvVIR+BbTcAMMrfGiYI5IPsEJ/ueBq1utMPxVEUnJRcoWm3iHuyNs cN+eLoOY0GNxrRamSfAxShESMjUQH7s9okgqUkBpqagZy9qwvyf/SaUOl9RugskpGXmB s+fqVjFJa3O3OqKmqq+dhVBe2My+ZhuS6D7bxGMpvWHePHqfoSSLhhTyz9n8rKY6Zd6k LONQ2dQ2cfmncuzUm68oGBBRmfwWqOZuXXXainqN2fD+lHUUQIew1+Au9GAVKp2Qeobb YIvjQs3J8z0JbRcRJcKv3j78wOvpVykDleFhSlgaeKFfsNszmPg5a0qzRvh7EBH1YNeH fCyw== X-Gm-Message-State: AGi0PubNmHjQ1F6qzNBoFGa4KjttCtLMnzmjcN/cbByUMY5w53QFmi2h vjxhrvUDhMLsp3OHO/7plSZvBfiBg1ag8XV323Y= X-Google-Smtp-Source: APiQypKkU13IgXX5+0X25ICcUavl/O5CzSKXCNQKMTt5W0YNktF38dQ+Afii6oFv07fHUV+xiHFiumWxG+weZ+AMYos= X-Received: by 2002:ac8:2ac6:: with SMTP id c6mr1097238qta.35.1587095842162; Thu, 16 Apr 2020 20:57:22 -0700 (PDT) MIME-Version: 1.0 References: <1585892447-32059-1-git-send-email-iamjoonsoo.kim@lge.com> <1585892447-32059-6-git-send-email-iamjoonsoo.kim@lge.com> <20200416161133.GA178028@cmpxchg.org> <20200417033116.GJ195132@cmpxchg.org> In-Reply-To: <20200417033116.GJ195132@cmpxchg.org> From: Joonsoo Kim Date: Fri, 17 Apr 2020 12:57:11 +0900 Message-ID: Subject: Re: [PATCH v5 05/10] mm/swap: charge the page when adding to the swap cache To: Johannes Weiner Cc: Andrew Morton , Linux Memory Management List , LKML , Michal Hocko , Hugh Dickins , Minchan Kim , Vlastimil Babka , Mel Gorman , kernel-team@lge.com, Joonsoo Kim Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: 2020=EB=85=84 4=EC=9B=94 17=EC=9D=BC (=EA=B8=88) =EC=98=A4=ED=9B=84 12:31, = Johannes Weiner =EB=8B=98=EC=9D=B4 =EC=9E=91=EC=84=B1: > > On Fri, Apr 17, 2020 at 10:38:53AM +0900, Joonsoo Kim wrote: > > 2020=EB=85=84 4=EC=9B=94 17=EC=9D=BC (=EA=B8=88) =EC=98=A4=EC=A0=84 1:1= 1, Johannes Weiner =EB=8B=98=EC=9D=B4 =EC=9E=91=EC=84= =B1: > > > > > > Hello Joonsoo, > > > > > > On Fri, Apr 03, 2020 at 02:40:43PM +0900, js1304@gmail.com wrote: > > > > @@ -112,7 +112,7 @@ void show_swap_cache_info(void) > > > > * but sets SwapCache flag and private instead of mapping and inde= x. > > > > */ > > > > int add_to_swap_cache(struct page *page, swp_entry_t entry, > > > > - gfp_t gfp, void **shadowp) > > > > + struct vm_area_struct *vma, gfp_t gfp, void *= *shadowp) > > > > { > > > > struct address_space *address_space =3D swap_address_space(en= try); > > > > pgoff_t idx =3D swp_offset(entry); > > > > @@ -120,14 +120,26 @@ int add_to_swap_cache(struct page *page, swp_= entry_t entry, > > > > unsigned long i, nr =3D compound_nr(page); > > > > unsigned long nrexceptional =3D 0; > > > > void *old; > > > > + bool compound =3D !!compound_order(page); > > > > + int error; > > > > + struct mm_struct *mm =3D vma ? vma->vm_mm : current->mm; > > > > + struct mem_cgroup *memcg; > > > > > > > > VM_BUG_ON_PAGE(!PageLocked(page), page); > > > > VM_BUG_ON_PAGE(PageSwapCache(page), page); > > > > VM_BUG_ON_PAGE(!PageSwapBacked(page), page); > > > > > > > > page_ref_add(page, nr); > > > > + /* PageSwapCache() prevent the page from being re-charged */ > > > > SetPageSwapCache(page); > > > > > > > > + error =3D mem_cgroup_try_charge(page, mm, gfp, &memcg, compou= nd); > > > > + if (error) { > > > > + ClearPageSwapCache(page); > > > > + page_ref_sub(page, nr); > > > > + return error; > > > > + } > > > > + > > > > do { > > > > xas_lock_irq(&xas); > > > > xas_create_range(&xas); > > > > @@ -153,11 +165,16 @@ int add_to_swap_cache(struct page *page, swp_= entry_t entry, > > > > xas_unlock_irq(&xas); > > > > } while (xas_nomem(&xas, gfp)); > > > > > > > > - if (!xas_error(&xas)) > > > > + if (!xas_error(&xas)) { > > > > + mem_cgroup_commit_charge(page, memcg, false, compound= ); > > > > > > Unfortunately you cannot commit here yet because the rmap isn't set u= p > > > and that will cause memcg_charge_statistics() to account the page > > > incorrectly as file. And rmap is only set up during a page fault. > > > > I also found this problem a few days ago. In my investigation, what we = need for > > anonymous page to make accounting correct is a way to find the type of = the page, > > file or anon, since there is no special code to use the rmap. And, I > > think that it > > could be done by checking NULL mapping or something else. > > page->mapping is NULL for truncated file pages, file pages before they > are inserted into the page cache, and anon pages before the rmap. It's > not straight-forward to robustly tell those apart inside memcg. Okay. > But fundamentally, it's a silly problem to fix. We only need to tell > page types apart to maintain the MEMCG_CACHE and MEMCG_RSS > counters. But these are unnecessary duplicates of the NR_FILE_PAGES > and NR_ANON_MAPPED vmstat counters - counters for which we already > have accounting sites in generic code, and that already exist in the > per-cgroup vmstat arrays. We just need to link them. > > So that's what I'm fixing instead: I'm adjusting the charging sequence > slightly so that page->mem_cgroup is stable when the core VM code > accounts pages by type. And then I'm hooking into these places with > mod_lruvec_page_state and friends, and ditching MEMCG_CACHE/MEMCG_RSS. > > After that, memcg doesn't have to know about the types of pages at > all. It can focus on maintaining page_counters and page->mem_cgroup, > and leave the vmstat breakdown to generic VM code. > > Then we can charge pages right after allocation, regardless of type. > > [ Eventually, the memcg accounting interface shouldn't be much more > than GFP_ACCOUNT (with memalloc_use_memcg() for faults, swap etc.), > and the vmstat hooks. My patches don't quite get there, but that's > the direction they're pushing. ] Thanks for explanation! It will help me understand your future patches. > > Is there anything I missed? And, I cannot find the function, > > memcg_charge_statistics(). Please let me know the file name of this > > function. > > The correct name is mem_cgroup_charge_statistics(), my apologies. No problem. > > > This needs a bit of a rework of the memcg charging code that appears > > > out of scope for your patches. I'm preparing a series right now to do > > > just that. It'll also fix the swap ownership tracking problem when th= e > > > swap controller is disabled, so we don't have to choose between > > > charging the wrong cgroup or hampering swap readahead efficiency. > > > > Sound good! I also think that these patches are out of scope of my seri= es. > > I will wait your patches. Could you let me know when your series is sub= mitted? > > I'd like to plan my work schedule based on your patch schedule. > > I just finished the first draft of the series a few minutes ago. I'll > polish it a bit, make sure it compiles etc. ;-), and send it out for > review, testing and rebasing, hopefully tomorrow or early next week. Sound good! See you again next week. :) Thanks.