From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH 00/18] mm: memcontrol: charge swapin pages on instantiation
To: Johannes Weiner, Joonsoo Kim
Cc: Shakeel Butt, Hugh Dickins, Michal Hocko, "Kirill A. Shutemov",
 Roman Gushchin, linux-mm@kvack.org, cgroups@vger.kernel.org,
 linux-kernel@vger.kernel.org, kernel-team@fb.com
References: <20200420221126.341272-1-hannes@cmpxchg.org>
From: Alex Shi <alex.shi@linux.alibaba.com>
Date: Tue, 21 Apr 2020 17:32:43 +0800
In-Reply-To: <20200420221126.341272-1-hannes@cmpxchg.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2020/4/21 6:11 AM, Johannes Weiner wrote:
> This patch series reworks memcg to charge swapin pages directly at
> swapin time, rather than at fault time, which may be much later, or
> not happen at all.
>
> The delayed charging scheme we have right now causes problems:
>
> - Alex's per-cgroup lru_lock patches rely on pages that have been
>   isolated from the LRU to have a stable page->mem_cgroup; otherwise
>   the lock may change underneath him. Swapcache pages are charged only
>   after they are added to the LRU, and charging doesn't follow the LRU
>   isolation protocol.

Hi Johannes,

Thanks a lot!
It all looks fine to me. I will rebase the per-cgroup lru_lock series on this.

Thanks!
Alex

>
> - Joonsoo's anon workingset patches need a suitable LRU at the time
>   the page enters the swap cache and displaces the non-resident
>   info. But the correct LRU is only available after charging.
>
> - It's a containment hole / DoS vector. Users can trigger arbitrarily
>   large swap readahead using MADV_WILLNEED. The memory is never
>   charged unless somebody actually touches it.
>
> - It complicates the page->mem_cgroup stabilization rules
>
> In order to charge pages directly at swapin time, the memcg code base
> needs to be prepared, and several overdue cleanups become a necessity:
>
> To charge pages at swapin time, we need to always have cgroup
> ownership tracking of swap records.
> We also cannot rely on page->mapping to tell apart page types at
> charge time, because that's only set up during a page fault.
>
> To eliminate the page->mapping dependency, memcg needs to ditch its
> private page type counters (MEMCG_CACHE, MEMCG_RSS, NR_SHMEM) in favor
> of the generic vmstat counters and accounting sites, such as
> NR_FILE_PAGES, NR_ANON_MAPPED etc.
>
> To switch to generic vmstat counters, the charge sequence must be
> adjusted such that page->mem_cgroup is set up by the time these
> counters are modified.
>
> The series is structured as follows:
>
> 1. Bug fixes
> 2. Decoupling charging from rmap
> 3. Swap controller integration into memcg
> 4. Direct swapin charging
>
> The patches survive a simple swapout->swapin test inside a virtual
> machine. Because this is blocking two major patch sets, I'm sending
> these out early and will continue testing in parallel to the review.
>
>  include/linux/memcontrol.h |  53 +----
>  include/linux/mm.h         |   4 +-
>  include/linux/swap.h       |   6 +-
>  init/Kconfig               |  17 +-
>  kernel/events/uprobes.c    |  10 +-
>  mm/filemap.c               |  43 ++---
>  mm/huge_memory.c           |  45 ++---
>  mm/khugepaged.c            |  25 +--
>  mm/memcontrol.c            | 448 ++++++++++++++-------------------------------
>  mm/memory.c                |  51 ++---
>  mm/migrate.c               |  20 +-
>  mm/rmap.c                  |  53 +++--
>  mm/shmem.c                 | 117 +++++------
>  mm/swap_cgroup.c           |   6 -
>  mm/swap_state.c            |  89 +++++----
>  mm/swapfile.c              |  25 +--
>  mm/userfaultfd.c           |   5 +-
>  17 files changed, 367 insertions(+), 650 deletions(-)
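
For readers following the thread, the containment hole named in the quoted cover letter (readahead pages that are never charged unless touched) can be illustrated with a toy accounting model. This is only a sketch under stated assumptions: ToyCgroup, ToySwapCache, readahead() and fault() are hypothetical names invented for the illustration, not kernel interfaces, and the model ignores reclaim, limits and LRU handling entirely.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCgroup:
    """Hypothetical stand-in for a memory cgroup: only counts charged pages."""
    name: str
    charged_pages: int = 0


@dataclass
class ToySwapCache:
    """Toy swap cache; maps a swap slot to whether its page is charged.

    charge_at_swapin=False models the delayed scheme (charge at fault time),
    charge_at_swapin=True models charging when the page is instantiated.
    """
    charge_at_swapin: bool
    resident: dict = field(default_factory=dict)

    def readahead(self, owner: ToyCgroup, slots) -> None:
        # Bring pages into the swap cache, e.g. after an MADV_WILLNEED hint.
        for slot in slots:
            if slot in self.resident:
                continue
            if self.charge_at_swapin:
                owner.charged_pages += 1
            self.resident[slot] = self.charge_at_swapin

    def fault(self, owner: ToyCgroup, slot: int) -> None:
        # A task actually touches the page.
        if slot not in self.resident:
            self.readahead(owner, [slot])
        if not self.resident[slot]:
            owner.charged_pages += 1
            self.resident[slot] = True


def charged_after_readahead(charge_at_swapin: bool, nr_pages: int) -> int:
    """Return how many pages are charged after readahead alone, no faults."""
    cg = ToyCgroup("demo")
    cache = ToySwapCache(charge_at_swapin=charge_at_swapin)
    cache.readahead(cg, range(nr_pages))
    return cg.charged_pages


if __name__ == "__main__":
    print("delayed charging:", charged_after_readahead(False, 1000))
    print("charge at swapin:", charged_after_readahead(True, 1000))
```

In the delayed variant the readahead pages occupy memory while the owner's charge count stays untouched, which is the gap the series is described as closing; in the other variant every instantiated page is accounted immediately and a later fault does not charge it twice.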