From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>, Tejun Heo <tj@kernel.org>,
	Vladimir Davydov <vdavydov@parallels.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [patch 13/13] mm: memcontrol: rewrite uncharge API
Date: Tue, 15 Jul 2014 11:09:37 -0400
Message-ID: <20140715150937.GS29639@cmpxchg.org>
In-Reply-To: <20140715142350.GD9366@dhcp22.suse.cz>

On Tue, Jul 15, 2014 at 04:23:50PM +0200, Michal Hocko wrote:
> On Tue 15-07-14 10:25:45, Michal Hocko wrote:
> [...]
> > diff --git a/Documentation/cgroups/memcg_test.txt b/Documentation/cgroups/memcg_test.txt
> > index bcf750d3cecd..8870b0212150 100644
> > --- a/Documentation/cgroups/memcg_test.txt
> > +++ b/Documentation/cgroups/memcg_test.txt
> [...]
> >  6. Shmem(tmpfs) Page Cache
> > -	Memcg's charge/uncharge have special handlers of shmem. The best way
> > -	to understand shmem's page state transition is to read mm/shmem.c.
> > +	The best way to understand shmem's page state transition is to read
> > +	mm/shmem.c.
>
> :D
>
> [...]
> >  7. Page Migration
> > -	One of the most complicated functions is page-migration-handler.
> > -	Memcg has 2 routines. Assume that we are migrating a page's contents
> > -	from OLDPAGE to NEWPAGE.
> > -
> > -	Usual migration logic is..
> > -	(a) remove the page from LRU.
> > -	(b) allocate NEWPAGE (migration target)
> > -	(c) lock by lock_page().
> > -	(d) unmap all mappings.
> > -	(e-1) If necessary, replace entry in radix-tree.
> > -	(e-2) move contents of a page.
> > -	(f) map all mappings again.
> > -	(g) pushback the page to LRU.
> > -	(-) OLDPAGE will be freed.
> > -
> > -	Before (g), memcg should complete all necessary charge/uncharge to
> > -	NEWPAGE/OLDPAGE.
> > -
> > -	The point is....
> > -	- If OLDPAGE is anonymous, all charges will be dropped at (d) because
> > -	  try_to_unmap() drops all mapcount and the page will not be
> > -	  SwapCache.
> > -
> > -	- If OLDPAGE is SwapCache, charges will be kept at (g) because
> > -	  __delete_from_swap_cache() isn't called at (e-1)
> > -
> > -	- If OLDPAGE is page-cache, charges will be kept at (g) because
> > -	  remove_from_swap_cache() isn't called at (e-1)
> > -
> > -	memcg provides following hooks.
> > -
> > -	- mem_cgroup_prepare_migration(OLDPAGE)
> > -	  Called after (b) to account a charge (usage += PAGE_SIZE) against
> > -	  memcg which OLDPAGE belongs to.
> > -
> > -	- mem_cgroup_end_migration(OLDPAGE, NEWPAGE)
> > -	  Called after (f) before (g).
> > -	  If OLDPAGE is used, commit OLDPAGE again. If OLDPAGE is already
> > -	  charged, a charge by prepare_migration() is automatically canceled.
> > -	  If NEWPAGE is used, commit NEWPAGE and uncharge OLDPAGE.
> > -
> > -	  But zap_pte() (by exit or munmap) can be called while migration,
> > -	  we have to check if OLDPAGE/NEWPAGE is a valid page after commit().
> > +
> > +	mem_cgroup_migrate()
>
> This doesn't tell us anything about the page migration. On the other
> hand I am not entirely sure the documentation here is very much helpful.
> There is some outdated information. I wouldn't be opposed to remove
> everything up to "9. Typical Tests." section which should be the primary
> target of the file anyway.

Yeah, documentation of the implementation should be directly in the
source code and this file is kind of pointless. So all I did there
was remove things that were wrong after my changes. But I agree it
can probably be removed completely.
> > @@ -382,9 +382,13 @@ static inline int mem_cgroup_swappiness(struct mem_cgroup *mem)
> >  }
> >  #endif
> >  #ifdef CONFIG_MEMCG_SWAP
> > -extern void mem_cgroup_uncharge_swap(swp_entry_t ent);
> > +extern void mem_cgroup_swapout(struct page *page, swp_entry_t entry);
> > +extern void mem_cgroup_uncharge_swap(swp_entry_t entry);
>
> Wouldn't it be nicer to have those two with symmetric names?
> mem_cgroup_{un}charge_swap?

I thought about that when I wrote them, but their operation is not
actually symmetrical. The first one migrates a memsw charge from a
page to a swap entry when the page gets reclaimed, rather than when
the swap entry is allocated; the second one uncharges the swap entry
once the swap entry is released.

> > @@ -2760,15 +2752,15 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg,
> >  		spin_unlock_irq(&zone->lru_lock);
> >  	}
> >
> > -	mem_cgroup_charge_statistics(memcg, page, anon, nr_pages);
> > -	unlock_page_cgroup(pc);
> > -
> > +	local_irq_disable();
> > +	mem_cgroup_charge_statistics(memcg, page, nr_pages);
> >  	/*
> >  	 * "charge_statistics" updated event counter. Then, check it.
> >  	 * Insert ancestor (and ancestor's ancestors), to softlimit RB-tree.
> >  	 * if they exceeds softlimit.
> >  	 */
> >  	memcg_check_events(memcg, page);
> > +	local_irq_enable();
>
> preempt_{enable,disable} should be sufficient for
> mem_cgroup_charge_statistics and memcg_check_events no?
> The first one is about per-cpu accounting (and that should be atomic
> wrt. IRQ on the same CPU) and the latter one uses IRQ safe locks down in
> mem_cgroup_update_tree.

How could it be atomic wrt. IRQ on the local CPU when IRQs that modify
the counters can fire on the local CPU?
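
To make the concern in that question concrete, here is a minimal
userspace sketch of a lost update. It is not kernel code: the struct
and the three functions are hypothetical stand-ins, and it assumes the
counter update is a plain load/add/store rather than a single
IRQ-safe instruction. Disabling preemption keeps the task on its CPU
but does not keep an interrupt handler from running in the window
between the load and the store.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one CPU's slot of a per-cpu statistics counter. */
struct percpu_counter_model {
	long value;
};

/* Hypothetical interrupt handler that also bumps the same counter. */
static void irq_handler_model(struct percpu_counter_model *c)
{
	c->value += 1;
}

/*
 * Preemption disabled, IRQs enabled: the update is a separate load and
 * store, and irq_fires says whether the simulated interrupt arrives in
 * between. If it does, the store overwrites the handler's increment.
 */
static long update_with_irq_window(struct percpu_counter_model *c, long n,
				   int irq_fires)
{
	long tmp;

	assert(c != NULL);
	tmp = c->value;			/* load */
	if (irq_fires)
		irq_handler_model(c);	/* handler runs on the same CPU */
	c->value = tmp + n;		/* store based on the stale load */
	return c->value;
}

/*
 * IRQs disabled around the update: a pending interrupt is only delivered
 * after the store, so both updates are preserved.
 */
static long update_with_irqs_off(struct percpu_counter_model *c, long n,
				 int irq_pending)
{
	long tmp;

	assert(c != NULL);
	tmp = c->value;			/* load */
	c->value = tmp + n;		/* store, no handler in between */
	if (irq_pending)
		irq_handler_model(c);	/* delivered once IRQs are back on */
	return c->value;
}
```

Starting from zero and adding 5 with one interrupt in flight, the
first variant ends at 5 (the handler's increment is lost) and the
second ends at 6. Whether the real counters are ever touched from IRQ
context is exactly what the question above is asking; the sketch only
shows why it matters.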
> > @@ -780,11 +780,14 @@ static int move_to_new_page(struct page *newpage, struct page *page,
> >  		rc = fallback_migrate_page(mapping, newpage, page, mode);
> >
> >  	if (rc != MIGRATEPAGE_SUCCESS) {
> > -		newpage->mapping = NULL;
> > +		if (!PageAnon(newpage))
> > +			newpage->mapping = NULL;
>
> OK, I am probably washed out from looking into this for too long but I
> cannot figure why have you done this...

mem_cgroup_uncharge() relies on PageAnon() working. Usually, anon
pages retain their page->mapping until they hit the page allocator;
the exception was old migration pages.

> >  	} else {
> > +		mem_cgroup_migrate(page, newpage, false);
> >  		if (remap_swapcache)
> >  			remove_migration_ptes(page, newpage);
> > -		page->mapping = NULL;
> > +		if (!PageAnon(page))
> > +			page->mapping = NULL;
> >  	}
> >
> >  	unlock_page(newpage);
>
> [...]
>
> The semantic is much cleaner now. I have to digest details about the
> patch because it is really huge. But nothing really jumped at me during
> the review (except for few minor things mentioned here and one mentioned
> in other email regarding USED flag).
>
> Good work!

Thanks!
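
The PageAnon() dependency mentioned above can be illustrated with a
small userspace model. PageAnon() tests a flag bit stored in the low
bits of page->mapping, so setting the field to NULL makes an anon page
look like a non-anon one. The names below (page_model,
anon_vma_model, MAPPING_ANON_BIT and the helpers) are hypothetical
stand-ins, not the kernel definitions; this is a sketch of the idea
under that assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's PAGE_MAPPING_ANON low bit. */
#define MAPPING_ANON_BIT ((uintptr_t)1)

/* Hypothetical, heavily reduced models of struct anon_vma and struct page. */
struct anon_vma_model {
	int unused;
};

struct page_model {
	void *mapping;
};

/* Models PageAnon(): the answer is encoded in the mapping pointer itself. */
static int page_anon_model(const struct page_model *page)
{
	return ((uintptr_t)page->mapping & MAPPING_ANON_BIT) != 0;
}

/* Tag the pointer with the anon bit, as is done when a page becomes anon. */
static void set_anon_mapping(struct page_model *page,
			     struct anon_vma_model *av)
{
	/* The tag only works if the object's address leaves bit 0 free. */
	assert(((uintptr_t)av & MAPPING_ANON_BIT) == 0);
	page->mapping = (void *)((uintptr_t)av | MAPPING_ANON_BIT);
}

/* Mirrors the patched cleanup: leave the mapping of anon pages alone. */
static void clear_mapping_unless_anon(struct page_model *page)
{
	if (!page_anon_model(page))
		page->mapping = NULL;
}
```

With this model, an anon page still reports as anon after
clear_mapping_unless_anon(), while a page whose mapping has no anon
bit ends up with mapping == NULL. An unconditional
"page->mapping = NULL", as in the removed lines, would erase the bit
that a later uncharge wants to test.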