Message-ID: <4F69A4C4.4080602@jp.fujitsu.com>
Date: Wed, 21 Mar 2012 18:52:04 +0900
From: KAMEZAWA Hiroyuki
To: "linux-mm@kvack.org", Linux Kernel, "cgroups@vger.kernel.org",
    Hugh Dickins, "n-horiguchi@ah.jp.nec.com", Johannes Weiner,
    Michal Hocko, Glauber Costa, Andrew Morton
Subject: [PATCH] memcg: change behavior of moving charges at task move

As discussed before, I am posting this to fix the spec and the implementation
of task moving. Which kernel version do you think this should target? 3.4 or
3.5? It may already be too late for 3.4...

==
The documentation says that 'shared anon pages are not moved', but the check
in the implementation was wrong:

	if (!move_anon() || page_mapcount(page) > 2)

Since page_mapcount() is 1 for a page mapped by a single task, this '> 2'
test still lets a page shared by two tasks be moved; memcg has in fact been
moving shared anon pages for a long time.

Here is the reasoning about how shared anon pages should be handled:

 - It's complex.
 - Shared file caches are already moved unconditionally.
 - page_mapcount() is an unclear check; a correct check would also have to
   count swap users, etc.
 - Nobody noticed this implementation behavior, so nobody benefits from the
   design.
 - In general, once a task has been moved into a cgroup to run there, it will
   not be moved again...
 - Finally, we already have a control knob: memory.move_charge_at_immigrate.

Here is a patch that allows shared pages to be moved unconditionally. This
makes memcg simpler and fixes the currently broken code.

Note: IIUC, libcgroup's cgroup daemon moves tasks after exec(), so it is not
affected. libcgroup's "cgexec" command moves itself to a memcg and calls
exec() without fork(); it is not affected either.

Changelog:
 - fixed PageAnon() check.
 - removed the call to lookup_swap_cache().
 - fixed Documentation.

Suggested-by: Hugh Dickins
Signed-off-by: KAMEZAWA Hiroyuki
---
 Documentation/cgroups/memory.txt |    9 ++++-----
 include/linux/swap.h             |    9 ---------
 mm/memcontrol.c                  |   15 +++++++--------
 mm/swapfile.c                    |   31 -------------------------------
 4 files changed, 11 insertions(+), 53 deletions(-)

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 4c95c00..84d4f00 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -185,12 +185,14 @@ behind this approach is that a cgroup that aggressively uses a shared
 page will eventually get charged for it (once it is uncharged from
 the cgroup that brought it in -- this will happen on memory pressure).
 
+But see section 8.2: when moving a task to another cgroup, its pages may
+be recharged to the new cgroup, if move_charge_at_immigrate has been chosen.
+
 Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used.
 When you do swapoff and make swapped-out pages of shmem(tmpfs) to
 be backed into memory in force, charges for pages are accounted against the
 caller of swapoff rather than the users of shmem.
 
-
 2.4 Swap Extension (CONFIG_CGROUP_MEM_RES_CTLR_SWAP)
 
 Swap Extension allows you to record charge for swap. A swapped-in page is
@@ -623,8 +625,7 @@ memory cgroup.
   bit | what type of charges would be moved ?
  -----+------------------------------------------------------------------------
    0  | A charge of an anonymous page(or swap of it) used by the target task.
-      | Those pages and swaps must be used only by the target task. You must
-      | enable Swap Extension(see 2.4) to enable move of swap charges.
+      | You must enable Swap Extension(see 2.4) to enable move of swap charges.
  -----+------------------------------------------------------------------------
    1  | A charge of file pages(normal file, tmpfs file(e.g. ipc shared memory)
       | and swaps of tmpfs file) mmapped by the target task. Unlike the case of
@@ -637,8 +638,6 @@ memory cgroup.
 
 8.3 TODO
 
-- Implement madvise(2) to let users decide the vma to be moved or not to be
-  moved.
 - All of moving charge operations are done under cgroup_mutex. It's not good
   behavior to hold the mutex too long, so we may need some trick.
 
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 6e66c03..70d2c74 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -387,7 +387,6 @@ static inline void deactivate_swap_token(struct mm_struct *mm, bool swap_token)
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 extern void
 mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout);
-extern int mem_cgroup_count_swap_user(swp_entry_t ent, struct page **pagep);
 #else
 static inline void
 mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout)
@@ -532,14 +531,6 @@ mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent)
 {
 }
 
-#ifdef CONFIG_CGROUP_MEM_RES_CTLR
-static inline int
-mem_cgroup_count_swap_user(swp_entry_t ent, struct page **pagep)
-{
-	return 0;
-}
-#endif
-
 #endif /* CONFIG_SWAP */
 #endif /* __KERNEL__*/
 #endif /* _LINUX_SWAP_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b2ee6df..a48d185 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5147,7 +5147,7 @@ static struct page *mc_handle_present_pte(struct vm_area_struct *vma,
 		return NULL;
 	if (PageAnon(page)) {
 		/* we don't move shared anon */
-		if (!move_anon() || page_mapcount(page) > 2)
+		if (!move_anon())
 			return NULL;
 	} else if (!move_file())
 		/* we ignore mapcount for file pages */
@@ -5161,18 +5161,17 @@ static struct page *mc_handle_present_pte(struct vm_area_struct *vma,
 static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
 			unsigned long addr, pte_t ptent, swp_entry_t *entry)
 {
-	int usage_count;
 	struct page *page = NULL;
 	swp_entry_t ent = pte_to_swp_entry(ptent);
 
 	if (!move_anon() || non_swap_entry(ent))
 		return NULL;
-	usage_count = mem_cgroup_count_swap_user(ent, &page);
-	if (usage_count > 1) { /* we don't move shared anon */
-		if (page)
-			put_page(page);
-		return NULL;
-	}
+#ifdef CONFIG_SWAP
+	/*
+	 * Avoid lookup_swap_cache() not to update statistics.
+	 */
+	page = find_get_page(&swapper_space, ent.val);
+#endif
 	if (do_swap_account)
 		entry->val = ent.val;
 
diff --git a/mm/swapfile.c b/mm/swapfile.c
index dae42f3..fedeb6b 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -717,37 +717,6 @@ int free_swap_and_cache(swp_entry_t entry)
 	return p != NULL;
 }
 
-#ifdef CONFIG_CGROUP_MEM_RES_CTLR
-/**
- * mem_cgroup_count_swap_user - count the user of a swap entry
- * @ent: the swap entry to be checked
- * @pagep: the pointer for the swap cache page of the entry to be stored
- *
- * Returns the number of the user of the swap entry. The number is valid only
- * for swaps of anonymous pages.
- * If the entry is found on swap cache, the page is stored to pagep with
- * refcount of it being incremented.
- */
-int mem_cgroup_count_swap_user(swp_entry_t ent, struct page **pagep)
-{
-	struct page *page;
-	struct swap_info_struct *p;
-	int count = 0;
-
-	page = find_get_page(&swapper_space, ent.val);
-	if (page)
-		count += page_mapcount(page);
-	p = swap_info_get(ent);
-	if (p) {
-		count += swap_count(p->swap_map[swp_offset(ent)]);
-		spin_unlock(&swap_lock);
-	}
-
-	*pagep = page;
-	return count;
-}
-#endif
-
 #ifdef CONFIG_HIBERNATION
 /*
  * Find the swap type that corresponds to given device (if any).
-- 
1.7.4.1
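For reference, the knob discussed above is driven from userspace roughly as
follows. This is a minimal sketch, not part of the patch: it assumes a
cgroup v1 memory controller mounted at /sys/fs/cgroup/memory, an existing
destination group named "dest", and a small write_file() helper; all of
those names are illustrative.

/*
 * Minimal sketch: enable moving of anon (bit 0) and file (bit 1) charges
 * on a destination memcg, then attach a task so its charges follow it.
 * Mount point and group name are assumptions, not part of the patch.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void write_file(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		exit(1);
	}
	fprintf(f, "%s\n", val);
	fclose(f);
}

int main(int argc, char *argv[])
{
	char pid[32];

	/* value 3 = bit 0 (anon) | bit 1 (file), see section 8.2 of memory.txt */
	write_file("/sys/fs/cgroup/memory/dest/memory.move_charge_at_immigrate", "3");

	/* move the given pid (default: ourselves); its charges are recharged */
	snprintf(pid, sizeof(pid), "%d", argc > 1 ? atoi(argv[1]) : (int)getpid());
	write_file("/sys/fs/cgroup/memory/dest/tasks", pid);

	return 0;
}

Writing "3" must be done on the destination group before the task is
attached; with this patch applied, the anon charges then move regardless of
how many tasks share the pages, matching the documentation change above.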