From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753415AbZEZDOq (ORCPT ); Mon, 25 May 2009 23:14:46 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1752775AbZEZDOh (ORCPT ); Mon, 25 May 2009 23:14:37 -0400
Received: from fgwmail7.fujitsu.co.jp ([192.51.44.37]:36316 "EHLO
	fgwmail7.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752771AbZEZDOh (ORCPT );
	Mon, 25 May 2009 23:14:37 -0400
Date: Tue, 26 May 2009 12:12:59 +0900
From: KAMEZAWA Hiroyuki
To: "linux-mm@kvack.org"
Cc: "balbir@linux.vnet.ibm.com", "nishimura@mxp.nes.nec.co.jp",
	"hugh.dickins@tiscali.co.uk", "hannes@cmpxchg.org",
	"linux-kernel@vger.kernel.org"
Subject: [RFC][PATCH] memcg: fix swap account (26/May)[0/5]
Message-Id: <20090526121259.b91b3e9d.kamezawa.hiroyu@jp.fujitsu.com>
Organization: FUJITSU Co. LTD.
X-Mailer: Sylpheed 2.5.0 (GTK+ 2.10.14; i686-pc-mingw32)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

As Nishimura reported, there is a race in the handling of swap cache.

Typical cases are as follows (from Nishimura's mail):

== Type-1 ==
  If some pages of processA have been swapped out, it calls
  free_swap_and_cache(). If, at the same time, processB is calling
  read_swap_cache_async() on a swap entry *that is used by processA*,
  a race like the one below can happen.

            processA                   |           processB
  -------------------------------------+-------------------------------------
    (free_swap_and_cache())            |  (read_swap_cache_async())
                                       |    swap_duplicate()
                                       |    __set_page_locked()
                                       |    add_to_swap_cache()
      swap_entry_free() == 0           |
      find_get_page() -> found         |
      trylock_page() -> fail & return  |
                                       |    lru_cache_add_anon()
                                       |      doesn't link this page to memcg's
                                       |      LRU, because of !PageCgroupUsed.

  This type of leak can be avoided by setting /proc/sys/vm/page-cluster
  to 0 (which disables swap readahead).

== Type-2 ==
  Assume processA is exiting and a pte points to a page (!PageSwapCache),
  while processB is trying to reclaim the page.

            processA                   |           processB
  -------------------------------------+-------------------------------------
    (page_remove_rmap())               |  (shrink_page_list())
       mem_cgroup_uncharge_page()      |
          ->uncharged because it's not |
            PageSwapCache yet.         |
            So, both mem/memsw.usage   |
            are decremented.           |
                                       |    add_to_swap() -> added to swap cache.

  If this page goes through without being freed for some reason, it
  doesn't go back to memcg's LRU because of !PageCgroupUsed.
==

This patch set is an attempt to fix the above problems by fixing memcg's
swap accounting logic. It requires a fair amount of change in the swap code.

Compared with my previous post (22/May)
(http://marc.info/?l=linux-mm&m=124297915418698&w=2),
I think this one is much easier to read...

[1/5] change interface of swap_duplicate()/swap_free()
    Add the functions swapcache_prepare() and swapcache_free().

[2/5] add SWAP_HAS_CACHE flag to swap_map
    Add a SWAP_HAS_CACHE flag to the swap_map array, so that the state
    "there is only a swap cache and the swap entry has no reference"
    can be detected without calling find_get_page().

[3/5] Count the number of swap-cache-only swaps
    After repeated swap-in/out, there are tons of cache-only swap entries
    (via a mapped swapcache under vm_swap_full()==false).
    This patch counts the number of such entries and shows it in the
    debug information (for example, sysrq-m).

[4/5] fix memcg's swap accounting.
    Change memcg's swap accounting logic to look at the number of
    references to the swap entry.

[5/5] experimental garbage collection for cache-only swaps.
    Reclaim swap entries which are not used.

Patch [4/5] is for type-1.
Patch [5/5] is for type-2 and for the sanity of swap control...

Thank you for all your help. Any comments are welcome.

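To make the idea behind [2/5] and [3/5] concrete, here is a rough userspace
sketch of a swap_map slot that keeps a "has cache" bit next to its reference
count. Everything below is an assumption made for illustration: the bit
layout, the swap_map_t type and the slot_*() helper names are hypothetical
and are not taken from the patches, and the comments relating them to
swap_duplicate()/swap_free()/swapcache_prepare()/swapcache_free() are only a
guess at the intent.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed layout, for illustration only; the patches may encode this differently. */
typedef unsigned short swap_map_t;
#define SWAP_HAS_CACHE  ((swap_map_t)0x8000)	/* a swap cache page exists */
#define SWAP_COUNT_MASK ((swap_map_t)0x7fff)	/* number of references (ptes etc.) */

/* Number of references held on the slot, ignoring the cache bit. */
static unsigned int slot_count(swap_map_t ent)
{
	return (unsigned int)(ent & SWAP_COUNT_MASK);
}

static bool slot_has_cache(swap_map_t ent)
{
	return (ent & SWAP_HAS_CACHE) != 0;
}

/* "Cache-only": only the swap cache keeps the slot alive, nobody references it. */
static bool slot_cache_only(swap_map_t ent)
{
	return slot_has_cache(ent) && slot_count(ent) == 0;
}

/* Take a reference (roughly the role of swap_duplicate()). */
static void slot_get_ref(swap_map_t *ent)
{
	assert(slot_count(*ent) < (unsigned int)SWAP_COUNT_MASK);
	*ent = (swap_map_t)(*ent + 1);
}

/* Drop a reference (roughly the role of swap_free()). */
static void slot_put_ref(swap_map_t *ent)
{
	assert(slot_count(*ent) > 0);
	*ent = (swap_map_t)(*ent - 1);
}

/* Mark that a swap cache page exists (assumed role of swapcache_prepare()). */
static void slot_set_cache(swap_map_t *ent)
{
	*ent = (swap_map_t)(*ent | SWAP_HAS_CACHE);
}

/* Mark that the swap cache page is gone (assumed role of swapcache_free()). */
static void slot_clear_cache(swap_map_t *ent)
{
	*ent = (swap_map_t)(*ent & ~SWAP_HAS_CACHE);
}
```

In this model, a slot that goes through slot_get_ref(), slot_set_cache() and
then slot_put_ref() ends up cache-only, and that state is visible from the
slot value alone, without looking up the page. Such slots are presumably what
[3/5] counts and what [5/5] tries to reclaim.
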
Thanks,
-Kame