From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: "linux-mm@kvack.org" <linux-mm@kvack.org>
Cc: "balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>,
    "nishimura@mxp.nes.nec.co.jp" <nishimura@mxp.nes.nec.co.jp>,
    "akpm@linux-foundation.org" <akpm@linux-foundation.org>,
    mingo@elte.hu,
    "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: [PATCH 0/3] fix stale swap cache account leak in memcg v7
Date: Tue, 12 May 2009 10:44:01 +0900
Message-ID: <20090512104401.28edc0a8.kamezawa.hiroyu@jp.fujitsu.com>

I hope this version gets acks..
==
As Nishimura reported, there is a race in swap cache handling.

Typical cases are as follows (from Nishimura's mail):

== Type-1 ==
  If some pages of processA have been swapped out, it calls free_swap_and_cache().
  If, at the same time, processB is calling read_swap_cache_async() on
  a swap entry *that is used by processA*, a race like the one below can happen.

            processA                   |           processB
  -------------------------------------+-------------------------------------
    (free_swap_and_cache())            |  (read_swap_cache_async())
                                       |    swap_duplicate()
                                       |    __set_page_locked()
                                       |    add_to_swap_cache()
      swap_entry_free() == 0           |
      find_get_page() -> found         |
      try_lock_page() -> fail & return |
                                       |    lru_cache_add_anon()
                                       |      doesn't link this page to memcg's
                                       |      LRU, because of !PageCgroupUsed.

  This type of leak can be avoided by setting /proc/sys/vm/page-cluster to 0.

== Type-2 ==
  Assume processA is exiting and a pte points to a page (!PageSwapCache),
  while processB is trying to reclaim the page.

            processA                   |           processB
  -------------------------------------+-------------------------------------
    (page_remove_rmap())               |  (shrink_page_list())
       mem_cgroup_uncharge_page()      |
          ->uncharged because it's not |
            PageSwapCache yet.         |
            So, both mem/memsw.usage   |
            are decremented.           |
                                       |    add_to_swap() -> added to swap cache.

  If this page goes through without being freed for some reason, it
  doesn't go back to memcg's LRU because of !PageCgroupUsed.

Considering Type-1, it's better to avoid swapin-readahead when memcg is used.
swapin-readahead just reads swp_entries that are near the requested entry, so
pages that will not be used can end up in memory (on the global LRU). When
memcg is used, this is not good behavior anyway.

Considering Type-2, the page should be freed from SwapCache right after
WriteBack. Freeing swapped-out pages as soon as possible is good behavior for
memcg anyway.

The patch set includes the following:
 [1/3] add mem_cgroup_is_activated() function, which tells us memcg is _really_ used.
 [2/3] fix swap cache handling race by avoiding readahead.
 [3/3] fix swap cache handling race by checking swapcount again.

Results are good under my test.

Thanks,
-Kame