From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 1 Apr 2010 18:36:41 +0100
From: Mel Gorman
To: Minchan Kim
Cc: KAMEZAWA Hiroyuki, Andrew Morton, Andrea Arcangeli, Christoph Lameter,
	Adam Litke, Avi Kivity, David Rientjes, KOSAKI Motohiro,
	Rik van Riel, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 14/14] mm,migration: Allow the migration of PageSwapCache pages
Message-ID: <20100401173640.GB621@csn.ul.ie>
References: <1269940489-5776-1-git-send-email-mel@csn.ul.ie>
	<1269940489-5776-15-git-send-email-mel@csn.ul.ie>
	<20100331142623.62ac9175.kamezawa.hiroyu@jp.fujitsu.com>
	<20100401120123.f9f9e872.kamezawa.hiroyu@jp.fujitsu.com>
	<20100401144234.e3848876.kamezawa.hiroyu@jp.fujitsu.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
User-Agent: Mutt/1.5.17+20080114 (2008-01-14)
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Apr 01, 2010 at 07:51:31PM +0900, Minchan Kim wrote:
> On Thu, Apr 1, 2010 at 2:42 PM, KAMEZAWA Hiroyuki wrote:
> > On Thu, 1 Apr 2010 13:44:29 +0900
> > Minchan Kim wrote:
> >
> >> On Thu, Apr 1, 2010 at 12:01 PM, KAMEZAWA Hiroyuki wrote:
> >> > On Thu, 1 Apr 2010 11:43:18 +0900
> >> > Minchan Kim wrote:
> >> >
> >> >> On Wed, Mar 31, 2010 at 2:26 PM, KAMEZAWA Hiroyuki wrote:
> >> >> >>       /*
> >> >> >> diff --git a/mm/rmap.c b/mm/rmap.c
> >> >> >> index af35b75..d5ea1f2 100644
> >> >> >> --- a/mm/rmap.c
> >> >> >> +++ b/mm/rmap.c
> >> >> >> @@ -1394,9 +1394,11 @@ int rmap_walk(struct page *page, int (*rmap_one)(struct page *,
> >> >> >>
> >> >> >>       if (unlikely(PageKsm(page)))
> >> >> >>               return rmap_walk_ksm(page, rmap_one, arg);
> >> >> >> -     else if (PageAnon(page))
> >> >> >> +     else if (PageAnon(page)) {
> >> >> >> +             if (PageSwapCache(page))
> >> >> >> +                     return SWAP_AGAIN;
> >> >> >>               return rmap_walk_anon(page, rmap_one, arg);
> >> >> >
> >> >> > SwapCache has a condition as (PageSwapCache(page) && page_mapped(page) == true.
> >> >> >
> >> >>
> >> >> In case of tmpfs, page has swapcache but not mapped.
> >> >>
> >> >> > Please see do_swap_page(), PageSwapCache bit is cleared only when
> >> >> >
> >> >> > do_swap_page()...
> >> >> >       swap_free(entry);
> >> >> >       if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
> >> >> >               try_to_free_swap(page);
> >> >> >
> >> >> > Then, PageSwapCache is cleared only when swap is freeable even if mapped.
> >> >> >
> >> >> > rmap_walk_anon() should be called and the check is not necessary.
> >> >>
> >> >> Frankly speaking, I don't understand what is Mel's problem, why he added
> >> >> Swapcache check in rmap_walk, and why do you said we don't need it.
> >> >>
> >> >> Could you explain more detail if you don't mind?
> >> >>
> >> > I may miss something.
> >> >
> >> > unmap_and_move()
> >> >  1. try_to_unmap(TTU_MIGRATION)
> >> >  2. move_to_newpage
> >> >  3. remove_migration_ptes
> >> >        -> rmap_walk()
> >> >
> >> > Then, to map a page back we unmapped we call rmap_walk().
> >> >
> >> > Assume a SwapCache which is mapped, then, PageAnon(page) == true.
> >> >
> >> >  At 1. try_to_unmap() will rewrite pte with swp_entry of SwapCache.
> >> >       mapcount goes to 0.
> >> >  At 2. SwapCache is copied to a new page.
> >> >  At 3. The new page is mapped back to the place. Now, newpage's mapcount is 0.
> >> >       Before patch, the new page is mapped back to all ptes.
> >> >       After patch, the new page is not mapped back because its mapcount is 0.
> >> >
> >> > I don't think shared SwapCache of anon is not an usual behavior, so, the logic
> >> > before patch is more attractive.
> >> >
> >> > If SwapCache is not mapped before "1", we skip "1" and rmap_walk will do nothing
> >> > because page->mapping is NULL.
> >> >
> >>
> >> Thanks. I agree. We don't need the check.
> >> Then, my question is why Mel added the check in rmap_walk.
> >> He mentioned some BUG trigger and fixed things after this patch.
> >> What's it?
> >> Is it really related to this logic?
> >> I don't think so or we are missing something.
> >>
> > Hmm. Consiering again.
> >
> > Now.
> >        if (PageAnon(page)) {
> >                rcu_locked = 1;
> >                rcu_read_lock();
> >                if (!page_mapped(page)) {
> >                        if (!PageSwapCache(page))
> >                                goto rcu_unlock;
> >                } else {
> >                        anon_vma = page_anon_vma(page);
> >                        atomic_inc(&anon_vma->external_refcount);
> >                }
> >
> >
> > Maybe this is a fix.
> >
> > ==
> >        skip_remap = 0;
> >        if (PageAnon(page)) {
> >                rcu_read_lock();
> >                if (!page_mapped(page)) {
> >                        if (!PageSwapCache(page))
> >                                goto rcu_unlock;
> >                        /*
> >                         * We can't convice this anon_vma is valid or not because
> >                         * !page_mapped(page). Then, we do migration(radix-tree replacement)
> >                         * but don't remap it which touches anon_vma in page->mapping.
> >                         */
> >                        skip_remap = 1;
> >                        goto skip_unmap;
> >                } else {
> >                        anon_vma = page_anon_vma(page);
> >                        atomic_inc(&anon_vma->external_refcount);
> >                }
> >        }
> >        .....copy page, radix-tree replacement,....
> >
>
> It's not enough.
> we uses remove_migration_ptes in move_to_new_page, too.
> We have to prevent it.
> We can check PageSwapCache(page) in move_to_new_page and then
> skip remove_migration_ptes.
>
> ex)
> static int move_to_new_page(....)
> {
>         int swapcache = PageSwapCache(page);
>         ...
>         if (!swapcache)
>                 if(!rc)
>                         remove_migration_ptes
>                 else
>                         newpage->mapping = NULL;
> }
>

This I agree with.

> And we have to close race between PageAnon(page) and rcu_read_lock.

Not so sure on this. The page is locked at this point and that should
prevent it from becoming !PageAnon

> If we don't do it, anon_vma could be free in the middle of operation.
> I means
>
>  * of migration. File cache pages are no problem because of page_lock()
>  * File Caches may use write_page() or lock_page() in migration, then,
>  * just care Anon page here.
>  */
> if (PageAnon(page)) {
> !!! RACE !!!!
>         rcu_read_lock();
>         rcu_locked = 1;
>
> +
> +       /*
> +        * If the page has no mappings any more, just bail. An
> +        * unmapped anon page is likely to be freed soon but worse,
>

I am not sure this race exists because the page is locked, but a key
observation has been made - a page that is unmapped can be migrated if
it's PageSwapCache, but it may not have a valid anon_vma. Hence, in the
!page_mapped case, the key is to not use anon_vma. How about the
following patch?

==== CUT HERE ====
mm,migration: Allow the migration of PageSwapCache pages

PageAnon pages that are unmapped may or may not have an anon_vma, so they
are not currently migrated. However, a swap cache page can be migrated and
fits this description. This patch identifies swap cache pages and allows
them to be migrated, but ensures that no attempt is made to remap the
pages, as that could potentially access an already freed anon_vma.

Signed-off-by: Mel Gorman

diff --git a/mm/migrate.c b/mm/migrate.c
index 35aad2a..5d0218b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -484,7 +484,8 @@ static int fallback_migrate_page(struct address_space *mapping,
  * < 0 - error code
  * == 0 - success
  */
-static int move_to_new_page(struct page *newpage, struct page *page)
+static int move_to_new_page(struct page *newpage, struct page *page,
+					int safe_to_remap)
 {
 	struct address_space *mapping;
 	int rc;
@@ -519,10 +520,12 @@ static int move_to_new_page(struct page *newpage, struct page *page)
 	else
 		rc = fallback_migrate_page(mapping, newpage, page);
 
-	if (!rc)
-		remove_migration_ptes(page, newpage);
-	else
-		newpage->mapping = NULL;
+	if (safe_to_remap) {
+		if (!rc)
+			remove_migration_ptes(page, newpage);
+		else
+			newpage->mapping = NULL;
+	}
 
 	unlock_page(newpage);
 
@@ -539,6 +542,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
 	int rc = 0;
 	int *result = NULL;
 	struct page *newpage = get_new_page(page, private, &result);
+	int safe_to_remap = 1;
 	int rcu_locked = 0;
 	int charge = 0;
 	struct mem_cgroup *mem = NULL;
@@ -600,18 +604,26 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
 		rcu_read_lock();
 		rcu_locked = 1;
 
-		/*
-		 * If the page has no mappings any more, just bail. An
-		 * unmapped anon page is likely to be freed soon but worse,
-		 * it's possible its anon_vma disappeared between when
-		 * the page was isolated and when we reached here while
-		 * the RCU lock was not held
-		 */
-		if (!page_mapped(page))
-			goto rcu_unlock;
+		/* Determine how to safely use anon_vma */
+		if (!page_mapped(page)) {
+			if (!PageSwapCache(page))
+				goto rcu_unlock;
 
-		anon_vma = page_anon_vma(page);
-		atomic_inc(&anon_vma->external_refcount);
+			/*
+			 * We cannot be sure that the anon_vma of an unmapped
+			 * page is safe to use. In this case, the page still
+			 * gets migrated but the pages are not remapped as
+			 */
+			safe_to_remap = 0;
+		} else {
+			/*
+			 * Take a reference count on the anon_vma if the
+			 * page is mapped so that it is guaranteed to
+			 * exist when the page is remapped later
+			 */
+			anon_vma = page_anon_vma(page);
+			atomic_inc(&anon_vma->external_refcount);
+		}
 	}
 
 	/*
@@ -646,9 +658,9 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
 
 skip_unmap:
 	if (!page_mapped(page))
-		rc = move_to_new_page(newpage, page);
+		rc = move_to_new_page(newpage, page, safe_to_remap);
 
-	if (rc)
+	if (rc && safe_to_remap)
 		remove_migration_ptes(page, page);
 
 rcu_unlock:
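
For anyone reading along without mm/migrate.c to hand, the decision the
patch makes can be reduced to a few lines of ordinary C. The sketch below
is not kernel code: fake_page and classify() are invented stand-ins for
struct page, PageAnon()/PageSwapCache()/page_mapped() and the PageAnon
branch of unmap_and_move(), kept only to show which of the three cases
discussed in this thread is skipped, migrated without remapping, or
migrated and remapped. The same flag is what move_to_new_page() consults,
which is why the patch passes safe_to_remap down rather than re-testing
PageSwapCache() there as in Minchan's sketch.

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for struct page: only the state this thread cares about. */
struct fake_page {
	bool anon;       /* PageAnon(page)      */
	bool swapcache;  /* PageSwapCache(page) */
	bool mapped;     /* page_mapped(page)   */
};

enum migrate_decision {
	SKIP_MIGRATION,     /* goto rcu_unlock: unmapped anon, not in swap cache */
	MIGRATE_NO_REMAP,   /* migrate, but never dereference anon_vma           */
	MIGRATE_AND_REMAP,  /* migrate and remove the migration ptes afterwards  */
};

/* Mirrors the PageAnon branch of unmap_and_move() in the patch above. */
static enum migrate_decision classify(const struct fake_page *page)
{
	if (page->anon && !page->mapped) {
		if (!page->swapcache)
			return SKIP_MIGRATION;
		return MIGRATE_NO_REMAP;   /* safe_to_remap = 0 */
	}
	return MIGRATE_AND_REMAP;          /* safe_to_remap stays 1 */
}

int main(void)
{
	static const char *names[] = {
		[SKIP_MIGRATION]    = "skip migration",
		[MIGRATE_NO_REMAP]  = "migrate without remapping",
		[MIGRATE_AND_REMAP] = "migrate and remap",
	};
	const struct fake_page mapped_anon    = { .anon = true, .swapcache = false, .mapped = true  };
	const struct fake_page unmapped_swap  = { .anon = true, .swapcache = true,  .mapped = false };
	const struct fake_page unmapped_plain = { .anon = true, .swapcache = false, .mapped = false };

	printf("mapped anon page:         %s\n", names[classify(&mapped_anon)]);
	printf("unmapped swap-cache page: %s\n", names[classify(&unmapped_swap)]);
	printf("unmapped anon, no swap:   %s\n", names[classify(&unmapped_plain)]);
	return 0;
}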
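
The scenario KAMEZAWA walked through earlier in the thread - a mapped
swap-cache page whose ptes would never be restored if rmap_walk() bailed
out on PageSwapCache, as the originally posted version of this patch did -
can be modelled the same way. Again this is only a toy illustration under
assumed simplifications (a bare mapcount counter standing in for the real
rmap accounting), not kernel code.

#include <stdbool.h>
#include <stdio.h>

/* Toy page: just a mapcount and the swap-cache flag. */
struct toy_page {
	int mapcount;    /* number of ptes mapping the page */
	bool swapcache;  /* PageSwapCache()                 */
};

/* Step 1: try_to_unmap(TTU_MIGRATION) replaces every pte with a migration entry. */
static int unmap_for_migration(struct toy_page *old)
{
	int migration_entries = old->mapcount;
	old->mapcount = 0;
	return migration_entries;
}

/*
 * Step 3: remove_migration_ptes() -> rmap_walk() maps the new page back.
 * "bail_on_swapcache" models the check the originally posted revision added
 * to rmap_walk(), which this version of the patch drops.
 */
static void remap(struct toy_page *newpage, int migration_entries,
		  bool bail_on_swapcache)
{
	if (bail_on_swapcache && newpage->swapcache)
		return;                      /* ptes are never restored */
	newpage->mapcount += migration_entries;
}

int main(void)
{
	struct toy_page old   = { .mapcount = 2, .swapcache = true };
	struct toy_page new_a = { .mapcount = 0, .swapcache = true };
	struct toy_page new_b = { .mapcount = 0, .swapcache = true };

	int entries = unmap_for_migration(&old);   /* mapcount 2 -> 0 */

	remap(&new_a, entries, true);   /* with the rmap_walk() check   */
	remap(&new_b, entries, false);  /* without it (this patch)      */

	printf("with SwapCache check in rmap_walk:    newpage mapcount = %d\n",
	       new_a.mapcount);
	printf("without SwapCache check in rmap_walk: newpage mapcount = %d\n",
	       new_b.mapcount);
	return 0;
}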