On Sun, 2018-06-10 at 11:06 -0700, Hugh Dickins wrote: > On Thu, 7 Jun 2018, Ben Hutchings wrote: > > > 3.16.57-rc1 review patch. If anyone has any objections, please let me know. > > Not an objection as such, but if you're including this one, > please be sure to add 145e1a71e090575c74969e3daa8136d1e5b99fc8 > "mm: fix the NULL mapping case in __isolate_lru_page()" Added, thanks. Ben. > Thanks, > Hugh > > > > > ------------------ > > > > From: Mel Gorman > > > > commit 69d763fc6d3aee787a3e8c8c35092b4f4960fa5d upstream. > > > > Minchan Kim asked the following question -- what locks protects > > address_space destroying when race happens between inode trauncation and > > __isolate_lru_page? Jan Kara clarified by describing the race as follows > > > > CPU1 CPU2 > > > > truncate(inode) __isolate_lru_page() > > ... > > truncate_inode_page(mapping, page); > > delete_from_page_cache(page) > > spin_lock_irqsave(&mapping->tree_lock, flags); > > __delete_from_page_cache(page, NULL) > > page_cache_tree_delete(..) > > ... mapping = page_mapping(page); > > page->mapping = NULL; > > ... > > spin_unlock_irqrestore(&mapping->tree_lock, flags); > > page_cache_free_page(mapping, page) > > put_page(page) > > if (put_page_testzero(page)) -> false > > - inode now has no pages and can be freed including embedded address_space > > > > if (mapping && !mapping->a_ops->migratepage) > > - we've dereferenced mapping which is potentially already free. > > > > The race is theoretically possible but unlikely. Before the > > delete_from_page_cache, truncate_cleanup_page is called so the page is > > likely to be !PageDirty or PageWriteback which gets skipped by the only > > caller that checks the mappping in __isolate_lru_page. Even if the race > > occurs, a substantial amount of work has to happen during a tiny window > > with no preemption but it could potentially be done using a virtual > > machine to artifically slow one CPU or halt it during the critical > > window. > > > > This patch should eliminate the race with truncation by try-locking the > > page before derefencing mapping and aborting if the lock was not > > acquired. There was a suggestion from Huang Ying to use RCU as a > > side-effect to prevent mapping being freed. However, I do not like the > > solution as it's an unconventional means of preserving a mapping and > > it's not a context where rcu_read_lock is obviously protecting rcu data. > > > > Link: http://lkml.kernel.org/r/20180104102512.2qos3h5vqzeisrek@techsingularity.net > > Fixes: c82449352854 ("mm: compaction: make isolate_lru_page() filter-aware again") > > Signed-off-by: Mel Gorman > > Acked-by: Minchan Kim > > Cc: "Huang, Ying" > > Cc: Jan Kara > > Signed-off-by: Andrew Morton > > Signed-off-by: Linus Torvalds > > [bwh: Backported to 3.16: adjust context] > > Signed-off-by: Ben Hutchings > > --- > > mm/vmscan.c | 14 ++++++++++++-- > > 1 file changed, 12 insertions(+), 2 deletions(-) > > > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -1206,6 +1206,7 @@ int __isolate_lru_page(struct page *page > > > > if (PageDirty(page)) { > > struct address_space *mapping; > > + bool migrate_dirty; > > > > /* ISOLATE_CLEAN means only clean pages */ > > if (mode & ISOLATE_CLEAN) > > @@ -1214,10 +1215,19 @@ int __isolate_lru_page(struct page *page > > /* > > * Only pages without mappings or that have a > > * ->migratepage callback are possible to migrate > > - * without blocking > > + * without blocking. However, we can be racing with > > + * truncation so it's necessary to lock the page > > + * to stabilise the mapping as truncation holds > > + * the page lock until after the page is removed > > + * from the page cache. > > */ > > + if (!trylock_page(page)) > > + return ret; > > + > > mapping = page_mapping(page); > > - if (mapping && !mapping->a_ops->migratepage) > > + migrate_dirty = mapping && mapping->a_ops->migratepage; > > + unlock_page(page); > > + if (!migrate_dirty) > > return ret; > > } > > } > > > > -- Ben Hutchings The generation of random numbers is too important to be left to chance. - Robert Coveyou