Message-ID: <55783FDA.3080700@suse.cz>
Date: Wed, 10 Jun 2015 15:47:06 +0200
From: Vlastimil Babka
To: "Kirill A. Shutemov", Andrew Morton, Andrea Arcangeli, Hugh Dickins
CC: Dave Hansen, Mel Gorman, Rik van Riel, Christoph Lameter,
 Naoya Horiguchi, Steve Capper, "Aneesh Kumar K.V", Johannes Weiner,
 Michal Hocko, Jerome Marchand, Sasha Levin,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCHv6 26/36] mm: rework mapcount accounting to enable 4k mapping of THPs
References: <1433351167-125878-1-git-send-email-kirill.shutemov@linux.intel.com>
 <1433351167-125878-27-git-send-email-kirill.shutemov@linux.intel.com>
In-Reply-To: <1433351167-125878-27-git-send-email-kirill.shutemov@linux.intel.com>

On 06/03/2015 07:05 PM, Kirill A. Shutemov wrote:
> We're going to allow mapping of individual 4k pages of a THP compound
> page. It means we need to track the mapcount on a per-small-page basis.
>
> The straightforward approach is to use ->_mapcount in all subpages to
> track how many times each subpage is mapped with PMDs and PTEs
> combined. But this is rather expensive: mapping or unmapping a THP
> page with a PMD would require HPAGE_PMD_NR atomic operations instead
> of the single one we have now.
>
> The idea is to store separately how many times the page was mapped as
> a whole -- compound_mapcount.
> This frees up ->_mapcount in subpages to track the PTE mapcount.
>
> We use the same approach as with the compound page destructor and
> compound order to store compound_mapcount: use space in the first tail
> page, ->mapping this time.
>
> Any time we map/unmap a whole compound page (THP or hugetlb) we
> increment/decrement compound_mapcount. When we map part of a compound
> page with a PTE, we operate on ->_mapcount of the subpage.
>
> page_mapcount() counts both PTE and PMD mappings of the page.
>
> Basically, we have the mapcount for a subpage spread over two
> counters. This makes it tricky to detect when the last mapcount for a
> page goes away.
>
> We introduced PageDoubleMap() for this. When we split a THP PMD for
> the first time and there is another PMD mapping left, we offset
> ->_mapcount in all subpages by one and set PG_double_map on the
> compound page. These additional references go away with the last
> compound_mapcount.
>
> This approach provides a way to detect when the last mapcount goes
> away on a per-small-page basis without introducing new overhead for
> the most common cases.
>
> Signed-off-by: Kirill A. Shutemov
> ---
>  include/linux/mm.h         | 26 +++++++++++-
>  include/linux/mm_types.h   |  1 +
>  include/linux/page-flags.h | 37 +++++++++++++++++
>  include/linux/rmap.h       |  4 +-
>  mm/debug.c                 |  5 ++-
>  mm/huge_memory.c           |  2 +-
>  mm/hugetlb.c               |  4 +-
>  mm/memory.c                |  2 +-
>  mm/migrate.c               |  2 +-
>  mm/page_alloc.c            | 14 +++++--
>  mm/rmap.c                  | 98 +++++++++++++++++++++++++++++++++++-----------
>  11 files changed, 160 insertions(+), 35 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 31cd5be081cf..22cd540104ec 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -403,6 +403,19 @@ static inline int is_vmalloc_or_module_addr(const void *x)
>
>  extern void kvfree(const void *addr);
>
> +static inline atomic_t *compound_mapcount_ptr(struct page *page)
> +{
> +	return &page[1].compound_mapcount;
> +}
> +
> +static inline int compound_mapcount(struct page *page)
> +{
> +	if (!PageCompound(page))
> +		return 0;
> +	page = compound_head(page);
> +	return atomic_read(compound_mapcount_ptr(page)) + 1;
> +}
> +
>  /*
>   * The atomic page->_mapcount, starts from -1: so that transitions
>   * both from it and to it can be tracked, using atomic_inc_and_test
> @@ -415,8 +428,17 @@ static inline void page_mapcount_reset(struct page *page)
>
>  static inline int page_mapcount(struct page *page)
>  {
> +	int ret;
>  	VM_BUG_ON_PAGE(PageSlab(page), page);
> -	return atomic_read(&page->_mapcount) + 1;
> +
> +	ret = atomic_read(&page->_mapcount) + 1;
> +	if (PageCompound(page)) {
> +		page = compound_head(page);
> +		ret += compound_mapcount(page);

compound_mapcount() means another PageCompound() and compound_head(),
which you just did.
I've tried this to see the effect on a function that "calls" (inlines)
page_mapcount() once:

-		ret += compound_mapcount(page);
+		ret += atomic_read(compound_mapcount_ptr(page)) + 1;

bloat-o-meter on compaction.o:

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-59 (-59)
function                                     old     new   delta
isolate_migratepages_block                  1769    1710     -59

> +		if (PageDoubleMap(page))
> +			ret--;
> +	}
> +	return ret;
>  }
>
>  static inline int page_count(struct page *page)
> @@ -898,7 +920,7 @@ static inline pgoff_t page_file_index(struct page *page)
>   */
>  static inline int page_mapped(struct page *page)
>  {
> -	return atomic_read(&(page)->_mapcount) >= 0;
> +	return atomic_read(&(page)->_mapcount) + compound_mapcount(page) >= 0;
>  }
>
>  /*
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 4b51a59160ab..4d182cd14c1f 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -56,6 +56,7 @@ struct page {
>  					 * see PAGE_MAPPING_ANON below.
>  					 */
>  		void *s_mem;		/* slab first object */
> +		atomic_t compound_mapcount;	/* first tail page */
>  	};
>
>  	/* Second double word */
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 74b7cece1dfa..a8d47c1edf6a 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -127,6 +127,9 @@ enum pageflags {
>
>  	/* SLOB */
>  	PG_slob_free = PG_private,
> +
> +	/* THP. Stored in first tail page's flags */
> +	PG_double_map = PG_private_2,

Well, not just THP. Any user of compound pages must make sure not to
use PG_private_2 on the first tail page, at least where the page is
going to be user-mapped. The same applies to fields that are in a union
with compound_mapcount. Should that be documented more prominently
somewhere? I guess there's no such user so far, right?

>  };
>
>  #ifndef __GENERATING_BOUNDS_H

[...]
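For reference, the two-counter scheme the commit message describes can be modelled in plain userspace C. This is only a sketch with illustrative names (`thp_init`, `map_pmd`, `map_pte` are made up for the model; the real counters are `atomic_t` fields inside `struct page`):

```c
#include <assert.h>
#include <stdbool.h>

#define HPAGE_PMD_NR 512	/* subpages per THP on x86-64 */

/* Userspace model of the counters: both start at -1, meaning "unmapped". */
struct page {
	int _mapcount;		/* per-subpage PTE mapcount */
	int compound_mapcount;	/* used in page[1] only (first tail page) */
	bool double_map;	/* PG_double_map, set on the head page */
};

static void thp_init(struct page *head)
{
	for (int i = 0; i < HPAGE_PMD_NR; i++)
		head[i] = (struct page){ ._mapcount = -1, .compound_mapcount = -1 };
}

/* Mapping the whole THP touches one counter, not HPAGE_PMD_NR of them. */
static void map_pmd(struct page *head)
{
	head[1].compound_mapcount++;
}

static void map_pte(struct page *head, int i)
{
	head[i]._mapcount++;
}

/* Mirrors the patched page_mapcount(): PTE and PMD mappings combined,
 * minus the extra per-subpage reference when PG_double_map is set. */
static int page_mapcount(struct page *head, int i)
{
	int ret = head[i]._mapcount + 1;

	ret += head[1].compound_mapcount + 1;
	if (head->double_map)
		ret--;
	return ret;
}
```

With one PMD mapping, every subpage reports a mapcount of 1 after a single counter update; an additional PTE mapping of one subpage raises only that subpage's count.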
> @@ -1167,6 +1194,41 @@ out:
>  	mem_cgroup_end_page_stat(memcg);
>  }
>
> +static void page_remove_anon_compound_rmap(struct page *page)
> +{
> +	int i, nr;
> +
> +	if (!atomic_add_negative(-1, compound_mapcount_ptr(page)))
> +		return;
> +
> +	/* Hugepages are not counted in NR_ANON_PAGES for now. */
> +	if (unlikely(PageHuge(page)))
> +		return;
> +
> +	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
> +		return;
> +
> +	__dec_zone_page_state(page, NR_ANON_TRANSPARENT_HUGEPAGES);
> +
> +	if (PageDoubleMap(page)) {
> +		nr = 0;
> +		ClearPageDoubleMap(page);
> +		/*
> +		 * Subpages can be mapped with PTEs too. Check how many of
> +		 * them are still mapped.
> +		 */
> +		for (i = 0; i < HPAGE_PMD_NR; i++) {
> +			if (atomic_add_negative(-1, &page[i]._mapcount))
> +				nr++;
> +		}
> +	} else {
> +		nr = HPAGE_PMD_NR;
> +	}
> +
> +	if (nr)
> +		__mod_zone_page_state(page_zone(page), NR_ANON_PAGES, nr);

-nr as we discussed on IRC.

> +}
> +
>  /**
>   * page_remove_rmap - take down pte mapping from a page
>   * @page: page to remove mapping from
> @@ -1176,33 +1238,25 @@ out:
>   */
>  void page_remove_rmap(struct page *page, bool compound)
>  {
> -	int nr = compound ? hpage_nr_pages(page) : 1;
> -
>  	if (!PageAnon(page)) {
>  		VM_BUG_ON_PAGE(compound && !PageHuge(page), page);
>  		page_remove_file_rmap(page);
>  		return;
>  	}
>
> +	if (compound)
> +		return page_remove_anon_compound_rmap(page);
> +
>  	/* page still mapped by someone else? */
>  	if (!atomic_add_negative(-1, &page->_mapcount))
>  		return;
>
> -	/* Hugepages are not counted in NR_ANON_PAGES for now. */
> -	if (unlikely(PageHuge(page)))
> -		return;
> -
>  	/*
>  	 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
>  	 * these counters are not modified in interrupt context, and
>  	 * pte lock(a spinlock) is held, which implies preemption disabled.
>  	 */
> -	if (compound) {
> -		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> -		__dec_zone_page_state(page, NR_ANON_TRANSPARENT_HUGEPAGES);
> -	}
> -
> -	__mod_zone_page_state(page_zone(page), NR_ANON_PAGES, -nr);
> +	__dec_zone_page_state(page, NR_ANON_PAGES);
>
>  	if (unlikely(PageMlocked(page)))
>  		clear_page_mlock(page);
> @@ -1643,7 +1697,7 @@ void hugepage_add_anon_rmap(struct page *page,
>  	BUG_ON(!PageLocked(page));
>  	BUG_ON(!anon_vma);
>  	/* address might be in next vma when migration races vma_adjust */
> -	first = atomic_inc_and_test(&page->_mapcount);
> +	first = atomic_inc_and_test(compound_mapcount_ptr(page));
>  	if (first)
>  		__hugepage_set_anon_rmap(page, vma, address, 0);
>  }
> @@ -1652,7 +1706,7 @@ void hugepage_add_new_anon_rmap(struct page *page,
>  		struct vm_area_struct *vma, unsigned long address)
>  {
>  	BUG_ON(address < vma->vm_start || address >= vma->vm_end);
> -	atomic_set(&page->_mapcount, 0);
> +	atomic_set(compound_mapcount_ptr(page), 0);
>  	__hugepage_set_anon_rmap(page, vma, address, 1);
>  }
>  #endif /* CONFIG_HUGETLB_PAGE */
>
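The NR_ANON_PAGES adjustment flagged above ("-nr as we discussed on IRC") is the number of subpages that just became fully unmapped, so the zone stat must go down by nr, not up. A userspace sketch of the counting done in the PageDoubleMap() branch (the function name and the plain-int mapcount array are illustrative, not the kernel API):

```c
#include <assert.h>
#include <stdbool.h>

#define HPAGE_PMD_NR 512

/*
 * Model of the teardown in page_remove_anon_compound_rmap(): when the
 * last PMD mapping goes away, drop the extra per-subpage reference that
 * PG_double_map added and count how many subpages are now fully
 * unmapped. The caller must then SUBTRACT nr from NR_ANON_PAGES, i.e.
 * pass -nr to __mod_zone_page_state().
 */
static int subpages_fully_unmapped(int *mapcount, bool double_map)
{
	int nr = 0;

	if (!double_map)
		return HPAGE_PMD_NR;	/* no subpage held PTE references */

	for (int i = 0; i < HPAGE_PMD_NR; i++) {
		/* mirrors atomic_add_negative(-1, &page[i]._mapcount) */
		if (--mapcount[i] < 0)
			nr++;
	}
	return nr;
}
```

In the model, a subpage with no PTE mappings sits at 0 after the PG_double_map offset (base -1, plus 1), so the decrement takes it negative and it is counted; a subpage still PTE-mapped stays non-negative and is skipped.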