From: Ning Qu
Subject: Re: [PATCH 20/23] thp: handle file pages in split_huge_page()
Date: Tue, 6 Aug 2013 14:47:43 -0700
References: <1375582645-29274-1-git-send-email-kirill.shutemov@linux.intel.com>
 <1375582645-29274-21-git-send-email-kirill.shutemov@linux.intel.com>
To: "Kirill A. Shutemov"
Cc: Andrea Arcangeli, Andrew Morton, Al Viro, Hugh Dickins, Wu Fengguang,
 Jan Kara, Mel Gorman, linux-mm@kvack.org, Andi Kleen, Matthew Wilcox,
 "Kirill A. Shutemov", Hillf Danton, Dave Hansen,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

I just tried it, and it seems to be working fine now, without the deadlock.
I can run a fairly big internal test with about 40GB of files in SysV shm.
Just move the vma_adjust_trans_huge() call so that it runs before the
locking happens in vma_adjust(), something like the diff below; the line
numbers are not accurate because my patch is based on another tree right
now. (A small userspace sketch of the lock shape is appended at the bottom
of this mail, after the quoted patch.)

--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -581,6 +581,8 @@ again:                  remove_next = 1 + (end > next->vm_end);
                 }
         }
 
+        vma_adjust_trans_huge(vma, start, end, adjust_next);
+
         if (file) {
                 mapping = file->f_mapping;
                 if (!(vma->vm_flags & VM_NONLINEAR))
@@ -597,8 +599,6 @@ again:                  remove_next = 1 + (end > next->vm_end);
                 }
         }
 
-        vma_adjust_trans_huge(vma, start, end, adjust_next);
-
         anon_vma = vma->anon_vma;
         if (!anon_vma && adjust_next)
                 anon_vma = next->anon_vma;

Best wishes,
--
Ning Qu (曲宁) | Software Engineer | quning@google.com | +1-408-418-6066


On Tue, Aug 6, 2013 at 2:09 PM, Ning Qu <quning@google.com> wrote:

> Is it safe to move vma_adjust_trans_huge() before line 772? It seems that
> for anonymous memory we only take the lock after vma_adjust_trans_huge();
> maybe we should do the same for file?
>
> Best wishes,
> --
> Ning Qu (曲宁) | Software Engineer | quning@google.com | +1-408-418-6066
>
>
> On Tue, Aug 6, 2013 at 12:09 PM, Ning Qu <quning@google.com> wrote:
>
>> I am probably running into a deadlock case with this patch.
>>
>> When splitting a file huge page, we hold the i_mmap_mutex.
>>
>> However, when coming from the call path in vma_adjust() linked below, we
>> have already grabbed the i_mmap_mutex before doing
>> vma_adjust_trans_huge(), which eventually calls split_huge_page() and
>> then split_file_huge_page():
>>
>> https://git.kernel.org/cgit/linux/kernel/git/kas/linux.git/tree/mm/mmap.c?h=thp/pagecache#n753
>>
>> Best wishes,
>> --
>> Ning Qu (曲宁) | Software Engineer | quning@google.com | +1-408-418-6066
>>
>>
>> On Sat, Aug 3, 2013 at 7:17 PM, Kirill A. Shutemov
>> <kirill.shutemov@linux.intel.com> wrote:
>>
>>> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>>>
>>> The base scheme is the same as for anonymous pages, but we walk by
>>> mapping->i_mmap rather than anon_vma->rb_root.
>>>
>>> When we add a huge page to page cache we take only a reference to the
>>> head page, but on split we need to take an additional reference to all
>>> tail pages since they are still in page cache after splitting.
>>>
>>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>>> ---
>>>  mm/huge_memory.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++---------
>>>  1 file changed, 76 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 523946c..d7c6830 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -1580,6 +1580,7 @@ static void __split_huge_page_refcount(struct page *page,
>>>          struct zone *zone = page_zone(page);
>>>          struct lruvec *lruvec;
>>>          int tail_count = 0;
>>> +        int initial_tail_refcount;
>>>
>>>          /* prevent PageLRU to go away from under us, and freeze lru stats */
>>>          spin_lock_irq(&zone->lru_lock);
>>> @@ -1589,6 +1590,13 @@ static void __split_huge_page_refcount(struct page *page,
>>>          /* complete memcg works before add pages to LRU */
>>>          mem_cgroup_split_huge_fixup(page);
>>>
>>> +        /*
>>> +         * When we add a huge page to page cache we take only reference to head
>>> +         * page, but on split we need to take addition reference to all tail
>>> +         * pages since they are still in page cache after splitting.
>>> +         */
>>> +        initial_tail_refcount = PageAnon(page) ? 0 : 1;
>>> +
>>>          for (i = HPAGE_PMD_NR - 1; i >= 1; i--) {
>>>                  struct page *page_tail = page + i;
>>>
>>> @@ -1611,8 +1619,9 @@ static void __split_huge_page_refcount(struct page *page,
>>>                   * atomic_set() here would be safe on all archs (and
>>>                   * not only on x86), it's safer to use atomic_add().
>>>                   */
>>> -                atomic_add(page_mapcount(page) + page_mapcount(page_tail) + 1,
>>> -                           &page_tail->_count);
>>> +                atomic_add(initial_tail_refcount + page_mapcount(page) +
>>> +                                page_mapcount(page_tail) + 1,
>>> +                                &page_tail->_count);
>>>
>>>                  /* after clearing PageTail the gup refcount can be released */
>>>                  smp_mb();
>>> @@ -1651,23 +1660,23 @@ static void __split_huge_page_refcount(struct page *page,
>>>                   */
>>>                  page_tail->_mapcount = page->_mapcount;
>>>
>>> -                BUG_ON(page_tail->mapping);
>>>                  page_tail->mapping = page->mapping;
>>>
>>>                  page_tail->index = page->index + i;
>>>                  page_nid_xchg_last(page_tail, page_nid_last(page));
>>>
>>> -                BUG_ON(!PageAnon(page_tail));
>>>                  BUG_ON(!PageUptodate(page_tail));
>>>                  BUG_ON(!PageDirty(page_tail));
>>> -                BUG_ON(!PageSwapBacked(page_tail));
>>>
>>>                  lru_add_page_tail(page, page_tail, lruvec, list);
>>>          }
>>>          atomic_sub(tail_count, &page->_count);
>>>          BUG_ON(atomic_read(&page->_count) <= 0);
>>>
>>> -        __mod_zone_page_state(zone, NR_ANON_TRANSPARENT_HUGEPAGES, -1);
>>> +        if (PageAnon(page))
>>> +                __mod_zone_page_state(zone, NR_ANON_TRANSPARENT_HUGEPAGES, -1);
>>> +        else
>>> +                __mod_zone_page_state(zone, NR_FILE_TRANSPARENT_HUGEPAGES, -1);
>>>
>>>          ClearPageCompound(page);
>>>          compound_unlock(page);
>>> @@ -1767,7 +1776,7 @@ static int __split_huge_page_map(struct page *page,
>>>  }
>>>
>>>  /* must be called with anon_vma->root->rwsem held */
>>> -static void __split_huge_page(struct page *page,
>>> +static void __split_anon_huge_page(struct page *page,
>>>                                struct anon_vma *anon_vma,
>>>                                struct list_head *list)
>>>  {
>>> @@ -1791,7 +1800,7 @@ static void __split_huge_page(struct page *page,
>>>           * and establishes a child pmd before
>>>           * __split_huge_page_splitting() freezes the parent pmd (so if
>>>           * we fail to prevent copy_huge_pmd() from running until the
>>> -         * whole __split_huge_page() is complete), we will still see
>>> +         * whole __split_anon_huge_page() is complete), we will still see
>>>           * the newly established pmd of the child later during the
>>>           * walk, to be able to set it as pmd_trans_splitting too.
>>>           */
>>> @@ -1822,14 +1831,11 @@ static void __split_huge_page(struct page *page,
>>>   * from the hugepage.
>>>   * Return 0 if the hugepage is split successfully otherwise return 1.
>>>   */
>>> -int split_huge_page_to_list(struct page *page, struct list_head *list)
>>> +static int split_anon_huge_page(struct page *page, struct list_head *list)
>>>  {
>>>          struct anon_vma *anon_vma;
>>>          int ret = 1;
>>>
>>> -        BUG_ON(is_huge_zero_page(page));
>>> -        BUG_ON(!PageAnon(page));
>>> -
>>>          /*
>>>           * The caller does not necessarily hold an mmap_sem that would prevent
>>>           * the anon_vma disappearing so we first we take a reference to it
>>> @@ -1847,7 +1853,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>>>                  goto out_unlock;
>>>
>>>          BUG_ON(!PageSwapBacked(page));
>>> -        __split_huge_page(page, anon_vma, list);
>>> +        __split_anon_huge_page(page, anon_vma, list);
>>>          count_vm_event(THP_SPLIT);
>>>
>>>          BUG_ON(PageCompound(page));
>>> @@ -1858,6 +1864,63 @@ out:
>>>          return ret;
>>>  }
>>>
>>> +static int split_file_huge_page(struct page *page, struct list_head *list)
>>> +{
>>> +        struct address_space *mapping = page->mapping;
>>> +        pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
>>> +        struct vm_area_struct *vma;
>>> +        int mapcount, mapcount2;
>>> +
>>> +        BUG_ON(!PageHead(page));
>>> +        BUG_ON(PageTail(page));
>>> +
>>> +        mutex_lock(&mapping->i_mmap_mutex);
>>> +        mapcount = 0;
>>> +        vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
>>> +                unsigned long addr = vma_address(page, vma);
>>> +                mapcount += __split_huge_page_splitting(page, vma, addr);
>>> +        }
>>> +
>>> +        if (mapcount != page_mapcount(page))
>>> +                printk(KERN_ERR "mapcount %d page_mapcount %d\n",
>>> +                                mapcount, page_mapcount(page));
>>> +        BUG_ON(mapcount != page_mapcount(page));
>>> +
>>> +        __split_huge_page_refcount(page, list);
>>> +
>>> +        mapcount2 = 0;
>>> +        vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
>>> +                unsigned long addr = vma_address(page, vma);
>>> +                mapcount2 += __split_huge_page_map(page, vma, addr);
>>> +        }
>>> +
>>> +        if (mapcount != mapcount2)
>>> +                printk(KERN_ERR "mapcount %d mapcount2 %d page_mapcount %d\n",
>>> +                                mapcount, mapcount2, page_mapcount(page));
>>> +        BUG_ON(mapcount != mapcount2);
>>> +        count_vm_event(THP_SPLIT);
>>> +        mutex_unlock(&mapping->i_mmap_mutex);
>>> +
>>> +        /*
>>> +         * Drop small pages beyond i_size if any.
>>> +         *
>>> +         * XXX: do we need to serialize over i_mutex here?
>>> +         * If yes, how to get mmap_sem vs. i_mutex ordering fixed?
>>> +         */
>>> +        truncate_inode_pages(mapping, i_size_read(mapping->host));
>>> +        return 0;
>>> +}
>>> +
>>> +int split_huge_page_to_list(struct page *page, struct list_head *list)
>>> +{
>>> +        BUG_ON(is_huge_zero_page(page));
>>> +
>>> +        if (PageAnon(page))
>>> +                return split_anon_huge_page(page, list);
>>> +        else
>>> +                return split_file_huge_page(page, list);
>>> +}
>>> +
>>>  #define VM_NO_THP (VM_SPECIAL|VM_MIXEDMAP|VM_HUGETLB|VM_SHARED|VM_MAYSHARE)
>>>
>>>  int hugepage_madvise(struct vm_area_struct *vma,
>>> --
>>> 1.8.3.2
>>>
>>
>
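For reference, here is the promised minimal userspace sketch of the
deadlock shape described in the quoted thread. It is only an analogy under
stated assumptions, not kernel code: a pthread mutex stands in for
mapping->i_mmap_mutex, and the two helper functions are made-up stand-ins
for the call chain vma_adjust() -> vma_adjust_trans_huge() ->
split_huge_page() -> split_file_huge_page(). With an error-checking mutex
the inner relock reports EDEADLK instead of hanging, which is the same
single-owner relock that moving vma_adjust_trans_huge() above the
"if (file)" locking avoids.

/* deadlock-sketch.c: build with `gcc -pthread deadlock-sketch.c` */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static pthread_mutex_t i_mmap_mutex;

/* Stand-in for split_file_huge_page(): wants i_mmap_mutex for itself. */
static void split_file_huge_page_analogy(void)
{
        int err = pthread_mutex_lock(&i_mmap_mutex);

        if (err) {
                /*
                 * The caller already holds the lock: EDEADLK with an
                 * error-checking mutex, a silent hang with a plain
                 * non-recursive one.
                 */
                fprintf(stderr, "split: relock failed: %s\n", strerror(err));
                return;
        }
        pthread_mutex_unlock(&i_mmap_mutex);
}

/* Stand-in for vma_adjust() before the fix: lock the mapping, then split. */
static void vma_adjust_analogy(void)
{
        pthread_mutex_lock(&i_mmap_mutex);      /* the "if (file)" locking */
        split_file_huge_page_analogy();         /* via vma_adjust_trans_huge() */
        pthread_mutex_unlock(&i_mmap_mutex);
}

int main(void)
{
        pthread_mutexattr_t attr;

        /* Error-checking type so the relock reports an error, not a hang. */
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
        pthread_mutex_init(&i_mmap_mutex, &attr);

        vma_adjust_analogy();
        return 0;
}

With the fix in the diff at the top of this mail, the "split" step runs
before the lock is taken, so the relock never happens.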