From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: Hugh Dickins <hughd@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Christoph Lameter <cl@gentwo.org>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Jerome Marchand <jmarchan@redhat.com>,
	Yang Shi <yang.shi@linaro.org>,
	Sasha Levin <sasha.levin@oracle.com>,
	linux-kernel@vger.kernel.org,
	linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: [PATCHv4 05/25] mm: introduce do_set_pmd()
Date: Sat, 12 Mar 2016 01:58:57 +0300
Message-ID: <1457737157-38573-6-git-send-email-kirill.shutemov@linux.intel.com>
In-Reply-To: <1457737157-38573-1-git-send-email-kirill.shutemov@linux.intel.com>

With postponed page table allocation we have a chance to set up huge
pages. alloc_set_pte() calls do_set_pmd() if the following criteria are
met:

 - page is compound;
 - pmd entry is pmd_none();
 - vma has suitable size and alignment.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/huge_mm.h |  2 ++
 mm/huge_memory.c        |  8 ------
 mm/memory.c             | 72 ++++++++++++++++++++++++++++++++++++++++++++++++-
 mm/migrate.c            |  3 +--
 4 files changed, 74 insertions(+), 11 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 24918897f073..193fccdc275d 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -147,6 +147,8 @@ static inline bool is_huge_zero_pmd(pmd_t pmd)
 
 struct page *get_huge_zero_page(void);
 
+#define mk_huge_pmd(page, prot) pmd_mkhuge(mk_pmd(page, prot))
+
 #else /* CONFIG_TRANSPARENT_HUGEPAGE */
 #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
 #define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; })
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1b111d5c0312..2e9e6f4afe40 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -780,14 +780,6 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 	return pmd;
 }
 
-static inline pmd_t mk_huge_pmd(struct page *page, pgprot_t prot)
-{
-	pmd_t entry;
-	entry = mk_pmd(page, prot);
-	entry = pmd_mkhuge(entry);
-	return entry;
-}
-
 static inline struct list_head *page_deferred_list(struct page *page)
 {
 	/*
diff --git a/mm/memory.c b/mm/memory.c
index a6c1c4955560..0109db96fdff 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2837,6 +2837,66 @@ map_pte:
 	return 0;
 }
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+
+#define HPAGE_CACHE_INDEX_MASK (HPAGE_PMD_NR - 1)
+static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
+		unsigned long haddr)
+{
+	if (((vma->vm_start >> PAGE_SHIFT) & HPAGE_CACHE_INDEX_MASK) !=
+			(vma->vm_pgoff & HPAGE_CACHE_INDEX_MASK))
+		return false;
+	if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end)
+		return false;
+	return true;
+}
+
+static int do_set_pmd(struct fault_env *fe, struct page *page)
+{
+	struct vm_area_struct *vma = fe->vma;
+	bool write = fe->flags & FAULT_FLAG_WRITE;
+	unsigned long haddr = fe->address & HPAGE_PMD_MASK;
+	pmd_t entry;
+	int i, ret;
+
+	if (!transhuge_vma_suitable(vma, haddr))
+		return VM_FAULT_FALLBACK;
+
+	ret = VM_FAULT_FALLBACK;
+	page = compound_head(page);
+
+	fe->ptl = pmd_lock(vma->vm_mm, fe->pmd);
+	if (unlikely(!pmd_none(*fe->pmd)))
+		goto out;
+
+	for (i = 0; i < HPAGE_PMD_NR; i++)
+		flush_icache_page(vma, page + i);
+
+	entry = mk_huge_pmd(page, vma->vm_page_prot);
+	if (write)
+		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
+
+	add_mm_counter(vma->vm_mm, MM_FILEPAGES, HPAGE_PMD_NR);
+	page_add_file_rmap(page, true);
+
+	set_pmd_at(vma->vm_mm, haddr, fe->pmd, entry);
+
+	update_mmu_cache_pmd(vma, haddr, fe->pmd);
+
+	/* fault is handled */
+	ret = 0;
+out:
+	spin_unlock(fe->ptl);
+	return ret;
+}
+#else
+static int do_set_pmd(struct fault_env *fe, struct page *page)
+{
+	BUILD_BUG();
+	return 0;
+}
+#endif
+
 /**
  * alloc_set_pte - setup new PTE entry for given page and add reverse page
  * mapping. If needed, the fucntion allocates page table or use pre-allocated.
@@ -2856,9 +2916,19 @@ int alloc_set_pte(struct fault_env *fe, struct mem_cgroup *memcg,
 	struct vm_area_struct *vma = fe->vma;
 	bool write = fe->flags & FAULT_FLAG_WRITE;
 	pte_t entry;
+	int ret;
+
+	if (pmd_none(*fe->pmd) && PageTransCompound(page)) {
+		/* THP on COW? */
+		VM_BUG_ON_PAGE(memcg, page);
+
+		ret = do_set_pmd(fe, page);
+		if (ret != VM_FAULT_FALLBACK)
+			return ret;
+	}
 
 	if (!fe->pte) {
-		int ret = pte_alloc_one_map(fe);
+		ret = pte_alloc_one_map(fe);
 		if (ret)
 			return ret;
 	}
diff --git a/mm/migrate.c b/mm/migrate.c
index d20276fffce7..5c9cd90334ea 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1820,8 +1820,7 @@ fail_putback:
 	}
 
 	orig_entry = *pmd;
-	entry = mk_pmd(new_page, vma->vm_page_prot);
-	entry = pmd_mkhuge(entry);
+	entry = mk_huge_pmd(new_page, vma->vm_page_prot);
 	entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
 
 	/*
-- 
2.7.0