* Re: [PATCHv9-rebased2 11/37] mm: introduce do_set_pmd()
[not found] <050201d1c7ae$9dbf9370$d93eba50$@alibaba-inc.com>
@ 2016-06-16 9:15 ` Hillf Danton
2016-06-16 10:17 ` Kirill A. Shutemov
0 siblings, 1 reply; 3+ messages in thread
From: Hillf Danton @ 2016-06-16 9:15 UTC (permalink / raw)
To: Kirill A. Shutemov; +Cc: linux-kernel, linux-mm
> +
> +static int do_set_pmd(struct fault_env *fe, struct page *page)
> +{
> + struct vm_area_struct *vma = fe->vma;
> + bool write = fe->flags & FAULT_FLAG_WRITE;
> + unsigned long haddr = fe->address & HPAGE_PMD_MASK;
> + pmd_t entry;
> + int i, ret;
> +
> + if (!transhuge_vma_suitable(vma, haddr))
> + return VM_FAULT_FALLBACK;
> +
> + ret = VM_FAULT_FALLBACK;
> + page = compound_head(page);
> +
> + fe->ptl = pmd_lock(vma->vm_mm, fe->pmd);
> + if (unlikely(!pmd_none(*fe->pmd)))
> + goto out;
Can we reply to the caller that the fault is handled correctly (by
resetting ret to zero before the jump)?
> +
> + for (i = 0; i < HPAGE_PMD_NR; i++)
> + flush_icache_page(vma, page + i);
> +
> + entry = mk_huge_pmd(page, vma->vm_page_prot);
> + if (write)
> + entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
> +
> + add_mm_counter(vma->vm_mm, MM_FILEPAGES, HPAGE_PMD_NR);
> + page_add_file_rmap(page, true);
> +
> + set_pmd_at(vma->vm_mm, haddr, fe->pmd, entry);
> +
> + update_mmu_cache_pmd(vma, haddr, fe->pmd);
> +
> + /* fault is handled */
> + ret = 0;
> +out:
> + spin_unlock(fe->ptl);
> + return ret;
> +}
> +#else
> +static int do_set_pmd(struct fault_env *fe, struct page *page)
> +{
> + BUILD_BUG();
> + return 0;
> +}
> +#endif
> +
> /**
> * alloc_set_pte - setup new PTE entry for given page and add reverse page
> * mapping. If needed, the fucntion allocates page table or use pre-allocated.
> @@ -2940,9 +3000,19 @@ int alloc_set_pte(struct fault_env *fe, struct mem_cgroup *memcg,
> struct vm_area_struct *vma = fe->vma;
> bool write = fe->flags & FAULT_FLAG_WRITE;
> pte_t entry;
> + int ret;
> +
> + if (pmd_none(*fe->pmd) && PageTransCompound(page)) {
> + /* THP on COW? */
> + VM_BUG_ON_PAGE(memcg, page);
> +
> + ret = do_set_pmd(fe, page);
> + if (ret != VM_FAULT_FALLBACK)
> + return ret;
> + }
>
> if (!fe->pte) {
> - int ret = pte_alloc_one_map(fe);
> + ret = pte_alloc_one_map(fe);
> if (ret)
> return ret;
> }
> diff --git a/mm/migrate.c b/mm/migrate.c
* Re: [PATCHv9-rebased2 11/37] mm: introduce do_set_pmd()
2016-06-16 9:15 ` [PATCHv9-rebased2 11/37] mm: introduce do_set_pmd() Hillf Danton
@ 2016-06-16 10:17 ` Kirill A. Shutemov
0 siblings, 0 replies; 3+ messages in thread
From: Kirill A. Shutemov @ 2016-06-16 10:17 UTC (permalink / raw)
To: Hillf Danton; +Cc: Kirill A. Shutemov, linux-kernel, linux-mm
On Thu, Jun 16, 2016 at 05:15:22PM +0800, Hillf Danton wrote:
> > +
> > +static int do_set_pmd(struct fault_env *fe, struct page *page)
> > +{
> > + struct vm_area_struct *vma = fe->vma;
> > + bool write = fe->flags & FAULT_FLAG_WRITE;
> > + unsigned long haddr = fe->address & HPAGE_PMD_MASK;
> > + pmd_t entry;
> > + int i, ret;
> > +
> > + if (!transhuge_vma_suitable(vma, haddr))
> > + return VM_FAULT_FALLBACK;
> > +
> > + ret = VM_FAULT_FALLBACK;
> > + page = compound_head(page);
> > +
> > + fe->ptl = pmd_lock(vma->vm_mm, fe->pmd);
> > + if (unlikely(!pmd_none(*fe->pmd)))
> > + goto out;
>
> Can we reply to the caller that the fault is handled correctly (by
> resetting ret to zero before the jump)?
It's not necessarily handled. It's handled only if the pmd is huge. If it
points to a pte table, we need to check the relevant pte entry.
If the pmd is huge, it will be caught by pte_alloc_one_map() later.
--
Kirill A. Shutemov
* [PATCHv9 00/32] THP-enabled tmpfs/shmem using compound pages
@ 2016-06-06 14:06 Kirill A. Shutemov
2016-06-15 20:06 ` [PATCHv9-rebased2 00/37] " Kirill A. Shutemov
0 siblings, 1 reply; 3+ messages in thread
From: Kirill A. Shutemov @ 2016-06-06 14:06 UTC (permalink / raw)
To: Hugh Dickins, Andrea Arcangeli, Andrew Morton
Cc: Dave Hansen, Vlastimil Babka, Christoph Lameter, Naoya Horiguchi,
Jerome Marchand, Yang Shi, Sasha Levin, Andres Lagar-Cavilla,
Ning Qu, linux-kernel, linux-mm, linux-fsdevel,
Kirill A. Shutemov
This is a rebased version of my implementation of huge pages support for
tmpfs.
There are a few fixes by Hugh since v8. The rebase on v4.7-rc1 was somewhat
painful because of changes in the radix-tree API, but everything looks fine
now.
Andrew, please consider applying the patchset to -mm tree.
The patchset is on top of v4.7-rc1 plus khugepaged updates from -mm tree.
Git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git hugetmpfs/v9
== Changelog ==
v9:
- rebased to v4.7-rc1;
- truncate_inode_pages_range() and invalidate_inode_pages2_range() are
adjusted to use page_to_pgoff() (Hugh);
- filemap: fix refcounting in error path in radix-tree operations (Hugh);
- khugepaged: handle !PageUptodate() pages (due to fallocate()?) during
collapse (Hugh);
- shmem_unused_huge_shrink:
- fix shrinklist_len accounting (Hugh);
- call find_lock_page() for the aligned address, so we will not get a tail
page and crash in PageTransHuge() (Hugh);
v8:
- khugepaged updates:
+ mark collapsed page dirty, otherwise vmscan would discard it;
+ account pages to mapping->nrpages on shmem_charge;
+ fix a situation when not all tail pages are put on the radix tree on collapse;
+ fix off-by-one in loop-exit condition in khugepaged_scan_shmem();
+ use radix_tree_iter_next/radix_tree_iter_retry instead of gotos;
+ fix build without CONFIG_SHMEM (again);
- split huge pages beyond i_size under memory pressure;
- disable huge tmpfs on Power, as it makes use of deposited page tables,
which we don't have;
- fix filesystem size limit accounting;
- mark page referenced on split_huge_pmd() if the pmd is young;
- uncharge pages from shmem that are removed during split_huge_page();
- make shmem_inode_info::lock irq-safe -- required by khugepaged;
v7:
- khugepaged updates:
+ fix page leak/page cache corruption on collapse fail;
+ filter out VMAs not suitable for huge pages due to misaligned vm_pgoff;
+ fix build without CONFIG_SHMEM;
+ drop a few over-protective checks;
- fix bogus VM_BUG_ON() in __delete_from_page_cache();
v6:
- experimental collapse support;
- fix swapout of mapped huge pages;
- fix page leak in faultaround code;
- fix excessive huge page allocation with huge=within_size;
- rename VM_NO_THP to VM_NO_KHUGEPAGED;
- fix condition in hugepage_madvise();
- accounting reworked again;
v5:
- add FileHugeMapped to /proc/PID/smaps;
- make FileHugeMapped in meminfo aligned with other fields;
- Documentation/vm/transhuge.txt updated;
v4:
- first four patch were applied to -mm tree;
- drop pages beyond i_size on split_huge_pages;
- few small random bugfixes;
v3:
- huge= mount option now can have values always, within_size, advise and
never;
- sysctl handle is replaced with a sysfs knob;
- MADV_HUGEPAGE/MADV_NOHUGEPAGE is now respected on page allocation via
page fault;
- mlock() handling had been fixed;
- bunch of smaller bugfixes and cleanups.
== Design overview ==
Huge pages are allocated by shmem when it's allowed (by mount option) and
there are no entries for the range in the radix-tree. A huge page is
represented by HPAGE_PMD_NR entries in the radix-tree.
MM core maps a page with a PMD if ->fault() returns a huge page and the VMA
is suitable for huge pages (size, alignment). There's no need for two
requests to the filesystem: the filesystem returns a huge page if it can,
with graceful fallback to small pages otherwise.
As with DAX, split_huge_pmd() is implemented by unmapping the PMD: we can
re-fault the page with PTEs later.
Basic scheme for split_huge_page() is the same as for anon-THP.
A few differences:
- File pages are on the radix-tree, so we have head->_count offset by
HPAGE_PMD_NR. The count gets distributed to small pages during split.
- mapping->tree_lock prevents non-lockless access to pages under split
over the radix-tree;
- Lockless access is prevented by setting head->_count to 0 during
split, so get_page_unless_zero() would fail;
- After split, some pages can be beyond i_size. We drop them from the
radix-tree.
- We don't set up migration entries; we just unmap pages. This helps
handle cases when i_size is in the middle of the page: there is no need
to unmap pages beyond i_size manually.
COW mappings are handled at the PTE level. It's not clear how beneficial
allocating huge pages on COW faults would be, and it would require some code
to make them work.
I think at some point we can consider teaching khugepaged to collapse
pages in COW mappings, but allocating huge pages on fault is probably
overkill.
As with anon THP, we mlock a file huge page only if it is mapped with a PMD.
PTE-mapped THPs are never mlocked. This way we can avoid all sorts of
scenarios where we could leak an mlocked page.
As with anon THP, we split huge page on swap out.
Truncate and punch-hole operations that cover only part of a THP range are
implemented by zeroing out that part of the THP.
This has a visible effect on fallocate(FALLOC_FL_PUNCH_HOLE) behaviour.
As we don't really create a hole in this case, lseek(SEEK_HOLE) may have
inconsistent results depending on which pages happened to be allocated.
I don't think this will be a problem.
We track a per-super_block list of inodes which potentially have a huge page
partly beyond i_size. Under memory pressure, or if we hit -ENOSPC, we split
such pages in order to recover memory.
The list is per-sb, as we need to split a page from our own filesystem if we
hit -ENOSPC (the -o size= limit) during shmem_getpage_gfp() to free some
space.
== Patchset overview ==
[01/29]
Update documentation on THP vs. mlock. I've posted it separately
before. It can go in.
[02-04/29]
Rework fault path and rmap to handle file pmd. Unlike DAX with
vm_ops->pmd_fault, we don't need to ask filesystem twice -- first
for huge page and then for small. If ->fault happened to return
huge page and VMA is suitable for mapping it as huge, we would
do so.
[05/29]
Add support for huge file pages in rmap;
[06-15/29]
Various preparation of THP core for file pages.
[16-20/29]
Various preparation of MM core for file pages.
[21-24/29]
And finally, bring huge pages into tmpfs/shmem.
[25/29]
Wire up madvise() existing hints for file THP.
We can implement fadvise() later.
[26/29]
Documentation update.
[27-29/29]
Extend khugepaged to support shmem/tmpfs.
Hugh Dickins (1):
shmem: get_unmapped_area align huge page
Kirill A. Shutemov (31):
thp, mlock: update unevictable-lru.txt
mm: do not pass mm_struct into handle_mm_fault
mm: introduce fault_env
mm: postpone page table allocation until we have page to map
rmap: support file thp
mm: introduce do_set_pmd()
thp, vmstats: add counters for huge file pages
thp: support file pages in zap_huge_pmd()
thp: handle file pages in split_huge_pmd()
thp: handle file COW faults
thp: skip file huge pmd on copy_huge_pmd()
thp: prepare change_huge_pmd() for file thp
thp: run vma_adjust_trans_huge() outside i_mmap_rwsem
thp: file pages support for split_huge_page()
thp, mlock: do not mlock PTE-mapped file huge pages
vmscan: split file huge pages before paging them out
page-flags: relax policy for PG_mappedtodisk and PG_reclaim
radix-tree: implement radix_tree_maybe_preload_order()
filemap: prepare find and delete operations for huge pages
truncate: handle file thp
mm, rmap: account shmem thp pages
shmem: prepare huge= mount option and sysfs knob
shmem: add huge pages support
shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
thp: extract khugepaged from mm/huge_memory.c
khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
shmem: make shmem_inode_info::lock irq-safe
khugepaged: add support of collapse for tmpfs/shmem pages
thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
shmem: split huge pages beyond i_size under memory pressure
thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
Documentation/filesystems/Locking | 10 +-
Documentation/filesystems/proc.txt | 9 +
Documentation/vm/transhuge.txt | 128 ++-
Documentation/vm/unevictable-lru.txt | 21 +
arch/alpha/mm/fault.c | 2 +-
arch/arc/mm/fault.c | 2 +-
arch/arm/mm/fault.c | 2 +-
arch/arm64/mm/fault.c | 2 +-
arch/avr32/mm/fault.c | 2 +-
arch/cris/mm/fault.c | 2 +-
arch/frv/mm/fault.c | 2 +-
arch/hexagon/mm/vm_fault.c | 2 +-
arch/ia64/mm/fault.c | 2 +-
arch/m32r/mm/fault.c | 2 +-
arch/m68k/mm/fault.c | 2 +-
arch/metag/mm/fault.c | 2 +-
arch/microblaze/mm/fault.c | 2 +-
arch/mips/mm/fault.c | 2 +-
arch/mn10300/mm/fault.c | 2 +-
arch/nios2/mm/fault.c | 2 +-
arch/openrisc/mm/fault.c | 2 +-
arch/parisc/mm/fault.c | 2 +-
arch/powerpc/mm/copro_fault.c | 2 +-
arch/powerpc/mm/fault.c | 2 +-
arch/s390/mm/fault.c | 2 +-
arch/score/mm/fault.c | 2 +-
arch/sh/mm/fault.c | 2 +-
arch/sparc/mm/fault_32.c | 4 +-
arch/sparc/mm/fault_64.c | 2 +-
arch/tile/mm/fault.c | 2 +-
arch/um/kernel/trap.c | 2 +-
arch/unicore32/mm/fault.c | 2 +-
arch/x86/mm/fault.c | 2 +-
arch/xtensa/mm/fault.c | 2 +-
drivers/base/node.c | 13 +-
drivers/char/mem.c | 24 +
drivers/iommu/amd_iommu_v2.c | 3 +-
drivers/iommu/intel-svm.c | 2 +-
fs/proc/meminfo.c | 7 +-
fs/proc/task_mmu.c | 10 +-
fs/userfaultfd.c | 22 +-
include/linux/huge_mm.h | 36 +-
include/linux/khugepaged.h | 6 +
include/linux/mm.h | 51 +-
include/linux/mmzone.h | 4 +-
include/linux/page-flags.h | 19 +-
include/linux/radix-tree.h | 1 +
include/linux/rmap.h | 2 +-
include/linux/shmem_fs.h | 45 +-
include/linux/userfaultfd_k.h | 8 +-
include/linux/vm_event_item.h | 7 +
include/trace/events/huge_memory.h | 3 +-
ipc/shm.c | 10 +-
lib/radix-tree.c | 84 +-
mm/Kconfig | 8 +
mm/Makefile | 2 +-
mm/filemap.c | 217 ++--
mm/gup.c | 7 +-
mm/huge_memory.c | 2102 ++++++----------------------------
mm/internal.h | 4 +-
mm/khugepaged.c | 1911 +++++++++++++++++++++++++++++++
mm/ksm.c | 5 +-
mm/memory.c | 879 +++++++-------
mm/mempolicy.c | 4 +-
mm/migrate.c | 5 +-
mm/mmap.c | 26 +-
mm/nommu.c | 3 +-
mm/page-writeback.c | 1 +
mm/page_alloc.c | 21 +
mm/rmap.c | 78 +-
mm/shmem.c | 918 +++++++++++++--
mm/swap.c | 2 +
mm/truncate.c | 28 +-
mm/util.c | 6 +
mm/vmscan.c | 6 +
mm/vmstat.c | 4 +
76 files changed, 4333 insertions(+), 2491 deletions(-)
create mode 100644 mm/khugepaged.c
--
2.8.1
* [PATCHv9-rebased2 00/37] THP-enabled tmpfs/shmem using compound pages
2016-06-06 14:06 [PATCHv9 00/32] THP-enabled tmpfs/shmem using compound pages Kirill A. Shutemov
@ 2016-06-15 20:06 ` Kirill A. Shutemov
2016-06-15 20:06 ` [PATCHv9-rebased2 11/37] mm: introduce do_set_pmd() Kirill A. Shutemov
0 siblings, 1 reply; 3+ messages in thread
From: Kirill A. Shutemov @ 2016-06-15 20:06 UTC (permalink / raw)
To: Hugh Dickins, Andrea Arcangeli, Andrew Morton
Cc: Dave Hansen, Vlastimil Babka, Christoph Lameter, Naoya Horiguchi,
Jerome Marchand, Yang Shi, Sasha Levin, Andres Lagar-Cavilla,
Ning Qu, linux-kernel, linux-mm, linux-fsdevel, Ebru Akagunduz,
Kirill A. Shutemov
Andrew,
As requested, here's a refreshed version of the patchset.
During preparation, Ebru mentioned (on IRC) that she wanted to withdraw
mm-thp-avoid-unnecessary-swapin-in-khugepaged.patch from the mm tree, but
it's difficult in the current state of the tree, so I rebased with the patch
removed.
The patchset below is aimed to replace the patches in your series, starting
with mm-vmstat-calculate-particular-vm-event.patch (it's not necessary
after the mm-thp-avoid-unnecessary-swapin-in-khugepaged.patch removal) up to
the end of my patchset.
I also took the opportunity to address Vlastimil's concern about 'pmd'
re-validation after the mmap_sem drop (you mentioned it in the series file).
See patch 05/37.
I did a few sanity checks. Everything looks good.
Hopefully, I didn't screw up anything on the way. :)
Andrew Morton (1):
mm-thp-make-swapin-readahead-under-down_read-of-mmap_sem-fix-2-fix
Ebru Akagunduz (2):
mm, thp: make swapin readahead under down_read of mmap_sem
mm, thp: fix locking inconsistency in collapse_huge_page
Hugh Dickins (1):
shmem: get_unmapped_area align huge page
Kirill A. Shutemov (33):
mm-thp-make-swapin-readahead-under-down_read-of-mmap_sem-fix
khugepaged: recheck pmd after mmap_sem re-acquired
thp, mlock: update unevictable-lru.txt
mm: do not pass mm_struct into handle_mm_fault
mm: introduce fault_env
mm: postpone page table allocation until we have page to map
rmap: support file thp
mm: introduce do_set_pmd()
thp, vmstats: add counters for huge file pages
thp: support file pages in zap_huge_pmd()
thp: handle file pages in split_huge_pmd()
thp: handle file COW faults
thp: skip file huge pmd on copy_huge_pmd()
thp: prepare change_huge_pmd() for file thp
thp: run vma_adjust_trans_huge() outside i_mmap_rwsem
thp: file pages support for split_huge_page()
thp, mlock: do not mlock PTE-mapped file huge pages
vmscan: split file huge pages before paging them out
page-flags: relax policy for PG_mappedtodisk and PG_reclaim
radix-tree: implement radix_tree_maybe_preload_order()
filemap: prepare find and delete operations for huge pages
truncate: handle file thp
mm, rmap: account shmem thp pages
shmem: prepare huge= mount option and sysfs knob
shmem: add huge pages support
shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
thp: extract khugepaged from mm/huge_memory.c
khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
shmem: make shmem_inode_info::lock irq-safe
khugepaged: add support of collapse for tmpfs/shmem pages
thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
shmem: split huge pages beyond i_size under memory pressure
thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
Documentation/filesystems/Locking | 10 +-
Documentation/filesystems/proc.txt | 9 +
Documentation/vm/transhuge.txt | 128 ++-
Documentation/vm/unevictable-lru.txt | 21 +
arch/alpha/mm/fault.c | 2 +-
arch/arc/mm/fault.c | 2 +-
arch/arm/mm/fault.c | 2 +-
arch/arm64/mm/fault.c | 2 +-
arch/avr32/mm/fault.c | 2 +-
arch/cris/mm/fault.c | 2 +-
arch/frv/mm/fault.c | 2 +-
arch/hexagon/mm/vm_fault.c | 2 +-
arch/ia64/mm/fault.c | 2 +-
arch/m32r/mm/fault.c | 2 +-
arch/m68k/mm/fault.c | 2 +-
arch/metag/mm/fault.c | 2 +-
arch/microblaze/mm/fault.c | 2 +-
arch/mips/mm/fault.c | 2 +-
arch/mn10300/mm/fault.c | 2 +-
arch/nios2/mm/fault.c | 2 +-
arch/openrisc/mm/fault.c | 2 +-
arch/parisc/mm/fault.c | 2 +-
arch/powerpc/mm/copro_fault.c | 2 +-
arch/powerpc/mm/fault.c | 2 +-
arch/s390/mm/fault.c | 2 +-
arch/score/mm/fault.c | 2 +-
arch/sh/mm/fault.c | 2 +-
arch/sparc/mm/fault_32.c | 4 +-
arch/sparc/mm/fault_64.c | 2 +-
arch/tile/mm/fault.c | 2 +-
arch/um/kernel/trap.c | 2 +-
arch/unicore32/mm/fault.c | 2 +-
arch/x86/mm/fault.c | 2 +-
arch/xtensa/mm/fault.c | 2 +-
drivers/base/node.c | 13 +-
drivers/char/mem.c | 24 +
drivers/iommu/amd_iommu_v2.c | 3 +-
drivers/iommu/intel-svm.c | 2 +-
fs/proc/meminfo.c | 7 +-
fs/proc/task_mmu.c | 10 +-
fs/userfaultfd.c | 22 +-
include/linux/huge_mm.h | 36 +-
include/linux/khugepaged.h | 5 +
include/linux/mm.h | 51 +-
include/linux/mmzone.h | 4 +-
include/linux/page-flags.h | 19 +-
include/linux/radix-tree.h | 1 +
include/linux/rmap.h | 2 +-
include/linux/shmem_fs.h | 45 +-
include/linux/userfaultfd_k.h | 8 +-
include/linux/vm_event_item.h | 7 +
include/trace/events/huge_memory.h | 3 +-
ipc/shm.c | 10 +-
lib/radix-tree.c | 84 +-
mm/Kconfig | 8 +
mm/Makefile | 2 +-
mm/filemap.c | 217 ++--
mm/gup.c | 7 +-
mm/huge_memory.c | 2048 ++++++----------------------------
mm/internal.h | 4 +-
mm/khugepaged.c | 1913 +++++++++++++++++++++++++++++++
mm/ksm.c | 5 +-
mm/memory.c | 860 +++++++-------
mm/mempolicy.c | 2 +-
mm/migrate.c | 5 +-
mm/mmap.c | 26 +-
mm/nommu.c | 3 +-
mm/page-writeback.c | 1 +
mm/page_alloc.c | 21 +
mm/rmap.c | 78 +-
mm/shmem.c | 918 +++++++++++++--
mm/swap.c | 2 +
mm/truncate.c | 28 +-
mm/util.c | 6 +
mm/vmscan.c | 6 +
mm/vmstat.c | 4 +
76 files changed, 4319 insertions(+), 2431 deletions(-)
create mode 100644 mm/khugepaged.c
--
2.8.1
* [PATCHv9-rebased2 11/37] mm: introduce do_set_pmd()
2016-06-15 20:06 ` [PATCHv9-rebased2 00/37] " Kirill A. Shutemov
@ 2016-06-15 20:06 ` Kirill A. Shutemov
0 siblings, 0 replies; 3+ messages in thread
From: Kirill A. Shutemov @ 2016-06-15 20:06 UTC (permalink / raw)
To: Hugh Dickins, Andrea Arcangeli, Andrew Morton
Cc: Dave Hansen, Vlastimil Babka, Christoph Lameter, Naoya Horiguchi,
Jerome Marchand, Yang Shi, Sasha Levin, Andres Lagar-Cavilla,
Ning Qu, linux-kernel, linux-mm, linux-fsdevel, Ebru Akagunduz,
Kirill A. Shutemov
With postponed page table allocation we have a chance to set up huge pages.
do_set_pte() calls do_set_pmd() if the following criteria are met:
- page is compound;
- pmd entry is pmd_none();
- vma has suitable size and alignment.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
include/linux/huge_mm.h | 2 ++
mm/huge_memory.c | 5 ----
mm/memory.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++-
mm/migrate.c | 3 +--
4 files changed, 74 insertions(+), 8 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 670ea0e3d138..3ef07cd7730c 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -143,6 +143,8 @@ static inline bool is_huge_zero_pmd(pmd_t pmd)
struct page *get_huge_zero_page(void);
void put_huge_zero_page(void);
+#define mk_huge_pmd(page, prot) pmd_mkhuge(mk_pmd(page, prot))
+
#else /* CONFIG_TRANSPARENT_HUGEPAGE */
#define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
#define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; })
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 05088abe7576..b24b7993c369 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -795,11 +795,6 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
return pmd;
}
-static inline pmd_t mk_huge_pmd(struct page *page, pgprot_t prot)
-{
- return pmd_mkhuge(mk_pmd(page, prot));
-}
-
static inline struct list_head *page_deferred_list(struct page *page)
{
/*
diff --git a/mm/memory.c b/mm/memory.c
index 02a5491f0f17..6c0ebbc680d4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2921,6 +2921,66 @@ map_pte:
return 0;
}
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+
+#define HPAGE_CACHE_INDEX_MASK (HPAGE_PMD_NR - 1)
+static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
+ unsigned long haddr)
+{
+ if (((vma->vm_start >> PAGE_SHIFT) & HPAGE_CACHE_INDEX_MASK) !=
+ (vma->vm_pgoff & HPAGE_CACHE_INDEX_MASK))
+ return false;
+ if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end)
+ return false;
+ return true;
+}
+
+static int do_set_pmd(struct fault_env *fe, struct page *page)
+{
+ struct vm_area_struct *vma = fe->vma;
+ bool write = fe->flags & FAULT_FLAG_WRITE;
+ unsigned long haddr = fe->address & HPAGE_PMD_MASK;
+ pmd_t entry;
+ int i, ret;
+
+ if (!transhuge_vma_suitable(vma, haddr))
+ return VM_FAULT_FALLBACK;
+
+ ret = VM_FAULT_FALLBACK;
+ page = compound_head(page);
+
+ fe->ptl = pmd_lock(vma->vm_mm, fe->pmd);
+ if (unlikely(!pmd_none(*fe->pmd)))
+ goto out;
+
+ for (i = 0; i < HPAGE_PMD_NR; i++)
+ flush_icache_page(vma, page + i);
+
+ entry = mk_huge_pmd(page, vma->vm_page_prot);
+ if (write)
+ entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
+
+ add_mm_counter(vma->vm_mm, MM_FILEPAGES, HPAGE_PMD_NR);
+ page_add_file_rmap(page, true);
+
+ set_pmd_at(vma->vm_mm, haddr, fe->pmd, entry);
+
+ update_mmu_cache_pmd(vma, haddr, fe->pmd);
+
+ /* fault is handled */
+ ret = 0;
+out:
+ spin_unlock(fe->ptl);
+ return ret;
+}
+#else
+static int do_set_pmd(struct fault_env *fe, struct page *page)
+{
+ BUILD_BUG();
+ return 0;
+}
+#endif
+
/**
* alloc_set_pte - setup new PTE entry for given page and add reverse page
* mapping. If needed, the fucntion allocates page table or use pre-allocated.
@@ -2940,9 +3000,19 @@ int alloc_set_pte(struct fault_env *fe, struct mem_cgroup *memcg,
struct vm_area_struct *vma = fe->vma;
bool write = fe->flags & FAULT_FLAG_WRITE;
pte_t entry;
+ int ret;
+
+ if (pmd_none(*fe->pmd) && PageTransCompound(page)) {
+ /* THP on COW? */
+ VM_BUG_ON_PAGE(memcg, page);
+
+ ret = do_set_pmd(fe, page);
+ if (ret != VM_FAULT_FALLBACK)
+ return ret;
+ }
if (!fe->pte) {
- int ret = pte_alloc_one_map(fe);
+ ret = pte_alloc_one_map(fe);
if (ret)
return ret;
}
diff --git a/mm/migrate.c b/mm/migrate.c
index 7e6e9375d654..c7531ccf65f4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1984,8 +1984,7 @@ fail_putback:
}
orig_entry = *pmd;
- entry = mk_pmd(new_page, vma->vm_page_prot);
- entry = pmd_mkhuge(entry);
+ entry = mk_huge_pmd(new_page, vma->vm_page_prot);
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
/*
--
2.8.1