* [PATCH RFC v2 0/4]  Add support for sharing page tables across processes (Previously mshare)
@ 2023-04-26 16:49 Khalid Aziz
From: Khalid Aziz @ 2023-04-26 16:49 UTC (permalink / raw)
  To: akpm, willy, markhemm, viro, david, mike.kravetz
  Cc: Khalid Aziz, andreyknvl, dave.hansen, luto, brauner, arnd,
	ebiederm, catalin.marinas, linux-arch, linux-kernel, linux-mm,
	mhiramat, rostedt, vasily.averin, xhao, pcc, neilb, maz

Memory pages shared between processes require a page table entry
(PTE) in each process. Each of these PTEs consumes some memory,
and as long as the number of mappings being maintained is small
enough, the space consumed by page tables is not objectionable.
When very few memory pages are shared between processes, the
number of PTEs to maintain is mostly constrained by the number of
pages of memory on the system. As the number of shared pages and
the number of times pages are shared go up, the amount of memory
consumed by page tables starts to become significant. This issue
does not apply to threads: any number of threads can share the
same pages inside a process while sharing the same PTEs. Extending
this model to sharing pages across processes can eliminate the
issue for cross-process sharing as well.

Field deployments commonly see memory pages shared across
thousands of processes. On x86_64, each page requires a PTE that
is only 8 bytes long, which is very small compared to the 4K page
size. When 2000 processes map the same page in their address
space, each one of them requires 8 bytes for its PTE, and together
that adds up to 16K of memory just to hold the PTEs for one 4K
page. On a database server with a 300GB SGA, a system crash from
an out-of-memory condition was seen when 1500+ clients tried to
share this SGA, even though the system had 512GB of memory. On
this server, the worst-case scenario of all 1500 processes mapping
every page of the SGA would have required 878GB+ just for the
PTEs. If these PTEs could be shared, the amount of memory saved
would be very significant.
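
For reference, the worst-case arithmetic works out as follows:

	300GB SGA / 4KB per page    = 78,643,200 pages
	78,643,200 pages x 8 bytes  = ~600MB of PTEs per process
	~600MB x 1500 processes     = ~879GB of PTEs in total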

This patch series adds a new flag to the mmap() call:
MAP_SHARED_PT. This flag can be specified along with MAP_SHARED by
a process to hint to the kernel that it wishes to share the page
table entries for this file-backed mmap region with other
processes. Any other process that mmaps the same file with the
MAP_SHARED_PT flag can then share the same page table entries.
Besides specifying the MAP_SHARED_PT flag, the processes must map
the file at a PMD-aligned address, with a size that is a multiple
of the PMD size, and at the same virtual address. This last
requirement of identical virtual addresses can possibly be relaxed
if that is the consensus.
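
A minimal userspace sketch of the intended usage (the
MAP_SHARED_PT value below is assumed for illustration only; the
real value comes from the uapi header added in patch 2/4, and the
fixed mapping address is arbitrary):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_SHARED_PT
#define MAP_SHARED_PT	0x200000	/* assumed value, illustration only */
#endif

#define PMD_SIZE	(2UL << 20)			/* 2MB PMD on x86_64 */
#define MAP_ADDR	((void *)0x400000000000UL)	/* arbitrary, PMD-aligned */
#define MAP_LEN		(16 * PMD_SIZE)			/* multiple of PMD size */

int main(void)
{
	int fd = open("/dev/shm/sga", O_RDWR);
	void *p;

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/*
	 * Every participating process maps the file at the same
	 * PMD-aligned address with the same PMD-multiple length.
	 */
	p = mmap(MAP_ADDR, MAP_LEN, PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_SHARED_PT | MAP_FIXED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* ... use the mapping; PTEs are shared with other opt-in mappers ... */

	munmap(p, MAP_LEN);
	close(fd);
	return 0;
}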

When mmap() is called with the MAP_SHARED_PT flag, a new host mm
struct is created to hold the shared page tables. The host mm
struct is not attached to any process. The start and size of the
host mm are set to the start and size of the mmap region, and a
VMA covering this range is added to the host mm struct. Existing
page table entries from the process that creates the mapping are
copied over to the host mm struct. All processes mapping this
shared region are considered guest processes. When a guest process
mmaps the shared region, the vm flag VM_SHARED_PT is added to the
VMAs in the guest process. Upon a page fault, the VMA is checked
for the presence of the VM_SHARED_PT flag. If the flag is found,
the corresponding PMD is updated with the PMD from the host mm
struct, so the PMD points to the page tables in the host mm
struct. The vm_mm pointer of the VMA is also updated to point to
the host mm struct for the duration of fault handling, to ensure
the fault is handled in the context of the host mm struct. When a
new PTE is created, it is created in the host mm struct's page
tables, and the PMD in the guest mm points to the same PTEs.
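
In pseudo-kernel-C, the guest fault path described above looks
roughly like this (a sketch only, not the patch code;
ptshare_host_mm() and ptshare_copy_host_pmd() are hypothetical
helpers standing in for the series' internal API):

static vm_fault_t ptshare_fault_sketch(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;
	struct mm_struct *guest_mm = vma->vm_mm;
	struct mm_struct *host_mm;
	vm_fault_t ret;

	if (!(vma->vm_flags & VM_SHARED_PT))
		return VM_FAULT_SIGSEGV;	/* not a shared-PT region */

	/* Hypothetical lookup of the host mm backing this VMA. */
	host_mm = ptshare_host_mm(vma);

	/*
	 * Point the guest PMD at the host mm's page table page, then
	 * handle the fault in the host mm's context so that any new
	 * PTE lands in the shared page tables.
	 */
	ptshare_copy_host_pmd(host_mm, vma, vmf->address);
	vma->vm_mm = host_mm;
	ret = handle_mm_fault(vma, vmf->address, vmf->flags, NULL);
	vma->vm_mm = guest_mm;

	return ret;
}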

This is a basic working implementation. It will need to go through
more testing and refinements. Some notes and questions:

- The PMD alignment and size requirement is currently hard-coded.
  Is there a need or desire to make this more flexible and support
  other alignments/sizes? PMD size allows this infrastructure to
  form the basis for hugetlbfs page table sharing as well, though
  more work will be needed to make that happen.

- Is there a reason to allow a userspace app to query this size and
  alignment requirement for MAP_SHARED_PT in some way?

- Shared PTEs mean that an mprotect() call made by one process
  affects all processes sharing the same mapping, and that
  behavior will need to be documented clearly (see the sketch
  after this list). This changed effect of mprotect for processes
  using shared page tables is the primary reason to require an
  explicit opt-in from userspace processes: with transparent
  sharing derived from MAP_SHARED alone, it could break a
  significant number of userspace apps. One could work around that
  by unsharing whenever mprotect changes modes on a shared
  mapping, but that introduces complexity, and the ability to
  execute a single mprotect that changes modes across 1000s of
  processes sharing a mapped database is a feature explicitly
  asked for by database folks. This capability has a significant
  performance advantage over sending a message to every process
  using the shared mapping so that each one calls mprotect itself,
  or taking traps on permission mismatches in each process.

- This implementation does not allow partially unmapping a mapping
  that shares page tables. Should that be supported in the future?
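
As a concrete illustration of the mprotect() semantics above, a
minimal sketch (reusing the hypothetical MAP_ADDR and MAP_LEN from
the earlier mmap example):

	/*
	 * With shared page tables, this single call changes the
	 * protection seen by every process sharing the mapping.
	 */
	if (mprotect(MAP_ADDR, MAP_LEN, PROT_READ) < 0)
		perror("mprotect");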

Some concerns in this RFC:

- When page tables for a process are freed upon process exit,
  pmd_free_tlb() gets called at one point to free all PMDs
  allocated by the process. For a shared page table, the shared
  PMDs cannot be released when a guest process exits; they are
  released only when the host mm struct is released, once the last
  reference to the page table shared region hosted by this mm is
  dropped. For now, to stop these PMDs from being released, this
  RFC introduces the following change in mm/memory.c, which works
  but does not feel like the right approach. Any suggestions for a
  better long-term approach would be much appreciated:

@@ -210,13 +221,19 @@ static inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,

        pmd = pmd_offset(pud, start);
        pud_clear(pud);
-       pmd_free_tlb(tlb, pmd, start);
-       mm_dec_nr_pmds(tlb->mm);
+       if (shared_pte) {
+               tlb_flush_pud_range(tlb, start, PAGE_SIZE);
+               tlb->freed_tables = 1;
+       } else {
+               pmd_free_tlb(tlb, pmd, start);
+               mm_dec_nr_pmds(tlb->mm);
+       }
 }

 static inline void free_pud_range(struct mmu_gather *tlb, p4d_t *p4d,

- This implementation requires an additional VM flag. Since all
  lower 32 bits are currently in use, the new VM flag must come
  from the upper 32 bits, which restricts this feature to 64-bit
  processors (see the sketch after this list).

- This feature is implemented for file mappings only. Is there a
  need to support it for anonymous memory as well?

- Accounting of MAP_SHARED_PT mapped file pages and page table
  bytes in a process is not quite accurate yet in this RFC and
  will be fixed in the non-RFC version of these patches.
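
A sketch of what such a high-bit flag definition looks like (the
exact bit chosen by patch 1/4 may differ; BIT(37) is illustrative):

#ifdef CONFIG_64BIT
#define VM_SHARED_PT	BIT(37)	/* upper 32 bits, 64-bit only */
#else
#define VM_SHARED_PT	0	/* feature unavailable on 32-bit */
#endif

A zero definition on 32-bit is presumably also why the kernel test
robot's smatch report below flags "vm_flags & VM_SHARED_PT" as
always false on an i386 config.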

I would appreciate any feedback on these patches and ideas for
improvement before they move out of the RFC stage.


Changes from RFC v1:
- Broke the series up into smaller patches
- Fixed a few bugs related to freeing PTEs and PMDs incorrectly
- Cleaned up the code a bit


Khalid Aziz (4):
  mm/ptshare: Add vm flag for shared PTE
  mm/ptshare: Add flag MAP_SHARED_PT to mmap()
  mm/ptshare: Create new mm struct for page table sharing
  mm/ptshare: Add page fault handling for page table shared regions

 include/linux/fs.h                     |   2 +
 include/linux/mm.h                     |   8 +
 include/trace/events/mmflags.h         |   3 +-
 include/uapi/asm-generic/mman-common.h |   1 +
 mm/Makefile                            |   2 +-
 mm/internal.h                          |  21 ++
 mm/memory.c                            | 105 ++++++++--
 mm/mmap.c                              |  88 +++++++++
 mm/ptshare.c                           | 263 +++++++++++++++++++++++++
 9 files changed, 476 insertions(+), 17 deletions(-)
 create mode 100644 mm/ptshare.c

-- 
2.37.2


* Re: [PATCH RFC v2 3/4] mm/ptshare: Create new mm struct for page table sharing
@ 2023-05-07 14:13 kernel test robot
From: kernel test robot @ 2023-05-07 14:13 UTC (permalink / raw)
  To: oe-kbuild; +Cc: lkp, Dan Carpenter

BCC: lkp@intel.com
CC: oe-kbuild-all@lists.linux.dev
In-Reply-To: <1fd52581f4e4960a4d07cb9784d56659ec139d3c.1682453344.git.khalid.aziz@oracle.com>
References: <1fd52581f4e4960a4d07cb9784d56659ec139d3c.1682453344.git.khalid.aziz@oracle.com>
TO: Khalid Aziz <khalid.aziz@oracle.com>

Hi Khalid,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:

[auto build test WARNING on arnd-asm-generic/master]
[also build test WARNING on vfs-idmapping/for-next linus/master v6.3 next-20230505]
[cannot apply to akpm-mm/mm-everything]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Khalid-Aziz/mm-ptshare-Add-vm-flag-for-shared-PTE/20230427-005143
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git master
patch link:    https://lore.kernel.org/r/1fd52581f4e4960a4d07cb9784d56659ec139d3c.1682453344.git.khalid.aziz%40oracle.com
patch subject: [PATCH RFC v2 3/4] mm/ptshare: Create new mm struct for page table sharing
:::::: branch date: 11 days ago
:::::: commit date: 11 days ago
config: i386-randconfig-m021 (https://download.01.org/0day-ci/archive/20230507/202305072226.KGlQeenj-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-12) 11.3.0

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <error27@gmail.com>
| Link: https://lore.kernel.org/r/202305072226.KGlQeenj-lkp@intel.com/

smatch warnings:
mm/mmap.c:2699 mmap_region() warn: bitwise AND condition is false here

vim +2699 mm/mmap.c

dd2283f2605e3b Yang Shi           2018-10-26  2554  
e99668a56430a2 Liam R. Howlett    2022-09-06  2555  unsigned long mmap_region(struct file *file, unsigned long addr,
e99668a56430a2 Liam R. Howlett    2022-09-06  2556  		unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
e99668a56430a2 Liam R. Howlett    2022-09-06  2557  		struct list_head *uf)
e99668a56430a2 Liam R. Howlett    2022-09-06  2558  {
e99668a56430a2 Liam R. Howlett    2022-09-06  2559  	struct mm_struct *mm = current->mm;
e99668a56430a2 Liam R. Howlett    2022-09-06  2560  	struct vm_area_struct *vma = NULL;
e99668a56430a2 Liam R. Howlett    2022-09-06  2561  	struct vm_area_struct *next, *prev, *merge;
e99668a56430a2 Liam R. Howlett    2022-09-06  2562  	pgoff_t pglen = len >> PAGE_SHIFT;
e99668a56430a2 Liam R. Howlett    2022-09-06  2563  	unsigned long charged = 0;
e99668a56430a2 Liam R. Howlett    2022-09-06  2564  	unsigned long end = addr + len;
e99668a56430a2 Liam R. Howlett    2022-09-06  2565  	unsigned long merge_start = addr, merge_end = end;
e99668a56430a2 Liam R. Howlett    2022-09-06  2566  	pgoff_t vm_pgoff;
e99668a56430a2 Liam R. Howlett    2022-09-06  2567  	int error;
183654ce26a5d5 Liam R. Howlett    2023-01-20  2568  	VMA_ITERATOR(vmi, mm, addr);
e99668a56430a2 Liam R. Howlett    2022-09-06  2569  
e99668a56430a2 Liam R. Howlett    2022-09-06  2570  	/* Check against address space limit. */
e99668a56430a2 Liam R. Howlett    2022-09-06  2571  	if (!may_expand_vm(mm, vm_flags, len >> PAGE_SHIFT)) {
e99668a56430a2 Liam R. Howlett    2022-09-06  2572  		unsigned long nr_pages;
e99668a56430a2 Liam R. Howlett    2022-09-06  2573  
e99668a56430a2 Liam R. Howlett    2022-09-06  2574  		/*
e99668a56430a2 Liam R. Howlett    2022-09-06  2575  		 * MAP_FIXED may remove pages of mappings that intersects with
e99668a56430a2 Liam R. Howlett    2022-09-06  2576  		 * requested mapping. Account for the pages it would unmap.
e99668a56430a2 Liam R. Howlett    2022-09-06  2577  		 */
e99668a56430a2 Liam R. Howlett    2022-09-06  2578  		nr_pages = count_vma_pages_range(mm, addr, end);
e99668a56430a2 Liam R. Howlett    2022-09-06  2579  
e99668a56430a2 Liam R. Howlett    2022-09-06  2580  		if (!may_expand_vm(mm, vm_flags,
e99668a56430a2 Liam R. Howlett    2022-09-06  2581  					(len >> PAGE_SHIFT) - nr_pages))
e99668a56430a2 Liam R. Howlett    2022-09-06  2582  			return -ENOMEM;
e99668a56430a2 Liam R. Howlett    2022-09-06  2583  	}
e99668a56430a2 Liam R. Howlett    2022-09-06  2584  
e99668a56430a2 Liam R. Howlett    2022-09-06  2585  	/* Unmap any existing mapping in the area */
183654ce26a5d5 Liam R. Howlett    2023-01-20  2586  	if (do_vmi_munmap(&vmi, mm, addr, len, uf, false))
e99668a56430a2 Liam R. Howlett    2022-09-06  2587  		return -ENOMEM;
e99668a56430a2 Liam R. Howlett    2022-09-06  2588  
e99668a56430a2 Liam R. Howlett    2022-09-06  2589  	/*
e99668a56430a2 Liam R. Howlett    2022-09-06  2590  	 * Private writable mapping: check memory availability
e99668a56430a2 Liam R. Howlett    2022-09-06  2591  	 */
e99668a56430a2 Liam R. Howlett    2022-09-06  2592  	if (accountable_mapping(file, vm_flags)) {
e99668a56430a2 Liam R. Howlett    2022-09-06  2593  		charged = len >> PAGE_SHIFT;
e99668a56430a2 Liam R. Howlett    2022-09-06  2594  		if (security_vm_enough_memory_mm(mm, charged))
e99668a56430a2 Liam R. Howlett    2022-09-06  2595  			return -ENOMEM;
e99668a56430a2 Liam R. Howlett    2022-09-06  2596  		vm_flags |= VM_ACCOUNT;
e99668a56430a2 Liam R. Howlett    2022-09-06  2597  	}
e99668a56430a2 Liam R. Howlett    2022-09-06  2598  
183654ce26a5d5 Liam R. Howlett    2023-01-20  2599  	next = vma_next(&vmi);
183654ce26a5d5 Liam R. Howlett    2023-01-20  2600  	prev = vma_prev(&vmi);
e99668a56430a2 Liam R. Howlett    2022-09-06  2601  	if (vm_flags & VM_SPECIAL)
e99668a56430a2 Liam R. Howlett    2022-09-06  2602  		goto cannot_expand;
e99668a56430a2 Liam R. Howlett    2022-09-06  2603  
e99668a56430a2 Liam R. Howlett    2022-09-06  2604  	/* Attempt to expand an old mapping */
e99668a56430a2 Liam R. Howlett    2022-09-06  2605  	/* Check next */
e99668a56430a2 Liam R. Howlett    2022-09-06  2606  	if (next && next->vm_start == end && !vma_policy(next) &&
e99668a56430a2 Liam R. Howlett    2022-09-06  2607  	    can_vma_merge_before(next, vm_flags, NULL, file, pgoff+pglen,
e99668a56430a2 Liam R. Howlett    2022-09-06  2608  				 NULL_VM_UFFD_CTX, NULL)) {
e99668a56430a2 Liam R. Howlett    2022-09-06  2609  		merge_end = next->vm_end;
e99668a56430a2 Liam R. Howlett    2022-09-06  2610  		vma = next;
e99668a56430a2 Liam R. Howlett    2022-09-06  2611  		vm_pgoff = next->vm_pgoff - pglen;
e99668a56430a2 Liam R. Howlett    2022-09-06  2612  	}
e99668a56430a2 Liam R. Howlett    2022-09-06  2613  
e99668a56430a2 Liam R. Howlett    2022-09-06  2614  	/* Check prev */
e99668a56430a2 Liam R. Howlett    2022-09-06  2615  	if (prev && prev->vm_end == addr && !vma_policy(prev) &&
e99668a56430a2 Liam R. Howlett    2022-09-06  2616  	    (vma ? can_vma_merge_after(prev, vm_flags, vma->anon_vma, file,
e99668a56430a2 Liam R. Howlett    2022-09-06  2617  				       pgoff, vma->vm_userfaultfd_ctx, NULL) :
e99668a56430a2 Liam R. Howlett    2022-09-06  2618  		   can_vma_merge_after(prev, vm_flags, NULL, file, pgoff,
e99668a56430a2 Liam R. Howlett    2022-09-06  2619  				       NULL_VM_UFFD_CTX, NULL))) {
e99668a56430a2 Liam R. Howlett    2022-09-06  2620  		merge_start = prev->vm_start;
e99668a56430a2 Liam R. Howlett    2022-09-06  2621  		vma = prev;
e99668a56430a2 Liam R. Howlett    2022-09-06  2622  		vm_pgoff = prev->vm_pgoff;
e99668a56430a2 Liam R. Howlett    2022-09-06  2623  	}
e99668a56430a2 Liam R. Howlett    2022-09-06  2624  
e99668a56430a2 Liam R. Howlett    2022-09-06  2625  
e99668a56430a2 Liam R. Howlett    2022-09-06  2626  	/* Actually expand, if possible */
e99668a56430a2 Liam R. Howlett    2022-09-06  2627  	if (vma &&
3c441ab7d059eb Liam R. Howlett    2023-01-20  2628  	    !vma_expand(&vmi, vma, merge_start, merge_end, vm_pgoff, next)) {
e99668a56430a2 Liam R. Howlett    2022-09-06  2629  		khugepaged_enter_vma(vma, vm_flags);
e99668a56430a2 Liam R. Howlett    2022-09-06  2630  		goto expanded;
e99668a56430a2 Liam R. Howlett    2022-09-06  2631  	}
e99668a56430a2 Liam R. Howlett    2022-09-06  2632  
e99668a56430a2 Liam R. Howlett    2022-09-06  2633  cannot_expand:
e99668a56430a2 Liam R. Howlett    2022-09-06  2634  	/*
e99668a56430a2 Liam R. Howlett    2022-09-06  2635  	 * Determine the object being mapped and call the appropriate
e99668a56430a2 Liam R. Howlett    2022-09-06  2636  	 * specific mapper. the address has already been validated, but
e99668a56430a2 Liam R. Howlett    2022-09-06  2637  	 * not unmapped, but the maps are removed from the list.
e99668a56430a2 Liam R. Howlett    2022-09-06  2638  	 */
e99668a56430a2 Liam R. Howlett    2022-09-06  2639  	vma = vm_area_alloc(mm);
e99668a56430a2 Liam R. Howlett    2022-09-06  2640  	if (!vma) {
e99668a56430a2 Liam R. Howlett    2022-09-06  2641  		error = -ENOMEM;
e99668a56430a2 Liam R. Howlett    2022-09-06  2642  		goto unacct_error;
e99668a56430a2 Liam R. Howlett    2022-09-06  2643  	}
e99668a56430a2 Liam R. Howlett    2022-09-06  2644  
0fd5a9e2b09ff5 Liam R. Howlett    2023-01-20  2645  	vma_iter_set(&vmi, addr);
e99668a56430a2 Liam R. Howlett    2022-09-06  2646  	vma->vm_start = addr;
e99668a56430a2 Liam R. Howlett    2022-09-06  2647  	vma->vm_end = end;
1c71222e5f2393 Suren Baghdasaryan 2023-01-26  2648  	vm_flags_init(vma, vm_flags);
e99668a56430a2 Liam R. Howlett    2022-09-06  2649  	vma->vm_page_prot = vm_get_page_prot(vm_flags);
e99668a56430a2 Liam R. Howlett    2022-09-06  2650  	vma->vm_pgoff = pgoff;
e99668a56430a2 Liam R. Howlett    2022-09-06  2651  
e99668a56430a2 Liam R. Howlett    2022-09-06  2652  	if (file) {
e99668a56430a2 Liam R. Howlett    2022-09-06  2653  		if (vm_flags & VM_SHARED) {
e99668a56430a2 Liam R. Howlett    2022-09-06  2654  			error = mapping_map_writable(file->f_mapping);
e99668a56430a2 Liam R. Howlett    2022-09-06  2655  			if (error)
e99668a56430a2 Liam R. Howlett    2022-09-06  2656  				goto free_vma;
e99668a56430a2 Liam R. Howlett    2022-09-06  2657  		}
e99668a56430a2 Liam R. Howlett    2022-09-06  2658  
e99668a56430a2 Liam R. Howlett    2022-09-06  2659  		vma->vm_file = get_file(file);
e99668a56430a2 Liam R. Howlett    2022-09-06  2660  		error = call_mmap(file, vma);
e99668a56430a2 Liam R. Howlett    2022-09-06  2661  		if (error)
e99668a56430a2 Liam R. Howlett    2022-09-06  2662  			goto unmap_and_free_vma;
e99668a56430a2 Liam R. Howlett    2022-09-06  2663  
a57b70519d1f7c Liam Howlett       2022-10-18  2664  		/*
a57b70519d1f7c Liam Howlett       2022-10-18  2665  		 * Expansion is handled above, merging is handled below.
a57b70519d1f7c Liam Howlett       2022-10-18  2666  		 * Drivers should not alter the address of the VMA.
e99668a56430a2 Liam R. Howlett    2022-09-06  2667  		 */
a57b70519d1f7c Liam Howlett       2022-10-18  2668  		error = -EINVAL;
cc8d1b097de78b Liam R. Howlett    2023-01-20  2669  		if (WARN_ON((addr != vma->vm_start)))
a57b70519d1f7c Liam Howlett       2022-10-18  2670  			goto close_and_free_vma;
e99668a56430a2 Liam R. Howlett    2022-09-06  2671  
cc8d1b097de78b Liam R. Howlett    2023-01-20  2672  		vma_iter_set(&vmi, addr);
e99668a56430a2 Liam R. Howlett    2022-09-06  2673  		/*
e99668a56430a2 Liam R. Howlett    2022-09-06  2674  		 * If vm_flags changed after call_mmap(), we should try merge
e99668a56430a2 Liam R. Howlett    2022-09-06  2675  		 * vma again as we may succeed this time.
e99668a56430a2 Liam R. Howlett    2022-09-06  2676  		 */
e99668a56430a2 Liam R. Howlett    2022-09-06  2677  		if (unlikely(vm_flags != vma->vm_flags && prev)) {
9760ebffbf5507 Liam R. Howlett    2023-01-20  2678  			merge = vma_merge(&vmi, mm, prev, vma->vm_start,
9760ebffbf5507 Liam R. Howlett    2023-01-20  2679  				    vma->vm_end, vma->vm_flags, NULL,
9760ebffbf5507 Liam R. Howlett    2023-01-20  2680  				    vma->vm_file, vma->vm_pgoff, NULL,
9760ebffbf5507 Liam R. Howlett    2023-01-20  2681  				    NULL_VM_UFFD_CTX, NULL);
e99668a56430a2 Liam R. Howlett    2022-09-06  2682  			if (merge) {
e99668a56430a2 Liam R. Howlett    2022-09-06  2683  				/*
e99668a56430a2 Liam R. Howlett    2022-09-06  2684  				 * ->mmap() can change vma->vm_file and fput
e99668a56430a2 Liam R. Howlett    2022-09-06  2685  				 * the original file. So fput the vma->vm_file
e99668a56430a2 Liam R. Howlett    2022-09-06  2686  				 * here or we would add an extra fput for file
e99668a56430a2 Liam R. Howlett    2022-09-06  2687  				 * and cause general protection fault
e99668a56430a2 Liam R. Howlett    2022-09-06  2688  				 * ultimately.
e99668a56430a2 Liam R. Howlett    2022-09-06  2689  				 */
e99668a56430a2 Liam R. Howlett    2022-09-06  2690  				fput(vma->vm_file);
e99668a56430a2 Liam R. Howlett    2022-09-06  2691  				vm_area_free(vma);
e99668a56430a2 Liam R. Howlett    2022-09-06  2692  				vma = merge;
e99668a56430a2 Liam R. Howlett    2022-09-06  2693  				/* Update vm_flags to pick up the change. */
e99668a56430a2 Liam R. Howlett    2022-09-06  2694  				vm_flags = vma->vm_flags;
e99668a56430a2 Liam R. Howlett    2022-09-06  2695  				goto unmap_writable;
e99668a56430a2 Liam R. Howlett    2022-09-06  2696  			}
e99668a56430a2 Liam R. Howlett    2022-09-06  2697  		}
e99668a56430a2 Liam R. Howlett    2022-09-06  2698  
263f767f45a14d Khalid Aziz        2023-04-26 @2699  		if (vm_flags & VM_SHARED_PT)
263f767f45a14d Khalid Aziz        2023-04-26  2700  			vm_flags_set(vma, VM_SHARED_PT);
e99668a56430a2 Liam R. Howlett    2022-09-06  2701  		vm_flags = vma->vm_flags;
e99668a56430a2 Liam R. Howlett    2022-09-06  2702  	} else if (vm_flags & VM_SHARED) {
e99668a56430a2 Liam R. Howlett    2022-09-06  2703  		error = shmem_zero_setup(vma);
e99668a56430a2 Liam R. Howlett    2022-09-06  2704  		if (error)
e99668a56430a2 Liam R. Howlett    2022-09-06  2705  			goto free_vma;
e99668a56430a2 Liam R. Howlett    2022-09-06  2706  	} else {
e99668a56430a2 Liam R. Howlett    2022-09-06  2707  		vma_set_anonymous(vma);
e99668a56430a2 Liam R. Howlett    2022-09-06  2708  	}
e99668a56430a2 Liam R. Howlett    2022-09-06  2709  
b507808ebce235 Joey Gouly         2023-01-19  2710  	if (map_deny_write_exec(vma, vma->vm_flags)) {
b507808ebce235 Joey Gouly         2023-01-19  2711  		error = -EACCES;
b507808ebce235 Joey Gouly         2023-01-19  2712  		if (file)
b507808ebce235 Joey Gouly         2023-01-19  2713  			goto close_and_free_vma;
b507808ebce235 Joey Gouly         2023-01-19  2714  		else if (vma->vm_file)
b507808ebce235 Joey Gouly         2023-01-19  2715  			goto unmap_and_free_vma;
b507808ebce235 Joey Gouly         2023-01-19  2716  		else
b507808ebce235 Joey Gouly         2023-01-19  2717  			goto free_vma;
b507808ebce235 Joey Gouly         2023-01-19  2718  	}
b507808ebce235 Joey Gouly         2023-01-19  2719  
e99668a56430a2 Liam R. Howlett    2022-09-06  2720  	/* Allow architectures to sanity-check the vm_flags */
e99668a56430a2 Liam R. Howlett    2022-09-06  2721  	error = -EINVAL;
cc8d1b097de78b Liam R. Howlett    2023-01-20  2722  	if (!arch_validate_flags(vma->vm_flags))
deb0f6562884b5 Carlos Llamas      2022-09-30  2723  		goto close_and_free_vma;
e99668a56430a2 Liam R. Howlett    2022-09-06  2724  
e99668a56430a2 Liam R. Howlett    2022-09-06  2725  	error = -ENOMEM;
cc8d1b097de78b Liam R. Howlett    2023-01-20  2726  	if (vma_iter_prealloc(&vmi))
5789151e48acc3 Mike Kravetz       2022-10-17  2727  		goto close_and_free_vma;
e99668a56430a2 Liam R. Howlett    2022-09-06  2728  
e99668a56430a2 Liam R. Howlett    2022-09-06  2729  	if (vma->vm_file)
e99668a56430a2 Liam R. Howlett    2022-09-06  2730  		i_mmap_lock_write(vma->vm_file->f_mapping);
e99668a56430a2 Liam R. Howlett    2022-09-06  2731  
183654ce26a5d5 Liam R. Howlett    2023-01-20  2732  	vma_iter_store(&vmi, vma);
e99668a56430a2 Liam R. Howlett    2022-09-06  2733  	mm->map_count++;
e99668a56430a2 Liam R. Howlett    2022-09-06  2734  	if (vma->vm_file) {
e99668a56430a2 Liam R. Howlett    2022-09-06  2735  		if (vma->vm_flags & VM_SHARED)
e99668a56430a2 Liam R. Howlett    2022-09-06  2736  			mapping_allow_writable(vma->vm_file->f_mapping);
e99668a56430a2 Liam R. Howlett    2022-09-06  2737  
e99668a56430a2 Liam R. Howlett    2022-09-06  2738  		flush_dcache_mmap_lock(vma->vm_file->f_mapping);
e99668a56430a2 Liam R. Howlett    2022-09-06  2739  		vma_interval_tree_insert(vma, &vma->vm_file->f_mapping->i_mmap);
e99668a56430a2 Liam R. Howlett    2022-09-06  2740  		flush_dcache_mmap_unlock(vma->vm_file->f_mapping);
e99668a56430a2 Liam R. Howlett    2022-09-06  2741  		i_mmap_unlock_write(vma->vm_file->f_mapping);
e99668a56430a2 Liam R. Howlett    2022-09-06  2742  	}
e99668a56430a2 Liam R. Howlett    2022-09-06  2743  
e99668a56430a2 Liam R. Howlett    2022-09-06  2744  	/*
e99668a56430a2 Liam R. Howlett    2022-09-06  2745  	 * vma_merge() calls khugepaged_enter_vma() either, the below
e99668a56430a2 Liam R. Howlett    2022-09-06  2746  	 * call covers the non-merge case.
e99668a56430a2 Liam R. Howlett    2022-09-06  2747  	 */
e99668a56430a2 Liam R. Howlett    2022-09-06  2748  	khugepaged_enter_vma(vma, vma->vm_flags);
e99668a56430a2 Liam R. Howlett    2022-09-06  2749  
e99668a56430a2 Liam R. Howlett    2022-09-06  2750  	/* Once vma denies write, undo our temporary denial count */
e99668a56430a2 Liam R. Howlett    2022-09-06  2751  unmap_writable:
e99668a56430a2 Liam R. Howlett    2022-09-06  2752  	if (file && vm_flags & VM_SHARED)
e99668a56430a2 Liam R. Howlett    2022-09-06  2753  		mapping_unmap_writable(file->f_mapping);
e99668a56430a2 Liam R. Howlett    2022-09-06  2754  	file = vma->vm_file;
e99668a56430a2 Liam R. Howlett    2022-09-06  2755  expanded:
e99668a56430a2 Liam R. Howlett    2022-09-06  2756  	perf_event_mmap(vma);
e99668a56430a2 Liam R. Howlett    2022-09-06  2757  
e99668a56430a2 Liam R. Howlett    2022-09-06  2758  	vm_stat_account(mm, vm_flags, len >> PAGE_SHIFT);
e99668a56430a2 Liam R. Howlett    2022-09-06  2759  	if (vm_flags & VM_LOCKED) {
e99668a56430a2 Liam R. Howlett    2022-09-06  2760  		if ((vm_flags & VM_SPECIAL) || vma_is_dax(vma) ||
e99668a56430a2 Liam R. Howlett    2022-09-06  2761  					is_vm_hugetlb_page(vma) ||
e99668a56430a2 Liam R. Howlett    2022-09-06  2762  					vma == get_gate_vma(current->mm))
e430a95a04efc5 Suren Baghdasaryan 2023-01-26  2763  			vm_flags_clear(vma, VM_LOCKED_MASK);
e99668a56430a2 Liam R. Howlett    2022-09-06  2764  		else
e99668a56430a2 Liam R. Howlett    2022-09-06  2765  			mm->locked_vm += (len >> PAGE_SHIFT);
e99668a56430a2 Liam R. Howlett    2022-09-06  2766  	}
e99668a56430a2 Liam R. Howlett    2022-09-06  2767  
e99668a56430a2 Liam R. Howlett    2022-09-06  2768  	if (file)
e99668a56430a2 Liam R. Howlett    2022-09-06  2769  		uprobe_mmap(vma);
e99668a56430a2 Liam R. Howlett    2022-09-06  2770  
e99668a56430a2 Liam R. Howlett    2022-09-06  2771  	/*
e99668a56430a2 Liam R. Howlett    2022-09-06  2772  	 * New (or expanded) vma always get soft dirty status.
e99668a56430a2 Liam R. Howlett    2022-09-06  2773  	 * Otherwise user-space soft-dirty page tracker won't
e99668a56430a2 Liam R. Howlett    2022-09-06  2774  	 * be able to distinguish situation when vma area unmapped,
e99668a56430a2 Liam R. Howlett    2022-09-06  2775  	 * then new mapped in-place (which must be aimed as
e99668a56430a2 Liam R. Howlett    2022-09-06  2776  	 * a completely new data area).
e99668a56430a2 Liam R. Howlett    2022-09-06  2777  	 */
1c71222e5f2393 Suren Baghdasaryan 2023-01-26  2778  	vm_flags_set(vma, VM_SOFTDIRTY);
e99668a56430a2 Liam R. Howlett    2022-09-06  2779  
e99668a56430a2 Liam R. Howlett    2022-09-06  2780  	vma_set_page_prot(vma);
e99668a56430a2 Liam R. Howlett    2022-09-06  2781  
e99668a56430a2 Liam R. Howlett    2022-09-06  2782  	validate_mm(mm);
e99668a56430a2 Liam R. Howlett    2022-09-06  2783  	return addr;
e99668a56430a2 Liam R. Howlett    2022-09-06  2784  
deb0f6562884b5 Carlos Llamas      2022-09-30  2785  close_and_free_vma:
cc8d1b097de78b Liam R. Howlett    2023-01-20  2786  	if (file && vma->vm_ops && vma->vm_ops->close)
deb0f6562884b5 Carlos Llamas      2022-09-30  2787  		vma->vm_ops->close(vma);
cc8d1b097de78b Liam R. Howlett    2023-01-20  2788  
cc8d1b097de78b Liam R. Howlett    2023-01-20  2789  	if (file || vma->vm_file) {
e99668a56430a2 Liam R. Howlett    2022-09-06  2790  unmap_and_free_vma:
e99668a56430a2 Liam R. Howlett    2022-09-06  2791  		fput(vma->vm_file);
e99668a56430a2 Liam R. Howlett    2022-09-06  2792  		vma->vm_file = NULL;
e99668a56430a2 Liam R. Howlett    2022-09-06  2793  
e99668a56430a2 Liam R. Howlett    2022-09-06  2794  		/* Undo any partial mapping done by a device driver. */
cc8d1b097de78b Liam R. Howlett    2023-01-20  2795  		unmap_region(mm, &mm->mm_mt, vma, prev, next, vma->vm_start,
68f48381d7fdd1 Suren Baghdasaryan 2023-01-26  2796  			     vma->vm_end, true);
cc8d1b097de78b Liam R. Howlett    2023-01-20  2797  	}
cc674ab3c01880 Li Zetao           2022-10-28  2798  	if (file && (vm_flags & VM_SHARED))
e99668a56430a2 Liam R. Howlett    2022-09-06  2799  		mapping_unmap_writable(file->f_mapping);
e99668a56430a2 Liam R. Howlett    2022-09-06  2800  free_vma:
e99668a56430a2 Liam R. Howlett    2022-09-06  2801  	vm_area_free(vma);
e99668a56430a2 Liam R. Howlett    2022-09-06  2802  unacct_error:
e99668a56430a2 Liam R. Howlett    2022-09-06  2803  	if (charged)
e99668a56430a2 Liam R. Howlett    2022-09-06  2804  		vm_unacct_memory(charged);
e99668a56430a2 Liam R. Howlett    2022-09-06  2805  	validate_mm(mm);
e99668a56430a2 Liam R. Howlett    2022-09-06  2806  	return error;
e99668a56430a2 Liam R. Howlett    2022-09-06  2807  }
e99668a56430a2 Liam R. Howlett    2022-09-06  2808  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
