* Re: [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race
@ 2003-05-23 18:42 Paul E. McKenney
2003-05-29 15:14 ` Paul E. McKenney
0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2003-05-23 18:42 UTC
To: hugh; +Cc: phillips, akpm, hch, linux-mm, linux-kernel
On Fri, May 23, 2003 at 06:47:31PM +0100, Hugh Dickins wrote:
> On Fri, 23 May 2003, Daniel Phillips wrote:
> > On Friday 23 May 2003 18:21, Hugh Dickins wrote:
> > > Sorry, I miss the point of this patch entirely. At the moment it just
> > > looks like an unattractive rearrangement - the code churn akpm advised
> > > against - with no bearing on that vmtruncate race. Please correct me.
> >
> > This is all about supporting cross-host mmap (nice trick, huh?). Yes,
> > somebody should post a detailed rfc on that subject.
>
> Ah, thanks - translated into terms that I can understand, so that
> some ->nopage() not yet in the tree could do something after the
> install_new_page() returns. Hmm. Can we be sure it's appropriate
> for install_new_page to drop mm->page_table_lock before it returns?

Exactly -- it allows a ->nopage() to hold some lock until after
install_new_page() returns, avoiding races between pagefault and
either vmtruncate() or invalidate_mmap_range().
This race (from the cross-host mmap viewpoint) is described in:
http://marc.theaimsgroup.com/?l=linux-kernel&m=105286345316249&w=2

install_new_page() has to drop mm->page_table_lock for the same
reason that the previous do_no_page() did.  In addition, dropping
the lock permits a ->nopage() to invoke things like zap_page_range(),
which acquire mm->page_table_lock themselves.
Thanx, Paul
* Re: [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race
2003-05-23 18:42 [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race Paul E. McKenney
@ 2003-05-29 15:14 ` Paul E. McKenney
2003-05-29 15:18 ` [RFC][PATCH] Remove LINUX_2_2 Paul E. McKenney
2003-05-29 16:33 ` [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race Hugh Dickins
0 siblings, 2 replies; 12+ messages in thread
From: Paul E. McKenney @ 2003-05-29 15:14 UTC
To: hugh; +Cc: phillips, akpm, hch, linux-mm, linux-kernel

On Fri, May 23, 2003 at 11:42:02AM -0700, Paul E. McKenney wrote:
> On Fri, May 23, 2003 at 06:47:31PM +0100, Hugh Dickins wrote:
> > On Fri, 23 May 2003, Daniel Phillips wrote:
> > > On Friday 23 May 2003 18:21, Hugh Dickins wrote:
> > > > Sorry, I miss the point of this patch entirely. At the moment it just
> > > > looks like an unattractive rearrangement - the code churn akpm advised
> > > > against - with no bearing on that vmtruncate race. Please correct me.
> > >
> > > This is all about supporting cross-host mmap (nice trick, huh?). Yes,
> > > somebody should post a detailed rfc on that subject.
> >
> > Ah, thanks - translated into terms that I can understand, so that
> > some ->nopage() not yet in the tree could do something after the
> > install_new_page() returns. Hmm. Can we be sure it's appropriate
> > for install_new_page to drop mm->page_table_lock before it returns?
>
> Exactly -- allows a ->nopage() to drop some lock to avoid races
> between pagefault and either vmtruncate() or invalidate_mmap_range().
> This race (from the cross-host mmap viewpoint) is described in:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=105286345316249&w=2
>
> install_new_page() has to drop mm->page_table_lock() for the same
> reason that the previous do_no_page() did. In addition, dropping
> the lock permits a ->nopage() to invoke things like zap_page_range()
> which acquire mm->page_table_lock().

Rediffed for 2.5.70-mm1.  This version adds some lines of code because
it follows the "#ifndef LINUX_2_2" convention in the sound system.  The
patch in the following email removes these #ifdefs on the off-chance
that they are a holdover rather than the sound system's way of
maintaining a single code base across all versions of Linux or some such.

Thanx, Paul

diff -urN -x dontdiff linux-2.5.70-mm1/arch/ia64/ia32/binfmt_elf32.c linux-2.5.70-mm1.install_new_page/arch/ia64/ia32/binfmt_elf32.c
--- linux-2.5.70-mm1/arch/ia64/ia32/binfmt_elf32.c	2003-05-26 18:00:58.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/arch/ia64/ia32/binfmt_elf32.c	2003-05-28 20:17:42.000000000 -0700
@@ -56,13 +56,13 @@
 extern struct page *ia32_shared_page[];
 extern unsigned long *ia32_gdt;
 
-struct page *
-ia32_install_shared_page (struct vm_area_struct *vma, unsigned long address, int no_share)
+int
+ia32_install_shared_page (struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, int write_access, pmd_t *pmd)
 {
 	struct page *pg = ia32_shared_page[(address - vma->vm_start)/PAGE_SIZE];
 
 	get_page(pg);
-	return pg;
+	return install_new_page(mm, vma, address, write_access, pmd, pg);
 }
 
 static struct vm_operations_struct ia32_shared_page_vm_ops = {
diff -urN -x dontdiff linux-2.5.70-mm1/arch/sparc64/mm/hugetlbpage.c linux-2.5.70-mm1.install_new_page/arch/sparc64/mm/hugetlbpage.c
--- linux-2.5.70-mm1/arch/sparc64/mm/hugetlbpage.c	2003-05-26 18:00:42.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/arch/sparc64/mm/hugetlbpage.c	2003-05-28 20:17:42.000000000 -0700
@@ -633,11 +633,11 @@
 	return (int) htlbzone_pages;
 }
 
-static struct page *
-hugetlb_nopage(struct vm_area_struct *vma, unsigned long address, int unused)
+static int
+hugetlb_nopage(struct mm_struct * mm, struct vm_area_struct *vma, unsigned long address, int write_access, pmd_t * pmd)
 {
 	BUG();
-	return NULL;
+	return VM_FAULT_SIGBUS;
 }
 
 static struct vm_operations_struct hugetlb_vm_ops = {
diff -urN -x dontdiff linux-2.5.70-mm1/drivers/char/agp/alpha-agp.c linux-2.5.70-mm1.install_new_page/drivers/char/agp/alpha-agp.c
--- linux-2.5.70-mm1/drivers/char/agp/alpha-agp.c	2003-05-26 18:00:42.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/drivers/char/agp/alpha-agp.c	2003-05-28 20:37:38.000000000 -0700
@@ -11,9 +11,9 @@
 
 #include "agp.h"
 
-static struct page *alpha_core_agp_vm_nopage(struct vm_area_struct *vma,
-					     unsigned long address,
-					     int write_access)
+static int alpha_core_agp_vm_nopage(struct mm_struct *mm, struct vm_area_struct *vma,
+				    unsigned long address,
+				    int write_access, pmd_t *pmd)
 {
 	alpha_agp_info *agp = agp_bridge->dev_private_data;
 	dma_addr_t dma_addr;
@@ -23,14 +23,14 @@
 	dma_addr = address - vma->vm_start + agp->aperture.bus_base;
 	pa = agp->ops->translate(agp, dma_addr);
 
-	if (pa == (unsigned long)-EINVAL) return NULL;	/* no translation */
+	if (pa == (unsigned long)-EINVAL) return VM_FAULT_SIGBUS; /* no translation */
 
 	/*
 	 * Get the page, inc the use count, and return it
 	 */
 	page = virt_to_page(__va(pa));
 	get_page(page);
-	return page;
+	return install_new_page(mm, vma, address, write_access, pmd, page);
 }
 
 static struct aper_size_info_fixed alpha_core_agp_sizes[] =
diff -urN -x dontdiff linux-2.5.70-mm1/drivers/char/drm/drmP.h linux-2.5.70-mm1.install_new_page/drivers/char/drm/drmP.h
--- linux-2.5.70-mm1/drivers/char/drm/drmP.h	2003-05-26 18:00:45.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/drivers/char/drm/drmP.h	2003-05-28 20:55:40.000000000 -0700
@@ -620,18 +620,17 @@
 extern int DRM(fasync)(int fd, struct file *filp, int on);
 
 /* Mapping support (drm_vm.h) */
-extern struct page *DRM(vm_nopage)(struct vm_area_struct *vma,
-		unsigned long address,
-		int write_access);
-extern struct page *DRM(vm_shm_nopage)(struct vm_area_struct *vma,
-		unsigned long address,
-		int write_access);
-extern struct page *DRM(vm_dma_nopage)(struct vm_area_struct *vma,
-		unsigned long address,
-		int write_access);
-extern struct page *DRM(vm_sg_nopage)(struct vm_area_struct *vma,
-		unsigned long address,
-		int write_access);
+extern int DRM(vm_nopage)(struct mm_struct *mm, struct vm_area_struct *vma,
+		unsigned long address, int write_access, pmd_t *pmd);
+extern int DRM(vm_shm_nopage)(struct mm_struct *mm, struct vm_area_struct *vma,
+		unsigned long address,
+		int write_access, pmd_t *pmd);
+extern int DRM(vm_dma_nopage)(struct mm_struct *mm, struct vm_area_struct *vma,
+		unsigned long address,
+		int write_access, pmd_t *pmd);
+extern int DRM(vm_sg_nopage)(struct mm_struct *mm, struct vm_area_struct *vma,
+		unsigned long address,
+		int write_access, pmd_t *pmd);
 extern void DRM(vm_open)(struct vm_area_struct *vma);
 extern void DRM(vm_close)(struct vm_area_struct *vma);
 extern void DRM(vm_shm_close)(struct vm_area_struct *vma);
diff -urN -x dontdiff linux-2.5.70-mm1/drivers/char/drm/drm_vm.h linux-2.5.70-mm1.install_new_page/drivers/char/drm/drm_vm.h
--- linux-2.5.70-mm1/drivers/char/drm/drm_vm.h	2003-05-26 18:01:02.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/drivers/char/drm/drm_vm.h	2003-05-28 20:57:19.000000000 -0700
@@ -55,9 +55,9 @@
 	.close = DRM(vm_close),
 };
 
-struct page *DRM(vm_nopage)(struct vm_area_struct *vma,
-		unsigned long address,
-		int write_access)
+int DRM(vm_nopage)(struct mm_struct *mm, struct vm_area_struct *vma,
+		unsigned long address,
+		int write_access, pmd_t *pmd)
 {
 #if __REALLY_HAVE_AGP
 	drm_file_t *priv = vma->vm_file->private_data;
@@ -114,35 +114,35 @@
 			baddr, __va(agpmem->memory->memory[offset]), offset,
 			atomic_read(&page->count));
 
-		return page;
+		return install_new_page(mm, vma, address, write_access, pmd, page);
 	}
 vm_nopage_error:
 #endif /* __REALLY_HAVE_AGP */
 
-	return NOPAGE_SIGBUS; /* Disallow mremap */
+	return VM_FAULT_SIGBUS; /* Disallow mremap */
 }
 
-struct page *DRM(vm_shm_nopage)(struct vm_area_struct *vma,
-		unsigned long address,
-		int write_access)
+int DRM(vm_shm_nopage)(struct mm_struct *mm, struct vm_area_struct *vma,
+		unsigned long address,
+		int write_access, pmd_t *pmd)
 {
 	drm_map_t *map = (drm_map_t *)vma->vm_private_data;
 	unsigned long offset;
 	unsigned long i;
 	struct page *page;
 
-	if (address > vma->vm_end) return NOPAGE_SIGBUS; /* Disallow mremap */
-	if (!map) return NOPAGE_OOM; /* Nothing allocated */
+	if (address > vma->vm_end) return VM_FAULT_SIGBUS; /* Disallow mremap */
+	if (!map) return VM_FAULT_OOM; /* Nothing allocated */
 
 	offset = address - vma->vm_start;
 	i = (unsigned long)map->handle + offset;
 	page = vmalloc_to_page((void *)i);
 	if (!page)
-		return NOPAGE_OOM;
+		return VM_FAULT_OOM;
 	get_page(page);
 
 	DRM_DEBUG("shm_nopage 0x%lx\n", address);
-	return page;
+	return install_new_page(mm, vma, address, write_access, pmd, page);
 }
 
 /* Special close routine which deletes map information if we are the last
@@ -221,9 +221,9 @@
 	up(&dev->struct_sem);
 }
 
-struct page *DRM(vm_dma_nopage)(struct vm_area_struct *vma,
-		unsigned long address,
-		int write_access)
+int DRM(vm_dma_nopage)(struct mm_struct *mm, struct vm_area_struct *vma,
+		unsigned long address,
+		int write_access, pmd_t *pmd)
 {
 	drm_file_t *priv = vma->vm_file->private_data;
 	drm_device_t *dev = priv->dev;
@@ -232,9 +232,9 @@
 	unsigned long page_nr;
 	struct page *page;
 
-	if (!dma) return NOPAGE_SIGBUS; /* Error */
-	if (address > vma->vm_end) return NOPAGE_SIGBUS; /* Disallow mremap */
-	if (!dma->pagelist) return NOPAGE_OOM ; /* Nothing allocated */
+	if (!dma) return VM_FAULT_SIGBUS; /* Error */
+	if (address > vma->vm_end) return VM_FAULT_SIGBUS; /* Disallow mremap */
+	if (!dma->pagelist) return VM_FAULT_OOM ; /* Nothing allocated */
 
 	offset = address - vma->vm_start; /* vm_[pg]off[set] should be 0 */
 	page_nr = offset >> PAGE_SHIFT;
@@ -244,12 +244,12 @@
 	get_page(page);
 
 	DRM_DEBUG("dma_nopage 0x%lx (page %lu)\n", address, page_nr);
-	return page;
+	return install_new_page(mm, vma, address, write_access, pmd, page);
 }
 
-struct page *DRM(vm_sg_nopage)(struct vm_area_struct *vma,
-		unsigned long address,
-		int write_access)
+int DRM(vm_sg_nopage)(struct mm_struct *mm, struct vm_area_struct *vma,
+		unsigned long address,
+		int write_access, pmd_t *pmd)
 {
 	drm_map_t *map = (drm_map_t *)vma->vm_private_data;
 	drm_file_t *priv = vma->vm_file->private_data;
@@ -260,9 +260,9 @@
 	unsigned long page_offset;
 	struct page *page;
 
-	if (!entry) return NOPAGE_SIGBUS; /* Error */
-	if (address > vma->vm_end) return NOPAGE_SIGBUS; /* Disallow mremap */
-	if (!entry->pagelist) return NOPAGE_OOM ; /* Nothing allocated */
+	if (!entry) return VM_FAULT_SIGBUS; /* Error */
+	if (address > vma->vm_end) return VM_FAULT_SIGBUS; /* Disallow mremap */
+	if (!entry->pagelist) return VM_FAULT_OOM ; /* Nothing allocated */
 
 	offset = address - vma->vm_start;
 
@@ -271,7 +271,7 @@
 	page = entry->pagelist[page_offset];
 	get_page(page);
 
-	return page;
+	return install_new_page(mm, vma, address, write_access, pmd, page);
 }
 
 void DRM(vm_open)(struct vm_area_struct *vma)
diff -urN -x dontdiff linux-2.5.70-mm1/drivers/ieee1394/dma.c linux-2.5.70-mm1.install_new_page/drivers/ieee1394/dma.c
--- linux-2.5.70-mm1/drivers/ieee1394/dma.c	2003-05-26 18:00:40.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/drivers/ieee1394/dma.c	2003-05-28 20:39:31.000000000 -0700
@@ -184,28 +184,27 @@
 
 /* nopage() handler for mmap access */
 
-static struct page*
-dma_region_pagefault(struct vm_area_struct *area, unsigned long address, int write_access)
+static int
+dma_region_pagefault(struct mm_struct *mm, struct vm_area_struct *area, unsigned long address, int write_access, pmd_t *pmd)
 {
 	unsigned long offset;
 	unsigned long kernel_virt_addr;
-	struct page *ret = NOPAGE_SIGBUS;
+	struct page *page;
 
 	struct dma_region *dma = (struct dma_region*) area->vm_private_data;
 
 	if(!dma->kvirt)
-		goto out;
+		return VM_FAULT_SIGBUS;
 
 	if( (address < (unsigned long) area->vm_start) ||
 	    (address > (unsigned long) area->vm_start + (PAGE_SIZE * dma->n_pages)) )
-		goto out;
+		return VM_FAULT_SIGBUS;
 
 	offset = address - area->vm_start;
 	kernel_virt_addr = (unsigned long) dma->kvirt + offset;
-	ret = vmalloc_to_page((void*) kernel_virt_addr);
-	get_page(ret);
-out:
-	return ret;
+	page = vmalloc_to_page((void*) kernel_virt_addr);
+	get_page(page);
+	return install_new_page(mm, area, address, write_access, pmd, page);
 }
 
 static struct vm_operations_struct dma_region_vm_ops = {
diff -urN -x dontdiff linux-2.5.70-mm1/drivers/media/video/video-buf.c linux-2.5.70-mm1.install_new_page/drivers/media/video/video-buf.c
--- linux-2.5.70-mm1/drivers/media/video/video-buf.c	2003-05-26 18:00:40.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/drivers/media/video/video-buf.c	2003-05-28 20:17:42.000000000 -0700
@@ -979,21 +979,21 @@
  * now ...). Bounce buffers don't work very well for the data rates
  * video capture has.
  */
-static struct page*
-videobuf_vm_nopage(struct vm_area_struct *vma, unsigned long vaddr,
-		int write_access)
+static int
+videobuf_vm_nopage(struct mm_struct *mm, struct vm_area_struct *vma,
+		unsigned long vaddr, int write_access, pmd_t *pmd)
 {
 	struct page *page;
 
 	dprintk(3,"nopage: fault @ %08lx [vma %08lx-%08lx]\n",
 		vaddr,vma->vm_start,vma->vm_end);
 	if (vaddr > vma->vm_end)
-		return NOPAGE_SIGBUS;
+		return VM_FAULT_SIGBUS;
 	page = alloc_page(GFP_USER);
 	if (!page)
-		return NOPAGE_OOM;
+		return VM_FAULT_OOM;
 	clear_user_page(page_address(page), vaddr, page);
-	return page;
+	return install_new_page(mm, vma, vaddr, write_access, pmd, page);
 }
 
 static struct vm_operations_struct videobuf_vm_ops =
diff -urN -x dontdiff linux-2.5.70-mm1/drivers/scsi/sg.c linux-2.5.70-mm1.install_new_page/drivers/scsi/sg.c
--- linux-2.5.70-mm1/drivers/scsi/sg.c	2003-05-28 20:16:04.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/drivers/scsi/sg.c	2003-05-28 20:39:59.000000000 -0700
@@ -1121,21 +1121,21 @@
 	}
 }
 
-static struct page *
-sg_vma_nopage(struct vm_area_struct *vma, unsigned long addr, int unused)
+static int
+sg_vma_nopage(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, int write_access, pmd_t *pmd)
 {
 	Sg_fd *sfp;
-	struct page *page = NOPAGE_SIGBUS;
+	struct page *page = NULL;
 	void *page_ptr = NULL;
 	unsigned long offset;
 	Sg_scatter_hold *rsv_schp;
 
 	if ((NULL == vma) || (!(sfp = (Sg_fd *) vma->vm_private_data)))
-		return page;
+		return VM_FAULT_SIGBUS;
 	rsv_schp = &sfp->reserve;
 	offset = addr - vma->vm_start;
 	if (offset >= rsv_schp->bufflen)
-		return page;
+		return VM_FAULT_SIGBUS;
 	SCSI_LOG_TIMEOUT(3, printk("sg_vma_nopage: offset=%lu, scatg=%d\n",
 				offset, rsv_schp->k_use_sg));
 	if (rsv_schp->k_use_sg) { /* reserve buffer is a scatter gather list */
@@ -1162,7 +1162,9 @@
 		page = virt_to_page(page_ptr);
 		get_page(page); /* increment page count */
 	}
-	return page;
+	if (page == NULL)
+		return VM_FAULT_SIGBUS; /* offset not found in scatter gather list */
+	return install_new_page(mm, vma, addr, write_access, pmd, page);
 }
 
 static struct vm_operations_struct sg_mmap_vm_ops = {
diff -urN -x dontdiff linux-2.5.70-mm1/drivers/sgi/char/graphics.c linux-2.5.70-mm1.install_new_page/drivers/sgi/char/graphics.c
--- linux-2.5.70-mm1/drivers/sgi/char/graphics.c	2003-05-26 18:00:40.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/drivers/sgi/char/graphics.c	2003-05-28 20:17:42.000000000 -0700
@@ -211,9 +211,9 @@
 /*
  * This is the core of the direct rendering engine.
  */
-struct page *
-sgi_graphics_nopage (struct vm_area_struct *vma, unsigned long address, int
-		no_share)
+int
+sgi_graphics_nopage (struct mm_struct *mm, struct vm_area_struct *vma,
+		unsigned long address, int write_access, pmd_t *pmdpf)
 {
 	pgd_t *pgd; pmd_t *pmd; pte_t *pte;
 	int board = GRAPHICS_CARD (vma->vm_dentry->d_inode->i_rdev);
@@ -249,7 +249,7 @@
 	pte = pte_kmap_offset(pmd, address);
 	page = pte_page(*pte);
 	pte_kunmap(pte);
-	return page;
+	return install_new_page(mm, vma, address, write_access, pmdpf, page);
 }
 
 /*
diff -urN -x dontdiff linux-2.5.70-mm1/fs/ncpfs/mmap.c linux-2.5.70-mm1.install_new_page/fs/ncpfs/mmap.c
--- linux-2.5.70-mm1/fs/ncpfs/mmap.c	2003-05-26 18:00:43.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/fs/ncpfs/mmap.c	2003-05-28 20:17:42.000000000 -0700
@@ -25,8 +25,8 @@
 /*
  * Fill in the supplied page for mmap
  */
-static struct page* ncp_file_mmap_nopage(struct vm_area_struct *area,
-		unsigned long address, int write_access)
+static int ncp_file_mmap_nopage(struct mm_struct *mm, struct vm_area_struct *area,
+		unsigned long address, int write_access, pmd_t *pmd)
 {
 	struct file *file = area->vm_file;
 	struct dentry *dentry = file->f_dentry;
@@ -85,7 +85,7 @@
 	memset(pg_addr + already_read, 0, PAGE_SIZE - already_read);
 	flush_dcache_page(page);
 	kunmap(page);
-	return page;
+	return install_new_page(mm, area, address, write_access, pmd, page);
 }
 
 static struct vm_operations_struct ncp_file_mmap =
diff -urN -x dontdiff linux-2.5.70-mm1/include/linux/mm.h linux-2.5.70-mm1.install_new_page/include/linux/mm.h
--- linux-2.5.70-mm1/include/linux/mm.h	2003-05-28 20:16:04.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/include/linux/mm.h	2003-05-28 20:17:42.000000000 -0700
@@ -142,7 +142,7 @@
 struct vm_operations_struct {
 	void (*open)(struct vm_area_struct * area);
 	void (*close)(struct vm_area_struct * area);
-	struct page * (*nopage)(struct vm_area_struct * area, unsigned long address, int unused);
+	int (*nopage)(struct mm_struct * mm, struct vm_area_struct * area, unsigned long address, int write_access, pmd_t *pmd);
 	int (*populate)(struct vm_area_struct * area, unsigned long address, unsigned long len, pgprot_t prot, unsigned long pgoff, int nonblock);
 };
 
@@ -380,12 +380,6 @@
 }
 
 /*
- * Error return values for the *_nopage functions
- */
-#define NOPAGE_SIGBUS	(NULL)
-#define NOPAGE_OOM	((struct page *) (-1))
-
-/*
  * Different kinds of faults, as returned by handle_mm_fault().
  * Used to decide whether a process gets delivered SIGBUS or
  * just gets major/minor fault counters bumped up.
@@ -402,8 +396,8 @@
 
 extern void show_free_areas(void);
 
-struct page *shmem_nopage(struct vm_area_struct * vma,
-		unsigned long address, int unused);
+int shmem_nopage(struct mm_struct * mm, struct vm_area_struct * vma,
+		unsigned long address, int write_access, pmd_t * pmd);
 struct file *shmem_file_setup(char * name, loff_t size, unsigned long flags);
 void shmem_lock(struct file * file, int lock);
 int shmem_zero_setup(struct vm_area_struct *);
@@ -421,6 +415,7 @@
 
 int zeromap_page_range(struct vm_area_struct *vma, unsigned long from,
 		unsigned long size, pgprot_t prot);
+extern int install_new_page(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, int write_access, pmd_t *pmd, struct page * new_page);
 
 extern void invalidate_mmap_range(struct address_space *mapping,
 		loff_t const holebegin, loff_t const holelen);
@@ -559,7 +554,7 @@
 extern void truncate_inode_pages(struct address_space *, loff_t);
 
 /* generic vm_area_ops exported for stackable file systems */
-extern struct page *filemap_nopage(struct vm_area_struct *, unsigned long, int);
+int filemap_nopage(struct mm_struct *, struct vm_area_struct *, unsigned long, int, pmd_t *);
 
 /* mm/page-writeback.c */
 int write_one_page(struct page *page, int wait);
diff -urN -x dontdiff linux-2.5.70-mm1/kernel/ksyms.c linux-2.5.70-mm1.install_new_page/kernel/ksyms.c
--- linux-2.5.70-mm1/kernel/ksyms.c	2003-05-28 20:16:04.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/kernel/ksyms.c	2003-05-28 20:17:42.000000000 -0700
@@ -116,6 +116,7 @@
 EXPORT_SYMBOL(max_mapnr);
 #endif
 EXPORT_SYMBOL(high_memory);
+EXPORT_SYMBOL(install_new_page);
 EXPORT_SYMBOL(invalidate_mmap_range);
 EXPORT_SYMBOL(vmtruncate);
 EXPORT_SYMBOL(find_vma);
diff -urN -x dontdiff linux-2.5.70-mm1/mm/filemap.c linux-2.5.70-mm1.install_new_page/mm/filemap.c
--- linux-2.5.70-mm1/mm/filemap.c	2003-05-28 20:16:04.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/mm/filemap.c	2003-05-28 20:17:42.000000000 -0700
@@ -1013,7 +1013,7 @@
  * it in the page cache, and handles the special cases reasonably without
  * having a lot of duplicated code.
  */
-struct page * filemap_nopage(struct vm_area_struct * area, unsigned long address, int unused)
+int filemap_nopage(struct mm_struct * mm, struct vm_area_struct * area, unsigned long address, int write_access, pmd_t * pmd)
 {
 	int error;
 	struct file *file = area->vm_file;
@@ -1034,7 +1034,7 @@
 	 */
 	size = (inode->i_size + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 	if ((pgoff >= size) && (area->vm_mm == current->mm))
-		return NULL;
+		return VM_FAULT_SIGBUS;
 
 	/*
 	 * The "size" of the file, as far as mmap is concerned, isn't bigger
@@ -1088,7 +1088,7 @@
 	 * Found the page and have a reference on it.
 	 */
 	mark_page_accessed(page);
-	return page;
+	return install_new_page(mm, area, address, write_access, pmd, page);
 
 no_cached_page:
 	/*
@@ -1111,8 +1111,8 @@
 	 * to schedule I/O.
 	 */
 	if (error == -ENOMEM)
-		return NOPAGE_OOM;
-	return NULL;
+		return VM_FAULT_OOM;
+	return VM_FAULT_SIGBUS;
 
 page_not_uptodate:
 	inc_page_state(pgmajfault);
@@ -1169,7 +1169,7 @@
 	 * mm layer so, possibly freeing the page cache page first.
 	 */
 	page_cache_release(page);
-	return NULL;
+	return VM_FAULT_SIGBUS;
 }
 
 static struct page * filemap_getpage(struct file *file, unsigned long pgoff,
diff -urN -x dontdiff linux-2.5.70-mm1/mm/memory.c linux-2.5.70-mm1.install_new_page/mm/memory.c
--- linux-2.5.70-mm1/mm/memory.c	2003-05-28 20:16:04.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/mm/memory.c	2003-05-28 20:43:16.000000000 -0700
@@ -1374,39 +1374,33 @@
 }
 
 /*
- * do_no_page() tries to create a new page mapping. It aggressively
- * tries to share with existing pages, but makes a separate copy if
- * the "write_access" parameter is true in order to avoid the next
- * page fault.
- *
- * As this is called only for pages that do not currently exist, we
- * do not need to flush old virtual caches or the TLB.
- *
- * This is called with the MM semaphore held and the page table
- * spinlock held. Exit with the spinlock released.
+ * do_no_page() invokes do_anonymous_page() or ->nopage, as appropriate.
+ * Called w/ MM sema and page_table_lock held, the latter released before exit.
  */
-static int
+static inline int
 do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	unsigned long address, int write_access, pte_t *page_table, pmd_t *pmd)
 {
-	struct page * new_page;
-	pte_t entry;
-	struct pte_chain *pte_chain;
-	int ret;
-
 	if (!vma->vm_ops || !vma->vm_ops->nopage)
-		return do_anonymous_page(mm, vma, page_table,
-					pmd, write_access, address);
+		return do_anonymous_page(mm, vma, page_table, pmd, write_access, address);
 	pte_unmap(page_table);
 	spin_unlock(&mm->page_table_lock);
+	return vma->vm_ops->nopage(mm, vma, address & PAGE_MASK, write_access, pmd);
+}
 
-	new_page = vma->vm_ops->nopage(vma, address & PAGE_MASK, 0);
-
-	/* no page was available -- either SIGBUS or OOM */
-	if (new_page == NOPAGE_SIGBUS)
-		return VM_FAULT_SIGBUS;
-	if (new_page == NOPAGE_OOM)
-		return VM_FAULT_OOM;
+/*
+ * install_new_page - tries to create a new page mapping.
+ * install_new_page() tries to share w/existing pages, but makes separate
+ * copy if "write_access" is true in order to avoid the next page fault.
+ * As this is called only for pages that do not currently exist, we
+ * do not need to flush old virtual caches or the TLB.
+ */
+int
+install_new_page(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, int write_access, pmd_t *pmd, struct page * new_page)
+{
+	pte_t entry, *page_table;
+	struct pte_chain *pte_chain;
+	int ret;
 
 	pte_chain = pte_chain_alloc(GFP_KERNEL);
 	if (!pte_chain)
diff -urN -x dontdiff linux-2.5.70-mm1/mm/shmem.c linux-2.5.70-mm1.install_new_page/mm/shmem.c
--- linux-2.5.70-mm1/mm/shmem.c	2003-05-26 18:00:39.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/mm/shmem.c	2003-05-28 20:17:42.000000000 -0700
@@ -936,7 +936,7 @@
 	return error;
 }
 
-struct page *shmem_nopage(struct vm_area_struct *vma, unsigned long address, int unused)
+int shmem_nopage(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, int write_access, pmd_t *pmd)
 {
 	struct inode *inode = vma->vm_file->f_dentry->d_inode;
 	struct page *page = NULL;
@@ -949,10 +949,10 @@
 
 	error = shmem_getpage(inode, idx, &page, SGP_CACHE);
 	if (error)
-		return (error == -ENOMEM)? NOPAGE_OOM: NOPAGE_SIGBUS;
+		return (error == -ENOMEM)? VM_FAULT_OOM: VM_FAULT_SIGBUS;
 
 	mark_page_accessed(page);
-	return page;
+	return install_new_page(mm, vma, address, write_access, pmd, page);
 }
 
 static int shmem_populate(struct vm_area_struct *vma,
diff -urN -x dontdiff linux-2.5.70-mm1/sound/core/pcm_native.c linux-2.5.70-mm1.install_new_page/sound/core/pcm_native.c
--- linux-2.5.70-mm1/sound/core/pcm_native.c	2003-05-26 18:00:37.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/sound/core/pcm_native.c	2003-05-28 21:39:45.000000000 -0700
@@ -60,6 +60,11 @@
 static int snd_pcm_hw_refine_old_user(snd_pcm_substream_t * substream, struct sndrv_pcm_hw_params_old * _oparams);
 static int snd_pcm_hw_params_old_user(snd_pcm_substream_t * substream, struct sndrv_pcm_hw_params_old * _oparams);
 
+#ifndef LINUX_2_2
+#define NOPAGE_OOM VM_FAULT_OOM
+#define NOPAGE_SIGBUS VM_FAULT_SIGBUS
+#endif
+
 /*
  *
  */
@@ -2693,7 +2698,7 @@
 #endif
 
 #ifndef LINUX_2_2
-static struct page * snd_pcm_mmap_status_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
+static int snd_pcm_mmap_status_nopage(struct mm_struct *mm, struct vm_area_struct *area, unsigned long address, int write_access, pmd_t *pmd)
 #else
 static unsigned long snd_pcm_mmap_status_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
 #endif
@@ -2708,7 +2713,7 @@
 	page = virt_to_page(runtime->status);
 	get_page(page);
 #ifndef LINUX_2_2
-	return page;
+	return install_new_page(mm, area, address, write_access, pmd, page);
 #else
 	return page_address(page);
 #endif
@@ -2747,7 +2752,7 @@
 }
 
 #ifndef LINUX_2_2
-static struct page * snd_pcm_mmap_control_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
+static int snd_pcm_mmap_control_nopage(struct mm_struct *mm, struct vm_area_struct *area, unsigned long address, int write_access, pmd_t *pmd)
 #else
 static unsigned long snd_pcm_mmap_control_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
 #endif
@@ -2762,7 +2767,7 @@
 	page = virt_to_page(runtime->control);
 	get_page(page);
 #ifndef LINUX_2_2
-	return page;
+	return install_new_page(mm, area, address, write_access, pmd, page);
 #else
 	return page_address(page);
 #endif
@@ -2813,7 +2818,7 @@
 }
 
 #ifndef LINUX_2_2
-static struct page * snd_pcm_mmap_data_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
+static int snd_pcm_mmap_data_nopage(struct mm_struct *mm, struct vm_area_struct *area, unsigned long address, int write_access, pmd_t *pmd)
 #else
 static unsigned long snd_pcm_mmap_data_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
 #endif
@@ -2848,7 +2853,7 @@
 	}
 	get_page(page);
 #ifndef LINUX_2_2
-	return page;
+	return install_new_page(mm, area, address, write_access, pmd, page);
 #else
 	return page_address(page);
 #endif
diff -urN -x dontdiff linux-2.5.70-mm1/sound/oss/emu10k1/audio.c linux-2.5.70-mm1.install_new_page/sound/oss/emu10k1/audio.c
--- linux-2.5.70-mm1/sound/oss/emu10k1/audio.c	2003-05-26 18:00:23.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/sound/oss/emu10k1/audio.c	2003-05-28 20:17:42.000000000 -0700
@@ -970,7 +970,7 @@
 	return 0;
 }
 
-static struct page *emu10k1_mm_nopage (struct vm_area_struct * vma, unsigned long address, int write_access)
+static int emu10k1_mm_nopage (struct mm_struct * mm, struct vm_area_struct * vma, unsigned long address, int write_access, pmd_t * pmd)
 {
 	struct emu10k1_wavedevice *wave_dev = vma->vm_private_data;
 	struct woinst *woinst = wave_dev->woinst;
@@ -983,8 +983,8 @@
 	DPD(3, "addr: %#lx\n", address);
 
 	if (address > vma->vm_end) {
-		DPF(1, "EXIT, returning NOPAGE_SIGBUS\n");
-		return NOPAGE_SIGBUS; /* Disallow mremap */
+		DPF(1, "EXIT, returning VM_FAULT_SIGBUS\n");
+		return VM_FAULT_SIGBUS; /* Disallow mremap */
 	}
 
 	pgoff = vma->vm_pgoff + ((address - vma->vm_start) >> PAGE_SHIFT);
@@ -1013,7 +1013,7 @@
 	get_page (dmapage);
 
 	DPD(3, "page: %#lx\n", (unsigned long) dmapage);
-	return dmapage;
+	return install_new_page(mm, vma, address, write_access, pmd, dmapage);
 }
 
 struct vm_operations_struct emu10k1_mm_ops = {
diff -urN -x dontdiff linux-2.5.70-mm1/sound/oss/via82cxxx_audio.c linux-2.5.70-mm1.install_new_page/sound/oss/via82cxxx_audio.c
--- linux-2.5.70-mm1/sound/oss/via82cxxx_audio.c	2003-05-26 18:00:27.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/sound/oss/via82cxxx_audio.c	2003-05-28 20:17:44.000000000 -0700
@@ -1846,8 +1846,8 @@
 }
 
 
-static struct page * via_mm_nopage (struct vm_area_struct * vma,
-		unsigned long address, int write_access)
+static int via_mm_nopage (struct mm_struct *mm, struct vm_area_struct * vma,
+		unsigned long address, int write_access, pmd_t *pmd)
 {
 	struct via_info *card = vma->vm_private_data;
 	struct via_channel *chan = &card->ch_out;
@@ -1863,12 +1863,12 @@
 		write_access);
 
 	if (address > vma->vm_end) {
-		DPRINTK ("EXIT, returning NOPAGE_SIGBUS\n");
-		return NOPAGE_SIGBUS; /* Disallow mremap */
+		DPRINTK ("EXIT, returning VM_FAULT_SIGBUS\n");
+		return VM_FAULT_SIGBUS; /* Disallow mremap */
 	}
 	if (!card) {
-		DPRINTK ("EXIT, returning NOPAGE_OOM\n");
-		return NOPAGE_OOM; /* Nothing allocated */
+		DPRINTK ("EXIT, returning VM_FAULT_OOM\n");
+		return VM_FAULT_OOM; /* Nothing allocated */
 	}
 
 	pgoff = vma->vm_pgoff + ((address - vma->vm_start) >> PAGE_SHIFT);
@@ -1895,10 +1895,10 @@
 	assert ((((unsigned long)chan->pgtbl[pgoff].cpuaddr) % PAGE_SIZE) == 0);
 
 	dmapage = virt_to_page (chan->pgtbl[pgoff].cpuaddr);
-	DPRINTK ("EXIT, returning page %p for cpuaddr %lXh\n",
+	DPRINTK ("EXIT, installing page %p for cpuaddr %lXh\n",
 		dmapage, (unsigned long) chan->pgtbl[pgoff].cpuaddr);
 	get_page (dmapage);
-	return dmapage;
+	return install_new_page(mm, vma, address, write_access, pmd, dmapage);
 }
* [RFC][PATCH] Remove LINUX_2_2
2003-05-29 15:14 ` Paul E. McKenney
@ 2003-05-29 15:18 ` Paul E. McKenney
2003-05-29 16:33 ` [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race Hugh Dickins
1 sibling, 0 replies; 12+ messages in thread
From: Paul E. McKenney @ 2003-05-29 15:18 UTC
To: hugh; +Cc: phillips, akpm, hch, linux-mm, linux-kernel

On Thu, May 29, 2003 at 08:14:24AM -0700, Paul E. McKenney wrote:
> Rediffed for 2.5.70-mm1.  Some added lines of code due to following
> the "#ifndef LINUX_2_2" in the sound system.  The patch in the following
> email removes these #ifdefs on the off-chance that they are a
> holdover rather than the sound system's way of maintaining
> a single code base across all versions of Linux or some such.

This is the patch to remove the LINUX_2_2 #ifdefs.  It depends on
install_new_page.2.5.70-mm1-3.patch, sent earlier.

Thanx, Paul

diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/control.c linux-2.5.70-mm1.loseLINUX_2_2/sound/core/control.c
--- linux-2.5.70-mm1.install_new_page/sound/core/control.c	2003-05-26 18:00:24.000000000 -0700
+++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/control.c	2003-05-28 22:41:51.000000000 -0700
@@ -931,9 +931,7 @@
 
 static struct file_operations snd_ctl_f_ops =
 {
-#ifndef LINUX_2_2
 	.owner = THIS_MODULE,
-#endif
 	.read = snd_ctl_read,
 	.open = snd_ctl_open,
 	.release = snd_ctl_release,
diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/hwdep.c linux-2.5.70-mm1.loseLINUX_2_2/sound/core/hwdep.c
--- linux-2.5.70-mm1.install_new_page/sound/core/hwdep.c	2003-05-26 18:00:21.000000000 -0700
+++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/hwdep.c	2003-05-28 22:41:51.000000000 -0700
@@ -292,9 +292,7 @@
 
 static struct file_operations snd_hwdep_f_ops =
 {
-#ifndef LINUX_2_2
 	.owner = THIS_MODULE,
-#endif
 	.llseek = snd_hwdep_llseek,
 	.read = snd_hwdep_read,
 	.write = snd_hwdep_write,
diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/info.c linux-2.5.70-mm1.loseLINUX_2_2/sound/core/info.c
--- linux-2.5.70-mm1.install_new_page/sound/core/info.c	2003-05-26 18:00:59.000000000 -0700
+++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/info.c	2003-05-28 22:41:51.000000000 -0700
@@ -126,27 +126,6 @@
 snd_info_entry_t *snd_oss_root = NULL;
 #endif
 
-#ifdef LINUX_2_2
-static void snd_info_fill_inode(struct inode *inode, int fill)
-{
-	if (fill)
-		MOD_INC_USE_COUNT;
-	else
-		MOD_DEC_USE_COUNT;
-}
-
-static inline void snd_info_entry_prepare(struct proc_dir_entry *de)
-{
-	de->fill_inode = snd_info_fill_inode;
-}
-
-void snd_remove_proc_entry(struct proc_dir_entry *parent,
-		struct proc_dir_entry *de)
-{
-	if (parent && de)
-		proc_unregister(parent, de->low_ino);
-}
-#else
 static inline void snd_info_entry_prepare(struct proc_dir_entry *de)
 {
 	de->owner = THIS_MODULE;
@@ -158,7 +137,6 @@
 	if (de)
 		remove_proc_entry(de->name, parent);
 }
-#endif
 
 static loff_t snd_info_entry_llseek(struct file *file, loff_t offset, int orig)
 {
@@ -520,9 +498,7 @@
 
 static struct file_operations snd_info_entry_operations =
 {
-#ifndef LINUX_2_2
 	.owner = THIS_MODULE,
-#endif
 	.llseek = snd_info_entry_llseek,
 	.read = snd_info_entry_read,
 	.write = snd_info_entry_write,
@@ -533,67 +509,22 @@
 	.release = snd_info_entry_release,
 };
 
-#ifdef LINUX_2_2
-static struct inode_operations snd_info_entry_inode_operations =
-{
-	&snd_info_entry_operations, /* default sound info directory file-ops */
-};
-
-static struct inode_operations snd_info_device_inode_operations =
-{
-	&snd_fops, /* default sound info directory file-ops */
-};
-#endif /* LINUX_2_2 */
-
 static int snd_info_card_readlink(struct dentry *dentry,
 		char *buffer, int buflen)
 {
 	char *s = PDE(dentry->d_inode)->data;
-#ifndef LINUX_2_2
 	return vfs_readlink(dentry, buffer, buflen, s);
-#else
-	int len;
-
-	if (s == NULL)
-		return -EIO;
-	len = strlen(s);
-	if (len > buflen)
-		len = buflen;
-	if (copy_to_user(buffer, s, len))
-		return -EFAULT;
-	return len;
-#endif
 }
 
-#ifndef LINUX_2_2
 static int snd_info_card_followlink(struct dentry *dentry,
 		struct nameidata *nd)
 {
-	char *s = PDE(dentry->d_inode)->data;
-	return vfs_follow_link(nd, s);
-}
-#else
-static struct dentry *snd_info_card_followlink(struct dentry *dentry,
-		struct dentry *base,
-		unsigned int follow)
-{
 	char *s = PDE(dentry->d_inode)->data;
-	return lookup_dentry(s, base, follow);
+	return vfs_follow_link(nd, s);
 }
-#endif
-
-#ifdef LINUX_2_2
-static struct file_operations snd_info_card_link_operations =
-{
-	NULL
-};
-#endif
 
 struct inode_operations snd_info_card_link_inode_operations =
 {
-#ifdef LINUX_2_2
-	.default_file_ops = &snd_info_card_link_operations,
-#endif
 	.readlink = snd_info_card_readlink,
 	.follow_link = snd_info_card_followlink,
 };
@@ -744,12 +675,8 @@
 	if (p == NULL)
 		return -ENOMEM;
 	p->data = s;
-#ifndef LINUX_2_2
 	p->owner = card->module;
 	p->proc_iops = &snd_info_card_link_inode_operations;
-#else
-	p->ops = &snd_info_card_link_inode_operations;
-#endif
 	card->proc_root_link = p;
 	return 0;
 }
@@ -1008,40 +935,11 @@
 	snd_magic_kfree(entry);
 }
 
-#ifdef LINUX_2_2
-static void snd_info_device_fill_inode(struct inode *inode, int fill)
-{
-	struct proc_dir_entry *de;
-	snd_info_entry_t *entry;
-
-	if (!fill) {
-		MOD_DEC_USE_COUNT;
-		return;
-	}
-	MOD_INC_USE_COUNT;
-	de = PDE(inode);
-	if (de == NULL)
-		return;
-	entry = (snd_info_entry_t *) de->data;
-	if (entry == NULL)
-		return;
-	inode->i_gid = device_gid;
-	inode->i_uid = device_uid;
-	inode->i_rdev = MKDEV(entry->c.device.major, entry->c.device.minor);
-}
-
-static inline void snd_info_device_entry_prepare(struct proc_dir_entry *de, snd_info_entry_t *entry)
-{
-	de->fill_inode = snd_info_device_fill_inode;
-	de->ops = &snd_info_device_inode_operations;
-}
-#else
 static inline void snd_info_device_entry_prepare(struct proc_dir_entry *de, snd_info_entry_t *entry)
 {
 	de->rdev = mk_kdev(entry->c.device.major, entry->c.device.minor);
 	de->owner = THIS_MODULE;
 }
-#endif /* LINUX_2_2 */
 
 /*
  * create a procfs device file
@@ -1119,15 +1017,9 @@
up(&info_mutex); return -ENOMEM; } -#ifndef LINUX_2_2 p->owner = entry->module; -#endif if (!S_ISDIR(entry->mode)) { -#ifndef LINUX_2_2 p->proc_fops = &snd_info_entry_operations; -#else - p->ops = &snd_info_entry_inode_operations; -#endif } p->size = entry->size; p->data = entry; diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/init.c linux-2.5.70-mm1.loseLINUX_2_2/sound/core/init.c --- linux-2.5.70-mm1.install_new_page/sound/core/init.c 2003-05-26 18:00:25.000000000 -0700 +++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/init.c 2003-05-28 22:41:51.000000000 -0700 @@ -193,9 +193,7 @@ f_ops = &s_f_ops->f_ops; memset(f_ops, 0, sizeof(*f_ops)); -#ifndef LINUX_2_2 f_ops->owner = file->f_op->owner; -#endif f_ops->release = file->f_op->release; f_ops->poll = snd_disconnect_poll; diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/oss/mixer_oss.c linux-2.5.70-mm1.loseLINUX_2_2/sound/core/oss/mixer_oss.c --- linux-2.5.70-mm1.install_new_page/sound/core/oss/mixer_oss.c 2003-05-26 18:00:42.000000000 -0700 +++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/oss/mixer_oss.c 2003-05-28 22:41:51.000000000 -0700 @@ -376,9 +376,7 @@ static struct file_operations snd_mixer_oss_f_ops = { -#ifndef LINUX_2_2 .owner = THIS_MODULE, -#endif .open = snd_mixer_oss_open, .release = snd_mixer_oss_release, .ioctl = snd_mixer_oss_ioctl, diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/oss/pcm_oss.c linux-2.5.70-mm1.loseLINUX_2_2/sound/core/oss/pcm_oss.c --- linux-2.5.70-mm1.install_new_page/sound/core/oss/pcm_oss.c 2003-05-26 18:00:56.000000000 -0700 +++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/oss/pcm_oss.c 2003-05-28 22:41:51.000000000 -0700 @@ -2148,9 +2148,7 @@ static struct file_operations snd_pcm_oss_f_reg = { -#ifndef LINUX_2_2 .owner = THIS_MODULE, -#endif .read = snd_pcm_oss_read, .write = snd_pcm_oss_write, .open = snd_pcm_oss_open, diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/pcm_native.c 
linux-2.5.70-mm1.loseLINUX_2_2/sound/core/pcm_native.c --- linux-2.5.70-mm1.install_new_page/sound/core/pcm_native.c 2003-05-28 21:39:45.000000000 -0700 +++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/pcm_native.c 2003-05-28 22:46:38.000000000 -0700 @@ -60,11 +60,6 @@ static int snd_pcm_hw_refine_old_user(snd_pcm_substream_t * substream, struct sndrv_pcm_hw_params_old * _oparams); static int snd_pcm_hw_params_old_user(snd_pcm_substream_t * substream, struct sndrv_pcm_hw_params_old * _oparams); -#ifndef LINUX_2_2 -#define NOPAGE_OOM VM_FAULT_OOM -#define NOPAGE_SIGBUS VM_FAULT_SIGBUS -#endif - /* * */ @@ -2687,21 +2682,13 @@ } #ifndef VM_RESERVED -#ifndef LINUX_2_2 static int snd_pcm_mmap_swapout(struct page * page, struct file * file) -#else -static int snd_pcm_mmap_swapout(struct vm_area_struct * area, struct page * page) -#endif { return 0; } #endif -#ifndef LINUX_2_2 static int snd_pcm_mmap_status_nopage(struct mm_struct *mm, struct vm_area_struct *area, unsigned long address, int write_access, pmd_t *pmd) -#else -static unsigned long snd_pcm_mmap_status_nopage(struct vm_area_struct *area, unsigned long address, int no_share) -#endif { snd_pcm_substream_t *substream = (snd_pcm_substream_t *)area->vm_private_data; snd_pcm_runtime_t *runtime; @@ -2712,11 +2699,7 @@ runtime = substream->runtime; page = virt_to_page(runtime->status); get_page(page); -#ifndef LINUX_2_2 return install_new_page(mm, area, address, write_access, pmd, page); -#else - return page_address(page); -#endif } static struct vm_operations_struct snd_pcm_vm_ops_status = @@ -2740,22 +2723,14 @@ if (size != PAGE_ALIGN(sizeof(snd_pcm_mmap_status_t))) return -EINVAL; area->vm_ops = &snd_pcm_vm_ops_status; -#ifndef LINUX_2_2 area->vm_private_data = substream; -#else - area->vm_private_data = (long)substream; -#endif #ifdef VM_RESERVED area->vm_flags |= VM_RESERVED; #endif return 0; } -#ifndef LINUX_2_2 static int snd_pcm_mmap_control_nopage(struct mm_struct *mm, struct vm_area_struct *area, unsigned long 
address, int write_access, pmd_t *pmd) -#else -static unsigned long snd_pcm_mmap_control_nopage(struct vm_area_struct *area, unsigned long address, int no_share) -#endif { snd_pcm_substream_t *substream = (snd_pcm_substream_t *)area->vm_private_data; snd_pcm_runtime_t *runtime; @@ -2766,11 +2741,7 @@ runtime = substream->runtime; page = virt_to_page(runtime->control); get_page(page); -#ifndef LINUX_2_2 return install_new_page(mm, area, address, write_access, pmd, page); -#else - return page_address(page); -#endif } static struct vm_operations_struct snd_pcm_vm_ops_control = @@ -2794,11 +2765,7 @@ if (size != PAGE_ALIGN(sizeof(snd_pcm_mmap_control_t))) return -EINVAL; area->vm_ops = &snd_pcm_vm_ops_control; -#ifndef LINUX_2_2 area->vm_private_data = substream; -#else - area->vm_private_data = (long)substream; -#endif #ifdef VM_RESERVED area->vm_flags |= VM_RESERVED; #endif @@ -2817,11 +2784,7 @@ atomic_dec(&substream->runtime->mmap_count); } -#ifndef LINUX_2_2 static int snd_pcm_mmap_data_nopage(struct mm_struct *mm, struct vm_area_struct *area, unsigned long address, int write_access, pmd_t *pmd) -#else -static unsigned long snd_pcm_mmap_data_nopage(struct vm_area_struct *area, unsigned long address, int no_share) -#endif { snd_pcm_substream_t *substream = (snd_pcm_substream_t *)area->vm_private_data; snd_pcm_runtime_t *runtime; @@ -2852,11 +2815,7 @@ page = virt_to_page(vaddr); } get_page(page); -#ifndef LINUX_2_2 return install_new_page(mm, area, address, write_access, pmd, page); -#else - return page_address(page); -#endif } static struct vm_operations_struct snd_pcm_vm_ops_data = @@ -2906,11 +2865,7 @@ return -EINVAL; area->vm_ops = &snd_pcm_vm_ops_data; -#ifndef LINUX_2_2 area->vm_private_data = substream; -#else - area->vm_private_data = (long)substream; -#endif #ifdef VM_RESERVED area->vm_flags |= VM_RESERVED; #endif @@ -3040,9 +2995,7 @@ */ static struct file_operations snd_pcm_f_ops_playback = { -#ifndef LINUX_2_2 .owner = THIS_MODULE, -#endif .write = 
snd_pcm_write, #if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 3, 44) .writev = snd_pcm_writev, @@ -3056,9 +3009,7 @@ }; static struct file_operations snd_pcm_f_ops_capture = { -#ifndef LINUX_2_2 .owner = THIS_MODULE, -#endif .read = snd_pcm_read, #if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 3, 44) .readv = snd_pcm_readv, diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/rawmidi.c linux-2.5.70-mm1.loseLINUX_2_2/sound/core/rawmidi.c --- linux-2.5.70-mm1.install_new_page/sound/core/rawmidi.c 2003-05-26 18:00:24.000000000 -0700 +++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/rawmidi.c 2003-05-28 22:41:51.000000000 -0700 @@ -1316,9 +1316,7 @@ static struct file_operations snd_rawmidi_f_ops = { -#ifndef LINUX_2_2 .owner = THIS_MODULE, -#endif .read = snd_rawmidi_read, .write = snd_rawmidi_write, .open = snd_rawmidi_open, diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/seq/oss/seq_oss.c linux-2.5.70-mm1.loseLINUX_2_2/sound/core/seq/oss/seq_oss.c --- linux-2.5.70-mm1.install_new_page/sound/core/seq/oss/seq_oss.c 2003-05-26 18:00:46.000000000 -0700 +++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/seq/oss/seq_oss.c 2003-05-28 22:41:51.000000000 -0700 @@ -194,9 +194,7 @@ static struct file_operations seq_oss_f_ops = { -#ifndef LINUX_2_2 .owner = THIS_MODULE, -#endif .read = odev_read, .write = odev_write, .open = odev_open, diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/seq/seq_clientmgr.c linux-2.5.70-mm1.loseLINUX_2_2/sound/core/seq/seq_clientmgr.c --- linux-2.5.70-mm1.install_new_page/sound/core/seq/seq_clientmgr.c 2003-05-26 18:00:24.000000000 -0700 +++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/seq/seq_clientmgr.c 2003-05-28 22:41:51.000000000 -0700 @@ -2454,9 +2454,7 @@ static struct file_operations snd_seq_f_ops = { -#ifndef LINUX_2_2 .owner = THIS_MODULE, -#endif .read = snd_seq_read, .write = snd_seq_write, .open = snd_seq_open, diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/seq/seq_memory.c 
linux-2.5.70-mm1.loseLINUX_2_2/sound/core/seq/seq_memory.c --- linux-2.5.70-mm1.install_new_page/sound/core/seq/seq_memory.c 2003-05-26 18:00:23.000000000 -0700 +++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/seq/seq_memory.c 2003-05-28 22:41:51.000000000 -0700 @@ -235,18 +235,7 @@ while (pool->free == NULL && ! nonblock && ! pool->closing) { spin_unlock(&pool->lock); -#ifdef LINUX_2_2 - /* change semaphore to allow other clients - to access device file */ - if (file) - up(&semaphore_of(file)); -#endif interruptible_sleep_on(&pool->output_sleep); -#ifdef LINUX_2_2 - /* restore semaphore again */ - if (file) - down(&semaphore_of(file)); -#endif spin_lock(&pool->lock); /* interrupted? */ if (signal_pending(current)) { diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/sound.c linux-2.5.70-mm1.loseLINUX_2_2/sound/core/sound.c --- linux-2.5.70-mm1.install_new_page/sound/core/sound.c 2003-05-26 18:00:43.000000000 -0700 +++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/sound.c 2003-05-28 22:41:51.000000000 -0700 @@ -157,9 +157,7 @@ struct file_operations snd_fops = { -#ifndef LINUX_2_2 .owner = THIS_MODULE, -#endif .open = snd_open }; diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/timer.c linux-2.5.70-mm1.loseLINUX_2_2/sound/core/timer.c --- linux-2.5.70-mm1.install_new_page/sound/core/timer.c 2003-05-26 18:00:41.000000000 -0700 +++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/timer.c 2003-05-28 22:41:51.000000000 -0700 @@ -1733,9 +1733,7 @@ static struct file_operations snd_timer_f_ops = { -#ifndef LINUX_2_2 .owner = THIS_MODULE, -#endif .read = snd_timer_user_read, .open = snd_timer_user_open, .release = snd_timer_user_release, ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race 2003-05-29 15:14 ` Paul E. McKenney 2003-05-29 15:18 ` [RFC][PATCH] Remove LINUX_2_2 Paul E. McKenney @ 2003-05-29 16:33 ` Hugh Dickins 2003-05-29 17:15 ` Daniel Phillips 2003-05-30 2:38 ` Paul E. McKenney 1 sibling, 2 replies; 12+ messages in thread From: Hugh Dickins @ 2003-05-29 16:33 UTC (permalink / raw) To: Paul E. McKenney; +Cc: phillips, akpm, hch, linux-mm, linux-kernel On Thu, 29 May 2003, Paul E. McKenney wrote: > On Fri, May 23, 2003 at 11:42:02AM -0700, Paul E. McKenney wrote: > > > > Exactly -- allows a ->nopage() to drop some lock to avoid races > > between pagefault and either vmtruncate() or invalidate_mmap_range(). > > This race (from the cross-host mmap viewpoint) is described in: > > > > http://marc.theaimsgroup.com/?l=linux-kernel&m=105286345316249&w=2 > > Rediffed for 2.5.70-mm1. Me? I much preferred your original, much sparer, nopagedone patch (labelled "uglyh as hell" by hch). I dislike passing lots of args down a level so they can be passed up again to the library function. In particular, I feel queasy (fear loss of control) about passing a pmd_t* down to a filesystem, which I'd prefer to have no access to such. But I may be in a minority, and the decision won't be mine. Hugh ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race 2003-05-29 16:33 ` [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race Hugh Dickins @ 2003-05-29 17:15 ` Daniel Phillips 2003-05-29 17:39 ` Daniel Phillips 2003-05-30 2:38 ` Paul E. McKenney 1 sibling, 1 reply; 12+ messages in thread From: Daniel Phillips @ 2003-05-29 17:15 UTC (permalink / raw) To: Hugh Dickins, Paul E. McKenney; +Cc: akpm, hch, linux-mm, linux-kernel On Thursday 29 May 2003 18:33, you wrote: > On Thu, 29 May 2003, Paul E. McKenney wrote: > > On Fri, May 23, 2003 at 11:42:02AM -0700, Paul E. McKenney wrote: > > > Exactly -- allows a ->nopage() to drop some lock to avoid races > > > between pagefault and either vmtruncate() or invalidate_mmap_range(). > > > This race (from the cross-host mmap viewpoint) is described in: > > > > > > http://marc.theaimsgroup.com/?l=linux-kernel&m=105286345316249&w=2 > > > > Rediffed for 2.5.70-mm1. > > Me? I much preferred your original, much sparer, nopagedone patch > (labelled "uglyh as hell" by hch). "me too". The fat patch that hits every fs to get rid of two lines and .5 cycles per no_page fault could be an epilogue (if/when it passes muster) to the little one that does the job and has already been thoroughly tested. I see both sides of the argument. The third side, not yet discussed, is the value of doing things incrementally, with widespread testing of the system at each step. Regards, Daniel ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race
2003-05-29 17:15 ` Daniel Phillips
@ 2003-05-29 17:39 ` Daniel Phillips
2003-05-29 20:24 ` Paul E. McKenney
0 siblings, 1 reply; 12+ messages in thread
From: Daniel Phillips @ 2003-05-29 17:39 UTC (permalink / raw)
To: Hugh Dickins, Paul E. McKenney; +Cc: akpm, hch, linux-mm, linux-kernel
On Thursday 29 May 2003 19:15, Daniel Phillips wrote:
> On Thursday 29 May 2003 18:33, you wrote:
> > Me? I much preferred your original, much sparer, nopagedone patch
> > (labelled "uglyh as hell" by hch).
>
> "me too".
Oh wait, I misspoke... there is another formulation of the patch that hasn't
yet been posted for review. Instead of having the nopagedone hook, it turns
the entire do_no_page into a hook, per hch's suggestion, but leaves in the
->nopage hook, which makes the patch small and obviously right. I need to
post that version for comparison, please bear with me.
IMHO, it's nicer than the ->nopagedone form.
Regards,
Daniel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race
2003-05-29 17:39 ` Daniel Phillips
@ 2003-05-29 20:24 ` Paul E. McKenney
0 siblings, 0 replies; 12+ messages in thread
From: Paul E. McKenney @ 2003-05-29 20:24 UTC (permalink / raw)
To: Daniel Phillips; +Cc: Hugh Dickins, akpm, hch, linux-mm, linux-kernel
On Thu, May 29, 2003 at 07:39:47PM +0200, Daniel Phillips wrote:
> On Thursday 29 May 2003 19:15, Daniel Phillips wrote:
> > On Thursday 29 May 2003 18:33, you wrote:
> > > Me? I much preferred your original, much sparer, nopagedone patch
> > > (labelled "uglyh as hell" by hch).
> >
> > "me too".
>
> Oh wait, I misspoke... there is another formulation of the patch that hasn't
> yet been posted for review. Instead of having the nopagedone hook, it turns
> the entire do_no_page into a hook, per hch's suggestion, but leaves in the
> ->nopage hook, which makes the patch small and obviously right. I need to
> post that version for comparison, please bear with me.
>
> IMHO, it's nicer than the ->nopagedone form.
I put together something like this, but the problem is that
do_anonymous_page() needs mm->page_table_lock held, whereas the ->nopage
functions need it not to be held. One could require that the lock be
held on entry to all ->nopage functions, but then almost all ->nopage
functions would have to drop it immediately upon entry. This seemed
error-prone to me, but could certainly be done...
Thoughts?
Me, I don't care as long as there is some reasonable way for distributed
filesystems to safely resolve the race between page faults and
invalidation requests from other nodes. ;-)
Thanx, Paul
^ permalink raw reply [flat|nested] 12+ messages in thread
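The locking mismatch described in this reply can be modelled in a few lines of user-space C. This is a sketch under stated assumptions. The page-table lock is a boolean flag rather than a spinlock. model_do_no_page(), anon_fault() and file_nopage() are hypothetical names, and a return value of 1 merely stands for a successful fault. The caller enters with the lock held and the anonymous path runs with it held. A file-backed ->nopage() must be entered with the lock released, so the dispatcher has to branch on vm_ops == NULL to pick the right convention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of mm->page_table_lock: a flag, with checks on every transition. */
static bool page_table_lock_held;

static void lock_ptl(void)
{
	assert(!page_table_lock_held);         /* no recursive locking */
	page_table_lock_held = true;
}

static void unlock_ptl(void)
{
	assert(page_table_lock_held);
	page_table_lock_held = false;
}

struct vm_area_struct;
struct vm_operations_struct {
	int (*nopage)(struct vm_area_struct *vma, unsigned long address);
};
struct vm_area_struct {
	const struct vm_operations_struct *vm_ops;
};

/* Anonymous path: expects the lock held on entry, releases it on exit
 * (simplified: no allocation or PTE work is modelled here). */
static int anon_fault(struct vm_area_struct *vma, unsigned long address)
{
	(void)vma; (void)address;
	assert(page_table_lock_held);
	unlock_ptl();
	return 1;
}

/* File-backed ->nopage(): must be entered WITHOUT the lock, e.g. so it
 * may sleep or call something that takes the lock itself. */
static int file_nopage(struct vm_area_struct *vma, unsigned long address)
{
	(void)vma; (void)address;
	assert(!page_table_lock_held);
	return 1;
}

static const struct vm_operations_struct file_vm_ops = { file_nopage };

/* Entered with the lock held; returns with it released either way.
 * The vm_ops == NULL test is what selects the locking convention. */
static int model_do_no_page(struct vm_area_struct *vma, unsigned long address)
{
	assert(page_table_lock_held);
	if (vma->vm_ops == NULL || vma->vm_ops->nopage == NULL)
		return anon_fault(vma, address);  /* lock still held */
	unlock_ptl();                             /* drop before ->nopage() */
	return vma->vm_ops->nopage(vma, address);
}
```

Folding the anonymous case into an ordinary ->nopage() would remove the branch, but every handler would then be entered with the lock held and almost every one would have to drop it first thing. That is the error-prone convention weighed in the message.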
* Re: [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race 2003-05-29 16:33 ` [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race Hugh Dickins 2003-05-29 17:15 ` Daniel Phillips @ 2003-05-30 2:38 ` Paul E. McKenney 1 sibling, 0 replies; 12+ messages in thread From: Paul E. McKenney @ 2003-05-30 2:38 UTC (permalink / raw) To: Hugh Dickins; +Cc: phillips, akpm, hch, linux-mm, linux-kernel On Thu, May 29, 2003 at 05:33:04PM +0100, Hugh Dickins wrote: > On Thu, 29 May 2003, Paul E. McKenney wrote: > > On Fri, May 23, 2003 at 11:42:02AM -0700, Paul E. McKenney wrote: > > > > > > Exactly -- allows a ->nopage() to drop some lock to avoid races > > > between pagefault and either vmtruncate() or invalidate_mmap_range(). > > > This race (from the cross-host mmap viewpoint) is described in: > > > > > > http://marc.theaimsgroup.com/?l=linux-kernel&m=105286345316249&w=2 > > > > Rediffed for 2.5.70-mm1. > > Me? I much preferred your original, much sparer, nopagedone patch > (labelled "uglyh as hell" by hch). I dislike passing lots of args > down a level so they can be passed up again to the library function. > > In particular, I feel queasy (fear loss of control) about passing a > pmd_t* down to a filesystem, which I'd prefer to have no access to > such. But I may be in a minority, and the decision won't be mine. Fine by me either way. ;-) Here is the rediffed nopagedone patch for 2.5.70-mm1. 
Thanx, Paul
diff -urN -X dontdiff linux-2.5.70-mm1/include/linux/mm.h linux-2.5.70-mm1.nopagedone/include/linux/mm.h
--- linux-2.5.70-mm1/include/linux/mm.h	2003-05-28 20:16:04.000000000 -0700
+++ linux-2.5.70-mm1.nopagedone/include/linux/mm.h	2003-05-29 19:34:55.000000000 -0700
@@ -143,6 +143,7 @@
 	void (*open)(struct vm_area_struct * area);
 	void (*close)(struct vm_area_struct * area);
 	struct page * (*nopage)(struct vm_area_struct * area, unsigned long address, int unused);
+	void (*nopagedone)(struct vm_area_struct * area, unsigned long address, int status);
 	int (*populate)(struct vm_area_struct * area, unsigned long address, unsigned long len, pgprot_t prot, unsigned long pgoff, int nonblock);
 };
 
diff -urN -X dontdiff linux-2.5.70-mm1/mm/memory.c linux-2.5.70-mm1.nopagedone/mm/memory.c
--- linux-2.5.70-mm1/mm/memory.c	2003-05-28 20:16:04.000000000 -0700
+++ linux-2.5.70-mm1.nopagedone/mm/memory.c	2003-05-29 19:34:55.000000000 -0700
@@ -1468,6 +1468,9 @@
 	ret = VM_FAULT_OOM;
 out:
 	pte_chain_free(pte_chain);
+	if (vma->vm_ops && vma->vm_ops->nopagedone) {
+		vma->vm_ops->nopagedone(vma, address & PAGE_MASK, ret);
+	}
 	return ret;
 }
 
^ permalink raw reply [flat|nested] 12+ messages in thread
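Here is a user-space model of how a cluster filesystem might use the ->nopagedone() hook added by this patch. The lock taken in ->nopage() stays held until do_no_page() reaches its out: label. An invalidation from another node therefore cannot slip in between returning the page and installing the PTE. The two hook signatures follow the patch. Everything else is a hypothetical stand-in: the clusterfs_* functions, the boolean lock, the simplified model_do_no_page() with its 0/1 status values, and a 4 KiB PAGE_MASK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_MASK (~0xfffUL)                   /* assumes 4 KiB pages */

struct page { int count; };
struct vm_area_struct;

/* The two hook signatures as they appear in the patch. */
struct vm_operations_struct {
	struct page *(*nopage)(struct vm_area_struct *area,
			       unsigned long address, int unused);
	void (*nopagedone)(struct vm_area_struct *area,
			   unsigned long address, int status);
};
struct vm_area_struct {
	const struct vm_operations_struct *vm_ops;
};

/* Hypothetical cluster-filesystem state. */
static bool inval_lock_held;                   /* blocks remote invalidation */
static bool pte_set_under_lock;
static int last_status = -100;
static struct page fs_page;

static struct page *clusterfs_nopage(struct vm_area_struct *area,
				     unsigned long address, int unused)
{
	(void)area; (void)address; (void)unused;
	inval_lock_held = true;                /* taken here ... */
	fs_page.count++;                       /* get_page() */
	return &fs_page;
}

static void clusterfs_nopagedone(struct vm_area_struct *area,
				 unsigned long address, int status)
{
	(void)area; (void)address;
	last_status = status;                  /* fault outcome is visible */
	inval_lock_held = false;               /* ... released after install */
}

static const struct vm_operations_struct clusterfs_vm_ops = {
	clusterfs_nopage, clusterfs_nopagedone
};

/* Simplified do_no_page(): every exit path funnels through "out". */
static int model_do_no_page(struct vm_area_struct *vma, unsigned long address)
{
	int ret;
	struct page *page = vma->vm_ops->nopage(vma, address & PAGE_MASK, 0);

	if (page == NULL) {
		ret = 0;                       /* stands in for VM_FAULT_SIGBUS */
		goto out;
	}
	pte_set_under_lock = inval_lock_held;  /* stands in for set_pte() */
	ret = 1;                               /* stands in for VM_FAULT_MINOR */
out:
	if (vma->vm_ops && vma->vm_ops->nopagedone)
		vma->vm_ops->nopagedone(vma, address & PAGE_MASK, ret);
	return ret;
}
```

The status argument tells the filesystem whether the fault actually installed a page. It can use that to decide, for example, whether any cross-node bookkeeping is needed before it drops its lock.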
* Re: [RFC][PATCH] vm_operation to avoid pagefault/inval race
@ 2003-05-17 18:21 Daniel Phillips
2003-05-17 19:49 ` Andrew Morton
0 siblings, 1 reply; 12+ messages in thread
From: Daniel Phillips @ 2003-05-17 18:21 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: Andrew Morton, Christoph Hellwig, linux-mm, linux-kernel
Please don't take lack of response for lack of interest. The generic
issue here is "what are the vfs changes needed to support cross-host
mmap?". You defined the problem nicely.
> [...]
>
> 5. One way or another, life is now hard.
Indeed. In brief, ->nopage just doesn't provide adequate coverage to
support the cross-host lock.
> One solution would be for the distributed filesystem to hold
> onto a lock or semaphore upon return from the nopage function.
> The problem is that there is no way to determine (in a timely
> fashion) when it is safe to release this lock or semaphore.
>
> The attached patch addresses this by adding a nopagedone
> function for when do_no_page() exits. The filesystem may then
> drop the lock or semaphore in this nopagedone function.
>
> Thoughts? Is there some other existing way to get this done?
There is. One way is to make all of do_no_page a hook, and clearly this
is more generic than what you proposed, since it covers your hook and
the rest can be done with library calls. Once you've gone there, the
next question to ask is "what use is the existing ->nopage hook?", and
the answer is: none, really. The existing usage of ->nopage can be
replaced by ->do_no_page plus library code, and the only problem is,
we have to change pretty well every filesystem in and out of tree. So
that gets a little, em, interesting from the 2.6.0 point of view, which
is why I cc'd Andrew on this. Christoph has also expressed interest in
this, which explains the other cc.
Any clustered filesystem that wants to support POSIX mmap is going to
need this hook, so the sooner we hash this out, the better.
Regards, Daniel ^ permalink raw reply [flat|nested] 12+ messages in thread
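For comparison, the alternative proposed here can be sketched the same way. The whole no-page path becomes a hook, and today's logic is available as library code. The member name do_no_page, generic_do_no_page() and the cfs_* functions are hypothetical, and the model assumes vm_ops is never NULL. A filesystem that needs the cross-host lock wraps the library call, so the lock's whole scope sits in one function. Filesystems that do not care get the library default. This variant keeps ->nopage alongside the new hook.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct page { int count; };
struct vm_area_struct;

/* Hypothetical vm_operations_struct with a whole-fault hook. */
struct vm_operations_struct {
	struct page *(*nopage)(struct vm_area_struct *vma,
			       unsigned long address, int unused);
	int (*do_no_page)(struct vm_area_struct *vma, unsigned long address);
};
struct vm_area_struct {
	const struct vm_operations_struct *vm_ops;   /* assumed non-NULL */
};

static struct page *pte_slot;                  /* one-entry "page table" */

/* Library code: roughly what the core do_no_page() does today. */
static int generic_do_no_page(struct vm_area_struct *vma, unsigned long address)
{
	struct page *page = vma->vm_ops->nopage(vma, address, 0);

	if (page == NULL)
		return 0;                      /* stands in for VM_FAULT_SIGBUS */
	pte_slot = page;                       /* stands in for set_pte() + RSS */
	return 1;                              /* stands in for VM_FAULT_MINOR */
}

/* Core fault path: use the hook if the vma supplies one. */
static int handle_no_page(struct vm_area_struct *vma, unsigned long address)
{
	if (vma->vm_ops->do_no_page)
		return vma->vm_ops->do_no_page(vma, address);
	return generic_do_no_page(vma, address);
}

/* Hypothetical cluster filesystem. */
static bool inval_lock_held;
static bool installed_under_lock;
static struct page cfs_page;

static struct page *cfs_nopage(struct vm_area_struct *vma,
			       unsigned long address, int unused)
{
	(void)vma; (void)address; (void)unused;
	cfs_page.count++;                      /* get_page() */
	return &cfs_page;
}

static int cfs_do_no_page(struct vm_area_struct *vma, unsigned long address)
{
	int ret;

	inval_lock_held = true;                /* hold off remote invalidation */
	ret = generic_do_no_page(vma, address);
	installed_under_lock = inval_lock_held && pte_slot == &cfs_page;
	inval_lock_held = false;               /* PTE is in place; let it run */
	return ret;
}

static const struct vm_operations_struct cfs_vm_ops = { cfs_nopage, cfs_do_no_page };
static const struct vm_operations_struct plain_vm_ops = { cfs_nopage, NULL };
```

Compared with a ->nopagedone() callback, the lock is taken and released in the same function, which is easier to audit. The cost is that the core fault path hands more of its logic over to the filesystem.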
* Re: [RFC][PATCH] vm_operation to avoid pagefault/inval race 2003-05-17 18:21 [RFC][PATCH] vm_operation to avoid pagefault/inval race Daniel Phillips @ 2003-05-17 19:49 ` Andrew Morton 2003-05-20 1:23 ` Paul E. McKenney 0 siblings, 1 reply; 12+ messages in thread From: Andrew Morton @ 2003-05-17 19:49 UTC (permalink / raw) To: Daniel Phillips; +Cc: paulmck, hch, linux-mm, linux-kernel Daniel Phillips <phillips@arcor.de> wrote: > > and the only problem is, we have to change pretty well every > filesystem in and out of tree. But it's only a one-liner per fs. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC][PATCH] vm_operation to avoid pagefault/inval race 2003-05-17 19:49 ` Andrew Morton @ 2003-05-20 1:23 ` Paul E. McKenney 2003-05-20 8:11 ` Andrew Morton 0 siblings, 1 reply; 12+ messages in thread From: Paul E. McKenney @ 2003-05-20 1:23 UTC (permalink / raw) To: Andrew Morton; +Cc: Daniel Phillips, hch, linux-mm, linux-kernel On Sat, May 17, 2003 at 12:49:48PM -0700, Andrew Morton wrote: > Daniel Phillips <phillips@arcor.de> wrote: > > > > and the only problem is, we have to change pretty well every > > filesystem in and out of tree. > > But it's only a one-liner per fs. So the general idea is to do something as follows, right? (Sorry for not just putting together a patch -- I want to make sure I understand all of your advice first!) o Make all callers to do_no_page() instead call vma->vm_ops->nopage(). o Make a function, perhaps named something like install_new_page(), that does the PTE-installation and RSS-adjustment tasks currently performed by both do_no_page() and by do_anonymous_page(). (Not clear to me yet whether a full merge of these two functions is the right approach, more thought needed. Note that the nopage function is implicitly aware of whether it is handling an anonymous page or not, so a pair of functions that both call another function containing the common code is reasonable, if warranted.) The install_new_page() function needs an additional argument to accept the new_page value that used to be returned by the nopage() function. o Add arguments to nopage() to allow it to invoke install_new_page(). o Change all nopage() functions to invoke install_new_page(), but only in cases where they would -not- return VM_FAULT_OOM or VM_FAULT_SIGBUS. In these cases, these two return codes must be handed back to the caller without invoking install_new_page(). 
o Otherwise, the value that these nopage() functions would normally return must be passed to install_new_page(), and the value returned by install_new_page() must be returned to the nopage() function's caller. o Replace all occurrences of "->vm_ops = NULL" with "->vm_ops = anonymous_vm_ops" or some such. o The anonymous_vm_ops would have the following members: nopage: pointer to a function containing the page-allocation code extracted from do_anonymous_page(), followed by a call to install_new_page(). populate: NULL. open: NULL. close: NULL. Thoughts? Thanx, Paul ^ permalink raw reply [flat|nested] 12+ messages in thread
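Here is a user-space C sketch of the steps in this plan. Everything in it is a model under stated assumptions. struct mm_struct holds a one-slot page table and an RSS counter, and the VM_FAULT_* values are illustrative. calloc() stands in for allocating a zeroed page; a real anonymous read fault may map a shared zero page instead. anonymous_nopage() and handle_no_page() are hypothetical names. Error codes bypass install_new_page(). Because every VMA has a non-NULL vm_ops, the fault path needs no special case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define VM_FAULT_OOM    (-1)                   /* illustrative values */
#define VM_FAULT_SIGBUS 0
#define VM_FAULT_MINOR  1

struct page { int count; };

/* Model mm: a one-slot "page table" plus the RSS counter. */
struct mm_struct {
	unsigned long rss;
	struct page *pte;
};

struct vm_area_struct;
struct vm_operations_struct {
	int (*nopage)(struct mm_struct *mm, struct vm_area_struct *vma,
		      unsigned long address, int write_access);
};
struct vm_area_struct {
	const struct vm_operations_struct *vm_ops;
};

/* Common tail shared by every nopage function: install PTE, bump RSS. */
static int install_new_page(struct mm_struct *mm, struct vm_area_struct *vma,
			    unsigned long address, int write_access,
			    struct page *new_page)
{
	(void)vma; (void)address; (void)write_access;
	mm->pte = new_page;
	mm->rss++;
	return VM_FAULT_MINOR;
}

/* Hypothetical nopage holding the allocation code from do_anonymous_page(). */
static int anonymous_nopage(struct mm_struct *mm, struct vm_area_struct *vma,
			    unsigned long address, int write_access)
{
	struct page *page = calloc(1, sizeof(*page));   /* "zeroed page" */

	if (page == NULL)
		return VM_FAULT_OOM;           /* errors skip install_new_page() */
	page->count = 1;
	return install_new_page(mm, vma, address, write_access, page);
}

/* Takes the place of "->vm_ops = NULL"; populate/open/close stay NULL. */
static const struct vm_operations_struct anonymous_vm_ops = {
	anonymous_nopage
};

/* With vm_ops never NULL, the fault path is a single indirect call. */
static int handle_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
			  unsigned long address, int write_access)
{
	return vma->vm_ops->nopage(mm, vma, address, write_access);
}
```

The model ignores locking entirely. do_anonymous_page() expects mm->page_table_lock to be held while the other nopage functions do not, and that difference is why the NULL checks were kept in the patch that followed.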
* Re: [RFC][PATCH] vm_operation to avoid pagefault/inval race 2003-05-20 1:23 ` Paul E. McKenney @ 2003-05-20 8:11 ` Andrew Morton 2003-05-23 14:35 ` [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race Paul E. McKenney 0 siblings, 1 reply; 12+ messages in thread From: Andrew Morton @ 2003-05-20 8:11 UTC (permalink / raw) To: paulmck; +Cc: phillips, hch, linux-mm, linux-kernel "Paul E. McKenney" <paulmck@us.ibm.com> wrote: > > So the general idea is to do something as follows, right? It sounds reasonable. A matter of putting together the appropriate library functions and refactoring a few things. > > o Make a function, perhaps named something like > install_new_page(), that does the PTE-installation > and RSS-adjustment tasks currently performed by > both do_no_page() and by do_anonymous_page(). That's similar to mm/fremap.c:install_page(). (Which forgets to call update_mmu_cache(). Debatably a buglet.) However there is not a lot of commonality between the various nopage()s and there may not be a lot to be gained from all this. There is subtle code in there and it is performance-critical. I'd be inclined to try to minimise overall code churn in this work. ^ permalink raw reply [flat|nested] 12+ messages in thread
* [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race 2003-05-20 8:11 ` Andrew Morton @ 2003-05-23 14:35 ` Paul E. McKenney 2003-05-23 16:21 ` Hugh Dickins 0 siblings, 1 reply; 12+ messages in thread From: Paul E. McKenney @ 2003-05-23 14:35 UTC (permalink / raw) To: Andrew Morton; +Cc: phillips, hch, linux-mm, linux-kernel On Tue, May 20, 2003 at 01:11:57AM -0700, Andrew Morton wrote: > "Paul E. McKenney" <paulmck@us.ibm.com> wrote: > > > > So the general idea is to do something as follows, right? > > It sounds reasonable. A matter of putting together the appropriate > library functions and refactoring a few things. > > > > > o Make a function, perhaps named something like > > install_new_page(), that does the PTE-installation > > and RSS-adjustment tasks currently performed by > > both do_no_page() and by do_anonymous_page(). > > That's similar to mm/fremap.c:install_page(). (Which forgets to call > update_mmu_cache(). Debatably a buglet.) > > However there is not a lot of commonality between the various nopage()s and > there may not be a lot to be gained from all this. There is subtle code in > there and it is performance-critical. I'd be inclined to try to minimise > overall code churn in this work. Good point! Here is a patch to do this. A "few" caveats: o I have not tested this, in fact, I have only compiled it for i386. o The bit about removing checks for vm_ops==NULL and making do_anonymous_page() be just another nopage function turned out to be problematic, since the do_anonymous_page() function wants page_table_lock held, and the other nopage functions do not. So I kept the NULL checks, since I was going to need some anyway (I did -not- want to make every nopage function have to explicitly drop page_table_lock). I am especially interested in feedback on this point -- did I miss something here? o I had to expand the trace of the nopage functions to pass through the mm_struct and the pmd_t to the new install_new_page() function. 
o The nopage functions now return an int instead of the old struct page*. o NOPAGE_OOM and NOPAGE_SIGBUS are no more, since one can just use VM_FAULT_OOM and VM_FAULT_SIGBUS instead. o I still need to remove some LINUX_2_2 stuff (thanks to Dan Phillips for letting me know it was OK to do so...). o This patch is a bit long. Thoughts on how to break it up? Thanx, Paul diff -urN -X dontdiff linux-2.5.69-mm7/arch/ia64/ia32/binfmt_elf32.c linux-2.5.69-mm7.install_new_page/arch/ia64/ia32/binfmt_elf32.c --- linux-2.5.69-mm7/arch/ia64/ia32/binfmt_elf32.c Tue May 20 09:10:43 2003 +++ linux-2.5.69-mm7.install_new_page/arch/ia64/ia32/binfmt_elf32.c Thu May 22 16:26:07 2003 @@ -56,13 +56,13 @@ extern struct page *ia32_shared_page[]; extern unsigned long *ia32_gdt; -struct page * -ia32_install_shared_page (struct vm_area_struct *vma, unsigned long address, int no_share) +struct int +ia32_install_shared_page (struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, int write_access, pmd_t *pmd) { struct page *pg = ia32_shared_page[(address - vma->vm_start)/PAGE_SIZE]; get_page(pg); - return pg; + return install_new_page(mm, vma, address, write_access, pmd, pg); } static struct vm_operations_struct ia32_shared_page_vm_ops = { diff -urN -X dontdiff linux-2.5.69-mm7/arch/sparc64/mm/hugetlbpage.c linux-2.5.69-mm7.install_new_page/arch/sparc64/mm/hugetlbpage.c --- linux-2.5.69-mm7/arch/sparc64/mm/hugetlbpage.c Sun May 4 16:53:35 2003 +++ linux-2.5.69-mm7.install_new_page/arch/sparc64/mm/hugetlbpage.c Thu May 22 16:23:56 2003 @@ -633,11 +633,12 @@ return (int) htlbzone_pages; } -static struct page * -hugetlb_nopage(struct vm_area_struct *vma, unsigned long address, int unused) +static int +hugetlb_nopage(struct mm_struct * mm, struct vm_area_struct *vma, + unsigned long address, int write_access, pmd_t * pmd) { BUG(); - return NULL; + return VM_FAULT_SIGBUS; } static struct vm_operations_struct hugetlb_vm_ops = { diff -urN -X dontdiff
linux-2.5.69-mm7/drivers/char/agp/alpha-agp.c linux-2.5.69-mm7.install_new_page/drivers/char/agp/alpha-agp.c --- linux-2.5.69-mm7/drivers/char/agp/alpha-agp.c Tue May 20 09:10:46 2003 +++ linux-2.5.69-mm7.install_new_page/drivers/char/agp/alpha-agp.c Thu May 22 16:02:30 2003 @@ -11,9 +11,11 @@ #include "agp.h" -static struct page *alpha_core_agp_vm_nopage(struct vm_area_struct *vma, - unsigned long address, - int write_access) +static int alpha_core_agp_vm_nopage(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long address, + int write_access, + pmd_t pmd) { alpha_agp_info *agp = agp_bridge->dev_private_data; dma_addr_t dma_addr; @@ -23,14 +25,15 @@ dma_addr = address - vma->vm_start + agp->aperture.bus_base; pa = agp->ops->translate(agp, dma_addr); - if (pa == (unsigned long)-EINVAL) return NULL; /* no translation */ + if (pa == (unsigned long)-EINVAL) return VM_FAULT_SIGBUS; + /* no translation */ /* * Get the page, inc the use count, and return it */ page = virt_to_page(__va(pa)); get_page(page); - return page; + return install_new_page(mm, vma, address, write_access, pmd, page); } static struct aper_size_info_fixed alpha_core_agp_sizes[] = diff -urN -X dontdiff linux-2.5.69-mm7/drivers/char/drm/drmP.h linux-2.5.69-mm7.install_new_page/drivers/char/drm/drmP.h --- linux-2.5.69-mm7/drivers/char/drm/drmP.h Sun May 4 16:53:36 2003 +++ linux-2.5.69-mm7.install_new_page/drivers/char/drm/drmP.h Thu May 22 16:15:52 2003 @@ -620,18 +620,26 @@ extern int DRM(fasync)(int fd, struct file *filp, int on); /* Mapping support (drm_vm.h) */ -extern struct page *DRM(vm_nopage)(struct vm_area_struct *vma, - unsigned long address, - int write_access); -extern struct page *DRM(vm_shm_nopage)(struct vm_area_struct *vma, - unsigned long address, - int write_access); -extern struct page *DRM(vm_dma_nopage)(struct vm_area_struct *vma, - unsigned long address, - int write_access); -extern struct page *DRM(vm_sg_nopage)(struct vm_area_struct *vma, - unsigned long address, - 
int write_access); +extern int DRM(vm_nopage)(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long address, + int write_access, + pmd_t *pmd); +extern int DRM(vm_shm_nopage)(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long address, + int write_access, + pmd_t *pmd); +extern int DRM(vm_dma_nopage)(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long address, + int write_access, + pmd_t *pmd); +extern int DRM(vm_sg_nopage)(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long address, + int write_access, + pmd_t *pmd); extern void DRM(vm_open)(struct vm_area_struct *vma); extern void DRM(vm_close)(struct vm_area_struct *vma); extern void DRM(vm_shm_close)(struct vm_area_struct *vma); diff -urN -X dontdiff linux-2.5.69-mm7/drivers/char/drm/drm_vm.h linux-2.5.69-mm7.install_new_page/drivers/char/drm/drm_vm.h --- linux-2.5.69-mm7/drivers/char/drm/drm_vm.h Sun May 4 16:53:57 2003 +++ linux-2.5.69-mm7.install_new_page/drivers/char/drm/drm_vm.h Thu May 22 15:09:40 2003 @@ -55,9 +55,11 @@ .close = DRM(vm_close), }; -struct page *DRM(vm_nopage)(struct vm_area_struct *vma, - unsigned long address, - int write_access) +int DRM(vm_nopage)(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long address, + int write_access, + pmd_t *pmd) { #if __REALLY_HAVE_AGP drm_file_t *priv = vma->vm_file->private_data; @@ -114,35 +116,38 @@ DRM_DEBUG("baddr = 0x%lx page = 0x%p, offset = 0x%lx\n", baddr, __va(agpmem->memory->memory[offset]), offset); - return page; + return install_new_page(mm, vma, address, write_access, + pmd, page); } vm_nopage_error: #endif /* __REALLY_HAVE_AGP */ - return NOPAGE_SIGBUS; /* Disallow mremap */ + return VM_FAULT_SIGBUS; /* Disallow mremap */ } -struct page *DRM(vm_shm_nopage)(struct vm_area_struct *vma, - unsigned long address, - int write_access) +int DRM(vm_shm_nopage)(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long address, + int write_access, + pmd_t *pmd) { 
 	drm_map_t *map = (drm_map_t *)vma->vm_private_data;
 	unsigned long offset;
 	unsigned long i;
 	struct page *page;
 
-	if (address > vma->vm_end) return NOPAGE_SIGBUS; /* Disallow mremap */
-	if (!map) return NOPAGE_OOM; /* Nothing allocated */
+	if (address > vma->vm_end) return VM_FAULT_SIGBUS; /* Disallow mremap */
+	if (!map) return VM_FAULT_OOM; /* Nothing allocated */
 
 	offset = address - vma->vm_start;
 	i = (unsigned long)map->handle + offset;
 	page = vmalloc_to_page((void *)i);
 	if (!page)
-		return NOPAGE_OOM;
+		return VM_FAULT_OOM;
 	get_page(page);
 
 	DRM_DEBUG("shm_nopage 0x%lx\n", address);
-	return page;
+	return install_new_page(mm, vma, address, write_access, pmd, page);
 }
 
 /* Special close routine which deletes map information if we are the last
@@ -221,9 +226,11 @@
 	up(&dev->struct_sem);
 }
 
-struct page *DRM(vm_dma_nopage)(struct vm_area_struct *vma,
-				unsigned long address,
-				int write_access)
+int DRM(vm_dma_nopage)(struct mm_struct *mm,
+		       struct vm_area_struct *vma,
+		       unsigned long address,
+		       int write_access,
+		       pmd_t *pmd)
 {
 	drm_file_t *priv = vma->vm_file->private_data;
 	drm_device_t *dev = priv->dev;
@@ -232,9 +239,9 @@
 	unsigned long page_nr;
 	struct page *page;
 
-	if (!dma) return NOPAGE_SIGBUS; /* Error */
-	if (address > vma->vm_end) return NOPAGE_SIGBUS; /* Disallow mremap */
-	if (!dma->pagelist) return NOPAGE_OOM ; /* Nothing allocated */
+	if (!dma) return VM_FAULT_SIGBUS; /* Error */
+	if (address > vma->vm_end) return VM_FAULT_SIGBUS; /* Disallow mremap */
+	if (!dma->pagelist) return VM_FAULT_OOM ; /* Nothing allocated */
 
 	offset = address - vma->vm_start; /* vm_[pg]off[set] should be 0 */
 	page_nr = offset >> PAGE_SHIFT;
@@ -244,12 +251,14 @@
 	get_page(page);
 
 	DRM_DEBUG("dma_nopage 0x%lx (page %lu)\n", address, page_nr);
-	return page;
+	return install_new_page(mm, vma, address, write_access, pmd, page);
 }
 
-struct page *DRM(vm_sg_nopage)(struct vm_area_struct *vma,
-			       unsigned long address,
-			       int write_access)
+int DRM(vm_sg_nopage)(struct mm_struct *mm,
+		      struct vm_area_struct *vma,
+		      unsigned long address,
+		      int write_access,
+		      pmd_t *pmd)
 {
 	drm_map_t *map = (drm_map_t *)vma->vm_private_data;
 	drm_file_t *priv = vma->vm_file->private_data;
@@ -260,9 +269,9 @@
 	unsigned long page_offset;
 	struct page *page;
 
-	if (!entry) return NOPAGE_SIGBUS; /* Error */
-	if (address > vma->vm_end) return NOPAGE_SIGBUS; /* Disallow mremap */
-	if (!entry->pagelist) return NOPAGE_OOM ; /* Nothing allocated */
+	if (!entry) return VM_FAULT_SIGBUS; /* Error */
+	if (address > vma->vm_end) return VM_FAULT_SIGBUS; /* Disallow mremap */
+	if (!entry->pagelist) return VM_FAULT_OOM ; /* Nothing allocated */
 
 	offset = address - vma->vm_start;
 
@@ -271,7 +280,7 @@
 	page = entry->pagelist[page_offset];
 	get_page(page);
 
-	return page;
+	return install_new_page(mm, vma, address, write_access, pmd, page);
 }
 
 void DRM(vm_open)(struct vm_area_struct *vma)
diff -urN -X dontdiff linux-2.5.69-mm7/drivers/ieee1394/dma.c linux-2.5.69-mm7.install_new_page/drivers/ieee1394/dma.c
--- linux-2.5.69-mm7/drivers/ieee1394/dma.c	Sun May 4 16:53:31 2003
+++ linux-2.5.69-mm7.install_new_page/drivers/ieee1394/dma.c	Thu May 22 16:16:07 2003
@@ -184,28 +184,28 @@
 
 /* nopage() handler for mmap access */
 
-static struct page*
-dma_region_pagefault(struct vm_area_struct *area, unsigned long address, int write_access)
+static int
+dma_region_pagefault(struct mm_struct *mm, struct vm_area_struct *area,
+		     unsigned long address, int write_access, pmd_t *pmd)
 {
 	unsigned long offset;
 	unsigned long kernel_virt_addr;
-	struct page *ret = NOPAGE_SIGBUS;
+	struct page *page;
 
 	struct dma_region *dma = (struct dma_region*) area->vm_private_data;
 
 	if(!dma->kvirt)
-		goto out;
+		return VM_FAULT_SIGBUS;
 
 	if( (address < (unsigned long) area->vm_start) ||
 	    (address > (unsigned long) area->vm_start + (PAGE_SIZE * dma->n_pages)) )
-		goto out;
+		return VM_FAULT_SIGBUS;
 
 	offset = address - area->vm_start;
 	kernel_virt_addr = (unsigned long) dma->kvirt + offset;
-	ret = vmalloc_to_page((void*) kernel_virt_addr);
-	get_page(ret);
-out:
-	return ret;
+	page = vmalloc_to_page((void*) kernel_virt_addr);
+	get_page(page);
+	return install_new_page(mm, area, address, write_access, pmd, page);
 }
 
 static struct vm_operations_struct dma_region_vm_ops = {
diff -urN -X dontdiff linux-2.5.69-mm7/drivers/media/video/video-buf.c linux-2.5.69-mm7.install_new_page/drivers/media/video/video-buf.c
--- linux-2.5.69-mm7/drivers/media/video/video-buf.c	Tue May 20 09:10:50 2003
+++ linux-2.5.69-mm7.install_new_page/drivers/media/video/video-buf.c	Thu May 22 17:59:32 2003
@@ -979,21 +979,21 @@
  * now ...). Bounce buffers don't work very well for the data rates
  * video capture has.
  */
-static struct page*
-videobuf_vm_nopage(struct vm_area_struct *vma, unsigned long vaddr,
-		  int write_access)
+static int
+videobuf_vm_nopage(struct mm_struct *mm, struct vm_area_struct *vma,
+		   unsigned long vaddr, int write_access, pmd_t *pmd)
 {
 	struct page *page;
 
 	dprintk(3,"nopage: fault @ %08lx [vma %08lx-%08lx]\n",
 		vaddr,vma->vm_start,vma->vm_end);
 	if (vaddr > vma->vm_end)
-		return NOPAGE_SIGBUS;
+		return VM_FAULT_SIGBUS;
 	page = alloc_page(GFP_USER);
 	if (!page)
-		return NOPAGE_OOM;
+		return VM_FAULT_OOM;
 	clear_user_page(page_address(page), vaddr, page);
-	return page;
+	return install_new_page(mm, vma, vaddr, write_access, pmd, page);
 }
 
 static struct vm_operations_struct videobuf_vm_ops =
diff -urN -X dontdiff linux-2.5.69-mm7/drivers/scsi/sg.c linux-2.5.69-mm7.install_new_page/drivers/scsi/sg.c
--- linux-2.5.69-mm7/drivers/scsi/sg.c	Tue May 20 09:10:57 2003
+++ linux-2.5.69-mm7.install_new_page/drivers/scsi/sg.c	Thu May 22 16:34:00 2003
@@ -1121,21 +1121,22 @@
 	}
 }
 
-static struct page *
-sg_vma_nopage(struct vm_area_struct *vma, unsigned long addr, int unused)
+static int
+sg_vma_nopage(struct mm_struct *mm, struct vm_area_struct *vma,
+	      unsigned long addr, int write_access, pmd_t *pmd)
 {
 	Sg_fd *sfp;
-	struct page *page = NOPAGE_SIGBUS;
+	struct page *page = NULL;
 	void *page_ptr = NULL;
 	unsigned long offset;
 	Sg_scatter_hold *rsv_schp;
 
 	if ((NULL == vma) || (!(sfp = (Sg_fd *) vma->vm_private_data)))
-		return page;
+		return VM_FAULT_SIGBUS;
 	rsv_schp = &sfp->reserve;
 	offset = addr - vma->vm_start;
 	if (offset >= rsv_schp->bufflen)
-		return page;
+		return VM_FAULT_SIGBUS;
 	SCSI_LOG_TIMEOUT(3, printk("sg_vma_nopage: offset=%lu, scatg=%d\n",
 				   offset, rsv_schp->k_use_sg));
 	if (rsv_schp->k_use_sg) {	/* reserve buffer is a scatter gather list */
@@ -1162,7 +1163,7 @@
 		page = virt_to_page(page_ptr);
 		get_page(page);	/* increment page count */
 	}
-	return page;
+	return page ? install_new_page(mm, vma, addr, write_access, pmd, page) : VM_FAULT_SIGBUS;
 }
 
 static struct vm_operations_struct sg_mmap_vm_ops = {
diff -urN -X dontdiff linux-2.5.69-mm7/drivers/sgi/char/graphics.c linux-2.5.69-mm7.install_new_page/drivers/sgi/char/graphics.c
--- linux-2.5.69-mm7/drivers/sgi/char/graphics.c	Sun May 4 16:53:31 2003
+++ linux-2.5.69-mm7.install_new_page/drivers/sgi/char/graphics.c	Thu May 22 16:37:00 2003
@@ -211,9 +211,9 @@
 /*
  * This is the core of the direct rendering engine.
  */
-struct page *
-sgi_graphics_nopage (struct vm_area_struct *vma, unsigned long address, int
-		     no_share)
+int
+sgi_graphics_nopage (struct mm_struct *mm, struct vm_area_struct *vma,
+		     unsigned long address, int write_access, pmd_t *pmdpf)
 {
 	pgd_t *pgd; pmd_t *pmd; pte_t *pte;
 	int board = GRAPHICS_CARD (vma->vm_dentry->d_inode->i_rdev);
@@ -249,7 +249,7 @@
 	pte = pte_kmap_offset(pmd, address);
 	page = pte_page(*pte);
 	pte_kunmap(pte);
-	return page;
+	return install_new_page(mm, vma, address, write_access, pmdpf, page);
 }
 
 /*
diff -urN -X dontdiff linux-2.5.69-mm7/fs/ncpfs/mmap.c linux-2.5.69-mm7.install_new_page/fs/ncpfs/mmap.c
--- linux-2.5.69-mm7/fs/ncpfs/mmap.c	Sun May 4 16:53:35 2003
+++ linux-2.5.69-mm7.install_new_page/fs/ncpfs/mmap.c	Thu May 22 16:28:25 2003
@@ -25,8 +25,10 @@
 /*
  * Fill in the supplied page for mmap
  */
-static struct page* ncp_file_mmap_nopage(struct vm_area_struct *area,
-				     unsigned long address, int write_access)
+static int ncp_file_mmap_nopage(struct mm_struct *mm,
+				struct vm_area_struct *area,
+				unsigned long address, int write_access,
+				pmd_t *pmd)
 {
 	struct file *file = area->vm_file;
 	struct dentry *dentry = file->f_dentry;
@@ -85,7 +87,7 @@
 	memset(pg_addr + already_read, 0, PAGE_SIZE - already_read);
 	flush_dcache_page(page);
 	kunmap(page);
-	return page;
+	return install_new_page(mm, area, address, write_access, pmd, page);
 }
 
 static struct vm_operations_struct ncp_file_mmap =
diff -urN -X dontdiff linux-2.5.69-mm7/include/linux/mm.h linux-2.5.69-mm7.install_new_page/include/linux/mm.h
--- linux-2.5.69-mm7/include/linux/mm.h	Tue May 20 09:11:08 2003
+++ linux-2.5.69-mm7.install_new_page/include/linux/mm.h	Thu May 22 20:00:41 2003
@@ -142,7 +142,8 @@
 struct vm_operations_struct {
 	void (*open)(struct vm_area_struct * area);
 	void (*close)(struct vm_area_struct * area);
-	struct page * (*nopage)(struct vm_area_struct * area, unsigned long address, int unused);
+	int (*nopage)(struct mm_struct * mm, struct vm_area_struct * area,
+		      unsigned long address, int write_access, pmd_t *pmd);
 	int (*populate)(struct vm_area_struct * area, unsigned long address, unsigned long len, pgprot_t prot, unsigned long pgoff, int nonblock);
 };
 
@@ -380,12 +381,6 @@
 }
 
 /*
- * Error return values for the *_nopage functions
- */
-#define NOPAGE_SIGBUS	(NULL)
-#define NOPAGE_OOM	((struct page *) (-1))
-
-/*
  * Different kinds of faults, as returned by handle_mm_fault().
  * Used to decide whether a process gets delivered SIGBUS or
  * just gets major/minor fault counters bumped up.
@@ -402,8 +397,8 @@
 
 extern void show_free_areas(void);
 
-struct page *shmem_nopage(struct vm_area_struct * vma,
-			  unsigned long address, int unused);
+int shmem_nopage(struct mm_struct * mm, struct vm_area_struct * vma,
+		 unsigned long address, int write_access, pmd_t * pmd);
 struct file *shmem_file_setup(char * name, loff_t size, unsigned long flags);
 void shmem_lock(struct file * file, int lock);
 int shmem_zero_setup(struct vm_area_struct *);
@@ -421,6 +416,9 @@
 int zeromap_page_range(struct vm_area_struct *vma, unsigned long from,
 		       unsigned long size, pgprot_t prot);
 
+extern int install_new_page(struct mm_struct *mm, struct vm_area_struct *vma,
+			    unsigned long address, int write_access,
+			    pmd_t *pmd, struct page * new_page);
 extern void invalidate_mmap_range(struct address_space *mapping,
 				  loff_t const holebegin,
 				  loff_t const holelen);
@@ -559,7 +557,8 @@
 extern void truncate_inode_pages(struct address_space *, loff_t);
 
 /* generic vm_area_ops exported for stackable file systems */
-extern struct page *filemap_nopage(struct vm_area_struct *, unsigned long, int);
+int filemap_nopage(struct mm_struct *, struct vm_area_struct *,
+		   unsigned long, int, pmd_t *);
 
 /* mm/page-writeback.c */
 int write_one_page(struct page *page, int wait);
diff -urN -X dontdiff linux-2.5.69-mm7/kernel/ksyms.c linux-2.5.69-mm7.install_new_page/kernel/ksyms.c
--- linux-2.5.69-mm7/kernel/ksyms.c	Tue May 20 09:11:09 2003
+++ linux-2.5.69-mm7.install_new_page/kernel/ksyms.c	Thu May 22 14:54:24 2003
@@ -116,6 +116,7 @@
 EXPORT_SYMBOL(max_mapnr);
 #endif
 EXPORT_SYMBOL(high_memory);
+EXPORT_SYMBOL(install_new_page);
 EXPORT_SYMBOL(invalidate_mmap_range);
 EXPORT_SYMBOL(vmtruncate);
 EXPORT_SYMBOL(find_vma);
diff -urN -X dontdiff linux-2.5.69-mm7/mm/filemap.c linux-2.5.69-mm7.install_new_page/mm/filemap.c
--- linux-2.5.69-mm7/mm/filemap.c	Tue May 20 09:11:09 2003
+++ linux-2.5.69-mm7.install_new_page/mm/filemap.c	Thu May 22 19:55:56 2003
@@ -982,7 +982,8 @@
  * it in the page cache, and handles the special cases reasonably without
  * having a lot of duplicated code.
  */
-struct page * filemap_nopage(struct vm_area_struct * area, unsigned long address, int unused)
+int filemap_nopage(struct mm_struct * mm, struct vm_area_struct * area,
+		   unsigned long address, int write_access, pmd_t * pmd)
 {
 	int error;
 	struct file *file = area->vm_file;
@@ -1003,7 +1004,7 @@
 	 */
 	size = (inode->i_size + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 	if ((pgoff >= size) && (area->vm_mm == current->mm))
-		return NULL;
+		return VM_FAULT_SIGBUS;
 
 	/*
 	 * The "size" of the file, as far as mmap is concerned, isn't bigger
@@ -1057,7 +1058,7 @@
 	 * Found the page and have a reference on it.
 	 */
 	mark_page_accessed(page);
-	return page;
+	return install_new_page(mm, area, address, write_access, pmd, page);
 
 no_cached_page:
 	/*
@@ -1080,8 +1081,8 @@
 	 * to schedule I/O.
 	 */
 	if (error == -ENOMEM)
-		return NOPAGE_OOM;
-	return NULL;
+		return VM_FAULT_OOM;
+	return VM_FAULT_SIGBUS;
 
 page_not_uptodate:
 	inc_page_state(pgmajfault);
@@ -1138,7 +1139,7 @@
 	 * mm layer so, possibly freeing the page cache page first.
 	 */
 	page_cache_release(page);
-	return NULL;
+	return VM_FAULT_SIGBUS;
 }
 
 static struct page * filemap_getpage(struct file *file, unsigned long pgoff,
diff -urN -X dontdiff linux-2.5.69-mm7/mm/memory.c linux-2.5.69-mm7.install_new_page/mm/memory.c
--- linux-2.5.69-mm7/mm/memory.c	Tue May 20 09:11:09 2003
+++ linux-2.5.69-mm7.install_new_page/mm/memory.c	Thu May 22 14:56:27 2003
@@ -1385,28 +1385,49 @@
  * This is called with the MM semaphore held and the page table
  * spinlock held. Exit with the spinlock released.
  */
-static int
+static inline int
 do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	unsigned long address, int write_access, pte_t *page_table, pmd_t *pmd)
 {
-	struct page * new_page;
-	pte_t entry;
-	struct pte_chain *pte_chain;
-	int ret;
-
 	if (!vma->vm_ops || !vma->vm_ops->nopage)
 		return do_anonymous_page(mm, vma, page_table,
-					pmd, write_access, address);
+				pmd, write_access, address);
 	pte_unmap(page_table);
 	spin_unlock(&mm->page_table_lock);
 
-	new_page = vma->vm_ops->nopage(vma, address & PAGE_MASK, 0);
+	return vma->vm_ops->nopage(mm, vma, address & PAGE_MASK,
+				   write_access, pmd);
+}
 
-	/* no page was available -- either SIGBUS or OOM */
-	if (new_page == NOPAGE_SIGBUS)
-		return VM_FAULT_SIGBUS;
-	if (new_page == NOPAGE_OOM)
-		return VM_FAULT_OOM;
+/**
+ * install_new_page - tries to create a new page mapping.
+ * @mm: mmap structure, locus of locking and RSS activity.
+ * @vma: the vm_area_struct controlling the virtual address at
+ *	which the page fault occurred.
+ * @address: address of fault.
+ * @write_access: write access required.
+ * @pmd: PMD for faulting address.
+ * @new_page: physical page to satisfy fault.
+ *
+ * The install_new_page() function aggressively tries to share with
+ * existing pages, but makes a separate copy if the "write_access"
+ * parameter is true in order to avoid the next page fault.
+ *
+ * As this is called only for pages that do not currently exist, we
+ * do not need to flush old virtual caches or the TLB.
+ *
+ * This is called with the MM semaphore held but without the page
+ * table spinlock, which it acquires and releases before returning.
+ */
+int
+install_new_page(struct mm_struct *mm, struct vm_area_struct *vma,
+		 unsigned long address, int write_access,
+		 pmd_t *pmd, struct page * new_page)
+{
+	pte_t entry;
+	pte_t *page_table;
+	struct pte_chain *pte_chain;
+	int ret;
 
 	pte_chain = pte_chain_alloc(GFP_KERNEL);
 	if (!pte_chain)
diff -urN -X dontdiff linux-2.5.69-mm7/mm/shmem.c linux-2.5.69-mm7.install_new_page/mm/shmem.c
--- linux-2.5.69-mm7/mm/shmem.c	Tue May 20 09:11:10 2003
+++ linux-2.5.69-mm7.install_new_page/mm/shmem.c	Thu May 22 17:18:55 2003
@@ -936,7 +936,8 @@
 	return error;
 }
 
-struct page *shmem_nopage(struct vm_area_struct *vma, unsigned long address, int unused)
+int shmem_nopage(struct mm_struct *mm, struct vm_area_struct *vma,
+		 unsigned long address, int write_access, pmd_t *pmd)
 {
 	struct inode *inode = vma->vm_file->f_dentry->d_inode;
 	struct page *page = NULL;
@@ -949,10 +950,10 @@
 
 	error = shmem_getpage(inode, idx, &page, SGP_CACHE);
 	if (error)
-		return (error == -ENOMEM)? NOPAGE_OOM: NOPAGE_SIGBUS;
+		return (error == -ENOMEM)? VM_FAULT_OOM: VM_FAULT_SIGBUS;
 
 	mark_page_accessed(page);
-	return page;
+	return install_new_page(mm, vma, address, write_access, pmd, page);
 }
 
 static int shmem_populate(struct vm_area_struct *vma,
diff -urN -X dontdiff linux-2.5.69-mm7/sound/core/pcm_native.c linux-2.5.69-mm7.install_new_page/sound/core/pcm_native.c
--- linux-2.5.69-mm7/sound/core/pcm_native.c	Sun May 4 16:53:09 2003
+++ linux-2.5.69-mm7.install_new_page/sound/core/pcm_native.c	Thu May 22 20:07:04 2003
@@ -2693,7 +2693,7 @@
 #endif
 
 #ifndef LINUX_2_2
-static struct page * snd_pcm_mmap_status_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
+static int snd_pcm_mmap_status_nopage(struct mm_struct *mm, struct vm_area_struct *area, unsigned long address, int write_access, pmd_t *pmd)
 #else
 static unsigned long snd_pcm_mmap_status_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
 #endif
@@ -2703,12 +2703,12 @@
 	struct page * page;
 
 	if (substream == NULL)
-		return NOPAGE_OOM;
+		return VM_FAULT_OOM;
 	runtime = substream->runtime;
 	page = virt_to_page(runtime->status);
 	get_page(page);
 #ifndef LINUX_2_2
-	return page;
+	return install_new_page(mm, area, address, write_access, pmd, page);
 #else
 	return page_address(page);
 #endif
@@ -2747,7 +2747,7 @@
 }
 
 #ifndef LINUX_2_2
-static struct page * snd_pcm_mmap_control_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
+static int snd_pcm_mmap_control_nopage(struct mm_struct *mm, struct vm_area_struct *area, unsigned long address, int write_access, pmd_t *pmd)
 #else
 static unsigned long snd_pcm_mmap_control_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
 #endif
@@ -2757,12 +2757,12 @@
 	struct page * page;
 
 	if (substream == NULL)
-		return NOPAGE_OOM;
+		return VM_FAULT_OOM;
 	runtime = substream->runtime;
 	page = virt_to_page(runtime->control);
 	get_page(page);
 #ifndef LINUX_2_2
-	return page;
+	return install_new_page(mm, area, address, write_access, pmd, page);
 #else
 	return page_address(page);
 #endif
@@ -2813,7 +2813,7 @@
 }
 
 #ifndef LINUX_2_2
-static struct page * snd_pcm_mmap_data_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
+static int snd_pcm_mmap_data_nopage(struct mm_struct *mm, struct vm_area_struct *area, unsigned long address, int write_access, pmd_t *pmd)
 #else
 static unsigned long snd_pcm_mmap_data_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
 #endif
@@ -2826,7 +2826,7 @@
 	size_t dma_bytes;
 
 	if (substream == NULL)
-		return NOPAGE_OOM;
+		return VM_FAULT_OOM;
 	runtime = substream->runtime;
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 3, 25)
 	offset = area->vm_pgoff << PAGE_SHIFT;
@@ -2834,21 +2834,21 @@
 	offset = area->vm_offset;
 #endif
 	offset += address - area->vm_start;
-	snd_assert((offset % PAGE_SIZE) == 0, return NOPAGE_OOM);
+	snd_assert((offset % PAGE_SIZE) == 0, return VM_FAULT_OOM);
 	dma_bytes = PAGE_ALIGN(runtime->dma_bytes);
 	if (offset > dma_bytes - PAGE_SIZE)
-		return NOPAGE_SIGBUS;
+		return VM_FAULT_SIGBUS;
 	if (substream->ops->page) {
 		page = substream->ops->page(substream, offset);
 		if (! page)
-			return NOPAGE_OOM;
+			return VM_FAULT_OOM;
 	} else {
 		vaddr = runtime->dma_area + offset;
 		page = virt_to_page(vaddr);
 	}
 	get_page(page);
 #ifndef LINUX_2_2
-	return page;
+	return install_new_page(mm, area, address, write_access, pmd, page);
 #else
 	return page_address(page);
 #endif
diff -urN -X dontdiff linux-2.5.69-mm7/sound/oss/emu10k1/audio.c linux-2.5.69-mm7.install_new_page/sound/oss/emu10k1/audio.c
--- linux-2.5.69-mm7/sound/oss/emu10k1/audio.c	Sun May 4 16:53:02 2003
+++ linux-2.5.69-mm7.install_new_page/sound/oss/emu10k1/audio.c	Thu May 22 16:18:16 2003
@@ -970,7 +970,7 @@
 	return 0;
 }
 
-static struct page *emu10k1_mm_nopage (struct vm_area_struct * vma, unsigned long address, int write_access)
+static int emu10k1_mm_nopage (struct mm_struct * mm, struct vm_area_struct * vma, unsigned long address, int write_access, pmd_t * pmd)
 {
 	struct emu10k1_wavedevice *wave_dev = vma->vm_private_data;
 	struct woinst *woinst = wave_dev->woinst;
@@ -983,8 +983,8 @@
 	DPD(3, "addr: %#lx\n", address);
 
 	if (address > vma->vm_end) {
-		DPF(1, "EXIT, returning NOPAGE_SIGBUS\n");
-		return NOPAGE_SIGBUS; /* Disallow mremap */
+		DPF(1, "EXIT, returning VM_FAULT_SIGBUS\n");
+		return VM_FAULT_SIGBUS; /* Disallow mremap */
 	}
 
 	pgoff = vma->vm_pgoff + ((address - vma->vm_start) >> PAGE_SHIFT);
@@ -1013,7 +1013,7 @@
 	get_page (dmapage);
 
 	DPD(3, "page: %#lx\n", (unsigned long) dmapage);
-	return dmapage;
+	return install_new_page(mm, vma, address, write_access, pmd, dmapage);
 }
 
 struct vm_operations_struct emu10k1_mm_ops = {
diff -urN -X dontdiff linux-2.5.69-mm7/sound/oss/via82cxxx_audio.c linux-2.5.69-mm7.install_new_page/sound/oss/via82cxxx_audio.c
--- linux-2.5.69-mm7/sound/oss/via82cxxx_audio.c	Sun May 4 16:53:08 2003
+++ linux-2.5.69-mm7.install_new_page/sound/oss/via82cxxx_audio.c	Thu May 22 17:48:25 2003
@@ -1846,8 +1846,8 @@
 }
 
 
-static struct page * via_mm_nopage (struct vm_area_struct * vma,
-				    unsigned long address, int write_access)
+static int via_mm_nopage (struct mm_struct *mm, struct vm_area_struct * vma,
+			  unsigned long address, int write_access, pmd_t *pmd)
 {
 	struct via_info *card = vma->vm_private_data;
 	struct via_channel *chan = &card->ch_out;
@@ -1863,12 +1863,12 @@
 		 write_access);
 
 	if (address > vma->vm_end) {
-		DPRINTK ("EXIT, returning NOPAGE_SIGBUS\n");
-		return NOPAGE_SIGBUS; /* Disallow mremap */
+		DPRINTK ("EXIT, returning VM_FAULT_SIGBUS\n");
+		return VM_FAULT_SIGBUS; /* Disallow mremap */
 	}
 	if (!card) {
-		DPRINTK ("EXIT, returning NOPAGE_OOM\n");
-		return NOPAGE_OOM; /* Nothing allocated */
+		DPRINTK ("EXIT, returning VM_FAULT_OOM\n");
+		return VM_FAULT_OOM; /* Nothing allocated */
 	}
 
 	pgoff = vma->vm_pgoff + ((address - vma->vm_start) >> PAGE_SHIFT);
@@ -1895,10 +1895,10 @@
 	assert ((((unsigned long)chan->pgtbl[pgoff].cpuaddr) % PAGE_SIZE) == 0);
 
 	dmapage = virt_to_page (chan->pgtbl[pgoff].cpuaddr);
-	DPRINTK ("EXIT, returning page %p for cpuaddr %lXh\n",
+	DPRINTK ("EXIT, installing page %p for cpuaddr %lXh\n",
 		 dmapage, (unsigned long) chan->pgtbl[pgoff].cpuaddr);
 	get_page (dmapage);
-	return dmapage;
+	return install_new_page(mm, vma, address, write_access, pmd, dmapage);
 }
* Re: [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race
2003-05-23 14:35 ` [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race Paul E. McKenney
@ 2003-05-23 16:21 ` Hugh Dickins
2003-05-23 17:10 ` Daniel Phillips
0 siblings, 1 reply; 12+ messages in thread
From: Hugh Dickins @ 2003-05-23 16:21 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: Andrew Morton, phillips, hch, linux-mm, linux-kernel

On Fri, 23 May 2003, Paul E. McKenney wrote:
> On Tue, May 20, 2003 at 01:11:57AM -0700, Andrew Morton wrote:
> >
> > However there is not a lot of commonality between the various nopage()s and
> > there may not be a lot to be gained from all this. There is subtle code in
> > there and it is performance-critical. I'd be inclined to try to minimise
> > overall code churn in this work.
>
> Good point! Here is a patch to do this. A "few" caveats:

Sorry, I miss the point of this patch entirely. At the moment it just
looks like an unattractive rearrangement - the code churn akpm advised
against - with no bearing on that vmtruncate race. Please correct me.

Hugh
* Re: [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race
2003-05-23 16:21 ` Hugh Dickins
@ 2003-05-23 17:10 ` Daniel Phillips
2003-05-23 17:47 ` Hugh Dickins
0 siblings, 1 reply; 12+ messages in thread
From: Daniel Phillips @ 2003-05-23 17:10 UTC (permalink / raw)
To: Hugh Dickins, Paul E. McKenney; +Cc: Andrew Morton, hch, linux-mm, linux-kernel

On Friday 23 May 2003 18:21, Hugh Dickins wrote:
> Sorry, I miss the point of this patch entirely. At the moment it just
> looks like an unattractive rearrangement - the code churn akpm advised
> against - with no bearing on that vmtruncate race. Please correct me.

This is all about supporting cross-host mmap (nice trick, huh?). Yes,
somebody should post a detailed rfc on that subject.

Regards,

Daniel
* Re: [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race
2003-05-23 17:10 ` Daniel Phillips
@ 2003-05-23 17:47 ` Hugh Dickins
0 siblings, 0 replies; 12+ messages in thread
From: Hugh Dickins @ 2003-05-23 17:47 UTC (permalink / raw)
To: Daniel Phillips
Cc: Paul E. McKenney, Andrew Morton, hch, linux-mm, linux-kernel

On Fri, 23 May 2003, Daniel Phillips wrote:
> On Friday 23 May 2003 18:21, Hugh Dickins wrote:
> > Sorry, I miss the point of this patch entirely. At the moment it just
> > looks like an unattractive rearrangement - the code churn akpm advised
> > against - with no bearing on that vmtruncate race. Please correct me.
>
> This is all about supporting cross-host mmap (nice trick, huh?). Yes,
> somebody should post a detailed rfc on that subject.

Ah, thanks - translated into terms that I can understand, so that
some ->nopage() not yet in the tree could do something after the
install_new_page() returns. Hmm. Can we be sure it's appropriate
for install_new_page to drop mm->page_table_lock before it returns?

Hugh
Thread overview: 12+ messages
2003-05-23 18:42 [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race Paul E. McKenney
2003-05-29 15:14 ` Paul E. McKenney
2003-05-29 15:18 ` [RFC][PATCH] Remove LINUX_2_2 Paul E. McKenney
2003-05-29 16:33 ` [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race Hugh Dickins
2003-05-29 17:15 ` Daniel Phillips
2003-05-29 17:39 ` Daniel Phillips
2003-05-29 20:24 ` Paul E. McKenney
2003-05-30 2:38 ` Paul E. McKenney
-- strict thread matches above, loose matches on Subject: below --
2003-05-17 18:21 [RFC][PATCH] vm_operation to avoid pagefault/inval race Daniel Phillips
2003-05-17 19:49 ` Andrew Morton
2003-05-20 1:23 ` Paul E. McKenney
2003-05-20 8:11 ` Andrew Morton
2003-05-23 14:35 ` [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race Paul E. McKenney
2003-05-23 16:21 ` Hugh Dickins
2003-05-23 17:10 ` Daniel Phillips
2003-05-23 17:47 ` Hugh Dickins