* [PATCH 0/7] [RFC] hugetlb: pagetable_operations API @ 2007-02-19 18:31 ` Adam Litke 0 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-02-19 18:31 UTC (permalink / raw) To: linux-mm; +Cc: linux-kernel, agl The page tables for hugetlb mappings are handled differently than page tables for normal pages. Rather than integrating multiple page size support into the main VM (which would tremendously complicate the code) some hooks were created. This allows hugetlb special cases to be handled "out of line" by a separate interface. Hugetlbfs was the huge page interface chosen. At the time, large database users were the only big users of huge pages and the hugetlbfs design meets their needs pretty well. Over time, hugetlbfs has been expanded to enable new uses of huge page memory with varied results. As features are added, the semantics become a permanent part of the Linux API. This makes maintenance of hugetlbfs an increasingly difficult task and inhibits the addition of features and functionality in support of ever-changing hardware. To remedy the situation, I propose an API (currently called pagetable_operations). All of the current hugetlbfs-specific hooks are moved into an operations struct that is attached to VMAs. The end result is a more explicit and IMO a cleaner interface between hugetlbfs and the core VM. We are then free to add other hugetlb interfaces (such as a /dev/zero-styled character device) that can operate either in concert with or independent of hugetlbfs. There should be no measurable performance impact for normal page users (we're checking if pagetable_ops != NULL instead of checking for vm_flags & VM_HUGETLB). Of course we do increase the VMA size by one pointer. For huge pages, there is an added indirection for pt_op() calls. This patch series does not change the logic of the the hugetlbfs operations, just moves them into the pagetable_operations struct. Comments? Do you think it's as good of an idea as I do? ^ permalink raw reply [flat|nested] 68+ messages in thread
* [PATCH 0/7] [RFC] hugetlb: pagetable_operations API @ 2007-02-19 18:31 ` Adam Litke 0 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-02-19 18:31 UTC (permalink / raw) To: linux-mm; +Cc: linux-kernel, agl The page tables for hugetlb mappings are handled differently than page tables for normal pages. Rather than integrating multiple page size support into the main VM (which would tremendously complicate the code) some hooks were created. This allows hugetlb special cases to be handled "out of line" by a separate interface. Hugetlbfs was the huge page interface chosen. At the time, large database users were the only big users of huge pages and the hugetlbfs design meets their needs pretty well. Over time, hugetlbfs has been expanded to enable new uses of huge page memory with varied results. As features are added, the semantics become a permanent part of the Linux API. This makes maintenance of hugetlbfs an increasingly difficult task and inhibits the addition of features and functionality in support of ever-changing hardware. To remedy the situation, I propose an API (currently called pagetable_operations). All of the current hugetlbfs-specific hooks are moved into an operations struct that is attached to VMAs. The end result is a more explicit and IMO a cleaner interface between hugetlbfs and the core VM. We are then free to add other hugetlb interfaces (such as a /dev/zero-styled character device) that can operate either in concert with or independent of hugetlbfs. There should be no measurable performance impact for normal page users (we're checking if pagetable_ops != NULL instead of checking for vm_flags & VM_HUGETLB). Of course we do increase the VMA size by one pointer. For huge pages, there is an added indirection for pt_op() calls. This patch series does not change the logic of the the hugetlbfs operations, just moves them into the pagetable_operations struct. Comments? Do you think it's as good of an idea as I do? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. 2007-02-19 18:31 ` Adam Litke @ 2007-02-19 18:31 ` Adam Litke -1 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-02-19 18:31 UTC (permalink / raw) To: linux-mm; +Cc: linux-kernel, agl Signed-off-by: Adam Litke <agl@us.ibm.com> --- include/linux/mm.h | 25 +++++++++++++++++++++++++ 1 files changed, 25 insertions(+), 0 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 2d2c08d..a2fa66d 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -98,6 +98,7 @@ struct vm_area_struct { /* Function pointers to deal with this struct. */ struct vm_operations_struct * vm_ops; + struct pagetable_operations_struct * pagetable_ops; /* Information about our backing store: */ unsigned long vm_pgoff; /* Offset (within vm_file) in PAGE_SIZE @@ -218,6 +219,30 @@ struct vm_operations_struct { }; struct mmu_gather; + +struct pagetable_operations_struct { + int (*fault)(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long address, int write_access); + int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src, + struct vm_area_struct *vma); + int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma, + struct page **pages, struct vm_area_struct **vmas, + unsigned long *position, int *length, int i); + void (*change_protection)(struct vm_area_struct *vma, + unsigned long address, unsigned long end, pgprot_t newprot); + unsigned long (*unmap_page_range)(struct vm_area_struct *vma, + unsigned long address, unsigned long end, long *zap_work); + void (*free_pgtable_range)(struct mmu_gather **tlb, + unsigned long addr, unsigned long end, + unsigned long floor, unsigned long ceiling); +}; + +#define has_pt_op(vma, op) \ + ((vma)->pagetable_ops && (vma)->pagetable_ops->op) +#define pt_op(vma, call) \ + ((vma)->pagetable_ops->call) + struct inode; #define page_private(page) ((page)->private) ^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. @ 2007-02-19 18:31 ` Adam Litke 0 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-02-19 18:31 UTC (permalink / raw) To: linux-mm; +Cc: linux-kernel, agl Signed-off-by: Adam Litke <agl@us.ibm.com> --- include/linux/mm.h | 25 +++++++++++++++++++++++++ 1 files changed, 25 insertions(+), 0 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 2d2c08d..a2fa66d 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -98,6 +98,7 @@ struct vm_area_struct { /* Function pointers to deal with this struct. */ struct vm_operations_struct * vm_ops; + struct pagetable_operations_struct * pagetable_ops; /* Information about our backing store: */ unsigned long vm_pgoff; /* Offset (within vm_file) in PAGE_SIZE @@ -218,6 +219,30 @@ struct vm_operations_struct { }; struct mmu_gather; + +struct pagetable_operations_struct { + int (*fault)(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long address, int write_access); + int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src, + struct vm_area_struct *vma); + int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma, + struct page **pages, struct vm_area_struct **vmas, + unsigned long *position, int *length, int i); + void (*change_protection)(struct vm_area_struct *vma, + unsigned long address, unsigned long end, pgprot_t newprot); + unsigned long (*unmap_page_range)(struct vm_area_struct *vma, + unsigned long address, unsigned long end, long *zap_work); + void (*free_pgtable_range)(struct mmu_gather **tlb, + unsigned long addr, unsigned long end, + unsigned long floor, unsigned long ceiling); +}; + +#define has_pt_op(vma, op) \ + ((vma)->pagetable_ops && (vma)->pagetable_ops->op) +#define pt_op(vma, call) \ + ((vma)->pagetable_ops->call) + struct inode; #define page_private(page) ((page)->private) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. 2007-02-19 18:31 ` Adam Litke @ 2007-02-19 18:41 ` Arjan van de Ven -1 siblings, 0 replies; 68+ messages in thread From: Arjan van de Ven @ 2007-02-19 18:41 UTC (permalink / raw) To: Adam Litke; +Cc: linux-mm, linux-kernel On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote: > Signed-off-by: Adam Litke <agl@us.ibm.com> > --- > > include/linux/mm.h | 25 +++++++++++++++++++++++++ > 1 files changed, 25 insertions(+), 0 deletions(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 2d2c08d..a2fa66d 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -98,6 +98,7 @@ struct vm_area_struct { > > /* Function pointers to deal with this struct. */ > struct vm_operations_struct * vm_ops; > + struct pagetable_operations_struct * pagetable_ops; > please make it at least const, those things have no business ever being written to right? And by making them const the compiler helps catch that, and as bonus the data gets moved to rodata so that it won't share cachelines with anything that gets dirty ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. @ 2007-02-19 18:41 ` Arjan van de Ven 0 siblings, 0 replies; 68+ messages in thread From: Arjan van de Ven @ 2007-02-19 18:41 UTC (permalink / raw) To: Adam Litke; +Cc: linux-mm, linux-kernel On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote: > Signed-off-by: Adam Litke <agl@us.ibm.com> > --- > > include/linux/mm.h | 25 +++++++++++++++++++++++++ > 1 files changed, 25 insertions(+), 0 deletions(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 2d2c08d..a2fa66d 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -98,6 +98,7 @@ struct vm_area_struct { > > /* Function pointers to deal with this struct. */ > struct vm_operations_struct * vm_ops; > + struct pagetable_operations_struct * pagetable_ops; > please make it at least const, those things have no business ever being written to right? And by making them const the compiler helps catch that, and as bonus the data gets moved to rodata so that it won't share cachelines with anything that gets dirty -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. 2007-02-19 18:41 ` Arjan van de Ven @ 2007-02-19 19:31 ` Adam Litke -1 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-02-19 19:31 UTC (permalink / raw) To: Arjan van de Ven; +Cc: linux-mm, linux-kernel On Mon, 2007-02-19 at 19:41 +0100, Arjan van de Ven wrote: > On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote: > > Signed-off-by: Adam Litke <agl@us.ibm.com> > > --- > > > > include/linux/mm.h | 25 +++++++++++++++++++++++++ > > 1 files changed, 25 insertions(+), 0 deletions(-) > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > index 2d2c08d..a2fa66d 100644 > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -98,6 +98,7 @@ struct vm_area_struct { > > > > /* Function pointers to deal with this struct. */ > > struct vm_operations_struct * vm_ops; > > + struct pagetable_operations_struct * pagetable_ops; > > > > please make it at least const, those things have no business ever being > written to right? And by making them const the compiler helps catch > that, and as bonus the data gets moved to rodata so that it won't share > cachelines with anything that gets dirty Yep I agree. Changed. -- Adam Litke - (agl at us.ibm.com) IBM Linux Technology Center ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. @ 2007-02-19 19:31 ` Adam Litke 0 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-02-19 19:31 UTC (permalink / raw) To: Arjan van de Ven; +Cc: linux-mm, linux-kernel On Mon, 2007-02-19 at 19:41 +0100, Arjan van de Ven wrote: > On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote: > > Signed-off-by: Adam Litke <agl@us.ibm.com> > > --- > > > > include/linux/mm.h | 25 +++++++++++++++++++++++++ > > 1 files changed, 25 insertions(+), 0 deletions(-) > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > index 2d2c08d..a2fa66d 100644 > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -98,6 +98,7 @@ struct vm_area_struct { > > > > /* Function pointers to deal with this struct. */ > > struct vm_operations_struct * vm_ops; > > + struct pagetable_operations_struct * pagetable_ops; > > > > please make it at least const, those things have no business ever being > written to right? And by making them const the compiler helps catch > that, and as bonus the data gets moved to rodata so that it won't share > cachelines with anything that gets dirty Yep I agree. Changed. -- Adam Litke - (agl at us.ibm.com) IBM Linux Technology Center -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. 2007-02-19 18:31 ` Adam Litke @ 2007-02-19 19:48 ` William Lee Irwin III -1 siblings, 0 replies; 68+ messages in thread From: William Lee Irwin III @ 2007-02-19 19:48 UTC (permalink / raw) To: Adam Litke; +Cc: linux-mm, linux-kernel On Mon, Feb 19, 2007 at 10:31:34AM -0800, Adam Litke wrote: > +struct pagetable_operations_struct { > + int (*fault)(struct mm_struct *mm, > + struct vm_area_struct *vma, > + unsigned long address, int write_access); > + int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src, > + struct vm_area_struct *vma); > + int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma, > + struct page **pages, struct vm_area_struct **vmas, > + unsigned long *position, int *length, int i); > + void (*change_protection)(struct vm_area_struct *vma, > + unsigned long address, unsigned long end, pgprot_t newprot); > + unsigned long (*unmap_page_range)(struct vm_area_struct *vma, > + unsigned long address, unsigned long end, long *zap_work); > + void (*free_pgtable_range)(struct mmu_gather **tlb, > + unsigned long addr, unsigned long end, > + unsigned long floor, unsigned long ceiling); > +}; I very very strongly approve of the approach this operations structure entails. -- wli ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. @ 2007-02-19 19:48 ` William Lee Irwin III 0 siblings, 0 replies; 68+ messages in thread From: William Lee Irwin III @ 2007-02-19 19:48 UTC (permalink / raw) To: Adam Litke; +Cc: linux-mm, linux-kernel On Mon, Feb 19, 2007 at 10:31:34AM -0800, Adam Litke wrote: > +struct pagetable_operations_struct { > + int (*fault)(struct mm_struct *mm, > + struct vm_area_struct *vma, > + unsigned long address, int write_access); > + int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src, > + struct vm_area_struct *vma); > + int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma, > + struct page **pages, struct vm_area_struct **vmas, > + unsigned long *position, int *length, int i); > + void (*change_protection)(struct vm_area_struct *vma, > + unsigned long address, unsigned long end, pgprot_t newprot); > + unsigned long (*unmap_page_range)(struct vm_area_struct *vma, > + unsigned long address, unsigned long end, long *zap_work); > + void (*free_pgtable_range)(struct mmu_gather **tlb, > + unsigned long addr, unsigned long end, > + unsigned long floor, unsigned long ceiling); > +}; I very very strongly approve of the approach this operations structure entails. -- wli -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. 2007-02-19 18:31 ` Adam Litke @ 2007-02-19 22:29 ` Christoph Hellwig -1 siblings, 0 replies; 68+ messages in thread From: Christoph Hellwig @ 2007-02-19 22:29 UTC (permalink / raw) To: Adam Litke; +Cc: linux-mm, linux-kernel On Mon, Feb 19, 2007 at 10:31:34AM -0800, Adam Litke wrote: > Signed-off-by: Adam Litke <agl@us.ibm.com> > --- > > include/linux/mm.h | 25 +++++++++++++++++++++++++ > 1 files changed, 25 insertions(+), 0 deletions(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 2d2c08d..a2fa66d 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -98,6 +98,7 @@ struct vm_area_struct { > > /* Function pointers to deal with this struct. */ > struct vm_operations_struct * vm_ops; > + struct pagetable_operations_struct * pagetable_ops; > > /* Information about our backing store: */ > unsigned long vm_pgoff; /* Offset (within vm_file) in PAGE_SIZE > @@ -218,6 +219,30 @@ struct vm_operations_struct { > }; > > struct mmu_gather; > + > +struct pagetable_operations_struct { > + int (*fault)(struct mm_struct *mm, > + struct vm_area_struct *vma, > + unsigned long address, int write_access); > + int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src, > + struct vm_area_struct *vma); > + int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma, > + struct page **pages, struct vm_area_struct **vmas, > + unsigned long *position, int *length, int i); > + void (*change_protection)(struct vm_area_struct *vma, > + unsigned long address, unsigned long end, pgprot_t newprot); > + unsigned long (*unmap_page_range)(struct vm_area_struct *vma, > + unsigned long address, unsigned long end, long *zap_work); > + void (*free_pgtable_range)(struct mmu_gather **tlb, > + unsigned long addr, unsigned long end, > + unsigned long floor, unsigned long ceiling); > +}; I don't think adding another operation vector is a good idea. But I'd rather extend the vma operations vector to deal with all nessecary buts ubstead if addubg a second one. ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. @ 2007-02-19 22:29 ` Christoph Hellwig 0 siblings, 0 replies; 68+ messages in thread From: Christoph Hellwig @ 2007-02-19 22:29 UTC (permalink / raw) To: Adam Litke; +Cc: linux-mm, linux-kernel On Mon, Feb 19, 2007 at 10:31:34AM -0800, Adam Litke wrote: > Signed-off-by: Adam Litke <agl@us.ibm.com> > --- > > include/linux/mm.h | 25 +++++++++++++++++++++++++ > 1 files changed, 25 insertions(+), 0 deletions(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 2d2c08d..a2fa66d 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -98,6 +98,7 @@ struct vm_area_struct { > > /* Function pointers to deal with this struct. */ > struct vm_operations_struct * vm_ops; > + struct pagetable_operations_struct * pagetable_ops; > > /* Information about our backing store: */ > unsigned long vm_pgoff; /* Offset (within vm_file) in PAGE_SIZE > @@ -218,6 +219,30 @@ struct vm_operations_struct { > }; > > struct mmu_gather; > + > +struct pagetable_operations_struct { > + int (*fault)(struct mm_struct *mm, > + struct vm_area_struct *vma, > + unsigned long address, int write_access); > + int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src, > + struct vm_area_struct *vma); > + int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma, > + struct page **pages, struct vm_area_struct **vmas, > + unsigned long *position, int *length, int i); > + void (*change_protection)(struct vm_area_struct *vma, > + unsigned long address, unsigned long end, pgprot_t newprot); > + unsigned long (*unmap_page_range)(struct vm_area_struct *vma, > + unsigned long address, unsigned long end, long *zap_work); > + void (*free_pgtable_range)(struct mmu_gather **tlb, > + unsigned long addr, unsigned long end, > + unsigned long floor, unsigned long ceiling); > +}; I don't think adding another operation vector is a good idea. But I'd rather extend the vma operations vector to deal with all nessecary buts ubstead if addubg a second one. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. 2007-02-19 22:29 ` Christoph Hellwig @ 2007-02-20 15:50 ` Mel Gorman -1 siblings, 0 replies; 68+ messages in thread From: Mel Gorman @ 2007-02-20 15:50 UTC (permalink / raw) To: Christoph Hellwig, Adam Litke, linux-mm, linux-kernel On (19/02/07 22:29), Christoph Hellwig didst pronounce: > On Mon, Feb 19, 2007 at 10:31:34AM -0800, Adam Litke wrote: > > Signed-off-by: Adam Litke <agl@us.ibm.com> > > --- > > > > include/linux/mm.h | 25 +++++++++++++++++++++++++ > > 1 files changed, 25 insertions(+), 0 deletions(-) > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > index 2d2c08d..a2fa66d 100644 > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -98,6 +98,7 @@ struct vm_area_struct { > > > > /* Function pointers to deal with this struct. */ > > struct vm_operations_struct * vm_ops; > > + struct pagetable_operations_struct * pagetable_ops; > > > > /* Information about our backing store: */ > > unsigned long vm_pgoff; /* Offset (within vm_file) in PAGE_SIZE > > @@ -218,6 +219,30 @@ struct vm_operations_struct { > > }; > > > > struct mmu_gather; > > + > > +struct pagetable_operations_struct { > > + int (*fault)(struct mm_struct *mm, > > + struct vm_area_struct *vma, > > + unsigned long address, int write_access); > > + int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src, > > + struct vm_area_struct *vma); > > + int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma, > > + struct page **pages, struct vm_area_struct **vmas, > > + unsigned long *position, int *length, int i); > > + void (*change_protection)(struct vm_area_struct *vma, > > + unsigned long address, unsigned long end, pgprot_t newprot); > > + unsigned long (*unmap_page_range)(struct vm_area_struct *vma, > > + unsigned long address, unsigned long end, long *zap_work); > > + void (*free_pgtable_range)(struct mmu_gather **tlb, > > + unsigned long addr, unsigned long end, > > + unsigned long floor, unsigned long ceiling); > > +}; > > I don't think adding another operation vector is a good idea. But I'd > rather extend the vma operations vector to deal with all nessecary > buts ubstead if addubg a second one. Well, there are a lot of users of vm_operations_struct that have no interest in the operations in pagetable_operations_struct. Expanding vm_operations_struct would increase the size of all VMAs by more than is necessary. Also, having the pagetable ops in vm_operations_struct might lead device drivers to believe they should be doing something entertaining there. In reality, we would only want drivers playing with pagetable_operations when they really know what they are doing and why. Having the pagetable_ops set is similar to VM_HUGETLB set as a strong sign that something unusual is going on that is fairly easy to check for. I prefer the additional struct to extending VMAs anyway. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. @ 2007-02-20 15:50 ` Mel Gorman 0 siblings, 0 replies; 68+ messages in thread From: Mel Gorman @ 2007-02-20 15:50 UTC (permalink / raw) To: Christoph Hellwig, Adam Litke, linux-mm, linux-kernel On (19/02/07 22:29), Christoph Hellwig didst pronounce: > On Mon, Feb 19, 2007 at 10:31:34AM -0800, Adam Litke wrote: > > Signed-off-by: Adam Litke <agl@us.ibm.com> > > --- > > > > include/linux/mm.h | 25 +++++++++++++++++++++++++ > > 1 files changed, 25 insertions(+), 0 deletions(-) > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > index 2d2c08d..a2fa66d 100644 > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -98,6 +98,7 @@ struct vm_area_struct { > > > > /* Function pointers to deal with this struct. */ > > struct vm_operations_struct * vm_ops; > > + struct pagetable_operations_struct * pagetable_ops; > > > > /* Information about our backing store: */ > > unsigned long vm_pgoff; /* Offset (within vm_file) in PAGE_SIZE > > @@ -218,6 +219,30 @@ struct vm_operations_struct { > > }; > > > > struct mmu_gather; > > + > > +struct pagetable_operations_struct { > > + int (*fault)(struct mm_struct *mm, > > + struct vm_area_struct *vma, > > + unsigned long address, int write_access); > > + int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src, > > + struct vm_area_struct *vma); > > + int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma, > > + struct page **pages, struct vm_area_struct **vmas, > > + unsigned long *position, int *length, int i); > > + void (*change_protection)(struct vm_area_struct *vma, > > + unsigned long address, unsigned long end, pgprot_t newprot); > > + unsigned long (*unmap_page_range)(struct vm_area_struct *vma, > > + unsigned long address, unsigned long end, long *zap_work); > > + void (*free_pgtable_range)(struct mmu_gather **tlb, > > + unsigned long addr, unsigned long end, > > + unsigned long floor, unsigned long ceiling); > > +}; > > I don't think adding another operation vector is a good idea. But I'd > rather extend the vma operations vector to deal with all nessecary > buts ubstead if addubg a second one. Well, there are a lot of users of vm_operations_struct that have no interest in the operations in pagetable_operations_struct. Expanding vm_operations_struct would increase the size of all VMAs by more than is necessary. Also, having the pagetable ops in vm_operations_struct might lead device drivers to believe they should be doing something entertaining there. In reality, we would only want drivers playing with pagetable_operations when they really know what they are doing and why. Having the pagetable_ops set is similar to VM_HUGETLB set as a strong sign that something unusual is going on that is fairly easy to check for. I prefer the additional struct to extending VMAs anyway. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* [PATCH 2/7] copy_vma for hugetlbfs 2007-02-19 18:31 ` Adam Litke @ 2007-02-19 18:31 ` Adam Litke -1 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-02-19 18:31 UTC (permalink / raw) To: linux-mm; +Cc: linux-kernel, agl Signed-off-by: Adam Litke <agl@us.ibm.com> --- fs/hugetlbfs/inode.c | 6 ++++++ mm/memory.c | 4 ++-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 4f4cd13..c0a7984 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -36,6 +36,7 @@ static struct super_operations hugetlbfs_ops; static const struct address_space_operations hugetlbfs_aops; const struct file_operations hugetlbfs_file_operations; +static struct pagetable_operations_struct hugetlbfs_pagetable_ops; static struct inode_operations hugetlbfs_dir_inode_operations; static struct inode_operations hugetlbfs_inode_operations; @@ -70,6 +71,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma) */ vma->vm_flags |= VM_HUGETLB | VM_RESERVED; vma->vm_ops = &hugetlb_vm_ops; + vma->pagetable_ops = &hugetlbfs_pagetable_ops; vma_len = (loff_t)(vma->vm_end - vma->vm_start); @@ -560,6 +562,10 @@ const struct file_operations hugetlbfs_file_operations = { .get_unmapped_area = hugetlb_get_unmapped_area, }; +static struct pagetable_operations_struct hugetlbfs_pagetable_ops = { + .copy_vma = copy_hugetlb_page_range, +}; + static struct inode_operations hugetlbfs_dir_inode_operations = { .create = hugetlbfs_create, .lookup = simple_lookup, diff --git a/mm/memory.c b/mm/memory.c index ef09f0a..80eafd5 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -602,8 +602,8 @@ int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm, return 0; } - if (is_vm_hugetlb_page(vma)) - return copy_hugetlb_page_range(dst_mm, src_mm, vma); + if (has_pt_op(vma, copy_vma)) + return pt_op(vma, copy_vma)(dst_mm, src_mm, vma); dst_pgd = pgd_offset(dst_mm, addr); src_pgd = pgd_offset(src_mm, addr); ^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 2/7] copy_vma for hugetlbfs @ 2007-02-19 18:31 ` Adam Litke 0 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-02-19 18:31 UTC (permalink / raw) To: linux-mm; +Cc: linux-kernel, agl Signed-off-by: Adam Litke <agl@us.ibm.com> --- fs/hugetlbfs/inode.c | 6 ++++++ mm/memory.c | 4 ++-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 4f4cd13..c0a7984 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -36,6 +36,7 @@ static struct super_operations hugetlbfs_ops; static const struct address_space_operations hugetlbfs_aops; const struct file_operations hugetlbfs_file_operations; +static struct pagetable_operations_struct hugetlbfs_pagetable_ops; static struct inode_operations hugetlbfs_dir_inode_operations; static struct inode_operations hugetlbfs_inode_operations; @@ -70,6 +71,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma) */ vma->vm_flags |= VM_HUGETLB | VM_RESERVED; vma->vm_ops = &hugetlb_vm_ops; + vma->pagetable_ops = &hugetlbfs_pagetable_ops; vma_len = (loff_t)(vma->vm_end - vma->vm_start); @@ -560,6 +562,10 @@ const struct file_operations hugetlbfs_file_operations = { .get_unmapped_area = hugetlb_get_unmapped_area, }; +static struct pagetable_operations_struct hugetlbfs_pagetable_ops = { + .copy_vma = copy_hugetlb_page_range, +}; + static struct inode_operations hugetlbfs_dir_inode_operations = { .create = hugetlbfs_create, .lookup = simple_lookup, diff --git a/mm/memory.c b/mm/memory.c index ef09f0a..80eafd5 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -602,8 +602,8 @@ int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm, return 0; } - if (is_vm_hugetlb_page(vma)) - return copy_hugetlb_page_range(dst_mm, src_mm, vma); + if (has_pt_op(vma, copy_vma)) + return pt_op(vma, copy_vma)(dst_mm, src_mm, vma); dst_pgd = pgd_offset(dst_mm, addr); src_pgd = pgd_offset(src_mm, addr); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 3/7] pin_pages for hugetlb 2007-02-19 18:31 ` Adam Litke @ 2007-02-19 18:31 ` Adam Litke -1 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-02-19 18:31 UTC (permalink / raw) To: linux-mm; +Cc: linux-kernel, agl Signed-off-by: Adam Litke <agl@us.ibm.com> --- fs/hugetlbfs/inode.c | 1 + mm/memory.c | 6 +++--- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index c0a7984..2d1dd84 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -564,6 +564,7 @@ const struct file_operations hugetlbfs_file_operations = { static struct pagetable_operations_struct hugetlbfs_pagetable_ops = { .copy_vma = copy_hugetlb_page_range, + .pin_pages = follow_hugetlb_page, }; static struct inode_operations hugetlbfs_dir_inode_operations = { diff --git a/mm/memory.c b/mm/memory.c index 80eafd5..9467c65 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1039,9 +1039,9 @@ int get_user_pages(struct task_struct *tsk, struct mm_struct *mm, || !(vm_flags & vma->vm_flags)) return i ? : -EFAULT; - if (is_vm_hugetlb_page(vma)) { - i = follow_hugetlb_page(mm, vma, pages, vmas, - &start, &len, i); + if (has_pt_op(vma, pin_pages)) { + i = pt_op(vma, pin_pages)(mm, vma, pages, + vmas, &start, &len, i); continue; } ^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 3/7] pin_pages for hugetlb @ 2007-02-19 18:31 ` Adam Litke 0 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-02-19 18:31 UTC (permalink / raw) To: linux-mm; +Cc: linux-kernel, agl Signed-off-by: Adam Litke <agl@us.ibm.com> --- fs/hugetlbfs/inode.c | 1 + mm/memory.c | 6 +++--- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index c0a7984..2d1dd84 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -564,6 +564,7 @@ const struct file_operations hugetlbfs_file_operations = { static struct pagetable_operations_struct hugetlbfs_pagetable_ops = { .copy_vma = copy_hugetlb_page_range, + .pin_pages = follow_hugetlb_page, }; static struct inode_operations hugetlbfs_dir_inode_operations = { diff --git a/mm/memory.c b/mm/memory.c index 80eafd5..9467c65 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1039,9 +1039,9 @@ int get_user_pages(struct task_struct *tsk, struct mm_struct *mm, || !(vm_flags & vma->vm_flags)) return i ? : -EFAULT; - if (is_vm_hugetlb_page(vma)) { - i = follow_hugetlb_page(mm, vma, pages, vmas, - &start, &len, i); + if (has_pt_op(vma, pin_pages)) { + i = pt_op(vma, pin_pages)(mm, vma, pages, + vmas, &start, &len, i); continue; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 4/7] unmap_page_range for hugetlb 2007-02-19 18:31 ` Adam Litke @ 2007-02-19 18:32 ` Adam Litke -1 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-02-19 18:32 UTC (permalink / raw) To: linux-mm; +Cc: linux-kernel, agl Signed-off-by: Adam Litke <agl@us.ibm.com> --- fs/hugetlbfs/inode.c | 3 ++- include/linux/hugetlb.h | 4 ++-- mm/hugetlb.c | 12 ++++++++---- mm/memory.c | 10 ++++------ 4 files changed, 16 insertions(+), 13 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 2d1dd84..146a4b7 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -289,7 +289,7 @@ hugetlb_vmtruncate_list(struct prio_tree_root *root, pgoff_t pgoff) v_offset = 0; __unmap_hugepage_range(vma, - vma->vm_start + v_offset, vma->vm_end); + vma->vm_start + v_offset, vma->vm_end, 0); } } @@ -565,6 +565,7 @@ const struct file_operations hugetlbfs_file_operations = { static struct pagetable_operations_struct hugetlbfs_pagetable_ops = { .copy_vma = copy_hugetlb_page_range, .pin_pages = follow_hugetlb_page, + .unmap_page_range = unmap_hugepage_range, }; static struct inode_operations hugetlbfs_dir_inode_operations = { diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index a60995a..add92b3 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -16,8 +16,8 @@ static inline int is_vm_hugetlb_page(struct vm_area_struct *vma) int hugetlb_sysctl_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *); int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *); int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, struct vm_area_struct **, unsigned long *, int *, int); -void unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long); -void __unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long); +unsigned long unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long, long *); +void __unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long, long *); int hugetlb_prefault(struct address_space *, struct vm_area_struct *); int hugetlb_report_meminfo(char *); int hugetlb_report_node_meminfo(int, char *); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index cb362f7..3bcc0db 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -356,7 +356,7 @@ nomem: } void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, - unsigned long end) + unsigned long end, long *zap_work) { struct mm_struct *mm = vma->vm_mm; unsigned long address; @@ -397,10 +397,13 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, list_del(&page->lru); put_page(page); } + + if (zap_work) + *zap_work -= (end - start) / (HPAGE_SIZE / PAGE_SIZE); } -void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, - unsigned long end) +unsigned long unmap_hugepage_range(struct vm_area_struct *vma, + unsigned long start, unsigned long end, long *zap_work) { /* * It is undesirable to test vma->vm_file as it should be non-null @@ -412,9 +415,10 @@ void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, */ if (vma->vm_file) { spin_lock(&vma->vm_file->f_mapping->i_mmap_lock); - __unmap_hugepage_range(vma, start, end); + __unmap_hugepage_range(vma, start, end, zap_work); spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock); } + return end; } static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma, diff --git a/mm/memory.c b/mm/memory.c index 9467c65..aa6b06e 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -839,12 +839,10 @@ unsigned long unmap_vmas(struct mmu_gather **tlbp, tlb_start_valid = 1; } - if (unlikely(is_vm_hugetlb_page(vma))) { - unmap_hugepage_range(vma, start, end); - zap_work -= (end - start) / - (HPAGE_SIZE / PAGE_SIZE); - start = end; - } else + if (unlikely(has_pt_op(vma, unmap_page_range))) + start = pt_op(vma, unmap_page_range) + (vma, start, end, &zap_work); + else start = unmap_page_range(*tlbp, vma, start, end, &zap_work, details); ^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 4/7] unmap_page_range for hugetlb @ 2007-02-19 18:32 ` Adam Litke 0 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-02-19 18:32 UTC (permalink / raw) To: linux-mm; +Cc: linux-kernel, agl Signed-off-by: Adam Litke <agl@us.ibm.com> --- fs/hugetlbfs/inode.c | 3 ++- include/linux/hugetlb.h | 4 ++-- mm/hugetlb.c | 12 ++++++++---- mm/memory.c | 10 ++++------ 4 files changed, 16 insertions(+), 13 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 2d1dd84..146a4b7 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -289,7 +289,7 @@ hugetlb_vmtruncate_list(struct prio_tree_root *root, pgoff_t pgoff) v_offset = 0; __unmap_hugepage_range(vma, - vma->vm_start + v_offset, vma->vm_end); + vma->vm_start + v_offset, vma->vm_end, 0); } } @@ -565,6 +565,7 @@ const struct file_operations hugetlbfs_file_operations = { static struct pagetable_operations_struct hugetlbfs_pagetable_ops = { .copy_vma = copy_hugetlb_page_range, .pin_pages = follow_hugetlb_page, + .unmap_page_range = unmap_hugepage_range, }; static struct inode_operations hugetlbfs_dir_inode_operations = { diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index a60995a..add92b3 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -16,8 +16,8 @@ static inline int is_vm_hugetlb_page(struct vm_area_struct *vma) int hugetlb_sysctl_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *); int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *); int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, struct vm_area_struct **, unsigned long *, int *, int); -void unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long); -void __unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long); +unsigned long unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long, long *); +void __unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long, long *); int hugetlb_prefault(struct address_space *, struct vm_area_struct *); int hugetlb_report_meminfo(char *); int hugetlb_report_node_meminfo(int, char *); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index cb362f7..3bcc0db 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -356,7 +356,7 @@ nomem: } void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, - unsigned long end) + unsigned long end, long *zap_work) { struct mm_struct *mm = vma->vm_mm; unsigned long address; @@ -397,10 +397,13 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, list_del(&page->lru); put_page(page); } + + if (zap_work) + *zap_work -= (end - start) / (HPAGE_SIZE / PAGE_SIZE); } -void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, - unsigned long end) +unsigned long unmap_hugepage_range(struct vm_area_struct *vma, + unsigned long start, unsigned long end, long *zap_work) { /* * It is undesirable to test vma->vm_file as it should be non-null @@ -412,9 +415,10 @@ void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, */ if (vma->vm_file) { spin_lock(&vma->vm_file->f_mapping->i_mmap_lock); - __unmap_hugepage_range(vma, start, end); + __unmap_hugepage_range(vma, start, end, zap_work); spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock); } + return end; } static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma, diff --git a/mm/memory.c b/mm/memory.c index 9467c65..aa6b06e 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -839,12 +839,10 @@ unsigned long unmap_vmas(struct mmu_gather **tlbp, tlb_start_valid = 1; } - if (unlikely(is_vm_hugetlb_page(vma))) { - unmap_hugepage_range(vma, start, end); - zap_work -= (end - start) / - (HPAGE_SIZE / PAGE_SIZE); - start = end; - } else + if (unlikely(has_pt_op(vma, unmap_page_range))) + start = pt_op(vma, unmap_page_range) + (vma, start, end, &zap_work); + else start = unmap_page_range(*tlbp, vma, start, end, &zap_work, details); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 5/7] change_protection for hugetlb 2007-02-19 18:31 ` Adam Litke @ 2007-02-19 18:32 ` Adam Litke -1 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-02-19 18:32 UTC (permalink / raw) To: linux-mm; +Cc: linux-kernel, agl Signed-off-by: Adam Litke <agl@us.ibm.com> --- fs/hugetlbfs/inode.c | 1 + mm/mprotect.c | 5 +++-- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 146a4b7..1016694 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -566,6 +566,7 @@ static struct pagetable_operations_struct hugetlbfs_pagetable_ops = { .copy_vma = copy_hugetlb_page_range, .pin_pages = follow_hugetlb_page, .unmap_page_range = unmap_hugepage_range, + .change_protection = hugetlb_change_protection, }; static struct inode_operations hugetlbfs_dir_inode_operations = { diff --git a/mm/mprotect.c b/mm/mprotect.c index 3b8f3c0..172e204 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -201,8 +201,9 @@ success: dirty_accountable = 1; } - if (is_vm_hugetlb_page(vma)) - hugetlb_change_protection(vma, start, end, vma->vm_page_prot); + if (has_pt_op(vma, change_protection)) + pt_op(vma, change_protection)(vma, start, end, + vma->vm_page_prot); else change_protection(vma, start, end, vma->vm_page_prot, dirty_accountable); vm_stat_account(mm, oldflags, vma->vm_file, -nrpages); ^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 5/7] change_protection for hugetlb @ 2007-02-19 18:32 ` Adam Litke 0 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-02-19 18:32 UTC (permalink / raw) To: linux-mm; +Cc: linux-kernel, agl Signed-off-by: Adam Litke <agl@us.ibm.com> --- fs/hugetlbfs/inode.c | 1 + mm/mprotect.c | 5 +++-- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 146a4b7..1016694 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -566,6 +566,7 @@ static struct pagetable_operations_struct hugetlbfs_pagetable_ops = { .copy_vma = copy_hugetlb_page_range, .pin_pages = follow_hugetlb_page, .unmap_page_range = unmap_hugepage_range, + .change_protection = hugetlb_change_protection, }; static struct inode_operations hugetlbfs_dir_inode_operations = { diff --git a/mm/mprotect.c b/mm/mprotect.c index 3b8f3c0..172e204 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -201,8 +201,9 @@ success: dirty_accountable = 1; } - if (is_vm_hugetlb_page(vma)) - hugetlb_change_protection(vma, start, end, vma->vm_page_prot); + if (has_pt_op(vma, change_protection)) + pt_op(vma, change_protection)(vma, start, end, + vma->vm_page_prot); else change_protection(vma, start, end, vma->vm_page_prot, dirty_accountable); vm_stat_account(mm, oldflags, vma->vm_file, -nrpages); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 6/7] free_pgtable_range for hugetlb 2007-02-19 18:31 ` Adam Litke @ 2007-02-19 18:32 ` Adam Litke -1 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-02-19 18:32 UTC (permalink / raw) To: linux-mm; +Cc: linux-kernel, agl Signed-off-by: Adam Litke <agl@us.ibm.com> --- fs/hugetlbfs/inode.c | 1 + mm/memory.c | 6 +++--- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 1016694..3461f9b 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -567,6 +567,7 @@ static struct pagetable_operations_struct hugetlbfs_pagetable_ops = { .pin_pages = follow_hugetlb_page, .unmap_page_range = unmap_hugepage_range, .change_protection = hugetlb_change_protection, + .free_pgtable_range = hugetlb_free_pgd_range, }; static struct inode_operations hugetlbfs_dir_inode_operations = { diff --git a/mm/memory.c b/mm/memory.c index aa6b06e..442e6b2 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -279,15 +279,15 @@ void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *vma, anon_vma_unlink(vma); unlink_file_vma(vma); - if (is_vm_hugetlb_page(vma)) { - hugetlb_free_pgd_range(tlb, addr, vma->vm_end, + if (has_pt_op(vma, free_pgtable_range)) { + pt_op(vma, free_pgtable_range)(tlb, addr, vma->vm_end, floor, next? next->vm_start: ceiling); } else { /* * Optimization: gather nearby vmas into one call down */ while (next && next->vm_start <= vma->vm_end + PMD_SIZE - && !is_vm_hugetlb_page(next)) { + && !has_pt_op(next, free_pgtable_range)) { vma = next; next = vma->vm_next; anon_vma_unlink(vma); ^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 6/7] free_pgtable_range for hugetlb @ 2007-02-19 18:32 ` Adam Litke 0 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-02-19 18:32 UTC (permalink / raw) To: linux-mm; +Cc: linux-kernel, agl Signed-off-by: Adam Litke <agl@us.ibm.com> --- fs/hugetlbfs/inode.c | 1 + mm/memory.c | 6 +++--- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 1016694..3461f9b 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -567,6 +567,7 @@ static struct pagetable_operations_struct hugetlbfs_pagetable_ops = { .pin_pages = follow_hugetlb_page, .unmap_page_range = unmap_hugepage_range, .change_protection = hugetlb_change_protection, + .free_pgtable_range = hugetlb_free_pgd_range, }; static struct inode_operations hugetlbfs_dir_inode_operations = { diff --git a/mm/memory.c b/mm/memory.c index aa6b06e..442e6b2 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -279,15 +279,15 @@ void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *vma, anon_vma_unlink(vma); unlink_file_vma(vma); - if (is_vm_hugetlb_page(vma)) { - hugetlb_free_pgd_range(tlb, addr, vma->vm_end, + if (has_pt_op(vma, free_pgtable_range)) { + pt_op(vma, free_pgtable_range)(tlb, addr, vma->vm_end, floor, next? next->vm_start: ceiling); } else { /* * Optimization: gather nearby vmas into one call down */ while (next && next->vm_start <= vma->vm_end + PMD_SIZE - && !is_vm_hugetlb_page(next)) { + && !has_pt_op(next, free_pgtable_range)) { vma = next; next = vma->vm_next; anon_vma_unlink(vma); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 7/7] hugetlbfs fault handler 2007-02-19 18:31 ` Adam Litke @ 2007-02-19 18:32 ` Adam Litke -1 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-02-19 18:32 UTC (permalink / raw) To: linux-mm; +Cc: linux-kernel, agl Signed-off-by: Adam Litke <agl@us.ibm.com> --- fs/hugetlbfs/inode.c | 1 + mm/hugetlb.c | 4 +++- mm/memory.c | 4 ++-- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 3461f9b..1de73c1 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -568,6 +568,7 @@ static struct pagetable_operations_struct hugetlbfs_pagetable_ops = { .unmap_page_range = unmap_hugepage_range, .change_protection = hugetlb_change_protection, .free_pgtable_range = hugetlb_free_pgd_range, + .fault = hugetlb_fault, }; static struct inode_operations hugetlbfs_dir_inode_operations = { diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 3bcc0db..63de369 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -588,6 +588,8 @@ int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long vaddr = *position; int remainder = *length; + BUG_ON(!has_pt_op(vma, fault)); + spin_lock(&mm->page_table_lock); while (vaddr < vma->vm_end && remainder) { pte_t *pte; @@ -604,7 +606,7 @@ int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, int ret; spin_unlock(&mm->page_table_lock); - ret = hugetlb_fault(mm, vma, vaddr, 0); + ret = pt_op(vma, fault)(mm, vma, vaddr, 0); spin_lock(&mm->page_table_lock); if (ret == VM_FAULT_MINOR) continue; diff --git a/mm/memory.c b/mm/memory.c index 442e6b2..c8e17b4 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2455,8 +2455,8 @@ int __handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma, count_vm_event(PGFAULT); - if (unlikely(is_vm_hugetlb_page(vma))) - return hugetlb_fault(mm, vma, address, write_access); + if (unlikely(has_pt_op(vma, fault))) + return pt_op(vma, fault)(mm, vma, address, write_access); pgd = pgd_offset(mm, address); pud = pud_alloc(mm, pgd, address); ^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 7/7] hugetlbfs fault handler @ 2007-02-19 18:32 ` Adam Litke 0 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-02-19 18:32 UTC (permalink / raw) To: linux-mm; +Cc: linux-kernel, agl Signed-off-by: Adam Litke <agl@us.ibm.com> --- fs/hugetlbfs/inode.c | 1 + mm/hugetlb.c | 4 +++- mm/memory.c | 4 ++-- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 3461f9b..1de73c1 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -568,6 +568,7 @@ static struct pagetable_operations_struct hugetlbfs_pagetable_ops = { .unmap_page_range = unmap_hugepage_range, .change_protection = hugetlb_change_protection, .free_pgtable_range = hugetlb_free_pgd_range, + .fault = hugetlb_fault, }; static struct inode_operations hugetlbfs_dir_inode_operations = { diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 3bcc0db..63de369 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -588,6 +588,8 @@ int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long vaddr = *position; int remainder = *length; + BUG_ON(!has_pt_op(vma, fault)); + spin_lock(&mm->page_table_lock); while (vaddr < vma->vm_end && remainder) { pte_t *pte; @@ -604,7 +606,7 @@ int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, int ret; spin_unlock(&mm->page_table_lock); - ret = hugetlb_fault(mm, vma, vaddr, 0); + ret = pt_op(vma, fault)(mm, vma, vaddr, 0); spin_lock(&mm->page_table_lock); if (ret == VM_FAULT_MINOR) continue; diff --git a/mm/memory.c b/mm/memory.c index 442e6b2..c8e17b4 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2455,8 +2455,8 @@ int __handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma, count_vm_event(PGFAULT); - if (unlikely(is_vm_hugetlb_page(vma))) - return hugetlb_fault(mm, vma, address, write_access); + if (unlikely(has_pt_op(vma, fault))) + return pt_op(vma, fault)(mm, vma, address, write_access); pgd = pgd_offset(mm, address); pud = pud_alloc(mm, pgd, address); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 68+ messages in thread
* Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API 2007-02-19 18:31 ` Adam Litke @ 2007-02-19 18:43 ` Arjan van de Ven -1 siblings, 0 replies; 68+ messages in thread From: Arjan van de Ven @ 2007-02-19 18:43 UTC (permalink / raw) To: Adam Litke; +Cc: linux-mm, linux-kernel On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote: > The page tables for hugetlb mappings are handled differently than page tables > for normal pages. Rather than integrating multiple page size support into the > main VM (which would tremendously complicate the code) some hooks were created. > This allows hugetlb special cases to be handled "out of line" by a separate > interface. ok it makes sense to clean this up.. what I don't like is that there STILL are all the double cases... for this to work and be worth it both the common case and the hugetlb case should be using the ops structure always! Anything else and you're just replacing bad code with bad code ;( ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API @ 2007-02-19 18:43 ` Arjan van de Ven 0 siblings, 0 replies; 68+ messages in thread From: Arjan van de Ven @ 2007-02-19 18:43 UTC (permalink / raw) To: Adam Litke; +Cc: linux-mm, linux-kernel On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote: > The page tables for hugetlb mappings are handled differently than page tables > for normal pages. Rather than integrating multiple page size support into the > main VM (which would tremendously complicate the code) some hooks were created. > This allows hugetlb special cases to be handled "out of line" by a separate > interface. ok it makes sense to clean this up.. what I don't like is that there STILL are all the double cases... for this to work and be worth it both the common case and the hugetlb case should be using the ops structure always! Anything else and you're just replacing bad code with bad code ;( -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API 2007-02-19 18:43 ` Arjan van de Ven @ 2007-02-19 19:34 ` Adam Litke -1 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-02-19 19:34 UTC (permalink / raw) To: Arjan van de Ven; +Cc: linux-mm, linux-kernel On Mon, 2007-02-19 at 19:43 +0100, Arjan van de Ven wrote: > On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote: > > The page tables for hugetlb mappings are handled differently than page tables > > for normal pages. Rather than integrating multiple page size support into the > > main VM (which would tremendously complicate the code) some hooks were created. > > This allows hugetlb special cases to be handled "out of line" by a separate > > interface. > > ok it makes sense to clean this up.. what I don't like is that there > STILL are all the double cases... for this to work and be worth it both > the common case and the hugetlb case should be using the ops structure > always! Anything else and you're just replacing bad code with bad > code ;( Hmm. Do you think everyone would support an extra pointer indirection for every handle_pte_fault() call? If not, then I definitely wouldn't mind creating a default_pagetable_ops and calling into that. -- Adam Litke - (agl at us.ibm.com) IBM Linux Technology Center ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API @ 2007-02-19 19:34 ` Adam Litke 0 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-02-19 19:34 UTC (permalink / raw) To: Arjan van de Ven; +Cc: linux-mm, linux-kernel On Mon, 2007-02-19 at 19:43 +0100, Arjan van de Ven wrote: > On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote: > > The page tables for hugetlb mappings are handled differently than page tables > > for normal pages. Rather than integrating multiple page size support into the > > main VM (which would tremendously complicate the code) some hooks were created. > > This allows hugetlb special cases to be handled "out of line" by a separate > > interface. > > ok it makes sense to clean this up.. what I don't like is that there > STILL are all the double cases... for this to work and be worth it both > the common case and the hugetlb case should be using the ops structure > always! Anything else and you're just replacing bad code with bad > code ;( Hmm. Do you think everyone would support an extra pointer indirection for every handle_pte_fault() call? If not, then I definitely wouldn't mind creating a default_pagetable_ops and calling into that. -- Adam Litke - (agl at us.ibm.com) IBM Linux Technology Center -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API 2007-02-19 19:34 ` Adam Litke @ 2007-02-19 21:15 ` Arjan van de Ven -1 siblings, 0 replies; 68+ messages in thread From: Arjan van de Ven @ 2007-02-19 21:15 UTC (permalink / raw) To: Adam Litke; +Cc: linux-mm, linux-kernel On Mon, 2007-02-19 at 13:34 -0600, Adam Litke wrote: > On Mon, 2007-02-19 at 19:43 +0100, Arjan van de Ven wrote: > > On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote: > > > The page tables for hugetlb mappings are handled differently than page tables > > > for normal pages. Rather than integrating multiple page size support into the > > > main VM (which would tremendously complicate the code) some hooks were created. > > > This allows hugetlb special cases to be handled "out of line" by a separate > > > interface. > > > > ok it makes sense to clean this up.. what I don't like is that there > > STILL are all the double cases... for this to work and be worth it both > > the common case and the hugetlb case should be using the ops structure > > always! Anything else and you're just replacing bad code with bad > > code ;( > > Hmm. Do you think everyone would support an extra pointer indirection > for every handle_pte_fault() call? maybe. I'm not entirely convinced... (I like the cleanup potential a lot code wise.. but if it costs performance, then... well I'd hate to see linux get slower for hugetlbfs) > If not, then I definitely wouldn't > mind creating a default_pagetable_ops and calling into that. ... but without it to be honest, your patch adds nothing real.. there's ONE user of your code, and there's no real cleanup unless you get rid of all the special casing.... since the special casing is the really ugly part of hugetlbfs, not the actual code inside the special case.. -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API @ 2007-02-19 21:15 ` Arjan van de Ven 0 siblings, 0 replies; 68+ messages in thread From: Arjan van de Ven @ 2007-02-19 21:15 UTC (permalink / raw) To: Adam Litke; +Cc: linux-mm, linux-kernel On Mon, 2007-02-19 at 13:34 -0600, Adam Litke wrote: > On Mon, 2007-02-19 at 19:43 +0100, Arjan van de Ven wrote: > > On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote: > > > The page tables for hugetlb mappings are handled differently than page tables > > > for normal pages. Rather than integrating multiple page size support into the > > > main VM (which would tremendously complicate the code) some hooks were created. > > > This allows hugetlb special cases to be handled "out of line" by a separate > > > interface. > > > > ok it makes sense to clean this up.. what I don't like is that there > > STILL are all the double cases... for this to work and be worth it both > > the common case and the hugetlb case should be using the ops structure > > always! Anything else and you're just replacing bad code with bad > > code ;( > > Hmm. Do you think everyone would support an extra pointer indirection > for every handle_pte_fault() call? maybe. I'm not entirely convinced... (I like the cleanup potential a lot code wise.. but if it costs performance, then... well I'd hate to see linux get slower for hugetlbfs) > If not, then I definitely wouldn't > mind creating a default_pagetable_ops and calling into that. ... but without it to be honest, your patch adds nothing real.. there's ONE user of your code, and there's no real cleanup unless you get rid of all the special casing.... since the special casing is the really ugly part of hugetlbfs, not the actual code inside the special case.. -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API 2007-02-19 21:15 ` Arjan van de Ven @ 2007-02-20 19:57 ` Benjamin Herrenschmidt -1 siblings, 0 replies; 68+ messages in thread From: Benjamin Herrenschmidt @ 2007-02-20 19:57 UTC (permalink / raw) To: Arjan van de Ven; +Cc: Adam Litke, linux-mm, linux-kernel > maybe. I'm not entirely convinced... (I like the cleanup potential a lot > code wise.. but if it costs performance, then... well I'd hate to see > linux get slower for hugetlbfs) > > > If not, then I definitely wouldn't > > mind creating a default_pagetable_ops and calling into that. > > ... but without it to be honest, your patch adds nothing real.. there's > ONE user of your code, and there's no real cleanup unless you get rid of > all the special casing.... since the special casing is the really ugly > part of hugetlbfs, not the actual code inside the special case.. Well... I disagree there too :-) I've been working recently for example on some spufs improvements that require similar tweaking of the user address space as hugetlbfs. The problem I have is that while there are hooks in the generic code pretty much everywhere I need.... they are all hugetlb specific, that is they call directly into the hugetlb code. For now, I found ways of doing my stuff without hooking all over the page table operations (well, I had no real choices) but I can imagine it making sense to allow something (hugetlb being one of them) to take over part of the user address space. Ben. ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API @ 2007-02-20 19:57 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 68+ messages in thread From: Benjamin Herrenschmidt @ 2007-02-20 19:57 UTC (permalink / raw) To: Arjan van de Ven; +Cc: Adam Litke, linux-mm, linux-kernel > maybe. I'm not entirely convinced... (I like the cleanup potential a lot > code wise.. but if it costs performance, then... well I'd hate to see > linux get slower for hugetlbfs) > > > If not, then I definitely wouldn't > > mind creating a default_pagetable_ops and calling into that. > > ... but without it to be honest, your patch adds nothing real.. there's > ONE user of your code, and there's no real cleanup unless you get rid of > all the special casing.... since the special casing is the really ugly > part of hugetlbfs, not the actual code inside the special case.. Well... I disagree there too :-) I've been working recently for example on some spufs improvements that require similar tweaking of the user address space as hugetlbfs. The problem I have is that while there are hooks in the generic code pretty much everywhere I need.... they are all hugetlb specific, that is they call directly into the hugetlb code. For now, I found ways of doing my stuff without hooking all over the page table operations (well, I had no real choices) but I can imagine it making sense to allow something (hugetlb being one of them) to take over part of the user address space. Ben. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API 2007-02-19 18:43 ` Arjan van de Ven @ 2007-02-20 19:54 ` Benjamin Herrenschmidt -1 siblings, 0 replies; 68+ messages in thread From: Benjamin Herrenschmidt @ 2007-02-20 19:54 UTC (permalink / raw) To: Arjan van de Ven; +Cc: Adam Litke, linux-mm, linux-kernel On Mon, 2007-02-19 at 19:43 +0100, Arjan van de Ven wrote: > On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote: > > The page tables for hugetlb mappings are handled differently than page tables > > for normal pages. Rather than integrating multiple page size support into the > > main VM (which would tremendously complicate the code) some hooks were created. > > This allows hugetlb special cases to be handled "out of line" by a separate > > interface. > > ok it makes sense to clean this up.. what I don't like is that there > STILL are all the double cases... for this to work and be worth it both > the common case and the hugetlb case should be using the ops structure > always! Anything else and you're just replacing bad code with bad > code ;( I don't fully agree. I think it makes sense to have the "special" case be a function pointer and the "normal" case stay where it is for performances. You don't want to pay the cost of the function pointer call in the normal case do you ? Ben. ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API @ 2007-02-20 19:54 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 68+ messages in thread From: Benjamin Herrenschmidt @ 2007-02-20 19:54 UTC (permalink / raw) To: Arjan van de Ven; +Cc: Adam Litke, linux-mm, linux-kernel On Mon, 2007-02-19 at 19:43 +0100, Arjan van de Ven wrote: > On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote: > > The page tables for hugetlb mappings are handled differently than page tables > > for normal pages. Rather than integrating multiple page size support into the > > main VM (which would tremendously complicate the code) some hooks were created. > > This allows hugetlb special cases to be handled "out of line" by a separate > > interface. > > ok it makes sense to clean this up.. what I don't like is that there > STILL are all the double cases... for this to work and be worth it both > the common case and the hugetlb case should be using the ops structure > always! Anything else and you're just replacing bad code with bad > code ;( I don't fully agree. I think it makes sense to have the "special" case be a function pointer and the "normal" case stay where it is for performances. You don't want to pay the cost of the function pointer call in the normal case do you ? Ben. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* [PATCH 0/7] [RFC] hugetlb: pagetable_operations API (V2) @ 2007-03-19 20:05 Adam Litke 2007-03-19 20:05 ` Adam Litke 0 siblings, 1 reply; 68+ messages in thread From: Adam Litke @ 2007-03-19 20:05 UTC (permalink / raw) To: Andrew Morton Cc: Adam Litke, Arjan van de Ven, William Lee Irwin III, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel Andrew, given the favorable review of these patches the last time around, would you consider them for the -mm tree? Does anyone else have any objections? The page tables for hugetlb mappings are handled differently than page tables for normal pages. Rather than integrating multiple page size support into the core VM (which would tremendously complicate the code) some hooks were created. This allows hugetlb special cases to be handled "out of line" by a separate interface. Hugetlbfs was the huge page interface chosen. At the time, large database users were the only big users of huge pages and the hugetlbfs design meets their needs pretty well. Over time, hugetlbfs has been expanded to enable new uses of huge page memory with varied results. As features are added, the semantics become a permanent part of the Linux API. This makes maintenance of hugetlbfs an increasingly difficult task and inhibits the addition of features and functionality in support of ever-changing hardware. To remedy the situation, I propose an API (currently called pagetable_operations). All of the current hugetlbfs-specific hooks are moved into an operations struct that is attached to VMAs. The end result is a more explicit and IMO a cleaner interface between hugetlbfs and the core VM. We are then free to add other hugetlb interfaces (such as a /dev/zero-styled character device) that can operate either in concert with or independent of hugetlbfs. There should be no measurable performance impact for normal page users (we're checking if pagetable_ops != NULL instead of checking for vm_flags & VM_HUGETLB). Of course we do increase the VMA size by one pointer. For huge pages, there is an added indirection for pt_op() calls. This patch series does not change the logic of the the hugetlbfs operations, just moves them into the pagetable_operations struct. I did some pretty basic benchmarking of these patches on ppc64, x86, and x86_64 to get a feel for the fast-path performance impact. The following tables show kernbench performance comparisons between a clean 2.6.20 kernel and one with my patches applied. These numbers seem well within statistical noise to me. Changes since V1: - Made hugetlbfs_pagetable_ops const (Thanks Arjan) -- KernBench Comparison (ppc64) ---------------------------- 2.6.20-clean 2.6.20-pgtable_ops pct. diff User CPU time 708.82 708.59 0.03 System CPU time 62.50 62.58 -0.13 Total CPU time 771.32 771.17 0.02 Elapsed time 115.40 115.35 0.04 KernBench Comparison (x86) -------------------------- 2.6.20-clean 2.6.20-pgtable_ops pct. diff User CPU time 1382.62 1381.88 0.05 System CPU time 146.06 146.86 -0.55 Total CPU time 1528.68 1528.74 -0.00 Elapsed time 394.92 396.70 -0.45 KernBench Comparison (x86_64) ----------------------------- 2.6.20-clean 2.6.20-pgtable_ops pct. diff User CPU time 559.39 557.97 0.25 System CPU time 65.10 66.17 -1.64 Total CPU time 624.49 624.14 0.06 Elapsed time 158.54 158.59 -0.03 The lack of a performance impact makes sense to me. The following is a simplified instruction comparison for each case: 2.6.20-clean 2.6.20-pgtable_ops ------------------- -------------------- /* Load vm_flags */ /* Load pagetable_ops pointer */ mov 0x18(ecx),eax mov 0x48(ecx),eax /* Test for VM_HUGETLB */ /* Test if it's NULL */ test $0x400000,eax test eax,eax /* If set, jump to call stub */ /* If so, jump away to main code */ jne c0148f04 je c0148ba1 ... /* Lookup the operation's function pointer */ /* copy_hugetlb_page_range call */ mov 0x4(eax),ebx c0148f04: /* Test if it's NULL */ mov 0xffffff98(ebp),ecx test ebx,ebx mov 0xffffff9c(ebp),edx /* If so, jump away to main code */ mov 0xffffffa0(ebp),eax je c0148ba1 call c01536e0 /* pagetable operation call */ mov 0xffffff9c(ebp),edx mov 0xffffffa0(ebp),eax call *ebx For the common case (vma->pagetable_ops == NULL), we do almost the same thing as the current code: load and test. The third instruction is different in that we jump for the common case instead of jumping in the hugetlb case. I don't think this is a big deal though. If it is, would an unlikely() macro fix it? ^ permalink raw reply [flat|nested] 68+ messages in thread
* [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. 2007-03-19 20:05 [PATCH 0/7] [RFC] hugetlb: pagetable_operations API (V2) Adam Litke @ 2007-03-19 20:05 ` Adam Litke 0 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-03-19 20:05 UTC (permalink / raw) To: Andrew Morton Cc: Adam Litke, Arjan van de Ven, William Lee Irwin III, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel Signed-off-by: Adam Litke <agl@us.ibm.com> --- include/linux/mm.h | 25 +++++++++++++++++++++++++ 1 files changed, 25 insertions(+), 0 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 60e0e4a..7089323 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -98,6 +98,7 @@ struct vm_area_struct { /* Function pointers to deal with this struct. */ struct vm_operations_struct * vm_ops; + const struct pagetable_operations_struct * pagetable_ops; /* Information about our backing store: */ unsigned long vm_pgoff; /* Offset (within vm_file) in PAGE_SIZE @@ -218,6 +219,30 @@ struct vm_operations_struct { }; struct mmu_gather; + +struct pagetable_operations_struct { + int (*fault)(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long address, int write_access); + int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src, + struct vm_area_struct *vma); + int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma, + struct page **pages, struct vm_area_struct **vmas, + unsigned long *position, int *length, int i); + void (*change_protection)(struct vm_area_struct *vma, + unsigned long address, unsigned long end, pgprot_t newprot); + unsigned long (*unmap_page_range)(struct vm_area_struct *vma, + unsigned long address, unsigned long end, long *zap_work); + void (*free_pgtable_range)(struct mmu_gather **tlb, + unsigned long addr, unsigned long end, + unsigned long floor, unsigned long ceiling); +}; + +#define has_pt_op(vma, op) \ + ((vma)->pagetable_ops && (vma)->pagetable_ops->op) +#define pt_op(vma, call) \ + ((vma)->pagetable_ops->call) + struct inode; #define page_private(page) ((page)->private) ^ permalink raw reply related [flat|nested] 68+ messages in thread
* [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. @ 2007-03-19 20:05 ` Adam Litke 0 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-03-19 20:05 UTC (permalink / raw) To: Andrew Morton Cc: Adam Litke, Arjan van de Ven, William Lee Irwin III, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel Signed-off-by: Adam Litke <agl@us.ibm.com> --- include/linux/mm.h | 25 +++++++++++++++++++++++++ 1 files changed, 25 insertions(+), 0 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 60e0e4a..7089323 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -98,6 +98,7 @@ struct vm_area_struct { /* Function pointers to deal with this struct. */ struct vm_operations_struct * vm_ops; + const struct pagetable_operations_struct * pagetable_ops; /* Information about our backing store: */ unsigned long vm_pgoff; /* Offset (within vm_file) in PAGE_SIZE @@ -218,6 +219,30 @@ struct vm_operations_struct { }; struct mmu_gather; + +struct pagetable_operations_struct { + int (*fault)(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long address, int write_access); + int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src, + struct vm_area_struct *vma); + int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma, + struct page **pages, struct vm_area_struct **vmas, + unsigned long *position, int *length, int i); + void (*change_protection)(struct vm_area_struct *vma, + unsigned long address, unsigned long end, pgprot_t newprot); + unsigned long (*unmap_page_range)(struct vm_area_struct *vma, + unsigned long address, unsigned long end, long *zap_work); + void (*free_pgtable_range)(struct mmu_gather **tlb, + unsigned long addr, unsigned long end, + unsigned long floor, unsigned long ceiling); +}; + +#define has_pt_op(vma, op) \ + ((vma)->pagetable_ops && (vma)->pagetable_ops->op) +#define pt_op(vma, call) \ + ((vma)->pagetable_ops->call) + struct inode; #define page_private(page) ((page)->private) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. 2007-03-19 20:05 ` Adam Litke @ 2007-03-20 23:24 ` Dave Hansen -1 siblings, 0 replies; 68+ messages in thread From: Dave Hansen @ 2007-03-20 23:24 UTC (permalink / raw) To: Adam Litke Cc: Andrew Morton, Arjan van de Ven, William Lee Irwin III, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel On Mon, 2007-03-19 at 13:05 -0700, Adam Litke wrote: > > +#define has_pt_op(vma, op) \ > + ((vma)->pagetable_ops && (vma)->pagetable_ops->op) > +#define pt_op(vma, call) \ > + ((vma)->pagetable_ops->call) Can you get rid of these macros? I think they make it a wee bit harder to read. My brain doesn't properly parse the foo(arg)(bar) syntax. + if (has_pt_op(vma, copy_vma)) + return pt_op(vma, copy_vma)(dst_mm, src_mm, vma); + if (vma->pagetable_ops && vma->pagetable_ops->copy_vma) + return vma->pagetable_ops->copy_vma(dst_mm, src_mm, vma); I guess it does lead to some longish lines. Does it start looking really nasty? If you're going to have them, it might just be best to put a single unlikely() around the macro definitions themselves to keep anybody from having to open-code it for any of the users. -- Dave ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. @ 2007-03-20 23:24 ` Dave Hansen 0 siblings, 0 replies; 68+ messages in thread From: Dave Hansen @ 2007-03-20 23:24 UTC (permalink / raw) To: Adam Litke Cc: Andrew Morton, Arjan van de Ven, William Lee Irwin III, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel On Mon, 2007-03-19 at 13:05 -0700, Adam Litke wrote: > > +#define has_pt_op(vma, op) \ > + ((vma)->pagetable_ops && (vma)->pagetable_ops->op) > +#define pt_op(vma, call) \ > + ((vma)->pagetable_ops->call) Can you get rid of these macros? I think they make it a wee bit harder to read. My brain doesn't properly parse the foo(arg)(bar) syntax. + if (has_pt_op(vma, copy_vma)) + return pt_op(vma, copy_vma)(dst_mm, src_mm, vma); + if (vma->pagetable_ops && vma->pagetable_ops->copy_vma) + return vma->pagetable_ops->copy_vma(dst_mm, src_mm, vma); I guess it does lead to some longish lines. Does it start looking really nasty? If you're going to have them, it might just be best to put a single unlikely() around the macro definitions themselves to keep anybody from having to open-code it for any of the users. -- Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. 2007-03-20 23:24 ` Dave Hansen @ 2007-03-21 14:50 ` Adam Litke -1 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-03-21 14:50 UTC (permalink / raw) To: Dave Hansen Cc: Andrew Morton, Arjan van de Ven, William Lee Irwin III, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel On Tue, 2007-03-20 at 16:24 -0700, Dave Hansen wrote: > On Mon, 2007-03-19 at 13:05 -0700, Adam Litke wrote: > > > > +#define has_pt_op(vma, op) \ > > + ((vma)->pagetable_ops && (vma)->pagetable_ops->op) > > +#define pt_op(vma, call) \ > > + ((vma)->pagetable_ops->call) > > Can you get rid of these macros? I think they make it a wee bit harder > to read. My brain doesn't properly parse the foo(arg)(bar) syntax. > > + if (has_pt_op(vma, copy_vma)) > + return pt_op(vma, copy_vma)(dst_mm, src_mm, vma); > > + if (vma->pagetable_ops && vma->pagetable_ops->copy_vma) > + return vma->pagetable_ops->copy_vma(dst_mm, src_mm, vma); > > I guess it does lead to some longish lines. Does it start looking > really nasty? Yeah, it starts to look pretty bad. Some of these calls are in code that is already indented several times. > If you're going to have them, it might just be best to put a single > unlikely() around the macro definitions themselves to keep anybody from > having to open-code it for any of the users. It should be pretty easy to wrap has_pt_op() with an unlikely(). Good suggestion. -- Adam Litke - (agl at us.ibm.com) IBM Linux Technology Center ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. @ 2007-03-21 14:50 ` Adam Litke 0 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-03-21 14:50 UTC (permalink / raw) To: Dave Hansen Cc: Andrew Morton, Arjan van de Ven, William Lee Irwin III, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel On Tue, 2007-03-20 at 16:24 -0700, Dave Hansen wrote: > On Mon, 2007-03-19 at 13:05 -0700, Adam Litke wrote: > > > > +#define has_pt_op(vma, op) \ > > + ((vma)->pagetable_ops && (vma)->pagetable_ops->op) > > +#define pt_op(vma, call) \ > > + ((vma)->pagetable_ops->call) > > Can you get rid of these macros? I think they make it a wee bit harder > to read. My brain doesn't properly parse the foo(arg)(bar) syntax. > > + if (has_pt_op(vma, copy_vma)) > + return pt_op(vma, copy_vma)(dst_mm, src_mm, vma); > > + if (vma->pagetable_ops && vma->pagetable_ops->copy_vma) > + return vma->pagetable_ops->copy_vma(dst_mm, src_mm, vma); > > I guess it does lead to some longish lines. Does it start looking > really nasty? Yeah, it starts to look pretty bad. Some of these calls are in code that is already indented several times. > If you're going to have them, it might just be best to put a single > unlikely() around the macro definitions themselves to keep anybody from > having to open-code it for any of the users. It should be pretty easy to wrap has_pt_op() with an unlikely(). Good suggestion. -- Adam Litke - (agl at us.ibm.com) IBM Linux Technology Center -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. 2007-03-21 14:50 ` Adam Litke @ 2007-03-21 15:05 ` Arjan van de Ven -1 siblings, 0 replies; 68+ messages in thread From: Arjan van de Ven @ 2007-03-21 15:05 UTC (permalink / raw) To: Adam Litke Cc: Dave Hansen, Andrew Morton, William Lee Irwin III, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel On Wed, 2007-03-21 at 09:50 -0500, Adam Litke wrote: > On Tue, 2007-03-20 at 16:24 -0700, Dave Hansen wrote: > > On Mon, 2007-03-19 at 13:05 -0700, Adam Litke wrote: > > > > > > +#define has_pt_op(vma, op) \ > > > + ((vma)->pagetable_ops && (vma)->pagetable_ops->op) > > > +#define pt_op(vma, call) \ > > > + ((vma)->pagetable_ops->call) > > > > Can you get rid of these macros? I think they make it a wee bit harder > > to read. My brain doesn't properly parse the foo(arg)(bar) syntax. > > > > + if (has_pt_op(vma, copy_vma)) > > + return pt_op(vma, copy_vma)(dst_mm, src_mm, vma); > > > > + if (vma->pagetable_ops && vma->pagetable_ops->copy_vma) > > + return vma->pagetable_ops->copy_vma(dst_mm, src_mm, vma); > > > > I guess it does lead to some longish lines. Does it start looking > > really nasty? > > Yeah, it starts to look pretty bad. Some of these calls are in code > that is already indented several times. can we just make sure these things are never NULL in the first place? would obsolete a lot of the checks, which are also runtime overhead as well! -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. @ 2007-03-21 15:05 ` Arjan van de Ven 0 siblings, 0 replies; 68+ messages in thread From: Arjan van de Ven @ 2007-03-21 15:05 UTC (permalink / raw) To: Adam Litke Cc: Dave Hansen, Andrew Morton, William Lee Irwin III, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel On Wed, 2007-03-21 at 09:50 -0500, Adam Litke wrote: > On Tue, 2007-03-20 at 16:24 -0700, Dave Hansen wrote: > > On Mon, 2007-03-19 at 13:05 -0700, Adam Litke wrote: > > > > > > +#define has_pt_op(vma, op) \ > > > + ((vma)->pagetable_ops && (vma)->pagetable_ops->op) > > > +#define pt_op(vma, call) \ > > > + ((vma)->pagetable_ops->call) > > > > Can you get rid of these macros? I think they make it a wee bit harder > > to read. My brain doesn't properly parse the foo(arg)(bar) syntax. > > > > + if (has_pt_op(vma, copy_vma)) > > + return pt_op(vma, copy_vma)(dst_mm, src_mm, vma); > > > > + if (vma->pagetable_ops && vma->pagetable_ops->copy_vma) > > + return vma->pagetable_ops->copy_vma(dst_mm, src_mm, vma); > > > > I guess it does lead to some longish lines. Does it start looking > > really nasty? > > Yeah, it starts to look pretty bad. Some of these calls are in code > that is already indented several times. can we just make sure these things are never NULL in the first place? would obsolete a lot of the checks, which are also runtime overhead as well! -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. 2007-03-19 20:05 ` Adam Litke @ 2007-03-21 4:18 ` Nick Piggin -1 siblings, 0 replies; 68+ messages in thread From: Nick Piggin @ 2007-03-21 4:18 UTC (permalink / raw) To: Adam Litke Cc: Andrew Morton, Arjan van de Ven, William Lee Irwin III, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel Adam Litke wrote: > Signed-off-by: Adam Litke <agl@us.ibm.com> > --- > > include/linux/mm.h | 25 +++++++++++++++++++++++++ > 1 files changed, 25 insertions(+), 0 deletions(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 60e0e4a..7089323 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -98,6 +98,7 @@ struct vm_area_struct { > > /* Function pointers to deal with this struct. */ > struct vm_operations_struct * vm_ops; > + const struct pagetable_operations_struct * pagetable_ops; > > /* Information about our backing store: */ > unsigned long vm_pgoff; /* Offset (within vm_file) in PAGE_SIZE Can you remind me why this isn't in vm_ops? Also, it is going to be hugepage-only, isn't it? So should the naming be changed to reflect that? And #ifdef it... > @@ -218,6 +219,30 @@ struct vm_operations_struct { > }; > > struct mmu_gather; > + > +struct pagetable_operations_struct { > + int (*fault)(struct mm_struct *mm, > + struct vm_area_struct *vma, > + unsigned long address, int write_access); I got dibs on fault ;) My callback is a sanitised one that basically abstracts the details of the virtual memory mapping away, so it is usable by drivers and filesystems. You actually want to bypass the normal fault handling because it doesn't know how to deal with your virtual memory mapping. Hmm, the best suggestion I can come up with is handle_mm_fault... unless you can think of a better name for me to use. > + int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src, > + struct vm_area_struct *vma); > + int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma, > + struct page **pages, struct vm_area_struct **vmas, > + unsigned long *position, int *length, int i); > + void (*change_protection)(struct vm_area_struct *vma, > + unsigned long address, unsigned long end, pgprot_t newprot); > + unsigned long (*unmap_page_range)(struct vm_area_struct *vma, > + unsigned long address, unsigned long end, long *zap_work); > + void (*free_pgtable_range)(struct mmu_gather **tlb, > + unsigned long addr, unsigned long end, > + unsigned long floor, unsigned long ceiling); > +}; > + > +#define has_pt_op(vma, op) \ > + ((vma)->pagetable_ops && (vma)->pagetable_ops->op) > +#define pt_op(vma, call) \ > + ((vma)->pagetable_ops->call) > + > struct inode; > > #define page_private(page) ((page)->private) > > -- -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. @ 2007-03-21 4:18 ` Nick Piggin 0 siblings, 0 replies; 68+ messages in thread From: Nick Piggin @ 2007-03-21 4:18 UTC (permalink / raw) To: Adam Litke Cc: Andrew Morton, Arjan van de Ven, William Lee Irwin III, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel Adam Litke wrote: > Signed-off-by: Adam Litke <agl@us.ibm.com> > --- > > include/linux/mm.h | 25 +++++++++++++++++++++++++ > 1 files changed, 25 insertions(+), 0 deletions(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 60e0e4a..7089323 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -98,6 +98,7 @@ struct vm_area_struct { > > /* Function pointers to deal with this struct. */ > struct vm_operations_struct * vm_ops; > + const struct pagetable_operations_struct * pagetable_ops; > > /* Information about our backing store: */ > unsigned long vm_pgoff; /* Offset (within vm_file) in PAGE_SIZE Can you remind me why this isn't in vm_ops? Also, it is going to be hugepage-only, isn't it? So should the naming be changed to reflect that? And #ifdef it... > @@ -218,6 +219,30 @@ struct vm_operations_struct { > }; > > struct mmu_gather; > + > +struct pagetable_operations_struct { > + int (*fault)(struct mm_struct *mm, > + struct vm_area_struct *vma, > + unsigned long address, int write_access); I got dibs on fault ;) My callback is a sanitised one that basically abstracts the details of the virtual memory mapping away, so it is usable by drivers and filesystems. You actually want to bypass the normal fault handling because it doesn't know how to deal with your virtual memory mapping. Hmm, the best suggestion I can come up with is handle_mm_fault... unless you can think of a better name for me to use. > + int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src, > + struct vm_area_struct *vma); > + int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma, > + struct page **pages, struct vm_area_struct **vmas, > + unsigned long *position, int *length, int i); > + void (*change_protection)(struct vm_area_struct *vma, > + unsigned long address, unsigned long end, pgprot_t newprot); > + unsigned long (*unmap_page_range)(struct vm_area_struct *vma, > + unsigned long address, unsigned long end, long *zap_work); > + void (*free_pgtable_range)(struct mmu_gather **tlb, > + unsigned long addr, unsigned long end, > + unsigned long floor, unsigned long ceiling); > +}; > + > +#define has_pt_op(vma, op) \ > + ((vma)->pagetable_ops && (vma)->pagetable_ops->op) > +#define pt_op(vma, call) \ > + ((vma)->pagetable_ops->call) > + > struct inode; > > #define page_private(page) ((page)->private) > > -- -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. 2007-03-21 4:18 ` Nick Piggin @ 2007-03-21 4:52 ` William Lee Irwin III -1 siblings, 0 replies; 68+ messages in thread From: William Lee Irwin III @ 2007-03-21 4:52 UTC (permalink / raw) To: Nick Piggin Cc: Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel Adam Litke wrote: >> struct vm_operations_struct * vm_ops; >> + const struct pagetable_operations_struct * pagetable_ops; On Wed, Mar 21, 2007 at 03:18:30PM +1100, Nick Piggin wrote: > Can you remind me why this isn't in vm_ops? > Also, it is going to be hugepage-only, isn't it? So should the naming be > changed to reflect that? And #ifdef it... ISTR potential ppc64 users coming out of the woodwork for something I didn't recognize the name of, but I may be confusing that with your patch. I can implement additional users (and useful ones at that) needing this in particular if desired. Adam Litke wrote: >> +struct pagetable_operations_struct { >> + int (*fault)(struct mm_struct *mm, On Wed, Mar 21, 2007 at 03:18:30PM +1100, Nick Piggin wrote: > I got dibs on fault ;) > My callback is a sanitised one that basically abstracts the details of the > virtual memory mapping away, so it is usable by drivers and filesystems. > You actually want to bypass the normal fault handling because it doesn't > know how to deal with your virtual memory mapping. Hmm, the best suggestion > I can come up with is handle_mm_fault... unless you can think of a better > name for me to use. Two fault handling methods callbacks raise an eyebrow over here at least. I was vaguely hoping for unification of the fault handling callbacks. -- wli ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. @ 2007-03-21 4:52 ` William Lee Irwin III 0 siblings, 0 replies; 68+ messages in thread From: William Lee Irwin III @ 2007-03-21 4:52 UTC (permalink / raw) To: Nick Piggin Cc: Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel Adam Litke wrote: >> struct vm_operations_struct * vm_ops; >> + const struct pagetable_operations_struct * pagetable_ops; On Wed, Mar 21, 2007 at 03:18:30PM +1100, Nick Piggin wrote: > Can you remind me why this isn't in vm_ops? > Also, it is going to be hugepage-only, isn't it? So should the naming be > changed to reflect that? And #ifdef it... ISTR potential ppc64 users coming out of the woodwork for something I didn't recognize the name of, but I may be confusing that with your patch. I can implement additional users (and useful ones at that) needing this in particular if desired. Adam Litke wrote: >> +struct pagetable_operations_struct { >> + int (*fault)(struct mm_struct *mm, On Wed, Mar 21, 2007 at 03:18:30PM +1100, Nick Piggin wrote: > I got dibs on fault ;) > My callback is a sanitised one that basically abstracts the details of the > virtual memory mapping away, so it is usable by drivers and filesystems. > You actually want to bypass the normal fault handling because it doesn't > know how to deal with your virtual memory mapping. Hmm, the best suggestion > I can come up with is handle_mm_fault... unless you can think of a better > name for me to use. Two fault handling methods callbacks raise an eyebrow over here at least. I was vaguely hoping for unification of the fault handling callbacks. -- wli -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. 2007-03-21 4:52 ` William Lee Irwin III @ 2007-03-21 5:07 ` Nick Piggin -1 siblings, 0 replies; 68+ messages in thread From: Nick Piggin @ 2007-03-21 5:07 UTC (permalink / raw) To: William Lee Irwin III Cc: Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel William Lee Irwin III wrote: > Adam Litke wrote: > >>> struct vm_operations_struct * vm_ops; >>>+ const struct pagetable_operations_struct * pagetable_ops; > > > On Wed, Mar 21, 2007 at 03:18:30PM +1100, Nick Piggin wrote: > >>Can you remind me why this isn't in vm_ops? >>Also, it is going to be hugepage-only, isn't it? So should the naming be >>changed to reflect that? And #ifdef it... > > > ISTR potential ppc64 users coming out of the woodwork for something I > didn't recognize the name of, but I may be confusing that with your > patch. I can implement additional users (and useful ones at that) > needing this in particular if desired. Yes I would be interested in seeing useful additional users of this that cannot use our regular virtual memory, before making it a general thing. I just don't want to see proliferation of these things, if possible. > Adam Litke wrote: > >>>+struct pagetable_operations_struct { >>>+ int (*fault)(struct mm_struct *mm, > > > On Wed, Mar 21, 2007 at 03:18:30PM +1100, Nick Piggin wrote: > >>I got dibs on fault ;) >>My callback is a sanitised one that basically abstracts the details of the >>virtual memory mapping away, so it is usable by drivers and filesystems. >>You actually want to bypass the normal fault handling because it doesn't >>know how to deal with your virtual memory mapping. Hmm, the best suggestion >>I can come up with is handle_mm_fault... unless you can think of a better >>name for me to use. > > > Two fault handling methods callbacks raise an eyebrow over here at least. > I was vaguely hoping for unification of the fault handling callbacks. I don't know if it would be so clean to do that as they are at different levels. Adam's fault is before the VM translation (and bypasses it), and mine is after. -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. @ 2007-03-21 5:07 ` Nick Piggin 0 siblings, 0 replies; 68+ messages in thread From: Nick Piggin @ 2007-03-21 5:07 UTC (permalink / raw) To: William Lee Irwin III Cc: Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel William Lee Irwin III wrote: > Adam Litke wrote: > >>> struct vm_operations_struct * vm_ops; >>>+ const struct pagetable_operations_struct * pagetable_ops; > > > On Wed, Mar 21, 2007 at 03:18:30PM +1100, Nick Piggin wrote: > >>Can you remind me why this isn't in vm_ops? >>Also, it is going to be hugepage-only, isn't it? So should the naming be >>changed to reflect that? And #ifdef it... > > > ISTR potential ppc64 users coming out of the woodwork for something I > didn't recognize the name of, but I may be confusing that with your > patch. I can implement additional users (and useful ones at that) > needing this in particular if desired. Yes I would be interested in seeing useful additional users of this that cannot use our regular virtual memory, before making it a general thing. I just don't want to see proliferation of these things, if possible. > Adam Litke wrote: > >>>+struct pagetable_operations_struct { >>>+ int (*fault)(struct mm_struct *mm, > > > On Wed, Mar 21, 2007 at 03:18:30PM +1100, Nick Piggin wrote: > >>I got dibs on fault ;) >>My callback is a sanitised one that basically abstracts the details of the >>virtual memory mapping away, so it is usable by drivers and filesystems. >>You actually want to bypass the normal fault handling because it doesn't >>know how to deal with your virtual memory mapping. Hmm, the best suggestion >>I can come up with is handle_mm_fault... unless you can think of a better >>name for me to use. > > > Two fault handling methods callbacks raise an eyebrow over here at least. > I was vaguely hoping for unification of the fault handling callbacks. I don't know if it would be so clean to do that as they are at different levels. Adam's fault is before the VM translation (and bypasses it), and mine is after. -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. 2007-03-21 5:07 ` Nick Piggin @ 2007-03-21 5:41 ` William Lee Irwin III -1 siblings, 0 replies; 68+ messages in thread From: William Lee Irwin III @ 2007-03-21 5:41 UTC (permalink / raw) To: Nick Piggin Cc: Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel William Lee Irwin III wrote: >> ISTR potential ppc64 users coming out of the woodwork for something I >> didn't recognize the name of, but I may be confusing that with your >> patch. I can implement additional users (and useful ones at that) >> needing this in particular if desired. On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote: > Yes I would be interested in seeing useful additional users of this > that cannot use our regular virtual memory, before making it a general > thing. > I just don't want to see proliferation of these things, if possible. I'm tied up elsewhere so I won't get to it in a timely fashion. Maybe in a few weeks I can start up on the first two of the bunch. William Lee Irwin III wrote: >> Two fault handling methods callbacks raise an eyebrow over here at least. >> I was vaguely hoping for unification of the fault handling callbacks. On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote: > I don't know if it would be so clean to do that as they are at different > levels. > Adam's fault is before the VM translation (and bypasses it), and mine is > after. Not much of a VM translation; it's just a lookup through the software mocked-up structures on everything save i386, x86_64, and some m68k where they're the same thing only with hardware walkers (ISTR ia64's being firmware a la Alpha despite the "HPW" name, though I could be wrong) reliant on them. The drivers/etc. could just as easily use helper functions to carry out the lookup, thereby accomplishing the unification. There's nothing particularly fundamental about a pte lookup. Normal arches that do software TLB refill could just as easily consult the radix trees dangled off struct address_space or any old data structure floating around the kernel with enough information to translate user virtual addresses to the physical addresses they need to fill the TLB with, and there are other kernels that literally do things like that. Basically, drop in to the ->fault() callback with no attempt at a pte lookup. The drivers using the standard pagetable format can call helper functions to do all the gruntwork surrounding that for them. Then the more sophisticated drivers can do the necessary work by hand. But others should really be consulted on this point. My notions in/around this area tend to be outside the mainstream. I can anticipate that the two ->fault() functions will look strange to people, but not what alternatives would be most idiomatic to mainstream Linux conventions. -- wli ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. @ 2007-03-21 5:41 ` William Lee Irwin III 0 siblings, 0 replies; 68+ messages in thread From: William Lee Irwin III @ 2007-03-21 5:41 UTC (permalink / raw) To: Nick Piggin Cc: Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel William Lee Irwin III wrote: >> ISTR potential ppc64 users coming out of the woodwork for something I >> didn't recognize the name of, but I may be confusing that with your >> patch. I can implement additional users (and useful ones at that) >> needing this in particular if desired. On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote: > Yes I would be interested in seeing useful additional users of this > that cannot use our regular virtual memory, before making it a general > thing. > I just don't want to see proliferation of these things, if possible. I'm tied up elsewhere so I won't get to it in a timely fashion. Maybe in a few weeks I can start up on the first two of the bunch. William Lee Irwin III wrote: >> Two fault handling methods callbacks raise an eyebrow over here at least. >> I was vaguely hoping for unification of the fault handling callbacks. On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote: > I don't know if it would be so clean to do that as they are at different > levels. > Adam's fault is before the VM translation (and bypasses it), and mine is > after. Not much of a VM translation; it's just a lookup through the software mocked-up structures on everything save i386, x86_64, and some m68k where they're the same thing only with hardware walkers (ISTR ia64's being firmware a la Alpha despite the "HPW" name, though I could be wrong) reliant on them. The drivers/etc. could just as easily use helper functions to carry out the lookup, thereby accomplishing the unification. There's nothing particularly fundamental about a pte lookup. Normal arches that do software TLB refill could just as easily consult the radix trees dangled off struct address_space or any old data structure floating around the kernel with enough information to translate user virtual addresses to the physical addresses they need to fill the TLB with, and there are other kernels that literally do things like that. Basically, drop in to the ->fault() callback with no attempt at a pte lookup. The drivers using the standard pagetable format can call helper functions to do all the gruntwork surrounding that for them. Then the more sophisticated drivers can do the necessary work by hand. But others should really be consulted on this point. My notions in/around this area tend to be outside the mainstream. I can anticipate that the two ->fault() functions will look strange to people, but not what alternatives would be most idiomatic to mainstream Linux conventions. -- wli -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. 2007-03-21 5:41 ` William Lee Irwin III @ 2007-03-21 6:51 ` Nick Piggin -1 siblings, 0 replies; 68+ messages in thread From: Nick Piggin @ 2007-03-21 6:51 UTC (permalink / raw) To: William Lee Irwin III Cc: Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel, Linus Torvalds William Lee Irwin III wrote: > William Lee Irwin III wrote: > >>>ISTR potential ppc64 users coming out of the woodwork for something I >>>didn't recognize the name of, but I may be confusing that with your >>>patch. I can implement additional users (and useful ones at that) >>>needing this in particular if desired. > > > On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote: > >>Yes I would be interested in seeing useful additional users of this >>that cannot use our regular virtual memory, before making it a general >>thing. >>I just don't want to see proliferation of these things, if possible. > > > I'm tied up elsewhere so I won't get to it in a timely fashion. Maybe > in a few weeks I can start up on the first two of the bunch. Care to give us a hint? :) > William Lee Irwin III wrote: > >>>Two fault handling methods callbacks raise an eyebrow over here at least. >>>I was vaguely hoping for unification of the fault handling callbacks. > > > On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote: > >>I don't know if it would be so clean to do that as they are at different >>levels. >>Adam's fault is before the VM translation (and bypasses it), and mine is >>after. > > > Not much of a VM translation; it's just a lookup through the software > mocked-up structures on everything save i386, x86_64, and some m68k where > they're the same thing only with hardware walkers (ISTR ia64's being > firmware a la Alpha despite the "HPW" name, though I could be wrong) Well the vma+pagetables *are* our VM translation data structure. It is a good data structure. The Gelato/UNSW guys experimenting with changing this have basically said they haven't yet got anything that beats it. I would be opposed to anything that bypasses that unless a) it is not applicable to the VM as a whole, and b) it is really worth it (hugepages was a reasonable exception). > reliant on them. The drivers/etc. could just as easily use helper > functions to carry out the lookup, thereby accomplishing the > unification. There's nothing particularly fundamental about a pte > lookup. Yeah you could, but it looks back to front to me. The VM tells the filesystem that the machine took a fault at virtual address X, then the filesystem asks the VM what pgoff that is, then tells the VM to install the corresponding page to vaddr X. With my ->fault, the VM asks the filesystem to give the page that corresponds to vaddr X, then installs it into that vaddr. > Normal arches that do software TLB refill could just as easily > consult the radix trees dangled off struct address_space or any old > data structure floating around the kernel with enough information to > translate user virtual addresses to the physical addresses they need to > fill the TLB with, and there are other kernels that literally do things > like that. Sure it *could* be done, but it may not be very nice, given Linux's design. And you definitely need _something_ other than just the pagecache radix-tree, because the VM needs to know who maps the page. So if, for your backing store, you use a small hash table and evict old entries like powerpc, you'll constantly be faulting in and out pages from the VM's high level view of the address space. That isn't a really cheap operation. It takes at least: read_lock_irq(mapping->tree_lock); radix_tree_lookup() read_unlock_irq(mapping->tree_lock); lock_page() atomic_add(page->_count) atomic_add(page->_mapcount) unlock_page() atomic_add_negative(page->_mapcount) atomic_dec_and_test(page->_count) Compared to our current page table walk which is just a single locked op + barrier for the spinlock + radix tree walk. If you had a very large hash table (ia64 long mode, maybe?), then you may have slightly fewer high level faults, but range based operations are going to take a whole lot of cache misses, aren't they? Especially for small processes. Not that I wouldn't be happy to be proven wrong, but I don't think it should be something that sneaks in under these pagetable operations. IMO. -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. @ 2007-03-21 6:51 ` Nick Piggin 0 siblings, 0 replies; 68+ messages in thread From: Nick Piggin @ 2007-03-21 6:51 UTC (permalink / raw) To: William Lee Irwin III Cc: Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel, Linus Torvalds William Lee Irwin III wrote: > William Lee Irwin III wrote: > >>>ISTR potential ppc64 users coming out of the woodwork for something I >>>didn't recognize the name of, but I may be confusing that with your >>>patch. I can implement additional users (and useful ones at that) >>>needing this in particular if desired. > > > On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote: > >>Yes I would be interested in seeing useful additional users of this >>that cannot use our regular virtual memory, before making it a general >>thing. >>I just don't want to see proliferation of these things, if possible. > > > I'm tied up elsewhere so I won't get to it in a timely fashion. Maybe > in a few weeks I can start up on the first two of the bunch. Care to give us a hint? :) > William Lee Irwin III wrote: > >>>Two fault handling methods callbacks raise an eyebrow over here at least. >>>I was vaguely hoping for unification of the fault handling callbacks. > > > On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote: > >>I don't know if it would be so clean to do that as they are at different >>levels. >>Adam's fault is before the VM translation (and bypasses it), and mine is >>after. > > > Not much of a VM translation; it's just a lookup through the software > mocked-up structures on everything save i386, x86_64, and some m68k where > they're the same thing only with hardware walkers (ISTR ia64's being > firmware a la Alpha despite the "HPW" name, though I could be wrong) Well the vma+pagetables *are* our VM translation data structure. It is a good data structure. The Gelato/UNSW guys experimenting with changing this have basically said they haven't yet got anything that beats it. I would be opposed to anything that bypasses that unless a) it is not applicable to the VM as a whole, and b) it is really worth it (hugepages was a reasonable exception). > reliant on them. The drivers/etc. could just as easily use helper > functions to carry out the lookup, thereby accomplishing the > unification. There's nothing particularly fundamental about a pte > lookup. Yeah you could, but it looks back to front to me. The VM tells the filesystem that the machine took a fault at virtual address X, then the filesystem asks the VM what pgoff that is, then tells the VM to install the corresponding page to vaddr X. With my ->fault, the VM asks the filesystem to give the page that corresponds to vaddr X, then installs it into that vaddr. > Normal arches that do software TLB refill could just as easily > consult the radix trees dangled off struct address_space or any old > data structure floating around the kernel with enough information to > translate user virtual addresses to the physical addresses they need to > fill the TLB with, and there are other kernels that literally do things > like that. Sure it *could* be done, but it may not be very nice, given Linux's design. And you definitely need _something_ other than just the pagecache radix-tree, because the VM needs to know who maps the page. So if, for your backing store, you use a small hash table and evict old entries like powerpc, you'll constantly be faulting in and out pages from the VM's high level view of the address space. That isn't a really cheap operation. It takes at least: read_lock_irq(mapping->tree_lock); radix_tree_lookup() read_unlock_irq(mapping->tree_lock); lock_page() atomic_add(page->_count) atomic_add(page->_mapcount) unlock_page() atomic_add_negative(page->_mapcount) atomic_dec_and_test(page->_count) Compared to our current page table walk which is just a single locked op + barrier for the spinlock + radix tree walk. If you had a very large hash table (ia64 long mode, maybe?), then you may have slightly fewer high level faults, but range based operations are going to take a whole lot of cache misses, aren't they? Especially for small processes. Not that I wouldn't be happy to be proven wrong, but I don't think it should be something that sneaks in under these pagetable operations. IMO. -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. 2007-03-21 6:51 ` Nick Piggin @ 2007-03-21 7:36 ` Nick Piggin -1 siblings, 0 replies; 68+ messages in thread From: Nick Piggin @ 2007-03-21 7:36 UTC (permalink / raw) To: Nick Piggin Cc: William Lee Irwin III, Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel, Linus Torvalds Nick Piggin wrote: > Yeah you could, but it looks back to front to me. > > The VM tells the filesystem that the machine took a fault at virtual > address X, then the filesystem asks the VM what pgoff that is, then > tells the VM to install the corresponding page to vaddr X. > > With my ->fault, the VM asks the filesystem to give the page that > corresponds to vaddr X, then installs it into that vaddr. Err, sorry, that's what the current ->nopage does. It is then still up to the filesystem to do the vaddr to pgoff conversion. My fault patches of course just ask the filesystem for the page at a given pgoff. -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. @ 2007-03-21 7:36 ` Nick Piggin 0 siblings, 0 replies; 68+ messages in thread From: Nick Piggin @ 2007-03-21 7:36 UTC (permalink / raw) To: Nick Piggin Cc: William Lee Irwin III, Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel, Linus Torvalds Nick Piggin wrote: > Yeah you could, but it looks back to front to me. > > The VM tells the filesystem that the machine took a fault at virtual > address X, then the filesystem asks the VM what pgoff that is, then > tells the VM to install the corresponding page to vaddr X. > > With my ->fault, the VM asks the filesystem to give the page that > corresponds to vaddr X, then installs it into that vaddr. Err, sorry, that's what the current ->nopage does. It is then still up to the filesystem to do the vaddr to pgoff conversion. My fault patches of course just ask the filesystem for the page at a given pgoff. -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. 2007-03-21 6:51 ` Nick Piggin @ 2007-03-21 10:46 ` William Lee Irwin III -1 siblings, 0 replies; 68+ messages in thread From: William Lee Irwin III @ 2007-03-21 10:46 UTC (permalink / raw) To: Nick Piggin Cc: Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel, Linus Torvalds William Lee Irwin III wrote: >> I'm tied up elsewhere so I won't get to it in a timely fashion. Maybe >> in a few weeks I can start up on the first two of the bunch. On Wed, Mar 21, 2007 at 05:51:23PM +1100, Nick Piggin wrote: > Care to give us a hint? :) The first is something DISM-like. I've not made up my mind on the second, but the shopping catalogue of feature requests I've done nothing about for some time that want this is long. William Lee Irwin III wrote: >> Not much of a VM translation; it's just a lookup through the >> software mocked-up structures on everything save i386, x86_64, and >> some m68k where they're the same thing only with hardware walkers >> (ISTR ia64's being firmware a la Alpha despite the "HPW" name, >> though I could be wrong) On Wed, Mar 21, 2007 at 05:51:23PM +1100, Nick Piggin wrote: > Well the vma+pagetables *are* our VM translation data structure. It is > a good data structure. The Gelato/UNSW guys experimenting with changing > this have basically said they haven't yet got anything that beats it. > I would be opposed to anything that bypasses that unless a) it is not > applicable to the VM as a whole, and b) it is really worth it > (hugepages was a reasonable exception). Maybe anticipating the conventional Linux approach to this wasn't as difficult as I supposed. ;) William Lee Irwin III wrote: >> reliant on them. The drivers/etc. could just as easily use helper >> functions to carry out the lookup, thereby accomplishing the >> unification. There's nothing particularly fundamental about a pte >> lookup. On Wed, Mar 21, 2007 at 05:51:23PM +1100, Nick Piggin wrote: > Yeah you could, but it looks back to front to me. > The VM tells the filesystem that the machine took a fault at virtual > address X, then the filesystem asks the VM what pgoff that is, then > tells the VM to install the corresponding page to vaddr X. > With my ->fault, the VM asks the filesystem to give the page that > corresponds to vaddr X, then installs it into that vaddr. I'm aware of what is now done and the minor modification accomplished by your ->fault(). Maybe I've even written something like this before that I never posted. It's obvious what I'm on about and that my thoughts here are too divergent to fly. Others should chime in with more Linux-native ideas about what's to be done here. William Lee Irwin III wrote: >> Normal arches that do software TLB refill could just as easily >> consult the radix trees dangled off struct address_space or any old >> data structure floating around the kernel with enough information to >> translate user virtual addresses to the physical addresses they need to >> fill the TLB with, and there are other kernels that literally do things >> like that. On Wed, Mar 21, 2007 at 05:51:23PM +1100, Nick Piggin wrote: > Sure it *could* be done, but it may not be very nice, given Linux's > design. And you definitely need _something_ other than just the > pagecache radix-tree, because the VM needs to know who maps the page. > So if, for your backing store, you use a small hash table and evict old > entries like powerpc, you'll constantly be faulting in and out pages > from the VM's high level view of the address space. That isn't a really > cheap operation. It takes at least: [long list of locking operations snipped] > Compared to our current page table walk which is just a single locked > op + barrier for the spinlock + radix tree walk. > If you had a very large hash table (ia64 long mode, maybe?), then you > may have slightly fewer high level faults, but range based operations > are going to take a whole lot of cache misses, aren't they? Especially > for small processes. > Not that I wouldn't be happy to be proven wrong, but I don't think it > should be something that sneaks in under these pagetable operations. > IMO. I'll presume that was not for my benefit; if so, it was superfluous. The example I gave was to show how far things could diverge from Linux' conventions. Every single locking operation cited for Linux didn't apply to the kernel I was thinking of due to its lockless pagecache analogue, its lack of a direct equivalent of struct page, and its use of different lifetime-bounding protocols from reference counting. Things like page replacement didn't rely on things that would disturb all that. It all worked out quite well for that kernel. So not only can it be done other ways, but those ways are indeed efficient. It should be clear from the above that retrofitting Linux to do similar is effectively impossible. (Well, if you think you can pull off removing struct page in favor of no direct equivalent and bounding the lifetimes of page-sized chunks of memory by shooting down all references using knowledge of who could possibly be hanging onto them in Linux, feel free to attempt such a retrofit, and I'll send you a case of Scotch whisky if you can get it to boot and run a major database benchmark without crashing regardless of whether it's merged.) In any event, let's not talk too much at cross-purposes. I'm deferring to others on this, as I said. Someone else will doubtless come at this from a direction that gibes better with Linux-native conventions. -- wli ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. @ 2007-03-21 10:46 ` William Lee Irwin III 0 siblings, 0 replies; 68+ messages in thread From: William Lee Irwin III @ 2007-03-21 10:46 UTC (permalink / raw) To: Nick Piggin Cc: Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel, Linus Torvalds William Lee Irwin III wrote: >> I'm tied up elsewhere so I won't get to it in a timely fashion. Maybe >> in a few weeks I can start up on the first two of the bunch. On Wed, Mar 21, 2007 at 05:51:23PM +1100, Nick Piggin wrote: > Care to give us a hint? :) The first is something DISM-like. I've not made up my mind on the second, but the shopping catalogue of feature requests I've done nothing about for some time that want this is long. William Lee Irwin III wrote: >> Not much of a VM translation; it's just a lookup through the >> software mocked-up structures on everything save i386, x86_64, and >> some m68k where they're the same thing only with hardware walkers >> (ISTR ia64's being firmware a la Alpha despite the "HPW" name, >> though I could be wrong) On Wed, Mar 21, 2007 at 05:51:23PM +1100, Nick Piggin wrote: > Well the vma+pagetables *are* our VM translation data structure. It is > a good data structure. The Gelato/UNSW guys experimenting with changing > this have basically said they haven't yet got anything that beats it. > I would be opposed to anything that bypasses that unless a) it is not > applicable to the VM as a whole, and b) it is really worth it > (hugepages was a reasonable exception). Maybe anticipating the conventional Linux approach to this wasn't as difficult as I supposed. ;) William Lee Irwin III wrote: >> reliant on them. The drivers/etc. could just as easily use helper >> functions to carry out the lookup, thereby accomplishing the >> unification. There's nothing particularly fundamental about a pte >> lookup. On Wed, Mar 21, 2007 at 05:51:23PM +1100, Nick Piggin wrote: > Yeah you could, but it looks back to front to me. > The VM tells the filesystem that the machine took a fault at virtual > address X, then the filesystem asks the VM what pgoff that is, then > tells the VM to install the corresponding page to vaddr X. > With my ->fault, the VM asks the filesystem to give the page that > corresponds to vaddr X, then installs it into that vaddr. I'm aware of what is now done and the minor modification accomplished by your ->fault(). Maybe I've even written something like this before that I never posted. It's obvious what I'm on about and that my thoughts here are too divergent to fly. Others should chime in with more Linux-native ideas about what's to be done here. William Lee Irwin III wrote: >> Normal arches that do software TLB refill could just as easily >> consult the radix trees dangled off struct address_space or any old >> data structure floating around the kernel with enough information to >> translate user virtual addresses to the physical addresses they need to >> fill the TLB with, and there are other kernels that literally do things >> like that. On Wed, Mar 21, 2007 at 05:51:23PM +1100, Nick Piggin wrote: > Sure it *could* be done, but it may not be very nice, given Linux's > design. And you definitely need _something_ other than just the > pagecache radix-tree, because the VM needs to know who maps the page. > So if, for your backing store, you use a small hash table and evict old > entries like powerpc, you'll constantly be faulting in and out pages > from the VM's high level view of the address space. That isn't a really > cheap operation. It takes at least: [long list of locking operations snipped] > Compared to our current page table walk which is just a single locked > op + barrier for the spinlock + radix tree walk. > If you had a very large hash table (ia64 long mode, maybe?), then you > may have slightly fewer high level faults, but range based operations > are going to take a whole lot of cache misses, aren't they? Especially > for small processes. > Not that I wouldn't be happy to be proven wrong, but I don't think it > should be something that sneaks in under these pagetable operations. > IMO. I'll presume that was not for my benefit; if so, it was superfluous. The example I gave was to show how far things could diverge from Linux' conventions. Every single locking operation cited for Linux didn't apply to the kernel I was thinking of due to its lockless pagecache analogue, its lack of a direct equivalent of struct page, and its use of different lifetime-bounding protocols from reference counting. Things like page replacement didn't rely on things that would disturb all that. It all worked out quite well for that kernel. So not only can it be done other ways, but those ways are indeed efficient. It should be clear from the above that retrofitting Linux to do similar is effectively impossible. (Well, if you think you can pull off removing struct page in favor of no direct equivalent and bounding the lifetimes of page-sized chunks of memory by shooting down all references using knowledge of who could possibly be hanging onto them in Linux, feel free to attempt such a retrofit, and I'll send you a case of Scotch whisky if you can get it to boot and run a major database benchmark without crashing regardless of whether it's merged.) In any event, let's not talk too much at cross-purposes. I'm deferring to others on this, as I said. Someone else will doubtless come at this from a direction that gibes better with Linux-native conventions. -- wli -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. 2007-03-21 4:18 ` Nick Piggin @ 2007-03-21 15:17 ` Adam Litke -1 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-03-21 15:17 UTC (permalink / raw) To: Nick Piggin Cc: Andrew Morton, Arjan van de Ven, William Lee Irwin III, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel On Wed, 2007-03-21 at 15:18 +1100, Nick Piggin wrote: > Adam Litke wrote: > > Signed-off-by: Adam Litke <agl@us.ibm.com> > > --- > > > > include/linux/mm.h | 25 +++++++++++++++++++++++++ > > 1 files changed, 25 insertions(+), 0 deletions(-) > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > index 60e0e4a..7089323 100644 > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -98,6 +98,7 @@ struct vm_area_struct { > > > > /* Function pointers to deal with this struct. */ > > struct vm_operations_struct * vm_ops; > > + const struct pagetable_operations_struct * pagetable_ops; > > > > /* Information about our backing store: */ > > unsigned long vm_pgoff; /* Offset (within vm_file) in PAGE_SIZE > > Can you remind me why this isn't in vm_ops? We didn't want to bloat the size of the vm_ops struct for all of its users. > Also, it is going to be hugepage-only, isn't it? So should the naming be > changed to reflect that? And #ifdef it... They are doing some interesting things on Cell that could take advantage of this. > > @@ -218,6 +219,30 @@ struct vm_operations_struct { > > }; > > > > struct mmu_gather; > > + > > +struct pagetable_operations_struct { > > + int (*fault)(struct mm_struct *mm, > > + struct vm_area_struct *vma, > > + unsigned long address, int write_access); > > I got dibs on fault ;) > > My callback is a sanitised one that basically abstracts the details of the > virtual memory mapping away, so it is usable by drivers and filesystems. > > You actually want to bypass the normal fault handling because it doesn't > know how to deal with your virtual memory mapping. Hmm, the best suggestion > I can come up with is handle_mm_fault... unless you can think of a better > name for me to use. How about I use handle_pte_fault? -- Adam Litke - (agl at us.ibm.com) IBM Linux Technology Center ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. @ 2007-03-21 15:17 ` Adam Litke 0 siblings, 0 replies; 68+ messages in thread From: Adam Litke @ 2007-03-21 15:17 UTC (permalink / raw) To: Nick Piggin Cc: Andrew Morton, Arjan van de Ven, William Lee Irwin III, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel On Wed, 2007-03-21 at 15:18 +1100, Nick Piggin wrote: > Adam Litke wrote: > > Signed-off-by: Adam Litke <agl@us.ibm.com> > > --- > > > > include/linux/mm.h | 25 +++++++++++++++++++++++++ > > 1 files changed, 25 insertions(+), 0 deletions(-) > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > index 60e0e4a..7089323 100644 > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -98,6 +98,7 @@ struct vm_area_struct { > > > > /* Function pointers to deal with this struct. */ > > struct vm_operations_struct * vm_ops; > > + const struct pagetable_operations_struct * pagetable_ops; > > > > /* Information about our backing store: */ > > unsigned long vm_pgoff; /* Offset (within vm_file) in PAGE_SIZE > > Can you remind me why this isn't in vm_ops? We didn't want to bloat the size of the vm_ops struct for all of its users. > Also, it is going to be hugepage-only, isn't it? So should the naming be > changed to reflect that? And #ifdef it... They are doing some interesting things on Cell that could take advantage of this. > > @@ -218,6 +219,30 @@ struct vm_operations_struct { > > }; > > > > struct mmu_gather; > > + > > +struct pagetable_operations_struct { > > + int (*fault)(struct mm_struct *mm, > > + struct vm_area_struct *vma, > > + unsigned long address, int write_access); > > I got dibs on fault ;) > > My callback is a sanitised one that basically abstracts the details of the > virtual memory mapping away, so it is usable by drivers and filesystems. > > You actually want to bypass the normal fault handling because it doesn't > know how to deal with your virtual memory mapping. Hmm, the best suggestion > I can come up with is handle_mm_fault... unless you can think of a better > name for me to use. How about I use handle_pte_fault? -- Adam Litke - (agl at us.ibm.com) IBM Linux Technology Center -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. 2007-03-21 15:17 ` Adam Litke @ 2007-03-21 16:00 ` Christoph Hellwig -1 siblings, 0 replies; 68+ messages in thread From: Christoph Hellwig @ 2007-03-21 16:00 UTC (permalink / raw) To: Adam Litke Cc: Nick Piggin, Andrew Morton, Arjan van de Ven, William Lee Irwin III, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel On Wed, Mar 21, 2007 at 10:17:40AM -0500, Adam Litke wrote: > > Also, it is going to be hugepage-only, isn't it? So should the naming be > > changed to reflect that? And #ifdef it... > > They are doing some interesting things on Cell that could take advantage > of this. That would be new to me. What we need on Cell is fixing up the get_unmapped_area mess which Ben is working on now. And let me once again repeat that I don't like this at all. I'll rather have a few ugly ifdefs in strategic places than a big object oriented mess like this with just a single user. ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. @ 2007-03-21 16:00 ` Christoph Hellwig 0 siblings, 0 replies; 68+ messages in thread From: Christoph Hellwig @ 2007-03-21 16:00 UTC (permalink / raw) To: Adam Litke Cc: Nick Piggin, Andrew Morton, Arjan van de Ven, William Lee Irwin III, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel On Wed, Mar 21, 2007 at 10:17:40AM -0500, Adam Litke wrote: > > Also, it is going to be hugepage-only, isn't it? So should the naming be > > changed to reflect that? And #ifdef it... > > They are doing some interesting things on Cell that could take advantage > of this. That would be new to me. What we need on Cell is fixing up the get_unmapped_area mess which Ben is working on now. And let me once again repeat that I don't like this at all. I'll rather have a few ugly ifdefs in strategic places than a big object oriented mess like this with just a single user. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. 2007-03-21 16:00 ` Christoph Hellwig @ 2007-03-21 23:03 ` Nick Piggin -1 siblings, 0 replies; 68+ messages in thread From: Nick Piggin @ 2007-03-21 23:03 UTC (permalink / raw) To: Christoph Hellwig Cc: Adam Litke, Andrew Morton, Arjan van de Ven, William Lee Irwin III, Ken Chen, linux-mm, linux-kernel Christoph Hellwig wrote: > On Wed, Mar 21, 2007 at 10:17:40AM -0500, Adam Litke wrote: > >>>Also, it is going to be hugepage-only, isn't it? So should the naming be >>>changed to reflect that? And #ifdef it... >> >>They are doing some interesting things on Cell that could take advantage >>of this. > > > That would be new to me. What we need on Cell is fixing up the > get_unmapped_area mess which Ben is working on now. > > And let me once again repeat that I don't like this at all. I'll > rather have a few ugly ifdefs in strategic places than a big object > oriented mess like this with just a single user. I think I agree that we'd need more than one user for this. -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. @ 2007-03-21 23:03 ` Nick Piggin 0 siblings, 0 replies; 68+ messages in thread From: Nick Piggin @ 2007-03-21 23:03 UTC (permalink / raw) To: Christoph Hellwig Cc: Adam Litke, Andrew Morton, Arjan van de Ven, William Lee Irwin III, Ken Chen, linux-mm, linux-kernel Christoph Hellwig wrote: > On Wed, Mar 21, 2007 at 10:17:40AM -0500, Adam Litke wrote: > >>>Also, it is going to be hugepage-only, isn't it? So should the naming be >>>changed to reflect that? And #ifdef it... >> >>They are doing some interesting things on Cell that could take advantage >>of this. > > > That would be new to me. What we need on Cell is fixing up the > get_unmapped_area mess which Ben is working on now. > > And let me once again repeat that I don't like this at all. I'll > rather have a few ugly ifdefs in strategic places than a big object > oriented mess like this with just a single user. I think I agree that we'd need more than one user for this. -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. 2007-03-21 15:17 ` Adam Litke @ 2007-03-21 23:02 ` Nick Piggin -1 siblings, 0 replies; 68+ messages in thread From: Nick Piggin @ 2007-03-21 23:02 UTC (permalink / raw) To: Adam Litke Cc: Andrew Morton, Arjan van de Ven, William Lee Irwin III, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel Adam Litke wrote: > On Wed, 2007-03-21 at 15:18 +1100, Nick Piggin wrote: > >>Adam Litke wrote: >>>diff --git a/include/linux/mm.h b/include/linux/mm.h >>>index 60e0e4a..7089323 100644 >>>--- a/include/linux/mm.h >>>+++ b/include/linux/mm.h >>>@@ -98,6 +98,7 @@ struct vm_area_struct { >>> >>> /* Function pointers to deal with this struct. */ >>> struct vm_operations_struct * vm_ops; >>>+ const struct pagetable_operations_struct * pagetable_ops; >>> >>> /* Information about our backing store: */ >>> unsigned long vm_pgoff; /* Offset (within vm_file) in PAGE_SIZE >> >>Can you remind me why this isn't in vm_ops? > > > We didn't want to bloat the size of the vm_ops struct for all of its > users. But vmas are surely far more numerous than vm_ops, aren't they? -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. @ 2007-03-21 23:02 ` Nick Piggin 0 siblings, 0 replies; 68+ messages in thread From: Nick Piggin @ 2007-03-21 23:02 UTC (permalink / raw) To: Adam Litke Cc: Andrew Morton, Arjan van de Ven, William Lee Irwin III, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel Adam Litke wrote: > On Wed, 2007-03-21 at 15:18 +1100, Nick Piggin wrote: > >>Adam Litke wrote: >>>diff --git a/include/linux/mm.h b/include/linux/mm.h >>>index 60e0e4a..7089323 100644 >>>--- a/include/linux/mm.h >>>+++ b/include/linux/mm.h >>>@@ -98,6 +98,7 @@ struct vm_area_struct { >>> >>> /* Function pointers to deal with this struct. */ >>> struct vm_operations_struct * vm_ops; >>>+ const struct pagetable_operations_struct * pagetable_ops; >>> >>> /* Information about our backing store: */ >>> unsigned long vm_pgoff; /* Offset (within vm_file) in PAGE_SIZE >> >>Can you remind me why this isn't in vm_ops? > > > We didn't want to bloat the size of the vm_ops struct for all of its > users. But vmas are surely far more numerous than vm_ops, aren't they? -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. 2007-03-21 23:02 ` Nick Piggin @ 2007-03-21 23:32 ` William Lee Irwin III -1 siblings, 0 replies; 68+ messages in thread From: William Lee Irwin III @ 2007-03-21 23:32 UTC (permalink / raw) To: Nick Piggin Cc: Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel Adam Litke wrote: >> We didn't want to bloat the size of the vm_ops struct for all of its >> users. On Thu, Mar 22, 2007 at 10:02:07AM +1100, Nick Piggin wrote: > But vmas are surely far more numerous than vm_ops, aren't they? It should be clarified that the pointer to the operations structure in once-per-mmap() vmas is a bigger overhead than once-per-driver function pointers in the vm_ops structure. -- wli ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros. @ 2007-03-21 23:32 ` William Lee Irwin III 0 siblings, 0 replies; 68+ messages in thread From: William Lee Irwin III @ 2007-03-21 23:32 UTC (permalink / raw) To: Nick Piggin Cc: Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig, Ken Chen, linux-mm, linux-kernel Adam Litke wrote: >> We didn't want to bloat the size of the vm_ops struct for all of its >> users. On Thu, Mar 22, 2007 at 10:02:07AM +1100, Nick Piggin wrote: > But vmas are surely far more numerous than vm_ops, aren't they? It should be clarified that the pointer to the operations structure in once-per-mmap() vmas is a bigger overhead than once-per-driver function pointers in the vm_ops structure. -- wli -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 68+ messages in thread
end of thread, other threads:[~2007-03-21 23:37 UTC | newest] Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2007-02-19 18:31 [PATCH 0/7] [RFC] hugetlb: pagetable_operations API Adam Litke 2007-02-19 18:31 ` Adam Litke 2007-02-19 18:31 ` [PATCH 1/7] Introduce the pagetable_operations and associated helper macros Adam Litke 2007-02-19 18:31 ` Adam Litke 2007-02-19 18:41 ` Arjan van de Ven 2007-02-19 18:41 ` Arjan van de Ven 2007-02-19 19:31 ` Adam Litke 2007-02-19 19:31 ` Adam Litke 2007-02-19 19:48 ` William Lee Irwin III 2007-02-19 19:48 ` William Lee Irwin III 2007-02-19 22:29 ` Christoph Hellwig 2007-02-19 22:29 ` Christoph Hellwig 2007-02-20 15:50 ` Mel Gorman 2007-02-20 15:50 ` Mel Gorman 2007-02-19 18:31 ` [PATCH 2/7] copy_vma for hugetlbfs Adam Litke 2007-02-19 18:31 ` Adam Litke 2007-02-19 18:31 ` [PATCH 3/7] pin_pages for hugetlb Adam Litke 2007-02-19 18:31 ` Adam Litke 2007-02-19 18:32 ` [PATCH 4/7] unmap_page_range " Adam Litke 2007-02-19 18:32 ` Adam Litke 2007-02-19 18:32 ` [PATCH 5/7] change_protection " Adam Litke 2007-02-19 18:32 ` Adam Litke 2007-02-19 18:32 ` [PATCH 6/7] free_pgtable_range " Adam Litke 2007-02-19 18:32 ` Adam Litke 2007-02-19 18:32 ` [PATCH 7/7] hugetlbfs fault handler Adam Litke 2007-02-19 18:32 ` Adam Litke 2007-02-19 18:43 ` [PATCH 0/7] [RFC] hugetlb: pagetable_operations API Arjan van de Ven 2007-02-19 18:43 ` Arjan van de Ven 2007-02-19 19:34 ` Adam Litke 2007-02-19 19:34 ` Adam Litke 2007-02-19 21:15 ` Arjan van de Ven 2007-02-19 21:15 ` Arjan van de Ven 2007-02-20 19:57 ` Benjamin Herrenschmidt 2007-02-20 19:57 ` Benjamin Herrenschmidt 2007-02-20 19:54 ` Benjamin Herrenschmidt 2007-02-20 19:54 ` Benjamin Herrenschmidt 2007-03-19 20:05 [PATCH 0/7] [RFC] hugetlb: pagetable_operations API (V2) Adam Litke 2007-03-19 20:05 ` [PATCH 1/7] Introduce the pagetable_operations and associated helper macros Adam Litke 2007-03-19 20:05 ` Adam Litke 2007-03-20 23:24 ` Dave Hansen 2007-03-20 23:24 ` Dave Hansen 2007-03-21 14:50 ` Adam Litke 2007-03-21 14:50 ` Adam Litke 2007-03-21 15:05 ` Arjan van de Ven 2007-03-21 15:05 ` Arjan van de Ven 2007-03-21 4:18 ` Nick Piggin 2007-03-21 4:18 ` Nick Piggin 2007-03-21 4:52 ` William Lee Irwin III 2007-03-21 4:52 ` William Lee Irwin III 2007-03-21 5:07 ` Nick Piggin 2007-03-21 5:07 ` Nick Piggin 2007-03-21 5:41 ` William Lee Irwin III 2007-03-21 5:41 ` William Lee Irwin III 2007-03-21 6:51 ` Nick Piggin 2007-03-21 6:51 ` Nick Piggin 2007-03-21 7:36 ` Nick Piggin 2007-03-21 7:36 ` Nick Piggin 2007-03-21 10:46 ` William Lee Irwin III 2007-03-21 10:46 ` William Lee Irwin III 2007-03-21 15:17 ` Adam Litke 2007-03-21 15:17 ` Adam Litke 2007-03-21 16:00 ` Christoph Hellwig 2007-03-21 16:00 ` Christoph Hellwig 2007-03-21 23:03 ` Nick Piggin 2007-03-21 23:03 ` Nick Piggin 2007-03-21 23:02 ` Nick Piggin 2007-03-21 23:02 ` Nick Piggin 2007-03-21 23:32 ` William Lee Irwin III 2007-03-21 23:32 ` William Lee Irwin III
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.