All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/7] [RFC] hugetlb: pagetable_operations API
@ 2007-02-19 18:31 ` Adam Litke
  0 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-02-19 18:31 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, agl


The page tables for hugetlb mappings are handled differently than page tables
for normal pages.  Rather than integrating multiple page size support into the
main VM (which would tremendously complicate the code) some hooks were created.
This allows hugetlb special cases to be handled "out of line" by a separate
interface.

Hugetlbfs was the huge page interface chosen.  At the time, large database
users were the only big users of huge pages and the hugetlbfs design meets
their needs pretty well.  Over time, hugetlbfs has been expanded to enable new
uses of huge page memory with varied results.  As features are added, the
semantics become a permanent part of the Linux API.  This makes maintenance of
hugetlbfs an increasingly difficult task and inhibits the addition of features
and functionality in support of ever-changing hardware.

To remedy the situation, I propose an API (currently called
pagetable_operations).  All of the current hugetlbfs-specific hooks are moved
into an operations struct that is attached to VMAs.  The end result is a more
explicit and IMO a cleaner interface between hugetlbfs and the core VM.  We are
then free to add other hugetlb interfaces (such as a /dev/zero-styled character
device) that can operate either in concert with or independent of hugetlbfs.

There should be no measurable performance impact for normal page users (we're
checking if pagetable_ops != NULL instead of checking for vm_flags &
VM_HUGETLB).  Of course we do increase the VMA size by one pointer.  For huge
pages, there is an added indirection for pt_op() calls.  This patch series does
not change the logic of the the hugetlbfs operations, just moves them into the
pagetable_operations struct.

Comments?  Do you think it's as good of an idea as I do?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 0/7] [RFC] hugetlb: pagetable_operations API
@ 2007-02-19 18:31 ` Adam Litke
  0 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-02-19 18:31 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, agl

The page tables for hugetlb mappings are handled differently than page tables
for normal pages.  Rather than integrating multiple page size support into the
main VM (which would tremendously complicate the code) some hooks were created.
This allows hugetlb special cases to be handled "out of line" by a separate
interface.

Hugetlbfs was the huge page interface chosen.  At the time, large database
users were the only big users of huge pages and the hugetlbfs design meets
their needs pretty well.  Over time, hugetlbfs has been expanded to enable new
uses of huge page memory with varied results.  As features are added, the
semantics become a permanent part of the Linux API.  This makes maintenance of
hugetlbfs an increasingly difficult task and inhibits the addition of features
and functionality in support of ever-changing hardware.

To remedy the situation, I propose an API (currently called
pagetable_operations).  All of the current hugetlbfs-specific hooks are moved
into an operations struct that is attached to VMAs.  The end result is a more
explicit and IMO a cleaner interface between hugetlbfs and the core VM.  We are
then free to add other hugetlb interfaces (such as a /dev/zero-styled character
device) that can operate either in concert with or independent of hugetlbfs.

There should be no measurable performance impact for normal page users (we're
checking if pagetable_ops != NULL instead of checking for vm_flags &
VM_HUGETLB).  Of course we do increase the VMA size by one pointer.  For huge
pages, there is an added indirection for pt_op() calls.  This patch series does
not change the logic of the the hugetlbfs operations, just moves them into the
pagetable_operations struct.

Comments?  Do you think it's as good of an idea as I do?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
  2007-02-19 18:31 ` Adam Litke
@ 2007-02-19 18:31   ` Adam Litke
  -1 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-02-19 18:31 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, agl


Signed-off-by: Adam Litke <agl@us.ibm.com>
---

 include/linux/mm.h |   25 +++++++++++++++++++++++++
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2d2c08d..a2fa66d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -98,6 +98,7 @@ struct vm_area_struct {
 
 	/* Function pointers to deal with this struct. */
 	struct vm_operations_struct * vm_ops;
+	struct pagetable_operations_struct * pagetable_ops;
 
 	/* Information about our backing store: */
 	unsigned long vm_pgoff;		/* Offset (within vm_file) in PAGE_SIZE
@@ -218,6 +219,30 @@ struct vm_operations_struct {
 };
 
 struct mmu_gather;
+
+struct pagetable_operations_struct {
+	int (*fault)(struct mm_struct *mm,
+		struct vm_area_struct *vma,
+		unsigned long address, int write_access);
+	int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src,
+		struct vm_area_struct *vma);
+	int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma,
+		struct page **pages, struct vm_area_struct **vmas,
+		unsigned long *position, int *length, int i);
+	void (*change_protection)(struct vm_area_struct *vma,
+		unsigned long address, unsigned long end, pgprot_t newprot);
+	unsigned long (*unmap_page_range)(struct vm_area_struct *vma,
+		unsigned long address, unsigned long end, long *zap_work);
+	void (*free_pgtable_range)(struct mmu_gather **tlb,
+		unsigned long addr, unsigned long end,
+		unsigned long floor, unsigned long ceiling);
+};
+
+#define has_pt_op(vma, op) \
+	((vma)->pagetable_ops && (vma)->pagetable_ops->op)
+#define pt_op(vma, call) \
+	((vma)->pagetable_ops->call)
+
 struct inode;
 
 #define page_private(page)		((page)->private)

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
@ 2007-02-19 18:31   ` Adam Litke
  0 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-02-19 18:31 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, agl

Signed-off-by: Adam Litke <agl@us.ibm.com>
---

 include/linux/mm.h |   25 +++++++++++++++++++++++++
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2d2c08d..a2fa66d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -98,6 +98,7 @@ struct vm_area_struct {
 
 	/* Function pointers to deal with this struct. */
 	struct vm_operations_struct * vm_ops;
+	struct pagetable_operations_struct * pagetable_ops;
 
 	/* Information about our backing store: */
 	unsigned long vm_pgoff;		/* Offset (within vm_file) in PAGE_SIZE
@@ -218,6 +219,30 @@ struct vm_operations_struct {
 };
 
 struct mmu_gather;
+
+struct pagetable_operations_struct {
+	int (*fault)(struct mm_struct *mm,
+		struct vm_area_struct *vma,
+		unsigned long address, int write_access);
+	int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src,
+		struct vm_area_struct *vma);
+	int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma,
+		struct page **pages, struct vm_area_struct **vmas,
+		unsigned long *position, int *length, int i);
+	void (*change_protection)(struct vm_area_struct *vma,
+		unsigned long address, unsigned long end, pgprot_t newprot);
+	unsigned long (*unmap_page_range)(struct vm_area_struct *vma,
+		unsigned long address, unsigned long end, long *zap_work);
+	void (*free_pgtable_range)(struct mmu_gather **tlb,
+		unsigned long addr, unsigned long end,
+		unsigned long floor, unsigned long ceiling);
+};
+
+#define has_pt_op(vma, op) \
+	((vma)->pagetable_ops && (vma)->pagetable_ops->op)
+#define pt_op(vma, call) \
+	((vma)->pagetable_ops->call)
+
 struct inode;
 
 #define page_private(page)		((page)->private)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 2/7] copy_vma for hugetlbfs
  2007-02-19 18:31 ` Adam Litke
@ 2007-02-19 18:31   ` Adam Litke
  -1 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-02-19 18:31 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, agl


Signed-off-by: Adam Litke <agl@us.ibm.com>
---

 fs/hugetlbfs/inode.c |    6 ++++++
 mm/memory.c          |    4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 4f4cd13..c0a7984 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -36,6 +36,7 @@
 static struct super_operations hugetlbfs_ops;
 static const struct address_space_operations hugetlbfs_aops;
 const struct file_operations hugetlbfs_file_operations;
+static struct pagetable_operations_struct hugetlbfs_pagetable_ops;
 static struct inode_operations hugetlbfs_dir_inode_operations;
 static struct inode_operations hugetlbfs_inode_operations;
 
@@ -70,6 +71,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	 */
 	vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
 	vma->vm_ops = &hugetlb_vm_ops;
+	vma->pagetable_ops = &hugetlbfs_pagetable_ops;
 
 	vma_len = (loff_t)(vma->vm_end - vma->vm_start);
 
@@ -560,6 +562,10 @@ const struct file_operations hugetlbfs_file_operations = {
 	.get_unmapped_area	= hugetlb_get_unmapped_area,
 };
 
+static struct pagetable_operations_struct hugetlbfs_pagetable_ops = {
+	.copy_vma		= copy_hugetlb_page_range,
+};
+
 static struct inode_operations hugetlbfs_dir_inode_operations = {
 	.create		= hugetlbfs_create,
 	.lookup		= simple_lookup,
diff --git a/mm/memory.c b/mm/memory.c
index ef09f0a..80eafd5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -602,8 +602,8 @@ int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			return 0;
 	}
 
-	if (is_vm_hugetlb_page(vma))
-		return copy_hugetlb_page_range(dst_mm, src_mm, vma);
+	if (has_pt_op(vma, copy_vma))
+		return pt_op(vma, copy_vma)(dst_mm, src_mm, vma);
 
 	dst_pgd = pgd_offset(dst_mm, addr);
 	src_pgd = pgd_offset(src_mm, addr);

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 2/7] copy_vma for hugetlbfs
@ 2007-02-19 18:31   ` Adam Litke
  0 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-02-19 18:31 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, agl

Signed-off-by: Adam Litke <agl@us.ibm.com>
---

 fs/hugetlbfs/inode.c |    6 ++++++
 mm/memory.c          |    4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 4f4cd13..c0a7984 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -36,6 +36,7 @@
 static struct super_operations hugetlbfs_ops;
 static const struct address_space_operations hugetlbfs_aops;
 const struct file_operations hugetlbfs_file_operations;
+static struct pagetable_operations_struct hugetlbfs_pagetable_ops;
 static struct inode_operations hugetlbfs_dir_inode_operations;
 static struct inode_operations hugetlbfs_inode_operations;
 
@@ -70,6 +71,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	 */
 	vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
 	vma->vm_ops = &hugetlb_vm_ops;
+	vma->pagetable_ops = &hugetlbfs_pagetable_ops;
 
 	vma_len = (loff_t)(vma->vm_end - vma->vm_start);
 
@@ -560,6 +562,10 @@ const struct file_operations hugetlbfs_file_operations = {
 	.get_unmapped_area	= hugetlb_get_unmapped_area,
 };
 
+static struct pagetable_operations_struct hugetlbfs_pagetable_ops = {
+	.copy_vma		= copy_hugetlb_page_range,
+};
+
 static struct inode_operations hugetlbfs_dir_inode_operations = {
 	.create		= hugetlbfs_create,
 	.lookup		= simple_lookup,
diff --git a/mm/memory.c b/mm/memory.c
index ef09f0a..80eafd5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -602,8 +602,8 @@ int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			return 0;
 	}
 
-	if (is_vm_hugetlb_page(vma))
-		return copy_hugetlb_page_range(dst_mm, src_mm, vma);
+	if (has_pt_op(vma, copy_vma))
+		return pt_op(vma, copy_vma)(dst_mm, src_mm, vma);
 
 	dst_pgd = pgd_offset(dst_mm, addr);
 	src_pgd = pgd_offset(src_mm, addr);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 3/7] pin_pages for hugetlb
  2007-02-19 18:31 ` Adam Litke
@ 2007-02-19 18:31   ` Adam Litke
  -1 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-02-19 18:31 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, agl


Signed-off-by: Adam Litke <agl@us.ibm.com>
---

 fs/hugetlbfs/inode.c |    1 +
 mm/memory.c          |    6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index c0a7984..2d1dd84 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -564,6 +564,7 @@ const struct file_operations hugetlbfs_file_operations = {
 
 static struct pagetable_operations_struct hugetlbfs_pagetable_ops = {
 	.copy_vma		= copy_hugetlb_page_range,
+	.pin_pages		= follow_hugetlb_page,
 };
 
 static struct inode_operations hugetlbfs_dir_inode_operations = {
diff --git a/mm/memory.c b/mm/memory.c
index 80eafd5..9467c65 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1039,9 +1039,9 @@ int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 				|| !(vm_flags & vma->vm_flags))
 			return i ? : -EFAULT;
 
-		if (is_vm_hugetlb_page(vma)) {
-			i = follow_hugetlb_page(mm, vma, pages, vmas,
-						&start, &len, i);
+		if (has_pt_op(vma, pin_pages)) {
+			i = pt_op(vma, pin_pages)(mm, vma, pages,
+						vmas, &start, &len, i);
 			continue;
 		}
 

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 3/7] pin_pages for hugetlb
@ 2007-02-19 18:31   ` Adam Litke
  0 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-02-19 18:31 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, agl

Signed-off-by: Adam Litke <agl@us.ibm.com>
---

 fs/hugetlbfs/inode.c |    1 +
 mm/memory.c          |    6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index c0a7984..2d1dd84 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -564,6 +564,7 @@ const struct file_operations hugetlbfs_file_operations = {
 
 static struct pagetable_operations_struct hugetlbfs_pagetable_ops = {
 	.copy_vma		= copy_hugetlb_page_range,
+	.pin_pages		= follow_hugetlb_page,
 };
 
 static struct inode_operations hugetlbfs_dir_inode_operations = {
diff --git a/mm/memory.c b/mm/memory.c
index 80eafd5..9467c65 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1039,9 +1039,9 @@ int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 				|| !(vm_flags & vma->vm_flags))
 			return i ? : -EFAULT;
 
-		if (is_vm_hugetlb_page(vma)) {
-			i = follow_hugetlb_page(mm, vma, pages, vmas,
-						&start, &len, i);
+		if (has_pt_op(vma, pin_pages)) {
+			i = pt_op(vma, pin_pages)(mm, vma, pages,
+						vmas, &start, &len, i);
 			continue;
 		}
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 4/7] unmap_page_range for hugetlb
  2007-02-19 18:31 ` Adam Litke
@ 2007-02-19 18:32   ` Adam Litke
  -1 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-02-19 18:32 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, agl


Signed-off-by: Adam Litke <agl@us.ibm.com>
---

 fs/hugetlbfs/inode.c    |    3 ++-
 include/linux/hugetlb.h |    4 ++--
 mm/hugetlb.c            |   12 ++++++++----
 mm/memory.c             |   10 ++++------
 4 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 2d1dd84..146a4b7 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -289,7 +289,7 @@ hugetlb_vmtruncate_list(struct prio_tree_root *root, pgoff_t pgoff)
 			v_offset = 0;
 
 		__unmap_hugepage_range(vma,
-				vma->vm_start + v_offset, vma->vm_end);
+				vma->vm_start + v_offset, vma->vm_end, 0);
 	}
 }
 
@@ -565,6 +565,7 @@ const struct file_operations hugetlbfs_file_operations = {
 static struct pagetable_operations_struct hugetlbfs_pagetable_ops = {
 	.copy_vma		= copy_hugetlb_page_range,
 	.pin_pages		= follow_hugetlb_page,
+	.unmap_page_range	= unmap_hugepage_range,
 };
 
 static struct inode_operations hugetlbfs_dir_inode_operations = {
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a60995a..add92b3 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -16,8 +16,8 @@ static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
 int hugetlb_sysctl_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
 int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
 int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, struct vm_area_struct **, unsigned long *, int *, int);
-void unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long);
-void __unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long);
+unsigned long unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long, long *);
+void __unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long, long *);
 int hugetlb_prefault(struct address_space *, struct vm_area_struct *);
 int hugetlb_report_meminfo(char *);
 int hugetlb_report_node_meminfo(int, char *);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index cb362f7..3bcc0db 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -356,7 +356,7 @@ nomem:
 }
 
 void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
-			    unsigned long end)
+			    unsigned long end, long *zap_work)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
@@ -397,10 +397,13 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 		list_del(&page->lru);
 		put_page(page);
 	}
+
+	if (zap_work)
+		*zap_work -= (end - start) / (HPAGE_SIZE / PAGE_SIZE);
 }
 
-void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
-			  unsigned long end)
+unsigned long unmap_hugepage_range(struct vm_area_struct *vma,
+			unsigned long start, unsigned long end, long *zap_work)
 {
 	/*
 	 * It is undesirable to test vma->vm_file as it should be non-null
@@ -412,9 +415,10 @@ void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 	 */
 	if (vma->vm_file) {
 		spin_lock(&vma->vm_file->f_mapping->i_mmap_lock);
-		__unmap_hugepage_range(vma, start, end);
+		__unmap_hugepage_range(vma, start, end, zap_work);
 		spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock);
 	}
+	return end;
 }
 
 static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
diff --git a/mm/memory.c b/mm/memory.c
index 9467c65..aa6b06e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -839,12 +839,10 @@ unsigned long unmap_vmas(struct mmu_gather **tlbp,
 				tlb_start_valid = 1;
 			}
 
-			if (unlikely(is_vm_hugetlb_page(vma))) {
-				unmap_hugepage_range(vma, start, end);
-				zap_work -= (end - start) /
-						(HPAGE_SIZE / PAGE_SIZE);
-				start = end;
-			} else
+			if (unlikely(has_pt_op(vma, unmap_page_range)))
+				start = pt_op(vma, unmap_page_range)
+						(vma, start, end, &zap_work);
+			else
 				start = unmap_page_range(*tlbp, vma,
 						start, end, &zap_work, details);
 

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 4/7] unmap_page_range for hugetlb
@ 2007-02-19 18:32   ` Adam Litke
  0 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-02-19 18:32 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, agl

Signed-off-by: Adam Litke <agl@us.ibm.com>
---

 fs/hugetlbfs/inode.c    |    3 ++-
 include/linux/hugetlb.h |    4 ++--
 mm/hugetlb.c            |   12 ++++++++----
 mm/memory.c             |   10 ++++------
 4 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 2d1dd84..146a4b7 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -289,7 +289,7 @@ hugetlb_vmtruncate_list(struct prio_tree_root *root, pgoff_t pgoff)
 			v_offset = 0;
 
 		__unmap_hugepage_range(vma,
-				vma->vm_start + v_offset, vma->vm_end);
+				vma->vm_start + v_offset, vma->vm_end, 0);
 	}
 }
 
@@ -565,6 +565,7 @@ const struct file_operations hugetlbfs_file_operations = {
 static struct pagetable_operations_struct hugetlbfs_pagetable_ops = {
 	.copy_vma		= copy_hugetlb_page_range,
 	.pin_pages		= follow_hugetlb_page,
+	.unmap_page_range	= unmap_hugepage_range,
 };
 
 static struct inode_operations hugetlbfs_dir_inode_operations = {
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a60995a..add92b3 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -16,8 +16,8 @@ static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
 int hugetlb_sysctl_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
 int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
 int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, struct vm_area_struct **, unsigned long *, int *, int);
-void unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long);
-void __unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long);
+unsigned long unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long, long *);
+void __unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long, long *);
 int hugetlb_prefault(struct address_space *, struct vm_area_struct *);
 int hugetlb_report_meminfo(char *);
 int hugetlb_report_node_meminfo(int, char *);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index cb362f7..3bcc0db 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -356,7 +356,7 @@ nomem:
 }
 
 void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
-			    unsigned long end)
+			    unsigned long end, long *zap_work)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
@@ -397,10 +397,13 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 		list_del(&page->lru);
 		put_page(page);
 	}
+
+	if (zap_work)
+		*zap_work -= (end - start) / (HPAGE_SIZE / PAGE_SIZE);
 }
 
-void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
-			  unsigned long end)
+unsigned long unmap_hugepage_range(struct vm_area_struct *vma,
+			unsigned long start, unsigned long end, long *zap_work)
 {
 	/*
 	 * It is undesirable to test vma->vm_file as it should be non-null
@@ -412,9 +415,10 @@ void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 	 */
 	if (vma->vm_file) {
 		spin_lock(&vma->vm_file->f_mapping->i_mmap_lock);
-		__unmap_hugepage_range(vma, start, end);
+		__unmap_hugepage_range(vma, start, end, zap_work);
 		spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock);
 	}
+	return end;
 }
 
 static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
diff --git a/mm/memory.c b/mm/memory.c
index 9467c65..aa6b06e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -839,12 +839,10 @@ unsigned long unmap_vmas(struct mmu_gather **tlbp,
 				tlb_start_valid = 1;
 			}
 
-			if (unlikely(is_vm_hugetlb_page(vma))) {
-				unmap_hugepage_range(vma, start, end);
-				zap_work -= (end - start) /
-						(HPAGE_SIZE / PAGE_SIZE);
-				start = end;
-			} else
+			if (unlikely(has_pt_op(vma, unmap_page_range)))
+				start = pt_op(vma, unmap_page_range)
+						(vma, start, end, &zap_work);
+			else
 				start = unmap_page_range(*tlbp, vma,
 						start, end, &zap_work, details);
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 5/7] change_protection for hugetlb
  2007-02-19 18:31 ` Adam Litke
@ 2007-02-19 18:32   ` Adam Litke
  -1 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-02-19 18:32 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, agl


Signed-off-by: Adam Litke <agl@us.ibm.com>
---

 fs/hugetlbfs/inode.c |    1 +
 mm/mprotect.c        |    5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 146a4b7..1016694 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -566,6 +566,7 @@ static struct pagetable_operations_struct hugetlbfs_pagetable_ops = {
 	.copy_vma		= copy_hugetlb_page_range,
 	.pin_pages		= follow_hugetlb_page,
 	.unmap_page_range	= unmap_hugepage_range,
+	.change_protection	= hugetlb_change_protection,
 };
 
 static struct inode_operations hugetlbfs_dir_inode_operations = {
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 3b8f3c0..172e204 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -201,8 +201,9 @@ success:
 		dirty_accountable = 1;
 	}
 
-	if (is_vm_hugetlb_page(vma))
-		hugetlb_change_protection(vma, start, end, vma->vm_page_prot);
+	if (has_pt_op(vma, change_protection))
+		pt_op(vma, change_protection)(vma, start, end,
+			vma->vm_page_prot);
 	else
 		change_protection(vma, start, end, vma->vm_page_prot, dirty_accountable);
 	vm_stat_account(mm, oldflags, vma->vm_file, -nrpages);

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 5/7] change_protection for hugetlb
@ 2007-02-19 18:32   ` Adam Litke
  0 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-02-19 18:32 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, agl

Signed-off-by: Adam Litke <agl@us.ibm.com>
---

 fs/hugetlbfs/inode.c |    1 +
 mm/mprotect.c        |    5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 146a4b7..1016694 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -566,6 +566,7 @@ static struct pagetable_operations_struct hugetlbfs_pagetable_ops = {
 	.copy_vma		= copy_hugetlb_page_range,
 	.pin_pages		= follow_hugetlb_page,
 	.unmap_page_range	= unmap_hugepage_range,
+	.change_protection	= hugetlb_change_protection,
 };
 
 static struct inode_operations hugetlbfs_dir_inode_operations = {
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 3b8f3c0..172e204 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -201,8 +201,9 @@ success:
 		dirty_accountable = 1;
 	}
 
-	if (is_vm_hugetlb_page(vma))
-		hugetlb_change_protection(vma, start, end, vma->vm_page_prot);
+	if (has_pt_op(vma, change_protection))
+		pt_op(vma, change_protection)(vma, start, end,
+			vma->vm_page_prot);
 	else
 		change_protection(vma, start, end, vma->vm_page_prot, dirty_accountable);
 	vm_stat_account(mm, oldflags, vma->vm_file, -nrpages);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 6/7] free_pgtable_range for hugetlb
  2007-02-19 18:31 ` Adam Litke
@ 2007-02-19 18:32   ` Adam Litke
  -1 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-02-19 18:32 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, agl


Signed-off-by: Adam Litke <agl@us.ibm.com>
---

 fs/hugetlbfs/inode.c |    1 +
 mm/memory.c          |    6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 1016694..3461f9b 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -567,6 +567,7 @@ static struct pagetable_operations_struct hugetlbfs_pagetable_ops = {
 	.pin_pages		= follow_hugetlb_page,
 	.unmap_page_range	= unmap_hugepage_range,
 	.change_protection	= hugetlb_change_protection,
+	.free_pgtable_range	= hugetlb_free_pgd_range,
 };
 
 static struct inode_operations hugetlbfs_dir_inode_operations = {
diff --git a/mm/memory.c b/mm/memory.c
index aa6b06e..442e6b2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -279,15 +279,15 @@ void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *vma,
 		anon_vma_unlink(vma);
 		unlink_file_vma(vma);
 
-		if (is_vm_hugetlb_page(vma)) {
-			hugetlb_free_pgd_range(tlb, addr, vma->vm_end,
+		if (has_pt_op(vma, free_pgtable_range)) {
+			pt_op(vma, free_pgtable_range)(tlb, addr, vma->vm_end,
 				floor, next? next->vm_start: ceiling);
 		} else {
 			/*
 			 * Optimization: gather nearby vmas into one call down
 			 */
 			while (next && next->vm_start <= vma->vm_end + PMD_SIZE
-			       && !is_vm_hugetlb_page(next)) {
+			       && !has_pt_op(next, free_pgtable_range)) {
 				vma = next;
 				next = vma->vm_next;
 				anon_vma_unlink(vma);

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 6/7] free_pgtable_range for hugetlb
@ 2007-02-19 18:32   ` Adam Litke
  0 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-02-19 18:32 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, agl

Signed-off-by: Adam Litke <agl@us.ibm.com>
---

 fs/hugetlbfs/inode.c |    1 +
 mm/memory.c          |    6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 1016694..3461f9b 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -567,6 +567,7 @@ static struct pagetable_operations_struct hugetlbfs_pagetable_ops = {
 	.pin_pages		= follow_hugetlb_page,
 	.unmap_page_range	= unmap_hugepage_range,
 	.change_protection	= hugetlb_change_protection,
+	.free_pgtable_range	= hugetlb_free_pgd_range,
 };
 
 static struct inode_operations hugetlbfs_dir_inode_operations = {
diff --git a/mm/memory.c b/mm/memory.c
index aa6b06e..442e6b2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -279,15 +279,15 @@ void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *vma,
 		anon_vma_unlink(vma);
 		unlink_file_vma(vma);
 
-		if (is_vm_hugetlb_page(vma)) {
-			hugetlb_free_pgd_range(tlb, addr, vma->vm_end,
+		if (has_pt_op(vma, free_pgtable_range)) {
+			pt_op(vma, free_pgtable_range)(tlb, addr, vma->vm_end,
 				floor, next? next->vm_start: ceiling);
 		} else {
 			/*
 			 * Optimization: gather nearby vmas into one call down
 			 */
 			while (next && next->vm_start <= vma->vm_end + PMD_SIZE
-			       && !is_vm_hugetlb_page(next)) {
+			       && !has_pt_op(next, free_pgtable_range)) {
 				vma = next;
 				next = vma->vm_next;
 				anon_vma_unlink(vma);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 7/7] hugetlbfs fault handler
  2007-02-19 18:31 ` Adam Litke
@ 2007-02-19 18:32   ` Adam Litke
  -1 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-02-19 18:32 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, agl


Signed-off-by: Adam Litke <agl@us.ibm.com>
---

 fs/hugetlbfs/inode.c |    1 +
 mm/hugetlb.c         |    4 +++-
 mm/memory.c          |    4 ++--
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 3461f9b..1de73c1 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -568,6 +568,7 @@ static struct pagetable_operations_struct hugetlbfs_pagetable_ops = {
 	.unmap_page_range	= unmap_hugepage_range,
 	.change_protection	= hugetlb_change_protection,
 	.free_pgtable_range	= hugetlb_free_pgd_range,
+	.fault			= hugetlb_fault,
 };
 
 static struct inode_operations hugetlbfs_dir_inode_operations = {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3bcc0db..63de369 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -588,6 +588,8 @@ int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	unsigned long vaddr = *position;
 	int remainder = *length;
 
+	BUG_ON(!has_pt_op(vma, fault));
+
 	spin_lock(&mm->page_table_lock);
 	while (vaddr < vma->vm_end && remainder) {
 		pte_t *pte;
@@ -604,7 +606,7 @@ int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			int ret;
 
 			spin_unlock(&mm->page_table_lock);
-			ret = hugetlb_fault(mm, vma, vaddr, 0);
+			ret = pt_op(vma, fault)(mm, vma, vaddr, 0);
 			spin_lock(&mm->page_table_lock);
 			if (ret == VM_FAULT_MINOR)
 				continue;
diff --git a/mm/memory.c b/mm/memory.c
index 442e6b2..c8e17b4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2455,8 +2455,8 @@ int __handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	count_vm_event(PGFAULT);
 
-	if (unlikely(is_vm_hugetlb_page(vma)))
-		return hugetlb_fault(mm, vma, address, write_access);
+	if (unlikely(has_pt_op(vma, fault)))
+		return pt_op(vma, fault)(mm, vma, address, write_access);
 
 	pgd = pgd_offset(mm, address);
 	pud = pud_alloc(mm, pgd, address);

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 7/7] hugetlbfs fault handler
@ 2007-02-19 18:32   ` Adam Litke
  0 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-02-19 18:32 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, agl

Signed-off-by: Adam Litke <agl@us.ibm.com>
---

 fs/hugetlbfs/inode.c |    1 +
 mm/hugetlb.c         |    4 +++-
 mm/memory.c          |    4 ++--
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 3461f9b..1de73c1 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -568,6 +568,7 @@ static struct pagetable_operations_struct hugetlbfs_pagetable_ops = {
 	.unmap_page_range	= unmap_hugepage_range,
 	.change_protection	= hugetlb_change_protection,
 	.free_pgtable_range	= hugetlb_free_pgd_range,
+	.fault			= hugetlb_fault,
 };
 
 static struct inode_operations hugetlbfs_dir_inode_operations = {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3bcc0db..63de369 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -588,6 +588,8 @@ int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	unsigned long vaddr = *position;
 	int remainder = *length;
 
+	BUG_ON(!has_pt_op(vma, fault));
+
 	spin_lock(&mm->page_table_lock);
 	while (vaddr < vma->vm_end && remainder) {
 		pte_t *pte;
@@ -604,7 +606,7 @@ int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			int ret;
 
 			spin_unlock(&mm->page_table_lock);
-			ret = hugetlb_fault(mm, vma, vaddr, 0);
+			ret = pt_op(vma, fault)(mm, vma, vaddr, 0);
 			spin_lock(&mm->page_table_lock);
 			if (ret == VM_FAULT_MINOR)
 				continue;
diff --git a/mm/memory.c b/mm/memory.c
index 442e6b2..c8e17b4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2455,8 +2455,8 @@ int __handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	count_vm_event(PGFAULT);
 
-	if (unlikely(is_vm_hugetlb_page(vma)))
-		return hugetlb_fault(mm, vma, address, write_access);
+	if (unlikely(has_pt_op(vma, fault)))
+		return pt_op(vma, fault)(mm, vma, address, write_access);
 
 	pgd = pgd_offset(mm, address);
 	pud = pud_alloc(mm, pgd, address);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
  2007-02-19 18:31   ` Adam Litke
@ 2007-02-19 18:41     ` Arjan van de Ven
  -1 siblings, 0 replies; 68+ messages in thread
From: Arjan van de Ven @ 2007-02-19 18:41 UTC (permalink / raw)
  To: Adam Litke; +Cc: linux-mm, linux-kernel

On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote:
> Signed-off-by: Adam Litke <agl@us.ibm.com>
> ---
> 
>  include/linux/mm.h |   25 +++++++++++++++++++++++++
>  1 files changed, 25 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2d2c08d..a2fa66d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -98,6 +98,7 @@ struct vm_area_struct {
>  
>  	/* Function pointers to deal with this struct. */
>  	struct vm_operations_struct * vm_ops;
> +	struct pagetable_operations_struct * pagetable_ops;
>  

please make it at least const, those things have no business ever being
written to right? And by making them const the compiler helps catch
that, and as bonus the data gets moved to rodata so that it won't share
cachelines with anything that gets dirty


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
@ 2007-02-19 18:41     ` Arjan van de Ven
  0 siblings, 0 replies; 68+ messages in thread
From: Arjan van de Ven @ 2007-02-19 18:41 UTC (permalink / raw)
  To: Adam Litke; +Cc: linux-mm, linux-kernel

On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote:
> Signed-off-by: Adam Litke <agl@us.ibm.com>
> ---
> 
>  include/linux/mm.h |   25 +++++++++++++++++++++++++
>  1 files changed, 25 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2d2c08d..a2fa66d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -98,6 +98,7 @@ struct vm_area_struct {
>  
>  	/* Function pointers to deal with this struct. */
>  	struct vm_operations_struct * vm_ops;
> +	struct pagetable_operations_struct * pagetable_ops;
>  

please make it at least const, those things have no business ever being
written to right? And by making them const the compiler helps catch
that, and as bonus the data gets moved to rodata so that it won't share
cachelines with anything that gets dirty

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API
  2007-02-19 18:31 ` Adam Litke
@ 2007-02-19 18:43   ` Arjan van de Ven
  -1 siblings, 0 replies; 68+ messages in thread
From: Arjan van de Ven @ 2007-02-19 18:43 UTC (permalink / raw)
  To: Adam Litke; +Cc: linux-mm, linux-kernel

On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote:
> The page tables for hugetlb mappings are handled differently than page tables
> for normal pages.  Rather than integrating multiple page size support into the
> main VM (which would tremendously complicate the code) some hooks were created.
> This allows hugetlb special cases to be handled "out of line" by a separate
> interface.

ok it makes sense to clean this up.. what I don't like is that there
STILL are all the double cases... for this to work and be worth it both
the common case and the hugetlb case should be using the ops structure
always! Anything else and you're just replacing bad code with bad
code ;(


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API
@ 2007-02-19 18:43   ` Arjan van de Ven
  0 siblings, 0 replies; 68+ messages in thread
From: Arjan van de Ven @ 2007-02-19 18:43 UTC (permalink / raw)
  To: Adam Litke; +Cc: linux-mm, linux-kernel

On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote:
> The page tables for hugetlb mappings are handled differently than page tables
> for normal pages.  Rather than integrating multiple page size support into the
> main VM (which would tremendously complicate the code) some hooks were created.
> This allows hugetlb special cases to be handled "out of line" by a separate
> interface.

ok it makes sense to clean this up.. what I don't like is that there
STILL are all the double cases... for this to work and be worth it both
the common case and the hugetlb case should be using the ops structure
always! Anything else and you're just replacing bad code with bad
code ;(

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
  2007-02-19 18:41     ` Arjan van de Ven
@ 2007-02-19 19:31       ` Adam Litke
  -1 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-02-19 19:31 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-mm, linux-kernel

On Mon, 2007-02-19 at 19:41 +0100, Arjan van de Ven wrote:
> On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote:
> > Signed-off-by: Adam Litke <agl@us.ibm.com>
> > ---
> > 
> >  include/linux/mm.h |   25 +++++++++++++++++++++++++
> >  1 files changed, 25 insertions(+), 0 deletions(-)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 2d2c08d..a2fa66d 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -98,6 +98,7 @@ struct vm_area_struct {
> >  
> >  	/* Function pointers to deal with this struct. */
> >  	struct vm_operations_struct * vm_ops;
> > +	struct pagetable_operations_struct * pagetable_ops;
> >  
> 
> please make it at least const, those things have no business ever being
> written to right? And by making them const the compiler helps catch
> that, and as bonus the data gets moved to rodata so that it won't share
> cachelines with anything that gets dirty

Yep I agree.  Changed.

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
@ 2007-02-19 19:31       ` Adam Litke
  0 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-02-19 19:31 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-mm, linux-kernel

On Mon, 2007-02-19 at 19:41 +0100, Arjan van de Ven wrote:
> On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote:
> > Signed-off-by: Adam Litke <agl@us.ibm.com>
> > ---
> > 
> >  include/linux/mm.h |   25 +++++++++++++++++++++++++
> >  1 files changed, 25 insertions(+), 0 deletions(-)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 2d2c08d..a2fa66d 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -98,6 +98,7 @@ struct vm_area_struct {
> >  
> >  	/* Function pointers to deal with this struct. */
> >  	struct vm_operations_struct * vm_ops;
> > +	struct pagetable_operations_struct * pagetable_ops;
> >  
> 
> please make it at least const, those things have no business ever being
> written to right? And by making them const the compiler helps catch
> that, and as bonus the data gets moved to rodata so that it won't share
> cachelines with anything that gets dirty

Yep I agree.  Changed.

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API
  2007-02-19 18:43   ` Arjan van de Ven
@ 2007-02-19 19:34     ` Adam Litke
  -1 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-02-19 19:34 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-mm, linux-kernel

On Mon, 2007-02-19 at 19:43 +0100, Arjan van de Ven wrote:
> On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote:
> > The page tables for hugetlb mappings are handled differently than page tables
> > for normal pages.  Rather than integrating multiple page size support into the
> > main VM (which would tremendously complicate the code) some hooks were created.
> > This allows hugetlb special cases to be handled "out of line" by a separate
> > interface.
> 
> ok it makes sense to clean this up.. what I don't like is that there
> STILL are all the double cases... for this to work and be worth it both
> the common case and the hugetlb case should be using the ops structure
> always! Anything else and you're just replacing bad code with bad
> code ;(

Hmm.  Do you think everyone would support an extra pointer indirection
for every handle_pte_fault() call?  If not, then I definitely wouldn't
mind creating a default_pagetable_ops and calling into that.

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API
@ 2007-02-19 19:34     ` Adam Litke
  0 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-02-19 19:34 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-mm, linux-kernel

On Mon, 2007-02-19 at 19:43 +0100, Arjan van de Ven wrote:
> On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote:
> > The page tables for hugetlb mappings are handled differently than page tables
> > for normal pages.  Rather than integrating multiple page size support into the
> > main VM (which would tremendously complicate the code) some hooks were created.
> > This allows hugetlb special cases to be handled "out of line" by a separate
> > interface.
> 
> ok it makes sense to clean this up.. what I don't like is that there
> STILL are all the double cases... for this to work and be worth it both
> the common case and the hugetlb case should be using the ops structure
> always! Anything else and you're just replacing bad code with bad
> code ;(

Hmm.  Do you think everyone would support an extra pointer indirection
for every handle_pte_fault() call?  If not, then I definitely wouldn't
mind creating a default_pagetable_ops and calling into that.

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
  2007-02-19 18:31   ` Adam Litke
@ 2007-02-19 19:48     ` William Lee Irwin III
  -1 siblings, 0 replies; 68+ messages in thread
From: William Lee Irwin III @ 2007-02-19 19:48 UTC (permalink / raw)
  To: Adam Litke; +Cc: linux-mm, linux-kernel

On Mon, Feb 19, 2007 at 10:31:34AM -0800, Adam Litke wrote:
> +struct pagetable_operations_struct {
> +	int (*fault)(struct mm_struct *mm,
> +		struct vm_area_struct *vma,
> +		unsigned long address, int write_access);
> +	int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src,
> +		struct vm_area_struct *vma);
> +	int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma,
> +		struct page **pages, struct vm_area_struct **vmas,
> +		unsigned long *position, int *length, int i);
> +	void (*change_protection)(struct vm_area_struct *vma,
> +		unsigned long address, unsigned long end, pgprot_t newprot);
> +	unsigned long (*unmap_page_range)(struct vm_area_struct *vma,
> +		unsigned long address, unsigned long end, long *zap_work);
> +	void (*free_pgtable_range)(struct mmu_gather **tlb,
> +		unsigned long addr, unsigned long end,
> +		unsigned long floor, unsigned long ceiling);
> +};

I very very strongly approve of the approach this operations structure
entails.


-- wli

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
@ 2007-02-19 19:48     ` William Lee Irwin III
  0 siblings, 0 replies; 68+ messages in thread
From: William Lee Irwin III @ 2007-02-19 19:48 UTC (permalink / raw)
  To: Adam Litke; +Cc: linux-mm, linux-kernel

On Mon, Feb 19, 2007 at 10:31:34AM -0800, Adam Litke wrote:
> +struct pagetable_operations_struct {
> +	int (*fault)(struct mm_struct *mm,
> +		struct vm_area_struct *vma,
> +		unsigned long address, int write_access);
> +	int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src,
> +		struct vm_area_struct *vma);
> +	int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma,
> +		struct page **pages, struct vm_area_struct **vmas,
> +		unsigned long *position, int *length, int i);
> +	void (*change_protection)(struct vm_area_struct *vma,
> +		unsigned long address, unsigned long end, pgprot_t newprot);
> +	unsigned long (*unmap_page_range)(struct vm_area_struct *vma,
> +		unsigned long address, unsigned long end, long *zap_work);
> +	void (*free_pgtable_range)(struct mmu_gather **tlb,
> +		unsigned long addr, unsigned long end,
> +		unsigned long floor, unsigned long ceiling);
> +};

I very very strongly approve of the approach this operations structure
entails.


-- wli

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API
  2007-02-19 19:34     ` Adam Litke
@ 2007-02-19 21:15       ` Arjan van de Ven
  -1 siblings, 0 replies; 68+ messages in thread
From: Arjan van de Ven @ 2007-02-19 21:15 UTC (permalink / raw)
  To: Adam Litke; +Cc: linux-mm, linux-kernel

On Mon, 2007-02-19 at 13:34 -0600, Adam Litke wrote:
> On Mon, 2007-02-19 at 19:43 +0100, Arjan van de Ven wrote:
> > On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote:
> > > The page tables for hugetlb mappings are handled differently than page tables
> > > for normal pages.  Rather than integrating multiple page size support into the
> > > main VM (which would tremendously complicate the code) some hooks were created.
> > > This allows hugetlb special cases to be handled "out of line" by a separate
> > > interface.
> > 
> > ok it makes sense to clean this up.. what I don't like is that there
> > STILL are all the double cases... for this to work and be worth it both
> > the common case and the hugetlb case should be using the ops structure
> > always! Anything else and you're just replacing bad code with bad
> > code ;(
> 
> Hmm.  Do you think everyone would support an extra pointer indirection
> for every handle_pte_fault() call?  

maybe. I'm not entirely convinced... (I like the cleanup potential a lot
code wise.. but if it costs performance, then... well I'd hate to see
linux get slower for hugetlbfs)

> If not, then I definitely wouldn't
> mind creating a default_pagetable_ops and calling into that.

... but without it to be honest, your patch adds nothing real.. there's
ONE user of your code, and there's no real cleanup unless you get rid of
all the special casing.... since the special casing is the really ugly
part of hugetlbfs, not the actual code inside the special case..


-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API
@ 2007-02-19 21:15       ` Arjan van de Ven
  0 siblings, 0 replies; 68+ messages in thread
From: Arjan van de Ven @ 2007-02-19 21:15 UTC (permalink / raw)
  To: Adam Litke; +Cc: linux-mm, linux-kernel

On Mon, 2007-02-19 at 13:34 -0600, Adam Litke wrote:
> On Mon, 2007-02-19 at 19:43 +0100, Arjan van de Ven wrote:
> > On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote:
> > > The page tables for hugetlb mappings are handled differently than page tables
> > > for normal pages.  Rather than integrating multiple page size support into the
> > > main VM (which would tremendously complicate the code) some hooks were created.
> > > This allows hugetlb special cases to be handled "out of line" by a separate
> > > interface.
> > 
> > ok it makes sense to clean this up.. what I don't like is that there
> > STILL are all the double cases... for this to work and be worth it both
> > the common case and the hugetlb case should be using the ops structure
> > always! Anything else and you're just replacing bad code with bad
> > code ;(
> 
> Hmm.  Do you think everyone would support an extra pointer indirection
> for every handle_pte_fault() call?  

maybe. I'm not entirely convinced... (I like the cleanup potential a lot
code wise.. but if it costs performance, then... well I'd hate to see
linux get slower for hugetlbfs)

> If not, then I definitely wouldn't
> mind creating a default_pagetable_ops and calling into that.

... but without it to be honest, your patch adds nothing real.. there's
ONE user of your code, and there's no real cleanup unless you get rid of
all the special casing.... since the special casing is the really ugly
part of hugetlbfs, not the actual code inside the special case..


-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
  2007-02-19 18:31   ` Adam Litke
@ 2007-02-19 22:29     ` Christoph Hellwig
  -1 siblings, 0 replies; 68+ messages in thread
From: Christoph Hellwig @ 2007-02-19 22:29 UTC (permalink / raw)
  To: Adam Litke; +Cc: linux-mm, linux-kernel

On Mon, Feb 19, 2007 at 10:31:34AM -0800, Adam Litke wrote:
> Signed-off-by: Adam Litke <agl@us.ibm.com>
> ---
> 
>  include/linux/mm.h |   25 +++++++++++++++++++++++++
>  1 files changed, 25 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2d2c08d..a2fa66d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -98,6 +98,7 @@ struct vm_area_struct {
>  
>  	/* Function pointers to deal with this struct. */
>  	struct vm_operations_struct * vm_ops;
> +	struct pagetable_operations_struct * pagetable_ops;
>  
>  	/* Information about our backing store: */
>  	unsigned long vm_pgoff;		/* Offset (within vm_file) in PAGE_SIZE
> @@ -218,6 +219,30 @@ struct vm_operations_struct {
>  };
>  
>  struct mmu_gather;
> +
> +struct pagetable_operations_struct {
> +	int (*fault)(struct mm_struct *mm,
> +		struct vm_area_struct *vma,
> +		unsigned long address, int write_access);
> +	int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src,
> +		struct vm_area_struct *vma);
> +	int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma,
> +		struct page **pages, struct vm_area_struct **vmas,
> +		unsigned long *position, int *length, int i);
> +	void (*change_protection)(struct vm_area_struct *vma,
> +		unsigned long address, unsigned long end, pgprot_t newprot);
> +	unsigned long (*unmap_page_range)(struct vm_area_struct *vma,
> +		unsigned long address, unsigned long end, long *zap_work);
> +	void (*free_pgtable_range)(struct mmu_gather **tlb,
> +		unsigned long addr, unsigned long end,
> +		unsigned long floor, unsigned long ceiling);
> +};

I don't think adding another operation vector is a good idea.  But I'd
rather extend the vma operations vector to deal with all nessecary
buts ubstead if addubg a second one.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
@ 2007-02-19 22:29     ` Christoph Hellwig
  0 siblings, 0 replies; 68+ messages in thread
From: Christoph Hellwig @ 2007-02-19 22:29 UTC (permalink / raw)
  To: Adam Litke; +Cc: linux-mm, linux-kernel

On Mon, Feb 19, 2007 at 10:31:34AM -0800, Adam Litke wrote:
> Signed-off-by: Adam Litke <agl@us.ibm.com>
> ---
> 
>  include/linux/mm.h |   25 +++++++++++++++++++++++++
>  1 files changed, 25 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2d2c08d..a2fa66d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -98,6 +98,7 @@ struct vm_area_struct {
>  
>  	/* Function pointers to deal with this struct. */
>  	struct vm_operations_struct * vm_ops;
> +	struct pagetable_operations_struct * pagetable_ops;
>  
>  	/* Information about our backing store: */
>  	unsigned long vm_pgoff;		/* Offset (within vm_file) in PAGE_SIZE
> @@ -218,6 +219,30 @@ struct vm_operations_struct {
>  };
>  
>  struct mmu_gather;
> +
> +struct pagetable_operations_struct {
> +	int (*fault)(struct mm_struct *mm,
> +		struct vm_area_struct *vma,
> +		unsigned long address, int write_access);
> +	int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src,
> +		struct vm_area_struct *vma);
> +	int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma,
> +		struct page **pages, struct vm_area_struct **vmas,
> +		unsigned long *position, int *length, int i);
> +	void (*change_protection)(struct vm_area_struct *vma,
> +		unsigned long address, unsigned long end, pgprot_t newprot);
> +	unsigned long (*unmap_page_range)(struct vm_area_struct *vma,
> +		unsigned long address, unsigned long end, long *zap_work);
> +	void (*free_pgtable_range)(struct mmu_gather **tlb,
> +		unsigned long addr, unsigned long end,
> +		unsigned long floor, unsigned long ceiling);
> +};

I don't think adding another operation vector is a good idea.  But I'd
rather extend the vma operations vector to deal with all nessecary
buts ubstead if addubg a second one.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
  2007-02-19 22:29     ` Christoph Hellwig
@ 2007-02-20 15:50       ` Mel Gorman
  -1 siblings, 0 replies; 68+ messages in thread
From: Mel Gorman @ 2007-02-20 15:50 UTC (permalink / raw)
  To: Christoph Hellwig, Adam Litke, linux-mm, linux-kernel

On (19/02/07 22:29), Christoph Hellwig didst pronounce:
> On Mon, Feb 19, 2007 at 10:31:34AM -0800, Adam Litke wrote:
> > Signed-off-by: Adam Litke <agl@us.ibm.com>
> > ---
> > 
> >  include/linux/mm.h |   25 +++++++++++++++++++++++++
> >  1 files changed, 25 insertions(+), 0 deletions(-)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 2d2c08d..a2fa66d 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -98,6 +98,7 @@ struct vm_area_struct {
> >  
> >  	/* Function pointers to deal with this struct. */
> >  	struct vm_operations_struct * vm_ops;
> > +	struct pagetable_operations_struct * pagetable_ops;
> >  
> >  	/* Information about our backing store: */
> >  	unsigned long vm_pgoff;		/* Offset (within vm_file) in PAGE_SIZE
> > @@ -218,6 +219,30 @@ struct vm_operations_struct {
> >  };
> >  
> >  struct mmu_gather;
> > +
> > +struct pagetable_operations_struct {
> > +	int (*fault)(struct mm_struct *mm,
> > +		struct vm_area_struct *vma,
> > +		unsigned long address, int write_access);
> > +	int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src,
> > +		struct vm_area_struct *vma);
> > +	int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma,
> > +		struct page **pages, struct vm_area_struct **vmas,
> > +		unsigned long *position, int *length, int i);
> > +	void (*change_protection)(struct vm_area_struct *vma,
> > +		unsigned long address, unsigned long end, pgprot_t newprot);
> > +	unsigned long (*unmap_page_range)(struct vm_area_struct *vma,
> > +		unsigned long address, unsigned long end, long *zap_work);
> > +	void (*free_pgtable_range)(struct mmu_gather **tlb,
> > +		unsigned long addr, unsigned long end,
> > +		unsigned long floor, unsigned long ceiling);
> > +};
> 
> I don't think adding another operation vector is a good idea.  But I'd
> rather extend the vma operations vector to deal with all nessecary
> buts ubstead if addubg a second one.

Well, there are a lot of users of vm_operations_struct that have no interest in
the operations in pagetable_operations_struct. Expanding vm_operations_struct
would increase the size of all VMAs by more than is necessary.

Also, having the pagetable ops in vm_operations_struct might lead device
drivers to believe they should be doing something entertaining there. In
reality, we would only want drivers playing with pagetable_operations when
they really know what they are doing and why.  Having the pagetable_ops
set is similar to VM_HUGETLB set as a strong sign that something unusual is
going on that is fairly easy to check for.

I prefer the additional struct to extending VMAs anyway.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
@ 2007-02-20 15:50       ` Mel Gorman
  0 siblings, 0 replies; 68+ messages in thread
From: Mel Gorman @ 2007-02-20 15:50 UTC (permalink / raw)
  To: Christoph Hellwig, Adam Litke, linux-mm, linux-kernel

On (19/02/07 22:29), Christoph Hellwig didst pronounce:
> On Mon, Feb 19, 2007 at 10:31:34AM -0800, Adam Litke wrote:
> > Signed-off-by: Adam Litke <agl@us.ibm.com>
> > ---
> > 
> >  include/linux/mm.h |   25 +++++++++++++++++++++++++
> >  1 files changed, 25 insertions(+), 0 deletions(-)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 2d2c08d..a2fa66d 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -98,6 +98,7 @@ struct vm_area_struct {
> >  
> >  	/* Function pointers to deal with this struct. */
> >  	struct vm_operations_struct * vm_ops;
> > +	struct pagetable_operations_struct * pagetable_ops;
> >  
> >  	/* Information about our backing store: */
> >  	unsigned long vm_pgoff;		/* Offset (within vm_file) in PAGE_SIZE
> > @@ -218,6 +219,30 @@ struct vm_operations_struct {
> >  };
> >  
> >  struct mmu_gather;
> > +
> > +struct pagetable_operations_struct {
> > +	int (*fault)(struct mm_struct *mm,
> > +		struct vm_area_struct *vma,
> > +		unsigned long address, int write_access);
> > +	int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src,
> > +		struct vm_area_struct *vma);
> > +	int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma,
> > +		struct page **pages, struct vm_area_struct **vmas,
> > +		unsigned long *position, int *length, int i);
> > +	void (*change_protection)(struct vm_area_struct *vma,
> > +		unsigned long address, unsigned long end, pgprot_t newprot);
> > +	unsigned long (*unmap_page_range)(struct vm_area_struct *vma,
> > +		unsigned long address, unsigned long end, long *zap_work);
> > +	void (*free_pgtable_range)(struct mmu_gather **tlb,
> > +		unsigned long addr, unsigned long end,
> > +		unsigned long floor, unsigned long ceiling);
> > +};
> 
> I don't think adding another operation vector is a good idea.  But I'd
> rather extend the vma operations vector to deal with all nessecary
> buts ubstead if addubg a second one.

Well, there are a lot of users of vm_operations_struct that have no interest in
the operations in pagetable_operations_struct. Expanding vm_operations_struct
would increase the size of all VMAs by more than is necessary.

Also, having the pagetable ops in vm_operations_struct might lead device
drivers to believe they should be doing something entertaining there. In
reality, we would only want drivers playing with pagetable_operations when
they really know what they are doing and why.  Having the pagetable_ops
set is similar to VM_HUGETLB set as a strong sign that something unusual is
going on that is fairly easy to check for.

I prefer the additional struct to extending VMAs anyway.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API
  2007-02-19 18:43   ` Arjan van de Ven
@ 2007-02-20 19:54     ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 68+ messages in thread
From: Benjamin Herrenschmidt @ 2007-02-20 19:54 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Adam Litke, linux-mm, linux-kernel

On Mon, 2007-02-19 at 19:43 +0100, Arjan van de Ven wrote:
> On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote:
> > The page tables for hugetlb mappings are handled differently than page tables
> > for normal pages.  Rather than integrating multiple page size support into the
> > main VM (which would tremendously complicate the code) some hooks were created.
> > This allows hugetlb special cases to be handled "out of line" by a separate
> > interface.
> 
> ok it makes sense to clean this up.. what I don't like is that there
> STILL are all the double cases... for this to work and be worth it both
> the common case and the hugetlb case should be using the ops structure
> always! Anything else and you're just replacing bad code with bad
> code ;(

I don't fully agree. I think it makes sense to have the "special" case
be a function pointer and the "normal" case stay where it is for
performances. You don't want to pay the cost of the function pointer
call in the normal case do you ?

Ben.



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API
@ 2007-02-20 19:54     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 68+ messages in thread
From: Benjamin Herrenschmidt @ 2007-02-20 19:54 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Adam Litke, linux-mm, linux-kernel

On Mon, 2007-02-19 at 19:43 +0100, Arjan van de Ven wrote:
> On Mon, 2007-02-19 at 10:31 -0800, Adam Litke wrote:
> > The page tables for hugetlb mappings are handled differently than page tables
> > for normal pages.  Rather than integrating multiple page size support into the
> > main VM (which would tremendously complicate the code) some hooks were created.
> > This allows hugetlb special cases to be handled "out of line" by a separate
> > interface.
> 
> ok it makes sense to clean this up.. what I don't like is that there
> STILL are all the double cases... for this to work and be worth it both
> the common case and the hugetlb case should be using the ops structure
> always! Anything else and you're just replacing bad code with bad
> code ;(

I don't fully agree. I think it makes sense to have the "special" case
be a function pointer and the "normal" case stay where it is for
performances. You don't want to pay the cost of the function pointer
call in the normal case do you ?

Ben.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API
  2007-02-19 21:15       ` Arjan van de Ven
@ 2007-02-20 19:57         ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 68+ messages in thread
From: Benjamin Herrenschmidt @ 2007-02-20 19:57 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Adam Litke, linux-mm, linux-kernel


> maybe. I'm not entirely convinced... (I like the cleanup potential a lot
> code wise.. but if it costs performance, then... well I'd hate to see
> linux get slower for hugetlbfs)
> 
> > If not, then I definitely wouldn't
> > mind creating a default_pagetable_ops and calling into that.
> 
> ... but without it to be honest, your patch adds nothing real.. there's
> ONE user of your code, and there's no real cleanup unless you get rid of
> all the special casing.... since the special casing is the really ugly
> part of hugetlbfs, not the actual code inside the special case..

Well... I disagree there too :-)

I've been working recently for example on some spufs improvements that
require similar tweaking of the user address space as hugetlbfs. The
problem I have is that while there are hooks in the generic code pretty
much everywhere I need.... they are all hugetlb specific, that is they
call directly into the hugetlb code.

For now, I found ways of doing my stuff without hooking all over the
page table operations (well, I had no real choices) but I can imagine it
making sense to allow something (hugetlb being one of them) to take over
part of the user address space.

Ben.




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API
@ 2007-02-20 19:57         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 68+ messages in thread
From: Benjamin Herrenschmidt @ 2007-02-20 19:57 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Adam Litke, linux-mm, linux-kernel

> maybe. I'm not entirely convinced... (I like the cleanup potential a lot
> code wise.. but if it costs performance, then... well I'd hate to see
> linux get slower for hugetlbfs)
> 
> > If not, then I definitely wouldn't
> > mind creating a default_pagetable_ops and calling into that.
> 
> ... but without it to be honest, your patch adds nothing real.. there's
> ONE user of your code, and there's no real cleanup unless you get rid of
> all the special casing.... since the special casing is the really ugly
> part of hugetlbfs, not the actual code inside the special case..

Well... I disagree there too :-)

I've been working recently for example on some spufs improvements that
require similar tweaking of the user address space as hugetlbfs. The
problem I have is that while there are hooks in the generic code pretty
much everywhere I need.... they are all hugetlb specific, that is they
call directly into the hugetlb code.

For now, I found ways of doing my stuff without hooking all over the
page table operations (well, I had no real choices) but I can imagine it
making sense to allow something (hugetlb being one of them) to take over
part of the user address space.

Ben.



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
  2007-03-21 23:02         ` Nick Piggin
@ 2007-03-21 23:32           ` William Lee Irwin III
  -1 siblings, 0 replies; 68+ messages in thread
From: William Lee Irwin III @ 2007-03-21 23:32 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig,
	Ken Chen, linux-mm, linux-kernel

Adam Litke wrote:
>> We didn't want to bloat the size of the vm_ops struct for all of its
>> users.

On Thu, Mar 22, 2007 at 10:02:07AM +1100, Nick Piggin wrote:
> But vmas are surely far more numerous than vm_ops, aren't they?

It should be clarified that the pointer to the operations structure
in once-per-mmap() vmas is a bigger overhead than once-per-driver
function pointers in the vm_ops structure.


-- wli

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
@ 2007-03-21 23:32           ` William Lee Irwin III
  0 siblings, 0 replies; 68+ messages in thread
From: William Lee Irwin III @ 2007-03-21 23:32 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig,
	Ken Chen, linux-mm, linux-kernel

Adam Litke wrote:
>> We didn't want to bloat the size of the vm_ops struct for all of its
>> users.

On Thu, Mar 22, 2007 at 10:02:07AM +1100, Nick Piggin wrote:
> But vmas are surely far more numerous than vm_ops, aren't they?

It should be clarified that the pointer to the operations structure
in once-per-mmap() vmas is a bigger overhead than once-per-driver
function pointers in the vm_ops structure.


-- wli

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
  2007-03-21 16:00         ` Christoph Hellwig
@ 2007-03-21 23:03           ` Nick Piggin
  -1 siblings, 0 replies; 68+ messages in thread
From: Nick Piggin @ 2007-03-21 23:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Adam Litke, Andrew Morton, Arjan van de Ven,
	William Lee Irwin III, Ken Chen, linux-mm, linux-kernel

Christoph Hellwig wrote:
> On Wed, Mar 21, 2007 at 10:17:40AM -0500, Adam Litke wrote:
> 
>>>Also, it is going to be hugepage-only, isn't it? So should the naming be
>>>changed to reflect that? And #ifdef it...
>>
>>They are doing some interesting things on Cell that could take advantage
>>of this.
> 
> 
> That would be new to me.  What we need on Cell is fixing up the
> get_unmapped_area mess which Ben is working on now.
> 
> And let me once again repeat that I don't like this at all.  I'll
> rather have a few ugly ifdefs in strategic places than a big object
> oriented mess like this with just a single user.

I think I agree that we'd need more than one user for this.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
@ 2007-03-21 23:03           ` Nick Piggin
  0 siblings, 0 replies; 68+ messages in thread
From: Nick Piggin @ 2007-03-21 23:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Adam Litke, Andrew Morton, Arjan van de Ven,
	William Lee Irwin III, Ken Chen, linux-mm, linux-kernel

Christoph Hellwig wrote:
> On Wed, Mar 21, 2007 at 10:17:40AM -0500, Adam Litke wrote:
> 
>>>Also, it is going to be hugepage-only, isn't it? So should the naming be
>>>changed to reflect that? And #ifdef it...
>>
>>They are doing some interesting things on Cell that could take advantage
>>of this.
> 
> 
> That would be new to me.  What we need on Cell is fixing up the
> get_unmapped_area mess which Ben is working on now.
> 
> And let me once again repeat that I don't like this at all.  I'll
> rather have a few ugly ifdefs in strategic places than a big object
> oriented mess like this with just a single user.

I think I agree that we'd need more than one user for this.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
  2007-03-21 15:17       ` Adam Litke
@ 2007-03-21 23:02         ` Nick Piggin
  -1 siblings, 0 replies; 68+ messages in thread
From: Nick Piggin @ 2007-03-21 23:02 UTC (permalink / raw)
  To: Adam Litke
  Cc: Andrew Morton, Arjan van de Ven, William Lee Irwin III,
	Christoph Hellwig, Ken Chen, linux-mm, linux-kernel

Adam Litke wrote:
> On Wed, 2007-03-21 at 15:18 +1100, Nick Piggin wrote:
> 
>>Adam Litke wrote:

>>>diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>index 60e0e4a..7089323 100644
>>>--- a/include/linux/mm.h
>>>+++ b/include/linux/mm.h
>>>@@ -98,6 +98,7 @@ struct vm_area_struct {
>>> 
>>> 	/* Function pointers to deal with this struct. */
>>> 	struct vm_operations_struct * vm_ops;
>>>+	const struct pagetable_operations_struct * pagetable_ops;
>>> 
>>> 	/* Information about our backing store: */
>>> 	unsigned long vm_pgoff;		/* Offset (within vm_file) in PAGE_SIZE
>>
>>Can you remind me why this isn't in vm_ops?
> 
> 
> We didn't want to bloat the size of the vm_ops struct for all of its
> users.

But vmas are surely far more numerous than vm_ops, aren't they?

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
@ 2007-03-21 23:02         ` Nick Piggin
  0 siblings, 0 replies; 68+ messages in thread
From: Nick Piggin @ 2007-03-21 23:02 UTC (permalink / raw)
  To: Adam Litke
  Cc: Andrew Morton, Arjan van de Ven, William Lee Irwin III,
	Christoph Hellwig, Ken Chen, linux-mm, linux-kernel

Adam Litke wrote:
> On Wed, 2007-03-21 at 15:18 +1100, Nick Piggin wrote:
> 
>>Adam Litke wrote:

>>>diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>index 60e0e4a..7089323 100644
>>>--- a/include/linux/mm.h
>>>+++ b/include/linux/mm.h
>>>@@ -98,6 +98,7 @@ struct vm_area_struct {
>>> 
>>> 	/* Function pointers to deal with this struct. */
>>> 	struct vm_operations_struct * vm_ops;
>>>+	const struct pagetable_operations_struct * pagetable_ops;
>>> 
>>> 	/* Information about our backing store: */
>>> 	unsigned long vm_pgoff;		/* Offset (within vm_file) in PAGE_SIZE
>>
>>Can you remind me why this isn't in vm_ops?
> 
> 
> We didn't want to bloat the size of the vm_ops struct for all of its
> users.

But vmas are surely far more numerous than vm_ops, aren't they?

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
  2007-03-21 15:17       ` Adam Litke
@ 2007-03-21 16:00         ` Christoph Hellwig
  -1 siblings, 0 replies; 68+ messages in thread
From: Christoph Hellwig @ 2007-03-21 16:00 UTC (permalink / raw)
  To: Adam Litke
  Cc: Nick Piggin, Andrew Morton, Arjan van de Ven,
	William Lee Irwin III, Christoph Hellwig, Ken Chen, linux-mm,
	linux-kernel

On Wed, Mar 21, 2007 at 10:17:40AM -0500, Adam Litke wrote:
> > Also, it is going to be hugepage-only, isn't it? So should the naming be
> > changed to reflect that? And #ifdef it...
> 
> They are doing some interesting things on Cell that could take advantage
> of this.

That would be new to me.  What we need on Cell is fixing up the
get_unmapped_area mess which Ben is working on now.

And let me once again repeat that I don't like this at all.  I'll
rather have a few ugly ifdefs in strategic places than a big object
oriented mess like this with just a single user.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
@ 2007-03-21 16:00         ` Christoph Hellwig
  0 siblings, 0 replies; 68+ messages in thread
From: Christoph Hellwig @ 2007-03-21 16:00 UTC (permalink / raw)
  To: Adam Litke
  Cc: Nick Piggin, Andrew Morton, Arjan van de Ven,
	William Lee Irwin III, Christoph Hellwig, Ken Chen, linux-mm,
	linux-kernel

On Wed, Mar 21, 2007 at 10:17:40AM -0500, Adam Litke wrote:
> > Also, it is going to be hugepage-only, isn't it? So should the naming be
> > changed to reflect that? And #ifdef it...
> 
> They are doing some interesting things on Cell that could take advantage
> of this.

That would be new to me.  What we need on Cell is fixing up the
get_unmapped_area mess which Ben is working on now.

And let me once again repeat that I don't like this at all.  I'll
rather have a few ugly ifdefs in strategic places than a big object
oriented mess like this with just a single user.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
  2007-03-21  4:18     ` Nick Piggin
@ 2007-03-21 15:17       ` Adam Litke
  -1 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-03-21 15:17 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, Arjan van de Ven, William Lee Irwin III,
	Christoph Hellwig, Ken Chen, linux-mm, linux-kernel

On Wed, 2007-03-21 at 15:18 +1100, Nick Piggin wrote:
> Adam Litke wrote:
> > Signed-off-by: Adam Litke <agl@us.ibm.com>
> > ---
> > 
> >  include/linux/mm.h |   25 +++++++++++++++++++++++++
> >  1 files changed, 25 insertions(+), 0 deletions(-)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 60e0e4a..7089323 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -98,6 +98,7 @@ struct vm_area_struct {
> >  
> >  	/* Function pointers to deal with this struct. */
> >  	struct vm_operations_struct * vm_ops;
> > +	const struct pagetable_operations_struct * pagetable_ops;
> >  
> >  	/* Information about our backing store: */
> >  	unsigned long vm_pgoff;		/* Offset (within vm_file) in PAGE_SIZE
> 
> Can you remind me why this isn't in vm_ops?

We didn't want to bloat the size of the vm_ops struct for all of its
users.

> Also, it is going to be hugepage-only, isn't it? So should the naming be
> changed to reflect that? And #ifdef it...

They are doing some interesting things on Cell that could take advantage
of this.

> > @@ -218,6 +219,30 @@ struct vm_operations_struct {
> >  };
> >  
> >  struct mmu_gather;
> > +
> > +struct pagetable_operations_struct {
> > +	int (*fault)(struct mm_struct *mm,
> > +		struct vm_area_struct *vma,
> > +		unsigned long address, int write_access);
> 
> I got dibs on fault ;)
> 
> My callback is a sanitised one that basically abstracts the details of the
> virtual memory mapping away, so it is usable by drivers and filesystems.
> 
> You actually want to bypass the normal fault handling because it doesn't
> know how to deal with your virtual memory mapping. Hmm, the best suggestion
> I can come up with is handle_mm_fault... unless you can think of a better
> name for me to use.

How about I use handle_pte_fault?

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
@ 2007-03-21 15:17       ` Adam Litke
  0 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-03-21 15:17 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, Arjan van de Ven, William Lee Irwin III,
	Christoph Hellwig, Ken Chen, linux-mm, linux-kernel

On Wed, 2007-03-21 at 15:18 +1100, Nick Piggin wrote:
> Adam Litke wrote:
> > Signed-off-by: Adam Litke <agl@us.ibm.com>
> > ---
> > 
> >  include/linux/mm.h |   25 +++++++++++++++++++++++++
> >  1 files changed, 25 insertions(+), 0 deletions(-)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 60e0e4a..7089323 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -98,6 +98,7 @@ struct vm_area_struct {
> >  
> >  	/* Function pointers to deal with this struct. */
> >  	struct vm_operations_struct * vm_ops;
> > +	const struct pagetable_operations_struct * pagetable_ops;
> >  
> >  	/* Information about our backing store: */
> >  	unsigned long vm_pgoff;		/* Offset (within vm_file) in PAGE_SIZE
> 
> Can you remind me why this isn't in vm_ops?

We didn't want to bloat the size of the vm_ops struct for all of its
users.

> Also, it is going to be hugepage-only, isn't it? So should the naming be
> changed to reflect that? And #ifdef it...

They are doing some interesting things on Cell that could take advantage
of this.

> > @@ -218,6 +219,30 @@ struct vm_operations_struct {
> >  };
> >  
> >  struct mmu_gather;
> > +
> > +struct pagetable_operations_struct {
> > +	int (*fault)(struct mm_struct *mm,
> > +		struct vm_area_struct *vma,
> > +		unsigned long address, int write_access);
> 
> I got dibs on fault ;)
> 
> My callback is a sanitised one that basically abstracts the details of the
> virtual memory mapping away, so it is usable by drivers and filesystems.
> 
> You actually want to bypass the normal fault handling because it doesn't
> know how to deal with your virtual memory mapping. Hmm, the best suggestion
> I can come up with is handle_mm_fault... unless you can think of a better
> name for me to use.

How about I use handle_pte_fault?

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
  2007-03-21 14:50       ` Adam Litke
@ 2007-03-21 15:05         ` Arjan van de Ven
  -1 siblings, 0 replies; 68+ messages in thread
From: Arjan van de Ven @ 2007-03-21 15:05 UTC (permalink / raw)
  To: Adam Litke
  Cc: Dave Hansen, Andrew Morton, William Lee Irwin III,
	Christoph Hellwig, Ken Chen, linux-mm, linux-kernel

On Wed, 2007-03-21 at 09:50 -0500, Adam Litke wrote:
> On Tue, 2007-03-20 at 16:24 -0700, Dave Hansen wrote:
> > On Mon, 2007-03-19 at 13:05 -0700, Adam Litke wrote:
> > > 
> > > +#define has_pt_op(vma, op) \
> > > +       ((vma)->pagetable_ops && (vma)->pagetable_ops->op)
> > > +#define pt_op(vma, call) \
> > > +       ((vma)->pagetable_ops->call) 
> > 
> > Can you get rid of these macros?  I think they make it a wee bit harder
> > to read.  My brain doesn't properly parse the foo(arg)(bar) syntax.  
> > 
> > +       if (has_pt_op(vma, copy_vma))
> > +               return pt_op(vma, copy_vma)(dst_mm, src_mm, vma);
> > 
> > +       if (vma->pagetable_ops && vma->pagetable_ops->copy_vma)
> > +               return vma->pagetable_ops->copy_vma(dst_mm, src_mm, vma);
> > 
> > I guess it does lead to some longish lines.  Does it start looking
> > really nasty?
> 
> Yeah, it starts to look pretty bad.  Some of these calls are in code
> that is already indented several times.

can we just make sure these things are never NULL in the first place?
would obsolete a lot of the checks, which are also runtime overhead as
well!
-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
@ 2007-03-21 15:05         ` Arjan van de Ven
  0 siblings, 0 replies; 68+ messages in thread
From: Arjan van de Ven @ 2007-03-21 15:05 UTC (permalink / raw)
  To: Adam Litke
  Cc: Dave Hansen, Andrew Morton, William Lee Irwin III,
	Christoph Hellwig, Ken Chen, linux-mm, linux-kernel

On Wed, 2007-03-21 at 09:50 -0500, Adam Litke wrote:
> On Tue, 2007-03-20 at 16:24 -0700, Dave Hansen wrote:
> > On Mon, 2007-03-19 at 13:05 -0700, Adam Litke wrote:
> > > 
> > > +#define has_pt_op(vma, op) \
> > > +       ((vma)->pagetable_ops && (vma)->pagetable_ops->op)
> > > +#define pt_op(vma, call) \
> > > +       ((vma)->pagetable_ops->call) 
> > 
> > Can you get rid of these macros?  I think they make it a wee bit harder
> > to read.  My brain doesn't properly parse the foo(arg)(bar) syntax.  
> > 
> > +       if (has_pt_op(vma, copy_vma))
> > +               return pt_op(vma, copy_vma)(dst_mm, src_mm, vma);
> > 
> > +       if (vma->pagetable_ops && vma->pagetable_ops->copy_vma)
> > +               return vma->pagetable_ops->copy_vma(dst_mm, src_mm, vma);
> > 
> > I guess it does lead to some longish lines.  Does it start looking
> > really nasty?
> 
> Yeah, it starts to look pretty bad.  Some of these calls are in code
> that is already indented several times.

can we just make sure these things are never NULL in the first place?
would obsolete a lot of the checks, which are also runtime overhead as
well!
-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
  2007-03-20 23:24     ` Dave Hansen
@ 2007-03-21 14:50       ` Adam Litke
  -1 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-03-21 14:50 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrew Morton, Arjan van de Ven, William Lee Irwin III,
	Christoph Hellwig, Ken Chen, linux-mm, linux-kernel

On Tue, 2007-03-20 at 16:24 -0700, Dave Hansen wrote:
> On Mon, 2007-03-19 at 13:05 -0700, Adam Litke wrote:
> > 
> > +#define has_pt_op(vma, op) \
> > +       ((vma)->pagetable_ops && (vma)->pagetable_ops->op)
> > +#define pt_op(vma, call) \
> > +       ((vma)->pagetable_ops->call) 
> 
> Can you get rid of these macros?  I think they make it a wee bit harder
> to read.  My brain doesn't properly parse the foo(arg)(bar) syntax.  
> 
> +       if (has_pt_op(vma, copy_vma))
> +               return pt_op(vma, copy_vma)(dst_mm, src_mm, vma);
> 
> +       if (vma->pagetable_ops && vma->pagetable_ops->copy_vma)
> +               return vma->pagetable_ops->copy_vma(dst_mm, src_mm, vma);
> 
> I guess it does lead to some longish lines.  Does it start looking
> really nasty?

Yeah, it starts to look pretty bad.  Some of these calls are in code
that is already indented several times.

> If you're going to have them, it might just be best to put a single
> unlikely() around the macro definitions themselves to keep anybody from
> having to open-code it for any of the users.  

It should be pretty easy to wrap has_pt_op() with an unlikely().  Good
suggestion.

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
@ 2007-03-21 14:50       ` Adam Litke
  0 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-03-21 14:50 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrew Morton, Arjan van de Ven, William Lee Irwin III,
	Christoph Hellwig, Ken Chen, linux-mm, linux-kernel

On Tue, 2007-03-20 at 16:24 -0700, Dave Hansen wrote:
> On Mon, 2007-03-19 at 13:05 -0700, Adam Litke wrote:
> > 
> > +#define has_pt_op(vma, op) \
> > +       ((vma)->pagetable_ops && (vma)->pagetable_ops->op)
> > +#define pt_op(vma, call) \
> > +       ((vma)->pagetable_ops->call) 
> 
> Can you get rid of these macros?  I think they make it a wee bit harder
> to read.  My brain doesn't properly parse the foo(arg)(bar) syntax.  
> 
> +       if (has_pt_op(vma, copy_vma))
> +               return pt_op(vma, copy_vma)(dst_mm, src_mm, vma);
> 
> +       if (vma->pagetable_ops && vma->pagetable_ops->copy_vma)
> +               return vma->pagetable_ops->copy_vma(dst_mm, src_mm, vma);
> 
> I guess it does lead to some longish lines.  Does it start looking
> really nasty?

Yeah, it starts to look pretty bad.  Some of these calls are in code
that is already indented several times.

> If you're going to have them, it might just be best to put a single
> unlikely() around the macro definitions themselves to keep anybody from
> having to open-code it for any of the users.  

It should be pretty easy to wrap has_pt_op() with an unlikely().  Good
suggestion.

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
  2007-03-21  6:51             ` Nick Piggin
@ 2007-03-21 10:46               ` William Lee Irwin III
  -1 siblings, 0 replies; 68+ messages in thread
From: William Lee Irwin III @ 2007-03-21 10:46 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig,
	Ken Chen, linux-mm, linux-kernel, Linus Torvalds

William Lee Irwin III wrote:
>> I'm tied up elsewhere so I won't get to it in a timely fashion. Maybe
>> in a few weeks I can start up on the first two of the bunch.

On Wed, Mar 21, 2007 at 05:51:23PM +1100, Nick Piggin wrote:
> Care to give us a hint? :)

The first is something DISM-like. I've not made up my mind on the
second, but the shopping catalogue of feature requests I've done
nothing about for some time that want this is long.


William Lee Irwin III wrote:
>> Not much of a VM translation; it's just a lookup through the
>> software mocked-up structures on everything save i386, x86_64, and
>> some m68k where they're the same thing only with hardware walkers
>> (ISTR ia64's being firmware a la Alpha despite the "HPW" name,
>> though I could be wrong)

On Wed, Mar 21, 2007 at 05:51:23PM +1100, Nick Piggin wrote:
> Well the vma+pagetables *are* our VM translation data structure. It is
> a good data structure. The Gelato/UNSW guys experimenting with changing
> this have basically said they haven't yet got anything that beats it.
> I would be opposed to anything that bypasses that unless a) it is not
> applicable to the VM as a whole, and b) it is really worth it
> (hugepages was a reasonable exception).

Maybe anticipating the conventional Linux approach to this wasn't as
difficult as I supposed. ;)


William Lee Irwin III wrote:
>> reliant on them. The drivers/etc. could just as easily use helper
>> functions to carry out the lookup, thereby accomplishing the
>> unification. There's nothing particularly fundamental about a pte
>> lookup.

On Wed, Mar 21, 2007 at 05:51:23PM +1100, Nick Piggin wrote:
> Yeah you could, but it looks back to front to me.
> The VM tells the filesystem that the machine took a fault at virtual
> address X, then the filesystem asks the VM what pgoff that is, then
> tells the VM to install the corresponding page to vaddr X.
> With my ->fault, the VM asks the filesystem to give the page that
> corresponds to vaddr X, then installs it into that vaddr.

I'm aware of what is now done and the minor modification accomplished
by your ->fault(). Maybe I've even written something like this before
that I never posted. It's obvious what I'm on about and that my
thoughts here are too divergent to fly. Others should chime in with
more Linux-native ideas about what's to be done here.


William Lee Irwin III wrote:
>> Normal arches that do software TLB refill could just as easily
>> consult the radix trees dangled off struct address_space or any old
>> data structure floating around the kernel with enough information to
>> translate user virtual addresses to the physical addresses they need to
>> fill the TLB with, and there are other kernels that literally do things
>> like that.

On Wed, Mar 21, 2007 at 05:51:23PM +1100, Nick Piggin wrote:
> Sure it *could* be done, but it may not be very nice, given Linux's
> design. And you definitely need _something_ other than just the
> pagecache radix-tree, because the VM needs to know who maps the page.
> So if, for your backing store, you use a small hash table and evict old
> entries like powerpc, you'll constantly be faulting in and out pages
> from the VM's high level view of the address space. That isn't a really
> cheap operation. It takes at least:
[long list of locking operations snipped]
> Compared to our current page table walk which is just a single locked
> op + barrier for the spinlock + radix tree walk.
> If you had a very large hash table (ia64 long mode, maybe?), then you
> may have slightly fewer high level faults, but range based operations
> are going to take a whole lot of cache misses, aren't they? Especially
> for small processes.
> Not that I wouldn't be happy to be proven wrong, but I don't think it
> should be something that sneaks in under these pagetable operations.
> IMO.

I'll presume that was not for my benefit; if so, it was superfluous.

The example I gave was to show how far things could diverge from Linux'
conventions. Every single locking operation cited for Linux didn't
apply to the kernel I was thinking of due to its lockless pagecache
analogue, its lack of a direct equivalent of struct page, and its use
of different lifetime-bounding protocols from reference counting.
Things like page replacement didn't rely on things that would disturb
all that. It all worked out quite well for that kernel. So not only
can it be done other ways, but those ways are indeed efficient.

It should be clear from the above that retrofitting Linux to do similar
is effectively impossible. (Well, if you think you can pull off removing
struct page in favor of no direct equivalent and bounding the lifetimes
of page-sized chunks of memory by shooting down all references using
knowledge of who could possibly be hanging onto them in Linux, feel
free to attempt such a retrofit, and I'll send you a case of Scotch
whisky if you can get it to boot and run a major database benchmark
without crashing regardless of whether it's merged.)

In any event, let's not talk too much at cross-purposes. I'm deferring
to others on this, as I said. Someone else will doubtless come at this
from a direction that gibes better with Linux-native conventions.


-- wli

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
@ 2007-03-21 10:46               ` William Lee Irwin III
  0 siblings, 0 replies; 68+ messages in thread
From: William Lee Irwin III @ 2007-03-21 10:46 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig,
	Ken Chen, linux-mm, linux-kernel, Linus Torvalds

William Lee Irwin III wrote:
>> I'm tied up elsewhere so I won't get to it in a timely fashion. Maybe
>> in a few weeks I can start up on the first two of the bunch.

On Wed, Mar 21, 2007 at 05:51:23PM +1100, Nick Piggin wrote:
> Care to give us a hint? :)

The first is something DISM-like. I've not made up my mind on the
second, but the shopping catalogue of feature requests I've done
nothing about for some time that want this is long.


William Lee Irwin III wrote:
>> Not much of a VM translation; it's just a lookup through the
>> software mocked-up structures on everything save i386, x86_64, and
>> some m68k where they're the same thing only with hardware walkers
>> (ISTR ia64's being firmware a la Alpha despite the "HPW" name,
>> though I could be wrong)

On Wed, Mar 21, 2007 at 05:51:23PM +1100, Nick Piggin wrote:
> Well the vma+pagetables *are* our VM translation data structure. It is
> a good data structure. The Gelato/UNSW guys experimenting with changing
> this have basically said they haven't yet got anything that beats it.
> I would be opposed to anything that bypasses that unless a) it is not
> applicable to the VM as a whole, and b) it is really worth it
> (hugepages was a reasonable exception).

Maybe anticipating the conventional Linux approach to this wasn't as
difficult as I supposed. ;)


William Lee Irwin III wrote:
>> reliant on them. The drivers/etc. could just as easily use helper
>> functions to carry out the lookup, thereby accomplishing the
>> unification. There's nothing particularly fundamental about a pte
>> lookup.

On Wed, Mar 21, 2007 at 05:51:23PM +1100, Nick Piggin wrote:
> Yeah you could, but it looks back to front to me.
> The VM tells the filesystem that the machine took a fault at virtual
> address X, then the filesystem asks the VM what pgoff that is, then
> tells the VM to install the corresponding page to vaddr X.
> With my ->fault, the VM asks the filesystem to give the page that
> corresponds to vaddr X, then installs it into that vaddr.

I'm aware of what is now done and the minor modification accomplished
by your ->fault(). Maybe I've even written something like this before
that I never posted. It's obvious what I'm on about and that my
thoughts here are too divergent to fly. Others should chime in with
more Linux-native ideas about what's to be done here.


William Lee Irwin III wrote:
>> Normal arches that do software TLB refill could just as easily
>> consult the radix trees dangled off struct address_space or any old
>> data structure floating around the kernel with enough information to
>> translate user virtual addresses to the physical addresses they need to
>> fill the TLB with, and there are other kernels that literally do things
>> like that.

On Wed, Mar 21, 2007 at 05:51:23PM +1100, Nick Piggin wrote:
> Sure it *could* be done, but it may not be very nice, given Linux's
> design. And you definitely need _something_ other than just the
> pagecache radix-tree, because the VM needs to know who maps the page.
> So if, for your backing store, you use a small hash table and evict old
> entries like powerpc, you'll constantly be faulting in and out pages
> from the VM's high level view of the address space. That isn't a really
> cheap operation. It takes at least:
[long list of locking operations snipped]
> Compared to our current page table walk which is just a single locked
> op + barrier for the spinlock + radix tree walk.
> If you had a very large hash table (ia64 long mode, maybe?), then you
> may have slightly fewer high level faults, but range based operations
> are going to take a whole lot of cache misses, aren't they? Especially
> for small processes.
> Not that I wouldn't be happy to be proven wrong, but I don't think it
> should be something that sneaks in under these pagetable operations.
> IMO.

I'll presume that was not for my benefit; if so, it was superfluous.

The example I gave was to show how far things could diverge from Linux'
conventions. Every single locking operation cited for Linux didn't
apply to the kernel I was thinking of due to its lockless pagecache
analogue, its lack of a direct equivalent of struct page, and its use
of different lifetime-bounding protocols from reference counting.
Things like page replacement didn't rely on things that would disturb
all that. It all worked out quite well for that kernel. So not only
can it be done other ways, but those ways are indeed efficient.

It should be clear from the above that retrofitting Linux to do similar
is effectively impossible. (Well, if you think you can pull off removing
struct page in favor of no direct equivalent and bounding the lifetimes
of page-sized chunks of memory by shooting down all references using
knowledge of who could possibly be hanging onto them in Linux, feel
free to attempt such a retrofit, and I'll send you a case of Scotch
whisky if you can get it to boot and run a major database benchmark
without crashing regardless of whether it's merged.)

In any event, let's not talk too much at cross-purposes. I'm deferring
to others on this, as I said. Someone else will doubtless come at this
from a direction that gibes better with Linux-native conventions.


-- wli

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
  2007-03-21  6:51             ` Nick Piggin
@ 2007-03-21  7:36               ` Nick Piggin
  -1 siblings, 0 replies; 68+ messages in thread
From: Nick Piggin @ 2007-03-21  7:36 UTC (permalink / raw)
  To: Nick Piggin
  Cc: William Lee Irwin III, Adam Litke, Andrew Morton,
	Arjan van de Ven, Christoph Hellwig, Ken Chen, linux-mm,
	linux-kernel, Linus Torvalds

Nick Piggin wrote:

> Yeah you could, but it looks back to front to me.
> 
> The VM tells the filesystem that the machine took a fault at virtual
> address X, then the filesystem asks the VM what pgoff that is, then
> tells the VM to install the corresponding page to vaddr X.
> 
> With my ->fault, the VM asks the filesystem to give the page that
> corresponds to vaddr X, then installs it into that vaddr.

Err, sorry, that's what the current ->nopage does. It is then still
up to the filesystem to do the vaddr to pgoff conversion.

My fault patches of course just ask the filesystem for the page at
a given pgoff.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
@ 2007-03-21  7:36               ` Nick Piggin
  0 siblings, 0 replies; 68+ messages in thread
From: Nick Piggin @ 2007-03-21  7:36 UTC (permalink / raw)
  To: Nick Piggin
  Cc: William Lee Irwin III, Adam Litke, Andrew Morton,
	Arjan van de Ven, Christoph Hellwig, Ken Chen, linux-mm,
	linux-kernel, Linus Torvalds

Nick Piggin wrote:

> Yeah you could, but it looks back to front to me.
> 
> The VM tells the filesystem that the machine took a fault at virtual
> address X, then the filesystem asks the VM what pgoff that is, then
> tells the VM to install the corresponding page to vaddr X.
> 
> With my ->fault, the VM asks the filesystem to give the page that
> corresponds to vaddr X, then installs it into that vaddr.

Err, sorry, that's what the current ->nopage does. It is then still
up to the filesystem to do the vaddr to pgoff conversion.

My fault patches of course just ask the filesystem for the page at
a given pgoff.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
  2007-03-21  5:41           ` William Lee Irwin III
@ 2007-03-21  6:51             ` Nick Piggin
  -1 siblings, 0 replies; 68+ messages in thread
From: Nick Piggin @ 2007-03-21  6:51 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig,
	Ken Chen, linux-mm, linux-kernel, Linus Torvalds

William Lee Irwin III wrote:
> William Lee Irwin III wrote:
> 
>>>ISTR potential ppc64 users coming out of the woodwork for something I
>>>didn't recognize the name of, but I may be confusing that with your
>>>patch. I can implement additional users (and useful ones at that)
>>>needing this in particular if desired.
> 
> 
> On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote:
> 
>>Yes I would be interested in seeing useful additional users of this
>>that cannot use our regular virtual memory, before making it a general
>>thing.
>>I just don't want to see proliferation of these things, if possible.
> 
> 
> I'm tied up elsewhere so I won't get to it in a timely fashion. Maybe
> in a few weeks I can start up on the first two of the bunch.

Care to give us a hint? :)


> William Lee Irwin III wrote:
> 
>>>Two fault handling methods callbacks raise an eyebrow over here at least.
>>>I was vaguely hoping for unification of the fault handling callbacks.
> 
> 
> On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote:
> 
>>I don't know if it would be so clean to do that as they are at different 
>>levels.
>>Adam's fault is before the VM translation (and bypasses it), and mine is 
>>after.
> 
> 
> Not much of a VM translation; it's just a lookup through the software
> mocked-up structures on everything save i386, x86_64, and some m68k where
> they're the same thing only with hardware walkers (ISTR ia64's being
> firmware a la Alpha despite the "HPW" name, though I could be wrong)

Well the vma+pagetables *are* our VM translation data structure. It is
a good data structure. The Gelato/UNSW guys experimenting with changing
this have basically said they haven't yet got anything that beats it.

I would be opposed to anything that bypasses that unless a) it is not
applicable to the VM as a whole, and b) it is really worth it
(hugepages was a reasonable exception).


> reliant on them. The drivers/etc. could just as easily use helper
> functions to carry out the lookup, thereby accomplishing the
> unification. There's nothing particularly fundamental about a pte
> lookup.

Yeah you could, but it looks back to front to me.

The VM tells the filesystem that the machine took a fault at virtual
address X, then the filesystem asks the VM what pgoff that is, then
tells the VM to install the corresponding page to vaddr X.

With my ->fault, the VM asks the filesystem to give the page that
corresponds to vaddr X, then installs it into that vaddr.


> Normal arches that do software TLB refill could just as easily
> consult the radix trees dangled off struct address_space or any old
> data structure floating around the kernel with enough information to
> translate user virtual addresses to the physical addresses they need to
> fill the TLB with, and there are other kernels that literally do things
> like that.

Sure it *could* be done, but it may not be very nice, given Linux's
design. And you definitely need _something_ other than just the
pagecache radix-tree, because the VM needs to know who maps the page.

So if, for your backing store, you use a small hash table and evict old
entries like powerpc, you'll constantly be faulting in and out pages
from the VM's high level view of the address space. That isn't a really
cheap operation. It takes at least:

read_lock_irq(mapping->tree_lock);
radix_tree_lookup()
read_unlock_irq(mapping->tree_lock);
lock_page()
atomic_add(page->_count)
atomic_add(page->_mapcount)
unlock_page()

atomic_add_negative(page->_mapcount)
atomic_dec_and_test(page->_count)

Compared to our current page table walk which is just a single locked
op + barrier for the spinlock + radix tree walk.


If you had a very large hash table (ia64 long mode, maybe?), then you
may have slightly fewer high level faults, but range based operations
are going to take a whole lot of cache misses, aren't they? Especially
for small processes.

Not that I wouldn't be happy to be proven wrong, but I don't think it
should be something that sneaks in under these pagetable operations.
IMO.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
@ 2007-03-21  6:51             ` Nick Piggin
  0 siblings, 0 replies; 68+ messages in thread
From: Nick Piggin @ 2007-03-21  6:51 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig,
	Ken Chen, linux-mm, linux-kernel, Linus Torvalds

William Lee Irwin III wrote:
> William Lee Irwin III wrote:
> 
>>>ISTR potential ppc64 users coming out of the woodwork for something I
>>>didn't recognize the name of, but I may be confusing that with your
>>>patch. I can implement additional users (and useful ones at that)
>>>needing this in particular if desired.
> 
> 
> On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote:
> 
>>Yes I would be interested in seeing useful additional users of this
>>that cannot use our regular virtual memory, before making it a general
>>thing.
>>I just don't want to see proliferation of these things, if possible.
> 
> 
> I'm tied up elsewhere so I won't get to it in a timely fashion. Maybe
> in a few weeks I can start up on the first two of the bunch.

Care to give us a hint? :)


> William Lee Irwin III wrote:
> 
>>>Two fault handling methods callbacks raise an eyebrow over here at least.
>>>I was vaguely hoping for unification of the fault handling callbacks.
> 
> 
> On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote:
> 
>>I don't know if it would be so clean to do that as they are at different 
>>levels.
>>Adam's fault is before the VM translation (and bypasses it), and mine is 
>>after.
> 
> 
> Not much of a VM translation; it's just a lookup through the software
> mocked-up structures on everything save i386, x86_64, and some m68k where
> they're the same thing only with hardware walkers (ISTR ia64's being
> firmware a la Alpha despite the "HPW" name, though I could be wrong)

Well the vma+pagetables *are* our VM translation data structure. It is
a good data structure. The Gelato/UNSW guys experimenting with changing
this have basically said they haven't yet got anything that beats it.

I would be opposed to anything that bypasses that unless a) it is not
applicable to the VM as a whole, and b) it is really worth it
(hugepages was a reasonable exception).


> reliant on them. The drivers/etc. could just as easily use helper
> functions to carry out the lookup, thereby accomplishing the
> unification. There's nothing particularly fundamental about a pte
> lookup.

Yeah you could, but it looks back to front to me.

The VM tells the filesystem that the machine took a fault at virtual
address X, then the filesystem asks the VM what pgoff that is, then
tells the VM to install the corresponding page to vaddr X.

With my ->fault, the VM asks the filesystem to give the page that
corresponds to vaddr X, then installs it into that vaddr.


> Normal arches that do software TLB refill could just as easily
> consult the radix trees dangled off struct address_space or any old
> data structure floating around the kernel with enough information to
> translate user virtual addresses to the physical addresses they need to
> fill the TLB with, and there are other kernels that literally do things
> like that.

Sure it *could* be done, but it may not be very nice, given Linux's
design. And you definitely need _something_ other than just the
pagecache radix-tree, because the VM needs to know who maps the page.

So if, for your backing store, you use a small hash table and evict old
entries like powerpc, you'll constantly be faulting in and out pages
from the VM's high level view of the address space. That isn't a really
cheap operation. It takes at least:

read_lock_irq(mapping->tree_lock);
radix_tree_lookup()
read_unlock_irq(mapping->tree_lock);
lock_page()
atomic_add(page->_count)
atomic_add(page->_mapcount)
unlock_page()

atomic_add_negative(page->_mapcount)
atomic_dec_and_test(page->_count)

Compared to our current page table walk which is just a single locked
op + barrier for the spinlock + radix tree walk.


If you had a very large hash table (ia64 long mode, maybe?), then you
may have slightly fewer high level faults, but range based operations
are going to take a whole lot of cache misses, aren't they? Especially
for small processes.

Not that I wouldn't be happy to be proven wrong, but I don't think it
should be something that sneaks in under these pagetable operations.
IMO.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
  2007-03-21  5:07         ` Nick Piggin
@ 2007-03-21  5:41           ` William Lee Irwin III
  -1 siblings, 0 replies; 68+ messages in thread
From: William Lee Irwin III @ 2007-03-21  5:41 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig,
	Ken Chen, linux-mm, linux-kernel

William Lee Irwin III wrote:
>> ISTR potential ppc64 users coming out of the woodwork for something I
>> didn't recognize the name of, but I may be confusing that with your
>> patch. I can implement additional users (and useful ones at that)
>> needing this in particular if desired.

On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote:
> Yes I would be interested in seeing useful additional users of this
> that cannot use our regular virtual memory, before making it a general
> thing.
> I just don't want to see proliferation of these things, if possible.

I'm tied up elsewhere so I won't get to it in a timely fashion. Maybe
in a few weeks I can start up on the first two of the bunch.


William Lee Irwin III wrote:
>> Two fault handling methods callbacks raise an eyebrow over here at least.
>> I was vaguely hoping for unification of the fault handling callbacks.

On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote:
> I don't know if it would be so clean to do that as they are at different 
> levels.
> Adam's fault is before the VM translation (and bypasses it), and mine is 
> after.

Not much of a VM translation; it's just a lookup through the software
mocked-up structures on everything save i386, x86_64, and some m68k where
they're the same thing only with hardware walkers (ISTR ia64's being
firmware a la Alpha despite the "HPW" name, though I could be wrong)
reliant on them. The drivers/etc. could just as easily use helper
functions to carry out the lookup, thereby accomplishing the
unification. There's nothing particularly fundamental about a pte
lookup. Normal arches that do software TLB refill could just as easily
consult the radix trees dangled off struct address_space or any old
data structure floating around the kernel with enough information to
translate user virtual addresses to the physical addresses they need to
fill the TLB with, and there are other kernels that literally do things
like that.

Basically, drop in to the ->fault() callback with no attempt at a pte
lookup. The drivers using the standard pagetable format can call helper
functions to do all the gruntwork surrounding that for them. Then the
more sophisticated drivers can do the necessary work by hand.

But others should really be consulted on this point. My notions in/around
this area tend to be outside the mainstream. I can anticipate that the
two ->fault() functions will look strange to people, but not what
alternatives would be most idiomatic to mainstream Linux conventions.


-- wli

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
@ 2007-03-21  5:41           ` William Lee Irwin III
  0 siblings, 0 replies; 68+ messages in thread
From: William Lee Irwin III @ 2007-03-21  5:41 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig,
	Ken Chen, linux-mm, linux-kernel

William Lee Irwin III wrote:
>> ISTR potential ppc64 users coming out of the woodwork for something I
>> didn't recognize the name of, but I may be confusing that with your
>> patch. I can implement additional users (and useful ones at that)
>> needing this in particular if desired.

On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote:
> Yes I would be interested in seeing useful additional users of this
> that cannot use our regular virtual memory, before making it a general
> thing.
> I just don't want to see proliferation of these things, if possible.

I'm tied up elsewhere so I won't get to it in a timely fashion. Maybe
in a few weeks I can start up on the first two of the bunch.


William Lee Irwin III wrote:
>> Two fault handling methods callbacks raise an eyebrow over here at least.
>> I was vaguely hoping for unification of the fault handling callbacks.

On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote:
> I don't know if it would be so clean to do that as they are at different 
> levels.
> Adam's fault is before the VM translation (and bypasses it), and mine is 
> after.

Not much of a VM translation; it's just a lookup through the software
mocked-up structures on everything save i386, x86_64, and some m68k where
they're the same thing only with hardware walkers (ISTR ia64's being
firmware a la Alpha despite the "HPW" name, though I could be wrong)
reliant on them. The drivers/etc. could just as easily use helper
functions to carry out the lookup, thereby accomplishing the
unification. There's nothing particularly fundamental about a pte
lookup. Normal arches that do software TLB refill could just as easily
consult the radix trees dangled off struct address_space or any old
data structure floating around the kernel with enough information to
translate user virtual addresses to the physical addresses they need to
fill the TLB with, and there are other kernels that literally do things
like that.

Basically, drop in to the ->fault() callback with no attempt at a pte
lookup. The drivers using the standard pagetable format can call helper
functions to do all the gruntwork surrounding that for them. Then the
more sophisticated drivers can do the necessary work by hand.

But others should really be consulted on this point. My notions in/around
this area tend to be outside the mainstream. I can anticipate that the
two ->fault() functions will look strange to people, but not what
alternatives would be most idiomatic to mainstream Linux conventions.


-- wli

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
  2007-03-21  4:52       ` William Lee Irwin III
@ 2007-03-21  5:07         ` Nick Piggin
  -1 siblings, 0 replies; 68+ messages in thread
From: Nick Piggin @ 2007-03-21  5:07 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig,
	Ken Chen, linux-mm, linux-kernel

William Lee Irwin III wrote:
> Adam Litke wrote:
> 
>>> 	struct vm_operations_struct * vm_ops;
>>>+	const struct pagetable_operations_struct * pagetable_ops;
> 
> 
> On Wed, Mar 21, 2007 at 03:18:30PM +1100, Nick Piggin wrote:
> 
>>Can you remind me why this isn't in vm_ops?
>>Also, it is going to be hugepage-only, isn't it? So should the naming be
>>changed to reflect that? And #ifdef it...
> 
> 
> ISTR potential ppc64 users coming out of the woodwork for something I
> didn't recognize the name of, but I may be confusing that with your
> patch. I can implement additional users (and useful ones at that)
> needing this in particular if desired.

Yes I would be interested in seeing useful additional users of this
that cannot use our regular virtual memory, before making it a general
thing.

I just don't want to see proliferation of these things, if possible.

> Adam Litke wrote:
> 
>>>+struct pagetable_operations_struct {
>>>+	int (*fault)(struct mm_struct *mm,
> 
> 
> On Wed, Mar 21, 2007 at 03:18:30PM +1100, Nick Piggin wrote:
> 
>>I got dibs on fault ;)
>>My callback is a sanitised one that basically abstracts the details of the
>>virtual memory mapping away, so it is usable by drivers and filesystems.
>>You actually want to bypass the normal fault handling because it doesn't
>>know how to deal with your virtual memory mapping. Hmm, the best suggestion
>>I can come up with is handle_mm_fault... unless you can think of a better
>>name for me to use.
> 
> 
> Two fault handling methods callbacks raise an eyebrow over here at least.
> I was vaguely hoping for unification of the fault handling callbacks.

I don't know if it would be so clean to do that as they are at different levels.
Adam's fault is before the VM translation (and bypasses it), and mine is after.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
@ 2007-03-21  5:07         ` Nick Piggin
  0 siblings, 0 replies; 68+ messages in thread
From: Nick Piggin @ 2007-03-21  5:07 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig,
	Ken Chen, linux-mm, linux-kernel

William Lee Irwin III wrote:
> Adam Litke wrote:
> 
>>> 	struct vm_operations_struct * vm_ops;
>>>+	const struct pagetable_operations_struct * pagetable_ops;
> 
> 
> On Wed, Mar 21, 2007 at 03:18:30PM +1100, Nick Piggin wrote:
> 
>>Can you remind me why this isn't in vm_ops?
>>Also, it is going to be hugepage-only, isn't it? So should the naming be
>>changed to reflect that? And #ifdef it...
> 
> 
> ISTR potential ppc64 users coming out of the woodwork for something I
> didn't recognize the name of, but I may be confusing that with your
> patch. I can implement additional users (and useful ones at that)
> needing this in particular if desired.

Yes I would be interested in seeing useful additional users of this
that cannot use our regular virtual memory, before making it a general
thing.

I just don't want to see proliferation of these things, if possible.

> Adam Litke wrote:
> 
>>>+struct pagetable_operations_struct {
>>>+	int (*fault)(struct mm_struct *mm,
> 
> 
> On Wed, Mar 21, 2007 at 03:18:30PM +1100, Nick Piggin wrote:
> 
>>I got dibs on fault ;)
>>My callback is a sanitised one that basically abstracts the details of the
>>virtual memory mapping away, so it is usable by drivers and filesystems.
>>You actually want to bypass the normal fault handling because it doesn't
>>know how to deal with your virtual memory mapping. Hmm, the best suggestion
>>I can come up with is handle_mm_fault... unless you can think of a better
>>name for me to use.
> 
> 
> Two fault handling methods callbacks raise an eyebrow over here at least.
> I was vaguely hoping for unification of the fault handling callbacks.

I don't know if it would be so clean to do that as they are at different levels.
Adam's fault is before the VM translation (and bypasses it), and mine is after.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
  2007-03-21  4:18     ` Nick Piggin
@ 2007-03-21  4:52       ` William Lee Irwin III
  -1 siblings, 0 replies; 68+ messages in thread
From: William Lee Irwin III @ 2007-03-21  4:52 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig,
	Ken Chen, linux-mm, linux-kernel

Adam Litke wrote:
>>  	struct vm_operations_struct * vm_ops;
>> +	const struct pagetable_operations_struct * pagetable_ops;

On Wed, Mar 21, 2007 at 03:18:30PM +1100, Nick Piggin wrote:
> Can you remind me why this isn't in vm_ops?
> Also, it is going to be hugepage-only, isn't it? So should the naming be
> changed to reflect that? And #ifdef it...

ISTR potential ppc64 users coming out of the woodwork for something I
didn't recognize the name of, but I may be confusing that with your
patch. I can implement additional users (and useful ones at that)
needing this in particular if desired.


Adam Litke wrote:
>> +struct pagetable_operations_struct {
>> +	int (*fault)(struct mm_struct *mm,

On Wed, Mar 21, 2007 at 03:18:30PM +1100, Nick Piggin wrote:
> I got dibs on fault ;)
> My callback is a sanitised one that basically abstracts the details of the
> virtual memory mapping away, so it is usable by drivers and filesystems.
> You actually want to bypass the normal fault handling because it doesn't
> know how to deal with your virtual memory mapping. Hmm, the best suggestion
> I can come up with is handle_mm_fault... unless you can think of a better
> name for me to use.

Two fault handling methods callbacks raise an eyebrow over here at least.
I was vaguely hoping for unification of the fault handling callbacks.


-- wli

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
@ 2007-03-21  4:52       ` William Lee Irwin III
  0 siblings, 0 replies; 68+ messages in thread
From: William Lee Irwin III @ 2007-03-21  4:52 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Adam Litke, Andrew Morton, Arjan van de Ven, Christoph Hellwig,
	Ken Chen, linux-mm, linux-kernel

Adam Litke wrote:
>>  	struct vm_operations_struct * vm_ops;
>> +	const struct pagetable_operations_struct * pagetable_ops;

On Wed, Mar 21, 2007 at 03:18:30PM +1100, Nick Piggin wrote:
> Can you remind me why this isn't in vm_ops?
> Also, it is going to be hugepage-only, isn't it? So should the naming be
> changed to reflect that? And #ifdef it...

ISTR potential ppc64 users coming out of the woodwork for something I
didn't recognize the name of, but I may be confusing that with your
patch. I can implement additional users (and useful ones at that)
needing this in particular if desired.


Adam Litke wrote:
>> +struct pagetable_operations_struct {
>> +	int (*fault)(struct mm_struct *mm,

On Wed, Mar 21, 2007 at 03:18:30PM +1100, Nick Piggin wrote:
> I got dibs on fault ;)
> My callback is a sanitised one that basically abstracts the details of the
> virtual memory mapping away, so it is usable by drivers and filesystems.
> You actually want to bypass the normal fault handling because it doesn't
> know how to deal with your virtual memory mapping. Hmm, the best suggestion
> I can come up with is handle_mm_fault... unless you can think of a better
> name for me to use.

Two fault handling methods callbacks raise an eyebrow over here at least.
I was vaguely hoping for unification of the fault handling callbacks.


-- wli

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
  2007-03-19 20:05   ` Adam Litke
@ 2007-03-21  4:18     ` Nick Piggin
  -1 siblings, 0 replies; 68+ messages in thread
From: Nick Piggin @ 2007-03-21  4:18 UTC (permalink / raw)
  To: Adam Litke
  Cc: Andrew Morton, Arjan van de Ven, William Lee Irwin III,
	Christoph Hellwig, Ken Chen, linux-mm, linux-kernel

Adam Litke wrote:
> Signed-off-by: Adam Litke <agl@us.ibm.com>
> ---
> 
>  include/linux/mm.h |   25 +++++++++++++++++++++++++
>  1 files changed, 25 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 60e0e4a..7089323 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -98,6 +98,7 @@ struct vm_area_struct {
>  
>  	/* Function pointers to deal with this struct. */
>  	struct vm_operations_struct * vm_ops;
> +	const struct pagetable_operations_struct * pagetable_ops;
>  
>  	/* Information about our backing store: */
>  	unsigned long vm_pgoff;		/* Offset (within vm_file) in PAGE_SIZE

Can you remind me why this isn't in vm_ops?

Also, it is going to be hugepage-only, isn't it? So should the naming be
changed to reflect that? And #ifdef it...

> @@ -218,6 +219,30 @@ struct vm_operations_struct {
>  };
>  
>  struct mmu_gather;
> +
> +struct pagetable_operations_struct {
> +	int (*fault)(struct mm_struct *mm,
> +		struct vm_area_struct *vma,
> +		unsigned long address, int write_access);

I got dibs on fault ;)

My callback is a sanitised one that basically abstracts the details of the
virtual memory mapping away, so it is usable by drivers and filesystems.

You actually want to bypass the normal fault handling because it doesn't
know how to deal with your virtual memory mapping. Hmm, the best suggestion
I can come up with is handle_mm_fault... unless you can think of a better
name for me to use.

> +	int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src,
> +		struct vm_area_struct *vma);
> +	int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma,
> +		struct page **pages, struct vm_area_struct **vmas,
> +		unsigned long *position, int *length, int i);
> +	void (*change_protection)(struct vm_area_struct *vma,
> +		unsigned long address, unsigned long end, pgprot_t newprot);
> +	unsigned long (*unmap_page_range)(struct vm_area_struct *vma,
> +		unsigned long address, unsigned long end, long *zap_work);
> +	void (*free_pgtable_range)(struct mmu_gather **tlb,
> +		unsigned long addr, unsigned long end,
> +		unsigned long floor, unsigned long ceiling);
> +};
> +
> +#define has_pt_op(vma, op) \
> +	((vma)->pagetable_ops && (vma)->pagetable_ops->op)
> +#define pt_op(vma, call) \
> +	((vma)->pagetable_ops->call)
> +
>  struct inode;
>  
>  #define page_private(page)		((page)->private)
> 
> --

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
@ 2007-03-21  4:18     ` Nick Piggin
  0 siblings, 0 replies; 68+ messages in thread
From: Nick Piggin @ 2007-03-21  4:18 UTC (permalink / raw)
  To: Adam Litke
  Cc: Andrew Morton, Arjan van de Ven, William Lee Irwin III,
	Christoph Hellwig, Ken Chen, linux-mm, linux-kernel

Adam Litke wrote:
> Signed-off-by: Adam Litke <agl@us.ibm.com>
> ---
> 
>  include/linux/mm.h |   25 +++++++++++++++++++++++++
>  1 files changed, 25 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 60e0e4a..7089323 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -98,6 +98,7 @@ struct vm_area_struct {
>  
>  	/* Function pointers to deal with this struct. */
>  	struct vm_operations_struct * vm_ops;
> +	const struct pagetable_operations_struct * pagetable_ops;
>  
>  	/* Information about our backing store: */
>  	unsigned long vm_pgoff;		/* Offset (within vm_file) in PAGE_SIZE

Can you remind me why this isn't in vm_ops?

Also, it is going to be hugepage-only, isn't it? So should the naming be
changed to reflect that? And #ifdef it...

> @@ -218,6 +219,30 @@ struct vm_operations_struct {
>  };
>  
>  struct mmu_gather;
> +
> +struct pagetable_operations_struct {
> +	int (*fault)(struct mm_struct *mm,
> +		struct vm_area_struct *vma,
> +		unsigned long address, int write_access);

I got dibs on fault ;)

My callback is a sanitised one that basically abstracts the details of the
virtual memory mapping away, so it is usable by drivers and filesystems.

You actually want to bypass the normal fault handling because it doesn't
know how to deal with your virtual memory mapping. Hmm, the best suggestion
I can come up with is handle_mm_fault... unless you can think of a better
name for me to use.

> +	int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src,
> +		struct vm_area_struct *vma);
> +	int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma,
> +		struct page **pages, struct vm_area_struct **vmas,
> +		unsigned long *position, int *length, int i);
> +	void (*change_protection)(struct vm_area_struct *vma,
> +		unsigned long address, unsigned long end, pgprot_t newprot);
> +	unsigned long (*unmap_page_range)(struct vm_area_struct *vma,
> +		unsigned long address, unsigned long end, long *zap_work);
> +	void (*free_pgtable_range)(struct mmu_gather **tlb,
> +		unsigned long addr, unsigned long end,
> +		unsigned long floor, unsigned long ceiling);
> +};
> +
> +#define has_pt_op(vma, op) \
> +	((vma)->pagetable_ops && (vma)->pagetable_ops->op)
> +#define pt_op(vma, call) \
> +	((vma)->pagetable_ops->call)
> +
>  struct inode;
>  
>  #define page_private(page)		((page)->private)
> 
> --

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
  2007-03-19 20:05   ` Adam Litke
@ 2007-03-20 23:24     ` Dave Hansen
  -1 siblings, 0 replies; 68+ messages in thread
From: Dave Hansen @ 2007-03-20 23:24 UTC (permalink / raw)
  To: Adam Litke
  Cc: Andrew Morton, Arjan van de Ven, William Lee Irwin III,
	Christoph Hellwig, Ken Chen, linux-mm, linux-kernel

On Mon, 2007-03-19 at 13:05 -0700, Adam Litke wrote:
> 
> +#define has_pt_op(vma, op) \
> +       ((vma)->pagetable_ops && (vma)->pagetable_ops->op)
> +#define pt_op(vma, call) \
> +       ((vma)->pagetable_ops->call) 

Can you get rid of these macros?  I think they make it a wee bit harder
to read.  My brain doesn't properly parse the foo(arg)(bar) syntax.  

+       if (has_pt_op(vma, copy_vma))
+               return pt_op(vma, copy_vma)(dst_mm, src_mm, vma);

+       if (vma->pagetable_ops && vma->pagetable_ops->copy_vma)
+               return vma->pagetable_ops->copy_vma(dst_mm, src_mm, vma);

I guess it does lead to some longish lines.  Does it start looking
really nasty?

If you're going to have them, it might just be best to put a single
unlikely() around the macro definitions themselves to keep anybody from
having to open-code it for any of the users.  

-- Dave


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
@ 2007-03-20 23:24     ` Dave Hansen
  0 siblings, 0 replies; 68+ messages in thread
From: Dave Hansen @ 2007-03-20 23:24 UTC (permalink / raw)
  To: Adam Litke
  Cc: Andrew Morton, Arjan van de Ven, William Lee Irwin III,
	Christoph Hellwig, Ken Chen, linux-mm, linux-kernel

On Mon, 2007-03-19 at 13:05 -0700, Adam Litke wrote:
> 
> +#define has_pt_op(vma, op) \
> +       ((vma)->pagetable_ops && (vma)->pagetable_ops->op)
> +#define pt_op(vma, call) \
> +       ((vma)->pagetable_ops->call) 

Can you get rid of these macros?  I think they make it a wee bit harder
to read.  My brain doesn't properly parse the foo(arg)(bar) syntax.  

+       if (has_pt_op(vma, copy_vma))
+               return pt_op(vma, copy_vma)(dst_mm, src_mm, vma);

+       if (vma->pagetable_ops && vma->pagetable_ops->copy_vma)
+               return vma->pagetable_ops->copy_vma(dst_mm, src_mm, vma);

I guess it does lead to some longish lines.  Does it start looking
really nasty?

If you're going to have them, it might just be best to put a single
unlikely() around the macro definitions themselves to keep anybody from
having to open-code it for any of the users.  

-- Dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
  2007-03-19 20:05 [PATCH 0/7] [RFC] hugetlb: pagetable_operations API (V2) Adam Litke
@ 2007-03-19 20:05   ` Adam Litke
  0 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-03-19 20:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Adam Litke, Arjan van de Ven, William Lee Irwin III,
	Christoph Hellwig, Ken Chen, linux-mm, linux-kernel


Signed-off-by: Adam Litke <agl@us.ibm.com>
---

 include/linux/mm.h |   25 +++++++++++++++++++++++++
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 60e0e4a..7089323 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -98,6 +98,7 @@ struct vm_area_struct {
 
 	/* Function pointers to deal with this struct. */
 	struct vm_operations_struct * vm_ops;
+	const struct pagetable_operations_struct * pagetable_ops;
 
 	/* Information about our backing store: */
 	unsigned long vm_pgoff;		/* Offset (within vm_file) in PAGE_SIZE
@@ -218,6 +219,30 @@ struct vm_operations_struct {
 };
 
 struct mmu_gather;
+
+struct pagetable_operations_struct {
+	int (*fault)(struct mm_struct *mm,
+		struct vm_area_struct *vma,
+		unsigned long address, int write_access);
+	int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src,
+		struct vm_area_struct *vma);
+	int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma,
+		struct page **pages, struct vm_area_struct **vmas,
+		unsigned long *position, int *length, int i);
+	void (*change_protection)(struct vm_area_struct *vma,
+		unsigned long address, unsigned long end, pgprot_t newprot);
+	unsigned long (*unmap_page_range)(struct vm_area_struct *vma,
+		unsigned long address, unsigned long end, long *zap_work);
+	void (*free_pgtable_range)(struct mmu_gather **tlb,
+		unsigned long addr, unsigned long end,
+		unsigned long floor, unsigned long ceiling);
+};
+
+#define has_pt_op(vma, op) \
+	((vma)->pagetable_ops && (vma)->pagetable_ops->op)
+#define pt_op(vma, call) \
+	((vma)->pagetable_ops->call)
+
 struct inode;
 
 #define page_private(page)		((page)->private)

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.
@ 2007-03-19 20:05   ` Adam Litke
  0 siblings, 0 replies; 68+ messages in thread
From: Adam Litke @ 2007-03-19 20:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Adam Litke, Arjan van de Ven, William Lee Irwin III,
	Christoph Hellwig, Ken Chen, linux-mm, linux-kernel

Signed-off-by: Adam Litke <agl@us.ibm.com>
---

 include/linux/mm.h |   25 +++++++++++++++++++++++++
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 60e0e4a..7089323 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -98,6 +98,7 @@ struct vm_area_struct {
 
 	/* Function pointers to deal with this struct. */
 	struct vm_operations_struct * vm_ops;
+	const struct pagetable_operations_struct * pagetable_ops;
 
 	/* Information about our backing store: */
 	unsigned long vm_pgoff;		/* Offset (within vm_file) in PAGE_SIZE
@@ -218,6 +219,30 @@ struct vm_operations_struct {
 };
 
 struct mmu_gather;
+
+struct pagetable_operations_struct {
+	int (*fault)(struct mm_struct *mm,
+		struct vm_area_struct *vma,
+		unsigned long address, int write_access);
+	int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src,
+		struct vm_area_struct *vma);
+	int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma,
+		struct page **pages, struct vm_area_struct **vmas,
+		unsigned long *position, int *length, int i);
+	void (*change_protection)(struct vm_area_struct *vma,
+		unsigned long address, unsigned long end, pgprot_t newprot);
+	unsigned long (*unmap_page_range)(struct vm_area_struct *vma,
+		unsigned long address, unsigned long end, long *zap_work);
+	void (*free_pgtable_range)(struct mmu_gather **tlb,
+		unsigned long addr, unsigned long end,
+		unsigned long floor, unsigned long ceiling);
+};
+
+#define has_pt_op(vma, op) \
+	((vma)->pagetable_ops && (vma)->pagetable_ops->op)
+#define pt_op(vma, call) \
+	((vma)->pagetable_ops->call)
+
 struct inode;
 
 #define page_private(page)		((page)->private)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 68+ messages in thread

end of thread, other threads:[~2007-03-21 23:37 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-19 18:31 [PATCH 0/7] [RFC] hugetlb: pagetable_operations API Adam Litke
2007-02-19 18:31 ` Adam Litke
2007-02-19 18:31 ` [PATCH 1/7] Introduce the pagetable_operations and associated helper macros Adam Litke
2007-02-19 18:31   ` Adam Litke
2007-02-19 18:41   ` Arjan van de Ven
2007-02-19 18:41     ` Arjan van de Ven
2007-02-19 19:31     ` Adam Litke
2007-02-19 19:31       ` Adam Litke
2007-02-19 19:48   ` William Lee Irwin III
2007-02-19 19:48     ` William Lee Irwin III
2007-02-19 22:29   ` Christoph Hellwig
2007-02-19 22:29     ` Christoph Hellwig
2007-02-20 15:50     ` Mel Gorman
2007-02-20 15:50       ` Mel Gorman
2007-02-19 18:31 ` [PATCH 2/7] copy_vma for hugetlbfs Adam Litke
2007-02-19 18:31   ` Adam Litke
2007-02-19 18:31 ` [PATCH 3/7] pin_pages for hugetlb Adam Litke
2007-02-19 18:31   ` Adam Litke
2007-02-19 18:32 ` [PATCH 4/7] unmap_page_range " Adam Litke
2007-02-19 18:32   ` Adam Litke
2007-02-19 18:32 ` [PATCH 5/7] change_protection " Adam Litke
2007-02-19 18:32   ` Adam Litke
2007-02-19 18:32 ` [PATCH 6/7] free_pgtable_range " Adam Litke
2007-02-19 18:32   ` Adam Litke
2007-02-19 18:32 ` [PATCH 7/7] hugetlbfs fault handler Adam Litke
2007-02-19 18:32   ` Adam Litke
2007-02-19 18:43 ` [PATCH 0/7] [RFC] hugetlb: pagetable_operations API Arjan van de Ven
2007-02-19 18:43   ` Arjan van de Ven
2007-02-19 19:34   ` Adam Litke
2007-02-19 19:34     ` Adam Litke
2007-02-19 21:15     ` Arjan van de Ven
2007-02-19 21:15       ` Arjan van de Ven
2007-02-20 19:57       ` Benjamin Herrenschmidt
2007-02-20 19:57         ` Benjamin Herrenschmidt
2007-02-20 19:54   ` Benjamin Herrenschmidt
2007-02-20 19:54     ` Benjamin Herrenschmidt
2007-03-19 20:05 [PATCH 0/7] [RFC] hugetlb: pagetable_operations API (V2) Adam Litke
2007-03-19 20:05 ` [PATCH 1/7] Introduce the pagetable_operations and associated helper macros Adam Litke
2007-03-19 20:05   ` Adam Litke
2007-03-20 23:24   ` Dave Hansen
2007-03-20 23:24     ` Dave Hansen
2007-03-21 14:50     ` Adam Litke
2007-03-21 14:50       ` Adam Litke
2007-03-21 15:05       ` Arjan van de Ven
2007-03-21 15:05         ` Arjan van de Ven
2007-03-21  4:18   ` Nick Piggin
2007-03-21  4:18     ` Nick Piggin
2007-03-21  4:52     ` William Lee Irwin III
2007-03-21  4:52       ` William Lee Irwin III
2007-03-21  5:07       ` Nick Piggin
2007-03-21  5:07         ` Nick Piggin
2007-03-21  5:41         ` William Lee Irwin III
2007-03-21  5:41           ` William Lee Irwin III
2007-03-21  6:51           ` Nick Piggin
2007-03-21  6:51             ` Nick Piggin
2007-03-21  7:36             ` Nick Piggin
2007-03-21  7:36               ` Nick Piggin
2007-03-21 10:46             ` William Lee Irwin III
2007-03-21 10:46               ` William Lee Irwin III
2007-03-21 15:17     ` Adam Litke
2007-03-21 15:17       ` Adam Litke
2007-03-21 16:00       ` Christoph Hellwig
2007-03-21 16:00         ` Christoph Hellwig
2007-03-21 23:03         ` Nick Piggin
2007-03-21 23:03           ` Nick Piggin
2007-03-21 23:02       ` Nick Piggin
2007-03-21 23:02         ` Nick Piggin
2007-03-21 23:32         ` William Lee Irwin III
2007-03-21 23:32           ` William Lee Irwin III

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.