linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Huge TLB pages always physically continious?
@ 2002-11-01 22:56 Ingo Oeser
  2002-11-01 23:23 ` Andrew Morton
  0 siblings, 1 reply; 4+ messages in thread
From: Ingo Oeser @ 2002-11-01 22:56 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2087 bytes --]

Hi there,

are huge TLB pages always physically continous in memory?

What does follow_hugetlb_page do exactly? I simply don't
understand what the code does.

I would like to build up a simplified get_user_pages_sgl() to
build a scatter gather list from user space adresses.

If I want to coalesce physically continous pages (if they are
also virtually continious) anyway, can I write up a simplified
follow_hugetlb_page_sgl() function which handles the huge page
really as only one page?

Motivation:

Currently doing scatter gather DMA of user pages requires THREE
runs over the pages and I would like to save at least the second
one and possibly shorten the third one.

The three steps required:

   1) get_user_pages() to obtain the pages and lock them in page_cache
   2) translate the vector of pointers to struct page to a vector
      of struct scatterlist
   3) pci_map_sg() a decent amount[1], DMA it, wait for completion 
      or abortion, pci_unmap_sg() it and start again with the remainder

Step 2) could be eliminated completely and also the allocation of
the temporary vector of struct page.

Step 3) could be shortened, if I coalesce physically contiguous
ranges into a single scatterlist entry with just a ->length
bigger than PAGE_SIZE. I know that this is only worth it on
architectures, where physical address == bus address.

As each step is a for() loop and should be considered running on
more than 1MB worth of memory, I see significant improvements.

Without supporting huge TLB pages, I only add 700 bytes to the
kernel while simply copying get_user_pages() into a function,
which takes an vector of struct scatterlist instead of struct
page.

This sounds like a promising tradeoff for a first time implementation.

Patch attached. No users yet, but they will follow. First
candidate is the v4l DMA stuff.

Regards

Ingo Oeser

[1] How much can I safely map on the strange architectures, where
   this is limited? AFAIK there is no value or function telling
   me how far I can go.
-- 
Science is what we can tell a computer. Art is everything else. --- D.E.Knuth

[-- Attachment #2: get_user_pages_sgl.patch --]
[-- Type: text/plain, Size: 2791 bytes --]

diff -Naur linux-2.5.44/kernel/ksyms.c linux-2.5.44-ioe/kernel/ksyms.c
--- linux-2.5.44/kernel/ksyms.c	Sat Oct 19 06:01:08 2002
+++ linux-2.5.44-ioe/kernel/ksyms.c	Fri Nov  1 23:12:48 2002
@@ -136,6 +136,7 @@
 EXPORT_SYMBOL(page_address);
 #endif
 EXPORT_SYMBOL(get_user_pages);
+EXPORT_SYMBOL(get_user_pages_sgl);
 
 /* filesystem internal functions */
 EXPORT_SYMBOL(def_blk_fops);
diff -Naur linux-2.5.44/mm/memory.c linux-2.5.44-ioe/mm/memory.c
--- linux-2.5.44/mm/memory.c	Sat Oct 19 06:01:52 2002
+++ linux-2.5.44-ioe/mm/memory.c	Fri Nov  1 23:48:42 2002
@@ -49,6 +49,7 @@
 #include <asm/uaccess.h>
 #include <asm/tlb.h>
 #include <asm/tlbflush.h>
+#include <asm/scatterlist.h>
 
 #include <linux/swapops.h>
 
@@ -514,6 +515,85 @@
 }
 
 
+int get_user_pages_sgl(struct task_struct *tsk, struct mm_struct *mm,
+		unsigned long start, int len, int write,
+		struct scatterlist **sgl)
+{
+	int i;
+	unsigned int flags;
+
+	/* Without this structure, it makes no sense to call this */
+	BUG_ON(!sgl);
+
+	/* 
+	 * Require read or write permissions.
+	 */
+	flags = write ? VM_WRITE : VM_READ;
+	i = 0;
+
+	do {
+		struct vm_area_struct *	vma;
+
+		vma = find_extend_vma(mm, start);
+
+		if (!vma || (vma->vm_flags & VM_IO)
+				|| !(flags & vma->vm_flags))
+			return i ? : -EFAULT;
+
+		/* Doesn't work with huge pages! */
+		BUG_ON(is_vm_hugetlb_page(vma));
+		
+		spin_lock(&mm->page_table_lock);
+		do {
+			struct page *map;
+			while (!(map = follow_page(mm, start, write))) {
+				spin_unlock(&mm->page_table_lock);
+				switch (handle_mm_fault(mm,vma,start,write)) {
+				case VM_FAULT_MINOR:
+					tsk->min_flt++;
+					break;
+				case VM_FAULT_MAJOR:
+					tsk->maj_flt++;
+					break;
+				case VM_FAULT_SIGBUS:
+					return i ? i : -EFAULT;
+				case VM_FAULT_OOM:
+					return i ? i : -ENOMEM;
+				default:
+					BUG();
+				}
+				spin_lock(&mm->page_table_lock);
+			}
+			sgl[i]->page = get_page_map(map);
+			if (!sgl[i]->page) {
+				spin_unlock(&mm->page_table_lock);
+				while (i--)
+					page_cache_release(sgl[i]->page);
+				i = -EFAULT;
+				goto out;
+			}
+			if (!PageReserved(sgl[i]->page))
+				page_cache_get(sgl[i]->page);
+			
+			/* TODO: Do coalescing of physically continious pages
+			 * here
+			 */
+			sgl[i]->offset=0;
+			sgl[i]->length=PAGE_SIZE;
+
+			i++;
+			start += PAGE_SIZE;
+			len--;
+		} while(len && start < vma->vm_end);
+		spin_unlock(&mm->page_table_lock);
+	} while(len);
+	
+	/* This might be pointless, if start is always aligned to pages */
+	sgl[0]->offset=start & ~PAGE_MASK;
+	sgl[0]->length=PAGE_SIZE - (start & ~PAGE_MASK);
+out:
+	return i;
+}
 int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		unsigned long start, int len, int write, int force,
 		struct page **pages, struct vm_area_struct **vmas)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Huge TLB pages always physically continious?
  2002-11-01 22:56 Huge TLB pages always physically continious? Ingo Oeser
@ 2002-11-01 23:23 ` Andrew Morton
  2002-11-02  8:29   ` Jeff Garzik
  2002-11-04 13:49   ` get_user_pages rewrite (was: Huge TLB pages always physically continious?) Ingo Oeser
  0 siblings, 2 replies; 4+ messages in thread
From: Andrew Morton @ 2002-11-01 23:23 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: linux-mm, linux-kernel

Ingo Oeser wrote:
> 
> Hi there,
> 
> are huge TLB pages always physically continous in memory?

Yes.

> What does follow_hugetlb_page do exactly? I simply don't
> understand what the code does.

It allows get_user_pages() to work correctly across hugepage
regions.  It walks a chunk of memory which is covered by
hugepages and installs (at *pages) the list of 4k-pages which
are covered by the hugepage.  So

 |--------------------------------------------------|  <- hugepage
 |--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|  <- 4k pages

 get_user_pages(   ^here                   ^to here)

 will install the spanned 4k pages into the caller's pages[]
 array.
 
> I would like to build up a simplified get_user_pages_sgl() to
> build a scatter gather list from user space adresses.
> 
> If I want to coalesce physically continous pages (if they are
> also virtually continious) anyway, can I write up a simplified
> follow_hugetlb_page_sgl() function which handles the huge page
> really as only one page?

I suggest that you restructure get_user_pages thusly:

1: Write a simplified get_user_page().  Most callers of get_user_pages()
   only want a single page anyway, and don't need to concoct all those
   arguments.

2: Split get_user_pages up into a pagetable walker and a callback function.
   So it walks the pages, calling back to the caller's callback function
   for each page with

	(*callback)(struct page *page, <other stuff>, void *callerdata);

   You'll need to extend follow_hugetlb_page() to take the callback
   info and to perform the callbacks for its pages as well.

3: Reimplement the current get_user_pages() using the core engine from 2
   (ie: write the callback for it)

4: Implement your sg engine using the walker+callback arrangement.  This
   way, you can do your coalescing on-the-fly, and you only take one
   pass across the pages list and you do not need to know about hugepages
   at all.   Sure you'll do a *little* more work than you need to,  but
   not having that special case is nicer.

5: Fix up the ia64 follow_hugetlb_page too.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Huge TLB pages always physically continious?
  2002-11-01 23:23 ` Andrew Morton
@ 2002-11-02  8:29   ` Jeff Garzik
  2002-11-04 13:49   ` get_user_pages rewrite (was: Huge TLB pages always physically continious?) Ingo Oeser
  1 sibling, 0 replies; 4+ messages in thread
From: Jeff Garzik @ 2002-11-02  8:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Oeser, linux-mm, linux-kernel

Now that hugetlbfs is merged, can we remove the hugetlb syscalls? 
 Pretty please?  ;-)

What I've heard in the background is that all the Big Users(tm) of 
hugetlbs greatly prefer the existing syscalls (a.k.a. hugetlbfs) to 
adding support to new ones in the various userland portability layers in 
use...

    Jeff





^ permalink raw reply	[flat|nested] 4+ messages in thread

* get_user_pages rewrite (was: Huge TLB pages always physically continious?)
  2002-11-01 23:23 ` Andrew Morton
  2002-11-02  8:29   ` Jeff Garzik
@ 2002-11-04 13:49   ` Ingo Oeser
  1 sibling, 0 replies; 4+ messages in thread
From: Ingo Oeser @ 2002-11-04 13:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel

Hi Andrew,

On Fri, Nov 01, 2002 at 03:23:02PM -0800, Andrew Morton wrote:
> I suggest that you restructure get_user_pages thusly:

Ok, first step follows.

> 
> 1: Write a simplified get_user_page().  Most callers of get_user_pages()
>    only want a single page anyway, and don't need to concoct all those
>    arguments.
> 
> 2: Split get_user_pages up into a pagetable walker and a callback function.
>    So it walks the pages, calling back to the caller's callback function
>    for each page with
> 
> 	(*callback)(struct page *page, <other stuff>, void *callerdata);
> 
>    You'll need to extend follow_hugetlb_page() to take the callback
>    info and to perform the callbacks for its pages as well.
> 
> 3: Reimplement the current get_user_pages() using the core engine from 2
>    (ie: write the callback for it)
 
CONFIG_HUGETLB_PAGES is currently broken by this patch, but will
be fixed NOW.

Patch implementing 1-3 of your 5 points is attached.

This also enables get_user_pages() callers to do more cleanup, if
things failed, which wasn't ensured before (namely pagecache
references were not always released on errors).

The OBSOLETE_PAGE_WALKER define is there to catch callers, which
use the vmas-pointer of get_user_pages.

There are 3 kinds of callers in the kernel:

   a) Caller needs a VMA and only ONE page of it
      -> will be converted to get_one_user_page() and must do
         find_extend_vma() itself. Since it needs a vma anyway,
         it can also look for it, right? ;-)

   b) Caller doesn't need VMA, but will take care of pages
      passed. 
      -> Will be converted to NEW variant get_user_pages(), which
         will not support the "vmas" parameter anymore.
         
         But since we pass the VMA anyway for unlocking the
         page_table_lock of the MM it belongs to, any user
         outside of the kernel can use its own custom_page_walker_t
         and account them, too.

   c) Caller does need neither VMA nor pages
      -> is converted to use walk_user_pages()

         This caller was mm/memory.c:make_pages_present and is
         covered by the patch attached already.

My problems are now with the handling of follow_hugetlb_pages().

1) follow_hugetlb_pages() doesn't need the page_table_lock, but the
   custom_page_walker_t expects it to be locked and will unlock
   it on error. But since everything inside follow_hugetlb_pages()
   is happening quite fast (it cannot sleep), there should be not
   much contention.

2) It looks beneficial to implement follow_one_hugetlb_page(),
   which doesn't need the page_table_lock and will only look up
   one small page. But it has to be implemented for every
   hugetlb_page implementation.

   Is this acceptable, because it covers the common case?


Thanks for your input so far. It has been very helpful.

Patch compiles so far. Testing will be done, after I converted
all users and the generic design is ACKed ;-)

Regards

Ingo Oeser

diff -Naur --exclude=.err --exclude=.out --exclude=[.].* linux-2.5.44/include/linux/mm.h linux-2.5.44-ioe/include/linux/mm.h
--- linux-2.5.44/include/linux/mm.h	Sat Oct 19 06:01:08 2002
+++ linux-2.5.44-ioe/include/linux/mm.h	Mon Nov  4 13:53:35 2002
@@ -370,9 +370,47 @@
 extern int make_pages_present(unsigned long addr, unsigned long end);
 extern int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, int len, int write);
 
+/*** Page walking API ***/
+
+/* &custom_page_walker - A custom page walk handler for walk_user_pages().
+ * vma:         The vma we walk pages of.
+ * page:        The page we found or an %ERR_PTR() value
+ * virt_addr:   The virtual address we are at while walking.
+ * customdata:  Anything you would like to pass additionally.
+ *
+ * Returns:
+ *      Negative values -> ERRNO values.
+ *      0               -> continue page walking.
+ *      1               -> abort page walking.
+ *
+ * If this functions gets a page, for which %IS_ERR(@page) is true, than it
+ * should do it's cleanup of customdata and return -PTR_ERR(@page).
+ *
+ * This function is called with @vma->vm_mm->page_table_lock held,
+ * if IS_ERR(@vma) is not true.
+ *
+ * But if IS_ERR(@vma) is true, IS_ERR(@page) is also true, since if we have no
+ * vma, then we also have no user space page.
+ *
+ * If it returns a negative value, then the page_table_lock must be dropped
+ * by this function, if it is held.
+ */
+typedef int (*custom_page_walker_t)(struct vm_area_struct *vma, 
+		struct page *page, unsigned long virt_addr, void *customdata);
+
 extern struct page * follow_page(struct mm_struct *mm, unsigned long address, int write);
-int get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned long start,
-		int len, int write, int force, struct page **pages, struct vm_area_struct **vmas);
+
+struct page *get_one_user_page(struct task_struct *tsk, 
+		struct mm_struct *mm, struct vm_area_struct *vma,
+		unsigned long start, int write, int force);
+
+int walk_user_pages(struct task_struct *tsk, struct mm_struct *mm,
+		unsigned long start, int write, int force,
+		custom_page_walker_t walker, void *customdata);
+
+int get_user_pages(struct task_struct *tsk, struct mm_struct *mm, 
+		unsigned long start, int len, int write, int force, 
+		struct page **pages, struct vm_area_struct **vmas);
 
 int __set_page_dirty_buffers(struct page *page);
 int __set_page_dirty_nobuffers(struct page *page);
diff -Naur --exclude=.err --exclude=.out --exclude=[.].* linux-2.5.44/mm/memory.c linux-2.5.44-ioe/mm/memory.c
--- linux-2.5.44/mm/memory.c	Sat Oct 19 06:01:52 2002
+++ linux-2.5.44-ioe/mm/memory.c	Mon Nov  4 14:32:39 2002
@@ -35,6 +35,10 @@
  * 16.07.99  -  Support of BIGMEM added by Gerhard Wichert, Siemens AG
  *		(Gerhard.Wichert@pdb.siemens.de)
  */
+/* 04.11.02  -  Page walker API added by Ingo Oeser 
+ * 		<ioe@informatik.tu-chemnitz.de>
+ * 		Thanks go to Andrew Morton for his intial idea and general help.
+ */
 
 #include <linux/kernel_stat.h>
 #include <linux/mm.h>
@@ -505,20 +509,203 @@
  * it?  This may become more complex in the future if we start dealing
  * with IO-aperture pages for direct-IO.
  */
-
 static inline struct page *get_page_map(struct page *page)
 {
 	if (!pfn_valid(page_to_pfn(page)))
-		return 0;
+		return ERR_PTR(EFAULT);
 	return page;
 }
 
+/* Simple page walk handler adding pages to a list of them */
+struct gup_add_pages {
+	unsigned int count;
+	unsigned int max_pages;
+	struct page **pages;
+};
 
-int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
-		unsigned long start, int len, int write, int force,
-		struct page **pages, struct vm_area_struct **vmas)
+static inline void gup_pages_cleanup(struct gup_add_pages *gup)
+{
+	while (gup->count--) {
+		page_cache_release(gup->pages[gup->count]);
+	}
+}
+
+/* Follows custom_page_walker API description */
+static int gup_add_pages(struct vm_area_struct *vma, struct page *page,
+	      unsigned long virt_addr, void *customdata)
+{
+
+	struct gup_add_pages *gup = customdata;
+
+	BUG_ON(!customdata);
+
+	if (!IS_ERR(page)) {
+		gup->pages[gup->count++] = page;
+		if (!PageReserved(page))
+			page_cache_get(page);
+
+		/* Abort if we cannot hold more pages */
+		return (gup->count == gup->max_pages) ? 1 : 0;
+	}
+
+	if (!IS_ERR(vma))
+		spin_unlock(&vma->vm_mm->page_table_lock);
+	gup_pages_cleanup(gup);
+	return -PTR_ERR(page);
+}
+
+#define OBSOLETE_PAGE_WALKER
+#ifdef OBSOLETE_PAGE_WALKER
+/* Obsolete page walk handler adding pages and vmas to a list of them */
+struct gup_add_pv {
+	unsigned int page_count;
+	unsigned int max_pages;
+	struct page **pages;
+	unsigned int vma_count;
+	unsigned int max_vmas;
+	struct vm_area_struct **vmas;
+};
+
+static inline void gup_pv_cleanup(struct gup_add_pv *gup)
+{
+	while (gup->page_count--) {
+		page_cache_release(gup->pages[gup->page_count]);
+	}
+}
+
+/* Follows custom_page_walker API description */
+static int gup_add_pv(struct vm_area_struct *vma, struct page *page,
+	   unsigned long virt_addr, void *customdata)
+{
+
+	struct gup_add_pv *gup = customdata;
+	int ret = 0;
+
+	BUG_ON(!customdata);
+
+	if (!IS_ERR(page)) {
+		if (gup->vmas) {
+			/* Add vma only, if its a new one. Since we walk them
+			 * uniquely, this simple check is enough. -ioe 
+			 */
+			if (!gup->vma_count
+			    || gup->vmas[gup->vma_count - 1] != vma) {
+				gup->vmas[gup->vma_count++] = vma;
+
+				/* Abort scanning, if we cannot hold more */
+				if (gup->vma_count == gup->max_vmas)
+					ret = 1;
+			}
+		}
+
+		if (gup->pages) {
+			gup->pages[gup->page_count++] = page;
+			if (!PageReserved(page))
+				page_cache_get(page);
+
+			/* Abort scanning, if we cannot hold more */
+			if (gup->page_count == gup->max_pages)
+				ret = 1;
+		}
+		return ret;
+	}
+
+	if (!IS_ERR(vma))
+		spin_unlock(&vma->vm_mm->page_table_lock);
+	gup_pv_cleanup(gup);
+	return -PTR_ERR(page);
+}
+#endif				/* OBSOLETE_PAGE_WALKER */
+
+/* Try to fault in the page at START. Returns valid page or ERR_PTR().
+ *
+ * called with mm->page_table_lock held 
+ */
+static struct page *single_page_walk(struct task_struct *tsk,
+				     struct mm_struct *mm,
+				     struct vm_area_struct *vma,
+				     unsigned long start, int write)
+{
+	struct page *map;
+
+	while (!(map = follow_page(mm, start, write))) {
+		int fault;
+
+		spin_unlock(&mm->page_table_lock);
+		fault = handle_mm_fault(mm, vma, start, write);
+		spin_lock(&mm->page_table_lock);
+
+		switch (fault) {
+		case VM_FAULT_MINOR:
+			tsk->min_flt++;
+			break;
+		case VM_FAULT_MAJOR:
+			tsk->maj_flt++;
+			break;
+		case VM_FAULT_SIGBUS:
+			return ERR_PTR(EFAULT);
+		case VM_FAULT_OOM:
+			return ERR_PTR(ENOMEM);
+		default:
+			/* FIXME Is this unlock better or worse here? -ioe */
+			spin_unlock(&mm->page_table_lock);
+			BUG();
+		}
+	}
+	return get_page_map(map);
+}
+
+/* VMA contains already "start".
+ * (e.g. find_vma_extend(mm,start) has been called sucessfully already 
+ */
+struct page *get_one_user_page(struct task_struct *tsk,
+			   struct mm_struct *mm, struct vm_area_struct *vma,
+			   unsigned long start, int write, int force)
+{
+	unsigned int flags;
+	struct page *page;
+
+	/* 
+	 * Require read or write permissions.
+	 * If 'force' is set, we only require the "MAY" flags.
+	 */
+	flags = write ? (VM_WRITE | VM_MAYWRITE) : (VM_READ | VM_MAYREAD);
+	flags &= force ? (VM_MAYREAD | VM_MAYWRITE) : (VM_READ | VM_WRITE);
+
+	if (!vma || (vma->vm_flags & VM_IO) || !(flags & vma->vm_flags))
+		return ERR_PTR(EFAULT);
+
+	/* FIXME: These are not handled properly, yet. -ioe */
+	/*
+	   if (is_vm_hugetlb_page(vma)) {
+	   int len = 1;
+	   int i;
+	   i = follow_hugetlb_page(mm, vma, &page, NULL,
+	   &start, &len, 0);
+	   return (i == 1) ? page : ERR_PTR(EFAULT);
+	   }
+	 */
+
+	spin_lock(&mm->page_table_lock);
+	page = single_page_walk(tsk, mm, vma, start, write);
+
+	if (!(IS_ERR(page) || PageReserved(page)))
+		page_cache_get(page);
+
+	spin_unlock(&mm->page_table_lock);
+	return page;
+}
+
+#ifdef CONFIG_HUGETLB_PAGE
+#error This code is not suitable for huge pages yet.
+#endif
+
+/* Returns 0 or negative errno value */
+int walk_user_pages(struct task_struct *tsk, struct mm_struct *mm,
+		unsigned long start, int write, int force,
+		custom_page_walker_t walker, void *customdata)
 {
-	int i;
+	int ret;
 	unsigned int flags;
 
 	/* 
@@ -527,65 +714,100 @@
 	 */
 	flags = write ? (VM_WRITE | VM_MAYWRITE) : (VM_READ | VM_MAYREAD);
 	flags &= force ? (VM_MAYREAD | VM_MAYWRITE) : (VM_READ | VM_WRITE);
-	i = 0;
 
 	do {
-		struct vm_area_struct *	vma;
+		struct vm_area_struct *vma;
 
 		vma = find_extend_vma(mm, start);
 
-		if (!vma || (pages && (vma->vm_flags & VM_IO))
-				|| !(flags & vma->vm_flags))
-			return i ? : -EFAULT;
-
-		if (is_vm_hugetlb_page(vma)) {
-			i = follow_hugetlb_page(mm, vma, pages, vmas,
-						&start, &len, i);
-			continue;
-		}
+		if (!vma || (vma->vm_flags & VM_IO)
+		    || !(flags & vma->vm_flags))
+			return walker(ERR_PTR(EFAULT), ERR_PTR(EFAULT), start,
+				      customdata);
+
+		/* FIXME: These are not handled, yet. -ioe */
+		/*
+		   if (is_vm_hugetlb_page(vma)) {
+		   int i=0;
+		   i = follow_hugetlb_page(mm, vma, pages, vmas,
+		   &start, &len, i);
+		   continue;
+		   }
+		 */
 		spin_lock(&mm->page_table_lock);
 		do {
-			struct page *map;
-			while (!(map = follow_page(mm, start, write))) {
+			struct page *page;
+			page = single_page_walk(tsk, mm, vma, start, write);
+			ret = walker(vma, page, start, customdata);
+			switch (ret) {
+				/* Common case -> continue walking. */
+			case 0:
+				break;
+
+				/* We are satisfied with our walking. */
+			case 1:
+				ret = 0;
 				spin_unlock(&mm->page_table_lock);
-				switch (handle_mm_fault(mm,vma,start,write)) {
-				case VM_FAULT_MINOR:
-					tsk->min_flt++;
-					break;
-				case VM_FAULT_MAJOR:
-					tsk->maj_flt++;
-					break;
-				case VM_FAULT_SIGBUS:
-					return i ? i : -EFAULT;
-				case VM_FAULT_OOM:
-					return i ? i : -ENOMEM;
-				default:
-					BUG();
-				}
-				spin_lock(&mm->page_table_lock);
-			}
-			if (pages) {
-				pages[i] = get_page_map(map);
-				if (!pages[i]) {
-					spin_unlock(&mm->page_table_lock);
-					while (i--)
-						page_cache_release(pages[i]);
-					i = -EFAULT;
-					goto out;
-				}
-				if (!PageReserved(pages[i]))
-					page_cache_get(pages[i]);
+				/* Fall trough now */
+
+				/* Bail out because of error. */
+			default:
+				/* Error cases do unlock, 
+				 * if necessary. -ioe */
+				goto out;
 			}
-			if (vmas)
-				vmas[i] = vma;
-			i++;
 			start += PAGE_SIZE;
-			len--;
-		} while(len && start < vma->vm_end);
+		} while (start < vma->vm_end);
 		spin_unlock(&mm->page_table_lock);
-	} while(len);
-out:
-	return i;
+	} while (1);
+      out:
+	return ret;
+}
+
+/* Ugly for now, but the defines and the union will go later. -ioe */
+int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
+	       unsigned long start, int len, int write, int force,
+	       struct page **pages, struct vm_area_struct **vmas)
+{
+	int ret;
+	custom_page_walker_t walker = gup_add_pages;
+	union {
+		struct gup_add_pages pg;
+#ifdef OBSOLETE_PAGE_WALKER
+		struct gup_add_pv pv;
+#endif
+	} gup_u;
+
+	memset(&gup_u, 0, sizeof (gup_u));
+
+#ifdef OBSOLETE_PAGE_WALKER
+	if (vmas) {
+		gup_u.pv.vmas = vmas;
+		gup_u.pv.max_vmas = len;
+		walker = gup_add_pv;
+		printk("Obsolete argument \"vmas\" used!"
+		       " Please send this report to linux-mm@vger.kernel.org"
+		       " or fix the caller. Stack trace follows...\n");
+		WARN_ON(vmas);
+	}
+#else
+	/* FIXME: Or should we simply ignore it? -ioe */
+	BUG_ON(vmas);
+#endif
+
+	/* Warn on non-sense calls, but process them. -ioe */
+	WARN_ON(!vmas && !pages);
+
+	if (pages) {
+		gup_u.pg.max_pages = len;
+		gup_u.pg.pages = pages;
+	}
+
+	ret = walk_user_pages(tsk, mm, start, write, force, walker, &gup_u);
+	if (ret == 0) {
+		ret = gup_u.pg.count;
+	}
+	return ret;
 }
 
 static inline void zeromap_pte_range(pte_t * pte, unsigned long address,
@@ -1294,10 +1516,30 @@
 	return pmd_offset(pgd, address);
 }
 
+
+/* A page walker, which just counts down how many pages it got */
+static int
+gup_mk_present(struct vm_area_struct *vma, struct page *page,
+	       unsigned long virt_addr, void *customdata)
+{
+
+	int *todo = customdata;
+
+	if (!IS_ERR(page)) {
+		(*todo)--;
+		/* Abort if have made all required pages present */
+		return (*todo) ? 0 : 1;
+	}
+
+	if (!IS_ERR(vma))
+		spin_unlock(&vma->vm_mm->page_table_lock);
+	return -PTR_ERR(page);
+}
+
 int make_pages_present(unsigned long addr, unsigned long end)
 {
 	int ret, len, write;
-	struct vm_area_struct * vma;
+	struct vm_area_struct *vma;
 
 	vma = find_vma(current->mm, addr);
 	write = (vma->vm_flags & VM_WRITE) != 0;
@@ -1305,10 +1547,18 @@
 		BUG();
 	if (end > vma->vm_end)
 		BUG();
-	len = (end+PAGE_SIZE-1)/PAGE_SIZE-addr/PAGE_SIZE;
-	ret = get_user_pages(current, current->mm, addr,
-			len, write, 0, NULL, NULL);
-	return ret == len ? 0 : -1;
+	len = (end + PAGE_SIZE - 1) / PAGE_SIZE - addr / PAGE_SIZE;
+
+	/* This is necessary for gup_mk_present to work and
+	 * also a slight optimization. -ioe 
+	 */
+	if (len == 0)
+		return 0;
+
+	ret = walk_user_pages(current, current->mm, addr,
+			      write, 0, gup_mk_present, &len);
+
+	return (ret == 0 && len == 0) ? 0 : -1;
 }
 
 /* 

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2002-11-04 13:44 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-11-01 22:56 Huge TLB pages always physically continious? Ingo Oeser
2002-11-01 23:23 ` Andrew Morton
2002-11-02  8:29   ` Jeff Garzik
2002-11-04 13:49   ` get_user_pages rewrite (was: Huge TLB pages always physically continious?) Ingo Oeser

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).