* [patch 0/8] x86 PAT: track pfnmap mappings with remap_pfn_range and vm_insert_pfn
@ 2008-11-12 21:26 Venkatesh Pallipadi
  2008-11-12 21:26 ` [patch 1/8] x86 PAT: store vm_pgoff for all linear_over_vma_region mappings Venkatesh Pallipadi
                   ` (7 more replies)
  0 siblings, 8 replies; 27+ messages in thread
From: Venkatesh Pallipadi @ 2008-11-12 21:26 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Nick Piggin,
	Hugh Dickins, Roland Dreier, Jesse Barnes, Jeremy Fitzhardinge,
	Arjan van de Ven
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

(Sorry about the duplicates. Resending as the original mails did not make it to
 the list due to some errors in my email header)

Drivers use mmap followed by pgprot_* and remap_pfn_range or vm_insert_pfn,
in order to export reserved memory to userspace. Currently, such mappings are
not tracked and hence not kept consistent with other mappings (/dev/mem,
pci resource, ioremap) for the same memory that may exist in the system.

The following patchset adds x86 PAT attribute tracking and untracking for
pfnmap related APIs.

The first four patches in the patchset change the generic mm code to
support this tracking. The last four patches are x86 specific, making things
work with the x86 PAT code. The patchset also introduces a pgprot_writecombine
interface, which gives a write-combine mapping when enabled, falling back to
pgprot_noncached otherwise.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>

-- 



* [patch 1/8] x86 PAT: store vm_pgoff for all linear_over_vma_region mappings
  2008-11-12 21:26 [patch 0/8] x86 PAT: track pfnmap mappings with remap_pfn_range and vm_insert_pfn Venkatesh Pallipadi
@ 2008-11-12 21:26 ` Venkatesh Pallipadi
  2008-11-12 21:26 ` [patch 2/8] x86 PAT: set VM_PFNMAP flag in vm_insert_pfn Venkatesh Pallipadi
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Venkatesh Pallipadi @ 2008-11-12 21:26 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Nick Piggin,
	Hugh Dickins, Roland Dreier, Jesse Barnes, Jeremy Fitzhardinge,
	Arjan van de Ven
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: mm_vm_pgoff_changes.patch --]
[-- Type: text/plain, Size: 3166 bytes --]

Drivers use mmap followed by pgprot_* and remap_pfn_range or vm_insert_pfn,
in order to export reserved memory to userspace. Currently, such mappings are
not tracked and hence not kept consistent with other mappings (/dev/mem,
pci resource, ioremap) for the same memory that may exist in the system.

The following patchset adds x86 PAT attribute tracking and untracking for
pfnmap related APIs.

The first four patches in the patchset change the generic mm code to
support this tracking. The last four patches are x86 specific, making things
work with the x86 PAT code. The patchset also introduces a pgprot_writecombine
interface, which gives a write-combine mapping when enabled, falling back to
pgprot_noncached otherwise.

This patch:

While working on x86 PAT, we faced some hurdles with tracking
remap_pfn_range() regions, as we do not have any information to say
whether a PFNMAP mapping is linear over the entire vma range or
consists of smaller granularity regions within the vma.

A simple solution to this is to use vm_pgoff as an indicator of a
linear mapping over the vma region. Currently, remap_pfn_range()
only sets vm_pgoff for COW mappings. The patch below changes the
logic and sets vm_pgoff irrespective of COW. This will still not
be enough for the case where the pfn is zero (vma region mapped to
physical address zero). But for all other cases, we can look at
pfnmap VMAs and say whether the mapping covers the entire vma region
or not.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>

---
 include/linux/mm.h |    9 +++++++++
 mm/memory.c        |    7 +++----
 2 files changed, 12 insertions(+), 4 deletions(-)

Index: tip/mm/memory.c
===================================================================
--- tip.orig/mm/memory.c	2008-11-06 09:44:18.000000000 -0800
+++ tip/mm/memory.c	2008-11-10 09:44:49.000000000 -0800
@@ -1575,11 +1575,10 @@ int remap_pfn_range(struct vm_area_struc
 	 * behaviour that some programs depend on. We mark the "original"
 	 * un-COW'ed pages by matching them up with "vma->vm_pgoff".
 	 */
-	if (is_cow_mapping(vma->vm_flags)) {
-		if (addr != vma->vm_start || end != vma->vm_end)
-			return -EINVAL;
+	if (addr == vma->vm_start && end == vma->vm_end)
 		vma->vm_pgoff = pfn;
-	}
+	else if (is_cow_mapping(vma->vm_flags))
+		return -EINVAL;
 
 	vma->vm_flags |= VM_IO | VM_RESERVED | VM_PFNMAP;
 
Index: tip/include/linux/mm.h
===================================================================
--- tip.orig/include/linux/mm.h	2008-11-06 09:44:18.000000000 -0800
+++ tip/include/linux/mm.h	2008-11-10 09:44:47.000000000 -0800
@@ -145,6 +145,15 @@ extern pgprot_t protection_map[16];
 #define FAULT_FLAG_WRITE	0x01	/* Fault was a write access */
 #define FAULT_FLAG_NONLINEAR	0x02	/* Fault was via a nonlinear mapping */
 
+static inline int is_linear_pfn_mapping(struct vm_area_struct *vma)
+{
+	return ((vma->vm_flags & VM_PFNMAP) && vma->vm_pgoff);
+}
+
+static inline int is_pfn_mapping(struct vm_area_struct *vma)
+{
+	return (vma->vm_flags & VM_PFNMAP);
+}
 
 /*
  * vm_fault is filled by the the pagefault handler and passed to the vma's

-- 



* [patch 2/8] x86 PAT: set VM_PFNMAP flag in vm_insert_pfn
  2008-11-12 21:26 [patch 0/8] x86 PAT: track pfnmap mappings with remap_pfn_range and vm_insert_pfn Venkatesh Pallipadi
  2008-11-12 21:26 ` [patch 1/8] x86 PAT: store vm_pgoff for all linear_over_vma_region mappings Venkatesh Pallipadi
@ 2008-11-12 21:26 ` Venkatesh Pallipadi
  2008-11-12 23:23   ` Nick Piggin
  2008-11-12 21:26 ` [patch 3/8] x86 PAT: Add follow_pfnmap_pte routine to help tracking pfnmap pages Venkatesh Pallipadi
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Venkatesh Pallipadi @ 2008-11-12 21:26 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Nick Piggin,
	Hugh Dickins, Roland Dreier, Jesse Barnes, Jeremy Fitzhardinge,
	Arjan van de Ven
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: pfnmap_flag.patch --]
[-- Type: text/plain, Size: 726 bytes --]

vm_insert_pfn() is not setting the VM_PFNMAP flag in vma. Fix that.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>

---
 mm/memory.c |    2 ++
 1 file changed, 2 insertions(+)

Index: tip/mm/memory.c
===================================================================
--- tip.orig/mm/memory.c	2008-11-06 09:44:56.000000000 -0800
+++ tip/mm/memory.c	2008-11-10 09:44:47.000000000 -0800
@@ -1444,6 +1444,8 @@ int vm_insert_pfn(struct vm_area_struct 
 
 	if (addr < vma->vm_start || addr >= vma->vm_end)
 		return -EFAULT;
+
+	vma->vm_flags |= VM_PFNMAP;
 	return insert_pfn(vma, addr, pfn, vma->vm_page_prot);
 }
 EXPORT_SYMBOL(vm_insert_pfn);

-- 



* [patch 3/8] x86 PAT: Add follow_pfnmap_pte routine to help tracking pfnmap pages
  2008-11-12 21:26 [patch 0/8] x86 PAT: track pfnmap mappings with remap_pfn_range and vm_insert_pfn Venkatesh Pallipadi
  2008-11-12 21:26 ` [patch 1/8] x86 PAT: store vm_pgoff for all linear_over_vma_region mappings Venkatesh Pallipadi
  2008-11-12 21:26 ` [patch 2/8] x86 PAT: set VM_PFNMAP flag in vm_insert_pfn Venkatesh Pallipadi
@ 2008-11-12 21:26 ` Venkatesh Pallipadi
  2008-11-12 23:27   ` Nick Piggin
  2008-11-12 21:26 ` [patch 4/8] x86 PAT: hooks in generic vm code to help archs to track pfnmap regions Venkatesh Pallipadi
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Venkatesh Pallipadi @ 2008-11-12 21:26 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Nick Piggin,
	Hugh Dickins, Roland Dreier, Jesse Barnes, Jeremy Fitzhardinge,
	Arjan van de Ven
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: follow_pfn.patch --]
[-- Type: text/plain, Size: 2372 bytes --]


Add a generic interface to follow a pfn in a pfnmap vma range. This is used by
one of the subsequent x86 PAT related patches to keep track of memory types
for vma regions across vma copy and free.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>

---
 include/linux/mm.h |    3 +++
 mm/memory.c        |   43 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

Index: tip/include/linux/mm.h
===================================================================
--- tip.orig/include/linux/mm.h	2008-11-06 15:41:43.000000000 -0800
+++ tip/include/linux/mm.h	2008-11-10 09:44:45.000000000 -0800
@@ -1223,6 +1223,9 @@ struct page *follow_page(struct vm_area_
 #define FOLL_GET	0x04	/* do get_page on page */
 #define FOLL_ANON	0x08	/* give ZERO_PAGE if no pgtable */
 
+unsigned long follow_pfnmap_pte(struct vm_area_struct *vma,
+				unsigned long address, pte_t *ret_ptep);
+
 typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
 			void *data);
 extern int apply_to_page_range(struct mm_struct *mm, unsigned long address,
Index: tip/mm/memory.c
===================================================================
--- tip.orig/mm/memory.c	2008-11-06 15:41:43.000000000 -0800
+++ tip/mm/memory.c	2008-11-10 09:44:45.000000000 -0800
@@ -1111,6 +1111,49 @@ no_page_table:
 	return page;
 }
 
+unsigned long follow_pfnmap_pte(struct vm_area_struct *vma,
+			unsigned long address, pte_t *ret_ptep)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *ptep, pte;
+	spinlock_t *ptl;
+	struct page *page;
+	struct mm_struct *mm = vma->vm_mm;
+
+	if (!is_pfn_mapping(vma))
+		goto err;
+
+	page = NULL;
+	pgd = pgd_offset(mm, address);
+	if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
+		goto err;
+
+	pud = pud_offset(pgd, address);
+	if (pud_none(*pud) || unlikely(pud_bad(*pud)))
+		goto err;
+
+	pmd = pmd_offset(pud, address);
+	if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
+		goto err;
+
+	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
+
+	pte = *ptep;
+	if (!pte_present(pte))
+		goto err_unlock;
+
+	*ret_ptep = pte;
+	pte_unmap_unlock(ptep, ptl);
+	return 0;
+
+err_unlock:
+	pte_unmap_unlock(ptep, ptl);
+err:
+	return -1;
+}
+
 /* Can we do the FOLL_ANON optimization? */
 static inline int use_zero_page(struct vm_area_struct *vma)
 {

-- 



* [patch 4/8] x86 PAT: hooks in generic vm code to help archs to track pfnmap regions
  2008-11-12 21:26 [patch 0/8] x86 PAT: track pfnmap mappings with remap_pfn_range and vm_insert_pfn Venkatesh Pallipadi
                   ` (2 preceding siblings ...)
  2008-11-12 21:26 ` [patch 3/8] x86 PAT: Add follow_pfnmap_pte routine to help tracking pfnmap pages Venkatesh Pallipadi
@ 2008-11-12 21:26 ` Venkatesh Pallipadi
  2008-12-16 19:57   ` Andrew Morton
  2008-11-12 21:26 ` [patch 5/8] x86 PAT: Implement track/untrack of pfnmap regions for x86 Venkatesh Pallipadi
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Venkatesh Pallipadi @ 2008-11-12 21:26 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Nick Piggin,
	Hugh Dickins, Roland Dreier, Jesse Barnes, Jeremy Fitzhardinge,
	Arjan van de Ven
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: generic_pfn_range_tracking.patch --]
[-- Type: text/plain, Size: 3998 bytes --]

Introduce generic hooks in remap_pfn_range and vm_insert_pfn, and hook up the
corresponding copy and free routines, with reserve and free tracking.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>

---
 include/linux/mm.h |    6 +++++
 mm/memory.c        |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 59 insertions(+), 1 deletion(-)

Index: tip/mm/memory.c
===================================================================
--- tip.orig/mm/memory.c	2008-11-11 10:10:11.000000000 -0800
+++ tip/mm/memory.c	2008-11-11 12:10:18.000000000 -0800
@@ -99,6 +99,28 @@ int randomize_va_space __read_mostly =
 					2;
 #endif
 
+#ifndef track_pfn_vma_new
+int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
+			unsigned long pfn, unsigned long size)
+{
+	return 0;
+}
+#endif
+
+#ifndef track_pfn_vma_copy
+int track_pfn_vma_copy(struct vm_area_struct *vma)
+{
+	return 0;
+}
+#endif
+
+#ifndef untrack_pfn_vma
+void untrack_pfn_vma(struct vm_area_struct *vma, unsigned long pfn,
+			unsigned long size)
+{
+}
+#endif
+
 static int __init disable_randmaps(char *s)
 {
 	randomize_va_space = 0;
@@ -669,6 +691,16 @@ int copy_page_range(struct mm_struct *ds
 	if (is_vm_hugetlb_page(vma))
 		return copy_hugetlb_page_range(dst_mm, src_mm, vma);
 
+	if (is_pfn_mapping(vma)) {
+		/*
+		 * We do not free on error cases below as remove_vma
+		 * gets called on error from higher level routine
+		 */
+		ret = track_pfn_vma_copy(vma);
+		if (ret)
+			return ret;
+	}
+
 	/*
 	 * We need to invalidate the secondary MMU mappings only when
 	 * there could be a permission downgrade on the ptes of the
@@ -915,6 +947,9 @@ unsigned long unmap_vmas(struct mmu_gath
 		if (vma->vm_flags & VM_ACCOUNT)
 			*nr_accounted += (end - start) >> PAGE_SHIFT;
 
+		if (is_pfn_mapping(vma))
+			untrack_pfn_vma(vma, 0, 0);
+
 		while (start != end) {
 			if (!tlb_start_valid) {
 				tlb_start = start;
@@ -1473,6 +1508,7 @@ out:
 int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 			unsigned long pfn)
 {
+	int ret;
 	/*
 	 * Technically, architectures with pte_special can avoid all these
 	 * restrictions (same for remap_pfn_range).  However we would like
@@ -1489,7 +1525,15 @@ int vm_insert_pfn(struct vm_area_struct 
 		return -EFAULT;
 
 	vma->vm_flags |= VM_PFNMAP;
-	return insert_pfn(vma, addr, pfn, vma->vm_page_prot);
+	if (track_pfn_vma_new(vma, vma->vm_page_prot, pfn, PAGE_SIZE))
+		return -EINVAL;
+
+	ret = insert_pfn(vma, addr, pfn, vma->vm_page_prot);
+
+	if (ret)
+		untrack_pfn_vma(vma, pfn, PAGE_SIZE);
+
+	return ret;
 }
 EXPORT_SYMBOL(vm_insert_pfn);
 
@@ -1627,6 +1671,10 @@ int remap_pfn_range(struct vm_area_struc
 
 	vma->vm_flags |= VM_IO | VM_RESERVED | VM_PFNMAP;
 
+	err = track_pfn_vma_new(vma, prot, pfn, PAGE_ALIGN(size));
+	if (err)
+		return -EINVAL;
+
 	BUG_ON(addr >= end);
 	pfn -= addr >> PAGE_SHIFT;
 	pgd = pgd_offset(mm, addr);
@@ -1638,6 +1686,10 @@ int remap_pfn_range(struct vm_area_struc
 		if (err)
 			break;
 	} while (pgd++, addr = next, addr != end);
+
+	if (err)
+		untrack_pfn_vma(vma, pfn, PAGE_ALIGN(size));
+
 	return err;
 }
 EXPORT_SYMBOL(remap_pfn_range);
Index: tip/include/linux/mm.h
===================================================================
--- tip.orig/include/linux/mm.h	2008-11-11 10:10:11.000000000 -0800
+++ tip/include/linux/mm.h	2008-11-11 10:10:12.000000000 -0800
@@ -155,6 +155,12 @@ static inline int is_pfn_mapping(struct 
 	return (vma->vm_flags & VM_PFNMAP);
 }
 
+extern int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
+				unsigned long pfn, unsigned long size);
+extern int track_pfn_vma_copy(struct vm_area_struct *vma);
+extern void untrack_pfn_vma(struct vm_area_struct *vma, unsigned long pfn,
+				unsigned long size);
+
 /*
  * vm_fault is filled by the the pagefault handler and passed to the vma's
  * ->fault function. The vma's ->fault is responsible for returning a bitmask

-- 



* [patch 5/8] x86 PAT: Implement track/untrack of pfnmap regions for x86
  2008-11-12 21:26 [patch 0/8] x86 PAT: track pfnmap mappings with remap_pfn_range and vm_insert_pfn Venkatesh Pallipadi
                   ` (3 preceding siblings ...)
  2008-11-12 21:26 ` [patch 4/8] x86 PAT: hooks in generic vm code to help archs to track pfnmap regions Venkatesh Pallipadi
@ 2008-11-12 21:26 ` Venkatesh Pallipadi
  2008-12-16 20:07   ` Andrew Morton
  2008-11-12 21:26 ` [patch 6/8] x86 PAT: change pgprot_noncached to uc_minus instead of strong uc Venkatesh Pallipadi
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Venkatesh Pallipadi @ 2008-11-12 21:26 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Nick Piggin,
	Hugh Dickins, Roland Dreier, Jesse Barnes, Jeremy Fitzhardinge,
	Arjan van de Ven
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: x86_pfn_reservefree.patch --]
[-- Type: text/plain, Size: 6990 bytes --]

Hook up remap_pfn_range and vm_insert_pfn, and the corresponding copy and free
routines, with reserve and free tracking.

reserve and free here only take care of non-RAM region mappings. For a RAM
region, the driver should use set_memory_[uc|wc|wb] to set the cache type and
then set up the mapping for the user pte. We can bypass the
reserve/free below in that case.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>

---
 arch/x86/include/asm/pgtable.h |   10 ++
 arch/x86/mm/pat.c              |  201 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 211 insertions(+)

Index: tip/arch/x86/mm/pat.c
===================================================================
--- tip.orig/arch/x86/mm/pat.c	2008-11-12 09:55:55.000000000 -0800
+++ tip/arch/x86/mm/pat.c	2008-11-12 12:06:50.000000000 -0800
@@ -596,6 +596,207 @@ void unmap_devmem(unsigned long pfn, uns
 	free_memtype(addr, addr + size);
 }
 
+int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t vma_prot)
+{
+	unsigned long flags;
+	unsigned long want_flags = (pgprot_val(vma_prot) & _PAGE_CACHE_MASK);
+
+	int is_ram = 0;
+	int id_sz, ret;
+
+	is_ram = pagerange_is_ram(paddr, paddr + size);
+
+	if (is_ram != 0) {
+		/*
+		 * For mapping RAM pages, drivers need to call
+		 * set_memory_[uc|wc|wb] directly, for reserve and free, before
+		 * setting up the PTE.
+		 */
+		WARN_ON_ONCE(1);
+		return 0;
+	}
+
+	ret = reserve_memtype(paddr, paddr + size, want_flags, &flags);
+	if (ret)
+		return ret;
+
+	if (flags != want_flags) {
+		free_memtype(paddr, paddr + size);
+		printk(KERN_INFO
+		"%s:%d map pfn expected mapping type %s for %Lx-%Lx, got %s\n",
+			current->comm, current->pid,
+			cattr_name(want_flags),
+			paddr, (unsigned long long)(paddr + size),
+			cattr_name(flags));
+		return -EINVAL;
+	}
+
+	/* Need to keep identity mapping in sync */
+	if (paddr >= __pa(high_memory))
+		return 0;
+
+	id_sz = (__pa(high_memory) < paddr + size) ?
+				__pa(high_memory) - paddr :
+				size;
+
+	if (ioremap_change_attr((unsigned long)__va(paddr), id_sz, flags) < 0) {
+		free_memtype(paddr, paddr + size);
+		printk(KERN_INFO
+			"%s:%d reserve_pfn_range ioremap_change_attr failed %s "
+			"for %Lx-%Lx\n",
+			current->comm, current->pid,
+			cattr_name(flags),
+			paddr, (unsigned long long)(paddr + size));
+		return -EINVAL;
+	}
+	return 0;
+}
+
+void free_pfn_range(u64 paddr, unsigned long size)
+{
+	int is_ram;
+
+	is_ram = pagerange_is_ram(paddr, paddr + size);
+	if (is_ram == 0)
+		free_memtype(paddr, paddr + size);
+}
+
+int track_pfn_vma_copy(struct vm_area_struct *vma)
+{
+	int retval = 0;
+	unsigned long i, j;
+	u64 paddr;
+	pgprot_t prot;
+	pte_t pte;
+	unsigned long vma_start = vma->vm_start;
+	unsigned long vma_end = vma->vm_end;
+	unsigned long vma_size = vma_end - vma_start;
+
+	if (!pat_enabled)
+		return 0;
+
+	if (is_linear_pfn_mapping(vma)) {
+		/*
+		 * reserve the whole chunk starting from vm_pgoff,
+		 * But, we have to get the protection from pte.
+		 */
+		if (follow_pfnmap_pte(vma, vma_start, &pte)) {
+			WARN_ON_ONCE(1);
+			return -1;
+		}
+		prot = pte_pgprot(pte);
+		paddr = (u64)vma->vm_pgoff << PAGE_SHIFT;
+		return reserve_pfn_range(paddr, vma_size, prot);
+	}
+
+	/* reserve entire vma page by page, using pfn and prot from pte */
+	for (i = 0; i < vma_size; i += PAGE_SIZE) {
+		if (follow_pfnmap_pte(vma, vma_start + i, &pte))
+			continue;
+
+		paddr = pte_pa(pte);
+		prot = pte_pgprot(pte);
+		retval = reserve_pfn_range(paddr, PAGE_SIZE, prot);
+		if (retval)
+			goto cleanup_ret;
+	}
+	return 0;
+
+cleanup_ret:
+	/* Reserve error: Cleanup partial reservation and return error */
+	for (j = 0; j < i; j += PAGE_SIZE) {
+		if (follow_pfnmap_pte(vma, vma_start + j, &pte))
+			continue;
+
+		paddr = pte_pa(pte);
+		free_pfn_range(paddr, PAGE_SIZE);
+	}
+
+	return retval;
+}
+
+int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
+			unsigned long pfn, unsigned long size)
+{
+	int retval = 0;
+	unsigned long i, j;
+	u64 base_paddr;
+	u64 paddr;
+	unsigned long vma_start = vma->vm_start;
+	unsigned long vma_end = vma->vm_end;
+	unsigned long vma_size = vma_end - vma_start;
+
+	if (!pat_enabled)
+		return 0;
+
+	if (is_linear_pfn_mapping(vma)) {
+		/* reserve the whole chunk starting from vm_pgoff */
+		paddr = (u64)vma->vm_pgoff << PAGE_SHIFT;
+		return reserve_pfn_range(paddr, vma_size, prot);
+	}
+
+	/* reserve page by page using pfn and size */
+	base_paddr = (u64)pfn << PAGE_SHIFT;
+	for (i = 0; i < size; i += PAGE_SIZE) {
+		paddr = base_paddr + i;
+		retval = reserve_pfn_range(paddr, PAGE_SIZE, prot);
+		if (retval)
+			goto cleanup_ret;
+	}
+	return 0;
+
+cleanup_ret:
+	/* Reserve error: Cleanup partial reservation and return error */
+	for (j = 0; j < i; j += PAGE_SIZE) {
+		paddr = base_paddr + j;
+		free_pfn_range(paddr, PAGE_SIZE);
+	}
+
+	return retval;
+}
+
+void untrack_pfn_vma(struct vm_area_struct *vma, unsigned long pfn,
+			unsigned long size)
+{
+	unsigned long i;
+	u64 paddr;
+	unsigned long vma_start = vma->vm_start;
+	unsigned long vma_end = vma->vm_end;
+	unsigned long vma_size = vma_end - vma_start;
+
+	if (!pat_enabled)
+		return;
+
+	if (is_linear_pfn_mapping(vma)) {
+		/* free the whole chunk starting from vm_pgoff */
+		paddr = (u64)vma->vm_pgoff << PAGE_SHIFT;
+		free_pfn_range(paddr, vma_size);
+		return;
+	}
+
+	if (size != 0 && size != vma_size) {
+		/* free page by page, using pfn and size */
+
+		paddr = (u64)pfn << PAGE_SHIFT;
+		for (i = 0; i < size; i += PAGE_SIZE) {
+			paddr = paddr + i;
+			free_pfn_range(paddr, PAGE_SIZE);
+		}
+	} else {
+		/* free entire vma, page by page, using the pfn from pte */
+
+		for (i = 0; i < vma_size; i += PAGE_SIZE) {
+			pte_t pte;
+
+			if (follow_pfnmap_pte(vma, vma_start + i, &pte))
+				continue;
+
+			paddr = pte_pa(pte);
+			free_pfn_range(paddr, PAGE_SIZE);
+		}
+	}
+}
+
 #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_X86_PAT)
 
 /* get Nth element of the linked list */
Index: tip/arch/x86/include/asm/pgtable.h
===================================================================
--- tip.orig/arch/x86/include/asm/pgtable.h	2008-11-12 09:55:55.000000000 -0800
+++ tip/arch/x86/include/asm/pgtable.h	2008-11-12 12:03:43.000000000 -0800
@@ -219,6 +219,11 @@ static inline unsigned long pte_pfn(pte_
 	return (pte_val(pte) & PTE_PFN_MASK) >> PAGE_SHIFT;
 }
 
+static inline u64 pte_pa(pte_t pte)
+{
+	return pte_val(pte) & PTE_PFN_MASK;
+}
+
 #define pte_page(pte)	pfn_to_page(pte_pfn(pte))
 
 static inline int pmd_large(pmd_t pte)
@@ -328,6 +333,11 @@ static inline pgprot_t pgprot_modify(pgp
 
 #define canon_pgprot(p) __pgprot(pgprot_val(p) & __supported_pte_mask)
 
+/* Indicate that x86 has its own track and untrack pfn vma functions */
+#define track_pfn_vma_new track_pfn_vma_new
+#define track_pfn_vma_copy track_pfn_vma_copy
+#define untrack_pfn_vma untrack_pfn_vma
+
 #ifndef __ASSEMBLY__
 #define __HAVE_PHYS_MEM_ACCESS_PROT
 struct file;

-- 



* [patch 6/8] x86 PAT: change pgprot_noncached to uc_minus instead of strong uc
  2008-11-12 21:26 [patch 0/8] x86 PAT: track pfnmap mappings with remap_pfn_range and vm_insert_pfn Venkatesh Pallipadi
                   ` (4 preceding siblings ...)
  2008-11-12 21:26 ` [patch 5/8] x86 PAT: Implement track/untrack of pfnmap regions for x86 Venkatesh Pallipadi
@ 2008-11-12 21:26 ` Venkatesh Pallipadi
  2008-11-12 21:26 ` [patch 7/8] x86 PAT: add pgprot_writecombine() interface for drivers Venkatesh Pallipadi
  2008-11-12 21:26 ` [patch 8/8] x86 PAT: update documentation to cover pgprot and remap_pfn related changes Venkatesh Pallipadi
  7 siblings, 0 replies; 27+ messages in thread
From: Venkatesh Pallipadi @ 2008-11-12 21:26 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Nick Piggin,
	Hugh Dickins, Roland Dreier, Jesse Barnes, Jeremy Fitzhardinge,
	Arjan van de Ven
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: make_pgprot_noncached_ucminus.patch --]
[-- Type: text/plain, Size: 2603 bytes --]

Make pgprot_noncached UC- (uc_minus) instead of strong UC. This brings
pgprot_noncached in line with ioremap_nocache() and all the other
APIs that map a page UC- on a UC request.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>

---
 arch/x86/include/asm/pgtable.h    |    8 ++++++++
 arch/x86/include/asm/pgtable_32.h |    9 ---------
 arch/x86/include/asm/pgtable_64.h |    6 ------
 3 files changed, 8 insertions(+), 15 deletions(-)

Index: tip/arch/x86/include/asm/pgtable.h
===================================================================
--- tip.orig/arch/x86/include/asm/pgtable.h	2008-11-06 15:58:21.000000000 -0800
+++ tip/arch/x86/include/asm/pgtable.h	2008-11-10 09:44:38.000000000 -0800
@@ -158,6 +158,14 @@
 #define PGD_IDENT_ATTR	 0x001		/* PRESENT (no other attributes) */
 #endif
 
+/*
+ * Macro to mark a page protection value as UC-
+ */
+#define pgprot_noncached(prot)					\
+	((boot_cpu_data.x86 > 3)				\
+	 ? (__pgprot(pgprot_val(prot) | _PAGE_CACHE_UC_MINUS))	\
+	 : (prot))
+
 #ifndef __ASSEMBLY__
 
 /*
Index: tip/arch/x86/include/asm/pgtable_32.h
===================================================================
--- tip.orig/arch/x86/include/asm/pgtable_32.h	2008-11-06 09:44:12.000000000 -0800
+++ tip/arch/x86/include/asm/pgtable_32.h	2008-11-06 16:17:11.000000000 -0800
@@ -107,15 +107,6 @@ extern unsigned long pg0[];
 #endif
 
 /*
- * Macro to mark a page protection value as "uncacheable".
- * On processors which do not support it, this is a no-op.
- */
-#define pgprot_noncached(prot)					\
-	((boot_cpu_data.x86 > 3)				\
-	 ? (__pgprot(pgprot_val(prot) | _PAGE_PCD | _PAGE_PWT))	\
-	 : (prot))
-
-/*
  * Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
  */
Index: tip/arch/x86/include/asm/pgtable_64.h
===================================================================
--- tip.orig/arch/x86/include/asm/pgtable_64.h	2008-11-06 09:44:12.000000000 -0800
+++ tip/arch/x86/include/asm/pgtable_64.h	2008-11-06 16:17:11.000000000 -0800
@@ -183,12 +183,6 @@ static inline int pmd_bad(pmd_t pmd)
 #define pages_to_mb(x)	((x) >> (20 - PAGE_SHIFT))   /* FIXME: is this right? */
 
 /*
- * Macro to mark a page protection value as "uncacheable".
- */
-#define pgprot_noncached(prot)					\
-	(__pgprot(pgprot_val((prot)) | _PAGE_PCD | _PAGE_PWT))
-
-/*
  * Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
  */

-- 



* [patch 7/8] x86 PAT: add pgprot_writecombine() interface for drivers
  2008-11-12 21:26 [patch 0/8] x86 PAT: track pfnmap mappings with remap_pfn_range and vm_insert_pfn Venkatesh Pallipadi
                   ` (5 preceding siblings ...)
  2008-11-12 21:26 ` [patch 6/8] x86 PAT: change pgprot_noncached to uc_minus instead of strong uc Venkatesh Pallipadi
@ 2008-11-12 21:26 ` Venkatesh Pallipadi
  2008-11-12 21:26 ` [patch 8/8] x86 PAT: update documentation to cover pgprot and remap_pfn related changes Venkatesh Pallipadi
  7 siblings, 0 replies; 27+ messages in thread
From: Venkatesh Pallipadi @ 2008-11-12 21:26 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Nick Piggin,
	Hugh Dickins, Roland Dreier, Jesse Barnes, Jeremy Fitzhardinge,
	Arjan van de Ven
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: add_pgprot_writecombine.patch --]
[-- Type: text/plain, Size: 2135 bytes --]


Add pgprot_writecombine. pgprot_writecombine will be aliased to
pgprot_noncached when not supported by the architecture.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>

---
 arch/x86/include/asm/pgtable.h |    3 +++
 arch/x86/mm/pat.c              |    8 ++++++++
 include/asm-generic/pgtable.h  |    4 ++++
 3 files changed, 15 insertions(+)

Index: tip/include/asm-generic/pgtable.h
===================================================================
--- tip.orig/include/asm-generic/pgtable.h	2008-11-12 12:03:41.000000000 -0800
+++ tip/include/asm-generic/pgtable.h	2008-11-12 12:09:34.000000000 -0800
@@ -129,6 +129,10 @@ static inline void ptep_set_wrprotect(st
 #define move_pte(pte, prot, old_addr, new_addr)	(pte)
 #endif
 
+#ifndef pgprot_writecombine
+#define pgprot_writecombine pgprot_noncached
+#endif
+
 /*
  * When walking page tables, get the address of the next boundary,
  * or the end address of the range if that comes earlier.  Although no
Index: tip/arch/x86/include/asm/pgtable.h
===================================================================
--- tip.orig/arch/x86/include/asm/pgtable.h	2008-11-12 12:09:31.000000000 -0800
+++ tip/arch/x86/include/asm/pgtable.h	2008-11-12 12:09:34.000000000 -0800
@@ -168,6 +168,9 @@
 
 #ifndef __ASSEMBLY__
 
+#define pgprot_writecombine	pgprot_writecombine
+extern pgprot_t pgprot_writecombine(pgprot_t prot);
+
 /*
  * ZERO_PAGE is a global shared page that is always zero: used
  * for zero-mapped memory areas etc..
Index: tip/arch/x86/mm/pat.c
===================================================================
--- tip.orig/arch/x86/mm/pat.c	2008-11-12 12:06:50.000000000 -0800
+++ tip/arch/x86/mm/pat.c	2008-11-12 12:09:34.000000000 -0800
@@ -797,6 +797,14 @@ void untrack_pfn_vma(struct vm_area_stru
 	}
 }
 
+pgprot_t pgprot_writecombine(pgprot_t prot)
+{
+	if (pat_enabled)
+		return __pgprot(pgprot_val(prot) | _PAGE_CACHE_WC);
+	else
+		return pgprot_noncached(prot);
+}
+
 #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_X86_PAT)
 
 /* get Nth element of the linked list */

-- 



* [patch 8/8] x86 PAT: update documentation to cover pgprot and remap_pfn related changes
  2008-11-12 21:26 [patch 0/8] x86 PAT: track pfnmap mappings with remap_pfn_range and vm_insert_pfn Venkatesh Pallipadi
                   ` (6 preceding siblings ...)
  2008-11-12 21:26 ` [patch 7/8] x86 PAT: add pgprot_writecombine() interface for drivers Venkatesh Pallipadi
@ 2008-11-12 21:26 ` Venkatesh Pallipadi
  7 siblings, 0 replies; 27+ messages in thread
From: Venkatesh Pallipadi @ 2008-11-12 21:26 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Nick Piggin,
	Hugh Dickins, Roland Dreier, Jesse Barnes, Jeremy Fitzhardinge,
	Arjan van de Ven
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: documentation_update.patch --]
[-- Type: text/plain, Size: 1760 bytes --]

Add documentation related to the pgprot_* changes.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
 Documentation/x86/pat.txt |   24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

Index: tip/Documentation/x86/pat.txt
===================================================================
--- tip.orig/Documentation/x86/pat.txt	2008-11-06 09:44:10.000000000 -0800
+++ tip/Documentation/x86/pat.txt	2008-11-06 16:19:05.000000000 -0800
@@ -80,6 +80,30 @@ pci proc               |    --    |    -
                        |          |            |                  |
 -------------------------------------------------------------------
 
+Advanced APIs for drivers
+-------------------------
+A. Exporting pages to user with remap_pfn_range, io_remap_pfn_range,
+vm_insert_pfn
+
+Drivers wanting to export some pages to userspace do it by using the mmap
+interface and a combination of
+1) pgprot_noncached()
+2) io_remap_pfn_range() or remap_pfn_range() or vm_insert_pfn()
+
+With PAT support, a new API, pgprot_writecombine(), is being added. So, drivers
+can continue to use the above sequence, with either pgprot_noncached() or
+pgprot_writecombine() in step 1, followed by step 2.
+
+In addition, step 2 internally tracks the region as UC or WC in the memtype
+list in order to ensure there is no conflicting mapping.
+
+Note that this set of APIs only works with IO (non-RAM) regions. If a driver
+wants to export a RAM region, it has to do set_memory_uc() or set_memory_wc()
+as step 0 above, track the usage of those pages, and use set_memory_wb()
+before the pages are freed to the free pool.
+
+
+
 Notes:
 
 -- in the above table mean "Not suggested usage for the API". Some of the --'s

-- 
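The two-step sequence in the documentation above amounts to the following
kernel-style sketch of a driver ->mmap routine. This is pseudocode, not
compilable as-is: the mydev structure, its fields and want_wc flag are made
up; only pgprot_writecombine(), pgprot_noncached() and remap_pfn_range() are
the kernel APIs being documented.

```c
/* Sketch only -- mydev_* names and fields are hypothetical. */
static int mydev_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct mydev *dev = file->private_data;
	unsigned long size = vma->vm_end - vma->vm_start;

	if (size > dev->region_size)
		return -EINVAL;

	/* Step 1: pick the memory type for the mapping. */
	if (dev->want_wc)
		vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
	else
		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

	/* Step 2: establish the pfn mapping; with this patchset it also
	 * registers the region in the memtype list. */
	return remap_pfn_range(vma, vma->vm_start,
			       dev->phys_base >> PAGE_SHIFT,
			       size, vma->vm_page_prot);
}
```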



* Re: [patch 2/8] x86 PAT: set VM_PFNMAP flag in vm_insert_pfn
  2008-11-12 21:26 ` [patch 2/8] x86 PAT: set VM_PFNMAP flag in vm_insert_pfn Venkatesh Pallipadi
@ 2008-11-12 23:23   ` Nick Piggin
  2008-11-13  0:02     ` Pallipadi, Venkatesh
  2008-11-15  7:38     ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 27+ messages in thread
From: Nick Piggin @ 2008-11-12 23:23 UTC (permalink / raw)
  To: Venkatesh Pallipadi
  Cc: Ingo Molnar, Thomas Gleixner, H.Peter Anvin, Hugh Dickins,
	Roland Dreier, Jesse Barnes, Jeremy Fitzhardinge,
	Arjan van de Ven, linux-kernel, Suresh Siddha

You have to be careful of this, because it can be called with mmap_sem
held for read only. Hmm, I guess vm_insert_page is doing the same thing.
Probably mostly works because all other modifiers of vm_flags are holding
mmap_sem.

However, in some cases, code can do vm_insert_pfn and vm_insert_page
(actually hmm, no vm_insert_mixed actually should cover most of those
cases).

Still, I'd be much happier if we could make these into BUG_ON, and then
teach callers to set it in their .mmap routines.

 
On Wed, Nov 12, 2008 at 01:26:49PM -0800, Venkatesh Pallipadi wrote:
> vm_insert_pfn() is not setting the VM_PFNMAP flag in vma. Fix that.
> 
> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
> 
> ---
>  mm/memory.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> Index: tip/mm/memory.c
> ===================================================================
> --- tip.orig/mm/memory.c	2008-11-06 09:44:56.000000000 -0800
> +++ tip/mm/memory.c	2008-11-10 09:44:47.000000000 -0800
> @@ -1444,6 +1444,8 @@ int vm_insert_pfn(struct vm_area_struct 
>  
>  	if (addr < vma->vm_start || addr >= vma->vm_end)
>  		return -EFAULT;
> +
> +	vma->vm_flags |= VM_PFNMAP;
>  	return insert_pfn(vma, addr, pfn, vma->vm_page_prot);
>  }
>  EXPORT_SYMBOL(vm_insert_pfn);
> 
> -- 


* Re: [patch 3/8] x86 PAT: Add follow_pfnmp_pte routine to help tracking pfnmap pages
  2008-11-12 21:26 ` [patch 3/8] x86 PAT: Add follow_pfnmp_pte routine to help tracking pfnmap pages Venkatesh Pallipadi
@ 2008-11-12 23:27   ` Nick Piggin
  2008-11-12 23:54     ` Pallipadi, Venkatesh
  0 siblings, 1 reply; 27+ messages in thread
From: Nick Piggin @ 2008-11-12 23:27 UTC (permalink / raw)
  To: Venkatesh Pallipadi
  Cc: Ingo Molnar, Thomas Gleixner, H.Peter Anvin, Hugh Dickins,
	Roland Dreier, Jesse Barnes, Jeremy Fitzhardinge,
	Arjan van de Ven, linux-kernel, Suresh Siddha

On Wed, Nov 12, 2008 at 01:26:50PM -0800, Venkatesh Pallipadi wrote:
> 
> Add a generic interface to follow pfn in a pfnmap vma range. This is used by
> one of the subsequent x86 PAT related patch to keep track of memory types
> for vma regions across vma copy and free.
> 
> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
> 
> ---
>  include/linux/mm.h |    3 +++
>  mm/memory.c        |   43 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 46 insertions(+)
> 
> Index: tip/include/linux/mm.h
> ===================================================================
> --- tip.orig/include/linux/mm.h	2008-11-06 15:41:43.000000000 -0800
> +++ tip/include/linux/mm.h	2008-11-10 09:44:45.000000000 -0800
> @@ -1223,6 +1223,9 @@ struct page *follow_page(struct vm_area_
>  #define FOLL_GET	0x04	/* do get_page on page */
>  #define FOLL_ANON	0x08	/* give ZERO_PAGE if no pgtable */
>  
> +unsigned long follow_pfnmap_pte(struct vm_area_struct *vma,
> +				unsigned long address, pte_t *ret_ptep);
> +
>  typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
>  			void *data);
>  extern int apply_to_page_range(struct mm_struct *mm, unsigned long address,
> Index: tip/mm/memory.c
> ===================================================================
> --- tip.orig/mm/memory.c	2008-11-06 15:41:43.000000000 -0800
> +++ tip/mm/memory.c	2008-11-10 09:44:45.000000000 -0800
> @@ -1111,6 +1111,49 @@ no_page_table:
>  	return page;
>  }
>  
> +unsigned long follow_pfnmap_pte(struct vm_area_struct *vma,
> +			unsigned long address, pte_t *ret_ptep)
> +{

I think we'd typically return an int for functions like this. It
probably wouldn't hurt to return more descriptive errors either.
-EINVAL, perhaps, seems to fit best with what existing code there
is doing.

> +	pgd_t *pgd;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	pte_t *ptep, pte;
> +	spinlock_t *ptl;
> +	struct page *page;
> +	struct mm_struct *mm = vma->vm_mm;
> +
> +	if (!is_pfn_mapping(vma))
> +		goto err;
> +
> +	page = NULL;
> +	pgd = pgd_offset(mm, address);
> +	if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
> +		goto err;
> +
> +	pud = pud_offset(pgd, address);
> +	if (pud_none(*pud) || unlikely(pud_bad(*pud)))
> +		goto err;
> +
> +	pmd = pmd_offset(pud, address);
> +	if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
> +		goto err;
> +
> +	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
> +
> +	pte = *ptep;
> +	if (!pte_present(pte))
> +		goto err_unlock;
> +
> +	*ret_ptep = pte;
> +	pte_unmap_unlock(ptep, ptl);
> +	return 0;
> +
> +err_unlock:
> +	pte_unmap_unlock(ptep, ptl);
> +err:
> +	return -1;
> +}
> +
>  /* Can we do the FOLL_ANON optimization? */
>  static inline int use_zero_page(struct vm_area_struct *vma)
>  {
> 
> -- 
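Nick's suggested interface change above (an int return with -EINVAL on
failure, handing the pte back through the out-parameter) can be modelled with
a toy multi-level table in userspace C. The 4-level array below is a made-up
stand-in for the pgd/pud/pmd/pte walk, not kernel code.

```c
#include <assert.h>
#include <errno.h>

/* Toy userspace model of the suggested follow_pfnmap_pte() interface:
 * return an int (-EINVAL on any failed level of the walk) and hand the
 * result back through an out-parameter. The 4-level "table" is a
 * made-up stand-in for pgd/pud/pmd/pte; zero means "not present". */
#define MODEL_LEVELS	4
#define MODEL_ENTRIES	8

static unsigned long model_table[MODEL_LEVELS][MODEL_ENTRIES];

static int follow_model_pte(unsigned long idx, unsigned long *ret_pte)
{
	int level;

	if (idx >= MODEL_ENTRIES)
		return -EINVAL;		/* outside the "vma" */

	for (level = 0; level < MODEL_LEVELS; level++)
		if (!model_table[level][idx])
			return -EINVAL;	/* a non-present level ends the walk */

	*ret_pte = model_table[MODEL_LEVELS - 1][idx];
	return 0;
}
```

The caller can then distinguish "not a pfnmap / nothing mapped" from a
successful lookup by the return code alone, as the review suggests.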


* RE: [patch 3/8] x86 PAT: Add follow_pfnmp_pte routine to help tracking pfnmap pages
  2008-11-12 23:27   ` Nick Piggin
@ 2008-11-12 23:54     ` Pallipadi, Venkatesh
  0 siblings, 0 replies; 27+ messages in thread
From: Pallipadi, Venkatesh @ 2008-11-12 23:54 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Hugh Dickins,
	Roland Dreier, Jesse Barnes, Jeremy Fitzhardinge,
	Arjan van de Ven, linux-kernel, Siddha, Suresh B

 

>-----Original Message-----
>From: Nick Piggin [mailto:npiggin@suse.de] 
>Sent: Wednesday, November 12, 2008 3:27 PM
>To: Pallipadi, Venkatesh
>Cc: Ingo Molnar; Thomas Gleixner; H.Peter Anvin; Hugh Dickins; 
>Roland Dreier; Jesse Barnes; Jeremy Fitzhardinge; Arjan van de 
>Ven; linux-kernel@vger.kernel.org; Siddha, Suresh B
>Subject: Re: [patch 3/8] x86 PAT: Add follow_pfnmp_pte routine 
>to help tracking pfnmap pages
>
>On Wed, Nov 12, 2008 at 01:26:50PM -0800, Venkatesh Pallipadi wrote:
>> 
>> Add a generic interface to follow pfn in a pfnmap vma range. 
>This is used by
>> one of the subsequent x86 PAT related patch to keep track of 
>memory types
>> for vma regions across vma copy and free.
>> 
>> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
>> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
>> 
>> ---
>>  include/linux/mm.h |    3 +++
>>  mm/memory.c        |   43 
>+++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 46 insertions(+)
>> 
>> Index: tip/include/linux/mm.h
>> ===================================================================
>> --- tip.orig/include/linux/mm.h	2008-11-06 
>15:41:43.000000000 -0800
>> +++ tip/include/linux/mm.h	2008-11-10 09:44:45.000000000 -0800
>> @@ -1223,6 +1223,9 @@ struct page *follow_page(struct vm_area_
>>  #define FOLL_GET	0x04	/* do get_page on page */
>>  #define FOLL_ANON	0x08	/* give ZERO_PAGE if no pgtable */
>>  
>> +unsigned long follow_pfnmap_pte(struct vm_area_struct *vma,
>> +				unsigned long address, pte_t *ret_ptep);
>> +
>>  typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, 
>unsigned long addr,
>>  			void *data);
>>  extern int apply_to_page_range(struct mm_struct *mm, 
>unsigned long address,
>> Index: tip/mm/memory.c
>> ===================================================================
>> --- tip.orig/mm/memory.c	2008-11-06 15:41:43.000000000 -0800
>> +++ tip/mm/memory.c	2008-11-10 09:44:45.000000000 -0800
>> @@ -1111,6 +1111,49 @@ no_page_table:
>>  	return page;
>>  }
>>  
>> +unsigned long follow_pfnmap_pte(struct vm_area_struct *vma,
>> +			unsigned long address, pte_t *ret_ptep)
>> +{
>
>I think we'd typically return an int for functions like this. It
>probably wouldn't hurt to return more descriptive errors either.
>-EINVAL, perhaps, seems to fit best with what existing code there
>is doing.
>

Yes. Will change the return type to int with more appropriate
return values and update the patch.

Thanks,
Venki


* RE: [patch 2/8] x86 PAT: set VM_PFNMAP flag in vm_insert_pfn
  2008-11-12 23:23   ` Nick Piggin
@ 2008-11-13  0:02     ` Pallipadi, Venkatesh
  2008-11-13  3:44       ` Nick Piggin
  2008-11-15  7:38     ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 27+ messages in thread
From: Pallipadi, Venkatesh @ 2008-11-13  0:02 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Hugh Dickins,
	Roland Dreier, Jesse Barnes, Jeremy Fitzhardinge,
	Arjan van de Ven, linux-kernel, Siddha, Suresh B

 

>-----Original Message-----
>From: Nick Piggin [mailto:npiggin@suse.de] 
>Sent: Wednesday, November 12, 2008 3:23 PM
>To: Pallipadi, Venkatesh
>Cc: Ingo Molnar; Thomas Gleixner; H.Peter Anvin; Hugh Dickins; 
>Roland Dreier; Jesse Barnes; Jeremy Fitzhardinge; Arjan van de 
>Ven; linux-kernel@vger.kernel.org; Siddha, Suresh B
>Subject: Re: [patch 2/8] x86 PAT: set VM_PFNMAP flag in vm_insert_pfn
>
>You have to be careful of this, because it can be called with mmap_sem
>held for read only. Hmm, I guess vm_insert_page is doing the 
>same thing.
>Probably mostly works because all other modifiers of vm_flags 
>are holding
>mmap_sem.

Yes. I did the patch looking at vm_insert_page doing similar thing.

>
>However, in some cases, code can do vm_insert_pfn and vm_insert_page
>(actually hmm, no vm_insert_mixed actually should cover most of those
>cases).
>
>Still, I'd be much happier if we could make these into BUG_ON, and then
>teach callers to set it in their .mmap routines.

Actually, vm_insert_pfn() already has a BUG_ON() at the start for cases
where neither (or both) of MIXEDMAP and PFNMAP is set. So, that should
cover the case we are worried about here and we can eliminate this
patch altogether. The only part I am not sure about is why we are looking
for MIXEDMAP here. Shouldn't they be using vm_insert_mixed() instead?

Thanks,
Venki
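The BUG_ON() precondition being referred to, that exactly one of VM_PFNMAP
and VM_MIXEDMAP must be set on the vma, can be checked in isolation. The bit
values below are illustrative, not the kernel's:

```c
#include <assert.h>

/* Model of the vm_insert_pfn() flag precondition discussed above:
 * exactly one of VM_PFNMAP and VM_MIXEDMAP must be set. The bit
 * values are illustrative, not the kernel's. */
#define VM_PFNMAP	0x1UL
#define VM_MIXEDMAP	0x2UL

static int insert_pfn_flags_ok(unsigned long vm_flags)
{
	unsigned long m = vm_flags & (VM_PFNMAP | VM_MIXEDMAP);

	return m == VM_PFNMAP || m == VM_MIXEDMAP;
}
```

Both the "neither set" and "both set" cases fail the check, which is why
setting VM_PFNMAP inside vm_insert_pfn() itself is redundant once callers are
required to set it in their .mmap routines.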



* Re: [patch 2/8] x86 PAT: set VM_PFNMAP flag in vm_insert_pfn
  2008-11-13  0:02     ` Pallipadi, Venkatesh
@ 2008-11-13  3:44       ` Nick Piggin
  2008-11-13 18:47         ` Pallipadi, Venkatesh
  0 siblings, 1 reply; 27+ messages in thread
From: Nick Piggin @ 2008-11-13  3:44 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Hugh Dickins,
	Roland Dreier, Jesse Barnes, Jeremy Fitzhardinge,
	Arjan van de Ven, linux-kernel, Siddha, Suresh B

On Wed, Nov 12, 2008 at 04:02:47PM -0800, Pallipadi, Venkatesh wrote:
>  
> 
> >-----Original Message-----
> >From: Nick Piggin [mailto:npiggin@suse.de] 
> >Sent: Wednesday, November 12, 2008 3:23 PM
> >To: Pallipadi, Venkatesh
> >Cc: Ingo Molnar; Thomas Gleixner; H.Peter Anvin; Hugh Dickins; 
> >Roland Dreier; Jesse Barnes; Jeremy Fitzhardinge; Arjan van de 
> >Ven; linux-kernel@vger.kernel.org; Siddha, Suresh B
> >Subject: Re: [patch 2/8] x86 PAT: set VM_PFNMAP flag in vm_insert_pfn
> >
> >You have to be careful of this, because it can be called with mmap_sem
> >held for read only. Hmm, I guess vm_insert_page is doing the 
> >same thing.
> >Probably mostly works because all other modifiers of vm_flags 
> >are holding
> >mmap_sem.
> 
> Yes. I did the patch looking at vm_insert_page doing similar thing.
> 
> >
> >However, in some cases, code can do vm_insert_pfn and vm_insert_page
> >(actually hmm, no vm_insert_mixed actually should cover most of those
> >cases).
> >
> >Still, I'd be much happier if we could make these into BUG_ON, and then
> >teach callers to set it in their .mmap routines.
> 
> Actually, vm_insert_pfn() already has a BUG_ON() at the start for cases
> where neither (or both) MIXEDMAP and PFNMAP is not set. So, that should
> cover the case we are worried about it here and we can eliminate this
> patch altogether. Only part I am not sure about is why we are looking
> for MIXEDMAP here. Shouldn't they be using vm_insert_mixed instead?

They should, but it will do an insert_pfn in some cases, won't it?


* RE: [patch 2/8] x86 PAT: set VM_PFNMAP flag in vm_insert_pfn
  2008-11-13  3:44       ` Nick Piggin
@ 2008-11-13 18:47         ` Pallipadi, Venkatesh
  2008-11-14  2:05           ` Nick Piggin
  0 siblings, 1 reply; 27+ messages in thread
From: Pallipadi, Venkatesh @ 2008-11-13 18:47 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Hugh Dickins,
	Roland Dreier, Jesse Barnes, Jeremy Fitzhardinge,
	Arjan van de Ven, linux-kernel, Siddha, Suresh B

 

>-----Original Message-----
>From: Nick Piggin [mailto:npiggin@suse.de] 
>Sent: Wednesday, November 12, 2008 7:44 PM
>To: Pallipadi, Venkatesh
>Cc: Ingo Molnar; Thomas Gleixner; H Peter Anvin; Hugh Dickins; 
>Roland Dreier; Jesse Barnes; Jeremy Fitzhardinge; Arjan van de 
>Ven; linux-kernel@vger.kernel.org; Siddha, Suresh B
>Subject: Re: [patch 2/8] x86 PAT: set VM_PFNMAP flag in vm_insert_pfn
>
>On Wed, Nov 12, 2008 at 04:02:47PM -0800, Pallipadi, Venkatesh wrote:
>>
>>
>> >-----Original Message-----
>> >From: Nick Piggin [mailto:npiggin@suse.de]
>> >Sent: Wednesday, November 12, 2008 3:23 PM
>> >To: Pallipadi, Venkatesh
>> >Cc: Ingo Molnar; Thomas Gleixner; H.Peter Anvin; Hugh Dickins;
>> >Roland Dreier; Jesse Barnes; Jeremy Fitzhardinge; Arjan van de
>> >Ven; linux-kernel@vger.kernel.org; Siddha, Suresh B
>> >Subject: Re: [patch 2/8] x86 PAT: set VM_PFNMAP flag in 
>vm_insert_pfn
>> >
>> >You have to be careful of this, because it can be called 
>with mmap_sem
>> >held for read only. Hmm, I guess vm_insert_page is doing the
>> >same thing.
>> >Probably mostly works because all other modifiers of vm_flags
>> >are holding
>> >mmap_sem.
>>
>> Yes. I did the patch looking at vm_insert_page doing similar thing.
>>
>> >
>> >However, in some cases, code can do vm_insert_pfn and vm_insert_page
>> >(actually hmm, no vm_insert_mixed actually should cover 
>most of those
>> >cases).
>> >
>> >Still, I'd be much happier if we could make these into 
>BUG_ON, and then
>> >teach callers to set it in their .mmap routines.
>>
>> Actually, vm_insert_pfn() already has a BUG_ON() at the 
>start for cases
>> where neither (or both) MIXEDMAP and PFNMAP is not set. So, 
>that should
>> cover the case we are worried about it here and we can eliminate this
>> patch altogether. Only part I am not sure about is why we are looking
>> for MIXEDMAP here. Shouldn't they be using vm_insert_mixed instead?
>
>They should, but it will do an insert_pfn in some cases, won't it?
>

Yes, it does. But it calls the lower level insert_pfn() function, which
does not have any bug checks; the higher level vm_insert_pfn() checks for
PFNMAP or MIXEDMAP.

Thanks,
Venki


* Re: [patch 2/8] x86 PAT: set VM_PFNMAP flag in vm_insert_pfn
  2008-11-13 18:47         ` Pallipadi, Venkatesh
@ 2008-11-14  2:05           ` Nick Piggin
  2008-11-14 21:35             ` Pallipadi, Venkatesh
  0 siblings, 1 reply; 27+ messages in thread
From: Nick Piggin @ 2008-11-14  2:05 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Hugh Dickins,
	Roland Dreier, Jesse Barnes, Jeremy Fitzhardinge,
	Arjan van de Ven, linux-kernel, Siddha, Suresh B

On Thu, Nov 13, 2008 at 10:47:23AM -0800, Pallipadi, Venkatesh wrote:
> >> >-----Original Message-----
> >> >From: Nick Piggin [mailto:npiggin@suse.de]
> >> >Sent: Wednesday, November 12, 2008 3:23 PM
> >> >To: Pallipadi, Venkatesh
> >> >Cc: Ingo Molnar; Thomas Gleixner; H.Peter Anvin; Hugh Dickins;
> >> >Roland Dreier; Jesse Barnes; Jeremy Fitzhardinge; Arjan van de
> >> >Ven; linux-kernel@vger.kernel.org; Siddha, Suresh B
> >> >Subject: Re: [patch 2/8] x86 PAT: set VM_PFNMAP flag in 
> >vm_insert_pfn
> >> >
> >> >You have to be careful of this, because it can be called 
> >with mmap_sem
> >> >held for read only. Hmm, I guess vm_insert_page is doing the
> >> >same thing.
> >> >Probably mostly works because all other modifiers of vm_flags
> >> >are holding
> >> >mmap_sem.
> >>
> >> Yes. I did the patch looking at vm_insert_page doing similar thing.
> >>
> >> >
> >> >However, in some cases, code can do vm_insert_pfn and vm_insert_page
> >> >(actually hmm, no vm_insert_mixed actually should cover 
> >most of those
> >> >cases).
> >> >
> >> >Still, I'd be much happier if we could make these into 
> >BUG_ON, and then
> >> >teach callers to set it in their .mmap routines.
> >>
> >> Actually, vm_insert_pfn() already has a BUG_ON() at the 
> >start for cases
> >> where neither (or both) MIXEDMAP and PFNMAP is not set. So, 
> >that should
> >> cover the case we are worried about it here and we can eliminate this
> >> patch altogether. Only part I am not sure about is why we are looking
> >> for MIXEDMAP here. Shouldn't they be using vm_insert_mixed instead?
> >
> >They should, but it will do an insert_pfn in some cases, won't it?
> >
> 
> Yes. It does. But, it calls a lower level insert_pfn() function. The lower
> level insert_pfn() does not have any bug checks. But the higher level
> vm_insert_pfn() checks for PFNMAP or MIXEDMAP.

Yes, but is there anything extra you need to check for cache aliases in
MIXEDMAP mappings?


* RE: [patch 2/8] x86 PAT: set VM_PFNMAP flag in vm_insert_pfn
  2008-11-14  2:05           ` Nick Piggin
@ 2008-11-14 21:35             ` Pallipadi, Venkatesh
  2008-11-17  2:30               ` Nick Piggin
  0 siblings, 1 reply; 27+ messages in thread
From: Pallipadi, Venkatesh @ 2008-11-14 21:35 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Hugh Dickins,
	Roland Dreier, Jesse Barnes, Jeremy Fitzhardinge,
	Arjan van de Ven, linux-kernel, Siddha, Suresh B



>-----Original Message-----
>From: Nick Piggin [mailto:npiggin@suse.de]
>Sent: Thursday, November 13, 2008 6:06 PM
>To: Pallipadi, Venkatesh
>Cc: Ingo Molnar; Thomas Gleixner; H Peter Anvin; Hugh Dickins;
>Roland Dreier; Jesse Barnes; Jeremy Fitzhardinge; Arjan van de
>Ven; linux-kernel@vger.kernel.org; Siddha, Suresh B
>Subject: Re: [patch 2/8] x86 PAT: set VM_PFNMAP flag in vm_insert_pfn
>
>On Thu, Nov 13, 2008 at 10:47:23AM -0800, Pallipadi, Venkatesh wrote:
>>
>> Yes. It does. But, it calls a lower level insert_pfn()
>function. The lower
>> level insert_pfn() does not have any bug checks. But the higher level
>> vm_insert_pfn() checks for PFNMAP or MIXEDMAP.
>
>Yes, but is there anything extra you need to check for cache aliases in
>MIXEDMAP mappings?
>

Yes. We need additional things to track MIXEDMAP and we are looking at that.
But that is slightly trickier than the general PFNMAP case. And the only
in-tree user of MIXEDMAP is xip, and even that only uses it for regular WB
mapping. So, we thought we should fix the more common case first here.

With MIXEDMAP there is no way to distinguish whether insert_pfn or
insert_page was used while looking at the VMA. We can probably use PFNMAP
in addition to MIXEDMAP to indicate that, which will make things easier.
But we are still looking at that and trying to understand the implications
of the change.

Thanks,
Venki


* Re: [patch 2/8] x86 PAT: set VM_PFNMAP flag in vm_insert_pfn
  2008-11-12 23:23   ` Nick Piggin
  2008-11-13  0:02     ` Pallipadi, Venkatesh
@ 2008-11-15  7:38     ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2008-11-15  7:38 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Venkatesh Pallipadi, Ingo Molnar, Thomas Gleixner, H.Peter Anvin,
	Hugh Dickins, Roland Dreier, Jesse Barnes, Jeremy Fitzhardinge,
	Arjan van de Ven, linux-kernel, Suresh Siddha

On Thu, 2008-11-13 at 00:23 +0100, Nick Piggin wrote:
> You have to be careful of this, because it can be called with mmap_sem
> held for read only. Hmm, I guess vm_insert_page is doing the same thing.
> Probably mostly works because all other modifiers of vm_flags are holding
> mmap_sem.
> 
> However, in some cases, code can do vm_insert_pfn and vm_insert_page
> (actually hmm, no vm_insert_mixed actually should cover most of those
> cases).
> 
> Still, I'd be much happier if we could make these into BUG_ON, and then
> teach callers to set it in their .mmap routines.

Agreed. I don't think vm_insert_* is the right place to muck
with that ..

Ben.

>  
> On Wed, Nov 12, 2008 at 01:26:49PM -0800, Venkatesh Pallipadi wrote:
> > vm_insert_pfn() is not setting the VM_PFNMAP flag in vma. Fix that.
> > 
> > Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> > Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
> > 
> > ---
> >  mm/memory.c |    2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > Index: tip/mm/memory.c
> > ===================================================================
> > --- tip.orig/mm/memory.c	2008-11-06 09:44:56.000000000 -0800
> > +++ tip/mm/memory.c	2008-11-10 09:44:47.000000000 -0800
> > @@ -1444,6 +1444,8 @@ int vm_insert_pfn(struct vm_area_struct 
> >  
> >  	if (addr < vma->vm_start || addr >= vma->vm_end)
> >  		return -EFAULT;
> > +
> > +	vma->vm_flags |= VM_PFNMAP;
> >  	return insert_pfn(vma, addr, pfn, vma->vm_page_prot);
> >  }
> >  EXPORT_SYMBOL(vm_insert_pfn);
> > 
> > -- 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/



* Re: [patch 2/8] x86 PAT: set VM_PFNMAP flag in vm_insert_pfn
  2008-11-14 21:35             ` Pallipadi, Venkatesh
@ 2008-11-17  2:30               ` Nick Piggin
  2008-11-18 21:37                 ` Ingo Molnar
  0 siblings, 1 reply; 27+ messages in thread
From: Nick Piggin @ 2008-11-17  2:30 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Hugh Dickins,
	Roland Dreier, Jesse Barnes, Jeremy Fitzhardinge,
	Arjan van de Ven, linux-kernel, Siddha, Suresh B

On Fri, Nov 14, 2008 at 01:35:38PM -0800, Pallipadi, Venkatesh wrote:
> 
> 
> >-----Original Message-----
> >From: Nick Piggin [mailto:npiggin@suse.de]
> >Sent: Thursday, November 13, 2008 6:06 PM
> >To: Pallipadi, Venkatesh
> >Cc: Ingo Molnar; Thomas Gleixner; H Peter Anvin; Hugh Dickins;
> >Roland Dreier; Jesse Barnes; Jeremy Fitzhardinge; Arjan van de
> >Ven; linux-kernel@vger.kernel.org; Siddha, Suresh B
> >Subject: Re: [patch 2/8] x86 PAT: set VM_PFNMAP flag in vm_insert_pfn
> >
> >On Thu, Nov 13, 2008 at 10:47:23AM -0800, Pallipadi, Venkatesh wrote:
> >>
> >> Yes. It does. But, it calls a lower level insert_pfn()
> >function. The lower
> >> level insert_pfn() does not have any bug checks. But the higher level
> >> vm_insert_pfn() checks for PFNMAP or MIXEDMAP.
> >
> >Yes, but is there anything extra you need to check for cache aliases in
> >MIXEDMAP mappings?
> >
> 
> Yes. We need additional things to track MIXEDMAP and we are looking at that.
> But, that is slightly more trickier than the general PFNMAP case. And
> only in-tree user of MIXEDMAP is xip and that too it only uses it for
> regular WB mapping. So, we thought we should fix the more common case
> first here.
> 
> With MIXEDMAP there is no way whether to distinguish whether insert_pfn
> Or insert_page was used while looking at VMA. We can probably use PFNMAP
> in addition to MIXEDMAP to indicate that, which will make things easier.

It's difficult because it can have either method for a single VMA, and
a given address in the vma may even change over time (not with current
code in kernel AFAIKS, but AXFS eventually might get to that point).


> But, we are still looking at that and trying to understand the change
> implication.

OK: now I understand correctly. Getting PFNMAP working is an important
first step. I agree.

Thanks,
Nick


* Re: [patch 2/8] x86 PAT: set VM_PFNMAP flag in vm_insert_pfn
  2008-11-17  2:30               ` Nick Piggin
@ 2008-11-18 21:37                 ` Ingo Molnar
  2008-11-20 23:42                   ` Pallipadi, Venkatesh
  0 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2008-11-18 21:37 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Pallipadi, Venkatesh, Thomas Gleixner, H Peter Anvin,
	Hugh Dickins, Roland Dreier, Jesse Barnes, Jeremy Fitzhardinge,
	Arjan van de Ven, linux-kernel, Siddha, Suresh B


* Nick Piggin <npiggin@suse.de> wrote:

> On Fri, Nov 14, 2008 at 01:35:38PM -0800, Pallipadi, Venkatesh wrote:
> > 
> > 
> > >-----Original Message-----
> > >From: Nick Piggin [mailto:npiggin@suse.de]
> > >Sent: Thursday, November 13, 2008 6:06 PM
> > >To: Pallipadi, Venkatesh
> > >Cc: Ingo Molnar; Thomas Gleixner; H Peter Anvin; Hugh Dickins;
> > >Roland Dreier; Jesse Barnes; Jeremy Fitzhardinge; Arjan van de
> > >Ven; linux-kernel@vger.kernel.org; Siddha, Suresh B
> > >Subject: Re: [patch 2/8] x86 PAT: set VM_PFNMAP flag in vm_insert_pfn
> > >
> > >On Thu, Nov 13, 2008 at 10:47:23AM -0800, Pallipadi, Venkatesh wrote:
> > >>
> > >> Yes. It does. But, it calls a lower level insert_pfn()
> > >function. The lower
> > >> level insert_pfn() does not have any bug checks. But the higher level
> > >> vm_insert_pfn() checks for PFNMAP or MIXEDMAP.
> > >
> > >Yes, but is there anything extra you need to check for cache aliases in
> > >MIXEDMAP mappings?
> > >
> > 
> > Yes. We need additional things to track MIXEDMAP and we are looking at that.
> > But, that is slightly more trickier than the general PFNMAP case. And
> > only in-tree user of MIXEDMAP is xip and that too it only uses it for
> > regular WB mapping. So, we thought we should fix the more common case
> > first here.
> > 
> > With MIXEDMAP there is no way whether to distinguish whether insert_pfn
> > Or insert_page was used while looking at VMA. We can probably use PFNMAP
> > in addition to MIXEDMAP to indicate that, which will make things easier.
> 
> It's difficult because it can have either method for a single VMA, and
> a given address in the vma may even change over time (not with current
> code in kernel AFAIKS, but AXFS eventually might get to that point).
> 
> 
> > But, we are still looking at that and trying to understand the change
> > implication.
> 
> OK: now I understand correctly. Getting PFNMAP working is an important
> first step. I agree.

Venki, a patch logistics sidenote: the final mm/* bits of this 
patchset need acks from MM folks - Andrew, Nick or Hugh - we cannot 
just queue them up in the x86/pat tree without agreement from MM 
maintainers.

	Ingo


* RE: [patch 2/8] x86 PAT: set VM_PFNMAP flag in vm_insert_pfn
  2008-11-18 21:37                 ` Ingo Molnar
@ 2008-11-20 23:42                   ` Pallipadi, Venkatesh
  2008-11-21  0:50                     ` Nick Piggin
  0 siblings, 1 reply; 27+ messages in thread
From: Pallipadi, Venkatesh @ 2008-11-20 23:42 UTC (permalink / raw)
  To: Ingo Molnar, Nick Piggin
  Cc: Thomas Gleixner, H Peter Anvin, Hugh Dickins, Roland Dreier,
	Jesse Barnes, Jeremy Fitzhardinge, Arjan van de Ven,
	linux-kernel, Siddha, Suresh B

 

>-----Original Message-----
>From: linux-kernel-owner@vger.kernel.org 
>[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Ingo Molnar
>Sent: Tuesday, November 18, 2008 1:38 PM
>To: Nick Piggin
>Cc: Pallipadi, Venkatesh; Thomas Gleixner; H Peter Anvin; Hugh 
>Dickins; Roland Dreier; Jesse Barnes; Jeremy Fitzhardinge; 
>Arjan van de Ven; linux-kernel@vger.kernel.org; Siddha, Suresh B
>Subject: Re: [patch 2/8] x86 PAT: set VM_PFNMAP flag in vm_insert_pfn
>
>
>* Nick Piggin <npiggin@suse.de> wrote:
>
>> On Fri, Nov 14, 2008 at 01:35:38PM -0800, Pallipadi, Venkatesh wrote:
>> > 
>> > 
>> > >-----Original Message-----
>> > >From: Nick Piggin [mailto:npiggin@suse.de]
>> > >Sent: Thursday, November 13, 2008 6:06 PM
>> > >To: Pallipadi, Venkatesh
>> > >Cc: Ingo Molnar; Thomas Gleixner; H Peter Anvin; Hugh Dickins;
>> > >Roland Dreier; Jesse Barnes; Jeremy Fitzhardinge; Arjan van de
>> > >Ven; linux-kernel@vger.kernel.org; Siddha, Suresh B
>> > >Subject: Re: [patch 2/8] x86 PAT: set VM_PFNMAP flag in 
>vm_insert_pfn
>> > >
>> > >On Thu, Nov 13, 2008 at 10:47:23AM -0800, Pallipadi, 
>Venkatesh wrote:
>> > >>
>> > >> Yes. It does. But, it calls a lower level insert_pfn()
>> > >function. The lower
>> > >> level insert_pfn() does not have any bug checks. But 
>the higher level
>> > >> vm_insert_pfn() checks for PFNMAP or MIXEDMAP.
>> > >
>> > >Yes, but is there anything extra you need to check for 
>cache aliases in
>> > >MIXEDMAP mappings?
>> > >
>> > 
>> > Yes. We need additional things to track MIXEDMAP and we 
>are looking at that.
>> > But, that is slightly more trickier than the general 
>PFNMAP case. And
>> > only in-tree user of MIXEDMAP is xip and that too it only 
>uses it for
>> > regular WB mapping. So, we thought we should fix the more 
>common case
>> > first here.
>> > 
>> > With MIXEDMAP there is no way whether to distinguish 
>whether insert_pfn
>> > Or insert_page was used while looking at VMA. We can 
>probably use PFNMAP
>> > in addition to MIXEDMAP to indicate that, which will make 
>things easier.
>> 
>> It's difficult because it can have either method for a 
>single VMA, and
>> a given address in the vma may even change over time (not 
>with current
>> code in kernel AFAIKS, but AXFS eventually might get to that point).
>> 
>> 
>> > But, we are still looking at that and trying to understand 
>the change
>> > implication.
>> 
>> OK: now I understand correctly. Getting PFNMAP working is an 
>important
>> first step. I agree.
>
>Venki, a patch logistics sidenote: the final mm/* bits of this 
>patchset need acks from MM folks - Andrew, Nick or Hugh - we cannot 
>just queue them up in the x86/pat tree without agreement from MM 
>maintainers.
>

OK. Based on the feedback from the discussions on this patchset, this
particular patch (setting the VM_PFNMAP flag in vm_insert_pfn) is not
really needed; we should instead BUG_ON that flag not being set. There
were also a few more comments from Nick on one other generic vm patch.

I will resend the entire patchset with the changes soon and then we can
work out the logistics.

Thanks,
Venki

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 2/8] x86 PAT: set VM_PFNMAP flag in vm_insert_pfn
  2008-11-20 23:42                   ` Pallipadi, Venkatesh
@ 2008-11-21  0:50                     ` Nick Piggin
  0 siblings, 0 replies; 27+ messages in thread
From: Nick Piggin @ 2008-11-21  0:50 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Hugh Dickins,
	Roland Dreier, Jesse Barnes, Jeremy Fitzhardinge,
	Arjan van de Ven, linux-kernel, Siddha, Suresh B

On Thu, Nov 20, 2008 at 03:42:32PM -0800, Pallipadi, Venkatesh wrote:
>  
> 
> >-----Original Message-----
> >From: linux-kernel-owner@vger.kernel.org 
> >[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Ingo Molnar
> >Sent: Tuesday, November 18, 2008 1:38 PM
> >To: Nick Piggin
> >Cc: Pallipadi, Venkatesh; Thomas Gleixner; H Peter Anvin; Hugh 
> >Dickins; Roland Dreier; Jesse Barnes; Jeremy Fitzhardinge; 
> >Arjan van de Ven; linux-kernel@vger.kernel.org; Siddha, Suresh B
> >Subject: Re: [patch 2/8] x86 PAT: set VM_PFNMAP flag in vm_insert_pfn
> >
> >
> >patchset need acks from MM folks - Andrew, Nick or Hugh - we cannot 
> >just queue them up in the x86/pat tree without agreement from MM 
> >maintainers.
> >
> 
> OK. Based on the feedback from the discussions on this patchset, this
> particular patch (setting the VM_PFNMAP flag in vm_insert_pfn) is not
> really needed; we should instead BUG_ON that flag not being set. There
> were also a few more comments from Nick on one other generic vm patch.
> 
> I will resend the entire patchset with the changes soon and then we can
> work out the logistics.

So long as it has been reviewed by mm people (cc'ing linux-mm might be
nice), I don't care so much how it makes its way upstream. If it were
a lot of code, it might make sense to slowly work dependencies through
various subsystems... but this isn't too much on the mm side, and it is
primarily support code for an x86 subsystem feature, so I don't mind
myself if it goes via the x86 tree, with appropriate acks.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 4/8] x86 PAT: hooks in generic vm code to help archs to track pfnmap regions
  2008-11-12 21:26 ` [patch 4/8] x86 PAT: hooks in generic vm code to help archs to track pfnmap regions Venkatesh Pallipadi
@ 2008-12-16 19:57   ` Andrew Morton
  2008-12-16 20:07     ` Pallipadi, Venkatesh
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2008-12-16 19:57 UTC (permalink / raw)
  To: Venkatesh Pallipadi
  Cc: mingo, tglx, hpa, npiggin, hugh, rdreier, jbarnes, jeremy, arjan,
	linux-kernel, venkatesh.pallipadi, suresh.b.siddha

On Wed, 12 Nov 2008 13:26:51 -0800
Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> wrote:

> --- tip.orig/mm/memory.c	2008-11-11 10:10:11.000000000 -0800
> +++ tip/mm/memory.c	2008-11-11 12:10:18.000000000 -0800
> @@ -99,6 +99,28 @@ int randomize_va_space __read_mostly =
>  					2;
>  #endif
>  
> +#ifndef track_pfn_vma_new
> +int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
> +			unsigned long pfn, unsigned long size)
> +{
> +	return 0;
> +}
> +#endif
> +
> +#ifndef track_pfn_vma_copy
> +int track_pfn_vma_copy(struct vm_area_struct *vma)
> +{
> +	return 0;
> +}
> +#endif
> +
> +#ifndef untrack_pfn_vma
> +void untrack_pfn_vma(struct vm_area_struct *vma, unsigned long pfn,
> +			unsigned long size)
> +{
> +}
> +#endif

Using __weak would provide a somewhat neater result here.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [patch 4/8] x86 PAT: hooks in generic vm code to help archs to track pfnmap regions
  2008-12-16 19:57   ` Andrew Morton
@ 2008-12-16 20:07     ` Pallipadi, Venkatesh
  2008-12-16 20:13       ` Andrew Morton
  0 siblings, 1 reply; 27+ messages in thread
From: Pallipadi, Venkatesh @ 2008-12-16 20:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: mingo, tglx, hpa, npiggin, hugh, rdreier, jbarnes, jeremy, arjan,
	linux-kernel, Siddha, Suresh B

 

>-----Original Message-----
>From: Andrew Morton [mailto:akpm@linux-foundation.org] 
>Sent: Tuesday, December 16, 2008 11:57 AM
>To: Pallipadi, Venkatesh
>Cc: mingo@elte.hu; tglx@linutronix.de; hpa@zytor.com; 
>npiggin@suse.de; hugh@veritas.com; rdreier@cisco.com; 
>jbarnes@virtuousgeek.org; jeremy@goop.org; 
>arjan@infradead.org; linux-kernel@vger.kernel.org; Pallipadi, 
>Venkatesh; Siddha, Suresh B
>Subject: Re: [patch 4/8] x86 PAT: hooks in generic vm code to 
>help archs to track pfnmap regions
>
>On Wed, 12 Nov 2008 13:26:51 -0800
>Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> wrote:
>
>> --- tip.orig/mm/memory.c	2008-11-11 10:10:11.000000000 -0800
>> +++ tip/mm/memory.c	2008-11-11 12:10:18.000000000 -0800
>> @@ -99,6 +99,28 @@ int randomize_va_space __read_mostly =
>>  					2;
>>  #endif
>>  
>> +#ifndef track_pfn_vma_new
>> +int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
>> +			unsigned long pfn, unsigned long size)
>> +{
>> +	return 0;
>> +}
>> +#endif
>> +
>> +#ifndef track_pfn_vma_copy
>> +int track_pfn_vma_copy(struct vm_area_struct *vma)
>> +{
>> +	return 0;
>> +}
>> +#endif
>> +
>> +#ifndef untrack_pfn_vma
>> +void untrack_pfn_vma(struct vm_area_struct *vma, unsigned long pfn,
>> +			unsigned long size)
>> +{
>> +}
>> +#endif
>
>Using __weak would provide a somewhat neater result here.
>

Thought about that. But then I remembered the issues with gcc versions
and __weak, as described here:
http://lkml.org/lkml/2008/5/1/368

and decided to take the safer approach.

Thanks,
Venki

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 5/8] x86 PAT: Implement track/untrack of pfnmap regions for x86
  2008-11-12 21:26 ` [patch 5/8] x86 PAT: Implement track/untrack of pfnmap regions for x86 Venkatesh Pallipadi
@ 2008-12-16 20:07   ` Andrew Morton
  2008-12-16 23:19     ` Pallipadi, Venkatesh
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2008-12-16 20:07 UTC (permalink / raw)
  To: Venkatesh Pallipadi
  Cc: mingo, tglx, hpa, npiggin, hugh, rdreier, jbarnes, jeremy, arjan,
	linux-kernel, venkatesh.pallipadi, suresh.b.siddha

On Wed, 12 Nov 2008 13:26:52 -0800
Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> wrote:

> Hook up remap_pfn_range and vm_insert_pfn, and the corresponding copy and
> free routines, with reserve and free tracking.
> 
> reserve and free here only take care of non-RAM region mappings. For RAM
> regions, the driver should use set_memory_[uc|wc|wb] to set the cache type
> and then set up the mapping for the user pte. We can bypass the below
> reserve/free in that case.
> 
> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
> 
> ---
>  arch/x86/include/asm/pgtable.h |   10 ++
>  arch/x86/mm/pat.c              |  201 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 211 insertions(+)
> 
> Index: tip/arch/x86/mm/pat.c
> ===================================================================
> --- tip.orig/arch/x86/mm/pat.c	2008-11-12 09:55:55.000000000 -0800
> +++ tip/arch/x86/mm/pat.c	2008-11-12 12:06:50.000000000 -0800
> @@ -596,6 +596,207 @@ void unmap_devmem(unsigned long pfn, uns
>  	free_memtype(addr, addr + size);
>  }
>  
> +int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t vma_prot)

I don't think the kernel is improved by leaving all these things
undocumented.

These are interfaces which other architectures might choose to
implement, so a few words explaining why they exist, what they are
supposed to do, etc wouldn't hurt.

It would certainly help code reviewers.

> +{
> +	unsigned long flags;
> +	unsigned long want_flags = (pgprot_val(vma_prot) & _PAGE_CACHE_MASK);
> +
> +	int is_ram = 0;
> +	int id_sz, ret;

The newline in the middle of the definition of locals makes no sense.

> +	is_ram = pagerange_is_ram(paddr, paddr + size);
> +
> +	if (is_ram != 0) {
> +		/*
> +		 * For mapping RAM pages, drivers need to call
> +		 * set_memory_[uc|wc|wb] directly, for reserve and free, before
> +		 * setting up the PTE.
> +		 */
> +		WARN_ON_ONCE(1);
> +		return 0;
> +	}
> +
> +	ret = reserve_memtype(paddr, paddr + size, want_flags, &flags);
> +	if (ret)
> +		return ret;
> +
> +	if (flags != want_flags) {
> +		free_memtype(paddr, paddr + size);
> +		printk(KERN_INFO

Would KERN_ERR be more appropriate?

> +		"%s:%d map pfn expected mapping type %s for %Lx-%Lx, got %s\n",
> +			current->comm, current->pid,

get_task_comm() is more reliable, but it hardly matters here.

> +			cattr_name(want_flags),
> +			paddr, (unsigned long long)(paddr + size),

Printing the u64 paddr without a cast is OK in arch code (we know that
u64 is implemented as unsigned long long).  But one might choose to
cast it anyway, to set a good example.

Or one could not bother.  But this code does it both ways!

> +			cattr_name(flags));
> +		return -EINVAL;
> +	}
> +
> +	/* Need to keep identity mapping in sync */
> +	if (paddr >= __pa(high_memory))
> +		return 0;
> +
> +	id_sz = (__pa(high_memory) < paddr + size) ?
> +				__pa(high_memory) - paddr :
> +				size;
> +
> +	if (ioremap_change_attr((unsigned long)__va(paddr), id_sz, flags) < 0) {
> +		free_memtype(paddr, paddr + size);
> +		printk(KERN_INFO

KERN_ERR?

> +			"%s:%d reserve_pfn_range ioremap_change_attr failed %s "
> +			"for %Lx-%Lx\n",
> +			current->comm, current->pid,
> +			cattr_name(flags),
> +			paddr, (unsigned long long)(paddr + size));

ditto

> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
> +void free_pfn_range(u64 paddr, unsigned long size)
> +{
> +	int is_ram;
> +
> +	is_ram = pagerange_is_ram(paddr, paddr + size);
> +	if (is_ram == 0)
> +		free_memtype(paddr, paddr + size);
> +}
>
> ...
>
> +int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
> +			unsigned long pfn, unsigned long size)
> +{
> +	int retval = 0;
> +	unsigned long i, j;
> +	u64 base_paddr;
> +	u64 paddr;
> +	unsigned long vma_start = vma->vm_start;
> +	unsigned long vma_end = vma->vm_end;
> +	unsigned long vma_size = vma_end - vma_start;
> +
> +	if (!pat_enabled)
> +		return 0;
> +
> +	if (is_linear_pfn_mapping(vma)) {
> +		/* reserve the whole chunk starting from vm_pgoff */
> +		paddr = (u64)vma->vm_pgoff << PAGE_SHIFT;
> +		return reserve_pfn_range(paddr, vma_size, prot);
> +	}
> +
> +	/* reserve page by page using pfn and size */
> +	base_paddr = (u64)pfn << PAGE_SHIFT;
> +	for (i = 0; i < size; i += PAGE_SIZE) {
> +		paddr = base_paddr + i;
> +		retval = reserve_pfn_range(paddr, PAGE_SIZE, prot);
> +		if (retval)
> +			goto cleanup_ret;
> +	}
> +	return 0;
> +
> +cleanup_ret:
> +	/* Reserve error: Cleanup partial reservation and return error */
> +	for (j = 0; j < i; j += PAGE_SIZE) {
> +		paddr = base_paddr + j;
> +		free_pfn_range(paddr, PAGE_SIZE);

Could we simply do

	free_pfn_range(base_paddr, i * PAGE_SIZE);

here?

If not, then perhaps add a helper function to do this, because that
loop gets repeated in several places.
	
> +	}
> +
> +	return retval;
> +}
> +
>
> ...
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 4/8] x86 PAT: hooks in generic vm code to help archs to track pfnmap regions
  2008-12-16 20:07     ` Pallipadi, Venkatesh
@ 2008-12-16 20:13       ` Andrew Morton
  0 siblings, 0 replies; 27+ messages in thread
From: Andrew Morton @ 2008-12-16 20:13 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: mingo, tglx, hpa, npiggin, hugh, rdreier, jbarnes, jeremy, arjan,
	linux-kernel, suresh.b.siddha

On Tue, 16 Dec 2008 12:07:56 -0800
"Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com> wrote:

>  
> 
> >-----Original Message-----
> >From: Andrew Morton [mailto:akpm@linux-foundation.org] 
> >Sent: Tuesday, December 16, 2008 11:57 AM
> >To: Pallipadi, Venkatesh
> >Cc: mingo@elte.hu; tglx@linutronix.de; hpa@zytor.com; 
> >npiggin@suse.de; hugh@veritas.com; rdreier@cisco.com; 
> >jbarnes@virtuousgeek.org; jeremy@goop.org; 
> >arjan@infradead.org; linux-kernel@vger.kernel.org; Pallipadi, 
> >Venkatesh; Siddha, Suresh B
> >Subject: Re: [patch 4/8] x86 PAT: hooks in generic vm code to 
> >help archs to track pfnmap regions

Your emails are getting mucked up.

> >On Wed, 12 Nov 2008 13:26:51 -0800
> >Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> wrote:
> >
> >> --- tip.orig/mm/memory.c	2008-11-11 10:10:11.000000000 -0800
> >> +++ tip/mm/memory.c	2008-11-11 12:10:18.000000000 -0800
> >> @@ -99,6 +99,28 @@ int randomize_va_space __read_mostly =
> >>  					2;
> >>  #endif
> >>  
> >> +#ifndef track_pfn_vma_new
> >> +int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
> >> +			unsigned long pfn, unsigned long size)
> >> +{
> >> +	return 0;
> >> +}
> >> +#endif
> >> +
> >> +#ifndef track_pfn_vma_copy
> >> +int track_pfn_vma_copy(struct vm_area_struct *vma)
> >> +{
> >> +	return 0;
> >> +}
> >> +#endif
> >> +
> >> +#ifndef untrack_pfn_vma
> >> +void untrack_pfn_vma(struct vm_area_struct *vma, unsigned long pfn,
> >> +			unsigned long size)
> >> +{
> >> +}
> >> +#endif
> >
> >Using __weak would provide a somewhat neater result here.
> >
> 
> Thought about that. But then I remembered the issues with gcc versions
> and __weak, as described here:
> http://lkml.org/lkml/2008/5/1/368
> 
> and decided to take the safer approach.
> 

Yes, it is a concern.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 5/8] x86 PAT: Implement track/untrack of pfnmap regions for x86
  2008-12-16 20:07   ` Andrew Morton
@ 2008-12-16 23:19     ` Pallipadi, Venkatesh
  0 siblings, 0 replies; 27+ messages in thread
From: Pallipadi, Venkatesh @ 2008-12-16 23:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Pallipadi, Venkatesh, mingo, tglx, hpa, npiggin, hugh, rdreier,
	jbarnes, jeremy, arjan, linux-kernel, Siddha, Suresh B

On Tue, Dec 16, 2008 at 12:07:59PM -0800, Andrew Morton wrote:
> Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> wrote:
> 
> >  2 files changed, 211 insertions(+)
> > 
> > Index: tip/arch/x86/mm/pat.c
> > ===================================================================
> > --- tip.orig/arch/x86/mm/pat.c	2008-11-12 09:55:55.000000000 -0800
> > +++ tip/arch/x86/mm/pat.c	2008-11-12 12:06:50.000000000 -0800
> > @@ -596,6 +596,207 @@ void unmap_devmem(unsigned long pfn, uns
> >  	free_memtype(addr, addr + size);
> >  }
> >  
> > +int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t vma_prot)
> 
> I don't think the kernel is improved by leaving all these things
> undocumented.
> 
> These are interfaces which other architectures might choose to
> implement, so a few words explaining why they exist, what they are
> supposed to do, etc wouldn't hurt.
> 
> It would certainly help code reviewers.

Agreed. Will add documentation to the new functions here and update the patch.

> 
> > +{
> > +	unsigned long flags;
> > +	unsigned long want_flags = (pgprot_val(vma_prot) & _PAGE_CACHE_MASK);
> > +
> > +	int is_ram = 0;
> > +	int id_sz, ret;
> 
> The newline in the middle of the definition of locals makes no sense.

Will change.

> > +	is_ram = pagerange_is_ram(paddr, paddr + size);
> > +
> > +	if (is_ram != 0) {
> > +		/*
> > +		 * For mapping RAM pages, drivers need to call
> > +		 * set_memory_[uc|wc|wb] directly, for reserve and free, before
> > +		 * setting up the PTE.
> > +		 */
> > +		WARN_ON_ONCE(1);
> > +		return 0;
> > +	}
> > +
> > +	ret = reserve_memtype(paddr, paddr + size, want_flags, &flags);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (flags != want_flags) {
> > +		free_memtype(paddr, paddr + size);
> > +		printk(KERN_INFO
> 
> Would KERN_ERR be more appropriate?

OK.

> 
> > +		"%s:%d map pfn expected mapping type %s for %Lx-%Lx, got %s\n",
> > +			current->comm, current->pid,
> 
> get_task_comm() is more reliable, but it hardly matters here.

current->comm is used in other places in this file as well. I should probably
change all of them in a separate cleanup patch.

> 
> > +			cattr_name(want_flags),
> > +			paddr, (unsigned long long)(paddr + size),
> 
> Printing the u64 paddr without a cast is OK in arch code (we know that
> u64 is implemented as unsigned long long).  But one might choose to
> cast it anyway, to set a good example.
> 
> Or one could not bother.  But this code does it both ways!
> 

Will change.

> > +			cattr_name(flags));
> > +		return -EINVAL;
> > +	}
> > +
> > +	/* Need to keep identity mapping in sync */
> > +	if (paddr >= __pa(high_memory))
> > +		return 0;
> > +
> > +	id_sz = (__pa(high_memory) < paddr + size) ?
> > +				__pa(high_memory) - paddr :
> > +				size;
> > +
> > +	if (ioremap_change_attr((unsigned long)__va(paddr), id_sz, flags) < 0) {
> > +		free_memtype(paddr, paddr + size);
> > +		printk(KERN_INFO
> 
> KERN_ERR?

OK.

> 
> > +			"%s:%d reserve_pfn_range ioremap_change_attr failed %s "
> > +			"for %Lx-%Lx\n",
> > +			current->comm, current->pid,
> > +			cattr_name(flags),
> > +			paddr, (unsigned long long)(paddr + size));
> 
> ditto

Will do.

> 
> > +		return -EINVAL;
> > +	}
> > +	return 0;
> > +}
> > +
> > +void free_pfn_range(u64 paddr, unsigned long size)
> > +{
> > +	int is_ram;
> > +
> > +	is_ram = pagerange_is_ram(paddr, paddr + size);
> > +	if (is_ram == 0)
> > +		free_memtype(paddr, paddr + size);
> > +}
> >
> > ...
> >
> > +int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
> > +			unsigned long pfn, unsigned long size)
> > +{
> > +	int retval = 0;
> > +	unsigned long i, j;
> > +	u64 base_paddr;
> > +	u64 paddr;
> > +	unsigned long vma_start = vma->vm_start;
> > +	unsigned long vma_end = vma->vm_end;
> > +	unsigned long vma_size = vma_end - vma_start;
> > +
> > +	if (!pat_enabled)
> > +		return 0;
> > +
> > +	if (is_linear_pfn_mapping(vma)) {
> > +		/* reserve the whole chunk starting from vm_pgoff */
> > +		paddr = (u64)vma->vm_pgoff << PAGE_SHIFT;
> > +		return reserve_pfn_range(paddr, vma_size, prot);
> > +	}
> > +
> > +	/* reserve page by page using pfn and size */
> > +	base_paddr = (u64)pfn << PAGE_SHIFT;
> > +	for (i = 0; i < size; i += PAGE_SIZE) {
> > +		paddr = base_paddr + i;
> > +		retval = reserve_pfn_range(paddr, PAGE_SIZE, prot);
> > +		if (retval)
> > +			goto cleanup_ret;
> > +	}
> > +	return 0;
> > +
> > +cleanup_ret:
> > +	/* Reserve error: Cleanup partial reservation and return error */
> > +	for (j = 0; j < i; j += PAGE_SIZE) {
> > +		paddr = base_paddr + j;
> > +		free_pfn_range(paddr, PAGE_SIZE);
> 
> Could we simply do
> 
> 	free_pfn_range(base_paddr, i * PAGE_SIZE);
> 
> here?
> 
> If not, then perhaps add a helper function to do this, because that
> loop gets repeated in several places.
> 	

 	free_pfn_range(base_paddr, i * PAGE_SIZE);

will not work, as each free must correspond to the granularity of the earlier
reserve. I agree that making this a helper function would make the code
cleaner. Will do that in the refresh of this patch.

Thanks,
Venki


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2008-12-16 23:19 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-11-12 21:26 [patch 0/8] x86 PAT: track pfnmap mappings with remap_pfn_range and vm_insert_pfn Venkatesh Pallipadi
2008-11-12 21:26 ` [patch 1/8] x86 PAT: store vm_pgoff for all linear_over_vma_region mappings Venkatesh Pallipadi
2008-11-12 21:26 ` [patch 2/8] x86 PAT: set VM_PFNMAP flag in vm_insert_pfn Venkatesh Pallipadi
2008-11-12 23:23   ` Nick Piggin
2008-11-13  0:02     ` Pallipadi, Venkatesh
2008-11-13  3:44       ` Nick Piggin
2008-11-13 18:47         ` Pallipadi, Venkatesh
2008-11-14  2:05           ` Nick Piggin
2008-11-14 21:35             ` Pallipadi, Venkatesh
2008-11-17  2:30               ` Nick Piggin
2008-11-18 21:37                 ` Ingo Molnar
2008-11-20 23:42                   ` Pallipadi, Venkatesh
2008-11-21  0:50                     ` Nick Piggin
2008-11-15  7:38     ` Benjamin Herrenschmidt
2008-11-12 21:26 ` [patch 3/8] x86 PAT: Add follow_pfnmp_pte routine to help tracking pfnmap pages Venkatesh Pallipadi
2008-11-12 23:27   ` Nick Piggin
2008-11-12 23:54     ` Pallipadi, Venkatesh
2008-11-12 21:26 ` [patch 4/8] x86 PAT: hooks in generic vm code to help archs to track pfnmap regions Venkatesh Pallipadi
2008-12-16 19:57   ` Andrew Morton
2008-12-16 20:07     ` Pallipadi, Venkatesh
2008-12-16 20:13       ` Andrew Morton
2008-11-12 21:26 ` [patch 5/8] x86 PAT: Implement track/untrack of pfnmap regions for x86 Venkatesh Pallipadi
2008-12-16 20:07   ` Andrew Morton
2008-12-16 23:19     ` Pallipadi, Venkatesh
2008-11-12 21:26 ` [patch 6/8] x86 PAT: change pgprot_noncached to uc_minus instead of strong uc Venkatesh Pallipadi
2008-11-12 21:26 ` [patch 7/8] x86 PAT: add pgprot_writecombine() interface for drivers Venkatesh Pallipadi
2008-11-12 21:26 ` [patch 8/8] x86 PAT: update documentation to cover pgprot and remap_pfn related changes Venkatesh Pallipadi
