linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 00/12] PAT 64b: PAT support for X86_64
@ 2007-12-13 23:55 venkatesh.pallipadi
  2007-12-13 23:55 ` [RFC PATCH 01/12] PAT 64b: Add cpu_shutdown() support venkatesh.pallipadi
                   ` (12 more replies)
  0 siblings, 13 replies; 52+ messages in thread
From: venkatesh.pallipadi @ 2007-12-13 23:55 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes
  Cc: linux-kernel

Yes. It is that wonderful time of the year again.
No, no. We are not talking about holiday season or new year here.

We are talking about one another rehash of "why we do not support PAT in x86"
question and series of patches that implement some PAT support before going
into hibernation again. Only difference is that we hope to take this little
further this time and may be really get this support into
upstream kernel soon.


This series is heavily derived from the PAT patchset by Eric Biederman and
Andi Kleen.
http://www.firstfloor.org/pub/ak/x86_64/pat/

We have forwarded ported these patches to latest kernel, addressed some of the
race conditions, cut a lot more corners to get this into a working patchset
that can be seen as an RFC. Specifically, the chanegs we added include:

* Various bug fixes over original patchset above.
* Change x86_64 identity map to only map non-reserved memory. This helps
  to handle UC/WC mapping of reserved region in a much simple manner
  (we don't have to do cpa any more, as such not keep track of the actual
   reference counts. We still  track all the usages to keep the mappings
   consistent. We just avoid the headache of splitting mattr regions for
   managing ref counts for every individual usage of the reserved area).
* Modify reserve_mattr and free_mattr to handle various mapping of reserved 
  regions cleanly.


There are many rough edges in the patchset. TBD list below refers to
the open issues that we have thought through during this process. 

TBD:
* Do we need to allow RAM pages to be mapped as WC? If not, then
  we don't need to follow the TLB flush mechanism (make pte not present,
  flush, and set pte with new mapping) mentioned in section 10.12.4 of SDM
  Vol3a.
* If the above can be assumed, then for a complete solution, handle RAM
  pages with UC and /dev/mem mapping conflicts.
  Can we use the existing page struct to keep track of the /dev/mem
  mappings (through the page ref count) and not allow
  to free the page while the /dev/mem mappings are active. And
  allow /dev/mem to map only those pages which are marked reserved (which
  the driver does before doing iomap).
* For X and others, do we need the ioctl interface to sysfs or get the type
  attribute through a different sysfs file.
* Clean up early table space allocation, avoiding overallocation there.
* Avoid mapping 0 - 1M physical addresses in kernel text mapping.
* Read reserved regions in /dev/mem read() as 0xffff or something, and continue
  reading across holes, till we reach the high_memory (end of memory).
* For fork(), for every /dev/mem mapping, we have to keep track of the usage
  by doing reserve_mattr().

There are also many edges completely missing. Lot of things we did not look
at all for this first cut. Specifically:

* Only supports x86_64 for now. i386 may not even compile with this patchset.
* We did not look at implication of PAT on Suspend-Resume.
* We did not look at implications of PAT on KEXEC.
* Coding style details.

We expect this can be done easily once we have discussed/resolved the
basic PAT problems with this RFC.


Fireaway all comments, complaints, concerns and things we may break while
we do this.

Tested with 2.6.24-rc4 and X86_64.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

-- 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC PATCH 01/12] PAT 64b: Add cpu_shutdown() support
  2007-12-13 23:55 [RFC PATCH 00/12] PAT 64b: PAT support for X86_64 venkatesh.pallipadi
@ 2007-12-13 23:55 ` venkatesh.pallipadi
  2007-12-13 23:55 ` [RFC PATCH 02/12] PAT 64b: Basic PAT implementation venkatesh.pallipadi
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 52+ messages in thread
From: venkatesh.pallipadi @ 2007-12-13 23:55 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: cpu_shutdown.patch --]
[-- Type: text/plain, Size: 2039 bytes --]

Doesn't do anything yet.
  
Based on a earlier patch by Eric Biederman and Andi Kleen.

Simple forward port of cpu-shutdown.patch to x86 tree.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

Index: linux-2.6/arch/x86/kernel/reboot_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/reboot_64.c	2007-12-11 03:30:35.000000000 -0800
+++ linux-2.6/arch/x86/kernel/reboot_64.c	2007-12-11 03:30:46.000000000 -0800
@@ -113,6 +113,7 @@
 #endif
 
 	disable_IO_APIC();
+	cpu_shutdown();
 
 #ifdef CONFIG_HPET_TIMER
 	hpet_disable();
Index: linux-2.6/arch/x86/kernel/setup64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup64.c	2007-12-11 03:30:35.000000000 -0800
+++ linux-2.6/arch/x86/kernel/setup64.c	2007-12-11 03:30:46.000000000 -0800
@@ -293,3 +293,7 @@
 
 	raw_local_save_flags(kernel_eflags);
 }
+
+void cpu_shutdown(void)
+{
+}
Index: linux-2.6/include/asm-x86/processor_64.h
===================================================================
--- linux-2.6.orig/include/asm-x86/processor_64.h	2007-12-11 03:30:35.000000000 -0800
+++ linux-2.6/include/asm-x86/processor_64.h	2007-12-11 03:30:46.000000000 -0800
@@ -100,6 +100,7 @@
 extern char ignore_irq13;
 
 extern void identify_cpu(struct cpuinfo_x86 *);
+extern void cpu_shutdown(void);
 extern void print_cpu_info(struct cpuinfo_x86 *);
 extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c);
 extern unsigned int init_intel_cacheinfo(struct cpuinfo_x86 *c);
Index: linux-2.6/arch/x86/kernel/smp_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/smp_64.c	2007-10-25 02:45:41.000000000 -0700
+++ linux-2.6/arch/x86/kernel/smp_64.c	2007-12-11 03:36:31.000000000 -0800
@@ -471,6 +471,7 @@
 	 */
 	cpu_clear(smp_processor_id(), cpu_online_map);
 	disable_local_APIC();
+	cpu_shutdown();
 	for (;;) 
 		halt();
 } 

-- 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
  2007-12-13 23:55 [RFC PATCH 00/12] PAT 64b: PAT support for X86_64 venkatesh.pallipadi
  2007-12-13 23:55 ` [RFC PATCH 01/12] PAT 64b: Add cpu_shutdown() support venkatesh.pallipadi
@ 2007-12-13 23:55 ` venkatesh.pallipadi
  2007-12-14  0:42   ` Andi Kleen
  2007-12-14  3:48   ` Eric W. Biederman
  2007-12-13 23:55 ` [RFC PATCH 03/12] PAT 64b: drm driver changes for PAT venkatesh.pallipadi
                   ` (10 subsequent siblings)
  12 siblings, 2 replies; 52+ messages in thread
From: venkatesh.pallipadi @ 2007-12-13 23:55 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: pat-base.patch --]
[-- Type: text/plain, Size: 8407 bytes --]

Originally based on a patch from Eric Biederman, but heavily changed.

Forward port of pat-base.patch to x86 tree, with a bug fix.
Code was using 'PCD|PWT' i.e., PAT3 for WC mapping. So set the WC mapping at
correct PAT fields PA3/PA7.

TBD: KEXEC and other CPU offline paths may need pat_shutdown()?

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
Index: linux-2.6/arch/x86/kernel/setup64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup64.c	2007-12-11 03:30:46.000000000 -0800
+++ linux-2.6/arch/x86/kernel/setup64.c	2007-12-11 03:42:08.000000000 -0800
@@ -291,9 +291,11 @@
 
 	fpu_init(); 
 
+	pat_init();
 	raw_local_save_flags(kernel_eflags);
 }
 
 void cpu_shutdown(void)
 {
+	pat_shutdown();
 }
Index: linux-2.6/arch/x86/mm/Makefile_64
===================================================================
--- linux-2.6.orig/arch/x86/mm/Makefile_64	2007-12-11 03:30:34.000000000 -0800
+++ linux-2.6/arch/x86/mm/Makefile_64	2007-12-11 03:42:08.000000000 -0800
@@ -2,7 +2,7 @@
 # Makefile for the linux x86_64-specific parts of the memory manager.
 #
 
-obj-y	 := init_64.o fault_64.o ioremap_64.o extable_64.o pageattr_64.o mmap_64.o
+obj-y	 := init_64.o fault_64.o ioremap_64.o extable_64.o pageattr_64.o mmap_64.o pat.o
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
 obj-$(CONFIG_NUMA) += numa_64.o
 obj-$(CONFIG_K8_NUMA) += k8topology_64.o
Index: linux-2.6/arch/x86/mm/pat.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/arch/x86/mm/pat.c	2007-12-11 04:12:47.000000000 -0800
@@ -0,0 +1,57 @@
+/* Handle caching attributes in page tables (PAT) */
+#include <linux/mm.h>
+#include <linux/kernel.h>
+#include <linux/rbtree.h>
+#include <linux/gfp.h>
+#include <asm/msr.h>
+#include <asm/tlbflush.h>
+#include <asm/processor.h>
+
+static u64 boot_pat_state;
+
+enum {
+	PAT_UC = 0,   	/* uncached */
+	PAT_WC = 1,		/* Write combining */
+	PAT_WT = 4,		/* Write Through */
+	PAT_WP = 5,		/* Write Protected */
+	PAT_WB = 6,		/* Write Back (default) */
+	PAT_UC_MINUS = 7,	/* UC, but can be overriden by MTRR */
+};
+
+#define PAT(x,y) ((u64)PAT_ ## y << ((x)*8))
+
+void __cpuinit pat_init(void)
+{
+	/* Set PWT+PCD to Write-Combining. All other bits stay the same */
+	if (cpu_has_pat) {
+		u64 pat;
+		/* PTE encoding used in Linux:
+                   PAT
+                   |PCD
+                   ||PWT
+                   |||
+		   000 WB         default
+		   010 UC_MINUS   _PAGE_PCD
+		   011 WC         _PAGE_WC
+		   PAT bit unused */
+		pat = PAT(0,WB) | PAT(1,WT) | PAT(2,UC_MINUS) | PAT(3,WC) |
+		      PAT(4,WB) | PAT(5,WT) | PAT(6,UC_MINUS) | PAT(7,WC);
+		rdmsrl(MSR_IA32_CR_PAT, boot_pat_state);
+		wrmsrl(MSR_IA32_CR_PAT, pat);
+		__flush_tlb_all();
+		asm volatile("wbinvd");
+	}
+}
+
+#undef PAT
+
+void pat_shutdown(void)
+{
+	/* Restore CPU default pat state */
+	if (cpu_has_pat) {
+		wrmsrl(MSR_IA32_CR_PAT, boot_pat_state);
+		__flush_tlb_all();
+		asm volatile("wbinvd");
+	}
+}
+
Index: linux-2.6/arch/x86/pci/i386.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/i386.c	2007-12-11 03:30:34.000000000 -0800
+++ linux-2.6/arch/x86/pci/i386.c	2007-12-11 03:42:08.000000000 -0800
@@ -300,8 +300,6 @@
 int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
 			enum pci_mmap_state mmap_state, int write_combine)
 {
-	unsigned long prot;
-
 	/* I/O space cannot be accessed via normal processor loads and
 	 * stores on this platform.
 	 */
@@ -311,14 +309,11 @@
 	/* Leave vm_pgoff as-is, the PCI space address is the physical
 	 * address on this platform.
 	 */
-	prot = pgprot_val(vma->vm_page_prot);
-	if (boot_cpu_data.x86 > 3)
-		prot |= _PAGE_PCD | _PAGE_PWT;
-	vma->vm_page_prot = __pgprot(prot);
+	if (write_combine)
+		vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+	else
+		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 
-	/* Write-combine setting is ignored, it is changed via the mtrr
-	 * interfaces on this platform.
-	 */
 	if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
 			       vma->vm_end - vma->vm_start,
 			       vma->vm_page_prot))
Index: linux-2.6/include/asm-x86/cpufeature_32.h
===================================================================
--- linux-2.6.orig/include/asm-x86/cpufeature_32.h	2007-12-11 03:30:34.000000000 -0800
+++ linux-2.6/include/asm-x86/cpufeature_32.h	2007-12-11 03:42:08.000000000 -0800
@@ -166,6 +166,8 @@
 #define cpu_has_clflush		boot_cpu_has(X86_FEATURE_CLFLSH)
 #define cpu_has_bts 		boot_cpu_has(X86_FEATURE_BTS)
 
+#define cpu_has_pat		boot_cpu_has(X86_FEATURE_PAT)
+
 #endif /* __ASM_I386_CPUFEATURE_H */
 
 /* 
Index: linux-2.6/include/asm-x86/msr-index.h
===================================================================
--- linux-2.6.orig/include/asm-x86/msr-index.h	2007-12-11 03:30:34.000000000 -0800
+++ linux-2.6/include/asm-x86/msr-index.h	2007-12-11 03:42:08.000000000 -0800
@@ -63,6 +63,7 @@
 #define MSR_IA32_LASTINTFROMIP		0x000001dd
 #define MSR_IA32_LASTINTTOIP		0x000001de
 
+#define MSR_IA32_CR_PAT			0x00000277
 #define MSR_IA32_MC0_CTL		0x00000400
 #define MSR_IA32_MC0_STATUS		0x00000401
 #define MSR_IA32_MC0_ADDR		0x00000402
Index: linux-2.6/include/asm-x86/pgtable_64.h
===================================================================
--- linux-2.6.orig/include/asm-x86/pgtable_64.h	2007-12-11 03:30:34.000000000 -0800
+++ linux-2.6/include/asm-x86/pgtable_64.h	2007-12-11 03:42:08.000000000 -0800
@@ -164,6 +164,12 @@
 #define _PAGE_FILE	0x040	/* nonlinear file mapping, saved PTE; unset:swap */
 #define _PAGE_GLOBAL	0x100	/* Global TLB entry */
 
+/* We redefine PWT|PCD to be write combining. PAT bit is not used */
+
+#define _PAGE_WC	(_PAGE_PWT|_PAGE_PCD)
+
+#define _PAGE_CACHE_MASK	(_PAGE_PWT|_PAGE_PCD)
+
 #define _PAGE_PROTNONE	0x080	/* If not present */
 #define _PAGE_NX        (_AC(1,UL)<<_PAGE_BIT_NX)
 
@@ -203,6 +209,7 @@
 #define PAGE_KERNEL_EXEC MAKE_GLOBAL(__PAGE_KERNEL_EXEC)
 #define PAGE_KERNEL_RO MAKE_GLOBAL(__PAGE_KERNEL_RO)
 #define PAGE_KERNEL_NOCACHE MAKE_GLOBAL(__PAGE_KERNEL_NOCACHE)
+#define PAGE_KERNEL_WC MAKE_GLOBAL(__PAGE_KERNEL_WC)
 #define PAGE_KERNEL_VSYSCALL32 __pgprot(__PAGE_KERNEL_VSYSCALL)
 #define PAGE_KERNEL_VSYSCALL MAKE_GLOBAL(__PAGE_KERNEL_VSYSCALL)
 #define PAGE_KERNEL_LARGE MAKE_GLOBAL(__PAGE_KERNEL_LARGE)
@@ -299,8 +306,24 @@
 
 /*
  * Macro to mark a page protection value as "uncacheable".
+ * Accesses through a uncached translation bypasses the cache
+ * and do not allow for consecutive writes to be combined.
  */
-#define pgprot_noncached(prot)	(__pgprot(pgprot_val(prot) | _PAGE_PCD | _PAGE_PWT))
+#define pgprot_noncached(prot) \
+	__pgprot((pgprot_val(prot) & ~_PAGE_CACHE_MASK) | _PAGE_PCD)
+
+/*
+ * Macro to make mark a page protection value as "write-combining".
+ * Accesses through a write-combining translation works bypasses the
+ * caches, but does allow for consecutive writes to be combined into
+ * single (but larger) write transactions.
+ * This is mostly useful for IO accesses, for memory it is often slower.
+ * It also implies uncached.
+ */
+#define pgprot_writecombine(prot) \
+	__pgprot((pgprot_val(prot) & ~_PAGE_CACHE_MASK) | _PAGE_WC)
+
+#define pgprot_nonstd(prot) (pgprot_val(prot) & _PAGE_CACHE_MASK)
 
 static inline int pmd_large(pmd_t pte) { 
 	return (pmd_val(pte) & __LARGE_PTE) == __LARGE_PTE; 
@@ -414,6 +437,7 @@
 #define pgtable_cache_init()   do { } while (0)
 #define check_pgt_cache()      do { } while (0)
 
+/* AGP users use MTRRs for now. Need to add an ioctl to agpgart for WC */
 #define PAGE_AGP    PAGE_KERNEL_NOCACHE
 #define HAVE_PAGE_AGP 1
 
Index: linux-2.6/include/asm-x86/processor_64.h
===================================================================
--- linux-2.6.orig/include/asm-x86/processor_64.h	2007-12-11 03:30:46.000000000 -0800
+++ linux-2.6/include/asm-x86/processor_64.h	2007-12-11 03:42:08.000000000 -0800
@@ -105,6 +105,8 @@
 extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c);
 extern unsigned int init_intel_cacheinfo(struct cpuinfo_x86 *c);
 extern unsigned short num_cache_leaves;
+extern void pat_init(void);
+extern void pat_shutdown(void);
 
 /*
  * Save the cr4 feature set we're using (ie

-- 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC PATCH 03/12] PAT 64b: drm driver changes for PAT
  2007-12-13 23:55 [RFC PATCH 00/12] PAT 64b: PAT support for X86_64 venkatesh.pallipadi
  2007-12-13 23:55 ` [RFC PATCH 01/12] PAT 64b: Add cpu_shutdown() support venkatesh.pallipadi
  2007-12-13 23:55 ` [RFC PATCH 02/12] PAT 64b: Basic PAT implementation venkatesh.pallipadi
@ 2007-12-13 23:55 ` venkatesh.pallipadi
  2007-12-13 23:55 ` [RFC PATCH 04/12] PAT 64b: reserve_mattr and free_mattr " venkatesh.pallipadi
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 52+ messages in thread
From: venkatesh.pallipadi @ 2007-12-13 23:55 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: pat-drivers.patch --]
[-- Type: text/plain, Size: 3042 bytes --]

Straight forward port of pat-drivers.patch to x86 tree

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

diff --git a/drivers/char/drm/drm_proc.c b/drivers/char/drm/drm_proc.c
index 12dfea8..c49be81 100644
--- a/drivers/char/drm/drm_proc.c
+++ b/drivers/char/drm/drm_proc.c
@@ -510,13 +510,13 @@ static int drm__vma_info(char *buf, char **start, off_t offset, int request,
 			       vma->vm_flags & VM_IO ? 'i' : '-',
 			       vma->vm_pgoff);
 
-#if defined(__i386__)
+#if defined(CONFIG_X86)
 		pgprot = pgprot_val(vma->vm_page_prot);
 		DRM_PROC_PRINT(" %c%c%c%c%c%c%c%c%c",
 			       pgprot & _PAGE_PRESENT ? 'p' : '-',
 			       pgprot & _PAGE_RW ? 'w' : 'r',
 			       pgprot & _PAGE_USER ? 'u' : 's',
-			       pgprot & _PAGE_PWT ? 't' : 'b',
+			       ((pgprot & _PAGE_CACHE_MASK) == _PAGE_WC) ? 'w' : 'b',
 			       pgprot & _PAGE_PCD ? 'u' : 'c',
 			       pgprot & _PAGE_ACCESSED ? 'a' : '-',
 			       pgprot & _PAGE_DIRTY ? 'd' : '-',
diff --git a/drivers/char/drm/drm_vm.c b/drivers/char/drm/drm_vm.c
index e8d50af..1bd4b49 100644
--- a/drivers/char/drm/drm_vm.c
+++ b/drivers/char/drm/drm_vm.c
@@ -45,11 +45,8 @@ static pgprot_t drm_io_prot(uint32_t map_type, struct vm_area_struct *vma)
 {
 	pgprot_t tmp = vm_get_page_prot(vma->vm_flags);
 
-#if defined(__i386__) || defined(__x86_64__)
-	if (boot_cpu_data.x86 > 3 && map_type != _DRM_AGP) {
-		pgprot_val(tmp) |= _PAGE_PCD;
-		pgprot_val(tmp) &= ~_PAGE_PWT;
-	}
+#ifdef CONFIG_X86
+	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
 #elif defined(__powerpc__)
 	pgprot_val(tmp) |= _PAGE_NO_CACHE;
 	if (map_type == _DRM_REGISTERS)
diff --git a/drivers/video/gbefb.c b/drivers/video/gbefb.c
index b9b572b..1126e20 100644
--- a/drivers/video/gbefb.c
+++ b/drivers/video/gbefb.c
@@ -57,7 +57,7 @@ struct gbefb_par {
 #endif
 #endif
 #ifdef CONFIG_X86
-#define pgprot_fb(_prot) ((_prot) | _PAGE_PCD)
+#define pgprot_fb(_prot) pgprot_writecombine(_prot)
 #endif
 
 /*
diff --git a/drivers/video/sgivwfb.c b/drivers/video/sgivwfb.c
index 4fb1624..3a75a7b 100644
--- a/drivers/video/sgivwfb.c
+++ b/drivers/video/sgivwfb.c
@@ -714,8 +714,7 @@ static int sgivwfb_mmap(struct fb_info *info,
 	if (offset + size > sgivwfb_mem_size)
 		return -EINVAL;
 	offset += sgivwfb_mem_phys;
-	pgprot_val(vma->vm_page_prot) =
-	    pgprot_val(vma->vm_page_prot) | _PAGE_PCD;
+	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
 	vma->vm_flags |= VM_IO;
 	if (remap_pfn_range(vma, vma->vm_start, offset >> PAGE_SHIFT,
 						size, vma->vm_page_prot))
diff --git a/include/asm-x86/fb.h b/include/asm-x86/fb.h
index 5301846..e438d48 100644
--- a/include/asm-x86/fb.h
+++ b/include/asm-x86/fb.h
@@ -8,8 +8,7 @@
 static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
 				unsigned long off)
 {
-	if (boot_cpu_data.x86 > 3)
-		pgprot_val(vma->vm_page_prot) |= _PAGE_PCD;
+	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
 }
 
 #ifdef CONFIG_X86_32

-- 

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [RFC PATCH 04/12] PAT 64b: reserve_mattr and free_mattr for PAT
  2007-12-13 23:55 [RFC PATCH 00/12] PAT 64b: PAT support for X86_64 venkatesh.pallipadi
                   ` (2 preceding siblings ...)
  2007-12-13 23:55 ` [RFC PATCH 03/12] PAT 64b: drm driver changes for PAT venkatesh.pallipadi
@ 2007-12-13 23:55 ` venkatesh.pallipadi
  2007-12-13 23:55 ` [RFC PATCH 05/12] PAT 64b: pci mmap conlfict patch venkatesh.pallipadi
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 52+ messages in thread
From: venkatesh.pallipadi @ 2007-12-13 23:55 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: pat-conflict.patch --]
[-- Type: text/plain, Size: 5133 bytes --]

Straight forward port of pat-conflict.patch to x86 tree.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

Index: linux-2.6.24-rc4/arch/x86/mm/ioremap_64.c
===================================================================
--- linux-2.6.24-rc4.orig/arch/x86/mm/ioremap_64.c	2007-12-11 14:24:56.000000000 -0800
+++ linux-2.6.24-rc4/arch/x86/mm/ioremap_64.c	2007-12-12 15:03:26.000000000 -0800
@@ -19,6 +19,7 @@
 #include <asm/tlbflush.h>
 #include <asm/cacheflush.h>
 #include <asm/proto.h>
+#include <asm/pat.h>
 
 unsigned long __phys_addr(unsigned long x)
 {
@@ -125,12 +126,23 @@
 		remove_vm_area((void *)(PAGE_MASK & (unsigned long) addr));
 		return NULL;
 	}
+
+	/* For plain ioremap() get the existing attributes. Otherwise
+	   check against the existing ones */
+	if (reserve_mattr(phys_addr, phys_addr + size, flags,
+			  flags ? NULL : &flags) < 0)
+		goto out;
+
 	if (flags && ioremap_change_attr(phys_addr, size, flags) < 0) {
-		area->flags &= 0xffffff;
-		vunmap(addr);
-		return NULL;
+		free_mattr(phys_addr, phys_addr + size, flags);
+		goto out;
 	}
 	return (__force void __iomem *) (offset + (char *)addr);
+
+out:
+	area->flags &= 0xffffff;
+	vunmap(addr);
+	return NULL;
 }
 EXPORT_SYMBOL(__ioremap);
 
@@ -198,8 +210,10 @@
 	}
 
 	/* Reset the direct mapping. Can block */
-	if (p->flags >> 20)
+	if (p->flags >> 20) {
+		free_mattr(p->phys_addr, p->phys_addr + p->size, p->flags>>20);
 		ioremap_change_attr(p->phys_addr, p->size, 0);
+	}
 
 	/* Finally remove it */
 	o = remove_vm_area((void *)addr);
Index: linux-2.6.24-rc4/arch/x86/mm/pat.c
===================================================================
--- linux-2.6.24-rc4.orig/arch/x86/mm/pat.c	2007-12-11 15:08:12.000000000 -0800
+++ linux-2.6.24-rc4/arch/x86/mm/pat.c	2007-12-12 15:06:52.000000000 -0800
@@ -6,6 +6,8 @@
 #include <asm/msr.h>
 #include <asm/tlbflush.h>
 #include <asm/processor.h>
+#include <asm/pgtable.h>
+#include <asm/pat.h>
 
 static u64 boot_pat_state;
 
@@ -55,3 +57,96 @@
 	}
 }
 
+/* The global memattr list keeps track of caching attributes for specific
+   physical memory areas. Conflicting caching attributes in different
+   mappings can cause CPU cache corruption. To avoid this we keep track.
+
+   The list is sorted and can contain multiple entries for each address
+   (this allows reference counting for overlapping areas). All the aliases
+   have the same cache attributes of course.  Zero attributes are represente
+   as holes.
+
+   Currently the data structure is a list because the number of mappings
+   are right now expected to be relatively small. If this should be a problem
+   it could be changed to a rbtree or similar.
+
+   mattr_lock protects the whole list. */
+
+struct memattr {
+	struct list_head nd;
+	u64 start;
+	u64 end;
+	unsigned long attr;
+};
+
+static LIST_HEAD(mattr_list);
+static DEFINE_SPINLOCK(mattr_lock); 	/* protects memattr list */
+
+int reserve_mattr(u64 start, u64 end, unsigned long attr, unsigned long *fattr)
+{
+	struct memattr *ma = NULL, *ml;
+	int err = 0;
+	if (attr) {
+		ma  = kmalloc(sizeof(struct memattr), GFP_KERNEL);
+		if (!ma)
+			return -ENOMEM;
+		ma->start = start;
+		ma->end = end;
+		ma->attr = attr;
+	}
+	if (fattr)
+		*fattr = attr;
+	spin_lock(&mattr_lock);
+	list_for_each_entry(ml, &mattr_list, nd) {
+		if (ml->start <= start && ml->end >= end) {
+			if (fattr) {
+				attr = ml->attr;
+				*fattr = attr;
+			}
+			if (attr != ml->attr) {
+				printk(
+	KERN_ERR "%s:%d conflicting cache attribute %Lx-%Lx %lx<->%lx\n",
+					current->comm, current->pid,
+					start, end, attr, ml->attr);
+				err = -EBUSY;
+				break;
+			}
+		} else if (ml->start >= end) {
+			if (ma) {
+				list_add(&ma->nd, ml->nd.prev);
+				ma = NULL;
+			}
+			break;
+		}
+	}
+	if (ma)
+		list_add_tail(&ma->nd, &mattr_list);
+	spin_unlock(&mattr_lock);
+	return 0;
+}
+
+int free_mattr(u64 start, u64 end, unsigned long attr)
+{
+	struct memattr *ml;
+	int err = attr ? -EBUSY : 0;
+	spin_lock(&mattr_lock);
+	list_for_each_entry(ml, &mattr_list, nd) {
+		if (ml->start == start && ml->end == end) {
+			if (ml->attr != attr)
+				printk(KERN_ERR
+	"%s:%d conflicting cache attributes on free %Lx-%Lx %lx<->%lx\n",
+			current->comm, current->pid, start, end, attr,ml->attr);
+			list_del(&ml->nd);
+			kfree(ml);
+			err = 0;
+			break;
+		}
+	}
+	spin_unlock(&mattr_lock);
+	if (err)
+		printk(KERN_ERR "%s:%d freeing invalid mattr %Lx-%Lx %lx\n",
+			current->comm, current->pid,
+			start, end, attr);
+	return err;
+}
+
Index: linux-2.6.24-rc4/include/asm-x86/pat.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.24-rc4/include/asm-x86/pat.h	2007-12-11 15:39:28.000000000 -0800
@@ -0,0 +1,12 @@
+#ifndef _ASM_PAT_H
+#define _ASM_PAT_H 1
+
+#include <linux/types.h>
+
+/* Handle the page attribute table (PAT) of the CPU */
+
+int reserve_mattr(u64 start, u64 end, unsigned long attr, unsigned long *fattr);
+int free_mattr(u64 start, u64 end, unsigned long attr);
+
+#endif
+

-- 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC PATCH 05/12] PAT 64b: pci mmap conlfict patch
  2007-12-13 23:55 [RFC PATCH 00/12] PAT 64b: PAT support for X86_64 venkatesh.pallipadi
                   ` (3 preceding siblings ...)
  2007-12-13 23:55 ` [RFC PATCH 04/12] PAT 64b: reserve_mattr and free_mattr " venkatesh.pallipadi
@ 2007-12-13 23:55 ` venkatesh.pallipadi
  2007-12-13 23:55 ` [RFC PATCH 06/12] PAT 64b: Add ioremap_wc support venkatesh.pallipadi
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 52+ messages in thread
From: venkatesh.pallipadi @ 2007-12-13 23:55 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: pci-mmap-conflict.patch --]
[-- Type: text/plain, Size: 1979 bytes --]

Forward port of pci-mmap-conflict.patch to x86 tree.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

Index: linux-2.6.24-rc4/arch/x86/pci/i386.c
===================================================================
--- linux-2.6.24-rc4.orig/arch/x86/pci/i386.c	2007-12-11 15:08:12.000000000 -0800
+++ linux-2.6.24-rc4/arch/x86/pci/i386.c	2007-12-11 15:43:14.000000000 -0800
@@ -30,6 +30,8 @@
 #include <linux/init.h>
 #include <linux/ioport.h>
 #include <linux/errno.h>
+#include <asm/pat.h>
+#include <asm/cacheflush.h>
 
 #include "pci.h"
 
@@ -297,9 +299,25 @@
 	pci_write_config_byte(dev, PCI_LATENCY_TIMER, lat);
 }
 
+static void pci_unmap_page_range(struct vm_area_struct *vma)
+{
+	u64 adr = (u64)vma->vm_pgoff << PAGE_SHIFT;
+	free_mattr(adr, adr + vma->vm_end - vma->vm_start,
+		pgprot_val(vma->vm_page_prot) & _PAGE_CACHE_MASK);
+}
+
+static struct vm_operations_struct pci_mmap_ops = {
+	.close = pci_unmap_page_range
+};
+
 int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
 			enum pci_mmap_state mmap_state, int write_combine)
 {
+	u64 addr = vma->vm_pgoff << PAGE_SHIFT;
+	unsigned long len = vma->vm_end - vma->vm_start;
+	unsigned long attr;
+	int err;
+
 	/* I/O space cannot be accessed via normal processor loads and
 	 * stores on this platform.
 	 */
@@ -314,10 +332,24 @@
 	else
 		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 
+	attr = pgprot_val(vma->vm_page_prot) & _PAGE_CACHE_MASK;
+	err = reserve_mattr(addr, addr+len, attr, NULL);
+	if (err)
+		return -EBUSY;
+
+	err = change_page_attr_addr(addr, len >> PAGE_SHIFT,
+				__pgprot(__PAGE_KERNEL | attr));
+	if (err) {
+		free_mattr(addr, addr+len, attr);
+		return err;
+	}
+
 	if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
 			       vma->vm_end - vma->vm_start,
 			       vma->vm_page_prot))
 		return -EAGAIN;
 
+	vma->vm_ops = &pci_mmap_ops;
+
 	return 0;
 }

-- 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC PATCH 06/12] PAT 64b: Add ioremap_wc support
  2007-12-13 23:55 [RFC PATCH 00/12] PAT 64b: PAT support for X86_64 venkatesh.pallipadi
                   ` (4 preceding siblings ...)
  2007-12-13 23:55 ` [RFC PATCH 05/12] PAT 64b: pci mmap conlfict patch venkatesh.pallipadi
@ 2007-12-13 23:55 ` venkatesh.pallipadi
  2007-12-14  4:17   ` Roland Dreier
  2007-12-13 23:55 ` [RFC PATCH 07/12] PAT 64b: dev mem chanegs for pat venkatesh.pallipadi
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 52+ messages in thread
From: venkatesh.pallipadi @ 2007-12-13 23:55 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: ioremap_wc.patch --]
[-- Type: text/plain, Size: 2509 bytes --]

Forward port of ioremap.patch to x86 tree.
 
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

Index: linux-2.6.24-rc4/arch/x86/mm/ioremap_64.c
===================================================================
--- linux-2.6.24-rc4.orig/arch/x86/mm/ioremap_64.c	2007-12-11 15:48:35.000000000 -0800
+++ linux-2.6.24-rc4/arch/x86/mm/ioremap_64.c	2007-12-11 15:49:52.000000000 -0800
@@ -147,7 +147,7 @@
 EXPORT_SYMBOL(__ioremap);
 
 /**
- * ioremap_nocache     -   map bus memory into CPU space
+ * ioremap_nocache     -   map bus memory into CPU space uncached
  * @offset:    bus address of the memory
  * @size:      size of the resource to map
  *
@@ -175,6 +175,30 @@
 EXPORT_SYMBOL(ioremap_nocache);
 
 /**
+ * ioremap_wc    -   map bus memory into CPU space write combined
+ * @offset:    bus address of the memory
+ * @size:      size of the resource to map
+ *
+ * ioremap_wc performs a platform specific sequence of operations to
+ * make bus memory CPU accessible via the readb/readw/readl/writeb/
+ * writew/writel functions and the other mmio helpers. The returned
+ * address is not guaranteed to be usable directly as a virtual
+ * address.
+ *
+ * This version of ioremap ensures that the memory is marked write combining.
+ * Write combining allows faster writes to some hardware devices.
+ * See also iounmap_nocache for more details.
+ *
+ * Must be freed with iounmap.
+ */
+void __iomem *ioremap_wc(unsigned long phys_addr, unsigned long size)
+{
+	return __ioremap(phys_addr, size, _PAGE_WC);
+}
+EXPORT_SYMBOL(ioremap_wc);
+
+
+/**
  * iounmap - Free a IO remapping
  * @addr: virtual address from ioremap_*
  *
Index: linux-2.6.24-rc4/include/asm-x86/io_64.h
===================================================================
--- linux-2.6.24-rc4.orig/include/asm-x86/io_64.h	2007-12-11 14:24:56.000000000 -0800
+++ linux-2.6.24-rc4/include/asm-x86/io_64.h	2007-12-11 15:49:52.000000000 -0800
@@ -142,7 +142,8 @@
  * it's useful if some control registers are in such an area and write combining
  * or read caching is not desirable:
  */
-extern void __iomem * ioremap_nocache (unsigned long offset, unsigned long size);
+extern void __iomem * ioremap_nocache(unsigned long offset, unsigned long size);
+extern void __iomem * ioremap_wc(unsigned long offset, unsigned long size);
 extern void iounmap(volatile void __iomem *addr);
 extern void __iomem *fix_ioremap(unsigned idx, unsigned long phys);
 

-- 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC PATCH 07/12] PAT 64b: dev mem chanegs for pat
  2007-12-13 23:55 [RFC PATCH 00/12] PAT 64b: PAT support for X86_64 venkatesh.pallipadi
                   ` (5 preceding siblings ...)
  2007-12-13 23:55 ` [RFC PATCH 06/12] PAT 64b: Add ioremap_wc support venkatesh.pallipadi
@ 2007-12-13 23:55 ` venkatesh.pallipadi
  2007-12-13 23:55 ` [RFC PATCH 08/12] PAT 64b: coherent mmap and sysfs bin ioctl venkatesh.pallipadi
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 52+ messages in thread
From: venkatesh.pallipadi @ 2007-12-13 23:55 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: devmem.patch --]
[-- Type: text/plain, Size: 7303 bytes --]

Forward port of devmem.patch to x86 tree. With added bug fix of doing
cpa only with non zero flags.

TBD:

1. Handle RAM pages with UC/WC and /dev/mem mapping conflicts. This
conflict with RAM mapped as UC (identity mapping) and /dev/mem mapping
is already there in current mainline kernel.

2. For fork(), for every /dev/mem mapping, we have to keep track
of the usage by doing reserve_mattr().

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

Index: linux-2.6.24-rc4/arch/x86/mm/pat.c
===================================================================
--- linux-2.6.24-rc4.orig/arch/x86/mm/pat.c	2007-12-11 15:39:28.000000000 -0800
+++ linux-2.6.24-rc4/arch/x86/mm/pat.c	2007-12-11 15:59:29.000000000 -0800
@@ -3,11 +3,14 @@
 #include <linux/kernel.h>
 #include <linux/rbtree.h>
 #include <linux/gfp.h>
+#include <linux/fs.h>
 #include <asm/msr.h>
 #include <asm/tlbflush.h>
 #include <asm/processor.h>
 #include <asm/pgtable.h>
 #include <asm/pat.h>
+#include <asm/cacheflush.h>
+#include <asm/fcntl.h>
 
 static u64 boot_pat_state;
 
@@ -57,6 +60,16 @@
 	}
 }
 
+static char *cattr_name(unsigned long flags)
+{
+	switch (flags & _PAGE_CACHE_MASK) {
+	case _PAGE_WC:  return "write combining";
+	case _PAGE_PCD: return "uncached";
+	case 0: 	return "default";
+	default: 	return "broken";
+	}
+}
+
 /* The global memattr list keeps track of caching attributes for specific
    physical memory areas. Conflicting caching attributes in different
    mappings can cause CPU cache corruption. To avoid this we keep track.
@@ -105,9 +118,10 @@
 			}
 			if (attr != ml->attr) {
 				printk(
-	KERN_ERR "%s:%d conflicting cache attribute %Lx-%Lx %lx<->%lx\n",
+	KERN_ERR "%s:%d conflicting cache attribute %Lx-%Lx %s<->%s\n",
 					current->comm, current->pid,
-					start, end, attr, ml->attr);
+					start, end,
+					cattr_name(attr), cattr_name(ml->attr));
 				err = -EBUSY;
 				break;
 			}
@@ -134,19 +148,60 @@
 		if (ml->start == start && ml->end == end) {
 			if (ml->attr != attr)
 				printk(KERN_ERR
-	"%s:%d conflicting cache attributes on free %Lx-%Lx %lx<->%lx\n",
-			current->comm, current->pid, start, end, attr,ml->attr);
+	"%s:%d conflicting cache attributes on free %Lx-%Lx %s<->%s\n",
+			current->comm, current->pid, start, end,
+			cattr_name(attr), cattr_name(ml->attr));
 			list_del(&ml->nd);
 			kfree(ml);
 			err = 0;
 			break;
 		}
 	}
 	spin_unlock(&mattr_lock);
 	if (err)
-		printk(KERN_ERR "%s:%d freeing invalid mattr %Lx-%Lx %lx\n",
+		printk(KERN_ERR "%s:%d freeing invalid mattr %Lx-%Lx %s\n",
 			current->comm, current->pid,
-			start, end, attr);
+			start, end, cattr_name(attr));
 	return err;
 }
 
+/* /dev/mem interface. Use the previous mapping */
+pgprot_t
+phys_mem_access_prot(struct file *file, unsigned long pfn, unsigned long size,
+		     pgprot_t vma_prot)
+{
+	u64 offset = pfn << PAGE_SHIFT;
+	unsigned long flags;
+	unsigned long want_flags = 0;
+	if ((file->f_flags & O_SYNC) || (offset >= __pa(high_memory)))
+		want_flags = _PAGE_PCD;
+
+	/* ignore error because we can't handle it here */
+	reserve_mattr(offset, offset+size, want_flags, &flags);
+	if (flags != want_flags) {
+		printk(KERN_INFO
+	"%s:%d /dev/mem expected mapping type %s for %Lx-%Lx, got %s\n",
+			current->comm, current->pid,
+			cattr_name(want_flags),
+			offset, offset+size,
+			cattr_name(flags));
+	}
+
+	if (offset < __pa(high_memory) && flags) {
+		/* RED-PEN when the kernel memory was write protected
+		   or similar before we'll destroy that here. need a pgprot
+		   mask in cpa? */
+		change_page_attr_addr(offset, size >> PAGE_SHIFT,
+				     __pgprot(__PAGE_KERNEL | flags));
+	}
+	return __pgprot((pgprot_val(vma_prot) & ~_PAGE_CACHE_MASK)|flags);
+}
+
+void unmap_devmem(unsigned long pfn, unsigned long size, pgprot_t vma_prot)
+{
+	u64 addr = (u64)pfn << PAGE_SHIFT;
+	free_mattr(addr, size, 0);
+	if (addr < __pa(high_memory) &&
+	   (pgprot_val(vma_prot) & _PAGE_CACHE_MASK))
+		change_page_attr_addr(addr, size >> PAGE_SHIFT, PAGE_KERNEL);
+}
Index: linux-2.6.24-rc4/drivers/char/mem.c
===================================================================
--- linux-2.6.24-rc4.orig/drivers/char/mem.c	2007-12-11 14:24:56.000000000 -0800
+++ linux-2.6.24-rc4/drivers/char/mem.c	2007-12-11 15:59:29.000000000 -0800
@@ -41,36 +41,7 @@
  */
 static inline int uncached_access(struct file *file, unsigned long addr)
 {
-#if defined(__i386__) && !defined(__arch_um__)
-	/*
-	 * On the PPro and successors, the MTRRs are used to set
-	 * memory types for physical addresses outside main memory,
-	 * so blindly setting PCD or PWT on those pages is wrong.
-	 * For Pentiums and earlier, the surround logic should disable
-	 * caching for the high addresses through the KEN pin, but
-	 * we maintain the tradition of paranoia in this code.
-	 */
-	if (file->f_flags & O_SYNC)
-		return 1;
- 	return !( test_bit(X86_FEATURE_MTRR, boot_cpu_data.x86_capability) ||
-		  test_bit(X86_FEATURE_K6_MTRR, boot_cpu_data.x86_capability) ||
-		  test_bit(X86_FEATURE_CYRIX_ARR, boot_cpu_data.x86_capability) ||
-		  test_bit(X86_FEATURE_CENTAUR_MCR, boot_cpu_data.x86_capability) )
-	  && addr >= __pa(high_memory);
-#elif defined(__x86_64__) && !defined(__arch_um__)
-	/* 
-	 * This is broken because it can generate memory type aliases,
-	 * which can cause cache corruptions
-	 * But it is only available for root and we have to be bug-to-bug
-	 * compatible with i386.
-	 */
-	if (file->f_flags & O_SYNC)
-		return 1;
-	/* same behaviour as i386. PAT always set to cached and MTRRs control the
-	   caching behaviour. 
-	   Hopefully a full PAT implementation will fix that soon. */	   
-	return 0;
-#elif defined(CONFIG_IA64)
+#if defined(CONFIG_IA64)
 	/*
 	 * On ia64, we ignore O_SYNC because we cannot tolerate memory attribute aliases.
 	 */
@@ -271,6 +242,22 @@
 }
 #endif
 
+void __attribute__((weak))
+unmap_devmem(unsigned long pfn, unsigned long len, pgprot_t prot)
+{
+	/* nothing. architectures can override. */
+}
+
+static void mmap_mem_close(struct vm_area_struct *vma)
+{
+	unmap_devmem(vma->vm_pgoff,  vma->vm_end - vma->vm_start,
+		     vma->vm_page_prot);
+}
+
+static struct vm_operations_struct mmap_mem_ops = {
+	.close = mmap_mem_close
+};
+
 static int mmap_mem(struct file * file, struct vm_area_struct * vma)
 {
 	size_t size = vma->vm_end - vma->vm_start;
@@ -285,6 +272,8 @@
 						 size,
 						 vma->vm_page_prot);
 
+	vma->vm_ops = &mmap_mem_ops;
+
 	/* Remap-pfn-range will mark the range VM_IO and VM_RESERVED */
 	if (remap_pfn_range(vma,
 			    vma->vm_start,
Index: linux-2.6.24-rc4/include/asm-x86/pgtable_64.h
===================================================================
--- linux-2.6.24-rc4.orig/include/asm-x86/pgtable_64.h	2007-12-11 15:08:12.000000000 -0800
+++ linux-2.6.24-rc4/include/asm-x86/pgtable_64.h	2007-12-11 15:59:29.000000000 -0800
@@ -452,6 +452,12 @@
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
 #define __HAVE_ARCH_PTE_SAME
 #include <asm-generic/pgtable.h>
+
+#define __HAVE_PHYS_MEM_ACCESS_PROT
+struct file;
+pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
+                              unsigned long size, pgprot_t vma_prot);
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* _X86_64_PGTABLE_H */

-- 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC PATCH 08/12] PAT 64b: coherent mmap and sysfs bin ioctl
  2007-12-13 23:55 [RFC PATCH 00/12] PAT 64b: PAT support for X86_64 venkatesh.pallipadi
                   ` (6 preceding siblings ...)
  2007-12-13 23:55 ` [RFC PATCH 07/12] PAT 64b: dev mem chanegs for pat venkatesh.pallipadi
@ 2007-12-13 23:55 ` venkatesh.pallipadi
  2007-12-14  0:19   ` Greg KH
                     ` (2 more replies)
  2007-12-13 23:55 ` [RFC PATCH 09/12] PAT 64b: map only usable memory in identity mapping venkatesh.pallipadi
                   ` (4 subsequent siblings)
  12 siblings, 3 replies; 52+ messages in thread
From: venkatesh.pallipadi @ 2007-12-13 23:55 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: sysfs-bin-ioctl.patch --]
[-- Type: text/plain, Size: 5491 bytes --]

Forward port of coherent-mmap.patch and sysfs-bin-ioctl.patch to x86 tree.

TBD: Do we need the ioctl interface to sysfs or get the type attribute
through a different sysfs file. And then actually specify the attribute
while doing pci_mmap_page_range ;-)

And when this interface is in place, X server has to use this interface for WC
mapping.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

Index: linux-2.6.24-rc4/fs/sysfs/bin.c
===================================================================
--- linux-2.6.24-rc4.orig/fs/sysfs/bin.c	2007-12-11 16:23:26.000000000 -0800
+++ linux-2.6.24-rc4/fs/sysfs/bin.c	2007-12-11 16:32:01.000000000 -0800
@@ -221,6 +221,19 @@
 	return 0;
 }
 
+static int ioctl(struct inode *i, struct file *f, unsigned cmd,
+		 unsigned long arg)
+{
+	struct sysfs_dirent *attr_sd = f->f_path.dentry->d_fsdata;
+	struct bin_attribute *attr = attr_sd->s_bin_attr.bin_attr;
+	struct kobject *kobj = attr_sd->s_parent->s_dir.kobj;
+
+	if (!attr->ioctl)
+		return -EINVAL;
+
+	return attr->ioctl(kobj, attr, cmd, arg);
+}
+
 const struct file_operations bin_fops = {
 	.read		= read,
 	.write		= write,
@@ -228,6 +241,7 @@
 	.llseek		= generic_file_llseek,
 	.open		= open,
 	.release	= release,
+	.ioctl		= ioctl,
 };
 
 /**
Index: linux-2.6.24-rc4/include/linux/sysfs.h
===================================================================
--- linux-2.6.24-rc4.orig/include/linux/sysfs.h	2007-12-11 16:23:26.000000000 -0800
+++ linux-2.6.24-rc4/include/linux/sysfs.h	2007-12-11 16:29:07.000000000 -0800
@@ -69,6 +69,8 @@
 			 char *, loff_t, size_t);
 	int (*mmap)(struct kobject *, struct bin_attribute *attr,
 		    struct vm_area_struct *vma);
+	int (*ioctl)(struct kobject *, struct bin_attribute *attr,
+			unsigned cmd, unsigned long arg);
 };
 
 struct sysfs_ops {
Index: linux-2.6.24-rc4/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.24-rc4.orig/drivers/pci/pci-sysfs.c	2007-12-11 16:03:55.000000000 -0800
+++ linux-2.6.24-rc4/drivers/pci/pci-sysfs.c	2007-12-11 16:29:07.000000000 -0800
@@ -473,8 +473,56 @@
 			kfree(res_attr);
 		}
 	}
+
+#ifdef HAVE_PCI_COHERENT_MMAP
+	sysfs_remove_bin_file(&pdev->dev.kobj, pdev->coherent_attr);
+	kfree(pdev->coherent_attr);
+#endif
+}
+
+#ifdef HAVE_PCI_COHERENT_MMAP
+
+struct coh_mmap_data {
+	void *map;
+	struct device *dev;
+	dma_addr_t busadr;
+};
+
+void pci_coherent_mmap_close(struct vm_area_struct *vma)
+{
+	struct coh_mmap_data *cm = vma->vm_private_data;
+	dma_free_coherent(cm->dev, vma->vm_end - vma->vm_start, cm->map,
+			  cm->busadr);
 }
 
+static struct vm_operations_struct pci_coherent_vmops = {
+	.close = pci_coherent_mmap_close,
+};
+
+static int
+pci_mmap_coherent_mem(struct kobject *kobj, struct bin_attribute *attr,
+		  struct vm_area_struct *vma)
+{
+	struct device *dev = container_of(kobj, struct device, kobj);
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct coh_mmap_data *cm = kmalloc(sizeof(struct coh_mmap_data),
+					   GFP_KERNEL);
+	if (!cm)
+		return -ENOMEM;
+	cm->map = dma_alloc_coherent(dev, vma->vm_end - vma->vm_start,
+				     &cm->busadr, GFP_KERNEL);
+	cm->dev = dev;
+	if (!cm->map) {
+		kfree(cm->map);
+		return -ENOMEM;
+	}
+	vma->vm_private_data = cm;
+	vma->vm_pgoff = cm->busadr >> PAGE_SHIFT;
+	vma->vm_ops = &pci_coherent_vmops;
+	return pci_mmap_page_range(pdev, vma, pci_mmap_coherent, 0);
+}
+#endif
+
 /**
  * pci_create_resource_files - create resource files in sysfs for @dev
  * @dev: dev in question
@@ -692,6 +740,22 @@
 			kfree(pdev->rom_attr);
 		}
 	}
+#ifdef HAVE_PCI_COHERENT_MMAP
+	{
+		struct bin_attribute *a = kzalloc(sizeof(struct bin_attribute),
+						  GFP_KERNEL);
+		if (!a)
+			return;
+		pdev->coherent_attr = a;
+		a->attr.name = "coherent_mem";
+		a->attr.mode = S_IRUSR | S_IWUSR;
+		a->attr.owner = THIS_MODULE;
+		a->size = *(pdev->dev.dma_mask);
+		a->mmap = pci_mmap_coherent_mem;
+		a->private = NULL;
+		sysfs_create_bin_file(&pdev->dev.kobj, a);
+	}
+#endif
 }
 
 static int __init pci_sysfs_init(void)
Index: linux-2.6.24-rc4/include/asm-x86/pci.h
===================================================================
--- linux-2.6.24-rc4.orig/include/asm-x86/pci.h	2007-12-11 16:03:55.000000000 -0800
+++ linux-2.6.24-rc4/include/asm-x86/pci.h	2007-12-11 16:29:07.000000000 -0800
@@ -61,6 +61,7 @@
 
 
 #define HAVE_PCI_MMAP
+#define HAVE_PCI_COHERENT_MMAP
 extern int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
 			       enum pci_mmap_state mmap_state, int write_combine);
 
Index: linux-2.6.24-rc4/include/linux/pci.h
===================================================================
--- linux-2.6.24-rc4.orig/include/linux/pci.h	2007-12-11 16:03:55.000000000 -0800
+++ linux-2.6.24-rc4/include/linux/pci.h	2007-12-11 16:29:07.000000000 -0800
@@ -57,7 +57,8 @@
 /* File state for mmap()s on /proc/bus/pci/X/Y */
 enum pci_mmap_state {
 	pci_mmap_io,
-	pci_mmap_mem
+	pci_mmap_mem,
+	pci_mmap_coherent
 };
 
 /* This defines the direction arg to the DMA mapping routines. */
@@ -201,6 +202,7 @@
 	struct bin_attribute *rom_attr; /* attribute descriptor for sysfs ROM entry */
 	int rom_attr_enabled;		/* has display of the rom attribute been enabled? */
 	struct bin_attribute *res_attr[DEVICE_COUNT_RESOURCE]; /* sysfs file for resources */
+	struct bin_attribute *coherent_attr;
 #ifdef CONFIG_PCI_MSI
 	struct list_head msi_list;
 #endif

-- 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC PATCH 09/12] PAT 64b: map only usable memory in identity mapping
  2007-12-13 23:55 [RFC PATCH 00/12] PAT 64b: PAT support for X86_64 venkatesh.pallipadi
                   ` (7 preceding siblings ...)
  2007-12-13 23:55 ` [RFC PATCH 08/12] PAT 64b: coherent mmap and sysfs bin ioctl venkatesh.pallipadi
@ 2007-12-13 23:55 ` venkatesh.pallipadi
  2007-12-13 23:55 ` [RFC PATCH 10/12] PAT 64b: Make acpi use early map instead of assuming identity map venkatesh.pallipadi
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 52+ messages in thread
From: venkatesh.pallipadi @ 2007-12-13 23:55 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: usable_only_map.patch --]
[-- Type: text/plain, Size: 9105 bytes --]

Map only the usable memory, i.e., memory mapped in e820 and not marked as
reserved, in the identity mapping. This includes 'usable' and 'ACPI *' regions.

Mapping reserved regions in identity map, even though it has worked in practise,
can potentially be problematic. With identity map, there can be speculative
access to these reserved regions which can have undetermined behavior.

Caveat is that the legacy ISA address (0xa0000 - 0x100000) is always mapped,
even when it is reserved in e820. VGA seems to depend on this.

TODO:
* Clean up early table space allocation, avoiding overallocation there.
* Avoid mapping 0 - 1M physical addresses in kernel text mapping.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

Index: linux-2.6.24-rc/arch/x86/kernel/e820_64.c
===================================================================
--- linux-2.6.24-rc.orig/arch/x86/kernel/e820_64.c
+++ linux-2.6.24-rc/arch/x86/kernel/e820_64.c
@@ -121,6 +121,35 @@ e820_any_mapped(unsigned long start, uns
 }
 EXPORT_SYMBOL_GPL(e820_any_mapped);
 
+int e820_any_non_reserved(unsigned long start, unsigned long end)
+{
+	int i;
+	for (i = 0; i < e820.nr_map; i++) {
+		struct e820entry *ei = &e820.map[i];
+		if (ei->type == E820_RESERVED)
+			continue;
+		if (ei->addr >= end || ei->addr + ei->size <= start)
+			continue;
+		return 1;
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(e820_any_non_reserved);
+
+int is_memory_any_valid(unsigned long start, unsigned long end)
+{
+	/*
+	 * Keep low PCI/ISA area always mapped.
+	 * Note: end address is exclusive and start is inclusive here
+	 */
+	if (start >= ISA_START_ADDRESS && end <= ISA_END_ADDRESS)
+		return 1;
+
+	/* Switch to efi or e820 in future here */
+	return e820_any_non_reserved(start, end);
+}
+EXPORT_SYMBOL_GPL(is_memory_any_valid);
+
 /*
  * This function checks if the entire range <start,end> is mapped with type.
  *
@@ -150,6 +179,47 @@ int __init e820_all_mapped(unsigned long
 	return 0;
 }
 
+int e820_all_non_reserved(unsigned long start, unsigned long end)
+{
+	int i;
+	for (i = 0; i < e820.nr_map; i++) {
+		struct e820entry *ei = &e820.map[i];
+		if (ei->type == E820_RESERVED)
+			continue;
+
+		/* is the region (part) in overlap with the current region ?*/
+		if (ei->addr >= end || ei->addr + ei->size <= start)
+			continue;
+
+		/*
+		 * if the region is at the beginning of <start,end> we move
+		 * start to the end of the region since it's ok until there
+		 */
+		if (ei->addr <= start)
+			start = ei->addr + ei->size;
+
+		/* if start is at or beyond end, we're done, full coverage */
+		if (start >= end)
+			return 1; /* we're done */
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(e820_all_non_reserved);
+
+int is_memory_all_valid(unsigned long start, unsigned long end)
+{
+	/*
+	 * Keep low PCI/ISA area always mapped.
+	 * Note: end address is exclusive and start is inclusive here
+	 */
+	if (start >= ISA_START_ADDRESS && end <= ISA_END_ADDRESS)
+		return 1;
+
+	/* Switch to efi or e820 in future here */
+	return e820_all_non_reserved(start, end);
+}
+EXPORT_SYMBOL_GPL(is_memory_all_valid);
+
 /* 
  * Find a free area in a specific range. 
  */ 
Index: linux-2.6.24-rc/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.24-rc.orig/arch/x86/mm/init_64.c
+++ linux-2.6.24-rc/arch/x86/mm/init_64.c
@@ -250,13 +250,46 @@ __meminit void early_iounmap(void *addr,
 }
 
 static void __meminit
+phys_pte_init(pte_t *pte_page, unsigned long address, unsigned long end)
+{
+	int i = pte_index(address); // (address % PMD_SIZE) >> PAGE_SHIFT;
+
+	for (; i < PTRS_PER_PTE; i++, address += PAGE_SIZE) {
+		unsigned long entry;
+		pte_t *pte = pte_page + i;
+
+		if (address >= end) {
+			if (!after_bootmem)
+				for (; i < PTRS_PER_PTE; i++, pte++)
+					set_pte(pte, __pte(0));
+			break;
+		}
+
+		if (pte_val(*pte))
+			continue;
+
+		/* Nothing to map */
+		if (!is_memory_any_valid(address, address + PAGE_SIZE)) {
+			set_pte(pte, __pte(0));
+			continue;
+		}
+
+		entry = _PAGE_NX|_KERNPG_TABLE|_PAGE_GLOBAL|address;
+		entry &= __supported_pte_mask;
+		set_pte(pte, __pte(entry));
+	}
+}
+
+static void __meminit
 phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end)
 {
 	int i = pmd_index(address);
 
 	for (; i < PTRS_PER_PMD; i++, address += PMD_SIZE) {
 		unsigned long entry;
-		pmd_t *pmd = pmd_page + pmd_index(address);
+		pmd_t *pmd = pmd_page + i; // pmd_index(address);
+		pte_t *pte;
+		unsigned long pte_phys;
 
 		if (address >= end) {
 			if (!after_bootmem)
@@ -268,9 +301,27 @@ phys_pmd_init(pmd_t *pmd_page, unsigned 
 		if (pmd_val(*pmd))
 			continue;
 
-		entry = _PAGE_NX|_PAGE_PSE|_KERNPG_TABLE|_PAGE_GLOBAL|address;
-		entry &= __supported_pte_mask;
-		set_pmd(pmd, __pmd(entry));
+		/* Nothing to map */
+		if (!is_memory_any_valid(address, address + PMD_SIZE)) {
+			set_pmd(pmd, __pmd(0));
+			continue;
+		}
+
+		/* Map with 2M pages */
+		if (is_memory_all_valid(address, address + PUD_SIZE)) {
+			entry = _PAGE_NX|_PAGE_PSE|_KERNPG_TABLE|
+				_PAGE_GLOBAL|address;
+			entry &= __supported_pte_mask;
+			set_pmd(pmd, __pmd(entry));
+			continue;
+		}
+
+		/* Map with 4k pages */
+		pte = alloc_low_page(&pte_phys);
+		set_pmd(pmd, __pmd(pte_phys | _KERNPG_TABLE));
+		phys_pte_init(pte, address, address + PMD_SIZE);
+		unmap_low_page(pte);
+
 	}
 }
 
@@ -291,14 +342,15 @@ static void __meminit phys_pud_init(pud_
 
 	for (; i < PTRS_PER_PUD; i++, addr = (addr & PUD_MASK) + PUD_SIZE ) {
 		unsigned long pmd_phys;
-		pud_t *pud = pud_page + pud_index(addr);
+		pud_t *pud = pud_page + i; // pud_index(addr);
 		pmd_t *pmd;
 
 		if (addr >= end)
 			break;
 
-		if (!after_bootmem && !e820_any_mapped(addr,addr+PUD_SIZE,0)) {
-			set_pud(pud, __pud(0)); 
+		if (!after_bootmem &&
+		    !is_memory_any_valid(addr, addr+PUD_SIZE)) {
+			set_pud(pud, __pud(0));
 			continue;
 		} 
 
@@ -310,7 +362,7 @@ static void __meminit phys_pud_init(pud_
 		pmd = alloc_low_page(&pmd_phys);
 		spin_lock(&init_mm.page_table_lock);
 		set_pud(pud, __pud(pmd_phys | _KERNPG_TABLE));
-		phys_pmd_init(pmd, addr, end);
+		phys_pmd_init(pmd, addr, addr + PUD_SIZE);
 		spin_unlock(&init_mm.page_table_lock);
 		unmap_low_page(pmd);
 	}
@@ -319,12 +371,14 @@ static void __meminit phys_pud_init(pud_
 
 static void __init find_early_table_space(unsigned long end)
 {
-	unsigned long puds, pmds, tables, start;
+	unsigned long puds, pmds, ptes, tables, start;
 
 	puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;
 	pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;
+	ptes = (end + PAGE_SIZE - 1) >> PAGE_SHIFT;
 	tables = round_up(puds * sizeof(pud_t), PAGE_SIZE) +
-		 round_up(pmds * sizeof(pmd_t), PAGE_SIZE);
+		 round_up(pmds * sizeof(pmd_t), PAGE_SIZE) +
+		 round_up(ptes * sizeof(pte_t), PAGE_SIZE);
 
  	/* RED-PEN putting page tables only on node 0 could
  	   cause a hotspot and fill up ZONE_DMA. The page tables
Index: linux-2.6.24-rc/include/asm-x86/e820_64.h
===================================================================
--- linux-2.6.24-rc.orig/include/asm-x86/e820_64.h
+++ linux-2.6.24-rc/include/asm-x86/e820_64.h
@@ -24,6 +24,10 @@ extern void e820_mark_nosave_regions(voi
 extern void e820_print_map(char *who);
 extern int e820_any_mapped(unsigned long start, unsigned long end, unsigned type);
 extern int e820_all_mapped(unsigned long start, unsigned long end, unsigned type);
+extern int e820_any_non_reserved(unsigned long start, unsigned long end);
+extern int is_memory_any_valid(unsigned long start, unsigned long end);
+extern int e820_all_non_reserved(unsigned long start, unsigned long end);
+extern int is_memory_all_valid(unsigned long start, unsigned long end);
 extern unsigned long e820_hole_size(unsigned long start, unsigned long end);
 
 extern void e820_setup_gap(void);
@@ -36,6 +40,10 @@ extern struct e820map e820;
 
 extern unsigned ebda_addr, ebda_size;
 extern unsigned long nodemap_addr, nodemap_size;
+
+#define ISA_START_ADDRESS	0xa0000
+#define ISA_END_ADDRESS		0x100000
+
 #endif/*!__ASSEMBLY__*/
 
 #endif/*__E820_HEADER*/
Index: linux-2.6.24-rc/arch/x86/mm/pageattr_64.c
===================================================================
--- linux-2.6.24-rc.orig/arch/x86/mm/pageattr_64.c
+++ linux-2.6.24-rc/arch/x86/mm/pageattr_64.c
@@ -160,9 +160,6 @@ __change_page_attr(unsigned long address
 	} else
 		BUG();
 
-	/* on x86-64 the direct mapping set at boot is not using 4k pages */
- 	BUG_ON(PageReserved(kpte_page));
-
 	save_page(kpte_page);
 	if (page_private(kpte_page) == 0)
 		revert_page(address, ref_prot);
Index: linux-2.6.24-rc/arch/x86/mm/ioremap_64.c
===================================================================
--- linux-2.6.24-rc.orig/arch/x86/mm/ioremap_64.c
+++ linux-2.6.24-rc/arch/x86/mm/ioremap_64.c
@@ -28,9 +29,6 @@ unsigned long __phys_addr(unsigned long 
 }
 EXPORT_SYMBOL(__phys_addr);
 
-#define ISA_START_ADDRESS      0xa0000
-#define ISA_END_ADDRESS                0x100000
-
 /*
  * Fix up the linear direct mapping of the kernel to avoid cache attribute
  * conflicts.

-- 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC PATCH 10/12] PAT 64b: Make acpi use early map instead of assuming identity map
  2007-12-13 23:55 [RFC PATCH 00/12] PAT 64b: PAT support for X86_64 venkatesh.pallipadi
                   ` (8 preceding siblings ...)
  2007-12-13 23:55 ` [RFC PATCH 09/12] PAT 64b: map only usable memory in identity mapping venkatesh.pallipadi
@ 2007-12-13 23:55 ` venkatesh.pallipadi
  2007-12-13 23:55 ` [RFC PATCH 11/12] PAT 64b: devmem do not read pages not mapped in " venkatesh.pallipadi
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 52+ messages in thread
From: venkatesh.pallipadi @ 2007-12-13 23:55 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: acpi_use_early_ioremap.patch --]
[-- Type: text/plain, Size: 5378 bytes --]

ACPI boot code has assumptions about entire memory being mapped in identity
mapping at:
* Generic __acpi_map_table
* Looking for RSD PTR at boot time
* Looking for mp table

Fix all these to use early_ioremap and early_iounmap.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

Index: linux-2.6.24-rc/arch/x86/kernel/acpi/boot.c
===================================================================
--- linux-2.6.24-rc.orig/arch/x86/kernel/acpi/boot.c
+++ linux-2.6.24-rc/arch/x86/kernel/acpi/boot.c
@@ -105,16 +105,20 @@ enum acpi_irq_model_id acpi_irq_model = 
 
 #ifdef	CONFIG_X86_64
 
-/* rely on all ACPI tables being in the direct mapping */
 char *__acpi_map_table(unsigned long phys_addr, unsigned long size)
 {
 	if (!phys_addr || !size)
 		return NULL;
 
-	if (phys_addr+size <= (end_pfn_map << PAGE_SHIFT) + PAGE_SIZE)
-		return __va(phys_addr);
+	return early_ioremap(phys_addr, size);
+}
 
-	return NULL;
+void __acpi_unmap_table(void * addr, unsigned long size)
+{
+	if (!addr || !size)
+		return;
+
+	early_iounmap(addr, size);
 }
 
 #else
@@ -158,6 +162,11 @@ char *__acpi_map_table(unsigned long phy
 
 	return ((unsigned char *)base + offset);
 }
+
+void __acpi_unmap_table(void * addr, unsigned long size)
+{
+}
+
 #endif
 
 #ifdef CONFIG_PCI_MMCONFIG
@@ -586,17 +595,23 @@ acpi_scan_rsdp(unsigned long start, unsi
 {
 	unsigned long offset = 0;
 	unsigned long sig_len = sizeof("RSD PTR ") - 1;
+	char * virt_addr;
 
+	virt_addr = __acpi_map_table(start, length);
+	if (!virt_addr)
+		return 0;
 	/*
 	 * Scan all 16-byte boundaries of the physical memory region for the
 	 * RSDP signature.
 	 */
 	for (offset = 0; offset < length; offset += 16) {
-		if (strncmp((char *)(phys_to_virt(start) + offset), "RSD PTR ", sig_len))
+		if (strncmp(virt_addr + offset, "RSD PTR ", sig_len))
 			continue;
+		__acpi_unmap_table(virt_addr, length);
 		return (start + offset);
 	}
 
+	__acpi_unmap_table(virt_addr, length);
 	return 0;
 }
 
Index: linux-2.6.24-rc/drivers/acpi/osl.c
===================================================================
--- linux-2.6.24-rc.orig/drivers/acpi/osl.c
+++ linux-2.6.24-rc/drivers/acpi/osl.c
@@ -231,6 +231,8 @@ void acpi_os_unmap_memory(void __iomem *
 {
 	if (acpi_gbl_permanent_mmap) {
 		iounmap(virt);
+	} else {
+		__acpi_unmap_table(virt, size);
 	}
 }
 EXPORT_SYMBOL_GPL(acpi_os_unmap_memory);
Index: linux-2.6.24-rc/include/linux/acpi.h
===================================================================
--- linux-2.6.24-rc.orig/include/linux/acpi.h
+++ linux-2.6.24-rc/include/linux/acpi.h
@@ -79,6 +79,7 @@ typedef int (*acpi_table_handler) (struc
 typedef int (*acpi_table_entry_handler) (struct acpi_subtable_header *header, const unsigned long end);
 
 char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
+void __acpi_unmap_table (void * addr, unsigned long size);
 unsigned long acpi_find_rsdp (void);
 int acpi_boot_init (void);
 int acpi_boot_table_init (void);
Index: linux-2.6.24-rc/arch/x86/kernel/mpparse_64.c
===================================================================
--- linux-2.6.24-rc.orig/arch/x86/kernel/mpparse_64.c
+++ linux-2.6.24-rc/arch/x86/kernel/mpparse_64.c
@@ -535,9 +535,12 @@ void __init get_smp_config (void)
 static int __init smp_scan_config (unsigned long base, unsigned long length)
 {
 	extern void __bad_mpf_size(void); 
-	unsigned int *bp = phys_to_virt(base);
+	unsigned int *bp = (unsigned int *)__acpi_map_table(base, length);
 	struct intel_mp_floating *mpf;
 
+	if (!bp)
+		return 0;
+
 	Dprintk("Scan SMP from %p for %ld bytes.\n", bp,length);
 	if (sizeof(*mpf) != 16)
 		__bad_mpf_size();
@@ -555,17 +558,20 @@ static int __init smp_scan_config (unsig
 			if (mpf->mpf_physptr)
 				reserve_bootmem_generic(mpf->mpf_physptr, PAGE_SIZE);
 			mpf_found = mpf;
+			__acpi_unmap_table((char *)bp, length);
 			return 1;
 		}
 		bp += 4;
 		length -= 16;
 	}
+	__acpi_unmap_table((char *)bp, length);
 	return 0;
 }
 
 void __init find_smp_config(void)
 {
 	unsigned int address;
+	unsigned short *bp;
 
 	/*
 	 * FIXME: Linux assumes you have 640K of base ram..
@@ -592,11 +598,17 @@ void __init find_smp_config(void)
 	 * should be fixed.
 	 */
 
-	address = *(unsigned short *)phys_to_virt(0x40E);
+	bp = (unsigned short *)__acpi_map_table(0x40E, 2);
+	if (!bp)
+		return;
+
+	address = *bp;
 	address <<= 4;
 	if (smp_scan_config(address, 0x1000))
 		return;
 
+	__acpi_unmap_table((char *)bp, 2);
+
 	/* If we have come this far, we did not find an MP table  */
 	 printk(KERN_INFO "No mptable found.\n");
 }
Index: linux-2.6.24-rc/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.24-rc.orig/arch/x86/mm/init_64.c
+++ linux-2.6.24-rc/arch/x86/mm/init_64.c
@@ -206,7 +206,7 @@ static __meminit void unmap_low_page(voi
 } 
 
 /* Must run before zap_low_mappings */
-__meminit void *early_ioremap(unsigned long addr, unsigned long size)
+void *early_ioremap(unsigned long addr, unsigned long size)
 {
 	unsigned long vaddr;
 	pmd_t *pmd, *last_pmd;
@@ -235,7 +235,7 @@ __meminit void *early_ioremap(unsigned l
 }
 
 /* To avoid virtual aliases later */
-__meminit void early_iounmap(void *addr, unsigned long size)
+void early_iounmap(void *addr, unsigned long size)
 {
 	unsigned long vaddr;
 	pmd_t *pmd;

-- 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC PATCH 11/12] PAT 64b: devmem do not read pages not mapped in identity map
  2007-12-13 23:55 [RFC PATCH 00/12] PAT 64b: PAT support for X86_64 venkatesh.pallipadi
                   ` (9 preceding siblings ...)
  2007-12-13 23:55 ` [RFC PATCH 10/12] PAT 64b: Make acpi use early map instead of assuming identity map venkatesh.pallipadi
@ 2007-12-13 23:55 ` venkatesh.pallipadi
  2007-12-13 23:55 ` [RFC PATCH 12/12] PAT 64b: skip attr tracking for RAM venkatesh.pallipadi
  2007-12-14  0:28 ` [RFC PATCH 00/12] PAT 64b: PAT support for X86_64 Dave Airlie
  12 siblings, 0 replies; 52+ messages in thread
From: venkatesh.pallipadi @ 2007-12-13 23:55 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: dev_mem_skip_read.patch --]
[-- Type: text/plain, Size: 1630 bytes --]

Enable valid_phys_addr_range for x86_64 and check whether the identity
mapping exists before reading /dev/mem. For reserved regions there is no
identity mappings any more and then cannot be read from /dev/mem.

Side effect is that dd of entire memory will not work any more, once there are
reserved holes in memory space.

TBD: Read reserved regions as 0xffff or something, and continue reading
across holes, till we reach the high_memory (end of memory).

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

Index: linux-2.6.24-rc/arch/x86/kernel/e820_64.c
===================================================================
--- linux-2.6.24-rc.orig/arch/x86/kernel/e820_64.c
+++ linux-2.6.24-rc/arch/x86/kernel/e820_64.c
@@ -220,6 +220,11 @@ int is_memory_all_valid(unsigned long st
 }
 EXPORT_SYMBOL_GPL(is_memory_all_valid);
 
+int valid_phys_addr_range(unsigned long addr, size_t count)
+{
+	return is_memory_all_valid(addr, addr + count);
+}
+
 /* 
  * Find a free area in a specific range. 
  */ 
Index: linux-2.6.24-rc/include/asm-x86/io_64.h
===================================================================
--- linux-2.6.24-rc.orig/include/asm-x86/io_64.h
+++ linux-2.6.24-rc/include/asm-x86/io_64.h
@@ -265,6 +265,15 @@ extern int iommu_bio_merge;
  */
 #define xlate_dev_kmem_ptr(p)	p
 
+#define ARCH_HAS_VALID_PHYS_ADDR_RANGE
+extern int valid_phys_addr_range (unsigned long addr, size_t count);
+
+static inline int valid_mmap_phys_addr_range(unsigned long pfn, size_t size)
+{
+	return 1;
+}
+
+
 #endif /* __KERNEL__ */
 
 #endif

-- 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC PATCH 12/12] PAT 64b: skip attr tracking for RAM
  2007-12-13 23:55 [RFC PATCH 00/12] PAT 64b: PAT support for X86_64 venkatesh.pallipadi
                   ` (10 preceding siblings ...)
  2007-12-13 23:55 ` [RFC PATCH 11/12] PAT 64b: devmem do not read pages not mapped in " venkatesh.pallipadi
@ 2007-12-13 23:55 ` venkatesh.pallipadi
  2007-12-14  0:28 ` [RFC PATCH 00/12] PAT 64b: PAT support for X86_64 Dave Airlie
  12 siblings, 0 replies; 52+ messages in thread
From: venkatesh.pallipadi @ 2007-12-13 23:55 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: skip-ram-attr-tracking.patch --]
[-- Type: text/plain, Size: 7442 bytes --]

For now, track the page attribute only for reserved regions (specified as
reserved in e820 or not present in e820 at all). As we don't have the kernel
identity mappings for these regions, existing simple infrastructure of memattr
list is enough. Otherwise we need to keep track of the actual reference
count, to do cpa for kernel identity mappings.

We don't track RAM pages using memattr infrastructure. This is because,
memattr infrastructure is not enough. i.e., while the page is getting
tracked using memattr infrastructure, potentially the page can get
freed(a bug that we need to catch, to avoid attribute aliasing).
For example, a driver does ioremap_uc and an application mapped the
same page using /dev/mem. When the driver does iounamp and free the page,
/dev/mem mapping is still live and we run into aliasing issue.

for now, we allow only UC for RAM pages (with out any tracking). Why?
This is what the current mainline kernel anyhow does.

TBD:

1. Do we need to allow RAM pages to be mapped as WC? If not, then
we don't need to follow the TLB flush mechanism (make pte not present,
flush, and set pte with new mapping) mentioned in section 10.12.4 of SDM Vol3a.

2. For a complete solution, handle RAM pages with UC and /dev/mem mapping
conflicts.  Can we use the existing page struct to keep track of the /dev/mem
mappings (through the page ref count) and not allow to free the page while
the /dev/mem mappings are active. And allow /dev/mem to map only those pages
which are marked reserved (which the driver does before doing iomap).

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

Index: linux-2.6.24-rc4/arch/x86/kernel/e820_64.c
===================================================================
--- linux-2.6.24-rc4.orig/arch/x86/kernel/e820_64.c	2007-12-12 16:54:34.000000000 -0800
+++ linux-2.6.24-rc4/arch/x86/kernel/e820_64.c	2007-12-12 16:54:34.000000000 -0800
@@ -156,7 +156,7 @@
  * Note: this function only works correct if the e820 table is sorted and
  * not-overlapping, which is the case
  */
-int __init e820_all_mapped(unsigned long start, unsigned long end, unsigned type)
+int e820_all_mapped(unsigned long start, unsigned long end, unsigned type)
 {
 	int i;
 	for (i = 0; i < e820.nr_map; i++) {
@@ -220,6 +220,15 @@
 }
 EXPORT_SYMBOL_GPL(is_memory_all_valid);
 
+int is_memory_all_reserved(unsigned long start, unsigned long end)
+{
+	/* Switch to efi or e820 in future here */
+	if (e820_all_mapped(start, end, E820_RESERVED) == 1)
+		return 1;
+	return !is_memory_any_valid(start, end);
+}
+EXPORT_SYMBOL_GPL(is_memory_all_reserved);
+
 int valid_phys_addr_range(unsigned long addr, size_t count)
 {
 	return is_memory_all_valid(addr, addr + count);
Index: linux-2.6.24-rc4/arch/x86/mm/ioremap_64.c
===================================================================
--- linux-2.6.24-rc4.orig/arch/x86/mm/ioremap_64.c	2007-12-12 16:54:34.000000000 -0800
+++ linux-2.6.24-rc4/arch/x86/mm/ioremap_64.c	2007-12-12 16:54:34.000000000 -0800
@@ -20,6 +20,7 @@
 #include <asm/cacheflush.h>
 #include <asm/proto.h>
 #include <asm/pat.h>
+#include <asm/e820.h>
 
 unsigned long __phys_addr(unsigned long x)
 {
@@ -72,12 +73,21 @@
 	struct vm_struct * area;
 	unsigned long offset, last_addr;
 	pgprot_t pgprot;
+	int all_memory = 0, all_reserved = 0;
 
 	/* Don't allow wraparound or zero size */
 	last_addr = phys_addr + size - 1;
 	if (!size || last_addr < phys_addr)
 		return NULL;
 
+	/* Don't allow overlapping attributes */
+	all_memory = is_memory_all_valid(phys_addr, last_addr);
+
+	all_reserved = is_memory_all_reserved(phys_addr, last_addr);
+
+	if (!all_memory && !all_reserved)
+		return NULL;
+
 	/*
 	 * Don't remap the low PCI/ISA area, it's always mapped..
 	 */
@@ -126,12 +136,19 @@
 
 	/* For plain ioremap() get the existing attributes. Otherwise
 	   check against the existing ones */
-	if (reserve_mattr(phys_addr, phys_addr + size, flags,
+	/* For now, we track the attributes for the reserved pages only.
+	 * For memory pages, this is TBD. While, we can track the page
+	 * attrib using struct page, enforcing the role is some
+	 * what tricky!
+	 */
+	if (all_reserved &&
+	    reserve_mattr(phys_addr, phys_addr + size, flags,
 			  flags ? NULL : &flags) < 0)
 		goto out;
 
 	if (flags && ioremap_change_attr(phys_addr, size, flags) < 0) {
-		free_mattr(phys_addr, phys_addr + size, flags);
+		if (all_reserved)
+			free_mattr(phys_addr, phys_addr + size, flags);
 		goto out;
 	}
 	return (__force void __iomem *) (offset + (char *)addr);
@@ -190,7 +207,10 @@
  */
 void __iomem *ioremap_wc(unsigned long phys_addr, unsigned long size)
 {
-	return __ioremap(phys_addr, size, _PAGE_WC);
+	if (!is_memory_any_valid(phys_addr, phys_addr + size - 1))
+		return __ioremap(phys_addr, size, _PAGE_WC);
+	else
+		return NULL;
 }
 EXPORT_SYMBOL(ioremap_wc);
 
@@ -232,7 +252,8 @@
 
 	/* Reset the direct mapping. Can block */
 	if (p->flags >> 20) {
-		free_mattr(p->phys_addr, p->phys_addr + p->size, p->flags>>20);
+		if (is_memory_all_reserved(p->phys_addr, p->phys_addr + p->size))
+			free_mattr(p->phys_addr, p->phys_addr + p->size, p->flags>>20);
 		ioremap_change_attr(p->phys_addr, p->size, 0);
 	}
 
Index: linux-2.6.24-rc4/include/asm-x86/e820_64.h
===================================================================
--- linux-2.6.24-rc4.orig/include/asm-x86/e820_64.h	2007-12-12 16:54:34.000000000 -0800
+++ linux-2.6.24-rc4/include/asm-x86/e820_64.h	2007-12-12 16:54:34.000000000 -0800
@@ -28,6 +28,7 @@
 extern int is_memory_any_valid(unsigned long start, unsigned long end);
 extern int e820_all_non_reserved(unsigned long start, unsigned long end);
 extern int is_memory_all_valid(unsigned long start, unsigned long end);
+extern int is_memory_all_reserved(unsigned long start, unsigned long end);
 extern unsigned long e820_hole_size(unsigned long start, unsigned long end);
 
 extern void e820_setup_gap(void);
Index: linux-2.6.24-rc4/arch/x86/mm/pat.c
===================================================================
--- linux-2.6.24-rc4.orig/arch/x86/mm/pat.c	2007-12-12 16:53:39.000000000 -0800
+++ linux-2.6.24-rc4/arch/x86/mm/pat.c	2007-12-12 16:54:34.000000000 -0800
@@ -11,6 +11,7 @@
 #include <asm/pat.h>
 #include <asm/cacheflush.h>
 #include <asm/fcntl.h>
+#include <asm/e820.h>
 
 static u64 boot_pat_state;
 
@@ -176,8 +177,15 @@
 	if ((file->f_flags & O_SYNC) || (offset >= __pa(high_memory)))
 		want_flags = _PAGE_PCD;
 
-	/* ignore error because we can't handle it here */
-	reserve_mattr(offset, offset+size, want_flags, &flags);
+	if (is_memory_all_reserved(offset, offset + size - 1))
+		/* ignore error because we can't handle it here */
+		reserve_mattr(offset, offset+size, want_flags, &flags);
+	else
+		/* for memory pages, we will allow what the requestor wants
+		 * except for WC for now.
+		 */
+		flags = want_flags & ~_PAGE_PWT;
+
 	if (flags != want_flags) {
 		printk(KERN_INFO
 	"%s:%d /dev/mem expected mapping type %s for %Lx-%Lx, got %s\n",
@@ -200,7 +208,8 @@
 void unmap_devmem(unsigned long pfn, unsigned long size, pgprot_t vma_prot)
 {
 	u64 addr = (u64)pfn << PAGE_SHIFT;
-	free_mattr(addr, size, 0);
+	if (is_memory_all_reserved(addr, addr + size - 1))
+		free_mattr(addr, size, 0);
 	if (addr < __pa(high_memory) &&
 	   (pgprot_val(vma_prot) & _PAGE_CACHE_MASK))
 		change_page_attr_addr(addr, size >> PAGE_SHIFT, PAGE_KERNEL);

-- 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 08/12] PAT 64b: coherent mmap and sysfs bin ioctl
  2007-12-13 23:55 ` [RFC PATCH 08/12] PAT 64b: coherent mmap and sysfs bin ioctl venkatesh.pallipadi
@ 2007-12-14  0:19   ` Greg KH
  2007-12-14  0:35     ` David Miller
  2007-12-14  0:43     ` Andi Kleen
  2007-12-14  0:54   ` Jesse Barnes
  2007-12-14  3:59   ` Eric W. Biederman
  2 siblings, 2 replies; 52+ messages in thread
From: Greg KH @ 2007-12-14  0:19 UTC (permalink / raw)
  To: venkatesh.pallipadi
  Cc: ak, ebiederm, rdreier, torvalds, airlied, davej, mingo, tglx,
	hpa, akpm, arjan, jesse.barnes, linux-kernel, Suresh Siddha

On Thu, Dec 13, 2007 at 03:55:51PM -0800, venkatesh.pallipadi@intel.com wrote:
> Forward port of coherent-mmap.patch and sysfs-bin-ioctl.patch to x86 tree.
> 
> TBD: Do we need the ioctl interface to sysfs or get the type attribute
> through a different sysfs file. And then actually specify the attribute
> while doing pci_mmap_page_range ;-)

Woah!  No, no ioctls on sysfs files, sorry.  Not going to happen, do
this on a /dev file if you want to have ioctls...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 00/12] PAT 64b: PAT support for X86_64
  2007-12-13 23:55 [RFC PATCH 00/12] PAT 64b: PAT support for X86_64 venkatesh.pallipadi
                   ` (11 preceding siblings ...)
  2007-12-13 23:55 ` [RFC PATCH 12/12] PAT 64b: skip attr tracking for RAM venkatesh.pallipadi
@ 2007-12-14  0:28 ` Dave Airlie
  2007-12-14 22:00   ` Siddha, Suresh B
  12 siblings, 1 reply; 52+ messages in thread
From: Dave Airlie @ 2007-12-14  0:28 UTC (permalink / raw)
  To: venkatesh.pallipadi
  Cc: ak, ebiederm, rdreier, torvalds, gregkh, davej, mingo, tglx, hpa,
	akpm, arjan, jesse.barnes, linux-kernel


> Yes. It is that wonderful time of the year again.
> No, no. We are not talking about holiday season or new year here.
> 
> We are talking about one another rehash of "why we do not support PAT in x86"
> question and series of patches that implement some PAT support before going
> into hibernation again. Only difference is that we hope to take this little
> further this time and may be really get this support into
> upstream kernel soon.

Woot, PAT support: this time we mean it!!

I'll just give one comment after a reading your todo..

> * Do we need to allow RAM pages to be mapped as WC? If not, then
>   we don't need to follow the TLB flush mechanism (make pte not present,
>   flush, and set pte with new mapping) mentioned in section 10.12.4 of SDM
>   Vol3a.

Yes, the main use for GPUs is to have RAM pages mapped WC, and placed into 
a GART on the GPU side, currently for Intel IGD we are okay as the CPU can 
access the GPU GART aperture, but other chips exist where this isn't 
possible, I think poulsbo and possible some of the AMD IGPs..

Dave.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 08/12] PAT 64b: coherent mmap and sysfs bin ioctl
  2007-12-14  0:19   ` Greg KH
@ 2007-12-14  0:35     ` David Miller
  2007-12-14  6:34       ` Greg KH
  2007-12-14  0:43     ` Andi Kleen
  1 sibling, 1 reply; 52+ messages in thread
From: David Miller @ 2007-12-14  0:35 UTC (permalink / raw)
  To: gregkh
  Cc: venkatesh.pallipadi, ak, ebiederm, rdreier, torvalds, airlied,
	davej, mingo, tglx, hpa, akpm, arjan, jesse.barnes, linux-kernel,
	suresh.b.siddha

From: Greg KH <gregkh@suse.de>
Date: Thu, 13 Dec 2007 16:19:32 -0800

> On Thu, Dec 13, 2007 at 03:55:51PM -0800, venkatesh.pallipadi@intel.com wrote:
> > Forward port of coherent-mmap.patch and sysfs-bin-ioctl.patch to x86 tree.
> > 
> > TBD: Do we need the ioctl interface to sysfs or get the type attribute
> > through a different sysfs file. And then actually specify the attribute
> > while doing pci_mmap_page_range ;-)
> 
> Woah!  No, no ioctls on sysfs files, sorry.  Not going to happen, do
> this on a /dev file if you want to have ioctls...

Well since we told people to move over to sysfs for PCI
accesses, and that's where mmap() is done via too,
it should be no surprise that we run into problems when
people want to set attributes for the mmap() as was done
for the procfs case.

So you have two choices:

1) Balk on the sysfs pci usage, and erase years of effort
   of moving people over to sysfs.  Tell them to go back to
   procfs so we can add the attribute setting via ioctl()
   which is absolutely needed.

2) Relax your restrictions a little bit and allow ioctl()'s
   for limited cases, like this one.

Otherwise, propase a way to specify PCI device mmap attributes
which works within your whole-universe sysfs theory of everything
:-)

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
  2007-12-13 23:55 ` [RFC PATCH 02/12] PAT 64b: Basic PAT implementation venkatesh.pallipadi
@ 2007-12-14  0:42   ` Andi Kleen
  2007-12-14 18:31     ` Venki Pallipadi
  2007-12-14  3:48   ` Eric W. Biederman
  1 sibling, 1 reply; 52+ messages in thread
From: Andi Kleen @ 2007-12-14  0:42 UTC (permalink / raw)
  To: venkatesh.pallipadi
  Cc: ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo, tglx,
	hpa, akpm, arjan, jesse.barnes, linux-kernel, Suresh Siddha

> +void __cpuinit pat_init(void)
> +{
> +	/* Set PWT+PCD to Write-Combining. All other bits stay the same */
> +	if (cpu_has_pat) {

All the old CPUs (PPro etc.) with known PAT bugs need to clear this flag 
now in their CPU init functions. It is fine to be aggressive there
because these old systems have lived so long without PAT they can do 
so forever. So perhaps it's best to just white list it only for newer
CPUs on the Intel side at least.

Another problem is that there are some popular modules (ATI, Nvidia for once)
who reprogram the PAT registers on their own, likely different. Need some way to detect
that case I guess, otherwise lots of users will see strange malfunctions.
Maybe recheck after module load?

> +                   |||
> +		   000 WB         default
> +		   010 UC_MINUS   _PAGE_PCD
> +		   011 WC         _PAGE_WC
> +		   PAT bit unused */
> +		pat = PAT(0,WB) | PAT(1,WT) | PAT(2,UC_MINUS) | PAT(3,WC) |
> +		      PAT(4,WB) | PAT(5,WT) | PAT(6,UC_MINUS) | PAT(7,WC);
> +		rdmsrl(MSR_IA32_CR_PAT, boot_pat_state);
> +		wrmsrl(MSR_IA32_CR_PAT, pat);
> +		__flush_tlb_all();
> +		asm volatile("wbinvd");

Have you double checked this is the full procedure from the manual? iirc there
were some steps missing.

-Andi


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 08/12] PAT 64b: coherent mmap and sysfs bin ioctl
  2007-12-14  0:19   ` Greg KH
  2007-12-14  0:35     ` David Miller
@ 2007-12-14  0:43     ` Andi Kleen
  1 sibling, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-12-14  0:43 UTC (permalink / raw)
  To: Greg KH
  Cc: venkatesh.pallipadi, ebiederm, rdreier, torvalds, airlied, davej,
	mingo, tglx, hpa, akpm, arjan, jesse.barnes, linux-kernel,
	Suresh Siddha

On Thu, Dec 13, 2007 at 04:19:32PM -0800, Greg KH wrote:
> On Thu, Dec 13, 2007 at 03:55:51PM -0800, venkatesh.pallipadi@intel.com wrote:
> > Forward port of coherent-mmap.patch and sysfs-bin-ioctl.patch to x86 tree.
> > 
> > TBD: Do we need the ioctl interface to sysfs or get the type attribute
> > through a different sysfs file. And then actually specify the attribute
> > while doing pci_mmap_page_range ;-)
> 
> Woah!  No, no ioctls on sysfs files, sorry.  Not going to happen, do
> this on a /dev file if you want to have ioctls...

That would require putting the whole PCI bus hierarchy into /dev.
We could do that, but for what would we still need /sys then ? @)

Anyways if you can suggest a better interface please do, but please think first.

-Andi

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 08/12] PAT 64b: coherent mmap and sysfs bin ioctl
  2007-12-13 23:55 ` [RFC PATCH 08/12] PAT 64b: coherent mmap and sysfs bin ioctl venkatesh.pallipadi
  2007-12-14  0:19   ` Greg KH
@ 2007-12-14  0:54   ` Jesse Barnes
  2007-12-14  3:59   ` Eric W. Biederman
  2 siblings, 0 replies; 52+ messages in thread
From: Jesse Barnes @ 2007-12-14  0:54 UTC (permalink / raw)
  To: venkatesh.pallipadi
  Cc: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, linux-kernel, Suresh Siddha

On Thursday, December 13, 2007 3:55 pm venkatesh.pallipadi@intel.com wrote:
> Forward port of coherent-mmap.patch and sysfs-bin-ioctl.patch to x86 tree.
>
> TBD: Do we need the ioctl interface to sysfs or get the type attribute
> through a different sysfs file. And then actually specify the attribute
> while doing pci_mmap_page_range ;-)
>
> And when this interface is in place, X server has to use this interface for
> WC mapping.

I remember talking with people about using madvise and/or mmap flags to set 
the attributes correctly, but I don't remember whether it was a good idea or 
not...

The advantage of a specific post-mmap call is that it would make setting the 
attribute types a little easier, so either ioctl or madvise seems preferable 
to mmapping over and over with different flags until you get the mapping.

Jesse

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
  2007-12-13 23:55 ` [RFC PATCH 02/12] PAT 64b: Basic PAT implementation venkatesh.pallipadi
  2007-12-14  0:42   ` Andi Kleen
@ 2007-12-14  3:48   ` Eric W. Biederman
  2007-12-14  4:23     ` Eric W. Biederman
                       ` (2 more replies)
  1 sibling, 3 replies; 52+ messages in thread
From: Eric W. Biederman @ 2007-12-14  3:48 UTC (permalink / raw)
  To: venkatesh.pallipadi
  Cc: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes, linux-kernel,
	Suresh Siddha

venkatesh.pallipadi@intel.com writes:

> Originally based on a patch from Eric Biederman, but heavily changed.
>
> Forward port of pat-base.patch to x86 tree, with a bug fix.
> Code was using 'PCD|PWT' i.e., PAT3 for WC mapping. So set the WC mapping at
> correct PAT fields PA3/PA7.

Well that wasn't from my original tested patch. Grr.

> TBD: KEXEC and other CPU offline paths may need pat_shutdown()?

> Index: linux-2.6/arch/x86/mm/Makefile_64
> ===================================================================
> --- linux-2.6.orig/arch/x86/mm/Makefile_64 2007-12-11 03:30:34.000000000 -0800
> +++ linux-2.6/arch/x86/mm/Makefile_64	2007-12-11 03:42:08.000000000 -0800
> @@ -2,7 +2,7 @@
>  # Makefile for the linux x86_64-specific parts of the memory manager.
>  #
>  
> -obj-y := init_64.o fault_64.o ioremap_64.o extable_64.o pageattr_64.o mmap_64.o
> +obj-y := init_64.o fault_64.o ioremap_64.o extable_64.o pageattr_64.o mmap_64.o
> pat.o
>  obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
>  obj-$(CONFIG_NUMA) += numa_64.o
>  obj-$(CONFIG_K8_NUMA) += k8topology_64.o
> Index: linux-2.6/arch/x86/mm/pat.c
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6/arch/x86/mm/pat.c	2007-12-11 04:12:47.000000000 -0800
> @@ -0,0 +1,57 @@
> +/* Handle caching attributes in page tables (PAT) */
> +#include <linux/mm.h>
> +#include <linux/kernel.h>
> +#include <linux/rbtree.h>
> +#include <linux/gfp.h>
> +#include <asm/msr.h>
> +#include <asm/tlbflush.h>
> +#include <asm/processor.h>
> +
> +static u64 boot_pat_state;
> +
> +enum {
> +	PAT_UC = 0,   	/* uncached */
> +	PAT_WC = 1,		/* Write combining */
> +	PAT_WT = 4,		/* Write Through */
> +	PAT_WP = 5,		/* Write Protected */
> +	PAT_WB = 6,		/* Write Back (default) */
> +	PAT_UC_MINUS = 7,	/* UC, but can be overriden by MTRR */
> +};
> +
> +#define PAT(x,y) ((u64)PAT_ ## y << ((x)*8))
> +
> +void __cpuinit pat_init(void)
> +{
> +	/* Set PWT+PCD to Write-Combining. All other bits stay the same */
> +	if (cpu_has_pat) {
> +		u64 pat;
> +		/* PTE encoding used in Linux:
> +                   PAT
> +                   |PCD
> +                   ||PWT
> +                   |||
> +		   000 WB         default
> +		   010 UC_MINUS   _PAGE_PCD
> +		   011 WC         _PAGE_WC
> +		   PAT bit unused */
> +		pat = PAT(0,WB) | PAT(1,WT) | PAT(2,UC_MINUS) | PAT(3,WC) |
> +		      PAT(4,WB) | PAT(5,WT) | PAT(6,UC_MINUS) | PAT(7,WC);

I strongly object to this configuration.

The caching modes of interest are:
PAT_WB write-back or a close as the MTRRs will allow
       used for WC today.
PAT_UC completely uncachable not overridable by MTRRs 
       and what we use today for pgprot_noncached
PAT_WC what isn't available for current use.

We should use:
> +		pat = PAT(0,WB) | PAT(1,WT) | PAT(2,WC) | PAT(3,UC) |
> +		      PAT(4,WB) | PAT(5,WT) | PAT(6,WC) | PAT(7,UC);

Changing the UC- which currently allows write-combining if the MTRRs specify it,
to WC.  This grandfathers in all of our current usage and changes the one
PAT type that could today and in legacy mode specify WC to really specify WC.

I don't know if we need to set the high half or not, that would depend
on the state of the PAT errata.

I do know we need to use the low 4 pat mappings to avoid most of the PAT
errata issues.

As for Andi's concern about modules playing games with the PAT mappings
if we don't redefine how we use the page table entries our exposure to
badly behaved modules more limited.

> +		rdmsrl(MSR_IA32_CR_PAT, boot_pat_state);
> +		wrmsrl(MSR_IA32_CR_PAT, pat);
> +		__flush_tlb_all();
> +		asm volatile("wbinvd");
> +	}
> +}
> +
> +#undef PAT
> +
> +void pat_shutdown(void)
> +{
> +	/* Restore CPU default pat state */
> +	if (cpu_has_pat) {
> +		wrmsrl(MSR_IA32_CR_PAT, boot_pat_state);
> +		__flush_tlb_all();
> +		asm volatile("wbinvd");
> +	}
> +}
> +


Eric

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 08/12] PAT 64b: coherent mmap and sysfs bin ioctl
  2007-12-13 23:55 ` [RFC PATCH 08/12] PAT 64b: coherent mmap and sysfs bin ioctl venkatesh.pallipadi
  2007-12-14  0:19   ` Greg KH
  2007-12-14  0:54   ` Jesse Barnes
@ 2007-12-14  3:59   ` Eric W. Biederman
  2007-12-14  6:02     ` Greg KH
  2 siblings, 1 reply; 52+ messages in thread
From: Eric W. Biederman @ 2007-12-14  3:59 UTC (permalink / raw)
  To: venkatesh.pallipadi
  Cc: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes, linux-kernel,
	Suresh Siddha

venkatesh.pallipadi@intel.com writes:

> Forward port of coherent-mmap.patch and sysfs-bin-ioctl.patch to x86 tree.
>
> TBD: Do we need the ioctl interface to sysfs or get the type attribute
> through a different sysfs file. And then actually specify the attribute
> while doing pci_mmap_page_range ;-)

This ioctl is not connected up.  So regardless of the wisdom of ioctls on
sysfs adding the infrastructure and then not using it is broken.

> And when this interface is in place, X server has to use this interface for WC
> mapping.

Eric

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 06/12] PAT 64b: Add ioremap_wc support
  2007-12-13 23:55 ` [RFC PATCH 06/12] PAT 64b: Add ioremap_wc support venkatesh.pallipadi
@ 2007-12-14  4:17   ` Roland Dreier
  2007-12-14  4:28     ` Eric W. Biederman
  0 siblings, 1 reply; 52+ messages in thread
From: Roland Dreier @ 2007-12-14  4:17 UTC (permalink / raw)
  To: venkatesh.pallipadi
  Cc: ak, ebiederm, torvalds, gregkh, airlied, davej, mingo, tglx, hpa,
	akpm, arjan, jesse.barnes, linux-kernel, Suresh Siddha

 > --- linux-2.6.24-rc4.orig/include/asm-x86/io_64.h	2007-12-11 14:24:56.000000000 -0800
 > +++ linux-2.6.24-rc4/include/asm-x86/io_64.h	2007-12-11 15:49:52.000000000 -0800
 > @@ -142,7 +142,8 @@
 >   * it's useful if some control registers are in such an area and write combining
 >   * or read caching is not desirable:
 >   */
 > -extern void __iomem * ioremap_nocache (unsigned long offset, unsigned long size);
 > +extern void __iomem * ioremap_nocache(unsigned long offset, unsigned long size);
 > +extern void __iomem * ioremap_wc(unsigned long offset, unsigned long size);

I think ioremap_wc() needs to be available on all archs for this to be
really useful to drivers.  It can be a fallback to ioremap_nocache()
everywhere except 64-bit x86, but it's not nice for every driver that
wants to use this to need an "#ifdef X86" or whatever.

Also I didn't see anything like pgprot_wc() in the patchset (although
I just skimmed quickly for now).  The use case I actually have would
be in a a driver's .mmap method, where I want to map device registers
into userspace with write-combining turned on:

	vma->vm_page_prot = pgprot_wc(vma->vm_page_prot);
	io_remap_pfn_range(vma, vma->vm_start, pfn, PAGE_SIZE, vma->vm_page_prot);

where the pfn points into a PCI BAR (which will only be mapped once,
so no issues with conflicting PATs or anything like that).

 - R.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
  2007-12-14  3:48   ` Eric W. Biederman
@ 2007-12-14  4:23     ` Eric W. Biederman
  2007-12-14 21:10       ` Siddha, Suresh B
  2007-12-14 10:25     ` Andi Kleen
  2007-12-14 21:06     ` Siddha, Suresh B
  2 siblings, 1 reply; 52+ messages in thread
From: Eric W. Biederman @ 2007-12-14  4:23 UTC (permalink / raw)
  To: venkatesh.pallipadi
  Cc: ak, rdreier, torvalds, gregkh, airlied, davej, mingo, tglx, hpa,
	akpm, arjan, jesse.barnes, linux-kernel, Suresh Siddha

ebiederm@xmission.com (Eric W. Biederman) writes:


> We should use:
>> +		pat = PAT(0,WB) | PAT(1,WT) | PAT(2,WC) | PAT(3,UC) |
>> +		      PAT(4,WB) | PAT(5,WT) | PAT(6,WC) | PAT(7,UC);
>
> Changing the UC- which currently allows write-combining if the MTRRs specify it,
> to WC.  This grandfathers in all of our current usage and changes the one
> PAT type that could today and in legacy mode specify WC to really specify WC.
>
> I don't know if we need to set the high half or not, that would depend
> on the state of the PAT errata.
>
> I do know we need to use the low 4 pat mappings to avoid most of the PAT
> errata issues.
>
> As for Andi's concern about modules playing games with the PAT mappings
> if we don't redefine how we use the page table entries our exposure to
> badly behaved modules more limited.

Ok.  My analysis here was wrong.  Currently pgprot_noncached and
ioremap_nocache are out of sync.  With ioremap_nocache only specifying
_PAGE_PCD and pgprot_noncached specifying _PAGE_PCD | _PAGE_PWT.

So I don't have a clue how someone could reprogram the mtrrs currently
and expect things to work.

...

If we bother to ask ioremap for memory that is not cached, the last
thing in the world we want is the MTRRs upgrading that to write combining.
So ioremap_nocache has been slightly buggy for ages.  ioremap_nocache
and PAGE_KERNEL_NOCACHE should get _PAGE_PWT added to their
definitions.

Could we please get a cleanup patch at the beginning of this patchset
or that comes before it that fixes ioremap_nocache on x86?

That will make us a lot more git-bisect safe.


Eric










^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 06/12] PAT 64b: Add ioremap_wc support
  2007-12-14  4:17   ` Roland Dreier
@ 2007-12-14  4:28     ` Eric W. Biederman
  2007-12-14  4:32       ` Roland Dreier
  0 siblings, 1 reply; 52+ messages in thread
From: Eric W. Biederman @ 2007-12-14  4:28 UTC (permalink / raw)
  To: Roland Dreier
  Cc: venkatesh.pallipadi, ak, ebiederm, torvalds, gregkh, airlied,
	davej, mingo, tglx, hpa, akpm, arjan, jesse.barnes, linux-kernel,
	Suresh Siddha

Roland Dreier <rdreier@cisco.com> writes:

>  > --- linux-2.6.24-rc4.orig/include/asm-x86/io_64.h 2007-12-11
> 14:24:56.000000000 -0800
>  > +++ linux-2.6.24-rc4/include/asm-x86/io_64.h 2007-12-11 15:49:52.000000000
> -0800
>  > @@ -142,7 +142,8 @@
>  > * it's useful if some control registers are in such an area and write
> combining
>  >   * or read caching is not desirable:
>  >   */
>  > -extern void __iomem * ioremap_nocache (unsigned long offset, unsigned long
> size);
>  > +extern void __iomem * ioremap_nocache(unsigned long offset, unsigned long
> size);
>  > +extern void __iomem * ioremap_wc(unsigned long offset, unsigned long size);
>
> I think ioremap_wc() needs to be available on all archs for this to be
> really useful to drivers.  It can be a fallback to ioremap_nocache()
> everywhere except 64-bit x86, but it's not nice for every driver that
> wants to use this to need an "#ifdef X86" or whatever.
>
> Also I didn't see anything like pgprot_wc() in the patchset (although
pgprot_writcombined.

> I just skimmed quickly for now).  The use case I actually have would
> be in a a driver's .mmap method, where I want to map device registers
> into userspace with write-combining turned on:
>
> 	vma->vm_page_prot = pgprot_wc(vma->vm_page_prot);
> 	io_remap_pfn_range(vma, vma->vm_start, pfn, PAGE_SIZE,
> vma->vm_page_prot);
>
> where the pfn points into a PCI BAR (which will only be mapped once,
> so no issues with conflicting PATs or anything like that).

Eric


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 06/12] PAT 64b: Add ioremap_wc support
  2007-12-14  4:28     ` Eric W. Biederman
@ 2007-12-14  4:32       ` Roland Dreier
  2007-12-14  4:48         ` Eric W. Biederman
  0 siblings, 1 reply; 52+ messages in thread
From: Roland Dreier @ 2007-12-14  4:32 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: venkatesh.pallipadi, ak, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes, linux-kernel,
	Suresh Siddha

 > > Also I didn't see anything like pgprot_wc() in the patchset (although

 > pgprot_writcombined.

Oh I see it now (pgprot_writecombine() actually).

However the same comment as before applies: there needs to be a
fallback to pgprot_noncached() for all other architectures so that
drivers can actually use it in a sane way.

 - R.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 06/12] PAT 64b: Add ioremap_wc support
  2007-12-14  4:32       ` Roland Dreier
@ 2007-12-14  4:48         ` Eric W. Biederman
  2007-12-14 21:40           ` Siddha, Suresh B
  0 siblings, 1 reply; 52+ messages in thread
From: Eric W. Biederman @ 2007-12-14  4:48 UTC (permalink / raw)
  To: Roland Dreier
  Cc: venkatesh.pallipadi, ak, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes, linux-kernel,
	Suresh Siddha

Roland Dreier <rdreier@cisco.com> writes:

>  > > Also I didn't see anything like pgprot_wc() in the patchset (although
>
>  > pgprot_writcombined.
>
> Oh I see it now (pgprot_writecombine() actually).
>
> However the same comment as before applies: there needs to be a
> fallback to pgprot_noncached() for all other architectures so that
> drivers can actually use it in a sane way.

Sounds reasonable.

There are three distinct pieces to this.
- Getting arch/x86 caught up the state art in linux
- Getting the state of the art generalized so everyone can use it.
- Figuring how to generally do the proper conflict checking so we
  don't shoot ourselves in the head by accident.

They are all independent problems and can be solved in any order.

It should be the conflict checking that is the actual bottleneck.
The rest is just gravy.

Eric

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 08/12] PAT 64b: coherent mmap and sysfs bin ioctl
  2007-12-14  3:59   ` Eric W. Biederman
@ 2007-12-14  6:02     ` Greg KH
  2007-12-14  6:04       ` Eric W. Biederman
  0 siblings, 1 reply; 52+ messages in thread
From: Greg KH @ 2007-12-14  6:02 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: venkatesh.pallipadi, ak, rdreier, torvalds, airlied, davej,
	mingo, tglx, hpa, akpm, arjan, jesse.barnes, linux-kernel,
	Suresh Siddha

On Thu, Dec 13, 2007 at 08:59:44PM -0700, Eric W. Biederman wrote:
> venkatesh.pallipadi@intel.com writes:
> 
> > Forward port of coherent-mmap.patch and sysfs-bin-ioctl.patch to x86 tree.
> >
> > TBD: Do we need the ioctl interface to sysfs or get the type attribute
> > through a different sysfs file. And then actually specify the attribute
> > while doing pci_mmap_page_range ;-)
> 
> This ioctl is not connected up.  So regardless of the wisdom of ioctls on
> sysfs adding the infrastructure and then not using it is broken.

Ok, I guess the "use an ioctl on a binary file in sysfs for PCI devices"
makes a bit more sense (hint, next time explain this in the changelog
instead of just saying that it is being added), but I would like to see
how this is all hooked up before passing final judgement on it.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 08/12] PAT 64b: coherent mmap and sysfs bin ioctl
  2007-12-14  6:02     ` Greg KH
@ 2007-12-14  6:04       ` Eric W. Biederman
  2007-12-14 10:19         ` Andi Kleen
  0 siblings, 1 reply; 52+ messages in thread
From: Eric W. Biederman @ 2007-12-14  6:04 UTC (permalink / raw)
  To: Greg KH
  Cc: venkatesh.pallipadi, ak, rdreier, torvalds, airlied, davej,
	mingo, tglx, hpa, akpm, arjan, jesse.barnes, linux-kernel,
	Suresh Siddha

Greg KH <gregkh@suse.de> writes:

> On Thu, Dec 13, 2007 at 08:59:44PM -0700, Eric W. Biederman wrote:
>> venkatesh.pallipadi@intel.com writes:
>> 
>> > Forward port of coherent-mmap.patch and sysfs-bin-ioctl.patch to x86 tree.
>> >
>> > TBD: Do we need the ioctl interface to sysfs or get the type attribute
>> > through a different sysfs file. And then actually specify the attribute
>> > while doing pci_mmap_page_range ;-)
>> 
>> This ioctl is not connected up.  So regardless of the wisdom of ioctls on
>> sysfs adding the infrastructure and then not using it is broken.
>
> Ok, I guess the "use an ioctl on a binary file in sysfs for PCI devices"
> makes a bit more sense (hint, next time explain this in the changelog
> instead of just saying that it is being added), but I would like to see
> how this is all hooked up before passing final judgement on it.


The obvious thing to do would be to hook it up like:
drivers/pci/proc.c:proc_bus_pci_ioctl.

Eric

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 08/12] PAT 64b: coherent mmap and sysfs bin ioctl
  2007-12-14  0:35     ` David Miller
@ 2007-12-14  6:34       ` Greg KH
  2007-12-16 21:57         ` Paul Mackerras
  0 siblings, 1 reply; 52+ messages in thread
From: Greg KH @ 2007-12-14  6:34 UTC (permalink / raw)
  To: David Miller
  Cc: venkatesh.pallipadi, ak, ebiederm, rdreier, torvalds, airlied,
	davej, mingo, tglx, hpa, akpm, arjan, jesse.barnes, linux-kernel,
	suresh.b.siddha

On Thu, Dec 13, 2007 at 04:35:05PM -0800, David Miller wrote:
> From: Greg KH <gregkh@suse.de>
> Date: Thu, 13 Dec 2007 16:19:32 -0800
> 
> > On Thu, Dec 13, 2007 at 03:55:51PM -0800, venkatesh.pallipadi@intel.com wrote:
> > > Forward port of coherent-mmap.patch and sysfs-bin-ioctl.patch to x86 tree.
> > > 
> > > TBD: Do we need the ioctl interface to sysfs or get the type attribute
> > > through a different sysfs file. And then actually specify the attribute
> > > while doing pci_mmap_page_range ;-)
> > 
> > Woah!  No, no ioctls on sysfs files, sorry.  Not going to happen, do
> > this on a /dev file if you want to have ioctls...
> 
> Well since we told people to move over to sysfs for PCI
> accesses, and that's where mmap() is done via too,
> it should be no surprise that we run into problems when
> people want to set attributes for the mmap() as was done
> for the procfs case.
> 
> So you have two choices:
> 
> 1) Balk on the sysfs pci usage, and erase years of effort
>    of moving people over to sysfs.  Tell them to go back to
>    procfs so we can add the attribute setting via ioctl()
>    which is absolutely needed.

Ok, sorry, it wasn't blindingly obvious that this was for pci sysfs
devices that are mmaped, that makes a bit more sense.

But I'd like to see what ioctl is wanted here first.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 08/12] PAT 64b: coherent mmap and sysfs bin ioctl
  2007-12-14  6:04       ` Eric W. Biederman
@ 2007-12-14 10:19         ` Andi Kleen
  0 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-12-14 10:19 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg KH, venkatesh.pallipadi, rdreier, torvalds, airlied, davej,
	mingo, tglx, hpa, akpm, arjan, jesse.barnes, linux-kernel,
	Suresh Siddha

> The obvious thing to do would be to hook it up like:
> drivers/pci/proc.c:proc_bus_pci_ioctl.

Yes that is what it intended to do -- i just had never finished/tested
that.

-Andi

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
  2007-12-14  3:48   ` Eric W. Biederman
  2007-12-14  4:23     ` Eric W. Biederman
@ 2007-12-14 10:25     ` Andi Kleen
  2007-12-14 19:45       ` H. Peter Anvin
  2007-12-18  4:42       ` Eric W. Biederman
  2007-12-14 21:06     ` Siddha, Suresh B
  2 siblings, 2 replies; 52+ messages in thread
From: Andi Kleen @ 2007-12-14 10:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: venkatesh.pallipadi, rdreier, torvalds, gregkh, airlied, davej,
	mingo, tglx, hpa, akpm, arjan, jesse.barnes, linux-kernel,
	Suresh Siddha

> I do know we need to use the low 4 pat mappings to avoid most of the PAT
> errata issues.

They don't really matter. These are all very old systems who have run 
fine for many years without PAT. It is no problem to let them
continue to do so and just disable PAT for them. So just clear pat bit in
CPU initialization for any CPUs with non trivial erratas in this
area.

PAT is only really needed on modern boxes.

Just someone needs to go through the old errata sheets and find
out on which CPUs it is needed to clear the bit.

> As for Andi's concern about modules playing games with the PAT mappings
> if we don't redefine how we use the page table entries our exposure to
> badly behaved modules more limited.

I would just recheck them after module load and if it happens
print a nasty message and program them back. e.g. kernel debuggers
need an after module notifier anyways, so it would be fine
to just add one and hook into that.

-Andi


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
  2007-12-14  0:42   ` Andi Kleen
@ 2007-12-14 18:31     ` Venki Pallipadi
  2007-12-18  4:50       ` Eric W. Biederman
  0 siblings, 1 reply; 52+ messages in thread
From: Venki Pallipadi @ 2007-12-14 18:31 UTC (permalink / raw)
  To: Andi Kleen
  Cc: venkatesh.pallipadi, ebiederm, rdreier, torvalds, gregkh,
	airlied, davej, mingo, tglx, hpa, akpm, arjan, jesse.barnes,
	linux-kernel, Suresh Siddha

On Fri, Dec 14, 2007 at 01:42:12AM +0100, Andi Kleen wrote:
> > +void __cpuinit pat_init(void)
> > +{
> > +	/* Set PWT+PCD to Write-Combining. All other bits stay the same */
> > +	if (cpu_has_pat) {
> 
> All the old CPUs (PPro etc.) with known PAT bugs need to clear this flag 
> now in their CPU init functions. It is fine to be aggressive there
> because these old systems have lived so long without PAT they can do 
> so forever. So perhaps it's best to just white list it only for newer
> CPUs on the Intel side at least.

Yes. Enabling this only on relatively newer CPUs is safer. Will do that in next iteration of the patches.
 
> Another problem is that there are some popular modules (ATI, Nvidia for once)
> who reprogram the PAT registers on their own, likely different. Need some way to detect
> that case I guess, otherwise lots of users will see strange malfunctions.
> Maybe recheck after module load?

Yes. We can check that at load time. But they can still do bad things at runt ime, like say when 3D gets enabled etc??

 
> > +                   |||
> > +		   000 WB         default
> > +		   010 UC_MINUS   _PAGE_PCD
> > +		   011 WC         _PAGE_WC
> > +		   PAT bit unused */
> > +		pat = PAT(0,WB) | PAT(1,WT) | PAT(2,UC_MINUS) | PAT(3,WC) |
> > +		      PAT(4,WB) | PAT(5,WT) | PAT(6,UC_MINUS) | PAT(7,WC);
> > +		rdmsrl(MSR_IA32_CR_PAT, boot_pat_state);
> > +		wrmsrl(MSR_IA32_CR_PAT, pat);
> > +		__flush_tlb_all();
> > +		asm volatile("wbinvd");
> 
> Have you double checked this is the full procedure from the manual? iirc there
> were some steps missing.


Checking the manual for this. You are right, we had missed some steps here.
Actually, manual says on MP, PAT MSR on all CPUs must be consistent (even when they are not really using it in their page tables.
So, this will change the init and shutdown parts significantly and there may be some challenges with CPU offline and KEXEC. We will redo this part in next iteration.

Thanks,
Venki

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
  2007-12-14 10:25     ` Andi Kleen
@ 2007-12-14 19:45       ` H. Peter Anvin
  2007-12-18  4:42       ` Eric W. Biederman
  1 sibling, 0 replies; 52+ messages in thread
From: H. Peter Anvin @ 2007-12-14 19:45 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Eric W. Biederman, venkatesh.pallipadi, rdreier, torvalds,
	gregkh, airlied, davej, mingo, tglx, akpm, arjan, jesse.barnes,
	linux-kernel, Suresh Siddha

Andi Kleen wrote:
>> I do know we need to use the low 4 pat mappings to avoid most of the PAT
>> errata issues.
> 
> They don't really matter. These are all very old systems who have run 
> fine for many years without PAT. It is no problem to let them
> continue to do so and just disable PAT for them. So just clear pat bit in
> CPU initialization for any CPUs with non trivial erratas in this
> area.
> 
> PAT is only really needed on modern boxes.

How many mapping types do we actually need?  The only ones which are 
likely to be used in practice are WB, UC, WC, which still leaves a 
spare.  (Any intended users of WP or WT?)

	-hpa

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
  2007-12-14  3:48   ` Eric W. Biederman
  2007-12-14  4:23     ` Eric W. Biederman
  2007-12-14 10:25     ` Andi Kleen
@ 2007-12-14 21:06     ` Siddha, Suresh B
  2 siblings, 0 replies; 52+ messages in thread
From: Siddha, Suresh B @ 2007-12-14 21:06 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: venkatesh.pallipadi, ak, rdreier, torvalds, gregkh, airlied,
	davej, mingo, tglx, hpa, akpm, arjan, jesse.barnes, linux-kernel,
	Suresh Siddha

On Thu, Dec 13, 2007 at 08:48:45PM -0700, Eric W. Biederman wrote:
> > +		pat = PAT(0,WB) | PAT(1,WT) | PAT(2,UC_MINUS) | PAT(3,WC) |
> > +		      PAT(4,WB) | PAT(5,WT) | PAT(6,UC_MINUS) | PAT(7,WC);
> 
> I strongly object to this configuration.
> 
> The caching modes of interest are:
> PAT_WB write-back or a close as the MTRRs will allow
>        used for WC today.
> PAT_UC completely uncachable not overridable by MTRRs 
>        and what we use today for pgprot_noncached
> PAT_WC what isn't available for current use.
>
> We should use:
> > +		pat = PAT(0,WB) | PAT(1,WT) | PAT(2,WC) | PAT(3,UC) |
> > +		      PAT(4,WB) | PAT(5,WT) | PAT(6,WC) | PAT(7,UC);
> 
> Changing the UC- which currently allows write-combining if the MTRRs specify it,
> to WC.  This grandfathers in all of our current usage and changes the one
> PAT type that could today and in legacy mode specify WC to really specify WC.

That seems reasonable. But looking at mainline kernel, ioremap_nocache()
actually uses UC_MINUS. Wonder why it is not using UC (like
pgprot_noncached).  I think it is ok to change ioremap_nocache() to use UC.

thanks,
suresh

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
  2007-12-14  4:23     ` Eric W. Biederman
@ 2007-12-14 21:10       ` Siddha, Suresh B
  2007-12-14 23:34         ` Siddha, Suresh B
  0 siblings, 1 reply; 52+ messages in thread
From: Siddha, Suresh B @ 2007-12-14 21:10 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: venkatesh.pallipadi, ak, rdreier, torvalds, gregkh, airlied,
	davej, mingo, tglx, hpa, akpm, arjan, jesse.barnes, linux-kernel,
	Suresh Siddha

On Thu, Dec 13, 2007 at 09:23:26PM -0700, Eric W. Biederman wrote:
> ebiederm@xmission.com (Eric W. Biederman) writes:
> Ok.  My analysis here was wrong.  Currently pgprot_noncached and
> ioremap_nocache are out of sync.  With ioremap_nocache only specifying
> _PAGE_PCD and pgprot_noncached specifying _PAGE_PCD | _PAGE_PWT.
> 
> So I don't have a clue how someone could reprogram the mtrrs currently
> and expect things to work.
> 
> ...
> 
> If we bother to ask ioremap for memory that is not cached, the last
> thing in the world we want is the MTRRs upgrading that to write combining.
> So ioremap_nocache has been slightly buggy for ages.  ioremap_nocache
> and PAGE_KERNEL_NOCACHE should get _PAGE_PWT added to their
> definitions.
> 
> Could we please get a cleanup patch at the beginning of this patchset
> or that comes before it that fixes ioremap_nocache on x86?
> 
> That will make us a lot more git-bisect safe.

Ok. I will send a separate patch  fixing ioremap_nocache on x86.

thanks,
suresh

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 06/12] PAT 64b: Add ioremap_wc support
  2007-12-14  4:48         ` Eric W. Biederman
@ 2007-12-14 21:40           ` Siddha, Suresh B
  2007-12-14 23:19             ` Andi Kleen
  2007-12-18  8:29             ` Eric W. Biederman
  0 siblings, 2 replies; 52+ messages in thread
From: Siddha, Suresh B @ 2007-12-14 21:40 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Roland Dreier, venkatesh.pallipadi, ak, torvalds, gregkh,
	airlied, davej, mingo, tglx, hpa, akpm, arjan, jesse.barnes,
	linux-kernel, Suresh Siddha

On Thu, Dec 13, 2007 at 09:48:36PM -0700, Eric W. Biederman wrote:
> Roland Dreier <rdreier@cisco.com> writes:
> 
> >  > > Also I didn't see anything like pgprot_wc() in the patchset (although
> >
> >  > pgprot_writcombined.
> >
> > Oh I see it now (pgprot_writecombine() actually).
> >
> > However the same comment as before applies: there needs to be a
> > fallback to pgprot_noncached() for all other architectures so that
> > drivers can actually use it in a sane way.
> 
> Sounds reasonable.

We will fix it in next rev.

> It should be the conflict checking that is the actual bottleneck.
> The rest is just gravy.

Yes. We are looking for comments for our proposal to track the
reserved/non-reserved regions some what different.
This is the critical issue which had been holding off PAT for years now...

<snip from the other mails>

Change x86_64 identity map to only map non-reserved memory. This helps
to handle UC/WC mapping of reserved region in a much simple manner
(we don't have to do cpa any more, as such not keep track of the actual
reference counts. We still track all the usages to keep the mappings
consistent. We just avoid the headache of splitting mattr regions for
managing ref counts for every individual usage of the reserved area).

For now, we don't track RAM pages using memattr infrastructure. This is because,
memattr infrastructure is not enough. i.e., while the page is getting
tracked using memattr infrastructure, potentially the page can get
freed(a bug that we need to catch, to avoid attribute aliasing).
For example, a driver does ioremap_uc and an application mapped the
same page using /dev/mem. When the driver does iounamp and free the page,
/dev/mem mapping is still live and we run into aliasing issue.

Can we use the existing page struct to keep track of the attribute
and usage? /dev/mem mappings then can increment the page ref count and not
allow to free the page while the /dev/mem mappings are active. And allow
/dev/mem to map only those pages which are marked reserved (which the driver
does before doing iomap).

Or when a WB mapping through /dev/mem is active, don't allow any driver
to map the page as UC.. Can we do this tracking for RAM pages through
struct page. Or there any issues we should keep in mind..

thanks,
suresh

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 00/12] PAT 64b: PAT support for X86_64
  2007-12-14  0:28 ` [RFC PATCH 00/12] PAT 64b: PAT support for X86_64 Dave Airlie
@ 2007-12-14 22:00   ` Siddha, Suresh B
  2007-12-14 22:27     ` Dave Airlie
  0 siblings, 1 reply; 52+ messages in thread
From: Siddha, Suresh B @ 2007-12-14 22:00 UTC (permalink / raw)
  To: Dave Airlie
  Cc: venkatesh.pallipadi, ak, ebiederm, rdreier, torvalds, gregkh,
	davej, mingo, tglx, hpa, akpm, arjan, jesse.barnes, linux-kernel

On Fri, Dec 14, 2007 at 12:28:25AM +0000, Dave Airlie wrote:
> Yes, the main use for GPUs is to have RAM pages mapped WC, and placed into 
> a GART on the GPU side, currently for Intel IGD we are okay as the CPU can 
> access the GPU GART aperture, but other chips exist where this isn't 
> possible, I think poulsbo and possible some of the AMD IGPs..

Ok. So how is it working today on these platforms with no PAT support.
Open source drivers use UC or WB on these platforms? As this RAM is not
contiguous, one can't use MTRRs to set WC. Right?

Well, if WC is needed for RAM, then we have to address it too.

thanks,
suresh

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 00/12] PAT 64b: PAT support for X86_64
  2007-12-14 22:00   ` Siddha, Suresh B
@ 2007-12-14 22:27     ` Dave Airlie
  2007-12-14 22:32       ` H. Peter Anvin
  0 siblings, 1 reply; 52+ messages in thread
From: Dave Airlie @ 2007-12-14 22:27 UTC (permalink / raw)
  To: Siddha, Suresh B
  Cc: Dave Airlie, venkatesh.pallipadi, ak, ebiederm, rdreier,
	torvalds, gregkh, davej, mingo, tglx, hpa, akpm, arjan,
	jesse.barnes, linux-kernel

On Dec 15, 2007 8:00 AM, Siddha, Suresh B <suresh.b.siddha@intel.com> wrote:
> On Fri, Dec 14, 2007 at 12:28:25AM +0000, Dave Airlie wrote:
> > Yes, the main use for GPUs is to have RAM pages mapped WC, and placed into
> > a GART on the GPU side, currently for Intel IGD we are okay as the CPU can
> > access the GPU GART aperture, but other chips exist where this isn't
> > possible, I think poulsbo and possible some of the AMD IGPs..
>
> Ok. So how is it working today on these platforms with no PAT support.
> Open source drivers use UC or WB on these platforms? As this RAM is not
> contiguous, one can't use MTRRs to set WC. Right?
>
> Well, if WC is needed for RAM, then we have to address it too.
>

It doesn't work really, which is mostly the problem :)

We mostly use UC on these pages, or WB within cache coherent domains.
mtrrs are totally useless in this situation.

Dave.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 00/12] PAT 64b: PAT support for X86_64
  2007-12-14 22:27     ` Dave Airlie
@ 2007-12-14 22:32       ` H. Peter Anvin
  2007-12-14 22:37         ` Dave Airlie
  0 siblings, 1 reply; 52+ messages in thread
From: H. Peter Anvin @ 2007-12-14 22:32 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Siddha, Suresh B, Dave Airlie, venkatesh.pallipadi, ak, ebiederm,
	rdreier, torvalds, gregkh, davej, mingo, tglx, akpm, arjan,
	jesse.barnes, linux-kernel

Dave Airlie wrote:
> On Dec 15, 2007 8:00 AM, Siddha, Suresh B <suresh.b.siddha@intel.com> wrote:
>> On Fri, Dec 14, 2007 at 12:28:25AM +0000, Dave Airlie wrote:
>>> Yes, the main use for GPUs is to have RAM pages mapped WC, and placed into
>>> a GART on the GPU side, currently for Intel IGD we are okay as the CPU can
>>> access the GPU GART aperture, but other chips exist where this isn't
>>> possible, I think poulsbo and possible some of the AMD IGPs..
>> Ok. So how is it working today on these platforms with no PAT support.
>> Open source drivers use UC or WB on these platforms? As this RAM is not
>> contiguous, one can't use MTRRs to set WC. Right?
>>
>> Well, if WC is needed for RAM, then we have to address it too.
>>
> 
> It doesn't work really, which is mostly the problem :)
> 
> We mostly use UC on these pages, or WB within cache coherent domains.
> mtrrs are totally useless in this situation.
> 

In what sense does it not work?

	-hpa

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 00/12] PAT 64b: PAT support for X86_64
  2007-12-14 22:32       ` H. Peter Anvin
@ 2007-12-14 22:37         ` Dave Airlie
  0 siblings, 0 replies; 52+ messages in thread
From: Dave Airlie @ 2007-12-14 22:37 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Siddha, Suresh B, Dave Airlie, venkatesh.pallipadi, ak, ebiederm,
	rdreier, torvalds, gregkh, davej, mingo, tglx, akpm, arjan,
	jesse.barnes, linux-kernel

> > It doesn't work really, which is mostly the problem :)
> >
> > We mostly use UC on these pages, or WB within cache coherent domains.
> > mtrrs are totally useless in this situation.
> >
>
> In what sense does it not work?

oh I was mostly joking hence the smily.. really it just means thing
run slower than they need to..

Dave.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 06/12] PAT 64b: Add ioremap_wc support
  2007-12-14 21:40           ` Siddha, Suresh B
@ 2007-12-14 23:19             ` Andi Kleen
  2007-12-18  8:29             ` Eric W. Biederman
  1 sibling, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-12-14 23:19 UTC (permalink / raw)
  To: Siddha, Suresh B
  Cc: Eric W. Biederman, Roland Dreier, venkatesh.pallipadi, torvalds,
	gregkh, airlied, davej, mingo, tglx, hpa, akpm, arjan,
	jesse.barnes, linux-kernel

> For now, we don't track RAM pages using memattr infrastructure. This is because,
> memattr infrastructure is not enough. i.e., while the page is getting
> tracked using memattr infrastructure,

> potentially the page can get
> freed(a bug that we need to catch, to avoid attribute aliasing).

That would be a bug in the caller. But anybody who sets non standard
attributes should go through memattr first.

The only exception is /dev/mem -- not sure how to handle this best.
Perhaps only allow memattr for non memory for there. 

-Andi

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
  2007-12-14 21:10       ` Siddha, Suresh B
@ 2007-12-14 23:34         ` Siddha, Suresh B
  2007-12-15  7:55           ` Ingo Molnar
  0 siblings, 1 reply; 52+ messages in thread
From: Siddha, Suresh B @ 2007-12-14 23:34 UTC (permalink / raw)
  To: ebiederm
  Cc: Eric W. Biederman, venkatesh.pallipadi, ak, rdreier, torvalds,
	gregkh, airlied, davej, mingo, tglx, hpa, akpm, arjan,
	jesse.barnes, linux-kernel

On Fri, Dec 14, 2007 at 01:10:39PM -0800, Siddha, Suresh B wrote:
> On Thu, Dec 13, 2007 at 09:23:26PM -0700, Eric W. Biederman wrote:
> > ebiederm@xmission.com (Eric W. Biederman) writes:
> > Ok.  My analysis here was wrong.  Currently pgprot_noncached and
> > ioremap_nocache are out of sync.  With ioremap_nocache only specifying
> > _PAGE_PCD and pgprot_noncached specifying _PAGE_PCD | _PAGE_PWT.
> > 
> > So I don't have a clue how someone could reprogram the mtrrs currently
> > and expect things to work.
> > 
> > ...
> > 
> > If we bother to ask ioremap for memory that is not cached, the last
> > thing in the world we want is the MTRRs upgrading that to write combining.
> > So ioremap_nocache has been slightly buggy for ages.  ioremap_nocache
> > and PAGE_KERNEL_NOCACHE should get _PAGE_PWT added to their
> > definitions.
> > 
> > Could we please get a cleanup patch at the beginning of this patchset
> > or that comes before it that fixes ioremap_nocache on x86?
> > 
> > That will make us a lot more git-bisect safe.
> 
> Ok. I will send a separate patch  fixing ioremap_nocache on x86.

Appended the patch. x86 folks, please consider for x86 mm git tree. Thanks.

---
[patch] x86: Set strong uncacheable where UC is really desired

Also use _PAGE_PWT for all the mappings which need uncache mapping. Instead of
existing PAT2 which is UC- (and can be overwritten by MTRRs), we now use PAT3
which is strong uncacheable.

This makes it consistent with pgprot_noncached()

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

diff --git a/arch/x86/mm/ioremap_32.c b/arch/x86/mm/ioremap_32.c
index 0b27831..ef0f6a4 100644
--- a/arch/x86/mm/ioremap_32.c
+++ b/arch/x86/mm/ioremap_32.c
@@ -119,7 +119,7 @@ EXPORT_SYMBOL(__ioremap);
 void __iomem *ioremap_nocache (unsigned long phys_addr, unsigned long size)
 {
 	unsigned long last_addr;
-	void __iomem *p = __ioremap(phys_addr, size, _PAGE_PCD);
+	void __iomem *p = __ioremap(phys_addr, size, _PAGE_PCD | _PAGE_PWT);
 	if (!p) 
 		return p; 
 
diff --git a/arch/x86/mm/ioremap_64.c b/arch/x86/mm/ioremap_64.c
index 6cac90a..8be3062 100644
--- a/arch/x86/mm/ioremap_64.c
+++ b/arch/x86/mm/ioremap_64.c
@@ -158,7 +158,7 @@ EXPORT_SYMBOL(__ioremap);
 
 void __iomem *ioremap_nocache (unsigned long phys_addr, unsigned long size)
 {
-	return __ioremap(phys_addr, size, _PAGE_PCD);
+	return __ioremap(phys_addr, size, _PAGE_PCD | _PAGE_PWT);
 }
 EXPORT_SYMBOL(ioremap_nocache);
 
diff --git a/include/asm-x86/pgtable_32.h b/include/asm-x86/pgtable_32.h
index ed3e70d..b1215e1 100644
--- a/include/asm-x86/pgtable_32.h
+++ b/include/asm-x86/pgtable_32.h
@@ -156,7 +156,7 @@ void paging_init(void);
 extern unsigned long long __PAGE_KERNEL, __PAGE_KERNEL_EXEC;
 #define __PAGE_KERNEL_RO		(__PAGE_KERNEL & ~_PAGE_RW)
 #define __PAGE_KERNEL_RX		(__PAGE_KERNEL_EXEC & ~_PAGE_RW)
-#define __PAGE_KERNEL_NOCACHE		(__PAGE_KERNEL | _PAGE_PCD)
+#define __PAGE_KERNEL_NOCACHE		(__PAGE_KERNEL | _PAGE_PCD | _PAGE_PWT)
 #define __PAGE_KERNEL_LARGE		(__PAGE_KERNEL | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_EXEC	(__PAGE_KERNEL_EXEC | _PAGE_PSE)
 
diff --git a/include/asm-x86/pgtable_64.h b/include/asm-x86/pgtable_64.h
index 9b0ff47..4e4dcc4 100644
--- a/include/asm-x86/pgtable_64.h
+++ b/include/asm-x86/pgtable_64.h
@@ -185,13 +185,13 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, unsigned long
 #define __PAGE_KERNEL_EXEC \
 	(_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED)
 #define __PAGE_KERNEL_NOCACHE \
-	(_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_PCD | _PAGE_ACCESSED | _PAGE_NX)
+	(_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_PCD | _PAGE_PWT | _PAGE_ACCESSED | _PAGE_NX)
 #define __PAGE_KERNEL_RO \
 	(_PAGE_PRESENT | _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_NX)
 #define __PAGE_KERNEL_VSYSCALL \
 	(_PAGE_PRESENT | _PAGE_USER | _PAGE_ACCESSED)
 #define __PAGE_KERNEL_VSYSCALL_NOCACHE \
-	(_PAGE_PRESENT | _PAGE_USER | _PAGE_ACCESSED | _PAGE_PCD)
+	(_PAGE_PRESENT | _PAGE_USER | _PAGE_ACCESSED | _PAGE_PCD | _PAGE_PWT)
 #define __PAGE_KERNEL_LARGE \
 	(__PAGE_KERNEL | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_EXEC \

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
  2007-12-14 23:34         ` Siddha, Suresh B
@ 2007-12-15  7:55           ` Ingo Molnar
  0 siblings, 0 replies; 52+ messages in thread
From: Ingo Molnar @ 2007-12-15  7:55 UTC (permalink / raw)
  To: Siddha, Suresh B
  Cc: ebiederm, venkatesh.pallipadi, ak, rdreier, torvalds, gregkh,
	airlied, davej, tglx, hpa, akpm, arjan, jesse.barnes,
	linux-kernel


* Siddha, Suresh B <suresh.b.siddha@intel.com> wrote:

> > Ok. I will send a separate patch fixing ioremap_nocache on x86.
> 
> Appended the patch. x86 folks, please consider for x86 mm git tree. 
> Thanks.

thanks, applied.

	Ingo

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 08/12] PAT 64b: coherent mmap and sysfs bin ioctl
  2007-12-14  6:34       ` Greg KH
@ 2007-12-16 21:57         ` Paul Mackerras
  2007-12-17 12:41           ` Andi Kleen
  0 siblings, 1 reply; 52+ messages in thread
From: Paul Mackerras @ 2007-12-16 21:57 UTC (permalink / raw)
  To: Greg KH
  Cc: David Miller, venkatesh.pallipadi, ak, ebiederm, rdreier,
	torvalds, airlied, davej, mingo, tglx, hpa, akpm, arjan,
	jesse.barnes, linux-kernel, suresh.b.siddha

Greg KH writes:

> Ok, sorry, it wasn't blindingly obvious that this was for pci sysfs
> devices that are mmaped, that makes a bit more sense.
> 
> But I'd like to see what ioctl is wanted here first.

I believe the ioctl would be to set whether the mapping goes to I/O or
memory space, and whether write-combining is allowed.

So the alternative to the ioctl would be to have multiple files in
sysfs, one per combination of modes -- i.e., 4 files, or 3 if we
exclude the "I/O with write combining" mode, which would be
reasonable.

Paul.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 08/12] PAT 64b: coherent mmap and sysfs bin ioctl
  2007-12-16 21:57         ` Paul Mackerras
@ 2007-12-17 12:41           ` Andi Kleen
  2007-12-18  4:30             ` Eric W. Biederman
  0 siblings, 1 reply; 52+ messages in thread
From: Andi Kleen @ 2007-12-17 12:41 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Greg KH, David Miller, venkatesh.pallipadi, ebiederm, rdreier,
	torvalds, airlied, davej, mingo, tglx, hpa, akpm, arjan,
	jesse.barnes, linux-kernel, suresh.b.siddha

On Mon, Dec 17, 2007 at 08:57:50AM +1100, Paul Mackerras wrote:
> Greg KH writes:
> 
> > Ok, sorry, it wasn't blindingly obvious that this was for pci sysfs
> > devices that are mmaped, that makes a bit more sense.
> > 
> > But I'd like to see what ioctl is wanted here first.
> 
> I believe the ioctl would be to set whether the mapping goes to I/O or
> memory space, 

x86 cannot really access IO space through mmap so no that wasn't planned

The main planned use was to get the translated bus address (after IOMMU)
for a mapping and to set the caching modes.

> So the alternative to the ioctl would be to have multiple files in
> sysfs, one per combination of modes -- i.e., 4 files, or 3 if we
> exclude the "I/O with write combining" mode, which would be
> reasonable.

At least for the IOMMU translation case that wouldn't work.

-Andi

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 08/12] PAT 64b: coherent mmap and sysfs bin ioctl
  2007-12-17 12:41           ` Andi Kleen
@ 2007-12-18  4:30             ` Eric W. Biederman
  2007-12-18  4:51               ` H. Peter Anvin
  2007-12-18  9:35               ` Andi Kleen
  0 siblings, 2 replies; 52+ messages in thread
From: Eric W. Biederman @ 2007-12-18  4:30 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Paul Mackerras, Greg KH, David Miller, venkatesh.pallipadi,
	ebiederm, rdreier, torvalds, airlied, davej, mingo, tglx, hpa,
	akpm, arjan, jesse.barnes, linux-kernel, suresh.b.siddha

Andi Kleen <ak@muc.de> writes:

> On Mon, Dec 17, 2007 at 08:57:50AM +1100, Paul Mackerras wrote:
>> Greg KH writes:
>> 
>> > Ok, sorry, it wasn't blindingly obvious that this was for pci sysfs
>> > devices that are mmaped, that makes a bit more sense.
>> > 
>> > But I'd like to see what ioctl is wanted here first.
>> 
>> I believe the ioctl would be to set whether the mapping goes to I/O or
>> memory space, 
>
> x86 cannot really access IO space through mmap so no that wasn't planned

0000_00FD_FC00_0000h - 0000_00FD_FDFF_FFFFh On a hypertransport based
system should work.  There is a 32MB window for it.

However the I/O vs mem distinction doesn't matter anyway if we start out
per bar because we already know if it is I/O or mem.

> The main planned use was to get the translated bus address (after IOMMU)
> for a mapping and to set the caching modes.
>
>> So the alternative to the ioctl would be to have multiple files in
>> sysfs, one per combination of modes -- i.e., 4 files, or 3 if we
>> exclude the "I/O with write combining" mode, which would be
>> reasonable.
>
> At least for the IOMMU translation case that wouldn't work.

Well the other alternative looks like having a second file per par
bar.  Say resource0_wc to support the write-combining mode, possibly
restricted to just prefetchable bars.

If that is all we have to worry about my inclination is to suggest
a second file, because that feels a bit more generally useable.  As
that attribute could be applied to ordinary reads and writes to and
from the bar.

Eric

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
  2007-12-14 10:25     ` Andi Kleen
  2007-12-14 19:45       ` H. Peter Anvin
@ 2007-12-18  4:42       ` Eric W. Biederman
  1 sibling, 0 replies; 52+ messages in thread
From: Eric W. Biederman @ 2007-12-18  4:42 UTC (permalink / raw)
  To: Andi Kleen
  Cc: venkatesh.pallipadi, rdreier, torvalds, gregkh, airlied, davej,
	mingo, tglx, hpa, akpm, arjan, jesse.barnes, linux-kernel,
	Suresh Siddha

Andi Kleen <ak@muc.de> writes:

>> I do know we need to use the low 4 pat mappings to avoid most of the PAT
>> errata issues.
>
> They don't really matter. These are all very old systems who have run 
> fine for many years without PAT. It is no problem to let them
> continue to do so and just disable PAT for them. So just clear pat bit in
> CPU initialization for any CPUs with non trivial erratas in this
> area.
>
> PAT is only really needed on modern boxes.
>
> Just someone needs to go through the old errata sheets and find
> out on which CPUs it is needed to clear the bit.

It has been ages now, but my impression when I wrote the patch that
current cores still had a few outstanding errata with using the
extended pat bits.

Further it was my impression was that if we just changed UC- to WC
we work on essentially everything, because PAT is always enabled
on the cores that support it.

Therefore since we only have 3 interesting caching modes.
WB, WC, UC.  We should be very careful about reprogramming it
and we can ignore the errors.

As for the pat class errata about inconsistent mappings those are
reoccurring issues, that happen across all cpu types (x86/ppc/fred),
and every  major core overhaul is likely to have them again.

Eric

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
  2007-12-14 18:31     ` Venki Pallipadi
@ 2007-12-18  4:50       ` Eric W. Biederman
  0 siblings, 0 replies; 52+ messages in thread
From: Eric W. Biederman @ 2007-12-18  4:50 UTC (permalink / raw)
  To: Venki Pallipadi
  Cc: Andi Kleen, ebiederm, rdreier, torvalds, gregkh, airlied, davej,
	mingo, tglx, hpa, akpm, arjan, jesse.barnes, linux-kernel,
	Suresh Siddha

Venki Pallipadi <venkatesh.pallipadi@intel.com> writes:

> Checking the manual for this. You are right, we had missed some steps here.
> Actually, manual says on MP, PAT MSR on all CPUs must be consistent (even when
> they are not really using it in their page tables.
> So, this will change the init and shutdown parts significantly and there may be
> some challenges with CPU offline and KEXEC. We will redo this part in next
> iteration.

Well the normal kexec path is no worse then reboot.  The kdump path is a
mess but only a minor one, and with us only changing the UC- case we
can probably just ignore it and leave the system started with that
pat register set to WC :)

What we are doing really should be no worse the MTRR setup except
that disabling it at reboot is polite.

CPU online and offline that is weird, but so far it is always weird,
and I don't think ever quite correct.

Eric

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 08/12] PAT 64b: coherent mmap and sysfs bin ioctl
  2007-12-18  4:30             ` Eric W. Biederman
@ 2007-12-18  4:51               ` H. Peter Anvin
  2007-12-18  9:35               ` Andi Kleen
  1 sibling, 0 replies; 52+ messages in thread
From: H. Peter Anvin @ 2007-12-18  4:51 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andi Kleen, Paul Mackerras, Greg KH, David Miller,
	venkatesh.pallipadi, rdreier, torvalds, airlied, davej, mingo,
	tglx, akpm, arjan, jesse.barnes, linux-kernel, suresh.b.siddha

Eric W. Biederman wrote:
> 
> 0000_00FD_FC00_0000h - 0000_00FD_FDFF_FFFFh On a hypertransport based
> system should work.  There is a 32MB window for it.
> 

It doesn't.  The termination on MMIO and IOIO transaction is different, 
and poking this memory window with an MMIO transaction will lock the 
chipset hard (yes, I've tried it.)

	-hpa

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 06/12] PAT 64b: Add ioremap_wc support
  2007-12-14 21:40           ` Siddha, Suresh B
  2007-12-14 23:19             ` Andi Kleen
@ 2007-12-18  8:29             ` Eric W. Biederman
  1 sibling, 0 replies; 52+ messages in thread
From: Eric W. Biederman @ 2007-12-18  8:29 UTC (permalink / raw)
  To: Siddha, Suresh B
  Cc: Roland Dreier, venkatesh.pallipadi, ak, torvalds, gregkh,
	airlied, davej, mingo, tglx, hpa, akpm, arjan, jesse.barnes,
	linux-kernel

"Siddha, Suresh B" <suresh.b.siddha@intel.com> writes:

> Yes. We are looking for comments for our proposal to track the
> reserved/non-reserved regions some what different.
> This is the critical issue which had been holding off PAT for years now...

The mattr infrastructure appears to do a decent job of handling the
reserved page mapping case.  It essentially reinvents struct
vm_area_struct, and so I expect we can do things a little more easily
if we use the existing vm_area_struct with it's vm_page_prot member
for our checks, all rooted at a dummy reserved page inode.  That way
we don't need to do anything special on unmap.

For normal pages we always have them in the kernel mapping and
we use them there.  change_page_attr also comes in from the AGP
drivers and changes the caching attributes on a few of those.  So when
mapping a normal page we need to require it to be write-back or
whatever change_page_attr has set it to.  I expect 2 bits 
of page->flags with the proper default can handle that.

change_page_attr needs to check non kernel mappings of a page 
and either fix them or fail.

If we perform the checks I have described for normal pages
in /dev/mem (in remap_pfn_pages?) that should be our
most difficult case handled.

Eric

> <snip from the other mails>
>
> Change x86_64 identity map to only map non-reserved memory. This helps
> to handle UC/WC mapping of reserved region in a much simple manner
> (we don't have to do cpa any more, as such not keep track of the actual
> reference counts. We still track all the usages to keep the mappings
> consistent. We just avoid the headache of splitting mattr regions for
> managing ref counts for every individual usage of the reserved
> area).

Well we do want to early map the ``isa'' region.
>
> For now, we don't track RAM pages using memattr infrastructure. This is because,
> memattr infrastructure is not enough. i.e., while the page is getting
> tracked using memattr infrastructure, potentially the page can get
> freed(a bug that we need to catch, to avoid attribute aliasing).
> For example, a driver does ioremap_uc and an application mapped the
> same page using /dev/mem. When the driver does iounamp and free the page,
> /dev/mem mapping is still live and we run into aliasing issue.

/dev/mem is particular weird because it doesn't own the page, and thus
will always be the second user if we are talking about pages in ram.

> Can we use the existing page struct to keep track of the attribute
> and usage?

Yes but not the way you describe below.

> /dev/mem mappings then can increment the page ref count and not
> allow to free the page while the /dev/mem mappings are active. And allow
> /dev/mem to map only those pages which are marked reserved (which the driver
> does before doing iomap).

Part of the usefulness of /dev/mem is that it can do silly things like
map pages someone else in the kernel is using.  /dev/mem by it's very
nature does not own ram pages so we need to handle that differently.

> Or when a WB mapping through /dev/mem is active, don't allow any driver
> to map the page as UC.. Can we do this tracking for RAM pages through
> struct page. Or there any issues we should keep in mind..

I think some bits in page->flags should do the trick.  The semantics
of change_page_attr are interesting in this case.

Eric

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 08/12] PAT 64b: coherent mmap and sysfs bin ioctl
  2007-12-18  4:30             ` Eric W. Biederman
  2007-12-18  4:51               ` H. Peter Anvin
@ 2007-12-18  9:35               ` Andi Kleen
  2007-12-18 13:48                 ` Eric W. Biederman
  1 sibling, 1 reply; 52+ messages in thread
From: Andi Kleen @ 2007-12-18  9:35 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Paul Mackerras, Greg KH, David Miller, venkatesh.pallipadi,
	rdreier, torvalds, airlied, davej, mingo, tglx, hpa, akpm, arjan,
	jesse.barnes, linux-kernel, suresh.b.siddha

> Well the other alternative looks like having a second file per par
> bar.  Say resource0_wc to support the write-combining mode, possibly

The intention was to support memory not in bars, but give a generic
IOMMU mapped memory interface for user space e.g. for the X server.
But that needs a way to return the bus address for the mmap mapping
and ioctl was the best I came up with for that.
Given it was never finished.

-Andi

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 08/12] PAT 64b: coherent mmap and sysfs bin ioctl
  2007-12-18  9:35               ` Andi Kleen
@ 2007-12-18 13:48                 ` Eric W. Biederman
  0 siblings, 0 replies; 52+ messages in thread
From: Eric W. Biederman @ 2007-12-18 13:48 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Paul Mackerras, Greg KH, David Miller, venkatesh.pallipadi,
	rdreier, torvalds, airlied, davej, mingo, tglx, hpa, akpm, arjan,
	jesse.barnes, linux-kernel, suresh.b.siddha

Andi Kleen <ak@muc.de> writes:

>> Well the other alternative looks like having a second file per par
>> bar.  Say resource0_wc to support the write-combining mode, possibly
>
> The intention was to support memory not in bars, but give a generic
> IOMMU mapped memory interface for user space e.g. for the X server.
> But that needs a way to return the bus address for the mmap mapping
> and ioctl was the best I came up with for that.
> Given it was never finished.

Ok that part wasn't obvious.  The only thing we mmap in sysfs today
are the bars.

Taking normal memory and iommu mapping it to a device and then having
a user space accessible version is a bit different.  We need a special
interface to allocate it and map it through the iommu to user space.
This needs to be a driver or a support subsystem like DRM.  Once
we have gone that far then we can map those address to user space.

I expect from the sysfs perspective those per device regions should look
a lot like bars showing contiguous chunks of memory RAM from the devices
perspective.  At which point having two files instead of just one can
solve the problem without an ioctl.

For contiguous to device memory we also have some permission issues
so I'm not yet certain that it make sense to expose it through sysfs.

Regardless that seems to be solving a completely new aspect of the problem,
and we can solve that problem separately.

Eric



^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2007-12-18 13:51 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-12-13 23:55 [RFC PATCH 00/12] PAT 64b: PAT support for X86_64 venkatesh.pallipadi
2007-12-13 23:55 ` [RFC PATCH 01/12] PAT 64b: Add cpu_shutdown() support venkatesh.pallipadi
2007-12-13 23:55 ` [RFC PATCH 02/12] PAT 64b: Basic PAT implementation venkatesh.pallipadi
2007-12-14  0:42   ` Andi Kleen
2007-12-14 18:31     ` Venki Pallipadi
2007-12-18  4:50       ` Eric W. Biederman
2007-12-14  3:48   ` Eric W. Biederman
2007-12-14  4:23     ` Eric W. Biederman
2007-12-14 21:10       ` Siddha, Suresh B
2007-12-14 23:34         ` Siddha, Suresh B
2007-12-15  7:55           ` Ingo Molnar
2007-12-14 10:25     ` Andi Kleen
2007-12-14 19:45       ` H. Peter Anvin
2007-12-18  4:42       ` Eric W. Biederman
2007-12-14 21:06     ` Siddha, Suresh B
2007-12-13 23:55 ` [RFC PATCH 03/12] PAT 64b: drm driver changes for PAT venkatesh.pallipadi
2007-12-13 23:55 ` [RFC PATCH 04/12] PAT 64b: reserve_mattr and free_mattr " venkatesh.pallipadi
2007-12-13 23:55 ` [RFC PATCH 05/12] PAT 64b: pci mmap conlfict patch venkatesh.pallipadi
2007-12-13 23:55 ` [RFC PATCH 06/12] PAT 64b: Add ioremap_wc support venkatesh.pallipadi
2007-12-14  4:17   ` Roland Dreier
2007-12-14  4:28     ` Eric W. Biederman
2007-12-14  4:32       ` Roland Dreier
2007-12-14  4:48         ` Eric W. Biederman
2007-12-14 21:40           ` Siddha, Suresh B
2007-12-14 23:19             ` Andi Kleen
2007-12-18  8:29             ` Eric W. Biederman
2007-12-13 23:55 ` [RFC PATCH 07/12] PAT 64b: dev mem chanegs for pat venkatesh.pallipadi
2007-12-13 23:55 ` [RFC PATCH 08/12] PAT 64b: coherent mmap and sysfs bin ioctl venkatesh.pallipadi
2007-12-14  0:19   ` Greg KH
2007-12-14  0:35     ` David Miller
2007-12-14  6:34       ` Greg KH
2007-12-16 21:57         ` Paul Mackerras
2007-12-17 12:41           ` Andi Kleen
2007-12-18  4:30             ` Eric W. Biederman
2007-12-18  4:51               ` H. Peter Anvin
2007-12-18  9:35               ` Andi Kleen
2007-12-18 13:48                 ` Eric W. Biederman
2007-12-14  0:43     ` Andi Kleen
2007-12-14  0:54   ` Jesse Barnes
2007-12-14  3:59   ` Eric W. Biederman
2007-12-14  6:02     ` Greg KH
2007-12-14  6:04       ` Eric W. Biederman
2007-12-14 10:19         ` Andi Kleen
2007-12-13 23:55 ` [RFC PATCH 09/12] PAT 64b: map only usable memory in identity mapping venkatesh.pallipadi
2007-12-13 23:55 ` [RFC PATCH 10/12] PAT 64b: Make acpi use early map instead of assuming identity map venkatesh.pallipadi
2007-12-13 23:55 ` [RFC PATCH 11/12] PAT 64b: devmem do not read pages not mapped in " venkatesh.pallipadi
2007-12-13 23:55 ` [RFC PATCH 12/12] PAT 64b: skip attr tracking for RAM venkatesh.pallipadi
2007-12-14  0:28 ` [RFC PATCH 00/12] PAT 64b: PAT support for X86_64 Dave Airlie
2007-12-14 22:00   ` Siddha, Suresh B
2007-12-14 22:27     ` Dave Airlie
2007-12-14 22:32       ` H. Peter Anvin
2007-12-14 22:37         ` Dave Airlie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).