linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/15] POWER8 Coherent Accelerator device driver
@ 2014-09-18  8:26 Michael Neuling
  2014-09-18  8:26 ` [PATCH 01/15] powerpc/cell: Move spu_handle_mm_fault() out of cell platform Michael Neuling
                   ` (14 more replies)
  0 siblings, 15 replies; 43+ messages in thread
From: Michael Neuling @ 2014-09-18  8:26 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie, cbe-oss-dev

This adds support for the Coherent Accelerator (cxl) attached to POWER8
processors.  This coherent accelerator interface is designed to allow the
coherent connection of FPGA based accelerators (and other devices) to POWER
systems.

IBM refers to this as the Coherent Accelerator Processor Interface or CAPI.  In
this driver it's referred to by the name cxl to avoid confusion with the ISDN
CAPI subsystem.

An overview of the patches:
  Patches  1-2:  Split some of the old Cell co-processor code out so it can be
		   reused.
  Patches  3-9:  Add infrastructure to arch/powerpc needed by cxl.
  Patch    10:   Add call backs needed for invalidating cxl mm contexts.
  Patch    11:   Add cxl specific support that needs to be built in to the
		   kernel (can't be a module).
  Patches 12-14: Add the majority of the device driver and API header.
  Patch    15:   Documentation.

The documentation in this last patch gives an overview of the hardware
architecture as well as the userspace API.

The cxl driver has a user-space interface described in include/uapi/misc/cxl.h
and Documentation/powerpc/cxl.txt.  There are two ioctls which can be used to
talk to the driver once the new /dev/cxl/afu0.0 device is opened.  This device
can also be read and mmapped.
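
As a rough illustration of the first step described above, a userspace program
might open the AFU device as sketched below.  This is only a sketch under
assumptions: sk_open_afu() is a hypothetical helper, not part of the driver,
and the ioctls, the read() format and the mmap() layout are deliberately left
out because they are defined in include/uapi/misc/cxl.h and
Documentation/powerpc/cxl.txt rather than in this mail.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open an AFU device node for read/write access.
 * Returns a file descriptor on success or a negative errno value, so the
 * caller can tell "no such device" apart from a permission problem.
 */
int sk_open_afu(const char *path)
{
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return -errno;
	return fd;
}
```

A caller would pass "/dev/cxl/afu0.0" and close() the descriptor when done.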

There are also sysfs entries used to communicate information about the cxl
configuration to userspace.  These are documented in
Documentation/ABI/testing/sysfs-class-cxl.

Many contributed to this device driver but Ian Munsie is the principal author.

Driver can also be found here (based on 3.17-rc5):
   git://github.com/mikey/linux.git cxl
   https://github.com/mikey/linux/commits/cxl

Please consider for inclusion.  Feedback welcome!

Regards,
Mikey

 Documentation/ABI/testing/sysfs-class-cxl      | 125 ++++
 Documentation/ioctl/ioctl-number.txt           |   1 +
 Documentation/powerpc/00-INDEX                 |   2 +
 Documentation/powerpc/cxl.txt                  | 310 ++++++++
 MAINTAINERS                                    |   7 +
 arch/powerpc/include/asm/copro.h               |  18 +
 arch/powerpc/include/asm/mmu-hash64.h          |   3 +
 arch/powerpc/include/asm/opal.h                |   2 +
 arch/powerpc/include/asm/pnv-pci.h             |  27 +
 arch/powerpc/include/asm/spu.h                 |   5 +-
 arch/powerpc/mm/Makefile                       |   2 +
 arch/powerpc/mm/copro_fault.c                  | 140 ++++
 arch/powerpc/mm/hash_native_64.c               |   6 +-
 arch/powerpc/mm/hash_utils_64.c                |  25 +-
 arch/powerpc/mm/slb.c                          |   3 -
 arch/powerpc/mm/slice.c                        |   3 +
 arch/powerpc/platforms/cell/Makefile           |   2 +-
 arch/powerpc/platforms/cell/spu_base.c         |  41 +-
 arch/powerpc/platforms/cell/spu_fault.c        |  94 ---
 arch/powerpc/platforms/cell/spufs/fault.c      |   4 +-
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/pci-ioda.c      | 229 +++++-
 arch/powerpc/sysdev/msi_bitmap.c               |  18 +-
 drivers/misc/Kconfig                           |   1 +
 drivers/misc/Makefile                          |   1 +
 drivers/misc/cxl/Kconfig                       |  25 +
 drivers/misc/cxl/Makefile                      |   4 +
 drivers/misc/cxl/base.c                        | 102 +++
 drivers/misc/cxl/context.c                     | 169 +++++
 drivers/misc/cxl/cxl-pci.c                     | 977 +++++++++++++++++++++++++
 drivers/misc/cxl/cxl.h                         | 605 +++++++++++++++
 drivers/misc/cxl/debugfs.c                     | 116 +++
 drivers/misc/cxl/fault.c                       | 298 ++++++++
 drivers/misc/cxl/file.c                        | 503 +++++++++++++
 drivers/misc/cxl/irq.c                         | 405 ++++++++++
 drivers/misc/cxl/main.c                        | 238 ++++++
 drivers/misc/cxl/native.c                      | 649 ++++++++++++++++
 drivers/misc/cxl/sysfs.c                       | 348 +++++++++
 include/misc/cxl.h                             |  34 +
 include/uapi/Kbuild                            |   1 +
 include/uapi/misc/Kbuild                       |   2 +
 include/uapi/misc/cxl.h                        |  88 +++
 42 files changed, 5463 insertions(+), 171 deletions(-)


* [PATCH 01/15] powerpc/cell: Move spu_handle_mm_fault() out of cell platform
  2014-09-18  8:26 [PATCH 0/15] POWER8 Coherent Accelerator device driver Michael Neuling
@ 2014-09-18  8:26 ` Michael Neuling
  2014-09-18 10:00   ` Jeremy Kerr
  2014-09-26  3:57   ` Anton Blanchard
  2014-09-18  8:26 ` [PATCH 02/15] powerpc/cell: Move data segment faulting code " Michael Neuling
                   ` (13 subsequent siblings)
  14 siblings, 2 replies; 43+ messages in thread
From: Michael Neuling @ 2014-09-18  8:26 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie, cbe-oss-dev

From: Ian Munsie <imunsie@au1.ibm.com>

Currently spu_handle_mm_fault() is in the cell platform.

This code is generically useful for other non-cell co-processors on powerpc.

This patch moves this function out of the cell platform into arch/powerpc/mm so
that others may use it.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
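For readers new to this code, the access check that copro_handle_mm_fault()
applies once it has found the VMA can be sketched in userspace as below.  The
sk_ names are hypothetical and the constant values are stand-ins chosen for
the sketch (the kernel takes DSISR_* from <asm/reg.h> and VM_* from
<linux/mm.h>).  The VMA lookup, stack growth and the handle_mm_fault() call
are omitted.

```c
#include <stdbool.h>

/* Stand-in values for the sketch; treat them as assumptions. */
#define SK_DSISR_PROTFAULT	0x08000000UL	/* protection fault */
#define SK_DSISR_ISSTORE	0x02000000UL	/* access was a store */
#define SK_VM_READ		0x1UL
#define SK_VM_WRITE		0x2UL
#define SK_VM_EXEC		0x4UL

/*
 * Return true when the faulting access is compatible with the mapping:
 * a store needs a writable mapping, a load must not be a protection
 * fault and needs a readable or executable mapping.
 */
bool sk_access_allowed(unsigned long dsisr, unsigned long vm_flags)
{
	if (dsisr & SK_DSISR_ISSTORE)
		return (vm_flags & SK_VM_WRITE) != 0;

	if (dsisr & SK_DSISR_PROTFAULT)
		return false;

	return (vm_flags & (SK_VM_READ | SK_VM_EXEC)) != 0;
}
```
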
 arch/powerpc/include/asm/copro.h                       | 18 ++++++++++++++++++
 arch/powerpc/include/asm/spu.h                         |  5 ++---
 arch/powerpc/mm/Makefile                               |  1 +
 .../{platforms/cell/spu_fault.c => mm/copro_fault.c}   | 14 ++++++--------
 arch/powerpc/platforms/cell/Makefile                   |  2 +-
 arch/powerpc/platforms/cell/spufs/fault.c              |  4 ++--
 6 files changed, 30 insertions(+), 14 deletions(-)
 create mode 100644 arch/powerpc/include/asm/copro.h
 rename arch/powerpc/{platforms/cell/spu_fault.c => mm/copro_fault.c} (89%)

diff --git a/arch/powerpc/include/asm/copro.h b/arch/powerpc/include/asm/copro.h
new file mode 100644
index 0000000..2858108
--- /dev/null
+++ b/arch/powerpc/include/asm/copro.h
@@ -0,0 +1,18 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _ASM_POWERPC_COPRO_H
+#define _ASM_POWERPC_COPRO_H
+
+int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
+			  unsigned long dsisr, unsigned *flt);
+
+int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid);
+
+#endif /* _ASM_POWERPC_COPRO_H */
diff --git a/arch/powerpc/include/asm/spu.h b/arch/powerpc/include/asm/spu.h
index 37b7ca3..a6e6e2b 100644
--- a/arch/powerpc/include/asm/spu.h
+++ b/arch/powerpc/include/asm/spu.h
@@ -27,6 +27,8 @@
 #include <linux/workqueue.h>
 #include <linux/device.h>
 #include <linux/mutex.h>
+#include <asm/reg.h>
+#include <asm/copro.h>
 
 #define LS_SIZE (256 * 1024)
 #define LS_ADDR_MASK (LS_SIZE - 1)
@@ -277,9 +279,6 @@ void spu_remove_dev_attr(struct device_attribute *attr);
 int spu_add_dev_attr_group(struct attribute_group *attrs);
 void spu_remove_dev_attr_group(struct attribute_group *attrs);
 
-int spu_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
-		unsigned long dsisr, unsigned *flt);
-
 /*
  * Notifier blocks:
  *
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index d0130ff..a7f4dd7 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -34,3 +34,4 @@ obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += hugepage-hash64.o
 obj-$(CONFIG_PPC_SUBPAGE_PROT)	+= subpage-prot.o
 obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-noncoherent.o
 obj-$(CONFIG_HIGHMEM)		+= highmem.o
+obj-$(CONFIG_SPU_BASE)		+= copro_fault.o
diff --git a/arch/powerpc/platforms/cell/spu_fault.c b/arch/powerpc/mm/copro_fault.c
similarity index 89%
rename from arch/powerpc/platforms/cell/spu_fault.c
rename to arch/powerpc/mm/copro_fault.c
index 641e727..ba7df14 100644
--- a/arch/powerpc/platforms/cell/spu_fault.c
+++ b/arch/powerpc/mm/copro_fault.c
@@ -1,5 +1,5 @@
 /*
- * SPU mm fault handler
+ * CoProcessor (SPU/AFU) mm fault handler
  *
  * (C) Copyright IBM Deutschland Entwicklung GmbH 2007
  *
@@ -23,16 +23,14 @@
 #include <linux/sched.h>
 #include <linux/mm.h>
 #include <linux/export.h>
-
-#include <asm/spu.h>
-#include <asm/spu_csa.h>
+#include <asm/reg.h>
 
 /*
  * This ought to be kept in sync with the powerpc specific do_page_fault
  * function. Currently, there are a few corner cases that we haven't had
  * to handle fortunately.
  */
-int spu_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
+int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
 		unsigned long dsisr, unsigned *flt)
 {
 	struct vm_area_struct *vma;
@@ -58,12 +56,12 @@ int spu_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
 			goto out_unlock;
 	}
 
-	is_write = dsisr & MFC_DSISR_ACCESS_PUT;
+	is_write = dsisr & DSISR_ISSTORE;
 	if (is_write) {
 		if (!(vma->vm_flags & VM_WRITE))
 			goto out_unlock;
 	} else {
-		if (dsisr & MFC_DSISR_ACCESS_DENIED)
+		if (dsisr & DSISR_PROTFAULT)
 			goto out_unlock;
 		if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
 			goto out_unlock;
@@ -91,4 +89,4 @@ out_unlock:
 	up_read(&mm->mmap_sem);
 	return ret;
 }
-EXPORT_SYMBOL_GPL(spu_handle_mm_fault);
+EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
diff --git a/arch/powerpc/platforms/cell/Makefile b/arch/powerpc/platforms/cell/Makefile
index fe053e7..2d16884 100644
--- a/arch/powerpc/platforms/cell/Makefile
+++ b/arch/powerpc/platforms/cell/Makefile
@@ -20,7 +20,7 @@ spu-manage-$(CONFIG_PPC_CELL_COMMON)	+= spu_manage.o
 
 obj-$(CONFIG_SPU_BASE)			+= spu_callbacks.o spu_base.o \
 					   spu_notify.o \
-					   spu_syscalls.o spu_fault.o \
+					   spu_syscalls.o \
 					   $(spu-priv1-y) \
 					   $(spu-manage-y) \
 					   spufs/
diff --git a/arch/powerpc/platforms/cell/spufs/fault.c b/arch/powerpc/platforms/cell/spufs/fault.c
index 8cb6260..e45894a 100644
--- a/arch/powerpc/platforms/cell/spufs/fault.c
+++ b/arch/powerpc/platforms/cell/spufs/fault.c
@@ -138,7 +138,7 @@ int spufs_handle_class1(struct spu_context *ctx)
 	if (ctx->state == SPU_STATE_RUNNABLE)
 		ctx->spu->stats.hash_flt++;
 
-	/* we must not hold the lock when entering spu_handle_mm_fault */
+	/* we must not hold the lock when entering copro_handle_mm_fault */
 	spu_release(ctx);
 
 	access = (_PAGE_PRESENT | _PAGE_USER);
@@ -149,7 +149,7 @@ int spufs_handle_class1(struct spu_context *ctx)
 
 	/* hashing failed, so try the actual fault handler */
 	if (ret)
-		ret = spu_handle_mm_fault(current->mm, ea, dsisr, &flt);
+		ret = copro_handle_mm_fault(current->mm, ea, dsisr, &flt);
 
 	/*
 	 * This is nasty: we need the state_mutex for all the bookkeeping even
-- 
1.9.1



* [PATCH 02/15] powerpc/cell: Move data segment faulting code out of cell platform
  2014-09-18  8:26 [PATCH 0/15] POWER8 Coherent Accelerator device driver Michael Neuling
  2014-09-18  8:26 ` [PATCH 01/15] powerpc/cell: Move spu_handle_mm_fault() out of cell platform Michael Neuling
@ 2014-09-18  8:26 ` Michael Neuling
  2014-09-18 10:27   ` Jeremy Kerr
                     ` (2 more replies)
  2014-09-18  8:26 ` [PATCH 03/15] powerpc/msi: Improve IRQ bitmap allocator Michael Neuling
                   ` (12 subsequent siblings)
  14 siblings, 3 replies; 43+ messages in thread
From: Michael Neuling @ 2014-09-18  8:26 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie, cbe-oss-dev

From: Ian Munsie <imunsie@au1.ibm.com>

__spu_trap_data_seg() currently contains code to determine the VSID and ESID
required for a particular EA and mm struct.

This code is generically useful for other co-processors.  This moves the code
out of the cell platform so it can be used by other powerpc code.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
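The bit manipulation in copro_data_segment() can be tried out in userspace
with the sketch below.  All sk_ names are hypothetical, and the constants are
stand-ins that I believe match the hash MMU headers of this era, so treat
them as assumptions.  The VSID calculation itself (get_vsid() and
get_kernel_vsid()) and the page size encoding are not reproduced.

```c
#include <stdint.h>

/* Stand-in constants for the sketch; assumptions, not kernel headers. */
#define SK_ESID_MASK		0xfffffffff0000000ULL
#define SK_SLB_ESID_V		0x0000000008000000ULL	/* entry valid */
#define SK_REGION_SHIFT		60
#define SK_SEGSIZE_256M		0
#define SK_SEGSIZE_1T		1
#define SK_VSID_SHIFT		12
#define SK_VSID_SHIFT_1T	24

/* ESID word: keep the segment bits of the address and mark it valid. */
uint64_t sk_esid(uint64_t ea)
{
	return (ea & SK_ESID_MASK) | SK_SLB_ESID_V;
}

/* The top nibble of the address selects user, kernel or vmalloc region. */
unsigned int sk_region_id(uint64_t ea)
{
	return (unsigned int)(ea >> SK_REGION_SHIFT);
}

/* The VSID is shifted by a different amount for 256M and 1T segments. */
int sk_vsid_shift(int ssize)
{
	return ssize == SK_SEGSIZE_256M ? SK_VSID_SHIFT : SK_VSID_SHIFT_1T;
}
```
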
 arch/powerpc/include/asm/mmu-hash64.h  |  2 ++
 arch/powerpc/mm/copro_fault.c          | 48 ++++++++++++++++++++++++++++++++++
 arch/powerpc/mm/slb.c                  |  3 ---
 arch/powerpc/platforms/cell/spu_base.c | 41 +++--------------------------
 4 files changed, 54 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index d765144..fd19a53 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -180,6 +180,8 @@ static inline unsigned int mmu_psize_to_shift(unsigned int mmu_psize)
  * we work in all cases including 4k page size.
  */
 #define VPN_SHIFT	12
+#define slb_vsid_shift(ssize)	\
+	((ssize) == MMU_SEGSIZE_256M ? SLB_VSID_SHIFT : SLB_VSID_SHIFT_1T)
 
 /*
  * HPTE Large Page (LP) details
diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
index ba7df14..4105a63 100644
--- a/arch/powerpc/mm/copro_fault.c
+++ b/arch/powerpc/mm/copro_fault.c
@@ -90,3 +90,51 @@ out_unlock:
 	return ret;
 }
 EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
+
+int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid)
+{
+	int psize, ssize;
+
+	*esid = (ea & ESID_MASK) | SLB_ESID_V;
+
+	switch (REGION_ID(ea)) {
+	case USER_REGION_ID:
+		pr_devel("copro_data_segment: 0x%llx -- USER_REGION_ID\n", ea);
+#ifdef CONFIG_PPC_MM_SLICES
+		psize = get_slice_psize(mm, ea);
+#else
+		psize = mm->context.user_psize;
+#endif
+		ssize = user_segment_size(ea);
+		*vsid = (get_vsid(mm->context.id, ea, ssize)
+			<< slb_vsid_shift(ssize)) | SLB_VSID_USER
+			| (ssize == MMU_SEGSIZE_1T ? SLB_VSID_B_1T : 0);
+		break;
+	case VMALLOC_REGION_ID:
+		pr_devel("copro_data_segment: 0x%llx -- VMALLOC_REGION_ID\n", ea);
+		if (ea < VMALLOC_END)
+			psize = mmu_vmalloc_psize;
+		else
+			psize = mmu_io_psize;
+		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
+			<< slb_vsid_shift(mmu_kernel_ssize)) | SLB_VSID_KERNEL
+			| (mmu_kernel_ssize == MMU_SEGSIZE_1T ? SLB_VSID_B_1T : 0);
+		break;
+	case KERNEL_REGION_ID:
+		pr_devel("copro_data_segment: 0x%llx -- KERNEL_REGION_ID\n", ea);
+		psize = mmu_linear_psize;
+		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
+			<< slb_vsid_shift(mmu_kernel_ssize)) | SLB_VSID_KERNEL
+			| (mmu_kernel_ssize == MMU_SEGSIZE_1T ? SLB_VSID_B_1T : 0);
+		break;
+	default:
+		/* Future: support kernel segments so that drivers can use the
+		 * CoProcessors */
+		pr_debug("invalid region access at %016llx\n", ea);
+		return 1;
+	}
+	*vsid |= mmu_psize_defs[psize].sllp;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(copro_data_segment);
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 0399a67..6e450ca 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -46,9 +46,6 @@ static inline unsigned long mk_esid_data(unsigned long ea, int ssize,
 	return (ea & slb_esid_mask(ssize)) | SLB_ESID_V | slot;
 }
 
-#define slb_vsid_shift(ssize)	\
-	((ssize) == MMU_SEGSIZE_256M? SLB_VSID_SHIFT: SLB_VSID_SHIFT_1T)
-
 static inline unsigned long mk_vsid_data(unsigned long ea, int ssize,
 					 unsigned long flags)
 {
diff --git a/arch/powerpc/platforms/cell/spu_base.c b/arch/powerpc/platforms/cell/spu_base.c
index 2930d1e..fe004b1 100644
--- a/arch/powerpc/platforms/cell/spu_base.c
+++ b/arch/powerpc/platforms/cell/spu_base.c
@@ -167,45 +167,12 @@ static inline void spu_load_slb(struct spu *spu, int slbe, struct spu_slb *slb)
 
 static int __spu_trap_data_seg(struct spu *spu, unsigned long ea)
 {
-	struct mm_struct *mm = spu->mm;
 	struct spu_slb slb;
-	int psize;
-
-	pr_debug("%s\n", __func__);
-
-	slb.esid = (ea & ESID_MASK) | SLB_ESID_V;
+	int ret;
 
-	switch(REGION_ID(ea)) {
-	case USER_REGION_ID:
-#ifdef CONFIG_PPC_MM_SLICES
-		psize = get_slice_psize(mm, ea);
-#else
-		psize = mm->context.user_psize;
-#endif
-		slb.vsid = (get_vsid(mm->context.id, ea, MMU_SEGSIZE_256M)
-				<< SLB_VSID_SHIFT) | SLB_VSID_USER;
-		break;
-	case VMALLOC_REGION_ID:
-		if (ea < VMALLOC_END)
-			psize = mmu_vmalloc_psize;
-		else
-			psize = mmu_io_psize;
-		slb.vsid = (get_kernel_vsid(ea, MMU_SEGSIZE_256M)
-				<< SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
-		break;
-	case KERNEL_REGION_ID:
-		psize = mmu_linear_psize;
-		slb.vsid = (get_kernel_vsid(ea, MMU_SEGSIZE_256M)
-				<< SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
-		break;
-	default:
-		/* Future: support kernel segments so that drivers
-		 * can use SPUs.
-		 */
-		pr_debug("invalid region access at %016lx\n", ea);
-		return 1;
-	}
-	slb.vsid |= mmu_psize_defs[psize].sllp;
+	ret = copro_data_segment(spu->mm, ea, &slb.esid, &slb.vsid);
+	if (ret)
+		return ret;
 
 	spu_load_slb(spu, spu->slb_replace, &slb);
 
-- 
1.9.1



* [PATCH 03/15] powerpc/msi: Improve IRQ bitmap allocator
  2014-09-18  8:26 [PATCH 0/15] POWER8 Coherent Accelerator device driver Michael Neuling
  2014-09-18  8:26 ` [PATCH 01/15] powerpc/cell: Move spu_handle_mm_fault() out of cell platform Michael Neuling
  2014-09-18  8:26 ` [PATCH 02/15] powerpc/cell: Move data segment faulting code " Michael Neuling
@ 2014-09-18  8:26 ` Michael Neuling
  2014-09-19 20:16   ` Scott Wood
  2014-09-22  8:29   ` Laurentiu Tudor
  2014-09-18  8:26 ` [PATCH 04/15] powerpc/mm: Export mmu_kernel_ssize and mmu_linear_psize Michael Neuling
                   ` (11 subsequent siblings)
  14 siblings, 2 replies; 43+ messages in thread
From: Michael Neuling @ 2014-09-18  8:26 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie, cbe-oss-dev

From: Ian Munsie <imunsie@au1.ibm.com>

Currently msi_bitmap_alloc_hwirqs() will round up any IRQ allocation requests
to the nearest power of 2.  e.g. ask for 5 IRQs and you'll get 8.  This wastes a
lot of IRQs which can be a scarce resource.

For cxl we can require multiple IRQs for every context that is attached to the
accelerator.  For AFU directed accelerators, there may be 1000s of contexts
attached, hence we can easily run out of IRQs, especially if we are needlessly
wasting them.

This changes msi_bitmap_alloc_hwirqs() to allocate only the required number
of IRQs, hence avoiding this wastage.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
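To make the effect concrete, here is a small userspace model of the new
behaviour.  It is a sketch, not the kernel code: the sk_ names and the 64
entry pool are made up, and a byte array stands in for the real bitmap.  As
in the patch, the search start is still aligned to the rounded-up power of
two, but only num entries are marked as used.

```c
#include <string.h>

#define SK_NBITS 64

struct sk_bitmap {
	unsigned char used[SK_NBITS];	/* one byte per IRQ, 1 = allocated */
};

/* Smallest order such that (1 << order) >= num. */
int sk_order(int num)
{
	int order = 0;

	while ((1 << order) < num)
		order++;
	return order;
}

/* Find num free entries at an aligned start; mark only num of them. */
int sk_alloc(struct sk_bitmap *bmp, int num)
{
	int align, start, i;

	if (num <= 0 || num > SK_NBITS)
		return -1;

	align = 1 << sk_order(num);
	for (start = 0; start + num <= SK_NBITS; start += align) {
		for (i = 0; i < num && !bmp->used[start + i]; i++)
			;
		if (i == num) {
			memset(&bmp->used[start], 1, num);
			return start;
		}
	}
	return -1;
}

/* Release exactly the num entries that were handed out. */
void sk_free(struct sk_bitmap *bmp, int offset, int num)
{
	memset(&bmp->used[offset], 0, num);
}
```

In this model a request for 5 entries followed by a request for 1 returns
offsets 0 and 5.  With the old rounding the second request could not have
used anything below offset 8.
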
 arch/powerpc/sysdev/msi_bitmap.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/sysdev/msi_bitmap.c b/arch/powerpc/sysdev/msi_bitmap.c
index 2ff6302..e001559 100644
--- a/arch/powerpc/sysdev/msi_bitmap.c
+++ b/arch/powerpc/sysdev/msi_bitmap.c
@@ -24,28 +24,36 @@ int msi_bitmap_alloc_hwirqs(struct msi_bitmap *bmp, int num)
 	 * This is fast, but stricter than we need. We might want to add
 	 * a fallback routine which does a linear search with no alignment.
 	 */
-	offset = bitmap_find_free_region(bmp->bitmap, bmp->irq_count, order);
+	offset = bitmap_find_next_zero_area(bmp->bitmap, bmp->irq_count, 0,
+					    num, (1 << order) - 1);
+	if (offset > bmp->irq_count)
+		goto err;
+	bitmap_set(bmp->bitmap, offset, num);
 	spin_unlock_irqrestore(&bmp->lock, flags);
 
 	pr_debug("msi_bitmap: allocated 0x%x (2^%d) at offset 0x%x\n",
 		 num, order, offset);
 
 	return offset;
+err:
+	spin_unlock_irqrestore(&bmp->lock, flags);
+	return -ENOMEM;
 }
+EXPORT_SYMBOL(msi_bitmap_alloc_hwirqs);
 
 void msi_bitmap_free_hwirqs(struct msi_bitmap *bmp, unsigned int offset,
 			    unsigned int num)
 {
 	unsigned long flags;
-	int order = get_count_order(num);
 
-	pr_debug("msi_bitmap: freeing 0x%x (2^%d) at offset 0x%x\n",
-		 num, order, offset);
+	pr_debug("msi_bitmap: freeing 0x%x at offset 0x%x\n",
+		 num, offset);
 
 	spin_lock_irqsave(&bmp->lock, flags);
-	bitmap_release_region(bmp->bitmap, offset, order);
+	bitmap_clear(bmp->bitmap, offset, num);
 	spin_unlock_irqrestore(&bmp->lock, flags);
 }
+EXPORT_SYMBOL(msi_bitmap_free_hwirqs);
 
 void msi_bitmap_reserve_hwirq(struct msi_bitmap *bmp, unsigned int hwirq)
 {
-- 
1.9.1



* [PATCH 04/15] powerpc/mm: Export mmu_kernel_ssize and mmu_linear_psize
  2014-09-18  8:26 [PATCH 0/15] POWER8 Coherent Accelerator device driver Michael Neuling
                   ` (2 preceding siblings ...)
  2014-09-18  8:26 ` [PATCH 03/15] powerpc/msi: Improve IRQ bitmap allocator Michael Neuling
@ 2014-09-18  8:26 ` Michael Neuling
  2014-09-18  8:26 ` [PATCH 05/15] powerpc/powernv: Split out set MSI IRQ chip code Michael Neuling
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Michael Neuling @ 2014-09-18  8:26 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie, cbe-oss-dev

From: Ian Munsie <imunsie@au1.ibm.com>

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/mm/hash_utils_64.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index daee7f4..0f73367 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -98,6 +98,7 @@ unsigned long htab_size_bytes;
 unsigned long htab_hash_mask;
 EXPORT_SYMBOL_GPL(htab_hash_mask);
 int mmu_linear_psize = MMU_PAGE_4K;
+EXPORT_SYMBOL_GPL(mmu_linear_psize);
 int mmu_virtual_psize = MMU_PAGE_4K;
 int mmu_vmalloc_psize = MMU_PAGE_4K;
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
@@ -105,6 +106,7 @@ int mmu_vmemmap_psize = MMU_PAGE_4K;
 #endif
 int mmu_io_psize = MMU_PAGE_4K;
 int mmu_kernel_ssize = MMU_SEGSIZE_256M;
+EXPORT_SYMBOL_GPL(mmu_kernel_ssize);
 int mmu_highuser_ssize = MMU_SEGSIZE_256M;
 u16 mmu_slb_size = 64;
 EXPORT_SYMBOL_GPL(mmu_slb_size);
-- 
1.9.1



* [PATCH 05/15] powerpc/powernv: Split out set MSI IRQ chip code
  2014-09-18  8:26 [PATCH 0/15] POWER8 Coherent Accelerator device driver Michael Neuling
                   ` (3 preceding siblings ...)
  2014-09-18  8:26 ` [PATCH 04/15] powerpc/mm: Export mmu_kernel_ssize and mmu_linear_psize Michael Neuling
@ 2014-09-18  8:26 ` Michael Neuling
  2014-09-19  6:54   ` Gavin Shan
  2014-09-18  8:26 ` [PATCH 06/15] cxl: Add new header for call backs and structs Michael Neuling
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Michael Neuling @ 2014-09-18  8:26 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie, cbe-oss-dev

From: Ian Munsie <imunsie@au1.ibm.com>

Some of the MSI IRQ code in pnv_pci_ioda_msi_setup() is generically useful so
split it out.

This will be used by some of the cxl PCIe code later.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
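The helper being split out follows a "copy the chip once, then override one
callback" pattern.  A stripped-down userspace sketch of that pattern is shown
below.  Every sk_ name is hypothetical and the callbacks simply return
numbers so the effect is visible.  It does not model the PHB type check or
the real irq_chip structure.

```c
/* Minimal stand-in for an irq chip: a name and one callback. */
struct sk_chip {
	const char *name;
	int (*eoi)(int irq);
};

/* Per-PHB state: a private chip copy plus an "already set up" flag. */
struct sk_phb {
	int chip_init;
	struct sk_chip chip;
};

int sk_default_eoi(int irq)
{
	return irq;
}

int sk_msi_eoi(int irq)
{
	return irq + 1000;
}

/*
 * On first use, clone the base chip and replace its eoi callback.  Later
 * calls reuse the same private copy, so the base chip is never modified.
 */
const struct sk_chip *sk_get_msi_chip(struct sk_phb *phb,
				      const struct sk_chip *base)
{
	if (!phb->chip_init) {
		phb->chip = *base;
		phb->chip.eoi = sk_msi_eoi;
		phb->chip_init = 1;
	}
	return &phb->chip;
}
```
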
 arch/powerpc/platforms/powernv/pci-ioda.c | 43 ++++++++++++++++++-------------
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index df241b1..194f90a 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1306,14 +1306,36 @@ static void pnv_ioda2_msi_eoi(struct irq_data *d)
 	icp_native_eoi(d);
 }
 
+
+static void set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq)
+{
+	struct irq_data *idata;
+	struct irq_chip *ichip;
+
+	/*
+	 * Change the IRQ chip for the MSI interrupts on PHB3.
+	 * The corresponding IRQ chip should be populated for
+	 * the first time.
+	 */
+	if (phb->type == PNV_PHB_IODA2) {
+		if (!phb->ioda.irq_chip_init) {
+			idata = irq_get_irq_data(virq);
+			ichip = irq_data_get_irq_chip(idata);
+			phb->ioda.irq_chip_init = 1;
+			phb->ioda.irq_chip = *ichip;
+			phb->ioda.irq_chip.irq_eoi = pnv_ioda2_msi_eoi;
+		}
+
+		irq_set_chip(virq, &phb->ioda.irq_chip);
+	}
+}
+
 static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
 				  unsigned int hwirq, unsigned int virq,
 				  unsigned int is_64, struct msi_msg *msg)
 {
 	struct pnv_ioda_pe *pe = pnv_ioda_get_pe(dev);
 	struct pci_dn *pdn = pci_get_pdn(dev);
-	struct irq_data *idata;
-	struct irq_chip *ichip;
 	unsigned int xive_num = hwirq - phb->msi_base;
 	__be32 data;
 	int rc;
@@ -1365,22 +1387,7 @@ static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
 	}
 	msg->data = be32_to_cpu(data);
 
-	/*
-	 * Change the IRQ chip for the MSI interrupts on PHB3.
-	 * The corresponding IRQ chip should be populated for
-	 * the first time.
-	 */
-	if (phb->type == PNV_PHB_IODA2) {
-		if (!phb->ioda.irq_chip_init) {
-			idata = irq_get_irq_data(virq);
-			ichip = irq_data_get_irq_chip(idata);
-			phb->ioda.irq_chip_init = 1;
-			phb->ioda.irq_chip = *ichip;
-			phb->ioda.irq_chip.irq_eoi = pnv_ioda2_msi_eoi;
-		}
-
-		irq_set_chip(virq, &phb->ioda.irq_chip);
-	}
+	set_msi_irq_chip(phb, virq);
 
 	pr_devel("%s: %s-bit MSI on hwirq %x (xive #%d),"
 		 " address=%x_%08x data=%x PE# %d\n",
-- 
1.9.1



* [PATCH 06/15] cxl: Add new header for call backs and structs
  2014-09-18  8:26 [PATCH 0/15] POWER8 Coherent Accelerator device driver Michael Neuling
                   ` (4 preceding siblings ...)
  2014-09-18  8:26 ` [PATCH 05/15] powerpc/powernv: Split out set MSI IRQ chip code Michael Neuling
@ 2014-09-18  8:26 ` Michael Neuling
  2014-09-18  8:26 ` [PATCH 07/15] powerpc/powerpc: Add new PCIe functions for allocating cxl interrupts Michael Neuling
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Michael Neuling @ 2014-09-18  8:26 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie, cbe-oss-dev

From: Ian Munsie <imunsie@au1.ibm.com>

This new header adds defines for callbacks and structs needed by the rest of the
kernel to hook into the cxl infrastructure.

Empty functions are provided when CONFIG_CXL_BASE is not enabled.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
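The "empty functions when the option is off" idea can be sketched in
userspace as below.  Note this is a variation, not what the header does: the
patch uses macros for the disabled case, while the sketch uses static inline
stubs, which keep argument type checking.  SK_CONFIG_CXL_BASE and all sk_
names are hypothetical.

```c
#include <stdbool.h>

/* Define SK_CONFIG_CXL_BASE at build time to get the real declarations. */
#ifdef SK_CONFIG_CXL_BASE

void sk_cxl_slbia(void *mm);
bool sk_cxl_ctx_in_use(void);

#else /* SK_CONFIG_CXL_BASE */

/* Stubs: callers compile unchanged and the calls optimise away. */
static inline void sk_cxl_slbia(void *mm)
{
	(void)mm;
}

static inline bool sk_cxl_ctx_in_use(void)
{
	return false;
}

#endif /* SK_CONFIG_CXL_BASE */

/* Example caller: needs no #ifdef of its own. */
int sk_flush_segments(void *mm)
{
	if (!sk_cxl_ctx_in_use())
		return 0;

	sk_cxl_slbia(mm);
	return 1;
}
```
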
 include/misc/cxl.h | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)
 create mode 100644 include/misc/cxl.h

diff --git a/include/misc/cxl.h b/include/misc/cxl.h
new file mode 100644
index 0000000..bde46a3
--- /dev/null
+++ b/include/misc/cxl.h
@@ -0,0 +1,34 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _MISC_ASM_CXL_H
+#define _MISC_ASM_CXL_H
+
+#define CXL_IRQ_RANGES 4
+
+struct cxl_irq_ranges {
+	irq_hw_number_t offset[CXL_IRQ_RANGES];
+	irq_hw_number_t range[CXL_IRQ_RANGES];
+};
+
+#ifdef CONFIG_CXL_BASE
+
+void cxl_slbia(struct mm_struct *mm);
+void cxl_ctx_get(void);
+void cxl_ctx_put(void);
+bool cxl_ctx_in_use(void);
+
+#else /* CONFIG_CXL_BASE */
+
+#define cxl_slbia(...) do { } while (0)
+#define cxl_ctx_in_use(...) false
+
+#endif /* CONFIG_CXL_BASE */
+
+#endif
-- 
1.9.1



* [PATCH 07/15] powerpc/powerpc: Add new PCIe functions for allocating cxl interrupts
  2014-09-18  8:26 [PATCH 0/15] POWER8 Coherent Accelerator device driver Michael Neuling
                   ` (5 preceding siblings ...)
  2014-09-18  8:26 ` [PATCH 06/15] cxl: Add new header for call backs and structs Michael Neuling
@ 2014-09-18  8:26 ` Michael Neuling
  2014-09-19  7:09   ` Gavin Shan
  2014-09-18  8:26 ` [PATCH 08/15] powerpc/mm: Add new hash_page_mm() Michael Neuling
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Michael Neuling @ 2014-09-18  8:26 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie, cbe-oss-dev

From: Ian Munsie <imunsie@au1.ibm.com>

This adds a number of functions for allocating IRQs under powernv PCIe for cxl.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
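The range allocation in pnv_cxl_alloc_hwirq_ranges() asks for everything in
one block and halves the request until the allocator can satisfy it, then
repeats for the remainder in the next range slot.  The userspace model below
is a sketch of that strategy only: the sk_ names, the 16 entry pool and the
first-fit allocator are made up.  Slot 0 is left unused, mirroring the loop
in the patch, which starts at index 1.

```c
#include <string.h>

#define SK_IRQ_RANGES	4
#define SK_NIRQS	16

struct sk_pool { unsigned char used[SK_NIRQS]; };
struct sk_ranges { int offset[SK_IRQ_RANGES]; int range[SK_IRQ_RANGES]; };

/* First-fit search for num contiguous free entries; -1 if none. */
int sk_pool_alloc(struct sk_pool *p, int num)
{
	int start, i;

	for (start = 0; start + num <= SK_NIRQS; start++) {
		for (i = 0; i < num && !p->used[start + i]; i++)
			;
		if (i == num) {
			memset(&p->used[start], 1, num);
			return start;
		}
	}
	return -1;
}

void sk_pool_free(struct sk_pool *p, int offset, int num)
{
	memset(&p->used[offset], 0, num);
}

/* Split num IRQs over at most SK_IRQ_RANGES - 1 contiguous blocks. */
int sk_alloc_ranges(struct sk_pool *p, struct sk_ranges *r, int num)
{
	int range, attempt, off = -1;

	memset(r, 0, sizeof(*r));
	for (range = 1; range < SK_IRQ_RANGES && num; range++) {
		/* Halve the request until a block of that size fits. */
		for (attempt = num; attempt; attempt /= 2) {
			off = sk_pool_alloc(p, attempt);
			if (off >= 0)
				break;
		}
		if (!attempt)
			goto fail;
		r->offset[range] = off;
		r->range[range] = attempt;
		num -= attempt;
	}
	if (num)
		goto fail;
	return 0;
fail:
	/* Undo the blocks that were already handed out. */
	for (range--; range >= 1; range--) {
		sk_pool_free(p, r->offset[range], r->range[range]);
		r->range[range] = 0;
	}
	return -1;
}
```

With entry 6 busy, a request for 10 is split into two blocks of 5.  When only
isolated single entries are free, a request for 8 runs out of range slots and
everything is released again.
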
 arch/powerpc/include/asm/pnv-pci.h        |  27 +++++
 arch/powerpc/platforms/powernv/pci-ioda.c | 186 ++++++++++++++++++++++++++++++
 2 files changed, 213 insertions(+)
 create mode 100644 arch/powerpc/include/asm/pnv-pci.h

diff --git a/arch/powerpc/include/asm/pnv-pci.h b/arch/powerpc/include/asm/pnv-pci.h
new file mode 100644
index 0000000..71717b5
--- /dev/null
+++ b/arch/powerpc/include/asm/pnv-pci.h
@@ -0,0 +1,27 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _ASM_PNV_PCI_H
+#define _ASM_PNV_PCI_H
+
+#include <linux/pci.h>
+#include <misc/cxl.h>
+
+int pnv_phb_to_cxl(struct pci_dev *dev);
+int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
+			   unsigned int virq);
+int pnv_cxl_alloc_hwirqs(struct pci_dev *dev, int num);
+void pnv_cxl_release_hwirqs(struct pci_dev *dev, int hwirq, int num);
+int pnv_cxl_alloc_hwirq_ranges(struct cxl_irq_ranges *irqs,
+			       struct pci_dev *dev, int num);
+void pnv_cxl_release_hwirq_ranges(struct cxl_irq_ranges *irqs,
+				  struct pci_dev *dev);
+int pnv_cxl_get_irq_count(struct pci_dev *dev);
+
+#endif
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 194f90a..80919f8 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -38,6 +38,8 @@
 #include <asm/debug.h>
 #include <asm/firmware.h>
 
+#include <misc/cxl.h>
+
 #include "powernv.h"
 #include "pci.h"
 
@@ -503,6 +505,163 @@ static struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev)
 		return NULL;
 	return &phb->ioda.pe_array[pdn->pe_number];
 }
+
+struct device_node *pnv_pci_to_phb_node(struct pci_dev *dev)
+{
+	struct device_node *np;
+	struct property *prop = NULL;
+
+	np = of_node_get(pci_device_to_OF_node(dev));
+
+	/* Scan up the tree looking for the PHB node */
+	while (np) {
+		if ((prop = of_find_property(np, "ibm,opal-phbid", NULL)))
+			break;
+		np = of_get_next_parent(np);
+	}
+
+	if (!prop) {
+		of_node_put(np);
+		return NULL;
+	}
+
+	return np;
+}
+EXPORT_SYMBOL(pnv_pci_to_phb_node);
+
+#ifdef CONFIG_CXL_BASE
+int pnv_phb_to_cxl(struct pci_dev *dev)
+{
+	struct device_node *np;
+	struct pnv_ioda_pe *pe;
+	const u64 *prop64;
+	u64 phb_id;
+	int rc;
+
+	dev_info(&dev->dev, "switch PHB to CXL\n");
+
+	if (!(np = pnv_pci_to_phb_node(dev)))
+		return -ENODEV;
+
+	prop64 = of_get_property(np, "ibm,opal-phbid", NULL);
+
+	phb_id = be64_to_cpup(prop64);
+	dev_info(&dev->dev, "PHB-ID  : 0x%016llx\n", phb_id);
+
+	if (!(pe = pnv_ioda_get_pe(dev))) {
+		rc = -ENODEV;
+		goto out;
+	}
+	dev_info(&dev->dev, "     pe : %i\n", pe->pe_number);
+
+	if ((rc = opal_pci_set_phb_cxl_mode(phb_id, 1, pe->pe_number)))
+		dev_err(&dev->dev, "opal_pci_set_phb_cxl_mode failed: %i\n", rc);
+
+out:
+	of_node_put(np);
+	return rc;
+}
+EXPORT_SYMBOL(pnv_phb_to_cxl);
+
+int pnv_cxl_alloc_hwirqs(struct pci_dev *dev, int num)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	int hwirq = msi_bitmap_alloc_hwirqs(&phb->msi_bmp, num);
+
+	if (hwirq < 0) {
+		dev_warn(&dev->dev, "Failed to find a free MSI\n");
+		return -ENOSPC;
+	}
+
+	return phb->msi_base + hwirq;
+}
+EXPORT_SYMBOL(pnv_cxl_alloc_hwirqs);
+
+void pnv_cxl_release_hwirqs(struct pci_dev *dev, int hwirq, int num)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+
+	msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq - phb->msi_base, num);
+}
+EXPORT_SYMBOL(pnv_cxl_release_hwirqs);
+
+
+int pnv_cxl_alloc_hwirq_ranges(struct cxl_irq_ranges *irqs,
+			       struct pci_dev *dev, int num)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	int range = 0;
+	int hwirq;
+	int try;
+
+	memset(irqs, 0, sizeof(struct cxl_irq_ranges));
+
+	for (range = 1; range < CXL_IRQ_RANGES && num; range++) {
+		try = num;
+		while (try) {
+			hwirq = msi_bitmap_alloc_hwirqs(&phb->msi_bmp, try);
+			if (hwirq >= 0)
+				break;
+			try /= 2;
+		}
+		if (!try)
+			goto fail;
+
+		irqs->offset[range] = phb->msi_base + hwirq;
+		irqs->range[range] = try;
+		pr_devel("cxl alloc irq range 0x%x: offset: 0x%lx  limit: %li\n",
+			 range, irqs->offset[range], irqs->range[range]);
+		num -= try;
+	}
+	if (num)
+		goto fail;
+
+	return 0;
+fail:
+	for (range--; range >= 0; range--) {
+		hwirq = irqs->offset[range] - phb->msi_base;
+		msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq,
+				       irqs->range[range]);
+		irqs->range[range] = 0;
+	}
+	return -ENOSPC;
+}
+EXPORT_SYMBOL(pnv_cxl_alloc_hwirq_ranges);
+
+void pnv_cxl_release_hwirq_ranges(struct cxl_irq_ranges *irqs,
+				  struct pci_dev *dev)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	int range = 0;
+	int hwirq;
+
+	for (range = 0; range < CXL_IRQ_RANGES; range++) {
+		hwirq = irqs->offset[range] - phb->msi_base;
+		if (irqs->range[range]) {
+			pr_devel("cxl release irq range 0x%x: offset: 0x%lx  limit: %ld\n",
+				 range, irqs->offset[range],
+				 irqs->range[range]);
+			msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq,
+					       irqs->range[range]);
+		}
+	}
+}
+EXPORT_SYMBOL(pnv_cxl_release_hwirq_ranges);
+
+int pnv_cxl_get_irq_count(struct pci_dev *dev)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+
+	return phb->msi_bmp.irq_count;
+}
+EXPORT_SYMBOL(pnv_cxl_get_irq_count);
+
+#endif /* CONFIG_CXL_BASE */
 #endif /* CONFIG_PCI_MSI */
 
 static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
@@ -1330,6 +1489,33 @@ static void set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq)
 	}
 }
 
+#ifdef CONFIG_CXL_BASE
+int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
+			   unsigned int virq)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	unsigned int xive_num = hwirq - phb->msi_base;
+	struct pnv_ioda_pe *pe;
+	int rc;
+
+	if (!(pe = pnv_ioda_get_pe(dev)))
+		return -ENODEV;
+
+	/* Assign XIVE to PE */
+	rc = opal_pci_set_xive_pe(phb->opal_id, pe->pe_number, xive_num);
+	if (rc) {
+		pr_warn("%s: OPAL error %d setting msi_base 0x%x hwirq 0x%x XIVE 0x%x PE\n",
+			pci_name(dev), rc, phb->msi_base, hwirq, xive_num);
+		return -EIO;
+	}
+	set_msi_irq_chip(phb, virq);
+
+	return 0;
+}
+EXPORT_SYMBOL(pnv_cxl_ioda_msi_setup);
+#endif
+
 static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
 				  unsigned int hwirq, unsigned int virq,
 				  unsigned int is_64, struct msi_msg *msg)
-- 
1.9.1
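
The range allocation in pnv_cxl_alloc_hwirq_ranges() above asks for all the remaining interrupts in one contiguous block, halves the request each time the bitmap cannot satisfy it, and unwinds everything if the request does not fit in the available ranges. The following is only a userspace sketch of that strategy, not the kernel code: the first-fit bitmap, NR_IRQS, IRQ_RANGES, struct irq_ranges, bitmap_alloc(), bitmap_free() and alloc_ranges() are hypothetical stand-ins, and the real msi_bitmap allocator may place blocks differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define NR_IRQS    16	/* hypothetical size of the MSI bitmap */
#define IRQ_RANGES  4	/* stands in for CXL_IRQ_RANGES; value assumed */

struct irq_ranges {
	int offset[IRQ_RANGES];
	int range[IRQ_RANGES];
};

static bool used[NR_IRQS];

/* First-fit search for num contiguous free slots; returns the start or -1. */
static int bitmap_alloc(int num)
{
	for (int start = 0; start + num <= NR_IRQS; start++) {
		int i;

		for (i = 0; i < num && !used[start + i]; i++)
			;
		if (i == num) {
			for (i = 0; i < num; i++)
				used[start + i] = true;
			return start;
		}
	}
	return -1;
}

static void bitmap_free(int start, int num)
{
	for (int i = 0; i < num; i++)
		used[start + i] = false;
}

static int alloc_ranges(struct irq_ranges *irqs, int num)
{
	int range, try, start = -1;

	memset(irqs, 0, sizeof(*irqs));

	/* Slot 0 is left unused, as in the patch. */
	for (range = 1; range < IRQ_RANGES && num; range++) {
		/* Ask for everything that is left, halving on failure. */
		for (try = num; try; try /= 2) {
			start = bitmap_alloc(try);
			if (start >= 0)
				break;
		}
		if (!try)
			goto fail;

		irqs->offset[range] = start;
		irqs->range[range] = try;
		num -= try;
	}
	if (num)
		goto fail;	/* ran out of ranges before satisfying the request */

	return 0;
fail:
	/* Give back every range that was handed out before the failure. */
	for (range--; range >= 1; range--) {
		bitmap_free(irqs->offset[range], irqs->range[range]);
		irqs->range[range] = 0;
	}
	return -ENOSPC;
}
```

With a fragmented bitmap, a request for 6 interrupts may come back as two ranges of 3, while a request that cannot be covered by the three usable ranges fails and leaves the bitmap as it found it.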



* [PATCH 08/15] powerpc/mm: Add new hash_page_mm()
  2014-09-18  8:26 [PATCH 0/15] POWER8 Coherent Accelerator device driver Michael Neuling
                   ` (6 preceding siblings ...)
  2014-09-18  8:26 ` [PATCH 07/15] powerpc/powerpc: Add new PCIe functions for allocating cxl interrupts Michael Neuling
@ 2014-09-18  8:26 ` Michael Neuling
  2014-09-29  8:50   ` Aneesh Kumar K.V
  2014-09-18  8:26 ` [PATCH 09/15] powerpc/opal: Add PHB to cxl mode call Michael Neuling
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Michael Neuling @ 2014-09-18  8:26 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie, cbe-oss-dev

From: Ian Munsie <imunsie@au1.ibm.com>

This adds a new function hash_page_mm() based on the existing hash_page().
This version allows any struct mm to be passed in, rather than assuming
current.  This is useful for servicing co-processor faults which are not in the
context of the current running process.

We need to be careful here as the current hash_page() assumes current in a few
places.
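
The shape of the change can be sketched in plain C as follows. This is a simplified userspace model, not the kernel code: struct mm_model, current_mm, fault_mm() and fault_current() are hypothetical names standing in for struct mm_struct, current->mm, hash_page_mm() and hash_page(), and the paca_checks counter merely marks where check_paca_psize() would run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mm_struct. */
struct mm_model {
	int paca_checks;	/* counts how often per-CPU state was checked */
};

/* Hypothetical stand-in for current->mm. */
static struct mm_model *current_mm;

/* The general version: works on whichever address space is passed in. */
static int fault_mm(struct mm_model *mm, unsigned long ea)
{
	(void)ea;

	if (!mm)
		return 1;	/* user address with no mm: cannot be hashed here */

	/*
	 * Per-CPU state only describes the running task, so it is only
	 * checked when the fault is for that task's address space.
	 */
	if (mm == current_mm)
		mm->paca_checks++;

	return 0;
}

/* The old entry point becomes a thin wrapper that passes current's mm. */
static int fault_current(unsigned long ea)
{
	return fault_mm(current_mm, ea);
}
```

A co-processor fault handler would call the general version with the mm it saved when the context was set up, while existing callers keep using the wrapper unchanged.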

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/include/asm/mmu-hash64.h |  1 +
 arch/powerpc/mm/hash_utils_64.c       | 20 +++++++++++++-------
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index fd19a53..a3b85e9 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -319,6 +319,7 @@ extern int __hash_page_64K(unsigned long ea, unsigned long access,
 			   unsigned int local, int ssize);
 struct mm_struct;
 unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap);
+extern int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, unsigned long trap);
 extern int hash_page(unsigned long ea, unsigned long access, unsigned long trap);
 int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
 		     pte_t *ptep, unsigned long trap, int local, int ssize,
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 0f73367..66071af 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -991,26 +991,24 @@ static void check_paca_psize(unsigned long ea, struct mm_struct *mm,
  * -1 - critical hash insertion error
  * -2 - access not permitted by subpage protection mechanism
  */
-int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
+int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, unsigned long trap)
 {
 	enum ctx_state prev_state = exception_enter();
 	pgd_t *pgdir;
 	unsigned long vsid;
-	struct mm_struct *mm;
 	pte_t *ptep;
 	unsigned hugeshift;
 	const struct cpumask *tmp;
 	int rc, user_region = 0, local = 0;
 	int psize, ssize;
 
-	DBG_LOW("hash_page(ea=%016lx, access=%lx, trap=%lx\n",
-		ea, access, trap);
+	DBG_LOW("%s(ea=%016lx, access=%lx, trap=%lx\n",
+		__func__, ea, access, trap);
 
 	/* Get region & vsid */
  	switch (REGION_ID(ea)) {
 	case USER_REGION_ID:
 		user_region = 1;
-		mm = current->mm;
 		if (! mm) {
 			DBG_LOW(" user region with no mm !\n");
 			rc = 1;
@@ -1106,7 +1104,8 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
 			WARN_ON(1);
 		}
 #endif
-		check_paca_psize(ea, mm, psize, user_region);
+		if (current->mm == mm)
+			check_paca_psize(ea, mm, psize, user_region);
 
 		goto bail;
 	}
@@ -1149,7 +1148,8 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
 		}
 	}
 
-	check_paca_psize(ea, mm, psize, user_region);
+	if (current->mm == mm)
+		check_paca_psize(ea, mm, psize, user_region);
 #endif /* CONFIG_PPC_64K_PAGES */
 
 #ifdef CONFIG_PPC_HAS_HASH_64K
@@ -1184,6 +1184,12 @@ bail:
 	exception_exit(prev_state);
 	return rc;
 }
+EXPORT_SYMBOL_GPL(hash_page_mm);
+
+int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
+{
+	return hash_page_mm(current->mm, ea, access, trap);
+}
 EXPORT_SYMBOL_GPL(hash_page);
 
 void hash_preload(struct mm_struct *mm, unsigned long ea,
-- 
1.9.1



* [PATCH 09/15] powerpc/opal: Add PHB to cxl mode call
  2014-09-18  8:26 [PATCH 0/15] POWER8 Coherent Accelerator device driver Michael Neuling
                   ` (7 preceding siblings ...)
  2014-09-18  8:26 ` [PATCH 08/15] powerpc/mm: Add new hash_page_mm() Michael Neuling
@ 2014-09-18  8:26 ` Michael Neuling
  2014-09-26  4:35   ` Anton Blanchard
  2014-09-18  8:26 ` [PATCH 10/15] powerpc/mm: Add hooks for cxl Michael Neuling
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Michael Neuling @ 2014-09-18  8:26 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie, cbe-oss-dev

From: Ian Munsie <imunsie@au1.ibm.com>

This adds the OPAL call to change a PHB into cxl mode.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/include/asm/opal.h                | 2 ++
 arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
 2 files changed, 3 insertions(+)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 86055e5..84c37c4dbc 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -146,6 +146,7 @@ struct opal_sg_list {
 #define OPAL_GET_PARAM				89
 #define OPAL_SET_PARAM				90
 #define OPAL_DUMP_RESEND			91
+#define OPAL_PCI_SET_PHB_CXL_MODE		93
 #define OPAL_DUMP_INFO2				94
 #define OPAL_PCI_EEH_FREEZE_SET			97
 #define OPAL_HANDLE_HMI				98
@@ -924,6 +925,7 @@ int64_t opal_sensor_read(uint32_t sensor_hndl, int token, __be32 *sensor_data);
 int64_t opal_handle_hmi(void);
 int64_t opal_register_dump_region(uint32_t id, uint64_t start, uint64_t end);
 int64_t opal_unregister_dump_region(uint32_t id);
+int64_t opal_pci_set_phb_cxl_mode(uint64_t phb_id, uint64_t mode, uint64_t pe_number);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 2e6ce1b..0fb56dc 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -247,3 +247,4 @@ OPAL_CALL(opal_set_param,			OPAL_SET_PARAM);
 OPAL_CALL(opal_handle_hmi,			OPAL_HANDLE_HMI);
 OPAL_CALL(opal_register_dump_region,		OPAL_REGISTER_DUMP_REGION);
 OPAL_CALL(opal_unregister_dump_region,		OPAL_UNREGISTER_DUMP_REGION);
+OPAL_CALL(opal_pci_set_phb_cxl_mode,		OPAL_PCI_SET_PHB_CXL_MODE);
-- 
1.9.1



* [PATCH 10/15] powerpc/mm: Add hooks for cxl
  2014-09-18  8:26 [PATCH 0/15] POWER8 Coherent Accelerator device driver Michael Neuling
                   ` (8 preceding siblings ...)
  2014-09-18  8:26 ` [PATCH 09/15] powerpc/opal: Add PHB to cxl mode call Michael Neuling
@ 2014-09-18  8:26 ` Michael Neuling
  2014-09-26  4:33   ` Anton Blanchard
  2014-09-29  9:10   ` Aneesh Kumar K.V
  2014-09-18  8:26 ` [PATCH 11/15] cxl: Add base builtin support Michael Neuling
                   ` (4 subsequent siblings)
  14 siblings, 2 replies; 43+ messages in thread
From: Michael Neuling @ 2014-09-18  8:26 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie, cbe-oss-dev

From: Ian Munsie <imunsie@au1.ibm.com>

This adds a hook into tlbie() so that we use global invalidations when there are
cxl contexts active.

Normally cxl snoops broadcast tlbie.  cxl can have TLB entries invalidated via
MMIO, but we aren't doing that yet.  So for now we are just disabling local
tlbies when cxl contexts are active.  In future we can make tlbie() local mode
smarter so that it invalidates cxl contexts explicitly when it needs to.

This also adds hooks for when SLBs are invalidated to ensure any
corresponding SLBs in cxl are also invalidated at the same time.
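
The resulting local-versus-global decision can be written as a small pure function. This is only an illustrative sketch: use_local_tlbie() and its boolean parameters are hypothetical, standing in for the caller's "local" hint, mmu_has_feature(MMU_FTR_TLBIEL), cxl_ctx_in_use() and the per-page-size tlbiel flag respectively.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true when a CPU-local invalidation is acceptable.
 * An active cxl context forces the broadcast form, because the
 * accelerator only sees invalidations that go out on the fabric.
 */
static bool use_local_tlbie(bool local, bool cpu_has_tlbiel,
			    bool cxl_ctx_active, bool psize_supports_tlbiel)
{
	bool use_local = local && cpu_has_tlbiel && !cxl_ctx_active;

	/* Even then, the page size must support the local form. */
	if (use_local)
		use_local = psize_supports_tlbiel;

	return use_local;
}
```

The cost of this approach is that every invalidation becomes global while any cxl context exists, which is the trade-off the paragraph above accepts for now.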

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/mm/hash_native_64.c | 6 +++++-
 arch/powerpc/mm/hash_utils_64.c  | 3 +++
 arch/powerpc/mm/slice.c          | 3 +++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index afc0a82..ae4962a 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -29,6 +29,8 @@
 #include <asm/kexec.h>
 #include <asm/ppc-opcode.h>
 
+#include <misc/cxl.h>
+
 #ifdef DEBUG_LOW
 #define DBG_LOW(fmt...) udbg_printf(fmt)
 #else
@@ -149,9 +151,11 @@ static inline void __tlbiel(unsigned long vpn, int psize, int apsize, int ssize)
 static inline void tlbie(unsigned long vpn, int psize, int apsize,
 			 int ssize, int local)
 {
-	unsigned int use_local = local && mmu_has_feature(MMU_FTR_TLBIEL);
+	unsigned int use_local;
 	int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
 
+	use_local = local && mmu_has_feature(MMU_FTR_TLBIEL) && !cxl_ctx_in_use();
+
 	if (use_local)
 		use_local = mmu_psize_defs[psize].tlbiel;
 	if (lock_tlbie && !use_local)
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 66071af..be40ff7 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -34,6 +34,7 @@
 #include <linux/signal.h>
 #include <linux/memblock.h>
 #include <linux/context_tracking.h>
+#include <misc/cxl.h>
 
 #include <asm/processor.h>
 #include <asm/pgtable.h>
@@ -906,6 +907,7 @@ void demote_segment_4k(struct mm_struct *mm, unsigned long addr)
 #ifdef CONFIG_SPU_BASE
 	spu_flush_all_slbs(mm);
 #endif
+	cxl_slbia(mm);
 	if (get_paca_psize(addr) != MMU_PAGE_4K) {
 		get_paca()->context = mm->context;
 		slb_flush_and_rebolt();
@@ -1145,6 +1147,7 @@ int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, u
 #ifdef CONFIG_SPU_BASE
 			spu_flush_all_slbs(mm);
 #endif
+			cxl_slbia(mm);
 		}
 	}
 
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index b0c75cc..4d3a34b 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -30,6 +30,7 @@
 #include <linux/err.h>
 #include <linux/spinlock.h>
 #include <linux/export.h>
+#include <misc/cxl.h>
 #include <asm/mman.h>
 #include <asm/mmu.h>
 #include <asm/spu.h>
@@ -235,6 +236,7 @@ static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psiz
 #ifdef CONFIG_SPU_BASE
 	spu_flush_all_slbs(mm);
 #endif
+	cxl_slbia(mm);
 }
 
 /*
@@ -674,6 +676,7 @@ void slice_set_psize(struct mm_struct *mm, unsigned long address,
 #ifdef CONFIG_SPU_BASE
 	spu_flush_all_slbs(mm);
 #endif
+	cxl_slbia(mm);
 }
 
 void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
-- 
1.9.1



* [PATCH 11/15] cxl: Add base builtin support
  2014-09-18  8:26 [PATCH 0/15] POWER8 Coherent Accelerator device driver Michael Neuling
                   ` (9 preceding siblings ...)
  2014-09-18  8:26 ` [PATCH 10/15] powerpc/mm: Add hooks for cxl Michael Neuling
@ 2014-09-18  8:26 ` Michael Neuling
  2014-09-18  8:26 ` [PATCH 12/15] cxl: Driver code for powernv PCIe based cards for userspace access Michael Neuling
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Michael Neuling @ 2014-09-18  8:26 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie, cbe-oss-dev

From: Ian Munsie <imunsie@au1.ibm.com>

This adds the base cxl support that needs to be built into the kernel to use
cxl as a module.  This is needed so that the cxl callbacks from the core
powerpc mm code always exist irrespective of whether the cxl module is loaded
or not.  This is similar to how cell works with CONFIG_SPU_BASE.

This adds a cxl_slbia() call (similar to spu_flush_all_slbs()) which checks for
the cxl module being loaded.  If the module is not loaded we return, otherwise
we call into the cxl SLB invalidation code.

This also adds the cxl_ctx_in_use() function for use in the mm code to see if
any cxl contexts are currently in use.  This is used by tlbie() to
determine if it can do local TLB invalidations or not.  This also adds get/put
calls for the cxl driver module to refcount the active cxl contexts.
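
The two mechanisms described here, a use counter and an optional callback table, can be modelled in userspace roughly as below. This is a sketch under stated assumptions, not the driver code: it uses C11 atomics where the kernel uses atomic_t, RCU and try_module_get(), and struct calls_model, registered, record_flush() and last_flushed are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cxl_calls. */
struct calls_model {
	void (*slbia)(int mm_id);
};

static atomic_int use_count;				/* active contexts */
static _Atomic(struct calls_model *) registered;	/* NULL until the module loads */

static void ctx_get(void)    { atomic_fetch_add(&use_count, 1); }
static void ctx_put(void)    { atomic_fetch_sub(&use_count, 1); }
static bool ctx_in_use(void) { return atomic_load(&use_count) != 0; }

/* Only one provider may be registered at a time. */
static int register_calls(struct calls_model *calls)
{
	struct calls_model *expected = NULL;

	if (!atomic_compare_exchange_strong(&registered, &expected, calls))
		return -EBUSY;
	return 0;
}

static void unregister_calls(void)
{
	atomic_store(&registered, (struct calls_model *)NULL);
}

/* Always safe to call from core code: a no-op when nothing is registered. */
static void slbia(int mm_id)
{
	struct calls_model *calls = atomic_load(&registered);

	if (!calls)
		return;
	calls->slbia(mm_id);
}

/* Hypothetical provider callback that records what it was asked to flush. */
static int last_flushed = -1;

static void record_flush(int mm_id)
{
	last_flushed = mm_id;
}
```

Unlike the patch, this model does nothing to keep the provider alive while a callback runs; in the kernel that is the job of the RCU read section and the module reference.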

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/mm/Makefile  |   1 +
 drivers/misc/Kconfig      |   1 +
 drivers/misc/Makefile     |   1 +
 drivers/misc/cxl/Kconfig  |   7 ++++
 drivers/misc/cxl/Makefile |   1 +
 drivers/misc/cxl/base.c   | 102 ++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 113 insertions(+)
 create mode 100644 drivers/misc/cxl/Kconfig
 create mode 100644 drivers/misc/cxl/Makefile
 create mode 100644 drivers/misc/cxl/base.c

diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index a7f4dd7..2888133 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -35,3 +35,4 @@ obj-$(CONFIG_PPC_SUBPAGE_PROT)	+= subpage-prot.o
 obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-noncoherent.o
 obj-$(CONFIG_HIGHMEM)		+= highmem.o
 obj-$(CONFIG_SPU_BASE)		+= copro_fault.o
+obj-$(CONFIG_CXL_BASE)		+= copro_fault.o
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index b841180..bbeb451 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -527,4 +527,5 @@ source "drivers/misc/vmw_vmci/Kconfig"
 source "drivers/misc/mic/Kconfig"
 source "drivers/misc/genwqe/Kconfig"
 source "drivers/misc/echo/Kconfig"
+source "drivers/misc/cxl/Kconfig"
 endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 5497d02..7d5c4cd 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -55,3 +55,4 @@ obj-y				+= mic/
 obj-$(CONFIG_GENWQE)		+= genwqe/
 obj-$(CONFIG_ECHO)		+= echo/
 obj-$(CONFIG_VEXPRESS_SYSCFG)	+= vexpress-syscfg.o
+obj-$(CONFIG_CXL_BASE)		+= cxl/
diff --git a/drivers/misc/cxl/Kconfig b/drivers/misc/cxl/Kconfig
new file mode 100644
index 0000000..48533e1
--- /dev/null
+++ b/drivers/misc/cxl/Kconfig
@@ -0,0 +1,7 @@
+#
+# IBM Coherent Accelerator (CXL) compatible devices
+#
+
+config CXL_BASE
+	bool
+	default n
diff --git a/drivers/misc/cxl/Makefile b/drivers/misc/cxl/Makefile
new file mode 100644
index 0000000..e30ad0a
--- /dev/null
+++ b/drivers/misc/cxl/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_CXL_BASE)		+= base.o
diff --git a/drivers/misc/cxl/base.c b/drivers/misc/cxl/base.c
new file mode 100644
index 0000000..f4cbcfb
--- /dev/null
+++ b/drivers/misc/cxl/base.c
@@ -0,0 +1,102 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/rcupdate.h>
+#include <asm/errno.h>
+#include <misc/cxl.h>
+#include "cxl.h"
+
+/* protected by rcu */
+static struct cxl_calls *cxl_calls;
+
+static atomic_t use_count = ATOMIC_INIT(0);
+
+#ifdef CONFIG_CXL_MODULE
+
+static inline struct cxl_calls *cxl_calls_get(void)
+{
+	struct cxl_calls *calls = NULL;
+
+	rcu_read_lock();
+	calls = rcu_dereference(cxl_calls);
+	if (calls && !try_module_get(calls->owner))
+		calls = NULL;
+	rcu_read_unlock();
+
+	return calls;
+}
+
+static inline void cxl_calls_put(struct cxl_calls *calls)
+{
+	BUG_ON(calls != cxl_calls);
+
+	/* we don't need to rcu this, as we hold a reference to the module */
+	module_put(cxl_calls->owner);
+}
+
+#else /* !defined CONFIG_CXL_MODULE */
+
+static inline struct cxl_calls *cxl_calls_get(void)
+{
+	return cxl_calls;
+}
+
+static inline void cxl_calls_put(struct cxl_calls *calls) { }
+
+#endif /* CONFIG_CXL_MODULE */
+
+void cxl_slbia(struct mm_struct *mm)
+{
+	struct cxl_calls *calls;
+
+	calls = cxl_calls_get();
+	if (!calls)
+		return;
+
+	calls->cxl_slbia(mm);
+	cxl_calls_put(calls);
+}
+EXPORT_SYMBOL(cxl_slbia);
+
+void cxl_ctx_get(void)
+{
+	atomic_inc(&use_count);
+}
+EXPORT_SYMBOL(cxl_ctx_get);
+
+void cxl_ctx_put(void)
+{
+	atomic_dec(&use_count);
+}
+EXPORT_SYMBOL(cxl_ctx_put);
+
+bool cxl_ctx_in_use(void)
+{
+	return (atomic_read(&use_count) != 0);
+}
+EXPORT_SYMBOL(cxl_ctx_in_use);
+
+int register_cxl_calls(struct cxl_calls *calls)
+{
+	if (cxl_calls)
+		return -EBUSY;
+
+	rcu_assign_pointer(cxl_calls, calls);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(register_cxl_calls);
+
+void unregister_cxl_calls(struct cxl_calls *calls)
+{
+	BUG_ON(cxl_calls->owner != calls->owner);
+	RCU_INIT_POINTER(cxl_calls, NULL);
+	synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(unregister_cxl_calls);
-- 
1.9.1



* [PATCH 12/15] cxl: Driver code for powernv PCIe based cards for userspace access
  2014-09-18  8:26 [PATCH 0/15] POWER8 Coherent Accelerator device driver Michael Neuling
                   ` (10 preceding siblings ...)
  2014-09-18  8:26 ` [PATCH 11/15] cxl: Add base builtin support Michael Neuling
@ 2014-09-18  8:26 ` Michael Neuling
  2014-09-18  8:26 ` [PATCH 13/15] cxl: Userspace header file Michael Neuling
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Michael Neuling @ 2014-09-18  8:26 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie, cbe-oss-dev

From: Ian Munsie <imunsie@au1.ibm.com>

This is the core of the cxl driver.

It adds support for using cxl cards in the powernv environment only (no guest
support).  It allows access to cxl accelerators by userspace using the
/dev/cxl/afu0.0 char device.

The kernel driver has no knowledge of the acceleration function.  It only
provides services to userspace via the /dev/cxl/afu0.0 device.

This will compile to two modules.  cxl.ko provides the core cxl functionality
and userspace API.  cxl-pci.ko provides the PCI driver functionality for the
powernv environment.

Documentation of the cxl hardware architecture and userspace API is provided in
subsequent patches.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 drivers/misc/cxl/context.c | 169 ++++++++
 drivers/misc/cxl/cxl-pci.c | 977 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/misc/cxl/cxl.h     | 605 ++++++++++++++++++++++++++++
 drivers/misc/cxl/debugfs.c | 116 ++++++
 drivers/misc/cxl/fault.c   | 298 ++++++++++++++
 drivers/misc/cxl/file.c    | 503 +++++++++++++++++++++++
 drivers/misc/cxl/irq.c     | 405 +++++++++++++++++++
 drivers/misc/cxl/main.c    | 238 +++++++++++
 drivers/misc/cxl/native.c  | 649 ++++++++++++++++++++++++++++++
 drivers/misc/cxl/sysfs.c   | 348 ++++++++++++++++
 10 files changed, 4308 insertions(+)
 create mode 100644 drivers/misc/cxl/context.c
 create mode 100644 drivers/misc/cxl/cxl-pci.c
 create mode 100644 drivers/misc/cxl/cxl.h
 create mode 100644 drivers/misc/cxl/debugfs.c
 create mode 100644 drivers/misc/cxl/fault.c
 create mode 100644 drivers/misc/cxl/file.c
 create mode 100644 drivers/misc/cxl/irq.c
 create mode 100644 drivers/misc/cxl/main.c
 create mode 100644 drivers/misc/cxl/native.c
 create mode 100644 drivers/misc/cxl/sysfs.c

diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
new file mode 100644
index 0000000..012fae1
--- /dev/null
+++ b/drivers/misc/cxl/context.c
@@ -0,0 +1,169 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/bitmap.h>
+#include <linux/sched.h>
+#include <linux/pid.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/debugfs.h>
+#include <linux/slab.h>
+#include <linux/idr.h>
+#include <asm/cputable.h>
+#include <asm/current.h>
+#include <asm/copro.h>
+
+#include "cxl.h"
+
+/*
+ * Allocates space for a CXL context.
+ */
+struct cxl_context_t *cxl_context_alloc(void)
+{
+	return kzalloc(sizeof(struct cxl_context_t), GFP_KERNEL);
+}
+
+/*
+ * Initialises a CXL context.
+ */
+int cxl_context_init(struct cxl_context_t *ctx, struct cxl_afu_t *afu, bool master)
+{
+	int i;
+
+	spin_lock_init(&ctx->sst_lock);
+	ctx->sstp = NULL;
+	ctx->afu = afu;
+	ctx->master = master;
+	ctx->pid = get_task_pid(current, PIDTYPE_PID);
+
+	INIT_WORK(&ctx->fault_work, cxl_handle_fault);
+
+	init_waitqueue_head(&ctx->wq);
+	spin_lock_init(&ctx->lock);
+
+	ctx->irq_bitmap = NULL;
+	ctx->pending_irq = false;
+	ctx->pending_fault = false;
+	ctx->pending_afu_err = false;
+
+	ctx->status = OPENED;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&afu->contexts_lock);
+	i = idr_alloc(&ctx->afu->contexts_idr, ctx, 0,
+		      ctx->afu->num_procs, GFP_NOWAIT);
+	spin_unlock(&afu->contexts_lock);
+	idr_preload_end();
+	if (i < 0)
+		return i;
+
+	ctx->ph = i;
+	ctx->elem = &ctx->afu->spa[i];
+	ctx->pe_inserted = false;
+	return 0;
+}
+
+/*
+ * Map a per-context mmio space into the given vma.
+ */
+int cxl_context_iomap(struct cxl_context_t *ctx, struct vm_area_struct *vma)
+{
+	u64 len = vma->vm_end - vma->vm_start;
+	len = min(len, ctx->psn_size);
+
+	if (ctx->afu->current_model == CXL_MODEL_DEDICATED) {
+		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+		return vm_iomap_memory(vma, ctx->afu->psn_phys, ctx->afu->adapter->ps_size);
+	}
+
+	/* make sure there is a valid per process space for this AFU */
+	if ((ctx->master && !ctx->afu->psa) || (!ctx->afu->pp_psa)) {
+		pr_devel("AFU doesn't support mmio space\n");
+		return -EINVAL;
+	}
+
+	/* Can't mmap until the AFU is enabled */
+	if (!ctx->afu->enabled)
+		return -EBUSY;
+
+	pr_devel("%s: mmio physical: %llx pe: %i master:%i\n", __func__,
+		 ctx->psn_phys, ctx->ph , ctx->master);
+
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+	return vm_iomap_memory(vma, ctx->psn_phys, len);
+}
+
+/*
+ * Detach a context from the hardware. This disables interrupts and doesn't
+ * return until all outstanding interrupts for this context have completed. The
+ * hardware should no longer access *ctx after this has returned.
+ */
+static void __detach_context(struct cxl_context_t *ctx)
+{
+	unsigned long flags;
+	enum cxl_context_status status;
+
+	spin_lock_irqsave(&ctx->sst_lock, flags);
+	status = ctx->status;
+	ctx->status = CLOSED;
+	spin_unlock_irqrestore(&ctx->sst_lock, flags);
+	if (status != STARTED)
+		return;
+
+	WARN_ON(cxl_ops->detach_process(ctx));
+	afu_release_irqs(ctx);
+	flush_work(&ctx->fault_work); /* Only needed for dedicated process */
+	wake_up_all(&ctx->wq);
+}
+
+/*
+ * Detach the given context from the AFU. This doesn't actually
+ * free the context but it should stop the context running in hardware
+ * (ie. prevent this context from generating any further interrupts
+ * so that it can be freed).
+ */
+void cxl_context_detach(struct cxl_context_t *ctx)
+{
+	__detach_context(ctx);
+}
+
+/*
+ * Detach all contexts on the given AFU.
+ */
+void cxl_context_detach_all(struct cxl_afu_t *afu)
+{
+	struct cxl_context_t *ctx;
+	int tmp;
+
+	rcu_read_lock();
+	idr_for_each_entry(&afu->contexts_idr, ctx, tmp)
+		__detach_context(ctx);
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL(cxl_context_detach_all);
+
+void cxl_context_free(struct cxl_context_t *ctx)
+{
+	unsigned long flags;
+
+	idr_remove(&ctx->afu->contexts_idr, ctx->ph);
+	synchronize_rcu();
+
+	spin_lock_irqsave(&ctx->sst_lock, flags);
+	free_page((u64)ctx->sstp);
+	ctx->sstp = NULL;
+	spin_unlock_irqrestore(&ctx->sst_lock, flags);
+
+	put_pid(ctx->pid);
+	kfree(ctx);
+}
diff --git a/drivers/misc/cxl/cxl-pci.c b/drivers/misc/cxl/cxl-pci.c
new file mode 100644
index 0000000..e56cb60
--- /dev/null
+++ b/drivers/misc/cxl/cxl-pci.c
@@ -0,0 +1,977 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/pci_regs.h>
+#include <linux/pci_ids.h>
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/sort.h>
+#include <linux/pci.h>
+#include <linux/of.h>
+#include <linux/delay.h>
+#include <asm/opal.h>
+#include <asm/msi_bitmap.h>
+#include <asm/pci-bridge.h> /* for struct pci_controller */
+#include <asm/pnv-pci.h>
+
+#include "cxl.h"
+
+
+#define CXL_PCI_VSEC_ID	0x1280
+#define CXL_VSEC_MIN_SIZE 0x80
+
+#define CXL_READ_VSEC_LENGTH(dev, vsec, dest)			\
+	{							\
+		pci_read_config_word(dev, vsec + 0x6, dest);	\
+		*dest >>= 4;					\
+	}
+#define CXL_READ_VSEC_NAFUS(dev, vsec, dest) \
+	pci_read_config_byte(dev, vsec + 0x8, dest)
+
+#define CXL_READ_VSEC_STATUS(dev, vsec, dest) \
+	pci_read_config_byte(dev, vsec + 0x9, dest)
+#define CXL_STATUS_SECOND_PORT  0x80
+#define CXL_STATUS_MSI_X_FULL   0x40
+#define CXL_STATUS_MSI_X_SINGLE 0x20
+#define CXL_STATUS_FLASH_RW     0x08
+#define CXL_STATUS_FLASH_RO     0x04
+#define CXL_STATUS_LOADABLE_AFU 0x02
+#define CXL_STATUS_LOADABLE_PSL 0x01
+/* If we see these features we won't try to use the card */
+#define CXL_UNSUPPORTED_FEATURES \
+	(CXL_STATUS_MSI_X_FULL | CXL_STATUS_MSI_X_SINGLE)
+
+#define CXL_READ_VSEC_MODE_CONTROL(dev, vsec, dest) \
+	pci_read_config_byte(dev, vsec + 0xa, dest)
+#define CXL_WRITE_VSEC_MODE_CONTROL(dev, vsec, val) \
+	pci_write_config_byte(dev, vsec + 0xa, val)
+#define CXL_VSEC_PROTOCOL_MASK   0xe0
+#define CXL_VSEC_PROTOCOL_256TB  0x80 /* Power 8 uses this */
+#define CXL_VSEC_PROTOCOL_512TB  0x40
+#define CXL_VSEC_PROTOCOL_1024TB 0x20
+#define CXL_VSEC_PROTOCOL_ENABLE 0x01
+
+#define CXL_READ_VSEC_PSL_REVISION(dev, vsec, dest) \
+	pci_read_config_word(dev, vsec + 0xc, dest)
+#define CXL_READ_VSEC_CAIA_MINOR(dev, vsec, dest) \
+	pci_read_config_byte(dev, vsec + 0xe, dest)
+#define CXL_READ_VSEC_CAIA_MAJOR(dev, vsec, dest) \
+	pci_read_config_byte(dev, vsec + 0xf, dest)
+#define CXL_READ_VSEC_BASE_IMAGE(dev, vsec, dest) \
+	pci_read_config_word(dev, vsec + 0x10, dest)
+
+#define CXL_READ_VSEC_IMAGE_STATE(dev, vsec, dest) \
+	pci_read_config_byte(dev, vsec + 0x13, dest)
+#define CXL_WRITE_VSEC_IMAGE_STATE(dev, vsec, val) \
+	pci_write_config_byte(dev, vsec + 0x13, val)
+#define CXL_VSEC_USER_IMAGE_LOADED 0x80 /* RO */
+#define CXL_VSEC_PERST_LOADS_IMAGE 0x20 /* RW */
+#define CXL_VSEC_PERST_SELECT_USER 0x10 /* RW */
+
+#define CXL_READ_VSEC_AFU_DESC_OFF(dev, vsec, dest) \
+	pci_read_config_dword(dev, vsec + 0x20, dest)
+#define CXL_READ_VSEC_AFU_DESC_SIZE(dev, vsec, dest) \
+	pci_read_config_dword(dev, vsec + 0x24, dest)
+#define CXL_READ_VSEC_PS_OFF(dev, vsec, dest) \
+	pci_read_config_dword(dev, vsec + 0x28, dest)
+#define CXL_READ_VSEC_PS_SIZE(dev, vsec, dest) \
+	pci_read_config_dword(dev, vsec + 0x2c, dest)
+
+
+/* This works a little differently than the p1/p2 register accesses to make it
+ * easier to pull out individual fields */
+#define AFUD_READ(afu, off)		_cxl_reg_read(afu->afu_desc_mmio + off)
+#define EXTRACT_PPC_BIT(val, bit)	(!!(val & PPC_BIT(bit)))
+#define EXTRACT_PPC_BITS(val, bs, be)	((val & PPC_BITMASK(bs, be)) >> PPC_BITLSHIFT(be))
+
+#define AFUD_READ_INFO(afu)		AFUD_READ(afu, 0x0)
+#define   AFUD_NUM_INTS_PER_PROC(val)	EXTRACT_PPC_BITS(val,  0, 15)
+#define   AFUD_NUM_PROCS(val)		EXTRACT_PPC_BITS(val, 16, 31)
+#define   AFUD_NUM_CRS(val)		EXTRACT_PPC_BITS(val, 32, 47)
+#define   AFUD_MULTIMODEL(val)		EXTRACT_PPC_BIT(val, 48)
+#define   AFUD_PUSH_BLOCK_TRANSFER(val)	EXTRACT_PPC_BIT(val, 55)
+#define   AFUD_DEDICATED_PROCESS(val)	EXTRACT_PPC_BIT(val, 59)
+#define   AFUD_AFU_DIRECTED(val)	EXTRACT_PPC_BIT(val, 61)
+#define   AFUD_TIME_SLICED(val)		EXTRACT_PPC_BIT(val, 63)
+#define AFUD_READ_CR(afu)		AFUD_READ(afu, 0x20)
+#define   AFUD_CR_LEN(val)		EXTRACT_PPC_BITS(val, 8, 63)
+#define AFUD_READ_CR_OFF(afu)		AFUD_READ(afu, 0x28)
+#define AFUD_READ_PPPSA(afu)		AFUD_READ(afu, 0x30)
+#define   AFUD_PPPSA_PP(val)		EXTRACT_PPC_BIT(val, 6)
+#define   AFUD_PPPSA_PSA(val)		EXTRACT_PPC_BIT(val, 7)
+#define   AFUD_PPPSA_LEN(val)		EXTRACT_PPC_BITS(val, 8, 63)
+#define AFUD_READ_PPPSA_OFF(afu)	AFUD_READ(afu, 0x38)
+#define AFUD_READ_EB(afu)		AFUD_READ(afu, 0x40)
+#define   AFUD_EB_LEN(val)		EXTRACT_PPC_BITS(val, 8, 63)
+#define AFUD_READ_EB_OFF(afu)		AFUD_READ(afu, 0x48)
+
+static DEFINE_PCI_DEVICE_TABLE(cxl_pci_tbl) = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0477), },
+	{ PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x044b), },
+	{ PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x04cf), },
+	{ PCI_DEVICE_CLASS(0x120000, ~0), },
+
+	{ }
+};
+MODULE_DEVICE_TABLE(pci, cxl_pci_tbl);
+
+
+/* Mostly using these wrappers to avoid confusion:
+ * priv 1 is BAR2, while priv 2 is BAR0 */
+static inline resource_size_t p1_base(struct pci_dev *dev)
+{
+	return pci_resource_start(dev, 2);
+}
+
+static inline resource_size_t p1_size(struct pci_dev *dev)
+{
+	return pci_resource_len(dev, 2);
+}
+
+static inline resource_size_t p2_base(struct pci_dev *dev)
+{
+	return pci_resource_start(dev, 0);
+}
+
+static inline resource_size_t p2_size(struct pci_dev *dev)
+{
+	return pci_resource_len(dev, 0);
+}
+
+static int find_cxl_vsec(struct pci_dev *dev)
+{
+	int vsec = 0;
+	u16 val;
+
+	while ((vsec = pci_find_next_ext_capability(dev, vsec, PCI_EXT_CAP_ID_VNDR))) {
+		pci_read_config_word(dev, vsec + 0x4, &val);
+		if (val == CXL_PCI_VSEC_ID)
+			return vsec;
+	}
+	return 0;
+
+}
+
+static void dump_cxl_config_space(struct pci_dev *dev)
+{
+	int vsec;
+	u32 val;
+
+	dev_info(&dev->dev, "dump_cxl_config_space\n");
+
+	pci_read_config_dword(dev, PCI_BASE_ADDRESS_0, &val);
+	dev_info(&dev->dev, "BAR0: %#.8x\n", val);
+	pci_read_config_dword(dev, PCI_BASE_ADDRESS_1, &val);
+	dev_info(&dev->dev, "BAR1: %#.8x\n", val);
+	pci_read_config_dword(dev, PCI_BASE_ADDRESS_2, &val);
+	dev_info(&dev->dev, "BAR2: %#.8x\n", val);
+	pci_read_config_dword(dev, PCI_BASE_ADDRESS_3, &val);
+	dev_info(&dev->dev, "BAR3: %#.8x\n", val);
+	pci_read_config_dword(dev, PCI_BASE_ADDRESS_4, &val);
+	dev_info(&dev->dev, "BAR4: %#.8x\n", val);
+	pci_read_config_dword(dev, PCI_BASE_ADDRESS_5, &val);
+	dev_info(&dev->dev, "BAR5: %#.8x\n", val);
+
+	dev_info(&dev->dev, "p1 regs: %#llx, len: %#llx\n",
+		p1_base(dev), p1_size(dev));
+	dev_info(&dev->dev, "p2 regs: %#llx, len: %#llx\n",
+		p2_base(dev), p2_size(dev));
+	dev_info(&dev->dev, "BAR 4/5: %#llx, len: %#llx\n",
+		pci_resource_start(dev, 4), pci_resource_len(dev, 4));
+
+	if (!(vsec = find_cxl_vsec(dev)))
+		return;
+
+#define show_reg(name, what) \
+	dev_info(&dev->dev, "cxl vsec: %30s: %#x\n", name, what)
+
+	pci_read_config_dword(dev, vsec + 0x0, &val);
+	show_reg("Cap ID", (val >> 0) & 0xffff);
+	show_reg("Cap Ver", (val >> 16) & 0xf);
+	show_reg("Next Cap Ptr", (val >> 20) & 0xfff);
+	pci_read_config_dword(dev, vsec + 0x4, &val);
+	show_reg("VSEC ID", (val >> 0) & 0xffff);
+	show_reg("VSEC Rev", (val >> 16) & 0xf);
+	show_reg("VSEC Length",	(val >> 20) & 0xfff);
+	pci_read_config_dword(dev, vsec + 0x8, &val);
+	show_reg("Num AFUs", (val >> 0) & 0xff);
+	show_reg("Status", (val >> 8) & 0xff);
+	show_reg("Mode Control", (val >> 16) & 0xff);
+	show_reg("Reserved", (val >> 24) & 0xff);
+	pci_read_config_dword(dev, vsec + 0xc, &val);
+	show_reg("PSL Rev", (val >> 0) & 0xffff);
+	show_reg("CAIA Ver", (val >> 16) & 0xffff);
+	pci_read_config_dword(dev, vsec + 0x10, &val);
+	show_reg("Base Image Rev", (val >> 0) & 0xffff);
+	show_reg("Reserved", (val >> 16) & 0x0fff);
+	show_reg("Image Control", (val >> 28) & 0x3);
+	show_reg("Reserved", (val >> 30) & 0x1);
+	show_reg("Image Loaded", (val >> 31) & 0x1);
+
+	pci_read_config_dword(dev, vsec + 0x14, &val);
+	show_reg("Reserved", val);
+	pci_read_config_dword(dev, vsec + 0x18, &val);
+	show_reg("Reserved", val);
+	pci_read_config_dword(dev, vsec + 0x1c, &val);
+	show_reg("Reserved", val);
+
+	pci_read_config_dword(dev, vsec + 0x20, &val);
+	show_reg("AFU Descriptor Offset", val);
+	pci_read_config_dword(dev, vsec + 0x24, &val);
+	show_reg("AFU Descriptor Size", val);
+	pci_read_config_dword(dev, vsec + 0x28, &val);
+	show_reg("Problem State Offset", val);
+	pci_read_config_dword(dev, vsec + 0x2c, &val);
+	show_reg("Problem State Size", val);
+
+	pci_read_config_dword(dev, vsec + 0x30, &val);
+	show_reg("Reserved", val);
+	pci_read_config_dword(dev, vsec + 0x34, &val);
+	show_reg("Reserved", val);
+	pci_read_config_dword(dev, vsec + 0x38, &val);
+	show_reg("Reserved", val);
+	pci_read_config_dword(dev, vsec + 0x3c, &val);
+	show_reg("Reserved", val);
+
+	pci_read_config_dword(dev, vsec + 0x40, &val);
+	show_reg("PSL Programming Port", val);
+	pci_read_config_dword(dev, vsec + 0x44, &val);
+	show_reg("PSL Programming Control", val);
+
+	pci_read_config_dword(dev, vsec + 0x48, &val);
+	show_reg("Reserved", val);
+	pci_read_config_dword(dev, vsec + 0x4c, &val);
+	show_reg("Reserved", val);
+
+	pci_read_config_dword(dev, vsec + 0x50, &val);
+	show_reg("Flash Address Register", val);
+	pci_read_config_dword(dev, vsec + 0x54, &val);
+	show_reg("Flash Size Register", val);
+	pci_read_config_dword(dev, vsec + 0x58, &val);
+	show_reg("Flash Status/Control Register", val);
+	pci_read_config_dword(dev, vsec + 0x5c, &val);
+	show_reg("Flash Data Port", val);
+
+#undef show_reg
+}
+
+static void dump_afu_descriptor(struct cxl_afu_t *afu)
+{
+	u64 val;
+
+#define show_reg(name, what) \
+	dev_info(&afu->dev, "afu desc: %30s: %#llx\n", name, what)
+
+	val = AFUD_READ_INFO(afu);
+	show_reg("num_ints_per_process", AFUD_NUM_INTS_PER_PROC(val));
+	show_reg("num_of_processes", AFUD_NUM_PROCS(val));
+	show_reg("num_of_afu_CRs", AFUD_NUM_CRS(val));
+	show_reg("req_prog_model", val & 0xffffULL);
+
+	val = AFUD_READ(afu, 0x8);
+	show_reg("Reserved", val);
+	val = AFUD_READ(afu, 0x10);
+	show_reg("Reserved", val);
+	val = AFUD_READ(afu, 0x18);
+	show_reg("Reserved", val);
+
+	val = AFUD_READ_CR(afu);
+	show_reg("Reserved", (val >> (63-7)) & 0xff);
+	show_reg("AFU_CR_len", AFUD_CR_LEN(val));
+
+	val = AFUD_READ_CR_OFF(afu);
+	show_reg("AFU_CR_offset", val);
+
+	val = AFUD_READ_PPPSA(afu);
+	show_reg("PerProcessPSA_control", (val >> (63-7)) & 0xff);
+	show_reg("PerProcessPSA Length", AFUD_PPPSA_LEN(val));
+
+	val = AFUD_READ_PPPSA_OFF(afu);
+	show_reg("PerProcessPSA_offset", val);
+
+	val = AFUD_READ_EB(afu);
+	show_reg("Reserved", (val >> (63-7)) & 0xff);
+	show_reg("AFU_EB_len", AFUD_EB_LEN(val));
+
+	val = AFUD_READ_EB_OFF(afu);
+	show_reg("AFU_EB_offset", val);
+
+#undef show_reg
+}
+
+extern struct device_node *pnv_pci_to_phb_node(struct pci_dev *dev);
+
+static int init_implementation_adapter_regs(struct cxl_t *adapter, struct pci_dev *dev)
+{
+	struct device_node *np;
+	const __be32 *prop;
+	u64 psl_dsnctl;
+	u64 chipid;
+
+	if (!(np = pnv_pci_to_phb_node(dev)))
+		return -ENODEV;
+
+	while (np && !(prop = of_get_property(np, "ibm,chip-id", NULL)))
+		np = of_get_next_parent(np);
+	if (!np)
+		return -ENODEV;
+	chipid = be32_to_cpup(prop);
+	of_node_put(np);
+
+	/* Tell PSL where to route data to */
+	psl_dsnctl = 0x02E8900002000000ULL | (chipid << (63-5));
+	cxl_p1_write(adapter, CXL_PSL_DSNDCTL, psl_dsnctl);
+	cxl_p1_write(adapter, CXL_PSL_RESLCKTO, 0x20000000200ULL);
+	/* snoop write mask */
+	cxl_p1_write(adapter, CXL_PSL_SNWRALLOC, 0x00000000FFFFFFFFULL);
+	/* set fir_accum */
+	cxl_p1_write(adapter, CXL_PSL_FIR_CNTL, 0x0800000000000000ULL);
+	/* for debugging with trace arrays */
+	cxl_p1_write(adapter, CXL_PSL_TRACE, 0x0000FF7C00000000ULL);
+
+	return 0;
+}
+
+static int init_implementation_afu_regs(struct cxl_afu_t *afu)
+{
+	/* read/write masks for this slice */
+	cxl_p1n_write(afu, CXL_PSL_APCALLOC_A, 0xFFFFFFFEFEFEFEFEULL);
+	/* APC read/write masks for this slice */
+	cxl_p1n_write(afu, CXL_PSL_COALLOC_A, 0xFF000000FEFEFEFEULL);
+	/* for debugging with trace arrays */
+	cxl_p1n_write(afu, CXL_PSL_SLICE_TRACE, 0x0000FFFF00000000ULL);
+	cxl_p1n_write(afu, CXL_PSL_RXCTL_A, 0xF000000000000000ULL);
+
+	return 0;
+}
+
+static int setup_cxl_msi(struct cxl_t *adapter, unsigned int hwirq,
+			 unsigned int virq)
+{
+	struct pci_dev *dev = to_pci_dev(adapter->dev.parent);
+
+	return pnv_cxl_ioda_msi_setup(dev, hwirq, virq);
+}
+
+static int alloc_one_hwirq(struct cxl_t *adapter)
+{
+	struct pci_dev *dev = to_pci_dev(adapter->dev.parent);
+
+	return pnv_cxl_alloc_hwirqs(dev, 1);
+}
+
+static void release_one_hwirq(struct cxl_t *adapter, int hwirq)
+{
+	struct pci_dev *dev = to_pci_dev(adapter->dev.parent);
+
+	pnv_cxl_release_hwirqs(dev, hwirq, 1);
+}
+
+static int alloc_hwirq_ranges(struct cxl_irq_ranges *irqs, struct cxl_t *adapter, unsigned int num)
+{
+	struct pci_dev *dev = to_pci_dev(adapter->dev.parent);
+
+	return pnv_cxl_alloc_hwirq_ranges(irqs, dev, num);
+}
+
+static void release_hwirq_ranges(struct cxl_irq_ranges *irqs, struct cxl_t *adapter)
+{
+	struct pci_dev *dev = to_pci_dev(adapter->dev.parent);
+
+	pnv_cxl_release_hwirq_ranges(irqs, dev);
+
+}
+
+
+static struct cxl_driver_ops cxl_pci_driver_ops = {
+	.module = THIS_MODULE,
+	.alloc_one_irq = alloc_one_hwirq,
+	.release_one_irq = release_one_hwirq,
+	.alloc_irq_ranges = alloc_hwirq_ranges,
+	.release_irq_ranges = release_hwirq_ranges,
+	.setup_irq = setup_cxl_msi,
+};
+
+static int setup_cxl_bars(struct pci_dev *dev)
+{
+	/* Safety check in case we get backported to < 3.17 without M64 */
+	if ((p1_base(dev) < 0x100000000ULL) ||
+	    (p2_base(dev) < 0x100000000ULL)) {
+		dev_err(&dev->dev, "ABORTING: M32 BAR assignment incompatible with CXL\n");
+		return -ENODEV;
+	}
+
+	/* BAR 4/5 has a special meaning for CXL and must be programmed with a
+	 * special value corresponding to the CXL protocol address range.
+	 * For POWER 8 that means bits 48:49 must be set to 10 */
+	pci_write_config_dword(dev, PCI_BASE_ADDRESS_4, 0x00000000);
+	pci_write_config_dword(dev, PCI_BASE_ADDRESS_5, 0x00020000);
+
+	return 0;
+}
+
+/*
+ *  pciex node: ibm,opal-m64-window = <0x3d058 0x0 0x3d058 0x0 0x8 0x0>;
+ */
+
+static int switch_card_to_cxl(struct pci_dev *dev)
+{
+	int vsec;
+	u8 val;
+	int rc;
+
+	dev_info(&dev->dev, "switch card to CXL\n");
+
+	if (!(vsec = find_cxl_vsec(dev))) {
+		dev_err(&dev->dev, "ABORTING: CXL VSEC not found!\n");
+		return -ENODEV;
+	}
+
+	if ((rc = CXL_READ_VSEC_MODE_CONTROL(dev, vsec, &val))) {
+		dev_err(&dev->dev, "failed to read current mode control: %i\n", rc);
+		return rc;
+	}
+	val &= ~CXL_VSEC_PROTOCOL_MASK;
+	val |= CXL_VSEC_PROTOCOL_256TB | CXL_VSEC_PROTOCOL_ENABLE;
+	if ((rc = CXL_WRITE_VSEC_MODE_CONTROL(dev, vsec, val))) {
+		dev_err(&dev->dev, "failed to enable CXL protocol: %i\n", rc);
+		return rc;
+	}
+	/* The CAIA spec (v0.12 11.6 Bi-modal Device Support) states
+	 * we must wait 100ms after this mode switch before touching
+	 * PCIe config space.
+	 */
+	msleep(100);
+
+	return 0;
+}
+
+static int enable_cxl_protocol(struct pci_dev *dev)
+{
+	int rc;
+
+	if ((rc = switch_card_to_cxl(dev)))
+		return rc;
+
+	if ((rc = pnv_phb_to_cxl(dev)))
+		return rc;
+
+	return rc;
+}
+
+
+static int cxl_map_slice_regs(struct cxl_afu_t *afu, struct cxl_t *adapter, struct pci_dev *dev)
+{
+	u64 p1n_base, p2n_base, afu_desc;
+	const u64 p1n_size = 0x100;
+	const u64 p2n_size = 0x1000;
+
+	p1n_base = p1_base(dev) + 0x10000 + (afu->slice * p1n_size);
+	p2n_base = p2_base(dev) + (afu->slice * p2n_size);
+	afu->psn_phys = p2_base(dev) + (adapter->ps_off + (afu->slice * adapter->ps_size));
+	afu_desc = p2_base(dev) + adapter->afu_desc_off + (afu->slice * adapter->afu_desc_size);
+
+	if (!(afu->p1n_mmio = ioremap(p1n_base, p1n_size)))
+		goto err;
+	if (!(afu->p2n_mmio = ioremap(p2n_base, p2n_size)))
+		goto err1;
+	if (afu_desc) {
+		if (!(afu->afu_desc_mmio = ioremap(afu_desc, adapter->afu_desc_size)))
+			goto err2;
+	}
+
+	return 0;
+err2:
+	iounmap(afu->p2n_mmio);
+err1:
+	iounmap(afu->p1n_mmio);
+err:
+	dev_err(&afu->dev, "Error mapping AFU MMIO regions\n");
+	return -ENOMEM;
+}
+
+static void cxl_unmap_slice_regs(struct cxl_afu_t *afu)
+{
+	if (afu->p2n_mmio)
+		iounmap(afu->p2n_mmio);
+	if (afu->p1n_mmio)
+		iounmap(afu->p1n_mmio);
+}
+
+static void cxl_release_afu(struct device *dev)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(dev);
+
+	pr_devel("cxl_release_afu\n");
+
+	kfree(afu);
+}
+
+static struct cxl_afu_t *cxl_alloc_afu(struct cxl_t *adapter, int slice)
+{
+	struct cxl_afu_t *afu;
+
+	if (!(afu = kzalloc(sizeof(struct cxl_afu_t), GFP_KERNEL)))
+		return NULL;
+
+	afu->adapter = adapter;
+	afu->dev.parent = &adapter->dev;
+	afu->dev.release = cxl_release_afu;
+	afu->slice = slice;
+	idr_init(&afu->contexts_idr);
+	spin_lock_init(&afu->contexts_lock);
+	spin_lock_init(&afu->afu_cntl_lock);
+	mutex_init(&afu->spa_mutex);
+
+	afu->prefault_mode = CXL_PREFAULT_NONE;
+	afu->irqs_max = afu->adapter->user_irqs;
+
+	return afu;
+}
+
+/* Expects AFU struct to have recently been zeroed out */
+static int cxl_read_afu_descriptor(struct cxl_afu_t *afu)
+{
+	u64 val;
+
+	val = AFUD_READ_INFO(afu);
+	afu->pp_irqs = AFUD_NUM_INTS_PER_PROC(val);
+	afu->max_procs_virtualised = AFUD_NUM_PROCS(val);
+
+	if (AFUD_AFU_DIRECTED(val))
+		afu->models_supported |= CXL_MODEL_DIRECTED;
+	if (AFUD_DEDICATED_PROCESS(val))
+		afu->models_supported |= CXL_MODEL_DEDICATED;
+	if (AFUD_TIME_SLICED(val))
+		afu->models_supported |= CXL_MODEL_TIME_SLICED;
+
+	val = AFUD_READ_PPPSA(afu);
+	afu->pp_size = AFUD_PPPSA_LEN(val) * 4096;
+	afu->psa = AFUD_PPPSA_PSA(val);
+	if ((afu->pp_psa = AFUD_PPPSA_PP(val)))
+		afu->pp_offset = AFUD_READ_PPPSA_OFF(afu);
+
+	return 0;
+}
+
+static int cxl_afu_descriptor_looks_ok(struct cxl_afu_t *afu)
+{
+	if (afu->psa && afu->adapter->ps_size <
+			(afu->pp_offset + afu->pp_size*afu->max_procs_virtualised)) {
+		dev_err(&afu->dev, "per-process PSA can't fit inside the PSA!\n");
+		return -ENODEV;
+	}
+
+	if (afu->pp_psa && (afu->pp_size < PAGE_SIZE))
+		dev_warn(&afu->dev, "AFU uses < PAGE_SIZE per-process PSA!\n");
+
+	return 0;
+}
+
+static int sanitise_afu_regs(struct cxl_afu_t *afu)
+{
+	cxl_p1_write(afu->adapter, CXL_PSL_ErrIVTE, 0x0000000000000000);
+	cxl_p1n_write(afu, CXL_PSL_SERR_An, 0x0000000000000000);
+	cxl_p1n_write(afu, CXL_PSL_IVTE_Offset_An, 0x0000000000000000);
+	cxl_ops->slbia(afu);
+
+	return 0;
+}
+
+static int cxl_init_afu(struct cxl_t *adapter, int slice, struct pci_dev *dev)
+{
+	struct cxl_afu_t *afu;
+	bool free = true;
+	int rc;
+
+	if (!(afu = cxl_alloc_afu(adapter, slice)))
+		return -ENOMEM;
+
+	if ((rc = dev_set_name(&afu->dev, "afu%i.%i", adapter->adapter_num, slice)))
+		goto err1;
+
+	if ((rc = cxl_map_slice_regs(afu, adapter, dev)))
+		goto err1;
+
+	if ((rc = sanitise_afu_regs(afu)))
+		goto err2;
+
+	/* We need to reset the AFU before we can read the AFU descriptor */
+	if ((rc = cxl_ops->afu_reset(afu)))
+		goto err2;
+
+	if (cxl_verbose)
+		dump_afu_descriptor(afu);
+
+	if ((rc = cxl_read_afu_descriptor(afu)))
+		goto err2;
+
+	if ((rc = cxl_afu_descriptor_looks_ok(afu)))
+		goto err2;
+
+	if ((rc = init_implementation_afu_regs(afu)))
+		goto err2;
+
+	if ((rc = cxl_register_serr_irq(afu)))
+		goto err2;
+
+	if ((rc = cxl_register_psl_irq(afu)))
+		goto err3;
+
+	/* Don't care if this fails */
+	cxl_debugfs_afu_add(afu);
+
+	/* After we call this function we must not free the afu directly, even
+	 * if it returns an error! */
+	if ((rc = cxl_register_afu(afu)))
+		goto err_put1;
+
+	if ((rc = cxl_sysfs_afu_add(afu)))
+		goto err_put1;
+
+
+	if ((rc = cxl_afu_select_best_model(afu)))
+		goto err_put2;
+
+	adapter->afu[afu->slice] = afu;
+
+	return 0;
+
+err_put2:
+	cxl_sysfs_afu_remove(afu);
+err_put1:
+	device_unregister(&afu->dev);
+	free = false;
+	cxl_debugfs_afu_remove(afu);
+	cxl_release_psl_irq(afu);
+err3:
+	cxl_release_serr_irq(afu);
+err2:
+	cxl_unmap_slice_regs(afu);
+err1:
+	if (free)
+		kfree(afu);
+	return rc;
+}
+
+static void cxl_remove_afu(struct cxl_afu_t *afu)
+{
+	pr_devel("cxl_remove_afu\n");
+
+	if (!afu)
+		return;
+
+	cxl_sysfs_afu_remove(afu);
+	cxl_debugfs_afu_remove(afu);
+
+	spin_lock(&afu->adapter->afu_list_lock);
+	afu->adapter->afu[afu->slice] = NULL;
+	spin_unlock(&afu->adapter->afu_list_lock);
+
+	cxl_context_detach_all(afu);
+	cxl_afu_deactivate_model(afu);
+
+	cxl_release_psl_irq(afu);
+	cxl_release_serr_irq(afu);
+	cxl_unmap_slice_regs(afu);
+
+	device_unregister(&afu->dev);
+}
+
+
+static int cxl_map_adapter_regs(struct cxl_t *adapter, struct pci_dev *dev)
+{
+	if (pci_request_region(dev, 2, "priv 1 regs"))
+		goto err1;
+	if (pci_request_region(dev, 0, "priv 2 regs"))
+		goto err2;
+
+	pr_devel("cxl_map_adapter_regs: p1: %#.16llx %#llx, p2: %#.16llx %#llx\n",
+			p1_base(dev), p1_size(dev), p2_base(dev), p2_size(dev));
+
+	if (!(adapter->p1_mmio = ioremap(p1_base(dev), p1_size(dev))))
+		goto err3;
+
+	if (!(adapter->p2_mmio = ioremap(p2_base(dev), p2_size(dev))))
+		goto err4;
+
+	return 0;
+
+err4:
+	iounmap(adapter->p1_mmio);
+	adapter->p1_mmio = NULL;
+err3:
+	pci_release_region(dev, 0);
+err2:
+	pci_release_region(dev, 2);
+err1:
+	return -ENOMEM;
+}
+
+static void cxl_unmap_adapter_regs(struct cxl_t *adapter)
+{
+	if (adapter->p1_mmio)
+		iounmap(adapter->p1_mmio);
+	if (adapter->p2_mmio)
+		iounmap(adapter->p2_mmio);
+}
+
+static int cxl_read_vsec(struct cxl_t *adapter, struct pci_dev *dev)
+{
+	int vsec;
+	u32 afu_desc_off, afu_desc_size;
+	u32 ps_off, ps_size;
+	u16 vseclen;
+	u8 image_state;
+
+	if (!(vsec = find_cxl_vsec(dev))) {
+		dev_err(&adapter->dev, "ABORTING: CXL VSEC not found!\n");
+		return -ENODEV;
+	}
+
+	CXL_READ_VSEC_LENGTH(dev, vsec, &vseclen);
+	if (vseclen < CXL_VSEC_MIN_SIZE) {
+		pr_err("ABORTING: CXL VSEC too short\n");
+		return -EINVAL;
+	}
+
+	CXL_READ_VSEC_STATUS(dev, vsec, &adapter->vsec_status);
+	CXL_READ_VSEC_PSL_REVISION(dev, vsec, &adapter->psl_rev);
+	CXL_READ_VSEC_CAIA_MAJOR(dev, vsec, &adapter->caia_major);
+	CXL_READ_VSEC_CAIA_MINOR(dev, vsec, &adapter->caia_minor);
+	CXL_READ_VSEC_BASE_IMAGE(dev, vsec, &adapter->base_image);
+	CXL_READ_VSEC_IMAGE_STATE(dev, vsec, &image_state);
+	adapter->user_image_loaded = !!(image_state & CXL_VSEC_USER_IMAGE_LOADED);
+	adapter->perst_loads_image = !!(image_state & CXL_VSEC_PERST_LOADS_IMAGE);
+	adapter->perst_select_user = !!(image_state & CXL_VSEC_PERST_SELECT_USER);
+
+	CXL_READ_VSEC_NAFUS(dev, vsec, &adapter->slices);
+	CXL_READ_VSEC_AFU_DESC_OFF(dev, vsec, &afu_desc_off);
+	CXL_READ_VSEC_AFU_DESC_SIZE(dev, vsec, &afu_desc_size);
+	CXL_READ_VSEC_PS_OFF(dev, vsec, &ps_off);
+	CXL_READ_VSEC_PS_SIZE(dev, vsec, &ps_size);
+
+	/* Convert everything to bytes, because there is NO WAY I'd look at the
+	 * code a month later and forget what units these are in ;-) */
+	adapter->ps_off = ps_off * 64 * 1024;
+	adapter->ps_size = ps_size * 64 * 1024;
+	adapter->afu_desc_off = afu_desc_off * 64 * 1024;
+	adapter->afu_desc_size = afu_desc_size * 64 * 1024;
+
+	/* Total IRQs - 1 PSL ERROR - #AFU*(1 slice error + 1 DSI) */
+	adapter->user_irqs = pnv_cxl_get_irq_count(dev) - 1 - 2*adapter->slices;
+
+	return 0;
+}
+
+static int cxl_vsec_looks_ok(struct cxl_t *adapter, struct pci_dev *dev)
+{
+	if (adapter->vsec_status & CXL_STATUS_SECOND_PORT)
+		return -EBUSY;
+
+	if (adapter->vsec_status & CXL_UNSUPPORTED_FEATURES) {
+		dev_err(&adapter->dev, "ABORTING: CXL requires unsupported features\n");
+		return -EINVAL;
+	}
+
+	if (!adapter->slices) {
+		/* Once we support dynamic reprogramming we can use the card if
+		 * it supports loadable AFUs */
+		dev_err(&adapter->dev, "ABORTING: Device has no AFUs\n");
+		return -EINVAL;
+	}
+
+	if (!adapter->afu_desc_off || !adapter->afu_desc_size) {
+		dev_err(&adapter->dev, "ABORTING: VSEC shows no AFU descriptors\n");
+		return -EINVAL;
+	}
+
+	if (adapter->ps_size > p2_size(dev) - adapter->ps_off) {
+		dev_err(&adapter->dev, "ABORTING: Problem state size larger than "
+				   "available in BAR0: 0x%llx > 0x%llx\n",
+			 adapter->ps_size, p2_size(dev) - adapter->ps_off);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static void cxl_release_adapter(struct device *dev)
+{
+	struct cxl_t *adapter = to_cxl_adapter(dev);
+
+	pr_devel("cxl_release_adapter\n");
+
+	kfree(adapter);
+}
+
+static struct cxl_t *cxl_alloc_adapter(struct pci_dev *dev)
+{
+	struct cxl_t *adapter;
+
+	if (!(adapter = kzalloc(sizeof(struct cxl_t), GFP_KERNEL)))
+		return NULL;
+
+	adapter->dev.parent = &dev->dev;
+	adapter->dev.release = cxl_release_adapter;
+	adapter->driver = &cxl_pci_driver_ops;
+	pci_set_drvdata(dev, adapter);
+	spin_lock_init(&adapter->afu_list_lock);
+
+	return adapter;
+}
+
+static struct cxl_t *cxl_init_adapter(struct pci_dev *dev)
+{
+	struct cxl_t *adapter;
+	bool free = true;
+	int rc;
+
+	if (!(adapter = cxl_alloc_adapter(dev)))
+		return ERR_PTR(-ENOMEM);
+
+	if ((rc = cxl_alloc_adapter_nr(adapter)))
+		goto err1;
+
+	if ((rc = dev_set_name(&adapter->dev, "card%i", adapter->adapter_num)))
+		goto err2;
+
+	if ((rc = cxl_read_vsec(adapter, dev)))
+		goto err2;
+
+	if ((rc = cxl_vsec_looks_ok(adapter, dev)))
+		goto err2;
+
+	if ((rc = cxl_map_adapter_regs(adapter, dev)))
+		goto err2;
+
+	/* TODO: cxl_ops->sanitise_adapter_regs(adapter); */
+
+	if ((rc = init_implementation_adapter_regs(adapter, dev)))
+		goto err3;
+
+	if ((rc = cxl_register_psl_err_irq(adapter)))
+		goto err3;
+
+	/* Don't care if this one fails: */
+	cxl_debugfs_adapter_add(adapter);
+
+	/* After we call this function we must not free the adapter directly,
+	 * even if it returns an error! */
+	if ((rc = cxl_register_adapter(adapter)))
+		goto err_put1;
+
+	if ((rc = cxl_sysfs_adapter_add(adapter)))
+		goto err_put1;
+
+	return adapter;
+
+err_put1:
+	device_unregister(&adapter->dev);
+	free = false;
+	cxl_debugfs_adapter_remove(adapter);
+	cxl_release_psl_err_irq(adapter);
+err3:
+	cxl_unmap_adapter_regs(adapter);
+err2:
+	cxl_remove_adapter_nr(adapter);
+err1:
+	if (free)
+		kfree(adapter);
+	return ERR_PTR(rc);
+}
+
+static void cxl_remove_adapter(struct cxl_t *adapter)
+{
+	struct pci_dev *pdev = to_pci_dev(adapter->dev.parent);
+
+	pr_devel("cxl_remove_adapter\n");
+
+	cxl_sysfs_adapter_remove(adapter);
+	cxl_debugfs_adapter_remove(adapter);
+	cxl_release_psl_err_irq(adapter);
+	cxl_unmap_adapter_regs(adapter);
+	cxl_remove_adapter_nr(adapter);
+
+	device_unregister(&adapter->dev);
+
+	pci_release_region(pdev, 0);
+	pci_release_region(pdev, 2);
+	pci_disable_device(pdev);
+}
+
+static int cxl_probe(struct pci_dev *dev, const struct pci_device_id *id)
+{
+	struct cxl_t *adapter;
+	int slice;
+	int rc;
+
+	pci_dev_get(dev);
+
+	if (cxl_verbose)
+		dump_cxl_config_space(dev);
+
+	if ((rc = setup_cxl_bars(dev)))
+		return rc;
+
+	if ((rc = enable_cxl_protocol(dev))) {
+		dev_err(&dev->dev, "enable_cxl_protocol failed: %i\n", rc);
+		return rc;
+	}
+	dev_info(&dev->dev, "CXL protocol enabled\n");
+
+	if ((rc = pci_enable_device(dev))) {
+		dev_err(&dev->dev, "pci_enable_device failed: %i\n", rc);
+		return rc;
+	}
+
+	adapter = cxl_init_adapter(dev);
+	if (IS_ERR(adapter)) {
+		dev_err(&dev->dev, "cxl_init_adapter failed: %li\n", PTR_ERR(adapter));
+		return PTR_ERR(adapter);
+	}
+
+	for (slice = 0; slice < adapter->slices; slice++) {
+		if ((rc = cxl_init_afu(adapter, slice, dev)))
+			dev_err(&dev->dev, "AFU %i failed to initialise: %i\n", slice, rc);
+	}
+
+	return 0;
+}
+
+static void cxl_remove(struct pci_dev *dev)
+{
+	struct cxl_t *adapter = pci_get_drvdata(dev);
+	int afu;
+
+	dev_warn(&dev->dev, "pci remove\n");
+
+	/* cxl_remove_afu() takes afu_list_lock while clearing each slot, so
+	 * nobody can grab a ref through the adapter list as we remove it */
+	for (afu = 0; afu < adapter->slices; afu++)
+		cxl_remove_afu(adapter->afu[afu]);
+	cxl_remove_adapter(adapter);
+}
+
+static struct pci_driver cxl_pci_driver = {
+	.name = "cxl-pci",
+	.id_table = cxl_pci_tbl,
+	.probe = cxl_probe,
+	.remove = cxl_remove,
+};
+
+module_driver(cxl_pci_driver, pci_register_driver, pci_unregister_driver);
+
+MODULE_DESCRIPTION("IBM Coherent Accelerator");
+MODULE_AUTHOR("Ian Munsie <imunsie@au1.ibm.com>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
new file mode 100644
index 0000000..87984cb
--- /dev/null
+++ b/drivers/misc/cxl/cxl.h
@@ -0,0 +1,605 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _CXL_H_
+#define _CXL_H_
+
+#include <linux/interrupt.h>
+#include <linux/semaphore.h>
+#include <linux/device.h>
+#include <linux/types.h>
+#include <linux/cdev.h>
+#include <linux/pid.h>
+#include <linux/io.h>
+#include <asm/cputable.h>
+#include <asm/mmu.h>
+#include <asm/reg.h>
+#include <misc/cxl.h>
+
+#include <uapi/misc/cxl.h>
+
+extern uint cxl_verbose;
+
+#define CXL_TIMEOUT 5
+
+/* Opaque types to avoid accidentally passing registers for the wrong MMIO
+ *
+ * At the end of the day, I'm not married to using typedef here, but it might
+ * help (and already has helped!) avoid bugs like mixing up CXL_PSL_CtxTime and
+ * CXL_PSL_CtxTime_An, or calling cxl_p1n_write instead of cxl_p1_write.
+ *
+ * I'm quite happy if these are changed back to #defines before upstreaming;
+ * it should be little more than a regexp search+replace operation in this file.
+ */
+typedef struct {
+	const int x;
+} cxl_p1_reg_t;
+typedef struct {
+	const int x;
+} cxl_p1n_reg_t;
+typedef struct {
+	const int x;
+} cxl_p2n_reg_t;
+#define cxl_reg_off(reg) \
+	(reg.x)
+
+/* Memory maps. Ref CXL Appendix A */
+
+/* PSL Privilege 1 Memory Map */
+/* Configuration and Control area */
+static const cxl_p1_reg_t CXL_PSL_CtxTime = {0x0000};
+static const cxl_p1_reg_t CXL_PSL_ErrIVTE = {0x0008};
+static const cxl_p1_reg_t CXL_PSL_KEY1    = {0x0010};
+static const cxl_p1_reg_t CXL_PSL_KEY2    = {0x0018};
+static const cxl_p1_reg_t CXL_PSL_Control = {0x0020};
+/* Downloading */
+static const cxl_p1_reg_t CXL_PSL_DLCNTL  = {0x0060};
+static const cxl_p1_reg_t CXL_PSL_DLADDR  = {0x0068};
+
+/* PSL Lookaside Buffer Management Area */
+static const cxl_p1_reg_t CXL_PSL_LBISEL  = {0x0080};
+static const cxl_p1_reg_t CXL_PSL_SLBIE   = {0x0088};
+static const cxl_p1_reg_t CXL_PSL_SLBIA   = {0x0090};
+static const cxl_p1_reg_t CXL_PSL_TLBIE   = {0x00A0};
+static const cxl_p1_reg_t CXL_PSL_TLBIA   = {0x00A8};
+static const cxl_p1_reg_t CXL_PSL_AFUSEL  = {0x00B0};
+
+/* 0x00C0:7EFF Implementation dependent area */
+static const cxl_p1_reg_t CXL_PSL_FIR1      = {0x0100};
+static const cxl_p1_reg_t CXL_PSL_FIR2      = {0x0108};
+static const cxl_p1_reg_t CXL_PSL_VERSION   = {0x0118};
+static const cxl_p1_reg_t CXL_PSL_RESLCKTO  = {0x0128};
+static const cxl_p1_reg_t CXL_PSL_FIR_CNTL  = {0x0148};
+static const cxl_p1_reg_t CXL_PSL_DSNDCTL   = {0x0150};
+static const cxl_p1_reg_t CXL_PSL_SNWRALLOC = {0x0158};
+static const cxl_p1_reg_t CXL_PSL_TRACE     = {0x0170};
+/* 0x7F00:7FFF Reserved PCIe MSI-X Pending Bit Array area */
+/* 0x8000:FFFF Reserved PCIe MSI-X Table Area */
+
+/* PSL Slice Privilege 1 Memory Map */
+/* Configuration Area */
+static const cxl_p1n_reg_t CXL_PSL_SR_An          = {0x00};
+static const cxl_p1n_reg_t CXL_PSL_LPID_An        = {0x08};
+static const cxl_p1n_reg_t CXL_PSL_AMBAR_An       = {0x10};
+static const cxl_p1n_reg_t CXL_PSL_SPOffset_An    = {0x18};
+static const cxl_p1n_reg_t CXL_PSL_ID_An          = {0x20};
+static const cxl_p1n_reg_t CXL_PSL_SERR_An        = {0x28};
+/* Memory Management and Lookaside Buffer Management */
+static const cxl_p1n_reg_t CXL_PSL_SDR_An         = {0x30};
+static const cxl_p1n_reg_t CXL_PSL_AMOR_An        = {0x38};
+/* Pointer Area */
+static const cxl_p1n_reg_t CXL_HAURP_An           = {0x80};
+static const cxl_p1n_reg_t CXL_PSL_SPAP_An        = {0x88};
+static const cxl_p1n_reg_t CXL_PSL_LLCMD_An       = {0x90};
+/* Control Area */
+static const cxl_p1n_reg_t CXL_PSL_SCNTL_An       = {0xA0};
+static const cxl_p1n_reg_t CXL_PSL_CtxTime_An     = {0xA8};
+static const cxl_p1n_reg_t CXL_PSL_IVTE_Offset_An = {0xB0};
+static const cxl_p1n_reg_t CXL_PSL_IVTE_Limit_An  = {0xB8};
+/* 0xC0:FF Implementation Dependent Area */
+static const cxl_p1n_reg_t CXL_PSL_FIR_SLICE_An   = {0xC0};
+static const cxl_p1n_reg_t CXL_AFU_DEBUG_An       = {0xC8};
+static const cxl_p1n_reg_t CXL_PSL_APCALLOC_A     = {0xD0};
+static const cxl_p1n_reg_t CXL_PSL_COALLOC_A      = {0xD8};
+static const cxl_p1n_reg_t CXL_PSL_RXCTL_A        = {0xE0};
+static const cxl_p1n_reg_t CXL_PSL_SLICE_TRACE    = {0xE8};
+
+/* PSL Slice Privilege 2 Memory Map */
+/* Configuration and Control Area */
+static const cxl_p2n_reg_t CXL_PSL_PID_TID_An = {0x000};
+static const cxl_p2n_reg_t CXL_CSRP_An        = {0x008};
+static const cxl_p2n_reg_t CXL_AURP0_An       = {0x010};
+static const cxl_p2n_reg_t CXL_AURP1_An       = {0x018};
+static const cxl_p2n_reg_t CXL_SSTP0_An       = {0x020};
+static const cxl_p2n_reg_t CXL_SSTP1_An       = {0x028};
+static const cxl_p2n_reg_t CXL_PSL_AMR_An     = {0x030};
+/* Segment Lookaside Buffer Management */
+static const cxl_p2n_reg_t CXL_SLBIE_An       = {0x040};
+static const cxl_p2n_reg_t CXL_SLBIA_An       = {0x048};
+static const cxl_p2n_reg_t CXL_SLBI_Select_An = {0x050};
+/* Interrupt Registers */
+static const cxl_p2n_reg_t CXL_PSL_DSISR_An   = {0x060};
+static const cxl_p2n_reg_t CXL_PSL_DAR_An     = {0x068};
+static const cxl_p2n_reg_t CXL_PSL_DSR_An     = {0x070};
+static const cxl_p2n_reg_t CXL_PSL_TFC_An     = {0x078};
+static const cxl_p2n_reg_t CXL_PSL_PEHandle_An = {0x080};
+static const cxl_p2n_reg_t CXL_PSL_ErrStat_An = {0x088};
+/* AFU Registers */
+static const cxl_p2n_reg_t CXL_AFU_Cntl_An    = {0x090};
+static const cxl_p2n_reg_t CXL_AFU_ERR_An     = {0x098};
+/* Work Element Descriptor */
+static const cxl_p2n_reg_t CXL_PSL_WED_An     = {0x0A0};
+/* 0x0C0:FFF Implementation Dependent Area */
+
+#define CXL_PSL_SPAP_Addr 0x0ffffffffffff000ULL
+#define CXL_PSL_SPAP_Size 0x0000000000000ff0ULL
+#define CXL_PSL_SPAP_Size_Shift 4
+#define CXL_PSL_SPAP_V    0x0000000000000001ULL
+
+/****** CXL_PSL_DLCNTL *****************************************************/
+#define CXL_PSL_DLCNTL_D (0x1ull << (63-28))
+#define CXL_PSL_DLCNTL_C (0x1ull << (63-29))
+#define CXL_PSL_DLCNTL_E (0x1ull << (63-30))
+#define CXL_PSL_DLCNTL_S (0x1ull << (63-31))
+#define CXL_PSL_DLCNTL_CE (CXL_PSL_DLCNTL_C | CXL_PSL_DLCNTL_E)
+#define CXL_PSL_DLCNTL_DCES (CXL_PSL_DLCNTL_D | CXL_PSL_DLCNTL_CE | CXL_PSL_DLCNTL_S)
+
+/****** CXL_PSL_SR_An ******************************************************/
+#define CXL_PSL_SR_An_SF  MSR_SF            /* 64bit */
+#define CXL_PSL_SR_An_TA  (1ull << (63-1))  /* Tags active,   GA1: 0 */
+#define CXL_PSL_SR_An_HV  MSR_HV            /* Hypervisor,    GA1: 0 */
+#define CXL_PSL_SR_An_PR  MSR_PR            /* Problem state, GA1: 1 */
+#define CXL_PSL_SR_An_ISL (1ull << (63-53)) /* Ignore Segment Large Page */
+#define CXL_PSL_SR_An_TC  (1ull << (63-54)) /* Page Table secondary hash */
+#define CXL_PSL_SR_An_US  (1ull << (63-56)) /* User state,    GA1: X */
+#define CXL_PSL_SR_An_SC  (1ull << (63-58)) /* Segment Table secondary hash */
+#define CXL_PSL_SR_An_R   MSR_DR            /* Relocate,      GA1: 1 */
+#define CXL_PSL_SR_An_MP  (1ull << (63-62)) /* Master Process */
+#define CXL_PSL_SR_An_LE  (1ull << (63-63)) /* Little Endian */
+
+/****** CXL_PSL_LLCMD_An ****************************************************/
+#define CXL_LLCMD_TERMINATE   0x0001000000000000ULL
+#define CXL_LLCMD_REMOVE      0x0002000000000000ULL
+#define CXL_LLCMD_SUSPEND     0x0003000000000000ULL
+#define CXL_LLCMD_RESUME      0x0004000000000000ULL
+#define CXL_LLCMD_ADD         0x0005000000000000ULL
+#define CXL_LLCMD_UPDATE      0x0006000000000000ULL
+#define CXL_LLCMD_HANDLE_MASK 0x000000000000ffffULL
+
+/****** CXL_PSL_ID_An ****************************************************/
+#define CXL_PSL_ID_An_F	(1ull << (63-31))
+#define CXL_PSL_ID_An_L	(1ull << (63-30))
+
+/****** CXL_PSL_SCNTL_An ****************************************************/
+#define CXL_PSL_SCNTL_An_CR          (0x1ull << (63-15))
+/* Programming Models: */
+#define CXL_PSL_SCNTL_An_PM_MASK     (0xffffull << (63-31))
+#define CXL_PSL_SCNTL_An_PM_Shared   (0x0000ull << (63-31))
+#define CXL_PSL_SCNTL_An_PM_OS       (0x0001ull << (63-31))
+#define CXL_PSL_SCNTL_An_PM_Process  (0x0002ull << (63-31))
+#define CXL_PSL_SCNTL_An_PM_AFU      (0x0004ull << (63-31))
+#define CXL_PSL_SCNTL_An_PM_AFU_PBT  (0x0104ull << (63-31))
+/* Purge Status (ro) */
+#define CXL_PSL_SCNTL_An_Ps_MASK     (0x3ull << (63-39))
+#define CXL_PSL_SCNTL_An_Ps_Pending  (0x1ull << (63-39))
+#define CXL_PSL_SCNTL_An_Ps_Complete (0x3ull << (63-39))
+/* Purge */
+#define CXL_PSL_SCNTL_An_Pc          (0x1ull << (63-48))
+/* Suspend Status (ro) */
+#define CXL_PSL_SCNTL_An_Ss_MASK     (0x3ull << (63-55))
+#define CXL_PSL_SCNTL_An_Ss_Pending  (0x1ull << (63-55))
+#define CXL_PSL_SCNTL_An_Ss_Complete (0x3ull << (63-55))
+/* Suspend Control */
+#define CXL_PSL_SCNTL_An_Sc          (0x1ull << (63-63))
+
+/* AFU Slice Enable Status (ro) */
+#define CXL_AFU_Cntl_An_ES_MASK     (0x7ull << (63-2))
+#define CXL_AFU_Cntl_An_ES_Disabled (0x0ull << (63-2))
+#define CXL_AFU_Cntl_An_ES_Enabled  (0x4ull << (63-2))
+/* AFU Slice Enable */
+#define CXL_AFU_Cntl_An_E           (0x1ull << (63-3))
+/* AFU Slice Reset status (ro) */
+#define CXL_AFU_Cntl_An_RS_MASK     (0x3ull << (63-5))
+#define CXL_AFU_Cntl_An_RS_Pending  (0x1ull << (63-5))
+#define CXL_AFU_Cntl_An_RS_Complete (0x2ull << (63-5))
+/* AFU Slice Reset */
+#define CXL_AFU_Cntl_An_RA          (0x1ull << (63-7))
+
+/****** CXL_SSTP0/1_An ******************************************************/
+/* These top bits are for the segment that CONTAINS the segment table */
+#define CXL_SSTP0_An_B_SHIFT    SLB_VSID_SSIZE_SHIFT
+#define CXL_SSTP0_An_KS             (1ull << (63-2))
+#define CXL_SSTP0_An_KP             (1ull << (63-3))
+#define CXL_SSTP0_An_N              (1ull << (63-4))
+#define CXL_SSTP0_An_L              (1ull << (63-5))
+#define CXL_SSTP0_An_C              (1ull << (63-6))
+#define CXL_SSTP0_An_TA             (1ull << (63-7))
+#define CXL_SSTP0_An_LP_SHIFT                (63-9)  /* 2 Bits */
+/* And finally, the virtual address & size of the segment table: */
+#define CXL_SSTP0_An_SegTableSize_SHIFT      (63-31) /* 12 Bits */
+#define CXL_SSTP0_An_SegTableSize_MASK \
+	(((1ull << 12) - 1) << CXL_SSTP0_An_SegTableSize_SHIFT)
+#define CXL_SSTP0_An_STVA_U_MASK   ((1ull << (63-49))-1)
+#define CXL_SSTP1_An_STVA_L_MASK (~((1ull << (63-55))-1))
+#define CXL_SSTP1_An_V              (1ull << (63-63))
+
+/****** CXL_PSL_SLBIE_[An] **************************************************/
+/* write: */
+#define CXL_SLBIE_C        PPC_BIT(36)         /* Class */
+#define CXL_SLBIE_SS       PPC_BITMASK(37, 38) /* Segment Size */
+#define CXL_SLBIE_SS_SHIFT PPC_BITLSHIFT(38)
+#define CXL_SLBIE_TA       PPC_BIT(38)         /* Tags Active */
+/* read: */
+#define CXL_SLBIE_MAX      PPC_BITMASK(24, 31)
+#define CXL_SLBIE_PENDING  PPC_BITMASK(56, 63)
+
+/****** CXL_SLBIA_[An] ******************************************************/
+#define CXL_SLBIA_P         (1ull) /* Pending (read) */
+
+/****** Common to all PSL_SLBIE/A_[An] registers *****************************/
+#define CXL_SLBI_IQ_ALL     (0ull)              /* Inv qualifier */
+#define CXL_SLBI_IQ_LPID    (1ull)              /* Inv qualifier */
+#define CXL_SLBI_IQ_LPIDPID (3ull)              /* Inv qualifier */
+
+/****** CXL_PSL_DSISR_An ****************************************************/
+#define CXL_PSL_DSISR_An_DS (1ull << (63-0))  /* Segment not found */
+#define CXL_PSL_DSISR_An_DM (1ull << (63-1))  /* PTE not found (See also: M) or protection fault */
+#define CXL_PSL_DSISR_An_ST (1ull << (63-2))  /* Segment Table PTE not found */
+#define CXL_PSL_DSISR_An_UR (1ull << (63-3))  /* AURP PTE not found */
+#define CXL_PSL_DSISR_TRANS (CXL_PSL_DSISR_An_DS | CXL_PSL_DSISR_An_DM | CXL_PSL_DSISR_An_ST | CXL_PSL_DSISR_An_UR)
+#define CXL_PSL_DSISR_An_PE (1ull << (63-4))  /* PSL Error (implementation specific) */
+#define CXL_PSL_DSISR_An_AE (1ull << (63-5))  /* AFU Error */
+#define CXL_PSL_DSISR_An_OC (1ull << (63-6))  /* OS Context Warning */
+/* NOTE: Bits 32:63 are undefined if DSISR[DS] = 1 */
+#define CXL_PSL_DSISR_An_M  DSISR_NOHPTE      /* PTE not found */
+#define CXL_PSL_DSISR_An_P  DSISR_PROTFAULT   /* Storage protection violation */
+#define CXL_PSL_DSISR_An_A  (1ull << (63-37)) /* AFU lock access to write through or cache inhibited storage */
+#define CXL_PSL_DSISR_An_S  DSISR_ISSTORE     /* Access was afu_wr or afu_zero */
+#define CXL_PSL_DSISR_An_K  DSISR_KEYFAULT    /* Access not permitted by virtual page class key protection */
+
+/****** CXL_PSL_TFC_An ******************************************************/
+#define CXL_PSL_TFC_An_A  (1ull << (63-28)) /* Acknowledge non-translation fault */
+#define CXL_PSL_TFC_An_C  (1ull << (63-29)) /* Continue (abort transaction) */
+#define CXL_PSL_TFC_An_AE (1ull << (63-30)) /* Restart PSL with address error */
+#define CXL_PSL_TFC_An_R  (1ull << (63-31)) /* Restart PSL transaction */
+
+/* cxl_process_element->software_status */
+#define CXL_PE_SOFTWARE_STATE_V (1ul << (31 -  0)) /* Valid */
+#define CXL_PE_SOFTWARE_STATE_C (1ul << (31 - 29)) /* Complete */
+#define CXL_PE_SOFTWARE_STATE_S (1ul << (31 - 30)) /* Suspend */
+#define CXL_PE_SOFTWARE_STATE_T (1ul << (31 - 31)) /* Terminate */
+
+/* SPA->sw_command_status */
+#define CXL_SPA_SW_CMD_MASK         0xffff000000000000ULL
+#define CXL_SPA_SW_CMD_TERMINATE    0x0001000000000000ULL
+#define CXL_SPA_SW_CMD_REMOVE       0x0002000000000000ULL
+#define CXL_SPA_SW_CMD_SUSPEND      0x0003000000000000ULL
+#define CXL_SPA_SW_CMD_RESUME       0x0004000000000000ULL
+#define CXL_SPA_SW_CMD_ADD          0x0005000000000000ULL
+#define CXL_SPA_SW_CMD_UPDATE       0x0006000000000000ULL
+#define CXL_SPA_SW_STATE_MASK       0x0000ffff00000000ULL
+#define CXL_SPA_SW_STATE_TERMINATED 0x0000000100000000ULL
+#define CXL_SPA_SW_STATE_REMOVED    0x0000000200000000ULL
+#define CXL_SPA_SW_STATE_SUSPENDED  0x0000000300000000ULL
+#define CXL_SPA_SW_STATE_RESUMED    0x0000000400000000ULL
+#define CXL_SPA_SW_STATE_ADDED      0x0000000500000000ULL
+#define CXL_SPA_SW_STATE_UPDATED    0x0000000600000000ULL
+#define CXL_SPA_SW_PSL_ID_MASK      0x00000000ffff0000ULL
+#define CXL_SPA_SW_LINK_MASK        0x000000000000ffffULL
+
+#define CXL_MAX_SLICES 4
+#define MAX_AFU_MMIO_REGS 3
+
+#define CXL_MODEL_DEDICATED   0x1
+#define CXL_MODEL_DIRECTED    0x2
+#define CXL_MODEL_TIME_SLICED 0x4
+#define CXL_SUPPORTED_MODELS (CXL_MODEL_DEDICATED | CXL_MODEL_DIRECTED)
+
+enum cxl_context_status {
+	CLOSED,
+	OPENED,
+	STARTED
+};
+
+enum prefault_modes {
+	CXL_PREFAULT_NONE,
+	CXL_PREFAULT_WED,
+	CXL_PREFAULT_ALL,
+};
+
+struct cxl_sste {
+	__be64 esid_data;
+	__be64 vsid_data;
+};
+
+#define to_cxl_adapter(d) container_of(d, struct cxl_t, dev)
+#define to_cxl_afu(d) container_of(d, struct cxl_afu_t, dev)
+
+struct cxl_afu_t {
+	irq_hw_number_t psl_hwirq;
+	irq_hw_number_t serr_hwirq;
+	unsigned int serr_virq;
+	void __iomem *p1n_mmio;
+	void __iomem *p2n_mmio;
+	phys_addr_t psn_phys;
+	u64 pp_offset;
+	u64 pp_size;
+	void __iomem *afu_desc_mmio;
+	struct cxl_t *adapter;
+	struct device dev;
+	struct cdev afu_cdev_s, afu_cdev_m;
+	struct device *chardev_s, *chardev_m;
+	struct idr contexts_idr;
+	struct dentry *debugfs;
+	spinlock_t contexts_lock;
+	struct mutex spa_mutex;
+	spinlock_t afu_cntl_lock;
+
+	/* Only the first part of the SPA is used for the process element
+	 * linked list. The only other part that software needs to worry about
+	 * is sw_command_status, which we store a separate pointer to.
+	 * Everything else in the SPA is only used by hardware */
+	struct cxl_process_element *spa;
+	__be64 *sw_command_status;
+	unsigned int spa_size;
+	int spa_order;
+	int spa_max_procs;
+	unsigned int psl_virq;
+
+	int pp_irqs;
+	int irqs_max;
+	int num_procs;
+	int max_procs_virtualised;
+	int slice;
+	int models_supported;
+	int current_model;
+	enum prefault_modes prefault_mode;
+	bool psa;
+	bool pp_psa;
+	bool enabled;
+};
+
+/* This is a cxl context.  If the PSL is in dedicated model, there will be one
+ * of these per AFU.  If in AFU directed there can be lots of these. */
+struct cxl_context_t {
+	struct cxl_afu_t *afu;
+
+	/* Problem state MMIO */
+	phys_addr_t psn_phys;
+	u64 psn_size;
+
+	spinlock_t sst_lock; /* Protects segment table */
+	struct cxl_sste *sstp;
+	unsigned int sst_size, sst_lru;
+
+	wait_queue_head_t wq;
+	struct pid *pid;
+	spinlock_t lock; /* Protects pending_irq_mask, pending_fault and fault_addr */
+	/* Only used in PR mode */
+	u64 process_token;
+
+	unsigned long *irq_bitmap; /* Accessed from IRQ context */
+	struct cxl_irq_ranges irqs;
+	u64 fault_addr;
+	u64 afu_err;
+	enum cxl_context_status status;
+
+
+	/* XXX: Is it possible to need multiple work items at once? */
+	struct work_struct fault_work;
+	u64 dsisr;
+	u64 dar;
+
+	struct cxl_process_element *elem;
+
+	int ph; /* process handle/process element index */
+	u32 irq_count;
+	bool pe_inserted;
+	bool master;
+	bool kernel;
+	bool pending_irq;
+	bool pending_fault;
+	bool pending_afu_err;
+};
+
+struct cxl_t {
+	void __iomem *p1_mmio;
+	void __iomem *p2_mmio;
+	irq_hw_number_t err_hwirq;
+	unsigned int err_virq;
+	struct cxl_driver_ops *driver;
+	spinlock_t afu_list_lock;
+	struct cxl_afu_t *afu[CXL_MAX_SLICES];
+	struct device dev;
+	struct dentry *trace;
+	struct dentry *psl_err_chk;
+	struct dentry *debugfs;
+	struct bin_attribute cxl_attr;
+	int adapter_num;
+	int user_irqs;
+	u64 afu_desc_off;
+	u64 afu_desc_size;
+	u64 ps_off;
+	u64 ps_size;
+	u16 psl_rev;
+	u16 base_image;
+	u8 vsec_status;
+	u8 caia_major;
+	u8 caia_minor;
+	u8 slices;
+	bool user_image_loaded;
+	bool perst_loads_image;
+	bool perst_select_user;
+};
+
+struct cxl_driver_ops {
+	struct module *module;
+	int (*alloc_one_irq)(struct cxl_t *adapter);
+	void (*release_one_irq)(struct cxl_t *adapter, int hwirq);
+	int (*alloc_irq_ranges)(struct cxl_irq_ranges *irqs, struct cxl_t *adapter, unsigned int num);
+	void (*release_irq_ranges)(struct cxl_irq_ranges *irqs, struct cxl_t *adapter);
+	int (*setup_irq)(struct cxl_t *adapter, unsigned int hwirq, unsigned int virq);
+};
+
+/* common == phyp + powernv */
+struct cxl_process_element_common {
+	__be32 tid;
+	__be32 pid;
+	__be64 csrp;
+	__be64 aurp0;
+	__be64 aurp1;
+	__be64 sstp0;
+	__be64 sstp1;
+	__be64 amr;
+	u8     reserved3[4];
+	__be64 wed;
+} __packed;
+
+/* just powernv */
+struct cxl_process_element {
+	__be64 sr;
+	__be64 SPOffset;
+	__be64 sdr;
+	__be64 haurp;
+	__be32 ctxtime;
+	__be16 ivte_offsets[4];
+	__be16 ivte_ranges[4];
+	__be32 lpid;
+	struct cxl_process_element_common common;
+	__be32 software_state;
+} __packed;
+
+#define _cxl_reg_write(addr, val) \
+	out_be64((u64 __iomem *)(addr), val)
+#define _cxl_reg_read(addr) \
+	in_be64((u64 __iomem *)(addr))
+
+static inline void __iomem *_cxl_p1_addr(struct cxl_t *cxl, cxl_p1_reg_t reg)
+{
+	WARN_ON(!cpu_has_feature(CPU_FTR_HVMODE));
+	return cxl->p1_mmio + cxl_reg_off(reg);
+}
+#define cxl_p1_write(cxl, reg, val) \
+	_cxl_reg_write(_cxl_p1_addr(cxl, reg), val)
+#define cxl_p1_read(cxl, reg) \
+	_cxl_reg_read(_cxl_p1_addr(cxl, reg))
+
+static inline void __iomem *_cxl_p1n_addr(struct cxl_afu_t *afu, cxl_p1n_reg_t reg)
+{
+	WARN_ON(!cpu_has_feature(CPU_FTR_HVMODE));
+	return afu->p1n_mmio + cxl_reg_off(reg);
+}
+#define cxl_p1n_write(afu, reg, val) \
+	_cxl_reg_write(_cxl_p1n_addr(afu, reg), val)
+#define cxl_p1n_read(afu, reg) \
+	_cxl_reg_read(_cxl_p1n_addr(afu, reg))
+
+static inline void __iomem *_cxl_p2n_addr(struct cxl_afu_t *afu, cxl_p2n_reg_t reg)
+{
+	return afu->p2n_mmio + cxl_reg_off(reg);
+}
+#define cxl_p2n_write(afu, reg, val) \
+	_cxl_reg_write(_cxl_p2n_addr(afu, reg), val)
+#define cxl_p2n_read(afu, reg) \
+	_cxl_reg_read(_cxl_p2n_addr(afu, reg))
+
+struct cxl_calls {
+	void (*cxl_slbia)(struct mm_struct *mm);
+	struct module *owner;
+};
+int register_cxl_calls(struct cxl_calls *calls);
+void unregister_cxl_calls(struct cxl_calls *calls);
+
+int cxl_alloc_adapter_nr(struct cxl_t *adapter);
+void cxl_remove_adapter_nr(struct cxl_t *adapter);
+
+int cxl_file_init(void);
+void cxl_file_exit(void);
+int cxl_register_adapter(struct cxl_t *adapter);
+int cxl_register_afu(struct cxl_afu_t *afu);
+int cxl_chardev_m_afu_add(struct cxl_afu_t *afu);
+int cxl_chardev_s_afu_add(struct cxl_afu_t *afu);
+void cxl_chardev_afu_remove(struct cxl_afu_t *afu);
+
+void cxl_context_detach_all(struct cxl_afu_t *afu);
+void cxl_context_free(struct cxl_context_t *ctx);
+void cxl_context_detach(struct cxl_context_t *ctx);
+
+int cxl_sysfs_adapter_add(struct cxl_t *adapter);
+void cxl_sysfs_adapter_remove(struct cxl_t *adapter);
+int cxl_sysfs_afu_add(struct cxl_afu_t *afu);
+void cxl_sysfs_afu_remove(struct cxl_afu_t *afu);
+
+int cxl_afu_activate_model(struct cxl_afu_t *afu, int model);
+int _cxl_afu_deactivate_model(struct cxl_afu_t *afu, int model);
+int cxl_afu_deactivate_model(struct cxl_afu_t *afu);
+int cxl_afu_select_best_model(struct cxl_afu_t *afu);
+
+unsigned int cxl_map_irq(struct cxl_t *adapter, irq_hw_number_t hwirq,
+		         irq_handler_t handler, void *cookie);
+void cxl_unmap_irq(unsigned int virq, void *cookie);
+int cxl_register_psl_irq(struct cxl_afu_t *afu);
+void cxl_release_psl_irq(struct cxl_afu_t *afu);
+int cxl_register_psl_err_irq(struct cxl_t *adapter);
+void cxl_release_psl_err_irq(struct cxl_t *adapter);
+int cxl_register_serr_irq(struct cxl_afu_t *afu);
+void cxl_release_serr_irq(struct cxl_afu_t *afu);
+int afu_register_irqs(struct cxl_context_t *ctx, u32 count);
+void afu_release_irqs(struct cxl_context_t *ctx);
+irqreturn_t cxl_slice_irq_err(int irq, void *data);
+
+int cxl_debugfs_init(void);
+void cxl_debugfs_exit(void);
+int cxl_debugfs_adapter_add(struct cxl_t *adapter);
+void cxl_debugfs_adapter_remove(struct cxl_t *adapter);
+int cxl_debugfs_afu_add(struct cxl_afu_t *afu);
+void cxl_debugfs_afu_remove(struct cxl_afu_t *afu);
+
+void cxl_handle_fault(struct work_struct *work);
+void cxl_prefault(struct cxl_context_t *ctx, u64 wed);
+
+struct cxl_t *get_cxl_adapter(int num);
+int cxl_alloc_sst(struct cxl_context_t *ctx, u64 *sstp0, u64 *sstp1);
+
+void init_cxl_native(void);
+
+struct cxl_context_t *cxl_context_alloc(void);
+int cxl_context_init(struct cxl_context_t *ctx, struct cxl_afu_t *afu, bool master);
+void cxl_context_free(struct cxl_context_t *ctx);
+int cxl_context_iomap(struct cxl_context_t *ctx, struct vm_area_struct *vma);
+
+/* This matches the layout of the H_COLLECT_CA_INT_INFO retbuf */
+struct cxl_irq_info {
+	u64 dsisr;
+	u64 dar;
+	u64 dsr;
+	u32 pid;
+	u32 tid;
+	u64 afu_err;
+	u64 errstat;
+	u64 padding[3]; /* to match the expected retbuf size for plpar_hcall9 */
+};
+
+struct cxl_backend_ops {
+	int (*attach_process)(struct cxl_context_t *ctx, bool kernel, u64 wed,
+			    u64 amr);
+	int (*detach_process)(struct cxl_context_t *ctx);
+
+	int (*get_irq)(struct cxl_context_t *ctx, struct cxl_irq_info *info);
+	int (*ack_irq)(struct cxl_context_t *ctx, u64 tfc, u64 psl_reset_mask);
+
+	int (*check_error)(struct cxl_afu_t *afu);
+	void (*slbia)(struct cxl_afu_t *afu);
+	int (*afu_reset)(struct cxl_afu_t *afu);
+};
+extern const struct cxl_backend_ops *cxl_ops;
+
+void cxl_stop_trace(struct cxl_t *cxl);
+
+#endif
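
The register definitions in cxl.h above use IBM bit numbering, where bit 0 is the most significant bit of a 64-bit register; that is why masks are written as `1ull << (63-n)`. Below is a minimal userspace sketch of that convention and of how the `sw_command_status` fields split apart. The helpers `ibm_bit()`, `spa_sw_cmd()`, `spa_sw_state()`, `spa_sw_link()` and `spa_cmd_complete()` are hypothetical names introduced for illustration; only the constants are copied from the header. The completion check assumes the state field mirrors the command field shifted right by 16, which the constants suggest but this part of the patch does not show.

```c
#include <stdint.h>

/* Hypothetical helper: mask for IBM-numbered bit n (bit 0 = MSB) of a
 * 64-bit register, matching the "1ull << (63-n)" idiom in cxl.h. */
static inline uint64_t ibm_bit(unsigned int n)
{
	return 1ull << (63 - n);
}

/* Constants copied from cxl.h in this patch. */
#define CXL_PSL_DSISR_An_DS         (1ull << (63-0))
#define CXL_PSL_DSISR_An_DM         (1ull << (63-1))
#define CXL_SPA_SW_CMD_MASK         0xffff000000000000ULL
#define CXL_SPA_SW_CMD_ADD          0x0005000000000000ULL
#define CXL_SPA_SW_STATE_MASK       0x0000ffff00000000ULL
#define CXL_SPA_SW_STATE_ADDED      0x0000000500000000ULL
#define CXL_SPA_SW_LINK_MASK        0x000000000000ffffULL

/* Hypothetical helpers: split a CPU-endian sw_command_status value. */
static inline uint64_t spa_sw_cmd(uint64_t status)
{
	return status & CXL_SPA_SW_CMD_MASK;
}

static inline uint64_t spa_sw_state(uint64_t status)
{
	return status & CXL_SPA_SW_STATE_MASK;
}

static inline unsigned int spa_sw_link(uint64_t status)
{
	return (unsigned int)(status & CXL_SPA_SW_LINK_MASK);
}

/* Assumption (not shown in this section): a command is complete once the
 * state field equals the command field moved down by 16 bits. */
static inline int spa_cmd_complete(uint64_t status)
{
	return (spa_sw_cmd(status) >> 16) == spa_sw_state(status);
}
```

With these helpers, `ibm_bit(1)` yields the same mask as `CXL_PSL_DSISR_An_DM`, and a status word of `CXL_SPA_SW_CMD_ADD | CXL_SPA_SW_STATE_ADDED | handle` reports as complete. The real structure stores the word as `__be64`, so a driver would convert it with `be64_to_cpu()` before applying the masks.
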
diff --git a/drivers/misc/cxl/debugfs.c b/drivers/misc/cxl/debugfs.c
new file mode 100644
index 0000000..f4d148c
--- /dev/null
+++ b/drivers/misc/cxl/debugfs.c
@@ -0,0 +1,116 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/debugfs.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+
+#include "cxl.h"
+
+struct dentry *cxl_debugfs;
+
+void cxl_stop_trace(struct cxl_t *adapter)
+{
+	int slice;
+
+	/* Stop the trace */
+	cxl_p1_write(adapter, CXL_PSL_TRACE, 0x8000000000000017LL);
+
+	/* Stop the slice traces */
+	spin_lock(&adapter->afu_list_lock);
+	for (slice = 0; slice < adapter->slices; slice++) {
+		if (adapter->afu[slice])
+			cxl_p1n_write(adapter->afu[slice], CXL_PSL_SLICE_TRACE, 0x8000000000000000LL);
+	}
+	spin_unlock(&adapter->afu_list_lock);
+}
+
+int cxl_debugfs_adapter_add(struct cxl_t *adapter)
+{
+	struct dentry *dir;
+	char buf[32];
+
+	if (!cxl_debugfs)
+		return -ENODEV;
+
+	snprintf(buf, 32, "card%i", adapter->adapter_num);
+	dir = debugfs_create_dir(buf, cxl_debugfs);
+	if (IS_ERR(dir))
+		return PTR_ERR(dir);
+	adapter->debugfs = dir;
+
+	debugfs_create_x64("fir1",     S_IRUSR, dir, _cxl_p1_addr(adapter, CXL_PSL_FIR1));
+	debugfs_create_x64("fir2",     S_IRUSR, dir, _cxl_p1_addr(adapter, CXL_PSL_FIR2));
+	debugfs_create_x64("fir_cntl", S_IRUSR, dir, _cxl_p1_addr(adapter, CXL_PSL_FIR_CNTL));
+	debugfs_create_x64("err_ivte", S_IRUSR, dir, _cxl_p1_addr(adapter, CXL_PSL_ErrIVTE));
+
+	debugfs_create_x64("trace", S_IRUSR | S_IWUSR, dir, _cxl_p1_addr(adapter, CXL_PSL_TRACE));
+
+	return 0;
+}
+EXPORT_SYMBOL(cxl_debugfs_adapter_add);
+
+void cxl_debugfs_adapter_remove(struct cxl_t *adapter)
+{
+	debugfs_remove_recursive(adapter->debugfs);
+}
+EXPORT_SYMBOL(cxl_debugfs_adapter_remove);
+
+int cxl_debugfs_afu_add(struct cxl_afu_t *afu)
+{
+	struct dentry *dir;
+	char buf[32];
+
+	if (!afu->adapter->debugfs)
+		return -ENODEV;
+
+	snprintf(buf, 32, "psl%i.%i", afu->adapter->adapter_num, afu->slice);
+	dir = debugfs_create_dir(buf, afu->adapter->debugfs);
+	if (IS_ERR(dir))
+		return PTR_ERR(dir);
+	afu->debugfs = dir;
+
+	debugfs_create_x64("fir",        S_IRUSR, dir, _cxl_p1n_addr(afu, CXL_PSL_FIR_SLICE_An));
+	debugfs_create_x64("serr",       S_IRUSR, dir, _cxl_p1n_addr(afu, CXL_PSL_SERR_An));
+	debugfs_create_x64("afu_debug",  S_IRUSR, dir, _cxl_p1n_addr(afu, CXL_AFU_DEBUG_An));
+	debugfs_create_x64("sr",         S_IRUSR, dir, _cxl_p1n_addr(afu, CXL_PSL_SR_An));
+
+	debugfs_create_x64("dsisr",      S_IRUSR, dir, _cxl_p2n_addr(afu, CXL_PSL_DSISR_An));
+	debugfs_create_x64("dar",        S_IRUSR, dir, _cxl_p2n_addr(afu, CXL_PSL_DAR_An));
+	debugfs_create_x64("sstp0",      S_IRUSR, dir, _cxl_p2n_addr(afu, CXL_SSTP0_An));
+	debugfs_create_x64("sstp1",      S_IRUSR, dir, _cxl_p2n_addr(afu, CXL_SSTP1_An));
+	debugfs_create_x64("err_status", S_IRUSR, dir, _cxl_p2n_addr(afu, CXL_PSL_ErrStat_An));
+
+	debugfs_create_x64("trace", S_IRUSR | S_IWUSR, dir, _cxl_p1n_addr(afu, CXL_PSL_SLICE_TRACE));
+
+	return 0;
+}
+EXPORT_SYMBOL(cxl_debugfs_afu_add);
+
+void cxl_debugfs_afu_remove(struct cxl_afu_t *afu)
+{
+	debugfs_remove_recursive(afu->debugfs);
+}
+EXPORT_SYMBOL(cxl_debugfs_afu_remove);
+
+int __init cxl_debugfs_init(void)
+{
+	struct dentry *ent;
+	ent = debugfs_create_dir("cxl", NULL);
+	if (IS_ERR(ent))
+		return PTR_ERR(ent);
+	cxl_debugfs = ent;
+
+	return 0;
+}
+
+void cxl_debugfs_exit(void)
+{
+	debugfs_remove_recursive(cxl_debugfs);
+}
diff --git a/drivers/misc/cxl/fault.c b/drivers/misc/cxl/fault.c
new file mode 100644
index 0000000..f729c4a
--- /dev/null
+++ b/drivers/misc/cxl/fault.c
@@ -0,0 +1,298 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/workqueue.h>
+#include <linux/sched.h>
+#include <linux/pid.h>
+#include <linux/mm.h>
+#include <linux/moduleparam.h>
+
+#undef MODULE_PARAM_PREFIX
+#define MODULE_PARAM_PREFIX "cxl" "."
+#include <asm/current.h>
+#include <asm/copro.h>
+#include <asm/mmu.h>
+
+#include "cxl.h"
+
+bool cxl_fault_debug = false;
+
+static struct cxl_sste* find_free_sste(struct cxl_sste *primary_group,
+				       bool sec_hash,
+				       struct cxl_sste *secondary_group,
+				       unsigned int *lru)
+{
+	unsigned int i, entry;
+	struct cxl_sste *sste, *group = primary_group;
+
+	for (i = 0; i < 2; i++) {
+		for (entry = 0; entry < 8; entry++) {
+			sste = group + entry;
+			if (!(be64_to_cpu(sste->esid_data) & SLB_ESID_V))
+				return sste;
+		}
+		if (!sec_hash)
+			break;
+		group = secondary_group;
+	}
+	/* Nothing free, select an entry to cast out */
+	if (sec_hash && (*lru & 0x8))
+		sste = secondary_group + (*lru & 0x7);
+	else
+		sste = primary_group + (*lru & 0x7);
+	*lru = (*lru + 1) & 0xf;
+
+	return sste;
+}
+
+static void cxl_load_segment(struct cxl_context_t *ctx, u64 esid_data,
+			     u64 vsid_data)
+{
+	/* mask selects the group index; we search primary and secondary here. */
+	unsigned int mask = (ctx->sst_size >> 7)-1; /* SSTP0[SegTableSize] */
+	bool sec_hash = 1;
+	struct cxl_sste *sste;
+	unsigned int hash;
+
+	WARN_ON_SMP(!spin_is_locked(&ctx->sst_lock));
+
+	sec_hash = !!(cxl_p1n_read(ctx->afu, CXL_PSL_SR_An) & CXL_PSL_SR_An_SC);
+
+	if (vsid_data & SLB_VSID_B_1T)
+		hash = (esid_data >> SID_SHIFT_1T) & mask;
+	else /* 256M */
+		hash = (esid_data >> SID_SHIFT) & mask;
+
+	sste = find_free_sste(ctx->sstp + (hash << 3), sec_hash,
+			      ctx->sstp + ((~hash & mask) << 3), &ctx->sst_lru);
+
+	pr_devel("CXL Populating SST[%li]: %#llx %#llx\n",
+			sste - ctx->sstp, vsid_data, esid_data);
+
+	sste->vsid_data = cpu_to_be64(vsid_data);
+	sste->esid_data = cpu_to_be64(esid_data);
+}
+
+static int cxl_fault_segment(struct cxl_context_t *ctx, struct mm_struct *mm,
+			     u64 ea)
+{
+	u64 vsid_data = 0, esid_data = 0;
+	unsigned long flags;
+	int rc;
+
+	spin_lock_irqsave(&ctx->sst_lock, flags);
+	if (!(rc = copro_data_segment(mm, ea, &esid_data, &vsid_data))) {
+		cxl_load_segment(ctx, esid_data, vsid_data);
+	}
+	spin_unlock_irqrestore(&ctx->sst_lock, flags);
+
+	return rc;
+}
+
+static void cxl_ack_ae(struct cxl_context_t *ctx)
+{
+	unsigned long flags;
+
+	cxl_ops->ack_irq(ctx, CXL_PSL_TFC_An_AE, 0);
+
+	spin_lock_irqsave(&ctx->lock, flags);
+	ctx->pending_fault = true;
+	ctx->fault_addr = ctx->dar;
+	spin_unlock_irqrestore(&ctx->lock, flags);
+
+	wake_up_all(&ctx->wq);
+}
+
+static int cxl_handle_segment_miss(struct cxl_context_t *ctx,
+				   struct mm_struct *mm, u64 ea)
+{
+	int rc;
+
+	pr_devel("CXL interrupt: Segment fault pe: %i ea: %#llx\n", ctx->ph, ea);
+
+	if ((rc = cxl_fault_segment(ctx, mm, ea)))
+		cxl_ack_ae(ctx);
+	else {
+
+		mb(); /* Order seg table write to TFC MMIO write */
+		cxl_ops->ack_irq(ctx, CXL_PSL_TFC_An_R, 0);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void cxl_handle_page_fault(struct cxl_context_t *ctx,
+				  struct mm_struct *mm, u64 dsisr, u64 dar)
+{
+	unsigned flt = 0;
+	int result;
+	unsigned long access, flags;
+
+	if ((result = copro_handle_mm_fault(mm, dar, dsisr, &flt))) {
+		pr_devel("copro_handle_mm_fault failed: %#x\n", result);
+		return cxl_ack_ae(ctx);
+	}
+
+	/*
+	 * update_mmu_cache() will not have loaded the hash since current->trap
+	 * is not a 0x400 or 0x300, so just call hash_page_mm() here.
+	 */
+	access = _PAGE_PRESENT;
+	if (dsisr & CXL_PSL_DSISR_An_S)
+		access |= _PAGE_RW;
+	if ((!ctx->kernel) || !(dar & (1ULL << 63)))
+		access |= _PAGE_USER;
+	local_irq_save(flags);
+	hash_page_mm(mm, dar, access, 0x300);
+	local_irq_restore(flags);
+
+	pr_devel("Page fault successfully handled for pe: %i!\n", ctx->ph);
+	cxl_ops->ack_irq(ctx, CXL_PSL_TFC_An_R, 0);
+}
+
+void cxl_handle_fault(struct work_struct *fault_work)
+{
+	struct cxl_context_t *ctx =
+		container_of(fault_work, struct cxl_context_t, fault_work);
+	u64 dsisr = ctx->dsisr;
+	u64 dar = ctx->dar;
+	struct task_struct *task;
+	struct mm_struct *mm;
+
+	if (cxl_p2n_read(ctx->afu, CXL_PSL_DSISR_An) != dsisr ||
+	    cxl_p2n_read(ctx->afu, CXL_PSL_DAR_An) != dar ||
+	    cxl_p2n_read(ctx->afu, CXL_PSL_PEHandle_An) != ctx->ph) {
+		/* Most likely explanation is harmless - a dedicated process
+		 * has detached and these were cleared by the PSL purge, but
+		 * warn about it just in case */
+		dev_notice(&ctx->afu->dev, "cxl_handle_fault: Translation fault regs changed\n");
+		return;
+	}
+
+	pr_devel("CXL BOTTOM HALF handling fault for afu pe: %i. "
+		"DSISR: %#llx DAR: %#llx\n", ctx->ph, dsisr, dar);
+
+	if (!(task = get_pid_task(ctx->pid, PIDTYPE_PID))) {
+		pr_devel("cxl_handle_fault unable to get task %i\n",
+			 pid_nr(ctx->pid));
+		cxl_ack_ae(ctx);
+		return;
+	}
+	if (!(mm = get_task_mm(task))) {
+		pr_devel("cxl_handle_fault unable to get mm %i\n",
+			 pid_nr(ctx->pid));
+		cxl_ack_ae(ctx);
+		goto out;
+	}
+
+	if (dsisr & CXL_PSL_DSISR_An_DS)
+		cxl_handle_segment_miss(ctx, mm, dar);
+	else if (dsisr & CXL_PSL_DSISR_An_DM)
+		cxl_handle_page_fault(ctx, mm, dsisr, dar);
+	else
+		WARN(1, "cxl_handle_fault has nothing to handle\n");
+
+	mmput(mm);
+out:
+	put_task_struct(task);
+}
+
+static void cxl_prefault_one(struct cxl_context_t *ctx, u64 ea)
+{
+	int rc;
+	struct task_struct *task;
+	struct mm_struct *mm;
+
+	if (!(task = get_pid_task(ctx->pid, PIDTYPE_PID))) {
+		pr_devel("cxl_prefault_one unable to get task %i\n",
+			 pid_nr(ctx->pid));
+		return;
+	}
+	if (!(mm = get_task_mm(task))) {
+		pr_devel("cxl_prefault_one unable to get mm %i\n",
+			 pid_nr(ctx->pid));
+		put_task_struct(task);
+		return;
+	}
+
+	rc = cxl_fault_segment(ctx, mm, ea);
+
+	mmput(mm);
+	put_task_struct(task);
+}
+
+static u64 next_segment(u64 ea, u64 vsid_data)
+{
+	if (vsid_data & SLB_VSID_B_1T)
+		ea |= (1ULL << 40) - 1;
+	else
+		ea |= (1ULL << 28) - 1;
+
+	return ea + 1;
+}
+
+static void cxl_prefault_vma(struct cxl_context_t *ctx)
+{
+	u64 ea, vsid_data = 0, esid_data, last_esid_data = 0;
+	struct vm_area_struct *vma;
+	int rc;
+	struct task_struct *task;
+	struct mm_struct *mm;
+	unsigned long flags;
+
+	if (!(task = get_pid_task(ctx->pid, PIDTYPE_PID))) {
+		pr_devel("cxl_prefault_vma unable to get task %i\n",
+			 pid_nr(ctx->pid));
+		return;
+	}
+	if (!(mm = get_task_mm(task))) {
+		pr_devel("cxl_prefault_vma unable to get mm %i\n",
+			 pid_nr(ctx->pid));
+		goto out1;
+	}
+
+	down_read(&mm->mmap_sem); /* may sleep, so take it before the spinlock */
+	spin_lock_irqsave(&ctx->sst_lock, flags);
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		for (ea = vma->vm_start; ea < vma->vm_end;
+				ea = next_segment(ea, vsid_data)) {
+			rc = copro_data_segment(mm, ea, &esid_data, &vsid_data);
+			if (rc)
+				continue;
+
+			if (last_esid_data == esid_data)
+				continue;
+
+			cxl_load_segment(ctx, esid_data, vsid_data);
+			last_esid_data = esid_data;
+		}
+	}
+	spin_unlock_irqrestore(&ctx->sst_lock, flags);
+	up_read(&mm->mmap_sem);
+
+	mmput(mm);
+out1:
+	put_task_struct(task);
+}
+
+void cxl_prefault(struct cxl_context_t *ctx, u64 wed)
+{
+	switch (ctx->afu->prefault_mode) {
+	case CXL_PREFAULT_WED:
+		cxl_prefault_one(ctx, wed);
+		break;
+	case CXL_PREFAULT_ALL:
+		cxl_prefault_vma(ctx);
+		break;
+	default:
+		break;
+	}
+}
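
The segment-table code in fault.c above hashes an ESID to a group of eight entries, looks for an invalid slot in the primary group (and in the secondary group when secondary hashing is enabled), and otherwise casts out an entry chosen by a 4-bit round-robin counter. Below is a minimal userspace sketch of that selection logic. It assumes CPU-endian entries rather than `__be64`, uses `SSTE_VALID` as a stand-in for `SLB_ESID_V`, and the names `struct sste`, `sst_group_mask()`, `pick_sste()` and `sst_insert()` are hypothetical, introduced only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SSTE_VALID (1ull << 27) /* stand-in for SLB_ESID_V */

/* CPU-endian stand-in for struct cxl_sste (16 bytes per entry). */
struct sste {
	uint64_t esid_data;
	uint64_t vsid_data;
};

/* A group is 8 entries * 16 bytes = 128 bytes, hence the shift by 7. */
static unsigned int sst_group_mask(unsigned int sst_size_bytes)
{
	return (sst_size_bytes >> 7) - 1;
}

/* Mirrors find_free_sste(): prefer an invalid slot, else cast one out. */
static struct sste *pick_sste(struct sste *primary, int sec_hash,
			      struct sste *secondary, unsigned int *lru)
{
	struct sste *group = primary, *slot;
	unsigned int i, entry;

	for (i = 0; i < 2; i++) {
		for (entry = 0; entry < 8; entry++) {
			slot = group + entry;
			if (!(slot->esid_data & SSTE_VALID))
				return slot;
		}
		if (!sec_hash)
			break;
		group = secondary;
	}

	/* Nothing free: bit 3 of the counter picks the group, bits 0-2 the entry. */
	if (sec_hash && (*lru & 0x8))
		slot = secondary + (*lru & 0x7);
	else
		slot = primary + (*lru & 0x7);
	*lru = (*lru + 1) & 0xf;

	return slot;
}

/* Mirrors cxl_load_segment(); returns the index of the slot that was written. */
static size_t sst_insert(struct sste *table, unsigned int sst_size_bytes,
			 uint64_t esid_data, uint64_t vsid_data,
			 unsigned int sid_shift, int sec_hash, unsigned int *lru)
{
	unsigned int mask = sst_group_mask(sst_size_bytes);
	unsigned int hash = (unsigned int)(esid_data >> sid_shift) & mask;
	struct sste *slot = pick_sste(table + (hash << 3), sec_hash,
				      table + ((~hash & mask) << 3), lru);

	slot->vsid_data = vsid_data;
	slot->esid_data = esid_data;
	return (size_t)(slot - table);
}
```

For a 256-byte table (two groups), an ESID whose hash is 1 fills entries 8 to 15 first, then spills into entries 0 to 7 when secondary hashing is on, and only after that starts overwriting entries in counter order. The secondary group index is simply the bitwise complement of the primary hash under the same mask.
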
diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
new file mode 100644
index 0000000..fb87ce3
--- /dev/null
+++ b/drivers/misc/cxl/file.c
@@ -0,0 +1,503 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/spinlock.h>
+#include <linux/module.h>
+#include <linux/export.h>
+#include <linux/kernel.h>
+#include <linux/bitmap.h>
+#include <linux/sched.h>
+#include <linux/poll.h>
+#include <linux/pid.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <asm/cputable.h>
+#include <asm/current.h>
+#include <asm/copro.h>
+
+#include "cxl.h"
+
+#define CXL_NUM_MINORS 256 /* Total to reserve */
+#define CXL_DEV_MINORS 9   /* 1 control + 4 AFUs * 2 (master/shared) */
+
+#define CXL_CARD_MINOR(adapter) (adapter->adapter_num * CXL_DEV_MINORS)
+#define CXL_AFU_MINOR(afu) (CXL_CARD_MINOR(afu->adapter) + 1 + (2 * afu->slice))
+#define CXL_AFU_MINOR_M(afu) (CXL_AFU_MINOR(afu) + 1)
+#define CXL_AFU_MKDEV(afu) MKDEV(MAJOR(cxl_dev), CXL_AFU_MINOR(afu))
+#define CXL_AFU_MKDEV_M(afu) MKDEV(MAJOR(cxl_dev), CXL_AFU_MINOR_M(afu))
+
+#define CXL_DEVT_ADAPTER(dev) (MINOR(dev) / CXL_DEV_MINORS)
+#define CXL_DEVT_AFU(dev) ((MINOR(dev) % CXL_DEV_MINORS - 1) / 2)
+
+#define CXL_DEVT_IS_CARD(dev) (MINOR(dev) % CXL_DEV_MINORS == 0)
+#define CXL_DEVT_IS_AFU(dev) (!CXL_DEVT_IS_CARD(dev))
+#define _CXL_DEVT_IS_AFU_S(dev) (((MINOR(dev) % CXL_DEV_MINORS) % 2) == 1)
+#define CXL_DEVT_IS_AFU_S(dev) (!CXL_DEVT_IS_CARD(dev) && _CXL_DEVT_IS_AFU_S(dev))
+#define CXL_DEVT_IS_AFU_M(dev) (!CXL_DEVT_IS_CARD(dev) && !_CXL_DEVT_IS_AFU_S(dev))
+
+dev_t cxl_dev;
+
+struct class *cxl_class;
+EXPORT_SYMBOL(cxl_class);
+
+static int __afu_open(struct inode *inode, struct file *file, bool master)
+{
+	struct cxl_t *adapter;
+	struct cxl_afu_t *afu;
+	struct cxl_context_t *ctx;
+	int adapter_num = CXL_DEVT_ADAPTER(inode->i_rdev);
+	int slice = CXL_DEVT_AFU(inode->i_rdev);
+	int rc = -ENODEV;
+
+	pr_devel("afu_open afu%i.%i\n", adapter_num, slice);
+
+	if (!(adapter = get_cxl_adapter(adapter_num)))
+		return -ENODEV;
+
+	if (!try_module_get(adapter->driver->module))
+		goto err_put_adapter;
+
+	if (slice >= adapter->slices)
+		goto err_put_module;
+
+	spin_lock(&adapter->afu_list_lock);
+	if (!(afu = adapter->afu[slice])) {
+		spin_unlock(&adapter->afu_list_lock);
+		goto err_put_module;
+	}
+	get_device(&afu->dev);
+	spin_unlock(&adapter->afu_list_lock);
+
+	if (!afu->current_model)
+		goto err_put_afu;
+
+	if (!(ctx = cxl_context_alloc())) {
+		rc = -ENOMEM;
+		goto err_put_afu;
+	}
+
+	if ((rc = cxl_context_init(ctx, afu, master)))
+		goto err_put_afu;
+
+	pr_devel("afu_open pe: %i\n", ctx->ph);
+	file->private_data = ctx;
+	cxl_ctx_get();
+
+	/* Our ref on the AFU will now hold the adapter */
+	put_device(&adapter->dev);
+
+	return 0;
+
+err_put_afu:
+	put_device(&afu->dev);
+err_put_module:
+	module_put(adapter->driver->module);
+err_put_adapter:
+	put_device(&adapter->dev);
+	return rc;
+}
+static int afu_open(struct inode *inode, struct file *file)
+{
+	return __afu_open(inode, file, false);
+}
+
+static int afu_master_open(struct inode *inode, struct file *file)
+{
+	return __afu_open(inode, file, true);
+}
+
+static int afu_release(struct inode *inode, struct file *file)
+{
+	struct cxl_context_t *ctx = file->private_data;
+
+	pr_devel("%s: closing cxl file descriptor. pe: %i\n",
+		 __func__, ctx->ph);
+	cxl_context_detach(ctx);
+
+	module_put(ctx->afu->adapter->driver->module);
+
+	put_device(&ctx->afu->dev);
+
+	/* It should be safe to remove the context now */
+	cxl_context_free(ctx);
+
+	cxl_ctx_put();
+	return 0;
+}
+
+static long afu_ioctl_start_work(struct cxl_context_t *ctx,
+		     struct cxl_ioctl_start_work __user *uwork)
+{
+	struct cxl_ioctl_start_work work;
+	u64 amr;
+	int rc;
+
+	pr_devel("afu_ioctl: pe: %i CXL_START_WORK\n", ctx->ph);
+
+	if (ctx->status != OPENED)
+		return -EIO;
+
+	if (copy_from_user(&work, uwork,
+			   sizeof(struct cxl_ioctl_start_work)))
+		return -EFAULT;
+
+	if (work.reserved1 || work.reserved2 || work.reserved3 ||
+	    work.reserved4 || work.reserved5 || work.reserved6)
+		return -EINVAL;
+
+	if (work.num_interrupts == -1)
+		work.num_interrupts = ctx->afu->pp_irqs;
+	else if ((work.num_interrupts < ctx->afu->pp_irqs) ||
+		 (work.num_interrupts > ctx->afu->irqs_max))
+		return -EINVAL;
+	if ((rc = afu_register_irqs(ctx, work.num_interrupts)))
+		return rc;
+
+	amr = work.amr & mfspr(SPRN_UAMOR);
+
+	work.process_element = ctx->ph;
+
+	/* Returns PE and number of interrupts */
+	if (copy_to_user(uwork, &work,
+			 sizeof(struct cxl_ioctl_start_work)))
+		return -EFAULT;
+
+	if ((rc = cxl_ops->attach_process(ctx, false, work.wed, amr)))
+		return rc;
+
+	ctx->status = STARTED;
+
+	return 0;
+}
+
+static long afu_ioctl_check_error(struct cxl_context_t *ctx)
+{
+	if (ctx->status != STARTED)
+		return -EIO;
+
+	if (cxl_ops->check_error && cxl_ops->check_error(ctx->afu)) {
+		/* This may not be enough for some errors.  May need to PERST
+		 * the card in some cases if it's very broken.
+		 */
+		return cxl_ops->afu_reset(ctx->afu);
+	}
+	return -EPERM;
+}
+
+static long afu_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	struct cxl_context_t *ctx = file->private_data;
+
+	if (ctx->status == CLOSED)
+		return -EIO;
+
+	pr_devel("afu_ioctl\n");
+	switch (cmd) {
+	case CXL_IOCTL_START_WORK:
+		return afu_ioctl_start_work(ctx,
+			(struct cxl_ioctl_start_work __user *)arg);
+	case CXL_IOCTL_CHECK_ERROR:
+		return afu_ioctl_check_error(ctx);
+	}
+	return -EINVAL;
+}
+
+static long afu_compat_ioctl(struct file *file, unsigned int cmd,
+			     unsigned long arg)
+{
+	return afu_ioctl(file, cmd, arg);
+}
+
+static int afu_mmap(struct file *file, struct vm_area_struct *vm)
+{
+	struct cxl_context_t *ctx = file->private_data;
+
+	/* AFU must be started before we can MMIO */
+	if (ctx->status != STARTED)
+		return -EIO;
+
+	return cxl_context_iomap(ctx, vm);
+}
+
+static unsigned int afu_poll(struct file *file, struct poll_table_struct *poll)
+{
+	struct cxl_context_t *ctx = file->private_data;
+	int mask = 0;
+	unsigned long flags;
+
+
+	poll_wait(file, &ctx->wq, poll);
+
+	pr_devel("afu_poll wait done pe: %i\n", ctx->ph);
+
+	spin_lock_irqsave(&ctx->lock, flags);
+	if (ctx->pending_irq || ctx->pending_fault ||
+	    ctx->pending_afu_err)
+		mask |= POLLIN | POLLRDNORM;
+	else if (ctx->status == CLOSED)
+		/* Only error on closed when there are no further events pending
+		 */
+		mask |= POLLERR;
+	spin_unlock_irqrestore(&ctx->lock, flags);
+
+	pr_devel("afu_poll pe: %i returning %#x\n", ctx->ph, mask);
+
+	return mask;
+}
+
+static ssize_t afu_read(struct file *file, char __user *buf, size_t count,
+			loff_t *off)
+{
+	struct cxl_context_t *ctx = file->private_data;
+	struct cxl_event event;
+	unsigned long flags;
+	ssize_t size;
+	DEFINE_WAIT(wait);
+
+	if (count < sizeof(struct cxl_event_header))
+		return -EINVAL;
+
+	while (1) {
+		spin_lock_irqsave(&ctx->lock, flags);
+		if (ctx->pending_irq || ctx->pending_fault ||
+		    ctx->pending_afu_err || (ctx->status == CLOSED))
+			break;
+		spin_unlock_irqrestore(&ctx->lock, flags);
+
+		if (file->f_flags & O_NONBLOCK)
+			return -EAGAIN;
+
+		prepare_to_wait(&ctx->wq, &wait, TASK_INTERRUPTIBLE);
+		if (!(ctx->pending_irq || ctx->pending_fault ||
+		      ctx->pending_afu_err || (ctx->status == CLOSED))) {
+			pr_devel("afu_read going to sleep...\n");
+			schedule();
+			pr_devel("afu_read woken up\n");
+		}
+		finish_wait(&ctx->wq, &wait);
+
+		if (signal_pending(current))
+			return -ERESTARTSYS;
+	}
+
+	memset(&event, 0, sizeof(event));
+	event.header.process_element = ctx->ph;
+	if (ctx->pending_irq) {
+		pr_devel("afu_read delivering AFU interrupt\n");
+		event.header.size = sizeof(struct cxl_event_afu_interrupt);
+		event.header.type = CXL_EVENT_AFU_INTERRUPT;
+		event.irq.irq = find_first_bit(ctx->irq_bitmap, ctx->irq_count) + 1;
+
+		/* Only clear the IRQ if we can send the whole event: */
+		if (count >= event.header.size) {
+			clear_bit(event.irq.irq - 1, ctx->irq_bitmap);
+			if (bitmap_empty(ctx->irq_bitmap, ctx->irq_count))
+				ctx->pending_irq = false;
+		}
+	} else if (ctx->pending_fault) {
+		pr_devel("afu_read delivering data storage fault\n");
+		event.header.size = sizeof(struct cxl_event_data_storage);
+		event.header.type = CXL_EVENT_DATA_STORAGE;
+		event.fault.addr = ctx->fault_addr;
+
+		/* Only clear the fault if we can send the whole event: */
+		if (count >= event.header.size)
+			ctx->pending_fault = false;
+	} else if (ctx->pending_afu_err) {
+		pr_devel("afu_read delivering afu error\n");
+		event.header.size = sizeof(struct cxl_event_afu_error);
+		event.header.type = CXL_EVENT_AFU_ERROR;
+		event.afu_err.err = ctx->afu_err;
+
+		/* Only clear the error if we can send the whole event: */
+		if (count >= event.header.size)
+			ctx->pending_afu_err = false;
+	} else if (ctx->status == CLOSED) {
+		pr_devel("afu_read fatal error\n");
+		spin_unlock_irqrestore(&ctx->lock, flags);
+		return -EIO;
+	} else
+		WARN(1, "afu_read must be buggy\n");
+
+	spin_unlock_irqrestore(&ctx->lock, flags);
+
+	size = min_t(size_t, count, event.header.size);
+	if (copy_to_user(buf, &event, size))
+		return -EFAULT;
+	return size;
+}
+
+static const struct file_operations afu_fops = {
+	.owner		= THIS_MODULE,
+	.open           = afu_open,
+	.poll		= afu_poll,
+	.read		= afu_read,
+	.release        = afu_release,
+	.unlocked_ioctl = afu_ioctl,
+	.compat_ioctl   = afu_compat_ioctl,
+	.mmap           = afu_mmap,
+};
+
+static const struct file_operations afu_master_fops = {
+	.owner		= THIS_MODULE,
+	.open           = afu_master_open,
+	.poll		= afu_poll,
+	.read		= afu_read,
+	.release        = afu_release,
+	.unlocked_ioctl = afu_ioctl,
+	.compat_ioctl   = afu_compat_ioctl,
+	.mmap           = afu_mmap,
+};
+
+
+static char *cxl_devnode(struct device *dev, umode_t *mode)
+{
+	struct cxl_afu_t *afu;
+
+	if (CXL_DEVT_IS_CARD(dev->devt)) {
+		/* These minor numbers will eventually be used to program the
+		 * PSL and AFUs once we have dynamic reprogramming support */
+		return NULL;
+	} else { /* CXL_DEVT_IS_AFU */
+		/* Default character devices in each programming model just get
+		 * named /dev/cxl/afuX.Y */
+		afu = dev_get_drvdata(dev);
+		if ((afu->current_model == CXL_MODEL_DEDICATED) &&
+				CXL_DEVT_IS_AFU_M(dev->devt))
+			return kasprintf(GFP_KERNEL, "cxl/%s", dev_name(&afu->dev));
+		if ((afu->current_model == CXL_MODEL_DIRECTED) &&
+				CXL_DEVT_IS_AFU_S(dev->devt))
+			return kasprintf(GFP_KERNEL, "cxl/%s", dev_name(&afu->dev));
+	}
+	return kasprintf(GFP_KERNEL, "cxl/%s", dev_name(dev));
+}
+
+extern struct class *cxl_class;
+
+int cxl_chardev_m_afu_add(struct cxl_afu_t *afu)
+{
+	struct device *dev;
+	int rc;
+
+	cdev_init(&afu->afu_cdev_m, &afu_master_fops);
+	if ((rc = cdev_add(&afu->afu_cdev_m, CXL_AFU_MKDEV_M(afu), 1))) {
+		dev_err(&afu->dev, "Unable to add master chardev: %i\n", rc);
+		return rc;
+	}
+
+	dev = device_create(cxl_class, &afu->dev, CXL_AFU_MKDEV_M(afu), afu,
+			"afu%i.%im", afu->adapter->adapter_num, afu->slice);
+	if (IS_ERR(dev)) {
+		rc = PTR_ERR(dev);
+		dev_err(&afu->dev, "Unable to create master chardev in sysfs: %i\n", rc);
+		goto err;
+	}
+
+	afu->chardev_m = dev;
+
+	return 0;
+err:
+	cdev_del(&afu->afu_cdev_m);
+	return rc;
+}
+
+int cxl_chardev_s_afu_add(struct cxl_afu_t *afu)
+{
+	struct device *dev;
+	int rc;
+
+	cdev_init(&afu->afu_cdev_s, &afu_fops);
+	if ((rc = cdev_add(&afu->afu_cdev_s, CXL_AFU_MKDEV(afu), 1))) {
+		dev_err(&afu->dev, "Unable to add shared chardev: %i\n", rc);
+		return rc;
+	}
+
+	dev = device_create(cxl_class, &afu->dev, CXL_AFU_MKDEV(afu), afu,
+			"afu%i.%is", afu->adapter->adapter_num, afu->slice);
+	if (IS_ERR(dev)) {
+		rc = PTR_ERR(dev);
+		dev_err(&afu->dev, "Unable to create shared chardev in sysfs: %i\n", rc);
+		goto err;
+	}
+
+	afu->chardev_s = dev;
+
+	return 0;
+err:
+	cdev_del(&afu->afu_cdev_s);
+	return rc;
+}
+
+void cxl_chardev_afu_remove(struct cxl_afu_t *afu)
+{
+	if (afu->chardev_m) {
+		cdev_del(&afu->afu_cdev_m);
+		device_unregister(afu->chardev_m);
+	}
+	if (afu->chardev_s) {
+		cdev_del(&afu->afu_cdev_s);
+		device_unregister(afu->chardev_s);
+	}
+}
+
+int cxl_register_afu(struct cxl_afu_t *afu)
+{
+	afu->dev.class = cxl_class;
+
+	return device_register(&afu->dev);
+}
+EXPORT_SYMBOL(cxl_register_afu);
+
+int cxl_register_adapter(struct cxl_t *adapter)
+{
+	adapter->dev.class = cxl_class;
+
+	/* Future: When we support dynamically reprogramming the PSL & AFU we
+	 * will expose the interface to do that via a chardev:
+	 * adapter->dev.devt = CXL_CARD_MKDEV(adapter);
+	 */
+
+	return device_register(&adapter->dev);
+}
+EXPORT_SYMBOL(cxl_register_adapter);
+
+int __init cxl_file_init(void)
+{
+	int rc;
+
+	if ((rc = alloc_chrdev_region(&cxl_dev, 0, CXL_NUM_MINORS, "cxl"))) {
+		pr_err("Unable to allocate CXL major number: %i\n", rc);
+		return rc;
+	}
+
+	pr_devel("CXL device allocated, MAJOR %i\n", MAJOR(cxl_dev));
+
+	cxl_class = class_create(THIS_MODULE, "cxl");
+	if (IS_ERR(cxl_class)) {
+		pr_err("Unable to create CXL class\n");
+		rc = PTR_ERR(cxl_class);
+		goto err;
+	}
+	cxl_class->devnode = cxl_devnode;
+
+	return 0;
+
+err:
+	unregister_chrdev_region(cxl_dev, CXL_NUM_MINORS);
+	return rc;
+}
+
+void cxl_file_exit(void)
+{
+	unregister_chrdev_region(cxl_dev, CXL_NUM_MINORS);
+	class_destroy(cxl_class);
+}
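
Note on afu_read() above: an AFU interrupt is reported even when the
caller's buffer is too small for the whole event, but the bit is only
cleared from ctx->irq_bitmap when the buffer can hold it.  The following is
a minimal userspace sketch of that rule, not driver code.  struct irq_state
and deliver_irq() are hypothetical names, the bitmap is limited to 64 bits
for simplicity, and returning 0 for "nothing pending" is a choice made for
this sketch only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the IRQ bookkeeping in struct cxl_context_t. */
struct irq_state {
	uint64_t bitmap;	/* bit n set: AFU interrupt n + 1 is pending */
	unsigned int count;	/* number of valid bits, at most 64 here */
	bool pending;		/* mirrors ctx->pending_irq */
};

/*
 * Report the lowest pending AFU interrupt (1-based), or 0 if none.
 * The interrupt is consumed only when the caller's buffer can hold the
 * whole event, matching the "only clear the IRQ if we can send the whole
 * event" rule in afu_read().
 */
static unsigned int deliver_irq(struct irq_state *s, size_t buf_len,
				size_t event_size)
{
	unsigned int bit;

	assert(s->count <= 64);

	/* Equivalent of find_first_bit() over the valid bits. */
	for (bit = 0; bit < s->count; bit++)
		if (s->bitmap & (UINT64_C(1) << bit))
			break;
	if (bit == s->count)
		return 0;

	if (buf_len >= event_size) {
		s->bitmap &= ~(UINT64_C(1) << bit);
		if (s->bitmap == 0)
			s->pending = false;
	}
	return bit + 1;
}
```

A caller that passes a short buffer sees the same interrupt number again on
the next call, so userspace should size its read buffer for the largest
event it expects.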
diff --git a/drivers/misc/cxl/irq.c b/drivers/misc/cxl/irq.c
new file mode 100644
index 0000000..3e01e1d
--- /dev/null
+++ b/drivers/misc/cxl/irq.c
@@ -0,0 +1,405 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/interrupt.h>
+#include <linux/workqueue.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
+#include <linux/slab.h>
+#include <linux/pid.h>
+#include <asm/cputable.h>
+#include <misc/cxl.h>
+
+#include "cxl.h"
+
+/* XXX: This is implementation specific */
+static irqreturn_t handle_psl_slice_error(struct cxl_context_t *ctx, u64 dsisr, u64 errstat)
+{
+	u64 fir1, fir2, fir_slice, serr, afu_debug;
+
+	fir1 = cxl_p1_read(ctx->afu->adapter, CXL_PSL_FIR1);
+	fir2 = cxl_p1_read(ctx->afu->adapter, CXL_PSL_FIR2);
+	fir_slice = cxl_p1n_read(ctx->afu, CXL_PSL_FIR_SLICE_An);
+	serr = cxl_p1n_read(ctx->afu, CXL_PSL_SERR_An);
+	afu_debug = cxl_p1n_read(ctx->afu, CXL_AFU_DEBUG_An);
+
+	dev_crit(&ctx->afu->dev, "PSL ERROR STATUS: 0x%.16llx\n", errstat);
+	dev_crit(&ctx->afu->dev, "PSL_FIR1: 0x%.16llx\n", fir1);
+	dev_crit(&ctx->afu->dev, "PSL_FIR2: 0x%.16llx\n", fir2);
+	dev_crit(&ctx->afu->dev, "PSL_SERR_An: 0x%.16llx\n", serr);
+	dev_crit(&ctx->afu->dev, "PSL_FIR_SLICE_An: 0x%.16llx\n", fir_slice);
+	dev_crit(&ctx->afu->dev, "CXL_PSL_AFU_DEBUG_An: 0x%.16llx\n", afu_debug);
+
+	dev_crit(&ctx->afu->dev, "STOPPING CXL TRACE\n");
+	cxl_stop_trace(ctx->afu->adapter);
+
+	return cxl_ops->ack_irq(ctx, 0, errstat);
+}
+
+irqreturn_t cxl_slice_irq_err(int irq, void *data)
+{
+	struct cxl_afu_t *afu = data;
+	u64 fir_slice, errstat, serr, afu_debug;
+
+	WARN(irq, "CXL SLICE ERROR interrupt %i\n", irq);
+
+	serr = cxl_p1n_read(afu, CXL_PSL_SERR_An);
+	fir_slice = cxl_p1n_read(afu, CXL_PSL_FIR_SLICE_An);
+	errstat = cxl_p2n_read(afu, CXL_PSL_ErrStat_An);
+	afu_debug = cxl_p1n_read(afu, CXL_AFU_DEBUG_An);
+	dev_crit(&afu->dev, "PSL_SERR_An: 0x%.16llx\n", serr);
+	dev_crit(&afu->dev, "PSL_FIR_SLICE_An: 0x%.16llx\n", fir_slice);
+	dev_crit(&afu->dev, "CXL_PSL_ErrStat_An: 0x%.16llx\n", errstat);
+	dev_crit(&afu->dev, "CXL_PSL_AFU_DEBUG_An: 0x%.16llx\n", afu_debug);
+
+	cxl_p1n_write(afu, CXL_PSL_SERR_An, serr);
+
+	return IRQ_HANDLED;
+}
+
+irqreturn_t cxl_irq_err(int irq, void *data)
+{
+	struct cxl_t *adapter = data;
+	u64 fir1, fir2, err_ivte;
+
+	WARN(1, "CXL ERROR interrupt %i\n", irq);
+
+	err_ivte = cxl_p1_read(adapter, CXL_PSL_ErrIVTE);
+	dev_crit(&adapter->dev, "PSL_ErrIVTE: 0x%.16llx\n", err_ivte);
+
+	dev_crit(&adapter->dev, "STOPPING CXL TRACE\n");
+	cxl_stop_trace(adapter);
+
+	fir1 = cxl_p1_read(adapter, CXL_PSL_FIR1);
+	fir2 = cxl_p1_read(adapter, CXL_PSL_FIR2);
+
+	dev_crit(&adapter->dev, "PSL_FIR1: 0x%.16llx\nPSL_FIR2: 0x%.16llx\n", fir1, fir2);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t schedule_cxl_fault(struct cxl_context_t *ctx, u64 dsisr, u64 dar)
+{
+	ctx->dsisr = dsisr;
+	ctx->dar = dar;
+	schedule_work(&ctx->fault_work);
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t cxl_irq(int irq, void *data)
+{
+	struct cxl_context_t *ctx = data;
+	struct cxl_irq_info irq_info;
+	u64 dsisr, dar;
+	int result;
+
+	if ((result = cxl_ops->get_irq(ctx, &irq_info))) {
+		WARN(1, "Unable to get CXL IRQ Info: %i\n", result);
+		return IRQ_HANDLED;
+	}
+
+	dsisr = irq_info.dsisr;
+	dar = irq_info.dar;
+
+	pr_devel("CXL interrupt %i for afu pe: %i DSISR: %#llx DAR: %#llx\n", irq, ctx->ph, dsisr, dar);
+
+	if (dsisr & CXL_PSL_DSISR_An_DS) {
+		/* We don't inherently need to sleep to handle this, but we do
+		 * need to get a ref to the task's mm, which we can't do from
+		 * irq context without the potential for a deadlock since it
+		 * takes the task_lock. An alternate option would be to keep a
+		 * reference to the task's mm the entire time it has cxl open,
+		 * but to do that we need to solve the issue where we hold a
+		 * ref to the mm, but the mm can hold a ref to the fd after an
+		 * mmap preventing anything from being cleaned up. */
+		pr_devel("Scheduling segment miss handling for later pe: %i\n", ctx->ph);
+		return schedule_cxl_fault(ctx, dsisr, dar);
+	}
+
+	if (dsisr & CXL_PSL_DSISR_An_M)
+		pr_devel("CXL interrupt: PTE not found\n");
+	if (dsisr & CXL_PSL_DSISR_An_P)
+		pr_devel("CXL interrupt: Storage protection violation\n");
+	if (dsisr & CXL_PSL_DSISR_An_A)
+		pr_devel("CXL interrupt: AFU lock access to write through or cache inhibited storage\n");
+	if (dsisr & CXL_PSL_DSISR_An_S)
+		pr_devel("CXL interrupt: Access was afu_wr or afu_zero\n");
+	if (dsisr & CXL_PSL_DSISR_An_K)
+		pr_devel("CXL interrupt: Access not permitted by virtual page class key protection\n");
+
+	if (dsisr & CXL_PSL_DSISR_An_DM) {
+		/* In some cases we might be able to handle the fault
+		 * immediately if hash_page would succeed, but we still need
+		 * the task's mm, which as above we can't get without a lock */
+		pr_devel("Scheduling page fault handling for later pe: %i\n", ctx->ph);
+		return schedule_cxl_fault(ctx, dsisr, dar);
+	}
+	if (dsisr & CXL_PSL_DSISR_An_ST)
+		WARN(1, "CXL interrupt: Segment Table PTE not found\n");
+	if (dsisr & CXL_PSL_DSISR_An_UR)
+		pr_devel("CXL interrupt: AURP PTE not found\n");
+	if (dsisr & CXL_PSL_DSISR_An_PE)
+		return handle_psl_slice_error(ctx, dsisr, irq_info.errstat);
+	if (dsisr & CXL_PSL_DSISR_An_AE) {
+		pr_devel("CXL interrupt: AFU Error 0x%.16llx\n", irq_info.afu_err);
+
+		if (ctx->pending_afu_err) {
+			/* This shouldn't happen - the PSL treats these errors
+			 * as fatal and will have reset the AFU, so there's not
+			 * much point buffering multiple AFU errors.
+			 * OTOH if we DO ever see a storm of these come in it's
+			 * probably best that we log them somewhere: */
+			dev_err_ratelimited(&ctx->afu->dev, "CXL AFU Error "
+					    "undelivered to pe %i: 0x%.16llx\n",
+					    ctx->ph, irq_info.afu_err);
+		} else {
+			spin_lock(&ctx->lock);
+			ctx->afu_err = irq_info.afu_err;
+			ctx->pending_afu_err = 1;
+			spin_unlock(&ctx->lock);
+
+			wake_up_all(&ctx->wq);
+		}
+		cxl_ops->ack_irq(ctx, CXL_PSL_TFC_An_A, 0);
+		return IRQ_HANDLED;
+	}
+	if (dsisr & CXL_PSL_DSISR_An_OC)
+		pr_devel("CXL interrupt: OS Context Warning\n");
+
+	WARN(1, "Unhandled CXL PSL IRQ\n");
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t cxl_irq_multiplexed(int irq, void *data)
+{
+	struct cxl_afu_t *afu = data;
+	struct cxl_context_t *ctx;
+	int ph = cxl_p2n_read(afu, CXL_PSL_PEHandle_An) & 0xffff;
+	int ret;
+
+	rcu_read_lock();
+	ctx = idr_find(&afu->contexts_idr, ph);
+	if (ctx) {
+		ret = cxl_irq(irq, ctx);
+		rcu_read_unlock();
+		return ret;
+	}
+	rcu_read_unlock();
+
+	WARN(1, "Unable to demultiplex CXL PSL IRQ\n");
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t cxl_irq_afu(int irq, void *data)
+{
+	struct cxl_context_t *ctx = data;
+	irq_hw_number_t hwirq = irqd_to_hwirq(irq_get_irq_data(irq));
+	int irq_off, afu_irq = 1;
+	__u16 range;
+	int r;
+
+	for (r = 1; r < CXL_IRQ_RANGES; r++) {
+		irq_off = hwirq - ctx->irqs.offset[r];
+		range = ctx->irqs.range[r];
+		if (irq_off >= 0 && irq_off < range) {
+			afu_irq += irq_off;
+			break;
+		}
+		afu_irq += range;
+	}
+	if (unlikely(r >= CXL_IRQ_RANGES)) {
+		WARN(1, "Received AFU IRQ out of range for pe %i (virq %i hwirq %lx)\n",
+		     ctx->ph, irq, hwirq);
+		return IRQ_HANDLED;
+	}
+
+	pr_devel("Received AFU interrupt %i for pe: %i (virq %i hwirq %lx)\n",
+	       afu_irq, ctx->ph, irq, hwirq);
+
+	if (unlikely(!ctx->irq_bitmap)) {
+		WARN(1, "Received AFU IRQ for context with no IRQ bitmap\n");
+		return IRQ_HANDLED;
+	}
+	spin_lock(&ctx->lock);
+	set_bit(afu_irq - 1, ctx->irq_bitmap);
+	ctx->pending_irq = true;
+	spin_unlock(&ctx->lock);
+
+	wake_up_all(&ctx->wq);
+
+	return IRQ_HANDLED;
+}
+
+unsigned int cxl_map_irq(struct cxl_t *adapter, irq_hw_number_t hwirq,
+			 irq_handler_t handler, void *cookie)
+{
+	unsigned int virq;
+	int result;
+
+	/* IRQ Domain? */
+	virq = irq_create_mapping(NULL, hwirq);
+	if (!virq) {
+		dev_warn(&adapter->dev, "cxl_map_irq: irq_create_mapping failed\n");
+		return 0;
+	}
+
+	if (adapter->driver->setup_irq)
+		adapter->driver->setup_irq(adapter, hwirq, virq);
+
+	pr_devel("hwirq %#lx mapped to virq %u\n", hwirq, virq);
+
+	result = request_irq(virq, handler, 0, "cxl", cookie);
+	if (result) {
+		dev_warn(&adapter->dev, "cxl_map_irq: request_irq failed: %i\n", result);
+		return 0;
+	}
+
+	return virq;
+}
+
+void cxl_unmap_irq(unsigned int virq, void *cookie)
+{
+	free_irq(virq, cookie);
+	irq_dispose_mapping(virq);
+}
+
+static int cxl_register_one_irq(struct cxl_t *adapter,
+				irq_handler_t handler,
+				void *cookie,
+				irq_hw_number_t *dest_hwirq,
+				unsigned int *dest_virq)
+{
+	int hwirq, virq;
+
+	if ((hwirq = adapter->driver->alloc_one_irq(adapter)) < 0)
+		return hwirq;
+
+	if (!(virq = cxl_map_irq(adapter, hwirq, handler, cookie)))
+		goto err;
+
+	*dest_hwirq = hwirq;
+	*dest_virq = virq;
+
+	return 0;
+
+err:
+	adapter->driver->release_one_irq(adapter, hwirq);
+	return -ENOMEM;
+}
+
+int cxl_register_psl_err_irq(struct cxl_t *adapter)
+{
+	int rc;
+
+	if ((rc = cxl_register_one_irq(adapter, cxl_irq_err, adapter,
+				       &adapter->err_hwirq,
+				       &adapter->err_virq)))
+		return rc;
+
+	cxl_p1_write(adapter, CXL_PSL_ErrIVTE, adapter->err_hwirq & 0xffff);
+
+	return 0;
+}
+EXPORT_SYMBOL(cxl_register_psl_err_irq);
+
+void cxl_release_psl_err_irq(struct cxl_t *adapter)
+{
+	cxl_p1_write(adapter, CXL_PSL_ErrIVTE, 0x0000000000000000);
+	cxl_unmap_irq(adapter->err_virq, adapter);
+	adapter->driver->release_one_irq(adapter, adapter->err_hwirq);
+}
+EXPORT_SYMBOL(cxl_release_psl_err_irq);
+
+int cxl_register_serr_irq(struct cxl_afu_t *afu)
+{
+	u64 serr;
+	int rc;
+
+	if ((rc = cxl_register_one_irq(afu->adapter, cxl_slice_irq_err, afu,
+				       &afu->serr_hwirq,
+				       &afu->serr_virq)))
+		return rc;
+
+	serr = cxl_p1n_read(afu, CXL_PSL_SERR_An);
+	serr = (serr & 0x00ffffffffff0000ULL) | (afu->serr_hwirq & 0xffff);
+	cxl_p1n_write(afu, CXL_PSL_SERR_An, serr);
+
+	return 0;
+}
+EXPORT_SYMBOL(cxl_register_serr_irq);
+
+void cxl_release_serr_irq(struct cxl_afu_t *afu)
+{
+	cxl_p1n_write(afu, CXL_PSL_SERR_An, 0x0000000000000000);
+	cxl_unmap_irq(afu->serr_virq, afu);
+	afu->adapter->driver->release_one_irq(afu->adapter, afu->serr_hwirq);
+}
+EXPORT_SYMBOL(cxl_release_serr_irq);
+
+int cxl_register_psl_irq(struct cxl_afu_t *afu)
+{
+	return cxl_register_one_irq(afu->adapter, cxl_irq_multiplexed, afu,
+			&afu->psl_hwirq, &afu->psl_virq);
+}
+EXPORT_SYMBOL(cxl_register_psl_irq);
+
+void cxl_release_psl_irq(struct cxl_afu_t *afu)
+{
+	cxl_unmap_irq(afu->psl_virq, afu);
+	afu->adapter->driver->release_one_irq(afu->adapter, afu->psl_hwirq);
+}
+EXPORT_SYMBOL(cxl_release_psl_irq);
+
+int afu_register_irqs(struct cxl_context_t *ctx, u32 count)
+{
+	irq_hw_number_t hwirq;
+	int rc, r, i;
+
+	if ((rc = ctx->afu->adapter->driver->alloc_irq_ranges(&ctx->irqs, ctx->afu->adapter, count)))
+		return rc;
+
+	/* Multiplexed PSL Interrupt */
+	ctx->irqs.offset[0] = ctx->afu->psl_hwirq;
+	ctx->irqs.range[0] = 1;
+
+	ctx->irq_count = count;
+	ctx->irq_bitmap = kcalloc(BITS_TO_LONGS(count),
+				  sizeof(*ctx->irq_bitmap), GFP_KERNEL);
+	if (!ctx->irq_bitmap)
+		return -ENOMEM;
+	for (r = 1; r < CXL_IRQ_RANGES; r++) {
+		hwirq = ctx->irqs.offset[r];
+		for (i = 0; i < ctx->irqs.range[r]; hwirq++, i++) {
+			cxl_map_irq(ctx->afu->adapter, hwirq,
+				     cxl_irq_afu, ctx);
+		}
+	}
+
+	return 0;
+}
+
+void afu_release_irqs(struct cxl_context_t *ctx)
+{
+	irq_hw_number_t hwirq;
+	unsigned int virq;
+	int r, i;
+
+	for (r = 1; r < CXL_IRQ_RANGES; r++) {
+		hwirq = ctx->irqs.offset[r];
+		for (i = 0; i < ctx->irqs.range[r]; hwirq++, i++) {
+			virq = irq_find_mapping(NULL, hwirq);
+			if (virq)
+				cxl_unmap_irq(virq, ctx);
+		}
+	}
+
+	ctx->afu->adapter->driver->release_irq_ranges(&ctx->irqs, ctx->afu->adapter);
+}
diff --git a/drivers/misc/cxl/main.c b/drivers/misc/cxl/main.c
new file mode 100644
index 0000000..fb0e0fc
--- /dev/null
+++ b/drivers/misc/cxl/main.c
@@ -0,0 +1,238 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/spinlock.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/mutex.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/of.h>
+#include <linux/slab.h>
+#include <linux/idr.h>
+#include <asm/cputable.h>
+#include <misc/cxl.h>
+
+#include "cxl.h"
+
+static DEFINE_SPINLOCK(adapter_idr_lock);
+static DEFINE_IDR(cxl_adapter_idr);
+
+const struct cxl_backend_ops *cxl_ops;
+EXPORT_SYMBOL(cxl_ops);
+
+uint cxl_verbose;
+EXPORT_SYMBOL(cxl_verbose);
+module_param_named(verbose, cxl_verbose, uint, 0600);
+MODULE_PARM_DESC(verbose, "Enable verbose dmesg output");
+
+static inline void cxl_slbia_core(struct mm_struct *mm)
+{
+	struct cxl_t *adapter;
+	struct cxl_afu_t *afu;
+	struct cxl_context_t *ctx;
+	struct task_struct *task;
+	unsigned long flags;
+	int card, slice, id;
+
+	pr_devel("%s called\n", __func__);
+
+	spin_lock(&adapter_idr_lock);
+	idr_for_each_entry(&cxl_adapter_idr, adapter, card) {
+		/* XXX: Make this lookup faster with link from mm to ctx */
+		spin_lock(&adapter->afu_list_lock);
+		for (slice = 0; slice < adapter->slices; slice++) {
+			afu = adapter->afu[slice];
+			if (!afu->enabled)
+				continue;
+			rcu_read_lock();
+			idr_for_each_entry(&afu->contexts_idr, ctx, id) {
+				if (!(task = get_pid_task(ctx->pid, PIDTYPE_PID))) {
+					pr_devel("%s unable to get task %i\n",
+						 __func__, pid_nr(ctx->pid));
+					continue;
+				}
+
+				if (task->mm != mm)
+					goto next;
+
+				pr_devel("%s matched mm - card: %i afu: %i pe: %i\n",
+					 __func__, adapter->adapter_num, slice, ctx->ph);
+
+				spin_lock_irqsave(&ctx->sst_lock, flags);
+				if (!ctx->sstp)
+					goto next_unlock;
+				memset(ctx->sstp, 0, ctx->sst_size);
+				mb();
+				cxl_ops->slbia(afu);
+
+next_unlock:
+				spin_unlock_irqrestore(&ctx->sst_lock, flags);
+next:
+				put_task_struct(task);
+			}
+			rcu_read_unlock();
+		}
+		spin_unlock(&adapter->afu_list_lock);
+	}
+	spin_unlock(&adapter_idr_lock);
+}
+
+struct cxl_calls cxl_calls = {
+	.cxl_slbia = cxl_slbia_core,
+	.owner = THIS_MODULE,
+};
+
+int cxl_alloc_sst(struct cxl_context_t *ctx, u64 *sstp0, u64 *sstp1)
+{
+	unsigned long vsid, flags;
+	u64 ea_mask;
+	u64 size;
+
+	*sstp0 = 0;
+	*sstp1 = 0;
+
+	ctx->sst_size = PAGE_SIZE;
+	ctx->sst_lru = 0;
+	if (!ctx->sstp) {
+		ctx->sstp = (struct cxl_sste *)get_zeroed_page(GFP_KERNEL);
+		pr_devel("SSTP allocated at 0x%p\n", ctx->sstp);
+	} else {
+		pr_devel("Zeroing and reusing SSTP already allocated at 0x%p\n", ctx->sstp);
+		spin_lock_irqsave(&ctx->sst_lock, flags);
+		memset(ctx->sstp, 0, PAGE_SIZE);
+		cxl_ops->slbia(ctx->afu);
+		spin_unlock_irqrestore(&ctx->sst_lock, flags);
+	}
+	if (!ctx->sstp) {
+		pr_err("cxl_alloc_sst: Unable to allocate segment table\n");
+		return -ENOMEM;
+	}
+
+	vsid  = get_kernel_vsid((u64)ctx->sstp, mmu_kernel_ssize) << 12;
+
+	*sstp0 |= (u64)mmu_kernel_ssize << CXL_SSTP0_An_B_SHIFT;
+	*sstp0 |= (SLB_VSID_KERNEL | mmu_psize_defs[mmu_linear_psize].sllp) << 50;
+
+	size = (((u64)ctx->sst_size >> 8) - 1) << CXL_SSTP0_An_SegTableSize_SHIFT;
+	if (unlikely(size & ~CXL_SSTP0_An_SegTableSize_MASK)) {
+		WARN(1, "Impossible segment table size\n");
+		return -EINVAL;
+	}
+	*sstp0 |= size;
+
+	if (mmu_kernel_ssize == MMU_SEGSIZE_256M)
+		ea_mask = 0xfffff00ULL;
+	else
+		ea_mask = 0xffffffff00ULL;
+
+	*sstp0 |=  vsid >>     (50-14);  /*   Top 14 bits of VSID */
+	*sstp1 |= (vsid << (64-(50-14))) & ~ea_mask;
+	*sstp1 |= (u64)ctx->sstp & ea_mask;
+	*sstp1 |= CXL_SSTP1_An_V;
+
+	pr_devel("Looked up %#llx: slbfee. %#llx (ssize: %x, vsid: %#lx), copied to SSTP0: %#llx, SSTP1: %#llx\n",
+			(u64)ctx->sstp, (u64)ctx->sstp & ESID_MASK, mmu_kernel_ssize, vsid, *sstp0, *sstp1);
+
+	return 0;
+}
+
+/* Find a CXL adapter by its number and increase its refcount */
+struct cxl_t *get_cxl_adapter(int num)
+{
+	struct cxl_t *adapter;
+
+	spin_lock(&adapter_idr_lock);
+	if ((adapter = idr_find(&cxl_adapter_idr, num)))
+		get_device(&adapter->dev);
+	spin_unlock(&adapter_idr_lock);
+
+	return adapter;
+}
+
+int cxl_alloc_adapter_nr(struct cxl_t *adapter)
+{
+	int i;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&adapter_idr_lock);
+	i = idr_alloc(&cxl_adapter_idr, adapter, 0, 0, GFP_NOWAIT);
+	spin_unlock(&adapter_idr_lock);
+	idr_preload_end();
+	if (i < 0)
+		return i;
+
+	adapter->adapter_num = i;
+
+	return 0;
+}
+EXPORT_SYMBOL(cxl_alloc_adapter_nr);
+
+void cxl_remove_adapter_nr(struct cxl_t *adapter)
+{
+	idr_remove(&cxl_adapter_idr, adapter->adapter_num);
+}
+EXPORT_SYMBOL(cxl_remove_adapter_nr);
+
+int cxl_afu_select_best_model(struct cxl_afu_t *afu)
+{
+	if (afu->models_supported & CXL_MODEL_DIRECTED)
+		return cxl_afu_activate_model(afu, CXL_MODEL_DIRECTED);
+
+	if (afu->models_supported & CXL_MODEL_DEDICATED)
+		return cxl_afu_activate_model(afu, CXL_MODEL_DEDICATED);
+
+	dev_warn(&afu->dev, "No supported programming models available\n");
+	/* We don't fail this so the user can inspect sysfs */
+	return 0;
+}
+EXPORT_SYMBOL(cxl_afu_select_best_model);
+
+static int __init init_cxl(void)
+{
+	int rc = 0;
+
+	if (!cpu_has_feature(CPU_FTR_HVMODE))
+		return -EPERM;
+
+	if ((rc = cxl_file_init()))
+		return rc;
+
+	cxl_debugfs_init();
+	init_cxl_native();
+
+	if ((rc = register_cxl_calls(&cxl_calls)))
+		goto err;
+
+	return 0;
+
+err:
+	cxl_debugfs_exit();
+	cxl_file_exit();
+
+	return rc;
+}
+
+static void exit_cxl(void)
+{
+	cxl_debugfs_exit();
+	cxl_file_exit();
+	unregister_cxl_calls(&cxl_calls);
+}
+
+module_init(init_cxl);
+module_exit(exit_cxl);
+
+MODULE_DESCRIPTION("IBM Coherent Accelerator");
+MODULE_AUTHOR("Ian Munsie <imunsie@au1.ibm.com>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
new file mode 100644
index 0000000..3c5c6a8
--- /dev/null
+++ b/drivers/misc/cxl/native.c
@@ -0,0 +1,649 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/spinlock.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/jiffies.h>
+#include <linux/mutex.h>
+#include <linux/mm.h>
+#include <linux/uaccess.h>
+#include <asm/synch.h>
+#include <misc/cxl.h>
+
+#include "cxl.h"
+
+static int afu_control(struct cxl_afu_t *afu, u64 command,
+		       u64 result, u64 mask, bool enabled)
+{
+	u64 AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An);
+	unsigned long timeout = jiffies + (HZ * CXL_TIMEOUT);
+
+	spin_lock(&afu->afu_cntl_lock);
+	pr_devel("AFU command starting: %llx\n", command);
+
+	cxl_p2n_write(afu, CXL_AFU_Cntl_An, AFU_Cntl | command);
+
+	AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An);
+	while ((AFU_Cntl & mask) != result) {
+		if (time_after_eq(jiffies, timeout)) {
+			dev_warn(&afu->dev, "WARNING: AFU control timed out!\n");
+			spin_unlock(&afu->afu_cntl_lock);
+			return -EBUSY;
+		}
+		pr_devel_ratelimited("AFU control... (0x%.16llx)\n",
+				     AFU_Cntl | command);
+		cpu_relax();
+		AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An);
+	}
+	pr_devel("AFU command complete: %llx\n", command);
+	afu->enabled = enabled;
+	spin_unlock(&afu->afu_cntl_lock);
+
+	return 0;
+}
+
+static int afu_enable(struct cxl_afu_t *afu)
+{
+	pr_devel("AFU enable request\n");
+
+	return afu_control(afu, CXL_AFU_Cntl_An_E,
+			   CXL_AFU_Cntl_An_ES_Enabled,
+			   CXL_AFU_Cntl_An_ES_MASK, true);
+}
+
+static int afu_disable(struct cxl_afu_t *afu)
+{
+	pr_devel("AFU disable request\n");
+
+	return afu_control(afu, 0, CXL_AFU_Cntl_An_ES_Disabled,
+			   CXL_AFU_Cntl_An_ES_MASK, false);
+}
+
+/* We have to disable when we reset */
+static int afu_reset_and_disable(struct cxl_afu_t *afu)
+{
+	pr_devel("AFU reset request\n");
+
+	return afu_control(afu, CXL_AFU_Cntl_An_RA,
+			   CXL_AFU_Cntl_An_RS_Complete | CXL_AFU_Cntl_An_ES_Disabled,
+			   CXL_AFU_Cntl_An_RS_MASK | CXL_AFU_Cntl_An_ES_MASK,
+			   false);
+}
+
+static int afu_check_and_enable(struct cxl_afu_t *afu)
+{
+	if (afu->enabled)
+		return 0;
+	return afu_enable(afu);
+}
+
+static int psl_purge(struct cxl_afu_t *afu)
+{
+	u64 PSL_CNTL = cxl_p1n_read(afu, CXL_PSL_SCNTL_An);
+	u64 AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An);
+	u64 dsisr, dar;
+	u64 start, end;
+	unsigned long timeout = jiffies + (HZ * CXL_TIMEOUT);
+
+	pr_devel("PSL purge request\n");
+
+	if ((AFU_Cntl & CXL_AFU_Cntl_An_ES_MASK) != CXL_AFU_Cntl_An_ES_Disabled) {
+		WARN(1, "psl_purge request while AFU not disabled!\n");
+		afu_disable(afu);
+	}
+
+	cxl_p1n_write(afu, CXL_PSL_SCNTL_An,
+		       PSL_CNTL | CXL_PSL_SCNTL_An_Pc);
+	start = local_clock();
+	PSL_CNTL = cxl_p1n_read(afu, CXL_PSL_SCNTL_An);
+	while ((PSL_CNTL &  CXL_PSL_SCNTL_An_Ps_MASK)
+			== CXL_PSL_SCNTL_An_Ps_Pending) {
+		if (time_after_eq(jiffies, timeout)) {
+			dev_warn(&afu->dev, "WARNING: PSL Purge timed out!\n");
+			return -EBUSY;
+		}
+		dsisr = cxl_p2n_read(afu, CXL_PSL_DSISR_An);
+		pr_devel_ratelimited("PSL purging... PSL_CNTL: 0x%.16llx  PSL_DSISR: 0x%.16llx\n", PSL_CNTL, dsisr);
+		if (dsisr & CXL_PSL_DSISR_TRANS) {
+			dar = cxl_p2n_read(afu, CXL_PSL_DAR_An);
+			dev_notice(&afu->dev, "PSL purge terminating pending translation, DSISR: 0x%.16llx, DAR: 0x%.16llx\n", dsisr, dar);
+			cxl_p2n_write(afu, CXL_PSL_TFC_An, CXL_PSL_TFC_An_AE);
+		} else if (dsisr) {
+			dev_notice(&afu->dev, "PSL purge acknowledging pending non-translation fault, DSISR: 0x%.16llx\n", dsisr);
+			cxl_p2n_write(afu, CXL_PSL_TFC_An, CXL_PSL_TFC_An_A);
+		} else {
+			cpu_relax();
+		}
+		PSL_CNTL = cxl_p1n_read(afu, CXL_PSL_SCNTL_An);
+	}
+	end = local_clock();
+	pr_devel("PSL purged in %lld ns\n", end - start);
+
+	cxl_p1n_write(afu, CXL_PSL_SCNTL_An,
+		       PSL_CNTL & ~CXL_PSL_SCNTL_An_Pc);
+	return 0;
+}
+
+static int spa_max_procs(int spa_size)
+{
+	/* From the CAIA:
+	 *    end_of_SPA_area = SPA_Base + ((n+4) * 128) + (( ((n*8) + 127) >> 7) * 128) + 255
+	 * Most of that junk is really just an overly-complicated way of saying
+	 * the last 256 bytes are __aligned(128), so it's really:
+	 *    end_of_SPA_area = end_of_PSL_queue_area + __aligned(128) 255
+	 * and
+	 *    end_of_PSL_queue_area = SPA_Base + ((n+4) * 128) + (n*8) - 1
+	 * so
+	 *    sizeof(SPA) = ((n+4) * 128) + (n*8) + __aligned(128) 256
+	 * Ignore the alignment (which is safe in this case as long as we are
+	 * careful with our rounding) and solve for n:
+	 */
+	return ((spa_size / 8) - 96) / 17;
+}
+
+static int alloc_spa(struct cxl_afu_t *afu)
+{
+	u64 spap;
+
+	/* Work out how many pages to allocate */
+	afu->spa_order = 0;
+	do {
+		afu->spa_order++;
+		afu->spa_size = (1 << afu->spa_order) * PAGE_SIZE;
+		afu->spa_max_procs = spa_max_procs(afu->spa_size);
+	} while (afu->spa_max_procs < afu->num_procs);
+
+	WARN_ON(afu->spa_size > 0x100000); /* Max size supported by the hardware */
+
+	if (!(afu->spa = (struct cxl_process_element *)
+	      __get_free_pages(GFP_KERNEL | __GFP_ZERO, afu->spa_order))) {
+		pr_err("cxl_alloc_spa: Unable to allocate scheduled process area\n");
+		return -ENOMEM;
+	}
+	pr_devel("spa pages: %i afu->spa_max_procs: %i   afu->num_procs: %i\n",
+		 1<<afu->spa_order, afu->spa_max_procs, afu->num_procs);
+
+	afu->sw_command_status = (__be64 *)((char *)afu->spa +
+					    ((afu->spa_max_procs + 3) * 128));
+
+	spap = virt_to_phys(afu->spa) & CXL_PSL_SPAP_Addr;
+	spap |= ((afu->spa_size >> (12 - CXL_PSL_SPAP_Size_Shift)) - 1) & CXL_PSL_SPAP_Size;
+	spap |= CXL_PSL_SPAP_V;
+	pr_devel("cxl: SPA allocated at 0x%p. Max processes: %i, sw_command_status: 0x%p CXL_PSL_SPAP_An=0x%016llx\n", afu->spa, afu->spa_max_procs, afu->sw_command_status, spap);
+	cxl_p1n_write(afu, CXL_PSL_SPAP_An, spap);
+
+	return 0;
+}
+
+static void release_spa(struct cxl_afu_t *afu)
+{
+	free_pages((unsigned long) afu->spa, afu->spa_order);
+}
+
+static void afu_slbia_native(struct cxl_afu_t *afu)
+{
+	pr_devel("cxl_afu_slbia issuing SLBIA command\n");
+	cxl_p2n_write(afu, CXL_SLBIA_An, CXL_SLBI_IQ_ALL);
+	while (cxl_p2n_read(afu, CXL_SLBIA_An) & CXL_SLBIA_P)
+		cpu_relax();
+}
+
+static void cxl_write_sstp(struct cxl_afu_t *afu, u64 sstp0, u64 sstp1)
+{
+	/* 1. Disable SSTP by writing 0 to SSTP1[V] */
+	cxl_p2n_write(afu, CXL_SSTP1_An, 0);
+
+	/* 2. Invalidate all SLB entries */
+	afu_slbia_native(afu);
+
+	/* 3. Set SSTP0_An */
+	cxl_p2n_write(afu, CXL_SSTP0_An, sstp0);
+
+	/* 4. Set SSTP1_An */
+	cxl_p2n_write(afu, CXL_SSTP1_An, sstp1);
+}
+
+/* Using per slice version may improve performance here. (ie. SLBIA_An) */
+static void slb_invalid(struct cxl_context_t *ctx)
+{
+	struct cxl_t *adapter = ctx->afu->adapter;
+	u64 slbia;
+
+	WARN_ON(!mutex_is_locked(&ctx->afu->spa_mutex));
+
+	cxl_p1_write(adapter, CXL_PSL_LBISEL,
+			((u64)be32_to_cpu(ctx->elem->common.pid) << 32) |
+			be32_to_cpu(ctx->elem->lpid));
+	cxl_p1_write(adapter, CXL_PSL_SLBIA, CXL_SLBI_IQ_LPIDPID);
+
+	while (1) {
+		slbia = cxl_p1_read(adapter, CXL_PSL_SLBIA);
+		if (!(slbia & CXL_SLBIA_P))
+			break;
+		cpu_relax();
+	}
+}
+
+static int do_process_element_cmd(struct cxl_context_t *ctx,
+				  u64 cmd, u64 pe_state)
+{
+	u64 state;
+
+	WARN_ON(!ctx->afu->enabled);
+
+	ctx->elem->software_state = cpu_to_be32(pe_state);
+	smp_wmb();
+	*(ctx->afu->sw_command_status) = cpu_to_be64(cmd | 0 | ctx->ph);
+	smp_mb();
+	cxl_p1n_write(ctx->afu, CXL_PSL_LLCMD_An, cmd | ctx->ph);
+	while (1) {
+		state = be64_to_cpup(ctx->afu->sw_command_status);
+		if (state == ~0ULL) {
+			pr_err("cxl: Error adding process element to AFU\n");
+			return -1;
+		}
+		if ((state & (CXL_SPA_SW_CMD_MASK | CXL_SPA_SW_STATE_MASK  | CXL_SPA_SW_LINK_MASK)) ==
+		    (cmd | (cmd >> 16) | ctx->ph))
+			break;
+		/* The command won't finish in the PSL if there are
+		 * outstanding DSIs.  Hence we need to yield here in
+		 * case there are outstanding DSIs that we need to
+		 * service.  Tuning possibility: we could wait for a
+		 * while before scheduling.
+		 */
+		schedule();
+
+	}
+	return 0;
+}
+
+static int add_process_element(struct cxl_context_t *ctx)
+{
+	int rc = 0;
+
+	mutex_lock(&ctx->afu->spa_mutex);
+	pr_devel("%s Adding pe: %i started\n", __func__, ctx->ph);
+	if (!(rc = do_process_element_cmd(ctx, CXL_SPA_SW_CMD_ADD, CXL_PE_SOFTWARE_STATE_V)))
+		ctx->pe_inserted = true;
+	pr_devel("%s Adding pe: %i finished\n", __func__, ctx->ph);
+	mutex_unlock(&ctx->afu->spa_mutex);
+	return rc;
+}
+
+static int terminate_process_element(struct cxl_context_t *ctx)
+{
+	int rc = 0;
+
+	/* fast path terminate if it's already invalid */
+	if (!(ctx->elem->software_state & cpu_to_be32(CXL_PE_SOFTWARE_STATE_V)))
+		return rc;
+
+	mutex_lock(&ctx->afu->spa_mutex);
+	pr_devel("%s Terminate pe: %i started\n", __func__, ctx->ph);
+	rc = do_process_element_cmd(ctx, CXL_SPA_SW_CMD_TERMINATE,
+				    CXL_PE_SOFTWARE_STATE_V | CXL_PE_SOFTWARE_STATE_T);
+	ctx->elem->software_state = 0;	/* Remove Valid bit */
+	pr_devel("%s Terminate pe: %i finished\n", __func__, ctx->ph);
+	mutex_unlock(&ctx->afu->spa_mutex);
+	return rc;
+}
+
+static int remove_process_element(struct cxl_context_t *ctx)
+{
+	int rc = 0;
+
+	mutex_lock(&ctx->afu->spa_mutex);
+	pr_devel("%s Remove pe: %i started\n", __func__, ctx->ph);
+	if (!(rc = do_process_element_cmd(ctx, CXL_SPA_SW_CMD_REMOVE, 0)))
+		ctx->pe_inserted = false;
+	slb_invalid(ctx);
+	pr_devel("%s Remove pe: %i finished\n", __func__, ctx->ph);
+	mutex_unlock(&ctx->afu->spa_mutex);
+
+	return rc;
+}
+
+
+static void assign_psn_space(struct cxl_context_t *ctx)
+{
+	if (!ctx->afu->pp_size || ctx->master) {
+		ctx->psn_phys = ctx->afu->psn_phys;
+		ctx->psn_size = ctx->afu->adapter->ps_size;
+	} else {
+		ctx->psn_phys = ctx->afu->psn_phys +
+			(ctx->afu->pp_offset + ctx->afu->pp_size * ctx->ph);
+		ctx->psn_size = ctx->afu->pp_size;
+	}
+}
+
+static int activate_afu_directed(struct cxl_afu_t *afu)
+{
+	int rc;
+
+	dev_info(&afu->dev, "Activating AFU directed model\n");
+
+	afu->num_procs = afu->max_procs_virtualised; /* used by alloc_spa() */
+	if (alloc_spa(afu))
+		return -ENOMEM;
+
+	cxl_p1n_write(afu, CXL_PSL_SCNTL_An, CXL_PSL_SCNTL_An_PM_AFU);
+	cxl_p1n_write(afu, CXL_PSL_AMOR_An, 0xFFFFFFFFFFFFFFFFULL);
+	cxl_p1n_write(afu, CXL_PSL_ID_An, CXL_PSL_ID_An_F | CXL_PSL_ID_An_L);
+
+	afu->current_model = CXL_MODEL_DIRECTED;
+
+	if ((rc = cxl_chardev_m_afu_add(afu)))
+		return rc;
+
+	if ((rc = cxl_chardev_s_afu_add(afu)))
+		goto err;
+
+	return 0;
+err:
+	cxl_chardev_afu_remove(afu);
+	return rc;
+}
+
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+#define set_endian(sr) ((sr) |= CXL_PSL_SR_An_LE)
+#else
+#define set_endian(sr) ((sr) &= ~(CXL_PSL_SR_An_LE))
+#endif
+
+static int attach_afu_directed(struct cxl_context_t *ctx, u64 wed, u64 amr)
+{
+
+	u64 sr, sstp0, sstp1;
+	int r, result;
+
+	assign_psn_space(ctx);
+
+	ctx->elem->ctxtime = 0; /* disable */
+	ctx->elem->lpid = cpu_to_be32(mfspr(SPRN_LPID));
+	ctx->elem->haurp = 0; /* disable */
+	ctx->elem->sdr = cpu_to_be64(mfspr(SPRN_SDR1));
+
+	sr = CXL_PSL_SR_An_SC;
+	if (ctx->master)
+		sr |= CXL_PSL_SR_An_MP;
+	if (mfspr(SPRN_LPCR) & LPCR_TC)
+		sr |= CXL_PSL_SR_An_TC;
+	/* HV=0, PR=1, R=1 for userspace
+	 * For kernel contexts: this would need to change
+	 */
+	sr |= CXL_PSL_SR_An_PR | CXL_PSL_SR_An_R;
+	set_endian(sr);
+	sr &= ~(CXL_PSL_SR_An_HV);
+	if (!test_tsk_thread_flag(current, TIF_32BIT))
+		sr |= CXL_PSL_SR_An_SF;
+	ctx->elem->common.pid = cpu_to_be32(current->pid);
+	ctx->elem->common.tid = 0;
+	ctx->elem->sr = cpu_to_be64(sr);
+
+	ctx->elem->common.csrp = 0; /* disable */
+	ctx->elem->common.aurp0 = 0; /* disable */
+	ctx->elem->common.aurp1 = 0; /* disable */
+
+	if ((result = cxl_alloc_sst(ctx, &sstp0, &sstp1)))
+		return result;
+
+	cxl_prefault(ctx, wed);
+
+	ctx->elem->common.sstp0 = cpu_to_be64(sstp0);
+	ctx->elem->common.sstp1 = cpu_to_be64(sstp1);
+
+	for (r = 0; r < CXL_IRQ_RANGES; r++) {
+		ctx->elem->ivte_offsets[r] = cpu_to_be16(ctx->irqs.offset[r]);
+		ctx->elem->ivte_ranges[r] = cpu_to_be16(ctx->irqs.range[r]);
+	}
+
+	ctx->elem->common.amr = cpu_to_be64(amr);
+	ctx->elem->common.wed = cpu_to_be64(wed);
+
+	/* first guy needs to enable */
+	if ((result = afu_check_and_enable(ctx->afu)))
+		return result;
+
+	result = add_process_element(ctx);
+
+	return result;
+}
+
+static int deactivate_afu_directed(struct cxl_afu_t *afu)
+{
+	dev_info(&afu->dev, "Deactivating AFU directed model\n");
+
+	afu->current_model = 0;
+	afu->num_procs = 0;
+
+	cxl_chardev_afu_remove(afu);
+
+	afu_reset_and_disable(afu);
+	afu_disable(afu);
+	psl_purge(afu);
+
+	release_spa(afu);
+
+	return 0;
+}
+
+static int activate_dedicated_process(struct cxl_afu_t *afu)
+{
+	dev_info(&afu->dev, "Activating dedicated process model\n");
+
+	cxl_p1n_write(afu, CXL_PSL_SCNTL_An, CXL_PSL_SCNTL_An_PM_Process);
+
+	cxl_p1n_write(afu, CXL_PSL_CtxTime_An, 0); /* disable */
+	cxl_p1n_write(afu, CXL_PSL_SPAP_An, 0);    /* disable */
+	cxl_p1n_write(afu, CXL_PSL_AMOR_An, 0xFFFFFFFFFFFFFFFFULL);
+	cxl_p1n_write(afu, CXL_PSL_LPID_An, mfspr(SPRN_LPID));
+	cxl_p1n_write(afu, CXL_HAURP_An, 0);       /* disable */
+	cxl_p1n_write(afu, CXL_PSL_SDR_An, mfspr(SPRN_SDR1));
+
+	cxl_p2n_write(afu, CXL_CSRP_An, 0);        /* disable */
+	cxl_p2n_write(afu, CXL_AURP0_An, 0);       /* disable */
+	cxl_p2n_write(afu, CXL_AURP1_An, 0);       /* disable */
+
+	afu->current_model = CXL_MODEL_DEDICATED;
+	afu->num_procs = 1;
+
+	return cxl_chardev_m_afu_add(afu);
+}
+
+static int attach_dedicated(struct cxl_context_t *ctx, u64 wed, u64 amr)
+{
+	struct cxl_afu_t *afu = ctx->afu;
+	u64 sr, sstp0, sstp1;
+	int result;
+
+	sr = CXL_PSL_SR_An_SC;
+	set_endian(sr);
+	if (ctx->master)
+		sr |= CXL_PSL_SR_An_MP;
+	if (mfspr(SPRN_LPCR) & LPCR_TC)
+		sr |= CXL_PSL_SR_An_TC;
+	sr |= CXL_PSL_SR_An_PR | CXL_PSL_SR_An_R;
+	if (!test_tsk_thread_flag(current, TIF_32BIT))
+		sr |= CXL_PSL_SR_An_SF;
+	cxl_p2n_write(afu, CXL_PSL_PID_TID_An, (u64)current->pid << 32);
+	cxl_p1n_write(afu, CXL_PSL_SR_An, sr);
+
+	if ((result = cxl_alloc_sst(ctx, &sstp0, &sstp1)))
+		return result;
+
+	cxl_prefault(ctx, wed);
+
+	cxl_write_sstp(afu, sstp0, sstp1);
+	cxl_p1n_write(afu, CXL_PSL_IVTE_Offset_An,
+		       (((u64)ctx->irqs.offset[0] & 0xffff) << 48) |
+		       (((u64)ctx->irqs.offset[1] & 0xffff) << 32) |
+		       (((u64)ctx->irqs.offset[2] & 0xffff) << 16) |
+			((u64)ctx->irqs.offset[3] & 0xffff));
+	cxl_p1n_write(afu, CXL_PSL_IVTE_Limit_An, (u64)
+		       (((u64)ctx->irqs.range[0] & 0xffff) << 48) |
+		       (((u64)ctx->irqs.range[1] & 0xffff) << 32) |
+		       (((u64)ctx->irqs.range[2] & 0xffff) << 16) |
+			((u64)ctx->irqs.range[3] & 0xffff));
+
+	cxl_p2n_write(afu, CXL_PSL_AMR_An, amr);
+
+	/* master only context for dedicated */
+	assign_psn_space(ctx);
+
+	if ((result = afu_reset_and_disable(afu)))
+		return result;
+
+	cxl_p2n_write(afu, CXL_PSL_WED_An, wed);
+
+	return afu_enable(afu);
+}
+
+static int deactivate_dedicated_process(struct cxl_afu_t *afu)
+{
+	dev_info(&afu->dev, "Deactivating dedicated process model\n");
+
+	afu->current_model = 0;
+	afu->num_procs = 0;
+
+	cxl_chardev_afu_remove(afu);
+
+	return 0;
+}
+
+int _cxl_afu_deactivate_model(struct cxl_afu_t *afu, int model)
+{
+	if (model == CXL_MODEL_DIRECTED)
+		return deactivate_afu_directed(afu);
+	if (model == CXL_MODEL_DEDICATED)
+		return deactivate_dedicated_process(afu);
+	return 0;
+}
+
+int cxl_afu_deactivate_model(struct cxl_afu_t *afu)
+{
+	return _cxl_afu_deactivate_model(afu, afu->current_model);
+}
+EXPORT_SYMBOL(cxl_afu_deactivate_model);
+
+int cxl_afu_activate_model(struct cxl_afu_t *afu, int model)
+{
+	if (!model)
+		return 0;
+	if (!(model & afu->models_supported))
+		return -EINVAL;
+
+	if (model == CXL_MODEL_DIRECTED)
+		return activate_afu_directed(afu);
+	if (model == CXL_MODEL_DEDICATED)
+		return activate_dedicated_process(afu);
+
+	return -EINVAL;
+}
+EXPORT_SYMBOL(cxl_afu_activate_model);
+
+static int attach_process_native(struct cxl_context_t *ctx, bool kernel,
+			       u64 wed, u64 amr)
+{
+	ctx->kernel = kernel;
+	if (ctx->afu->current_model == CXL_MODEL_DIRECTED)
+		return attach_afu_directed(ctx, wed, amr);
+
+	if (ctx->afu->current_model == CXL_MODEL_DEDICATED)
+		return attach_dedicated(ctx, wed, amr);
+
+	return -EINVAL;
+}
+
+/* TODO: handle the case when this is called with IRQs off, which may
+ * happen when we unbind the driver.  Terminate & remove use a mutex
+ * lock and schedule, which will not be good with a lock held.  May need
+ * to write a do_process_element_cmd() that handles outstanding page
+ * faults. */
+static int detach_process_native(struct cxl_context_t *ctx)
+{
+	if (ctx->afu->current_model == CXL_MODEL_DEDICATED) {
+		afu_reset_and_disable(ctx->afu);
+		afu_disable(ctx->afu);
+		psl_purge(ctx->afu);
+		return 0;
+	}
+
+	if (!ctx->pe_inserted)
+		return 0;
+	if (terminate_process_element(ctx))
+		return -1;
+	if (remove_process_element(ctx))
+		return -1;
+
+	return 0;
+}
+
+static int get_irq_native(struct cxl_context_t *ctx, struct cxl_irq_info *info)
+{
+	u64 pidtid;
+
+	info->dsisr = cxl_p2n_read(ctx->afu, CXL_PSL_DSISR_An);
+	info->dar = cxl_p2n_read(ctx->afu, CXL_PSL_DAR_An);
+	info->dsr = cxl_p2n_read(ctx->afu, CXL_PSL_DSR_An);
+	pidtid = cxl_p2n_read(ctx->afu, CXL_PSL_PID_TID_An);
+	info->pid = pidtid >> 32;
+	info->tid = pidtid & 0xffffffff;
+	info->afu_err = cxl_p2n_read(ctx->afu, CXL_AFU_ERR_An);
+	info->errstat = cxl_p2n_read(ctx->afu, CXL_PSL_ErrStat_An);
+
+	return 0;
+}
+
+static void recover_psl_err(struct cxl_afu_t *afu, u64 errstat)
+{
+	u64 dsisr;
+
+	pr_devel("RECOVERING FROM PSL ERROR... (0x%.16llx)\n", errstat);
+
+	/* Clear PSL_DSISR[PE] */
+	dsisr = cxl_p2n_read(afu, CXL_PSL_DSISR_An);
+	cxl_p2n_write(afu, CXL_PSL_DSISR_An, dsisr & ~CXL_PSL_DSISR_An_PE);
+
+	/* Write 1s to clear error status bits */
+	cxl_p2n_write(afu, CXL_PSL_ErrStat_An, errstat);
+}
+
+static int ack_irq_native(struct cxl_context_t *ctx, u64 tfc,
+			  u64 psl_reset_mask)
+{
+	if (tfc)
+		cxl_p2n_write(ctx->afu, CXL_PSL_TFC_An, tfc);
+	if (psl_reset_mask)
+		recover_psl_err(ctx->afu, psl_reset_mask);
+
+	return 0;
+}
+
+static int check_error(struct cxl_afu_t *afu)
+{
+	return (cxl_p1n_read(afu, CXL_PSL_SCNTL_An) == ~0ULL);
+}
+
+static const struct cxl_backend_ops cxl_native_ops = {
+	.attach_process = attach_process_native,
+	.detach_process = detach_process_native,
+	.get_irq = get_irq_native,
+	.ack_irq = ack_irq_native,
+	.check_error = check_error,
+	.slbia = afu_slbia_native,
+	.afu_reset = afu_reset_and_disable,
+};
+
+void init_cxl_native(void)
+{
+	cxl_ops = &cxl_native_ops;
+}
diff --git a/drivers/misc/cxl/sysfs.c b/drivers/misc/cxl/sysfs.c
new file mode 100644
index 0000000..67489e8
--- /dev/null
+++ b/drivers/misc/cxl/sysfs.c
@@ -0,0 +1,348 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/device.h>
+#include <linux/sysfs.h>
+
+#include "cxl.h"
+
+#define to_afu_chardev_m(d) dev_get_drvdata(d)
+
+/*********  Adapter attributes  **********************************************/
+
+static ssize_t caia_version_show(struct device *device,
+				 struct device_attribute *attr,
+				 char *buf)
+{
+	struct cxl_t *adapter = to_cxl_adapter(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%i.%i\n", adapter->caia_major,
+			 adapter->caia_minor);
+}
+
+static ssize_t psl_revision_show(struct device *device,
+				 struct device_attribute *attr,
+				 char *buf)
+{
+	struct cxl_t *adapter = to_cxl_adapter(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%i\n", adapter->psl_rev);
+}
+
+static ssize_t base_image_show(struct device *device,
+			       struct device_attribute *attr,
+			       char *buf)
+{
+	struct cxl_t *adapter = to_cxl_adapter(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%i\n", adapter->base_image);
+}
+
+static ssize_t image_loaded_show(struct device *device,
+				 struct device_attribute *attr,
+				 char *buf)
+{
+	struct cxl_t *adapter = to_cxl_adapter(device);
+
+	if (adapter->user_image_loaded)
+		return scnprintf(buf, PAGE_SIZE, "user\n");
+	return scnprintf(buf, PAGE_SIZE, "factory\n");
+}
+
+static struct device_attribute adapter_attrs[] = {
+	__ATTR_RO(caia_version),
+	__ATTR_RO(psl_revision),
+	__ATTR_RO(base_image),
+	__ATTR_RO(image_loaded),
+	/* __ATTR_RW(reset_loads_image); */
+	/* __ATTR_RW(reset_image_select); */
+};
+
+
+/*********  AFU master specific attributes  **********************************/
+
+static ssize_t mmio_size_show_master(struct device *device,
+				     struct device_attribute *attr,
+				     char *buf)
+{
+	struct cxl_afu_t *afu = to_afu_chardev_m(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", afu->adapter->ps_size);
+}
+
+static ssize_t pp_mmio_off_show(struct device *device,
+				struct device_attribute *attr,
+				char *buf)
+{
+	struct cxl_afu_t *afu = to_afu_chardev_m(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", afu->pp_offset);
+}
+
+static ssize_t pp_mmio_len_show(struct device *device,
+				struct device_attribute *attr,
+				char *buf)
+{
+	struct cxl_afu_t *afu = to_afu_chardev_m(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", afu->pp_size);
+}
+
+static struct device_attribute afu_master_attrs[] = {
+	__ATTR(mmio_size, S_IRUGO, mmio_size_show_master, NULL),
+	__ATTR_RO(pp_mmio_off),
+	__ATTR_RO(pp_mmio_len),
+};
+
+
+/*********  AFU attributes  **************************************************/
+
+static ssize_t mmio_size_show(struct device *device,
+			      struct device_attribute *attr,
+			      char *buf)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+
+	if (afu->pp_size)
+		return scnprintf(buf, PAGE_SIZE, "%llu\n", afu->pp_size);
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", afu->adapter->ps_size);
+}
+
+static ssize_t reset_store_afu(struct device *device,
+			       struct device_attribute *attr,
+			       const char *buf, size_t count)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+	int rc;
+
+	if ((rc = cxl_ops->afu_reset(afu)))
+		return rc;
+	return count;
+}
+
+static ssize_t irqs_min_show(struct device *device,
+			     struct device_attribute *attr,
+			     char *buf)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%i\n", afu->pp_irqs);
+}
+
+static ssize_t irqs_max_show(struct device *device,
+				  struct device_attribute *attr,
+				  char *buf)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%i\n", afu->irqs_max);
+}
+
+static ssize_t irqs_max_store(struct device *device,
+				  struct device_attribute *attr,
+				  const char *buf, size_t count)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+	ssize_t ret;
+	int irqs_max;
+
+	ret = sscanf(buf, "%i", &irqs_max);
+	if (ret != 1)
+		return -EINVAL;
+
+	if (irqs_max < afu->pp_irqs)
+		return -EINVAL;
+
+	if (irqs_max > afu->adapter->user_irqs)
+		return -EINVAL;
+
+	afu->irqs_max = irqs_max;
+	return count;
+}
+
+static ssize_t models_supported_show(struct device *device,
+				    struct device_attribute *attr,
+				    char *buf)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+	char *p = buf, *end = buf + PAGE_SIZE;
+
+	if (afu->models_supported & CXL_MODEL_DEDICATED)
+		p += scnprintf(p, end - p, "dedicated_process\n");
+	if (afu->models_supported & CXL_MODEL_DIRECTED)
+		p += scnprintf(p, end - p, "afu_directed\n");
+	return (p - buf);
+}
+
+static ssize_t prefault_mode_show(struct device *device,
+				  struct device_attribute *attr,
+				  char *buf)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+
+	switch (afu->prefault_mode) {
+	case CXL_PREFAULT_WED:
+		return scnprintf(buf, PAGE_SIZE, "wed\n");
+	case CXL_PREFAULT_ALL:
+		return scnprintf(buf, PAGE_SIZE, "all\n");
+	default:
+		return scnprintf(buf, PAGE_SIZE, "none\n");
+	}
+}
+
+static ssize_t prefault_mode_store(struct device *device,
+			  struct device_attribute *attr,
+			  const char *buf, size_t count)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+	enum prefault_modes mode = -1;
+
+	if (!strncmp(buf, "wed", 3))
+		mode = CXL_PREFAULT_WED;
+	if (!strncmp(buf, "all", 3))
+		mode = CXL_PREFAULT_ALL;
+	if (!strncmp(buf, "none", 4))
+		mode = CXL_PREFAULT_NONE;
+
+	if (mode == -1)
+		return -EINVAL;
+
+	afu->prefault_mode = mode;
+	return count;
+}
+
+static ssize_t model_show(struct device *device,
+			 struct device_attribute *attr,
+			 char *buf)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+
+	if (afu->current_model == CXL_MODEL_DEDICATED)
+		return scnprintf(buf, PAGE_SIZE, "dedicated_process\n");
+	if (afu->current_model == CXL_MODEL_DIRECTED)
+		return scnprintf(buf, PAGE_SIZE, "afu_directed\n");
+	return scnprintf(buf, PAGE_SIZE, "none\n");
+}
+
+static ssize_t model_store(struct device *device,
+			   struct device_attribute *attr,
+			   const char *buf, size_t count)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+	int old_model, model = -1;
+	int rc = -EBUSY;
+
+	/* can't change this if we have a user */
+	spin_lock(&afu->contexts_lock);
+	if (!idr_is_empty(&afu->contexts_idr))
+		goto err;
+
+	if (!strncmp(buf, "dedicated_process", 17))
+		model = CXL_MODEL_DEDICATED;
+	if (!strncmp(buf, "afu_directed", 12))
+		model = CXL_MODEL_DIRECTED;
+	if (!strncmp(buf, "none", 4))
+		model = 0;
+
+	if (model == -1) {
+		rc = -EINVAL;
+		goto err;
+	}
+
+	/* cxl_afu_deactivate_model needs to be done outside the lock, prevent
+	 * other contexts coming in before we are ready: */
+	old_model = afu->current_model;
+	afu->current_model = 0;
+	afu->num_procs = 0;
+
+	spin_unlock(&afu->contexts_lock);
+
+	if ((rc = _cxl_afu_deactivate_model(afu, old_model)))
+		return rc;
+	if ((rc = cxl_afu_activate_model(afu, model)))
+		return rc;
+
+	return count;
+err:
+	spin_unlock(&afu->contexts_lock);
+	return rc;
+}
+
+static struct device_attribute afu_attrs[] = {
+	__ATTR_RO(mmio_size),
+	__ATTR_RO(irqs_min),
+	__ATTR_RW(irqs_max),
+	__ATTR_RO(models_supported),
+	__ATTR_RW(model),
+	__ATTR_RW(prefault_mode),
+	__ATTR(reset, S_IWUSR, NULL, reset_store_afu),
+};
+
+
+
+int cxl_sysfs_adapter_add(struct cxl_t *adapter)
+{
+	int i, rc;
+
+	for (i = 0; i < ARRAY_SIZE(adapter_attrs); i++) {
+		if ((rc = device_create_file(&adapter->dev, &adapter_attrs[i])))
+			goto err;
+	}
+	return 0;
+err:
+	for (i--; i >= 0; i--)
+		device_remove_file(&adapter->dev, &adapter_attrs[i]);
+	return rc;
+}
+EXPORT_SYMBOL(cxl_sysfs_adapter_add);
+void cxl_sysfs_adapter_remove(struct cxl_t *adapter)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(adapter_attrs); i++)
+		device_remove_file(&adapter->dev, &adapter_attrs[i]);
+}
+EXPORT_SYMBOL(cxl_sysfs_adapter_remove);
+
+int cxl_sysfs_afu_add(struct cxl_afu_t *afu)
+{
+	int afu_attr, mstr_attr, rc = 0;
+
+	for (afu_attr = 0; afu_attr < ARRAY_SIZE(afu_attrs); afu_attr++) {
+		if ((rc = device_create_file(&afu->dev, &afu_attrs[afu_attr])))
+			goto err;
+	}
+	for (mstr_attr = 0; mstr_attr < ARRAY_SIZE(afu_master_attrs); mstr_attr++) {
+		if ((rc = device_create_file(afu->chardev_m, &afu_master_attrs[mstr_attr])))
+			goto err1;
+	}
+
+	return 0;
+
+err1:
+	for (mstr_attr--; mstr_attr >= 0; mstr_attr--)
+		device_remove_file(afu->chardev_m, &afu_master_attrs[mstr_attr]);
+err:
+	for (afu_attr--; afu_attr >= 0; afu_attr--)
+		device_remove_file(&afu->dev, &afu_attrs[afu_attr]);
+	return rc;
+}
+EXPORT_SYMBOL(cxl_sysfs_afu_add);
+
+void cxl_sysfs_afu_remove(struct cxl_afu_t *afu)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(afu_master_attrs); i++)
+		device_remove_file(afu->chardev_m, &afu_master_attrs[i]);
+	for (i = 0; i < ARRAY_SIZE(afu_attrs); i++)
+		device_remove_file(&afu->dev, &afu_attrs[i]);
+}
+EXPORT_SYMBOL(cxl_sysfs_afu_remove);
-- 
1.9.1

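As a cross-check of the arithmetic in spa_max_procs() above, the sketch below
recomputes n for a given SPA size and compares it against the aligned CAIA
expression quoted in that comment.  It is a standalone userspace fragment, not
part of the patch: spa_bytes_needed() and spa_fits() are hypothetical helper
names, and it assumes the quoted expression is the complete size requirement.

```c
#include <assert.h>

/* Same arithmetic as spa_max_procs() in native.c above. */
static int spa_max_procs(int spa_size)
{
	return ((spa_size / 8) - 96) / 17;
}

/*
 * Hypothetical helper: bytes needed for n process elements, taken from
 * the CAIA expression quoted in the comment
 * (end_of_SPA_area - SPA_Base + 1, alignment term included).
 */
static long spa_bytes_needed(long n)
{
	return ((n + 4) * 128) + ((((n * 8) + 127) >> 7) * 128) + 255 + 1;
}

/* Hypothetical helper: non-zero if the computed maximum fits in spa_size. */
static int spa_fits(int spa_size)
{
	int n = spa_max_procs(spa_size);

	assert(n >= 0);	/* sizes below 768 bytes cannot hold any entries */
	return spa_bytes_needed(n) <= spa_size;
}
```

For example, a 131072 byte SPA (two 64K pages) yields 958 entries, and the
aligned expression for 958 entries comes to exactly 131072 bytes, so ignoring
the alignment is harmless for that size.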

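The sysfs attributes added in sysfs.c are plain text files, so switching the
model from userspace could look roughly like the sketch below.  write_attr()
is a hypothetical helper, not something the driver provides; the path
/sys/class/cxl/afu0.0/model and the accepted strings ("dedicated_process",
"afu_directed", "none") are taken from this series.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a string to an existing attribute file.
 * Returns 0 on success or a negative errno value.
 */
static int write_attr(const char *path, const char *value)
{
	size_t len;
	ssize_t n;
	int fd, err;

	assert(path && value);
	len = strlen(value);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, value, len);
	err = errno;		/* save before close() can clobber it */
	close(fd);

	if (n < 0)
		return -err;
	return (size_t)n == len ? 0 : -EIO;
}
```

A caller would use write_attr("/sys/class/cxl/afu0.0/model", "afu_directed").
Going by model_store() above, this is expected to fail with -EBUSY while any
context is still attached and with -EINVAL for an unrecognised string.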

* [PATCH 13/15] cxl: Userspace header file.
  2014-09-18  8:26 [PATCH 0/15] POWER8 Coherent Accelerator device driver Michael Neuling
                   ` (11 preceding siblings ...)
  2014-09-18  8:26 ` [PATCH 12/15] cxl: Driver code for powernv PCIe based cards for userspace access Michael Neuling
@ 2014-09-18  8:26 ` Michael Neuling
  2014-09-18  8:26 ` [PATCH 14/15] cxl: Add driver to Kbuild and Makefiles Michael Neuling
  2014-09-18  8:27 ` [PATCH 15/15] cxl: Add documentation for userspace APIs Michael Neuling
  14 siblings, 0 replies; 43+ messages in thread
From: Michael Neuling @ 2014-09-18  8:26 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie, cbe-oss-dev

From: Ian Munsie <imunsie@au1.ibm.com>

This defines structs and magic numbers required for userspace to interact with
the kernel cxl driver via /dev/cxl/afu0.0.

It also adds this header file to Kbuild so it's exported when doing make
headers_install.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 include/uapi/Kbuild      |  1 +
 include/uapi/misc/Kbuild |  2 ++
 include/uapi/misc/cxl.h  | 88 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 91 insertions(+)
 create mode 100644 include/uapi/misc/Kbuild
 create mode 100644 include/uapi/misc/cxl.h

diff --git a/include/uapi/Kbuild b/include/uapi/Kbuild
index 81d2106..245aa6e 100644
--- a/include/uapi/Kbuild
+++ b/include/uapi/Kbuild
@@ -12,3 +12,4 @@ header-y += video/
 header-y += drm/
 header-y += xen/
 header-y += scsi/
+header-y += misc/
diff --git a/include/uapi/misc/Kbuild b/include/uapi/misc/Kbuild
new file mode 100644
index 0000000..e96cae7
--- /dev/null
+++ b/include/uapi/misc/Kbuild
@@ -0,0 +1,2 @@
+# misc Header export list
+header-y += cxl.h
diff --git a/include/uapi/misc/cxl.h b/include/uapi/misc/cxl.h
new file mode 100644
index 0000000..6a394b5
--- /dev/null
+++ b/include/uapi/misc/cxl.h
@@ -0,0 +1,88 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _UAPI_MISC_CXL_H
+#define _UAPI_MISC_CXL_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+/* ioctls */
+struct cxl_ioctl_start_work {
+	__u64 wed;
+	__u64 amr;
+	__u64 reserved1;
+	__u32 reserved2;
+	__s16 num_interrupts; /* -1 = use value from afu descriptor */
+	__u16 process_element; /* returned from kernel */
+	__u64 reserved3;
+	__u64 reserved4;
+	__u64 reserved5;
+	__u64 reserved6;
+};
+
+#define CXL_MAGIC 0xCA
+#define CXL_IOCTL_START_WORK      _IOWR(CXL_MAGIC, 0x00, struct cxl_ioctl_start_work)
+#define CXL_IOCTL_CHECK_ERROR     _IO(CXL_MAGIC,   0x02)
+
+/* events from read() */
+
+enum cxl_event_type {
+	CXL_EVENT_READ_FAIL     = -1,
+	CXL_EVENT_RESERVED      = 0,
+	CXL_EVENT_AFU_INTERRUPT = 1,
+	CXL_EVENT_DATA_STORAGE  = 2,
+	CXL_EVENT_AFU_ERROR     = 3,
+};
+
+struct cxl_event_header {
+	__u32 type;
+	__u16 size;
+	__u16 process_element;
+	__u64 reserved1;
+	__u64 reserved2;
+	__u64 reserved3;
+};
+
+struct cxl_event_afu_interrupt {
+	struct cxl_event_header header;
+	__u16 irq; /* Raised AFU interrupt number */
+	__u16 reserved1;
+	__u32 reserved2;
+	__u64 reserved3;
+	__u64 reserved4;
+	__u64 reserved5;
+};
+
+struct cxl_event_data_storage {
+	struct cxl_event_header header;
+	__u64 addr;
+	__u64 reserved1;
+	__u64 reserved2;
+	__u64 reserved3;
+};
+
+struct cxl_event_afu_error {
+	struct cxl_event_header header;
+	__u64 err;
+	__u64 reserved1;
+	__u64 reserved2;
+	__u64 reserved3;
+};
+
+struct cxl_event {
+	union {
+		struct cxl_event_header header;
+		struct cxl_event_afu_interrupt irq;
+		struct cxl_event_data_storage fault;
+		struct cxl_event_afu_error afu_err;
+	};
+};
+
+#endif
-- 
1.9.1


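To illustrate how the event structures above might be consumed, here is a
minimal sketch that walks a buffer as returned by read() and counts events of
one type.  It is not part of the patch.  The header struct is a local mirror
using <stdint.h> types so that it builds without the installed header,
count_events() is a hypothetical helper, and it ASSUMES that header.size holds
the full length of each event in bytes, header included; this patch does not
state that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct cxl_event_header from the patch above. */
struct cxl_event_header {
	uint32_t type;
	uint16_t size;
	uint16_t process_element;
	uint64_t reserved1;
	uint64_t reserved2;
	uint64_t reserved3;
};

/* Local mirror of the event types from the patch above. */
enum {
	CXL_EVENT_AFU_INTERRUPT = 1,
	CXL_EVENT_DATA_STORAGE  = 2,
	CXL_EVENT_AFU_ERROR     = 3,
};

/*
 * Hypothetical helper: count events of the given type in buf.
 * ASSUMPTION: hdr.size is the whole event length, header included.
 * Returns -1 if an event is malformed or runs past the end of buf.
 */
static int count_events(const unsigned char *buf, size_t len, uint32_t type)
{
	size_t off = 0;
	int n = 0;

	assert(buf != NULL);

	while (len - off >= sizeof(struct cxl_event_header)) {
		struct cxl_event_header hdr;

		/* Copy out the header to avoid unaligned accesses. */
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			return -1;
		if (hdr.type == type)
			n++;
		off += hdr.size;
	}
	return n;
}
```

Stepping by the size field rather than by sizeof(struct cxl_event) is a design
choice in this sketch: it keeps the loop independent of which member of the
union a given event uses.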

* [PATCH 14/15] cxl: Add driver to Kbuild and Makefiles
  2014-09-18  8:26 [PATCH 0/15] POWER8 Coherent Accelerator device driver Michael Neuling
                   ` (12 preceding siblings ...)
  2014-09-18  8:26 ` [PATCH 13/15] cxl: Userspace header file Michael Neuling
@ 2014-09-18  8:26 ` Michael Neuling
  2014-09-18  8:27 ` [PATCH 15/15] cxl: Add documentation for userspace APIs Michael Neuling
  14 siblings, 0 replies; 43+ messages in thread
From: Michael Neuling @ 2014-09-18  8:26 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie, cbe-oss-dev

From: Ian Munsie <imunsie@au1.ibm.com>

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 drivers/misc/cxl/Kconfig  | 18 ++++++++++++++++++
 drivers/misc/cxl/Makefile |  3 +++
 2 files changed, 21 insertions(+)

diff --git a/drivers/misc/cxl/Kconfig b/drivers/misc/cxl/Kconfig
index 48533e1..d413e90 100644
--- a/drivers/misc/cxl/Kconfig
+++ b/drivers/misc/cxl/Kconfig
@@ -5,3 +5,21 @@
 config CXL_BASE
 	bool
 	default n
+
+config CXL
+	tristate "Support for IBM Coherent Accelerators (CXL)"
+	depends on PPC_POWERNV && PCI_MSI
+	select CXL_BASE
+	default m
+	help
+	  Select this option to enable userspace driver support for IBM
+	  Coherent Accelerators (CXL).  CXL is otherwise known as Coherent
+	  Accelerator Processor Interface (CAPI).
+
+config CXL_PCI
+	tristate "Support for CXL devices via PCI"
+	depends on CXL && PPC_POWERNV
+	default y
+	help
+	  Select this option to support CXL devices detected via PCI, e.g.
+	  when running under powernv/OPAL.
diff --git a/drivers/misc/cxl/Makefile b/drivers/misc/cxl/Makefile
index e30ad0a..96f292b 100644
--- a/drivers/misc/cxl/Makefile
+++ b/drivers/misc/cxl/Makefile
@@ -1 +1,4 @@
+cxl-y				+= main.o file.o irq.o fault.o native.o context.o sysfs.o debugfs.o
+obj-$(CONFIG_CXL)		+= cxl.o
+obj-$(CONFIG_CXL_PCI)		+= cxl-pci.o
 obj-$(CONFIG_CXL_BASE)		+= base.o
-- 
1.9.1



* [PATCH 15/15] cxl: Add documentation for userspace APIs
  2014-09-18  8:26 [PATCH 0/15] POWER8 Coherent Accelerator device driver Michael Neuling
                   ` (13 preceding siblings ...)
  2014-09-18  8:26 ` [PATCH 14/15] cxl: Add driver to Kbuild and Makefiles Michael Neuling
@ 2014-09-18  8:27 ` Michael Neuling
  14 siblings, 0 replies; 43+ messages in thread
From: Michael Neuling @ 2014-09-18  8:27 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie, cbe-oss-dev

From: Ian Munsie <imunsie@au1.ibm.com>

This documentation gives an overview of the hardware architecture, userspace
APIs via /dev/cxl/afu0.0 and the sysfs files.  It also adds a MAINTAINERS file
entry for cxl.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 Documentation/ABI/testing/sysfs-class-cxl | 125 ++++++++++++
 Documentation/ioctl/ioctl-number.txt      |   1 +
 Documentation/powerpc/00-INDEX            |   2 +
 Documentation/powerpc/cxl.txt             | 310 ++++++++++++++++++++++++++++++
 MAINTAINERS                               |   7 +
 5 files changed, 445 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-class-cxl
 create mode 100644 Documentation/powerpc/cxl.txt

diff --git a/Documentation/ABI/testing/sysfs-class-cxl b/Documentation/ABI/testing/sysfs-class-cxl
new file mode 100644
index 0000000..024921b
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-cxl
@@ -0,0 +1,125 @@
+Slave contexts (eg. /sys/class/cxl/afu0.0):
+
+What:		/sys/class/cxl/<afu>/irqs_max
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read/write
+		Maximum number of interrupts that can be requested by userspace.
+		The default on probe is the maximum that hardware can support
+		(eg. 2037).  Writing a value limits userspace applications to
+		that many interrupts.  Must be >= irqs_min.
+
+What:		/sys/class/cxl/<afu>/irqs_min
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		The minimum number of interrupts that userspace must request
+		on a CXL_IOCTL_START_WORK ioctl.  Userspace may request -1 in
+		the START_WORK ioctl to get this minimum automatically.
+
+What:		/sys/class/cxl/<afu>/mmio_size
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		Size of the MMIO space that may be mmaped by userspace.
+
+
+What:		/sys/class/cxl/<afu>/models_supported
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		List of the models this AFU supports.
+		Valid entries are: "dedicated_process" and "afu_directed"
+
+What:		/sys/class/cxl/<afu>/model
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read/write
+		The current model the AFU is using.  Will be one of the models
+		given in models_supported.  Writing changes the model, provided
+		that no user contexts are attached.
+
+
+What:		/sys/class/cxl/<afu>/prefault_mode
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read/write
+		Set the mode for prefaulting in segments into the segment table
+		when performing the START_WORK ioctl.  Possible values:
+			none: No prefaulting (default)
+			wed: Just prefault in the wed
+			all: all segments this process currently maps
+
+What:		/sys/class/cxl/<afu>/reset
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	write only
+		Reset the AFU.
+
+
+Master contexts (eg. /sys/class/cxl/afu0.0m):
+
+What:		/sys/class/cxl/<afu>m/mmio_size
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		Size of the MMIO space that may be mmaped by userspace.  This
+		also includes the space of all slave contexts.
+
+What:		/sys/class/cxl/<afu>m/pp_mmio_len
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		Per Process MMIO space length.
+
+What:		/sys/class/cxl/<afu>m/pp_mmio_off
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		Per Process MMIO space offset.
+
+
+Card info (eg. /sys/class/cxl/card0):
+
+What:		/sys/class/cxl/<card>/caia_version
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		Identifies the CAIA Version the card implements.
+
+What:		/sys/class/cxl/<card>/psl_version
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		Identifies the revision level of the PSL.
+
+What:		/sys/class/cxl/<card>/base_image
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		Identifies the revision level of the base image for devices
+		that support loadable PSLs.  For FPGAs this field identifies
+		the image contained in the on-adapter flash which is loaded
+		during the initial program load.
+
+What:		/sys/class/cxl/<card>/image_loaded
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		Will return "user" or "factory" depending on the image loaded
+		onto the card
+
diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 7e240a7..8136e1f 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -313,6 +313,7 @@ Code  Seq#(hex)	Include File		Comments
 0xB1	00-1F	PPPoX			<mailto:mostrows@styx.uwaterloo.ca>
 0xB3	00	linux/mmc/ioctl.h
 0xC0	00-0F	linux/usb/iowarrior.h
+0xCA	00-0F	uapi/misc/cxl.h
 0xCB	00-1F	CBM serial IEC bus	in development:
 					<mailto:michael.klein@puffin.lb.shuttle.de>
 0xCD	01	linux/reiserfs_fs.h
diff --git a/Documentation/powerpc/00-INDEX b/Documentation/powerpc/00-INDEX
index a68784d..116d94d 100644
--- a/Documentation/powerpc/00-INDEX
+++ b/Documentation/powerpc/00-INDEX
@@ -28,3 +28,5 @@ ptrace.txt
 	- Information on the ptrace interfaces for hardware debug registers.
 transactional_memory.txt
 	- Overview of the Power8 transactional memory support.
+cxl.txt
+	- Overview of the CXL driver.
diff --git a/Documentation/powerpc/cxl.txt b/Documentation/powerpc/cxl.txt
new file mode 100644
index 0000000..f23e675
--- /dev/null
+++ b/Documentation/powerpc/cxl.txt
@@ -0,0 +1,310 @@
+Coherent Accelerator Interface (CXL)
+====================================
+
+Introduction
+============
+
+    The coherent accelerator interface is designed to allow the
+    coherent connection of FPGA based accelerators (and other devices)
+    to a POWER system.  These devices need to adhere to the Coherent
+    Accelerator Interface Architecture (CAIA).
+
+    IBM refers to this as the Coherent Accelerator Processor Interface
+    or CAPI.  In the kernel it's referred to by the name CXL to avoid
+    confusion with the ISDN CAPI subsystem.
+
+Hardware overview
+=================
+
+          POWER8               FPGA
+       +----------+        +---------+
+       |          |        |         |
+       |   CPU    |        |   AFU   |
+       |          |        |         |
+       |          |        |         |
+       |          |        |         |
+       +----------+        +---------+
+       |          |        |         |
+       |   CAPP   +--------+   PSL   |
+       |          |  PCIe  |         |
+       +----------+        +---------+
+
+    The POWER8 chip has a Coherently Attached Processor Proxy (CAPP)
+    unit which is part of the PCIe Host Bridge (PHB).  This is managed
+    by Linux by calls into OPAL.  Linux doesn't directly program the
+    CAPP.
+
+    The FPGA (or coherently attached device) consists of two parts:
+    the POWER Service Layer (PSL) and the Accelerator Function Unit
+    (AFU).  The AFU implements the specific functionality behind
+    the PSL.  The PSL, among other things, provides memory address
+    translation services to allow each AFU direct access to userspace
+    memory.
+
+    The AFU is the core part of the accelerator (eg. the compression,
+    crypto etc function).  The kernel has no knowledge of the function
+    of the AFU.  Only userspace interacts directly with the AFU.
+
+    The PSL provides the translation and interrupt services that the
+    AFU needs.  This is what the kernel interacts with.  For example,
+    if the AFU needs to read a particular virtual address, it sends
+    that address to the PSL, the PSL then translates it, fetches the
+    data from memory and returns it to the AFU.  If the PSL has a
+    translation miss, it interrupts the kernel and the kernel services
+    the fault.  The context in which this fault is serviced depends
+    on who owns that accelerator function.
+
+AFU Models
+==========
+
+    There are two programming models supported by the AFU: dedicated
+    and AFU directed.  An AFU may support one or both models.
+
+    In the dedicated model only one MMU context is supported.  In this
+    model, only one userspace process can use the accelerator at a time.
+
+    In the AFU directed model, up to 16K simultaneous contexts can be
+    supported.  This means up to 16K simultaneous userspace
+    applications may use the accelerator (although specific AFUs may
+    support fewer).  In this model, the AFU sends a 16 bit context ID
+    with each of its requests.  This tells the PSL which context is
+    associated with this operation.  If the PSL can't translate a
+    request, the ID can also be accessed by the kernel so it can
+    determine the associated userspace context to service this
+    translation with.
+
+MMIO space
+==========
+
+    A portion of the FPGA MMIO space can be directly mapped from the
+    AFU to userspace.  Either the whole space can be mapped (master
+    context), or just a per context portion (slave context).  The
+    hardware is self describing, hence the kernel can determine the
+    offset and size of the per context portion.
+
+Interrupts
+==========
+
+    AFUs may generate interrupts that are destined for userspace.  These
+    are received by the kernel as hardware interrupts and passed on to
+    userspace.
+
+    Data storage faults and error interrupts are handled by the kernel
+    driver.
+
+Work Element Descriptor (WED)
+=============================
+
+    The WED is a 64-bit parameter passed to the AFU when a context is
+    started.  Its format is up to the AFU, hence the kernel has no
+    knowledge of what it represents.  Typically it will be a virtual
+    address pointing to a structure in memory through which the AFU and
+    userspace share control and status information or work queues.
+
+
+
+
+User API
+========
+
+    The driver will create two character devices per AFU under
+    /dev/cxl.  One for master and one for slave contexts.
+
+    The master context (eg. /dev/cxl/afu0.0m) has access to all of
+    the MMIO space that an AFU provides.  The slave context
+    (eg. /dev/cxl/afu0.0) has access to only the per process MMIO
+    space an AFU provides (AFU directed only).
+
+    The following file operations are supported on both slave and
+    master devices:
+
+    open
+
+        Opens the device and allocates a file descriptor to be used
+        with the rest of the API.  This may be opened multiple times,
+        depending on how many contexts the AFU supports.
+
+        A dedicated model AFU only has one context and hence only
+        allows this device to be opened once.
+
+        An AFU directed model AFU can have many contexts and hence this
+        device can be opened once per available context.
+
+        Note: IRQs also need to be allocated per context, which may
+              also limit the number of contexts that can be allocated.
+              The POWER8 CAPP supports 2040 IRQs and 3 are used by the
+              kernel, so 2037 are left.  If 1 IRQ is needed per
+              context, then only 2037 contexts can be allocated.  If 4
+              IRQs are needed per context, then only 2037/4 = 509
+              contexts can be allocated.
+
+    ioctl
+
+        CXL_IOCTL_START_WORK:
+            Starts the AFU and associates it with the process memory
+            context.  Once this ioctl is successfully executed, all
+            memory mapped into this process is accessible to this AFU
+            context using the same virtual addresses.  No additional
+            calls are required to un/map memory.  The AFU context will
+            be updated as userspace allocates and frees memory.  This
+            ioctl returns once the context is started.
+
+            Takes a pointer to a struct cxl_ioctl_start_work
+                    struct cxl_ioctl_start_work {
+                            __u64 wed;
+                            __u64 amr;
+                            __u64 reserved1;
+                            __u32 reserved2;
+                            __s16 num_interrupts;
+                            __u16 process_element;
+                            __u64 reserved3;
+                            __u64 reserved4;
+                            __u64 reserved5;
+                            __u64 reserved6;
+                    };
+
+                wed: 64bit argument defined by the AFU.  Typically
+                    this is a virtual address pointing to an AFU
+                    specific structure describing what work to
+                    perform.
+
+                amr:
+                    Authority Mask Register (AMR), same as the powerpc
+                    AMR.
+
+                num_interrupts:
+                    Number of userspace interrupts to request.  The
+                    minimum required is given in sysfs, and -1 will
+                    automatically allocate this minimum.  The maximum
+                    is also given in sysfs.
+
+                process_element:
+                    Written by the kernel with the context id (AKA
+                    process element) it allocates.  Slave contexts may
+                    want to communicate this to a master process.
+
+                reserved fields:
+                    For ABI padding and future extensions
+
+        CXL_IOCTL_CHECK_ERROR:
+            This checks to see if the AFU has encountered an error and
+            if so resets it.  If userspace is accessing MMIO space, it
+            may notice an EEH fence (all ones on read) before the kernel,
+            hence it needs to inform the kernel of this.
+
+        CXL_IOCTL_LOAD_AFU_IMAGE:
+            Future work: to dynamically load AFU FPGA images.  Without
+            this, the AFU is assumed to be pre-loaded on the card.
+
+    mmap
+
+        An AFU may have an MMIO space to facilitate communication with
+        the AFU and mmap allows access to this.  The size and contents
+        of this area are specific to the particular AFU.  The size can
+        be discovered via sysfs.  A read of all ones indicates the AFU
+        has encountered an error and CXL_IOCTL_CHECK_ERROR should be
+        used to recover the AFU.
+
+        Master contexts will get all of the MMIO space.  Slave
+        contexts will get only the per process space associated with
+        their context.
+
+        This mmap call must be done after the START_WORK ioctl.
+
+        Care should be taken when accessing MMIO space.  Only 32-bit
+        and 64-bit accesses are supported by POWER8.  Also, the AFU will
+        be designed with a specific endianness, so all MMIO accesses
+        should take it into account (the endian(3) variants such as
+        le64toh() and be64toh() are recommended).  These endian issues
+        equally apply to shared memory queues the WED may describe.
+
+    read
+
+        Reads an event from the AFU.  Will return -EINVAL if the buffer
+        does not contain enough space to write the struct
+        cxl_event_header.  Blocks if no events are pending.  Will
+        return -EIO in the case of an unrecoverable error or if the
+        card is removed.
+
+        All events will return a struct cxl_event which is always the
+        same size.  A struct cxl_event_header at the start gives:
+                struct cxl_event_header {
+                        __u32 type;
+                        __u16 size;
+                        __u16 process_element;
+                        __u64 reserved1;
+                        __u64 reserved2;
+                        __u64 reserved3;
+                };
+
+            type:
+                This gives the type of the event, which determines how
+                the rest of the event is structured.  It can be one of:
+                AFU interrupt, data storage fault or AFU error.
+
+            size:
+                This is always sizeof(struct cxl_event)
+
+            process_element:
+                Context ID of the event.  Currently this will always
+                be the current context.  Future work may allow
+                interrupts from one context to be routed to another
+                (eg. a master context handling error interrupts on
+                behalf of a slave).
+
+            reserved fields:
+                For future extensions
+
+        If an AFU interrupt event is received, the full structure received is:
+                struct cxl_event_afu_interrupt {
+                        struct cxl_event_header header;
+                        __u16 irq;
+                        __u16 reserved1;
+                        __u32 reserved2;
+                        __u64 reserved3;
+                        __u64 reserved4;
+                        __u64 reserved5;
+                };
+            irq:
+                The IRQ number sent by the AFU.
+
+            reserved fields:
+                For future extensions
+
+        If a data storage event is received, the full structure received is:
+                struct cxl_event_data_storage {
+                        struct cxl_event_header header;
+                        __u64 addr;
+                        __u64 reserved1;
+                        __u64 reserved2;
+                        __u64 reserved3;
+                };
+            addr:
+                Address of the data storage that the AFU was trying
+                to access.  Valid accesses will be handled
+                transparently by the kernel but invalid accesses will
+                generate this event.
+
+            reserved fields:
+                For future extensions
+
+        If an AFU error event is received, the full structure received is:
+                struct cxl_event_afu_error {
+                        struct cxl_event_header header;
+                        __u64 err;
+                        __u64 reserved1;
+                        __u64 reserved2;
+                        __u64 reserved3;
+                };
+            err:
+                Error status from the AFU.  AFU defined.
+
+            reserved fields:
+                For future extensions
+
+Sysfs Class
+===========
+
+    A cxl sysfs class is added under /sys/class/cxl to facilitate
+    enumeration and tuning of the accelerators. Its layout is
+    described in Documentation/ABI/testing/sysfs-class-cxl
diff --git a/MAINTAINERS b/MAINTAINERS
index 809ecd6..c972be3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2711,6 +2711,13 @@ W:	http://www.chelsio.com
 S:	Supported
 F:	drivers/net/ethernet/chelsio/cxgb4vf/
 
+CXL (IBM Coherent Accelerator Processor Interface CAPI) DRIVER
+M:	Ian Munsie <imunsie@au1.ibm.com>
+M:	Michael Neuling <mikey@neuling.org>
+L:	linuxppc-dev@lists.ozlabs.org
+S:	Supported
+F:	drivers/misc/cxl/
+
 STMMAC ETHERNET DRIVER
 M:	Giuseppe Cavallaro <peppe.cavallaro@st.com>
 L:	netdev@vger.kernel.org
-- 
1.9.1
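
As a quick sanity check on the layouts documented in cxl.txt, the
sketch below restates the structures and asserts at compile time that
the start-work argument and every event variant come to 64 bytes,
which is what "always the same size" implies.  The <stdint.h> types
stand in for __u64/__u32/__u16/__s16; that substitution is an
assumption made so the sketch builds outside the kernel tree, and the
authoritative definitions remain those in include/uapi/misc/cxl.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Layouts restated from cxl.txt; uintN_t stands in for __uN (assumption). */
struct cxl_ioctl_start_work {
	uint64_t wed;
	uint64_t amr;
	uint64_t reserved1;
	uint32_t reserved2;
	int16_t  num_interrupts;
	uint16_t process_element;
	uint64_t reserved3;
	uint64_t reserved4;
	uint64_t reserved5;
	uint64_t reserved6;
};

struct cxl_event_header {
	uint32_t type;
	uint16_t size;
	uint16_t process_element;
	uint64_t reserved1;
	uint64_t reserved2;
	uint64_t reserved3;
};

struct cxl_event_afu_interrupt {
	struct cxl_event_header header;
	uint16_t irq;
	uint16_t reserved1;
	uint32_t reserved2;
	uint64_t reserved3;
	uint64_t reserved4;
	uint64_t reserved5;
};

struct cxl_event_data_storage {
	struct cxl_event_header header;
	uint64_t addr;
	uint64_t reserved1;
	uint64_t reserved2;
	uint64_t reserved3;
};

struct cxl_event_afu_error {
	struct cxl_event_header header;
	uint64_t err;
	uint64_t reserved1;
	uint64_t reserved2;
	uint64_t reserved3;
};

/* No implicit padding: the fields pack naturally into 32 and 64 bytes. */
_Static_assert(sizeof(struct cxl_ioctl_start_work) == 64, "start_work size");
_Static_assert(offsetof(struct cxl_ioctl_start_work, process_element) == 30,
	       "process_element offset");
_Static_assert(sizeof(struct cxl_event_header) == 32, "header size");
_Static_assert(sizeof(struct cxl_event_afu_interrupt) == 64, "irq event size");
_Static_assert(sizeof(struct cxl_event_data_storage) == 64, "dsi event size");
_Static_assert(sizeof(struct cxl_event_afu_error) == 64, "error event size");
```

A read buffer of 64 bytes is therefore enough for any of the three
event types as documented here.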

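The context-count note in the "open" section of cxl.txt reduces to an
integer division.  A small sketch using the figures quoted there (2040
IRQs on the POWER8 CAPP, 3 used by the kernel, up to 16K contexts);
the helper name max_contexts_for() is made up for this example, and
treating a request for zero IRQs as "limited only by the context
count" is an assumption:

```c
/* Figures quoted in cxl.txt. */
enum {
	CAPP_TOTAL_IRQS = 2040,  /* IRQs supported by the POWER8 CAPP */
	KERNEL_IRQS     = 3,     /* IRQs used by the kernel */
	MAX_CONTEXTS    = 16384  /* 16K contexts in the AFU directed model */
};

/* Upper bound on contexts when each one needs irqs_per_context IRQs. */
static int max_contexts_for(int irqs_per_context)
{
	int usable = CAPP_TOTAL_IRQS - KERNEL_IRQS;  /* 2037 */
	int by_irq;

	/* Assumption: with no IRQs requested only the context limit applies. */
	if (irqs_per_context <= 0)
		return MAX_CONTEXTS;

	by_irq = usable / irqs_per_context;  /* integer division rounds down */
	return by_irq < MAX_CONTEXTS ? by_irq : MAX_CONTEXTS;
}
```

With 1 IRQ per context this gives 2037, and with 4 it gives 509,
matching the numbers in the note.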

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH 01/15] powerpc/cell: Move spu_handle_mm_fault() out of cell platform
  2014-09-18  8:26 ` [PATCH 01/15] powerpc/cell: Move spu_handle_mm_fault() out of cell platform Michael Neuling
@ 2014-09-18 10:00   ` Jeremy Kerr
  2014-09-18 23:26     ` Michael Neuling
  2014-09-26  3:57   ` Anton Blanchard
  1 sibling, 1 reply; 43+ messages in thread
From: Jeremy Kerr @ 2014-09-18 10:00 UTC (permalink / raw)
  To: Michael Neuling, greg, arnd, mpe, benh
  Cc: anton, linux-kernel, linuxppc-dev, imunsie, cbe-oss-dev

Hi Mikey & Ian,

> Currently spu_handle_mm_fault() is in the cell platform.
> 
> This code is generically useful for other non-cell co-processors on powerpc.
> 
> This patch moves this function out of the cell platform into arch/powerpc/mm so
> that others may use it.

Makes sense.

Acked-by: Jeremy Kerr <jk@ozlabs.org>

> @@ -58,12 +56,12 @@ int spu_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
>  			goto out_unlock;
>  	}
>  
> -	is_write = dsisr & MFC_DSISR_ACCESS_PUT;
> +	is_write = dsisr & DSISR_ISSTORE;
>  	if (is_write) {
>  		if (!(vma->vm_flags & VM_WRITE))
>  			goto out_unlock;
>  	} else {
> -		if (dsisr & MFC_DSISR_ACCESS_DENIED)
> +		if (dsisr & DSISR_PROTFAULT)
>  			goto out_unlock;
>  		if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
>  			goto out_unlock;

Consistent DSISR encodings? woot! :)

Cheers,


Jeremy

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 02/15] powerpc/cell: Move data segment faulting code out of cell platform
  2014-09-18  8:26 ` [PATCH 02/15] powerpc/cell: Move data segment faulting code " Michael Neuling
@ 2014-09-18 10:27   ` Jeremy Kerr
  2014-09-18 23:45     ` Michael Neuling
  2014-09-26  4:05   ` Anton Blanchard
  2014-09-29  8:30   ` Aneesh Kumar K.V
  2 siblings, 1 reply; 43+ messages in thread
From: Jeremy Kerr @ 2014-09-18 10:27 UTC (permalink / raw)
  To: Michael Neuling, greg, arnd, mpe, benh
  Cc: anton, linux-kernel, linuxppc-dev, imunsie, cbe-oss-dev

Hi Mikey & Ian,

> __spu_trap_data_seg() currently contains code to determine the VSID and ESID
> required for a particular EA and mm struct.
> 
> This code is generically useful for other co-processors.  This moves the code
> out of the cell platform so it can be used by other powerpc code.

OK, nice.

> +
> +int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid)
> +{
> +	int psize, ssize;
> +
> +	*esid = (ea & ESID_MASK) | SLB_ESID_V;
> +
> +	switch (REGION_ID(ea)) {
> +	case USER_REGION_ID:
> +		pr_devel("copro_data_segment: 0x%llx -- USER_REGION_ID\n", ea);
> +#ifdef CONFIG_PPC_MM_SLICES
> +		psize = get_slice_psize(mm, ea);
> +#else
> +		psize = mm->context.user_psize;
> +#endif
> +		ssize = user_segment_size(ea);
> +		*vsid = (get_vsid(mm->context.id, ea, ssize)
> +			<< slb_vsid_shift(ssize)) | SLB_VSID_USER
> +			| (ssize == MMU_SEGSIZE_1T ? SLB_VSID_B_1T : 0);
> +		break;
> +	case VMALLOC_REGION_ID:
> +		pr_devel("copro_data_segment: 0x%llx -- VMALLOC_REGION_ID\n", ea);
> +		if (ea < VMALLOC_END)
> +			psize = mmu_vmalloc_psize;
> +		else
> +			psize = mmu_io_psize;
> +		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
> +			<< SLB_VSID_SHIFT) | SLB_VSID_KERNEL
> +			| (mmu_kernel_ssize == MMU_SEGSIZE_1T ? SLB_VSID_B_1T : 0);
> +		break;
> +	case KERNEL_REGION_ID:
> +		pr_devel("copro_data_segment: 0x%llx -- KERNEL_REGION_ID\n", ea);
> +		psize = mmu_linear_psize;
> +		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
> +			<< SLB_VSID_SHIFT) | SLB_VSID_KERNEL
> +			| (mmu_kernel_ssize == MMU_SEGSIZE_1T ? SLB_VSID_B_1T : 0);
> +		break;
> +	default:
> +		/* Future: support kernel segments so that drivers can use the
> +		 * CoProcessors */
> +		pr_debug("invalid region access at %016llx\n", ea);
> +		return 1;
> +	}
> +	*vsid |= mmu_psize_defs[psize].sllp;

A bit of a nitpick, but how about you remove the repeated:

		| (<size> == MMU_SEGSIZE_1T ? SLB_VSID_B_1T : 0)

then set ssize in each of the switch cases (like we do with psize), and
or-in the VSID_B_1T bit at the end:
	
	*vsid |= mmu_psize_defs[psize].sllp
		| (ssize == MMU_SEGSIZE_1T ? SLB_VSID_B_1T : 0);

Otherwise, looks good to me.

Cheers,


Jeremy


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 01/15] powerpc/cell: Move spu_handle_mm_fault() out of cell platform
  2014-09-18 10:00   ` Jeremy Kerr
@ 2014-09-18 23:26     ` Michael Neuling
  0 siblings, 0 replies; 43+ messages in thread
From: Michael Neuling @ 2014-09-18 23:26 UTC (permalink / raw)
  To: Jeremy Kerr
  Cc: greg, arnd, mpe, benh, anton, linux-kernel, linuxppc-dev,
	imunsie, cbe-oss-dev

> > @@ -58,12 +56,12 @@ int spu_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
> >  			goto out_unlock;
> >  	}
> >  
> > -	is_write = dsisr & MFC_DSISR_ACCESS_PUT;
> > +	is_write = dsisr & DSISR_ISSTORE;
> >  	if (is_write) {
> >  		if (!(vma->vm_flags & VM_WRITE))
> >  			goto out_unlock;
> >  	} else {
> > -		if (dsisr & MFC_DSISR_ACCESS_DENIED)
> > +		if (dsisr & DSISR_PROTFAULT)
> >  			goto out_unlock;
> >  		if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
> >  			goto out_unlock;
> 
> Consistent DSISR encodings? woot! :)

Yep!

arch/powerpc/include/asm/spu.h:605:#define MFC_DSISR_ACCESS_PUT         (1 << 25)
arch/powerpc/include/asm/reg.h:255:#define   DSISR_ISSTORE              0x02000000      /* access was a store */

and 

arch/powerpc/include/asm/spu.h:603:#define MFC_DSISR_ACCESS_DENIED              (1 << 27)
arch/powerpc/include/asm/reg.h:254:#define   DSISR_PROTFAULT    0x08000000      /* protection fault */
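
The equivalence can be checked mechanically.  A minimal sketch that
copies the four definitions quoted above into a standalone file (the
values are taken from those lines, nothing else is assumed about the
headers):

```c
/* Definitions as quoted from spu.h and reg.h above. */
#define MFC_DSISR_ACCESS_PUT     (1 << 25)
#define DSISR_ISSTORE            0x02000000   /* access was a store */
#define MFC_DSISR_ACCESS_DENIED  (1 << 27)
#define DSISR_PROTFAULT          0x08000000   /* protection fault */

/* Compilation fails if the Cell and generic encodings ever diverge. */
_Static_assert(MFC_DSISR_ACCESS_PUT == DSISR_ISSTORE,
	       "store bit encodings differ");
_Static_assert(MFC_DSISR_ACCESS_DENIED == DSISR_PROTFAULT,
	       "protection fault encodings differ");
```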

Mikey

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 02/15] powerpc/cell: Move data segment faulting code out of cell platform
  2014-09-18 10:27   ` Jeremy Kerr
@ 2014-09-18 23:45     ` Michael Neuling
  0 siblings, 0 replies; 43+ messages in thread
From: Michael Neuling @ 2014-09-18 23:45 UTC (permalink / raw)
  To: Jeremy Kerr
  Cc: greg, arnd, mpe, benh, anton, linux-kernel, linuxppc-dev,
	imunsie, cbe-oss-dev

> > +
> > +int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid)
> > +{
> > +	int psize, ssize;
> > +
> > +	*esid = (ea & ESID_MASK) | SLB_ESID_V;
> > +
> > +	switch (REGION_ID(ea)) {
> > +	case USER_REGION_ID:
> > +		pr_devel("copro_data_segment: 0x%llx -- USER_REGION_ID\n", ea);
> > +#ifdef CONFIG_PPC_MM_SLICES
> > +		psize = get_slice_psize(mm, ea);
> > +#else
> > +		psize = mm->context.user_psize;
> > +#endif
> > +		ssize = user_segment_size(ea);
> > +		*vsid = (get_vsid(mm->context.id, ea, ssize)
> > +			<< slb_vsid_shift(ssize)) | SLB_VSID_USER
> > +			| (ssize == MMU_SEGSIZE_1T ? SLB_VSID_B_1T : 0);
> > +		break;
> > +	case VMALLOC_REGION_ID:
> > +		pr_devel("copro_data_segment: 0x%llx -- VMALLOC_REGION_ID\n", ea);
> > +		if (ea < VMALLOC_END)
> > +			psize = mmu_vmalloc_psize;
> > +		else
> > +			psize = mmu_io_psize;
> > +		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
> > +			<< SLB_VSID_SHIFT) | SLB_VSID_KERNEL
> > +			| (mmu_kernel_ssize == MMU_SEGSIZE_1T ? SLB_VSID_B_1T : 0);
> > +		break;
> > +	case KERNEL_REGION_ID:
> > +		pr_devel("copro_data_segment: 0x%llx -- KERNEL_REGION_ID\n", ea);
> > +		psize = mmu_linear_psize;
> > +		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
> > +			<< SLB_VSID_SHIFT) | SLB_VSID_KERNEL
> > +			| (mmu_kernel_ssize == MMU_SEGSIZE_1T ? SLB_VSID_B_1T : 0);
> > +		break;
> > +	default:
> > +		/* Future: support kernel segments so that drivers can use the
> > +		 * CoProcessors */
> > +		pr_debug("invalid region access at %016llx\n", ea);
> > +		return 1;
> > +	}
> > +	*vsid |= mmu_psize_defs[psize].sllp;
> 
> A bit of a nitpick, but how about you remove the repeated:
> 
> 		| (<size> == MMU_SEGSIZE_1T ? SLB_VSID_B_1T : 0)
> 
> then set ssize in each of the switch cases (like we do with psize), and
> or-in the VSID_B_1T bit at the end:
> 	
> 	*vsid |= mmu_psize_defs[psize].sllp
> 		| (ssize == MMU_SEGSIZE_1T ? SLB_VSID_B_1T : 0);

Nice.  I think below is what you mean.

I'll fold this into the existing patch and repost in a few days.

Thanks,
Mikey

diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
index 4105a63..939caf6 100644
--- a/arch/powerpc/mm/copro_fault.c
+++ b/arch/powerpc/mm/copro_fault.c
@@ -107,8 +107,7 @@ int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid)
 #endif
 		ssize = user_segment_size(ea);
 		*vsid = (get_vsid(mm->context.id, ea, ssize)
-			<< slb_vsid_shift(ssize)) | SLB_VSID_USER
-			| (ssize == MMU_SEGSIZE_1T ? SLB_VSID_B_1T : 0);
+			 << slb_vsid_shift(ssize)) | SLB_VSID_USER;
 		break;
 	case VMALLOC_REGION_ID:
 		pr_devel("copro_data_segment: 0x%llx -- VMALLOC_REGION_ID\n", ea);
@@ -116,16 +115,16 @@ int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid)
 			psize = mmu_vmalloc_psize;
 		else
 			psize = mmu_io_psize;
+		ssize = mmu_kernel_ssize;
 		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
-			<< SLB_VSID_SHIFT) | SLB_VSID_KERNEL
-			| (mmu_kernel_ssize == MMU_SEGSIZE_1T ? SLB_VSID_B_1T : 0);
+			 << SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
 		break;
 	case KERNEL_REGION_ID:
 		pr_devel("copro_data_segment: 0x%llx -- KERNEL_REGION_ID\n", ea);
 		psize = mmu_linear_psize;
+		ssize = mmu_kernel_ssize;
 		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
-			<< SLB_VSID_SHIFT) | SLB_VSID_KERNEL
-			| (mmu_kernel_ssize == MMU_SEGSIZE_1T ? SLB_VSID_B_1T : 0);
+			 << SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
 		break;
 	default:
 		/* Future: support kernel segments so that drivers can use the
@@ -133,7 +132,8 @@ int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid)
 		pr_debug("invalid region access at %016llx\n", ea);
 		return 1;
 	}
-	*vsid |= mmu_psize_defs[psize].sllp;
+	*vsid |= mmu_psize_defs[psize].sllp |
+		((ssize == MMU_SEGSIZE_1T) ? SLB_VSID_B_1T : 0);
 
 	return 0;
 }
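
One C detail matters when a conditional is folded into an OR chain
like the final statement of copro_data_segment(): | binds tighter than
?:, so the conditional needs its own parentheses or the whole OR
expression becomes the condition.  A minimal sketch of the difference;
the constants here are illustrative values for this example only and
should not be read as the kernel's definitions:

```c
#include <stdint.h>

/* Illustrative values; assumptions for this sketch, not kernel headers. */
#define MMU_SEGSIZE_256M 0
#define MMU_SEGSIZE_1T   1
#define SLB_VSID_B_1T    0x4000000000000000ULL

/*
 * How "sllp | (ssize == MMU_SEGSIZE_1T) ? SLB_VSID_B_1T : 0" parses:
 * the OR result is the condition, so the sllp bits are lost.
 */
static uint64_t bits_unparenthesized(uint64_t sllp, int ssize)
{
	return (sllp | (ssize == MMU_SEGSIZE_1T)) ? SLB_VSID_B_1T : 0;
}

/* Intended meaning: page-size bits ORed with the optional 1T bit. */
static uint64_t bits_parenthesized(uint64_t sllp, int ssize)
{
	return sllp | ((ssize == MMU_SEGSIZE_1T) ? SLB_VSID_B_1T : 0);
}
```

With a non-zero sllp and a 256M segment, the first form yields only
the 1T bit while the second yields only the sllp bits.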


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH 05/15] powerpc/powernv: Split out set MSI IRQ chip code
  2014-09-18  8:26 ` [PATCH 05/15] powerpc/powernv: Split out set MSI IRQ chip code Michael Neuling
@ 2014-09-19  6:54   ` Gavin Shan
  2014-09-22  4:31     ` Michael Neuling
  0 siblings, 1 reply; 43+ messages in thread
From: Gavin Shan @ 2014-09-19  6:54 UTC (permalink / raw)
  To: Michael Neuling
  Cc: greg, arnd, mpe, benh, cbe-oss-dev, imunsie, linux-kernel,
	linuxppc-dev, jk, anton

On Thu, Sep 18, 2014 at 06:26:50PM +1000, Michael Neuling wrote:
>From: Ian Munsie <imunsie@au1.ibm.com>
>
>Some of the MSI IRQ code in pnv_pci_ioda_msi_setup() is generically useful so
>split it out.
>
>This will be used by some of the cxl PCIe code later.
>
>Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
>Signed-off-by: Michael Neuling <mikey@neuling.org>
>---
> arch/powerpc/platforms/powernv/pci-ioda.c | 43 ++++++++++++++++++-------------
> 1 file changed, 25 insertions(+), 18 deletions(-)
>
>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>index df241b1..194f90a 100644
>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>@@ -1306,14 +1306,36 @@ static void pnv_ioda2_msi_eoi(struct irq_data *d)
> 	icp_native_eoi(d);
> }
> 
>+
>+static void set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq)
>+{
>+	struct irq_data *idata;
>+	struct irq_chip *ichip;
>+
>+	/*
>+	 * Change the IRQ chip for the MSI interrupts on PHB3.
>+	 * The corresponding IRQ chip should be populated for
>+	 * the first time.
>+	 */
>+	if (phb->type == PNV_PHB_IODA2) {
>+		if (!phb->ioda.irq_chip_init) {
>+			idata = irq_get_irq_data(virq);
>+			ichip = irq_data_get_irq_chip(idata);
>+			phb->ioda.irq_chip_init = 1;
>+			phb->ioda.irq_chip = *ichip;
>+			phb->ioda.irq_chip.irq_eoi = pnv_ioda2_msi_eoi;
>+		}
>+
>+		irq_set_chip(virq, &phb->ioda.irq_chip);
>+	}
>+}
>+

Nitpick: checking the PHB type and bailing out early would avoid the nested code :)

	if (phb->type != PNV_PHB_IODA2)
		return;
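
For illustration, the early-return shape could look like the sketch
below.  The types and the helper pick_msi_irq_chip() are simplified
stand-ins invented for this example so that it builds outside the
kernel; the real function works on the kernel's struct pnv_phb and
ends by calling irq_set_chip().

```c
#include <stddef.h>

/* Stand-in types: hypothetical, reduced to the fields this sketch uses. */
enum phb_type { PNV_PHB_IODA1, PNV_PHB_IODA2 };

struct irq_chip {
	void (*irq_eoi)(void);
};

struct pnv_phb {
	enum phb_type type;
	int irq_chip_init;         /* phb->ioda.irq_chip_init in the patch */
	struct irq_chip irq_chip;  /* phb->ioda.irq_chip in the patch */
};

/* Placeholder for pnv_ioda2_msi_eoi(). */
static void msi_eoi_stub(void)
{
}

/* Returns the chip to install for the virq, or NULL if none is needed. */
static struct irq_chip *pick_msi_irq_chip(struct pnv_phb *phb,
					  const struct irq_chip *current_chip)
{
	/* Bail out early on anything that is not PHB3 (IODA2). */
	if (phb->type != PNV_PHB_IODA2)
		return NULL;

	/* Populate the per-PHB copy of the chip the first time through. */
	if (!phb->irq_chip_init) {
		phb->irq_chip_init = 1;
		phb->irq_chip = *current_chip;
		phb->irq_chip.irq_eoi = msi_eoi_stub;
	}

	return &phb->irq_chip;
}
```

The body after the guard then sits one indentation level shallower
than in the posted version, with the same behaviour.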

> static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
> 				  unsigned int hwirq, unsigned int virq,
> 				  unsigned int is_64, struct msi_msg *msg)
> {
> 	struct pnv_ioda_pe *pe = pnv_ioda_get_pe(dev);
> 	struct pci_dn *pdn = pci_get_pdn(dev);
>-	struct irq_data *idata;
>-	struct irq_chip *ichip;
> 	unsigned int xive_num = hwirq - phb->msi_base;
> 	__be32 data;
> 	int rc;
>@@ -1365,22 +1387,7 @@ static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
> 	}
> 	msg->data = be32_to_cpu(data);
> 
>-	/*
>-	 * Change the IRQ chip for the MSI interrupts on PHB3.
>-	 * The corresponding IRQ chip should be populated for
>-	 * the first time.
>-	 */
>-	if (phb->type == PNV_PHB_IODA2) {
>-		if (!phb->ioda.irq_chip_init) {
>-			idata = irq_get_irq_data(virq);
>-			ichip = irq_data_get_irq_chip(idata);
>-			phb->ioda.irq_chip_init = 1;
>-			phb->ioda.irq_chip = *ichip;
>-			phb->ioda.irq_chip.irq_eoi = pnv_ioda2_msi_eoi;
>-		}
>-
>-		irq_set_chip(virq, &phb->ioda.irq_chip);
>-	}
>+	set_msi_irq_chip(phb, virq);
> 
> 	pr_devel("%s: %s-bit MSI on hwirq %x (xive #%d),"
> 		 " address=%x_%08x data=%x PE# %d\n",

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 07/15] powerpc/powerpc: Add new PCIe functions for allocating cxl interrupts
  2014-09-18  8:26 ` [PATCH 07/15] powerpc/powerpc: Add new PCIe functions for allocating cxl interrupts Michael Neuling
@ 2014-09-19  7:09   ` Gavin Shan
  2014-09-22  5:01     ` Michael Neuling
  0 siblings, 1 reply; 43+ messages in thread
From: Gavin Shan @ 2014-09-19  7:09 UTC (permalink / raw)
  To: Michael Neuling
  Cc: greg, arnd, mpe, benh, cbe-oss-dev, imunsie, linux-kernel,
	linuxppc-dev, jk, anton

On Thu, Sep 18, 2014 at 06:26:52PM +1000, Michael Neuling wrote:
>From: Ian Munsie <imunsie@au1.ibm.com>
>
>This adds a number of functions for allocating IRQs under powernv PCIe for cxl.
>
>Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
>Signed-off-by: Michael Neuling <mikey@neuling.org>
>---
> arch/powerpc/include/asm/pnv-pci.h        |  27 +++++
> arch/powerpc/platforms/powernv/pci-ioda.c | 186 ++++++++++++++++++++++++++++++
> 2 files changed, 213 insertions(+)
> create mode 100644 arch/powerpc/include/asm/pnv-pci.h
>
>diff --git a/arch/powerpc/include/asm/pnv-pci.h b/arch/powerpc/include/asm/pnv-pci.h
>new file mode 100644
>index 0000000..71717b5
>--- /dev/null
>+++ b/arch/powerpc/include/asm/pnv-pci.h
>@@ -0,0 +1,27 @@
>+/*
>+ * Copyright 2014 IBM Corp.
>+ *
>+ * This program is free software; you can redistribute it and/or
>+ * modify it under the terms of the GNU General Public License
>+ * as published by the Free Software Foundation; either version
>+ * 2 of the License, or (at your option) any later version.
>+ */
>+
>+#ifndef _ASM_PNV_PCI_H
>+#define _ASM_PNV_PCI_H
>+
>+#include <linux/pci.h>
>+#include <misc/cxl.h>
>+
>+int pnv_phb_to_cxl(struct pci_dev *dev);
>+int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
>+			   unsigned int virq);
>+int pnv_cxl_alloc_hwirqs(struct pci_dev *dev, int num);
>+void pnv_cxl_release_hwirqs(struct pci_dev *dev, int hwirq, int num);
>+int pnv_cxl_alloc_hwirq_ranges(struct cxl_irq_ranges *irqs,
>+			       struct pci_dev *dev, int num);
>+void pnv_cxl_release_hwirq_ranges(struct cxl_irq_ranges *irqs,
>+				  struct pci_dev *dev);
>+int pnv_cxl_get_irq_count(struct pci_dev *dev);
>+
>+#endif
>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>index 194f90a..80919f8 100644
>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>@@ -38,6 +38,8 @@
> #include <asm/debug.h>
> #include <asm/firmware.h>
> 
>+#include <misc/cxl.h>
>+
> #include "powernv.h"
> #include "pci.h"
> 
>@@ -503,6 +505,163 @@ static struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev)
> 		return NULL;
> 	return &phb->ioda.pe_array[pdn->pe_number];
> }
>+
>+struct device_node *pnv_pci_to_phb_node(struct pci_dev *dev)
>+{
>+	struct device_node *np;
>+	struct property *prop = NULL;
>+
>+	np = of_node_get(pci_device_to_OF_node(dev));
>+
>+	/* Scan up the tree looking for the PHB node */
>+	while (np) {
>+		if ((prop = of_find_property(np, "ibm,opal-phbid", NULL)))
>+			break;
>+		np = of_get_next_parent(np);
>+	}
>+
>+	if (!prop) {
>+		of_node_put(np);
>+		return NULL;
>+	}
>+
>+	return np;
>+}
>+EXPORT_SYMBOL(pnv_pci_to_phb_node);

Nitpick: I'm not sure if it's a better way, but "struct pci_controller::dn"
should always have a valid "ibm,opal-phbid", so I guess the code could look
like this:

	struct pci_controller *hose = pci_bus_to_host(dev->bus);

	return hose->dn;

>+
>+#ifdef CONFIG_CXL_BASE
>+int pnv_phb_to_cxl(struct pci_dev *dev)
>+{
>+	struct device_node *np;
>+	struct pnv_ioda_pe *pe;
>+	const u64 *prop64;
>+	u64 phb_id;
>+	int rc;
>+
>+	dev_info(&dev->dev, "switch PHB to CXL\n");
>+
>+	if (!(np = pnv_pci_to_phb_node(dev)))
>+		return -ENODEV;
>+
>+	prop64 = of_get_property(np, "ibm,opal-phbid", NULL);
>+
>+	phb_id = be64_to_cpup(prop64);
>+	dev_info(&dev->dev, "PHB-ID  : 0x%016llx\n", phb_id);
>+

The PHB ID is already available in struct pnv_phb::opal_id, so
I guess we needn't grab it from the device tree again :)

>+	if (!(pe = pnv_ioda_get_pe(dev))) {
>+		rc = -ENODEV;
>+		goto out;
>+	}
>+	dev_info(&dev->dev, "     pe : %i\n", pe->pe_number);

Perhaps you can reuse pe_info() here.

>+
>+	if ((rc = opal_pci_set_phb_cxl_mode(phb_id, 1, pe->pe_number)))
>+		dev_err(&dev->dev, "opal_pci_set_phb_cxl_mode failed: %i\n", rc);
>+
>+out:
>+	of_node_put(np);
>+	return rc;
>+}
>+EXPORT_SYMBOL(pnv_phb_to_cxl);
>+
>+int pnv_cxl_alloc_hwirqs(struct pci_dev *dev, int num)
>+{
>+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
>+	struct pnv_phb *phb = hose->private_data;
>+	int hwirq = msi_bitmap_alloc_hwirqs(&phb->msi_bmp, num);
>+
>+	if (hwirq < 0) {
>+		dev_warn(&dev->dev, "Failed to find a free MSI\n");
>+		return -ENOSPC;
>+	}
>+
>+	return phb->msi_base + hwirq;
>+}
>+EXPORT_SYMBOL(pnv_cxl_alloc_hwirqs);
>+
>+void pnv_cxl_release_hwirqs(struct pci_dev *dev, int hwirq, int num)
>+{
>+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
>+	struct pnv_phb *phb = hose->private_data;
>+
>+	msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq - phb->msi_base, num);
>+}
>+EXPORT_SYMBOL(pnv_cxl_release_hwirqs);
>+
>+
>+int pnv_cxl_alloc_hwirq_ranges(struct cxl_irq_ranges *irqs,
>+			       struct pci_dev *dev, int num)
>+{
>+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
>+	struct pnv_phb *phb = hose->private_data;
>+	int range = 0;
>+	int hwirq;
>+	int try;
>+
>+	memset(irqs, 0, sizeof(struct cxl_irq_ranges));
>+
>+	for (range = 1; range < CXL_IRQ_RANGES && num; range++) {
>+		try = num;
>+		while (try) {
>+			hwirq = msi_bitmap_alloc_hwirqs(&phb->msi_bmp, try);
>+			if (hwirq >= 0)
>+				break;
>+			try /= 2;
>+		}
>+		if (!try)
>+			goto fail;
>+
>+		irqs->offset[range] = phb->msi_base + hwirq;
>+		irqs->range[range] = try;
>+		pr_devel("cxl alloc irq range 0x%x: offset: 0x%lx  limit: %li\n",
>+			 range, irqs->offset[range], irqs->range[range]);
>+		num -= try;
>+	}
>+	if (num)
>+		goto fail;
>+
>+	return 0;
>+fail:
>+	for (range--; range >= 0; range--) {
>+		hwirq = irqs->offset[range] - phb->msi_base;
>+		msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq,
>+				       irqs->range[range]);
>+		irqs->range[range] = 0;
>+	}
>+	return -ENOSPC;
>+}
>+EXPORT_SYMBOL(pnv_cxl_alloc_hwirq_ranges);
>+
>+void pnv_cxl_release_hwirq_ranges(struct cxl_irq_ranges *irqs,
>+				  struct pci_dev *dev)
>+{
>+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
>+	struct pnv_phb *phb = hose->private_data;
>+	int range = 0;
>+	int hwirq;
>+
>+	for (range = 0; range < 4; range++) {
>+		hwirq = irqs->offset[range] - phb->msi_base;
>+		if (irqs->range[range]) {
>+			pr_devel("cxl release irq range 0x%x: offset: 0x%lx  limit: %ld\n",
>+				 range, irqs->offset[range],
>+				 irqs->range[range]);
>+			msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq,
>+					       irqs->range[range]);
>+		}
>+	}
>+}
>+EXPORT_SYMBOL(pnv_cxl_release_hwirq_ranges);
>+
>+int pnv_cxl_get_irq_count(struct pci_dev *dev)
>+{
>+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
>+        struct pnv_phb *phb = hose->private_data;
>+
>+	return phb->msi_bmp.irq_count;
>+}
>+EXPORT_SYMBOL(pnv_cxl_get_irq_count);
>+
>+#endif /* CONFIG_CXL_BASE */
> #endif /* CONFIG_PCI_MSI */
> 
> static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>@@ -1330,6 +1489,33 @@ static void set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq)
> 	}
> }
> 
>+#ifdef CONFIG_CXL_BASE
>+int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
>+			   unsigned int virq)
>+{
>+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
>+	struct pnv_phb *phb = hose->private_data;
>+	unsigned int xive_num = hwirq - phb->msi_base;
>+	struct pnv_ioda_pe *pe;
>+	int rc;
>+
>+	if (!(pe = pnv_ioda_get_pe(dev)))
>+		return -ENODEV;
>+
>+	/* Assign XIVE to PE */
>+	rc = opal_pci_set_xive_pe(phb->opal_id, pe->pe_number, xive_num);
>+	if (rc) {
>+		pr_warn("%s: OPAL error %d setting msi_base 0x%x hwirq 0x%x XIVE 0x%x PE\n",
>+			pci_name(dev), rc, phb->msi_base, hwirq, xive_num);
>+		return -EIO;
>+	}

It seems current firmware doesn't support the OPAL API for PHB3.

>+	set_msi_irq_chip(phb, virq);
>+
>+	return 0;
>+}
>+EXPORT_SYMBOL(pnv_cxl_ioda_msi_setup);
>+#endif
>+
> static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
> 				  unsigned int hwirq, unsigned int virq,
> 				  unsigned int is_64, struct msi_msg *msg)

Thanks,
Gavin
>-- 
>1.9.1
>



* Re: [PATCH 03/15] powerpc/msi: Improve IRQ bitmap allocator
  2014-09-18  8:26 ` [PATCH 03/15] powerpc/msi: Improve IRQ bitmap allocator Michael Neuling
@ 2014-09-19 20:16   ` Scott Wood
  2014-09-19 20:19     ` Scott Wood
  2014-09-22  8:25     ` Laurentiu Tudor
  2014-09-22  8:29   ` Laurentiu Tudor
  1 sibling, 2 replies; 43+ messages in thread
From: Scott Wood @ 2014-09-19 20:16 UTC (permalink / raw)
  To: Michael Neuling
  Cc: greg, arnd, mpe, benh, cbe-oss-dev, imunsie, linux-kernel,
	linuxppc-dev, jk, anton, Laurentiu Tudor

On Thu, 2014-09-18 at 18:26 +1000, Michael Neuling wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> Currently msi_bitmap_alloc_hwirqs() will round up any IRQ allocation requests
> to the nearest power of 2.  eg. ask for 5 IRQs and you'll get 8.  This wastes a
> lot of IRQs which can be a scarce resource.
> 
> For cxl we can require multiple IRQs for every context that is attached to the
> accelerator.  For AFU directed accelerators, there may be 1000s of contexts
> attached, hence we can easily run out of IRQs, especially if we are needlessly
> wasting them.
> 
> This changes the msi_bitmap_alloc_hwirqs() to allocate only the required number
> of IRQs, hence avoiding this wastage.
> 
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
>  arch/powerpc/sysdev/msi_bitmap.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)

This conflicts with (and partially duplicates)
http://patchwork.ozlabs.org/patch/381892/
which I have in my tree.  How should we handle it?

Laurentiu, from looking at the overlap between patches I see a problem
with your existing patch, regarding the out-of-irqs path and
msi_bitmap_free_hwirqs(), so one way or another that needs to get fixed
soon.

-Scott

> diff --git a/arch/powerpc/sysdev/msi_bitmap.c b/arch/powerpc/sysdev/msi_bitmap.c
> index 2ff6302..e001559 100644
> --- a/arch/powerpc/sysdev/msi_bitmap.c
> +++ b/arch/powerpc/sysdev/msi_bitmap.c
> @@ -24,28 +24,36 @@ int msi_bitmap_alloc_hwirqs(struct msi_bitmap *bmp, int num)
>  	 * This is fast, but stricter than we need. We might want to add
>  	 * a fallback routine which does a linear search with no alignment.
>  	 */
> -	offset = bitmap_find_free_region(bmp->bitmap, bmp->irq_count, order);
> +	offset = bitmap_find_next_zero_area(bmp->bitmap, bmp->irq_count, 0,
> +					    num, (1 << order) - 1);
> +	if (offset > bmp->irq_count)
> +		goto err;
> +	bitmap_set(bmp->bitmap, offset, num);
>  	spin_unlock_irqrestore(&bmp->lock, flags);
>  
>  	pr_debug("msi_bitmap: allocated 0x%x (2^%d) at offset 0x%x\n",
>  		 num, order, offset);
>  
>  	return offset;
> +err:
> +	spin_unlock_irqrestore(&bmp->lock, flags);
> +	return -ENOMEM;
>  }
> +EXPORT_SYMBOL(msi_bitmap_alloc_hwirqs);
>  
>  void msi_bitmap_free_hwirqs(struct msi_bitmap *bmp, unsigned int offset,
>  			    unsigned int num)
>  {
>  	unsigned long flags;
> -	int order = get_count_order(num);
>  
> -	pr_debug("msi_bitmap: freeing 0x%x (2^%d) at offset 0x%x\n",
> -		 num, order, offset);
> +	pr_debug("msi_bitmap: freeing 0x%x at offset 0x%x\n",
> +		 num, offset);
>  
>  	spin_lock_irqsave(&bmp->lock, flags);
> -	bitmap_release_region(bmp->bitmap, offset, order);
> +	bitmap_clear(bmp->bitmap, offset, num);
>  	spin_unlock_irqrestore(&bmp->lock, flags);
>  }
> +EXPORT_SYMBOL(msi_bitmap_free_hwirqs);
>  
>  void msi_bitmap_reserve_hwirq(struct msi_bitmap *bmp, unsigned int hwirq)
>  {




* Re: [PATCH 03/15] powerpc/msi: Improve IRQ bitmap allocator
  2014-09-19 20:16   ` Scott Wood
@ 2014-09-19 20:19     ` Scott Wood
  2014-09-22  8:26       ` Laurentiu Tudor
  2014-09-22  8:25     ` Laurentiu Tudor
  1 sibling, 1 reply; 43+ messages in thread
From: Scott Wood @ 2014-09-19 20:19 UTC (permalink / raw)
  To: Michael Neuling
  Cc: greg, arnd, mpe, benh, cbe-oss-dev, imunsie, linux-kernel,
	linuxppc-dev, jk, anton, Laurentiu Tudor

On Fri, 2014-09-19 at 15:16 -0500, Scott Wood wrote:
> On Thu, 2014-09-18 at 18:26 +1000, Michael Neuling wrote:
> > From: Ian Munsie <imunsie@au1.ibm.com>
> > 
> > Currently msi_bitmap_alloc_hwirqs() will round up any IRQ allocation requests
> > to the nearest power of 2.  eg. ask for 5 IRQs and you'll get 8.  This wastes a
> > lot of IRQs which can be a scarce resource.
> > 
> > For cxl we can require multiple IRQs for every context that is attached to the
> > accelerator.  For AFU directed accelerators, there may be 1000s of contexts
> > attached, hence we can easily run out of IRQs, especially if we are needlessly
> > wasting them.
> > 
> > This changes the msi_bitmap_alloc_hwirqs() to allocate only the required number
> > of IRQs, hence avoiding this wastage.
> > 
> > Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > ---
> >  arch/powerpc/sysdev/msi_bitmap.c | 18 +++++++++++++-----
> >  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> This conflicts with (and partially duplicates)
> http://patchwork.ozlabs.org/patch/381892/
> which I have in my tree.  How should we handle it?
> 
> Laurentiu, from looking at the overlap between patches I see a problem
> with your existing patch, regarding the out-of-irqs path and
> msi_bitmap_free_hwirqs(), so one way or another that needs to get fixed
> soon.

Given the problems with Laurentiu's patch, perhaps it'd be best for me
to just revert that patch in my tree, and respin it after this patchset
has been merged.

-Scott




* Re: [PATCH 05/15] powerpc/powernv: Split out set MSI IRQ chip code
  2014-09-19  6:54   ` Gavin Shan
@ 2014-09-22  4:31     ` Michael Neuling
  0 siblings, 0 replies; 43+ messages in thread
From: Michael Neuling @ 2014-09-22  4:31 UTC (permalink / raw)
  To: Gavin Shan
  Cc: greg, arnd, mpe, benh, cbe-oss-dev, imunsie, linux-kernel,
	linuxppc-dev, jk, anton

> >+static void set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq)
> >+{
> >+	struct irq_data *idata;
> >+	struct irq_chip *ichip;
> >+
> >+	/*
> >+	 * Change the IRQ chip for the MSI interrupts on PHB3.
> >+	 * The corresponding IRQ chip should be populated for
> >+	 * the first time.
> >+	 */
> >+	if (phb->type == PNV_PHB_IODA2) {
> >+		if (!phb->ioda.irq_chip_init) {
> >+			idata = irq_get_irq_data(virq);
> >+			ichip = irq_data_get_irq_chip(idata);
> >+			phb->ioda.irq_chip_init = 1;
> >+			phb->ioda.irq_chip = *ichip;
> >+			phb->ioda.irq_chip.irq_eoi = pnv_ioda2_msi_eoi;
> >+		}
> >+
> >+		irq_set_chip(virq, &phb->ioda.irq_chip);
> >+	}
> >+}
> >+
> 
> Nitpick: checking the PHB type and bailing out early would avoid the nested code :)
> 
> 	if (phb->type != PNV_PHB_IODA2)
> 		return;

OK, will do in repost.

Thanks,
Mikey



* Re: [PATCH 07/15] powerpc/powerpc: Add new PCIe functions for allocating cxl interrupts
  2014-09-19  7:09   ` Gavin Shan
@ 2014-09-22  5:01     ` Michael Neuling
  0 siblings, 0 replies; 43+ messages in thread
From: Michael Neuling @ 2014-09-22  5:01 UTC (permalink / raw)
  To: Gavin Shan
  Cc: greg, arnd, mpe, benh, cbe-oss-dev, imunsie, linux-kernel,
	linuxppc-dev, jk, anton

<snip>
> >+struct device_node *pnv_pci_to_phb_node(struct pci_dev *dev)
> >+{
> >+	struct device_node *np;
> >+	struct property *prop = NULL;
> >+
> >+	np = of_node_get(pci_device_to_OF_node(dev));
> >+
> >+	/* Scan up the tree looking for the PHB node */
> >+	while (np) {
> >+		if ((prop = of_find_property(np, "ibm,opal-phbid", NULL)))
> >+			break;
> >+		np = of_get_next_parent(np);
> >+	}
> >+
> >+	if (!prop) {
> >+		of_node_put(np);
> >+		return NULL;
> >+	}
> >+
> >+	return np;
> >+}
> >+EXPORT_SYMBOL(pnv_pci_to_phb_node);
> 
> Nitpick: I'm not sure if it's a better way, but "struct pci_controller::dn"
> should always have a valid "ibm,opal-phbid", so I guess the code could look
> like this:
> 
> 	struct pci_controller *hose = pci_bus_to_host(dev->bus);
> 
> 	return hose->dn;

Nice.. that makes it much simpler.  I'll update.

<snip>
> >+
> >+#ifdef CONFIG_CXL_BASE
> >+int pnv_phb_to_cxl(struct pci_dev *dev)
> >+{
> >+	struct device_node *np;
> >+	struct pnv_ioda_pe *pe;
> >+	const u64 *prop64;
> >+	u64 phb_id;
> >+	int rc;
> >+
> >+	dev_info(&dev->dev, "switch PHB to CXL\n");
> >+
> >+	if (!(np = pnv_pci_to_phb_node(dev)))
> >+		return -ENODEV;
> >+
> >+	prop64 = of_get_property(np, "ibm,opal-phbid", NULL);
> >+
> >+	phb_id = be64_to_cpup(prop64);
> >+	dev_info(&dev->dev, "PHB-ID  : 0x%016llx\n", phb_id);
> >+
> 
> The PHB ID is already available in struct pnv_phb::opal_id, so
> I guess we needn't grab it from the device tree again :)

Nice, I'll update.

> >+	if (!(pe = pnv_ioda_get_pe(dev))) {
> >+		rc = -ENODEV;
> >+		goto out;
> >+	}
> >+	dev_info(&dev->dev, "     pe : %i\n", pe->pe_number);
> 
> Perhaps you can reuse pe_info() here.

Yep, will do.

<snip>
> >+#ifdef CONFIG_CXL_BASE
> >+int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
> >+			   unsigned int virq)
> >+{
> >+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
> >+	struct pnv_phb *phb = hose->private_data;
> >+	unsigned int xive_num = hwirq - phb->msi_base;
> >+	struct pnv_ioda_pe *pe;
> >+	int rc;
> >+
> >+	if (!(pe = pnv_ioda_get_pe(dev)))
> >+		return -ENODEV;
> >+
> >+	/* Assign XIVE to PE */
> >+	rc = opal_pci_set_xive_pe(phb->opal_id, pe->pe_number, xive_num);
> >+	if (rc) {
> >+		pr_warn("%s: OPAL error %d setting msi_base 0x%x hwirq 0x%x XIVE 0x%x PE\n",
> >+			pci_name(dev), rc, phb->msi_base, hwirq, xive_num);
> >+		return -EIO;
> >+	}
> 
> It seems current firmware doesn't support the OPAL API for PHB3.

The current public version of skiboot seems to be doing something here
in hw/phb3.c in phb3_set_ive_pe():

https://github.com/open-power/skiboot/blob/c34c4ef8c660e3e439365c8f5c06143ff00bc6bc/hw/phb3.c#L1096

I think we still need this.

Thanks again!
Mikey


* Re: [PATCH 03/15] powerpc/msi: Improve IRQ bitmap allocator
  2014-09-19 20:16   ` Scott Wood
  2014-09-19 20:19     ` Scott Wood
@ 2014-09-22  8:25     ` Laurentiu Tudor
  1 sibling, 0 replies; 43+ messages in thread
From: Laurentiu Tudor @ 2014-09-22  8:25 UTC (permalink / raw)
  To: Scott Wood, Michael Neuling
  Cc: greg, arnd, mpe, benh, cbe-oss-dev, imunsie, linux-kernel,
	linuxppc-dev, jk, anton, Laurentiu Tudor

On 09/19/2014 11:16 PM, Scott Wood wrote:
> On Thu, 2014-09-18 at 18:26 +1000, Michael Neuling wrote:
>> From: Ian Munsie <imunsie@au1.ibm.com>
>>
>> Currently msi_bitmap_alloc_hwirqs() will round up any IRQ allocation requests
>> to the nearest power of 2.  eg. ask for 5 IRQs and you'll get 8.  This wastes a
>> lot of IRQs which can be a scarce resource.
>>
>> For cxl we can require multiple IRQs for every context that is attached to the
>> accelerator.  For AFU directed accelerators, there may be 1000s of contexts
>> attached, hence we can easily run out of IRQs, especially if we are needlessly
>> wasting them.
>>
>> This changes the msi_bitmap_alloc_hwirqs() to allocate only the required number
>> of IRQs, hence avoiding this wastage.
>>
>> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
>> Signed-off-by: Michael Neuling <mikey@neuling.org>
>> ---
>>  arch/powerpc/sysdev/msi_bitmap.c | 18 +++++++++++++-----
>>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> This conflicts with (and partially duplicates)
> http://patchwork.ozlabs.org/patch/381892/
> which I have in my tree.  How should we handle it?
> 
> Laurentiu, from looking at the overlap between patches I see a problem
> with your existing patch, regarding the out-of-irqs path and
> msi_bitmap_free_hwirqs(), so one way or another that needs to get fixed
> soon.
> 

Agreed. My patch lacks error checking, so Michael's patch is better.
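
To make the out-of-IRQs path concrete, here is a small userspace sketch. It is
an assumption-laden toy: it uses a plain bool array instead of the kernel
bitmap API, a made-up pool size, and drops the alignment mask that the quoted
patch still passes, so it only illustrates "allocate exactly num, and report
failure before touching any bits".

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_IRQS 16	/* hypothetical pool size, chosen for illustration */

struct toy_msi_bitmap {
	bool used[TOY_NR_IRQS];	/* one flag per hwirq; true means allocated */
};

/*
 * Find 'num' consecutive free entries with no alignment requirement.
 * Returns the first index on success, or -1 when the pool cannot
 * satisfy the request. Nothing is marked used on failure.
 */
static int toy_alloc_hwirqs(struct toy_msi_bitmap *bmp, int num)
{
	int start, i;

	if (num <= 0 || num > TOY_NR_IRQS)
		return -1;

	for (start = 0; start + num <= TOY_NR_IRQS; start++) {
		for (i = 0; i < num; i++)
			if (bmp->used[start + i])
				break;
		if (i < num)
			continue;	/* hit a used entry, slide the window */

		for (i = 0; i < num; i++)
			bmp->used[start + i] = true;
		return start;
	}

	return -1;	/* out of IRQs: report it instead of marking bits */
}

/* Release exactly 'num' entries starting at 'offset'. */
static void toy_free_hwirqs(struct toy_msi_bitmap *bmp, int offset, int num)
{
	int i;

	for (i = 0; i < num; i++)
		bmp->used[offset + i] = false;
}
```

With a 16-entry pool, three requests for 5 IRQs fit, where rounding each
request up to 8 would only fit two. A fourth request for 5 fails cleanly, and
the free side releases the same count that was allocated.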

---
Best Regards, Laurentiu

> 
>> diff --git a/arch/powerpc/sysdev/msi_bitmap.c b/arch/powerpc/sysdev/msi_bitmap.c
>> index 2ff6302..e001559 100644
>> --- a/arch/powerpc/sysdev/msi_bitmap.c
>> +++ b/arch/powerpc/sysdev/msi_bitmap.c
>> @@ -24,28 +24,36 @@ int msi_bitmap_alloc_hwirqs(struct msi_bitmap *bmp, int num)
>>  	 * This is fast, but stricter than we need. We might want to add
>>  	 * a fallback routine which does a linear search with no alignment.
>>  	 */
>> -	offset = bitmap_find_free_region(bmp->bitmap, bmp->irq_count, order);
>> +	offset = bitmap_find_next_zero_area(bmp->bitmap, bmp->irq_count, 0,
>> +					    num, (1 << order) - 1);
>> +	if (offset > bmp->irq_count)
>> +		goto err;
>> +	bitmap_set(bmp->bitmap, offset, num);
>>  	spin_unlock_irqrestore(&bmp->lock, flags);
>>  
>>  	pr_debug("msi_bitmap: allocated 0x%x (2^%d) at offset 0x%x\n",
>>  		 num, order, offset);
>>  
>>  	return offset;
>> +err:
>> +	spin_unlock_irqrestore(&bmp->lock, flags);
>> +	return -ENOMEM;
>>  }
>> +EXPORT_SYMBOL(msi_bitmap_alloc_hwirqs);
>>  
>>  void msi_bitmap_free_hwirqs(struct msi_bitmap *bmp, unsigned int offset,
>>  			    unsigned int num)
>>  {
>>  	unsigned long flags;
>> -	int order = get_count_order(num);
>>  
>> -	pr_debug("msi_bitmap: freeing 0x%x (2^%d) at offset 0x%x\n",
>> -		 num, order, offset);
>> +	pr_debug("msi_bitmap: freeing 0x%x at offset 0x%x\n",
>> +		 num, offset);
>>  
>>  	spin_lock_irqsave(&bmp->lock, flags);
>> -	bitmap_release_region(bmp->bitmap, offset, order);
>> +	bitmap_clear(bmp->bitmap, offset, num);
>>  	spin_unlock_irqrestore(&bmp->lock, flags);
>>  }
>> +EXPORT_SYMBOL(msi_bitmap_free_hwirqs);
>>  
>>  void msi_bitmap_reserve_hwirq(struct msi_bitmap *bmp, unsigned int hwirq)
>>  {
> 
> 



* Re: [PATCH 03/15] powerpc/msi: Improve IRQ bitmap allocator
  2014-09-19 20:19     ` Scott Wood
@ 2014-09-22  8:26       ` Laurentiu Tudor
  2014-09-22 23:50         ` Scott Wood
  0 siblings, 1 reply; 43+ messages in thread
From: Laurentiu Tudor @ 2014-09-22  8:26 UTC (permalink / raw)
  To: Scott Wood, Michael Neuling
  Cc: greg, arnd, mpe, benh, cbe-oss-dev, imunsie, linux-kernel,
	linuxppc-dev, jk, anton, Laurentiu Tudor

On 09/19/2014 11:19 PM, Scott Wood wrote:
> On Fri, 2014-09-19 at 15:16 -0500, Scott Wood wrote:
>> On Thu, 2014-09-18 at 18:26 +1000, Michael Neuling wrote:
>>> From: Ian Munsie <imunsie@au1.ibm.com>
>>>
>>> Currently msi_bitmap_alloc_hwirqs() will round up any IRQ allocation requests
>>> to the nearest power of 2.  eg. ask for 5 IRQs and you'll get 8.  This wastes a
>>> lot of IRQs which can be a scarce resource.
>>>
>>> For cxl we can require multiple IRQs for every context that is attached to the
>>> accelerator.  For AFU directed accelerators, there may be 1000s of contexts
>>> attached, hence we can easily run out of IRQs, especially if we are needlessly
>>> wasting them.
>>>
>>> This changes the msi_bitmap_alloc_hwirqs() to allocate only the required number
>>> of IRQs, hence avoiding this wastage.
>>>
>>> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
>>> Signed-off-by: Michael Neuling <mikey@neuling.org>
>>> ---
>>>  arch/powerpc/sysdev/msi_bitmap.c | 18 +++++++++++++-----
>>>  1 file changed, 13 insertions(+), 5 deletions(-)
>>
>> This conflicts with (and partially duplicates)
>> http://patchwork.ozlabs.org/patch/381892/
>> which I have in my tree.  How should we handle it?
>>
>> Laurentiu, from looking at the overlap between patches I see a problem
>> with your existing patch, regarding the out-of-irqs path and
>> msi_bitmap_free_hwirqs(), so one way or another that needs to get fixed
>> soon.
> 
> Given the problems with Laurentiu's patch, perhaps it'd be best for me
> to just revert that patch in my tree, and respin it after this patchset
> has been merged.

Let me know if you want me to rebase my stuff on top of Michael's patch.

---
Best Regards, Laurentiu



* Re: [PATCH 03/15] powerpc/msi: Improve IRQ bitmap allocator
  2014-09-18  8:26 ` [PATCH 03/15] powerpc/msi: Improve IRQ bitmap allocator Michael Neuling
  2014-09-19 20:16   ` Scott Wood
@ 2014-09-22  8:29   ` Laurentiu Tudor
  2014-09-22 22:59     ` Michael Neuling
  1 sibling, 1 reply; 43+ messages in thread
From: Laurentiu Tudor @ 2014-09-22  8:29 UTC (permalink / raw)
  To: Michael Neuling, greg, arnd, mpe, benh
  Cc: cbe-oss-dev, imunsie, linux-kernel, linuxppc-dev, jk, anton

Hi Michael,

Minor comment inline.

On 09/18/2014 11:26 AM, Michael Neuling wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> Currently msi_bitmap_alloc_hwirqs() will round up any IRQ allocation requests
> to the nearest power of 2.  eg. ask for 5 IRQs and you'll get 8.  This wastes a
> lot of IRQs which can be a scarce resource.
> 
> For cxl we can require multiple IRQs for every context that is attached to the
> accelerator.  For AFU directed accelerators, there may be 1000s of contexts
> attached, hence we can easily run out of IRQs, especially if we are needlessly
> wasting them.
> 
> This changes the msi_bitmap_alloc_hwirqs() to allocate only the required number
> of IRQs, hence avoiding this wastage.
> 
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
>  arch/powerpc/sysdev/msi_bitmap.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/sysdev/msi_bitmap.c b/arch/powerpc/sysdev/msi_bitmap.c
> index 2ff6302..e001559 100644
> --- a/arch/powerpc/sysdev/msi_bitmap.c
> +++ b/arch/powerpc/sysdev/msi_bitmap.c
> @@ -24,28 +24,36 @@ int msi_bitmap_alloc_hwirqs(struct msi_bitmap *bmp, int num)
>  	 * This is fast, but stricter than we need. We might want to add
>  	 * a fallback routine which does a linear search with no alignment.
>  	 */

Is this comment still relevant (especially the part mentioning "fast")?

---
Best Regards, Laurentiu


* Re: [PATCH 03/15] powerpc/msi: Improve IRQ bitmap allocator
  2014-09-22  8:29   ` Laurentiu Tudor
@ 2014-09-22 22:59     ` Michael Neuling
  0 siblings, 0 replies; 43+ messages in thread
From: Michael Neuling @ 2014-09-22 22:59 UTC (permalink / raw)
  To: Laurentiu Tudor
  Cc: greg, arnd, mpe, benh, cbe-oss-dev, imunsie, linux-kernel,
	linuxppc-dev, jk, anton

> > diff --git a/arch/powerpc/sysdev/msi_bitmap.c b/arch/powerpc/sysdev/msi_bitmap.c
> > index 2ff6302..e001559 100644
> > --- a/arch/powerpc/sysdev/msi_bitmap.c
> > +++ b/arch/powerpc/sysdev/msi_bitmap.c
> > @@ -24,28 +24,36 @@ int msi_bitmap_alloc_hwirqs(struct msi_bitmap *bmp, int num)
> >  	 * This is fast, but stricter than we need. We might want to add
> >  	 * a fallback routine which does a linear search with no alignment.
> >  	 */
> 
> Is this comment still relevant (especially the part mentioning "fast")?

You're right, it's not really relevant any more.  I'll remove.

Thanks
Mikey



* Re: [PATCH 03/15] powerpc/msi: Improve IRQ bitmap allocator
  2014-09-22  8:26       ` Laurentiu Tudor
@ 2014-09-22 23:50         ` Scott Wood
  0 siblings, 0 replies; 43+ messages in thread
From: Scott Wood @ 2014-09-22 23:50 UTC (permalink / raw)
  To: Laurentiu Tudor
  Cc: Michael Neuling, greg, arnd, mpe, benh, cbe-oss-dev, imunsie,
	linux-kernel, linuxppc-dev, jk, anton, Laurentiu Tudor

On Mon, 2014-09-22 at 11:26 +0300, Laurentiu Tudor wrote:
> On 09/19/2014 11:19 PM, Scott Wood wrote:
> > On Fri, 2014-09-19 at 15:16 -0500, Scott Wood wrote:
> >> On Thu, 2014-09-18 at 18:26 +1000, Michael Neuling wrote:
> >>> From: Ian Munsie <imunsie@au1.ibm.com>
> >>>
> >>> Currently msi_bitmap_alloc_hwirqs() will round up any IRQ allocation requests
> >>> to the nearest power of 2.  eg. ask for 5 IRQs and you'll get 8.  This wastes a
> >>> lot of IRQs which can be a scarce resource.
> >>>
> > >>> For cxl we can require multiple IRQs for every context that is attached to the
> >>> accelerator.  For AFU directed accelerators, there may be 1000s of contexts
> >>> attached, hence we can easily run out of IRQs, especially if we are needlessly
> >>> wasting them.
> >>>
> >>> This changes the msi_bitmap_alloc_hwirqs() to allocate only the required number
> >>> of IRQs, hence avoiding this wastage.
> >>>
> >>> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> >>> Signed-off-by: Michael Neuling <mikey@neuling.org>
> >>> ---
> >>>  arch/powerpc/sysdev/msi_bitmap.c | 18 +++++++++++++-----
> >>>  1 file changed, 13 insertions(+), 5 deletions(-)
> >>
> >> This conflicts with (and partially duplicates)
> >> http://patchwork.ozlabs.org/patch/381892/
> >> which I have in my tree.  How should we handle it?
> >>
> >> Laurentiu, from looking at the overlap between patches I see a problem
> >> with your existing patch, regarding the out-of-irqs path and
> >> msi_bitmap_free_hwirqs(), so one way or another that needs to get fixed
> >> soon.
> > 
> > Given the problems with Laurentiu's patch, perhaps it'd be best for me
> > to just revert that patch in my tree, and respin it after this patchset
> > has been merged.
> 
> Let me know if you want me to rebase my stuff on top of Michael's patch.

Yes, please resend it once Michael's patch gets merged.

-Scott




* Re: [PATCH 01/15] powerpc/cell: Move spu_handle_mm_fault() out of cell platform
  2014-09-18  8:26 ` [PATCH 01/15] powerpc/cell: Move spu_handle_mm_fault() out of cell platform Michael Neuling
  2014-09-18 10:00   ` Jeremy Kerr
@ 2014-09-26  3:57   ` Anton Blanchard
  1 sibling, 0 replies; 43+ messages in thread
From: Anton Blanchard @ 2014-09-26  3:57 UTC (permalink / raw)
  To: Michael Neuling
  Cc: greg, arnd, mpe, benh, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev

> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> Currently spu_handle_mm_fault() is in the cell platform.
> 
> This code is generically useful for other non-cell co-processors on
> powerpc.
> 
> This patch moves this function out of the cell platform into
> arch/powerpc/mm so that others may use it.
> 
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Signed-off-by: Michael Neuling <mikey@neuling.org>

Reviewed-by: Anton Blanchard <anton@samba.org>

Anton


* Re: [PATCH 02/15] powerpc/cell: Move data segment faulting code out of cell platform
  2014-09-18  8:26 ` [PATCH 02/15] powerpc/cell: Move data segment faulting code " Michael Neuling
  2014-09-18 10:27   ` Jeremy Kerr
@ 2014-09-26  4:05   ` Anton Blanchard
  2014-09-26 11:19     ` Michael Neuling
  2014-09-29  8:30   ` Aneesh Kumar K.V
  2 siblings, 1 reply; 43+ messages in thread
From: Anton Blanchard @ 2014-09-26  4:05 UTC (permalink / raw)
  To: Michael Neuling
  Cc: greg, arnd, mpe, benh, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev


> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> __spu_trap_data_seg() currently contains code to determine the VSID
> and ESID required for a particular EA and mm struct.
> 
> This code is generically useful for other co-processors.  This moves
> the code out of the cell platform so it can be used by other powerpc code.

Could we also mention:

and adds 1TB segment support.

> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Signed-off-by: Michael Neuling <mikey@neuling.org>

Reviewed-by: Anton Blanchard <anton@samba.org>

Anton


* Re: [PATCH 10/15] powerpc/mm: Add hooks for cxl
  2014-09-18  8:26 ` [PATCH 10/15] powerpc/mm: Add hooks for cxl Michael Neuling
@ 2014-09-26  4:33   ` Anton Blanchard
  2014-09-26 11:33     ` Michael Neuling
  2014-09-29  9:10   ` Aneesh Kumar K.V
  1 sibling, 1 reply; 43+ messages in thread
From: Anton Blanchard @ 2014-09-26  4:33 UTC (permalink / raw)
  To: Michael Neuling
  Cc: greg, arnd, mpe, benh, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev

> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> This adds a hook into tlbie() so that we use global invalidations when
> there are cxl contexts active.
> 
> Normally cxl snoops broadcast tlbie.  cxl can have TLB entries
> invalidated via MMIO, but we aren't doing that yet.  So for now we
> are just disabling local tlbies when cxl contexts are active.  In
> future we can make tlbie() local mode smarter so that it invalidates
> cxl contexts explicitly when it needs to.
> 
> This also adds hooks for when SLBs are invalidated to ensure any
> corresponding SLBs in cxl are also invalidated at the same time.
> 
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Signed-off-by: Michael Neuling <mikey@neuling.org>

> +	use_local = local && mmu_has_feature(MMU_FTR_TLBIEL) && !cxl_ctx_in_use();

Seems reasonable until we can get the MMIO based optimisation in.

Will all CAPI cached translations be invalidated before we finish using
a CAPI context? And conversely, could CAPI cache any translations when a
context isn't active? I mostly want to be sure we can't end up in a
situation where badly behaved userspace leaves a stale translation
behind.

>  	spu_flush_all_slbs(mm);
>  #endif
> +	cxl_slbia(mm);

>  			spu_flush_all_slbs(mm);
>  #endif
> +			cxl_slbia(mm);

>  	spu_flush_all_slbs(mm);
>  #endif
> +	cxl_slbia(mm);

>  	spu_flush_all_slbs(mm);
>  #endif
> +	cxl_slbia(mm);

Should we combine the SPU vs CXL callouts into something common -
perhaps copro_flush_all_slbs()?
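A minimal sketch of what such a wrapper might look like, assuming it
simply folds the two existing callouts together.  The stub bodies,
the counters and the CONFIG_SPU_BASE define are stand-ins so the
fragment compiles on its own; they are not the kernel's definitions.

```c
#include <stddef.h>

struct mm_struct;	/* opaque for the purposes of this sketch */

#define CONFIG_SPU_BASE 1	/* assumption: pretend Cell support is built */

/* Stand-in stubs that only count how often they were called. */
static int spu_flushes, cxl_flushes;

static void spu_flush_all_slbs(struct mm_struct *mm)
{
	(void)mm;
	spu_flushes++;
}

static void cxl_slbia(struct mm_struct *mm)
{
	(void)mm;
	cxl_flushes++;
}

/* One callout for every co-processor that caches SLB entries. */
static inline void copro_flush_all_slbs(struct mm_struct *mm)
{
#ifdef CONFIG_SPU_BASE
	spu_flush_all_slbs(mm);
#endif
	cxl_slbia(mm);
}
```

Each of the four call sites above would then shrink to a single
copro_flush_all_slbs(mm) call with no #ifdef around it.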

Anton


* Re: [PATCH 09/15] powerpc/opal: Add PHB to cxl mode call
  2014-09-18  8:26 ` [PATCH 09/15] powerpc/opal: Add PHB to cxl mode call Michael Neuling
@ 2014-09-26  4:35   ` Anton Blanchard
  0 siblings, 0 replies; 43+ messages in thread
From: Anton Blanchard @ 2014-09-26  4:35 UTC (permalink / raw)
  To: Michael Neuling
  Cc: greg, arnd, mpe, benh, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev

> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> This adds the OPAL call to change a PHB into cxl mode.
> 
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Signed-off-by: Michael Neuling <mikey@neuling.org>

Reviewed-by: Anton Blanchard <anton@samba.org>


* Re: [PATCH 02/15] powerpc/cell: Move data segment faulting code out of cell platform
  2014-09-26  4:05   ` Anton Blanchard
@ 2014-09-26 11:19     ` Michael Neuling
  0 siblings, 0 replies; 43+ messages in thread
From: Michael Neuling @ 2014-09-26 11:19 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: greg, arnd, mpe, benh, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev

On Fri, 2014-09-26 at 14:05 +1000, Anton Blanchard wrote:
> > From: Ian Munsie <imunsie@au1.ibm.com>
> > 
> > __spu_trap_data_seg() currently contains code to determine the VSID
> > and ESID required for a particular EA and mm struct.
> > 
> > This code is generically useful for other co-processors.  This moves
> > the code of the cell platform so it can be used by other powerpc code.
> 
> Could we also mention:
> 
> and adds 1TB segment support.

Good point.  I'll add.


> > Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> 
> Reviewed-by: Anton Blanchard <anton@samba.org>

Thanks,
Mikey


* Re: [PATCH 10/15] powerpc/mm: Add hooks for cxl
  2014-09-26  4:33   ` Anton Blanchard
@ 2014-09-26 11:33     ` Michael Neuling
  2014-09-26 13:24       ` Anton Blanchard
  0 siblings, 1 reply; 43+ messages in thread
From: Michael Neuling @ 2014-09-26 11:33 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: greg, arnd, mpe, benh, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev

On Fri, 2014-09-26 at 14:33 +1000, Anton Blanchard wrote:
> > From: Ian Munsie <imunsie@au1.ibm.com>
> > 
> > This add a hook into tlbie() so that we use global invalidations when
> > there are cxl contexts active.
> > 
> > Normally cxl snoops broadcast tlbie.  cxl can have TLB entries
> > invalidated via MMIO, but we aren't doing that yet.  So for now we
> > are just disabling local tlbies when cxl contexts are active.  In
> > future we can make tlbie() local mode smarter so that it invalidates
> > cxl contexts explicitly when it needs to.
> > 
> > This also adds a hooks for when SLBs are invalidated to ensure any
> > corresponding SLBs in cxl are also invalidated at the same time.
> > 
> > Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> 
> > +	use_local = local && mmu_has_feature(MMU_FTR_TLBIEL) && !cxl_ctx_in_use();
> 
> Seems reasonable until we can get the MMIO based optimisation in.
> 
> Will all CAPI cached translations be invalidated before we finish using
> a CAPI context? 

I'm not sure I understand. Can you elaborate?

> And conversely, could CAPI cache any translations when a
> context isn't active? 

The kernel invalidates all translations when the file descriptor is
closed.  So no, unless the PSL was behaving badly and ignoring the
invalidations... but if we can't trust the PSL we're screwed.

> I'm mostly concerned that we can't have a
> situation where badly behaving userspace could result in a stale
> translation.

We only map what a user process maps, and we tear it down when the
process is torn down (on the file descriptor release).  So I think we
are ok.

Unless there's some lazy teardown you're alluding to that I'm missing?

> 
> >  	spu_flush_all_slbs(mm);
> >  #endif
> > +	cxl_slbia(mm);
> 
> >  			spu_flush_all_slbs(mm);
> >  #endif
> > +			cxl_slbia(mm);
> 
> >  	spu_flush_all_slbs(mm);
> >  #endif
> > +	cxl_slbia(mm);
> 
> >  	spu_flush_all_slbs(mm);
> >  #endif
> > +	cxl_slbia(mm);
> 
> Should we combine the SPU vs CXL callouts into something common -
> perhaps copro_flush_all_slbs()?

Sounds good.  I'll update.

Mikey


* Re: [PATCH 10/15] powerpc/mm: Add hooks for cxl
  2014-09-26 11:33     ` Michael Neuling
@ 2014-09-26 13:24       ` Anton Blanchard
  0 siblings, 0 replies; 43+ messages in thread
From: Anton Blanchard @ 2014-09-26 13:24 UTC (permalink / raw)
  To: Michael Neuling
  Cc: greg, arnd, mpe, benh, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev


Hi Mikey,

> We only map what a user process maps, and we tear it down when the
> process is torn down (on the file descriptor release).  So I think
> we are ok.
> 
> Unless there's some lazy teardown you're alluding to that I'm missing?

I was trying to make sure things like the TLB batching code won't allow
a tlbie to be postponed until after a CAPI mapping is destroyed. It's
been ages since I looked at that part of the mm code.

Anton


* Re: [PATCH 02/15] powerpc/cell: Move data segment faulting code out of cell platform
  2014-09-18  8:26 ` [PATCH 02/15] powerpc/cell: Move data segment faulting code " Michael Neuling
  2014-09-18 10:27   ` Jeremy Kerr
  2014-09-26  4:05   ` Anton Blanchard
@ 2014-09-29  8:30   ` Aneesh Kumar K.V
  2014-09-30  4:40     ` Michael Neuling
  2 siblings, 1 reply; 43+ messages in thread
From: Aneesh Kumar K.V @ 2014-09-29  8:30 UTC (permalink / raw)
  To: Michael Neuling, greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie, cbe-oss-dev

Michael Neuling <mikey@neuling.org> writes:

> From: Ian Munsie <imunsie@au1.ibm.com>
>
> __spu_trap_data_seg() currently contains code to determine the VSID and ESID
> required for a particular EA and mm struct.
>
> This code is generically useful for other co-processors.  This moves the code
> of the cell platform so it can be used by other powerpc code.
>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
>  arch/powerpc/include/asm/mmu-hash64.h  |  2 ++
>  arch/powerpc/mm/copro_fault.c          | 48 ++++++++++++++++++++++++++++++++++
>  arch/powerpc/mm/slb.c                  |  3 ---
>  arch/powerpc/platforms/cell/spu_base.c | 41 +++--------------------------
>  4 files changed, 54 insertions(+), 40 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
> index d765144..fd19a53 100644
> --- a/arch/powerpc/include/asm/mmu-hash64.h
> +++ b/arch/powerpc/include/asm/mmu-hash64.h
> @@ -180,6 +180,8 @@ static inline unsigned int mmu_psize_to_shift(unsigned int mmu_psize)
>   * we work in all cases including 4k page size.
>   */
>  #define VPN_SHIFT	12
> +#define slb_vsid_shift(ssize)	\
> +	((ssize) == MMU_SEGSIZE_256M ? SLB_VSID_SHIFT : SLB_VSID_SHIFT_1T)

Can it be a static inline, similar to segment_shift()?
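A sketch of the static inline form, for illustration.  The constant
values below are assumptions added so the fragment builds standalone;
the real definitions come from mmu-hash64.h and should be checked
there.

```c
/* Assumed values, for a standalone build only. */
#define MMU_SEGSIZE_256M	0
#define MMU_SEGSIZE_1T		1
#define SLB_VSID_SHIFT		12
#define SLB_VSID_SHIFT_1T	24

/* Same selection as the macro, but with a typed argument and result. */
static inline int slb_vsid_shift(int ssize)
{
	if (ssize == MMU_SEGSIZE_256M)
		return SLB_VSID_SHIFT;
	return SLB_VSID_SHIFT_1T;
}
```

Compared with the macro, this evaluates ssize exactly once and gets
type checking on the argument.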

>  
>  /*
>   * HPTE Large Page (LP) details
> diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
> index ba7df14..4105a63 100644
> --- a/arch/powerpc/mm/copro_fault.c
> +++ b/arch/powerpc/mm/copro_fault.c
> @@ -90,3 +90,51 @@ out_unlock:
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
> +
> +int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid)
> +{
> +	int psize, ssize;
> +
> +	*esid = (ea & ESID_MASK) | SLB_ESID_V;
> +
> +	switch (REGION_ID(ea)) {
> +	case USER_REGION_ID:
> +		pr_devel("copro_data_segment: 0x%llx -- USER_REGION_ID\n", ea);
> +#ifdef CONFIG_PPC_MM_SLICES
> +		psize = get_slice_psize(mm, ea);
> +#else
> +		psize = mm->context.user_psize;
> +#endif

We don't need that #ifdef.  We already have:

#ifdef CONFIG_PPC_STD_MMU_64
#define get_slice_psize(mm, addr)	((mm)->context.user_psize)



> +		ssize = user_segment_size(ea);
> +		*vsid = (get_vsid(mm->context.id, ea, ssize)
> +			<< slb_vsid_shift(ssize)) | SLB_VSID_USER
> +			| (ssize == MMU_SEGSIZE_1T ? SLB_VSID_B_1T : 0);
> +		break;
> +	case VMALLOC_REGION_ID:
> +		pr_devel("copro_data_segment: 0x%llx -- VMALLOC_REGION_ID\n", ea);
> +		if (ea < VMALLOC_END)
> +			psize = mmu_vmalloc_psize;
> +		else
> +			psize = mmu_io_psize;
> +		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
> +			<< SLB_VSID_SHIFT) | SLB_VSID_KERNEL
> +			| (mmu_kernel_ssize == MMU_SEGSIZE_1T ? SLB_VSID_B_1T : 0);
> +		break;
> +	case KERNEL_REGION_ID:
> +		pr_devel("copro_data_segment: 0x%llx -- KERNEL_REGION_ID\n", ea);
> +		psize = mmu_linear_psize;
> +		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
> +			<< SLB_VSID_SHIFT) | SLB_VSID_KERNEL
> +			| (mmu_kernel_ssize == MMU_SEGSIZE_1T ? SLB_VSID_B_1T : 0);
> +		break;
> +	default:
> +		/* Future: support kernel segments so that drivers can use the
> +		 * CoProcessors */
> +		pr_debug("invalid region access at %016llx\n", ea);
> +		return 1;
> +	}
> +	*vsid |= mmu_psize_defs[psize].sllp;
> +
> +	return 0;
> +}

A large part of this is the same as what we do in hash_page().  And we
are not really updating the vsid here; it is the vsid SLB encoding.  So
why not abstract the vsid part and use that in hash_page() as well?
That would also have taken care of the above #ifdef.
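One possible shape for that abstraction, as a sketch only.
slb_vsid_encode() is a hypothetical name, and the constants are
assumed values added so the fragment compiles by itself; the real ones
are in mmu-hash64.h.

```c
#include <stdint.h>

/* Assumed values, for a standalone build only. */
#define MMU_SEGSIZE_256M	0
#define MMU_SEGSIZE_1T		1
#define SLB_VSID_SHIFT		12
#define SLB_VSID_SHIFT_1T	24
#define SLB_VSID_B_1T		0x4000000000000000ULL

static inline int slb_vsid_shift(int ssize)
{
	if (ssize == MMU_SEGSIZE_256M)
		return SLB_VSID_SHIFT;
	return SLB_VSID_SHIFT_1T;
}

/*
 * Hypothetical helper: turn a raw vsid into the SLB VSID word.
 * flags would be SLB_VSID_USER or SLB_VSID_KERNEL, and sllp the
 * page size encoding from mmu_psize_defs[psize].sllp.
 */
static inline uint64_t slb_vsid_encode(uint64_t vsid, int ssize,
				       uint64_t flags, uint64_t sllp)
{
	uint64_t v = vsid << slb_vsid_shift(ssize);

	if (ssize == MMU_SEGSIZE_1T)
		v |= SLB_VSID_B_1T;
	return v | flags | sllp;
}
```

Each case in the switch would then only pick the raw vsid, segment
size, flags and psize, and one call at the end would build the
encoding.  Note that this sketch shifts by slb_vsid_shift(ssize) in
every case, whereas the kernel cases in the patch shift by
SLB_VSID_SHIFT regardless of mmu_kernel_ssize.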

> +EXPORT_SYMBOL_GPL(copro_data_segment);
> diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
> index 0399a67..6e450ca 100644
> --- a/arch/powerpc/mm/slb.c
> +++ b/arch/powerpc/mm/slb.c
> @@ -46,9 +46,6 @@ static inline unsigned long mk_esid_data(unsigned long ea, int ssize,
>  	return (ea & slb_esid_mask(ssize)) | SLB_ESID_V | slot;
>  }
>  
> -#define slb_vsid_shift(ssize)	\
> -	((ssize) == MMU_SEGSIZE_256M? SLB_VSID_SHIFT: SLB_VSID_SHIFT_1T)
> -
>  static inline unsigned long mk_vsid_data(unsigned long ea, int ssize,
>  					 unsigned long flags)
>  {
> diff --git a/arch/powerpc/platforms/cell/spu_base.c b/arch/powerpc/platforms/cell/spu_base.c
> index 2930d1e..fe004b1 100644
> --- a/arch/powerpc/platforms/cell/spu_base.c
> +++ b/arch/powerpc/platforms/cell/spu_base.c
> @@ -167,45 +167,12 @@ static inline void spu_load_slb(struct spu *spu, int slbe, struct spu_slb *slb)
>  
>  static int __spu_trap_data_seg(struct spu *spu, unsigned long ea)
>  {
> -	struct mm_struct *mm = spu->mm;
>  	struct spu_slb slb;
> -	int psize;
> -
> -	pr_debug("%s\n", __func__);
> -
> -	slb.esid = (ea & ESID_MASK) | SLB_ESID_V;
> +	int ret;
>  
> -	switch(REGION_ID(ea)) {
> -	case USER_REGION_ID:
> -#ifdef CONFIG_PPC_MM_SLICES
> -		psize = get_slice_psize(mm, ea);
> -#else
> -		psize = mm->context.user_psize;
> -#endif
> -		slb.vsid = (get_vsid(mm->context.id, ea, MMU_SEGSIZE_256M)
> -				<< SLB_VSID_SHIFT) | SLB_VSID_USER;
> -		break;
> -	case VMALLOC_REGION_ID:
> -		if (ea < VMALLOC_END)
> -			psize = mmu_vmalloc_psize;
> -		else
> -			psize = mmu_io_psize;
> -		slb.vsid = (get_kernel_vsid(ea, MMU_SEGSIZE_256M)
> -				<< SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
> -		break;
> -	case KERNEL_REGION_ID:
> -		psize = mmu_linear_psize;
> -		slb.vsid = (get_kernel_vsid(ea, MMU_SEGSIZE_256M)
> -				<< SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
> -		break;
> -	default:
> -		/* Future: support kernel segments so that drivers
> -		 * can use SPUs.
> -		 */
> -		pr_debug("invalid region access at %016lx\n", ea);
> -		return 1;
> -	}
> -	slb.vsid |= mmu_psize_defs[psize].sllp;
> +	ret = copro_data_segment(spu->mm, ea, &slb.esid, &slb.vsid);
> +	if (ret)
> +		return ret;
>  
>  	spu_load_slb(spu, spu->slb_replace, &slb);
>  

-aneesh



* Re: [PATCH 08/15] powerpc/mm: Add new hash_page_mm()
  2014-09-18  8:26 ` [PATCH 08/15] powerpc/mm: Add new hash_page_mm() Michael Neuling
@ 2014-09-29  8:50   ` Aneesh Kumar K.V
       [not found]     ` <1412054407.1733.77.camel@ale.ozlabs.ibm.com>
  0 siblings, 1 reply; 43+ messages in thread
From: Aneesh Kumar K.V @ 2014-09-29  8:50 UTC (permalink / raw)
  To: Michael Neuling, greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie, cbe-oss-dev

Michael Neuling <mikey@neuling.org> writes:

> From: Ian Munsie <imunsie@au1.ibm.com>
>
> This adds a new function hash_page_mm() based on the existing hash_page().
> This version allows any struct mm to be passed in, rather than assuming
> current.  This is useful for servicing co-processor faults which are not in the
> context of the current running process.
>
> We need to be careful here as the current hash_page() assumes current in a few
> places.

Can you also explain the calling semantics, i.e. why would we want to
call this with anything other than current?  Should we flush the SLB
now, or should that be skipped?  And what would happen if the new hash
page can result in segment demotion?  You don't put that under
if (mm == current->mm).  Is that OK?

	if ((pte_val(*ptep) & _PAGE_4K_PFN) && psize == MMU_PAGE_64K) {
		demote_segment_4k(mm, ea);
		psize = MMU_PAGE_4K;
	}

We also update the paca context there:

	if (get_paca_psize(addr) != MMU_PAGE_4K) {
		get_paca()->context = mm->context;
		slb_flush_and_rebolt();
	}


You also added code to handle KERNEL_REGION_ID in
[PATCH 02/15] powerpc/cell: Move data segment faulting code out of cell
platform.  Do we need to handle that here?


>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
>  arch/powerpc/include/asm/mmu-hash64.h |  1 +
>  arch/powerpc/mm/hash_utils_64.c       | 20 +++++++++++++-------
>  2 files changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
> index fd19a53..a3b85e9 100644
> --- a/arch/powerpc/include/asm/mmu-hash64.h
> +++ b/arch/powerpc/include/asm/mmu-hash64.h
> @@ -319,6 +319,7 @@ extern int __hash_page_64K(unsigned long ea, unsigned long access,
>  			   unsigned int local, int ssize);
>  struct mm_struct;
>  unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap);
> +extern int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, unsigned long trap);
>  extern int hash_page(unsigned long ea, unsigned long access, unsigned long trap);
>  int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
>  		     pte_t *ptep, unsigned long trap, int local, int ssize,
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index 0f73367..66071af 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -991,26 +991,24 @@ static void check_paca_psize(unsigned long ea, struct mm_struct *mm,
>   * -1 - critical hash insertion error
>   * -2 - access not permitted by subpage protection mechanism
>   */
> -int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> +int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, unsigned long trap)
>  {
>  	enum ctx_state prev_state = exception_enter();
>  	pgd_t *pgdir;
>  	unsigned long vsid;
> -	struct mm_struct *mm;
>  	pte_t *ptep;
>  	unsigned hugeshift;
>  	const struct cpumask *tmp;
>  	int rc, user_region = 0, local = 0;
>  	int psize, ssize;
>  
> -	DBG_LOW("hash_page(ea=%016lx, access=%lx, trap=%lx\n",
> -		ea, access, trap);
> +	DBG_LOW("%s(ea=%016lx, access=%lx, trap=%lx\n",
> +		__func__, ea, access, trap);
>  
>  	/* Get region & vsid */
>   	switch (REGION_ID(ea)) {
>  	case USER_REGION_ID:
>  		user_region = 1;
> -		mm = current->mm;
>  		if (! mm) {
>  			DBG_LOW(" user region with no mm !\n");
>  			rc = 1;
> @@ -1106,7 +1104,8 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
>  			WARN_ON(1);
>  		}
>  #endif
> -		check_paca_psize(ea, mm, psize, user_region);
> +		if (current->mm == mm)
> +			check_paca_psize(ea, mm, psize, user_region);
>  
>  		goto bail;
>  	}
> @@ -1149,7 +1148,8 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
>  		}
>  	}
>  
> -	check_paca_psize(ea, mm, psize, user_region);
> +	if (current->mm == mm)
> +		check_paca_psize(ea, mm, psize, user_region);
>  #endif /* CONFIG_PPC_64K_PAGES */
>  
>  #ifdef CONFIG_PPC_HAS_HASH_64K
> @@ -1184,6 +1184,12 @@ bail:
>  	exception_exit(prev_state);
>  	return rc;
>  }
> +EXPORT_SYMBOL_GPL(hash_page_mm);
> +
> +int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> +{
> +	return hash_page_mm(current->mm, ea, access, trap);
> +}
>  EXPORT_SYMBOL_GPL(hash_page);
>  
>  void hash_preload(struct mm_struct *mm, unsigned long ea,
> -- 
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/



* Re: [PATCH 10/15] powerpc/mm: Add hooks for cxl
  2014-09-18  8:26 ` [PATCH 10/15] powerpc/mm: Add hooks for cxl Michael Neuling
  2014-09-26  4:33   ` Anton Blanchard
@ 2014-09-29  9:10   ` Aneesh Kumar K.V
  1 sibling, 0 replies; 43+ messages in thread
From: Aneesh Kumar K.V @ 2014-09-29  9:10 UTC (permalink / raw)
  To: Michael Neuling, greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie, cbe-oss-dev

Michael Neuling <mikey@neuling.org> writes:

> From: Ian Munsie <imunsie@au1.ibm.com>
>
> This add a hook into tlbie() so that we use global invalidations when there are
> cxl contexts active.
>
> Normally cxl snoops broadcast tlbie.  cxl can have TLB entries invalidated via
> MMIO, but we aren't doing that yet.  So for now we are just disabling local
> tlbies when cxl contexts are active.  In future we can make tlbie() local mode
> smarter so that it invalidates cxl contexts explicitly when it needs to.
>
> This also adds a hooks for when SLBs are invalidated to ensure any
> corresponding SLBs in cxl are also invalidated at the same time.

We are not really invalidating cxl SLBs when we are doing
slb_flush_and_rebolt().  Maybe add some code documentation to
explain when we are invalidating the cxl SLBs here?

>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
>  arch/powerpc/mm/hash_native_64.c | 6 +++++-
>  arch/powerpc/mm/hash_utils_64.c  | 3 +++
>  arch/powerpc/mm/slice.c          | 3 +++
>  3 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
> index afc0a82..ae4962a 100644
> --- a/arch/powerpc/mm/hash_native_64.c
> +++ b/arch/powerpc/mm/hash_native_64.c
> @@ -29,6 +29,8 @@
>  #include <asm/kexec.h>
>  #include <asm/ppc-opcode.h>
>  
> +#include <misc/cxl.h>
> +
>  #ifdef DEBUG_LOW
>  #define DBG_LOW(fmt...) udbg_printf(fmt)
>  #else
> @@ -149,9 +151,11 @@ static inline void __tlbiel(unsigned long vpn, int psize, int apsize, int ssize)
>  static inline void tlbie(unsigned long vpn, int psize, int apsize,
>  			 int ssize, int local)
>  {
> -	unsigned int use_local = local && mmu_has_feature(MMU_FTR_TLBIEL);
> +	unsigned int use_local;
>  	int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
>  
> +	use_local = local && mmu_has_feature(MMU_FTR_TLBIEL) && !cxl_ctx_in_use();
> +
>  	if (use_local)
>  		use_local = mmu_psize_defs[psize].tlbiel;
>  	if (lock_tlbie && !use_local)
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index 66071af..be40ff7 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -34,6 +34,7 @@
>  #include <linux/signal.h>
>  #include <linux/memblock.h>
>  #include <linux/context_tracking.h>
> +#include <misc/cxl.h>
>  
>  #include <asm/processor.h>
>  #include <asm/pgtable.h>
> @@ -906,6 +907,7 @@ void demote_segment_4k(struct mm_struct *mm, unsigned long addr)
>  #ifdef CONFIG_SPU_BASE
>  	spu_flush_all_slbs(mm);
>  #endif
> +	cxl_slbia(mm);
>  	if (get_paca_psize(addr) != MMU_PAGE_4K) {
>  		get_paca()->context = mm->context;
>  		slb_flush_and_rebolt();
> @@ -1145,6 +1147,7 @@ int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, u
>  #ifdef CONFIG_SPU_BASE
>  			spu_flush_all_slbs(mm);
>  #endif
> +			cxl_slbia(mm);
>  		}
>  	}
>  
> diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
> index b0c75cc..4d3a34b 100644
> --- a/arch/powerpc/mm/slice.c
> +++ b/arch/powerpc/mm/slice.c
> @@ -30,6 +30,7 @@
>  #include <linux/err.h>
>  #include <linux/spinlock.h>
>  #include <linux/export.h>
> +#include <misc/cxl.h>
>  #include <asm/mman.h>
>  #include <asm/mmu.h>
>  #include <asm/spu.h>
> @@ -235,6 +236,7 @@ static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psiz
>  #ifdef CONFIG_SPU_BASE
>  	spu_flush_all_slbs(mm);
>  #endif
> +	cxl_slbia(mm);
>  }
>  
>  /*
> @@ -674,6 +676,7 @@ void slice_set_psize(struct mm_struct *mm, unsigned long address,
>  #ifdef CONFIG_SPU_BASE
>  	spu_flush_all_slbs(mm);
>  #endif
> +	cxl_slbia(mm);
>  }
>  
>  void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
> -- 
> 1.9.1
>



* Re: [PATCH 02/15] powerpc/cell: Move data segment faulting code out of cell platform
  2014-09-29  8:30   ` Aneesh Kumar K.V
@ 2014-09-30  4:40     ` Michael Neuling
  0 siblings, 0 replies; 43+ messages in thread
From: Michael Neuling @ 2014-09-30  4:40 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: greg, arnd, mpe, benh, anton, linux-kernel, linuxppc-dev, jk,
	imunsie, cbe-oss-dev

On Mon, 2014-09-29 at 14:00 +0530, Aneesh Kumar K.V wrote:
> Michael Neuling <mikey@neuling.org> writes:
> 
> > From: Ian Munsie <imunsie@au1.ibm.com>
> >
> > __spu_trap_data_seg() currently contains code to determine the VSID and ESID
> > required for a particular EA and mm struct.
> >
> > This code is generically useful for other co-processors.  This moves the code
> > of the cell platform so it can be used by other powerpc code.
> >
> > Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > ---
> >  arch/powerpc/include/asm/mmu-hash64.h  |  2 ++
> >  arch/powerpc/mm/copro_fault.c          | 48 ++++++++++++++++++++++++++++++++++
> >  arch/powerpc/mm/slb.c                  |  3 ---
> >  arch/powerpc/platforms/cell/spu_base.c | 41 +++--------------------------
> >  4 files changed, 54 insertions(+), 40 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
> > index d765144..fd19a53 100644
> > --- a/arch/powerpc/include/asm/mmu-hash64.h
> > +++ b/arch/powerpc/include/asm/mmu-hash64.h
> > @@ -180,6 +180,8 @@ static inline unsigned int mmu_psize_to_shift(unsigned int mmu_psize)
> >   * we work in all cases including 4k page size.
> >   */
> >  #define VPN_SHIFT	12
> > +#define slb_vsid_shift(ssize)	\
> > +	((ssize) == MMU_SEGSIZE_256M ? SLB_VSID_SHIFT : SLB_VSID_SHIFT_1T)
> 
> > Can it be a static inline, similar to segment_shift()?

Yep.

> 
> >  
> >  /*
> >   * HPTE Large Page (LP) details
> > diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
> > index ba7df14..4105a63 100644
> > --- a/arch/powerpc/mm/copro_fault.c
> > +++ b/arch/powerpc/mm/copro_fault.c
> > @@ -90,3 +90,51 @@ out_unlock:
> >  	return ret;
> >  }
> >  EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
> > +
> > +int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid)
> > +{
> > +	int psize, ssize;
> > +
> > +	*esid = (ea & ESID_MASK) | SLB_ESID_V;
> > +
> > +	switch (REGION_ID(ea)) {
> > +	case USER_REGION_ID:
> > +		pr_devel("copro_data_segment: 0x%llx -- USER_REGION_ID\n", ea);
> > +#ifdef CONFIG_PPC_MM_SLICES
> > +		psize = get_slice_psize(mm, ea);
> > +#else
> > +		psize = mm->context.user_psize;
> > +#endif
> 
> > We don't need that #ifdef.  We already have:
> 
> #ifdef CONFIG_PPC_STD_MMU_64
> #define get_slice_psize(mm, addr)	((mm)->context.user_psize)

OK

> 
> 
> > +		ssize = user_segment_size(ea);
> > +		*vsid = (get_vsid(mm->context.id, ea, ssize)
> > +			<< slb_vsid_shift(ssize)) | SLB_VSID_USER
> > +			| (ssize == MMU_SEGSIZE_1T ? SLB_VSID_B_1T : 0);
> > +		break;
> > +	case VMALLOC_REGION_ID:
> > +		pr_devel("copro_data_segment: 0x%llx -- VMALLOC_REGION_ID\n", ea);
> > +		if (ea < VMALLOC_END)
> > +			psize = mmu_vmalloc_psize;
> > +		else
> > +			psize = mmu_io_psize;
> > +		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
> > +			<< SLB_VSID_SHIFT) | SLB_VSID_KERNEL
> > +			| (mmu_kernel_ssize == MMU_SEGSIZE_1T ? SLB_VSID_B_1T : 0);
> > +		break;
> > +	case KERNEL_REGION_ID:
> > +		pr_devel("copro_data_segment: 0x%llx -- KERNEL_REGION_ID\n", ea);
> > +		psize = mmu_linear_psize;
> > +		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
> > +			<< SLB_VSID_SHIFT) | SLB_VSID_KERNEL
> > +			| (mmu_kernel_ssize == MMU_SEGSIZE_1T ? SLB_VSID_B_1T : 0);
> > +		break;
> > +	default:
> > +		/* Future: support kernel segments so that drivers can use the
> > +		 * CoProcessors */
> > +		pr_debug("invalid region access at %016llx\n", ea);
> > +		return 1;
> > +	}
> > +	*vsid |= mmu_psize_defs[psize].sllp;
> > +
> > +	return 0;
> > +}
> 
> > A large part of this is the same as what we do in hash_page().  And we
> > are not really updating the vsid here; it is the vsid SLB encoding.  So
> > why not abstract the vsid part and use that in hash_page() as well?
> > That would also have taken care of the above #ifdef.

OK, I've merged these two variants.

I'm going to repost this whole series soon.  It'll be in there.

Thanks for the comments.

Mikey

> 
> > +EXPORT_SYMBOL_GPL(copro_data_segment);
> > diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
> > index 0399a67..6e450ca 100644
> > --- a/arch/powerpc/mm/slb.c
> > +++ b/arch/powerpc/mm/slb.c
> > @@ -46,9 +46,6 @@ static inline unsigned long mk_esid_data(unsigned long ea, int ssize,
> >  	return (ea & slb_esid_mask(ssize)) | SLB_ESID_V | slot;
> >  }
> >  
> > -#define slb_vsid_shift(ssize)	\
> > -	((ssize) == MMU_SEGSIZE_256M? SLB_VSID_SHIFT: SLB_VSID_SHIFT_1T)
> > -
> >  static inline unsigned long mk_vsid_data(unsigned long ea, int ssize,
> >  					 unsigned long flags)
> >  {
> > diff --git a/arch/powerpc/platforms/cell/spu_base.c b/arch/powerpc/platforms/cell/spu_base.c
> > index 2930d1e..fe004b1 100644
> > --- a/arch/powerpc/platforms/cell/spu_base.c
> > +++ b/arch/powerpc/platforms/cell/spu_base.c
> > @@ -167,45 +167,12 @@ static inline void spu_load_slb(struct spu *spu, int slbe, struct spu_slb *slb)
> >  
> >  static int __spu_trap_data_seg(struct spu *spu, unsigned long ea)
> >  {
> > -	struct mm_struct *mm = spu->mm;
> >  	struct spu_slb slb;
> > -	int psize;
> > -
> > -	pr_debug("%s\n", __func__);
> > -
> > -	slb.esid = (ea & ESID_MASK) | SLB_ESID_V;
> > +	int ret;
> >  
> > -	switch(REGION_ID(ea)) {
> > -	case USER_REGION_ID:
> > -#ifdef CONFIG_PPC_MM_SLICES
> > -		psize = get_slice_psize(mm, ea);
> > -#else
> > -		psize = mm->context.user_psize;
> > -#endif
> > -		slb.vsid = (get_vsid(mm->context.id, ea, MMU_SEGSIZE_256M)
> > -				<< SLB_VSID_SHIFT) | SLB_VSID_USER;
> > -		break;
> > -	case VMALLOC_REGION_ID:
> > -		if (ea < VMALLOC_END)
> > -			psize = mmu_vmalloc_psize;
> > -		else
> > -			psize = mmu_io_psize;
> > -		slb.vsid = (get_kernel_vsid(ea, MMU_SEGSIZE_256M)
> > -				<< SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
> > -		break;
> > -	case KERNEL_REGION_ID:
> > -		psize = mmu_linear_psize;
> > -		slb.vsid = (get_kernel_vsid(ea, MMU_SEGSIZE_256M)
> > -				<< SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
> > -		break;
> > -	default:
> > -		/* Future: support kernel segments so that drivers
> > -		 * can use SPUs.
> > -		 */
> > -		pr_debug("invalid region access at %016lx\n", ea);
> > -		return 1;
> > -	}
> > -	slb.vsid |= mmu_psize_defs[psize].sllp;
> > +	ret = copro_data_segment(spu->mm, ea, &slb.esid, &slb.vsid);
> > +	if (ret)
> > +		return ret;
> >  
> >  	spu_load_slb(spu, spu->slb_replace, &slb);
> >  
> 
> -aneesh
> 



* Re: [PATCH 08/15] powerpc/mm: Add new hash_page_mm()
       [not found]     ` <1412054407.1733.77.camel@ale.ozlabs.ibm.com>
@ 2014-09-30  6:13       ` Michael Neuling
  0 siblings, 0 replies; 43+ messages in thread
From: Michael Neuling @ 2014-09-30  6:13 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: greg, arnd, mpe, benh, anton, linux-kernel, linuxppc-dev, jk,
	imunsie, cbe-oss-dev

> > You also added code to handle KERNEL_REGION_ID in
> > [PATCH 02/15] powerpc/cell: Move data segment faulting code out of cell
> > platform.  Do we need to handle that here?
> > 

(Sorry, I missed this in my other reply...)

I've refactored that code now so it should be handled in the vsid
calculation.  It's not my strong area so you might wanna check on
repost.

Thanks again.

Mikey



Thread overview: 43+ messages
2014-09-18  8:26 [PATCH 0/15] POWER8 Coherent Accelerator device driver Michael Neuling
2014-09-18  8:26 ` [PATCH 01/15] powerpc/cell: Move spu_handle_mm_fault() out of cell platform Michael Neuling
2014-09-18 10:00   ` Jeremy Kerr
2014-09-18 23:26     ` Michael Neuling
2014-09-26  3:57   ` Anton Blanchard
2014-09-18  8:26 ` [PATCH 02/15] powerpc/cell: Move data segment faulting code " Michael Neuling
2014-09-18 10:27   ` Jeremy Kerr
2014-09-18 23:45     ` Michael Neuling
2014-09-26  4:05   ` Anton Blanchard
2014-09-26 11:19     ` Michael Neuling
2014-09-29  8:30   ` Aneesh Kumar K.V
2014-09-30  4:40     ` Michael Neuling
2014-09-18  8:26 ` [PATCH 03/15] powerpc/msi: Improve IRQ bitmap allocator Michael Neuling
2014-09-19 20:16   ` Scott Wood
2014-09-19 20:19     ` Scott Wood
2014-09-22  8:26       ` Laurentiu Tudor
2014-09-22 23:50         ` Scott Wood
2014-09-22  8:25     ` Laurentiu Tudor
2014-09-22  8:29   ` Laurentiu Tudor
2014-09-22 22:59     ` Michael Neuling
2014-09-18  8:26 ` [PATCH 04/15] powerpc/mm: Export mmu_kernel_ssize and mmu_linear_psize Michael Neuling
2014-09-18  8:26 ` [PATCH 05/15] powerpc/powernv: Split out set MSI IRQ chip code Michael Neuling
2014-09-19  6:54   ` Gavin Shan
2014-09-22  4:31     ` Michael Neuling
2014-09-18  8:26 ` [PATCH 06/15] cxl: Add new header for call backs and structs Michael Neuling
2014-09-18  8:26 ` [PATCH 07/15] powerpc/powerpc: Add new PCIe functions for allocating cxl interrupts Michael Neuling
2014-09-19  7:09   ` Gavin Shan
2014-09-22  5:01     ` Michael Neuling
2014-09-18  8:26 ` [PATCH 08/15] powerpc/mm: Add new hash_page_mm() Michael Neuling
2014-09-29  8:50   ` Aneesh Kumar K.V
     [not found]     ` <1412054407.1733.77.camel@ale.ozlabs.ibm.com>
2014-09-30  6:13       ` Michael Neuling
2014-09-18  8:26 ` [PATCH 09/15] powerpc/opal: Add PHB to cxl mode call Michael Neuling
2014-09-26  4:35   ` Anton Blanchard
2014-09-18  8:26 ` [PATCH 10/15] powerpc/mm: Add hooks for cxl Michael Neuling
2014-09-26  4:33   ` Anton Blanchard
2014-09-26 11:33     ` Michael Neuling
2014-09-26 13:24       ` Anton Blanchard
2014-09-29  9:10   ` Aneesh Kumar K.V
2014-09-18  8:26 ` [PATCH 11/15] cxl: Add base builtin support Michael Neuling
2014-09-18  8:26 ` [PATCH 12/15] cxl: Driver code for powernv PCIe based cards for userspace access Michael Neuling
2014-09-18  8:26 ` [PATCH 13/15] cxl: Userspace header file Michael Neuling
2014-09-18  8:26 ` [PATCH 14/15] cxl: Add driver to Kbuild and Makefiles Michael Neuling
2014-09-18  8:27 ` [PATCH 15/15] cxl: Add documentation for userspace APIs Michael Neuling
