* [PATCH v2 0/17] POWER8 Coherent Accelerator device driver
@ 2014-09-30 10:34 ` Michael Neuling
  0 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-09-30 10:34 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

v2:
 - Updates based on comments from Anton, Gavin, Aneesh, jk and offline reviews
 - Simplified copro_data_segment() and merged code with hash_page_mm()
    (New patch 10/17)
 - PCIe code simplifications based on Gavin's review
 - Removed redundant comment in msi_bitmap_alloc_hwirqs()
 - Fix for locking in idr_remove in core driver
 - Ensure PSL is enabled when PHB is flipped to CXL mode
 - Added CONFIG_PPC_COPRO_BASE to compile copro_fault.c
 - Merged SPU and cxl slb flushing calls into copro_flush_all_slbs()
    (New patch 03/17)
 - Moved slb_vsid_shift() to static inline from #define
 - Don't write paca->context when demoting segments and mm != current
 - Fix minor typos in documentation

v1:
 - Initial post

This adds support for the Coherent Accelerator (cxl) attached to POWER8
processors.  This coherent accelerator interface is designed to allow the
coherent connection of FPGA-based accelerators (and other devices) to POWER
systems.

IBM refers to this as the Coherent Accelerator Processor Interface or CAPI.  In
this driver it's referred to by the name cxl to avoid confusion with the ISDN
CAPI subsystem.

An overview of the patches:
  Patches  1-3:  Split some of the old Cell co-processor code out so it can be
		   reused.
  Patches  4-11: Add infrastructure to arch/powerpc needed by cxl.
  Patch    12:   Add callbacks needed for invalidating cxl mm contexts.
  Patch    13:   Add cxl-specific support that must be built into the
		   kernel (can't be a module).
  Patches 14-16: Add the majority of the device driver and API header.
  Patch    17:   Documentation.

The documentation in this last patch gives an overview of the hardware
architecture as well as the userspace API.

The cxl driver has a user-space interface described in include/uapi/misc/cxl.h
and Documentation/powerpc/cxl.txt.  There are two ioctls which can be used to
talk to the driver once the new /dev/cxl/afu0.0 device is opened.  This device
can also be read and mmapped.

There are also sysfs entries used to communicate information about the cxl
configuration to userspace.  These are documented in
Documentation/ABI/testing/sysfs-class-cxl.
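Those entries can be inspected with ordinary shell tools once the driver is
loaded; a sketch (the /sys/class/cxl directory name follows the
sysfs-class-cxl document above, but the attributes actually listed depend on
the card and driver state):

```shell
# Show the cxl sysfs hierarchy if the driver is loaded; otherwise say so.
if [ -d /sys/class/cxl ]; then
    # One entry per AFU (e.g. afu0.0), each with its documented attributes.
    ls /sys/class/cxl/
else
    echo "no cxl sysfs entries"
fi
```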

Many people contributed to this device driver, but Ian Munsie is the principal author.

The driver can also be found here (based on 3.17-rc5):
   git://github.com/mikey/linux.git cxl
   https://github.com/mikey/linux/commits/cxl
(The series rebases on recent linux-next with one trivial include-file conflict)

Please consider for inclusion.  Feedback welcome!

Regards,
Mikey

 Documentation/ABI/testing/sysfs-class-cxl      | 125 ++++
 Documentation/ioctl/ioctl-number.txt           |   1 +
 Documentation/powerpc/00-INDEX                 |   2 +
 Documentation/powerpc/cxl.txt                  | 310 ++++++++
 MAINTAINERS                                    |   7 +
 arch/powerpc/Kconfig                           |   4 +
 arch/powerpc/include/asm/copro.h               |  24 +
 arch/powerpc/include/asm/mmu-hash64.h          |  10 +-
 arch/powerpc/include/asm/opal.h                |   2 +
 arch/powerpc/include/asm/pnv-pci.h             |  27 +
 arch/powerpc/include/asm/spu.h                 |   5 +-
 arch/powerpc/mm/Makefile                       |   1 +
 arch/powerpc/mm/copro_fault.c                  | 124 ++++
 arch/powerpc/mm/hash_native_64.c               |   6 +-
 arch/powerpc/mm/hash_utils_64.c                |  95 ++-
 arch/powerpc/mm/slb.c                          |   3 -
 arch/powerpc/mm/slice.c                        |  10 +-
 arch/powerpc/platforms/cell/Kconfig            |   1 +
 arch/powerpc/platforms/cell/Makefile           |   2 +-
 arch/powerpc/platforms/cell/spu_base.c         |  41 +-
 arch/powerpc/platforms/cell/spu_fault.c        |  94 ---
 arch/powerpc/platforms/cell/spufs/fault.c      |   4 +-
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/pci-ioda.c      | 204 +++++-
 arch/powerpc/sysdev/msi_bitmap.c               |  23 +-
 drivers/misc/Kconfig                           |   1 +
 drivers/misc/Makefile                          |   1 +
 drivers/misc/cxl/Kconfig                       |  26 +
 drivers/misc/cxl/Makefile                      |   4 +
 drivers/misc/cxl/base.c                        | 102 +++
 drivers/misc/cxl/context.c                     | 171 +++++
 drivers/misc/cxl/cxl-pci.c                     | 964 +++++++++++++++++++++++++
 drivers/misc/cxl/cxl.h                         | 605 ++++++++++++++++
 drivers/misc/cxl/debugfs.c                     | 116 +++
 drivers/misc/cxl/fault.c                       | 298 ++++++++
 drivers/misc/cxl/file.c                        | 503 +++++++++++++
 drivers/misc/cxl/irq.c                         | 405 +++++++++++
 drivers/misc/cxl/main.c                        | 238 ++++++
 drivers/misc/cxl/native.c                      | 649 +++++++++++++++++
 drivers/misc/cxl/sysfs.c                       | 348 +++++++++
 include/misc/cxl.h                             |  34 +
 include/uapi/Kbuild                            |   1 +
 include/uapi/misc/Kbuild                       |   2 +
 include/uapi/misc/cxl.h                        |  88 +++
 44 files changed, 5469 insertions(+), 213 deletions(-)

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v2 01/17] powerpc/cell: Move spu_handle_mm_fault() out of cell platform
  2014-09-30 10:34 ` Michael Neuling
@ 2014-09-30 10:34   ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-09-30 10:34 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

From: Ian Munsie <imunsie@au1.ibm.com>

Currently spu_handle_mm_fault() is in the cell platform.

This code is generically useful for other non-cell co-processors on powerpc.

This patch moves this function out of the cell platform into arch/powerpc/mm so
that others may use it.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/Kconfig                                   |  4 ++++
 arch/powerpc/include/asm/copro.h                       | 18 ++++++++++++++++++
 arch/powerpc/include/asm/spu.h                         |  5 ++---
 arch/powerpc/mm/Makefile                               |  1 +
 .../{platforms/cell/spu_fault.c => mm/copro_fault.c}   | 14 ++++++--------
 arch/powerpc/platforms/cell/Kconfig                    |  1 +
 arch/powerpc/platforms/cell/Makefile                   |  2 +-
 arch/powerpc/platforms/cell/spufs/fault.c              |  4 ++--
 8 files changed, 35 insertions(+), 14 deletions(-)
 create mode 100644 arch/powerpc/include/asm/copro.h
 rename arch/powerpc/{platforms/cell/spu_fault.c => mm/copro_fault.c} (89%)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 4bc7b62..8f094e9 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -603,6 +603,10 @@ config PPC_SUBPAGE_PROT
 	  to set access permissions (read/write, readonly, or no access)
 	  on the 4k subpages of each 64k page.
 
+config PPC_COPRO_BASE
+	bool
+	default n
+
 config SCHED_SMT
 	bool "SMT (Hyperthreading) scheduler support"
 	depends on PPC64 && SMP
diff --git a/arch/powerpc/include/asm/copro.h b/arch/powerpc/include/asm/copro.h
new file mode 100644
index 0000000..2858108
--- /dev/null
+++ b/arch/powerpc/include/asm/copro.h
@@ -0,0 +1,18 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _ASM_POWERPC_COPRO_H
+#define _ASM_POWERPC_COPRO_H
+
+int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
+			  unsigned long dsisr, unsigned *flt);
+
+int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid);
+
+#endif /* _ASM_POWERPC_COPRO_H */
diff --git a/arch/powerpc/include/asm/spu.h b/arch/powerpc/include/asm/spu.h
index 37b7ca3..a6e6e2b 100644
--- a/arch/powerpc/include/asm/spu.h
+++ b/arch/powerpc/include/asm/spu.h
@@ -27,6 +27,8 @@
 #include <linux/workqueue.h>
 #include <linux/device.h>
 #include <linux/mutex.h>
+#include <asm/reg.h>
+#include <asm/copro.h>
 
 #define LS_SIZE (256 * 1024)
 #define LS_ADDR_MASK (LS_SIZE - 1)
@@ -277,9 +279,6 @@ void spu_remove_dev_attr(struct device_attribute *attr);
 int spu_add_dev_attr_group(struct attribute_group *attrs);
 void spu_remove_dev_attr_group(struct attribute_group *attrs);
 
-int spu_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
-		unsigned long dsisr, unsigned *flt);
-
 /*
  * Notifier blocks:
  *
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index d0130ff..325e861 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -34,3 +34,4 @@ obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += hugepage-hash64.o
 obj-$(CONFIG_PPC_SUBPAGE_PROT)	+= subpage-prot.o
 obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-noncoherent.o
 obj-$(CONFIG_HIGHMEM)		+= highmem.o
+obj-$(CONFIG_PPC_COPRO_BASE)	+= copro_fault.o
diff --git a/arch/powerpc/platforms/cell/spu_fault.c b/arch/powerpc/mm/copro_fault.c
similarity index 89%
rename from arch/powerpc/platforms/cell/spu_fault.c
rename to arch/powerpc/mm/copro_fault.c
index 641e727..ba7df14 100644
--- a/arch/powerpc/platforms/cell/spu_fault.c
+++ b/arch/powerpc/mm/copro_fault.c
@@ -1,5 +1,5 @@
 /*
- * SPU mm fault handler
+ * CoProcessor (SPU/AFU) mm fault handler
  *
  * (C) Copyright IBM Deutschland Entwicklung GmbH 2007
  *
@@ -23,16 +23,14 @@
 #include <linux/sched.h>
 #include <linux/mm.h>
 #include <linux/export.h>
-
-#include <asm/spu.h>
-#include <asm/spu_csa.h>
+#include <asm/reg.h>
 
 /*
  * This ought to be kept in sync with the powerpc specific do_page_fault
  * function. Currently, there are a few corner cases that we haven't had
  * to handle fortunately.
  */
-int spu_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
+int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
 		unsigned long dsisr, unsigned *flt)
 {
 	struct vm_area_struct *vma;
@@ -58,12 +56,12 @@ int spu_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
 			goto out_unlock;
 	}
 
-	is_write = dsisr & MFC_DSISR_ACCESS_PUT;
+	is_write = dsisr & DSISR_ISSTORE;
 	if (is_write) {
 		if (!(vma->vm_flags & VM_WRITE))
 			goto out_unlock;
 	} else {
-		if (dsisr & MFC_DSISR_ACCESS_DENIED)
+		if (dsisr & DSISR_PROTFAULT)
 			goto out_unlock;
 		if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
 			goto out_unlock;
@@ -91,4 +89,4 @@ out_unlock:
 	up_read(&mm->mmap_sem);
 	return ret;
 }
-EXPORT_SYMBOL_GPL(spu_handle_mm_fault);
+EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
diff --git a/arch/powerpc/platforms/cell/Kconfig b/arch/powerpc/platforms/cell/Kconfig
index 9978f59..870b6db 100644
--- a/arch/powerpc/platforms/cell/Kconfig
+++ b/arch/powerpc/platforms/cell/Kconfig
@@ -86,6 +86,7 @@ config SPU_FS_64K_LS
 config SPU_BASE
 	bool
 	default n
+	select PPC_COPRO_BASE
 
 config CBE_RAS
 	bool "RAS features for bare metal Cell BE"
diff --git a/arch/powerpc/platforms/cell/Makefile b/arch/powerpc/platforms/cell/Makefile
index fe053e7..2d16884 100644
--- a/arch/powerpc/platforms/cell/Makefile
+++ b/arch/powerpc/platforms/cell/Makefile
@@ -20,7 +20,7 @@ spu-manage-$(CONFIG_PPC_CELL_COMMON)	+= spu_manage.o
 
 obj-$(CONFIG_SPU_BASE)			+= spu_callbacks.o spu_base.o \
 					   spu_notify.o \
-					   spu_syscalls.o spu_fault.o \
+					   spu_syscalls.o \
 					   $(spu-priv1-y) \
 					   $(spu-manage-y) \
 					   spufs/
diff --git a/arch/powerpc/platforms/cell/spufs/fault.c b/arch/powerpc/platforms/cell/spufs/fault.c
index 8cb6260..e45894a 100644
--- a/arch/powerpc/platforms/cell/spufs/fault.c
+++ b/arch/powerpc/platforms/cell/spufs/fault.c
@@ -138,7 +138,7 @@ int spufs_handle_class1(struct spu_context *ctx)
 	if (ctx->state == SPU_STATE_RUNNABLE)
 		ctx->spu->stats.hash_flt++;
 
-	/* we must not hold the lock when entering spu_handle_mm_fault */
+	/* we must not hold the lock when entering copro_handle_mm_fault */
 	spu_release(ctx);
 
 	access = (_PAGE_PRESENT | _PAGE_USER);
@@ -149,7 +149,7 @@ int spufs_handle_class1(struct spu_context *ctx)
 
 	/* hashing failed, so try the actual fault handler */
 	if (ret)
-		ret = spu_handle_mm_fault(current->mm, ea, dsisr, &flt);
+		ret = copro_handle_mm_fault(current->mm, ea, dsisr, &flt);
 
 	/*
 	 * This is nasty: we need the state_mutex for all the bookkeeping even
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v2 02/17] powerpc/cell: Move data segment faulting code out of cell platform
  2014-09-30 10:34 ` Michael Neuling
@ 2014-09-30 10:34   ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-09-30 10:34 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

From: Ian Munsie <imunsie@au1.ibm.com>

__spu_trap_data_seg() currently contains code to determine the VSID and ESID
required for a particular EA and mm struct.

This code is generically useful for other co-processors.  This patch moves the
code out of the cell platform so it can be used by other powerpc code.  It also
adds 1TB segment handling which Cell didn't have.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/include/asm/mmu-hash64.h  |  7 ++++-
 arch/powerpc/mm/copro_fault.c          | 48 ++++++++++++++++++++++++++++++++++
 arch/powerpc/mm/slb.c                  |  3 ---
 arch/powerpc/platforms/cell/spu_base.c | 41 +++--------------------------
 4 files changed, 58 insertions(+), 41 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index d765144..6d0b7a2 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -189,7 +189,12 @@ static inline unsigned int mmu_psize_to_shift(unsigned int mmu_psize)
 #define LP_MASK(i)	((0xFF >> (i)) << LP_SHIFT)
 
 #ifndef __ASSEMBLY__
-
+static inline int slb_vsid_shift(int ssize)
+{
+	if (ssize == MMU_SEGSIZE_256M)
+		return SLB_VSID_SHIFT;
+	return SLB_VSID_SHIFT_1T;
+}
 static inline int segment_shift(int ssize)
 {
 	if (ssize == MMU_SEGSIZE_256M)
diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
index ba7df14..b865697 100644
--- a/arch/powerpc/mm/copro_fault.c
+++ b/arch/powerpc/mm/copro_fault.c
@@ -90,3 +90,51 @@ out_unlock:
 	return ret;
 }
 EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
+
+int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid)
+{
+	int psize, ssize;
+
+	*esid = (ea & ESID_MASK) | SLB_ESID_V;
+
+	switch (REGION_ID(ea)) {
+	case USER_REGION_ID:
+		pr_devel("copro_data_segment: 0x%llx -- USER_REGION_ID\n", ea);
+#ifdef CONFIG_PPC_MM_SLICES
+		psize = get_slice_psize(mm, ea);
+#else
+		psize = mm->context.user_psize;
+#endif
+		ssize = user_segment_size(ea);
+		*vsid = (get_vsid(mm->context.id, ea, ssize)
+			 << slb_vsid_shift(ssize)) | SLB_VSID_USER;
+		break;
+	case VMALLOC_REGION_ID:
+		pr_devel("copro_data_segment: 0x%llx -- VMALLOC_REGION_ID\n", ea);
+		if (ea < VMALLOC_END)
+			psize = mmu_vmalloc_psize;
+		else
+			psize = mmu_io_psize;
+		ssize = mmu_kernel_ssize;
+		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
+			 << SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
+		break;
+	case KERNEL_REGION_ID:
+		pr_devel("copro_data_segment: 0x%llx -- KERNEL_REGION_ID\n", ea);
+		psize = mmu_linear_psize;
+		ssize = mmu_kernel_ssize;
+		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
+			 << SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
+		break;
+	default:
+		/* Future: support kernel segments so that drivers can use the
+		 * CoProcessors */
+		pr_debug("invalid region access at %016llx\n", ea);
+		return 1;
+	}
+	*vsid |= mmu_psize_defs[psize].sllp |
+		((ssize == MMU_SEGSIZE_1T) ? SLB_VSID_B_1T : 0);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(copro_data_segment);
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 0399a67..6e450ca 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -46,9 +46,6 @@ static inline unsigned long mk_esid_data(unsigned long ea, int ssize,
 	return (ea & slb_esid_mask(ssize)) | SLB_ESID_V | slot;
 }
 
-#define slb_vsid_shift(ssize)	\
-	((ssize) == MMU_SEGSIZE_256M? SLB_VSID_SHIFT: SLB_VSID_SHIFT_1T)
-
 static inline unsigned long mk_vsid_data(unsigned long ea, int ssize,
 					 unsigned long flags)
 {
diff --git a/arch/powerpc/platforms/cell/spu_base.c b/arch/powerpc/platforms/cell/spu_base.c
index 2930d1e..fe004b1 100644
--- a/arch/powerpc/platforms/cell/spu_base.c
+++ b/arch/powerpc/platforms/cell/spu_base.c
@@ -167,45 +167,12 @@ static inline void spu_load_slb(struct spu *spu, int slbe, struct spu_slb *slb)
 
 static int __spu_trap_data_seg(struct spu *spu, unsigned long ea)
 {
-	struct mm_struct *mm = spu->mm;
 	struct spu_slb slb;
-	int psize;
-
-	pr_debug("%s\n", __func__);
-
-	slb.esid = (ea & ESID_MASK) | SLB_ESID_V;
+	int ret;
 
-	switch(REGION_ID(ea)) {
-	case USER_REGION_ID:
-#ifdef CONFIG_PPC_MM_SLICES
-		psize = get_slice_psize(mm, ea);
-#else
-		psize = mm->context.user_psize;
-#endif
-		slb.vsid = (get_vsid(mm->context.id, ea, MMU_SEGSIZE_256M)
-				<< SLB_VSID_SHIFT) | SLB_VSID_USER;
-		break;
-	case VMALLOC_REGION_ID:
-		if (ea < VMALLOC_END)
-			psize = mmu_vmalloc_psize;
-		else
-			psize = mmu_io_psize;
-		slb.vsid = (get_kernel_vsid(ea, MMU_SEGSIZE_256M)
-				<< SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
-		break;
-	case KERNEL_REGION_ID:
-		psize = mmu_linear_psize;
-		slb.vsid = (get_kernel_vsid(ea, MMU_SEGSIZE_256M)
-				<< SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
-		break;
-	default:
-		/* Future: support kernel segments so that drivers
-		 * can use SPUs.
-		 */
-		pr_debug("invalid region access at %016lx\n", ea);
-		return 1;
-	}
-	slb.vsid |= mmu_psize_defs[psize].sllp;
+	ret = copro_data_segment(spu->mm, ea, &slb.esid, &slb.vsid);
+	if (ret)
+		return ret;
 
 	spu_load_slb(spu, spu->slb_replace, &slb);
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v2 03/17] powerpc/cell: Make spu_flush_all_slbs() generic
  2014-09-30 10:34 ` Michael Neuling
@ 2014-09-30 10:34   ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-09-30 10:34 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

From: Ian Munsie <imunsie@au1.ibm.com>

This moves spu_flush_all_slbs() into a generic call, copro_flush_all_slbs().

This will be useful when we add cxl, which also needs a similar SLB flush call.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/include/asm/copro.h |  6 ++++++
 arch/powerpc/mm/copro_fault.c    |  9 +++++++++
 arch/powerpc/mm/hash_utils_64.c  | 10 +++-------
 arch/powerpc/mm/slice.c          | 10 +++-------
 4 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/copro.h b/arch/powerpc/include/asm/copro.h
index 2858108..f3d338f 100644
--- a/arch/powerpc/include/asm/copro.h
+++ b/arch/powerpc/include/asm/copro.h
@@ -15,4 +15,10 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
 
 int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid);
 
+
+#ifdef CONFIG_PPC_COPRO_BASE
+void copro_flush_all_slbs(struct mm_struct *mm);
+#else
+#define copro_flush_all_slbs(mm) do {} while(0)
+#endif
 #endif /* _ASM_POWERPC_COPRO_H */
diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
index b865697..939abdf 100644
--- a/arch/powerpc/mm/copro_fault.c
+++ b/arch/powerpc/mm/copro_fault.c
@@ -24,6 +24,7 @@
 #include <linux/mm.h>
 #include <linux/export.h>
 #include <asm/reg.h>
+#include <asm/spu.h>
 
 /*
  * This ought to be kept in sync with the powerpc specific do_page_fault
@@ -138,3 +139,11 @@ int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid)
 	return 0;
 }
 EXPORT_SYMBOL_GPL(copro_data_segment);
+
+void copro_flush_all_slbs(struct mm_struct *mm)
+{
+#ifdef CONFIG_SPU_BASE
+	spu_flush_all_slbs(mm);
+#endif
+}
+EXPORT_SYMBOL_GPL(copro_flush_all_slbs);
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index daee7f4..5c0738d 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -51,7 +51,7 @@
 #include <asm/cacheflush.h>
 #include <asm/cputable.h>
 #include <asm/sections.h>
-#include <asm/spu.h>
+#include <asm/copro.h>
 #include <asm/udbg.h>
 #include <asm/code-patching.h>
 #include <asm/fadump.h>
@@ -901,9 +901,7 @@ void demote_segment_4k(struct mm_struct *mm, unsigned long addr)
 	if (get_slice_psize(mm, addr) == MMU_PAGE_4K)
 		return;
 	slice_set_range_psize(mm, addr, 1, MMU_PAGE_4K);
-#ifdef CONFIG_SPU_BASE
-	spu_flush_all_slbs(mm);
-#endif
+	copro_flush_all_slbs(mm);
 	if (get_paca_psize(addr) != MMU_PAGE_4K) {
 		get_paca()->context = mm->context;
 		slb_flush_and_rebolt();
@@ -1141,9 +1139,7 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
 			       "to 4kB pages because of "
 			       "non-cacheable mapping\n");
 			psize = mmu_vmalloc_psize = MMU_PAGE_4K;
-#ifdef CONFIG_SPU_BASE
-			spu_flush_all_slbs(mm);
-#endif
+			copro_flush_all_slbs(mm);
 		}
 	}
 
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index b0c75cc..a81791c 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -32,7 +32,7 @@
 #include <linux/export.h>
 #include <asm/mman.h>
 #include <asm/mmu.h>
-#include <asm/spu.h>
+#include <asm/copro.h>
 
 /* some sanity checks */
 #if (PGTABLE_RANGE >> 43) > SLICE_MASK_SIZE
@@ -232,9 +232,7 @@ static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psiz
 
 	spin_unlock_irqrestore(&slice_convert_lock, flags);
 
-#ifdef CONFIG_SPU_BASE
-	spu_flush_all_slbs(mm);
-#endif
+	copro_flush_all_slbs(mm);
 }
 
 /*
@@ -671,9 +669,7 @@ void slice_set_psize(struct mm_struct *mm, unsigned long address,
 
 	spin_unlock_irqrestore(&slice_convert_lock, flags);
 
-#ifdef CONFIG_SPU_BASE
-	spu_flush_all_slbs(mm);
-#endif
+	copro_flush_all_slbs(mm);
 }
 
 void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v2 04/17] powerpc/msi: Improve IRQ bitmap allocator
  2014-09-30 10:34 ` Michael Neuling
@ 2014-09-30 10:34   ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-09-30 10:34 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

From: Ian Munsie <imunsie@au1.ibm.com>

Currently msi_bitmap_alloc_hwirqs() rounds up any IRQ allocation request to the
nearest power of 2, e.g. ask for 5 IRQs and you'll get 8.  This wastes a lot of
IRQs, which can be a scarce resource.

For cxl we can require multiple IRQs for every context that is attached to the
accelerator.  For AFU directed accelerators there may be thousands of contexts
attached, so we can easily run out of IRQs, especially if we are needlessly
wasting them.

This changes msi_bitmap_alloc_hwirqs() to allocate only the required number
of IRQs, avoiding this wastage.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/sysdev/msi_bitmap.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/sysdev/msi_bitmap.c b/arch/powerpc/sysdev/msi_bitmap.c
index 2ff6302..961a358 100644
--- a/arch/powerpc/sysdev/msi_bitmap.c
+++ b/arch/powerpc/sysdev/msi_bitmap.c
@@ -20,32 +20,37 @@ int msi_bitmap_alloc_hwirqs(struct msi_bitmap *bmp, int num)
 	int offset, order = get_count_order(num);
 
 	spin_lock_irqsave(&bmp->lock, flags);
-	/*
-	 * This is fast, but stricter than we need. We might want to add
-	 * a fallback routine which does a linear search with no alignment.
-	 */
-	offset = bitmap_find_free_region(bmp->bitmap, bmp->irq_count, order);
+
+	offset = bitmap_find_next_zero_area(bmp->bitmap, bmp->irq_count, 0,
+					    num, (1 << order) - 1);
+	if (offset > bmp->irq_count)
+		goto err;
+	bitmap_set(bmp->bitmap, offset, num);
 	spin_unlock_irqrestore(&bmp->lock, flags);
 
 	pr_debug("msi_bitmap: allocated 0x%x (2^%d) at offset 0x%x\n",
 		 num, order, offset);
 
 	return offset;
+err:
+	spin_unlock_irqrestore(&bmp->lock, flags);
+	return -ENOMEM;
 }
+EXPORT_SYMBOL(msi_bitmap_alloc_hwirqs);
 
 void msi_bitmap_free_hwirqs(struct msi_bitmap *bmp, unsigned int offset,
 			    unsigned int num)
 {
 	unsigned long flags;
-	int order = get_count_order(num);
 
-	pr_debug("msi_bitmap: freeing 0x%x (2^%d) at offset 0x%x\n",
-		 num, order, offset);
+	pr_debug("msi_bitmap: freeing 0x%x at offset 0x%x\n",
+		 num, offset);
 
 	spin_lock_irqsave(&bmp->lock, flags);
-	bitmap_release_region(bmp->bitmap, offset, order);
+	bitmap_clear(bmp->bitmap, offset, num);
 	spin_unlock_irqrestore(&bmp->lock, flags);
 }
+EXPORT_SYMBOL(msi_bitmap_free_hwirqs);
 
 void msi_bitmap_reserve_hwirq(struct msi_bitmap *bmp, unsigned int hwirq)
 {
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v2 05/17] powerpc/mm: Export mmu_kernel_ssize and mmu_linear_psize
  2014-09-30 10:34 ` Michael Neuling
@ 2014-09-30 10:34   ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-09-30 10:34 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

From: Ian Munsie <imunsie@au1.ibm.com>

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/mm/hash_utils_64.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 5c0738d..bbdb054 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -98,6 +98,7 @@ unsigned long htab_size_bytes;
 unsigned long htab_hash_mask;
 EXPORT_SYMBOL_GPL(htab_hash_mask);
 int mmu_linear_psize = MMU_PAGE_4K;
+EXPORT_SYMBOL_GPL(mmu_linear_psize);
 int mmu_virtual_psize = MMU_PAGE_4K;
 int mmu_vmalloc_psize = MMU_PAGE_4K;
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
@@ -105,6 +106,7 @@ int mmu_vmemmap_psize = MMU_PAGE_4K;
 #endif
 int mmu_io_psize = MMU_PAGE_4K;
 int mmu_kernel_ssize = MMU_SEGSIZE_256M;
+EXPORT_SYMBOL_GPL(mmu_kernel_ssize);
 int mmu_highuser_ssize = MMU_SEGSIZE_256M;
 u16 mmu_slb_size = 64;
 EXPORT_SYMBOL_GPL(mmu_slb_size);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v2 06/17] powerpc/powernv: Split out set MSI IRQ chip code
  2014-09-30 10:34 ` Michael Neuling
@ 2014-09-30 10:34   ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-09-30 10:34 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

From: Ian Munsie <imunsie@au1.ibm.com>

Some of the MSI IRQ code in pnv_pci_ioda_msi_setup() is generically useful so
split it out.

This will be used by some of the cxl PCIe code later.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 43 ++++++++++++++++++-------------
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index df241b1..329164f 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1306,14 +1306,36 @@ static void pnv_ioda2_msi_eoi(struct irq_data *d)
 	icp_native_eoi(d);
 }
 
+
+static void set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq)
+{
+	struct irq_data *idata;
+	struct irq_chip *ichip;
+
+	if (phb->type != PNV_PHB_IODA2)
+		return;
+
+	/*
+	 * Change the IRQ chip for the MSI interrupts on PHB3.
+	 * The corresponding IRQ chip should be populated for
+	 * the first time.
+	 */
+	if (!phb->ioda.irq_chip_init) {
+		idata = irq_get_irq_data(virq);
+		ichip = irq_data_get_irq_chip(idata);
+		phb->ioda.irq_chip_init = 1;
+		phb->ioda.irq_chip = *ichip;
+		phb->ioda.irq_chip.irq_eoi = pnv_ioda2_msi_eoi;
+	}
+	irq_set_chip(virq, &phb->ioda.irq_chip);
+}
+
 static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
 				  unsigned int hwirq, unsigned int virq,
 				  unsigned int is_64, struct msi_msg *msg)
 {
 	struct pnv_ioda_pe *pe = pnv_ioda_get_pe(dev);
 	struct pci_dn *pdn = pci_get_pdn(dev);
-	struct irq_data *idata;
-	struct irq_chip *ichip;
 	unsigned int xive_num = hwirq - phb->msi_base;
 	__be32 data;
 	int rc;
@@ -1365,22 +1387,7 @@ static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
 	}
 	msg->data = be32_to_cpu(data);
 
-	/*
-	 * Change the IRQ chip for the MSI interrupts on PHB3.
-	 * The corresponding IRQ chip should be populated for
-	 * the first time.
-	 */
-	if (phb->type == PNV_PHB_IODA2) {
-		if (!phb->ioda.irq_chip_init) {
-			idata = irq_get_irq_data(virq);
-			ichip = irq_data_get_irq_chip(idata);
-			phb->ioda.irq_chip_init = 1;
-			phb->ioda.irq_chip = *ichip;
-			phb->ioda.irq_chip.irq_eoi = pnv_ioda2_msi_eoi;
-		}
-
-		irq_set_chip(virq, &phb->ioda.irq_chip);
-	}
+	set_msi_irq_chip(phb, virq);
 
 	pr_devel("%s: %s-bit MSI on hwirq %x (xive #%d),"
 		 " address=%x_%08x data=%x PE# %d\n",
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v2 07/17] cxl: Add new header for call backs and structs
  2014-09-30 10:34 ` Michael Neuling
@ 2014-09-30 10:34   ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-09-30 10:34 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

From: Ian Munsie <imunsie@au1.ibm.com>

This new header adds definitions for the callbacks and structs needed by the
rest of the kernel to hook into the cxl infrastructure.

Empty functions are provided when CONFIG_CXL_BASE is not enabled.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 include/misc/cxl.h | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)
 create mode 100644 include/misc/cxl.h

diff --git a/include/misc/cxl.h b/include/misc/cxl.h
new file mode 100644
index 0000000..bde46a3
--- /dev/null
+++ b/include/misc/cxl.h
@@ -0,0 +1,34 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _MISC_ASM_CXL_H
+#define _MISC_ASM_CXL_H
+
+#define CXL_IRQ_RANGES 4
+
+struct cxl_irq_ranges {
+	irq_hw_number_t offset[CXL_IRQ_RANGES];
+	irq_hw_number_t range[CXL_IRQ_RANGES];
+};
+
+#ifdef CONFIG_CXL_BASE
+
+void cxl_slbia(struct mm_struct *mm);
+void cxl_ctx_get(void);
+void cxl_ctx_put(void);
+bool cxl_ctx_in_use(void);
+
+#else /* CONFIG_CXL_BASE */
+
+#define cxl_slbia(...) do { } while (0)
+#define cxl_ctx_in_use(...) false
+
+#endif /* CONFIG_CXL_BASE */
+
+#endif
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v2 08/17] powerpc/powernv: Add new PCIe functions for allocating cxl interrupts
  2014-09-30 10:34 ` Michael Neuling
@ 2014-09-30 10:34   ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-09-30 10:34 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

From: Ian Munsie <imunsie@au1.ibm.com>

This adds a number of functions to the powernv PCIe code for allocating IRQs for cxl.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/include/asm/pnv-pci.h        |  27 +++++
 arch/powerpc/platforms/powernv/pci-ioda.c | 161 ++++++++++++++++++++++++++++++
 2 files changed, 188 insertions(+)
 create mode 100644 arch/powerpc/include/asm/pnv-pci.h

diff --git a/arch/powerpc/include/asm/pnv-pci.h b/arch/powerpc/include/asm/pnv-pci.h
new file mode 100644
index 0000000..71717b5
--- /dev/null
+++ b/arch/powerpc/include/asm/pnv-pci.h
@@ -0,0 +1,27 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _ASM_PNV_PCI_H
+#define _ASM_PNV_PCI_H
+
+#include <linux/pci.h>
+#include <misc/cxl.h>
+
+int pnv_phb_to_cxl(struct pci_dev *dev);
+int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
+			   unsigned int virq);
+int pnv_cxl_alloc_hwirqs(struct pci_dev *dev, int num);
+void pnv_cxl_release_hwirqs(struct pci_dev *dev, int hwirq, int num);
+int pnv_cxl_alloc_hwirq_ranges(struct cxl_irq_ranges *irqs,
+			       struct pci_dev *dev, int num);
+void pnv_cxl_release_hwirq_ranges(struct cxl_irq_ranges *irqs,
+				  struct pci_dev *dev);
+int pnv_cxl_get_irq_count(struct pci_dev *dev);
+
+#endif
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 329164f..b0b96f0 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -38,6 +38,8 @@
 #include <asm/debug.h>
 #include <asm/firmware.h>
 
+#include <misc/cxl.h>
+
 #include "powernv.h"
 #include "pci.h"
 
@@ -503,6 +505,138 @@ static struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev)
 		return NULL;
 	return &phb->ioda.pe_array[pdn->pe_number];
 }
+
+struct device_node *pnv_pci_to_phb_node(struct pci_dev *dev)
+{
+        struct pci_controller *hose = pci_bus_to_host(dev->bus);
+
+        return hose->dn;
+}
+EXPORT_SYMBOL(pnv_pci_to_phb_node);
+
+#ifdef CONFIG_CXL_BASE
+int pnv_phb_to_cxl(struct pci_dev *dev)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct pnv_ioda_pe *pe;
+	int rc;
+
+	if (!(pe = pnv_ioda_get_pe(dev))) {
+		rc = -ENODEV;
+		goto out;
+	}
+	pe_info(pe, "switch PHB to CXL\n");
+	pe_info(pe, "PHB-ID  : 0x%016llx\n", phb->opal_id);
+	pe_info(pe, "     pe : %i\n", pe->pe_number);
+
+	if ((rc = opal_pci_set_phb_cxl_mode(phb->opal_id, 1, pe->pe_number)))
+		dev_err(&dev->dev, "opal_pci_set_phb_cxl_mode failed: %i\n", rc);
+
+out:
+	return rc;
+}
+EXPORT_SYMBOL(pnv_phb_to_cxl);
+
+int pnv_cxl_alloc_hwirqs(struct pci_dev *dev, int num)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	int hwirq = msi_bitmap_alloc_hwirqs(&phb->msi_bmp, num);
+
+	if (hwirq < 0) {
+		dev_warn(&dev->dev, "Failed to find a free MSI\n");
+		return -ENOSPC;
+	}
+
+	return phb->msi_base + hwirq;
+}
+EXPORT_SYMBOL(pnv_cxl_alloc_hwirqs);
+
+void pnv_cxl_release_hwirqs(struct pci_dev *dev, int hwirq, int num)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+
+	msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq - phb->msi_base, num);
+}
+EXPORT_SYMBOL(pnv_cxl_release_hwirqs);
+
+
+int pnv_cxl_alloc_hwirq_ranges(struct cxl_irq_ranges *irqs,
+			       struct pci_dev *dev, int num)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	int range = 0;
+	int hwirq;
+	int try;
+
+	memset(irqs, 0, sizeof(struct cxl_irq_ranges));
+
+	for (range = 1; range < CXL_IRQ_RANGES && num; range++) {
+		try = num;
+		while (try) {
+			hwirq = msi_bitmap_alloc_hwirqs(&phb->msi_bmp, try);
+			if (hwirq >= 0)
+				break;
+			try /= 2;
+		}
+		if (!try)
+			goto fail;
+
+		irqs->offset[range] = phb->msi_base + hwirq;
+		irqs->range[range] = try;
+		pr_devel("cxl alloc irq range 0x%x: offset: 0x%lx  limit: %li\n",
+			 range, irqs->offset[range], irqs->range[range]);
+		num -= try;
+	}
+	if (num)
+		goto fail;
+
+	return 0;
+fail:
+	for (range--; range >= 0; range--) {
+		hwirq = irqs->offset[range] - phb->msi_base;
+		msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq,
+				       irqs->range[range]);
+		irqs->range[range] = 0;
+	}
+	return -ENOSPC;
+}
+EXPORT_SYMBOL(pnv_cxl_alloc_hwirq_ranges);
+
+void pnv_cxl_release_hwirq_ranges(struct cxl_irq_ranges *irqs,
+				  struct pci_dev *dev)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	int range = 0;
+	int hwirq;
+
+	for (range = 0; range < 4; range++) {
+		hwirq = irqs->offset[range] - phb->msi_base;
+		if (irqs->range[range]) {
+			pr_devel("cxl release irq range 0x%x: offset: 0x%lx  limit: %ld\n",
+				 range, irqs->offset[range],
+				 irqs->range[range]);
+			msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq,
+					       irqs->range[range]);
+		}
+	}
+}
+EXPORT_SYMBOL(pnv_cxl_release_hwirq_ranges);
+
+int pnv_cxl_get_irq_count(struct pci_dev *dev)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+        struct pnv_phb *phb = hose->private_data;
+
+	return phb->msi_bmp.irq_count;
+}
+EXPORT_SYMBOL(pnv_cxl_get_irq_count);
+
+#endif /* CONFIG_CXL_BASE */
 #endif /* CONFIG_PCI_MSI */
 
 static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
@@ -1330,6 +1464,33 @@ static void set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq)
 	irq_set_chip(virq, &phb->ioda.irq_chip);
 }
 
+#ifdef CONFIG_CXL_BASE
+int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
+			   unsigned int virq)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	unsigned int xive_num = hwirq - phb->msi_base;
+	struct pnv_ioda_pe *pe;
+	int rc;
+
+	if (!(pe = pnv_ioda_get_pe(dev)))
+		return -ENODEV;
+
+	/* Assign XIVE to PE */
+	rc = opal_pci_set_xive_pe(phb->opal_id, pe->pe_number, xive_num);
+	if (rc) {
+		pr_warn("%s: OPAL error %d setting msi_base 0x%x hwirq 0x%x XIVE 0x%x PE\n",
+			pci_name(dev), rc, phb->msi_base, hwirq, xive_num);
+		return -EIO;
+	}
+	set_msi_irq_chip(phb, virq);
+
+	return 0;
+}
+EXPORT_SYMBOL(pnv_cxl_ioda_msi_setup);
+#endif
+
 static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
 				  unsigned int hwirq, unsigned int virq,
 				  unsigned int is_64, struct msi_msg *msg)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v2 09/17] powerpc/mm: Add new hash_page_mm()
  2014-09-30 10:34 ` Michael Neuling
@ 2014-09-30 10:34   ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-09-30 10:34 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

From: Ian Munsie <imunsie@au1.ibm.com>

This adds a new function hash_page_mm() based on the existing hash_page().
This version allows any struct mm to be passed in, rather than assuming
current.  This is useful for servicing co-processor faults which are not in the
context of the currently running process.

We need to be careful here as the current hash_page() assumes current in a few
places.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/include/asm/mmu-hash64.h |  1 +
 arch/powerpc/mm/hash_utils_64.c       | 22 ++++++++++++++--------
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index 6d0b7a2..f84e5a5 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -322,6 +322,7 @@ extern int __hash_page_64K(unsigned long ea, unsigned long access,
 			   unsigned int local, int ssize);
 struct mm_struct;
 unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap);
+extern int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, unsigned long trap);
 extern int hash_page(unsigned long ea, unsigned long access, unsigned long trap);
 int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
 		     pte_t *ptep, unsigned long trap, int local, int ssize,
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index bbdb054..0a5c8c0 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -904,7 +904,7 @@ void demote_segment_4k(struct mm_struct *mm, unsigned long addr)
 		return;
 	slice_set_range_psize(mm, addr, 1, MMU_PAGE_4K);
 	copro_flush_all_slbs(mm);
-	if (get_paca_psize(addr) != MMU_PAGE_4K) {
+	if ((get_paca_psize(addr) != MMU_PAGE_4K) && (current->mm == mm)) {
 		get_paca()->context = mm->context;
 		slb_flush_and_rebolt();
 	}
@@ -989,26 +989,24 @@ static void check_paca_psize(unsigned long ea, struct mm_struct *mm,
  * -1 - critical hash insertion error
  * -2 - access not permitted by subpage protection mechanism
  */
-int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
+int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, unsigned long trap)
 {
 	enum ctx_state prev_state = exception_enter();
 	pgd_t *pgdir;
 	unsigned long vsid;
-	struct mm_struct *mm;
 	pte_t *ptep;
 	unsigned hugeshift;
 	const struct cpumask *tmp;
 	int rc, user_region = 0, local = 0;
 	int psize, ssize;
 
-	DBG_LOW("hash_page(ea=%016lx, access=%lx, trap=%lx\n",
-		ea, access, trap);
+	DBG_LOW("%s(ea=%016lx, access=%lx, trap=%lx\n",
+		__func__, ea, access, trap);
 
 	/* Get region & vsid */
  	switch (REGION_ID(ea)) {
 	case USER_REGION_ID:
 		user_region = 1;
-		mm = current->mm;
 		if (! mm) {
 			DBG_LOW(" user region with no mm !\n");
 			rc = 1;
@@ -1104,7 +1102,8 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
 			WARN_ON(1);
 		}
 #endif
-		check_paca_psize(ea, mm, psize, user_region);
+		if (current->mm == mm)
+			check_paca_psize(ea, mm, psize, user_region);
 
 		goto bail;
 	}
@@ -1145,7 +1144,8 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
 		}
 	}
 
-	check_paca_psize(ea, mm, psize, user_region);
+	if (current->mm == mm)
+		check_paca_psize(ea, mm, psize, user_region);
 #endif /* CONFIG_PPC_64K_PAGES */
 
 #ifdef CONFIG_PPC_HAS_HASH_64K
@@ -1180,6 +1180,12 @@ bail:
 	exception_exit(prev_state);
 	return rc;
 }
+EXPORT_SYMBOL_GPL(hash_page_mm);
+
+int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
+{
+	return hash_page_mm(current->mm, ea, access, trap);
+}
 EXPORT_SYMBOL_GPL(hash_page);
 
 void hash_preload(struct mm_struct *mm, unsigned long ea,
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v2 10/17] powerpc/mm: Merge vsid calculation in hash_page() and copro_data_segment()
  2014-09-30 10:34 ` Michael Neuling
@ 2014-09-30 10:34   ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-09-30 10:34 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

From: Ian Munsie <imunsie@au1.ibm.com>

The vsid calculations in hash_page() and copro_data_segment() are very
similar.  This merges the two versions into one shared helper.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/include/asm/mmu-hash64.h |  2 ++
 arch/powerpc/mm/copro_fault.c         | 45 ++++++--------------------
 arch/powerpc/mm/hash_utils_64.c       | 61 ++++++++++++++++++++++-------------
 3 files changed, 50 insertions(+), 58 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index f84e5a5..bf43fb0 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -322,6 +322,8 @@ extern int __hash_page_64K(unsigned long ea, unsigned long access,
 			   unsigned int local, int ssize);
 struct mm_struct;
 unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap);
+int calculate_vsid(struct mm_struct *mm, u64 ea,
+		   u64 *vsid, int *psize, int *ssize);
 extern int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, unsigned long trap);
 extern int hash_page(unsigned long ea, unsigned long access, unsigned long trap);
 int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
index 939abdf..ba8bf8e 100644
--- a/arch/powerpc/mm/copro_fault.c
+++ b/arch/powerpc/mm/copro_fault.c
@@ -94,45 +94,18 @@ EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
 
 int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid)
 {
-	int psize, ssize;
+	int psize, ssize, rc;
 
 	*esid = (ea & ESID_MASK) | SLB_ESID_V;
 
-	switch (REGION_ID(ea)) {
-	case USER_REGION_ID:
-		pr_devel("copro_data_segment: 0x%llx -- USER_REGION_ID\n", ea);
-#ifdef CONFIG_PPC_MM_SLICES
-		psize = get_slice_psize(mm, ea);
-#else
-		psize = mm->context.user_psize;
-#endif
-		ssize = user_segment_size(ea);
-		*vsid = (get_vsid(mm->context.id, ea, ssize)
-			 << slb_vsid_shift(ssize)) | SLB_VSID_USER;
-		break;
-	case VMALLOC_REGION_ID:
-		pr_devel("copro_data_segment: 0x%llx -- VMALLOC_REGION_ID\n", ea);
-		if (ea < VMALLOC_END)
-			psize = mmu_vmalloc_psize;
-		else
-			psize = mmu_io_psize;
-		ssize = mmu_kernel_ssize;
-		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
-			 << SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
-		break;
-	case KERNEL_REGION_ID:
-		pr_devel("copro_data_segment: 0x%llx -- KERNEL_REGION_ID\n", ea);
-		psize = mmu_linear_psize;
-		ssize = mmu_kernel_ssize;
-		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
-			 << SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
-		break;
-	default:
-		/* Future: support kernel segments so that drivers can use the
-		 * CoProcessors */
-		pr_debug("invalid region access at %016llx\n", ea);
-		return 1;
-	}
+	rc = calculate_vsid(mm, ea, vsid, &psize, &ssize);
+	if (rc)
+		return rc;
+	if (REGION_ID(ea) == USER_REGION_ID)
+		*vsid = (*vsid << slb_vsid_shift(ssize)) | SLB_VSID_USER;
+	else
+		*vsid = (*vsid << SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
+
 	*vsid |= mmu_psize_defs[psize].sllp |
 		((ssize == MMU_SEGSIZE_1T) ? SLB_VSID_B_1T : 0);
 
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 0a5c8c0..3fa81ca 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -983,6 +983,38 @@ static void check_paca_psize(unsigned long ea, struct mm_struct *mm,
 	}
 }
 
+int calculate_vsid(struct mm_struct *mm, u64 ea,
+		   u64 *vsid, int *psize, int *ssize)
+{
+	switch (REGION_ID(ea)) {
+	case USER_REGION_ID:
+		pr_devel("%s: 0x%llx -- USER_REGION_ID\n", __func__, ea);
+		*psize = get_slice_psize(mm, ea);
+		*ssize = user_segment_size(ea);
+		*vsid = get_vsid(mm->context.id, ea, *ssize);
+		return 0;
+	case VMALLOC_REGION_ID:
+		pr_devel("%s: 0x%llx -- VMALLOC_REGION_ID\n", __func__, ea);
+		if (ea < VMALLOC_END)
+			*psize = mmu_vmalloc_psize;
+		else
+			*psize = mmu_io_psize;
+		*ssize = mmu_kernel_ssize;
+		*vsid = get_kernel_vsid(ea, mmu_kernel_ssize);
+		return 0;
+	case KERNEL_REGION_ID:
+		pr_devel("%s: 0x%llx -- KERNEL_REGION_ID\n", __func__, ea);
+		*psize = mmu_linear_psize;
+		*ssize = mmu_kernel_ssize;
+		*vsid = get_kernel_vsid(ea, mmu_kernel_ssize);
+		return 0;
+	default:
+		pr_debug("%s: invalid region access at %016llx\n", __func__, ea);
+		return 1;
+	}
+}
+EXPORT_SYMBOL_GPL(calculate_vsid);
+
 /* Result code is:
  *  0 - handled
  *  1 - normal page fault
@@ -993,7 +1025,7 @@ int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, u
 {
 	enum ctx_state prev_state = exception_enter();
 	pgd_t *pgdir;
-	unsigned long vsid;
+	u64 vsid;
 	pte_t *ptep;
 	unsigned hugeshift;
 	const struct cpumask *tmp;
@@ -1003,35 +1035,20 @@ int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, u
 	DBG_LOW("%s(ea=%016lx, access=%lx, trap=%lx\n",
 		__func__, ea, access, trap);
 
-	/* Get region & vsid */
- 	switch (REGION_ID(ea)) {
-	case USER_REGION_ID:
+	/* Get region */
+	if (REGION_ID(ea) == USER_REGION_ID) {
 		user_region = 1;
 		if (! mm) {
 			DBG_LOW(" user region with no mm !\n");
 			rc = 1;
 			goto bail;
 		}
-		psize = get_slice_psize(mm, ea);
-		ssize = user_segment_size(ea);
-		vsid = get_vsid(mm->context.id, ea, ssize);
-		break;
-	case VMALLOC_REGION_ID:
+	} else
 		mm = &init_mm;
-		vsid = get_kernel_vsid(ea, mmu_kernel_ssize);
-		if (ea < VMALLOC_END)
-			psize = mmu_vmalloc_psize;
-		else
-			psize = mmu_io_psize;
-		ssize = mmu_kernel_ssize;
-		break;
-	default:
-		/* Not a valid range
-		 * Send the problem up to do_page_fault 
-		 */
-		rc = 1;
+	rc = calculate_vsid(mm, ea, &vsid, &psize, &ssize);
+	if (rc)
 		goto bail;
-	}
+
 	DBG_LOW(" mm=%p, mm->pgdir=%p, vsid=%016lx\n", mm, mm->pgd, vsid);
 
 	/* Bad address. */
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v2 10/17] powerpc/mm: Merge vsid calculation in hash_page() and copro_data_segment()
@ 2014-09-30 10:34   ` Michael Neuling
  0 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-09-30 10:34 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: cbe-oss-dev, mikey, Aneesh Kumar K.V, imunsie, linux-kernel,
	linuxppc-dev, jk, anton

From: Ian Munsie <imunsie@au1.ibm.com>

The vsid calculation between hash_page() and copro_data_segment() are very
similar.  This merges these two different versions.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/include/asm/mmu-hash64.h |  2 ++
 arch/powerpc/mm/copro_fault.c         | 45 ++++++--------------------
 arch/powerpc/mm/hash_utils_64.c       | 61 ++++++++++++++++++++++-------------
 3 files changed, 50 insertions(+), 58 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index f84e5a5..bf43fb0 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -322,6 +322,8 @@ extern int __hash_page_64K(unsigned long ea, unsigned long access,
 			   unsigned int local, int ssize);
 struct mm_struct;
 unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap);
+int calculate_vsid(struct mm_struct *mm, u64 ea,
+		   u64 *vsid, int *psize, int *ssize);
 extern int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, unsigned long trap);
 extern int hash_page(unsigned long ea, unsigned long access, unsigned long trap);
 int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
index 939abdf..ba8bf8e 100644
--- a/arch/powerpc/mm/copro_fault.c
+++ b/arch/powerpc/mm/copro_fault.c
@@ -94,45 +94,18 @@ EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
 
 int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid)
 {
-	int psize, ssize;
+	int psize, ssize, rc;
 
 	*esid = (ea & ESID_MASK) | SLB_ESID_V;
 
-	switch (REGION_ID(ea)) {
-	case USER_REGION_ID:
-		pr_devel("copro_data_segment: 0x%llx -- USER_REGION_ID\n", ea);
-#ifdef CONFIG_PPC_MM_SLICES
-		psize = get_slice_psize(mm, ea);
-#else
-		psize = mm->context.user_psize;
-#endif
-		ssize = user_segment_size(ea);
-		*vsid = (get_vsid(mm->context.id, ea, ssize)
-			 << slb_vsid_shift(ssize)) | SLB_VSID_USER;
-		break;
-	case VMALLOC_REGION_ID:
-		pr_devel("copro_data_segment: 0x%llx -- VMALLOC_REGION_ID\n", ea);
-		if (ea < VMALLOC_END)
-			psize = mmu_vmalloc_psize;
-		else
-			psize = mmu_io_psize;
-		ssize = mmu_kernel_ssize;
-		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
-			 << SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
-		break;
-	case KERNEL_REGION_ID:
-		pr_devel("copro_data_segment: 0x%llx -- KERNEL_REGION_ID\n", ea);
-		psize = mmu_linear_psize;
-		ssize = mmu_kernel_ssize;
-		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
-			 << SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
-		break;
-	default:
-		/* Future: support kernel segments so that drivers can use the
-		 * CoProcessors */
-		pr_debug("invalid region access at %016llx\n", ea);
-		return 1;
-	}
+	rc = calculate_vsid(mm, ea, vsid, &psize, &ssize);
+	if (rc)
+		return rc;
+	if (REGION_ID(ea) == USER_REGION_ID)
+		*vsid = (*vsid << slb_vsid_shift(ssize)) | SLB_VSID_USER;
+	else
+		*vsid = (*vsid << SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
+
 	*vsid |= mmu_psize_defs[psize].sllp |
 		((ssize == MMU_SEGSIZE_1T) ? SLB_VSID_B_1T : 0);
 
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 0a5c8c0..3fa81ca 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -983,6 +983,38 @@ static void check_paca_psize(unsigned long ea, struct mm_struct *mm,
 	}
 }
 
+int calculate_vsid(struct mm_struct *mm, u64 ea,
+		   u64 *vsid, int *psize, int *ssize)
+{
+	switch (REGION_ID(ea)) {
+	case USER_REGION_ID:
+		pr_devel("%s: 0x%llx -- USER_REGION_ID\n", __func__, ea);
+		*psize = get_slice_psize(mm, ea);
+		*ssize = user_segment_size(ea);
+		*vsid = get_vsid(mm->context.id, ea, *ssize);
+		return 0;
+	case VMALLOC_REGION_ID:
+		pr_devel("%s: 0x%llx -- VMALLOC_REGION_ID\n", __func__, ea);
+		if (ea < VMALLOC_END)
+			*psize = mmu_vmalloc_psize;
+		else
+			*psize = mmu_io_psize;
+		*ssize = mmu_kernel_ssize;
+		*vsid = get_kernel_vsid(ea, mmu_kernel_ssize);
+		return 0;
+	case KERNEL_REGION_ID:
+		pr_devel("%s: 0x%llx -- KERNEL_REGION_ID\n", __func__, ea);
+		*psize = mmu_linear_psize;
+		*ssize = mmu_kernel_ssize;
+		*vsid = get_kernel_vsid(ea, mmu_kernel_ssize);
+		return 0;
+	default:
+		pr_debug("%s: invalid region access at %016llx\n", __func__, ea);
+		return 1;
+	}
+}
+EXPORT_SYMBOL_GPL(calculate_vsid);
+
 /* Result code is:
  *  0 - handled
  *  1 - normal page fault
@@ -993,7 +1025,7 @@ int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, u
 {
 	enum ctx_state prev_state = exception_enter();
 	pgd_t *pgdir;
-	unsigned long vsid;
+	u64 vsid;
 	pte_t *ptep;
 	unsigned hugeshift;
 	const struct cpumask *tmp;
@@ -1003,35 +1035,20 @@ int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, u
 	DBG_LOW("%s(ea=%016lx, access=%lx, trap=%lx\n",
 		__func__, ea, access, trap);
 
-	/* Get region & vsid */
- 	switch (REGION_ID(ea)) {
-	case USER_REGION_ID:
+	/* Get region */
+	if (REGION_ID(ea) == USER_REGION_ID) {
 		user_region = 1;
 		if (! mm) {
 			DBG_LOW(" user region with no mm !\n");
 			rc = 1;
 			goto bail;
 		}
-		psize = get_slice_psize(mm, ea);
-		ssize = user_segment_size(ea);
-		vsid = get_vsid(mm->context.id, ea, ssize);
-		break;
-	case VMALLOC_REGION_ID:
+	} else
 		mm = &init_mm;
-		vsid = get_kernel_vsid(ea, mmu_kernel_ssize);
-		if (ea < VMALLOC_END)
-			psize = mmu_vmalloc_psize;
-		else
-			psize = mmu_io_psize;
-		ssize = mmu_kernel_ssize;
-		break;
-	default:
-		/* Not a valid range
-		 * Send the problem up to do_page_fault 
-		 */
-		rc = 1;
+	rc = calculate_vsid(mm, ea, &vsid, &psize, &ssize);
+	if (rc)
 		goto bail;
-	}
+
 	DBG_LOW(" mm=%p, mm->pgdir=%p, vsid=%016lx\n", mm, mm->pgd, vsid);
 
 	/* Bad address. */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v2 11/17] powerpc/opal: Add PHB to cxl mode call
  2014-09-30 10:34 ` Michael Neuling
@ 2014-09-30 10:35   ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-09-30 10:35 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

From: Ian Munsie <imunsie@au1.ibm.com>

This adds the OPAL call to change a PHB into cxl mode.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/include/asm/opal.h                | 2 ++
 arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
 2 files changed, 3 insertions(+)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 86055e5..84c37c4dbc 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -146,6 +146,7 @@ struct opal_sg_list {
 #define OPAL_GET_PARAM				89
 #define OPAL_SET_PARAM				90
 #define OPAL_DUMP_RESEND			91
+#define OPAL_PCI_SET_PHB_CXL_MODE		93
 #define OPAL_DUMP_INFO2				94
 #define OPAL_PCI_EEH_FREEZE_SET			97
 #define OPAL_HANDLE_HMI				98
@@ -924,6 +925,7 @@ int64_t opal_sensor_read(uint32_t sensor_hndl, int token, __be32 *sensor_data);
 int64_t opal_handle_hmi(void);
 int64_t opal_register_dump_region(uint32_t id, uint64_t start, uint64_t end);
 int64_t opal_unregister_dump_region(uint32_t id);
+int64_t opal_pci_set_phb_cxl_mode(uint64_t phb_id, uint64_t mode, uint64_t pe_number);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 2e6ce1b..0fb56dc 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -247,3 +247,4 @@ OPAL_CALL(opal_set_param,			OPAL_SET_PARAM);
 OPAL_CALL(opal_handle_hmi,			OPAL_HANDLE_HMI);
 OPAL_CALL(opal_register_dump_region,		OPAL_REGISTER_DUMP_REGION);
 OPAL_CALL(opal_unregister_dump_region,		OPAL_UNREGISTER_DUMP_REGION);
+OPAL_CALL(opal_pci_set_phb_cxl_mode,		OPAL_PCI_SET_PHB_CXL_MODE);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v2 12/17] powerpc/mm: Add hooks for cxl
  2014-09-30 10:34 ` Michael Neuling
@ 2014-09-30 10:35   ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-09-30 10:35 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

From: Ian Munsie <imunsie@au1.ibm.com>

This adds a hook into tlbie() so that we use global invalidations when there are
cxl contexts active.

Normally cxl snoops broadcast tlbie.  cxl can have TLB entries invalidated via
MMIO, but we aren't doing that yet.  So for now we just disable local tlbies
when cxl contexts are active.  In future we can make tlbie() local mode smarter
so that it invalidates cxl contexts explicitly when it needs to.

This also adds a hook for when SLBs are invalidated, to ensure any
corresponding SLBs in cxl are invalidated at the same time.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/mm/copro_fault.c    | 2 ++
 arch/powerpc/mm/hash_native_64.c | 6 +++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
index ba8bf8e..219dadb 100644
--- a/arch/powerpc/mm/copro_fault.c
+++ b/arch/powerpc/mm/copro_fault.c
@@ -25,6 +25,7 @@
 #include <linux/export.h>
 #include <asm/reg.h>
 #include <asm/spu.h>
+#include <misc/cxl.h>
 
 /*
  * This ought to be kept in sync with the powerpc specific do_page_fault
@@ -118,5 +119,6 @@ void copro_flush_all_slbs(struct mm_struct *mm)
 #ifdef CONFIG_SPU_BASE
 	spu_flush_all_slbs(mm);
 #endif
+	cxl_slbia(mm);
 }
 EXPORT_SYMBOL_GPL(copro_flush_all_slbs);
diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index afc0a82..ae4962a 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -29,6 +29,8 @@
 #include <asm/kexec.h>
 #include <asm/ppc-opcode.h>
 
+#include <misc/cxl.h>
+
 #ifdef DEBUG_LOW
 #define DBG_LOW(fmt...) udbg_printf(fmt)
 #else
@@ -149,9 +151,11 @@ static inline void __tlbiel(unsigned long vpn, int psize, int apsize, int ssize)
 static inline void tlbie(unsigned long vpn, int psize, int apsize,
 			 int ssize, int local)
 {
-	unsigned int use_local = local && mmu_has_feature(MMU_FTR_TLBIEL);
+	unsigned int use_local;
 	int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
 
+	use_local = local && mmu_has_feature(MMU_FTR_TLBIEL) && !cxl_ctx_in_use();
+
 	if (use_local)
 		use_local = mmu_psize_defs[psize].tlbiel;
 	if (lock_tlbie && !use_local)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v2 13/17] cxl: Add base builtin support
  2014-09-30 10:34 ` Michael Neuling
@ 2014-09-30 10:35   ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-09-30 10:35 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

From: Ian Munsie <imunsie@au1.ibm.com>

This adds the base cxl support that needs to be built into the kernel to use
cxl as a module.  This is needed so that the cxl callbacks from the core
powerpc mm code always exist, irrespective of whether the cxl module is loaded.
This is similar to how cell works with CONFIG_SPU_BASE.

This adds a cxl_slbia() call (similar to spu_flush_all_slbs()) which checks
whether the cxl module is loaded.  If the module is not loaded we return;
otherwise we call into the cxl SLB invalidation code.

This also adds the cxl_ctx_in_use() function for the mm code to check whether
any cxl contexts are currently in use.  tlbie() uses this to determine whether
it can do local TLB invalidations.  This also adds get/put calls for the cxl
driver module to refcount the active cxl contexts.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 drivers/misc/Kconfig      |   1 +
 drivers/misc/Makefile     |   1 +
 drivers/misc/cxl/Kconfig  |   8 ++++
 drivers/misc/cxl/Makefile |   1 +
 drivers/misc/cxl/base.c   | 102 ++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 113 insertions(+)
 create mode 100644 drivers/misc/cxl/Kconfig
 create mode 100644 drivers/misc/cxl/Makefile
 create mode 100644 drivers/misc/cxl/base.c

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index b841180..bbeb451 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -527,4 +527,5 @@ source "drivers/misc/vmw_vmci/Kconfig"
 source "drivers/misc/mic/Kconfig"
 source "drivers/misc/genwqe/Kconfig"
 source "drivers/misc/echo/Kconfig"
+source "drivers/misc/cxl/Kconfig"
 endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 5497d02..7d5c4cd 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -55,3 +55,4 @@ obj-y				+= mic/
 obj-$(CONFIG_GENWQE)		+= genwqe/
 obj-$(CONFIG_ECHO)		+= echo/
 obj-$(CONFIG_VEXPRESS_SYSCFG)	+= vexpress-syscfg.o
+obj-$(CONFIG_CXL_BASE)		+= cxl/
diff --git a/drivers/misc/cxl/Kconfig b/drivers/misc/cxl/Kconfig
new file mode 100644
index 0000000..5cdd319
--- /dev/null
+++ b/drivers/misc/cxl/Kconfig
@@ -0,0 +1,8 @@
+#
+# IBM Coherent Accelerator (CXL) compatible devices
+#
+
+config CXL_BASE
+	bool
+	default n
+	select PPC_COPRO_BASE
diff --git a/drivers/misc/cxl/Makefile b/drivers/misc/cxl/Makefile
new file mode 100644
index 0000000..e30ad0a
--- /dev/null
+++ b/drivers/misc/cxl/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_CXL_BASE)		+= base.o
diff --git a/drivers/misc/cxl/base.c b/drivers/misc/cxl/base.c
new file mode 100644
index 0000000..f4cbcfb
--- /dev/null
+++ b/drivers/misc/cxl/base.c
@@ -0,0 +1,102 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/rcupdate.h>
+#include <asm/errno.h>
+#include <misc/cxl.h>
+#include "cxl.h"
+
+/* protected by rcu */
+static struct cxl_calls *cxl_calls;
+
+static atomic_t use_count = ATOMIC_INIT(0);
+
+#ifdef CONFIG_CXL_MODULE
+
+static inline struct cxl_calls *cxl_calls_get(void)
+{
+	struct cxl_calls *calls = NULL;
+
+	rcu_read_lock();
+	calls = rcu_dereference(cxl_calls);
+	if (calls && !try_module_get(calls->owner))
+		calls = NULL;
+	rcu_read_unlock();
+
+	return calls;
+}
+
+static inline void cxl_calls_put(struct cxl_calls *calls)
+{
+	BUG_ON(calls != cxl_calls);
+
+	/* we don't need to rcu this, as we hold a reference to the module */
+	module_put(cxl_calls->owner);
+}
+
+#else /* !defined CONFIG_CXL_MODULE */
+
+static inline struct cxl_calls *cxl_calls_get(void)
+{
+	return cxl_calls;
+}
+
+static inline void cxl_calls_put(struct cxl_calls *calls) { }
+
+#endif /* CONFIG_CXL_MODULE */
+
+void cxl_slbia(struct mm_struct *mm)
+{
+	struct cxl_calls *calls;
+
+	calls = cxl_calls_get();
+	if (!calls)
+		return;
+
+	calls->cxl_slbia(mm);
+	cxl_calls_put(calls);
+}
+EXPORT_SYMBOL(cxl_slbia);
+
+void cxl_ctx_get(void)
+{
+	atomic_inc(&use_count);
+}
+EXPORT_SYMBOL(cxl_ctx_get);
+
+void cxl_ctx_put(void)
+{
+	atomic_dec(&use_count);
+}
+EXPORT_SYMBOL(cxl_ctx_put);
+
+bool cxl_ctx_in_use(void)
+{
+	return (atomic_read(&use_count) != 0);
+}
+EXPORT_SYMBOL(cxl_ctx_in_use);
+
+int register_cxl_calls(struct cxl_calls *calls)
+{
+	if (cxl_calls)
+		return -EBUSY;
+
+	rcu_assign_pointer(cxl_calls, calls);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(register_cxl_calls);
+
+void unregister_cxl_calls(struct cxl_calls *calls)
+{
+	BUG_ON(cxl_calls->owner != calls->owner);
+	RCU_INIT_POINTER(cxl_calls, NULL);
+	synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(unregister_cxl_calls);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v2 14/17] cxl: Driver code for powernv PCIe based cards for userspace access
  2014-09-30 10:34 ` Michael Neuling
@ 2014-09-30 10:35   ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-09-30 10:35 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

From: Ian Munsie <imunsie@au1.ibm.com>

This is the core of the cxl driver.

It adds support for using cxl cards in the powernv environment only (no guest
support).  It allows userspace to access cxl accelerators via the
/dev/cxl/afu0.0 char device.

The kernel driver has no knowledge of the acceleration function.  It only
provides services to userspace via the /dev/cxl/afu0.0 device.

This will compile to two modules.  cxl.ko provides the core cxl functionality
and userspace API.  cxl-pci.ko provides the PCI driver functionality for the
powernv environment.

Documentation of the cxl hardware architecture and userspace API is provided in
subsequent patches.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 drivers/misc/cxl/context.c | 171 ++++++++
 drivers/misc/cxl/cxl-pci.c | 964 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/misc/cxl/cxl.h     | 605 ++++++++++++++++++++++++++++
 drivers/misc/cxl/debugfs.c | 116 ++++++
 drivers/misc/cxl/fault.c   | 298 ++++++++++++++
 drivers/misc/cxl/file.c    | 503 +++++++++++++++++++++++
 drivers/misc/cxl/irq.c     | 405 +++++++++++++++++++
 drivers/misc/cxl/main.c    | 238 +++++++++++
 drivers/misc/cxl/native.c  | 649 ++++++++++++++++++++++++++++++
 drivers/misc/cxl/sysfs.c   | 348 ++++++++++++++++
 10 files changed, 4297 insertions(+)
 create mode 100644 drivers/misc/cxl/context.c
 create mode 100644 drivers/misc/cxl/cxl-pci.c
 create mode 100644 drivers/misc/cxl/cxl.h
 create mode 100644 drivers/misc/cxl/debugfs.c
 create mode 100644 drivers/misc/cxl/fault.c
 create mode 100644 drivers/misc/cxl/file.c
 create mode 100644 drivers/misc/cxl/irq.c
 create mode 100644 drivers/misc/cxl/main.c
 create mode 100644 drivers/misc/cxl/native.c
 create mode 100644 drivers/misc/cxl/sysfs.c

diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
new file mode 100644
index 0000000..9206ca4
--- /dev/null
+++ b/drivers/misc/cxl/context.c
@@ -0,0 +1,171 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/bitmap.h>
+#include <linux/sched.h>
+#include <linux/pid.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/debugfs.h>
+#include <linux/slab.h>
+#include <linux/idr.h>
+#include <asm/cputable.h>
+#include <asm/current.h>
+#include <asm/copro.h>
+
+#include "cxl.h"
+
+/*
+ * Allocates space for a CXL context.
+ */
+struct cxl_context_t *cxl_context_alloc(void)
+{
+	return kzalloc(sizeof(struct cxl_context_t), GFP_KERNEL);
+}
+
+/*
+ * Initialises a CXL context.
+ */
+int cxl_context_init(struct cxl_context_t *ctx, struct cxl_afu_t *afu, bool master)
+{
+	int i;
+
+	spin_lock_init(&ctx->sst_lock);
+	ctx->sstp = NULL;
+	ctx->afu = afu;
+	ctx->master = master;
+	ctx->pid = get_pid(get_task_pid(current, PIDTYPE_PID));
+
+	INIT_WORK(&ctx->fault_work, cxl_handle_fault);
+
+	init_waitqueue_head(&ctx->wq);
+	spin_lock_init(&ctx->lock);
+
+	ctx->irq_bitmap = NULL;
+	ctx->pending_irq = false;
+	ctx->pending_fault = false;
+	ctx->pending_afu_err = false;
+
+	ctx->status = OPENED;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&afu->contexts_lock);
+	i = idr_alloc(&ctx->afu->contexts_idr, ctx, 0,
+		      ctx->afu->num_procs, GFP_NOWAIT);
+	spin_unlock(&afu->contexts_lock);
+	idr_preload_end();
+	if (i < 0)
+		return i;
+
+	ctx->ph = i;
+	ctx->elem = &ctx->afu->spa[i];
+	ctx->pe_inserted = false;
+	return 0;
+}
+
+/*
+ * Map a per-context mmio space into the given vma.
+ */
+int cxl_context_iomap(struct cxl_context_t *ctx, struct vm_area_struct *vma)
+{
+	u64 len = vma->vm_end - vma->vm_start;
+	len = min(len, ctx->psn_size);
+
+	if (ctx->afu->current_model == CXL_MODEL_DEDICATED) {
+		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+		return vm_iomap_memory(vma, ctx->afu->psn_phys, ctx->afu->adapter->ps_size);
+	}
+
+	/* make sure there is a valid per process space for this AFU */
+	if ((ctx->master && !ctx->afu->psa) || (!ctx->afu->pp_psa)) {
+		pr_devel("AFU doesn't support mmio space\n");
+		return -EINVAL;
+	}
+
+	/* Can't mmap until the AFU is enabled */
+	if (!ctx->afu->enabled)
+		return -EBUSY;
+
+	pr_devel("%s: mmio physical: %llx pe: %i master:%i\n", __func__,
+		 ctx->psn_phys, ctx->ph , ctx->master);
+
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+	return vm_iomap_memory(vma, ctx->psn_phys, len);
+}
+
+/*
+ * Detach a context from the hardware. This disables interrupts and doesn't
+ * return until all outstanding interrupts for this context have completed. The
+ * hardware should no longer access *ctx after this has returned.
+ */
+static void __detach_context(struct cxl_context_t *ctx)
+{
+	unsigned long flags;
+	enum cxl_context_status status;
+
+	spin_lock_irqsave(&ctx->sst_lock, flags);
+	status = ctx->status;
+	ctx->status = CLOSED;
+	spin_unlock_irqrestore(&ctx->sst_lock, flags);
+	if (status != STARTED)
+		return;
+
+	WARN_ON(cxl_ops->detach_process(ctx));
+	afu_release_irqs(ctx);
+	flush_work(&ctx->fault_work); /* Only needed for dedicated process */
+	wake_up_all(&ctx->wq);
+}
+
+/*
+ * Detach the given context from the AFU. This doesn't actually
+ * free the context but it should stop the context running in hardware
+ * (ie. prevent this context from generating any further interrupts
+ * so that it can be freed).
+ */
+void cxl_context_detach(struct cxl_context_t *ctx)
+{
+	__detach_context(ctx);
+}
+
+/*
+ * Detach all contexts on the given AFU.
+ */
+void cxl_context_detach_all(struct cxl_afu_t *afu)
+{
+	struct cxl_context_t *ctx;
+	int tmp;
+
+	rcu_read_lock();
+	idr_for_each_entry(&afu->contexts_idr, ctx, tmp)
+		__detach_context(ctx);
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL(cxl_context_detach_all);
+
+void cxl_context_free(struct cxl_context_t *ctx)
+{
+	unsigned long flags;
+
+	spin_lock(&ctx->afu->contexts_lock);
+	idr_remove(&ctx->afu->contexts_idr, ctx->ph);
+	spin_unlock(&ctx->afu->contexts_lock);
+	synchronize_rcu();
+
+	spin_lock_irqsave(&ctx->sst_lock, flags);
+	free_page((u64)ctx->sstp);
+	ctx->sstp = NULL;
+	spin_unlock_irqrestore(&ctx->sst_lock, flags);
+
+	put_pid(ctx->pid);
+	kfree(ctx);
+}
diff --git a/drivers/misc/cxl/cxl-pci.c b/drivers/misc/cxl/cxl-pci.c
new file mode 100644
index 0000000..402ab00
--- /dev/null
+++ b/drivers/misc/cxl/cxl-pci.c
@@ -0,0 +1,964 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/pci_regs.h>
+#include <linux/pci_ids.h>
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/sort.h>
+#include <linux/pci.h>
+#include <linux/of.h>
+#include <linux/delay.h>
+#include <asm/opal.h>
+#include <asm/msi_bitmap.h>
+#include <asm/pci-bridge.h> /* for struct pci_controller */
+#include <asm/pnv-pci.h>
+
+#include "cxl.h"
+
+
+#define CXL_PCI_VSEC_ID	0x1280
+#define CXL_VSEC_MIN_SIZE 0x80
+
+#define CXL_READ_VSEC_LENGTH(dev, vsec, dest)			\
+	{							\
+		pci_read_config_word(dev, vsec + 0x6, dest);	\
+		*dest >>= 4;					\
+	}
+#define CXL_READ_VSEC_NAFUS(dev, vsec, dest) \
+	pci_read_config_byte(dev, vsec + 0x8, dest)
+
+#define CXL_READ_VSEC_STATUS(dev, vsec, dest) \
+	pci_read_config_byte(dev, vsec + 0x9, dest)
+#define CXL_STATUS_SECOND_PORT  0x80
+#define CXL_STATUS_MSI_X_FULL   0x40
+#define CXL_STATUS_MSI_X_SINGLE 0x20
+#define CXL_STATUS_FLASH_RW     0x08
+#define CXL_STATUS_FLASH_RO     0x04
+#define CXL_STATUS_LOADABLE_AFU 0x02
+#define CXL_STATUS_LOADABLE_PSL 0x01
+/* If we see these features we won't try to use the card */
+#define CXL_UNSUPPORTED_FEATURES \
+	(CXL_STATUS_MSI_X_FULL | CXL_STATUS_MSI_X_SINGLE)
+
+#define CXL_READ_VSEC_MODE_CONTROL(dev, vsec, dest) \
+	pci_read_config_byte(dev, vsec + 0xa, dest)
+#define CXL_WRITE_VSEC_MODE_CONTROL(dev, vsec, val) \
+	pci_write_config_byte(dev, vsec + 0xa, val)
+#define CXL_VSEC_PROTOCOL_MASK   0xe0
+#define CXL_VSEC_PROTOCOL_256TB  0x80 /* Power 8 uses this */
+#define CXL_VSEC_PROTOCOL_512TB  0x40
+#define CXL_VSEC_PROTOCOL_1024TB 0x20
+#define CXL_VSEC_PROTOCOL_ENABLE 0x01
+
+#define CXL_READ_VSEC_PSL_REVISION(dev, vsec, dest) \
+	pci_read_config_word(dev, vsec + 0xc, dest)
+#define CXL_READ_VSEC_CAIA_MINOR(dev, vsec, dest) \
+	pci_read_config_byte(dev, vsec + 0xe, dest)
+#define CXL_READ_VSEC_CAIA_MAJOR(dev, vsec, dest) \
+	pci_read_config_byte(dev, vsec + 0xf, dest)
+#define CXL_READ_VSEC_BASE_IMAGE(dev, vsec, dest) \
+	pci_read_config_word(dev, vsec + 0x10, dest)
+
+#define CXL_READ_VSEC_IMAGE_STATE(dev, vsec, dest) \
+	pci_read_config_byte(dev, vsec + 0x13, dest)
+#define CXL_WRITE_VSEC_IMAGE_STATE(dev, vsec, val) \
+	pci_write_config_byte(dev, vsec + 0x13, val)
+#define CXL_VSEC_USER_IMAGE_LOADED 0x80 /* RO */
+#define CXL_VSEC_PERST_LOADS_IMAGE 0x20 /* RW */
+#define CXL_VSEC_PERST_SELECT_USER 0x10 /* RW */
+
+#define CXL_READ_VSEC_AFU_DESC_OFF(dev, vsec, dest) \
+	pci_read_config_dword(dev, vsec + 0x20, dest)
+#define CXL_READ_VSEC_AFU_DESC_SIZE(dev, vsec, dest) \
+	pci_read_config_dword(dev, vsec + 0x24, dest)
+#define CXL_READ_VSEC_PS_OFF(dev, vsec, dest) \
+	pci_read_config_dword(dev, vsec + 0x28, dest)
+#define CXL_READ_VSEC_PS_SIZE(dev, vsec, dest) \
+	pci_read_config_dword(dev, vsec + 0x2c, dest)
+
+
+/* This works a little differently from the p1/p2 register accesses, to make
+ * it easier to pull out individual fields */
+#define AFUD_READ(afu, off)		_cxl_reg_read(afu->afu_desc_mmio + off)
+#define EXTRACT_PPC_BIT(val, bit)	(!!(val & PPC_BIT(bit)))
+#define EXTRACT_PPC_BITS(val, bs, be)	((val & PPC_BITMASK(bs, be)) >> PPC_BITLSHIFT(be))
+
+#define AFUD_READ_INFO(afu)		AFUD_READ(afu, 0x0)
+#define   AFUD_NUM_INTS_PER_PROC(val)	EXTRACT_PPC_BITS(val,  0, 15)
+#define   AFUD_NUM_PROCS(val)		EXTRACT_PPC_BITS(val, 16, 31)
+#define   AFUD_NUM_CRS(val)		EXTRACT_PPC_BITS(val, 32, 47)
+#define   AFUD_MULTIMODEL(val)		EXTRACT_PPC_BIT(val, 48)
+#define   AFUD_PUSH_BLOCK_TRANSFER(val)	EXTRACT_PPC_BIT(val, 55)
+#define   AFUD_DEDICATED_PROCESS(val)	EXTRACT_PPC_BIT(val, 59)
+#define   AFUD_AFU_DIRECTED(val)	EXTRACT_PPC_BIT(val, 61)
+#define   AFUD_TIME_SLICED(val)		EXTRACT_PPC_BIT(val, 63)
+#define AFUD_READ_CR(afu)		AFUD_READ(afu, 0x20)
+#define   AFUD_CR_LEN(val)		EXTRACT_PPC_BITS(val, 8, 63)
+#define AFUD_READ_CR_OFF(afu)		AFUD_READ(afu, 0x28)
+#define AFUD_READ_PPPSA(afu)		AFUD_READ(afu, 0x30)
+#define   AFUD_PPPSA_PP(val)		EXTRACT_PPC_BIT(val, 6)
+#define   AFUD_PPPSA_PSA(val)		EXTRACT_PPC_BIT(val, 7)
+#define   AFUD_PPPSA_LEN(val)		EXTRACT_PPC_BITS(val, 8, 63)
+#define AFUD_READ_PPPSA_OFF(afu)	AFUD_READ(afu, 0x38)
+#define AFUD_READ_EB(afu)		AFUD_READ(afu, 0x40)
+#define   AFUD_EB_LEN(val)		EXTRACT_PPC_BITS(val, 8, 63)
+#define AFUD_READ_EB_OFF(afu)		AFUD_READ(afu, 0x48)
+
+static DEFINE_PCI_DEVICE_TABLE(cxl_pci_tbl) = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0477), },
+	{ PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x044b), },
+	{ PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x04cf), },
+	{ PCI_DEVICE_CLASS(0x120000, ~0), },
+
+	{ }
+};
+MODULE_DEVICE_TABLE(pci, cxl_pci_tbl);
+
+
+/* Mostly using these wrappers to avoid confusion:
+ * priv 1 is BAR2, while priv 2 is BAR0 */
+static inline resource_size_t p1_base(struct pci_dev *dev)
+{
+	return pci_resource_start(dev, 2);
+}
+
+static inline resource_size_t p1_size(struct pci_dev *dev)
+{
+	return pci_resource_len(dev, 2);
+}
+
+static inline resource_size_t p2_base(struct pci_dev *dev)
+{
+	return pci_resource_start(dev, 0);
+}
+
+static inline resource_size_t p2_size(struct pci_dev *dev)
+{
+	return pci_resource_len(dev, 0);
+}
+
+static int find_cxl_vsec(struct pci_dev *dev)
+{
+	int vsec = 0;
+	u16 val;
+
+	while ((vsec = pci_find_next_ext_capability(dev, vsec, PCI_EXT_CAP_ID_VNDR))) {
+		pci_read_config_word(dev, vsec + 0x4, &val);
+		if (val == CXL_PCI_VSEC_ID)
+			return vsec;
+	}
+	return 0;
+}
+
+static void dump_cxl_config_space(struct pci_dev *dev)
+{
+	int vsec;
+	u32 val;
+
+	dev_info(&dev->dev, "dump_cxl_config_space\n");
+
+	pci_read_config_dword(dev, PCI_BASE_ADDRESS_0, &val);
+	dev_info(&dev->dev, "BAR0: %#.8x\n", val);
+	pci_read_config_dword(dev, PCI_BASE_ADDRESS_1, &val);
+	dev_info(&dev->dev, "BAR1: %#.8x\n", val);
+	pci_read_config_dword(dev, PCI_BASE_ADDRESS_2, &val);
+	dev_info(&dev->dev, "BAR2: %#.8x\n", val);
+	pci_read_config_dword(dev, PCI_BASE_ADDRESS_3, &val);
+	dev_info(&dev->dev, "BAR3: %#.8x\n", val);
+	pci_read_config_dword(dev, PCI_BASE_ADDRESS_4, &val);
+	dev_info(&dev->dev, "BAR4: %#.8x\n", val);
+	pci_read_config_dword(dev, PCI_BASE_ADDRESS_5, &val);
+	dev_info(&dev->dev, "BAR5: %#.8x\n", val);
+
+	dev_info(&dev->dev, "p1 regs: %#llx, len: %#llx\n",
+		p1_base(dev), p1_size(dev));
+	dev_info(&dev->dev, "p2 regs: %#llx, len: %#llx\n",
+		p2_base(dev), p2_size(dev));
+	dev_info(&dev->dev, "BAR 4/5: %#llx, len: %#llx\n",
+		pci_resource_start(dev, 4), pci_resource_len(dev, 4));
+
+	if (!(vsec = find_cxl_vsec(dev)))
+		return;
+
+#define show_reg(name, what) \
+	dev_info(&dev->dev, "cxl vsec: %30s: %#x\n", name, what)
+
+	pci_read_config_dword(dev, vsec + 0x0, &val);
+	show_reg("Cap ID", (val >> 0) & 0xffff);
+	show_reg("Cap Ver", (val >> 16) & 0xf);
+	show_reg("Next Cap Ptr", (val >> 20) & 0xfff);
+	pci_read_config_dword(dev, vsec + 0x4, &val);
+	show_reg("VSEC ID", (val >> 0) & 0xffff);
+	show_reg("VSEC Rev", (val >> 16) & 0xf);
+	show_reg("VSEC Length",	(val >> 20) & 0xfff);
+	pci_read_config_dword(dev, vsec + 0x8, &val);
+	show_reg("Num AFUs", (val >> 0) & 0xff);
+	show_reg("Status", (val >> 8) & 0xff);
+	show_reg("Mode Control", (val >> 16) & 0xff);
+	show_reg("Reserved", (val >> 24) & 0xff);
+	pci_read_config_dword(dev, vsec + 0xc, &val);
+	show_reg("PSL Rev", (val >> 0) & 0xffff);
+	show_reg("CAIA Ver", (val >> 16) & 0xffff);
+	pci_read_config_dword(dev, vsec + 0x10, &val);
+	show_reg("Base Image Rev", (val >> 0) & 0xffff);
+	show_reg("Reserved", (val >> 16) & 0x0fff);
+	show_reg("Image Control", (val >> 28) & 0x3);
+	show_reg("Reserved", (val >> 30) & 0x1);
+	show_reg("Image Loaded", (val >> 31) & 0x1);
+
+	pci_read_config_dword(dev, vsec + 0x14, &val);
+	show_reg("Reserved", val);
+	pci_read_config_dword(dev, vsec + 0x18, &val);
+	show_reg("Reserved", val);
+	pci_read_config_dword(dev, vsec + 0x1c, &val);
+	show_reg("Reserved", val);
+
+	pci_read_config_dword(dev, vsec + 0x20, &val);
+	show_reg("AFU Descriptor Offset", val);
+	pci_read_config_dword(dev, vsec + 0x24, &val);
+	show_reg("AFU Descriptor Size", val);
+	pci_read_config_dword(dev, vsec + 0x28, &val);
+	show_reg("Problem State Offset", val);
+	pci_read_config_dword(dev, vsec + 0x2c, &val);
+	show_reg("Problem State Size", val);
+
+	pci_read_config_dword(dev, vsec + 0x30, &val);
+	show_reg("Reserved", val);
+	pci_read_config_dword(dev, vsec + 0x34, &val);
+	show_reg("Reserved", val);
+	pci_read_config_dword(dev, vsec + 0x38, &val);
+	show_reg("Reserved", val);
+	pci_read_config_dword(dev, vsec + 0x3c, &val);
+	show_reg("Reserved", val);
+
+	pci_read_config_dword(dev, vsec + 0x40, &val);
+	show_reg("PSL Programming Port", val);
+	pci_read_config_dword(dev, vsec + 0x44, &val);
+	show_reg("PSL Programming Control", val);
+
+	pci_read_config_dword(dev, vsec + 0x48, &val);
+	show_reg("Reserved", val);
+	pci_read_config_dword(dev, vsec + 0x4c, &val);
+	show_reg("Reserved", val);
+
+	pci_read_config_dword(dev, vsec + 0x50, &val);
+	show_reg("Flash Address Register", val);
+	pci_read_config_dword(dev, vsec + 0x54, &val);
+	show_reg("Flash Size Register", val);
+	pci_read_config_dword(dev, vsec + 0x58, &val);
+	show_reg("Flash Status/Control Register", val);
+	pci_read_config_dword(dev, vsec + 0x5c, &val);
+	show_reg("Flash Data Port", val);
+
+#undef show_reg
+}
+
+static void dump_afu_descriptor(struct cxl_afu_t *afu)
+{
+	u64 val;
+
+#define show_reg(name, what) \
+	dev_info(&afu->dev, "afu desc: %30s: %#llx\n", name, what)
+
+	val = AFUD_READ_INFO(afu);
+	show_reg("num_ints_per_process", AFUD_NUM_INTS_PER_PROC(val));
+	show_reg("num_of_processes", AFUD_NUM_PROCS(val));
+	show_reg("num_of_afu_CRs", AFUD_NUM_CRS(val));
+	show_reg("req_prog_model", val & 0xffffULL);
+
+	val = AFUD_READ(afu, 0x8);
+	show_reg("Reserved", val);
+	val = AFUD_READ(afu, 0x10);
+	show_reg("Reserved", val);
+	val = AFUD_READ(afu, 0x18);
+	show_reg("Reserved", val);
+
+	val = AFUD_READ_CR(afu);
+	show_reg("Reserved", (val >> (63-7)) & 0xff);
+	show_reg("AFU_CR_len", AFUD_CR_LEN(val));
+
+	val = AFUD_READ_CR_OFF(afu);
+	show_reg("AFU_CR_offset", val);
+
+	val = AFUD_READ_PPPSA(afu);
+	show_reg("PerProcessPSA_control", (val >> (63-7)) & 0xff);
+	show_reg("PerProcessPSA Length", AFUD_PPPSA_LEN(val));
+
+	val = AFUD_READ_PPPSA_OFF(afu);
+	show_reg("PerProcessPSA_offset", val);
+
+	val = AFUD_READ_EB(afu);
+	show_reg("Reserved", (val >> (63-7)) & 0xff);
+	show_reg("AFU_EB_len", AFUD_EB_LEN(val));
+
+	val = AFUD_READ_EB_OFF(afu);
+	show_reg("AFU_EB_offset", val);
+
+#undef show_reg
+}
+
+extern struct device_node *pnv_pci_to_phb_node(struct pci_dev *dev);
+
+static int init_implementation_adapter_regs(struct cxl_t *adapter, struct pci_dev *dev)
+{
+	struct device_node *np;
+	const __be32 *prop;
+	u64 psl_dsnctl;
+	u64 chipid;
+
+	if (!(np = pnv_pci_to_phb_node(dev)))
+		return -ENODEV;
+
+	while (np && !(prop = of_get_property(np, "ibm,chip-id", NULL)))
+		np = of_get_next_parent(np);
+	if (!np)
+		return -ENODEV;
+	chipid = be32_to_cpup(prop);
+	of_node_put(np);
+
+	/* Tell PSL where to route data to */
+	psl_dsnctl = 0x02E8900002000000ULL | (chipid << (63-5));
+	cxl_p1_write(adapter, CXL_PSL_DSNDCTL, psl_dsnctl);
+	cxl_p1_write(adapter, CXL_PSL_RESLCKTO, 0x20000000200ULL);
+	/* snoop write mask */
+	cxl_p1_write(adapter, CXL_PSL_SNWRALLOC, 0x00000000FFFFFFFFULL);
+	/* set fir_accum */
+	cxl_p1_write(adapter, CXL_PSL_FIR_CNTL, 0x0800000000000000ULL);
+	/* for debugging with trace arrays */
+	cxl_p1_write(adapter, CXL_PSL_TRACE, 0x0000FF7C00000000ULL);
+
+	return 0;
+}
+
+static int init_implementation_afu_regs(struct cxl_afu_t *afu)
+{
+	/* read/write masks for this slice */
+	cxl_p1n_write(afu, CXL_PSL_APCALLOC_A, 0xFFFFFFFEFEFEFEFEULL);
+	/* APC read/write masks for this slice */
+	cxl_p1n_write(afu, CXL_PSL_COALLOC_A, 0xFF000000FEFEFEFEULL);
+	/* for debugging with trace arrays */
+	cxl_p1n_write(afu, CXL_PSL_SLICE_TRACE, 0x0000FFFF00000000ULL);
+	cxl_p1n_write(afu, CXL_PSL_RXCTL_A, 0xF000000000000000ULL);
+
+	return 0;
+}
+
+static int setup_cxl_msi(struct cxl_t *adapter, unsigned int hwirq,
+			 unsigned int virq)
+{
+	struct pci_dev *dev = to_pci_dev(adapter->dev.parent);
+
+	return pnv_cxl_ioda_msi_setup(dev, hwirq, virq);
+}
+
+static int alloc_one_hwirq(struct cxl_t *adapter)
+{
+	struct pci_dev *dev = to_pci_dev(adapter->dev.parent);
+
+	return pnv_cxl_alloc_hwirqs(dev, 1);
+}
+
+static void release_one_hwirq(struct cxl_t *adapter, int hwirq)
+{
+	struct pci_dev *dev = to_pci_dev(adapter->dev.parent);
+
+	pnv_cxl_release_hwirqs(dev, hwirq, 1);
+}
+
+static int alloc_hwirq_ranges(struct cxl_irq_ranges *irqs, struct cxl_t *adapter, unsigned int num)
+{
+	struct pci_dev *dev = to_pci_dev(adapter->dev.parent);
+
+	return pnv_cxl_alloc_hwirq_ranges(irqs, dev, num);
+}
+
+static void release_hwirq_ranges(struct cxl_irq_ranges *irqs, struct cxl_t *adapter)
+{
+	struct pci_dev *dev = to_pci_dev(adapter->dev.parent);
+
+	pnv_cxl_release_hwirq_ranges(irqs, dev);
+}
+
+
+static struct cxl_driver_ops cxl_pci_driver_ops = {
+	.module = THIS_MODULE,
+	.alloc_one_irq = alloc_one_hwirq,
+	.release_one_irq = release_one_hwirq,
+	.alloc_irq_ranges = alloc_hwirq_ranges,
+	.release_irq_ranges = release_hwirq_ranges,
+	.setup_irq = setup_cxl_msi,
+};
+
+static int setup_cxl_bars(struct pci_dev *dev)
+{
+	/* Safety check in case we get backported to < 3.17 without M64 */
+	if ((p1_base(dev) < 0x100000000ULL) ||
+	    (p2_base(dev) < 0x100000000ULL)) {
+		dev_err(&dev->dev, "ABORTING: M32 BAR assignment incompatible with CXL\n");
+		return -ENODEV;
+	}
+
+	/* BAR 4/5 has a special meaning for CXL and must be programmed with a
+	 * special value corresponding to the CXL protocol address range.
+	 * For POWER 8 that means bits 48:49 must be set to 10 */
+	pci_write_config_dword(dev, PCI_BASE_ADDRESS_4, 0x00000000);
+	pci_write_config_dword(dev, PCI_BASE_ADDRESS_5, 0x00020000);
+
+	return 0;
+}
+
+/*
+ *  pciex node: ibm,opal-m64-window = <0x3d058 0x0 0x3d058 0x0 0x8 0x0>;
+ */
+
+static int switch_card_to_cxl(struct pci_dev *dev)
+{
+	int vsec;
+	u8 val;
+	int rc;
+
+	dev_info(&dev->dev, "switch card to CXL\n");
+
+	if (!(vsec = find_cxl_vsec(dev))) {
+		dev_err(&dev->dev, "ABORTING: CXL VSEC not found!\n");
+		return -ENODEV;
+	}
+
+	if ((rc = CXL_READ_VSEC_MODE_CONTROL(dev, vsec, &val))) {
+		dev_err(&dev->dev, "failed to read current mode control: %i\n", rc);
+		return rc;
+	}
+	val &= ~CXL_VSEC_PROTOCOL_MASK;
+	val |= CXL_VSEC_PROTOCOL_256TB | CXL_VSEC_PROTOCOL_ENABLE;
+	if ((rc = CXL_WRITE_VSEC_MODE_CONTROL(dev, vsec, val))) {
+		dev_err(&dev->dev, "failed to enable CXL protocol: %i\n", rc);
+		return rc;
+	}
+	/* The CAIA spec (v0.12 11.6 Bi-modal Device Support) states
+	 * we must wait 100ms after this mode switch before touching
+	 * PCIe config space.
+	 */
+	msleep(100);
+
+	return 0;
+}
+
+static int cxl_map_slice_regs(struct cxl_afu_t *afu, struct cxl_t *adapter, struct pci_dev *dev)
+{
+	u64 p1n_base, p2n_base, afu_desc;
+	const u64 p1n_size = 0x100;
+	const u64 p2n_size = 0x1000;
+
+	p1n_base = p1_base(dev) + 0x10000 + (afu->slice * p1n_size);
+	p2n_base = p2_base(dev) + (afu->slice * p2n_size);
+	afu->psn_phys = p2_base(dev) + (adapter->ps_off + (afu->slice * adapter->ps_size));
+	afu_desc = p2_base(dev) + adapter->afu_desc_off + (afu->slice * adapter->afu_desc_size);
+
+	if (!(afu->p1n_mmio = ioremap(p1n_base, p1n_size)))
+		goto err;
+	if (!(afu->p2n_mmio = ioremap(p2n_base, p2n_size)))
+		goto err1;
+	if (afu_desc) {
+		if (!(afu->afu_desc_mmio = ioremap(afu_desc, adapter->afu_desc_size)))
+			goto err2;
+	}
+
+	return 0;
+err2:
+	iounmap(afu->p2n_mmio);
+err1:
+	iounmap(afu->p1n_mmio);
+err:
+	dev_err(&afu->dev, "Error mapping AFU MMIO regions\n");
+	return -ENOMEM;
+}
+
+static void cxl_unmap_slice_regs(struct cxl_afu_t *afu)
+{
+	if (afu->p2n_mmio)
+		iounmap(afu->p2n_mmio);
+	if (afu->p1n_mmio)
+		iounmap(afu->p1n_mmio);
+}
+
+static void cxl_release_afu(struct device *dev)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(dev);
+
+	pr_devel("cxl_release_afu\n");
+
+	kfree(afu);
+}
+
+static struct cxl_afu_t *cxl_alloc_afu(struct cxl_t *adapter, int slice)
+{
+	struct cxl_afu_t *afu;
+
+	if (!(afu = kzalloc(sizeof(struct cxl_afu_t), GFP_KERNEL)))
+		return NULL;
+
+	afu->adapter = adapter;
+	afu->dev.parent = &adapter->dev;
+	afu->dev.release = cxl_release_afu;
+	afu->slice = slice;
+	idr_init(&afu->contexts_idr);
+	spin_lock_init(&afu->contexts_lock);
+	spin_lock_init(&afu->afu_cntl_lock);
+	mutex_init(&afu->spa_mutex);
+
+	afu->prefault_mode = CXL_PREFAULT_NONE;
+	afu->irqs_max = afu->adapter->user_irqs;
+
+	return afu;
+}
+
+/* Expects AFU struct to have recently been zeroed out */
+static int cxl_read_afu_descriptor(struct cxl_afu_t *afu)
+{
+	u64 val;
+
+	val = AFUD_READ_INFO(afu);
+	afu->pp_irqs = AFUD_NUM_INTS_PER_PROC(val);
+	afu->max_procs_virtualised = AFUD_NUM_PROCS(val);
+
+	if (AFUD_AFU_DIRECTED(val))
+		afu->models_supported |= CXL_MODEL_DIRECTED;
+	if (AFUD_DEDICATED_PROCESS(val))
+		afu->models_supported |= CXL_MODEL_DEDICATED;
+	if (AFUD_TIME_SLICED(val))
+		afu->models_supported |= CXL_MODEL_TIME_SLICED;
+
+	val = AFUD_READ_PPPSA(afu);
+	afu->pp_size = AFUD_PPPSA_LEN(val) * 4096;
+	afu->psa = AFUD_PPPSA_PSA(val);
+	if ((afu->pp_psa = AFUD_PPPSA_PP(val)))
+		afu->pp_offset = AFUD_READ_PPPSA_OFF(afu);
+
+	return 0;
+}
+
+static int cxl_afu_descriptor_looks_ok(struct cxl_afu_t *afu)
+{
+	if (afu->psa && afu->adapter->ps_size <
+			(afu->pp_offset + afu->pp_size*afu->max_procs_virtualised)) {
+		dev_err(&afu->dev, "per-process PSA can't fit inside the PSA!\n");
+		return -ENODEV;
+	}
+
+	if (afu->pp_psa && (afu->pp_size < PAGE_SIZE))
+		dev_warn(&afu->dev, "AFU uses < PAGE_SIZE per-process PSA!\n");
+
+	return 0;
+}
+
+static int sanitise_afu_regs(struct cxl_afu_t *afu)
+{
+	cxl_p1_write(afu->adapter, CXL_PSL_ErrIVTE, 0x0000000000000000);
+	cxl_p1n_write(afu, CXL_PSL_SERR_An, 0x0000000000000000);
+	cxl_p1n_write(afu, CXL_PSL_IVTE_Offset_An, 0x0000000000000000);
+	cxl_ops->slbia(afu);
+
+	return 0;
+}
+
+static int cxl_init_afu(struct cxl_t *adapter, int slice, struct pci_dev *dev)
+{
+	struct cxl_afu_t *afu;
+	bool free = true;
+	int rc;
+
+	if (!(afu = cxl_alloc_afu(adapter, slice)))
+		return -ENOMEM;
+
+	if ((rc = dev_set_name(&afu->dev, "afu%i.%i", adapter->adapter_num, slice)))
+		goto err1;
+
+	if ((rc = cxl_map_slice_regs(afu, adapter, dev)))
+		goto err1;
+
+	if ((rc = sanitise_afu_regs(afu)))
+		goto err2;
+
+	/* We need to reset the AFU before we can read the AFU descriptor */
+	if ((rc = cxl_ops->afu_reset(afu)))
+		goto err2;
+
+	if (cxl_verbose)
+		dump_afu_descriptor(afu);
+
+	if ((rc = cxl_read_afu_descriptor(afu)))
+		goto err2;
+
+	if ((rc = cxl_afu_descriptor_looks_ok(afu)))
+		goto err2;
+
+	if ((rc = init_implementation_afu_regs(afu)))
+		goto err2;
+
+	if ((rc = cxl_register_serr_irq(afu)))
+		goto err2;
+
+	if ((rc = cxl_register_psl_irq(afu)))
+		goto err3;
+
+	/* Don't care if this fails */
+	cxl_debugfs_afu_add(afu);
+
+	/* After we call this function we must not free the afu directly, even
+	 * if it returns an error! */
+	if ((rc = cxl_register_afu(afu)))
+		goto err_put1;
+
+	if ((rc = cxl_sysfs_afu_add(afu)))
+		goto err_put1;
+
+	if ((rc = cxl_afu_select_best_model(afu)))
+		goto err_put2;
+
+	adapter->afu[afu->slice] = afu;
+
+	return 0;
+
+err_put2:
+	cxl_sysfs_afu_remove(afu);
+err_put1:
+	device_unregister(&afu->dev);
+	free = false;
+	cxl_debugfs_afu_remove(afu);
+	cxl_release_psl_irq(afu);
+err3:
+	cxl_release_serr_irq(afu);
+err2:
+	cxl_unmap_slice_regs(afu);
+err1:
+	if (free)
+		kfree(afu);
+	return rc;
+}
+
+static void cxl_remove_afu(struct cxl_afu_t *afu)
+{
+	pr_devel("cxl_remove_afu\n");
+
+	if (!afu)
+		return;
+
+	cxl_sysfs_afu_remove(afu);
+	cxl_debugfs_afu_remove(afu);
+
+	spin_lock(&afu->adapter->afu_list_lock);
+	afu->adapter->afu[afu->slice] = NULL;
+	spin_unlock(&afu->adapter->afu_list_lock);
+
+	cxl_context_detach_all(afu);
+	cxl_afu_deactivate_model(afu);
+
+	cxl_release_psl_irq(afu);
+	cxl_release_serr_irq(afu);
+	cxl_unmap_slice_regs(afu);
+
+	device_unregister(&afu->dev);
+}
+
+
+static int cxl_map_adapter_regs(struct cxl_t *adapter, struct pci_dev *dev)
+{
+	if (pci_request_region(dev, 2, "priv 2 regs"))
+		goto err1;
+	if (pci_request_region(dev, 0, "priv 1 regs"))
+		goto err2;
+
+	pr_devel("cxl_map_adapter_regs: p1: %#.16llx %#llx, p2: %#.16llx %#llx",
+			p1_base(dev), p1_size(dev), p2_base(dev), p2_size(dev));
+
+	if (!(adapter->p1_mmio = ioremap(p1_base(dev), p1_size(dev))))
+		goto err3;
+
+	if (!(adapter->p2_mmio = ioremap(p2_base(dev), p2_size(dev))))
+		goto err4;
+
+	return 0;
+
+err4:
+	iounmap(adapter->p1_mmio);
+	adapter->p1_mmio = NULL;
+err3:
+	pci_release_region(dev, 0);
+err2:
+	pci_release_region(dev, 2);
+err1:
+	return -ENOMEM;
+}
+
+static void cxl_unmap_adapter_regs(struct cxl_t *adapter)
+{
+	if (adapter->p1_mmio)
+		iounmap(adapter->p1_mmio);
+	if (adapter->p2_mmio)
+		iounmap(adapter->p2_mmio);
+}
+
+static int cxl_read_vsec(struct cxl_t *adapter, struct pci_dev *dev)
+{
+	int vsec;
+	u32 afu_desc_off, afu_desc_size;
+	u32 ps_off, ps_size;
+	u16 vseclen;
+	u8 image_state;
+
+	if (!(vsec = find_cxl_vsec(dev))) {
+		dev_err(&adapter->dev, "ABORTING: CXL VSEC not found!\n");
+		return -ENODEV;
+	}
+
+	CXL_READ_VSEC_LENGTH(dev, vsec, &vseclen);
+	if (vseclen < CXL_VSEC_MIN_SIZE) {
+		pr_err("ABORTING: CXL VSEC too short\n");
+		return -EINVAL;
+	}
+
+	CXL_READ_VSEC_STATUS(dev, vsec, &adapter->vsec_status);
+	CXL_READ_VSEC_PSL_REVISION(dev, vsec, &adapter->psl_rev);
+	CXL_READ_VSEC_CAIA_MAJOR(dev, vsec, &adapter->caia_major);
+	CXL_READ_VSEC_CAIA_MINOR(dev, vsec, &adapter->caia_minor);
+	CXL_READ_VSEC_BASE_IMAGE(dev, vsec, &adapter->base_image);
+	CXL_READ_VSEC_IMAGE_STATE(dev, vsec, &image_state);
+	adapter->user_image_loaded = !!(image_state & CXL_VSEC_USER_IMAGE_LOADED);
+	adapter->perst_loads_image = !!(image_state & CXL_VSEC_PERST_LOADS_IMAGE);
+	adapter->perst_select_user = !!(image_state & CXL_VSEC_PERST_SELECT_USER);
+
+	CXL_READ_VSEC_NAFUS(dev, vsec, &adapter->slices);
+	CXL_READ_VSEC_AFU_DESC_OFF(dev, vsec, &afu_desc_off);
+	CXL_READ_VSEC_AFU_DESC_SIZE(dev, vsec, &afu_desc_size);
+	CXL_READ_VSEC_PS_OFF(dev, vsec, &ps_off);
+	CXL_READ_VSEC_PS_SIZE(dev, vsec, &ps_size);
+
+	/* Convert everything to bytes, because there is NO WAY I'd look at the
+	 * code a month later and forget what units these are in ;-) */
+	adapter->ps_off = ps_off * 64 * 1024;
+	adapter->ps_size = ps_size * 64 * 1024;
+	adapter->afu_desc_off = afu_desc_off * 64 * 1024;
+	adapter->afu_desc_size = afu_desc_size * 64 * 1024;
+
+	/* Total IRQs - 1 PSL ERROR - #AFU*(1 slice error + 1 DSI) */
+	adapter->user_irqs = pnv_cxl_get_irq_count(dev) - 1 - 2*adapter->slices;
+
+	return 0;
+}
+
+static int cxl_vsec_looks_ok(struct cxl_t *adapter, struct pci_dev *dev)
+{
+	if (adapter->vsec_status & CXL_STATUS_SECOND_PORT)
+		return -EBUSY;
+
+	if (adapter->vsec_status & CXL_UNSUPPORTED_FEATURES) {
+		dev_err(&adapter->dev, "ABORTING: CXL requires unsupported features\n");
+		return -EINVAL;
+	}
+
+	if (!adapter->slices) {
+		/* Once we support dynamic reprogramming we can use the card if
+		 * it supports loadable AFUs */
+		dev_err(&adapter->dev, "ABORTING: Device has no AFUs\n");
+		return -EINVAL;
+	}
+
+	if (!adapter->afu_desc_off || !adapter->afu_desc_size) {
+		dev_err(&adapter->dev, "ABORTING: VSEC shows no AFU descriptors\n");
+		return -EINVAL;
+	}
+
+	if (adapter->ps_size > p2_size(dev) - adapter->ps_off) {
+		dev_err(&adapter->dev, "ABORTING: Problem state size larger than "
+				   "available in BAR2: 0x%llx > 0x%llx\n",
+			 adapter->ps_size, p2_size(dev) - adapter->ps_off);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static void cxl_release_adapter(struct device *dev)
+{
+	struct cxl_t *adapter = to_cxl_adapter(dev);
+
+	pr_devel("cxl_release_adapter\n");
+
+	kfree(adapter);
+}
+
+static struct cxl_t *cxl_alloc_adapter(struct pci_dev *dev)
+{
+	struct cxl_t *adapter;
+
+	if (!(adapter = kzalloc(sizeof(struct cxl_t), GFP_KERNEL)))
+		return NULL;
+
+	adapter->dev.parent = &dev->dev;
+	adapter->dev.release = cxl_release_adapter;
+	adapter->driver = &cxl_pci_driver_ops;
+	pci_set_drvdata(dev, adapter);
+	spin_lock_init(&adapter->afu_list_lock);
+
+	return adapter;
+}
+
+static struct cxl_t *cxl_init_adapter(struct pci_dev *dev)
+{
+	struct cxl_t *adapter;
+	bool free = true;
+	int rc;
+
+	if (!(adapter = cxl_alloc_adapter(dev)))
+		return ERR_PTR(-ENOMEM);
+
+	if ((rc = switch_card_to_cxl(dev)))
+		goto err1;
+
+	if ((rc = cxl_alloc_adapter_nr(adapter)))
+		goto err1;
+
+	if ((rc = dev_set_name(&adapter->dev, "card%i", adapter->adapter_num)))
+		goto err2;
+
+	if ((rc = cxl_read_vsec(adapter, dev)))
+		goto err2;
+
+	if ((rc = cxl_vsec_looks_ok(adapter, dev)))
+		goto err2;
+
+	if ((rc = cxl_map_adapter_regs(adapter, dev)))
+		goto err2;
+
+	/* TODO: cxl_ops->sanitise_adapter_regs(adapter); */
+
+	if ((rc = init_implementation_adapter_regs(adapter, dev)))
+		goto err3;
+
+	if ((rc = pnv_phb_to_cxl(dev)))
+		goto err3;
+
+	if ((rc = cxl_register_psl_err_irq(adapter)))
+		goto err3;
+
+	/* Don't care if this one fails: */
+	cxl_debugfs_adapter_add(adapter);
+
+	/* After we call this function we must not free the adapter directly,
+	 * even if it returns an error! */
+	if ((rc = cxl_register_adapter(adapter)))
+		goto err_put1;
+
+	if ((rc = cxl_sysfs_adapter_add(adapter)))
+		goto err_put1;
+
+	return adapter;
+
+err_put1:
+	device_unregister(&adapter->dev);
+	free = false;
+	cxl_debugfs_adapter_remove(adapter);
+	cxl_release_psl_err_irq(adapter);
+err3:
+	cxl_unmap_adapter_regs(adapter);
+err2:
+	cxl_remove_adapter_nr(adapter);
+err1:
+	if (free)
+		kfree(adapter);
+	return ERR_PTR(rc);
+}
+
+static void cxl_remove_adapter(struct cxl_t *adapter)
+{
+	struct pci_dev *pdev = to_pci_dev(adapter->dev.parent);
+
+	pr_devel("cxl_remove_adapter\n");
+
+	cxl_sysfs_adapter_remove(adapter);
+	cxl_debugfs_adapter_remove(adapter);
+	cxl_release_psl_err_irq(adapter);
+	cxl_unmap_adapter_regs(adapter);
+	cxl_remove_adapter_nr(adapter);
+
+	device_unregister(&adapter->dev);
+
+	pci_release_region(pdev, 0);
+	pci_release_region(pdev, 2);
+	pci_disable_device(pdev);
+}
+
+static int cxl_probe(struct pci_dev *dev, const struct pci_device_id *id)
+{
+	struct cxl_t *adapter;
+	int slice;
+	int rc;
+
+	pci_dev_get(dev);
+
+	if (cxl_verbose)
+		dump_cxl_config_space(dev);
+
+	if ((rc = setup_cxl_bars(dev)))
+		return rc;
+
+	if ((rc = pci_enable_device(dev))) {
+		dev_err(&dev->dev, "pci_enable_device failed: %i\n", rc);
+		return rc;
+	}
+
+	adapter = cxl_init_adapter(dev);
+	if (IS_ERR(adapter)) {
+		dev_err(&dev->dev, "cxl_init_adapter failed: %li\n", PTR_ERR(adapter));
+		return PTR_ERR(adapter);
+	}
+
+	for (slice = 0; slice < adapter->slices; slice++) {
+		if ((rc = cxl_init_afu(adapter, slice, dev)))
+			dev_err(&dev->dev, "AFU %i failed to initialise: %i\n", slice, rc);
+	}
+
+	return 0;
+}
+
+static void cxl_remove(struct pci_dev *dev)
+{
+	struct cxl_t *adapter = pci_get_drvdata(dev);
+	int afu;
+
+	dev_warn(&dev->dev, "pci remove\n");
+
+	/* Lock to prevent someone grabbing a ref through the adapter list as
+	 * we are removing it */
+	for (afu = 0; afu < adapter->slices; afu++)
+		cxl_remove_afu(adapter->afu[afu]);
+	cxl_remove_adapter(adapter);
+}
+
+static struct pci_driver cxl_pci_driver = {
+	.name = "cxl-pci",
+	.id_table = cxl_pci_tbl,
+	.probe = cxl_probe,
+	.remove = cxl_remove,
+};
+
+module_driver(cxl_pci_driver, pci_register_driver, pci_unregister_driver);
+
+MODULE_DESCRIPTION("IBM Coherent Accelerator");
+MODULE_AUTHOR("Ian Munsie <imunsie@au1.ibm.com>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
new file mode 100644
index 0000000..87984cb
--- /dev/null
+++ b/drivers/misc/cxl/cxl.h
@@ -0,0 +1,605 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _CXL_H_
+#define _CXL_H_
+
+#include <linux/interrupt.h>
+#include <linux/semaphore.h>
+#include <linux/device.h>
+#include <linux/types.h>
+#include <linux/cdev.h>
+#include <linux/pid.h>
+#include <linux/io.h>
+#include <asm/cputable.h>
+#include <asm/mmu.h>
+#include <asm/reg.h>
+#include <misc/cxl.h>
+
+#include <uapi/misc/cxl.h>
+
+extern uint cxl_verbose;
+
+#define CXL_TIMEOUT 5
+
+/* Opaque types to avoid accidentally passing registers for the wrong MMIO
+ *
+ * At the end of the day, I'm not married to using typedef here, but it might
+ * (and has!) help avoid bugs like mixing up CXL_PSL_CtxTime and
+ * CXL_PSL_CtxTime_An, or calling cxl_p1n_write instead of cxl_p1_write.
+ *
+ * I'm quite happy if these are changed back to #defines before upstreaming, it
+ * should be little more than a regexp search+replace operation in this file.
+ */
+typedef struct {
+	const int x;
+} cxl_p1_reg_t;
+typedef struct {
+	const int x;
+} cxl_p1n_reg_t;
+typedef struct {
+	const int x;
+} cxl_p2n_reg_t;
+#define cxl_reg_off(reg) \
+	(reg.x)
+
+/* Memory maps. Ref CXL Appendix A */
+
+/* PSL Privilege 1 Memory Map */
+/* Configuration and Control area */
+static const cxl_p1_reg_t CXL_PSL_CtxTime = {0x0000};
+static const cxl_p1_reg_t CXL_PSL_ErrIVTE = {0x0008};
+static const cxl_p1_reg_t CXL_PSL_KEY1    = {0x0010};
+static const cxl_p1_reg_t CXL_PSL_KEY2    = {0x0018};
+static const cxl_p1_reg_t CXL_PSL_Control = {0x0020};
+/* Downloading */
+static const cxl_p1_reg_t CXL_PSL_DLCNTL  = {0x0060};
+static const cxl_p1_reg_t CXL_PSL_DLADDR  = {0x0068};
+
+/* PSL Lookaside Buffer Management Area */
+static const cxl_p1_reg_t CXL_PSL_LBISEL  = {0x0080};
+static const cxl_p1_reg_t CXL_PSL_SLBIE   = {0x0088};
+static const cxl_p1_reg_t CXL_PSL_SLBIA   = {0x0090};
+static const cxl_p1_reg_t CXL_PSL_TLBIE   = {0x00A0};
+static const cxl_p1_reg_t CXL_PSL_TLBIA   = {0x00A8};
+static const cxl_p1_reg_t CXL_PSL_AFUSEL  = {0x00B0};
+
+/* 0x00C0:7EFF Implementation dependent area */
+static const cxl_p1_reg_t CXL_PSL_FIR1      = {0x0100};
+static const cxl_p1_reg_t CXL_PSL_FIR2      = {0x0108};
+static const cxl_p1_reg_t CXL_PSL_VERSION   = {0x0118};
+static const cxl_p1_reg_t CXL_PSL_RESLCKTO  = {0x0128};
+static const cxl_p1_reg_t CXL_PSL_FIR_CNTL  = {0x0148};
+static const cxl_p1_reg_t CXL_PSL_DSNDCTL   = {0x0150};
+static const cxl_p1_reg_t CXL_PSL_SNWRALLOC = {0x0158};
+static const cxl_p1_reg_t CXL_PSL_TRACE     = {0x0170};
+/* 0x7F00:7FFF Reserved PCIe MSI-X Pending Bit Array area */
+/* 0x8000:FFFF Reserved PCIe MSI-X Table Area */
+
+/* PSL Slice Privilege 1 Memory Map */
+/* Configuration Area */
+static const cxl_p1n_reg_t CXL_PSL_SR_An          = {0x00};
+static const cxl_p1n_reg_t CXL_PSL_LPID_An        = {0x08};
+static const cxl_p1n_reg_t CXL_PSL_AMBAR_An       = {0x10};
+static const cxl_p1n_reg_t CXL_PSL_SPOffset_An    = {0x18};
+static const cxl_p1n_reg_t CXL_PSL_ID_An          = {0x20};
+static const cxl_p1n_reg_t CXL_PSL_SERR_An        = {0x28};
+/* Memory Management and Lookaside Buffer Management */
+static const cxl_p1n_reg_t CXL_PSL_SDR_An         = {0x30};
+static const cxl_p1n_reg_t CXL_PSL_AMOR_An        = {0x38};
+/* Pointer Area */
+static const cxl_p1n_reg_t CXL_HAURP_An           = {0x80};
+static const cxl_p1n_reg_t CXL_PSL_SPAP_An        = {0x88};
+static const cxl_p1n_reg_t CXL_PSL_LLCMD_An       = {0x90};
+/* Control Area */
+static const cxl_p1n_reg_t CXL_PSL_SCNTL_An       = {0xA0};
+static const cxl_p1n_reg_t CXL_PSL_CtxTime_An     = {0xA8};
+static const cxl_p1n_reg_t CXL_PSL_IVTE_Offset_An = {0xB0};
+static const cxl_p1n_reg_t CXL_PSL_IVTE_Limit_An  = {0xB8};
+/* 0xC0:FF Implementation Dependent Area */
+static const cxl_p1n_reg_t CXL_PSL_FIR_SLICE_An   = {0xC0};
+static const cxl_p1n_reg_t CXL_AFU_DEBUG_An       = {0xC8};
+static const cxl_p1n_reg_t CXL_PSL_APCALLOC_A     = {0xD0};
+static const cxl_p1n_reg_t CXL_PSL_COALLOC_A      = {0xD8};
+static const cxl_p1n_reg_t CXL_PSL_RXCTL_A        = {0xE0};
+static const cxl_p1n_reg_t CXL_PSL_SLICE_TRACE    = {0xE8};
+
+/* PSL Slice Privilege 2 Memory Map */
+/* Configuration and Control Area */
+static const cxl_p2n_reg_t CXL_PSL_PID_TID_An = {0x000};
+static const cxl_p2n_reg_t CXL_CSRP_An        = {0x008};
+static const cxl_p2n_reg_t CXL_AURP0_An       = {0x010};
+static const cxl_p2n_reg_t CXL_AURP1_An       = {0x018};
+static const cxl_p2n_reg_t CXL_SSTP0_An       = {0x020};
+static const cxl_p2n_reg_t CXL_SSTP1_An       = {0x028};
+static const cxl_p2n_reg_t CXL_PSL_AMR_An     = {0x030};
+/* Segment Lookaside Buffer Management */
+static const cxl_p2n_reg_t CXL_SLBIE_An       = {0x040};
+static const cxl_p2n_reg_t CXL_SLBIA_An       = {0x048};
+static const cxl_p2n_reg_t CXL_SLBI_Select_An = {0x050};
+/* Interrupt Registers */
+static const cxl_p2n_reg_t CXL_PSL_DSISR_An   = {0x060};
+static const cxl_p2n_reg_t CXL_PSL_DAR_An     = {0x068};
+static const cxl_p2n_reg_t CXL_PSL_DSR_An     = {0x070};
+static const cxl_p2n_reg_t CXL_PSL_TFC_An     = {0x078};
+static const cxl_p2n_reg_t CXL_PSL_PEHandle_An = {0x080};
+static const cxl_p2n_reg_t CXL_PSL_ErrStat_An = {0x088};
+/* AFU Registers */
+static const cxl_p2n_reg_t CXL_AFU_Cntl_An    = {0x090};
+static const cxl_p2n_reg_t CXL_AFU_ERR_An     = {0x098};
+/* Work Element Descriptor */
+static const cxl_p2n_reg_t CXL_PSL_WED_An     = {0x0A0};
+/* 0x0C0:FFF Implementation Dependent Area */
+
+#define CXL_PSL_SPAP_Addr 0x0ffffffffffff000ULL
+#define CXL_PSL_SPAP_Size 0x0000000000000ff0ULL
+#define CXL_PSL_SPAP_Size_Shift 4
+#define CXL_PSL_SPAP_V    0x0000000000000001ULL
+
+/****** CXL_PSL_DLCNTL *****************************************************/
+#define CXL_PSL_DLCNTL_D (0x1ull << (63-28))
+#define CXL_PSL_DLCNTL_C (0x1ull << (63-29))
+#define CXL_PSL_DLCNTL_E (0x1ull << (63-30))
+#define CXL_PSL_DLCNTL_S (0x1ull << (63-31))
+#define CXL_PSL_DLCNTL_CE (CXL_PSL_DLCNTL_C | CXL_PSL_DLCNTL_E)
+#define CXL_PSL_DLCNTL_DCES (CXL_PSL_DLCNTL_D | CXL_PSL_DLCNTL_CE | CXL_PSL_DLCNTL_S)
+
+/****** CXL_PSL_SR_An ******************************************************/
+#define CXL_PSL_SR_An_SF  MSR_SF            /* 64bit */
+#define CXL_PSL_SR_An_TA  (1ull << (63-1))  /* Tags active,   GA1: 0 */
+#define CXL_PSL_SR_An_HV  MSR_HV            /* Hypervisor,    GA1: 0 */
+#define CXL_PSL_SR_An_PR  MSR_PR            /* Problem state, GA1: 1 */
+#define CXL_PSL_SR_An_ISL (1ull << (63-53)) /* Ignore Segment Large Page */
+#define CXL_PSL_SR_An_TC  (1ull << (63-54)) /* Page Table secondary hash */
+#define CXL_PSL_SR_An_US  (1ull << (63-56)) /* User state,    GA1: X */
+#define CXL_PSL_SR_An_SC  (1ull << (63-58)) /* Segment Table secondary hash */
+#define CXL_PSL_SR_An_R   MSR_DR            /* Relocate,      GA1: 1 */
+#define CXL_PSL_SR_An_MP  (1ull << (63-62)) /* Master Process */
+#define CXL_PSL_SR_An_LE  (1ull << (63-63)) /* Little Endian */
+
+/****** CXL_PSL_LLCMD_An ****************************************************/
+#define CXL_LLCMD_TERMINATE   0x0001000000000000ULL
+#define CXL_LLCMD_REMOVE      0x0002000000000000ULL
+#define CXL_LLCMD_SUSPEND     0x0003000000000000ULL
+#define CXL_LLCMD_RESUME      0x0004000000000000ULL
+#define CXL_LLCMD_ADD         0x0005000000000000ULL
+#define CXL_LLCMD_UPDATE      0x0006000000000000ULL
+#define CXL_LLCMD_HANDLE_MASK 0x000000000000ffffULL
+
+/****** CXL_PSL_ID_An ****************************************************/
+#define CXL_PSL_ID_An_F	(1ull << (63-31))
+#define CXL_PSL_ID_An_L	(1ull << (63-30))
+
+/****** CXL_PSL_SCNTL_An ****************************************************/
+#define CXL_PSL_SCNTL_An_CR          (0x1ull << (63-15))
+/* Programming Models: */
+#define CXL_PSL_SCNTL_An_PM_MASK     (0xffffull << (63-31))
+#define CXL_PSL_SCNTL_An_PM_Shared   (0x0000ull << (63-31))
+#define CXL_PSL_SCNTL_An_PM_OS       (0x0001ull << (63-31))
+#define CXL_PSL_SCNTL_An_PM_Process  (0x0002ull << (63-31))
+#define CXL_PSL_SCNTL_An_PM_AFU      (0x0004ull << (63-31))
+#define CXL_PSL_SCNTL_An_PM_AFU_PBT  (0x0104ull << (63-31))
+/* Purge Status (ro) */
+#define CXL_PSL_SCNTL_An_Ps_MASK     (0x3ull << (63-39))
+#define CXL_PSL_SCNTL_An_Ps_Pending  (0x1ull << (63-39))
+#define CXL_PSL_SCNTL_An_Ps_Complete (0x3ull << (63-39))
+/* Purge */
+#define CXL_PSL_SCNTL_An_Pc          (0x1ull << (63-48))
+/* Suspend Status (ro) */
+#define CXL_PSL_SCNTL_An_Ss_MASK     (0x3ull << (63-55))
+#define CXL_PSL_SCNTL_An_Ss_Pending  (0x1ull << (63-55))
+#define CXL_PSL_SCNTL_An_Ss_Complete (0x3ull << (63-55))
+/* Suspend Control */
+#define CXL_PSL_SCNTL_An_Sc          (0x1ull << (63-63))
+
+/* AFU Slice Enable Status (ro) */
+#define CXL_AFU_Cntl_An_ES_MASK     (0x7ull << (63-2))
+#define CXL_AFU_Cntl_An_ES_Disabled (0x0ull << (63-2))
+#define CXL_AFU_Cntl_An_ES_Enabled  (0x4ull << (63-2))
+/* AFU Slice Enable */
+#define CXL_AFU_Cntl_An_E           (0x1ull << (63-3))
+/* AFU Slice Reset status (ro) */
+#define CXL_AFU_Cntl_An_RS_MASK     (0x3ull << (63-5))
+#define CXL_AFU_Cntl_An_RS_Pending  (0x1ull << (63-5))
+#define CXL_AFU_Cntl_An_RS_Complete (0x2ull << (63-5))
+/* AFU Slice Reset */
+#define CXL_AFU_Cntl_An_RA          (0x1ull << (63-7))
+
+/****** CXL_SSTP0/1_An ******************************************************/
+/* These top bits are for the segment that CONTAINS the segment table */
+#define CXL_SSTP0_An_B_SHIFT    SLB_VSID_SSIZE_SHIFT
+#define CXL_SSTP0_An_KS             (1ull << (63-2))
+#define CXL_SSTP0_An_KP             (1ull << (63-3))
+#define CXL_SSTP0_An_N              (1ull << (63-4))
+#define CXL_SSTP0_An_L              (1ull << (63-5))
+#define CXL_SSTP0_An_C              (1ull << (63-6))
+#define CXL_SSTP0_An_TA             (1ull << (63-7))
+#define CXL_SSTP0_An_LP_SHIFT                (63-9)  /* 2 Bits */
+/* And finally, the virtual address & size of the segment table: */
+#define CXL_SSTP0_An_SegTableSize_SHIFT      (63-31) /* 12 Bits */
+#define CXL_SSTP0_An_SegTableSize_MASK \
+	(((1ull << 12) - 1) << CXL_SSTP0_An_SegTableSize_SHIFT)
+#define CXL_SSTP0_An_STVA_U_MASK   ((1ull << (63-49))-1)
+#define CXL_SSTP1_An_STVA_L_MASK (~((1ull << (63-55))-1))
+#define CXL_SSTP1_An_V              (1ull << (63-63))
+
+/****** CXL_PSL_SLBIE_[An] **************************************************/
+/* write: */
+#define CXL_SLBIE_C        PPC_BIT(36)         /* Class */
+#define CXL_SLBIE_SS       PPC_BITMASK(37, 38) /* Segment Size */
+#define CXL_SLBIE_SS_SHIFT PPC_BITLSHIFT(38)
+#define CXL_SLBIE_TA       PPC_BIT(38)         /* Tags Active */
+/* read: */
+#define CXL_SLBIE_MAX      PPC_BITMASK(24, 31)
+#define CXL_SLBIE_PENDING  PPC_BITMASK(56, 63)
+
+/****** CXL_SLBIA_[An] ******************************************************/
+#define CXL_SLBIA_P         (1ull) /* Pending (read) */
+
+/****** Common to all PSL_SLBIE/A_[An] registers *****************************/
+#define CXL_SLBI_IQ_ALL     (0ull)              /* Inv qualifier */
+#define CXL_SLBI_IQ_LPID    (1ull)              /* Inv qualifier */
+#define CXL_SLBI_IQ_LPIDPID (3ull)              /* Inv qualifier */
+
+/****** CXL_PSL_DSISR_An ****************************************************/
+#define CXL_PSL_DSISR_An_DS (1ull << (63-0))  /* Segment not found */
+#define CXL_PSL_DSISR_An_DM (1ull << (63-1))  /* PTE not found (See also: M) or protection fault */
+#define CXL_PSL_DSISR_An_ST (1ull << (63-2))  /* Segment Table PTE not found */
+#define CXL_PSL_DSISR_An_UR (1ull << (63-3))  /* AURP PTE not found */
+#define CXL_PSL_DSISR_TRANS (CXL_PSL_DSISR_An_DS | CXL_PSL_DSISR_An_DM | CXL_PSL_DSISR_An_ST | CXL_PSL_DSISR_An_UR)
+#define CXL_PSL_DSISR_An_PE (1ull << (63-4))  /* PSL Error (implementation specific) */
+#define CXL_PSL_DSISR_An_AE (1ull << (63-5))  /* AFU Error */
+#define CXL_PSL_DSISR_An_OC (1ull << (63-6))  /* OS Context Warning */
+/* NOTE: Bits 32:63 are undefined if DSISR[DS] = 1 */
+#define CXL_PSL_DSISR_An_M  DSISR_NOHPTE      /* PTE not found */
+#define CXL_PSL_DSISR_An_P  DSISR_PROTFAULT   /* Storage protection violation */
+#define CXL_PSL_DSISR_An_A  (1ull << (63-37)) /* AFU lock access to write through or cache inhibited storage */
+#define CXL_PSL_DSISR_An_S  DSISR_ISSTORE     /* Access was afu_wr or afu_zero */
+#define CXL_PSL_DSISR_An_K  DSISR_KEYFAULT    /* Access not permitted by virtual page class key protection */
+
+/****** CXL_PSL_TFC_An ******************************************************/
+#define CXL_PSL_TFC_An_A  (1ull << (63-28)) /* Acknowledge non-translation fault */
+#define CXL_PSL_TFC_An_C  (1ull << (63-29)) /* Continue (abort transaction) */
+#define CXL_PSL_TFC_An_AE (1ull << (63-30)) /* Restart PSL with address error */
+#define CXL_PSL_TFC_An_R  (1ull << (63-31)) /* Restart PSL transaction */
+
+/* cxl_process_element->software_status */
+#define CXL_PE_SOFTWARE_STATE_V (1ul << (31 -  0)) /* Valid */
+#define CXL_PE_SOFTWARE_STATE_C (1ul << (31 - 29)) /* Complete */
+#define CXL_PE_SOFTWARE_STATE_S (1ul << (31 - 30)) /* Suspend */
+#define CXL_PE_SOFTWARE_STATE_T (1ul << (31 - 31)) /* Terminate */
+
+/* SPA->sw_command_status */
+#define CXL_SPA_SW_CMD_MASK         0xffff000000000000ULL
+#define CXL_SPA_SW_CMD_TERMINATE    0x0001000000000000ULL
+#define CXL_SPA_SW_CMD_REMOVE       0x0002000000000000ULL
+#define CXL_SPA_SW_CMD_SUSPEND      0x0003000000000000ULL
+#define CXL_SPA_SW_CMD_RESUME       0x0004000000000000ULL
+#define CXL_SPA_SW_CMD_ADD          0x0005000000000000ULL
+#define CXL_SPA_SW_CMD_UPDATE       0x0006000000000000ULL
+#define CXL_SPA_SW_STATE_MASK       0x0000ffff00000000ULL
+#define CXL_SPA_SW_STATE_TERMINATED 0x0000000100000000ULL
+#define CXL_SPA_SW_STATE_REMOVED    0x0000000200000000ULL
+#define CXL_SPA_SW_STATE_SUSPENDED  0x0000000300000000ULL
+#define CXL_SPA_SW_STATE_RESUMED    0x0000000400000000ULL
+#define CXL_SPA_SW_STATE_ADDED      0x0000000500000000ULL
+#define CXL_SPA_SW_STATE_UPDATED    0x0000000600000000ULL
+#define CXL_SPA_SW_PSL_ID_MASK      0x00000000ffff0000ULL
+#define CXL_SPA_SW_LINK_MASK        0x000000000000ffffULL
+
+#define CXL_MAX_SLICES 4
+#define MAX_AFU_MMIO_REGS 3
+
+#define CXL_MODEL_DEDICATED   0x1
+#define CXL_MODEL_DIRECTED    0x2
+#define CXL_MODEL_TIME_SLICED 0x4
+#define CXL_SUPPORTED_MODELS (CXL_MODEL_DEDICATED | CXL_MODEL_DIRECTED)
+
+enum cxl_context_status {
+	CLOSED,
+	OPENED,
+	STARTED
+};
+
+enum prefault_modes {
+	CXL_PREFAULT_NONE,
+	CXL_PREFAULT_WED,
+	CXL_PREFAULT_ALL,
+};
+
+struct cxl_sste {
+	__be64 esid_data;
+	__be64 vsid_data;
+};
+
+#define to_cxl_adapter(d) container_of(d, struct cxl_t, dev)
+#define to_cxl_afu(d) container_of(d, struct cxl_afu_t, dev)
+
+struct cxl_afu_t {
+	irq_hw_number_t psl_hwirq;
+	irq_hw_number_t serr_hwirq;
+	unsigned int serr_virq;
+	void __iomem *p1n_mmio;
+	void __iomem *p2n_mmio;
+	phys_addr_t psn_phys;
+	u64 pp_offset;
+	u64 pp_size;
+	void __iomem *afu_desc_mmio;
+	struct cxl_t *adapter;
+	struct device dev;
+	struct cdev afu_cdev_s, afu_cdev_m;
+	struct device *chardev_s, *chardev_m;
+	struct idr contexts_idr;
+	struct dentry *debugfs;
+	spinlock_t contexts_lock;
+	struct mutex spa_mutex;
+	spinlock_t afu_cntl_lock;
+
+	/*
+	 * Only the first part of the SPA is used for the process element
+	 * linked list.  The only other part that software needs to worry about
+	 * is sw_command_status, which we store a separate pointer to.
+	 * Everything else in the SPA is only used by hardware.
+	 */
+	struct cxl_process_element *spa;
+	__be64 *sw_command_status;
+	unsigned int spa_size;
+	int spa_order;
+	int spa_max_procs;
+	unsigned int psl_virq;
+
+	int pp_irqs;
+	int irqs_max;
+	int num_procs;
+	int max_procs_virtualised;
+	int slice;
+	int models_supported;
+	int current_model;
+	enum prefault_modes prefault_mode;
+	bool psa;
+	bool pp_psa;
+	bool enabled;
+};
+
+/*
+ * This is a cxl context.  If the PSL is in the dedicated model there will be
+ * one of these per AFU.  In the AFU directed model there can be many of these.
+ */
+struct cxl_context_t {
+	struct cxl_afu_t *afu;
+
+	/* Problem state MMIO */
+	phys_addr_t psn_phys;
+	u64 psn_size;
+
+	spinlock_t sst_lock; /* Protects segment table */
+	struct cxl_sste *sstp;
+	unsigned int sst_size, sst_lru;
+
+	wait_queue_head_t wq;
+	struct pid *pid;
+	spinlock_t lock; /* Protects pending_irq_mask, pending_fault and fault_addr */
+	/* Only used in PR mode */
+	u64 process_token;
+
+	unsigned long *irq_bitmap; /* Accessed from IRQ context */
+	struct cxl_irq_ranges irqs;
+	u64 fault_addr;
+	u64 afu_err;
+	enum cxl_context_status status;
+
+	/* XXX: Is it possible to need multiple work items at once? */
+	struct work_struct fault_work;
+	u64 dsisr;
+	u64 dar;
+
+	struct cxl_process_element *elem;
+
+	int ph; /* process handle/process element index */
+	u32 irq_count;
+	bool pe_inserted;
+	bool master;
+	bool kernel;
+	bool pending_irq;
+	bool pending_fault;
+	bool pending_afu_err;
+};
+
+struct cxl_t {
+	void __iomem *p1_mmio;
+	void __iomem *p2_mmio;
+	irq_hw_number_t err_hwirq;
+	unsigned int err_virq;
+	struct cxl_driver_ops *driver;
+	spinlock_t afu_list_lock;
+	struct cxl_afu_t *afu[CXL_MAX_SLICES];
+	struct device dev;
+	struct dentry *trace;
+	struct dentry *psl_err_chk;
+	struct dentry *debugfs;
+	struct bin_attribute cxl_attr;
+	int adapter_num;
+	int user_irqs;
+	u64 afu_desc_off;
+	u64 afu_desc_size;
+	u64 ps_off;
+	u64 ps_size;
+	u16 psl_rev;
+	u16 base_image;
+	u8 vsec_status;
+	u8 caia_major;
+	u8 caia_minor;
+	u8 slices;
+	bool user_image_loaded;
+	bool perst_loads_image;
+	bool perst_select_user;
+};
+
+struct cxl_driver_ops {
+	struct module *module;
+	int (*alloc_one_irq)(struct cxl_t *adapter);
+	void (*release_one_irq)(struct cxl_t *adapter, int hwirq);
+	int (*alloc_irq_ranges)(struct cxl_irq_ranges *irqs, struct cxl_t *adapter, unsigned int num);
+	void (*release_irq_ranges)(struct cxl_irq_ranges *irqs, struct cxl_t *adapter);
+	int (*setup_irq)(struct cxl_t *adapter, unsigned int hwirq, unsigned int virq);
+};
+
+/* common == phyp + powernv */
+struct cxl_process_element_common {
+	__be32 tid;
+	__be32 pid;
+	__be64 csrp;
+	__be64 aurp0;
+	__be64 aurp1;
+	__be64 sstp0;
+	__be64 sstp1;
+	__be64 amr;
+	u8     reserved3[4];
+	__be64 wed;
+} __packed;
+
+/* just powernv */
+struct cxl_process_element {
+	__be64 sr;
+	__be64 SPOffset;
+	__be64 sdr;
+	__be64 haurp;
+	__be32 ctxtime;
+	__be16 ivte_offsets[4];
+	__be16 ivte_ranges[4];
+	__be32 lpid;
+	struct cxl_process_element_common common;
+	__be32 software_state;
+} __packed;
+
+#define _cxl_reg_write(addr, val) \
+	out_be64((u64 __iomem *)(addr), val)
+#define _cxl_reg_read(addr) \
+	in_be64((u64 __iomem *)(addr))
+
+static inline void __iomem *_cxl_p1_addr(struct cxl_t *cxl, cxl_p1_reg_t reg)
+{
+	WARN_ON(!cpu_has_feature(CPU_FTR_HVMODE));
+	return cxl->p1_mmio + cxl_reg_off(reg);
+}
+#define cxl_p1_write(cxl, reg, val) \
+	_cxl_reg_write(_cxl_p1_addr(cxl, reg), val)
+#define cxl_p1_read(cxl, reg) \
+	_cxl_reg_read(_cxl_p1_addr(cxl, reg))
+
+static inline void __iomem *_cxl_p1n_addr(struct cxl_afu_t *afu, cxl_p1n_reg_t reg)
+{
+	WARN_ON(!cpu_has_feature(CPU_FTR_HVMODE));
+	return afu->p1n_mmio + cxl_reg_off(reg);
+}
+#define cxl_p1n_write(afu, reg, val) \
+	_cxl_reg_write(_cxl_p1n_addr(afu, reg), val)
+#define cxl_p1n_read(afu, reg) \
+	_cxl_reg_read(_cxl_p1n_addr(afu, reg))
+
+static inline void __iomem *_cxl_p2n_addr(struct cxl_afu_t *afu, cxl_p2n_reg_t reg)
+{
+	return afu->p2n_mmio + cxl_reg_off(reg);
+}
+#define cxl_p2n_write(afu, reg, val) \
+	_cxl_reg_write(_cxl_p2n_addr(afu, reg), val)
+#define cxl_p2n_read(afu, reg) \
+	_cxl_reg_read(_cxl_p2n_addr(afu, reg))
+
+struct cxl_calls {
+	void (*cxl_slbia)(struct mm_struct *mm);
+	struct module *owner;
+};
+int register_cxl_calls(struct cxl_calls *calls);
+void unregister_cxl_calls(struct cxl_calls *calls);
+
+int cxl_alloc_adapter_nr(struct cxl_t *adapter);
+void cxl_remove_adapter_nr(struct cxl_t *adapter);
+
+int cxl_file_init(void);
+void cxl_file_exit(void);
+int cxl_register_adapter(struct cxl_t *adapter);
+int cxl_register_afu(struct cxl_afu_t *afu);
+int cxl_chardev_m_afu_add(struct cxl_afu_t *afu);
+int cxl_chardev_s_afu_add(struct cxl_afu_t *afu);
+void cxl_chardev_afu_remove(struct cxl_afu_t *afu);
+
+void cxl_context_detach_all(struct cxl_afu_t *afu);
+void cxl_context_detach(struct cxl_context_t *ctx);
+
+int cxl_sysfs_adapter_add(struct cxl_t *adapter);
+void cxl_sysfs_adapter_remove(struct cxl_t *adapter);
+int cxl_sysfs_afu_add(struct cxl_afu_t *afu);
+void cxl_sysfs_afu_remove(struct cxl_afu_t *afu);
+
+int cxl_afu_activate_model(struct cxl_afu_t *afu, int model);
+int _cxl_afu_deactivate_model(struct cxl_afu_t *afu, int model);
+int cxl_afu_deactivate_model(struct cxl_afu_t *afu);
+int cxl_afu_select_best_model(struct cxl_afu_t *afu);
+
+unsigned int cxl_map_irq(struct cxl_t *adapter, irq_hw_number_t hwirq,
+		         irq_handler_t handler, void *cookie);
+void cxl_unmap_irq(unsigned int virq, void *cookie);
+int cxl_register_psl_irq(struct cxl_afu_t *afu);
+void cxl_release_psl_irq(struct cxl_afu_t *afu);
+int cxl_register_psl_err_irq(struct cxl_t *adapter);
+void cxl_release_psl_err_irq(struct cxl_t *adapter);
+int cxl_register_serr_irq(struct cxl_afu_t *afu);
+void cxl_release_serr_irq(struct cxl_afu_t *afu);
+int afu_register_irqs(struct cxl_context_t *ctx, u32 count);
+void afu_release_irqs(struct cxl_context_t *ctx);
+irqreturn_t cxl_slice_irq_err(int irq, void *data);
+
+int cxl_debugfs_init(void);
+void cxl_debugfs_exit(void);
+int cxl_debugfs_adapter_add(struct cxl_t *adapter);
+void cxl_debugfs_adapter_remove(struct cxl_t *adapter);
+int cxl_debugfs_afu_add(struct cxl_afu_t *afu);
+void cxl_debugfs_afu_remove(struct cxl_afu_t *afu);
+
+void cxl_handle_fault(struct work_struct *work);
+void cxl_prefault(struct cxl_context_t *ctx, u64 wed);
+
+struct cxl_t *get_cxl_adapter(int num);
+int cxl_alloc_sst(struct cxl_context_t *ctx, u64 *sstp0, u64 *sstp1);
+
+void init_cxl_native(void);
+
+struct cxl_context_t *cxl_context_alloc(void);
+int cxl_context_init(struct cxl_context_t *ctx, struct cxl_afu_t *afu, bool master);
+void cxl_context_free(struct cxl_context_t *ctx);
+int cxl_context_iomap(struct cxl_context_t *ctx, struct vm_area_struct *vma);
+
+/* This matches the layout of the H_COLLECT_CA_INT_INFO retbuf */
+struct cxl_irq_info {
+	u64 dsisr;
+	u64 dar;
+	u64 dsr;
+	u32 pid;
+	u32 tid;
+	u64 afu_err;
+	u64 errstat;
+	u64 padding[3]; /* to match the expected retbuf size for plpar_hcall9 */
+};
+
+struct cxl_backend_ops {
+	int (*attach_process)(struct cxl_context_t *ctx, bool kernel, u64 wed,
+			    u64 amr);
+	int (*detach_process)(struct cxl_context_t *ctx);
+
+	int (*get_irq)(struct cxl_context_t *ctx, struct cxl_irq_info *info);
+	int (*ack_irq)(struct cxl_context_t *ctx, u64 tfc, u64 psl_reset_mask);
+
+	int (*check_error)(struct cxl_afu_t *afu);
+	void (*slbia)(struct cxl_afu_t *afu);
+	int (*afu_reset)(struct cxl_afu_t *afu);
+};
+extern const struct cxl_backend_ops *cxl_ops;
+
+void cxl_stop_trace(struct cxl_t *cxl);
+
+#endif
diff --git a/drivers/misc/cxl/debugfs.c b/drivers/misc/cxl/debugfs.c
new file mode 100644
index 0000000..f4d148c
--- /dev/null
+++ b/drivers/misc/cxl/debugfs.c
@@ -0,0 +1,116 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/debugfs.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+
+#include "cxl.h"
+
+struct dentry *cxl_debugfs;
+
+void cxl_stop_trace(struct cxl_t *adapter)
+{
+	int slice;
+
+	/* Stop the trace */
+	cxl_p1_write(adapter, CXL_PSL_TRACE, 0x8000000000000017LL);
+
+	/* Stop the slice traces */
+	spin_lock(&adapter->afu_list_lock);
+	for (slice = 0; slice < adapter->slices; slice++) {
+		if (adapter->afu[slice])
+			cxl_p1n_write(adapter->afu[slice], CXL_PSL_SLICE_TRACE, 0x8000000000000000LL);
+	}
+	spin_unlock(&adapter->afu_list_lock);
+}
+
+int cxl_debugfs_adapter_add(struct cxl_t *adapter)
+{
+	struct dentry *dir;
+	char buf[32];
+
+	if (!cxl_debugfs)
+		return -ENODEV;
+
+	snprintf(buf, 32, "card%i", adapter->adapter_num);
+	dir = debugfs_create_dir(buf, cxl_debugfs);
+	if (IS_ERR(dir))
+		return PTR_ERR(dir);
+	adapter->debugfs = dir;
+
+	debugfs_create_x64("fir1",     S_IRUSR, dir, _cxl_p1_addr(adapter, CXL_PSL_FIR1));
+	debugfs_create_x64("fir2",     S_IRUSR, dir, _cxl_p1_addr(adapter, CXL_PSL_FIR2));
+	debugfs_create_x64("fir_cntl", S_IRUSR, dir, _cxl_p1_addr(adapter, CXL_PSL_FIR_CNTL));
+	debugfs_create_x64("err_ivte", S_IRUSR, dir, _cxl_p1_addr(adapter, CXL_PSL_ErrIVTE));
+
+	debugfs_create_x64("trace", S_IRUSR | S_IWUSR, dir, _cxl_p1_addr(adapter, CXL_PSL_TRACE));
+
+	return 0;
+}
+EXPORT_SYMBOL(cxl_debugfs_adapter_add);
+
+void cxl_debugfs_adapter_remove(struct cxl_t *adapter)
+{
+	debugfs_remove_recursive(adapter->debugfs);
+}
+EXPORT_SYMBOL(cxl_debugfs_adapter_remove);
+
+int cxl_debugfs_afu_add(struct cxl_afu_t *afu)
+{
+	struct dentry *dir;
+	char buf[32];
+
+	if (!afu->adapter->debugfs)
+		return -ENODEV;
+
+	snprintf(buf, 32, "psl%i.%i", afu->adapter->adapter_num, afu->slice);
+	dir = debugfs_create_dir(buf, afu->adapter->debugfs);
+	if (IS_ERR(dir))
+		return PTR_ERR(dir);
+	afu->debugfs = dir;
+
+	debugfs_create_x64("fir",        S_IRUSR, dir, _cxl_p1n_addr(afu, CXL_PSL_FIR_SLICE_An));
+	debugfs_create_x64("serr",       S_IRUSR, dir, _cxl_p1n_addr(afu, CXL_PSL_SERR_An));
+	debugfs_create_x64("afu_debug",  S_IRUSR, dir, _cxl_p1n_addr(afu, CXL_AFU_DEBUG_An));
+	debugfs_create_x64("sr",         S_IRUSR, dir, _cxl_p1n_addr(afu, CXL_PSL_SR_An));
+
+	debugfs_create_x64("dsisr",      S_IRUSR, dir, _cxl_p2n_addr(afu, CXL_PSL_DSISR_An));
+	debugfs_create_x64("dar",        S_IRUSR, dir, _cxl_p2n_addr(afu, CXL_PSL_DAR_An));
+	debugfs_create_x64("sstp0",      S_IRUSR, dir, _cxl_p2n_addr(afu, CXL_SSTP0_An));
+	debugfs_create_x64("sstp1",      S_IRUSR, dir, _cxl_p2n_addr(afu, CXL_SSTP1_An));
+	debugfs_create_x64("err_status", S_IRUSR, dir, _cxl_p2n_addr(afu, CXL_PSL_ErrStat_An));
+
+	debugfs_create_x64("trace", S_IRUSR | S_IWUSR, dir, _cxl_p1n_addr(afu, CXL_PSL_SLICE_TRACE));
+
+	return 0;
+}
+EXPORT_SYMBOL(cxl_debugfs_afu_add);
+
+void cxl_debugfs_afu_remove(struct cxl_afu_t *afu)
+{
+	debugfs_remove_recursive(afu->debugfs);
+}
+EXPORT_SYMBOL(cxl_debugfs_afu_remove);
+
+int __init cxl_debugfs_init(void)
+{
+	struct dentry *ent;
+	ent = debugfs_create_dir("cxl", NULL);
+	if (IS_ERR(ent))
+		return PTR_ERR(ent);
+	cxl_debugfs = ent;
+
+	return 0;
+}
+
+void cxl_debugfs_exit(void)
+{
+	debugfs_remove_recursive(cxl_debugfs);
+}
diff --git a/drivers/misc/cxl/fault.c b/drivers/misc/cxl/fault.c
new file mode 100644
index 0000000..f729c4a
--- /dev/null
+++ b/drivers/misc/cxl/fault.c
@@ -0,0 +1,298 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/workqueue.h>
+#include <linux/sched.h>
+#include <linux/pid.h>
+#include <linux/mm.h>
+#include <linux/moduleparam.h>
+
+#undef MODULE_PARAM_PREFIX
+#define MODULE_PARAM_PREFIX "cxl" "."
+#include <asm/current.h>
+#include <asm/copro.h>
+#include <asm/mmu.h>
+
+#include "cxl.h"
+
+bool cxl_fault_debug = false;
+
+static struct cxl_sste *find_free_sste(struct cxl_sste *primary_group,
+				       bool sec_hash,
+				       struct cxl_sste *secondary_group,
+				       unsigned int *lru)
+{
+	unsigned int i, entry;
+	struct cxl_sste *sste, *group = primary_group;
+
+	for (i = 0; i < 2; i++) {
+		for (entry = 0; entry < 8; entry++) {
+			sste = group + entry;
+			if (!(sste->esid_data & SLB_ESID_V))
+				return sste;
+		}
+		if (!sec_hash)
+			break;
+		group = secondary_group;
+	}
+	/* Nothing free, select an entry to cast out */
+	if (sec_hash && (*lru & 0x8))
+		sste = secondary_group + (*lru & 0x7);
+	else
+		sste = primary_group + (*lru & 0x7);
+	*lru = (*lru + 1) & 0xf;
+
+	return sste;
+}
+
+static void cxl_load_segment(struct cxl_context_t *ctx, u64 esid_data,
+			     u64 vsid_data)
+{
+	/* mask selects the group index; we search primary and secondary here. */
+	unsigned int mask = (ctx->sst_size >> 7) - 1; /* SSTP0[SegTableSize] */
+	bool sec_hash;
+	struct cxl_sste *sste;
+	unsigned int hash;
+
+	WARN_ON_SMP(!spin_is_locked(&ctx->sst_lock));
+
+	sec_hash = !!(cxl_p1n_read(ctx->afu, CXL_PSL_SR_An) & CXL_PSL_SR_An_SC);
+
+	if (vsid_data & SLB_VSID_B_1T)
+		hash = (esid_data >> SID_SHIFT_1T) & mask;
+	else /* 256M */
+		hash = (esid_data >> SID_SHIFT) & mask;
+
+	sste = find_free_sste(ctx->sstp + (hash << 3), sec_hash,
+			      ctx->sstp + ((~hash & mask) << 3), &ctx->sst_lru);
+
+	pr_devel("CXL Populating SST[%li]: %#llx %#llx\n",
+			sste - ctx->sstp, vsid_data, esid_data);
+
+	sste->vsid_data = cpu_to_be64(vsid_data);
+	sste->esid_data = cpu_to_be64(esid_data);
+}
+
+static int cxl_fault_segment(struct cxl_context_t *ctx, struct mm_struct *mm,
+			     u64 ea)
+{
+	u64 vsid_data = 0, esid_data = 0;
+	unsigned long flags;
+	int rc;
+
+	spin_lock_irqsave(&ctx->sst_lock, flags);
+	if (!(rc = copro_data_segment(mm, ea, &esid_data, &vsid_data))) {
+		cxl_load_segment(ctx, esid_data, vsid_data);
+	}
+	spin_unlock_irqrestore(&ctx->sst_lock, flags);
+
+	return rc;
+}
+
+static void cxl_ack_ae(struct cxl_context_t *ctx)
+{
+	unsigned long flags;
+
+	cxl_ops->ack_irq(ctx, CXL_PSL_TFC_An_AE, 0);
+
+	spin_lock_irqsave(&ctx->lock, flags);
+	ctx->pending_fault = true;
+	ctx->fault_addr = ctx->dar;
+	spin_unlock_irqrestore(&ctx->lock, flags);
+
+	wake_up_all(&ctx->wq);
+}
+
+static int cxl_handle_segment_miss(struct cxl_context_t *ctx,
+				   struct mm_struct *mm, u64 ea)
+{
+	int rc;
+
+	pr_devel("CXL interrupt: Segment fault pe: %i ea: %#llx\n", ctx->ph, ea);
+
+	if ((rc = cxl_fault_segment(ctx, mm, ea))) {
+		cxl_ack_ae(ctx);
+	} else {
+		mb(); /* Order seg table write to TFC MMIO write */
+		cxl_ops->ack_irq(ctx, CXL_PSL_TFC_An_R, 0);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void cxl_handle_page_fault(struct cxl_context_t *ctx,
+				  struct mm_struct *mm, u64 dsisr, u64 dar)
+{
+	unsigned int flt = 0;
+	int result;
+	unsigned long access, flags;
+
+	if ((result = copro_handle_mm_fault(mm, dar, dsisr, &flt))) {
+		pr_devel("copro_handle_mm_fault failed: %#x\n", result);
+		return cxl_ack_ae(ctx);
+	}
+
+	/*
+	 * update_mmu_cache() will not have loaded the hash since current->trap
+	 * is not a 0x400 or 0x300, so just call hash_page_mm() here.
+	 */
+	access = _PAGE_PRESENT;
+	if (dsisr & CXL_PSL_DSISR_An_S)
+		access |= _PAGE_RW;
+	if ((!ctx->kernel) || !(dar & (1ULL << 63)))
+		access |= _PAGE_USER;
+	local_irq_save(flags);
+	hash_page_mm(mm, dar, access, 0x300);
+	local_irq_restore(flags);
+
+	pr_devel("Page fault successfully handled for pe: %i!\n", ctx->ph);
+	cxl_ops->ack_irq(ctx, CXL_PSL_TFC_An_R, 0);
+}
+
+void cxl_handle_fault(struct work_struct *fault_work)
+{
+	struct cxl_context_t *ctx =
+		container_of(fault_work, struct cxl_context_t, fault_work);
+	u64 dsisr = ctx->dsisr;
+	u64 dar = ctx->dar;
+	struct task_struct *task;
+	struct mm_struct *mm;
+
+	if (cxl_p2n_read(ctx->afu, CXL_PSL_DSISR_An) != dsisr ||
+	    cxl_p2n_read(ctx->afu, CXL_PSL_DAR_An) != dar ||
+	    cxl_p2n_read(ctx->afu, CXL_PSL_PEHandle_An) != ctx->ph) {
+		/*
+		 * Most likely explanation is harmless - a dedicated process
+		 * has detached and these were cleared by the PSL purge, but
+		 * warn about it just in case.
+		 */
+		dev_notice(&ctx->afu->dev, "cxl_handle_fault: Translation fault regs changed\n");
+		return;
+	}
+
+	pr_devel("CXL BOTTOM HALF handling fault for afu pe: %i. "
+		"DSISR: %#llx DAR: %#llx\n", ctx->ph, dsisr, dar);
+
+	if (!(task = get_pid_task(ctx->pid, PIDTYPE_PID))) {
+		pr_devel("cxl_handle_fault unable to get task %i\n",
+			 pid_nr(ctx->pid));
+		cxl_ack_ae(ctx);
+		return;
+	}
+	if (!(mm = get_task_mm(task))) {
+		pr_devel("cxl_handle_fault unable to get mm %i\n",
+			 pid_nr(ctx->pid));
+		cxl_ack_ae(ctx);
+		goto out;
+	}
+
+	if (dsisr & CXL_PSL_DSISR_An_DS)
+		cxl_handle_segment_miss(ctx, mm, dar);
+	else if (dsisr & CXL_PSL_DSISR_An_DM)
+		cxl_handle_page_fault(ctx, mm, dsisr, dar);
+	else
+		WARN(1, "cxl_handle_fault has nothing to handle\n");
+
+	mmput(mm);
+out:
+	put_task_struct(task);
+}
+
+static void cxl_prefault_one(struct cxl_context_t *ctx, u64 ea)
+{
+	int rc;
+	struct task_struct *task;
+	struct mm_struct *mm;
+
+	if (!(task = get_pid_task(ctx->pid, PIDTYPE_PID))) {
+		pr_devel("cxl_prefault_one unable to get task %i\n",
+			 pid_nr(ctx->pid));
+		return;
+	}
+	if (!(mm = get_task_mm(task))) {
+		pr_devel("cxl_prefault_one unable to get mm %i\n",
+			 pid_nr(ctx->pid));
+		put_task_struct(task);
+		return;
+	}
+
+	rc = cxl_fault_segment(ctx, mm, ea);
+
+	mmput(mm);
+	put_task_struct(task);
+}
+
+static u64 next_segment(u64 ea, u64 vsid_data)
+{
+	if (vsid_data & SLB_VSID_B_1T)
+		ea |= (1ULL << 40) - 1;
+	else
+		ea |= (1ULL << 28) - 1;
+
+	return ea + 1;
+}
+
+static void cxl_prefault_vma(struct cxl_context_t *ctx)
+{
+	u64 ea, vsid_data = 0, esid_data, last_esid_data = 0;
+	struct vm_area_struct *vma;
+	int rc;
+	struct task_struct *task;
+	struct mm_struct *mm;
+	unsigned long flags;
+
+	if (!(task = get_pid_task(ctx->pid, PIDTYPE_PID))) {
+		pr_devel("cxl_prefault_vma unable to get task %i\n",
+			 pid_nr(ctx->pid));
+		return;
+	}
+	if (!(mm = get_task_mm(task))) {
+		pr_devel("cxl_prefault_vma unable to get mm %i\n",
+			 pid_nr(ctx->pid));
+		goto out1;
+	}
+
+	/*
+	 * down_read() can sleep, so we must not hold sst_lock (a spinlock,
+	 * taken with IRQs off) across the VMA walk; take it only around the
+	 * segment table update itself.
+	 */
+	down_read(&mm->mmap_sem);
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		for (ea = vma->vm_start; ea < vma->vm_end;
+				ea = next_segment(ea, vsid_data)) {
+			rc = copro_data_segment(mm, ea, &esid_data, &vsid_data);
+			if (rc)
+				continue;
+
+			if (last_esid_data == esid_data)
+				continue;
+
+			spin_lock_irqsave(&ctx->sst_lock, flags);
+			cxl_load_segment(ctx, esid_data, vsid_data);
+			spin_unlock_irqrestore(&ctx->sst_lock, flags);
+			last_esid_data = esid_data;
+		}
+	}
+	up_read(&mm->mmap_sem);
+
+	mmput(mm);
+out1:
+	put_task_struct(task);
+}
+
+void cxl_prefault(struct cxl_context_t *ctx, u64 wed)
+{
+	switch (ctx->afu->prefault_mode) {
+	case CXL_PREFAULT_WED:
+		cxl_prefault_one(ctx, wed);
+		break;
+	case CXL_PREFAULT_ALL:
+		cxl_prefault_vma(ctx);
+		break;
+	default:
+		break;
+	}
+}
diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
new file mode 100644
index 0000000..fb87ce3
--- /dev/null
+++ b/drivers/misc/cxl/file.c
@@ -0,0 +1,503 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/spinlock.h>
+#include <linux/module.h>
+#include <linux/export.h>
+#include <linux/kernel.h>
+#include <linux/bitmap.h>
+#include <linux/sched.h>
+#include <linux/poll.h>
+#include <linux/pid.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <asm/cputable.h>
+#include <asm/current.h>
+#include <asm/copro.h>
+
+#include "cxl.h"
+
+#define CXL_NUM_MINORS 256 /* Total to reserve */
+#define CXL_DEV_MINORS 9   /* 1 control + 4 AFUs * 2 (master/shared) */
+
+#define CXL_CARD_MINOR(adapter) (adapter->adapter_num * CXL_DEV_MINORS)
+#define CXL_AFU_MINOR(afu) (CXL_CARD_MINOR(afu->adapter) + 1 + (2 * afu->slice))
+#define CXL_AFU_MINOR_M(afu) (CXL_AFU_MINOR(afu) + 1)
+#define CXL_AFU_MKDEV(afu) MKDEV(MAJOR(cxl_dev), CXL_AFU_MINOR(afu))
+#define CXL_AFU_MKDEV_M(afu) MKDEV(MAJOR(cxl_dev), CXL_AFU_MINOR_M(afu))
+
+#define CXL_DEVT_ADAPTER(dev) (MINOR(dev) / CXL_DEV_MINORS)
+#define CXL_DEVT_AFU(dev) ((MINOR(dev) % CXL_DEV_MINORS - 1) / 2)
+
+#define CXL_DEVT_IS_CARD(dev) (MINOR(dev) % CXL_DEV_MINORS == 0)
+#define CXL_DEVT_IS_AFU(dev) (!CXL_DEVT_IS_CARD(dev))
+#define _CXL_DEVT_IS_AFU_S(dev) (((MINOR(dev) % CXL_DEV_MINORS) % 2) == 1)
+#define CXL_DEVT_IS_AFU_S(dev) (!CXL_DEVT_IS_CARD(dev) && _CXL_DEVT_IS_AFU_S(dev))
+#define CXL_DEVT_IS_AFU_M(dev) (!CXL_DEVT_IS_CARD(dev) && !_CXL_DEVT_IS_AFU_S(dev))
+
+dev_t cxl_dev;
+
+struct class *cxl_class;
+EXPORT_SYMBOL(cxl_class);
+
+static int __afu_open(struct inode *inode, struct file *file, bool master)
+{
+	struct cxl_t *adapter;
+	struct cxl_afu_t *afu;
+	struct cxl_context_t *ctx;
+	int adapter_num = CXL_DEVT_ADAPTER(inode->i_rdev);
+	int slice = CXL_DEVT_AFU(inode->i_rdev);
+	int rc = -ENODEV;
+
+	pr_devel("afu_open afu%i.%i\n", slice, adapter_num);
+
+	if (!(adapter = get_cxl_adapter(adapter_num)))
+		return -ENODEV;
+
+	if (!try_module_get(adapter->driver->module))
+		goto err_put_adapter;
+
+	if (slice >= adapter->slices)
+		goto err_put_module;
+
+	spin_lock(&adapter->afu_list_lock);
+	if (!(afu = adapter->afu[slice])) {
+		spin_unlock(&adapter->afu_list_lock);
+		goto err_put_module;
+	}
+	get_device(&afu->dev);
+	spin_unlock(&adapter->afu_list_lock);
+
+	if (!afu->current_model)
+		goto err_put_afu;
+
+	if (!(ctx = cxl_context_alloc())) {
+		rc = -ENOMEM;
+		goto err_put_afu;
+	}
+
+	if ((rc = cxl_context_init(ctx, afu, master)))
+		goto err_put_afu;
+
+	pr_devel("afu_open pe: %i\n", ctx->ph);
+	file->private_data = ctx;
+	cxl_ctx_get();
+
+	/* Our ref on the AFU will now hold the adapter */
+	put_device(&adapter->dev);
+
+	return 0;
+
+err_put_afu:
+	put_device(&afu->dev);
+err_put_module:
+	module_put(adapter->driver->module);
+err_put_adapter:
+	put_device(&adapter->dev);
+	return rc;
+}
+static int afu_open(struct inode *inode, struct file *file)
+{
+	return __afu_open(inode, file, false);
+}
+
+static int afu_master_open(struct inode *inode, struct file *file)
+{
+	return __afu_open(inode, file, true);
+}
+
+static int afu_release(struct inode *inode, struct file *file)
+{
+	struct cxl_context_t *ctx = file->private_data;
+
+	pr_devel("%s: closing cxl file descriptor. pe: %i\n",
+		 __func__, ctx->ph);
+	cxl_context_detach(ctx);
+
+	module_put(ctx->afu->adapter->driver->module);
+
+	put_device(&ctx->afu->dev);
+
+	/* It should be safe to remove the context now */
+	cxl_context_free(ctx);
+
+	cxl_ctx_put();
+	return 0;
+}
+
+static long afu_ioctl_start_work(struct cxl_context_t *ctx,
+		     struct cxl_ioctl_start_work __user *uwork)
+{
+	struct cxl_ioctl_start_work work;
+	u64 amr;
+	int rc;
+
+	pr_devel("afu_ioctl: pe: %i CXL_START_WORK\n", ctx->ph);
+
+	if (ctx->status != OPENED)
+		return -EIO;
+
+	if (copy_from_user(&work, uwork,
+			   sizeof(struct cxl_ioctl_start_work)))
+		return -EFAULT;
+
+	if (work.reserved1 || work.reserved2 || work.reserved3 ||
+	    work.reserved4 || work.reserved5 || work.reserved6)
+		return -EINVAL;
+
+	if (work.num_interrupts == -1)
+		work.num_interrupts = ctx->afu->pp_irqs;
+	else if ((work.num_interrupts < ctx->afu->pp_irqs) ||
+		 (work.num_interrupts > ctx->afu->irqs_max))
+		return -EINVAL;
+	if ((rc = afu_register_irqs(ctx, work.num_interrupts)))
+		return rc;
+
+	amr = work.amr & mfspr(SPRN_UAMOR);
+
+	work.process_element = ctx->ph;
+
+	/* Returns PE and number of interrupts */
+	if (copy_to_user(uwork, &work,
+			 sizeof(struct cxl_ioctl_start_work)))
+		return -EFAULT;
+
+	if ((rc = cxl_ops->attach_process(ctx, false, work.wed, amr)))
+		return rc;
+
+	ctx->status = STARTED;
+
+	return 0;
+}
+
+static long afu_ioctl_check_error(struct cxl_context_t *ctx)
+{
+	if (ctx->status != STARTED)
+		return -EIO;
+
+	if (cxl_ops->check_error && cxl_ops->check_error(ctx->afu)) {
+		/* This may not be enough for some errors.  May need to PERST
+		 * the card in some cases if it's very broken.
+		 */
+		return cxl_ops->afu_reset(ctx->afu);
+	}
+	return -EPERM;
+}
+
+static long afu_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	struct cxl_context_t *ctx = file->private_data;
+
+	if (ctx->status == CLOSED)
+		return -EIO;
+
+	pr_devel("afu_ioctl\n");
+	switch (cmd) {
+	case CXL_IOCTL_START_WORK:
+		return afu_ioctl_start_work(ctx,
+			(struct cxl_ioctl_start_work __user *)arg);
+	case CXL_IOCTL_CHECK_ERROR:
+		return afu_ioctl_check_error(ctx);
+	}
+	return -EINVAL;
+}
+
+static long afu_compat_ioctl(struct file *file, unsigned int cmd,
+			     unsigned long arg)
+{
+	return afu_ioctl(file, cmd, arg);
+}
+
+static int afu_mmap(struct file *file, struct vm_area_struct *vm)
+{
+	struct cxl_context_t *ctx = file->private_data;
+
+	/* AFU must be started before we can MMIO */
+	if (ctx->status != STARTED)
+		return -EIO;
+
+	return cxl_context_iomap(ctx, vm);
+}
+
+static unsigned int afu_poll(struct file *file, struct poll_table_struct *poll)
+{
+	struct cxl_context_t *ctx = file->private_data;
+	int mask = 0;
+	unsigned long flags;
+
+	poll_wait(file, &ctx->wq, poll);
+
+	pr_devel("afu_poll wait done pe: %i\n", ctx->ph);
+
+	spin_lock_irqsave(&ctx->lock, flags);
+	if (ctx->pending_irq || ctx->pending_fault ||
+	    ctx->pending_afu_err)
+		mask |= POLLIN | POLLRDNORM;
+	else if (ctx->status == CLOSED)
+		/* Only error on closed when there are no further events pending */
+		mask |= POLLERR;
+	spin_unlock_irqrestore(&ctx->lock, flags);
+
+	pr_devel("afu_poll pe: %i returning %#x\n", ctx->ph, mask);
+
+	return mask;
+}
+
+static ssize_t afu_read(struct file *file, char __user *buf, size_t count,
+			loff_t *off)
+{
+	struct cxl_context_t *ctx = file->private_data;
+	struct cxl_event event;
+	unsigned long flags;
+	ssize_t size;
+	DEFINE_WAIT(wait);
+
+	if (count < sizeof(struct cxl_event_header))
+		return -EINVAL;
+
+	while (1) {
+		spin_lock_irqsave(&ctx->lock, flags);
+		if (ctx->pending_irq || ctx->pending_fault ||
+		    ctx->pending_afu_err || (ctx->status == CLOSED))
+			break;
+		spin_unlock_irqrestore(&ctx->lock, flags);
+
+		if (file->f_flags & O_NONBLOCK)
+			return -EAGAIN;
+
+		prepare_to_wait(&ctx->wq, &wait, TASK_INTERRUPTIBLE);
+		if (!(ctx->pending_irq || ctx->pending_fault ||
+		      ctx->pending_afu_err || (ctx->status == CLOSED))) {
+			pr_devel("afu_read going to sleep...\n");
+			schedule();
+			pr_devel("afu_read woken up\n");
+		}
+		finish_wait(&ctx->wq, &wait);
+
+		if (signal_pending(current))
+			return -ERESTARTSYS;
+	}
+
+	memset(&event, 0, sizeof(event));
+	event.header.process_element = ctx->ph;
+	if (ctx->pending_irq) {
+		pr_devel("afu_read delivering AFU interrupt\n");
+		event.header.size = sizeof(struct cxl_event_afu_interrupt);
+		event.header.type = CXL_EVENT_AFU_INTERRUPT;
+		event.irq.irq = find_first_bit(ctx->irq_bitmap, ctx->irq_count) + 1;
+
+		/* Only clear the IRQ if we can send the whole event: */
+		if (count >= event.header.size) {
+			clear_bit(event.irq.irq - 1, ctx->irq_bitmap);
+			if (bitmap_empty(ctx->irq_bitmap, ctx->irq_count))
+				ctx->pending_irq = false;
+		}
+	} else if (ctx->pending_fault) {
+		pr_devel("afu_read delivering data storage fault\n");
+		event.header.size = sizeof(struct cxl_event_data_storage);
+		event.header.type = CXL_EVENT_DATA_STORAGE;
+		event.fault.addr = ctx->fault_addr;
+
+		/* Only clear the fault if we can send the whole event: */
+		if (count >= event.header.size)
+			ctx->pending_fault = false;
+	} else if (ctx->pending_afu_err) {
+		pr_devel("afu_read delivering afu error\n");
+		event.header.size = sizeof(struct cxl_event_afu_error);
+		event.header.type = CXL_EVENT_AFU_ERROR;
+		event.afu_err.err = ctx->afu_err;
+
+		/* Only clear the fault if we can send the whole event: */
+		if (count >= event.header.size)
+			ctx->pending_afu_err = false;
+	} else if (ctx->status == CLOSED) {
+		pr_devel("afu_read fatal error\n");
+		spin_unlock_irqrestore(&ctx->lock, flags);
+		return -EIO;
+	} else
+		WARN(1, "afu_read must be buggy\n");
+
+	spin_unlock_irqrestore(&ctx->lock, flags);
+
+	size = min_t(size_t, count, event.header.size);
+	if (copy_to_user(buf, &event, size))
+		return -EFAULT;
+
+	return size;
+}
+
+static const struct file_operations afu_fops = {
+	.owner		= THIS_MODULE,
+	.open           = afu_open,
+	.poll		= afu_poll,
+	.read		= afu_read,
+	.release        = afu_release,
+	.unlocked_ioctl = afu_ioctl,
+	.compat_ioctl   = afu_compat_ioctl,
+	.mmap           = afu_mmap,
+};
+
+static const struct file_operations afu_master_fops = {
+	.owner		= THIS_MODULE,
+	.open           = afu_master_open,
+	.poll		= afu_poll,
+	.read		= afu_read,
+	.release        = afu_release,
+	.unlocked_ioctl = afu_ioctl,
+	.compat_ioctl   = afu_compat_ioctl,
+	.mmap           = afu_mmap,
+};
+
+
+static char *cxl_devnode(struct device *dev, umode_t *mode)
+{
+	struct cxl_afu_t *afu;
+
+	if (CXL_DEVT_IS_CARD(dev->devt)) {
+		/* These minor numbers will eventually be used to program the
+		 * PSL and AFUs once we have dynamic reprogramming support */
+		return NULL;
+	} else { /* CXL_DEVT_IS_AFU */
+		/* Default character devices in each programming model just get
+		 * named /dev/cxl/afuX.Y */
+		afu = dev_get_drvdata(dev);
+		if ((afu->current_model == CXL_MODEL_DEDICATED) &&
+				CXL_DEVT_IS_AFU_M(dev->devt))
+			return kasprintf(GFP_KERNEL, "cxl/%s", dev_name(&afu->dev));
+		if ((afu->current_model == CXL_MODEL_DIRECTED) &&
+				CXL_DEVT_IS_AFU_S(dev->devt))
+			return kasprintf(GFP_KERNEL, "cxl/%s", dev_name(&afu->dev));
+	}
+	return kasprintf(GFP_KERNEL, "cxl/%s", dev_name(dev));
+}
+
+extern struct class *cxl_class;
+
+int cxl_chardev_m_afu_add(struct cxl_afu_t *afu)
+{
+	struct device *dev;
+	int rc;
+
+	cdev_init(&afu->afu_cdev_m, &afu_master_fops);
+	if ((rc = cdev_add(&afu->afu_cdev_m, CXL_AFU_MKDEV_M(afu), 1))) {
+		dev_err(&afu->dev, "Unable to add master chardev: %i\n", rc);
+		return rc;
+	}
+
+	dev = device_create(cxl_class, &afu->dev, CXL_AFU_MKDEV_M(afu), afu,
+			"afu%i.%im", afu->adapter->adapter_num, afu->slice);
+	if (IS_ERR(dev)) {
+		rc = PTR_ERR(dev);
+		dev_err(&afu->dev, "Unable to create master chardev in sysfs: %i\n", rc);
+		goto err;
+	}
+
+	afu->chardev_m = dev;
+
+	return 0;
+err:
+	cdev_del(&afu->afu_cdev_m);
+	return rc;
+}
+
+int cxl_chardev_s_afu_add(struct cxl_afu_t *afu)
+{
+	struct device *dev;
+	int rc;
+
+	cdev_init(&afu->afu_cdev_s, &afu_fops);
+	if ((rc = cdev_add(&afu->afu_cdev_s, CXL_AFU_MKDEV(afu), 1))) {
+		dev_err(&afu->dev, "Unable to add shared chardev: %i\n", rc);
+		return rc;
+	}
+
+	dev = device_create(cxl_class, &afu->dev, CXL_AFU_MKDEV(afu), afu,
+			"afu%i.%is", afu->adapter->adapter_num, afu->slice);
+	if (IS_ERR(dev)) {
+		rc = PTR_ERR(dev);
+		dev_err(&afu->dev, "Unable to create shared chardev in sysfs: %i\n", rc);
+		goto err;
+	}
+
+	afu->chardev_s = dev;
+
+	return 0;
+err:
+	cdev_del(&afu->afu_cdev_s);
+	return rc;
+}
+
+void cxl_chardev_afu_remove(struct cxl_afu_t *afu)
+{
+	if (afu->chardev_m) {
+		cdev_del(&afu->afu_cdev_m);
+		device_unregister(afu->chardev_m);
+	}
+	if (afu->chardev_s) {
+		cdev_del(&afu->afu_cdev_s);
+		device_unregister(afu->chardev_s);
+	}
+}
+
+int cxl_register_afu(struct cxl_afu_t *afu)
+{
+	afu->dev.class = cxl_class;
+
+	return device_register(&afu->dev);
+}
+EXPORT_SYMBOL(cxl_register_afu);
+
+int cxl_register_adapter(struct cxl_t *adapter)
+{
+	adapter->dev.class = cxl_class;
+
+	/* Future: When we support dynamically reprogramming the PSL & AFU we
+	 * will expose the interface to do that via a chardev:
+	 * adapter->dev.devt = CXL_CARD_MKDEV(adapter);
+	 */
+
+	return device_register(&adapter->dev);
+}
+EXPORT_SYMBOL(cxl_register_adapter);
+
+int __init cxl_file_init(void)
+{
+	int rc;
+
+	if ((rc = alloc_chrdev_region(&cxl_dev, 0, CXL_NUM_MINORS, "cxl"))) {
+		pr_err("Unable to allocate CXL major number: %i\n", rc);
+		return rc;
+	}
+
+	pr_devel("CXL device allocated, MAJOR %i\n", MAJOR(cxl_dev));
+
+	cxl_class = class_create(THIS_MODULE, "cxl");
+	if (IS_ERR(cxl_class)) {
+		pr_err("Unable to create CXL class\n");
+		rc = PTR_ERR(cxl_class);
+		goto err;
+	}
+	cxl_class->devnode = cxl_devnode;
+
+	return 0;
+
+err:
+	unregister_chrdev_region(cxl_dev, CXL_NUM_MINORS);
+	return rc;
+}
+
+void cxl_file_exit(void)
+{
+	unregister_chrdev_region(cxl_dev, CXL_NUM_MINORS);
+	class_destroy(cxl_class);
+}
diff --git a/drivers/misc/cxl/irq.c b/drivers/misc/cxl/irq.c
new file mode 100644
index 0000000..3e01e1d
--- /dev/null
+++ b/drivers/misc/cxl/irq.c
@@ -0,0 +1,405 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/interrupt.h>
+#include <linux/workqueue.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
+#include <linux/slab.h>
+#include <linux/pid.h>
+#include <asm/cputable.h>
+#include <misc/cxl.h>
+
+#include "cxl.h"
+
+/* XXX: This is implementation specific */
+static irqreturn_t handle_psl_slice_error(struct cxl_context_t *ctx, u64 dsisr, u64 errstat)
+{
+	u64 fir1, fir2, fir_slice, serr, afu_debug;
+
+	fir1 = cxl_p1_read(ctx->afu->adapter, CXL_PSL_FIR1);
+	fir2 = cxl_p1_read(ctx->afu->adapter, CXL_PSL_FIR2);
+	fir_slice = cxl_p1n_read(ctx->afu, CXL_PSL_FIR_SLICE_An);
+	serr = cxl_p1n_read(ctx->afu, CXL_PSL_SERR_An);
+	afu_debug = cxl_p1n_read(ctx->afu, CXL_AFU_DEBUG_An);
+
+	dev_crit(&ctx->afu->dev, "PSL ERROR STATUS: 0x%.16llx\n", errstat);
+	dev_crit(&ctx->afu->dev, "PSL_FIR1: 0x%.16llx\n", fir1);
+	dev_crit(&ctx->afu->dev, "PSL_FIR2: 0x%.16llx\n", fir2);
+	dev_crit(&ctx->afu->dev, "PSL_SERR_An: 0x%.16llx\n", serr);
+	dev_crit(&ctx->afu->dev, "PSL_FIR_SLICE_An: 0x%.16llx\n", fir_slice);
+	dev_crit(&ctx->afu->dev, "CXL_PSL_AFU_DEBUG_An: 0x%.16llx\n", afu_debug);
+
+	dev_crit(&ctx->afu->dev, "STOPPING CXL TRACE\n");
+	cxl_stop_trace(ctx->afu->adapter);
+
+	return cxl_ops->ack_irq(ctx, 0, errstat);
+}
+
+irqreturn_t cxl_slice_irq_err(int irq, void *data)
+{
+	struct cxl_afu_t *afu = data;
+	u64 fir_slice, errstat, serr, afu_debug;
+
+	WARN(1, "CXL SLICE ERROR interrupt %i\n", irq);
+
+	serr = cxl_p1n_read(afu, CXL_PSL_SERR_An);
+	fir_slice = cxl_p1n_read(afu, CXL_PSL_FIR_SLICE_An);
+	errstat = cxl_p2n_read(afu, CXL_PSL_ErrStat_An);
+	afu_debug = cxl_p1n_read(afu, CXL_AFU_DEBUG_An);
+	dev_crit(&afu->dev, "PSL_SERR_An: 0x%.16llx\n", serr);
+	dev_crit(&afu->dev, "PSL_FIR_SLICE_An: 0x%.16llx\n", fir_slice);
+	dev_crit(&afu->dev, "CXL_PSL_ErrStat_An: 0x%.16llx\n", errstat);
+	dev_crit(&afu->dev, "CXL_PSL_AFU_DEBUG_An: 0x%.16llx\n", afu_debug);
+
+	cxl_p1n_write(afu, CXL_PSL_SERR_An, serr);
+
+	return IRQ_HANDLED;
+}
+
+irqreturn_t cxl_irq_err(int irq, void *data)
+{
+	struct cxl_t *adapter = data;
+	u64 fir1, fir2, err_ivte;
+
+	WARN(1, "CXL ERROR interrupt %i\n", irq);
+
+	err_ivte = cxl_p1_read(adapter, CXL_PSL_ErrIVTE);
+	dev_crit(&adapter->dev, "PSL_ErrIVTE: 0x%.16llx\n", err_ivte);
+
+	dev_crit(&adapter->dev, "STOPPING CXL TRACE\n");
+	cxl_stop_trace(adapter);
+
+	fir1 = cxl_p1_read(adapter, CXL_PSL_FIR1);
+	fir2 = cxl_p1_read(adapter, CXL_PSL_FIR2);
+
+	dev_crit(&adapter->dev, "PSL_FIR1: 0x%.16llx\nPSL_FIR2: 0x%.16llx\n", fir1, fir2);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t schedule_cxl_fault(struct cxl_context_t *ctx, u64 dsisr, u64 dar)
+{
+	ctx->dsisr = dsisr;
+	ctx->dar = dar;
+	schedule_work(&ctx->fault_work);
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t cxl_irq(int irq, void *data)
+{
+	struct cxl_context_t *ctx = data;
+	struct cxl_irq_info irq_info;
+	u64 dsisr, dar;
+	int result;
+
+	if ((result = cxl_ops->get_irq(ctx, &irq_info))) {
+		WARN(1, "Unable to get CXL IRQ Info: %i\n", result);
+		return IRQ_HANDLED;
+	}
+
+	dsisr = irq_info.dsisr;
+	dar = irq_info.dar;
+
+	pr_devel("CXL interrupt %i for afu pe: %i DSISR: %#llx DAR: %#llx\n", irq, ctx->ph, dsisr, dar);
+
+	if (dsisr & CXL_PSL_DSISR_An_DS) {
+		/* We don't inherently need to sleep to handle this, but we do
+		 * need to get a ref to the task's mm, which we can't do from
+		 * irq context without the potential for a deadlock since it
+		 * takes the task_lock. An alternate option would be to keep a
+		 * reference to the task's mm the entire time it has cxl open,
+		 * but to do that we need to solve the issue where we hold a
+		 * ref to the mm, but the mm can hold a ref to the fd after an
+		 * mmap preventing anything from being cleaned up. */
+		pr_devel("Scheduling segment miss handling for later pe: %i\n", ctx->ph);
+		return schedule_cxl_fault(ctx, dsisr, dar);
+	}
+
+	if (dsisr & CXL_PSL_DSISR_An_M)
+		pr_devel("CXL interrupt: PTE not found\n");
+	if (dsisr & CXL_PSL_DSISR_An_P)
+		pr_devel("CXL interrupt: Storage protection violation\n");
+	if (dsisr & CXL_PSL_DSISR_An_A)
+		pr_devel("CXL interrupt: AFU lock access to write through or cache inhibited storage\n");
+	if (dsisr & CXL_PSL_DSISR_An_S)
+		pr_devel("CXL interrupt: Access was afu_wr or afu_zero\n");
+	if (dsisr & CXL_PSL_DSISR_An_K)
+		pr_devel("CXL interrupt: Access not permitted by virtual page class key protection\n");
+
+	if (dsisr & CXL_PSL_DSISR_An_DM) {
+		/* In some cases we might be able to handle the fault
+		 * immediately if hash_page would succeed, but we still need
+		 * the task's mm, which as above we can't get without a lock */
+		pr_devel("Scheduling page fault handling for later pe: %i\n", ctx->ph);
+		return schedule_cxl_fault(ctx, dsisr, dar);
+	}
+	if (dsisr & CXL_PSL_DSISR_An_ST)
+		WARN(1, "CXL interrupt: Segment Table PTE not found\n");
+	if (dsisr & CXL_PSL_DSISR_An_UR)
+		pr_devel("CXL interrupt: AURP PTE not found\n");
+	if (dsisr & CXL_PSL_DSISR_An_PE)
+		return handle_psl_slice_error(ctx, dsisr, irq_info.errstat);
+	if (dsisr & CXL_PSL_DSISR_An_AE) {
+		pr_devel("CXL interrupt: AFU Error %llx\n", irq_info.afu_err);
+
+		if (ctx->pending_afu_err) {
+			/* This shouldn't happen - the PSL treats these errors
+			 * as fatal and will have reset the AFU, so there's not
+			 * much point buffering multiple AFU errors.
+			 * OTOH if we DO ever see a storm of these come in it's
+			 * probably best that we log them somewhere: */
+			dev_err_ratelimited(&ctx->afu->dev, "CXL AFU Error "
+					    "undelivered to pe %i: %llx\n",
+					    ctx->ph, irq_info.afu_err);
+		} else {
+			spin_lock(&ctx->lock);
+			ctx->afu_err = irq_info.afu_err;
+			ctx->pending_afu_err = 1;
+			spin_unlock(&ctx->lock);
+
+			wake_up_all(&ctx->wq);
+		}
+
+		cxl_ops->ack_irq(ctx, CXL_PSL_TFC_An_A, 0);
+	}
+	if (dsisr & CXL_PSL_DSISR_An_OC)
+		pr_devel("CXL interrupt: OS Context Warning\n");
+
+	WARN(1, "Unhandled CXL PSL IRQ\n");
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t cxl_irq_multiplexed(int irq, void *data)
+{
+	struct cxl_afu_t *afu = data;
+	struct cxl_context_t *ctx;
+	int ph = cxl_p2n_read(afu, CXL_PSL_PEHandle_An) & 0xffff;
+	irqreturn_t ret;
+
+	rcu_read_lock();
+	ctx = idr_find(&afu->contexts_idr, ph);
+	if (ctx) {
+		ret = cxl_irq(irq, ctx);
+		rcu_read_unlock();
+		return ret;
+	}
+	rcu_read_unlock();
+
+	WARN(1, "Unable to demultiplex CXL PSL IRQ\n");
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t cxl_irq_afu(int irq, void *data)
+{
+	struct cxl_context_t *ctx = data;
+	irq_hw_number_t hwirq = irqd_to_hwirq(irq_get_irq_data(irq));
+	int irq_off, afu_irq = 1;
+	__u16 range;
+	int r;
+
+	for (r = 1; r < CXL_IRQ_RANGES; r++) {
+		irq_off = hwirq - ctx->irqs.offset[r];
+		range = ctx->irqs.range[r];
+		if (irq_off >= 0 && irq_off < range) {
+			afu_irq += irq_off;
+			break;
+		}
+		afu_irq += range;
+	}
+	if (unlikely(r >= CXL_IRQ_RANGES)) {
+		WARN(1, "Received AFU IRQ out of range for pe %i (virq %i hwirq %lx)\n",
+		     ctx->ph, irq, hwirq);
+		return IRQ_HANDLED;
+	}
+
+	pr_devel("Received AFU interrupt %i for pe: %i (virq %i hwirq %lx)\n",
+	       afu_irq, ctx->ph, irq, hwirq);
+
+	if (unlikely(!ctx->irq_bitmap)) {
+		WARN(1, "Received AFU IRQ for context with no IRQ bitmap\n");
+		return IRQ_HANDLED;
+	}
+	spin_lock(&ctx->lock);
+	set_bit(afu_irq - 1, ctx->irq_bitmap);
+	ctx->pending_irq = true;
+	spin_unlock(&ctx->lock);
+
+	wake_up_all(&ctx->wq);
+
+	return IRQ_HANDLED;
+}
+
+unsigned int cxl_map_irq(struct cxl_t *adapter, irq_hw_number_t hwirq,
+			 irq_handler_t handler, void *cookie)
+{
+	unsigned int virq;
+	int result;
+
+	/* IRQ Domain? */
+	virq = irq_create_mapping(NULL, hwirq);
+	if (!virq) {
+		dev_warn(&adapter->dev, "cxl_map_irq: irq_create_mapping failed\n");
+		return 0;
+	}
+
+	if (adapter->driver->setup_irq)
+		adapter->driver->setup_irq(adapter, hwirq, virq);
+
+	pr_devel("hwirq %#lx mapped to virq %u\n", hwirq, virq);
+
+	result = request_irq(virq, handler, 0, "cxl", cookie);
+	if (result) {
+		dev_warn(&adapter->dev, "cxl_map_irq: request_irq failed: %i\n", result);
+		return 0;
+	}
+
+	return virq;
+}
+
+void cxl_unmap_irq(unsigned int virq, void *cookie)
+{
+	free_irq(virq, cookie);
+	irq_dispose_mapping(virq);
+}
+
+static int cxl_register_one_irq(struct cxl_t *adapter,
+				irq_handler_t handler,
+				void *cookie,
+				irq_hw_number_t *dest_hwirq,
+				unsigned int *dest_virq)
+{
+	int hwirq, virq;
+
+	if ((hwirq = adapter->driver->alloc_one_irq(adapter)) < 0)
+		return hwirq;
+
+	if (!(virq = cxl_map_irq(adapter, hwirq, handler, cookie)))
+		goto err;
+
+	*dest_hwirq = hwirq;
+	*dest_virq = virq;
+
+	return 0;
+
+err:
+	adapter->driver->release_one_irq(adapter, hwirq);
+	return -ENOMEM;
+}
+
+int cxl_register_psl_err_irq(struct cxl_t *adapter)
+{
+	int rc;
+
+	if ((rc = cxl_register_one_irq(adapter, cxl_irq_err, adapter,
+				       &adapter->err_hwirq,
+				       &adapter->err_virq)))
+		return rc;
+
+	cxl_p1_write(adapter, CXL_PSL_ErrIVTE, adapter->err_hwirq & 0xffff);
+
+	return 0;
+}
+EXPORT_SYMBOL(cxl_register_psl_err_irq);
+
+void cxl_release_psl_err_irq(struct cxl_t *adapter)
+{
+	cxl_p1_write(adapter, CXL_PSL_ErrIVTE, 0x0000000000000000);
+	cxl_unmap_irq(adapter->err_virq, adapter);
+	adapter->driver->release_one_irq(adapter, adapter->err_hwirq);
+}
+EXPORT_SYMBOL(cxl_release_psl_err_irq);
+
+int cxl_register_serr_irq(struct cxl_afu_t *afu)
+{
+	u64 serr;
+	int rc;
+
+	if ((rc = cxl_register_one_irq(afu->adapter, cxl_slice_irq_err, afu,
+				       &afu->serr_hwirq,
+				       &afu->serr_virq)))
+		return rc;
+
+	serr = cxl_p1n_read(afu, CXL_PSL_SERR_An);
+	serr = (serr & 0x00ffffffffff0000ULL) | (afu->serr_hwirq & 0xffff);
+	cxl_p1n_write(afu, CXL_PSL_SERR_An, serr);
+
+	return 0;
+}
+EXPORT_SYMBOL(cxl_register_serr_irq);
+
+void cxl_release_serr_irq(struct cxl_afu_t *afu)
+{
+	cxl_p1n_write(afu, CXL_PSL_SERR_An, 0x0000000000000000);
+	cxl_unmap_irq(afu->serr_virq, afu);
+	afu->adapter->driver->release_one_irq(afu->adapter, afu->serr_hwirq);
+}
+EXPORT_SYMBOL(cxl_release_serr_irq);
+
+int cxl_register_psl_irq(struct cxl_afu_t *afu)
+{
+	return cxl_register_one_irq(afu->adapter, cxl_irq_multiplexed, afu,
+			&afu->psl_hwirq, &afu->psl_virq);
+}
+EXPORT_SYMBOL(cxl_register_psl_irq);
+
+void cxl_release_psl_irq(struct cxl_afu_t *afu)
+{
+	cxl_unmap_irq(afu->psl_virq, afu);
+	afu->adapter->driver->release_one_irq(afu->adapter, afu->psl_hwirq);
+}
+EXPORT_SYMBOL(cxl_release_psl_irq);
+
+int afu_register_irqs(struct cxl_context_t *ctx, u32 count)
+{
+	irq_hw_number_t hwirq;
+	int rc, r, i;
+
+	if ((rc = ctx->afu->adapter->driver->alloc_irq_ranges(&ctx->irqs, ctx->afu->adapter, count)))
+		return rc;
+
+	/* Multiplexed PSL Interrupt */
+	ctx->irqs.offset[0] = ctx->afu->psl_hwirq;
+	ctx->irqs.range[0] = 1;
+
+	ctx->irq_count = count;
+	ctx->irq_bitmap = kcalloc(BITS_TO_LONGS(count),
+				  sizeof(*ctx->irq_bitmap), GFP_KERNEL);
+	if (!ctx->irq_bitmap) {
+		ctx->afu->adapter->driver->release_irq_ranges(&ctx->irqs, ctx->afu->adapter);
+		return -ENOMEM;
+	}
+	for (r = 1; r < CXL_IRQ_RANGES; r++) {
+		hwirq = ctx->irqs.offset[r];
+		for (i = 0; i < ctx->irqs.range[r]; hwirq++, i++) {
+			cxl_map_irq(ctx->afu->adapter, hwirq,
+				     cxl_irq_afu, ctx);
+		}
+	}
+
+	return 0;
+}
+
+void afu_release_irqs(struct cxl_context_t *ctx)
+{
+	irq_hw_number_t hwirq;
+	unsigned int virq;
+	int r, i;
+
+	for (r = 1; r < CXL_IRQ_RANGES; r++) {
+		hwirq = ctx->irqs.offset[r];
+		for (i = 0; i < ctx->irqs.range[r]; hwirq++, i++) {
+			virq = irq_find_mapping(NULL, hwirq);
+			if (virq)
+				cxl_unmap_irq(virq, ctx);
+		}
+	}
+
+	ctx->afu->adapter->driver->release_irq_ranges(&ctx->irqs, ctx->afu->adapter);
+}
diff --git a/drivers/misc/cxl/main.c b/drivers/misc/cxl/main.c
new file mode 100644
index 0000000..fb0e0fc
--- /dev/null
+++ b/drivers/misc/cxl/main.c
@@ -0,0 +1,238 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/spinlock.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/mutex.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/of.h>
+#include <linux/slab.h>
+#include <linux/idr.h>
+#include <asm/cputable.h>
+#include <misc/cxl.h>
+
+#include "cxl.h"
+
+static DEFINE_SPINLOCK(adapter_idr_lock);
+static DEFINE_IDR(cxl_adapter_idr);
+
+const struct cxl_backend_ops *cxl_ops;
+EXPORT_SYMBOL(cxl_ops);
+
+uint cxl_verbose;
+EXPORT_SYMBOL(cxl_verbose);
+module_param_named(verbose, cxl_verbose, uint, 0600);
+MODULE_PARM_DESC(verbose, "Enable verbose dmesg output");
+
+static inline void cxl_slbia_core(struct mm_struct *mm)
+{
+	struct cxl_t *adapter;
+	struct cxl_afu_t *afu;
+	struct cxl_context_t *ctx;
+	struct task_struct *task;
+	unsigned long flags;
+	int card, slice, id;
+
+	pr_devel("%s called\n", __func__);
+
+	spin_lock(&adapter_idr_lock);
+	idr_for_each_entry(&cxl_adapter_idr, adapter, card) {
+		/* XXX: Make this lookup faster with link from mm to ctx */
+		spin_lock(&adapter->afu_list_lock);
+		for (slice = 0; slice < adapter->slices; slice++) {
+			afu = adapter->afu[slice];
+			if (!afu->enabled)
+				continue;
+			rcu_read_lock();
+			idr_for_each_entry(&afu->contexts_idr, ctx, id) {
+				if (!(task = get_pid_task(ctx->pid, PIDTYPE_PID))) {
+					pr_devel("%s unable to get task %i\n",
+						 __func__, pid_nr(ctx->pid));
+					continue;
+				}
+
+				if (task->mm != mm)
+					goto next;
+
+				pr_devel("%s matched mm - card: %i afu: %i pe: %i\n",
+					 __func__, adapter->adapter_num, slice, ctx->ph);
+
+				spin_lock_irqsave(&ctx->sst_lock, flags);
+				if (!ctx->sstp)
+					goto next_unlock;
+				memset(ctx->sstp, 0, ctx->sst_size);
+				mb();
+				cxl_ops->slbia(afu);
+
+next_unlock:
+				spin_unlock_irqrestore(&ctx->sst_lock, flags);
+next:
+				put_task_struct(task);
+			}
+			rcu_read_unlock();
+		}
+		spin_unlock(&adapter->afu_list_lock);
+	}
+	spin_unlock(&adapter_idr_lock);
+}
+
+struct cxl_calls cxl_calls = {
+	.cxl_slbia = cxl_slbia_core,
+	.owner = THIS_MODULE,
+};
+
+int cxl_alloc_sst(struct cxl_context_t *ctx, u64 *sstp0, u64 *sstp1)
+{
+	unsigned long vsid, flags;
+	u64 ea_mask;
+	u64 size;
+
+	*sstp0 = 0;
+	*sstp1 = 0;
+
+	ctx->sst_size = PAGE_SIZE;
+	ctx->sst_lru = 0;
+	if (!ctx->sstp) {
+		ctx->sstp = (struct cxl_sste *)get_zeroed_page(GFP_KERNEL);
+		pr_devel("SSTP allocated at 0x%p\n", ctx->sstp);
+	} else {
+		pr_devel("Zeroing and reusing SSTP already allocated at 0x%p\n", ctx->sstp);
+		spin_lock_irqsave(&ctx->sst_lock, flags);
+		memset(ctx->sstp, 0, PAGE_SIZE);
+		cxl_ops->slbia(ctx->afu);
+		spin_unlock_irqrestore(&ctx->sst_lock, flags);
+	}
+	if (!ctx->sstp) {
+		pr_err("cxl_alloc_sst: Unable to allocate segment table\n");
+		return -ENOMEM;
+	}
+
+	vsid  = get_kernel_vsid((u64)ctx->sstp, mmu_kernel_ssize) << 12;
+
+	*sstp0 |= (u64)mmu_kernel_ssize << CXL_SSTP0_An_B_SHIFT;
+	*sstp0 |= (SLB_VSID_KERNEL | mmu_psize_defs[mmu_linear_psize].sllp) << 50;
+
+	size = (((u64)ctx->sst_size >> 8) - 1) << CXL_SSTP0_An_SegTableSize_SHIFT;
+	if (unlikely(size & ~CXL_SSTP0_An_SegTableSize_MASK)) {
+		WARN(1, "Impossible segment table size\n");
+		return -EINVAL;
+	}
+	*sstp0 |= size;
+
+	if (mmu_kernel_ssize == MMU_SEGSIZE_256M)
+		ea_mask = 0xfffff00ULL;
+	else
+		ea_mask = 0xffffffff00ULL;
+
+	*sstp0 |=  vsid >>     (50-14);  /*   Top 14 bits of VSID */
+	*sstp1 |= (vsid << (64-(50-14))) & ~ea_mask;
+	*sstp1 |= (u64)ctx->sstp & ea_mask;
+	*sstp1 |= CXL_SSTP1_An_V;
+
+	pr_devel("Looked up %#llx: slbfee. %#llx (ssize: %x, vsid: %#lx), copied to SSTP0: %#llx, SSTP1: %#llx\n",
+			(u64)ctx->sstp, (u64)ctx->sstp & ESID_MASK, mmu_kernel_ssize, vsid, *sstp0, *sstp1);
+
+	return 0;
+}
+
+/* Find a CXL adapter by its number and increase its refcount */
+struct cxl_t *get_cxl_adapter(int num)
+{
+	struct cxl_t *adapter;
+
+	spin_lock(&adapter_idr_lock);
+	if ((adapter = idr_find(&cxl_adapter_idr, num)))
+		get_device(&adapter->dev);
+	spin_unlock(&adapter_idr_lock);
+
+	return adapter;
+}
+
+int cxl_alloc_adapter_nr(struct cxl_t *adapter)
+{
+	int i;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&adapter_idr_lock);
+	i = idr_alloc(&cxl_adapter_idr, adapter, 0, 0, GFP_NOWAIT);
+	spin_unlock(&adapter_idr_lock);
+	idr_preload_end();
+	if (i < 0)
+		return i;
+
+	adapter->adapter_num = i;
+
+	return 0;
+}
+EXPORT_SYMBOL(cxl_alloc_adapter_nr);
+
+void cxl_remove_adapter_nr(struct cxl_t *adapter)
+{
+	idr_remove(&cxl_adapter_idr, adapter->adapter_num);
+}
+EXPORT_SYMBOL(cxl_remove_adapter_nr);
+
+int cxl_afu_select_best_model(struct cxl_afu_t *afu)
+{
+	if (afu->models_supported & CXL_MODEL_DIRECTED)
+		return cxl_afu_activate_model(afu, CXL_MODEL_DIRECTED);
+
+	if (afu->models_supported & CXL_MODEL_DEDICATED)
+		return cxl_afu_activate_model(afu, CXL_MODEL_DEDICATED);
+
+	dev_warn(&afu->dev, "No supported programming models available\n");
+	/* We don't fail this so the user can inspect sysfs */
+	return 0;
+}
+EXPORT_SYMBOL(cxl_afu_select_best_model);
+
+static int __init init_cxl(void)
+{
+	int rc = 0;
+
+	if (!cpu_has_feature(CPU_FTR_HVMODE))
+		return -EPERM;
+
+	if ((rc = cxl_file_init()))
+		return rc;
+
+	cxl_debugfs_init();
+	init_cxl_native();
+
+	if ((rc = register_cxl_calls(&cxl_calls)))
+		goto err;
+
+	return 0;
+
+err:
+	cxl_debugfs_exit();
+	cxl_file_exit();
+
+	return rc;
+}
+
+static void exit_cxl(void)
+{
+	cxl_debugfs_exit();
+	cxl_file_exit();
+	unregister_cxl_calls(&cxl_calls);
+}
+
+module_init(init_cxl);
+module_exit(exit_cxl);
+
+MODULE_DESCRIPTION("IBM Coherent Accelerator");
+MODULE_AUTHOR("Ian Munsie <imunsie@au1.ibm.com>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
new file mode 100644
index 0000000..3c5c6a8
--- /dev/null
+++ b/drivers/misc/cxl/native.c
@@ -0,0 +1,649 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/spinlock.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/mutex.h>
+#include <linux/mm.h>
+#include <linux/uaccess.h>
+#include <asm/synch.h>
+#include <misc/cxl.h>
+
+#include "cxl.h"
+
+static int afu_control(struct cxl_afu_t *afu, u64 command,
+		       u64 result, u64 mask, bool enabled)
+{
+	u64 AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An);
+	unsigned long timeout = jiffies + (HZ * CXL_TIMEOUT);
+
+	spin_lock(&afu->afu_cntl_lock);
+	pr_devel("AFU command starting: %llx\n", command);
+
+	cxl_p2n_write(afu, CXL_AFU_Cntl_An, AFU_Cntl | command);
+
+	AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An);
+	while ((AFU_Cntl & mask) != result) {
+		if (time_after_eq(jiffies, timeout)) {
+			dev_warn(&afu->dev, "WARNING: AFU control timed out!\n");
+			spin_unlock(&afu->afu_cntl_lock);
+			return -EBUSY;
+		}
+		pr_devel_ratelimited("AFU control... (0x%.16llx)\n",
+				     AFU_Cntl | command);
+		cpu_relax();
+		AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An);
+	}
+	pr_devel("AFU command complete: %llx\n", command);
+	afu->enabled = enabled;
+	spin_unlock(&afu->afu_cntl_lock);
+
+	return 0;
+}
+
+static int afu_enable(struct cxl_afu_t *afu)
+{
+	pr_devel("AFU enable request\n");
+
+	return afu_control(afu, CXL_AFU_Cntl_An_E,
+			   CXL_AFU_Cntl_An_ES_Enabled,
+			   CXL_AFU_Cntl_An_ES_MASK, true);
+}
+
+static int afu_disable(struct cxl_afu_t *afu)
+{
+	pr_devel("AFU disable request\n");
+
+	return afu_control(afu, 0, CXL_AFU_Cntl_An_ES_Disabled,
+			   CXL_AFU_Cntl_An_ES_MASK, false);
+}
+
+/* We have to disable when we reset */
+static int afu_reset_and_disable(struct cxl_afu_t *afu)
+{
+	pr_devel("AFU reset request\n");
+
+	return afu_control(afu, CXL_AFU_Cntl_An_RA,
+			   CXL_AFU_Cntl_An_RS_Complete | CXL_AFU_Cntl_An_ES_Disabled,
+			   CXL_AFU_Cntl_An_RS_MASK | CXL_AFU_Cntl_An_ES_MASK,
+			   false);
+}
+
+static int afu_check_and_enable(struct cxl_afu_t *afu)
+{
+	if (afu->enabled)
+		return 0;
+	return afu_enable(afu);
+}
+
+static int psl_purge(struct cxl_afu_t *afu)
+{
+	u64 PSL_CNTL = cxl_p1n_read(afu, CXL_PSL_SCNTL_An);
+	u64 AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An);
+	u64 dsisr, dar;
+	u64 start, end;
+	unsigned long timeout = jiffies + (HZ * CXL_TIMEOUT);
+
+	pr_devel("PSL purge request\n");
+
+	if ((AFU_Cntl & CXL_AFU_Cntl_An_ES_MASK) != CXL_AFU_Cntl_An_ES_Disabled) {
+		WARN(1, "psl_purge request while AFU not disabled!\n");
+		afu_disable(afu);
+	}
+
+	cxl_p1n_write(afu, CXL_PSL_SCNTL_An,
+		       PSL_CNTL | CXL_PSL_SCNTL_An_Pc);
+	start = local_clock();
+	PSL_CNTL = cxl_p1n_read(afu, CXL_PSL_SCNTL_An);
+	while ((PSL_CNTL &  CXL_PSL_SCNTL_An_Ps_MASK)
+			== CXL_PSL_SCNTL_An_Ps_Pending) {
+		if (time_after_eq(jiffies, timeout)) {
+			dev_warn(&afu->dev, "WARNING: PSL Purge timed out!\n");
+			return -EBUSY;
+		}
+		dsisr = cxl_p2n_read(afu, CXL_PSL_DSISR_An);
+		pr_devel_ratelimited("PSL purging... PSL_CNTL: 0x%.16llx  PSL_DSISR: 0x%.16llx\n", PSL_CNTL, dsisr);
+		if (dsisr & CXL_PSL_DSISR_TRANS) {
+			dar = cxl_p2n_read(afu, CXL_PSL_DAR_An);
+			dev_notice(&afu->dev, "PSL purge terminating pending translation, DSISR: 0x%.16llx, DAR: 0x%.16llx\n", dsisr, dar);
+			cxl_p2n_write(afu, CXL_PSL_TFC_An, CXL_PSL_TFC_An_AE);
+		} else if (dsisr) {
+			dev_notice(&afu->dev, "PSL purge acknowledging pending non-translation fault, DSISR: 0x%.16llx\n", dsisr);
+			cxl_p2n_write(afu, CXL_PSL_TFC_An, CXL_PSL_TFC_An_A);
+		} else {
+			cpu_relax();
+		}
+		PSL_CNTL = cxl_p1n_read(afu, CXL_PSL_SCNTL_An);
+	}
+	end = local_clock();
+	pr_devel("PSL purged in %lld ns\n", end - start);
+
+	cxl_p1n_write(afu, CXL_PSL_SCNTL_An,
+		       PSL_CNTL & ~CXL_PSL_SCNTL_An_Pc);
+	return 0;
+}
+
+static int spa_max_procs(int spa_size)
+{
+	/* From the CAIA:
+	 *    end_of_SPA_area = SPA_Base + ((n+4) * 128) + (( ((n*8) + 127) >> 7) * 128) + 255
+	 * Most of that junk is really just an overly-complicated way of saying
+	 * the last 256 bytes are __aligned(128), so it's really:
+	 *    end_of_SPA_area = end_of_PSL_queue_area + __aligned(128) 255
+	 * and
+	 *    end_of_PSL_queue_area = SPA_Base + ((n+4) * 128) + (n*8) - 1
+	 * so
+	 *    sizeof(SPA) = ((n+4) * 128) + (n*8) + __aligned(128) 256
+	 * Ignore the alignment (which is safe in this case as long as we are
+	 * careful with our rounding) and solve for n:
+	 */
+	return ((spa_size / 8) - 96) / 17;
+}
+
+static int alloc_spa(struct cxl_afu_t *afu)
+{
+	u64 spap;
+
+	/* Work out how many pages to allocate */
+	afu->spa_order = 0;
+	do {
+		afu->spa_order++;
+		afu->spa_size = (1 << afu->spa_order) * PAGE_SIZE;
+		afu->spa_max_procs = spa_max_procs(afu->spa_size);
+	} while (afu->spa_max_procs < afu->num_procs);
+
+	WARN_ON(afu->spa_size > 0x100000); /* Max size supported by the hardware */
+
+	if (!(afu->spa = (struct cxl_process_element *)
+	      __get_free_pages(GFP_KERNEL | __GFP_ZERO, afu->spa_order))) {
+		pr_err("cxl_alloc_spa: Unable to allocate scheduled process area\n");
+		return -ENOMEM;
+	}
+	pr_devel("spa pages: %i afu->spa_max_procs: %i   afu->num_procs: %i\n",
+		 1<<afu->spa_order, afu->spa_max_procs, afu->num_procs);
+
+	afu->sw_command_status = (__be64 *)((char *)afu->spa +
+					    ((afu->spa_max_procs + 3) * 128));
+
+	spap = virt_to_phys(afu->spa) & CXL_PSL_SPAP_Addr;
+	spap |= ((afu->spa_size >> (12 - CXL_PSL_SPAP_Size_Shift)) - 1) & CXL_PSL_SPAP_Size;
+	spap |= CXL_PSL_SPAP_V;
+	pr_devel("cxl: SPA allocated at 0x%p. Max processes: %i, sw_command_status: 0x%p CXL_PSL_SPAP_An=0x%016llx\n", afu->spa, afu->spa_max_procs, afu->sw_command_status, spap);
+	cxl_p1n_write(afu, CXL_PSL_SPAP_An, spap);
+
+	return 0;
+}
+
+static void release_spa(struct cxl_afu_t *afu)
+{
+	free_pages((unsigned long) afu->spa, afu->spa_order);
+}
+
+static void afu_slbia_native(struct cxl_afu_t *afu)
+{
+	pr_devel("cxl_afu_slbia issuing SLBIA command\n");
+	cxl_p2n_write(afu, CXL_SLBIA_An, CXL_SLBI_IQ_ALL);
+	while (cxl_p2n_read(afu, CXL_SLBIA_An) & CXL_SLBIA_P)
+		cpu_relax();
+}
+
+static void cxl_write_sstp(struct cxl_afu_t *afu, u64 sstp0, u64 sstp1)
+{
+	/* 1. Disable SSTP by writing 0 to SSTP1[V] */
+	cxl_p2n_write(afu, CXL_SSTP1_An, 0);
+
+	/* 2. Invalidate all SLB entries */
+	afu_slbia_native(afu);
+
+	/* 3. Set SSTP0_An */
+	cxl_p2n_write(afu, CXL_SSTP0_An, sstp0);
+
+	/* 4. Set SSTP1_An */
+	cxl_p2n_write(afu, CXL_SSTP1_An, sstp1);
+}
+
+/* Using the per-slice version (ie. SLBIA_An) may improve performance here. */
+static void slb_invalid(struct cxl_context_t *ctx)
+{
+	struct cxl_t *adapter = ctx->afu->adapter;
+	u64 slbia;
+
+	WARN_ON(!mutex_is_locked(&ctx->afu->spa_mutex));
+
+	cxl_p1_write(adapter, CXL_PSL_LBISEL,
+			((u64)be32_to_cpu(ctx->elem->common.pid) << 32) |
+			be32_to_cpu(ctx->elem->lpid));
+	cxl_p1_write(adapter, CXL_PSL_SLBIA, CXL_SLBI_IQ_LPIDPID);
+
+	while (1) {
+		slbia = cxl_p1_read(adapter, CXL_PSL_SLBIA);
+		if (!(slbia & CXL_SLBIA_P))
+			break;
+		cpu_relax();
+	}
+}
+
+static int do_process_element_cmd(struct cxl_context_t *ctx,
+				  u64 cmd, u64 pe_state)
+{
+	u64 state;
+
+	WARN_ON(!ctx->afu->enabled);
+
+	ctx->elem->software_state = cpu_to_be32(pe_state);
+	smp_wmb();
+	*(ctx->afu->sw_command_status) = cpu_to_be64(cmd | 0 | ctx->ph);
+	smp_mb();
+	cxl_p1n_write(ctx->afu, CXL_PSL_LLCMD_An, cmd | ctx->ph);
+	while (1) {
+		state = be64_to_cpup(ctx->afu->sw_command_status);
+		if (state == ~0ULL) {
+			pr_err("cxl: Error adding process element to AFU\n");
+			return -1;
+		}
+		if ((state & (CXL_SPA_SW_CMD_MASK | CXL_SPA_SW_STATE_MASK  | CXL_SPA_SW_LINK_MASK)) ==
+		    (cmd | (cmd >> 16) | ctx->ph))
+			break;
+		/* The command won't finish in the PSL if there are
+		 * outstanding DSIs.  Hence we need to yield here in
+		 * case there are outstanding DSIs that we need to
+		 * service.  Tuning possibility: we could busy-wait for
+		 * a while before scheduling.
+		 */
+		schedule();
+
+	}
+	return 0;
+}
+
+static int add_process_element(struct cxl_context_t *ctx)
+{
+	int rc = 0;
+
+	mutex_lock(&ctx->afu->spa_mutex);
+	pr_devel("%s Adding pe: %i started\n", __func__, ctx->ph);
+	if (!(rc = do_process_element_cmd(ctx, CXL_SPA_SW_CMD_ADD, CXL_PE_SOFTWARE_STATE_V)))
+		ctx->pe_inserted = true;
+	pr_devel("%s Adding pe: %i finished\n", __func__, ctx->ph);
+	mutex_unlock(&ctx->afu->spa_mutex);
+	return rc;
+}
+
+static int terminate_process_element(struct cxl_context_t *ctx)
+{
+	int rc = 0;
+
+	/* fast path terminate if it's already invalid */
+	if (!(ctx->elem->software_state & cpu_to_be32(CXL_PE_SOFTWARE_STATE_V)))
+		return rc;
+
+	mutex_lock(&ctx->afu->spa_mutex);
+	pr_devel("%s Terminate pe: %i started\n", __func__, ctx->ph);
+	rc = do_process_element_cmd(ctx, CXL_SPA_SW_CMD_TERMINATE,
+				    CXL_PE_SOFTWARE_STATE_V | CXL_PE_SOFTWARE_STATE_T);
+	ctx->elem->software_state = 0;	/* Remove Valid bit */
+	pr_devel("%s Terminate pe: %i finished\n", __func__, ctx->ph);
+	mutex_unlock(&ctx->afu->spa_mutex);
+	return rc;
+}
+
+static int remove_process_element(struct cxl_context_t *ctx)
+{
+	int rc = 0;
+
+	mutex_lock(&ctx->afu->spa_mutex);
+	pr_devel("%s Remove pe: %i started\n", __func__, ctx->ph);
+	if (!(rc = do_process_element_cmd(ctx, CXL_SPA_SW_CMD_REMOVE, 0)))
+		ctx->pe_inserted = false;
+	slb_invalid(ctx);
+	pr_devel("%s Remove pe: %i finished\n", __func__, ctx->ph);
+	mutex_unlock(&ctx->afu->spa_mutex);
+
+	return rc;
+}
+
+
+static void assign_psn_space(struct cxl_context_t *ctx)
+{
+	if (!ctx->afu->pp_size || ctx->master) {
+		ctx->psn_phys = ctx->afu->psn_phys;
+		ctx->psn_size = ctx->afu->adapter->ps_size;
+	} else {
+		ctx->psn_phys = ctx->afu->psn_phys +
+			(ctx->afu->pp_offset + ctx->afu->pp_size * ctx->ph);
+		ctx->psn_size = ctx->afu->pp_size;
+	}
+}
+
+static int activate_afu_directed(struct cxl_afu_t *afu)
+{
+	int rc;
+
+	dev_info(&afu->dev, "Activating AFU directed model\n");
+
+	if (alloc_spa(afu))
+		return -ENOMEM;
+
+	cxl_p1n_write(afu, CXL_PSL_SCNTL_An, CXL_PSL_SCNTL_An_PM_AFU);
+	cxl_p1n_write(afu, CXL_PSL_AMOR_An, 0xFFFFFFFFFFFFFFFFULL);
+	cxl_p1n_write(afu, CXL_PSL_ID_An, CXL_PSL_ID_An_F | CXL_PSL_ID_An_L);
+
+	afu->current_model = CXL_MODEL_DIRECTED;
+	afu->num_procs = afu->max_procs_virtualised;
+
+	if ((rc = cxl_chardev_m_afu_add(afu)))
+		return rc;
+
+	if ((rc = cxl_chardev_s_afu_add(afu)))
+		goto err;
+
+	return 0;
+err:
+	cxl_chardev_afu_remove(afu);
+	return rc;
+}
+
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+#define set_endian(sr) ((sr) |= CXL_PSL_SR_An_LE)
+#else
+#define set_endian(sr) ((sr) &= ~(CXL_PSL_SR_An_LE))
+#endif
+
+static int attach_afu_directed(struct cxl_context_t *ctx, u64 wed, u64 amr)
+{
+
+	u64 sr, sstp0, sstp1;
+	int r, result;
+
+	assign_psn_space(ctx);
+
+	ctx->elem->ctxtime = 0; /* disable */
+	ctx->elem->lpid = cpu_to_be32(mfspr(SPRN_LPID));
+	ctx->elem->haurp = 0; /* disable */
+	ctx->elem->sdr = cpu_to_be64(mfspr(SPRN_SDR1));
+
+	sr = CXL_PSL_SR_An_SC;
+	if (ctx->master)
+		sr |= CXL_PSL_SR_An_MP;
+	if (mfspr(SPRN_LPCR) & LPCR_TC)
+		sr |= CXL_PSL_SR_An_TC;
+	/* HV=0, PR=1, R=1 for userspace
+	 * For kernel contexts: this would need to change
+	 */
+	sr |= CXL_PSL_SR_An_PR | CXL_PSL_SR_An_R;
+	set_endian(sr);
+	sr &= ~(CXL_PSL_SR_An_HV);
+	if (!test_tsk_thread_flag(current, TIF_32BIT))
+		sr |= CXL_PSL_SR_An_SF;
+	ctx->elem->common.pid = cpu_to_be32(current->pid);
+	ctx->elem->common.tid = 0;
+	ctx->elem->sr = cpu_to_be64(sr);
+
+	ctx->elem->common.csrp = 0; /* disable */
+	ctx->elem->common.aurp0 = 0; /* disable */
+	ctx->elem->common.aurp1 = 0; /* disable */
+
+	if ((result = cxl_alloc_sst(ctx, &sstp0, &sstp1)))
+		return result;
+
+	cxl_prefault(ctx, wed);
+
+	ctx->elem->common.sstp0 = cpu_to_be64(sstp0);
+	ctx->elem->common.sstp1 = cpu_to_be64(sstp1);
+
+	for (r = 0; r < CXL_IRQ_RANGES; r++) {
+		ctx->elem->ivte_offsets[r] = cpu_to_be16(ctx->irqs.offset[r]);
+		ctx->elem->ivte_ranges[r] = cpu_to_be16(ctx->irqs.range[r]);
+	}
+
+	ctx->elem->common.amr = cpu_to_be64(amr);
+	ctx->elem->common.wed = cpu_to_be64(wed);
+
+	/* The first context to attach must enable the AFU */
+	if ((result = afu_check_and_enable(ctx->afu)))
+		return result;
+
+	add_process_element(ctx);
+
+	return 0;
+}
+
+static int deactivate_afu_directed(struct cxl_afu_t *afu)
+{
+	dev_info(&afu->dev, "Deactivating AFU directed model\n");
+
+	afu->current_model = 0;
+	afu->num_procs = 0;
+
+	cxl_chardev_afu_remove(afu);
+
+	afu_reset_and_disable(afu);
+	afu_disable(afu);
+	psl_purge(afu);
+
+	release_spa(afu);
+
+	return 0;
+}
+
+static int activate_dedicated_process(struct cxl_afu_t *afu)
+{
+	dev_info(&afu->dev, "Activating dedicated process model\n");
+
+	cxl_p1n_write(afu, CXL_PSL_SCNTL_An, CXL_PSL_SCNTL_An_PM_Process);
+
+	cxl_p1n_write(afu, CXL_PSL_CtxTime_An, 0); /* disable */
+	cxl_p1n_write(afu, CXL_PSL_SPAP_An, 0);    /* disable */
+	cxl_p1n_write(afu, CXL_PSL_AMOR_An, 0xFFFFFFFFFFFFFFFFULL);
+	cxl_p1n_write(afu, CXL_PSL_LPID_An, mfspr(SPRN_LPID));
+	cxl_p1n_write(afu, CXL_HAURP_An, 0);       /* disable */
+	cxl_p1n_write(afu, CXL_PSL_SDR_An, mfspr(SPRN_SDR1));
+
+	cxl_p2n_write(afu, CXL_CSRP_An, 0);        /* disable */
+	cxl_p2n_write(afu, CXL_AURP0_An, 0);       /* disable */
+	cxl_p2n_write(afu, CXL_AURP1_An, 0);       /* disable */
+
+	afu->current_model = CXL_MODEL_DEDICATED;
+	afu->num_procs = 1;
+
+	return cxl_chardev_m_afu_add(afu);
+}
+
+static int attach_dedicated(struct cxl_context_t *ctx, u64 wed, u64 amr)
+{
+	struct cxl_afu_t *afu = ctx->afu;
+	u64 sr, sstp0, sstp1;
+	int result;
+
+	sr = CXL_PSL_SR_An_SC;
+	set_endian(sr);
+	if (ctx->master)
+		sr |= CXL_PSL_SR_An_MP;
+	if (mfspr(SPRN_LPCR) & LPCR_TC)
+		sr |= CXL_PSL_SR_An_TC;
+	sr |= CXL_PSL_SR_An_PR | CXL_PSL_SR_An_R;
+	if (!test_tsk_thread_flag(current, TIF_32BIT))
+		sr |= CXL_PSL_SR_An_SF;
+	cxl_p2n_write(afu, CXL_PSL_PID_TID_An, (u64)current->pid << 32);
+	cxl_p1n_write(afu, CXL_PSL_SR_An, sr);
+
+	if ((result = cxl_alloc_sst(ctx, &sstp0, &sstp1)))
+		return result;
+
+	cxl_prefault(ctx, wed);
+
+	cxl_write_sstp(afu, sstp0, sstp1);
+	cxl_p1n_write(afu, CXL_PSL_IVTE_Offset_An,
+		       (((u64)ctx->irqs.offset[0] & 0xffff) << 48) |
+		       (((u64)ctx->irqs.offset[1] & 0xffff) << 32) |
+		       (((u64)ctx->irqs.offset[2] & 0xffff) << 16) |
+			((u64)ctx->irqs.offset[3] & 0xffff));
+	cxl_p1n_write(afu, CXL_PSL_IVTE_Limit_An, (u64)
+		       (((u64)ctx->irqs.range[0] & 0xffff) << 48) |
+		       (((u64)ctx->irqs.range[1] & 0xffff) << 32) |
+		       (((u64)ctx->irqs.range[2] & 0xffff) << 16) |
+			((u64)ctx->irqs.range[3] & 0xffff));
+
+	cxl_p2n_write(afu, CXL_PSL_AMR_An, amr);
+
+	/* master-only context for dedicated mode */
+	assign_psn_space(ctx);
+
+	if ((result = afu_reset_and_disable(afu)))
+		return result;
+
+	cxl_p2n_write(afu, CXL_PSL_WED_An, wed);
+
+	return afu_enable(afu);
+}
+
+static int deactivate_dedicated_process(struct cxl_afu_t *afu)
+{
+	dev_info(&afu->dev, "Deactivating dedicated process model\n");
+
+	afu->current_model = 0;
+	afu->num_procs = 0;
+
+	cxl_chardev_afu_remove(afu);
+
+	return 0;
+}
+
+int _cxl_afu_deactivate_model(struct cxl_afu_t *afu, int model)
+{
+	if (model == CXL_MODEL_DIRECTED)
+		return deactivate_afu_directed(afu);
+	if (model == CXL_MODEL_DEDICATED)
+		return deactivate_dedicated_process(afu);
+	return 0;
+}
+
+int cxl_afu_deactivate_model(struct cxl_afu_t *afu)
+{
+	return _cxl_afu_deactivate_model(afu, afu->current_model);
+}
+EXPORT_SYMBOL(cxl_afu_deactivate_model);
+
+int cxl_afu_activate_model(struct cxl_afu_t *afu, int model)
+{
+	if (!model)
+		return 0;
+	if (!(model & afu->models_supported))
+		return -EINVAL;
+
+	if (model == CXL_MODEL_DIRECTED)
+		return activate_afu_directed(afu);
+	if (model == CXL_MODEL_DEDICATED)
+		return activate_dedicated_process(afu);
+
+	return -EINVAL;
+}
+EXPORT_SYMBOL(cxl_afu_activate_model);
+
+static int attach_process_native(struct cxl_context_t *ctx, bool kernel,
+			       u64 wed, u64 amr)
+{
+	ctx->kernel = kernel;
+	if (ctx->afu->current_model == CXL_MODEL_DIRECTED)
+		return attach_afu_directed(ctx, wed, amr);
+
+	if (ctx->afu->current_model == CXL_MODEL_DEDICATED)
+		return attach_dedicated(ctx, wed, amr);
+
+	return -EINVAL;
+}
+
+/* TODO: handle the case where this is called with IRQs off, which may
+ * happen when we unbind the driver.  Terminate & remove take a mutex
+ * and call schedule(), neither of which is safe with IRQs off or a
+ * lock held.  We may need a variant of do_process_element_cmd() that
+ * handles outstanding page faults. */
+static int detach_process_native(struct cxl_context_t *ctx)
+{
+	if (ctx->afu->current_model == CXL_MODEL_DEDICATED) {
+		afu_reset_and_disable(ctx->afu);
+		afu_disable(ctx->afu);
+		psl_purge(ctx->afu);
+		return 0;
+	}
+
+	if (!ctx->pe_inserted)
+		return 0;
+	if (terminate_process_element(ctx))
+		return -1;
+	if (remove_process_element(ctx))
+		return -1;
+
+	return 0;
+}
+
+static int get_irq_native(struct cxl_context_t *ctx, struct cxl_irq_info *info)
+{
+	u64 pidtid;
+
+	info->dsisr = cxl_p2n_read(ctx->afu, CXL_PSL_DSISR_An);
+	info->dar = cxl_p2n_read(ctx->afu, CXL_PSL_DAR_An);
+	info->dsr = cxl_p2n_read(ctx->afu, CXL_PSL_DSR_An);
+	pidtid = cxl_p2n_read(ctx->afu, CXL_PSL_PID_TID_An);
+	info->pid = pidtid >> 32;
+	info->tid = pidtid & 0xffffffff;
+	info->afu_err = cxl_p2n_read(ctx->afu, CXL_AFU_ERR_An);
+	info->errstat = cxl_p2n_read(ctx->afu, CXL_PSL_ErrStat_An);
+
+	return 0;
+}
+
+static void recover_psl_err(struct cxl_afu_t *afu, u64 errstat)
+{
+	u64 dsisr;
+
+	pr_devel("RECOVERING FROM PSL ERROR... (0x%.16llx)\n", errstat);
+
+	/* Clear PSL_DSISR[PE] */
+	dsisr = cxl_p2n_read(afu, CXL_PSL_DSISR_An);
+	cxl_p2n_write(afu, CXL_PSL_DSISR_An, dsisr & ~CXL_PSL_DSISR_An_PE);
+
+	/* Write 1s to clear error status bits */
+	cxl_p2n_write(afu, CXL_PSL_ErrStat_An, errstat);
+}
+
+static int ack_irq_native(struct cxl_context_t *ctx, u64 tfc,
+			  u64 psl_reset_mask)
+{
+	if (tfc)
+		cxl_p2n_write(ctx->afu, CXL_PSL_TFC_An, tfc);
+	if (psl_reset_mask)
+		recover_psl_err(ctx->afu, psl_reset_mask);
+
+	return 0;
+}
+
+static int check_error(struct cxl_afu_t *afu)
+{
+	return (cxl_p1n_read(afu, CXL_PSL_SCNTL_An) == ~0ULL);
+}
+
+static const struct cxl_backend_ops cxl_native_ops = {
+	.attach_process = attach_process_native,
+	.detach_process = detach_process_native,
+	.get_irq = get_irq_native,
+	.ack_irq = ack_irq_native,
+	.check_error = check_error,
+	.slbia = afu_slbia_native,
+	.afu_reset = afu_reset_and_disable,
+};
+
+void init_cxl_native(void)
+{
+	cxl_ops = &cxl_native_ops;
+}
diff --git a/drivers/misc/cxl/sysfs.c b/drivers/misc/cxl/sysfs.c
new file mode 100644
index 0000000..67489e8
--- /dev/null
+++ b/drivers/misc/cxl/sysfs.c
@@ -0,0 +1,348 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/device.h>
+#include <linux/sysfs.h>
+
+#include "cxl.h"
+
+#define to_afu_chardev_m(d) dev_get_drvdata(d)
+
+/*********  Adapter attributes  **********************************************/
+
+static ssize_t caia_version_show(struct device *device,
+				 struct device_attribute *attr,
+				 char *buf)
+{
+	struct cxl_t *adapter = to_cxl_adapter(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%i.%i\n", adapter->caia_major,
+			 adapter->caia_minor);
+}
+
+static ssize_t psl_revision_show(struct device *device,
+				 struct device_attribute *attr,
+				 char *buf)
+{
+	struct cxl_t *adapter = to_cxl_adapter(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%i\n", adapter->psl_rev);
+}
+
+static ssize_t base_image_show(struct device *device,
+			       struct device_attribute *attr,
+			       char *buf)
+{
+	struct cxl_t *adapter = to_cxl_adapter(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%i\n", adapter->base_image);
+}
+
+static ssize_t image_loaded_show(struct device *device,
+				 struct device_attribute *attr,
+				 char *buf)
+{
+	struct cxl_t *adapter = to_cxl_adapter(device);
+
+	if (adapter->user_image_loaded)
+		return scnprintf(buf, PAGE_SIZE, "user\n");
+	return scnprintf(buf, PAGE_SIZE, "factory\n");
+}
+
+static struct device_attribute adapter_attrs[] = {
+	__ATTR_RO(caia_version),
+	__ATTR_RO(psl_revision),
+	__ATTR_RO(base_image),
+	__ATTR_RO(image_loaded),
+	/* __ATTR_RW(reset_loads_image); */
+	/* __ATTR_RW(reset_image_select); */
+};
+
+
+/*********  AFU master specific attributes  **********************************/
+
+static ssize_t mmio_size_show_master(struct device *device,
+				     struct device_attribute *attr,
+				     char *buf)
+{
+	struct cxl_afu_t *afu = to_afu_chardev_m(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", afu->adapter->ps_size);
+}
+
+static ssize_t pp_mmio_off_show(struct device *device,
+				struct device_attribute *attr,
+				char *buf)
+{
+	struct cxl_afu_t *afu = to_afu_chardev_m(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", afu->pp_offset);
+}
+
+static ssize_t pp_mmio_len_show(struct device *device,
+				struct device_attribute *attr,
+				char *buf)
+{
+	struct cxl_afu_t *afu = to_afu_chardev_m(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", afu->pp_size);
+}
+
+static struct device_attribute afu_master_attrs[] = {
+	__ATTR(mmio_size, S_IRUGO, mmio_size_show_master, NULL),
+	__ATTR_RO(pp_mmio_off),
+	__ATTR_RO(pp_mmio_len),
+};
+
+
+/*********  AFU attributes  **************************************************/
+
+static ssize_t mmio_size_show(struct device *device,
+			      struct device_attribute *attr,
+			      char *buf)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+
+	if (afu->pp_size)
+		return scnprintf(buf, PAGE_SIZE, "%llu\n", afu->pp_size);
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", afu->adapter->ps_size);
+}
+
+static ssize_t reset_store_afu(struct device *device,
+			       struct device_attribute *attr,
+			       const char *buf, size_t count)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+	int rc;
+
+	if ((rc = cxl_ops->afu_reset(afu)))
+		return rc;
+	return count;
+}
+
+static ssize_t irqs_min_show(struct device *device,
+			     struct device_attribute *attr,
+			     char *buf)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%i\n", afu->pp_irqs);
+}
+
+static ssize_t irqs_max_show(struct device *device,
+				  struct device_attribute *attr,
+				  char *buf)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%i\n", afu->irqs_max);
+}
+
+static ssize_t irqs_max_store(struct device *device,
+				  struct device_attribute *attr,
+				  const char *buf, size_t count)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+	ssize_t ret;
+	int irqs_max;
+
+	ret = sscanf(buf, "%i", &irqs_max);
+	if (ret != 1)
+		return -EINVAL;
+
+	if (irqs_max < afu->pp_irqs)
+		return -EINVAL;
+
+	if (irqs_max > afu->adapter->user_irqs)
+		return -EINVAL;
+
+	afu->irqs_max = irqs_max;
+	return count;
+}
+
+static ssize_t models_supported_show(struct device *device,
+				    struct device_attribute *attr,
+				    char *buf)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+	char *p = buf, *end = buf + PAGE_SIZE;
+
+	if (afu->models_supported & CXL_MODEL_DEDICATED)
+		p += scnprintf(p, end - p, "dedicated_process\n");
+	if (afu->models_supported & CXL_MODEL_DIRECTED)
+		p += scnprintf(p, end - p, "afu_directed\n");
+	return (p - buf);
+}
+
+static ssize_t prefault_mode_show(struct device *device,
+				  struct device_attribute *attr,
+				  char *buf)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+
+	switch (afu->prefault_mode) {
+	case CXL_PREFAULT_WED:
+		return scnprintf(buf, PAGE_SIZE, "wed\n");
+	case CXL_PREFAULT_ALL:
+		return scnprintf(buf, PAGE_SIZE, "all\n");
+	default:
+		return scnprintf(buf, PAGE_SIZE, "none\n");
+	}
+}
+
+static ssize_t prefault_mode_store(struct device *device,
+			  struct device_attribute *attr,
+			  const char *buf, size_t count)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+	enum prefault_modes mode = -1;
+
+	if (!strncmp(buf, "wed", 3))
+		mode = CXL_PREFAULT_WED;
+	if (!strncmp(buf, "all", 3))
+		mode = CXL_PREFAULT_ALL;
+	if (!strncmp(buf, "none", 4))
+		mode = CXL_PREFAULT_NONE;
+
+	if (mode == -1)
+		return -EINVAL;
+
+	afu->prefault_mode = mode;
+	return count;
+}
+
+static ssize_t model_show(struct device *device,
+			 struct device_attribute *attr,
+			 char *buf)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+
+	if (afu->current_model == CXL_MODEL_DEDICATED)
+		return scnprintf(buf, PAGE_SIZE, "dedicated_process\n");
+	if (afu->current_model == CXL_MODEL_DIRECTED)
+		return scnprintf(buf, PAGE_SIZE, "afu_directed\n");
+	return scnprintf(buf, PAGE_SIZE, "none\n");
+}
+
+static ssize_t model_store(struct device *device,
+			   struct device_attribute *attr,
+			   const char *buf, size_t count)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+	int old_model, model = -1;
+	int rc = -EBUSY;
+
+	/* can't change this if we have a user */
+	spin_lock(&afu->contexts_lock);
+	if (!idr_is_empty(&afu->contexts_idr))
+		goto err;
+
+	if (!strncmp(buf, "dedicated_process", 17))
+		model = CXL_MODEL_DEDICATED;
+	if (!strncmp(buf, "afu_directed", 12))
+		model = CXL_MODEL_DIRECTED;
+	if (!strncmp(buf, "none", 4))
+		model = 0;
+
+	if (model == -1) {
+		rc = -EINVAL;
+		goto err;
+	}
+
+	/* cxl_afu_deactivate_model() must be called outside the lock, so
+	 * prevent other contexts from coming in before we are ready: */
+	old_model = afu->current_model;
+	afu->current_model = 0;
+	afu->num_procs = 0;
+
+	spin_unlock(&afu->contexts_lock);
+
+	if ((rc = _cxl_afu_deactivate_model(afu, old_model)))
+		return rc;
+	if ((rc = cxl_afu_activate_model(afu, model)))
+		return rc;
+
+	return count;
+err:
+	spin_unlock(&afu->contexts_lock);
+	return rc;
+}
+
+static struct device_attribute afu_attrs[] = {
+	__ATTR_RO(mmio_size),
+	__ATTR_RO(irqs_min),
+	__ATTR_RW(irqs_max),
+	__ATTR_RO(models_supported),
+	__ATTR_RW(model),
+	__ATTR_RW(prefault_mode),
+	__ATTR(reset, S_IWUSR, NULL, reset_store_afu),
+};
+
+
+
+int cxl_sysfs_adapter_add(struct cxl_t *adapter)
+{
+	int i, rc;
+
+	for (i = 0; i < ARRAY_SIZE(adapter_attrs); i++) {
+		if ((rc = device_create_file(&adapter->dev, &adapter_attrs[i])))
+			goto err;
+	}
+	return 0;
+err:
+	for (i--; i >= 0; i--)
+		device_remove_file(&adapter->dev, &adapter_attrs[i]);
+	return rc;
+}
+EXPORT_SYMBOL(cxl_sysfs_adapter_add);
+void cxl_sysfs_adapter_remove(struct cxl_t *adapter)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(adapter_attrs); i++)
+		device_remove_file(&adapter->dev, &adapter_attrs[i]);
+}
+EXPORT_SYMBOL(cxl_sysfs_adapter_remove);
+
+int cxl_sysfs_afu_add(struct cxl_afu_t *afu)
+{
+	int afu_attr, mstr_attr, rc = 0;
+
+	for (afu_attr = 0; afu_attr < ARRAY_SIZE(afu_attrs); afu_attr++) {
+		if ((rc = device_create_file(&afu->dev, &afu_attrs[afu_attr])))
+			goto err;
+	}
+	for (mstr_attr = 0; mstr_attr < ARRAY_SIZE(afu_master_attrs); mstr_attr++) {
+		if ((rc = device_create_file(afu->chardev_m, &afu_master_attrs[mstr_attr])))
+			goto err1;
+	}
+
+	return 0;
+
+err1:
+	for (mstr_attr--; mstr_attr >= 0; mstr_attr--)
+		device_remove_file(afu->chardev_m, &afu_master_attrs[mstr_attr]);
+err:
+	for (afu_attr--; afu_attr >= 0; afu_attr--)
+		device_remove_file(&afu->dev, &afu_attrs[afu_attr]);
+	return rc;
+}
+EXPORT_SYMBOL(cxl_sysfs_afu_add);
+
+void cxl_sysfs_afu_remove(struct cxl_afu_t *afu)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(afu_master_attrs); i++)
+		device_remove_file(afu->chardev_m, &afu_master_attrs[i]);
+	for (i = 0; i < ARRAY_SIZE(afu_attrs); i++)
+		device_remove_file(&afu->dev, &afu_attrs[i]);
+}
+EXPORT_SYMBOL(cxl_sysfs_afu_remove);
-- 
1.9.1



* [PATCH v2 14/17] cxl: Driver code for powernv PCIe based cards for userspace access
@ 2014-09-30 10:35   ` Michael Neuling
From: Michael Neuling @ 2014-09-30 10:35 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: cbe-oss-dev, mikey, Aneesh Kumar K.V, imunsie, linux-kernel,
	linuxppc-dev, jk, anton

From: Ian Munsie <imunsie@au1.ibm.com>

This is the core of the cxl driver.

It adds support for using cxl cards in the powernv environment only (no guest
support).  It allows access to cxl accelerators by userspace using
/dev/cxl/afu0.0 char device.

The kernel driver has no knowledge of the acceleration function.  It only
provides services to userspace via the /dev/cxl/afu0.0 device.

This will compile to two modules.  cxl.ko provides the core cxl functionality
and userspace API.  cxl-pci.ko provides the PCI driver functionality for the
powernv environment.

Documentation of the cxl hardware architecture and userspace API is provided in
subsequent patches.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 drivers/misc/cxl/context.c | 171 ++++++++
 drivers/misc/cxl/cxl-pci.c | 964 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/misc/cxl/cxl.h     | 605 ++++++++++++++++++++++++++++
 drivers/misc/cxl/debugfs.c | 116 ++++++
 drivers/misc/cxl/fault.c   | 298 ++++++++++++++
 drivers/misc/cxl/file.c    | 503 +++++++++++++++++++++++
 drivers/misc/cxl/irq.c     | 405 +++++++++++++++++++
 drivers/misc/cxl/main.c    | 238 +++++++++++
 drivers/misc/cxl/native.c  | 649 ++++++++++++++++++++++++++++++
 drivers/misc/cxl/sysfs.c   | 348 ++++++++++++++++
 10 files changed, 4297 insertions(+)
 create mode 100644 drivers/misc/cxl/context.c
 create mode 100644 drivers/misc/cxl/cxl-pci.c
 create mode 100644 drivers/misc/cxl/cxl.h
 create mode 100644 drivers/misc/cxl/debugfs.c
 create mode 100644 drivers/misc/cxl/fault.c
 create mode 100644 drivers/misc/cxl/file.c
 create mode 100644 drivers/misc/cxl/irq.c
 create mode 100644 drivers/misc/cxl/main.c
 create mode 100644 drivers/misc/cxl/native.c
 create mode 100644 drivers/misc/cxl/sysfs.c

diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
new file mode 100644
index 0000000..9206ca4
--- /dev/null
+++ b/drivers/misc/cxl/context.c
@@ -0,0 +1,171 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/bitmap.h>
+#include <linux/sched.h>
+#include <linux/pid.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/debugfs.h>
+#include <linux/slab.h>
+#include <linux/idr.h>
+#include <asm/cputable.h>
+#include <asm/current.h>
+#include <asm/copro.h>
+
+#include "cxl.h"
+
+/*
+ * Allocates space for a CXL context.
+ */
+struct cxl_context_t *cxl_context_alloc(void)
+{
+	return kzalloc(sizeof(struct cxl_context_t), GFP_KERNEL);
+}
+
+/*
+ * Initialises a CXL context.
+ */
+int cxl_context_init(struct cxl_context_t *ctx, struct cxl_afu_t *afu, bool master)
+{
+	int i;
+
+	spin_lock_init(&ctx->sst_lock);
+	ctx->sstp = NULL;
+	ctx->afu = afu;
+	ctx->master = master;
+	ctx->pid = get_pid(get_task_pid(current, PIDTYPE_PID));
+
+	INIT_WORK(&ctx->fault_work, cxl_handle_fault);
+
+	init_waitqueue_head(&ctx->wq);
+	spin_lock_init(&ctx->lock);
+
+	ctx->irq_bitmap = NULL;
+	ctx->pending_irq = false;
+	ctx->pending_fault = false;
+	ctx->pending_afu_err = false;
+
+	ctx->status = OPENED;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&afu->contexts_lock);
+	i = idr_alloc(&ctx->afu->contexts_idr, ctx, 0,
+		      ctx->afu->num_procs, GFP_NOWAIT);
+	spin_unlock(&afu->contexts_lock);
+	idr_preload_end();
+	if (i < 0)
+		return i;
+
+	ctx->ph = i;
+	ctx->elem = &ctx->afu->spa[i];
+	ctx->pe_inserted = false;
+	return 0;
+}
+
+/*
+ * Map a per-context mmio space into the given vma.
+ */
+int cxl_context_iomap(struct cxl_context_t *ctx, struct vm_area_struct *vma)
+{
+	u64 len = vma->vm_end - vma->vm_start;
+	len = min(len, ctx->psn_size);
+
+	if (ctx->afu->current_model == CXL_MODEL_DEDICATED) {
+		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+		return vm_iomap_memory(vma, ctx->afu->psn_phys, ctx->afu->adapter->ps_size);
+	}
+
+	/* make sure there is a valid per process space for this AFU */
+	if ((ctx->master && !ctx->afu->psa) || (!ctx->afu->pp_psa)) {
+		pr_devel("AFU doesn't support mmio space\n");
+		return -EINVAL;
+	}
+
+	/* Can't mmap until the AFU is enabled */
+	if (!ctx->afu->enabled)
+		return -EBUSY;
+
+	pr_devel("%s: mmio physical: %llx pe: %i master:%i\n", __func__,
+		 ctx->psn_phys, ctx->ph, ctx->master);
+
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+	return vm_iomap_memory(vma, ctx->psn_phys, len);
+}
+
+/*
+ * Detach a context from the hardware. This disables interrupts and doesn't
+ * return until all outstanding interrupts for this context have completed. The
+ * hardware should no longer access *ctx after this has returned.
+ */
+static void __detach_context(struct cxl_context_t *ctx)
+{
+	unsigned long flags;
+	enum cxl_context_status status;
+
+	spin_lock_irqsave(&ctx->sst_lock, flags);
+	status = ctx->status;
+	ctx->status = CLOSED;
+	spin_unlock_irqrestore(&ctx->sst_lock, flags);
+	if (status != STARTED)
+		return;
+
+	WARN_ON(cxl_ops->detach_process(ctx));
+	afu_release_irqs(ctx);
+	flush_work(&ctx->fault_work); /* Only needed for dedicated process */
+	wake_up_all(&ctx->wq);
+}
+
+/*
+ * Detach the given context from the AFU. This doesn't actually
+ * free the context but it should stop the context running in hardware
+ * (ie. prevent this context from generating any further interrupts
+ * so that it can be freed).
+ */
+void cxl_context_detach(struct cxl_context_t *ctx)
+{
+	__detach_context(ctx);
+}
+
+/*
+ * Detach all contexts on the given AFU.
+ */
+void cxl_context_detach_all(struct cxl_afu_t *afu)
+{
+	struct cxl_context_t *ctx;
+	int tmp;
+
+	rcu_read_lock();
+	idr_for_each_entry(&afu->contexts_idr, ctx, tmp)
+		__detach_context(ctx);
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL(cxl_context_detach_all);
+
+void cxl_context_free(struct cxl_context_t *ctx)
+{
+	unsigned long flags;
+
+	spin_lock(&ctx->afu->contexts_lock);
+	idr_remove(&ctx->afu->contexts_idr, ctx->ph);
+	spin_unlock(&ctx->afu->contexts_lock);
+	synchronize_rcu();
+
+	spin_lock_irqsave(&ctx->sst_lock, flags);
+	free_page((u64)ctx->sstp);
+	ctx->sstp = NULL;
+	spin_unlock_irqrestore(&ctx->sst_lock, flags);
+
+	put_pid(ctx->pid);
+	kfree(ctx);
+}
diff --git a/drivers/misc/cxl/cxl-pci.c b/drivers/misc/cxl/cxl-pci.c
new file mode 100644
index 0000000..402ab00
--- /dev/null
+++ b/drivers/misc/cxl/cxl-pci.c
@@ -0,0 +1,964 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/pci_regs.h>
+#include <linux/pci_ids.h>
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/sort.h>
+#include <linux/pci.h>
+#include <linux/of.h>
+#include <linux/delay.h>
+#include <asm/opal.h>
+#include <asm/msi_bitmap.h>
+#include <asm/pci-bridge.h> /* for struct pci_controller */
+#include <asm/pnv-pci.h>
+
+#include "cxl.h"
+
+
+#define CXL_PCI_VSEC_ID	0x1280
+#define CXL_VSEC_MIN_SIZE 0x80
+
+#define CXL_READ_VSEC_LENGTH(dev, vsec, dest)			\
+	do {							\
+		pci_read_config_word(dev, vsec + 0x6, dest);	\
+		*dest >>= 4;					\
+	} while (0)
+#define CXL_READ_VSEC_NAFUS(dev, vsec, dest) \
+	pci_read_config_byte(dev, vsec + 0x8, dest)
+
+#define CXL_READ_VSEC_STATUS(dev, vsec, dest) \
+	pci_read_config_byte(dev, vsec + 0x9, dest)
+#define CXL_STATUS_SECOND_PORT  0x80
+#define CXL_STATUS_MSI_X_FULL   0x40
+#define CXL_STATUS_MSI_X_SINGLE 0x20
+#define CXL_STATUS_FLASH_RW     0x08
+#define CXL_STATUS_FLASH_RO     0x04
+#define CXL_STATUS_LOADABLE_AFU 0x02
+#define CXL_STATUS_LOADABLE_PSL 0x01
+/* If we see these features we won't try to use the card */
+#define CXL_UNSUPPORTED_FEATURES \
+	(CXL_STATUS_MSI_X_FULL | CXL_STATUS_MSI_X_SINGLE)
+
+#define CXL_READ_VSEC_MODE_CONTROL(dev, vsec, dest) \
+	pci_read_config_byte(dev, vsec + 0xa, dest)
+#define CXL_WRITE_VSEC_MODE_CONTROL(dev, vsec, val) \
+	pci_write_config_byte(dev, vsec + 0xa, val)
+#define CXL_VSEC_PROTOCOL_MASK   0xe0
+#define CXL_VSEC_PROTOCOL_256TB  0x80 /* POWER8 uses this */
+#define CXL_VSEC_PROTOCOL_512TB  0x40
+#define CXL_VSEC_PROTOCOL_1024TB 0x20
+#define CXL_VSEC_PROTOCOL_ENABLE 0x01
+
+#define CXL_READ_VSEC_PSL_REVISION(dev, vsec, dest) \
+	pci_read_config_word(dev, vsec + 0xc, dest)
+#define CXL_READ_VSEC_CAIA_MINOR(dev, vsec, dest) \
+	pci_read_config_byte(dev, vsec + 0xe, dest)
+#define CXL_READ_VSEC_CAIA_MAJOR(dev, vsec, dest) \
+	pci_read_config_byte(dev, vsec + 0xf, dest)
+#define CXL_READ_VSEC_BASE_IMAGE(dev, vsec, dest) \
+	pci_read_config_word(dev, vsec + 0x10, dest)
+
+#define CXL_READ_VSEC_IMAGE_STATE(dev, vsec, dest) \
+	pci_read_config_byte(dev, vsec + 0x13, dest)
+#define CXL_WRITE_VSEC_IMAGE_STATE(dev, vsec, val) \
+	pci_write_config_byte(dev, vsec + 0x13, val)
+#define CXL_VSEC_USER_IMAGE_LOADED 0x80 /* RO */
+#define CXL_VSEC_PERST_LOADS_IMAGE 0x20 /* RW */
+#define CXL_VSEC_PERST_SELECT_USER 0x10 /* RW */
+
+#define CXL_READ_VSEC_AFU_DESC_OFF(dev, vsec, dest) \
+	pci_read_config_dword(dev, vsec + 0x20, dest)
+#define CXL_READ_VSEC_AFU_DESC_SIZE(dev, vsec, dest) \
+	pci_read_config_dword(dev, vsec + 0x24, dest)
+#define CXL_READ_VSEC_PS_OFF(dev, vsec, dest) \
+	pci_read_config_dword(dev, vsec + 0x28, dest)
+#define CXL_READ_VSEC_PS_SIZE(dev, vsec, dest) \
+	pci_read_config_dword(dev, vsec + 0x2c, dest)
+
+
+/* This works a little differently from the p1/p2 register accesses, to make
+ * it easier to pull out individual fields */
+#define AFUD_READ(afu, off)		_cxl_reg_read(afu->afu_desc_mmio + off)
+#define EXTRACT_PPC_BIT(val, bit)	(!!((val) & PPC_BIT(bit)))
+#define EXTRACT_PPC_BITS(val, bs, be)	(((val) & PPC_BITMASK(bs, be)) >> PPC_BITLSHIFT(be))
+
+#define AFUD_READ_INFO(afu)		AFUD_READ(afu, 0x0)
+#define   AFUD_NUM_INTS_PER_PROC(val)	EXTRACT_PPC_BITS(val,  0, 15)
+#define   AFUD_NUM_PROCS(val)		EXTRACT_PPC_BITS(val, 16, 31)
+#define   AFUD_NUM_CRS(val)		EXTRACT_PPC_BITS(val, 32, 47)
+#define   AFUD_MULTIMODEL(val)		EXTRACT_PPC_BIT(val, 48)
+#define   AFUD_PUSH_BLOCK_TRANSFER(val)	EXTRACT_PPC_BIT(val, 55)
+#define   AFUD_DEDICATED_PROCESS(val)	EXTRACT_PPC_BIT(val, 59)
+#define   AFUD_AFU_DIRECTED(val)	EXTRACT_PPC_BIT(val, 61)
+#define   AFUD_TIME_SLICED(val)		EXTRACT_PPC_BIT(val, 63)
+#define AFUD_READ_CR(afu)		AFUD_READ(afu, 0x20)
+#define   AFUD_CR_LEN(val)		EXTRACT_PPC_BITS(val, 8, 63)
+#define AFUD_READ_CR_OFF(afu)		AFUD_READ(afu, 0x28)
+#define AFUD_READ_PPPSA(afu)		AFUD_READ(afu, 0x30)
+#define   AFUD_PPPSA_PP(val)		EXTRACT_PPC_BIT(val, 6)
+#define   AFUD_PPPSA_PSA(val)		EXTRACT_PPC_BIT(val, 7)
+#define   AFUD_PPPSA_LEN(val)		EXTRACT_PPC_BITS(val, 8, 63)
+#define AFUD_READ_PPPSA_OFF(afu)	AFUD_READ(afu, 0x38)
+#define AFUD_READ_EB(afu)		AFUD_READ(afu, 0x40)
+#define   AFUD_EB_LEN(val)		EXTRACT_PPC_BITS(val, 8, 63)
+#define AFUD_READ_EB_OFF(afu)		AFUD_READ(afu, 0x48)
+
+static const struct pci_device_id cxl_pci_tbl[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0477), },
+	{ PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x044b), },
+	{ PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x04cf), },
+	{ PCI_DEVICE_CLASS(0x120000, ~0), },
+
+	{ }
+};
+MODULE_DEVICE_TABLE(pci, cxl_pci_tbl);
+
+
+/* Mostly using these wrappers to avoid confusion:
+ * priv 1 is BAR2, while priv 2 is BAR0 */
+static inline resource_size_t p1_base(struct pci_dev *dev)
+{
+	return pci_resource_start(dev, 2);
+}
+
+static inline resource_size_t p1_size(struct pci_dev *dev)
+{
+	return pci_resource_len(dev, 2);
+}
+
+static inline resource_size_t p2_base(struct pci_dev *dev)
+{
+	return pci_resource_start(dev, 0);
+}
+
+static inline resource_size_t p2_size(struct pci_dev *dev)
+{
+	return pci_resource_len(dev, 0);
+}
+
+static int find_cxl_vsec(struct pci_dev *dev)
+{
+	int vsec = 0;
+	u16 val;
+
+	while ((vsec = pci_find_next_ext_capability(dev, vsec, PCI_EXT_CAP_ID_VNDR))) {
+		pci_read_config_word(dev, vsec + 0x4, &val);
+		if (val == CXL_PCI_VSEC_ID)
+			return vsec;
+	}
+	return 0;
+
+}
+
+static void dump_cxl_config_space(struct pci_dev *dev)
+{
+	int vsec;
+	u32 val;
+
+	dev_info(&dev->dev, "dump_cxl_config_space\n");
+
+	pci_read_config_dword(dev, PCI_BASE_ADDRESS_0, &val);
+	dev_info(&dev->dev, "BAR0: %#.8x\n", val);
+	pci_read_config_dword(dev, PCI_BASE_ADDRESS_1, &val);
+	dev_info(&dev->dev, "BAR1: %#.8x\n", val);
+	pci_read_config_dword(dev, PCI_BASE_ADDRESS_2, &val);
+	dev_info(&dev->dev, "BAR2: %#.8x\n", val);
+	pci_read_config_dword(dev, PCI_BASE_ADDRESS_3, &val);
+	dev_info(&dev->dev, "BAR3: %#.8x\n", val);
+	pci_read_config_dword(dev, PCI_BASE_ADDRESS_4, &val);
+	dev_info(&dev->dev, "BAR4: %#.8x\n", val);
+	pci_read_config_dword(dev, PCI_BASE_ADDRESS_5, &val);
+	dev_info(&dev->dev, "BAR5: %#.8x\n", val);
+
+	dev_info(&dev->dev, "p1 regs: %#llx, len: %#llx\n",
+		p1_base(dev), p1_size(dev));
+	dev_info(&dev->dev, "p2 regs: %#llx, len: %#llx\n",
+		p2_base(dev), p2_size(dev));
+	dev_info(&dev->dev, "BAR 4/5: %#llx, len: %#llx\n",
+		pci_resource_start(dev, 4), pci_resource_len(dev, 4));
+
+	if (!(vsec = find_cxl_vsec(dev)))
+		return;
+
+#define show_reg(name, what) \
+	dev_info(&dev->dev, "cxl vsec: %30s: %#x\n", name, what)
+
+	pci_read_config_dword(dev, vsec + 0x0, &val);
+	show_reg("Cap ID", (val >> 0) & 0xffff);
+	show_reg("Cap Ver", (val >> 16) & 0xf);
+	show_reg("Next Cap Ptr", (val >> 20) & 0xfff);
+	pci_read_config_dword(dev, vsec + 0x4, &val);
+	show_reg("VSEC ID", (val >> 0) & 0xffff);
+	show_reg("VSEC Rev", (val >> 16) & 0xf);
+	show_reg("VSEC Length",	(val >> 20) & 0xfff);
+	pci_read_config_dword(dev, vsec + 0x8, &val);
+	show_reg("Num AFUs", (val >> 0) & 0xff);
+	show_reg("Status", (val >> 8) & 0xff);
+	show_reg("Mode Control", (val >> 16) & 0xff);
+	show_reg("Reserved", (val >> 24) & 0xff);
+	pci_read_config_dword(dev, vsec + 0xc, &val);
+	show_reg("PSL Rev", (val >> 0) & 0xffff);
+	show_reg("CAIA Ver", (val >> 16) & 0xffff);
+	pci_read_config_dword(dev, vsec + 0x10, &val);
+	show_reg("Base Image Rev", (val >> 0) & 0xffff);
+	show_reg("Reserved", (val >> 16) & 0x0fff);
+	show_reg("Image Control", (val >> 28) & 0x3);
+	show_reg("Reserved", (val >> 30) & 0x1);
+	show_reg("Image Loaded", (val >> 31) & 0x1);
+
+	pci_read_config_dword(dev, vsec + 0x14, &val);
+	show_reg("Reserved", val);
+	pci_read_config_dword(dev, vsec + 0x18, &val);
+	show_reg("Reserved", val);
+	pci_read_config_dword(dev, vsec + 0x1c, &val);
+	show_reg("Reserved", val);
+
+	pci_read_config_dword(dev, vsec + 0x20, &val);
+	show_reg("AFU Descriptor Offset", val);
+	pci_read_config_dword(dev, vsec + 0x24, &val);
+	show_reg("AFU Descriptor Size", val);
+	pci_read_config_dword(dev, vsec + 0x28, &val);
+	show_reg("Problem State Offset", val);
+	pci_read_config_dword(dev, vsec + 0x2c, &val);
+	show_reg("Problem State Size", val);
+
+	pci_read_config_dword(dev, vsec + 0x30, &val);
+	show_reg("Reserved", val);
+	pci_read_config_dword(dev, vsec + 0x34, &val);
+	show_reg("Reserved", val);
+	pci_read_config_dword(dev, vsec + 0x38, &val);
+	show_reg("Reserved", val);
+	pci_read_config_dword(dev, vsec + 0x3c, &val);
+	show_reg("Reserved", val);
+
+	pci_read_config_dword(dev, vsec + 0x40, &val);
+	show_reg("PSL Programming Port", val);
+	pci_read_config_dword(dev, vsec + 0x44, &val);
+	show_reg("PSL Programming Control", val);
+
+	pci_read_config_dword(dev, vsec + 0x48, &val);
+	show_reg("Reserved", val);
+	pci_read_config_dword(dev, vsec + 0x4c, &val);
+	show_reg("Reserved", val);
+
+	pci_read_config_dword(dev, vsec + 0x50, &val);
+	show_reg("Flash Address Register", val);
+	pci_read_config_dword(dev, vsec + 0x54, &val);
+	show_reg("Flash Size Register", val);
+	pci_read_config_dword(dev, vsec + 0x58, &val);
+	show_reg("Flash Status/Control Register", val);
+	pci_read_config_dword(dev, vsec + 0x5c, &val);
+	show_reg("Flash Data Port", val);
+
+#undef show_reg
+}
+
+static void dump_afu_descriptor(struct cxl_afu_t *afu)
+{
+	u64 val;
+
+#define show_reg(name, what) \
+	dev_info(&afu->dev, "afu desc: %30s: %#llx\n", name, what)
+
+	val = AFUD_READ_INFO(afu);
+	show_reg("num_ints_per_process", AFUD_NUM_INTS_PER_PROC(val));
+	show_reg("num_of_processes", AFUD_NUM_PROCS(val));
+	show_reg("num_of_afu_CRs", AFUD_NUM_CRS(val));
+	show_reg("req_prog_model", val & 0xffffULL);
+
+	val = AFUD_READ(afu, 0x8);
+	show_reg("Reserved", val);
+	val = AFUD_READ(afu, 0x10);
+	show_reg("Reserved", val);
+	val = AFUD_READ(afu, 0x18);
+	show_reg("Reserved", val);
+
+	val = AFUD_READ_CR(afu);
+	show_reg("Reserved", (val >> (63-7)) & 0xff);
+	show_reg("AFU_CR_len", AFUD_CR_LEN(val));
+
+	val = AFUD_READ_CR_OFF(afu);
+	show_reg("AFU_CR_offset", val);
+
+	val = AFUD_READ_PPPSA(afu);
+	show_reg("PerProcessPSA_control", (val >> (63-7)) & 0xff);
+	show_reg("PerProcessPSA Length", AFUD_PPPSA_LEN(val));
+
+	val = AFUD_READ_PPPSA_OFF(afu);
+	show_reg("PerProcessPSA_offset", val);
+
+	val = AFUD_READ_EB(afu);
+	show_reg("Reserved", (val >> (63-7)) & 0xff);
+	show_reg("AFU_EB_len", AFUD_EB_LEN(val));
+
+	val = AFUD_READ_EB_OFF(afu);
+	show_reg("AFU_EB_offset", val);
+
+#undef show_reg
+}
+
+extern struct device_node *pnv_pci_to_phb_node(struct pci_dev *dev);
+
+static int init_implementation_adapter_regs(struct cxl_t *adapter, struct pci_dev *dev)
+{
+	struct device_node *np;
+	const __be32 *prop;
+	u64 psl_dsnctl;
+	u64 chipid;
+
+	if (!(np = pnv_pci_to_phb_node(dev)))
+		return -ENODEV;
+
+	while (np && !(prop = of_get_property(np, "ibm,chip-id", NULL)))
+		np = of_get_next_parent(np);
+	if (!np)
+		return -ENODEV;
+	chipid = be32_to_cpup(prop);
+	of_node_put(np);
+
+	/* Tell PSL where to route data to */
+	psl_dsnctl = 0x02E8900002000000ULL | (chipid << (63-5));
+	cxl_p1_write(adapter, CXL_PSL_DSNDCTL, psl_dsnctl);
+	cxl_p1_write(adapter, CXL_PSL_RESLCKTO, 0x20000000200ULL);
+	/* snoop write mask */
+	cxl_p1_write(adapter, CXL_PSL_SNWRALLOC, 0x00000000FFFFFFFFULL);
+	/* set fir_accum */
+	cxl_p1_write(adapter, CXL_PSL_FIR_CNTL, 0x0800000000000000ULL);
+	/* for debugging with trace arrays */
+	cxl_p1_write(adapter, CXL_PSL_TRACE, 0x0000FF7C00000000ULL);
+
+	return 0;
+}
+
+static int init_implementation_afu_regs(struct cxl_afu_t *afu)
+{
+	/* read/write masks for this slice */
+	cxl_p1n_write(afu, CXL_PSL_APCALLOC_A, 0xFFFFFFFEFEFEFEFEULL);
+	/* APC read/write masks for this slice */
+	cxl_p1n_write(afu, CXL_PSL_COALLOC_A, 0xFF000000FEFEFEFEULL);
+	/* for debugging with trace arrays */
+	cxl_p1n_write(afu, CXL_PSL_SLICE_TRACE, 0x0000FFFF00000000ULL);
+	cxl_p1n_write(afu, CXL_PSL_RXCTL_A, 0xF000000000000000ULL);
+
+	return 0;
+}
+
+static int setup_cxl_msi(struct cxl_t *adapter, unsigned int hwirq,
+			 unsigned int virq)
+{
+	struct pci_dev *dev = to_pci_dev(adapter->dev.parent);
+
+	return pnv_cxl_ioda_msi_setup(dev, hwirq, virq);
+}
+
+static int alloc_one_hwirq(struct cxl_t *adapter)
+{
+	struct pci_dev *dev = to_pci_dev(adapter->dev.parent);
+
+	return pnv_cxl_alloc_hwirqs(dev, 1);
+}
+
+static void release_one_hwirq(struct cxl_t *adapter, int hwirq)
+{
+	struct pci_dev *dev = to_pci_dev(adapter->dev.parent);
+
+	pnv_cxl_release_hwirqs(dev, hwirq, 1);
+}
+
+static int alloc_hwirq_ranges(struct cxl_irq_ranges *irqs, struct cxl_t *adapter, unsigned int num)
+{
+	struct pci_dev *dev = to_pci_dev(adapter->dev.parent);
+
+	return pnv_cxl_alloc_hwirq_ranges(irqs, dev, num);
+}
+
+static void release_hwirq_ranges(struct cxl_irq_ranges *irqs, struct cxl_t *adapter)
+{
+	struct pci_dev *dev = to_pci_dev(adapter->dev.parent);
+
+	pnv_cxl_release_hwirq_ranges(irqs, dev);
+
+}
+
+
+static struct cxl_driver_ops cxl_pci_driver_ops = {
+	.module = THIS_MODULE,
+	.alloc_one_irq = alloc_one_hwirq,
+	.release_one_irq = release_one_hwirq,
+	.alloc_irq_ranges = alloc_hwirq_ranges,
+	.release_irq_ranges = release_hwirq_ranges,
+	.setup_irq = setup_cxl_msi,
+};
+
+static int setup_cxl_bars(struct pci_dev *dev)
+{
+	/* Safety check in case we get backported to < 3.17 without M64 */
+	if ((p1_base(dev) < 0x100000000ULL) ||
+	    (p2_base(dev) < 0x100000000ULL)) {
+		dev_err(&dev->dev, "ABORTING: M32 BAR assignment incompatible with CXL\n");
+		return -ENODEV;
+	}
+
+	/* BAR 4/5 has a special meaning for CXL and must be programmed with a
+	 * special value corresponding to the CXL protocol address range.
+	 * For POWER8 that means bits 48:49 must be set to 0b10 */
+	pci_write_config_dword(dev, PCI_BASE_ADDRESS_4, 0x00000000);
+	pci_write_config_dword(dev, PCI_BASE_ADDRESS_5, 0x00020000);
+
+	return 0;
+}
+
+/*
+ *  pciex node: ibm,opal-m64-window = <0x3d058 0x0 0x3d058 0x0 0x8 0x0>;
+ */
+
+static int switch_card_to_cxl(struct pci_dev *dev)
+{
+	int vsec;
+	u8 val;
+	int rc;
+
+	dev_info(&dev->dev, "switch card to CXL\n");
+
+	if (!(vsec = find_cxl_vsec(dev))) {
+		dev_err(&dev->dev, "ABORTING: CXL VSEC not found!\n");
+		return -ENODEV;
+	}
+
+	if ((rc = CXL_READ_VSEC_MODE_CONTROL(dev, vsec, &val))) {
+		dev_err(&dev->dev, "failed to read current mode control: %i\n", rc);
+		return rc;
+	}
+	val &= ~CXL_VSEC_PROTOCOL_MASK;
+	val |= CXL_VSEC_PROTOCOL_256TB | CXL_VSEC_PROTOCOL_ENABLE;
+	if ((rc = CXL_WRITE_VSEC_MODE_CONTROL(dev, vsec, val))) {
+		dev_err(&dev->dev, "failed to enable CXL protocol: %i\n", rc);
+		return rc;
+	}
+	/* The CAIA spec (v0.12 11.6 Bi-modal Device Support) states
+	 * we must wait 100ms after this mode switch before touching
+	 * PCIe config space.
+	 */
+	msleep(100);
+
+	return 0;
+}
+
+static int cxl_map_slice_regs(struct cxl_afu_t *afu, struct cxl_t *adapter, struct pci_dev *dev)
+{
+	u64 p1n_base, p2n_base, afu_desc;
+	const u64 p1n_size = 0x100;
+	const u64 p2n_size = 0x1000;
+
+	p1n_base = p1_base(dev) + 0x10000 + (afu->slice * p1n_size);
+	p2n_base = p2_base(dev) + (afu->slice * p2n_size);
+	afu->psn_phys = p2_base(dev) + (adapter->ps_off + (afu->slice * adapter->ps_size));
+	afu_desc = p2_base(dev) + adapter->afu_desc_off + (afu->slice * adapter->afu_desc_size);
+
+	if (!(afu->p1n_mmio = ioremap(p1n_base, p1n_size)))
+		goto err;
+	if (!(afu->p2n_mmio = ioremap(p2n_base, p2n_size)))
+		goto err1;
+	if (afu_desc) {
+		if (!(afu->afu_desc_mmio = ioremap(afu_desc, adapter->afu_desc_size)))
+			goto err2;
+	}
+
+	return 0;
+err2:
+	iounmap(afu->p2n_mmio);
+err1:
+	iounmap(afu->p1n_mmio);
+err:
+	dev_err(&afu->dev, "Error mapping AFU MMIO regions\n");
+	return -ENOMEM;
+}
+
+static void cxl_unmap_slice_regs(struct cxl_afu_t *afu)
+{
+	if (afu->p2n_mmio)
+		iounmap(afu->p2n_mmio);
+	if (afu->p1n_mmio)
+		iounmap(afu->p1n_mmio);
+}
+
+static void cxl_release_afu(struct device *dev)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(dev);
+
+	pr_devel("cxl_release_afu\n");
+
+	kfree(afu);
+}
+
+static struct cxl_afu_t *cxl_alloc_afu(struct cxl_t *adapter, int slice)
+{
+	struct cxl_afu_t *afu;
+
+	if (!(afu = kzalloc(sizeof(struct cxl_afu_t), GFP_KERNEL)))
+		return NULL;
+
+	afu->adapter = adapter;
+	afu->dev.parent = &adapter->dev;
+	afu->dev.release = cxl_release_afu;
+	afu->slice = slice;
+	idr_init(&afu->contexts_idr);
+	spin_lock_init(&afu->contexts_lock);
+	spin_lock_init(&afu->afu_cntl_lock);
+	mutex_init(&afu->spa_mutex);
+
+	afu->prefault_mode = CXL_PREFAULT_NONE;
+	afu->irqs_max = afu->adapter->user_irqs;
+
+	return afu;
+}
+
+/* Expects AFU struct to have recently been zeroed out */
+static int cxl_read_afu_descriptor(struct cxl_afu_t *afu)
+{
+	u64 val;
+
+	val = AFUD_READ_INFO(afu);
+	afu->pp_irqs = AFUD_NUM_INTS_PER_PROC(val);
+	afu->max_procs_virtualised = AFUD_NUM_PROCS(val);
+
+	if (AFUD_AFU_DIRECTED(val))
+		afu->models_supported |= CXL_MODEL_DIRECTED;
+	if (AFUD_DEDICATED_PROCESS(val))
+		afu->models_supported |= CXL_MODEL_DEDICATED;
+	if (AFUD_TIME_SLICED(val))
+		afu->models_supported |= CXL_MODEL_TIME_SLICED;
+
+	val = AFUD_READ_PPPSA(afu);
+	afu->pp_size = AFUD_PPPSA_LEN(val) * 4096;
+	afu->psa = AFUD_PPPSA_PSA(val);
+	if ((afu->pp_psa = AFUD_PPPSA_PP(val)))
+		afu->pp_offset = AFUD_READ_PPPSA_OFF(afu);
+
+	return 0;
+}
+
+static int cxl_afu_descriptor_looks_ok(struct cxl_afu_t *afu)
+{
+	if (afu->psa && afu->adapter->ps_size <
+			(afu->pp_offset + afu->pp_size*afu->max_procs_virtualised)) {
+		dev_err(&afu->dev, "per-process PSA can't fit inside the PSA!\n");
+		return -ENODEV;
+	}
+
+	if (afu->pp_psa && (afu->pp_size < PAGE_SIZE))
+		dev_warn(&afu->dev, "AFU uses < PAGE_SIZE per-process PSA!");
+
+	return 0;
+}
+
+static int sanitise_afu_regs(struct cxl_afu_t *afu)
+{
+	cxl_p1_write(afu->adapter, CXL_PSL_ErrIVTE, 0x0000000000000000);
+	cxl_p1n_write(afu, CXL_PSL_SERR_An, 0x0000000000000000);
+	cxl_p1n_write(afu, CXL_PSL_IVTE_Offset_An, 0x0000000000000000);
+	cxl_ops->slbia(afu);
+
+	return 0;
+}
+
+static int cxl_init_afu(struct cxl_t *adapter, int slice, struct pci_dev *dev)
+{
+	struct cxl_afu_t *afu;
+	bool free = true;
+	int rc;
+
+	if (!(afu = cxl_alloc_afu(adapter, slice)))
+		return -ENOMEM;
+
+	if ((rc = dev_set_name(&afu->dev, "afu%i.%i", adapter->adapter_num, slice)))
+		goto err1;
+
+	if ((rc = cxl_map_slice_regs(afu, adapter, dev)))
+		goto err1;
+
+	if ((rc = sanitise_afu_regs(afu)))
+		goto err2;
+
+	/* We need to reset the AFU before we can read the AFU descriptor */
+	if ((rc = cxl_ops->afu_reset(afu)))
+		goto err2;
+
+	if (cxl_verbose)
+		dump_afu_descriptor(afu);
+
+	if ((rc = cxl_read_afu_descriptor(afu)))
+		goto err2;
+
+	if ((rc = cxl_afu_descriptor_looks_ok(afu)))
+		goto err2;
+
+	if ((rc = init_implementation_afu_regs(afu)))
+		goto err2;
+
+	if ((rc = cxl_register_serr_irq(afu)))
+		goto err2;
+
+	if ((rc = cxl_register_psl_irq(afu)))
+		goto err3;
+
+	/* Don't care if this fails */
+	cxl_debugfs_afu_add(afu);
+
+	/* After we call this function we must not free the afu directly, even
+	 * if it returns an error! */
+	if ((rc = cxl_register_afu(afu)))
+		goto err_put1;
+
+	if ((rc = cxl_sysfs_afu_add(afu)))
+		goto err_put1;
+
+
+	if ((rc = cxl_afu_select_best_model(afu)))
+		goto err_put2;
+
+	adapter->afu[afu->slice] = afu;
+
+	return 0;
+
+err_put2:
+	cxl_sysfs_afu_remove(afu);
+err_put1:
+	device_unregister(&afu->dev);
+	free = false;
+	cxl_debugfs_afu_remove(afu);
+	cxl_release_psl_irq(afu);
+err3:
+	cxl_release_serr_irq(afu);
+err2:
+	cxl_unmap_slice_regs(afu);
+err1:
+	if (free)
+		kfree(afu);
+	return rc;
+}
+
+static void cxl_remove_afu(struct cxl_afu_t *afu)
+{
+	pr_devel("cxl_remove_afu\n");
+
+	if (!afu)
+		return;
+
+	cxl_sysfs_afu_remove(afu);
+	cxl_debugfs_afu_remove(afu);
+
+	spin_lock(&afu->adapter->afu_list_lock);
+	afu->adapter->afu[afu->slice] = NULL;
+	spin_unlock(&afu->adapter->afu_list_lock);
+
+	cxl_context_detach_all(afu);
+	cxl_afu_deactivate_model(afu);
+
+	cxl_release_psl_irq(afu);
+	cxl_release_serr_irq(afu);
+	cxl_unmap_slice_regs(afu);
+
+	device_unregister(&afu->dev);
+}
+
+
+static int cxl_map_adapter_regs(struct cxl_t *adapter, struct pci_dev *dev)
+{
+	if (pci_request_region(dev, 2, "priv 2 regs"))
+		goto err1;
+	if (pci_request_region(dev, 0, "priv 1 regs"))
+		goto err2;
+
+	pr_devel("cxl_map_adapter_regs: p1: %#.16llx %#llx, p2: %#.16llx %#llx\n",
+			p1_base(dev), p1_size(dev), p2_base(dev), p2_size(dev));
+
+	if (!(adapter->p1_mmio = ioremap(p1_base(dev), p1_size(dev))))
+		goto err3;
+
+	if (!(adapter->p2_mmio = ioremap(p2_base(dev), p2_size(dev))))
+		goto err4;
+
+	return 0;
+
+err4:
+	iounmap(adapter->p1_mmio);
+	adapter->p1_mmio = NULL;
+err3:
+	pci_release_region(dev, 0);
+err2:
+	pci_release_region(dev, 2);
+err1:
+	return -ENOMEM;
+}
+
+static void cxl_unmap_adapter_regs(struct cxl_t *adapter)
+{
+	if (adapter->p1_mmio)
+		iounmap(adapter->p1_mmio);
+	if (adapter->p2_mmio)
+		iounmap(adapter->p2_mmio);
+}
+
+static int cxl_read_vsec(struct cxl_t *adapter, struct pci_dev *dev)
+{
+	int vsec;
+	u32 afu_desc_off, afu_desc_size;
+	u32 ps_off, ps_size;
+	u16 vseclen;
+	u8 image_state;
+
+	if (!(vsec = find_cxl_vsec(dev))) {
+		dev_err(&adapter->dev, "ABORTING: CXL VSEC not found!\n");
+		return -ENODEV;
+	}
+
+	CXL_READ_VSEC_LENGTH(dev, vsec, &vseclen);
+	if (vseclen < CXL_VSEC_MIN_SIZE) {
+		pr_err("ABORTING: CXL VSEC too short\n");
+		return -EINVAL;
+	}
+
+	CXL_READ_VSEC_STATUS(dev, vsec, &adapter->vsec_status);
+	CXL_READ_VSEC_PSL_REVISION(dev, vsec, &adapter->psl_rev);
+	CXL_READ_VSEC_CAIA_MAJOR(dev, vsec, &adapter->caia_major);
+	CXL_READ_VSEC_CAIA_MINOR(dev, vsec, &adapter->caia_minor);
+	CXL_READ_VSEC_BASE_IMAGE(dev, vsec, &adapter->base_image);
+	CXL_READ_VSEC_IMAGE_STATE(dev, vsec, &image_state);
+	adapter->user_image_loaded = !!(image_state & CXL_VSEC_USER_IMAGE_LOADED);
+	adapter->perst_loads_image = !!(image_state & CXL_VSEC_PERST_LOADS_IMAGE);
+	adapter->perst_select_user = !!(image_state & CXL_VSEC_PERST_SELECT_USER);
+
+	CXL_READ_VSEC_NAFUS(dev, vsec, &adapter->slices);
+	CXL_READ_VSEC_AFU_DESC_OFF(dev, vsec, &afu_desc_off);
+	CXL_READ_VSEC_AFU_DESC_SIZE(dev, vsec, &afu_desc_size);
+	CXL_READ_VSEC_PS_OFF(dev, vsec, &ps_off);
+	CXL_READ_VSEC_PS_SIZE(dev, vsec, &ps_size);
+
+	/* Convert everything to bytes, because there is NO WAY I'd look at the
+	 * code a month later and forget what units these are in ;-) */
+	adapter->ps_off = ps_off * 64 * 1024;
+	adapter->ps_size = ps_size * 64 * 1024;
+	adapter->afu_desc_off = afu_desc_off * 64 * 1024;
+	adapter->afu_desc_size = afu_desc_size * 64 * 1024;
+
+	/* Total IRQs - 1 PSL ERROR - #AFU*(1 slice error + 1 DSI) */
+	adapter->user_irqs = pnv_cxl_get_irq_count(dev) - 1 - 2*adapter->slices;
+
+	return 0;
+}
+
+static int cxl_vsec_looks_ok(struct cxl_t *adapter, struct pci_dev *dev)
+{
+	if (adapter->vsec_status & CXL_STATUS_SECOND_PORT)
+		return -EBUSY;
+
+	if (adapter->vsec_status & CXL_UNSUPPORTED_FEATURES) {
+		dev_err(&adapter->dev, "ABORTING: CXL requires unsupported features\n");
+		return -EINVAL;
+	}
+
+	if (!adapter->slices) {
+		/* Once we support dynamic reprogramming we can use the card if
+		 * it supports loadable AFUs */
+		dev_err(&adapter->dev, "ABORTING: Device has no AFUs\n");
+		return -EINVAL;
+	}
+
+	if (!adapter->afu_desc_off || !adapter->afu_desc_size) {
+		dev_err(&adapter->dev, "ABORTING: VSEC shows no AFU descriptors\n");
+		return -EINVAL;
+	}
+
+	if (adapter->ps_size > p2_size(dev) - adapter->ps_off) {
+		dev_err(&adapter->dev, "ABORTING: Problem state size larger than "
+				   "available in BAR2: 0x%llx > 0x%llx\n",
+			 adapter->ps_size, p2_size(dev) - adapter->ps_off);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static void cxl_release_adapter(struct device *dev)
+{
+	struct cxl_t *adapter = to_cxl_adapter(dev);
+
+	pr_devel("cxl_release_adapter\n");
+
+	kfree(adapter);
+}
+
+static struct cxl_t *cxl_alloc_adapter(struct pci_dev *dev)
+{
+	struct cxl_t *adapter;
+
+	if (!(adapter = kzalloc(sizeof(struct cxl_t), GFP_KERNEL)))
+		return NULL;
+
+	adapter->dev.parent = &dev->dev;
+	adapter->dev.release = cxl_release_adapter;
+	adapter->driver = &cxl_pci_driver_ops;
+	pci_set_drvdata(dev, adapter);
+	spin_lock_init(&adapter->afu_list_lock);
+
+	return adapter;
+}
+
+static struct cxl_t *cxl_init_adapter(struct pci_dev *dev)
+{
+	struct cxl_t *adapter;
+	bool free = true;
+	int rc;
+
+
+	if (!(adapter = cxl_alloc_adapter(dev)))
+		return ERR_PTR(-ENOMEM);
+
+	if ((rc = switch_card_to_cxl(dev)))
+		goto err1;
+
+	if ((rc = cxl_alloc_adapter_nr(adapter)))
+		goto err1;
+
+	if ((rc = dev_set_name(&adapter->dev, "card%i", adapter->adapter_num)))
+		goto err2;
+
+	if ((rc = cxl_read_vsec(adapter, dev)))
+		goto err2;
+
+	if ((rc = cxl_vsec_looks_ok(adapter, dev)))
+		goto err2;
+
+	if ((rc = cxl_map_adapter_regs(adapter, dev)))
+		goto err2;
+
+	/* TODO: cxl_ops->sanitise_adapter_regs(adapter); */
+
+	if ((rc = init_implementation_adapter_regs(adapter, dev)))
+		goto err3;
+
+	if ((rc = pnv_phb_to_cxl(dev)))
+		goto err3;
+
+	if ((rc = cxl_register_psl_err_irq(adapter)))
+		goto err3;
+
+	/* Don't care if this one fails: */
+	cxl_debugfs_adapter_add(adapter);
+
+	/* After we call this function we must not free the adapter directly,
+	 * even if it returns an error! */
+	if ((rc = cxl_register_adapter(adapter)))
+		goto err_put1;
+
+	if ((rc = cxl_sysfs_adapter_add(adapter)))
+		goto err_put1;
+
+	return adapter;
+
+err_put1:
+	device_unregister(&adapter->dev);
+	free = false;
+	cxl_debugfs_adapter_remove(adapter);
+	cxl_release_psl_err_irq(adapter);
+err3:
+	cxl_unmap_adapter_regs(adapter);
+err2:
+	cxl_remove_adapter_nr(adapter);
+err1:
+	if (free)
+		kfree(adapter);
+	return ERR_PTR(rc);
+}
+
+static void cxl_remove_adapter(struct cxl_t *adapter)
+{
+	struct pci_dev *pdev = to_pci_dev(adapter->dev.parent);
+
+	pr_devel("cxl_remove_adapter\n");
+
+	cxl_sysfs_adapter_remove(adapter);
+	cxl_debugfs_adapter_remove(adapter);
+	cxl_release_psl_err_irq(adapter);
+	cxl_unmap_adapter_regs(adapter);
+	cxl_remove_adapter_nr(adapter);
+
+	device_unregister(&adapter->dev);
+
+	pci_release_region(pdev, 0);
+	pci_release_region(pdev, 2);
+	pci_disable_device(pdev);
+}
+
+static int cxl_probe(struct pci_dev *dev, const struct pci_device_id *id)
+{
+	struct cxl_t *adapter;
+	int slice;
+	int rc;
+
+	pci_dev_get(dev);
+
+	if (cxl_verbose)
+		dump_cxl_config_space(dev);
+
+	if ((rc = setup_cxl_bars(dev)))
+		return rc;
+
+	if ((rc = pci_enable_device(dev))) {
+		dev_err(&dev->dev, "pci_enable_device failed: %i\n", rc);
+		return rc;
+	}
+
+	adapter = cxl_init_adapter(dev);
+	if (IS_ERR(adapter)) {
+		dev_err(&dev->dev, "cxl_init_adapter failed: %li\n", PTR_ERR(adapter));
+		return PTR_ERR(adapter);
+	}
+
+	for (slice = 0; slice < adapter->slices; slice++) {
+		if ((rc = cxl_init_afu(adapter, slice, dev)))
+			dev_err(&dev->dev, "AFU %i failed to initialise: %i\n", slice, rc);
+	}
+
+	return 0;
+}
+
+static void cxl_remove(struct pci_dev *dev)
+{
+	struct cxl_t *adapter = pci_get_drvdata(dev);
+	int afu;
+
+	dev_warn(&dev->dev, "pci remove\n");
+
+	/* Lock to prevent someone grabbing a ref through the adapter list as
+	 * we are removing it */
+	for (afu = 0; afu < adapter->slices; afu++)
+		cxl_remove_afu(adapter->afu[afu]);
+	cxl_remove_adapter(adapter);
+}
+
+static struct pci_driver cxl_pci_driver = {
+	.name = "cxl-pci",
+	.id_table = cxl_pci_tbl,
+	.probe = cxl_probe,
+	.remove = cxl_remove,
+};
+
+module_driver(cxl_pci_driver, pci_register_driver, pci_unregister_driver);
+
+MODULE_DESCRIPTION("IBM Coherent Accelerator");
+MODULE_AUTHOR("Ian Munsie <imunsie@au1.ibm.com>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
new file mode 100644
index 0000000..87984cb
--- /dev/null
+++ b/drivers/misc/cxl/cxl.h
@@ -0,0 +1,605 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _CXL_H_
+#define _CXL_H_
+
+#include <linux/interrupt.h>
+#include <linux/semaphore.h>
+#include <linux/device.h>
+#include <linux/types.h>
+#include <linux/cdev.h>
+#include <linux/pid.h>
+#include <linux/io.h>
+#include <asm/cputable.h>
+#include <asm/mmu.h>
+#include <asm/reg.h>
+#include <misc/cxl.h>
+
+#include <uapi/misc/cxl.h>
+
+extern uint cxl_verbose;
+
+#define CXL_TIMEOUT 5
+
+/* Opaque types to avoid accidentally passing registers for the wrong MMIO
+ *
+ * At the end of the day, I'm not married to using typedef here, but it might
+ * (and has!) help avoid bugs like mixing up CXL_PSL_CtxTime and
+ * CXL_PSL_CtxTime_An, or calling cxl_p1n_write instead of cxl_p1_write.
+ *
+ * I'm quite happy if these are changed back to #defines before upstreaming, it
+ * should be little more than a regexp search+replace operation in this file.
+ */
+typedef struct {
+	const int x;
+} cxl_p1_reg_t;
+typedef struct {
+	const int x;
+} cxl_p1n_reg_t;
+typedef struct {
+	const int x;
+} cxl_p2n_reg_t;
+#define cxl_reg_off(reg) \
+	(reg.x)
+
+/* Memory maps. Ref CXL Appendix A */
+
+/* PSL Privilege 1 Memory Map */
+/* Configuration and Control area */
+static const cxl_p1_reg_t CXL_PSL_CtxTime = {0x0000};
+static const cxl_p1_reg_t CXL_PSL_ErrIVTE = {0x0008};
+static const cxl_p1_reg_t CXL_PSL_KEY1    = {0x0010};
+static const cxl_p1_reg_t CXL_PSL_KEY2    = {0x0018};
+static const cxl_p1_reg_t CXL_PSL_Control = {0x0020};
+/* Downloading */
+static const cxl_p1_reg_t CXL_PSL_DLCNTL  = {0x0060};
+static const cxl_p1_reg_t CXL_PSL_DLADDR  = {0x0068};
+
+/* PSL Lookaside Buffer Management Area */
+static const cxl_p1_reg_t CXL_PSL_LBISEL  = {0x0080};
+static const cxl_p1_reg_t CXL_PSL_SLBIE   = {0x0088};
+static const cxl_p1_reg_t CXL_PSL_SLBIA   = {0x0090};
+static const cxl_p1_reg_t CXL_PSL_TLBIE   = {0x00A0};
+static const cxl_p1_reg_t CXL_PSL_TLBIA   = {0x00A8};
+static const cxl_p1_reg_t CXL_PSL_AFUSEL  = {0x00B0};
+
+/* 0x00C0:7EFF Implementation dependent area */
+static const cxl_p1_reg_t CXL_PSL_FIR1      = {0x0100};
+static const cxl_p1_reg_t CXL_PSL_FIR2      = {0x0108};
+static const cxl_p1_reg_t CXL_PSL_VERSION   = {0x0118};
+static const cxl_p1_reg_t CXL_PSL_RESLCKTO  = {0x0128};
+static const cxl_p1_reg_t CXL_PSL_FIR_CNTL  = {0x0148};
+static const cxl_p1_reg_t CXL_PSL_DSNDCTL   = {0x0150};
+static const cxl_p1_reg_t CXL_PSL_SNWRALLOC = {0x0158};
+static const cxl_p1_reg_t CXL_PSL_TRACE     = {0x0170};
+/* 0x7F00:7FFF Reserved PCIe MSI-X Pending Bit Array area */
+/* 0x8000:FFFF Reserved PCIe MSI-X Table Area */
+
+/* PSL Slice Privilege 1 Memory Map */
+/* Configuration Area */
+static const cxl_p1n_reg_t CXL_PSL_SR_An          = {0x00};
+static const cxl_p1n_reg_t CXL_PSL_LPID_An        = {0x08};
+static const cxl_p1n_reg_t CXL_PSL_AMBAR_An       = {0x10};
+static const cxl_p1n_reg_t CXL_PSL_SPOffset_An    = {0x18};
+static const cxl_p1n_reg_t CXL_PSL_ID_An          = {0x20};
+static const cxl_p1n_reg_t CXL_PSL_SERR_An        = {0x28};
+/* Memory Management and Lookaside Buffer Management */
+static const cxl_p1n_reg_t CXL_PSL_SDR_An         = {0x30};
+static const cxl_p1n_reg_t CXL_PSL_AMOR_An        = {0x38};
+/* Pointer Area */
+static const cxl_p1n_reg_t CXL_HAURP_An           = {0x80};
+static const cxl_p1n_reg_t CXL_PSL_SPAP_An        = {0x88};
+static const cxl_p1n_reg_t CXL_PSL_LLCMD_An       = {0x90};
+/* Control Area */
+static const cxl_p1n_reg_t CXL_PSL_SCNTL_An       = {0xA0};
+static const cxl_p1n_reg_t CXL_PSL_CtxTime_An     = {0xA8};
+static const cxl_p1n_reg_t CXL_PSL_IVTE_Offset_An = {0xB0};
+static const cxl_p1n_reg_t CXL_PSL_IVTE_Limit_An  = {0xB8};
+/* 0xC0:FF Implementation Dependent Area */
+static const cxl_p1n_reg_t CXL_PSL_FIR_SLICE_An   = {0xC0};
+static const cxl_p1n_reg_t CXL_AFU_DEBUG_An       = {0xC8};
+static const cxl_p1n_reg_t CXL_PSL_APCALLOC_A     = {0xD0};
+static const cxl_p1n_reg_t CXL_PSL_COALLOC_A      = {0xD8};
+static const cxl_p1n_reg_t CXL_PSL_RXCTL_A        = {0xE0};
+static const cxl_p1n_reg_t CXL_PSL_SLICE_TRACE    = {0xE8};
+
+/* PSL Slice Privilege 2 Memory Map */
+/* Configuration and Control Area */
+static const cxl_p2n_reg_t CXL_PSL_PID_TID_An = {0x000};
+static const cxl_p2n_reg_t CXL_CSRP_An        = {0x008};
+static const cxl_p2n_reg_t CXL_AURP0_An       = {0x010};
+static const cxl_p2n_reg_t CXL_AURP1_An       = {0x018};
+static const cxl_p2n_reg_t CXL_SSTP0_An       = {0x020};
+static const cxl_p2n_reg_t CXL_SSTP1_An       = {0x028};
+static const cxl_p2n_reg_t CXL_PSL_AMR_An     = {0x030};
+/* Segment Lookaside Buffer Management */
+static const cxl_p2n_reg_t CXL_SLBIE_An       = {0x040};
+static const cxl_p2n_reg_t CXL_SLBIA_An       = {0x048};
+static const cxl_p2n_reg_t CXL_SLBI_Select_An = {0x050};
+/* Interrupt Registers */
+static const cxl_p2n_reg_t CXL_PSL_DSISR_An   = {0x060};
+static const cxl_p2n_reg_t CXL_PSL_DAR_An     = {0x068};
+static const cxl_p2n_reg_t CXL_PSL_DSR_An     = {0x070};
+static const cxl_p2n_reg_t CXL_PSL_TFC_An     = {0x078};
+static const cxl_p2n_reg_t CXL_PSL_PEHandle_An = {0x080};
+static const cxl_p2n_reg_t CXL_PSL_ErrStat_An = {0x088};
+/* AFU Registers */
+static const cxl_p2n_reg_t CXL_AFU_Cntl_An    = {0x090};
+static const cxl_p2n_reg_t CXL_AFU_ERR_An     = {0x098};
+/* Work Element Descriptor */
+static const cxl_p2n_reg_t CXL_PSL_WED_An     = {0x0A0};
+/* 0x0C0:FFF Implementation Dependent Area */
+
+#define CXL_PSL_SPAP_Addr 0x0ffffffffffff000ULL
+#define CXL_PSL_SPAP_Size 0x0000000000000ff0ULL
+#define CXL_PSL_SPAP_Size_Shift 4
+#define CXL_PSL_SPAP_V    0x0000000000000001ULL
+
+/****** CXL_PSL_DLCNTL *****************************************************/
+#define CXL_PSL_DLCNTL_D (0x1ull << (63-28))
+#define CXL_PSL_DLCNTL_C (0x1ull << (63-29))
+#define CXL_PSL_DLCNTL_E (0x1ull << (63-30))
+#define CXL_PSL_DLCNTL_S (0x1ull << (63-31))
+#define CXL_PSL_DLCNTL_CE (CXL_PSL_DLCNTL_C | CXL_PSL_DLCNTL_E)
+#define CXL_PSL_DLCNTL_DCES (CXL_PSL_DLCNTL_D | CXL_PSL_DLCNTL_CE | CXL_PSL_DLCNTL_S)
+
+/****** CXL_PSL_SR_An ******************************************************/
+#define CXL_PSL_SR_An_SF  MSR_SF            /* 64bit */
+#define CXL_PSL_SR_An_TA  (1ull << (63-1))  /* Tags active,   GA1: 0 */
+#define CXL_PSL_SR_An_HV  MSR_HV            /* Hypervisor,    GA1: 0 */
+#define CXL_PSL_SR_An_PR  MSR_PR            /* Problem state, GA1: 1 */
+#define CXL_PSL_SR_An_ISL (1ull << (63-53)) /* Ignore Segment Large Page */
+#define CXL_PSL_SR_An_TC  (1ull << (63-54)) /* Page Table secondary hash */
+#define CXL_PSL_SR_An_US  (1ull << (63-56)) /* User state,    GA1: X */
+#define CXL_PSL_SR_An_SC  (1ull << (63-58)) /* Segment Table secondary hash */
+#define CXL_PSL_SR_An_R   MSR_DR            /* Relocate,      GA1: 1 */
+#define CXL_PSL_SR_An_MP  (1ull << (63-62)) /* Master Process */
+#define CXL_PSL_SR_An_LE  (1ull << (63-63)) /* Little Endian */
+
+/****** CXL_PSL_LLCMD_An ****************************************************/
+#define CXL_LLCMD_TERMINATE   0x0001000000000000ULL
+#define CXL_LLCMD_REMOVE      0x0002000000000000ULL
+#define CXL_LLCMD_SUSPEND     0x0003000000000000ULL
+#define CXL_LLCMD_RESUME      0x0004000000000000ULL
+#define CXL_LLCMD_ADD         0x0005000000000000ULL
+#define CXL_LLCMD_UPDATE      0x0006000000000000ULL
+#define CXL_LLCMD_HANDLE_MASK 0x000000000000ffffULL
+
+/****** CXL_PSL_ID_An ****************************************************/
+#define CXL_PSL_ID_An_F	(1ull << (63-31))
+#define CXL_PSL_ID_An_L	(1ull << (63-30))
+
+/****** CXL_PSL_SCNTL_An ****************************************************/
+#define CXL_PSL_SCNTL_An_CR          (0x1ull << (63-15))
+/* Programming Models: */
+#define CXL_PSL_SCNTL_An_PM_MASK     (0xffffull << (63-31))
+#define CXL_PSL_SCNTL_An_PM_Shared   (0x0000ull << (63-31))
+#define CXL_PSL_SCNTL_An_PM_OS       (0x0001ull << (63-31))
+#define CXL_PSL_SCNTL_An_PM_Process  (0x0002ull << (63-31))
+#define CXL_PSL_SCNTL_An_PM_AFU      (0x0004ull << (63-31))
+#define CXL_PSL_SCNTL_An_PM_AFU_PBT  (0x0104ull << (63-31))
+/* Purge Status (ro) */
+#define CXL_PSL_SCNTL_An_Ps_MASK     (0x3ull << (63-39))
+#define CXL_PSL_SCNTL_An_Ps_Pending  (0x1ull << (63-39))
+#define CXL_PSL_SCNTL_An_Ps_Complete (0x3ull << (63-39))
+/* Purge */
+#define CXL_PSL_SCNTL_An_Pc          (0x1ull << (63-48))
+/* Suspend Status (ro) */
+#define CXL_PSL_SCNTL_An_Ss_MASK     (0x3ull << (63-55))
+#define CXL_PSL_SCNTL_An_Ss_Pending  (0x1ull << (63-55))
+#define CXL_PSL_SCNTL_An_Ss_Complete (0x3ull << (63-55))
+/* Suspend Control */
+#define CXL_PSL_SCNTL_An_Sc          (0x1ull << (63-63))
+
+/* AFU Slice Enable Status (ro) */
+#define CXL_AFU_Cntl_An_ES_MASK     (0x7ull << (63-2))
+#define CXL_AFU_Cntl_An_ES_Disabled (0x0ull << (63-2))
+#define CXL_AFU_Cntl_An_ES_Enabled  (0x4ull << (63-2))
+/* AFU Slice Enable */
+#define CXL_AFU_Cntl_An_E           (0x1ull << (63-3))
+/* AFU Slice Reset status (ro) */
+#define CXL_AFU_Cntl_An_RS_MASK     (0x3ull << (63-5))
+#define CXL_AFU_Cntl_An_RS_Pending  (0x1ull << (63-5))
+#define CXL_AFU_Cntl_An_RS_Complete (0x2ull << (63-5))
+/* AFU Slice Reset */
+#define CXL_AFU_Cntl_An_RA          (0x1ull << (63-7))
+
+/****** CXL_SSTP0/1_An ******************************************************/
+/* These top bits are for the segment that CONTAINS the segment table */
+#define CXL_SSTP0_An_B_SHIFT    SLB_VSID_SSIZE_SHIFT
+#define CXL_SSTP0_An_KS             (1ull << (63-2))
+#define CXL_SSTP0_An_KP             (1ull << (63-3))
+#define CXL_SSTP0_An_N              (1ull << (63-4))
+#define CXL_SSTP0_An_L              (1ull << (63-5))
+#define CXL_SSTP0_An_C              (1ull << (63-6))
+#define CXL_SSTP0_An_TA             (1ull << (63-7))
+#define CXL_SSTP0_An_LP_SHIFT                (63-9)  /* 2 Bits */
+/* And finally, the virtual address & size of the segment table: */
+#define CXL_SSTP0_An_SegTableSize_SHIFT      (63-31) /* 12 Bits */
+#define CXL_SSTP0_An_SegTableSize_MASK \
+	(((1ull << 12) - 1) << CXL_SSTP0_An_SegTableSize_SHIFT)
+#define CXL_SSTP0_An_STVA_U_MASK   ((1ull << (63-49))-1)
+#define CXL_SSTP1_An_STVA_L_MASK (~((1ull << (63-55))-1))
+#define CXL_SSTP1_An_V              (1ull << (63-63))
+
+/****** CXL_PSL_SLBIE_[An] **************************************************/
+/* write: */
+#define CXL_SLBIE_C        PPC_BIT(36)         /* Class */
+#define CXL_SLBIE_SS       PPC_BITMASK(37, 38) /* Segment Size */
+#define CXL_SLBIE_SS_SHIFT PPC_BITLSHIFT(38)
+#define CXL_SLBIE_TA       PPC_BIT(38)         /* Tags Active */
+/* read: */
+#define CXL_SLBIE_MAX      PPC_BITMASK(24, 31)
+#define CXL_SLBIE_PENDING  PPC_BITMASK(56, 63)
+
+/****** CXL_SLBIA_[An] ******************************************************/
+#define CXL_SLBIA_P         (1ull) /* Pending (read) */
+
+/****** Common to all PSL_SLBIE/A_[An] registers *****************************/
+#define CXL_SLBI_IQ_ALL     (0ull)              /* Inv qualifier */
+#define CXL_SLBI_IQ_LPID    (1ull)              /* Inv qualifier */
+#define CXL_SLBI_IQ_LPIDPID (3ull)              /* Inv qualifier */
+
+/****** CXL_PSL_DSISR_An ****************************************************/
+#define CXL_PSL_DSISR_An_DS (1ull << (63-0))  /* Segment not found */
+#define CXL_PSL_DSISR_An_DM (1ull << (63-1))  /* PTE not found (See also: M) or protection fault */
+#define CXL_PSL_DSISR_An_ST (1ull << (63-2))  /* Segment Table PTE not found */
+#define CXL_PSL_DSISR_An_UR (1ull << (63-3))  /* AURP PTE not found */
+#define CXL_PSL_DSISR_TRANS (CXL_PSL_DSISR_An_DS | CXL_PSL_DSISR_An_DM | CXL_PSL_DSISR_An_ST | CXL_PSL_DSISR_An_UR)
+#define CXL_PSL_DSISR_An_PE (1ull << (63-4))  /* PSL Error (implementation specific) */
+#define CXL_PSL_DSISR_An_AE (1ull << (63-5))  /* AFU Error */
+#define CXL_PSL_DSISR_An_OC (1ull << (63-6))  /* OS Context Warning */
+/* NOTE: Bits 32:63 are undefined if DSISR[DS] = 1 */
+#define CXL_PSL_DSISR_An_M  DSISR_NOHPTE      /* PTE not found */
+#define CXL_PSL_DSISR_An_P  DSISR_PROTFAULT   /* Storage protection violation */
+#define CXL_PSL_DSISR_An_A  (1ull << (63-37)) /* AFU lock access to write through or cache inhibited storage */
+#define CXL_PSL_DSISR_An_S  DSISR_ISSTORE     /* Access was afu_wr or afu_zero */
+#define CXL_PSL_DSISR_An_K  DSISR_KEYFAULT    /* Access not permitted by virtual page class key protection */
+
+/****** CXL_PSL_TFC_An ******************************************************/
+#define CXL_PSL_TFC_An_A  (1ull << (63-28)) /* Acknowledge non-translation fault */
+#define CXL_PSL_TFC_An_C  (1ull << (63-29)) /* Continue (abort transaction) */
+#define CXL_PSL_TFC_An_AE (1ull << (63-30)) /* Restart PSL with address error */
+#define CXL_PSL_TFC_An_R  (1ull << (63-31)) /* Restart PSL transaction */
+
+/* cxl_process_element->software_status */
+#define CXL_PE_SOFTWARE_STATE_V (1ul << (31 -  0)) /* Valid */
+#define CXL_PE_SOFTWARE_STATE_C (1ul << (31 - 29)) /* Complete */
+#define CXL_PE_SOFTWARE_STATE_S (1ul << (31 - 30)) /* Suspend */
+#define CXL_PE_SOFTWARE_STATE_T (1ul << (31 - 31)) /* Terminate */
+
+/* SPA->sw_command_status */
+#define CXL_SPA_SW_CMD_MASK         0xffff000000000000ULL
+#define CXL_SPA_SW_CMD_TERMINATE    0x0001000000000000ULL
+#define CXL_SPA_SW_CMD_REMOVE       0x0002000000000000ULL
+#define CXL_SPA_SW_CMD_SUSPEND      0x0003000000000000ULL
+#define CXL_SPA_SW_CMD_RESUME       0x0004000000000000ULL
+#define CXL_SPA_SW_CMD_ADD          0x0005000000000000ULL
+#define CXL_SPA_SW_CMD_UPDATE       0x0006000000000000ULL
+#define CXL_SPA_SW_STATE_MASK       0x0000ffff00000000ULL
+#define CXL_SPA_SW_STATE_TERMINATED 0x0000000100000000ULL
+#define CXL_SPA_SW_STATE_REMOVED    0x0000000200000000ULL
+#define CXL_SPA_SW_STATE_SUSPENDED  0x0000000300000000ULL
+#define CXL_SPA_SW_STATE_RESUMED    0x0000000400000000ULL
+#define CXL_SPA_SW_STATE_ADDED      0x0000000500000000ULL
+#define CXL_SPA_SW_STATE_UPDATED    0x0000000600000000ULL
+#define CXL_SPA_SW_PSL_ID_MASK      0x00000000ffff0000ULL
+#define CXL_SPA_SW_LINK_MASK        0x000000000000ffffULL
+
+#define CXL_MAX_SLICES 4
+#define MAX_AFU_MMIO_REGS 3
+
+#define CXL_MODEL_DEDICATED   0x1
+#define CXL_MODEL_DIRECTED    0x2
+#define CXL_MODEL_TIME_SLICED 0x4
+#define CXL_SUPPORTED_MODELS (CXL_MODEL_DEDICATED | CXL_MODEL_DIRECTED)
+
+enum cxl_context_status {
+	CLOSED,
+	OPENED,
+	STARTED
+};
+
+enum prefault_modes {
+	CXL_PREFAULT_NONE,
+	CXL_PREFAULT_WED,
+	CXL_PREFAULT_ALL,
+};
+
+struct cxl_sste {
+	__be64 esid_data;
+	__be64 vsid_data;
+};
+
+#define to_cxl_adapter(d) container_of(d, struct cxl_t, dev)
+#define to_cxl_afu(d) container_of(d, struct cxl_afu_t, dev)
+
+struct cxl_afu_t {
+	irq_hw_number_t psl_hwirq;
+	irq_hw_number_t serr_hwirq;
+	unsigned int serr_virq;
+	void __iomem *p1n_mmio;
+	void __iomem *p2n_mmio;
+	phys_addr_t psn_phys;
+	u64 pp_offset;
+	u64 pp_size;
+	void __iomem *afu_desc_mmio;
+	struct cxl_t *adapter;
+	struct device dev;
+	struct cdev afu_cdev_s, afu_cdev_m;
+	struct device *chardev_s, *chardev_m;
+	struct idr contexts_idr;
+	struct dentry *debugfs;
+	spinlock_t contexts_lock;
+	struct mutex spa_mutex;
+	spinlock_t afu_cntl_lock;
+
+	/* Only the first part of the SPA is used for the process element
+	 * linked list. The only other part that software needs to worry about
+	 * is sw_command_status, which we store a separate pointer to.
+	 * Everything else in the SPA is only used by hardware */
+	struct cxl_process_element *spa;
+	__be64 *sw_command_status;
+	unsigned int spa_size;
+	int spa_order;
+	int spa_max_procs;
+	unsigned int psl_virq;
+
+	int pp_irqs;
+	int irqs_max;
+	int num_procs;
+	int max_procs_virtualised;
+	int slice;
+	int models_supported;
+	int current_model;
+	enum prefault_modes prefault_mode;
+	bool psa;
+	bool pp_psa;
+	bool enabled;
+};
+
+/* This is a cxl context.  If the PSL is in dedicated model, there will be one
+ * of these per AFU.  If in AFU directed model there can be many of these. */
+struct cxl_context_t {
+	struct cxl_afu_t *afu;
+
+	/* Problem state MMIO */
+	phys_addr_t psn_phys;
+	u64 psn_size;
+
+	spinlock_t sst_lock; /* Protects segment table */
+	struct cxl_sste *sstp;
+	unsigned int sst_size, sst_lru;
+
+	wait_queue_head_t wq;
+	struct pid *pid;
+	spinlock_t lock; /* Protects pending_irq_mask, pending_fault and fault_addr */
+	/* Only used in PR mode */
+	u64 process_token;
+
+	unsigned long *irq_bitmap; /* Accessed from IRQ context */
+	struct cxl_irq_ranges irqs;
+	u64 fault_addr;
+	u64 afu_err;
+	enum cxl_context_status status;
+
+	/* XXX: Is it possible to need multiple work items at once? */
+	struct work_struct fault_work;
+	u64 dsisr;
+	u64 dar;
+
+	struct cxl_process_element *elem;
+
+	int ph; /* process handle/process element index */
+	u32 irq_count;
+	bool pe_inserted;
+	bool master;
+	bool kernel;
+	bool pending_irq;
+	bool pending_fault;
+	bool pending_afu_err;
+};
+
+struct cxl_t {
+	void __iomem *p1_mmio;
+	void __iomem *p2_mmio;
+	irq_hw_number_t err_hwirq;
+	unsigned int err_virq;
+	struct cxl_driver_ops *driver;
+	spinlock_t afu_list_lock;
+	struct cxl_afu_t *afu[CXL_MAX_SLICES];
+	struct device dev;
+	struct dentry *trace;
+	struct dentry *psl_err_chk;
+	struct dentry *debugfs;
+	struct bin_attribute cxl_attr;
+	int adapter_num;
+	int user_irqs;
+	u64 afu_desc_off;
+	u64 afu_desc_size;
+	u64 ps_off;
+	u64 ps_size;
+	u16 psl_rev;
+	u16 base_image;
+	u8 vsec_status;
+	u8 caia_major;
+	u8 caia_minor;
+	u8 slices;
+	bool user_image_loaded;
+	bool perst_loads_image;
+	bool perst_select_user;
+};
+
+struct cxl_driver_ops {
+	struct module *module;
+	int (*alloc_one_irq)(struct cxl_t *adapter);
+	void (*release_one_irq)(struct cxl_t *adapter, int hwirq);
+	int (*alloc_irq_ranges)(struct cxl_irq_ranges *irqs, struct cxl_t *adapter, unsigned int num);
+	void (*release_irq_ranges)(struct cxl_irq_ranges *irqs, struct cxl_t *adapter);
+	int (*setup_irq)(struct cxl_t *adapter, unsigned int hwirq, unsigned int virq);
+};
+
+/* common == phyp + powernv */
+struct cxl_process_element_common {
+	__be32 tid;
+	__be32 pid;
+	__be64 csrp;
+	__be64 aurp0;
+	__be64 aurp1;
+	__be64 sstp0;
+	__be64 sstp1;
+	__be64 amr;
+	u8     reserved3[4];
+	__be64 wed;
+} __packed;
+
+/* just powernv */
+struct cxl_process_element {
+	__be64 sr;
+	__be64 SPOffset;
+	__be64 sdr;
+	__be64 haurp;
+	__be32 ctxtime;
+	__be16 ivte_offsets[4];
+	__be16 ivte_ranges[4];
+	__be32 lpid;
+	struct cxl_process_element_common common;
+	__be32 software_state;
+} __packed;
+
+#define _cxl_reg_write(addr, val) \
+	out_be64((u64 __iomem *)(addr), val)
+#define _cxl_reg_read(addr) \
+	in_be64((u64 __iomem *)(addr))
+
+static inline void __iomem *_cxl_p1_addr(struct cxl_t *cxl, cxl_p1_reg_t reg)
+{
+	WARN_ON(!cpu_has_feature(CPU_FTR_HVMODE));
+	return cxl->p1_mmio + cxl_reg_off(reg);
+}
+#define cxl_p1_write(cxl, reg, val) \
+	_cxl_reg_write(_cxl_p1_addr(cxl, reg), val)
+#define cxl_p1_read(cxl, reg) \
+	_cxl_reg_read(_cxl_p1_addr(cxl, reg))
+
+static inline void __iomem *_cxl_p1n_addr(struct cxl_afu_t *afu, cxl_p1n_reg_t reg)
+{
+	WARN_ON(!cpu_has_feature(CPU_FTR_HVMODE));
+	return afu->p1n_mmio + cxl_reg_off(reg);
+}
+#define cxl_p1n_write(afu, reg, val) \
+	_cxl_reg_write(_cxl_p1n_addr(afu, reg), val)
+#define cxl_p1n_read(afu, reg) \
+	_cxl_reg_read(_cxl_p1n_addr(afu, reg))
+
+static inline void __iomem *_cxl_p2n_addr(struct cxl_afu_t *afu, cxl_p2n_reg_t reg)
+{
+	return afu->p2n_mmio + cxl_reg_off(reg);
+}
+#define cxl_p2n_write(afu, reg, val) \
+	_cxl_reg_write(_cxl_p2n_addr(afu, reg), val)
+#define cxl_p2n_read(afu, reg) \
+	_cxl_reg_read(_cxl_p2n_addr(afu, reg))
+
+struct cxl_calls {
+	void (*cxl_slbia)(struct mm_struct *mm);
+	struct module *owner;
+};
+int register_cxl_calls(struct cxl_calls *calls);
+void unregister_cxl_calls(struct cxl_calls *calls);
+
+int cxl_alloc_adapter_nr(struct cxl_t *adapter);
+void cxl_remove_adapter_nr(struct cxl_t *adapter);
+
+int cxl_file_init(void);
+void cxl_file_exit(void);
+int cxl_register_adapter(struct cxl_t *adapter);
+int cxl_register_afu(struct cxl_afu_t *afu);
+int cxl_chardev_m_afu_add(struct cxl_afu_t *afu);
+int cxl_chardev_s_afu_add(struct cxl_afu_t *afu);
+void cxl_chardev_afu_remove(struct cxl_afu_t *afu);
+
+void cxl_context_detach_all(struct cxl_afu_t *afu);
+void cxl_context_detach(struct cxl_context_t *ctx);
+
+int cxl_sysfs_adapter_add(struct cxl_t *adapter);
+void cxl_sysfs_adapter_remove(struct cxl_t *adapter);
+int cxl_sysfs_afu_add(struct cxl_afu_t *afu);
+void cxl_sysfs_afu_remove(struct cxl_afu_t *afu);
+
+int cxl_afu_activate_model(struct cxl_afu_t *afu, int model);
+int _cxl_afu_deactivate_model(struct cxl_afu_t *afu, int model);
+int cxl_afu_deactivate_model(struct cxl_afu_t *afu);
+int cxl_afu_select_best_model(struct cxl_afu_t *afu);
+
+unsigned int cxl_map_irq(struct cxl_t *adapter, irq_hw_number_t hwirq,
+		         irq_handler_t handler, void *cookie);
+void cxl_unmap_irq(unsigned int virq, void *cookie);
+int cxl_register_psl_irq(struct cxl_afu_t *afu);
+void cxl_release_psl_irq(struct cxl_afu_t *afu);
+int cxl_register_psl_err_irq(struct cxl_t *adapter);
+void cxl_release_psl_err_irq(struct cxl_t *adapter);
+int cxl_register_serr_irq(struct cxl_afu_t *afu);
+void cxl_release_serr_irq(struct cxl_afu_t *afu);
+int afu_register_irqs(struct cxl_context_t *ctx, u32 count);
+void afu_release_irqs(struct cxl_context_t *ctx);
+irqreturn_t cxl_slice_irq_err(int irq, void *data);
+
+int cxl_debugfs_init(void);
+void cxl_debugfs_exit(void);
+int cxl_debugfs_adapter_add(struct cxl_t *adapter);
+void cxl_debugfs_adapter_remove(struct cxl_t *adapter);
+int cxl_debugfs_afu_add(struct cxl_afu_t *afu);
+void cxl_debugfs_afu_remove(struct cxl_afu_t *afu);
+
+void cxl_handle_fault(struct work_struct *work);
+void cxl_prefault(struct cxl_context_t *ctx, u64 wed);
+
+struct cxl_t *get_cxl_adapter(int num);
+int cxl_alloc_sst(struct cxl_context_t *ctx, u64 *sstp0, u64 *sstp1);
+
+void init_cxl_native(void);
+
+struct cxl_context_t *cxl_context_alloc(void);
+int cxl_context_init(struct cxl_context_t *ctx, struct cxl_afu_t *afu, bool master);
+void cxl_context_free(struct cxl_context_t *ctx);
+int cxl_context_iomap(struct cxl_context_t *ctx, struct vm_area_struct *vma);
+
+/* This matches the layout of the H_COLLECT_CA_INT_INFO retbuf */
+struct cxl_irq_info {
+	u64 dsisr;
+	u64 dar;
+	u64 dsr;
+	u32 pid;
+	u32 tid;
+	u64 afu_err;
+	u64 errstat;
+	u64 padding[3]; /* to match the expected retbuf size for plpar_hcall9 */
+};
+
+struct cxl_backend_ops {
+	int (*attach_process)(struct cxl_context_t *ctx, bool kernel, u64 wed,
+			    u64 amr);
+	int (*detach_process)(struct cxl_context_t *ctx);
+
+	int (*get_irq)(struct cxl_context_t *ctx, struct cxl_irq_info *info);
+	int (*ack_irq)(struct cxl_context_t *ctx, u64 tfc, u64 psl_reset_mask);
+
+	int (*check_error)(struct cxl_afu_t *afu);
+	void (*slbia)(struct cxl_afu_t *afu);
+	int (*afu_reset)(struct cxl_afu_t *afu);
+};
+extern const struct cxl_backend_ops *cxl_ops;
+
+void cxl_stop_trace(struct cxl_t *cxl);
+
+#endif
diff --git a/drivers/misc/cxl/debugfs.c b/drivers/misc/cxl/debugfs.c
new file mode 100644
index 0000000..f4d148c
--- /dev/null
+++ b/drivers/misc/cxl/debugfs.c
@@ -0,0 +1,116 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/debugfs.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+
+#include "cxl.h"
+
+struct dentry *cxl_debugfs;
+
+void cxl_stop_trace(struct cxl_t *adapter)
+{
+	int slice;
+
+	/* Stop the trace */
+	cxl_p1_write(adapter, CXL_PSL_TRACE, 0x8000000000000017LL);
+
+	/* Stop the slice traces */
+	spin_lock(&adapter->afu_list_lock);
+	for (slice = 0; slice < adapter->slices; slice++) {
+		if (adapter->afu[slice])
+			cxl_p1n_write(adapter->afu[slice], CXL_PSL_SLICE_TRACE, 0x8000000000000000LL);
+	}
+	spin_unlock(&adapter->afu_list_lock);
+}
+
+int cxl_debugfs_adapter_add(struct cxl_t *adapter)
+{
+	struct dentry *dir;
+	char buf[32];
+
+	if (!cxl_debugfs)
+		return -ENODEV;
+
+	snprintf(buf, 32, "card%i", adapter->adapter_num);
+	dir = debugfs_create_dir(buf, cxl_debugfs);
+	if (IS_ERR(dir))
+		return PTR_ERR(dir);
+	adapter->debugfs = dir;
+
+	debugfs_create_x64("fir1",     S_IRUSR, dir, _cxl_p1_addr(adapter, CXL_PSL_FIR1));
+	debugfs_create_x64("fir2",     S_IRUSR, dir, _cxl_p1_addr(adapter, CXL_PSL_FIR2));
+	debugfs_create_x64("fir_cntl", S_IRUSR, dir, _cxl_p1_addr(adapter, CXL_PSL_FIR_CNTL));
+	debugfs_create_x64("err_ivte", S_IRUSR, dir, _cxl_p1_addr(adapter, CXL_PSL_ErrIVTE));
+
+	debugfs_create_x64("trace", S_IRUSR | S_IWUSR, dir, _cxl_p1_addr(adapter, CXL_PSL_TRACE));
+
+	return 0;
+}
+EXPORT_SYMBOL(cxl_debugfs_adapter_add);
+
+void cxl_debugfs_adapter_remove(struct cxl_t *adapter)
+{
+	debugfs_remove_recursive(adapter->debugfs);
+}
+EXPORT_SYMBOL(cxl_debugfs_adapter_remove);
+
+int cxl_debugfs_afu_add(struct cxl_afu_t *afu)
+{
+	struct dentry *dir;
+	char buf[32];
+
+	if (!afu->adapter->debugfs)
+		return -ENODEV;
+
+	snprintf(buf, 32, "psl%i.%i", afu->adapter->adapter_num, afu->slice);
+	dir = debugfs_create_dir(buf, afu->adapter->debugfs);
+	if (IS_ERR(dir))
+		return PTR_ERR(dir);
+	afu->debugfs = dir;
+
+	debugfs_create_x64("fir",        S_IRUSR, dir, _cxl_p1n_addr(afu, CXL_PSL_FIR_SLICE_An));
+	debugfs_create_x64("serr",       S_IRUSR, dir, _cxl_p1n_addr(afu, CXL_PSL_SERR_An));
+	debugfs_create_x64("afu_debug",  S_IRUSR, dir, _cxl_p1n_addr(afu, CXL_AFU_DEBUG_An));
+	debugfs_create_x64("sr",         S_IRUSR, dir, _cxl_p1n_addr(afu, CXL_PSL_SR_An));
+
+	debugfs_create_x64("dsisr",      S_IRUSR, dir, _cxl_p2n_addr(afu, CXL_PSL_DSISR_An));
+	debugfs_create_x64("dar",        S_IRUSR, dir, _cxl_p2n_addr(afu, CXL_PSL_DAR_An));
+	debugfs_create_x64("sstp0",      S_IRUSR, dir, _cxl_p2n_addr(afu, CXL_SSTP0_An));
+	debugfs_create_x64("sstp1",      S_IRUSR, dir, _cxl_p2n_addr(afu, CXL_SSTP1_An));
+	debugfs_create_x64("err_status", S_IRUSR, dir, _cxl_p2n_addr(afu, CXL_PSL_ErrStat_An));
+
+	debugfs_create_x64("trace", S_IRUSR | S_IWUSR, dir, _cxl_p1n_addr(afu, CXL_PSL_SLICE_TRACE));
+
+	return 0;
+}
+EXPORT_SYMBOL(cxl_debugfs_afu_add);
+
+void cxl_debugfs_afu_remove(struct cxl_afu_t *afu)
+{
+	debugfs_remove_recursive(afu->debugfs);
+}
+EXPORT_SYMBOL(cxl_debugfs_afu_remove);
+
+int __init cxl_debugfs_init(void)
+{
+	struct dentry *ent;
+	ent = debugfs_create_dir("cxl", NULL);
+	if (IS_ERR(ent))
+		return PTR_ERR(ent);
+	cxl_debugfs = ent;
+
+	return 0;
+}
+
+void cxl_debugfs_exit(void)
+{
+	debugfs_remove_recursive(cxl_debugfs);
+}
diff --git a/drivers/misc/cxl/fault.c b/drivers/misc/cxl/fault.c
new file mode 100644
index 0000000..f729c4a
--- /dev/null
+++ b/drivers/misc/cxl/fault.c
@@ -0,0 +1,298 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/workqueue.h>
+#include <linux/sched.h>
+#include <linux/pid.h>
+#include <linux/mm.h>
+#include <linux/moduleparam.h>
+
+#undef MODULE_PARAM_PREFIX
+#define MODULE_PARAM_PREFIX "cxl" "."
+#include <asm/current.h>
+#include <asm/copro.h>
+#include <asm/mmu.h>
+
+#include "cxl.h"
+
+bool cxl_fault_debug = false;
+
+static struct cxl_sste *find_free_sste(struct cxl_sste *primary_group,
+				       bool sec_hash,
+				       struct cxl_sste *secondary_group,
+				       unsigned int *lru)
+{
+	unsigned int i, entry;
+	struct cxl_sste *sste, *group = primary_group;
+
+	for (i = 0; i < 2; i++) {
+		for (entry = 0; entry < 8; entry++) {
+			sste = group + entry;
+			if (!(sste->esid_data & SLB_ESID_V))
+				return sste;
+		}
+		if (!sec_hash)
+			break;
+		group = secondary_group;
+	}
+	/* Nothing free, select an entry to cast out */
+	if (sec_hash && (*lru & 0x8))
+		sste = secondary_group + (*lru & 0x7);
+	else
+		sste = primary_group + (*lru & 0x7);
+	*lru = (*lru + 1) & 0xf;
+
+	return sste;
+}
+
+static void cxl_load_segment(struct cxl_context_t *ctx, u64 esid_data,
+			     u64 vsid_data)
+{
+	/* The mask selects the group index; we search both the primary
+	 * and secondary hash groups here. */
+	unsigned int mask = (ctx->sst_size >> 7) - 1; /* SSTP0[SegTableSize] */
+	bool sec_hash;
+	struct cxl_sste *sste;
+	unsigned int hash;
+
+	WARN_ON_SMP(!spin_is_locked(&ctx->sst_lock));
+
+	sec_hash = !!(cxl_p1n_read(ctx->afu, CXL_PSL_SR_An) & CXL_PSL_SR_An_SC);
+
+	if (vsid_data & SLB_VSID_B_1T)
+		hash = (esid_data >> SID_SHIFT_1T) & mask;
+	else /* 256M */
+		hash = (esid_data >> SID_SHIFT) & mask;
+
+	sste = find_free_sste(ctx->sstp + (hash << 3), sec_hash,
+			      ctx->sstp + ((~hash & mask) << 3), &ctx->sst_lru);
+
+	pr_devel("CXL Populating SST[%li]: %#llx %#llx\n",
+			sste - ctx->sstp, vsid_data, esid_data);
+
+	sste->vsid_data = cpu_to_be64(vsid_data);
+	sste->esid_data = cpu_to_be64(esid_data);
+}
+
+static int cxl_fault_segment(struct cxl_context_t *ctx, struct mm_struct *mm,
+			     u64 ea)
+{
+	u64 vsid_data = 0, esid_data = 0;
+	unsigned long flags;
+	int rc;
+
+	spin_lock_irqsave(&ctx->sst_lock, flags);
+	if (!(rc = copro_data_segment(mm, ea, &esid_data, &vsid_data))) {
+		cxl_load_segment(ctx, esid_data, vsid_data);
+	}
+	spin_unlock_irqrestore(&ctx->sst_lock, flags);
+
+	return rc;
+}
+
+static void cxl_ack_ae(struct cxl_context_t *ctx)
+{
+	unsigned long flags;
+
+	cxl_ops->ack_irq(ctx, CXL_PSL_TFC_An_AE, 0);
+
+	spin_lock_irqsave(&ctx->lock, flags);
+	ctx->pending_fault = true;
+	ctx->fault_addr = ctx->dar;
+	spin_unlock_irqrestore(&ctx->lock, flags);
+
+	wake_up_all(&ctx->wq);
+}
+
+static int cxl_handle_segment_miss(struct cxl_context_t *ctx,
+				   struct mm_struct *mm, u64 ea)
+{
+	int rc;
+
+	pr_devel("CXL interrupt: Segment fault pe: %i ea: %#llx\n", ctx->ph, ea);
+
+	if ((rc = cxl_fault_segment(ctx, mm, ea)))
+		cxl_ack_ae(ctx);
+	else {
+		mb(); /* Order seg table write to TFC MMIO write */
+		cxl_ops->ack_irq(ctx, CXL_PSL_TFC_An_R, 0);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void cxl_handle_page_fault(struct cxl_context_t *ctx,
+				  struct mm_struct *mm, u64 dsisr, u64 dar)
+{
+	unsigned int flt = 0;
+	int result;
+	unsigned long access, flags;
+
+	if ((result = copro_handle_mm_fault(mm, dar, dsisr, &flt))) {
+		pr_devel("copro_handle_mm_fault failed: %#x\n", result);
+		return cxl_ack_ae(ctx);
+	}
+
+	/*
+	 * update_mmu_cache() will not have loaded the hash since current->trap
+	 * is not a 0x400 or 0x300, so just call hash_page_mm() here.
+	 */
+	access = _PAGE_PRESENT;
+	if (dsisr & CXL_PSL_DSISR_An_S)
+		access |= _PAGE_RW;
+	if ((!ctx->kernel) || !(dar & (1ULL << 63)))
+		access |= _PAGE_USER;
+	local_irq_save(flags);
+	hash_page_mm(mm, dar, access, 0x300);
+	local_irq_restore(flags);
+
+	pr_devel("Page fault successfully handled for pe: %i!\n", ctx->ph);
+	cxl_ops->ack_irq(ctx, CXL_PSL_TFC_An_R, 0);
+}
+
+void cxl_handle_fault(struct work_struct *fault_work)
+{
+	struct cxl_context_t *ctx =
+		container_of(fault_work, struct cxl_context_t, fault_work);
+	u64 dsisr = ctx->dsisr;
+	u64 dar = ctx->dar;
+	struct task_struct *task;
+	struct mm_struct *mm;
+
+	if (cxl_p2n_read(ctx->afu, CXL_PSL_DSISR_An) != dsisr ||
+	    cxl_p2n_read(ctx->afu, CXL_PSL_DAR_An) != dar ||
+	    cxl_p2n_read(ctx->afu, CXL_PSL_PEHandle_An) != ctx->ph) {
+		/* Most likely explanation is harmless - a dedicated process
+		 * has detached and these were cleared by the PSL purge, but
+		 * warn about it just in case */
+		dev_notice(&ctx->afu->dev, "cxl_handle_fault: Translation fault regs changed\n");
+		return;
+	}
+
+	pr_devel("CXL BOTTOM HALF handling fault for afu pe: %i. "
+		"DSISR: %#llx DAR: %#llx\n", ctx->ph, dsisr, dar);
+
+	if (!(task = get_pid_task(ctx->pid, PIDTYPE_PID))) {
+		pr_devel("cxl_handle_fault unable to get task %i\n",
+			 pid_nr(ctx->pid));
+		cxl_ack_ae(ctx);
+		return;
+	}
+	if (!(mm = get_task_mm(task))) {
+		pr_devel("cxl_handle_fault unable to get mm %i\n",
+			 pid_nr(ctx->pid));
+		cxl_ack_ae(ctx);
+		goto out;
+	}
+
+	if (dsisr & CXL_PSL_DSISR_An_DS)
+		cxl_handle_segment_miss(ctx, mm, dar);
+	else if (dsisr & CXL_PSL_DSISR_An_DM)
+		cxl_handle_page_fault(ctx, mm, dsisr, dar);
+	else
+		WARN(1, "cxl_handle_fault has nothing to handle\n");
+
+	mmput(mm);
+out:
+	put_task_struct(task);
+}
+
+static void cxl_prefault_one(struct cxl_context_t *ctx, u64 ea)
+{
+	int rc;
+	struct task_struct *task;
+	struct mm_struct *mm;
+
+	if (!(task = get_pid_task(ctx->pid, PIDTYPE_PID))) {
+		pr_devel("cxl_prefault_one unable to get task %i\n",
+			 pid_nr(ctx->pid));
+		return;
+	}
+	if (!(mm = get_task_mm(task))) {
+		pr_devel("cxl_prefault_one unable to get mm %i\n",
+			 pid_nr(ctx->pid));
+		put_task_struct(task);
+		return;
+	}
+
+	rc = cxl_fault_segment(ctx, mm, ea);
+
+	mmput(mm);
+	put_task_struct(task);
+}
+
+static u64 next_segment(u64 ea, u64 vsid_data)
+{
+	if (vsid_data & SLB_VSID_B_1T)
+		ea |= (1ULL << 40) - 1;
+	else
+		ea |= (1ULL << 28) - 1;
+
+	return ea + 1;
+}
+
+static void cxl_prefault_vma(struct cxl_context_t *ctx)
+{
+	u64 ea, vsid_data = 0, esid_data = 0, last_esid_data = 0;
+	struct vm_area_struct *vma;
+	int rc;
+	struct task_struct *task;
+	struct mm_struct *mm;
+	unsigned long flags;
+
+	if (!(task = get_pid_task(ctx->pid, PIDTYPE_PID))) {
+		pr_devel("cxl_prefault_vma unable to get task %i\n",
+			 pid_nr(ctx->pid));
+		return;
+	}
+	if (!(mm = get_task_mm(task))) {
+		pr_devel("cxl_prefault_vma unable to get mm %i\n",
+			 pid_nr(ctx->pid));
+		goto out1;
+	}
+
+	spin_lock_irqsave(&ctx->sst_lock, flags);
+	down_read(&mm->mmap_sem);
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		for (ea = vma->vm_start; ea < vma->vm_end;
+				ea = next_segment(ea, vsid_data)) {
+			rc = copro_data_segment(mm, ea, &esid_data, &vsid_data);
+			if (rc)
+				continue;
+
+			if (last_esid_data == esid_data)
+				continue;
+
+			cxl_load_segment(ctx, esid_data, vsid_data);
+			last_esid_data = esid_data;
+		}
+	}
+	up_read(&mm->mmap_sem);
+	spin_unlock_irqrestore(&ctx->sst_lock, flags);
+
+	mmput(mm);
+out1:
+	put_task_struct(task);
+}
+
+void cxl_prefault(struct cxl_context_t *ctx, u64 wed)
+{
+	switch (ctx->afu->prefault_mode) {
+	case CXL_PREFAULT_WED:
+		cxl_prefault_one(ctx, wed);
+		break;
+	case CXL_PREFAULT_ALL:
+		cxl_prefault_vma(ctx);
+		break;
+	default:
+		break;
+	}
+}
diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
new file mode 100644
index 0000000..fb87ce3
--- /dev/null
+++ b/drivers/misc/cxl/file.c
@@ -0,0 +1,503 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/spinlock.h>
+#include <linux/module.h>
+#include <linux/export.h>
+#include <linux/kernel.h>
+#include <linux/bitmap.h>
+#include <linux/sched.h>
+#include <linux/poll.h>
+#include <linux/pid.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <asm/cputable.h>
+#include <asm/current.h>
+#include <asm/copro.h>
+
+#include "cxl.h"
+
+#define CXL_NUM_MINORS 256 /* Total to reserve */
+#define CXL_DEV_MINORS 9   /* 1 control + 4 AFUs * 2 (master/shared) */
+
+#define CXL_CARD_MINOR(adapter) (adapter->adapter_num * CXL_DEV_MINORS)
+#define CXL_AFU_MINOR(afu) (CXL_CARD_MINOR(afu->adapter) + 1 + (2 * afu->slice))
+#define CXL_AFU_MINOR_M(afu) (CXL_AFU_MINOR(afu) + 1)
+#define CXL_AFU_MKDEV(afu) MKDEV(MAJOR(cxl_dev), CXL_AFU_MINOR(afu))
+#define CXL_AFU_MKDEV_M(afu) MKDEV(MAJOR(cxl_dev), CXL_AFU_MINOR_M(afu))
+
+#define CXL_DEVT_ADAPTER(dev) (MINOR(dev) / CXL_DEV_MINORS)
+#define CXL_DEVT_AFU(dev) ((MINOR(dev) % CXL_DEV_MINORS - 1) / 2)
+
+#define CXL_DEVT_IS_CARD(dev) (MINOR(dev) % CXL_DEV_MINORS == 0)
+#define CXL_DEVT_IS_AFU(dev) (!CXL_DEVT_IS_CARD(dev))
+#define _CXL_DEVT_IS_AFU_S(dev) (((MINOR(dev) % CXL_DEV_MINORS) % 2) == 1)
+#define CXL_DEVT_IS_AFU_S(dev) (!CXL_DEVT_IS_CARD(dev) && _CXL_DEVT_IS_AFU_S(dev))
+#define CXL_DEVT_IS_AFU_M(dev) (!CXL_DEVT_IS_CARD(dev) && !_CXL_DEVT_IS_AFU_S(dev))
+
+dev_t cxl_dev;
+
+struct class *cxl_class;
+EXPORT_SYMBOL(cxl_class);
+
+static int __afu_open(struct inode *inode, struct file *file, bool master)
+{
+	struct cxl_t *adapter;
+	struct cxl_afu_t *afu;
+	struct cxl_context_t *ctx;
+	int adapter_num = CXL_DEVT_ADAPTER(inode->i_rdev);
+	int slice = CXL_DEVT_AFU(inode->i_rdev);
+	int rc = -ENODEV;
+
+	pr_devel("afu_open afu%i.%i\n", adapter_num, slice);
+
+	if (!(adapter = get_cxl_adapter(adapter_num)))
+		return -ENODEV;
+
+	if (!try_module_get(adapter->driver->module))
+		goto err_put_adapter;
+
+	if (slice >= adapter->slices)
+		goto err_put_module;
+
+	spin_lock(&adapter->afu_list_lock);
+	if (!(afu = adapter->afu[slice])) {
+		spin_unlock(&adapter->afu_list_lock);
+		goto err_put_module;
+	}
+	get_device(&afu->dev);
+	spin_unlock(&adapter->afu_list_lock);
+
+	if (!afu->current_model)
+		goto err_put_afu;
+
+	if (!(ctx = cxl_context_alloc())) {
+		rc = -ENOMEM;
+		goto err_put_afu;
+	}
+
+	if ((rc = cxl_context_init(ctx, afu, master)))
+		goto err_put_afu;
+
+	pr_devel("afu_open pe: %i\n", ctx->ph);
+	file->private_data = ctx;
+	cxl_ctx_get();
+
+	/* Our ref on the AFU will now hold the adapter */
+	put_device(&adapter->dev);
+
+	return 0;
+
+err_put_afu:
+	put_device(&afu->dev);
+err_put_module:
+	module_put(adapter->driver->module);
+err_put_adapter:
+	put_device(&adapter->dev);
+	return rc;
+}
+
+static int afu_open(struct inode *inode, struct file *file)
+{
+	return __afu_open(inode, file, false);
+}
+
+static int afu_master_open(struct inode *inode, struct file *file)
+{
+	return __afu_open(inode, file, true);
+}
+
+static int afu_release(struct inode *inode, struct file *file)
+{
+	struct cxl_context_t *ctx = file->private_data;
+
+	pr_devel("%s: closing cxl file descriptor. pe: %i\n",
+		 __func__, ctx->ph);
+	cxl_context_detach(ctx);
+
+	module_put(ctx->afu->adapter->driver->module);
+
+	put_device(&ctx->afu->dev);
+
+	/* It should be safe to remove the context now */
+	cxl_context_free(ctx);
+
+	cxl_ctx_put();
+	return 0;
+}
+
+static long afu_ioctl_start_work(struct cxl_context_t *ctx,
+		     struct cxl_ioctl_start_work __user *uwork)
+{
+	struct cxl_ioctl_start_work work;
+	u64 amr;
+	int rc;
+
+	pr_devel("afu_ioctl: pe: %i CXL_START_WORK\n", ctx->ph);
+
+	if (ctx->status != OPENED)
+		return -EIO;
+
+	if (copy_from_user(&work, uwork,
+			   sizeof(struct cxl_ioctl_start_work)))
+		return -EFAULT;
+
+	if (work.reserved1 || work.reserved2 || work.reserved3 ||
+	    work.reserved4 || work.reserved5 || work.reserved6)
+		return -EINVAL;
+
+	if (work.num_interrupts == -1)
+		work.num_interrupts = ctx->afu->pp_irqs;
+	else if ((work.num_interrupts < ctx->afu->pp_irqs) ||
+		 (work.num_interrupts > ctx->afu->irqs_max))
+		return -EINVAL;
+	if ((rc = afu_register_irqs(ctx, work.num_interrupts)))
+		return rc;
+
+	amr = work.amr & mfspr(SPRN_UAMOR);
+
+	work.process_element = ctx->ph;
+
+	/* Returns PE and number of interrupts */
+	if (copy_to_user(uwork, &work,
+			 sizeof(struct cxl_ioctl_start_work)))
+		return -EFAULT;
+
+	if ((rc = cxl_ops->attach_process(ctx, false, work.wed, amr)))
+		return rc;
+
+	ctx->status = STARTED;
+
+	return 0;
+}
+
+static long afu_ioctl_check_error(struct cxl_context_t *ctx)
+{
+	if (ctx->status != STARTED)
+		return -EIO;
+
+	if (cxl_ops->check_error && cxl_ops->check_error(ctx->afu)) {
+		/* This may not be enough for some errors.  May need to PERST
+		 * the card in some cases if it's very broken.
+		 */
+		return cxl_ops->afu_reset(ctx->afu);
+	}
+	return -EPERM;
+}
+
+static long afu_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	struct cxl_context_t *ctx = file->private_data;
+
+	if (ctx->status == CLOSED)
+		return -EIO;
+
+	pr_devel("afu_ioctl\n");
+	switch (cmd) {
+	case CXL_IOCTL_START_WORK:
+		return afu_ioctl_start_work(ctx,
+			(struct cxl_ioctl_start_work __user *)arg);
+	case CXL_IOCTL_CHECK_ERROR:
+		return afu_ioctl_check_error(ctx);
+	}
+	return -EINVAL;
+}
+
+static long afu_compat_ioctl(struct file *file, unsigned int cmd,
+			     unsigned long arg)
+{
+	return afu_ioctl(file, cmd, arg);
+}
+
+static int afu_mmap(struct file *file, struct vm_area_struct *vm)
+{
+	struct cxl_context_t *ctx = file->private_data;
+
+	/* AFU must be started before we can MMIO */
+	if (ctx->status != STARTED)
+		return -EIO;
+
+	return cxl_context_iomap(ctx, vm);
+}
+
+static unsigned int afu_poll(struct file *file, struct poll_table_struct *poll)
+{
+	struct cxl_context_t *ctx = file->private_data;
+	int mask = 0;
+	unsigned long flags;
+
+	poll_wait(file, &ctx->wq, poll);
+
+	pr_devel("afu_poll wait done pe: %i\n", ctx->ph);
+
+	spin_lock_irqsave(&ctx->lock, flags);
+	if (ctx->pending_irq || ctx->pending_fault ||
+	    ctx->pending_afu_err)
+		mask |= POLLIN | POLLRDNORM;
+	else if (ctx->status == CLOSED)
+		/* Only error on closed when there are no further events pending
+		 */
+		mask |= POLLERR;
+	spin_unlock_irqrestore(&ctx->lock, flags);
+
+	pr_devel("afu_poll pe: %i returning %#x\n", ctx->ph, mask);
+
+	return mask;
+}
+
+static ssize_t afu_read(struct file *file, char __user *buf, size_t count,
+			loff_t *off)
+{
+	struct cxl_context_t *ctx = file->private_data;
+	struct cxl_event event;
+	unsigned long flags;
+	ssize_t size;
+	DEFINE_WAIT(wait);
+
+	if (count < sizeof(struct cxl_event_header))
+		return -EINVAL;
+
+	while (1) {
+		spin_lock_irqsave(&ctx->lock, flags);
+		if (ctx->pending_irq || ctx->pending_fault ||
+		    ctx->pending_afu_err || (ctx->status == CLOSED))
+			break;
+		spin_unlock_irqrestore(&ctx->lock, flags);
+
+		if (file->f_flags & O_NONBLOCK)
+			return -EAGAIN;
+
+		prepare_to_wait(&ctx->wq, &wait, TASK_INTERRUPTIBLE);
+		if (!(ctx->pending_irq || ctx->pending_fault ||
+		      ctx->pending_afu_err || (ctx->status == CLOSED))) {
+			pr_devel("afu_read going to sleep...\n");
+			schedule();
+			pr_devel("afu_read woken up\n");
+		}
+		finish_wait(&ctx->wq, &wait);
+
+		if (signal_pending(current))
+			return -ERESTARTSYS;
+	}
+
+	memset(&event, 0, sizeof(event));
+	event.header.process_element = ctx->ph;
+	if (ctx->pending_irq) {
+		pr_devel("afu_read delivering AFU interrupt\n");
+		event.header.size = sizeof(struct cxl_event_afu_interrupt);
+		event.header.type = CXL_EVENT_AFU_INTERRUPT;
+		event.irq.irq = find_first_bit(ctx->irq_bitmap, ctx->irq_count) + 1;
+
+		/* Only clear the IRQ if we can send the whole event: */
+		if (count >= event.header.size) {
+			clear_bit(event.irq.irq - 1, ctx->irq_bitmap);
+			if (bitmap_empty(ctx->irq_bitmap, ctx->irq_count))
+				ctx->pending_irq = false;
+		}
+	} else if (ctx->pending_fault) {
+		pr_devel("afu_read delivering data storage fault\n");
+		event.header.size = sizeof(struct cxl_event_data_storage);
+		event.header.type = CXL_EVENT_DATA_STORAGE;
+		event.fault.addr = ctx->fault_addr;
+
+		/* Only clear the fault if we can send the whole event: */
+		if (count >= event.header.size)
+			ctx->pending_fault = false;
+	} else if (ctx->pending_afu_err) {
+		pr_devel("afu_read delivering afu error\n");
+		event.header.size = sizeof(struct cxl_event_afu_error);
+		event.header.type = CXL_EVENT_AFU_ERROR;
+		event.afu_err.err = ctx->afu_err;
+
+		/* Only clear the fault if we can send the whole event: */
+		if (count >= event.header.size)
+			ctx->pending_afu_err = false;
+	} else if (ctx->status == CLOSED) {
+		pr_devel("afu_read fatal error\n");
+		spin_unlock_irqrestore(&ctx->lock, flags);
+		return -EIO;
+	} else
+		WARN(1, "afu_read must be buggy\n");
+
+	spin_unlock_irqrestore(&ctx->lock, flags);
+
+	size = min_t(size_t, count, event.header.size);
+	if (copy_to_user(buf, &event, size))
+		return -EFAULT;
+
+	return size;
+}
+
+static const struct file_operations afu_fops = {
+	.owner		= THIS_MODULE,
+	.open           = afu_open,
+	.poll		= afu_poll,
+	.read		= afu_read,
+	.release        = afu_release,
+	.unlocked_ioctl = afu_ioctl,
+	.compat_ioctl   = afu_compat_ioctl,
+	.mmap           = afu_mmap,
+};
+
+static const struct file_operations afu_master_fops = {
+	.owner		= THIS_MODULE,
+	.open           = afu_master_open,
+	.poll		= afu_poll,
+	.read		= afu_read,
+	.release        = afu_release,
+	.unlocked_ioctl = afu_ioctl,
+	.compat_ioctl   = afu_compat_ioctl,
+	.mmap           = afu_mmap,
+};
+
+static char *cxl_devnode(struct device *dev, umode_t *mode)
+{
+	struct cxl_afu_t *afu;
+
+	if (CXL_DEVT_IS_CARD(dev->devt)) {
+		/* These minor numbers will eventually be used to program the
+		 * PSL and AFUs once we have dynamic reprogramming support */
+		return NULL;
+	} else { /* CXL_DEVT_IS_AFU */
+		/* Default character devices in each programming model just get
+		 * named /dev/cxl/afuX.Y */
+		afu = dev_get_drvdata(dev);
+		if ((afu->current_model == CXL_MODEL_DEDICATED) &&
+				CXL_DEVT_IS_AFU_M(dev->devt))
+			return kasprintf(GFP_KERNEL, "cxl/%s", dev_name(&afu->dev));
+		if ((afu->current_model == CXL_MODEL_DIRECTED) &&
+				CXL_DEVT_IS_AFU_S(dev->devt))
+			return kasprintf(GFP_KERNEL, "cxl/%s", dev_name(&afu->dev));
+	}
+	return kasprintf(GFP_KERNEL, "cxl/%s", dev_name(dev));
+}
+
+int cxl_chardev_m_afu_add(struct cxl_afu_t *afu)
+{
+	struct device *dev;
+	int rc;
+
+	cdev_init(&afu->afu_cdev_m, &afu_master_fops);
+	if ((rc = cdev_add(&afu->afu_cdev_m, CXL_AFU_MKDEV_M(afu), 1))) {
+		dev_err(&afu->dev, "Unable to add master chardev: %i\n", rc);
+		return rc;
+	}
+
+	dev = device_create(cxl_class, &afu->dev, CXL_AFU_MKDEV_M(afu), afu,
+			"afu%i.%im", afu->adapter->adapter_num, afu->slice);
+	if (IS_ERR(dev)) {
+		rc = PTR_ERR(dev);
+		dev_err(&afu->dev, "Unable to create master chardev in sysfs: %i\n", rc);
+		goto err;
+	}
+
+	afu->chardev_m = dev;
+
+	return 0;
+err:
+	cdev_del(&afu->afu_cdev_m);
+	return rc;
+}
+
+int cxl_chardev_s_afu_add(struct cxl_afu_t *afu)
+{
+	struct device *dev;
+	int rc;
+
+	cdev_init(&afu->afu_cdev_s, &afu_fops);
+	if ((rc = cdev_add(&afu->afu_cdev_s, CXL_AFU_MKDEV(afu), 1))) {
+		dev_err(&afu->dev, "Unable to add shared chardev: %i\n", rc);
+		return rc;
+	}
+
+	dev = device_create(cxl_class, &afu->dev, CXL_AFU_MKDEV(afu), afu,
+			"afu%i.%is", afu->adapter->adapter_num, afu->slice);
+	if (IS_ERR(dev)) {
+		rc = PTR_ERR(dev);
+		dev_err(&afu->dev, "Unable to create shared chardev in sysfs: %i\n", rc);
+		goto err;
+	}
+
+	afu->chardev_s = dev;
+
+	return 0;
+err:
+	cdev_del(&afu->afu_cdev_s);
+	return rc;
+}
+
+void cxl_chardev_afu_remove(struct cxl_afu_t *afu)
+{
+	if (afu->chardev_m) {
+		cdev_del(&afu->afu_cdev_m);
+		device_unregister(afu->chardev_m);
+	}
+	if (afu->chardev_s) {
+		cdev_del(&afu->afu_cdev_s);
+		device_unregister(afu->chardev_s);
+	}
+}
+
+int cxl_register_afu(struct cxl_afu_t *afu)
+{
+	afu->dev.class = cxl_class;
+
+	return device_register(&afu->dev);
+}
+EXPORT_SYMBOL(cxl_register_afu);
+
+int cxl_register_adapter(struct cxl_t *adapter)
+{
+	adapter->dev.class = cxl_class;
+
+	/* Future: When we support dynamically reprogramming the PSL & AFU we
+	 * will expose the interface to do that via a chardev:
+	 * adapter->dev.devt = CXL_CARD_MKDEV(adapter);
+	 */
+
+	return device_register(&adapter->dev);
+}
+EXPORT_SYMBOL(cxl_register_adapter);
+
+int __init cxl_file_init(void)
+{
+	int rc;
+
+	if ((rc = alloc_chrdev_region(&cxl_dev, 0, CXL_NUM_MINORS, "cxl"))) {
+		pr_err("Unable to allocate CXL major number: %i\n", rc);
+		return rc;
+	}
+
+	pr_devel("CXL device allocated, MAJOR %i\n", MAJOR(cxl_dev));
+
+	cxl_class = class_create(THIS_MODULE, "cxl");
+	if (IS_ERR(cxl_class)) {
+		pr_err("Unable to create CXL class\n");
+		rc = PTR_ERR(cxl_class);
+		goto err;
+	}
+	cxl_class->devnode = cxl_devnode;
+
+	return 0;
+
+err:
+	unregister_chrdev_region(cxl_dev, CXL_NUM_MINORS);
+	return rc;
+}
+
+void cxl_file_exit(void)
+{
+	unregister_chrdev_region(cxl_dev, CXL_NUM_MINORS);
+	class_destroy(cxl_class);
+}
diff --git a/drivers/misc/cxl/irq.c b/drivers/misc/cxl/irq.c
new file mode 100644
index 0000000..3e01e1d
--- /dev/null
+++ b/drivers/misc/cxl/irq.c
@@ -0,0 +1,405 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/interrupt.h>
+#include <linux/workqueue.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
+#include <linux/slab.h>
+#include <linux/pid.h>
+#include <asm/cputable.h>
+#include <misc/cxl.h>
+
+#include "cxl.h"
+
+/* XXX: This is implementation specific */
+static irqreturn_t handle_psl_slice_error(struct cxl_context_t *ctx, u64 dsisr, u64 errstat)
+{
+	u64 fir1, fir2, fir_slice, serr, afu_debug;
+
+	fir1 = cxl_p1_read(ctx->afu->adapter, CXL_PSL_FIR1);
+	fir2 = cxl_p1_read(ctx->afu->adapter, CXL_PSL_FIR2);
+	fir_slice = cxl_p1n_read(ctx->afu, CXL_PSL_FIR_SLICE_An);
+	serr = cxl_p1n_read(ctx->afu, CXL_PSL_SERR_An);
+	afu_debug = cxl_p1n_read(ctx->afu, CXL_AFU_DEBUG_An);
+
+	dev_crit(&ctx->afu->dev, "PSL ERROR STATUS: 0x%.16llx\n", errstat);
+	dev_crit(&ctx->afu->dev, "PSL_FIR1: 0x%.16llx\n", fir1);
+	dev_crit(&ctx->afu->dev, "PSL_FIR2: 0x%.16llx\n", fir2);
+	dev_crit(&ctx->afu->dev, "PSL_SERR_An: 0x%.16llx\n", serr);
+	dev_crit(&ctx->afu->dev, "PSL_FIR_SLICE_An: 0x%.16llx\n", fir_slice);
+	dev_crit(&ctx->afu->dev, "CXL_PSL_AFU_DEBUG_An: 0x%.16llx\n", afu_debug);
+
+	dev_crit(&ctx->afu->dev, "STOPPING CXL TRACE\n");
+	cxl_stop_trace(ctx->afu->adapter);
+
+	return cxl_ops->ack_irq(ctx, 0, errstat);
+}
+
+irqreturn_t cxl_slice_irq_err(int irq, void *data)
+{
+	struct cxl_afu_t *afu = data;
+	u64 fir_slice, errstat, serr, afu_debug;
+
+	WARN(1, "CXL SLICE ERROR interrupt %i\n", irq);
+
+	serr = cxl_p1n_read(afu, CXL_PSL_SERR_An);
+	fir_slice = cxl_p1n_read(afu, CXL_PSL_FIR_SLICE_An);
+	errstat = cxl_p2n_read(afu, CXL_PSL_ErrStat_An);
+	afu_debug = cxl_p1n_read(afu, CXL_AFU_DEBUG_An);
+	dev_crit(&afu->dev, "PSL_SERR_An: 0x%.16llx\n", serr);
+	dev_crit(&afu->dev, "PSL_FIR_SLICE_An: 0x%.16llx\n", fir_slice);
+	dev_crit(&afu->dev, "CXL_PSL_ErrStat_An: 0x%.16llx\n", errstat);
+	dev_crit(&afu->dev, "CXL_PSL_AFU_DEBUG_An: 0x%.16llx\n", afu_debug);
+
+	cxl_p1n_write(afu, CXL_PSL_SERR_An, serr);
+
+	return IRQ_HANDLED;
+}
+
+irqreturn_t cxl_irq_err(int irq, void *data)
+{
+	struct cxl_t *adapter = data;
+	u64 fir1, fir2, err_ivte;
+
+	WARN(1, "CXL ERROR interrupt %i\n", irq);
+
+	err_ivte = cxl_p1_read(adapter, CXL_PSL_ErrIVTE);
+	dev_crit(&adapter->dev, "PSL_ErrIVTE: 0x%.16llx\n", err_ivte);
+
+	dev_crit(&adapter->dev, "STOPPING CXL TRACE\n");
+	cxl_stop_trace(adapter);
+
+	fir1 = cxl_p1_read(adapter, CXL_PSL_FIR1);
+	fir2 = cxl_p1_read(adapter, CXL_PSL_FIR2);
+
+	dev_crit(&adapter->dev, "PSL_FIR1: 0x%.16llx\nPSL_FIR2: 0x%.16llx\n", fir1, fir2);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t schedule_cxl_fault(struct cxl_context_t *ctx, u64 dsisr, u64 dar)
+{
+	ctx->dsisr = dsisr;
+	ctx->dar = dar;
+	schedule_work(&ctx->fault_work);
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t cxl_irq(int irq, void *data)
+{
+	struct cxl_context_t *ctx = data;
+	struct cxl_irq_info irq_info;
+	u64 dsisr, dar;
+	int result;
+
+	if ((result = cxl_ops->get_irq(ctx, &irq_info))) {
+		WARN(1, "Unable to get CXL IRQ Info: %i\n", result);
+		return IRQ_HANDLED;
+	}
+
+	dsisr = irq_info.dsisr;
+	dar = irq_info.dar;
+
+	pr_devel("CXL interrupt %i for afu pe: %i DSISR: %#llx DAR: %#llx\n", irq, ctx->ph, dsisr, dar);
+
+	if (dsisr & CXL_PSL_DSISR_An_DS) {
+		/* We don't inherently need to sleep to handle this, but we do
+		 * need to get a ref to the task's mm, which we can't do from
+		 * irq context without the potential for a deadlock since it
+		 * takes the task_lock. An alternate option would be to keep a
+		 * reference to the task's mm the entire time it has cxl open,
+		 * but to do that we need to solve the issue where we hold a
+		 * ref to the mm, but the mm can hold a ref to the fd after an
+		 * mmap preventing anything from being cleaned up. */
+		pr_devel("Scheduling segment miss handling for later pe: %i\n", ctx->ph);
+		return schedule_cxl_fault(ctx, dsisr, dar);
+	}
+
+	if (dsisr & CXL_PSL_DSISR_An_M)
+		pr_devel("CXL interrupt: PTE not found\n");
+	if (dsisr & CXL_PSL_DSISR_An_P)
+		pr_devel("CXL interrupt: Storage protection violation\n");
+	if (dsisr & CXL_PSL_DSISR_An_A)
+		pr_devel("CXL interrupt: AFU lock access to write through or cache inhibited storage\n");
+	if (dsisr & CXL_PSL_DSISR_An_S)
+		pr_devel("CXL interrupt: Access was afu_wr or afu_zero\n");
+	if (dsisr & CXL_PSL_DSISR_An_K)
+		pr_devel("CXL interrupt: Access not permitted by virtual page class key protection\n");
+
+	if (dsisr & CXL_PSL_DSISR_An_DM) {
+		/* In some cases we might be able to handle the fault
+		 * immediately if hash_page would succeed, but we still need
+		 * the task's mm, which as above we can't get without a lock */
+		pr_devel("Scheduling page fault handling for later pe: %i\n", ctx->ph);
+		return schedule_cxl_fault(ctx, dsisr, dar);
+	}
+	if (dsisr & CXL_PSL_DSISR_An_ST)
+		WARN(1, "CXL interrupt: Segment Table PTE not found\n");
+	if (dsisr & CXL_PSL_DSISR_An_UR)
+		pr_devel("CXL interrupt: AURP PTE not found\n");
+	if (dsisr & CXL_PSL_DSISR_An_PE)
+		return handle_psl_slice_error(ctx, dsisr, irq_info.errstat);
+	if (dsisr & CXL_PSL_DSISR_An_AE) {
+		pr_devel("CXL interrupt: AFU Error %llx\n", irq_info.afu_err);
+
+		if (ctx->pending_afu_err) {
+			/* This shouldn't happen - the PSL treats these errors
+			 * as fatal and will have reset the AFU, so there's not
+			 * much point buffering multiple AFU errors.
+			 * OTOH if we DO ever see a storm of these come in it's
+			 * probably best that we log them somewhere: */
+			dev_err_ratelimited(&ctx->afu->dev, "CXL AFU Error "
+					    "undelivered to pe %i: %llx\n",
+					    ctx->ph, irq_info.afu_err);
+		} else {
+			spin_lock(&ctx->lock);
+			ctx->afu_err = irq_info.afu_err;
+			ctx->pending_afu_err = 1;
+			spin_unlock(&ctx->lock);
+
+			wake_up_all(&ctx->wq);
+		}
+
+		cxl_ops->ack_irq(ctx, CXL_PSL_TFC_An_A, 0);
+		return IRQ_HANDLED;
+	}
+	if (dsisr & CXL_PSL_DSISR_An_OC)
+		pr_devel("CXL interrupt: OS Context Warning\n");
+
+	WARN(1, "Unhandled CXL PSL IRQ\n");
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t cxl_irq_multiplexed(int irq, void *data)
+{
+	struct cxl_afu_t *afu = data;
+	struct cxl_context_t *ctx;
+	int ph = cxl_p2n_read(afu, CXL_PSL_PEHandle_An) & 0xffff;
+	irqreturn_t ret;
+
+	rcu_read_lock();
+	ctx = idr_find(&afu->contexts_idr, ph);
+	if (ctx) {
+		ret = cxl_irq(irq, ctx);
+		rcu_read_unlock();
+		return ret;
+	}
+	rcu_read_unlock();
+
+	WARN(1, "Unable to demultiplex CXL PSL IRQ\n");
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t cxl_irq_afu(int irq, void *data)
+{
+	struct cxl_context_t *ctx = data;
+	irq_hw_number_t hwirq = irqd_to_hwirq(irq_get_irq_data(irq));
+	int irq_off, afu_irq = 1;
+	__u16 range;
+	int r;
+
+	for (r = 1; r < CXL_IRQ_RANGES; r++) {
+		irq_off = hwirq - ctx->irqs.offset[r];
+		range = ctx->irqs.range[r];
+		if (irq_off >= 0 && irq_off < range) {
+			afu_irq += irq_off;
+			break;
+		}
+		afu_irq += range;
+	}
+	if (unlikely(r >= CXL_IRQ_RANGES)) {
+		WARN(1, "Received AFU IRQ out of range for pe %i (virq %i hwirq %lx)\n",
+		     ctx->ph, irq, hwirq);
+		return IRQ_HANDLED;
+	}
+
+	pr_devel("Received AFU interrupt %i for pe: %i (virq %i hwirq %lx)\n",
+	       afu_irq, ctx->ph, irq, hwirq);
+
+	if (unlikely(!ctx->irq_bitmap)) {
+		WARN(1, "Received AFU IRQ for context with no IRQ bitmap\n");
+		return IRQ_HANDLED;
+	}
+	spin_lock(&ctx->lock);
+	set_bit(afu_irq - 1, ctx->irq_bitmap);
+	ctx->pending_irq = true;
+	spin_unlock(&ctx->lock);
+
+	wake_up_all(&ctx->wq);
+
+	return IRQ_HANDLED;
+}
+
+unsigned int cxl_map_irq(struct cxl_t *adapter, irq_hw_number_t hwirq,
+			 irq_handler_t handler, void *cookie)
+{
+	unsigned int virq;
+	int result;
+
+	/* IRQ Domain? */
+	virq = irq_create_mapping(NULL, hwirq);
+	if (!virq) {
+		dev_warn(&adapter->dev, "cxl_map_irq: irq_create_mapping failed\n");
+		return 0;
+	}
+
+	if (adapter->driver->setup_irq)
+		adapter->driver->setup_irq(adapter, hwirq, virq);
+
+	pr_devel("hwirq %#lx mapped to virq %u\n", hwirq, virq);
+
+	result = request_irq(virq, handler, 0, "cxl", cookie);
+	if (result) {
+		dev_warn(&adapter->dev, "cxl_map_irq: request_irq failed: %i\n", result);
+		return 0;
+	}
+
+	return virq;
+}
+
+void cxl_unmap_irq(unsigned int virq, void *cookie)
+{
+	free_irq(virq, cookie);
+	irq_dispose_mapping(virq);
+}
+
+static int cxl_register_one_irq(struct cxl_t *adapter,
+				irq_handler_t handler,
+				void *cookie,
+				irq_hw_number_t *dest_hwirq,
+				unsigned int *dest_virq)
+{
+	int hwirq, virq;
+
+	if ((hwirq = adapter->driver->alloc_one_irq(adapter)) < 0)
+		return hwirq;
+
+	if (!(virq = cxl_map_irq(adapter, hwirq, handler, cookie)))
+		goto err;
+
+	*dest_hwirq = hwirq;
+	*dest_virq = virq;
+
+	return 0;
+
+err:
+	adapter->driver->release_one_irq(adapter, hwirq);
+	return -ENOMEM;
+}
+
+int cxl_register_psl_err_irq(struct cxl_t *adapter)
+{
+	int rc;
+
+	if ((rc = cxl_register_one_irq(adapter, cxl_irq_err, adapter,
+				       &adapter->err_hwirq,
+				       &adapter->err_virq)))
+		return rc;
+
+	cxl_p1_write(adapter, CXL_PSL_ErrIVTE, adapter->err_hwirq & 0xffff);
+
+	return 0;
+}
+EXPORT_SYMBOL(cxl_register_psl_err_irq);
+
+void cxl_release_psl_err_irq(struct cxl_t *adapter)
+{
+	cxl_p1_write(adapter, CXL_PSL_ErrIVTE, 0x0000000000000000);
+	cxl_unmap_irq(adapter->err_virq, adapter);
+	adapter->driver->release_one_irq(adapter, adapter->err_hwirq);
+}
+EXPORT_SYMBOL(cxl_release_psl_err_irq);
+
+int cxl_register_serr_irq(struct cxl_afu_t *afu)
+{
+	u64 serr;
+	int rc;
+
+	if ((rc = cxl_register_one_irq(afu->adapter, cxl_slice_irq_err, afu,
+				       &afu->serr_hwirq,
+				       &afu->serr_virq)))
+		return rc;
+
+	serr = cxl_p1n_read(afu, CXL_PSL_SERR_An);
+	serr = (serr & 0x00ffffffffff0000ULL) | (afu->serr_hwirq & 0xffff);
+	cxl_p1n_write(afu, CXL_PSL_SERR_An, serr);
+
+	return 0;
+}
+EXPORT_SYMBOL(cxl_register_serr_irq);
+
+void cxl_release_serr_irq(struct cxl_afu_t *afu)
+{
+	cxl_p1n_write(afu, CXL_PSL_SERR_An, 0x0000000000000000);
+	cxl_unmap_irq(afu->serr_virq, afu);
+	afu->adapter->driver->release_one_irq(afu->adapter, afu->serr_hwirq);
+}
+EXPORT_SYMBOL(cxl_release_serr_irq);
+
+int cxl_register_psl_irq(struct cxl_afu_t *afu)
+{
+	return cxl_register_one_irq(afu->adapter, cxl_irq_multiplexed, afu,
+			&afu->psl_hwirq, &afu->psl_virq);
+}
+EXPORT_SYMBOL(cxl_register_psl_irq);
+
+void cxl_release_psl_irq(struct cxl_afu_t *afu)
+{
+	cxl_unmap_irq(afu->psl_virq, afu);
+	afu->adapter->driver->release_one_irq(afu->adapter, afu->psl_hwirq);
+}
+EXPORT_SYMBOL(cxl_release_psl_irq);
+
+int afu_register_irqs(struct cxl_context_t *ctx, u32 count)
+{
+	irq_hw_number_t hwirq;
+	int rc, r, i;
+
+	if ((rc = ctx->afu->adapter->driver->alloc_irq_ranges(&ctx->irqs, ctx->afu->adapter, count)))
+		return rc;
+
+	/* Multiplexed PSL Interrupt */
+	ctx->irqs.offset[0] = ctx->afu->psl_hwirq;
+	ctx->irqs.range[0] = 1;
+
+	ctx->irq_count = count;
+	ctx->irq_bitmap = kcalloc(BITS_TO_LONGS(count),
+				  sizeof(*ctx->irq_bitmap), GFP_KERNEL);
+	if (!ctx->irq_bitmap)
+		return -ENOMEM;
+	for (r = 1; r < CXL_IRQ_RANGES; r++) {
+		hwirq = ctx->irqs.offset[r];
+		for (i = 0; i < ctx->irqs.range[r]; hwirq++, i++) {
+			cxl_map_irq(ctx->afu->adapter, hwirq,
+				     cxl_irq_afu, ctx);
+		}
+	}
+
+	return 0;
+}
+
+void afu_release_irqs(struct cxl_context_t *ctx)
+{
+	irq_hw_number_t hwirq;
+	unsigned int virq;
+	int r, i;
+
+	for (r = 1; r < CXL_IRQ_RANGES; r++) {
+		hwirq = ctx->irqs.offset[r];
+		for (i = 0; i < ctx->irqs.range[r]; hwirq++, i++) {
+			virq = irq_find_mapping(NULL, hwirq);
+			if (virq)
+				cxl_unmap_irq(virq, ctx);
+		}
+	}
+
+	ctx->afu->adapter->driver->release_irq_ranges(&ctx->irqs, ctx->afu->adapter);
+}
diff --git a/drivers/misc/cxl/main.c b/drivers/misc/cxl/main.c
new file mode 100644
index 0000000..fb0e0fc
--- /dev/null
+++ b/drivers/misc/cxl/main.c
@@ -0,0 +1,238 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/spinlock.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/mutex.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/of.h>
+#include <linux/slab.h>
+#include <linux/idr.h>
+#include <asm/cputable.h>
+#include <misc/cxl.h>
+
+#include "cxl.h"
+
+static DEFINE_SPINLOCK(adapter_idr_lock);
+static DEFINE_IDR(cxl_adapter_idr);
+
+const struct cxl_backend_ops *cxl_ops;
+EXPORT_SYMBOL(cxl_ops);
+
+uint cxl_verbose;
+EXPORT_SYMBOL(cxl_verbose);
+module_param_named(verbose, cxl_verbose, uint, 0600);
+MODULE_PARM_DESC(verbose, "Enable verbose dmesg output");
+
+static inline void cxl_slbia_core(struct mm_struct *mm)
+{
+	struct cxl_t *adapter;
+	struct cxl_afu_t *afu;
+	struct cxl_context_t *ctx;
+	struct task_struct *task;
+	unsigned long flags;
+	int card, slice, id;
+
+	pr_devel("%s called\n", __func__);
+
+	spin_lock(&adapter_idr_lock);
+	idr_for_each_entry(&cxl_adapter_idr, adapter, card) {
+		/* XXX: Make this lookup faster with link from mm to ctx */
+		spin_lock(&adapter->afu_list_lock);
+		for (slice = 0; slice < adapter->slices; slice++) {
+			afu = adapter->afu[slice];
+			if (!afu->enabled)
+				continue;
+			rcu_read_lock();
+			idr_for_each_entry(&afu->contexts_idr, ctx, id) {
+				if (!(task = get_pid_task(ctx->pid, PIDTYPE_PID))) {
+					pr_devel("%s unable to get task %i\n",
+						 __func__, pid_nr(ctx->pid));
+					continue;
+				}
+
+				if (task->mm != mm)
+					goto next;
+
+				pr_devel("%s matched mm - card: %i afu: %i pe: %i\n",
+					 __func__, adapter->adapter_num, slice, ctx->ph);
+
+				spin_lock_irqsave(&ctx->sst_lock, flags);
+				if (!ctx->sstp)
+					goto next_unlock;
+				memset(ctx->sstp, 0, ctx->sst_size);
+				mb();
+				cxl_ops->slbia(afu);
+
+next_unlock:
+				spin_unlock_irqrestore(&ctx->sst_lock, flags);
+next:
+				put_task_struct(task);
+			}
+			rcu_read_unlock();
+		}
+		spin_unlock(&adapter->afu_list_lock);
+	}
+	spin_unlock(&adapter_idr_lock);
+}
+
+struct cxl_calls cxl_calls = {
+	.cxl_slbia = cxl_slbia_core,
+	.owner = THIS_MODULE,
+};
+
+int cxl_alloc_sst(struct cxl_context_t *ctx, u64 *sstp0, u64 *sstp1)
+{
+	unsigned long vsid, flags;
+	u64 ea_mask;
+	u64 size;
+
+	*sstp0 = 0;
+	*sstp1 = 0;
+
+	ctx->sst_size = PAGE_SIZE;
+	ctx->sst_lru = 0;
+	if (!ctx->sstp) {
+		ctx->sstp = (struct cxl_sste *)get_zeroed_page(GFP_KERNEL);
+		pr_devel("SSTP allocated at 0x%p\n", ctx->sstp);
+	} else {
+		pr_devel("Zeroing and reusing SSTP already allocated at 0x%p\n", ctx->sstp);
+		spin_lock_irqsave(&ctx->sst_lock, flags);
+		memset(ctx->sstp, 0, PAGE_SIZE);
+		cxl_ops->slbia(ctx->afu);
+		spin_unlock_irqrestore(&ctx->sst_lock, flags);
+	}
+	if (!ctx->sstp) {
+		pr_err("cxl_alloc_sst: Unable to allocate segment table\n");
+		return -ENOMEM;
+	}
+
+	vsid  = get_kernel_vsid((u64)ctx->sstp, mmu_kernel_ssize) << 12;
+
+	*sstp0 |= (u64)mmu_kernel_ssize << CXL_SSTP0_An_B_SHIFT;
+	*sstp0 |= (SLB_VSID_KERNEL | mmu_psize_defs[mmu_linear_psize].sllp) << 50;
+
+	size = (((u64)ctx->sst_size >> 8) - 1) << CXL_SSTP0_An_SegTableSize_SHIFT;
+	if (unlikely(size & ~CXL_SSTP0_An_SegTableSize_MASK)) {
+		WARN(1, "Impossible segment table size\n");
+		return -EINVAL;
+	}
+	*sstp0 |= size;
+
+	if (mmu_kernel_ssize == MMU_SEGSIZE_256M)
+		ea_mask = 0xfffff00ULL;
+	else
+		ea_mask = 0xffffffff00ULL;
+
+	*sstp0 |=  vsid >>     (50-14);  /*   Top 14 bits of VSID */
+	*sstp1 |= (vsid << (64-(50-14))) & ~ea_mask;
+	*sstp1 |= (u64)ctx->sstp & ea_mask;
+	*sstp1 |= CXL_SSTP1_An_V;
+
+	pr_devel("Looked up %#llx: slbfee. %#llx (ssize: %x, vsid: %#lx), copied to SSTP0: %#llx, SSTP1: %#llx\n",
+			(u64)ctx->sstp, (u64)ctx->sstp & ESID_MASK, mmu_kernel_ssize, vsid, *sstp0, *sstp1);
+
+	return 0;
+}
+
+/* Find a CXL adapter by its number and increase its refcount */
+struct cxl_t *get_cxl_adapter(int num)
+{
+	struct cxl_t *adapter;
+
+	spin_lock(&adapter_idr_lock);
+	if ((adapter = idr_find(&cxl_adapter_idr, num)))
+		get_device(&adapter->dev);
+	spin_unlock(&adapter_idr_lock);
+
+	return adapter;
+}
+
+int cxl_alloc_adapter_nr(struct cxl_t *adapter)
+{
+	int i;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&adapter_idr_lock);
+	i = idr_alloc(&cxl_adapter_idr, adapter, 0, 0, GFP_NOWAIT);
+	spin_unlock(&adapter_idr_lock);
+	idr_preload_end();
+	if (i < 0)
+		return i;
+
+	adapter->adapter_num = i;
+
+	return 0;
+}
+EXPORT_SYMBOL(cxl_alloc_adapter_nr);
+
+void cxl_remove_adapter_nr(struct cxl_t *adapter)
+{
+	idr_remove(&cxl_adapter_idr, adapter->adapter_num);
+}
+EXPORT_SYMBOL(cxl_remove_adapter_nr);
+
+int cxl_afu_select_best_model(struct cxl_afu_t *afu)
+{
+	if (afu->models_supported & CXL_MODEL_DIRECTED)
+		return cxl_afu_activate_model(afu, CXL_MODEL_DIRECTED);
+
+	if (afu->models_supported & CXL_MODEL_DEDICATED)
+		return cxl_afu_activate_model(afu, CXL_MODEL_DEDICATED);
+
+	dev_warn(&afu->dev, "No supported programming models available\n");
+	/* We don't fail this so the user can inspect sysfs */
+	return 0;
+}
+EXPORT_SYMBOL(cxl_afu_select_best_model);
+
+static int __init init_cxl(void)
+{
+	int rc = 0;
+
+	if (!cpu_has_feature(CPU_FTR_HVMODE))
+		return -EPERM;
+
+	if ((rc = cxl_file_init()))
+		return rc;
+
+	cxl_debugfs_init();
+	init_cxl_native();
+
+	if ((rc = register_cxl_calls(&cxl_calls)))
+		goto err;
+
+	return 0;
+
+err:
+	cxl_debugfs_exit();
+	cxl_file_exit();
+
+	return rc;
+}
+
+static void exit_cxl(void)
+{
+	cxl_debugfs_exit();
+	cxl_file_exit();
+	unregister_cxl_calls(&cxl_calls);
+}
+
+module_init(init_cxl);
+module_exit(exit_cxl);
+
+MODULE_DESCRIPTION("IBM Coherent Accelerator");
+MODULE_AUTHOR("Ian Munsie <imunsie@au1.ibm.com>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
new file mode 100644
index 0000000..3c5c6a8
--- /dev/null
+++ b/drivers/misc/cxl/native.c
@@ -0,0 +1,649 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#undef DEBUG
+
+#include <linux/spinlock.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/mutex.h>
+#include <linux/mm.h>
+#include <linux/uaccess.h>
+#include <asm/synch.h>
+#include <misc/cxl.h>
+
+#include "cxl.h"
+
+static int afu_control(struct cxl_afu_t *afu, u64 command,
+		       u64 result, u64 mask, bool enabled)
+{
+	u64 AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An);
+	unsigned long timeout = jiffies + (HZ * CXL_TIMEOUT);
+
+	spin_lock(&afu->afu_cntl_lock);
+	pr_devel("AFU command starting: %llx\n", command);
+
+	cxl_p2n_write(afu, CXL_AFU_Cntl_An, AFU_Cntl | command);
+
+	AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An);
+	while ((AFU_Cntl & mask) != result) {
+		if (time_after_eq(jiffies, timeout)) {
+			dev_warn(&afu->dev, "WARNING: AFU control timed out!\n");
+			spin_unlock(&afu->afu_cntl_lock);
+			return -EBUSY;
+		}
+		pr_devel_ratelimited("AFU control... (0x%.16llx)\n",
+				     AFU_Cntl | command);
+		cpu_relax();
+		AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An);
+	}
+	pr_devel("AFU command complete: %llx\n", command);
+	afu->enabled = enabled;
+	spin_unlock(&afu->afu_cntl_lock);
+
+	return 0;
+}
+
+static int afu_enable(struct cxl_afu_t *afu)
+{
+	pr_devel("AFU enable request\n");
+
+	return afu_control(afu, CXL_AFU_Cntl_An_E,
+			   CXL_AFU_Cntl_An_ES_Enabled,
+			   CXL_AFU_Cntl_An_ES_MASK, true);
+}
+
+static int afu_disable(struct cxl_afu_t *afu)
+{
+	pr_devel("AFU disable request\n");
+
+	return afu_control(afu, 0, CXL_AFU_Cntl_An_ES_Disabled,
+			   CXL_AFU_Cntl_An_ES_MASK, false);
+}
+
+/* We have to disable when we reset */
+static int afu_reset_and_disable(struct cxl_afu_t *afu)
+{
+	pr_devel("AFU reset request\n");
+
+	return afu_control(afu, CXL_AFU_Cntl_An_RA,
+			   CXL_AFU_Cntl_An_RS_Complete | CXL_AFU_Cntl_An_ES_Disabled,
+			   CXL_AFU_Cntl_An_RS_MASK | CXL_AFU_Cntl_An_ES_MASK,
+			   false);
+}
+
+static int afu_check_and_enable(struct cxl_afu_t *afu)
+{
+	if (afu->enabled)
+		return 0;
+	return afu_enable(afu);
+}
+
+static int psl_purge(struct cxl_afu_t *afu)
+{
+	u64 PSL_CNTL = cxl_p1n_read(afu, CXL_PSL_SCNTL_An);
+	u64 AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An);
+	u64 dsisr, dar;
+	u64 start, end;
+	unsigned long timeout = jiffies + (HZ * CXL_TIMEOUT);
+
+	pr_devel("PSL purge request\n");
+
+	if ((AFU_Cntl & CXL_AFU_Cntl_An_ES_MASK) != CXL_AFU_Cntl_An_ES_Disabled) {
+		WARN(1, "psl_purge request while AFU not disabled!\n");
+		afu_disable(afu);
+	}
+
+	cxl_p1n_write(afu, CXL_PSL_SCNTL_An,
+		       PSL_CNTL | CXL_PSL_SCNTL_An_Pc);
+	start = local_clock();
+	PSL_CNTL = cxl_p1n_read(afu, CXL_PSL_SCNTL_An);
+	while ((PSL_CNTL &  CXL_PSL_SCNTL_An_Ps_MASK)
+			== CXL_PSL_SCNTL_An_Ps_Pending) {
+		if (time_after_eq(jiffies, timeout)) {
+			dev_warn(&afu->dev, "WARNING: PSL Purge timed out!\n");
+			return -EBUSY;
+		}
+		dsisr = cxl_p2n_read(afu, CXL_PSL_DSISR_An);
+		pr_devel_ratelimited("PSL purging... PSL_CNTL: 0x%.16llx  PSL_DSISR: 0x%.16llx\n", PSL_CNTL, dsisr);
+		if (dsisr & CXL_PSL_DSISR_TRANS) {
+			dar = cxl_p2n_read(afu, CXL_PSL_DAR_An);
+			dev_notice(&afu->dev, "PSL purge terminating pending translation, DSISR: 0x%.16llx, DAR: 0x%.16llx\n", dsisr, dar);
+			cxl_p2n_write(afu, CXL_PSL_TFC_An, CXL_PSL_TFC_An_AE);
+		} else if (dsisr) {
+			dev_notice(&afu->dev, "PSL purge acknowledging pending non-translation fault, DSISR: 0x%.16llx\n", dsisr);
+			cxl_p2n_write(afu, CXL_PSL_TFC_An, CXL_PSL_TFC_An_A);
+		} else {
+			cpu_relax();
+		}
+		PSL_CNTL = cxl_p1n_read(afu, CXL_PSL_SCNTL_An);
+	}
+	end = local_clock();
+	pr_devel("PSL purged in %lld ns\n", end - start);
+
+	cxl_p1n_write(afu, CXL_PSL_SCNTL_An,
+		       PSL_CNTL & ~CXL_PSL_SCNTL_An_Pc);
+	return 0;
+}
+
+static int spa_max_procs(int spa_size)
+{
+	/* From the CAIA:
+	 *    end_of_SPA_area = SPA_Base + ((n+4) * 128) + (( ((n*8) + 127) >> 7) * 128) + 255
+	 * Most of that junk is really just an overly-complicated way of saying
+	 * the last 256 bytes are __aligned(128), so it's really:
+	 *    end_of_SPA_area = end_of_PSL_queue_area + __aligned(128) 255
+	 * and
+	 *    end_of_PSL_queue_area = SPA_Base + ((n+4) * 128) + (n*8) - 1
+	 * so
+	 *    sizeof(SPA) = ((n+4) * 128) + (n*8) + __aligned(128) 256
+	 * Ignore the alignment (which is safe in this case as long as we are
+	 * careful with our rounding) and solve for n:
+	 */
+	return ((spa_size / 8) - 96) / 17;
+}
+
+static int alloc_spa(struct cxl_afu_t *afu)
+{
+	u64 spap;
+
+	/* Work out how many pages to allocate */
+	afu->spa_order = 0;
+	do {
+		afu->spa_order++;
+		afu->spa_size = (1 << afu->spa_order) * PAGE_SIZE;
+		afu->spa_max_procs = spa_max_procs(afu->spa_size);
+	} while (afu->spa_max_procs < afu->num_procs);
+
+	WARN_ON(afu->spa_size > 0x100000); /* Max size supported by the hardware */
+
+	if (!(afu->spa = (struct cxl_process_element *)
+	      __get_free_pages(GFP_KERNEL | __GFP_ZERO, afu->spa_order))) {
+		pr_err("cxl_alloc_spa: Unable to allocate scheduled process area\n");
+		return -ENOMEM;
+	}
+	pr_devel("spa pages: %i afu->spa_max_procs: %i   afu->num_procs: %i\n",
+		 1<<afu->spa_order, afu->spa_max_procs, afu->num_procs);
+
+	afu->sw_command_status = (__be64 *)((char *)afu->spa +
+					    ((afu->spa_max_procs + 3) * 128));
+
+	spap = virt_to_phys(afu->spa) & CXL_PSL_SPAP_Addr;
+	spap |= ((afu->spa_size >> (12 - CXL_PSL_SPAP_Size_Shift)) - 1) & CXL_PSL_SPAP_Size;
+	spap |= CXL_PSL_SPAP_V;
+	pr_devel("cxl: SPA allocated at 0x%p. Max processes: %i, sw_command_status: 0x%p CXL_PSL_SPAP_An=0x%016llx\n", afu->spa, afu->spa_max_procs, afu->sw_command_status, spap);
+	cxl_p1n_write(afu, CXL_PSL_SPAP_An, spap);
+
+	return 0;
+}
+
+static void release_spa(struct cxl_afu_t *afu)
+{
+	free_pages((unsigned long) afu->spa, afu->spa_order);
+}
+
+static void afu_slbia_native(struct cxl_afu_t *afu)
+{
+	pr_devel("cxl_afu_slbia issuing SLBIA command\n");
+	cxl_p2n_write(afu, CXL_SLBIA_An, CXL_SLBI_IQ_ALL);
+	while (cxl_p2n_read(afu, CXL_SLBIA_An) & CXL_SLBIA_P)
+		cpu_relax();
+}
+
+static void cxl_write_sstp(struct cxl_afu_t *afu, u64 sstp0, u64 sstp1)
+{
+	/* 1. Disable SSTP by writing 0 to SSTP1[V] */
+	cxl_p2n_write(afu, CXL_SSTP1_An, 0);
+
+	/* 2. Invalidate all SLB entries */
+	afu_slbia_native(afu);
+
+	/* 3. Set SSTP0_An */
+	cxl_p2n_write(afu, CXL_SSTP0_An, sstp0);
+
+	/* 4. Set SSTP1_An */
+	cxl_p2n_write(afu, CXL_SSTP1_An, sstp1);
+}
+
+/* Using the per-slice version (i.e. SLBIA_An) may improve performance here. */
+static void slb_invalid(struct cxl_context_t *ctx)
+{
+	struct cxl_t *adapter = ctx->afu->adapter;
+	u64 slbia;
+
+	WARN_ON(!mutex_is_locked(&ctx->afu->spa_mutex));
+
+	cxl_p1_write(adapter, CXL_PSL_LBISEL,
+			((u64)be32_to_cpu(ctx->elem->common.pid) << 32) |
+			be32_to_cpu(ctx->elem->lpid));
+	cxl_p1_write(adapter, CXL_PSL_SLBIA, CXL_SLBI_IQ_LPIDPID);
+
+	while (1) {
+		slbia = cxl_p1_read(adapter, CXL_PSL_SLBIA);
+		if (!(slbia & CXL_SLBIA_P))
+			break;
+		cpu_relax();
+	}
+}
+
+static int do_process_element_cmd(struct cxl_context_t *ctx,
+				  u64 cmd, u64 pe_state)
+{
+	u64 state;
+
+	WARN_ON(!ctx->afu->enabled);
+
+	ctx->elem->software_state = cpu_to_be32(pe_state);
+	smp_wmb();
+	*(ctx->afu->sw_command_status) = cpu_to_be64(cmd | ctx->ph);
+	smp_mb();
+	cxl_p1n_write(ctx->afu, CXL_PSL_LLCMD_An, cmd | ctx->ph);
+	while (1) {
+		state = be64_to_cpup(ctx->afu->sw_command_status);
+		if (state == ~0ULL) {
+			pr_err("cxl: Error adding process element to AFU\n");
+			return -1;
+		}
+		if ((state & (CXL_SPA_SW_CMD_MASK | CXL_SPA_SW_STATE_MASK  | CXL_SPA_SW_LINK_MASK)) ==
+		    (cmd | (cmd >> 16) | ctx->ph))
+			break;
+		/* The command won't finish in the PSL if there are
+		 * outstanding DSIs.  Hence we need to yield here so
+		 * that any outstanding DSIs can be serviced.  Tuning
+		 * possibility: we could spin for a while before
+		 * scheduling.
+		 */
+		schedule();
+
+	}
+	return 0;
+}
+
+static int add_process_element(struct cxl_context_t *ctx)
+{
+	int rc = 0;
+
+	mutex_lock(&ctx->afu->spa_mutex);
+	pr_devel("%s Adding pe: %i started\n", __func__, ctx->ph);
+	if (!(rc = do_process_element_cmd(ctx, CXL_SPA_SW_CMD_ADD, CXL_PE_SOFTWARE_STATE_V)))
+		ctx->pe_inserted = true;
+	pr_devel("%s Adding pe: %i finished\n", __func__, ctx->ph);
+	mutex_unlock(&ctx->afu->spa_mutex);
+	return rc;
+}
+
+static int terminate_process_element(struct cxl_context_t *ctx)
+{
+	int rc = 0;
+
+	/* fast path terminate if it's already invalid */
+	if (!(ctx->elem->software_state & cpu_to_be32(CXL_PE_SOFTWARE_STATE_V)))
+		return rc;
+
+	mutex_lock(&ctx->afu->spa_mutex);
+	pr_devel("%s Terminate pe: %i started\n", __func__, ctx->ph);
+	rc = do_process_element_cmd(ctx, CXL_SPA_SW_CMD_TERMINATE,
+				    CXL_PE_SOFTWARE_STATE_V | CXL_PE_SOFTWARE_STATE_T);
+	ctx->elem->software_state = 0;	/* Remove Valid bit */
+	pr_devel("%s Terminate pe: %i finished\n", __func__, ctx->ph);
+	mutex_unlock(&ctx->afu->spa_mutex);
+	return rc;
+}
+
+static int remove_process_element(struct cxl_context_t *ctx)
+{
+	int rc = 0;
+
+	mutex_lock(&ctx->afu->spa_mutex);
+	pr_devel("%s Remove pe: %i started\n", __func__, ctx->ph);
+	if (!(rc = do_process_element_cmd(ctx, CXL_SPA_SW_CMD_REMOVE, 0)))
+		ctx->pe_inserted = false;
+	slb_invalid(ctx);
+	pr_devel("%s Remove pe: %i finished\n", __func__, ctx->ph);
+	mutex_unlock(&ctx->afu->spa_mutex);
+
+	return rc;
+}
+
+
+static void assign_psn_space(struct cxl_context_t *ctx)
+{
+	if (!ctx->afu->pp_size || ctx->master) {
+		ctx->psn_phys = ctx->afu->psn_phys;
+		ctx->psn_size = ctx->afu->adapter->ps_size;
+	} else {
+		ctx->psn_phys = ctx->afu->psn_phys +
+			(ctx->afu->pp_offset + ctx->afu->pp_size * ctx->ph);
+		ctx->psn_size = ctx->afu->pp_size;
+	}
+}
+
+static int activate_afu_directed(struct cxl_afu_t *afu)
+{
+	int rc;
+
+	dev_info(&afu->dev, "Activating AFU directed model\n");
+
+	if (alloc_spa(afu))
+		return -ENOMEM;
+
+	cxl_p1n_write(afu, CXL_PSL_SCNTL_An, CXL_PSL_SCNTL_An_PM_AFU);
+	cxl_p1n_write(afu, CXL_PSL_AMOR_An, 0xFFFFFFFFFFFFFFFFULL);
+	cxl_p1n_write(afu, CXL_PSL_ID_An, CXL_PSL_ID_An_F | CXL_PSL_ID_An_L);
+
+	afu->current_model = CXL_MODEL_DIRECTED;
+	afu->num_procs = afu->max_procs_virtualised;
+
+	if ((rc = cxl_chardev_m_afu_add(afu)))
+		return rc;
+
+	if ((rc = cxl_chardev_s_afu_add(afu)))
+		goto err;
+
+	return 0;
+err:
+	cxl_chardev_afu_remove(afu);
+	return rc;
+}
+
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+#define set_endian(sr) ((sr) |= CXL_PSL_SR_An_LE)
+#else
+#define set_endian(sr) ((sr) &= ~(CXL_PSL_SR_An_LE))
+#endif
+
+static int attach_afu_directed(struct cxl_context_t *ctx, u64 wed, u64 amr)
+{
+	u64 sr, sstp0, sstp1;
+	int r, result;
+
+	assign_psn_space(ctx);
+
+	ctx->elem->ctxtime = 0; /* disable */
+	ctx->elem->lpid = cpu_to_be32(mfspr(SPRN_LPID));
+	ctx->elem->haurp = 0; /* disable */
+	ctx->elem->sdr = cpu_to_be64(mfspr(SPRN_SDR1));
+
+	sr = CXL_PSL_SR_An_SC;
+	if (ctx->master)
+		sr |= CXL_PSL_SR_An_MP;
+	if (mfspr(SPRN_LPCR) & LPCR_TC)
+		sr |= CXL_PSL_SR_An_TC;
+	/* HV=0, PR=1, R=1 for userspace
+	 * For kernel contexts: this would need to change
+	 */
+	sr |= CXL_PSL_SR_An_PR | CXL_PSL_SR_An_R;
+	set_endian(sr);
+	sr &= ~(CXL_PSL_SR_An_HV);
+	if (!test_tsk_thread_flag(current, TIF_32BIT))
+		sr |= CXL_PSL_SR_An_SF;
+	ctx->elem->common.pid = cpu_to_be32(current->pid);
+	ctx->elem->common.tid = 0;
+	ctx->elem->sr = cpu_to_be64(sr);
+
+	ctx->elem->common.csrp = 0; /* disable */
+	ctx->elem->common.aurp0 = 0; /* disable */
+	ctx->elem->common.aurp1 = 0; /* disable */
+
+	if ((result = cxl_alloc_sst(ctx, &sstp0, &sstp1)))
+		return result;
+
+	cxl_prefault(ctx, wed);
+
+	ctx->elem->common.sstp0 = cpu_to_be64(sstp0);
+	ctx->elem->common.sstp1 = cpu_to_be64(sstp1);
+
+	for (r = 0; r < CXL_IRQ_RANGES; r++) {
+		ctx->elem->ivte_offsets[r] = cpu_to_be16(ctx->irqs.offset[r]);
+		ctx->elem->ivte_ranges[r] = cpu_to_be16(ctx->irqs.range[r]);
+	}
+
+	ctx->elem->common.amr = cpu_to_be64(amr);
+	ctx->elem->common.wed = cpu_to_be64(wed);
+
+	/* The first process to attach needs to enable the AFU */
+	if ((result = afu_check_and_enable(ctx->afu)))
+		return result;
+
+	add_process_element(ctx);
+
+	return 0;
+}
+
+static int deactivate_afu_directed(struct cxl_afu_t *afu)
+{
+	dev_info(&afu->dev, "Deactivating AFU directed model\n");
+
+	afu->current_model = 0;
+	afu->num_procs = 0;
+
+	cxl_chardev_afu_remove(afu);
+
+	afu_reset_and_disable(afu);
+	afu_disable(afu);
+	psl_purge(afu);
+
+	release_spa(afu);
+
+	return 0;
+}
+
+static int activate_dedicated_process(struct cxl_afu_t *afu)
+{
+	dev_info(&afu->dev, "Activating dedicated process model\n");
+
+	cxl_p1n_write(afu, CXL_PSL_SCNTL_An, CXL_PSL_SCNTL_An_PM_Process);
+
+	cxl_p1n_write(afu, CXL_PSL_CtxTime_An, 0); /* disable */
+	cxl_p1n_write(afu, CXL_PSL_SPAP_An, 0);    /* disable */
+	cxl_p1n_write(afu, CXL_PSL_AMOR_An, 0xFFFFFFFFFFFFFFFFULL);
+	cxl_p1n_write(afu, CXL_PSL_LPID_An, mfspr(SPRN_LPID));
+	cxl_p1n_write(afu, CXL_HAURP_An, 0);       /* disable */
+	cxl_p1n_write(afu, CXL_PSL_SDR_An, mfspr(SPRN_SDR1));
+
+	cxl_p2n_write(afu, CXL_CSRP_An, 0);        /* disable */
+	cxl_p2n_write(afu, CXL_AURP0_An, 0);       /* disable */
+	cxl_p2n_write(afu, CXL_AURP1_An, 0);       /* disable */
+
+	afu->current_model = CXL_MODEL_DEDICATED;
+	afu->num_procs = 1;
+
+	return cxl_chardev_m_afu_add(afu);
+}
+
+static int attach_dedicated(struct cxl_context_t *ctx, u64 wed, u64 amr)
+{
+	struct cxl_afu_t *afu = ctx->afu;
+	u64 sr, sstp0, sstp1;
+	int result;
+
+	sr = CXL_PSL_SR_An_SC;
+	set_endian(sr);
+	if (ctx->master)
+		sr |= CXL_PSL_SR_An_MP;
+	if (mfspr(SPRN_LPCR) & LPCR_TC)
+		sr |= CXL_PSL_SR_An_TC;
+	sr |= CXL_PSL_SR_An_PR | CXL_PSL_SR_An_R;
+	if (!test_tsk_thread_flag(current, TIF_32BIT))
+		sr |= CXL_PSL_SR_An_SF;
+	cxl_p2n_write(afu, CXL_PSL_PID_TID_An, (u64)current->pid << 32);
+	cxl_p1n_write(afu, CXL_PSL_SR_An, sr);
+
+	if ((result = cxl_alloc_sst(ctx, &sstp0, &sstp1)))
+		return result;
+
+	cxl_prefault(ctx, wed);
+
+	cxl_write_sstp(afu, sstp0, sstp1);
+	cxl_p1n_write(afu, CXL_PSL_IVTE_Offset_An,
+		       (((u64)ctx->irqs.offset[0] & 0xffff) << 48) |
+		       (((u64)ctx->irqs.offset[1] & 0xffff) << 32) |
+		       (((u64)ctx->irqs.offset[2] & 0xffff) << 16) |
+			((u64)ctx->irqs.offset[3] & 0xffff));
+	cxl_p1n_write(afu, CXL_PSL_IVTE_Limit_An, (u64)
+		       (((u64)ctx->irqs.range[0] & 0xffff) << 48) |
+		       (((u64)ctx->irqs.range[1] & 0xffff) << 32) |
+		       (((u64)ctx->irqs.range[2] & 0xffff) << 16) |
+			((u64)ctx->irqs.range[3] & 0xffff));
+
+	cxl_p2n_write(afu, CXL_PSL_AMR_An, amr);
+
+	/* master-only context for dedicated mode */
+	assign_psn_space(ctx);
+
+	if ((result = afu_reset_and_disable(afu)))
+		return result;
+
+	cxl_p2n_write(afu, CXL_PSL_WED_An, wed);
+
+	return afu_enable(afu);
+}
+
+static int deactivate_dedicated_process(struct cxl_afu_t *afu)
+{
+	dev_info(&afu->dev, "Deactivating dedicated process model\n");
+
+	afu->current_model = 0;
+	afu->num_procs = 0;
+
+	cxl_chardev_afu_remove(afu);
+
+	return 0;
+}
+
+int _cxl_afu_deactivate_model(struct cxl_afu_t *afu, int model)
+{
+	if (model == CXL_MODEL_DIRECTED)
+		return deactivate_afu_directed(afu);
+	if (model == CXL_MODEL_DEDICATED)
+		return deactivate_dedicated_process(afu);
+	return 0;
+}
+
+int cxl_afu_deactivate_model(struct cxl_afu_t *afu)
+{
+	return _cxl_afu_deactivate_model(afu, afu->current_model);
+}
+EXPORT_SYMBOL(cxl_afu_deactivate_model);
+
+int cxl_afu_activate_model(struct cxl_afu_t *afu, int model)
+{
+	if (!model)
+		return 0;
+	if (!(model & afu->models_supported))
+		return -EINVAL;
+
+	if (model == CXL_MODEL_DIRECTED)
+		return activate_afu_directed(afu);
+	if (model == CXL_MODEL_DEDICATED)
+		return activate_dedicated_process(afu);
+
+	return -EINVAL;
+}
+EXPORT_SYMBOL(cxl_afu_activate_model);
+
+static int attach_process_native(struct cxl_context_t *ctx, bool kernel,
+			       u64 wed, u64 amr)
+{
+	ctx->kernel = kernel;
+	if (ctx->afu->current_model == CXL_MODEL_DIRECTED)
+		return attach_afu_directed(ctx, wed, amr);
+
+	if (ctx->afu->current_model == CXL_MODEL_DEDICATED)
+		return attach_dedicated(ctx, wed, amr);
+
+	return -EINVAL;
+}
+
+/* TODO: handle the case when this is called with IRQs off, which may
+ * happen when we unbind the driver.  Terminate & remove take a mutex
+ * lock and call schedule(), which is not allowed with a lock held.  We
+ * may need a version of do_process_element_cmd() that handles
+ * outstanding page faults. */
+static int detach_process_native(struct cxl_context_t *ctx)
+{
+	if (ctx->afu->current_model == CXL_MODEL_DEDICATED) {
+		afu_reset_and_disable(ctx->afu);
+		afu_disable(ctx->afu);
+		psl_purge(ctx->afu);
+		return 0;
+	}
+
+	if (!ctx->pe_inserted)
+		return 0;
+	if (terminate_process_element(ctx))
+		return -1;
+	if (remove_process_element(ctx))
+		return -1;
+
+	return 0;
+}
+
+static int get_irq_native(struct cxl_context_t *ctx, struct cxl_irq_info *info)
+{
+	u64 pidtid;
+
+	info->dsisr = cxl_p2n_read(ctx->afu, CXL_PSL_DSISR_An);
+	info->dar = cxl_p2n_read(ctx->afu, CXL_PSL_DAR_An);
+	info->dsr = cxl_p2n_read(ctx->afu, CXL_PSL_DSR_An);
+	pidtid = cxl_p2n_read(ctx->afu, CXL_PSL_PID_TID_An);
+	info->pid = pidtid >> 32;
+	info->tid = pidtid & 0xffffffff;
+	info->afu_err = cxl_p2n_read(ctx->afu, CXL_AFU_ERR_An);
+	info->errstat = cxl_p2n_read(ctx->afu, CXL_PSL_ErrStat_An);
+
+	return 0;
+}
+
+static void recover_psl_err(struct cxl_afu_t *afu, u64 errstat)
+{
+	u64 dsisr;
+
+	pr_devel("RECOVERING FROM PSL ERROR... (0x%.16llx)\n", errstat);
+
+	/* Clear PSL_DSISR[PE] */
+	dsisr = cxl_p2n_read(afu, CXL_PSL_DSISR_An);
+	cxl_p2n_write(afu, CXL_PSL_DSISR_An, dsisr & ~CXL_PSL_DSISR_An_PE);
+
+	/* Write 1s to clear error status bits */
+	cxl_p2n_write(afu, CXL_PSL_ErrStat_An, errstat);
+}
+
+static int ack_irq_native(struct cxl_context_t *ctx, u64 tfc,
+			  u64 psl_reset_mask)
+{
+	if (tfc)
+		cxl_p2n_write(ctx->afu, CXL_PSL_TFC_An, tfc);
+	if (psl_reset_mask)
+		recover_psl_err(ctx->afu, psl_reset_mask);
+
+	return 0;
+}
+
+static int check_error(struct cxl_afu_t *afu)
+{
+	return (cxl_p1n_read(afu, CXL_PSL_SCNTL_An) == ~0ULL);
+}
+
+static const struct cxl_backend_ops cxl_native_ops = {
+	.attach_process = attach_process_native,
+	.detach_process = detach_process_native,
+	.get_irq = get_irq_native,
+	.ack_irq = ack_irq_native,
+	.check_error = check_error,
+	.slbia = afu_slbia_native,
+	.afu_reset = afu_reset_and_disable,
+};
+
+void init_cxl_native(void)
+{
+	cxl_ops = &cxl_native_ops;
+}
diff --git a/drivers/misc/cxl/sysfs.c b/drivers/misc/cxl/sysfs.c
new file mode 100644
index 0000000..67489e8
--- /dev/null
+++ b/drivers/misc/cxl/sysfs.c
@@ -0,0 +1,348 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/device.h>
+#include <linux/sysfs.h>
+
+#include "cxl.h"
+
+#define to_afu_chardev_m(d) dev_get_drvdata(d)
+
+/*********  Adapter attributes  **********************************************/
+
+static ssize_t caia_version_show(struct device *device,
+				 struct device_attribute *attr,
+				 char *buf)
+{
+	struct cxl_t *adapter = to_cxl_adapter(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%i.%i\n", adapter->caia_major,
+			 adapter->caia_minor);
+}
+
+static ssize_t psl_revision_show(struct device *device,
+				 struct device_attribute *attr,
+				 char *buf)
+{
+	struct cxl_t *adapter = to_cxl_adapter(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%i\n", adapter->psl_rev);
+}
+
+static ssize_t base_image_show(struct device *device,
+			       struct device_attribute *attr,
+			       char *buf)
+{
+	struct cxl_t *adapter = to_cxl_adapter(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%i\n", adapter->base_image);
+}
+
+static ssize_t image_loaded_show(struct device *device,
+				 struct device_attribute *attr,
+				 char *buf)
+{
+	struct cxl_t *adapter = to_cxl_adapter(device);
+
+	if (adapter->user_image_loaded)
+		return scnprintf(buf, PAGE_SIZE, "user\n");
+	return scnprintf(buf, PAGE_SIZE, "factory\n");
+}
+
+static struct device_attribute adapter_attrs[] = {
+	__ATTR_RO(caia_version),
+	__ATTR_RO(psl_revision),
+	__ATTR_RO(base_image),
+	__ATTR_RO(image_loaded),
+	/* __ATTR_RW(reset_loads_image); */
+	/* __ATTR_RW(reset_image_select); */
+};
+
+
+/*********  AFU master specific attributes  **********************************/
+
+static ssize_t mmio_size_show_master(struct device *device,
+				     struct device_attribute *attr,
+				     char *buf)
+{
+	struct cxl_afu_t *afu = to_afu_chardev_m(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", afu->adapter->ps_size);
+}
+
+static ssize_t pp_mmio_off_show(struct device *device,
+				struct device_attribute *attr,
+				char *buf)
+{
+	struct cxl_afu_t *afu = to_afu_chardev_m(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", afu->pp_offset);
+}
+
+static ssize_t pp_mmio_len_show(struct device *device,
+				struct device_attribute *attr,
+				char *buf)
+{
+	struct cxl_afu_t *afu = to_afu_chardev_m(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", afu->pp_size);
+}
+
+static struct device_attribute afu_master_attrs[] = {
+	__ATTR(mmio_size, S_IRUGO, mmio_size_show_master, NULL),
+	__ATTR_RO(pp_mmio_off),
+	__ATTR_RO(pp_mmio_len),
+};
+
+
+/*********  AFU attributes  **************************************************/
+
+static ssize_t mmio_size_show(struct device *device,
+			      struct device_attribute *attr,
+			      char *buf)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+
+	if (afu->pp_size)
+		return scnprintf(buf, PAGE_SIZE, "%llu\n", afu->pp_size);
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", afu->adapter->ps_size);
+}
+
+static ssize_t reset_store_afu(struct device *device,
+			       struct device_attribute *attr,
+			       const char *buf, size_t count)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+	int rc;
+
+	if ((rc = cxl_ops->afu_reset(afu)))
+		return rc;
+	return count;
+}
+
+static ssize_t irqs_min_show(struct device *device,
+			     struct device_attribute *attr,
+			     char *buf)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%i\n", afu->pp_irqs);
+}
+
+static ssize_t irqs_max_show(struct device *device,
+				  struct device_attribute *attr,
+				  char *buf)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%i\n", afu->irqs_max);
+}
+
+static ssize_t irqs_max_store(struct device *device,
+				  struct device_attribute *attr,
+				  const char *buf, size_t count)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+	ssize_t ret;
+	int irqs_max;
+
+	ret = sscanf(buf, "%i", &irqs_max);
+	if (ret != 1)
+		return -EINVAL;
+
+	if (irqs_max < afu->pp_irqs)
+		return -EINVAL;
+
+	if (irqs_max > afu->adapter->user_irqs)
+		return -EINVAL;
+
+	afu->irqs_max = irqs_max;
+	return count;
+}
+
+static ssize_t models_supported_show(struct device *device,
+				    struct device_attribute *attr,
+				    char *buf)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+	char *p = buf, *end = buf + PAGE_SIZE;
+
+	if (afu->models_supported & CXL_MODEL_DEDICATED)
+		p += scnprintf(p, end - p, "dedicated_process\n");
+	if (afu->models_supported & CXL_MODEL_DIRECTED)
+		p += scnprintf(p, end - p, "afu_directed\n");
+	return (p - buf);
+}
+
+static ssize_t prefault_mode_show(struct device *device,
+				  struct device_attribute *attr,
+				  char *buf)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+
+	switch (afu->prefault_mode) {
+	case CXL_PREFAULT_WED:
+		return scnprintf(buf, PAGE_SIZE, "wed\n");
+	case CXL_PREFAULT_ALL:
+		return scnprintf(buf, PAGE_SIZE, "all\n");
+	default:
+		return scnprintf(buf, PAGE_SIZE, "none\n");
+	}
+}
+
+static ssize_t prefault_mode_store(struct device *device,
+			  struct device_attribute *attr,
+			  const char *buf, size_t count)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+	enum prefault_modes mode = -1;
+
+	if (!strncmp(buf, "wed", 3))
+		mode = CXL_PREFAULT_WED;
+	if (!strncmp(buf, "all", 3))
+		mode = CXL_PREFAULT_ALL;
+	if (!strncmp(buf, "none", 4))
+		mode = CXL_PREFAULT_NONE;
+
+	if (mode == -1)
+		return -EINVAL;
+
+	afu->prefault_mode = mode;
+	return count;
+}
+
+static ssize_t model_show(struct device *device,
+			 struct device_attribute *attr,
+			 char *buf)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+
+	if (afu->current_model == CXL_MODEL_DEDICATED)
+		return scnprintf(buf, PAGE_SIZE, "dedicated_process\n");
+	if (afu->current_model == CXL_MODEL_DIRECTED)
+		return scnprintf(buf, PAGE_SIZE, "afu_directed\n");
+	return scnprintf(buf, PAGE_SIZE, "none\n");
+}
+
+static ssize_t model_store(struct device *device,
+			   struct device_attribute *attr,
+			   const char *buf, size_t count)
+{
+	struct cxl_afu_t *afu = to_cxl_afu(device);
+	int old_model, model = -1;
+	int rc = -EBUSY;
+
+	/* can't change this if we have a user */
+	spin_lock(&afu->contexts_lock);
+	if (!idr_is_empty(&afu->contexts_idr))
+		goto err;
+
+	if (!strncmp(buf, "dedicated_process", 17))
+		model = CXL_MODEL_DEDICATED;
+	if (!strncmp(buf, "afu_directed", 12))
+		model = CXL_MODEL_DIRECTED;
+	if (!strncmp(buf, "none", 4))
+		model = 0;
+
+	if (model == -1) {
+		rc = -EINVAL;
+		goto err;
+	}
+
+	/* cxl_afu_deactivate_model needs to be done outside the lock, so
+	 * prevent other contexts from coming in before we are ready: */
+	old_model = afu->current_model;
+	afu->current_model = 0;
+	afu->num_procs = 0;
+
+	spin_unlock(&afu->contexts_lock);
+
+	if ((rc = _cxl_afu_deactivate_model(afu, old_model)))
+		return rc;
+	if ((rc = cxl_afu_activate_model(afu, model)))
+		return rc;
+
+	return count;
+err:
+	spin_unlock(&afu->contexts_lock);
+	return rc;
+}
+
+static struct device_attribute afu_attrs[] = {
+	__ATTR_RO(mmio_size),
+	__ATTR_RO(irqs_min),
+	__ATTR_RW(irqs_max),
+	__ATTR_RO(models_supported),
+	__ATTR_RW(model),
+	__ATTR_RW(prefault_mode),
+	__ATTR(reset, S_IWUSR, NULL, reset_store_afu),
+};
+
+
+
+int cxl_sysfs_adapter_add(struct cxl_t *adapter)
+{
+	int i, rc;
+
+	for (i = 0; i < ARRAY_SIZE(adapter_attrs); i++) {
+		if ((rc = device_create_file(&adapter->dev, &adapter_attrs[i])))
+			goto err;
+	}
+	return 0;
+err:
+	for (i--; i >= 0; i--)
+		device_remove_file(&adapter->dev, &adapter_attrs[i]);
+	return rc;
+}
+EXPORT_SYMBOL(cxl_sysfs_adapter_add);
+
+void cxl_sysfs_adapter_remove(struct cxl_t *adapter)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(adapter_attrs); i++)
+		device_remove_file(&adapter->dev, &adapter_attrs[i]);
+}
+EXPORT_SYMBOL(cxl_sysfs_adapter_remove);
+
+int cxl_sysfs_afu_add(struct cxl_afu_t *afu)
+{
+	int afu_attr, mstr_attr, rc = 0;
+
+	for (afu_attr = 0; afu_attr < ARRAY_SIZE(afu_attrs); afu_attr++) {
+		if ((rc = device_create_file(&afu->dev, &afu_attrs[afu_attr])))
+			goto err;
+	}
+	for (mstr_attr = 0; mstr_attr < ARRAY_SIZE(afu_master_attrs); mstr_attr++) {
+		if ((rc = device_create_file(afu->chardev_m, &afu_master_attrs[mstr_attr])))
+			goto err1;
+	}
+
+	return 0;
+
+err1:
+	for (mstr_attr--; mstr_attr >= 0; mstr_attr--)
+		device_remove_file(afu->chardev_m, &afu_master_attrs[mstr_attr]);
+err:
+	for (afu_attr--; afu_attr >= 0; afu_attr--)
+		device_remove_file(&afu->dev, &afu_attrs[afu_attr]);
+	return rc;
+}
+EXPORT_SYMBOL(cxl_sysfs_afu_add);
+
+void cxl_sysfs_afu_remove(struct cxl_afu_t *afu)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(afu_master_attrs); i++)
+		device_remove_file(afu->chardev_m, &afu_master_attrs[i]);
+	for (i = 0; i < ARRAY_SIZE(afu_attrs); i++)
+		device_remove_file(&afu->dev, &afu_attrs[i]);
+}
+EXPORT_SYMBOL(cxl_sysfs_afu_remove);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v2 15/17] cxl: Userspace header file.
  2014-09-30 10:34 ` Michael Neuling
@ 2014-09-30 10:35   ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-09-30 10:35 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

From: Ian Munsie <imunsie@au1.ibm.com>

This defines structs and magic numbers required for userspace to interact with
the kernel cxl driver via /dev/cxl/afu0.0.

It adds this header file to Kbuild so it's exported when doing make
headers_install.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 include/uapi/Kbuild      |  1 +
 include/uapi/misc/Kbuild |  2 ++
 include/uapi/misc/cxl.h  | 88 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 91 insertions(+)
 create mode 100644 include/uapi/misc/Kbuild
 create mode 100644 include/uapi/misc/cxl.h

diff --git a/include/uapi/Kbuild b/include/uapi/Kbuild
index 81d2106..245aa6e 100644
--- a/include/uapi/Kbuild
+++ b/include/uapi/Kbuild
@@ -12,3 +12,4 @@ header-y += video/
 header-y += drm/
 header-y += xen/
 header-y += scsi/
+header-y += misc/
diff --git a/include/uapi/misc/Kbuild b/include/uapi/misc/Kbuild
new file mode 100644
index 0000000..e96cae7
--- /dev/null
+++ b/include/uapi/misc/Kbuild
@@ -0,0 +1,2 @@
+# misc Header export list
+header-y += cxl.h
diff --git a/include/uapi/misc/cxl.h b/include/uapi/misc/cxl.h
new file mode 100644
index 0000000..6a394b5
--- /dev/null
+++ b/include/uapi/misc/cxl.h
@@ -0,0 +1,88 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _UAPI_ASM_CXL_H
+#define _UAPI_ASM_CXL_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+/* ioctls */
+struct cxl_ioctl_start_work {
+	__u64 wed;
+	__u64 amr;
+	__u64 reserved1;
+	__u32 reserved2;
+	__s16 num_interrupts; /* -1 = use value from afu descriptor */
+	__u16 process_element; /* returned from kernel */
+	__u64 reserved3;
+	__u64 reserved4;
+	__u64 reserved5;
+	__u64 reserved6;
+};
+
+#define CXL_MAGIC 0xCA
+#define CXL_IOCTL_START_WORK      _IOWR(CXL_MAGIC, 0x00, struct cxl_ioctl_start_work)
+#define CXL_IOCTL_CHECK_ERROR     _IO(CXL_MAGIC,   0x02)
+
+/* events from read() */
+
+enum cxl_event_type {
+	CXL_EVENT_READ_FAIL     = -1,
+	CXL_EVENT_RESERVED      = 0,
+	CXL_EVENT_AFU_INTERRUPT = 1,
+	CXL_EVENT_DATA_STORAGE  = 2,
+	CXL_EVENT_AFU_ERROR     = 3,
+};
+
+struct cxl_event_header {
+	__u32 type;
+	__u16 size;
+	__u16 process_element;
+	__u64 reserved1;
+	__u64 reserved2;
+	__u64 reserved3;
+};
+
+struct cxl_event_afu_interrupt {
+	struct cxl_event_header header;
+	__u16 irq; /* Raised AFU interrupt number */
+	__u16 reserved1;
+	__u32 reserved2;
+	__u64 reserved3;
+	__u64 reserved4;
+	__u64 reserved5;
+};
+
+struct cxl_event_data_storage {
+	struct cxl_event_header header;
+	__u64 addr;
+	__u64 reserved1;
+	__u64 reserved2;
+	__u64 reserved3;
+};
+
+struct cxl_event_afu_error {
+	struct cxl_event_header header;
+	__u64 err;
+	__u64 reserved1;
+	__u64 reserved2;
+	__u64 reserved3;
+};
+
+struct cxl_event {
+	union {
+		struct cxl_event_header header;
+		struct cxl_event_afu_interrupt irq;
+		struct cxl_event_data_storage fault;
+		struct cxl_event_afu_error afu_err;
+	};
+};
+
+#endif
-- 
1.9.1



* [PATCH v2 16/17] cxl: Add driver to Kbuild and Makefiles
  2014-09-30 10:34 ` Michael Neuling
@ 2014-09-30 10:35   ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-09-30 10:35 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

From: Ian Munsie <imunsie@au1.ibm.com>

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 drivers/misc/cxl/Kconfig  | 18 ++++++++++++++++++
 drivers/misc/cxl/Makefile |  3 +++
 2 files changed, 21 insertions(+)

diff --git a/drivers/misc/cxl/Kconfig b/drivers/misc/cxl/Kconfig
index 5cdd319..967b5c8 100644
--- a/drivers/misc/cxl/Kconfig
+++ b/drivers/misc/cxl/Kconfig
@@ -6,3 +6,21 @@ config CXL_BASE
 	bool
 	default n
 	select PPC_COPRO_BASE
+
+config CXL
+	tristate "Support for IBM Coherent Accelerators (CXL)"
+	depends on PPC_POWERNV && PCI_MSI
+	select CXL_BASE
+	default m
+	help
+	  Select this option to enable userspace driver support for IBM
+	  Coherent Accelerators (CXL).  CXL is otherwise known as Coherent
+	  Accelerator Processor Interface (CAPI).
+
+config CXL_PCI
+	tristate "Support for CXL devices via PCI"
+	depends on CXL && PPC_POWERNV
+	default y
+	help
+	  Select this option to support CXL devices detected via PCI, e.g.
+	  when running under powernv/OPAL.
diff --git a/drivers/misc/cxl/Makefile b/drivers/misc/cxl/Makefile
index e30ad0a..96f292b 100644
--- a/drivers/misc/cxl/Makefile
+++ b/drivers/misc/cxl/Makefile
@@ -1 +1,4 @@
+cxl-y				+= main.o file.o irq.o fault.o native.o context.o sysfs.o debugfs.o
+obj-$(CONFIG_CXL)		+= cxl.o
+obj-$(CONFIG_CXL_PCI)		+= cxl-pci.o
 obj-$(CONFIG_CXL_BASE)		+= base.o
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 100+ messages in thread


* [PATCH v2 17/17] cxl: Add documentation for userspace APIs
  2014-09-30 10:34 ` Michael Neuling
@ 2014-09-30 10:35   ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-09-30 10:35 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

From: Ian Munsie <imunsie@au1.ibm.com>

This documentation gives an overview of the hardware architecture, userspace
APIs via /dev/cxl/afu0.0 and the sysfs files.  It also adds a MAINTAINERS file
entry for cxl.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 Documentation/ABI/testing/sysfs-class-cxl | 125 ++++++++++++
 Documentation/ioctl/ioctl-number.txt      |   1 +
 Documentation/powerpc/00-INDEX            |   2 +
 Documentation/powerpc/cxl.txt             | 310 ++++++++++++++++++++++++++++++
 MAINTAINERS                               |   7 +
 5 files changed, 445 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-class-cxl
 create mode 100644 Documentation/powerpc/cxl.txt

diff --git a/Documentation/ABI/testing/sysfs-class-cxl b/Documentation/ABI/testing/sysfs-class-cxl
new file mode 100644
index 0000000..2d0a0f0
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-cxl
@@ -0,0 +1,125 @@
+Slave contexts (eg. /sys/class/cxl/afu0.0):
+
+What:		/sys/class/cxl/<afu>/irqs_max
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read/write
+		Maximum number of interrupts that can be requested by userspace.
+		The default on probe is the maximum that hardware can support
+		(eg. 2037).  Writing a value will limit userspace applications to
+		that many userspace interrupts.  Must be >= irqs_min.
+
+What:		/sys/class/cxl/<afu>/irqs_min
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		The minimum number of interrupts that userspace must request
+		on a CXL_IOCTL_START_WORK ioctl.  Userspace may request -1 in the
+		START_WORK ioctl to get this minimum automatically.
+
+What:		/sys/class/cxl/<afu>/mmio_size
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		Size of the MMIO space that may be mmaped by userspace.
+
+
+What:		/sys/class/cxl/<afu>/models_supported
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		List of the models this AFU supports.
+		Valid entries are: "dedicated_process" and "afu_directed"
+
+What:		/sys/class/cxl/<afu>/model
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read/write
+		The current model the AFU is using.  Will be one of the models
+		given in models_supported.  Writing will change the model, but
+		is only permitted while no user contexts are attached.
+
+
+What:		/sys/class/cxl/<afu>/prefault_mode
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read/write
+		Set the mode for prefaulting in segments into the segment table
+		when performing the START_WORK ioctl.  Possible values:
+			none: No prefaulting (default)
+			wed: Just prefault in the wed
+			all: all segments this process currently maps
+
+What:		/sys/class/cxl/<afu>/reset
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	write only
+		Reset the AFU.
+
+
+Master contexts (eg. /sys/class/cxl/afu0.0m)
+
+What:		/sys/class/cxl/<afu>m/mmio_size
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		Size of the MMIO space that may be mmaped by userspace.  This
+		includes all slave contexts space also.
+
+What:		/sys/class/cxl/<afu>m/pp_mmio_len
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		Per Process MMIO space length.
+
+What:		/sys/class/cxl/<afu>m/pp_mmio_off
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		Per Process MMIO space offset.
+
+
+Card info (eg. /sys/class/cxl/card0)
+
+What:		/sys/class/cxl/<card>/caia_version
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		Identifies the CAIA Version the card implements.
+
+What:		/sys/class/cxl/<card>/psl_version
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		Identifies the revision level of the PSL.
+
+What:		/sys/class/cxl/<card>/base_image
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		Identifies the revision level of the base image for devices
+		that support loadable PSLs.  For FPGAs this field identifies
+		the image contained in the on-adapter flash which is loaded
+		during the initial program load.
+
+What:		/sys/class/cxl/<card>/image_loaded
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		Will return "user" or "factory" depending on the image loaded
+		onto the card.
+
diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 7e240a7..8136e1f 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -313,6 +313,7 @@ Code  Seq#(hex)	Include File		Comments
 0xB1	00-1F	PPPoX			<mailto:mostrows@styx.uwaterloo.ca>
 0xB3	00	linux/mmc/ioctl.h
 0xC0	00-0F	linux/usb/iowarrior.h
+0xCA	00-0F	uapi/misc/cxl.h
 0xCB	00-1F	CBM serial IEC bus	in development:
 					<mailto:michael.klein@puffin.lb.shuttle.de>
 0xCD	01	linux/reiserfs_fs.h
diff --git a/Documentation/powerpc/00-INDEX b/Documentation/powerpc/00-INDEX
index a68784d..116d94d 100644
--- a/Documentation/powerpc/00-INDEX
+++ b/Documentation/powerpc/00-INDEX
@@ -28,3 +28,5 @@ ptrace.txt
 	- Information on the ptrace interfaces for hardware debug registers.
 transactional_memory.txt
 	- Overview of the Power8 transactional memory support.
+cxl.txt
+	- Overview of the CXL driver.
diff --git a/Documentation/powerpc/cxl.txt b/Documentation/powerpc/cxl.txt
new file mode 100644
index 0000000..f23e675
--- /dev/null
+++ b/Documentation/powerpc/cxl.txt
@@ -0,0 +1,310 @@
+Coherent Accelerator Interface (CXL)
+====================================
+
+Introduction
+============
+
+    The coherent accelerator interface is designed to allow the
+    coherent connection of FPGA based accelerators (and other devices)
+    to a POWER system.  These devices need to adhere to the Coherent
+    Accelerator Interface Architecture (CAIA).
+
+    IBM refers to this as the Coherent Accelerator Processor Interface
+    or CAPI.  In the kernel it's referred to by the name CXL to avoid
+    confusion with the ISDN CAPI subsystem.
+
+Hardware overview
+=================
+
+          POWER8               FPGA
+       +----------+        +---------+
+       |          |        |         |
+       |   CPU    |        |   AFU   |
+       |          |        |         |
+       |          |        |         |
+       |          |        |         |
+       +----------+        +---------+
+       |          |        |         |
+       |   CAPP   +--------+   PSL   |
+       |          |  PCIe  |         |
+       +----------+        +---------+
+
+    The POWER8 chip has a Coherently Attached Processor Proxy (CAPP)
+    unit which is part of the PCIe Host Bridge (PHB).  This is managed
+    by Linux by calls into OPAL.  Linux doesn't directly program the
+    CAPP.
+
+    The FPGA (or coherently attached device) consists of two parts.
+    The POWER Service Layer (PSL) and the Accelerator Function Unit
+    (AFU).  The AFU is used to implement specific functionality behind
+    the PSL.  The PSL, among other things, provides memory address
+    translation services to allow each AFU direct access to userspace
+    memory.
+
+    The AFU is the core part of the accelerator (eg. the compression,
+    crypto etc function).  The kernel has no knowledge of the function
+    of the AFU.  Only userspace interacts directly with the AFU.
+
+    The PSL provides the translation and interrupt services that the
+    AFU needs.  This is what the kernel interacts with.  For example,
+    if the AFU needs to read a particular virtual address, it sends
+    that address to the PSL, the PSL then translates it, fetches the
+    data from memory and returns it to the AFU.  If the PSL has a
+    translation miss, it interrupts the kernel and the kernel services
+    the fault.  The context in which the fault is serviced is determined
+    by which process owns that acceleration function.
+
+AFU Models
+==========
+
+    There are two programming models: dedicated process and AFU
+    directed.  An AFU may support one or both models.
+
+    In the dedicated model only one MMU context is supported.  In this
+    model, only one userspace process can use the accelerator at a time.
+
+    In AFU directed model, up to 16K simultaneous contexts can be
+    supported.  This means up to 16K simultaneous userspace
+    applications may use the accelerator (although specific AFUs may
+    support less).  In this mode, the AFU sends a 16 bit context ID
+    with each of its requests.  This tells the PSL which context is
+    associated with this operation.  If the PSL can't translate a
+    request, the ID can also be accessed by the kernel so it can
+    determine the associated userspace context to service this
+    translation with.
+
+MMIO space
+==========
+
+    A portion of the FPGA MMIO space can be directly mapped from the
+    AFU to userspace.  Either the whole space can be mapped (master
+    context), or just a per context portion (slave context).  The
+    hardware is self describing, hence the kernel can determine the
+    offset and size of the per context portion.
+
+Interrupts
+==========
+
+    AFUs may generate interrupts that are destined for userspace.  These
+    are received by the kernel as hardware interrupts and passed onto
+    userspace.
+
+    Data storage faults and error interrupts are handled by the kernel
+    driver.
+
+Work Element Descriptor (WED)
+=============================
+
+    The WED is a 64bit parameter passed to the AFU when a context is
+    started.  Its format is up to the AFU hence the kernel has no
+    knowledge of what it represents.  Typically it will be a virtual
+    address pointer to a work queue where the AFU and userspace can
+    share control and status information or work queues.
+
+
+
+
+User API
+========
+
+    The driver will create two character devices per AFU under
+    /dev/cxl.  One for master and one for slave contexts.
+
+    The master context (eg. /dev/cxl/afu0.0m) has access to all of
+    the MMIO space that an AFU provides.  The slave context
+    (eg. /dev/cxl/afu0.0) has access to only the per process MMIO
+    space an AFU provides (AFU directed only).
+
+    The following file operations are supported on both slave and
+    master devices:
+
+    open
+
+        Opens device and allocates a file descriptor to be used with
+        the rest of the API.  This may be opened multiple times,
+        depending on how many contexts the AFU supports.
+
+        A dedicated model AFU only has one context and hence only
+        allows this device to be opened once.
+
+        An AFU directed model AFU can have many contexts, hence this
+        device can be opened once for each available context.
+
+        Note: IRQs also need to be allocated per context, which may
+              also limit the number of contexts that can be allocated.
+              The POWER8 CAPP supports 2040 IRQs and 3 are used by the
+              kernel, so 2037 are left.  If 1 IRQ is needed per
+              context, then only 2037 contexts can be allocated.  If 4
+              IRQs are needed per context, then only 2037/4 = 509
+              contexts can be allocated.
+
+    ioctl
+
+        CXL_IOCTL_START_WORK:
+            Starts the AFU and associates it with the process memory
+            context.  Once this ioctl is successfully executed, all
+            memory mapped into this process is accessible to this AFU
+            context using the same virtual addresses.  No additional
+            calls are required to un/map memory.  The AFU context will
+            be updated as userspace allocates and frees memory.  This
+            ioctl returns once the context is started.
+
+            Takes a pointer to a struct cxl_ioctl_start_work
+                    struct cxl_ioctl_start_work {
+                            __u64 wed;
+                            __u64 amr;
+                            __u64 reserved1;
+                            __u32 reserved2;
+                            __s16 num_interrupts;
+                            __u16 process_element;
+                            __u64 reserved3;
+                            __u64 reserved4;
+                            __u64 reserved5;
+                            __u64 reserved6;
+                    };
+
+                wed: 64bit argument defined by the AFU.  Typically
+                    this is a virtual address pointing to an AFU
+                    specific structure describing what work to
+                    perform.
+
+                amr:
+                    Authority Mask Register (AMR), same as the powerpc
+                    AMR.
+
+                num_interrupts:
+                    Number of userspace interrupts to request.  The
+                    minimum required is given in sysfs, and -1 will
+                    automatically allocate this minimum.  The maximum
+                    is also given in sysfs.
+
+                process_element:
+                    Written by the kernel with the context id (AKA
+                    process element) it allocates.  Slave contexts may
+                    want to communicate this to a master process.
+
+                reserved fields:
+                    For ABI padding and future extensions
+
+        CXL_IOCTL_CHECK_ERROR:
+            This checks to see if the AFU has encountered an error and
+            if so resets it.  If userspace is accessing MMIO space, it
+            may notice an EEH fence (all ones on read) before the kernel
+            does, hence it needs to inform the kernel via this ioctl.
+
+        CXL_IOCTL_LOAD_AFU_IMAGE:
+            Future work: to dynamically load AFU FPGA images.  Without
+            this, the AFU is assumed to be pre-loaded on the card.
+
+    mmap
+
+        An AFU may have a MMIO space to facilitate communication with
+        the AFU and mmap allows access to this.  The size and contents
+        of this area are specific to the particular AFU.  The size can
+        be discovered via sysfs.  A read of all ones indicates the AFU
+        has encountered an error and CXL_IOCTL_CHECK_ERROR should be
+        used to recover the AFU.
+
+        Master contexts will get all of the MMIO space.  Slave
+        contexts will get only the per process space associated with
+        their contexts.
+
+        This mmap call must be made after the START_WORK ioctl.
+
+        Care should be taken when accessing MMIO space.  Only 32 and
+        64bit accesses are supported by POWER8.  Also, the AFU will be
+        designed with a specific endianness, so all MMIO accesses should
+        take endianness into account (the endian(3) conversion functions
+        such as le64toh() and be64toh() are recommended).  These endian
+        issues equally apply to shared memory queues the WED may describe.
+
+    read
+
+        Reads an event from the AFU. Will return -EINVAL if the buffer
+        does not contain enough space to write the struct
+        cxl_event_header. Blocks if no events are pending.  Will
+        return -EIO in the case of an unrecoverable error or if the
+        card is removed.
+
+        All events will return a struct cxl_event which is always the
+        same size.  A struct cxl_event_header at the start gives:
+                struct cxl_event_header {
+                        __u32 type;
+                        __u16 size;
+                        __u16 process_element;
+                        __u64 reserved1;
+                        __u64 reserved2;
+                        __u64 reserved3;
+                };
+
+            type:
+                This gives the type of the event and determines how the
+                rest of the event is structured.  It can be one of:
+                AFU interrupt, data storage fault or AFU error.
+
+            size:
+                This is always sizeof(struct cxl_event)
+
+            process_element:
+                Context ID of the event.  Currently this will always
+                be the current context.  Future work may allow
+                interrupts from one context to be routed to another
+                (eg. a master contexts handling error interrupts on
+                behalf of a slave).
+
+            reserved fields:
+                For future extensions
+
+        If an AFU interrupt event is received, the full structure received is:
+                struct cxl_event_afu_interrupt {
+                        struct cxl_event_header header;
+                        __u16 irq;
+                        __u16 reserved1;
+                        __u32 reserved2;
+                        __u64 reserved3;
+                        __u64 reserved4;
+                        __u64 reserved5;
+                };
+            irq:
+                The IRQ number sent by the AFU.
+
+            reserved fields:
+                For future extensions
+
+        If a data storage event is received, the full structure received is:
+                struct cxl_event_data_storage {
+                        struct cxl_event_header header;
+                        __u64 addr;
+                        __u64 reserved1;
+                        __u64 reserved2;
+                        __u64 reserved3;
+                };
+            addr:
+                The address the AFU was attempting to access.  Valid
+                accesses will be handled transparently by the kernel,
+                but an invalid access will generate this
+                event.
+
+            reserved fields:
+                For future extensions
+
+        If an AFU error event is received, the full structure received is:
+                struct cxl_event_afu_error {
+                        struct cxl_event_header header;
+                        __u64 err;
+                        __u64 reserved1;
+                        __u64 reserved2;
+                        __u64 reserved3;
+                };
+            err:
+                Error status from the AFU.  AFU defined.
+
+            reserved fields:
+                For future extensions
+
+Sysfs Class
+===========
+
+    A cxl sysfs class is added under /sys/class/cxl to facilitate
+    enumeration and tuning of the accelerators. Its layout is
+    described in Documentation/ABI/testing/sysfs-class-cxl
diff --git a/MAINTAINERS b/MAINTAINERS
index 809ecd6..c972be3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2711,6 +2711,13 @@ W:	http://www.chelsio.com
 S:	Supported
 F:	drivers/net/ethernet/chelsio/cxgb4vf/
 
+CXL (IBM Coherent Accelerator Processor Interface CAPI) DRIVER
+M:	Ian Munsie <imunsie@au1.ibm.com>
+M:	Michael Neuling <mikey@neuling.org>
+L:	linuxppc-dev@lists.ozlabs.org
+S:	Supported
+F:	drivers/misc/cxl/
+
 STMMAC ETHERNET DRIVER
 M:	Giuseppe Cavallaro <peppe.cavallaro@st.com>
 L:	netdev@vger.kernel.org
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v2 17/17] cxl: Add documentation for userspace APIs
@ 2014-09-30 10:35   ` Michael Neuling
  0 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-09-30 10:35 UTC (permalink / raw)
  To: greg, arnd, mpe, benh
  Cc: cbe-oss-dev, mikey, Aneesh Kumar K.V, imunsie, linux-kernel,
	linuxppc-dev, jk, anton

From: Ian Munsie <imunsie@au1.ibm.com>

This documentation gives an overview of the hardware architecture, userspace
APIs via /dev/cxl/afu0.0 and the syfs files.  It also adds a MAINTAINERS file
entry for cxl.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 Documentation/ABI/testing/sysfs-class-cxl | 125 ++++++++++++
 Documentation/ioctl/ioctl-number.txt      |   1 +
 Documentation/powerpc/00-INDEX            |   2 +
 Documentation/powerpc/cxl.txt             | 310 ++++++++++++++++++++++++++++++
 MAINTAINERS                               |   7 +
 5 files changed, 445 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-class-cxl
 create mode 100644 Documentation/powerpc/cxl.txt

diff --git a/Documentation/ABI/testing/sysfs-class-cxl b/Documentation/ABI/testing/sysfs-class-cxl
new file mode 100644
index 0000000..2d0a0f0
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-cxl
@@ -0,0 +1,125 @@
+Slave contexts (eg. /sys/class/cxl/afu0.0):
+
+What:		/sys/class/cxl/<afu>/irqs_max
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read/write
+		Maximum number of interrupts that can be requested by userspace.
+		The default on probe is the maximum that hardware can support
+		(eg. 2037).  Write values will limit userspace applications to
+		that many userspace interrupts.  Must be >= irqs_min.
+
+What:		/sys/class/cxl/<afu>/irqs_min
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read_only
+		The minimum number of interrupts that userspace must request
+		on a CXL_START_WORK ioctl.  Userspace may request -1 in the
+		START_WORK IOCTL to get this minimum automatically.
+
+What:		/sys/class/cxl/<afu>/mmio_size
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		Size of the MMIO space that may be mmaped by userspace.
+
+
+What:		/sys/class/cxl/<afu>/models_supported
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		List of the models this AFU supports.
+		Valid entries are: "dedicated_process" and "afu_directed"
+
+What:		/sys/class/cxl/<afu>/model
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read/write
+		The current model the AFU is using.  Will be one of the models
+		given in models_supported.  Writing will change the model but
+		no user contexts can be attached at this point.
+
+
+What:		/sys/class/cxl/<afu>/prefault_mode
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read/write
+		Set the mode for prefaulting in segments into the segment table
+		when performing the START_WORK ioctl.  Possible values:
+			none: No prefaulting (default)
+			wed: Just prefault in the wed
+			all: all segments this process currently maps
+
+What:		/sys/class/cxl/<afu>/reset
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	write only
+		Reset the AFU.
+
+
+Master contexts (eg. /sys/class/cxl/afu0.0m)
+
+What:		/sys/class/cxl/<afu>m/mmio_size
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		Size of the MMIO space that may be mmaped by userspace.  This
+		includes all slave contexts space also.
+
+What:		/sys/class/cxl/<afu>m/pp_mmio_len
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		Per Process MMIO space length.
+
+What:		/sys/class/cxl/<afu>m/pp_mmio_off
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		Per Process MMIO space offset.
+
+
+Card info (eg. /sys/class/cxl/card0)
+
+What:		/sys/class/cxl/<card>/caia_version
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		Identifies the CAIA Version the card implements.
+
+What:		/sys/class/cxl/<card>/psl_version
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		Identifies the revision level of the PSL.
+
+What:		/sys/class/cxl/<card>/base_image
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		Identifies the revision level of the base image for devices
+		that support load-able PSLs. For FPGAs this field identifies
+		the image contained in the on-adapter flash which is loaded
+		during the initial program load
+
+What:		/sys/class/cxl/<card>/image_loaded
+Date:		September 2014
+Contact:	Ian Munsie <imunsie@au1.ibm.com>,
+		Michael Neuling <mikey@neuling.org>
+Description:	read only
+		Will return "user" or "factory" depending on the image loaded
+		onto the card
+
diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 7e240a7..8136e1f 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -313,6 +313,7 @@ Code  Seq#(hex)	Include File		Comments
 0xB1	00-1F	PPPoX			<mailto:mostrows@styx.uwaterloo.ca>
 0xB3	00	linux/mmc/ioctl.h
 0xC0	00-0F	linux/usb/iowarrior.h
+0xCA	00-0F	uapi/misc/cxl.h
 0xCB	00-1F	CBM serial IEC bus	in development:
 					<mailto:michael.klein@puffin.lb.shuttle.de>
 0xCD	01	linux/reiserfs_fs.h
diff --git a/Documentation/powerpc/00-INDEX b/Documentation/powerpc/00-INDEX
index a68784d..116d94d 100644
--- a/Documentation/powerpc/00-INDEX
+++ b/Documentation/powerpc/00-INDEX
@@ -28,3 +28,5 @@ ptrace.txt
 	- Information on the ptrace interfaces for hardware debug registers.
 transactional_memory.txt
 	- Overview of the Power8 transactional memory support.
+cxl.txt
+	- Overview of the CXL driver.
diff --git a/Documentation/powerpc/cxl.txt b/Documentation/powerpc/cxl.txt
new file mode 100644
index 0000000..f23e675
--- /dev/null
+++ b/Documentation/powerpc/cxl.txt
@@ -0,0 +1,310 @@
+Coherent Accelerator Interface (CXL)
+====================================
+
+Introduction
+============
+
+    The coherent accelerator interface is designed to allow the
+    coherent connection of FPGA based accelerators (and other devices)
+    to a POWER system.  These devices need to adhere to the Coherent
+    Accelerator Interface Architecture (CAIA).
+
+    IBM refers to this as the Coherent Accelerator Processor Interface
+    or CAPI.  In the kernel it's referred to by the name CXL to avoid
+    confusion with the ISDN CAPI subsystem.
+
+Hardware overview
+=================
+
+          POWER8               FPGA
+       +----------+        +---------+
+       |          |        |         |
+       |   CPU    |        |   AFU   |
+       |          |        |         |
+       |          |        |         |
+       |          |        |         |
+       +----------+        +---------+
+       |          |        |         |
+       |   CAPP   +--------+   PSL   |
+       |          |  PCIe  |         |
+       +----------+        +---------+
+
+    The POWER8 chip has a Coherently Attached Processor Proxy (CAPP)
+    unit which is part of the PCIe Host Bridge (PHB).  This is managed
+    by Linux by calls into OPAL.  Linux doesn't directly program the
+    CAPP.
+
+    The FPGA (or coherently attached device) consists of two parts:
+    the POWER Service Layer (PSL) and the Accelerator Function Unit
+    (AFU).  The AFU is used to implement specific functionality behind
+    the PSL.  The PSL, among other things, provides memory address
+    translation services to allow each AFU direct access to userspace
+    memory.
+
+    The AFU is the core part of the accelerator (eg. a compression or
+    crypto function).  The kernel has no knowledge of the function of
+    the AFU.  Only userspace interacts directly with the AFU.
+
+    The PSL provides the translation and interrupt services that the
+    AFU needs.  This is what the kernel interacts with.  For example,
+    if the AFU needs to read a particular virtual address, it sends
+    that address to the PSL, the PSL then translates it, fetches the
+    data from memory and returns it to the AFU.  If the PSL has a
+    translation miss, it interrupts the kernel and the kernel services
+    the fault.  The context in which the fault is serviced depends on
+    which process owns that acceleration function.
+
+AFU Models
+==========
+
+    There are two programming models supported by the AFU: dedicated
+    and AFU directed.  An AFU may support one or both models.
+
+    In the dedicated model only one MMU context is supported.  In this
+    model, only one userspace process can use the accelerator at a time.
+
+    In AFU directed model, up to 16K simultaneous contexts can be
+    supported.  This means up to 16K simultaneous userspace
+    applications may use the accelerator (although specific AFUs may
+    support fewer).  In this mode, the AFU sends a 16 bit context ID
+    with each of its requests, which tells the PSL which context is
+    associated with the operation.  If the PSL can't translate a
+    request, the kernel can also read this ID to determine which
+    userspace context the translation fault should be serviced
+    against.
+
+MMIO space
+==========
+
+    A portion of the FPGA MMIO space can be directly mapped from the
+    AFU to userspace.  Either the whole space can be mapped (master
+    context), or just a per context portion (slave context).  The
+    hardware is self describing, hence the kernel can determine the
+    offset and size of the per context portion.
+
+Interrupts
+==========
+
+    AFUs may generate interrupts that are destined for userspace.  These
+    are received by the kernel as hardware interrupts and passed onto
+    userspace.
+
+    Data storage faults and error interrupts are handled by the kernel
+    driver.
+
+Work Element Descriptor (WED)
+=============================
+
+    The WED is a 64bit parameter passed to the AFU when a context is
+    started.  Its format is up to the AFU, hence the kernel has no
+    knowledge of what it represents.  Typically it will be a virtual
+    address pointer to a work queue where the AFU and userspace can
+    share control and status information or work queues.
+
+
+
+
+User API
+========
+
+    The driver will create two character devices per AFU under
+    /dev/cxl.  One for master and one for slave contexts.
+
+    The master context (eg. /dev/cxl/afu0.0m) has access to all of
+    the MMIO space that an AFU provides.  The slave context
+    (eg. /dev/cxl/afu0.0s) has access to only the per process MMIO
+    space an AFU provides (AFU directed only).
+
+    The following file operations are supported on both slave and
+    master devices:
+
+    open
+
+        Opens the device and allocates a file descriptor to be used
+        with the rest of the API.  The device may be opened multiple
+        times, depending on how many contexts the AFU supports.
+
+        A dedicated model AFU only has one context and hence only
+        allows this device to be opened once.
+
+        An AFU directed model AFU can have many contexts and hence this
+        device can be opened as many times as contexts are available.
+
+        Note: IRQs also need to be allocated per context, which may
+              also limit the number of contexts that can be allocated.
+              The POWER8 CAPP supports 2040 IRQs and 3 are used by the
+              kernel, so 2037 are left.  If 1 IRQ is needed per
+              context, then only 2037 contexts can be allocated.  If 4
+              IRQs are needed per context, then only 2037/4 = 509
+              contexts can be allocated.
+
+    ioctl
+
+        CAPI_IOCTL_START_WORK:
+            Starts the AFU and associates it with the current process's
+            memory context.  Once this ioctl has successfully executed,
+            all memory mapped into this process is accessible to this
+            AFU context using the same virtual addresses.  No additional
+            calls are required to un/map memory.  The AFU context will
+            be updated as userspace allocates and frees memory.  This
+            ioctl returns once the context is started.
+
+            Takes a pointer to a struct cxl_ioctl_start_work
+                    struct cxl_ioctl_start_work {
+                            __u64 wed;
+                            __u64 amr;
+                            __u64 reserved1;
+                            __u32 reserved2;
+                            __s16 num_interrupts;
+                            __u16 process_element;
+                            __u64 reserved3;
+                            __u64 reserved4;
+                            __u64 reserved5;
+                            __u64 reserved6;
+                    };
+
+                wed: 64bit argument defined by the AFU.  Typically
+                    this is a virtual address pointing to an AFU
+                    specific structure describing what work to
+                    perform.
+
+                amr:
+                    Authority Mask Register (AMR), same as the powerpc
+                    AMR.
+
+                num_interrupts:
+                    Number of userspace interrupts to request.  The
+                    minimum required is given in sysfs, and -1 will
+                    automatically allocate that minimum.  The maximum
+                    is also given in sysfs.
+
+                process_element:
+                    Written by the kernel with the context id (AKA
+                    process element) it allocates.  Slave contexts may
+                    want to communicate this to a master process.
+
+                reserved fields:
+                    For ABI padding and future extensions
+
+        CAPI_IOCTL_CHECK_ERROR:
+            This checks to see if the AFU has encountered an error and,
+            if so, resets it.  If userspace is accessing MMIO space, it
+            may notice an EEH fence (all ones on read) before the kernel
+            does, hence it needs to inform the kernel of this.
+
+        CAPI_IOCTL_LOAD_AFU_IMAGE:
+            Future work: to dynamically load AFU FPGA images.  Without
+            this, the AFU is assumed to be pre-loaded on the card.
+
+    mmap
+
+        An AFU may have an MMIO space to facilitate communication with
+        the AFU and mmap allows access to this.  The size and contents
+        of this area are specific to the particular AFU.  The size can
+        be discovered via sysfs.  A read of all ones indicates the AFU
+        has encountered an error and CAPI_IOCTL_CHECK_ERROR should be
+        used to recover the AFU.
+
+        Master contexts will get all of the MMIO space.  Slave
+        contexts will get only the per process space associated with
+        their context.
+
+        This mmap call must be done after the CAPI_IOCTL_START_WORK
+        ioctl has been issued.
+
+        Care should be taken when accessing MMIO space.  Only 32 and
+        64 bit accesses are supported by POWER8.  Also, the AFU will be
+        designed with a specific endianness, so all MMIO accesses
+        should consider endianness (the endian(3) conversion functions
+        such as le64toh() and be64toh() are recommended).  These endian
+        issues equally apply to shared memory queues the WED may
+        describe.
+
+    read
+
+        Reads an event from the AFU.  Will return -EINVAL if the
+        buffer does not have enough space to hold a struct
+        cxl_event_header.  Blocks if no events are pending.  Will
+        return -EIO in the case of an unrecoverable error or if the
+        card is removed.
+
+        All events will return a struct cxl_event which is always the
+        same size.  A struct cxl_event_header at the start gives:
+                struct cxl_event_header {
+                        __u32 type;
+                        __u16 size;
+                        __u16 process_element;
+                        __u64 reserved1;
+                        __u64 reserved2;
+                        __u64 reserved3;
+                };
+
+            type:
+                This gives the type of the event and determines how
+                the rest of the event will be structured.  It can be:
+                AFU interrupt, data storage fault or AFU error.
+
+            size:
+                This is always sizeof(struct cxl_event)
+
+            process_element:
+                Context ID of the event.  Currently this will always
+                be the current context.  Future work may allow
+                interrupts from one context to be routed to another
+                (eg. a master contexts handling error interrupts on
+                behalf of a slave).
+
+            reserved fields:
+                For future extensions
+
+        If an AFU interrupt event is received, the full structure received is:
+                struct cxl_event_afu_interrupt {
+                        struct cxl_event_header header;
+                        __u16 irq;
+                        __u16 reserved1;
+                        __u32 reserved2;
+                        __u64 reserved3;
+                        __u64 reserved4;
+                        __u64 reserved5;
+                };
+            irq:
+                The IRQ number sent by the AFU.
+
+            reserved fields:
+                For future extensions
+
+        If a data storage event is received, the full structure received is:
+                struct cxl_event_data_storage {
+                        struct cxl_event_header header;
+                        __u64 addr;
+                        __u64 reserved1;
+                        __u64 reserved2;
+                        __u64 reserved3;
+                };
+            addr:
+                Address that the AFU was attempting to access.  Valid
+                accesses will be handled transparently by the kernel,
+                but an invalid access will generate this event.
+
+            reserved fields:
+                For future extensions
+
+        If an AFU error event is received, the full structure received is:
+                struct cxl_event_afu_error {
+                        struct cxl_event_header header;
+                        __u64 err;
+                        __u64 reserved1;
+                        __u64 reserved2;
+                        __u64 reserved3;
+                };
+            err:
+                Error status from the AFU.  AFU defined.
+
+            reserved fields:
+                For future extensions
+
+Sysfs Class
+===========
+
+    A cxl sysfs class is added under /sys/class/cxl to facilitate
+    enumeration and tuning of the accelerators.  Its layout is
+    described in Documentation/ABI/testing/sysfs-class-cxl.
diff --git a/MAINTAINERS b/MAINTAINERS
index 809ecd6..c972be3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2711,6 +2711,13 @@ W:	http://www.chelsio.com
 S:	Supported
 F:	drivers/net/ethernet/chelsio/cxgb4vf/
 
+CXL (IBM Coherent Accelerator Processor Interface CAPI) DRIVER
+M:	Ian Munsie <imunsie@au1.ibm.com>
+M:	Michael Neuling <mikey@neuling.org>
+L:	linuxppc-dev@lists.ozlabs.org
+S:	Supported
+F:	drivers/misc/cxl/
+
 STMMAC ETHERNET DRIVER
 M:	Giuseppe Cavallaro <peppe.cavallaro@st.com>
 L:	netdev@vger.kernel.org
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 03/17] powerpc/cell: Make spu_flush_all_slbs() generic
  2014-09-30 10:34   ` Michael Neuling
@ 2014-09-30 10:40     ` Arnd Bergmann
  -1 siblings, 0 replies; 100+ messages in thread
From: Arnd Bergmann @ 2014-09-30 10:40 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michael Neuling, greg, mpe, benh, cbe-oss-dev, Aneesh Kumar K.V,
	imunsie, linux-kernel, linuxppc-dev, jk, anton

On Tuesday 30 September 2014 20:34:52 Michael Neuling wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> This moves spu_flush_all_slbs() into a generic call copro_flush_all_slbs().
> 
> This will be useful when we add cxl which also needs a similar SLB flush call.
> 
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> 

Very nice!

Acked-by: Arnd Bergmann <arnd@arndb.de>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 02/17] powerpc/cell: Move data segment faulting code out of cell platform
  2014-09-30 10:34   ` Michael Neuling
@ 2014-10-01  6:47     ` Michael Ellerman
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Ellerman @ 2014-10-01  6:47 UTC (permalink / raw)
  To: Michael Neuling, greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

On Tue, 2014-30-09 at 10:34:51 UTC, Michael Neuling wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> __spu_trap_data_seg() currently contains code to determine the VSID and ESID
> required for a particular EA and mm struct.
> 
> This code is generically useful for other co-processors.  This moves the code
> of the cell platform so it can be used by other powerpc code.  It also adds 1TB
> segment handling which Cell didn't have.

I'm not loving this.

For starters the name "copro_data_segment()" doesn't contain any verbs, and it
doesn't tell me what it does.

If we give it a name that says what it does, we get copro_get_ea_esid_and_vsid().
Or something equally ugly.

And then in patch 10 you move the bulk of the logic into calculate_vsid().

So instead can we:
 - add a small helper that does the esid calculation, eg. calculate_esid() ?
 - factor out the vsid logic into a helper, calculate_vsid() ?
 - rework the spu code to use those, dropping __spu_trap_data_seg()
 - use the helpers in the cxl code


cheers

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 02/17] powerpc/cell: Move data segment faulting code out of cell platform
  2014-10-01  6:47     ` Michael Ellerman
@ 2014-10-01  6:51       ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 100+ messages in thread
From: Benjamin Herrenschmidt @ 2014-10-01  6:51 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Michael Neuling, greg, arnd, anton, linux-kernel, linuxppc-dev,
	jk, imunsie, cbe-oss-dev, Aneesh Kumar K.V

On Wed, 2014-10-01 at 16:47 +1000, Michael Ellerman wrote:
> 
> If we give it a name that says what it does, we get
> copro_get_ea_esid_and_vsid().
> Or something equally ugly.

copro_calc_full_va() ?

Ben.



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 03/17] powerpc/cell: Make spu_flush_all_slbs() generic
  2014-09-30 10:34   ` Michael Neuling
@ 2014-10-01  7:13     ` Michael Ellerman
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Ellerman @ 2014-10-01  7:13 UTC (permalink / raw)
  To: Michael Neuling, greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

On Tue, 2014-30-09 at 10:34:52 UTC, Michael Neuling wrote:
> diff --git a/arch/powerpc/include/asm/copro.h b/arch/powerpc/include/asm/copro.h
> index 2858108..f3d338f 100644
> --- a/arch/powerpc/include/asm/copro.h
> +++ b/arch/powerpc/include/asm/copro.h
> @@ -15,4 +15,10 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
>  
>  int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid);
>  
> +
> +#ifdef CONFIG_PPC_COPRO_BASE
> +void copro_flush_all_slbs(struct mm_struct *mm);
> +#else
> +#define copro_flush_all_slbs(mm) do {} while(0)

This can be a static inline, so it should be. That way you get type checking on
the argument for CONFIG_PPC_COPRO_BASE=n.


cheers

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 04/17] powerpc/msi: Improve IRQ bitmap allocator
  2014-09-30 10:34   ` Michael Neuling
@ 2014-10-01  7:13     ` Michael Ellerman
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Ellerman @ 2014-10-01  7:13 UTC (permalink / raw)
  To: Michael Neuling, greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

On Tue, 2014-30-09 at 10:34:53 UTC, Michael Neuling wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> Currently msi_bitmap_alloc_hwirqs() will round up any IRQ allocation requests
                                                                       request
> to the nearest power of 2.  eg. ask for 5 IRQs and you'll get 8.  This wastes a
                             ^ one space after a period, or die!

> lot of IRQs which can be a scarce resource.
> 
> For cxl we can require multiple IRQs for every contexts that is attached to the
                                                 context
> accelerator.  For AFU directed accelerators, there may be 1000s of contexts

What is an AFU directed accelerator?
                   
> attached, hence we can easily run out of IRQs, especially if we are needlessly
> wasting them.
> 
> This changes the msi_bitmap_alloc_hwirqs() to allocate only the required number
                x
> of IRQs, hence avoiding this wastage.

The crucial detail you failed to mention is that you maintain the behaviour that
allocations are naturally aligned.

Can you add a check in the test code at the bottom of the file to confirm that
please?


> diff --git a/arch/powerpc/sysdev/msi_bitmap.c b/arch/powerpc/sysdev/msi_bitmap.c
> index 2ff6302..961a358 100644
> --- a/arch/powerpc/sysdev/msi_bitmap.c
> +++ b/arch/powerpc/sysdev/msi_bitmap.c
> @@ -20,32 +20,37 @@ int msi_bitmap_alloc_hwirqs(struct msi_bitmap *bmp, int num)
>  	int offset, order = get_count_order(num);
>  
>  	spin_lock_irqsave(&bmp->lock, flags);
> -	/*
> -	 * This is fast, but stricter than we need. We might want to add
> -	 * a fallback routine which does a linear search with no alignment.
> -	 */
> -	offset = bitmap_find_free_region(bmp->bitmap, bmp->irq_count, order);
> +
> +	offset = bitmap_find_next_zero_area(bmp->bitmap, bmp->irq_count, 0,
> +					    num, (1 << order) - 1);
> +	if (offset > bmp->irq_count)
> +		goto err;

Can we get a newline here :)

> +	bitmap_set(bmp->bitmap, offset, num);
>  	spin_unlock_irqrestore(&bmp->lock, flags);
>  
>  	pr_debug("msi_bitmap: allocated 0x%x (2^%d) at offset 0x%x\n",
>  		 num, order, offset);

This print out is a bit confusing now, should probably just drop the order.

cheers

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 05/17] powerpc/mm: Export mmu_kernel_ssize and mmu_linear_psize
  2014-09-30 10:34   ` Michael Neuling
@ 2014-10-01  7:13     ` Michael Ellerman
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Ellerman @ 2014-10-01  7:13 UTC (permalink / raw)
  To: Michael Neuling, greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

On Tue, 2014-30-09 at 10:34:54 UTC, Michael Neuling wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>

Mind explaining why ? :)

cheers

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 09/17] powerpc/mm: Add new hash_page_mm()
  2014-09-30 10:34   ` Michael Neuling
@ 2014-10-01  9:43     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 100+ messages in thread
From: Aneesh Kumar K.V @ 2014-10-01  9:43 UTC (permalink / raw)
  To: Michael Neuling, greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie, cbe-oss-dev

Michael Neuling <mikey@neuling.org> writes:

> From: Ian Munsie <imunsie@au1.ibm.com>
>
> This adds a new function hash_page_mm() based on the existing hash_page().
> This version allows any struct mm to be passed in, rather than assuming
> current.  This is useful for servicing co-processor faults which are not in the
> context of the current running process.
>
> We need to be careful here as the current hash_page() assumes current in a few
> places.

It would be nice to document the rules here. So when we try to add a hash
page entry, and if that results in demotion of the segment, are we supposed
to flush slbs ? Also why would one want to hash anything other
than current->mm ? How will this get called ?

Maybe they are explained in later patches. But can we also explain it
here.

>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
>  arch/powerpc/include/asm/mmu-hash64.h |  1 +
>  arch/powerpc/mm/hash_utils_64.c       | 22 ++++++++++++++--------
>  2 files changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
> index 6d0b7a2..f84e5a5 100644
> --- a/arch/powerpc/include/asm/mmu-hash64.h
> +++ b/arch/powerpc/include/asm/mmu-hash64.h
> @@ -322,6 +322,7 @@ extern int __hash_page_64K(unsigned long ea, unsigned long access,
>  			   unsigned int local, int ssize);
>  struct mm_struct;
>  unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap);
> +extern int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, unsigned long trap);
>  extern int hash_page(unsigned long ea, unsigned long access, unsigned long trap);
>  int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
>  		     pte_t *ptep, unsigned long trap, int local, int ssize,
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index bbdb054..0a5c8c0 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -904,7 +904,7 @@ void demote_segment_4k(struct mm_struct *mm, unsigned long addr)
>  		return;
>  	slice_set_range_psize(mm, addr, 1, MMU_PAGE_4K);
>  	copro_flush_all_slbs(mm);
> -	if (get_paca_psize(addr) != MMU_PAGE_4K) {
> +	if ((get_paca_psize(addr) != MMU_PAGE_4K) && (current->mm == mm)) {
>  		get_paca()->context = mm->context;
>  		slb_flush_and_rebolt();
>  	}
> @@ -989,26 +989,24 @@ static void check_paca_psize(unsigned long ea, struct mm_struct *mm,
>   * -1 - critical hash insertion error
>   * -2 - access not permitted by subpage protection mechanism
>   */
> -int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> +int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, unsigned long trap)
>  {
>  	enum ctx_state prev_state = exception_enter();
>  	pgd_t *pgdir;
>  	unsigned long vsid;
> -	struct mm_struct *mm;
>  	pte_t *ptep;
>  	unsigned hugeshift;
>  	const struct cpumask *tmp;
>  	int rc, user_region = 0, local = 0;
>  	int psize, ssize;
>  
> -	DBG_LOW("hash_page(ea=%016lx, access=%lx, trap=%lx\n",
> -		ea, access, trap);
> +	DBG_LOW("%s(ea=%016lx, access=%lx, trap=%lx\n",
> +		__func__, ea, access, trap);
>  
>  	/* Get region & vsid */
>   	switch (REGION_ID(ea)) {
>  	case USER_REGION_ID:
>  		user_region = 1;
> -		mm = current->mm;
>  		if (! mm) {
>  			DBG_LOW(" user region with no mm !\n");
>  			rc = 1;
> @@ -1104,7 +1102,8 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
>  			WARN_ON(1);
>  		}
>  #endif
> -		check_paca_psize(ea, mm, psize, user_region);
> +		if (current->mm == mm)
> +			check_paca_psize(ea, mm, psize, user_region);
>  
>  		goto bail;
>  	}
> @@ -1145,7 +1144,8 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
>  		}
>  	}
>  
> -	check_paca_psize(ea, mm, psize, user_region);
> +	if (current->mm == mm)
> +		check_paca_psize(ea, mm, psize, user_region);
>  #endif /* CONFIG_PPC_64K_PAGES */
>  
>  #ifdef CONFIG_PPC_HAS_HASH_64K
> @@ -1180,6 +1180,12 @@ bail:
>  	exception_exit(prev_state);
>  	return rc;
>  }
> +EXPORT_SYMBOL_GPL(hash_page_mm);
> +
> +int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> +{
> +	return hash_page_mm(current->mm, ea, access, trap);
> +}
>  EXPORT_SYMBOL_GPL(hash_page);
>  
>  void hash_preload(struct mm_struct *mm, unsigned long ea,
> -- 
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 09/17] powerpc/mm: Add new hash_page_mm()
@ 2014-10-01  9:43     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 100+ messages in thread
From: Aneesh Kumar K.V @ 2014-10-01  9:43 UTC (permalink / raw)
  To: Michael Neuling, greg, arnd, mpe, benh
  Cc: cbe-oss-dev, mikey, linux-kernel, linuxppc-dev, jk, imunsie, anton

Michael Neuling <mikey@neuling.org> writes:

> From: Ian Munsie <imunsie@au1.ibm.com>
>
> This adds a new function hash_page_mm() based on the existing hash_page().
> This version allows any struct mm to be passed in, rather than assuming
> current.  This is useful for servicing co-processor faults that are not in the
> context of the currently running process.
>
> We need to be careful here as the current hash_page() assumes current in a few
> places.

It would be nice to document the rules here. When we try to add a hash
page entry and that results in demotion of the segment, are we supposed
to flush the SLBs? Also, why would one want to hash anything other than
current->mm? How will this get called?

Maybe they are explained in later patches, but can we also explain that
here?

>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
>  arch/powerpc/include/asm/mmu-hash64.h |  1 +
>  arch/powerpc/mm/hash_utils_64.c       | 22 ++++++++++++++--------
>  2 files changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
> index 6d0b7a2..f84e5a5 100644
> --- a/arch/powerpc/include/asm/mmu-hash64.h
> +++ b/arch/powerpc/include/asm/mmu-hash64.h
> @@ -322,6 +322,7 @@ extern int __hash_page_64K(unsigned long ea, unsigned long access,
>  			   unsigned int local, int ssize);
>  struct mm_struct;
>  unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap);
> +extern int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, unsigned long trap);
>  extern int hash_page(unsigned long ea, unsigned long access, unsigned long trap);
>  int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
>  		     pte_t *ptep, unsigned long trap, int local, int ssize,
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index bbdb054..0a5c8c0 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -904,7 +904,7 @@ void demote_segment_4k(struct mm_struct *mm, unsigned long addr)
>  		return;
>  	slice_set_range_psize(mm, addr, 1, MMU_PAGE_4K);
>  	copro_flush_all_slbs(mm);
> -	if (get_paca_psize(addr) != MMU_PAGE_4K) {
> +	if ((get_paca_psize(addr) != MMU_PAGE_4K) && (current->mm == mm)) {
>  		get_paca()->context = mm->context;
>  		slb_flush_and_rebolt();
>  	}
> @@ -989,26 +989,24 @@ static void check_paca_psize(unsigned long ea, struct mm_struct *mm,
>   * -1 - critical hash insertion error
>   * -2 - access not permitted by subpage protection mechanism
>   */
> -int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> +int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, unsigned long trap)
>  {
>  	enum ctx_state prev_state = exception_enter();
>  	pgd_t *pgdir;
>  	unsigned long vsid;
> -	struct mm_struct *mm;
>  	pte_t *ptep;
>  	unsigned hugeshift;
>  	const struct cpumask *tmp;
>  	int rc, user_region = 0, local = 0;
>  	int psize, ssize;
>  
> -	DBG_LOW("hash_page(ea=%016lx, access=%lx, trap=%lx\n",
> -		ea, access, trap);
> +	DBG_LOW("%s(ea=%016lx, access=%lx, trap=%lx\n",
> +		__func__, ea, access, trap);
>  
>  	/* Get region & vsid */
>   	switch (REGION_ID(ea)) {
>  	case USER_REGION_ID:
>  		user_region = 1;
> -		mm = current->mm;
>  		if (! mm) {
>  			DBG_LOW(" user region with no mm !\n");
>  			rc = 1;
> @@ -1104,7 +1102,8 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
>  			WARN_ON(1);
>  		}
>  #endif
> -		check_paca_psize(ea, mm, psize, user_region);
> +		if (current->mm == mm)
> +			check_paca_psize(ea, mm, psize, user_region);
>  
>  		goto bail;
>  	}
> @@ -1145,7 +1144,8 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
>  		}
>  	}
>  
> -	check_paca_psize(ea, mm, psize, user_region);
> +	if (current->mm == mm)
> +		check_paca_psize(ea, mm, psize, user_region);
>  #endif /* CONFIG_PPC_64K_PAGES */
>  
>  #ifdef CONFIG_PPC_HAS_HASH_64K
> @@ -1180,6 +1180,12 @@ bail:
>  	exception_exit(prev_state);
>  	return rc;
>  }
> +EXPORT_SYMBOL_GPL(hash_page_mm);
> +
> +int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> +{
> +	return hash_page_mm(current->mm, ea, access, trap);
> +}
>  EXPORT_SYMBOL_GPL(hash_page);
>  
>  void hash_preload(struct mm_struct *mm, unsigned long ea,
> -- 
> 1.9.1
>


* Re: [PATCH v2 02/17] powerpc/cell: Move data segment faulting code out of cell platform
  2014-09-30 10:34   ` Michael Neuling
@ 2014-10-01  9:45     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 100+ messages in thread
From: Aneesh Kumar K.V @ 2014-10-01  9:45 UTC (permalink / raw)
  To: Michael Neuling, greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie, cbe-oss-dev

Michael Neuling <mikey@neuling.org> writes:

> From: Ian Munsie <imunsie@au1.ibm.com>
>
> __spu_trap_data_seg() currently contains code to determine the VSID and ESID
> required for a particular EA and mm struct.
>
> This code is generically useful for other co-processors.  This moves the code
> out of the cell platform so it can be used by other powerpc code.  It also
> adds 1TB segment handling, which Cell didn't have.
>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
>  arch/powerpc/include/asm/mmu-hash64.h  |  7 ++++-
>  arch/powerpc/mm/copro_fault.c          | 48 ++++++++++++++++++++++++++++++++++
>  arch/powerpc/mm/slb.c                  |  3 ---
>  arch/powerpc/platforms/cell/spu_base.c | 41 +++--------------------------
>  4 files changed, 58 insertions(+), 41 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
> index d765144..6d0b7a2 100644
> --- a/arch/powerpc/include/asm/mmu-hash64.h
> +++ b/arch/powerpc/include/asm/mmu-hash64.h
> @@ -189,7 +189,12 @@ static inline unsigned int mmu_psize_to_shift(unsigned int mmu_psize)
>  #define LP_MASK(i)	((0xFF >> (i)) << LP_SHIFT)
>  
>  #ifndef __ASSEMBLY__
> -
> +static inline int slb_vsid_shift(int ssize)
> +{
> +	if (ssize == MMU_SEGSIZE_256M)
> +		return SLB_VSID_SHIFT;
> +	return SLB_VSID_SHIFT_1T;
> +}
>  static inline int segment_shift(int ssize)
>  {
>  	if (ssize == MMU_SEGSIZE_256M)
> diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
> index ba7df14..b865697 100644
> --- a/arch/powerpc/mm/copro_fault.c
> +++ b/arch/powerpc/mm/copro_fault.c
> @@ -90,3 +90,51 @@ out_unlock:
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
> +
> +int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid)
> +{
> +	int psize, ssize;
> +
> +	*esid = (ea & ESID_MASK) | SLB_ESID_V;
> +
> +	switch (REGION_ID(ea)) {
> +	case USER_REGION_ID:
> +		pr_devel("copro_data_segment: 0x%llx -- USER_REGION_ID\n", ea);
> +#ifdef CONFIG_PPC_MM_SLICES
> +		psize = get_slice_psize(mm, ea);
> +#else
> +		psize = mm->context.user_psize;
> +#endif

We don't really need that as explained in last review.

-aneesh




* Re: [PATCH v2 02/17] powerpc/cell: Move data segment faulting code out of cell platform
  2014-09-30 10:34   ` Michael Neuling
@ 2014-10-01  9:53     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 100+ messages in thread
From: Aneesh Kumar K.V @ 2014-10-01  9:53 UTC (permalink / raw)
  To: Michael Neuling, greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie, cbe-oss-dev

Michael Neuling <mikey@neuling.org> writes:

> From: Ian Munsie <imunsie@au1.ibm.com>
>
> __spu_trap_data_seg() currently contains code to determine the VSID and ESID
> required for a particular EA and mm struct.
>
> This code is generically useful for other co-processors.  This moves the code
> out of the cell platform so it can be used by other powerpc code.  It also
> adds 1TB segment handling, which Cell didn't have.
>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
>  arch/powerpc/include/asm/mmu-hash64.h  |  7 ++++-
>  arch/powerpc/mm/copro_fault.c          | 48 ++++++++++++++++++++++++++++++++++
>  arch/powerpc/mm/slb.c                  |  3 ---
>  arch/powerpc/platforms/cell/spu_base.c | 41 +++--------------------------
>  4 files changed, 58 insertions(+), 41 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
> index d765144..6d0b7a2 100644
> --- a/arch/powerpc/include/asm/mmu-hash64.h
> +++ b/arch/powerpc/include/asm/mmu-hash64.h
> @@ -189,7 +189,12 @@ static inline unsigned int mmu_psize_to_shift(unsigned int mmu_psize)
>  #define LP_MASK(i)	((0xFF >> (i)) << LP_SHIFT)
>  
>  #ifndef __ASSEMBLY__
> -
> +static inline int slb_vsid_shift(int ssize)
> +{
> +	if (ssize == MMU_SEGSIZE_256M)
> +		return SLB_VSID_SHIFT;
> +	return SLB_VSID_SHIFT_1T;
> +}
>  static inline int segment_shift(int ssize)
>  {
>  	if (ssize == MMU_SEGSIZE_256M)
> diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
> index ba7df14..b865697 100644
> --- a/arch/powerpc/mm/copro_fault.c
> +++ b/arch/powerpc/mm/copro_fault.c
> @@ -90,3 +90,51 @@ out_unlock:
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
> +
> +int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid)
> +{
> +	int psize, ssize;
> +
> +	*esid = (ea & ESID_MASK) | SLB_ESID_V;
> +
> +	switch (REGION_ID(ea)) {
> +	case USER_REGION_ID:
> +		pr_devel("copro_data_segment: 0x%llx -- USER_REGION_ID\n", ea);
> +#ifdef CONFIG_PPC_MM_SLICES
> +		psize = get_slice_psize(mm, ea);
> +#else
> +		psize = mm->context.user_psize;
> +#endif
> +		ssize = user_segment_size(ea);
> +		*vsid = (get_vsid(mm->context.id, ea, ssize)
> +			 << slb_vsid_shift(ssize)) | SLB_VSID_USER;
> +		break;
> +	case VMALLOC_REGION_ID:
> +		pr_devel("copro_data_segment: 0x%llx -- VMALLOC_REGION_ID\n", ea);
> +		if (ea < VMALLOC_END)
> +			psize = mmu_vmalloc_psize;
> +		else
> +			psize = mmu_io_psize;
> +		ssize = mmu_kernel_ssize;
> +		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
> +			 << SLB_VSID_SHIFT) | SLB_VSID_KERNEL;

why not
		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
			 << slb_vsid_shift(ssize)) | SLB_VSID_KERNEL;

for the vmalloc and kernel regions? We could end up using 1T segments for the kernel mapping too.

-aneesh




* Re: [PATCH v2 10/17] powerpc/mm: Merge vsid calculation in hash_page() and copro_data_segment()
  2014-09-30 10:34   ` Michael Neuling
@ 2014-10-01  9:55     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 100+ messages in thread
From: Aneesh Kumar K.V @ 2014-10-01  9:55 UTC (permalink / raw)
  To: Michael Neuling, greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie, cbe-oss-dev

Michael Neuling <mikey@neuling.org> writes:

> From: Ian Munsie <imunsie@au1.ibm.com>
>
> The vsid calculations in hash_page() and copro_data_segment() are very
> similar.  This merges the two different versions.
>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
>  arch/powerpc/include/asm/mmu-hash64.h |  2 ++
>  arch/powerpc/mm/copro_fault.c         | 45 ++++++--------------------
>  arch/powerpc/mm/hash_utils_64.c       | 61 ++++++++++++++++++++++-------------
>  3 files changed, 50 insertions(+), 58 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
> index f84e5a5..bf43fb0 100644
> --- a/arch/powerpc/include/asm/mmu-hash64.h
> +++ b/arch/powerpc/include/asm/mmu-hash64.h
> @@ -322,6 +322,8 @@ extern int __hash_page_64K(unsigned long ea, unsigned long access,
>  			   unsigned int local, int ssize);
>  struct mm_struct;
>  unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap);
> +int calculate_vsid(struct mm_struct *mm, u64 ea,
> +		   u64 *vsid, int *psize, int *ssize);
>  extern int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, unsigned long trap);
>  extern int hash_page(unsigned long ea, unsigned long access, unsigned long trap);
>  int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
> diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
> index 939abdf..ba8bf8e 100644
> --- a/arch/powerpc/mm/copro_fault.c
> +++ b/arch/powerpc/mm/copro_fault.c
> @@ -94,45 +94,18 @@ EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
>  
>  int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid)
>  {
> -	int psize, ssize;
> +	int psize, ssize, rc;
>  
>  	*esid = (ea & ESID_MASK) | SLB_ESID_V;
>  
> -	switch (REGION_ID(ea)) {
> -	case USER_REGION_ID:
> -		pr_devel("copro_data_segment: 0x%llx -- USER_REGION_ID\n", ea);
> -#ifdef CONFIG_PPC_MM_SLICES
> -		psize = get_slice_psize(mm, ea);
> -#else
> -		psize = mm->context.user_psize;
> -#endif
> -		ssize = user_segment_size(ea);
> -		*vsid = (get_vsid(mm->context.id, ea, ssize)
> -			 << slb_vsid_shift(ssize)) | SLB_VSID_USER;
> -		break;
> -	case VMALLOC_REGION_ID:
> -		pr_devel("copro_data_segment: 0x%llx -- VMALLOC_REGION_ID\n", ea);
> -		if (ea < VMALLOC_END)
> -			psize = mmu_vmalloc_psize;
> -		else
> -			psize = mmu_io_psize;
> -		ssize = mmu_kernel_ssize;
> -		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
> -			 << SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
> -		break;
> -	case KERNEL_REGION_ID:
> -		pr_devel("copro_data_segment: 0x%llx -- KERNEL_REGION_ID\n", ea);
> -		psize = mmu_linear_psize;
> -		ssize = mmu_kernel_ssize;
> -		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
> -			 << SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
> -		break;
> -	default:
> -		/* Future: support kernel segments so that drivers can use the
> -		 * CoProcessors */
> -		pr_debug("invalid region access at %016llx\n", ea);
> -		return 1;
> -	}
> +	rc = calculate_vsid(mm, ea, vsid, &psize, &ssize);
> +	if (rc)
> +		return rc;
> +	if (REGION_ID(ea) == USER_REGION_ID)
> +		*vsid = (*vsid << slb_vsid_shift(ssize)) | SLB_VSID_USER;
> +	else
> +		*vsid = (*vsid << SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
> +
>  	*vsid |= mmu_psize_defs[psize].sllp |
>  		((ssize == MMU_SEGSIZE_1T) ? SLB_VSID_B_1T : 0);
>  
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index 0a5c8c0..3fa81ca 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -983,6 +983,38 @@ static void check_paca_psize(unsigned long ea, struct mm_struct *mm,
>  	}
>  }
>  
> +int calculate_vsid(struct mm_struct *mm, u64 ea,
> +		   u64 *vsid, int *psize, int *ssize)
> +{
> +	switch (REGION_ID(ea)) {
> +	case USER_REGION_ID:
> +		pr_devel("%s: 0x%llx -- USER_REGION_ID\n", __func__, ea);
> +		*psize = get_slice_psize(mm, ea);
> +		*ssize = user_segment_size(ea);
> +		*vsid = get_vsid(mm->context.id, ea, *ssize);
> +		return 0;
> +	case VMALLOC_REGION_ID:
> +		pr_devel("%s: 0x%llx -- VMALLOC_REGION_ID\n", __func__, ea);
> +		if (ea < VMALLOC_END)
> +			*psize = mmu_vmalloc_psize;
> +		else
> +			*psize = mmu_io_psize;
> +		*ssize = mmu_kernel_ssize;
> +		*vsid = get_kernel_vsid(ea, mmu_kernel_ssize);
> +		return 0;
> +	case KERNEL_REGION_ID:
> +		pr_devel("%s: 0x%llx -- KERNEL_REGION_ID\n", __func__, ea);
> +		*psize = mmu_linear_psize;
> +		*ssize = mmu_kernel_ssize;
> +		*vsid = get_kernel_vsid(ea, mmu_kernel_ssize);
> +		return 0;
> +	default:
> +		pr_debug("%s: invalid region access at %016llx\n", __func__, ea);
> +		return 1;
> +	}
> +}
> +EXPORT_SYMBOL_GPL(calculate_vsid);
> +
>  /* Result code is:
>   *  0 - handled
>   *  1 - normal page fault
> @@ -993,7 +1025,7 @@ int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, u
>  {
>  	enum ctx_state prev_state = exception_enter();
>  	pgd_t *pgdir;
> -	unsigned long vsid;
> +	u64 vsid;
>  	pte_t *ptep;
>  	unsigned hugeshift;
>  	const struct cpumask *tmp;
> @@ -1003,35 +1035,20 @@ int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, u
>  	DBG_LOW("%s(ea=%016lx, access=%lx, trap=%lx\n",
>  		__func__, ea, access, trap);
>  
> -	/* Get region & vsid */
> - 	switch (REGION_ID(ea)) {
> -	case USER_REGION_ID:
> +	/* Get region */
> +	if (REGION_ID(ea) == USER_REGION_ID) {
>  		user_region = 1;
>  		if (! mm) {
>  			DBG_LOW(" user region with no mm !\n");
>  			rc = 1;
>  			goto bail;
>  		}
> -		psize = get_slice_psize(mm, ea);
> -		ssize = user_segment_size(ea);
> -		vsid = get_vsid(mm->context.id, ea, ssize);
> -		break;
> -	case VMALLOC_REGION_ID:
> +	} else
>  		mm = &init_mm;
> -		vsid = get_kernel_vsid(ea, mmu_kernel_ssize);
> -		if (ea < VMALLOC_END)
> -			psize = mmu_vmalloc_psize;
> -		else
> -			psize = mmu_io_psize;
> -		ssize = mmu_kernel_ssize;
> -		break;
> -	default:
> -		/* Not a valid range
> -		 * Send the problem up to do_page_fault 
> -		 */
> -		rc = 1;


That part is different now. We now handle KERNEL_REGION_ID in the
hash_page() path; earlier we used to consider it a problem.

> +	rc = calculate_vsid(mm, ea, &vsid, &psize, &ssize);
> +	if (rc)
>  		goto bail;
> -	}
> +
>  	DBG_LOW(" mm=%p, mm->pgdir=%p, vsid=%016lx\n", mm, mm->pgd, vsid);
>  
>  	/* Bad address. */
> -- 
> 1.9.1
>



* Re: [PATCH v2 10/17] powerpc/mm: Merge vsid calculation in hash_page() and copro_data_segment()
@ 2014-10-01  9:55     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 100+ messages in thread
From: Aneesh Kumar K.V @ 2014-10-01  9:55 UTC (permalink / raw)
  To: Michael Neuling, greg, arnd, mpe, benh
  Cc: cbe-oss-dev, mikey, linux-kernel, linuxppc-dev, jk, imunsie, anton

Michael Neuling <mikey@neuling.org> writes:

> From: Ian Munsie <imunsie@au1.ibm.com>
>
> The vsid calculation between hash_page() and copro_data_segment() are very
> similar.  This merges these two different versions.
>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
>  arch/powerpc/include/asm/mmu-hash64.h |  2 ++
>  arch/powerpc/mm/copro_fault.c         | 45 ++++++--------------------
>  arch/powerpc/mm/hash_utils_64.c       | 61 ++++++++++++++++++++++-------------
>  3 files changed, 50 insertions(+), 58 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
> index f84e5a5..bf43fb0 100644
> --- a/arch/powerpc/include/asm/mmu-hash64.h
> +++ b/arch/powerpc/include/asm/mmu-hash64.h
> @@ -322,6 +322,8 @@ extern int __hash_page_64K(unsigned long ea, unsigned long access,
>  			   unsigned int local, int ssize);
>  struct mm_struct;
>  unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap);
> +int calculate_vsid(struct mm_struct *mm, u64 ea,
> +		   u64 *vsid, int *psize, int *ssize);
>  extern int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, unsigned long trap);
>  extern int hash_page(unsigned long ea, unsigned long access, unsigned long trap);
>  int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
> diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
> index 939abdf..ba8bf8e 100644
> --- a/arch/powerpc/mm/copro_fault.c
> +++ b/arch/powerpc/mm/copro_fault.c
> @@ -94,45 +94,18 @@ EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
>  
>  int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid)
>  {
> -	int psize, ssize;
> +	int psize, ssize, rc;
>  
>  	*esid = (ea & ESID_MASK) | SLB_ESID_V;
>  
> -	switch (REGION_ID(ea)) {
> -	case USER_REGION_ID:
> -		pr_devel("copro_data_segment: 0x%llx -- USER_REGION_ID\n", ea);
> -#ifdef CONFIG_PPC_MM_SLICES
> -		psize = get_slice_psize(mm, ea);
> -#else
> -		psize = mm->context.user_psize;
> -#endif
> -		ssize = user_segment_size(ea);
> -		*vsid = (get_vsid(mm->context.id, ea, ssize)
> -			 << slb_vsid_shift(ssize)) | SLB_VSID_USER;
> -		break;
> -	case VMALLOC_REGION_ID:
> -		pr_devel("copro_data_segment: 0x%llx -- VMALLOC_REGION_ID\n", ea);
> -		if (ea < VMALLOC_END)
> -			psize = mmu_vmalloc_psize;
> -		else
> -			psize = mmu_io_psize;
> -		ssize = mmu_kernel_ssize;
> -		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
> -			 << SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
> -		break;
> -	case KERNEL_REGION_ID:
> -		pr_devel("copro_data_segment: 0x%llx -- KERNEL_REGION_ID\n", ea);
> -		psize = mmu_linear_psize;
> -		ssize = mmu_kernel_ssize;
> -		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
> -			 << SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
> -		break;
> -	default:
> -		/* Future: support kernel segments so that drivers can use the
> -		 * CoProcessors */
> -		pr_debug("invalid region access at %016llx\n", ea);
> -		return 1;
> -	}
> +	rc = calculate_vsid(mm, ea, vsid, &psize, &ssize);
> +	if (rc)
> +		return rc;
> +	if (REGION_ID(ea) == USER_REGION_ID)
> +		*vsid = (*vsid << slb_vsid_shift(ssize)) | SLB_VSID_USER;
> +	else
> +		*vsid = (*vsid << SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
> +
>  	*vsid |= mmu_psize_defs[psize].sllp |
>  		((ssize == MMU_SEGSIZE_1T) ? SLB_VSID_B_1T : 0);
>  
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index 0a5c8c0..3fa81ca 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -983,6 +983,38 @@ static void check_paca_psize(unsigned long ea, struct mm_struct *mm,
>  	}
>  }
>  
> +int calculate_vsid(struct mm_struct *mm, u64 ea,
> +		   u64 *vsid, int *psize, int *ssize)
> +{
> +	switch (REGION_ID(ea)) {
> +	case USER_REGION_ID:
> +		pr_devel("%s: 0x%llx -- USER_REGION_ID\n", __func__, ea);
> +		*psize = get_slice_psize(mm, ea);
> +		*ssize = user_segment_size(ea);
> +		*vsid = get_vsid(mm->context.id, ea, *ssize);
> +		return 0;
> +	case VMALLOC_REGION_ID:
> +		pr_devel("%s: 0x%llx -- VMALLOC_REGION_ID\n", __func__, ea);
> +		if (ea < VMALLOC_END)
> +			*psize = mmu_vmalloc_psize;
> +		else
> +			*psize = mmu_io_psize;
> +		*ssize = mmu_kernel_ssize;
> +		*vsid = get_kernel_vsid(ea, mmu_kernel_ssize);
> +		return 0;
> +	case KERNEL_REGION_ID:
> +		pr_devel("%s: 0x%llx -- KERNEL_REGION_ID\n", __func__, ea);
> +		*psize = mmu_linear_psize;
> +		*ssize = mmu_kernel_ssize;
> +		*vsid = get_kernel_vsid(ea, mmu_kernel_ssize);
> +		return 0;
> +	default:
> +		pr_debug("%s: invalid region access at %016llx\n", __func__, ea);
> +		return 1;
> +	}
> +}
> +EXPORT_SYMBOL_GPL(calculate_vsid);
> +
>  /* Result code is:
>   *  0 - handled
>   *  1 - normal page fault
> @@ -993,7 +1025,7 @@ int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, u
>  {
>  	enum ctx_state prev_state = exception_enter();
>  	pgd_t *pgdir;
> -	unsigned long vsid;
> +	u64 vsid;
>  	pte_t *ptep;
>  	unsigned hugeshift;
>  	const struct cpumask *tmp;
> @@ -1003,35 +1035,20 @@ int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, u
>  	DBG_LOW("%s(ea=%016lx, access=%lx, trap=%lx\n",
>  		__func__, ea, access, trap);
>  
> -	/* Get region & vsid */
> - 	switch (REGION_ID(ea)) {
> -	case USER_REGION_ID:
> +	/* Get region */
> +	if (REGION_ID(ea) == USER_REGION_ID) {
>  		user_region = 1;
>  		if (! mm) {
>  			DBG_LOW(" user region with no mm !\n");
>  			rc = 1;
>  			goto bail;
>  		}
> -		psize = get_slice_psize(mm, ea);
> -		ssize = user_segment_size(ea);
> -		vsid = get_vsid(mm->context.id, ea, ssize);
> -		break;
> -	case VMALLOC_REGION_ID:
> +	} else
>  		mm = &init_mm;
> -		vsid = get_kernel_vsid(ea, mmu_kernel_ssize);
> -		if (ea < VMALLOC_END)
> -			psize = mmu_vmalloc_psize;
> -		else
> -			psize = mmu_io_psize;
> -		ssize = mmu_kernel_ssize;
> -		break;
> -	default:
> -		/* Not a valid range
> -		 * Send the problem up to do_page_fault 
> -		 */
> -		rc = 1;


That part is different now. We now handle KERNEL_REGION_ID in the
hash_page() case. Earlier we used to consider it a problem.

> +	rc = calculate_vsid(mm, ea, &vsid, &psize, &ssize);
> +	if (rc)
>  		goto bail;
> -	}
> +
>  	DBG_LOW(" mm=%p, mm->pgdir=%p, vsid=%016lx\n", mm, mm->pgd, vsid);
>  
>  	/* Bad address. */
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 03/17] powerpc/cell: Make spu_flush_all_slbs() generic
  2014-10-01  7:13     ` Michael Ellerman
@ 2014-10-01 10:51       ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-10-01 10:51 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: greg, arnd, benh, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

On Wed, 2014-10-01 at 17:13 +1000, Michael Ellerman wrote:
> On Tue, 2014-30-09 at 10:34:52 UTC, Michael Neuling wrote:
> > diff --git a/arch/powerpc/include/asm/copro.h b/arch/powerpc/include/asm/copro.h
> > index 2858108..f3d338f 100644
> > --- a/arch/powerpc/include/asm/copro.h
> > +++ b/arch/powerpc/include/asm/copro.h
> > @@ -15,4 +15,10 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
> >  
> >  int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid);
> >  
> > +
> > +#ifdef CONFIG_PPC_COPRO_BASE
> > +void copro_flush_all_slbs(struct mm_struct *mm);
> > +#else
> > +#define copro_flush_all_slbs(mm) do {} while(0)
> 
> This can be a static inline, so it should be. That way you get type checking on
> the argument for CONFIG_PPC_COPRO_BASE=n.

OK, I'll update.

Mikey

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 02/17] powerpc/cell: Move data segment faulting code out of cell platform
  2014-10-01  9:45     ` Aneesh Kumar K.V
@ 2014-10-01 11:10       ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-10-01 11:10 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: greg, arnd, mpe, benh, anton, linux-kernel, linuxppc-dev, jk,
	imunsie, cbe-oss-dev

On Wed, 2014-10-01 at 15:15 +0530, Aneesh Kumar K.V wrote:
> Michael Neuling <mikey@neuling.org> writes:
> 
> > From: Ian Munsie <imunsie@au1.ibm.com>
> >
> > __spu_trap_data_seg() currently contains code to determine the VSID and ESID
> > required for a particular EA and mm struct.
> >
> > This code is generically useful for other co-processors.  This moves the code
> > of the cell platform so it can be used by other powerpc code.  It also adds 1TB
> > segment handling which Cell didn't have.
> >
> > Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > ---
> >  arch/powerpc/include/asm/mmu-hash64.h  |  7 ++++-
> >  arch/powerpc/mm/copro_fault.c          | 48 ++++++++++++++++++++++++++++++++++
> >  arch/powerpc/mm/slb.c                  |  3 ---
> >  arch/powerpc/platforms/cell/spu_base.c | 41 +++--------------------------
> >  4 files changed, 58 insertions(+), 41 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
> > index d765144..6d0b7a2 100644
> > --- a/arch/powerpc/include/asm/mmu-hash64.h
> > +++ b/arch/powerpc/include/asm/mmu-hash64.h
> > @@ -189,7 +189,12 @@ static inline unsigned int mmu_psize_to_shift(unsigned int mmu_psize)
> >  #define LP_MASK(i)	((0xFF >> (i)) << LP_SHIFT)
> >  
> >  #ifndef __ASSEMBLY__
> > -
> > +static inline int slb_vsid_shift(int ssize)
> > +{
> > +	if (ssize == MMU_SEGSIZE_256M)
> > +		return SLB_VSID_SHIFT;
> > +	return SLB_VSID_SHIFT_1T;
> > +}
> >  static inline int segment_shift(int ssize)
> >  {
> >  	if (ssize == MMU_SEGSIZE_256M)
> > diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
> > index ba7df14..b865697 100644
> > --- a/arch/powerpc/mm/copro_fault.c
> > +++ b/arch/powerpc/mm/copro_fault.c
> > @@ -90,3 +90,51 @@ out_unlock:
> >  	return ret;
> >  }
> >  EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
> > +
> > +int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid)
> > +{
> > +	int psize, ssize;
> > +
> > +	*esid = (ea & ESID_MASK) | SLB_ESID_V;
> > +
> > +	switch (REGION_ID(ea)) {
> > +	case USER_REGION_ID:
> > +		pr_devel("copro_data_segment: 0x%llx -- USER_REGION_ID\n", ea);
> > +#ifdef CONFIG_PPC_MM_SLICES
> > +		psize = get_slice_psize(mm, ea);
> > +#else
> > +		psize = mm->context.user_psize;
> > +#endif
> 
> We don't really need that as explained in last review.

That cleanup is in patch 10. I avoided changing it here so it's clearer
that what is being removed is the same as what is being added.

Mikey


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 13/17] cxl: Add base builtin support
  2014-09-30 10:35   ` Michael Neuling
@ 2014-10-01 12:00     ` Michael Ellerman
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Ellerman @ 2014-10-01 12:00 UTC (permalink / raw)
  To: Michael Neuling, greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

On Tue, 2014-30-09 at 10:35:02 UTC, Michael Neuling wrote:
> This also adds the cxl_ctx_in_use() function for use in the mm code to see if
> any cxl contexts are currently in use.  This is used by the tlbie() to
> determine if it can do local TLB invalidations or not.  This also adds get/put
> calls for the cxl driver module to refcount the active cxl contexts.

> diff --git a/drivers/misc/cxl/base.c b/drivers/misc/cxl/base.c
> new file mode 100644
> index 0000000..f4cbcfb
> --- /dev/null
> +++ b/drivers/misc/cxl/base.c
> @@ -0,0 +1,102 @@
> +/*
> + * Copyright 2014 IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/rcupdate.h>
> +#include <asm/errno.h>
> +#include <misc/cxl.h>
> +#include "cxl.h"
> +
> +/* protected by rcu */
> +static struct cxl_calls *cxl_calls;
> +
> +static atomic_t use_count = ATOMIC_INIT(0);
...

> +void cxl_ctx_get(void)
> +{
> +	atomic_inc(&use_count);
> +}
> +EXPORT_SYMBOL(cxl_ctx_get);
> +
> +void cxl_ctx_put(void)
> +{
> +	atomic_dec(&use_count);
> +}
> +EXPORT_SYMBOL(cxl_ctx_put);
> +
> +bool cxl_ctx_in_use(void)
> +{
> +	return (atomic_read(&use_count) != 0);
> +}
> +EXPORT_SYMBOL(cxl_ctx_in_use);

So as written this results in a function call for every tlbie(), even when no
one has ever used a CAPI adapter, or when none are even in the machine.

I think the patch below is a better trade off. It makes the use_count global,
but that's not a biggy. The benefit is that the additional code in tlbie()
becomes:

	ld      r10,-29112(r2)
	lwz     r10,0(r10)
	cmpwi   cr7,r10,0

Which is about as good as it can get.

cheers


diff --git a/drivers/misc/cxl/base.c b/drivers/misc/cxl/base.c
index f4cbcfbd8dbc..4401d1c2dd33 100644
--- a/drivers/misc/cxl/base.c
+++ b/drivers/misc/cxl/base.c
@@ -16,7 +16,7 @@
 /* protected by rcu */
 static struct cxl_calls *cxl_calls;
 
-static atomic_t use_count = ATOMIC_INIT(0);
+atomic_t cxl_use_count = ATOMIC_INIT(0);
 
 #ifdef CONFIG_CXL_MODULE
 
@@ -65,24 +65,6 @@ void cxl_slbia(struct mm_struct *mm)
 }
 EXPORT_SYMBOL(cxl_slbia);
 
-void cxl_ctx_get(void)
-{
-	atomic_inc(&use_count);
-}
-EXPORT_SYMBOL(cxl_ctx_get);
-
-void cxl_ctx_put(void)
-{
-	atomic_dec(&use_count);
-}
-EXPORT_SYMBOL(cxl_ctx_put);
-
-bool cxl_ctx_in_use(void)
-{
-	return (atomic_read(&use_count) != 0);
-}
-EXPORT_SYMBOL(cxl_ctx_in_use);
-
 int register_cxl_calls(struct cxl_calls *calls)
 {
 	if (cxl_calls)
diff --git a/include/misc/cxl.h b/include/misc/cxl.h
index bde46a330881..6e43dca6a792 100644
--- a/include/misc/cxl.h
+++ b/include/misc/cxl.h
@@ -18,12 +18,24 @@ struct cxl_irq_ranges {
 };
 
 #ifdef CONFIG_CXL_BASE
+extern atomic_t cxl_use_count;
 
 void cxl_slbia(struct mm_struct *mm);
-void cxl_ctx_get(void);
-void cxl_ctx_put(void);
-bool cxl_ctx_in_use(void);
 
+static inline bool cxl_ctx_in_use(void)
+{
+	return (atomic_read(&cxl_use_count) != 0);
+}
+
+static inline void cxl_ctx_get(void)
+{
+	atomic_inc(&cxl_use_count);
+}
+
+static inline void cxl_ctx_put(void)
+{
+	atomic_dec(&cxl_use_count);
+}
 #else /* CONFIG_CXL_BASE */
 
 #define cxl_slbia(...) do { } while (0)

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 07/17] cxl: Add new header for call backs and structs
  2014-09-30 10:34   ` Michael Neuling
@ 2014-10-01 12:00     ` Michael Ellerman
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Ellerman @ 2014-10-01 12:00 UTC (permalink / raw)
  To: Michael Neuling, greg, arnd, mpe, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

On Tue, 2014-30-09 at 10:34:56 UTC, Michael Neuling wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> This new header add defines for callbacks and structs needed by the rest of the
                  adds
> kernel to hook into the cxl infrastructure.
> 
> Empty functions are provided when CONFIG CXL_BASE is not enabled.
> 
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
>  include/misc/cxl.h | 34 ++++++++++++++++++++++++++++++++++

include/misc is kind of weird. I guess it's a misc device.

Any reason not to have it in arch/powerpc/include ?

> diff --git a/include/misc/cxl.h b/include/misc/cxl.h
> new file mode 100644
> index 0000000..bde46a3
> --- /dev/null
> +++ b/include/misc/cxl.h
> @@ -0,0 +1,34 @@
> +/*
> + * Copyright 2014 IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#ifndef _MISC_ASM_CXL_H

No ASM.

> +#define _MISC_ASM_CXL_H
> +
> +#define CXL_IRQ_RANGES 4
> +
> +struct cxl_irq_ranges {
> +	irq_hw_number_t offset[CXL_IRQ_RANGES];
> +	irq_hw_number_t range[CXL_IRQ_RANGES];
> +};
> +
> +#ifdef CONFIG_CXL_BASE
> +
> +void cxl_slbia(struct mm_struct *mm);
> +void cxl_ctx_get(void);
> +void cxl_ctx_put(void);
> +bool cxl_ctx_in_use(void);
> +
> +#else /* CONFIG_CXL_BASE */
> +
> +#define cxl_slbia(...) do { } while (0)
> +#define cxl_ctx_in_use(...) false

Any reason these shouldn't be static inlines?

cheers

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 02/17] powerpc/cell: Move data segment faulting code out of cell platform
  2014-10-01  6:47     ` Michael Ellerman
@ 2014-10-02  0:42       ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-10-02  0:42 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: greg, arnd, benh, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

On Wed, 2014-10-01 at 16:47 +1000, Michael Ellerman wrote:
> On Tue, 2014-30-09 at 10:34:51 UTC, Michael Neuling wrote:
> > From: Ian Munsie <imunsie@au1.ibm.com>
> > 
> > __spu_trap_data_seg() currently contains code to determine the VSID and ESID
> > required for a particular EA and mm struct.
> > 
> > This code is generically useful for other co-processors.  This moves the code
> > of the cell platform so it can be used by other powerpc code.  It also adds 1TB
> > segment handling which Cell didn't have.
> 
> I'm not loving this.
> 
> For starters the name "copro_data_segment()" doesn't contain any verbs, and it
> doesn't tell me what it does.

Ok.

> If we give it a name that says what it does, we get copro_get_ea_esid_and_vsid().
> Or something equally ugly.

Ok

> And then in patch 10 you move the bulk of the logic into calculate_vsid().

That was intentional on my part.  I want this patch to be clear that
we're moving this code out of cell.  Then I wanted the optimisations to
be in a separate patch.  It does mean we touch the code twice in this
series, but I was hoping it would make it easier to review.  Alas. :-)

> So instead can we:
>  - add a small helper that does the esid calculation, eg. calculate_esid() ?
>  - factor out the vsid logic into a helper, calculate_vsid() ?
>  - rework the spu code to use those, dropping __spu_trap_data_seg()
>  - use the helpers in the cxl code

OK, I think I can do that.  I might change the name to something better
in this patch, but I'll leave these cleanups to the later patch 10.

Mikey


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 02/17] powerpc/cell: Move data segment faulting code out of cell platform
  2014-10-01  9:53     ` Aneesh Kumar K.V
@ 2014-10-02  0:58       ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-10-02  0:58 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: greg, arnd, mpe, benh, anton, linux-kernel, linuxppc-dev, jk,
	imunsie, cbe-oss-dev

On Wed, 2014-10-01 at 15:23 +0530, Aneesh Kumar K.V wrote:
> Michael Neuling <mikey@neuling.org> writes:
> 
> > From: Ian Munsie <imunsie@au1.ibm.com>
> >
> > __spu_trap_data_seg() currently contains code to determine the VSID and ESID
> > required for a particular EA and mm struct.
> >
> > This code is generically useful for other co-processors.  This moves the code
> > of the cell platform so it can be used by other powerpc code.  It also adds 1TB
> > segment handling which Cell didn't have.
> >
> > Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > ---
> >  arch/powerpc/include/asm/mmu-hash64.h  |  7 ++++-
> >  arch/powerpc/mm/copro_fault.c          | 48 ++++++++++++++++++++++++++++++++++
> >  arch/powerpc/mm/slb.c                  |  3 ---
> >  arch/powerpc/platforms/cell/spu_base.c | 41 +++--------------------------
> >  4 files changed, 58 insertions(+), 41 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
> > index d765144..6d0b7a2 100644
> > --- a/arch/powerpc/include/asm/mmu-hash64.h
> > +++ b/arch/powerpc/include/asm/mmu-hash64.h
> > @@ -189,7 +189,12 @@ static inline unsigned int mmu_psize_to_shift(unsigned int mmu_psize)
> >  #define LP_MASK(i)	((0xFF >> (i)) << LP_SHIFT)
> >  
> >  #ifndef __ASSEMBLY__
> > -
> > +static inline int slb_vsid_shift(int ssize)
> > +{
> > +	if (ssize == MMU_SEGSIZE_256M)
> > +		return SLB_VSID_SHIFT;
> > +	return SLB_VSID_SHIFT_1T;
> > +}
> >  static inline int segment_shift(int ssize)
> >  {
> >  	if (ssize == MMU_SEGSIZE_256M)
> > diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
> > index ba7df14..b865697 100644
> > --- a/arch/powerpc/mm/copro_fault.c
> > +++ b/arch/powerpc/mm/copro_fault.c
> > @@ -90,3 +90,51 @@ out_unlock:
> >  	return ret;
> >  }
> >  EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
> > +
> > +int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid)
> > +{
> > +	int psize, ssize;
> > +
> > +	*esid = (ea & ESID_MASK) | SLB_ESID_V;
> > +
> > +	switch (REGION_ID(ea)) {
> > +	case USER_REGION_ID:
> > +		pr_devel("copro_data_segment: 0x%llx -- USER_REGION_ID\n", ea);
> > +#ifdef CONFIG_PPC_MM_SLICES
> > +		psize = get_slice_psize(mm, ea);
> > +#else
> > +		psize = mm->context.user_psize;
> > +#endif
> > +		ssize = user_segment_size(ea);
> > +		*vsid = (get_vsid(mm->context.id, ea, ssize)
> > +			 << slb_vsid_shift(ssize)) | SLB_VSID_USER;
> > +		break;
> > +	case VMALLOC_REGION_ID:
> > +		pr_devel("copro_data_segment: 0x%llx -- VMALLOC_REGION_ID\n", ea);
> > +		if (ea < VMALLOC_END)
> > +			psize = mmu_vmalloc_psize;
> > +		else
> > +			psize = mmu_io_psize;
> > +		ssize = mmu_kernel_ssize;
> > +		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
> > +			 << SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
> 
> why not
> 		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
> 			 << slb_vsid_shift(ssize)) | SLB_VSID_KERNEL;
> 
> for vmalloc and kernel region ? We could end up using 1T segments for kernel mapping too.

Yep, but I'm going to do this in patch 10 where the other optimisations
are for this.

Mikey


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 02/17] powerpc/cell: Move data segment faulting code out of cell platform
@ 2014-10-02  0:58       ` Michael Neuling
  0 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-10-02  0:58 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: cbe-oss-dev, arnd, greg, linux-kernel, linuxppc-dev, anton, imunsie, jk

On Wed, 2014-10-01 at 15:23 +0530, Aneesh Kumar K.V wrote:
> Michael Neuling <mikey@neuling.org> writes:
>=20
> > From: Ian Munsie <imunsie@au1.ibm.com>
> >
> > __spu_trap_data_seg() currently contains code to determine the VSID and=
 ESID
> > required for a particular EA and mm struct.
> >
> > This code is generically useful for other co-processors.  This moves th=
e code
> > of the cell platform so it can be used by other powerpc code.  It also =
adds 1TB
> > segment handling which Cell didn't have.
> >
> > Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > ---
> >  arch/powerpc/include/asm/mmu-hash64.h  |  7 ++++-
> >  arch/powerpc/mm/copro_fault.c          | 48 ++++++++++++++++++++++++++=
++++++++
> >  arch/powerpc/mm/slb.c                  |  3 ---
> >  arch/powerpc/platforms/cell/spu_base.c | 41 +++-----------------------=
---
> >  4 files changed, 58 insertions(+), 41 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
> > index d765144..6d0b7a2 100644
> > --- a/arch/powerpc/include/asm/mmu-hash64.h
> > +++ b/arch/powerpc/include/asm/mmu-hash64.h
> > @@ -189,7 +189,12 @@ static inline unsigned int mmu_psize_to_shift(unsigned int mmu_psize)
> >  #define LP_MASK(i)	((0xFF >> (i)) << LP_SHIFT)
> > 
> >  #ifndef __ASSEMBLY__
> > -
> > +static inline int slb_vsid_shift(int ssize)
> > +{
> > +	if (ssize == MMU_SEGSIZE_256M)
> > +		return SLB_VSID_SHIFT;
> > +	return SLB_VSID_SHIFT_1T;
> > +}
> >  static inline int segment_shift(int ssize)
> >  {
> >  	if (ssize == MMU_SEGSIZE_256M)
> > diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
> > index ba7df14..b865697 100644
> > --- a/arch/powerpc/mm/copro_fault.c
> > +++ b/arch/powerpc/mm/copro_fault.c
> > @@ -90,3 +90,51 @@ out_unlock:
> >  	return ret;
> >  }
> >  EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
> > +
> > +int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid)
> > +{
> > +	int psize, ssize;
> > +
> > +	*esid = (ea & ESID_MASK) | SLB_ESID_V;
> > +
> > +	switch (REGION_ID(ea)) {
> > +	case USER_REGION_ID:
> > +		pr_devel("copro_data_segment: 0x%llx -- USER_REGION_ID\n", ea);
> > +#ifdef CONFIG_PPC_MM_SLICES
> > +		psize = get_slice_psize(mm, ea);
> > +#else
> > +		psize = mm->context.user_psize;
> > +#endif
> > +		ssize = user_segment_size(ea);
> > +		*vsid = (get_vsid(mm->context.id, ea, ssize)
> > +			 << slb_vsid_shift(ssize)) | SLB_VSID_USER;
> > +		break;
> > +	case VMALLOC_REGION_ID:
> > +		pr_devel("copro_data_segment: 0x%llx -- VMALLOC_REGION_ID\n", ea);
> > +		if (ea < VMALLOC_END)
> > +			psize = mmu_vmalloc_psize;
> > +		else
> > +			psize = mmu_io_psize;
> > +		ssize = mmu_kernel_ssize;
> > +		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
> > +			 << SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
> 
> why not
> 		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
> 			 << slb_vsid_shift(ssize)) | SLB_VSID_KERNEL;
> 
> for vmalloc and kernel region ? We could end up using 1T segments for kernel mapping too.

Yep, but I'm going to do this in patch 10, where the other optimisations
for this live.

Mikey

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 06/17] powerpc/powernv: Split out set MSI IRQ chip code
  2014-09-30 10:34   ` Michael Neuling
@ 2014-10-02  1:57     ` Michael Ellerman
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Ellerman @ 2014-10-02  1:57 UTC (permalink / raw)
  To: Michael Neuling, greg, arnd, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

On Tue, 2014-30-09 at 10:34:55 UTC, Michael Neuling wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> Some of the MSI IRQ code in pnv_pci_ioda_msi_setup() is generically useful so
> split it out.
> 
> This will be used by some of the cxl PCIe code later.
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index df241b1..329164f 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1306,14 +1306,36 @@ static void pnv_ioda2_msi_eoi(struct irq_data *d)
>  	icp_native_eoi(d);
>  }
>  
> +
> +static void set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq)
> +{
> +	struct irq_data *idata;
> +	struct irq_chip *ichip;
> +
> +	if (phb->type != PNV_PHB_IODA2)
> +		return;
> +
> +	/*
> +	 * Change the IRQ chip for the MSI interrupts on PHB3.
> +	 * The corresponding IRQ chip should be populated for
> +	 * the first time.

Seeing as you're moving this comment, can you clarify the wording?


cheers

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 04/17] powerpc/msi: Improve IRQ bitmap allocator
  2014-10-01  7:13     ` Michael Ellerman
@ 2014-10-02  2:01       ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-10-02  2:01 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: greg, arnd, benh, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

On Wed, 2014-10-01 at 17:13 +1000, Michael Ellerman wrote:
> On Tue, 2014-30-09 at 10:34:53 UTC, Michael Neuling wrote:
> > From: Ian Munsie <imunsie@au1.ibm.com>
> > 
> > Currently msi_bitmap_alloc_hwirqs() will round up any IRQ allocation requests
>                                                                        request
> > to the nearest power of 2.  eg. ask for 5 IRQs and you'll get 8.  This wastes a
>                              ^ one space after a period, or die!
> 
> > lot of IRQs which can be a scarce resource.
> > 
> > For cxl we can require multiple IRQs for every contexts that is attached to the
>                                                  context
> > accelerator.  For AFU directed accelerators, there may be 1000s of contexts
> 
> What is an AFU directed accelerator?

From the documentation in the last patch:

AFU Models
==========

    There are two programming models supported by the AFU.  Dedicated
    and AFU directed.  An AFU may support one or both models.

    In dedicated model only one MMU context is supported.  In this
    model, only one userspace process can use the accelerator at a time.

    In AFU directed model, up to 16K simultaneous contexts can be
    supported.  This means up to 16K simultaneous userspace
    applications may use the accelerator (although specific AFUs may
    support less).  In this mode, the AFU sends a 16 bit context ID
    with each of its requests.  This tells the PSL which context is
    associated with this operation.  If the PSL can't translate a
    request, the ID can also be accessed by the kernel so it can
    determine the associated userspace context to service this
    translation with.

>                    
> > attached, hence we can easily run out of IRQs, especially if we are needlessly
> > wasting them.
> > 
> > This changes the msi_bitmap_alloc_hwirqs() to allocate only the required number
>                 x
> > of IRQs, hence avoiding this wastage.
> 
> The crucial detail you failed to mention is that you maintain the behaviour that
> allocations are naturally aligned.

ok, I'll add that.

> Can you add a check in the test code at the bottom of the file to confirm that
> please?

Yep
> 
> > diff --git a/arch/powerpc/sysdev/msi_bitmap.c b/arch/powerpc/sysdev/msi_bitmap.c
> > index 2ff6302..961a358 100644
> > --- a/arch/powerpc/sysdev/msi_bitmap.c
> > +++ b/arch/powerpc/sysdev/msi_bitmap.c
> > @@ -20,32 +20,37 @@ int msi_bitmap_alloc_hwirqs(struct msi_bitmap *bmp, int num)
> >  	int offset, order = get_count_order(num);
> >  
> >  	spin_lock_irqsave(&bmp->lock, flags);
> > -	/*
> > -	 * This is fast, but stricter than we need. We might want to add
> > -	 * a fallback routine which does a linear search with no alignment.
> > -	 */
> > -	offset = bitmap_find_free_region(bmp->bitmap, bmp->irq_count, order);
> > +
> > +	offset = bitmap_find_next_zero_area(bmp->bitmap, bmp->irq_count, 0,
> > +					    num, (1 << order) - 1);
> > +	if (offset > bmp->irq_count)
> > +		goto err;
> 
> Can we get a newline here :)

Ok.

> 
> > +	bitmap_set(bmp->bitmap, offset, num);
> >  	spin_unlock_irqrestore(&bmp->lock, flags);
> >  
> >  	pr_debug("msi_bitmap: allocated 0x%x (2^%d) at offset 0x%x\n",
> >  		 num, order, offset);
> 
> This print out is a bit confusing now, should probably just drop the order.

Arrh, yep.

Thanks,
Mikey

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 05/17] powerpc/mm: Export mmu_kernel_ssize and mmu_linear_psize
  2014-10-01  7:13     ` Michael Ellerman
@ 2014-10-02  3:13       ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-10-02  3:13 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: greg, arnd, benh, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

On Wed, 2014-10-01 at 17:13 +1000, Michael Ellerman wrote:
> On Tue, 2014-30-09 at 10:34:54 UTC, Michael Neuling wrote:
> > From: Ian Munsie <imunsie@au1.ibm.com>
> 
> Mind explaining why ? :)

Sure.

Mikey

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 08/17] powerpc/powerpc: Add new PCIe functions for allocating cxl interrupts
  2014-09-30 10:34   ` Michael Neuling
@ 2014-10-02  3:16     ` Michael Ellerman
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Ellerman @ 2014-10-02  3:16 UTC (permalink / raw)
  To: Michael Neuling, greg, arnd, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

On Tue, 2014-30-09 at 10:34:57 UTC, Michael Neuling wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> This adds a number of functions for allocating IRQs under powernv PCIe for cxl.
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 329164f..b0b96f0 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -503,6 +505,138 @@ static struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev)
>  		return NULL;
>  	return &phb->ioda.pe_array[pdn->pe_number];
>  }
> +
> +struct device_node *pnv_pci_to_phb_node(struct pci_dev *dev)
> +{
> +        struct pci_controller *hose = pci_bus_to_host(dev->bus);
> +
> +        return hose->dn;
> +}
> +EXPORT_SYMBOL(pnv_pci_to_phb_node);
> +
> +#ifdef CONFIG_CXL_BASE
> +int pnv_phb_to_cxl(struct pci_dev *dev)
> +{
> +	struct pci_controller *hose = pci_bus_to_host(dev->bus);
> +	struct pnv_phb *phb = hose->private_data;
> +	struct pnv_ioda_pe *pe;
> +	int rc;
> +
> +	if (!(pe = pnv_ioda_get_pe(dev))) {
> +		rc = -ENODEV;
> +		goto out;
> +	}

That'd be a lot simpler as:

	pe = pnv_ioda_get_pe(dev);
	if (!pe)
		return -ENODEV;

> +	pe_info(pe, "switch PHB to CXL\n");
> +	pe_info(pe, "PHB-ID  : 0x%016llx\n", phb->opal_id);
> +	pe_info(pe, "     pe : %i\n", pe->pe_number);

Spacing is a bit weird but maybe it matches something else?

> +
> +	if ((rc = opal_pci_set_phb_cxl_mode(phb->opal_id, 1, pe->pe_number)))
> +		dev_err(&dev->dev, "opal_pci_set_phb_cxl_mode failed: %i\n", rc);

Again why not:

	rc = opal_pci_set_phb_cxl_mode(phb->opal_id, 1, pe->pe_number);
	if (rc)
		dev_err(&dev->dev, "opal_pci_set_phb_cxl_mode failed: %i\n", rc);

> +out:
> +	return rc;
> +}
> +EXPORT_SYMBOL(pnv_phb_to_cxl);
> +

> +int pnv_cxl_alloc_hwirq_ranges(struct cxl_irq_ranges *irqs,
> +			       struct pci_dev *dev, int num)

This could use some documentation.

It seems that it allocates num irqs in some number of ranges, up to
CXL_IRQ_RANGES?

> +{
> +	struct pci_controller *hose = pci_bus_to_host(dev->bus);
> +	struct pnv_phb *phb = hose->private_data;
> +	int range = 0;

You reinitialise to 1 below?

> +	int hwirq;
> +	int try;

So these can be:

	int hwirq, try, range;

> +	memset(irqs, 0, sizeof(struct cxl_irq_ranges));
> +
> +	for (range = 1; range < CXL_IRQ_RANGES && num; range++) {

I think this would be clearer if range was just called "i" as usual.

Why does it start at 1 ?

> +		try = num;
> +		while (try) {
> +			hwirq = msi_bitmap_alloc_hwirqs(&phb->msi_bmp, try);
> +			if (hwirq >= 0)
> +				break;
> +			try /= 2;
> +		}
> +		if (!try)
> +			goto fail;
> +
> +		irqs->offset[range] = phb->msi_base + hwirq;
> +		irqs->range[range] = try;

irqs->range is irq_hw_number_t but looks like it should just be uint.

> +		pr_devel("cxl alloc irq range 0x%x: offset: 0x%lx  limit: %li\n",
> +			 range, irqs->offset[range], irqs->range[range]);
> +		num -= try;
> +	}
> +	if (num)
> +		goto fail;
> +
> +	return 0;
> +fail:
> +	for (range--; range >= 0; range--) {
> +		hwirq = irqs->offset[range] - phb->msi_base;
> +		msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq,
> +				       irqs->range[range]);
> +		irqs->range[range] = 0;
> +	}

Because you zero ranges at the top I think you can replace all of the fail
logic with a call to pnv_cxl_release_hwirq_ranges().


> +	return -ENOSPC;
> +}
> +EXPORT_SYMBOL(pnv_cxl_alloc_hwirq_ranges);
> +
> +void pnv_cxl_release_hwirq_ranges(struct cxl_irq_ranges *irqs,
> +				  struct pci_dev *dev)
> +{
> +	struct pci_controller *hose = pci_bus_to_host(dev->bus);
> +	struct pnv_phb *phb = hose->private_data;
> +	int range = 0;

Unnecessary init again.

> +	int hwirq;
> +
> +	for (range = 0; range < 4; range++) {

Shouldn't 4 be CXL_IRQ_RANGES ?

> +		hwirq = irqs->offset[range] - phb->msi_base;

That should be inside the if.

Or better do:
		if (!irqs->range[range])
			continue;
		...

> +		if (irqs->range[range]) {
> +			pr_devel("cxl release irq range 0x%x: offset: 0x%lx  limit: %ld\n",
> +				 range, irqs->offset[range],
> +				 irqs->range[range]);
> +			msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq,
> +					       irqs->range[range]);
> +		}
> +	}
> +}
> +EXPORT_SYMBOL(pnv_cxl_release_hwirq_ranges);
> +
> +int pnv_cxl_get_irq_count(struct pci_dev *dev)
> +{
> +	struct pci_controller *hose = pci_bus_to_host(dev->bus);
> +        struct pnv_phb *phb = hose->private_data;

Indentation is fubar.

> +	return phb->msi_bmp.irq_count;
> +}
> +EXPORT_SYMBOL(pnv_cxl_get_irq_count);
> +
> +#endif /* CONFIG_CXL_BASE */
>  #endif /* CONFIG_PCI_MSI */
>  
>  static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
> @@ -1330,6 +1464,33 @@ static void set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq)
>  	irq_set_chip(virq, &phb->ioda.irq_chip);
>  }
>  
> +#ifdef CONFIG_CXL_BASE

Why is this here and not in the previous #ifdef CONFIG_CXL_BASE block ?

> +int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
> +			   unsigned int virq)
> +{
> +	struct pci_controller *hose = pci_bus_to_host(dev->bus);
> +	struct pnv_phb *phb = hose->private_data;
> +	unsigned int xive_num = hwirq - phb->msi_base;
> +	struct pnv_ioda_pe *pe;
> +	int rc;
> +
> +	if (!(pe = pnv_ioda_get_pe(dev)))
> +		return -ENODEV;
> +
> +	/* Assign XIVE to PE */
> +	rc = opal_pci_set_xive_pe(phb->opal_id, pe->pe_number, xive_num);
> +	if (rc) {
> +		pr_warn("%s: OPAL error %d setting msi_base 0x%x hwirq 0x%x XIVE 0x%x PE\n",
> +			pci_name(dev), rc, phb->msi_base, hwirq, xive_num);

dev_warn() ?

cheers

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 07/17] cxl: Add new header for call backs and structs
  2014-10-01 12:00     ` Michael Ellerman
@ 2014-10-02  3:37       ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-10-02  3:37 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: greg, arnd, benh, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

On Wed, 2014-10-01 at 22:00 +1000, Michael Ellerman wrote:
> On Tue, 2014-30-09 at 10:34:56 UTC, Michael Neuling wrote:
> > From: Ian Munsie <imunsie@au1.ibm.com>
> > 
> > This new header add defines for callbacks and structs needed by the rest of the
>                   adds
> > kernel to hook into the cxl infrastructure.
> > 
> > Empty functions are provided when CONFIG CXL_BASE is not enabled.
> > 
> > Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > ---
> >  include/misc/cxl.h | 34 ++++++++++++++++++++++++++++++++++
> 
> include/misc is kind of weird. I guess it's a misc device.
> 
> Any reason not to have it in arch/powerpc/include ?

We can do either way.  We did consider it a driver so putting it in
arch/powerpc didn't seem quite right.

I might leave it here unless you really object.

> 
> > diff --git a/include/misc/cxl.h b/include/misc/cxl.h
> > new file mode 100644
> > index 0000000..bde46a3
> > --- /dev/null
> > +++ b/include/misc/cxl.h
> > @@ -0,0 +1,34 @@
> > +/*
> > + * Copyright 2014 IBM Corp.
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version
> > + * 2 of the License, or (at your option) any later version.
> > + */
> > +
> > +#ifndef _MISC_ASM_CXL_H
> 
> No ASM.

Oops, yes.

> 
> > +#define _MISC_ASM_CXL_H
> > +
> > +#define CXL_IRQ_RANGES 4
> > +
> > +struct cxl_irq_ranges {
> > +	irq_hw_number_t offset[CXL_IRQ_RANGES];
> > +	irq_hw_number_t range[CXL_IRQ_RANGES];
> > +};
> > +
> > +#ifdef CONFIG_CXL_BASE
> > +
> > +void cxl_slbia(struct mm_struct *mm);
> > +void cxl_ctx_get(void);
> > +void cxl_ctx_put(void);
> > +bool cxl_ctx_in_use(void);
> > +
> > +#else /* CONFIG_CXL_BASE */
> > +
> > +#define cxl_slbia(...) do { } while (0)
> > +#define cxl_ctx_in_use(...) false
> 
> Any reason these shouldn't be static inlines?

No, I'll change.

Mikey

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 13/17] cxl: Add base builtin support
  2014-10-01 12:00     ` Michael Ellerman
@ 2014-10-02  3:43       ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-10-02  3:43 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: greg, arnd, benh, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

On Wed, 2014-10-01 at 22:00 +1000, Michael Ellerman wrote:
> On Tue, 2014-30-09 at 10:35:02 UTC, Michael Neuling wrote:
> > This also adds the cxl_ctx_in_use() function for use in the mm code to see if
> > any cxl contexts are currently in use.  This is used by the tlbie() to
> > determine if it can do local TLB invalidations or not.  This also adds get/put
> > calls for the cxl driver module to refcount the active cxl contexts.
> 
> > diff --git a/drivers/misc/cxl/base.c b/drivers/misc/cxl/base.c
> > new file mode 100644
> > index 0000000..f4cbcfb
> > --- /dev/null
> > +++ b/drivers/misc/cxl/base.c
> > @@ -0,0 +1,102 @@
> > +/*
> > + * Copyright 2014 IBM Corp.
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version
> > + * 2 of the License, or (at your option) any later version.
> > + */
> > +
> > +#include <linux/module.h>
> > +#include <linux/rcupdate.h>
> > +#include <asm/errno.h>
> > +#include <misc/cxl.h>
> > +#include "cxl.h"
> > +
> > +/* protected by rcu */
> > +static struct cxl_calls *cxl_calls;
> > +
> > +static atomic_t use_count = ATOMIC_INIT(0);
> ...
> 
> > +void cxl_ctx_get(void)
> > +{
> > +	atomic_inc(&use_count);
> > +}
> > +EXPORT_SYMBOL(cxl_ctx_get);
> > +
> > +void cxl_ctx_put(void)
> > +{
> > +	atomic_dec(&use_count);
> > +}
> > +EXPORT_SYMBOL(cxl_ctx_put);
> > +
> > +bool cxl_ctx_in_use(void)
> > +{
> > +	return (atomic_read(&use_count) != 0);
> > +}
> > +EXPORT_SYMBOL(cxl_ctx_in_use);
> 
> So as written this results in a function call for every tlbie(), even when no
> one has ever used a CAPI adapter, or when none are even in the machine.

Yep.

> I think the patch below is a better trade off. It makes the use_count global,
> but that's not a biggy. The benefit is that the additional code in tlbie()
> becomes:
> 
> 	ld      r10,-29112(r2)
> 	lwz     r10,0(r10)
> 	cmpwi   cr7,r10,0
> 
> Which is about as good as it can get.

Nice.. I'll add.  Thanks.

Mikey

> 
> cheers
> 
> 
> diff --git a/drivers/misc/cxl/base.c b/drivers/misc/cxl/base.c
> index f4cbcfbd8dbc..4401d1c2dd33 100644
> --- a/drivers/misc/cxl/base.c
> +++ b/drivers/misc/cxl/base.c
> @@ -16,7 +16,7 @@
>  /* protected by rcu */
>  static struct cxl_calls *cxl_calls;
>  
> -static atomic_t use_count = ATOMIC_INIT(0);
> +atomic_t cxl_use_count = ATOMIC_INIT(0);
>  
>  #ifdef CONFIG_CXL_MODULE
>  
> @@ -65,24 +65,6 @@ void cxl_slbia(struct mm_struct *mm)
>  }
>  EXPORT_SYMBOL(cxl_slbia);
>  
> -void cxl_ctx_get(void)
> -{
> -	atomic_inc(&use_count);
> -}
> -EXPORT_SYMBOL(cxl_ctx_get);
> -
> -void cxl_ctx_put(void)
> -{
> -	atomic_dec(&use_count);
> -}
> -EXPORT_SYMBOL(cxl_ctx_put);
> -
> -bool cxl_ctx_in_use(void)
> -{
> -	return (atomic_read(&use_count) != 0);
> -}
> -EXPORT_SYMBOL(cxl_ctx_in_use);
> -
>  int register_cxl_calls(struct cxl_calls *calls)
>  {
>  	if (cxl_calls)
> diff --git a/include/misc/cxl.h b/include/misc/cxl.h
> index bde46a330881..6e43dca6a792 100644
> --- a/include/misc/cxl.h
> +++ b/include/misc/cxl.h
> @@ -18,12 +18,24 @@ struct cxl_irq_ranges {
>  };
>  
>  #ifdef CONFIG_CXL_BASE
> +extern atomic_t cxl_use_count;
>  
>  void cxl_slbia(struct mm_struct *mm);
> -void cxl_ctx_get(void);
> -void cxl_ctx_put(void);
> -bool cxl_ctx_in_use(void);
>  
> +static inline bool cxl_ctx_in_use(void)
> +{
> +	return (atomic_read(&cxl_use_count) != 0);
> +}
> +
> +static inline void cxl_ctx_get(void)
> +{
> +	atomic_inc(&cxl_use_count);
> +}
> +
> +static inline void cxl_ctx_put(void)
> +{
> +	atomic_dec(&cxl_use_count);
> +}
>  #else /* CONFIG_CXL_BASE */
>  
>  #define cxl_slbia(...) do { } while (0)
> 


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 09/17] powerpc/mm: Add new hash_page_mm()
  2014-09-30 10:34   ` Michael Neuling
@ 2014-10-02  3:48     ` Michael Ellerman
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Ellerman @ 2014-10-02  3:48 UTC (permalink / raw)
  To: Michael Neuling, greg, arnd, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

On Tue, 2014-30-09 at 10:34:58 UTC, Michael Neuling wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> This adds a new function hash_page_mm() based on the existing hash_page().
> This version allows any struct mm to be passed in, rather than assuming
> current.  This is useful for servicing co-processor faults which are not in the
> context of the current running process.

I'm not a big fan. hash_page() is already a train wreck, and this doesn't make
it any better.

> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index bbdb054..0a5c8c0 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -904,7 +904,7 @@ void demote_segment_4k(struct mm_struct *mm, unsigned long addr)
>  		return;
>  	slice_set_range_psize(mm, addr, 1, MMU_PAGE_4K);
>  	copro_flush_all_slbs(mm);
> -	if (get_paca_psize(addr) != MMU_PAGE_4K) {
> +	if ((get_paca_psize(addr) != MMU_PAGE_4K) && (current->mm == mm)) {
>  		get_paca()->context = mm->context;
>  		slb_flush_and_rebolt();

This is a bit fishy.

If that mm is currently running on another cpu you just failed to update its
paca. But I think the call to check_paca_psize() in hash_page() will save you
on that cpu.

In fact we might be able to remove that synchronisation from
demote_segment_4k() and always leave it up to check_paca_psize()?

> @@ -989,26 +989,24 @@ static void check_paca_psize(unsigned long ea, struct mm_struct *mm,
>   * -1 - critical hash insertion error
>   * -2 - access not permitted by subpage protection mechanism
>   */
> -int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> +int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, unsigned long trap)
>  {
>  	enum ctx_state prev_state = exception_enter();
>  	pgd_t *pgdir;
>  	unsigned long vsid;
> -	struct mm_struct *mm;
>  	pte_t *ptep;
>  	unsigned hugeshift;
>  	const struct cpumask *tmp;
>  	int rc, user_region = 0, local = 0;
>  	int psize, ssize;
>  
> -	DBG_LOW("hash_page(ea=%016lx, access=%lx, trap=%lx\n",
> -		ea, access, trap);
> +	DBG_LOW("%s(ea=%016lx, access=%lx, trap=%lx\n",
> +		__func__, ea, access, trap);
>  
>  	/* Get region & vsid */
>   	switch (REGION_ID(ea)) {
>  	case USER_REGION_ID:
>  		user_region = 1;
> -		mm = current->mm;
>  		if (! mm) {
>  			DBG_LOW(" user region with no mm !\n");
>  			rc = 1;

What about the VMALLOC case where we do:
		mm = &init_mm;
		
Is that what you want? It seems odd that you pass an mm to the routine, but
then potentially it ends up using a different mm after all depending on the
address.


cheers

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 06/17] powerpc/powernv: Split out set MSI IRQ chip code
  2014-10-02  1:57     ` Michael Ellerman
@ 2014-10-02  5:22       ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-10-02  5:22 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: greg, arnd, benh, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

On Thu, 2014-10-02 at 11:57 +1000, Michael Ellerman wrote:
> On Tue, 2014-30-09 at 10:34:55 UTC, Michael Neuling wrote:
> > From: Ian Munsie <imunsie@au1.ibm.com>
> > 
> > Some of the MSI IRQ code in pnv_pci_ioda_msi_setup() is generically useful so
> > split it out.
> > 
> > This will be used by some of the cxl PCIe code later.
> > 
> > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> > index df241b1..329164f 100644
> > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > @@ -1306,14 +1306,36 @@ static void pnv_ioda2_msi_eoi(struct irq_data *d)
> >  	icp_native_eoi(d);
> >  }
> >  
> > +
> > +static void set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq)
> > +{
> > +	struct irq_data *idata;
> > +	struct irq_chip *ichip;
> > +
> > +	if (phb->type != PNV_PHB_IODA2)
> > +		return;
> > +
> > +	/*
> > +	 * Change the IRQ chip for the MSI interrupts on PHB3.
> > +	 * The corresponding IRQ chip should be populated for
> > +	 * the first time.
> 
> Seeing as you're moving this comment can you clarify the wording.

Ok.

Mikey

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 15/17] cxl: Userspace header file.
  2014-09-30 10:35   ` Michael Neuling
@ 2014-10-02  6:02     ` Michael Ellerman
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Ellerman @ 2014-10-02  6:02 UTC (permalink / raw)
  To: Michael Neuling, greg, arnd, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

On Tue, 2014-30-09 at 10:35:04 UTC, Michael Neuling wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> This defines structs and magic numbers required for userspace to interact with
> the kernel cxl driver via /dev/cxl/afu0.0.
> 
> diff --git a/include/uapi/misc/cxl.h b/include/uapi/misc/cxl.h
> new file mode 100644
> index 0000000..6a394b5
> --- /dev/null
> +++ b/include/uapi/misc/cxl.h
> @@ -0,0 +1,88 @@
> +/*
> + * Copyright 2014 IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#ifndef _UAPI_ASM_CXL_H
> +#define _UAPI_ASM_CXL_H
> +
> +#include <linux/types.h>
> +#include <linux/ioctl.h>
> +
> +/* ioctls */
> +struct cxl_ioctl_start_work {
> +	__u64 wed;
> +	__u64 amr;
> +	__u64 reserved1;
> +	__u32 reserved2;
> +	__s16 num_interrupts; /* -1 = use value from afu descriptor */
> +	__u16 process_element; /* returned from kernel */
> +	__u64 reserved3;
> +	__u64 reserved4;
> +	__u64 reserved5;
> +	__u64 reserved6;

Why so many reserved fields?

What mechanism is there that will allow you to ever unreserve them?

ie. how does a new userspace detect that the kernel it's running on supports
new fields?

Or conversely how does a new kernel detect that userspace has passed it a
meaningful value in one of the previously reserved fields?

> +#define CXL_MAGIC 0xCA
> +#define CXL_IOCTL_START_WORK      _IOWR(CXL_MAGIC, 0x00, struct cxl_ioctl_start_work)

What happened to 0x1 ?

> +#define CXL_IOCTL_CHECK_ERROR     _IO(CXL_MAGIC,   0x02)
> +
> +/* events from read() */
> +
> +enum cxl_event_type {
> +	CXL_EVENT_READ_FAIL     = -1,

I don't see this used?

> +	CXL_EVENT_RESERVED      = 0,
> +	CXL_EVENT_AFU_INTERRUPT = 1,
> +	CXL_EVENT_DATA_STORAGE  = 2,
> +	CXL_EVENT_AFU_ERROR     = 3,
> +};
> +
> +struct cxl_event_header {
> +	__u32 type;
> +	__u16 size;
> +	__u16 process_element;
> +	__u64 reserved1;
> +	__u64 reserved2;
> +	__u64 reserved3;
> +};

Again lots of reserved fields?

> +struct cxl_event_afu_interrupt {
> +	struct cxl_event_header header;
> +	__u16 irq; /* Raised AFU interrupt number */
> +	__u16 reserved1;
> +	__u32 reserved2;
> +	__u64 reserved3;
> +	__u64 reserved4;
> +	__u64 reserved5;
> +};
> +
> +struct cxl_event_data_storage {
> +	struct cxl_event_header header;
> +	__u64 addr;
> +	__u64 reserved1;
> +	__u64 reserved2;
> +	__u64 reserved3;
> +};
> +
> +struct cxl_event_afu_error {
> +	struct cxl_event_header header;
> +	__u64 err;
> +	__u64 reserved1;
> +	__u64 reserved2;
> +	__u64 reserved3;
> +};
> +
> +struct cxl_event {
> +	union {
> +		struct cxl_event_header header;
> +		struct cxl_event_afu_interrupt irq;
> +		struct cxl_event_data_storage fault;
> +		struct cxl_event_afu_error afu_err;
> +	};
> +};

Rather than having the header included in every event, would it be clearer if
the cxl_event was:

struct cxl_event {
	struct cxl_event_header header;
	union {
		struct cxl_event_afu_interrupt irq;
		struct cxl_event_data_storage fault;
		struct cxl_event_afu_error afu_err;
	};
};

cheers

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 08/17] powerpc/powerpc: Add new PCIe functions for allocating cxl interrupts
  2014-10-02  3:16     ` Michael Ellerman
@ 2014-10-02  6:09       ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-10-02  6:09 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: greg, arnd, benh, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

On Thu, 2014-10-02 at 13:16 +1000, Michael Ellerman wrote:
> On Tue, 2014-30-09 at 10:34:57 UTC, Michael Neuling wrote:
> > From: Ian Munsie <imunsie@au1.ibm.com>
> > 
> > This adds a number of functions for allocating IRQs under powernv PCIe for cxl.
> > 
> > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> > index 329164f..b0b96f0 100644
> > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > @@ -503,6 +505,138 @@ static struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev)
> >  		return NULL;
> >  	return &phb->ioda.pe_array[pdn->pe_number];
> >  }
> > +
> > +struct device_node *pnv_pci_to_phb_node(struct pci_dev *dev)
> > +{
> > +        struct pci_controller *hose = pci_bus_to_host(dev->bus);
> > +
> > +        return hose->dn;
> > +}
> > +EXPORT_SYMBOL(pnv_pci_to_phb_node);
> > +
> > +#ifdef CONFIG_CXL_BASE
> > +int pnv_phb_to_cxl(struct pci_dev *dev)
> > +{
> > +	struct pci_controller *hose = pci_bus_to_host(dev->bus);
> > +	struct pnv_phb *phb = hose->private_data;
> > +	struct pnv_ioda_pe *pe;
> > +	int rc;
> > +
> > +	if (!(pe = pnv_ioda_get_pe(dev))) {
> > +		rc = -ENODEV;
> > +		goto out;
> > +	}
> 
> That'd be a lot simpler as:
> 
> 	pe = pnv_ioda_get_pe(dev);
> 	if (!pe)
> 		return -ENODEV;

OK

> > +	pe_info(pe, "switch PHB to CXL\n");
> > +	pe_info(pe, "PHB-ID  : 0x%016llx\n", phb->opal_id);
> > +	pe_info(pe, "     pe : %i\n", pe->pe_number);
> 
> Spacing is a bit weird but maybe it matches something else?

Actually, we switched this to pe_info() based on one of Gavin's reviews,
so the pe_number and opal_id being printed here are not needed anymore.
I'm simplifying this into one line.

	pe_info(pe, "Switching PHB to CXL\n");


> 
> > +
> > +	if ((rc = opal_pci_set_phb_cxl_mode(phb->opal_id, 1, pe->pe_number)))
> > +		dev_err(&dev->dev, "opal_pci_set_phb_cxl_mode failed: %i\n", rc);
> 
> Again why not:
> 
> 	rc = opal_pci_set_phb_cxl_mode(phb->opal_id, 1, pe->pe_number);
> 	if (rc)
> 		dev_err(&dev->dev, "opal_pci_set_phb_cxl_mode failed: %i\n", rc);

Ok.

> > +out:
> > +	return rc;
> > +}
> > +EXPORT_SYMBOL(pnv_phb_to_cxl);
> > +
> 
> > +int pnv_cxl_alloc_hwirq_ranges(struct cxl_irq_ranges *irqs,
> > +			       struct pci_dev *dev, int num)
> 
> This could use some documentation.
> 
> It seems to be that it allocates num irqs in some number of ranges, up to
> CXL_IRQ_RANGES?

OK

> 
> > +{
> > +	struct pci_controller *hose = pci_bus_to_host(dev->bus);
> > +	struct pnv_phb *phb = hose->private_data;
> > +	int range = 0;
> 
> You reinitialise to 1 below?

Oops

> 
> > +	int hwirq;
> > +	int try;
> 
> So these can be:
> 
> 	int hwirq, try, range;
> 
> > +	memset(irqs, 0, sizeof(struct cxl_irq_ranges));
> > +
> > +	for (range = 1; range < CXL_IRQ_RANGES && num; range++) {
> 
> I think this would be clearer if range was just called "i" as usual.

OK

> Why does it start at 1 ?

0 is used by the data storage interrupt. I'll add a comment to clarify.

> 
> > +		try = num;
> > +		while (try) {
> > +			hwirq = msi_bitmap_alloc_hwirqs(&phb->msi_bmp, try);
> > +			if (hwirq >= 0)
> > +				break;
> > +			try /= 2;
> > +		}
> > +		if (!try)
> > +			goto fail;
> > +
> > +		irqs->offset[range] = phb->msi_base + hwirq;
> > +		irqs->range[range] = try;
> 
> irqs->range is irq_hw_number_t but looks like it should just be uint.
> 
> > +		pr_devel("cxl alloc irq range 0x%x: offset: 0x%lx  limit: %li\n",
> > +			 range, irqs->offset[range], irqs->range[range]);
> > +		num -= try;
> > +	}
> > +	if (num)
> > +		goto fail;
> > +
> > +	return 0;
> > +fail:
> > +	for (range--; range >= 0; range--) {
> > +		hwirq = irqs->offset[range] - phb->msi_base;
> > +		msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq,
> > +				       irqs->range[range]);
> > +		irqs->range[range] = 0;
> > +	}
> 
> Because you zero ranges at the top I think you can replace all of the fail
> logic with a call to pnv_cxl_release_hwirq_ranges().

Nice.  Will change.

> 
> 
> > +	return -ENOSPC;
> > +}
> > +EXPORT_SYMBOL(pnv_cxl_alloc_hwirq_ranges);
> > +
> > +void pnv_cxl_release_hwirq_ranges(struct cxl_irq_ranges *irqs,
> > +				  struct pci_dev *dev)
> > +{
> > +	struct pci_controller *hose = pci_bus_to_host(dev->bus);
> > +	struct pnv_phb *phb = hose->private_data;
> > +	int range = 0;
> 
> Unnecessary init again.

Yep. I'll change to 'i' too.

> > +	int hwirq;
> > +
> > +	for (range = 0; range < 4; range++) {
> 
> Shouldn't 4 be CXL_IRQ_RANGES ?

Yep.

> 
> > +		hwirq = irqs->offset[range] - phb->msi_base;
> 
> That should be inside the if.

Yep.

> 
> Or better do:
> 		if (!irqs->range[range])
> 			continue;
> 		...

Nice.

> 
> > +		if (irqs->range[range]) {
> > +			pr_devel("cxl release irq range 0x%x: offset: 0x%lx  limit: %ld\n",
> > +				 range, irqs->offset[range],
> > +				 irqs->range[range]);
> > +			msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq,
> > +					       irqs->range[range]);
> > +		}
> > +	}
> > +}
> > +EXPORT_SYMBOL(pnv_cxl_release_hwirq_ranges);
> > +
> > +int pnv_cxl_get_irq_count(struct pci_dev *dev)
> > +{
> > +	struct pci_controller *hose = pci_bus_to_host(dev->bus);
> > +        struct pnv_phb *phb = hose->private_data;
> 
> Indentation is fubar.

OK

> 
> > +	return phb->msi_bmp.irq_count;
> > +}
> > +EXPORT_SYMBOL(pnv_cxl_get_irq_count);
> > +
> > +#endif /* CONFIG_CXL_BASE */
> >  #endif /* CONFIG_PCI_MSI */
> >  
> >  static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
> > @@ -1330,6 +1464,33 @@ static void set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq)
> >  	irq_set_chip(virq, &phb->ioda.irq_chip);
> >  }
> >  
> > +#ifdef CONFIG_CXL_BASE
> 
> Why is this here and not in the previous #ifdef CONFIG_CXL_BASE block ?

I can actually move the rest of the cxl code down here too.  So I'll do
that.

> 
> > +int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
> > +			   unsigned int virq)
> > +{
> > +	struct pci_controller *hose = pci_bus_to_host(dev->bus);
> > +	struct pnv_phb *phb = hose->private_data;
> > +	unsigned int xive_num = hwirq - phb->msi_base;
> > +	struct pnv_ioda_pe *pe;
> > +	int rc;
> > +
> > +	if (!(pe = pnv_ioda_get_pe(dev)))
> > +		return -ENODEV;
> > +
> > +	/* Assign XIVE to PE */
> > +	rc = opal_pci_set_xive_pe(phb->opal_id, pe->pe_number, xive_num);
> > +	if (rc) {
> > +		pr_warn("%s: OPAL error %d setting msi_base 0x%x hwirq 0x%x XIVE 0x%x PE\n",
> > +			pci_name(dev), rc, phb->msi_base, hwirq, xive_num);
> 
> dev_warn() ?

I'm going to move it to the pe_warn() we have here.

Cheers,
Mikey

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 08/17] powerpc/powerpc: Add new PCIe functions for allocating cxl interrupts
@ 2014-10-02  6:09       ` Michael Neuling
  0 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-10-02  6:09 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: cbe-oss-dev, arnd, Aneesh Kumar K.V, greg, linux-kernel, imunsie,
	linuxppc-dev, anton, jk

On Thu, 2014-10-02 at 13:16 +1000, Michael Ellerman wrote:
> On Tue, 2014-30-09 at 10:34:57 UTC, Michael Neuling wrote:
> > From: Ian Munsie <imunsie@au1.ibm.com>
> >=20
> > This adds a number of functions for allocating IRQs under powernv PCIe =
for cxl.
> >=20
> > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/p=
latforms/powernv/pci-ioda.c
> > index 329164f..b0b96f0 100644
> > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > @@ -503,6 +505,138 @@ static struct pnv_ioda_pe *pnv_ioda_get_pe(struct=
 pci_dev *dev)
> >  		return NULL;
> >  	return &phb->ioda.pe_array[pdn->pe_number];
> >  }
> > +
> > +struct device_node *pnv_pci_to_phb_node(struct pci_dev *dev)
> > +{
> > +        struct pci_controller *hose =3D pci_bus_to_host(dev->bus);
> > +
> > +        return hose->dn;
> > +}
> > +EXPORT_SYMBOL(pnv_pci_to_phb_node);
> > +
> > +#ifdef CONFIG_CXL_BASE
> > +int pnv_phb_to_cxl(struct pci_dev *dev)
> > +{
> > +	struct pci_controller *hose =3D pci_bus_to_host(dev->bus);
> > +	struct pnv_phb *phb =3D hose->private_data;
> > +	struct pnv_ioda_pe *pe;
> > +	int rc;
> > +
> > +	if (!(pe =3D pnv_ioda_get_pe(dev))) {
> > +		rc =3D -ENODEV;
> > +		goto out;
> > +	}
>=20
> That'd be a lot simpler as:
>=20
> 	pe =3D pnv_ioda_get_pe(dev);
> 	if (!pe)
> 		return -ENODEV;

OK

> > +	pe_info(pe, "switch PHB to CXL\n");
> > +	pe_info(pe, "PHB-ID  : 0x%016llx\n", phb->opal_id);
> > +	pe_info(pe, "     pe : %i\n", pe->pe_number);
>=20
> Spacing is a bit weird but maybe it matches something else?

Actually, we switched this to pe_info() based on one of Gavin's reviews,
so the pe_number and opal_id being printed here are not needed anymore.
I'm simplifying this into one line.

	pe_info(pe, "Switching PHB to CXL\n");


>=20
> > +
> > +	if ((rc =3D opal_pci_set_phb_cxl_mode(phb->opal_id, 1, pe->pe_number)=
))
> > +		dev_err(&dev->dev, "opal_pci_set_phb_cxl_mode failed: %i\n", rc);
>=20
> Again why not:
>=20
> 	rc =3D opal_pci_set_phb_cxl_mode(phb->opal_id, 1, pe->pe_number);
> 	if (rc)
> 		dev_err(&dev->dev, "opal_pci_set_phb_cxl_mode failed: %i\n", rc);

Ok.

> > +out:
> > +	return rc;
> > +}
> > +EXPORT_SYMBOL(pnv_phb_to_cxl);
> > +
> 
> > +int pnv_cxl_alloc_hwirq_ranges(struct cxl_irq_ranges *irqs,
> > +			       struct pci_dev *dev, int num)
> 
> This could use some documentation.
> 
> It seems to be that it allocates num irqs in some number of ranges, up to
> CXL_IRQ_RANGES?

OK

> 
> > +{
> > +	struct pci_controller *hose = pci_bus_to_host(dev->bus);
> > +	struct pnv_phb *phb = hose->private_data;
> > +	int range = 0;
> 
> You reinitialise to 1 below?

Oops

> 
> > +	int hwirq;
> > +	int try;
> 
> So these can be:
> 
> 	int hwirq, try, range;
> 
> > +	memset(irqs, 0, sizeof(struct cxl_irq_ranges));
> > +
> > +	for (range = 1; range < CXL_IRQ_RANGES && num; range++) {
> 
> I think this would be clearer if range was just called "i" as usual.

OK

> Why does it start at 1 ?

0 is used by the data storage interrupt. I'll add a comment to clarify.

> 
> > +		try = num;
> > +		while (try) {
> > +			hwirq = msi_bitmap_alloc_hwirqs(&phb->msi_bmp, try);
> > +			if (hwirq >= 0)
> > +				break;
> > +			try /= 2;
> > +		}
> > +		if (!try)
> > +			goto fail;
> > +
> > +		irqs->offset[range] = phb->msi_base + hwirq;
> > +		irqs->range[range] = try;
> 
> irqs->range is irq_hw_number_t but looks like it should just be uint.
> 
> > +		pr_devel("cxl alloc irq range 0x%x: offset: 0x%lx  limit: %li\n",
> > +			 range, irqs->offset[range], irqs->range[range]);
> > +		num -= try;
> > +	}
> > +	if (num)
> > +		goto fail;
> > +
> > +	return 0;
> > +fail:
> > +	for (range--; range >= 0; range--) {
> > +		hwirq = irqs->offset[range] - phb->msi_base;
> > +		msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq,
> > +				       irqs->range[range]);
> > +		irqs->range[range] = 0;
> > +	}
> 
> Because you zero ranges at the top I think you can replace all of the fail
> logic with a call to pnv_cxl_release_hwirq_ranges().

Nice.  Will change.

> 
> 
> > +	return -ENOSPC;
> > +}
> > +EXPORT_SYMBOL(pnv_cxl_alloc_hwirq_ranges);
> > +
> > +void pnv_cxl_release_hwirq_ranges(struct cxl_irq_ranges *irqs,
> > +				  struct pci_dev *dev)
> > +{
> > +	struct pci_controller *hose = pci_bus_to_host(dev->bus);
> > +	struct pnv_phb *phb = hose->private_data;
> > +	int range = 0;
> 
> Unnecessary init again.

Yep. I'll change to 'i' too.

> > +	int hwirq;
> > +
> > +	for (range = 0; range < 4; range++) {
> 
> Shouldn't 4 be CXL_IRQ_RANGES ?

Yep.

> 
> > +		hwirq = irqs->offset[range] - phb->msi_base;
> 
> That should be inside the if.

Yep.

> 
> Or better do:
> 		if (!irqs->range[range])
> 			continue;
> 		...

Nice.

> 
> > +		if (irqs->range[range]) {
> > +			pr_devel("cxl release irq range 0x%x: offset: 0x%lx  limit: %ld\n",
> > +				 range, irqs->offset[range],
> > +				 irqs->range[range]);
> > +			msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq,
> > +					       irqs->range[range]);
> > +		}
> > +	}
> > +}
> > +EXPORT_SYMBOL(pnv_cxl_release_hwirq_ranges);
> > +
> > +int pnv_cxl_get_irq_count(struct pci_dev *dev)
> > +{
> > +	struct pci_controller *hose = pci_bus_to_host(dev->bus);
> > +        struct pnv_phb *phb = hose->private_data;
> 
> Indentation is fubar.

OK

> 
> > +	return phb->msi_bmp.irq_count;
> > +}
> > +EXPORT_SYMBOL(pnv_cxl_get_irq_count);
> > +
> > +#endif /* CONFIG_CXL_BASE */
> >  #endif /* CONFIG_PCI_MSI */
> >  
> >  static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
> > @@ -1330,6 +1464,33 @@ static void set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq)
> >  	irq_set_chip(virq, &phb->ioda.irq_chip);
> >  }
> >  
> > +#ifdef CONFIG_CXL_BASE
> 
> Why is this here and not in the previous #ifdef CONFIG_CXL_BASE block ?

I can actually move the rest of the cxl code down here too.  So I'll do
that.

> 
> > +int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
> > +			   unsigned int virq)
> > +{
> > +	struct pci_controller *hose = pci_bus_to_host(dev->bus);
> > +	struct pnv_phb *phb = hose->private_data;
> > +	unsigned int xive_num = hwirq - phb->msi_base;
> > +	struct pnv_ioda_pe *pe;
> > +	int rc;
> > +
> > +	if (!(pe = pnv_ioda_get_pe(dev)))
> > +		return -ENODEV;
> > +
> > +	/* Assign XIVE to PE */
> > +	rc = opal_pci_set_xive_pe(phb->opal_id, pe->pe_number, xive_num);
> > +	if (rc) {
> > +		pr_warn("%s: OPAL error %d setting msi_base 0x%x hwirq 0x%x XIVE 0x%x PE\n",
> > +			pci_name(dev), rc, phb->msi_base, hwirq, xive_num);
> 
> dev_warn() ?

I'm going to move it to the pe_warn() we have here.

Cheers,
Mikey

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 10/17] powerpc/mm: Merge vsid calculation in hash_page() and copro_data_segment()
  2014-10-01  9:55     ` Aneesh Kumar K.V
@ 2014-10-02  6:44       ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-10-02  6:44 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: greg, arnd, mpe, benh, anton, linux-kernel, linuxppc-dev, jk,
	imunsie, cbe-oss-dev

On Wed, 2014-10-01 at 15:25 +0530, Aneesh Kumar K.V wrote:
> Michael Neuling <mikey@neuling.org> writes:
> 
> > From: Ian Munsie <imunsie@au1.ibm.com>
> >
> > The vsid calculation between hash_page() and copro_data_segment() are very
> > similar.  This merges these two different versions.
> >
> > Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > ---
> >  arch/powerpc/include/asm/mmu-hash64.h |  2 ++
> >  arch/powerpc/mm/copro_fault.c         | 45 ++++++--------------------
> >  arch/powerpc/mm/hash_utils_64.c       | 61 ++++++++++++++++++++++-------------
> >  3 files changed, 50 insertions(+), 58 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
> > index f84e5a5..bf43fb0 100644
> > --- a/arch/powerpc/include/asm/mmu-hash64.h
> > +++ b/arch/powerpc/include/asm/mmu-hash64.h
> > @@ -322,6 +322,8 @@ extern int __hash_page_64K(unsigned long ea, unsigned long access,
> >  			   unsigned int local, int ssize);
> >  struct mm_struct;
> >  unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap);
> > +int calculate_vsid(struct mm_struct *mm, u64 ea,
> > +		   u64 *vsid, int *psize, int *ssize);
> >  extern int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, unsigned long trap);
> >  extern int hash_page(unsigned long ea, unsigned long access, unsigned long trap);
> >  int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
> > diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
> > index 939abdf..ba8bf8e 100644
> > --- a/arch/powerpc/mm/copro_fault.c
> > +++ b/arch/powerpc/mm/copro_fault.c
> > @@ -94,45 +94,18 @@ EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
> >  
> >  int copro_data_segment(struct mm_struct *mm, u64 ea, u64 *esid, u64 *vsid)
> >  {
> > -	int psize, ssize;
> > +	int psize, ssize, rc;
> >  
> >  	*esid = (ea & ESID_MASK) | SLB_ESID_V;
> >  
> > -	switch (REGION_ID(ea)) {
> > -	case USER_REGION_ID:
> > -		pr_devel("copro_data_segment: 0x%llx -- USER_REGION_ID\n", ea);
> > -#ifdef CONFIG_PPC_MM_SLICES
> > -		psize = get_slice_psize(mm, ea);
> > -#else
> > -		psize = mm->context.user_psize;
> > -#endif
> > -		ssize = user_segment_size(ea);
> > -		*vsid = (get_vsid(mm->context.id, ea, ssize)
> > -			 << slb_vsid_shift(ssize)) | SLB_VSID_USER;
> > -		break;
> > -	case VMALLOC_REGION_ID:
> > -		pr_devel("copro_data_segment: 0x%llx -- VMALLOC_REGION_ID\n", ea);
> > -		if (ea < VMALLOC_END)
> > -			psize = mmu_vmalloc_psize;
> > -		else
> > -			psize = mmu_io_psize;
> > -		ssize = mmu_kernel_ssize;
> > -		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
> > -			 << SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
> > -		break;
> > -	case KERNEL_REGION_ID:
> > -		pr_devel("copro_data_segment: 0x%llx -- KERNEL_REGION_ID\n", ea);
> > -		psize = mmu_linear_psize;
> > -		ssize = mmu_kernel_ssize;
> > -		*vsid = (get_kernel_vsid(ea, mmu_kernel_ssize)
> > -			 << SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
> > -		break;
> > -	default:
> > -		/* Future: support kernel segments so that drivers can use the
> > -		 * CoProcessors */
> > -		pr_debug("invalid region access at %016llx\n", ea);
> > -		return 1;
> > -	}
> > +	rc = calculate_vsid(mm, ea, vsid, &psize, &ssize);
> > +	if (rc)
> > +		return rc;
> > +	if (REGION_ID(ea) == USER_REGION_ID)
> > +		*vsid = (*vsid << slb_vsid_shift(ssize)) | SLB_VSID_USER;
> > +	else
> > +		*vsid = (*vsid << SLB_VSID_SHIFT) | SLB_VSID_KERNEL;
> > +
> >  	*vsid |= mmu_psize_defs[psize].sllp |
> >  		((ssize == MMU_SEGSIZE_1T) ? SLB_VSID_B_1T : 0);
> >  
> > diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> > index 0a5c8c0..3fa81ca 100644
> > --- a/arch/powerpc/mm/hash_utils_64.c
> > +++ b/arch/powerpc/mm/hash_utils_64.c
> > @@ -983,6 +983,38 @@ static void check_paca_psize(unsigned long ea, struct mm_struct *mm,
> >  	}
> >  }
> >  
> > +int calculate_vsid(struct mm_struct *mm, u64 ea,
> > +		   u64 *vsid, int *psize, int *ssize)
> > +{
> > +	switch (REGION_ID(ea)) {
> > +	case USER_REGION_ID:
> > +		pr_devel("%s: 0x%llx -- USER_REGION_ID\n", __func__, ea);
> > +		*psize = get_slice_psize(mm, ea);
> > +		*ssize = user_segment_size(ea);
> > +		*vsid = get_vsid(mm->context.id, ea, *ssize);
> > +		return 0;
> > +	case VMALLOC_REGION_ID:
> > +		pr_devel("%s: 0x%llx -- VMALLOC_REGION_ID\n", __func__, ea);
> > +		if (ea < VMALLOC_END)
> > +			*psize = mmu_vmalloc_psize;
> > +		else
> > +			*psize = mmu_io_psize;
> > +		*ssize = mmu_kernel_ssize;
> > +		*vsid = get_kernel_vsid(ea, mmu_kernel_ssize);
> > +		return 0;
> > +	case KERNEL_REGION_ID:
> > +		pr_devel("%s: 0x%llx -- KERNEL_REGION_ID\n", __func__, ea);
> > +		*psize = mmu_linear_psize;
> > +		*ssize = mmu_kernel_ssize;
> > +		*vsid = get_kernel_vsid(ea, mmu_kernel_ssize);
> > +		return 0;
> > +	default:
> > +		pr_debug("%s: invalid region access at %016llx\n", __func__, ea);
> > +		return 1;
> > +	}
> > +}
> > +EXPORT_SYMBOL_GPL(calculate_vsid);
> > +
> >  /* Result code is:
> >   *  0 - handled
> >   *  1 - normal page fault
> > @@ -993,7 +1025,7 @@ int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, u
> >  {
> >  	enum ctx_state prev_state = exception_enter();
> >  	pgd_t *pgdir;
> > -	unsigned long vsid;
> > +	u64 vsid;
> >  	pte_t *ptep;
> >  	unsigned hugeshift;
> >  	const struct cpumask *tmp;
> > @@ -1003,35 +1035,20 @@ int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, u
> >  	DBG_LOW("%s(ea=%016lx, access=%lx, trap=%lx\n",
> >  		__func__, ea, access, trap);
> >  
> > -	/* Get region & vsid */
> > - 	switch (REGION_ID(ea)) {
> > -	case USER_REGION_ID:
> > +	/* Get region */
> > +	if (REGION_ID(ea) == USER_REGION_ID) {
> >  		user_region = 1;
> >  		if (! mm) {
> >  			DBG_LOW(" user region with no mm !\n");
> >  			rc = 1;
> >  			goto bail;
> >  		}
> > -		psize = get_slice_psize(mm, ea);
> > -		ssize = user_segment_size(ea);
> > -		vsid = get_vsid(mm->context.id, ea, ssize);
> > -		break;
> > -	case VMALLOC_REGION_ID:
> > +	} else
> >  		mm = &init_mm;
> > -		vsid = get_kernel_vsid(ea, mmu_kernel_ssize);
> > -		if (ea < VMALLOC_END)
> > -			psize = mmu_vmalloc_psize;
> > -		else
> > -			psize = mmu_io_psize;
> > -		ssize = mmu_kernel_ssize;
> > -		break;
> > -	default:
> > -		/* Not a valid range
> > -		 * Send the problem up to do_page_fault 
> > -		 */
> > -		rc = 1;
> 
> 
> That part is different now. We now handle KERNEL_REGION_ID in the case of
> hash_page(). Earlier we used to consider it a problem. 

Yeah, that's going to be the kernel linear mapping.  We should probably
continue to barf as we shouldn't fault on that.  Thanks.

I'll fix.

Mikey

> 
> > +	rc = calculate_vsid(mm, ea, &vsid, &psize, &ssize);
> > +	if (rc)
> >  		goto bail;
> > -	}
> > +
> >  	DBG_LOW(" mm=%p, mm->pgdir=%p, vsid=%016lx\n", mm, mm->pgd, vsid);
> >  
> >  	/* Bad address. */
> > -- 
> > 1.9.1
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> 




* Re: [PATCH v2 14/17] cxl: Driver code for powernv PCIe based cards for userspace access
  2014-09-30 10:35   ` Michael Neuling
@ 2014-10-02  7:02     ` Michael Ellerman
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Ellerman @ 2014-10-02  7:02 UTC (permalink / raw)
  To: Michael Neuling, greg, arnd, benh
  Cc: mikey, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

On Tue, 2014-30-09 at 10:35:03 UTC, Michael Neuling wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> This is the core of the cxl driver.
> 
> It adds support for using cxl cards in the powernv environment only (no guest
> support). 

Which means on bare metal on power8 for the peanut gallery.

> It allows access to cxl accelerators by userspace using
> /dev/cxl/afu0.0 char device.

devices ?

> The kernel driver has no knowledge of the acceleration function.  

.. has no knowledge of the function implemented by the accelerator ?

> It only provides services to userspace via the /dev/cxl/afu0.0 device.

Provides what services?

> This will compile to two modules.  cxl.ko provides the core cxl functionality
> and userspace API.  cxl-pci.ko provides the PCI driver driver functionality the
> powernv environment.

Last sentence doesn't hold together.

> Documentation of the cxl hardware architecture and userspace API is provided in
> subsequent patches.

Partial review below.

So some meta comments.

Can you get rid of all the foo_t's. That should just be a search and replace.

Can we drop the indirection layers for now. They make it quite a bit harder to
follow the code, and it sounds like you're not 100% sure they're the right
abstraction anyway. When you add another backend/driver/whatever you can readd
just the abstractions you need.

/*
 * Block comments look like this.
 */

> diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
> new file mode 100644
> index 0000000..9206ca4
> --- /dev/null
> +++ b/drivers/misc/cxl/context.c
> @@ -0,0 +1,171 @@
> +/*
> + * Copyright 2014 IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#undef DEBUG

Drop this please.

Instead can you add:

#define pr_fmt(fmt)        "cxl: " fmt

To each file, so it's clear where your pr_xxxs() come from.

> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/bitmap.h>
> +#include <linux/sched.h>
> +#include <linux/pid.h>
> +#include <linux/fs.h>
> +#include <linux/mm.h>
> +#include <linux/debugfs.h>
> +#include <linux/slab.h>
> +#include <linux/idr.h>
> +#include <asm/cputable.h>
> +#include <asm/current.h>
> +#include <asm/copro.h>
> +
> +#include "cxl.h"
> +
> +/*
> + * Allocates space for a CXL context.
> + */
> +struct cxl_context_t *cxl_context_alloc(void)
> +{
> +	return kzalloc(sizeof(struct cxl_context_t), GFP_KERNEL);
> +}

> +/*
> + * Initialises a CXL context.
> + */
> +int cxl_context_init(struct cxl_context_t *ctx, struct cxl_afu_t *afu, bool master)
> +{
> +	int i;
> +
> +	spin_lock_init(&ctx->sst_lock);
> +	ctx->sstp = NULL;
> +	ctx->afu = afu;
> +	ctx->master = master;
> +	ctx->pid = get_pid(get_task_pid(current, PIDTYPE_PID));
> +
> +	INIT_WORK(&ctx->fault_work, cxl_handle_fault);
> +
> +	init_waitqueue_head(&ctx->wq);
> +	spin_lock_init(&ctx->lock);
> +
> +	ctx->irq_bitmap = NULL;
> +	ctx->pending_irq = false;
> +	ctx->pending_fault = false;
> +	ctx->pending_afu_err = false;
> +
> +	ctx->status = OPENED;
> +
> +	idr_preload(GFP_KERNEL);
> +	spin_lock(&afu->contexts_lock);
> +	i = idr_alloc(&ctx->afu->contexts_idr, ctx, 0,
> +		      ctx->afu->num_procs, GFP_NOWAIT);
> +	spin_unlock(&afu->contexts_lock);
> +	idr_preload_end();
> +	if (i < 0)
> +		return i;
> +
> +	ctx->ph = i;
> +	ctx->elem = &ctx->afu->spa[i];
> +	ctx->pe_inserted = false;
> +	return 0;
> +}
> +
> +/*
> + * Map a per-context mmio space into the given vma.
> + */
> +int cxl_context_iomap(struct cxl_context_t *ctx, struct vm_area_struct *vma)
> +{
> +	u64 len = vma->vm_end - vma->vm_start;
> +	len = min(len, ctx->psn_size);
> +
> +	if (ctx->afu->current_model == CXL_MODEL_DEDICATED) {
> +		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> +		return vm_iomap_memory(vma, ctx->afu->psn_phys, ctx->afu->adapter->ps_size);

Why don't we use len here?

> +	}
> +
> +	/* make sure there is a valid per process space for this AFU */
> +	if ((ctx->master && !ctx->afu->psa) || (!ctx->afu->pp_psa)) {

What the hell are psa and pp_psa ?

> +		pr_devel("AFU doesn't support mmio space\n");
> +		return -EINVAL;
> +	}
> +
> +	/* Can't mmap until the AFU is enabled */
> +	if (!ctx->afu->enabled)
> +		return -EBUSY;

afu_mmap() already checked status == STARTED.

Is EBUSY the right return code?

> +	pr_devel("%s: mmio physical: %llx pe: %i master:%i\n", __func__,
> +		 ctx->psn_phys, ctx->ph , ctx->master);
> +
> +	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> +	return vm_iomap_memory(vma, ctx->psn_phys, len);
> +}
> +
> +/*
> + * Detach a context from the hardware. This disables interrupts and doesn't
> + * return until all outstanding interrupts for this context have completed. The
> + * hardware should no longer access *ctx after this has returned.
> + */
> +static void __detach_context(struct cxl_context_t *ctx)
> +{
> +	unsigned long flags;
> +	enum cxl_context_status status;
> +
> +	spin_lock_irqsave(&ctx->sst_lock, flags);
> +	status = ctx->status;
> +	ctx->status = CLOSED;
> +	spin_unlock_irqrestore(&ctx->sst_lock, flags);

You take sst_lock here, before manipulating ctx->status. But I see lots of
places where you check status without taking any lock. So I'm a bit confused by
that.

At first glance it looks like we could race with afu_ioctl_start_work(), which
sets status to STARTED. But the only place we're called from is
cxl_context_detach(), from afu_release(), and that should only run once the
ioctl is finished AIUI. So that looks OK.

But some commentary would be good, especially if this is ever called via a
different path.

> +	if (status != STARTED)
> +		return;
> +
> +	WARN_ON(cxl_ops->detach_process(ctx));

As discussed offline, this can fail, and the device might continue generating
interrupts even though we asked it not to.

Once you release the irqs below you'll get warnings from the xics code. Until
those virq numbers are handed out to someone else, at which point hilarity will
ensue.

It might be better to just warn and bail if detach fails, and leave the ctx
lying around?

> +	afu_release_irqs(ctx);
> +	flush_work(&ctx->fault_work); /* Only needed for dedicated process */
> +	wake_up_all(&ctx->wq);
> +}
> +
> +/*
> + * Detach the given context from the AFU. This doesn't actually
> + * free the context but it should stop the context running in hardware
> + * (ie. prevent this context from generating any further interrupts
> + * so that it can be freed).
> + */
> +void cxl_context_detach(struct cxl_context_t *ctx)
> +{
> +	__detach_context(ctx);
> +}

Why does this exist, or why does __detach_context() exist?

> +
> +/*
> + * Detach all contexts on the given AFU.
> + */
> +void cxl_context_detach_all(struct cxl_afu_t *afu)
> +{
> +	struct cxl_context_t *ctx;
> +	int tmp;
> +

Some commentary on why you're using rcu_read_lock() would be good. I know, but
I'll have forgotten by next week.

> +	rcu_read_lock();
> +	idr_for_each_entry(&afu->contexts_idr, ctx, tmp)
> +		__detach_context(ctx);
> +	rcu_read_unlock();
> +}
> +EXPORT_SYMBOL(cxl_context_detach_all);


> +static int afu_release(struct inode *inode, struct file *file)
> +{
> +	struct cxl_context_t *ctx = file->private_data;
> +
> +	pr_devel("%s: closing cxl file descriptor. pe: %i\n",
> +		 __func__, ctx->ph);
> +	cxl_context_detach(ctx);
> +
> +	module_put(ctx->afu->adapter->driver->module);

This potentially drops the last reference on cxl-pci.ko.

cxl_remove() (in cxl-pci.ko) calls back into cxl_remove_afu() and
cxl_remove_adapter(). 

I *think* that's OK, because you won't be able to finish cxl_remove() until you
remove the afu cdev, and presumably that can't happen until you return from
release.

> +	put_device(&ctx->afu->dev);
> +
> +	/* It should be safe to remove the context now */
> +	cxl_context_free(ctx);

My other worry is that it's not until here that you remove the ctx from the
idr. And so up until this point the ctx could be found in the idr and used by
someone.

I think it would be better to remove the ctx from the idr earlier, before you
start tearing things down. Perhaps even in cxl_context_detach().

> +	cxl_ctx_put();
> +	return 0;
> +}


> +static ssize_t afu_read(struct file *file, char __user *buf, size_t count,
> +			loff_t *off)
> +{
> +	struct cxl_context_t *ctx = file->private_data;
> +	struct cxl_event event;
> +	unsigned long flags;
> +	ssize_t size;
> +	DEFINE_WAIT(wait);
> +
> +	if (count < sizeof(struct cxl_event_header))
> +		return -EINVAL;

This could use some love. The locking in here is pretty funky, and not good
funky.

> +	while (1) {
> +		spin_lock_irqsave(&ctx->lock, flags);
> +		if (ctx->pending_irq || ctx->pending_fault ||
> +		    ctx->pending_afu_err || (ctx->status == CLOSED))
> +			break;
> +		spin_unlock_irqrestore(&ctx->lock, flags);
> +
> +		if (file->f_flags & O_NONBLOCK)
> +			return -EAGAIN;
> +
> +		prepare_to_wait(&ctx->wq, &wait, TASK_INTERRUPTIBLE);
> +		if (!(ctx->pending_irq || ctx->pending_fault ||
> +		      ctx->pending_afu_err || (ctx->status == CLOSED))) {
> +			pr_devel("afu_read going to sleep...\n");
> +			schedule();
> +			pr_devel("afu_read woken up\n");
> +		}
> +		finish_wait(&ctx->wq, &wait);
> +
> +		if (signal_pending(current))
> +			return -ERESTARTSYS;
> +	}
> +
> +	memset(&event, 0, sizeof(event));
> +	event.header.process_element = ctx->ph;
> +	if (ctx->pending_irq) {
> +		pr_devel("afu_read delivering AFU interrupt\n");
> +		event.header.size = sizeof(struct cxl_event_afu_interrupt);
> +		event.header.type = CXL_EVENT_AFU_INTERRUPT;
> +		event.irq.irq = find_first_bit(ctx->irq_bitmap, ctx->irq_count) + 1;
> +
> +		/* Only clear the IRQ if we can send the whole event: */
> +		if (count >= event.header.size) {
> +			clear_bit(event.irq.irq - 1, ctx->irq_bitmap);
> +			if (bitmap_empty(ctx->irq_bitmap, ctx->irq_count))
> +				ctx->pending_irq = false;
> +		}
> +	} else if (ctx->pending_fault) {
> +		pr_devel("afu_read delivering data storage fault\n");
> +		event.header.size = sizeof(struct cxl_event_data_storage);
> +		event.header.type = CXL_EVENT_DATA_STORAGE;
> +		event.fault.addr = ctx->fault_addr;
> +
> +		/* Only clear the fault if we can send the whole event: */
> +		if (count >= event.header.size)
> +			ctx->pending_fault = false;
> +	} else if (ctx->pending_afu_err) {
> +		pr_devel("afu_read delivering afu error\n");
> +		event.header.size = sizeof(struct cxl_event_afu_error);
> +		event.header.type = CXL_EVENT_AFU_ERROR;
> +		event.afu_err.err = ctx->afu_err;
> +
> +		/* Only clear the fault if we can send the whole event: */
> +		if (count >= event.header.size)
> +			ctx->pending_afu_err = false;
> +	} else if (ctx->status == CLOSED) {
> +		pr_devel("afu_read fatal error\n");
> +		spin_unlock_irqrestore(&ctx->lock, flags);
> +		return -EIO;
> +	} else
> +		WARN(1, "afu_read must be buggy\n");
> +
> +	spin_unlock_irqrestore(&ctx->lock, flags);
> +
> +	size = min_t(size_t, count, event.header.size);
> +	copy_to_user(buf, &event, size);
> +
> +	return size;
> +}

> diff --git a/drivers/misc/cxl/irq.c b/drivers/misc/cxl/irq.c
> new file mode 100644
> index 0000000..3e01e1d
> --- /dev/null
> +++ b/drivers/misc/cxl/irq.c
...
> +
> +void afu_release_irqs(struct cxl_context_t *ctx)
> +{
> +	irq_hw_number_t hwirq;
> +	unsigned int virq;
> +	int r, i;
> +
> +	for (r = 1; r < CXL_IRQ_RANGES; r++) {
> +		hwirq = ctx->irqs.offset[r];
> +		for (i = 0; i < ctx->irqs.range[r]; hwirq++, i++) {
> +			virq = irq_find_mapping(NULL, hwirq);
> +			if (virq)
> +				cxl_unmap_irq(virq, ctx);
> +		}
> +	}
> +
> +	ctx->afu->adapter->driver->release_irq_ranges(&ctx->irqs, ctx->afu->adapter);

Do we need this many levels of indirection?

cheers


> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#undef DEBUG

Drop this please.

Instead can you add:

#define pr_fmt(fmt)        "cxl: " fmt

To each file, so it's clear where your pr_xxxs() come from.

> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/bitmap.h>
> +#include <linux/sched.h>
> +#include <linux/pid.h>
> +#include <linux/fs.h>
> +#include <linux/mm.h>
> +#include <linux/debugfs.h>
> +#include <linux/slab.h>
> +#include <linux/idr.h>
> +#include <asm/cputable.h>
> +#include <asm/current.h>
> +#include <asm/copro.h>
> +
> +#include "cxl.h"
> +
> +/*
> + * Allocates space for a CXL context.
> + */
> +struct cxl_context_t *cxl_context_alloc(void)
> +{
> +	return kzalloc(sizeof(struct cxl_context_t), GFP_KERNEL);
> +}

> +/*
> + * Initialises a CXL context.
> + */
> +int cxl_context_init(struct cxl_context_t *ctx, struct cxl_afu_t *afu, bool master)
> +{
> +	int i;
> +
> +	spin_lock_init(&ctx->sst_lock);
> +	ctx->sstp = NULL;
> +	ctx->afu = afu;
> +	ctx->master = master;
> +	ctx->pid = get_pid(get_task_pid(current, PIDTYPE_PID));
> +
> +	INIT_WORK(&ctx->fault_work, cxl_handle_fault);
> +
> +	init_waitqueue_head(&ctx->wq);
> +	spin_lock_init(&ctx->lock);
> +
> +	ctx->irq_bitmap = NULL;
> +	ctx->pending_irq = false;
> +	ctx->pending_fault = false;
> +	ctx->pending_afu_err = false;
> +
> +	ctx->status = OPENED;
> +
> +	idr_preload(GFP_KERNEL);
> +	spin_lock(&afu->contexts_lock);
> +	i = idr_alloc(&ctx->afu->contexts_idr, ctx, 0,
> +		      ctx->afu->num_procs, GFP_NOWAIT);
> +	spin_unlock(&afu->contexts_lock);
> +	idr_preload_end();
> +	if (i < 0)
> +		return i;
> +
> +	ctx->ph = i;
> +	ctx->elem = &ctx->afu->spa[i];
> +	ctx->pe_inserted = false;
> +	return 0;
> +}
> +
> +/*
> + * Map a per-context mmio space into the given vma.
> + */
> +int cxl_context_iomap(struct cxl_context_t *ctx, struct vm_area_struct *vma)
> +{
> +	u64 len = vma->vm_end - vma->vm_start;
> +	len = min(len, ctx->psn_size);
> +
> +	if (ctx->afu->current_model == CXL_MODEL_DEDICATED) {
> +		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> +		return vm_iomap_memory(vma, ctx->afu->psn_phys, ctx->afu->adapter->ps_size);

Why don't we use len here?

> +	}
> +
> +	/* make sure there is a valid per process space for this AFU */
> +	if ((ctx->master && !ctx->afu->psa) || (!ctx->afu->pp_psa)) {

What the hell are psa and pp_psa ?

> +		pr_devel("AFU doesn't support mmio space\n");
> +		return -EINVAL;
> +	}
> +
> +	/* Can't mmap until the AFU is enabled */
> +	if (!ctx->afu->enabled)
> +		return -EBUSY;

afu_mmap() already checked status == STARTED.

Is EBUSY the right return code?

> +	pr_devel("%s: mmio physical: %llx pe: %i master:%i\n", __func__,
> +		 ctx->psn_phys, ctx->ph , ctx->master);
> +
> +	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> +	return vm_iomap_memory(vma, ctx->psn_phys, len);
> +}
> +
> +/*
> + * Detach a context from the hardware. This disables interrupts and doesn't
> + * return until all outstanding interrupts for this context have completed. The
> + * hardware should no longer access *ctx after this has returned.
> + */
> +static void __detach_context(struct cxl_context_t *ctx)
> +{
> +	unsigned long flags;
> +	enum cxl_context_status status;
> +
> +	spin_lock_irqsave(&ctx->sst_lock, flags);
> +	status = ctx->status;
> +	ctx->status = CLOSED;
> +	spin_unlock_irqrestore(&ctx->sst_lock, flags);

You take sst_lock here, before manipulating ctx->status. But I see lots of
places where you check status without taking any lock. So I'm a bit confused by
that.

At first glance it looks like we could race with afu_ioctl_start_work(), which
sets status to STARTED. But the only place we're called from is
cxl_context_detach(), from afu_release(), and that should only run once the
ioctl is finished AIUI. So that looks OK.

But some commentary would be good, especially if this is ever called via a
different path.

> +	if (status != STARTED)
> +		return;
> +
> +	WARN_ON(cxl_ops->detach_process(ctx));

As discussed offline, this can fail, and the device might continue generating
interrupts even though we asked it not to.

Once you release the irqs below you'll get warnings from the xics code. Until
those virq numbers are handed out to someone else, at which point hilarity will
ensue.

It might be better to just warn and bail if detach fails, and leave the ctx
lying around?

> +	afu_release_irqs(ctx);
> +	flush_work(&ctx->fault_work); /* Only needed for dedicated process */
> +	wake_up_all(&ctx->wq);
> +}
> +
> +/*
> + * Detach the given context from the AFU. This doesn't actually
> + * free the context but it should stop the context running in hardware
> + * (ie. prevent this context from generating any further interrupts
> + * so that it can be freed).
> + */
> +void cxl_context_detach(struct cxl_context_t *ctx)
> +{
> +	__detach_context(ctx);
> +}

Why does this exist, or why does __detach_context() exist?

> +
> +/*
> + * Detach all contexts on the given AFU.
> + */
> +void cxl_context_detach_all(struct cxl_afu_t *afu)
> +{
> +	struct cxl_context_t *ctx;
> +	int tmp;
> +

Some commentary on why you're using rcu_read_lock() would be good. I know, but
I'll have forgotten by next week.

> +	rcu_read_lock();
> +	idr_for_each_entry(&afu->contexts_idr, ctx, tmp)
> +		__detach_context(ctx);
> +	rcu_read_unlock();
> +}
> +EXPORT_SYMBOL(cxl_context_detach_all);


> +static int afu_release(struct inode *inode, struct file *file)
> +{
> +	struct cxl_context_t *ctx = file->private_data;
> +
> +	pr_devel("%s: closing cxl file descriptor. pe: %i\n",
> +		 __func__, ctx->ph);
> +	cxl_context_detach(ctx);
> +
> +	module_put(ctx->afu->adapter->driver->module);

This potentially drops the last reference on cxl-pci.ko.

cxl_remove() (in cxl-pci.ko) calls back into cxl_remove_afu() and
cxl_remove_adapter(). 

I *think* that's OK, because you won't be able to finish cxl_remove() until you
remove the afu cdev, and presumably that can't happen until you return from
release.

> +	put_device(&ctx->afu->dev);
> +
> +	/* It should be safe to remove the context now */
> +	cxl_context_free(ctx);

My other worry is that it's not until here that you remove the ctx from the
idr. And so up until this point the ctx could be found in the idr and used by
someone.

I think it would be better to remove the ctx from the idr earlier, before you
start tearing things down. Perhaps even in cxl_context_detach().

> +	cxl_ctx_put();
> +	return 0;
> +}


> +static ssize_t afu_read(struct file *file, char __user *buf, size_t count,
> +			loff_t *off)
> +{
> +	struct cxl_context_t *ctx = file->private_data;
> +	struct cxl_event event;
> +	unsigned long flags;
> +	ssize_t size;
> +	DEFINE_WAIT(wait);
> +
> +	if (count < sizeof(struct cxl_event_header))
> +		return -EINVAL;

This could use some love. The locking in here is pretty funky, and not good
funky.

> +	while (1) {
> +		spin_lock_irqsave(&ctx->lock, flags);
> +		if (ctx->pending_irq || ctx->pending_fault ||
> +		    ctx->pending_afu_err || (ctx->status == CLOSED))
> +			break;
> +		spin_unlock_irqrestore(&ctx->lock, flags);
> +
> +		if (file->f_flags & O_NONBLOCK)
> +			return -EAGAIN;
> +
> +		prepare_to_wait(&ctx->wq, &wait, TASK_INTERRUPTIBLE);
> +		if (!(ctx->pending_irq || ctx->pending_fault ||
> +		      ctx->pending_afu_err || (ctx->status == CLOSED))) {
> +			pr_devel("afu_read going to sleep...\n");
> +			schedule();
> +			pr_devel("afu_read woken up\n");
> +		}
> +		finish_wait(&ctx->wq, &wait);
> +
> +		if (signal_pending(current))
> +			return -ERESTARTSYS;
> +	}
> +
> +	memset(&event, 0, sizeof(event));
> +	event.header.process_element = ctx->ph;
> +	if (ctx->pending_irq) {
> +		pr_devel("afu_read delivering AFU interrupt\n");
> +		event.header.size = sizeof(struct cxl_event_afu_interrupt);
> +		event.header.type = CXL_EVENT_AFU_INTERRUPT;
> +		event.irq.irq = find_first_bit(ctx->irq_bitmap, ctx->irq_count) + 1;
> +
> +		/* Only clear the IRQ if we can send the whole event: */
> +		if (count >= event.header.size) {
> +			clear_bit(event.irq.irq - 1, ctx->irq_bitmap);
> +			if (bitmap_empty(ctx->irq_bitmap, ctx->irq_count))
> +				ctx->pending_irq = false;
> +		}
> +	} else if (ctx->pending_fault) {
> +		pr_devel("afu_read delivering data storage fault\n");
> +		event.header.size = sizeof(struct cxl_event_data_storage);
> +		event.header.type = CXL_EVENT_DATA_STORAGE;
> +		event.fault.addr = ctx->fault_addr;
> +
> +		/* Only clear the fault if we can send the whole event: */
> +		if (count >= event.header.size)
> +			ctx->pending_fault = false;
> +	} else if (ctx->pending_afu_err) {
> +		pr_devel("afu_read delivering afu error\n");
> +		event.header.size = sizeof(struct cxl_event_afu_error);
> +		event.header.type = CXL_EVENT_AFU_ERROR;
> +		event.afu_err.err = ctx->afu_err;
> +
> +		/* Only clear the fault if we can send the whole event: */
> +		if (count >= event.header.size)
> +			ctx->pending_afu_err = false;
> +	} else if (ctx->status == CLOSED) {
> +		pr_devel("afu_read fatal error\n");
> +		spin_unlock_irqrestore(&ctx->lock, flags);
> +		return -EIO;
> +	} else
> +		WARN(1, "afu_read must be buggy\n");
> +
> +	spin_unlock_irqrestore(&ctx->lock, flags);
> +
> +	size = min_t(size_t, count, event.header.size);
> +	copy_to_user(buf, &event, size);
> +
> +	return size;
> +}

> diff --git a/drivers/misc/cxl/irq.c b/drivers/misc/cxl/irq.c
> new file mode 100644
> index 0000000..3e01e1d
> --- /dev/null
> +++ b/drivers/misc/cxl/irq.c
...
> +
> +void afu_release_irqs(struct cxl_context_t *ctx)
> +{
> +	irq_hw_number_t hwirq;
> +	unsigned int virq;
> +	int r, i;
> +
> +	for (r = 1; r < CXL_IRQ_RANGES; r++) {
> +		hwirq = ctx->irqs.offset[r];
> +		for (i = 0; i < ctx->irqs.range[r]; hwirq++, i++) {
> +			virq = irq_find_mapping(NULL, hwirq);
> +			if (virq)
> +				cxl_unmap_irq(virq, ctx);
> +		}
> +	}
> +
> +	ctx->afu->adapter->driver->release_irq_ranges(&ctx->irqs, ctx->afu->adapter);

Do we need this many levels of indirection?

cheers

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v2 09/17] powerpc/mm: Add new hash_page_mm()
  2014-10-01  9:43     ` Aneesh Kumar K.V
@ 2014-10-02  7:10       ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-10-02  7:10 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: greg, arnd, mpe, benh, anton, linux-kernel, linuxppc-dev, jk,
	imunsie, cbe-oss-dev

On Wed, 2014-10-01 at 15:13 +0530, Aneesh Kumar K.V wrote:
> Michael Neuling <mikey@neuling.org> writes:
> 
> > From: Ian Munsie <imunsie@au1.ibm.com>
> >
> > This adds a new function hash_page_mm() based on the existing hash_page().
> > This version allows any struct mm to be passed in, rather than assuming
> > current.  This is useful for servicing co-processor faults which are not in the
> > context of the current running process.
> >
> > We need to be careful here as the current hash_page() assumes current in a few
> > places.
> 
> It would be nice to document the rules here. So when we try to add a hash
> page entry, and if that result in demotion of the segment are we suppose to
> flush slbs ? 

Yeah, we found it sucky to understand.  The current documentation is
"buy benh a beer and ask him" which doesn't scale very well unless
you're benh and you like beer.

> Also why would one want to hash anything other
> than current->mm ? How will this get called ? 

We are calling this on behalf of a co-processor (eg cxl).  The mm this
is currently associated with may not be running on a cpu.  

> May be they are explained in later patches. But can we also explain it
> here. 

Ok, I'll add something (mpe had the same question).

Mikey
> 
> >
> > Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > ---
> >  arch/powerpc/include/asm/mmu-hash64.h |  1 +
> >  arch/powerpc/mm/hash_utils_64.c       | 22 ++++++++++++++--------
> >  2 files changed, 15 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
> > index 6d0b7a2..f84e5a5 100644
> > --- a/arch/powerpc/include/asm/mmu-hash64.h
> > +++ b/arch/powerpc/include/asm/mmu-hash64.h
> > @@ -322,6 +322,7 @@ extern int __hash_page_64K(unsigned long ea, unsigned long access,
> >  			   unsigned int local, int ssize);
> >  struct mm_struct;
> >  unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap);
> > +extern int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, unsigned long trap);
> >  extern int hash_page(unsigned long ea, unsigned long access, unsigned long trap);
> >  int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
> >  		     pte_t *ptep, unsigned long trap, int local, int ssize,
> > diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> > index bbdb054..0a5c8c0 100644
> > --- a/arch/powerpc/mm/hash_utils_64.c
> > +++ b/arch/powerpc/mm/hash_utils_64.c
> > @@ -904,7 +904,7 @@ void demote_segment_4k(struct mm_struct *mm, unsigned long addr)
> >  		return;
> >  	slice_set_range_psize(mm, addr, 1, MMU_PAGE_4K);
> >  	copro_flush_all_slbs(mm);
> > -	if (get_paca_psize(addr) != MMU_PAGE_4K) {
> > +	if ((get_paca_psize(addr) != MMU_PAGE_4K) && (current->mm == mm)) {
> >  		get_paca()->context = mm->context;
> >  		slb_flush_and_rebolt();
> >  	}
> > @@ -989,26 +989,24 @@ static void check_paca_psize(unsigned long ea, struct mm_struct *mm,
> >   * -1 - critical hash insertion error
> >   * -2 - access not permitted by subpage protection mechanism
> >   */
> > -int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> > +int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, unsigned long trap)
> >  {
> >  	enum ctx_state prev_state = exception_enter();
> >  	pgd_t *pgdir;
> >  	unsigned long vsid;
> > -	struct mm_struct *mm;
> >  	pte_t *ptep;
> >  	unsigned hugeshift;
> >  	const struct cpumask *tmp;
> >  	int rc, user_region = 0, local = 0;
> >  	int psize, ssize;
> >  
> > -	DBG_LOW("hash_page(ea=%016lx, access=%lx, trap=%lx\n",
> > -		ea, access, trap);
> > +	DBG_LOW("%s(ea=%016lx, access=%lx, trap=%lx\n",
> > +		__func__, ea, access, trap);
> >  
> >  	/* Get region & vsid */
> >   	switch (REGION_ID(ea)) {
> >  	case USER_REGION_ID:
> >  		user_region = 1;
> > -		mm = current->mm;
> >  		if (! mm) {
> >  			DBG_LOW(" user region with no mm !\n");
> >  			rc = 1;
> > @@ -1104,7 +1102,8 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> >  			WARN_ON(1);
> >  		}
> >  #endif
> > -		check_paca_psize(ea, mm, psize, user_region);
> > +		if (current->mm == mm)
> > +			check_paca_psize(ea, mm, psize, user_region);
> >  
> >  		goto bail;
> >  	}
> > @@ -1145,7 +1144,8 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> >  		}
> >  	}
> >  
> > -	check_paca_psize(ea, mm, psize, user_region);
> > +	if (current->mm == mm)
> > +		check_paca_psize(ea, mm, psize, user_region);
> >  #endif /* CONFIG_PPC_64K_PAGES */
> >  
> >  #ifdef CONFIG_PPC_HAS_HASH_64K
> > @@ -1180,6 +1180,12 @@ bail:
> >  	exception_exit(prev_state);
> >  	return rc;
> >  }
> > +EXPORT_SYMBOL_GPL(hash_page_mm);
> > +
> > +int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> > +{
> > +	return hash_page_mm(current->mm, ea, access, trap);
> > +}
> >  EXPORT_SYMBOL_GPL(hash_page);
> >  
> >  void hash_preload(struct mm_struct *mm, unsigned long ea,
> > -- 
> > 1.9.1
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 100+ messages in thread


* Re: [PATCH v2 09/17] powerpc/mm: Add new hash_page_mm()
  2014-10-02  3:48     ` Michael Ellerman
@ 2014-10-02  7:39       ` Michael Neuling
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael Neuling @ 2014-10-02  7:39 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: greg, arnd, benh, anton, linux-kernel, linuxppc-dev, jk, imunsie,
	cbe-oss-dev, Aneesh Kumar K.V

On Thu, 2014-10-02 at 13:48 +1000, Michael Ellerman wrote:
> On Tue, 2014-30-09 at 10:34:58 UTC, Michael Neuling wrote:
> > From: Ian Munsie <imunsie@au1.ibm.com>
> > 
> > This adds a new function hash_page_mm() based on the existing hash_page().
> > This version allows any struct mm to be passed in, rather than assuming
> > current.  This is useful for servicing co-processor faults which are not in the
> > context of the current running process.
> 
> I'm not a big fan. hash_page() is already a train wreck, and this doesn't make
> it any better.

I can document it to make the situation a bit better.  It's certainly
not clear which one to use here and under what circumstances.  It's
basically ask benh territory.  

> > diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> > index bbdb054..0a5c8c0 100644
> > --- a/arch/powerpc/mm/hash_utils_64.c
> > +++ b/arch/powerpc/mm/hash_utils_64.c
> > @@ -904,7 +904,7 @@ void demote_segment_4k(struct mm_struct *mm, unsigned long addr)
> >  		return;
> >  	slice_set_range_psize(mm, addr, 1, MMU_PAGE_4K);
> >  	copro_flush_all_slbs(mm);
> > -	if (get_paca_psize(addr) != MMU_PAGE_4K) {
> > +	if ((get_paca_psize(addr) != MMU_PAGE_4K) && (current->mm == mm)) {
> >  		get_paca()->context = mm->context;
> >  		slb_flush_and_rebolt();
> 
> This is a bit fishy.
> 
> If that mm is currently running on another cpu you just failed to update it's
> paca. But I think the call to check_paca_psize() in hash_page() will save you
> on that cpu.
> 
> In fact we might be able to remove that synchronisation from
> demote_segment_4k() and always leave it up to check_paca_psize()?

Aneesh asked the same thing for v1 and we convinced ourselves it was ok.
I said this at the time...

I had a chat to benh offline about this and he thinks it's fine.  A
running process in the same mm context will either have hit this mapping
or not.  If it's hit it, the page will be invalidated and it'll come in
via hash_page and have its segment demoted also (and paca updated).  If
it hasn't hit, again it'll come into hash_page() and get demoted also.

> > @@ -989,26 +989,24 @@ static void check_paca_psize(unsigned long ea, struct mm_struct *mm,
> >   * -1 - critical hash insertion error
> >   * -2 - access not permitted by subpage protection mechanism
> >   */
> > -int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
> > +int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, unsigned long trap)
> >  {
> >  	enum ctx_state prev_state = exception_enter();
> >  	pgd_t *pgdir;
> >  	unsigned long vsid;
> > -	struct mm_struct *mm;
> >  	pte_t *ptep;
> >  	unsigned hugeshift;
> >  	const struct cpumask *tmp;
> >  	int rc, user_region = 0, local = 0;
> >  	int psize, ssize;
> >  
> > -	DBG_LOW("hash_page(ea=%016lx, access=%lx, trap=%lx\n",
> > -		ea, access, trap);
> > +	DBG_LOW("%s(ea=%016lx, access=%lx, trap=%lx\n",
> > +		__func__, ea, access, trap);
> >  
> >  	/* Get region & vsid */
> >   	switch (REGION_ID(ea)) {
> >  	case USER_REGION_ID:
> >  		user_region = 1;
> > -		mm = current->mm;
> >  		if (! mm) {
> >  			DBG_LOW(" user region with no mm !\n");
> >  			rc = 1;
> 
> What about the VMALLOC case where we do:
> 		mm = &init_mm;
> 		
> Is that what you want? It seems odd that you pass an mm to the routine, but
> then potentially it ends up using a different mm after all depending on the
> address.

Good point.  We have hash_page() still.  I can make that check in there
and decide which mm to use and pass that to hash_page_mm().   Then we
always use mm in hash_page_mm().  hash_page() will then look like this: 

int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
{
	struct mm_struct *mm = current->mm;

	if (REGION_ID(ea) == VMALLOC_REGION_ID)
		mm = &init_mm;

	return hash_page_mm(mm, ea, access, trap);
}

Mikey

^ permalink raw reply	[flat|nested] 100+ messages in thread


* Re: [PATCH v2 15/17] cxl: Userspace header file.
  2014-10-02  6:02     ` Michael Ellerman
@ 2014-10-02 10:28       ` Ian Munsie
  -1 siblings, 0 replies; 100+ messages in thread
From: Ian Munsie @ 2014-10-02 10:28 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Michael Neuling, greg, arnd, benh, anton, linux-kernel,
	linuxppc-dev, jk, cbe-oss-dev, Aneesh Kumar K.V

Hey Michael,

Excerpts from Michael Ellerman's message of 2014-10-02 16:02:37 +1000:
> > +/* ioctls */
> > +struct cxl_ioctl_start_work {
> > +    __u64 wed;
> > +    __u64 amr;
> > +    __u64 reserved1;
> > +    __u32 reserved2;
> > +    __s16 num_interrupts; /* -1 = use value from afu descriptor */
> > +    __u16 process_element; /* returned from kernel */
> > +    __u64 reserved3;
> > +    __u64 reserved4;
> > +    __u64 reserved5;
> > +    __u64 reserved6;
> 
> Why so many reserved fields?

The first two are reserved for the context save area (reserved1) and
size (reserved2) of the "shared" (AKA time sliced) virtualisation model,
which we don't yet support. That only leaves us with four reserved
fields for anything that we haven't thought of or that the hardware team
hasn't come up with yet ;-)

> What mechanism is there that will allow you to ever unreserve them?
>
> ie. how does a new userspace detect that the kernel it's running on supports
> new fields?

The ioctl will return -EINVAL if any of them are set to non-zero values,
so userspace can easily tell if it's running on an old kernel.

> Or conversely how does a new kernel detect that userspace has passed it a
> meaningful value in one of the previously reserved fields?

They would have to be non-zero (certainly true of the context save
area's size), or one of the reserved fields could be turned into a
flags field or API version.

> > +#define CXL_MAGIC 0xCA
> > +#define CXL_IOCTL_START_WORK      _IOWR(CXL_MAGIC, 0x00, struct cxl_ioctl_start_work)
> 
> What happened to 0x1 ?

That was used to dynamically program the FPGA with a new AFU image, but
we don't have anything to test it on yet and I'm not convinced that the
procedure won't change by the time we do, so we pulled the code.

We can repack the ioctl numbers easily enough... Will do :)

> > +enum cxl_event_type {
> > +    CXL_EVENT_READ_FAIL     = -1,
> 
> I don't see this used?

That was used in the userspace library to mark its buffer as bad if the
read() call failed for whatever reason... but you're right - it isn't
used by the kernel and doesn't belong in this header. Will remove.

> > +struct cxl_event_header {
> > +    __u32 type;
> > +    __u16 size;
> > +    __u16 process_element;
> > +    __u64 reserved1;
> > +    __u64 reserved2;
> > +    __u64 reserved3;
> > +};
> 
> Again lots of reserved fields?

Figured it was better to have a bit more than we expect we might need
just in case... We can reduce this if you feel it is excessive?

In an earlier version of the code the kernel would fill out the header
and not clear an event if a buffer was passed in that was too small, so
userspace could realloc a larger buffer and try again. This made the API
a bit more complex and our internal users weren't too keen on it, so we
decided to use a fixed-size buffer and make it larger than we strictly
needed so we have plenty of room for further expansion.

> Rather than having the header included in every event, would it be clearer if
> the cxl_event was:
> 
> struct cxl_event {
>     struct cxl_event_header header;
>     union {
>         struct cxl_event_afu_interrupt irq;
>         struct cxl_event_data_storage fault;
>         struct cxl_event_afu_error afu_err;
>     };
> };

Sounds like a good idea to me :)

Cheers,
-Ian



* Re: [PATCH v2 15/17] cxl: Userspace header file.
  2014-10-02 10:28       ` Ian Munsie
@ 2014-10-02 12:42         ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 100+ messages in thread
From: Benjamin Herrenschmidt @ 2014-10-02 12:42 UTC (permalink / raw)
  To: Ian Munsie
  Cc: Michael Ellerman, Michael Neuling, greg, arnd, anton,
	linux-kernel, linuxppc-dev, jk, cbe-oss-dev, Aneesh Kumar K.V

On Thu, 2014-10-02 at 20:28 +1000, Ian Munsie wrote:
> Hey Michael,
> 
> Excerpts from Michael Ellerman's message of 2014-10-02 16:02:37 +1000:
> > > +/* ioctls */
> > > +struct cxl_ioctl_start_work {
> > > +    __u64 wed;
> > > +    __u64 amr;
> > > +    __u64 reserved1;
> > > +    __u32 reserved2;
> > > +    __s16 num_interrupts; /* -1 = use value from afu descriptor */
> > > +    __u16 process_element; /* returned from kernel */
> > > +    __u64 reserved3;
> > > +    __u64 reserved4;
> > > +    __u64 reserved5;
> > > +    __u64 reserved6;
> > 
> > Why so many reserved fields?
> 
> The first two are reserved for the context save area (reserved1) and
> size (reserved2) of the "shared" (AKA time sliced) virtualisation model,
> which we don't yet support. That only leaves us with four reserved
> fields for anything that we haven't thought of or that the hardware team
> hasn't come up with yet ;-)
> 
> > What mechanism is there that will allow you to ever unreserve them?
> >
> > ie. how does a new userspace detect that the kernel it's running on supports
> > new fields?
> 
> The ioctl will return -EINVAL if any of them are set to non-zero values,
> so userspace can easily tell if it's running on an old kernel.

Not good enough in my experience. Throw in a flags field I'd say..

> > Or conversely how does a new kernel detect that userspace has passed it a
> > meaningful value in one of the previously reserved fields?
> 
> They would have to be non-zero (certainly true of the context save
> area's size), or one of the reserved fields could be turned into a
> flags field or API version.

If you go that way you need to negotiate as well, latest compatible,
etc...

> > > +#define CXL_MAGIC 0xCA
> > > +#define CXL_IOCTL_START_WORK      _IOWR(CXL_MAGIC, 0x00, struct cxl_ioctl_start_work)
> > 
> > What happened to 0x1 ?
> 
> That was used to dynamically program the FPGA with a new AFU image, but
> we don't have anything to test it on yet and I'm not convinced that the
> procedure won't change by the time we do, so we pulled the code.
> 
> We can repack the ioctl numbers easily enough... Will do :)
> 
> > > +enum cxl_event_type {
> > > +    CXL_EVENT_READ_FAIL     = -1,
> > 
> > I don't see this used?
> 
> That was used in the userspace library to mark its buffer as bad if the
> read() call failed for whatever reason... but you're right - it isn't
> used by the kernel and doesn't belong in this header. Will remove.
> 
> > > +struct cxl_event_header {
> > > +    __u32 type;
> > > +    __u16 size;
> > > +    __u16 process_element;
> > > +    __u64 reserved1;
> > > +    __u64 reserved2;
> > > +    __u64 reserved3;
> > > +};
> > 
> > Again lots of reserved fields?
> 
> Figured it was better to have a bit more than we expect we might need
> just in case... We can reduce this if you feel it is excessive?
> 
> In an earlier version of the code the kernel would fill out the header
> and not clear an event if a buffer was passed in that was too small, so
> userspace could realloc a larger buffer and try again. This made the API
> a bit more complex and our internal users weren't too keen on it, so we
> decided to use a fixed-size buffer and make it larger than we strictly
> needed so we have plenty of room for further expansion.
> 
> > Rather than having the header included in every event, would it be clearer if
> > the cxl_event was:
> > 
> > struct cxl_event {
> >     struct cxl_event_header header;
> >     union {
> >         struct cxl_event_afu_interrupt irq;
> >         struct cxl_event_data_storage fault;
> >         struct cxl_event_afu_error afu_err;
> >     };
> > };
> 
> Sounds like a good idea to me :)
> 
> Cheers,
> -Ian




end of thread, other threads:[~2014-10-02 12:43 UTC | newest]

Thread overview: 100+ messages
2014-09-30 10:34 [PATCH v2 0/17] POWER8 Coherent Accelerator device driver Michael Neuling
2014-09-30 10:34 ` [PATCH v2 01/17] powerpc/cell: Move spu_handle_mm_fault() out of cell platform Michael Neuling
2014-09-30 10:34 ` [PATCH v2 02/17] powerpc/cell: Move data segment faulting code " Michael Neuling
2014-10-01  6:47   ` Michael Ellerman
2014-10-01  6:51     ` Benjamin Herrenschmidt
2014-10-02  0:42     ` Michael Neuling
2014-10-01  9:45   ` Aneesh Kumar K.V
2014-10-01 11:10     ` Michael Neuling
2014-10-01  9:53   ` Aneesh Kumar K.V
2014-10-02  0:58     ` Michael Neuling
2014-09-30 10:34 ` [PATCH v2 03/17] powerpc/cell: Make spu_flush_all_slbs() generic Michael Neuling
2014-09-30 10:40   ` Arnd Bergmann
2014-10-01  7:13   ` Michael Ellerman
2014-10-01 10:51     ` Michael Neuling
2014-09-30 10:34 ` [PATCH v2 04/17] powerpc/msi: Improve IRQ bitmap allocator Michael Neuling
2014-10-01  7:13   ` Michael Ellerman
2014-10-02  2:01     ` Michael Neuling
2014-09-30 10:34 ` [PATCH v2 05/17] powerpc/mm: Export mmu_kernel_ssize and mmu_linear_psize Michael Neuling
2014-10-01  7:13   ` Michael Ellerman
2014-10-02  3:13     ` Michael Neuling
2014-09-30 10:34 ` [PATCH v2 06/17] powerpc/powernv: Split out set MSI IRQ chip code Michael Neuling
2014-10-02  1:57   ` Michael Ellerman
2014-10-02  5:22     ` Michael Neuling
2014-09-30 10:34 ` [PATCH v2 07/17] cxl: Add new header for call backs and structs Michael Neuling
2014-10-01 12:00   ` Michael Ellerman
2014-10-02  3:37     ` Michael Neuling
2014-09-30 10:34 ` [PATCH v2 08/17] powerpc/powerpc: Add new PCIe functions for allocating cxl interrupts Michael Neuling
2014-10-02  3:16   ` Michael Ellerman
2014-10-02  6:09     ` Michael Neuling
2014-09-30 10:34 ` [PATCH v2 09/17] powerpc/mm: Add new hash_page_mm() Michael Neuling
2014-10-01  9:43   ` Aneesh Kumar K.V
2014-10-02  7:10     ` Michael Neuling
2014-10-02  3:48   ` Michael Ellerman
2014-10-02  7:39     ` Michael Neuling
2014-09-30 10:34 ` [PATCH v2 10/17] powerpc/mm: Merge vsid calculation in hash_page() and copro_data_segment() Michael Neuling
2014-10-01  9:55   ` Aneesh Kumar K.V
2014-10-02  6:44     ` Michael Neuling
2014-09-30 10:35 ` [PATCH v2 11/17] powerpc/opal: Add PHB to cxl mode call Michael Neuling
2014-09-30 10:35 ` [PATCH v2 12/17] powerpc/mm: Add hooks for cxl Michael Neuling
2014-09-30 10:35 ` [PATCH v2 13/17] cxl: Add base builtin support Michael Neuling
2014-10-01 12:00   ` Michael Ellerman
2014-10-02  3:43     ` Michael Neuling
2014-09-30 10:35 ` [PATCH v2 14/17] cxl: Driver code for powernv PCIe based cards for userspace access Michael Neuling
2014-10-02  7:02   ` Michael Ellerman
2014-09-30 10:35 ` [PATCH v2 15/17] cxl: Userspace header file Michael Neuling
2014-10-02  6:02   ` Michael Ellerman
2014-10-02 10:28     ` Ian Munsie
2014-10-02 12:42       ` Benjamin Herrenschmidt
2014-09-30 10:35 ` [PATCH v2 16/17] cxl: Add driver to Kbuild and Makefiles Michael Neuling
2014-09-30 10:35 ` [PATCH v2 17/17] cxl: Add documentation for userspace APIs Michael Neuling
