All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 00/10] s390/pci: PCI support on System z
@ 2012-11-14  9:41 Jan Glauber
  2012-11-14  9:41 ` [RFC PATCH 01/10] s390/pci: base support Jan Glauber
                   ` (9 more replies)
  0 siblings, 10 replies; 15+ messages in thread
From: Jan Glauber @ 2012-11-14  9:41 UTC (permalink / raw)
  To: linux-pci; +Cc: linux-kernel, heiko.carstens, schwidefsky, Jan Glauber

Hi,

this patchset based on 3.7-rc4 aims at adding PCI(e) support to the
System z (s390) architecture. PCI is currently not available on s390,
so this is early prototype code.

I'm posting in the hope to get some feedback and direction on the more humble
aspects of it.

Patches 01-05 are only split to ease the review, they belong functional
together and don't make much sense without each other.

PCI on the mainframe differs in some aspects from PCI on other platforms. 

The most notable difference is the use of instructions for memory
mapped IO. So on s390 the BAR spaces cannot be mapped to memory. To access 
the BAR spaces privileged instructions must be used (how that will be
addressed for user-space is not part of this patchset). See patch 01 for how
the read/write pci primitives are mapped to the s390 instructions.

Another difference is that the addresses of BARs from two different PCI
functions may be identical. That means we can't distinguish from an iomem
address which PCI device the address belongs to. Therefore s390 needs a
custom implementation of pci_iomap to remap iomem addresses using the bar
and device information provided by pci_iomap.

Device naming - the topology on System z is that every PCI function is attached
directly to a PCI root complex. Therefore we decided to use a domain number per
function and leave the bus, slot and function 0.

Next one - IRQs. s390 will use its existing adapter interrupt infrastructure
to deliver MSI/MSI-X interrupts. Legacy INTs are _not_ supported at all. 
The only thing that s390 needs is to close the gap between adapter interrupts
and MSI, in other words to locate a struct msi_desc from an IRQ number.
For that I've implemented a simple hash, see patch 04.

Then there is Kconfig, see patch 09, which gives s390 a bunch of options that
we would rather avoid. Like VGA support, sound subsystem, etc. I do not have a
solution how to disable these subsystems. We would like to limit the subsystems
that become available on s390 because the usable PCI hardware will be limited
to a small number of hand-picked and supported PCI cards.

There are several options I can imagine:

a) Don't disable anything - that would result in lots of drivers builtable on
   s390 that are not supported and weird error reports

b) Introduce negative options for whole subsystems like we have already with
   NO_IOPORT, so maybe NO_USB, NO_SOUND, ...

c) Introduce a special CONFIG_PCI_S390 and add that explicitely to all
   subsystems that we want to enable on s390

None of the above looks like the obvious winner to me.


Hope to get some feedback,
TIA, Jan

Jan Glauber (10):
  s390/pci: base support
  s390/pci: CLP interface
  s390/bitops: find leftmost bit instruction support
  s390/pci: PCI adapter interrupts for MSI/MSI-X
  s390/pci: DMA support
  s390/pci: CHSC PCI support for error and availability events
  s390/pci: PCI hotplug support via SCLP
  s390/pci: s390 specific PCI sysfs attributes
  s390/pci: add PCI Kconfig options
  vga: compile fix, disable vga for s390

 arch/s390/Kbuild                    |    1 +
 arch/s390/Kconfig                   |   56 +-
 arch/s390/include/asm/bitops.h      |   81 +++
 arch/s390/include/asm/clp.h         |   28 +
 arch/s390/include/asm/dma-mapping.h |   76 +++
 arch/s390/include/asm/dma.h         |   19 +-
 arch/s390/include/asm/hw_irq.h      |   22 +
 arch/s390/include/asm/io.h          |   55 +-
 arch/s390/include/asm/irq.h         |   12 +
 arch/s390/include/asm/isc.h         |    1 +
 arch/s390/include/asm/pci.h         |  156 ++++-
 arch/s390/include/asm/pci_clp.h     |  182 ++++++
 arch/s390/include/asm/pci_dma.h     |  196 +++++++
 arch/s390/include/asm/pci_insn.h    |  280 +++++++++
 arch/s390/include/asm/pci_io.h      |  194 +++++++
 arch/s390/include/asm/sclp.h        |    2 +
 arch/s390/kernel/dis.c              |   15 +
 arch/s390/kernel/irq.c              |    2 +
 arch/s390/pci/Makefile              |    6 +
 arch/s390/pci/pci.c                 | 1098 +++++++++++++++++++++++++++++++++++
 arch/s390/pci/pci_clp.c             |  324 +++++++++++
 arch/s390/pci/pci_dma.c             |  505 ++++++++++++++++
 arch/s390/pci/pci_event.c           |   93 +++
 arch/s390/pci/pci_msi.c             |  141 +++++
 arch/s390/pci/pci_sysfs.c           |   86 +++
 drivers/gpu/vga/Kconfig             |    2 +-
 drivers/pci/hotplug/Kconfig         |   11 +
 drivers/pci/hotplug/Makefile        |    1 +
 drivers/pci/hotplug/s390_pci_hpc.c  |  252 ++++++++
 drivers/pci/msi.c                   |   10 +-
 drivers/s390/char/sclp.h            |    3 +-
 drivers/s390/char/sclp_cmd.c        |   64 +-
 drivers/s390/cio/chsc.c             |  156 +++--
 include/asm-generic/io.h            |   21 +-
 include/linux/irq.h                 |   10 +-
 include/video/vga.h                 |    2 +
 36 files changed, 4084 insertions(+), 79 deletions(-)
 create mode 100644 arch/s390/include/asm/clp.h
 create mode 100644 arch/s390/include/asm/dma-mapping.h
 create mode 100644 arch/s390/include/asm/hw_irq.h
 create mode 100644 arch/s390/include/asm/pci_clp.h
 create mode 100644 arch/s390/include/asm/pci_dma.h
 create mode 100644 arch/s390/include/asm/pci_insn.h
 create mode 100644 arch/s390/include/asm/pci_io.h
 create mode 100644 arch/s390/pci/Makefile
 create mode 100644 arch/s390/pci/pci.c
 create mode 100644 arch/s390/pci/pci_clp.c
 create mode 100644 arch/s390/pci/pci_dma.c
 create mode 100644 arch/s390/pci/pci_event.c
 create mode 100644 arch/s390/pci/pci_msi.c
 create mode 100644 arch/s390/pci/pci_sysfs.c
 create mode 100644 drivers/pci/hotplug/s390_pci_hpc.c

-- 
1.7.12.4


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC PATCH 01/10] s390/pci: base support
  2012-11-14  9:41 [RFC PATCH 00/10] s390/pci: PCI support on System z Jan Glauber
@ 2012-11-14  9:41 ` Jan Glauber
  2012-12-10 21:14   ` Bjorn Helgaas
  2012-11-14  9:41 ` [RFC PATCH 02/10] s390/pci: CLP interface Jan Glauber
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 15+ messages in thread
From: Jan Glauber @ 2012-11-14  9:41 UTC (permalink / raw)
  To: linux-pci; +Cc: linux-kernel, heiko.carstens, schwidefsky, Jan Glauber

Add PCI support for s390, (only 64 bit mode is supported by hardware):
- PCI facility tests
- PCI instructions: pcilg, pcistg, pcistb, stpcifc, mpcifc, rpcit
- map readb/w/l/q and writeb/w/l/q to pcilg and pcistg instructions
- pci_iomap implementation
- memcpy_fromio/toio
- pci_root_ops using special pcilg/pcistg
- device, bus and domain allocation

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
---
 arch/s390/Kbuild                 |   1 +
 arch/s390/include/asm/io.h       |  55 +++-
 arch/s390/include/asm/pci.h      |  82 +++++-
 arch/s390/include/asm/pci_insn.h | 280 ++++++++++++++++++++
 arch/s390/include/asm/pci_io.h   | 194 ++++++++++++++
 arch/s390/kernel/dis.c           |  15 ++
 arch/s390/pci/Makefile           |   5 +
 arch/s390/pci/pci.c              | 557 +++++++++++++++++++++++++++++++++++++++
 include/asm-generic/io.h         |  21 +-
 9 files changed, 1201 insertions(+), 9 deletions(-)
 create mode 100644 arch/s390/include/asm/pci_insn.h
 create mode 100644 arch/s390/include/asm/pci_io.h
 create mode 100644 arch/s390/pci/Makefile
 create mode 100644 arch/s390/pci/pci.c

diff --git a/arch/s390/Kbuild b/arch/s390/Kbuild
index cc45d25..647c3ec 100644
--- a/arch/s390/Kbuild
+++ b/arch/s390/Kbuild
@@ -6,3 +6,4 @@ obj-$(CONFIG_S390_HYPFS_FS)	+= hypfs/
 obj-$(CONFIG_APPLDATA_BASE)	+= appldata/
 obj-$(CONFIG_MATHEMU)		+= math-emu/
 obj-y				+= net/
+obj-$(CONFIG_PCI)		+= pci/
diff --git a/arch/s390/include/asm/io.h b/arch/s390/include/asm/io.h
index 559e921..5f3766b 100644
--- a/arch/s390/include/asm/io.h
+++ b/arch/s390/include/asm/io.h
@@ -9,9 +9,9 @@
 #ifndef _S390_IO_H
 #define _S390_IO_H
 
+#include <linux/kernel.h>
 #include <asm/page.h>
-
-#define IO_SPACE_LIMIT 0xffffffff
+#include <asm/pci_io.h>
 
 /*
  * Change virtual addresses to physical addresses and vv.
@@ -24,10 +24,11 @@ static inline unsigned long virt_to_phys(volatile void * address)
 		 "	lra	%0,0(%1)\n"
 		 "	jz	0f\n"
 		 "	la	%0,0\n"
-                 "0:"
+		 "0:"
 		 : "=a" (real_address) : "a" (address) : "cc");
-        return real_address;
+	return real_address;
 }
+#define virt_to_phys virt_to_phys
 
 static inline void * phys_to_virt(unsigned long address)
 {
@@ -42,4 +43,50 @@ void unxlate_dev_mem_ptr(unsigned long phys, void *addr);
  */
 #define xlate_dev_kmem_ptr(p)	p
 
+#define IO_SPACE_LIMIT 0
+
+#ifdef CONFIG_PCI
+
+#define ioremap_nocache(addr, size)	ioremap(addr, size)
+#define ioremap_wc			ioremap_nocache
+
+/* TODO: s390 cannot support io_remap_pfn_range... */
+#define io_remap_pfn_range(vma, vaddr, pfn, size, prot)                \
+	remap_pfn_range(vma, vaddr, pfn, size, prot)
+
+static inline void __iomem *ioremap(unsigned long offset, unsigned long size)
+{
+	return (void __iomem *) offset;
+}
+
+static inline void iounmap(volatile void __iomem *addr)
+{
+}
+
+/*
+ * s390 needs a private implementation of pci_iomap since ioremap with its
+ * offset parameter isn't sufficient. That's because BAR spaces are not
+ * disjunctive on s390 so we need the bar parameter of pci_iomap to find
+ * the corresponding device and create the mapping cookie.
+ */
+#define pci_iomap pci_iomap
+#define pci_iounmap pci_iounmap
+
+#define memcpy_fromio(dst, src, count)	zpci_memcpy_fromio(dst, src, count)
+#define memcpy_toio(dst, src, count)	zpci_memcpy_toio(dst, src, count)
+#define memset_io(dst, val, count)	zpci_memset_io(dst, val, count)
+
+#define __raw_readb	zpci_read_u8
+#define __raw_readw	zpci_read_u16
+#define __raw_readl	zpci_read_u32
+#define __raw_readq	zpci_read_u64
+#define __raw_writeb	zpci_write_u8
+#define __raw_writew	zpci_write_u16
+#define __raw_writel	zpci_write_u32
+#define __raw_writeq	zpci_write_u64
+
+#endif /* CONFIG_PCI */
+
+#include <asm-generic/io.h>
+
 #endif
diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 42a145c..5e286ce 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -1,10 +1,84 @@
 #ifndef __ASM_S390_PCI_H
 #define __ASM_S390_PCI_H
 
-/* S/390 systems don't have a PCI bus. This file is just here because some stupid .c code
- * includes it even if CONFIG_PCI is not set.
- */
+/* must be set before including asm-generic/pci.h */
 #define PCI_DMA_BUS_IS_PHYS (0)
+/* must be set before including pci_clp.h */
+#define PCI_BAR_COUNT	6
 
-#endif /* __ASM_S390_PCI_H */
+#include <asm-generic/pci.h>
+#include <asm-generic/pci-dma-compat.h>
 
+#define PCIBIOS_MIN_IO		0x1000
+#define PCIBIOS_MIN_MEM		0x10000000
+
+#define pcibios_assign_all_busses()	(0)
+
+void __iomem *pci_iomap(struct pci_dev *, int, unsigned long);
+void pci_iounmap(struct pci_dev *, void __iomem *);
+int pci_domain_nr(struct pci_bus *);
+int pci_proc_domain(struct pci_bus *);
+
+#define ZPCI_BUS_NR			0	/* default bus number */
+#define ZPCI_DEVFN			0	/* default device number */
+
+/* PCI Function Controls */
+#define ZPCI_FC_FN_ENABLED		0x80
+#define ZPCI_FC_ERROR			0x40
+#define ZPCI_FC_BLOCKED			0x20
+#define ZPCI_FC_DMA_ENABLED		0x10
+
+enum zpci_state {
+	ZPCI_FN_STATE_RESERVED,
+	ZPCI_FN_STATE_STANDBY,
+	ZPCI_FN_STATE_CONFIGURED,
+	ZPCI_FN_STATE_ONLINE,
+	NR_ZPCI_FN_STATES,
+};
+
+struct zpci_bar_struct {
+	u32		val;		/* bar start & 3 flag bits */
+	u8		size;		/* order 2 exponent */
+	u16		map_idx;	/* index into bar mapping array */
+};
+
+/* Private data per function */
+struct zpci_dev {
+	struct pci_dev	*pdev;
+	struct pci_bus  *bus;
+	struct list_head entry;		/* list of all zpci_devices, needed for hotplug, etc. */
+
+	enum zpci_state state;
+	u32		fid;		/* function ID, used by sclp */
+	u32		fh;		/* function handle, used by insn's */
+	u16		pchid;		/* physical channel ID */
+	u8		pfgid;		/* function group ID */
+	u16		domain;
+
+	struct zpci_bar_struct bars[PCI_BAR_COUNT];
+
+	enum pci_bus_speed max_bus_speed;
+};
+
+static inline bool zdev_enabled(struct zpci_dev *zdev)
+{
+	return (zdev->fh & (1UL << 31)) ? true : false;
+}
+
+/* -----------------------------------------------------------------------------
+  Prototypes
+----------------------------------------------------------------------------- */
+/* Base stuff */
+struct zpci_dev *zpci_alloc_device(void);
+int zpci_create_device(struct zpci_dev *);
+int zpci_enable_device(struct zpci_dev *);
+void zpci_stop_device(struct zpci_dev *);
+void zpci_free_device(struct zpci_dev *);
+int zpci_scan_device(struct zpci_dev *);
+
+/* Helpers */
+struct zpci_dev *get_zdev(struct pci_dev *);
+struct zpci_dev *get_zdev_by_fid(u32);
+bool zpci_fid_present(u32);
+
+#endif
diff --git a/arch/s390/include/asm/pci_insn.h b/arch/s390/include/asm/pci_insn.h
new file mode 100644
index 0000000..15b88b9
--- /dev/null
+++ b/arch/s390/include/asm/pci_insn.h
@@ -0,0 +1,280 @@
+#ifndef _ASM_S390_PCI_INSN_H
+#define _ASM_S390_PCI_INSN_H
+
+#include <linux/delay.h>
+
+#define ZPCI_INSN_BUSY_DELAY	1	/* 1 millisecond */
+
+/* Load/Store status codes */
+#define ZPCI_PCI_ST_FUNC_NOT_ENABLED		4
+#define ZPCI_PCI_ST_FUNC_IN_ERR			8
+#define ZPCI_PCI_ST_BLOCKED			12
+#define ZPCI_PCI_ST_INSUF_RES			16
+#define ZPCI_PCI_ST_INVAL_AS			20
+#define ZPCI_PCI_ST_FUNC_ALREADY_ENABLED	24
+#define ZPCI_PCI_ST_DMA_AS_NOT_ENABLED		28
+#define ZPCI_PCI_ST_2ND_OP_IN_INV_AS		36
+#define ZPCI_PCI_ST_FUNC_NOT_AVAIL		40
+#define ZPCI_PCI_ST_ALREADY_IN_RQ_STATE		44
+
+/* Load/Store return codes */
+#define ZPCI_PCI_LS_OK				0
+#define ZPCI_PCI_LS_ERR				1
+#define ZPCI_PCI_LS_BUSY			2
+#define ZPCI_PCI_LS_INVAL_HANDLE		3
+
+/* Load/Store address space identifiers */
+#define ZPCI_PCIAS_MEMIO_0			0
+#define ZPCI_PCIAS_MEMIO_1			1
+#define ZPCI_PCIAS_MEMIO_2			2
+#define ZPCI_PCIAS_MEMIO_3			3
+#define ZPCI_PCIAS_MEMIO_4			4
+#define ZPCI_PCIAS_MEMIO_5			5
+#define ZPCI_PCIAS_CFGSPC			15
+
+/* Modify PCI Function Controls */
+#define ZPCI_MOD_FC_REG_INT	2
+#define ZPCI_MOD_FC_DEREG_INT	3
+#define ZPCI_MOD_FC_REG_IOAT	4
+#define ZPCI_MOD_FC_DEREG_IOAT	5
+#define ZPCI_MOD_FC_REREG_IOAT	6
+#define ZPCI_MOD_FC_RESET_ERROR	7
+#define ZPCI_MOD_FC_RESET_BLOCK	9
+#define ZPCI_MOD_FC_SET_MEASURE	10
+
+/* FIB function controls */
+#define ZPCI_FIB_FC_ENABLED	0x80
+#define ZPCI_FIB_FC_ERROR	0x40
+#define ZPCI_FIB_FC_LS_BLOCKED	0x20
+#define ZPCI_FIB_FC_DMAAS_REG	0x10
+
+/* FIB function controls */
+#define ZPCI_FIB_FC_ENABLED	0x80
+#define ZPCI_FIB_FC_ERROR	0x40
+#define ZPCI_FIB_FC_LS_BLOCKED	0x20
+#define ZPCI_FIB_FC_DMAAS_REG	0x10
+
+/* Function Information Block */
+struct zpci_fib {
+	u32 fmt		:  8;	/* format */
+	u32		: 24;
+	u32 reserved1;
+	u8 fc;			/* function controls */
+	u8 reserved2;
+	u16 reserved3;
+	u32 reserved4;
+	u64 pba;		/* PCI base address */
+	u64 pal;		/* PCI address limit */
+	u64 iota;		/* I/O Translation Anchor */
+	u32		:  1;
+	u32 isc		:  3;	/* Interrupt subclass */
+	u32 noi		: 12;	/* Number of interrupts */
+	u32		:  2;
+	u32 aibvo	:  6;	/* Adapter interrupt bit vector offset */
+	u32 sum		:  1;	/* Adapter int summary bit enabled */
+	u32		:  1;
+	u32 aisbo	:  6;	/* Adapter int summary bit offset */
+	u32 reserved5;
+	u64 aibv;		/* Adapter int bit vector address */
+	u64 aisb;		/* Adapter int summary bit address */
+	u64 fmb_addr;		/* Function measurement block address and key */
+	u64 reserved6;
+	u64 reserved7;
+} __packed;
+
+/* Modify PCI Function Controls */
+static inline u8 __mpcifc(u64 req, struct zpci_fib *fib, u8 *status)
+{
+	u8 cc;
+
+	asm volatile (
+		"	.insn	rxy,0xe300000000d0,%[req],%[fib]\n"
+		"	ipm	%[cc]\n"
+		"	srl	%[cc],28\n"
+		: [cc] "=d" (cc), [req] "+d" (req), [fib] "+Q" (*fib)
+		: : "cc");
+	*status = req >> 24 & 0xff;
+	return cc;
+}
+
+static inline int mpcifc_instr(u64 req, struct zpci_fib *fib)
+{
+	u8 cc, status;
+
+	do {
+		cc = __mpcifc(req, fib, &status);
+		if (cc == 2)
+			msleep(ZPCI_INSN_BUSY_DELAY);
+	} while (cc == 2);
+
+	if (cc)
+		printk_once(KERN_ERR "%s: error cc: %d  status: %d\n",
+			     __func__, cc, status);
+	return (cc) ? -EIO : 0;
+}
+
+/* Refresh PCI Translations */
+static inline u8 __rpcit(u64 fn, u64 addr, u64 range, u8 *status)
+{
+	register u64 __addr asm("2") = addr;
+	register u64 __range asm("3") = range;
+	u8 cc;
+
+	asm volatile (
+		"	.insn	rre,0xb9d30000,%[fn],%[addr]\n"
+		"	ipm	%[cc]\n"
+		"	srl	%[cc],28\n"
+		: [cc] "=d" (cc), [fn] "+d" (fn)
+		: [addr] "d" (__addr), "d" (__range)
+		: "cc");
+	*status = fn >> 24 & 0xff;
+	return cc;
+}
+
+static inline int rpcit_instr(u64 fn, u64 addr, u64 range)
+{
+	u8 cc, status;
+
+	do {
+		cc = __rpcit(fn, addr, range, &status);
+		if (cc == 2)
+			msleep(ZPCI_INSN_BUSY_DELAY);
+	} while (cc == 2);
+
+	if (cc)
+		printk_once(KERN_ERR "%s: error cc: %d  status: %d  dma_addr: %Lx  size: %Lx\n",
+			    __func__, cc, status, addr, range);
+	return (cc) ? -EIO : 0;
+}
+
+/* Store PCI function controls */
+static inline u8 __stpcifc(u32 handle, u8 space, struct zpci_fib *fib, u8 *status)
+{
+	u64 fn = (u64) handle << 32 | space << 16;
+	u8 cc;
+
+	asm volatile (
+		"	.insn	rxy,0xe300000000d4,%[fn],%[fib]\n"
+		"	ipm	%[cc]\n"
+		"	srl	%[cc],28\n"
+		: [cc] "=d" (cc), [fn] "+d" (fn), [fib] "=m" (*fib)
+		: : "cc");
+	*status = fn >> 24 & 0xff;
+	return cc;
+}
+
+/* Set Interruption Controls */
+static inline void sic_instr(u16 ctl, char *unused, u8 isc)
+{
+	asm volatile (
+		"	.insn	rsy,0xeb00000000d1,%[ctl],%[isc],%[u]\n"
+		: : [ctl] "d" (ctl), [isc] "d" (isc << 27), [u] "Q" (*unused));
+}
+
+/* PCI Load */
+static inline u8 __pcilg(u64 *data, u64 req, u64 offset, u8 *status)
+{
+	register u64 __req asm("2") = req;
+	register u64 __offset asm("3") = offset;
+	u64 __data;
+	u8 cc;
+
+	asm volatile (
+		"	.insn	rre,0xb9d20000,%[data],%[req]\n"
+		"	ipm	%[cc]\n"
+		"	srl	%[cc],28\n"
+		: [cc] "=d" (cc), [data] "=d" (__data), [req] "+d" (__req)
+		:  "d" (__offset)
+		: "cc");
+	*status = __req >> 24 & 0xff;
+	*data = __data;
+	return cc;
+}
+
+static inline int pcilg_instr(u64 *data, u64 req, u64 offset)
+{
+	u8 cc, status;
+
+	do {
+		cc = __pcilg(data, req, offset, &status);
+		if (cc == 2)
+			msleep(ZPCI_INSN_BUSY_DELAY);
+	} while (cc == 2);
+
+	if (cc) {
+		printk_once(KERN_ERR "%s: error cc: %d  status: %d  req: %Lx  offset: %Lx\n",
+			    __func__, cc, status, req, offset);
+		/* TODO: on IO errors set data to 0xff...
+		 * here or in users of pcilg (le conversion)?
+		 */
+	}
+	return (cc) ? -EIO : 0;
+}
+
+/* PCI Store */
+static inline u8 __pcistg(u64 data, u64 req, u64 offset, u8 *status)
+{
+	register u64 __req asm("2") = req;
+	register u64 __offset asm("3") = offset;
+	u8 cc;
+
+	asm volatile (
+		"	.insn	rre,0xb9d00000,%[data],%[req]\n"
+		"	ipm	%[cc]\n"
+		"	srl	%[cc],28\n"
+		: [cc] "=d" (cc), [req] "+d" (__req)
+		: "d" (__offset), [data] "d" (data)
+		: "cc");
+	*status = __req >> 24 & 0xff;
+	return cc;
+}
+
+static inline int pcistg_instr(u64 data, u64 req, u64 offset)
+{
+	u8 cc, status;
+
+	do {
+		cc = __pcistg(data, req, offset, &status);
+		if (cc == 2)
+			msleep(ZPCI_INSN_BUSY_DELAY);
+	} while (cc == 2);
+
+	if (cc)
+		printk_once(KERN_ERR "%s: error cc: %d  status: %d  req: %Lx  offset: %Lx\n",
+			__func__, cc, status, req, offset);
+	return (cc) ? -EIO : 0;
+}
+
+/* PCI Store Block */
+static inline u8 __pcistb(const u64 *data, u64 req, u64 offset, u8 *status)
+{
+	u8 cc;
+
+	asm volatile (
+		"	.insn	rsy,0xeb00000000d0,%[req],%[offset],%[data]\n"
+		"	ipm	%[cc]\n"
+		"	srl	%[cc],28\n"
+		: [cc] "=d" (cc), [req] "+d" (req)
+		: [offset] "d" (offset), [data] "Q" (*data)
+		: "cc");
+	*status = req >> 24 & 0xff;
+	return cc;
+}
+
+static inline int pcistb_instr(const u64 *data, u64 req, u64 offset)
+{
+	u8 cc, status;
+
+	do {
+		cc = __pcistb(data, req, offset, &status);
+		if (cc == 2)
+			msleep(ZPCI_INSN_BUSY_DELAY);
+	} while (cc == 2);
+
+	if (cc)
+		printk_once(KERN_ERR "%s: error cc: %d  status: %d  req: %Lx  offset: %Lx\n",
+			    __func__, cc, status, req, offset);
+	return (cc) ? -EIO : 0;
+}
+
+#endif
diff --git a/arch/s390/include/asm/pci_io.h b/arch/s390/include/asm/pci_io.h
new file mode 100644
index 0000000..5fd81f3
--- /dev/null
+++ b/arch/s390/include/asm/pci_io.h
@@ -0,0 +1,194 @@
+#ifndef _ASM_S390_PCI_IO_H
+#define _ASM_S390_PCI_IO_H
+
+#ifdef CONFIG_PCI
+
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <asm/pci_insn.h>
+
+/* I/O Map */
+#define ZPCI_IOMAP_MAX_ENTRIES		0x7fff
+#define ZPCI_IOMAP_ADDR_BASE		0x8000000000000000ULL
+#define ZPCI_IOMAP_ADDR_IDX_MASK	0x7fff000000000000ULL
+#define ZPCI_IOMAP_ADDR_OFF_MASK	0x0000ffffffffffffULL
+
+struct zpci_iomap_entry {
+	u32 fh;
+	u8 bar;
+};
+
+extern struct zpci_iomap_entry *zpci_iomap_start;
+
+#define ZPCI_IDX(addr)								\
+	(((__force u64) addr & ZPCI_IOMAP_ADDR_IDX_MASK) >> 48)
+#define ZPCI_OFFSET(addr)							\
+	((__force u64) addr & ZPCI_IOMAP_ADDR_OFF_MASK)
+
+#define ZPCI_CREATE_REQ(handle, space, len)					\
+	((u64) handle << 32 | space << 16 | len)
+
+#define zpci_read(LENGTH, RETTYPE)						\
+static inline RETTYPE zpci_read_##RETTYPE(const volatile void __iomem *addr)	\
+{										\
+	struct zpci_iomap_entry *entry = &zpci_iomap_start[ZPCI_IDX(addr)];	\
+	u64 req = ZPCI_CREATE_REQ(entry->fh, entry->bar, LENGTH);		\
+	u64 data;								\
+	int rc;									\
+										\
+	rc = pcilg_instr(&data, req, ZPCI_OFFSET(addr));			\
+	if (rc)									\
+		data = -1ULL;							\
+	return (RETTYPE) data;							\
+}
+
+#define zpci_write(LENGTH, VALTYPE)						\
+static inline void zpci_write_##VALTYPE(VALTYPE val,				\
+					const volatile void __iomem *addr)	\
+{										\
+	struct zpci_iomap_entry *entry = &zpci_iomap_start[ZPCI_IDX(addr)];	\
+	u64 req = ZPCI_CREATE_REQ(entry->fh, entry->bar, LENGTH);		\
+	u64 data = (VALTYPE) val;						\
+										\
+	pcistg_instr(data, req, ZPCI_OFFSET(addr));				\
+}
+
+zpci_read(8, u64)
+zpci_read(4, u32)
+zpci_read(2, u16)
+zpci_read(1, u8)
+zpci_write(8, u64)
+zpci_write(4, u32)
+zpci_write(2, u16)
+zpci_write(1, u8)
+
+static inline int zpci_write_single(u64 req, const u64 *data, u64 offset, u8 len)
+{
+	u64 val;
+
+	switch (len) {
+	case 1:
+		val = (u64) *((u8 *) data);
+		break;
+	case 2:
+		val = (u64) *((u16 *) data);
+		break;
+	case 4:
+		val = (u64) *((u32 *) data);
+		break;
+	case 8:
+		val = (u64) *((u64 *) data);
+		break;
+	default:
+		val = 0;		/* let FW report error */
+		break;
+	}
+	return pcistg_instr(val, req, offset);
+}
+
+static inline int zpci_read_single(u64 req, u64 *dst, u64 offset, u8 len)
+{
+	u64 data;
+	u8 cc;
+
+	cc = pcilg_instr(&data,	 req, offset);
+	switch (len) {
+	case 1:
+		*((u8 *) dst) = (u8) data;
+		break;
+	case 2:
+		*((u16 *) dst) = (u16) data;
+		break;
+	case 4:
+		*((u32 *) dst) = (u32) data;
+		break;
+	case 8:
+		*((u64 *) dst) = (u64) data;
+		break;
+	}
+	return cc;
+}
+
+static inline int zpci_write_block(u64 req, const u64 *data, u64 offset)
+{
+	return pcistb_instr(data, req, offset);
+}
+
+static inline u8 zpci_get_max_write_size(u64 src, u64 dst, int len, int max)
+{
+	int count = len > max ? max : len, size = 1;
+
+	while (!(src & 0x1) && !(dst & 0x1) && ((size << 1) <= count)) {
+		dst = dst >> 1;
+		src = src >> 1;
+		size = size << 1;
+	}
+	return size;
+}
+
+static inline int zpci_memcpy_fromio(void *dst,
+				     const volatile void __iomem *src,
+				     unsigned long n)
+{
+	struct zpci_iomap_entry *entry = &zpci_iomap_start[ZPCI_IDX(src)];
+	u64 req, offset = ZPCI_OFFSET(src);
+	int size, rc = 0;
+
+	while (n > 0) {
+		size = zpci_get_max_write_size((u64) src, (u64) dst, n, 8);
+		req = ZPCI_CREATE_REQ(entry->fh, entry->bar, size);
+		rc = zpci_read_single(req, dst, offset, size);
+		if (rc)
+			break;
+		offset += size;
+		dst += size;
+		n -= size;
+	}
+	return rc;
+}
+
+static inline int zpci_memcpy_toio(volatile void __iomem *dst,
+				   const void *src, unsigned long n)
+{
+	struct zpci_iomap_entry *entry = &zpci_iomap_start[ZPCI_IDX(dst)];
+	u64 req, offset = ZPCI_OFFSET(dst);
+	int size, rc = 0;
+
+	if (!src)
+		return -EINVAL;
+
+	while (n > 0) {
+		size = zpci_get_max_write_size((u64) dst, (u64) src, n, 128);
+		req = ZPCI_CREATE_REQ(entry->fh, entry->bar, size);
+
+		if (size > 8) /* main path */
+			rc = zpci_write_block(req, src, offset);
+		else
+			rc = zpci_write_single(req, src, offset, size);
+		if (rc)
+			break;
+		offset += size;
+		src += size;
+		n -= size;
+	}
+	return rc;
+}
+
+static inline int zpci_memset_io(volatile void __iomem *dst,
+				 unsigned char val, size_t count)
+{
+	u8 *src = kmalloc(count, GFP_KERNEL);
+	int rc;
+
+	if (src == NULL)
+		return -ENOMEM;
+	memset(src, val, count);
+
+	rc = zpci_memcpy_toio(dst, src, count);
+	kfree(src);
+	return rc;
+}
+
+#endif /* CONFIG_PCI */
+
+#endif /* _ASM_S390_PCI_IO_H */
diff --git a/arch/s390/kernel/dis.c b/arch/s390/kernel/dis.c
index f00286b..8c6c421 100644
--- a/arch/s390/kernel/dis.c
+++ b/arch/s390/kernel/dis.c
@@ -320,6 +320,10 @@ enum {
 	LONG_INSN_TABORT,
 	LONG_INSN_TBEGIN,
 	LONG_INSN_TBEGINC,
+	LONG_INSN_PCISTG,
+	LONG_INSN_MPCIFC,
+	LONG_INSN_STPCIFC,
+	LONG_INSN_PCISTB,
 };
 
 static char *long_insn_name[] = {
@@ -340,6 +344,10 @@ static char *long_insn_name[] = {
 	[LONG_INSN_TABORT] = "tabort",
 	[LONG_INSN_TBEGIN] = "tbegin",
 	[LONG_INSN_TBEGINC] = "tbeginc",
+	[LONG_INSN_PCISTG] = "pcistg",
+	[LONG_INSN_MPCIFC] = "mpcifc",
+	[LONG_INSN_STPCIFC] = "stpcifc",
+	[LONG_INSN_PCISTB] = "pcistb",
 };
 
 static struct insn opcode[] = {
@@ -946,6 +954,9 @@ static struct insn opcode_b9[] = {
 	{ "slhhh", 0xcb, INSTR_RRF_R0RR2 },
 	{ "chhr ", 0xcd, INSTR_RRE_RR },
 	{ "clhhr", 0xcf, INSTR_RRE_RR },
+	{ { 0, LONG_INSN_PCISTG }, 0xd0, INSTR_RRE_RR },
+	{ "pcilg", 0xd2, INSTR_RRE_RR },
+	{ "rpcit", 0xd3, INSTR_RRE_RR },
 	{ "ahhlr", 0xd8, INSTR_RRF_R0RR2 },
 	{ "shhlr", 0xd9, INSTR_RRF_R0RR2 },
 	{ "slhhl", 0xdb, INSTR_RRF_R0RR2 },
@@ -1175,6 +1186,8 @@ static struct insn opcode_e3[] = {
 	{ "chf", 0xcd, INSTR_RXY_RRRD },
 	{ "clhf", 0xcf, INSTR_RXY_RRRD },
 	{ "ntstg", 0x25, INSTR_RXY_RRRD },
+	{ { 0, LONG_INSN_MPCIFC }, 0xd0, INSTR_RXY_RRRD },
+	{ { 0, LONG_INSN_STPCIFC }, 0xd4, INSTR_RXY_RRRD },
 #endif
 	{ "lrv", 0x1e, INSTR_RXY_RRRD },
 	{ "lrvh", 0x1f, INSTR_RXY_RRRD },
@@ -1254,6 +1267,8 @@ static struct insn opcode_eb[] = {
 	{ "alsi", 0x6e, INSTR_SIY_IRD },
 	{ "algsi", 0x7e, INSTR_SIY_IRD },
 	{ "ecag", 0x4c, INSTR_RSY_RRRD },
+	{ { 0, LONG_INSN_PCISTB }, 0xd0, INSTR_RSY_RRRD },
+	{ "sic", 0xd1, INSTR_RSY_RRRD },
 	{ "srak", 0xdc, INSTR_RSY_RRRD },
 	{ "slak", 0xdd, INSTR_RSY_RRRD },
 	{ "srlk", 0xde, INSTR_RSY_RRRD },
diff --git a/arch/s390/pci/Makefile b/arch/s390/pci/Makefile
new file mode 100644
index 0000000..78a1344
--- /dev/null
+++ b/arch/s390/pci/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for the s390 PCI subsystem.
+#
+
+obj-$(CONFIG_PCI)	+= pci.o
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
new file mode 100644
index 0000000..0b80ac7
--- /dev/null
+++ b/arch/s390/pci/pci.c
@@ -0,0 +1,557 @@
+/*
+ * Copyright IBM Corp. 2012
+ *
+ * Author(s):
+ *   Jan Glauber <jang@linux.vnet.ibm.com>
+ *
+ * The System z PCI code is a rewrite from a prototype by
+ * the following people (Kudoz!):
+ *   Alexander Schmidt <alexschm@de.ibm.com>
+ *   Christoph Raisch <raisch@de.ibm.com>
+ *   Hannes Hering <hering2@de.ibm.com>
+ *   Hoang-Nam Nguyen <hnguyen@de.ibm.com>
+ *   Jan-Bernd Themann <themann@de.ibm.com>
+ *   Stefan Roscher <stefan.roscher@de.ibm.com>
+ *   Thomas Klein <tklein@de.ibm.com>
+ */
+
+#define COMPONENT "zPCI"
+#define pr_fmt(fmt) COMPONENT ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/err.h>
+#include <linux/export.h>
+#include <linux/delay.h>
+#include <linux/seq_file.h>
+#include <linux/pci.h>
+#include <linux/msi.h>
+
+#include <asm/facility.h>
+#include <asm/pci_insn.h>
+
+#define DEBUG				/* enable pr_debug */
+
+#define ZPCI_NR_DMA_SPACES		1
+#define ZPCI_NR_DEVICES			CONFIG_PCI_NR_FUNCTIONS
+
+/* list of all detected zpci devices */
+LIST_HEAD(zpci_list);
+DEFINE_MUTEX(zpci_list_lock);
+
+static DECLARE_BITMAP(zpci_domain, ZPCI_NR_DEVICES);
+static DEFINE_SPINLOCK(zpci_domain_lock);
+
+/* I/O Map */
+static DEFINE_SPINLOCK(zpci_iomap_lock);
+static DECLARE_BITMAP(zpci_iomap, ZPCI_IOMAP_MAX_ENTRIES);
+struct zpci_iomap_entry *zpci_iomap_start;
+EXPORT_SYMBOL_GPL(zpci_iomap_start);
+
+struct zpci_dev *get_zdev(struct pci_dev *pdev)
+{
+	return (struct zpci_dev *) pdev->sysdata;
+}
+
+struct zpci_dev *get_zdev_by_fid(u32 fid)
+{
+	struct zpci_dev *tmp, *zdev = NULL;
+
+	mutex_lock(&zpci_list_lock);
+	list_for_each_entry(tmp, &zpci_list, entry) {
+		if (tmp->fid == fid) {
+			zdev = tmp;
+			break;
+		}
+	}
+	mutex_unlock(&zpci_list_lock);
+	return zdev;
+}
+
+bool zpci_fid_present(u32 fid)
+{
+	return (get_zdev_by_fid(fid) != NULL) ? true : false;
+}
+
+static struct zpci_dev *get_zdev_by_bus(struct pci_bus *bus)
+{
+	return (bus && bus->sysdata) ? (struct zpci_dev *) bus->sysdata : NULL;
+}
+
+int pci_domain_nr(struct pci_bus *bus)
+{
+	return ((struct zpci_dev *) bus->sysdata)->domain;
+}
+EXPORT_SYMBOL_GPL(pci_domain_nr);
+
+int pci_proc_domain(struct pci_bus *bus)
+{
+	return pci_domain_nr(bus);
+}
+EXPORT_SYMBOL_GPL(pci_proc_domain);
+
+/* Store PCI function information block */
+static int zpci_store_fib(struct zpci_dev *zdev, u8 *fc)
+{
+	struct zpci_fib *fib;
+	u8 status, cc;
+
+	fib = (void *) get_zeroed_page(GFP_KERNEL);
+	if (!fib)
+		return -ENOMEM;
+
+	do {
+		cc = __stpcifc(zdev->fh, 0, fib, &status);
+		if (cc == 2) {
+			msleep(ZPCI_INSN_BUSY_DELAY);
+			memset(fib, 0, PAGE_SIZE);
+		}
+	} while (cc == 2);
+
+	if (cc)
+		pr_err_once("%s: cc: %u  status: %u\n",
+			    __func__, cc, status);
+
+	/* Return PCI function controls */
+	*fc = fib->fc;
+
+	free_page((unsigned long) fib);
+	return (cc) ? -EIO : 0;
+}
+
+#define ZPCI_PCIAS_CFGSPC	15
+
+static int zpci_cfg_load(struct zpci_dev *zdev, int offset, u32 *val, u8 len)
+{
+	u64 req = ZPCI_CREATE_REQ(zdev->fh, ZPCI_PCIAS_CFGSPC, len);
+	u64 data;
+	int rc;
+
+	rc = pcilg_instr(&data, req, offset);
+	data = data << ((8 - len) * 8);
+	data = le64_to_cpu(data);
+	if (!rc)
+		*val = (u32) data;
+	else
+		*val = 0xffffffff;
+	return rc;
+}
+
+static int zpci_cfg_store(struct zpci_dev *zdev, int offset, u32 val, u8 len)
+{
+	u64 req = ZPCI_CREATE_REQ(zdev->fh, ZPCI_PCIAS_CFGSPC, len);
+	u64 data = val;
+	int rc;
+
+	data = cpu_to_le64(data);
+	data = data >> ((8 - len) * 8);
+	rc = pcistg_instr(data, req, offset);
+	return rc;
+}
+
+void __devinit pcibios_fixup_bus(struct pci_bus *bus)
+{
+}
+
+resource_size_t pcibios_align_resource(void *data, const struct resource *res,
+				       resource_size_t size,
+				       resource_size_t align)
+{
+	return 0;
+}
+
+/* Create a virtual mapping cookie for a PCI BAR */
+void __iomem *pci_iomap(struct pci_dev *pdev, int bar, unsigned long max)
+{
+	struct zpci_dev *zdev =	get_zdev(pdev);
+	u64 addr;
+	int idx;
+
+	if ((bar & 7) != bar)
+		return NULL;
+
+	idx = zdev->bars[bar].map_idx;
+	spin_lock(&zpci_iomap_lock);
+	zpci_iomap_start[idx].fh = zdev->fh;
+	zpci_iomap_start[idx].bar = bar;
+	spin_unlock(&zpci_iomap_lock);
+
+	addr = ZPCI_IOMAP_ADDR_BASE | ((u64) idx << 48);
+	return (void __iomem *) addr;
+}
+EXPORT_SYMBOL_GPL(pci_iomap);
+
+void pci_iounmap(struct pci_dev *pdev, void __iomem *addr)
+{
+	unsigned int idx;
+
+	idx = (((__force u64) addr) & ~ZPCI_IOMAP_ADDR_BASE) >> 48;
+	spin_lock(&zpci_iomap_lock);
+	zpci_iomap_start[idx].fh = 0;
+	zpci_iomap_start[idx].bar = 0;
+	spin_unlock(&zpci_iomap_lock);
+}
+EXPORT_SYMBOL_GPL(pci_iounmap);
+
+static int pci_read(struct pci_bus *bus, unsigned int devfn, int where,
+		    int size, u32 *val)
+{
+	struct zpci_dev *zdev = get_zdev_by_bus(bus);
+
+	if (!zdev || devfn != ZPCI_DEVFN)
+		return 0;
+	return zpci_cfg_load(zdev, where, val, size);
+}
+
+static int pci_write(struct pci_bus *bus, unsigned int devfn, int where,
+		     int size, u32 val)
+{
+	struct zpci_dev *zdev = get_zdev_by_bus(bus);
+
+	if (!zdev || devfn != ZPCI_DEVFN)
+		return 0;
+	return zpci_cfg_store(zdev, where, val, size);
+}
+
+static struct pci_ops pci_root_ops = {
+	.read = pci_read,
+	.write = pci_write,
+};
+
+static void zpci_map_resources(struct zpci_dev *zdev)
+{
+	struct pci_dev *pdev = zdev->pdev;
+	resource_size_t len;
+	int i;
+
+	for (i = 0; i < PCI_BAR_COUNT; i++) {
+		len = pci_resource_len(pdev, i);
+		if (!len)
+			continue;
+		pdev->resource[i].start = (resource_size_t) pci_iomap(pdev, i, 0);
+		pdev->resource[i].end = pdev->resource[i].start + len - 1;
+		pr_debug("BAR%i: -> start: %Lx  end: %Lx\n",
+			i, pdev->resource[i].start, pdev->resource[i].end);
+	}
+};
+
+static void zpci_unmap_resources(struct pci_dev *pdev)
+{
+	resource_size_t len;
+	int i;
+
+	for (i = 0; i < PCI_BAR_COUNT; i++) {
+		len = pci_resource_len(pdev, i);
+		if (!len)
+			continue;
+		pci_iounmap(pdev, (void *) pdev->resource[i].start);
+	}
+};
+
+struct zpci_dev *zpci_alloc_device(void)
+{
+	struct zpci_dev *zdev;
+
+	/* Alloc memory for our private pci device data */
+	zdev = kzalloc(sizeof(*zdev), GFP_KERNEL);
+	if (!zdev)
+		return ERR_PTR(-ENOMEM);
+	return zdev;
+}
+
+void zpci_free_device(struct zpci_dev *zdev)
+{
+	kfree(zdev);
+}
+
+/* Called on removal of pci_dev, leaves zpci and bus device */
+static void zpci_remove_device(struct pci_dev *pdev)
+{
+	struct zpci_dev *zdev = get_zdev(pdev);
+
+	dev_info(&pdev->dev, "Removing device %u\n", zdev->domain);
+	zdev->state = ZPCI_FN_STATE_CONFIGURED;
+	zpci_unmap_resources(pdev);
+	list_del(&zdev->entry);		/* can be called from init */
+	zdev->pdev = NULL;
+}
+
+static void zpci_scan_devices(void)
+{
+	struct zpci_dev *zdev;
+
+	mutex_lock(&zpci_list_lock);
+	list_for_each_entry(zdev, &zpci_list, entry)
+		if (zdev->state == ZPCI_FN_STATE_CONFIGURED)
+			zpci_scan_device(zdev);
+	mutex_unlock(&zpci_list_lock);
+}
+
+/*
+ * Too late for any s390 specific setup, since interrupts must be set up
+ * already which requires DMA setup too and the pci scan will access the
+ * config space, which only works if the function handle is enabled.
+ */
+int pcibios_enable_device(struct pci_dev *pdev, int mask)
+{
+	struct resource *res;
+	u16 cmd;
+	int i;
+
+	pci_read_config_word(pdev, PCI_COMMAND, &cmd);
+
+	for (i = 0; i < PCI_BAR_COUNT; i++) {
+		res = &pdev->resource[i];
+
+		if (res->flags & IORESOURCE_IO)
+			return -EINVAL;
+
+		if (res->flags & IORESOURCE_MEM)
+			cmd |= PCI_COMMAND_MEMORY;
+	}
+	pci_write_config_word(pdev, PCI_COMMAND, cmd);
+	return 0;
+}
+
+void pcibios_disable_device(struct pci_dev *pdev)
+{
+	zpci_remove_device(pdev);
+	pdev->sysdata = NULL;
+}
+
+static struct resource *zpci_alloc_bus_resource(unsigned long start, unsigned long size,
+						unsigned long flags, int domain)
+{
+	struct resource *r;
+	char *name;
+	int rc;
+
+	r = kzalloc(sizeof(*r), GFP_KERNEL);
+	if (!r)
+		return ERR_PTR(-ENOMEM);
+	r->start = start;
+	r->end = r->start + size - 1;
+	r->flags = flags;
+	r->parent = &iomem_resource;
+	name = kmalloc(18, GFP_KERNEL);
+	if (!name) {
+		kfree(r);
+		return ERR_PTR(-ENOMEM);
+	}
+	sprintf(name, "PCI Bus: %04x:%02x", domain, ZPCI_BUS_NR);
+	r->name = name;
+
+	rc = request_resource(&iomem_resource, r);
+	if (rc)
+		pr_debug("request resource %pR failed\n", r);
+	return r;
+}
+
+static int zpci_alloc_iomap(struct zpci_dev *zdev)
+{
+	int entry;
+
+	spin_lock(&zpci_iomap_lock);
+	entry = find_first_zero_bit(zpci_iomap, ZPCI_IOMAP_MAX_ENTRIES);
+	if (entry == ZPCI_IOMAP_MAX_ENTRIES) {
+		spin_unlock(&zpci_iomap_lock);
+		return -ENOSPC;
+	}
+	set_bit(entry, zpci_iomap);
+	spin_unlock(&zpci_iomap_lock);
+	return entry;
+}
+
+static void zpci_free_iomap(struct zpci_dev *zdev, int entry)
+{
+	spin_lock(&zpci_iomap_lock);
+	memset(&zpci_iomap_start[entry], 0, sizeof(struct zpci_iomap_entry));
+	clear_bit(entry, zpci_iomap);
+	spin_unlock(&zpci_iomap_lock);
+}
+
+static int zpci_create_device_bus(struct zpci_dev *zdev)
+{
+	struct resource *res;
+	LIST_HEAD(resources);
+	int i;
+
+	/* allocate mapping entry for each used bar */
+	for (i = 0; i < PCI_BAR_COUNT; i++) {
+		unsigned long addr, size, flags;
+		int entry;
+
+		if (!zdev->bars[i].size)
+			continue;
+		entry = zpci_alloc_iomap(zdev);
+		if (entry < 0)
+			return entry;
+		zdev->bars[i].map_idx = entry;
+
+		/* only MMIO is supported */
+		flags = IORESOURCE_MEM;
+		if (zdev->bars[i].val & 8)
+			flags |= IORESOURCE_PREFETCH;
+		if (zdev->bars[i].val & 4)
+			flags |= IORESOURCE_MEM_64;
+
+		addr = ZPCI_IOMAP_ADDR_BASE + ((u64) entry << 48);
+
+		size = 1UL << zdev->bars[i].size;
+
+		res = zpci_alloc_bus_resource(addr, size, flags, zdev->domain);
+		if (IS_ERR(res)) {
+			zpci_free_iomap(zdev, entry);
+			return PTR_ERR(res);
+		}
+		pci_add_resource(&resources, res);
+	}
+
+	zdev->bus = pci_create_root_bus(NULL, ZPCI_BUS_NR, &pci_root_ops,
+					zdev, &resources);
+	if (!zdev->bus)
+		return -EIO;
+
+	zdev->bus->max_bus_speed = zdev->max_bus_speed;
+	return 0;
+}
+
+static int zpci_alloc_domain(struct zpci_dev *zdev)
+{
+	spin_lock(&zpci_domain_lock);
+	zdev->domain = find_first_zero_bit(zpci_domain, ZPCI_NR_DEVICES);
+	if (zdev->domain == ZPCI_NR_DEVICES) {
+		spin_unlock(&zpci_domain_lock);
+		return -ENOSPC;
+	}
+	set_bit(zdev->domain, zpci_domain);
+	spin_unlock(&zpci_domain_lock);
+	return 0;
+}
+
+static void zpci_free_domain(struct zpci_dev *zdev)
+{
+	spin_lock(&zpci_domain_lock);
+	clear_bit(zdev->domain, zpci_domain);
+	spin_unlock(&zpci_domain_lock);
+}
+
+int zpci_create_device(struct zpci_dev *zdev)
+{
+	int rc;
+
+	rc = zpci_alloc_domain(zdev);
+	if (rc)
+		goto out;
+
+	rc = zpci_create_device_bus(zdev);
+	if (rc)
+		goto out_bus;
+
+	mutex_lock(&zpci_list_lock);
+	list_add_tail(&zdev->entry, &zpci_list);
+	mutex_unlock(&zpci_list_lock);
+
+	if (zdev->state == ZPCI_FN_STATE_STANDBY)
+		return 0;
+
+	return 0;
+
+out_bus:
+	zpci_free_domain(zdev);
+out:
+	return rc;
+}
+
+void zpci_stop_device(struct zpci_dev *zdev)
+{
+	/*
+	 * Note: SCLP disables fh via set-pci-fn so don't
+	 * do that here.
+	 */
+}
+EXPORT_SYMBOL_GPL(zpci_stop_device);
+
+int zpci_scan_device(struct zpci_dev *zdev)
+{
+	zdev->pdev = pci_scan_single_device(zdev->bus, ZPCI_DEVFN);
+	if (!zdev->pdev) {
+		pr_err("pci_scan_single_device failed for fid: 0x%x\n",
+			zdev->fid);
+		goto out;
+	}
+
+	zpci_map_resources(zdev);
+	pci_bus_add_devices(zdev->bus);
+
+	/* now that pdev was added to the bus mark it as used */
+	zdev->state = ZPCI_FN_STATE_ONLINE;
+	return 0;
+
+out:
+	return -EIO;
+}
+EXPORT_SYMBOL_GPL(zpci_scan_device);
+
+static inline int barsize(u8 size)
+{
+	return (size) ? (1 << size) >> 10 : 0;
+}
+
+static int zpci_mem_init(void)
+{
+	/* TODO: use realloc */
+	zpci_iomap_start = kzalloc(ZPCI_IOMAP_MAX_ENTRIES * sizeof(*zpci_iomap_start),
+				   GFP_KERNEL);
+	if (!zpci_iomap_start)
+		goto error_zdev;
+	return 0;
+
+error_zdev:
+	return -ENOMEM;
+}
+
+static void zpci_mem_exit(void)
+{
+	kfree(zpci_iomap_start);
+}
+
+unsigned int pci_probe = 1;
+EXPORT_SYMBOL_GPL(pci_probe);
+
+char * __init pcibios_setup(char *str)
+{
+	if (!strcmp(str, "off")) {
+		pci_probe = 0;
+		return NULL;
+	}
+	return str;
+}
+
+static int __init pci_base_init(void)
+{
+	int rc;
+
+	if (!pci_probe)
+		return 0;
+
+	if (!test_facility(2) || !test_facility(69)
+	    || !test_facility(71) || !test_facility(72))
+		return 0;
+
+	pr_info("Probing PCI hardware: PCI:%d  SID:%d  AEN:%d\n",
+		test_facility(69), test_facility(70),
+		test_facility(71));
+
+	rc = zpci_mem_init();
+	if (rc)
+		goto out_mem;
+
+	zpci_scan_devices();
+	return 0;
+
+	zpci_mem_exit();
+out_mem:
+	return rc;
+}
+subsys_initcall(pci_base_init);
diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index 448303b..9e0ebe0 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -83,19 +83,25 @@ static inline void __raw_writel(u32 b, volatile void __iomem *addr)
 #define writel(b,addr) __raw_writel(__cpu_to_le32(b),addr)
 
 #ifdef CONFIG_64BIT
+#ifndef __raw_readq
 static inline u64 __raw_readq(const volatile void __iomem *addr)
 {
 	return *(const volatile u64 __force *) addr;
 }
+#endif
+
 #define readq(addr) __le64_to_cpu(__raw_readq(addr))
 
+#ifndef __raw_writeq
 static inline void __raw_writeq(u64 b, volatile void __iomem *addr)
 {
 	*(volatile u64 __force *) addr = b;
 }
-#define writeq(b,addr) __raw_writeq(__cpu_to_le64(b),addr)
 #endif
 
+#define writeq(b, addr) __raw_writeq(__cpu_to_le64(b), addr)
+#endif /* CONFIG_64BIT */
+
 #ifndef PCI_IOBASE
 #define PCI_IOBASE ((void __iomem *) 0)
 #endif
@@ -286,15 +292,20 @@ static inline void writesb(const void __iomem *addr, const void *buf, int len)
 
 #ifndef CONFIG_GENERIC_IOMAP
 struct pci_dev;
+extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max);
+
+#ifndef pci_iounmap
 static inline void pci_iounmap(struct pci_dev *dev, void __iomem *p)
 {
 }
+#endif
 #endif /* CONFIG_GENERIC_IOMAP */
 
 /*
  * Change virtual addresses to physical addresses and vv.
  * These are pretty trivial
  */
+#ifndef virt_to_phys
 static inline unsigned long virt_to_phys(volatile void *address)
 {
 	return __pa((unsigned long)address);
@@ -304,6 +315,7 @@ static inline void *phys_to_virt(unsigned long address)
 {
 	return __va(address);
 }
+#endif
 
 /*
  * Change "struct page" to physical address.
@@ -363,9 +375,16 @@ static inline void *bus_to_virt(unsigned long address)
 }
 #endif
 
+#ifndef memset_io
 #define memset_io(a, b, c)	memset(__io_virt(a), (b), (c))
+#endif
+
+#ifndef memcpy_fromio
 #define memcpy_fromio(a, b, c)	memcpy((a), __io_virt(b), (c))
+#endif
+#ifndef memcpy_toio
 #define memcpy_toio(a, b, c)	memcpy(__io_virt(a), (b), (c))
+#endif
 
 #endif /* __KERNEL__ */
 
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 02/10] s390/pci: CLP interface
  2012-11-14  9:41 [RFC PATCH 00/10] s390/pci: PCI support on System z Jan Glauber
  2012-11-14  9:41 ` [RFC PATCH 01/10] s390/pci: base support Jan Glauber
@ 2012-11-14  9:41 ` Jan Glauber
  2012-11-14  9:41 ` [RFC PATCH 03/10] s390/bitops: find leftmost bit instruction support Jan Glauber
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Jan Glauber @ 2012-11-14  9:41 UTC (permalink / raw)
  To: linux-pci; +Cc: linux-kernel, heiko.carstens, schwidefsky, Jan Glauber

CLP instructions are used to query the firmware about detected PCI
functions, the attributes of those functions and to enable or disable
a PCI function. The CLP interface is the equivalent to a PCI bus scan.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
---
 arch/s390/include/asm/clp.h     |  28 ++++
 arch/s390/include/asm/pci.h     |   7 +
 arch/s390/include/asm/pci_clp.h | 182 +++++++++++++++++++++++
 arch/s390/pci/Makefile          |   2 +-
 arch/s390/pci/pci.c             |  28 ++++
 arch/s390/pci/pci_clp.c         | 317 ++++++++++++++++++++++++++++++++++++++++
 6 files changed, 563 insertions(+), 1 deletion(-)
 create mode 100644 arch/s390/include/asm/clp.h
 create mode 100644 arch/s390/include/asm/pci_clp.h
 create mode 100644 arch/s390/pci/pci_clp.c

diff --git a/arch/s390/include/asm/clp.h b/arch/s390/include/asm/clp.h
new file mode 100644
index 0000000..6c3aecc
--- /dev/null
+++ b/arch/s390/include/asm/clp.h
@@ -0,0 +1,28 @@
+#ifndef _ASM_S390_CLP_H
+#define _ASM_S390_CLP_H
+
+/* CLP common request & response block size */
+#define CLP_BLK_SIZE			(PAGE_SIZE * 2)
+
+struct clp_req_hdr {
+	u16 len;
+	u16 cmd;
+} __packed;
+
+struct clp_rsp_hdr {
+	u16 len;
+	u16 rsp;
+} __packed;
+
+/* CLP Response Codes */
+#define CLP_RC_OK			0x0010	/* Command request successfully */
+#define CLP_RC_CMD			0x0020	/* Command code not recognized */
+#define CLP_RC_PERM			0x0030	/* Command not authorized */
+#define CLP_RC_FMT			0x0040	/* Invalid command request format */
+#define CLP_RC_LEN			0x0050	/* Invalid command request length */
+#define CLP_RC_8K			0x0060	/* Command requires 8K LPCB */
+#define CLP_RC_RESNOT0			0x0070	/* Reserved field not zero */
+#define CLP_RC_NODATA			0x0080	/* No data available */
+#define CLP_RC_FC_UNKNOWN		0x0100	/* Function code not recognized */
+
+#endif
diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 5e286ce..079b7e2 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -8,6 +8,7 @@
 
 #include <asm-generic/pci.h>
 #include <asm-generic/pci-dma-compat.h>
+#include <asm/pci_clp.h>
 
 #define PCIBIOS_MIN_IO		0x1000
 #define PCIBIOS_MIN_MEM		0x10000000
@@ -76,6 +77,12 @@ void zpci_stop_device(struct zpci_dev *);
 void zpci_free_device(struct zpci_dev *);
 int zpci_scan_device(struct zpci_dev *);
 
+/* CLP */
+int clp_find_pci_devices(void);
+int clp_add_pci_device(u32, u32, int);
+int clp_enable_fh(struct zpci_dev *, u8);
+int clp_disable_fh(struct zpci_dev *);
+
 /* Helpers */
 struct zpci_dev *get_zdev(struct pci_dev *);
 struct zpci_dev *get_zdev_by_fid(u32);
diff --git a/arch/s390/include/asm/pci_clp.h b/arch/s390/include/asm/pci_clp.h
new file mode 100644
index 0000000..e97163e
--- /dev/null
+++ b/arch/s390/include/asm/pci_clp.h
@@ -0,0 +1,182 @@
+#ifndef _ASM_S390_PCI_CLP_H
+#define _ASM_S390_PCI_CLP_H
+
+#include <asm/clp.h>
+
+/*
+ * Call Logical Processor - Command Codes
+ */
+#define CLP_LIST_PCI		0x0002
+#define CLP_QUERY_PCI_FN	0x0003
+#define CLP_QUERY_PCI_FNGRP	0x0004
+#define CLP_SET_PCI_FN		0x0005
+
+/* PCI function handle list entry */
+struct clp_fh_list_entry {
+	u16 device_id;
+	u16 vendor_id;
+	u32 config_state :  1;
+	u32		 : 31;
+	u32 fid;		/* PCI function id */
+	u32 fh;			/* PCI function handle */
+} __packed;
+
+#define CLP_RC_SETPCIFN_FH	0x0101	/* Invalid PCI fn handle */
+#define CLP_RC_SETPCIFN_FHOP	0x0102	/* Fn handle not valid for op */
+#define CLP_RC_SETPCIFN_DMAAS	0x0103	/* Invalid DMA addr space */
+#define CLP_RC_SETPCIFN_RES	0x0104	/* Insufficient resources */
+#define CLP_RC_SETPCIFN_ALRDY	0x0105	/* Fn already in requested state */
+#define CLP_RC_SETPCIFN_ERR	0x0106	/* Fn in permanent error state */
+#define CLP_RC_SETPCIFN_RECPND	0x0107	/* Error recovery pending */
+#define CLP_RC_SETPCIFN_BUSY	0x0108	/* Fn busy */
+#define CLP_RC_LISTPCI_BADRT	0x010a  /* Resume token not recognized */
+#define CLP_RC_QUERYPCIFG_PFGID	0x010b  /* Unrecognized PFGID */
+
+/* request or response block header length */
+#define LIST_PCI_HDR_LEN	32
+
+/* Number of function handles fitting in response block */
+#define CLP_FH_LIST_NR_ENTRIES				\
+	((CLP_BLK_SIZE - 2 * LIST_PCI_HDR_LEN)		\
+		/ sizeof(struct clp_fh_list_entry))
+
+#define CLP_SET_ENABLE_PCI_FN	0	/* Yes, 0 enables it */
+#define CLP_SET_DISABLE_PCI_FN	1	/* Yes, 1 disables it */
+
+#define CLP_UTIL_STR_LEN	64
+
+/* List PCI functions request */
+struct clp_req_list_pci {
+	struct clp_req_hdr hdr;
+	u32 fmt			:  4;	/* cmd request block format */
+	u32			: 28;
+	u64 reserved1;
+	u64 resume_token;
+	u64 reserved2;
+} __packed;
+
+/* List PCI functions response */
+struct clp_rsp_list_pci {
+	struct clp_rsp_hdr hdr;
+	u32 fmt			:  4;	/* cmd request block format */
+	u32			: 28;
+	u64 reserved1;
+	u64 resume_token;
+	u32 reserved2;
+	u16 max_fn;
+	u8 reserved3;
+	u8 entry_size;
+	struct clp_fh_list_entry fh_list[CLP_FH_LIST_NR_ENTRIES];
+} __packed;
+
+/* Query PCI function request */
+struct clp_req_query_pci {
+	struct clp_req_hdr hdr;
+	u32 fmt			:  4;	/* cmd request block format */
+	u32			: 28;
+	u64 reserved1;
+	u32 fh;				/* function handle */
+	u32 reserved2;
+	u64 reserved3;
+} __packed;
+
+/* Query PCI function response */
+struct clp_rsp_query_pci {
+	struct clp_rsp_hdr hdr;
+	u32 fmt			:  4;	/* cmd request block format */
+	u32			: 28;
+	u64 reserved1;
+	u16 vfn;			/* virtual fn number */
+	u16			:  7;
+	u16 util_str_avail	:  1;	/* utility string available? */
+	u16 pfgid		:  8;	/* pci function group id */
+	u32 fid;			/* pci function id */
+	u8 bar_size[PCI_BAR_COUNT];
+	u16 pchid;
+	u32 bar[PCI_BAR_COUNT];
+	u64 reserved2;
+	u64 sdma;			/* start dma as */
+	u64 edma;			/* end dma as */
+	u64 reserved3[6];
+	u8 util_str[CLP_UTIL_STR_LEN];	/* utility string */
+} __packed;
+
+/* Query PCI function group request */
+struct clp_req_query_pci_grp {
+	struct clp_req_hdr hdr;
+	u32 fmt			:  4;	/* cmd request block format */
+	u32			: 28;
+	u64 reserved1;
+	u32			: 24;
+	u32 pfgid		:  8;	/* function group id */
+	u32 reserved2;
+	u64 reserved3;
+} __packed;
+
+/* Query PCI function group response */
+struct clp_rsp_query_pci_grp {
+	struct clp_rsp_hdr hdr;
+	u32 fmt			:  4;	/* cmd request block format */
+	u32			: 28;
+	u64 reserved1;
+	u16			:  4;
+	u16 noi			: 12;	/* number of interrupts */
+	u8 version;
+	u8			:  6;
+	u8 frame		:  1;
+	u8 refresh		:  1;	/* TLB refresh mode */
+	u16 reserved2;
+	u16 mui;
+	u64 reserved3;
+	u64 dasm;			/* dma address space mask */
+	u64 msia;			/* MSI address */
+	u64 reserved4;
+	u64 reserved5;
+} __packed;
+
+/* Set PCI function request */
+struct clp_req_set_pci {
+	struct clp_req_hdr hdr;
+	u32 fmt			:  4;	/* cmd request block format */
+	u32			: 28;
+	u64 reserved1;
+	u32 fh;				/* function handle */
+	u16 reserved2;
+	u8 oc;				/* operation controls */
+	u8 ndas;			/* number of dma spaces */
+	u64 reserved3;
+} __packed;
+
+/* Set PCI function response */
+struct clp_rsp_set_pci {
+	struct clp_rsp_hdr hdr;
+	u32 fmt			:  4;	/* cmd request block format */
+	u32			: 28;
+	u64 reserved1;
+	u32 fh;				/* function handle */
+	u32 reserved3;
+	u64 reserved4;
+} __packed;
+
+/* Combined request/response block structures used by clp insn */
+struct clp_req_rsp_list_pci {
+	struct clp_req_list_pci request;
+	struct clp_rsp_list_pci response;
+} __packed;
+
+struct clp_req_rsp_set_pci {
+	struct clp_req_set_pci request;
+	struct clp_rsp_set_pci response;
+} __packed;
+
+struct clp_req_rsp_query_pci {
+	struct clp_req_query_pci request;
+	struct clp_rsp_query_pci response;
+} __packed;
+
+struct clp_req_rsp_query_pci_grp {
+	struct clp_req_query_pci_grp request;
+	struct clp_rsp_query_pci_grp response;
+} __packed;
+
+#endif
diff --git a/arch/s390/pci/Makefile b/arch/s390/pci/Makefile
index 78a1344..1afd68c 100644
--- a/arch/s390/pci/Makefile
+++ b/arch/s390/pci/Makefile
@@ -2,4 +2,4 @@
 # Makefile for the s390 PCI subsystem.
 #
 
-obj-$(CONFIG_PCI)	+= pci.o
+obj-$(CONFIG_PCI)	+= pci.o pci_clp.o
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 0b80ac7..70f6c56 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -29,6 +29,7 @@
 
 #include <asm/facility.h>
 #include <asm/pci_insn.h>
+#include <asm/pci_clp.h>
 
 #define DEBUG				/* enable pr_debug */
 
@@ -436,6 +437,20 @@ static void zpci_free_domain(struct zpci_dev *zdev)
 	spin_unlock(&zpci_domain_lock);
 }
 
+int zpci_enable_device(struct zpci_dev *zdev)
+{
+	int rc;
+
+	rc = clp_enable_fh(zdev, ZPCI_NR_DMA_SPACES);
+	if (rc)
+		goto out;
+	pr_info("Enabled fh: 0x%x fid: 0x%x\n", zdev->fh, zdev->fid);
+	return 0;
+out:
+	return rc;
+}
+EXPORT_SYMBOL_GPL(zpci_enable_device);
+
 int zpci_create_device(struct zpci_dev *zdev)
 {
 	int rc;
@@ -455,8 +470,15 @@ int zpci_create_device(struct zpci_dev *zdev)
 	if (zdev->state == ZPCI_FN_STATE_STANDBY)
 		return 0;
 
+	rc = zpci_enable_device(zdev);
+	if (rc)
+		goto out_start;
 	return 0;
 
+out_start:
+	mutex_lock(&zpci_list_lock);
+	list_del(&zdev->entry);
+	mutex_unlock(&zpci_list_lock);
 out_bus:
 	zpci_free_domain(zdev);
 out:
@@ -489,6 +511,7 @@ int zpci_scan_device(struct zpci_dev *zdev)
 	return 0;
 
 out:
+	clp_disable_fh(zdev);
 	return -EIO;
 }
 EXPORT_SYMBOL_GPL(zpci_scan_device);
@@ -547,9 +570,14 @@ static int __init pci_base_init(void)
 	if (rc)
 		goto out_mem;
 
+	rc = clp_find_pci_devices();
+	if (rc)
+		goto out_find;
+
 	zpci_scan_devices();
 	return 0;
 
+out_find:
 	zpci_mem_exit();
 out_mem:
 	return rc;
diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c
new file mode 100644
index 0000000..291da1a
--- /dev/null
+++ b/arch/s390/pci/pci_clp.c
@@ -0,0 +1,317 @@
+/*
+ * Copyright IBM Corp. 2012
+ *
+ * Author(s):
+ *   Jan Glauber <jang@linux.vnet.ibm.com>
+ */
+
+#define COMPONENT "zPCI"
+#define pr_fmt(fmt) COMPONENT ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/err.h>
+#include <linux/delay.h>
+#include <linux/pci.h>
+#include <asm/pci_clp.h>
+
+/*
+ * Call Logical Processor
+ * Retry logic is handled by the caller.
+ */
+static inline u8 clp_instr(void *req)
+{
+	u64 ilpm;
+	u8 cc;
+
+	asm volatile (
+		"	.insn	rrf,0xb9a00000,%[ilpm],%[req],0x0,0x2\n"
+		"	ipm	%[cc]\n"
+		"	srl	%[cc],28\n"
+		: [cc] "=d" (cc), [ilpm] "=d" (ilpm)
+		: [req] "a" (req)
+		: "cc", "memory");
+	return cc;
+}
+
+static void *clp_alloc_block(void)
+{
+	struct page *page = alloc_pages(GFP_KERNEL, get_order(CLP_BLK_SIZE));
+	return (page) ? page_address(page) : NULL;
+}
+
+static void clp_free_block(void *ptr)
+{
+	free_pages((unsigned long) ptr, get_order(CLP_BLK_SIZE));
+}
+
+static void clp_store_query_pci_fngrp(struct zpci_dev *zdev,
+				      struct clp_rsp_query_pci_grp *response)
+{
+	switch (response->version) {
+	case 1:
+		zdev->max_bus_speed = PCIE_SPEED_5_0GT;
+		break;
+	default:
+		zdev->max_bus_speed = PCI_SPEED_UNKNOWN;
+		break;
+	}
+}
+
+static int clp_query_pci_fngrp(struct zpci_dev *zdev, u8 pfgid)
+{
+	struct clp_req_rsp_query_pci_grp *rrb;
+	int rc;
+
+	rrb = clp_alloc_block();
+	if (!rrb)
+		return -ENOMEM;
+
+	memset(rrb, 0, sizeof(*rrb));
+	rrb->request.hdr.len = sizeof(rrb->request);
+	rrb->request.hdr.cmd = CLP_QUERY_PCI_FNGRP;
+	rrb->response.hdr.len = sizeof(rrb->response);
+	rrb->request.pfgid = pfgid;
+
+	rc = clp_instr(rrb);
+	if (!rc && rrb->response.hdr.rsp == CLP_RC_OK)
+		clp_store_query_pci_fngrp(zdev, &rrb->response);
+	else {
+		pr_err("Query PCI FNGRP failed with response: %x  cc: %d\n",
+			rrb->response.hdr.rsp, rc);
+		rc = -EIO;
+	}
+	clp_free_block(rrb);
+	return rc;
+}
+
+static int clp_store_query_pci_fn(struct zpci_dev *zdev,
+				  struct clp_rsp_query_pci *response)
+{
+	int i;
+
+	for (i = 0; i < PCI_BAR_COUNT; i++) {
+		zdev->bars[i].val = le32_to_cpu(response->bar[i]);
+		zdev->bars[i].size = response->bar_size[i];
+	}
+	zdev->pchid = response->pchid;
+	zdev->pfgid = response->pfgid;
+	return 0;
+}
+
+static int clp_query_pci_fn(struct zpci_dev *zdev, u32 fh)
+{
+	struct clp_req_rsp_query_pci *rrb;
+	int rc;
+
+	rrb = clp_alloc_block();
+	if (!rrb)
+		return -ENOMEM;
+
+	memset(rrb, 0, sizeof(*rrb));
+	rrb->request.hdr.len = sizeof(rrb->request);
+	rrb->request.hdr.cmd = CLP_QUERY_PCI_FN;
+	rrb->response.hdr.len = sizeof(rrb->response);
+	rrb->request.fh = fh;
+
+	rc = clp_instr(rrb);
+	if (!rc && rrb->response.hdr.rsp == CLP_RC_OK) {
+		rc = clp_store_query_pci_fn(zdev, &rrb->response);
+		if (rc)
+			goto out;
+		if (rrb->response.pfgid)
+			rc = clp_query_pci_fngrp(zdev, rrb->response.pfgid);
+	} else {
+		pr_err("Query PCI failed with response: %x  cc: %d\n",
+			 rrb->response.hdr.rsp, rc);
+		rc = -EIO;
+	}
+out:
+	clp_free_block(rrb);
+	return rc;
+}
+
+int clp_add_pci_device(u32 fid, u32 fh, int configured)
+{
+	struct zpci_dev *zdev;
+	int rc;
+
+	zdev = zpci_alloc_device();
+	if (IS_ERR(zdev))
+		return PTR_ERR(zdev);
+
+	zdev->fh = fh;
+	zdev->fid = fid;
+
+	/* Query function properties and update zdev */
+	rc = clp_query_pci_fn(zdev, fh);
+	if (rc)
+		goto error;
+
+	if (configured)
+		zdev->state = ZPCI_FN_STATE_CONFIGURED;
+	else
+		zdev->state = ZPCI_FN_STATE_STANDBY;
+
+	rc = zpci_create_device(zdev);
+	if (rc)
+		goto error;
+	return 0;
+
+error:
+	zpci_free_device(zdev);
+	return rc;
+}
+
+/*
+ * Enable/Disable a given PCI function defined by its function handle.
+ */
+static int clp_set_pci_fn(u32 *fh, u8 nr_dma_as, u8 command)
+{
+	struct clp_req_rsp_set_pci *rrb;
+	int rc, retries = 1000;
+
+	rrb = clp_alloc_block();
+	if (!rrb)
+		return -ENOMEM;
+
+	do {
+		memset(rrb, 0, sizeof(*rrb));
+		rrb->request.hdr.len = sizeof(rrb->request);
+		rrb->request.hdr.cmd = CLP_SET_PCI_FN;
+		rrb->response.hdr.len = sizeof(rrb->response);
+		rrb->request.fh = *fh;
+		rrb->request.oc = command;
+		rrb->request.ndas = nr_dma_as;
+
+		rc = clp_instr(rrb);
+		if (rrb->response.hdr.rsp == CLP_RC_SETPCIFN_BUSY) {
+			retries--;
+			if (retries < 0)
+				break;
+			msleep(1);
+		}
+	} while (rrb->response.hdr.rsp == CLP_RC_SETPCIFN_BUSY);
+
+	if (!rc && rrb->response.hdr.rsp == CLP_RC_OK)
+		*fh = rrb->response.fh;
+	else {
+		pr_err("Set PCI FN failed with response: %x  cc: %d\n",
+			rrb->response.hdr.rsp, rc);
+		rc = -EIO;
+	}
+	clp_free_block(rrb);
+	return rc;
+}
+
+int clp_enable_fh(struct zpci_dev *zdev, u8 nr_dma_as)
+{
+	u32 fh = zdev->fh;
+	int rc;
+
+	rc = clp_set_pci_fn(&fh, nr_dma_as, CLP_SET_ENABLE_PCI_FN);
+	if (!rc)
+		/* Success -> store enabled handle in zdev */
+		zdev->fh = fh;
+	return rc;
+}
+
+int clp_disable_fh(struct zpci_dev *zdev)
+{
+	u32 fh = zdev->fh;
+	int rc;
+
+	if (!zdev_enabled(zdev))
+		return 0;
+
+	dev_info(&zdev->pdev->dev, "disabling fn handle: 0x%x\n", fh);
+	rc = clp_set_pci_fn(&fh, 0, CLP_SET_DISABLE_PCI_FN);
+	if (!rc)
+		/* Success -> store disabled handle in zdev */
+		zdev->fh = fh;
+	else
+		dev_err(&zdev->pdev->dev,
+			"Failed to disable fn handle: 0x%x\n", fh);
+	return rc;
+}
+
+static void clp_check_pcifn_entry(struct clp_fh_list_entry *entry)
+{
+	int present, rc;
+
+	if (!entry->vendor_id)
+		return;
+
+	/* TODO: be a little bit more scalable */
+	present = zpci_fid_present(entry->fid);
+
+	if (present)
+		pr_debug("%s: device %x already present\n", __func__, entry->fid);
+
+	/* skip already used functions */
+	if (present && entry->config_state)
+		return;
+
+	/* aev 306: function moved to stand-by state */
+	if (present && !entry->config_state) {
+		/*
+		 * The handle is already disabled, that means no iota/irq freeing via
+		 * the firmware interfaces anymore. Need to free resources manually
+		 * (DMA memory, debug, sysfs)...
+		 */
+		zpci_stop_device(get_zdev_by_fid(entry->fid));
+		return;
+	}
+
+	rc = clp_add_pci_device(entry->fid, entry->fh, entry->config_state);
+	if (rc)
+		pr_err("Failed to add fid: 0x%x\n", entry->fid);
+}
+
+int clp_find_pci_devices(void)
+{
+	struct clp_req_rsp_list_pci *rrb;
+	u64 resume_token = 0;
+	int entries, i, rc;
+
+	rrb = clp_alloc_block();
+	if (!rrb)
+		return -ENOMEM;
+
+	do {
+		memset(rrb, 0, sizeof(*rrb));
+		rrb->request.hdr.len = sizeof(rrb->request);
+		rrb->request.hdr.cmd = CLP_LIST_PCI;
+		/* store as many entries as possible */
+		rrb->response.hdr.len = CLP_BLK_SIZE - LIST_PCI_HDR_LEN;
+		rrb->request.resume_token = resume_token;
+
+		/* Get PCI function handle list */
+		rc = clp_instr(rrb);
+		if (rc || rrb->response.hdr.rsp != CLP_RC_OK) {
+			pr_err("List PCI failed with response: 0x%x  cc: %d\n",
+				rrb->response.hdr.rsp, rc);
+			rc = -EIO;
+			goto out;
+		}
+
+		WARN_ON_ONCE(rrb->response.entry_size !=
+			sizeof(struct clp_fh_list_entry));
+
+		entries = (rrb->response.hdr.len - LIST_PCI_HDR_LEN) /
+			rrb->response.entry_size;
+		pr_info("Detected number of PCI functions: %u\n", entries);
+
+		/* Store the returned resume token as input for the next call */
+		resume_token = rrb->response.resume_token;
+
+		for (i = 0; i < entries; i++)
+			clp_check_pcifn_entry(&rrb->response.fh_list[i]);
+	} while (resume_token);
+
+	pr_debug("Maximum number of supported PCI functions: %u\n",
+		rrb->response.max_fn);
+out:
+	clp_free_block(rrb);
+	return rc;
+}
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 03/10] s390/bitops: find leftmost bit instruction support
  2012-11-14  9:41 [RFC PATCH 00/10] s390/pci: PCI support on System z Jan Glauber
  2012-11-14  9:41 ` [RFC PATCH 01/10] s390/pci: base support Jan Glauber
  2012-11-14  9:41 ` [RFC PATCH 02/10] s390/pci: CLP interface Jan Glauber
@ 2012-11-14  9:41 ` Jan Glauber
  2012-11-14  9:41 ` [RFC PATCH 04/10] s390/pci: PCI adapter interrupts for MSI/MSI-X Jan Glauber
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Jan Glauber @ 2012-11-14  9:41 UTC (permalink / raw)
  To: linux-pci; +Cc: linux-kernel, heiko.carstens, schwidefsky, Jan Glauber

The flogr instruction scans a bitmap starting from the leftmost bit.
Implement support for these bitops. This could be useful to scan
bitmaps like an interrupt vector set by the hardware starting
at the leftmost bit.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
---
 arch/s390/include/asm/bitops.h | 81 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/arch/s390/include/asm/bitops.h b/arch/s390/include/asm/bitops.h
index 6f57389..1542293 100644
--- a/arch/s390/include/asm/bitops.h
+++ b/arch/s390/include/asm/bitops.h
@@ -640,6 +640,87 @@ static inline unsigned long find_first_bit(const unsigned long * addr,
 }
 #define find_first_bit find_first_bit
 
+/*
+ * Big endian variant whichs starts bit counting from left using
+ * the flogr (find leftmost one) instruction.
+ */
+static inline unsigned long __flo_word(unsigned long nr, unsigned long val)
+{
+	register unsigned long bit asm("2") = val;
+	register unsigned long out asm("3");
+
+	asm volatile (
+		"	.insn	rre,0xb9830000,%[bit],%[bit]\n"
+		: [bit] "+d" (bit), [out] "=d" (out) : : "cc");
+	return nr + bit;
+}
+
+/*
+ * 64 bit special left bitops format:
+ * order in memory:
+ *    00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
+ *    10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
+ *    20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f
+ *    30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f
+ * after that follows the next long with bit numbers
+ *    40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f
+ *    50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f
+ *    60 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f
+ *    70 71 72 73 74 75 76 77 78 79 7a 7b 7c 7d 7e 7f
+ * The reason for this bit ordering is the fact that
+ * the hardware sets bits in a bitmap starting at bit 0
+ * and we don't want to scan the bitmap from the 'wrong
+ * end'.
+ */
+static inline unsigned long find_first_bit_left(const unsigned long *addr,
+						unsigned long size)
+{
+	unsigned long bytes, bits;
+
+	if (!size)
+		return 0;
+	bytes = __ffs_word_loop(addr, size);
+	bits = __flo_word(bytes * 8, __load_ulong_be(addr, bytes));
+	return (bits < size) ? bits : size;
+}
+
+static inline int find_next_bit_left(const unsigned long *addr,
+				     unsigned long size,
+				     unsigned long offset)
+{
+	const unsigned long *p;
+	unsigned long bit, set;
+
+	if (offset >= size)
+		return size;
+	bit = offset & (__BITOPS_WORDSIZE - 1);
+	offset -= bit;
+	size -= offset;
+	p = addr + offset / __BITOPS_WORDSIZE;
+	if (bit) {
+		set = __flo_word(0, *p & (~0UL << bit));
+		if (set >= size)
+			return size + offset;
+		if (set < __BITOPS_WORDSIZE)
+			return set + offset;
+		offset += __BITOPS_WORDSIZE;
+		size -= __BITOPS_WORDSIZE;
+		p++;
+	}
+	return offset + find_first_bit_left(p, size);
+}
+
+#define for_each_set_bit_left(bit, addr, size)				\
+	for ((bit) = find_first_bit_left((addr), (size));		\
+	     (bit) < (size);						\
+	     (bit) = find_next_bit_left((addr), (size), (bit) + 1))
+
+/* same as for_each_set_bit() but use bit as value to start with */
+#define for_each_set_bit_left_cont(bit, addr, size)			\
+	for ((bit) = find_next_bit_left((addr), (size), (bit));		\
+	     (bit) < (size);						\
+	     (bit) = find_next_bit_left((addr), (size), (bit) + 1))
+
 /**
  * find_next_zero_bit - find the first zero bit in a memory region
  * @addr: The address to base the search on
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 04/10] s390/pci: PCI adapter interrupts for MSI/MSI-X
  2012-11-14  9:41 [RFC PATCH 00/10] s390/pci: PCI support on System z Jan Glauber
                   ` (2 preceding siblings ...)
  2012-11-14  9:41 ` [RFC PATCH 03/10] s390/bitops: find leftmost bit instruction support Jan Glauber
@ 2012-11-14  9:41 ` Jan Glauber
  2012-11-14  9:41 ` [RFC PATCH 05/10] s390/pci: DMA support Jan Glauber
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Jan Glauber @ 2012-11-14  9:41 UTC (permalink / raw)
  To: linux-pci; +Cc: linux-kernel, heiko.carstens, schwidefsky, Jan Glauber

Support PCI adapter interrupts using the Single-IRQ-mode. Single-IRQ-mode
disables an adapter IRQ automatically after delivering it until the SIC
instruction enables it again. This is used to reduce the number of IRQs
for streaming workloads.

Up to 64 MSI handlers can be registered per PCI function.
A hash table is used to map interrupt numbers to MSI descriptors.
The interrupt vector is scanned using the flogr instruction.
Only MSI/MSI-X interrupts are supported, no legacy INTs.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
---
 arch/s390/include/asm/hw_irq.h |  22 ++
 arch/s390/include/asm/irq.h    |  12 ++
 arch/s390/include/asm/isc.h    |   1 +
 arch/s390/include/asm/pci.h    |  27 +++
 arch/s390/kernel/irq.c         |   2 +
 arch/s390/pci/Makefile         |   2 +-
 arch/s390/pci/pci.c            | 464 ++++++++++++++++++++++++++++++++++++++++-
 arch/s390/pci/pci_clp.c        |   3 +
 arch/s390/pci/pci_msi.c        | 141 +++++++++++++
 drivers/pci/msi.c              |  10 +-
 include/linux/irq.h            |  10 +-
 11 files changed, 685 insertions(+), 9 deletions(-)
 create mode 100644 arch/s390/include/asm/hw_irq.h
 create mode 100644 arch/s390/pci/pci_msi.c

diff --git a/arch/s390/include/asm/hw_irq.h b/arch/s390/include/asm/hw_irq.h
new file mode 100644
index 0000000..7e3d258
--- /dev/null
+++ b/arch/s390/include/asm/hw_irq.h
@@ -0,0 +1,22 @@
+#ifndef _HW_IRQ_H
+#define _HW_IRQ_H
+
+#include <linux/msi.h>
+#include <linux/pci.h>
+
+static inline struct msi_desc *irq_get_msi_desc(unsigned int irq)
+{
+	return __irq_get_msi_desc(irq);
+}
+
+/* Must be called with msi map lock held */
+static inline int irq_set_msi_desc(unsigned int irq, struct msi_desc *msi)
+{
+	if (!msi)
+		return -EINVAL;
+
+	msi->irq = irq;
+	return 0;
+}
+
+#endif
diff --git a/arch/s390/include/asm/irq.h b/arch/s390/include/asm/irq.h
index 6703dd9..e6972f8 100644
--- a/arch/s390/include/asm/irq.h
+++ b/arch/s390/include/asm/irq.h
@@ -33,6 +33,8 @@ enum interruption_class {
 	IOINT_APB,
 	IOINT_ADM,
 	IOINT_CSC,
+	IOINT_PCI,
+	IOINT_MSI,
 	NMI_NMI,
 	NR_IRQS,
 };
@@ -51,4 +53,14 @@ void service_subclass_irq_unregister(void);
 void measurement_alert_subclass_register(void);
 void measurement_alert_subclass_unregister(void);
 
+#ifdef CONFIG_LOCKDEP
+#  define disable_irq_nosync_lockdep(irq)	disable_irq_nosync(irq)
+#  define disable_irq_nosync_lockdep_irqsave(irq, flags) \
+						disable_irq_nosync(irq)
+#  define disable_irq_lockdep(irq)		disable_irq(irq)
+#  define enable_irq_lockdep(irq)		enable_irq(irq)
+#  define enable_irq_lockdep_irqrestore(irq, flags) \
+						enable_irq(irq)
+#endif
+
 #endif /* _ASM_IRQ_H */
diff --git a/arch/s390/include/asm/isc.h b/arch/s390/include/asm/isc.h
index 5ae6064..68d7d68 100644
--- a/arch/s390/include/asm/isc.h
+++ b/arch/s390/include/asm/isc.h
@@ -18,6 +18,7 @@
 #define CHSC_SCH_ISC 7			/* CHSC subchannels */
 /* Adapter interrupts. */
 #define QDIO_AIRQ_ISC IO_SCH_ISC	/* I/O subchannel in qdio mode */
+#define PCI_ISC 2			/* PCI I/O subchannels */
 #define AP_ISC 6			/* adjunct processor (crypto) devices */
 
 /* Functions for registration of I/O interruption subclasses */
diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 079b7e2..a9aadeb 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -20,6 +20,10 @@ void pci_iounmap(struct pci_dev *, void __iomem *);
 int pci_domain_nr(struct pci_bus *);
 int pci_proc_domain(struct pci_bus *);
 
+/* MSI arch hooks */
+#define arch_setup_msi_irqs	arch_setup_msi_irqs
+#define arch_teardown_msi_irqs	arch_teardown_msi_irqs
+
 #define ZPCI_BUS_NR			0	/* default bus number */
 #define ZPCI_DEVFN			0	/* default device number */
 
@@ -29,6 +33,15 @@ int pci_proc_domain(struct pci_bus *);
 #define ZPCI_FC_BLOCKED			0x20
 #define ZPCI_FC_DMA_ENABLED		0x10
 
+struct msi_map {
+	unsigned long irq;
+	struct msi_desc *msi;
+	struct hlist_node msi_chain;
+};
+
+#define ZPCI_NR_MSI_VECS	64
+#define ZPCI_MSI_MASK		(ZPCI_NR_MSI_VECS - 1)
+
 enum zpci_state {
 	ZPCI_FN_STATE_RESERVED,
 	ZPCI_FN_STATE_STANDBY,
@@ -56,6 +69,12 @@ struct zpci_dev {
 	u8		pfgid;		/* function group ID */
 	u16		domain;
 
+	/* IRQ stuff */
+	u64		msi_addr;	/* MSI address */
+	struct zdev_irq_map *irq_map;
+	struct msi_map *msi_map[ZPCI_NR_MSI_VECS];
+	unsigned int	aisb;		/* number of the summary bit */
+
 	struct zpci_bar_struct bars[PCI_BAR_COUNT];
 
 	enum pci_bus_speed max_bus_speed;
@@ -83,6 +102,14 @@ int clp_add_pci_device(u32, u32, int);
 int clp_enable_fh(struct zpci_dev *, u8);
 int clp_disable_fh(struct zpci_dev *);
 
+/* MSI */
+struct msi_desc *__irq_get_msi_desc(unsigned int);
+int zpci_msi_set_mask_bits(struct msi_desc *, u32, u32);
+int zpci_setup_msi_irq(struct zpci_dev *, struct msi_desc *, unsigned int, int);
+void zpci_teardown_msi_irq(struct zpci_dev *, struct msi_desc *);
+int zpci_msihash_init(void);
+void zpci_msihash_exit(void);
+
 /* Helpers */
 struct zpci_dev *get_zdev(struct pci_dev *);
 struct zpci_dev *get_zdev_by_fid(u32);
diff --git a/arch/s390/kernel/irq.c b/arch/s390/kernel/irq.c
index 6cdc55b..bf24293 100644
--- a/arch/s390/kernel/irq.c
+++ b/arch/s390/kernel/irq.c
@@ -58,6 +58,8 @@ static const struct irq_class intrclass_names[] = {
 	[IOINT_APB]  = {.name = "APB", .desc = "[I/O] AP Bus"},
 	[IOINT_ADM]  = {.name = "ADM", .desc = "[I/O] EADM Subchannel"},
 	[IOINT_CSC]  = {.name = "CSC", .desc = "[I/O] CHSC Subchannel"},
+	[IOINT_PCI]  = {.name = "PCI", .desc = "[I/O] PCI Interrupt" },
+	[IOINT_MSI] =  {.name = "MSI", .desc = "[I/O] MSI Interrupt" },
 	[NMI_NMI]    = {.name = "NMI", .desc = "[NMI] Machine Check"},
 };
 
diff --git a/arch/s390/pci/Makefile b/arch/s390/pci/Makefile
index 1afd68c..628be7b 100644
--- a/arch/s390/pci/Makefile
+++ b/arch/s390/pci/Makefile
@@ -2,4 +2,4 @@
 # Makefile for the s390 PCI subsystem.
 #
 
-obj-$(CONFIG_PCI)	+= pci.o pci_clp.o
+obj-$(CONFIG_PCI)	+= pci.o pci_clp.o pci_msi.o
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 70f6c56..d11dc8a 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -23,17 +23,25 @@
 #include <linux/err.h>
 #include <linux/export.h>
 #include <linux/delay.h>
+#include <linux/irq.h>
+#include <linux/kernel_stat.h>
 #include <linux/seq_file.h>
 #include <linux/pci.h>
 #include <linux/msi.h>
 
+#include <asm/isc.h>
+#include <asm/airq.h>
 #include <asm/facility.h>
 #include <asm/pci_insn.h>
 #include <asm/pci_clp.h>
 
 #define DEBUG				/* enable pr_debug */
 
+#define	SIC_IRQ_MODE_ALL		0
+#define	SIC_IRQ_MODE_SINGLE		1
+
 #define ZPCI_NR_DMA_SPACES		1
+#define ZPCI_MSI_VEC_BITS		6
 #define ZPCI_NR_DEVICES			CONFIG_PCI_NR_FUNCTIONS
 
 /* list of all detected zpci devices */
@@ -43,12 +51,63 @@ DEFINE_MUTEX(zpci_list_lock);
 static DECLARE_BITMAP(zpci_domain, ZPCI_NR_DEVICES);
 static DEFINE_SPINLOCK(zpci_domain_lock);
 
+struct callback {
+	irq_handler_t	handler;
+	void		*data;
+};
+
+struct zdev_irq_map {
+	unsigned long	aibv;		/* AI bit vector */
+	int		msi_vecs;	/* consecutive MSI-vectors used */
+	int		__unused;
+	struct callback	cb[ZPCI_NR_MSI_VECS]; /* callback handler array */
+	spinlock_t	lock;		/* protect callbacks against de-reg */
+};
+
+struct intr_bucket {
+	/* amap of adapters, one bit per dev, corresponds to one irq nr */
+	unsigned long	*alloc;
+	/* AI summary bit, global page for all devices */
+	unsigned long	*aisb;
+	/* pointer to aibv and callback data in zdev */
+	struct zdev_irq_map *imap[ZPCI_NR_DEVICES];
+	/* protects the whole bucket struct */
+	spinlock_t	lock;
+};
+
+static struct intr_bucket *bucket;
+
+/* Adapter local summary indicator */
+static u8 *zpci_irq_si;
+
+static atomic_t irq_retries = ATOMIC_INIT(0);
+
 /* I/O Map */
 static DEFINE_SPINLOCK(zpci_iomap_lock);
 static DECLARE_BITMAP(zpci_iomap, ZPCI_IOMAP_MAX_ENTRIES);
 struct zpci_iomap_entry *zpci_iomap_start;
 EXPORT_SYMBOL_GPL(zpci_iomap_start);
 
+/* highest irq summary bit */
+static int __read_mostly aisb_max;
+
+static struct kmem_cache *zdev_irq_cache;
+
+static inline int irq_to_msi_nr(unsigned int irq)
+{
+	return irq & ZPCI_MSI_MASK;
+}
+
+static inline int irq_to_dev_nr(unsigned int irq)
+{
+	return irq >> ZPCI_MSI_VEC_BITS;
+}
+
+static inline struct zdev_irq_map *get_imap(unsigned int irq)
+{
+	return bucket->imap[irq_to_dev_nr(irq)];
+}
+
 struct zpci_dev *get_zdev(struct pci_dev *pdev)
 {
 	return (struct zpci_dev *) pdev->sysdata;
@@ -120,6 +179,67 @@ static int zpci_store_fib(struct zpci_dev *zdev, u8 *fc)
 	return (cc) ? -EIO : 0;
 }
 
+/* Modify PCI: Register adapter interruptions */
+static int zpci_register_airq(struct zpci_dev *zdev, unsigned int aisb,
+			      u64 aibv)
+{
+	u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, ZPCI_MOD_FC_REG_INT);
+	struct zpci_fib *fib;
+	int rc;
+
+	fib = (void *) get_zeroed_page(GFP_KERNEL);
+	if (!fib)
+		return -ENOMEM;
+
+	fib->isc = PCI_ISC;
+	fib->noi = zdev->irq_map->msi_vecs;
+	fib->sum = 1;		/* enable summary notifications */
+	fib->aibv = aibv;
+	fib->aibvo = 0;		/* every function has its own page */
+	fib->aisb = (u64) bucket->aisb + aisb / 8;
+	fib->aisbo = aisb & ZPCI_MSI_MASK;
+
+	rc = mpcifc_instr(req, fib);
+	pr_debug("%s mpcifc returned noi: %d\n", __func__, fib->noi);
+
+	free_page((unsigned long) fib);
+	return rc;
+}
+
+struct mod_pci_args {
+	u64 base;
+	u64 limit;
+	u64 iota;
+};
+
+static int mod_pci(struct zpci_dev *zdev, int fn, u8 dmaas, struct mod_pci_args *args)
+{
+	u64 req = ZPCI_CREATE_REQ(zdev->fh, dmaas, fn);
+	struct zpci_fib *fib;
+	int rc;
+
+	/* The FIB must be available even if it's not used */
+	fib = (void *) get_zeroed_page(GFP_KERNEL);
+	if (!fib)
+		return -ENOMEM;
+
+	fib->pba = args->base;
+	fib->pal = args->limit;
+	fib->iota = args->iota;
+
+	rc = mpcifc_instr(req, fib);
+	free_page((unsigned long) fib);
+	return rc;
+}
+
+/* Modify PCI: Unregister adapter interruptions */
+static int zpci_unregister_airq(struct zpci_dev *zdev)
+{
+	struct mod_pci_args args = { 0, 0, 0 };
+
+	return mod_pci(zdev, ZPCI_MOD_FC_DEREG_INT, 0, &args);
+}
+
 #define ZPCI_PCIAS_CFGSPC	15
 
 static int zpci_cfg_load(struct zpci_dev *zdev, int offset, u32 *val, u8 len)
@@ -150,6 +270,55 @@ static int zpci_cfg_store(struct zpci_dev *zdev, int offset, u32 val, u8 len)
 	return rc;
 }
 
+void synchronize_irq(unsigned int irq)
+{
+	/*
+	 * Not needed, the handler is protected by a lock and IRQs that occur
+	 * after the handler is deleted are just NOPs.
+	 */
+}
+EXPORT_SYMBOL_GPL(synchronize_irq);
+
+void enable_irq(unsigned int irq)
+{
+	struct msi_desc *msi = irq_get_msi_desc(irq);
+
+	zpci_msi_set_mask_bits(msi, 1, 0);
+}
+EXPORT_SYMBOL_GPL(enable_irq);
+
+void disable_irq(unsigned int irq)
+{
+	struct msi_desc *msi = irq_get_msi_desc(irq);
+
+	zpci_msi_set_mask_bits(msi, 1, 1);
+}
+EXPORT_SYMBOL_GPL(disable_irq);
+
+void disable_irq_nosync(unsigned int irq)
+{
+	disable_irq(irq);
+}
+EXPORT_SYMBOL_GPL(disable_irq_nosync);
+
+unsigned long probe_irq_on(void)
+{
+	return 0;
+}
+EXPORT_SYMBOL_GPL(probe_irq_on);
+
+int probe_irq_off(unsigned long val)
+{
+	return 0;
+}
+EXPORT_SYMBOL_GPL(probe_irq_off);
+
+unsigned int probe_irq_mask(unsigned long val)
+{
+	return val;
+}
+EXPORT_SYMBOL_GPL(probe_irq_mask);
+
 void __devinit pcibios_fixup_bus(struct pci_bus *bus)
 {
 }
@@ -219,6 +388,155 @@ static struct pci_ops pci_root_ops = {
 	.write = pci_write,
 };
 
+/* store the last handled bit to implement fair scheduling of devices */
+static DEFINE_PER_CPU(unsigned long, next_sbit);
+
+static void zpci_irq_handler(void *dont, void *need)
+{
+	unsigned long sbit, mbit, last = 0, start = __get_cpu_var(next_sbit);
+	int rescan = 0, max = aisb_max;
+	struct zdev_irq_map *imap;
+
+	kstat_cpu(smp_processor_id()).irqs[IOINT_PCI]++;
+	sbit = start;
+
+scan:
+	/* find summary_bit */
+	for_each_set_bit_left_cont(sbit, bucket->aisb, max) {
+		clear_bit(63 - (sbit & 63), bucket->aisb + (sbit >> 6));
+		last = sbit;
+
+		/* find vector bit */
+		imap = bucket->imap[sbit];
+		for_each_set_bit_left(mbit, &imap->aibv, imap->msi_vecs) {
+			kstat_cpu(smp_processor_id()).irqs[IOINT_MSI]++;
+			clear_bit(63 - mbit, &imap->aibv);
+
+			spin_lock(&imap->lock);
+			if (imap->cb[mbit].handler)
+				imap->cb[mbit].handler(mbit,
+					imap->cb[mbit].data);
+			spin_unlock(&imap->lock);
+		}
+	}
+
+	if (rescan)
+		goto out;
+
+	/* scan the skipped bits */
+	if (start > 0) {
+		sbit = 0;
+		max = start;
+		start = 0;
+		goto scan;
+	}
+
+	/* enable interrupts again */
+	sic_instr(SIC_IRQ_MODE_SINGLE, NULL, PCI_ISC);
+
+	/* check again to not lose initiative */
+	rmb();
+	max = aisb_max;
+	sbit = find_first_bit_left(bucket->aisb, max);
+	if (sbit != max) {
+		atomic_inc(&irq_retries);
+		rescan++;
+		goto scan;
+	}
+out:
+	/* store next device bit to scan */
+	__get_cpu_var(next_sbit) = (++last >= aisb_max) ? 0 : last;
+}
+
+/* msi_vecs - number of requested interrupts, 0 place function to error state */
+static int zpci_setup_msi(struct pci_dev *pdev, int msi_vecs)
+{
+	struct zpci_dev *zdev = get_zdev(pdev);
+	unsigned int aisb, msi_nr;
+	struct msi_desc *msi;
+	int rc;
+
+	/* store the number of used MSI vectors */
+	zdev->irq_map->msi_vecs = min(msi_vecs, ZPCI_NR_MSI_VECS);
+
+	spin_lock(&bucket->lock);
+	aisb = find_first_zero_bit(bucket->alloc, PAGE_SIZE);
+	/* alloc map exhausted? */
+	if (aisb == PAGE_SIZE) {
+		spin_unlock(&bucket->lock);
+		return -EIO;
+	}
+	set_bit(aisb, bucket->alloc);
+	spin_unlock(&bucket->lock);
+
+	zdev->aisb = aisb;
+	if (aisb + 1 > aisb_max)
+		aisb_max = aisb + 1;
+
+	/* wire up IRQ shortcut pointer */
+	bucket->imap[zdev->aisb] = zdev->irq_map;
+	pr_debug("%s: imap[%u] linked to %p\n", __func__, zdev->aisb, zdev->irq_map);
+
+	/* TODO: irq number 0 wont be found if we return less than requested MSIs.
+	 * ignore it for now and fix in common code.
+	 */
+	msi_nr = aisb << ZPCI_MSI_VEC_BITS;
+
+	list_for_each_entry(msi, &pdev->msi_list, list) {
+		rc = zpci_setup_msi_irq(zdev, msi, msi_nr,
+					  aisb << ZPCI_MSI_VEC_BITS);
+		if (rc)
+			return rc;
+		msi_nr++;
+	}
+
+	rc = zpci_register_airq(zdev, aisb, (u64) &zdev->irq_map->aibv);
+	if (rc) {
+		clear_bit(aisb, bucket->alloc);
+		dev_err(&pdev->dev, "register MSI failed with: %d\n", rc);
+		return rc;
+	}
+	return (zdev->irq_map->msi_vecs == msi_vecs) ?
+		0 : zdev->irq_map->msi_vecs;
+}
+
+static void zpci_teardown_msi(struct pci_dev *pdev)
+{
+	struct zpci_dev *zdev = get_zdev(pdev);
+	struct msi_desc *msi;
+	int aisb, rc;
+
+	rc = zpci_unregister_airq(zdev);
+	if (rc) {
+		dev_err(&pdev->dev, "deregister MSI failed with: %d\n", rc);
+		return;
+	}
+
+	msi = list_first_entry(&pdev->msi_list, struct msi_desc, list);
+	aisb = irq_to_dev_nr(msi->irq);
+
+	list_for_each_entry(msi, &pdev->msi_list, list)
+		zpci_teardown_msi_irq(zdev, msi);
+
+	clear_bit(aisb, bucket->alloc);
+	if (aisb + 1 == aisb_max)
+		aisb_max--;
+}
+
+int arch_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
+{
+	pr_debug("%s: requesting %d MSI-X interrupts...", __func__, nvec);
+	if (type != PCI_CAP_ID_MSIX && type != PCI_CAP_ID_MSI)
+		return -EINVAL;
+	return zpci_setup_msi(pdev, nvec);
+}
+
+void arch_teardown_msi_irqs(struct pci_dev *pdev)
+{
+	pr_info("%s: on pdev: %p\n", __func__, pdev);
+	zpci_teardown_msi(pdev);
+}
+
 static void zpci_map_resources(struct zpci_dev *zdev)
 {
 	struct pci_dev *pdev = zdev->pdev;
@@ -257,11 +575,23 @@ struct zpci_dev *zpci_alloc_device(void)
 	zdev = kzalloc(sizeof(*zdev), GFP_KERNEL);
 	if (!zdev)
 		return ERR_PTR(-ENOMEM);
+
+	/* Alloc aibv & callback space */
+	zdev->irq_map = kmem_cache_alloc(zdev_irq_cache, GFP_KERNEL);
+	if (!zdev->irq_map)
+		goto error;
+	memset(zdev->irq_map, 0, sizeof(*zdev->irq_map));
+	WARN_ON((u64) zdev->irq_map & 0xff);
 	return zdev;
+
+error:
+	kfree(zdev);
+	return ERR_PTR(-ENOMEM);
 }
 
 void zpci_free_device(struct zpci_dev *zdev)
 {
+	kmem_cache_free(zdev_irq_cache, zdev->irq_map);
 	kfree(zdev);
 }
 
@@ -320,6 +650,118 @@ void pcibios_disable_device(struct pci_dev *pdev)
 	pdev->sysdata = NULL;
 }
 
+int zpci_request_irq(unsigned int irq, irq_handler_t handler, void *data)
+{
+	int msi_nr = irq_to_msi_nr(irq);
+	struct zdev_irq_map *imap;
+	struct msi_desc *msi;
+
+	msi = irq_get_msi_desc(irq);
+	if (!msi)
+		return -EIO;
+
+	imap = get_imap(irq);
+	spin_lock_init(&imap->lock);
+
+	pr_debug("%s: register handler for IRQ:MSI %d:%d\n", __func__, irq >> 6, msi_nr);
+	imap->cb[msi_nr].handler = handler;
+	imap->cb[msi_nr].data = data;
+
+	/*
+	 * The generic MSI code returns with the interrupt disabled on the
+	 * card, using the MSI mask bits. Firmware doesn't appear to unmask
+	 * at that level, so we do it here by hand.
+	 */
+	zpci_msi_set_mask_bits(msi, 1, 0);
+	return 0;
+}
+
+void zpci_free_irq(unsigned int irq)
+{
+	struct zdev_irq_map *imap = get_imap(irq);
+	int msi_nr = irq_to_msi_nr(irq);
+	unsigned long flags;
+
+	pr_debug("%s: for irq: %d\n", __func__, irq);
+
+	spin_lock_irqsave(&imap->lock, flags);
+	imap->cb[msi_nr].handler = NULL;
+	imap->cb[msi_nr].data = NULL;
+	spin_unlock_irqrestore(&imap->lock, flags);
+}
+
+int request_irq(unsigned int irq, irq_handler_t handler,
+		unsigned long irqflags, const char *devname, void *dev_id)
+{
+	pr_debug("%s: irq: %d  handler: %p  flags: %lx  dev: %s\n",
+		__func__, irq, handler, irqflags, devname);
+
+	return zpci_request_irq(irq, handler, dev_id);
+}
+EXPORT_SYMBOL_GPL(request_irq);
+
+void free_irq(unsigned int irq, void *dev_id)
+{
+	zpci_free_irq(irq);
+}
+EXPORT_SYMBOL_GPL(free_irq);
+
+static int __init zpci_irq_init(void)
+{
+	int cpu, rc;
+
+	bucket = kzalloc(sizeof(*bucket), GFP_KERNEL);
+	if (!bucket)
+		return -ENOMEM;
+
+	bucket->aisb = (unsigned long *) get_zeroed_page(GFP_KERNEL);
+	if (!bucket->aisb) {
+		rc = -ENOMEM;
+		goto out_aisb;
+	}
+
+	bucket->alloc = (unsigned long *) get_zeroed_page(GFP_KERNEL);
+	if (!bucket->alloc) {
+		rc = -ENOMEM;
+		goto out_alloc;
+	}
+
+	isc_register(PCI_ISC);
+	zpci_irq_si = s390_register_adapter_interrupt(&zpci_irq_handler, NULL, PCI_ISC);
+	if (IS_ERR(zpci_irq_si)) {
+		rc = PTR_ERR(zpci_irq_si);
+		zpci_irq_si = NULL;
+		goto out_ai;
+	}
+
+	for_each_online_cpu(cpu)
+		per_cpu(next_sbit, cpu) = 0;
+
+	spin_lock_init(&bucket->lock);
+	/* set summary to 1 to be called every time for the ISC */
+	*zpci_irq_si = 1;
+	sic_instr(SIC_IRQ_MODE_SINGLE, NULL, PCI_ISC);
+	return 0;
+
+out_ai:
+	isc_unregister(PCI_ISC);
+	free_page((unsigned long) bucket->alloc);
+out_alloc:
+	free_page((unsigned long) bucket->aisb);
+out_aisb:
+	kfree(bucket);
+	return rc;
+}
+
+static void zpci_irq_exit(void)
+{
+	free_page((unsigned long) bucket->alloc);
+	free_page((unsigned long) bucket->aisb);
+	s390_unregister_adapter_interrupt(zpci_irq_si, PCI_ISC);
+	isc_unregister(PCI_ISC);
+	kfree(bucket);
+}
+
 static struct resource *zpci_alloc_bus_resource(unsigned long start, unsigned long size,
 						unsigned long flags, int domain)
 {
@@ -523,13 +965,20 @@ static inline int barsize(u8 size)
 
 static int zpci_mem_init(void)
 {
+	zdev_irq_cache = kmem_cache_create("PCI_IRQ_cache", sizeof(struct zdev_irq_map),
+				L1_CACHE_BYTES, SLAB_HWCACHE_ALIGN, NULL);
+	if (!zdev_irq_cache)
+		goto error_zdev;
+
 	/* TODO: use realloc */
 	zpci_iomap_start = kzalloc(ZPCI_IOMAP_MAX_ENTRIES * sizeof(*zpci_iomap_start),
 				   GFP_KERNEL);
 	if (!zpci_iomap_start)
-		goto error_zdev;
+		goto error_iomap;
 	return 0;
 
+error_iomap:
+	kmem_cache_destroy(zdev_irq_cache);
 error_zdev:
 	return -ENOMEM;
 }
@@ -537,6 +986,7 @@ error_zdev:
 static void zpci_mem_exit(void)
 {
 	kfree(zpci_iomap_start);
+	kmem_cache_destroy(zdev_irq_cache);
 }
 
 unsigned int pci_probe = 1;
@@ -570,6 +1020,14 @@ static int __init pci_base_init(void)
 	if (rc)
 		goto out_mem;
 
+	rc = zpci_msihash_init();
+	if (rc)
+		goto out_hash;
+
+	rc = zpci_irq_init();
+	if (rc)
+		goto out_irq;
+
 	rc = clp_find_pci_devices();
 	if (rc)
 		goto out_find;
@@ -578,6 +1036,10 @@ static int __init pci_base_init(void)
 	return 0;
 
 out_find:
+	zpci_irq_exit();
+out_irq:
+	zpci_msihash_exit();
+out_hash:
 	zpci_mem_exit();
 out_mem:
 	return rc;
diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c
index 291da1a..72694fb 100644
--- a/arch/s390/pci/pci_clp.c
+++ b/arch/s390/pci/pci_clp.c
@@ -48,6 +48,9 @@ static void clp_free_block(void *ptr)
 static void clp_store_query_pci_fngrp(struct zpci_dev *zdev,
 				      struct clp_rsp_query_pci_grp *response)
 {
+	zdev->msi_addr = response->msia;
+
+	pr_debug("Supported number of MSI vectors: %u\n", response->noi);
 	switch (response->version) {
 	case 1:
 		zdev->max_bus_speed = PCIE_SPEED_5_0GT;
diff --git a/arch/s390/pci/pci_msi.c b/arch/s390/pci/pci_msi.c
new file mode 100644
index 0000000..90fd348
--- /dev/null
+++ b/arch/s390/pci/pci_msi.c
@@ -0,0 +1,141 @@
+/*
+ * Copyright IBM Corp. 2012
+ *
+ * Author(s):
+ *   Jan Glauber <jang@linux.vnet.ibm.com>
+ */
+
+#define COMPONENT "zPCI"
+#define pr_fmt(fmt) COMPONENT ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/err.h>
+#include <linux/rculist.h>
+#include <linux/hash.h>
+#include <linux/pci.h>
+#include <linux/msi.h>
+#include <asm/hw_irq.h>
+
+/* mapping of irq numbers to msi_desc */
+static struct hlist_head *msi_hash;
+static unsigned int msihash_shift = 6;
+#define msi_hashfn(nr)	hash_long(nr, msihash_shift)
+
+static DEFINE_SPINLOCK(msi_map_lock);
+
+struct msi_desc *__irq_get_msi_desc(unsigned int irq)
+{
+	struct hlist_node *entry;
+	struct msi_map *map;
+
+	hlist_for_each_entry_rcu(map, entry,
+			&msi_hash[msi_hashfn(irq)], msi_chain)
+		if (map->irq == irq)
+			return map->msi;
+	return NULL;
+}
+
+int zpci_msi_set_mask_bits(struct msi_desc *msi, u32 mask, u32 flag)
+{
+	if (msi->msi_attrib.is_msix) {
+		int offset = msi->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
+			PCI_MSIX_ENTRY_VECTOR_CTRL;
+		msi->masked = readl(msi->mask_base + offset);
+		writel(flag, msi->mask_base + offset);
+	} else {
+		if (msi->msi_attrib.maskbit) {
+			int pos;
+			u32 mask_bits;
+
+			pos = (long) msi->mask_base;
+			pci_read_config_dword(msi->dev, pos, &mask_bits);
+			mask_bits &= ~(mask);
+			mask_bits |= flag & mask;
+			pci_write_config_dword(msi->dev, pos, mask_bits);
+		} else {
+			return 0;
+		}
+	}
+
+	msi->msi_attrib.maskbit = !!flag;
+	return 1;
+}
+
+int zpci_setup_msi_irq(struct zpci_dev *zdev, struct msi_desc *msi,
+			unsigned int nr, int offset)
+{
+	struct msi_map *map;
+	struct msi_msg msg;
+	int rc;
+
+	map = kmalloc(sizeof(*map), GFP_KERNEL);
+	if (map == NULL)
+		return -ENOMEM;
+
+	map->irq = nr;
+	map->msi = msi;
+	zdev->msi_map[nr & ZPCI_MSI_MASK] = map;
+
+	pr_debug("%s hashing irq: %u  to bucket nr: %llu\n",
+		__func__, nr, msi_hashfn(nr));
+	hlist_add_head_rcu(&map->msi_chain, &msi_hash[msi_hashfn(nr)]);
+
+	spin_lock(&msi_map_lock);
+	rc = irq_set_msi_desc(nr, msi);
+	if (rc) {
+		spin_unlock(&msi_map_lock);
+		hlist_del_rcu(&map->msi_chain);
+		kfree(map);
+		zdev->msi_map[nr & ZPCI_MSI_MASK] = NULL;
+		return rc;
+	}
+	spin_unlock(&msi_map_lock);
+
+	msg.data = nr - offset;
+	msg.address_lo = zdev->msi_addr & 0xffffffff;
+	msg.address_hi = zdev->msi_addr >> 32;
+	write_msi_msg(nr, &msg);
+	return 0;
+}
+
+void zpci_teardown_msi_irq(struct zpci_dev *zdev, struct msi_desc *msi)
+{
+	int irq = msi->irq & ZPCI_MSI_MASK;
+	struct msi_map *map;
+
+	msi->msg.address_lo = 0;
+	msi->msg.address_hi = 0;
+	msi->msg.data = 0;
+	msi->irq = 0;
+	zpci_msi_set_mask_bits(msi, 1, 1);
+
+	spin_lock(&msi_map_lock);
+	map = zdev->msi_map[irq];
+	hlist_del_rcu(&map->msi_chain);
+	kfree(map);
+	zdev->msi_map[irq] = NULL;
+	spin_unlock(&msi_map_lock);
+}
+
+/*
+ * The msi hash table has 256 entries which is good for 4..20
+ * devices (a typical device allocates 10 + CPUs MSI's). Maybe make
+ * the hash table size adjustable later.
+ */
+int __init zpci_msihash_init(void)
+{
+	unsigned int i;
+
+	msi_hash = kmalloc(256 * sizeof(*msi_hash), GFP_KERNEL);
+	if (!msi_hash)
+		return -ENOMEM;
+
+	for (i = 0; i < (1U << msihash_shift); i++)
+		INIT_HLIST_HEAD(&msi_hash[i]);
+	return 0;
+}
+
+void __init zpci_msihash_exit(void)
+{
+	kfree(msi_hash);
+}
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index a825d78..192e8ca 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -207,6 +207,8 @@ static void msix_mask_irq(struct msi_desc *desc, u32 flag)
 	desc->masked = __msix_mask_irq(desc, flag);
 }
 
+#ifdef CONFIG_GENERIC_HARDIRQS
+
 static void msi_set_mask_bit(struct irq_data *data, u32 flag)
 {
 	struct msi_desc *desc = irq_data_get_msi(data);
@@ -230,6 +232,8 @@ void unmask_msi_irq(struct irq_data *data)
 	msi_set_mask_bit(data, 0);
 }
 
+#endif /* CONFIG_GENERIC_HARDIRQS */
+
 void __read_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
 {
 	BUG_ON(entry->dev->current_state != PCI_D0);
@@ -333,12 +337,14 @@ static void free_msi_irqs(struct pci_dev *dev)
 	struct msi_desc *entry, *tmp;
 
 	list_for_each_entry(entry, &dev->msi_list, list) {
-		int i, nvec;
+		int nvec;
 		if (!entry->irq)
 			continue;
 		nvec = 1 << entry->msi_attrib.multiple;
-		for (i = 0; i < nvec; i++)
+#ifdef CONFIG_GENERIC_HARDIRQS
+		for (int i = 0; i < nvec; i++)
 			BUG_ON(irq_has_action(entry->irq + i));
+#endif
 	}
 
 	arch_teardown_msi_irqs(dev);
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 216b0ba..e21ed83 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -10,9 +10,6 @@
  */
 
 #include <linux/smp.h>
-
-#ifndef CONFIG_S390
-
 #include <linux/linkage.h>
 #include <linux/cache.h>
 #include <linux/spinlock.h>
@@ -737,8 +734,11 @@ static inline void irq_gc_lock(struct irq_chip_generic *gc) { }
 static inline void irq_gc_unlock(struct irq_chip_generic *gc) { }
 #endif
 
-#endif /* CONFIG_GENERIC_HARDIRQS */
+#else /* !CONFIG_GENERIC_HARDIRQS */
 
-#endif /* !CONFIG_S390 */
+extern struct msi_desc *irq_get_msi_desc(unsigned int irq);
+extern int irq_set_msi_desc(unsigned int irq, struct msi_desc *entry);
+
+#endif /* CONFIG_GENERIC_HARDIRQS */
 
 #endif /* _LINUX_IRQ_H */
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 05/10] s390/pci: DMA support
  2012-11-14  9:41 [RFC PATCH 00/10] s390/pci: PCI support on System z Jan Glauber
                   ` (3 preceding siblings ...)
  2012-11-14  9:41 ` [RFC PATCH 04/10] s390/pci: PCI adapter interrupts for MSI/MSI-X Jan Glauber
@ 2012-11-14  9:41 ` Jan Glauber
  2012-11-14  9:41 ` [RFC PATCH 06/10] s390/pci: CHSC PCI support for error and availability events Jan Glauber
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Jan Glauber @ 2012-11-14  9:41 UTC (permalink / raw)
  To: linux-pci; +Cc: linux-kernel, heiko.carstens, schwidefsky, Jan Glauber

Add DMA IOMMU support using 4K page table entries. Implement dma_map_ops.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
---
 arch/s390/include/asm/dma-mapping.h |  76 ++++++
 arch/s390/include/asm/dma.h         |  19 +-
 arch/s390/include/asm/pci.h         |  21 ++
 arch/s390/include/asm/pci_dma.h     | 196 ++++++++++++++
 arch/s390/pci/Makefile              |   2 +-
 arch/s390/pci/pci.c                 |  36 +++
 arch/s390/pci/pci_clp.c             |   4 +
 arch/s390/pci/pci_dma.c             | 505 ++++++++++++++++++++++++++++++++++++
 8 files changed, 848 insertions(+), 11 deletions(-)
 create mode 100644 arch/s390/include/asm/dma-mapping.h
 create mode 100644 arch/s390/include/asm/pci_dma.h
 create mode 100644 arch/s390/pci/pci_dma.c

diff --git a/arch/s390/include/asm/dma-mapping.h b/arch/s390/include/asm/dma-mapping.h
new file mode 100644
index 0000000..aab2158
--- /dev/null
+++ b/arch/s390/include/asm/dma-mapping.h
@@ -0,0 +1,76 @@
+#ifndef _ASM_S390_DMA_MAPPING_H
+#define _ASM_S390_DMA_MAPPING_H
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/mm.h>
+#include <linux/scatterlist.h>
+#include <linux/dma-attrs.h>
+#include <linux/dma-debug.h>
+#include <linux/io.h>
+
+#define DMA_ERROR_CODE          (~(dma_addr_t) 0x0)
+
+extern struct dma_map_ops s390_dma_ops;
+
+static inline struct dma_map_ops *get_dma_ops(struct device *dev)
+{
+	return &s390_dma_ops;
+}
+
+extern int dma_set_mask(struct device *dev, u64 mask);
+extern int dma_is_consistent(struct device *dev, dma_addr_t dma_handle);
+extern void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
+			   enum dma_data_direction direction);
+
+#define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f)
+#define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h)
+
+#include <asm-generic/dma-mapping-common.h>
+
+static inline int dma_supported(struct device *dev, u64 mask)
+{
+	struct dma_map_ops *dma_ops = get_dma_ops(dev);
+
+	if (dma_ops->dma_supported == NULL)
+		return 1;
+	return dma_ops->dma_supported(dev, mask);
+}
+
+static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
+{
+	if (!dev->dma_mask)
+		return 0;
+	return addr + size - 1 <= *dev->dma_mask;
+}
+
+static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+	struct dma_map_ops *dma_ops = get_dma_ops(dev);
+
+	if (dma_ops->mapping_error)
+		return dma_ops->mapping_error(dev, dma_addr);
+	return (dma_addr == 0UL);
+}
+
+static inline void *dma_alloc_coherent(struct device *dev, size_t size,
+				       dma_addr_t *dma_handle, gfp_t flag)
+{
+	struct dma_map_ops *ops = get_dma_ops(dev);
+	void *ret;
+
+	ret = ops->alloc(dev, size, dma_handle, flag, NULL);
+	debug_dma_alloc_coherent(dev, size, *dma_handle, ret);
+	return ret;
+}
+
+static inline void dma_free_coherent(struct device *dev, size_t size,
+				     void *cpu_addr, dma_addr_t dma_handle)
+{
+	struct dma_map_ops *dma_ops = get_dma_ops(dev);
+
+	dma_ops->free(dev, size, cpu_addr, dma_handle, NULL);
+	debug_dma_free_coherent(dev, size, cpu_addr, dma_handle);
+}
+
+#endif /* _ASM_S390_DMA_MAPPING_H */
diff --git a/arch/s390/include/asm/dma.h b/arch/s390/include/asm/dma.h
index 6fb6de4..de015d8 100644
--- a/arch/s390/include/asm/dma.h
+++ b/arch/s390/include/asm/dma.h
@@ -1,14 +1,13 @@
-/*
- *  S390 version
- */
-
-#ifndef _ASM_DMA_H
-#define _ASM_DMA_H
+#ifndef _ASM_S390_DMA_H
+#define _ASM_S390_DMA_H
 
-#include <asm/io.h>		/* need byte IO */
+#include <asm/io.h>
 
+/*
+ * MAX_DMA_ADDRESS is ambiguous because on s390 its completely unrelated
+ * to DMA. It _is_ used for the s390 memory zone split at 2GB caused
+ * by the 31 bit heritage.
+ */
 #define MAX_DMA_ADDRESS         0x80000000
 
-#define free_dma(x)	do { } while (0)
-
-#endif /* _ASM_DMA_H */
+#endif /* _ASM_S390_DMA_H */
diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index a9aadeb..ca48caf 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -75,8 +75,23 @@ struct zpci_dev {
 	struct msi_map *msi_map[ZPCI_NR_MSI_VECS];
 	unsigned int	aisb;		/* number of the summary bit */
 
+	/* DMA stuff */
+	unsigned long	*dma_table;
+	spinlock_t	dma_table_lock;
+	int		tlb_refresh;
+
+	spinlock_t	iommu_bitmap_lock;
+	unsigned long	*iommu_bitmap;
+	unsigned long	iommu_size;
+	unsigned long	iommu_pages;
+	unsigned int	next_bit;
+
 	struct zpci_bar_struct bars[PCI_BAR_COUNT];
 
+	u64		start_dma;	/* Start of available DMA addresses */
+	u64		end_dma;	/* End of available DMA addresses */
+	u64		dma_mask;	/* DMA address space mask */
+
 	enum pci_bus_speed max_bus_speed;
 };
 
@@ -95,6 +110,8 @@ int zpci_enable_device(struct zpci_dev *);
 void zpci_stop_device(struct zpci_dev *);
 void zpci_free_device(struct zpci_dev *);
 int zpci_scan_device(struct zpci_dev *);
+int zpci_register_ioat(struct zpci_dev *, u8, u64, u64, u64);
+int zpci_unregister_ioat(struct zpci_dev *, u8);
 
 /* CLP */
 int clp_find_pci_devices(void);
@@ -115,4 +132,8 @@ struct zpci_dev *get_zdev(struct pci_dev *);
 struct zpci_dev *get_zdev_by_fid(u32);
 bool zpci_fid_present(u32);
 
+/* DMA */
+int zpci_dma_init(void);
+void zpci_dma_exit(void);
+
 #endif
diff --git a/arch/s390/include/asm/pci_dma.h b/arch/s390/include/asm/pci_dma.h
new file mode 100644
index 0000000..b6492bd
--- /dev/null
+++ b/arch/s390/include/asm/pci_dma.h
@@ -0,0 +1,196 @@
+#ifndef _ASM_S390_PCI_DMA_H
+#define _ASM_S390_PCI_DMA_H
+
+/* I/O Translation Anchor (IOTA) */
+enum zpci_ioat_dtype {
+	ZPCI_IOTA_STO = 0,
+	ZPCI_IOTA_RTTO = 1,
+	ZPCI_IOTA_RSTO = 2,
+	ZPCI_IOTA_RFTO = 3,
+	ZPCI_IOTA_PFAA = 4,
+	ZPCI_IOTA_IOPFAA = 5,
+	ZPCI_IOTA_IOPTO = 7
+};
+
+#define ZPCI_IOTA_IOT_ENABLED		0x800UL
+#define ZPCI_IOTA_DT_ST			(ZPCI_IOTA_STO  << 2)
+#define ZPCI_IOTA_DT_RT			(ZPCI_IOTA_RTTO << 2)
+#define ZPCI_IOTA_DT_RS			(ZPCI_IOTA_RSTO << 2)
+#define ZPCI_IOTA_DT_RF			(ZPCI_IOTA_RFTO << 2)
+#define ZPCI_IOTA_DT_PF			(ZPCI_IOTA_PFAA << 2)
+#define ZPCI_IOTA_FS_4K			0
+#define ZPCI_IOTA_FS_1M			1
+#define ZPCI_IOTA_FS_2G			2
+#define ZPCI_KEY			(PAGE_DEFAULT_KEY << 5)
+
+#define ZPCI_IOTA_STO_FLAG	(ZPCI_IOTA_IOT_ENABLED | ZPCI_KEY | ZPCI_IOTA_DT_ST)
+#define ZPCI_IOTA_RTTO_FLAG	(ZPCI_IOTA_IOT_ENABLED | ZPCI_KEY | ZPCI_IOTA_DT_RT)
+#define ZPCI_IOTA_RSTO_FLAG	(ZPCI_IOTA_IOT_ENABLED | ZPCI_KEY | ZPCI_IOTA_DT_RS)
+#define ZPCI_IOTA_RFTO_FLAG	(ZPCI_IOTA_IOT_ENABLED | ZPCI_KEY | ZPCI_IOTA_DT_RF)
+#define ZPCI_IOTA_RFAA_FLAG	(ZPCI_IOTA_IOT_ENABLED | ZPCI_KEY | ZPCI_IOTA_DT_PF | ZPCI_IOTA_FS_2G)
+
+/* I/O Region and segment tables */
+#define ZPCI_INDEX_MASK			0x7ffUL
+
+#define ZPCI_TABLE_TYPE_MASK		0xc
+#define ZPCI_TABLE_TYPE_RFX		0xc
+#define ZPCI_TABLE_TYPE_RSX		0x8
+#define ZPCI_TABLE_TYPE_RTX		0x4
+#define ZPCI_TABLE_TYPE_SX		0x0
+
+#define ZPCI_TABLE_LEN_RFX		0x3
+#define ZPCI_TABLE_LEN_RSX		0x3
+#define ZPCI_TABLE_LEN_RTX		0x3
+
+#define ZPCI_TABLE_OFFSET_MASK		0xc0
+#define ZPCI_TABLE_SIZE			0x4000
+#define ZPCI_TABLE_ALIGN		ZPCI_TABLE_SIZE
+#define ZPCI_TABLE_ENTRY_SIZE		(sizeof(unsigned long))
+#define ZPCI_TABLE_ENTRIES		(ZPCI_TABLE_SIZE / ZPCI_TABLE_ENTRY_SIZE)
+
+#define ZPCI_TABLE_BITS			11
+#define ZPCI_PT_BITS			8
+#define ZPCI_ST_SHIFT			(ZPCI_PT_BITS + PAGE_SHIFT)
+#define ZPCI_RT_SHIFT			(ZPCI_ST_SHIFT + ZPCI_TABLE_BITS)
+
+#define ZPCI_RTE_FLAG_MASK		0x3fffUL
+#define ZPCI_RTE_ADDR_MASK		(~ZPCI_RTE_FLAG_MASK)
+#define ZPCI_STE_FLAG_MASK		0x7ffUL
+#define ZPCI_STE_ADDR_MASK		(~ZPCI_STE_FLAG_MASK)
+
+/* I/O Page tables */
+#define ZPCI_PTE_VALID_MASK		0x400
+#define ZPCI_PTE_INVALID		0x400
+#define ZPCI_PTE_VALID			0x000
+#define ZPCI_PT_SIZE			0x800
+#define ZPCI_PT_ALIGN			ZPCI_PT_SIZE
+#define ZPCI_PT_ENTRIES			(ZPCI_PT_SIZE / ZPCI_TABLE_ENTRY_SIZE)
+#define ZPCI_PT_MASK			(ZPCI_PT_ENTRIES - 1)
+
+#define ZPCI_PTE_FLAG_MASK		0xfffUL
+#define ZPCI_PTE_ADDR_MASK		(~ZPCI_PTE_FLAG_MASK)
+
+/* Shared bits */
+#define ZPCI_TABLE_VALID		0x00
+#define ZPCI_TABLE_INVALID		0x20
+#define ZPCI_TABLE_PROTECTED		0x200
+#define ZPCI_TABLE_UNPROTECTED		0x000
+
+#define ZPCI_TABLE_VALID_MASK		0x20
+#define ZPCI_TABLE_PROT_MASK		0x200
+
+static inline unsigned int calc_rtx(dma_addr_t ptr)
+{
+	return ((unsigned long) ptr >> ZPCI_RT_SHIFT) & ZPCI_INDEX_MASK;
+}
+
+static inline unsigned int calc_sx(dma_addr_t ptr)
+{
+	return ((unsigned long) ptr >> ZPCI_ST_SHIFT) & ZPCI_INDEX_MASK;
+}
+
+static inline unsigned int calc_px(dma_addr_t ptr)
+{
+	return ((unsigned long) ptr >> PAGE_SHIFT) & ZPCI_PT_MASK;
+}
+
+static inline void set_pt_pfaa(unsigned long *entry, void *pfaa)
+{
+	*entry &= ZPCI_PTE_FLAG_MASK;
+	*entry |= ((unsigned long) pfaa & ZPCI_PTE_ADDR_MASK);
+}
+
+static inline void set_rt_sto(unsigned long *entry, void *sto)
+{
+	*entry &= ZPCI_RTE_FLAG_MASK;
+	*entry |= ((unsigned long) sto & ZPCI_RTE_ADDR_MASK);
+	*entry |= ZPCI_TABLE_TYPE_RTX;
+}
+
+static inline void set_st_pto(unsigned long *entry, void *pto)
+{
+	*entry &= ZPCI_STE_FLAG_MASK;
+	*entry |= ((unsigned long) pto & ZPCI_STE_ADDR_MASK);
+	*entry |= ZPCI_TABLE_TYPE_SX;
+}
+
+static inline void validate_rt_entry(unsigned long *entry)
+{
+	*entry &= ~ZPCI_TABLE_VALID_MASK;
+	*entry &= ~ZPCI_TABLE_OFFSET_MASK;
+	*entry |= ZPCI_TABLE_VALID;
+	*entry |= ZPCI_TABLE_LEN_RTX;
+}
+
+static inline void validate_st_entry(unsigned long *entry)
+{
+	*entry &= ~ZPCI_TABLE_VALID_MASK;
+	*entry |= ZPCI_TABLE_VALID;
+}
+
+static inline void invalidate_table_entry(unsigned long *entry)
+{
+	*entry &= ~ZPCI_TABLE_VALID_MASK;
+	*entry |= ZPCI_TABLE_INVALID;
+}
+
+static inline void invalidate_pt_entry(unsigned long *entry)
+{
+	WARN_ON_ONCE((*entry & ZPCI_PTE_VALID_MASK) == ZPCI_PTE_INVALID);
+	*entry &= ~ZPCI_PTE_VALID_MASK;
+	*entry |= ZPCI_PTE_INVALID;
+}
+
+static inline void validate_pt_entry(unsigned long *entry)
+{
+	WARN_ON_ONCE((*entry & ZPCI_PTE_VALID_MASK) == ZPCI_PTE_VALID);
+	*entry &= ~ZPCI_PTE_VALID_MASK;
+	*entry |= ZPCI_PTE_VALID;
+}
+
+static inline void entry_set_protected(unsigned long *entry)
+{
+	*entry &= ~ZPCI_TABLE_PROT_MASK;
+	*entry |= ZPCI_TABLE_PROTECTED;
+}
+
+static inline void entry_clr_protected(unsigned long *entry)
+{
+	*entry &= ~ZPCI_TABLE_PROT_MASK;
+	*entry |= ZPCI_TABLE_UNPROTECTED;
+}
+
+static inline int reg_entry_isvalid(unsigned long entry)
+{
+	return (entry & ZPCI_TABLE_VALID_MASK) == ZPCI_TABLE_VALID;
+}
+
+static inline int pt_entry_isvalid(unsigned long entry)
+{
+	return (entry & ZPCI_PTE_VALID_MASK) == ZPCI_PTE_VALID;
+}
+
+static inline int entry_isprotected(unsigned long entry)
+{
+	return (entry & ZPCI_TABLE_PROT_MASK) == ZPCI_TABLE_PROTECTED;
+}
+
+static inline unsigned long *get_rt_sto(unsigned long entry)
+{
+	return ((entry & ZPCI_TABLE_TYPE_MASK) == ZPCI_TABLE_TYPE_RTX)
+		? (unsigned long *) (entry & ZPCI_RTE_ADDR_MASK)
+		: NULL;
+}
+
+static inline unsigned long *get_st_pto(unsigned long entry)
+{
+	return ((entry & ZPCI_TABLE_TYPE_MASK) == ZPCI_TABLE_TYPE_SX)
+		? (unsigned long *) (entry & ZPCI_STE_ADDR_MASK)
+		: NULL;
+}
+
+/* Prototypes */
+int zpci_dma_init_device(struct zpci_dev *);
+void zpci_dma_exit_device(struct zpci_dev *);
+
+#endif
diff --git a/arch/s390/pci/Makefile b/arch/s390/pci/Makefile
index 628be7b..4590596 100644
--- a/arch/s390/pci/Makefile
+++ b/arch/s390/pci/Makefile
@@ -2,4 +2,4 @@
 # Makefile for the s390 PCI subsystem.
 #
 
-obj-$(CONFIG_PCI)	+= pci.o pci_clp.o pci_msi.o
+obj-$(CONFIG_PCI)	+= pci.o pci_dma.o pci_clp.o pci_msi.o
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index d11dc8a..5a2ef9e 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -34,6 +34,7 @@
 #include <asm/facility.h>
 #include <asm/pci_insn.h>
 #include <asm/pci_clp.h>
+#include <asm/pci_dma.h>
 
 #define DEBUG				/* enable pr_debug */
 
@@ -232,6 +233,25 @@ static int mod_pci(struct zpci_dev *zdev, int fn, u8 dmaas, struct mod_pci_args
 	return rc;
 }
 
+/* Modify PCI: Register I/O address translation parameters */
+int zpci_register_ioat(struct zpci_dev *zdev, u8 dmaas,
+		       u64 base, u64 limit, u64 iota)
+{
+	struct mod_pci_args args = { base, limit, iota };
+
+	WARN_ON_ONCE(iota & 0x3fff);
+	args.iota |= ZPCI_IOTA_RTTO_FLAG;
+	return mod_pci(zdev, ZPCI_MOD_FC_REG_IOAT, dmaas, &args);
+}
+
+/* Modify PCI: Unregister I/O address translation parameters */
+int zpci_unregister_ioat(struct zpci_dev *zdev, u8 dmaas)
+{
+	struct mod_pci_args args = { 0, 0, 0 };
+
+	return mod_pci(zdev, ZPCI_MOD_FC_DEREG_IOAT, dmaas, &args);
+}
+
 /* Modify PCI: Unregister adapter interruptions */
 static int zpci_unregister_airq(struct zpci_dev *zdev)
 {
@@ -602,6 +622,7 @@ static void zpci_remove_device(struct pci_dev *pdev)
 
 	dev_info(&pdev->dev, "Removing device %u\n", zdev->domain);
 	zdev->state = ZPCI_FN_STATE_CONFIGURED;
+	zpci_dma_exit_device(zdev);
 	zpci_unmap_resources(pdev);
 	list_del(&zdev->entry);		/* can be called from init */
 	zdev->pdev = NULL;
@@ -887,7 +908,14 @@ int zpci_enable_device(struct zpci_dev *zdev)
 	if (rc)
 		goto out;
 	pr_info("Enabled fh: 0x%x fid: 0x%x\n", zdev->fh, zdev->fid);
+
+	rc = zpci_dma_init_device(zdev);
+	if (rc)
+		goto out_dma;
 	return 0;
+
+out_dma:
+	clp_disable_fh(zdev);
 out:
 	return rc;
 }
@@ -929,6 +957,7 @@ out:
 
 void zpci_stop_device(struct zpci_dev *zdev)
 {
+	zpci_dma_exit_device(zdev);
 	/*
 	 * Note: SCLP disables fh via set-pci-fn so don't
 	 * do that here.
@@ -953,6 +982,7 @@ int zpci_scan_device(struct zpci_dev *zdev)
 	return 0;
 
 out:
+	zpci_dma_exit_device(zdev);
 	clp_disable_fh(zdev);
 	return -EIO;
 }
@@ -1028,6 +1058,10 @@ static int __init pci_base_init(void)
 	if (rc)
 		goto out_irq;
 
+	rc = zpci_dma_init();
+	if (rc)
+		goto out_dma;
+
 	rc = clp_find_pci_devices();
 	if (rc)
 		goto out_find;
@@ -1036,6 +1070,8 @@ static int __init pci_base_init(void)
 	return 0;
 
 out_find:
+	zpci_dma_exit();
+out_dma:
 	zpci_irq_exit();
 out_irq:
 	zpci_msihash_exit();
diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c
index 72694fb..7f4ce8d 100644
--- a/arch/s390/pci/pci_clp.c
+++ b/arch/s390/pci/pci_clp.c
@@ -48,6 +48,8 @@ static void clp_free_block(void *ptr)
 static void clp_store_query_pci_fngrp(struct zpci_dev *zdev,
 				      struct clp_rsp_query_pci_grp *response)
 {
+	zdev->tlb_refresh = response->refresh;
+	zdev->dma_mask = response->dasm;
 	zdev->msi_addr = response->msia;
 
 	pr_debug("Supported number of MSI vectors: %u\n", response->noi);
@@ -97,6 +99,8 @@ static int clp_store_query_pci_fn(struct zpci_dev *zdev,
 		zdev->bars[i].val = le32_to_cpu(response->bar[i]);
 		zdev->bars[i].size = response->bar_size[i];
 	}
+	zdev->start_dma = response->sdma;
+	zdev->end_dma = response->edma;
 	zdev->pchid = response->pchid;
 	zdev->pfgid = response->pfgid;
 	return 0;
diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
new file mode 100644
index 0000000..de48625
--- /dev/null
+++ b/arch/s390/pci/pci_dma.c
@@ -0,0 +1,505 @@
+/*
+ * Copyright IBM Corp. 2012
+ *
+ * Author(s):
+ *   Jan Glauber <jang@linux.vnet.ibm.com>
+ */
+
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/export.h>
+#include <linux/iommu-helper.h>
+#include <linux/dma-mapping.h>
+#include <linux/pci.h>
+#include <asm/pci_dma.h>
+
+static enum zpci_ioat_dtype zpci_ioat_dt = ZPCI_IOTA_RTTO;
+
+static struct kmem_cache *dma_region_table_cache;
+static struct kmem_cache *dma_page_table_cache;
+
+static unsigned long *dma_alloc_cpu_table(void)
+{
+	unsigned long *table, *entry;
+
+	table = kmem_cache_alloc(dma_region_table_cache, GFP_ATOMIC);
+	if (!table)
+		return NULL;
+
+	for (entry = table; entry < table + ZPCI_TABLE_ENTRIES; entry++)
+		*entry = ZPCI_TABLE_INVALID | ZPCI_TABLE_PROTECTED;
+	return table;
+}
+
+static void dma_free_cpu_table(void *table)
+{
+	kmem_cache_free(dma_region_table_cache, table);
+}
+
+static unsigned long *dma_alloc_page_table(void)
+{
+	unsigned long *table, *entry;
+
+	table = kmem_cache_alloc(dma_page_table_cache, GFP_ATOMIC);
+	if (!table)
+		return NULL;
+
+	for (entry = table; entry < table + ZPCI_PT_ENTRIES; entry++)
+		*entry = ZPCI_PTE_INVALID | ZPCI_TABLE_PROTECTED;
+	return table;
+}
+
+static void dma_free_page_table(void *table)
+{
+	kmem_cache_free(dma_page_table_cache, table);
+}
+
+static unsigned long *dma_get_seg_table_origin(unsigned long *entry)
+{
+	unsigned long *sto;
+
+	if (reg_entry_isvalid(*entry))
+		sto = get_rt_sto(*entry);
+	else {
+		sto = dma_alloc_cpu_table();
+		if (!sto)
+			return NULL;
+
+		set_rt_sto(entry, sto);
+		validate_rt_entry(entry);
+		entry_clr_protected(entry);
+	}
+	return sto;
+}
+
+static unsigned long *dma_get_page_table_origin(unsigned long *entry)
+{
+	unsigned long *pto;
+
+	if (reg_entry_isvalid(*entry))
+		pto = get_st_pto(*entry);
+	else {
+		pto = dma_alloc_page_table();
+		if (!pto)
+			return NULL;
+		set_st_pto(entry, pto);
+		validate_st_entry(entry);
+		entry_clr_protected(entry);
+	}
+	return pto;
+}
+
+static unsigned long *dma_walk_cpu_trans(unsigned long *rto, dma_addr_t dma_addr)
+{
+	unsigned long *sto, *pto;
+	unsigned int rtx, sx, px;
+
+	rtx = calc_rtx(dma_addr);
+	sto = dma_get_seg_table_origin(&rto[rtx]);
+	if (!sto)
+		return NULL;
+
+	sx = calc_sx(dma_addr);
+	pto = dma_get_page_table_origin(&sto[sx]);
+	if (!pto)
+		return NULL;
+
+	px = calc_px(dma_addr);
+	return &pto[px];
+}
+
+static void dma_update_cpu_trans(struct zpci_dev *zdev, void *page_addr,
+				 dma_addr_t dma_addr, int flags)
+{
+	unsigned long *entry;
+
+	entry = dma_walk_cpu_trans(zdev->dma_table, dma_addr);
+	if (!entry) {
+		WARN_ON_ONCE(1);
+		return;
+	}
+
+	if (flags & ZPCI_PTE_INVALID) {
+		invalidate_pt_entry(entry);
+		return;
+	} else {
+		set_pt_pfaa(entry, page_addr);
+		validate_pt_entry(entry);
+	}
+
+	if (flags & ZPCI_TABLE_PROTECTED)
+		entry_set_protected(entry);
+	else
+		entry_clr_protected(entry);
+}
+
+static int dma_update_trans(struct zpci_dev *zdev, unsigned long pa,
+			    dma_addr_t dma_addr, size_t size, int flags)
+{
+	unsigned int nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	u8 *page_addr = (u8 *) (pa & PAGE_MASK);
+	dma_addr_t start_dma_addr = dma_addr;
+	unsigned long irq_flags;
+	int i, rc = 0;
+
+	if (!nr_pages)
+		return -EINVAL;
+
+	spin_lock_irqsave(&zdev->dma_table_lock, irq_flags);
+	if (!zdev->dma_table) {
+		dev_err(&zdev->pdev->dev, "Missing DMA table\n");
+		goto no_refresh;
+	}
+
+	for (i = 0; i < nr_pages; i++) {
+		dma_update_cpu_trans(zdev, page_addr, dma_addr, flags);
+		page_addr += PAGE_SIZE;
+		dma_addr += PAGE_SIZE;
+	}
+
+	/*
+	 * rpcit is not required to establish new translations when previously
+	 * invalid translation-table entries are validated, however it is
+	 * required when altering previously valid entries.
+	 */
+	if (!zdev->tlb_refresh &&
+	    ((flags & ZPCI_PTE_VALID_MASK) == ZPCI_PTE_VALID))
+		/*
+		 * TODO: also need to check that the old entry is indeed INVALID
+		 * and not only for one page but for the whole range...
+		 * -> now we WARN_ON in that case but with lazy unmap that
+		 * needs to be redone!
+		 */
+		goto no_refresh;
+	rc = rpcit_instr((u64) zdev->fh << 32, start_dma_addr,
+			  nr_pages * PAGE_SIZE);
+
+no_refresh:
+	spin_unlock_irqrestore(&zdev->dma_table_lock, irq_flags);
+	return rc;
+}
+
+static void dma_free_seg_table(unsigned long entry)
+{
+	unsigned long *sto = get_rt_sto(entry);
+	int sx;
+
+	for (sx = 0; sx < ZPCI_TABLE_ENTRIES; sx++)
+		if (reg_entry_isvalid(sto[sx]))
+			dma_free_page_table(get_st_pto(sto[sx]));
+
+	dma_free_cpu_table(sto);
+}
+
+static void dma_cleanup_tables(struct zpci_dev *zdev)
+{
+	unsigned long *table = zdev->dma_table;
+	int rtx;
+
+	if (!zdev || !zdev->dma_table)
+		return;
+
+	for (rtx = 0; rtx < ZPCI_TABLE_ENTRIES; rtx++)
+		if (reg_entry_isvalid(table[rtx]))
+			dma_free_seg_table(table[rtx]);
+
+	dma_free_cpu_table(table);
+	zdev->dma_table = NULL;
+}
+
+static unsigned long __dma_alloc_iommu(struct zpci_dev *zdev, unsigned long start,
+				   int size)
+{
+	unsigned long boundary_size = 0x1000000;
+
+	return iommu_area_alloc(zdev->iommu_bitmap, zdev->iommu_pages,
+				start, size, 0, boundary_size, 0);
+}
+
+static unsigned long dma_alloc_iommu(struct zpci_dev *zdev, int size)
+{
+	unsigned long offset, flags;
+
+	spin_lock_irqsave(&zdev->iommu_bitmap_lock, flags);
+	offset = __dma_alloc_iommu(zdev, zdev->next_bit, size);
+	if (offset == -1)
+		offset = __dma_alloc_iommu(zdev, 0, size);
+
+	if (offset != -1) {
+		zdev->next_bit = offset + size;
+		if (zdev->next_bit >= zdev->iommu_pages)
+			zdev->next_bit = 0;
+	}
+	spin_unlock_irqrestore(&zdev->iommu_bitmap_lock, flags);
+	return offset;
+}
+
+static void dma_free_iommu(struct zpci_dev *zdev, unsigned long offset, int size)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&zdev->iommu_bitmap_lock, flags);
+	if (!zdev->iommu_bitmap)
+		goto out;
+	bitmap_clear(zdev->iommu_bitmap, offset, size);
+	if (offset >= zdev->next_bit)
+		zdev->next_bit = offset + size;
+out:
+	spin_unlock_irqrestore(&zdev->iommu_bitmap_lock, flags);
+}
+
+int dma_set_mask(struct device *dev, u64 mask)
+{
+	if (!dev->dma_mask || !dma_supported(dev, mask))
+		return -EIO;
+
+	*dev->dma_mask = mask;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(dma_set_mask);
+
+static dma_addr_t s390_dma_map_pages(struct device *dev, struct page *page,
+				     unsigned long offset, size_t size,
+				     enum dma_data_direction direction,
+				     struct dma_attrs *attrs)
+{
+	struct zpci_dev *zdev = get_zdev(container_of(dev, struct pci_dev, dev));
+	unsigned long nr_pages, iommu_page_index;
+	unsigned long pa = page_to_phys(page) + offset;
+	int flags = ZPCI_PTE_VALID;
+	dma_addr_t dma_addr;
+
+	WARN_ON_ONCE(offset > PAGE_SIZE);
+
+	/* This rounds up number of pages based on size and offset */
+	nr_pages = iommu_num_pages(pa, size, PAGE_SIZE);
+	iommu_page_index = dma_alloc_iommu(zdev, nr_pages);
+	if (iommu_page_index == -1)
+		goto out_err;
+
+	/* Use rounded up size */
+	size = nr_pages * PAGE_SIZE;
+
+	dma_addr = zdev->start_dma + iommu_page_index * PAGE_SIZE;
+	if (dma_addr + size > zdev->end_dma) {
+		dev_err(dev, "(dma_addr: 0x%16.16LX + size: 0x%16.16lx) > end_dma: 0x%16.16Lx\n",
+			 dma_addr, size, zdev->end_dma);
+		goto out_free;
+	}
+
+	if (direction == DMA_NONE || direction == DMA_TO_DEVICE)
+		flags |= ZPCI_TABLE_PROTECTED;
+
+	if (!dma_update_trans(zdev, pa, dma_addr, size, flags))
+		return dma_addr + offset;
+
+out_free:
+	dma_free_iommu(zdev, iommu_page_index, nr_pages);
+out_err:
+	dev_err(dev, "Failed to map addr: %lx\n", pa);
+	return DMA_ERROR_CODE;
+}
+
+static void s390_dma_unmap_pages(struct device *dev, dma_addr_t dma_addr,
+				 size_t size, enum dma_data_direction direction,
+				 struct dma_attrs *attrs)
+{
+	struct zpci_dev *zdev = get_zdev(container_of(dev, struct pci_dev, dev));
+	unsigned long iommu_page_index;
+	int npages;
+
+	npages = iommu_num_pages(dma_addr, size, PAGE_SIZE);
+	dma_addr = dma_addr & PAGE_MASK;
+	if (dma_update_trans(zdev, 0, dma_addr, npages * PAGE_SIZE,
+			     ZPCI_TABLE_PROTECTED | ZPCI_PTE_INVALID))
+		dev_err(dev, "Failed to unmap addr: %Lx\n", dma_addr);
+
+	iommu_page_index = (dma_addr - zdev->start_dma) >> PAGE_SHIFT;
+	dma_free_iommu(zdev, iommu_page_index, npages);
+}
+
+static void *s390_dma_alloc(struct device *dev, size_t size,
+			    dma_addr_t *dma_handle, gfp_t flag,
+			    struct dma_attrs *attrs)
+{
+	struct page *page;
+	unsigned long pa;
+	dma_addr_t map;
+
+	size = PAGE_ALIGN(size);
+	page = alloc_pages(flag, get_order(size));
+	if (!page)
+		return NULL;
+	pa = page_to_phys(page);
+	memset((void *) pa, 0, size);
+
+	map = s390_dma_map_pages(dev, page, pa % PAGE_SIZE,
+				 size, DMA_BIDIRECTIONAL, NULL);
+	if (dma_mapping_error(dev, map)) {
+		free_pages(pa, get_order(size));
+		return NULL;
+	}
+
+	if (dma_handle)
+		*dma_handle = map;
+	return (void *) pa;
+}
+
+static void s390_dma_free(struct device *dev, size_t size,
+			  void *pa, dma_addr_t dma_handle,
+			  struct dma_attrs *attrs)
+{
+	s390_dma_unmap_pages(dev, dma_handle, PAGE_ALIGN(size),
+			     DMA_BIDIRECTIONAL, NULL);
+	free_pages((unsigned long) pa, get_order(size));
+}
+
+static int s390_dma_map_sg(struct device *dev, struct scatterlist *sg,
+			   int nr_elements, enum dma_data_direction dir,
+			   struct dma_attrs *attrs)
+{
+	int mapped_elements = 0;
+	struct scatterlist *s;
+	int i;
+
+	for_each_sg(sg, s, nr_elements, i) {
+		struct page *page = sg_page(s);
+		s->dma_address = s390_dma_map_pages(dev, page, s->offset,
+						    s->length, dir, NULL);
+		if (!dma_mapping_error(dev, s->dma_address)) {
+			s->dma_length = s->length;
+			mapped_elements++;
+		} else
+			goto unmap;
+	}
+out:
+	return mapped_elements;
+
+unmap:
+	for_each_sg(sg, s, mapped_elements, i) {
+		if (s->dma_address)
+			s390_dma_unmap_pages(dev, s->dma_address, s->dma_length,
+					     dir, NULL);
+		s->dma_address = 0;
+		s->dma_length = 0;
+	}
+	mapped_elements = 0;
+	goto out;
+}
+
+static void s390_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
+			      int nr_elements, enum dma_data_direction dir,
+			      struct dma_attrs *attrs)
+{
+	struct scatterlist *s;
+	int i;
+
+	for_each_sg(sg, s, nr_elements, i) {
+		s390_dma_unmap_pages(dev, s->dma_address, s->dma_length, dir, NULL);
+		s->dma_address = 0;
+		s->dma_length = 0;
+	}
+}
+
+int zpci_dma_init_device(struct zpci_dev *zdev)
+{
+	unsigned int bitmap_order;
+	int rc;
+
+	spin_lock_init(&zdev->iommu_bitmap_lock);
+	spin_lock_init(&zdev->dma_table_lock);
+
+	zdev->dma_table = dma_alloc_cpu_table();
+	if (!zdev->dma_table) {
+		rc = -ENOMEM;
+		goto out_clean;
+	}
+
+	zdev->iommu_size = (unsigned long) high_memory - PAGE_OFFSET;
+	zdev->iommu_pages = zdev->iommu_size >> PAGE_SHIFT;
+	bitmap_order = get_order(zdev->iommu_pages / 8);
+	pr_info("iommu_size: 0x%lx  iommu_pages: 0x%lx  bitmap_order: %i\n",
+		 zdev->iommu_size, zdev->iommu_pages, bitmap_order);
+
+	zdev->iommu_bitmap = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
+						       bitmap_order);
+	if (!zdev->iommu_bitmap) {
+		rc = -ENOMEM;
+		goto out_reg;
+	}
+
+	rc = zpci_register_ioat(zdev,
+				0,
+				zdev->start_dma + PAGE_OFFSET,
+				zdev->start_dma + zdev->iommu_size - 1,
+				(u64) zdev->dma_table);
+	if (rc)
+		goto out_reg;
+	return 0;
+
+out_reg:
+	dma_free_cpu_table(zdev->dma_table);
+out_clean:
+	return rc;
+}
+
+void zpci_dma_exit_device(struct zpci_dev *zdev)
+{
+	zpci_unregister_ioat(zdev, 0);
+	dma_cleanup_tables(zdev);
+	free_pages((unsigned long) zdev->iommu_bitmap,
+		   get_order(zdev->iommu_pages / 8));
+	zdev->iommu_bitmap = NULL;
+	zdev->next_bit = 0;
+}
+
+static int __init dma_alloc_cpu_table_caches(void)
+{
+	dma_region_table_cache = kmem_cache_create("PCI_DMA_region_tables",
+					ZPCI_TABLE_SIZE, ZPCI_TABLE_ALIGN,
+					0, NULL);
+	if (!dma_region_table_cache)
+		return -ENOMEM;
+
+	dma_page_table_cache = kmem_cache_create("PCI_DMA_page_tables",
+					ZPCI_PT_SIZE, ZPCI_PT_ALIGN,
+					0, NULL);
+	if (!dma_page_table_cache) {
+		kmem_cache_destroy(dma_region_table_cache);
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+int __init zpci_dma_init(void)
+{
+	return dma_alloc_cpu_table_caches();
+}
+
+void zpci_dma_exit(void)
+{
+	kmem_cache_destroy(dma_page_table_cache);
+	kmem_cache_destroy(dma_region_table_cache);
+}
+
+#define PREALLOC_DMA_DEBUG_ENTRIES	(1 << 16)
+
+static int __init dma_debug_do_init(void)
+{
+	dma_debug_init(PREALLOC_DMA_DEBUG_ENTRIES);
+	return 0;
+}
+fs_initcall(dma_debug_do_init);
+
+struct dma_map_ops s390_dma_ops = {
+	.alloc		= s390_dma_alloc,
+	.free		= s390_dma_free,
+	.map_sg		= s390_dma_map_sg,
+	.unmap_sg	= s390_dma_unmap_sg,
+	.map_page	= s390_dma_map_pages,
+	.unmap_page	= s390_dma_unmap_pages,
+	/* if we support direct DMA this must be conditional */
+	.is_phys	= 0,
+	/* dma_supported is unconditionally true without a callback */
+};
+EXPORT_SYMBOL_GPL(s390_dma_ops);
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 06/10] s390/pci: CHSC PCI support for error and availability events
  2012-11-14  9:41 [RFC PATCH 00/10] s390/pci: PCI support on System z Jan Glauber
                   ` (4 preceding siblings ...)
  2012-11-14  9:41 ` [RFC PATCH 05/10] s390/pci: DMA support Jan Glauber
@ 2012-11-14  9:41 ` Jan Glauber
  2012-11-14  9:41 ` [RFC PATCH 07/10] s390/pci: PCI hotplug support via SCLP Jan Glauber
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Jan Glauber @ 2012-11-14  9:41 UTC (permalink / raw)
  To: linux-pci; +Cc: linux-kernel, heiko.carstens, schwidefsky, Jan Glauber

Add CHSC store-event-information support for PCI (notfication type 2)
and report error and availability events to the PCI architecture layer.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
---
 arch/s390/include/asm/pci.h |   4 ++
 arch/s390/pci/Makefile      |   3 +-
 arch/s390/pci/pci_event.c   |  93 ++++++++++++++++++++++++++
 drivers/s390/cio/chsc.c     | 156 +++++++++++++++++++++++++++++++-------------
 4 files changed, 211 insertions(+), 45 deletions(-)
 create mode 100644 arch/s390/pci/pci_event.c

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index ca48caf..3dc2cb6 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -127,6 +127,10 @@ void zpci_teardown_msi_irq(struct zpci_dev *, struct msi_desc *);
 int zpci_msihash_init(void);
 void zpci_msihash_exit(void);
 
+/* Error handling and recovery */
+void zpci_event_error(void *);
+void zpci_event_availability(void *);
+
 /* Helpers */
 struct zpci_dev *get_zdev(struct pci_dev *);
 struct zpci_dev *get_zdev_by_fid(u32);
diff --git a/arch/s390/pci/Makefile b/arch/s390/pci/Makefile
index 4590596..7e36f42 100644
--- a/arch/s390/pci/Makefile
+++ b/arch/s390/pci/Makefile
@@ -2,4 +2,5 @@
 # Makefile for the s390 PCI subsystem.
 #
 
-obj-$(CONFIG_PCI)	+= pci.o pci_dma.o pci_clp.o pci_msi.o
+obj-$(CONFIG_PCI)	+= pci.o pci_dma.o pci_clp.o pci_msi.o \
+			   pci_event.o
diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c
new file mode 100644
index 0000000..dbed8cd
--- /dev/null
+++ b/arch/s390/pci/pci_event.c
@@ -0,0 +1,93 @@
+/*
+ *  Copyright IBM Corp. 2012
+ *
+ *  Author(s):
+ *    Jan Glauber <jang@linux.vnet.ibm.com>
+ */
+
+#define COMPONENT "zPCI"
+#define pr_fmt(fmt) COMPONENT ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/pci.h>
+
+/* Content Code Description for PCI Function Error */
+struct zpci_ccdf_err {
+	u32 reserved1;
+	u32 fh;				/* function handle */
+	u32 fid;			/* function id */
+	u32 ett		:  4;		/* expected table type */
+	u32 mvn		: 12;		/* MSI vector number */
+	u32 dmaas	:  8;		/* DMA address space */
+	u32		:  6;
+	u32 q		:  1;		/* event qualifier */
+	u32 rw		:  1;		/* read/write */
+	u64 faddr;			/* failing address */
+	u32 reserved3;
+	u16 reserved4;
+	u16 pec;			/* PCI event code */
+} __packed;
+
+/* Content Code Description for PCI Function Availability */
+struct zpci_ccdf_avail {
+	u32 reserved1;
+	u32 fh;				/* function handle */
+	u32 fid;			/* function id */
+	u32 reserved2;
+	u32 reserved3;
+	u32 reserved4;
+	u32 reserved5;
+	u16 reserved6;
+	u16 pec;			/* PCI event code */
+} __packed;
+
+static void zpci_event_log_err(struct zpci_ccdf_err *ccdf)
+{
+	struct zpci_dev *zdev = get_zdev_by_fid(ccdf->fid);
+
+	dev_err(&zdev->pdev->dev, "event code: 0x%x\n", ccdf->pec);
+}
+
+static void zpci_event_log_avail(struct zpci_ccdf_avail *ccdf)
+{
+	struct zpci_dev *zdev = get_zdev_by_fid(ccdf->fid);
+
+	pr_err("%s%s: availability event: fh: 0x%x  fid: 0x%x  event code: 0x%x  reason:",
+		(zdev) ? dev_driver_string(&zdev->pdev->dev) : "?",
+		(zdev) ? dev_name(&zdev->pdev->dev) : "?",
+		ccdf->fh, ccdf->fid, ccdf->pec);
+	print_hex_dump(KERN_CONT, "ccdf", DUMP_PREFIX_OFFSET,
+		       16, 1, ccdf, sizeof(*ccdf), false);
+
+	switch (ccdf->pec) {
+	case 0x0301:
+		zpci_enable_device(zdev);
+		break;
+	case 0x0302:
+		clp_add_pci_device(ccdf->fid, ccdf->fh, 0);
+		break;
+	case 0x0306:
+		clp_find_pci_devices();
+		break;
+	default:
+		break;
+	}
+}
+
+void zpci_event_error(void *data)
+{
+	struct zpci_ccdf_err *ccdf = data;
+	struct zpci_dev *zdev;
+
+	zpci_event_log_err(ccdf);
+	zdev = get_zdev_by_fid(ccdf->fid);
+	if (!zdev) {
+		pr_err("Error event for unknown fid: %x", ccdf->fid);
+		return;
+	}
+}
+
+void zpci_event_availability(void *data)
+{
+	zpci_event_log_avail(data);
+}
diff --git a/drivers/s390/cio/chsc.c b/drivers/s390/cio/chsc.c
index 4d51a7c..68e80e2 100644
--- a/drivers/s390/cio/chsc.c
+++ b/drivers/s390/cio/chsc.c
@@ -1,7 +1,7 @@
 /*
  *   S/390 common I/O routines -- channel subsystem call
  *
- *    Copyright IBM Corp. 1999, 2010
+ *    Copyright IBM Corp. 1999,2012
  *    Author(s): Ingo Adlung (adlung@de.ibm.com)
  *		 Cornelia Huck (cornelia.huck@de.ibm.com)
  *		 Arnd Bergmann (arndb@de.ibm.com)
@@ -14,6 +14,7 @@
 #include <linux/slab.h>
 #include <linux/init.h>
 #include <linux/device.h>
+#include <linux/pci.h>
 
 #include <asm/cio.h>
 #include <asm/chpid.h>
@@ -260,26 +261,45 @@ __get_chpid_from_lir(void *data)
 	return (u16) (lir->indesc[0]&0x000000ff);
 }
 
-struct chsc_sei_area {
-	struct chsc_header request;
+struct chsc_sei_nt0_area {
+	u8  flags;
+	u8  vf;				/* validity flags */
+	u8  rs;				/* reporting source */
+	u8  cc;				/* content code */
+	u16 fla;			/* full link address */
+	u16 rsid;			/* reporting source id */
 	u32 reserved1;
 	u32 reserved2;
-	u32 reserved3;
-	struct chsc_header response;
-	u32 reserved4;
-	u8  flags;
-	u8  vf;		/* validity flags */
-	u8  rs;		/* reporting source */
-	u8  cc;		/* content code */
-	u16 fla;	/* full link address */
-	u16 rsid;	/* reporting source id */
-	u32 reserved5;
-	u32 reserved6;
-	u8 ccdf[4096 - 16 - 24];	/* content-code dependent field */
 	/* ccdf has to be big enough for a link-incident record */
-} __attribute__ ((packed));
-
-static void chsc_process_sei_link_incident(struct chsc_sei_area *sei_area)
+	u8  ccdf[PAGE_SIZE - 24 - 16];	/* content-code dependent field */
+} __packed;
+
+struct chsc_sei_nt2_area {
+	u8  flags;			/* p and v bit */
+	u8  reserved1;
+	u8  reserved2;
+	u8  cc;				/* content code */
+	u32 reserved3[13];
+	u8  ccdf[PAGE_SIZE - 24 - 56];	/* content-code dependent field */
+} __packed;
+
+#define CHSC_SEI_NT0	0ULL
+#define CHSC_SEI_NT2	(1ULL << 61)
+
+struct chsc_sei {
+	struct chsc_header request;
+	u32 reserved1;
+	u64 ntsm;			/* notification type mask */
+	struct chsc_header response;
+	u32 reserved2;
+	union {
+		struct chsc_sei_nt0_area nt0_area;
+		struct chsc_sei_nt2_area nt2_area;
+		u8 nt_area[PAGE_SIZE - 24];
+	} u;
+} __packed;
+
+static void chsc_process_sei_link_incident(struct chsc_sei_nt0_area *sei_area)
 {
 	struct chp_id chpid;
 	int id;
@@ -298,7 +318,7 @@ static void chsc_process_sei_link_incident(struct chsc_sei_area *sei_area)
 	}
 }
 
-static void chsc_process_sei_res_acc(struct chsc_sei_area *sei_area)
+static void chsc_process_sei_res_acc(struct chsc_sei_nt0_area *sei_area)
 {
 	struct chp_link link;
 	struct chp_id chpid;
@@ -330,7 +350,7 @@ static void chsc_process_sei_res_acc(struct chsc_sei_area *sei_area)
 	s390_process_res_acc(&link);
 }
 
-static void chsc_process_sei_chp_avail(struct chsc_sei_area *sei_area)
+static void chsc_process_sei_chp_avail(struct chsc_sei_nt0_area *sei_area)
 {
 	struct channel_path *chp;
 	struct chp_id chpid;
@@ -366,7 +386,7 @@ struct chp_config_data {
 	u8 pc;
 };
 
-static void chsc_process_sei_chp_config(struct chsc_sei_area *sei_area)
+static void chsc_process_sei_chp_config(struct chsc_sei_nt0_area *sei_area)
 {
 	struct chp_config_data *data;
 	struct chp_id chpid;
@@ -398,7 +418,7 @@ static void chsc_process_sei_chp_config(struct chsc_sei_area *sei_area)
 	}
 }
 
-static void chsc_process_sei_scm_change(struct chsc_sei_area *sei_area)
+static void chsc_process_sei_scm_change(struct chsc_sei_nt0_area *sei_area)
 {
 	int ret;
 
@@ -412,13 +432,26 @@ static void chsc_process_sei_scm_change(struct chsc_sei_area *sei_area)
 			      " failed (rc=%d).\n", ret);
 }
 
-static void chsc_process_sei(struct chsc_sei_area *sei_area)
+static void chsc_process_sei_nt2(struct chsc_sei_nt2_area *sei_area)
 {
-	/* Check if we might have lost some information. */
-	if (sei_area->flags & 0x40) {
-		CIO_CRW_EVENT(2, "chsc: event overflow\n");
-		css_schedule_eval_all();
+#ifdef CONFIG_PCI
+	switch (sei_area->cc) {
+	case 1:
+		zpci_event_error(sei_area->ccdf);
+		break;
+	case 2:
+		zpci_event_availability(sei_area->ccdf);
+		break;
+	default:
+		CIO_CRW_EVENT(2, "chsc: unhandled sei content code %d\n",
+			      sei_area->cc);
+		break;
 	}
+#endif
+}
+
+static void chsc_process_sei_nt0(struct chsc_sei_nt0_area *sei_area)
+{
 	/* which kind of information was stored? */
 	switch (sei_area->cc) {
 	case 1: /* link incident*/
@@ -443,9 +476,51 @@ static void chsc_process_sei(struct chsc_sei_area *sei_area)
 	}
 }
 
+static int __chsc_process_crw(struct chsc_sei *sei, u64 ntsm)
+{
+	do {
+		memset(sei, 0, sizeof(*sei));
+		sei->request.length = 0x0010;
+		sei->request.code = 0x000e;
+		sei->ntsm = ntsm;
+
+		if (chsc(sei))
+			break;
+
+		if (sei->response.code == 0x0001) {
+			CIO_CRW_EVENT(2, "chsc: sei successful\n");
+
+			/* Check if we might have lost some information. */
+			if (sei->u.nt0_area.flags & 0x40) {
+				CIO_CRW_EVENT(2, "chsc: event overflow\n");
+				css_schedule_eval_all();
+			}
+
+			switch (sei->ntsm) {
+			case CHSC_SEI_NT0:
+				chsc_process_sei_nt0(&sei->u.nt0_area);
+				return 1;
+			case CHSC_SEI_NT2:
+				chsc_process_sei_nt2(&sei->u.nt2_area);
+				return 1;
+			default:
+				CIO_CRW_EVENT(2, "chsc: unhandled nt (nt=%08Lx)\n",
+					      sei->ntsm);
+				return 0;
+			}
+		} else {
+			CIO_CRW_EVENT(2, "chsc: sei failed (rc=%04x)\n",
+				      sei->response.code);
+			break;
+		}
+	} while (sei->u.nt0_area.flags & 0x80);
+
+	return 0;
+}
+
 static void chsc_process_crw(struct crw *crw0, struct crw *crw1, int overflow)
 {
-	struct chsc_sei_area *sei_area;
+	struct chsc_sei *sei;
 
 	if (overflow) {
 		css_schedule_eval_all();
@@ -459,25 +534,18 @@ static void chsc_process_crw(struct crw *crw0, struct crw *crw1, int overflow)
 		return;
 	/* Access to sei_page is serialized through machine check handler
 	 * thread, so no need for locking. */
-	sei_area = sei_page;
+	sei = sei_page;
 
 	CIO_TRACE_EVENT(2, "prcss");
-	do {
-		memset(sei_area, 0, sizeof(*sei_area));
-		sei_area->request.length = 0x0010;
-		sei_area->request.code = 0x000e;
-		if (chsc(sei_area))
-			break;
 
-		if (sei_area->response.code == 0x0001) {
-			CIO_CRW_EVENT(4, "chsc: sei successful\n");
-			chsc_process_sei(sei_area);
-		} else {
-			CIO_CRW_EVENT(2, "chsc: sei failed (rc=%04x)\n",
-				      sei_area->response.code);
-			break;
-		}
-	} while (sei_area->flags & 0x80);
+	/*
+	 * The ntsm does not allow to select NT0 and NT2 together. We need to
+	 * first check for NT2, than additionally for NT0...
+	 */
+#ifdef CONFIG_PCI
+	if (!__chsc_process_crw(sei, CHSC_SEI_NT2))
+#endif
+		__chsc_process_crw(sei, CHSC_SEI_NT0);
 }
 
 void chsc_chp_online(struct chp_id chpid)
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 07/10] s390/pci: PCI hotplug support via SCLP
  2012-11-14  9:41 [RFC PATCH 00/10] s390/pci: PCI support on System z Jan Glauber
                   ` (5 preceding siblings ...)
  2012-11-14  9:41 ` [RFC PATCH 06/10] s390/pci: CHSC PCI support for error and availability events Jan Glauber
@ 2012-11-14  9:41 ` Jan Glauber
  2012-11-14  9:41 ` [RFC PATCH 08/10] s390/pci: s390 specific PCI sysfs attributes Jan Glauber
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Jan Glauber @ 2012-11-14  9:41 UTC (permalink / raw)
  To: linux-pci; +Cc: linux-kernel, heiko.carstens, schwidefsky, Jan Glauber

Add SCLP PCI configure/deconfigure and implement a PCI hotplug
controller (s390_pci_hpc). The hotplug controller creates a slot
for every PCI function in stand-by or configured state. The PCI
functions are named after the PCI function ID (fid). By writing to
the power attribute in /sys/bus/pci/slots/<fid>/power the PCI function
is moved to stand-by or configured state. If moved to the configured
state the device is automatically scanned by the s390 PCI layer.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
---
 arch/s390/include/asm/pci.h        |  11 ++
 arch/s390/include/asm/sclp.h       |   2 +
 arch/s390/pci/pci.c                |   9 ++
 drivers/pci/hotplug/Kconfig        |  11 ++
 drivers/pci/hotplug/Makefile       |   1 +
 drivers/pci/hotplug/s390_pci_hpc.c | 252 +++++++++++++++++++++++++++++++++++++
 drivers/s390/char/sclp.h           |   3 +-
 drivers/s390/char/sclp_cmd.c       |  64 +++++++++-
 8 files changed, 351 insertions(+), 2 deletions(-)
 create mode 100644 drivers/pci/hotplug/s390_pci_hpc.c

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 3dc2cb6..739d9f1 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -95,6 +95,11 @@ struct zpci_dev {
 	enum pci_bus_speed max_bus_speed;
 };
 
+struct pci_hp_callback_ops {
+	int (*create_slot)	(struct zpci_dev *zdev);
+	void (*remove_slot)	(struct zpci_dev *zdev);
+};
+
 static inline bool zdev_enabled(struct zpci_dev *zdev)
 {
 	return (zdev->fh & (1UL << 31)) ? true : false;
@@ -140,4 +145,10 @@ bool zpci_fid_present(u32);
 int zpci_dma_init(void);
 void zpci_dma_exit(void);
 
+/* Hotplug */
+extern struct mutex zpci_list_lock;
+extern struct list_head zpci_list;
+extern struct pci_hp_callback_ops hotplug_ops;
+extern unsigned int pci_probe;
+
 #endif
diff --git a/arch/s390/include/asm/sclp.h b/arch/s390/include/asm/sclp.h
index e62a555..8337886 100644
--- a/arch/s390/include/asm/sclp.h
+++ b/arch/s390/include/asm/sclp.h
@@ -55,5 +55,7 @@ int sclp_chp_read_info(struct sclp_chp_info *info);
 void sclp_get_ipl_info(struct sclp_ipl_info *info);
 bool sclp_has_linemode(void);
 bool sclp_has_vt220(void);
+int sclp_pci_configure(u32 fid);
+int sclp_pci_deconfigure(u32 fid);
 
 #endif /* _ASM_S390_SCLP_H */
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 5a2ef9e..c523594 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -47,7 +47,12 @@
 
 /* list of all detected zpci devices */
 LIST_HEAD(zpci_list);
+EXPORT_SYMBOL_GPL(zpci_list);
 DEFINE_MUTEX(zpci_list_lock);
+EXPORT_SYMBOL_GPL(zpci_list_lock);
+
+struct pci_hp_callback_ops hotplug_ops;
+EXPORT_SYMBOL_GPL(hotplug_ops);
 
 static DECLARE_BITMAP(zpci_domain, ZPCI_NR_DEVICES);
 static DEFINE_SPINLOCK(zpci_domain_lock);
@@ -935,6 +940,8 @@ int zpci_create_device(struct zpci_dev *zdev)
 
 	mutex_lock(&zpci_list_lock);
 	list_add_tail(&zdev->entry, &zpci_list);
+	if (hotplug_ops.create_slot)
+		hotplug_ops.create_slot(zdev);
 	mutex_unlock(&zpci_list_lock);
 
 	if (zdev->state == ZPCI_FN_STATE_STANDBY)
@@ -948,6 +955,8 @@ int zpci_create_device(struct zpci_dev *zdev)
 out_start:
 	mutex_lock(&zpci_list_lock);
 	list_del(&zdev->entry);
+	if (hotplug_ops.remove_slot)
+		hotplug_ops.remove_slot(zdev);
 	mutex_unlock(&zpci_list_lock);
 out_bus:
 	zpci_free_domain(zdev);
diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
index b0e46de..0594c4a 100644
--- a/drivers/pci/hotplug/Kconfig
+++ b/drivers/pci/hotplug/Kconfig
@@ -151,4 +151,15 @@ config HOTPLUG_PCI_SGI
 
 	  When in doubt, say N.
 
+config HOTPLUG_PCI_S390
+	tristate "System z PCI Hotplug Support"
+	depends on S390 && 64BIT
+	help
+	  Say Y here if you want to use the System z PCI Hotplug
+	  driver for PCI devices. Without this driver it is not
+          possible to access stand-by PCI functions nor to deconfigure
+          PCI functions.
+
+	  When in doubt, say Y.
+
 endif # HOTPLUG_PCI
diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
index c459cd4..47ec8c8 100644
--- a/drivers/pci/hotplug/Makefile
+++ b/drivers/pci/hotplug/Makefile
@@ -18,6 +18,7 @@ obj-$(CONFIG_HOTPLUG_PCI_RPA)		+= rpaphp.o
 obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)	+= rpadlpar_io.o
 obj-$(CONFIG_HOTPLUG_PCI_SGI)		+= sgi_hotplug.o
 obj-$(CONFIG_HOTPLUG_PCI_ACPI)		+= acpiphp.o
+obj-$(CONFIG_HOTPLUG_PCI_S390)		+= s390_pci_hpc.o
 
 # acpiphp_ibm extends acpiphp, so should be linked afterwards.
 
diff --git a/drivers/pci/hotplug/s390_pci_hpc.c b/drivers/pci/hotplug/s390_pci_hpc.c
new file mode 100644
index 0000000..d4de895
--- /dev/null
+++ b/drivers/pci/hotplug/s390_pci_hpc.c
@@ -0,0 +1,252 @@
+/*
+ * PCI Hot Plug Controller Driver for System z
+ *
+ * Copyright 2012 IBM Corp.
+ *
+ * Author(s):
+ *   Jan Glauber <jang@linux.vnet.ibm.com>
+ */
+
+#define COMPONENT "zPCI hpc"
+#define pr_fmt(fmt) COMPONENT ": " fmt
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/pci.h>
+#include <linux/pci_hotplug.h>
+#include <linux/init.h>
+#include <asm/sclp.h>
+
+#define SLOT_NAME_SIZE	10
+static LIST_HEAD(s390_hotplug_slot_list);
+
+MODULE_AUTHOR("Jan Glauber <jang@linux.vnet.ibm.com");
+MODULE_DESCRIPTION("Hot Plug PCI Controller for System z");
+MODULE_LICENSE("GPL");
+
+static int zpci_fn_configured(enum zpci_state state)
+{
+	return state == ZPCI_FN_STATE_CONFIGURED ||
+	       state == ZPCI_FN_STATE_ONLINE;
+}
+
+/*
+ * struct slot - slot information for each *physical* slot
+ */
+struct slot {
+	struct list_head slot_list;
+	struct hotplug_slot *hotplug_slot;
+	struct zpci_dev *zdev;
+};
+
+static int enable_slot(struct hotplug_slot *hotplug_slot)
+{
+	struct slot *slot = hotplug_slot->private;
+	int rc;
+
+	if (slot->zdev->state != ZPCI_FN_STATE_STANDBY)
+		return -EIO;
+
+	rc = sclp_pci_configure(slot->zdev->fid);
+	if (!rc) {
+		slot->zdev->state = ZPCI_FN_STATE_CONFIGURED;
+		/* automatically scan the device after is was configured */
+		zpci_enable_device(slot->zdev);
+		zpci_scan_device(slot->zdev);
+	}
+	return rc;
+}
+
+static int disable_slot(struct hotplug_slot *hotplug_slot)
+{
+	struct slot *slot = hotplug_slot->private;
+	int rc;
+
+	if (!zpci_fn_configured(slot->zdev->state))
+		return -EIO;
+
+	/* TODO: we rely on the user to unbind/remove the device, is that plausible
+	 *       or do we need to trigger that here?
+	 */
+	rc = sclp_pci_deconfigure(slot->zdev->fid);
+	if (!rc) {
+		/* Fixme: better call List-PCI to find the disabled FH
+		   for the FID since the FH should be opaque... */
+		slot->zdev->fh &= 0x7fffffff;
+		slot->zdev->state = ZPCI_FN_STATE_STANDBY;
+	}
+	return rc;
+}
+
+static int get_power_status(struct hotplug_slot *hotplug_slot, u8 *value)
+{
+	struct slot *slot = hotplug_slot->private;
+
+	switch (slot->zdev->state) {
+	case ZPCI_FN_STATE_STANDBY:
+		*value = 0;
+		break;
+	default:
+		*value = 1;
+		break;
+	}
+	return 0;
+}
+
+static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value)
+{
+	/* if the slot exits it always contains a function */
+	*value = 1;
+	return 0;
+}
+
+static void release_slot(struct hotplug_slot *hotplug_slot)
+{
+	struct slot *slot = hotplug_slot->private;
+
+	pr_debug("%s - physical_slot = %s\n", __func__, hotplug_slot_name(hotplug_slot));
+	kfree(slot->hotplug_slot->info);
+	kfree(slot->hotplug_slot);
+	kfree(slot);
+}
+
+static struct hotplug_slot_ops s390_hotplug_slot_ops = {
+	.enable_slot =		enable_slot,
+	.disable_slot =		disable_slot,
+	.get_power_status =	get_power_status,
+	.get_adapter_status =	get_adapter_status,
+};
+
+static int init_pci_slot(struct zpci_dev *zdev)
+{
+	struct hotplug_slot *hotplug_slot;
+	struct hotplug_slot_info *info;
+	char name[SLOT_NAME_SIZE];
+	struct slot *slot;
+	int rc;
+
+	if (!zdev)
+		return 0;
+
+	slot = kzalloc(sizeof(*slot), GFP_KERNEL);
+	if (!slot)
+		goto error;
+
+	hotplug_slot = kzalloc(sizeof(*hotplug_slot), GFP_KERNEL);
+	if (!hotplug_slot)
+		goto error_hp;
+	hotplug_slot->private = slot;
+
+	slot->hotplug_slot = hotplug_slot;
+	slot->zdev = zdev;
+
+	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	if (!info)
+		goto error_info;
+	hotplug_slot->info = info;
+
+	hotplug_slot->ops = &s390_hotplug_slot_ops;
+	hotplug_slot->release = &release_slot;
+
+	get_power_status(hotplug_slot, &info->power_status);
+	get_adapter_status(hotplug_slot, &info->adapter_status);
+
+	snprintf(name, SLOT_NAME_SIZE, "%08x", zdev->fid);
+	rc = pci_hp_register(slot->hotplug_slot, zdev->bus,
+			     ZPCI_DEVFN, name);
+	if (rc) {
+		pr_err("pci_hp_register failed with error %d\n", rc);
+		goto error_reg;
+	}
+	list_add(&slot->slot_list, &s390_hotplug_slot_list);
+	return 0;
+
+error_reg:
+	kfree(info);
+error_info:
+	kfree(hotplug_slot);
+error_hp:
+	kfree(slot);
+error:
+	return -ENOMEM;
+}
+
+static int __init init_pci_slots(void)
+{
+	struct zpci_dev *zdev;
+	int device = 0;
+
+	/*
+	 * Create a structure for each slot, and register that slot
+	 * with the pci_hotplug subsystem.
+	 */
+	mutex_lock(&zpci_list_lock);
+	list_for_each_entry(zdev, &zpci_list, entry) {
+		init_pci_slot(zdev);
+		device++;
+	}
+
+	mutex_unlock(&zpci_list_lock);
+	return (device) ? 0 : -ENODEV;
+}
+
+static void exit_pci_slot(struct zpci_dev *zdev)
+{
+	struct list_head *tmp, *n;
+	struct slot *slot;
+
+	list_for_each_safe(tmp, n, &s390_hotplug_slot_list) {
+		slot = list_entry(tmp, struct slot, slot_list);
+		if (slot->zdev != zdev)
+			continue;
+		list_del(&slot->slot_list);
+		pci_hp_deregister(slot->hotplug_slot);
+	}
+}
+
+static void __exit exit_pci_slots(void)
+{
+	struct list_head *tmp, *n;
+	struct slot *slot;
+
+	/*
+	 * Unregister all of our slots with the pci_hotplug subsystem.
+	 * Memory will be freed in release_slot() callback after slot's
+	 * lifespan is finished.
+	 */
+	list_for_each_safe(tmp, n, &s390_hotplug_slot_list) {
+		slot = list_entry(tmp, struct slot, slot_list);
+		list_del(&slot->slot_list);
+		pci_hp_deregister(slot->hotplug_slot);
+	}
+}
+
+static int __init pci_hotplug_s390_init(void)
+{
+	/*
+	 * Do specific initialization stuff for your driver here
+	 * like initializing your controller hardware (if any) and
+	 * determining the number of slots you have in the system
+	 * right now.
+	 */
+
+	if (!pci_probe)
+		return -EOPNOTSUPP;
+
+	/* register callbacks for slot handling from arch code */
+	mutex_lock(&zpci_list_lock);
+	hotplug_ops.create_slot = init_pci_slot;
+	hotplug_ops.remove_slot = exit_pci_slot;
+	mutex_unlock(&zpci_list_lock);
+	pr_info("registered hotplug slot callbacks\n");
+	return init_pci_slots();
+}
+
+static void __exit pci_hotplug_s390_exit(void)
+{
+	exit_pci_slots();
+}
+
+module_init(pci_hotplug_s390_init);
+module_exit(pci_hotplug_s390_exit);
diff --git a/drivers/s390/char/sclp.h b/drivers/s390/char/sclp.h
index d7e97ae..25bcd4c 100644
--- a/drivers/s390/char/sclp.h
+++ b/drivers/s390/char/sclp.h
@@ -1,5 +1,5 @@
 /*
- * Copyright IBM Corp. 1999, 2009
+ * Copyright IBM Corp. 1999,2012
  *
  * Author(s): Martin Peschke <mpeschke@de.ibm.com>
  *	      Martin Schwidefsky <schwidefsky@de.ibm.com>
@@ -103,6 +103,7 @@ extern u64 sclp_facilities;
 #define SCLP_HAS_CHP_RECONFIG	(sclp_facilities & 0x2000000000000000ULL)
 #define SCLP_HAS_CPU_INFO	(sclp_facilities & 0x0800000000000000ULL)
 #define SCLP_HAS_CPU_RECONFIG	(sclp_facilities & 0x0400000000000000ULL)
+#define SCLP_HAS_PCI_RECONFIG	(sclp_facilities & 0x0000000040000000ULL)
 
 
 struct gds_subvector {
diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
index 71ea923c..f308bac 100644
--- a/drivers/s390/char/sclp_cmd.c
+++ b/drivers/s390/char/sclp_cmd.c
@@ -1,5 +1,5 @@
 /*
- * Copyright IBM Corp. 2007, 2009
+ * Copyright IBM Corp. 2007,2012
  *
  * Author(s): Heiko Carstens <heiko.carstens@de.ibm.com>,
  *	      Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
@@ -12,6 +12,7 @@
 #include <linux/init.h>
 #include <linux/errno.h>
 #include <linux/err.h>
+#include <linux/export.h>
 #include <linux/slab.h>
 #include <linux/string.h>
 #include <linux/mm.h>
@@ -702,6 +703,67 @@ __initcall(sclp_detect_standby_memory);
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 /*
+ * PCI I/O adapter configuration related functions.
+ */
+#define SCLP_CMDW_CONFIGURE_PCI			0x001a0001
+#define SCLP_CMDW_DECONFIGURE_PCI		0x001b0001
+
+#define SCLP_RECONFIG_PCI_ATPYE			2
+
+struct pci_cfg_sccb {
+	struct sccb_header header;
+	u8 atype;		/* adapter type */
+	u8 reserved1;
+	u16 reserved2;
+	u32 aid;		/* adapter identifier */
+} __packed;
+
+static int do_pci_configure(sclp_cmdw_t cmd, u32 fid)
+{
+	struct pci_cfg_sccb *sccb;
+	int rc;
+
+	if (!SCLP_HAS_PCI_RECONFIG)
+		return -EOPNOTSUPP;
+
+	sccb = (struct pci_cfg_sccb *) get_zeroed_page(GFP_KERNEL | GFP_DMA);
+	if (!sccb)
+		return -ENOMEM;
+
+	sccb->header.length = PAGE_SIZE;
+	sccb->atype = SCLP_RECONFIG_PCI_ATPYE;
+	sccb->aid = fid;
+	rc = do_sync_request(cmd, sccb);
+	if (rc)
+		goto out;
+	switch (sccb->header.response_code) {
+	case 0x0020:
+	case 0x0120:
+		break;
+	default:
+		pr_warn("configure PCI I/O adapter failed: cmd=0x%08x  response=0x%04x\n",
+			cmd, sccb->header.response_code);
+		rc = -EIO;
+		break;
+	}
+out:
+	free_page((unsigned long) sccb);
+	return rc;
+}
+
+int sclp_pci_configure(u32 fid)
+{
+	return do_pci_configure(SCLP_CMDW_CONFIGURE_PCI, fid);
+}
+EXPORT_SYMBOL(sclp_pci_configure);
+
+int sclp_pci_deconfigure(u32 fid)
+{
+	return do_pci_configure(SCLP_CMDW_DECONFIGURE_PCI, fid);
+}
+EXPORT_SYMBOL(sclp_pci_deconfigure);
+
+/*
  * Channel path configuration related functions.
  */
 
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 08/10] s390/pci: s390 specific PCI sysfs attributes
  2012-11-14  9:41 [RFC PATCH 00/10] s390/pci: PCI support on System z Jan Glauber
                   ` (6 preceding siblings ...)
  2012-11-14  9:41 ` [RFC PATCH 07/10] s390/pci: PCI hotplug support via SCLP Jan Glauber
@ 2012-11-14  9:41 ` Jan Glauber
  2012-11-14  9:41 ` [RFC PATCH 09/10] s390/pci: add PCI Kconfig options Jan Glauber
  2012-11-14  9:41 ` [RFC PATCH 10/10] vga: compile fix, disable vga for s390 Jan Glauber
  9 siblings, 0 replies; 15+ messages in thread
From: Jan Glauber @ 2012-11-14  9:41 UTC (permalink / raw)
  To: linux-pci; +Cc: linux-kernel, heiko.carstens, schwidefsky, Jan Glauber

Add some s390 specific sysfs attributes to the PCI device directory.
The following attributes are introduced:
- function_id (PCI function ID)
- function_handle (PCI function handle)
- pchid (PCI channel ID)
- pfgid (PCI function group ID aka PCI root complex)

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
---
 arch/s390/include/asm/pci.h |  4 +++
 arch/s390/pci/Makefile      |  2 +-
 arch/s390/pci/pci.c         |  6 ++++
 arch/s390/pci/pci_sysfs.c   | 86 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 97 insertions(+), 1 deletion(-)
 create mode 100644 arch/s390/pci/pci_sysfs.c

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 739d9f1..39057e4 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -141,6 +141,10 @@ struct zpci_dev *get_zdev(struct pci_dev *);
 struct zpci_dev *get_zdev_by_fid(u32);
 bool zpci_fid_present(u32);
 
+/* sysfs */
+int zpci_sysfs_add_device(struct device *);
+void zpci_sysfs_remove_device(struct device *);
+
 /* DMA */
 int zpci_dma_init(void);
 void zpci_dma_exit(void);
diff --git a/arch/s390/pci/Makefile b/arch/s390/pci/Makefile
index 7e36f42..ab0827b 100644
--- a/arch/s390/pci/Makefile
+++ b/arch/s390/pci/Makefile
@@ -3,4 +3,4 @@
 #
 
 obj-$(CONFIG_PCI)	+= pci.o pci_dma.o pci_clp.o pci_msi.o \
-			   pci_event.o
+			   pci_sysfs.o pci_event.o
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index c523594..0723b10 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -628,6 +628,7 @@ static void zpci_remove_device(struct pci_dev *pdev)
 	dev_info(&pdev->dev, "Removing device %u\n", zdev->domain);
 	zdev->state = ZPCI_FN_STATE_CONFIGURED;
 	zpci_dma_exit_device(zdev);
+	zpci_sysfs_remove_device(&pdev->dev);
 	zpci_unmap_resources(pdev);
 	list_del(&zdev->entry);		/* can be called from init */
 	zdev->pdev = NULL;
@@ -676,6 +677,11 @@ void pcibios_disable_device(struct pci_dev *pdev)
 	pdev->sysdata = NULL;
 }
 
+int pcibios_add_platform_entries(struct pci_dev *pdev)
+{
+	return zpci_sysfs_add_device(&pdev->dev);
+}
+
 int zpci_request_irq(unsigned int irq, irq_handler_t handler, void *data)
 {
 	int msi_nr = irq_to_msi_nr(irq);
diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c
new file mode 100644
index 0000000..a42cce6
--- /dev/null
+++ b/arch/s390/pci/pci_sysfs.c
@@ -0,0 +1,86 @@
+/*
+ * Copyright IBM Corp. 2012
+ *
+ * Author(s):
+ *   Jan Glauber <jang@linux.vnet.ibm.com>
+ */
+
+#define COMPONENT "zPCI"
+#define pr_fmt(fmt) COMPONENT ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/stat.h>
+#include <linux/pci.h>
+
+static ssize_t show_fid(struct device *dev, struct device_attribute *attr,
+			char *buf)
+{
+	struct zpci_dev *zdev = get_zdev(container_of(dev, struct pci_dev, dev));
+
+	sprintf(buf, "0x%08x\n", zdev->fid);
+	return strlen(buf);
+}
+static DEVICE_ATTR(function_id, S_IRUGO, show_fid, NULL);
+
+static ssize_t show_fh(struct device *dev, struct device_attribute *attr,
+		       char *buf)
+{
+	struct zpci_dev *zdev = get_zdev(container_of(dev, struct pci_dev, dev));
+
+	sprintf(buf, "0x%08x\n", zdev->fh);
+	return strlen(buf);
+}
+static DEVICE_ATTR(function_handle, S_IRUGO, show_fh, NULL);
+
+static ssize_t show_pchid(struct device *dev, struct device_attribute *attr,
+			  char *buf)
+{
+	struct zpci_dev *zdev = get_zdev(container_of(dev, struct pci_dev, dev));
+
+	sprintf(buf, "0x%04x\n", zdev->pchid);
+	return strlen(buf);
+}
+static DEVICE_ATTR(pchid, S_IRUGO, show_pchid, NULL);
+
+static ssize_t show_pfgid(struct device *dev, struct device_attribute *attr,
+			  char *buf)
+{
+	struct zpci_dev *zdev = get_zdev(container_of(dev, struct pci_dev, dev));
+
+	sprintf(buf, "0x%02x\n", zdev->pfgid);
+	return strlen(buf);
+}
+static DEVICE_ATTR(pfgid, S_IRUGO, show_pfgid, NULL);
+
+static struct device_attribute *zpci_dev_attrs[] = {
+	&dev_attr_function_id,
+	&dev_attr_function_handle,
+	&dev_attr_pchid,
+	&dev_attr_pfgid,
+	NULL,
+};
+
+int zpci_sysfs_add_device(struct device *dev)
+{
+	int i, rc = 0;
+
+	for (i = 0; zpci_dev_attrs[i]; i++) {
+		rc = device_create_file(dev, zpci_dev_attrs[i]);
+		if (rc)
+			goto error;
+	}
+	return 0;
+
+error:
+	while (--i >= 0)
+		device_remove_file(dev, zpci_dev_attrs[i]);
+	return rc;
+}
+
+void zpci_sysfs_remove_device(struct device *dev)
+{
+	int i;
+
+	for (i = 0; zpci_dev_attrs[i]; i++)
+		device_remove_file(dev, zpci_dev_attrs[i]);
+}
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 09/10] s390/pci: add PCI Kconfig options
  2012-11-14  9:41 [RFC PATCH 00/10] s390/pci: PCI support on System z Jan Glauber
                   ` (7 preceding siblings ...)
  2012-11-14  9:41 ` [RFC PATCH 08/10] s390/pci: s390 specific PCI sysfs attributes Jan Glauber
@ 2012-11-14  9:41 ` Jan Glauber
  2012-11-14  9:41 ` [RFC PATCH 10/10] vga: compile fix, disable vga for s390 Jan Glauber
  9 siblings, 0 replies; 15+ messages in thread
From: Jan Glauber @ 2012-11-14  9:41 UTC (permalink / raw)
  To: linux-pci; +Cc: linux-kernel, heiko.carstens, schwidefsky, Jan Glauber

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
---
 arch/s390/Kconfig | 56 +++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 50 insertions(+), 6 deletions(-)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 5dba755..e3dd4aec 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -34,12 +34,6 @@ config GENERIC_BUG
 config GENERIC_BUG_RELATIVE_POINTERS
 	def_bool y
 
-config NO_IOMEM
-	def_bool y
-
-config NO_DMA
-	def_bool y
-
 config ARCH_DMA_ADDR_T_64BIT
 	def_bool 64BIT
 
@@ -58,6 +52,12 @@ config KEXEC
 config AUDIT_ARCH
 	def_bool y
 
+config NO_IOPORT
+	def_bool y
+
+config PCI_QUIRKS
+	def_bool n
+
 config S390
 	def_bool y
 	select USE_GENERIC_SMP_HELPERS if SMP
@@ -423,6 +423,50 @@ config QDIO
 
 	  If unsure, say Y.
 
+menuconfig PCI
+	bool "PCI support" 
+	default y
+	depends on 64BIT 
+ 	select ARCH_SUPPORTS_MSI
+ 	select PCI_MSI
+	help
+	  Enable PCI support.
+
+if PCI
+
+config PCI_NR_FUNCTIONS
+	int "Maximum number of PCI functions (1-4096)"
+	range 1 4096
+	default "64"
+	help
+	  This allows you to specify the maximum number of PCI functions which
+	  this kernel will support.
+
+source "drivers/pci/Kconfig"
+source "drivers/pci/pcie/Kconfig"
+source "drivers/pci/hotplug/Kconfig"
+
+endif	# PCI
+
+config PCI_DOMAINS
+	def_bool PCI
+
+config HAS_IOMEM
+	def_bool PCI
+
+config IOMMU_HELPER
+	def_bool PCI
+
+config HAS_DMA
+	def_bool PCI
+	select HAVE_DMA_API_DEBUG
+
+config NEED_SG_DMA_LENGTH
+	def_bool PCI
+
+config HAVE_DMA_ATTRS
+	def_bool PCI
+
 config CHSC_SCH
 	def_tristate m
 	prompt "Support for CHSC subchannels"
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 10/10] vga: compile fix, disable vga for s390
  2012-11-14  9:41 [RFC PATCH 00/10] s390/pci: PCI support on System z Jan Glauber
                   ` (8 preceding siblings ...)
  2012-11-14  9:41 ` [RFC PATCH 09/10] s390/pci: add PCI Kconfig options Jan Glauber
@ 2012-11-14  9:41 ` Jan Glauber
  9 siblings, 0 replies; 15+ messages in thread
From: Jan Glauber @ 2012-11-14  9:41 UTC (permalink / raw)
  To: linux-pci; +Cc: linux-kernel, heiko.carstens, schwidefsky, Jan Glauber

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
---
 drivers/gpu/vga/Kconfig | 2 +-
 include/video/vga.h     | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/vga/Kconfig b/drivers/gpu/vga/Kconfig
index f348388..29437ea 100644
--- a/drivers/gpu/vga/Kconfig
+++ b/drivers/gpu/vga/Kconfig
@@ -1,7 +1,7 @@
 config VGA_ARB
 	bool "VGA Arbitration" if EXPERT
 	default y
-	depends on PCI
+	depends on (PCI && !S390)
 	help
 	  Some "legacy" VGA devices implemented on PCI typically have the same
 	  hard-decoded addresses as they did on ISA. When multiple PCI devices
diff --git a/include/video/vga.h b/include/video/vga.h
index cac567f..7978d31 100644
--- a/include/video/vga.h
+++ b/include/video/vga.h
@@ -19,7 +19,9 @@
 
 #include <linux/types.h>
 #include <asm/io.h>
+#ifndef __s390x__
 #include <asm/vga.h>
+#endif
 #include <asm/byteorder.h>
 
 
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 01/10] s390/pci: base support
  2012-11-14  9:41 ` [RFC PATCH 01/10] s390/pci: base support Jan Glauber
@ 2012-12-10 21:14   ` Bjorn Helgaas
  2012-12-13 11:51     ` Jan Glauber
  0 siblings, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2012-12-10 21:14 UTC (permalink / raw)
  To: Jan Glauber; +Cc: linux-pci, linux-kernel, heiko.carstens, schwidefsky

On Wed, Nov 14, 2012 at 2:41 AM, Jan Glauber <jang@linux.vnet.ibm.com> wrote:
> Add PCI support for s390, (only 64 bit mode is supported by hardware):
> - PCI facility tests
> - PCI instructions: pcilg, pcistg, pcistb, stpcifc, mpcifc, rpcit
> - map readb/w/l/q and writeb/w/l/q to pcilg and pcistg instructions
> - pci_iomap implementation
> - memcpy_fromio/toio
> - pci_root_ops using special pcilg/pcistg
> - device, bus and domain allocation
>
> Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>

I think these patches are in -next already, so I just have some
general comments & questions.

My overall impression is that these are exceptionally well done.
They're easy to read, well organized, and well documented.  It's a
refreshing change from a lot of the stuff that's posted.

As with other arches that run on top of hypervisors, you have
arch-specific code that enumerates PCI devices using hypervisor calls,
and you hook into the PCI core at a lower level than
pci_scan_root_bus().  That is, you call pci_create_root_bus(),
pci_scan_single_device(), pci_bus_add_devices(), etc., directly from
the arch code.  This is the typical approach, but it does make more
dependencies between the arch code and the PCI core than I'd like.

Did you consider hiding any of the hypervisor details behind the PCI
config accessor interface?  If that could be done, the overall
structure might be more similar to other architectures.

The current config accessors only work for dev/fn 00.0 (they fail when
"devfn != ZPCI_DEVFN").  Why is that?  It looks like it precludes
multi-function devices and basically prevents you from building an
arbitrary PCI hierarchy.

zpci_map_resources() is very unusual.  The struct pci_dev resource[]
table normally contains CPU physical addresses, but
zpci_map_resources() fills it with virtual addresses.  I suspect this
has something to do with the "BAR spaces are not disjunctive on s390"
comment.  It almost sounds like that's describing host bridges where
the PCI bus address is not equal to the CPU physical address -- in
that case, device A and device B may have the same values in their
BARs, but they can still be distinct if they're under host bridges
that apply different CPU-to-bus address translations.

Bjorn

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 01/10] s390/pci: base support
  2012-12-10 21:14   ` Bjorn Helgaas
@ 2012-12-13 11:51     ` Jan Glauber
  2012-12-13 19:34       ` Bjorn Helgaas
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Glauber @ 2012-12-13 11:51 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-kernel, heiko.carstens, schwidefsky, sebott,
	Gerald Schäfer

On Mon, 2012-12-10 at 14:14 -0700, Bjorn Helgaas wrote:
> On Wed, Nov 14, 2012 at 2:41 AM, Jan Glauber <jang@linux.vnet.ibm.com> wrote:
> > Add PCI support for s390, (only 64 bit mode is supported by hardware):
> > - PCI facility tests
> > - PCI instructions: pcilg, pcistg, pcistb, stpcifc, mpcifc, rpcit
> > - map readb/w/l/q and writeb/w/l/q to pcilg and pcistg instructions
> > - pci_iomap implementation
> > - memcpy_fromio/toio
> > - pci_root_ops using special pcilg/pcistg
> > - device, bus and domain allocation
> >
> > Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
> 
> I think these patches are in -next already, so I just have some
> general comments & questions.

Yes, since the feedback was manageable we decided to give the patches
some exposure in -next and if no one complains we'll just go for the
next merge window. BTW, Sebastian & Gerald (on CC:) will continue the
work on the PCI code.

> My overall impression is that these are exceptionally well done.
> They're easy to read, well organized, and well documented.  It's a
> refreshing change from a lot of the stuff that's posted.

Thanks Björn!

> As with other arches that run on top of hypervisors, you have
> arch-specific code that enumerates PCI devices using hypervisor calls,
> and you hook into the PCI core at a lower level than
> pci_scan_root_bus().  That is, you call pci_create_root_bus(),
> pci_scan_single_device(), pci_bus_add_devices(), etc., directly from
> the arch code.  This is the typical approach, but it does make more
> dependencies between the arch code and the PCI core than I'd like.
> 
> Did you consider hiding any of the hypervisor details behind the PCI
> config accessor interface?  If that could be done, the overall
> structure might be more similar to other architectures.

You mean pci_root_ops? I'm not sure I understand how that can be used
to hide hipervisor details. One reason why we use the lower level
functions is that we need to create the root bus (and its resources)
much earlier then the pci_dev. For instance pci_hp_register wants a
pci_bus to create the PCI slot and the slot can exist without a pci_dev.

> The current config accessors only work for dev/fn 00.0 (they fail when
> "devfn != ZPCI_DEVFN").  Why is that?  It looks like it precludes
> multi-function devices and basically prevents you from building an
> arbitrary PCI hierarchy.

Our hypervisor does not support multi-function devices. In fact the
hypervisor will limit the reported PCI devices to a hand-picked
selection so we can be sure that there will be no unsupported devices.
The PCI hierarchy is hidden by the hipervisor. We only use the PCI
domain number, bus and devfn are always zero. So it looks like every
function is directly attached to a PCI root complex.

That was the reason for the sanity check, but thinking about it I could
remove it since although we don't support multi-function devices I
think the s390 code should be more agnostic to these limitations.

> zpci_map_resources() is very unusual.  The struct pci_dev resource[]
> table normally contains CPU physical addresses, but
> zpci_map_resources() fills it with virtual addresses.  I suspect this
> has something to do with the "BAR spaces are not disjunctive on s390"
> comment.  It almost sounds like that's describing host bridges where
> the PCI bus address is not equal to the CPU physical address -- in
> that case, device A and device B may have the same values in their
> BARs, but they can still be distinct if they're under host bridges
> that apply different CPU-to-bus address translations.

Yeah, you've found it... I've had 3 or 4 tries on different
implementations but all failed. If we use the resources as they are we
cannot map them to the instructions (and ioremap does not help because
there we cannot find out which device the resource belongs to). If we
change the BARs on the card MMIO stops to work. I don't know about host
bridges - if we would use a host bridge at which point in the
translation process would it kick in?

Jan

> Bjorn
> 




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 01/10] s390/pci: base support
  2012-12-13 11:51     ` Jan Glauber
@ 2012-12-13 19:34       ` Bjorn Helgaas
  2012-12-18 18:07         ` Sebastian Ott
  0 siblings, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2012-12-13 19:34 UTC (permalink / raw)
  To: Jan Glauber
  Cc: linux-pci, linux-kernel, heiko.carstens, schwidefsky, sebott,
	Gerald Schäfer

On Thu, Dec 13, 2012 at 4:51 AM, Jan Glauber <jang@linux.vnet.ibm.com> wrote:
> On Mon, 2012-12-10 at 14:14 -0700, Bjorn Helgaas wrote:
>> On Wed, Nov 14, 2012 at 2:41 AM, Jan Glauber <jang@linux.vnet.ibm.com> wrote:
>> > Add PCI support for s390, (only 64 bit mode is supported by hardware):
>> > - PCI facility tests
>> > - PCI instructions: pcilg, pcistg, pcistb, stpcifc, mpcifc, rpcit
>> > - map readb/w/l/q and writeb/w/l/q to pcilg and pcistg instructions
>> > - pci_iomap implementation
>> > - memcpy_fromio/toio
>> > - pci_root_ops using special pcilg/pcistg
>> > - device, bus and domain allocation
>> >
>> > Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
>>
>> I think these patches are in -next already, so I just have some
>> general comments & questions.
>
> Yes, since the feedback was manageable we decided to give the patches
> some exposure in -next and if no one complains we'll just go for the
> next merge window. BTW, Sebastian & Gerald (on CC:) will continue the
> work on the PCI code.
>
>> My overall impression is that these are exceptionally well done.
>> They're easy to read, well organized, and well documented.  It's a
>> refreshing change from a lot of the stuff that's posted.
>
> Thanks Björn!
>
>> As with other arches that run on top of hypervisors, you have
>> arch-specific code that enumerates PCI devices using hypervisor calls,
>> and you hook into the PCI core at a lower level than
>> pci_scan_root_bus().  That is, you call pci_create_root_bus(),
>> pci_scan_single_device(), pci_bus_add_devices(), etc., directly from
>> the arch code.  This is the typical approach, but it does make more
>> dependencies between the arch code and the PCI core than I'd like.
>>
>> Did you consider hiding any of the hypervisor details behind the PCI
>> config accessor interface?  If that could be done, the overall
>> structure might be more similar to other architectures.
>
> You mean pci_root_ops? I'm not sure I understand how that can be used
> to hide hipervisor details.

The object of doing this would be to let you use pci_scan_root_bus(),
so you wouldn't have to call pci_scan_single_device() and
pci_bus_add_resources() directly.  The idea is to make pci_read() and
pci_write() smart enough that the PCI core can use them as though you
have a normal PCI implementation.  When pci_scan_root_bus() enumerates
devices on the root bus using pci_scan_child_bus(), it does config
reads on the VENDOR_ID of bb:00.0, bb:01.0, ..., bb:1f.0.  Your
pci_read() would return error or 0xffffffff data for everything except
bb:00.0 (I guess it actually already does that).

Then some of the other init, e.g., zpci_enable_device(), could be done
by the standard pcibios hooks such as pcibios_add_device() and
pcibios_enable_device(), which would remove some of the PCI grunge
from zpci_scan_devices() and the s390 hotplug driver.

> One reason why we use the lower level
> functions is that we need to create the root bus (and its resources)
> much earlier then the pci_dev. For instance pci_hp_register wants a
> pci_bus to create the PCI slot and the slot can exist without a pci_dev.

There must be something else going on here; a bus is *always* created
before any of the pci_devs on the bus.

One thing that looks a little strange is that zpci_list seems to be
sort of a cross between a list of PCI host bridges and a list of PCI
devices.  Understandable, since you usually have a one-to-one
correspondence between them, but for your hotplug stuff, you can have
the host bridge and slot without the pci_dev.

The hotplug slot support is really a function of the host bridge
support, so it doesn't seem quite right to me to split it into a
separate module (although that is the way most PCI hotplug drivers are
currently structured).  One reason is that it makes hot-add of a host
bridge awkward: you have to have the "if (hotplug_ops.create_slot)"
test in zpci_create_device().

>> The current config accessors only work for dev/fn 00.0 (they fail when
>> "devfn != ZPCI_DEVFN").  Why is that?  It looks like it precludes
>> multi-function devices and basically prevents you from building an
>> arbitrary PCI hierarchy.
>
> Our hypervisor does not support multi-function devices. In fact the
> hypervisor will limit the reported PCI devices to a hand-picked
> selection so we can be sure that there will be no unsupported devices.
> The PCI hierarchy is hidden by the hipervisor. We only use the PCI
> domain number, bus and devfn are always zero. So it looks like every
> function is directly attached to a PCI root complex.
>
> That was the reason for the sanity check, but thinking about it I could
> remove it since although we don't support multi-function devices I
> think the s390 code should be more agnostic to these limitations.

The config accessor interface should be defined so it works for all
PCI devices that exist, and fails for devices that do not exist.  The
approach you've taken so far is to prevent the PCI core from even
attempting access to non-existent devices, which requires you to use
some lower-level PCI interfaces.  The alternative I'm raising as a
possibility is to allow the PCI core to attempt accesses to
non-existent devices, but have the accessor be smart enough to use the
hypervisor or other arch-specific data to determine whether the device
exists or not, and act accordingly.

Basically the idea is that if we put more smarts in the config
accessors, we can make the interface between the PCI core and the
architecture thinner.

>> zpci_map_resources() is very unusual.  The struct pci_dev resource[]
>> table normally contains CPU physical addresses, but
>> zpci_map_resources() fills it with virtual addresses.  I suspect this
>> has something to do with the "BAR spaces are not disjunctive on s390"
>> comment.  It almost sounds like that's describing host bridges where
>> the PCI bus address is not equal to the CPU physical address -- in
>> that case, device A and device B may have the same values in their
>> BARs, but they can still be distinct if they're under host bridges
>> that apply different CPU-to-bus address translations.
>
> Yeah, you've found it... I've had 3 or 4 tries on different
> implementations but all failed. If we use the resources as they are we
> cannot map them to the instructions (and ioremap does not help because
> there we cannot find out which device the resource belongs to). If we
> change the BARs on the card MMIO stops to work. I don't know about host
> bridges - if we would use a host bridge at which point in the
> translation process would it kick in?

Here's how it works.  The PCI host bridge claims regions of the CPU
physical address space and forwards transactions in those regions to
the PCI root bus.  Some host bridges can apply an offset when
forwarding, so the address on the bus may be different from the
address from the CPU.  The bus address is what matches a PCI BAR.

You can tell the PCI core about this translation by using
pci_add_resource_offset() instead of pci_add_resource().  When the PCI
core reads a BAR, it applies the offset to convert the BAR value into
a CPU physical address.

For example, let's say you have two host bridges:

PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [mem 0x100000000-0x1ffffffff] (bus
address [0x00000000-0xffffffff])
PCI host bridge to bus 0001:00
pci_bus 0001:00: root bus resource [mem 0x200000000-0x2ffffffff] (bus
address [0x00000000-0xffffffff])

Both bridges use the same bus address range, and BARs of devices on
bus 0000:00 can have the same values as those of devices on bus
0001:00.  But there's no ambiguity because a CPU access to
0x1_0000_0000 will be claimed by the first bridge and translated to
bus address 0x0 on bus 0000:00, while a CPU access to 0x2_0000_0000
will be claimed by the second bridge and translated to bus address 0x0
on bus 0001:00.

The pci_dev resources will contain CPU physical addresses, not the BAR
values themselves.  These addresses implicitly identify the host
bridge (and, in your case, the pci_dev, since you have at most one
pci_dev per host bridge).

Bjorn

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 01/10] s390/pci: base support
  2012-12-13 19:34       ` Bjorn Helgaas
@ 2012-12-18 18:07         ` Sebastian Ott
  0 siblings, 0 replies; 15+ messages in thread
From: Sebastian Ott @ 2012-12-18 18:07 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jan Glauber, linux-pci, linux-kernel, heiko.carstens,
	schwidefsky, Gerald Schäfer

[-- Attachment #1: Type: TEXT/PLAIN, Size: 8539 bytes --]



On Thu, 13 Dec 2012, Bjorn Helgaas wrote:

> On Thu, Dec 13, 2012 at 4:51 AM, Jan Glauber <jang@linux.vnet.ibm.com> wrote:
> > On Mon, 2012-12-10 at 14:14 -0700, Bjorn Helgaas wrote:
> >> On Wed, Nov 14, 2012 at 2:41 AM, Jan Glauber <jang@linux.vnet.ibm.com> wrote:
> >> > Add PCI support for s390, (only 64 bit mode is supported by hardware):
> >> > - PCI facility tests
> >> > - PCI instructions: pcilg, pcistg, pcistb, stpcifc, mpcifc, rpcit
> >> > - map readb/w/l/q and writeb/w/l/q to pcilg and pcistg instructions
> >> > - pci_iomap implementation
> >> > - memcpy_fromio/toio
> >> > - pci_root_ops using special pcilg/pcistg
> >> > - device, bus and domain allocation
> >> >
> >> > Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
> >>
> >> I think these patches are in -next already, so I just have some
> >> general comments & questions.
> >
> > Yes, since the feedback was manageable we decided to give the patches
> > some exposure in -next and if no one complains we'll just go for the
> > next merge window. BTW, Sebastian & Gerald (on CC:) will continue the
> > work on the PCI code.
> >
> >> My overall impression is that these are exceptionally well done.
> >> They're easy to read, well organized, and well documented.  It's a
> >> refreshing change from a lot of the stuff that's posted.
> >
> > Thanks Björn!
> >
> >> As with other arches that run on top of hypervisors, you have
> >> arch-specific code that enumerates PCI devices using hypervisor calls,
> >> and you hook into the PCI core at a lower level than
> >> pci_scan_root_bus().  That is, you call pci_create_root_bus(),
> >> pci_scan_single_device(), pci_bus_add_devices(), etc., directly from
> >> the arch code.  This is the typical approach, but it does make more
> >> dependencies between the arch code and the PCI core than I'd like.
> >>
> >> Did you consider hiding any of the hypervisor details behind the PCI
> >> config accessor interface?  If that could be done, the overall
> >> structure might be more similar to other architectures.
> >
> > You mean pci_root_ops? I'm not sure I understand how that can be used
> > to hide hipervisor details.
> 
> The object of doing this would be to let you use pci_scan_root_bus(),
> so you wouldn't have to call pci_scan_single_device() and
> pci_bus_add_resources() directly.  The idea is to make pci_read() and
> pci_write() smart enough that the PCI core can use them as though you
> have a normal PCI implementation.  When pci_scan_root_bus() enumerates
> devices on the root bus using pci_scan_child_bus(), it does config
> reads on the VENDOR_ID of bb:00.0, bb:01.0, ..., bb:1f.0.  Your
> pci_read() would return error or 0xffffffff data for everything except
> bb:00.0 (I guess it actually already does that).
> 
> Then some of the other init, e.g., zpci_enable_device(), could be done
> by the standard pcibios hooks such as pcibios_add_device() and
> pcibios_enable_device(), which would remove some of the PCI grunge
> from zpci_scan_devices() and the s390 hotplug driver.
> 
> > One reason why we use the lower level
> > functions is that we need to create the root bus (and its resources)
> > much earlier then the pci_dev. For instance pci_hp_register wants a
> > pci_bus to create the PCI slot and the slot can exist without a pci_dev.
> 
> There must be something else going on here; a bus is *always* created
> before any of the pci_devs on the bus.
> 
> One thing that looks a little strange is that zpci_list seems to be
> sort of a cross between a list of PCI host bridges and a list of PCI
> devices.  Understandable, since you usually have a one-to-one
> correspondence between them, but for your hotplug stuff, you can have
> the host bridge and slot without the pci_dev.
> 
> The hotplug slot support is really a function of the host bridge
> support, so it doesn't seem quite right to me to split it into a
> separate module (although that is the way most PCI hotplug drivers are
> currently structured).  One reason is that it makes hot-add of a host
> bridge awkward: you have to have the "if (hotplug_ops.create_slot)"
> test in zpci_create_device().
> 
> >> The current config accessors only work for dev/fn 00.0 (they fail when
> >> "devfn != ZPCI_DEVFN").  Why is that?  It looks like it precludes
> >> multi-function devices and basically prevents you from building an
> >> arbitrary PCI hierarchy.
> >
> > Our hypervisor does not support multi-function devices. In fact the
> > hypervisor will limit the reported PCI devices to a hand-picked
> > selection so we can be sure that there will be no unsupported devices.
> > The PCI hierarchy is hidden by the hipervisor. We only use the PCI
> > domain number, bus and devfn are always zero. So it looks like every
> > function is directly attached to a PCI root complex.
> >
> > That was the reason for the sanity check, but thinking about it I could
> > remove it since although we don't support multi-function devices I
> > think the s390 code should be more agnostic to these limitations.
> 
> The config accessor interface should be defined so it works for all
> PCI devices that exist, and fails for devices that do not exist.  The
> approach you've taken so far is to prevent the PCI core from even
> attempting access to non-existent devices, which requires you to use
> some lower-level PCI interfaces.  The alternative I'm raising as a
> possibility is to allow the PCI core to attempt accesses to
> non-existent devices, but have the accessor be smart enough to use the
> hypervisor or other arch-specific data to determine whether the device
> exists or not, and act accordingly.
> 
> Basically the idea is that if we put more smarts in the config
> accessors, we can make the interface between the PCI core and the
> architecture thinner.
> 
> >> zpci_map_resources() is very unusual.  The struct pci_dev resource[]
> >> table normally contains CPU physical addresses, but
> >> zpci_map_resources() fills it with virtual addresses.  I suspect this
> >> has something to do with the "BAR spaces are not disjunctive on s390"
> >> comment.  It almost sounds like that's describing host bridges where
> >> the PCI bus address is not equal to the CPU physical address -- in
> >> that case, device A and device B may have the same values in their
> >> BARs, but they can still be distinct if they're under host bridges
> >> that apply different CPU-to-bus address translations.
> >
> > Yeah, you've found it... I've had 3 or 4 tries on different
> > implementations but all failed. If we use the resources as they are we
> > cannot map them to the instructions (and ioremap does not help because
> > there we cannot find out which device the resource belongs to). If we
> > change the BARs on the card MMIO stops to work. I don't know about host
> > bridges - if we would use a host bridge at which point in the
> > translation process would it kick in?
> 
> Here's how it works.  The PCI host bridge claims regions of the CPU
> physical address space and forwards transactions in those regions to
> the PCI root bus.  Some host bridges can apply an offset when
> forwarding, so the address on the bus may be different from the
> address from the CPU.  The bus address is what matches a PCI BAR.
> 
> You can tell the PCI core about this translation by using
> pci_add_resource_offset() instead of pci_add_resource().  When the PCI
> core reads a BAR, it applies the offset to convert the BAR value into
> a CPU physical address.
> 
> For example, let's say you have two host bridges:
> 
> PCI host bridge to bus 0000:00
> pci_bus 0000:00: root bus resource [mem 0x100000000-0x1ffffffff] (bus
> address [0x00000000-0xffffffff])
> PCI host bridge to bus 0001:00
> pci_bus 0001:00: root bus resource [mem 0x200000000-0x2ffffffff] (bus
> address [0x00000000-0xffffffff])
> 
> Both bridges use the same bus address range, and BARs of devices on
> bus 0000:00 can have the same values as those of devices on bus
> 0001:00.  But there's no ambiguity because a CPU access to
> 0x1_0000_0000 will be claimed by the first bridge and translated to
> bus address 0x0 on bus 0000:00, while a CPU access to 0x2_0000_0000
> will be claimed by the second bridge and translated to bus address 0x0
> on bus 0001:00.
> 
> The pci_dev resources will contain CPU physical addresses, not the BAR
> values themselves.  These addresses implicitly identify the host
> bridge (and, in your case, the pci_dev, since you have at most one
> pci_dev per host bridge).

Hi Bjorn,

thanks for the explanation. I'll look into it.

Regards,
Sebastian

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2012-12-18 18:07 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-14  9:41 [RFC PATCH 00/10] s390/pci: PCI support on System z Jan Glauber
2012-11-14  9:41 ` [RFC PATCH 01/10] s390/pci: base support Jan Glauber
2012-12-10 21:14   ` Bjorn Helgaas
2012-12-13 11:51     ` Jan Glauber
2012-12-13 19:34       ` Bjorn Helgaas
2012-12-18 18:07         ` Sebastian Ott
2012-11-14  9:41 ` [RFC PATCH 02/10] s390/pci: CLP interface Jan Glauber
2012-11-14  9:41 ` [RFC PATCH 03/10] s390/bitops: find leftmost bit instruction support Jan Glauber
2012-11-14  9:41 ` [RFC PATCH 04/10] s390/pci: PCI adapter interrupts for MSI/MSI-X Jan Glauber
2012-11-14  9:41 ` [RFC PATCH 05/10] s390/pci: DMA support Jan Glauber
2012-11-14  9:41 ` [RFC PATCH 06/10] s390/pci: CHSC PCI support for error and availability events Jan Glauber
2012-11-14  9:41 ` [RFC PATCH 07/10] s390/pci: PCI hotplug support via SCLP Jan Glauber
2012-11-14  9:41 ` [RFC PATCH 08/10] s390/pci: s390 specific PCI sysfs attributes Jan Glauber
2012-11-14  9:41 ` [RFC PATCH 09/10] s390/pci: add PCI Kconfig options Jan Glauber
2012-11-14  9:41 ` [RFC PATCH 10/10] vga: compile fix, disable vga for s390 Jan Glauber

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.