* [PATCH 0/5] DAPL support on s390x platform prototype
@ 2014-08-27 10:24 Alexey Ishchuk
       [not found] ` <1409135080-44991-1-git-send-email-aishchuk-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Alexey Ishchuk @ 2014-08-27 10:24 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: arlin.r.davis-ral2JQCrhuEAvxtiuMwx3w,
	gilr-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
	roland-DgEjT+Ai2ygdnm+yROfE0A, linux-s390-u79uwXL29TY76Z2rM5mHXA,
	gmuelas-tA70FqPdS9bQT0dZR+AlfA,
	utz.bacher-tA70FqPdS9bQT0dZR+AlfA,
	martin.schwidefsky-tA70FqPdS9bQT0dZR+AlfA,
	frank.blaschka-tA70FqPdS9bQT0dZR+AlfA, Alexey Ishchuk

This patch series contains the re-architected changes to the kernel and
userspace libraries required to support the DAPL API on the s390x
platform. These patches are a prototype of the proposed changes. Inclusion
of this version into the kernel source code is not requested; the patches
are posted only to gather comments from the community on the concept
behind the changes.
The current implementation of Infiniband verbs uses mapped memory areas to
directly access the device UAR and Blueflame pages, which are located in
the PCI I/O memory, from userspace. On the s390x platform the PCI I/O
memory can be accessed only using special privileged CPU instructions that
cannot be used directly in user space programs. This restricts the usage of
mapped memory areas to access the PCI I/O memory on s390x platform.
In the previous attempt to implement DAPL support on the s390x platform, a
new Infiniband verb command was introduced and changes to kernel modules
and user space libraries were provided, but that version of the changes was
rejected by the community.
The new version of the changes introduces new kernel system calls that
execute the privileged CPU instructions in kernel space on request from
user space programs. One system call allows user space programs to write
data to a PCI I/O memory page; the second one reads data from PCI I/O
memory into a user space program buffer. Both take mapped memory area
addresses as arguments.
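
To illustrate how a user space program might invoke the two system calls
described above, here is a minimal sketch. The wrapper names
sketch_mmio_write() and sketch_mmio_read() are hypothetical and not part of
this series. The fallback branch is only an assumption that lets the sketch
build and run against ordinary memory on other platforms; it is not a
suitable way to access real device memory.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

/*
 * Hypothetical wrappers (illustration only). On s390x, with headers that
 * define the syscall numbers added by this series, the access is forwarded
 * to the kernel. Elsewhere, a plain memcpy() stands in so the sketch can
 * be exercised on normal memory.
 */
static long sketch_mmio_write(void *mmio_addr, const void *src, size_t length)
{
#if defined(__s390x__) && defined(__NR_s390_pci_mmio_write)
	/* mmio_addr is the user space address of the mapped PCI I/O page */
	return syscall(__NR_s390_pci_mmio_write,
		       (unsigned long)mmio_addr, src, length);
#else
	memcpy(mmio_addr, src, length);	/* placeholder, ordinary memory only */
	return 0;
#endif
}

static long sketch_mmio_read(const void *mmio_addr, void *dst, size_t length)
{
#if defined(__s390x__) && defined(__NR_s390_pci_mmio_read)
	return syscall(__NR_s390_pci_mmio_read,
		       (unsigned long)mmio_addr, dst, length);
#else
	memcpy(dst, mmio_addr, length);	/* placeholder, ordinary memory only */
	return 0;
#endif
}
```

In both directions the first argument is the address inside the mapped
area, which is what allows the existing mmap()-based setup in the user
space libraries to stay in place.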
This approach to DAPL API support on the s390x platform has the following
advantages:
	* the current Infiniband and mlx4 support modules remain unchanged;
	* the changes are separated into the special kernel platform
	  specific directory;
	* no conditional compilation directives are used in the kernel
	  source code;
	* no changes are required to the kernel virtual memory management;
	* only minor changes are required in the user space DAPL API
	  components.
The only disadvantage of this approach is that it is still necessary to
modify existing userspace library libmlx4 to replace the direct access to
mapped memory areas intended for the PCI I/O memory access with the
appropriate new system call invocation. The changes to the other user space
DAPL component code are required only to provide the support of the s390x
platform.
The series consists of 1 patch for the Linux kernel and 4 patches for the
DAPL API user space components.
	[PATCH 1/5] s390/kernel: add system calls for access PCI memory
This patch contains the new system call implementation required for the PCI
I/O memory access from userspace programs on s390x platform.
	[PATCH 2/5] libibverbs: add support for the s390x platform
This patch contains the changes to the libibverbs user space library to
provide support of the s390x platform.
	[PATCH 3/5] libmlx4: add support for the s390x platform
This patch contains the changes to the libmlx4 user space library intended
to provide the PCI I/O memory access on the s390x platform. The direct
access to mapped memory areas is replaced by appropriate system call
invocation.
	[PATCH 4/5] dapl: add support for the s390x platform
This patch contains the code that must be added to the dapl package
to allow the dapl libraries to be used on the s390x platform. There are no
changes in this patch since the previous post; it is included only
for reference.
	[PATCH 5/5] perftest: add support for the s390x platform
This patch contains the code that must be added to the perftest
package applications to allow their execution on the s390x platform. There
are no changes in this patch since the previous post; it is included only
for reference.

Alexey Ishchuk (5):
  s390/kernel: add system calls for access PCI memory
  libibverbs: add support for s390x platform
  libmlx4: add support for s390x platform
  dapl: add support for s390x platform
  perftest: add support for s390x platform
-- 
1.8.5.5

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] s390/kernel: add system calls for access PCI memory
       [not found] ` <1409135080-44991-1-git-send-email-aishchuk-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2014-08-27 10:24   ` Alexey Ishchuk
       [not found]     ` <1409135080-44991-2-git-send-email-aishchuk-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2014-08-27 10:24   ` [PATCH 2/5] libibverbs: add support for s390x platform Alexey Ishchuk
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Alexey Ishchuk @ 2014-08-27 10:24 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: arlin.r.davis-ral2JQCrhuEAvxtiuMwx3w,
	gilr-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
	roland-DgEjT+Ai2ygdnm+yROfE0A, linux-s390-u79uwXL29TY76Z2rM5mHXA,
	gmuelas-tA70FqPdS9bQT0dZR+AlfA,
	utz.bacher-tA70FqPdS9bQT0dZR+AlfA,
	martin.schwidefsky-tA70FqPdS9bQT0dZR+AlfA,
	frank.blaschka-tA70FqPdS9bQT0dZR+AlfA, Alexey Ishchuk

Add the new __NR_s390_pci_mmio_write and __NR_s390_pci_mmio_read
system calls to allow user space applications to access device PCI I/O
memory pages on s390x platform.
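
As a rough illustration of the argument checks the new system calls perform
before touching the device (a nonzero length, and an access that stays
within a single page), the following user space model may help. It is only
a sketch: SKETCH_PAGE_SIZE and mmio_range_ok() are hypothetical names, and
a 4 KiB page size is assumed.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/*
 * Model of the range validation: returns 1 when an access of `length`
 * bytes starting at `mmio_addr` is nonzero in size and does not cross
 * the end of the page containing `mmio_addr`, otherwise 0.
 */
static int mmio_range_ok(unsigned long mmio_addr, size_t length)
{
	unsigned long offset = mmio_addr & (SKETCH_PAGE_SIZE - 1);

	if (length == 0 || length > SKETCH_PAGE_SIZE)
		return 0;
	return offset + length <= SKETCH_PAGE_SIZE;
}
```

For example, an 8-byte access at offset 0xff8 of a page fits, while the
same access at offset 0xffc would spill into the next page and is refused.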

Signed-off-by: Alexey Ishchuk <aishchuk-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 arch/s390/include/uapi/asm/unistd.h |   4 +-
 arch/s390/kernel/Makefile           |   1 +
 arch/s390/kernel/entry.h            |   4 +
 arch/s390/kernel/pci_mmio.c         | 197 ++++++++++++++++++++++++++++++++++++
 arch/s390/kernel/syscalls.S         |   2 +
 5 files changed, 207 insertions(+), 1 deletion(-)
 create mode 100644 arch/s390/kernel/pci_mmio.c

diff --git a/arch/s390/include/uapi/asm/unistd.h b/arch/s390/include/uapi/asm/unistd.h
index 3802d2d..ab49d1d 100644
--- a/arch/s390/include/uapi/asm/unistd.h
+++ b/arch/s390/include/uapi/asm/unistd.h
@@ -283,7 +283,9 @@
 #define __NR_sched_setattr	345
 #define __NR_sched_getattr	346
 #define __NR_renameat2		347
-#define NR_syscalls 348
+#define __NR_s390_pci_mmio_write	348
+#define __NR_s390_pci_mmio_read		349
+#define NR_syscalls 350
 
 /* 
  * There are some system calls that are not present on 64 bit, some
diff --git a/arch/s390/kernel/Makefile b/arch/s390/kernel/Makefile
index 8c2518f..44e8fbb 100644
--- a/arch/s390/kernel/Makefile
+++ b/arch/s390/kernel/Makefile
@@ -62,6 +62,7 @@ ifdef CONFIG_64BIT
 obj-$(CONFIG_PERF_EVENTS)	+= perf_event.o perf_cpum_cf.o perf_cpum_sf.o \
 						perf_cpum_cf_events.o
 obj-y				+= runtime_instr.o cache.o
+obj-y				+= pci_mmio.o
 endif
 
 # vdso
diff --git a/arch/s390/kernel/entry.h b/arch/s390/kernel/entry.h
index 6ac7819..a36b6f9 100644
--- a/arch/s390/kernel/entry.h
+++ b/arch/s390/kernel/entry.h
@@ -70,4 +70,8 @@ struct old_sigaction;
 long sys_s390_personality(unsigned int personality);
 long sys_s390_runtime_instr(int command, int signum);
 
+long sys_s390_pci_mmio_write(const unsigned long mmio_addr,
+			     const void *user_buffer, const size_t length);
+long sys_s390_pci_mmio_read(const unsigned long mmio_addr,
+			    void *user_buffer, const size_t length);
 #endif /* _ENTRY_H */
diff --git a/arch/s390/kernel/pci_mmio.c b/arch/s390/kernel/pci_mmio.c
new file mode 100644
index 0000000..4539d23
--- /dev/null
+++ b/arch/s390/kernel/pci_mmio.c
@@ -0,0 +1,197 @@
+/*
+ * Copyright IBM Corp. 2014
+ */
+#include <linux/kernel.h>
+#include <linux/syscalls.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/errno.h>
+#include <linux/pci.h>
+
+union value_buffer {
+	u8 buf8;
+	u16 buf16;
+	u32 buf32;
+	u64 buf64;
+	u8 buf_large[64];
+};
+
+static long get_pfn(const unsigned long user_addr,
+		    const unsigned long access,
+		    unsigned long *pfn)
+{
+	struct vm_area_struct *vma = NULL;
+
+	if (!pfn)
+		return -EINVAL;
+
+	vma = find_vma(current->mm, user_addr);
+	if (!vma)
+		return -EINVAL;
+	if (!(vma->vm_flags & access))
+		return -EACCES;
+
+	return follow_pfn(vma, user_addr, pfn);
+}
+
+static inline int verify_page_addr(const unsigned long page_addr)
+{
+	return !(page_addr < ZPCI_IOMAP_ADDR_BASE ||
+	    page_addr > (ZPCI_IOMAP_ADDR_BASE | ZPCI_IOMAP_ADDR_IDX_MASK));
+}
+
+static long choose_buffer(const size_t length,
+			  union value_buffer *value,
+			  void **buf)
+{
+	long ret = 0UL;
+
+	if (length > sizeof(value->buf_large)) {
+		*buf = kmalloc(length, GFP_KERNEL);
+		if (!*buf)
+			return -ENOMEM;
+		ret = 1;
+	} else {
+		*buf = value->buf_large;
+	}
+	return ret;
+}
+
+SYSCALL_DEFINE3(s390_pci_mmio_write,
+		const unsigned long, mmio_addr,
+		const void __user *, user_buffer,
+		const size_t, length)
+{
+	long ret = 0L;
+	void *buf = NULL;
+	long buf_allocated = 0;
+	void __iomem *io_addr = NULL;
+	unsigned long pfn = 0UL;
+	unsigned long offset = 0UL;
+	unsigned long page_addr = 0UL;
+	union value_buffer value;
+
+	if (!length)
+		return -EINVAL;
+	if (!zpci_is_enabled())
+		return -ENODEV;
+
+	ret = get_pfn(mmio_addr, VM_WRITE, &pfn);
+	if (ret)
+		return ret;
+
+	page_addr = pfn << PAGE_SHIFT;
+	if (!verify_page_addr(page_addr))
+		return -EFAULT;
+
+	offset = mmio_addr & ~PAGE_MASK;
+	if (offset + length > PAGE_SIZE)
+		return -EINVAL;
+	io_addr = (void *)(page_addr | offset);
+
+	buf_allocated = choose_buffer(length, &value, &buf);
+	if (buf_allocated < 0L)
+		return -ENOMEM;
+
+	switch (length) {
+	case 1:
+		ret = get_user(value.buf8, ((u8 *)user_buffer));
+		break;
+	case 2:
+		ret = get_user(value.buf16, ((u16 *)user_buffer));
+		break;
+	case 4:
+		ret = get_user(value.buf32, ((u32 *)user_buffer));
+		break;
+	case 8:
+		ret = get_user(value.buf64, ((u64 *)user_buffer));
+		break;
+	default:
+		ret = copy_from_user(buf, user_buffer, length);
+	}
+	if (ret)
+		goto out;
+
+	switch (length) {
+	case 1:
+		__raw_writeb(value.buf8, io_addr);
+		break;
+	case 2:
+		__raw_writew(value.buf16, io_addr);
+		break;
+	case 4:
+		__raw_writel(value.buf32, io_addr);
+		break;
+	case 8:
+		__raw_writeq(value.buf64, io_addr);
+		break;
+	default:
+		memcpy_toio(io_addr, buf, length);
+	}
+out:
+	if (buf_allocated > 0L)
+		kfree(buf);
+	return ret;
+}
+
+SYSCALL_DEFINE3(s390_pci_mmio_read,
+		const unsigned long, mmio_addr,
+		void __user *, user_buffer,
+		const size_t, length)
+{
+	long ret = 0L;
+	void *buf = NULL;
+	long buf_allocated = 0L;
+	void __iomem *io_addr = NULL;
+	unsigned long pfn = 0UL;
+	unsigned long offset = 0UL;
+	unsigned long page_addr = 0UL;
+	union value_buffer value;
+
+	if (!length)
+		return -EINVAL;
+	if (!zpci_is_enabled())
+		return -ENODEV;
+
+	ret = get_pfn(mmio_addr, VM_READ, &pfn);
+	if (ret)
+		return ret;
+
+	page_addr = pfn << PAGE_SHIFT;
+	if (!verify_page_addr(page_addr))
+		return -EFAULT;
+
+	offset = mmio_addr & ~PAGE_MASK;
+	if (offset + length > PAGE_SIZE)
+		return -EINVAL;
+	io_addr = (void *)(page_addr | offset);
+
+	buf_allocated = choose_buffer(length, &value, &buf);
+	if (buf_allocated < 0L)
+		return -ENOMEM;
+
+	switch (length) {
+	case 1:
+		value.buf8 = __raw_readb(io_addr);
+		ret = put_user(value.buf8, ((u8 *)user_buffer));
+		break;
+	case 2:
+		value.buf16 = __raw_readw(io_addr);
+		ret = put_user(value.buf16, ((u16 *)user_buffer));
+		break;
+	case 4:
+		value.buf32 = __raw_readl(io_addr);
+		ret = put_user(value.buf32, ((u32 *)user_buffer));
+		break;
+	case 8:
+		value.buf64 = __raw_readq(io_addr);
+		ret = put_user(value.buf64, ((u64 *)user_buffer));
+		break;
+	default:
+		memcpy_fromio(buf, io_addr, length);
+		ret = copy_to_user(user_buffer, buf, length);
+	}
+	if (buf_allocated > 0L)
+		kfree(buf);
+	return ret;
+}
diff --git a/arch/s390/kernel/syscalls.S b/arch/s390/kernel/syscalls.S
index fe5cdf2..1faa942 100644
--- a/arch/s390/kernel/syscalls.S
+++ b/arch/s390/kernel/syscalls.S
@@ -356,3 +356,5 @@ SYSCALL(sys_finit_module,sys_finit_module,compat_sys_finit_module)
 SYSCALL(sys_sched_setattr,sys_sched_setattr,compat_sys_sched_setattr) /* 345 */
 SYSCALL(sys_sched_getattr,sys_sched_getattr,compat_sys_sched_getattr)
 SYSCALL(sys_renameat2,sys_renameat2,compat_sys_renameat2)
+SYSCALL(sys_ni_syscall,sys_s390_pci_mmio_write,sys_ni_syscall)
+SYSCALL(sys_ni_syscall,sys_s390_pci_mmio_read,sys_ni_syscall)
-- 
1.8.5.5


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/5] libibverbs: add support for s390x platform
       [not found] ` <1409135080-44991-1-git-send-email-aishchuk-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2014-08-27 10:24   ` [PATCH] s390/kernel: add system calls for access PCI memory Alexey Ishchuk
@ 2014-08-27 10:24   ` Alexey Ishchuk
  2014-08-27 10:24   ` [PATCH 3/5] libmlx4: " Alexey Ishchuk
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Alexey Ishchuk @ 2014-08-27 10:24 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: arlin.r.davis-ral2JQCrhuEAvxtiuMwx3w,
	gilr-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
	roland-DgEjT+Ai2ygdnm+yROfE0A, linux-s390-u79uwXL29TY76Z2rM5mHXA,
	gmuelas-tA70FqPdS9bQT0dZR+AlfA,
	utz.bacher-tA70FqPdS9bQT0dZR+AlfA,
	martin.schwidefsky-tA70FqPdS9bQT0dZR+AlfA,
	frank.blaschka-tA70FqPdS9bQT0dZR+AlfA, Alexey Ishchuk

This patch adds the required platform specific code to allow execution of
the libibverbs functions on the s390x platform.

Signed-off-by: Alexey Ishchuk <aishchuk-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 include/infiniband/arch.h |    7 +++++++
 1 file changed, 7 insertions(+)

--- a/include/infiniband/arch.h
+++ b/include/infiniband/arch.h
@@ -115,6 +115,13 @@ static inline uint64_t ntohll(uint64_t x
 #define wmb()	 mb()
 #define wc_wmb() wmb()
 
+#elif defined(__s390x__)
+
+#define mb()	{ asm volatile("" : : : "memory"); }	/* for s390x */
+#define rmb()	mb()					/* for s390x */
+#define wmb()	mb()					/* for s390x */
+#define wc_wmb() wmb()					/* for s390x */
+
 #else
 
 #warning No architecture specific defines found.  Using generic implementation.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 3/5] libmlx4: add support for s390x platform
       [not found] ` <1409135080-44991-1-git-send-email-aishchuk-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2014-08-27 10:24   ` [PATCH] s390/kernel: add system calls for access PCI memory Alexey Ishchuk
  2014-08-27 10:24   ` [PATCH 2/5] libibverbs: add support for s390x platform Alexey Ishchuk
@ 2014-08-27 10:24   ` Alexey Ishchuk
  2014-08-27 10:24   ` [PATCH 4/5] dapl: add support for the " Alexey Ishchuk
  2014-08-27 10:24   ` [PATCH 5/5] perftest: " Alexey Ishchuk
  4 siblings, 0 replies; 9+ messages in thread
From: Alexey Ishchuk @ 2014-08-27 10:24 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: arlin.r.davis-ral2JQCrhuEAvxtiuMwx3w,
	gilr-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
	roland-DgEjT+Ai2ygdnm+yROfE0A, linux-s390-u79uwXL29TY76Z2rM5mHXA,
	gmuelas-tA70FqPdS9bQT0dZR+AlfA,
	utz.bacher-tA70FqPdS9bQT0dZR+AlfA,
	martin.schwidefsky-tA70FqPdS9bQT0dZR+AlfA,
	frank.blaschka-tA70FqPdS9bQT0dZR+AlfA, Alexey Ishchuk

Since the s390x platform requires execution of privileged CPU instructions
to work with PCI I/O memory, the PCI I/O memory cannot be accessed directly
from userspace programs via mapped memory areas. Because the current
implementation of the Infiniband verbs uses mapped memory areas to
write data to the device UAR and Blueflame pages to initiate I/O
operations, these verbs cannot be used on the s390x platform without
modification.
This patch contains the changes to the libmlx4 userspace Mellanox device
driver library required to provide support for the DAPL API on the s390x
platform. The original code that directly used mapped memory areas to access
the PCI I/O memory of the Mellanox networking device is replaced with
invocations of the new system call for writing data to mapped memory areas.

Signed-off-by: Alexey Ishchuk <aishchuk-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 src/doorbell.h |    8 +--
 src/mlx4.h     |    2 
 src/mmio.h     |  151 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/qp.c       |   17 ------
 4 files changed, 160 insertions(+), 18 deletions(-)

--- a/src/doorbell.h
+++ b/src/doorbell.h
@@ -33,6 +33,8 @@
 #ifndef DOORBELL_H
 #define DOORBELL_H
 
+#include "mmio.h"
+
 #if SIZEOF_LONG == 8
 
 #if __BYTE_ORDER == __LITTLE_ENDIAN
@@ -45,7 +47,7 @@
 
 static inline void mlx4_write64(uint32_t val[2], struct mlx4_context *ctx, int offset)
 {
-	*(volatile uint64_t *) (ctx->uar + offset) = MLX4_PAIR_TO_64(val);
+	mmio_writeq((unsigned long)(ctx->uar + offset), MLX4_PAIR_TO_64(val));
 }
 
 #else
@@ -53,8 +55,8 @@ static inline void mlx4_write64(uint32_t
 static inline void mlx4_write64(uint32_t val[2], struct mlx4_context *ctx, int offset)
 {
 	pthread_spin_lock(&ctx->uar_lock);
-	*(volatile uint32_t *) (ctx->uar + offset)     = val[0];
-	*(volatile uint32_t *) (ctx->uar + offset + 4) = val[1];
+	mmio_writel((unsigned long)(ctx->uar + offset), val[0]);
+	mmio_writel((unsigned long)(ctx->uar + offset + 4), val[1]);
 	pthread_spin_unlock(&ctx->uar_lock);
 }
 
--- a/src/mlx4.h
+++ b/src/mlx4.h
@@ -73,6 +73,8 @@
 #define wc_wmb() asm volatile("sfence" ::: "memory")
 #elif defined(__ia64__)
 #define wc_wmb() asm volatile("fwb" ::: "memory")
+#elif defined(__s390x__)
+#define wc_wmb() { asm volatile("" : : : "memory"); }
 #else
 #define wc_wmb() wmb()
 #endif
--- /dev/null
+++ b/src/mmio.h
@@ -0,0 +1,151 @@
+/*
+ * Copyright (c) 2014 IBM Corporation
+ */
+#ifndef MMIO_H
+#define MMIO_H
+
+#include <unistd.h>
+#include <asm/unistd.h>
+#include <sys/syscall.h>
+#ifdef __s390x__
+
+#define s390_pci_mmio_write_call(addr, value) \
+	syscall(__NR_s390_pci_mmio_write, addr, &value, sizeof(value))
+
+
+static inline long s390_pci_mmio_writeb(const unsigned long mmio_addr,
+					const uint8_t val)
+{
+	return s390_pci_mmio_write_call(mmio_addr, val);
+}
+
+static inline long s390_pci_mmio_writew(const unsigned long mmio_addr,
+					const uint16_t val)
+{
+	return s390_pci_mmio_write_call(mmio_addr, val);
+}
+
+static inline long s390_pci_mmio_writel(const unsigned long mmio_addr,
+					const uint32_t val)
+{
+	return s390_pci_mmio_write_call(mmio_addr, val);
+}
+
+static inline long s390_pci_mmio_writeq(const unsigned long mmio_addr,
+					const uint64_t val)
+{
+	return s390_pci_mmio_write_call(mmio_addr, val);
+}
+
+static inline long s390_pci_mmio_write(const unsigned long mmio_addr,
+				       const void *val,
+				       const size_t length)
+{
+	return syscall(__NR_s390_pci_mmio_write, mmio_addr, val, length);
+}
+
+#define s390_pci_mmio_read_call(addr, value)	\
+	syscall(__NR_s390_pci_mmio_read, addr, value, sizeof(*value))
+
+static inline long s390_pci_mmio_readb(const unsigned long mmio_addr,
+				       uint8_t *val)
+{
+	return s390_pci_mmio_read_call(mmio_addr, val);
+}
+
+static inline long s390_pci_mmio_readw(const unsigned long mmio_addr,
+				       uint16_t *val)
+{
+	return s390_pci_mmio_read_call(mmio_addr, val);
+}
+
+static inline long s390_pci_mmio_readl(const unsigned long mmio_addr,
+				       uint32_t *val)
+{
+	return s390_pci_mmio_read_call(mmio_addr, val);
+}
+
+static inline long s390_pci_mmio_readq(const unsigned long mmio_addr,
+				       uint64_t *val)
+{
+	return s390_pci_mmio_read_call(mmio_addr, val);
+}
+
+static inline long s390_pci_mmio_read(const unsigned long mmio_addr,
+				      void *val,
+				      const size_t length)
+{
+	return syscall(__NR_s390_pci_mmio_read, mmio_addr, val, length);
+}
+
+#define mmio_writeb(addr, value) \
+		s390_pci_mmio_writeb(addr, value)
+#define mmio_writew(addr, value) \
+		s390_pci_mmio_writew(addr, value)
+#define mmio_writel(addr, value) \
+		s390_pci_mmio_writel(addr, value)
+#define mmio_writeq(addr, value) \
+		s390_pci_mmio_writeq(addr, value)
+#define mmio_write(addr, value, length) \
+		s390_pci_mmio_write(addr, value, length)
+
+#define mmio_readb(addr, value) \
+		s390_pci_mmio_readb(addr, value)
+#define mmio_readw(addr, value) \
+		s390_pci_mmio_readw(addr, value)
+#define mmio_readl(addr, value) \
+		s390_pci_mmio_readl(addr, value)
+#define mmio_readq(addr, value) \
+		s390_pci_mmio_readq(addr, value)
+#define mmio_read(addr, value, length) \
+		s390_pci_mmio_read(addr, value, length)
+
+static inline void mlx4_bf_copy(unsigned long *dst,
+				unsigned long *src,
+				unsigned bytecnt)
+{
+	mmio_write((unsigned long) dst, src, bytecnt);
+}
+
+#else
+
+#define mmio_writeb(addr, value) \
+	(*((uint8_t *)addr) = value)
+#define mmio_writew(addr, value) \
+	(*((uint16_t *)addr) = value)
+#define mmio_writel(addr, value) \
+	(*((uint32_t *)addr) = value)
+#define mmio_writeq(addr, value) \
+	(*((uint64_t *)addr) = value)
+#define mmio_write(addr, value, length) \
+	memcpy(addr, value, length)
+
+#define mmio_readb(addr, value) \
+	(value = *((uint8_t *)addr))
+#define mmio_readw(addr, value) \
+	(value = *((uint16_t *)addr))
+#define mmio_readl(addr, value) \
+	(value = *((uint32_t *)addr))
+#define mmio_readq(addr, value) \
+	(value = *((uint64_t *)addr))
+#define mmio_read(addr, value, length) \
+	memcpy(value, addr, length)
+
+/*
+ * Avoid using memcpy() to copy to BlueFlame page, since memcpy()
+ * implementations may use move-string-buffer assembler instructions,
+ * which do not guarantee order of copying.
+ */
+static inline void mlx4_bf_copy(unsigned long *dst,
+				unsigned long *src,
+				unsigned bytecnt)
+{
+	while (bytecnt > 0) {
+		*dst++ = *src++;
+		*dst++ = *src++;
+		bytecnt -= 2 * sizeof(long);
+	}
+}
+#endif
+
+#endif
--- a/src/qp.c
+++ b/src/qp.c
@@ -173,20 +173,6 @@ static void set_data_seg(struct mlx4_wqe
 	dseg->byte_count = htonl(sg->length);
 }
 
-/*
- * Avoid using memcpy() to copy to BlueFlame page, since memcpy()
- * implementations may use move-string-buffer assembler instructions,
- * which do not guarantee order of copying.
- */
-static void mlx4_bf_copy(unsigned long *dst, unsigned long *src, unsigned bytecnt)
-{
-	while (bytecnt > 0) {
-		*dst++ = *src++;
-		*dst++ = *src++;
-		bytecnt -= 2 * sizeof (long);
-	}
-}
-
 int mlx4_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
 			  struct ibv_send_wr **bad_wr)
 {
@@ -431,7 +417,8 @@ out:
 		 */
 		wmb();
 
-		*(uint32_t *) (ctx->uar + MLX4_SEND_DOORBELL) = qp->doorbell_qpn;
+		mmio_writel((unsigned long)(ctx->uar + MLX4_SEND_DOORBELL),
+			    qp->doorbell_qpn);
 	}
 
 	if (nreq)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 4/5] dapl: add support for the s390x platform
       [not found] ` <1409135080-44991-1-git-send-email-aishchuk-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
                     ` (2 preceding siblings ...)
  2014-08-27 10:24   ` [PATCH 3/5] libmlx4: " Alexey Ishchuk
@ 2014-08-27 10:24   ` Alexey Ishchuk
       [not found]     ` <1409135080-44991-5-git-send-email-aishchuk-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2014-08-27 10:24   ` [PATCH 5/5] perftest: " Alexey Ishchuk
  4 siblings, 1 reply; 9+ messages in thread
From: Alexey Ishchuk @ 2014-08-27 10:24 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: arlin.r.davis-ral2JQCrhuEAvxtiuMwx3w,
	gilr-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
	roland-DgEjT+Ai2ygdnm+yROfE0A, linux-s390-u79uwXL29TY76Z2rM5mHXA,
	gmuelas-tA70FqPdS9bQT0dZR+AlfA,
	utz.bacher-tA70FqPdS9bQT0dZR+AlfA,
	martin.schwidefsky-tA70FqPdS9bQT0dZR+AlfA,
	frank.blaschka-tA70FqPdS9bQT0dZR+AlfA, Alexey Ishchuk

This patch adds the dapl_os_atomic_inc, dapl_os_atomic_dec, and
dapl_os_atomic_assign function implementations to the dapl userspace package
to provide DAPL API support on the s390x platform by adding assembler
language implementations of those platform specific functions.
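
For readers unfamiliar with the s390x compare-and-swap instruction, the
retry loop used for the increment and decrement operations can be modelled
in portable C11. This is a sketch for illustration, not the code added by
the patch; the name sketch_cs_add() is hypothetical and C11 atomics stand
in for the inline assembler.

```c
#include <stdatomic.h>

/*
 * Portable model of a compare-and-swap add loop: load the current value,
 * compute the sum, and retry until the swap succeeds without another
 * thread having changed *ptr in between. Returns the new value.
 */
static int sketch_cs_add(_Atomic int *ptr, int op_val)
{
	int old_val = atomic_load(ptr);
	int new_val;

	do {
		new_val = old_val + op_val;
		/* on failure, old_val is refreshed with the current *ptr */
	} while (!atomic_compare_exchange_weak(ptr, &old_val, new_val));

	return new_val;
}
```

An increment is then sketch_cs_add(&v, 1) and a decrement is
sketch_cs_add(&v, -1), mirroring the delta values used in the patch.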

Signed-off-by: Alexey Ishchuk <aishchuk-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 dapl/udapl/linux/dapl_osd.h |   37 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

--- a/dapl/udapl/linux/dapl_osd.h
+++ b/dapl/udapl/linux/dapl_osd.h
@@ -49,7 +49,9 @@
 #error UNDEFINED OS TYPE
 #endif /* __linux__ */
 
-#if !defined (__i386__) && !defined (__ia64__) && !defined(__x86_64__) && !defined(__PPC__) && !defined(__PPC64__)
+#if !defined(__i386__) && !defined(__ia64__) \
+&& !defined(__x86_64__) && !defined(__PPC__) && !defined(__PPC64__) \
+&& !defined(__s390x__)
 #error UNDEFINED ARCH
 #endif
 
@@ -156,6 +158,22 @@ int dapl_os_get_env_val (
 
 
 /* atomic functions */
+#ifdef __s390x__
+#define DAPL_CS_ADD(ptr, op_val) ({		\
+	int old_val, new_val;				\
+	__asm__ __volatile__(				\
+		"	l	%0,%2\n"		\
+		"0:	lr	%1,%0\n"		\
+		"	ar	%1,%3\n"		\
+		"	cs	%0,%1,%2\n"		\
+		"	jl	0b"			\
+		: "=&d" (old_val), "=&d" (new_val),	\
+		  "=Q" (*ptr)				\
+		: "d" (op_val), "Q" (*ptr)		\
+		: "cc", "memory");			\
+	new_val;					\
+})
+#endif
 
 /* dapl_os_atomic_inc
  *
@@ -179,6 +197,11 @@ dapl_os_atomic_inc (
 #else
 	IA64_FETCHADD(old_value,v,1,4);
 #endif
+#elif defined(__s390x__)
+	DAT_COUNT	tmp;
+	DAT_COUNT	delta = 1;
+
+	tmp = DAPL_CS_ADD(v, delta);
 #elif defined(__PPC__) || defined(__PPC64__)
 	int tmp;
 
@@ -218,6 +241,11 @@ dapl_os_atomic_dec (
 #else
 	IA64_FETCHADD(old_value,v,-1,4);
 #endif
+#elif defined(__s390x__)
+	DAT_COUNT	tmp;
+	DAT_COUNT	delta = -1;
+
+	tmp = DAPL_CS_ADD(v, delta);
 #elif defined (__PPC__) || defined(__PPC64__)
 	int tmp;
 
@@ -273,6 +301,13 @@ dapl_os_atomic_assign (
 #else
     current_value = ia64_cmpxchg(acq,v,match_value,new_value,4);
 #endif /* __ia64__ */
+#elif defined(__s390x__)
+	__asm__ __volatile__(
+		"	cs	%0,%2,%1\n"
+		: "+d" (match_value), "=Q" (*v)
+		: "d" (new_value), "Q" (*v)
+		: "cc", "memory");
+	current_value = match_value;
 #elif defined(__PPC__) || defined(__PPC64__)
         __asm__ __volatile__ (
 "       lwsync\n\


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 5/5] perftest: support for the s390x platform
       [not found] ` <1409135080-44991-1-git-send-email-aishchuk-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
                     ` (3 preceding siblings ...)
  2014-08-27 10:24   ` [PATCH 4/5] dapl: add support for the " Alexey Ishchuk
@ 2014-08-27 10:24   ` Alexey Ishchuk
  4 siblings, 0 replies; 9+ messages in thread
From: Alexey Ishchuk @ 2014-08-27 10:24 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: arlin.r.davis-ral2JQCrhuEAvxtiuMwx3w,
	gilr-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
	roland-DgEjT+Ai2ygdnm+yROfE0A, linux-s390-u79uwXL29TY76Z2rM5mHXA,
	gmuelas-tA70FqPdS9bQT0dZR+AlfA,
	utz.bacher-tA70FqPdS9bQT0dZR+AlfA,
	martin.schwidefsky-tA70FqPdS9bQT0dZR+AlfA,
	frank.blaschka-tA70FqPdS9bQT0dZR+AlfA, Alexey Ishchuk

This patch adds the required platform specific code to allow execution of
the perftest package applications on the s390x platform.
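
As background for the get_cycles() implementation added here: the STCK
instruction stores the 64-bit TOD clock, in which bit 51 advances once per
microsecond, so one TOD unit is 1/4096 of a microsecond. Shifting the value
right by 2 therefore gives a counter that advances 1024 times per
microsecond. The arithmetic can be sketched as follows; the helper names
are hypothetical and the code only models the conversion, it does not read
the clock.

```c
#include <stdint.h>

/* TOD clock units per microsecond (bit 51 ticks once per microsecond) */
#define SKETCH_TOD_UNITS_PER_USEC 4096ULL

/* Same scaling as the patch applies to the stored TOD value */
static inline uint64_t sketch_tod_to_cycles(uint64_t tod)
{
	return tod >> 2;
}

/* With 1024 scaled ticks per microsecond, divide by 2^10 to get usec */
static inline uint64_t sketch_cycles_to_usec(uint64_t cycles)
{
	return cycles >> 10;
}
```

Under this scaling the sampled "CPU frequency" would come out near 1024
MHz regardless of the actual processor speed.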

Signed-off-by: Alexey Ishchuk <aishchuk-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 README          |    2 ++
 src/get_clock.c |    6 ++++++
 src/get_clock.h |    8 ++++++++
 3 files changed, 16 insertions(+)

--- a/README
+++ b/README
@@ -234,3 +234,5 @@ Special feature detailed explanation in
     Please use the same perftest version on both sides to ensure consistency of benchmark results.
 
  6. This version (5.0) won't work with previous versions of perftest.
+
+ 7. In the s390x platform virtualized environment the results shown by package test applications can be incorrect.
--- a/src/get_clock.c
+++ b/src/get_clock.c
@@ -132,6 +132,7 @@ static double sample_get_cpu_mhz(void)
 	return b;
 }
 
+#ifndef __s390x__
 static double proc_get_cpu_mhz(int no_cpu_freq_fail)
 {
 	FILE* f;
@@ -181,9 +182,13 @@ static double proc_get_cpu_mhz(int no_cp
 	return mhz;
 }
 
+#endif
 
 double get_cpu_mhz(int no_cpu_freq_fail)
 {
+#ifdef __s390x__
+	return sample_get_cpu_mhz();
+#else
 	double sample, proc, delta;
 	sample = sample_get_cpu_mhz();
 	proc = proc_get_cpu_mhz(no_cpu_freq_fail);
@@ -199,4 +204,5 @@ double get_cpu_mhz(int no_cpu_freq_fail)
 			return sample;
 	}
 	return proc;
+#endif
 }
--- a/src/get_clock.h
+++ b/src/get_clock.h
@@ -70,7 +70,15 @@ static inline cycles_t get_cycles()
 	asm volatile ("mov %0=ar.itc" : "=r" (ret));
 	return ret;
 }
+#elif defined(__s390x__)
+typedef unsigned long long cycles_t;
+static inline cycles_t get_cycles(void)
+{
+	cycles_t	clk;
 
+	asm volatile("stck %0" : "=Q" (clk) : : "cc");
+	return clk >> 2;
+}
 #else
 #warning get_cycles not implemented for this architecture: attempt asm/timex.h
 #include <asm/timex.h>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH] s390/kernel: add system calls for access PCI memory
       [not found]     ` <1409135080-44991-2-git-send-email-aishchuk-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2014-08-27 15:18       ` Shachar Raindel
       [not found]         ` <6B2A6E60C06CCC42AE31809BF572352B010E23CCE5-LSMZvP3E4uyuSA5JZHE7gA@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Shachar Raindel @ 2014-08-27 15:18 UTC (permalink / raw)
  To: Alexey Ishchuk, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: arlin.r.davis-ral2JQCrhuEAvxtiuMwx3w,
	gilr-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
	roland-DgEjT+Ai2ygdnm+yROfE0A, linux-s390-u79uwXL29TY76Z2rM5mHXA,
	gmuelas-tA70FqPdS9bQT0dZR+AlfA,
	utz.bacher-tA70FqPdS9bQT0dZR+AlfA,
	martin.schwidefsky-tA70FqPdS9bQT0dZR+AlfA,
	frank.blaschka-tA70FqPdS9bQT0dZR+AlfA

Hi Alex,

A few comments on your patch below.


Thanks,
--Shachar

> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Alexey Ishchuk
> Sent: Wednesday, August 27, 2014 1:29 PM
> To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Cc: arlin.r.davis-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org; gilr-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org; roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org;
> linux-s390-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; gmuelas-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org;
> utz.bacher-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org; martin.schwidefsky-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org;
> frank.blaschka-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org; Alexey Ishchuk
> Subject: [PATCH] s390/kernel: add system calls for access PCI memory
> 
> Add the new __NR_s390_pci_mmio_write and __NR_s390_pci_mmio_read
> system calls to allow user space applications to access device PCI I/O

Why do you need a special syscall for this functionality? Does the S390 platform support mapping MMIO pages to user space? If this must happen in the kernel, it should be provided as a device file (probably a character device), on which writes or ioctls perform mmio_write and reads or ioctls perform mmio_read.
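
To make the device-file alternative concrete, here is a minimal user-space sketch of what an ioctl request for such a character device might look like. Everything in it is an assumption made for illustration: struct mmio_req, the 'z' magic number, the MMIO_IOC_* names and mmio_req_make() do not come from the patch or from any existing driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical request layout; not part of the posted patch. */
struct mmio_req {
	uint64_t mmio_addr;   /* address inside a previously mapped MMIO area */
	uint64_t user_buffer; /* user-space buffer, carried as an integer */
	uint64_t length;      /* number of bytes to transfer */
};

/* Hypothetical ioctl numbers; the magic 'z' is chosen for illustration only. */
#define MMIO_IOC_WRITE _IOW('z', 1, struct mmio_req)
#define MMIO_IOC_READ  _IOWR('z', 2, struct mmio_req)

/* Fill in a request that could then be handed to ioctl(fd, MMIO_IOC_WRITE, &req). */
static struct mmio_req mmio_req_make(uint64_t mmio_addr, const void *buf,
				     uint64_t length)
{
	struct mmio_req req;

	memset(&req, 0, sizeof(req));
	req.mmio_addr = mmio_addr;
	req.user_buffer = (uint64_t)(uintptr_t)buf;
	req.length = length;
	return req;
}
```

Compared with a dedicated syscall, this shape adds a file descriptor lookup per access but lets the usual file permissions gate who may issue requests.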

> memory pages on s390x platform.
> 
> Signed-off-by: Alexey Ishchuk <aishchuk-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> ---
>  arch/s390/include/uapi/asm/unistd.h |   4 +-
>  arch/s390/kernel/Makefile           |   1 +
>  arch/s390/kernel/entry.h            |   4 +
>  arch/s390/kernel/pci_mmio.c         | 197
> ++++++++++++++++++++++++++++++++++++
>  arch/s390/kernel/syscalls.S         |   2 +
>  5 files changed, 207 insertions(+), 1 deletion(-)
>  create mode 100644 arch/s390/kernel/pci_mmio.c
> 
> diff --git a/arch/s390/include/uapi/asm/unistd.h
> b/arch/s390/include/uapi/asm/unistd.h
> index 3802d2d..ab49d1d 100644
> --- a/arch/s390/include/uapi/asm/unistd.h
> +++ b/arch/s390/include/uapi/asm/unistd.h
> @@ -283,7 +283,9 @@
>  #define __NR_sched_setattr	345
>  #define __NR_sched_getattr	346
>  #define __NR_renameat2		347
> -#define NR_syscalls 348
> +#define __NR_s390_pci_mmio_write	348
> +#define __NR_s390_pci_mmio_read		349
> +#define NR_syscalls 350
> 
>  /*
>   * There are some system calls that are not present on 64 bit, some
> diff --git a/arch/s390/kernel/Makefile b/arch/s390/kernel/Makefile
> index 8c2518f..44e8fbb 100644
> --- a/arch/s390/kernel/Makefile
> +++ b/arch/s390/kernel/Makefile
> @@ -62,6 +62,7 @@ ifdef CONFIG_64BIT
>  obj-$(CONFIG_PERF_EVENTS)	+= perf_event.o perf_cpum_cf.o
> perf_cpum_sf.o \
>  						perf_cpum_cf_events.o
>  obj-y				+= runtime_instr.o cache.o
> +obj-y				+= pci_mmio.o
>  endif
> 
>  # vdso
> diff --git a/arch/s390/kernel/entry.h b/arch/s390/kernel/entry.h
> index 6ac7819..a36b6f9 100644
> --- a/arch/s390/kernel/entry.h
> +++ b/arch/s390/kernel/entry.h
> @@ -70,4 +70,8 @@ struct old_sigaction;
>  long sys_s390_personality(unsigned int personality);
>  long sys_s390_runtime_instr(int command, int signum);
> 
> +long sys_s390_pci_mmio_write(const unsigned long mmio_addr,
> +			     const void *user_buffer, const size_t length);
> +long sys_s390_pci_mmio_read(const unsigned long mmio_addr,
> +			    void *user_buffer, const size_t length);
>  #endif /* _ENTRY_H */
> diff --git a/arch/s390/kernel/pci_mmio.c b/arch/s390/kernel/pci_mmio.c
> new file mode 100644
> index 0000000..4539d23
> --- /dev/null
> +++ b/arch/s390/kernel/pci_mmio.c
> @@ -0,0 +1,197 @@
> +/*
> + * Copyright IBM Corp. 2014
> + */
> +#include <linux/kernel.h>
> +#include <linux/syscalls.h>
> +#include <linux/init.h>
> +#include <linux/mm.h>
> +#include <linux/errno.h>
> +#include <linux/pci.h>
> +
> +union value_buffer {
> +	u8 buf8;
> +	u16 buf16;
> +	u32 buf32;
> +	u64 buf64;
> +	u8 buf_large[64];
> +};
> +
> +static long get_pfn(const unsigned long user_addr,
> +		    const unsigned long access,
> +		    unsigned long *pfn)
> +{
> +	struct vm_area_struct *vma = NULL;
> +
> +	if (!pfn)
> +		return -EINVAL;
> +
> +	vma = find_vma(current->mm, user_addr);
> +	if (!vma)
> +		return -EINVAL;
> +	if (!(vma->vm_flags & access))
> +		return -EACCES;
> +
> +	return follow_pfn(vma, user_addr, pfn);
> +}
> +
> +static inline int verify_page_addr(const unsigned long page_addr)
> +{
> +	return !(page_addr < ZPCI_IOMAP_ADDR_BASE ||
> +	    page_addr > (ZPCI_IOMAP_ADDR_BASE |
> ZPCI_IOMAP_ADDR_IDX_MASK));
> +}
> +
> +static long choose_buffer(const size_t length,
> +			  union value_buffer *value,
> +			  void **buf)
> +{
> +	long ret = 0UL;
> +
> +	if (length > sizeof(value->buf_large)) {
> +		*buf = kmalloc(length, GFP_KERNEL);
> +		if (!*buf)
> +			return -ENOMEM;
> +		ret = 1;
> +	} else {
> +		*buf = value->buf_large;
> +	}
> +	return ret;
> +}
> +
> +SYSCALL_DEFINE3(s390_pci_mmio_write,
> +		const unsigned long, mmio_addr,
> +		const void __user *, user_buffer,
> +		const size_t, length)
> +{

You need some security check in the flow here (e.g. a capability or root check).
If you don't do that, any user can perform MMIO to any address in the system, which seems like a very bad idea.


> +	long ret = 0L;
> +	void *buf = NULL;
> +	long buf_allocated = 0;
> +	void __iomem *io_addr = NULL;
> +	unsigned long pfn = 0UL;
> +	unsigned long offset = 0UL;
> +	unsigned long page_addr = 0UL;
> +	union value_buffer value;
> +
> +	if (!length)
> +		return -EINVAL;
> +	if (!zpci_is_enabled())
> +		return -ENODEV;
> +
> +	ret = get_pfn(mmio_addr, VM_WRITE, &pfn);
> +	if (ret)
> +		return ret;
> +
> +	page_addr = pfn << PAGE_SHIFT;
> +	if (!verify_page_addr(page_addr))
> +		return -EFAULT;
> +
> +	offset = mmio_addr & ~PAGE_MASK;
> +	if (offset + length > PAGE_SIZE)
> +		return -EINVAL;
> +	io_addr = (void *)(page_addr | offset);
> +
> +	buf_allocated = choose_buffer(length, &value, &buf);
> +	if (buf_allocated < 0L)
> +		return -ENOMEM;
> +
> +	switch (length) {
> +	case 1:
> +		ret = get_user(value.buf8, ((u8 *)user_buffer));
> +		break;
> +	case 2:
> +		ret = get_user(value.buf16, ((u16 *)user_buffer));
> +		break;
> +	case 4:
> +		ret = get_user(value.buf32, ((u32 *)user_buffer));
> +		break;
> +	case 8:
> +		ret = get_user(value.buf64, ((u64 *)user_buffer));
> +		break;
> +	default:
> +		ret = copy_from_user(buf, user_buffer, length);
> +	}
> +	if (ret)
> +		goto out;
> +
> +	switch (length) {
> +	case 1:
> +		__raw_writeb(value.buf8, io_addr);
> +		break;
> +	case 2:
> +		__raw_writew(value.buf16, io_addr);
> +		break;
> +	case 4:
> +		__raw_writel(value.buf32, io_addr);
> +		break;
> +	case 8:
> +		__raw_writeq(value.buf64, io_addr);
> +		break;
> +	default:
> +		memcpy_toio(io_addr, buf, length);
> +	}
> +out:
> +	if (buf_allocated > 0L)
> +		kfree(buf);
> +	return ret;
> +}
> +
> +SYSCALL_DEFINE3(s390_pci_mmio_read,
> +		const unsigned long, mmio_addr,
> +		void __user *, user_buffer,
> +		const size_t, length)
> +{
> +	long ret = 0L;
> +	void *buf = NULL;
> +	long buf_allocated = 0L;
> +	void __iomem *io_addr = NULL;
> +	unsigned long pfn = 0UL;
> +	unsigned long offset = 0UL;
> +	unsigned long page_addr = 0UL;
> +	union value_buffer value;
> +
> +	if (!length)
> +		return -EINVAL;
> +	if (!zpci_is_enabled())
> +		return -ENODEV;
> +
> +	ret = get_pfn(mmio_addr, VM_READ, &pfn);
> +	if (ret)
> +		return ret;
> +
> +	page_addr = pfn << PAGE_SHIFT;
> +	if (!verify_page_addr(page_addr))
> +		return -EFAULT;
> +
> +	offset = mmio_addr & ~PAGE_MASK;
> +	if (offset + length > PAGE_SIZE)
> +		return -EINVAL;
> +	io_addr = (void *)(page_addr | offset);
> +
> +	buf_allocated = choose_buffer(length, &value, &buf);
> +	if (buf_allocated < 0L)
> +		return -ENOMEM;
> +
> +	switch (length) {
> +	case 1:
> +		value.buf8 = __raw_readb(io_addr);
> +		ret = put_user(value.buf8, ((u8 *)user_buffer));
> +		break;
> +	case 2:
> +		value.buf16 = __raw_readw(io_addr);
> +		ret = put_user(value.buf16, ((u16 *)user_buffer));
> +		break;
> +	case 4:
> +		value.buf32 = __raw_readl(io_addr);
> +		ret = put_user(value.buf32, ((u32 *)user_buffer));
> +		break;
> +	case 8:
> +		value.buf64 = __raw_readq(io_addr);
> +		ret = put_user(value.buf64, ((u64 *)user_buffer));
> +		break;
> +	default:
> +		memcpy_fromio(buf, io_addr, length);
> +		ret = copy_to_user(user_buffer, buf, length);
> +	}
> +	if (buf_allocated > 0L)
> +		kfree(buf);
> +	return ret;
> +}
> diff --git a/arch/s390/kernel/syscalls.S b/arch/s390/kernel/syscalls.S
> index fe5cdf2..1faa942 100644
> --- a/arch/s390/kernel/syscalls.S
> +++ b/arch/s390/kernel/syscalls.S
> @@ -356,3 +356,5 @@
> SYSCALL(sys_finit_module,sys_finit_module,compat_sys_finit_module)
>  SYSCALL(sys_sched_setattr,sys_sched_setattr,compat_sys_sched_setattr)
> /* 345 */
>  SYSCALL(sys_sched_getattr,sys_sched_getattr,compat_sys_sched_getattr)
>  SYSCALL(sys_renameat2,sys_renameat2,compat_sys_renameat2)
> +SYSCALL(sys_ni_syscall,sys_s390_pci_mmio_write,sys_ni_syscall)
> +SYSCALL(sys_ni_syscall,sys_s390_pci_mmio_read,sys_ni_syscall)
> --
> 1.8.5.5
> 


* Re: [PATCH] s390/kernel: add system calls for access PCI memory
       [not found]         ` <6B2A6E60C06CCC42AE31809BF572352B010E23CCE5-LSMZvP3E4uyuSA5JZHE7gA@public.gmane.org>
@ 2014-08-28 12:00           ` Alexey Ishchuk
  0 siblings, 0 replies; 9+ messages in thread
From: Alexey Ishchuk @ 2014-08-28 12:00 UTC (permalink / raw)
  To: Shachar Raindel, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: arlin.r.davis-ral2JQCrhuEAvxtiuMwx3w,
	gilr-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
	roland-DgEjT+Ai2ygdnm+yROfE0A, linux-s390-u79uwXL29TY76Z2rM5mHXA,
	gmuelas-tA70FqPdS9bQT0dZR+AlfA,
	utz.bacher-tA70FqPdS9bQT0dZR+AlfA,
	martin.schwidefsky-tA70FqPdS9bQT0dZR+AlfA,
	frank.blaschka-tA70FqPdS9bQT0dZR+AlfA

Hi Shachar,

Thank you for your comments. Please find my answers below.

Regards,
Alexey Ishchuk

On 08/27/2014 07:18 PM, Shachar Raindel wrote:
> Hi Alex,
>
> A few comments on your patch below.
>
>
> Thanks,
> --Shachar
>
>> -----Original Message-----
>> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
>> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Alexey Ishchuk
>> Sent: Wednesday, August 27, 2014 1:29 PM
>> To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Cc: arlin.r.davis-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org; gilr-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org; roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org;
>> linux-s390-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; gmuelas-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org;
>> utz.bacher-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org; martin.schwidefsky-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org;
>> frank.blaschka-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org; Alexey Ishchuk
>> Subject: [PATCH] s390/kernel: add system calls for access PCI memory
>>
>> Add the new __NR_s390_pci_mmio_write and __NR_s390_pci_mmio_read
>> system calls to allow user space applications to access device PCI I/O
>
> Why do you need a special syscall for this functionality? Does the S390 platform support mapping MMIO pages to user space? If this must happen in the kernel, it should be provided as a device file (probably a character device), on which writes or ioctls perform mmio_write and reads or ioctls perform mmio_read.
>
A special syscall is needed for this functionality because the s390
platform does not really support mapping MMIO pages to user space. PCI
memory on this platform can be accessed only with special privileged
CPU instructions. Because of that, a user space program cannot simply
store data into mapped memory that belongs to the PCI memory address
space.

To get as close as possible to the programming model used by "normal"
architectures, the PCI MMIO address is stored in the page table of the
user space process as if the process could store to that memory. In
fact it cannot: every access to this memory area raises an exception.

To write data to the PCI memory address space, a user space program on
the s390 platform needs to call into the kernel to have these privileged
PCI instructions executed. A system call is a straightforward way to do
this.

I tried to provide a new Infiniband verb command for writing data to PCI
memory, but that approach was rejected by the community. My tests showed
that the syscall implementation is noticeably faster than the new verb
command. Concerning the security issues, please see my considerations
below.
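
For illustration, a user space wrapper around the proposed write syscall could look roughly like the sketch below. The number 348 is the one proposed in this prototype; a kernel without exactly this patch may use that slot for something else, so the real call is compiled only when the hypothetical macro SKETCH_KERNEL_HAS_PROTOTYPE is defined on s390x. The wrapper name sketch_pci_mmio_write() is likewise made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Syscall numbers as proposed in this prototype (s390x only). */
#define SKETCH_NR_s390_pci_mmio_write 348
#define SKETCH_NR_s390_pci_mmio_read  349

/*
 * Write length bytes from buf to the MMIO address mmio_addr, which must lie
 * inside an area previously mapped through the uverbs file.
 * Returns 0 on success, -1 with errno set on failure.
 */
static long sketch_pci_mmio_write(unsigned long mmio_addr, const void *buf,
				  size_t length)
{
#if defined(__s390x__) && defined(SKETCH_KERNEL_HAS_PROTOTYPE)
	return syscall(SKETCH_NR_s390_pci_mmio_write, mmio_addr, buf, length);
#else
	/* Without the prototype kernel, report the call as unavailable. */
	(void)mmio_addr;
	(void)buf;
	(void)length;
	errno = ENOSYS;
	return -1;
#endif
}
```

A library such as libmlx4 would then call the wrapper wherever other architectures store directly to the mapped doorbell page.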

>> memory pages on s390x platform.
>>
>> Signed-off-by: Alexey Ishchuk <aishchuk-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
>> ---
>>   arch/s390/include/uapi/asm/unistd.h |   4 +-
>>   arch/s390/kernel/Makefile           |   1 +
>>   arch/s390/kernel/entry.h            |   4 +
>>   arch/s390/kernel/pci_mmio.c         | 197
>> ++++++++++++++++++++++++++++++++++++
>>   arch/s390/kernel/syscalls.S         |   2 +
>>   5 files changed, 207 insertions(+), 1 deletion(-)
>>   create mode 100644 arch/s390/kernel/pci_mmio.c
>>
>> diff --git a/arch/s390/include/uapi/asm/unistd.h
>> b/arch/s390/include/uapi/asm/unistd.h
>> index 3802d2d..ab49d1d 100644
>> --- a/arch/s390/include/uapi/asm/unistd.h
>> +++ b/arch/s390/include/uapi/asm/unistd.h
>> @@ -283,7 +283,9 @@
>>   #define __NR_sched_setattr	345
>>   #define __NR_sched_getattr	346
>>   #define __NR_renameat2		347
>> -#define NR_syscalls 348
>> +#define __NR_s390_pci_mmio_write	348
>> +#define __NR_s390_pci_mmio_read		349
>> +#define NR_syscalls 350
>>
>>   /*
>>    * There are some system calls that are not present on 64 bit, some
>> diff --git a/arch/s390/kernel/Makefile b/arch/s390/kernel/Makefile
>> index 8c2518f..44e8fbb 100644
>> --- a/arch/s390/kernel/Makefile
>> +++ b/arch/s390/kernel/Makefile
>> @@ -62,6 +62,7 @@ ifdef CONFIG_64BIT
>>   obj-$(CONFIG_PERF_EVENTS)	+= perf_event.o perf_cpum_cf.o
>> perf_cpum_sf.o \
>>   						perf_cpum_cf_events.o
>>   obj-y				+= runtime_instr.o cache.o
>> +obj-y				+= pci_mmio.o
>>   endif
>>
>>   # vdso
>> diff --git a/arch/s390/kernel/entry.h b/arch/s390/kernel/entry.h
>> index 6ac7819..a36b6f9 100644
>> --- a/arch/s390/kernel/entry.h
>> +++ b/arch/s390/kernel/entry.h
>> @@ -70,4 +70,8 @@ struct old_sigaction;
>>   long sys_s390_personality(unsigned int personality);
>>   long sys_s390_runtime_instr(int command, int signum);
>>
>> +long sys_s390_pci_mmio_write(const unsigned long mmio_addr,
>> +			     const void *user_buffer, const size_t length);
>> +long sys_s390_pci_mmio_read(const unsigned long mmio_addr,
>> +			    void *user_buffer, const size_t length);
>>   #endif /* _ENTRY_H */
>> diff --git a/arch/s390/kernel/pci_mmio.c b/arch/s390/kernel/pci_mmio.c
>> new file mode 100644
>> index 0000000..4539d23
>> --- /dev/null
>> +++ b/arch/s390/kernel/pci_mmio.c
>> @@ -0,0 +1,197 @@
>> +/*
>> + * Copyright IBM Corp. 2014
>> + */
>> +#include <linux/kernel.h>
>> +#include <linux/syscalls.h>
>> +#include <linux/init.h>
>> +#include <linux/mm.h>
>> +#include <linux/errno.h>
>> +#include <linux/pci.h>
>> +
>> +union value_buffer {
>> +	u8 buf8;
>> +	u16 buf16;
>> +	u32 buf32;
>> +	u64 buf64;
>> +	u8 buf_large[64];
>> +};
>> +
>> +static long get_pfn(const unsigned long user_addr,
>> +		    const unsigned long access,
>> +		    unsigned long *pfn)
>> +{
>> +	struct vm_area_struct *vma = NULL;
>> +
>> +	if (!pfn)
>> +		return -EINVAL;
>> +
>> +	vma = find_vma(current->mm, user_addr);
>> +	if (!vma)
>> +		return -EINVAL;
>> +	if (!(vma->vm_flags & access))
>> +		return -EACCES;
>> +
>> +	return follow_pfn(vma, user_addr, pfn);
>> +}
>> +
>> +static inline int verify_page_addr(const unsigned long page_addr)
>> +{
>> +	return !(page_addr < ZPCI_IOMAP_ADDR_BASE ||
>> +	    page_addr > (ZPCI_IOMAP_ADDR_BASE |
>> ZPCI_IOMAP_ADDR_IDX_MASK));
>> +}
>> +
>> +static long choose_buffer(const size_t length,
>> +			  union value_buffer *value,
>> +			  void **buf)
>> +{
>> +	long ret = 0UL;
>> +
>> +	if (length > sizeof(value->buf_large)) {
>> +		*buf = kmalloc(length, GFP_KERNEL);
>> +		if (!*buf)
>> +			return -ENOMEM;
>> +		ret = 1;
>> +	} else {
>> +		*buf = value->buf_large;
>> +	}
>> +	return ret;
>> +}
>> +
>> +SYSCALL_DEFINE3(s390_pci_mmio_write,
>> +		const unsigned long, mmio_addr,
>> +		const void __user *, user_buffer,
>> +		const size_t, length)
>> +{
>
> You need some security check in the flow here (e.g. a capability or root check).
> If you don't do that, any user can perform MMIO to any address in the system, which seems like a very bad idea.
>
An additional security check on the MMIO address passed to the syscall
is not required because the syscall uses the address of a previously
mapped MMIO area. That area should be mapped only by a process that has
the appropriate permissions for working with the uverbs file. When the
syscall receives the user space address, it looks up the corresponding
vma and checks the required access to it; if the vma cannot be found or
the access mode is invalid, the syscall returns an error code. Once the
vma is found, the syscall verifies that the memory area belongs to the
PCI memory address space, and only then writes the data to the
specified address. Therefore the syscall does not work with arbitrary
user space addresses: it works only with addresses that were mapped
from the correct uverbs file and that belong to the PCI memory address
space.

Compare this with a "normal" architecture: there is no additional
security check there either, and the user space process simply accesses
the MMIO area directly with read/write instructions. All required
checks need to go into the code that establishes the mapping.
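
The two address checks described above can be modelled in plain C as follows. This is only a user space sketch of the logic, not the kernel code: the helper names and SKETCH_PAGE_SIZE are made up, and the range bounds are passed in as parameters instead of using the real ZPCI_IOMAP_ADDR_BASE and ZPCI_IOMAP_ADDR_IDX_MASK values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed page size for the sketch. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Mirrors verify_page_addr(): the page address must lie in the closed
 * range [base, base | idx_mask].
 */
static bool sketch_page_in_iomap(unsigned long page_addr, unsigned long base,
				 unsigned long idx_mask)
{
	return page_addr >= base && page_addr <= (base | idx_mask);
}

/*
 * Mirrors the length and offset checks: the access must be non-empty and
 * must not cross a page boundary. Comparing against the remaining bytes
 * in the page avoids overflow in offset + length.
 */
static bool sketch_within_page(unsigned long mmio_addr, size_t length)
{
	unsigned long offset = mmio_addr & (SKETCH_PAGE_SIZE - 1);

	return length != 0 && length <= SKETCH_PAGE_SIZE - offset;
}
```

In the patch these checks run after the vma lookup, so an address that passes both has already been found in a mapping of the calling process.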
>
>> +	long ret = 0L;
>> +	void *buf = NULL;
>> +	long buf_allocated = 0;
>> +	void __iomem *io_addr = NULL;
>> +	unsigned long pfn = 0UL;
>> +	unsigned long offset = 0UL;
>> +	unsigned long page_addr = 0UL;
>> +	union value_buffer value;
>> +
>> +	if (!length)
>> +		return -EINVAL;
>> +	if (!zpci_is_enabled())
>> +		return -ENODEV;
>> +
>> +	ret = get_pfn(mmio_addr, VM_WRITE, &pfn);
>> +	if (ret)
>> +		return ret;
>> +
>> +	page_addr = pfn << PAGE_SHIFT;
>> +	if (!verify_page_addr(page_addr))
>> +		return -EFAULT;
>> +
>> +	offset = mmio_addr & ~PAGE_MASK;
>> +	if (offset + length > PAGE_SIZE)
>> +		return -EINVAL;
>> +	io_addr = (void *)(page_addr | offset);
>> +
>> +	buf_allocated = choose_buffer(length, &value, &buf);
>> +	if (buf_allocated < 0L)
>> +		return -ENOMEM;
>> +
>> +	switch (length) {
>> +	case 1:
>> +		ret = get_user(value.buf8, ((u8 *)user_buffer));
>> +		break;
>> +	case 2:
>> +		ret = get_user(value.buf16, ((u16 *)user_buffer));
>> +		break;
>> +	case 4:
>> +		ret = get_user(value.buf32, ((u32 *)user_buffer));
>> +		break;
>> +	case 8:
>> +		ret = get_user(value.buf64, ((u64 *)user_buffer));
>> +		break;
>> +	default:
>> +		ret = copy_from_user(buf, user_buffer, length);
>> +	}
>> +	if (ret)
>> +		goto out;
>> +
>> +	switch (length) {
>> +	case 1:
>> +		__raw_writeb(value.buf8, io_addr);
>> +		break;
>> +	case 2:
>> +		__raw_writew(value.buf16, io_addr);
>> +		break;
>> +	case 4:
>> +		__raw_writel(value.buf32, io_addr);
>> +		break;
>> +	case 8:
>> +		__raw_writeq(value.buf64, io_addr);
>> +		break;
>> +	default:
>> +		memcpy_toio(io_addr, buf, length);
>> +	}
>> +out:
>> +	if (buf_allocated > 0L)
>> +		kfree(buf);
>> +	return ret;
>> +}
>> +
>> +SYSCALL_DEFINE3(s390_pci_mmio_read,
>> +		const unsigned long, mmio_addr,
>> +		void __user *, user_buffer,
>> +		const size_t, length)
>> +{
>> +	long ret = 0L;
>> +	void *buf = NULL;
>> +	long buf_allocated = 0L;
>> +	void __iomem *io_addr = NULL;
>> +	unsigned long pfn = 0UL;
>> +	unsigned long offset = 0UL;
>> +	unsigned long page_addr = 0UL;
>> +	union value_buffer value;
>> +
>> +	if (!length)
>> +		return -EINVAL;
>> +	if (!zpci_is_enabled())
>> +		return -ENODEV;
>> +
>> +	ret = get_pfn(mmio_addr, VM_READ, &pfn);
>> +	if (ret)
>> +		return ret;
>> +
>> +	page_addr = pfn << PAGE_SHIFT;
>> +	if (!verify_page_addr(page_addr))
>> +		return -EFAULT;
>> +
>> +	offset = mmio_addr & ~PAGE_MASK;
>> +	if (offset + length > PAGE_SIZE)
>> +		return -EINVAL;
>> +	io_addr = (void *)(page_addr | offset);
>> +
>> +	buf_allocated = choose_buffer(length, &value, &buf);
>> +	if (buf_allocated < 0L)
>> +		return -ENOMEM;
>> +
>> +	switch (length) {
>> +	case 1:
>> +		value.buf8 = __raw_readb(io_addr);
>> +		ret = put_user(value.buf8, ((u8 *)user_buffer));
>> +		break;
>> +	case 2:
>> +		value.buf16 = __raw_readw(io_addr);
>> +		ret = put_user(value.buf16, ((u16 *)user_buffer));
>> +		break;
>> +	case 4:
>> +		value.buf32 = __raw_readl(io_addr);
>> +		ret = put_user(value.buf32, ((u32 *)user_buffer));
>> +		break;
>> +	case 8:
>> +		value.buf64 = __raw_readq(io_addr);
>> +		ret = put_user(value.buf64, ((u64 *)user_buffer));
>> +		break;
>> +	default:
>> +		memcpy_fromio(buf, io_addr, length);
>> +		ret = copy_to_user(user_buffer, buf, length);
>> +	}
>> +	if (buf_allocated > 0L)
>> +		kfree(buf);
>> +	return ret;
>> +}
>> diff --git a/arch/s390/kernel/syscalls.S b/arch/s390/kernel/syscalls.S
>> index fe5cdf2..1faa942 100644
>> --- a/arch/s390/kernel/syscalls.S
>> +++ b/arch/s390/kernel/syscalls.S
>> @@ -356,3 +356,5 @@
>> SYSCALL(sys_finit_module,sys_finit_module,compat_sys_finit_module)
>>   SYSCALL(sys_sched_setattr,sys_sched_setattr,compat_sys_sched_setattr)
>> /* 345 */
>>   SYSCALL(sys_sched_getattr,sys_sched_getattr,compat_sys_sched_getattr)
>>   SYSCALL(sys_renameat2,sys_renameat2,compat_sys_renameat2)
>> +SYSCALL(sys_ni_syscall,sys_s390_pci_mmio_write,sys_ni_syscall)
>> +SYSCALL(sys_ni_syscall,sys_s390_pci_mmio_read,sys_ni_syscall)
>> --
>> 1.8.5.5
>>



* RE: [PATCH 4/5] dapl: add support for the s390x platform
       [not found]     ` <1409135080-44991-5-git-send-email-aishchuk-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2014-09-02 16:06       ` Davis, Arlin R
  0 siblings, 0 replies; 9+ messages in thread
From: Davis, Arlin R @ 2014-09-02 16:06 UTC (permalink / raw)
  To: Alexey Ishchuk, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: gilr-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
	roland-DgEjT+Ai2ygdnm+yROfE0A, linux-s390-u79uwXL29TY76Z2rM5mHXA,
	gmuelas-tA70FqPdS9bQT0dZR+AlfA,
	utz.bacher-tA70FqPdS9bQT0dZR+AlfA,
	martin.schwidefsky-tA70FqPdS9bQT0dZR+AlfA,
	frank.blaschka-tA70FqPdS9bQT0dZR+AlfA


> Subject: [PATCH 4/5] dapl: add support for the s390x platform
> 
> This patch adds the dapl_os_atomic_inc, dapl_os_atomic_dec, and
> dapl_os_atomic_assign function implementations to the dapl userspace
> package to provide DAPL API support on the s390x platform by adding
> assembler language implementations of those platform-specific functions.
> 
> Signed-off-by: Alexey Ishchuk <aishchuk-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

Acked-by: Arlin Davis <arlin.r.davis-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Committed. Thanks!
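
For readers who want a portable picture of what these three operations do, here is a sketch using the GCC/Clang __atomic builtins in place of s390x assembler. The sketch_* names and the return conventions (new value for inc/dec, previous value for assign) are assumptions made for this example, not taken from the patch.

```c
#include <assert.h>

/* Atomically increment *v and return the new value. */
static int sketch_atomic_inc(int *v)
{
	return __atomic_add_fetch(v, 1, __ATOMIC_SEQ_CST);
}

/* Atomically decrement *v and return the new value. */
static int sketch_atomic_dec(int *v)
{
	return __atomic_sub_fetch(v, 1, __ATOMIC_SEQ_CST);
}

/*
 * Compare-and-swap: store new_value in *v only if *v equals match_value.
 * Returns the value *v held before the call in either case.
 */
static int sketch_atomic_assign(int *v, int match_value, int new_value)
{
	int expected = match_value;

	/* On failure the builtin writes the current value into expected. */
	__atomic_compare_exchange_n(v, &expected, new_value, 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return expected;
}
```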



Thread overview: 9+ messages
2014-08-27 10:24 [PATCH 0/5] DAPL support on s390x platform prototype Alexey Ishchuk
     [not found] ` <1409135080-44991-1-git-send-email-aishchuk-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2014-08-27 10:24   ` [PATCH] s390/kernel: add system calls for access PCI memory Alexey Ishchuk
     [not found]     ` <1409135080-44991-2-git-send-email-aishchuk-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2014-08-27 15:18       ` Shachar Raindel
     [not found]         ` <6B2A6E60C06CCC42AE31809BF572352B010E23CCE5-LSMZvP3E4uyuSA5JZHE7gA@public.gmane.org>
2014-08-28 12:00           ` Alexey Ishchuk
2014-08-27 10:24   ` [PATCH 2/5] libibverbs: add support for s390x platform Alexey Ishchuk
2014-08-27 10:24   ` [PATCH 3/5] libmlx4: " Alexey Ishchuk
2014-08-27 10:24   ` [PATCH 4/5] dapl: add support for the " Alexey Ishchuk
     [not found]     ` <1409135080-44991-5-git-send-email-aishchuk-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2014-09-02 16:06       ` Davis, Arlin R
2014-08-27 10:24   ` [PATCH 5/5] perftest: " Alexey Ishchuk
